联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> Python编程Python编程

日期:2019-03-30 11:00

W4111 -- Introduction to Databases

Homework 3

Spring 2019, Sections 03, V03, H03

Introduction

This is the specification of for homework 3 for W4111 - Introduction to Databases, section 03,

H03, V03 for spring semester 2019. This document is always the current version of the

specification. Developers are responsible for continuously reviewing the document for changes.

Document Control

Roles

Author UNI Role

Ferguson, Donald, F. dff9 Instructor

Approver UNI Role

Ferguson, Donald, F. dff9 Instructor

Reviewer UNI Role

Dalchand, Samantha sd2995 Assistant Instructor

Dhillon, Kirit ksd2142 Assistant Instructor

Gandikota, Chandana cg3111 Assistant Instructor

Gorrela, Meghna mg3740 Assistant Instructor

Huang, Rose rh2805 Assistant Instructor

Hudson, Alysha alh2202 Assistant Instructor

Karasev, Mikhail mak2257 Assistant Instructor

Peterson, Ara alp2210 Assistant Instructor

Saosun, Tahsina ts2931 Assistant Instructor

Swaroop, Vatsala vs2671 Assistant Instructor

Tan, Xinyue xt2215 Assistant Instructor

Change Log

Change No. Date Document

Version

Changes

Change Process

Students should post clarification requests on this Piazza thread. The current version of this

document and the change log will note changes/clarifications. There will not be any other source

documenting changes or clarifications.

Overview

This project has two parts:

1. Implement indexes on top of CSV data in CSVDataTable implementations.

2. Query optimization:

1. Add a JOIN function to the CSVDataTable and implement query optimizations.

2. Use Access Paths based on index selection to optimize find_by_template() and

join().

Allowed Frameworks/Libraries

● You MAY only use libraries that are part of the core Python environment, e.g. csv, json,

etc.

● You MUST NOT use Pandas.

Indexes and File

1. The CSVDataTable manages dictionaries (also known as maps, name value pairs). An

individual dictionary represents a row. The CSVDataTable as a whole represents all of

the rows in a CSV file.

2. A template is also a dictionary. A row matches a template if for each key in the

dictionary, the row has a key that identifies a piece of data with exactly the same value

as in the template.

3. Your CSVDataTable must implement the following operations:

1. insert(row)

2. find_by_template(template, field_list, index_allowed)

1. This must return a CSVDataTable.

2. The table contains dictionaries that match the template and contain the

requested fields (dictionary keys and values).

3. If index_allowed is True, the find may use an index if one is supports the

template.

3. delete(template) deletes all rows matching a template.

4. add_index(name, kind, column_list):

1. name is a caller defined name.

2. kind is one of “PRIMARY”, “UNIQUE”, “INDEX”.

1. “UNIQUE” means that at most one row may exist in the table for a

set of column values specified by the column_list.

2. “PRIMARY” has the same behavior as unique, but there can be

only one “PRIMARY” index.

3. “INDEX” allows duplicate values.

3. column_list is the set of column names that comprise the index definition.

5. import(rows): rows is a list of dictionaries. This operation inserts the rows into the

CSVDataTable.

6. save(): This function saves the CSVDataTable data (rows) and index information

to a single file.

7. load(): This function loads the rows and index information from a single file.

4. Data file behavior:

1. Index information/state must persist between a save() and load(). You may not

rebuild indexes on data load.

2. load() loads the entire data and indexes.

3. save() saves all of the data and index information.

5. Your implementation should perform input validation on methods.

6. Indexes only need to support equality comparisons.

Join and Query Optimization

Your CSVDataTable implementation must support the following operation:

join(other_table, on_columns, where_template, field_list):

● Other table is a reference to a CSVDataTable.

● on_columns is a list of column names common to both tables. The join() function

implements an equi-join using these columns.

● where_template is a dictionary. The only comparison operator is “==” and the resulting

rows must match the template completely. They keys in the dictionary are of the form:

○ table_name.column_name

○ This specified the name of the table and column to which the template element

applies.

● field_list is of columns of the same form as the template column definition. The query

only returns the requested columns.

The join() function returns a CSVDataTable, which supports all CSVDataTable operations.

Your implementation of MUST implement at least three optimizations that are analogous to SQL

optimizations covered in lectures.

Your implementation should test for obvious error conditions and demonstrate improvements

from your optimizations.

Submission Format

You will submit your homework as a zip file. The file will have the following directory structure:

● /src: Your implementation code.

● /tests: Your test code.

● /test_output: The console output from running the tests.

● /CSVFile: Contains any CSV files you use in testing

● /DB: Contains the holding tables/indexes.

You must name the zip file following guidelines previously posted on Piazza.


版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp