W4111 -- Introduction to Databases
Homework 3
Spring 2019, Sections 03, V03, H03
Introduction
This is the specification of for homework 3 for W4111 - Introduction to Databases, section 03,
H03, V03 for spring semester 2019. This document is always the current version of the
specification. Developers are responsible for continuously reviewing the document for changes.
Document Control
Roles
Author UNI Role
Ferguson, Donald, F. dff9 Instructor
Approver UNI Role
Ferguson, Donald, F. dff9 Instructor
Reviewer UNI Role
Dalchand, Samantha sd2995 Assistant Instructor
Dhillon, Kirit ksd2142 Assistant Instructor
Gandikota, Chandana cg3111 Assistant Instructor
Gorrela, Meghna mg3740 Assistant Instructor
Huang, Rose rh2805 Assistant Instructor
Hudson, Alysha alh2202 Assistant Instructor
Karasev, Mikhail mak2257 Assistant Instructor
Peterson, Ara alp2210 Assistant Instructor
Saosun, Tahsina ts2931 Assistant Instructor
Swaroop, Vatsala vs2671 Assistant Instructor
Tan, Xinyue xt2215 Assistant Instructor
Change Log
Change No. Date Document
Version
Changes
Change Process
Students should post clarification requests on this Piazza thread. The current version of this
document and the change log will note changes/clarifications. There will not be any other source
documenting changes or clarifications.
Overview
This project has two parts:
1. Implement indexes on top of CSV data in CSVDataTable implementations.
2. Query optimization:
1. Add a JOIN function to the CSVDataTable and implement query optimizations.
2. Use Access Paths based on index selection to optimize find_by_template() and
join().
Allowed Frameworks/Libraries
● You MAY only use libraries that are part of the core Python environment, e.g. csv, json,
etc.
● You MUST NOT use Pandas.
Indexes and File
1. The CSVDataTable manages dictionaries (also known as maps, name value pairs). An
individual dictionary represents a row. The CSVDataTable as a whole represents all of
the rows in a CSV file.
2. A template is also a dictionary. A row matches a template if for each key in the
dictionary, the row has a key that identifies a piece of data with exactly the same value
as in the template.
3. Your CSVDataTable must implement the following operations:
1. insert(row)
2. find_by_template(template, field_list, index_allowed)
1. This must return a CSVDataTable.
2. The table contains dictionaries that match the template and contain the
requested fields (dictionary keys and values).
3. If index_allowed is True, the find may use an index if one is supports the
template.
3. delete(template) deletes all rows matching a template.
4. add_index(name, kind, column_list):
1. name is a caller defined name.
2. kind is one of “PRIMARY”, “UNIQUE”, “INDEX”.
1. “UNIQUE” means that at most one row may exist in the table for a
set of column values specified by the column_list.
2. “PRIMARY” has the same behavior as unique, but there can be
only one “PRIMARY” index.
3. “INDEX” allows duplicate values.
3. column_list is the set of column names that comprise the index definition.
5. import(rows): rows is a list of dictionaries. This operation inserts the rows into the
CSVDataTable.
6. save(): This function saves the CSVDataTable data (rows) and index information
to a single file.
7. load(): This function loads the rows and index information from a single file.
4. Data file behavior:
1. Index information/state must persist between a save() and load(). You may not
rebuild indexes on data load.
2. load() loads the entire data and indexes.
3. save() saves all of the data and index information.
5. Your implementation should perform input validation on methods.
6. Indexes only need to support equality comparisons.
Join and Query Optimization
Your CSVDataTable implementation must support the following operation:
join(other_table, on_columns, where_template, field_list):
● Other table is a reference to a CSVDataTable.
● on_columns is a list of column names common to both tables. The join() function
implements an equi-join using these columns.
● where_template is a dictionary. The only comparison operator is “==” and the resulting
rows must match the template completely. They keys in the dictionary are of the form:
○ table_name.column_name
○ This specified the name of the table and column to which the template element
applies.
● field_list is of columns of the same form as the template column definition. The query
only returns the requested columns.
The join() function returns a CSVDataTable, which supports all CSVDataTable operations.
Your implementation of MUST implement at least three optimizations that are analogous to SQL
optimizations covered in lectures.
Your implementation should test for obvious error conditions and demonstrate improvements
from your optimizations.
Submission Format
You will submit your homework as a zip file. The file will have the following directory structure:
● /src: Your implementation code.
● /tests: Your test code.
● /test_output: The console output from running the tests.
● /CSVFile: Contains any CSV files you use in testing
● /DB: Contains the holding tables/indexes.
You must name the zip file following guidelines previously posted on Piazza.
版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。