联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> Python编程Python编程

日期:2019-12-02 10:10

2.5 Project: A miniature relational database with order

This project is due on Thursday Dec 5, 2019 at 4:30 PM.

Given ordered tables (array-tables) whose rows consist of strings (string

constants are bracketed by single quotes) and integers, you are to write a

program which will

? Perform the basic operations of relational algebra: selection, projection,

join, group by, and count, sum and avg aggregates. The comparators

for select and join will be = <, >, ! =, ≥, ≤

? Because the array-tables are potentially ordered, you can sort an arraytable

by one or more columns, and running moving sums and average

aggregates on a column of an array-table.

? Import a vertical bar delimited file into an array-table (in the same

order), export from an array-table to a file preserving its order, and

assign the result of a query to an array-table.

? Each operation will be on a single line. Each time you execute a line,

you should print the time it took to execute.

? You will support in memory B-trees and hash structures. You are

welcome to take those implementations from wherever you can find

them, but you must say where.

? Your program should be written in python or java. You will hand in

clean and well structured source code in which each function has a

header that says (i) what the function deos, (ii) what its inputs are

and what they mean (iii) what the outputs are and mean (iv) any side

effects to globals.

? You must ensure that your software runs on the Courant Institute

(cims) machine crunchy5.cims.nyu.edu

? You may NOT use any relational algebra or SQL library or system

(e.g. no SQLite, no mySQL, no other relational database system, no

Pandas ). Stick pretty much to the standard stuff (e.g. in Python:

numpy, core language features, string manipulation, random number

generators, and data structure support for in memory B-trees and hash

structures). You may not use anyone else’s code (other than for the

4

B-tree or Hash structure implementation). Doing so will constitute

plagiarism.

Your program should take operations from standard input. We will run

your programs on test cases of our choosing. For ease of parsing there will

be one operation per line. Comments begin with // and go to the end of

the line. For example,

R := inputfromfile(sales1) // import vertical bar delimited foo, first line

// has column headers.

// Suppose they are saleid|itemid|customerid|storeid|time|qty|pricerange

// In general there can be more or fewer columns than this.

R1 := select(R, (time > 50) or (qty < 30))

// select * from R where time > 50 or qty < 30

R2 := project(R1, saleid, qty, pricerange) // select saleid, qty, pricerange

// from R1

R3 := avg(R1, qty) // select avg(qty) from R1

R4 := sumgroup(R1, time, qty) // select sum(time), qty from R1 group by qty

R5 := sumgroup(R1, qty, time, pricerange) // select sum(qty), time,

// pricerange from R1 group by time, pricerange

R6 := avggroup(R1, qty, pricerange) // select avg(qty), pricerange

// from R1 group by by pricerange

S := inputfromfile(sales2) // suppose column headers are

// saleid|I|C|S|T|Q|P

T := join(R, S, R.customerid = S.C) // select * from R, S

// where R.customerid = S.C

T1 := join(R1, S, (R1.qty > S.Q) and (R1.saleid = S.saleid)) // select * from R1, S where

T2 := sort(T1, S_C) // sort T1 by S_C

T2prime := sort(T1, R1_time, S_C) // sort T1 by R_time, S_C (in that order)

T3 := movavg(T2prime, R1_qty, 3) // perform the three item moving average of T2prime

// on column R_qty. This will be as long as R_qty with the three way

// moving average of 4 8 9 7 being 4 6 7 8

T4 := movsum(T2prime, R1_qty, 5) // perform the five item moving sum of T2prime

// on column R_qty

Q1 := select(R, qty = 5) // select * from R where qty=5

Btree(R, qty) // create an index on R based on column qty

// Equality selections and joins on R should use the index.

// All indexes will be on one column (both Btree and Hash)

Q2 := select(R, qty = 5) // this should use the index

5

Q3 := select(R, itemid = 7) // select * from R where itemid = 7

Hash(R,itemid)

Q4 := select(R, itemid = 7) // this should use the hash index

Q5 := concat(Q4, Q2) // concatenate the two tables (must have the same schema)

// Duplicate rows may result (though not with this example).

outputtofile(Q5, Q5) // This should output the table Q5 into a file

// with the same name and with vertical bar separators

outputtofile(T, T) // This should output the table T

Our tests may operate on different files with different column headers.

Our queries may use different paramter values (e.g. 14 way moving average).

Our joins may be on different fields.

Some constraints to make your life easier:

? There will be no syntax errors in our tests. However, white space

may vary in the operations, e.g. select(R, itemid = 7), select( R,

itemid = 7), select(R,itemid=7), and select(R ,itemid = 7) all should

be interpreted in the same way.

? The only aggregates are count, sum, and avg and the corresponding

countgroup, sumgroup, and avggroup.

? The only moving aggregates are movsum and movavg. There is no

group by for moving sums and averages.

? All data is in main memory.

? Within select there may be several conditions but only all ands or all

ors.

? Within join there may be several conditions but only all ands.

? A join condition can be

table.attribute [arithop constant] relop table.attribute [artithop constant]

? A select condition can be

attribute [arithop constant] relop constant

or

constant relop attribute [arithop constant]

6

? If there are multiple conditions, each condition will be surrounded by

parentheses.

? relop is =, !=, >, >=, <, <=

arithop is +, -, *, /

? If R is (A,B,C) and S is (B,C,D,E) then

the result of the join will have A, R_B, R_C, S_B, S_C, D, E

as column names.

You have the option to use the _ notation for every column, so

the result of the above join could be:

R_A, R_B, R_C, S_B, S_C, S_D, S_E

? All numbers should be interpreted as floating point. So 5/2 = 2.5

Here is an example of the first few lines of the sales1 file:

saleid|itemid|customerid|storeid|time|qty|pricerange

45|133|2|63|49|23|outrageous

658|75|2|89|46|43|outrageous

149|103|2|23|67|2|cheap

398|82|2|41|3|27|outrageous

147|81|2|4|92|11|outrageous

778|75|160|72|67|17|supercheap

829|112|2|70|63|43|supercheap

101|105|2|9|74|28|expensive

940|62|2|90|67|39|outrageous

864|119|12|38|67|49|outrageous

288|46|2|95|67|26|outrageous

875|83|59|56|59|20|outrageous

783|86|180|29|67|46|outrageous

289|16|2|95|92|2|cheap

Full example files can be found here:

http://cs.nyu.edu/cs/faculty/shasha/papers/sales1

http://cs.nyu.edu/cs/faculty/shasha/papers/sales2

You will hand in your code using Reprozip inside a Docker Virtual Machine

so that it is reproducible across platforms.

7


版权所有:编程辅导网 2018 All Rights Reserved 联系方式:QQ:99515681 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。