Part I. Paper-and-Pencil Questions
1. Assume a hard drive with the following specifications: 1,024 bytes/sector, 8
sectors/track, 10 tracks/cylinder, 18 msec for average seek time, 8.3 msec for average
latency time, and 1229 bytes/msec for transfer rate. Consider a file of 8,000 records, each
of size 200 bytes. Suppose that the file is contiguously stored on the drive, without
records spanning between sectors. Further, assume that there are roughly one average
seek and one average latency for each sequential access of the file. Indicate briefly how
your answers are derived for each of the following questions:
(a) Using the external merge-sort method, approximately how many disk accesses are
required to sort the file by the primary key. Assume that the RAM has 4 buffers
available, each of which contains 1024 bytes. When merging sorted runs, you can use
three buffers for the input, each buffer for a run, and one buffer for the output. How
much is the total I/O access time for the internal sort phase? Note that the access time is
only required for the internal sort phase, but the number of disk accesses is required for
the entire merge-sort process.
(b) Assume that each primary key takes 40 bytes and each pointer, 5 bytes. Consider the
creation of a sparse index for the above sorted file.
(i) How many levels does the index contain, and how many blocks are required at
each level? Note that you can build a higher-level index based on an existing
index until the top-level index becomes small enough to be stored in a single
block.
(ii) Answer question (i) again if the primary keys can be compressed from 40 bytes to
15 bytes. How much space is saved for the entire index in terms of blocks?
2. A PARTS file with Part# as the key field includes records with the following Part#
values:
(2, 3, 5, 7, 11, 17, 19, 23, 29, 31)
(a) Construct B-trees of orders 4 and 8, respectively, for the above sequence of key
values. Indicate explicitly the relative page numbers for each of the B-tree nodes
created.
2
(b) For the B-tree of order 4 constructed in (a), show how the tree is changed after
inserting the key values 9, 10, and 8.
(c) For the B-tree modified in (b), show how the tree is further changed after deleting the
key values: 23, 11, and 19. Note that when deleting a key value, we always try
redistribution before concatenation, and in both cases, we always try the left sibling
before the right sibling.
Part II. Programming Exercise
3. Implement a C/C++ program that will create and maintain a B-tree index for a large
data file. Each B-tree page contains 512 bytes, and you can assume fixed-length fields for
the key and the byte-offset to a data record when designing an appropriate B-tree page
structure for the data set specified at the end of this exercise. The data file consists of
variable-length records, each of which is started with a length indicator and followed with
a sequence of fields separated by the delimiter (“|”). Both the data file and the B-tree
index should have their own header records. For the data file, the header record should
contain a length-indicator, B-tree index name, flag for index modification, the number of
records, and the file size. For B-tree index, the header record should have a lengthindicator,
RRN for the root page, and RRN for the last page.
You should support an interactive command interpreter that takes a command
from a user and performs the required task. A good user interface is important: it does
not have to be fancy, but should be friendly with easy-to-follow instructions. In addition,
defensive programming is desirable: if the user makes a mistake, he or she should be
allowed to try again. The set of commands to be supported are described in the
following:
(a) insert: add the information of a new data record into the B-tree index and the data
file. Note that for B-tree index, redistribution is not normally required for insertion.
(b) load: sequentially load the initial set of data records from a text file and display the
total time in msec used for the loading process. You can implement this command in
terms of insertions by adding one record at a time.
(c) search: look for a particular record by the primary key and if found, display its content
to the screen.
(d) traverse: start with the root page and display all the B-tree pages in the depth-first
ordering.
(e) report: sequentially display all the data records in the increasing order of the primary
keys.
3
(f) quit: exit the command interpreter.
Note that before entering and after leaving the command interpreter, you need to open and
close the B-tree index and the data file, and the “load” command is only allowed when
the B-tree index and the data file are truly empty.
Sample Data Set: To make it easier for you to test your program, a sample data set is
provided on CourseLink. It is a collection of English words, along with their
pronunciations, stress markings, and indications of whether they are
regular/foreign/irregular. You can use the word field as the primary key, since no two
words spell the same in the sample data set. There are two data files available:
nettalk.data.huge and nettalk.data.small. You can use the small data file to test and debug
your program, and then use the huge data file for testing when your program is reasonably
robust. There is also another file: nettalk.data.format, which explains the data format in
details.
Submission Requirements: Same as described in Assignment One.
版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。