Total Points (Weight): 100 (10%)
Assignment 3
COMPSCI 351-751/SOFTENG 351:
Database Systems
Due: 31 May 2024 at 11:59 pm
1 Query Processing [10 marks]
Consider the join r ⋈ s of two relations r and s whose common attribute set is {A}. Physically, r is stored
on 25 blocks and s on 21 blocks on the disk, and the tuples in both relations are unordered. Assume that the
buffer pool allocated for carrying out the join has 3 frames. Compare block nested-loop join against merge join
for computing r ⋈ s by analyzing their I/O costs. The I/O cost of writing the final join result to the disk is
called the reporting cost. Exclude it from the I/O cost of r ⋈ s, because the reporting costs of block
nested-loop join and merge join are the same and cancel out in the comparison. Specifically,
(A) Compute the number of I/Os, excluding the reporting cost, incurred by block nested-loop join. [5 marks]
(B) Compute the number of I/Os, excluding the reporting cost, incurred by merge join in the worst-case and
best-case scenarios, respectively. Here a scenario means a particular instantiation of the tuples in r and s.
[5 marks]
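As an illustration of how block reads are usually counted for a block nested-loop join, the following Python sketch assumes a common textbook cost model: out of M buffer frames, one is reserved for the inner relation, one for output, and the remaining M - 2 hold a chunk of the outer relation. The function name and the sample numbers are hypothetical and are not the values of this question.

```python
import math


def bnl_join_block_reads(outer_blocks: int, inner_blocks: int, frames: int) -> int:
    """Count block reads of a block nested-loop join (reporting cost excluded).

    Assumed model: one frame for the inner relation, one for output,
    and frames - 2 frames for a chunk of the outer relation.
    """
    if frames < 3:
        raise ValueError("this model needs at least 3 buffer frames")
    chunk = frames - 2                             # outer blocks held at once
    outer_passes = math.ceil(outer_blocks / chunk)  # number of outer chunks
    # The outer relation is read once; the inner relation is scanned
    # in full once per outer chunk.
    return outer_blocks + outer_passes * inner_blocks


# Hypothetical sizes: 4 outer blocks, 3 inner blocks.
print(bnl_join_block_reads(4, 3, frames=3))  # chunk of 1 block
print(bnl_join_block_reads(4, 3, frames=4))  # chunk of 2 blocks
```

Under this model, the choice of which relation plays the outer role changes the count, so it is worth evaluating both orders.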
2 Query Processing [10 marks]
Consider three relations r1(A, B, C), r2(C, D, E), and r3(E, F), with primary keys A, C, and E, respectively.
Assume that relation r1 has 1000 tuples, r2 has 1500 tuples, and r3 has 750 tuples.
(A) Compute the size of r1 ⋈ r2 ⋈ r3. [4 marks]
(B) Assume that each relation has a primary index (B+ tree) on its key. Give two strategies (explicitly
show what to report as the result) for computing the join. Note that you can use file scans, sorting, and
indexes to find the resulting tuples. For example, a strategy could be the execution plan below. Specifically,
sort r1 on attribute C. Perform a merge join on r1 and r2 to produce the intermediate relation
r12 = r1 ⋈ r2. Materialize r12 to the disk. For each tuple t12 ∈ r12, use the index of r3 to find the tuple
t3 ∈ r3 that joins with t12, and report t12 ⋈ t3. [6 marks]
[Figure: execution plan tree. r1 and r2 are combined by a merge join; the result is combined with r3 by an index-based block nested-loop join.]
3 Locking Protocol [40 marks]
Recall the Consistency of Transactions: Actions and locks must relate in the expected ways:
• A transaction can only read or write an element if it previously was granted a lock on the element and
hasn’t yet released the lock.
• If a transaction locks an element, it must later unlock that element.
For each of the transactions described below, suppose that we insert one lock and one unlock action for each
database element that is accessed. Calculate how many orderings of the lock, unlock, read, and write
actions there are in each of the following cases. Please show your working. (Note: the relative order of the
data access operations must not change.)
T1: r1(A), w1(B)
T2: r2(A), w2(A), w2(B).
(A) Consistent and two-phase locked. [10 marks]
(B) Consistent, but not two-phase locked. [4 marks]
(C) Inconsistent, but two-phase locked. [20 marks]
(D) Neither consistent nor two-phase locked. [6 marks]
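To make the two properties concrete, the following Python sketch checks a single transaction's action sequence against the consistency rules recalled above and against the two-phase condition (no lock action after the first unlock). The tuple encoding ("l" for lock, "u" for unlock, "r" for read, "w" for write) and the function names are assumptions made for illustration. The sketch classifies one given ordering and does not count orderings.

```python
def is_consistent(actions):
    """True if every read/write happens while the element is locked,
    and every lock is later released."""
    held = set()
    for op, elem in actions:
        if op == "l":
            held.add(elem)
        elif op == "u":
            if elem not in held:
                return False      # unlock without a preceding lock
            held.discard(elem)
        elif elem not in held:    # "r" or "w" without holding the lock
            return False
    return not held               # no lock may remain unreleased


def is_two_phase(actions):
    """True if no lock action follows any unlock action."""
    seen_unlock = False
    for op, _ in actions:
        if op == "u":
            seen_unlock = True
        elif op == "l" and seen_unlock:
            return False
    return True


# One possible ordering for T1: r1(A), w1(B)
t1 = [("l", "A"), ("r", "A"), ("l", "B"), ("u", "A"), ("w", "B"), ("u", "B")]
print(is_consistent(t1), is_two_phase(t1))
```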
4 Transaction State [10 marks]
During execution, a transaction passes through several states until it finally terminates.
• List all possible sequences of states (i.e., paths) through which a transaction may pass. [6 marks]
• Explain the circumstances under which each possible path may occur. [4 marks]
5 Deadlock [5 marks]
(A) Explain the concept of deadlocks. [1 mark]
(B) Provide two possible solutions to deadlocks. Explain your answer. [4 marks]
6 Recovery [10 marks]
The following figure shows the log corresponding to a particular schedule at the point of a system crash for four
transactions T1, T2, T3, and T4. Suppose that the immediate update protocol with checkpointing is used. Describe
the recovery process from the system crash. Specify which transactions are rolled back, which operations in
the log are redone and which are undone, and whether any cascading rollback takes place.
7 NoSQL [15 marks]
For each of the following, describe a scenario in which one would prefer it as the data storage solution of a
data-intensive application, and explain your reasoning.
(A) Document Database [3 marks]
(B) Graph Database [3 marks]
(C) Log-structured Storage (LSM) [3 marks]
(D) Column Store [3 marks]
(E) Traditional Relational Database [3 marks]
8 Bloom Filter [5 marks]
Explain the data structure of a Bloom filter and why it is used in the LSM tree.
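For orientation, here is a minimal Python sketch of a Bloom filter: a fixed-size bit array plus several hash functions, where a lookup can return a false positive but never a false negative. The class name, the default sizes, and the use of salted SHA-256 digests as the hash functions are assumptions chosen for illustration. In an LSM tree, a filter of this kind is typically kept per on-disk segment so that a read can skip segments that definitely do not contain the key.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; sizes and hashing scheme are assumptions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple (not space-optimal).
        self.bits = bytearray(num_bits)

    def _positions(self, key: str):
        # Derive num_hashes positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
bf.add("user:42")
print(bf.might_contain("user:42"))  # an added key is always reported as possibly present
```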
9 Storage and Retrieval Efficiency [5 marks]
List five techniques, introduced in the course, that can improve the efficiency of data storage and retrieval in a
database system.
10 Application [5 marks]
A popular social mobile app has three functions for each user: sending messages to contacts, receiving
messages from contacts, and displaying the number of unread messages in the upper right corner of the app
logo. Once the user is online, the app is supposed to show all the unread messages to the user. However, user
feedback indicates an inconsistency between the displayed number of unread messages and the actual number.
Specifically, when the displayed number increases, users often have to wait for a long time (sometimes several
minutes) to see the new messages, during which the displayed number remains inconsistent with the actual
number of unread messages. Similarly, when users have read all the unread messages, it takes a long time
(sometimes several minutes) for the number to be updated accordingly. Among the following statements,
which could be possible reasons for the above inconsistency?
(A) For each user, the storage solution of the app only keeps a list of outbound messages.
(B) For each user, the storage solution of the app only keeps a list of inbound messages.
(C) The app treats users with a high number of contacts differently from other users.
(D) The app has a cache for some of the users to store their received messages.
(E) The app cannot handle the load produced by its current users, and a more scalable architecture should
be deployed.