

TU856/TU857/TU858

Advanced Databases

Apache Cassandra CA Task

(using data from a Data Warehouse)


This task will be marked out of 100.

The lab will contribute 25% to your CA (when weighted to 60%).

__________________________________________________________________________________

IMPORTANT

You will need to complete Labs from week 8, week 9 and week 10 to be able to complete this lab.

__________________________________________________________________________________

Contents

TASK OVERVIEW
TASK DETAILS
MARKING
SUBMISSION
  What needs to be submitted?
  How do I submit?
  What is the deadline?


TASK OVERVIEW

You are going to:

• Set up a Cassandra cluster.
• Create a keyspace in that cluster using replication.
• Port data from a query on the fact table created in the PostgreSQL data warehouse for the golf exercise in the week 8 lab.
  o You will write a Python script to extract the data to a JSON file, create a table in your Cassandra keyspace and then import the data into the table.
• Execute queries against this database, using indexes to improve performance.
• Create a second table incorporating a column of a collection type, and implement indexes on this table.
• Capture relevant information about the performance of your Cassandra cluster in general, and about the impact that the indexes have on your query performance.

__________________________________________________________________________________




TASK DETAILS

Each task below is followed by the lab week(s) in which it is covered.

1. Setup
   a. Create a Cassandra cluster. This should be named with your student number.
   b. Create a keyspace within this cluster. Choose an appropriate replication strategy and replication factor.
   Covered in lab: Week 9
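As a sketch of task 1b, the Python helper below builds the `CREATE KEYSPACE` statement you might run in cqlsh. It uses `SimpleStrategy` for a single-datacenter lab cluster. If you pass a datacenter name, it uses `NetworkTopologyStrategy` instead. The keyspace name `d123456_golf` and the three-node cluster are hypothetical. The cluster name itself is set when the cluster is created (for example via `cluster_name` in `cassandra.yaml`), not in CQL.

```python
import re
from typing import Optional

# Conservative name check: unquoted CQL identifiers start with a letter,
# and Cassandra limits keyspace names to 48 characters.
_KEYSPACE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,47}")


def create_keyspace_cql(keyspace: str, replication_factor: int, node_count: int,
                        datacenter: Optional[str] = None) -> str:
    """Build a CREATE KEYSPACE statement for a small lab cluster.

    SimpleStrategy is used when no datacenter is given; otherwise
    NetworkTopologyStrategy with a per-datacenter replication factor.
    """
    if not _KEYSPACE_NAME.fullmatch(keyspace):
        raise ValueError(f"unsupported keyspace name: {keyspace!r}")
    if not 1 <= replication_factor <= node_count:
        # More replicas than nodes cannot all be placed, so reject it here.
        raise ValueError("replication_factor must be between 1 and node_count")
    if datacenter is None:
        replication = ("{'class': 'SimpleStrategy', "
                       f"'replication_factor': {replication_factor}}}")
    else:
        replication = ("{'class': 'NetworkTopologyStrategy', "
                       f"'{datacenter}': {replication_factor}}}")
    return f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = {replication};"


print(create_keyspace_cql("d123456_golf", replication_factor=3, node_count=3))
```

With a replication factor of 3 on a three-node cluster, every node holds a copy of each row. `nodetool status d123456_golf` then reports full effective ownership on each node, which is an easy way to show replication at work.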

2. Port data from PostgreSQL to Cassandra
   a. Working with a PostgreSQL database, write a query on the fact table in the data warehouse created in the week 8 lab class. The query results must include some text data.
   b. Adapt the Python script provided for the week 9 lab to extract the query results to a JSON file, create an appropriate table in Cassandra and populate that table with the contents of the JSON file.
   Covered in lab: Week 9 (requires Week 8)
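For task 2b, the main snag when dumping PostgreSQL results to JSON is a type mismatch. psycopg2 returns `NUMERIC` values as `decimal.Decimal` and dates as `datetime.date`, and `json.dump` cannot encode either. The sketch below converts those values, then builds a CQL `INSERT ... JSON` statement for one record. It is not the week 9 lab script. The column names (`player`, `score`, `played_on`) and the table `golf.results` are hypothetical placeholders for whatever your own query returns.

```python
import datetime
import decimal
import json
from typing import Any, Dict, Iterable, List, Sequence


def to_jsonable(value: Any) -> Any:
    """Convert values psycopg2 commonly returns that json cannot encode."""
    if isinstance(value, decimal.Decimal):
        # NUMERIC/DECIMAL columns arrive as Decimal; float is usually enough for lab data.
        return float(value)
    if isinstance(value, (datetime.date, datetime.datetime)):
        # Dates become ISO 8601 strings; check they suit your Cassandra column types.
        return value.isoformat()
    return value


def rows_to_records(columns: Sequence[str],
                    rows: Iterable[Sequence[Any]]) -> List[Dict[str, Any]]:
    """Pair each row with its column names (e.g. [d[0] for d in cursor.description])."""
    return [{col: to_jsonable(val) for col, val in zip(columns, row)} for row in rows]


def write_json(records: List[Dict[str, Any]], path: str) -> None:
    """Write all records to one JSON array file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)


def insert_json_cql(table: str, record: Dict[str, Any]) -> str:
    """Build an INSERT ... JSON statement; CQL escapes a single quote by doubling it."""
    payload = json.dumps(record).replace("'", "''")
    return f"INSERT INTO {table} JSON '{payload}';"


# Hypothetical row shaped like a golf fact-table query result.
records = rows_to_records(
    ["player", "score", "played_on"],
    [("O'Neill", decimal.Decimal("71.5"), datetime.date(2022, 12, 1))],
)
print(insert_json_cql("golf.results", records[0]))
```

With the DataStax Python driver, you could instead prepare `INSERT INTO golf.results JSON ?` once and bind `json.dumps(record)` for each row. That avoids the manual quote escaping. In either case, the JSON keys must match the Cassandra column names.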

3. Work with tables in Cassandra
   a. Write a CQL statement to query the table created in task 2.
   b. Write a CQL statement that queries this table on a non-primary-key column and succeeds without adding an index.
   c. Create a secondary index on a non-primary-key column. Demonstrate that the secondary index has succeeded.
   d. Create an SASI index on your table to support pattern matching in a text column. Demonstrate that the SASI index has succeeded.
   Covered in lab: Weeks 9 and 10
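The CQL for task 3 can be sketched with small Python string builders.

- The query in 3b needs `ALLOW FILTERING`. Without an index, Cassandra has to scan every partition to evaluate a condition on a non-primary-key column.
- The SASI index in 3d uses the `org.apache.cassandra.index.sasi.SASIIndex` class with `mode` set to `CONTAINS`. That mode allows `LIKE '%text%'` queries; the default `PREFIX` mode supports only `LIKE 'text%'`.

The table, column and index names are hypothetical. SASI is flagged as experimental in recent Cassandra releases, so check whether your version requires it to be enabled in `cassandra.yaml`.

```python
VALID_SASI_MODES = ("PREFIX", "CONTAINS", "SPARSE")


def filtering_query(table: str, column: str) -> str:
    """3b: filter on a non-primary-key column with no index (full scan)."""
    return f"SELECT * FROM {table} WHERE {column} = ? ALLOW FILTERING;"


def secondary_index(table: str, column: str, index_name: str) -> str:
    """3c: a regular secondary index on a non-primary-key column."""
    return f"CREATE INDEX IF NOT EXISTS {index_name} ON {table} ({column});"


def sasi_index(table: str, column: str, index_name: str, mode: str = "CONTAINS") -> str:
    """3d: a SASI index; CONTAINS mode supports LIKE '%...%' searches."""
    if mode not in VALID_SASI_MODES:
        raise ValueError(f"unknown SASI mode: {mode}")
    return (f"CREATE CUSTOM INDEX IF NOT EXISTS {index_name} ON {table} ({column}) "
            "USING 'org.apache.cassandra.index.sasi.SASIIndex' "
            f"WITH OPTIONS = {{'mode': '{mode}'}};")


def like_query(table: str, column: str) -> str:
    """Pattern-matching query served by the SASI index (bind a value like '%Links%')."""
    return f"SELECT * FROM {table} WHERE {column} LIKE ?;"


def contains_pattern(fragment: str) -> str:
    """Wrap a search fragment for a substring (CONTAINS-mode) LIKE match."""
    return f"%{fragment}%"


for stmt in (filtering_query("golf.results", "score"),
             secondary_index("golf.results", "score", "results_score_idx"),
             sasi_index("golf.results", "course_name", "results_course_sasi"),
             like_query("golf.results", "course_name")):
    print(stmt)
```

The `?` bind markers suit prepared statements in the Python driver. In cqlsh, substitute literal values. One way to demonstrate that an index exists is to run `DESCRIBE TABLE`, which lists the table's index definitions after the table. Then rerun the query without `ALLOW FILTERING`.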

4. Work with a collection data type
   a. Create a new table that includes a column of a collection type, and populate it with some data (at your discretion).
   b. Write a CQL statement that queries this table on the collection column and succeeds without adding an index.
   c. Create an appropriate index on your collection column. Demonstrate that the index has succeeded.
   Covered in lab: Weeks 9 and 10
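For task 4, the index target depends on the kind of collection:

- A `set` or `list` column is indexed on its elements and queried with `CONTAINS`.
- A `map` can be indexed on its keys (`KEYS(...)`, queried with `CONTAINS KEY`) or on its values (`VALUES(...)`, queried with `CONTAINS`).

The sketch below generates the matching statements. The `golf.player_courses` table with a `set<text>` column is a hypothetical example, not part of the brief.

```python
from typing import Dict

# Hypothetical second table: each player with the set of courses they have played.
EXAMPLE_TABLE_CQL = (
    "CREATE TABLE IF NOT EXISTS golf.player_courses ("
    "player_id int PRIMARY KEY, name text, courses set<text>);"
)
EXAMPLE_INSERT_CQL = (
    "INSERT INTO golf.player_courses (player_id, name, courses) "
    "VALUES (?, ?, ?);"
)

# Index target and matching WHERE predicate for each kind of collection column.
_TARGETS = {
    "set": ("{c}", "{c} CONTAINS ?"),
    "list": ("{c}", "{c} CONTAINS ?"),
    "map_keys": ("KEYS({c})", "{c} CONTAINS KEY ?"),
    "map_values": ("VALUES({c})", "{c} CONTAINS ?"),
}


def collection_statements(table: str, column: str, kind: str,
                          index_name: str) -> Dict[str, str]:
    """Return the unindexed query (4b), the index (4c) and the indexed query."""
    if kind not in _TARGETS:
        raise ValueError(f"unsupported collection kind: {kind}")
    target, predicate = (part.format(c=column) for part in _TARGETS[kind])
    return {
        "unindexed_query": f"SELECT * FROM {table} WHERE {predicate} ALLOW FILTERING;",
        "index": f"CREATE INDEX IF NOT EXISTS {index_name} ON {table} ({target});",
        "indexed_query": f"SELECT * FROM {table} WHERE {predicate};",
    }


for cql in collection_statements("golf.player_courses", "courses", "set",
                                 "player_courses_idx").values():
    print(cql)
```

When the insert is prepared with the Python driver, a Python `set` such as `{"Royal Dublin"}` can be bound to the `set<text>` column.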

5. Monitor your cluster and query performance
   a. Capture relevant information about cluster and table performance using nodetool.
   b. Capture relevant information about query performance using tracing.
   Covered in lab: Weeks 9 and 10
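Task 5 could be scripted along these lines.

- **nodetool:** `describecluster`, `status`, `tablestats` and `tablehistograms` are standard commands for cluster- and table-level information. The helper only runs them if `nodetool` is on the PATH.
- **Tracing in cqlsh:** `TRACING ON` prints a trace after each query.
- **Tracing in the DataStax Python driver:** call `session.execute(query, trace=True)`, then `get_query_trace()` on the result; its `events` can be passed to `slowest_trace_steps`.

`slowest_trace_steps` reports the largest gaps between consecutive events on each node. It assumes each event exposes `description`, `source` and `source_elapsed` (a `timedelta`). The keyspace and table names are hypothetical.

```python
import shutil
import subprocess
from collections import defaultdict, namedtuple
from datetime import timedelta
from typing import Dict, Iterable, List

TraceStep = namedtuple("TraceStep", "gap description source")


def nodetool_commands(keyspace: str, table: str) -> List[List[str]]:
    """Cluster-, keyspace- and table-level nodetool reports worth capturing."""
    return [
        ["nodetool", "describecluster"],
        ["nodetool", "status", keyspace],  # effective ownership for this keyspace
        ["nodetool", "tablestats", f"{keyspace}.{table}"],
        ["nodetool", "tablehistograms", keyspace, table],
    ]


def run_nodetool(keyspace: str, table: str) -> Dict[str, str]:
    """Run each command and keep its output; returns {} if nodetool is not on PATH."""
    if shutil.which("nodetool") is None:
        return {}
    outputs = {}
    for cmd in nodetool_commands(keyspace, table):
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        outputs[" ".join(cmd)] = result.stdout or result.stderr
    return outputs


def slowest_trace_steps(events: Iterable, top: int = 3) -> List[TraceStep]:
    """Largest gaps between consecutive trace events on the same node.

    Each event needs .description, .source and .source_elapsed (a timedelta
    measured from the start of the trace on that node).
    """
    by_source = defaultdict(list)
    for event in events:
        if event.source_elapsed is not None:
            by_source[event.source].append(event)
    steps = []
    for source, node_events in by_source.items():
        node_events.sort(key=lambda e: e.source_elapsed)
        previous = timedelta(0)
        for event in node_events:
            # Time elapsed between the previous event on this node and this one.
            steps.append(TraceStep(event.source_elapsed - previous, event.description, source))
            previous = event.source_elapsed
    steps.sort(key=lambda step: step.gap, reverse=True)
    return steps[:top]
```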


__________________________________________________________________________________




MARKING

Marking Breakdown

Setup (cluster and keyspace): 10 marks

PostgreSQL to Cassandra extract and load: 15 marks

Working with Cassandra golf data: 40 marks
  o Basic query: 5 marks
  o Adding a secondary index (and verification): 15 marks
  o Adding an SASI index to support pattern matching in text (and verification): 20 marks

Working with a second Cassandra table with a collection data type: 25 marks
  o CQL to create and populate the table: 10 marks
  o CQL query using the collection column: 5 marks
  o Adding an index for your collection column (and verification): 10 marks

Relevant output demonstrating the existence and performance of your cluster, keyspace and tables for the relevant aspects of the above: 10 marks

Total: 100 marks


_______________________________________________________________________________





SUBMISSION

What needs to be submitted?

You need to SUBMIT A SINGLE ARCHIVE (.ZIP, .RAR, .7Z) named with your student number, e.g. D123456.zip, containing the following:

1. A single CQL file named with your student number, e.g. D123456.cql, which:
   o contains your create statements and queries;
   o is commented appropriately, explaining what you are attempting to achieve.
   NOTE: It should be VERY clear in your CQL where you are addressing each task.

2. A Python script, named with your student number (e.g. D123456.py), which extracts data from PostgreSQL and loads it into Cassandra. The script should be commented appropriately.

3. The JSON file of data extracted from PostgreSQL, named with your student number, e.g. D123456.json.

4. Either:
   • A companion document named with your student number (either docx or pdf), e.g. D123456.docx or D123456.pdf.
     i. A template outlining the type of content to include is available in the file called ADvDB-CassandraCA-Template.docx, attached to the assignment in Brightspace.
     ii. Note: You are free to adapt this template as you see fit.
   OR
   • A link to a recording (or set of recordings) of the task being completed, showing the relevant performance output being produced, with an audio description.
     o Refer to the template for the document to identify what should be addressed.

NOTE: You may be asked to demonstrate your work.
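Every deliverable must carry your student number, so a quick self-check before uploading can catch naming mistakes. The sketch below makes a few assumptions:

- It handles only `.zip` archives, because the standard library has no RAR or 7z reader.
- It assumes a student number of the form letter-plus-digits, as in the `D123456` example.
- It treats the companion document as optional, because a recording link may be submitted instead.

```python
import os
import re
import zipfile
from typing import Iterable, List

REQUIRED_EXTENSIONS = (".cql", ".py", ".json")
DOCUMENT_EXTENSIONS = (".docx", ".pdf")


def check_names(archive_name: str, member_names: Iterable[str]) -> List[str]:
    """Return naming problems; an empty list means everything expected is present."""
    match = re.fullmatch(r"([A-Za-z]\d+)\.zip", os.path.basename(archive_name))
    if not match:
        return [f"{archive_name!r} is not named <student number>.zip"]
    student = match.group(1)
    # Ignore folder entries and any folder prefix inside the archive.
    files = {os.path.basename(n) for n in member_names if not n.endswith("/")}
    problems = [f"missing {student}{ext}" for ext in REQUIRED_EXTENSIONS
                if student + ext not in files]
    if not any(student + ext in files for ext in DOCUMENT_EXTENSIONS):
        problems.append(f"no {student}.docx or {student}.pdf "
                        "(fine only if you are submitting a recording link instead)")
    return problems


def check_archive(path: str) -> List[str]:
    """Open a .zip archive and check the names of its members."""
    with zipfile.ZipFile(path) as archive:
        return check_names(path, archive.namelist())
```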

How do I submit?

Submit this via the Assignment section in Brightspace into the assignment called Cassandra CA.

What is the deadline?

The deadline is Friday December 16th 2022 @ 23:59.

