
Date: 2024-03-22 08:15

Department of Computing

School of Mathematical, Physical and Computational Sciences

Assessed Coursework Set Front Page

Module code: CSMBD21

Coursework Description for Big Data and Cloud Computing

Module Title: Big Data and Cloud Computing

Lecturers responsible: Prof. Atta Badii, Dr Zahra Pooranian

Type of Assignment: Coursework

Individual/group Assignment: Individual

Total Weighting of the Assignment: 50%, comprising 25% for each of the Big Data and Cloud Computing tasks

Page limit/Word count for the technical report of the results:

Approximately 3000 words maximum, in two sections of at most 3 pages each, reporting on the implementation of two tasks (Task A and Task B): Section A reports on the Big Data Task (Task A) and Section B on the Cloud Computing Task (Task B). The report is a maximum of 6 pages excluding appendices and should follow the School Style Guide.

Expected hours spent for this assignment: 30 hrs.

Items to be submitted: Two PDFs, each of 3 pages maximum, submitted via Blackboard (BB): one for Section A (Task A) and one for Section B (Task B). Each PDF is to include, on the first page, a link to the code, made accessible to the assessors [z.pooranian@reading.ac.uk (rk929650), atta.badii@reading.ac.uk (sis04ab), r.faulkner@reading.ac.uk (ei194011), weiwei.he@reading.ac.uk (in928478)] through GitLab or a similar repository. For Section A, as the solution is to be provided in your free Azure account, please provide a temporary username and password for your Azure account; for Section B the link can be to a GitLab repository.

Work to be submitted on-line via Blackboard Learn by: 12:00 hrs, Wednesday 20th March 2024

Work will be marked and returned by: 15 working days after the date of submission.

NOTES

By submitting this work, you are certifying that it is all your own work and that use of material from other sources has been properly and fully acknowledged in the text. You are also confirming that you have read and understood the University’s Statement of Academic Misconduct, available on the University web-pages.

If your work is submitted after the deadline, 10% of the maximum possible mark will be deducted for each working day (or part of) it is late. A mark of zero will be awarded if your work is submitted more than 5 working days late. You are strongly recommended to hand in your work by the deadline as a late submission on one piece of work can have impacts on other work.

If you believe that you have a valid reason for failing to meet a deadline then you should complete an Extenuating Circumstances form and submit it to the Student Support Centre before the deadline, or as soon as is practicable afterwards, explaining why.


Section A (The Big Data Task):

• Task A: Implement a solution to predict flight delays based on historical weather and airline data, as provided in your free Azure account, and explain the reason for your preferred Machine Learning (ML) model.

Section B (The Cloud Computing Task):

• Task B: Implement a MapReduce solution to determine the passenger(s) having had the highest number of flights, based on the flights and passenger data provided in the Assignment Folder of the Module on Blackboard.

Assignment Tasks based on the explanatory notes in the Appendix to this document

If you face any difficulties, make clear, in your submission, how far you were able to proceed with the

implementation and explain the challenges you faced.

Marking Criteria for Task A:

• Total marks for Task A will be normalised for 25% credit towards the overall coursework.

• The table below indicates the level of performance expected for each range of assessment:

Classification Range: Typically, the work should meet these requirements

First Class (>= 70%): The assignment demonstrates:

• Excellent technical skills in implementing the system, possibly also suggesting any other solution deemed viable; including reasons for the preferred solution.

• Professional technical writing skills and style.

Upper Second (60-69): The assignment demonstrates:

• Excellent technical skills in implementing the solution.

• Appropriate technical writing skills and clear presentation; including reasons for the preferred solution.

Lower Second (50-59): The assignment demonstrates:

• Excellent technical skills in implementing the system.

• Moderate technical writing skills and clear presentation; including reasons for the preferred solution.

Third (40-49): The assignment demonstrates:

• Satisfactory technical skills in implementing the system.

• Some technical writing ability and clear presentation; including some reasoning for the preferred solution.

Fail (<40): The coursework fails to demonstrate the technical skills to implement the solution, technical writing ability, or clear presentation, and gives inadequate or no reasoning for the preferred solution.


Marking Scheme and feedback template for Task A (Big Data Task)

• Total marks for Task A will be normalised for 25% credit towards the overall coursework assessment.

The key criteria for the assessment of the submitted coursework, with the contribution to the mark in %:

Introduction

• Brief description of the background of the case study.

• Description of the tools and techniques deployed, including “Data Factory”, “Databricks” and “Power BI”, as used to analyse this solution (explaining the solution architecture).

5

10

Solution Implementation

Implementation of Solution:

• Creating the Databricks cluster

• Loading sample data

• Setting up the Data Factory

• Data Factory pipeline

• Operation of ML

• Summarising data

• Visualisation of data


Evaluation:

Your personal reflections on:

• Stating reasons for your preferred solution 10

Presentation of the report:

• Structure and layout of the report

• Professional writing style

• Use of figures, tables, references, citations, and captions


Marking Criteria for Task B (Cloud Computing Task)

• Total marks for Task B will be normalised for 25% credit towards the overall coursework.

• The table below indicates the level of performance expected for each range of assessment:

Classification Range: Typically, the work should meet these requirements

First Class (>= 70%): The assignment demonstrates:

• Deep understanding of the MapReduce paradigm and excellent technical skills in implementing the system to fulfil the objectives of the task.

• Highest quality technical reporting including solution evaluation, addressing all aspects, completely and clearly.

Upper Second (60-69): The assignment demonstrates:

• Good understanding of the MapReduce paradigm and good implementation consistent with the objectives of the task.

• Good quality technical reporting, inclusive, complete, and clear.

Lower Second (50-59): The assignment demonstrates:

• Sufficient understanding of the MapReduce paradigm and satisfactory implementation consistent with the objective of the task.

• Acceptable technical reporting tackling the key aspects.

Third (40-49): The assignment demonstrates:

• Basic understanding of MapReduce and basic level of implementation of the task.

• Basic standard of technical reporting, with some notable shortcomings.

Fail (<40): The coursework fails to demonstrate sufficient understanding of the MapReduce paradigm and fails to provide reporting even to a basic standard.


Marking Scheme and feedback template for Task B (Cloud Computing Task)

• Total marks for Task B will be normalised for 25% credit towards the overall coursework assessment.

MapReduce Concepts

Concept                            Example                                    Max.
Map Phase                          Inputs and Outputs                         5
Reduce Phase                       Inputs and Outputs                         5
Segmentation of Roles              Split of work                              2
File Handling                      Use of Files and Buffers                   3
Distributed parallelism            Advantages, fault tolerance etc.           3
Explanation of additional process  Combining/Shuffling/partitioning etc.      1
Flowchart                          Illustration of MapReduce problem solving  1
Total                                                                         20

Software Prototyping

Concept            Example                             Max.
Project Structure  Object-Orientation/class hierarchy  7
Code Re-usability  Generics, Templating                7
Solution Elegance  Design Optimality                   6
Total                                                  20

Implementation

Aspect               Detail               Max.
Task Implementation  Key/Value Selection  6
                     Correct Result       4
                     Output Format        4
Parallelisation      Multi-threading      6
Total                                     20

Documentation

Aspect            Detail                                        Max.
Report Structure  Abstract, Sections, Length, References, etc.  2
Section Content   Description of development                    4
                  Evidence of use of Version Control            5
                  Evidence of understanding MapReduce           7
                  Conclusions                                   5
Report Quality    Overall Quality of Report                     5
Code Commenting   Use of comments in code                       12
Total                                                           40


Coursework description: Analysis of big data solution architecture

Implement and evaluate the big data solution to be provided to a customer who needs to modernise their system.

Your task:

Implement a solution to predict flight delays based on historical weather and airline data.

Big data needs to be processed in a distributed manner. Azure Synapse Analytics in Azure Machine Learning (ML) provides a platform for data pre-processing, featurization, training and deployment, and can connect to Spark pools in Azure Synapse Analytics. PySpark helps to pre-process the data interactively. This environment provides powerful big data analytics tools such as Data Factory, Databricks and Power BI, which you will need to use in developing a solution for this coursework, using the data for the case study available on Azure Synapse Case Study as described below.

Please download the case study from the Big Data assignment guide in the assessment area:

Blackboard → Enrolments → CSMBD21-23-4MOD: Big Data and Cloud Computing (2023/24)

In the Assessment tab, select as follows:

Assessment → Big Data → Case Study

and develop your solution accordingly.

Assignment Case Study

Margie's Travel (MT) provides concierge services for business travellers. They need to modernise their system and want to focus on a web app for their customer service agents, who provide flight booking information to the travellers. This could, for example, include features such as a prediction of a flight delay of 15 minutes or longer due to weather conditions.
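
The 15-minute threshold suggests treating the prediction as a binary classification target. The following is only a minimal sketch of how such a label might be derived; the field name `dep_delay_minutes` is a hypothetical placeholder and the case-study data may name or structure this value differently.

```python
from typing import Optional

DELAY_THRESHOLD_MINUTES = 15  # threshold stated in the case study text


def delay_label(dep_delay_minutes: Optional[float]) -> Optional[int]:
    """Return 1 if the delay is 15 minutes or longer, else 0.

    `dep_delay_minutes` is a hypothetical field name. Missing values are
    passed through as None so they can be handled in a separate step.
    """
    if dep_delay_minutes is None:
        return None
    return 1 if dep_delay_minutes >= DELAY_THRESHOLD_MINUTES else 0


# Illustrative values only, not taken from the case-study data.
labels = [delay_label(d) for d in (0, 14.9, 15, 42, None)]
print(labels)
```

Keeping the threshold in a named constant makes it easy to state in the report which definition of "delayed" the chosen model was trained against.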

You are expected to analyse the design of solution-I to predict the flight delay by processing the data provided.

Your solutions will need to be responsive to the customers’ needs as specified.

Your report is to describe your progress on the objectives of the task, including the aspects set out below:

1. A brief description of the background of the case study;

2. A description of the implementation of the solution supported by your free Azure account, including tools and techniques deployed such as Data Factory, Databricks, and Power BI;

3. The reason for your preferred solution.

By clicking on the link below, you can take the first step to create your free MS Azure account, and you should be able to complete all the steps for Task A using the student’s free $100 allowance. Please keep track of your usage so that it does not exceed the free allowance limit, as you will be liable for any excess charges.

Azure for Students – Free Account Credit | Microsoft Azure

With Microsoft Azure for Students, get a $100 credit when you create your free account. No credit card is needed, and the account includes 12 months of free Azure services.

Please register for this free account through https://azureforeducation.microsoft.com/devtools. Registering this way means you will not be asked for a credit card number, so you will not risk being billed later; there is no need to incur charges when using MS Azure under this scheme. Please follow this link and register correctly so that you do not risk incurring charges for which you will be liable.


Assignment Case Study Description and Data Access Details for Task B

For this coursework there are two files containing lists of data. These are located on the Blackboard system in the Big Data and Cloud Computing assignments directory; download them from:

Blackboard → Enrolments → CSMBD21-23-4MOD: Big Data and Cloud Computing (2023/24)

In the Assessment tab, select as follows:

Assessment → Cloud Computing → Coursework Data

The coursework data folder includes the files:

AComp_Passenger_data_no_error.csv

Top30_airports_LatLong.csv

The first data file contains details of passengers who have flown between airports over a certain period. The data is in a comma-delimited text file, one line per record, using the following format:

Passenger id                       Format: 10 characters, each X or n
Flight id                          Format: 8 characters, each X or n
From airport IATA/FAA code         Format: XXX
Destination airport IATA/FAA code  Format: XXX
Departure time (GMT)               Format: n[10] (Unix ‘epoch’ time)
Total flight time (mins)           Format: n[1..4]

Where X is an uppercase ASCII character, n is a digit 0..9, and [a..b] is the min/max range of the number of digits/characters in a string.
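
A record in this layout can be validated with a regular expression before it enters the map phase. The sketch below is illustrative only: it assumes six comma-separated fields in the order listed, 10-character passenger ids and 8-character flight ids made of uppercase letters and digits, and three-letter airport codes. The names `FlightRecord` and `parse_record` are hypothetical, as is the sample line.

```python
import re
from typing import NamedTuple, Optional


class FlightRecord(NamedTuple):
    passenger_id: str
    flight_id: str
    origin: str
    destination: str
    departure_epoch: int   # seconds since the Unix epoch (GMT)
    flight_minutes: int


# Assumed field patterns: only the lengths and character classes are checked,
# not the exact position of letters and digits inside the ids.
RECORD_PATTERN = re.compile(
    r"([A-Z0-9]{10}),([A-Z0-9]{8}),([A-Z]{3}),([A-Z]{3}),(\d{10}),(\d{1,4})"
)


def parse_record(line: str) -> Optional[FlightRecord]:
    """Return a FlightRecord, or None if the line does not match the layout."""
    match = RECORD_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    pid, fid, src, dst, dep, mins = match.groups()
    return FlightRecord(pid, fid, src, dst, int(dep), int(mins))


# Hypothetical sample lines, not taken from the coursework data.
good = parse_record("ABC1234DE5,XYZ1234A,LHR,JFK,1420564460,420")
bad = parse_record("abc,XYZ1234A,LHR,JFK,1420564460,420")
print(good, bad)
```

Returning `None` for malformed lines lets the caller decide whether to skip, log, or attempt to repair them, which matters for the input file that contains errors.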

The second data file is a list of airport data comprising the name, IATA/FAA code, and location of the airport. The data is in a comma-delimited text file, one line per record, using the following format (X is an uppercase ASCII character and n is a digit 0..9):

Airport Name           Format: X[3..20]
Airport IATA/FAA code  Format: XXX
Latitude               Format: n.n[3..13]
Longitude              Format: n.n[3..13]
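
The airport file can be loaded into a lookup table keyed by airport code. This is a rough sketch, assuming four comma-separated fields in the order listed above; `load_airports` is a hypothetical helper and the sample rows are invented for illustration.

```python
import csv
from typing import Dict, Iterable, Tuple

Airport = Tuple[str, float, float]  # (name, latitude, longitude)


def load_airports(lines: Iterable[str]) -> Dict[str, Airport]:
    """Map each airport code to its name and coordinates."""
    airports: Dict[str, Airport] = {}
    for row in csv.reader(lines):
        if len(row) != 4:
            continue  # skip blank or malformed rows
        name, code, lat, lon = (field.strip() for field in row)
        try:
            airports[code] = (name, float(lat), float(lon))
        except ValueError:
            continue  # skip rows whose coordinates are not numeric
    return airports


# Hypothetical sample rows, not taken from the coursework data.
sample = ["LONDON,LHR,51.4775,-0.461389", "", "bad,row", "X,YYY,north,east"]
airports = load_airports(sample)
print(airports)
```

A dictionary keyed by code makes it cheap to check that the origin and destination codes in each passenger record refer to a known airport.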

There are two additional data input files, which can be used for analysis and validation but should not be used for the final execution of the implemented jobs. These can be downloaded from the same directory and are as follows:

AComp_Passenger_data.csv

AComp_Passenger_data_no_error_DateTime.csv

Your Task:

Determine the passenger(s) having had the highest number of flights.

For this task in the development process, develop a MapReduce-like executable prototype (in Java, C, C++, or Python). The objective is to develop the basic functional ‘building-blocks’ that will address the Task above, in a way that emulates the MapReduce/Hadoop framework.
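
One way to picture these building blocks is as three small functions for the map, shuffle, and reduce steps. The sketch below is one possible single-process arrangement, not a prescribed design. It assumes the passenger id and flight id are the first two of six comma-separated fields, and it counts distinct flight ids per passenger, which is an assumption about how repeated records should be treated. All function names and sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Mapper: emit one (passenger_id, flight_id) pair per well-formed record."""
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) == 6 and fields[0] and fields[1]:
            yield fields[0], fields[1]


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle: group all emitted values under their key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(passenger_id: str, flight_ids: List[str]) -> Tuple[str, int]:
    """Reducer: count distinct flights (assumption: repeated records count once)."""
    return passenger_id, len(set(flight_ids))


def top_passengers(lines: Iterable[str]) -> Tuple[List[str], int]:
    """Run map, shuffle, and reduce, then keep every passenger tied for the maximum."""
    grouped = shuffle(map_phase(lines))
    counts = dict(reduce_phase(key, values) for key, values in grouped.items())
    if not counts:
        return [], 0
    best = max(counts.values())
    return sorted(p for p, c in counts.items() if c == best), best


# Hypothetical sample records, not taken from the coursework data.
records = [
    "ABC1234DE5,XYZ1234A,LHR,JFK,1420564460,420",
    "ABC1234DE5,QRS5678B,JFK,LHR,1420650860,410",
    "FGH5678IJ9,XYZ1234A,LHR,JFK,1420564460,420",
    "FGH5678IJ9,TUV9012C,JFK,ORD,1420737260,150",
    "KLM9012NO3,XYZ1234A,LHR,JFK,1420564460,420",
    "ABC1234DE5,XYZ1234A,LHR,JFK,1420564460,420",  # repeated record
    "malformed line",
]
winners, flights = top_passengers(records)
print(winners, flights)
```

Returning a list of winners rather than a single id covers the "passenger(s)" wording of the task, since several passengers may tie for the highest count.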

The solution may use multi-threading as required. The marking scheme reflects the appropriate use of coding techniques, succinct code comments as required, data structures and overall program design. The code should be subject to version control best-practices using a hosted repository under your university username.
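
As an illustration of how multi-threading might fit in, the sketch below splits the input into chunks, runs a mapper with a local combiner on each chunk in a thread pool, and merges the partial counts. It assumes the passenger id is the first of six comma-separated fields and counts records without de-duplicating flight ids. The helper names and sample records are hypothetical. In CPython the global interpreter lock limits the speed-up for CPU-bound work, so this shows the structure rather than a performance gain.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def count_chunk(lines: Sequence[str]) -> Counter:
    """Mapper plus combiner: count records per passenger within one chunk."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) == 6 and fields[0]:
            counts[fields[0]] += 1
    return counts


def split_chunks(lines: Sequence[str], parts: int) -> List[Sequence[str]]:
    """Split the input into at most `parts` contiguous chunks."""
    size = max(1, -(-len(lines) // parts))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def parallel_count(lines: Sequence[str], workers: int = 4) -> Counter:
    """Run count_chunk on each chunk in a thread pool and merge the results."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_chunk, split_chunks(lines, workers)):
            total.update(partial)  # Counter.update adds counts together
    return total


# Hypothetical sample records, not taken from the coursework data.
records = [
    "ABC1234DE5,XYZ1234A,LHR,JFK,1420564460,420",
    "FGH5678IJ9,XYZ1234A,LHR,JFK,1420564460,420",
    "ABC1234DE5,QRS5678B,JFK,LHR,1420650860,410",
    "not a record",
    "ABC1234DE5,TUV9012C,LHR,ORD,1420737260,510",
]
totals = parallel_count(records, workers=2)
print(totals.most_common(1))
```

Merging per-chunk counters in the main thread avoids shared mutable state between workers, so no locks are needed.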


Write a brief report (no more than 3 pages, excluding any appendices), describing:

• The high-level description of the development of the prototype software;

• A simple description of the version control processes undertaken;

• A detailed description of the MapReduce functions implemented;

• The output format of any reports that the job is to produce.

