联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> Database作业Database作业

日期:2019-11-12 10:54

JIT104 Building IT Systems

Fall, 2019

CRICOS 00213J Page 1

Take-Home Task 2: Popularity Clouds

(Weight: 30%. Due 8 Friday November 2019)

Overview

This task requires you to use

several of the technologies

introduced in this unit

(Python), database queries

(SQLite) and web

development (HTML).

Goal

This task involves writing a

Python program which

interrogates an SQLite

database and generates HTML documents as its output. It is thus an excellent example of the

main theme of this unit, i.e., the way different “computer languages” can be used together to

build new IT systems.

The aim of this task is to develop a program which creates HTML documents that visualise

movie actors’ popularity amongst their fans, according to a survey of movie customers stored

in a database. More specifically, given a particular category of movie fans, for example,

females or customers whose age is between 30 to 40, we want to know how popular the

actors stored in the movie database are with this group. The required output is a set of HTML

pages, one per customer group. Each page must contain a visual representation of the actors’

popularity in which the more popular actors names are displayed in a bigger font.

The Database

An SQLite database backup file or ‘dump’ called movie_survey.sql containing the data

to use for this task has been supplied. It contains results from a survey taken by Microsoft

employees about their favourite actors and movies. In order to complete this task, you need

to first recreate this database by importing the supplied script in the SQLite DB Browser.

There are four tables in this database: actors, customers, actors_movies and

favorite_actors. Among them, two tables are needed for this task, customers and

favorite_actors. The definition of these two tables is explained below.

Table customers

Table customers provides information about movie patrons or customers. Each row in the

table consists of seven fields as illustrated below. Field customerID is the identifier of a

customer and is the primary key. Fields two to seven provide the customer’s personal

information including age, education level, gender, marital status, number of children and

number of cars. The fields are defined by the schema below:

JIT104 Building IT Systems

Fall, 2019

CRICOS 00213J Page 2

customerID INT(6),

age INT(11),

education_level VARCHAR(30),

gender VARCHAR(10),

marital_status VARCHAR(20,

number_of_cars INT(11),

number_of_children INT(11),

PRIMARY KEY (customerID)

Table favorite_actors

Table favorite_actors provides information about movie fans’ favourite actors. Each

row in the table consists of three fields as illustrated below. ID is the identifier of this record,

customerID is the identifier of a customer, and actor is the name of an actor who is one

of this fan’s favourites. The fields are defined by the schema below:

ID INT(6),

customerID INT(6),

actor VARCHAR(100),

PRIMARY KEY (ID)

JIT104 Building IT Systems

Fall, 2019

CRICOS 00213J Page 3

Notice that the format of actor names in this database is “given_name surname” as shown in

the tables above. When you order and display the names as required in this task, you can

treat each name “given_name surname” as one string without separately dealing with given

names and surnames. (This means that the names will be sorted on the actors’ given names.)

General Requirements

The aim of this task is to visually show the popularity of actors among specific categories of

movie fans. You are required to develop a Python program that accesses the SQL database

described above to generate HTML documents. The user of your program specifies:

? How many results to display; and

? Which categories of movie fans’ opinions to display.

The result is a collection of hyperlinked HTML documents, one per category of movie

customer. The different categories of customer that the user can specify are:

? Male fans;

? Female fans;

? Fans in a certain age range, e.g., 30 to 40, inclusive; and

? All fans.

An example of one of the HTML documents is shown in Figure 1. It shows the popularity of

actors with movie customers with an age between 30 and 40, inclusive. (This page is one of

those to be produced by Test 2 in the provided template file popularity.py.) The

popularity of an actor is defined by the number of customers who like this actor.

The idea is that more popular actors are displayed in larger fonts. For instance, Tom Hanks is

more popular than Sandra Bullock because Tom Hanks has 420 fans in this age range, while

JIT104 Building IT Systems

Fall, 2019

CRICOS 00213J Page 4

Sandra Bullock has only 130. In Figure 1, the font used for Tom Hanks is obviously larger than

that of Sandra Bullock. Each actor’s name is also followed by the number of movie customers

who like this actor, in a smaller, unintrusive font. For instance, in Figure 1, Tom Hanks is

associated with number 420 because he is a favourite actor to 420 customers.

In this task, we also use different colours to display actors whose popularities are different.

Actors with the same popularity are displayed in the same font size and also the same colour.

For example, Dustin Hoffman and Nicole Kidman in the example below have the same font

size and coloursince both of them are popular with 88 fans. Actors with different popularities

are shown in a different font size and also a different colour, e.g., Tom Hanks and Sandra

Bullock in the example below have a different font size and also a different colour. (A possible

refinement of the task is to relate the colours, so that more popular actors get a stronger

colour, but you are not required to do so.)

Figure 1: Actors popularity (top 50) with fans aged between 30 and 40 (Test_02, p2)

Your job is to develop a Python program which generates such pages using the data in the

SQL database movie_survey. As illustrated in the Python template file accompanying

these instructions, you must define a function called show_popularity which, given a list

of customer categories, produces HTML files that can be displayed in a web browser, one per

customer category.

JIT104 Building IT Systems

Fall, 2019

CRICOS 00213J Page 5

Another feature of the document shown in Figure 1 is that on the top of the page you must

display the category of the customers, e.g., aged “30-40” in this case, and the number of

customers in this group, e.g., 1549 above.

In addition, your program must hyperlink each of the pages together, to make it easy for the

user to search through them. In Figure 1 above we can see hyperlinks at the bottom of the

page to the previous and next categories in the list. However, no ‘previous’ link is produced

for the first category in a user’s list and no ‘next’ link is produced for the last page in a list.

For example, if the given list of customer groups is [‘Female’, ‘Male’, ‘30-40’] then your

function must generate three HTML pages, one for female customers, one for male

customers, and one for customers aged between 30 and 40, inclusive. In the male movie

customers’ page, the ‘previous’ link and the ‘next’ link should take us to the female customers’

page and the page for customers aged between 30 and 40, respectively. However, for the

female customers’ page, there is no ‘previous’ link, and for the page of customers aged

between 30 and 40 there is no ‘next’ link.

Figures 2 to 4 below show the expected output when we ask for the top 20 actors preferred

by fans in the category list [‘Female’, ‘Male’, ‘30-40’]. (These three pages are the output

expected for Test 1 in the template file popularity.py.) In each case the relative

popularity of actors in each of the specific customer categories is shown clearly, and each of

the three pages is hyperlinked together in the order of the list of customer categories.

Figure 2: Actors popularity (top 20) for female fans (Test_01, p1)

JIT104 Building IT Systems

Fall, 2019

CRICOS 00213J Page 6

Figure 3: Actors popularity (top 20) for male fans (Test_01, p2)

Figure 4: Actors popularity (top 20) for customers aged between 30 and 40 (Test_01, p3)

JIT104 Building IT Systems

Fall, 2019

CRICOS 00213J Page 7

Specific Tasks

The specific things you must do to complete this portfolio task are as follows. NB: This is a

large task. You should modularise your solution into several functions to make your program

code manageable and readable. You must use the supplied template file popularity.py

as a starting point.

1. Restore the movie_survey database using the supplied dump file

movie_survey.sql

2. Write a Python function named show_popularity that has three parameters: (a) a list

of strings representing a list of customer groups, (b) an integer value representing how

many actors are to be displayed on the page, and (c) a string which gives a name to this

particular multi-page document.

For example, consider the following function call:

show_popularity(['20-40', '40-80', 'All'], 30, 'Test03')

The first argument (a list)

represents the customer

categories required and this

example indicates that three

pages are to be created: one for

customers aged 20 to 40,

inclusive, one for customers

aged 40 to 80 inclusive, and one

for all customers. (Other valid

customer categories are

'Female' and 'Male'.)

The second parameter is an

integer value which specifies

how many actors’ names are to

be displayed on each page.

Those displayed must be the

most popular actors for that

particular customer group.

(Hint: you will want the results

set returned from the database

query to be ordered by

popularity.)

The third parameter is simply a name used to

uniquely identify this collection of HTML

pages. You should use it together with the

customer category to create the name for the

HTML files generated. In this particular

example your program must produce the

following three HTML files:

Test03_20-40.html

Test03_40-80.html

Test03_All.html

JIT104 Building IT Systems

Fall, 2019

CRICOS 00213J Page 8

Actors must be displayed in alphabetical order regardless of their popularity. As

mentioned before, each actor’s name in the database consists of a given name and a

surname in the format of “given_name surname”, so just sort them as a single string. For

example, in Figure 1, Al Pacino, Anthony Hopkins, and Arnold Schwarzenegger are

displayed in that order, even though Arnold Schwarzenegger has a higher popularity than

Anthony Hopkins. Each actor’s name must be followed with the actual number of fans

derived from the database. Ensure that the numbers produced by your database query

match those in the figures above.

At the bottom of each page, you must have hyperlinks to the previous page and next page

in the customer category list, if any.

3. Some queries will produce no results at all. In this case you should still produce an HTML

page, even if there are no actors listed. An example is shown in Figure 6 below (which is

one of the pages produced by Test 10 in the provided template file).

Figure 5: A page with no results returned (Test_10, p2)

4. As noted above, the valid customer categories are 'Female', 'Male', 'All' and an age range

e.g. '30-50'. If any other category is found in the list provided as the first parameter to

show_popularity, a message should be printed noting the error and this item should

then be ignored. Pages must still be produced for all valid cases in the list, however. See

the unit tests in the provided template file for examples.

Development hints

? Before you can begin this task you must ensure that you have access to SQLite DB

Browser so that you can create the database. You must also have access to the

necessary SQLite-Python module so that you can call SQLite functions from Python

code.

? Given a movie customer category you will need to execute two queries on the

database, one to find out how many customers there are in this category and one to

get the list of actors and their popularity. The first query is relatively easy and involves

the customers table only. The second involves a join of both the customers and

favorite_actors tables. It is recommended that you develop and debug your

database queries in the DB Browser before attempting to incorporate them into your

Python program. Save your queries to .sql files so that you can refer back to them

later if you need to.

JIT104 Building IT Systems

Fall, 2019

CRICOS 00213J Page 9

? Use the screenshots shown above as a guide, but you do not need to duplicate the

precise choices of font sizes and colours. Importantly, however, your SQL queries

must return exactly the same number of fans as shown in the examples above.

? This is a large program, so it is suggested that you tackle it incrementally. Before you

even begin coding you should use the DB Browser to determine exactly what query

your Python program needs to execute. Then develop code to generate just one HTML

page. Finally, write the code to iterate over a list of customer groups to generate all

documents.

? You need to work out how to define the font size and colour for the text to be

displayed on web pages. This can be done by changing the font’s ‘style’. Refer to the

lecture demonstrations and workshop exercises for how to do this.

Deliverables

You should develop your solution by completing and submitting the provided template file,

popularity.py as follows.

1. This file must include a function show_popularity which has all the

characteristics defined above. When executed in the presence of an appropriately

configured SQLite database, this program must produce the required HTML files.

2. Complete the “statement of authorship” at the beginning of the Python file to confirm

that this is your own individual work by inserting your name and student number in

the places indicated. I will assume that submissions without a completed statement

are not your own work and they will not be marked.

You are not to submit any SQL database scripts or dumps. The supplied SQL database, and

possibly updates of this database, will be used to test your software. Your program must

therefore not be hard-coded in any way, as it will be required work with any similarly

structured database.

Requirements and marking guide

You are required to extend the provided template by completing the show_popularity

function (and adding other functions as necessary) so that it can generate the required HTML

pages from data stored in the database. Your code must work for all the supplied datasets in

the template file and any other data set in the same format.

Your solution must pass all the supplied test cases in the template file, and work with any

other database of the same format. In particular, your solution must have the following

features:

? SQL queries (4%). The SQL queries must be well formed, and produce accurate results

from the database for any given category value.

JIT104 Building IT Systems

Fall, 2019

CRICOS 00213J Page 10

? HTML pages (2%). Produces the required number of HTML pages according to the

supplied first parameter (ie the list of categories).

? Top Actors (2%). Each HTML page includes only the number of most popular actors,

according to the value supplied in the second parameter.

? HTML contents (10%). Each HTML page is to include:

o an appropriate title (in the browser tab)

o a heading including the number of actors listed

o a sub-heading including:

? the type of customer group; and

? the number of customers in that group.

o hyperlinks to previous and next pages (if any)

? HTML body (10%). the body of the HTML page must display the most popular actors

in alphabetic order, with:

o the size of the actor's name reflecting the popularity (ie more popular actors

displayed in a bigger font);

o actors with the same popularity displayed in the same sized font;

o a different font colour for difficult popularities; and

o actors with the same poparlity displayed in the same font colour;

o the popularity count of each actor, following that actor's name.

o the fonts and colours used in the HTML must be readable.

? Python code quality and presentation (2%). Your program code must be presented

in a professional manner. See the coding guidelines in the JIT104 Code Marking Guide

for suggestions on how to achieve this. All Python and HTML code must be easy to

read and understand, thanks to:

o clear uncluttered layout;

o concise English comments, explaining each significant code segment's purpose;

o variable names that clarify their roles;

o avoidance of 'magic numnbers';

o code that avoides unnecessary duplication of code;

o appropriate use of functions to modularise the code;

o correct grammar and spelling.

If you are unable to solve the whole problem, submit whatever parts you can get working.

You will receive partial marks for incomplete solutions.

JIT104 Building IT Systems

Fall, 2019

CRICOS 00213J Page 11

Academic Integrity

This assignment is for individual assessment only. That means the work you submit must be

your own work and not the work of anyone else. You are not permitted to collaborate with

your peers on this assignment, other than to discuss high-level strategies. You are not

permitted to ask or commission someone else to write the assignment for you, or help you

write the assignment.

If you are in any doubt as to what is permitted and what is considered a breach of academic

integrity, please talk to one of the teaching staff as soon as possible.

Author: Colin Fidge (QUT)

Revised: Donna Teague (QUTIC 2019)


版权所有:编程辅导网 2018 All Rights Reserved 联系方式:QQ:99515681 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。