联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> Python编程Python编程

日期:2018-12-10 10:29

CS799 - Data-Driven Development with Python Fall 2018 Programming Assignment 6

1

Programming Assignment 6

Getting started

Review class handouts and examples, work on the reading and practice assignments posted on the course

schedule. This assignment is designed to practice data manipulation with Pandas and plotting with

Matplotlib.

Programming Project: Plotting worth: 20 points

Create plots visualizing movie data.

Data and program overview

In this assignment you will be working with data on movies and people’s ratings of these movies. The task

will be to create visualizations of the movie data.

The following data will be provided using csv files:

A table with movie information (IMDB.csv); we will call this the movie data.

A table documenting the movie data columns (IMDB-dataDict.csv).

The files are supplied in a zip file, which will create a data subfolder, when unpacked.

Your program should produce two plots when it is run. The plots should follow the specification provided

below. Note that creating the second plot will require your independent exploration of Matplotlib for

generating a bar graph.

Plot 1: Ratings by age group

This plot displays the average ratings by voters in the four age groups identified in the data for a set of

movies selected by the user. A sample is presented in the first figure in the sample interaction. The four age

groups are: voters under 18 (VotesU18 column), between 18 and 29 years old, inclusively (Votes1829 ),

between 30 and 44, inclusively (Votes3044 ), and voters 45 and older (Votes45A ).

To generate the plot, the user will first need to specify which movies data to include, as described below:

Step 1. The program should ask how many movies should be included. Assume the user will enter a valid

number between 1 and the total number of movies in the data file.

Step 2. The program should let the user pick movies by executing the following procedure for each

selection:

a. Ask the user to specify a title keyword and identify the movies that have the keyword in the title (the

search should be case insensitive).

b. If there is a single movie with the provided keyword (as shown in the interaction, while selecting the

movie for Plot 2), that movie becomes the selected movie.

Otherwise, if there is more than one matching movie, all matching titles should be displayed, along

with a 1-based number identifying each title (see interaction for Plot 1). The user should be invited to

select one of these titles by specifying a number (you may assume user input will be valid here). The

corresponding movie becomes the selected movie.

If there are no titles containing the specified keyword, the user should be asked to provide another

keyword and steps 2a and 2b must be repeated.

Step 3. Save the plot as plot1.jpg file using plt.savefig() function.

CS799 - Data-Driven Development with Python Fall 2018 Programming Assignment 6

2

Other Plot 1 Requirements: different movie data must be shown in different colors, displaying a marked

point for each of the four values, and a line, connecting the points. Axes must be clearly labeled and the

movie title and genre must be displayed near the first point of the respective graph (see figure). The plot

must include a dashed grid and a title. See the sample interaction for an illustration.

Plot 2: Percentage of raters within different gender-age groups

The movie data contains eight columns specifying the number of female and male voters within each of the

age groups identified in the description of Plot 1. We will refer to these columns as age-gender columns

(they are titled CVotesU18M, CVotesU18F and so on). Note: the total sum of values in the age-gender

columns does not equal the value supplied by the TotalVotes column; perhaps this is because the TotalVotes

count also includes people who did not identify their age or gender, and thus are not accounted for by the

age-gender columns. We will refer to the sum of values within the age-gender columns for each row as all

votes for the movie.

Plot 2 displays a bar graph, showing what percentage of all votes for a movie are contributed by the specific

age-gender group. As with Plot 1, the user must be allowed to select a movie, using the sample procedure as

specified in Step 2 of Plot 1 description, except run for a single movie. The plot must be saved by your

program as plot2.jpg.

Other Plot 2 Requirements: Bars for different genders must be shown in different colors, with the

corresponding percentage value displayed clearly on top of the bar. Axes must be clearly labeled as shown,

and the values along the x axis must be slanted by 30 degrees. Plot title must include the title of the movie.

Refer to the sample interaction for an illustration.

Required Functions

Include main(), function pickMovieWithKeyword() that will run the movie selection procedure described in

Step 2 of Plot 1 description. Functions plot1() and plot2() should each generate and save the respective

plot. Pick function parameters and return values as you see fit, and define other functions as needed.

General Requirements

You can assume that the provided file will have all of the columns involved in the required

computations, but the number of records and order of columns may be different.

Your program should have no code outside of function definitions, except for a single call to main()

and definitions described in the next bullet.

In order to make the code easier to modify for a different set of column names, define global variables

that store the names of columns and use them throughout, e.g. TITLE = 'Title'.

All file related operations must use device-independent handling of paths (use os.getcwd() and

os.path.join() functions to create paths, instead of hardcoding them).

Submission and Grading

Submit your code along with two image files that your program will generate for the input data contained

in the sample interaction. Grading will be based on the accuracy (conforming to all the requirements and

format of the interaction), generality of code and the appropriate use of pandas/numpy/matplotlib resources

(data structures and functions). Each plot will be worth 9 points and two points will be awarded for

programming style.

The sample interaction that follows demonstrates one full execution of the program with the generated

plots. Note that the location of the data file must be requested from the user first.

CS799 - Data-Driven Development with Python Fall 2018 Programming Assignment 6

3

Sample interaction

Please enter the name of the subfolder with the data file: data

Plot1: ratings by age group

How many of the 118 movies would you like to consider? 3

Select 3 movies

Enter movie keyword: game

Which of the following movies would you like to pick (enter number)

1 The Hunger Games: Catching Fire

2 The Imitation Game

enter a number: 1

Movie #1: 'The Hunger Games: Catching Fire'

Enter movie keyword: girl

Which of the following movies would you like to pick (enter number)

1 Gone Girl

2 Me and Earl and the Dying Girl

3 The Girl with the Dragon Tattoo

enter a number: 3

Movie #2: 'The Girl with the Dragon Tattoo'

Enter movie keyword: 12

Which of the following movies would you like to pick (enter number)

1 12 Years a Slave

2 127 Hours

3 Short Term 12

Enter a number: 1

Movie #3: '12 Years a Slave'

--------------------------------------------------------------------------------------------------------------------------------

Plot2: Percentage of raters within gender-age. Select a movie:

Enter movie keyword: argo

Movie #1: 'Argo'

Created by Tamara Babaian on December 1, 2018


版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp