ECS32A Makeup Assignment (HW7 in Piazza)
This assignment is optional. It can improve your grade if you received less than full credit on any of your programming assignments. If completed, it will replace your lowest assignment grade.
KODETHON submission begins Monday at 3pm
The Popularity of Baby Names
In this project we will graph data about American baby names taken from the Social Security Administration website. We will read the data, store it in a dictionary-of-lists data structure, and then let the user type in names. If a name appears in our data, we will draw a graph, like the one below, showing how the popularity of the name has evolved since 1880. The graph below plots the frequency of babies named Karl by year.
As a side note, this plot is very similar to the plots that can be obtained from Google's Ngram Viewer, except that this data set is much more manageable.
Input File
The primary data file countsByName.csv contains 97,310 names compiled from Social Security card applications after 1879. [If you are having memory issues, you can also work with the smaller database countsByName.test.csv.] It was downloaded and reformatted from the Social Security Administration names database file (here). The first line describes the columns. The leftmost column is the name. It is followed by the number of applications with that name for each year from 1880 to 2017.
Here are the first 9 lines from the file countsByName.csv (only the columns through 1893 are shown):
Name,1880,1881,1882,1883,1884,1885,1886,1887,1888,1889,1890,1891,1892,1893
Mary,7092,6948,8178,8044,9253,9166,9921,9935,11804,11689,12113,11742,13222,12839
Anna,2616,2714,3143,3322,3880,4014,4298,4240,5008,5085,5253,5114,5562,5712
Emma,2013,2043,2310,2374,2596,2742,2775,2661,3104,2894,2996,2897,3140,2982
Elizabeth,1948,1852,2193,2268,2565,2591,2691,2695,3236,3074,3124,3065,3469,3372
Minnie,1755,1661,2014,2049,2243,2184,2380,2226,2668,2637,2666,2440,2617,2528
Margaret,1578,1667,1828,1894,2152,2215,2283,2432,2914,2930,3115,3077,3447,3579
Ida,1480,1444,1678,1639,1890,1860,2057,1938,2242,2130,2188,2002,2269,2256
Alice,1414,1315,1542,1494,1732,1690,1821,1829,2207,2154,2281,2024,2381,2445
Normalizing your counts. To account for the fact that the population of the United States has been increasing, it makes sense to plot the percentage of applications instead of the total number of applications for a specific name. To get the percentage of applications for a specific name and year, divide the count for that name by the total number of applications that year, then multiply by 100. You have the counts by year from the file above. We are also providing the totals by year in a separate file named totalsByYear.csv:
Year,Total
1880,201484
1881,192696
1882,221533
…
2014,3696311
2015,3688687
2016,3652968
2017,3546301
Hint: The first line can be ignored by your program. Your program may assume that the year range always runs from 1880 to 2017 when creating the x-axis for your plots.
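As a worked example of the normalization, using the first data rows of the two files shown above: here n stands for a name and y for a year (notation introduced only for this illustration). The arithmetic was checked by hand against the sample rows.

```latex
\text{percentage}(n, y) = \frac{\text{count}(n, y)}{\text{total}(y)} \times 100
\qquad
\text{percentage}(\text{Mary}, 1880) = \frac{7092}{201484} \times 100 \approx 3.5199\%
```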
Graphing module
This assignment includes an introduction to a simple graphing module that prints its results as text to the screen. We will be using the plainchart.py module, which is not covered in your primary textbook. Download the module from Canvas and put it in the same directory as your Python code. You only need to know three lines of code to use the plainchart.py module:
# import the plainchart module at the top of your program
import plainchart
# create and print a text chart that is 25 lines high
# numList is a list of integer counts by year or
# a list of floating point percentages by year
chart = plainchart.PlainChart(numList, height=25)
print(chart.render())
Your objective is to create a list of numbers, numList, to give to the charting module. The module works with both integers and floating-point values, and regardless of the range of the values, it automatically scales the chart to the requested height of 25 lines.
Part 1 (40%) Reading the Database and Getting User Input
Get the file countsByName.csv and place it in the same directory from which you run your script. [If you are having memory issues, you can also work with the smaller database countsByName.test.csv.]
You will begin by writing a program that reads in the file countsByName.csv and populates a dictionary that maps each person's name (column 1 in countsByName.csv) to a list of counts by year (columns 2, 3, 4, 5, ... in countsByName.csv). The counts should be stored as a list of integers, not strings.
You will then write a loop that keeps asking the user to enter a name until the name is found in the dictionary. When the name is found, print “Found” followed by the name, and then print the list of counts. Do not assume that the user entered the name in the same case as used in the file; instead, use string methods to convert the user input to the correct case.
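A minimal sketch of the parsing step, assuming every line is plain comma-separated text with no quoted fields (true of the sample lines shown above). The names build_counts and load_counts are hypothetical helpers, not part of the assignment; build_counts accepts any iterable of lines so it can be tried without the CSV file.

```python
def build_counts(lines):
    """Map each name to a list of integer counts, one per year from 1880 on."""
    counts = {}
    lines = iter(lines)
    next(lines)  # skip the header line: Name,1880,1881,...
    for line in lines:
        fields = line.strip().split(",")
        if not fields[0]:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        name = fields.pop(0)  # remove the name so only the counts remain
        counts[name] = [int(field) for field in fields]  # strings -> integers
    return counts


def load_counts(filename):
    """Read a counts file such as countsByName.csv from disk."""
    with open(filename) as infile:
        return build_counts(infile)
```

With the full file, counts = load_counts("countsByName.csv") followed by counts["Mary"][:3] should give the first three values of the Mary row shown above.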
The exact format for the prompt and output is shown here:
What's your name?chipmunk
What's your name?xylophone
What's your name?porschE
Found Porsche
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 10, 14, 9, 7, 19, 48, 64, 51, 45, 46, 57, 66, 82, 127, 182, 172, 159, 179, 123, 123, 70, 68, 45, 30, 35, 27, 12, 25, 32, 15, 20, 9, 15, 18, 17, 16, 11, 9, 14, 9, 5, 0, 13, 0, 5]
Strategy hints:
When looking up a name, use string methods to convert whatever the user entered to title case (see the example above).
You will need to remove the name from the list of fields before storing the counts in the dictionary. The list method pop can be used to remove elements from lists.
The split method will give you a list of counts as strings, but you will need to convert them to integers before storing them in the dictionary.
The format for the printed list is exactly what you get if you pass a list to the print function (e.g. print(counts)). We will improve on this output in the next part.
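The lookup loop can be sketched as follows. The function name ask_for_name and its ask parameter are hypothetical; ask defaults to the built-in input and exists only so the loop can be exercised without typing. str.title() is what turns "porschE" into "Porsche".

```python
def ask_for_name(counts, ask=input):
    """Prompt until the title-cased reply is a key in counts; return that key."""
    while True:
        # Same prompt text as the sample run, with no space after the "?".
        name = ask("What's your name?").title()  # e.g. "porschE" -> "Porsche"
        if name in counts:
            return name
```

In the full program, name = ask_for_name(counts) followed by print("Found", name) and print(counts[name]) produces output in the format of the sample run above.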
Submit this to Kodethon as Baby_Names_Pt1.py
Part 2 (30%) Plotting the output
Download the plainchart.py module and place it in the directory where you are running your script.
Modify your program so that it prints the maximum count and then, instead of printing a list of integers, uses plainchart to plot the values.
Plainchart does not provide axes. The y-axis is provided by printing the maximum count. You can write a loop that uses string methods to create the x-axis: every 10 years (from 1880 to 2010), draw a bar “|”, followed by the year, followed by 5 spaces (for a total of 10 characters per decade).
Strategy hints:
If you can’t figure out the x-axis loop you may also do it with a single print statement.
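One way to write the x-axis loop, as a minimal sketch; the function name x_axis and its parameters are hypothetical. Each tick is a bar, the four-digit year, and five spaces, so the 14 ticks from 1880 to 2010 make a 140-character line.

```python
def x_axis(first=1880, last=2010, step=10):
    """Build the x-axis label: '|' + year + 5 spaces, every 10 years."""
    axis = ""
    for year in range(first, last + 1, step):
        axis += "|" + str(year) + " " * 5  # 1 + 4 + 5 = 10 characters per tick
    return axis
```

Printing x_axis() below the chart gives the x-axis. For the y-axis, the built-in max(counts) gives the maximum count to print above the chart; the exact wording of that line is not specified here.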
Submit this to Kodethon as Baby_Names_Pt2.py
Part 3 (30%) Normalizing
Get the file totalsByYear.csv and place it in the same directory from which you run your script.
Read in the file to get the total number of people in the database for each year. Note that this number increases over time, so our plots will be a more accurate representation of a name’s popularity if we plot the percentage over time. For each year, compute the percentage (count / total for that year) * 100 before plotting it with plainchart. Also change the y-axis to show the maximum popularity as a percentage.
Strategy hints:
The popularity is a percentage formatted to four decimal places. You will need to multiply the year's frequency (count / total count) by 100 to get a percentage. Use the format method to format the floating-point value for printing.
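A sketch of the normalization, assuming totalsByYear.csv lists every year from 1880 to 2017 in order, so that the i-th total lines up with the i-th count. build_totals and to_percentages are hypothetical helper names.

```python
def build_totals(lines):
    """Return the yearly totals from totalsByYear.csv as a list of ints."""
    totals = []
    lines = iter(lines)
    next(lines)  # skip the header line: Year,Total
    for line in lines:
        line = line.strip()
        if line:  # ignore blank lines
            _, total = line.split(",")  # the year itself is implied by position
            totals.append(int(total))
    return totals


def to_percentages(counts, totals):
    """Convert yearly counts to percentages of that year's total applications."""
    return [count / total * 100 for count, total in zip(counts, totals)]
```

After pcts = to_percentages(counts[name], totals), the y-axis value can be printed with "{:.4f}".format(max(pcts)). A dictionary keyed by year would avoid relying on the row order, but the hint that the range is always 1880 to 2017 makes a plain list sufficient.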
Submit this to Kodethon as Baby_Names_Pt3.py