Task 1: Basic Python Programming
Background
We are going to use ‘PyCharm’, an integrated development environment (IDE) for
Python. To install it on your own device, please follow the instructions at
the link: https://www.guru99.com/how-to-install-python.html
Install libraries
We need to install 3 Python libraries: pandas, matplotlib and plotnine, which
help with data analysis and visualization. Please click: File -> Settings -> Project:
data_lab -> Project interpreter, then click ‘+’ and type in the library you want to install.
While the first library is still being installed, you can already type in the second and the
third to install several packages in parallel.
The status bar shows ‘3 processes running’. You can click it to see the installation
progress, which may take a few minutes.
Task 1.1: A number-guessing game
Now we will use Python’s most common data types: integer, string, Boolean (True,
False), list and dictionary; the most common control structures: if-else and while; and
the most common operators: =, >, <, ==, to get familiar with the Python basics and
create an interesting guessing game.
https://www.dropbox.com/s/eh9puev65i5fk71/guess_num.py?dl=0
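As a rough illustration of how these data types, control structures and operators fit together, here is a minimal sketch of a guessing loop. It is not the code in guess_num.py: the function name play and the list of pre-typed guesses are assumptions, chosen so that the example runs without keyboard input.

```python
def play(secret, guesses):
    """Check each guess in turn; stop as soon as one equals the secret."""
    attempts = 0          # integer
    found = False         # Boolean
    hints = []            # list of strings
    while not found and attempts < len(guesses):
        guess = guesses[attempts]
        attempts = attempts + 1      # '=' assigns a new value
        if guess == secret:          # '==' compares two values
            found = True
            hints.append("correct")
        elif guess > secret:
            hints.append("too high")
        else:
            hints.append("too low")
    return found, attempts, hints


print(play(7, [5, 9, 7]))  # (True, 3, ['too low', 'too high', 'correct'])
```

In the real game the guesses would come from the keyboard rather than from a list.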
a. Copy the code, run it and play.
b. We need to add an access control mechanism to the game, as we only want
authorized people to play it.
We can add a simple username and password check using the dictionary data type:
passwordDictionary= {"geb":"2503", "ray":" ray110", "cuhksz":"666"}
Please check the code and play with the new feature.
https://www.dropbox.com/s/2zqwoytcfyb5kkh/guess_num_access_control.py?dl=0
Can you suggest how we could give the VIP account “cuhksz” an advantage
that makes the game easier for this account?
c. We need to add a play-limit mechanism, as some players get addicted to the
game. For example, the game should terminate once you have played it more than
3 times. Please screenshot your code implementation.
d. Search online and explain what the “try-except” structure does. Why does the
code use two of them? Do we really need two?
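Items b and d above rely on two building blocks: a dictionary lookup and try-except. The sketch below shows one possible form of each. The account names, the passwords and the function names check_login and parse_guess are hypothetical and do not come from the lab scripts.

```python
# Hypothetical accounts, for illustration only.
demo_accounts = {"alice": "1234", "bob": "abcd"}


def check_login(username, password, accounts):
    """Return True only if the user exists and the password matches."""
    if username in accounts:                    # is the key in the dictionary?
        return accounts[username] == password   # compare with the stored value
    return False


def parse_guess(text):
    """Convert typed text to an integer, or return None if it is not a number."""
    try:
        return int(text)    # raises ValueError for text such as "abc"
    except ValueError:
        return None         # handle the error instead of crashing


print(check_login("alice", "1234", demo_accounts))  # True
print(check_login("carol", "1234", demo_accounts))  # False
print(parse_guess("42"))                            # 42
print(parse_guess("abc"))                           # None
```

Without the try-except, typing a non-numeric guess would stop the program with an error message.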
Task 2: Analyze Big Data
Task 2.1: Load the data
We are studying the species, hindfoot length and weight of animals captured by
sensors. The data set is stored in ‘surveys.csv’. Please download it and save it in the
‘data_lab’ folder.
https://www.dropbox.com/s/u9oi69xk6mmwe0f/surveys.csv?dl=0
We will use the ‘read_csv’ function in the pandas library to load the file ‘surveys.csv’
into a variable, so that we can work with the data stored in it. Download
‘py_demo_part1.py’ and save it in the ‘data_lab’ folder.
https://www.dropbox.com/s/6mphlaqde441lrc/py_demo_part1.py?dl=0
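To see what ‘read_csv’ does without the downloaded file, the sketch below loads a few made-up rows from an in-memory string. The column names are assumed to match those of ‘surveys.csv’, and the values are invented for illustration. With the real file you would pass its file name, e.g. pd.read_csv("surveys.csv"), instead.

```python
import io

import pandas as pd

# Made-up rows; the column names are an assumption about surveys.csv.
sample_text = (
    "record_id,month,day,year,plot_id,species_id,sex,hindfoot_length,weight\n"
    "1,7,16,1977,2,NL,M,32.0,40.0\n"
    "2,7,16,1977,3,NL,M,33.0,41.0\n"
    "3,7,16,1977,2,DM,F,37.0,42.0\n"
)

# read_csv accepts a file name or a file-like object such as StringIO.
demo_df = pd.read_csv(io.StringIO(sample_text))

print(demo_df)        # the whole table
print(demo_df.shape)  # (3, 9): 3 rows, 9 columns
```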
a. Run the code. ‘print (surveys_df)’ prints the variable that holds all the data of
the ‘surveys.csv’ file. The output shows only the first and the last few lines of the file.
b. Now comment out ‘print (surveys_df)’, uncomment lines 5 and 6, and run
the code.
‘print (surveys_df.dtypes)’ gets the type of each column; its output ends with:
dtype: object
‘print (type(surveys_df))’ gets the output:
<class 'pandas.core.frame.DataFrame'>
This means the variable ‘surveys_df’ is a pandas DataFrame object. The
pandas documentation (https://pandas.pydata.org/pandas-docs/stable/reference/frame.html?highlight=dataframe)
describes a DataFrame as a “Two-dimensional
size-mutable, potentially heterogeneous tabular data structure”, which
means a DataFrame is similar to a spreadsheet in Microsoft Excel.
Note that you will encounter many attributes and functions when using a library (e.g.,
pandas). If we compare a library to a person, attributes are like this person’s
characteristics, while functions are like this person’s capabilities. Although
programmers try their best to give attributes and functions descriptive names so that users
can guess what they do at first glance, the best way to find out what they
exactly mean or do is to search online or check the documentation of that library (e.g.,
the above link is the DataFrame page of the pandas documentation). It lists all the
attributes and functions of DataFrame. For example, it explains
‘dtypes’, ‘head()’, ‘tail()’ and the other functions used in ‘py_demo_part1.py’. Please
uncomment and run the lines of the functions you are interested in, and check the
documentation if you have any doubts.
c. Find a suitable indexing function in the documentation page to return the value
in the 3rd row and the 8th column, which is 37.0. Note that indexing in Python starts
from 0. Please write the command down.
d. What if I want the values of the first 3 rows, but only for the 2nd to 4th
columns, namely the first 3 rows’ values of month, day and year? Please write the
command down.
e. What if I want all the values of the 6th column, namely the species id?
Please write the command down.
f. What if I want all the values of the 6th column, but with the duplicates
removed? Please write the command down.
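Items b to f above rely on DataFrame attributes, functions and position-based indexing. The sketch below tries these ideas on a small made-up table, so the positions differ from those asked for in the exercises. The table, its column names and the variable names are assumptions for illustration; ‘shape’, ‘head()’, ‘iloc’ and ‘unique()’ are taken from the pandas documentation.

```python
import pandas as pd

# Made-up table, for illustration only.
demo_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "month": [7, 7, 8, 8],
    "day": [16, 17, 1, 2],
    "species_id": ["NL", "DM", "NL", "PF"],
})

n_rows, n_cols = demo_df.shape   # attribute: no parentheses
first_two = demo_df.head(2)      # function: called with parentheses

# iloc selects by position; positions start from 0 and the end of a slice is excluded.
one_value = demo_df.iloc[1, 2]               # 2nd row, 3rd column
block = demo_df.iloc[0:2, 1:3]               # first 2 rows, 2nd to 3rd columns
one_column = demo_df.iloc[:, 3]              # every row of the 4th column
distinct = demo_df["species_id"].unique()    # same column, duplicates removed

print(one_value)       # 17
print(list(distinct))  # ['NL', 'DM', 'PF']
```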
Task 3: Visualize Big Data
Download ‘py_demo_part2.py’ and save it in the ‘data_lab’ folder.
https://www.dropbox.com/s/32ozhbhaq7n11rg/py_demo_part2.py?dl=0
a. Run the code, and use the documentation page to explore the plotnine
functions:
https://plotnine.readthedocs.io/en/stable/
b. Please use geom_point() to plot species_id vs hindfoot_length, using sex as
the color parameter. In addition, please briefly explain what you observe from the
plot.
c. Please use geom_bar() to plot species_id vs weight, using sex as the color
parameter. In addition, please briefly explain what you observe from the plot.
d. Please use geom_point() to plot species_id vs hindfoot_length, using sex as
the color parameter, but only for the years 1979, 1993, 1999 and 1977. In
addition, please briefly explain what you observe from the plot.
END of Tutorial / lab-3