Create a new Excel Worksheet
Sorry guys and gals, but it’s a New Year and we have new computers, so we’re all going to make a new spreadsheet. Remember that the data is at www.basketball-reference.com. If you don’t remember exactly how to load the data into the spreadsheet, I will help you.
The players we are studying are: Luka Doncic, Shai Gilgeous-Alexander, Deandre Ayton, Marvin Bagley, Trae Young, Josh Okogie, Jaren Jackson, Kevin Knox, Wendell Carter, Mohamed Bamba
Format the Excel Worksheet
Delete columns “G”, “DATE”, “AGE”, “TM”, “OPP”, the column next to “OPP”, “GS”, “MP”, “FG%”, “3P%”, “FT%”, “GMSC”, “+/-”
To do this, you can either go to www.basketball-reference.com and recreate the .xls file, or you can simply delete the columns in Excel or WPS by clicking on the header of each column and pressing Delete. If you choose the second option, you need to delete the columns on every sheet.
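If you would rather do this cleanup in R after loading the data, here is a minimal sketch of dropping columns by name. It uses a small made-up data frame standing in for one player's sheet; the name player, the values and the short column list are placeholders. Keep in mind that when R reads a spreadsheet it usually rewrites column names that are not valid R names (for example, 3P becomes X3P and FG% becomes FG.), so check names() on your real data before choosing what to drop.

```r
# Made-up stand-in for one player's sheet (placeholder values, not real stats)
player = data.frame(
  G    = 1:3,
  DATE = c("d1", "d2", "d3"),
  PTS  = c(10, 20, 30),
  TRB  = c(5, 6, 7)
)

# Columns we do not want to keep (use the names exactly as names(player) shows them)
unwanted = c("G", "DATE")

# Keep every column whose name is NOT in the unwanted list;
# drop = FALSE keeps the result a data frame even if only one column is left
player = player[, !(names(player) %in% unwanted), drop = FALSE]

print(names(player))  # "PTS" "TRB"
```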
Loading the file into R
1) We need to prepare R to read the file. Go to Packages on the menu bar and click Install package(s). R will ask you to choose a CRAN mirror (a download location); select China (Shanghai). After the list of packages loads, select gdata. It will take a few minutes for R to install the package.
2) Once the package is installed, you have to load it into R. Type library(gdata). If the library doesn’t load, ask Mr. Wilson for help.
3) We are now ready to load the file into R. We are going to load each player into their own variable. Start with Luka Doncic: define a variable for Luka (for example, Doncic =) and then type in the command:
a. Doncic = read.xls("Basketball.xls", sheet = 1)
b. Continue for all 10 players. Make sure to change the sheet number to match the player you are loading.
c. Now that you’ve done that, do it again! Redefine the variable, this time using the sheet’s name instead of its number: Doncic = read.xls("Basketball.xls", sheet = "Doncic"). It still works!
Figuring the average points per game for our players
Of course, for any of our players we can type
mean(Doncic$PTS)
And that will return the average points per game (PPG) for that one player. Try that for all 10 players now. If the program returns:
[1] NA
It means that we need to fix the data that we imported into R:
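1. If you did the steps above, you saw that the Doncic variable is broken. Type:
fix(Doncic)
A window that looks like a spreadsheet will appear. Look for words in the spreadsheet (in programming, words are called strings). R cannot take the mean of a column that contains words, so those are what we want to get rid of.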
2. Redefine Doncic, but add the argument na.strings. na.strings tells R to treat the word we don’t want (if you did step 1, you saw that the word is “Inac”) as a missing value, written NA. So type:
Doncic = read.xls("Basketball.xls", sheet = "Doncic", na.strings = "Inac")
Now every instance of the word “Inac” has been changed to NA. Type:
fix(Doncic)
What is the result?
3. If it still doesn’t work, that’s because mean() returns NA whenever any of its values is NA. We need to delete the rows that contain NA. Type:
Doncic = na.omit(Doncic)
Now the Doncic variable should be fixed. Repeat this process for any other player whose mean PTS comes back as NA.
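Here is a self-contained sketch of the same fix. Because it cannot assume you have Basketball.xls, it uses base R’s read.csv() on a tiny made-up table typed in as text; read.csv() accepts the same na.strings argument, so “Inac” becomes NA just as in step 2. It also shows an alternative to na.omit(): mean() has an na.rm argument that skips NA values. Note that na.omit() drops a whole row if any column in it is NA, while na.rm = TRUE only affects that one calculation.

```r
# Made-up game log: "Inac" marks games the player missed (placeholder values)
log_text = "G,PTS,TRB
1,10,5
2,Inac,Inac
3,20,7"

# Without na.strings, PTS is read as text because of the word "Inac"
raw = read.csv(text = log_text)
print(class(raw$PTS))  # "character" in R 4.0 and later (a factor in older R)

# With na.strings = "Inac", those entries become NA and PTS is numeric
games = read.csv(text = log_text, na.strings = "Inac")
print(mean(games$PTS))  # NA, because one value is missing

# Option 1: remove every row that contains an NA (what step 3 does)
played = na.omit(games)
print(mean(played$PTS))  # 15

# Option 2: keep all rows and tell mean() to skip NA values
print(mean(games$PTS, na.rm = TRUE))  # 15
```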
We should now be able to calculate the PPG for any player. Of course we can look at the numbers and see who scored highest, but we want a visual representation. We’ll use the barplot() function to create a bar graph.
We can’t type
barplot(mean(Doncic$PTS),mean(Ayton$PTS).........
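This won’t work, because barplot() expects all of the bar heights in its first argument; a second value would be read as the bar width. We need to create a vector of values that we can use in barplot(). We are going to create a variable (PPG) and put the average points per game for every player in it. Type: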
PPG = c(mean(Doncic$PTS).........
The function c() combines values into a vector.
Try to make a bar graph now by typing barplot(PPG). It should work; if it doesn’t, see Mr. Wilson. You should now see a bar graph with one bar for each player.
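Typing mean() ten times works, but if you keep the players in a named list, sapply() can compute every average in one line and keeps each player’s name attached to their value. The sketch below uses three made-up data frames in place of the real ones; the list name players and all the numbers are placeholders.

```r
# Made-up stand-ins for three players' game logs (placeholder values, not real stats)
Doncic = data.frame(PTS = c(20, 22, 18))
Ayton  = data.frame(PTS = c(15, 17, 16))
Young  = data.frame(PTS = c(12, 30, 21))

# Put the data frames in a named list so each result keeps its player's name
players = list(Doncic = Doncic, Ayton = Ayton, Young = Young)

# sapply() calls mean() on the PTS column of every player and returns a named vector
PPG = sapply(players, function(p) mean(p$PTS))
print(PPG)

# For a named vector, barplot() uses the names as bar labels
barplot(PPG)
```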
This is a decent looking bar graph but there are several problems:
1) There are no names on the bars
2) Some of the bars are taller than the y-axis
3) The colors are a little plain
You need to fix these things. Fortunately, everything you need to know can be found in R! Type: ?barplot
Read the help page and find the arguments for names, the limits of the x and y axes, and colors. Experiment, change all of these things, and take a picture of your work.
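After you have read ?barplot, your result might look something like the sketch below. It uses a made-up PPG vector; names.arg, ylim and col are real barplot() arguments, but the particular labels, limit and colors are arbitrary choices for illustration.

```r
# Made-up averages standing in for the real PPG vector (placeholder values)
PPG = c(20, 16, 21)

mids = barplot(
  PPG,
  names.arg = c("Doncic", "Ayton", "Young"),   # labels under each bar
  ylim = c(0, max(PPG) * 1.2),                 # y-axis from 0 to a bit above the tallest bar
  col = c("steelblue", "orange", "darkgreen"), # one color per bar
  main = "Points per game",
  ylab = "PPG"
)
```

barplot() invisibly returns the x-coordinates of the bar midpoints, which is handy if you later want to write values on top of the bars with text().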
Triple Bar Graph
You need to create a triple bar graph like the one above, with three bars for each player. Use total rebounds (TRB), offensive rebounds (ORB), and defensive rebounds (DRB).
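One way to draw three bars for each player is to give barplot() a matrix and set beside = TRUE: each column becomes a group (a player) and each row becomes one bar in that group (a stat). A minimal sketch with made-up averages follows; in your version the numbers would come from mean() of each player’s TRB, ORB and DRB columns.

```r
# Made-up average rebounds for three players (placeholder values, not real stats)
TRB = c(Doncic = 8, Ayton = 10, Young = 4)
ORB = c(Doncic = 1, Ayton = 3,  Young = 1)
DRB = c(Doncic = 7, Ayton = 7,  Young = 3)

# rbind() stacks the vectors as rows: one row per stat, one column per player
rebounds = rbind(TRB, ORB, DRB)

# beside = TRUE draws each player's three bars next to each other instead of stacking them
mids = barplot(
  rebounds,
  beside = TRUE,
  col = c("gray40", "tomato", "skyblue"), # one color per stat (row)
  legend.text = rownames(rebounds),       # legend labels: TRB, ORB, DRB
  ylim = c(0, max(rebounds) * 1.3)
)
```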
We also want to create graphs for points (PPG, FGA, FGM) and percentages (FG%, 3P%, FT%), but we don’t want to work as hard as we did to make the first triple bar graph. So we’re going to write a “function” in R.
Functions
Let’s define a variable called Average:
Average = function(stat = "stat")
In this line of code, “Average” is a variable that we are defining.
“function” is a keyword. It tells R that we are creating a set of instructions (a function) that R will run every time we call Average.
(stat = “stat”) names the input stat and gives it a default value, the string “stat”. We will give the function the name of a column as a string (a string means the input is words instead of numbers).
When we finish writing the function, calling it will look like this:
Average("PTS")
The above command will tell the program to take the average PTS for each player. Before we can use that command, we have to finish writing the function.
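To see why the input is a string, here is a small hypothetical function, ColumnMean, which is not the Average function you are asked to write. It takes one player’s data and a column name given as a string, and uses [[ ]] to look up the column with that name. Because the name is passed in quotes, the call is ColumnMean(Doncic, "PTS"); without the quotes, R would go looking for a variable called PTS.

```r
# Hypothetical helper (not the assignment's Average function):
# returns the mean of the column whose name is given as a string
ColumnMean = function(data, stat = "PTS") {
  # data[[stat]] looks up the column by name, e.g. data[["PTS"]]
  mean(data[[stat]], na.rm = TRUE)
}

# Made-up stand-in for one player's game log (placeholder values)
Doncic = data.frame(PTS = c(20, 22, 18), TRB = c(8, 6, 10))

print(ColumnMean(Doncic, "PTS"))  # 20
print(ColumnMean(Doncic, "TRB"))  # 8
```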
So we have
Average = function(stat = "stat")
We are going to add a bracket to the end of our command line so that we can start writing our function.
Average = function(stat = "stat") {
Take note of what kind of bracket we have added to the end of the command line. Make sure you type the correct bracket into R.
Okay, now that we have added the bracket on the command line, we can press [Enter] and R will drop us down a line without closing the function. The new line starts with a + instead of the usual >, which is R’s way of saying the command isn’t finished yet.
On the next line we are going to define a new variable. I will call it AVG (you can call it whatever you want).
Now here is the difficult part. We want to create a vector of averages. Remember that we want our function “Average” to return the average for each player. If we type in
Average("PTS")
we want 10 values, the average PTS for each player, AND we want each player’s name to be next to their average PTS. So we need to tell our function to do that.
I don’t want to tell you exactly how to do this part of the project, but I will tell you this: you need to create a vector of values (averages) and a vector of names. Both of these need to be created inside the Average function.
When you have finished the Average function show it to me so that I can verify that it works.
After you have finished this function, there are three more assignments to complete that should be much faster now that you’ve mastered this skill. They are listed below.
1. Create a triple bar graph for PTS, FGA, FGM
2. Create a triple bar graph for FG% (= FGM/FGA), FT% (= FTM/FTA), and 3P% (= 3PM/3PA). You’ll have to calculate those percentages yourself.
3. Create a function that allows you to load in a spreadsheet by typing in a command. For example, my function allows me to load a player’s spreadsheet into R by typing in their name.
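For the percentages in item 2: the season FG% listed on basketball-reference.com is total made shots divided by total attempts, not the average of each game’s percentage, and the two can differ because low-attempt games would otherwise count as much as high-attempt ones. A minimal sketch with made-up numbers follows. It assumes the made-shot column is named FG, as in basketball-reference.com game logs (FGM in this handout); R also typically renames 3P and 3PA to X3P and X3PA when it reads the spreadsheet, so check names() first.

```r
# Made-up game log for one player (placeholder values, not real stats)
games = data.frame(FG = c(5, 8, 2), FGA = c(10, 16, 8))

# Season FG%: total makes divided by total attempts
fg_pct = sum(games$FG) / sum(games$FGA)
print(fg_pct)  # 15 / 34, about 0.441

# Averaging the per-game percentages gives a different number
print(mean(games$FG / games$FGA))  # about 0.417
```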
MAKE SURE YOU SAVE ALL YOUR WORK SO THAT YOU CAN SHOW IT TO ME.