Date: 2022-10-31 09:10

COSC 2670/2738 Practical Data Science (with Python)

Project Assignment 3, Semester 2, 2022

Marks : This assignment is worth 30% of the overall assessment for this course.

Due Date : Wed, 26 October 2022, 11:59PM (Week 14), via canvas. Late penalties

apply. A penalty of 10% of the total project score will be deducted per day. No

submissions will be accepted 5 days beyond the due date.

Objective

The key objective of this assignment is to learn how to compare and contrast

several recommendation system algorithms. There are three major components of the

assignment – a completed Jupyter notebook used to run your experiments, a written

report, and a short video presentation where you describe what you did and your key

findings.

The dataset you will use will be a sample of the Netflix Prize data. The problem is

movie recommendation, and the data is already split into a training and validation set

that you can use to run all of your experiments.

Provided files

The following template files are provided:

SXXXXX-A3.ipynb : The primer Jupyter notebook file you should use to stage

and run all of your experiments.

netflix-5k.movie-titles.feather : The movie title dataframe that can be used to

map a movieID to a title, as well as a list of genres.

netflix-5k.train.feather : The training tuples for 5,000 users, where each tuple

is (userID, movieID, rating).

netflix-5k.validation.feather : A predefined set of validation tuples for the

same users that can be used by you to benchmark the performance of various

algorithms.

A3.pdf : This specification file.

Creating Your Workspace

Once again, you should rename the file SXXXXX-A3.ipynb appropriately based on

your student ID.

Creating Your Anaconda Environment

In order to create your anaconda environment for this project, you should run the

following commands in a terminal shell:

conda create -n PDSA3 python=3.8

conda activate PDSA3

pip install jupyterhub notebook numpy pandas

pip install matplotlib scikit-learn seaborn

pip install kneed scikit-surprise

Note that both kneed and scikit-surprise can be finicky on some systems. For

example on my machine, an error was thrown during the compile of scikit-surprise

(Macbook Pro M1) but the install still worked when it tried a fallback install method.

So if you find that it really fails for you using pip, then and only then resort to conda.

This would break requirements.txt, but it should work reliably for everyone, albeit

less reproducibly. The magic commands would be:

conda install -c conda-forge kneed

conda install -c conda-forge scikit-surprise

You can type “pip freeze” to see a list of the packages that are correctly installed

in your environment. You can also install scikit-learn-intelex and/or psutil if

you want to use the Intel-based optimisations or debug memory management as shown

in the sample Jupyter notebook.

If you also wish the timing and other jupyter extensions to be enabled in your

notebook (optional but may be useful depending on how you decide to present your

results), you need to run the following additional commands:

pip install jupyter_contrib_nbextensions

pip install jupyter_nbextensions_configurator

jupyter contrib nbextension install --user

jupyter nbextensions_configurator enable --user

Now you just need to type “jupyter notebook” to start up jupyter correctly with

access to the libraries you just installed.

Also, recall that if you ever stop working in your environment and come back later,

you must open a terminal, run “conda activate PDSA3” and then “jupyter notebook”,

otherwise you will not be working in the virtual environment you created above, and

most things you try to do will probably start failing.
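
As a quick sanity check after activating the environment, a short script along the following lines can confirm that the packages installed above are importable. This is only a sketch: the helper name missing_modules is made up for illustration, and the list uses the usual import names, which differ from the pip names for two packages (scikit-learn is imported as sklearn, scikit-surprise as surprise).

```python
import importlib.util
import os

# Import names for the packages installed above. The import name can differ
# from the pip name: scikit-learn -> sklearn, scikit-surprise -> surprise.
REQUIRED_MODULES = [
    "numpy", "pandas", "matplotlib", "sklearn", "seaborn", "kneed", "surprise",
]


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # conda sets CONDA_DEFAULT_ENV to the name of the active environment.
    print("Active conda environment:", os.environ.get("CONDA_DEFAULT_ENV", "(none)"))
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If the reported environment is not PDSA3, or any module is listed as missing, the notebook is probably running outside the environment created above.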

You should not use any other libraries to complete your assignment beyond the

ones shown above without written permission from the course coordinator (Shane).

1 The Jupyter Notebook Primer (5 marks)

I have included the jupyter notebook I walked through at the end of the Week 10

Lectorial. This notebook will provide you with everything you need to correctly load

the dataframe files from feather, and also includes an example of how to do both a

grid search and randomised search for parameter tuning on one of the recommendation

system algorithms included in Surprise. You should spend some time reading the API

documentation and tutorial for this library provided at https://surpriselib.com.

This will be critical information you should use to stage your experiments.
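
To make the difference between the two search strategies concrete, here is a library-independent sketch in plain Python. It is not the code from the primer notebook, and it does not use Surprise (which ships its own GridSearchCV and RandomizedSearchCV in surprise.model_selection). The function names and the toy_error scoring function are hypothetical; the parameter names n_factors and reg_all are borrowed from Surprise's SVD purely for flavour, and the scores are made up.

```python
import itertools
import random


def grid_search(param_grid, evaluate):
    """Try every combination in `param_grid`; return (best_params, best_score).

    `evaluate` maps a parameter dict to an error score where lower is better
    (for example, validation RMSE).
    """
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def randomised_search(param_grid, evaluate, n_iter, seed=0):
    """Evaluate `n_iter` randomly sampled combinations instead of all of them."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_error(params):
    # Made-up stand-in for "fit the model and compute validation RMSE".
    # It has no connection to real results on the Netflix sample.
    return (params["n_factors"] - 100) ** 2 / 1e4 + abs(params["reg_all"] - 0.05)


grid = {"n_factors": [50, 100, 150], "reg_all": [0.02, 0.05, 0.1]}
best_params, best_score = grid_search(grid, toy_error)
print(best_params, best_score)
```

The grid search evaluates all 9 combinations, while the randomised search evaluates only n_iter of them, which is the trade-off to keep in mind when each evaluation means refitting a model.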

2 The Report (15 marks)

The main component of your assignment will be to carefully write up your key findings.

Your report should not be more than 5 pages using 11pt font. You may also have one

additional Appendix page containing additional graphs or tables. Your final report

must be submitted as a PDF file. You can use Microsoft Word, but I would strongly

encourage you to consider writing up your report using LaTeX (https://overleaf.com).

Writing in LaTeX may seem daunting at first, but Overleaf provides plenty of

tutorials and examples, and it is pretty easy once you get the hang of it. This spec

file was written using LaTeX. Microsoft Word is not a good tool for writing technical

documents, and in fact most Computer Science conferences require papers to be

written in LaTeX. The quality of the presentation layer is easily discernible by most

people in Computer Science. Write a paragraph or two with a graph or diagram in

Overleaf and then in MS Word and compare the two – I bet you’ll see the difference

immediately!

Regardless of which tool you use to generate your final PDF, the format of the

report should be:

1. Your name and student number at the top of page 1.

2. Introduction (usually no more than 1/2 page).

3. Methodology (usually about 1 page) - This would contain a clear description of

each recommendation system algorithm you are using and a rationale as to why

it is being used.

4. Experiments (3 pages) – This is the main component of your report. Here you

should document all of the parameters and algorithms used, Tables, Figures, or

Graphs that you create in order to compare and contrast all of the algorithms you

have benchmarked and a discussion about what you have discovered. There is

no reason to include images of code snippets as you are submitting your Jupyter

notebook already – you should include images of graphs you create – assuming

they are important to the story you are telling.

5. Conclusion (1/2 page) – A clear summary of your key findings.

6. References (Separate page) – You can include as many references as you see fit

and need in your report, and this is not counted against the 5 page limit.

7. Appendix (Separate page) – Any additional graphs or tables that you think are

important but that you could not include into the main document because of

space constraints.

Summary – a report that has a 5 page body, 1+ pages of citations, and 1 optional

page at the end as an Appendix.

You must compare at least four different algorithms from the surprise library in

your report. Note that one algorithm (e.g. SVD) with two different parameter settings

does not count as two different algorithms. That is one algorithm with two different

parameter settings, i.e. one algorithm. You can certainly include results for multiple

parameter settings for each of the four algorithms in your experiments, just make sure

you are using 4 different algorithms in total in your shootout.

In week 8, I provided several evaluation classes you can use to compute a wide

variety of evaluation measures, and we encourage you to explore as many as you

can, as doing so will provide additional evidence that your “winning” algorithm is

really a winner. The “official” metric will be RMSE as this is the metric used in the

original competition, but you should compare the 4 algorithms using a minimum of

three different evaluation metrics. You should aim to have at least one algorithm that

can achieve an RMSE score ≤ 0.800. This should be achievable with a little parameter

tuning, if you choose good algorithms from surprise, and you may even decide that the

algorithm that got the best RMSE score is not necessarily the “best” algorithm overall

based on the experiments you have run. Hint: Think carefully about what “good”

movie recommendations really mean to you. We all know what it is, so what do you

think is the best way to prove or disprove that recommendations from Algorithm A are

clearly better than the ones you get from Algorithm B? We covered a wide variety of

evaluation measures in the Week 8 Lectorial, so go look at what was in that notebook

and think about it.
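
For reference, the sketch below shows what three common measures compute, written in plain Python so the arithmetic is visible. These are not the evaluation classes from the Week 8 notebook; the function names are hypothetical, the numbers are a tiny hand-made example rather than real results, and precision at k is included only as one possible ranking-style measure to sit alongside the two error measures.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true_rating, predicted_rating) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (true, predicted) pair")
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true_rating, predicted_rating) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (true, predicted) pair")
    return sum(abs(t - p) for t, p in pairs) / len(pairs)


def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-k recommended items that are relevant.

    Divides by k even when fewer than k items were recommended.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_items)
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / k


# Tiny hand-made example (not real data): squared errors are 1, 0 and 4.
pairs = [(4, 3), (2, 2), (5, 3)]
print(round(rmse(pairs), 4))  # 1.291, i.e. sqrt(5 / 3)
print(mae(pairs))             # 1.0

# Items ranked by predicted rating; "relevant" could mean a true rating of 4 or more.
print(precision_at_k(["a", "b", "c", "d"], {"a", "c", "x"}, k=3))  # 2 of the top 3 are relevant
```

RMSE penalises large errors more heavily than MAE, while precision at k ignores the predicted rating values entirely and looks only at the order of the recommendations, so the three can disagree about which algorithm is "best".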

Other key hints – (1) If you have a Figure, Table, or Diagram in the report, it must

have a caption and you must reference it in your report and discuss it. By that I mean

“Table 1 contains a table of RMSE results for Algorithms A-E. We can see that ...” (2)

If you use ideas or code from somewhere else, you must include it in your references.

A typical “bibtex” citation for something taken from the web would look something

like:

@misc{StackA,
  title = {{Stackoverflow Discussion on X}},
  howpublished = {\url{http://www.example.com}},
  note = {Accessed: 2022-10-05}
}

The preferred referencing style for Computer Science is usually APA. See

https://libguides.murdoch.edu.au/APA/sample for an example. There are lots of

tutorials available online on APA referencing, so if you have never seen it, just

search for tutorials on APA referencing and you’ll find more than you could ever

read/watch. (3) If it isn’t clear, you must include graphs, tables, and/or diagrams in

your experimental section, which are to be used as evidence to back any claims about

algorithm performance that you make.

3 The Video Presentation (10 marks)

Your final task is to create a 5-10 minute video recording of you presenting your

findings. You must create slides, and I would strongly encourage you to use PIP

(Picture in Picture Mode) in your video. If you have no idea what I am talking about,

have a look at this video created by our guest speaker when he was a PhD student

at RMIT – https://www.youtube.com/watch?v=jFhhL1mkziQ. The video format

should be MP4. No other video formats will be accepted.

A good presentation will have visually appealing slides, a clear story line, and

the presenter will not just be reading what is on the slide or from a piece of paper –

they will be describing what the current slide is showing. If you want to know more,

check out the talk by S. P. Jones on “How to Give a Great Research Talk” –

https://www.youtube.com/watch?v=sT_-owjKIbA

You might say to yourself, I’m not giving a research talk – but you are. You are

doing experimental research in Data Science on Recommendation Systems. So, treat it

like one.

If you have no idea how to create a PIP video of a presentation quickly and easily,

check out this video – https://www.youtube.com/watch?v=rKDKniBmwe0&feature=youtu.be.

If you don’t like this technique just search for how to create a

picture in picture presentation using Zoom or Microsoft Teams. It is surprisingly easy

to do using Microsoft Teams, Zoom, or the open source program OBS. I’m sure there

are many other ways to do it, but there are definitely simple and quick ways to do it

once you see how. The point is that I don’t want “technology” to be an impediment

in creating a great presentation. It is much more useful spending time on getting the

content right and doing a few practice runs so that you can create an interesting talk.

Submission

All submissions must be made on the canvas shell for the course. A link will be

provided within canvas, under the Assignments tab for submissions. Assignments

submitted through any other method will not be marked.

Pay close attention because the final submission must be done differently than

the first two assignments. The rationale for this is so that each of you has access

to a Turnitin report of your submission when you submit. Since you are writing a

report, this is a valuable tool for you to ensure that you do not get dinged for academic

misconduct. Believe it or not, we do take this seriously, and every semester several

students get reported to the university for failing to comply with the plagiarism rules.

So make sure you submit early enough that you can action it, otherwise, you’ll find

yourself in a really uncomfortable meeting with a university panel and you won’t get a

mark for the course until it is resolved!

This is an individual project. Do your own work.

You should submit four separate files in canvas.

[Your Student Number]-A3-notebook.ipynb

[Your Student Number]-A3-notebook.pdf

[Your Student Number]-A3-report.pdf

[Your Student Number]-A3-presentation.mp4

Do not put them in a single zip file or you will not get the plagiarism report as it

cannot run on archived files. If you do submit a single zip file, you will lose a mark,

and we will still run the Turnitin report, but you will not get a copy of it, unless you

end up getting flagged for plagiarism, and none of us wants that to happen. So for your

own sanity and ours please submit the four files individually, named as shown above.

If you do, you should be able to see the Turnitin results for your report and for your

notebook.

GETTING HELP. There is a discussion board available in the course canvas. Please

do not post code snippets in this forum. Do ask questions if you have them! We will

do our best to answer them as quickly as we can. This is the best medium to ask a

question, as there is a very strong chance that if you are confused about something,

your classmates are too. We will also be running office hours in Week 13 from 3-4PM

on Thursday in Collaborate Ultra if you want to stop by and ask a question.

Plagiarism Warning

University Policy on Academic Honesty and Plagiarism: It is University policy that

cheating by students in any form is not permitted, and that work submitted for

assessment purposes must be the independent work of the student concerned. Please

see the RMIT policy for more details:

https://www.rmit.edu.au/students/my-course/assessment-results/academic-integrity.

THIS IS NOT A GROUP PROJECT. Students are reminded that this assignment is

to be attempted individually. Plagiarism of any form will result in zero marks being

given for this assessment, and can result in disciplinary action. We routinely use

plagiarism software on projects! Please, please don’t do it. Be aware that paying

someone on a coding site to do it for you is a form of plagiarism. If you are submitting

work that is not your own, regardless of how you got it – you are in breach of this

policy.

Extension Policy

Individual extensions are very unlikely to be considered or granted by

the PDS team for the final assignment. We have this rule in place simply

because we set the deadline as late in the semester as we possibly could

based on the deadlines we have to submit your final grades. If you have

a strong case for needing one, you must request it 48 hours before the

deadline, and you should submit the form provided on canvas along

with supporting documentation to me. Be aware, there is no wiggle

room on the Assignment 3 deadline. We have a very short period of

time to mark 300 projects and submit the final marks to the University

Panel for approval. So, if you do get an extension from me, it will be

very short – maybe 1-2 days at most.

If you find yourself in a situation where you really need an extension

of a week or more (it can happen to any of us), you will need to apply

formally through RMIT for Special Consideration. This isn’t because

we are heartless academics, it is because we cannot change a mark for

a course after the university deadline without documentation from SC.

Even with the appropriate documentation, it can take many weeks to

get grade changes approved after the cutoff, which means in practical

terms, if you miss the cutoff, you won’t be able to graduate if it is your

last semester, or you won’t be able to register for any classes that have

PDS as a prerequisite when enrolments open for next semester. So, the

moral of the story is, seriously, don’t be late on the final assignment. If

you are, there is very little we can do to turn it around and get a final

mark for you before the RMIT grade entry and approval deadline.

There is a University process in place to grant extensions. Our

preference is that you go this route if you must, as they have very

clear criteria to grant exemptions. We will always honour these –

assuming you provided the correct information to them. For more

information about applying for Special Consideration, see the rules and

regulations at

https://www.rmit.edu.au/students/my-course/assessment-results/special-consideration-extensions/special-consideration.

Getting Help

Email us.

Use the discussion board.

Ask a question in a Lectorial or a practical.

There is help available if you need it. We do strongly urge that you start with the

discussion board because, as mentioned above, if you are confused, there is a very

good chance someone else has exactly the same question, and it is not a great use of

the team’s time to keep answering the same question over and over.

