COMP3208 Social Computing Techniques
Updated: 9th February 2023
Deliverables and deadlines
Assignment Number | Deliverable(s) | Deadline | Marking Scheme
Task
The aim of the coursework is to build and evaluate realistic recommender system algorithms. This is
broken down into two assignments, each testing a different element of the overall goal. Individual
submissions are expected for this coursework (i.e. not group work).
Each of the two assignments expects the same deliverable format to be submitted via ECS handin.
There should be a source_code.txt file and a results.csv file in each submission.
You should run your source_code offline (on your laptop or ECS hardware) and only submit it when
it's ready for evaluation. Each task expects you to calculate a set of results which are then submitted
for automated evaluation alongside the source code itself.
The source code must be a single file in either .java or .py format (renamed to source_code.txt). You
cannot submit files in other formats, such as Eclipse project files or Jupyter notebook .ipynb files.
Multiple-file submissions are not allowed: each submission must contain only a source_code.txt file
and a results.csv file.
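A minimal sketch of writing the results.csv file using only Python's built-in csv module. The column layout here (user ID, item ID, predicted rating, one row per test case) and the helper name write_results are hypothetical assumptions; check the provided example submission for the real format before submitting.

```python
import csv


def write_results(predictions, path="results.csv"):
    """Write one row per (user_id, item_id, predicted_rating) tuple.

    Hypothetical layout: match the columns to the example submission.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for user_id, item_id, rating in predictions:
            # Rounding keeps the file compact; MAE is computed on the written values.
            writer.writerow([user_id, item_id, round(rating, 4)])
```

Writing with newline="" lets the csv module control line endings, which avoids blank lines appearing between rows on Windows.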
Your source code must use only built-in Java or Python libraries. The only exception to this rule is the
Python library numpy, which is allowed to make array manipulation easier. For example, the scipy and
scikit-learn libraries are not built into Python and so are not permitted. The same applies to the Weka
Java libraries, which are not part of Java and so are not permitted. The assignment is intended to
evaluate your ability to understand and develop recommender system algorithms from scratch, not
your ability to use powerful third-party libraries to do the task efficiently without a deep understanding.
Assignments are evaluated automatically via the ECS handin system. For each assignment, 10
submission attempts are allowed (formative), with the best scoring attempt used for the final mark
(summative).
The source_code must be self-described, using easy-to-read inline comments to explain how the
coded algorithm works. Your self-described code should provide a sufficiently detailed explanation
of how each algorithm works (e.g. a narrative of the code steps alongside an explanation of the maths
behind the algorithm) to provide evidence of a deep understanding of the algorithms used. Use your
judgement for how long the inline comments should be: provide enough information to show deep
understanding, but not so much that the code becomes hard to read or needlessly bloated.
An example submission is provided so the format of submission files and examples of self-described
comments are clear.
For the 2nd assignment, the source_code self-described comments will be manually assessed via a
code review, and marks will be awarded for evidence of a deep understanding of the algorithms
used and for clarity of explanation. Code with no comments explaining it will be awarded zero marks
in the code review.
Assignment #1 [15 marks] = small scale cosine similarity recommender system algorithm
The source_code must implement a cosine similarity recommender system algorithm to train on, and
then predict ratings for, a small (100k) dataset.
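A minimal sketch of an item-based cosine similarity predictor in plain Python, respecting the built-in-libraries rule. It computes the plain cosine of sparse rating vectors, treating missing ratings as zero, sim(i, j) = Σ_u r_ui · r_uj / (‖r_i‖ · ‖r_j‖), and predicts a rating as the similarity-weighted average of the user's ratings on the k most similar items. The function names, the neighbourhood size k and the fallback value default=3.5 are hypothetical choices, not the module's prescribed method; user-based filtering would swap the roles of users and items.

```python
import math
from collections import defaultdict


def build_indexes(ratings):
    """Index (user, item, rating) triples as item -> {user: rating} and user -> {item: rating}."""
    by_item, by_user = defaultdict(dict), defaultdict(dict)
    for user, item, rating in ratings:
        by_item[item][user] = rating
        by_user[user][item] = rating
    return by_item, by_user


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors stored as dicts user -> rating.

    Missing ratings count as 0, so the dot product only runs over co-rating users,
    while each norm runs over all of that item's ratings.
    """
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(user, item, by_item, by_user, k=20, default=3.5):
    """Similarity-weighted average of the user's ratings on the k items most similar to `item`."""
    if item not in by_item or user not in by_user:
        return default  # cold start: nothing known about this item or user
    neighbours = []
    for other, rating in by_user[user].items():
        if other == item:
            continue
        sim = cosine(by_item[item], by_item[other])
        if sim > 0:
            neighbours.append((sim, rating))
    neighbours.sort(reverse=True)  # most similar items first
    top = neighbours[:k]
    weight = sum(sim for sim, _ in top)
    if weight == 0:
        return default
    return sum(sim * rating for sim, rating in top) / weight
```

For a 100k dataset, caching item-item similarities (rather than recomputing them for every prediction) is the obvious next optimisation.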
Feedback >> Automated evaluation of results to compute the MAE score of the predictions. To get a
good mark, your algorithm will need to do much better than the easiest baseline: a hard-coded
recommender system that returns a fixed value (e.g. the average rating of the corpus, the average for
a particular item, or the average for a particular user). As an indication, for last year's cohort an
MAE of 0.78 was awarded an average score (~70%) and an MAE of 0.71 was awarded the best score
(100%). However, these values may change because a different test set will be used this time.
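A minimal sketch of the MAE metric, MAE = (1/n) Σ_i |p_i − r_i|, together with two of the fixed-value baselines mentioned above (corpus average and per-item average). The function names are hypothetical. One way to use them offline is to hold out part of the training data, score both your model and a baseline locally, and only spend a submission attempt once your model clearly beats the baseline.

```python
from collections import defaultdict


def mae(predicted, actual):
    """Mean absolute error: (1/n) * sum_i |p_i - r_i|. Lower is better."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need exactly one prediction per true rating")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def global_mean_baseline(train):
    """Always predict the average rating of the whole training corpus."""
    mean = sum(r for _, _, r in train) / len(train)
    return lambda user, item: mean


def item_mean_baseline(train):
    """Predict the item's average rating, falling back to the global mean for unseen items."""
    sums, counts = defaultdict(float), defaultdict(int)
    for _, item, r in train:
        sums[item] += r
        counts[item] += 1
    global_mean = sum(sums.values()) / sum(counts.values())
    return lambda user, item: sums[item] / counts[item] if item in counts else global_mean
```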
Marks >> Marks assigned based on MAE [15 marks]
Optional Assignment [0 marks] = small scale matrix factorization recommender system algorithm
The source_code must implement a small-scale matrix factorization recommender system algorithm
to train on, and then predict ratings for, a small (100k) dataset. This task is optional and carries no
marks. Its goal is to let you practise matrix factorization on a smaller dataset, without having to deal
with a database, before moving on to the large dataset.
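A minimal sketch of matrix factorization trained with stochastic gradient descent (SGD). SGD is one common way to fit such a model; the source does not prescribe a training method. Each rating is modelled as pred(u, i) = p_u · q_i, where p_u and q_i are k-dimensional latent factor vectors, and SGD minimises the regularised squared error Σ (r_ui − p_u · q_i)² + λ(‖p_u‖² + ‖q_i‖²). The hyperparameters (k, lr, reg, epochs), the clipping range [1, 5], the fallback value and the function names are all hypothetical.

```python
import random


def train_mf(ratings, k=10, lr=0.01, reg=0.05, epochs=20, seed=0):
    """Learn user factors P[u] and item factors Q[i] so that dot(P[u], Q[i]) ~ r_ui."""
    rng = random.Random(seed)
    data = list(ratings)  # (user, item, rating) triples, held in memory so they can be shuffled
    P, Q = {}, {}
    for u, i, _ in data:
        # Small random initial factors break the symmetry between latent dimensions.
        if u not in P:
            P[u] = [rng.gauss(0, 0.1) for _ in range(k)]
        if i not in Q:
            Q[i] = [rng.gauss(0, 0.1) for _ in range(k)]
    for _ in range(epochs):
        rng.shuffle(data)  # visit ratings in a new random order each epoch
        for u, i, r in data:
            pu, qi = P[u], Q[i]
            err = r - sum(a * b for a, b in zip(pu, qi))  # e_ui = r_ui - p_u . q_i
            for f in range(k):
                puf, qif = pu[f], qi[f]  # use the old values in both updates
                # Gradient step on e_ui^2 + reg * (|p_u|^2 + |q_i|^2); the factor 2 is folded into lr.
                pu[f] += lr * (err * qif - reg * puf)
                qi[f] += lr * (err * puf - reg * qif)
    return P, Q


def predict_mf(P, Q, user, item, default=3.5, lo=1.0, hi=5.0):
    """Dot product of the learned factors, clipped to the (assumed) rating scale."""
    if user not in P or item not in Q:
        return default  # cold start: no factors were learned for this user or item
    est = sum(a * b for a, b in zip(P[user], Q[item]))
    return min(hi, max(lo, est))
```

The regularisation term keeps factor values small, which limits overfitting on users and items with only a handful of ratings.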
Feedback >> Automated evaluation of results to compute the MAE score of predictions. As an
indication, for the cohort of the last year, an MAE 0.73 was awarded an average score (~70%) and an
MAE 0.68 was awarded the best score (100%). However, these values may change as a different test
set will be given this time.
Marks >> 0 marks
Assignment #2 [25 marks] = large scale matrix factorization recommender system algorithm
The source_code must implement a large-scale matrix factorization recommender system algorithm to
train on, and then predict ratings for, a large (20M) dataset. You may need to use a database to
handle the large number of ratings.
Note: If you really have trouble with Assignment #2 and cannot get your source_code to generate any
predictions at all, you can still submit the self-described code with an empty results file. You will
score zero of the 10 marks for results, but you can still earn the 15 'method' marks for your
self-described code if it shows evidence of a deep understanding of the algorithm and the maths
behind it.
Feedback >> Automated evaluation of results to compute the MAE score of the predictions. The MAE
score will be compared against a low-scoring benchmark: a basic cosine similarity recommender
system algorithm without any optimisation work. To get a good mark, your algorithm will need to do
better than this cosine similarity algorithm. As an indication, for last year's cohort an MAE of 0.64
was awarded an average score (~70%) and an MAE of 0.60 was awarded the best score (100%).
However, these values may change because a different test set will be used this time.
Marks >> Marks assigned based on MAE [10 marks]
Marks >> Manual inspection of self-described code, assessing (a) the clarity of the self-described code
and (b) the depth of understanding of the algorithm and the maths behind it. Submitting an incorrect
or unreadable source code file (i.e. a file that is not a Python or Java source file saved as a single
plain-text file) will result in a zero mark [15 marks]
Notes and Restrictions
Make sure that you provide a prediction for every row in the test set. Any missing predictions will be
set to zero by default, which will give a higher MAE than you would otherwise achieve.
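To see why gaps are costly: if a true rating is 4 and your model would have predicted 3.5, that row adds |3.5 − 4| = 0.5 to the summed absolute error; if the prediction is missing and defaults to 0, it adds |0 − 4| = 4 instead. A minimal sketch of a safety pass that guarantees one usable value per test row is below. The helper name, the choice of fallback (for example a global or item mean) and the clipping range lo=1.0, hi=5.0 are hypothetical assumptions; set the range to the dataset's actual rating scale.

```python
import math


def complete_predictions(test_rows, predict, fallback, lo=1.0, hi=5.0):
    """Return exactly one usable prediction per (user, item) test row."""
    completed = []
    for user, item in test_rows:
        try:
            p = predict(user, item)
        except KeyError:
            p = None  # the model has no entry for this user or item
        if p is None or math.isnan(p):
            p = fallback  # e.g. a global or item mean instead of the default 0
        completed.append((user, item, min(hi, max(lo, p))))  # clip to the rating scale
    return completed
```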
Given the size of the large dataset, we recommend that you use a database; a simple database such
as SQLite is sufficient. Example Java and Python code for using a database is provided, and using it
will not count as plagiarism. However, you are not required to use it.
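A minimal sketch of loading ratings into SQLite with Python's built-in sqlite3 module (so it stays within the library rules) and then streaming them back in batches, so the 20M ratings never have to sit in memory as Python objects all at once. The table and index names, the helper names, and the CSV layout (no header row; columns user ID, item ID, rating, with any extra columns such as a timestamp ignored) are hypothetical assumptions. This is independent of the example database code provided with the coursework.

```python
import csv
import sqlite3


def load_ratings(conn, csv_path):
    """Bulk-load (user_id, item_id, rating) rows from a headerless CSV into a ratings table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ratings (user_id INTEGER, item_id INTEGER, rating REAL)"
    )
    with open(csv_path, newline="") as f:
        # A generator keeps memory flat: rows are parsed and inserted one at a time.
        rows = ((int(u), int(i), float(r)) for u, i, r, *_ in csv.reader(f))
        conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", rows)
    # An index lets per-user queries avoid a full table scan.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_ratings_user ON ratings (user_id)")
    conn.commit()


def iter_ratings(conn, batch_size=100_000):
    """Stream every rating in fixed-size batches instead of fetching all rows at once."""
    cur = conn.execute("SELECT user_id, item_id, rating FROM ratings")
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield from rows
```

Usage: open a file-backed database with sqlite3.connect("ratings.db") so the data persists between runs, call load_ratings once, then iterate over iter_ratings(conn) in each training epoch.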
Feedback
Feedback will be returned automatically for this coursework each time you submit one of your 10
submission attempts per assignment, in the form of an emailed MAE report (not marks) based on
automated evaluation of your submitted rating predictions.
The final marks you get for each assignment will be emailed 4 weeks after the deadline. This allows
time for marks to be computed, student extensions processed etc. This mark confirmation email will
not contain any additional written feedback.
Learning Outcomes
B1. Use recommender technologies such as item-based and user-based collaborative filtering
techniques
D1. Set up social computing experiments and analyse the results using a scientific approach
Late submissions
Late submissions will be penalised according to the standard rules.
The handin submission time is based on your last submission, so if you submit after the deadline you
will incur a late penalty.
Plagiarism
source_code will be checked using an automated code similarity checker. Do not copy and paste code
from online sources such as tutorials (plagiarism) or from other students (collusion). Write your own
code and your own self-described comments. Reusing your own work from earlier submissions for
assignments in this module is explicitly allowed. Code and comment similarity checks will be on a
per-assignment basis, with your best-scoring source_code compared to other students' best-scoring
source_code for that assignment.
Any violations, deliberate or otherwise, will be reported to the Academic Integrity Officer.