联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> Python编程Python编程

日期:2024-09-22 09:50

COMP5328 - Advanced Machine Learning

Assignment 1

Due: 19/09/2024, 11:59PM

This assignment is to be completed in groups of 3 to 4 students. It is worth 25%

of your total mark.

1 Objective

The objective of this assignment is to implement Non-negative Matrix Factorization

(NMF) algorithms and analyze the robustness of NMF algorithms when the

dataset is contaminated by large magnitude noise or corruption. More speciffcally,

you should implement at least two NMF algorithms and compare their robustness.

2 Instructions

2.1 Dataset description

In this assignment, you need to apply NMF algorithms on two real-world face

image datasets: (1) ORL dataset

1

; (2) Extended YaleB dataset

2

.

• ORL dataset: it contains 400 images of 40 distinct subjects (i.e., 10 images

per subject). For some subjects, the images were taken at different times,

varying the lighting, facial expressions and facial details (glasses / no glasses).

All the images were taken against a dark homogeneous background with the

subjects in an upright, frontal position. All images are cropped and resized

to 92×112 pixels.

• Extended YaleB dataset: it contains 2414 images of 38 subjects under

9 poses and 64 illumination conditions. All images are manually aligned,

cropped, and then resized to 168×192 pixels.

1https://cam-orl.co.uk/facedatabase.html

2http://vision.ucsd.edu/ leekc/ExtYaleDatabase/ExtYaleB.html

1Figure 1: An example face image and its occluded versions by b × b-blocks with

b = 10, 12, and 14 pixels.

Note: we provide a tutorial for this assignment, which contains example code for

loading a dataset to numpy array. Please ffnd more details in assignment1.ipynb.

2.2 Assignment tasks

1. You need to implement at least two Non-negative Matrix Factorization (NMF)

algorithms:

• You should implement at least two NMF algorithms with at least one

not taught in this course (e.g., L1-Norm Based NMF, Hypersurface Cost

Based NMF, L1-Norm Regularized Robust NMF, and L2,1-Norm Based

NMF).

• For each algorithm, you need to describe the deffnition of the objective

function as well as the optimization methods used in your implementation.

2.

You need to analyze the robustness of each algorithm on two datasets:

• You are allowed to design your own data preprocessing method (if necessary).

You need to use a block-occlusion noise similar to those shown in Figure

1. The noise is generated by setting the pixel values to be 255 in the

block. You can design your own value for b (not neccessary to be 10, 12

or 14). You are also encouraged to design your own noise other than

the block-occlusion noise.

2• You need to demonstrate each type of noise used in your experiment

(show the original image as well as the image contaminated by noise).

• You should carefully choose the NMF algorithms and design experiment

settings to clearly show the different robustness of the algorithms you

have implemented.

3. You are only allowed to use the python standard library, numpy and

scipy (if necessary) to implement NMF algorithms.

2.3 Programming and External Libraries

This assignment is required to be ffnished by Python3. When you implement

NMF algorithms, you are not allowed to use external libraries which contains

NMF implementations, such as scikit-learn, and Nimfa (i.e., you have to implement

the NMF algorithms by yourself). You are allowed to use scikit-learn

for evaluation only (please ffnd more details in assignment1.ipynb). If you have

any ambiguity whether you can use a particular library or a function, please post

on canvas under the ”Assignment 1” thread.

2.4 Evaluate metrics

To compare the performance and robustness of different NMF algorithms, we provide

three evaluation metrics: (1) Relative Reconstruction Errors; (2) Average

Accuracy (optional); (3) Normalized Mutual Information (optional). For all

experiments, you need to use at least one metric, i.e., Relative Reconstruction

Errors. You are encouraged to use the other two metrics, i.e., Average

Accuracy and Normalized Mutual Information.

• Relative Reconstruction Errors (RRE): let V denote the contaminated

dataset (by adding noise), and Vˆ denote the clean dataset. Let W and H

denote the factorization results on V , the relative reconstruction errors

then can be deffned as follows:

RRE =

∥Vˆ − WH∥F

∥Vˆ ∥F

. (1)

• Average Accuracy: Let W and H denote the factorization results on

V , you need to perform some clustering algorithms (i.e., K-means) with

num clusters equal to num classes. Each example is assigned with the

cluster label (please ffnd more details in assignment1.ipynb). Lastly, you

3can evaluate the accuracy of predictions Ypred as follows:

Acc(Y, Ypred) =

1

n

Xn

i=1

1{Ypred(i) == Y (i)}.

• Normalized Mutual Information (NMI):

NMI(Y, Ypred) =

2I(Y, Ypred)

H(Y ) + H(Ypred)

,

where I(·, ·) is mutual information and H(·) is entropy.

Note: we expect you to have a rigorous performance evaluation. To provide

an estimate of the performance of the algorithms in the report, you can repeat

multiple times (e.g., 5 times) for each experiment by randomly sampling 90% data

from the whole dataset, and average the metrics on different subset. You are also

required to report the standard deviations.

3 Report

The report should be organized similar to research papers, and should contain the

following sections:

• In abstract, you should brieffy introduce the topic of this assignment and

describe the organization of your report.

• In introduction, you should ffrst introduce the main idea of NMF as well

as its applications. You should then give an overview of the methods you

want to use.

• In related work, you are expected to review the main idea of related NMF

algorithms (including their advantages and disadvantages).

• In methods, you should describe the details of your method (including

the deffnition of cost functions as well as optimization steps). You should

also describe your choices of noise and you are encouraged to explain the

robustness of each algorithm from theoretical view.

• In experiment, ffrstly, you should introduce the experimental setup (e.g.,

datasets, algorithms, and noise used in your experiment for comparison).

Second, you should show the experimental results and give some comments.

• In conclusion, you should summarize your results and discuss your insights

for future work.

4• In reference, you should list all references cited in your report and formatted

all references in a consistent way.

The layout of the report:

• Font: Times New Roman; Title: font size 14; Body: font size 12

• Length: Ideally 10 to 15 pages - maximum 20 pages

Note: Submissions must be typeset in LaTex using the provided template.

4 Submissions

Detailed instructions are as follows:

1. The submission contains two parts: report and source code.

(a) report (a pdf ffle): the report should include each member’s details

(student id and name).

(b) code (a compressed folder)

i. algorithm (a sub-folder): your code could be multiple ffles.

ii. data (an empty sub-folder): although two datasets should be inside

the data folder, please do not include them in the zip ffle. We will

copy two datasets to the data folder when we test the code.

2. The report (ffle type: pdf) and the codes (ffle type: zip) must be named

as student ID numbers of all group members separated by underscores. For

example, “xxxxxxxx xxxxxxxx xxxxxxxx.pdf”.

3. OOnly one student needs to submit your report (ffle type: pdf) to Assignment

1 (report) and upload your codes (ffle type: zip) to Assignment 1

(codes).

4. Your submission should include the report and the code. A plagiarism

checker will be used.

5. You need to clearly provide instructions on how to run your code in the

appendix of the report.

6. You need to indicate the contribution of each group member.

7. A penalty of minus 5 (5%) marks per each day after due (email late submissions

to TA and conffrm late submission dates with TA). Maximum delay is

10 days, after that assignments will not be accepted.

55 Marking scheme

Category Criterion Marks Comments

Report [80]

Abstract [3]

•problem, methods, organization.

Introduction [5]

•What is the problem you intend to solve?

•Why is this problem important?

Previous work [6]

•Previous relevant methods used in literature?

Methods [25]

•Pre-processing (if any)

•NMF Algorithm’s formulation.

•Noise choice and description.

Experiments and Discussions [25]

•Experiments, comparisons and evaluation

•Extensive analysis and discussion of results

•Relevant personal reflection

Conclusions and Future work [3]

•Meaningful conclusions based on results

•Meaningful future work suggested

Presentation [5]

•Grammatical sentences, no spelling mistakes

•Good structure and layout, consistent formatting

•Appropriate

citation and referencing

•Use graphs and tables to summarize data

Other [8]

•At the discretion of the marker: for impressing

the marker, excelling expectation,

etc. Examples include clear presentation,

well-designed experiment, fast code, etc.

6Category Criterion Marks Comments

Code [20]

•Code runs within a feasible time

•Well organized, commented and documented

Penalties [−]

•Badly written code: [−20]

•Not including instructions on how to run

your code: [−20]

Note: Marks for each category is indicated in square brackets. The minimum mark for the assignment will be 0 (zero).

7


版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp