COMP61021: Modelling and Visualisation of High Dimensional Data
Lab 3: Self-Organizing Map Implementation and Application (Assessed)
This coursework (a zipped file) must be submitted via Blackboard. The deadline for this
lab exercise is 23:30 on 26th November 2019. The late submission policy applies (see the
teaching website and FAQs for details).
Kohonen’s Self-Organizing Map (SOM) is a biologically inspired unsupervised learning method that
can be used to learn and visualise the topological nature of high dimensional data. In this exercise,
you are asked to use Matlab to implement and make use of SOMs to visualize 2D and 3D data
sets. You will then build a SOM which is capable of finding similar images in a collection of images.
You can download the relevant Matlab code and the image data set from
http://syllabus.cs.manchester.ac.uk/pgt/COMP61021/Lab/lab2.zip
After unzipping it, you will find three zipped packages: lab2_Part1.zip and lab2_Part2.zip, which
contain the Matlab code for Parts 1 and 2, and Image.zip, which contains the image data set used
for Parts 2 and 3. To use the functions contained in these zipped packages, you need
to unzip them to a location such as C:\COMP61021\som and then use the Matlab command
addpath('C:\COMP61021\som'), which indicates to Matlab where your code is.
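The setup step above can be sketched as follows. This is a minimal sketch, assuming lab2_Part1.zip has already been extracted from lab2.zip into the current folder; the folder under tempdir is a hypothetical, platform-independent stand-in for C:\COMP61021\som.

```matlab
% Hypothetical location for the lab code; adjust to match your machine.
codeDir = fullfile(tempdir, 'COMP61021', 'som');
if ~exist(codeDir, 'dir')
    mkdir(codeDir);                      % creates missing parent folders as well
end

% Extract the Part 1 package if it is present in the current folder.
if exist('lab2_Part1.zip', 'file')
    unzip('lab2_Part1.zip', codeDir);
end

addpath(codeDir);                        % tell Matlab where the code is

% which returns the full path of lab_som.m if Matlab can see it, and an
% empty string otherwise (for example if the zip was not found above).
disp(which('lab_som'));
```

If the last line prints nothing, the package was probably unzipped into a sub-folder; in that case addpath(genpath(codeDir)) adds the sub-folders too.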
Now… the lab exercise ...
PART 1 – SOM Implementation
The purpose of this exercise is to demonstrate that you understand the details of the SOM
algorithm. For Part 1, you need to complete the unfinished one-dimensional SOM implementation
in the file lab_som.m. [3 Marks] Comments in the file indicate what you need to add to complete
the functionality. You should add your own comments to the code to explain the details
of your implementation. To test your implementation, you can use the following commands:
1. data=nicering;
2. som=lab_som(data, numNeurons, steps, learningRate, radius);
3. lab_vis(som, data);
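For reference, the textbook form of the Kohonen update for a one-dimensional chain is written out below. It is a sketch of the standard formulation, not a statement of what the skeleton in lab_som.m requires: the Gaussian neighbourhood and the time-dependent schedules are assumptions, and the comments in the file remain authoritative. Here $\eta(t)$ plays the role of learningRate and $\sigma(t)$ the role of radius.

```latex
% Best-matching unit (winner) for the input x(t), over units j = 1..numNeurons
c = \arg\min_{j} \, \lVert \mathbf{x}(t) - \mathbf{w}_j(t) \rVert

% Gaussian neighbourhood on the chain, where |c - j| is the distance along the chain
h_{cj}(t) = \exp\!\left( -\frac{(c - j)^2}{2\,\sigma(t)^2} \right)

% Weight update applied to every unit j
\mathbf{w}_j(t+1) = \mathbf{w}_j(t) + \eta(t)\, h_{cj}(t)\, \bigl( \mathbf{x}(t) - \mathbf{w}_j(t) \bigr)
```

Both $\eta(t)$ and $\sigma(t)$ are usually decreased over the training steps so that the map first orders itself globally and then fine-tunes locally.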
You should design and document an experiment to find suitable hyper-parameters for the function
lab_som. Your results should show that the inter-connected SOM units, represented by yellow
points and red lines, approximate the shape of the ring accurately using a chain. [2 Marks]
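One possible numeric criterion for comparing hyper-parameter settings, alongside the visual check, is the quantization error: the mean distance from each data point to its closest SOM unit. The handout does not prescribe this measure. The sketch below uses synthetic stand-ins so that it runs without the lab files, and it assumes that both the data and the SOM weights are stored one row per point or unit.

```matlab
% Synthetic stand-ins so the snippet runs without the lab files (assumptions).
rng(0);                                        % reproducible noise
theta = 2*pi*rand(500, 1);
data  = [cos(theta), sin(theta)] + 0.05*randn(500, 2);   % noisy ring, N-by-2

% Stand-in for a trained chain: 20 units spaced evenly on the unit circle.
phi = linspace(0, 2*pi, 21)';
phi(end) = [];                                 % drop the duplicate of angle 0
som = [cos(phi), sin(phi)];                    % numNeurons-by-2 (assumed layout)

% Squared distance from every data point to every unit (N-by-numNeurons).
d2 = bsxfun(@plus, sum(data.^2, 2), sum(som.^2, 2)') - 2*(data*som');
d2 = max(d2, 0);                               % guard against tiny negative round-off

% Quantization error: mean distance from each point to its closest unit.
qe = mean(sqrt(min(d2, [], 2)));
fprintf('quantization error = %.4f\n', qe);
```

In the lab setting, data would come from nicering and som from a call to lab_som for each candidate setting, provided lab_som returns one row of weights per unit. A low quantization error alone does not guarantee a well-ordered chain, so it complements the plot from lab_vis rather than replacing it.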
You should now complete the SOM implementation in lab_som2d.m, which has similar
functionality to lab_som.m but makes use of a two-dimensional grid instead of a chain. [3 Marks]
You should again investigate the best hyper-parameters for the SOM and visualize your results by
using the following commands:
1. data=nicering;
2. [som,grid]=lab_som2d(data, width, height, steps, learningRate, radius);
3. lab_vis2d(som, grid, data);
This time your visualisation should show a mesh that approximates the shape of the ring. [2 Marks]
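The key difference between the chain and the two-dimensional grid is that neighbourhood distance is measured on the lattice rather than along a line. The sketch below illustrates this for a small grid. The grid size, the winning unit and the Gaussian neighbourhood are example assumptions, and the grid returned by lab_som2d may store its coordinates differently.

```matlab
width = 6;  height = 4;                        % hypothetical grid size

% Lattice coordinates of every unit, one row per unit: [column, row].
[gx, gy] = meshgrid(1:width, 1:height);
gridPos  = [gx(:), gy(:)];                     % (width*height)-by-2

bmu    = 10;                                   % index of an example winning unit
radius = 1.5;                                  % example neighbourhood width

% Squared lattice distance from the winner to every unit.
dLat2 = sum(bsxfun(@minus, gridPos, gridPos(bmu, :)).^2, 2);

% Gaussian neighbourhood strength: 1 at the winner, decaying with lattice distance.
h = exp(-dLat2 / (2*radius^2));

disp(reshape(h, height, width));               % strengths laid out on the grid
```

Units close to the winner on the lattice are pulled strongly towards the input, whatever their current position in data space, which is what makes the mesh unfold over the ring.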
All the hyper-parameters used in experiments and main results must appear explicitly in
your report.
PART 2 – Image clustering with SOM
The purpose of this part is to show you a practical application of SOM. In Part 2, you will use a
sophisticated SOM Toolbox (see http://www.cis.hut.fi/projects/somtoolbox/documentation/ for
details), which provides various methods for initializing, training and visualizing SOMs. The toolbox
is included in lab2_Part2.zip, so there is no need to download it. The aim is to train a SOM that
can be used to find images that are similar to each other. You are provided with 172 images in
Image.zip, with which your SOM will be trained. Images are very large pieces of data with many
redundant components and are usually unsuitable to be fed directly into a machine learning algorithm.
Therefore, a feature extraction/selection process is often necessary before applying a machine
learning algorithm. To extract features that can be used for training, run the following command:
[imgs,training]=lab_featuresets('x:\My\Path\To\Images\', -1);
This returns the array of images, imgs, and the training data extracted from them,
training. You should then run the command som_gui to launch the training interface. The GUI
allows you to import the training data training from the workspace, train a SOM and visualize its
structure. You can experiment with the various training parameters.
After you have trained your SOM, you should save it to the workspace with a name such as som
using Load/Save->Save Map. After saving, you can visualise the U-Matrix for the SOM using the
command som_show(som, 'umat', 'all'). You should explain what the U-matrix represents
in general and what it says about the SOM you have just trained. [1 Mark]
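To make the quantity behind the plot concrete, the sketch below computes a simplified U-matrix for a small rectangular map: for each unit, the mean distance between its weight vector and those of its lattice neighbours. It is an illustration under stated assumptions (a synthetic codebook with two well-separated halves and a 4-connected rectangular lattice) and is not the SOM Toolbox's own computation.

```matlab
rng(1);
height = 5;  width = 8;  dim = 3;

% Hypothetical codebook: left half of the map near 0, right half near 1,
% so a ridge of large distances should appear between columns 4 and 5.
W = 0.05*randn(height, width, dim);
W(:, 5:end, :) = W(:, 5:end, :) + 1;

U   = zeros(height, width);                    % summed distances to neighbours
cnt = zeros(height, width);                    % number of neighbours per unit

% Distances between horizontally adjacent units: height-by-(width-1).
dh = sqrt(sum((W(:, 1:end-1, :) - W(:, 2:end, :)).^2, 3));
U(:, 1:end-1) = U(:, 1:end-1) + dh;   cnt(:, 1:end-1) = cnt(:, 1:end-1) + 1;
U(:, 2:end)   = U(:, 2:end)   + dh;   cnt(:, 2:end)   = cnt(:, 2:end)   + 1;

% Distances between vertically adjacent units: (height-1)-by-width.
dv = sqrt(sum((W(1:end-1, :, :) - W(2:end, :, :)).^2, 3));
U(1:end-1, :) = U(1:end-1, :) + dv;   cnt(1:end-1, :) = cnt(1:end-1, :) + 1;
U(2:end, :)   = U(2:end, :)   + dv;   cnt(2:end, :)   = cnt(2:end, :)   + 1;

U = U ./ cnt;                                  % mean distance to lattice neighbours
disp(round(U, 2));
```

In this synthetic case the large values sit in columns 4 and 5, along the boundary between the two groups of units, while the remaining columns stay small.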
You can then run the command lab_showsimilar(imgs,training,som.codebook,1), which
will allow you to click through various sets of images that the SOM considers similar. It is expected that
your results will not be perfect and there will be quite a lot of invalid matches. However, your
results should produce matches that are better than using the function lab_showsimilar on an
untrained SOM. You should experiment with the parameters in the function som_gui and report on
what you found to be the best in your own experiment. [1 Mark].
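One simple way to think about similarity under a SOM is that items sharing a best-matching unit fall into the same group. The sketch below illustrates that idea with synthetic stand-ins for training and som.codebook, assuming one row per feature vector and per unit. It is not a description of how lab_showsimilar works internally.

```matlab
rng(2);
% Stand-ins (assumptions): 12 feature vectors and a 4-unit codebook, one row each.
training = [0.05*randn(6, 5); 1 + 0.05*randn(6, 5)];
codebook = [zeros(1, 5); 0.3*ones(1, 5); 0.7*ones(1, 5); ones(1, 5)];

% Squared distance from every feature vector to every unit, then the winner per row.
d2 = bsxfun(@plus, sum(training.^2, 2), sum(codebook.^2, 2)') - 2*(training*codebook');
[~, bmu] = min(d2, [], 2);

% Items that share a best-matching unit form one candidate group of similar items.
for u = 1:size(codebook, 1)
    members = find(bmu == u)';
    if ~isempty(members)
        fprintf('unit %d: items %s\n', u, mat2str(members));
    end
end
```

With an untrained codebook the groups are essentially arbitrary, which is why comparing against an untrained SOM is a useful baseline.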
All the hyper-parameters used in these experiments and the main results, e.g., the U-Matrix and
the clusters you observed from it, must be described in your report. You are
encouraged to highlight the clustering results by annotating the U-Matrix directly if possible.
PART 3 – Bonus marks
Additional marks are available for students who investigate feature extraction further. In
lab2_Part2, the file lab_features.m contains a function that extracts various types of features
from an image. This function is called automatically for each image in the image directory when the
function lab_featuresets is run. You should read the comments in the code and attempt to
implement some of the missing features. You should then test whether you achieve better results
or not with your new features. Alternatively, you may be able to find a method that is more
effective than those proposed in lab_features.m. Full marks can be awarded only if you can
justify, with sufficient evidence, why your approach succeeds and your result is significantly
better than that obtained with the default code given in Part 2.
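As an illustration of what an image feature can look like, the sketch below computes a normalised per-channel colour histogram using base Matlab only. It is a generic example. The random image and the bin count are assumptions, and whether lab_features.m already proposes something similar is not stated in this handout.

```matlab
rng(3);
img = uint8(randi([0 255], 32, 48, 3));        % stand-in for an RGB image from imread

nBins = 8;                                     % bins per colour channel (example value)
feat  = zeros(1, 3*nBins);                     % one feature vector per image

for c = 1:3
    channel = double(img(:, :, c));
    bin     = floor(channel(:) * nBins / 256) + 1;        % bin index in 1..nBins
    counts  = accumarray(bin, 1, [nBins, 1])';            % pixels per bin
    feat((c-1)*nBins + (1:nBins)) = counts / numel(bin);  % each channel sums to 1
end

disp(feat);
```

Normalising by the pixel count makes the feature independent of image size, which matters when the images in a collection differ in resolution.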
DELIVERABLES
A zipped file, named “yourname-lab2.zip”, including a report in PDF format (at most two
single-sided A4 pages in 11pt font, plus one additional page that may be used only for Part 3
if you attempt it), all relevant source code for Part 1 and a readme.txt file in plain-text
format. The zipped file must be submitted via Blackboard.
In your report, you should give the key points of your implementation and results for Part 1, a full
description of your observations for Part 2 and, if you attempt Part 3, your feature
extraction/selection method and its justification, together with any graphs that you think represent
your achievements for the different parts. For Part 2, you should describe the hyper-parameters used
in your experiments, the U-matrix and the observed clusters in the report, without including the
images or the system in the zip file.
Your readme.txt file must contain a step-by-step procedure for Parts 1 and 2 so that a marker
can follow your instructions to run your submitted code and the SOM system used in Part 2
straightforwardly for replicating the results described in your report.
Note that we are not interested in the details of your code, what Matlab functions are called, what
they return etc. This course unit is about machine learning algorithms, and is indifferent to how you
program them in Matlab.
There is no specific format – marks will be allocated roughly on the basis of:
• rigorous experimentation
• how informative and well presented your results are in your report
• imagination/research/understanding/performance in Part 3 (if attempted)
• grammar, ease of reading
The lab is marked out of 15:
Part 1 – Implementation 10 marks
Part 2 – Image clustering 2 marks
Part 3 – Bonus 3 marks
Marks and feedback will be available on Blackboard. Once the marking is completed,
you will be notified via email.