Natural Language Engineering:
Assessed Coursework 2
Submission format: You should submit one file that should either be a Python notebook
or a zip file containing a Python notebook and any other files (e.g., images
or Python files) that you want to include in the notebook.
Due date: Your work should be submitted on the module’s Canvas site before 4pm
on Thursday 26th November. This is Thursday of week 9. The standard late
penalties apply.
Return date: Marks and feedback will be provided on Canvas on Thursday December
17th for all submissions that are submitted by the due date.
Weighting: This assessment contributes 20% of the mark for the module.
Overview
For this assignment you are asked to complete a Python notebook (‘NLEassignment2.ipynb’)
which is provided with these guidelines. It is based on activities that you have already
completed in labs during weeks 1-7 of the module. Any code you have developed
during the labs can be submitted as part of your answers to the questions in the assignment.
To score highly on this assignment you will need to demonstrate that you:
• understand the theory and your code;
• can write and document high-quality Python code;
• can develop code further to solve related problems;
• can carry out experiments and display results in a coherent way;
• can analyse and interpret results; and
• can draw conclusions and understand limitations of the technology.
For this report you should submit a single Python notebook containing all of your answers
to all of the questions in ‘NLEassignment2.ipynb’. You may import from standard
libraries and the ‘sussex_nltk’ resources which you have been provided with. If
you wish to import any other code, it must be included in a zip file with your notebook.
It must be possible for the assessors to run your Python notebook.
Marking Criteria and Requirements
Your submission will be marked out of 100. The assignment is broken down
into 5 parts. All parts should be answered, and the breakdown of marks between parts
is specified in the notebook. General and part-specific criteria are given below. Please
read these guidelines carefully and ask if you have any questions.
General: 20 marks available
20 marks are available for the overall quality of your assignment. When awarding
these marks the following general guidelines will be considered.
• In order to avoid misconduct, you should not talk about these coursework
questions with your peers. If you are not sure what a question is asking
you to do or have any other questions, please ask me or one of the Teaching
Assistants.
• Your report should be no more than 2000 words in length excluding code
and the content of graphs, tables and any references.
• You should specify the length of your report. 2000 is a strict limit.
• You should use a formal writing style.
• All graphs should have a title and have each axis clearly labelled.
• In all parts, marks will be awarded for the quality of your written answers
as well as your code.
• Written / textual answers MUST be included in Markdown cells. Otherwise,
you may score 0 for these answers.
• Code on its own does not count as an explanation or a discussion. Nor do
code comments. Code should be commented but explanation and discussion
MUST be given as text in Markdown cells (see previous point!).
• Do not add external text (e.g. code, output) as images.
• Your code must be applied to and your explanations must refer to the unique
set of examples generated by entering your candidate number at the top of
the notebook. This must be your own candidate number. Otherwise you
may score 0.
• You should submit your notebook with the code having been run (i.e., with
the output displayed rather than cleared).
• It must be possible for the assessors to run your Python notebook.
Part 1: 10 marks available
Run generate_features(sentences[:5]). With reference to the code and
the specific examples, explain how the output was generated. [10 marks]
The following breakdown of marks will be applied:
• Correct general explanation [2 marks]
• Correct explanation which refers to examples in the output [4 marks]
• Correct explanation which refers to steps in the code [4 marks]
Part 2: 10 marks available
Write code to find the 1000 most frequently occurring words that are
in your sample AND have at least one noun sense according to WordNet.
[10 marks]
The following breakdown of marks will be applied:
• Clear and effective use of code to find most frequently occurring words in
sample [3 marks]
• Clear and effective use of code to identify words with at least one noun
sense in WordNet [3 marks]
• Clear and effective use of code to combine the conditions and display the
required words [4 marks]
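As an illustration only, the combination of the two conditions in Part 2 might be sketched as follows. This is a minimal sketch, not a model answer: the function name, the `has_noun_sense` parameter and the toy data are all hypothetical, and it assumes the reading "filter by noun sense, then keep the n most frequent". In a real notebook the predicate could wrap a WordNet lookup such as NLTK's `wordnet.synsets(word, pos=wordnet.NOUN)`. A small stand-in set is used here so that the example runs without any corpus data.

```python
from collections import Counter


def top_words_with_noun_sense(tokens, has_noun_sense, n=1000):
    """Return up to n (word, count) pairs, most frequent first, keeping
    only words for which has_noun_sense(word) is true."""
    # Lower-case and drop tokens that are not purely alphabetic
    # (an assumption about normalisation, not a requirement of the brief).
    counts = Counter(t.lower() for t in tokens if t.isalpha())
    # most_common() with no argument lists every word, most frequent first.
    ranked = counts.most_common()
    return [(word, count) for word, count in ranked if has_noun_sense(word)][:n]


# Hypothetical stand-in for a WordNet noun lookup.
TOY_NOUNS = {"bank", "share", "market"}
sample = ("the bank sold the share and the bank sold another share "
          "as the bank left the market").split()

top_nouns = top_words_with_noun_sense(sample, lambda w: w in TOY_NOUNS, n=2)
print(top_nouns)
```

Passing the noun test in as a function keeps the frequency logic independent of the lexical resource, which makes each condition easy to check separately.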
Part 3: 20 marks available
Consider the code above which outputs the path similarity score, the
Resnik similarity score and the Lin similarity score for a pair of concepts
in WordNet. Answer the following questions. [20 marks]
The following breakdown of marks will be applied:
Part a: Clear explanation of each of the similarity scores and what the number calculated
means [6 marks]
Part b: Clear and effective use of code to find the semantic similarity of a pair of
words [2 marks]
Part b: Clear and effective use of code to find semantic similarity with a parameter
to specify the measure of semantic similarity [2 marks]
Part b: Explanation and justification of the strategy used for words which have
multiple senses [2 marks]
Part c: Clear and effective use of code to find semantic similarity of every pair of
words [4 marks]
Part c: Justification of choice of semantic similarity measure [1 mark]
Part d: Clear and effective use of code to identify the 10 most similar words to the
most frequent word in the corpus [3 marks]
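One possible shape for the Part b function is sketched below, purely as an illustration. It assumes a "maximum over all sense pairs" strategy for words with multiple senses, which is only one of several defensible choices. The taxonomy, sense inventory and all names (`PARENT`, `SENSES`, `MEASURES`, `word_similarity`) are hypothetical stand-ins for WordNet. Only a path-based measure is implemented, because the Resnik and Lin measures additionally need information-content statistics from a corpus.

```python
from itertools import product

# Toy hypernym links (child -> parent); a hypothetical stand-in for WordNet.
PARENT = {
    "dog": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal",
    "bat_animal": "mammal", "mammal": "entity",
    "bat_club": "artifact", "artifact": "entity",
}
# Toy sense inventory: each word maps to the concepts it can denote.
SENSES = {"dog": ["dog"], "cat": ["cat"], "bat": ["bat_animal", "bat_club"]}


def ancestors(concept):
    """Map the concept and each of its ancestors to its distance in edges."""
    distances, steps = {}, 0
    while concept is not None:
        distances[concept] = steps
        concept = PARENT.get(concept)
        steps += 1
    return distances


def path_similarity(a, b):
    """1 / (1 + length of the shortest path between a and b)."""
    up_a, up_b = ancestors(a), ancestors(b)
    shared = set(up_a) & set(up_b)
    if not shared:
        return 0.0
    shortest = min(up_a[c] + up_b[c] for c in shared)
    return 1.0 / (1.0 + shortest)


# Further measures could be registered here under their own names.
MEASURES = {"path": path_similarity}


def word_similarity(word_a, word_b, senses=SENSES, measure="path"):
    """Similarity of two words: the best score over all pairs of senses."""
    sim = MEASURES[measure]
    pairs = product(senses.get(word_a, []), senses.get(word_b, []))
    return max((sim(a, b) for a, b in pairs), default=0.0)


print(word_similarity("dog", "cat"), word_similarity("dog", "bat"))
```

In this toy data "bat" scores higher against "dog" through its animal sense than through its club sense, so the maximum picks out the sense most relevant to the comparison. Whatever strategy is chosen should be justified in the written answer.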
Part 4: 15 marks available
The construction and use of distributional vector representations to
find similar words [15 marks]
The following breakdown of marks will be applied:
Part a: Clear and effective use of code to construct distributional vector representations
of words in the corpus with a parameter to specify context size. [4
marks]
Part a: Clear and correct explanation of how you calculate the value of association
between each word and each context feature [4 marks]
Part b: Correct use of code to construct representations of the 1000 words identified
in Q2 with a window size of 1 [3 marks]
Part c: Clear and correct use of code and representations to find the 10 words which
are distributionally most similar to the most frequent word in the corpus. [4
marks]
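The steps named in Part 4 can be sketched as follows. This is a minimal illustration under stated assumptions: context features are the words within a symmetric window, association is measured with positive pointwise mutual information (PPMI), and similarity is the cosine of the weighted vectors. None of these choices is mandated by the brief, and all function names and the toy sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=1):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sentence[j]] += 1
    return vectors


def ppmi(vectors):
    """Re-weight raw counts as PMI = log2(P(w, f) / (P(w) P(f))), keeping
    only positive values."""
    word_totals = {w: sum(feats.values()) for w, feats in vectors.items()}
    feature_totals = Counter()
    for feats in vectors.values():
        feature_totals.update(feats)
    grand_total = sum(word_totals.values())
    weighted = {}
    for w, feats in vectors.items():
        weighted[w] = {}
        for f, count in feats.items():
            pmi = math.log2(count * grand_total
                            / (word_totals[w] * feature_totals[f]))
            if pmi > 0:
                weighted[w][f] = pmi
    return weighted


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def most_similar(target, vectors, k=10):
    """The k words whose vectors have the highest cosine with the target's."""
    scores = [(cosine(vectors[target], vec), w)
              for w, vec in vectors.items() if w != target]
    return [w for _, w in sorted(scores, reverse=True)[:k]]


toy = [["the", "cat", "sat"], ["the", "dog", "sat"]]
raw = cooccurrence_vectors(toy, window=1)
weighted = ppmi(raw)
print(most_similar("cat", weighted, k=1))
```

Because "cat" and "dog" occur in identical contexts in the toy sentences, their vectors coincide and "dog" is returned as the nearest neighbour of "cat".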
Part 5: 25 marks available
Plan and carry out an investigation into the correlation between semantic
similarity according to WordNet and distributional similarity
with different context window sizes. You should make sure that you
include a graph of how correlation varies with context window size
and that you discuss your results. [25 marks]
The following breakdown of marks will be applied:
• Description of plan of how to carry out the investigation [5 marks]
• Clear and effective use of code to carry out the investigation [3 marks]
• Correct calculation of correlation between WordNet similarity and distributional
similarity for at least one context window size [4 marks]
• Correct calculation of correlation between WordNet similarity and distributional
similarity for different window sizes [3 marks]
• Presentation of results [5 marks]
• Discussion of results / conclusions [5 marks]
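The core calculation in Part 5 might be organised as in the sketch below. It assumes Spearman rank correlation, which is one reasonable choice when the two similarity scales are not directly comparable; the brief does not prescribe a coefficient. The helper names, the two similarity callables and the toy scores are hypothetical placeholders for whatever WordNet and distributional functions are actually used.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def correlation_by_window(pairs, wordnet_sim, dist_sim, window_sizes):
    """Map each window size to the correlation between the two similarity
    scores over the same list of word pairs."""
    reference = [wordnet_sim(a, b) for a, b in pairs]
    return {w: spearman(reference, [dist_sim(a, b, w) for a, b in pairs])
            for w in window_sizes}


# Toy scores standing in for real similarity functions.
pairs = [("a", "b"), ("a", "c"), ("b", "c")]
toy_wordnet = {pairs[0]: 0.9, pairs[1]: 0.5, pairs[2]: 0.1}
toy_dist = {1: {pairs[0]: 0.8, pairs[1]: 0.4, pairs[2]: 0.2},
            2: {pairs[0]: 0.1, pairs[1]: 0.5, pairs[2]: 0.9}}

results = correlation_by_window(
    pairs,
    lambda a, b: toy_wordnet[(a, b)],
    lambda a, b, w: toy_dist[w][(a, b)],
    window_sizes=[1, 2],
)
print(results)
```

The resulting dictionary pairs each window size with one coefficient, so it can be plotted directly with window size on the horizontal axis and correlation on the vertical axis.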