
Date: 2020-11-19 11:33

Natural Language Engineering:

Assessed Coursework 2

Submission format: You should submit one file, which should be either a Python notebook or a zip file containing a Python notebook and any other files (e.g., images or Python files) that you want to include in the notebook.

Due date: Your work should be submitted on the module’s Canvas site before 4pm on Thursday 26th November. This is Thursday of week 9. The standard late penalties apply.

Return date: Marks and feedback will be provided on Canvas on Thursday 17th December for all submissions that are submitted by the due date.

Weighting: This assessment contributes 20% of the mark for the module.

Overview

For this assignment you are asked to complete a Python notebook (‘NLEassignment2.ipynb’) which is provided with these guidelines. It is based on activities that you have already completed in labs during weeks 1-7 of the module. Any code you have developed during the labs can be submitted as part of your answers to the questions in the assignment.

To score highly on this assignment you will need to demonstrate that you:

• understand the theory and your code;
• can write and document high quality Python code;
• can develop code further to solve related problems;
• can carry out experiments and display results in a coherent way;
• can analyse and interpret results; and
• can draw conclusions and understand limitations of the technology.

For this report you should submit a single Python notebook containing all of your answers to all of the questions in ‘NLEassignment2.ipynb’. You may import from standard libraries and the ‘sussex_nltk’ resources which you have been provided with. If you wish to import any other code, it must be included in a zip file with your notebook. It must be possible for the assessors to run your Python notebook.

Marking Criteria and Requirements

Your submission will be marked out of 100. The assignment question is broken down into 5 parts. All parts should be answered, and the breakdown of marks between parts is specified in the notebook. General and part-specific criteria are given below. Please read these guidelines carefully and ask if you have any questions.

General: 20 marks available

20 marks are available for the overall quality of your assignment. When awarding these marks, the following general guidelines will be considered.

• In order to avoid misconduct, you should not talk about these coursework questions with your peers. If you are not sure what a question is asking you to do or have any other questions, please ask me or one of the Teaching Assistants.
• Your report should be no more than 2000 words in length excluding code and the content of graphs, tables and any references.
• You should specify the length of your report. 2000 is a strict limit.
• You should use a formal writing style.
• All graphs should have a title and have each axis clearly labelled.
• In all parts, marks will be awarded for the quality of your written answers as well as your code.
• Written / textual answers MUST be included in Markdown cells. Otherwise, you may score 0 for these answers.
• Code on its own does not count as an explanation or a discussion. Nor do code comments. Code should be commented but explanation and discussion MUST be given as text in Markdown cells (see previous point!).
• Do not add external text (e.g. code, output) as images.
• Your code must be applied to and your explanations must refer to the unique set of examples generated by entering your candidate number at the top of the notebook. This must be your own candidate number. Otherwise you may score 0.
• You should submit your notebook with the code having been run (i.e., with the output displayed rather than cleared).
• It must be possible for the assessors to run your Python notebook.

Part 1: 10 marks available

Run generate_features(sentences[:5]). With reference to the code and the specific examples, explain how the output was generated. [10 marks]

The following breakdown of marks will be applied:

• Correct general explanation [2 marks]
• Correct explanation which refers to examples in the output [4 marks]
• Correct explanation which refers to steps in the code [4 marks]

Part 2: 10 marks available

Write code to find the 1000 most frequently occurring words that are in your sample AND have at least one noun sense according to WordNet. [10 marks]

The following breakdown of marks will be applied:

• Clear and effective use of code to find most frequently occurring words in sample [3 marks]
• Clear and effective use of code to identify words with at least one noun sense in WordNet [3 marks]
• Clear and effective use of code to combine the conditions and display the required words [4 marks]
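As a rough illustration of how the two conditions in Part 2 might be combined, the sketch below counts tokens and then keeps the most frequent ones that pass a noun-sense check. It is not taken from the assignment notebook: the function name `top_noun_words`, the `has_noun_sense` callback and the toy data are all hypothetical, and it assumes one reading of the task (the top n among words that have a noun sense).

```python
from collections import Counter

def top_noun_words(tokens, has_noun_sense, n=1000):
    """Return up to n (word, frequency) pairs, most frequent first,
    keeping only words for which has_noun_sense(word) is true."""
    # Lower-case and drop non-alphabetic tokens (an assumed normalisation).
    counts = Counter(t.lower() for t in tokens if t.isalpha())
    selected = []
    # most_common() yields words in descending order of frequency.
    for word, freq in counts.most_common():
        if has_noun_sense(word):
            selected.append((word, freq))
            if len(selected) == n:
                break
    return selected

# Toy stand-in for a WordNet lookup. With NLTK and its WordNet data
# installed, one could instead pass:
#   lambda w: bool(wordnet.synsets(w, pos=wordnet.NOUN))
toy_nouns = {"bank", "market", "share"}
tokens = "the bank said the market fell as bank share prices fell".split()
result = top_noun_words(tokens, lambda w: w in toy_nouns, n=2)
print(result)
```

Passing the noun-sense test in as a function keeps the frequency logic independent of WordNet, so each condition can be checked separately before they are combined.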

Part 3: 20 marks available

Consider the code above which outputs the path similarity score, the Resnik similarity score and the Lin similarity score for a pair of concepts in WordNet. Answer the following questions. [20 marks]

The following breakdown of marks will be applied:

Part a: Clear explanation of each of the similarity scores and what the number calculated means [6 marks]
Part b: Clear and effective use of code to find the semantic similarity of a pair of words [2 marks]
Part b: Clear and effective use of code to find semantic similarity with a parameter to specify the measure of semantic similarity [2 marks]
Part b: Explanation and justification of the strategy used for words which have multiple senses [2 marks]
Part c: Clear and effective use of code to find semantic similarity of every pair of words [4 marks]
Part c: Justification of choice of semantic similarity measure [1 mark]
Part d: Clear and effective use of code to identify the 10 most similar words to the most frequent word in the corpus [3 marks]
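The idea of a word-level similarity function with a selectable measure can be sketched on a toy taxonomy. Everything below is a hypothetical stand-in for WordNet: the `HYPERNYM` links, the `SENSES` table and the function names are invented for illustration. Only path similarity is implemented, as 1 / (1 + shortest path length); Resnik and Lin additionally need corpus-derived information content and are left out. Taking the maximum over all sense pairs is one possible strategy for words with multiple senses, assumed here rather than prescribed by the assignment.

```python
# Toy hypernym links (child -> parent); illustrative only, not WordNet data.
HYPERNYM = {
    "dog": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal",
    "bat_animal": "mammal", "mammal": "animal", "animal": "entity",
    "bat_club": "implement", "implement": "artifact", "artifact": "entity",
}
# Toy sense inventory (word -> concepts it can denote).
SENSES = {"dog": ["dog"], "cat": ["cat"], "bat": ["bat_animal", "bat_club"]}

def distances_to_ancestors(concept):
    """Map the concept and each of its ancestors to its distance in edges."""
    distances, steps = {}, 0
    while concept is not None:
        distances[concept] = steps
        concept = HYPERNYM.get(concept)
        steps += 1
    return distances

def path_similarity(a, b):
    """1 / (1 + length of the shortest path between a and b in the tree)."""
    da, db = distances_to_ancestors(a), distances_to_ancestors(b)
    common = set(da) & set(db)
    if not common:
        return 0.0
    shortest = min(da[c] + db[c] for c in common)
    return 1.0 / (1.0 + shortest)

# Further measures could be registered here under their own keys.
MEASURES = {"path": path_similarity}

def word_similarity(w1, w2, measure="path"):
    """Score two words by their most similar pair of senses."""
    score = MEASURES[measure]
    return max(score(s1, s2) for s1 in SENSES[w1] for s2 in SENSES[w2])

print(word_similarity("dog", "cat"))
print(word_similarity("bat", "dog"))
```

Looking the measure up by name in a dictionary is one simple way to provide the parameter that Part b asks for.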

Part 4: 15 marks available

The construction and use of distributional vector representations to find similar words. [15 marks]

The following breakdown of marks will be applied:

Part a: Clear and effective use of code to construct distributional vector representations of words in the corpus with a parameter to specify context size. [4 marks]
Part a: Clear and correct explanation of how you calculate the value of association between each word and each context feature [4 marks]
Part b: Correct use of code to construct representations of the 1000 words identified in Q2 with a window size of 1 [3 marks]
Part c: Clear and correct use of code and representations to find the 10 words which are distributionally most similar to the most frequent word in the corpus. [4 marks]
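A minimal sketch of the kind of pipeline Part 4 describes is given below, under stated assumptions: co-occurrence counts within a symmetric window, positive pointwise mutual information (PPMI) as the association value, and cosine as the similarity. The assignment does not name a particular association or similarity measure, so these are illustrative choices, and all function names and the toy sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=1):
    """Count, for each word, the words appearing within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            start, end = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][sentence[j]] += 1
    return vectors

def ppmi(vectors):
    """Replace counts with positive PMI: max(0, log2(P(w,f) / (P(w) P(f))))."""
    word_totals = {w: sum(c.values()) for w, c in vectors.items()}
    feature_totals = Counter()
    for counts in vectors.values():
        feature_totals.update(counts)
    grand_total = sum(word_totals.values())
    weighted = {}
    for w, counts in vectors.items():
        weighted[w] = {}
        for f, count in counts.items():
            pmi = math.log2(count * grand_total / (word_totals[w] * feature_totals[f]))
            if pmi > 0:
                weighted[w][f] = pmi
    return weighted

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def most_similar(target, vectors, k=10):
    """Return the k words whose vectors are closest to the target's vector."""
    scores = [(w, cosine(vectors[target], vec)) for w, vec in vectors.items() if w != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

# Toy corpus to exercise the functions.
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]
weighted = ppmi(cooccurrence_vectors(sentences, window=1))
print(most_similar("cat", weighted, k=1))
```

In this toy corpus "cat" and "dog" occur in identical contexts, so each comes out as the other's nearest neighbour.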

Part 5: 25 marks available

Plan and carry out an investigation into the correlation between semantic similarity according to WordNet and distributional similarity with different context window sizes. You should make sure that you include a graph of how correlation varies with context window size and that you discuss your results. [25 marks]

The following breakdown of marks will be applied:

• Description of plan of how to carry out the investigation [5 marks]
• Clear and effective use of code to carry out the investigation [3 marks]
• Correct calculation of correlation between WordNet similarity and distributional similarity for at least one context window size [4 marks]
• Correct calculation of correlation between WordNet similarity and distributional similarity for different window sizes [3 marks]
• Presentation of results [5 marks]
• Discussion of results / conclusions [5 marks]
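One way the correlation step could be organised is sketched below. It assumes the two kinds of similarity have already been computed for the same list of word pairs, and it uses Spearman's rank correlation, which is an assumption: the assignment does not say which correlation coefficient to use. The function names are hypothetical and the numbers at the bottom are made-up placeholders to exercise the code, not results.

```python
import math

def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation; returns NaN if either list has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))

def correlation_by_window(wordnet_scores, distributional_by_window):
    """Map each window size to its correlation with the WordNet scores."""
    return {window: spearman(wordnet_scores, scores)
            for window, scores in distributional_by_window.items()}

# Placeholder inputs: one score per word pair, in the same pair order.
wordnet_scores = [0.1, 0.5, 0.9]
distributional_by_window = {1: [0.2, 0.4, 0.8], 5: [0.8, 0.4, 0.2]}
correlations = correlation_by_window(wordnet_scores, distributional_by_window)
print(correlations)
```

The resulting dictionary maps window size to correlation, which is the shape of data needed for a plot of correlation against context window size.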


