Automated Fact Checking
1. Dataset
We will be using the publicly available Fact Extraction and Verification (FEVER) dataset (http://fever.ai/resources.html). It consists of 185,445 claims manually verified against Wikipedia pages and classified as Supported, Refuted or NotEnoughInfo. For the first two classes, a combination of sentences forms the necessary evidence supporting or refuting the claim. The dataset consists of a collection of documents (wiki-pages), a labeled training subset (train.jsonl), a labeled development subset (dev.jsonl) and a reserved testing subset (test.jsonl). For a claim in the train.jsonl file, the value of the "evidence" field is a list of relevant sentences, each in the format [_, _, wiki document ID, sentence ID].
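As a rough illustration of the "evidence" format, the sketch below pulls (wiki document ID, sentence ID) pairs out of one JSON line. It assumes each evidence entry is a four-item list as described above and tolerates an extra level of list nesting around the entries. The helper name evidence_pairs and the sample record are made up for illustration and are not taken from train.jsonl.

```python
import json

def evidence_pairs(evidence):
    """Collect (wiki document ID, sentence ID) pairs from an "evidence" value.

    Assumption: entries are 4-item lists [_, _, doc_id, sent_id]; any extra
    level of list nesting around them is walked through recursively.
    """
    pairs = []
    for item in evidence:
        if isinstance(item, list) and len(item) == 4 and not isinstance(item[0], list):
            pairs.append((item[2], item[3]))
        elif isinstance(item, list):
            pairs.extend(evidence_pairs(item))
    return pairs

# Hypothetical record, invented for this example.
line = '{"id": 1, "claim": "Example claim.", "evidence": [[[101, 202, "Example_Page", 3]]]}'
record = json.loads(line)
pairs = evidence_pairs(record["evidence"])
print(pairs)
```

In practice the same function would be applied to every line read from train.jsonl.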
2. Claims for IR Tasks
To reduce computation time, for Subtasks 2 and 3 (Sections 3.2 and 3.3) you should return retrieval results only for the first 10 verifiable claims in the train.jsonl file. The claim IDs are [75397, 150448, 214861, 156709, 129629, 33078, 6744, 226034, 40190, 76253].
3. Involved Subtasks
3.1 Text Statistics. Count the frequency of every term in the collection of documents, plot the curve of term frequencies and verify Zipf’s Law. Report the values of the parameters of Zipf’s Law for this corpus.
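One possible way to estimate the Zipf parameters is a least-squares line fit in log-log space, sketched below. It assumes the form f(r) = C / r^s, where r is the rank of a term and f its frequency. The function name fit_zipf and the toy token list are illustrative only, and other fitting procedures are equally valid.

```python
import math
from collections import Counter

def fit_zipf(tokens):
    """Least-squares fit of log f = log C - s * log r over ranked term counts."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)  # (exponent s, constant C)

# Toy corpus whose counts (12, 6, 4, 3) follow f(r) = 12 / r.
tokens = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
s, C = fit_zipf(tokens)
print(s, C)
```

The fit needs at least two distinct terms; on a real corpus the tail of rare terms usually deviates from the straight line.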
3.2 Vector Space Document Retrieval. Extract TF-IDF representations of the 10 claims and of all the documents, based on the document collection. The goal of this representation is to later compute the cosine similarity between the documents and the claims. Hence, for computational efficiency, you are allowed to represent the documents using only the words that would have an effect on the cosine similarity computation. Given a claim, compute its cosine similarity with each document and return the document IDs (the "id" field in the wiki-page) of the five most similar documents for that claim.
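A minimal sketch of the ranking step follows, assuming raw term counts as TF and idf(t) = log(N / df(t)). Both are one choice among several common weightings. The names tfidf_vector, cosine and top_k and the toy documents are hypothetical, and this sketch keeps full document vectors rather than applying the efficiency shortcut mentioned above.

```python
import math
from collections import Counter

def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector (term -> weight) using raw counts as TF."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_k(claim_tokens, docs, k=5):
    """Return the IDs of the k documents most similar to the claim."""
    n = len(docs)
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    idf = {t: math.log(n / d) for t, d in df.items()}
    q = tfidf_vector(claim_tokens, idf)
    scores = {doc_id: cosine(q, tfidf_vector(tokens, idf))
              for doc_id, tokens in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy documents, invented for illustration.
docs = {
    "d1": ["cat", "sat", "mat"],
    "d2": ["dog", "barked"],
    "d3": ["cat", "dog", "park"],
}
ranking = top_k(["cat", "mat"], docs, k=2)
print(ranking)
```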
3.3 Probabilistic Document Retrieval. Establish a query-likelihood unigram language model based on the document collection and return the five most similar documents for each of the 10 claims. Then implement and apply Laplace smoothing, Jelinek-Mercer smoothing and Dirichlet smoothing to the query-likelihood language model, and again return the five most similar documents for each of the 10 claims.
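The three smoothing schemes can be sketched as follows. The conventions are assumptions: in the Jelinek-Mercer branch lam weights the document model and (1 - lam) the collection model (some texts define it the other way round), and the default values of lam and mu are arbitrary. The function name and the toy collection are illustrative only.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, coll_counts, coll_len, vocab_size,
                         method="laplace", lam=0.5, mu=2000.0):
    """log P(query | doc) under a smoothed unigram language model."""
    tf = Counter(doc)
    dl = len(doc)
    score = 0.0
    for term in query:
        p_coll = coll_counts.get(term, 0) / coll_len
        if method == "laplace":
            p = (tf[term] + 1) / (dl + vocab_size)
        elif method == "jelinek-mercer":
            p_doc = tf[term] / dl if dl else 0.0
            p = lam * p_doc + (1 - lam) * p_coll
        elif method == "dirichlet":
            p = (tf[term] + mu * p_coll) / (dl + mu)
        else:
            raise ValueError(f"unknown smoothing method: {method}")
        if p == 0.0:  # term unseen in the whole collection
            return float("-inf")
        score += math.log(p)
    return score

# Toy collection, invented for illustration.
docs = {"d1": ["a", "b"], "d2": ["b", "c"]}
coll_counts = Counter(t for tokens in docs.values() for t in tokens)
coll_len = sum(coll_counts.values())
vocab_size = len(coll_counts)

for method in ("laplace", "jelinek-mercer", "dirichlet"):
    scores = {d: query_log_likelihood(["a"], toks, coll_counts, coll_len,
                                      vocab_size, method=method, mu=2.0)
              for d, toks in docs.items()}
    print(method, sorted(scores, key=scores.get, reverse=True))
```

Working in log space avoids numerical underflow when a claim contains many terms.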
3.4 Sentence Relevance. For a claim in the training subset and the five documents retrieved for this claim (based on either cosine similarity or the query-likelihood model), represent the claim and the sentences in these documents using a word embedding method (such as Word2Vec, GloVe, FastText or ELMo). With these embeddings as input, implement a logistic regression model trained on the training subset. Use this model to identify five relevant documents for the claims in the development data and to select the five most relevant sentences for a given claim within these documents. Report the performance of your system on this dataset using an evaluation metric you think fits this task. Analyze the effect of the learning rate on the model training loss. The implementation of the logistic regression algorithm should be your own; do not use Python sklearn or other packages for it.
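For orientation only, a from-scratch logistic regression trained by batch gradient descent might look like the sketch below. The two-dimensional feature vectors are toy numbers standing in for features derived from claim and sentence embeddings, and the function names, learning rates and epoch count are arbitrary assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.1, epochs=200):
    """Batch gradient descent on mean cross-entropy; returns (w, b, losses)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    losses = []
    for _ in range(epochs):
        grad_w, grad_b, loss = [0.0] * d, 0.0, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            p_safe = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
            loss -= yi * math.log(p_safe) + (1 - yi) * math.log(1 - p_safe)
            err = p - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
        losses.append(loss / n)  # loss measured before this epoch's update
    return w, b, losses

# Toy features and relevance labels, invented for illustration.
X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
y = [0, 0, 1, 1]

for lr in (0.01, 0.5):
    w, b, losses = train_logreg(X, y, lr=lr, epochs=500)
    print(lr, losses[0], losses[-1])
```

Recording the loss per epoch for several learning rates, as in the loop above, gives the curves needed to discuss the effect of the learning rate.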
3.5 Relevance Evaluation. Implement methods to compute recall, precision and F1 metrics. Analyze the sentence retrieval performance of your model using the labelled data in the development subset.
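A minimal sketch of the three metrics for a single claim is given below, treating retrieved and gold evidence as sets of (wiki document ID, sentence ID) pairs. The function name and the sample pairs are hypothetical, and how scores are averaged across claims is left open.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall and F1 for one claim."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # correctly retrieved items
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up example: five retrieved sentences, three gold evidence sentences.
retrieved = [("A", 0), ("A", 1), ("B", 2), ("C", 0), ("D", 4)]
relevant = [("A", 0), ("B", 2), ("E", 1)]
p, r, f1 = precision_recall_f1(retrieved, relevant)
print(p, r, f1)
```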
3.6 Truthfulness of Claims. Using the relevant sentences from Subtask 4 (Section 3.4) as your training data, together with their corresponding truthfulness labels in the train.jsonl file, build a neural-network-based model to assess the truthfulness of a claim in the training subset. You may use existing packages such as Tensorflow or PyTorch in this subtask. You are expected to propose your own network architecture. Report the performance of your system on the labelled development subset using the evaluation metrics you have implemented. Furthermore, describe the motivation behind your proposed neural architecture. The marks you get will be based on the quality and novelty of your proposed neural network architecture, as well as its performance. (15 marks in total)
3.7 Literature Review. Do a literature review on fact checking and misinformation detection, identify the pros and cons of existing models/methods and provide a critical analysis. Explain what you think the drawbacks of each of these models are (if any).
3.8 Propose ways to improve the machine learning models you have implemented. You can either propose new machine learning models, new ways of sampling/using the training data, or propose new neural architectures. You are allowed to use existing libraries/packages for this part. Explain how your proposed method(s) differ from existing work in the literature. The marks you get will be based on the quality and novelty of your proposed methods, as well as the performance of your final model.
4. What to submit
You are expected to submit all the code you have written, two csv files containing the retrieval results of Subtasks 2 and 3 respectively, and a jsonl file containing your final model’s predictions (the results of the model you obtained in the last step) on the test subset.
Unless otherwise stated above, all the code should be your own and you are not allowed to reuse any code that is available online.
For model performance comparison, students are encouraged to submit their test subset predictions to the FEVER Challenge CodaLab and report their results as well. Marks for the last subtask will be based on your model’s performance on the test subset, as well as the quality and the novelty of your proposed method.