代写FIN7880 2024 Assignment 1代写Python语言-代写Matlab编程

联系方式

QQ：99515681
邮箱：99515681@qq.com
工作时间：8:00-21:00
微信：codinghelp

您当前位置：首页 >> Matlab编程Matlab编程

代写FIN7880 2024 Assignment 1代写Python语言

日期：2024-05-13 11:42

FIN7880

Assignment 1

Introduction

You will receive three movie review datasets.

• The first dataset is "movie_train_set.csv" and contains 21,000 reviews, each with a positive (1) and negative (0) sentiment label.

• Use this dataset to train and test your sentiment classification model.

• The second dataset is "movie_test_set_masked.csv" and contains 4,000 reviews.

• Use this dataset to score sentiments of movie reviews of your developed ML models.

• The third dataset is "unlabeledTrainData.tsv” and contains 50,000 unlabeled movie reviews.

• Use this dataset to build the W2V embedding model.

Sample Data

Introduction

• Develop sentiment classification models using the given training data (first dataset).

• Use your best ML classification model to score the sentiment of the 4,000 movie reviews in the evaluation dataset (second dataset).

Problem Description

Before you can train your ML classification model, the text in the movie

review column must be encoded into a numerical format. The following two

approaches are recommended for encoding the movie review.

• Bag of Words:

• Utilize CountVectorizer/TfidfVectorizer to encode the movie review in Dataset 1 and Dataset 2

• Word Embedding:

• Implement Skipgram/CBOW to create a Word2Vec model based on the content of Dataset 3.

• Utilize the Word2Vec model to encode the movie reviews in Dataset 1 and Dataset 2

Evaluation

Your evaluation will be based on the completion of the following tasks:

• Steps used to pre-process the movie reviews

• How to represent the movie reviews in a vector space model using CountVectorizer/TfidfVectorizer and W2V embedding methods

• Train/Test Preparation (e.g., 70/30 split).

• Model training and evaluation (Naïve Bayes, SVM, Logistic Regression, etc.)

• Use of hyperparameters

• Model evaluations (Confusion Matrix, Accuracy, Precision, F1 score, AUC, ROC, etc.)

• Score the sentiment of reviews in the evaluation dataset using your best-trained model.

• Save the scoring results to a CSV file named 'movie_evaluation.csv ’ in a proper format.

What should you submit in the Moodle folder?

1. Python Notebook, which has the following steps:

• Prepare data, vectorize text, and train/test/evaluate models.

• Predict sentiment scores for 4,000 reviews in 'movie_test_set_masked.csv ’.

• Save scoring output as 'movie_evaluation.csv ’.

2. CSV File ('movie_evaluation.csv’) with the following format.

• Column1: 'recordNum’

• Column 2: 'sentiment' (predicted sentiment: 0/1)

3. Written Report: - Microsoft PowerPoint format.

Assessment Criteria

1. Preprocessing of the Text [30%]

• Tokenization, stopwords removal, normalization + Python Program [20%]

• Text Vectorization approaches

2. Machine Learning Model {40%]

• Train/testing preparation + Python programs [5%]

• Machine Learning Model Training/Testing + Python programs [25%]

• Machine Learning Model Evaluation Accuracy [10%]

3. Summary of processes and results in written format [30%]

• Discuss the model performance using the different text vectorization approaches

【返回顶部】【打印本稿】【关闭本页】

【上一篇】：代写CSCI-UA.0202-002: Operating Systems Final Exam 2020代写留学生R语言

【下一篇】：代写CSCI-UA.0202-002: Operating Systems Final Exam 2020代写留学生R语言

联系方式

最新辅导

热门辅导

您当前位置：首页 >> Matlab编程Matlab编程

代写FIN7880 2024 Assignment 1代写Python语言

日期：2024-05-13 11:42

相关文章