
Date: 2025-01-10 09:31

CIS – General Lecture Project


Objective:

In this project, you will employ Python programming to conduct an analysis of a text-based dataset using Natural Language Processing (NLP) techniques. You are required to prepare a report comprising 300-500 words alongside a Python script containing a minimum of 100 lines of code.


Assignment Description:


Dataset Selection:

Students are encouraged to select a text dataset that captures their interest. Below are some suggested datasets:

Movie Reviews: A collection of movie reviews (e.g., IMDB, Rotten Tomatoes).

Tweets: A set of tweets regarding a specific topic (e.g., political sentiment, public opinion on social issues).

Product Reviews: Customer feedback data from e-commerce platforms like Amazon or eBay.

News Articles: A collection of news articles on a particular subject (e.g., sports, politics, technology).

Books or Articles: A dataset of books or articles suitable for analysis regarding topic modeling or keyword extraction.


Python Code Requirements:

The analysis should consist of at least 100 lines of Python code.

The code must encompass:

o Text Preprocessing: Tokenization, removal of stopwords, stemming/lemmatization, and vectorization techniques such as TF-IDF or word embeddings.

o NLP and/or ML Techniques: Implement NLP and/or ML algorithms.

o Data Visualization: Illustrate the distribution of key terms or phrases.

o Basic Evaluation: Assess model performance utilizing accuracy, precision, recall, or other pertinent metrics.
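
The preprocessing and vectorization steps listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a required implementation: the stopword list, the suffix-stripping `crude_stem` helper, and the sample reviews are all assumptions made for the example, and the weighting shown (term frequency times log(N / document frequency)) is only one common TF-IDF variant. In a real project a library such as NLTK, spaCy, or scikit-learn would usually replace these hand-written helpers.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real projects use a fuller list).
STOPWORDS = {"a", "an", "the", "is", "was", "and", "of", "to", "it", "this", "i"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stopwords, then stem what remains."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOPWORDS]

def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hand-written sample texts, for illustration only.
reviews = [
    "I loved this movie, the acting was amazing",
    "The plot was boring and the acting was terrible",
    "Amazing visuals and a moving story",
]
vectors = tfidf(reviews)
```

Terms that occur in fewer documents receive a larger weight, so in this sample a term unique to one review outweighs a term shared by two.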


Written Report:


Word Count: 300-500 words

The report should include the following sections:

1. Introduction:

Provide a brief introduction to the selected dataset and elucidate its relevance to your interests.

Outline the problem being tackled using NLP, such as sentiment analysis or topic modeling.


2. Literature Review:

Present a concise overview of existing research or methodologies associated with your analysis or general NLP tasks, exploring areas like:

o The application of NLP in analyzing sentiment on social media.

o Surveys of sentiment analysis methodologies employing machine learning.

o Previous studies utilizing NLP in evaluating product reviews, movie critiques, or customer feedback.

o Highlight common techniques or algorithms (e.g., Random Forest, Naive Bayes, SVM, or advanced deep learning models like BERT) employed in your analysis.

3. Methodology and Analysis:

Detail the steps undertaken to analyze the dataset:

o Text Preprocessing: What text-cleaning measures were implemented?

o Modeling: Which model(s) did you select for data analysis? Offer insights into the construction of your machine learning model (e.g., feature engineering, algorithmic choice).

o Discuss the outcomes of your analysis:

o What trends were observed in the data?

o What predictions did your model yield (e.g., sentiment classification)?

o Did you uncover any noteworthy patterns or correlations in the textual data?

4. Research Proposal:

Suggest potential avenues for future research or enhancements to your analysis:

Is there a possibility to refine the model by integrating more sophisticated NLP techniques (e.g., transformer models like BERT)?

Could additional features (e.g., user metadata) be incorporated to boost prediction accuracy?
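
To make the modeling and evaluation steps in the report outline more concrete, here is a small sketch of a multinomial Naive Bayes sentiment classifier with add-one smoothing, followed by accuracy, precision, and recall. It is one possible baseline rather than a prescribed method; the function names, the "pos"/"neg" labels, and the toy training and test data are all hypothetical. A real submission would more likely rely on a library such as scikit-learn and a proper train/test split.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """Fit a multinomial Naive Bayes model on (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict(model, tokens):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def evaluate(y_true, y_pred, positive="pos"):
    """Compute accuracy, precision, and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Toy, hand-written data for illustration only (not a real dataset).
train = [
    ("great fun great cast".split(), "pos"),
    ("loved the great story".split(), "pos"),
    ("boring plot awful cast".split(), "neg"),
    ("awful boring mess".split(), "neg"),
]
test = [
    ("great story".split(), "pos"),
    ("boring mess".split(), "neg"),
    ("awful cast".split(), "pos"),  # deliberately hard case
]

model = train_naive_bayes(train)
y_true = [label for _, label in test]
y_pred = [predict(model, tokens) for tokens, _ in test]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters most when the classes are imbalanced, which is common in review and tweet datasets.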


Evaluation Criteria:

(10%) Code Quality and Efficiency: The Python code should be well-written, clean, and efficient, with clear comments to explain each section.

(10%) Text Preprocessing: The quality and thoroughness of your text preprocessing steps will be evaluated.

(20%) Modeling and Analysis: How well you perform the analysis or other NLP tasks will be a key evaluation factor. This includes the appropriate choice of algorithm and the explanation of its use.

(10%) Visualization: The visualizations should clearly convey the results of your analysis (e.g., sentiment distributions, word clouds, or feature importance).

(50%) Written Report: The report should be well-structured, clear, and demonstrate a strong understanding of the task, literature, and findings.
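
As a rough illustration of the visualization criterion, the sketch below counts the most frequent terms and renders them as a plain-text bar chart. It uses only the standard library so that it runs anywhere; the helper names and sample tokens are assumptions, and a plotting library such as matplotlib would normally be used to produce the figures for an actual submission.

```python
from collections import Counter

def top_terms(token_lists, n=5):
    """Count tokens across all documents and return the n most frequent."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return counts.most_common(n)

def text_bar_chart(term_counts, width=30):
    """Render (term, count) pairs as horizontal bars made of '#' characters."""
    if not term_counts:
        return ""
    max_count = max(count for _, count in term_counts)
    label_width = max(len(term) for term, _ in term_counts)
    lines = []
    for term, count in term_counts:
        # Scale each bar relative to the most frequent term.
        bar = "#" * max(1, round(width * count / max_count))
        lines.append(f"{term:<{label_width}} | {bar} {count}")
    return "\n".join(lines)

# Already-preprocessed sample documents, for illustration only.
docs = [
    ["great", "cast", "great", "story"],
    ["boring", "plot", "great", "cast"],
    ["story", "great"],
]
chart = text_bar_chart(top_terms(docs, n=3), width=20)
print(chart)
```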


Additional Guidelines:

Dataset Size: The dataset should have at least a few thousand data points (e.g., tweets, reviews, articles) for meaningful analysis.

Data Sources: Ensure that the dataset is publicly available and that you clearly cite its source in the report.

Ethical Considerations: When using social media or other publicly available datasets, be sure to respect user privacy and follow ethical guidelines.
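
The size and privacy guidelines above could be checked with something like the following sketch. The `MIN_ROWS` threshold of 2000 is a hypothetical reading of "a few thousand", the `text` column name is an assumption about the dataset, and the in-memory CSV stands in for a real file. Masking handles and URLs reduces direct identifiers but does not amount to full anonymization.

```python
import csv
import io
import re

MIN_ROWS = 2000  # hypothetical threshold; the brief only says "a few thousand"

def load_texts(csv_file, text_column="text"):
    """Read one text column from an open CSV file, skipping empty values."""
    reader = csv.DictReader(csv_file)
    texts = []
    for row in reader:
        value = (row.get(text_column) or "").strip()
        if value:
            texts.append(value)
    return texts

def mask_identifiers(text):
    """Replace URLs and @handles with placeholders."""
    text = re.sub(r"https?://\S+", "<URL>", text)
    return re.sub(r"@\w+", "<USER>", text)

def is_large_enough(texts, minimum=MIN_ROWS):
    """Check the dataset against the assumed minimum size."""
    return len(texts) >= minimum

# In-memory stand-in for a real file opened with
# open(path, newline="", encoding="utf-8"); the rows are made up.
sample = io.StringIO(
    "text\n"
    "@alice see https://example.com today\n"
    "Great speech by @bob\n"
)
masked = [mask_identifiers(t) for t in load_texts(sample)]
print(masked, is_large_enough(masked))
```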

Example Topics:

1. Sentiment Analysis of Movie Reviews:

Dataset: IMDB reviews or Rotten Tomatoes movie reviews.

Topic: Perform sentiment analysis on movie reviews to classify them as positive or negative and explore how sentiment correlates with movie success (box office performance).

2. Analysis of Tweets on a Political Topic:

Dataset: A collection of tweets from Twitter about a political issue or candidate.

Topic: Analyze the topics of tweets and identify public opinion trends regarding a political event or figure.

3. Product Review Sentiment Analysis:

Dataset: Product reviews from Amazon, eBay, or another e-commerce platform.

Topic: Analyze the sentiment of customer reviews to predict product success or identify areas for improvement.

4. Topic Modeling in News Articles:

Dataset: A collection of news articles on various topics.

Topic: Perform topic modeling to identify key themes in news coverage or detect emerging trends in current events.


By the end of this project, students will have hands-on experience with NLP techniques and machine learning models and will gain insights into how these methods can be applied to real-world data. The project will also help students improve their ability to interpret and communicate their findings effectively through code and a written report.


Please submit the following:


A 300-500 word report in PDF format

At least 100 lines of Python code in .py format (.ipynb acceptable as well)
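
Before submitting, the two numeric requirements can be sanity-checked with a short script along these lines. It assumes the report text has already been extracted from the PDF as plain text, and it counts only non-blank, non-comment lines as code; the brief does not say how lines are counted, so that definition is an assumption, as are the function names.

```python
def count_words(text):
    """Count whitespace-separated words in the report text."""
    return len(text.split())

def count_code_lines(source):
    """Count non-blank lines that are not comment-only (an assumed definition)."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )

def check_submission(report_text, source):
    """Return whether each stated numeric requirement is met."""
    words = count_words(report_text)
    return {
        "report_word_count_ok": 300 <= words <= 500,
        "code_line_count_ok": count_code_lines(source) >= 100,
    }

# Synthetic placeholders standing in for a real report and script.
report_text = "word " * 350
source = "\n".join(f"x{i} = {i}" for i in range(100))
result = check_submission(report_text, source)
print(result)
```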

