Group Project Specification
300958 Social Web Analytics
Due Date: Friday of Week 12
1 Aim
2 Method
3 Group Size and Organisation
4 Due Date and Submission
5 Report Format
6 Marks
7 Declaration
8 Project Description
1 Aim
The Group Project provides you with a chance to analyse the Social Web using knowledge obtained from this unit and a computer-based statistical package. For this project, we will focus on identifying a chosen company’s Twitter image.
2 Method
To complete this project:
1. Read through this specification.
2. Form a group and register your group using the Project Groups section of vUWS.
3. Choose a company that is active on Twitter and check that it is not already on the list of Group Project Twitter Handles. Then submit the Twitter handle of the company using the same link. Note that a given company cannot be allocated to more than one group. If duplicate company names are found on the list, the group with the later time stamp will be asked to find a new company.
4. Complete the data analysis required by the specification.
5. Write up your analysis using your favourite word processing/typesetting program, making sure that all of the working is shown and presented well.
6. Include the student declaration text on the front page of your report. Please make sure that the names and student numbers of each group member are clearly displayed on the front page. If a group member did not contribute to any part of the project, state their contribution as 0% (no contribution means zero marks).
7. Submit the report as a PDF file by the due date using the Submit Group Project link. All code and its output must be shown in the report. Also include comments in the code to explain what you tried to do. Any submission other than a PDF file will not be marked.
3 Group Size and Organisation
Students in groups of size 3 or 4 are to work together to complete this project. One project report is to be submitted per group.
The group must be formed by signing up to a group within the Project section of 300958 in vUWS. Zero marks will be awarded to lone submissions.
Groups must be formed by week 7. Once the group is formed, one person should be nominated within the group to be responsible for submitting the report.
4 Due Date and Submission
The project report is due by 11:59 p.m. on the Friday of week 12. The report must be submitted as a PDF file using the assignment submission facilities in the Assessment 1 section of 300958 in vUWS. Only one student from each group needs to submit the assignment.
5 Report Format
Once the required analysis is performed by the group, the members of the group are to write up the analysis as a report. Remember that the assessor will only see the group's report and will be marking the group's analysis based on your report. Therefore, the report should contain a clear and concise description of the procedures carried out, comments on the code, explanations of what you tried to do, the analysis of results and any conclusions reached from the analysis.
The required analysis in this specification covers the material presented in lectures and labs. Students should use the computer software R to carry out the required analysis and then present the results from the analysis in the report.
6 Marks
This project is worth 30% of your final grade, and so the project will be marked out of 30. The project consists of three investigations and will be marked using the following criteria:
Marks Criteria Satisfied
10 marks First section completed correctly.
11 marks Second section completed correctly.
7 marks Third section completed correctly.
There are also two marks allocated to presentation (based on the report formatting, style, grammar, clarity and mathematical notation). If the report looks like something that would be submitted to an employer, then the full two marks will be awarded.
If a report is submitted late, the maximum mark it can achieve will be reduced by 10% per day.
7 Declaration
The following declaration must be included in a clearly visible and readable place on the first page of the report.
"Names and Student IDs of all group members who contributed to the project"
Student Name Student Number Contribution(%)
By including this statement, we, the authors of this work, verify that:
We hold a copy of this assignment that we can produce if the original is lost or damaged.
We hereby certify that no part of this assignment/product has been copied from any other student's work or from any other source except where due acknowledgement is made in the assignment.
No part of this assignment/product has been written/produced for us by another person except where such collaboration has been authorised by the subject lecturer/tutor concerned.
We are aware that this work may be reproduced and submitted to plagiarism detection software programs for the purpose of detecting possible plagiarism (which may retain a copy on its database for future plagiarism checking).
We hereby certify that we have read and understand what the School of Computing and Mathematics defines as minor and substantial breaches of misconduct as outlined in the learning guide for this unit.
Note: An examiner or lecturer/tutor has the right not to mark this project report if the above declaration has not been added to the cover of the report.
8 Project Description
Due Week 12, Friday 11:59 pm
The company "Old School Business", also known as OSB, wants to start using social media to promote its business. They have approached your team with a request to find out what other businesses have done successfully using social media. OSB is particularly interested in using Twitter and so has asked your group to perform the following analysis on Twitter. To begin, find a company that has a Twitter handle with over 10,000 followers and 1,500 tweets, then perform the following tasks using the chosen Twitter handle.
8.1 Analysing the source of the tweets
The company wants to know how other companies and the public post their tweets. They want to use this information to understand if there is a relationship between the source of a tweet and the retweeting behaviour.
1. Use the rtweet library to download 1000 tweets that the company posted. Save these tweets as “tweets.company”.
2. Use the rtweet library to download 1000 tweets about the company you selected. Save these tweets as “tweets.public”.
3. Examine the source column of both the company and the public tweets to see the source of tweets. Find out how many different levels of sources exist in the public and company tweets.
4. Draw a bar plot of the top 10 most frequent tweet sources for both company tweets and the public tweets. Label each bar with the source name.
5. Comment on your bar plots.
6. Using an appropriate statistical test, test whether retweeting is independent of the source of the tweets that the public posted. Use the “source” and “is_retweet” columns to get the source and retweet information. Group the sources as: “Salesforce - Social Studio”, “Twitter for Android”, “Twitter for iPad”, “Twitter for iPhone”, “Twitter Web App”, “Twitter Web Client” and “Other”.
7. What is the conclusion of the test? Interpret your results.
8. Calculate a 95% confidence interval for the mean text width of the tweets that the company posted. Use the “display_text_width” column to get this information.
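The counting, testing and interval steps above can be sketched in base R as follows. This is only an illustration on made-up data, not a model answer: the toy data frames stand in for the real rtweet downloads, the source names "TweetDeck" and "Hootsuite Inc." are hypothetical examples of sources that fall into “Other”, and the chi-squared test of independence and the one-sample t interval are one reasonable choice of methods rather than the only acceptable ones.

```r
# Toy stand-ins for the downloaded data; every value here is made up.
set.seed(1)
sources <- c("Twitter for iPhone", "Twitter for Android", "Twitter Web App",
             "Twitter for iPad", "TweetDeck", "Hootsuite Inc.")
tweets.public <- data.frame(
  source = sample(sources, 1000, replace = TRUE,
                  prob = c(0.35, 0.30, 0.20, 0.05, 0.05, 0.05)),
  is_retweet = sample(c(TRUE, FALSE), 1000, replace = TRUE),
  stringsAsFactors = FALSE
)
tweets.company <- data.frame(display_text_width = round(runif(1000, 20, 280)))

# Number of distinct sources and the ten most frequent ones.
source.counts <- sort(table(tweets.public$source), decreasing = TRUE)
n.sources <- length(source.counts)
top10 <- head(source.counts, 10)
barplot(top10, las = 2, cex.names = 0.7, main = "Top tweet sources (toy data)")

# Collapse every source outside the named groups into "Other".
named.groups <- c("Salesforce - Social Studio", "Twitter for Android",
                  "Twitter for iPad", "Twitter for iPhone",
                  "Twitter Web App", "Twitter Web Client")
grouped <- ifelse(tweets.public$source %in% named.groups,
                  tweets.public$source, "Other")

# Contingency table of source group by retweet status, then a
# chi-squared test of independence on that table.
tab <- table(grouped, tweets.public$is_retweet)
test <- chisq.test(tab)
print(tab)
print(test)

# 95% confidence interval for the mean text width (one-sample t interval).
ci <- t.test(tweets.company$display_text_width, conf.level = 0.95)$conf.int
print(ci)
```

With real data some source groups may have very small expected counts, in which case chisq.test() issues a warning; its simulate.p.value = TRUE argument is one way to handle that.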
8.2 Themes in public and company tweets
To be successful on Twitter, a company needs to provide useful information to its followers and encourage customers to talk about their posts. We will examine this information so that we can suggest what OSB can tweet about. We do not want to present all tweets to OSB, so we must identify if there is a set of common tweet themes between the public and company tweets. This process involves:
9. Combine tweets.public and tweets.company and save as tweets.
10. Clean and pre-process the data (use TFIDF weights in your analysis).
11. Using cosine distance, find the most appropriate number of clusters for the combined tweets with the elbow method.
12. Cluster the tweets using the most appropriate clustering method.
13. Visualize your clustering in 2-dimensional vector space. Show each cluster in a different colour and the tweets in tweets.public and tweets.company with different symbols in your visualization.
14. Comment on your visualization.
15. Compute the proportion of tweets.public in each cluster. Print these proportions.
16. Which clusters are dominated by the public and which are dominated by the company?
17. Draw a word cloud and a dendrogram of these two clusters to understand the theme of the clusters.
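A minimal base R sketch of the weighting, elbow and visualisation steps above is given here, assuming a tiny made-up corpus in place of the cleaned tweets. The TF-IDF variant (term frequency times log(N/df)), the use of k-means on unit-length rows as a stand-in for clustering under cosine distance, and the choice of k = 2 are all assumptions made for illustration; the labs may use different packages (for example tm) and a different weighting.

```r
# Toy corpus standing in for the cleaned, combined tweets (made up).
docs <- c("great coffee great service", "coffee shop opens new store",
          "new store opening downtown", "love the service and the coffee",
          "delivery was late again", "late delivery poor support",
          "support team fixed my delivery", "poor support very late reply")
origin <- rep(c("company", "public"), each = 4)  # hypothetical labels

# Term-frequency matrix: one row per document, one column per term.
tokens <- strsplit(tolower(docs), "[^a-z]+")
vocab <- sort(unique(unlist(tokens)))
tf <- t(sapply(tokens, function(w) table(factor(w, levels = vocab))))

# TF-IDF weights: term frequency times log(N / document frequency).
N <- nrow(tf)
idf <- log(N / colSums(tf > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# Drop documents whose weights are all zero, then scale rows to unit length.
nonempty <- rowSums(tfidf) > 0
tfidf <- tfidf[nonempty, , drop = FALSE]
origin <- origin[nonempty]
unit <- tfidf / sqrt(rowSums(tfidf^2))

# Cosine distance matrix. For unit vectors, squared Euclidean distance
# equals 2 * (1 - cosine similarity), so k-means on `unit` is consistent
# with cosine distance.
cos.dist <- 1 - tcrossprod(unit)

# Elbow method: total within-cluster sum of squares for k = 1..max.k.
set.seed(42)
max.k <- 5
wss <- sapply(1:max.k, function(k) {
  kmeans(unit, centers = k, nstart = 10)$tot.withinss
})
plot(1:max.k, wss, type = "b",
     xlab = "Number of clusters", ylab = "Within-cluster sum of squares")

# Cluster with the chosen k (k = 2 is an assumption for this toy corpus).
k <- 2
fit <- kmeans(unit, centers = k, nstart = 10)

# Project to two dimensions with classical multidimensional scaling:
# colour marks the cluster, symbol marks public versus company tweets.
mds <- cmdscale(as.dist(cos.dist), k = 2)
plot(mds, col = fit$cluster, pch = ifelse(origin == "public", 1, 17),
     xlab = "Dimension 1", ylab = "Dimension 2")

# Proportion of public tweets in each cluster.
prop.public <- tapply(origin == "public", fit$cluster, mean)
print(prop.public)
```

A cluster with a proportion near 1 is dominated by public tweets, and one near 0 by company tweets.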
8.3 Following friends
We are unsure if friending leads to an increase in popularity. To examine this, complete the following tasks (you may use the twitteR package in this section).
18. Find the 10 most popular friends of the chosen Twitter handle.
19. Obtain a 1.5-degree egocentric graph centred at the chosen Twitter handle and plot the graph. The egocentric graph should contain the 10 most popular friends of the chosen Twitter handle.
20. Compute the betweenness centrality score for each Twitter handle in our graph. List the top 3 most central people in your graph according to the betweenness centrality.
21. Comment on your results.
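The graph steps above can be sketched as follows, assuming the igraph package is installed and using a made-up set of friends in place of data downloaded from Twitter. The handle names, follower counts and the single friend-to-friend link are hypothetical, popularity is taken to mean follower count, only four friends are kept instead of ten to keep the example small, and the graph is treated as undirected for simplicity.

```r
library(igraph)  # assumption: igraph is available for graph handling

# Hypothetical friends of the chosen handle with made-up follower counts.
friends <- data.frame(
  screen_name = c("friendA", "friendB", "friendC", "friendD", "friendE"),
  followers = c(9000, 120000, 450, 30000, 75),
  stringsAsFactors = FALSE
)

# Keep the most popular friends (top 4 here; the project asks for 10).
top <- head(friends[order(friends$followers, decreasing = TRUE), ], 4)

# 1.5-degree egocentric graph: edges from the ego to each kept friend,
# plus the links that exist between those friends (one made-up link here).
edges <- rbind(
  cbind("ego", top$screen_name),
  c("friendA", "friendB")
)
g <- graph_from_edgelist(edges, directed = FALSE)
plot(g, vertex.size = 25, vertex.label.cex = 0.8)

# Betweenness centrality for every handle, and the three most central.
btw <- betweenness(g)
top3 <- head(sort(btw, decreasing = TRUE), 3)
print(top3)
```

In this toy graph the ego sits on every shortest path between friends that are not directly linked, so it has the highest betweenness; real data will usually be less clear-cut.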
Important notes:
Note that in Section 8.3, depending on the friends of the chosen Twitter handle, you may reach the rate limit of the Twitter API. I strongly recommend that you save your objects as an RData file once you have downloaded the friends, so that you can continue downloading friends the following day or with a different authentication key. For more information on how to save your objects, see https://stackoverflow.com/questions/19967478/how-to-save-data-file-into-rdata.
See https://developer.twitter.com/en/docs/basics/rate-limiting.html for more information on the rate limit.
It is also strongly recommended that you save your tweets.public and tweets.company once you have downloaded them. Otherwise you will get different tweets each time you run your script and you will need to change your clustering.
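As a small illustration of the saving advice above, the following sketch writes two toy data frames to an RData file and reads them back with base R's save() and load(). The data and the file location (a temporary directory) are placeholders; in practice you would save the real downloaded objects to a file in your project folder.

```r
# Toy objects standing in for the downloaded tweets (made up).
tweets.public <- data.frame(text = c("toy tweet one", "toy tweet two"),
                            stringsAsFactors = FALSE)
tweets.company <- data.frame(text = "toy company tweet",
                             stringsAsFactors = FALSE)

# Hypothetical file location; use a path inside your project instead.
rdata.file <- file.path(tempdir(), "tweets.RData")

# Save both objects to one file.
save(tweets.public, tweets.company, file = rdata.file)

# Later (for example, in a new R session): restore the objects by name.
rm(tweets.public, tweets.company)
load(rdata.file)
print(nrow(tweets.public))
```

Because load() restores the objects under their original names, the rest of the script can run unchanged on the saved tweets.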
The company wants the above three-part analysis to be written up as a professional report. Each part should have its own section of the report and all questions should have thoughtful answers. Include all the R code along with its output in your assignment. Output without the code, or code without the output, will result in zero marks for the relevant section.