联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> Python编程Python编程

日期:2025-07-03 09:46

Project 2: Simplified PageRank Algorithm


95 Points Possible

In Progress

NEXT UP: Submit Assignment

6/23/2025 to 7/19/2025

Attempt 1 Add Comment

Details

FAQ

Simplified PageRank FAQs

The course staff will maintain an active FAQ as a Google document to answer your questions. Post your questions

in Slack, but we will answer them in this document and send you the link.

This assignment handoutin PDF form: Project2_SimplifiedPageRank.pdf

()

Problem Statement

In the late 90s, as the number of webpages on the internet was growing exponentially, different search engines

were trying different approaches to rank the webpages. At Stanford, two computer science Ph.D. students,

Sergey Brin and Larry Page, were working on the following questions: How can we trust information? Why are

some webpages more important than others? Their research led to the formation of the Google search engine.

In this project, you are required to implement a simplified version of the original PageRank algorithm on which

Google was built by representing the web as a graph and implementing this graph using an Adjacency List or an

equivalent data structure. The PageRank algorithm is an algorithm that is used to order or rank different

webpages on the internet.

? Representing the Web as a Graph

The entire internet consists of different webpages that can be represented as a graph. Each node represents a

webpage, and each edge represents a link between two webpages. This graph can be implemented as an

Adjacency Matrix or an Adjacency List. In this assignment, you are supposed to implement the Adjacency List

Module 10

Project 2: Simplified PageRank

Project 2: Simplified PageRank Algorithm

1/14

representation. If you use an Adjacency Matrix representation, you will be penalized by 25% of your

implementation points and will also fail the large test cases.

? Adjacency Matrix Representation

Now, for the sake of simplicity, we are explaining the project in the form of an Adjacency Matrix, M. We represent

the internet in the form of |V|x|V| matrix where |V| is the total number of vertices in this graph or the total

number of webpages on the internet. Each vertex, V , is a webpage on the entire internet. In the below graph, we

have five vertices or webpages. Within our graph, if there is an edge from V to V (the from_page points to_page),

we have the value in our adjacency matrix M = 1 and 0 otherwise.

Each webpage is thus a node in the directed graph and has incoming edges and outgoing edges. Each node has a

rank, r. According to PageRank, this rank, r, is equally split among the node's outgoing links. In the below figure, the

rank of node i is denoted by it, and this rank is similarly divided among node i's three outgoing edges.

i

i j

ij

M=

1 2 3 4 5

1 0 1 0 1 0

2 0 0 0 1 0

3 0 0 0 0 1

4 0 0 1 0 0

5 1 1 0 0 0

Rank(i) = Rank(j)/out_degree(j) + Rank(k)/out_degree(k)

Project 2: Simplified PageRank Algorithm

2/14

According to PageRank, this rank, r, is equal to the sum of the incoming ranks. In the above figure, the rank of node

i, i = k /out_degree(k) + j /out_degree(j); Thus, the rank is based on the indegree (the number of nodes pointing to

it) and the importance of an incoming node. This is important considering, let's say, you create your webpage and

have a million links to other pages of importance, then you might artificially inflate your webpage's ranking if this

is not the case. If, for calculating the rank, we used our links, we could have easily duped the algorithm. Therefore,

the rank is only based on in-links.

r r r

M=

1 2 3 4 5

1 0 1 0 1 0

2 0 0 0 1 0

3 0 0 0 0 1

4 0 0 1 0 0

5 1 1 0 0 0

Project 2: Simplified PageRank Algorithm

3/14

? Core Idea of PageRank

Important webpages will point to other important webpages.

Each page will have a score, and the search results will be based on the page score (called page rank).

? Goal

In this assignment, you need to compute the rank of the webpages using a Simplified PageRank Algorithm, which

is explained in the example below. You are supposed to implement an Adjacency List data structure to represent

the graph.

Input

Line 1 contains the number of lines (n) that will follow and the number of power iterations (p) you need to

perform. Each line from 2 to n+1 will contain two URLs    from_page to_page separated by a single space. This

means that the from_page points to the URL to_page.

Output

Print the PageRank of all pages after p powerIterations in ascending alphabetical order of the webpage. Also,

round off the rank of the page to two decimal places.

Constraints

1 <= p <= 10,000

Input Output Constraints

Project 2: Simplified PageRank Algorithm

4/14

? Explanation of PageRank Through an Adjacency Matrix (Example)

Gradescope Test Case 1 Explanation (p=2):

Note: Here, p = 2, n = 7, |V| = 5

1 <= n <= 10,000

1 <= Unique webpages or |V| <= 10000

Input

7 2

google.com

google.com

facebook.com

ufl.edu

ufl.edu

maps.com

gmail.com

gmail.com

maps.com

ufl.edu

google.com

gmail.com

facebook.com

maps.com

Output

facebook.com 0.20

gmail.com 0.20

google.com 0.10

maps.com 0.30

ufl.edu 0.20

Project 2: Simplified PageRank Algorithm

5/14

Algorithm

Graph Representation ofthe Above Input

Data Structure 1

1 google.com

2 gmail.com

3 facebook.com

4 maps.com

5 ufl.edu

Step 1 Step 2 Step 3

Project 2: Simplified PageRank Algorithm

6/14

Mapping for Simplicity

(Optional, but you will need a mechanism to store unique vertices.)

Use a map/associative array to map the URLs with a unique ID:

1 google.com

2 gmail.com

3 facebook.com

4 maps.com

5 ufl.edu

Graph Representation and Page Rank

In PageRank, the equation to calculate the rank for your graph is given as follows:

Rank of a Page, r = M.R where,

M is the matrix with values given by the following:

M = 1/d if there is an edge from V to V (D is the outdegree of node j)

0 otherwise

For our graph, the adjacency matrix, M will look like:

1 2 3 4 5

1 0 0 0 0 1/d

2 1/d 0 0 0 1/d

3 0 0 0 1/d 0

4 1/d 1/d 0 0 0

5 0 0 1/d 0 0

ij j j i j

5

1 5

4

1 2

3

Power Iteration, r(t+1) = M.r(t)

This means that the rank of the webpage at time t+1 is equal to the rank of that page at time t multiplied

by matrix, M. To achieve this, we create our matrix M based on input. Next, we initialize r(t), which is a

matrix of size |V|x1 and consists of the ranks of every webpage. We initialize r(t) to 1/|V|. Next, we

compute power_iterations based on our input, p. There is mathematical proof that the matrix r converges,

for example, r(t+1) = r(t), at which point the algorithm stops. However, this is difficult to test, and we,

therefore, will be testing your algorithm on a fixed power iteration value, p.

Example r and M matrices for our input:

r(0)

1

1 1/5

Project 2: Simplified PageRank Algorithm

7/14

r(1) = M.r(0) =

2 1/5

3 1/5

4 1/5

5 1/5

M

1 2 3 4 5

1 0 0 0 0 1/d

2 1/d 0 0 0 1/d

3 0 0 0 1/d 0

4 1/d 1/d 0 0 0

5 0 0 1/d 0 0

5

1 5

4

1 2

3

M x

1 2 3 4 5

1 0 0 0 0 1/2

2 1/2 0 0 0 1/2

3 0 0 0 1 0

4 1/2 1 0 0 0

5 0 0 1 0 0

r(0) =

1

1 1/5

2 1/5

3 1/5

4 1/5

5 1/5

r(1)

1

1 1/10

2 1/5

3 1/5

Project 2: Simplified PageRank Algorithm

8/14

Gradescope Test Case 2 Explanation (p=3):

r(t+1) = r(2) = M.r(1) =

Note: In our input case, the number of power_iterations, p, is 2. Therefore, we print r(1) of the URLs

sorted in alphabetical order. If p was 1, then return the initializing rank matrix or r(0). If p>2, the

process repeats where you multiply the matrix, M, with the new rank matrix r(t+1) at the next

iteration.

4 3/10

5 1/5

M x

1 2 3 4 5

1 0 0 0 0 1/2

2 1/2 0 0 0 1/2

3 0 0 0 1 0

4 1/2 1 0 0 0

5 0 0 1 0 0

r(1) =

1

1 1/10

2 1/5

3 1/5

4 3/10

5 1/5

r(2)

1

1 1/10

2 3/20

3 3/10

4 1/4

5 1/5

Project 2: Simplified PageRank Algorithm

9/14

? Optional Template

You can create a template, but make sure your code passes the sample test cases. An example template is in the

section below. Click the arrow to see the example.

This template is also provided in the P2 Catch Template (https://github.com/COP3530/P2-Catch-Template) . Even

if you choose to make your own code structure, the template will likely still be useful to you as it sets up Catch

testing for you.

Example Template

class AdjacencyList {

private:

//Think about what member variables you need to initialize

public:

//Think about what helper functions you will need in the algorithm

};

void AdjacencyList::PageRank(int n){ }

// prints the PageRank of all pages after p powerIterations in

ascending alphabetical order of webpagesand rounding rank to two

decimal places

// This class and method are optional. To accept the input, you can use this method:

int main()

{

int no_of_lines, power_iterations;

std::string from, to;

std::cin >> no_of_lines;

std::cin >> power_iterations;

for(int i = 0;I < no_of_lines; i++)

{

std::cin >> from;

std::cin >> to;

// Do Something

}

//Create a graph object

Created_Graph.PageRank(power_iterations);

Project 2: Simplified PageRank Algorithm

10/14

Supplemental Resources

The tabs below offer optional resources to help you with this project.

}

Page Rank Paper citation

Breakdown by Jackie Wang

Project 2 Breakdown Video (https://www.youtube.com/watch?v=BKDudJ8rim8)

Project 2 Breakdown Slides

(https://docs.google.com/presentation/d/1kIDcckTooyPVc1GOJObORseuFAB5skGYMwkFrQceBjM/edit?

usp=sharing)

Additional Page Rank Videos: https://www.youtube.com/playlist?

list=PLLssT5z_DsK9JDLcT8T62VtzwyW9LNepV (https://www.youtube.com/playlist?

list=PLLssT5z_DsK9JDLcT8T62VtzwyW9LNepV)

(Assignment is Based on These Videos)

Lecture 5

Lecture 6

Lecture 7

Lecture 8

Cloud-Based and Local Catch Testing Template

UPDATED TEMPLATE LINK(https://github.com/COP3530/P2-Catch-Template)

PageRank Paper PageRank Videos Project 2 Template

The documentation for the template is included in its readme file, linked above on GitHub. Use this template

when starting your project to have the least amount of trouble when you begin to test your code. Ideally, you

should start writing unittests as soon as you startimplementing your methods! This will save you from a lot

of trouble closer to the deadline.

Video Guide for Template with VSCode/Codespaces

Project 2: Simplified PageRank Algorithm

11/14

Testing

Test your code on Gradescope before submitting your implementation. You have five available test cases and

can submit any number of times.

Create your tests and test as much as possible. We recommend that you spend at least 1 C2 hours on testing

extensively.

We will stick to the input format. There is no need to test for off-input statements, such as inputting one URL

instead of two in a line or testing whether a URL is valid.

You can use the updated P2 Catch Template (highly

recommended) to build and test your code. It is recommended to start writing tests assoon as you start on the

project, which will be made easier with the template. If you choose not to use the template, you may still find

the Common Issues section useful if you run

into problems with Catch. Additionally, the provided test.cpp has an example test that shows how to use

Catch to easily test a multiline string input and output using "raw strings".

Grading

Video Guide for Template with CLionImplementation [75 points]

You are supposed to implement an Adjacency List data structure to represent the graph. Failure to

implement this will incur a 25-point deduction.

Your code will be tested on 15 test cases, each worth 5 points:

Five publicly available test cases.

Ten test cases will be added by the course staff and are not shown to the students.

Documentation [14.5 Points]

Submit a document addressing these prompts:

Describe the data structure you used to implement the graph and why. [2.5 points]

What is the computational complexity of each method in your implementation in the worst case in terms

of Big O notation?[5 points]

What is the computational complexity of your main method in your implementation in the worst case in

terms of Big O notation?[5 points]

What did you learn from this assignment, and what would you do differently if you had to start over?[2

points]

Implementation Documentation Code Style and Design Catch Tests

Project 2: Simplified PageRank Algorithm

12/14

? Submission

You are required to upload on Gradescope the following:

At least one .cpp file with your implementation of the entire project, including the main. If you use header files

or more than one .cpp file, upload all of them together on Gradescope. Feel free to choose the names for your

header and .cpp files. Make sure atleast one .cpp file contains the main() method.

One PDF file that has your documentation. Upper limit    3 pages with 20% reduction in report per extra page.

The cover page is not required; just your name on the front page of the PDF. Name this PDF as Report.pdf.

Failure to follow these instructions will lead to a 10% non-negotiable penalty.

? Return to Module (https://ufl.instructure.com/courses/538770/pages/module-10-graph-terminology-and?implementation)

Code Style and Design [5.5 Points]

The eight points for code style and design are divided into:

2.5 points for design decisions, a well-designed Graph API, and code modularity.

1 point for appropriate comments.

1 point for consistent whitespace mechanism.

1 point for consistent naming conventions.

Catch Tests (Submitted Previously)

No additional Catch tests are required beyond the five that you submitted for the Project 2 Testing Check-in

assignment. However, it is still recommended that you create more tests as you complete your project to be

certain that it is working correctly.

Project 2: Simplified PageRank Algorithm

13/14

2 weeks, 2 days

Jun 23 at 12:00AM Jul 16 at 11:5

Late Due Date: Jul 19 at 11:5

1 week, 2 days

Jun 08 at 12:00AM Jul 09 at 11:5

5 days, 1 hour

Jun 15 at 12:00AM Jul 05 at 11:5

May 19 at 12:00AM Jun 11 at 11:5

Late Due Date: Jun 14 at 11:5

COP3530-Su25 Summer 2025

Course ID: 1038573

? Name ? Status Released Due (EDT

Project 2: Simplified PageRank Algorithm No Submission

Project 2 Testing Check-In No Submission

Project 1 Resubmission No Submission

Project 1: Gator AVL Project 79.34 / 95.0

Submit Programming Assignment

? Upload all files for your submission

Drag & Drop

Any file(s) including .zip. Click to browse.

Submission Method

? Upload

Project 2: Simplified PageRank Algorithm

14/14


相关文章

【上一篇】:到头了
【下一篇】:没有了

版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp