Socket Programming: HTTP Web Proxy Server (without POST)
In this project, you will learn how web proxy servers work and one of their basic functionalities –
caching.
Your task is to develop a small web proxy server which is able to cache web pages. It is a very simple
proxy server which only understands simple GET-requests, but is able to handle all kinds of objects - not
just HTML pages, but also images.
Generally, when the client makes a request, the request is sent to the web server. The web server then
processes the request and sends back a response message to the requesting client. In order to improve the
performance we create a proxy server between the client and the web server. Now, both the request
message sent by the client and the response message delivered by the web server pass through the proxy
server. In other words, the client requests the objects via the proxy server. The proxy server will forward
the client’s request to the web server. The web server will then generate a response message and deliver it
to the proxy server, which in turn sends it to the client.
With respect to the client, the proxy server behaves like a server, and with respect to the web server, the
proxy server behaves like a client. In what follows, the web server will also be referred to as “original
server”.
Your code has to implement inter-process communication at the socket level: It has to explicitly
create sockets, send data to the sockets, and receive data from the sockets. You are not allowed to
use higher level mechanisms and you won't get credit if you do so.
1. Programming Language
There is some flexibility for the programming language, but you need to get the TA’s approval.
2. Running the Proxy Server
To run the proxy server program locally on your computer, start the proxy using your command prompt.
Then launch your browser. For the browser to direct requests to the proxy server, use “localhost”, along
with the destination port number.
For example, to go to google.com, instead of typing www.google.com, you would type
http://localhost:5005/www.google.com
5005 is an arbitrarily chosen port number where the client can reach the proxy server. The only
requirement is that the port number should not coincide with any of the reserved port numbers. To use the
proxy server with browser and proxy on separate computers, you will need the IP address on which your
proxy server is running. In this case, while running the proxy, you will have to replace the “localhost”
with the IP address of the computer where the proxy server is running. Also note the port number used.
You will replace the port number used here “5005” with the port number you have used in your server
code at which your proxy server is listening.
3. Outline for the Proxy Server Logic
HTTP runs over TCP, so you will be using TCP sockets. You should insert print statements in your code
as shown below.
Create a server socket welcomeSocket, bind to a port p # This is the welcoming socket used to listen to
requests coming from a client
Listen on welcomeSocket
While True
# print ‘WEB PROXY SERVER IS LISTENING’
Wait on welcomeSocket to accept a client connection
When a connection is accepted, a new client connection socket is created. Call that socket
clientSocket
Read the client request from clientSocket
# Print the message received from the client (see note below)
Process the request
Split the message to extract the header
Parse the header to extract the method, dest address, HTTP version
# Print the extracted method, dest address and HTTP version (see note below)
if method is GET
Look up the cache to determine if requested object is in the cache
if object is not in the cache # Need to request object from the original server
# Print “Object not found in the cache” message
Create a serverSocket to send request to the original server
Compose the request header, send request
# Print the request sent to the original server (see note below)
Read response from the original server
Parse it to extract the relevant components
# Print the response header from the original server (see note below)
if response is 200 OK
Write object into cache
Close serverSocket
Compose response and send to client on clientSocket
# Print the response header from the proxy to the client (see note below)
if object is in the cache
# Print “Object found in the cache” message
Read from the cache, compose response, send to client on clientSocket
# Print the response header from the proxy to the client (see note below)
else # Method is not GET
Create a serverSocket to send request to the original server
Compose the request header, send request
# Print the request sent to the original server
Read response from the original server
Parse it to extract the relevant components
# Print the response header from the original server
Compose response to the client and send on clientSocket
# Print the response header from the proxy to the client
Close clientSocket
# End of while True
Close welcomeSocket
Note: Examples can be found in the screenshots below
HTTP Processing done by Proxy
Examples of parsing done by the Proxy to extract the method, dest address, HTTP version, hostname,
URL, file name from the client HTTP request, and examples of the Proxy building the request sent to the
original server can be found in the screenshots below.
4. Required Extensions
In addition to the above, you are required to implement the following.
Error Handling
Currently the proxy server does no error handling. This can be a problem especially when the
client requests an object which is not available, since the "404 Not found" response usually has no
response body and the proxy assumes there is a body and tries to read it. Extend the code to be able to
handle the "404 Not found" case.
Caching
Caching: A typical proxy server will cache the web pages each time the client makes a particular
request for the first time. The basic functionality of caching works as follows. When the proxy
gets a request, it checks if the requested object is cached, and if yes, it returns the object from the
cache, without contacting the server. If the object is not cached, the proxy retrieves the object
from the server, returns it to the client and caches a copy for future requests. In practice, the proxy
server must verify that the cached responses are still valid and that they are the correct responses
to the client's requests. You can read more about caching and how it is handled in HTTP in RFC
2068. Add the simple caching functionality described above. You do not need to implement any
replacement or validation policies. Your implementation, however, will need to be able to write
responses to the disk (i.e., the cache) and fetch them from the disk when you get a cache hit. For
this you need to implement some internal data structure in the proxy to keep track of which
objects are cached and where they are on the disk. You can keep this data structure in main
memory; there is no need to make it persist across shutdowns.
Not in the scope of the assignment: Support for POST
The simple proxy server supports only HTTP GET method. Add support for POST, by including
the request body sent in the POST-request.
5. Validation Scenarios
In order to validate your code, use the following scenarios. Note the details of HTTP messages may
depend on the browser used.
htmldog.com/examples/images1.html, response is 200 OK
Do the following for <WebSite> = htmldog.com/examples/images1.html
Step Action Expected behavior
0 Clear the browser’s cache and make sure the
proxy’s cache is empty
1 Start your proxy Proxy should display “WEB PROXY
SERVER LISTENING” or similar message
2 Launch the browser and type
http://localhost:5005/<WebSite>
The browser should display the page
requested. In addition, the Proxy should
display the info shown in screenshot #1. In
particular, it should indicate the objects
requested by the client are not found in the
cache. At the end of this step, check that the
objects requested by the client are in the
cache.
3 type again http://localhost:5005/<WebSite> The browser should display the page
requested. In addition, the Proxy should
display the info shown in screenshot #2. In
particular, it should indicate the objects
requested by the client are found in the
cache.
Screenshot #1 (Objects not in cache)
Screenshot # 2 (Objects in cache)
www.google.com/unknown, response is 404 not found
Do the following for <WebSite> = www.google.com/unknown (assume the Proxy is already running)
Step Action Expected behavior
Launch the browser and type
http://localhost:5005/<WebSite>
The browser should display the “404 not
found” page. In addition, the Proxy should
display the info shown in screenshot #3.
Screenshot #3
www.google.com, response is 302 found (redirection)
Do the following for <WebSite> = www.google.com (assume the Proxy is already running)
Step Action Expected behavior
Launch the browser and type
http://localhost:5005/<WebSite>
The browser should display the page
requested. In addition, the Proxy should
display the info shown in screenshot #4.
Screenshot #4
6. Research Paper Topics
Write a research paper on one of the below topics. The research paper is essentially a compilation
from the public literature, and you need to write it for your class mates as target audience. You may
be asked to make an in-class presentation of your paper (or a subset of it). More details on the
presentation will be provided at a later stage.
In the first phase, you are required to submit for approval a proposed outline listing the specific
items in your paper before you work on the detailed paper to be submitted in the final team report.
Below are the topics, please refer to the “Research-paper” document for details.
• Overview of SPDY/HTTP 2 and Google’s QUIC
• SSL/TLS
7. What to Turn In
1. A proposed action plan by the due date specified in “ProjectTimeline” in “Projects Overview”.
2. A proposed outline of the research paper for approval by the due date specified in “ProjectTimeline” in
“Projects Overview”.
3. A team report by the due date specified in “ProjectTimeline” in “Projects Overview”.
a) Including the complete code for including the extensions. The complete code includes the pieces
you modified or wrote and the pieces you did not have to modify. Include a README file that
describes how to compile/run the program.
b) Providing a background summary of HTTP and caching
c) Describing your hardware setup and configuration
d) Including the design document of your code. This should give a description of the various design
choices you have made, for example the error handling function, caching mechanism, etc.
e) Providing screenshots at the client side verifying that you indeed get the web page via the proxy
server in the normal case, and screenshots of other cases such as “404 Not found”. Also include
screenshots of the info displayed by the Proxy (HTTP messages, results of parsing, result of
cache look up, etc.)
f) Include screenshots of the client side browser verifying that you indeed get the web page via the
proxy server in the normal case,.
g) Describing what issues, if any, the team encountered during the project, how the team overcame
the issues and what the team learned from the project. You can also provide suggestions on how
the projects in Computer Networks could be improved in the future.
h) Including a video clip to demo the code running
i) Including the research paper
4. Individual reports, one for each team member by the due date specified in “ProjectTimeline” in
“Projects Overview”. The individual report is confidential and not shared with the other team members.
a) If you, as an individual team member, have anything specific to add to 1.f) in the team report,
please do it in your individual report. Describe what issues, if any, you, as an individual team
member, encountered during the project, how you overcame the issues and what you learned from
the project (this is not necessarily just about the topic, could be related to teamwork, etc.). You
can also provide suggestions on how the projects in Computer Networks could be improved in the
future. This complements the team report with any individual viewpoint not included in the team
report.
b) Describe what each team member (including yourself) did and contributed to the project, and
assign a numerical score from 1 to 10 to each team member, including yourself. 1 is the poorest,
and 10 is the best.
5. Powerpoint slides for an in-class presentation, by the date specified in “ProjectTimeline” in “Projects
Overview”. The scope of the presentation will be specified at a later date, but it is a subset of the research
paper.
Note: There will be a separate session (taking place outside of lecture hours) for you to demo your code is
running and answer questions about your code and design. The in-class presentations and the code demo
sessions will take place towards the end of the semester.
8. Grading Criteria
General Note: The following are criteria used to come up with a team grade. Your final individual project
score is not necessarily the team grade, it may be flexed up or down, depending on your individual
contribution to the team and the quality of your individual report.
Coding (50%)
Source code of your well-structured and well-documented program. Include comments on your codes for
clarification. In addition, your code has to meet the validation scenarios in the project description. You
should be able to demonstrate good understanding of the code and be able to answer specific
questions on the code and design.
Documents (50%)
Group Report (24%)
The group report will be evaluated not only on its content, but also the professionalism of its appearance.
Research Paper and Presentation (26%)
Refer to the “Research-paper” for details on the criteria.
版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。