Lehrstuhl für Informatik 4
Kommunikation und verteilte Systeme
Application Example: WWW
Communication in the WWW
The Client/Server model is used in the WWW
In the following: application protocol examples for WWW and E-Mail
World Wide Web (WWW)
Access to linked documents, which are distributed over several computers in the
Internet
History of the WWW
• Origin: 1989 in the nuclear research laboratory CERN in Switzerland
• Developed to exchange data, figures, etc. between a large number of
geographically distributed project partners via Internet
• First text-based version in 1990
• First graphical interface (Mosaic) in February 1993, followed later by Netscape,
Internet Explorer, …
• Standardization by the World Wide Web Consortium (W3C, http://www.w3.org)
Chapter 4.4: Application Protocols: WWW and E-Mail
Client (a Browser)
• Displays the currently loaded WWW page
• Permits navigating in the network (e.g. by clicking on a hyperlink)
• Offers a number of additional functions (e.g. external viewers or helper
applications)
• Usually, a browser can also be used for other services (e.g. FTP, e-mail,
news, …)
Server
• Process which manages WWW pages.
• Is addressed by the client, e.g. by specifying a URL (Uniform Resource
Locator = logical address of a web page). The server sends the requested page
(or file) back to the client.
WWW, HTML, URL and HTTP
• WWW stands for World Wide Web and means the world-wide cross-linking of
information and documents; it acts like a distributed application for document
management
• Each web page is addressed by a unique URL (Uniform Resource Locator)
(e.g. http://www-i4.informatik.rwth-aachen.de/education/tcpip).
• The standard language for web documents is the HyperText Markup
Language (HTML)
• The standard protocol used between a web server and a web client is the
HyperText Transfer Protocol (HTTP). It
– uses the TCP port 80
– defines the allowed requests and responses
– is an ASCII protocol
That is: HTTP uses a TCP connection to download documents (web
pages) from a server in ASCII format
Loading of Web Pages
[Figure: browser on a PC, DNS server, and WWW server, connected via a TCP/IP network]
1. Browser asks DNS for the IP address of the server
2. DNS answers
3. Browser opens a TCP connection to port 80 of the computer
4. Browser sends the command GET /info/general.html
5. WWW server sends back the file general.html
6. Connection is terminated
Loading of Web Pages
Example: Call of the URL http://www.informatik.rwth-aachen.de/material/general.html
1. The browser determines the URL (which was clicked or typed).
2. The browser asks the DNS for the IP address of the server www.informatik.rwth-aachen.de.
3. DNS answers with 137.226.116.241.
4. The browser opens a TCP connection to port 80 of the computer 137.226.116.241.
5. Afterwards, the browser sends the command GET /material/general.html
6. The WWW server sends back the file general.html.
7. The connection is terminated.
8. The browser analyzes the WWW page general.html and presents the text.
9. If necessary, each picture is loaded over a new connection to the server
(its address is included in the page general.html in the form of a URL).
Note!
Step 9 applies only to HTTP/1.0! With the newer version HTTP/1.1, the connection
can stay open, so all referenced pictures can be loaded before the connection is
terminated (more efficient for pages with many pictures).
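The loading steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names build_get_request and load_page are made up for this example, HTTP/1.0 is used so that the server closes the connection after the reply, and error handling is omitted.

```python
import socket


def build_get_request(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.0 GET request (step 5)."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",  # optional in HTTP/1.0, required in HTTP/1.1
        "",               # the empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def load_page(host: str, path: str, port: int = 80) -> bytes:
    """Steps 2 to 7: DNS lookup, TCP connection, GET, read until close."""
    ip_address = socket.gethostbyname(host)              # steps 2 and 3
    with socket.create_connection((ip_address, port), timeout=10) as conn:
        conn.sendall(build_get_request(host, path))      # steps 4 and 5
        chunks = []
        while True:                                      # step 6
            data = conn.recv(4096)
            if not data:                                 # step 7: server closed
                break
            chunks.append(data)
    return b"".join(chunks)
```

A call such as load_page("www.informatik.rwth-aachen.de", "/material/general.html") would return the raw reply (status line, header lines, and page), provided the host is reachable and still serves plain HTTP on port 80.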
HTTP – Abstract Message Format
GET http://www.informatik.rwth-aachen.de/info/general.html
[Figure labels: the GET command is followed by the URL, which names the HTTP server
(domain name www.informatik.rwth-aachen.de) and the path name (/info/general.html)]
Commands applied to a URL are e.g.
• GET: Load a web page
• HEAD: Load only the header of a web page
• PUT: Store a web page on the server
• POST: Append something to the request passed to the web server
• DELETE: Delete a web page
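As a small sketch of how a URL breaks down into the parts named above, and how a command plus URL becomes the first line of a request, the following uses Python's standard urllib.parse module. The helper names url_parts and request_line are assumptions made for this example.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Split a URL into protocol, domain name, path name, and file name."""
    parts = urlsplit(url)
    directory, _, file_name = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "domain name": parts.hostname,
        "path name": directory + "/",
        "file name": file_name,
    }


def request_line(method: str, url: str, version: str = "HTTP/1.0") -> str:
    """Combine a command (GET, HEAD, ...) with the path taken from the URL."""
    path = urlsplit(url).path or "/"
    return f"{method} {path} {version}"
```

For example, request_line("HEAD", url) yields a request that asks only for the header of the page.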
HTTP Request Header
method sp URL sp version cr lf
header field name : value cr lf
:
header field name : value cr lf
cr lf
Data
sp: space
cr/lf: carriage return/line feed
Request line: necessary part, e.g.
GET server.name/path/file.type
(URL: http://server.name/path/file.type, consisting of protocol, server name, path, and file name)
Header lines: optional, further
information on the host/document, e.g.
Host: www.rwth-aachen.de
Accept-language: fr
User-agent: Opera /5.0
Entity Body: optional. Further
data, if the client transmits data
(POST method)
HTTP Response Header
version sp status code sp phrase cr lf
header field name : value cr lf
:
header field name : value cr lf
cr lf
Data
Status line: status code and
phrase indicate the result of an
inquiry and an associated message,
e.g.
200 OK
400 Bad Request
404 Not Found
Entity Body: requested data
HEAD method: the server answers, but
does not transmit the requested data (e.g.
used if caching is applied)
Groups of status messages:
1xx: Only for information
2xx: Successful inquiry
3xx: Further activities are necessary
4xx: Client error (syntax)
5xx: Server error
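The response format and the status groups can be illustrated with a short parser. This is a minimal sketch, assuming a complete response is already available as bytes; the names parse_response and status_group are chosen for this example, and malformed input (for instance a status line without a phrase) is not handled.

```python
STATUS_GROUPS = {
    1: "Only for information",
    2: "Successful inquiry",
    3: "Further activities are necessary",
    4: "Client error",
    5: "Server error",
}


def parse_response(raw: bytes):
    """Split a response into status line fields, header fields, and entity body."""
    head, _, body = raw.partition(b"\r\n\r\n")   # the empty line ends the header
    lines = head.decode("iso-8859-1").split("\r\n")
    version, status_code, phrase = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status_code), phrase, headers, body


def status_group(status_code: int) -> str:
    """Map a status code to its group via the first digit."""
    return STATUS_GROUPS.get(status_code // 100, "Unknown")
```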
User Identification: Cookies
Problem with HTTP: stateless protocol
• An HTTP session corresponds to one TCP connection. After the connection is
terminated, the web server “forgets” everything about the request.
• Simple principle, sufficient for browsing
• But what about modern applications like online shops, which need to store
information across related sessions?
• Solution: new header field Set-cookie: …
• Instructs the client to store the received cookie together with the server name and to
include it in the header of subsequent requests to that server. Thus the server is able to
identify related requests
• Cookie content:
• name-value pair defined by the server for identification
• optional name-value pairs for e.g. comments, date, TTL
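The client side of this mechanism can be sketched as follows. The class name SimpleCookieStore and the server name in the usage note are assumptions for illustration; parsing relies on Python's standard http.cookies module, and a real browser additionally honours attributes such as expiry and path.

```python
from http.cookies import SimpleCookie


class SimpleCookieStore:
    """Stores cookies per server name and returns them on later requests."""

    def __init__(self):
        self._store = {}  # server name -> {cookie name: cookie value}

    def remember(self, server: str, set_cookie_value: str) -> None:
        """Store the cookie received in a Set-cookie header from this server."""
        cookie = SimpleCookie()
        cookie.load(set_cookie_value)
        jar = self._store.setdefault(server, {})
        for name, morsel in cookie.items():
            jar[name] = morsel.value  # optional attributes are ignored here

    def cookie_header(self, server: str):
        """Build the Cookie header line for a following request, if any."""
        jar = self._store.get(server, {})
        if not jar:
            return None
        return "Cookie: " + "; ".join(f"{n}={v}" for n, v in jar.items())
```

After remember("shop.example", "session=abc123; Max-Age=3600"), every later request to shop.example would carry the line Cookie: session=abc123, which lets the server relate the requests to each other.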
Proxy Server
A proxy is an intermediate entity used by several browsers. It takes over tasks of the
browsers (complexity) and servers for more efficient page loading!
[Figure: several browsers connect via HTTP to the proxy, which connects (e.g. via HTTP)
through the Internet to the servers]
Caching of WWW pages
• A proxy temporarily stores the pages loaded by browsers. If a browser requests a page
which is already in the cache, the proxy checks whether the page has changed since
it was stored (by sending a HEAD request to the server). If not, the
page is delivered from the cache. If so, the page is loaded from the server
as usual and stored in the cache again, replacing the old version.
Integration into a Firewall
• The proxy can deny the access to certain web pages (e.g. in schools)
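The caching behaviour described above can be sketched without any networking. Everything here is an assumption for illustration: the origin server is modelled by a hypothetical FakeOrigin object whose head() returns a modification stamp and whose get() returns the stamp together with the page, standing in for real HEAD and GET requests.

```python
class FakeOrigin:
    """Stand-in for a web server (hypothetical, for illustration only)."""

    def __init__(self):
        self.pages = {}      # url -> (modification stamp, page content)
        self.get_count = 0   # counts full page transfers

    def publish(self, url, body, stamp):
        self.pages[url] = (stamp, body)

    def head(self, url):
        return self.pages[url][0]   # header information only

    def get(self, url):
        self.get_count += 1
        return self.pages[url]


class CachingProxy:
    """Delivers pages from the cache as long as the server copy is unchanged."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}  # url -> (modification stamp, page content)

    def get(self, url):
        if url in self.cache:
            cached_stamp, body = self.cache[url]
            if self.origin.head(url) == cached_stamp:   # page unchanged
                return body
        stamp, body = self.origin.get(url)              # load or reload
        self.cache[url] = (stamp, body)                 # replace old version
        return body
```

The design choice is that the cheap HEAD check is made on every cached request, so a changed page is never served stale, at the cost of one short exchange with the server.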
Electronic Mail: E-Mail
Early systems
A simple file transmission took place, with the convention that the first line contains
the address of the receiver of the file
Problems
E-mail to groups, structuring of the e-mail, delegation of the administration to a
secretary, file editor as user interface, no mixed media
Solution
X.400 as standard for e-mail transfer. However, this specification was too complex
and badly designed. Only a simpler system became generally accepted, cobbled
together “by a handful of computer science students”:
the Simple Mail Transfer Protocol (SMTP)
Electronic Mail: E-Mail
An e-mail system generally consists of two subsystems:
• User Agent (UA, normal e-mail program)
Usually runs on the computer of the user and helps with the processing of
e-mails
– Creation of new and answering of old e-mail
– Receipt and presentation of e-mail
– Administration of received e-mail
• Message Transfer Agent (MTA, e-mail server)
Usually runs in the background (around the clock)
– Delivery of e-mail which is sent by User Agents
– Intermediate storage of messages for users or other Message Transfer
Agents
[Figure: users with their User Agents, each connected to a Message Transfer Agent;
the Message Transfer Agents communicate via the Internet]
Structure of an E-Mail
For sending an e-mail, the following information is needed from the user:
• Message (usually normal text + attachments, e.g. Word file, GIF image, …)
• Destination address (in general in the form mailbox@location,
e.g. [email protected])
• Possibly additional parameters concerning e.g. priority or security
E-Mail formats: two standards in use
• RFC 2822
• MIME (Multipurpose Internet Mail Extensions)
With RFC 2822 an e-mail consists of
• a simple “envelope” (created by the Message Transfer Agent based on the data in
the e-mail header),
• a set of header fields (each one line of ASCII text),
• a blank line, and
• the actual message (Message Body).
E-Mail Header
Header: Meaning
To: Address of the main receiver (possibly several receivers or also a mailing list)
Cc: Carbon copy, e-mail addresses of less important receivers
Bcc: Blind carbon copy, a receiver which is not shown to the other receivers
From: Person who wrote the message
Sender: Address of the actual sender of the message (possibly different from the “From” person)
Received: One entry per Message Transfer Agent on the path to the receiver
Return-Path: Path back to the sender (usually only the e-mail address of the sender)
Date: Transmission date and time
Reply-To: E-mail address to which answers are to be addressed
Message-Id: Unique identification number of the e-mail (for later references)
In-Reply-To: Message-Id of the message to which the answer is directed
References: Other relevant Message-Ids
Subject: One line indicating the contents of the message (is presented to the receiver)
MIME
RFC 2822: only suitable for messages of pure ASCII text without special characters.
Nowadays additionally demanded:
• E-mail in languages with special characters (e.g. French or German)
• E-mail in languages not using the Latin alphabet (e.g. Russian)
• E-mail in languages not using an alphabet at all (e.g. Japanese)
• E-mail not consisting completely of pure text (e.g. audio or video)
MIME keeps the RFC 2822 format, but additionally defines
• a structure in the Message Body (by using additional headers), and
• coding rules for non-ASCII characters
Header: Meaning
MIME-Version: Indicates the version of MIME used
Content-Description: String which describes the contents of the message
Content-Id: Unique identifier for the contents
Content-Transfer-Encoding: Coding which was selected for the contents of the e-mail (some networks understand e.g. only ASCII characters). Examples: base64, quoted-printable
Content-Type: Type/subtype according to RFC 1521, e.g. text/plain, image/jpeg, multipart/mixed
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED;
 BOUNDARY="8323328-2120168431-824156555=:325"

--8323328-2120168431-824156555=:325
Content-Type: TEXT/PLAIN; charset=US-ASCII

A picture is in the appendix
--8323328-2120168431-824156555=:325
Content-Type: IMAGE/JPEG; name="picture.jpg"
Content-Transfer-Encoding: BASE64
Content-ID: <PINE.LNX.3.91.960212212235.325B@localhost>
Content-Description:

/9j/4AAQSkZJRgABAQEAlgCWAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQIBAQEBA
QIBAQECAgICAgICAgIDAwQDAwMDAwICAwQDAwQEBAQEAgMFBQQEBQQEBAT/
2wBDAQEBAQEBAQIBAQIEAwIDBAQEBA […]
KKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAoooo
AKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiig
AooooAD//Z
--8323328-2120168431-824156555=:325--
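A message with the same structure as the example above (a text part plus a base64-coded JPEG attachment) can be produced with Python's standard email package. This is a sketch under assumptions: the function name build_message, the subject line, and the addresses passed in are made up for illustration, and the boundary string is chosen by the library rather than copied from the example.

```python
from email.message import EmailMessage


def build_message(sender: str, recipient: str, text: str,
                  image_bytes: bytes) -> EmailMessage:
    """Build a multipart/mixed e-mail: one text part, one JPEG attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Picture"   # hypothetical subject
    msg.set_content(text)        # becomes the text/plain part
    msg.add_attachment(          # turns the message into multipart/mixed
        image_bytes,
        maintype="image",
        subtype="jpeg",
        filename="picture.jpg",
    )
    return msg
```

Calling str(build_message(...)) shows the header fields, the blank line, and the body parts separated by boundary lines, with the image coded in base64.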
E-Mail over POP3 and SMTP
Simple Mail Transfer Protocol (SMTP)
• Sending e-mails over a TCP connection (port 25)
• SMTP is a simple ASCII protocol
• Without checksums, without encryption
• The receiving machine is the server and begins the communication
• If the server is ready for receiving, it signals this to the client. The client then sends the
information from whom the e-mail comes and who the receiver is. If the
receiver is known to the server, the client sends the message, and the server
confirms the receipt
Post Office Protocol version 3 (POP3)
• Get e-mails from the server over a TCP
connection, port 110
• Commands for logging in and out, message download, deleting messages on
the server (possibly without transferring them to the client)
• Only copies e-mails from the remote server to the local system
E-Mail over POP3 and SMTP
[Figure: User 1 → User Agent 1 → (SMTP) → Message Transfer Agent 1 → (SMTP, Internet) →
Message Transfer Agent 2 → (POP3) → User Agent 2 → User 2]
• User 1: writes an e-mail
• Client 1 (UA 1): formats the e-mail, produces the
receiver list, and sends the e-mail to its mail server
(MTA 1)
• Server 1 (MTA 1): sets up a connection to the SMTP
server (MTA 2) of the receiver and sends a copy of
the e-mail
• Server (MTA 2): produces the header of the e-mail
and places the e-mail into the appropriate mailbox
• Client 2 (UA 2): sets up a connection to the mail
server and authenticates itself with username and
password (unencrypted!)
• Server (MTA 2): sends the e-mail to the client
• Client 2 (UA 2): formats the e-mail
• User 2: reads the e-mail
SMTP - Command Sequence
Communication between partners (from abc.com to beta.edu) in text form of the
following kind:
S: 220 <beta.edu> Service Ready            /* Receiver is ready */
C: HELO <abc.com>                          /* Identification of the sender */
S: 250 <beta.edu> OK                       /* Server announces itself */
C: MAIL FROM:<[email protected]>         /* Sender of the e-mail */
S: 250 OK                                  /* Sending is permitted */
C: RCPT TO:<[email protected]>           /* Receiver of the e-mail */
S: 250 OK                                  /* Receiver known */
C: DATA                                    /* The data are following */
S: 354 Start mail input; end with “<crlf>.<crlf>” on a line by itself
C: From: Krogull@…. <crlf>.<crlf>          /* Transfer of the whole e-mail,
                                              including all headers */
S: 250 OK
C: QUIT                                    /* Terminating the connection */
S: 221 <beta.edu> Server Closing
S = server, receiving MTA / C = client, sending MTA
Get e-mails from the server by means of POP3:
TCP connection port 110
[Figure: Client (UA) on a PC and POP3 Server (MTA), connected via a TCP/IP network;
the server sends greetings and replies, the client sends commands]
• Authorization phase:
USER name
PASS string
• Transaction phase:
STAT
LIST [msg]
RETR msg
DELE msg
NOOP
RSET
QUIT
Minimal protocol with only two command types:
• Copy e-mails to the local computer
• Delete e-mails from the server
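Returning to the SMTP command sequence shown earlier, the client side of the dialogue can be sketched as a function that produces the lines a sending MTA would transmit. This is an illustration under assumptions: the function names and the addresses in the usage note are made up, replies are not awaited, and the sketch adds dot-stuffing (doubling a leading dot so that a message line cannot be mistaken for the end marker), which the slide does not mention.

```python
def smtp_client_commands(client_domain: str, sender: str,
                         recipient: str, message: str) -> list:
    """Return the client lines of one SMTP session, in sending order."""
    lines = message.replace("\r\n", "\n").split("\n")
    # Dot-stuffing: a message line starting with "." gets an extra "."
    stuffed = ["." + line if line.startswith(".") else line for line in lines]
    return [
        f"HELO {client_domain}",                # identification of the sender
        f"MAIL FROM:<{sender}>",                # sender of the e-mail
        f"RCPT TO:<{recipient}>",               # receiver of the e-mail
        "DATA",                                 # the data are following
        "\r\n".join(stuffed) + "\r\n.",         # whole e-mail, then end marker
        "QUIT",                                 # terminating the connection
    ]


def reply_code(reply_line: str) -> int:
    """Extract the three-digit code from a server reply such as '250 OK'."""
    return int(reply_line[:3])
```

For example, smtp_client_commands("abc.com", "alice@abc.com", "bob@beta.edu", text) yields six entries, each of which the client would send terminated by <crlf> after checking the previous reply code (220, 250, 250, 250, 354, 250 in the dialogue above).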
POP3 Protocol
Authorization phase
• user identifies the user
• pass is the user's password
• +OK or -ERR are possible server
answers
Transaction phase
• list lists the
message numbers and the
message sizes
• retr requests a message
by its number
• dele deletes the corresponding
message
S: +OK POP3 server ready
C: user alice
S: +OK
C: pass hungry
S: +OK user successfully logged in
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: <message 1 contents>
S: .
C: dele 1
C: retr 2
S: <message 2 contents>
S: .
C: dele 2
C: quit
S: +OK
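The two phases of the POP3 session above can be mimicked by a small in-memory model of the server side. This is only a sketch: the class name Pop3Mailbox is an assumption, only the commands used in the example are handled, and, unlike the abbreviated listing above, every successful reply starts with a +OK line as a real server would send it.

```python
class Pop3Mailbox:
    """In-memory model of a POP3 server handling one session."""

    def __init__(self, user, password, messages):
        self.user, self.password = user, password
        self.messages = dict(enumerate(messages, start=1))  # number -> text
        self.deleted = set()           # marked by DELE, removed at QUIT
        self.state = "AUTHORIZATION"
        self._claimed_user = None

    def handle(self, line):
        """Process one client line and return the server's reply lines."""
        command, _, argument = line.partition(" ")
        command = command.upper()
        if self.state == "AUTHORIZATION":
            if command == "USER":
                self._claimed_user = argument
                return ["+OK"]
            if (command == "PASS" and self._claimed_user == self.user
                    and argument == self.password):
                self.state = "TRANSACTION"
                return ["+OK user successfully logged in"]
            return ["-ERR authorization required"]
        if command == "LIST":
            listing = [f"{n} {len(m)}" for n, m in self.messages.items()
                       if n not in self.deleted]
            return ["+OK"] + listing + ["."]
        if command in ("RETR", "DELE"):
            number = int(argument) if argument.isdigit() else 0
            if number not in self.messages or number in self.deleted:
                return ["-ERR no such message"]
            if command == "RETR":
                return ["+OK", self.messages[number], "."]
            self.deleted.add(number)
            return ["+OK"]
        if command == "QUIT":
            for number in self.deleted:   # deletions take effect only now
                del self.messages[number]
            self.deleted.clear()
            return ["+OK"]
        return ["-ERR unknown command"]
```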
IMAP as POP3 “Variant”
Enhancement of POP3: IMAP (Internet Message Access Protocol, originally Interactive Mail Access Protocol)
• TCP connection over port 143
• E-mails are not downloaded and stored locally, but remain on the server
• The client performs all actions remotely. This is suitable for users who need access
to their e-mails from different hosts
• The protocol is more complex than POP3: it sets up and manages remote
mailboxes
Meanwhile, many operators of web pages also offer e-mail services: gmx, web.de,
yahoo, …
Here, HTTP serves as the protocol for access to the e-mails. The
management is similar to IMAP, except that the client is integrated into the web
server.
Conclusion
Application protocols in the Internet…
• handle the functionality of layers 5 to 7 of the OSI reference model
• are unaware of the network, focusing on application-related tasks
• define syntax, semantics, and order of exchanged messages for certain applications
This lecture was about communication basics; what’s next?
• Mobile Communications: network and protocol variants for radio transmission
(WS 08/09)
• Distributed Systems: “general” application layer protocols, cooperation of
application processes as addition to communication, support of developing
distributed applications using middleware (SS 08)
• Modeling and Evaluation of Communication Systems: methods for analytic
evaluation of new systems or protocols before implementation (SS 08)
• Simulation: simulative evaluation as addition to analytic evaluation (SS 08)
• Security in Communication Networks: security-related aspects of communication,
e.g. encryption, authentication, anonymity, … (?)
• Multimedia Systems: media formats and coding, quality of service mechanisms, and
transfer/storage of multimedia data (?)