The application layer
Principles of Network Applications
Communication paradigm
Client-server paradigm:
Server:
- always-on host
- permanent IP address
- often in data centers, for scaling
Clients:
- contact, communicate with server
- may be intermittently connected
- may have dynamic IP addresses
- do not communicate directly with each other
Examples: HTTP, IMAP, FTP
Peer-peer architecture:
- no always-on server
- arbitrary end systems directly communicate
- peers request service from other peers, provide service in return to other peers
- peers are intermittently connected and change IP addresses
Example: P2P file sharing (BitTorrent)
Sockets
Process: a program running within a host (the unit of resource management and allocation)
- Addressing: IP address + port
- Communication:
- within the same host: IPC (inter-process communication)
- on different hosts: by exchanging messages
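To make "addressing = IP address + port" concrete, here is a minimal sketch using Python's standard socket module. For a self-contained demo both processes run on one host (loopback address 127.0.0.1) with the server in a thread; across two hosts only the IP address would change. The upper-casing "service" and the function names serve_once and demo are illustrative assumptions.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one connection, read one message, reply with it upper-cased."""
    conn, _client_addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def demo(message=b"hello"):
    # Server side: a socket bound to an (IP address, port) pair.
    # Port 0 asks the OS to pick any free port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    host, port = server.getsockname()

    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()

    # Client side: the server process is addressed by the same (IP, port) pair.
    with socket.create_connection((host, port)) as client:
        client.sendall(message)
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(demo())
```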
Protocols
Securing TCP:
Vanilla TCP & UDP sockets:
- no encryption
- cleartext passwords sent into socket traverse Internet in cleartext (!)
Transport Layer Security (TLS):
- provides encrypted TCP connections
- data integrity
- end-point authentication
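A sketch of how an application typically gets TLS on top of an ordinary TCP socket, using Python's ssl module. The default client context turns on certificate verification and hostname checking (end-point authentication); data written to the wrapped socket is encrypted and integrity-protected. The helper names and the HEAD request are assumptions for illustration, and https_status_line needs network access.

```python
import socket
import ssl


def make_tls_context():
    # Default client context: verifies the server's certificate chain and
    # checks that the certificate matches the requested hostname.
    return ssl.create_default_context()


def https_status_line(host, port=443):
    """Open TCP, wrap it in TLS, send a HEAD request, return the status line."""
    ctx = make_tls_context()
    with socket.create_connection((host, port), timeout=5) as raw_sock:
        # From here on, bytes on the wire are ciphertext, not cleartext.
        with ctx.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls_sock.sendall(request.encode("ascii"))
            first_chunk = tls_sock.recv(4096)
    return first_chunk.split(b"\r\n", 1)[0].decode("ascii", "replace")


if __name__ == "__main__":
    print(https_status_line("www.example.com"))  # requires network access
```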
Web and HTTP
HTTP
HTTP: hypertext transfer protocol
- Web's application-layer protocol
- client/server model
- stateless: the server maintains no information about past client requests (which is why cookies are needed)
- HTTP uses TCP as its transport-layer protocol
There are two types of HTTP connections:
- Non-persistent HTTP: at most one object is sent over a TCP connection, which is then closed. Used by HTTP/1.0 by default.
- Persistent HTTP: multiple objects can be sent over a single TCP connection between client and server. Used by HTTP/1.1 by default.
Non-persistent HTTP example:
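But initiating a new TCP connection for every request is costly: each connection takes extra time to set up and adds OS overhead.
RTT
RTT is short for round-trip time: the time for a small packet to travel from client to server and back. Non-persistent HTTP needs one RTT to initiate the TCP connection and one RTT for the HTTP request and the first bytes of the response to return, so the response time per object is 2 RTT + file transmission time.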
Therefore, persistent HTTP (HTTP/1.1) is used more widely. Properties of persistent HTTP (HTTP/1.1):
- server leaves connection open after sending response
- subsequent HTTP messages between same client/server sent over open connection
- client sends requests as soon as it encounters a referenced object
- as little as one RTT for all the referenced objects (cutting response time in half)
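The RTT accounting above can be turned into a small calculator. This is a simplified model under stated assumptions: objects are fetched one after another (no parallel TCP connections), and TCP slow start and server processing time are ignored; the RTT value and object count in the test are made up.

```python
def page_load_time(rtt, n_referenced, persistent, tx_time=0.0):
    """Time to fetch one base HTML file plus n_referenced objects.

    Non-persistent: every object pays 2 RTT (TCP setup + request/response) + tx_time.
    Persistent: 1 RTT for TCP setup, 1 RTT for the base file, then all referenced
    objects are requested back-to-back, costing as little as 1 RTT in total.
    """
    objects = 1 + n_referenced
    if not persistent:
        return objects * (2 * rtt + tx_time)
    referenced_rtts = 1 if n_referenced > 0 else 0
    return 2 * rtt + referenced_rtts * rtt + objects * tx_time


if __name__ == "__main__":
    # Base page + 10 images, RTT = 100 ms, transmission time ignored.
    print(page_load_time(100, 10, persistent=False))  # 2200
    print(page_load_time(100, 10, persistent=True))   # 300
```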
HTTP message
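Request message
HTTP request message general format:
- Request line: method, URL, HTTP version (e.g. GET /index.html HTTP/1.1)
- Header lines: Host, User-Agent, etc.
- A blank line (CRLF)
- Body (optional)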
Other HTTP request messages:
- GET method: include user data in URL field of HTTP GET request message (following a ‘?’)
- POST method: web pages often include form input; the user input is sent from client to server in the entity body of the HTTP POST request message
- PUT method: uploads a new file (object) to the server, completely replacing the file that exists at the specified URL with the content in the entity body of the PUT HTTP request message
- HEAD method: requests the headers of a resource without its content (body); similar to GET, but the response carries no body
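A sketch of what these request messages look like on the wire, built by hand so the request line, headers, blank line and body are visible. The host www.example.com, the /search path and the form fields are hypothetical; a real client library would add more headers.

```python
from urllib.parse import urlencode


def build_request(method, host, path="/", params=None, body=b""):
    """Return the raw bytes of an HTTP/1.1 request message."""
    if params:
        # GET-style user data: appended to the URL after a '?'
        path = f"{path}?{urlencode(params)}"
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]  # request line + headers
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # A blank line (CRLF CRLF) ends the headers; the entity body follows it.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


if __name__ == "__main__":
    print(build_request("GET", "www.example.com", "/search", {"q": "monkeys banana"}))
    print(build_request("POST", "www.example.com", "/form", body=b"name=alice"))
    print(build_request("HEAD", "www.example.com"))
```

Note how GET carries user data in the URL, POST carries it in the body, and HEAD is just a GET-shaped request whose response omits the body.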
Response message:
- Format:
- Status line: HTTP version, status code, status phrase (e.g. HTTP/1.1 200 OK)
- Headers: Content-Type, etc.
- Body (optional): HTML, JPEG, etc.
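A minimal parser for the response format just described, assuming a complete response already sits in memory (no chunked transfer encoding, no streaming). The sample response in the test is made up.

```python
def parse_response(raw):
    """Split a raw HTTP response into status-line fields, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line separates headers and body
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, phrase = status_line.split(" ", 2)  # phrase may contain spaces
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(code), phrase, headers, body
```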
Cookies
Why we need cookies:
HTTP GET/response interaction is stateless:
- no need for client/server to track “state” of multi-step exchange
- all HTTP requests are independent of each other
- no need for client/server to “recover” from a partially-completed-but-never-completely-completed transaction
Consider, by contrast, a stateful protocol:
What is Cookie
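A cookie is a small piece of data that a server sends to a client's web browser (typically in the response to the client's first request) and that is stored on the client side. On later requests to the same site, the browser sends the cookie back, so the server can recognize the client.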
Format: Set-Cookie: <cookie-name>=<cookie-value>; <attributes>
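Python's standard http.cookies module can produce and parse these headers, which shows both halves of the exchange: the server emits Set-Cookie, and the browser later returns only name=value pairs in a Cookie header. The cookie name session-id, its value and the Max-Age attribute are hypothetical.

```python
from http.cookies import SimpleCookie


def make_set_cookie(session_id):
    """Server side: build the Set-Cookie header line for a new client."""
    cookie = SimpleCookie()
    cookie["session-id"] = session_id       # <cookie-name>=<cookie-value>
    cookie["session-id"]["path"] = "/"      # attribute: valid for the whole site
    cookie["session-id"]["max-age"] = 3600  # attribute: keep for one hour
    return cookie.output()                  # "Set-Cookie: ..." header line


def read_cookie_header(header_value):
    """Server side, later request: parse the Cookie header the browser sent back."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


if __name__ == "__main__":
    print(make_set_cookie("1678"))
    print(read_cookie_header("session-id=1678; theme=dark"))
```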
What it can do:
- track user behavior on a given website (first party cookies)
- track user behavior across multiple websites (third party cookies) without user ever choosing to visit tracker site (!)
How do third-party cookies (from websites you didn't choose to visit) track users' behavior?
The third party is something like ad.com. When you visit example.com, it returns cookies that your browser stores locally; these are first-party cookies. At the same time, the page makes your browser request content from ad.com (e.g. an embedded ad), and ad.com also sends cookies that your browser stores.
When you later visit example2.com, which also embeds an ad from ad.com, your browser again sends a request, together with the stored ad.com cookie, to ad.com, so ad.com can record that you have visited example2.com.
In this way, ad.com learns that you have visited example.com, example2.com, and so on. It builds up a profile of you and can show you personalized ads on any site that embeds ad.com resources.
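The tracking mechanism above can be simulated in a few lines. Everything here is a toy: the classes, the "uid" id format and the assumption that ad.com learns the embedding site (in practice often via the Referer header) are illustrative, not how any particular ad network is implemented.

```python
import itertools
from collections import defaultdict


class AdTracker:
    """Hypothetical third-party server (ad.com) that records where its ads appear."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.profiles = defaultdict(list)  # cookie id -> sites where the ad was shown

    def serve_ad(self, cookie, embedding_site):
        if cookie is None:
            cookie = f"uid{next(self._next_id)}"  # new visitor: issue an id via Set-Cookie
        self.profiles[cookie].append(embedding_site)
        return cookie


class Browser:
    def __init__(self):
        self.cookie_jar = {}  # cookies stored per domain that set them

    def visit(self, site, tracker):
        # The page from `site` embeds an ad, so the browser also contacts ad.com and
        # attaches any cookie ad.com set earlier. Assumption: ad.com learns `site`,
        # e.g. from the Referer header.
        cookie = self.cookie_jar.get("ad.com")
        self.cookie_jar["ad.com"] = tracker.serve_ad(cookie, embedding_site=site)


if __name__ == "__main__":
    tracker, browser = AdTracker(), Browser()
    browser.visit("example.com", tracker)
    browser.visit("example2.com", tracker)
    print(dict(tracker.profiles))
```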
Tip
In Chrome, the cookies of the site being visited can be inspected in the Application panel of the developer tools.
GDPR
When cookies can identify an individual, they are considered personal data and are subject to the personal data rules of the GDPR (General Data Protection Regulation).
Web caches
caching's goal
Satisfy client requests without involving the origin server. This reduces response time for client requests and reduces traffic on an institution's access link.
How it works:
- user configures browser to point to a (local) Web cache and browser sends all HTTP requests to cache
- if object in cache: cache returns object to client
- else cache requests object from origin server, caches received object, then returns object to client
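The hit/miss logic above, as a toy in-memory proxy cache. It deliberately ignores expiry, validation (conditional GET) and Cache-Control; the stand-in origin function and URLs are hypothetical.

```python
class WebCache:
    """Toy proxy cache: answers from its store when it can, else asks the origin."""

    def __init__(self, fetch_from_origin):
        self._fetch_from_origin = fetch_from_origin  # callable: url -> object
        self._store = {}
        self.origin_requests = 0

    def get(self, url):
        if url in self._store:  # hit: origin server not involved
            return self._store[url]
        self.origin_requests += 1  # miss: fetch from origin, keep a copy, return it
        obj = self._fetch_from_origin(url)
        self._store[url] = obj
        return obj


if __name__ == "__main__":
    cache = WebCache(lambda url: f"<contents of {url}>")  # stand-in origin server
    for url in ["http://example.com/a.html", "http://example.com/a.html", "http://example.com/b.jpg"]:
        cache.get(url)
    print(cache.origin_requests)  # 2
```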
Cache-Control
There is a Cache-Control field in both HTTP request and response headers.
- In the request header: indicates how the client (usually a browser) wants to interact with caches when requesting a resource.
- In the response header: used by the server to specify caching instructions for the client (browser) and for intermediate caches (like CDNs or proxy servers).
- Cache-Control: private means the response is specific to a single user and should not be cached by shared caches (e.g., proxy servers or CDNs). However, it can be cached in the browser's local cache.
- Cache-Control: public means the response can be cached by both private and shared caches. This is typically used for resources that are meant to be publicly available, such as images or static files.
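A deliberately simplified sketch of how a cache might read the response header. Real storage rules (RFC 9111) also depend on the status code, max-age, Authorization and more; the no-store directive is not discussed above and is included here as an assumption about a commonly used directive.

```python
def parse_cache_control(header):
    """Return the set of directive names in a Cache-Control header value."""
    return {part.split("=", 1)[0].strip().lower() for part in header.split(",") if part.strip()}


def shared_cache_may_store(header):
    # Proxies/CDNs must not keep user-specific (private) or no-store responses.
    return not ({"private", "no-store"} & parse_cache_control(header))


def browser_cache_may_store(header):
    # The user's own browser cache may keep private responses, but not no-store ones.
    return "no-store" not in parse_cache_control(header)
```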
HTTP/2
Goal: decrease delay in multi-object HTTP requests
Property: increased flexibility at server in sending objects to client:
- methods, status codes, most header fields unchanged from HTTP/1.1
- transmission order of requested objects based on client-specified object priority (not necessarily FCFS)
- push unrequested objects to client
- divide objects into frames and interleave (schedule) the frames to mitigate HOL (head-of-line) blocking, where a large object at the front of the queue delays the smaller objects behind it
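A toy model of why framing helps, not the real HTTP/2 framing layer: time is counted in frame slots on one connection, object sizes are given in frames, and one frame is sent per slot. FCFS models HTTP/1.1; round-robin interleaving stands in for HTTP/2 frame scheduling. The sizes in the test are made up.

```python
from collections import deque


def completion_times(sizes, interleave):
    """Slot at which each object's last frame is sent (objects in request order)."""
    done = [0] * len(sizes)
    t = 0
    if not interleave:  # FCFS: each object is sent in full before the next one starts
        for i, n in enumerate(sizes):
            t += n
            done[i] = t
        return done
    queue = deque(enumerate(sizes))  # round-robin: one frame per object per turn
    while queue:
        i, remaining = queue.popleft()
        t += 1
        if remaining > 1:
            queue.append((i, remaining - 1))
        else:
            done[i] = t
    return done


if __name__ == "__main__":
    sizes = [8, 1, 1, 1]  # one large object requested first, three small ones after it
    print(completion_times(sizes, interleave=False))  # [8, 9, 10, 11]
    print(completion_times(sizes, interleave=True))   # [11, 2, 3, 4]
```

The large object finishes at the same time either way, but the small objects no longer wait behind it.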
E-mail
3 main components: user agent (mail reader), mail servers, SMTP (Simple Mail Transfer Protocol)
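A sketch of the user agent's part of the job using Python's email package: compose a message with headers and body. Handing it to a mail server over SMTP is shown only as a commented-out smtplib call; the addresses and mail.example.com are hypothetical.

```python
from email.message import EmailMessage


def compose(sender, recipient, subject, text):
    """What a user agent does before handing a message to its mail server."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(text)
    return msg


if __name__ == "__main__":
    msg = compose("alice@example.com", "bob@example.org", "Hello", "Hi Bob")
    print(msg.as_string())
    # Sending via SMTP (hypothetical server, needs a reachable SMTP server):
    # import smtplib
    # with smtplib.SMTP("mail.example.com") as server:
    #     server.send_message(msg)
```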
DNS (Domain Name System)
Services
- translate hostname to IP address
- host/mailserver aliasing
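From an application's point of view, the translation service is one library call that goes through the OS resolver (hosts file, local cache, then DNS). A minimal sketch with Python's socket module; resolve_ipv4 is an illustrative helper name.

```python
import socket


def resolve_ipv4(hostname):
    """Ask the OS resolver for the IPv4 addresses of hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr = (ip, port)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print(resolve_ipv4("localhost"))
```

For aliasing, socket.gethostbyname_ex(name) additionally returns the canonical name and the list of aliases.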
Structure
- root name servers: official, contact-of-last-resort for name servers that cannot resolve a name
- Top-Level Domain (TLD) servers:
- responsible for .com, .org, .net, .edu, .aero, .jobs, .museum, and all top-level country domains, e.g.: .cn, .uk, .fr, .ca, .jp
- Network Solutions: authoritative registry for .com, .net TLD
- Educause: .edu TLD
- Authoritative DNS servers: organization’s own DNS server(s), providing authoritative hostname to IP mappings for organization’s named hosts.
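A toy walk down this hierarchy: start at a root server, follow referrals to a TLD server and then to the authoritative server, which returns the address. The server names, zone data and the 192.0.2.10 address (a documentation range) are all hypothetical; real resolvers exchange DNS messages rather than look up Python dicts.

```python
# Hypothetical zone data: each server maps a name suffix either to a referral
# ("NS", next server to ask) or to a final answer ("A", IPv4 address).
SERVERS = {
    "root": {"edu": ("NS", "tld-edu")},
    "tld-edu": {"example.edu": ("NS", "dns.example.edu")},
    "dns.example.edu": {"www.example.edu": ("A", "192.0.2.10")},
}


def resolve_iteratively(name, servers=SERVERS):
    """Follow referrals root -> TLD -> authoritative; return (address, servers asked)."""
    server, asked = "root", []
    while True:
        asked.append(server)
        zone = servers[server]
        # Use the most specific suffix of `name` that this server knows about.
        matches = [s for s in zone if name == s or name.endswith("." + s)]
        if not matches:
            raise LookupError(f"{server} cannot resolve {name}")
        rtype, value = zone[max(matches, key=len)]
        if rtype == "A":
            return value, asked
        server = value  # referral: ask the next server down the hierarchy


if __name__ == "__main__":
    print(resolve_iteratively("www.example.edu"))
```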
Caching
Once (any) name server learns mapping, it caches mapping, and immediately returns a cached mapping in response to a query.
- caching improves response time
- cache entries timeout (disappear) after some time (TTL)
- TLD servers typically cached in local name servers
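A sketch of TTL-based caching in a name server. The clock is injected so expiry can be tested without waiting; the hostname and address are hypothetical.

```python
import time


class DnsCache:
    """Caches name -> address mappings until their TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (address, expiry time)

    def put(self, name, address, ttl):
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None  # never learned: must query other servers
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # TTL elapsed: entry disappears
            return None
        return address  # answered immediately from cache
```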
DNS records
DNS: a distributed database storing resource records (RR)
RR format: (name, value, type, ttl)
Common record types:
- A record (address record): name is a domain name; value is its IPv4 address
- AAAA record (IPv6 address record): name is a domain name; value is its IPv6 address
- NS record (name server): name is a domain; value is the hostname of the authoritative name server for this domain
- CNAME record: name is an alias for some "canonical" (the real) name, i.e. another domain name that has its own records; value is the canonical name
- MX record (mail exchange record): name is a domain name; value is the name of the SMTP mail server associated with that domain name
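Resource records as (name, value, type, ttl) tuples, with a lookup that follows a CNAME to its canonical name. All names, addresses and TTLs are hypothetical example data.

```python
from typing import NamedTuple


class RR(NamedTuple):
    name: str
    value: str
    type: str
    ttl: int


RECORDS = [
    RR("example.com", "dns1.example.com", "NS", 86400),
    RR("www.example.com", "web.example.com", "CNAME", 3600),  # alias -> canonical
    RR("web.example.com", "192.0.2.10", "A", 3600),
    RR("example.com", "mail.example.com", "MX", 3600),
]


def lookup(name, rtype, records=RECORDS, max_hops=8):
    """Return values of type rtype for name, following CNAME aliases."""
    for _ in range(max_hops):
        matches = [r for r in records if r.name == name]
        direct = [r.value for r in matches if r.type == rtype]
        if direct:
            return direct
        aliases = [r.value for r in matches if r.type == "CNAME"]
        if not aliases:
            return []
        name = aliases[0]  # continue with the canonical name
    raise LookupError("CNAME chain too long")
```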
DNS protocol messages:
A DNS response message in Wireshark (captured with the display filter dns)
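To see what such a message contains byte by byte, here is a sketch that builds a DNS query: a 12-byte header (ID, flags, four 16-bit counts) followed by one question (the name as length-prefixed labels ending in a zero byte, then QTYPE and QCLASS). The query ID 0x1234 is arbitrary; the function only builds bytes and does not send them.

```python
import struct


def build_dns_query(hostname, query_id=0x1234, qtype=1):
    """Build a DNS query message asking for records of type qtype (1 = A)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR counts = 0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero-length label
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS = 1 (IN)
    return header + question
```

Sending these bytes over UDP to port 53 of a resolver and capturing the exchange with the dns display filter should show the same fields that Wireshark decodes.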
Useful commands
There are some useful commands for DNS:
nslookup -option1 -option2 host-to-find dns-server
nslookup is used to resolve the IP address of a specific domain.
With options, e.g. nslookup -type=NS zju.edu.cn, it gives the DNS servers of the domain zju.edu.cn and their IP addresses.
ipconfig /displaydns
(on Windows) displays the DNS records cached on your computer.
ipconfig /flushdns
flushes these cached DNS records.
P2P applications
P2P architecture
Basic idea
- no always-on server
- arbitrary end systems directly communicate
- peers request service from other peers, provide service in return to other peers
- self scalability: new peers bring new service capacity, as well as new service demands
- peers are intermittently connected and change IP addresses
- complex management
- examples: P2P file sharing (BitTorrent), streaming (KanKan), VoIP (Skype)
File distribution time
For the following discussion, F is the file size, and u and v denote upload and download rates, respectively.
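With N clients, server upload rate u_s, peer upload rates u_i and slowest client download rate v_min, the standard lower bounds are D_cs ≥ max{N·F/u_s, F/v_min} for client-server (the server must upload N copies; the slowest client must download F) and D_P2P ≥ max{F/u_s, F/v_min, N·F/(u_s + Σu_i)} for P2P (at least one copy must leave the server, and the total upload capacity must deliver N·F bits). The sketch below evaluates them; the subscripted names are this sketch's notation and the numbers in the test are made up (any consistent units, e.g. Mbit and Mbit/s).

```python
def d_client_server(F, u_s, v_min, N):
    """Lower bound on the time for one server to send a file of size F to N clients."""
    return max(N * F / u_s, F / v_min)


def d_p2p(F, u_s, v_min, peer_uploads):
    """Lower bound when the N peers (upload rates peer_uploads) also redistribute."""
    N = len(peer_uploads)
    return max(F / u_s, F / v_min, N * F / (u_s + sum(peer_uploads)))


if __name__ == "__main__":
    for N in (10, 100):
        print(N, d_client_server(100, 10, 20, N), d_p2p(100, 10, 20, [1] * N))
```

Client-server time grows linearly with N, while the P2P bound grows much more slowly, because each new peer also adds upload capacity (self scalability).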
BitTorrent
- A peer joining a torrent:
- has no chunks, but will accumulate them over time from other peers
- registers with the tracker to get a list of peers, and connects to a subset of peers ("neighbors")
- while downloading, the peer uploads chunks to other peers
- a peer may change the peers with whom it exchanges chunks
- churn: peers may come and go
- once a peer has the entire file, it may (selfishly) leave or (altruistically) remain in the torrent
Requesting chunks: at any given time, different peers have different subsets of file chunks. Periodically, Alice asks each peer for the list of chunks that it has. Alice then requests missing chunks from peers, rarest first.
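A sketch of the rarest-first choice: count how many neighbors hold each chunk, and among the chunks Alice is missing pick the one held by the fewest. Breaking ties by chunk index is an arbitrary choice of this sketch; the peer names and chunk sets are hypothetical.

```python
from collections import Counter


def pick_rarest(my_chunks, neighbor_chunks):
    """Return (rarest missing chunk, neighbors that have it), or (None, [])."""
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks)  # how many neighbors hold each chunk
    missing = [c for c in counts if c not in my_chunks]
    if not missing:
        return None, []
    chunk = min(missing, key=lambda c: (counts[c], c))  # fewest copies first
    holders = sorted(p for p, chunks in neighbor_chunks.items() if chunk in chunks)
    return chunk, holders


if __name__ == "__main__":
    alice = {0, 1}
    neighbors = {"bob": {0, 1, 2, 3}, "carol": {2, 3}, "dave": {3, 4}}
    print(pick_rarest(alice, neighbors))  # (4, ['dave'])
```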
Sending chunks: tit-for-tat
- Alice sends chunks to the four peers currently sending her chunks at the highest rate. Other peers are choked by Alice (they do not receive chunks from her)
- Re-evaluate top 4 every 10 secs
- Every 30 secs: randomly select another peer and start sending it chunks, i.e. “optimistically unchoke” this peer; the newly chosen peer may join the top 4
- For higher upload rate: find better trading partners
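One evaluation round of the unchoking rule above, as a sketch: rank neighbors by the rate at which they currently send to Alice, unchoke the top 4, and pick one of the remaining peers at random as the optimistic unchoke. The 10 s / 30 s timers are left to the caller; peer names and rates are hypothetical.

```python
import random


def choose_unchoked(rates, k=4, rng=random):
    """rates: peer -> observed rate at which that peer sends chunks to us."""
    ranked = sorted(rates, key=rates.get, reverse=True)
    top = ranked[:k]  # tit-for-tat: reciprocate with the best uploaders
    others = ranked[k:]  # these peers are choked...
    optimistic = rng.choice(others) if others else None  # ...except one random peer
    return top, optimistic


if __name__ == "__main__":
    rates = {"a": 50, "b": 40, "c": 30, "d": 20, "e": 10, "f": 5}
    print(choose_unchoked(rates, rng=random.Random(1)))
```

Optimistic unchoking gives newcomers (who have nothing to trade yet) a chance, and lets Alice discover partners that may upload faster than her current top 4.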
Video streaming and CDNs
Challenges
- server-to-client bandwidth will vary over time
- packet loss, delay due to congestion will delay playout
- continuous playout constraint: during client video playout, playout timing must match the original timing
With client-side buffering:
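A toy check of why a startup delay (client-side buffering) helps. Assumptions of this sketch: chunks are played back to back, each lasting chunk_duration, starting startup_delay after the first chunk arrives, and the schedule is never adjusted; a chunk that arrives after its playout deadline would cause a stall. The arrival times in the test are made up.

```python
def late_chunks(arrivals, chunk_duration, startup_delay):
    """Indices of chunks that arrive after their scheduled playout time."""
    start = arrivals[0] + startup_delay  # playout begins once the buffer has filled
    return [i for i, t in enumerate(arrivals) if t > start + i * chunk_duration]


if __name__ == "__main__":
    arrivals = [0, 1, 3, 3.5, 6]  # jittery network delivery times (seconds)
    print(late_chunks(arrivals, chunk_duration=1, startup_delay=0))  # [2, 3, 4]
    print(late_chunks(arrivals, chunk_duration=1, startup_delay=2))  # []
```

A larger startup delay absorbs more variation in delay, at the cost of a longer wait before playback starts.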
DASH
DASH stands for Dynamic, Adaptive Streaming over HTTP. It is an approach that allows a client to adapt the encoding rate of the retrieved video to network congestion conditions.
- Server:
- divides video file into multiple chunks
- each chunk encoded at multiple different rates
- different rate encodings stored in different files
- files replicated in various CDN nodes
- manifest file: provides URLs for different chunks
- Client:
- periodically estimates server-to-client bandwidth
- consulting manifest, requests one chunk at a time
- chooses maximum coding rate sustainable given current bandwidth
- can choose different coding rates at different points in time (depending on available bandwidth at time), and from different servers
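A sketch of the client-side rule "choose the maximum coding rate sustainable given current bandwidth": estimate bandwidth from the last chunk's download, then pick the highest encoding not above it. The safety factor and the fallback to the lowest rate are assumptions of this sketch; real players also weigh buffer level. The rate ladder and measurements are hypothetical.

```python
def estimate_kbps(chunk_bits, download_seconds):
    """Bandwidth estimate from how long the last chunk took to download."""
    return chunk_bits / download_seconds / 1000


def choose_rate(available_kbps, bandwidth_kbps, safety=1.0):
    """Highest encoding rate sustainable at the estimated bandwidth."""
    sustainable = [r for r in available_kbps if r <= bandwidth_kbps * safety]
    return max(sustainable) if sustainable else min(available_kbps)


def plan(available_kbps, measured_kbps):
    """Rate chosen for each successive chunk as the estimate changes."""
    return [choose_rate(available_kbps, bw) for bw in measured_kbps]


if __name__ == "__main__":
    ladder = [250, 750, 1500, 3000]  # encodings listed in the manifest (hypothetical)
    print(plan(ladder, [2000, 800, 100, 5000]))  # [1500, 750, 250, 3000]
```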
General streaming
Streaming video = encoding + DASH + playout buffering
CDN (content delivery network)
challenge: how to stream content (selected from millions of videos) to hundreds of thousands of simultaneous users?
store/serve multiple copies of videos at multiple geographically distributed sites (CDN)
- enter deep: push CDN servers deep into many access networks
- bring home: smaller number of larger clusters in POPs near access nets