From ffe74a37b860939e426fe88fb264204a703710c9 Mon Sep 17 00:00:00 2001
From: mo khan
Date: Sat, 27 Sep 2025 12:11:59 -0600
Subject: organize files

---
 generated/NOTES.md      |  1190 ++
 generated/STUDY_PLAN.md |   398 +
 generated/textbook.md   | 31112 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 32700 insertions(+)
 create mode 100644 generated/NOTES.md
 create mode 100644 generated/STUDY_PLAN.md
 create mode 100644 generated/textbook.md

diff --git a/generated/NOTES.md b/generated/NOTES.md
new file mode 100644
index 0000000..08c05e2
--- /dev/null
+++ b/generated/NOTES.md
@@ -0,0 +1,1190 @@
# COMP-347: Computer Networks - Final Exam Study Notes

**Based on:** Computer Networking: A Top-Down Approach, 8th Edition
**Course Units:** 1-7 covering all major networking concepts
**Prepared for:** Final Examination

---

## Table of Contents

1. [Unit 1: Introduction to Computer Networks](#unit-1-introduction-to-computer-networks)
2. [Unit 2: Application Layer](#unit-2-application-layer)
3. [Unit 3: Transport Layer](#unit-3-transport-layer)
4. [Unit 4: Network Layer - Data Plane](#unit-4-network-layer---data-plane)
5. [Unit 5: Network Layer - Control Plane](#unit-5-network-layer---control-plane)
6. [Unit 6: Link Layer and LANs](#unit-6-link-layer-and-lans)
7. [Unit 7: Wireless and Mobile Networks](#unit-7-wireless-and-mobile-networks)
8. [Key Formulas and Calculations](#key-formulas-and-calculations)
9. [Protocol Comparison Tables](#protocol-comparison-tables)
10. [Common Exam Topics](#common-exam-topics)

---

## Unit 1: Introduction to Computer Networks

### 1.1 What is the Internet?

**Key Concepts:**
- **Internet**: Global network of interconnected computers using TCP/IP protocols
- **Network Edge**: End systems (hosts) that connect to the Internet
- **Network Core**: Routers and switches that forward data
- **ISP Hierarchy**: Tier-1, Regional, and Local ISPs

**Components:**
- **Hosts/End Systems**: Computers, smartphones, IoT devices
- **Communication Links**: Fiber, copper, radio, satellite
- **Packet Switches**: Routers and link-layer switches
- **Protocols**: Rules for communication (TCP, IP, HTTP, etc.)

### 1.2 Network Edge

**Access Technologies:**
- **DSL**: Digital Subscriber Line over telephone lines
- **Cable Internet**: Over cable TV infrastructure
- **FTTH**: Fiber to the Home
- **Ethernet**: Wired LAN connections
- **Wi-Fi**: Wireless LAN (802.11)
- **Cellular**: 3G, 4G/LTE, 5G mobile networks

**Physical Media:**
- **Guided Media**: Twisted pair, coaxial cable, fiber optic
- **Unguided Media**: Terrestrial radio, satellite, microwave

### 1.3 Network Core

**Switching Techniques:**

| Aspect | Circuit Switching | Packet Switching |
|--------|------------------|------------------|
| **Connection** | Dedicated path | Store-and-forward |
| **Resources** | Reserved for entire call | Shared dynamically |
| **Efficiency** | Poor for bursty data | Good for bursty data |
| **Delay** | Consistent | Variable |
| **Examples** | Traditional telephony | Internet |

**Packet Switching Concepts:**
- **Store-and-Forward**: Entire packet received before forwarding
- **Queuing Delay**: Waiting in output buffer
- **Packet Loss**: Buffer overflow causes dropped packets

### 1.4 Delay, Loss, and Throughput

**Types of Delay:**

1. **Processing Delay (d_proc)**: Router processing time
2. **Queuing Delay (d_queue)**: Waiting in output queue
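3. **Transmission Delay (d_trans)**: Time to push packet onto link
   - Formula: d_trans = L/R (L = packet length, R = transmission rate)
4. **Propagation Delay (d_prop)**: Time for signal to travel
   - Formula: d_prop = d/s (d = distance, s = propagation speed)

**Total Nodal Delay:**
```
d_nodal = d_proc + d_queue + d_trans + d_prop
```

These formulas are easy to sanity-check numerically. Here is a minimal Python sketch (the function and variable names are illustrative, not from the course materials):

```python
def nodal_delay(L, R, d, s, d_proc=0.0, d_queue=0.0):
    """Per-hop nodal delay in seconds.

    L: packet length (bits), R: link rate (bps),
    d: link distance (m), s: propagation speed (m/s),
    d_proc / d_queue: processing and queuing delays (s).
    """
    d_trans = L / R   # time to push all bits onto the link
    d_prop = d / s    # time for one bit to cross the link
    return d_proc + d_queue + d_trans + d_prop

# Example: a 1,500-byte packet on one 10 Mbps, 100 km fiber hop
# d_trans = 12,000/10^7 = 1.2 ms; d_prop = 10^5/(2*10^8) = 0.5 ms
print(nodal_delay(L=1500 * 8, R=10e6, d=100e3, s=2e8))  # ~0.0017 s
```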
**Throughput:**
- **Instantaneous**: Rate at a given instant
- **Average**: Long-term average rate
- **Bottleneck Link**: Link with minimum transmission rate

**Traffic Intensity:**
- Formula: La/R (L = avg packet length, a = avg arrival rate, R = transmission rate)
- If La/R > 1: Queues grow without bound
- If La/R ≈ 1: Large delays
- If La/R << 1: Small delays

---

## Unit 2: Application Layer

### 2.1 Principles of Network Applications

**Application Architectures:**

1. **Client-Server Model**:
   - Server: Always-on, permanent IP address
   - Clients: Communicate with server, may be intermittently connected

2. **Peer-to-Peer (P2P)**:
   - Minimal/no dedicated servers
   - End systems communicate directly
   - Self-scalability

**Transport Services:**
- **Reliable Data Transfer**: TCP provides, UDP does not
- **Throughput**: Minimum guaranteed vs best-effort
- **Timing**: Low-delay guarantees
- **Security**: Encryption, authentication

### 2.2 Web and HTTP

**HTTP (HyperText Transfer Protocol):**
- **Stateless**: Server maintains no client state
- **TCP-based**: Uses reliable transport
- **Methods**: GET, POST, HEAD, PUT, DELETE

**HTTP Connections:**
- **Non-persistent**: Separate TCP connection for each object
- **Persistent**: Multiple objects over single TCP connection
  - Without pipelining: Wait for response before next request
  - With pipelining: Send requests back-to-back

**Response Codes:**
- 200 OK: Request succeeded
- 301 Moved Permanently: Object moved
- 400 Bad Request: Request not understood
- 404 Not Found: Requested document not found
- 505 HTTP Version Not Supported

**Web Caching:**
- **Proxy Server**: Acts as intermediary
- **Benefits**: Reduced response time, reduced traffic
- **Conditional GET**: If-Modified-Since header

### 2.3 Electronic Mail

**Email System Components:**
- **User Agents**: Mail readers (Outlook, Gmail)
- **Mail Servers**: Store and forward messages
- **SMTP**: Simple Mail Transfer Protocol

**Email Protocols:**

| Protocol | Purpose | Port | Characteristics |
|----------|---------|------|-----------------|
| **SMTP** | Sending mail | 25 | Push protocol, ASCII-based |
| **POP3** | Retrieving mail | 110 | Download-and-delete |
| **IMAP** | Retrieving mail | 143 | Server-side storage |

**SMTP Process:**
1. Client establishes TCP connection to server port 25
2. Client sends commands, server responds with status codes
3. Transfer message using DATA command
4. Close connection
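As a concrete illustration of this dialogue, here is a short sketch using Python's standard smtplib, which issues the MAIL FROM / RCPT TO / DATA commands internally. The host name and addresses are placeholders; a real server would usually require STARTTLS and authentication:

```python
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"   # placeholder addresses
msg["To"] = "bob@example.com"
msg["Subject"] = "Test"
msg.set_content("Hello from SMTP")

# Open a TCP connection to port 25 and push the message.
# "mail.example.com" is a placeholder server name.
with smtplib.SMTP("mail.example.com", 25) as server:
    server.send_message(msg)  # wraps MAIL FROM, RCPT TO, DATA, then QUIT
```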
### 2.4 Domain Name System (DNS)

**DNS Functions:**
- **Hostname-to-IP translation**
- **Host aliasing** (canonical vs alias names)
- **Mail server aliasing**
- **Load distribution** (replicated web servers)

**DNS Hierarchy:**
- **Root DNS Servers**: Top level (13 logical servers worldwide)
- **TLD DNS Servers**: .com, .org, .net, country codes
- **Authoritative DNS Servers**: Organization's own servers
- **Local DNS Server**: ISP's default name server

**DNS Record Types:**
- **A**: Hostname to IPv4 address
- **AAAA**: Hostname to IPv6 address
- **CNAME**: Alias to canonical hostname
- **MX**: Mail exchange server
- **NS**: Authoritative name server

**DNS Queries:**
- **Recursive**: DNS server queries on behalf of client
- **Iterative**: DNS server returns next server to query

### 2.5 P2P File Distribution

**Scalability Comparison:**

**Client-Server File Distribution:**
- Distribution time: D_cs ≥ max{NF/u_s, F/d_min}
- Grows linearly with N (number of clients)

**P2P File Distribution:**
- Distribution time: D_P2P ≥ max{F/u_s, F/d_min, NF/(u_s + Σu_i)}
- Self-scaling: upload capacity increases with peers

**BitTorrent Protocol:**
- **Torrent**: Group of peers sharing same file
- **Tracker**: Infrastructure node tracking participating peers
- **Chunks**: File divided into 256KB pieces
- **Tit-for-tat**: Trade pieces with neighbors

---

## Unit 3: Transport Layer

### 3.1 Transport Layer Principles

**Transport vs Network Layer:**
- **Network Layer**: Logical communication between hosts
- **Transport Layer**: Logical communication between processes

**Multiplexing/Demultiplexing:**
- **Multiplexing**: Gathering data from multiple sockets
- **Demultiplexing**: Delivering received segments to correct socket
- **Socket Identification**: (source IP, source port, dest IP, dest port)

### 3.2 UDP (User Datagram Protocol)

**UDP Characteristics:**
- **Connectionless**: No handshaking
- **Unreliable**: No delivery guarantee
- **No congestion control**: Sends at desired rate
- **Small header**: 8 bytes only

**UDP Header:**
- Source port (16 bits)
- Destination port (16 bits)
- Length (16 bits)
- Checksum (16 bits)

**UDP Checksum:**
- **Purpose**: Error detection
- **Calculation**: 1's complement of the sum (with carry wraparound) of all 16-bit words
- **Receiver**: Adds all 16-bit words including checksum
  - No errors: Result = 1111111111111111
  - Errors detected: Result ≠ 1111111111111111

### 3.3 TCP (Transmission Control Protocol)

**TCP Characteristics:**
- **Connection-oriented**: Three-way handshake
- **Reliable**: Guarantees delivery
- **Flow control**: Receiver controls sender rate
- **Congestion control**: Network-aware rate control
- **Full-duplex**: Bidirectional data flow

**TCP Segment Structure:**
- **Header Length**: 20-60 bytes (options)
- **Sequence Number**: Byte stream number
- **Acknowledgment Number**: Next expected sequence number
- **Window Size**: Flow control (bytes)
- **Flags**: SYN, FIN, RST, PSH, URG, ACK

**TCP Reliable Data Transfer:**

1. **Sequence Numbers**: Byte-stream numbers
2. **Acknowledgments**: Cumulative ACKs
3. **Retransmission Timer**: Single timer for oldest unACKed segment
4. **Fast Retransmit**: 3 duplicate ACKs trigger immediate retransmission

**TCP Connection Management:**

**Three-Way Handshake (Connection Establishment):**
1. Client sends SYN segment
2. Server responds with SYN-ACK
3. Client sends ACK

**Connection Termination:**
1. Client sends FIN
2. Server sends ACK and FIN
3. Client sends ACK
4. Connection closed

### 3.4 TCP Congestion Control

**Congestion Control Principles:**
- **End-to-end**: TCP infers congestion from loss/delay
- **Network-assisted**: Routers provide feedback

**TCP Congestion Control Algorithm:**

**Slow Start:**
- cwnd starts at 1 MSS
- cwnd doubles each RTT until loss or threshold
- Exponential growth

**Congestion Avoidance:**
- cwnd increases by 1 MSS per RTT
- Linear growth (additive increase)

**Fast Recovery:**
- On 3 duplicate ACKs: ssthresh = cwnd/2, then cwnd = ssthresh + 3 MSS (multiplicative decrease)
- On timeout: ssthresh = cwnd/2, cwnd = 1 MSS, return to slow start

**TCP Tahoe vs Reno:**
- **Tahoe**: Always goes to slow start on loss
- **Reno**: Fast recovery on 3 duplicate ACKs

**Congestion Window Evolution:**
- **Sawtooth Pattern**: Linear increase, multiplicative decrease
- **Average Throughput**: ~0.75 × W/RTT (W = max window size)

---

## Unit 4: Network Layer - Data Plane

### 4.1 Network Layer Overview

**Network Layer Functions:**
- **Forwarding**: Move packets from input to output port
- **Routing**: Determine path from source to destination

**Data Plane vs Control Plane:**
- **Data Plane**: Per-router forwarding function
- **Control Plane**: Network-wide routing logic

### 4.2 Router Architecture

**Router Components:**
- **Input Ports**: Physical layer, link layer, lookup/forwarding
- **Switching Fabric**: Transfer packets from input to output
- **Output Ports**: Store and forward packets
- **Routing Processor**: Control plane functions

**Input Port Processing:**
- **Line Termination**: Physical layer
- **Link Layer Protocol**: Data link layer
- **Lookup/Forwarding**: Destination-based forwarding

**Switching Fabric Types:**
- **Memory**: Via system bus (slowest)
- **Bus**: Via shared bus
- **Crossbar**: Interconnection network (fastest)

**Output Port Processing:**
- **Queuing**: When arrival rate > transmission rate
- **Packet Scheduler**: FIFO, priority, weighted fair queuing

### 4.3 Internet Protocol (IP)

**IPv4 Header Format:**
- **Version** (4 bits): IP version
- **Header Length** (4 bits): 32-bit words
- **Type of Service** (8 bits): Priority, delay, throughput
- **Total Length** (16 bits): Header + data
- **Identification** (16 bits): Fragmentation
- **Flags** (3 bits): Don't fragment, more fragments
- **Fragment Offset** (13 bits): Fragment position
- **Time to Live** (8 bits): Max hops
- **Protocol** (8 bits): Upper layer protocol
- **Header Checksum** (16 bits): Error detection
- **Source Address** (32 bits): Sender IP
- **Destination Address** (32 bits): Receiver IP

**IPv4 Addressing:**

**Classful Addressing (Historical):**
- **Class A**: 1.0.0.0 to 126.0.0.0 (/8)
- **Class B**: 128.0.0.0 to 191.255.0.0 (/16)
- **Class C**: 192.0.0.0 to 223.255.255.0 (/24)

**CIDR (Classless Inter-Domain Routing):**
- **Format**: a.b.c.d/x (x = number of network bits)
- **Subnet Mask**: Network portion identification
- **Longest Prefix Matching**: Most specific route wins

**Subnetting Example:**
- Network: 192.168.1.0/24
- Subnet 1: 192.168.1.0/26 (hosts .1-.62)
- Subnet 2: 192.168.1.64/26 (hosts .65-.126)
- Subnet 3: 192.168.1.128/26 (hosts .129-.190)
- Subnet 4: 192.168.1.192/26 (hosts .193-.254)

**Special IP Addresses:**
- **Loopback**: 127.0.0.0/8
- **Private**: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
- **Link-Local**: 169.254.0.0/16
- **Multicast**: 224.0.0.0/4
- **Broadcast**: 255.255.255.255

**NAT (Network Address Translation):**
- **Purpose**: Share single IP among multiple hosts
- **NAT Table**: (internal IP, port) ↔ (external IP, port)
- **Problems**: Violates end-to-end principle, complicates P2P
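A NAT table is easy to picture as a pair of dictionaries. Below is a toy Python sketch; the class name, starting port number, and addresses are invented for illustration (the 138.76.29.7 / 10.0.0.1:3345 numbers simply echo a textbook-style example):

```python
import itertools

class NatTable:
    """Toy NAT: maps (internal IP, port) <-> external port."""

    def __init__(self, external_ip):
        self.external_ip = external_ip
        self.ports = itertools.count(5000)  # next free external port
        self.out = {}    # (int_ip, int_port) -> ext_port
        self.back = {}   # ext_port -> (int_ip, int_port)

    def outbound(self, int_ip, int_port):
        key = (int_ip, int_port)
        if key not in self.out:             # allocate a new mapping
            ext = next(self.ports)
            self.out[key] = ext
            self.back[ext] = key
        return self.external_ip, self.out[key]

    def inbound(self, ext_port):
        return self.back.get(ext_port)      # None if no mapping exists

nat = NatTable("138.76.29.7")
print(nat.outbound("10.0.0.1", 3345))  # ('138.76.29.7', 5000)
print(nat.inbound(5000))               # ('10.0.0.1', 3345)
```

The reverse map is what makes inbound traffic work at all, and its absence for unsolicited connections is exactly why NAT complicates P2P.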
### 4.4 IPv6

**IPv6 Motivation:**
- **Address Space**: 128-bit addresses (vs 32-bit IPv4)
- **Header Simplification**: Fixed 40-byte header
- **Flow Labeling**: Quality of service
- **Built-in Security**: IPSec integration

**IPv6 Header:**
- **Version** (4 bits): IP version (6)
- **Traffic Class** (8 bits): QoS
- **Flow Label** (20 bits): Flow identification
- **Payload Length** (16 bits): Data length
- **Next Header** (8 bits): Protocol type
- **Hop Limit** (8 bits): TTL equivalent
- **Source Address** (128 bits)
- **Destination Address** (128 bits)

**IPv6 Address Types:**
- **Unicast**: Single interface
- **Multicast**: Group of interfaces
- **Anycast**: Nearest of group

**IPv4 to IPv6 Transition:**
- **Dual Stack**: Run both protocols
- **Tunneling**: Encapsulate IPv6 in IPv4
- **Translation**: Convert between protocols

### 4.5 Generalized Forwarding (SDN)

**Traditional Forwarding:**
- **Destination-based**: Forward based on destination IP
- **Fixed Function**: Hardware-based lookup

**Generalized Forwarding:**
- **Flow-based**: Forward based on header fields
- **Programmable**: Software-defined rules

**OpenFlow Protocol:**
- **Flow Table**: Match + Action + Stats
- **Match Fields**: 12-tuple (IPs, ports, protocol, etc.)
- **Actions**: Forward, drop, modify, send to controller
- **Controller**: Centralized control plane

**SDN Benefits:**
- **Centralized Control**: Global network view
- **Programmability**: Custom forwarding logic
- **Separation**: Control plane from data plane
- **Innovation**: Rapid protocol development

---

## Unit 5: Network Layer - Control Plane

### 5.1 Routing Algorithms

**Graph Abstraction:**
- **Nodes**: Routers
- **Edges**: Physical links
- **Edge Costs**: Delay, congestion, monetary cost

**Routing Algorithm Classification:**
- **Global vs Decentralized**: Complete topology knowledge
- **Static vs Dynamic**: Route changes over time
- **Load-sensitive vs Load-insensitive**: Cost reflects traffic load

### 5.2 Link State Routing (Dijkstra's Algorithm)

**Algorithm Steps:**
1. **Initialization**: Distance to source = 0, others = ∞
2. **Find minimum**: Select unvisited node with minimum distance
3. **Update neighbors**: Relax edge weights
4. **Repeat**: Until all nodes visited

**Dijkstra's Complexity:**
- **Time**: O(n²) with a simple implementation, O((n + m) log n) with a heap (n = nodes, m = edges)
- **Space**: O(n) for distance and predecessor arrays

**Link State Protocol Features:**
- **Flooding**: Broadcast link state to all routers
- **LSDB**: Link State Database at each router
- **SPF**: Shortest Path First calculation
- **Convergence**: Fast when topology changes

### 5.3 Distance Vector Routing

**Bellman-Ford Equation:**
- d_x(y) = min_v{c(x,v) + d_v(y)}
- Distance from x to y = minimum over neighbors v

**Distance Vector Algorithm:**
1. **Initialize**: Distance vector at each node
2. **Periodic Updates**: Send DV to neighbors
3. **Update**: Recalculate using Bellman-Ford
4. **Notify**: Send updates if changes occur
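The Bellman-Ford update in step 3 fits in a few lines of Python. This sketch (all names are mine) recomputes one node's distance vector from its neighbors' most recently received vectors; the sample topology mirrors the classic three-node example with link costs c(x,y)=2, c(x,z)=7, c(y,z)=1:

```python
INF = float("inf")

def dv_update(node, cost, vectors):
    """One Bellman-Ford relaxation for `node`.

    cost:    {neighbor: c(x, v)} - direct link costs
    vectors: {neighbor: {dest: d_v(y)}} - neighbors' latest DVs
    Returns {dest: d_x(y)} where d_x(y) = min over v of c(x,v) + d_v(y).
    """
    dests = sorted({node} | {y for dv in vectors.values() for y in dv})
    new = {}
    for y in dests:
        if y == node:
            new[y] = 0
            continue
        new[y] = min((cost[v] + vectors[v].get(y, INF) for v in cost),
                     default=INF)
    return new

print(dv_update("x", {"y": 2, "z": 7},
                {"y": {"x": 2, "y": 0, "z": 1},
                 "z": {"x": 7, "y": 1, "z": 0}}))
# {'x': 0, 'y': 2, 'z': 3} - the route to z goes via y (2 + 1 < 7)
```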
**Problems:**
- **Count-to-Infinity**: Bad news travels slowly
- **Routing Loops**: Temporary loops during convergence
- **Solution**: Split horizon, poison reverse

### 5.4 OSPF (Open Shortest Path First)

**OSPF Characteristics:**
- **Link State Protocol**: Uses Dijkstra's algorithm
- **Area Concept**: Hierarchical routing
- **Authentication**: Secure routing updates
- **Load Balancing**: Multiple equal-cost paths

**OSPF Areas:**
- **Backbone Area**: Area 0, connects all other areas
- **Regular Areas**: Connected to backbone
- **Stub Areas**: No external routes
- **NSSA**: Not-So-Stubby Areas

**LSA Types:**
- **Type 1**: Router LSA
- **Type 2**: Network LSA
- **Type 3**: Summary LSA
- **Type 4**: ASBR Summary LSA
- **Type 5**: External LSA

### 5.5 BGP (Border Gateway Protocol)

**BGP Purpose:**
- **Inter-domain Routing**: Between autonomous systems
- **Policy-based**: Economic and political considerations
- **Path Vector Protocol**: Maintains entire AS path

**BGP Attributes:**
- **AS-PATH**: Sequence of ASs through which route passed
- **NEXT-HOP**: IP address of next-hop router
- **LOCAL-PREF**: Degree of preference (higher = better)
- **MED**: Multi-Exit Discriminator (lower = better)

**BGP Route Selection:**
1. **Highest LOCAL-PREF**
2. **Shortest AS-PATH**
3. **Closest NEXT-HOP** (hot potato routing)
4. **Additional tie-breakers**

**BGP Loop Prevention:**
- **AS-PATH Attribute**: Contains list of ASs
- **Loop Detection**: Reject routes containing own AS number

**BGP Policy Examples:**
- **Customer-Provider**: Provider advertises customer routes
- **Peer-Peer**: Peers exchange customer routes
- **Valley-Free**: An AS does not provide free transit between its providers and peers

### 5.6 SDN Control Plane

**SDN Architecture:**
- **Data Plane**: Network switches
- **Control Plane**: SDN controller
- **Management Plane**: Network operating system

**OpenFlow Protocol:**
- **Southbound API**: Controller to switch
- **Flow Tables**: Match-action rules
- **Reactive vs Proactive**: On-demand vs pre-installed rules

**SDN Controller Functions:**
- **Topology Discovery**: Learn network topology
- **Routing Calculations**: Compute paths
- **Flow Installation**: Program switch flow tables
- **Load Balancing**: Distribute traffic

**Benefits:**
- **Centralized Control**: Global optimization
- **Programmability**: Custom applications
- **Vendor Independence**: Open standards

### 5.7 Network Management

**SNMP (Simple Network Management Protocol):**
- **Manager**: Monitoring system
- **Agent**: Managed device
- **MIB**: Management Information Base
- **Operations**: GET, SET, TRAP

**SNMP Messages:**
- **GetRequest**: Retrieve variable value
- **SetRequest**: Set variable value
- **GetResponse**: Response to Get/Set
- **Trap**: Asynchronous notification

**Why UDP for SNMP:**
- **Simplicity**: Minimal overhead
- **Reliability**: Application-level retransmission
- **Efficiency**: Small message sizes
- **Availability**: Works during network problems

---

## Unit 6: Link Layer and LANs

### 6.1 Link Layer Introduction

**Link Layer Services:**
- **Framing**: Encapsulate network layer packets
- **Link Access**: Coordinate access to shared medium
- **Reliable Delivery**: Error detection and correction
- **Flow Control**: Pace between sender and receiver
- **Error Detection**: Detect bit errors in frames
- **Error Correction**: Correct bit errors
- **Half-duplex/Full-duplex**: 
Bidirectional communication + +**Where is Link Layer Implemented:** +- **Network Interface Card (NIC)**: Hardware + software +- **Network Adapter**: Ethernet card, Wi-Fi card + +### 6.2 Error Detection and Correction + +**Error Types:** +- **Single Bit Error**: One bit flipped +- **Burst Error**: Multiple consecutive bits flipped + +**Detection vs Correction:** +- **Detection**: Identify presence of errors +- **Correction**: Fix errors without retransmission + +### Parity Checking + +**Single Bit Parity:** +- **Even Parity**: Even number of 1s (including parity bit) +- **Odd Parity**: Odd number of 1s (including parity bit) +- **Detection**: Single bit errors only + +**Two-Dimensional Parity:** +- **Row and Column Parity**: Arrange bits in matrix +- **Detection**: Single bit errors +- **Correction**: Single bit errors (locate intersection) + +### Checksums + +**1's Complement Checksum:** +1. **Sum**: Add all 16-bit words +2. **Wraparound**: Add carry to result +3. **Complement**: Flip all bits +4. **Check**: Add all words + checksum = all 1s + +**Internet Checksum Algorithm:** +- Used in IP, TCP, UDP headers +- Relatively weak error detection +- Fast computation in software + +### Cyclic Redundancy Check (CRC) + +**CRC Process:** +1. **Generator**: Agreed upon r+1 bit pattern G +2. **Remainder**: R = remainder of (D×2^r) ÷ G +3. **Transmitted**: D concatenated with R +4. **Check**: (D||R) divisible by G + +**CRC Properties:** +- **Detection**: All single bit errors +- **Detection**: All double bit errors +- **Detection**: Odd number of bit errors (if G has factor (x+1)) +- **Detection**: All burst errors of length ≤ r + +**Common CRC Standards:** +- **CRC-8**: 8-bit remainder +- **CRC-16**: 16-bit remainder +- **CRC-32**: 32-bit remainder (Ethernet, Wi-Fi) + +### 6.3 Multiple Access Protocols + +**Multiple Access Problem:** +- **Shared Medium**: Single broadcast channel +- **Collision**: Two or more simultaneous transmissions +- **Goal**: Coordinate access to avoid/handle collisions + +**Protocol Categories:** +1. **Channel Partitioning**: Divide channel +2. **Random Access**: Allow collisions, recover +3. **Taking Turns**: Pass token or poll + +### Channel Partitioning Protocols + +**TDMA (Time Division Multiple Access):** +- **Time Slots**: Each node gets fixed time slot +- **Advantages**: No collisions, fair +- **Disadvantages**: Unused slots wasted + +**FDMA (Frequency Division Multiple Access):** +- **Frequency Bands**: Each node gets frequency band +- **Advantages**: No collisions +- **Disadvantages**: Limited frequency spectrum + +**CDMA (Code Division Multiple Access):** +- **Unique Codes**: Each sender assigned unique code +- **Advantages**: Simultaneous transmission +- **Applications**: Cellular networks + +### Random Access Protocols + +**ALOHA:** +- **Pure ALOHA**: Transmit immediately + - **Efficiency**: 18.4% maximum +- **Slotted ALOHA**: Synchronize to time slots + - **Efficiency**: 36.8% maximum + +**CSMA (Carrier Sense Multiple Access):** +- **Listen Before Talk**: Sense channel before transmit +- **Collisions Still Possible**: Propagation delay +- **1-Persistent**: Always transmit when idle +- **p-Persistent**: Transmit with probability p + +**CSMA/CD (Collision Detection):** +- **Listen While Talk**: Detect collisions during transmission +- **Jam Signal**: Alert other stations of collision +- **Binary Exponential Backoff**: Exponentially increase backoff time +- **Minimum Frame Size**: Ensure collision detection + +**CSMA/CD Algorithm:** +1. **Sense**: Is channel idle? 
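2. **Transmit**: If idle, start transmission
3. **Collision Detection**: Monitor for collisions
4. **Jam**: If collision, send jam signal
5. **Backoff**: Wait random time, go to step 1

The random wait in step 5 is binary exponential backoff. Here is a small Python sketch of the standard Ethernet rule; the function name and the 0.1 µs bit time (10 Mbps Ethernet) are illustrative assumptions:

```python
import random

def backoff_time(n_collisions, bit_time=0.1e-6):
    """Binary exponential backoff after the nth collision.

    Ethernet picks K uniformly from {0, 1, ..., 2^m - 1},
    where m = min(n_collisions, 10), then waits K * 512 bit times.
    bit_time defaults to 0.1 microseconds (10 Mbps Ethernet).
    """
    m = min(n_collisions, 10)
    K = random.randint(0, 2**m - 1)
    return K * 512 * bit_time

random.seed(1)
for n in (1, 2, 3):          # expected wait doubles with each collision
    print(n, backoff_time(n))
```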
**CSMA/CA (Collision Avoidance):**
- **Used in Wi-Fi**: Can't detect collisions reliably
- **RTS/CTS**: Request to Send / Clear to Send
- **ACK**: Explicit acknowledgments
- **Backoff**: Random backoff before each transmission

### 6.4 Ethernet

**Ethernet Evolution:**

| Standard | Year | Speed | Cable | Max Distance |
|----------|------|-------|-------|--------------|
| **10BASE-T** | 1990 | 10 Mbps | UTP Cat3 | 100m |
| **100BASE-TX** | 1995 | 100 Mbps | UTP Cat5 | 100m |
| **1000BASE-T** | 1999 | 1 Gbps | UTP Cat5e | 100m |
| **10GBASE-T** | 2006 | 10 Gbps | UTP Cat6a | 100m |

**Ethernet Frame Format:**
- **Preamble** (8 bytes): Synchronization (10101010...)
- **Destination Address** (6 bytes): MAC address
- **Source Address** (6 bytes): MAC address
- **Type** (2 bytes): Higher layer protocol
- **Data** (46-1500 bytes): Payload
- **CRC** (4 bytes): Error detection

**MAC Addresses:**
- **48-bit**: Unique identifier (e.g., 1A-2F-BB-76-09-AD)
- **OUI**: Organizationally Unique Identifier (first 24 bits)
- **NIC**: Network Interface Card specific (last 24 bits)
- **Broadcast**: FF-FF-FF-FF-FF-FF
- **Multicast**: First bit = 1

### 6.5 Link Layer Switches

**Switch Functions:**
- **Learning**: Build MAC address table
- **Flooding**: Forward to all ports if unknown destination
- **Filtering**: Drop frames destined for same segment
- **Forwarding**: Send to specific port

**Switch Learning Algorithm:**
1. **Record**: Source MAC and input port
2. **Age**: Remove old entries
3. **Lookup**: Check destination in table
4. **Forward**: Send to appropriate port or flood

**Spanning Tree Protocol (STP):**
- **Purpose**: Prevent loops in switched networks
- **Root Bridge**: Elected based on lowest bridge ID
- **Port States**: Blocking, listening, learning, forwarding
- **BPDU**: Bridge Protocol Data Units

**VLANs (Virtual LANs):**
- **Purpose**: Logically separate broadcast domains
- **Trunk Links**: Carry multiple VLAN traffic
- **802.1Q**: VLAN tagging standard
- **Benefits**: Security, traffic management, flexibility

---

## Unit 7: Wireless and Mobile Networks

### 7.1 Wireless Networking Fundamentals

**Wireless Challenges:**
- **Signal Strength**: Decreases with distance
- **Interference**: Other sources in same frequency
- **Multipath Propagation**: Signal reflection/scattering
- **Hidden Terminal**: Can't detect all collisions

**Wireless Network Elements:**
- **Wireless Hosts**: Laptops, smartphones
- **Base Station**: Access point, cell tower
- **Wireless Link**: Radio spectrum connection

### 7.2 Wi-Fi (802.11 Wireless LANs)

**802.11 Architecture:**
- **BSS (Basic Service Set)**: Wireless hosts + AP
- **ESS (Extended Service Set)**: Multiple interconnected BSSs
- **Ad Hoc Network**: No access point

**802.11 Standards:**

| Standard | Year | Frequency | Max Speed | Range |
|----------|------|-----------|-----------|-------|
| **802.11a** | 1999 | 5 GHz | 54 Mbps | ~35m |
| **802.11b** | 1999 | 2.4 GHz | 11 Mbps | ~100m |
| **802.11g** | 2003 | 2.4 GHz | 54 Mbps | ~100m |
| **802.11n** | 2009 | 2.4/5 GHz | 600 Mbps | ~70m |
| **802.11ac** | 2013 | 5 GHz | 6.93 Gbps | ~35m |
| **802.11ax** | 2019 | 2.4/5/6 GHz | 9.6 Gbps | ~35m |

**802.11 Frame Structure:**
- **Frame Control** (2 bytes): Frame type, flags 
+- **Duration** (2 bytes): Time to transmit frame +- **Address Fields**: Up to 4 MAC addresses +- **Sequence Control** (2 bytes): Fragment/sequence numbers +- **Data**: Payload (0-2312 bytes) +- **CRC** (4 bytes): Error detection + +**802.11 MAC Protocol (CSMA/CA):** +1. **DIFS**: Distributed Inter-Frame Space wait +2. **Random Backoff**: If channel busy +3. **Transmission**: Send frame +4. **SIFS**: Short Inter-Frame Space +5. **ACK**: Acknowledgment frame + +**RTS/CTS Protocol:** +- **Purpose**: Solve hidden terminal problem +- **RTS**: Request to Send (small frame) +- **CTS**: Clear to Send (broadcast response) +- **Collision Avoidance**: Reserves channel + +**802.11 Power Management:** +- **Sleep Mode**: Node can sleep +- **Beacon Frames**: AP announces sleeping nodes +- **TIM**: Traffic Indication Map + +### 7.3 Cellular Networks + +**Cellular Concept:** +- **Cell**: Geographic area served by base station +- **Frequency Reuse**: Same frequencies in distant cells +- **Handoff**: Transfer between cells +- **Roaming**: Service outside home network + +**1G Networks:** +- **Technology**: Analog, FDMA +- **Service**: Voice only +- **Example**: AMPS (Advanced Mobile Phone System) + +**2G Networks:** +- **Technology**: Digital, TDMA/CDMA +- **Services**: Voice, SMS, low-speed data +- **Examples**: GSM, IS-95 CDMA + +**3G Networks:** +- **Technology**: CDMA-based +- **Services**: Voice, data up to 2 Mbps +- **Examples**: UMTS/WCDMA, CDMA2000 + +**4G/LTE Networks:** +- **Technology**: OFDMA, all-IP +- **Services**: High-speed data (100+ Mbps) +- **Architecture**: Evolved Packet Core (EPC) + +**5G Networks:** +- **Technology**: Massive MIMO, mmWave +- **Services**: Ultra-low latency, IoT, enhanced mobile broadband +- **Speeds**: Up to 20 Gbps + +### 7.4 Mobility Management + +**Mobility Challenges:** +- **Addressing**: How to find mobile user? +- **Routing**: How to route to mobile user? +- **Handoff**: Seamless connectivity during movement + +**Mobile IP (IPv4):** +- **Home Network**: Permanent network +- **Foreign Network**: Visited network +- **Home Agent**: Router in home network +- **Foreign Agent**: Router in foreign network +- **Care-of-Address**: Temporary address in foreign network + +**Mobile IP Process:** +1. **Registration**: Mobile node registers with foreign agent +2. **Tunneling**: Home agent tunnels packets to care-of-address +3. **Delivery**: Foreign agent delivers to mobile node +4. 
**Reverse**: Direct routing or via home agent

**GSM Mobility Management:**
- **HLR (Home Location Register)**: User's home database
- **VLR (Visitor Location Register)**: Current location database
- **MSC (Mobile Switching Center)**: Call switching
- **Authentication**: Challenge-response with shared secret

---

## Key Formulas and Calculations

### Delay Calculations

**Transmission Delay:**
```
d_trans = L / R
where: L = packet length (bits), R = transmission rate (bps)
```

**Propagation Delay:**
```
d_prop = d / s
where: d = distance (m), s = propagation speed (m/s)
```

**Total Delay:**
```
d_total = d_proc + d_queue + d_trans + d_prop
```

**Round-Trip Time (RTT):**
```
RTT = 2 × d_prop (assuming negligible other delays)
```

### Throughput Calculations

**Throughput:**
```
Throughput = min(R1, R2, ..., Rn) for bottleneck link
```

**File Transfer Time:**
```
Transfer Time = File Size / Throughput
```

### Queuing Theory

**Traffic Intensity:**
```
ρ = La / R = λ / μ
where: L = avg packet length (bits), a = λ = avg packet arrival rate,
       R = link capacity, μ = R/L = service rate
```

**Average Queue Length (M/M/1):**
```
L = ρ / (1 - ρ)
```

**Average Waiting Time (M/M/1):**
```
W = ρ / (μ(1 - ρ))
```

### TCP Congestion Control

**TCP Throughput Approximation:**
```
Throughput ≈ 1.22 × MSS / (RTT × √p)
where: MSS = maximum segment size, p = loss probability
```

**TCP Sawtooth Average:**
```
Average Window = (3/4) × W_max
where: W_max = maximum window size before loss
```

### Subnet Calculations

**Number of Subnets:**
```
Number of Subnets = 2^n
where: n = number of borrowed bits
```

**Number of Hosts per Subnet:**
```
Hosts per Subnet = 2^h - 2
where: h = number of host bits, -2 for network and broadcast
```

**Subnet Address Range:**
```
Network Address = IP & Subnet Mask
Broadcast Address = Network Address | (~Subnet Mask)
First Host = Network Address + 1
Last Host = Broadcast Address - 1
```

### Error Detection

**1's Complement Checksum:**
1. Sum all 16-bit words
2. Add carry to result
3. Take 1's complement
4. Append to data
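The four steps translate directly into code. A minimal Python sketch of the Internet checksum follows (the function name and the two sample 16-bit words are my own):

```python
def internet_checksum(data: bytes) -> int:
    """16-bit 1's complement sum of 16-bit words, complemented."""
    if len(data) % 2:                # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # wrap carry around
    return ~total & 0xFFFF

segment = b"\x06\x5b\x4f\x0c"        # two example 16-bit words
csum = internet_checksum(segment)
# Receiver check: the sum over words + checksum must be all 1s,
# so complementing it again yields 0 for error-free data.
check = internet_checksum(segment + csum.to_bytes(2, "big"))
print(hex(csum), check == 0)         # e.g. 0xaa98 True
```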
**CRC Polynomial Division:**
```
Transmitted = Data || CRC
where: CRC = remainder of (Data × 2^r) ÷ Generator
```

### Wireless Calculations

**Path Loss (Free Space):**
```
Path Loss (dB) = 20 log10(d) + 20 log10(f) + 32.44
where: d = distance (km), f = frequency (MHz)
```

**Signal-to-Noise Ratio:**
```
SNR (dB) = 10 log10(Signal Power / Noise Power)
```

**Shannon Capacity:**
```
C = B log2(1 + SNR)
where: C = capacity (bps), B = bandwidth (Hz)
```

---

## Protocol Comparison Tables

### Transport Protocols

| Feature | TCP | UDP |
|---------|-----|-----|
| **Connection** | Connection-oriented | Connectionless |
| **Reliability** | Reliable, ordered delivery | Unreliable, no ordering |
| **Flow Control** | Yes (sliding window) | No |
| **Congestion Control** | Yes | No |
| **Header Size** | 20-60 bytes | 8 bytes |
| **Speed** | Slower (overhead) | Faster (minimal overhead) |
| **Applications** | HTTP, FTP, SMTP, SSH | DNS, DHCP, streaming, games |

### Routing Protocols

| Aspect | Link State (OSPF) | Distance Vector (RIP) | Path Vector (BGP) |
|--------|-------------------|----------------------|------------------|
| **Algorithm** | Dijkstra's | Bellman-Ford | Path vector |
| **Convergence** | Fast | Slow | Slow |
| **Scalability** | Good (areas) | Poor | Excellent |
| **Loop Prevention** | SPF tree | Split horizon | AS-PATH |
| **Metric** | Cost | Hop count | Policy-based |
| **Updates** | Event-triggered | Periodic | Incremental |

### Ethernet Standards

| Standard | Speed | Cable | Distance | Collision Domain |
|----------|-------|-------|----------|------------------|
| **10BASE-T** | 10 Mbps | Cat3 UTP | 100m | Per segment |
| **100BASE-TX** | 100 Mbps | Cat5 UTP | 100m | Per segment |
| **1000BASE-T** | 1 Gbps | Cat5e UTP | 100m | Per link |
| **10GBASE-T** | 10 Gbps | Cat6a UTP | 100m | Per link |

### Wi-Fi Standards

| Standard | Frequency | Max Speed | Range | MIMO |
|----------|-----------|-----------|-------|------|
| **802.11n** | 2.4/5 GHz | 600 Mbps | 70m | 4×4 |
| **802.11ac** | 5 GHz | 6.93 Gbps | 35m | 8×8 |
| **802.11ax** | 2.4/5/6 GHz | 9.6 Gbps | 35m | 8×8 |

### Multiple Access Protocols

| Protocol | Efficiency | Delay | Complexity | Use Case |
|----------|------------|-------|------------|----------|
| **TDMA** | High | Low | Medium | Cellular |
| **FDMA** | High | Low | Medium | Radio |
| **CSMA/CD** | Medium | Variable | Low | Ethernet |
| **CSMA/CA** | Low | High | Medium | Wi-Fi |

---

## Common Exam Topics

### Calculation Problems

1. **Delay and Throughput**
   - Calculate transmission, propagation, and total delay
   - Determine bottleneck links and end-to-end throughput
   - File transfer time calculations

2. **Subnet Design**
   - CIDR notation and subnet masks
   - Number of subnets and hosts
   - IP address ranges and broadcast addresses

3. **TCP Performance**
   - Congestion window evolution
   - Throughput estimation
   - Connection establishment time

4. **Checksum Calculations**
   - 1's complement arithmetic
   - Error detection probability
   - Two-dimensional parity

5. **Routing Algorithm Execution**
   - Dijkstra's shortest path
   - Distance vector updates
   - BGP path selection

### Conceptual Questions

1. **Protocol Comparison**
   - TCP vs UDP characteristics
   - Circuit vs packet switching
   - Link state vs distance vector routing

2. 
**Network Architecture** + - OSI vs Internet protocol stack + - Client-server vs P2P applications + - SDN vs traditional networking + +3. **Error Control** + - Detection vs correction + - ARQ protocols (Stop-and-wait, Go-back-N, Selective Repeat) + - Forward error correction + +4. **Multiple Access** + - Hidden and exposed terminal problems + - CSMA/CD vs CSMA/CA + - Random access vs controlled access + +5. **Wireless Networking** + - Wi-Fi architecture and protocols + - Cellular network evolution + - Mobility management + +### Design Problems + +1. **Network Topology Design** + - Subnet planning and addressing + - Router placement and configuration + - QoS and traffic engineering + +2. **Protocol Selection** + - Application requirements analysis + - Transport protocol choice + - Routing protocol selection + +3. **Performance Optimization** + - Bottleneck identification + - Capacity planning + - Load balancing strategies + +### Security Considerations + +1. **Network Attacks** + - DoS and DDoS attacks + - Man-in-the-middle attacks + - Packet sniffing and spoofing + +2. **Security Mechanisms** + - Encryption and authentication + - Firewalls and intrusion detection + - VPNs and secure tunneling + +--- + +## Final Exam Preparation Tips + +### Study Strategy + +1. **Review assignments** - Your completed assignments cover key exam topics +2. **Practice calculations** - Master delay, throughput, and subnetting problems +3. **Understand protocols** - Know when and why different protocols are used +4. **Create comparison tables** - Organize similar technologies and protocols +5. **Draw diagrams** - Visualize network architectures and protocol operations + +### Key Areas to Focus + +1. **Fundamentals** (Unit 1): Delay calculations, throughput, switching +2. **Application Layer** (Unit 2): HTTP, DNS, P2P file sharing +3. **Transport Layer** (Unit 3): TCP reliability and congestion control +4. **Network Layer** (Units 4-5): IP addressing, routing algorithms +5. **Link Layer** (Unit 6): Error detection, Ethernet, switching +6. **Wireless** (Unit 7): Wi-Fi protocols, cellular networks + +### Problem-Solving Approach + +1. **Read carefully** - Identify what's given and what's asked +2. **Draw diagrams** - Visualize the network or protocol operation +3. **Show work** - Step-by-step calculations with units +4. **Check answers** - Verify reasonableness and units +5. **Explain reasoning** - Justify protocol choices and design decisions + +**Good luck with your final exam!** \ No newline at end of file diff --git a/generated/STUDY_PLAN.md b/generated/STUDY_PLAN.md new file mode 100644 index 0000000..42d80ff --- /dev/null +++ b/generated/STUDY_PLAN.md @@ -0,0 +1,398 @@ +# COMP-347 Final Exam Study Plan (3 Weeks) + +**Textbook:** Computer Networking: A Top-Down Approach, 8th Edition +**Timeline:** 3 weeks to final exam +**Strategy:** Build on your excellent assignment work, focus on high-yield topics + +--- + +## Week 1: Foundation & Core Concepts (Days 1-7) + +### Day 1-2: Network Fundamentals & Performance +**Textbook Sections:** +- **Chapter 1.1-1.4**: What is the Internet? 
Network Edge, Core, Delays +- **Chapter 1.5**: Protocol Layers and Service Models + +**Focus Areas:** +- Delay calculations (you did well on Assignment 1) +- Throughput and bottleneck analysis +- Circuit vs packet switching comparison +- Protocol stack layers + +**Study Method:** +- Read sections, then solve textbook problems +- Cross-reference with Assignment 1 Q2.1 (file transfer analysis) +- Practice delay calculation variations + +**Time Allocation:** 3-4 hours per day + +--- + +### Day 3-4: Application Layer Deep Dive +**Textbook Sections:** +- **Chapter 2.1**: Principles of Network Applications +- **Chapter 2.2**: The Web and HTTP +- **Chapter 2.3**: Electronic Mail +- **Chapter 2.4**: DNS + +**Focus Areas:** +- HTTP connection types and caching (Assignment 1 Q2.3) +- DNS hierarchy and resolution process +- Email protocols (SMTP, POP3, IMAP) +- Client-server vs P2P architectures + +**Study Method:** +- Review Assignment 1 solutions for HTTP and email +- Focus on protocol operation details from textbook +- Understand DNS lookup process with examples + +**Time Allocation:** 3-4 hours per day + +--- + +### Day 5: P2P and CDNs +**Textbook Sections:** +- **Chapter 2.5**: Peer-to-Peer Applications +- **Chapter 2.6**: Video Streaming and Content Distribution Networks + +**Focus Areas:** +- P2P file distribution efficiency (Assignment 1 Q2.4) +- BitTorrent protocol operation +- CDN architecture and benefits + +**Study Method:** +- Review Assignment 1 Q2.4 calculations +- Understand scaling advantages of P2P +- Learn CDN request routing + +**Time Allocation:** 2-3 hours + +--- + +### Day 6-7: Transport Layer Mastery +**Textbook Sections:** +- **Chapter 3.1-3.3**: Transport Layer Services, UDP, TCP Basics +- **Chapter 3.4**: Principles of Reliable Data Transfer +- **Chapter 3.5**: Connection-Oriented Transport (TCP) + +**Focus Areas:** +- TCP reliability mechanisms (Assignment 2 Q1.1) +- Go-Back-N vs Selective Repeat (Assignment 2 Q1.2) +- Error detection methods +- TCP connection management + +**Study Method:** +- Your Assignment 2 Q1.1 is excellent - use as reference +- Focus on RDT protocol evolution +- Practice TCP state diagrams + +**Time Allocation:** 4-5 hours per day + +--- + +## Week 2: Network Layer & Routing (Days 8-14) + +### Day 8-9: Network Layer Fundamentals +**Textbook Sections:** +- **Chapter 4.1**: Introduction to Network Layer +- **Chapter 4.2**: What's Inside a Router? +- **Chapter 4.3**: The Internet Protocol (IP) + +**Focus Areas:** +- IPv4 addressing and CIDR (Assignment 2 Q2.3) +- Subnetting calculations +- IPv6 and transition mechanisms (Assignment 2 Q1.3) +- NAT operation + +**Study Method:** +- Master Assignment 2 Q2.3 CIDR routing methodology +- Practice more subnetting problems from textbook +- Understand IPv6 addressing structure + +**Time Allocation:** 4 hours per day + +--- + +### Day 10-11: Routing Algorithms +**Textbook Sections:** +- **Chapter 5.1**: Introduction to Network Control Plane +- **Chapter 5.2**: Routing Algorithms +- **Chapter 5.3**: Intra-AS Routing (OSPF) + +**Focus Areas:** +- Dijkstra's algorithm (Assignment 2 Q2.2) - you mastered this! 
+- Distance vector routing +- OSPF areas and LSA types +- Link state vs distance vector comparison + +**Study Method:** +- Your Assignment 2 Q2.2 is perfect - review methodology +- Practice Dijkstra's with different topologies +- Understand OSPF hierarchical design + +**Time Allocation:** 4 hours per day + +--- + +### Day 12: BGP and Advanced Routing +**Textbook Sections:** +- **Chapter 5.4**: Routing Among the ISPs (BGP) +- **Chapter 5.5**: The SDN Control Plane + +**Focus Areas:** +- BGP path selection and loop prevention (Assignment 2 Q1.6) +- AS relationships and policies +- SDN architecture and OpenFlow (Assignment 2 Q1.5) + +**Study Method:** +- Build on Assignment 2 Q1.6 BGP knowledge +- Understand Internet hierarchy and peering +- Learn OpenFlow match-action paradigm + +**Time Allocation:** 3-4 hours + +--- + +### Day 13-14: TCP Performance & Network Management +**Textbook Sections:** +- **Chapter 3.6**: Principles of Congestion Control +- **Chapter 3.7**: TCP Congestion Control +- **Chapter 5.7**: Network Management (SNMP) + +**Focus Areas:** +- TCP congestion control algorithms (Assignment 2 Q2.4) +- Throughput calculations and performance analysis +- SNMP operations and MIB (Assignment 2 Q1.4) + +**Study Method:** +- Review Assignment 2 Q2.4 congestion control calculations +- Practice TCP sawtooth pattern problems +- Understand why UDP is used for SNMP + +**Time Allocation:** 4 hours per day + +--- + +## Week 3: Link Layer & Wireless + Review (Days 15-21) + +### Day 15-16: Link Layer & Error Control +**Textbook Sections:** +- **Chapter 6.1**: Introduction to Link Layer +- **Chapter 6.2**: Error-Detection and Error-Correction Techniques +- **Chapter 6.3**: Multiple Access Links and Protocols + +**Focus Areas:** +- Error detection methods (Assignment 2 Q2.1) - your 1's complement work +- Two-dimensional parity (Assignment 3 Q2.2) +- CSMA/CD protocol and timing (Assignment 3 Q2.3) +- Multiple access protocol comparison + +**Study Method:** +- Review Assignment 2 Q2.1 checksum calculations +- Master Assignment 3 Q2.2 two-dimensional parity design +- Understand Assignment 3 Q2.3 collision analysis perfectly + +**Time Allocation:** 4-5 hours per day + +--- + +### Day 17: Ethernet and Switching +**Textbook Sections:** +- **Chapter 6.4**: Switched Local Area Networks +- **Chapter 6.5**: Link Virtualization (VLANs) + +**Focus Areas:** +- Ethernet frame format and addressing +- Switch learning and forwarding +- Spanning Tree Protocol +- VLAN operation and trunking + +**Study Method:** +- Understand switch vs hub operation +- Learn MAC address learning algorithm +- Practice VLAN design problems + +**Time Allocation:** 3-4 hours + +--- + +### Day 18: Wireless Networks +**Textbook Sections:** +- **Chapter 7.1**: Introduction to Wireless Networking +- **Chapter 7.3**: WiFi (802.11 Wireless LANs) + +**Focus Areas:** +- Wireless challenges (hidden terminal, fading) +- 802.11 architecture and standards (Assignment 3 Q1.6) +- CSMA/CA and RTS/CTS (Assignment 3 Q2.4) +- Wi-Fi frame format and power management + +**Study Method:** +- Build on Assignment 3 Q1.6 Wi-Fi standards comparison +- Perfect Assignment 3 Q2.4 RTS/CTS timing analysis +- Understand differences from wired networks + +**Time Allocation:** 3-4 hours + +--- + +### Day 19: Cellular Networks +**Textbook Sections:** +- **Chapter 7.2**: Cellular Internet Access +- **Chapter 7.4**: Mobility Management + +**Focus Areas:** +- Cellular network evolution (2G → 5G) +- GSM architecture and handoffs (Assignment 3 Q1.1) +- LTE characteristics 
(Assignment 3 Q1.2) +- Mobile IP and mobility management + +**Study Method:** +- Review Assignment 3 Q1.1 Anchor MSC role +- Understand Assignment 3 Q1.2 LTE evolution +- Learn cellular frequency reuse concepts + +**Time Allocation:** 3-4 hours + +--- + +### Day 20-21: Final Review & Practice +**Focus:** +- **Calculation Practice**: Work through all assignment calculation problems again +- **Protocol Comparisons**: Use comparison tables from NOTES.md +- **Weak Areas**: Identify and strengthen any gaps +- **Past Exams**: If available, practice with sample questions + +**Study Method:** +1. **Morning**: Review one unit completely +2. **Afternoon**: Practice calculations and problems +3. **Evening**: Quick review of key concepts + +**Priority Topics for Review:** +1. **Delay/throughput calculations** (Assignment 1 strength) +2. **TCP reliability and congestion control** (Assignment 2 strength) +3. **Routing algorithms** (Assignment 2 Dijkstra mastery) +4. **CIDR and subnetting** (Assignment 2 strength) +5. **Wireless protocols** (Assignment 3 comprehensive coverage) +6. **Error detection** (Assignment 2 & 3 strength) + +**Time Allocation:** 6-8 hours per day + +--- + +## Daily Study Schedule Template + +### Optimal Daily Routine: +- **Morning (2-3 hours)**: Read textbook sections +- **Afternoon (1-2 hours)**: Work practice problems +- **Evening (1 hour)**: Review notes and assignment solutions + +### Study Techniques: +1. **Active Reading**: Take notes, create summaries +2. **Practice Problems**: Essential for calculations +3. **Teach Back**: Explain concepts aloud +4. **Spaced Repetition**: Review previous days' material +5. **Error Analysis**: Learn from mistakes + +--- + +## High-Yield Topics (Focus Extra Attention) + +### Calculation-Heavy Areas: +1. **Delay and throughput analysis** - Very common on exams +2. **TCP congestion window evolution** - Complex but high-value +3. **Subnetting and CIDR** - Practical and testable +4. **Routing algorithm execution** - You excel at Dijkstra's +5. **Error detection probabilities** - Mathematical applications + +### Conceptual Areas: +1. **Protocol comparisons** (TCP vs UDP, routing protocols) +2. **Network architecture** (layered approach, SDN) +3. **Wireless challenges** (collision avoidance, mobility) +4. **Internet structure** (ISP hierarchy, BGP policies) +5. 
**Security considerations** (though not heavily tested) + +--- + +## Week-by-Week Goals + +### Week 1 Success Metrics: +- [ ] Can calculate all delay types accurately +- [ ] Understand HTTP caching and DNS resolution +- [ ] Master TCP reliability mechanisms +- [ ] Comfortable with transport protocols + +### Week 2 Success Metrics: +- [ ] Expert at IP addressing and subnetting +- [ ] Can execute routing algorithms perfectly +- [ ] Understand BGP path selection +- [ ] Grasp SDN and network management + +### Week 3 Success Metrics: +- [ ] Master error detection calculations +- [ ] Understand all multiple access protocols +- [ ] Expert on wireless networking concepts +- [ ] Ready for comprehensive exam + +--- + +## Resources to Use Alongside Textbook + +### Your Assignment Solutions (Excellent Reference): +- **Assignment 1**: Application layer, performance calculations +- **Assignment 2**: Transport/network layers, advanced protocols +- **Assignment 3**: Link layer, wireless, comprehensive coverage + +### My Study Notes (NOTES.md): +- Quick reference for key concepts +- Comparison tables for protocols +- Formula summary for calculations +- Exam strategy and tips + +### Additional Practice: +- Textbook end-of-chapter problems +- Online networking calculators (for verification) +- RFC documents for protocol details (if time permits) + +--- + +## Final Week Emergency Plan + +**If running short on time, prioritize:** + +### Must-Know Topics (60% of exam value): +1. Delay/throughput calculations +2. TCP operation and congestion control +3. IP addressing and routing +4. Error detection methods +5. Basic wireless protocols + +### Good-to-Know Topics (30% of exam value): +1. Advanced routing (BGP, OSPF details) +2. Detailed protocol operations +3. Network management +4. Advanced wireless features + +### Nice-to-Know Topics (10% of exam value): +1. Historical context +2. Emerging technologies +3. Implementation details + +--- + +## Success Tips + +### Based on Your Assignment Performance: +1. **You excel at calculations** - use this strength +2. **Your protocol analysis is thorough** - leverage this skill +3. **You understand complex concepts** - trust your preparation +4. **Your technical writing is clear** - organize exam answers well + +### Exam Day Strategy: +1. **Start with calculation problems** (your strength) +2. **Show all work clearly** (partial credit opportunities) +3. **Use comparison tables** (organize protocol questions) +4. **Manage time carefully** (don't get stuck on one problem) +5. **Review answers** (check units and reasonableness) + +**You've got this! Your assignment work shows excellent understanding - now just reinforce with textbook depth.** \ No newline at end of file diff --git a/generated/textbook.md b/generated/textbook.md new file mode 100644 index 0000000..cfb43cd --- /dev/null +++ b/generated/textbook.md @@ -0,0 +1,31112 @@ + Computer Networking A Top-Down Approach Seventh Edition James F. Kurose +University of Massachusetts, Amherst Keith W. 
Ross NYU and NYU Shanghai

Boston Columbus Indianapolis New York San Francisco Hoboken Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montréal Toronto Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo

Vice President, Editorial Director, ECS: Marcia Horton
Acquisitions Editor: Matt Goldstein
Editorial Assistant: Kristy Alaura
Vice President of Marketing: Christy Lesko
Director of Field Marketing: Tim Galligan
Product Marketing Manager: Bram Van Kempen
Field Marketing Manager: Demetrius Hall
Marketing Assistant: Jon Bryant
Director of Product Management: Erin Gregg
Team Lead, Program and Project Management: Scott Disanno
Program Manager: Joanne Manning and Carole Snyder
Project Manager: Katrina Ostler, Ostler Editorial, Inc.
Senior Specialist, Program Planning and Support: Maura Zaldivar-Garcia
Cover Designer: Joyce Wells
Manager, Rights and Permissions: Ben Ferrini
Project Manager, Rights and Permissions: Jenny Hoffman, Aptara Corporation
Inventory Manager: Ann Lam
Cover Image: Marc Gutierrez/Getty Images
Media Project Manager: Steve Wright
Composition: Cenveo Publishing Services
Printer/Binder: Edwards Brothers Malloy
Cover and Insert Printer: Phoenix Color/Hagerstown

Credits and acknowledgments borrowed from other sources and reproduced, with permission, in this textbook appear on appropriate page within text.

Copyright © 2017, 2013, 2010 Pearson Education, Inc. All rights reserved. Manufactured in the United States of America. This publication is protected by Copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, request forms and the appropriate contacts within the Pearson Education Global Rights & Permissions Department, please visit www.pearsoned.com/permissions/.

Many of the designations by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Library of Congress Cataloging-in-Publication Data
Names: Kurose, James F. \| Ross, Keith W., 1956-
Title: Computer networking: a top-down approach / James F. Kurose, University of Massachusetts, Amherst, Keith W. Ross, NYU and NYU Shanghai.
Description: Seventh edition. \| Hoboken, New Jersey: Pearson, \[2017\] \| Includes bibliographical references and index.
Identifiers: LCCN 2016004976 \| ISBN 9780133594140 \| ISBN 0133594149
Subjects: LCSH: Internet. \| Computer networks.
Classification: LCC TK5105.875.I57 K88 2017 \| DDC 004.6-dc23
LC record available at http://lccn.loc.gov/2016004976

ISBN-10: 0-13-359414-9
ISBN-13: 978-0-13-359414-0

About the Authors

Jim Kurose

Jim Kurose is a Distinguished University Professor of Computer Science at the University of Massachusetts, Amherst. He is currently on leave from the University of Massachusetts, serving as an Assistant Director at the US National Science Foundation, where he leads the Directorate of Computer and Information Science and Engineering. Dr. 
Kurose has received a number of recognitions for his educational activities including Outstanding Teacher Awards from the National Technological University (eight times), the University of Massachusetts, and the Northeast Association of Graduate Schools. He received the IEEE Taylor Booth Education Medal and was recognized for his leadership of Massachusetts' Commonwealth Information Technology Initiative. He has won several conference best paper awards and received the IEEE Infocom Achievement Award and the ACM Sigcomm Test of Time Award.

Dr. Kurose is a former Editor-in-Chief of IEEE Transactions on Communications and of IEEE/ACM Transactions on Networking. He has served as Technical Program co-Chair for IEEE Infocom, ACM SIGCOMM, ACM Internet Measurement Conference, and ACM SIGMETRICS. He is a Fellow of the IEEE and the ACM. His research interests include network protocols and architecture, network measurement, multimedia communication, and modeling and performance evaluation. He holds a PhD in Computer Science from Columbia University.

Keith Ross

Keith Ross is the Dean of Engineering and Computer Science at NYU Shanghai and the Leonard J. Shustek Chair Professor in the Computer Science and Engineering Department at NYU. Previously he was at University of Pennsylvania (13 years), Eurecom Institute (5 years) and Polytechnic University (10 years). He received a B.S.E.E from Tufts University, a M.S.E.E. from Columbia University, and a Ph.D. in Computer and Control Engineering from The University of Michigan. Keith Ross is also the co-founder and original CEO of Wimba, which develops online multimedia applications for e-learning and was acquired by Blackboard in 2010.

Professor Ross's research interests are in privacy, social networks, peer-to-peer networking, Internet measurement, content distribution networks, and stochastic modeling. He is an ACM Fellow, an IEEE Fellow, recipient of the Infocom 2009 Best Paper Award, and recipient of 2011 and 2008 Best Paper Awards for Multimedia Communications (awarded by IEEE Communications Society). He has served on numerous journal editorial boards and conference program committees, including IEEE/ACM Transactions on Networking, ACM SIGCOMM, ACM CoNext, and ACM Internet Measurement Conference. He also has served as an advisor to the Federal Trade Commission on P2P file sharing.

To Julie and our three precious ones---Chris, Charlie, and Nina JFK

A big THANKS to my professors, colleagues, and students all over the world. KWR

Preface

Welcome to the seventh edition of Computer Networking: A Top-Down Approach. Since the publication of the first edition 16 years ago, our book has been adopted for use at many hundreds of colleges and universities, translated into 14 languages, and used by over one hundred thousand students and practitioners worldwide. We've heard from many of these readers and have been overwhelmed by the positive response.

What's New in the Seventh Edition?

We think one important reason for this success has been that our book continues to offer a fresh and timely approach to computer networking instruction. 
We've made changes in this seventh edition, but we've also kept unchanged what we believe (and the instructors and students who have used our book have confirmed) to be the most important aspects of this book: its top-down approach, its focus on the Internet and a modern treatment of computer networking, its attention to both principles and practice, and its accessible style and approach toward learning about computer networking. Nevertheless, the seventh edition has been revised and updated substantially. Long-time readers of our book will notice that for the first time since this text was published, we've changed the organization of the chapters themselves. The network layer, which had previously been covered in a single chapter, is now covered in Chapter 4 (which focuses on the so-called "data plane" component of the network layer) and Chapter 5 (which focuses on the network layer's "control plane"). This expanded coverage of the network layer reflects the swift rise in importance of software-defined networking (SDN), arguably the most important and exciting advance in networking in decades. Although a relatively recent innovation, SDN has been rapidly adopted in practice---so much so that it's already hard to imagine an introduction to modern computer networking that doesn't cover SDN. The topic of network management, previously covered in Chapter 9, has now been folded into the new Chapter 5. As always, we've also updated many other sections of the text to reflect recent changes in the dynamic field of networking since the sixth edition. Material that has been retired from the printed text can always be found on this book's Companion Website. The most important updates are the following:

Chapter 1 has been updated to reflect the ever-growing reach and use of the Internet.

Chapter 2, which covers the application layer, has been significantly updated. We've removed the material on the FTP protocol and distributed hash tables to make room for a new section on application-level video streaming and content distribution networks, together with Netflix and YouTube case studies. The socket programming sections have been updated from Python 2 to Python 3.

Chapter 3, which covers the transport layer, has been modestly updated. The material on asynchronous transfer mode (ATM) networks has been replaced by more modern material on the Internet's explicit congestion notification (ECN), which teaches the same principles.

Chapter 4 covers the "data plane" component of the network layer---the per-router forwarding function that determines how a packet arriving on one of a router's input links is forwarded to one of that router's output links. We updated the material on traditional Internet forwarding found in all previous editions, and added material on packet scheduling. We've also added a new section on generalized forwarding, as practiced in SDN. There are also numerous updates throughout the chapter. Material on multicast and broadcast communication has been removed to make way for the new material.

In Chapter 5, we cover the control plane functions of the network layer---the network-wide logic that controls how a datagram is routed along an end-to-end path of routers from the source host to the destination host. As in previous editions, we cover routing algorithms, as well as routing protocols (with an updated treatment of BGP) used in today's Internet.
We've added a significant new section on the SDN control plane, where routing and other functions are implemented in so-called SDN controllers.

Chapter 6, which now covers the link layer, has an updated treatment of Ethernet and of data center networking.

Chapter 7, which covers wireless and mobile networking, contains updated material on 802.11 (so-called "WiFi") networks and cellular networks, including 4G and LTE.

Chapter 8, which covers network security and was extensively updated in the sixth edition, has only modest updates in this seventh edition.

Chapter 9, on multimedia networking, is now slightly "thinner" than in the sixth edition, as material on video streaming and content distribution networks has been moved to Chapter 2, and material on packet scheduling has been incorporated into Chapter 4. Significant new material involving end-of-chapter problems has been added. As with all previous editions, homework problems have been revised, added, and removed.

As always, our aim in creating this new edition of our book is to continue to provide a focused and modern treatment of computer networking, emphasizing both principles and practice.

Audience

This textbook is for a first course on computer networking. It can be used in both computer science and electrical engineering departments. In terms of programming languages, the book assumes only that the student has experience with C, C++, Java, or Python (and even then only in a few places). Although this book is more precise and analytical than many other introductory computer networking texts, it rarely uses any mathematical concepts that are not taught in high school. We have made a deliberate effort to avoid using any advanced calculus, probability, or stochastic process concepts (although we've included some homework problems for students with this advanced background). The book is therefore appropriate for undergraduate courses and for first-year graduate courses. It should also be useful to practitioners in the telecommunications industry.

What Is Unique About This Textbook?

The subject of computer networking is enormously complex, involving many concepts, protocols, and technologies that are woven together in an intricate manner. To cope with this scope and complexity, many computer networking texts are organized around the "layers" of a network architecture. With a layered organization, students can see through the complexity of computer networking---they learn about the distinct concepts and protocols in one part of the architecture while seeing the big picture of how all parts fit together. From a pedagogical perspective, our personal experience has been that such a layered approach indeed works well. Nevertheless, we have found that the traditional approach of teaching---bottom up, that is, from the physical layer toward the application layer---is not the best approach for a modern course on computer networking.

A Top-Down Approach

Our book broke new ground 16 years ago by treating networking in a top-down manner---that is, by beginning at the application layer and working its way down toward the physical layer. The feedback we received from teachers and students alike has confirmed that this top-down approach has many advantages and does indeed work well pedagogically. First, it places emphasis on the application layer (a "high growth area" in networking).
Indeed, many of the recent revolutions in computer networking---including the Web, peer-to-peer file sharing, and media streaming---have taken place at the application layer. An early emphasis on application-layer issues differs from the approaches taken in most other texts, which have only a small amount of material on network applications, their requirements, application-layer paradigms (e.g., client-server and peer-to-peer), and application programming interfaces. Second, our experience as instructors (and that of many instructors who have used this text) has been that teaching networking applications near the beginning of the course is a powerful motivational tool. Students are thrilled to learn about how networking applications work---applications such as e-mail and the Web, which most students use on a daily basis. Once a student understands the applications, the student can then understand the network services needed to support these applications. The student can then, in turn, examine the various ways in which such services might be provided and implemented in the lower layers. Covering applications early thus provides motivation for the remainder of the text. Third, a top-down approach enables instructors to introduce network application development at an early stage. Students not only see how popular applications and protocols work, but also learn how easy it is to create their own network applications and application-level protocols. With the top-down approach, students get early exposure to the notions of socket programming, service models, and protocols---important concepts that resurface in all subsequent layers. By providing socket programming examples in Python, we highlight the central ideas without confusing students with complex code. Undergraduates in electrical engineering and computer science should not have difficulty following the Python code.

An Internet Focus

Although we dropped the phrase "Featuring the Internet" from the title of this book with the fourth edition, this doesn't mean that we dropped our focus on the Internet. Indeed, nothing could be further from the case! Instead, since the Internet has become so pervasive, we felt that any networking textbook must have a significant focus on the Internet, and thus this phrase was somewhat unnecessary. We continue to use the Internet's architecture and protocols as primary vehicles for studying fundamental computer networking concepts. Of course, we also include concepts and protocols from other network architectures. But the spotlight is clearly on the Internet, a fact reflected in our organizing the book around the Internet's five-layer architecture: the application, transport, network, link, and physical layers. Another benefit of spotlighting the Internet is that most computer science and electrical engineering students are eager to learn about the Internet and its protocols. They know that the Internet has been a revolutionary and disruptive technology and can see that it is profoundly changing our world. Given the enormous relevance of the Internet, students are naturally curious about what is "under the hood." Thus, it is easy for an instructor to get students excited about basic principles when using the Internet as the guiding focus.

Teaching Networking Principles

Two of the unique features of the book---its top-down approach and its focus on the Internet---have appeared in the titles of our book.
If we could have squeezed a third phrase into the subtitle, it would have contained the word principles. The field of networking is now mature enough that a number of fundamentally important issues can be identified. For example, in the transport layer, the fundamental issues include reliable communication over an unreliable network layer, connection establishment/teardown and handshaking, congestion and flow control, and multiplexing. Three fundamentally important network-layer issues are determining "good" paths between two routers, interconnecting a large number of heterogeneous networks, and managing the complexity of a modern network. In the link layer, a fundamental problem is sharing a multiple access channel. In network security, techniques for providing confidentiality, authentication, and message integrity are all based on cryptographic fundamentals. This text identifies fundamental networking issues and studies approaches towards addressing these issues. The student learning these principles will gain knowledge with a long "shelf life"---long after today's network standards and protocols have become obsolete, the principles they embody will remain important and relevant. We believe that the combination of using the Internet to get the student's foot in the door and then emphasizing fundamental issues and solution approaches will allow the student to quickly understand just about any networking technology.

The Website

Each new copy of this textbook includes twelve months of access to a Companion Website for all book readers at http://www.pearsonhighered.com/cs-resources/, which includes:

Interactive learning material. The book's Companion Website contains VideoNotes---video presentations of important topics throughout the book done by the authors, as well as walkthroughs of solutions to problems similar to those at the end of the chapter. We've seeded the Web site with VideoNotes and online problems for Chapters 1 through 5 and will continue to actively add and update this material over time. As in earlier editions, the Web site contains the interactive Java applets that animate many key networking concepts. The site also has interactive quizzes that permit students to check their basic understanding of the subject matter. Professors can integrate these interactive features into their lectures or use them as mini labs.

Additional technical material. As we have added new material in each edition of our book, we've had to remove coverage of some existing topics to keep the book at a manageable length. For example, to make room for the new material in this edition, we've removed material on FTP, distributed hash tables, and multicasting. Material that appeared in earlier editions of the text is still of interest, and thus can be found on the book's Web site.

Programming assignments. The Web site also provides a number of detailed programming assignments, which include building a multithreaded Web server, building an e-mail client with a GUI interface, programming the sender and receiver sides of a reliable data transport protocol, programming a distributed routing algorithm, and more.

Wireshark labs. One's understanding of network protocols can be greatly deepened by seeing them in action. The Web site provides numerous Wireshark assignments that enable students to actually observe the sequence of messages exchanged between two protocol entities.
The Web site includes separate Wireshark labs on HTTP, DNS, TCP, UDP, IP, ICMP, Ethernet, ARP, WiFi, SSL, and on tracing all protocols involved in satisfying a request to fetch a Web page. We'll continue to add new labs over time.

In addition to the Companion Website, the authors maintain a public Web site, http://gaia.cs.umass.edu/kurose_ross/interactive, containing interactive exercises that create (and present solutions for) problems similar to selected end-of-chapter problems. Since students can generate (and view solutions for) an unlimited number of similar problem instances, they can work until the material is truly mastered.

Pedagogical Features

We have each been teaching computer networking for more than 30 years. Together, we bring more than 60 years of teaching experience to this text, during which time we have taught many thousands of students. We have also been active researchers in computer networking during this time. (In fact, Jim and Keith first met each other as master's students in a computer networking course taught by Mischa Schwartz in 1979 at Columbia University.) We think all this gives us a good perspective on where networking has been and where it is likely to go in the future. Nevertheless, we have resisted temptations to bias the material in this book towards our own pet research projects. We figure you can visit our personal Web sites if you are interested in our research. Thus, this book is about modern computer networking---it is about contemporary protocols and technologies as well as the underlying principles behind these protocols and technologies. We also believe that learning (and teaching!) about networking can be fun. A sense of humor, use of analogies, and real-world examples in this book will hopefully make this material more fun.

Supplements for Instructors

We provide a complete supplements package to aid instructors in teaching this course. This material can be accessed from Pearson's Instructor Resource Center (http://www.pearsonhighered.com/irc). Visit the Instructor Resource Center for information about accessing these instructor's supplements.

PowerPoint® slides. We provide PowerPoint slides for all nine chapters. The slides have been completely updated with this seventh edition. The slides cover each chapter in detail. They use graphics and animations (rather than relying only on monotonous text bullets) to make the slides interesting and visually appealing. We provide the original PowerPoint slides so you can customize them to best suit your own teaching needs. Some of these slides have been contributed by other instructors who have taught from our book.

Homework solutions. We provide a solutions manual for the homework problems in the text, programming assignments, and Wireshark labs. As noted earlier, we've introduced many new homework problems in the first six chapters of the book.

Chapter Dependencies

The first chapter of this text presents a self-contained overview of computer networking. Introducing many key concepts and terminology, this chapter sets the stage for the rest of the book. All of the other chapters directly depend on this first chapter. After completing Chapter 1, we recommend instructors cover Chapters 2 through 6 in sequence, following our top-down philosophy. Each of these five chapters leverages material from the preceding chapters. After completing the first six chapters, the instructor has quite a bit of flexibility.
There are no interdependencies among the last three chapters, so they can be taught in any order. However, each of the last three chapters depends on the material in the first six chapters. Many instructors first teach the first six chapters and then teach one of the last three chapters for "dessert."

One Final Note: We'd Love to Hear from You

We encourage students and instructors to e-mail us with any comments they might have about our book. It's been wonderful for us to hear from so many instructors and students from around the world about our first five editions. We've incorporated many of these suggestions into later editions of the book. We also encourage instructors to send us new homework problems (and solutions) that would complement the current homework problems. We'll post these on the instructor-only portion of the Web site. We also encourage instructors and students to create new Java applets that illustrate the concepts and protocols in this book. If you have an applet that you think would be appropriate for this text, please submit it to us. If the applet (including notation and terminology) is appropriate, we'll be happy to include it on the text's Web site, with an appropriate reference to the applet's authors. So, as the saying goes, "Keep those cards and letters coming!" Seriously, please do continue to send us interesting URLs, point out typos, disagree with any of our claims, and tell us what works and what doesn't work. Tell us what you think should or shouldn't be included in the next edition. Send your e-mail to kurose@cs.umass.edu and keithwross@nyu.edu.

Acknowledgments

Since we began writing this book in 1996, many people have given us invaluable help and have been influential in shaping our thoughts on how to best organize and teach a networking course. We want to say A BIG THANKS to everyone who has helped us from the earliest first drafts of this book, up to this seventh edition. We are also very thankful to the many hundreds of readers from around the world---students, faculty, practitioners---who have sent us thoughts and comments on earlier editions of the book and suggestions for future editions of the book. Special thanks go out to: Al Aho (Columbia University), Hisham Al-Mubaid (University of Houston-Clear Lake), Pratima Akkunoor (Arizona State University), Paul Amer (University of Delaware), Shamiul Azom (Arizona State University), Lichun Bao (University of California at Irvine), Paul Barford (University of Wisconsin), Bobby Bhattacharjee (University of Maryland), Steven Bellovin (Columbia University), Pravin Bhagwat (Wibhu), Supratik Bhattacharyya (previously at Sprint), Ernst Biersack (Eurécom Institute), Shahid Bokhari (University of Engineering & Technology, Lahore), Jean Bolot (Technicolor Research), Daniel Brushteyn (former University of Pennsylvania student), Ken Calvert (University of Kentucky), Evandro Cantu (Federal University of Santa Catarina), Jeff Case (SNMP Research International), Jeff Chaltas (Sprint), Vinton Cerf (Google), Byung Kyu Choi (Michigan Technological University), Bram Cohen (BitTorrent, Inc.), Constantine Coutras (Pace University), John Daigle (University of Mississippi), Edmundo A.
de Souza e Silva (Federal University of Rio de Janeiro), Philippe Decuetos (Eurécom Institute), Christophe Diot (Technicolor Research), Prithula Dhunghel (Akamai), Deborah Estrin (University of California, Los Angeles), Michalis Faloutsos (University of California at Riverside), Wu-chi Feng (Oregon Graduate Institute), Sally Floyd (ICIR, University of California at Berkeley), Paul Francis (Max Planck Institute), David Fullager (Netflix), Lixin Gao (University of Massachusetts), JJ Garcia-Luna-Aceves (University of California at Santa Cruz), Mario Gerla (University of California at Los Angeles), David Goodman (NYU-Poly), Yang Guo (Alcatel/Lucent Bell Labs), Tim Griffin (Cambridge University), Max Hailperin (Gustavus Adolphus College), Bruce Harvey (Florida A&M University, Florida State University), Carl Hauser (Washington State University), Rachelle Heller (George Washington University), Phillipp Hoschka (INRIA/W3C), Wen Hsin (Park University), Albert Huang (former University of Pennsylvania student), Cheng Huang (Microsoft Research), Esther A. Hughes (Virginia Commonwealth University), Van Jacobson (Xerox PARC), Pinak Jain (former NYU-Poly student), Jobin James (University of California at Riverside), Sugih Jamin (University of Michigan), Shivkumar Kalyanaraman (IBM Research, India), Jussi Kangasharju (University of Helsinki), Sneha Kasera (University of Utah), Parviz Kermani (formerly of IBM Research), Hyojin Kim (former University of Pennsylvania student), Leonard Kleinrock (University of California at Los Angeles), David Kotz (Dartmouth College), Beshan Kulapala (Arizona State University), Rakesh Kumar (Bloomberg), Miguel A. Labrador (University of South Florida), Simon Lam (University of Texas), Steve Lai (Ohio State University), Tom LaPorta (Penn State University), Tim Berners-Lee (World Wide Web Consortium), Arnaud Legout (INRIA), Lee Leitner (Drexel University), Brian Levine (University of Massachusetts), Chunchun Li (former NYU-Poly student), Yong Liu (NYU-Poly), William Liang (former University of Pennsylvania student), Willis Marti (Texas A&M University), Nick McKeown (Stanford University), Josh McKinzie (Park University), Deep Medhi (University of Missouri, Kansas City), Bob Metcalfe (International Data Group), Sue Moon (KAIST), Jenni Moyer (Comcast), Erich Nahum (IBM Research), Christos Papadopoulos (Colorado State University), Craig Partridge (BBN Technologies), Radia Perlman (Intel), Jitendra Padhye (Microsoft Research), Vern Paxson (University of California at Berkeley), Kevin Phillips (Sprint), George Polyzos (Athens University of Economics and Business), Sriram Rajagopalan (Arizona State University), Ramachandran Ramjee (Microsoft Research), Ken Reek (Rochester Institute of Technology), Martin Reisslein (Arizona State University), Jennifer Rexford (Princeton University), Leon Reznik (Rochester Institute of Technology), Pablo Rodriguez (Telefonica), Sumit Roy (University of Washington), Dan Rubenstein (Columbia University), Avi Rubin (Johns Hopkins University), Douglas Salane (John Jay College), Despina Saparilla (Cisco Systems), John Schanz (Comcast), Henning Schulzrinne (Columbia University), Mischa Schwartz (Columbia University), Ardash Sethi (University of Delaware), Harish Sethu (Drexel University), K. Sam Shanmugan (University of Kansas), Prashant Shenoy (University of Massachusetts), Clay Shields (Georgetown University), Subin Shrestra (University of Pennsylvania), Bojie Shu (former NYU-Poly student), Mihail L.
Sichitiu (NC State University), Peter Steenkiste (Carnegie Mellon University), Tatsuya Suda (University of California at Irvine), Kin Sun Tam (State University of New York at Albany), Don Towsley (University of Massachusetts), David Turner (California State University, San Bernardino), Nitin Vaidya (University of Illinois), Michele Weigle (Clemson University), David Wetherall (University of Washington), Ira Winston (University of Pennsylvania), Di Wu (Sun Yat-sen University), Shirley Wynn (NYU-Poly), Raj Yavatkar (Intel), Yechiam Yemini (Columbia University), Dian Yu (NYU Shanghai), Ming Yu (State University of New York at Binghamton), Ellen Zegura (Georgia Institute of Technology), Honggang Zhang (Suffolk University), Hui Zhang (Carnegie Mellon University), Lixia Zhang (University of California at Los Angeles), Meng Zhang (former NYU-Poly student), Shuchun Zhang (former University of Pennsylvania student), Xiaodong Zhang (Ohio State University), ZhiLi Zhang (University of Minnesota), Phil Zimmermann (independent consultant), Mike Zink (University of Massachusetts), and Cliff C. Zou (University of Central Florida).

We also want to thank the entire Pearson team---in particular, Matt Goldstein and Joanne Manning---who have done an absolutely outstanding job on this seventh edition (and who have put up with two very finicky authors who seem congenitally unable to meet deadlines!). Thanks also to our artists, Janet Theurer and Patrice Rossi Calkin, for their work on the beautiful figures in this and earlier editions of our book, and to Katie Ostler and her team at Cenveo for their wonderful production work on this edition. Finally, a most special thanks goes to our previous two editors at Addison-Wesley---Michael Hirsch and Susan Hartman. This book would not be what it is (and may well not have been at all) without their graceful management, constant encouragement, nearly infinite patience, good humor, and perseverance.

Table of Contents

Chapter 1 Computer Networks and the Internet

- 1.1 What Is the Internet?
  - 1.1.1 A Nuts-and-Bolts Description
  - 1.1.2 A Services Description
  - 1.1.3 What Is a Protocol?
- 1.2 The Network Edge
  - 1.2.1 Access Networks
  - 1.2.2 Physical Media
- 1.3 The Network Core
  - 1.3.1 Packet Switching
  - 1.3.2 Circuit Switching
  - 1.3.3 A Network of Networks
- 1.4 Delay, Loss, and Throughput in Packet-Switched Networks
  - 1.4.1 Overview of Delay in Packet-Switched Networks
  - 1.4.2 Queuing Delay and Packet Loss
  - 1.4.3 End-to-End Delay
  - 1.4.4 Throughput in Computer Networks
- 1.5 Protocol Layers and Their Service Models
  - 1.5.1 Layered Architecture
  - 1.5.2 Encapsulation
- 1.6 Networks Under Attack
- 1.7 History of Computer Networking and the Internet
  - 1.7.1 The Development of Packet Switching: 1961--1972
  - 1.7.2 Proprietary Networks and Internetworking: 1972--1980
  - 1.7.3 A Proliferation of Networks: 1980--1990
  - 1.7.4 The Internet Explosion: The 1990s
  - 1.7.5 The New Millennium
- 1.8 Summary
- Homework Problems and Questions
- Wireshark Lab
- Interview: Leonard Kleinrock

Chapter 2 Application Layer

- 2.1 Principles of Network Applications
  - 2.1.1 Network Application Architectures
  - 2.1.2 Processes Communicating
  - 2.1.3 Transport Services Available to Applications
  - 2.1.4 Transport Services Provided by the Internet
  - 2.1.5 Application-Layer Protocols
  - 2.1.6 Network Applications Covered in This Book
- 2.2 The Web and HTTP
  - 2.2.1 Overview of HTTP
  - 2.2.2 Non-Persistent and Persistent Connections
  - 2.2.3 HTTP Message Format
  - 2.2.4 User-Server Interaction: Cookies
  - 2.2.5 Web Caching
- 2.3 Electronic Mail in the Internet
  - 2.3.1 SMTP
  - 2.3.2 Comparison with HTTP
  - 2.3.3 Mail Message Formats
  - 2.3.4 Mail Access Protocols
- 2.4 DNS---The Internet's Directory Service
  - 2.4.1 Services Provided by DNS
  - 2.4.2 Overview of How DNS Works
  - 2.4.3 DNS Records and Messages
- 2.5 Peer-to-Peer Applications
  - 2.5.1 P2P File Distribution
- 2.6 Video Streaming and Content Distribution Networks
  - 2.6.1 Internet Video
  - 2.6.2 HTTP Streaming and DASH
  - 2.6.3 Content Distribution Networks
  - 2.6.4 Case Studies: Netflix, YouTube, and Kankan
- 2.7 Socket Programming: Creating Network Applications
  - 2.7.1 Socket Programming with UDP
  - 2.7.2 Socket Programming with TCP
- 2.8 Summary
- Homework Problems and Questions
- Socket Programming Assignments
- Wireshark Labs: HTTP, DNS
- Interview: Marc Andreessen

Chapter 3 Transport Layer

- 3.1 Introduction and Transport-Layer Services
  - 3.1.1 Relationship Between Transport and Network Layers
  - 3.1.2 Overview of the Transport Layer in the Internet
- 3.2 Multiplexing and Demultiplexing
- 3.3 Connectionless Transport: UDP
  - 3.3.1 UDP Segment Structure
  - 3.3.2 UDP Checksum
- 3.4 Principles of Reliable Data Transfer
  - 3.4.1 Building a Reliable Data Transfer Protocol
  - 3.4.2 Pipelined Reliable Data Transfer Protocols
  - 3.4.3 Go-Back-N (GBN)
  - 3.4.4 Selective Repeat (SR)
- 3.5 Connection-Oriented Transport: TCP
  - 3.5.1 The TCP Connection
  - 3.5.2 TCP Segment Structure
  - 3.5.3 Round-Trip Time Estimation and Timeout
  - 3.5.4 Reliable Data Transfer
  - 3.5.5 Flow Control
  - 3.5.6 TCP Connection Management
- 3.6 Principles of Congestion Control
  - 3.6.1 The Causes and the Costs of Congestion
  - 3.6.2 Approaches to Congestion Control
- 3.7 TCP Congestion Control
  - 3.7.1 Fairness
  - 3.7.2 Explicit Congestion Notification (ECN): Network-assisted Congestion Control
- 3.8 Summary
- Homework Problems and Questions
- Programming Assignments
- Wireshark Labs: Exploring TCP, UDP
- Interview: Van Jacobson

Chapter 4 The Network Layer: Data Plane

- 4.1 Overview of Network Layer
  - 4.1.1 Forwarding and Routing: The Network Data and Control Planes
  - 4.1.2 Network Service Models
- 4.2 What's Inside a Router?
  - 4.2.1 Input Port Processing and Destination-Based Forwarding
  - 4.2.2 Switching
  - 4.2.3 Output Port Processing
  - 4.2.4 Where Does Queuing Occur?
  - 4.2.5 Packet Scheduling
- 4.3 The Internet Protocol (IP): IPv4, Addressing, IPv6, and More
  - 4.3.1 IPv4 Datagram Format
  - 4.3.2 IPv4 Datagram Fragmentation
  - 4.3.3 IPv4 Addressing
  - 4.3.4 Network Address Translation (NAT)
  - 4.3.5 IPv6
- 4.4 Generalized Forwarding and SDN
  - 4.4.1 Match
  - 4.4.2 Action
  - 4.4.3 OpenFlow Examples of Match-plus-action in Action
- 4.5 Summary
- Homework Problems and Questions
- Wireshark Lab
- Interview: Vinton G. Cerf

Chapter 5 The Network Layer: Control Plane

- 5.1 Introduction
- 5.2 Routing Algorithms
  - 5.2.1 The Link-State (LS) Routing Algorithm
  - 5.2.2 The Distance-Vector (DV) Routing Algorithm
- 5.3 Intra-AS Routing in the Internet: OSPF
- 5.4 Routing Among the ISPs: BGP
  - 5.4.1 The Role of BGP
  - 5.4.2 Advertising BGP Route Information
  - 5.4.3 Determining the Best Routes
  - 5.4.4 IP-Anycast
  - 5.4.5 Routing Policy
  - 5.4.6 Putting the Pieces Together: Obtaining Internet Presence
- 5.5 The SDN Control Plane
  - 5.5.1 The SDN Control Plane: SDN Controller and SDN Control Applications
  - 5.5.2 OpenFlow Protocol
  - 5.5.3 Data and Control Plane Interaction: An Example
  - 5.5.4 SDN: Past and Future
- 5.6 ICMP: The Internet Control Message Protocol
- 5.7 Network Management and SNMP
  - 5.7.1 The Network Management Framework
  - 5.7.2 The Simple Network Management Protocol (SNMP)
- 5.8 Summary
- Homework Problems and Questions
- Socket Programming Assignment
- Programming Assignment
- Wireshark Lab
- Interview: Jennifer Rexford

Chapter 6 The Link Layer and LANs

- 6.1 Introduction to the Link Layer
  - 6.1.1 The Services Provided by the Link Layer
  - 6.1.2 Where Is the Link Layer Implemented?
- 6.2 Error-Detection and -Correction Techniques
  - 6.2.1 Parity Checks
  - 6.2.2 Checksumming Methods
  - 6.2.3 Cyclic Redundancy Check (CRC)
- 6.3 Multiple Access Links and Protocols
  - 6.3.1 Channel Partitioning Protocols
  - 6.3.2 Random Access Protocols
  - 6.3.3 Taking-Turns Protocols
  - 6.3.4 DOCSIS: The Link-Layer Protocol for Cable Internet Access
- 6.4 Switched Local Area Networks
  - 6.4.1 Link-Layer Addressing and ARP
  - 6.4.2 Ethernet
  - 6.4.3 Link-Layer Switches
  - 6.4.4 Virtual Local Area Networks (VLANs)
- 6.5 Link Virtualization: A Network as a Link Layer
  - 6.5.1 Multiprotocol Label Switching (MPLS)
- 6.6 Data Center Networking
- 6.7 Retrospective: A Day in the Life of a Web Page Request
  - 6.7.1 Getting Started: DHCP, UDP, IP, and Ethernet
  - 6.7.2 Still Getting Started: DNS and ARP
  - 6.7.3 Still Getting Started: Intra-Domain Routing to the DNS Server
  - 6.7.4 Web Client-Server Interaction: TCP and HTTP
- 6.8 Summary
- Homework Problems and Questions
- Wireshark Lab
- Interview: Simon S. Lam

Chapter 7 Wireless and Mobile Networks

- 7.1 Introduction
- 7.2 Wireless Links and Network Characteristics
  - 7.2.1 CDMA
- 7.3 WiFi: 802.11 Wireless LANs
  - 7.3.1 The 802.11 Architecture
  - 7.3.2 The 802.11 MAC Protocol
  - 7.3.3 The IEEE 802.11 Frame
  - 7.3.4 Mobility in the Same IP Subnet
  - 7.3.5 Advanced Features in 802.11
  - 7.3.6 Personal Area Networks: Bluetooth and Zigbee
- 7.4 Cellular Internet Access
  - 7.4.1 An Overview of Cellular Network Architecture
  - 7.4.2 3G Cellular Data Networks: Extending the Internet to Cellular Subscribers
  - 7.4.3 On to 4G: LTE
- 7.5 Mobility Management: Principles
  - 7.5.1 Addressing
  - 7.5.2 Routing to a Mobile Node
- 7.6 Mobile IP
- 7.7 Managing Mobility in Cellular Networks
  - 7.7.1 Routing Calls to a Mobile User
  - 7.7.2 Handoffs in GSM
- 7.8 Wireless and Mobility: Impact on Higher-Layer Protocols
- 7.9 Summary
- Homework Problems and Questions
- Wireshark Lab
- Interview: Deborah Estrin

Chapter 8 Security in Computer Networks

- 8.1 What Is Network Security?
- 8.2 Principles of Cryptography
  - 8.2.1 Symmetric Key Cryptography
  - 8.2.2 Public Key Encryption
- 8.3 Message Integrity and Digital Signatures
  - 8.3.1 Cryptographic Hash Functions
  - 8.3.2 Message Authentication Code
  - 8.3.3 Digital Signatures
- 8.4 End-Point Authentication
  - 8.4.1 Authentication Protocol ap1.0
  - 8.4.2 Authentication Protocol ap2.0
  - 8.4.3 Authentication Protocol ap3.0
  - 8.4.4 Authentication Protocol ap3.1
  - 8.4.5 Authentication Protocol ap4.0
- 8.5 Securing E-Mail
  - 8.5.1 Secure E-Mail
  - 8.5.2 PGP
- 8.6 Securing TCP Connections: SSL
  - 8.6.1 The Big Picture
  - 8.6.2 A More Complete Picture
- 8.7 Network-Layer Security: IPsec and Virtual Private Networks
  - 8.7.1 IPsec and Virtual Private Networks (VPNs)
  - 8.7.2 The AH and ESP Protocols
  - 8.7.3 Security Associations
  - 8.7.4 The IPsec Datagram
  - 8.7.5 IKE: Key Management in IPsec
- 8.8 Securing Wireless LANs
  - 8.8.1 Wired Equivalent Privacy (WEP)
  - 8.8.2 IEEE 802.11i
- 8.9 Operational Security: Firewalls and Intrusion Detection Systems
  - 8.9.1 Firewalls
  - 8.9.2 Intrusion Detection Systems
- 8.10 Summary
- Homework Problems and Questions
- Wireshark Lab
- IPsec Lab
- Interview: Steven M. Bellovin

Chapter 9 Multimedia Networking

- 9.1 Multimedia Networking Applications
  - 9.1.1 Properties of Video
  - 9.1.2 Properties of Audio
  - 9.1.3 Types of Multimedia Network Applications
- 9.2 Streaming Stored Video
  - 9.2.1 UDP Streaming
  - 9.2.2 HTTP Streaming
- 9.3 Voice-over-IP
  - 9.3.1 Limitations of the Best-Effort IP Service
  - 9.3.2 Removing Jitter at the Receiver for Audio
  - 9.3.3 Recovering from Packet Loss
  - 9.3.4 Case Study: VoIP with Skype
- 9.4 Protocols for Real-Time Conversational Applications
  - 9.4.1 RTP
  - 9.4.2 SIP
- 9.5 Network Support for Multimedia
  - 9.5.1 Dimensioning Best-Effort Networks
  - 9.5.2 Providing Multiple Classes of Service
  - 9.5.3 Diffserv
  - 9.5.4 Per-Connection Quality-of-Service (QoS) Guarantees: Resource Reservation and Call Admission
- 9.6 Summary
- Homework Problems and Questions
- Programming Assignment
- Interview: Henning Schulzrinne

References

Index

Chapter 1 Computer Networks and the Internet

Today's Internet is arguably the largest engineered system ever created by mankind, with hundreds of millions of connected computers, communication links, and switches; with billions of users who connect via laptops, tablets, and smartphones; and with an array of new Internet-connected "things" including game consoles, surveillance systems, watches, eye glasses, thermostats, body scales, and cars. Given that the Internet is so large and has so many diverse components and uses, is there any hope of understanding how it works? Are there guiding principles and structure that can provide a foundation for understanding such an amazingly large and complex system? And if so, is it possible that it actually could be both interesting and fun to learn about computer networks? Fortunately, the answer to all of these questions is a resounding YES! Indeed, it's our aim in this book to provide you with a modern introduction to the dynamic field of computer networking, giving you the principles and practical insights you'll need to understand not only today's networks, but tomorrow's as well.

This first chapter presents a broad overview of computer networking and the Internet. Our goal here is to paint a broad picture and set the context for the rest of this book, to see the forest through the trees. We'll cover a lot of ground in this introductory chapter and discuss a lot of the pieces of a computer network, without losing sight of the big picture.

We'll structure our overview of computer networks in this chapter as follows. After introducing some basic terminology and concepts, we'll first examine the basic hardware and software components that make up a network. We'll begin at the network's edge and look at the end systems and network applications running in the network. We'll then explore the core of a computer network, examining the links and the switches that transport data, as well as the access networks and physical media that connect end systems to the network core. We'll learn that the Internet is a network of networks, and we'll learn how these networks connect with each other. After having completed this overview of the edge and core of a computer network, we'll take the broader and more abstract view in the second half of this chapter.
We'll examine delay, loss, and throughput of data in a computer network and provide simple quantitative models for end-to-end throughput and delay: models that take into account transmission, propagation, and queuing delays. We'll then introduce some of the key architectural principles in computer networking, namely, protocol layering and service models. We'll also learn that computer networks are vulnerable to many different types of attacks; we'll survey some of these attacks and consider how computer networks can be made more secure. Finally, we'll close this chapter with a brief history of computer networking.

1.1 What Is the Internet?

In this book, we'll use the public Internet, a specific computer network, as our principal vehicle for discussing computer networks and their protocols. But what is the Internet? There are a couple of ways to answer this question. First, we can describe the nuts and bolts of the Internet, that is, the basic hardware and software components that make up the Internet. Second, we can describe the Internet in terms of a networking infrastructure that provides services to distributed applications. Let's begin with the nuts-and-bolts description, using Figure 1.1 to illustrate our discussion.

1.1.1 A Nuts-and-Bolts Description

The Internet is a computer network that interconnects billions of computing devices throughout the world. Not too long ago, these computing devices were primarily traditional desktop PCs, Linux workstations, and so-called servers that store and transmit information such as Web pages and e-mail messages. Increasingly, however, nontraditional Internet "things" such as laptops, smartphones, tablets, TVs, gaming consoles, thermostats, home security systems, home appliances, watches, eye glasses, cars, traffic control systems, and more are being connected to the Internet. Indeed, the term computer network is beginning to sound a bit dated, given the many nontraditional devices that are being hooked up to the Internet. In Internet jargon, all of these devices are called hosts or end systems. By some estimates, in 2015 there were about 5 billion devices connected to the Internet, and the number will reach 25 billion by 2020 \[Gartner 2014\]. It is estimated that in 2015 there were over 3.2 billion Internet users worldwide, approximately 40% of the world population \[ITU 2015\].

Figure 1.1 Some pieces of the Internet

End systems are connected together by a network of communication links and packet switches. We'll see in Section 1.2 that there are many types of communication links, which are made up of different types of physical media, including coaxial cable, copper wire, optical fiber, and radio spectrum. Different links can transmit data at different rates, with the transmission rate of a link measured in bits/second. When one end system has data to send to another end system, the sending end system segments the data and adds header bytes to each segment. The resulting packages of information, known as packets in the jargon of computer networks, are then sent through the network to the destination end system, where they are reassembled into the original data. A packet switch takes a packet arriving on one of its incoming communication links and forwards that packet on one of its outgoing communication links. Packet switches come in many shapes and flavors, but the two most prominent types in today's Internet are routers and link-layer switches.
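Before turning to how these switches move packets along, the packetization arithmetic just described can be made concrete with a short sketch. The following Python fragment is purely illustrative; the message size, per-packet payload, header size, and link rates are made-up parameters, not values from the text:

```python
# Illustrative sketch: segmenting a message into packets and estimating
# the store-and-forward transmission delay of one packet along a path.
# All sizes and rates below are made-up example values.
import math

MESSAGE_BITS = 8_000_000            # an 8-megabit message (assumption)
PAYLOAD_BITS_PER_PACKET = 12_000    # payload carried per packet (assumption)
HEADER_BITS = 160                   # header bits added to each segment (assumption)
LINK_RATES_BPS = [10e6, 1e6, 10e6]  # three links; the 1 Mbps link is slowest

num_packets = math.ceil(MESSAGE_BITS / PAYLOAD_BITS_PER_PACKET)
packet_bits = PAYLOAD_BITS_PER_PACKET + HEADER_BITS

# Store-and-forward: each switch must receive the entire packet before it
# can begin transmitting the packet's first bit onto the next link, so the
# per-packet delay is the sum of the transmission times on each link.
per_packet_delay = sum(packet_bits / rate for rate in LINK_RATES_BPS)

print(f"{num_packets} packets of {packet_bits} bits each")
print(f"one packet's store-and-forward delay: {per_packet_delay * 1e3:.2f} ms")
```

Note how the slowest link dominates the per-packet sum; Section 1.4 develops these delay models carefully.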
Both types of switches forward packets toward their ultimate destinations. Link-layer switches are typically used in access networks, while routers are typically used in the network core. The sequence of communication links and packet switches traversed by a packet from the sending end system to the receiving end system is known as a route or path through the network. Cisco predicts that annual global IP traffic will pass the zettabyte (10^21 bytes) threshold by the end of 2016, and will reach 2 zettabytes per year by 2019 \[Cisco VNI 2015\].

Packet-switched networks (which transport packets) are in many ways similar to transportation networks of highways, roads, and intersections (which transport vehicles). Consider, for example, a factory that needs to move a large amount of cargo to some destination warehouse located thousands of kilometers away. At the factory, the cargo is segmented and loaded into a fleet of trucks. Each of the trucks then independently travels through the network of highways, roads, and intersections to the destination warehouse. At the destination warehouse, the cargo is unloaded and grouped with the rest of the cargo arriving from the same shipment. Thus, in many ways, packets are analogous to trucks, communication links are analogous to highways and roads, packet switches are analogous to intersections, and end systems are analogous to buildings. Just as a truck takes a path through the transportation network, a packet takes a path through a computer network.

End systems access the Internet through Internet Service Providers (ISPs), including residential ISPs such as local cable or telephone companies; corporate ISPs; university ISPs; ISPs that provide WiFi access in airports, hotels, coffee shops, and other public places; and cellular data ISPs, providing mobile access to our smartphones and other devices. Each ISP is in itself a network of packet switches and communication links. ISPs provide a variety of types of network access to the end systems, including residential broadband access such as cable modem or DSL, high-speed local area network access, and mobile wireless access. ISPs also provide Internet access to content providers, connecting Web sites and video servers directly to the Internet. The Internet is all about connecting end systems to each other, so the ISPs that provide access to end systems must also be interconnected. These lower-tier ISPs are interconnected through national and international upper-tier ISPs such as Level 3 Communications, AT&T, Sprint, and NTT. An upper-tier ISP consists of high-speed routers interconnected with high-speed fiber-optic links. Each ISP network, whether upper-tier or lower-tier, is managed independently, runs the IP protocol (see below), and conforms to certain naming and address conventions. We'll examine ISPs and their interconnection more closely in Section 1.3.

End systems, packet switches, and other pieces of the Internet run protocols that control the sending and receiving of information within the Internet. The Transmission Control Protocol (TCP) and the Internet Protocol (IP) are two of the most important protocols in the Internet. The IP protocol specifies the format of the packets that are sent and received among routers and end systems. The Internet's principal protocols are collectively known as TCP/IP. We'll begin looking into protocols in this introductory chapter.
But that's just a start---much of this book is concerned with computer network protocols! Given the importance of protocols to the Internet, it's important that everyone agree on what each and every protocol does, so that people can create systems and products that interoperate. This is where standards come into play. Internet standards are developed by the Internet Engineering Task Force (IETF) \[IETF 2016\]. The IETF standards documents are called requests for comments (RFCs). RFCs started out as general requests for comments (hence the name) to resolve network and protocol design problems that faced the precursor to the Internet \[Allman 2011\]. RFCs tend to be quite technical and detailed. They define protocols such as TCP, IP, HTTP (for the Web), and SMTP (for e-mail). There are currently more than 7,000 RFCs. Other bodies also specify standards for network components, most notably for network links. The IEEE 802 LAN/MAN Standards Committee \[IEEE 802 2016\], for example, specifies the Ethernet and wireless WiFi standards.

1.1.2 A Services Description

Our discussion above has identified many of the pieces that make up the Internet. But we can also describe the Internet from an entirely different angle---namely, as an infrastructure that provides services to applications. In addition to traditional applications such as e-mail and Web surfing, Internet applications include mobile smartphone and tablet applications, including Internet messaging, mapping with real-time road-traffic information, music streaming from the cloud, movie and television streaming, online social networks, video conferencing, multi-person games, and location-based recommendation systems. These applications are said to be distributed applications, since they involve multiple end systems that exchange data with each other. Importantly, Internet applications run on end systems---they do not run in the packet switches in the network core. Although packet switches facilitate the exchange of data among end systems, they are not concerned with the application that is the source or sink of data.

Let's explore a little more what we mean by an infrastructure that provides services to applications. To this end, suppose you have an exciting new idea for a distributed Internet application, one that may greatly benefit humanity or one that may simply make you rich and famous. How might you go about transforming this idea into an actual Internet application? Because applications run on end systems, you are going to need to write programs that run on the end systems. You might, for example, write your programs in Java, C, or Python. Now, because you are developing a distributed Internet application, the programs running on the different end systems will need to send data to each other. And here we get to a central issue---one that leads to the alternative way of describing the Internet as a platform for applications. How does one program running on one end system instruct the Internet to deliver data to another program running on another end system?

End systems attached to the Internet provide a socket interface that specifies how a program running on one end system asks the Internet infrastructure to deliver data to a specific destination program running on another end system. This Internet socket interface is a set of rules that the sending program must follow so that the Internet can deliver the data to the destination program.
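Chapter 2 develops the socket interface in detail, but a small, self-contained sketch may help fix the idea. The following Python fragment (a hedged illustration, not code from the book) runs a sending program and a receiving program on one machine; the port number and message are arbitrary choices:

```python
# Illustrative sketch of the socket interface: a sending program asks the
# infrastructure to deliver data to a receiving program. Both run on one
# machine here; the port and message are arbitrary example values.
import socket
import threading

PORT = 12000  # arbitrary unprivileged port (assumption)
ready = threading.Event()

def server():
    # The receiving program: wait for a connection, then read the data.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", PORT))
        s.listen(1)
        ready.set()                      # signal that the server is ready
        conn, addr = s.accept()          # wait for a connection request
        with conn:
            data = conn.recv(1024)       # receive the delivered data
            print("server received:", data.decode())

threading.Thread(target=server).start()
ready.wait()

# The sending program: follow the interface's rules -- create a socket,
# name the destination, and hand the data to the infrastructure.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as c:
    c.connect(("127.0.0.1", PORT))
    c.sendall(b"hello, delivered through the socket interface")
```

The essential point is that the sending program follows a fixed set of rules (create a socket, name the destination program, hand over the data), and the network does the rest.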
We'll discuss the Internet socket interface in detail in Chapter 2. For now, let's draw upon a simple analogy, one that we will frequently use in this book. Suppose Alice wants to send a letter to Bob using the postal service. Alice, of course, can't just write the letter (the data) and drop the letter out her window. Instead, the postal service requires that Alice put the letter in an envelope; write Bob's full name, address, and zip code in the center of the envelope; seal the envelope; put a stamp in the upper-right-hand corner of the envelope; and finally, drop the envelope into an official postal service mailbox. Thus, the postal service has its own "postal service interface," or set of rules, that Alice must follow to have the postal service deliver her letter to Bob. In a similar manner, the Internet has a socket interface that the program sending data must follow to have the Internet deliver the data to the program that will receive the data.

The postal service, of course, provides more than one service to its customers. It provides express delivery, reception confirmation, ordinary use, and many more services. In a similar manner, the Internet provides multiple services to its applications. When you develop an Internet application, you too must choose one of the Internet's services for your application. We'll describe the Internet's services in Chapter 2.

We have just given two descriptions of the Internet; one in terms of its hardware and software components, the other in terms of an infrastructure for providing services to distributed applications. But perhaps you are still confused as to what the Internet is. What are packet switching and TCP/IP? What are routers? What kinds of communication links are present in the Internet? What is a distributed application? How can a thermostat or body scale be attached to the Internet? If you feel a bit overwhelmed by all of this now, don't worry---the purpose of this book is to introduce you to both the nuts and bolts of the Internet and the principles that govern how and why it works. We'll explain these important terms and questions in the following sections and chapters.

1.1.3 What Is a Protocol?

Now that we've got a bit of a feel for what the Internet is, let's consider another important buzzword in computer networking: protocol. What is a protocol? What does a protocol do?

A Human Analogy

It is probably easiest to understand the notion of a computer network protocol by first considering some human analogies, since we humans execute protocols all of the time. Consider what you do when you want to ask someone for the time of day. A typical exchange is shown in Figure 1.2. Human protocol (or good manners, at least) dictates that one first offer a greeting (the first "Hi" in Figure 1.2) to initiate communication with someone else. The typical response to a "Hi" is a returned "Hi" message. Implicitly, one then takes a cordial "Hi" response as an indication that one can proceed and ask for the time of day. A different response to the initial "Hi" (such as "Don't bother me!" or "I don't speak English," or some unprintable reply) might indicate an unwillingness or inability to communicate. In this case, the human protocol would be not to ask for the time of day. Sometimes one gets no response at all to a question, in which case one typically gives up asking that person for the time.

Figure 1.2 A human protocol and a computer network protocol
Note that in our human protocol, there are specific messages we send, and specific actions we take in response to the received reply messages or other events (such as no reply within some given amount of time). Clearly, transmitted and received messages, and actions taken when these messages are sent or received or other events occur, play a central role in a human protocol. If people run different protocols (for example, if one person has manners but the other does not, or if one understands the concept of time and the other does not) the protocols do not interoperate and no useful work can be accomplished. The same is true in networking---it takes two (or more) communicating entities running the same protocol in order to accomplish a task.

Let's consider a second human analogy. Suppose you're in a college class (a computer networking class, for example!). The teacher is droning on about protocols and you're confused. The teacher stops to ask, "Are there any questions?" (a message that is transmitted to, and received by, all students who are not sleeping). You raise your hand (transmitting an implicit message to the teacher). Your teacher acknowledges you with a smile, saying "Yes . . ." (a transmitted message encouraging you to ask your question---teachers love to be asked questions), and you then ask your question (that is, transmit your message to your teacher). Your teacher hears your question (receives your question message) and answers (transmits a reply to you). Once again, we see that the transmission and receipt of messages, and a set of conventional actions taken when these messages are sent and received, are at the heart of this question-and-answer protocol.

Network Protocols

A network protocol is similar to a human protocol, except that the entities exchanging messages and taking actions are hardware or software components of some device (for example, computer, smartphone, tablet, router, or other network-capable device). All activity in the Internet that involves two or more communicating remote entities is governed by a protocol. For example, hardware-implemented protocols in two physically connected computers control the flow of bits on the "wire" between the two network interface cards; congestion-control protocols in end systems control the rate at which packets are transmitted between sender and receiver; protocols in routers determine a packet's path from source to destination. Protocols are running everywhere in the Internet, and consequently much of this book is about computer network protocols.

As an example of a computer network protocol with which you are probably familiar, consider what happens when you make a request to a Web server, that is, when you type the URL of a Web page into your Web browser. The scenario is illustrated in the right half of Figure 1.2. First, your computer will send a connection request message to the Web server and wait for a reply. The Web server will eventually receive your connection request message and return a connection reply message. Knowing that it is now OK to request the Web document, your computer then sends the name of the Web page it wants to fetch from that Web server in a GET message. Finally, the Web server returns the Web page (file) to your computer.
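The message exchange just described can be reproduced in a few lines of Python. The sketch below is a bare-bones illustration of the same protocol steps, not production HTTP client code; "example.com" is just an illustrative host, and the hand-built request omits many headers a real browser would send:

```python
# Bare-bones sketch of the Web-request protocol steps described above:
# 1) send a connection request, 2) send a GET message naming the page,
# 3) receive the page in reply. "example.com" is an illustrative host.
import socket

host = "example.com"

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((host, 80))           # connection request and connection reply
    request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    s.sendall(request.encode())     # the GET message naming the page
    reply = b""
    while chunk := s.recv(4096):    # the server returns the Web page
        reply += chunk

print(reply.decode(errors="replace")[:200])  # start of the server's reply
```

Running it prints the start of the server's reply, whose first line is the status line of the HTTP response, mirroring the connection request, GET, and response sequence above.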
Given the human and networking examples above, the exchange of messages and the actions taken when these messages are sent and received are the key defining elements of a protocol:

A protocol defines the format and the order of messages exchanged between two or more communicating entities, as well as the actions taken on the transmission and/or receipt of a message or other event.

The Internet, and computer networks in general, make extensive use of protocols. Different protocols are used to accomplish different communication tasks. As you read through this book, you will learn that some protocols are simple and straightforward, while others are complex and intellectually deep. Mastering the field of computer networking is equivalent to understanding the what, why, and how of networking protocols.

1.2 The Network Edge

In the previous section we presented a high-level overview of the Internet and networking protocols. We are now going to delve a bit more deeply into the components of a computer network (and the Internet, in particular). We begin in this section at the edge of a network and look at the components with which we are most familiar---namely, the computers, smartphones, and other devices that we use on a daily basis. In the next section we'll move from the network edge to the network core and examine switching and routing in computer networks.

Recall from the previous section that in computer networking jargon, the computers and other devices connected to the Internet are often referred to as end systems. They are referred to as end systems because they sit at the edge of the Internet, as shown in Figure 1.3. The Internet's end systems include desktop computers (e.g., desktop PCs, Macs, and Linux boxes), servers (e.g., Web and e-mail servers), and mobile devices (e.g., laptops, smartphones, and tablets). Furthermore, an increasing number of non-traditional "things" are being attached to the Internet as end systems (see the Case History feature).

Figure 1.3 End-system interaction

End systems are also referred to as hosts because they host (that is, run) application programs such as a Web browser program, a Web server program, an e-mail client program, or an e-mail server program. Throughout this book we will use the terms hosts and end systems interchangeably; that is, host = end system.

CASE HISTORY: THE INTERNET OF THINGS

Can you imagine a world in which just about everything is wirelessly connected to the Internet? A world in which most people, cars, bicycles, eye glasses, watches, toys, hospital equipment, home sensors, classrooms, video surveillance systems, atmospheric sensors, store-shelf
There are Internet-connected things already available for +the smart home, including Internet-connected thermostats that can be +controlled remotely from our smartphones, and Internet-connected body +scales, enabling us to graphically review the progress of our diets from +our smartphones. There are Internet-connected toys, including dolls that +recognize and interpret a child's speech and respond appropriately. The +IoT offers potentially revolutionary benefits to users. But at the same +time there are also huge security and privacy risks. For example, +attackers, via the Internet, might be able to hack into IoT devices or +into the servers collecting data from IoT devices. For example, an +attacker could hijack an Internet-connected doll and talk directly with +a child; or an attacker could hack into a database that stores ­personal +health and activity information collected from wearable devices. These +security and privacy concerns could undermine the consumer confidence +necessary for the ­technologies to meet their full potential and may +result in less widespread adoption \[FTC 2015\]. + +terms hosts and end systems interchangeably; that is, host = end system. +Hosts are sometimes further divided into two categories: clients and +servers. Informally, clients tend to be desktop and mobile PCs, +smartphones, and so on, whereas servers tend to be more powerful +machines that store and distribute Web pages, stream video, relay +e-mail, and so on. Today, most of the servers from which we receive +search results, e-mail, Web pages, and videos reside in large data +centers. For example, Google has 50-100 data centers, including about 15 +large centers, each with more than 100,000 servers. + +1.2.1 Access Networks Having considered the applications and end systems +at the "edge of the network," let's next consider the access +network---the network that physically connects an end system to the +first router (also known as the "edge router") on a path from the end +system to any other distant end system. Figure 1.4 shows several types +of access + + Figure 1.4 Access networks + +networks with thick, shaded lines and the settings (home, enterprise, +and wide-area mobile wireless) in which they are used. Home Access: DSL, +Cable, FTTH, Dial-Up, and Satellite + + In developed countries as of 2014, more than 78 percent of the +households have Internet access, with Korea, Netherlands, Finland, and +Sweden leading the way with more than 80 percent of households having +Internet access, almost all via a high-speed broadband connection \[ITU +2015\]. Given this widespread use of home access networks let's begin +our overview of access networks by considering how homes connect to the +Internet. Today, the two most prevalent types of broadband residential +access are digital subscriber line (DSL) and cable. A residence +typically obtains DSL Internet access from the same local telephone +company (telco) that provides its wired local phone access. Thus, when +DSL is used, a customer's telco is also its ISP. As shown in Figure 1.5, +each customer's DSL modem uses the existing telephone line (twistedpair +copper wire, which we'll discuss in Section 1.2.2) to exchange data with +a digital subscriber line access multiplexer (DSLAM) located in the +telco's local central office (CO). The home's DSL modem takes digital +data and translates it to high-­frequency tones for transmission over +telephone wires to the CO; the analog signals from many such houses are +translated back into digital format at the DSLAM. 
The residential telephone line carries both data and traditional telephone signals simultaneously, which are encoded at different frequencies:

- A high-speed downstream channel, in the 50 kHz to 1 MHz band
- A medium-speed upstream channel, in the 4 kHz to 50 kHz band
- An ordinary two-way telephone channel, in the 0 to 4 kHz band

This approach makes the single DSL link appear as if there were three separate links, so that a telephone call and an Internet connection can share the DSL link at the same time. (A short sketch at the end of this discussion illustrates this band plan.)

Figure 1.5 DSL Internet access

(We'll describe this technique of frequency-division multiplexing in Section 1.3.1.) On the customer side, a splitter separates the data and telephone signals arriving to the home and forwards the data signal to the DSL modem. On the telco side, in the CO, the DSLAM separates the data and phone signals and sends the data into the Internet. Hundreds or even thousands of households connect to a single DSLAM \[Dischinger 2007\]. The DSL standards define multiple transmission rates, including 12 Mbps downstream and 1.8 Mbps upstream \[ITU 1999\], and 55 Mbps downstream and 15 Mbps upstream \[ITU 2006\]. Because the downstream and upstream rates are different, the access is said to be asymmetric. The actual downstream and upstream transmission rates achieved may be less than the rates noted above, as the DSL provider may purposefully limit a residential rate when tiered service (different rates, available at different prices) is offered. The maximum rate is also limited by the distance between the home and the CO, the gauge of the twisted-pair line, and the degree of electrical interference. Engineers have expressly designed DSL for short distances between the home and the CO; generally, if the residence is not located within 5 to 10 miles of the CO, the residence must resort to an alternative form of Internet access. While DSL makes use of the telco's existing local telephone infrastructure, cable Internet access makes use of the cable television company's existing cable television infrastructure. A residence obtains cable Internet access from the same company that provides its cable television. As illustrated in Figure 1.6, fiber optics connect the cable head end to neighborhood-level junctions, from which traditional coaxial cable is then used to reach individual houses and apartments. Each neighborhood junction typically supports 500 to 5,000 homes. Because both fiber and coaxial cable are employed in this system, it is often referred to as hybrid fiber coax (HFC).

Figure 1.6 A hybrid fiber-coaxial access network

Cable Internet access requires special modems, called cable modems. As with a DSL modem, the cable modem is typically an external device and connects to the home PC through an Ethernet port. (We will discuss Ethernet in great detail in Chapter 6.) At the cable head end, the cable modem termination system (CMTS) serves a similar function as the DSL network's DSLAM---turning the analog signal sent from the cable modems in many downstream homes back into digital format. Cable modems divide the HFC network into two channels, a downstream and an upstream channel. As with DSL, access is typically asymmetric, with the downstream channel typically allocated a higher transmission rate than the upstream channel. The DOCSIS 2.0 standard defines downstream rates up to 42.8 Mbps and upstream rates of up to 30.7 Mbps.
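To make the frequency-division idea concrete, here is a minimal sketch, in Python, that classifies a signal frequency into the three DSL channels listed above. The band edges are the ones quoted in the text; everything else is illustrative:

```python
# Toy classifier for the DSL frequency-division band plan quoted above.
# Band edges (in Hz) are taken directly from the text.
def dsl_channel(freq_hz: float) -> str:
    if 0 <= freq_hz < 4_000:
        return "two-way telephone channel"
    if 4_000 <= freq_hz < 50_000:
        return "medium-speed upstream channel"
    if 50_000 <= freq_hz <= 1_000_000:
        return "high-speed downstream channel"
    return "outside the DSL band plan"

print(dsl_channel(1_000))     # two-way telephone channel
print(dsl_channel(25_000))    # medium-speed upstream channel
print(dsl_channel(300_000))   # high-speed downstream channel
```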
As in the case of DSL networks, the maximum achievable cable rate may not be realized due to lower contracted data rates or media impairments. One important characteristic of cable Internet access is that it is a shared broadcast medium. In particular, every packet sent by the head end travels downstream on every link to every home and every packet sent by a home travels on the upstream channel to the head end. For this reason, if several users are simultaneously downloading a video file on the downstream channel, the actual rate at which each user receives its video file will be significantly lower than the aggregate cable downstream rate. (A toy calculation at the end of this discussion makes this concrete.) On the other hand, if there are only a few active users and they are all Web surfing, then each of the users may actually receive Web pages at the full cable downstream rate, because the users will rarely request a Web page at exactly the same time. Because the upstream channel is also shared, a distributed multiple access protocol is needed to coordinate transmissions and avoid collisions. (We'll discuss this collision issue in some detail in Chapter 6.) Although DSL and cable networks currently represent more than 85 percent of residential broadband access in the United States, an up-and-coming technology that provides even higher speeds is fiber to the home (FTTH) \[FTTH Council 2016\]. As the name suggests, the FTTH concept is simple---provide an optical fiber path from the CO directly to the home. Many countries today---including the UAE, South Korea, Hong Kong, Japan, Singapore, Taiwan, Lithuania, and Sweden---now have household penetration rates exceeding 30% \[FTTH Council 2016\]. There are several competing technologies for optical distribution from the CO to the homes. The simplest optical distribution network is called direct fiber, with one fiber leaving the CO for each home. More commonly, each fiber leaving the central office is actually shared by many homes; it is not until the fiber gets relatively close to the homes that it is split into individual customer-specific fibers. There are two competing optical-distribution network architectures that perform this splitting: active optical networks (AONs) and passive optical networks (PONs). AON is essentially switched Ethernet, which is discussed in Chapter 6. Here, we briefly discuss PON, which is used in Verizon's FIOS service. Figure 1.7 shows FTTH using the PON distribution architecture. Each home has an optical network terminator (ONT), which is connected by dedicated optical fiber to a neighborhood splitter. The splitter combines a number of homes (typically less than 100) onto a single, shared optical fiber, which connects to an optical line terminator (OLT) in the telco's CO.

Figure 1.7 FTTH Internet access

The OLT, providing conversion between optical and electrical signals, connects to the Internet via a telco router. In the home, users connect a home router (typically a wireless router) to the ONT and access the Internet via this home router. In the PON architecture, all packets sent from OLT to the splitter are replicated at the splitter (similar to a cable head end). FTTH can potentially provide Internet access rates in the gigabits per second range. However, most FTTH ISPs provide different rate offerings, with the higher rates naturally costing more money.
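As promised above, here is a toy calculation of what sharing a broadcast channel does to per-user throughput. The same reasoning applies to an HFC downstream channel or to a PON's shared fiber; the number of active users below is made up for illustration, while the downstream rate is the DOCSIS 2.0 figure from the text:

```python
# Toy calculation: per-user throughput on a shared downstream channel,
# as in an HFC cable network or a PON. The user counts are illustrative.
def per_user_rate(shared_rate_bps: float, active_users: int) -> float:
    # Each simultaneously active user gets an equal share of the channel.
    return shared_rate_bps / max(active_users, 1)

downstream = 42.8e6   # DOCSIS 2.0 downstream rate from the text (bps)
for users in (1, 4, 40):
    print(f"{users:>2} active users -> "
          f"{per_user_rate(downstream, users) / 1e6:.1f} Mbps each")
```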
The average downstream speed of US FTTH customers was approximately 20 Mbps in 2011 (compared with 13 Mbps for cable access networks and less than 5 Mbps for DSL) \[FTTH Council 2011b\]. Two other access network technologies are also used to provide Internet access to the home. In locations where DSL, cable, and FTTH are not available (e.g., in some rural settings), a satellite link can be used to connect a residence to the Internet at speeds of more than 1 Mbps; StarBand and HughesNet are two such satellite access providers. Dial-up access over traditional phone lines is based on the same model as DSL---a home modem connects over a phone line to a modem in the ISP. Compared with DSL and other broadband access networks, dial-up access is excruciatingly slow at 56 kbps. Access in the Enterprise (and the Home): Ethernet and WiFi On corporate and university campuses, and increasingly in home settings, a local area network (LAN) is used to connect an end system to the edge router. Although there are many types of LAN technologies, Ethernet is by far the most prevalent access technology in corporate, university, and home networks. As shown in Figure 1.8, Ethernet users use twisted-pair copper wire to connect to an Ethernet switch, a technology discussed in detail in Chapter 6. The Ethernet switch, or a network of such interconnected switches, is then in turn connected into the larger Internet.

Figure 1.8 Ethernet Internet access

With Ethernet access, users typically have 100 Mbps or 1 Gbps access to the Ethernet switch, whereas servers may have 1 Gbps or even 10 Gbps access. Increasingly, however, people are accessing the Internet wirelessly from laptops, smartphones, tablets, and other "things" (see earlier sidebar on "Internet of Things"). In a wireless LAN setting, wireless users transmit/receive packets to/from an access point that is connected into the enterprise's network (most likely using wired Ethernet), which in turn is connected to the wired Internet. A wireless LAN user must typically be within a few tens of meters of the access point. Wireless LAN access based on IEEE 802.11 technology, more colloquially known as WiFi, is now just about everywhere---universities, business offices, cafes, airports, homes, and even in airplanes. In many cities, one can stand on a street corner and be within range of ten or twenty base stations (for a browseable global map of 802.11 base stations that have been discovered and logged on a Web site by people who take great enjoyment in doing such things, see \[wigle.net 2016\]). As discussed in detail in Chapter 7, 802.11 today provides a shared transmission rate of up to more than 100 Mbps. Even though Ethernet and WiFi access networks were initially deployed in enterprise (corporate, university) settings, they have recently become relatively common components of home networks. Many homes combine broadband residential access (that is, cable modems or DSL) with these inexpensive wireless LAN technologies to create powerful home networks \[Edwards 2011\]. Figure 1.9 shows a typical home network. This home network consists of a roaming laptop as well as a wired PC; a base station (the wireless access point), which communicates with the wireless PC and other wireless devices in the home; a cable modem, providing broadband access to the Internet; and a router, which interconnects the base station and the stationary PC with the cable modem.
This network allows household members to have broadband access to the Internet with one member roaming from the kitchen to the backyard to the bedrooms.

Figure 1.9 A typical home network

Wide-Area Wireless Access: 3G and LTE Increasingly, devices such as iPhones and Android devices are being used to message, share photos in social networks, watch movies, and stream music while on the run. These devices employ the same wireless infrastructure used for cellular telephony to send/receive packets through a base station that is operated by the cellular network provider. Unlike WiFi, a user need only be within a few tens of kilometers (as opposed to a few tens of meters) of the base station. Telecommunications companies have made enormous investments in so-called third-generation (3G) wireless, which provides packet-switched wide-area wireless Internet access at speeds in excess of 1 Mbps. But even higher-speed wide-area access technologies---a fourth-generation (4G) of wide-area wireless networks---are already being deployed. LTE (for "Long-Term Evolution"---a candidate for Bad Acronym of the Year Award) has its roots in 3G technology, and can achieve rates in excess of 10 Mbps. LTE downstream rates of many tens of Mbps have been reported in commercial deployments. We'll cover the basic principles of wireless networks and mobility, as well as WiFi, 3G, and LTE technologies (and more!) in Chapter 7.

1.2.2 Physical Media In the previous subsection, we gave an overview of some of the most important network access technologies in the Internet. As we described these technologies, we also indicated the physical media used. For example, we said that HFC uses a combination of fiber cable and coaxial cable. We said that DSL and Ethernet use copper wire. And we said that mobile access networks use the radio spectrum. In this subsection we provide a brief overview of these and other transmission media that are commonly used in the Internet. In order to define what is meant by a physical medium, let us reflect on the brief life of a bit. Consider a bit traveling from one end system, through a series of links and routers, to another end system. This poor bit gets kicked around and transmitted many, many times! The source end system first transmits the bit, and shortly thereafter the first router in the series receives the bit; the first router then transmits the bit, and shortly thereafter the second router receives the bit; and so on. Thus our bit, when traveling from source to destination, passes through a series of transmitter-receiver pairs. For each transmitter-receiver pair, the bit is sent by propagating electromagnetic waves or optical pulses across a physical medium. The physical medium can take many shapes and forms and does not have to be of the same type for each transmitter-receiver pair along the path. Examples of physical media include twisted-pair copper wire, coaxial cable, multimode fiber-optic cable, terrestrial radio spectrum, and satellite radio spectrum. Physical media fall into two categories: guided media and unguided media. With guided media, the waves are guided along a solid medium, such as a fiber-optic cable, a twisted-pair copper wire, or a coaxial cable. With unguided media, the waves propagate in the atmosphere and in outer space, such as in a wireless LAN or a digital satellite channel. But before we get into the characteristics of the various media types, let us say a few words about their costs.
The actual cost of the physical link (copper wire, fiber-optic cable, and so on) is often relatively minor compared with other networking costs. In particular, the labor cost associated with the installation of the physical link can be orders of magnitude higher than the cost of the material. For this reason, many builders install twisted pair, optical fiber, and coaxial cable in every room in a building. Even if only one medium is initially used, there is a good chance that another medium could be used in the near future, and so money is saved by not having to lay additional wires in the future. Twisted-Pair Copper Wire The least expensive and most commonly used guided transmission medium is twisted-pair copper wire. For over a hundred years it has been used by telephone networks. In fact, more than 99 percent of the wired connections from the telephone handset to the local telephone switch use twisted-pair copper wire. Most of us have seen twisted pair in our homes (or those of our parents or grandparents!) and work environments. Twisted pair consists of two insulated copper wires, each about 1 mm thick, arranged in a regular spiral pattern. The wires are twisted together to reduce the electrical interference from similar pairs close by. Typically, a number of pairs are bundled together in a cable by wrapping the pairs in a protective shield. A wire pair constitutes a single communication link. Unshielded twisted pair (UTP) is commonly used for computer networks within a building, that is, for LANs. Data rates for LANs using twisted pair today range from 10 Mbps to 10 Gbps. The data rates that can be achieved depend on the thickness of the wire and the distance between transmitter and receiver. When fiber-optic technology emerged in the 1980s, many people disparaged twisted pair because of its relatively low bit rates. Some people even felt that fiber-optic technology would completely replace twisted pair. But twisted pair did not give up so easily. Modern twisted-pair technology, such as category 6a cable, can achieve data rates of 10 Gbps for distances up to a hundred meters. In the end, twisted pair has emerged as the dominant solution for high-speed LAN networking. As discussed earlier, twisted pair is also commonly used for residential Internet access. We saw that dial-up modem technology enables access at rates of up to 56 kbps over twisted pair. We also saw that DSL (digital subscriber line) technology has enabled residential users to access the Internet at tens of Mbps over twisted pair (when users live close to the ISP's central office). Coaxial Cable Like twisted pair, coaxial cable consists of two copper conductors, but the two conductors are concentric rather than parallel. With this construction and special insulation and shielding, coaxial cable can achieve high data transmission rates. Coaxial cable is quite common in cable television systems. As we saw earlier, cable television systems have recently been coupled with cable modems to provide residential users with Internet access at rates of tens of Mbps. In cable television and cable Internet access, the transmitter shifts the digital signal to a specific frequency band, and the resulting analog signal is sent from the transmitter to one or more receivers. Coaxial cable can be used as a guided shared medium. Specifically, a number of end systems can be connected directly to the cable, with each of the end systems receiving whatever is sent by the other end systems.
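The rates named in this discussion span five orders of magnitude, and a quick calculation shows what that means in practice. The following sketch computes the transmission time of a single file at several of the rates mentioned above; the 4-MB file size and the mid-range DSL rate are assumptions for illustration:

```python
# How long does one 4-MB file (an assumed size, not from the text) take
# to transmit at the access rates mentioned in this section?
# Transmission time is simply file size divided by link rate.
FILE_BITS = 4 * 8 * 10**6  # 4 megabytes expressed in bits

rates = {
    "dial-up (56 kbps)":     56e3,
    "DSL (24 Mbps)":         24e6,   # illustrative mid-range DSL rate
    "Ethernet LAN (1 Gbps)": 1e9,
    "10 Gigabit Ethernet":   10e9,
}
for name, rate_bps in rates.items():
    print(f"{name:>22}: {FILE_BITS / rate_bps:10.4f} seconds")
```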
Fiber Optics An optical fiber is a thin, flexible medium that conducts pulses of light, with each pulse representing a bit. A single optical fiber can support tremendous bit rates, up to tens or even hundreds of gigabits per second. Optical fibers are immune to electromagnetic interference, have very low signal attenuation up to 100 kilometers, and are very hard to tap. These characteristics have made fiber optics the preferred long-haul guided transmission media, particularly for overseas links. Many of the long-distance telephone networks in the United States and elsewhere now use fiber optics exclusively. Fiber optics is also prevalent in the backbone of the Internet. However, the high cost of optical devices---such as transmitters, receivers, and switches---has hindered their deployment for short-haul transport, such as in a LAN or into the home in a residential access network. The Optical Carrier (OC) standard link speeds range from 51.8 Mbps to 39.8 Gbps; these specifications are often referred to as OC-n, where the link speed equals n × 51.8 Mbps. Standards in use today include OC-1, OC-3, OC-12, OC-24, OC-48, OC-96, OC-192, OC-768. \[Mukherjee 2006, Ramaswami 2010\] provide coverage of various aspects of optical networking. Terrestrial Radio Channels Radio channels carry signals in the electromagnetic spectrum. They are an attractive medium because they require no physical wire to be installed, can penetrate walls, provide connectivity to a mobile user, and can potentially carry a signal for long distances. The characteristics of a radio channel depend significantly on the propagation environment and the distance over which a signal is to be carried. Environmental considerations determine path loss and shadow fading (which decrease the signal strength as the signal travels over a distance and around/through obstructing objects), multipath fading (due to signal reflection off of interfering objects), and interference (due to other transmissions and electromagnetic signals). Terrestrial radio channels can be broadly classified into three groups: those that operate over very short distances (e.g., within one or two meters); those that operate in local areas, typically spanning from ten to a few hundred meters; and those that operate in the wide area, spanning tens of kilometers. Personal devices such as wireless headsets, keyboards, and medical devices operate over short distances; the wireless LAN technologies described in Section 1.2.1 use local-area radio channels; the cellular access technologies use wide-area radio channels. We'll discuss radio channels in detail in Chapter 7. Satellite Radio Channels A communication satellite links two or more Earth-based microwave transmitter/receivers, known as ground stations. The satellite receives transmissions on one frequency band, regenerates the signal using a repeater (discussed below), and transmits the signal on another frequency. Two types of satellites are used in communications: geostationary satellites and low-earth orbiting (LEO) satellites \[Wiki Satellite 2016\]. Geostationary satellites permanently remain above the same spot on Earth. This stationary presence is achieved by placing the satellite in orbit at 36,000 kilometers above Earth's surface. This huge distance from ground station through satellite back to ground station introduces a substantial signal propagation delay of 280 milliseconds.
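A rough calculation shows where a delay of this magnitude comes from. The sketch below assumes a straight up-and-down path (ground to satellite to ground) at the altitude quoted above; real slant paths are longer, which pushes the delay toward the 280 ms figure:

```python
# Rough check on the propagation delay quoted above, assuming a straight
# up-and-down path; real slant paths are longer.
ALTITUDE_M = 36_000e3   # geostationary altitude from the text (meters)
SPEED_M_S = 3e8         # propagation at roughly the speed of light

one_way_via_satellite = 2 * ALTITUDE_M / SPEED_M_S  # up, then back down
print(f"ground-satellite-ground: {one_way_via_satellite * 1e3:.0f} ms")  # ~240 ms
```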
Nevertheless, satellite links, which can operate at speeds of hundreds of Mbps, are often used in areas without access to DSL or cable-based Internet access. LEO satellites are placed much closer to Earth and do not remain permanently above one spot on Earth. They rotate around Earth (just as the Moon does) and may communicate with each other, as well as with ground stations. To provide continuous coverage to an area, many satellites need to be placed in orbit. There are currently many low-altitude communication systems in development. LEO satellite technology may be used for Internet access sometime in the future.

1.3 The Network Core Having examined the Internet's edge, let us now delve more deeply inside the network core---the mesh of packet switches and links that interconnects the Internet's end systems. Figure 1.10 highlights the network core with thick, shaded lines.

Figure 1.10 The network core

1.3.1 Packet Switching In a network application, end systems exchange messages with each other. Messages can contain anything the application designer wants. Messages may perform a control function (for example, the "Hi" messages in our handshaking example in Figure 1.2) or can contain data, such as an e-mail message, a JPEG image, or an MP3 audio file. To send a message from a source end system to a destination end system, the source breaks long messages into smaller chunks of data known as packets. Between source and destination, each packet travels through communication links and packet switches (for which there are two predominant types, routers and link-layer switches). Packets are transmitted over each communication link at a rate equal to the full transmission rate of the link. So, if a source end system or a packet switch is sending a packet of L bits over a link with transmission rate R bits/sec, then the time to transmit the packet is L/R seconds. Store-and-Forward Transmission Most packet switches use store-and-forward transmission at the inputs to the links. Store-and-forward transmission means that the packet switch must receive the entire packet before it can begin to transmit the first bit of the packet onto the outbound link. To explore store-and-forward transmission in more detail, consider a simple network consisting of two end systems connected by a single router, as shown in Figure 1.11. A router will typically have many incident links, since its job is to switch an incoming packet onto an outgoing link; in this simple example, the router has the rather simple task of transferring a packet from one (input) link to the only other attached link. In this example, the source has three packets, each consisting of L bits, to send to the destination. At the snapshot of time shown in Figure 1.11, the source has transmitted some of packet 1, and the front of packet 1 has already arrived at the router. Because the router employs store-and-forward transmission, at this instant of time, the router cannot transmit the bits it has received; instead it must first buffer (i.e., "store") the packet's bits. Only after the router has received all of the packet's bits can it begin to transmit (i.e., "forward") the packet onto the outbound link. To gain some insight into store-and-forward transmission, let's now calculate the amount of time that elapses from when the source begins to send the packet until the destination has received the entire packet.
(Here we will ignore propagation delay---the time it takes for the bits to travel across the wire at near the speed of light---which will be discussed in Section 1.4.) The source begins to transmit at time 0; at time L/R seconds, the source has transmitted the entire packet, and the entire packet has been received and stored at the router (since there is no propagation delay). At time L/R seconds, since the router has just received the entire packet, it can begin to transmit the packet onto the outbound link towards the destination; at time 2L/R, the router has transmitted the entire packet, and the entire packet has been received by the destination. Thus, the total delay is 2L/R.

Figure 1.11 Store-and-forward packet switching

If the switch instead forwarded bits as soon as they arrive (without first receiving the entire packet), then the total delay would be L/R since bits are not held up at the router. But, as we will discuss in Section 1.4, routers need to receive, store, and process the entire packet before forwarding. Now let's calculate the amount of time that elapses from when the source begins to send the first packet until the destination has received all three packets. As before, at time L/R, the router begins to forward the first packet. But also at time L/R the source will begin to send the second packet, since it has just finished sending the entire first packet. Thus, at time 2L/R, the destination has received the first packet and the router has received the second packet. Similarly, at time 3L/R, the destination has received the first two packets and the router has received the third packet. Finally, at time 4L/R the destination has received all three packets! Let's now consider the general case of sending one packet from source to destination over a path consisting of N links each of rate R (thus, there are N-1 routers between source and destination). Applying the same logic as above, we see that the end-to-end delay is:

d_end-to-end = N (L/R)   (1.1)

You may now want to try to determine what the delay would be for P packets sent over a series of N links; a short sketch below works out the answer. Queuing Delays and Packet Loss Each packet switch has multiple links attached to it. For each attached link, the packet switch has an output buffer (also called an output queue), which stores packets that the router is about to send into that link. The output buffers play a key role in packet switching. If an arriving packet needs to be transmitted onto a link but finds the link busy with the transmission of another packet, the arriving packet must wait in the output buffer. Thus, in addition to the store-and-forward delays, packets suffer output buffer queuing delays. These delays are variable and depend on the level of congestion in the network. Since the amount of buffer space is finite, an arriving packet may find that the buffer is completely full with other packets waiting for transmission. In this case, packet loss will occur---either the arriving packet or one of the already-queued packets will be dropped.

Figure 1.12 Packet switching

Figure 1.12 illustrates a simple packet-switched network. As in Figure 1.11, packets are represented by three-dimensional slabs. The width of a slab represents the number of bits in the packet. In this figure, all packets have the same width and hence the same length. Suppose Hosts A and B are sending packets to Host E. Hosts A and B first send their packets along 100 Mbps Ethernet links to the first router.
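Returning for a moment to the question posed after Equation 1.1: because the links form a pipeline, the first packet arrives at time N L/R and each later packet arrives L/R after the previous one. A minimal sketch, with illustrative values for L and R, works this out:

```python
# Store-and-forward delay for P packets of L bits over N links of rate R,
# ignoring propagation delay as in the text. The first packet arrives at
# time N*L/R (Equation 1.1); each later packet arrives L/R after the
# previous one, because the links form a pipeline.
def end_to_end_delay(N: int, P: int, L: int, R: float) -> float:
    return (N + P - 1) * L / R

# The three-packet example from the text: N = 2 links, P = 3 packets.
L, R = 1_000, 1e6                    # assumed: 1,000-bit packets, 1 Mbps links
print(end_to_end_delay(2, 1, L, R))  # 0.002 -> 2 L/R, a single packet
print(end_to_end_delay(2, 3, L, R))  # 0.004 -> 4 L/R, matching the text
```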
The first router then directs these packets from Hosts A and B onto the 15 Mbps link. If, during a short interval of time, the arrival rate of packets to the router (when converted to bits per second) exceeds 15 Mbps, congestion will occur at the router as packets queue in the link's output buffer before being transmitted onto the link. For example, if Hosts A and B each send a burst of five packets back-to-back at the same time, then most of these packets will spend some time waiting in the queue. The situation is, in fact, entirely analogous to many everyday situations---for example, when we wait in line for a bank teller or wait in front of a tollbooth. We'll examine this queuing delay in more detail in Section 1.4. Forwarding Tables and Routing Protocols Earlier, we said that a router takes a packet arriving on one of its attached communication links and forwards that packet onto another one of its attached communication links. But how does the router determine which link it should forward the packet onto? Packet forwarding is actually done in different ways in different types of computer networks. Here, we briefly describe how it is done in the Internet. In the Internet, every end system has an address called an IP address. When a source end system wants to send a packet to a destination end system, the source includes the destination's IP address in the packet's header. As with postal addresses, this address has a hierarchical structure. When a packet arrives at a router in the network, the router examines a portion of the packet's destination address and forwards the packet to an adjacent router. More specifically, each router has a forwarding table that maps destination addresses (or portions of the destination addresses) to that router's outbound links. When a packet arrives at a router, the router examines the address and searches its forwarding table, using this destination address, to find the appropriate outbound link. The router then directs the packet to this outbound link. The end-to-end routing process is analogous to a car driver who does not use maps but instead prefers to ask for directions. For example, suppose Joe is driving from Philadelphia to 156 Lakeside Drive in Orlando, Florida. Joe first drives to his neighborhood gas station and asks how to get to 156 Lakeside Drive in Orlando, Florida. The gas station attendant extracts the Florida portion of the address and tells Joe that he needs to get onto the interstate highway I-95 South, which has an entrance just next to the gas station. He also tells Joe that once he enters Florida, he should ask someone else there. Joe then takes I-95 South until he gets to Jacksonville, Florida, at which point he asks another gas station attendant for directions. The attendant extracts the Orlando portion of the address and tells Joe that he should continue on I-95 to Daytona Beach and then ask someone else. In Daytona Beach, another gas station attendant also extracts the Orlando portion of the address and tells Joe that he should take I-4 directly to Orlando. Joe takes I-4 and gets off at the Orlando exit. Joe goes to another gas station attendant, and this time the attendant extracts the Lakeside Drive portion of the address and tells Joe the road he must follow to get to Lakeside Drive. Once Joe reaches Lakeside Drive, he asks a kid on a bicycle how to get to his destination. The kid extracts the 156 portion of the address and points to the house. Joe finally reaches his ultimate destination.
In the above analogy, the gas station attendants and kids on bicycles are analogous to routers. We just learned that a router uses a packet's destination address to index a forwarding table and determine the appropriate outbound link. But this statement begs yet another question: How do forwarding tables get set? Are they configured by hand in each and every router, or does the Internet use a more automated procedure? This issue will be studied in depth in Chapter 5. But to whet your appetite here, we'll note now that the Internet has a number of special routing protocols that are used to automatically set the forwarding tables. A routing protocol may, for example, determine the shortest path from each router to each destination and use the shortest path results to configure the forwarding tables in the routers. How would you actually like to see the end-to-end route that packets take in the Internet? We now invite you to get your hands dirty by interacting with the Traceroute program. Simply visit the site www.traceroute.org, choose a source in a particular country, and trace the route from that source to your computer. (For a discussion of Traceroute, see Section 1.4.)

1.3.2 Circuit Switching There are two fundamental approaches to moving data through a network of links and switches: circuit switching and packet switching. Having covered packet-switched networks in the previous subsection, we now turn our attention to circuit-switched networks. In circuit-switched networks, the resources needed along a path (buffers, link transmission rate) to provide for communication between the end systems are reserved for the duration of the communication session between the end systems. In packet-switched networks, these resources are not reserved; a session's messages use the resources on demand and, as a consequence, may have to wait (that is, queue) for access to a communication link. As a simple analogy, consider two restaurants, one that requires reservations and another that neither requires reservations nor accepts them. For the restaurant that requires reservations, we have to go through the hassle of calling before we leave home. But when we arrive at the restaurant we can, in principle, immediately be seated and order our meal. For the restaurant that does not require reservations, we don't need to bother to reserve a table. But when we arrive at the restaurant, we may have to wait for a table before we can be seated. Traditional telephone networks are examples of circuit-switched networks. Consider what happens when one person wants to send information (voice or facsimile) to another over a telephone network. Before the sender can send the information, the network must establish a connection between the sender and the receiver. This is a bona fide connection for which the switches on the path between the sender and receiver maintain connection state for that connection. In the jargon of telephony, this connection is called a circuit. When the network establishes the circuit, it also reserves a constant transmission rate in the network's links (representing a fraction of each link's transmission capacity) for the duration of the connection. Since a given transmission rate has been reserved for this sender-to-receiver connection, the sender can transfer the data to the receiver at the guaranteed constant rate. Figure 1.13 illustrates a circuit-switched network. In this network, the four circuit switches are interconnected by four links.
Each of these links has four circuits, so that each link can support four simultaneous connections. The hosts (for example, PCs and workstations) are each directly connected to one of the switches. When two hosts want to communicate, the network establishes a dedicated end-to-end connection between the two hosts. Thus, in order for Host A to communicate with Host B, the network must first reserve one circuit on each of two links. In this example, the dedicated end-to-end connection uses the second circuit in the first link and the fourth circuit in the second link. Because each link has four circuits, for each link used by the end-to-end connection, the connection gets one fourth of the link's total transmission capacity for the duration of the connection. Thus, for example, if each link between adjacent switches has a transmission rate of 1 Mbps, then each end-to-end circuit-switched connection gets 250 kbps of dedicated transmission rate.

Figure 1.13 A simple circuit-switched network consisting of four switches and four links

In contrast, consider what happens when one host wants to send a packet to another host over a packet-switched network, such as the Internet. As with circuit switching, the packet is transmitted over a series of communication links. But different from circuit switching, the packet is sent into the network without reserving any link resources whatsoever. If one of the links is congested because other packets need to be transmitted over the link at the same time, then the packet will have to wait in a buffer at the sending side of the transmission link and suffer a delay. The Internet makes its best effort to deliver packets in a timely manner, but it does not make any guarantees. Multiplexing in Circuit-Switched Networks A circuit in a link is implemented with either frequency-division multiplexing (FDM) or time-division multiplexing (TDM). With FDM, the frequency spectrum of a link is divided up among the connections established across the link. Specifically, the link dedicates a frequency band to each connection for the duration of the connection. In telephone networks, this frequency band typically has a width of 4 kHz (that is, 4,000 hertz or 4,000 cycles per second). The width of the band is called, not surprisingly, the bandwidth. FM radio stations also use FDM to share the frequency spectrum between 88 MHz and 108 MHz, with each station being allocated a specific frequency band. For a TDM link, time is divided into frames of fixed duration, and each frame is divided into a fixed number of time slots. When the network establishes a connection across a link, the network dedicates one time slot in every frame to this connection. These slots are dedicated for the sole use of that connection, with one time slot available for use (in every frame) to transmit the connection's data.

Figure 1.14 With FDM, each circuit continuously gets a fraction of the bandwidth. With TDM, each circuit gets all of the bandwidth periodically during brief intervals of time (that is, during slots)

Figure 1.14 illustrates FDM and TDM for a specific network link supporting up to four circuits. For FDM, the frequency domain is segmented into four bands, each of bandwidth 4 kHz. For TDM, the time domain is segmented into frames, with four time slots in each frame; each circuit is assigned the same dedicated slot in the revolving TDM frames.
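To keep the numbers straight, here is a one-line check of each sharing scheme; the first value comes from the 1 Mbps example above, the second from the TDM example discussed next:

```python
# Checks of the circuit-sharing arithmetic in this subsection. With four
# circuits per link, each end-to-end connection gets a quarter of a
# 1 Mbps link:
print(1e6 / 4)    # 250000.0 -> 250 kbps per connection

# Under TDM, circuit rate = frame rate x bits per slot; with 8,000
# frames/sec and 8-bit slots (the example discussed next):
print(8_000 * 8)  # 64000 -> 64 kbps per circuit
```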
For TDM, the transmission rate of a circuit is equal to the frame rate multiplied by the number of bits in a slot. For example, if the link transmits 8,000 frames per second and each slot consists of 8 bits, then the transmission rate of each circuit is 64 kbps. Proponents of packet switching have always argued that circuit switching is wasteful because the dedicated circuits are idle during silent periods. For example, when one person in a telephone call stops talking, the idle network resources (frequency bands or time slots in the links along the connection's route) cannot be used by other ongoing connections. As another example of how these resources can be underutilized, consider a radiologist who uses a circuit-switched network to remotely access a series of x-rays. The radiologist sets up a connection, requests an image, contemplates the image, and then requests a new image. Network resources are allocated to the connection but are not used (i.e., are wasted) during the radiologist's contemplation periods. Proponents of packet switching also enjoy pointing out that establishing end-to-end circuits and reserving end-to-end transmission capacity is complicated and requires complex signaling software to coordinate the operation of the switches along the end-to-end path. Before we finish our discussion of circuit switching, let's work through a numerical example that should shed further light on the topic. Let us consider how long it takes to send a file of 640,000 bits from Host A to Host B over a circuit-switched network. Suppose that all links in the network use TDM with 24 slots and have a bit rate of 1.536 Mbps. Also suppose that it takes 500 msec to establish an end-to-end circuit before Host A can begin to transmit the file. How long does it take to send the file? Each circuit has a transmission rate of (1.536 Mbps)/24 = 64 kbps, so it takes (640,000 bits)/(64 kbps) = 10 seconds to transmit the file. To this 10 seconds we add the circuit establishment time, giving 10.5 seconds to send the file. Note that the transmission time is independent of the number of links: The transmission time would be 10 seconds if the end-to-end circuit passed through one link or a hundred links. (The actual end-to-end delay also includes a propagation delay; see Section 1.4.) Packet Switching Versus Circuit Switching Having described circuit switching and packet switching, let us compare the two. Critics of packet switching have often argued that packet switching is not suitable for real-time services (for example, telephone calls and video conference calls) because of its variable and unpredictable end-to-end delays (due primarily to variable and unpredictable queuing delays). Proponents of packet switching argue that (1) it offers better sharing of transmission capacity than circuit switching and (2) it is simpler, more efficient, and less costly to implement than circuit switching. An interesting discussion of packet switching versus circuit switching is \[Molinero-Fernandez 2002\]. Generally speaking, people who do not like to hassle with restaurant reservations prefer packet switching to circuit switching. Why is packet switching more efficient? Let's look at a simple example. Suppose users share a 1 Mbps link. Also suppose that each user alternates between periods of activity, when a user generates data at a constant rate of 100 kbps, and periods of inactivity, when a user generates no data.
Suppose further that a user is active only 10 percent of the time (and is idly drinking coffee during the remaining 90 percent of the time). With circuit switching, 100 kbps must be reserved for each user at all times. For example, with circuit-switched TDM, if a one-second frame is divided into 10 time slots of 100 ms each, then each user would be allocated one time slot per frame. Thus, the circuit-switched link can support only 10 (= 1 Mbps/100 kbps) simultaneous users. With packet switching, the probability that a specific user is active is 0.1 (that is, 10 percent). If there are 35 users, the probability that there are 11 or more simultaneously active users is approximately 0.0004. (Homework Problem P8 outlines how this probability is obtained; the short sketch at the end of this discussion checks it numerically.) When there are 10 or fewer simultaneously active users (which happens with probability 0.9996), the aggregate arrival rate of data is less than or equal to 1 Mbps, the output rate of the link. Thus, when there are 10 or fewer active users, users' packets flow through the link essentially without delay, as is the case with circuit switching. When there are more than 10 simultaneously active users, then the aggregate arrival rate of packets exceeds the output capacity of the link, and the output queue will begin to grow. (It continues to grow until the aggregate input rate falls back below 1 Mbps, at which point the queue will begin to diminish in length.) Because the probability of having more than 10 simultaneously active users is minuscule in this example, packet switching provides essentially the same performance as circuit switching, but does so while allowing for more than three times the number of users. Let's now consider a second simple example. Suppose there are 10 users and that one user suddenly generates one thousand 1,000-bit packets, while other users remain quiescent and do not generate packets. Under TDM circuit switching with 10 slots per frame and each slot consisting of 1,000 bits, the active user can only use its one time slot per frame to transmit data, while the remaining nine time slots in each frame remain idle. It will be 10 seconds before all of the active user's one million bits of data have been transmitted. In the case of packet switching, the active user can continuously send its packets at the full link rate of 1 Mbps, since there are no other users generating packets that need to be multiplexed with the active user's packets. In this case, all of the active user's data will be transmitted within 1 second. The above examples illustrate two ways in which the performance of packet switching can be superior to that of circuit switching. They also highlight the crucial difference between the two forms of sharing a link's transmission rate among multiple data streams. Circuit switching pre-allocates use of the transmission link regardless of demand, with allocated but unneeded link time going unused. Packet switching on the other hand allocates link use on demand. Link transmission capacity will be shared on a packet-by-packet basis only among those users who have packets that need to be transmitted over the link. Although packet switching and circuit switching are both prevalent in today's telecommunication networks, the trend has certainly been in the direction of packet switching. Even many of today's circuit-switched telephone networks are slowly migrating toward packet switching.
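The 0.0004 figure above follows from the binomial distribution: with 35 independent users, each active with probability 0.1, the probability of 11 or more simultaneously active users is a binomial tail sum. A minimal sketch checks it:

```python
from math import comb

# Check of the statistical-multiplexing claim above: with 35 users, each
# independently active with probability 0.1, how likely are 11 or more
# simultaneously active users?
def prob_k_or_more(n: int, p: float, k: int) -> float:
    # Binomial tail: sum over j = k..n of C(n, j) p^j (1-p)^(n-j)
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

print(f"{prob_k_or_more(35, 0.1, 11):.6f}")  # ~0.0004, as stated in the text
```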
Indeed, telephone networks often use packet switching for the expensive overseas portion of a telephone call.

1.3.3 A Network of Networks We saw earlier that end systems (PCs, smartphones, Web servers, mail servers, and so on) connect into the Internet via an access ISP. The access ISP can provide either wired or wireless connectivity, using an array of access technologies including DSL, cable, FTTH, Wi-Fi, and cellular. Note that the access ISP does not have to be a telco or a cable company; instead it can be, for example, a university (providing Internet access to students, staff, and faculty), or a company (providing access for its employees). But connecting end users and content providers into an access ISP is only a small piece of solving the puzzle of connecting the billions of end systems that make up the Internet. To complete this puzzle, the access ISPs themselves must be interconnected. This is done by creating a network of networks---understanding this phrase is the key to understanding the Internet. Over the years, the network of networks that forms the Internet has evolved into a very complex structure. Much of this evolution is driven by economics and national policy, rather than by performance considerations. In order to understand today's Internet network structure, let's incrementally build a series of network structures, with each new structure being a better approximation of the complex Internet that we have today. Recall that the overarching goal is to interconnect the access ISPs so that all end systems can send packets to each other. One naive approach would be to have each access ISP directly connect with every other access ISP. Such a mesh design is, of course, much too costly for the access ISPs, as it would require each access ISP to have a separate communication link to each of the hundreds of thousands of other access ISPs all over the world (a short sketch at the end of this discussion quantifies just how costly).

Our first network structure, Network Structure 1, interconnects all of the access ISPs with a single global transit ISP. Our (imaginary) global transit ISP is a network of routers and communication links that not only spans the globe, but also has at least one router near each of the hundreds of thousands of access ISPs. Of course, it would be very costly for the global ISP to build such an extensive network. To be profitable, it would naturally charge each of the access ISPs for connectivity, with the pricing reflecting (but not necessarily directly proportional to) the amount of traffic an access ISP exchanges with the global ISP. Since the access ISP pays the global transit ISP, the access ISP is said to be a customer and the global transit ISP is said to be a provider. Now if some company builds and operates a global transit ISP that is profitable, then it is natural for other companies to build their own global transit ISPs and compete with the original global transit ISP. This leads to Network Structure 2, which consists of the hundreds of thousands of access ISPs and multiple global transit ISPs. The access ISPs certainly prefer Network Structure 2 over Network Structure 1 since they can now choose among the competing global transit providers as a function of their pricing and services. Note, however, that the global transit ISPs themselves must interconnect: Otherwise access ISPs connected to one of the global transit providers would not be able to communicate with access ISPs connected to the other global transit providers.
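As promised above, a quick calculation shows why the naive full-mesh interconnection was dismissed out of hand: pairwise links grow quadratically in the number of networks.

```python
# Why the full-mesh design fails: directly connecting N access ISPs
# pairwise needs N*(N-1)/2 links, which grows quadratically in N.
def mesh_links(n: int) -> int:
    return n * (n - 1) // 2

for n in (10, 1_000, 100_000):   # "hundreds of thousands" of access ISPs
    print(f"{n:>7} ISPs -> {mesh_links(n):,} links")
```

At 100,000 access ISPs this is roughly five billion links, which is why the hierarchy of transit providers described here emerged instead.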
Network Structure 2, just described, is a two-tier hierarchy with global transit providers residing at the top tier and access ISPs at the bottom tier. This assumes that global transit ISPs are not only capable of getting close to each and every access ISP, but also find it economically desirable to do so. In reality, although some ISPs do have impressive global coverage and do directly connect with many access ISPs, no ISP has presence in each and every city in the world. Instead, in any given region, there may be a regional ISP to which the access ISPs in the region connect. Each regional ISP then connects to tier-1 ISPs. Tier-1 ISPs are similar to our (imaginary) global transit ISP; but tier-1 ISPs, which actually do exist, do not have a presence in every city in the world. There are approximately a dozen tier-1 ISPs, including Level 3 Communications, AT&T, Sprint, and NTT. Interestingly, no group officially sanctions tier-1 status; as the saying goes---if you have to ask if you're a member of a group, you're probably not. Returning to this network of networks, not only are there multiple competing tier-1 ISPs, there may be multiple competing regional ISPs in a region. In such a hierarchy, each access ISP pays the regional ISP to which it connects, and each regional ISP pays the tier-1 ISP to which it connects. (An access ISP can also connect directly to a tier-1 ISP, in which case it pays the tier-1 ISP.) Thus, there is a customer-provider relationship at each level of the hierarchy. Note that the tier-1 ISPs do not pay anyone as they are at the top of the hierarchy. To further complicate matters, in some regions, there may be a larger regional ISP (possibly spanning an entire country) to which the smaller regional ISPs in that region connect; the larger regional ISP then connects to a tier-1 ISP. For example, in China, there are access ISPs in each city, which connect to provincial ISPs, which in turn connect to national ISPs, which finally connect to tier-1 ISPs \[Tian 2012\]. We refer to this multi-tier hierarchy, which is still only a crude approximation of today's Internet, as Network Structure 3. To build a network that more closely resembles today's Internet, we must add points of presence (PoPs), multi-homing, peering, and Internet exchange points (IXPs) to the hierarchical Network Structure 3. PoPs exist in all levels of the hierarchy, except for the bottom (access ISP) level. A PoP is simply a group of one or more routers (at the same location) in the provider's network where customer ISPs can connect into the provider ISP. For a customer network to connect to a provider's PoP, it can lease a high-speed link from a third-party telecommunications provider to directly connect one of its routers to a router at the PoP. Any ISP (except for tier-1 ISPs) may choose to multi-home, that is, to connect to two or more provider ISPs. So, for example, an access ISP may multi-home with two regional ISPs, or it may multi-home with two regional ISPs and also with a tier-1 ISP. Similarly, a regional ISP may multi-home with multiple tier-1 ISPs. When an ISP multi-homes, it can continue to send and receive packets into the Internet even if one of its providers has a failure. As we just learned, customer ISPs pay their provider ISPs to obtain global Internet interconnectivity. The amount that a customer ISP pays a provider ISP reflects the amount of traffic it exchanges with the provider.
To reduce these costs, a pair of nearby ISPs at the same level of the hierarchy can peer, that is, they can directly connect their networks together so that all the traffic between them passes over the direct connection rather than through upstream intermediaries. When two ISPs peer, it is typically settlement-free, that is, neither ISP pays the other. As noted earlier, tier-1 ISPs also peer with one another, settlement-free. For a readable discussion of peering and customer-provider relationships, see \[Van der Berg 2008\]. Along these same lines, a third-party company can create an Internet Exchange Point (IXP), which is a meeting point where multiple ISPs can peer together. An IXP is typically in a stand-alone building with its own switches \[Ager 2012\]. There are over 400 IXPs in the Internet today \[IXP List 2016\]. We refer to this ecosystem---consisting of access ISPs, regional ISPs, tier-1 ISPs, PoPs, multi-homing, peering, and IXPs---as Network Structure 4. We now finally arrive at Network Structure 5, which describes today's Internet. Network Structure 5, illustrated in Figure 1.15, builds on top of Network Structure 4 by adding content-provider networks. Google is currently one of the leading examples of such a content-provider network. As of this writing, it is estimated that Google has 50--100 data centers distributed across North America, Europe, Asia, South America, and Australia. Some of these data centers house over one hundred thousand servers, while other data centers are smaller, housing only hundreds of servers. The Google data centers are all interconnected via Google's private TCP/IP network, which spans the entire globe but is nevertheless separate from the public Internet. Importantly, the Google private network only carries traffic to/from Google servers. As shown in Figure 1.15, the Google private network attempts to "bypass" the upper tiers of the Internet by peering (settlement-free) with lower-tier ISPs, either by directly connecting with them or by connecting with them at IXPs \[Labovitz 2010\]. However, because many access ISPs can still only be reached by transiting through tier-1 networks, the Google network also connects to tier-1 ISPs, and pays those ISPs for the traffic it exchanges with them. By creating its own network, a content provider not only reduces its payments to upper-tier ISPs, but also has greater control of how its services are ultimately delivered to end users. Google's network infrastructure is described in greater detail in Section 2.6. In summary, today's Internet---a network of networks---is complex, consisting of a dozen or so tier-1 ISPs and hundreds of thousands of lower-tier ISPs. The ISPs are diverse in their coverage, with some spanning multiple continents and oceans, and others limited to narrow geographic regions. The lower-tier ISPs connect to the higher-tier ISPs, and the higher-tier ISPs interconnect with one another. Users and content providers are customers of lower-tier ISPs, and lower-tier ISPs are customers of higher-tier ISPs. In recent years, major content providers have also created their own networks and connect directly into lower-tier ISPs where possible.

Figure 1.15 Interconnection of ISPs

1.4 Delay, Loss, and Throughput in Packet-Switched Networks Back in Section 1.1 we said that the Internet can be viewed as an infrastructure that provides services to distributed applications running on end systems.
Ideally, we would like Internet services to be able to move as much data as we want between any two end systems, instantaneously, without any loss of data. Alas, this is a lofty goal, one that is unachievable in reality. Instead, computer networks necessarily constrain throughput (the amount of data per second that can be transferred) between end systems, introduce delays between end systems, and can actually lose packets. On one hand, it is unfortunate that the physical laws of reality introduce delay and loss as well as constrain throughput. On the other hand, because computer networks have these problems, there are many fascinating issues surrounding how to deal with the problems---more than enough issues to fill a course on computer networking and to motivate thousands of PhD theses! In this section, we'll begin to examine and quantify delay, loss, and throughput in computer networks.

1.4.1 Overview of Delay in Packet-Switched Networks

Recall that a packet starts in a host (the source), passes through a series of routers, and ends its journey in another host (the destination). As a packet travels from one node (host or router) to the subsequent node (host or router) along this path, the packet suffers from several types of delays at each node along the path. The most important of these delays are the nodal processing delay, queuing delay, transmission delay, and propagation delay; together, these delays accumulate to give a total nodal delay. The performance of many Internet applications---such as search, Web browsing, e-mail, maps, instant messaging, and voice-over-IP---is greatly affected by network delays. In order to acquire a deep understanding of packet switching and computer networks, we must understand the nature and importance of these delays.

Types of Delay

Let's explore these delays in the context of Figure 1.16. As part of its end-to-end route between source and destination, a packet is sent from the upstream node through router A to router B. Our goal is to characterize the nodal delay at router A. Note that router A has an outbound link leading to router B. This link is preceded by a queue (also known as a buffer). When the packet arrives at router A from the upstream node, router A examines the packet's header to determine the appropriate outbound link for the packet and then directs the packet to this link. In this example, the outbound link for the packet is the one that leads to router B. A packet can be transmitted on a link only if there is no other packet currently being transmitted on the link and if there are no other packets preceding it in the queue; if the link is currently busy or if there are other packets already queued for the link, the newly arriving packet will then join the queue.

Figure 1.16 The nodal delay at router A

Processing Delay

The time required to examine the packet's header and determine where to direct the packet is part of the processing delay. The processing delay can also include other factors, such as the time needed to check for bit-level errors in the packet that occurred in transmitting the packet's bits from the upstream node to router A. Processing delays in high-speed routers are typically on the order of microseconds or less. After this nodal processing, the router directs the packet to the queue that precedes the link to router B. (In Chapter 4 we'll study the details of how a router operates.)
Queuing Delay

At the queue, the packet experiences a queuing delay as it waits to be transmitted onto the link. The length of the queuing delay of a specific packet will depend on the number of earlier-arriving packets that are queued and waiting for transmission onto the link. If the queue is empty and no other packet is currently being transmitted, then our packet's queuing delay will be zero. On the other hand, if the traffic is heavy and many other packets are also waiting to be transmitted, the queuing delay will be long. We will see shortly that the number of packets that an arriving packet might expect to find is a function of the intensity and nature of the traffic arriving at the queue. Queuing delays can be on the order of microseconds to milliseconds in practice.

Transmission Delay

Assuming that packets are transmitted in a first-come-first-served manner, as is common in packet-switched networks, our packet can be transmitted only after all the packets that have arrived before it have been transmitted. Denote the length of the packet by L bits, and denote the transmission rate of the link from router A to router B by R bits/sec. For example, for a 10 Mbps Ethernet link, the rate is R=10 Mbps; for a 100 Mbps Ethernet link, the rate is R=100 Mbps. The transmission delay is L/R. This is the amount of time required to push (that is, transmit) all of the packet's bits into the link. Transmission delays are typically on the order of microseconds to milliseconds in practice.

Propagation Delay

Once a bit is pushed into the link, it needs to propagate to router B. The time required to propagate from the beginning of the link to router B is the propagation delay. The bit propagates at the propagation speed of the link. The propagation speed depends on the physical medium of the link (that is, fiber optics, twisted-pair copper wire, and so on) and is in the range of 2⋅10^8 meters/sec to 3⋅10^8 meters/sec, which is equal to, or a little less than, the speed of light. The propagation delay is the distance between two routers divided by the propagation speed. That is, the propagation delay is d/s, where d is the distance between router A and router B and s is the propagation speed of the link. Once the last bit of the packet propagates to node B, it and all the preceding bits of the packet are stored in router B. The whole process then continues with router B now performing the forwarding. In wide-area networks, propagation delays are on the order of milliseconds.
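Both formulas are easy to check numerically. Below is a minimal sketch; the parameters (a 1,000-byte packet, a 10 Mbps link of length 2,000 km, and a propagation speed of 2.5⋅10^8 m/s) are illustrative assumptions, not values from the text.

```python
# Transmission delay (L/R) and propagation delay (d/s) for one link.
# All parameter values below are assumed for illustration.

L = 1_000 * 8      # packet length: 1,000 bytes = 8,000 bits
R = 10_000_000     # transmission rate: 10 Mbps
d = 2_000_000      # link length: 2,000 km, in meters
s = 2.5e8          # propagation speed, in meters/sec

d_trans = L / R    # time to push all bits onto the link
d_prop = d / s     # time for one bit to travel the link's length

print(f"d_trans = {d_trans * 1e3:.3f} ms")  # 0.800 ms
print(f"d_prop  = {d_prop * 1e3:.3f} ms")   # 8.000 ms
```

Note that doubling the packet size doubles d_trans but leaves d_prop untouched, while doubling the link length does the reverse---exactly the distinction the following analogy illustrates.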
Comparing Transmission and Propagation Delay

Newcomers to the field of computer networking sometimes have difficulty understanding the difference between transmission delay and propagation delay. The difference is subtle but important. The transmission delay is the amount of time required for the router to push out the packet; it is a function of the packet's length and the transmission rate of the link, but has nothing to do with the distance between the two routers. The propagation delay, on the other hand, is the time it takes a bit to propagate from one router to the next; it is a function of the distance between the two routers, but has nothing to do with the packet's length or the transmission rate of the link.

An analogy might clarify the notions of transmission and propagation delay. Consider a highway that has a tollbooth every 100 kilometers, as shown in Figure 1.17. You can think of the highway segments between tollbooths as links and the tollbooths as routers.

Figure 1.17 Caravan analogy

Suppose that cars travel (that is, propagate) on the highway at a rate of 100 km/hour (that is, when a car leaves a tollbooth, it instantaneously accelerates to 100 km/hour and maintains that speed between tollbooths). Suppose next that 10 cars, traveling together as a caravan, follow each other in a fixed order. You can think of each car as a bit and the caravan as a packet. Also suppose that each tollbooth services (that is, transmits) a car at a rate of one car per 12 seconds, and that it is late at night so that the caravan's cars are the only cars on the highway. Finally, suppose that whenever the first car of the caravan arrives at a tollbooth, it waits at the entrance until the other nine cars have arrived and lined up behind it. (Thus the entire caravan must be stored at the tollbooth before it can begin to be forwarded.) The time required for the tollbooth to push the entire caravan onto the highway is (10 cars)/(5 cars/minute) = 2 minutes. This time is analogous to the transmission delay in a router. The time required for a car to travel from the exit of one tollbooth to the next tollbooth is 100 km/(100 km/hour) = 1 hour. This time is analogous to propagation delay. Therefore, the time from when the caravan is stored in front of a tollbooth until the caravan is stored in front of the next tollbooth is the sum of transmission delay and propagation delay---in this example, 62 minutes.

Let's explore this analogy a bit more. What would happen if the tollbooth service time for a caravan were greater than the time for a car to travel between tollbooths? For example, suppose now that the cars travel at the rate of 1,000 km/hour and the tollbooth services cars at the rate of one car per minute. Then the traveling delay between two tollbooths is 6 minutes and the time to serve a caravan is 10 minutes. In this case, the first few cars in the caravan will arrive at the second tollbooth before the last cars in the caravan leave the first tollbooth. This situation also arises in packet-switched networks---the first bits in a packet can arrive at a router while many of the remaining bits in the packet are still waiting to be transmitted by the preceding router.

If a picture speaks a thousand words, then an animation must speak a million words. The Web site for this textbook provides an interactive Java applet that nicely illustrates and contrasts transmission delay and propagation delay. The reader is highly encouraged to visit that applet. [Smith 2009] also provides a very readable discussion of propagation, queueing, and transmission delays.

If we let d_proc, d_queue, d_trans, and d_prop denote the processing, queuing, transmission, and propagation delays, then the total nodal delay is given by

d_nodal = d_proc + d_queue + d_trans + d_prop    (1.1)
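Equation 1.1 is simple enough to check against the caravan analogy. The short sketch below treats the tollbooth numbers from the text as d_trans and d_prop (with no processing or queuing delay) and recovers the 62 minutes computed above.

```python
# Equation 1.1: total nodal delay as the sum of its four components.

def nodal_delay(d_proc, d_queue, d_trans, d_prop):
    return d_proc + d_queue + d_trans + d_prop

# Caravan analogy, in minutes: 10 cars served at 5 cars/minute,
# then 100 km traveled at 100 km/hour (= 60 minutes).
d_trans = 10 / 5          # 2 minutes at the tollbooth
d_prop = 100 / 100 * 60   # 60 minutes on the highway

print(nodal_delay(0, 0, d_trans, d_prop))  # 62.0 minutes
```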
The contribution of these delay components can vary significantly. For example, d_prop can be negligible (for example, a couple of microseconds) for a link connecting two routers on the same university campus; however, d_prop is hundreds of milliseconds for two routers interconnected by a geostationary satellite link, and can be the dominant term in d_nodal. Similarly, d_trans can range from negligible to significant. Its contribution is typically negligible for transmission rates of 10 Mbps and higher (for example, for LANs); however, it can be hundreds of milliseconds for large Internet packets sent over low-speed dial-up modem links. The processing delay, d_proc, is often negligible; however, it strongly influences a router's maximum throughput, which is the maximum rate at which a router can forward packets.

1.4.2 Queuing Delay and Packet Loss

The most complicated and interesting component of nodal delay is the queuing delay, d_queue. In fact, queuing delay is so important and interesting in computer networking that thousands of papers and numerous books have been written about it [Bertsekas 1991; Daigle 1991; Kleinrock 1975; Kleinrock 1976; Ross 1995]. We give only a high-level, intuitive discussion of queuing delay here; the more curious reader may want to browse through some of the books (or even eventually write a PhD thesis on the subject!). Unlike the other three delays (namely, d_proc, d_trans, and d_prop), the queuing delay can vary from packet to packet. For example, if 10 packets arrive at an empty queue at the same time, the first packet transmitted will suffer no queuing delay, while the last packet transmitted will suffer a relatively large queuing delay (while it waits for the other nine packets to be transmitted). Therefore, when characterizing queuing delay, one typically uses statistical measures, such as average queuing delay, variance of queuing delay, and the probability that the queuing delay exceeds some specified value.

When is the queuing delay large and when is it insignificant? The answer to this question depends on the rate at which traffic arrives at the queue, the transmission rate of the link, and the nature of the arriving traffic, that is, whether the traffic arrives periodically or arrives in bursts. To gain some insight here, let a denote the average rate at which packets arrive at the queue (a is in units of packets/sec). Recall that R is the transmission rate; that is, it is the rate (in bits/sec) at which bits are pushed out of the queue. Also suppose, for simplicity, that all packets consist of L bits. Then the average rate at which bits arrive at the queue is La bits/sec. Finally, assume that the queue is very big, so that it can hold essentially an infinite number of bits. The ratio La/R, called the traffic intensity, often plays an important role in estimating the extent of the queuing delay. If La/R > 1, then the average rate at which bits arrive at the queue exceeds the rate at which the bits can be transmitted from the queue. In this unfortunate situation, the queue will tend to increase without bound and the queuing delay will approach infinity! Therefore, one of the golden rules in traffic engineering is: Design your system so that the traffic intensity is no greater than 1.
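As a quick numerical sketch (all values assumed for illustration): with 1,500-byte packets arriving at an average rate of 500 packets/sec on a 10 Mbps link, the traffic intensity is comfortably below 1.

```python
# Traffic intensity La/R. The packet size, arrival rate, and link
# rate are illustrative assumptions; the 0.8 cutoff is just a rule
# of thumb for "getting close to 1," not a value from the text.

L = 1_500 * 8       # bits per packet
a = 500             # average arrival rate, packets/sec
R = 10_000_000      # transmission rate, bits/sec

intensity = L * a / R
print(f"La/R = {intensity:.2f}")   # 0.60

if intensity > 1:
    print("bits arrive faster than they can be sent: unbounded queue")
elif intensity > 0.8:
    print("close to 1: expect rapidly growing queuing delays")
else:
    print("modest load: queuing delays should stay small")
```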
Now consider the case La/R ≤ 1. Here, the nature of the arriving traffic impacts the queuing delay. For example, if packets arrive periodically---that is, one packet arrives every L/R seconds---then every packet will arrive at an empty queue and there will be no queuing delay. On the other hand, if packets arrive in bursts but periodically, there can be a significant average queuing delay. For example, suppose N packets arrive simultaneously every (L/R)N seconds. Then the first packet transmitted has no queuing delay; the second packet transmitted has a queuing delay of L/R seconds; and more generally, the nth packet transmitted has a queuing delay of (n−1)L/R seconds. We leave it as an exercise for you to calculate the average queuing delay in this example.

The two examples of periodic arrivals described above are a bit academic. Typically, the arrival process to a queue is random; that is, the arrivals do not follow any pattern and the packets are spaced apart by random amounts of time. In this more realistic case, the quantity La/R is not usually sufficient to fully characterize the queuing delay statistics. Nonetheless, it is useful in gaining an intuitive understanding of the extent of the queuing delay. In particular, if the traffic intensity is close to zero, then packet arrivals are few and far between and it is unlikely that an arriving packet will find another packet in the queue. Hence, the average queuing delay will be close to zero. On the other hand, when the traffic intensity is close to 1, there will be intervals of time when the arrival rate exceeds the transmission capacity (due to variations in packet arrival rate), and a queue will form during these periods of time; when the arrival rate is less than the transmission capacity, the length of the queue will shrink. Nonetheless, as the traffic intensity approaches 1, the average queue length gets larger and larger. The qualitative dependence of average queuing delay on the traffic intensity is shown in Figure 1.18.

Figure 1.18 Dependence of average queuing delay on traffic intensity

One important aspect of Figure 1.18 is the fact that as the traffic intensity approaches 1, the average queuing delay increases rapidly. A small percentage increase in the intensity will result in a much larger percentage-wise increase in delay. Perhaps you have experienced this phenomenon on the highway. If you regularly drive on a road that is typically congested, the fact that the road is typically congested means that its traffic intensity is close to 1. If some event causes an even slightly larger-than-usual amount of traffic, the delays you experience can be huge.

To really get a good feel for what queuing delays are about, you are encouraged once again to visit the textbook Web site, which provides an interactive Java applet for a queue. If you set the packet arrival rate high enough so that the traffic intensity exceeds 1, you will see the queue slowly build up over time.
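The burst-arrival exercise has a short closed form: averaging the delays (n−1)L/R over n = 1, …, N gives (N−1)L/2R. A few lines of code confirm this; the packet size and link rate below are assumed values.

```python
# Average queuing delay when N packets arrive together: the nth
# packet transmitted waits (n-1)L/R seconds. L and R are assumed.

N = 10
L = 12_000          # bits per packet
R = 1_000_000       # transmission rate, bits/sec

delays = [(n - 1) * L / R for n in range(1, N + 1)]
print(sum(delays) / N)           # 0.054 sec, by direct averaging
print((N - 1) * L / (2 * R))     # 0.054 sec, by the closed form
```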
Packet Loss

In our discussions above, we have assumed that the queue is capable of holding an infinite number of packets. In reality a queue preceding a link has finite capacity, although the queuing capacity greatly depends on the router design and cost. Because the queue capacity is finite, packet delays do not really approach infinity as the traffic intensity approaches 1. Instead, a packet can arrive to find a full queue. With no place to store such a packet, a router will drop that packet; that is, the packet will be lost. This overflow at a queue can again be seen in the Java applet for a queue when the traffic intensity is greater than 1. From an end-system viewpoint, a packet loss will look like a packet having been transmitted into the network core but never emerging from the network at the destination. The fraction of lost packets increases as the traffic intensity increases. Therefore, performance at a node is often measured not only in terms of delay, but also in terms of the probability of packet loss. As we'll discuss in the subsequent chapters, a lost packet may be retransmitted on an end-to-end basis in order to ensure that all data are eventually transferred from source to destination.

1.4.3 End-to-End Delay

Our discussion up to this point has focused on the nodal delay, that is, the delay at a single router. Let's now consider the total delay from source to destination. To get a handle on this concept, suppose there are N−1 routers between the source host and the destination host. Let's also suppose for the moment that the network is uncongested (so that queuing delays are negligible), the processing delay at each router and at the source host is d_proc, the transmission rate out of each router and out of the source host is R bits/sec, and the propagation on each link is d_prop. The nodal delays accumulate and give an end-to-end delay,

d_end-end = N(d_proc + d_trans + d_prop)    (1.2)

where, once again, d_trans = L/R, where L is the packet size. Note that Equation 1.2 is a generalization of Equation 1.1, which did not take into account processing and propagation delays. We leave it to you to generalize Equation 1.2 to the case of heterogeneous delays at the nodes and to the presence of an average queuing delay at each node.
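Equation 1.2 in code, with illustrative assumed parameters: 10 links, 1 ms of processing per node, 1,500-byte packets, 10 Mbps links, and 2 ms of propagation per link.

```python
# Equation 1.2: end-to-end delay across N links (N-1 routers) on an
# uncongested path with homogeneous nodes. All values are assumed.

def end_to_end_delay(N, d_proc, L, R, d_prop):
    d_trans = L / R
    return N * (d_proc + d_trans + d_prop)

print(end_to_end_delay(N=10, d_proc=1e-3, L=1_500 * 8,
                       R=10e6, d_prop=2e-3))   # 0.042 sec
```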
Traceroute

To get a hands-on feel for end-to-end delay in a computer network, we can make use of the Traceroute program. Traceroute is a simple program that can run in any Internet host. When the user specifies a destination hostname, the program in the source host sends multiple, special packets toward that destination. As these packets work their way toward the destination, they pass through a series of routers. When a router receives one of these special packets, it sends back to the source a short message that contains the name and address of the router.

More specifically, suppose there are N−1 routers between the source and the destination. Then the source will send N special packets into the network, with each packet addressed to the ultimate destination. These N special packets are marked 1 through N, with the first packet marked 1 and the last packet marked N. When the nth router receives the nth packet marked n, the router does not forward the packet toward its destination, but instead sends a message back to the source. When the destination host receives the Nth packet, it too returns a message back to the source. The source records the time that elapses between when it sends a packet and when it receives the corresponding return message; it also records the name and address of the router (or the destination host) that returns the message. In this manner, the source can reconstruct the route taken by packets flowing from source to destination, and the source can determine the round-trip delays to all the intervening routers. Traceroute actually repeats the experiment just described three times, so the source actually sends 3N packets to the destination. RFC 1393 describes Traceroute in detail.

Here is an example of the output of the Traceroute program, where the route was being traced from the source host gaia.cs.umass.edu (at the University of Massachusetts) to the host cis.poly.edu (at Polytechnic University in Brooklyn). The output has six columns: the first column is the n value described above, that is, the number of the router along the route; the second column is the name of the router; the third column is the address of the router (of the form xxx.xxx.xxx.xxx); the last three columns are the round-trip delays for three experiments. If the source receives fewer than three messages from any given router (due to packet loss in the network), Traceroute places an asterisk just after the router number and reports fewer than three round-trip times for that router.

```
1  cs-gw (128.119.240.254) 1.009 ms 0.899 ms 0.993 ms
2  128.119.3.154 (128.119.3.154) 0.931 ms 0.441 ms 0.651 ms
3  border4-rt-gi-1-3.gw.umass.edu (128.119.2.194) 1.032 ms 0.484 ms 0.451 ms
4  acr1-ge-2-1-0.Boston.cw.net (208.172.51.129) 10.006 ms 8.150 ms 8.460 ms
5  agr4-loopback.NewYork.cw.net (206.24.194.104) 12.272 ms 14.344 ms 13.267 ms
6  acr2-loopback.NewYork.cw.net (206.24.194.62) 13.225 ms 12.292 ms 12.148 ms
7  pos10-2.core2.NewYork1.Level3.net (209.244.160.133) 12.218 ms 11.823 ms 11.793 ms
8  gige9-1-52.hsipaccess1.NewYork1.Level3.net (64.159.17.39) 13.081 ms 11.556 ms 13.297 ms
9  p0-0.polyu.bbnplanet.net (4.25.109.122) 12.716 ms 13.052 ms 12.786 ms
10 cis.poly.edu (128.238.32.126) 14.080 ms 13.035 ms 12.802 ms
```

In the trace above there are nine routers between the source and the destination. Most of these routers have a name, and all of them have addresses. For example, the name of Router 3 is border4-rt-gi-1-3.gw.umass.edu and its address is 128.119.2.194. Looking at the data provided for this same router, we see that in the first of the three trials the round-trip delay between the source and the router was 1.03 msec. The round-trip delays for the subsequent two trials were 0.48 and 0.45 msec. These round-trip delays include all of the delays just discussed, including transmission delays, propagation delays, router processing delays, and queuing delays. Because the queuing delay is varying with time, the round-trip delay of packet n sent to a router n can sometimes be longer than the round-trip delay of packet n+1 sent to router n+1. Indeed, we observe this phenomenon in the above example: the delays to Router 6 are larger than the delays to Router 7!

Want to try out Traceroute for yourself? We highly recommend that you visit http://www.traceroute.org, which provides a Web interface to an extensive list of sources for route tracing. You choose a source and supply the hostname for any destination. The Traceroute program then does all the work. There are a number of free software programs that provide a graphical interface to Traceroute; one of our favorites is PingPlotter [PingPlotter 2016].

End System, Application, and Other Delays

In addition to processing, transmission, and propagation delays, there can be additional significant delays in the end systems. For example, an end system wanting to transmit a packet into a shared medium (e.g., as in a WiFi or cable modem scenario) may purposefully delay its transmission as part of its protocol for sharing the medium with other end systems; we'll consider such protocols in detail in Chapter 6. Another important delay is media packetization delay, which is present in Voice-over-IP (VoIP) applications. In VoIP, the sending side must first fill a packet with encoded digitized speech before passing the packet to the Internet. This time to fill a packet---called the packetization delay---can be significant and can impact the user-perceived quality of a VoIP call. This issue will be further explored in a homework problem at the end of this chapter.
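As a back-of-the-envelope sketch of packetization delay (the encoding rate and payload size are assumed values, typical of a 64 kbps voice codec carrying 56 bytes of speech per packet):

```python
# Packetization delay: time for a VoIP sender to accumulate enough
# encoded speech to fill one packet. Both values are assumptions.

encoding_rate = 64_000   # bits of digitized speech produced per second
payload = 56 * 8         # bits of speech carried in one packet

d_packetization = payload / encoding_rate
print(f"{d_packetization * 1e3:.1f} ms")   # 7.0 ms
```

The sender incurs this delay before the packet even enters the network, on top of the nodal delays discussed above.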
1.4.4 Throughput in Computer Networks

In addition to delay and packet loss, another critical performance measure in computer networks is end-to-end throughput. To define throughput, consider transferring a large file from Host A to Host B across a computer network. This transfer might be, for example, a large video clip from one peer to another in a P2P file sharing system. The instantaneous throughput at any instant of time is the rate (in bits/sec) at which Host B is receiving the file. (Many applications, including many P2P file sharing systems, display the instantaneous throughput during downloads in the user interface---perhaps you have observed this before!) If the file consists of F bits and the transfer takes T seconds for Host B to receive all F bits, then the average throughput of the file transfer is F/T bits/sec. For some applications, such as Internet telephony, it is desirable to have a low delay and an instantaneous throughput consistently above some threshold (for example, over 24 kbps for some Internet telephony applications and over 256 kbps for some real-time video applications). For other applications, including those involving file transfers, delay is not critical, but it is desirable to have the highest possible throughput.

To gain further insight into the important concept of throughput, let's consider a few examples. Figure 1.19(a) shows two end systems, a server and a client, connected by two communication links and a router. Consider the throughput for a file transfer from the server to the client. Let Rs denote the rate of the link between the server and the router; and Rc denote the rate of the link between the router and the client. Suppose that the only bits being sent in the entire network are those from the server to the client. We now ask, in this ideal scenario, what is the server-to-client throughput? To answer this question, we may think of bits as fluid and communication links as pipes. Clearly, the server cannot pump bits through its link at a rate faster than Rs bps; and the router cannot forward bits at a rate faster than Rc bps. If Rs

```
S: 250 alice@crepes.fr ... Sender ok
C: RCPT TO: <bob@hamburger.edu>
S: 250 bob@hamburger.edu ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 hamburger.edu closing connection
```

In the example above, the client sends a message ("Do you like ketchup? How about pickles?") from mail server crepes.fr to mail server hamburger.edu. As part of the dialogue, the client issued five commands: HELO (an abbreviation for HELLO), MAIL FROM, RCPT TO, DATA, and QUIT. These commands are self-explanatory. The client also sends a line consisting of a single period, which indicates the end of the message to the server. (In ASCII jargon, each message ends with CRLF.CRLF, where CR and LF stand for carriage return and line feed, respectively.) The server issues replies to each command, with each reply having a reply code and some (optional) English-language explanation.
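You can reproduce this dialogue from Python's standard library, which implements the client side of SMTP. The sketch below is a rough equivalent of the exchange above; smtplib issues the HELO/MAIL FROM/RCPT TO/DATA/QUIT commands for us. The server name, addresses, and subject line are placeholder values from the example, and a real server will typically refuse to relay mail for domains it does not host.

```python
# A minimal smtplib sketch of the dialogue above. The header lines
# (From:, To:, Subject:) preview the message format discussed in
# Section 2.3.3; the blank line separates header from body.
import smtplib

msg = (
    "From: alice@crepes.fr\r\n"
    "To: bob@hamburger.edu\r\n"
    "Subject: Condiments\r\n"        # placeholder subject
    "\r\n"
    "Do you like ketchup?\r\n"
    "How about pickles?\r\n"
)

with smtplib.SMTP("crepes.fr", 25) as server:   # placeholder server
    server.sendmail("alice@crepes.fr", ["bob@hamburger.edu"], msg)
```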
We mention here that SMTP uses persistent connections: If the sending mail server has several messages to send to the same receiving mail server, it can send all of the messages over the same TCP connection. For each message, the client begins the process with a new MAIL FROM: crepes.fr, designates the end of message with an isolated period, and issues QUIT only after all messages have been sent.

It is highly recommended that you use Telnet to carry out a direct dialogue with an SMTP server. To do this, issue

telnet serverName 25

where serverName is the name of a local mail server. When you do this, you are simply establishing a TCP connection between your local host and the mail server. After typing this line, you should immediately receive the 220 reply from the server. Then issue the SMTP commands HELO, MAIL FROM, RCPT TO, DATA, CRLF.CRLF, and QUIT at the appropriate times. It is also highly recommended that you do Programming Assignment 3 at the end of this chapter. In that assignment, you'll build a simple user agent that implements the client side of SMTP. It will allow you to send an e-mail message to an arbitrary recipient via a local mail server.

2.3.2 Comparison with HTTP

Let's now briefly compare SMTP with HTTP. Both protocols are used to transfer files from one host to another: HTTP transfers files (also called objects) from a Web server to a Web client (typically a browser); SMTP transfers files (that is, e-mail messages) from one mail server to another mail server. When transferring the files, both persistent HTTP and SMTP use persistent connections. Thus, the two protocols have common characteristics. However, there are important differences. First, HTTP is mainly a pull protocol---someone loads information on a Web server and users use HTTP to pull the information from the server at their convenience. In particular, the TCP connection is initiated by the machine that wants to receive the file. On the other hand, SMTP is primarily a push protocol---the sending mail server pushes the file to the receiving mail server. In particular, the TCP connection is initiated by the machine that wants to send the file. A second difference, which we alluded to earlier, is that SMTP requires each message, including the body of each message, to be in 7-bit ASCII format. If the message contains characters that are not 7-bit ASCII (for example, French characters with accents) or contains binary data (such as an image file), then the message has to be encoded into 7-bit ASCII. HTTP data does not impose this restriction. A third important difference concerns how a document consisting of text and images (along with possibly other media types) is handled. As we learned in Section 2.2, HTTP encapsulates each object in its own HTTP response message. SMTP places all of the message's objects into one message.

2.3.3 Mail Message Formats

When Alice writes an ordinary snail-mail letter to Bob, she may include all kinds of peripheral header information at the top of the letter, such as Bob's address, her own return address, and the date. Similarly, when an e-mail message is sent from one person to another, a header containing peripheral information precedes the body of the message itself. This peripheral information is contained in a series of header lines, which are defined in RFC 5322. The header lines and the body of the message are separated by a blank line (that is, by CRLF).
RFC 5322 specifies the exact format for mail header lines as well as their semantic interpretations. As with HTTP, each header line contains readable text, consisting of a keyword followed by a colon followed by a value. Some of the keywords are required and others are optional. Every header must have a From: header line and a To: header line; a header may include a Subject: header line as well as other optional header lines. It is important to note that these header lines are different from the SMTP commands we studied in Section 2.3.1 (even though they contain some common words such as "from" and "to"). The commands in that section were part of the SMTP handshaking protocol; the header lines examined in this section are part of the mail message itself. A typical message header looks like this:

```
From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Searching for the meaning of life.
```

After the message header, a blank line follows; then the message body (in ASCII) follows. You should use Telnet to send a message to a mail server that contains some header lines, including the Subject: header line. To do this, issue telnet serverName 25, as discussed in Section 2.3.1.

2.3.4 Mail Access Protocols

Once SMTP delivers the message from Alice's mail server to Bob's mail server, the message is placed in Bob's mailbox. Throughout this discussion we have tacitly assumed that Bob reads his mail by logging onto the server host and then executing a mail reader that runs on that host. Up until the early 1990s this was the standard way of doing things. But today, mail access uses a client-server architecture---the typical user reads e-mail with a client that executes on the user's end system, for example, on an office PC, a laptop, or a smartphone. By executing a mail client on a local PC, users enjoy a rich set of features, including the ability to view multimedia messages and attachments.

Given that Bob (the recipient) executes his user agent on his local PC, it is natural to consider placing a mail server on his local PC as well. With this approach, Alice's mail server would dialogue directly with Bob's PC. There is a problem with this approach, however. Recall that a mail server manages mailboxes and runs the client and server sides of SMTP. If Bob's mail server were to reside on his local PC, then Bob's PC would have to remain always on, and connected to the Internet, in order to receive new mail, which can arrive at any time. This is impractical for many Internet users. Instead, a typical user runs a user agent on the local PC but accesses its mailbox stored on an always-on shared mail server. This mail server is shared with other users and is typically maintained by the user's ISP (for example, university or company).

Now let's consider the path an e-mail message takes when it is sent from Alice to Bob. We just learned that at some point along the path the e-mail message needs to be deposited in Bob's mail server. This could be done simply by having Alice's user agent send the message directly to Bob's mail server. And this could be done with SMTP---indeed, SMTP has been designed for pushing e-mail from one host to another. However, typically the sender's user agent does not dialogue directly with the recipient's mail server.
Instead, as shown in Figure 2.16, Alice's user agent uses SMTP to push the e-mail message into her mail server, then Alice's mail server uses SMTP (as an SMTP client) to relay the e-mail message to Bob's mail server. Why the two-step procedure? Primarily because without relaying through Alice's mail server, Alice's user agent doesn't have any recourse to an unreachable destination mail server. By having Alice first deposit the e-mail in her own mail server, Alice's mail server can repeatedly try to send the message to Bob's mail server, say every 30 minutes, until Bob's mail server becomes operational. (And if Alice's mail server is down, then she has the recourse of complaining to her system administrator!) The SMTP RFC defines how the SMTP commands can be used to relay a message across multiple SMTP servers.

Figure 2.16 E-mail protocols and their communicating entities

But there is still one missing piece to the puzzle! How does a recipient like Bob, running a user agent on his local PC, obtain his messages, which are sitting in a mail server within Bob's ISP? Note that Bob's user agent can't use SMTP to obtain the messages because obtaining the messages is a pull operation, whereas SMTP is a push protocol. The puzzle is completed by introducing a special mail access protocol that transfers messages from Bob's mail server to his local PC. There are currently a number of popular mail access protocols, including Post Office Protocol---Version 3 (POP3), Internet Mail Access Protocol (IMAP), and HTTP.

Figure 2.16 provides a summary of the protocols that are used for Internet mail: SMTP is used to transfer mail from the sender's mail server to the recipient's mail server; SMTP is also used to transfer mail from the sender's user agent to the sender's mail server. A mail access protocol, such as POP3, is used to transfer mail from the recipient's mail server to the recipient's user agent.

POP3

POP3 is an extremely simple mail access protocol. It is defined in [RFC 1939], which is short and quite readable. Because the protocol is so simple, its functionality is rather limited. POP3 begins when the user agent (the client) opens a TCP connection to the mail server (the server) on port 110. With the TCP connection established, POP3 progresses through three phases: authorization, transaction, and update. During the first phase, authorization, the user agent sends a username and a password (in the clear) to authenticate the user. During the second phase, transaction, the user agent retrieves messages; also during this phase, the user agent can mark messages for deletion, remove deletion marks, and obtain mail statistics. The third phase, update, occurs after the client has issued the quit command, ending the POP3 session; at this time, the mail server deletes the messages that were marked for deletion.

In a POP3 transaction, the user agent issues commands, and the server responds to each command with a reply. There are two possible responses: +OK (sometimes followed by server-to-client data), used by the server to indicate that the previous command was fine; and -ERR, used by the server to indicate that something was wrong with the previous command. The authorization phase has two principal commands: user <username> and pass <password>. To illustrate these two commands, we suggest that you Telnet directly into a POP3 server, using port 110, and issue these commands.
Suppose that mailServer is the name of your mail server. You will see something like:

```
telnet mailServer 110
+OK POP3 server ready
user bob
+OK
pass hungry
+OK user successfully logged on
```

If you misspell a command, the POP3 server will reply with an -ERR message.

Now let's take a look at the transaction phase. A user agent using POP3 can often be configured (by the user) to "download and delete" or to "download and keep." The sequence of commands issued by a POP3 user agent depends on which of these two modes the user agent is operating in. In the download-and-delete mode, the user agent will issue the list, retr, and dele commands. As an example, suppose the user has two messages in his or her mailbox. In the dialogue below, C: (standing for client) is the user agent and S: (standing for server) is the mail server. The transaction will look something like:

```
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 1
C: retr 2
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 2
C: quit
S: +OK POP3 server signing off
```

The user agent first asks the mail server to list the size of each of the stored messages. The user agent then retrieves and deletes each message from the server. Note that after the authorization phase, the user agent employed only four commands: list, retr, dele, and quit. The syntax for these commands is defined in RFC 1939. After processing the quit command, the POP3 server enters the update phase and removes messages 1 and 2 from the mailbox.

A problem with this download-and-delete mode is that the recipient, Bob, may be nomadic and may want to access his mail messages from multiple machines, for example, his office PC, his home PC, and his portable computer. The download-and-delete mode partitions Bob's mail messages over these three machines; in particular, if Bob first reads a message on his office PC, he will not be able to reread the message from his portable at home later in the evening. In the download-and-keep mode, the user agent leaves the messages on the mail server after downloading them. In this case, Bob can reread messages from different machines; he can access a message from work and access it again later in the week from home.

During a POP3 session between a user agent and the mail server, the POP3 server maintains some state information; in particular, it keeps track of which user messages have been marked deleted. However, the POP3 server does not carry state information across POP3 sessions. This lack of state information across sessions greatly simplifies the implementation of a POP3 server.
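The same download-and-delete transaction can be driven from Python's standard poplib module, which speaks POP3 on port 110 for us. The server name and credentials below are the placeholders from the dialogue above.

```python
# A download-and-delete sketch with poplib, mirroring the list,
# retr, dele, and quit commands shown above. mailServer, bob, and
# hungry are placeholders.
import poplib

mailbox = poplib.POP3("mailServer", 110)
mailbox.user("bob")          # authorization phase
mailbox.pass_("hungry")

resp, listings, octets = mailbox.list()          # e.g., [b"1 498", b"2 912"]
for entry in listings:
    msg_num = int(entry.split()[0])
    resp, lines, octets = mailbox.retr(msg_num)  # retrieve the message
    print(b"\n".join(lines).decode("utf-8", "replace"))
    mailbox.dele(msg_num)    # mark for deletion

mailbox.quit()   # server enters the update phase and deletes marked mail
```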
IMAP

With POP3 access, once Bob has downloaded his messages to the local machine, he can create mail folders and move the downloaded messages into the folders. Bob can then delete messages, move messages across folders, and search for messages (by sender name or subject). But this paradigm---namely, folders and messages in the local machine---poses a problem for the nomadic user, who would prefer to maintain a folder hierarchy on a remote server that can be accessed from any computer. This is not possible with POP3---the POP3 protocol does not provide any means for a user to create remote folders and assign messages to folders.

To solve this and other problems, the IMAP protocol, defined in [RFC 3501], was invented. Like POP3, IMAP is a mail access protocol. It has many more features than POP3, but it is also significantly more complex. (And thus the client and server side implementations are significantly more complex.) An IMAP server will associate each message with a folder; when a message first arrives at the server, it is associated with the recipient's INBOX folder. The recipient can then move the message into a new, user-created folder, read the message, delete the message, and so on. The IMAP protocol provides commands to allow users to create folders and move messages from one folder to another. IMAP also provides commands that allow users to search remote folders for messages matching specific criteria. Note that, unlike POP3, an IMAP server maintains user state information across IMAP sessions---for example, the names of the folders and which messages are associated with which folders.

Another important feature of IMAP is that it has commands that permit a user agent to obtain components of messages. For example, a user agent can obtain just the message header of a message or just one part of a multipart MIME message. This feature is useful when there is a low-bandwidth connection (for example, a slow-speed modem link) between the user agent and its mail server. With a low-bandwidth connection, the user may not want to download all of the messages in its mailbox, particularly avoiding long messages that might contain, for example, an audio or video clip.
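Python's standard imaplib module exposes this partial-fetch feature. The sketch below retrieves only the header of message 1 in INBOX, leaving the body (and any large attachment) on the server; the server name and credentials are placeholders.

```python
# Fetching just a message header over IMAP (port 143) with imaplib.
# mailServer, bob, and hungry are placeholders.
import imaplib

with imaplib.IMAP4("mailServer") as M:
    M.login("bob", "hungry")
    M.select("INBOX")
    # BODY.PEEK[HEADER] returns only the header lines and does not
    # mark the message as read.
    status, data = M.fetch("1", "(BODY.PEEK[HEADER])")
    print(data[0][1].decode("utf-8", "replace"))
```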
Web-Based E-Mail

More and more users today are sending and accessing their e-mail through their Web browsers. Hotmail introduced Web-based access in the mid-1990s. Now Web-based e-mail is also provided by Google, Yahoo!, as well as just about every major university and corporation. With this service, the user agent is an ordinary Web browser, and the user communicates with its remote mailbox via HTTP. When a recipient, such as Bob, wants to access a message in his mailbox, the e-mail message is sent from Bob's mail server to Bob's browser using the HTTP protocol rather than the POP3 or IMAP protocol. When a sender, such as Alice, wants to send an e-mail message, the e-mail message is sent from her browser to her mail server over HTTP rather than over SMTP. Alice's mail server, however, still sends messages to, and receives messages from, other mail servers using SMTP.

2.4 DNS---The Internet's Directory Service

We human beings can be identified in many ways. For example, we can be identified by the names that appear on our birth certificates. We can be identified by our social security numbers. We can be identified by our driver's license numbers. Although each of these identifiers can be used to identify people, within a given context one identifier may be more appropriate than another. For example, the computers at the IRS (the infamous tax-collecting agency in the United States) prefer to use fixed-length social security numbers rather than birth certificate names. On the other hand, ordinary people prefer the more mnemonic birth certificate names rather than social security numbers. (Indeed, can you imagine saying, "Hi. My name is 132-67-9875. Please meet my husband, 178-87-1146.")

Just as humans can be identified in many ways, so too can Internet hosts. One identifier for a host is its hostname. Hostnames---such as www.facebook.com, www.google.com, gaia.cs.umass.edu---are mnemonic and are therefore appreciated by humans. However, hostnames provide little, if any, information about the location within the Internet of the host. (A hostname such as www.eurecom.fr, which ends with the country code .fr, tells us that the host is probably in France, but doesn't say much more.) Furthermore, because hostnames can consist of variable-length alphanumeric characters, they would be difficult to process by routers. For these reasons, hosts are also identified by so-called IP addresses. We discuss IP addresses in some detail in Chapter 4, but it is useful to say a few brief words about them now. An IP address consists of four bytes and has a rigid hierarchical structure. An IP address looks like 121.7.106.83, where each period separates one of the bytes expressed in decimal notation from 0 to 255. An IP address is hierarchical because as we scan the address from left to right, we obtain more and more specific information about where the host is located in the Internet (that is, within which network, in the network of networks). Similarly, when we scan a postal address from bottom to top, we obtain more and more specific information about where the addressee is located.

2.4.1 Services Provided by DNS

We have just seen that there are two ways to identify a host---by a hostname and by an IP address. People prefer the more mnemonic hostname identifier, while routers prefer fixed-length, hierarchically structured IP addresses. In order to reconcile these preferences, we need a directory service that translates hostnames to IP addresses. This is the main task of the Internet's domain name system (DNS). The DNS is (1) a distributed database implemented in a hierarchy of DNS servers, and (2) an application-layer protocol that allows hosts to query the distributed database. The DNS servers are often UNIX machines running the Berkeley Internet Name Domain (BIND) software [BIND 2016]. The DNS protocol runs over UDP and uses port 53. DNS is commonly employed by other application-layer protocols---including HTTP and SMTP---to translate user-supplied hostnames to IP addresses. As an example, consider what happens when a browser (that is, an HTTP client), running on some user's host, requests the URL www.someschool.edu/index.html. In order for the user's host to be able to send an HTTP request message to the Web server www.someschool.edu, the user's host must first obtain the IP address of www.someschool.edu. This is done as follows.

1. The same user machine runs the client side of the DNS application.

2. The browser extracts the hostname, www.someschool.edu, from the URL and passes the hostname to the client side of the DNS application.

3. The DNS client sends a query containing the hostname to a DNS server.

4. The DNS client eventually receives a reply, which includes the IP address for the hostname.

5. Once the browser receives the IP address from DNS, it can initiate a TCP connection to the HTTP server process located at port 80 at that IP address.

We see from this example that DNS adds an additional delay---sometimes substantial---to the Internet applications that use it. Fortunately, as we discuss below, the desired IP address is often cached in a "nearby" DNS server, which helps to reduce DNS network traffic as well as the average DNS delay. DNS provides a few other important services in addition to translating hostnames to IP addresses:

Host aliasing. A host with a complicated hostname can have one or more alias names.
For example, a hostname such as relay1.west-coast.enterprise.com could have, say, two aliases such as enterprise.com and www.enterprise.com. In this case, the hostname relay1.west-coast.enterprise.com is said to be a canonical hostname. Alias hostnames, when present, are typically more mnemonic than canonical hostnames. DNS can be invoked by an application to obtain the canonical hostname for a supplied alias hostname as well as the IP address of the host.

Mail server aliasing. For obvious reasons, it is highly desirable that e-mail addresses be mnemonic. For example, if Bob has an account with Yahoo Mail, Bob's e-mail address might be as simple as bob@yahoo.mail. However, the hostname of the Yahoo mail server is more complicated and much less mnemonic than simply yahoo.com (for example, the canonical hostname might be something like relay1.west-coast.yahoo.com). DNS can be invoked by a mail application to obtain the canonical hostname for a supplied alias hostname as well as the IP address of the host. In fact, the MX record (see below) permits a company's mail server and Web server to have identical (aliased) hostnames; for example, a company's Web server and mail server can both be called enterprise.com.

Load distribution. DNS is also used to perform load distribution among replicated servers, such as replicated Web servers. Busy sites, such as cnn.com, are replicated over multiple servers, with each server running on a different end system and each having a different IP address. For replicated Web servers, a set of IP addresses is thus associated with one canonical hostname. The DNS database contains this set of IP addresses. When clients make a DNS query for a name mapped to a set of addresses, the server responds with the entire set of IP addresses, but rotates the ordering of the addresses within each reply. Because a client typically sends its HTTP request message to the IP address that is listed first in the set, DNS rotation distributes the traffic among the replicated servers. DNS rotation is also used for e-mail so that multiple mail servers can have the same alias name. Also, content distribution companies such as Akamai have used DNS in more sophisticated ways [Dilley 2002] to provide Web content distribution (see Section 2.6.3).

The DNS is specified in RFC 1034 and RFC 1035, and updated in several additional RFCs. It is a complex system, and we only touch upon key aspects of its operation here. The interested reader is referred to these RFCs and the book by Albitz and Liu [Albitz 1993]; see also the retrospective paper [Mockapetris 1988], which provides a nice description of the what and why of DNS, and [Mockapetris 2005].

PRINCIPLES IN PRACTICE

DNS: CRITICAL NETWORK FUNCTIONS VIA THE CLIENT-SERVER PARADIGM

Like HTTP, FTP, and SMTP, the DNS protocol is an application-layer protocol since it (1) runs between communicating end systems using the client-server paradigm and (2) relies on an underlying end-to-end transport protocol to transfer DNS messages between communicating end systems. In another sense, however, the role of the DNS is quite different from Web, file transfer, and e-mail applications. Unlike these applications, the DNS is not an application with which a user directly interacts. Instead, the DNS provides a core Internet function---namely, translating hostnames to their underlying IP addresses, for user applications and other software in the Internet. We noted in Section 1.2 that much of the complexity in the Internet architecture is located at the "edges" of the network. The DNS, which implements the critical name-to-address translation process using clients and servers located at the edge of the network, is yet another example of that design philosophy.
2.4.2 Overview of How DNS Works

We now present a high-level overview of how DNS works. Our discussion will focus on the hostname-to-IP-address translation service. Suppose that some application (such as a Web browser or a mail reader) running in a user's host needs to translate a hostname to an IP address. The application will invoke the client side of DNS, specifying the hostname that needs to be translated. (On many UNIX-based machines, gethostbyname() is the function call that an application calls in order to perform the translation.) DNS in the user's host then takes over, sending a query message into the network. All DNS query and reply messages are sent within UDP datagrams to port 53. After a delay, ranging from milliseconds to seconds, DNS in the user's host receives a DNS reply message that provides the desired mapping. This mapping is then passed to the invoking application. Thus, from the perspective of the invoking application in the user's host, DNS is a black box providing a simple, straightforward translation service. But in fact, the black box that implements the service is complex, consisting of a large number of DNS servers distributed around the globe, as well as an application-layer protocol that specifies how the DNS servers and querying hosts communicate.

A simple design for DNS would have one DNS server that contains all the mappings. In this centralized design, clients simply direct all queries to the single DNS server, and the DNS server responds directly to the querying clients. Although the simplicity of this design is attractive, it is inappropriate for today's Internet, with its vast (and growing) number of hosts. The problems with a centralized design include:

A single point of failure. If the DNS server crashes, so does the entire Internet!

Traffic volume. A single DNS server would have to handle all DNS queries (for all the HTTP requests and e-mail messages generated from hundreds of millions of hosts).

Distant centralized database. A single DNS server cannot be "close to" all the querying clients. If we put the single DNS server in New York City, then all queries from Australia must travel to the other side of the globe, perhaps over slow and congested links. This can lead to significant delays.

Maintenance. The single DNS server would have to keep records for all Internet hosts. Not only would this centralized database be huge, but it would have to be updated frequently to account for every new host.

In summary, a centralized database in a single DNS server simply doesn't scale. Consequently, the DNS is distributed by design. In fact, the DNS is a wonderful example of how a distributed database can be implemented in the Internet.
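From the application's point of view, the whole machinery sits behind a single library call. In Python the equivalent of gethostbyname() is one line; the hostname is just an example.

```python
# The stub-resolver view of DNS: one call, one answer. Behind it,
# the host sends a UDP query to its local DNS server, which
# resolves the name on our behalf.
import socket

ip = socket.gethostbyname("gaia.cs.umass.edu")
print(ip)   # prints whatever A record the DNS currently returns
```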
A Distributed, Hierarchical Database

In order to deal with the issue of scale, the DNS uses a large number of servers, organized in a hierarchical fashion and distributed around the world. No single DNS server has all of the mappings for all of the hosts in the Internet. Instead, the mappings are distributed across the DNS servers. To a first approximation, there are three classes of DNS servers---root DNS servers, top-level domain (TLD) DNS servers, and authoritative DNS servers---organized in a hierarchy as shown in Figure 2.17.

Figure 2.17 Portion of the hierarchy of DNS servers

To understand how these three classes of servers interact, suppose a DNS client wants to determine the IP address for the hostname www.amazon.com. To a first approximation, the following events will take place. The client first contacts one of the root servers, which returns IP addresses for TLD servers for the top-level domain com. The client then contacts one of these TLD servers, which returns the IP address of an authoritative server for amazon.com. Finally, the client contacts one of the authoritative servers for amazon.com, which returns the IP address for the hostname www.amazon.com. We'll soon examine this DNS lookup process in more detail. But let's first take a closer look at these three classes of DNS servers:

Root DNS servers. There are over 400 root name servers scattered all over the world. Figure 2.18 shows the countries that have root name servers, with countries having more than ten darkly shaded. These root name servers are managed by 13 different organizations. The full list of root name servers, along with the organizations that manage them and their IP addresses, can be found at [Root Servers 2016]. Root name servers provide the IP addresses of the TLD servers.

Top-level domain (TLD) servers. For each of the top-level domains---top-level domains such as com, org, net, edu, and gov, and all of the country top-level domains such as uk, fr, ca, and jp---there is a TLD server (or server cluster). The company Verisign Global Registry Services maintains the TLD servers for the com top-level domain, and the company Educause maintains the TLD servers for the edu top-level domain. The network infrastructure supporting a TLD can be large and complex; see [Osterweil 2012] for a nice overview of the Verisign network. See [TLD list 2016] for a list of all top-level domains. TLD servers provide the IP addresses for authoritative DNS servers.

Figure 2.18 DNS root servers in 2016

Authoritative DNS servers. Every organization with publicly accessible hosts (such as Web servers and mail servers) on the Internet must provide publicly accessible DNS records that map the names of those hosts to IP addresses. An organization's authoritative DNS server houses these DNS records. An organization can choose to implement its own authoritative DNS server to hold these records; alternatively, the organization can pay to have these records stored in an authoritative DNS server of some service provider. Most universities and large companies implement and maintain their own primary and secondary (backup) authoritative DNS server.

The root, TLD, and authoritative DNS servers all belong to the hierarchy of DNS servers, as shown in Figure 2.17. There is another important type of DNS server called the local DNS server. A local DNS server does not strictly belong to the hierarchy of servers but is nevertheless central to the DNS architecture. Each ISP---such as a residential ISP or an institutional ISP---has a local DNS server (also called a default name server). When a host connects to an ISP, the ISP provides the host with the IP addresses of one or more of its local DNS servers (typically through DHCP, which is discussed in Chapter 4).
You can easily determine the IP address of your local DNS server by accessing network status windows in Windows or UNIX. A host's local DNS server is typically "close to" the host. For an institutional ISP, the local DNS server may be on the same LAN as the host; for a residential ISP, it is typically separated from the host by no more than a few routers. When a host makes a DNS query, the query is sent to the local DNS server, which acts as a proxy, forwarding the query into the DNS server hierarchy, as we'll discuss in more detail below.

Let's take a look at a simple example. Suppose the host cse.nyu.edu desires the IP address of gaia.cs.umass.edu. Also suppose that NYU's local DNS server for cse.nyu.edu is called dns.nyu.edu and that an authoritative DNS server for gaia.cs.umass.edu is called dns.umass.edu. As shown in Figure 2.19, the host cse.nyu.edu first sends a DNS query message to its local DNS server, dns.nyu.edu. The query message contains the hostname to be translated, namely, gaia.cs.umass.edu. The local DNS server forwards the query message to a root DNS server. The root DNS server takes note of the edu suffix and returns to the local DNS server a list of IP addresses for TLD servers responsible for edu. The local DNS server then resends the query message to one of these TLD servers. The TLD server takes note of the umass.edu suffix and responds with the IP address of the authoritative DNS server for the University of Massachusetts, namely, dns.umass.edu. Finally, the local DNS server resends the query message directly to dns.umass.edu, which responds with the IP address of gaia.cs.umass.edu. Note that in this example, in order to obtain the mapping for one hostname, eight DNS messages were sent: four query messages and four reply messages! We'll soon see how DNS caching reduces this query traffic.

Our previous example assumed that the TLD server knows the authoritative DNS server for the hostname. In general, this is not always true. Instead, the TLD server may know only of an intermediate DNS server, which in turn knows the authoritative DNS server for the hostname.

Figure 2.19 Interaction of the various DNS servers

For example, suppose again that the University of Massachusetts has a DNS server for the university, called dns.umass.edu. Also suppose that each of the departments at the University of Massachusetts has its own DNS server, and that each departmental DNS server is authoritative for all hosts in the department. In this case, when the intermediate DNS server, dns.umass.edu, receives a query for a host with a hostname ending with cs.umass.edu, it returns to dns.nyu.edu the IP address of dns.cs.umass.edu, which is authoritative for all hostnames ending with cs.umass.edu. The local DNS server dns.nyu.edu then sends the query to the authoritative DNS server, which returns the desired mapping to the local DNS server, which in turn returns the mapping to the requesting host. In this case, a total of 10 DNS messages are sent!

The example shown in Figure 2.19 makes use of both recursive queries and iterative queries. The query sent from cse.nyu.edu to dns.nyu.edu is a recursive query, since the query asks dns.nyu.edu to obtain the mapping on its behalf. But the subsequent three queries are iterative since all of the replies are directly returned to dns.nyu.edu. In theory, any DNS query can be iterative or recursive. For example, Figure 2.20 shows a DNS query chain for which all of the queries are recursive.
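To make the iterative pattern concrete, the sketch below sends a single query directly to a root server, exactly as a local DNS server would at the first step of the chain. It assumes the third-party dnspython package (not part of the standard library); the address shown is that of a.root-servers.net.

```python
# One iterative step of a DNS lookup: ask a root server directly.
# Assumes the third-party dnspython package (pip install dnspython).
import dns.message
import dns.query

query = dns.message.make_query("gaia.cs.umass.edu", "A")

# The root server will not answer the question itself; instead, its
# reply's authority/additional sections refer us to the edu TLD servers.
reply = dns.query.udp(query, "198.41.0.4", timeout=5)  # a.root-servers.net

for section in (reply.answer, reply.authority, reply.additional):
    for rrset in section:
        print(rrset)
```

A local DNS server simply repeats this step against each server it is referred to, until a reply with an answer section comes back.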
In practice, the queries typically follow the pattern in Figure 2.19: The query from the requesting host to the local DNS server is recursive, and the remaining queries are iterative.

DNS Caching

Our discussion thus far has ignored DNS caching, a critically important feature of the DNS system. In truth, DNS extensively exploits DNS caching in order to improve the delay performance and to reduce the number of DNS messages ricocheting around the Internet.

Figure 2.20 Recursive queries in DNS

The idea behind DNS caching is very simple. In a query chain, when a DNS server receives a DNS reply (containing, for example, a mapping from a hostname to an IP address), it can cache the mapping in its local memory. For example, in Figure 2.19, each time the local DNS server dns.nyu.edu receives a reply from some DNS server, it can cache any of the information contained in the reply. If a hostname/IP address pair is cached in a DNS server and another query arrives to the DNS server for the same hostname, the DNS server can provide the desired IP address, even if it is not authoritative for the hostname. Because hosts and mappings between hostnames and IP addresses are by no means permanent, DNS servers discard cached information after a period of time (often set to two days). As an example, suppose that a host apricot.nyu.edu queries dns.nyu.edu for the IP address for the hostname cnn.com. Furthermore, suppose that a few hours later, another NYU host, say, kiwi.nyu.edu, also queries dns.nyu.edu with the same hostname. Because of caching, the local DNS server will be able to immediately return the IP address of cnn.com to this second requesting host without having to query any other DNS servers. A local DNS server can also cache the IP addresses of TLD servers, thereby allowing the local DNS server to bypass the root DNS servers in a query chain. In fact, because of caching, root servers are bypassed for all but a very small fraction of DNS queries.

2.4.3 DNS Records and Messages

The DNS servers that together implement the DNS distributed database store resource records (RRs), including RRs that provide hostname-to-IP address mappings. Each DNS reply message carries one or more resource records. In this and the following subsection, we provide a brief overview of DNS resource records and messages; more details can be found in \[Albitz 1993\] or in the DNS RFCs \[RFC 1034; RFC 1035\]. A resource record is a four-tuple that contains the following fields:

(Name, Value, Type, TTL)

TTL is the time to live of the resource record; it determines when a resource should be removed from a cache. In the example records given below, we ignore the TTL field. The meaning of Name and Value depend on Type:

If Type=A, then Name is a hostname and Value is the IP address for the hostname. Thus, a Type A record provides the standard hostname-to-IP address mapping. As an example, (relay1.bar.foo.com, 145.37.93.126, A) is a Type A record.

If Type=NS, then Name is a domain (such as foo.com) and Value is the hostname of an authoritative DNS server that knows how to obtain the IP addresses for hosts in the domain. This record is used to route DNS queries further along in the query chain. As an example, (foo.com, dns.foo.com, NS) is a Type NS record.

If Type=CNAME, then Value is a canonical hostname for the alias hostname Name. This record can provide querying hosts the canonical name for a hostname.
As an example, (foo.com, relay1.bar.foo.com, CNAME) is a CNAME record.

If Type=MX, then Value is the canonical name of a mail server that has an alias hostname Name. As an example, (foo.com, mail.bar.foo.com, MX) is an MX record. MX records allow the hostnames of mail servers to have simple aliases. Note that by using the MX record, a company can have the same aliased name for its mail server and for one of its other servers (such as its Web server). To obtain the canonical name for the mail server, a DNS client would query for an MX record; to obtain the canonical name for the other server, the DNS client would query for the CNAME record.

If a DNS server is authoritative for a particular hostname, then the DNS server will contain a Type A record for the hostname. (Even if the DNS server is not authoritative, it may contain a Type A record in its cache.) If a server is not authoritative for a hostname, then the server will contain a Type NS record for the domain that includes the hostname; it will also contain a Type A record that provides the IP address of the DNS server in the Value field of the NS record. As an example, suppose an edu TLD server is not authoritative for the host gaia.cs.umass.edu. Then this server will contain a record for a domain that includes the host gaia.cs.umass.edu, for example, (umass.edu, dns.umass.edu, NS). The edu TLD server would also contain a Type A record, which maps the DNS server dns.umass.edu to an IP address, for example, (dns.umass.edu, 128.119.40.111, A).

DNS Messages

Earlier in this section, we referred to DNS query and reply messages. These are the only two kinds of DNS messages. Furthermore, both query and reply messages have the same format, as shown in Figure 2.21.

Figure 2.21 DNS message format

The semantics of the various fields in a DNS message are as follows: The first 12 bytes is the header section, which has a number of fields. The first field is a 16-bit number that identifies the query. This identifier is copied into the reply message to a query, allowing the client to match received replies with sent queries. There are a number of flags in the flag field. A 1-bit query/reply flag indicates whether the message is a query (0) or a reply (1). A 1-bit authoritative flag is set in a reply message when a DNS server is an authoritative server for a queried name. A 1-bit recursion-desired flag is set when a client (host or DNS server) desires that the DNS server perform recursion when it doesn't have the record. A 1-bit recursion-available field is set in a reply if the DNS server supports recursion. In the header, there are also four "number of" fields. These fields indicate the number of occurrences of the four types of data sections that follow the header. The question section contains information about the query that is being made. This section includes (1) a name field that contains the name that is being queried, and (2) a type field that indicates the type of question being asked about the name---for example, a host address associated with a name (Type A) or the mail server for a name (Type MX). In a reply from a DNS server, the answer section contains the resource records for the name that was originally queried. Recall that in each resource record there is the Type (for example, A, NS, CNAME, and MX), the Value, and the TTL.
A reply can return multiple RRs in the answer, since a hostname can have multiple IP addresses (for example, for replicated Web servers, as discussed earlier in this section). The authority section contains records of other authoritative servers. The additional section contains other helpful records. For example, the answer field in a reply to an MX query contains a resource record providing the canonical hostname of a mail server. The additional section contains a Type A record providing the IP address for the canonical hostname of the mail server.

How would you like to send a DNS query message directly from the host you're working on to some DNS server? This can easily be done with the nslookup program, which is available from most Windows and UNIX platforms. For example, from a Windows host, open the Command Prompt and invoke the nslookup program by simply typing "nslookup." After invoking nslookup, you can send a DNS query to any DNS server (root, TLD, or authoritative). After receiving the reply message from the DNS server, nslookup will display the records included in the reply (in a human-readable format). As an alternative to running nslookup from your own host, you can visit one of many Web sites that allow you to remotely employ nslookup. (Just type "nslookup" into a search engine and you'll be brought to one of these sites.) The DNS Wireshark lab at the end of this chapter will allow you to explore the DNS in much more detail.

Inserting Records into the DNS Database

The discussion above focused on how records are retrieved from the DNS database. You might be wondering how records get into the database in the first place. Let's look at how this is done in the context of a specific example. Suppose you have just created an exciting new startup company called Network Utopia. The first thing you'll surely want to do is register the domain name
networkutopia.com at a registrar. A registrar is a commercial entity that verifies the uniqueness of the domain name, enters the domain name into the DNS database (as discussed below), and collects a small fee from you for its services. Prior to 1999, a single registrar, Network Solutions, had a monopoly on domain name registration for com, net, and org domains. But now there are many registrars competing for customers, and the Internet Corporation for Assigned Names and Numbers (ICANN) accredits the various registrars. A complete list of accredited registrars is available at http://www.internic.net. When you register the domain name networkutopia.com with some registrar, you also need to provide the registrar with the names and IP addresses of your primary and secondary authoritative DNS servers. Suppose the names and IP addresses are dns1.networkutopia.com, dns2.networkutopia.com, 212.212.212.1, and 212.212.212.2. For each of these two authoritative DNS servers, the registrar would then make sure that a Type NS and a Type A record are entered into the TLD com servers. Specifically, for the primary authoritative server for networkutopia.com, the registrar would insert the following two resource records into the DNS system:

(networkutopia.com, dns1.networkutopia.com, NS)
(dns1.networkutopia.com, 212.212.212.1, A)

You'll also have to make sure that the Type A resource record for your Web server www.networkutopia.com and the Type MX resource record for your mail server mail.networkutopia.com are entered into your authoritative DNS servers.

FOCUS ON SECURITY

DNS VULNERABILITIES

We have seen that DNS is a critical component of the Internet infrastructure, with many important services---including the Web and e-mail---simply incapable of functioning without it. We therefore naturally ask, how can DNS be attacked? Is DNS a sitting duck, waiting to be knocked out of service, while taking most Internet applications down with it?

The first type of attack that comes to mind is a DDoS bandwidth-flooding attack (see Section 1.6) against DNS servers. For example, an attacker could attempt to send to each DNS root server a deluge of packets, so many that the majority of legitimate DNS queries never get answered. Such a large-scale DDoS attack against DNS root servers actually took place on October 21, 2002. In this attack, the attackers leveraged a botnet to send truckloads of ICMP ping messages to each of the 13 DNS root IP addresses. (ICMP messages are discussed in Section 5.6. For now, it suffices to know that ICMP packets are special types of IP datagrams.) Fortunately, this large-scale attack caused minimal damage, having little or no impact on users' Internet experience. The attackers did succeed at directing a deluge of packets at the root servers. But many of the DNS root servers were protected by packet filters, configured to always block all ICMP ping messages directed at the root servers. These protected servers were thus spared and functioned as normal. Furthermore, most local DNS servers cache the IP addresses of top-level-domain servers, allowing the query process to often bypass the DNS root servers.

A potentially more effective DDoS attack against DNS would be to send a deluge of DNS queries to top-level-domain servers, for example, to all the top-level-domain servers that handle the .com domain. It would be harder to filter DNS queries directed to DNS servers; and top-level-domain servers are not as easily bypassed as are root servers. But the severity of such an attack would be partially mitigated by caching in local DNS servers.

DNS could potentially be attacked in other ways. In a man-in-the-middle attack, the attacker intercepts queries from hosts and returns bogus replies. In the DNS poisoning attack, the attacker sends bogus replies to a DNS server, tricking the server into accepting bogus records into its cache. Either of these attacks could be used, for example, to redirect an unsuspecting Web user to the attacker's Web site. These attacks, however, are difficult to implement, as they require intercepting packets or throttling servers \[Skoudis 2006\]. In summary, DNS has demonstrated itself to be surprisingly robust against attacks. To date, there hasn't been an attack that has successfully impeded the DNS service.

(Until recently, the contents of each DNS server were configured statically, for example, from a configuration file created by a system manager.
More recently, an UPDATE option has been added to the DNS protocol to allow data to be dynamically added to or deleted from the database via DNS messages. \[RFC 2136\] and \[RFC 3007\] specify DNS dynamic updates.)

Once all of these steps are completed, people will be able to visit your Web site and send e-mail to the employees at your company. Let's conclude our discussion of DNS by verifying that this statement is true. This verification also helps to solidify what we have learned about DNS. Suppose Alice in Australia wants to view the Web page www.networkutopia.com. As discussed earlier, her host will first send a DNS query to her local DNS server. The local DNS server will then contact a TLD com server. (The local DNS server will also have to contact a root DNS server if the address of a TLD com server is not cached.) This TLD server contains the Type NS and Type A resource records listed above, because the registrar had these resource records inserted into all of the TLD com servers. The TLD com server sends a reply to Alice's local DNS server, with the reply containing the two resource records. The local DNS server then sends a DNS query to 212.212.212.1, asking for the Type A record corresponding to www.networkutopia.com. This record provides the IP address of the desired Web server, say, 212.212.71.4, which the local DNS server passes back to Alice's host. Alice's browser can now initiate a TCP connection to the host 212.212.71.4 and send an HTTP request over the connection. Whew! There's a lot more going on than what meets the eye when one surfs the Web!

2.5 Peer-to-Peer File Distribution

The applications described in this chapter thus far---including the Web, e-mail, and DNS---all employ client-server architectures with significant reliance on always-on infrastructure servers. Recall from Section 2.1.1 that with a P2P architecture, there is minimal (or no) reliance on always-on infrastructure servers. Instead, pairs of intermittently connected hosts, called peers, communicate directly with each other. The peers are not owned by a service provider, but are instead desktops and laptops controlled by users.

In this section we consider a very natural P2P application, namely, distributing a large file from a single server to a large number of hosts (called peers). The file might be a new version of the Linux operating system, a software patch for an existing operating system or application, an MP3 music file, or an MPEG video file. In client-server file distribution, the server must send a copy of the file to each of the peers---placing an enormous burden on the server and consuming a large amount of server bandwidth. In P2P file distribution, each peer can redistribute any portion of the file it has received to any other peers, thereby assisting the server in the distribution process. As of 2016, the most popular P2P file distribution protocol is BitTorrent. Originally developed by Bram Cohen, there are now many different independent BitTorrent clients conforming to the BitTorrent protocol, just as there are a number of Web browser clients that conform to the HTTP protocol. In this subsection, we first examine the self-scalability of P2P architectures in the context of file distribution. We then describe BitTorrent in some detail, highlighting its most important characteristics and features.
Scalability of P2P Architectures

To compare client-server architectures with peer-to-peer architectures, and illustrate the inherent self-scalability of P2P, we now consider a simple quantitative model for distributing a file to a fixed set of peers for both architecture types. As shown in Figure 2.22, the server and the peers are connected to the Internet with access links. Denote the upload rate of the server's access link by u_s, the upload rate of the ith peer's access link by u_i, and the download rate of the ith peer's access link by d_i. Also denote the size of the file to be distributed (in bits) by F and the number of peers that want to obtain a copy of the file by N. The distribution time is the time it takes to get a copy of the file to all N peers.

Figure 2.22 An illustrative file distribution problem

In our analysis of the distribution time below, for both client-server and P2P architectures, we make the simplifying (and generally accurate \[Akella 2003\]) assumption that the Internet core has abundant bandwidth, implying that all of the bottlenecks are in access networks. We also suppose that the server and clients are not participating in any other network applications, so that all of their upload and download access bandwidth can be fully devoted to distributing this file.

Let's first determine the distribution time for the client-server architecture, which we denote by D_cs. In the client-server architecture, none of the peers aids in distributing the file. We make the following observations:

The server must transmit one copy of the file to each of the N peers. Thus the server must transmit NF bits. Since the server's upload rate is u_s, the time to distribute the file must be at least NF/u_s.

Let d_min denote the download rate of the peer with the lowest download rate, that is, d_min = min{d_1, d_2, . . ., d_N}. The peer with the lowest download rate cannot obtain all F bits of the file in less than F/d_min seconds. Thus the minimum distribution time is at least F/d_min.

Putting these two observations together, we obtain

D_cs ≥ max{NF/u_s, F/d_min}

This provides a lower bound on the minimum distribution time for the client-server architecture. In the homework problems you will be asked to show that the server can schedule its transmissions so that the lower bound is actually achieved. So let's take the lower bound provided above as the actual distribution time, that is,

D_cs = max{NF/u_s, F/d_min}   (2.1)

We see from Equation 2.1 that for N large enough, the client-server distribution time is given by NF/u_s. Thus, the distribution time increases linearly with the number of peers N. So, for example, if the number of peers from one week to the next increases a thousand-fold from a thousand to a million, the time required to distribute the file to all peers increases by a factor of 1,000.

Let's now go through a similar analysis for the P2P architecture, where each peer can assist the server in distributing the file. In particular, when a peer receives some file data, it can use its own upload capacity to redistribute the data to other peers. Calculating the distribution time for the P2P architecture is somewhat more complicated than for the client-server architecture, since the distribution time depends on how each peer distributes portions of the file to the other peers. Nevertheless, a simple expression for the minimal distribution time can be obtained \[Kumar 2006\].
To this end, we first make the following observations:

At the beginning of the distribution, only the server has the file. To get this file into the community of peers, the server must send each bit of the file at least once into its access link. Thus, the minimum distribution time is at least F/u_s. (Unlike the client-server scheme, a bit sent once by the server may not have to be sent by the server again, as the peers may redistribute the bit among themselves.)

As with the client-server architecture, the peer with the lowest download rate cannot obtain all F bits of the file in less than F/d_min seconds. Thus the minimum distribution time is at least F/d_min.

Finally, observe that the total upload capacity of the system as a whole is equal to the upload rate of the server plus the upload rates of each of the individual peers, that is, u_total = u_s + u_1 + ⋯ + u_N. The system must deliver (upload) F bits to each of the N peers, thus delivering a total of NF bits. This cannot be done at a rate faster than u_total. Thus, the minimum distribution time is also at least NF/(u_s + u_1 + ⋯ + u_N).

Putting these three observations together, we obtain the minimum distribution time for P2P, denoted by D_P2P:

D_P2P ≥ max{F/u_s, F/d_min, NF/(u_s + u_1 + ⋯ + u_N)}   (2.2)

Equation 2.2 provides a lower bound for the minimum distribution time for the P2P architecture. It turns out that if we imagine that each peer can redistribute a bit as soon as it receives the bit, then there is a redistribution scheme that actually achieves this lower bound \[Kumar 2006\]. (We will prove a special case of this result in the homework.) In reality, where chunks of the file are redistributed rather than individual bits, Equation 2.2 serves as a good approximation of the actual minimum distribution time. Thus, let's take the lower bound provided by Equation 2.2 as the actual minimum distribution time, that is,

D_P2P = max{F/u_s, F/d_min, NF/(u_s + u_1 + ⋯ + u_N)}   (2.3)

Figure 2.23 compares the minimum distribution time for the client-server and P2P architectures assuming that all peers have the same upload rate u. In Figure 2.23, we have set F/u = 1 hour, u_s = 10u, and d_min ≥ u_s. Thus, a peer can transmit the entire file in one hour, the server transmission rate is 10 times the peer upload rate, and (for simplicity) the peer download rates are set large enough so as not to have an effect.

Figure 2.23 Distribution time for P2P and client-server architectures

We see from Figure 2.23 that for the client-server architecture, the distribution time increases linearly and without bound as the number of peers increases. However, for the P2P architecture, the minimal distribution time is not only always less than the distribution time of the client-server architecture; it is also less than one hour for any number of peers N. Thus, applications with the P2P architecture can be self-scaling. This scalability is a direct consequence of peers being redistributors as well as consumers of bits.

BitTorrent

BitTorrent is a popular P2P protocol for file distribution \[Chao 2011\]. In BitTorrent lingo, the collection of all peers participating in the distribution of a particular file is called a torrent. Peers in a torrent download equal-size chunks of the file from one another, with a typical chunk size of 256 KBytes. When a peer first joins a torrent, it has no chunks. Over time it accumulates more and more chunks. While it downloads chunks it also uploads chunks to other peers.
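Before going further with BitTorrent, a quick numeric check of Equations 2.1 and 2.3 may be helpful. The sketch below reproduces the Figure 2.23 setup (F/u = 1 hour, u_s = 10u, all peers uploading at rate u); the F/d_min terms are omitted since the download rates are assumed large enough not to matter.

```python
# Numeric check of Equations 2.1 and 2.3 under the Figure 2.23
# assumptions; units are hours, and the F/d_min terms are omitted.
def d_cs(n, f=1.0, u_s=10.0):
    """Client-server distribution time, Equation 2.1."""
    return n * f / u_s

def d_p2p(n, f=1.0, u=1.0, u_s=10.0):
    """P2P distribution time, Equation 2.3, all peers at rate u."""
    return max(f / u_s, n * f / (u_s + n * u))

for n in (10, 100, 1000):
    print(f"N={n:5d}  client-server={d_cs(n):7.1f} h  P2P={d_p2p(n):5.2f} h")
```

For N = 1000 the client-server time is 100 hours, while the P2P time is still under one hour, which is exactly the self-scaling behavior visible in Figure 2.23.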
Once a peer has acquired the entire file, it may +(selfishly) leave the torrent, or (altruistically) remain in the torrent +and continue to upload chunks to other peers. Also, any peer may leave +the torrent at any time with only a subset of chunks, and later rejoin +the torrent. Let's now take a closer look at how BitTorrent operates. +Since BitTorrent is a rather complicated protocol and system, we'll only +describe its most important mechanisms, sweeping some of the details +under the rug; this will allow us to see the forest through the trees. +Each torrent has an infrastructure node called a tracker. + +Figure 2.24 File distribution with BitTorrent + +When a peer joins a torrent, it registers itself with the tracker and +periodically informs the tracker that it is still in the torrent. In +this manner, the tracker keeps track of the peers that are participating +in the torrent. A given torrent may have fewer than ten or more than a +thousand peers participating at any instant of time. + + As shown in Figure 2.24, when a new peer, Alice, joins the torrent, the +tracker randomly selects a subset of peers (for concreteness, say 50) +from the set of participating peers, and sends the IP addresses of these +50 peers to Alice. Possessing this list of peers, Alice attempts to +establish concurrent TCP connections with all the peers on this list. +Let's call all the peers with which Alice succeeds in establishing a TCP +connection "neighboring peers." (In Figure 2.24, Alice is shown to have +only three neighboring peers. Normally, she would have many more.) As +time evolves, some of these peers may leave and other peers (outside the +initial 50) may attempt to establish TCP connections with Alice. So a +peer's neighboring peers will fluctuate over time. At any given time, +each peer will have a subset of chunks from the file, with different +peers having different subsets. Periodically, Alice will ask each of her +neighboring peers (over the TCP connections) for the list of the chunks +they have. If Alice has L different neighbors, she will obtain L lists +of chunks. With this knowledge, Alice will issue requests (again over +the TCP connections) for chunks she currently does not have. So at any +given instant of time, Alice will have a subset of chunks and will know +which chunks her neighbors have. With this information, Alice will have +two important decisions to make. First, which chunks should she request +first from her neighbors? And second, to which of her neighbors should +she send requested chunks? In deciding which chunks to request, Alice +uses a technique called rarest first. The idea is to determine, from +among the chunks she does not have, the chunks that are the rarest among +her neighbors (that is, the chunks that have the fewest repeated copies +among her neighbors) and then request those rarest chunks first. In this +manner, the rarest chunks get more quickly redistributed, aiming to +(roughly) equalize the numbers of copies of each chunk in the torrent. +To determine which requests she responds to, BitTorrent uses a clever +trading algorithm. The basic idea is that Alice gives priority to the +neighbors that are currently supplying her data at the highest rate. +Specifically, for each of her neighbors, Alice continually measures the +rate at which she receives bits and determines the four peers that are +feeding her bits at the highest rate. She then reciprocates by sending +chunks to these same four peers. 
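The two decisions just described lend themselves to a compact sketch. The data structures below are hypothetical stand-ins (an actual BitTorrent client is far more elaborate), but they capture rarest-first chunk selection and the choice of the top four uploaders.

```python
# Sketch of Alice's two decisions: which chunk to request (rarest
# first) and which neighbors to unchoke (the four fastest uploaders).
from collections import Counter

def rarest_first(my_chunks, neighbor_chunks):
    """Pick a chunk we lack that has the fewest copies among neighbors."""
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks)
    candidates = [c for c in counts if c not in my_chunks]
    return min(candidates, key=lambda c: counts[c]) if candidates else None

def top_four(upload_rates_to_us):
    """Unchoke the four neighbors currently feeding us at the highest rate."""
    ranked = sorted(upload_rates_to_us, key=upload_rates_to_us.get, reverse=True)
    return ranked[:4]

neighbors = {"bob": {1, 2, 3}, "carol": {2, 3}, "dave": {3, 4}}
print(rarest_first({3}, neighbors))   # chunks 1 and 4 are rarest (one copy each)
print(top_four({"bob": 50, "carol": 80, "dave": 10, "erin": 60, "fay": 5}))
```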
Every 10 seconds, she recalculates the rates and possibly modifies the set of four peers. In BitTorrent lingo, these four peers are said to be unchoked. Importantly, every 30 seconds, she also picks one additional neighbor at random and sends it chunks. Let's call the randomly chosen peer Bob. In BitTorrent lingo, Bob is said to be optimistically unchoked. Because Alice is sending data to Bob, she may become one of Bob's top four uploaders, in which case Bob would start to send data to Alice. If the rate at which Bob sends data to Alice is high enough, Bob could then, in turn, become one of Alice's top four uploaders. In other words, every 30 seconds, Alice will randomly choose a new trading partner and initiate trading with that partner. If the two peers are satisfied with the trading, they will put each other in their top four lists and continue trading with each other until one of the peers finds a better partner. The effect is that peers capable of uploading at compatible rates tend to find each other. The random neighbor selection also allows new peers to get chunks, so that they can have something to trade. All other neighboring peers besides these five peers (four "top" peers and one probing peer) are "choked," that is, they do not receive any chunks from Alice. BitTorrent has a number of interesting mechanisms that are not discussed here, including pieces (mini-chunks), pipelining, random first selection, endgame mode, and anti-snubbing \[Cohen 2003\].

The incentive mechanism for trading just described is often referred to as tit-for-tat \[Cohen 2003\]. It has been shown that this incentive scheme can be circumvented \[Liogkas 2006; Locher 2006; Piatek 2007\]. Nevertheless, the BitTorrent ecosystem is wildly successful, with millions of simultaneous peers actively sharing files in hundreds of thousands of torrents. If BitTorrent had been designed without tit-for-tat (or a variant), but otherwise exactly the same, BitTorrent would likely not even exist now, as the majority of the users would have been free-riders \[Saroiu 2002\].

We close our discussion on P2P by briefly mentioning another application of P2P, namely, the Distributed Hash Table (DHT). A distributed hash table is a simple database, with the database records being distributed over the peers in a P2P system. DHTs have been widely implemented (e.g., in BitTorrent) and have been the subject of extensive research. An overview is provided in a Video Note on the companion website.

Walking through distributed hash tables

2.6 Video Streaming and Content Distribution Networks

Streaming prerecorded video now accounts for the majority of the traffic in residential ISPs in North America. In particular, the Netflix and YouTube services alone consumed a whopping 37% and 16%, respectively, of residential ISP traffic in 2015 \[Sandvine 2015\]. In this section we will provide an overview of how popular video streaming services are implemented in today's Internet. We will see they are implemented using application-level protocols and servers that function in some ways like a cache. In Chapter 9, devoted to multimedia networking, we will further examine Internet video as well as other Internet multimedia services.

2.6.1 Internet Video

In streaming stored video applications, the underlying medium is prerecorded video, such as a movie, a television show, a prerecorded sporting event, or a prerecorded user-generated video (such as those commonly seen on YouTube).
These prerecorded videos are placed on servers, and users send requests to the servers to view the videos on demand. Many Internet companies today provide streaming video, including Netflix, YouTube (Google), Amazon, and Youku. But before launching into a discussion of video streaming, we should first get a quick feel for the video medium itself. A video is a sequence of images, typically being displayed at a constant rate, for example, at 24 or 30 images per second. An uncompressed, digitally encoded image consists of an array of pixels, with each pixel encoded into a number of bits to represent luminance and color. An important characteristic of video is that it can be compressed, thereby trading off video quality with bit rate. Today's off-the-shelf compression algorithms can compress a video to essentially any bit rate desired. Of course, the higher the bit rate, the better the image quality and the better the overall user viewing experience.

From a networking perspective, perhaps the most salient characteristic of video is its high bit rate. Compressed Internet video typically ranges from 100 kbps for low-quality video to over 3 Mbps for streaming high-definition movies; 4K streaming envisions a bit rate of more than 10 Mbps. This can translate to a huge amount of traffic and storage, particularly for high-end video. For example, a single 2 Mbps video with a duration of 67 minutes will consume about 1 gigabyte of storage and traffic (2 Mbps × 67 minutes × 60 seconds/minute ≈ 8,040 megabits, or roughly 1,000 megabytes). By far, the most important performance measure for streaming video is average end-to-end throughput. In order to provide continuous playout, the network must provide an average throughput to the streaming application that is at least as large as the bit rate of the compressed video.

We can also use compression to create multiple versions of the same video, each at a different quality level. For example, we can use compression to create, say, three versions of the same video, at rates of 300 kbps, 1 Mbps, and 3 Mbps. Users can then decide which version they want to watch as a function of their current available bandwidth. Users with high-speed Internet connections might choose the 3 Mbps version; users watching the video over 3G with a smartphone might choose the 300 kbps version.

2.6.2 HTTP Streaming and DASH

In HTTP streaming, the video is simply stored at an HTTP server as an ordinary file with a specific URL. When a user wants to see the video, the client establishes a TCP connection with the server and issues an HTTP GET request for that URL. The server then sends the video file, within an HTTP response message, as quickly as the underlying network protocols and traffic conditions will allow. On the client side, the bytes are collected in a client application buffer. Once the number of bytes in this buffer exceeds a predetermined threshold, the client application begins playback---specifically, the streaming video application periodically grabs video frames from the client application buffer, decompresses the frames, and displays them on the user's screen. Thus, the video streaming application is displaying video as it is receiving and buffering frames corresponding to latter parts of the video.
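The client-side buffering logic just described fits in a few lines; the threshold and chunk sizes below are illustrative rather than values used by any particular player.

```python
# Sketch of HTTP-streaming playout: bytes accumulate in an
# application buffer, and playback starts only after a threshold.
PLAYBACK_THRESHOLD = 512 * 1024      # bytes to buffer before playout

buffer = bytearray()
playing = False

def on_bytes_received(data: bytes):
    """Called as HTTP response bytes arrive over the TCP connection."""
    global playing
    buffer.extend(data)
    if not playing and len(buffer) >= PLAYBACK_THRESHOLD:
        playing = True
        print("threshold reached; playback begins")

for _ in range(20):                  # simulate arriving data
    on_bytes_received(b"\x00" * 64 * 1024)
```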
Although HTTP streaming, as described above, has been extensively deployed in practice (for example, by YouTube since its inception), it has a major shortcoming: All clients receive the same encoding of the video, despite the large variations in the amount of bandwidth available to a client, both across different clients and also over time for the same client. This has led to the development of a new type of HTTP-based streaming, often referred to as Dynamic Adaptive Streaming over HTTP (DASH). In DASH, the video is encoded into several different versions, with each version having a different bit rate and, correspondingly, a different quality level. The client dynamically requests chunks of video segments of a few seconds in length. When the amount of available bandwidth is high, the client naturally selects chunks from a high-rate version; and when the available bandwidth is low, it naturally selects from a low-rate version. The client selects different chunks one at a time with HTTP GET request messages \[Akhshabi 2011\]. DASH allows clients with different Internet access rates to stream video at different encoding rates. Clients with low-speed 3G connections can receive a low bit-rate (and low-quality) version, and clients with fiber connections can receive a high-quality version. DASH also allows a client to adapt to the available bandwidth if the available end-to-end bandwidth changes during the session. This feature is particularly important for mobile users, who typically see their bandwidth availability fluctuate as they move with respect to the base stations.

With DASH, each video version is stored in the HTTP server, each with a different URL. The HTTP server also has a manifest file, which provides a URL for each version along with its bit rate. The client first requests the manifest file and learns about the various versions. The client then selects one chunk at a time by specifying a URL and a byte range in an HTTP GET request message for each chunk. While downloading chunks, the client also measures the received bandwidth and runs a rate determination algorithm to select the chunk to request next. Naturally, if the client has a lot of video buffered and if the measured receive bandwidth is high, it will choose a chunk from a high-bit-rate version. And naturally, if the client has little video buffered and the measured received bandwidth is low, it will choose a chunk from a low-bit-rate version. DASH therefore allows the client to freely switch among different quality levels.

2.6.3 Content Distribution Networks

Today, many Internet video companies are distributing on-demand multi-Mbps streams to millions of users on a daily basis. YouTube, for example, with a library of hundreds of millions of videos, distributes hundreds of millions of video streams to users around the world every day. Streaming all this traffic to locations all over the world while providing continuous playout and high interactivity is clearly a challenging task. For an Internet video company, perhaps the most straightforward approach to providing streaming video service is to build a single massive data center, store all of its videos in the data center, and stream the videos directly from the data center to clients worldwide. But there are three major problems with this approach.
First, if the client is far from the data center, server-to-client packets will cross many communication links and likely pass through many ISPs, with some of the ISPs possibly located on different continents. If one of these links provides a throughput that is less than the video consumption rate, the end-to-end throughput will also be below the consumption rate, resulting in annoying freezing delays for the user. (Recall from Chapter 1 that the end-to-end throughput of a stream is governed by the throughput at the bottleneck link.) The likelihood of this happening increases as the number of links in the end-to-end path increases. A second drawback is that a popular video will likely be sent many times over the same communication links. Not only does this waste network bandwidth, but the Internet video company itself will be paying its provider ISP (connected to the data center) for sending the same bytes into the Internet over and over again. A third problem with this solution is that a single data center represents a single point of failure---if the data center or its links to the Internet go down, it would not be able to distribute any video streams.

In order to meet the challenge of distributing massive amounts of video data to users distributed around the world, almost all major video-streaming companies make use of Content Distribution Networks (CDNs). A CDN manages servers in multiple geographically distributed locations, stores copies of the videos (and other types of Web content, including documents, images, and audio) in its servers, and attempts to direct each user request to a CDN location that will provide the best user experience. The CDN may be a private CDN, that is, owned by the content provider itself; for example, Google's CDN distributes YouTube videos and other types of content. The CDN may alternatively be a third-party CDN that distributes content on behalf of multiple content providers; Akamai, Limelight, and Level-3 all operate third-party CDNs. A very readable overview of modern CDNs is \[Leighton 2009; Nygren 2010\].

CDNs typically adopt one of two different server placement philosophies \[Huang 2008\]:

Enter Deep. One philosophy, pioneered by Akamai, is to enter deep into the access networks of Internet Service Providers, by deploying server clusters in access ISPs all over the world. (Access networks are described in Section 1.3.) Akamai takes this approach with clusters in approximately 1,700 locations. The goal is to get close to end users, thereby improving user-perceived delay and throughput by decreasing the number of links and routers between the end user and the CDN server from which it receives content. Because of this highly distributed design, the task of maintaining and managing the clusters becomes challenging.

Bring Home. A second design philosophy, taken by Limelight and many other CDN companies, is to bring the ISPs home by building large clusters at a smaller number (for example, tens) of sites. Instead of getting inside the access ISPs, these CDNs typically place their clusters in Internet Exchange Points (IXPs) (see Section 1.3). Compared with the enter-deep design philosophy, the bring-home design typically results in lower maintenance and management overhead, possibly at the expense of higher delay and lower throughput to end users.

Once its clusters are in place, the CDN replicates content across its clusters.
The CDN may not want to place a copy of every video in each cluster, since some videos are rarely viewed or are only popular in some countries. In fact, many CDNs do not push videos to their clusters but instead use a simple pull strategy: If a client requests a video from a cluster that is not storing the video, then the cluster retrieves the video (from a central repository or from another cluster) and stores a copy locally while streaming the video to the client at the same time. Similar to Web caching (see Section 2.2.5), when a cluster's storage becomes full, it removes videos that are not frequently requested.

CDN Operation

Having identified the two major approaches toward deploying a CDN, let's now dive down into the nuts and bolts of how a CDN operates.

CASE STUDY

GOOGLE'S NETWORK INFRASTRUCTURE

To support its vast array of cloud services---including search, Gmail, calendar, YouTube video, maps, documents, and social networks---Google has deployed an extensive private network and CDN infrastructure. Google's CDN infrastructure has three tiers of server clusters:

Fourteen "mega data centers," with eight in North America, four in Europe, and two in Asia \[Google Locations 2016\], with each data center having on the order of 100,000 servers. These mega data centers are responsible for serving dynamic (and often personalized) content, including search results and Gmail messages.

An estimated 50 clusters in IXPs scattered throughout the world, with each cluster consisting of on the order of 100--500 servers \[Adhikari 2011a\]. These clusters are responsible for serving static content, including YouTube videos \[Adhikari 2011a\].

Many hundreds of "enter-deep" clusters located within an access ISP. Here a cluster typically consists of tens of servers within a single rack. These enter-deep servers perform TCP splitting (see Section 3.7) and serve static content \[Chen 2011\], including the static portions of Web pages that embody search results.

All of these data centers and cluster locations are networked together with Google's own private network. When a user makes a search query, often the query is first sent over the local ISP to a nearby enter-deep cache, from where the static content is retrieved; while providing the static content to the client, the nearby cache also forwards the query over Google's private network to one of the mega data centers, from where the personalized search results are retrieved. For a YouTube video, the video itself may come from one of the bring-home caches, whereas portions of the Web page surrounding the video may come from the nearby enter-deep cache, and the advertisements surrounding the video come from the data centers. In summary, except for the local ISPs, the Google cloud services are largely provided by a network infrastructure that is independent of the public Internet.

When a browser in a user's host is instructed to retrieve a specific video (identified by a URL), the CDN must intercept the request so that it can (1) determine a suitable CDN server cluster for that client at that time, and (2) redirect the client's request to a server in that cluster. We'll shortly discuss how a CDN can determine a suitable cluster. But first let's examine the mechanics behind intercepting and redirecting a request. Most CDNs take advantage of DNS to intercept and redirect requests; an interesting discussion of such a use of the DNS is \[Vixie 2009\].
Let's consider a simple example to illustrate how the DNS is typically involved. Suppose a content provider, NetCinema, employs the third-party CDN company, KingCDN, to distribute its videos to its customers. On the NetCinema Web pages, each of its videos is assigned a URL that includes the string "video" and a unique identifier for the video itself; for example, Transformers 7 might be assigned http://video.netcinema.com/6Y7B23V. Six steps then occur, as shown in Figure 2.25:

1. The user visits the Web page at NetCinema.
2. When the user clicks on the link http://video.netcinema.com/6Y7B23V, the user's host sends a DNS query for video.netcinema.com.
3. The user's Local DNS Server (LDNS) relays the DNS query to an authoritative DNS server for NetCinema, which observes the string "video" in the hostname video.netcinema.com. To "hand over" the DNS query to KingCDN, instead of returning an IP address, the NetCinema authoritative DNS server returns to the LDNS a hostname in the KingCDN's domain, for example, a1105.kingcdn.com.
4. From this point on, the DNS query enters into KingCDN's private DNS infrastructure. The user's LDNS then sends a second query, now for a1105.kingcdn.com, and KingCDN's DNS system eventually returns the IP addresses of a KingCDN content server to the LDNS. It is thus here, within the KingCDN's DNS system, that the CDN server from which the client will receive its content is specified.
5. The LDNS forwards the IP address of the content-serving CDN node to the user's host.
6. Once the client receives the IP address for a KingCDN content server, it establishes a direct TCP connection with the server at that IP address and issues an HTTP GET request for the video. If DASH is used, the server will first send to the client a manifest file with a list of URLs, one for each version of the video, and the client will dynamically select chunks from the different versions.

Figure 2.25 DNS redirects a user's request to a CDN server

Cluster Selection Strategies

At the core of any CDN deployment is a cluster selection strategy, that is, a mechanism for dynamically directing clients to a server cluster or a data center within the CDN. As we just saw, the CDN learns the IP address of the client's LDNS server via the client's DNS lookup. After learning this IP address, the CDN needs to select an appropriate cluster based on this IP address. CDNs generally employ proprietary cluster selection strategies. We now briefly survey a few approaches, each of which has its own advantages and disadvantages.

One simple strategy is to assign the client to the cluster that is geographically closest. Using commercial geo-location databases (such as Quova \[Quova 2016\] and MaxMind \[MaxMind 2016\]), each LDNS IP address is mapped to a geographic location. When a DNS request is received from a particular LDNS, the CDN chooses the geographically closest cluster, that is, the cluster that is the fewest kilometers from the LDNS "as the bird flies." Such a solution can work reasonably well for a large fraction of the clients \[Agarwal 2009\]. However, for some clients, the solution may perform poorly, since the geographically closest cluster may not be the closest cluster in terms of the length or number of hops of the network path.
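A minimal sketch of this geographically-closest mapping is shown below; the coordinate table stands in for a commercial geo-location database, and the cluster locations are hypothetical.

```python
# Sketch of geo-closest cluster selection: map the LDNS IP to
# coordinates (via a geo-location database, faked here as a literal
# location) and pick the cluster at the smallest great-circle distance.
import math

CLUSTERS = {"nyc": (40.7, -74.0), "lon": (51.5, -0.1), "syd": (-33.9, 151.2)}

def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points (haversine)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def closest_cluster(ldns_location):
    return min(CLUSTERS, key=lambda c: distance_km(CLUSTERS[c], ldns_location))

print(closest_cluster((48.9, 2.4)))   # an LDNS near Paris maps to "lon"
```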
Furthermore, a problem inherent with all DNS-based approaches is that some end users are configured to use remotely located LDNSs \[Shaikh 2001; Mao 2002\], in which case the LDNS location may be far from the client's location. Moreover, this simple strategy ignores the variation in delay and available bandwidth over time of Internet paths, always assigning the same cluster to a particular client. In order to determine the best cluster for a client based on the current traffic conditions, CDNs can instead perform periodic real-time measurements of delay and loss performance between their clusters and clients. For instance, a CDN can have each of its clusters periodically send probes (for example, ping messages or DNS queries) to all of the LDNSs around the world. One drawback of this approach is that many LDNSs are configured to not respond to such probes.

2.6.4 Case Studies: Netflix, YouTube, and Kankan

We conclude our discussion of streaming stored video by taking a look at three highly successful large-scale deployments: Netflix, YouTube, and Kankan. We'll see that each of these systems takes a very different approach, yet employs many of the underlying principles discussed in this section.

Netflix

Generating 37% of the downstream traffic in residential ISPs in North America in 2015, Netflix has become the leading service provider for online movies and TV series in the United States \[Sandvine 2015\]. As we discuss below, Netflix video distribution has two major components: the Amazon cloud and its own private CDN infrastructure. Netflix has a Web site that handles numerous functions, including user registration and login, billing, movie catalogue for browsing and searching, and a movie recommendation system. As shown in Figure 2.26, this Web site (and its associated backend databases) run entirely on Amazon servers in the Amazon cloud. Additionally, the Amazon cloud handles the following critical functions:

Content ingestion. Before Netflix can distribute a movie to its customers, it must first ingest and process the movie. Netflix receives studio master versions of movies and uploads them to hosts in the Amazon cloud.

Content processing. The machines in the Amazon cloud create many different formats for each movie, suitable for a diverse array of client video players running on desktop computers, smartphones, and game consoles connected to televisions. A different version is created for each of these formats and at multiple bit rates, allowing for adaptive streaming over HTTP using DASH.

Uploading versions to its CDN. Once all of the versions of a movie have been created, the hosts in the Amazon cloud upload the versions to its CDN.

Figure 2.26 Netflix video streaming platform

When Netflix first rolled out its video streaming service in 2007, it employed three third-party CDN companies to distribute its video content. Netflix has since created its own private CDN, from which it now streams all of its videos. (Netflix still uses Akamai to distribute its Web pages, however.) To create its own CDN, Netflix has installed server racks both in IXPs and within residential ISPs themselves. Netflix currently has server racks in over 50 IXP locations; see \[Netflix Open Connect 2016\] for a current list of IXPs housing Netflix racks.
There are also hundreds of ISP locations housing Netflix racks; also see \[Netflix Open Connect 2016\], where Netflix provides to potential ISP partners instructions about installing a (free) Netflix rack for their networks. Each server in the rack has several 10 Gbps Ethernet ports and over 100 terabytes of storage. The number of servers in a rack varies: IXP installations often have tens of servers and contain the entire Netflix streaming video library, including multiple versions of the videos to support DASH; local ISPs may have only one server and contain only the most popular videos. Netflix does not use pull-caching (Section 2.2.5) to populate its CDN servers in the IXPs and ISPs. Instead, Netflix distributes by pushing the videos to its CDN servers during off-peak hours. For those locations that cannot hold the entire library, Netflix pushes only the most popular videos, which are determined on a day-to-day basis. The Netflix CDN design is described in some detail in the YouTube videos \[Netflix Video 1\] and \[Netflix Video 2\].

Having described the components of the Netflix architecture, let's take a closer look at the interaction between the client and the various servers that are involved in movie delivery. As indicated earlier, the Web pages for browsing the Netflix video library are served from servers in the Amazon cloud. When a user selects a movie to play, the Netflix software, running in the Amazon cloud, first determines which of its CDN servers have copies of the movie. Among the servers that have the movie, the software then determines the "best" server for that client request. If the client is using a residential ISP that has a Netflix CDN server rack installed in that ISP, and this rack has a copy of the requested movie, then a server in this rack is typically selected. If not, a server at a nearby IXP is typically selected. Once Netflix determines the CDN server that is to deliver the content, it sends the client the IP address of the specific server as well as a manifest file, which has the URLs for the different versions of the requested movie. The client and that CDN server then directly interact using a proprietary version of DASH. Specifically, as described in Section 2.6.2, the client uses the byte-range header in HTTP GET request messages to request chunks from the different versions of the movie. Netflix uses chunks that are approximately four seconds long \[Adhikari 2012\]. While the chunks are being downloaded, the client measures the received throughput and runs a rate-determination algorithm to determine the quality of the next chunk to request.

Netflix embodies many of the key principles discussed earlier in this section, including adaptive streaming and CDN distribution. However, because Netflix uses its own private CDN, which distributes only video (and not Web pages), Netflix has been able to simplify and tailor its CDN design. In particular, Netflix does not need to employ DNS redirect, as discussed in Section 2.6.3, to connect a particular client to a CDN server; instead, the Netflix software (running in the Amazon cloud) directly tells the client to use a particular CDN server. Furthermore, the Netflix CDN uses push caching rather than pull caching (Section 2.2.5): content is pushed into the servers at scheduled times at off-peak hours, rather than dynamically during cache misses.
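The DASH-style client loop just described (measure throughput, pick a version, fetch a chunk by byte range) can be sketched in generic form. Everything below is hypothetical: the URLs, encoding rates, and thresholds are illustrative, and this is not Netflix's actual proprietary logic.

```python
# Generic sketch of one DASH client step: pick the version that fits
# the measured throughput, then fetch one chunk of it with an HTTP
# byte-range request. All names and numbers are hypothetical.
import urllib.request

BASE_URL = "http://cdn.example.com/"   # placeholder CDN server
VERSIONS = {300: "movie_300k.mp4", 1000: "movie_1m.mp4", 3000: "movie_3m.mp4"}

def choose_rate(measured_kbps, buffered_seconds, safety=0.8):
    """Highest encoding rate (kbps) that fits the bandwidth budget."""
    budget = measured_kbps * safety
    if buffered_seconds < 5:           # buffer nearly empty: back off harder
        budget = min(budget, measured_kbps * 0.5)
    fitting = [r for r in sorted(VERSIONS) if r <= budget]
    return fitting[-1] if fitting else min(VERSIONS)

def fetch_chunk(rate_kbps, first_byte, last_byte):
    """Request one chunk of the chosen version via the Range header."""
    req = urllib.request.Request(
        BASE_URL + VERSIONS[rate_kbps],
        headers={"Range": f"bytes={first_byte}-{last_byte}"})
    with urllib.request.urlopen(req) as resp:   # expect 206 Partial Content
        return resp.read()

rate = choose_rate(measured_kbps=2500, buffered_seconds=30)   # -> 1000
chunk = fetch_chunk(rate, 0, 999_999)
```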
YouTube

With 300 hours of video uploaded to YouTube every minute and several billion video views per day \[YouTube 2016\], YouTube is indisputably the world's largest video-sharing site. YouTube began its service in April 2005 and was acquired by Google in November 2006. Although the Google/YouTube design and protocols are proprietary, through several independent measurement efforts we can gain a basic understanding about how YouTube operates \[Zink 2009; Torres 2011; Adhikari 2011a\]. As with Netflix, YouTube makes extensive use of CDN technology to distribute its videos \[Torres 2011\]. Similar to Netflix, Google uses its own private CDN to distribute YouTube videos, and has installed server clusters in many hundreds of different IXP and ISP locations. From these locations and directly from its huge data centers, Google distributes YouTube videos \[Adhikari 2011a\]. Unlike Netflix, however, Google uses pull caching, as described in Section 2.2.5, and DNS redirect, as described in Section 2.6.3. Most of the time, Google's cluster-selection strategy directs the client to the cluster for which the RTT between client and cluster is the lowest; however, in order to balance the load across clusters, sometimes the client is directed (via DNS) to a more distant cluster \[Torres 2011\].

YouTube employs HTTP streaming, often making a small number of different versions available for a video, each with a different bit rate and corresponding quality level. YouTube does not employ adaptive streaming (such as DASH), but instead requires the user to manually select a version. In order to save bandwidth and server resources that would be wasted by repositioning or early termination, YouTube uses the HTTP byte range request to limit the flow of transmitted data after a target amount of video is prefetched. Several million videos are uploaded to YouTube every day. Not only are YouTube videos streamed from server to client over HTTP, but YouTube uploaders also upload their videos from client to server over HTTP. YouTube processes each video it receives, converting it to a YouTube video format and creating multiple versions at different bit rates. This processing takes place entirely within Google data centers. (See the case study on Google's network infrastructure in Section 2.6.3.)

Kankan

We just saw that dedicated servers, operated by private CDNs, stream Netflix and YouTube videos to clients. Netflix and YouTube have to pay not only for the server hardware but also for the bandwidth the servers use to distribute the videos. Given the scale of these services and the amount of bandwidth they are consuming, such a CDN deployment can be costly. We conclude this section by describing an entirely different approach for providing video on demand over the Internet at a large scale---one that allows the service provider to significantly reduce its infrastructure and bandwidth costs. As you might suspect, this approach uses P2P delivery instead of (or along with) client-server delivery. Since 2011, Kankan (owned and operated by Xunlei) has been deploying P2P video delivery with great success, with tens of millions of users every month \[Zhang 2015\]. At a high level, P2P video streaming is very similar to BitTorrent file downloading. When a peer wants to see a video, it contacts a tracker to discover other peers in the system that have a copy of that video.
This requesting peer then requests chunks of the video in parallel from the other peers that have the video. Different from downloading with BitTorrent, however, requests are preferentially made for chunks that are to be played back in the near future, in order to ensure continuous playback \[Dhungel 2012\]. Recently, Kankan has migrated to a hybrid CDN-P2P streaming system \[Zhang 2015\]. Specifically, Kankan now deploys a few hundred servers within China and pushes video content to these servers. This Kankan CDN plays a major role in the start-up stage of video streaming. In most cases, the client requests the beginning of the content from CDN servers, and in parallel requests content from peers. When the total P2P traffic is sufficient for video playback, the client will cease streaming from the CDN and only stream from peers. But if the P2P streaming traffic becomes insufficient, the client will restart CDN connections and return to the mode of hybrid CDN-P2P streaming. In this manner, Kankan can ensure short initial start-up delays while minimally relying on costly infrastructure servers and bandwidth.

2.7 Socket Programming: Creating Network Applications

Now that we've looked at a number of important network applications, let's explore how network application programs are actually created. Recall from Section 2.1 that a typical network application consists of a pair of programs---a client program and a server program---residing in two different end systems. When these two programs are executed, a client process and a server process are created, and these processes communicate with each other by reading from, and writing to, sockets. When creating a network application, the developer's main task is therefore to write the code for both the client and server programs.

There are two types of network applications. One type is an implementation whose operation is specified in a protocol standard, such as an RFC or some other standards document; such an application is sometimes referred to as "open," since the rules specifying its operation are known to all. For such an implementation, the client and server programs must conform to the rules dictated by the RFC. For example, the client program could be an implementation of the client side of the HTTP protocol, described in Section 2.2 and precisely defined in RFC 2616; similarly, the server program could be an implementation of the HTTP server protocol, also precisely defined in RFC 2616. If one developer writes code for the client program and another developer writes code for the server program, and both developers carefully follow the rules of the RFC, then the two programs will be able to interoperate. Indeed, many of today's network applications involve communication between client and server programs that have been created by independent developers---for example, a Google Chrome browser communicating with an Apache Web server, or a BitTorrent client communicating with a BitTorrent tracker. The other type of network application is a proprietary network application. In this case the client and server programs employ an application-layer protocol that has not been openly published in an RFC or elsewhere. A single developer (or development team) creates both the client and server programs, and the developer has complete control over what goes in the code.
But because the code does not implement an open protocol, other independent developers will not be able to develop code that interoperates with the application. In this section, we'll examine the key issues in developing a client-server application, and we'll "get our hands dirty" by looking at code that implements a very simple client-server application. During the development phase, one of the first decisions the developer must make is whether the application is to run over TCP or over UDP. Recall that TCP is connection oriented and provides a reliable byte-stream channel through which data flows between two end systems. UDP is connectionless and sends independent packets of data from one end system to the other, without any guarantees about delivery. Recall also that when a client or server program implements a protocol defined by an RFC, it should use the well-known port number associated with the protocol; conversely, when developing a proprietary application, the developer must be careful to avoid using such well-known port numbers. (Port numbers were briefly discussed in Section 2.1. They are covered in more detail in Chapter 3.)

We introduce UDP and TCP socket programming by way of a simple UDP application and a simple TCP application. We present the simple UDP and TCP applications in Python 3. We could have written the code in Java, C, or C++, but we chose Python mostly because Python clearly exposes the key socket concepts. With Python there are fewer lines of code, and each line can be explained to the novice programmer without difficulty. But there's no need to be frightened if you are not familiar with Python. You should be able to easily follow the code if you have experience programming in Java, C, or C++. If you are interested in client-server programming with Java, you are encouraged to see the Companion Website for this textbook; in fact, you can find there all the examples in this section (and associated labs) in Java. For readers who are interested in client-server programming in C, there are several good references available \[Donahoo 2001; Stevens 1997; Frost 1994; Kurose 1996\]; our Python examples below have a similar look and feel to C.

2.7.1 Socket Programming with UDP

In this subsection, we'll write simple client-server programs that use UDP; in the following section, we'll write similar programs that use TCP. Recall from Section 2.1 that processes running on different machines communicate with each other by sending messages into sockets. We said that each process is analogous to a house and the process's socket is analogous to a door. The application resides on one side of the door in the house; the transport-layer protocol resides on the other side of the door in the outside world. The application developer has control of everything on the application-layer side of the socket but has little control of the transport-layer side.

Now let's take a closer look at the interaction between two communicating processes that use UDP sockets. When using UDP, before the sending process can push a packet of data out the socket door, it must first attach a destination address to the packet. After the packet passes through the sender's socket, the Internet will use this destination address to route the packet through the Internet to the socket in the receiving process.
When the packet arrives at the receiving socket, the receiving process will retrieve the packet through the socket, and then inspect the packet's contents and take appropriate action. So you may now be wondering, what goes into the destination address that is attached to the packet?

As you might expect, the destination host's IP address is part of the destination address. By including the destination IP address in the packet, the routers in the Internet will be able to route the packet through the Internet to the destination host. But because a host may be running many network application processes, each with one or more sockets, it is also necessary to identify the particular socket in the destination host. When a socket is created, an identifier, called a port number, is assigned to it. So, as you might expect, the packet's destination address also includes the socket's port number. In summary, the sending process attaches to the packet a destination address, which consists of the destination host's IP address and the destination socket's port number. Moreover, as we shall soon see, the sender's source address---consisting of the IP address of the source host and the port number of the source socket---is also attached to the packet. However, attaching the source address to the packet is typically not done by the UDP application code; instead it is automatically done by the underlying operating system.

We'll use the following simple client-server application to demonstrate socket programming for both UDP and TCP:

1. The client reads a line of characters (data) from its keyboard and sends the data to the server.
2. The server receives the data and converts the characters to uppercase.
3. The server sends the modified data to the client.
4. The client receives the modified data and displays the line on its screen.

Figure 2.27 highlights the main socket-related activity of the client and server that communicate over the UDP transport service.

Figure 2.27 The client-server application using UDP

Now let's get our hands dirty and take a look at the client-server program pair for a UDP implementation of this simple application. We also provide a detailed, line-by-line analysis after each program. We'll begin with the UDP client, which will send a simple application-level message to the server. In order for the server to be able to receive and reply to the client's message, it must be ready and running---that is, it must be running as a process before the client sends its message. The client program is called UDPClient.py, and the server program is called UDPServer.py. In order to emphasize the key issues, we intentionally provide code that is minimal. "Good code" would certainly have a few more auxiliary lines, in particular for handling error cases. For this application, we have arbitrarily chosen 12000 for the server port number.

UDPClient.py

Here is the code for the client side of the application:

```
from socket import *
serverName = 'hostname'
serverPort = 12000
clientSocket = socket(AF_INET, SOCK_DGRAM)
message = input('Input lowercase sentence:')
clientSocket.sendto(message.encode(), (serverName, serverPort))
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
print(modifiedMessage.decode())
clientSocket.close()
```

Now let's take a look at the various lines of code in UDPClient.py.

```
from socket import *
```

The socket module forms the basis of all network communications in Python.
By including this line, we will be able to create sockets within our program.

```
serverName = 'hostname'
serverPort = 12000
```

The first line sets the variable serverName to the string 'hostname'. Here, we provide a string containing either the IP address of the server (e.g., "128.138.32.126") or the hostname of the server (e.g., "cis.poly.edu"). If we use the hostname, then a DNS lookup will automatically be performed to get the IP address. The second line sets the integer variable serverPort to 12000.

```
clientSocket = socket(AF_INET, SOCK_DGRAM)
```

This line creates the client's socket, called clientSocket. The first parameter indicates the address family; in particular, AF_INET indicates that the underlying network is using IPv4. (Do not worry about this now---we will discuss IPv4 in Chapter 4.) The second parameter indicates that the socket is of type SOCK_DGRAM, which means it is a UDP socket (rather than a TCP socket). Note that we are not specifying the port number of the client socket when we create it; we are instead letting the operating system do this for us. Now that the client process's door has been created, we will want to create a message to send through the door.

```
message = input('Input lowercase sentence:')
```

input() is a built-in function in Python. When this command is executed, the user at the client is prompted with the words "Input lowercase sentence:" The user then uses her keyboard to input a line, which is put into the variable message. Now that we have a socket and a message, we will want to send the message through the socket to the destination host.

```
clientSocket.sendto(message.encode(), (serverName, serverPort))
```

In the above line, we first convert the message from string type to byte type, as we need to send bytes into a socket; this is done with the encode() method. The method sendto() attaches the destination address (serverName, serverPort) to the message and sends the resulting packet into the process's socket, clientSocket. (As mentioned earlier, the source address is also attached to the packet, although this is done automatically rather than explicitly by the code.) Sending a client-to-server message via a UDP socket is that simple! After sending the packet, the client waits to receive data from the server.

```
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
```

With the above line, when a packet arrives from the Internet at the client's socket, the packet's data is put into the variable modifiedMessage and the packet's source address is put into the variable serverAddress. The variable serverAddress contains both the server's IP address and the server's port number. The program UDPClient doesn't actually need this server address information, since it already knows the server address from the outset; but this line of Python provides the server address nevertheless. The method recvfrom also takes the buffer size 2048 as input. (This buffer size works for most purposes.)

```
print(modifiedMessage.decode())
```

This line prints out modifiedMessage on the user's display, after converting the message from bytes to string. It should be the original line that the user typed, but now capitalized.

```
clientSocket.close()
```

This line closes the socket. The process then terminates.
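As noted above, "good code" would handle error cases. In particular, UDP gives no delivery guarantee, so a lost packet would leave this client blocked in recvfrom forever. Below is a minimal sketch of the same client hardened with a receive timeout, using the standard settimeout() socket method; the one-second value is an arbitrary choice for this sketch.

```
# Sketch: UDPClient with a receive timeout, so a lost packet does not
# block the client forever. The one-second timeout is an arbitrary choice.
from socket import *

serverName = 'hostname'  # as before, the server's hostname or IP address
serverPort = 12000
clientSocket = socket(AF_INET, SOCK_DGRAM)
clientSocket.settimeout(1.0)  # raise socket.timeout if no reply within 1 second
message = input('Input lowercase sentence:')
clientSocket.sendto(message.encode(), (serverName, serverPort))
try:
    modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
    print(modifiedMessage.decode())
except timeout:
    print('No reply from server; the request or the reply may have been lost')
finally:
    clientSocket.close()
```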
UDPServer.py

Let's now take a look at the server side of the application:

```
from socket import *
serverPort = 12000
serverSocket = socket(AF_INET, SOCK_DGRAM)
serverSocket.bind(('', serverPort))
print("The server is ready to receive")
while True:
    message, clientAddress = serverSocket.recvfrom(2048)
    modifiedMessage = message.decode().upper()
    serverSocket.sendto(modifiedMessage.encode(), clientAddress)
```

Note that the beginning of UDPServer is similar to UDPClient. It also imports the socket module, also sets the integer variable serverPort to 12000, and also creates a socket of type SOCK_DGRAM (a UDP socket). The first line of code that is significantly different from UDPClient is:

```
serverSocket.bind(('', serverPort))
```

The above line binds (that is, assigns) the port number 12000 to the server's socket. Thus in UDPServer, the code (written by the application developer) is explicitly assigning a port number to the socket. In this manner, when anyone sends a packet to port 12000 at the IP address of the server, that packet will be directed to this socket. UDPServer then enters a while loop; the while loop will allow UDPServer to receive and process packets from clients indefinitely. In the while loop, UDPServer waits for a packet to arrive.

```
message, clientAddress = serverSocket.recvfrom(2048)
```

This line of code is similar to what we saw in UDPClient. When a packet arrives at the server's socket, the packet's data is put into the variable message and the packet's source address is put into the variable clientAddress. The variable clientAddress contains both the client's IP address and the client's port number. Here, UDPServer will make use of this address information, as it provides a return address, similar to the return address with ordinary postal mail. With this source address information, the server now knows where it should direct its reply.

```
modifiedMessage = message.decode().upper()
```

This line is the heart of our simple application. It takes the line sent by the client and, after converting the message to a string, uses the method upper() to capitalize it.

```
serverSocket.sendto(modifiedMessage.encode(), clientAddress)
```

This last line attaches the client's address (IP address and port number) to the capitalized message (after converting the string to bytes), and sends the resulting packet into the server's socket. (As mentioned earlier, the server address is also attached to the packet, although this is done automatically rather than explicitly by the code.) The Internet will then deliver the packet to this client address. After the server sends the packet, it remains in the while loop, waiting for another UDP packet to arrive (from any client running on any host).

To test the pair of programs, you run UDPClient.py on one host and UDPServer.py on another host. Be sure to include the proper hostname or IP address of the server in UDPClient.py. Next, you execute UDPServer.py on the server host. This creates a process in the server that idles until it is contacted by some client. Then you execute UDPClient.py on the client host. This creates a process in the client. Finally, to use the application at the client, you type a sentence followed by a carriage return. To develop your own UDP client-server application, you can begin by slightly modifying the client or server programs. For example, instead of converting all the letters to uppercase, the server could count the number of times the letter s appears and return this number, as in the sketch below.
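Here is a minimal sketch of that letter-counting variation. Only the body of the while loop changes relative to UDPServer.py, and the unmodified UDPClient.py can be used with it as is.

```
# Sketch: a variation of UDPServer.py that replies with the number of
# times the letter 's' appears in the client's message, instead of
# capitalizing the message.
from socket import *

serverPort = 12000
serverSocket = socket(AF_INET, SOCK_DGRAM)
serverSocket.bind(('', serverPort))
print("The server is ready to receive")
while True:
    message, clientAddress = serverSocket.recvfrom(2048)
    count = message.decode().count('s')  # count occurrences of 's'
    serverSocket.sendto(str(count).encode(), clientAddress)
```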
Or you can modify the client so that, after receiving a capitalized sentence, the user can continue to send more sentences to the server.

2.7.2 Socket Programming with TCP

Unlike UDP, TCP is a connection-oriented protocol. This means that before the client and server can start to send data to each other, they first need to handshake and establish a TCP connection. One end of the TCP connection is attached to the client socket and the other end is attached to a server socket. When creating the TCP connection, we associate with it the client socket address (IP address and port number) and the server socket address (IP address and port number). With the TCP connection established, when one side wants to send data to the other side, it just drops the data into the TCP connection via its socket. This is different from UDP, for which the sender must attach a destination address to the packet before dropping it into the socket.

Now let's take a closer look at the interaction of client and server programs in TCP. The client has the job of initiating contact with the server. In order for the server to be able to react to the client's initial contact, the server has to be ready. This implies two things. First, as in the case of UDP, the TCP server must be running as a process before the client attempts to initiate contact. Second, the server program must have a special door---more precisely, a special socket---that welcomes some initial contact from a client process running on an arbitrary host. Using our house/door analogy for a process/socket, we will sometimes refer to the client's initial contact as "knocking on the welcoming door."

With the server process running, the client process can initiate a TCP connection to the server. This is done in the client program by creating a TCP socket. When the client creates its TCP socket, it specifies the address of the welcoming socket in the server, namely, the IP address of the server host and the port number of the socket. After creating its socket, the client initiates a three-way handshake and establishes a TCP connection with the server. The three-way handshake, which takes place within the transport layer, is completely invisible to the client and server programs.

During the three-way handshake, the client process knocks on the welcoming door of the server process. When the server "hears" the knocking, it creates a new door---more precisely, a new socket that is dedicated to that particular client. In our example below, the welcoming door is a TCP socket object that we call serverSocket; the newly created socket dedicated to the client making the connection is called connectionSocket. Students who are encountering TCP sockets for the first time sometimes confuse the welcoming socket (which is the initial point of contact for all clients wanting to communicate with the server) with the server-side connection socket that is subsequently created for communicating with each client.

From the application's perspective, the client's socket and the server's connection socket are directly connected by a pipe. As shown in Figure 2.28, the client process can send arbitrary bytes into its socket, and TCP guarantees that the server process will receive (through the connection socket) each byte in the order sent.
TCP thus provides a reliable service between the client and server processes. Furthermore, just as people can go in and out the same door, the client process not only sends bytes into but also receives bytes from its socket; similarly, the server process not only receives bytes from but also sends bytes into its connection socket. We use the same simple client-server application to demonstrate socket programming with TCP: The client sends one line of data to the server; the server capitalizes the line and sends it back to the client. Figure 2.29 highlights the main socket-related activity of the client and server that communicate over the TCP transport service.

Figure 2.28 The TCPServer process has two sockets

TCPClient.py

Here is the code for the client side of the application:

```
from socket import *
serverName = 'servername'
serverPort = 12000
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName, serverPort))
sentence = input('Input lowercase sentence:')
clientSocket.send(sentence.encode())
modifiedSentence = clientSocket.recv(1024)
print('From Server: ', modifiedSentence.decode())
clientSocket.close()
```

Let's now take a look at the various lines in the code that differ significantly from the UDP implementation. The first such line is the creation of the client socket.

```
clientSocket = socket(AF_INET, SOCK_STREAM)
```

This line creates the client's socket, called clientSocket. The first parameter again indicates that the underlying network is using IPv4. The second parameter indicates that the socket is of type SOCK_STREAM, which means it is a TCP socket (rather than a UDP socket). Note that we are again not specifying the port number of the client socket when we create it; we are instead letting the operating system do this for us.

Figure 2.29 The client-server application using TCP

Now the next line of code is very different from what we saw in UDPClient:

```
clientSocket.connect((serverName, serverPort))
```

Recall that before the client can send data to the server (or vice versa) using a TCP socket, a TCP connection must first be established between the client and server. The above line initiates the TCP connection between the client and server. The parameter of the connect() method is the address of the server side of the connection. After this line of code is executed, the three-way handshake is performed and a TCP connection is established between the client and server.

```
sentence = input('Input lowercase sentence:')
```

As with UDPClient, the above obtains a sentence from the user. The string sentence continues to gather characters until the user ends the line by typing a carriage return. The next line of code is also very different from UDPClient:

```
clientSocket.send(sentence.encode())
```

The above line sends the sentence through the client's socket and into the TCP connection. Note that the program does not explicitly create a packet and attach the destination address to the packet, as was the case with UDP sockets. Instead the client program simply drops the bytes in the string sentence into the TCP connection. The client then waits to receive bytes from the server.

```
modifiedSentence = clientSocket.recv(1024)
```

When characters arrive from the server, they get placed into the string modifiedSentence. Characters continue to accumulate in modifiedSentence until the line ends with a carriage return character.
After printing the capitalized sentence, we close the client's socket:

```
clientSocket.close()
```

This last line closes the socket and, hence, closes the TCP connection between the client and the server. It causes TCP in the client to send a TCP message to TCP in the server (see Section 3.5).

TCPServer.py

Now let's take a look at the server program.

```
from socket import *
serverPort = 12000
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', serverPort))
serverSocket.listen(1)
print('The server is ready to receive')
while True:
    connectionSocket, addr = serverSocket.accept()
    sentence = connectionSocket.recv(1024).decode()
    capitalizedSentence = sentence.upper()
    connectionSocket.send(capitalizedSentence.encode())
    connectionSocket.close()
```

Let's now take a look at the lines that differ significantly from UDPServer and TCPClient. As with TCPClient, the server creates a TCP socket with:

```
serverSocket = socket(AF_INET, SOCK_STREAM)
```

Similar to UDPServer, we associate the server port number, serverPort, with this socket:

```
serverSocket.bind(('', serverPort))
```

But with TCP, serverSocket will be our welcoming socket. After establishing this welcoming door, we will wait and listen for some client to knock on the door:

```
serverSocket.listen(1)
```

This line has the server listen for TCP connection requests from the client. The parameter specifies the maximum number of queued connections (at least 1).

```
connectionSocket, addr = serverSocket.accept()
```

When a client knocks on this door, the program invokes the accept() method for serverSocket, which creates a new socket in the server, called connectionSocket, dedicated to this particular client. The client and server then complete the handshaking, creating a TCP connection between the client's clientSocket and the server's connectionSocket. With the TCP connection established, the client and server can now send bytes to each other over the connection. With TCP, all bytes sent from one side are not only guaranteed to arrive at the other side but also guaranteed to arrive in order.

```
connectionSocket.close()
```

In this program, after sending the modified sentence to the client, we close the connection socket. But since serverSocket remains open, another client can now knock on the door and send the server a sentence to modify. This completes our discussion of socket programming in TCP. You are encouraged to run the two programs on two separate hosts, and also to modify them to achieve slightly different goals. You should compare the UDP program pair with the TCP program pair and see how they differ. You should also do many of the socket programming assignments described at the ends of Chapters 2, 4, and 9. Finally, we hope someday, after mastering these and more advanced socket programs, you will write your own popular network application, become very rich and famous, and remember the authors of this textbook!
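One final note on this section: TCPServer as written serves clients one at a time; a second client can connect, but its sentence is not processed until the first connection socket is closed. A common extension, and the idea behind the multi-threaded Web proxy assignment later in this chapter, is to hand each connection socket to its own thread. The sketch below is one minimal way to do this with Python's standard threading module; it is an illustration, not the textbook's program.

```
# Sketch: a version of TCPServer.py that serves each client in its own
# thread, so one slow client does not block the welcoming socket.
from socket import *
from threading import Thread

def handle(connectionSocket):
    """Capitalize one sentence for one client, then close the socket."""
    sentence = connectionSocket.recv(1024).decode()
    connectionSocket.send(sentence.upper().encode())
    connectionSocket.close()

serverPort = 12000
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', serverPort))
serverSocket.listen(5)  # allow a small queue of pending connection requests
print('The server is ready to receive')
while True:
    connectionSocket, addr = serverSocket.accept()
    Thread(target=handle, args=(connectionSocket,)).start()
```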
2.8 Summary

In this chapter, we've studied the conceptual and the implementation aspects of network applications. We've learned about the ubiquitous client-server architecture adopted by many Internet applications and seen its use in the HTTP, SMTP, POP3, and DNS protocols. We've studied these important application-level protocols, and their corresponding associated applications (the Web, file transfer, e-mail, and DNS), in some detail. We've learned about the P2P architecture and how it is used in many applications. We've also learned about streaming video, and how modern video distribution systems leverage CDNs. We've examined how the socket API can be used to build network applications. We've walked through the use of sockets for connection-oriented (TCP) and connectionless (UDP) end-to-end transport services. The first step in our journey down the layered network architecture is now complete!

At the very beginning of this book, in Section 1.1, we gave a rather vague, bare-bones definition of a protocol: "the format and the order of messages exchanged between two or more communicating entities, as well as the actions taken on the transmission and/or receipt of a message or other event." The material in this chapter, and in particular our detailed study of the HTTP, SMTP, POP3, and DNS protocols, has now added considerable substance to this definition. Protocols are a key concept in networking; our study of application protocols has now given us the opportunity to develop a more intuitive feel for what protocols are all about.

In Section 2.1, we described the service models that TCP and UDP offer to applications that invoke them. We took an even closer look at these service models when we developed simple applications that run over TCP and UDP in Section 2.7. However, we have said little about how TCP and UDP provide these service models. For example, we know that TCP provides a reliable data service, but we haven't said yet how it does so. In the next chapter we'll take a careful look at not only the what, but also the how and why of transport protocols. Equipped with knowledge about Internet application structure and application-level protocols, we're now ready to head further down the protocol stack and examine the transport layer in Chapter 3.

Homework Problems and Questions

Chapter 2 Review Questions

SECTION 2.1

R1. List five nonproprietary Internet applications and the application-layer protocols that they use.

R2. What is the difference between network architecture and application architecture?

R3. For a communication session between a pair of processes, which process is the client and which is the server?

R4. For a P2P file-sharing application, do you agree with the statement, "There is no notion of client and server sides of a communication session"? Why or why not?

R5. What information is used by a process running on one host to identify a process running on another host?

R6. Suppose you wanted to do a transaction from a remote client to a server as fast as possible. Would you use UDP or TCP? Why?

R7. Referring to Figure 2.4, we see that none of the applications listed in Figure 2.4 requires both no data loss and timing. Can you conceive of an application that requires no data loss and that is also highly time-sensitive?

R8. List the four broad classes of services that a transport protocol can provide. For each of the service classes, indicate if either UDP or TCP (or both) provides such a service.

R9. Recall that TCP can be enhanced with SSL to provide process-to-process security services, including encryption. Does SSL operate at the transport layer or the application layer? If the application developer wants TCP to be enhanced with SSL, what does the developer have to do?

SECTION 2.2--2.5

R10. What is meant by a handshaking protocol?

R11. Why do HTTP, SMTP, and POP3 run on top of TCP rather than on UDP?
R12. Consider an e-commerce site that wants to keep a purchase record for each of its customers. Describe how this can be done with cookies.

R13. Describe how Web caching can reduce the delay in receiving a requested object. Will Web caching reduce the delay for all objects requested by a user or for only some of the objects? Why?

R14. Telnet into a Web server and send a multiline request message. Include in the request message the If-modified-since: header line to force a response message with the 304 Not Modified status code.

R15. List several popular messaging apps. Do they use the same protocols as SMS?

R16. Suppose Alice, with a Web-based e-mail account (such as Hotmail or Gmail), sends a message to Bob, who accesses his mail from his mail server using POP3. Discuss how the message gets from Alice's host to Bob's host. Be sure to list the series of application-layer protocols that are used to move the message between the two hosts.

R17. Print out the header of an e-mail message you have recently received. How many Received: header lines are there? Analyze each of the header lines in the message.

R18. From a user's perspective, what is the difference between the download-and-delete mode and the download-and-keep mode in POP3?

R19. Is it possible for an organization's Web server and mail server to have exactly the same alias for a hostname (for example, foo.com)? What would be the type for the RR that contains the hostname of the mail server?

R20. Look over your received e-mails, and examine the header of a message sent from a user with a .edu e-mail address. Is it possible to determine from the header the IP address of the host from which the message was sent? Do the same for a message sent from a Gmail account.

SECTION 2.5

R21. In BitTorrent, suppose Alice provides chunks to Bob throughout a 30-second interval. Will Bob necessarily return the favor and provide chunks to Alice in this same interval? Why or why not?

R22. Consider a new peer Alice that joins BitTorrent without possessing any chunks. Without any chunks, she cannot become a top-four uploader for any of the other peers, since she has nothing to upload. How then will Alice get her first chunk?

R23. What is an overlay network? Does it include routers? What are the edges in the overlay network?

SECTION 2.6

R24. CDNs typically adopt one of two different server placement philosophies. Name and briefly describe them.

R25. Besides network-related considerations such as delay, loss, and bandwidth performance, there are other important factors that go into designing a CDN server selection strategy. What are they?

SECTION 2.7

R26. In Section 2.7, the UDP server described needed only one socket, whereas the TCP server needed two sockets. Why? If the TCP server were to support n simultaneous connections, each from a different client host, how many sockets would the TCP server need?

R27. For the client-server application over TCP described in Section 2.7, why must the server program be executed before the client program? For the client-server application over UDP, why may the client program be executed before the server program?

Problems

P1. True or false?

a. A user requests a Web page that consists of some text and three images. For this page, the client will send one request message and receive four response messages.

b. Two distinct Web pages (for example, www.mit.edu/research.html and www.mit.edu/students.html) can be sent over the same persistent connection.
c. With nonpersistent connections between browser and origin server, it is possible for a single TCP segment to carry two distinct HTTP request messages.

d. The Date: header in the HTTP response message indicates when the object in the response was last modified.

e. HTTP response messages never have an empty message body.

P2. SMS, iMessage, and WhatsApp are all smartphone real-time messaging systems. After doing some research on the Internet, for each of these systems write one paragraph about the protocols they use. Then write a paragraph explaining how they differ.

P3. Consider an HTTP client that wants to retrieve a Web document at a given URL. The IP address of the HTTP server is initially unknown. What transport and application-layer protocols besides HTTP are needed in this scenario?

P4. Consider the following string of ASCII characters that were captured by Wireshark when the browser sent an HTTP GET message (i.e., this is the actual content of an HTTP GET message). The characters <cr><lf> are carriage return and line-feed characters (that is, the italicized character string <cr> in the text below represents the single carriage-return character that was contained at that point in the HTTP header). Answer the following questions, indicating where in the HTTP GET message below you find the answer.

GET /cs453/index.html HTTP/1.1<cr><lf>
Host: gaia.cs.umass.edu<cr><lf>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax)<cr><lf>
Accept: text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png, */*;q=0.5<cr><lf>
Accept-Language: en-us, en;q=0.5<cr><lf>
Accept-Encoding: zip, deflate<cr><lf>
Accept-Charset: ISO-8859-1, utf-8;q=0.7, *;q=0.7<cr><lf>
Keep-Alive: 300<cr><lf>
Connection: keep-alive<cr><lf><cr><lf>

a. What is the URL of the document requested by the browser?

b. What version of HTTP is the browser running?

c. Does the browser request a non-persistent or a persistent connection?

d. What is the IP address of the host on which the browser is running?

e. What type of browser initiates this message? Why is the browser type needed in an HTTP request message?

P5. The text below shows the reply sent from the server in response to the HTTP GET message in the question above. Answer the following questions, indicating where in the message below you find the answer.

HTTP/1.1 200 OK<cr><lf>
Date: Tue, 07 Mar 2008 12:39:45 GMT<cr><lf>
Server: Apache/2.0.52 (Fedora)<cr><lf>
Last-Modified: Sat, 10 Dec 2005 18:27:46 GMT<cr><lf>
ETag: "526c3-f22-a88a4c80"<cr><lf>
Accept-Ranges: bytes<cr><lf>
Content-Length: 3874<cr><lf>
Keep-Alive: timeout=max=100<cr><lf>
Connection: Keep-Alive<cr><lf>
Content-Type: text/html; charset=ISO-8859-1<cr><lf><cr><lf>
(the document data follows, containing the HTML for the "CMPSCI 453 / 591 / NTU-ST550A Spring 2005 homepage")

a. Was the server able to successfully find the document or not? What time was the document reply provided?

b. When was the document last modified?
c. How many bytes are there in the document being returned?

d. What are the first 5 bytes of the document being returned? Did the server agree to a persistent connection?

P6. Obtain the HTTP/1.1 specification (RFC 2616). Answer the following questions:

a. Explain the mechanism used for signaling between the client and server to indicate that a persistent connection is being closed. Can the client, the server, or both signal the close of a connection?

b. What encryption services are provided by HTTP?

c. Can a client open three or more simultaneous connections with a given server?

d. Either a server or a client may close a transport connection between them if either one detects the connection has been idle for some time. Is it possible that one side starts closing a connection while the other side is transmitting data via this connection? Explain.

P7. Suppose within your Web browser you click on a link to obtain a Web page. The IP address for the associated URL is not cached in your local host, so a DNS lookup is necessary to obtain the IP address. Suppose that n DNS servers are visited before your host receives the IP address from DNS; the successive visits incur an RTT of RTT1, . . ., RTTn. Further suppose that the Web page associated with the link contains exactly one object, consisting of a small amount of HTML text. Let RTT0 denote the RTT between the local host and the server containing the object. Assuming zero transmission time of the object, how much time elapses from when the client clicks on the link until the client receives the object?

P8. Referring to Problem P7, suppose the HTML file references eight very small objects on the same server. Neglecting transmission times, how much time elapses with

a. Non-persistent HTTP with no parallel TCP connections?

b. Non-persistent HTTP with the browser configured for 5 parallel connections?

c. Persistent HTTP?

P9. Consider Figure 2.12, for which there is an institutional network connected to the Internet. Suppose that the average object size is 850,000 bits and that the average request rate from the institution's browsers to the origin servers is 16 requests per second. Also suppose that the amount of time it takes from when the router on the Internet side of the access link forwards an HTTP request until it receives the response is three seconds on average (see Section 2.2.5). Model the total average response time as the sum of the average access delay (that is, the delay from Internet router to institution router) and the average Internet delay. For the average access delay, use Δ/(1−Δβ), where Δ is the average time required to send an object over the access link and β is the arrival rate of objects to the access link.

a. Find the total average response time.

b. Now suppose a cache is installed in the institutional LAN. Suppose the miss rate is 0.4. Find the total response time.

P10. Consider a short, 10-meter link, over which a sender can transmit at a rate of 150 bits/sec in both directions. Suppose that packets containing data are 100,000 bits long, and packets containing only control (e.g., ACK or handshaking) are 200 bits long. Assume that N parallel connections each get 1/N of the link bandwidth. Now consider the HTTP protocol, and suppose that each downloaded object is 100 Kbits long, and that the initial downloaded object contains 10 referenced objects from the same sender.
Would parallel downloads via parallel instances of non-persistent HTTP make sense in this case? Now consider persistent HTTP. Do you expect significant gains over the non-persistent case? Justify and explain your answer.

P11. Consider the scenario introduced in the previous problem. Now suppose that the link is shared by Bob with four other users. Bob uses parallel instances of non-persistent HTTP, and the other four users use non-persistent HTTP without parallel downloads.

a. Do Bob's parallel connections help him get Web pages more quickly? Why or why not?

b. If all five users open five parallel instances of non-persistent HTTP, then would Bob's parallel connections still be beneficial? Why or why not?

P12. Write a simple TCP program for a server that accepts lines of input from a client and prints the lines onto the server's standard output. (You can do this by modifying the TCPServer.py program in the text.) Compile and execute your program. On any other machine that contains a Web browser, set the proxy server in the browser to the host that is running your server program; also configure the port number appropriately. Your browser should now send its GET request messages to your server, and your server should display the messages on its standard output. Use this platform to determine whether your browser generates conditional GET messages for objects that are locally cached.

P13. What is the difference between MAIL FROM: in SMTP and From: in the mail message itself?

P14. How does SMTP mark the end of a message body? How about HTTP? Can HTTP use the same method as SMTP to mark the end of a message body? Explain.

P15. Read RFC 5321 for SMTP. What does MTA stand for? Consider the following received spam e-mail (modified from a real spam e-mail). Assuming only the originator of this spam e-mail is malicious and all other hosts are honest, identify the malicious host that has generated this spam e-mail.

From - Fri Nov 07 13:41:30 2008
Return-Path:
Received: from barmail.cs.umass.edu (barmail.cs.umass.edu \[128.119.240.3\]) by cs.umass.edu (8.13.1/8.12.6) for ; Fri, 7 Nov 2008 13:27:10 -0500
Received: from asusus-4b96 (localhost \[127.0.0.1\]) by barmail.cs.umass.edu (Spam Firewall) for ; Fri, 7 Nov 2008 13:27:07 -0500 (EST)
Received: from asusus-4b96 (\[58.88.21.177\]) by barmail.cs.umass.edu for ; Fri, 07 Nov 2008 13:27:07 -0500 (EST)
Received: from \[58.88.21.177\] by inbnd55.exchangeddd.com; Sat, 8 Nov 2008 01:27:07 +0700
From: "Jonny"
To:
Subject: How to secure your savings

P16. Read the POP3 RFC, RFC 1939. What is the purpose of the UIDL POP3 command?

P17. Consider accessing your e-mail with POP3.

a. Suppose you have configured your POP mail client to operate in the download-and-delete mode. Complete the following transaction:

C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: blah blah ...
S: ..........blah
S: .
?
?

b. Suppose you have configured your POP mail client to operate in the download-and-keep mode. Complete the following transaction:

C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: blah blah ...
S: ..........blah
S: .
?
?

c. Suppose you have configured your POP mail client to operate in the download-and-keep mode. Using your transcript in part (b), suppose you retrieve messages 1 and 2, exit POP, and then five minutes later you again access POP to retrieve new e-mail. Suppose that in the five-minute interval no new messages have been sent to you.
Provide a transcript of this second POP session.

P18.

a. What is a whois database?

b. Use various whois databases on the Internet to obtain the names of two DNS servers. Indicate which whois databases you used.

c. Use nslookup on your local host to send DNS queries to three DNS servers: your local DNS server and the two DNS servers you found in part (b). Try querying for Type A, NS, and MX reports. Summarize your findings.

d. Use nslookup to find a Web server that has multiple IP addresses. Does the Web server of your institution (school or company) have multiple IP addresses?

e. Use the ARIN whois database to determine the IP address range used by your university.

f. Describe how an attacker can use whois databases and the nslookup tool to perform reconnaissance on an institution before launching an attack.

g. Discuss why whois databases should be publicly available.

P19. In this problem, we use the useful dig tool available on Unix and Linux hosts to explore the hierarchy of DNS servers. Recall that in Figure 2.19, a DNS server in the DNS hierarchy delegates a DNS query to a DNS server lower in the hierarchy, by sending back to the DNS client the name of that lower-level DNS server. First read the man page for dig, and then answer the following questions.

a. Starting with a root DNS server (from one of the root servers \[a-m\].root-servers.net), initiate a sequence of queries for the IP address for your department's Web server by using dig. Show the list of the names of DNS servers in the delegation chain in answering your query.

b. Repeat part (a) for several popular Web sites, such as google.com, yahoo.com, or amazon.com.

P20. Suppose you can access the caches in the local DNS servers of your department. Can you propose a way to roughly determine the Web servers (outside your department) that are most popular among the users in your department? Explain.

P21. Suppose that your department has a local DNS server for all computers in the department. You are an ordinary user (i.e., not a network/system administrator). Can you determine if an external Web site was likely accessed from a computer in your department a couple of seconds ago? Explain.

P22. Consider distributing a file of F=15 Gbits to N peers. The server has an upload rate of us=30 Mbps, and each peer has a download rate of di=2 Mbps and an upload rate of u. For N=10, 100, and 1,000 and u=300 Kbps, 700 Kbps, and 2 Mbps, prepare a chart giving the minimum distribution time for each of the combinations of N and u for both client-server distribution and P2P distribution.

P23. Consider distributing a file of F bits to N peers using a client-server architecture. Assume a fluid model where the server can simultaneously transmit to multiple peers, transmitting to each peer at different rates, as long as the combined rate does not exceed us.

a. Suppose that us/N≤dmin. Specify a distribution scheme that has a distribution time of NF/us.

b. Suppose that us/N≥dmin. Specify a distribution scheme that has a distribution time of F/dmin.

c. Conclude that the minimum distribution time is in general given by max{NF/us, F/dmin}.

P24. Consider distributing a file of F bits to N peers using a P2P architecture. Assume a fluid model. For simplicity assume that dmin is very large, so that peer download bandwidth is never a bottleneck.
a. Suppose that us≤(us+u1+...+uN)/N. Specify a distribution scheme that has a distribution time of F/us.

b. Suppose that us≥(us+u1+...+uN)/N. Specify a distribution scheme that has a distribution time of NF/(us+u1+...+uN).

c. Conclude that the minimum distribution time is in general given by max{F/us, NF/(us+u1+...+uN)}.

P25. Consider an overlay network with N active peers, with each pair of peers having an active TCP connection. Additionally, suppose that the TCP connections pass through a total of M routers. How many nodes and edges are there in the corresponding overlay network?

P26. Suppose Bob joins a BitTorrent torrent, but he does not want to upload any data to any other peers (so-called free-riding).

a. Bob claims that he can receive a complete copy of the file that is shared by the swarm. Is Bob's claim possible? Why or why not?

b. Bob further claims that he can make his "free-riding" more efficient by using a collection of multiple computers (with distinct IP addresses) in the computer lab in his department. How can he do that?

P27. Consider a DASH system for which there are N video versions (at N different rates and qualities) and N audio versions (at N different rates and qualities). Suppose we want to allow the player to choose at any time any of the N video versions and any of the N audio versions.

a. If we create files so that the audio is mixed in with the video, so the server sends only one media stream at a given time, how many files will the server need to store (each at a different URL)?

b. If the server instead sends the audio and video streams separately and has the client synchronize the streams, how many files will the server need to store?

P28. Install and compile the Python programs TCPClient and UDPClient on one host and TCPServer and UDPServer on another host.

a. Suppose you run TCPClient before you run TCPServer. What happens? Why?

b. Suppose you run UDPClient before you run UDPServer. What happens? Why?

c. What happens if you use different port numbers for the client and server sides?

P29. Suppose that in UDPClient.py, after we create the socket, we add the line:

clientSocket.bind(('', 5432))

Will it become necessary to change UDPServer.py? What are the port numbers for the sockets in UDPClient and UDPServer? What were they before making this change?

P30. Can you configure your browser to open multiple simultaneous connections to a Web site? What are the advantages and disadvantages of having a large number of simultaneous TCP connections?

P31. We have seen that Internet TCP sockets treat the data being sent as a byte stream but UDP sockets recognize message boundaries. What are one advantage and one disadvantage of the byte-oriented API versus having the API explicitly recognize and preserve application-defined message boundaries?

P32. What is the Apache Web server? How much does it cost? What functionality does it currently have? You may want to look at Wikipedia to answer this question.

Socket Programming Assignments

The Companion Website includes six socket programming assignments. The first four assignments are summarized below. The fifth assignment makes use of the ICMP protocol and is summarized at the end of Chapter 5. The sixth assignment employs multimedia protocols and is summarized at the end of Chapter 9. It is highly recommended that students complete several, if not all, of these assignments.
Students can find full details of these assignments, as well as important snippets of the Python code, at the Web site www.pearsonhighered.com/cs-resources.

Assignment 1: Web Server

In this assignment, you will develop a simple Web server in Python that is capable of processing only one request. Specifically, your Web server will (i) create a connection socket when contacted by a client (browser); (ii) receive the HTTP request from this connection; (iii) parse the request to determine the specific file being requested; (iv) get the requested file from the server's file system; (v) create an HTTP response message consisting of the requested file preceded by header lines; and (vi) send the response over the TCP connection to the requesting browser. If a browser requests a file that is not present in your server, your server should return a "404 Not Found" error message. In the Companion Website, we provide the skeleton code for your server. Your job is to complete the code, run your server, and then test your server by sending requests from browsers running on different hosts. If you run your server on a host that already has a Web server running on it, then you should use a different port than port 80 for your Web server.

Assignment 2: UDP Pinger

In this programming assignment, you will write a client ping program in Python. Your client will send a simple ping message to a server, receive a corresponding pong message back from the server, and determine the delay between when the client sent the ping message and received the pong message. This delay is called the Round Trip Time (RTT). The functionality provided by the client and server is similar to the functionality provided by the standard ping program available in modern operating systems. However, standard ping programs use the Internet Control Message Protocol (ICMP) (which we will study in Chapter 5). Here we will create a nonstandard (but simple!) UDP-based ping program. Your ping program is to send 10 ping messages to the target server over UDP. For each message, your client is to determine and print the RTT when the corresponding pong message is returned. Because UDP is an unreliable protocol, a packet sent by the client or server may be lost. For this reason, the client cannot wait indefinitely for a reply to a ping message. You should have the client wait up to one second for a reply from the server; if no reply is received, the client should assume that the packet was lost and print a message accordingly. In this assignment, you will be given the complete code for the server (available in the Companion Website). Your job is to write the client code, which will be very similar to the server code. It is recommended that you first study carefully the server code. You can then write your client code, liberally cutting and pasting lines from the server code.

Assignment 3: Mail Client

The goal of this programming assignment is to create a simple mail client that sends e-mail to any recipient. Your client will need to establish a TCP connection with a mail server (e.g., a Google mail server), dialogue with the mail server using the SMTP protocol, send an e-mail message to a recipient (e.g., your friend) via the mail server, and finally close the TCP connection with the mail server.
You may also try sending through different servers (for example, through a Google mail server and through your university mail server).

Assignment 4: Multi-Threaded Web Proxy

In this assignment, you will develop a Web proxy. When your proxy receives an HTTP request for an object from a browser, it generates a new HTTP request for the same object and sends it to the origin server. When the proxy receives the corresponding HTTP response with the object from the origin server, it creates a new HTTP response, including the object, and sends it to the client. This proxy will be multi-threaded, so that it will be able to handle multiple requests at the same time. For this assignment, the Companion Website provides the skeleton code for the proxy server. Your job is to complete the code, and then test it by having different browsers request Web objects via your proxy.

Wireshark Lab: HTTP

Having gotten our feet wet with the Wireshark packet sniffer in Lab 1, we're now ready to use Wireshark to investigate protocols in operation. In this lab, we'll explore several aspects of the HTTP protocol: the basic GET/reply interaction, HTTP message formats, retrieving large HTML files, retrieving HTML files with embedded URLs, persistent and non-persistent connections, and HTTP authentication and security. As is the case with all Wireshark labs, the full description of this lab is available at this book's Web site, www.pearsonhighered.com/cs-resources.

Wireshark Lab: DNS

In this lab, we take a closer look at the client side of the DNS, the protocol that translates Internet hostnames to IP addresses. Recall from Section 2.5 that the client's role in the DNS is relatively simple---a client sends a query to its local DNS server and receives a response back. Much can go on under the covers, invisible to the DNS clients, as the hierarchical DNS servers communicate with each other to either recursively or iteratively resolve the client's DNS query. From the DNS client's standpoint, however, the protocol is quite simple---a query is formulated to the local DNS server and a response is received from that server. We observe DNS in action in this lab. As is the case with all Wireshark labs, the full description of this lab is available at this book's Web site, www.pearsonhighered.com/cs-resources.

An Interview With... Marc Andreessen

Marc Andreessen is the co-creator of Mosaic, the Web browser that popularized the World Wide Web in 1993. Mosaic had a clean, easily understood interface and was the first browser to display images in-line with text. In 1994, Marc Andreessen and Jim Clark founded Netscape, whose browser was by far the most popular browser through the mid-1990s. Netscape also developed the Secure Sockets Layer (SSL) protocol and many Internet server products, including mail servers and SSL-based Web servers. He is now a co-founder and general partner of venture capital firm Andreessen Horowitz, overseeing portfolio development with holdings that include Facebook, Foursquare, Groupon, Jawbone, Twitter, and Zynga. He serves on numerous boards, including Bump, eBay, Glam Media, Facebook, and Hewlett-Packard. He holds a BS in Computer Science from the University of Illinois at Urbana-Champaign.

How did you become interested in computing? Did you always know that you wanted to work in information technology?
The video game and personal computing revolutions hit right when I was growing up---personal computing was the new technology frontier in the late 70's and early 80's. And it wasn't just Apple and the IBM PC, but hundreds of new companies like Commodore and Atari as well. I taught myself to program out of a book called "Instant Freeze-Dried BASIC" at age 10, and got my first computer (a TRS-80 Color Computer---look it up!) at age 12.

Please describe one or two of the most exciting projects you have worked on during your career. What were the biggest challenges?

Undoubtedly the most exciting project was the original Mosaic web browser in '92--'93---and the biggest challenge was getting anyone to take it seriously back then. At the time, everyone thought the interactive future would be delivered as "interactive television" by huge companies, not as the Internet by startups.

What excites you about the future of networking and the Internet? What are your biggest concerns?

The most exciting thing is the huge unexplored frontier of applications and services that programmers and entrepreneurs are able to explore---the Internet has unleashed creativity at a level that I don't think we've ever seen before. My biggest concern is the principle of unintended consequences---we don't always know the implications of what we do, such as the Internet being used by governments to run a new level of surveillance on citizens.

Is there anything in particular students should be aware of as Web technology advances?

The rate of change---the most important thing to learn is how to learn---how to flexibly adapt to changes in the specific technologies, and how to keep an open mind on the new opportunities and possibilities as you move through your career.

What people inspired you professionally?

Vannevar Bush, Ted Nelson, Doug Engelbart, Nolan Bushnell, Bill Hewlett and Dave Packard, Ken Olsen, Steve Jobs, Steve Wozniak, Andy Grove, Grace Hopper, Hedy Lamarr, Alan Turing, Richard Stallman.

What are your recommendations for students who want to pursue careers in computing and information technology?

Go as deep as you possibly can on understanding how technology is created, and then complement with learning how business works.

Can technology solve the world's problems?

No, but we advance the standard of living of people through economic growth, and most economic growth throughout history has come from technology---so that's as good as it gets.

Chapter 3 Transport Layer

Residing between the application and network layers, the transport layer is a central piece of the layered network architecture. It has the critical role of providing communication services directly to the application processes running on different hosts. The pedagogic approach we take in this chapter is to alternate between discussions of transport-layer principles and discussions of how these principles are implemented in existing protocols; as usual, particular emphasis will be given to Internet protocols, in particular the TCP and UDP transport-layer protocols. We'll begin by discussing the relationship between the transport and network layers. This sets the stage for examining the first critical function of the transport layer---extending the network layer's delivery service between two end systems to a delivery service between two application-layer processes running on the end systems. We'll illustrate this function in our coverage of the Internet's connectionless transport protocol, UDP.
We'll then return to principles and confront one of the most fundamental problems in computer networking---how two entities can communicate reliably over a medium that may lose and corrupt data. Through a series of increasingly complicated (and realistic!) scenarios, we'll build up an array of techniques that transport protocols use to solve this problem. We'll then show how these principles are embodied in TCP, the Internet's connection-oriented transport protocol. We'll next move on to a second fundamentally important problem in networking---controlling the transmission rate of transport-layer entities in order to avoid, or recover from, congestion within the network. We'll consider the causes and consequences of congestion, as well as commonly used congestion-control techniques. After obtaining a solid understanding of the issues behind congestion control, we'll study TCP's approach to congestion control.

3.1 Introduction and Transport-Layer Services

In the previous two chapters we touched on the role of the transport layer and the services that it provides. Let's quickly review what we have already learned about the transport layer. A transport-layer protocol provides for logical communication between application processes running on different hosts. By logical communication, we mean that from an application's perspective, it is as if the hosts running the processes were directly connected; in reality, the hosts may be on opposite sides of the planet, connected via numerous routers and a wide range of link types. Application processes use the logical communication provided by the transport layer to send messages to each other, free from the worry of the details of the physical infrastructure used to carry these messages. Figure 3.1 illustrates the notion of logical communication. As shown in Figure 3.1, transport-layer protocols are implemented in the end systems but not in network routers. On the sending side, the transport layer converts the application-layer messages it receives from a sending application process into transport-layer packets, known as transport-layer segments in Internet terminology. This is done by (possibly) breaking the application messages into smaller chunks and adding a transport-layer header to each chunk to create the transport-layer segment. The transport layer then passes the segment to the network layer at the sending end system, where the segment is encapsulated within a network-layer packet (a datagram) and sent to the destination. It's important to note that network routers act only on the network-layer fields of the datagram; that is, they do not examine the fields of the transport-layer segment encapsulated with the datagram. On the receiving side, the network layer extracts the transport-layer segment from the datagram and passes the segment up to the transport layer. The transport layer then processes the received segment, making the data in the segment available to the receiving application. More than one transport-layer protocol may be available to network applications. For example, the Internet has two protocols---TCP and UDP. Each of these protocols provides a different set of transport-layer services to the invoking application.

3.1.1 Relationship Between Transport and Network Layers

Recall that the transport layer lies just above the network layer in the protocol stack.
Whereas a transport-layer protocol provides logical communication between processes running on different hosts, a network-layer protocol provides logical communication between hosts. This distinction is subtle but important. Let's examine this distinction with the aid of a household analogy.

Figure 3.1 The transport layer provides logical rather than physical communication between application processes

Consider two houses, one on the East Coast and the other on the West Coast, with each house being home to a dozen kids. The kids in the East Coast household are cousins of the kids in the West Coast household. The kids in the two households love to write to each other---each kid writes each cousin every week, with each letter delivered by the traditional postal service in a separate envelope. Thus, each household sends 144 letters to the other household every week. (These kids would save a lot of money if they had e-mail!) In each of the households there is one kid---Ann in the West Coast house and Bill in the East Coast house---responsible for mail collection and mail distribution. Each week Ann visits all her brothers and sisters, collects the mail, and gives the mail to a postal-service mail carrier, who makes daily visits to the house. When letters arrive at the West Coast house, Ann also has the job of distributing the mail to her brothers and sisters. Bill has a similar job on the East Coast. In this example, the postal service provides logical communication between the two houses---the postal service moves mail from house to house, not from person to person. On the other hand, Ann and Bill provide logical communication among the cousins---Ann and Bill pick up mail from, and deliver mail to, their brothers and sisters. Note that from the cousins' perspective, Ann and Bill are the mail service, even though Ann and Bill are only a part (the end-system part) of the end-to-end delivery process. This household example serves as a nice analogy for explaining how the transport layer relates to the network layer:

application messages = letters in envelopes
processes = cousins
hosts (also called end systems) = houses
transport-layer protocol = Ann and Bill
network-layer protocol = postal service (including mail carriers)

Continuing with this analogy, note that Ann and Bill do all their work within their respective homes; they are not involved, for example, in sorting mail in any intermediate mail center or in moving mail from one mail center to another. Similarly, transport-layer protocols live in the end systems. Within an end system, a transport protocol moves messages from application processes to the network edge (that is, the network layer) and vice versa, but it doesn't have any say about how the messages are moved within the network core. In fact, as illustrated in Figure 3.1, intermediate routers neither act on, nor recognize, any information that the transport layer may have added to the application messages. Continuing with our family saga, suppose now that when Ann and Bill go on vacation, another cousin pair---say, Susan and Harvey---substitute for them and provide the household-internal collection and delivery of mail. Unfortunately for the two families, Susan and Harvey do not do the collection and delivery in exactly the same way as Ann and Bill. Being younger kids, Susan and Harvey pick up and drop off the mail less frequently and occasionally lose letters (which are sometimes chewed up by the family dog).
Thus, the cousin-pair Susan and Harvey do not provide the same set of services (that is, the same service model) as Ann and Bill. In an analogous manner, a computer network may make available multiple transport protocols, with each protocol offering a different service model to applications. The possible services that Ann and Bill can provide are clearly constrained by the possible services that the postal service provides. For example, if the postal service doesn't provide a maximum bound on how long it can take to deliver mail between the two houses (for example, three days), then there is no way that Ann and Bill can guarantee a maximum delay for mail delivery between any of the cousin pairs. In a similar manner, the services that a transport protocol can provide are often constrained by the service model of the underlying network-layer protocol. If the network-layer protocol cannot provide delay or bandwidth guarantees for transport-layer segments sent between hosts, then the transport-layer protocol cannot provide delay or bandwidth guarantees for application messages sent between processes. Nevertheless, certain services can be offered by a transport protocol even when the underlying network protocol doesn't offer the corresponding service at the network layer. For example, as we'll see in this chapter, a transport protocol can offer reliable data transfer service to an application even when the underlying network protocol is unreliable, that is, even when the network protocol loses, garbles, or duplicates packets. As another example (which we'll explore in Chapter 8 when we discuss network security), a transport protocol can use encryption to guarantee that application messages are not read by intruders, even when the network layer cannot guarantee the confidentiality of transport-layer segments.

3.1.2 Overview of the Transport Layer in the Internet

Recall that the Internet makes two distinct transport-layer protocols available to the application layer. One of these protocols is UDP (User Datagram Protocol), which provides an unreliable, connectionless service to the invoking application. The second of these protocols is TCP (Transmission Control Protocol), which provides a reliable, connection-oriented service to the invoking application. When designing a network application, the application developer must specify one of these two transport protocols. As we saw in Section 2.7, the application developer selects between UDP and TCP when creating sockets. To simplify terminology, we refer to the transport-layer packet as a segment. We mention, however, that the Internet literature (for example, the RFCs) also refers to the transport-layer packet for TCP as a segment but often refers to the packet for UDP as a datagram. But this same Internet literature also uses the term datagram for the network-layer packet! For an introductory book on computer networking such as this, we believe that it is less confusing to refer to both TCP and UDP packets as segments, and reserve the term datagram for the network-layer packet. Before proceeding with our brief introduction of UDP and TCP, it will be useful to say a few words about the Internet's network layer. (We'll learn about the network layer in detail in Chapters 4 and 5.) The Internet's network-layer protocol has a name---IP, for Internet Protocol. IP provides logical communication between hosts. The IP service model is a best-effort delivery service.
This means that IP makes its "best effort" to deliver segments between communicating hosts, but it makes no guarantees. In particular, it does not guarantee segment delivery, it does not guarantee orderly delivery of segments, and it does not guarantee the integrity of the data in the segments. For these reasons, IP is said to be an unreliable service. We also mention here that every host has at least one network-layer address, a so-called IP address. We'll examine IP addressing in detail in Chapter 4; for this chapter we need only keep in mind that each host has an IP address. Having taken a glimpse at the IP service model, let's now summarize the service models provided by UDP and TCP. The most fundamental responsibility of UDP and TCP is to extend IP's delivery service between two end systems to a delivery service between two processes running on the end systems. Extending host-to-host delivery to process-to-process delivery is called transport-layer multiplexing and demultiplexing. We'll discuss transport-layer multiplexing and demultiplexing in the next section. UDP and TCP also provide integrity checking by including error-detection fields in their segments' headers. These two minimal transport-layer services---process-to-process data delivery and error checking---are the only two services that UDP provides! In particular, like IP, UDP is an unreliable service---it does not guarantee that data sent by one process will arrive intact (or at all!) to the destination process. UDP is discussed in detail in Section 3.3. TCP, on the other hand, offers several additional services to applications. First and foremost, it provides reliable data transfer. Using flow control, sequence numbers, acknowledgments, and timers (techniques we'll explore in detail in this chapter), TCP ensures that data is delivered from sending process to receiving process, correctly and in order. TCP thus converts IP's unreliable service between end systems into a reliable data transport service between processes. TCP also provides congestion control. Congestion control is not so much a service provided to the invoking application as it is a service for the Internet as a whole, a service for the general good. Loosely speaking, TCP congestion control prevents any one TCP connection from swamping the links and routers between communicating hosts with an excessive amount of traffic. TCP strives to give each connection traversing a congested link an equal share of the link bandwidth. This is done by regulating the rate at which the sending sides of TCP connections can send traffic into the network. UDP traffic, on the other hand, is unregulated. An application using UDP transport can send at any rate it pleases, for as long as it pleases. A protocol that provides reliable data transfer and congestion control is necessarily complex. We'll need several sections to cover the principles of reliable data transfer and congestion control, and additional sections to cover the TCP protocol itself. These topics are investigated in Sections 3.4 through 3.8. The approach taken in this chapter is to alternate between basic principles and the TCP protocol. For example, we'll first discuss reliable data transfer in a general setting and then discuss how TCP specifically provides reliable data transfer. Similarly, we'll first discuss congestion control in a general setting and then discuss how TCP performs congestion control.
But before getting into all this good stuff, let's first look at transport-layer multiplexing and demultiplexing.

3.2 Multiplexing and Demultiplexing

In this section, we discuss transport-layer multiplexing and demultiplexing, that is, extending the host-to-host delivery service provided by the network layer to a process-to-process delivery service for applications running on the hosts. In order to keep the discussion concrete, we'll discuss this basic transport-layer service in the context of the Internet. We emphasize, however, that a multiplexing/demultiplexing service is needed for all computer networks. At the destination host, the transport layer receives segments from the network layer just below. The transport layer has the responsibility of delivering the data in these segments to the appropriate application process running in the host. Let's take a look at an example. Suppose you are sitting in front of your computer, and you are downloading Web pages while running one FTP session and two Telnet sessions. You therefore have four network application processes running---two Telnet processes, one FTP process, and one HTTP process. When the transport layer in your computer receives data from the network layer below, it needs to direct the received data to one of these four processes. Let's now examine how this is done. First recall from Section 2.7 that a process (as part of a network application) can have one or more sockets, doors through which data passes from the network to the process and through which data passes from the process to the network. Thus, as shown in Figure 3.2, the transport layer in the receiving host does not actually deliver data directly to a process, but instead to an intermediary socket. Because at any given time there can be more than one socket in the receiving host, each socket has a unique identifier. The format of the identifier depends on whether the socket is a UDP or a TCP socket, as we'll discuss shortly. Now let's consider how a receiving host directs an incoming transport-layer segment to the appropriate socket. Each transport-layer segment has a set of fields in the segment for this purpose. At the receiving end, the transport layer examines these fields to identify the receiving socket and then directs the segment to that socket. This job of delivering the data in a transport-layer segment to the correct socket is called demultiplexing. The job of gathering data chunks at the source host from different sockets, encapsulating each data chunk with header information (that will later be used in demultiplexing) to create segments, and passing the segments to the network layer is called multiplexing.

Figure 3.2 Transport-layer multiplexing and demultiplexing

Note that the transport layer in the middle host in Figure 3.2 must demultiplex segments arriving from the network layer below to either process P1 or P2 above; this is done by directing the arriving segment's data to the corresponding process's socket. The transport layer in the middle host must also gather outgoing data from these sockets, form transport-layer segments, and pass these segments down to the network layer. Although we have introduced multiplexing and demultiplexing in the context of the Internet transport protocols, it's important to realize that they are concerns whenever a single protocol at one layer (at the transport layer or elsewhere) is used by multiple protocols at the next higher layer.
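In code terms, demultiplexing amounts to a table lookup keyed on header fields. The toy sketch below (all names are hypothetical, not from the text) delivers each arriving segment's data to a per-socket buffer, exactly the job the paragraph above describes:

```python
# Toy model of transport-layer demultiplexing: the transport layer keys
# a table on the destination port and hands each arriving segment's
# data to the matching "socket" (here, just a queue per bound port).
from queue import Queue

socket_table = {53: Queue(), 19157: Queue()}    # port -> socket's buffer

def demultiplex(segment):
    """Deliver a (dest_port, data) pair to the socket bound to dest_port."""
    dest_port, data = segment
    if dest_port in socket_table:
        socket_table[dest_port].put(data)       # pass the data up to that socket
    else:
        # No socket is bound to this port: a real host drops the segment
        # (and for UDP typically sends back an ICMP "port unreachable").
        pass

demultiplex((19157, b'hello'))
print(socket_table[19157].get())                # the bound socket receives b'hello'
```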
To illustrate the demultiplexing job, recall the household analogy in the previous section. Each of the kids is identified by his or her name. When Bill receives a batch of mail from the mail carrier, he performs a demultiplexing operation by observing to whom the letters are addressed and then hand delivering the mail to his brothers and sisters. Ann performs a multiplexing operation when she collects letters from her brothers and sisters and gives the collected mail to the mail person. Now that we understand the roles of transport-layer multiplexing and demultiplexing, let us examine how it is actually done in a host. From the discussion above, we know that transport-layer multiplexing requires (1) that sockets have unique identifiers, and (2) that each segment have special fields that indicate the socket to which the segment is to be delivered. These special fields, illustrated in Figure 3.3, are the source port number field and the destination port number field. (The UDP and TCP segments have other fields as well, as discussed in the subsequent sections of this chapter.) Each port number is a 16-bit number, ranging from 0 to 65535. The port numbers ranging from 0 to 1023 are called well-known port numbers and are restricted, which means that they are reserved for use by well-known application protocols such as HTTP (which uses port number 80) and FTP (which uses port number 21). The list of well-known port numbers is given in RFC 1700 and is updated at http://www.iana.org \[RFC 3232\]. When we develop a new application (such as the simple application developed in Section 2.7), we must assign the application a port number.

Figure 3.3 Source and destination port-number fields in a transport-layer segment

It should now be clear how the transport layer could implement the demultiplexing service: Each socket in the host could be assigned a port number, and when a segment arrives at the host, the transport layer examines the destination port number in the segment and directs the segment to the corresponding socket. The segment's data then passes through the socket into the attached process. As we'll see, this is basically how UDP does it. However, we'll also see that multiplexing/demultiplexing in TCP is yet more subtle.

Connectionless Multiplexing and Demultiplexing

Recall from Section 2.7.1 that the Python program running in a host can create a UDP socket with the line

clientSocket = socket(AF_INET, SOCK_DGRAM)

When a UDP socket is created in this manner, the transport layer automatically assigns a port number to the socket. In particular, the transport layer assigns a port number in the range 1024 to 65535 that is currently not being used by any other UDP port in the host. Alternatively, we can add a line into our Python program after we create the socket to associate a specific port number (say, 19157) to this UDP socket via the socket bind() method:

clientSocket.bind(('', 19157))

If the application developer writing the code were implementing the server side of a "well-known protocol," then the developer would have to assign the corresponding well-known port number. Typically, the client side of the application lets the transport layer automatically (and transparently) assign the port number, whereas the server side of the application assigns a specific port number. With port numbers assigned to UDP sockets, we can now precisely describe UDP multiplexing/demultiplexing.
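In socket terms, the scenario described next looks roughly like this. This is a sketch only; the host name is a placeholder, and the receiving side is shown in comments:

```python
# A minimal sketch of the upcoming scenario: a process on Host A, bound
# to UDP port 19157, sends application data to a process on Host B that
# has bound UDP port 46428. Host names here are placeholders.
from socket import socket, AF_INET, SOCK_DGRAM

# On Host B (the receiver):
#   serverSocket = socket(AF_INET, SOCK_DGRAM)
#   serverSocket.bind(('', 46428))
#   data, clientAddress = serverSocket.recvfrom(2048)

# On Host A (the sender):
clientSocket = socket(AF_INET, SOCK_DGRAM)
clientSocket.bind(('', 19157))                  # source port 19157
clientSocket.sendto(b'a chunk of data', ('hostB.example.com', 46428))
```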
Suppose a process in Host A, with UDP port 19157, wants to send a chunk of application data to a process with UDP port 46428 in Host B. The transport layer in Host A creates a transport-layer segment that includes the application data, the source port number (19157), the destination port number (46428), and two other values (which will be discussed later, but are unimportant for the current discussion). The transport layer then passes the resulting segment to the network layer. The network layer encapsulates the segment in an IP datagram and makes a best-effort attempt to deliver the segment to the receiving host. If the segment arrives at the receiving Host B, the transport layer at the receiving host examines the destination port number in the segment (46428) and delivers the segment to its socket identified by port 46428. Note that Host B could be running multiple processes, each with its own UDP socket and associated port number. As UDP segments arrive from the network, Host B directs (demultiplexes) each segment to the appropriate socket by examining the segment's destination port number. It is important to note that a UDP socket is fully identified by a two-tuple consisting of a destination IP address and a destination port number. As a consequence, if two UDP segments have different source IP addresses and/or source port numbers, but have the same destination IP address and destination port number, then the two segments will be directed to the same destination process via the same destination socket. You may be wondering now, what is the purpose of the source port number? As shown in Figure 3.4, in the A-to-B segment the source port number serves as part of a "return address"---when B wants to send a segment back to A, the destination port in the B-to-A segment will take its value from the source port value of the A-to-B segment. (The complete return address is A's IP address and the source port number.) As an example, recall the UDP server program studied in Section 2.7. In UDPServer.py, the server uses the recvfrom() method to extract the client-side (source) port number from the segment it receives from the client; it then sends a new segment to the client, with the extracted source port number serving as the destination port number in this new segment.

Connection-Oriented Multiplexing and Demultiplexing

In order to understand TCP demultiplexing, we have to take a close look at TCP sockets and TCP connection establishment. One subtle difference between a TCP socket and a UDP socket is that a TCP socket is identified by a four-tuple: (source IP address, source port number, destination IP address, destination port number). Thus, when a TCP segment arrives from the network to a host, the host uses all four values to direct (demultiplex) the segment to the appropriate socket.

Figure 3.4 The inversion of source and destination port numbers

In particular, and in contrast with UDP, two arriving TCP segments with different source IP addresses or source port numbers will (with the exception of a TCP segment carrying the original connection-establishment request) be directed to two different sockets. To gain further insight, let's reconsider the TCP client-server programming example in Section 2.7.2: The TCP server application has a "welcoming socket" that waits for connection-establishment requests from TCP clients (see Figure 2.29) on port number 12000.
The TCP client creates a socket and sends a connection-establishment request segment with the lines:

clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName,12000))

A connection-establishment request is nothing more than a TCP segment with destination port number 12000 and a special connection-establishment bit set in the TCP header (discussed in Section 3.5). The segment also includes a source port number that was chosen by the client. When the host operating system of the computer running the server process receives the incoming connection-request segment with destination port 12000, it locates the server process that is waiting to accept a connection on port number 12000. The server process then creates a new socket:

connectionSocket, addr = serverSocket.accept()

Also, the transport layer at the server notes the following four values in the connection-request segment: (1) the source port number in the segment, (2) the IP address of the source host, (3) the destination port number in the segment, and (4) its own IP address. The newly created connection socket is identified by these four values; all subsequently arriving segments whose source port, source IP address, destination port, and destination IP address match these four values will be demultiplexed to this socket. With the TCP connection now in place, the client and server can now send data to each other. The server host may support many simultaneous TCP connection sockets, with each socket attached to a process, and with each socket identified by its own four-tuple. When a TCP segment arrives at the host, all four fields (source IP address, source port, destination IP address, destination port) are used to direct (demultiplex) the segment to the appropriate socket.

FOCUS ON SECURITY: Port Scanning

We've seen that a server process waits patiently on an open port for contact by a remote client. Some ports are reserved for well-known applications (e.g., Web, FTP, DNS, and SMTP servers); other ports are used by convention by popular applications (e.g., the Microsoft 2000 SQL server listens for requests on UDP port 1434). Thus, if we determine that a port is open on a host, we may be able to map that port to a specific application running on the host. This is very useful for system administrators, who are often interested in knowing which network applications are running on the hosts in their networks. But attackers, in order to "case the joint," also want to know which ports are open on target hosts. If a host is found to be running an application with a known security flaw (e.g., a SQL server listening on port 1434 was subject to a buffer overflow, allowing a remote user to execute arbitrary code on the vulnerable host, a flaw exploited by the Slammer worm \[CERT 2003--04\]), then that host is ripe for attack. Determining which applications are listening on which ports is a relatively easy task. Indeed there are a number of public domain programs, called port scanners, that do just that. Perhaps the most widely used of these is nmap, freely available at http://nmap.org and included in most Linux distributions. For TCP, nmap sequentially scans ports, looking for ports that are accepting TCP connections. For UDP, nmap again sequentially scans ports, looking for UDP ports that respond to transmitted UDP segments. In both cases, nmap returns a list of open, closed, or unreachable ports. A host running nmap can attempt to scan any target host anywhere in the
A host running nmap can attempt to scan +any target host anywhere in the + + Internet. We'll revisit nmap in Section 3.5.6, when we discuss TCP +connection management. + +Figure 3.5 Two clients, using the same destination port number (80) to +communicate with the same Web server application + +The situation is illustrated in Figure 3.5, in which Host C initiates +two HTTP sessions to server B, and Host A initiates one HTTP session to +B. Hosts A and C and server B each have their own unique IP address---A, +C, and B, respectively. Host C assigns two different source port numbers +(26145 and 7532) to its two HTTP connections. Because Host A is choosing +source port numbers independently of C, it might also assign a source +port of 26145 to its HTTP connection. But this is not a problem---server +B will still be able to correctly demultiplex the two connections having +the same source port number, since the two connections have different +source IP addresses. Web Servers and TCP Before closing this discussion, +it's instructive to say a few additional words about Web servers and how +they use port numbers. Consider a host running a Web server, such as an +Apache Web server, on port 80. When clients (for example, browsers) send +segments to the server, all segments will have destination port 80. In +particular, both the initial connection-establishment segments and the +segments carrying HTTP request messages will have destination port 80. +As we have just described, the server distinguishes the segments from +the different clients using source IP addresses and source port + + numbers. Figure 3.5 shows a Web server that spawns a new process for +each connection. As shown in Figure 3.5, each of these processes has its +own connection socket through which HTTP requests arrive and HTTP +responses are sent. We mention, however, that there is not always a +one-to-one correspondence between connection sockets and processes. In +fact, today's high-performing Web servers often use only one process, +and create a new thread with a new connection socket for each new client +connection. (A thread can be viewed as a lightweight subprocess.) If you +did the first programming assignment in Chapter 2, you built a Web +server that does just this. For such a server, at any given time there +may be many connection sockets (with different identifiers) attached to +the same process. If the client and server are using persistent HTTP, +then throughout the duration of the persistent connection the client and +server exchange HTTP messages via the same server socket. However, if +the client and server use non-persistent HTTP, then a new TCP connection +is created and closed for every request/response, and hence a new socket +is created and later closed for every request/response. This frequent +creating and closing of sockets can severely impact the performance of a +busy Web server (although a number of operating system tricks can be +used to mitigate the problem). Readers interested in the operating +system issues surrounding persistent and non-persistent HTTP are +encouraged to see \[Nielsen 1997; Nahum 2002\]. Now that we've discussed +transport-layer multiplexing and demultiplexing, let's move on and +discuss one of the Internet's transport protocols, UDP. In the next +section we'll see that UDP adds little more to the network-layer +protocol than a multiplexing/demultiplexing service. + + 3.3 Connectionless Transport: UDP In this section, we'll take a close +look at UDP, how it works, and what it does. 
We encourage you to refer back to Section 2.1, which includes an overview of the UDP service model, and to Section 2.7.1, which discusses socket programming using UDP. To motivate our discussion about UDP, suppose you were interested in designing a no-frills, bare-bones transport protocol. How might you go about doing this? You might first consider using a vacuous transport protocol. In particular, on the sending side, you might consider taking the messages from the application process and passing them directly to the network layer; and on the receiving side, you might consider taking the messages arriving from the network layer and passing them directly to the application process. But as we learned in the previous section, we have to do a little more than nothing! At the very least, the transport layer has to provide a multiplexing/demultiplexing service in order to pass data between the network layer and the correct application-level process. UDP, defined in \[RFC 768\], does just about as little as a transport protocol can do. Aside from the multiplexing/demultiplexing function and some light error checking, it adds nothing to IP. In fact, if the application developer chooses UDP instead of TCP, then the application is almost directly talking with IP. UDP takes messages from the application process, attaches source and destination port number fields for the multiplexing/demultiplexing service, adds two other small fields, and passes the resulting segment to the network layer. The network layer encapsulates the transport-layer segment into an IP datagram and then makes a best-effort attempt to deliver the segment to the receiving host. If the segment arrives at the receiving host, UDP uses the destination port number to deliver the segment's data to the correct application process. Note that with UDP there is no handshaking between sending and receiving transport-layer entities before sending a segment. For this reason, UDP is said to be connectionless. DNS is an example of an application-layer protocol that typically uses UDP. When the DNS application in a host wants to make a query, it constructs a DNS query message and passes the message to UDP. Without performing any handshaking with the UDP entity running on the destination end system, the host-side UDP adds header fields to the message and passes the resulting segment to the network layer. The network layer encapsulates the UDP segment into a datagram and sends the datagram to a name server. The DNS application at the querying host then waits for a reply to its query. If it doesn't receive a reply (possibly because the underlying network lost the query or the reply), it might try resending the query, try sending the query to another name server, or inform the invoking application that it can't get a reply. Now you might be wondering why an application developer would ever choose to build an application over UDP rather than over TCP. Isn't TCP always preferable, since TCP provides a reliable data transfer service, while UDP does not? The answer is no, as some applications are better suited for UDP for the following reasons:

Finer application-level control over what data is sent, and when. Under UDP, as soon as an application process passes data to UDP, UDP will package the data inside a UDP segment and immediately pass the segment to the network layer.
TCP, on the other hand, has a congestion-control mechanism that throttles the transport-layer TCP sender when one or more links between the source and destination hosts become excessively congested. TCP will also continue to resend a segment until the receipt of the segment has been acknowledged by the destination, regardless of how long reliable delivery takes. Since real-time applications often require a minimum sending rate, do not want to overly delay segment transmission, and can tolerate some data loss, TCP's service model is not particularly well matched to these applications' needs. As discussed below, these applications can use UDP and implement, as part of the application, any additional functionality that is needed beyond UDP's no-frills segment-delivery service.

No connection establishment. As we'll discuss later, TCP uses a three-way handshake before it starts to transfer data. UDP just blasts away without any formal preliminaries. Thus UDP does not introduce any delay to establish a connection. (A short sketch of this difference, in socket terms, appears after this discussion.) This is probably the principal reason why DNS runs over UDP rather than TCP---DNS would be much slower if it ran over TCP. HTTP uses TCP rather than UDP, since reliability is critical for Web pages with text. But, as we briefly discussed in Section 2.2, the TCP connection-establishment delay in HTTP is an important contributor to the delays associated with downloading Web documents. Indeed, the QUIC protocol (Quick UDP Internet Connection, \[Iyengar 2015\]), used in Google's Chrome browser, uses UDP as its underlying transport protocol and implements reliability in an application-layer protocol on top of UDP.

No connection state. TCP maintains connection state in the end systems. This connection state includes receive and send buffers, congestion-control parameters, and sequence and acknowledgment number parameters. We will see in Section 3.5 that this state information is needed to implement TCP's reliable data transfer service and to provide congestion control. UDP, on the other hand, does not maintain connection state and does not track any of these parameters. For this reason, a server devoted to a particular application can typically support many more active clients when the application runs over UDP rather than TCP.

Small packet header overhead. The TCP segment has 20 bytes of header overhead in every segment, whereas UDP has only 8 bytes of overhead.

Figure 3.6 lists popular Internet applications and the transport protocols that they use. As we expect, e-mail, remote terminal access, the Web, and file transfer run over TCP---all these applications need the reliable data transfer service of TCP. Nevertheless, many important applications run over UDP rather than TCP. For example, UDP is used to carry network management (SNMP; see Section 5.7) data. UDP is preferred to TCP in this case, since network management applications must often run when the network is in a stressed state---precisely when reliable, congestion-controlled data transfer is difficult to achieve. Also, as we mentioned earlier, DNS runs over UDP, thereby avoiding TCP's connection-establishment delays. As shown in Figure 3.6, both UDP and TCP are sometimes used today with multimedia applications, such as Internet phone, real-time video conferencing, and streaming of stored audio and video. We'll take a close look at these applications in Chapter 9.
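Here is the sketch promised under "No connection establishment" above. It is illustrative only: the server address is a placeholder, and neither call will succeed without a peer actually listening there.

```python
# Minimal sketch contrasting UDP's lack of a handshake with TCP's
# connection establishment (server address below is a placeholder).
from socket import socket, AF_INET, SOCK_DGRAM, SOCK_STREAM

server = ('server.example.com', 12000)   # hypothetical server

# UDP: no preliminaries; the very first segment can already carry data.
udpSocket = socket(AF_INET, SOCK_DGRAM)
udpSocket.sendto(b'query', server)

# TCP: connect() triggers the three-way handshake (Section 3.5) and
# returns only once the connection is established; data flows afterward.
tcpSocket = socket(AF_INET, SOCK_STREAM)
tcpSocket.connect(server)
tcpSocket.send(b'request')
```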
We +just mention now that all of these applications can tolerate a small +amount of packet loss, so that reliable data transfer is not absolutely +critical for the application's success. Furthermore, real-time +applications, like Internet phone and video conferencing, react very +poorly to TCP's congestion control. For these reasons, developers of +multimedia applications may choose to run their applications over UDP +instead of TCP. When packet loss rates are low, and with some +organizations blocking UDP traffic for security reasons (see Chapter 8), +TCP becomes an increasingly attractive protocol for streaming media +transport. + +Figure 3.6 Popular Internet applications and their underlying transport +protocols + +Although commonly done today, running multimedia applications over UDP +is controversial. As we mentioned above, UDP has no congestion control. +But congestion control is needed to prevent the network from entering a +congested state in which very little useful work is done. If everyone +were to start streaming high-bit-rate video without using any congestion +control, there would be so much packet overflow at routers that very few +UDP packets would successfully traverse the source-to-destination path. +Moreover, the high loss rates induced by the uncontrolled UDP senders +would cause the TCP senders (which, as we'll see, do decrease their +sending rates in the face of congestion) to dramatically decrease their +rates. Thus, the lack of congestion control in UDP can result in high +loss rates between a UDP sender and receiver, and the crowding out of +TCP sessions---a potentially serious problem \[Floyd + + 1999\]. Many researchers have proposed new mechanisms to force all +sources, including UDP sources, to perform adaptive congestion control +\[Mahdavi 1997; Floyd 2000; Kohler 2006: RFC 4340\]. Before discussing +the UDP segment structure, we mention that it is possible for an +application to have reliable data transfer when using UDP. This can be +done if reliability is built into the application itself (for example, +by adding acknowledgment and retransmission mechanisms, such as those +we'll study in the next section). We mentioned earlier that the QUIC +protocol \[Iyengar 2015\] used in Google's Chrome browser implements +reliability in an application-layer protocol on top of UDP. But this is +a nontrivial task that would keep an application developer busy +debugging for a long time. Nevertheless, building reliability directly +into the application allows the application to "have its cake and eat it +too. That is, application processes can communicate reliably without +being subjected to the transmission-rate constraints imposed by TCP's +congestion-control mechanism. + +3.3.1 UDP Segment Structure The UDP segment structure, shown in Figure +3.7, is defined in RFC 768. The application data occupies the data field +of the UDP segment. For example, for DNS, the data field contains either +a query message or a response message. For a streaming audio +application, audio samples fill the data field. The UDP header has only +four fields, each consisting of two bytes. As discussed in the previous +section, the port numbers allow the destination host to pass the +application data to the correct process running on the destination end +system (that is, to perform the demultiplexing function). The length +field specifies the number of bytes in the UDP segment (header plus +data). 
An explicit length value is needed since the size of the data field may differ from one UDP segment to the next. The checksum is used by the receiving host to check whether errors have been introduced into the segment. In truth, the checksum is also calculated over a few of the fields in the IP header in addition to the UDP segment. But we ignore this detail in order to see the forest through the trees. We'll discuss the checksum calculation below. Basic principles of error detection are described in Section 6.2.

3.3.2 UDP Checksum

The UDP checksum provides for error detection. That is, the checksum is used to determine whether bits within the UDP segment have been altered (for example, by noise in the links or while stored in a router) as it moved from source to destination.

Figure 3.7 UDP segment structure

UDP at the sender side performs the 1s complement of the sum of all the 16-bit words in the segment, with any overflow encountered during the sum being wrapped around. This result is put in the checksum field of the UDP segment. Here we give a simple example of the checksum calculation. You can find details about efficient implementation of the calculation in RFC 1071 and performance over real data in \[Stone 1998; Stone 2000\]. As an example, suppose that we have the following three 16-bit words:

0110011001100000
0101010101010101
1000111100001100

The sum of the first two of these 16-bit words is

0110011001100000
0101010101010101
----------------
1011101110110101

Adding the third word to the above sum gives

1011101110110101
1000111100001100
----------------
0100101011000010

Note that this last addition had overflow, which was wrapped around. The 1s complement is obtained by converting all the 0s to 1s and converting all the 1s to 0s. Thus the 1s complement of the sum 0100101011000010 is 1011010100111101, which becomes the checksum. At the receiver, all four 16-bit words are added, including the checksum. If no errors are introduced into the packet, then clearly the sum at the receiver will be 1111111111111111. If one of the bits is a 0, then we know that errors have been introduced into the packet. You may wonder why UDP provides a checksum in the first place, as many link-layer protocols (including the popular Ethernet protocol) also provide error checking. The reason is that there is no guarantee that all the links between source and destination provide error checking; that is, one of the links may use a link-layer protocol that does not provide error checking. Furthermore, even if segments are correctly transferred across a link, it's possible that bit errors could be introduced when a segment is stored in a router's memory. Given that neither link-by-link reliability nor in-memory error detection is guaranteed, UDP must provide error detection at the transport layer, on an end-end basis, if the end-end data transfer service is to provide error detection. This is an example of the celebrated end-end principle in system design \[Saltzer 1984\], which states that since certain functionality (error detection, in this case) must be implemented on an end-end basis: "functions placed at the lower levels may be redundant or of little value when compared to the cost of providing them at the higher level." Because IP is supposed to run over just about any layer-2 protocol, it is useful for the transport layer to provide error checking as a safety measure.
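The wraparound-and-complement arithmetic in the example above is easy to check programmatically. The short sketch below (not from the text) reproduces the example's numbers:

```python
# A small sketch reproducing the checksum example above: a 1s-complement
# sum of 16-bit words with overflow wrapped around, then complemented.
def udp_checksum(words):
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)   # wrap any overflow around
    return ~total & 0xFFFF                          # 1s complement of the sum

words = [0b0110011001100000, 0b0101010101010101, 0b1000111100001100]
print(format(udp_checksum(words), '016b'))   # prints 1011010100111101

# Receiver-side check: over all four words the raw sum is all 1s, so the
# complement computed here is all 0s when the segment arrived intact.
print(format(udp_checksum(words + [0b1011010100111101]), '016b'))
```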
Although UDP +provides error checking, it does not do anything to recover from an +error. Some implementations of UDP simply discard the damaged segment; +others pass the damaged segment to the application with a warning. That +wraps up our discussion of UDP. We will soon see that TCP offers +reliable data transfer to its applications as well as other services +that UDP doesn't offer. Naturally, TCP is also more complex than UDP. +Before discussing TCP, however, it will be useful to step back and first +discuss the underlying principles of reliable data transfer. + + 3.4 Principles of Reliable Data Transfer In this section, we consider +the problem of reliable data transfer in a general context. This is +appropriate since the problem of implementing reliable data transfer +occurs not only at the transport layer, but also at the link layer and +the application layer as well. The general problem is thus of central +importance to networking. Indeed, if one had to identify a "top-ten" +list of fundamentally important problems in all of networking, this +would be a candidate to lead the list. In the next section we'll examine +TCP and show, in particular, that TCP exploits many of the principles +that we are about to describe. Figure 3.8 illustrates the framework for +our study of reliable data transfer. The service abstraction provided to +the upper-layer entities is that of a reliable channel through which +data can be transferred. With a reliable channel, no transferred data +bits are corrupted (flipped from 0 to 1, or vice versa) or lost, and all +are delivered in the order in which they were sent. This is precisely +the service model offered by TCP to the Internet applications that +invoke it. It is the responsibility of a reliable data transfer protocol +to implement this service abstraction. This task is made difficult by +the fact that the layer below the reliable data transfer protocol may be +unreliable. For example, TCP is a reliable data transfer protocol that +is implemented on top of an unreliable (IP) end-to-end network layer. +More generally, the layer beneath the two reliably communicating end +points might consist of a single physical link (as in the case of a +link-level data transfer protocol) or a global internetwork (as in the +case of a transport-level protocol). For our purposes, however, we can +view this lower layer simply as an unreliable point-to-point channel. In +this section, we will incrementally develop the sender and receiver +sides of a reliable data transfer protocol, considering increasingly +complex models of the underlying channel. For example, we'll consider +what protocol mechanisms are + + Figure 3.8 Reliable data transfer: Service model and service +implementation + + needed when the underlying channel can corrupt bits or lose entire +packets. One assumption we'll adopt throughout our discussion here is +that packets will be delivered in the order in which they were sent, +with some packets possibly being lost; that is, the underlying channel +will not reorder packets. Figure 3.8(b) illustrates the interfaces for +our data transfer protocol. The sending side of the data transfer +protocol will be invoked from above by a call to rdt_send() . It will +pass the data to be delivered to the upper layer at the receiving side. +(Here rdt stands for reliable data transfer protocol and \_send +indicates that the sending side of rdt is being called. The first step +in developing any protocol is to choose a good name!) 
On the receiving +side, rdt_rcv() will be called when a packet arrives from the receiving +side of the channel. When the rdt protocol wants to deliver data to the +upper layer, it will do so by calling deliver_data() . In the following +we use the terminology "packet" rather than transport-layer "segment." +Because the theory developed in this section applies to computer +networks in general and not just to the Internet transport layer, the +generic term "packet" is perhaps more appropriate here. In this section +we consider only the case of unidirectional data transfer, that is, data +transfer from the sending to the receiving side. The case of reliable +bidirectional (that is, full-duplex) data transfer is conceptually no +more difficult but considerably more tedious to explain. Although we +consider only unidirectional data transfer, it is important to note that +the sending and receiving sides of our protocol will nonetheless need to +transmit packets in both directions, as indicated in Figure 3.8. We will +see shortly that, in addition to exchanging packets containing the data +to be transferred, the sending and receiving sides of rdt will also need +to exchange control packets back and forth. Both the send and receive +sides of rdt send packets to the other side by a call to udt_send() +(where udt stands for unreliable data transfer). + +3.4.1 Building a Reliable Data Transfer Protocol We now step through a +series of protocols, each one becoming more complex, arriving at a +flawless, reliable data transfer protocol. Reliable Data Transfer over a +Perfectly Reliable Channel: rdt1.0 We first consider the simplest case, +in which the underlying channel is completely reliable. The protocol +itself, which we'll call rdt1.0 , is trivial. The finite-state machine +(FSM) definitions for the rdt1.0 sender and receiver are shown in Figure +3.9. The FSM in Figure 3.9(a) defines the operation of the sender, while +the FSM in Figure 3.9(b) defines the operation of the receiver. It is +important to note that there are separate FSMs for the sender and for +the receiver. The sender and receiver FSMs in Figure 3.9 each have just +one state. The arrows in the FSM description indicate the transition of +the protocol from one state to another. (Since each FSM in Figure 3.9 +has just one state, a transition is necessarily from the one state back +to itself; we'll see more complicated state diagrams shortly.) The event +causing + + the transition is shown above the horizontal line labeling the +transition, and the actions taken when the event occurs are shown below +the horizontal line. When no action is taken on an event, or no event +occurs and an action is taken, we'll use the symbol Λ below or above the +horizontal, respectively, to explicitly denote the lack of an action or +event. The initial state of the FSM is indicated by the dashed arrow. +Although the FSMs in Figure 3.9 have but one state, the FSMs we will see +shortly have multiple states, so it will be important to identify the +initial state of each FSM. The sending side of rdt simply accepts data +from the upper layer via the rdt_send(data) event, creates a packet +containing the data (via the action make_pkt(data) ) and sends the +packet into the channel. In practice, the rdt_send(data) event would +result from a procedure call (for example, to rdt_send() ) by the +upper-layer application. 
Figure 3.9 rdt1.0 -- A protocol for a completely reliable channel

On the receiving side, rdt receives a packet from the underlying channel via the rdt_rcv(packet) event, removes the data from the packet (via the action extract(packet, data)) and passes the data up to the upper layer (via the action deliver_data(data)). In practice, the rdt_rcv(packet) event would result from a procedure call (for example, to rdt_rcv()) from the lower-layer protocol. In this simple protocol, there is no difference between a unit of data and a packet. Also, all packet flow is from the sender to receiver; with a perfectly reliable channel there is no need for the receiver side to provide any feedback to the sender since nothing can go wrong! Note that we have also assumed that the receiver is able to receive data as fast as the sender happens to send data. Thus, there is no need for the receiver to ask the sender to slow down!

Reliable Data Transfer over a Channel with Bit Errors: rdt2.0

A more realistic model of the underlying channel is one in which bits in a packet may be corrupted. Such bit errors typically occur in the physical components of a network as a packet is transmitted, propagates, or is buffered. We'll continue to assume for the moment that all transmitted packets are received (although their bits may be corrupted) in the order in which they were sent. Before developing a protocol for reliably communicating over such a channel, first consider how people might deal with such a situation. Consider how you yourself might dictate a long message over the phone. In a typical scenario, the message taker might say "OK" after each sentence has been heard, understood, and recorded. If the message taker hears a garbled sentence, you're asked to repeat the garbled sentence. This message-dictation protocol uses both positive acknowledgments ("OK") and negative acknowledgments ("Please repeat that."). These control messages allow the receiver to let the sender know what has been received correctly, and what has been received in error and thus requires repeating. In a computer network setting, reliable data transfer protocols based on such retransmission are known as ARQ (Automatic Repeat reQuest) protocols.

Fundamentally, three additional protocol capabilities are required in ARQ protocols to handle the presence of bit errors:

Error detection. First, a mechanism is needed to allow the receiver to detect when bit errors have occurred. Recall from the previous section that UDP uses the Internet checksum field for exactly this purpose. In Chapter 6 we'll examine error-detection and -correction techniques in greater detail; these techniques allow the receiver to detect and possibly correct packet bit errors. For now, we need only know that these techniques require that extra bits (beyond the bits of original data to be transferred) be sent from the sender to the receiver; these bits will be gathered into the packet checksum field of the rdt2.0 data packet.

Receiver feedback. Since the sender and receiver are typically executing on different end systems, possibly separated by thousands of miles, the only way for the sender to learn of the receiver's view of the world (in this case, whether or not a packet was received correctly) is for the receiver to provide explicit feedback to the sender. The positive (ACK) and negative (NAK) acknowledgment replies in the message-dictation scenario are examples of such feedback.
Our rdt2.0 protocol will similarly send ACK and NAK packets back from the receiver to the sender. In principle, these packets need only be one bit long; for example, a 0 value could indicate a NAK and a value of 1 could indicate an ACK.

Retransmission. A packet that is received in error at the receiver will be retransmitted by the sender.

Figure 3.10 shows the FSM representation of rdt2.0, a data transfer protocol employing error detection, positive acknowledgments, and negative acknowledgments. The send side of rdt2.0 has two states. In the leftmost state, the send-side protocol is waiting for data to be passed down from the upper layer. When the rdt_send(data) event occurs, the sender will create a packet (sndpkt) containing the data to be sent, along with a packet checksum (for example, as discussed in Section 3.3.2 for the case of a UDP segment), and then send the packet via the udt_send(sndpkt) operation. In the rightmost state, the sender protocol is waiting for an ACK or a NAK packet from the receiver. If an ACK packet is received (the notation rdt_rcv(rcvpkt) && isACK(rcvpkt) in Figure 3.10 corresponds to this event), the sender knows that the most recently transmitted packet has been received correctly and thus the protocol returns to the state of waiting for data from the upper layer. If a NAK is received, the protocol retransmits the last packet and waits for an ACK or NAK to be returned by the receiver in response to the retransmitted data packet.

Figure 3.10 rdt2.0 -- A protocol for a channel with bit errors

It is important to note that when the sender is in the wait-for-ACK-or-NAK state, it cannot get more data from the upper layer; that is, the rdt_send() event cannot occur; that will happen only after the sender receives an ACK and leaves this state. Thus, the sender will not send a new piece of data until it is sure that the receiver has correctly received the current packet. Because of this behavior, protocols such as rdt2.0 are known as stop-and-wait protocols.

The receiver-side FSM for rdt2.0 still has a single state. On packet arrival, the receiver replies with either an ACK or a NAK, depending on whether or not the received packet is corrupted. In Figure 3.10, the notation rdt_rcv(rcvpkt) && corrupt(rcvpkt) corresponds to the event in which a packet is received and is found to be in error.

Protocol rdt2.0 may look as if it works but, unfortunately, it has a fatal flaw. In particular, we haven't accounted for the possibility that the ACK or NAK packet could be corrupted! (Before proceeding on, you should think about how this problem may be fixed.) Unfortunately, our slight oversight is not as innocuous as it may seem. Minimally, we will need to add checksum bits to ACK/NAK packets in order to detect such errors. The more difficult question is how the protocol should recover from errors in ACK or NAK packets. The difficulty here is that if an ACK or NAK is corrupted, the sender has no way of knowing whether or not the receiver has correctly received the last piece of transmitted data. Consider three possibilities for handling corrupted ACKs or NAKs:

For the first possibility, consider what a human might do in the message-dictation scenario. If the speaker didn't understand the "OK" or "Please repeat that" reply from the receiver, the speaker would probably ask, "What did you say?" (thus introducing a new type of sender-to-receiver packet to our protocol). The receiver would then repeat the reply.
But what if the speaker's "What did you say?" is corrupted? The receiver, having no idea whether the garbled sentence was part of the dictation or a request to repeat the last reply, would probably then respond with "What did you say?" And then, of course, that response might be garbled. Clearly, we're heading down a difficult path.

A second alternative is to add enough checksum bits to allow the sender not only to detect, but also to recover from, bit errors. This solves the immediate problem for a channel that can corrupt packets but not lose them.

A third approach is for the sender simply to resend the current data packet when it receives a garbled ACK or NAK packet. This approach, however, introduces duplicate packets into the sender-to-receiver channel. The fundamental difficulty with duplicate packets is that the receiver doesn't know whether the ACK or NAK it last sent was received correctly at the sender. Thus, it cannot know a priori whether an arriving packet contains new data or is a retransmission!

A simple solution to this new problem (and one adopted in almost all existing data transfer protocols, including TCP) is to add a new field to the data packet and have the sender number its data packets by putting a sequence number into this field. The receiver then need only check this sequence number to determine whether or not the received packet is a retransmission. For this simple case of a stop-and-wait protocol, a 1-bit sequence number will suffice, since it will allow the receiver to know whether the sender is resending the previously transmitted packet (the received packet has the same sequence number as the most recently received packet) or a new packet (the sequence number changes, moving "forward" in modulo-2 arithmetic). Since we are currently assuming a channel that does not lose packets, ACK and NAK packets do not themselves need to indicate the sequence number of the packet they are acknowledging. The sender knows that a received ACK or NAK packet (whether garbled or not) was generated in response to its most recently transmitted data packet.

Figures 3.11 and 3.12 show the FSM description for rdt2.1, our fixed version of rdt2.0. The rdt2.1 sender and receiver FSMs each now have twice as many states as before. This is because the protocol state must now reflect whether the packet currently being sent (by the sender) or expected (at the receiver) should have a sequence number of 0 or 1. Note that the actions in those states where a 0-numbered packet is being sent or expected are mirror images of those where a 1-numbered packet is being sent or expected; the only differences have to do with the handling of the sequence number.

Figure 3.11 rdt2.1 sender

Figure 3.12 rdt2.1 receiver

Protocol rdt2.1 uses both positive and negative acknowledgments from the receiver to the sender. When an out-of-order packet is received, the receiver sends a positive acknowledgment for the packet it has received. When a corrupted packet is received, the receiver sends a negative acknowledgment. We can accomplish the same effect as a NAK if, instead of sending a NAK, we send an ACK for the last correctly received packet. A sender that receives two ACKs for the same packet (that is, receives duplicate ACKs) knows that the receiver did not correctly receive the packet following the packet that is being ACKed twice.
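This duplicate-ACK observation is the seed of the NAK-free protocol developed next. As a preview, here is a hedged Python sketch of a receiver that gets by with a 1-bit sequence number and ACKs alone, re-ACKing the last correctly received packet whenever it sees corruption or a duplicate; the dictionary packet format and the simple sum-based corrupt() test are our stand-ins for the real checksum machinery, not anything the protocol prescribes.

```
expected_seq = 0                          # the receiver's 1-bit state

def corrupt(pkt):
    # Stand-in for a real checksum test (for example, the Internet checksum).
    return pkt["checksum"] != sum(pkt["data"].encode()) % 65536

def deliver_data(data):
    print("delivered:", data)

def make_ack(seq):
    return {"type": "ACK", "seq": seq}    # the ACK carries the seq it acknowledges

def rdt_rcv(pkt):
    global expected_seq
    if corrupt(pkt) or pkt["seq"] != expected_seq:
        # Corrupted or duplicate: re-ACK the last correctly received packet,
        # which has the opposite 1-bit sequence number.
        return make_ack(1 - expected_seq)
    deliver_data(pkt["data"])
    ack = make_ack(expected_seq)
    expected_seq = 1 - expected_seq       # flip 0 <-> 1: modulo-2 arithmetic
    return ack

pkt = {"seq": 0, "data": "hi", "checksum": sum("hi".encode()) % 65536}
print(rdt_rcv(pkt))                       # delivers "hi" and ACKs seq 0
print(rdt_rcv(pkt))                       # duplicate: re-ACKs seq 0, no delivery
```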
Our NAK-free reliable data transfer protocol for a channel with bit errors is rdt2.2, shown in Figures 3.13 and 3.14. One subtle change between rdt2.1 and rdt2.2 is that the receiver must now include the sequence number of the packet being acknowledged by an ACK message (this is done by including the ACK, 0 or ACK, 1 argument in make_pkt() in the receiver FSM), and the sender must now check the sequence number of the packet being acknowledged by a received ACK message (this is done by including the 0 or 1 argument in isACK() in the sender FSM).

Reliable Data Transfer over a Lossy Channel with Bit Errors: rdt3.0

Suppose now that in addition to corrupting bits, the underlying channel can lose packets as well, a not uncommon event in today's computer networks (including the Internet). Two additional concerns must now be addressed by the protocol: how to detect packet loss and what to do when packet loss occurs. The use of checksumming, sequence numbers, ACK packets, and retransmissions---the techniques already developed in rdt2.2---will allow us to answer the latter concern. Handling the first concern will require adding a new protocol mechanism.

Figure 3.13 rdt2.2 sender

There are many possible approaches toward dealing with packet loss (several more of which are explored in the exercises at the end of the chapter). Here, we'll put the burden of detecting and recovering from lost packets on the sender. Suppose that the sender transmits a data packet and either that packet, or the receiver's ACK of that packet, gets lost. In either case, no reply is forthcoming at the sender from the receiver. If the sender is willing to wait long enough so that it is certain that a packet has been lost, it can simply retransmit the data packet. You should convince yourself that this protocol does indeed work.

But how long must the sender wait to be certain that something has been lost? The sender must clearly wait at least as long as a round-trip delay between the sender and receiver (which may include buffering at intermediate routers) plus whatever amount of time is needed to process a packet at the receiver. In many networks, this worst-case maximum delay is very difficult even to estimate, much less know with certainty. Moreover, the protocol should ideally recover from packet loss as soon as possible; waiting for a worst-case delay could mean a long wait until error recovery is initiated.

Figure 3.14 rdt2.2 receiver

The approach thus adopted in practice is for the sender to judiciously choose a time value such that packet loss is likely, although not guaranteed, to have happened. If an ACK is not received within this time, the packet is retransmitted. Note that if a packet experiences a particularly large delay, the sender may retransmit the packet even though neither the data packet nor its ACK has been lost. This introduces the possibility of duplicate data packets in the sender-to-receiver channel. Happily, protocol rdt2.2 already has enough functionality (that is, sequence numbers) to handle the case of duplicate packets.

From the sender's viewpoint, retransmission is a panacea. The sender does not know whether a data packet was lost, an ACK was lost, or if the packet or ACK was simply overly delayed. In all cases, the action is the same: retransmit. Implementing a time-based retransmission mechanism requires a countdown timer that can interrupt the sender after a given amount of time has expired.
The sender will thus need to be able to (1) start the timer each time a packet (either a first-time packet or a retransmission) is sent, (2) respond to a timer interrupt (taking appropriate actions), and (3) stop the timer. Figure 3.15 shows the sender FSM for rdt3.0, a protocol that reliably transfers data over a channel that can corrupt or lose packets; in the homework problems, you'll be asked to provide the receiver FSM for rdt3.0.

Figure 3.15 rdt3.0 sender

Figure 3.16 shows how the protocol operates with no lost or delayed packets and how it handles lost data packets. In Figure 3.16, time moves forward from the top of the diagram toward the bottom of the diagram; note that a receive time for a packet is necessarily later than the send time for a packet as a result of transmission and propagation delays. In Figures 3.16(b)--(d), the send-side brackets indicate the times at which a timer is set and later times out. Several of the more subtle aspects of this protocol are explored in the exercises at the end of this chapter. Because packet sequence numbers alternate between 0 and 1, protocol rdt3.0 is sometimes known as the alternating-bit protocol.

We have now assembled the key elements of a data transfer protocol. Checksums, sequence numbers, timers, and positive and negative acknowledgment packets each play a crucial and necessary role in the operation of the protocol. We now have a working reliable data transfer protocol!

Developing a protocol and FSM representation for a simple application-layer protocol

3.4.2 Pipelined Reliable Data Transfer Protocols

Protocol rdt3.0 is a functionally correct protocol, but it is unlikely that anyone would be happy with its performance, particularly in today's high-speed networks. At the heart of rdt3.0's performance problem is the fact that it is a stop-and-wait protocol.

Figure 3.16 Operation of rdt3.0, the alternating-bit protocol

Figure 3.17 Stop-and-wait versus pipelined protocol

To appreciate the performance impact of this stop-and-wait behavior, consider an idealized case of two hosts, one located on the West Coast of the United States and the other located on the East Coast, as shown in Figure 3.17. The speed-of-light round-trip propagation delay between these two end systems, RTT, is approximately 30 milliseconds. Suppose that they are connected by a channel with a transmission rate, R, of 1 Gbps (10^9 bits per second). With a packet size, L, of 1,000 bytes (8,000 bits) per packet, including both header fields and data, the time needed to actually transmit the packet into the 1 Gbps link is

d_trans = L/R = (8,000 bits/packet) / (10^9 bits/sec) = 8 microseconds

Figure 3.18(a) shows that with our stop-and-wait protocol, if the sender begins sending the packet at t = 0, then at t = L/R = 8 microseconds, the last bit enters the channel at the sender side. The packet then makes its 15-msec cross-country journey, with the last bit of the packet emerging at the receiver at t = RTT/2 + L/R = 15.008 msec. Assuming for simplicity that ACK packets are extremely small (so that we can ignore their transmission time) and that the receiver can send an ACK as soon as the last bit of a data packet is received, the ACK emerges back at the sender at t = RTT + L/R = 30.008 msec. At this point, the sender can now transmit the next message. Thus, in 30.008 msec, the sender was sending for only 0.008 msec.
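The arithmetic behind these timings is easy to check; a quick sketch with the example's values:

```
R = 1e9        # transmission rate: 1 Gbps
L = 8000       # packet size: 1,000 bytes = 8,000 bits
RTT = 0.030    # round-trip propagation delay: 30 msec

d_trans = L / R                        # 8e-06 sec  = 8 microseconds
t_arrive = RTT / 2 + d_trans           # 0.015008   = 15.008 msec
t_ack_back = RTT + d_trans             # 0.030008   = 30.008 msec
print(d_trans, t_arrive, t_ack_back)
```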
If we define the utilization of the sender (or the channel) as the fraction of time the sender is actually busy sending bits into the channel, the analysis in Figure 3.18(a) shows that the stop-and-wait protocol has a rather dismal sender utilization, U_sender, of

U_sender = (L/R) / (RTT + L/R) = 0.008/30.008 = 0.00027

Figure 3.18 Stop-and-wait and pipelined sending

That is, the sender was busy only 2.7 hundredths of one percent of the time! Viewed another way, the sender was able to send only 1,000 bytes in 30.008 milliseconds, an effective throughput of only 267 kbps---even though a 1 Gbps link was available! Imagine the unhappy network manager who just paid a fortune for a gigabit capacity link but manages to get a throughput of only 267 kilobits per second! This is a graphic example of how network protocols can limit the capabilities provided by the underlying network hardware. Also, we have neglected lower-layer protocol-processing times at the sender and receiver, as well as the processing and queuing delays that would occur at any intermediate routers between the sender and receiver. Including these effects would serve only to further increase the delay and further accentuate the poor performance.

The solution to this particular performance problem is simple: Rather than operate in a stop-and-wait manner, the sender is allowed to send multiple packets without waiting for acknowledgments, as illustrated in Figure 3.17(b). Figure 3.18(b) shows that if the sender is allowed to transmit three packets before having to wait for acknowledgments, the utilization of the sender is essentially tripled. Since the many in-transit sender-to-receiver packets can be visualized as filling a pipeline, this technique is known as pipelining. Pipelining has the following consequences for reliable data transfer protocols:

- The range of sequence numbers must be increased, since each in-transit packet (not counting retransmissions) must have a unique sequence number and there may be multiple, in-transit, unacknowledged packets.
- The sender and receiver sides of the protocols may have to buffer more than one packet. Minimally, the sender will have to buffer packets that have been transmitted but not yet acknowledged. Buffering of correctly received packets may also be needed at the receiver, as discussed below.

The range of sequence numbers needed and the buffering requirements will depend on the manner in which a data transfer protocol responds to lost, corrupted, and overly delayed packets. Two basic approaches toward pipelined error recovery can be identified: Go-Back-N and selective repeat.

3.4.3 Go-Back-N (GBN)

In a Go-Back-N (GBN) protocol, the sender is allowed to transmit multiple packets (when available) without waiting for an acknowledgment, but is constrained to have no more than some maximum allowable number, N, of unacknowledged packets in the pipeline. We describe the GBN protocol in some detail in this section. But before reading on, you are encouraged to play with the GBN applet (an awesome applet!) at the companion Web site.

Figure 3.19 Sender's view of sequence numbers in Go-Back-N

Figure 3.19 shows the sender's view of the range of sequence numbers in a GBN protocol. If we define base to be the sequence number of the oldest unacknowledged packet and nextseqnum to be the smallest unused sequence number (that is, the sequence number of the next packet to be sent), then four intervals in the range of sequence numbers can be identified.
Sequence numbers in the interval \[0, base−1\] correspond to packets that have already been transmitted and acknowledged. The interval \[base, nextseqnum−1\] corresponds to packets that have been sent but not yet acknowledged. Sequence numbers in the interval \[nextseqnum, base+N−1\] can be used for packets that can be sent immediately, should data arrive from the upper layer. Finally, sequence numbers greater than or equal to base+N cannot be used until an unacknowledged packet currently in the pipeline (specifically, the packet with sequence number base) has been acknowledged.

As suggested by Figure 3.19, the range of permissible sequence numbers for transmitted but not yet acknowledged packets can be viewed as a window of size N over the range of sequence numbers. As the protocol operates, this window slides forward over the sequence number space. For this reason, N is often referred to as the window size and the GBN protocol itself as a sliding-window protocol. You might be wondering why we would even limit the number of outstanding, unacknowledged packets to a value of N in the first place. Why not allow an unlimited number of such packets? We'll see in Section 3.5 that flow control is one reason to impose a limit on the sender. We'll examine another reason to do so in Section 3.7, when we study TCP congestion control.

In practice, a packet's sequence number is carried in a fixed-length field in the packet header. If k is the number of bits in the packet sequence number field, the range of sequence numbers is thus \[0, 2^k − 1\]. With a finite range of sequence numbers, all arithmetic involving sequence numbers must then be done using modulo-2^k arithmetic. (That is, the sequence number space can be thought of as a ring of size 2^k, where sequence number 2^k − 1 is immediately followed by sequence number 0.) Recall that rdt3.0 had a 1-bit sequence number and a range of sequence numbers of \[0, 1\]. Several of the problems at the end of this chapter explore the consequences of a finite range of sequence numbers. We will see in Section 3.5 that TCP has a 32-bit sequence number field, where TCP sequence numbers count bytes in the byte stream rather than packets.

Figures 3.20 and 3.21 give an extended FSM description of the sender and receiver sides of an ACK-based, NAK-free, GBN protocol. We refer to this FSM description as an extended FSM because we have added variables (similar to programming-language variables) for base and nextseqnum, and added operations on these variables and conditional actions involving these variables. Note that the extended FSM specification is now beginning to look somewhat like a programming-language specification. \[Bochman 1984\] provides an excellent survey of additional extensions to FSM techniques as well as other programming-language-based techniques for specifying protocols.

Figure 3.20 Extended FSM description of the GBN sender

Figure 3.21 Extended FSM description of the GBN receiver

The GBN sender must respond to three types of events:

Invocation from above. When rdt_send() is called from above, the sender first checks to see if the window is full, that is, whether there are N outstanding, unacknowledged packets. If the window is not full, a packet is created and sent, and variables are appropriately updated. If the window is full, the sender simply returns the data back to the upper layer, an implicit indication that the window is full.
The upper layer would presumably then have to try again later. In a real implementation, the sender would more likely have either buffered (but not immediately sent) this data, or would have a synchronization mechanism (for example, a semaphore or a flag) that would allow the upper layer to call rdt_send() only when the window is not full.

Receipt of an ACK. In our GBN protocol, an acknowledgment for a packet with sequence number n will be taken to be a cumulative acknowledgment, indicating that all packets with a sequence number up to and including n have been correctly received at the receiver. We'll come back to this issue shortly when we examine the receiver side of GBN.

A timeout event. The protocol's name, "Go-Back-N," is derived from the sender's behavior in the presence of lost or overly delayed packets. As in the stop-and-wait protocol, a timer will again be used to recover from lost data or acknowledgment packets. If a timeout occurs, the sender resends all packets that have been previously sent but that have not yet been acknowledged. Our sender in Figure 3.20 uses only a single timer, which can be thought of as a timer for the oldest transmitted but not yet acknowledged packet. If an ACK is received but there are still additional transmitted but not yet acknowledged packets, the timer is restarted. If there are no outstanding, unacknowledged packets, the timer is stopped.

The receiver's actions in GBN are also simple. If a packet with sequence number n is received correctly and is in order (that is, the data last delivered to the upper layer came from a packet with sequence number n−1), the receiver sends an ACK for packet n and delivers the data portion of the packet to the upper layer. In all other cases, the receiver discards the packet and resends an ACK for the most recently received in-order packet. Note that since packets are delivered one at a time to the upper layer, if packet k has been received and delivered, then all packets with a sequence number lower than k have also been delivered. Thus, the use of cumulative acknowledgments is a natural choice for GBN.

In our GBN protocol, the receiver discards out-of-order packets. Although it may seem silly and wasteful to discard a correctly received (but out-of-order) packet, there is some justification for doing so. Recall that the receiver must deliver data in order to the upper layer. Suppose now that packet n is expected, but packet n+1 arrives. Because data must be delivered in order, the receiver could buffer (save) packet n+1 and then deliver this packet to the upper layer after it had later received and delivered packet n. However, if packet n is lost, both it and packet n+1 will eventually be retransmitted as a result of the GBN retransmission rule at the sender. Thus, the receiver can simply discard packet n+1. The advantage of this approach is the simplicity of receiver buffering---the receiver need not buffer any out-of-order packets. Thus, while the sender must maintain the upper and lower bounds of its window and the position of nextseqnum within this window, the only piece of information the receiver need maintain is the sequence number of the next in-order packet. This value is held in the variable expectedseqnum, shown in the receiver FSM in Figure 3.21.
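The sender and receiver logic just described fits in a short Python sketch. This is a simplified rendering of the extended FSMs of Figures 3.20 and 3.21, not a production implementation: the timer and channel are stubbed out as plain functions, corruption checks are omitted, and the class and stub names are our own.

```
N = 4                                        # window size

def udt_send(pkt): print("udt_send:", pkt)   # stub: unreliable channel
def start_timer(): pass                      # stub: the single timer
def stop_timer(): pass
def deliver_data(data): print("deliver:", data)

class GBNSender:
    def __init__(self):
        self.base = 1                        # oldest unacknowledged seq number
        self.nextseqnum = 1                  # smallest unused seq number
        self.sndpkt = {}                     # copies of unACKed packets

    def rdt_send(self, data):
        if self.nextseqnum < self.base + N:  # window not yet full
            pkt = {"seq": self.nextseqnum, "data": data}
            self.sndpkt[self.nextseqnum] = pkt
            udt_send(pkt)
            if self.base == self.nextseqnum:
                start_timer()                # timer runs for the oldest packet
            self.nextseqnum += 1
            return True
        return False                         # window full: refuse the data

    def ack_rcv(self, acknum):               # cumulative ACK: all seq <= acknum
        self.base = acknum + 1
        if self.base == self.nextseqnum:
            stop_timer()                     # nothing left outstanding
        else:
            start_timer()                    # restart for the next oldest packet

    def timeout(self):
        start_timer()                        # go back N: resend all unACKed
        for seq in range(self.base, self.nextseqnum):
            udt_send(self.sndpkt[seq])

class GBNReceiver:
    def __init__(self):
        self.expectedseqnum = 1              # the only state the receiver keeps

    def pkt_rcv(self, pkt):
        if pkt["seq"] == self.expectedseqnum:        # in order: deliver and ACK
            deliver_data(pkt["data"])
            udt_send({"acknum": self.expectedseqnum})
            self.expectedseqnum += 1
        else:                                # discard; re-ACK last in-order pkt
            udt_send({"acknum": self.expectedseqnum - 1})
```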
Of course, the disadvantage of throwing away a correctly received packet is that the subsequent retransmission of that packet might be lost or garbled and thus even more retransmissions would be required. Figure 3.22 shows the operation of the GBN protocol for the case of a window size of four packets. Because of this window size limitation, the sender sends packets 0 through 3 but then must wait for one or more of these packets to be acknowledged before proceeding. As each successive ACK (for example, ACK0 and ACK1) is received, the window slides forward and the sender can transmit one new packet (pkt4 and pkt5, respectively). On the receiver side, packet 2 is lost and thus packets 3, 4, and 5 are found to be out of order and are discarded.

Before closing our discussion of GBN, it is worth noting that an implementation of this protocol in a protocol stack would likely have a structure similar to that of the extended FSM in Figure 3.20. The implementation would also likely be in the form of various procedures that implement the actions to be taken in response to the various events that can occur. In such event-based programming, the various procedures are called (invoked) either by other procedures in the protocol stack, or as the result of an interrupt. In the sender, these events would be (1) a call from the upper-layer entity to invoke rdt_send(), (2) a timer interrupt, and (3) a call from the lower layer to invoke rdt_rcv() when a packet arrives. The programming exercises at the end of this chapter will give you a chance to actually implement these routines in a simulated, but realistic, network setting. We note here that the GBN protocol incorporates almost all of the techniques that we will encounter when we study the reliable data transfer components of TCP in Section 3.5. These techniques include the use of sequence numbers, cumulative acknowledgments, checksums, and a timeout/retransmit operation.

Figure 3.22 Go-Back-N in operation

3.4.4 Selective Repeat (SR)

The GBN protocol allows the sender to potentially "fill the pipeline" in Figure 3.17 with packets, thus avoiding the channel utilization problems we noted with stop-and-wait protocols. There are, however, scenarios in which GBN itself suffers from performance problems. In particular, when the window size and bandwidth-delay product are both large, many packets can be in the pipeline. A single packet error can thus cause GBN to retransmit a large number of packets, many unnecessarily. As the probability of channel errors increases, the pipeline can become filled with these unnecessary retransmissions. Imagine, in our message-dictation scenario, that every time a word was garbled, the surrounding 1,000 words (for example, a window size of 1,000 words) had to be repeated. The dictation would be slowed by all of the reiterated words.

As the name suggests, selective-repeat protocols avoid unnecessary retransmissions by having the sender retransmit only those packets that it suspects were received in error (that is, were lost or corrupted) at the receiver. This individual, as-needed, retransmission will require that the receiver individually acknowledge correctly received packets. A window size of N will again be used to limit the number of outstanding, unacknowledged packets in the pipeline. However, unlike GBN, the sender will have already received ACKs for some of the packets in the window. Figure 3.23 shows the SR sender's view of the sequence number space.
Figure 3.24 details the various actions taken by the SR sender. The SR receiver will acknowledge a correctly received packet whether or not it is in order. Out-of-order packets are buffered until any missing packets (that is, packets with lower sequence numbers) are received, at which point a batch of packets can be delivered in order to the upper layer. Figure 3.25 itemizes the various actions taken by the SR receiver. Figure 3.26 shows an example of SR operation in the presence of lost packets. Note that in Figure 3.26, the receiver initially buffers packets 3, 4, and 5, and delivers them together with packet 2 to the upper layer when packet 2 is finally received.

Figure 3.23 Selective-repeat (SR) sender and receiver views of sequence-number space

Figure 3.24 SR sender events and actions

Figure 3.25 SR receiver events and actions

It is important to note that in Step 2 in Figure 3.25, the receiver reacknowledges (rather than ignores) already received packets with certain sequence numbers below the current window base. You should convince yourself that this reacknowledgment is indeed needed. Given the sender and receiver sequence number spaces in Figure 3.23, for example, if there is no ACK for packet send_base propagating from the receiver to the sender, the sender will eventually retransmit packet send_base, even though it is clear (to us, not the sender!) that the receiver has already received that packet. If the receiver were not to acknowledge this packet, the sender's window would never move forward! This example illustrates an important aspect of SR protocols (and many other protocols as well). The sender and receiver will not always have an identical view of what has been received correctly and what has not. For SR protocols, this means that the sender and receiver windows will not always coincide.

Figure 3.26 SR operation

The lack of synchronization between sender and receiver windows has important consequences when we are faced with the reality of a finite range of sequence numbers. Consider what could happen, for example, with a finite range of four packet sequence numbers, 0, 1, 2, 3, and a window size of three. Suppose packets 0 through 2 are transmitted and correctly received and acknowledged at the receiver. At this point, the receiver's window is over the fourth, fifth, and sixth packets, which have sequence numbers 3, 0, and 1, respectively. Now consider two scenarios. In the first scenario, shown in Figure 3.27(a), the ACKs for the first three packets are lost and the sender retransmits these packets. The receiver thus next receives a packet with sequence number 0---a copy of the first packet sent. In the second scenario, shown in Figure 3.27(b), the ACKs for the first three packets are all delivered correctly. The sender thus moves its window forward and sends the fourth, fifth, and sixth packets, with sequence numbers 3, 0, and 1, respectively. The packet with sequence number 3 is lost, but the packet with sequence number 0 arrives---a packet containing new data.

Now consider the receiver's viewpoint in Figure 3.27, which has a figurative curtain between the sender and the receiver, since the receiver cannot "see" the actions taken by the sender. All the receiver observes is the sequence of messages it receives from the channel and sends into the channel. As far as it is concerned, the two scenarios in Figure 3.27 are identical.
There is no way of distinguishing the retransmission of the first packet from an original transmission of the fifth packet. Clearly, a window size that is 1 less than the size of the sequence number space won't work. But how small must the window size be? A problem at the end of the chapter asks you to show that the window size must be less than or equal to half the size of the sequence number space for SR protocols.

Figure 3.27 SR receiver dilemma with too-large windows: A new packet or a retransmission?

At the companion Web site, you will find an applet that animates the operation of the SR protocol. Try performing the same experiments that you did with the GBN applet. Do the results agree with what you expect?
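The moral of Figure 3.27 can be reduced to one line of arithmetic. A sketch, where k is the number of bits in the sequence number field (the function name is ours, and the rule itself is the result the chapter's exercise asks you to prove):

```
def max_sr_window(k):
    # The sequence space has 2**k numbers; the sender and receiver windows
    # can drift a full window apart, so retransmissions stay distinguishable
    # from new packets only if the window is at most half the sequence space.
    return 2**k // 2

print(max_sr_window(1))   # 1: stop-and-wait, as in rdt3.0's alternating bit
print(max_sr_window(2))   # 2: the window of 3 in Figure 3.27 is one too big
```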
This completes our discussion of reliable data transfer protocols. We've covered a lot of ground and introduced numerous mechanisms that together provide for reliable data transfer. Table 3.1 summarizes these mechanisms. Now that we have seen all of these mechanisms in operation and can see the "big picture," we encourage you to review this section again to see how these mechanisms were incrementally added to cover increasingly complex (and realistic) models of the channel connecting the sender and receiver, or to improve the performance of the protocols.

Table 3.1 Summary of reliable data transfer mechanisms and their use

| Mechanism | Use, Comments |
|-----------|---------------|
| Checksum | Used to detect bit errors in a transmitted packet. |
| Timer | Used to timeout/retransmit a packet, possibly because the packet (or its ACK) was lost within the channel. Because timeouts can occur when a packet is delayed but not lost (premature timeout), or when a packet has been received by the receiver but the receiver-to-sender ACK has been lost, duplicate copies of a packet may be received by a receiver. |
| Sequence number | Used for sequential numbering of packets of data flowing from sender to receiver. Gaps in the sequence numbers of received packets allow the receiver to detect a lost packet. Packets with duplicate sequence numbers allow the receiver to detect duplicate copies of a packet. |
| Acknowledgment | Used by the receiver to tell the sender that a packet or set of packets has been received correctly. Acknowledgments will typically carry the sequence number of the packet or packets being acknowledged. Acknowledgments may be individual or cumulative, depending on the protocol. |
| Negative acknowledgment | Used by the receiver to tell the sender that a packet has not been received correctly. Negative acknowledgments will typically carry the sequence number of the packet that was not received correctly. |
| Window, pipelining | The sender may be restricted to sending only packets with sequence numbers that fall within a given range. By allowing multiple packets to be transmitted but not yet acknowledged, sender utilization can be increased over a stop-and-wait mode of operation. We'll see shortly that the window size may be set on the basis of the receiver's ability to receive and buffer messages, or the level of congestion in the network, or both. |

Let's conclude our discussion of reliable data transfer protocols by considering one remaining assumption in our underlying channel model. Recall that we have assumed that packets cannot be reordered within the channel between the sender and receiver. This is generally a reasonable assumption when the sender and receiver are connected by a single physical wire. However, when the "channel" connecting the two is a network, packet reordering can occur. One manifestation of packet reordering is that old copies of a packet with a sequence or acknowledgment number of x can appear, even though neither the sender's nor the receiver's window contains x. With packet reordering, the channel can be thought of as essentially buffering packets and spontaneously emitting these packets at any point in the future. Because sequence numbers may be reused, some care must be taken to guard against such duplicate packets. The approach taken in practice is to ensure that a sequence number is not reused until the sender is "sure" that any previously sent packets with sequence number x are no longer in the network. This is done by assuming that a packet cannot "live" in the network for longer than some fixed maximum amount of time. A maximum packet lifetime of approximately three minutes is assumed in the TCP extensions for high-speed networks \[RFC 1323\]. \[Sunshine 1978\] describes a method for using sequence numbers such that reordering problems can be completely avoided.

3.5 Connection-Oriented Transport: TCP

Now that we have covered the underlying principles of reliable data transfer, let's turn to TCP---the Internet's transport-layer, connection-oriented, reliable transport protocol. In this section, we'll see that in order to provide reliable data transfer, TCP relies on many of the underlying principles discussed in the previous section, including error detection, retransmissions, cumulative acknowledgments, timers, and header fields for sequence and acknowledgment numbers. TCP is defined in RFC 793, RFC 1122, RFC 1323, RFC 2018, and RFC 2581.

3.5.1 The TCP Connection

TCP is said to be connection-oriented because before one application process can begin to send data to another, the two processes must first "handshake" with each other---that is, they must send some preliminary segments to each other to establish the parameters of the ensuing data transfer. As part of TCP connection establishment, both sides of the connection will initialize many TCP state variables (many of which will be discussed in this section and in Section 3.7) associated with the TCP connection. The TCP "connection" is not an end-to-end TDM or FDM circuit as in a circuit-switched network. Instead, the "connection" is a logical one, with common state residing only in the TCPs in the two communicating end systems. Recall that because the TCP protocol runs only in the end systems and not in the intermediate network elements (routers and link-layer switches), the intermediate network elements do not maintain TCP connection state. In fact, the intermediate routers are completely oblivious to TCP connections; they see datagrams, not connections.

A TCP connection provides a full-duplex service: If there is a TCP connection between Process A on one host and Process B on another host, then application-layer data can flow from Process A to Process B at the same time as application-layer data flows from Process B to Process A. A TCP connection is also always point-to-point, that is, between a single sender and a single receiver. So-called "multicasting" (see the online supplementary materials for this text)---the transfer of data from one sender to many receivers in a single send operation---is not possible with TCP. With TCP, two hosts are company and three are a crowd! Let's now take a look at how a TCP connection is established.
Suppose a process running in one host wants to initiate a connection with another process in another host. Recall that the process that is initiating the connection is called the client process, while the other process is called the server process.

CASE HISTORY

Vinton Cerf, Robert Kahn, and TCP/IP

In the early 1970s, packet-switched networks began to proliferate, with the ARPAnet---the precursor of the Internet---being just one of many networks. Each of these networks had its own protocol. Two researchers, Vinton Cerf and Robert Kahn, recognized the importance of interconnecting these networks and invented a cross-network protocol called TCP/IP, which stands for Transmission Control Protocol/Internet Protocol. Although Cerf and Kahn began by seeing the protocol as a single entity, it was later split into its two parts, TCP and IP, which operated separately. Cerf and Kahn published a paper on TCP/IP in May 1974 in IEEE Transactions on Communications Technology \[Cerf 1974\]. The TCP/IP protocol, which is the bread and butter of today's Internet, was devised before PCs, workstations, smartphones, and tablets, before the proliferation of Ethernet, cable, and DSL, WiFi, and other access network technologies, and before the Web, social media, and streaming video. Cerf and Kahn saw the need for a networking protocol that, on the one hand, provides broad support for yet-to-be-defined applications and, on the other hand, allows arbitrary hosts and link-layer protocols to interoperate. In 2004, Cerf and Kahn received the ACM's Turing Award, considered the "Nobel Prize of Computing," for "pioneering work on internetworking, including the design and implementation of the Internet's basic communications protocols, TCP/IP, and for inspired leadership in networking."

The client application process first informs the client transport layer that it wants to establish a connection to a process in the server. Recall from Section 2.7.2 that a Python client program does this by issuing the command

clientSocket.connect((serverName, serverPort))

where serverName is the name of the server and serverPort identifies the process on the server. TCP in the client then proceeds to establish a TCP connection with TCP in the server. At the end of this section we discuss in some detail the connection-establishment procedure. For now it suffices to know that the client first sends a special TCP segment; the server responds with a second special TCP segment; and finally the client responds again with a third special segment. The first two segments carry no payload, that is, no application-layer data; the third of these segments may carry a payload. Because three segments are sent between the two hosts, this connection-establishment procedure is often referred to as a three-way handshake.

Once a TCP connection is established, the two application processes can send data to each other. Let's consider the sending of data from the client process to the server process. The client process passes a stream of data through the socket (the door of the process), as described in Section 2.7. Once the data passes through the door, the data is in the hands of TCP running in the client. As shown in Figure 3.28, TCP directs this data to the connection's send buffer, which is one of the buffers that is set aside during the initial three-way handshake. From time to time, TCP will grab chunks of data from the send buffer and pass the data to the network layer.
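From the application's side of the socket, this buffering is directly visible: in Python's socket API, a sendall() call can return as soon as the bytes have been copied into TCP's send buffer, well before they have been transmitted or acknowledged. A small sketch in the style of the Section 2.7 examples, where serverName and serverPort are placeholder values:

```
from socket import socket, AF_INET, SOCK_STREAM

serverName, serverPort = "example.com", 12000    # placeholder endpoint

clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName, serverPort))   # three-way handshake happens here

# Returns once the bytes are accepted into TCP's send buffer; TCP itself
# decides when to form segments and pass them to the network layer.
clientSocket.sendall(b"a stream of bytes, not messages")
clientSocket.close()
```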
Interestingly, the TCP specification \[RFC 793\] is very laid back about specifying when TCP should actually send buffered data, stating that TCP should "send that data in segments at its own convenience." The maximum amount of data that can be grabbed and placed in a segment is limited by the maximum segment size (MSS). The MSS is typically set by first determining the length of the largest link-layer frame that can be sent by the local sending host (the so-called maximum transmission unit, MTU), and then setting the MSS to ensure that a TCP segment (when encapsulated in an IP datagram) plus the TCP/IP header length (typically 40 bytes) will fit into a single link-layer frame. Both Ethernet and PPP link-layer protocols have an MTU of 1,500 bytes. Thus a typical value of MSS is 1,460 bytes. Approaches have also been proposed for discovering the path MTU---the largest link-layer frame that can be sent on all links from source to destination \[RFC 1191\]---and setting the MSS based on the path MTU value. Note that the MSS is the maximum amount of application-layer data in the segment, not the maximum size of the TCP segment including headers. (This terminology is confusing, but we have to live with it, as it is well entrenched.)
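The MTU-to-MSS arithmetic is a one-liner worth checking; the 40-byte figure below assumes 20-byte TCP and IP headers with no options:

```
def typical_mss(mtu=1500, ip_header=20, tcp_header=20):
    # MSS counts application data only: the link payload minus TCP/IP headers.
    return mtu - ip_header - tcp_header

print(typical_mss())   # 1460 for Ethernet's 1,500-byte MTU
```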
TCP pairs each chunk of client data with a TCP header, thereby forming TCP segments. The segments are passed down to the network layer, where they are separately encapsulated within network-layer IP datagrams. The IP datagrams are then sent into the network. When TCP receives a segment at the other end, the segment's data is placed in the TCP connection's receive buffer, as shown in Figure 3.28. The application reads the stream of data from this buffer. Each side of the connection has its own send buffer and its own receive buffer. (You can see the online flow-control applet at http://www.awl.com/kurose-ross, which provides an animation of the send and receive buffers.)

Figure 3.28 TCP send and receive buffers

We see from this discussion that a TCP connection consists of buffers, variables, and a socket connection to a process in one host, and another set of buffers, variables, and a socket connection to a process in another host. As mentioned earlier, no buffers or variables are allocated to the connection in the network elements (routers, switches, and repeaters) between the hosts.

3.5.2 TCP Segment Structure

Having taken a brief look at the TCP connection, let's examine the TCP segment structure. The TCP segment consists of header fields and a data field. The data field contains a chunk of application data. As mentioned above, the MSS limits the maximum size of a segment's data field. When TCP sends a large file, such as an image as part of a Web page, it typically breaks the file into chunks of size MSS (except for the last chunk, which will often be less than the MSS). Interactive applications, however, often transmit data chunks that are smaller than the MSS; for example, with remote login applications like Telnet, the data field in the TCP segment is often only one byte. Because the TCP header is typically 20 bytes (12 bytes more than the UDP header), segments sent by Telnet may be only 21 bytes in length.

Figure 3.29 shows the structure of the TCP segment. As with UDP, the header includes source and destination port numbers, which are used for multiplexing/demultiplexing data from/to upper-layer applications. Also, as with UDP, the header includes a checksum field. A TCP segment header also contains the following fields:

- The 32-bit sequence number field and the 32-bit acknowledgment number field are used by the TCP sender and receiver in implementing a reliable data transfer service, as discussed below.
- The 16-bit receive window field is used for flow control. We will see shortly that it is used to indicate the number of bytes that a receiver is willing to accept.
- The 4-bit header length field specifies the length of the TCP header in 32-bit words. The TCP header can be of variable length due to the TCP options field. (Typically, the options field is empty, so that the length of the typical TCP header is 20 bytes.)
- The optional and variable-length options field is used when a sender and receiver negotiate the maximum segment size (MSS) or as a window scaling factor for use in high-speed networks. A timestamping option is also defined. See RFC 854 and RFC 1323 for additional details.
- The flag field contains 8 bits. The ACK bit is used to indicate that the value carried in the acknowledgment field is valid; that is, the segment contains an acknowledgment for a segment that has been successfully received. The RST, SYN, and FIN bits are used for connection setup and teardown, as we will discuss at the end of this section. The CWR and ECE bits are used in explicit congestion notification, as discussed in Section 3.7.2. Setting the PSH bit indicates that the receiver should pass the data to the upper layer immediately. Finally, the URG bit is used to indicate that there is data in this segment that the sending-side upper-layer entity has marked as "urgent." The location of the last byte of this urgent data is indicated by the 16-bit urgent data pointer field. TCP must inform the receiving-side upper-layer entity when urgent data exists and pass it a pointer to the end of the urgent data. (In practice, the PSH, URG, and the urgent data pointer are not used. However, we mention these fields for completeness.)

Figure 3.29 TCP segment structure

Our experience as teachers is that our students sometimes find discussion of packet formats rather dry and perhaps a bit boring. For a fun and fanciful look at TCP header fields, particularly if you love Legos™ as we do, see \[Pomeranz 2010\].

Sequence Numbers and Acknowledgment Numbers

Two of the most important fields in the TCP segment header are the sequence number field and the acknowledgment number field. These fields are a critical part of TCP's reliable data transfer service. But before discussing how these fields are used to provide reliable data transfer, let us first explain what exactly TCP puts in these fields.

Figure 3.30 Dividing file data into TCP segments

TCP views data as an unstructured, but ordered, stream of bytes. TCP's use of sequence numbers reflects this view in that sequence numbers are over the stream of transmitted bytes and not over the series of transmitted segments. The sequence number for a segment is therefore the byte-stream number of the first byte in the segment. Let's look at an example. Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection. The TCP in Host A will implicitly number each byte in the data stream. Suppose that the data stream consists of a file of 500,000 bytes, that the MSS is 1,000 bytes, and that the first byte of the data stream is numbered 0. As shown in Figure 3.30, TCP constructs 500 segments out of the data stream.
The first segment gets assigned sequence number 0, the second segment gets assigned sequence number 1,000, the third segment gets assigned sequence number 2,000, and so on. Each sequence number is inserted in the sequence number field in the header of the appropriate TCP segment.

Now let's consider acknowledgment numbers. These are a little trickier than sequence numbers. Recall that TCP is full-duplex, so that Host A may be receiving data from Host B while it sends data to Host B (as part of the same TCP connection). Each of the segments that arrive from Host B has a sequence number for the data flowing from B to A. The acknowledgment number that Host A puts in its segment is the sequence number of the next byte Host A is expecting from Host B. It is good to look at a few examples to understand what is going on here. Suppose that Host A has received all bytes numbered 0 through 535 from B and suppose that it is about to send a segment to Host B. Host A is waiting for byte 536 and all the subsequent bytes in Host B's data stream. So Host A puts 536 in the acknowledgment number field of the segment it sends to B.

As another example, suppose that Host A has received one segment from Host B containing bytes 0 through 535 and another segment containing bytes 900 through 1,000. For some reason Host A has not yet received bytes 536 through 899. In this example, Host A is still waiting for byte 536 (and beyond) in order to re-create B's data stream. Thus, A's next segment to B will contain 536 in the acknowledgment number field. Because TCP only acknowledges bytes up to the first missing byte in the stream, TCP is said to provide cumulative acknowledgments.

This last example also brings up an important but subtle issue. Host A received the third segment (bytes 900 through 1,000) before receiving the second segment (bytes 536 through 899). Thus, the third segment arrived out of order. The subtle issue is: What does a host do when it receives out-of-order segments in a TCP connection? Interestingly, the TCP RFCs do not impose any rules here and leave the decision up to the programmers of a TCP implementation. There are basically two choices: either (1) the receiver immediately discards out-of-order segments (which, as we discussed earlier, can simplify receiver design), or (2) the receiver keeps the out-of-order bytes and waits for the missing bytes to fill in the gaps. Clearly, the latter choice is more efficient in terms of network bandwidth, and is the approach taken in practice.

In Figure 3.30, we assumed that the initial sequence number was zero. In truth, both sides of a TCP connection randomly choose an initial sequence number. This is done to minimize the possibility that a segment that is still present in the network from an earlier, already-terminated connection between two hosts is mistaken for a valid segment in a later connection between these same two hosts (which also happen to be using the same port numbers as the old connection) \[Sunshine 1978\].

Telnet: A Case Study for Sequence and Acknowledgment Numbers

Telnet, defined in RFC 854, is a popular application-layer protocol used for remote login. It runs over TCP and is designed to work between any pair of hosts. Unlike the bulk data transfer applications discussed in Chapter 2, Telnet is an interactive application. We discuss a Telnet example here, as it nicely illustrates TCP sequence and acknowledgment numbers.
We note that many users now prefer to use the SSH protocol rather than Telnet, since data sent in a Telnet connection (including passwords!) are not encrypted, making Telnet vulnerable to eavesdropping attacks (as discussed in Section 8.7). Suppose Host A initiates a Telnet session with Host B. Because Host A initiates the session, it is labeled the client, and Host B is labeled the server. Each character typed by the user (at the client) will be sent to the remote host; the remote host will send back a copy of each character, which will be displayed on the Telnet user's screen. This "echo back" is used to ensure that characters seen by the Telnet user have already been received and processed at the remote site. Each character thus traverses the network twice between the time the user hits the key and the time the character is displayed on the user's monitor.

Now suppose the user types a single letter, 'C,' and then grabs a coffee. Let's examine the TCP segments that are sent between the client and server. As shown in Figure 3.31, we suppose the starting sequence numbers are 42 and 79 for the client and server, respectively. Recall that the sequence number of a segment is the sequence number of the first byte in the data field. Thus, the first segment sent from the client will have sequence number 42; the first segment sent from the server will have sequence number 79. Recall that the acknowledgment number is the sequence number of the next byte of data that the host is waiting for. After the TCP connection is established but before any data is sent, the client is waiting for byte 79 and the server is waiting for byte 42.

Figure 3.31 Sequence and acknowledgment numbers for a simple Telnet application over TCP

As shown in Figure 3.31, three segments are sent. The first segment is sent from the client to the server, containing the 1-byte ASCII representation of the letter 'C' in its data field. This first segment also has 42 in its sequence number field, as we just described. Also, because the client has not yet received any data from the server, this first segment will have 79 in its acknowledgment number field.

The second segment is sent from the server to the client. It serves a dual purpose. First it provides an acknowledgment of the data the server has received. By putting 43 in the acknowledgment field, the server is telling the client that it has successfully received everything up through byte 42 and is now waiting for bytes 43 onward. The second purpose of this segment is to echo back the letter 'C.' Thus, the second segment has the ASCII representation of 'C' in its data field. This second segment has the sequence number 79, the initial sequence number of the server-to-client data flow of this TCP connection, as this is the very first byte of data that the server is sending. Note that the acknowledgment for client-to-server data is carried in a segment carrying server-to-client data; this acknowledgment is said to be piggybacked on the server-to-client data segment.

The third segment is sent from the client to the server. Its sole purpose is to acknowledge the data it has received from the server. (Recall that the second segment contained data---the letter 'C'---from the server to the client.) This segment has an empty data field (that is, the acknowledgment is not being piggybacked with any client-to-server data).
### 3.5.3 Round-Trip Time Estimation and Timeout

TCP, like our rdt protocol in Section 3.4, uses a timeout/retransmit mechanism to recover from lost segments. Although this is conceptually simple, many subtle issues arise when we implement a timeout/retransmit mechanism in an actual protocol such as TCP. Perhaps the most obvious question is the length of the timeout intervals. Clearly, the timeout should be larger than the connection's round-trip time (RTT), that is, the time from when a segment is sent until it is acknowledged; otherwise, unnecessary retransmissions would be sent. But how much larger? How should the RTT be estimated in the first place? Should a timer be associated with each and every unacknowledged segment? So many questions! Our discussion in this section is based on the TCP work in \[Jacobson 1988\] and the current IETF recommendations for managing TCP timers \[RFC 6298\].

**Estimating the Round-Trip Time**

Let's begin our study of TCP timer management by considering how TCP estimates the round-trip time between sender and receiver. This is accomplished as follows. The sample RTT, denoted SampleRTT, for a segment is the amount of time between when the segment is sent (that is, passed to IP) and when an acknowledgment for the segment is received. Instead of measuring a SampleRTT for every transmitted segment, most TCP implementations take only one SampleRTT measurement at a time. That is, at any point in time, the SampleRTT is being estimated for only one of the transmitted but currently unacknowledged segments, leading to a new value of SampleRTT approximately once every RTT. Also, TCP never computes a SampleRTT for a segment that has been retransmitted; it only measures SampleRTT for segments that have been transmitted once \[Karn 1987\]. (A problem at the end of the chapter asks you to consider why.)

Obviously, the SampleRTT values will fluctuate from segment to segment due to congestion in the routers and to the varying load on the end systems. Because of this fluctuation, any given SampleRTT value may be atypical. In order to estimate a typical RTT, it is therefore natural to take some sort of average of the SampleRTT values. TCP maintains an average, called EstimatedRTT, of the SampleRTT values. Upon obtaining a new SampleRTT, TCP updates EstimatedRTT according to the following formula:

EstimatedRTT = (1 − α) ⋅ EstimatedRTT + α ⋅ SampleRTT

The formula above is written in the form of a programming-language statement---the new value of EstimatedRTT is a weighted combination of the previous value of EstimatedRTT and the new value for SampleRTT. The recommended value of α is 0.125 (that is, 1/8) \[RFC 6298\], in which case the formula becomes:

EstimatedRTT = 0.875 ⋅ EstimatedRTT + 0.125 ⋅ SampleRTT

Note that EstimatedRTT is a weighted average of the SampleRTT values. As discussed in a homework problem at the end of this chapter, this weighted average puts more weight on recent samples than on old samples. This is natural, as the more recent samples better reflect the current congestion in the network. In statistics, such an average is called an exponential weighted moving average (EWMA). The word "exponential" appears in EWMA because the weight of a given SampleRTT decays exponentially fast as the updates proceed. In the homework problems, you will be asked to derive the exponential term in EstimatedRTT.
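Unrolling the update makes the exponential decay visible. The following sketch (illustrative only; the loop simply tabulates the weights implied by the formula) shows that with α = 1/8 the weight on a given sample roughly halves every five updates:

```python
# The update EstimatedRTT = (1 - a)*EstimatedRTT + a*SampleRTT, applied
# repeatedly, gives the sample received k updates ago a weight of
# a * (1 - a)**k, which decays exponentially in k.
alpha = 0.125
for k in range(6):
    weight = alpha * (1 - alpha) ** k
    print(f"sample {k} updates ago: weight = {weight:.4f}")
# -> 0.1250, 0.1094, 0.0957, 0.0837, 0.0732, 0.0641, ...
```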
Figure 3.32 shows the SampleRTT values and EstimatedRTT for a value of α = 1/8 for a TCP connection between gaia.cs.umass.edu (in Amherst, Massachusetts) and fantasia.eurecom.fr (in the south of France). Clearly, the variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.

Figure 3.32 RTT samples and RTT estimates

In addition to having an estimate of the RTT, it is also valuable to have a measure of the variability of the RTT. \[RFC 6298\] defines the RTT variation, DevRTT, as an estimate of how much SampleRTT typically deviates from EstimatedRTT:

DevRTT = (1 − β) ⋅ DevRTT + β ⋅ |SampleRTT − EstimatedRTT|

Note that DevRTT is an EWMA of the difference between SampleRTT and EstimatedRTT. If the SampleRTT values have little fluctuation, then DevRTT will be small; on the other hand, if there is a lot of fluctuation, DevRTT will be large. The recommended value of β is 0.25.

**PRINCIPLES IN PRACTICE**

TCP provides reliable data transfer by using positive acknowledgments and timers in much the same way that we studied in Section 3.4. TCP acknowledges data that has been received correctly, and it retransmits segments when segments or their corresponding acknowledgments are thought to be lost or corrupted. Certain versions of TCP also have an implicit NAK mechanism---with TCP's fast retransmit mechanism, the receipt of three duplicate ACKs for a given segment serves as an implicit NAK for the following segment, triggering retransmission of that segment before timeout. TCP uses sequence numbers to allow the receiver to identify lost or duplicate segments. Just as in the case of our reliable data transfer protocol, rdt3.0, TCP cannot itself tell for certain whether a segment, or its ACK, is lost, corrupted, or overly delayed. At the sender, TCP's response will be the same: retransmit the segment in question. TCP also uses pipelining, allowing the sender to have multiple transmitted but yet-to-be-acknowledged segments outstanding at any given time. We saw earlier that pipelining can greatly improve a session's throughput when the ratio of the segment size to round-trip delay is small. The specific number of outstanding, unacknowledged segments that a sender can have is determined by TCP's flow-control and congestion-control mechanisms. TCP flow control is discussed at the end of this section; TCP congestion control is discussed in Section 3.7. For the time being, we simply need to be aware that the TCP sender uses pipelining.

**Setting and Managing the Retransmission Timeout Interval**

Given values of EstimatedRTT and DevRTT, what value should be used for TCP's timeout interval? Clearly, the interval should be greater than or equal to EstimatedRTT, or unnecessary retransmissions would be sent. But the timeout interval should not be too much larger than EstimatedRTT; otherwise, when a segment is lost, TCP would not quickly retransmit the segment, leading to large data transfer delays. It is therefore desirable to set the timeout equal to the EstimatedRTT plus some margin. The margin should be large when there is a lot of fluctuation in the SampleRTT values; it should be small when there is little fluctuation. The value of DevRTT should thus come into play here.
All of these considerations are taken into account in TCP's method for determining the retransmission timeout interval:

TimeoutInterval = EstimatedRTT + 4 ⋅ DevRTT

An initial TimeoutInterval value of 1 second is recommended \[RFC 6298\]. Also, when a timeout occurs, the value of TimeoutInterval is doubled to avoid a premature timeout occurring for a subsequent segment that will soon be acknowledged. However, as soon as a segment is received and EstimatedRTT is updated, the TimeoutInterval is again computed using the formula above.
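Putting the three formulas together, a compact sketch of the estimator might look as follows (a minimal illustration assuming the RFC 6298 constants α = 0.125 and β = 0.25 and the recommended 1-second initial timeout; the class and method names are our own, and the sample RTT values are fabricated):

```python
class RttEstimator:
    """Minimal sketch of RFC 6298-style RTT estimation and timeout
    management; a study aid, not a drop-in implementation."""
    ALPHA, BETA = 0.125, 0.25

    def __init__(self):
        self.estimated_rtt = None      # no SampleRTT measured yet
        self.dev_rtt = 0.0
        self.timeout_interval = 1.0    # recommended initial value (seconds)

    def on_ack_sample(self, sample_rtt):
        if self.estimated_rtt is None:           # first measurement
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:                                    # EWMA updates
            self.dev_rtt = ((1 - self.BETA) * self.dev_rtt
                            + self.BETA * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.ALPHA) * self.estimated_rtt
                                  + self.ALPHA * sample_rtt)
        self.timeout_interval = self.estimated_rtt + 4 * self.dev_rtt

    def on_timeout(self):
        self.timeout_interval *= 2               # doubling after expiration

est = RttEstimator()
for s in (0.30, 0.35, 0.25, 0.90, 0.30):         # fabricated SampleRTTs (sec)
    est.on_ack_sample(s)
    print(f"sample={s:.2f}  est={est.estimated_rtt:.3f}  "
          f"dev={est.dev_rtt:.3f}  timeout={est.timeout_interval:.3f}")
```

Note how the 0.90 outlier inflates DevRTT, and hence the timeout margin, far more than it moves EstimatedRTT, which is exactly the behavior the margin is designed to provide.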
### 3.5.4 Reliable Data Transfer

Recall that the Internet's network-layer service (IP service) is unreliable. IP does not guarantee datagram delivery, does not guarantee in-order delivery of datagrams, and does not guarantee the integrity of the data in the datagrams. With IP service, datagrams can overflow router buffers and never reach their destination, datagrams can arrive out of order, and bits in the datagram can get corrupted (flipped from 0 to 1 and vice versa). Because transport-layer segments are carried across the network by IP datagrams, transport-layer segments can suffer from these problems as well.

TCP creates a reliable data transfer service on top of IP's unreliable best-effort service. TCP's reliable data transfer service ensures that the data stream that a process reads out of its TCP receive buffer is uncorrupted, without gaps, without duplication, and in sequence; that is, the byte stream is exactly the same byte stream that was sent by the end system on the other side of the connection. How TCP provides reliable data transfer involves many of the principles that we studied in Section 3.4.

In our earlier development of reliable data transfer techniques, it was conceptually easiest to assume that an individual timer is associated with each transmitted but not yet acknowledged segment. While this is great in theory, timer management can require considerable overhead. Thus, the recommended TCP timer management procedures \[RFC 6298\] use only a single retransmission timer, even if there are multiple transmitted but not yet acknowledged segments. The TCP protocol described in this section follows this single-timer recommendation.

We will discuss how TCP provides reliable data transfer in two incremental steps. We first present a highly simplified description of a TCP sender that uses only timeouts to recover from lost segments; we then present a more complete description that uses duplicate acknowledgments in addition to timeouts. In the ensuing discussion, we suppose that data is being sent in only one direction, from Host A to Host B, and that Host A is sending a large file.

Figure 3.33 presents a highly simplified description of a TCP sender. We see that there are three major events related to data transmission and retransmission in the TCP sender: data received from the application above; timer timeout; and ACK receipt.

Figure 3.33 Simplified TCP sender

Upon the occurrence of the first major event, TCP receives data from the application, encapsulates the data in a segment, and passes the segment to IP. Note that each segment includes a sequence number that is the byte-stream number of the first data byte in the segment, as described in Section 3.5.2. Also note that if the timer is not already running for some other segment, TCP starts the timer when the segment is passed to IP. (It is helpful to think of the timer as being associated with the oldest unacknowledged segment.) The expiration interval for this timer is the TimeoutInterval, which is calculated from EstimatedRTT and DevRTT, as described in Section 3.5.3.

The second major event is the timeout. TCP responds to the timeout event by retransmitting the segment that caused the timeout. TCP then restarts the timer.

The third major event that must be handled by the TCP sender is the arrival of an acknowledgment segment (ACK) from the receiver (more specifically, a segment containing a valid ACK field value). On the occurrence of this event, TCP compares the ACK value y with its variable SendBase. The TCP state variable SendBase is the sequence number of the oldest unacknowledged byte. (Thus SendBase − 1 is the sequence number of the last byte that is known to have been received correctly and in order at the receiver.) As indicated earlier, TCP uses cumulative acknowledgments, so that y acknowledges the receipt of all bytes before byte number y. If y > SendBase, then the ACK is acknowledging one or more previously unacknowledged segments. Thus the sender updates its SendBase variable; it also restarts the timer if there currently are any not-yet-acknowledged segments.

**A Few Interesting Scenarios**

We have just described a highly simplified version of how TCP provides reliable data transfer. But even this highly simplified version has many subtleties. To get a good feeling for how this protocol works, let's now walk through a few simple scenarios.

Figure 3.34 depicts the first scenario, in which Host A sends one segment to Host B. Suppose that this segment has sequence number 92 and contains 8 bytes of data. After sending this segment, Host A waits for a segment from B with acknowledgment number 100. Although the segment from A is received at B, the acknowledgment from B to A gets lost. In this case, the timeout event occurs, and Host A retransmits the same segment. Of course, when Host B receives the retransmission, it observes from the sequence number that the segment contains data that has already been received. Thus, TCP in Host B will discard the bytes in the retransmitted segment.

Figure 3.34 Retransmission due to a lost acknowledgment

In a second scenario, shown in Figure 3.35, Host A sends two segments back to back. The first segment has sequence number 92 and 8 bytes of data, and the second segment has sequence number 100 and 20 bytes of data. Suppose that both segments arrive intact at B, and B sends a separate acknowledgment for each of these segments. The first of these acknowledgments has acknowledgment number 100; the second has acknowledgment number 120. Suppose now that neither of the acknowledgments arrives at Host A before the timeout. When the timeout event occurs, Host A resends the first segment with sequence number 92 and restarts the timer. As long as the ACK for the second segment arrives before the new timeout, the second segment will not be retransmitted.

Figure 3.35 Segment 100 not retransmitted

In a third and final scenario, suppose Host A sends the two segments, exactly as in the second example. The acknowledgment of the first segment is lost in the network, but just before the timeout event, Host A receives an acknowledgment with acknowledgment number 120. Host A therefore knows that Host B has received everything up through byte 119; so Host A does not resend either of the two segments. This scenario is illustrated in Figure 3.36.

Figure 3.36 A cumulative acknowledgment avoids retransmission of the first segment
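The three scenarios above can be traced against a skeletal rendering of the Figure 3.33 sender. The sketch below is our own simplification: segment transmission is stubbed out in comments, the timer is reduced to a flag, and only the three events are modeled:

```python
class SimplifiedSender:
    """Skeletal single-timer TCP sender following the three events
    described above; send-to-IP and real timers are stubbed out."""

    def __init__(self):
        self.next_seq_num = 0    # assume initial sequence number 0
        self.send_base = 0       # oldest unacknowledged byte
        self.unacked = {}        # seq -> data of in-flight segments
        self.timer_running = False

    def on_app_data(self, data):            # event 1: data from application
        self.unacked[self.next_seq_num] = data
        # send_to_ip(self.next_seq_num, data)       # stub: hand segment to IP
        if not self.timer_running:          # start timer if not running
            self.timer_running = True
        self.next_seq_num += len(data)

    def on_timeout(self):                   # event 2: timer expires
        oldest = min(self.unacked)          # smallest unacked sequence number
        # send_to_ip(oldest, self.unacked[oldest])  # stub: retransmit it
        self.timer_running = True           # restart the timer

    def on_ack(self, y):                    # event 3: ACK with field value y
        if y > self.send_base:              # cumulative ACK advances the base
            self.send_base = y
            self.unacked = {s: d for s, d in self.unacked.items()
                            if s + len(d) > y}
            self.timer_running = bool(self.unacked)  # restart iff data in flight
```

In the first scenario, for example, on_app_data sends bytes 92 through 99, the lost ACK means on_ack(100) never runs, on_timeout retransmits the segment, and the receiver's sequence-number check discards the duplicate bytes.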
**Doubling the Timeout Interval**

We now discuss a few modifications that most TCP implementations employ. The first concerns the length of the timeout interval after a timer expiration. In this modification, whenever the timeout event occurs, TCP retransmits the not-yet-acknowledged segment with the smallest sequence number, as described above. But each time TCP retransmits, it sets the next timeout interval to twice the previous value, rather than deriving it from the last EstimatedRTT and DevRTT (as described in Section 3.5.3). For example, suppose the TimeoutInterval associated with the oldest not-yet-acknowledged segment is 0.75 sec when the timer first expires. TCP will then retransmit this segment and set the new expiration time to 1.5 sec. If the timer expires again 1.5 sec later, TCP will again retransmit this segment, now setting the expiration time to 3.0 sec. Thus the intervals grow exponentially after each retransmission. However, whenever the timer is started after either of the two other events (that is, data received from the application above, or ACK received), the TimeoutInterval is derived from the most recent values of EstimatedRTT and DevRTT.

This modification provides a limited form of congestion control. (More comprehensive forms of TCP congestion control will be studied in Section 3.7.) The timer expiration is most likely caused by congestion in the network, that is, too many packets arriving at one (or more) router queues in the path between the source and destination, causing packets to be dropped and/or long queuing delays. In times of congestion, if the sources continue to retransmit packets persistently, the congestion may get worse. Instead, TCP acts more politely, with each sender retransmitting after longer and longer intervals. We will see that a similar idea is used by Ethernet when we study CSMA/CD in Chapter 6.

**Fast Retransmit**

One of the problems with timeout-triggered retransmissions is that the timeout period can be relatively long. When a segment is lost, this long timeout period forces the sender to delay resending the lost packet, thereby increasing the end-to-end delay. Fortunately, the sender can often detect packet loss well before the timeout event occurs by noting so-called duplicate ACKs. A duplicate ACK is an ACK that reacknowledges a segment for which the sender has already received an earlier acknowledgment. To understand the sender's response to a duplicate ACK, we must look at why the receiver sends a duplicate ACK in the first place. Table 3.2 summarizes the TCP receiver's ACK generation policy \[RFC 5681\].

Table 3.2 TCP ACK Generation Recommendation \[RFC 5681\]

| Event | TCP Receiver Action |
|-------|---------------------|
| Arrival of in-order segment with expected sequence number. All data up to expected sequence number already acknowledged. | Delayed ACK. Wait up to 500 msec for arrival of another in-order segment. If next in-order segment does not arrive in this interval, send an ACK. |
| Arrival of in-order segment with expected sequence number. One other in-order segment waiting for ACK transmission. | Immediately send single cumulative ACK, ACKing both in-order segments. |
| Arrival of out-of-order segment with higher-than-expected sequence number. Gap detected. | Immediately send duplicate ACK, indicating sequence number of next expected byte (which is the lower end of the gap). |
| Arrival of segment that partially or completely fills in gap in received data. | Immediately send ACK, provided that segment starts at the lower end of gap. |
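The policy in Table 3.2 can also be sketched in code. The model below is our own simplification: the receiver buffers out-of-order segments (the in-practice choice noted earlier), and the 500-msec delayed-ACK timer is reduced to a pending flag; the function returns the action the receiver would take:

```python
def on_segment_arrival(start, length, state):
    """Return the Table 3.2 action for a segment of `length` bytes
    beginning at byte `start` of the stream."""
    exp = state["next_expected"]
    if start > exp:                              # gap detected
        state["buffered"][start] = length        # keep out-of-order data
        return f"immediately send duplicate ACK {exp}"
    if start == exp:                             # in-order arrival
        exp += length
        while exp in state["buffered"]:          # absorb buffered data
            exp += state["buffered"].pop(exp)
        filled_gap = state["buffered"] or exp > start + length
        state["next_expected"] = exp
        if filled_gap or state["ack_pending"]:   # immediate-ACK rows
            state["ack_pending"] = False
            return f"immediately send ACK {exp}"
        state["ack_pending"] = True              # delayed-ACK row
        return "wait up to 500 msec for another in-order segment"
    return f"immediately send ACK {exp}"         # stale data; re-ACK

state = {"next_expected": 0, "buffered": {}, "ack_pending": False}
for seg in [(0, 536), (900, 101), (536, 364)]:   # byte ranges from Section 3.5.2
    print(seg, "->", on_segment_arrival(*seg, state))
```

Running this on the byte ranges from the earlier example produces a delayed ACK, then a duplicate ACK 536 when the gap appears, then an immediate ACK 1001 once the gap is filled.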
When a TCP receiver receives a segment with a sequence number that is larger than the next expected, in-order sequence number, it detects a gap in the data stream---that is, a missing segment. This gap could be the result of lost or reordered segments within the network. Since TCP does not use negative acknowledgments, the receiver cannot send an explicit negative acknowledgment back to the sender. Instead, it simply reacknowledges (that is, generates a duplicate ACK for) the last in-order byte of data it has received. (Note that Table 3.2 allows for the case that the receiver does not discard out-of-order segments.) Because a sender often sends a large number of segments back to back, if one segment is lost, there will likely be many back-to-back duplicate ACKs. If the TCP sender receives three duplicate ACKs for the same data, it takes this as an indication that the segment following the segment that has been ACKed three times has been lost. (In the homework problems, we consider the question of why the sender waits for three duplicate ACKs, rather than just a single duplicate ACK.) In the case that three duplicate ACKs are received, the TCP sender performs a fast retransmit \[RFC 5681\], retransmitting the missing segment before that segment's timer expires. This is shown in Figure 3.37, where the second segment is lost, then retransmitted before its timer expires.

Figure 3.37 Fast retransmit: retransmitting the missing segment before the segment's timer expires

For TCP with fast retransmit, the following code snippet replaces the ACK received event in Figure 3.33:

```
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently any not-yet-acknowledged segments)
            start timer
    }
    else { /* a duplicate ACK for already ACKed segment */
        increment number of duplicate ACKs received for y
        if (number of duplicate ACKs received for y == 3)
            /* TCP fast retransmit */
            resend segment with sequence number y
    }
    break;
```

We noted earlier that many subtle issues arise when a timeout/retransmit mechanism is implemented in an actual protocol such as TCP. The procedures above, which have evolved as a result of more than 20 years of experience with TCP timers, should convince you that this is indeed the case!

**Go-Back-N or Selective Repeat?**

Let us close our study of TCP's error-recovery mechanism by considering the following question: Is TCP a GBN or an SR protocol? Recall that TCP acknowledgments are cumulative, and correctly received but out-of-order segments are not individually ACKed by the receiver. Consequently, as shown in Figure 3.33 (see also Figure 3.19), the TCP sender need only maintain the smallest sequence number of a transmitted but unacknowledged byte (SendBase) and the sequence number of the next byte to be sent (NextSeqNum). In this sense, TCP looks a lot like a GBN-style protocol. But there are some striking differences between TCP and Go-Back-N. Many TCP implementations will buffer correctly received but out-of-order segments \[Stevens 1994\]. Consider also what happens when the sender sends a sequence of segments 1, 2, . . . , N, and all of the segments arrive in order without error at the receiver. Further suppose that the acknowledgment for packet n <