From a3640c3c6c9ab5123aee5d7319916dde729f5815 Mon Sep 17 00:00:00 2001
From: mo khan
Date: Sat, 27 Sep 2025 12:09:28 -0600
Subject: Add generated content to generated folder

---
 generated/final-exam-notes.md    |   248 +
 generated/textbook-extracted.txt | 24488 +++++++++++++++++++++++++++++++++++++
 generated/textbook-formatted.md  |   442 +
 3 files changed, 25178 insertions(+)
 create mode 100644 generated/final-exam-notes.md
 create mode 100644 generated/textbook-extracted.txt
 create mode 100644 generated/textbook-formatted.md

diff --git a/generated/final-exam-notes.md b/generated/final-exam-notes.md
new file mode 100644
index 0000000..d17902c
--- /dev/null
+++ b/generated/final-exam-notes.md
@@ -0,0 +1,248 @@
+# COMP-347 Computer Networks Final Exam Study Guide
+
+## Unit 1: Introduction to Computer Networks, the Internet, and the World Wide Web
+**Chapter 1: Computer Networks and the Internet**
+
+### Key Topics:
+- **What Is the Internet?**
+  - Nuts-and-bolts description: hosts, communication links, packet switches, protocols
+  - Services description: distributed applications, socket interface
+  - **Protocol definition**: format and order of messages, actions taken
+
+- **The Network Edge**
+  - Access Networks: DSL, cable, FTTH, Ethernet, WiFi
+  - Physical Media: guided (copper, fiber) vs. unguided (wireless)
+
+- **The Network Core**
+  - **Packet Switching**: store-and-forward, queuing delays, packet loss
+  - **Circuit Switching**: FDM, TDM, vs. packet switching comparison
+  - Network of Networks: ISP structure
+
+- **Performance Metrics & Formulas**
+  - **Delay components**: processing, queuing, transmission, propagation
+  - **Transmission delay**: L/R (packet length/transmission rate)
+  - **Propagation delay**: d/s (distance/propagation speed)
+  - **End-to-end delay**: sum of all delays across path
+  - **Throughput**: min(R1, R2, ..., RN) for bottleneck link
+
+- **Protocol Layers and Service Models**
+  - **5-layer Internet protocol stack**: Application, Transport, Network, Link, Physical
+  - **Encapsulation**: headers added at each layer
+
+- **Network Security**
+  - Types of attacks: DoS, packet sniffing, IP spoofing
+
+## Unit 2: The Application Layer and Network Applications
+**Chapter 2: Application Layer**
+
+### Key Topics:
+- **Principles of Network Applications**
+  - **Application architectures**: client-server, P2P, hybrid
+  - **Process communication**: sockets, addressing (IP + port)
+  - **Transport services**: reliable/unreliable, throughput, timing, security
+
+- **Web and HTTP**
+  - **HTTP overview**: request-response, stateless
+  - **Connection types**: non-persistent vs. persistent
+  - **HTTP message format**: requests (GET, POST, HEAD) and responses
+  - **Status codes**: 200 OK, 404 Not Found, etc. (sample exchange below)
+  - **Cookies**: user-server state
+  - **Web caching**: proxy servers, conditional GET
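+
+To make the message-format and status-code bullets above concrete, here is a
+minimal sketch of a hand-built HTTP/1.1 exchange in Python; the host
+`example.com` and the path are placeholders, not from the course material:
+
+```python
+# Send a hand-built HTTP/1.1 GET over a TCP socket and print the status line.
+import socket
+
+sock = socket.create_connection(("example.com", 80))
+request = (
+    "GET /index.html HTTP/1.1\r\n"   # request line: method, URL, version
+    "Host: example.com\r\n"          # mandatory header in HTTP/1.1
+    "Connection: close\r\n"          # ask for a non-persistent connection
+    "\r\n"                           # blank line ends the header section
+)
+sock.sendall(request.encode("ascii"))
+response = b""
+while True:
+    chunk = sock.recv(4096)
+    if not chunk:                    # server closed the connection
+        break
+    response += chunk
+sock.close()
+print(response.split(b"\r\n")[0])    # status line, e.g. b'HTTP/1.1 200 OK'
+```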
+
+- **Electronic Mail**
+  - **SMTP**: Simple Mail Transfer Protocol
+  - **Message formats**: RFC 822
+  - **Mail access protocols**: POP3, IMAP, HTTP
+
+- **DNS (Domain Name System)**
+  - **Services**: hostname-to-IP translation, mail server aliasing
+  - **Hierarchy**: root, TLD, authoritative servers
+  - **DNS records**: A, NS, CNAME, MX
+  - **DNS messages**: queries and replies
+
+- **P2P Applications**
+  - **File distribution**: BitTorrent protocol
+  - **DHT (Distributed Hash Tables)**
+
+- **Video Streaming and CDNs**
+  - **DASH**: Dynamic Adaptive Streaming over HTTP
+  - **Content Distribution Networks**: origin servers, clusters
+
+- **Socket Programming**
+  - UDP and TCP socket programming principles
+
+## Unit 3: The Transport Layer
+**Chapter 3: Transport Layer**
+
+### Key Topics:
+- **Transport Layer Services**
+  - **Relationship to network layer**: logical communication between processes
+  - **Multiplexing/Demultiplexing**: using port numbers
+
+- **UDP (User Datagram Protocol)**
+  - **Characteristics**: connectionless, unreliable, no flow control
+  - **UDP segment structure**: source/dest ports, length, checksum
+  - **Checksum calculation**: Internet checksum algorithm
+
+- **Principles of Reliable Data Transfer**
+  - **rdt protocols**: rdt1.0, rdt2.0, rdt2.1, rdt2.2, rdt3.0
+  - **ARQ protocols**: acknowledgments, negative acknowledgments, timeouts
+  - **Pipelined protocols**: Go-Back-N (GBN), Selective Repeat (SR)
+
+- **TCP (Transmission Control Protocol)**
+  - **Connection-oriented**: 3-way handshake, connection teardown
+  - **TCP segment structure**: sequence numbers, acknowledgment numbers
+  - **RTT estimation**: EstimatedRTT, DevRTT, TimeoutInterval (worked example at the end of this unit)
+  - **Reliable data transfer**: cumulative ACKs, fast retransmit
+  - **Flow control**: receive window (rwnd)
+  - **Connection management**: SYN, SYNACK, FIN
+
+- **Congestion Control**
+  - **Principles**: network-assisted vs. end-to-end
+  - **TCP congestion control algorithms**:
+    - **Slow start**: exponential increase
+    - **Congestion avoidance**: linear increase (AIMD)
+    - **Fast recovery**: multiplicative decrease (cwnd halved after triple duplicate ACK)
+  - **TCP variants**: Tahoe, Reno, New Reno, CUBIC
+  - **Fairness**: achieving fair bandwidth allocation
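+
+Before moving on to the network layer, the RTT-estimation bullets above are
+worth one worked example. A minimal sketch of the textbook estimator, assuming
+the standard weights α = 0.125 and β = 0.25; the RTT samples are made-up
+illustrative values:
+
+```python
+# EstimatedRTT and DevRTT are exponentially weighted moving averages (EWMA).
+ALPHA, BETA = 0.125, 0.25
+
+def update_rtt(estimated_rtt, dev_rtt, sample_rtt):
+    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
+    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
+    timeout = estimated_rtt + 4 * dev_rtt      # TimeoutInterval formula
+    return estimated_rtt, dev_rtt, timeout
+
+est, dev = 100.0, 5.0                          # starting state, in ms
+for sample in (100, 100, 300, 100):            # one delayed segment (300 ms)
+    est, dev, timeout = update_rtt(est, dev, sample)
+    print(f"sample={sample}  EstimatedRTT={est:.1f}  "
+          f"DevRTT={dev:.1f}  TimeoutInterval={timeout:.1f}")
+```
+
+Note how a single 300 ms spike inflates DevRTT, and therefore the timeout,
+much more than it shifts EstimatedRTT; that is the point of the 4 × DevRTT
+safety margin.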
+
+## Unit 4: The Network Layer: Data Plane
+**Chapter 4: The Network Layer: Data Plane**
+
+### Key Topics:
+- **Network Layer Overview**
+  - **Forwarding vs. Routing**: data plane vs. control plane
+  - **Network service models**: best-effort, guaranteed services
+
+- **Router Architecture**
+  - **Input port processing**: destination-based forwarding
+  - **Switching fabrics**: memory, bus, crossbar
+  - **Output port processing**: buffering, scheduling
+  - **Packet scheduling**: FIFO, priority queuing, round robin, WFQ
+
+- **Internet Protocol (IP)**
+  - **IPv4 datagram format**: version, header length, TOS, length, ID, flags, fragment offset, TTL, protocol, checksum
+  - **Fragmentation**: MTU, fragmentation process
+  - **IPv4 addressing**:
+    - **CIDR notation**: a.b.c.d/x
+    - **Subnetting**: network and host portions
+    - **DHCP**: dynamic address assignment
+  - **NAT (Network Address Translation)**: private addresses, port mapping
+  - **IPv6**: address format, transitioning from IPv4
+
+- **Generalized Forwarding and SDN**
+  - **OpenFlow**: match-plus-action paradigm
+  - **Flow tables**: match fields, actions, statistics
+
+## Unit 5: The Network Layer: Control Plane
+**Chapter 5: The Network Layer: Control Plane**
+
+### Key Topics:
+- **Routing Algorithms**
+  - **Link-State (LS)**: Dijkstra's algorithm, OSPF
+  - **Distance-Vector (DV)**: Bellman-Ford equation, RIP
+  - **Comparison**: message complexity, convergence, robustness
+
+- **Intra-AS Routing: OSPF**
+  - **OSPF protocol**: link-state advertisements, area concept
+  - **Hierarchical OSPF**: backbone area, area border routers
+
+- **Inter-AS Routing: BGP**
+  - **BGP basics**: eBGP, iBGP sessions
+  - **Path attributes**: AS-PATH, NEXT-HOP
+  - **Route selection**: local preference, shortest AS-PATH
+  - **BGP routing policies**: customer-provider, peer-to-peer
+
+- **SDN Control Plane**
+  - **SDN architecture**: controller, southbound/northbound APIs
+  - **OpenFlow protocol**: controller-switch communication
+
+- **ICMP (Internet Control Message Protocol)**
+  - **ICMP message types**: ping, traceroute
+  - **Error reporting**: destination unreachable, time exceeded
+
+- **Network Management**
+  - **SNMP**: Simple Network Management Protocol
+  - **MIB**: Management Information Base
+
+## Unit 6: The Link Layer and Local Area Networks
+**Chapter 6: The Link Layer and LANs**
+
+### Key Topics:
+- **Link Layer Services**
+  - **Framing**: encapsulation into frames
+  - **Link access**: MAC protocols
+  - **Reliable delivery**: error detection/correction
+  - **Error detection**: parity checks, checksums, CRC
+
+- **Error Detection and Correction**
+  - **Parity bits**: single bit parity, 2D parity
+  - **Cyclic Redundancy Check (CRC)**: polynomial arithmetic (bit-level sketch at the end of this unit)
+  - **Checksumming**: Internet checksum
+
+- **Multiple Access Protocols**
+  - **Channel partitioning**: TDMA, FDMA, CDMA
+  - **Random access**: ALOHA, slotted ALOHA, CSMA, CSMA/CD
+  - **Taking turns**: polling, token passing
+
+- **Switched LANs**
+  - **Link-layer addressing**: MAC addresses, ARP protocol
+  - **Ethernet**: frame format, Manchester encoding, 10BASE-T, 100BASE-T, Gigabit Ethernet
+  - **Link-layer switches**: learning, flooding, spanning tree protocol
+  - **VLANs**: port-based, 802.1Q standard
+
+- **Link Virtualization**
+  - **MPLS**: Multiprotocol Label Switching
+
+- **Data Center Networking**
+  - **Load balancing**: techniques and architectures
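+
+To close out the link layer, here is the CRC bullet above in executable form:
+a minimal modulo-2 long-division sketch. The short dataword and 4-bit
+generator are small illustrative values, not a standardized CRC polynomial:
+
+```python
+# CRC as modulo-2 "polynomial" division: append r zero bits to the dataword
+# D, divide by the (r+1)-bit generator G; the r-bit remainder R is the CRC.
+def crc_remainder(data_bits: str, generator: str) -> str:
+    r = len(generator) - 1
+    bits = list(data_bits + "0" * r)          # D shifted left by r bits
+    for i in range(len(data_bits)):
+        if bits[i] == "1":                    # mod-2 subtraction is XOR
+            for j, g in enumerate(generator):
+                bits[i + j] = str(int(bits[i + j]) ^ int(g))
+    return "".join(bits[-r:])                 # last r bits are the remainder
+
+D, G = "101110", "1001"
+R = crc_remainder(D, G)                       # '011'
+print(D + R)                                  # transmit D followed by R
+assert crc_remainder(D + R, G) == "000"       # receiver: remainder 0 means OK
+```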
+
+## Unit 7: Wireless and Mobile Networks
+**Chapter 7: Wireless and Mobile Networks**
+
+### Key Topics:
+- **Wireless Link Characteristics**
+  - **Signal attenuation**: path loss, multipath propagation
+  - **Interference**: from other sources
+  - **CDMA**: Code Division Multiple Access
+
+- **WiFi: 802.11 Wireless LANs**
+  - **802.11 architecture**: BSS, AP, ad hoc networks
+  - **802.11 MAC protocol**: CSMA/CA, RTS/CTS, hidden terminal problem
+  - **802.11 frame structure**: address fields, frame types
+  - **Mobility within IP subnet**: association, reassociation
+
+- **Cellular Networks**
+  - **Cellular architecture**: cells, base stations, MSC
+  - **3G networks**: UMTS, data and voice integration
+  - **4G LTE**: all-IP architecture, OFDMA
+
+- **Mobility Management**
+  - **Addressing approaches**: indirect routing, direct routing
+  - **Mobile IP**: home agent, foreign agent, care-of address
+
+- **Impact on Higher Layers**
+  - **TCP over wireless**: performance issues, solutions
+
+---
+
+## Key Formulas for Exam:
+
+1. **Transmission Delay**: L/R (bits/bps)
+2. **Propagation Delay**: d/s (distance/speed)
+3. **Queuing Delay**: depends on traffic intensity (La/R)
+4. **Throughput**: min(R1, R2, ..., RN)
+5. **TCP Timeout**: EstimatedRTT + 4 × DevRTT
+6. **TCP Window Size**: min(rwnd, cwnd)
+7. **Utilization**: U = (L/R)/(RTT + L/R) for stop-and-wait
+8. **CRC Calculation**: polynomial division
+9. **Efficiency of Slotted ALOHA**: maximum 1/e ≈ 0.37
+
+## Important Protocols Summary:
+- **Application Layer**: HTTP, SMTP, DNS, DHCP
+- **Transport Layer**: TCP, UDP
+- **Network Layer**: IP, ICMP, OSPF, BGP
+- **Link Layer**: Ethernet (802.3), ARP, PPP
+- **Wireless**: 802.11 (WiFi), 802.15 (Bluetooth), cellular (3G/4G LTE)
+
+This study guide covers the essential concepts, protocols, and formulas from each unit that are most likely to appear on your final exam. Focus on understanding the principles behind each protocol and how they work together in the Internet architecture.
\ No newline at end of file
diff --git a/generated/textbook-extracted.txt b/generated/textbook-extracted.txt
new file mode 100644
index 0000000..5403e7e
--- /dev/null
+++ b/generated/textbook-extracted.txt
@@ -0,0 +1,24488 @@
+ Computer Networking
+A Top-Down Approach
+Seventh Edition
+James F. Kurose
+University of Massachusetts, Amherst
+Keith W. Ross
+NYU and NYU Shanghai
+
+Boston Columbus Indianapolis New York San Francisco Hoboken Amsterdam Cape
+Town Dubai London Madrid Milan Munich Paris Montréal Toronto Delhi Mexico City São
+Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo
+Vice President, Editorial Director, ECS: Marcia Horton
+Acquisitions Editor: Matt Goldstein
+Editorial Assistant: Kristy Alaura
+Vice President of Marketing: Christy Lesko
+Director of Field Marketing: Tim Galligan
+Product Marketing Manager: Bram Van Kempen
+Field Marketing Manager: Demetrius Hall
+Marketing Assistant: Jon Bryant
+Director of Product Management: Erin Gregg
+Team Lead, Program and Project Management: Scott Disanno
+Program Manager: Joanne Manning and Carole Snyder
+Project Manager: Katrina Ostler, Ostler Editorial, Inc.
+Senior Specialist, Program Planning and Support: Maura Zaldivar-Garcia
+
+Cover Designer: Joyce Wells
+Manager, Rights and Permissions: Ben Ferrini
+Project Manager, Rights and Permissions: Jenny Hoffman, Aptara Corporation
+Inventory Manager: Ann Lam
+Cover Image: Marc Gutierrez/Getty Images
+Media Project Manager: Steve Wright
+Composition: Cenveo Publishing Services
+Printer/Binder: Edwards Brothers Malloy
+Cover and Insert Printer: Phoenix Color/Hagerstown
+Credits and acknowledgments borrowed from other sources and reproduced, with permission, in this
+textbook appear on appropriate page within text.
+Copyright © 2017, 2013, 2010 Pearson Education, Inc. All rights reserved. Manufactured in the United States
+of America.
This publication is protected by Copyright, and permission should be obtained from the publisher
+prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any
+means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions,
+request forms and the appropriate contacts within the Pearson Education Global Rights & Permissions
+Department, please visit www.pearsoned.com/permissions/. Many of the designations by manufacturers
+and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this
+book, and the publisher was aware of a trademark claim, the designations have been printed in initial caps or
+all caps.
+Library of Congress Cataloging-in-Publication Data
+Names: Kurose, James F. | Ross, Keith W., 1956-
+Title: Computer networking: a top-down approach / James F. Kurose, University of Massachusetts, Amherst,
+Keith W. Ross, NYU and NYU Shanghai.
+Description: Seventh edition. | Hoboken, New Jersey: Pearson, [2017] | Includes bibliographical references
+and index.
+Identifiers: LCCN 2016004976 | ISBN 9780133594140 | ISBN 0133594149
+Subjects: LCSH: Internet. | Computer networks.
+Classification: LCC TK5105.875.I57 K88 2017 | DDC 004.6-dc23
+
+LC record available at http://lccn.loc.gov/2016004976
+
+ISBN-10: 0-13-359414-9
+ISBN-13: 978-0-13-359414-0
+
+About the Authors
+Jim Kurose
+Jim Kurose is a Distinguished University Professor of Computer Science at the University of Massachusetts,
+Amherst. He is currently on leave from the University of Massachusetts, serving as an Assistant Director at the
+US National Science Foundation, where he leads the Directorate of Computer and Information Science and
+Engineering.
+Dr. Kurose has received a number of recognitions for his educational activities including Outstanding Teacher
+Awards from the National Technological University (eight times), the University of Massachusetts, and the
+Northeast Association of Graduate Schools. He received the IEEE Taylor Booth Education Medal and was
+recognized for his leadership of Massachusetts’ Commonwealth Information Technology Initiative. He has won
+several conference best paper awards and received the IEEE Infocom Achievement Award and the ACM
+Sigcomm Test of Time Award.
+
+Dr. Kurose is a former Editor-in-Chief of IEEE Transactions on Communications and of IEEE/ACM
+Transactions on Networking. He has served as Technical Program co-Chair for IEEE Infocom, ACM
+SIGCOMM, ACM Internet Measurement Conference, and ACM SIGMETRICS. He is a Fellow of the IEEE and
+the ACM. His research interests include network protocols and architecture, network measurement,
+multimedia communication, and modeling and performance evaluation. He holds a PhD in Computer Science
+from Columbia University.
+
+Keith Ross
+
+Keith Ross is the Dean of Engineering and Computer Science at NYU Shanghai and the Leonard J. Shustek
+Chair Professor in the Computer Science and Engineering Department at NYU. Previously he was at
+University of Pennsylvania (13 years), Eurecom Institute (5 years) and Polytechnic University (10 years). He
+received a B.S.E.E. from Tufts University, an M.S.E.E. from Columbia University, and a Ph.D. in Computer and
+Control Engineering from The University of Michigan. Keith Ross is also the co-founder and original CEO of
+Wimba, which develops online multimedia applications for e-learning and was acquired by Blackboard in 2010.
+ +Professor Ross’s research interests are in privacy, social networks, peer-to-peer networking, Internet +measurement, content distribution networks, and stochastic modeling. He is an ACM Fellow, an IEEE Fellow, +recipient of the Infocom 2009 Best Paper Award, and recipient of 2011 and 2008 Best Paper Awards for +Multimedia Communications (awarded by IEEE Communications Society). He has served on numerous journal +editorial boards and conference program committees, including IEEE/ACM Transactions on Networking, ACM +SIGCOMM, ACM CoNext, and ACM Internet Measurement Conference. He also has served as an advisor to +the Federal Trade Commission on P2P file sharing. + +To Julie and our three precious ones—Chris, Charlie, and Nina +JFK + +A big THANKS to my professors, colleagues, and students all over the world. +KWR + +Preface +Welcome to the seventh edition of Computer Networking: A Top-Down Approach. Since the publication of the +first edition 16 years ago, our book has been adopted for use at many hundreds of colleges and universities, +translated into 14 languages, and used by over one hundred thousand students and practitioners worldwide. +We’ve heard from many of these readers and have been overwhelmed by the ­positive ­response. + + What’s New in the Seventh Edition? +We think one important reason for this success has been that our book continues to offer a fresh and timely +approach to computer networking instruction. We’ve made changes in this seventh edition, but we’ve also kept +unchanged what we believe (and the instructors and students who have used our book have confirmed) to be +the most important aspects of this book: its top-down approach, its focus on the Internet and a modern +treatment of computer networking, its attention to both principles and practice, and its accessible style and +approach toward learning about computer networking. Nevertheless, the seventh edition has been revised and +updated substantially. +Long-time readers of our book will notice that for the first time since this text was published, we’ve changed the +organization of the chapters themselves. The network layer, which had been previously covered in a single +chapter, is now covered in Chapter 4 (which focuses on the so-called “data plane” component of the network +layer) and Chapter 5 (which focuses on the network layer’s “control plane”). This expanded coverage of the +network layer reflects the swift rise in importance of software-defined networking (SDN), arguably the most +important and exciting advance in networking in decades. Although a relatively recent innovation, SDN has +been rapidly adopted in practice—so much so that it’s already hard to imagine an introduction to modern +computer networking that doesn’t cover SDN. The topic of network management, previously covered in +Chapter 9, has now been folded into the new Chapter 5. As always, we’ve also updated many other sections +of the text to reflect recent changes in the dynamic field of networking since the sixth edition. As always, +material that has been retired from the printed text can always be found on this book’s Companion Website. +The most important updates are the following: +Chapter 1 has been updated to reflect the ever-growing reach and use of the ­Internet. +Chapter 2, which covers the application layer, has been significantly updated. 
We’ve removed the material
+on the FTP protocol and distributed hash tables to make room for a new section on application-level video
+streaming and content distribution networks, together with Netflix and YouTube case studies. The
+socket programming sections have been updated from Python 2 to Python 3.
+Chapter 3, which covers the transport layer, has been modestly updated. The material on asynchronous
+transfer mode (ATM) networks has been replaced by more modern material on the Internet’s explicit
+congestion notification (ECN), which teaches the same principles.
+Chapter 4 covers the “data plane” component of the network layer—the per-router forwarding function that
+determines how a packet arriving on one of a router’s input links is forwarded to one of that router’s output
+links. We updated the material on traditional Internet forwarding found in all previous editions, and added
+material on packet scheduling. We’ve also added a new section on generalized forwarding, as practiced in
+SDN. There are also numerous updates throughout the chapter. Material on multicast and broadcast
+communication has been removed to make way for the new material.
+In Chapter 5, we cover the control plane functions of the network layer—the network-wide logic that
+controls how a datagram is routed along an end-to-end path of routers from the source host to the
+destination host. As in previous editions, we cover routing algorithms, as well as routing protocols (with an
+updated treatment of BGP) used in today’s Internet. We’ve added a significant new section on the SDN
+control plane, where routing and other functions are implemented in so-called SDN controllers.
+Chapter 6, which now covers the link layer, has an updated treatment of Ethernet, and of data center
+networking.
+Chapter 7, which covers wireless and mobile networking, contains updated material on 802.11 (so-called
+“WiFi”) networks and cellular networks, including 4G and LTE.
+Chapter 8, which covers network security and was extensively updated in the sixth edition, has only
+modest updates in this seventh edition.
+Chapter 9, on multimedia networking, is now slightly “thinner” than in the sixth edition, as material on video
+streaming and content distribution networks has been moved to Chapter 2, and material on packet
+scheduling has been incorporated into Chapter 4.
+Significant new material involving end-of-chapter problems has been added. As with all previous editions,
+homework problems have been revised, added, and removed.
+As always, our aim in creating this new edition of our book is to continue to provide a focused and modern
+treatment of computer networking, emphasizing both principles and practice.
+Audience
+This textbook is for a first course on computer networking. It can be used in both computer science and
+electrical engineering departments. In terms of programming languages, the book assumes only that the
+student has experience with C, C++, Java, or Python (and even then only in a few places). Although this book
+is more precise and analytical than many other introductory computer networking texts, it rarely uses any
+mathematical concepts that are not taught in high school. We have made a deliberate effort to avoid using any
+advanced calculus, probability, or stochastic process concepts (although we’ve included some homework
+problems for students with this advanced background). The book is therefore appropriate for undergraduate
+courses and for first-year graduate courses.
It should also be useful to practitioners in the telecommunications
+industry.
+What Is Unique About This Textbook?
+The subject of computer networking is enormously complex, involving many concepts, protocols, and
+technologies that are woven together in an intricate manner. To cope with this scope and complexity, many
+computer networking texts are often organized around the “layers” of a network architecture. With a layered
+organization, students can see through the complexity of computer networking—they learn about the distinct
+concepts and protocols in one part of the architecture while seeing the big picture of how all parts fit together.
+From a pedagogical perspective, our personal experience has been that such a layered approach indeed
+works well. Nevertheless, we have found that the traditional approach of teaching—bottom up; that is, from the
+physical layer towards the application layer—is not the best approach for a modern course on computer
+networking.
+A Top-Down Approach
+Our book broke new ground 16 years ago by treating networking in a top-down manner—that is, by
+beginning at the application layer and working its way down toward the physical layer. The feedback we
+received from teachers and students alike has confirmed that this top-down approach has many advantages
+and does indeed work well pedagogically. First, it places emphasis on the application layer (a “high growth
+area” in networking). Indeed, many of the recent revolutions in computer networking—including the Web,
+peer-to-peer file sharing, and media streaming—have taken place at the application layer. An early emphasis
+on application-layer issues differs from the approaches taken in most other texts, which have only a small
+amount of material on network applications, their requirements, application-layer paradigms (e.g., client-server
+and peer-to-peer), and application programming interfaces. Second, our experience as instructors (and that
+of many instructors who have used this text) has been that teaching networking applications near the
+beginning of the course is a powerful motivational tool. Students are thrilled to learn about how networking
+applications work—applications such as e-mail and the Web, which most students use on a daily basis. Once
+a student understands the applications, the student can then understand the network services needed to
+support these applications. The student can then, in turn, examine the various ways in which such services
+might be provided and implemented in the lower layers. Covering applications early thus provides motivation
+for the remainder of the text.
+Third, a top-down approach enables instructors to introduce network application development at an early
+stage. Students not only see how popular applications and protocols work, but also learn how easy it is to
+create their own network applications and application-level protocols. With the top-down approach, students
+get early exposure to the notions of socket programming, service models, and protocols—important
+concepts that resurface in all subsequent layers. By providing socket programming examples in Python, we
+highlight the central ideas without confusing students with complex code. Undergraduates in electrical
+engineering and computer science should not have difficulty following the Python code.
+An Internet Focus
+Although we dropped the phrase “Featuring the Internet” from the title of this book with the fourth edition, this
+doesn’t mean that we dropped our focus on the Internet.
Indeed, nothing could be further from the case! +Instead, since the Internet has become so pervasive, we felt that any networking textbook must have a +significant focus on the Internet, and thus this phrase was somewhat unnecessary. We continue to use the +Internet’s architecture and protocols as primary vehicles for studying fundamental computer networking +concepts. Of course, we also include concepts and protocols from other network architectures. But the +spotlight is clearly on the Internet, a fact reflected in our organizing the book around the Internet’s five-layer +architecture: the application, transport, network, link, and physical layers. +Another benefit of spotlighting the Internet is that most computer science and electrical engineering students +are eager to learn about the Internet and its protocols. They know that the Internet has been a revolutionary +and disruptive technology and can see that it is profoundly changing our world. Given the enormous relevance +of the Internet, students are naturally curious about what is “under the hood.” Thus, it is easy for an instructor +to get students excited about basic principles when using the Internet as the guiding focus. +Teaching Networking Principles +Two of the unique features of the book—its top-down approach and its focus on the Internet—have appeared +in the titles of our book. If we could have squeezed a third phrase into the subtitle, it would have contained the +word principles. The field of networking is now mature enough that a number of fundamentally important issues +can be identified. For example, in the transport layer, the fundamental issues include reliable communication +over an unreliable network layer, connection establishment/ teardown and handshaking, congestion and flow +control, and multiplexing. Three fundamentally important network-layer issues are determining “good” paths +between two routers, interconnecting a large number of heterogeneous networks, and managing the +complexity of a modern network. In the link layer, a fundamental problem is sharing a multiple access channel. +In network security, techniques for providing confidentiality, authentication, and message integrity are all based +on cryptographic fundamentals. This text identifies fundamental networking issues and studies approaches +towards addressing these issues. The student learning these principles will gain knowledge with a long “shelf +life”—long after today’s network standards and protocols have become obsolete, the principles they embody +will remain important and relevant. We believe that the combination of using the Internet to get the student’s +foot in the door and then emphasizing fundamental issues and solution approaches will allow the student to + + quickly understand just about any networking technology. +The Website +Each new copy of this textbook includes twelve months of access to a Companion ­Website for all book +readers at http://www.pearsonhighered.com/cs-resources/, which includes: +Interactive learning material. The book’s Companion Website contains ­VideoNotes—video +presentations of important topics throughout the book done by the authors, as well as walkthroughs of +solutions to problems similar to those at the end of the chapter. We’ve seeded the Web site with +VideoNotes and ­online problems for Chapters 1 through 5 and will continue to actively add and update +this material over time. As in earlier editions, the Web site contains the interactive Java applets that +animate many key networking concepts. 
The site also has interactive quizzes that permit students to check
+their basic understanding of the subject matter. Professors can integrate these interactive features into their
+lectures or use them as mini labs.
+Additional technical material. As we have added new material in each edition of our book, we’ve had to
+remove coverage of some existing topics to keep the book at manageable length. For example, to make
+room for the new material in this edition, we’ve removed material on FTP, distributed hash tables, and
+multicasting. Material that appeared in earlier editions of the text is still of interest, and thus can be found
+on the book’s Web site.
+Programming assignments. The Web site also provides a number of detailed programming assignments,
+which include building a multithreaded Web server, building an e-mail client with a GUI interface,
+programming the sender and receiver sides of a reliable data transport protocol, programming a
+distributed routing algorithm, and more.
+Wireshark labs. One’s understanding of network protocols can be greatly deepened by seeing them in
+action. The Web site provides numerous Wireshark assignments that enable students to actually observe
+the sequence of messages exchanged between two protocol entities. The Web site includes separate
+Wireshark labs on HTTP, DNS, TCP, UDP, IP, ICMP, Ethernet, ARP, WiFi, SSL, and on tracing all
+protocols involved in satisfying a request to fetch a Web page. We’ll continue to add new labs over time.
+In addition to the Companion Website, the authors maintain a public Web site,
+http://gaia.cs.umass.edu/kurose_ross/interactive, containing interactive exercises that create (and present
+solutions for) problems similar to selected end-of-chapter problems. Since students can generate (and view
+solutions for) an unlimited number of similar problem instances, they can work until the material is truly
+mastered.
+Pedagogical Features
+We have each been teaching computer networking for more than 30 years. Together, we bring more than 60
+years of teaching experience to this text, during which time we have taught many thousands of students. We
+have also been active researchers in computer networking during this time. (In fact, Jim and Keith first met
+each other as master’s students in a computer networking course taught by Mischa Schwartz in 1979 at
+Columbia University.) We think all this gives us a good perspective on where networking has been and where
+it is likely to go in the future. Nevertheless, we have resisted temptations to bias the material in this book
+towards our own pet research projects. We figure you can visit our personal Web sites if you are interested in
+our research. Thus, this book is about modern computer networking—it is about contemporary protocols and
+technologies as well as the underlying principles behind these protocols and technologies. We also believe
+that learning (and teaching!) about networking can be fun. A sense of humor, use of analogies, and real-world
+examples in this book will hopefully make this material more fun.
+Supplements for Instructors
+We provide a complete supplements package to aid instructors in teaching this course. This material can be
+accessed from Pearson’s Instructor Resource Center (http://www.pearsonhighered.com/irc). Visit the
+Instructor Resource Center for information about accessing these instructor’s supplements.
+PowerPoint® slides. We provide PowerPoint slides for all nine chapters.
The slides have been completely +updated with this seventh edition. The slides cover each chapter in detail. They use graphics and +animations (rather than relying only on monotonous text bullets) to make the slides interesting and visually +appealing. We provide the original PowerPoint slides so you can customize them to best suit your own +teaching needs. Some of these slides have been contributed by other instructors who have taught from our +book. +Homework solutions. We provide a solutions manual for the homework problems in the text, programming +assignments, and Wireshark labs. As noted ­earlier, we’ve introduced many new homework problems in +the first six chapters of the book. +Chapter Dependencies +The first chapter of this text presents a self-contained overview of computer networking. Introducing many key +concepts and terminology, this chapter sets the stage for the rest of the book. All of the other chapters directly +depend on this first chapter. After completing Chapter 1, we recommend instructors cover Chapters 2 through +6 in sequence, following our top-down philosophy. Each of these five chapters leverages material from the +preceding chapters. After completing the first six chapters, the instructor has quite a bit of flexibility. There are +no interdependencies among the last three chapters, so they can be taught in any order. However, each of the +last three chapters depends on the material in the first six chapters. Many instructors first teach the first six +chapters and then teach one of the last three chapters for “dessert.” +One Final Note: We’d Love to Hear from You +We encourage students and instructors to e-mail us with any comments they might have about our book. It’s +been wonderful for us to hear from so many instructors and students from around the world about our first five +editions. We’ve incorporated many of these suggestions into later editions of the book. We also encourage +instructors to send us new homework problems (and solutions) that would complement the current homework +problems. We’ll post these on the instructor-only portion of the Web site. We also encourage instructors and +students to create new Java applets that illustrate the concepts and protocols in this book. If you have an +applet that you think would be appropriate for this text, please submit it to us. If the applet (including notation +and terminology) is appropriate, we’ll be happy to include it on the text’s Web site, with an appropriate +reference to the applet’s authors. +So, as the saying goes, “Keep those cards and letters coming!” Seriously, please do continue to send us +interesting URLs, point out typos, disagree with any of our claims, and tell us what works and what doesn’t +work. Tell us what you think should or shouldn’t be included in the next edition. Send your e-mail to +kurose@cs.umass.edu and keithwross@nyu.edu. + + Acknowledgments +Since we began writing this book in 1996, many people have given us invaluable help and have been +influential in shaping our thoughts on how to best organize and teach a networking course. We want to say A +BIG THANKS to everyone who has helped us from the earliest first drafts of this book, up to this seventh +edition. We are also very thankful to the many hundreds of readers from around the world—students, faculty, +practitioners—who have sent us thoughts and comments on earlier editions of the book and suggestions for +future editions of the book. 
Special thanks go out to: +Al Aho (Columbia University) +Hisham Al-Mubaid (University of Houston-Clear Lake) +Pratima Akkunoor (Arizona State University) +Paul Amer (University of Delaware) +Shamiul Azom (Arizona State University) +Lichun Bao (University of California at Irvine) +Paul Barford (University of Wisconsin) +Bobby Bhattacharjee (University of Maryland) +Steven Bellovin (Columbia University) +Pravin Bhagwat (Wibhu) +Supratik Bhattacharyya (previously at Sprint) +Ernst Biersack (Eurécom Institute) +Shahid Bokhari (University of Engineering & Technology, Lahore) +Jean Bolot (Technicolor Research) +Daniel Brushteyn (former University of Pennsylvania student) +Ken Calvert (University of Kentucky) +Evandro Cantu (Federal University of Santa Catarina) +Jeff Case (SNMP Research International) +Jeff Chaltas (Sprint) +Vinton Cerf (Google) +Byung Kyu Choi (Michigan Technological University) +Bram Cohen (BitTorrent, Inc.) +Constantine Coutras (Pace University) +John Daigle (University of Mississippi) +Edmundo A. de Souza e Silva (Federal University of Rio de Janeiro) + + Philippe Decuetos (Eurécom Institute) +Christophe Diot (Technicolor Research) +Prithula Dhunghel (Akamai) +Deborah Estrin (University of California, Los Angeles) +Michalis Faloutsos (University of California at Riverside) +Wu-chi Feng (Oregon Graduate Institute) +Sally Floyd (ICIR, University of California at Berkeley) +Paul Francis (Max Planck Institute) +David Fullager (Netflix) +Lixin Gao (University of Massachusetts) +JJ Garcia-Luna-Aceves (University of California at Santa Cruz) +Mario Gerla (University of California at Los Angeles) +David Goodman (NYU-Poly) +Yang Guo (Alcatel/Lucent Bell Labs) +Tim Griffin (Cambridge University) +Max Hailperin (Gustavus Adolphus College) +Bruce Harvey (Florida A&M University, Florida State University) +Carl Hauser (Washington State University) +Rachelle Heller (George Washington University) +Phillipp Hoschka (INRIA/W3C) +Wen Hsin (Park University) +Albert Huang (former University of Pennsylvania student) +Cheng Huang (Microsoft Research) +Esther A. Hughes (Virginia Commonwealth University) +Van Jacobson (Xerox PARC) +Pinak Jain (former NYU-Poly student) +Jobin James (University of California at Riverside) +Sugih Jamin (University of Michigan) +Shivkumar Kalyanaraman (IBM Research, India) +Jussi Kangasharju (University of Helsinki) +Sneha Kasera (University of Utah) + + Parviz Kermani (formerly of IBM Research) +Hyojin Kim (former University of Pennsylvania student) +Leonard Kleinrock (University of California at Los Angeles) +David Kotz (Dartmouth College) +Beshan Kulapala (Arizona State University) +Rakesh Kumar (Bloomberg) +Miguel A. 
Labrador (University of South Florida)
+Simon Lam (University of Texas)
+Steve Lai (Ohio State University)
+Tom LaPorta (Penn State University)
+Tim Berners-Lee (World Wide Web Consortium)
+Arnaud Legout (INRIA)
+Lee Leitner (Drexel University)
+Brian Levine (University of Massachusetts)
+Chunchun Li (former NYU-Poly student)
+Yong Liu (NYU-Poly)
+William Liang (former University of Pennsylvania student)
+Willis Marti (Texas A&M University)
+Nick McKeown (Stanford University)
+Josh McKinzie (Park University)
+Deep Medhi (University of Missouri, Kansas City)
+Bob Metcalfe (International Data Group)
+Sue Moon (KAIST)
+Jenni Moyer (Comcast)
+Erich Nahum (IBM Research)
+Christos Papadopoulos (Colorado State University)
+Craig Partridge (BBN Technologies)
+Radia Perlman (Intel)
+Jitendra Padhye (Microsoft Research)
+Vern Paxson (University of California at Berkeley)
+Kevin Phillips (Sprint)
+George Polyzos (Athens University of Economics and Business)
+Sriram Rajagopalan (Arizona State University)
+Ramachandran Ramjee (Microsoft Research)
+Ken Reek (Rochester Institute of Technology)
+Martin Reisslein (Arizona State University)
+Jennifer Rexford (Princeton University)
+Leon Reznik (Rochester Institute of Technology)
+Pablo Rodriguez (Telefonica)
+Sumit Roy (University of Washington)
+Dan Rubenstein (Columbia University)
+Avi Rubin (Johns Hopkins University)
+Douglas Salane (John Jay College)
+Despina Saparilla (Cisco Systems)
+John Schanz (Comcast)
+Henning Schulzrinne (Columbia University)
+Mischa Schwartz (Columbia University)
+Ardash Sethi (University of Delaware)
+Harish Sethu (Drexel University)
+K. Sam Shanmugan (University of Kansas)
+Prashant Shenoy (University of Massachusetts)
+Clay Shields (Georgetown University)
+Subin Shrestra (University of Pennsylvania)
+Bojie Shu (former NYU-Poly student)
+Mihail L. Sichitiu (NC State University)
+Peter Steenkiste (Carnegie Mellon University)
+Tatsuya Suda (University of California at Irvine)
+Kin Sun Tam (State University of New York at Albany)
+Don Towsley (University of Massachusetts)
+David Turner (California State University, San Bernardino)
+Nitin Vaidya (University of Illinois)
+Michele Weigle (Clemson University)
+David Wetherall (University of Washington)
+Ira Winston (University of Pennsylvania)
+Di Wu (Sun Yat-sen University)
+Shirley Wynn (NYU-Poly)
+Raj Yavatkar (Intel)
+Yechiam Yemini (Columbia University)
+Dian Yu (NYU Shanghai)
+Ming Yu (State University of New York at Binghamton)
+Ellen Zegura (Georgia Institute of Technology)
+Honggang Zhang (Suffolk University)
+Hui Zhang (Carnegie Mellon University)
+Lixia Zhang (University of California at Los Angeles)
+Meng Zhang (former NYU-Poly student)
+Shuchun Zhang (former University of Pennsylvania student)
+Xiaodong Zhang (Ohio State University)
+ZhiLi Zhang (University of Minnesota)
+Phil Zimmermann (independent consultant)
+Mike Zink (University of Massachusetts)
+Cliff C. Zou (University of Central Florida)
+We also want to thank the entire Pearson team—in particular, Matt Goldstein and Joanne Manning—who have
+done an absolutely outstanding job on this seventh edition (and who have put up with two very finicky authors
+who seem congenitally unable to meet deadlines!). Thanks also to our artists, Janet Theurer and Patrice
+Rossi Calkin, for their work on the beautiful figures in this and earlier editions of our book, and to Katie Ostler
+and her team at Cenveo for their wonderful production work on this edition.
Finally, a most special thanks go to +our previous two editors at ­Addison-Wesley—Michael Hirsch and Susan Hartman. This book would not be +what it is (and may well not have been at all) without their graceful management, constant encouragement, +nearly infinite patience, good humor, and perseverance. + + Table of Contents +Chapter 1 Computer Networks and the Internet 1 +1.1 What Is the Internet? 2 +1.1.1 A Nuts-and-Bolts Description 2 +1.1.2 A Services Description 5 +1.1.3 What Is a Protocol? 7 +1.2 The Network Edge 9 +1.2.1 Access Networks 12 +1.2.2 Physical Media 18 +1.3 The Network Core 21 +1.3.1 Packet Switching 23 +1.3.2 Circuit Switching 27 +1.3.3 A Network of Networks 31 +1.4 Delay, Loss, and Throughput in Packet-Switched Networks 35 +1.4.1 Overview of Delay in Packet-Switched Networks 35 +1.4.2 Queuing Delay and Packet Loss 39 +1.4.3 End-to-End Delay 41 +1.4.4 Throughput in Computer Networks 43 +1.5 Protocol Layers and Their Service Models 47 +1.5.1 Layered Architecture 47 +1.5.2 Encapsulation 53 +1.6 Networks Under Attack 55 +1.7 History of Computer Networking and the Internet 59 +1.7.1 The Development of Packet Switching: 1961–1972 59 +1.7.2 Proprietary Networks and Internetworking: 1972–1980 60 +1.7.3 A Proliferation of Networks: 1980–1990 62 +1.7.4 The Internet Explosion: The 1990s 63 +1.7.5 The New Millennium 64 +1.8 Summary 65 + + Homework Problems and Questions 67 +Wireshark Lab 77 +Interview: Leonard Kleinrock 79 +Chapter 2 Application Layer 83 +2.1 Principles of Network Applications 84 +2.1.1 Network Application Architectures 86 +2.1.2 Processes Communicating 88 +2.1.3 Transport Services Available to Applications 90 +2.1.4 Transport Services Provided by the Internet 93 +2.1.5 Application-Layer Protocols 96 +2.1.6 Network Applications Covered in This Book 97 +2.2 The Web and HTTP 98 +2.2.1 Overview of HTTP 98 +2.2.2 Non-Persistent and Persistent Connections 100 +2.2.3 HTTP Message Format 103 +2.2.4 User-Server Interaction: Cookies 108 +2.2.5 Web Caching 110 +2.3 Electronic Mail in the Internet 116 +2.3.1 SMTP 118 +2.3.2 Comparison with HTTP 121 +2.3.3 Mail Message Formats 121 +2.3.4 Mail Access Protocols 122 +2.4 DNS—The Internet’s Directory Service 126 +2.4.1 Services Provided by DNS 127 +2.4.2 Overview of How DNS Works 129 +2.4.3 DNS Records and Messages 135 +2.5 Peer-to-Peer Applications 140 +2.5.1 P2P File Distribution 140 +2.6 Video Streaming and Content Distribution Networks 147 +2.6.1 Internet Video 148 +2.6.2 HTTP Streaming and DASH 148 + + 2.6.3 Content Distribution Networks 149 +2.6.4 Case Studies: Netflix, YouTube, and Kankan 153 +2.7 Socket Programming: Creating Network Applications 157 +2.7.1 Socket Programming with UDP 159 +2.7.2 Socket Programming with TCP 164 +2.8 Summary 170 +Homework Problems and Questions 171 +Socket Programming Assignments 180 +Wireshark Labs: HTTP, DNS 182 +Interview: Marc Andreessen 184 +Chapter 3 Transport Layer 187 +3.1 Introduction and Transport-Layer Services 188 +3.1.1 Relationship Between Transport and Network Layers 188 +3.1.2 Overview of the Transport Layer in the Internet 191 +3.2 Multiplexing and Demultiplexing 193 +3.3 Connectionless Transport: UDP 200 +3.3.1 UDP Segment Structure 204 +3.3.2 UDP Checksum 204 +3.4 Principles of Reliable Data Transfer 206 +3.4.1 Building a Reliable Data Transfer Protocol 208 +3.4.2 Pipelined Reliable Data Transfer Protocols 217 +3.4.3 Go-Back-N (GBN) 221 +3.4.4 Selective Repeat (SR) 226 +3.5 Connection-Oriented Transport: TCP 233 +3.5.1 The TCP Connection 233 +3.5.2 TCP Segment Structure 236 
+3.5.3 Round-Trip Time Estimation and Timeout 241 +3.5.4 Reliable Data Transfer 244 +3.5.5 Flow Control 252 +3.5.6 TCP Connection Management 255 +3.6 Principles of Congestion Control 261 + + 3.6.1 The Causes and the Costs of Congestion 261 +3.6.2 Approaches to Congestion Control 268 +3.7 TCP Congestion Control 269 +3.7.1 Fairness 279 +3.7.2 Explicit Congestion Notification (ECN): Network-assisted Congestion Control 282 +3.8 Summary 284 +Homework Problems and Questions 286 +Programming Assignments 301 +Wireshark Labs: Exploring TCP, UDP 302 +Interview: Van Jacobson 303 +Chapter 4 The Network Layer: Data Plane 305 +4.1 Overview of Network Layer 306 +4.1.1 Forwarding and Routing: The Network Data and Control Planes 306 +4.1.2 Network Service Models 311 +4.2 What’s Inside a Router? 313 +4.2.1 Input Port Processing and Destination-Based Forwarding 316 +4.2.2 Switching 319 +4.2.3 Output Port Processing 321 +4.2.4 Where Does Queuing Occur? 321 +4.2.5 Packet Scheduling 325 +4.3 The Internet Protocol (IP): IPv4, Addressing, IPv6, and More 329 +4.3.1 IPv4 Datagram Format 330 +4.3.2 IPv4 Datagram Fragmentation 332 +4.3.3 IPv4 Addressing 334 +4.3.4 Network Address Translation (NAT) 345 +4.3.5 IPv6 348 +4.4 Generalized Forwarding and SDN 354 +4.4.1 Match 356 +4.4.2 Action 358 +4.4.3 OpenFlow Examples of Match-plus-action in Action 358 +4.5 Summary 361 + + Homework Problems and Questions 361 +Wireshark Lab 370 +Interview: Vinton G. Cerf 371 +Chapter 5 The Network Layer: Control Plane 373 +5.1 Introduction 374 +5.2 Routing Algorithms 376 +5.2.1 The Link-State (LS) Routing Algorithm 379 +5.2.2 The Distance-Vector (DV) Routing Algorithm 384 +5.3 Intra-AS Routing in the Internet: OSPF 391 +5.4 Routing Among the ISPs: BGP 395 +5.4.1 The Role of BGP 395 +5.4.2 Advertising BGP Route Information 396 +5.4.3 Determining the Best Routes 398 +5.4.4 IP-Anycast 402 +5.4.5 Routing Policy 403 +5.4.6 Putting the Pieces Together: Obtaining Internet Presence 406 +5.5 The SDN Control Plane 407 +5.5.1 The SDN Control Plane: SDN Controller and SDN Control Applications 410 +5.5.2 OpenFlow Protocol 412 +5.5.3 Data and Control Plane Interaction: An Example 414 +5.5.4 SDN: Past and Future 415 +5.6 ICMP: The Internet Control Message Protocol 419 +5.7 Network Management and SNMP 421 +5.7.1 The Network Management Framework 422 +5.7.2 The Simple Network Management Protocol (SNMP) 424 +5.8 Summary 426 +Homework Problems and Questions 427 +Socket Programming Assignment 433 +Programming Assignment 434 +Wireshark Lab 435 +Interview: Jennifer Rexford 436 + + Chapter 6 The Link Layer and LANs 439 +6.1 Introduction to the Link Layer 440 +6.1.1 The Services Provided by the Link Layer 442 +6.1.2 Where Is the Link Layer Implemented? 
443 +6.2 Error-Detection and -Correction Techniques 444 +6.2.1 Parity Checks 446 +6.2.2 Checksumming Methods 448 +6.2.3 Cyclic Redundancy Check (CRC) 449 +6.3 Multiple Access Links and Protocols 451 +6.3.1 Channel Partitioning Protocols 453 +6.3.2 Random Access Protocols 455 +6.3.3 Taking-Turns Protocols 464 +6.3.4 DOCSIS: The Link-Layer Protocol for Cable Internet Access 465 +6.4 Switched Local Area Networks 467 +6.4.1 Link-Layer Addressing and ARP 468 +6.4.2 Ethernet 474 +6.4.3 Link-Layer Switches 481 +6.4.4 Virtual Local Area Networks (VLANs) 487 +6.5 Link Virtualization: A Network as a Link Layer 491 +6.5.1 Multiprotocol Label Switching (MPLS) 492 +6.6 Data Center Networking 495 +6.7 Retrospective: A Day in the Life of a Web Page Request 500 +6.7.1 Getting Started: DHCP, UDP, IP, and Ethernet 500 +6.7.2 Still Getting Started: DNS and ARP 502 +6.7.3 Still Getting Started: Intra-Domain Routing to the DNS Server 503 +6.7.4 Web Client-Server Interaction: TCP and HTTP 504 +6.8 Summary 506 +Homework Problems and Questions 507 +Wireshark Lab 515 +Interview: Simon S. Lam 516 + + Chapter 7 Wireless and Mobile Networks 519 +7.1 Introduction 520 +7.2 Wireless Links and Network Characteristics 525 +7.2.1 CDMA 528 +7.3 WiFi: 802.11 Wireless LANs 532 +7.3.1 The 802.11 Architecture 533 +7.3.2 The 802.11 MAC Protocol 537 +7.3.3 The IEEE 802.11 Frame 542 +7.3.4 Mobility in the Same IP Subnet 546 +7.3.5 Advanced Features in 802.11 547 +7.3.6 Personal Area Networks: Bluetooth and Zigbee 548 +7.4 Cellular Internet Access 551 +7.4.1 An Overview of Cellular Network Architecture 551 +7.4.2 3G Cellular Data Networks: Extending the Internet to Cellular Subscribers 554 +7.4.3 On to 4G: LTE 557 +7.5 Mobility Management: Principles 560 +7.5.1 Addressing 562 +7.5.2 Routing to a Mobile Node 564 +7.6 Mobile IP 570 +7.7 Managing Mobility in Cellular Networks 574 +7.7.1 Routing Calls to a Mobile User 576 +7.7.2 Handoffs in GSM 577 +7.8 Wireless and Mobility: Impact on Higher-Layer Protocols 580 +7.9 Summary 582 +Homework Problems and Questions 583 +Wireshark Lab 588 +Interview: Deborah Estrin 589 +Chapter 8 Security in Computer Networks 593 +8.1 What Is Network Security? 594 +8.2 Principles of Cryptography 596 +8.2.1 Symmetric Key Cryptography 598 +8.2.2 Public Key Encryption 604 + + 8.3 Message Integrity and Digital Signatures 610 +8.3.1 Cryptographic Hash Functions 611 +8.3.2 Message Authentication Code 613 +8.3.3 Digital Signatures 614 +8.4 End-Point Authentication 621 +8.4.1 Authentication Protocol ap1.0 622 +8.4.2 Authentication Protocol ap2.0 622 +8.4.3 Authentication Protocol ap3.0 623 +8.4.4 Authentication Protocol ap3.1 623 +8.4.5 Authentication Protocol ap4.0 624 +8.5 Securing E-Mail 626 +8.5.1 Secure E-Mail 627 +8.5.2 PGP 630 +8.6 Securing TCP Connections: SSL 631 +8.6.1 The Big Picture 632 +8.6.2 A More Complete Picture 635 +8.7 Network-Layer Security: IPsec and Virtual Private Networks 637 +8.7.1 IPsec and Virtual Private Networks (VPNs) 638 +8.7.2 The AH and ESP Protocols 640 +8.7.3 Security Associations 640 +8.7.4 The IPsec Datagram 641 +8.7.5 IKE: Key Management in IPsec 645 +8.8 Securing Wireless LANs 646 +8.8.1 Wired Equivalent Privacy (WEP) 646 +8.8.2 IEEE 802.11i 648 +8.9 Operational Security: Firewalls and Intrusion Detection Systems 651 +8.9.1 Firewalls 651 +8.9.2 Intrusion Detection Systems 659 +8.10 Summary 662 +Homework Problems and Questions 664 +Wireshark Lab 672 + + IPsec Lab 672 +Interview: Steven M. 
Bellovin 673 +Chapter 9 Multimedia Networking 675 +9.1 Multimedia Networking Applications 676 +9.1.1 Properties of Video 676 +9.1.2 Properties of Audio 677 +9.1.3 Types of Multimedia Network Applications 679 +9.2 Streaming Stored Video 681 +9.2.1 UDP Streaming 683 +9.2.2 HTTP Streaming 684 +9.3 Voice-over-IP 688 +9.3.1 Limitations of the Best-Effort IP Service 688 +9.3.2 Removing Jitter at the Receiver for Audio 691 +9.3.3 Recovering from Packet Loss 694 +9.3.4 Case Study: VoIP with Skype 697 +9.4 Protocols for Real-Time Conversational Applications 700 +9.4.1 RTP 700 +9.4.2 SIP 703 +9.5 Network Support for Multimedia 709 +9.5.1 Dimensioning Best-Effort Networks 711 +9.5.2 Providing Multiple Classes of Service 712 +9.5.3 Diffserv 719 +9.5.4 Per-Connection Quality-of-Service (QoS) Guarantees: Resource Reservation and Call +Admission 723 +9.6 Summary 726 +Homework Problems and Questions 727 +Programming Assignment 735 +Interview: Henning Schulzrinne 736 +References 741 +Index 783 + + Chapter 1 Computer Networks and the Internet + +Today’s Internet is arguably the largest engineered system ever created by ­mankind, with hundreds of +millions of connected computers, communication links, and switches; with billions of users who connect +via laptops, tablets, and smartphones; and with an array of new Internet-connected “things” including +game consoles, surveillance systems, watches, eye glasses, thermostats, body scales, and cars. Given +that the Internet is so large and has so many diverse components and uses, is there any hope of +understanding how it works? Are there guiding principles and structure that can provide a foundation for +understanding such an amazingly large and complex system? And if so, is it possible that it actually +could be both interesting and fun to learn about computer networks? Fortunately, the answer to all of +these questions is a resounding YES! Indeed, it’s our aim in this book to provide you with a modern +introduction to the dynamic field of computer networking, giving you the principles and practical insights +you’ll need to understand not only today’s networks, but tomorrow’s as well. +This first chapter presents a broad overview of computer networking and the Internet. Our goal here is to +paint a broad picture and set the context for the rest of this book, to see the forest through the trees. +We’ll cover a lot of ground in this introductory chapter and discuss a lot of the pieces of a computer +network, without losing sight of the big picture. +We’ll structure our overview of computer networks in this chapter as follows. After introducing some +basic terminology and concepts, we’ll first examine the basic hardware and software components that +make up a network. We’ll begin at the network’s edge and look at the end systems and network +applications running in the network. We’ll then explore the core of a computer network, examining the +links and the switches that transport data, as well as the access networks and physical media that +connect end systems to the network core. We’ll learn that the Internet is a network of networks, and we’ll +learn how these networks connect with each other. +After having completed this overview of the edge and core of a computer network, we’ll take the broader +and more abstract view in the second half of this chapter. 
We’ll examine delay, loss, and throughput of +data in a computer network and provide simple quantitative models for end-to-end throughput and delay: +models that take into account transmission, propagation, and queuing delays. We’ll then introduce some +of the key architectural principles in computer networking, namely, protocol layering and service models. +We’ll also learn that computer networks are vulnerable to many different types of attacks; we’ll survey + + some of these attacks and consider how computer networks can be made more secure. Finally, we’ll +close this chapter with a brief history of computer networking. + + 1.1 What Is the Internet? +In this book, we’ll use the public Internet, a specific computer network, as our principal vehicle for +discussing computer networks and their protocols. But what is the Internet? There are a couple of ways +to answer this question. First, we can describe the nuts and bolts of the Internet, that is, the basic +hardware and software components that make up the Internet. Second, we can describe the Internet in +terms of a networking infrastructure that provides services to distributed applications. Let’s begin with +the nuts-and-bolts description, using Figure 1.1 to illustrate our discussion. + +1.1.1 A Nuts-and-Bolts Description +The Internet is a computer network that interconnects billions of computing devices throughout the +world. Not too long ago, these computing devices were primarily traditional desktop PCs, Linux +workstations, and so-called servers that store and transmit information such as Web pages and e-mail +messages. Increasingly, however, nontraditional Internet “things” such as laptops, smartphones, tablets, +TVs, gaming consoles, thermostats, home security systems, home appliances, watches, eye glasses, +cars, traffic control systems and more are being connected to the Internet. Indeed, the term computer +network is beginning to sound a bit dated, given the many nontraditional devices that are being hooked +up to the Internet. In Internet jargon, all of these devices are called hosts or end systems. By some +estimates, in 2015 there were about 5 billion devices connected to the Internet, and the number will +reach 25 billion by 2020 [Gartner 2014]. It is estimated that in 2015 there were over 3.2 billion Internet +users worldwide, approximately 40% of the world population [ITU 2015]. + + Figure 1.1 Some pieces of the Internet + +End systems are connected together by a network of communication links and packet switches. +We’ll see in Section 1.2 that there are many types of communication links, which are made up of + + different types of physical media, including coaxial cable, copper wire, optical fiber, and radio spectrum. +Different links can transmit data at different rates, with the transmission rate of a link measured in +bits/second. When one end system has data to send to another end system, the sending end system +segments the data and adds header bytes to each segment. The resulting packages of information, +known as packets in the jargon of computer networks, are then sent through the network to the +destination end system, where they are reassembled into the original data. +A packet switch takes a packet arriving on one of its incoming communication links and forwards that +packet on one of its outgoing communication links. Packet switches come in many shapes and flavors, +but the two most prominent types in today’s Internet are routers and link-layer switches. 
Both types of switches forward packets toward their ultimate destinations. Link-layer switches are typically used in access networks, while routers are typically used in the network core. The sequence of communication links and packet switches traversed by a packet from the sending end system to the receiving end system is known as a route or path through the network. Cisco predicts annual global IP traffic will pass the zettabyte (10^21 bytes) threshold by the end of 2016, and will reach 2 zettabytes per year by 2019 [Cisco VNI 2015].

Packet-switched networks (which transport packets) are in many ways similar to transportation networks of highways, roads, and intersections (which transport vehicles). Consider, for example, a factory that needs to move a large amount of cargo to some destination warehouse located thousands of kilometers away. At the factory, the cargo is segmented and loaded into a fleet of trucks. Each of the trucks then independently travels through the network of highways, roads, and intersections to the destination warehouse. At the destination warehouse, the cargo is unloaded and grouped with the rest of the cargo arriving from the same shipment. Thus, in many ways, packets are analogous to trucks, communication links are analogous to highways and roads, packet switches are analogous to intersections, and end systems are analogous to buildings. Just as a truck takes a path through the transportation network, a packet takes a path through a computer network.

End systems access the Internet through Internet Service Providers (ISPs), including residential ISPs such as local cable or telephone companies; corporate ISPs; university ISPs; ISPs that provide WiFi access in airports, hotels, coffee shops, and other public places; and cellular data ISPs, providing mobile access to our smartphones and other devices. Each ISP is in itself a network of packet switches and communication links. ISPs provide a variety of types of network access to the end systems, including residential broadband access such as cable modem or DSL, high-speed local area network access, and mobile wireless access. ISPs also provide Internet access to content providers, connecting Web sites and video servers directly to the Internet. The Internet is all about connecting end systems to each other, so the ISPs that provide access to end systems must also be interconnected. These lower-tier ISPs are interconnected through national and international upper-tier ISPs such as Level 3 Communications, AT&T, Sprint, and NTT. An upper-tier ISP consists of high-speed routers interconnected with high-speed fiber-optic links. Each ISP network, whether upper-tier or lower-tier, is managed independently, runs the IP protocol (see below), and conforms to certain naming and address conventions. We'll examine ISPs and their interconnection more closely in Section 1.3.

End systems, packet switches, and other pieces of the Internet run protocols that control the sending and receiving of information within the Internet. The Transmission Control Protocol (TCP) and the Internet Protocol (IP) are two of the most important protocols in the Internet. The IP protocol specifies the format of the packets that are sent and received among routers and end systems. The Internet's principal protocols are collectively known as TCP/IP. We'll begin looking into protocols in this introductory chapter. But that's just a start—much of this book is concerned with computer network protocols!
Given the importance of protocols to the Internet, it's important that everyone agree on what each and every protocol does, so that people can create systems and products that interoperate. This is where standards come into play. Internet standards are developed by the Internet Engineering Task Force (IETF) [IETF 2016]. The IETF standards documents are called requests for comments (RFCs). RFCs started out as general requests for comments (hence the name) to resolve network and protocol design problems that faced the precursor to the Internet [Allman 2011]. RFCs tend to be quite technical and detailed. They define protocols such as TCP, IP, HTTP (for the Web), and SMTP (for e-mail). There are currently more than 7,000 RFCs. Other bodies also specify standards for network components, most notably for network links. The IEEE 802 LAN/MAN Standards Committee [IEEE 802 2016], for example, specifies the Ethernet and wireless WiFi standards.

1.1.2 A Services Description

Our discussion above has identified many of the pieces that make up the Internet. But we can also describe the Internet from an entirely different angle—namely, as an infrastructure that provides services to applications. In addition to traditional applications such as e-mail and Web surfing, Internet applications include mobile smartphone and tablet applications, including Internet messaging, mapping with real-time road-traffic information, music streaming from the cloud, movie and television streaming, online social networks, video conferencing, multi-person games, and location-based recommendation systems. The applications are said to be distributed applications, since they involve multiple end systems that exchange data with each other. Importantly, Internet applications run on end systems—they do not run in the packet switches in the network core. Although packet switches facilitate the exchange of data among end systems, they are not concerned with the application that is the source or sink of data.

Let's explore a little more what we mean by an infrastructure that provides services to applications. To this end, suppose you have an exciting new idea for a distributed Internet application, one that may greatly benefit humanity or one that may simply make you rich and famous. How might you go about transforming this idea into an actual Internet application? Because applications run on end systems, you are going to need to write programs that run on the end systems. You might, for example, write your programs in Java, C, or Python. Now, because you are developing a distributed Internet application, the programs running on the different end systems will need to send data to each other. And here we get to a central issue—one that leads to the alternative way of describing the Internet as a platform for applications. How does one program running on one end system instruct the Internet to deliver data to another program running on another end system?

End systems attached to the Internet provide a socket interface that specifies how a program running on one end system asks the Internet infrastructure to deliver data to a specific destination program running on another end system. This Internet socket interface is a set of rules that the sending program must follow so that the Internet can deliver the data to the destination program. We'll discuss the Internet socket interface in detail in Chapter 2. For now, let's draw upon a simple analogy, one that we will frequently use in this book.
Suppose Alice wants to send a letter to Bob using the postal service. +Alice, of course, can’t just write the letter (the data) and drop the letter out her window. Instead, the +postal service requires that Alice put the letter in an envelope; write Bob’s full name, address, and zip +code in the center of the envelope; seal the envelope; put a stamp in the upper-right-hand corner of the +envelope; and finally, drop the envelope into an official postal service mailbox. Thus, the postal service +has its own “postal service interface,” or set of rules, that Alice must follow to have the postal service +deliver her letter to Bob. In a similar manner, the Internet has a socket interface that the program +sending data must follow to have the Internet deliver the data to the program that will receive the data. +The postal service, of course, provides more than one service to its customers. It provides express +delivery, reception confirmation, ordinary use, and many more services. In a similar manner, the Internet +provides multiple services to its applications. When you develop an Internet application, you too must +choose one of the Internet’s services for your application. We’ll describe the Internet’s services in +Chapter 2. +We have just given two descriptions of the Internet; one in terms of its hardware and software +components, the other in terms of an infrastructure for providing services to distributed applications. But +perhaps you are still confused as to what the Internet is. What are packet switching and TCP/IP? What +are routers? What kinds of communication links are present in the Internet? What is a distributed +application? How can a thermostat or body scale be attached to the Internet? If you feel a bit +overwhelmed by all of this now, don’t worry—the purpose of this book is to introduce you to both the +nuts and bolts of the Internet and the principles that govern how and why it works. We’ll explain these +important terms and questions in the following sections and chapters. + +1.1.3 What Is a Protocol? + + Now that we’ve got a bit of a feel for what the Internet is, let’s consider another important buzzword in +computer networking: protocol. What is a protocol? What does a protocol do? +A Human Analogy +It is probably easiest to understand the notion of a computer network protocol by first considering some +human analogies, since we humans execute protocols all of the time. Consider what you do when you +want to ask someone for the time of day. A typical exchange is shown in Figure 1.2. Human protocol (or +good manners, at least) dictates that one first offer a greeting (the first “Hi” in Figure 1.2) to initiate +communication with someone else. The typical response to a “Hi” is a returned “Hi” message. Implicitly, +one then takes a cordial “Hi” response as an indication that one can proceed and ask for the time of day. +A different response to the initial “Hi” (such as “Don’t bother me!” or “I don’t speak English,” or some +unprintable reply) might + +Figure 1.2 A human protocol and a computer network protocol + +indicate an unwillingness or inability to communicate. In this case, the human protocol would be not to +ask for the time of day. Sometimes one gets no response at all to a question, in which case one typically +gives up asking that person for the time. Note that in our human protocol, there are specific messages + + we send, and specific actions we take in response to the received reply messages or other events (such +as no reply within some given amount of time). 
Clearly, transmitted and received messages, and actions +taken when these messages are sent or received or other events occur, play a central role in a human +protocol. If people run different protocols (for example, if one person has manners but the other does +not, or if one understands the concept of time and the other does not) the protocols do not interoperate +and no useful work can be accomplished. The same is true in networking—it takes two (or more) +communicating entities running the same protocol in order to accomplish a task. +Let’s consider a second human analogy. Suppose you’re in a college class (a computer networking +class, for example!). The teacher is droning on about protocols and you’re confused. The teacher stops +to ask, “Are there any questions?” (a message that is transmitted to, and received by, all students who +are not sleeping). You raise your hand (transmitting an implicit message to the teacher). Your teacher +acknowledges you with a smile, saying “Yes . . .” (a transmitted message encouraging you to ask your +question—teachers love to be asked questions), and you then ask your question (that is, transmit your +message to your teacher). Your teacher hears your question (receives your question message) and +answers (transmits a reply to you). Once again, we see that the transmission and receipt of messages, +and a set of conventional actions taken when these messages are sent and received, are at the heart of +this question-and-answer protocol. +Network Protocols +A network protocol is similar to a human protocol, except that the entities exchanging messages and +taking actions are hardware or software components of some device (for example, computer, +smartphone, tablet, router, or other network-capable device). All activity in the Internet that involves two +or more communicating remote entities is governed by a protocol. For example, hardware-implemented +protocols in two physically connected computers control the flow of bits on the “wire” between the two +network interface cards; congestion-control protocols in end systems control the rate at which packets +are transmitted between sender and receiver; protocols in routers determine a packet’s path from +source to destination. Protocols are running everywhere in the Internet, and consequently much of this +book is about computer network protocols. +As an example of a computer network protocol with which you are probably familiar, consider what +happens when you make a request to a Web server, that is, when you type the URL of a Web page into +your Web browser. The scenario is illustrated in the right half of Figure 1.2. First, your computer will +send a connection request message to the Web server and wait for a reply. The Web server will +eventually receive your connection request message and return a connection reply message. Knowing +that it is now OK to request the Web document, your computer then sends the name of the Web page it +wants to fetch from that Web server in a GET message. Finally, the Web server returns the Web page +(file) to your computer. + + Given the human and networking examples above, the exchange of messages and the actions taken +when these messages are sent and received are the key defining elements of a protocol: +A protocol defines the format and the order of messages exchanged between two or more +communicating entities, as well as the actions taken on the transmission and/or receipt of a message +or other event. 
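The Web exchange just described maps almost line-for-line onto code. Below is a minimal sketch in Python, the language this book uses for socket programming in Chapter 2. It is an illustration only, not the book's example: the host name example.com and the use of HTTP/1.0 (chosen so the server closes the connection when the page has been sent) are assumptions made here, not details from the text.

# A minimal sketch of the protocol exchange described above: connect()
# plays the role of the connection-request/reply handshake, the GET line
# asks for the document, and the reply is the Web page itself.
import socket

HOST = "example.com"  # hypothetical server; any public Web server works

sock = socket.create_connection((HOST, 80))        # connection request + reply
request = f"GET / HTTP/1.0\r\nHost: {HOST}\r\n\r\n"
sock.sendall(request.encode("ascii"))              # the GET message
reply = b""
while True:
    chunk = sock.recv(4096)                        # the Web page comes back
    if not chunk:                                  # HTTP/1.0 server closes when done
        break
    reply += chunk
sock.close()
print(reply[:200])                                 # first bytes of the reply

Note how every step is a message sent, a message received, or an action taken in response—exactly the elements the definition above singles out.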
The Internet, and computer networks in general, make extensive use of protocols. Different protocols are used to accomplish different communication tasks. As you read through this book, you will learn that some protocols are simple and straightforward, while others are complex and intellectually deep. Mastering the field of computer networking is equivalent to understanding the what, why, and how of networking protocols.

1.2 The Network Edge

In the previous section we presented a high-level overview of the Internet and networking protocols. We are now going to delve a bit more deeply into the components of a computer network (and the Internet, in particular). We begin in this section at the edge of a network and look at the components with which we are most familiar—namely, the computers, smartphones and other devices that we use on a daily basis. In the next section we'll move from the network edge to the network core and examine switching and routing in computer networks.

Recall from the previous section that in computer networking jargon, the computers and other devices connected to the Internet are often referred to as end systems. They are referred to as end systems because they sit at the edge of the Internet, as shown in Figure 1.3. The Internet's end systems include desktop computers (e.g., desktop PCs, Macs, and Linux boxes), servers (e.g., Web and e-mail servers), and mobile devices (e.g., laptops, smartphones, and tablets). Furthermore, an increasing number of non-traditional "things" are being attached to the Internet as end systems (see the Case History feature).

End systems are also referred to as hosts because they host (that is, run) application programs such as a Web browser program, a Web server program, an e-mail client program, or an e-mail server program. Throughout this book we will use the terms hosts and end systems interchangeably; that is, host = end system.

Figure 1.3 End-system interaction

CASE HISTORY
THE INTERNET OF THINGS
Can you imagine a world in which just about everything is wirelessly connected to the Internet? A world in which most people, cars, bicycles, eye glasses, watches, toys, hospital equipment, home sensors, classrooms, video surveillance systems, atmospheric sensors, store-shelf products, and pets are connected? This world of the Internet of Things (IoT) may actually be just around the corner.

By some estimates, as of 2015 there are already 5 billion things connected to the Internet, and the number could reach 25 billion by 2020 [Gartner 2014]. These things include our smartphones, which already follow us around in our homes, offices, and cars, reporting our geolocations and usage data to our ISPs and Internet applications. But in addition to our smartphones, a wide variety of non-traditional "things" are already available as products. For example, there are Internet-connected wearables, including watches (from Apple and many others) and eye glasses. Internet-connected glasses can, for example, upload everything we see to the cloud, allowing us to share our visual experiences with people around the world in real-time. There are Internet-connected things already available for the smart home, including Internet-connected thermostats that can be controlled remotely from our smartphones, and Internet-connected body scales, enabling us to graphically review the progress of our diets from our smartphones. There are Internet-connected toys, including dolls that recognize and interpret a child's speech and respond appropriately.
The IoT offers potentially revolutionary benefits to users. But at the same time there are also huge security and privacy risks. For example, attackers, via the Internet, might be able to hack into IoT devices or into the servers collecting data from IoT devices. For example, an attacker could hijack an Internet-connected doll and talk directly with a child; or an attacker could hack into a database that stores personal health and activity information collected from wearable devices. These security and privacy concerns could undermine the consumer confidence necessary for the technologies to meet their full potential and may result in less widespread adoption [FTC 2015].

Hosts are sometimes further divided into two categories: clients and servers. Informally, clients tend to be desktop and mobile PCs, smartphones, and so on, whereas servers tend to be more powerful machines that store and distribute Web pages, stream video, relay e-mail, and so on. Today, most of the servers from which we receive search results, e-mail, Web pages, and videos reside in large data centers. For example, Google has 50-100 data centers, including about 15 large centers, each with more than 100,000 servers.

1.2.1 Access Networks

Having considered the applications and end systems at the "edge of the network," let's next consider the access network—the network that physically connects an end system to the first router (also known as the "edge router") on a path from the end system to any other distant end system. Figure 1.4 shows several types of access networks with thick, shaded lines and the settings (home, enterprise, and wide-area mobile wireless) in which they are used.

Figure 1.4 Access networks

Home Access: DSL, Cable, FTTH, Dial-Up, and Satellite

In developed countries as of 2014, more than 78 percent of the households have Internet access, with Korea, Netherlands, Finland, and Sweden leading the way with more than 80 percent of households having Internet access, almost all via a high-speed broadband connection [ITU 2015]. Given this widespread use of home access networks, let's begin our overview of access networks by considering how homes connect to the Internet.

Today, the two most prevalent types of broadband residential access are digital subscriber line (DSL) and cable. A residence typically obtains DSL Internet access from the same local telephone company (telco) that provides its wired local phone access. Thus, when DSL is used, a customer's telco is also its ISP. As shown in Figure 1.5, each customer's DSL modem uses the existing telephone line (twisted-pair copper wire, which we'll discuss in Section 1.2.2) to exchange data with a digital subscriber line access multiplexer (DSLAM) located in the telco's local central office (CO). The home's DSL modem takes digital data and translates it to high-frequency tones for transmission over telephone wires to the CO; the analog signals from many such houses are translated back into digital format at the DSLAM.
The residential telephone line carries both data and traditional telephone signals simultaneously, which are encoded at different frequencies:

A high-speed downstream channel, in the 50 kHz to 1 MHz band
A medium-speed upstream channel, in the 4 kHz to 50 kHz band
An ordinary two-way telephone channel, in the 0 to 4 kHz band

This approach makes the single DSL link appear as if there were three separate links, so that a telephone call and an Internet connection can share the DSL link at the same time.

Figure 1.5 DSL Internet access

(We'll describe this technique of frequency-division multiplexing in Section 1.3.1.) On the customer side, a splitter separates the data and telephone signals arriving to the home and forwards the data signal to the DSL modem. On the telco side, in the CO, the DSLAM separates the data and phone signals and sends the data into the Internet. Hundreds or even thousands of households connect to a single DSLAM [Dischinger 2007].

The DSL standards define multiple transmission rates, including 12 Mbps downstream and 1.8 Mbps upstream [ITU 1999], and 55 Mbps downstream and 15 Mbps upstream [ITU 2006]. Because the downstream and upstream rates are different, the access is said to be asymmetric. The actual downstream and upstream transmission rates achieved may be less than the rates noted above, as the DSL provider may purposefully limit a residential rate when tiered service (different rates, available at different prices) is offered. The maximum rate is also limited by the distance between the home and the CO, the gauge of the twisted-pair line, and the degree of electrical interference. Engineers have expressly designed DSL for short distances between the home and the CO; generally, if the residence is not located within 5 to 10 miles of the CO, the residence must resort to an alternative form of Internet access.

While DSL makes use of the telco's existing local telephone infrastructure, cable Internet access makes use of the cable television company's existing cable television infrastructure. A residence obtains cable Internet access from the same company that provides its cable television. As illustrated in Figure 1.6, fiber optics connect the cable head end to neighborhood-level junctions, from which traditional coaxial cable is then used to reach individual houses and apartments. Each neighborhood junction typically supports 500 to 5,000 homes. Because both fiber and coaxial cable are employed in this system, it is often referred to as hybrid fiber coax (HFC).

Figure 1.6 A hybrid fiber-coaxial access network

Cable Internet access requires special modems, called cable modems. As with a DSL modem, the cable modem is typically an external device and connects to the home PC through an Ethernet port. (We will discuss Ethernet in great detail in Chapter 6.) At the cable head end, the cable modem termination system (CMTS) serves a similar function as the DSL network's DSLAM—turning the analog signal sent from the cable modems in many downstream homes back into digital format. Cable modems divide the HFC network into two channels, a downstream and an upstream channel. As with DSL, access is typically asymmetric, with the downstream channel typically allocated a higher transmission rate than the upstream channel. The DOCSIS 2.0 standard defines downstream rates up to 42.8 Mbps and upstream rates of up to 30.7 Mbps.
As in the case of DSL networks, the maximum achievable rate may not be realized due to lower contracted data rates or media impairments.

One important characteristic of cable Internet access is that it is a shared broadcast medium. In particular, every packet sent by the head end travels downstream on every link to every home and every packet sent by a home travels on the upstream channel to the head end. For this reason, if several users are simultaneously downloading a video file on the downstream channel, the actual rate at which each user receives its video file will be significantly lower than the aggregate cable downstream rate. On the other hand, if there are only a few active users and they are all Web surfing, then each of the users may actually receive Web pages at the full cable downstream rate, because the users will rarely request a Web page at exactly the same time. Because the upstream channel is also shared, a distributed multiple access protocol is needed to coordinate transmissions and avoid collisions. (We'll discuss this collision issue in some detail in Chapter 6.)

Although DSL and cable networks currently represent more than 85 percent of residential broadband access in the United States, an up-and-coming technology that provides even higher speeds is fiber to the home (FTTH) [FTTH Council 2016]. As the name suggests, the FTTH concept is simple—provide an optical fiber path from the CO directly to the home. Many countries today—including the UAE, South Korea, Hong Kong, Japan, Singapore, Taiwan, Lithuania, and Sweden—now have household penetration rates exceeding 30% [FTTH Council 2016].

There are several competing technologies for optical distribution from the CO to the homes. The simplest optical distribution network is called direct fiber, with one fiber leaving the CO for each home. More commonly, each fiber leaving the central office is actually shared by many homes; it is not until the fiber gets relatively close to the homes that it is split into individual customer-specific fibers. There are two competing optical-distribution network architectures that perform this splitting: active optical networks (AONs) and passive optical networks (PONs). AON is essentially switched Ethernet, which is discussed in Chapter 6.

Here, we briefly discuss PON, which is used in Verizon's FIOS service. Figure 1.7 shows FTTH using the PON distribution architecture. Each home has an optical network terminator (ONT), which is connected by dedicated optical fiber to a neighborhood splitter. The splitter combines a number of homes (typically less than 100) onto a single, shared optical fiber, which connects to an optical line terminator (OLT) in the telco's CO. The OLT, providing conversion between optical and electrical signals, connects to the Internet via a telco router. In the home, users connect a home router (typically a wireless router) to the ONT and access the Internet via this home router. In the PON architecture, all packets sent from OLT to the splitter are replicated at the splitter (similar to a cable head end).

Figure 1.7 FTTH Internet access

FTTH can potentially provide Internet access rates in the gigabits per second range. However, most FTTH ISPs provide different rate offerings, with the higher rates naturally costing more money. The average downstream speed of US FTTH customers was approximately 20 Mbps in 2011 (compared with 13 Mbps for cable access networks and less than 5 Mbps for DSL) [FTTH Council 2011b].
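Those average rates translate directly into waiting times via the transmission-delay formula L/R. A rough sketch of the comparison (the 50-megabyte file size is an arbitrary assumption made here; the three rates are the 2011 averages just quoted):

# Compare download times for one file over the average access rates
# quoted above (FTTH vs. cable vs. DSL), using transmission delay L/R.
FILE_SIZE_BITS = 50 * 8 * 10**6   # a hypothetical 50 MB file

ACCESS_RATES_BPS = {              # 2011 average downstream rates from the text
    "FTTH": 20 * 10**6,
    "cable": 13 * 10**6,
    "DSL": 5 * 10**6,
}

for technology, rate in ACCESS_RATES_BPS.items():
    seconds = FILE_SIZE_BITS / rate          # L/R
    print(f"{technology:5s}: {seconds:6.1f} s")
# FTTH about 20 s, cable about 31 s, DSL about 80 s for this file

The same file that downloads in about 20 seconds over average FTTH takes well over a minute on average DSL, which is the practical meaning of the rate gap quoted above.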
+Two other access network technologies are also used to provide Internet access to the home. In +locations where DSL, cable, and FTTH are not available (e.g., in some rural settings), a satellite link can +be used to connect a residence to the Internet at speeds of more than 1 Mbps; StarBand and +HughesNet are two such satellite access providers. Dial-up access over traditional phone lines is based +on the same model as DSL—a home modem connects over a phone line to a modem in the ISP. +Compared with DSL and other broadband access networks, dial-up access is excruciatingly slow at 56 +kbps. +Access in the Enterprise (and the Home): Ethernet and WiFi +On corporate and university campuses, and increasingly in home settings, a local area network (LAN) is +used to connect an end system to the edge router. Although there are many types of LAN technologies, +Ethernet is by far the most prevalent access technology in corporate, university, and home networks. As +shown in Figure 1.8, Ethernet users use twisted-pair copper wire to connect to an Ethernet switch, a +technology discussed in detail in Chapter 6. The Ethernet switch, or a network of such + + Figure 1.8 Ethernet Internet access + +interconnected switches, is then in turn connected into the larger Internet. With Ethernet access, users +typically have 100 Mbps or 1 Gbps access to the Ethernet switch, whereas servers may have 1 Gbps or +even 10 Gbps access. +Increasingly, however, people are accessing the Internet wirelessly from laptops, smartphones, tablets, +and other “things” (see earlier sidebar on “Internet of Things”). In a wireless LAN setting, wireless +users transmit/receive packets to/from an access point that is connected into the enterprise’s network +(most likely using wired Ethernet), which in turn is connected to the wired Internet. A wireless LAN user +must typically be within a few tens of meters of the access point. Wireless LAN access based on IEEE +802.11 technology, more colloquially known as WiFi, is now just about everywhere—universities, +business offices, cafes, airports, homes, and even in airplanes. In many cities, one can stand on a street +corner and be within range of ten or twenty base stations (for a browseable global map of 802.11 base +stations that have been discovered and logged on a Web site by people who take great enjoyment in +doing such things, see [wigle.net 2016]). As discussed in detail in Chapter 7, 802.11 today provides a +shared transmission rate of up to more than 100 Mbps. +Even though Ethernet and WiFi access networks were initially deployed in enterprise (corporate, +university) settings, they have recently become relatively common components of home networks. Many +homes combine broadband residential access (that is, cable modems or DSL) with these inexpensive +wireless LAN technologies to create powerful home networks [Edwards 2011]. Figure 1.9 shows a +typical home network. This home network consists of a roaming laptop as well as a wired PC; a base +station (the wireless access point), which communicates with the wireless PC and other wireless +devices in the home; a cable modem, providing broadband access to the Internet; and a router, which +interconnects the base station and the stationary PC with the cable modem. This network allows +household members to have broadband access to the Internet with one member roaming from the + + kitchen to the backyard to the bedrooms. 
Figure 1.9 A typical home network

Wide-Area Wireless Access: 3G and LTE

Increasingly, devices such as iPhones and Android devices are being used to message, share photos in social networks, watch movies, and stream music while on the run. These devices employ the same wireless infrastructure used for cellular telephony to send/receive packets through a base station that is operated by the cellular network provider. Unlike WiFi, a user need only be within a few tens of kilometers (as opposed to a few tens of meters) of the base station.

Telecommunications companies have made enormous investments in so-called third-generation (3G) wireless, which provides packet-switched wide-area wireless Internet access at speeds in excess of 1 Mbps. But even higher-speed wide-area access technologies—a fourth-generation (4G) of wide-area wireless networks—are already being deployed. LTE (for "Long-Term Evolution"—a candidate for Bad Acronym of the Year Award) has its roots in 3G technology, and can achieve rates in excess of 10 Mbps. LTE downstream rates of many tens of Mbps have been reported in commercial deployments. We'll cover the basic principles of wireless networks and mobility, as well as WiFi, 3G, and LTE technologies (and more!) in Chapter 7.

1.2.2 Physical Media

In the previous subsection, we gave an overview of some of the most important network access technologies in the Internet. As we described these technologies, we also indicated the physical media used. For example, we said that HFC uses a combination of fiber cable and coaxial cable. We said that DSL and Ethernet use copper wire. And we said that mobile access networks use the radio spectrum. In this subsection we provide a brief overview of these and other transmission media that are commonly used in the Internet.

In order to define what is meant by a physical medium, let us reflect on the brief life of a bit. Consider a bit traveling from one end system, through a series of links and routers, to another end system. This poor bit gets kicked around and transmitted many, many times! The source end system first transmits the bit, and shortly thereafter the first router in the series receives the bit; the first router then transmits the bit, and shortly thereafter the second router receives the bit; and so on. Thus our bit, when traveling from source to destination, passes through a series of transmitter-receiver pairs. For each transmitter-receiver pair, the bit is sent by propagating electromagnetic waves or optical pulses across a physical medium. The physical medium can take many shapes and forms and does not have to be of the same type for each transmitter-receiver pair along the path. Examples of physical media include twisted-pair copper wire, coaxial cable, multimode fiber-optic cable, terrestrial radio spectrum, and satellite radio spectrum. Physical media fall into two categories: guided media and unguided media. With guided media, the waves are guided along a solid medium, such as a fiber-optic cable, a twisted-pair copper wire, or a coaxial cable. With unguided media, the waves propagate in the atmosphere and in outer space, such as in a wireless LAN or a digital satellite channel.

But before we get into the characteristics of the various media types, let us say a few words about their costs. The actual cost of the physical link (copper wire, fiber-optic cable, and so on) is often relatively minor compared with other networking costs.
In particular, the labor cost associated with the installation of the physical link can be orders of magnitude higher than the cost of the material. For this reason, many builders install twisted pair, optical fiber, and coaxial cable in every room in a building. Even if only one medium is initially used, there is a good chance that another medium could be used in the near future, and so money is saved by not having to lay additional wires in the future.

Twisted-Pair Copper Wire

The least expensive and most commonly used guided transmission medium is twisted-pair copper wire. For over a hundred years it has been used by telephone networks. In fact, more than 99 percent of the wired connections from the telephone handset to the local telephone switch use twisted-pair copper wire. Most of us have seen twisted pair in our homes (or those of our parents or grandparents!) and work environments. Twisted pair consists of two insulated copper wires, each about 1 mm thick, arranged in a regular spiral pattern. The wires are twisted together to reduce the electrical interference from similar pairs close by. Typically, a number of pairs are bundled together in a cable by wrapping the pairs in a protective shield. A wire pair constitutes a single communication link. Unshielded twisted pair (UTP) is commonly used for computer networks within a building, that is, for LANs. Data rates for LANs using twisted pair today range from 10 Mbps to 10 Gbps. The data rates that can be achieved depend on the thickness of the wire and the distance between transmitter and receiver.

When fiber-optic technology emerged in the 1980s, many people disparaged twisted pair because of its relatively low bit rates. Some people even felt that fiber-optic technology would completely replace twisted pair. But twisted pair did not give up so easily. Modern twisted-pair technology, such as category 6a cable, can achieve data rates of 10 Gbps for distances up to a hundred meters. In the end, twisted pair has emerged as the dominant solution for high-speed LAN networking.

As discussed earlier, twisted pair is also commonly used for residential Internet access. We saw that dial-up modem technology enables access at rates of up to 56 kbps over twisted pair. We also saw that DSL (digital subscriber line) technology has enabled residential users to access the Internet at tens of Mbps over twisted pair (when users live close to the ISP's central office).

Coaxial Cable

Like twisted pair, coaxial cable consists of two copper conductors, but the two conductors are concentric rather than parallel. With this construction and special insulation and shielding, coaxial cable can achieve high data transmission rates. Coaxial cable is quite common in cable television systems. As we saw earlier, cable television systems have recently been coupled with cable modems to provide residential users with Internet access at rates of tens of Mbps. In cable television and cable Internet access, the transmitter shifts the digital signal to a specific frequency band, and the resulting analog signal is sent from the transmitter to one or more receivers. Coaxial cable can be used as a guided shared medium. Specifically, a number of end systems can be connected directly to the cable, with each of the end systems receiving whatever is sent by the other end systems.

Fiber Optics

An optical fiber is a thin, flexible medium that conducts pulses of light, with each pulse representing a bit.
A single optical fiber can support tremendous bit rates, up to tens or even hundreds of gigabits per second. They are immune to electromagnetic interference, have very low signal attenuation up to 100 kilometers, and are very hard to tap. These characteristics have made fiber optics the preferred long-haul guided transmission media, particularly for overseas links. Many of the long-distance telephone networks in the United States and elsewhere now use fiber optics exclusively. Fiber optics is also prevalent in the backbone of the Internet. However, the high cost of optical devices—such as transmitters, receivers, and switches—has hindered their deployment for short-haul transport, such as in a LAN or into the home in a residential access network. The Optical Carrier (OC) standard link speeds range from 51.8 Mbps to 39.8 Gbps; these specifications are often referred to as OC-n, where the link speed equals n × 51.8 Mbps. Standards in use today include OC-1, OC-3, OC-12, OC-24, OC-48, OC-96, OC-192, OC-768. [Mukherjee 2006, Ramaswami 2010] provide coverage of various aspects of optical networking.

Terrestrial Radio Channels

Radio channels carry signals in the electromagnetic spectrum. They are an attractive medium because they require no physical wire to be installed, can penetrate walls, provide connectivity to a mobile user, and can potentially carry a signal for long distances. The characteristics of a radio channel depend significantly on the propagation environment and the distance over which a signal is to be carried. Environmental considerations determine path loss and shadow fading (which decrease the signal strength as the signal travels over a distance and around/through obstructing objects), multipath fading (due to signal reflection off of interfering objects), and interference (due to other transmissions and electromagnetic signals).

Terrestrial radio channels can be broadly classified into three groups: those that operate over very short distance (e.g., within one or two meters); those that operate in local areas, typically spanning from ten to a few hundred meters; and those that operate in the wide area, spanning tens of kilometers. Personal devices such as wireless headsets, keyboards, and medical devices operate over short distances; the wireless LAN technologies described in Section 1.2.1 use local-area radio channels; the cellular access technologies use wide-area radio channels. We'll discuss radio channels in detail in Chapter 7.

Satellite Radio Channels

A communication satellite links two or more Earth-based microwave transmitter/receivers, known as ground stations. The satellite receives transmissions on one frequency band, regenerates the signal using a repeater (discussed below), and transmits the signal on another frequency. Two types of satellites are used in communications: geostationary satellites and low-earth orbiting (LEO) satellites [Wiki Satellite 2016].

Geostationary satellites permanently remain above the same spot on Earth. This stationary presence is achieved by placing the satellite in orbit at 36,000 kilometers above Earth's surface. This huge distance from ground station through satellite back to ground station introduces a substantial signal propagation delay of 280 milliseconds. Nevertheless, satellite links, which can operate at speeds of hundreds of Mbps, are often used in areas without access to DSL or cable-based Internet access.
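The 280-millisecond figure follows from the propagation-delay formula d/s discussed in Section 1.4. A back-of-the-envelope check (a sketch only: the straight-up-and-down path and the rounded speed of light are assumptions made here; the slightly larger figure in the text allows for slant paths longer than the orbit altitude):

# Back-of-the-envelope check of geostationary propagation delay, d/s.
# The signal travels ground -> satellite -> ground, at least 2 x 36,000 km,
# at roughly the speed of light.
GEO_ALTITUDE_M = 36_000_000       # orbit altitude quoted in the text
PROPAGATION_SPEED = 3.0e8         # m/s, approximate speed of light in free space

distance = 2 * GEO_ALTITUDE_M             # up to the satellite and back down
delay = distance / PROPAGATION_SPEED      # d/s
print(f"{delay * 1000:.0f} ms")           # ~240 ms for the minimum path;
                                          # the text's 280 ms allows for a
                                          # longer slant path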
+LEO satellites are placed much closer to Earth and do not remain permanently above one spot on Earth. +They rotate around Earth (just as the Moon does) and may communicate with each other, as well as +with ground stations. To provide continuous coverage to an area, many satellites need to be placed in +orbit. There are currently many low-altitude communication systems in development. LEO satellite +technology may be used for Internet access sometime in the future. + + 1.3 The Network Core +Having examined the Internet’s edge, let us now delve more deeply inside the network core—the mesh +of packet switches and links that interconnects the Internet’s end systems. Figure 1.10 highlights the +network core with thick, shaded lines. + + Figure 1.10 The network core + +1.3.1 Packet Switching +In a network application, end systems exchange messages with each other. Messages can contain +anything the application designer wants. Messages may perform a control function (for example, the “Hi” +messages in our handshaking example in Figure 1.2) or can contain data, such as an e-mail message, +a JPEG image, or an MP3 audio file. To send a message from a source end system to a destination end +system, the source breaks long messages into smaller chunks of data known as packets. Between +source and destination, each packet travels through communication links and packet switches (for +which there are two predominant types, routers and link-layer switches). Packets are transmitted over +each communication link at a rate equal to the full transmission rate of the link. So, if a source end +system or a packet switch is sending a packet of L bits over a link with transmission rate R bits/sec, then +the time to transmit the packet is L / R seconds. +Store-and-Forward Transmission +Most packet switches use store-and-forward transmission at the inputs to the links. Store-and-forward +transmission means that the packet switch must receive the entire packet before it can begin to transmit +the first bit of the packet onto the outbound link. To explore store-and-forward transmission in more +detail, consider a simple network consisting of two end systems connected by a single router, as shown +in Figure 1.11. A router will typically have many incident links, since its job is to switch an incoming +packet onto an outgoing link; in this simple example, the router has the rather simple task of transferring +a packet from one (input) link to the only other attached link. In this example, the source has three +packets, each consisting of L bits, to send to the destination. At the snapshot of time shown in Figure +1.11, the source has transmitted some of packet 1, and the front of packet 1 has already arrived at the +router. Because the router employs store-and-forwarding, at this instant of time, the router cannot +transmit the bits it has received; instead it must first buffer (i.e., “store”) the packet’s bits. Only after the +router has received all of the packet’s bits can it begin to transmit (i.e., “forward”) the packet onto the +outbound link. To gain some insight into store-and-forward transmission, let’s now calculate the amount +of time that elapses from when the source begins to send the packet until the destination has received +the entire packet. (Here we will ignore propagation delay—the time it takes for the bits to travel across +the wire at near the speed of light—which will be discussed in Section 1.4.) 
The source begins to transmit at time 0; at time L/R seconds, the source has transmitted the entire packet, and the entire packet has been received and stored at the router (since there is no propagation delay). At time L/R seconds, since the router has just received the entire packet, it can begin to transmit the packet onto the outbound link towards the destination; at time 2L/R, the router has transmitted the entire packet, and the entire packet has been received by the destination. Thus, the total delay is 2L/R. If the switch instead forwarded bits as soon as they arrive (without first receiving the entire packet), then the total delay would be L/R since bits are not held up at the router. But, as we will discuss in Section 1.4, routers need to receive, store, and process the entire packet before forwarding.

Figure 1.11 Store-and-forward packet switching

Now let's calculate the amount of time that elapses from when the source begins to send the first packet until the destination has received all three packets. As before, at time L/R, the router begins to forward the first packet. But also at time L/R the source will begin to send the second packet, since it has just finished sending the entire first packet. Thus, at time 2L/R, the destination has received the first packet and the router has received the second packet. Similarly, at time 3L/R, the destination has received the first two packets and the router has received the third packet. Finally, at time 4L/R the destination has received all three packets!

Let's now consider the general case of sending one packet from source to destination over a path consisting of N links each of rate R (thus, there are N-1 routers between source and destination). Applying the same logic as above, we see that the end-to-end delay is:

d_end-to-end = N(L/R)        (1.1)

You may now want to try to determine what the delay would be for P packets sent over a series of N links.

Queuing Delays and Packet Loss

Each packet switch has multiple links attached to it. For each attached link, the packet switch has an output buffer (also called an output queue), which stores packets that the router is about to send into that link. The output buffers play a key role in packet switching. If an arriving packet needs to be transmitted onto a link but finds the link busy with the transmission of another packet, the arriving packet must wait in the output buffer. Thus, in addition to the store-and-forward delays, packets suffer output buffer queuing delays. These delays are variable and depend on the level of congestion in the network. Since the amount of buffer space is finite, an arriving packet may find that the buffer is completely full with other packets waiting for transmission. In this case, packet loss will occur—either the arriving packet or one of the already-queued packets will be dropped.

Figure 1.12 Packet switching

Figure 1.12 illustrates a simple packet-switched network. As in Figure 1.11, packets are represented by three-dimensional slabs. The width of a slab represents the number of bits in the packet. In this figure, all packets have the same width and hence the same length. Suppose Hosts A and B are sending packets to Host E. Hosts A and B first send their packets along 100 Mbps Ethernet links to the first router. The router then directs these packets to the 15 Mbps link.
If, during a short interval of time, the +arrival rate of packets to the router (when converted to bits per second) exceeds 15 Mbps, congestion +will occur at the router as packets queue in the link’s output buffer before being transmitted onto the link. +For example, if Host A and B each send a burst of five packets back-to-back at the same time, then +most of these packets will spend some time waiting in the queue. The situation is, in fact, entirely +analogous to many common-day situations—for example, when we wait in line for a bank teller or wait in +front of a tollbooth. We’ll examine this queuing delay in more detail in Section 1.4. +Forwarding Tables and Routing Protocols +Earlier, we said that a router takes a packet arriving on one of its attached communication links and +forwards that packet onto another one of its attached communication links. But how does the router +determine which link it should forward the packet onto? Packet forwarding is actually done in different +ways in different types of computer networks. Here, we briefly describe how it is done in the Internet. + + In the Internet, every end system has an address called an IP address. When a source end system +wants to send a packet to a destination end system, the source includes the destination’s IP address in +the packet’s header. As with postal addresses, this address has a hierarchical structure. When a packet +arrives at a router in the network, the router examines a portion of the packet’s destination address and +forwards the packet to an adjacent router. More specifically, each router has a forwarding table that +maps destination addresses (or portions of the destination addresses) to that router’s outbound links. +When a packet arrives at a router, the router examines the address and searches its forwarding table, +using this destination address, to find the appropriate outbound link. The router then directs the packet +to this outbound link. +The end-to-end routing process is analogous to a car driver who does not use maps but instead prefers +to ask for directions. For example, suppose Joe is driving from Philadelphia to 156 Lakeside Drive in +Orlando, Florida. Joe first drives to his neighborhood gas station and asks how to get to 156 Lakeside +Drive in Orlando, Florida. The gas station attendant extracts the Florida portion of the address and tells +Joe that he needs to get onto the interstate highway I-95 South, which has an entrance just next to the +gas station. He also tells Joe that once he enters Florida, he should ask someone else there. Joe then +takes I-95 South until he gets to Jacksonville, Florida, at which point he asks another gas station +attendant for directions. The attendant extracts the Orlando portion of the address and tells Joe that he +should continue on I-95 to Daytona Beach and then ask someone else. In Daytona Beach, another gas +station attendant also extracts the Orlando portion of the address and tells Joe that he should take I-4 +directly to Orlando. Joe takes I-4 and gets off at the Orlando exit. Joe goes to another gas station +attendant, and this time the attendant extracts the Lakeside Drive portion of the address and tells Joe +the road he must follow to get to Lakeside Drive. Once Joe reaches Lakeside Drive, he asks a kid on a +bicycle how to get to his destination. The kid extracts the 156 portion of the address and points to the +house. Joe finally reaches his ultimate destination. 
In the above analogy, the gas station attendants and kids on bicycles are analogous to routers.

We just learned that a router uses a packet's destination address to index a forwarding table and determine the appropriate outbound link. But this statement begs yet another question: How do forwarding tables get set? Are they configured by hand in each and every router, or does the Internet use a more automated procedure? This issue will be studied in depth in Chapter 5. But to whet your appetite here, we'll note now that the Internet has a number of special routing protocols that are used to automatically set the forwarding tables. A routing protocol may, for example, determine the shortest path from each router to each destination and use the shortest path results to configure the forwarding tables in the routers.

How would you actually like to see the end-to-end route that packets take in the Internet? We now invite you to get your hands dirty by interacting with the Traceroute program. Simply visit the site www.traceroute.org, choose a source in a particular country, and trace the route from that source to your computer. (For a discussion of Traceroute, see Section 1.4.)

1.3.2 Circuit Switching

There are two fundamental approaches to moving data through a network of links and switches: circuit switching and packet switching. Having covered packet-switched networks in the previous subsection, we now turn our attention to circuit-switched networks.

In circuit-switched networks, the resources needed along a path (buffers, link transmission rate) to provide for communication between the end systems are reserved for the duration of the communication session between the end systems. In packet-switched networks, these resources are not reserved; a session's messages use the resources on demand and, as a consequence, may have to wait (that is, queue) for access to a communication link. As a simple analogy, consider two restaurants, one that requires reservations and another that neither requires reservations nor accepts them. For the restaurant that requires reservations, we have to go through the hassle of calling before we leave home. But when we arrive at the restaurant we can, in principle, immediately be seated and order our meal. For the restaurant that does not require reservations, we don't need to bother to reserve a table. But when we arrive at the restaurant, we may have to wait for a table before we can be seated.

Traditional telephone networks are examples of circuit-switched networks. Consider what happens when one person wants to send information (voice or facsimile) to another over a telephone network. Before the sender can send the information, the network must establish a connection between the sender and the receiver. This is a bona fide connection for which the switches on the path between the sender and receiver maintain connection state for that connection. In the jargon of telephony, this connection is called a circuit. When the network establishes the circuit, it also reserves a constant transmission rate in the network's links (representing a fraction of each link's transmission capacity) for the duration of the connection. Since a given transmission rate has been reserved for this sender-to-receiver connection, the sender can transfer the data to the receiver at the guaranteed constant rate.

Figure 1.13 illustrates a circuit-switched network. In this network, the four circuit switches are interconnected by four links.
Each of these links has four circuits, so that each link can support four simultaneous connections. The hosts (for example, PCs and workstations) are each directly connected to one of the switches. When two hosts want to communicate, the network establishes a dedicated end-to-end connection between the two hosts. Thus, in order for Host A to communicate with Host B, the network must first reserve one circuit on each of two links. In this example, the dedicated end-to-end connection uses the second circuit in the first link and the fourth circuit in the second link. Because each link has four circuits, for each link used by the end-to-end connection, the connection gets one fourth of the link's total transmission capacity for the duration of the connection. Thus, for example, if each link between adjacent switches has a transmission rate of 1 Mbps, then each end-to-end circuit-switched connection gets 250 kbps of dedicated transmission rate.

Figure 1.13 A simple circuit-switched network consisting of four switches and four links

In contrast, consider what happens when one host wants to send a packet to another host over a packet-switched network, such as the Internet. As with circuit switching, the packet is transmitted over a series of communication links. But different from circuit switching, the packet is sent into the network without reserving any link resources whatsoever. If one of the links is congested because other packets need to be transmitted over the link at the same time, then the packet will have to wait in a buffer at the sending side of the transmission link and suffer a delay. The Internet makes its best effort to deliver packets in a timely manner, but it does not make any guarantees.

Multiplexing in Circuit-Switched Networks

A circuit in a link is implemented with either frequency-division multiplexing (FDM) or time-division multiplexing (TDM). With FDM, the frequency spectrum of a link is divided up among the connections established across the link. Specifically, the link dedicates a frequency band to each connection for the duration of the connection. In telephone networks, this frequency band typically has a width of 4 kHz (that is, 4,000 hertz or 4,000 cycles per second). The width of the band is called, not surprisingly, the bandwidth. FM radio stations also use FDM to share the frequency spectrum between 88 MHz and 108 MHz, with each station being allocated a specific frequency band.

For a TDM link, time is divided into frames of fixed duration, and each frame is divided into a fixed number of time slots. When the network establishes a connection across a link, the network dedicates one time slot in every frame to this connection. These slots are dedicated for the sole use of that connection, with one time slot available for use (in every frame) to transmit the connection's data.

Figure 1.14 With FDM, each circuit continuously gets a fraction of the bandwidth. With TDM, each circuit gets all of the bandwidth periodically during brief intervals of time (that is, during slots)

Figure 1.14 illustrates FDM and TDM for a specific network link supporting up to four circuits. For FDM, the frequency domain is segmented into four bands, each of bandwidth 4 kHz. For TDM, the time domain is segmented into frames, with four time slots in each frame; each circuit is assigned the same dedicated slot in the revolving TDM frames.
Proponents of packet switching have always argued that circuit switching is wasteful because the dedicated circuits are idle during silent periods. For example, when one person in a telephone call stops talking, the idle network resources (frequency bands or time slots in the links along the connection's route) cannot be used by other ongoing connections. As another example of how these resources can be underutilized, consider a radiologist who uses a circuit-switched network to remotely access a series of x-rays. The radiologist sets up a connection, requests an image, contemplates the image, and then requests a new image. Network resources are allocated to the connection but are not used (i.e., are wasted) during the radiologist's contemplation periods. Proponents of packet switching also enjoy pointing out that establishing end-to-end circuits and reserving end-to-end transmission capacity is complicated and requires complex signaling software to coordinate the operation of the switches along the end-to-end path.

Before we finish our discussion of circuit switching, let's work through a numerical example that should shed further insight on the topic. Let us consider how long it takes to send a file of 640,000 bits from Host A to Host B over a circuit-switched network. Suppose that all links in the network use TDM with 24 slots and have a bit rate of 1.536 Mbps. Also suppose that it takes 500 msec to establish an end-to-end circuit before Host A can begin to transmit the file. How long does it take to send the file? Each circuit has a transmission rate of (1.536 Mbps)/24 = 64 kbps, so it takes (640,000 bits)/(64 kbps) = 10 seconds to transmit the file. To this 10 seconds we add the circuit establishment time, giving 10.5 seconds to send the file. Note that the transmission time is independent of the number of links: The transmission time would be 10 seconds if the end-to-end circuit passed through one link or a hundred links. (The actual end-to-end delay also includes a propagation delay; see Section 1.4.)
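The same calculation in Python, as a quick check of the 10.5-second answer; this is only a sketch that plugs in the numbers given in the example:

link_bps = 1_536_000            # 1.536 Mbps links
slots_per_frame = 24            # TDM with 24 slots
setup_sec = 0.5                 # 500 msec to establish the circuit
file_bits = 640_000

circuit_bps = link_bps / slots_per_frame    # 64,000 bits/sec per circuit
total_sec = setup_sec + file_bits / circuit_bps
print(total_sec)                            # 10.5 seconds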
Packet Switching Versus Circuit Switching

Having described circuit switching and packet switching, let us compare the two. Critics of packet switching have often argued that packet switching is not suitable for real-time services (for example, telephone calls and video conference calls) because of its variable and unpredictable end-to-end delays (due primarily to variable and unpredictable queuing delays). Proponents of packet switching argue that (1) it offers better sharing of transmission capacity than circuit switching and (2) it is simpler, more efficient, and less costly to implement than circuit switching. An interesting discussion of packet switching versus circuit switching is [Molinero-Fernandez 2002]. Generally speaking, people who do not like to hassle with restaurant reservations prefer packet switching to circuit switching.

Why is packet switching more efficient? Let's look at a simple example. Suppose users share a 1 Mbps link. Also suppose that each user alternates between periods of activity, when a user generates data at a constant rate of 100 kbps, and periods of inactivity, when a user generates no data. Suppose further that a user is active only 10 percent of the time (and is idly drinking coffee during the remaining 90 percent of the time). With circuit switching, 100 kbps must be reserved for each user at all times. For example, with circuit-switched TDM, if a one-second frame is divided into 10 time slots of 100 msec each, then each user would be allocated one time slot per frame. Thus, the circuit-switched link can support only 10 (= 1 Mbps/100 kbps) simultaneous users. With packet switching, the probability that a specific user is active is 0.1 (that is, 10 percent). If there are 35 users, the probability that there are 11 or more simultaneously active users is approximately 0.0004. (Homework Problem P8 outlines how this probability is obtained.) When there are 10 or fewer simultaneously active users (which happens with probability 0.9996), the aggregate arrival rate of data is less than or equal to 1 Mbps, the output rate of the link. Thus, when there are 10 or fewer active users, users' packets flow through the link essentially without delay, as is the case with circuit switching. When there are more than 10 simultaneously active users, then the aggregate arrival rate of packets exceeds the output capacity of the link, and the output queue will begin to grow. (It continues to grow until the aggregate input rate falls back below 1 Mbps, at which point the queue will begin to diminish in length.) Because the probability of having more than 10 simultaneously active users is minuscule in this example, packet switching provides essentially the same performance as circuit switching, but does so while allowing for more than three times the number of users.
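The 0.0004 figure is a binomial tail probability. A short Python check, assuming (as the example implies) that users are active independently with probability 0.1:

from math import comb

n, p = 35, 0.1                  # 35 users, each active 10 percent of the time
prob_11_or_more = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                      for k in range(11, n + 1))
print(prob_11_or_more)          # approximately 0.0004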
Let's now consider a second simple example. Suppose there are 10 users and that one user suddenly generates one thousand 1,000-bit packets, while other users remain quiescent and do not generate packets. Under TDM circuit switching with 10 slots per frame and each slot consisting of 1,000 bits, the active user can only use its one time slot per frame to transmit data, while the remaining nine time slots in each frame remain idle. It will be 10 seconds before all of the active user's one million bits of data have been transmitted. In the case of packet switching, the active user can continuously send its packets at the full link rate of 1 Mbps, since there are no other users generating packets that need to be multiplexed with the active user's packets. In this case, all of the active user's data will be transmitted within 1 second.

The above examples illustrate two ways in which the performance of packet switching can be superior to that of circuit switching. They also highlight the crucial difference between the two forms of sharing a link's transmission rate among multiple data streams. Circuit switching pre-allocates use of the transmission link regardless of demand, with allocated but unneeded link time going unused. Packet switching on the other hand allocates link use on demand. Link transmission capacity will be shared on a packet-by-packet basis only among those users who have packets that need to be transmitted over the link.

Although packet switching and circuit switching are both prevalent in today's telecommunication networks, the trend has certainly been in the direction of packet switching. Even many of today's circuit-switched telephone networks are slowly migrating toward packet switching. In particular, telephone networks often use packet switching for the expensive overseas portion of a telephone call.

1.3.3 A Network of Networks

We saw earlier that end systems (PCs, smartphones, Web servers, mail servers, and so on) connect into the Internet via an access ISP. The access ISP can provide either wired or wireless connectivity, using an array of access technologies including DSL, cable, FTTH, Wi-Fi, and cellular. Note that the access ISP does not have to be a telco or a cable company; instead it can be, for example, a university (providing Internet access to students, staff, and faculty), or a company (providing access for its employees). But connecting end users and content providers into an access ISP is only a small piece of solving the puzzle of connecting the billions of end systems that make up the Internet. To complete this puzzle, the access ISPs themselves must be interconnected. This is done by creating a network of networks—understanding this phrase is the key to understanding the Internet.

Over the years, the network of networks that forms the Internet has evolved into a very complex structure. Much of this evolution is driven by economics and national policy, rather than by performance considerations. In order to understand today's Internet network structure, let's incrementally build a series of network structures, with each new structure being a better approximation of the complex Internet that we have today. Recall that the overarching goal is to interconnect the access ISPs so that all end systems can send packets to each other. One naive approach would be to have each access ISP directly connect with every other access ISP. Such a mesh design is, of course, much too costly for the access ISPs, as it would require each access ISP to have a separate communication link to each of the hundreds of thousands of other access ISPs all over the world.

Our first network structure, Network Structure 1, interconnects all of the access ISPs with a single global transit ISP. Our (imaginary) global transit ISP is a network of routers and communication links that not only spans the globe, but also has at least one router near each of the hundreds of thousands of access ISPs. Of course, it would be very costly for the global ISP to build such an extensive network. To be profitable, it would naturally charge each of the access ISPs for connectivity, with the pricing reflecting (but not necessarily directly proportional to) the amount of traffic an access ISP exchanges with the global ISP. Since the access ISP pays the global transit ISP, the access ISP is said to be a customer and the global transit ISP is said to be a provider.

Now if some company builds and operates a global transit ISP that is profitable, then it is natural for other companies to build their own global transit ISPs and compete with the original global transit ISP. This leads to Network Structure 2, which consists of the hundreds of thousands of access ISPs and multiple global transit ISPs. The access ISPs certainly prefer Network Structure 2 over Network Structure 1 since they can now choose among the competing global transit providers as a function of their pricing and services. Note, however, that the global transit ISPs themselves must interconnect: Otherwise access ISPs connected to one of the global transit providers would not be able to communicate with access ISPs connected to the other global transit providers.
Network Structure 2, just described, is a two-tier hierarchy with global transit providers residing at the top tier and access ISPs at the bottom tier. This assumes that global transit ISPs are not only capable of getting close to each and every access ISP, but also find it economically desirable to do so. In reality, although some ISPs do have impressive global coverage and do directly connect with many access ISPs, no ISP has a presence in each and every city in the world. Instead, in any given region, there may be a regional ISP to which the access ISPs in the region connect. Each regional ISP then connects to tier-1 ISPs. Tier-1 ISPs are similar to our (imaginary) global transit ISP; but tier-1 ISPs, which actually do exist, do not have a presence in every city in the world. There are approximately a dozen tier-1 ISPs, including Level 3 Communications, AT&T, Sprint, and NTT. Interestingly, no group officially sanctions tier-1 status; as the saying goes—if you have to ask if you're a member of a group, you're probably not.

Returning to this network of networks, not only are there multiple competing tier-1 ISPs, there may be multiple competing regional ISPs in a region. In such a hierarchy, each access ISP pays the regional ISP to which it connects, and each regional ISP pays the tier-1 ISP to which it connects. (An access ISP can also connect directly to a tier-1 ISP, in which case it pays the tier-1 ISP.) Thus, there is a customer-provider relationship at each level of the hierarchy. Note that the tier-1 ISPs do not pay anyone as they are at the top of the hierarchy. To further complicate matters, in some regions, there may be a larger regional ISP (possibly spanning an entire country) to which the smaller regional ISPs in that region connect; the larger regional ISP then connects to a tier-1 ISP. For example, in China, there are access ISPs in each city, which connect to provincial ISPs, which in turn connect to national ISPs, which finally connect to tier-1 ISPs [Tian 2012]. We refer to this multi-tier hierarchy, which is still only a crude approximation of today's Internet, as Network Structure 3.

To build a network that more closely resembles today's Internet, we must add points of presence (PoPs), multi-homing, peering, and Internet exchange points (IXPs) to the hierarchical Network Structure 3. PoPs exist in all levels of the hierarchy, except for the bottom (access ISP) level. A PoP is simply a group of one or more routers (at the same location) in the provider's network where customer ISPs can connect into the provider ISP. For a customer network to connect to a provider's PoP, it can lease a high-speed link from a third-party telecommunications provider to directly connect one of its routers to a router at the PoP. Any ISP (except for tier-1 ISPs) may choose to multi-home, that is, to connect to two or more provider ISPs. So, for example, an access ISP may multi-home with two regional ISPs, or it may multi-home with two regional ISPs and also with a tier-1 ISP. Similarly, a regional ISP may multi-home with multiple tier-1 ISPs. When an ISP multi-homes, it can continue to send and receive packets into the Internet even if one of its providers has a failure.

As we just learned, customer ISPs pay their provider ISPs to obtain global Internet interconnectivity. The amount that a customer ISP pays a provider ISP reflects the amount of traffic it exchanges with the provider.
To reduce these costs, a pair of nearby ISPs at the same level of the hierarchy can peer, that is, they can directly connect their networks together so that all the traffic between them passes over the direct connection rather than through upstream intermediaries. When two ISPs peer, it is typically settlement-free, that is, neither ISP pays the other. As noted earlier, tier-1 ISPs also peer with one another, settlement-free. For a readable discussion of peering and customer-provider relationships, see [Van der Berg 2008]. Along these same lines, a third-party company can create an Internet Exchange Point (IXP), which is a meeting point where multiple ISPs can peer together. An IXP is typically in a stand-alone building with its own switches [Ager 2012]. There are over 400 IXPs in the Internet today [IXP List 2016]. We refer to this ecosystem—consisting of access ISPs, regional ISPs, tier-1 ISPs, PoPs, multi-homing, peering, and IXPs—as Network Structure 4.

We now finally arrive at Network Structure 5, which describes today's Internet. Network Structure 5, illustrated in Figure 1.15, builds on top of Network Structure 4 by adding content-provider networks. Google is currently one of the leading examples of such a content-provider network. As of this writing, it is estimated that Google has 50–100 data centers distributed across North America, Europe, Asia, South America, and Australia. Some of these data centers house over one hundred thousand servers, while other data centers are smaller, housing only hundreds of servers. The Google data centers are all interconnected via Google's private TCP/IP network, which spans the entire globe but is nevertheless separate from the public Internet. Importantly, the Google private network only carries traffic to/from Google servers. As shown in Figure 1.15, the Google private network attempts to "bypass" the upper tiers of the Internet by peering (settlement-free) with lower-tier ISPs, either by directly connecting with them or by connecting with them at IXPs [Labovitz 2010]. However, because many access ISPs can still only be reached by transiting through tier-1 networks, the Google network also connects to tier-1 ISPs, and pays those ISPs for the traffic it exchanges with them. By creating its own network, a content provider not only reduces its payments to upper-tier ISPs, but also has greater control of how its services are ultimately delivered to end users. Google's network infrastructure is described in greater detail in Section 2.6.

In summary, today's Internet—a network of networks—is complex, consisting of a dozen or so tier-1 ISPs and hundreds of thousands of lower-tier ISPs. The ISPs are diverse in their coverage, with some spanning multiple continents and oceans, and others limited to narrow geographic regions. The lower-tier ISPs connect to the higher-tier ISPs, and the higher-tier ISPs interconnect with one another. Users and content providers are customers of lower-tier ISPs, and lower-tier ISPs are customers of higher-tier ISPs. In recent years, major content providers have also created their own networks and connect directly into lower-tier ISPs where possible.

Figure 1.15 Interconnection of ISPs

1.4 Delay, Loss, and Throughput in Packet-Switched Networks

Back in Section 1.1 we said that the Internet can be viewed as an infrastructure that provides services to distributed applications running on end systems.
Ideally, we would like Internet services to be able to move as much data as we want between any two end systems, instantaneously, without any loss of data. Alas, this is a lofty goal, one that is unachievable in reality. Instead, computer networks necessarily constrain throughput (the amount of data per second that can be transferred) between end systems, introduce delays between end systems, and can actually lose packets. On one hand, it is unfortunate that the physical laws of reality introduce delay and loss as well as constrain throughput. On the other hand, because computer networks have these problems, there are many fascinating issues surrounding how to deal with the problems—more than enough issues to fill a course on computer networking and to motivate thousands of PhD theses! In this section, we'll begin to examine and quantify delay, loss, and throughput in computer networks.

1.4.1 Overview of Delay in Packet-Switched Networks

Recall that a packet starts in a host (the source), passes through a series of routers, and ends its journey in another host (the destination). As a packet travels from one node (host or router) to the subsequent node (host or router) along this path, the packet suffers from several types of delays at each node along the path. The most important of these delays are the nodal processing delay, queuing delay, transmission delay, and propagation delay; together, these delays accumulate to give a total nodal delay. The performance of many Internet applications—such as search, Web browsing, e-mail, maps, instant messaging, and voice-over-IP—is greatly affected by network delays. In order to acquire a deep understanding of packet switching and computer networks, we must understand the nature and importance of these delays.

Types of Delay

Let's explore these delays in the context of Figure 1.16. As part of its end-to-end route between source and destination, a packet is sent from the upstream node through router A to router B. Our goal is to characterize the nodal delay at router A. Note that router A has an outbound link leading to router B. This link is preceded by a queue (also known as a buffer). When the packet arrives at router A from the upstream node, router A examines the packet's header to determine the appropriate outbound link for the packet and then directs the packet to this link. In this example, the outbound link for the packet is the one that leads to router B. A packet can be transmitted on a link only if there is no other packet currently being transmitted on the link and if there are no other packets preceding it in the queue; if the link is currently busy or if there are other packets already queued for the link, the newly arriving packet will then join the queue.

Figure 1.16 The nodal delay at router A

Processing Delay

The time required to examine the packet's header and determine where to direct the packet is part of the processing delay. The processing delay can also include other factors, such as the time needed to check for bit-level errors in the packet that occurred in transmitting the packet's bits from the upstream node to router A. Processing delays in high-speed routers are typically on the order of microseconds or less. After this nodal processing, the router directs the packet to the queue that precedes the link to router B. (In Chapter 4 we'll study the details of how a router operates.)
Queuing Delay

At the queue, the packet experiences a queuing delay as it waits to be transmitted onto the link. The length of the queuing delay of a specific packet will depend on the number of earlier-arriving packets that are queued and waiting for transmission onto the link. If the queue is empty and no other packet is currently being transmitted, then our packet's queuing delay will be zero. On the other hand, if the traffic is heavy and many other packets are also waiting to be transmitted, the queuing delay will be long. We will see shortly that the number of packets that an arriving packet might expect to find is a function of the intensity and nature of the traffic arriving at the queue. Queuing delays can be on the order of microseconds to milliseconds in practice.

Transmission Delay

Assuming that packets are transmitted in a first-come-first-served manner, as is common in packet-switched networks, our packet can be transmitted only after all the packets that have arrived before it have been transmitted. Denote the length of the packet by L bits, and denote the transmission rate of the link from router A to router B by R bits/sec. For example, for a 10 Mbps Ethernet link, the rate is R = 10 Mbps; for a 100 Mbps Ethernet link, the rate is R = 100 Mbps. The transmission delay is L/R. This is the amount of time required to push (that is, transmit) all of the packet's bits into the link. Transmission delays are typically on the order of microseconds to milliseconds in practice.

Propagation Delay

Once a bit is pushed into the link, it needs to propagate to router B. The time required to propagate from the beginning of the link to router B is the propagation delay. The bit propagates at the propagation speed of the link. The propagation speed depends on the physical medium of the link (that is, fiber optics, twisted-pair copper wire, and so on) and is in the range of 2⋅10^8 meters/sec to 3⋅10^8 meters/sec, which is equal to, or a little less than, the speed of light. The propagation delay is the distance between two routers divided by the propagation speed. That is, the propagation delay is d/s, where d is the distance between router A and router B and s is the propagation speed of the link. Once the last bit of the packet propagates to node B, it and all the preceding bits of the packet are stored in router B. The whole process then continues with router B now performing the forwarding. In wide-area networks, propagation delays are on the order of milliseconds.
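Before comparing the two delays below, here is a small Python sketch contrasting the two formulas; the packet size, link rate, and distance are illustrative assumptions, not values from the text:

L = 12_000        # packet length in bits (a 1,500-byte packet; assumed)
R = 10_000_000    # transmission rate in bits/sec (10 Mbps; assumed)
d = 1_000_000     # distance between routers in meters (1,000 km; assumed)
s = 2e8           # propagation speed in meters/sec

d_trans = L / R   # 0.0012 sec: depends on L and R, not on distance
d_prop = d / s    # 0.005 sec: depends on distance, not on L or R
print(d_trans, d_prop)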
Comparing Transmission and Propagation Delay

Exploring propagation delay and transmission delay

Newcomers to the field of computer networking sometimes have difficulty understanding the difference between transmission delay and propagation delay. The difference is subtle but important. The transmission delay is the amount of time required for the router to push out the packet; it is a function of the packet's length and the transmission rate of the link, but has nothing to do with the distance between the two routers. The propagation delay, on the other hand, is the time it takes a bit to propagate from one router to the next; it is a function of the distance between the two routers, but has nothing to do with the packet's length or the transmission rate of the link.

An analogy might clarify the notions of transmission and propagation delay. Consider a highway that has a tollbooth every 100 kilometers, as shown in Figure 1.17. You can think of the highway segments between tollbooths as links and the tollbooths as routers. Suppose that cars travel (that is, propagate) on the highway at a rate of 100 km/hour (that is, when a car leaves a tollbooth, it instantaneously accelerates to 100 km/hour and maintains that speed between tollbooths). Suppose next that 10 cars, traveling together as a caravan, follow each other in a fixed order. You can think of each car as a bit and the caravan as a packet. Also suppose that each tollbooth services (that is, transmits) a car at a rate of one car per 12 seconds, and that it is late at night so that the caravan's cars are the only cars on the highway. Finally, suppose that whenever the first car of the caravan arrives at a tollbooth, it waits at the entrance until the other nine cars have arrived and lined up behind it. (Thus the entire caravan must be stored at the tollbooth before it can begin to be forwarded.) The time required for the tollbooth to push the entire caravan onto the highway is (10 cars)/(5 cars/minute) = 2 minutes. This time is analogous to the transmission delay in a router. The time required for a car to travel from the exit of one tollbooth to the next tollbooth is 100 km/(100 km/hour) = 1 hour. This time is analogous to propagation delay. Therefore, the time from when the caravan is stored in front of a tollbooth until the caravan is stored in front of the next tollbooth is the sum of transmission delay and propagation delay—in this example, 62 minutes.

Figure 1.17 Caravan analogy

Let's explore this analogy a bit more. What would happen if the tollbooth service time for a caravan were greater than the time for a car to travel between tollbooths? For example, suppose now that the cars travel at the rate of 1,000 km/hour and the tollbooth services cars at the rate of one car per minute. Then the traveling delay between two tollbooths is 6 minutes and the time to serve a caravan is 10 minutes. In this case, the first few cars in the caravan will arrive at the second tollbooth before the last cars in the caravan leave the first tollbooth. This situation also arises in packet-switched networks—the first bits in a packet can arrive at a router while many of the remaining bits in the packet are still waiting to be transmitted by the preceding router.

If a picture speaks a thousand words, then an animation must speak a million words. The Web site for this textbook provides an interactive Java applet that nicely illustrates and contrasts transmission delay and propagation delay. The reader is highly encouraged to visit that applet. [Smith 2009] also provides a very readable discussion of propagation, queueing, and transmission delays.

If we let d_proc, d_queue, d_trans, and d_prop denote the processing, queuing, transmission, and propagation delays, then the total nodal delay is given by

d_nodal = d_proc + d_queue + d_trans + d_prop

The contribution of these delay components can vary significantly. For example, d_prop can be negligible (for example, a couple of microseconds) for a link connecting two routers on the same university campus; however, d_prop is hundreds of milliseconds for two routers interconnected by a geostationary satellite link, and can be the dominant term in d_nodal. Similarly, d_trans can range from negligible to significant. Its contribution is typically negligible for transmission rates of 10 Mbps and higher (for example, for LANs); however, it can be hundreds of milliseconds for large Internet packets sent over low-speed dial-up modem links. The processing delay, d_proc, is often negligible; however, it strongly influences a router's maximum throughput, which is the maximum rate at which a router can forward packets.
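Continuing the earlier sketch, the total nodal delay is just the sum of the four components; the processing and queuing values here are placeholders of the typical magnitudes mentioned in the text, not measured numbers:

d_proc = 20e-6            # processing: tens of microseconds (assumed)
d_queue = 1e-3            # queuing: varies from packet to packet (assumed)
d_trans = 1.2e-3          # L/R from the earlier sketch
d_prop = 5e-3             # d/s from the earlier sketch

d_nodal = d_proc + d_queue + d_trans + d_prop
print(d_nodal)            # about 0.0072 sec for these values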
1.4.2 Queuing Delay and Packet Loss

The most complicated and interesting component of nodal delay is the queuing delay, d_queue. In fact, queuing delay is so important and interesting in computer networking that thousands of papers and numerous books have been written about it [Bertsekas 1991; Daigle 1991; Kleinrock 1975; Kleinrock 1976; Ross 1995]. We give only a high-level, intuitive discussion of queuing delay here; the more curious reader may want to browse through some of the books (or even eventually write a PhD thesis on the subject!). Unlike the other three delays (namely, d_proc, d_trans, and d_prop), the queuing delay can vary from packet to packet. For example, if 10 packets arrive at an empty queue at the same time, the first packet transmitted will suffer no queuing delay, while the last packet transmitted will suffer a relatively large queuing delay (while it waits for the other nine packets to be transmitted). Therefore, when characterizing queuing delay, one typically uses statistical measures, such as average queuing delay, variance of queuing delay, and the probability that the queuing delay exceeds some specified value.

When is the queuing delay large and when is it insignificant? The answer to this question depends on the rate at which traffic arrives at the queue, the transmission rate of the link, and the nature of the arriving traffic, that is, whether the traffic arrives periodically or arrives in bursts. To gain some insight here, let a denote the average rate at which packets arrive at the queue (a is in units of packets/sec). Recall that R is the transmission rate; that is, it is the rate (in bits/sec) at which bits are pushed out of the queue. Also suppose, for simplicity, that all packets consist of L bits. Then the average rate at which bits arrive at the queue is La bits/sec. Finally, assume that the queue is very big, so that it can hold essentially an infinite number of bits. The ratio La/R, called the traffic intensity, often plays an important role in estimating the extent of the queuing delay. If La/R > 1, then the average rate at which bits arrive at the queue exceeds the rate at which the bits can be transmitted from the queue. In this unfortunate situation, the queue will tend to increase without bound and the queuing delay will approach infinity! Therefore, one of the golden rules in traffic engineering is: Design your system so that the traffic intensity is no greater than 1.
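In code, the traffic intensity is a one-line computation; the workload below is an assumed example, chosen so that La/R stays below 1:

L = 12_000       # bits per packet (assumed)
a = 70           # average arrival rate in packets/sec (assumed)
R = 1_000_000    # link transmission rate in bits/sec (assumed)

intensity = L * a / R
print(intensity)   # 0.84: below 1, so the queue stays bounded on average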
Now consider the case La/R ≤ 1. Here, the nature of the arriving traffic impacts the queuing delay. For example, if packets arrive periodically—that is, one packet arrives every L/R seconds—then every packet will arrive at an empty queue and there will be no queuing delay. On the other hand, if packets arrive in bursts but periodically, there can be a significant average queuing delay. For example, suppose N packets arrive simultaneously every (L/R)N seconds. Then the first packet transmitted has no queuing delay; the second packet transmitted has a queuing delay of L/R seconds; and more generally, the nth packet transmitted has a queuing delay of (n−1)L/R seconds. We leave it as an exercise for you to calculate the average queuing delay in this example.
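As a check on that exercise, here is a brief Python sketch: averaging the delays (n−1)L/R for n = 1, ..., N gives (N−1)L/(2R). The values of L, R, and N below are assumptions for illustration:

L, R, N = 12_000, 1_000_000, 10     # illustrative values (assumed)

delays = [(n - 1) * L / R for n in range(1, N + 1)]
average = sum(delays) / N
print(average, (N - 1) * L / (2 * R))   # both print 0.054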
The two examples of periodic arrivals described above are a bit academic. Typically, the arrival process to a queue is random; that is, the arrivals do not follow any pattern and the packets are spaced apart by random amounts of time. In this more realistic case, the quantity La/R is not usually sufficient to fully characterize the queuing delay statistics. Nonetheless, it is useful in gaining an intuitive understanding of the extent of the queuing delay. In particular, if the traffic intensity is close to zero, then packet arrivals are few and far between and it is unlikely that an arriving packet will find another packet in the queue. Hence, the average queuing delay will be close to zero. On the other hand, when the traffic intensity is close to 1, there will be intervals of time when the arrival rate exceeds the transmission capacity (due to variations in packet arrival rate), and a queue will form during these periods of time; when the arrival rate is less than the transmission capacity, the length of the queue will shrink. Nonetheless, as the traffic intensity approaches 1, the average queue length gets larger and larger. The qualitative dependence of average queuing delay on the traffic intensity is shown in Figure 1.18.

Figure 1.18 Dependence of average queuing delay on traffic intensity

One important aspect of Figure 1.18 is the fact that as the traffic intensity approaches 1, the average queuing delay increases rapidly. A small percentage increase in the intensity will result in a much larger percentage-wise increase in delay. Perhaps you have experienced this phenomenon on the highway. If you regularly drive on a road that is typically congested, the fact that the road is typically congested means that its traffic intensity is close to 1. If some event causes an even slightly larger-than-usual amount of traffic, the delays you experience can be huge.

To really get a good feel for what queuing delays are about, you are encouraged once again to visit the textbook Web site, which provides an interactive Java applet for a queue. If you set the packet arrival rate high enough so that the traffic intensity exceeds 1, you will see the queue slowly build up over time.

Packet Loss

In our discussions above, we have assumed that the queue is capable of holding an infinite number of packets. In reality a queue preceding a link has finite capacity, although the queuing capacity greatly depends on the router design and cost. Because the queue capacity is finite, packet delays do not really approach infinity as the traffic intensity approaches 1. Instead, a packet can arrive to find a full queue. With no place to store such a packet, a router will drop that packet; that is, the packet will be lost. This overflow at a queue can again be seen in the Java applet for a queue when the traffic intensity is greater than 1.

From an end-system viewpoint, a packet loss will look like a packet having been transmitted into the network core but never emerging from the network at the destination. The fraction of lost packets increases as the traffic intensity increases. Therefore, performance at a node is often measured not only in terms of delay, but also in terms of the probability of packet loss. As we'll discuss in the subsequent chapters, a lost packet may be retransmitted on an end-to-end basis in order to ensure that all data are eventually transferred from source to destination.

1.4.3 End-to-End Delay

Our discussion up to this point has focused on the nodal delay, that is, the delay at a single router. Let's now consider the total delay from source to destination. To get a handle on this concept, suppose there are N−1 routers between the source host and the destination host. Let's also suppose for the moment that the network is uncongested (so that queuing delays are negligible), the processing delay at each router and at the source host is d_proc, the transmission rate out of each router and out of the source host is R bits/sec, and the propagation on each link is d_prop. The nodal delays accumulate and give an end-to-end delay,

d_end-end = N(d_proc + d_trans + d_prop)    (1.2)

where, once again, d_trans = L/R, where L is the packet size. Note that Equation 1.2 is a generalization of Equation 1.1, which did not take into account processing and propagation delays. We leave it to you to generalize Equation 1.2 to the case of heterogeneous delays at the nodes and to the presence of an average queuing delay at each node.
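One way to approach that generalization is to sum per-node delays rather than multiply by N. A sketch, with placeholder numbers of our own choosing:

# Each tuple is (d_proc, d_queue, d_trans, d_prop) for one node, in seconds.
# The three nodes and their delay values are assumed for illustration.
nodes = [
    (20e-6, 0.0,    1.2e-3, 5e-3),
    (30e-6, 1e-3,   0.6e-3, 2e-3),
    (20e-6, 0.5e-3, 1.2e-3, 8e-3),
]
d_end_end = sum(sum(node) for node in nodes)
print(d_end_end)     # end-to-end delay with heterogeneous nodes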
Traceroute

Using Traceroute to discover network paths and measure network delay

To get a hands-on feel for end-to-end delay in a computer network, we can make use of the Traceroute program. Traceroute is a simple program that can run in any Internet host. When the user specifies a destination hostname, the program in the source host sends multiple, special packets toward that destination. As these packets work their way toward the destination, they pass through a series of routers. When a router receives one of these special packets, it sends back to the source a short message that contains the name and address of the router.

More specifically, suppose there are N−1 routers between the source and the destination. Then the source will send N special packets into the network, with each packet addressed to the ultimate destination. These N special packets are marked 1 through N, with the first packet marked 1 and the last packet marked N. When the nth router receives the nth packet marked n, the router does not forward the packet toward its destination, but instead sends a message back to the source. When the destination host receives the Nth packet, it too returns a message back to the source. The source records the time that elapses between when it sends a packet and when it receives the corresponding return message; it also records the name and address of the router (or the destination host) that returns the message. In this manner, the source can reconstruct the route taken by packets flowing from source to destination, and the source can determine the round-trip delays to all the intervening routers. Traceroute actually repeats the experiment just described three times, so the source actually sends 3 ⋅ N packets to the destination. RFC 1393 describes Traceroute in detail.

Here is an example of the output of the Traceroute program, where the route was being traced from the source host gaia.cs.umass.edu (at the University of Massachusetts) to the host cis.poly.edu (at Polytechnic University in Brooklyn). The output has six columns: the first column is the n value described above, that is, the number of the router along the route; the second column is the name of the router; the third column is the address of the router (of the form xxx.xxx.xxx.xxx); the last three columns are the round-trip delays for three experiments. If the source receives fewer than three messages from any given router (due to packet loss in the network), Traceroute places an asterisk just after the router number and reports fewer than three round-trip times for that router.

1  cs-gw (128.119.240.254)  1.009 ms  0.899 ms  0.993 ms
2  128.119.3.154 (128.119.3.154)  0.931 ms  0.441 ms  0.651 ms
3  border4-rt-gi-1-3.gw.umass.edu (128.119.2.194)  1.032 ms  0.484 ms  0.451 ms
4  acr1-ge-2-1-0.Boston.cw.net (208.172.51.129)  10.006 ms  8.150 ms  8.460 ms
5  agr4-loopback.NewYork.cw.net (206.24.194.104)  12.272 ms  14.344 ms  13.267 ms
6  acr2-loopback.NewYork.cw.net (206.24.194.62)  13.225 ms  12.292 ms  12.148 ms
7  pos10-2.core2.NewYork1.Level3.net (209.244.160.133)  12.218 ms  11.823 ms  11.793 ms
8  gige9-1-52.hsipaccess1.NewYork1.Level3.net (64.159.17.39)  13.081 ms  11.556 ms  13.297 ms
9  p0-0.polyu.bbnplanet.net (4.25.109.122)  12.716 ms  13.052 ms  12.786 ms
10  cis.poly.edu (128.238.32.126)  14.080 ms  13.035 ms  12.802 ms

In the trace above there are nine routers between the source and the destination. Most of these routers have a name, and all of them have addresses. For example, the name of Router 3 is border4-rt-gi-1-3.gw.umass.edu and its address is 128.119.2.194. Looking at the data provided for this same router, we see that in the first of the three trials the round-trip delay between the source and the router was 1.03 msec. The round-trip delays for the subsequent two trials were 0.48 and 0.45 msec. These round-trip delays include all of the delays just discussed, including transmission delays, propagation delays, router processing delays, and queuing delays. Because the queuing delay is varying with time, the round-trip delay of packet n sent to a router n can sometimes be longer than the round-trip delay of packet n+1 sent to router n+1. Indeed, we observe this phenomenon in the above example: the delays to Router 6 are larger than the delays to Router 7!

Want to try out Traceroute for yourself? We highly recommend that you visit http://www.traceroute.org, which provides a Web interface to an extensive list of sources for route tracing. You choose a source and supply the hostname for any destination. The Traceroute program then does all the work. There are a number of free software programs that provide a graphical interface to Traceroute; one of our favorites is PingPlotter [PingPlotter 2016].

End System, Application, and Other Delays

In addition to processing, transmission, and propagation delays, there can be additional significant delays in the end systems. For example, an end system wanting to transmit a packet into a shared medium (e.g., as in a WiFi or cable modem scenario) may purposefully delay its transmission as part of its protocol for sharing the medium with other end systems; we'll consider such protocols in detail in Chapter 6. Another important delay is media packetization delay, which is present in Voice-over-IP (VoIP) applications. In VoIP, the sending side must first fill a packet with encoded digitized speech before passing the packet to the Internet. This time to fill a packet—called the packetization delay—can be significant and can impact the user-perceived quality of a VoIP call. This issue will be further explored in a homework problem at the end of this chapter.
1.4.4 Throughput in Computer Networks

In addition to delay and packet loss, another critical performance measure in computer networks is end-to-end throughput. To define throughput, consider transferring a large file from Host A to Host B across a computer network. This transfer might be, for example, a large video clip from one peer to another in a P2P file sharing system. The instantaneous throughput at any instant of time is the rate (in bits/sec) at which Host B is receiving the file. (Many applications, including many P2P file sharing systems, display the instantaneous throughput during downloads in the user interface—perhaps you have observed this before!) If the file consists of F bits and the transfer takes T seconds for Host B to receive all F bits, then the average throughput of the file transfer is F/T bits/sec. For some applications, such as Internet telephony, it is desirable to have a low delay and an instantaneous throughput consistently above some threshold (for example, over 24 kbps for some Internet telephony applications and over 256 kbps for some real-time video applications). For other applications, including those involving file transfers, delay is not critical, but it is desirable to have the highest possible throughput.

To gain further insight into the important concept of throughput, let's consider a few examples. Figure 1.19(a) shows two end systems, a server and a client, connected by two communication links and a router. Consider the throughput for a file transfer from the server to the client. Let Rs denote the rate of the link between the server and the router; and Rc denote the rate of the link between the router and the client. Suppose that the only bits being sent in the entire network are those from the server to the client. We now ask, in this ideal scenario, what is the server-to-client throughput? To answer this question, we may think of bits as fluid and communication links as pipes. Clearly, the server cannot pump bits through its link at a rate faster than Rs bps; and the router cannot forward bits at a rate faster than Rc bps. If Rs < Rc, then the bits pumped by the server will flow right through the router and arrive at the client at rate Rs bps, giving a throughput of Rs bps; if instead Rc < Rs, the router will not be able to forward bits as quickly as it receives them, and bits will leave the router only at rate Rc bps, giving a throughput of Rc bps. Thus, for this simple two-link network, the throughput is min{Rs, Rc}, the transmission rate of the bottleneck link.
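In code, the bottleneck rule is a one-line min; the link rates and file size below are assumptions, not values from the text:

Rs = 2_000_000      # server-to-router link rate in bits/sec (assumed)
Rc = 1_000_000      # router-to-client link rate in bits/sec (assumed)
F = 32_000_000      # file size in bits (assumed)

throughput = min(Rs, Rc)            # the bottleneck link determines throughput
print(throughput, F / throughput)   # 1 Mbps, so the transfer takes 32 sec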
S: 220 hamburger.edu
C: HELO crepes.fr
S: 250 Hello crepes.fr, pleased to meet you
C: MAIL FROM: <alice@crepes.fr>
S: 250 alice@crepes.fr ... Sender ok
C: RCPT TO: <bob@hamburger.edu>
S: 250 bob@hamburger.edu ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 hamburger.edu closing connection

In the example above, the client sends a message ("Do you like ketchup? How about pickles?") from mail server crepes.fr to mail server hamburger.edu. As part of the dialogue, the client issued five commands: HELO (an abbreviation for HELLO), MAIL FROM, RCPT TO, DATA, and QUIT. These commands are self-explanatory. The client also sends a line consisting of a single period, which indicates the end of the message to the server. (In ASCII jargon, each message ends with CRLF.CRLF, where CR and LF stand for carriage return and line feed, respectively.) The server issues replies to each command, with each reply having a reply code and some (optional) English-language explanation. We mention here that SMTP uses persistent connections: If the sending mail server has several messages to send to the same receiving mail server, it can send all of the messages over the same TCP connection. For each message, the client begins the process with a new MAIL FROM: crepes.fr, designates the end of message with an isolated period, and issues QUIT only after all messages have been sent.

It is highly recommended that you use Telnet to carry out a direct dialogue with an SMTP server. To do this, issue

telnet serverName 25

where serverName is the name of a local mail server. When you do this, you are simply establishing a TCP connection between your local host and the mail server. After typing this line, you should immediately receive the 220 reply from the server. Then issue the SMTP commands HELO, MAIL FROM, RCPT TO, DATA, CRLF.CRLF, and QUIT at the appropriate times. It is also highly recommended that you do Programming Assignment 3 at the end of this chapter. In that assignment, you'll build a simple user agent that implements the client side of SMTP. It will allow you to send an e-mail message to an arbitrary recipient via a local mail server.
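If you would rather drive the dialogue from a program than from Telnet, Python's standard smtplib speaks SMTP for you. A minimal sketch, assuming a mail server is listening on localhost port 25; the addresses are the placeholders from the example above:

import smtplib

msg = ("From: alice@crepes.fr\r\n"
       "To: bob@hamburger.edu\r\n"
       "\r\n"
       "Do you like ketchup?\r\n")

server = smtplib.SMTP("localhost", 25)   # open the TCP connection
server.helo("crepes.fr")                 # HELO
server.sendmail("alice@crepes.fr",       # MAIL FROM, RCPT TO, and DATA
                ["bob@hamburger.edu"], msg)
server.quit()                            # QUIT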
2.3.2 Comparison with HTTP

Let's now briefly compare SMTP with HTTP. Both protocols are used to transfer files from one host to another: HTTP transfers files (also called objects) from a Web server to a Web client (typically a browser); SMTP transfers files (that is, e-mail messages) from one mail server to another mail server. When transferring the files, both persistent HTTP and SMTP use persistent connections. Thus, the two protocols have common characteristics. However, there are important differences. First, HTTP is mainly a pull protocol—someone loads information on a Web server and users use HTTP to pull the information from the server at their convenience. In particular, the TCP connection is initiated by the machine that wants to receive the file. On the other hand, SMTP is primarily a push protocol—the sending mail server pushes the file to the receiving mail server. In particular, the TCP connection is initiated by the machine that wants to send the file.

A second difference, which we alluded to earlier, is that SMTP requires each message, including the body of each message, to be in 7-bit ASCII format. If the message contains characters that are not 7-bit ASCII (for example, French characters with accents) or contains binary data (such as an image file), then the message has to be encoded into 7-bit ASCII. HTTP data does not impose this restriction.

A third important difference concerns how a document consisting of text and images (along with possibly other media types) is handled. As we learned in Section 2.2, HTTP encapsulates each object in its own HTTP response message. SMTP places all of the message's objects into one message.

2.3.3 Mail Message Formats

When Alice writes an ordinary snail-mail letter to Bob, she may include all kinds of peripheral header information at the top of the letter, such as Bob's address, her own return address, and the date. Similarly, when an e-mail message is sent from one person to another, a header containing peripheral information precedes the body of the message itself. This peripheral information is contained in a series of header lines, which are defined in RFC 5322. The header lines and the body of the message are separated by a blank line (that is, by CRLF). RFC 5322 specifies the exact format for mail header lines as well as their semantic interpretations. As with HTTP, each header line contains readable text, consisting of a keyword followed by a colon followed by a value. Some of the keywords are required and others are optional. Every header must have a From: header line and a To: header line; a header may include a Subject: header line as well as other optional header lines. It is important to note that these header lines are different from the SMTP commands we studied in Section 2.3.1 (even though they contain some common words such as "from" and "to"). The commands in that section were part of the SMTP handshaking protocol; the header lines examined in this section are part of the mail message itself.

A typical message header looks like this:

From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Searching for the meaning of life.

After the message header, a blank line follows; then the message body (in ASCII) follows. You should use Telnet to send a message to a mail server that contains some header lines, including the Subject: header line. To do this, issue telnet serverName 25, as discussed in Section 2.3.1.
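Python's standard email library builds RFC 5322-formatted messages so that you do not have to assemble header lines by hand. A small sketch using the header fields from the example above:

from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@crepes.fr"
msg["To"] = "bob@hamburger.edu"
msg["Subject"] = "Searching for the meaning of life."
msg.set_content("Do you like ketchup?")
print(msg.as_string())   # header lines, a blank line, then the body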
2.3.4 Mail Access Protocols

Once SMTP delivers the message from Alice's mail server to Bob's mail server, the message is placed in Bob's mailbox. Throughout this discussion we have tacitly assumed that Bob reads his mail by logging onto the server host and then executing a mail reader that runs on that host. Up until the early 1990s this was the standard way of doing things. But today, mail access uses a client-server architecture—the typical user reads e-mail with a client that executes on the user's end system, for example, on an office PC, a laptop, or a smartphone. By executing a mail client on a local PC, users enjoy a rich set of features, including the ability to view multimedia messages and attachments.

Given that Bob (the recipient) executes his user agent on his local PC, it is natural to consider placing a mail server on his local PC as well. With this approach, Alice's mail server would dialogue directly with Bob's PC. There is a problem with this approach, however. Recall that a mail server manages mailboxes and runs the client and server sides of SMTP. If Bob's mail server were to reside on his local PC, then Bob's PC would have to remain always on, and connected to the Internet, in order to receive new mail, which can arrive at any time. This is impractical for many Internet users. Instead, a typical user runs a user agent on the local PC but accesses its mailbox stored on an always-on shared mail server. This mail server is shared with other users and is typically maintained by the user's ISP (for example, university or company).

Now let's consider the path an e-mail message takes when it is sent from Alice to Bob. We just learned that at some point along the path the e-mail message needs to be deposited in Bob's mail server. This could be done simply by having Alice's user agent send the message directly to Bob's mail server. And this could be done with SMTP—indeed, SMTP has been designed for pushing e-mail from one host to another. However, typically the sender's user agent does not dialogue directly with the recipient's mail server. Instead, as shown in Figure 2.16, Alice's user agent uses SMTP to push the e-mail message into her mail server, then Alice's mail server uses SMTP (as an SMTP client) to relay the e-mail message to Bob's mail server.

Why the two-step procedure? Primarily because without relaying through Alice's mail server, Alice's user agent doesn't have any recourse to an unreachable destination mail server. By having Alice first deposit the e-mail in her own mail server, Alice's mail server can repeatedly try to send the message to Bob's mail server, say every 30 minutes, until Bob's mail server becomes operational. (And if Alice's mail server is down, then she has the recourse of complaining to her system administrator!) The SMTP RFC defines how the SMTP commands can be used to relay a message across multiple SMTP servers.

Figure 2.16 E-mail protocols and their communicating entities

But there is still one missing piece to the puzzle! How does a recipient like Bob, running a user agent on his local PC, obtain his messages, which are sitting in a mail server within Bob's ISP? Note that Bob's user agent can't use SMTP to obtain the messages because obtaining the messages is a pull operation, whereas SMTP is a push protocol. The puzzle is completed by introducing a special mail access protocol that transfers messages from Bob's mail server to his local PC. There are currently a number of popular mail access protocols, including Post Office Protocol—Version 3 (POP3), Internet Mail Access Protocol (IMAP), and HTTP.

Figure 2.16 provides a summary of the protocols that are used for Internet mail: SMTP is used to transfer mail from the sender's mail server to the recipient's mail server; SMTP is also used to transfer mail from the sender's user agent to the sender's mail server. A mail access protocol, such as POP3, is used to transfer mail from the recipient's mail server to the recipient's user agent.

POP3

POP3 is an extremely simple mail access protocol. It is defined in [RFC 1939], which is short and quite readable. Because the protocol is so simple, its functionality is rather limited. POP3 begins when the user agent (the client) opens a TCP connection to the mail server (the server) on port 110. With the TCP connection established, POP3 progresses through three phases: authorization, transaction, and update. During the first phase, authorization, the user agent sends a username and a password (in the clear) to authenticate the user. During the second phase, transaction, the user agent retrieves messages; also during this phase, the user agent can mark messages for deletion, remove deletion marks, and obtain mail statistics. The third phase, update, occurs after the client has issued the quit command, ending the POP3 session; at this time, the mail server deletes the messages that were marked for deletion.

In a POP3 transaction, the user agent issues commands, and the server responds to each command with a reply. There are two possible responses: +OK (sometimes followed by server-to-client data), used by the server to indicate that the previous command was fine; and -ERR, used by the server to indicate that something was wrong with the previous command.

The authorization phase has two principal commands: user <username> and pass <password>. To illustrate these two commands, we suggest that you Telnet directly into a POP3 server, using port 110, and issue these commands. Suppose that mailServer is the name of your mail server. You will see something like:

telnet mailServer 110
+OK POP3 server ready
user bob
+OK
pass hungry
+OK user successfully logged on

If you misspell a command, the POP3 server will reply with an -ERR message.

Now let's take a look at the transaction phase.
A user agent using POP3 can often be configured (by the user) to "download and delete" or to "download and keep." The sequence of commands issued by a POP3 user agent depends on which of these two modes the user agent is operating in. In the download-and-delete mode, the user agent will issue the list, retr, and dele commands. As an example, suppose the user has two messages in his or her mailbox. In the dialogue below, C: (standing for client) is the user agent and S: (standing for server) is the mail server. The transaction will look something like:

C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 1
C: retr 2
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 2
C: quit
S: +OK POP3 server signing off

The user agent first asks the mail server to list the size of each of the stored messages. The user agent then retrieves and deletes each message from the server. Note that after the authorization phase, the user agent employed only four commands: list, retr, dele, and quit. The syntax for these commands is defined in RFC 1939. After processing the quit command, the POP3 server enters the update phase and removes messages 1 and 2 from the mailbox.

A problem with this download-and-delete mode is that the recipient, Bob, may be nomadic and may want to access his mail messages from multiple machines, for example, his office PC, his home PC, and his portable computer. The download-and-delete mode partitions Bob's mail messages over these three machines; in particular, if Bob first reads a message on his office PC, he will not be able to reread the message from his portable at home later in the evening. In the download-and-keep mode, the user agent leaves the messages on the mail server after downloading them. In this case, Bob can reread messages from different machines; he can access a message from work and access it again later in the week from home.

During a POP3 session between a user agent and the mail server, the POP3 server maintains some state information; in particular, it keeps track of which user messages have been marked deleted. However, the POP3 server does not carry state information across POP3 sessions. This lack of state information across sessions greatly simplifies the implementation of a POP3 server.
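Python's standard poplib wraps these same commands. A minimal download-and-delete sketch; the server name and credentials are the placeholders from the dialogue above:

import poplib

mbox = poplib.POP3("mailServer", 110)
mbox.user("bob")                   # authorization phase: user
mbox.pass_("hungry")               # authorization phase: pass
num_msgs, _ = mbox.stat()          # transaction phase begins
for n in range(1, num_msgs + 1):
    resp, lines, octets = mbox.retr(n)   # retr n
    mbox.dele(n)                         # dele n (marks for deletion)
mbox.quit()                        # quit: server enters the update phase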
The recipient can then move the message into a new, +user-created folder, read the message, delete the message, and so on. The IMAP protocol provides +commands to allow users to create folders and move messages from one folder to another. IMAP also +provides commands that allow users to search remote folders for messages matching specific criteria. +Note that, unlike POP3, an IMAP server maintains user state information across IMAP sessions—for +example, the names of the folders and which messages are associated with which folders. +Another important feature of IMAP is that it has commands that permit a user agent to obtain +components of messages. For example, a user agent can obtain just the message header of a message +or just one part of a multipart MIME message. This feature is useful when there is a low-bandwidth +connection (for example, a slow-speed modem link) between the user agent and its mail server. With a +low-bandwidth connection, the user may not want to download all of the messages in its mailbox, +particularly avoiding long messages that might contain, for example, an audio or video clip. +Web-Based E-Mail +More and more users today are sending and accessing their e-mail through their Web browsers. Hotmail +introduced Web-based access in the mid 1990s. Now Web-based e-mail is also provided by Google, +Yahoo!, as well as just about every major university and corporation. With this service, the user agent is +an ordinary Web browser, and the user communicates with its remote mailbox via HTTP. When a +recipient, such as Bob, wants to access a message in his mailbox, the e-mail message is sent from +Bob’s mail server to Bob’s browser using the HTTP protocol rather than the POP3 or IMAP protocol. +When a sender, such as Alice, wants to send an e-mail message, the e-mail message is sent from her +browser to her mail server over HTTP rather than over SMTP. Alice’s mail server, however, still sends +messages to, and receives messages from, other mail servers using SMTP. + + 2.4 DNS—The Internet’s Directory Service +We human beings can be identified in many ways. For example, we can be identified by the names that +appear on our birth certificates. We can be identified by our social security numbers. We can be +identified by our driver’s license numbers. Although each of these identifiers can be used to identify +people, within a given context one identifier may be more appropriate than another. For example, the +computers at the IRS (the infamous tax-collecting agency in the United States) prefer to use fixed-length +social security numbers rather than birth certificate names. On the other hand, ordinary people prefer +the more mnemonic birth certificate names rather than social security numbers. (Indeed, can you +imagine saying, “Hi. My name is 132-67-9875. Please meet my husband, 178-87-1146.”) +Just as humans can be identified in many ways, so too can Internet hosts. One identifier for a host is its +hostname. Hostnames—such as www.facebook.com, www.google.com , +gaia.cs.umass.edu —are mnemonic and are therefore appreciated by humans. However, +hostnames provide little, if any, information about the location within the Internet of the host. (A +hostname such as www.eurecom.fr , which ends with the country code .fr , tells us that the host is +probably in France, but doesn’t say much more.) Furthermore, because hostnames can consist of +variable-length alphanumeric characters, they would be difficult to process by routers. 
For these reasons, hosts are also identified by so-called IP addresses.

We discuss IP addresses in some detail in Chapter 4, but it is useful to say a few brief words about them now. An IP address consists of four bytes and has a rigid hierarchical structure. An IP address looks like 121.7.106.83, where each period separates one of the bytes expressed in decimal notation from 0 to 255. An IP address is hierarchical because as we scan the address from left to right, we obtain more and more specific information about where the host is located in the Internet (that is, within which network, in the network of networks). Similarly, when we scan a postal address from bottom to top, we obtain more and more specific information about where the addressee is located.

2.4.1 Services Provided by DNS

We have just seen that there are two ways to identify a host—by a hostname and by an IP address. People prefer the more mnemonic hostname identifier, while routers prefer fixed-length, hierarchically structured IP addresses. In order to reconcile these preferences, we need a directory service that translates hostnames to IP addresses. This is the main task of the Internet's domain name system (DNS). The DNS is (1) a distributed database implemented in a hierarchy of DNS servers, and (2) an application-layer protocol that allows hosts to query the distributed database. The DNS servers are often UNIX machines running the Berkeley Internet Name Domain (BIND) software [BIND 2016]. The DNS protocol runs over UDP and uses port 53.

DNS is commonly employed by other application-layer protocols—including HTTP and SMTP—to translate user-supplied hostnames to IP addresses. As an example, consider what happens when a browser (that is, an HTTP client), running on some user's host, requests the URL www.someschool.edu/index.html. In order for the user's host to be able to send an HTTP request message to the Web server www.someschool.edu, the user's host must first obtain the IP address of www.someschool.edu. This is done as follows.

1. The same user machine runs the client side of the DNS application.

2. The browser extracts the hostname, www.someschool.edu, from the URL and passes the hostname to the client side of the DNS application.

3. The DNS client sends a query containing the hostname to a DNS server.

4. The DNS client eventually receives a reply, which includes the IP address for the hostname.

5. Once the browser receives the IP address from DNS, it can initiate a TCP connection to the HTTP server process located at port 80 at that IP address.

We see from this example that DNS adds an additional delay—sometimes substantial—to the Internet applications that use it. Fortunately, as we discuss below, the desired IP address is often cached in a "nearby" DNS server, which helps to reduce DNS network traffic as well as the average DNS delay.

DNS provides a few other important services in addition to translating hostnames to IP addresses:

Host aliasing. A host with a complicated hostname can have one or more alias names. For example, a hostname such as relay1.west-coast.enterprise.com could have, say, two aliases such as enterprise.com and www.enterprise.com. In this case, the hostname relay1.west-coast.enterprise.com is said to be a canonical hostname. Alias hostnames, when present, are typically more mnemonic than canonical hostnames.
DNS can be invoked by an application to obtain the canonical hostname for a supplied alias hostname as well as the IP address of the host.

Mail server aliasing. For obvious reasons, it is highly desirable that e-mail addresses be mnemonic. For example, if Bob has an account with Yahoo Mail, Bob's e-mail address might be as simple as bob@yahoo.mail. However, the hostname of the Yahoo mail server is more complicated and much less mnemonic than simply yahoo.com (for example, the canonical hostname might be something like relay1.west-coast.yahoo.com). DNS can be invoked by a mail application to obtain the canonical hostname for a supplied alias hostname as well as the IP address of the host. In fact, the MX record (see below) permits a company's mail server and Web server to have identical (aliased) hostnames; for example, a company's Web server and mail server can both be called enterprise.com.

Load distribution. DNS is also used to perform load distribution among replicated servers, such as replicated Web servers. Busy sites, such as cnn.com, are replicated over multiple servers, with each server running on a different end system and each having a different IP address. For replicated Web servers, a set of IP addresses is thus associated with one canonical hostname. The DNS database contains this set of IP addresses. When clients make a DNS query for a name mapped to a set of addresses, the server responds with the entire set of IP addresses, but rotates the ordering of the addresses within each reply. Because a client typically sends its HTTP request message to the IP address that is listed first in the set, DNS rotation distributes the traffic among the replicated servers. DNS rotation is also used for e-mail so that multiple mail servers can have the same alias name. Also, content distribution companies such as Akamai have used DNS in more sophisticated ways [Dilley 2002] to provide Web content distribution (see Section 2.6.3).

The DNS is specified in RFC 1034 and RFC 1035, and updated in several additional RFCs. It is a complex system, and we only touch upon key aspects of its operation here. The interested reader is referred to these RFCs and the book by Albitz and Liu [Albitz 1993]; see also the retrospective paper [Mockapetris 1988], which provides a nice description of the what and why of DNS, and [Mockapetris 2005].

PRINCIPLES IN PRACTICE

DNS: CRITICAL NETWORK FUNCTIONS VIA THE CLIENT-SERVER PARADIGM

Like HTTP, FTP, and SMTP, the DNS protocol is an application-layer protocol since it (1) runs between communicating end systems using the client-server paradigm and (2) relies on an underlying end-to-end transport protocol to transfer DNS messages between communicating end systems. In another sense, however, the role of the DNS is quite different from Web, file transfer, and e-mail applications. Unlike these applications, the DNS is not an application with which a user directly interacts. Instead, the DNS provides a core Internet function—namely, translating hostnames to their underlying IP addresses, for user applications and other software in the Internet. We noted in Section 1.2 that much of the complexity in the Internet architecture is located at the "edges" of the network. The DNS, which implements the critical name-to-address translation process using clients and servers located at the edge of the network, is yet another example of that design philosophy.
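Before looking at how the DNS produces these answers, it is worth seeing what the service looks like from an application's point of view. The short sketch below uses only Python's standard library; the hostname is illustrative. Note that a busy site may return several addresses, reflecting the DNS rotation just described, and a client typically uses the first one.

import socket

hostname = "www.example.com"  # illustrative; substitute any hostname

# Hand the hostname to the client side of DNS and wait for the reply;
# getaddrinfo may return several (rotated) addresses for a replicated site.
infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
addresses = [info[4][0] for info in infos]
print("addresses returned:", addresses)

# Initiate a TCP connection to the HTTP server at the first address.
sock = socket.create_connection((addresses[0], 80))
sock.close()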
+ +2.4.2 Overview of How DNS Works +We now present a high-level overview of how DNS works. Our discussion will focus on the hostname-to- + + IP-address translation service. +Suppose that some application (such as a Web browser or a mail reader) running in a user’s host needs +to translate a hostname to an IP address. The application will invoke the client side of DNS, specifying +the hostname that needs to be translated. (On many UNIX-based machines, gethostbyname() is the +function call that an application calls in order to perform the translation.) DNS in the user’s host then +takes over, sending a query message into the network. All DNS query and reply messages are sent +within UDP datagrams to port 53. After a delay, ranging from milliseconds to seconds, DNS in the user’s +host receives a DNS reply message that provides the desired mapping. This mapping is then passed to +the invoking application. Thus, from the perspective of the invoking application in the user’s host, DNS is +a black box providing a simple, straightforward translation service. But in fact, the black box that +implements the service is complex, consisting of a large number of DNS servers distributed around the +globe, as well as an application-layer protocol that specifies how the DNS servers and querying hosts +communicate. +A simple design for DNS would have one DNS server that contains all the mappings. In this centralized +design, clients simply direct all queries to the single DNS server, and the DNS server responds directly +to the querying clients. Although the simplicity of this design is attractive, it is inappropriate for today’s +Internet, with its vast (and growing) number of hosts. The problems with a centralized design include: +A single point of failure. If the DNS server crashes, so does the entire Internet! +Traffic volume. A single DNS server would have to handle all DNS queries (for all the HTTP +requests and e-mail messages generated from hundreds of millions of hosts). +Distant centralized database. A single DNS server cannot be “close to” all the querying clients. If +we put the single DNS server in New York City, then all queries from Australia must travel to the +other side of the globe, perhaps over slow and congested links. This can lead to significant delays. +Maintenance. The single DNS server would have to keep records for all Internet hosts. Not only +would this centralized database be huge, but it would have to be updated frequently to account for +every new host. +In summary, a centralized database in a single DNS server simply doesn’t scale. Consequently, the +DNS is distributed by design. In fact, the DNS is a wonderful example of how a distributed database can +be implemented in the Internet. +A Distributed, Hierarchical Database +In order to deal with the issue of scale, the DNS uses a large number of servers, organized in a +hierarchical fashion and distributed around the world. No single DNS server has all of the mappings for +all of the hosts in the Internet. Instead, the mappings are distributed across the DNS servers. To a first +approximation, there are three classes of DNS servers—root DNS servers, top-level domain (TLD) DNS + + servers, and authoritative DNS servers—organized in a hierarchy as shown in Figure 2.17. To +understand how these three classes of servers interact, suppose a DNS client wants to determine the IP +address for the hostname www.amazon.com . 
Figure 2.17 Portion of the hierarchy of DNS servers

To a first approximation, the following events will take place. The client first contacts one of the root servers, which returns IP addresses for TLD servers for the top-level domain com. The client then contacts one of these TLD servers, which returns the IP address of an authoritative server for amazon.com. Finally, the client contacts one of the authoritative servers for amazon.com, which returns the IP address for the hostname www.amazon.com. We'll soon examine this DNS lookup process in more detail. But let's first take a closer look at these three classes of DNS servers:

Root DNS servers. There are over 400 root name servers scattered all over the world. Figure 2.18 shows the countries that have root name servers, with countries having more than ten root servers darkly shaded. These root name servers are managed by 13 different organizations. The full list of root name servers, along with the organizations that manage them and their IP addresses, can be found at [Root Servers 2016]. Root name servers provide the IP addresses of the TLD servers.

Figure 2.18 DNS root servers in 2016

Top-level domain (TLD) servers. For each of the top-level domains—top-level domains such as com, org, net, edu, and gov, and all of the country top-level domains such as uk, fr, ca, and jp—there is a TLD server (or server cluster). The company Verisign Global Registry Services maintains the TLD servers for the com top-level domain, and the company Educause maintains the TLD servers for the edu top-level domain. The network infrastructure supporting a TLD can be large and complex; see [Osterweil 2012] for a nice overview of the Verisign network. See [TLD list 2016] for a list of all top-level domains. TLD servers provide the IP addresses for authoritative DNS servers.

Authoritative DNS servers. Every organization with publicly accessible hosts (such as Web servers and mail servers) on the Internet must provide publicly accessible DNS records that map the names of those hosts to IP addresses. An organization's authoritative DNS server houses these DNS records. An organization can choose to implement its own authoritative DNS server to hold these records; alternatively, the organization can pay to have these records stored in an authoritative DNS server of some service provider. Most universities and large companies implement and maintain their own primary and secondary (backup) authoritative DNS server.

The root, TLD, and authoritative DNS servers all belong to the hierarchy of DNS servers, as shown in Figure 2.17. There is another important type of DNS server called the local DNS server. A local DNS server does not strictly belong to the hierarchy of servers but is nevertheless central to the DNS architecture. Each ISP—such as a residential ISP or an institutional ISP—has a local DNS server (also called a default name server). When a host connects to an ISP, the ISP provides the host with the IP addresses of one or more of its local DNS servers (typically through DHCP, which is discussed in Chapter 4). You can easily determine the IP address of your local DNS server by accessing network status windows in Windows or UNIX. A host's local DNS server is typically "close to" the host. For an institutional ISP, the local DNS server may be on the same LAN as the host; for a residential ISP, it is typically separated from the host by no more than a few routers.
When a host makes a DNS query, the query is sent to the local DNS server, which acts as a proxy, forwarding the query into the DNS server hierarchy, as we'll discuss in more detail below.

Let's take a look at a simple example. Suppose the host cse.nyu.edu desires the IP address of gaia.cs.umass.edu. Also suppose that NYU's local DNS server for cse.nyu.edu is called dns.nyu.edu and that an authoritative DNS server for gaia.cs.umass.edu is called dns.umass.edu. As shown in Figure 2.19, the host cse.nyu.edu first sends a DNS query message to its local DNS server, dns.nyu.edu. The query message contains the hostname to be translated, namely, gaia.cs.umass.edu. The local DNS server forwards the query message to a root DNS server. The root DNS server takes note of the edu suffix and returns to the local DNS server a list of IP addresses for TLD servers responsible for edu. The local DNS server then resends the query message to one of these TLD servers. The TLD server takes note of the umass.edu suffix and responds with the IP address of the authoritative DNS server for the University of Massachusetts, namely, dns.umass.edu. Finally, the local DNS server resends the query message directly to dns.umass.edu, which responds with the IP address of gaia.cs.umass.edu. Note that in this example, in order to obtain the mapping for one hostname, eight DNS messages were sent: four query messages and four reply messages! We'll soon see how DNS caching reduces this query traffic.

Figure 2.19 Interaction of the various DNS servers

Our previous example assumed that the TLD server knows the authoritative DNS server for the hostname. In general, this is not always true. Instead, the TLD server may know only of an intermediate DNS server, which in turn knows the authoritative DNS server for the hostname. For example, suppose again that the University of Massachusetts has a DNS server for the university, called dns.umass.edu. Also suppose that each of the departments at the University of Massachusetts has its own DNS server, and that each departmental DNS server is authoritative for all hosts in the department. In this case, when the intermediate DNS server, dns.umass.edu, receives a query for a host with a hostname ending with cs.umass.edu, it returns to dns.nyu.edu the IP address of dns.cs.umass.edu, which is authoritative for all hostnames ending with cs.umass.edu. The local DNS server dns.nyu.edu then sends the query to the authoritative DNS server, which returns the desired mapping to the local DNS server, which in turn returns the mapping to the requesting host. In this case, a total of 10 DNS messages are sent!

The example shown in Figure 2.19 makes use of both recursive queries and iterative queries. The query sent from cse.nyu.edu to dns.nyu.edu is a recursive query, since the query asks dns.nyu.edu to obtain the mapping on its behalf. But the subsequent three queries are iterative since all of the replies are directly returned to dns.nyu.edu. In theory, any DNS query can be iterative or recursive. For example, Figure 2.20 shows a DNS query chain for which all of the queries are recursive. In practice, the queries typically follow the pattern in Figure 2.19: The query from the requesting host to the local DNS server is recursive, and the remaining queries are iterative.
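The iterative pattern of Figure 2.19 can be mimicked with a short script that plays the role of the local DNS server. The sketch below uses the third-party dnspython package (pip install dnspython) and is deliberately simplified: it assumes every referral carries a glue A record in the additional section, and it does not handle CNAMEs. 198.41.0.4 is a.root-servers.net, one of the root server addresses.

import dns.message
import dns.query
import dns.rdatatype

def iterative_lookup(hostname, server_ip):
    """Query server_ip for hostname, following referrals down the
    hierarchy until an answer arrives (each query is iterative)."""
    while True:
        query = dns.message.make_query(hostname, dns.rdatatype.A)
        reply = dns.query.udp(query, server_ip, timeout=3)
        if reply.answer:                       # an answer, not a referral
            return reply.answer[0][0].address  # first A record (no CNAMEs)
        # Otherwise it is a referral: pick a glue A record from the
        # additional section and re-ask the next server down the hierarchy.
        for rrset in reply.additional:
            if rrset.rdtype == dns.rdatatype.A:
                server_ip = rrset[0].address
                break
        else:
            raise RuntimeError("referral without glue; a full resolver "
                               "would now resolve the NS name itself")

print(iterative_lookup("gaia.cs.umass.edu", "198.41.0.4"))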
DNS Caching

Our discussion thus far has ignored DNS caching, a critically important feature of the DNS system. In truth, DNS extensively exploits DNS caching in order to improve the delay performance and to reduce the number of DNS messages ricocheting around the Internet.

Figure 2.20 Recursive queries in DNS

The idea behind DNS caching is very simple. In a query chain, when a DNS server receives a DNS reply (containing, for example, a mapping from a hostname to an IP address), it can cache the mapping in its local memory. For example, in Figure 2.19, each time the local DNS server dns.nyu.edu receives a reply from some DNS server, it can cache any of the information contained in the reply. If a hostname/IP address pair is cached in a DNS server and another query arrives to the DNS server for the same hostname, the DNS server can provide the desired IP address, even if it is not authoritative for the hostname. Because hosts and mappings between hostnames and IP addresses are by no means permanent, DNS servers discard cached information after a period of time (often set to two days).

As an example, suppose that a host apricot.nyu.edu queries dns.nyu.edu for the IP address for the hostname cnn.com. Furthermore, suppose that a few hours later, another NYU host, say, kiwi.nyu.edu, also queries dns.nyu.edu with the same hostname. Because of caching, the local DNS server will be able to immediately return the IP address of cnn.com to this second requesting host without having to query any other DNS servers. A local DNS server can also cache the IP addresses of TLD servers, thereby allowing the local DNS server to bypass the root DNS servers in a query chain. In fact, because of caching, root servers are bypassed for all but a very small fraction of DNS queries.

2.4.3 DNS Records and Messages

The DNS servers that together implement the DNS distributed database store resource records (RRs), including RRs that provide hostname-to-IP address mappings. Each DNS reply message carries one or more resource records. In this and the following subsection, we provide a brief overview of DNS resource records and messages; more details can be found in [Albitz 1993] or in the DNS RFCs [RFC 1034; RFC 1035].

A resource record is a four-tuple that contains the following fields:

(Name, Value, Type, TTL)

TTL is the time to live of the resource record; it determines when a resource should be removed from a cache. In the example records given below, we ignore the TTL field. The meaning of Name and Value depends on Type:

If Type=A, then Name is a hostname and Value is the IP address for the hostname. Thus, a Type A record provides the standard hostname-to-IP address mapping. As an example, (relay1.bar.foo.com, 145.37.93.126, A) is a Type A record.

If Type=NS, then Name is a domain (such as foo.com) and Value is the hostname of an authoritative DNS server that knows how to obtain the IP addresses for hosts in the domain. This record is used to route DNS queries further along in the query chain. As an example, (foo.com, dns.foo.com, NS) is a Type NS record.

If Type=CNAME, then Value is a canonical hostname for the alias hostname Name. This record can provide querying hosts the canonical name for a hostname. As an example, (foo.com, relay1.bar.foo.com, CNAME) is a CNAME record.

If Type=MX, then Value is the canonical name of a mail server that has an alias hostname Name. As an example, (foo.com, mail.bar.foo.com, MX) is an MX record. MX records allow the hostnames of mail servers to have simple aliases.
Note that by using the MX record, a company can have the same aliased name for its mail server and for one of its other servers (such as its Web server). To obtain the canonical name for the mail server, a DNS client would query for an MX record; to obtain the canonical name for the other server, the DNS client would query for the CNAME record.

If a DNS server is authoritative for a particular hostname, then the DNS server will contain a Type A record for the hostname. (Even if the DNS server is not authoritative, it may contain a Type A record in its cache.) If a server is not authoritative for a hostname, then the server will contain a Type NS record for the domain that includes the hostname; it will also contain a Type A record that provides the IP address of the DNS server in the Value field of the NS record. As an example, suppose an edu TLD server is not authoritative for the host gaia.cs.umass.edu. Then this server will contain a record for a domain that includes the host gaia.cs.umass.edu, for example, (umass.edu, dns.umass.edu, NS). The edu TLD server would also contain a Type A record, which maps the DNS server dns.umass.edu to an IP address, for example, (dns.umass.edu, 128.119.40.111, A).

DNS Messages

Earlier in this section, we referred to DNS query and reply messages. These are the only two kinds of DNS messages. Furthermore, both query and reply messages have the same format, as shown in Figure 2.21. The semantics of the various fields in a DNS message are as follows:

Figure 2.21 DNS message format

The first 12 bytes is the header section, which has a number of fields. The first field is a 16-bit number that identifies the query. This identifier is copied into the reply message to a query, allowing the client to match received replies with sent queries. There are a number of flags in the flag field. A 1-bit query/reply flag indicates whether the message is a query (0) or a reply (1). A 1-bit authoritative flag is set in a reply message when a DNS server is an authoritative server for a queried name. A 1-bit recursion-desired flag is set when a client (host or DNS server) desires that the DNS server perform recursion when it doesn't have the record. A 1-bit recursion-available field is set in a reply if the DNS server supports recursion. In the header, there are also four number-of fields. These fields indicate the number of occurrences of the four types of data sections that follow the header.

The question section contains information about the query that is being made. This section includes (1) a name field that contains the name that is being queried, and (2) a type field that indicates the type of question being asked about the name—for example, a host address associated with a name (Type A) or the mail server for a name (Type MX).

In a reply from a DNS server, the answer section contains the resource records for the name that was originally queried. Recall that in each resource record there is the Type (for example, A, NS, CNAME, and MX), the Value, and the TTL. A reply can return multiple RRs in the answer, since a hostname can have multiple IP addresses (for example, for replicated Web servers, as discussed earlier in this section).

The authority section contains records of other authoritative servers.

The additional section contains other helpful records. For example, the answer field in a reply to an MX query contains a resource record providing the canonical hostname of a mail server.
The additional section contains a Type A record providing the IP address for the canonical hostname of the mail server.

How would you like to send a DNS query message directly from the host you're working on to some DNS server? This can easily be done with the nslookup program, which is available from most Windows and UNIX platforms. For example, from a Windows host, open the Command Prompt and invoke the nslookup program by simply typing "nslookup." After invoking nslookup, you can send a DNS query to any DNS server (root, TLD, or authoritative). After receiving the reply message from the DNS server, nslookup will display the records included in the reply (in a human-readable format). As an alternative to running nslookup from your own host, you can visit one of many Web sites that allow you to remotely employ nslookup. (Just type "nslookup" into a search engine and you'll be brought to one of these sites.) The DNS Wireshark lab at the end of this chapter will allow you to explore the DNS in much more detail.

Inserting Records into the DNS Database

The discussion above focused on how records are retrieved from the DNS database. You might be wondering how records get into the database in the first place. Let's look at how this is done in the context of a specific example. Suppose you have just created an exciting new startup company called Network Utopia. The first thing you'll surely want to do is register the domain name networkutopia.com at a registrar. A registrar is a commercial entity that verifies the uniqueness of the domain name, enters the domain name into the DNS database (as discussed below), and collects a small fee from you for its services. Prior to 1999, a single registrar, Network Solutions, had a monopoly on domain name registration for com, net, and org domains. But now there are many registrars competing for customers, and the Internet Corporation for Assigned Names and Numbers (ICANN) accredits the various registrars. A complete list of accredited registrars is available at http://www.internic.net.

When you register the domain name networkutopia.com with some registrar, you also need to provide the registrar with the names and IP addresses of your primary and secondary authoritative DNS servers. Suppose the names and IP addresses are dns1.networkutopia.com, dns2.networkutopia.com, 212.212.212.1, and 212.212.212.2. For each of these two authoritative DNS servers, the registrar would then make sure that a Type NS and a Type A record are entered into the TLD com servers. Specifically, for the primary authoritative server for networkutopia.com, the registrar would insert the following two resource records into the DNS system:

(networkutopia.com, dns1.networkutopia.com, NS)
(dns1.networkutopia.com, 212.212.212.1, A)

You'll also have to make sure that the Type A resource record for your Web server www.networkutopia.com and the Type MX resource record for your mail server mail.networkutopia.com are entered into your authoritative DNS servers. (Until recently, the contents of each DNS server were configured statically, for example, from a configuration file created by a system manager. More recently, an UPDATE option has been added to the DNS protocol to allow data to be dynamically added or deleted from the database via DNS messages. [RFC 2136] and [RFC 3007] specify DNS dynamic updates.)
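Once the records are in place, it is easy to check what the DNS returns for them. The following sketch uses the third-party dnspython package (pip install dnspython); networkutopia.com is of course the chapter's hypothetical domain, so substitute a registered one when trying this.

import dns.resolver

domain = "networkutopia.com"  # hypothetical; substitute a real domain

for rtype in ("NS", "A", "MX"):
    try:
        answer = dns.resolver.resolve(domain, rtype)
        for rr in answer:  # each record prints in zone-file form
            print(rtype, rr, "TTL =", answer.rrset.ttl)
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        print(rtype, "no record found")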
FOCUS ON SECURITY

DNS VULNERABILITIES

We have seen that DNS is a critical component of the Internet infrastructure, with many important services—including the Web and e-mail—simply incapable of functioning without it. We therefore naturally ask, how can DNS be attacked? Is DNS a sitting duck, waiting to be knocked out of service, while taking most Internet applications down with it?

The first type of attack that comes to mind is a DDoS bandwidth-flooding attack (see Section 1.6) against DNS servers. For example, an attacker could attempt to send to each DNS root server a deluge of packets, so many that the majority of legitimate DNS queries never get answered. Such a large-scale DDoS attack against DNS root servers actually took place on October 21, 2002. In this attack, the attackers leveraged a botnet to send truckloads of ICMP ping messages to each of the 13 DNS root IP addresses. (ICMP messages are discussed in Section 5.6. For now, it suffices to know that ICMP packets are special types of IP datagrams.) Fortunately, this large-scale attack caused minimal damage, having little or no impact on users' Internet experience. The attackers did succeed at directing a deluge of packets at the root servers. But many of the DNS root servers were protected by packet filters, configured to always block all ICMP ping messages directed at the root servers. These protected servers were thus spared and functioned as normal. Furthermore, most local DNS servers cache the IP addresses of top-level-domain servers, allowing the query process to often bypass the DNS root servers.

A potentially more effective DDoS attack against DNS would be to send a deluge of DNS queries to top-level-domain servers, for example, to all the top-level-domain servers that handle the .com domain. It would be harder to filter DNS queries directed to DNS servers; and top-level-domain servers are not as easily bypassed as are root servers. But the severity of such an attack would be partially mitigated by caching in local DNS servers.

DNS could potentially be attacked in other ways. In a man-in-the-middle attack, the attacker intercepts queries from hosts and returns bogus replies. In the DNS poisoning attack, the attacker sends bogus replies to a DNS server, tricking the server into accepting bogus records into its cache. Either of these attacks could be used, for example, to redirect an unsuspecting Web user to the attacker's Web site. These attacks, however, are difficult to implement, as they require intercepting packets or throttling servers [Skoudis 2006].

In summary, DNS has demonstrated itself to be surprisingly robust against attacks. To date, there hasn't been an attack that has successfully impeded the DNS service.

Once all of these steps are completed, people will be able to visit your Web site and send e-mail to the employees at your company. Let's conclude our discussion of DNS by verifying that this statement is true. This verification also helps to solidify what we have learned about DNS. Suppose Alice in Australia wants to view the Web page www.networkutopia.com. As discussed earlier, her host will first send a DNS query to her local DNS server. The local DNS server will then contact a TLD com server. (The local DNS server will also have to contact a root DNS server if the address of a TLD com server is not cached.) This TLD server contains the Type NS and Type A resource records listed above, because the registrar had these resource records inserted into all of the TLD com servers.
The TLD com server sends a reply to Alice's local DNS server, with the reply containing the two resource records. The local DNS server then sends a DNS query to 212.212.212.1, asking for the Type A record corresponding to www.networkutopia.com. This record provides the IP address of the desired Web server, say, 212.212.71.4, which the local DNS server passes back to Alice's host. Alice's browser can now initiate a TCP connection to the host 212.212.71.4 and send an HTTP request over the connection. Whew! There's a lot more going on than what meets the eye when one surfs the Web!

2.5 Peer-to-Peer File Distribution

The applications described in this chapter thus far—including the Web, e-mail, and DNS—all employ client-server architectures with significant reliance on always-on infrastructure servers. Recall from Section 2.1.1 that with a P2P architecture, there is minimal (or no) reliance on always-on infrastructure servers. Instead, pairs of intermittently connected hosts, called peers, communicate directly with each other. The peers are not owned by a service provider, but are instead desktops and laptops controlled by users.

In this section we consider a very natural P2P application, namely, distributing a large file from a single server to a large number of hosts (called peers). The file might be a new version of the Linux operating system, a software patch for an existing operating system or application, an MP3 music file, or an MPEG video file. In client-server file distribution, the server must send a copy of the file to each of the peers—placing an enormous burden on the server and consuming a large amount of server bandwidth. In P2P file distribution, each peer can redistribute any portion of the file it has received to any other peers, thereby assisting the server in the distribution process. As of 2016, the most popular P2P file distribution protocol is BitTorrent. Originally developed by Bram Cohen, BitTorrent now has many different independent clients conforming to its protocol, just as there are a number of Web browser clients that conform to the HTTP protocol. In this subsection, we first examine the self-scalability of P2P architectures in the context of file distribution. We then describe BitTorrent in some detail, highlighting its most important characteristics and features.

Scalability of P2P Architectures

To compare client-server architectures with peer-to-peer architectures, and illustrate the inherent self-scalability of P2P, we now consider a simple quantitative model for distributing a file to a fixed set of peers for both architecture types. As shown in Figure 2.22, the server and the peers are connected to the Internet with access links. Denote the upload rate of the server's access link by us, the upload rate of the ith peer's access link by ui, and the download rate of the ith peer's access link by di. Also denote the size of the file to be distributed (in bits) by F and the number of peers that want to obtain a copy of the file by N. The distribution time is the time it takes to get a copy of the file to all N peers.

Figure 2.22 An illustrative file distribution problem

In our analysis of the distribution time below, for both client-server and P2P architectures, we make the simplifying (and generally accurate [Akella 2003]) assumption that the Internet core has abundant bandwidth, implying that all of the bottlenecks are in access networks.
We also suppose that the server and clients are not participating in any other network applications, so that all of their upload and download access bandwidth can be fully devoted to distributing this file.

Let's first determine the distribution time for the client-server architecture, which we denote by Dcs. In the client-server architecture, none of the peers aids in distributing the file. We make the following observations:

The server must transmit one copy of the file to each of the N peers. Thus the server must transmit NF bits. Since the server's upload rate is us, the time to distribute the file must be at least NF/us.

Let dmin denote the download rate of the peer with the lowest download rate, that is, dmin = min{d1, d2, ..., dN}. The peer with the lowest download rate cannot obtain all F bits of the file in less than F/dmin seconds. Thus the minimum distribution time is at least F/dmin.

Putting these two observations together, we obtain

Dcs ≥ max{NF/us, F/dmin}

This provides a lower bound on the minimum distribution time for the client-server architecture. In the homework problems you will be asked to show that the server can schedule its transmissions so that the lower bound is actually achieved. So let's take this lower bound provided above as the actual distribution time, that is,

Dcs = max{NF/us, F/dmin}    (2.1)

We see from Equation 2.1 that for N large enough, the client-server distribution time is given by NF/us. Thus, the distribution time increases linearly with the number of peers N. So, for example, if the number of peers from one week to the next increases a thousand-fold from a thousand to a million, the time required to distribute the file to all peers increases by a factor of 1,000.

Let's now go through a similar analysis for the P2P architecture, where each peer can assist the server in distributing the file. In particular, when a peer receives some file data, it can use its own upload capacity to redistribute the data to other peers. Calculating the distribution time for the P2P architecture is somewhat more complicated than for the client-server architecture, since the distribution time depends on how each peer distributes portions of the file to the other peers. Nevertheless, a simple expression for the minimal distribution time can be obtained [Kumar 2006]. To this end, we first make the following observations:

At the beginning of the distribution, only the server has the file. To get this file into the community of peers, the server must send each bit of the file at least once into its access link. Thus, the minimum distribution time is at least F/us. (Unlike the client-server scheme, a bit sent once by the server may not have to be sent by the server again, as the peers may redistribute the bit among themselves.)

As with the client-server architecture, the peer with the lowest download rate cannot obtain all F bits of the file in less than F/dmin seconds. Thus the minimum distribution time is at least F/dmin.

Finally, observe that the total upload capacity of the system as a whole is equal to the upload rate of the server plus the upload rates of each of the individual peers, that is, utotal = us + u1 + ⋯ + uN. The system must deliver (upload) F bits to each of the N peers, thus delivering a total of NF bits. This cannot be done at a rate faster than utotal. Thus, the minimum distribution time is also at least NF/(us + u1 + ⋯ + uN).

Putting these three observations together, we obtain the minimum distribution time for P2P, denoted by DP2P.
DP2P ≥ max{F/us, F/dmin, NF/(us + u1 + ⋯ + uN)}    (2.2)

Equation 2.2 provides a lower bound for the minimum distribution time for the P2P architecture. It turns out that if we imagine that each peer can redistribute a bit as soon as it receives the bit, then there is a redistribution scheme that actually achieves this lower bound [Kumar 2006]. (We will prove a special case of this result in the homework.) In reality, where chunks of the file are redistributed rather than individual bits, Equation 2.2 serves as a good approximation of the actual minimum distribution time. Thus, let's take the lower bound provided by Equation 2.2 as the actual minimum distribution time, that is,

DP2P = max{F/us, F/dmin, NF/(us + u1 + ⋯ + uN)}    (2.3)

Figure 2.23 compares the minimum distribution time for the client-server and P2P architectures assuming that all peers have the same upload rate u. In Figure 2.23, we have set F/u = 1 hour, us = 10u, and dmin ≥ us. Thus, a peer can transmit the entire file in one hour, the server transmission rate is 10 times the peer upload rate, and (for simplicity) the peer download rates are set large enough so as not to have an effect.

Figure 2.23 Distribution time for P2P and client-server architectures

We see from Figure 2.23 that for the client-server architecture, the distribution time increases linearly and without bound as the number of peers increases. However, for the P2P architecture, the minimal distribution time is not only always less than the distribution time of the client-server architecture; it is also less than one hour for any number of peers N. Thus, applications with the P2P architecture can be self-scaling. This scalability is a direct consequence of peers being redistributors as well as consumers of bits.
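Equations 2.1 and 2.3 are easy to experiment with numerically. The short sketch below reproduces the setup behind Figure 2.23 (identical peer upload rates u, F/u = 1 hour, us = 10u, and download rates too large to matter); it is an illustration of the formulas, not a simulation of an actual protocol.

# Distribution times from Equations 2.1 and 2.3 under the Figure 2.23
# assumptions: every peer uploads at rate u, F/u = 1 hour, us = 10u,
# and dmin is large enough never to be the bottleneck.
F = 1.0          # file size, in units where F/u = 1 hour
u = 1.0          # common peer upload rate
us = 10 * u      # server upload rate
dmin = 1e9       # effectively infinite download rate

def d_cs(n):
    """Client-server distribution time, Equation 2.1."""
    return max(n * F / us, F / dmin)

def d_p2p(n):
    """P2P minimum distribution time, Equation 2.3."""
    return max(F / us, F / dmin, n * F / (us + n * u))

for n in (1, 10, 100, 1000):
    print(f"N={n:5d}  Dcs={d_cs(n):8.2f} h  DP2P={d_p2p(n):5.2f} h")

As N grows, d_cs grows linearly while d_p2p approaches, but never exceeds, F/u = 1 hour, exactly the behavior plotted in Figure 2.23.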
BitTorrent

BitTorrent is a popular P2P protocol for file distribution [Chao 2011]. In BitTorrent lingo, the collection of all peers participating in the distribution of a particular file is called a torrent. Peers in a torrent download equal-size chunks of the file from one another, with a typical chunk size of 256 KBytes. When a peer first joins a torrent, it has no chunks. Over time it accumulates more and more chunks. While it downloads chunks it also uploads chunks to other peers. Once a peer has acquired the entire file, it may (selfishly) leave the torrent, or (altruistically) remain in the torrent and continue to upload chunks to other peers. Also, any peer may leave the torrent at any time with only a subset of chunks, and later rejoin the torrent.

Let's now take a closer look at how BitTorrent operates. Since BitTorrent is a rather complicated protocol and system, we'll only describe its most important mechanisms, sweeping some of the details under the rug; this will allow us to see the forest through the trees. Each torrent has an infrastructure node called a tracker.

Figure 2.24 File distribution with BitTorrent

When a peer joins a torrent, it registers itself with the tracker and periodically informs the tracker that it is still in the torrent. In this manner, the tracker keeps track of the peers that are participating in the torrent. A given torrent may have fewer than ten or more than a thousand peers participating at any instant of time.

As shown in Figure 2.24, when a new peer, Alice, joins the torrent, the tracker randomly selects a subset of peers (for concreteness, say 50) from the set of participating peers, and sends the IP addresses of these 50 peers to Alice. Possessing this list of peers, Alice attempts to establish concurrent TCP connections with all the peers on this list. Let's call all the peers with which Alice succeeds in establishing a TCP connection "neighboring peers." (In Figure 2.24, Alice is shown to have only three neighboring peers. Normally, she would have many more.) As time evolves, some of these peers may leave and other peers (outside the initial 50) may attempt to establish TCP connections with Alice. So a peer's neighboring peers will fluctuate over time.

At any given time, each peer will have a subset of chunks from the file, with different peers having different subsets. Periodically, Alice will ask each of her neighboring peers (over the TCP connections) for the list of the chunks they have. If Alice has L different neighbors, she will obtain L lists of chunks. With this knowledge, Alice will issue requests (again over the TCP connections) for chunks she currently does not have.

So at any given instant of time, Alice will have a subset of chunks and will know which chunks her neighbors have. With this information, Alice will have two important decisions to make. First, which chunks should she request first from her neighbors? And second, to which of her neighbors should she send requested chunks? In deciding which chunks to request, Alice uses a technique called rarest first. The idea is to determine, from among the chunks she does not have, the chunks that are the rarest among her neighbors (that is, the chunks that have the fewest repeated copies among her neighbors) and then request those rarest chunks first. In this manner, the rarest chunks get more quickly redistributed, aiming to (roughly) equalize the numbers of copies of each chunk in the torrent.

To determine which requests she responds to, BitTorrent uses a clever trading algorithm. The basic idea is that Alice gives priority to the neighbors that are currently supplying her data at the highest rate. Specifically, for each of her neighbors, Alice continually measures the rate at which she receives bits and determines the four peers that are feeding her bits at the highest rate. She then reciprocates by sending chunks to these same four peers. Every 10 seconds, she recalculates the rates and possibly modifies the set of four peers. In BitTorrent lingo, these four peers are said to be unchoked. Importantly, every 30 seconds, she also picks one additional neighbor at random and sends it chunks. Let's call the randomly chosen peer Bob. In BitTorrent lingo, Bob is said to be optimistically unchoked. Because Alice is sending data to Bob, she may become one of Bob's top four uploaders, in which case Bob would start to send data to Alice. If the rate at which Bob sends data to Alice is high enough, Bob could then, in turn, become one of Alice's top four uploaders. In other words, every 30 seconds, Alice will randomly choose a new trading partner and initiate trading with that partner. If the two peers are satisfied with the trading, they will put each other in their top four lists and continue trading with each other until one of the peers finds a better partner. The effect is that peers capable of uploading at compatible rates tend to find each other. The random neighbor selection also allows new peers to get chunks, so that they can have something to trade. All other neighboring peers besides these five peers (four "top" peers and one probing peer) are "choked," that is, they do not receive any chunks from Alice.
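Alice's two decisions can be captured in a few lines of code. The sketch below is an illustrative simplification (the data structures are invented for the example; a real client tracks far more state): rarest_first picks the next chunk to request, and unchoke picks the four top uploaders plus one randomly chosen, optimistically unchoked neighbor.

import random
from collections import Counter

def rarest_first(my_chunks, neighbor_chunks):
    """Pick the chunk Alice lacks that is rarest among her neighbors.
    neighbor_chunks maps each neighbor to the set of chunks it holds."""
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks)
    candidates = [c for c in counts if c not in my_chunks]
    if not candidates:
        return None
    return min(candidates, key=lambda c: counts[c])

def unchoke(download_rates):
    """Return the four neighbors feeding Alice bits at the highest rate,
    plus one random 'optimistically unchoked' neighbor from the rest."""
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    chosen = ranked[:4]
    others = ranked[4:]
    if others:
        chosen.append(random.choice(others))
    return chosen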
BitTorrent has a number of interesting mechanisms that are not discussed here, including pieces (mini-chunks), pipelining, random first selection, endgame mode, and anti-snubbing [Cohen 2003].

The incentive mechanism for trading just described is often referred to as tit-for-tat [Cohen 2003]. It has been shown that this incentive scheme can be circumvented [Liogkas 2006; Locher 2006; Piatek 2007]. Nevertheless, the BitTorrent ecosystem is wildly successful, with millions of simultaneous peers actively sharing files in hundreds of thousands of torrents. If BitTorrent had been designed without tit-for-tat (or a variant), but otherwise exactly the same, BitTorrent would likely not even exist now, as the majority of the users would have been freeriders [Saroiu 2002].

We close our discussion on P2P by briefly mentioning another application of P2P, namely, the Distributed Hash Table (DHT). A distributed hash table is a simple database, with the database records being distributed over the peers in a P2P system. DHTs have been widely implemented (e.g., in BitTorrent) and have been the subject of extensive research. An overview is provided in a Video Note in the companion website.

Walking through distributed hash tables

2.6 Video Streaming and Content Distribution Networks

Streaming prerecorded video now accounts for the majority of the traffic in residential ISPs in North America. In particular, the Netflix and YouTube services alone consumed a whopping 37% and 16%, respectively, of residential ISP traffic in 2015 [Sandvine 2015]. In this section we will provide an overview of how popular video streaming services are implemented in today's Internet. We will see they are implemented using application-level protocols and servers that function in some ways like a cache. In Chapter 9, devoted to multimedia networking, we will further examine Internet video as well as other Internet multimedia services.

2.6.1 Internet Video

In streaming stored video applications, the underlying medium is prerecorded video, such as a movie, a television show, a prerecorded sporting event, or a prerecorded user-generated video (such as those commonly seen on YouTube). These prerecorded videos are placed on servers, and users send requests to the servers to view the videos on demand. Many Internet companies today provide streaming video, including Netflix, YouTube (Google), Amazon, and Youku.

But before launching into a discussion of video streaming, we should first get a quick feel for the video medium itself. A video is a sequence of images, typically being displayed at a constant rate, for example, at 24 or 30 images per second. An uncompressed, digitally encoded image consists of an array of pixels, with each pixel encoded into a number of bits to represent luminance and color. An important characteristic of video is that it can be compressed, thereby trading off video quality with bit rate. Today's off-the-shelf compression algorithms can compress a video to essentially any bit rate desired. Of course, the higher the bit rate, the better the image quality and the better the overall user viewing experience.

From a networking perspective, perhaps the most salient characteristic of video is its high bit rate. Compressed Internet video typically ranges from 100 kbps for low-quality video to over 3 Mbps for streaming high-definition movies; 4K streaming envisions a bitrate of more than 10 Mbps. This can translate to a huge amount of traffic and storage, particularly for high-end video.
For example, a single 2 Mbps video with a duration of 67 minutes will consume 1 gigabyte of storage and traffic. By far, the most important performance measure for streaming video is average end-to-end throughput. In order to provide continuous playout, the network must provide an average throughput to the streaming application that is at least as large as the bit rate of the compressed video.

We can also use compression to create multiple versions of the same video, each at a different quality level. For example, we can use compression to create, say, three versions of the same video, at rates of 300 kbps, 1 Mbps, and 3 Mbps. Users can then decide which version they want to watch as a function of their current available bandwidth. Users with high-speed Internet connections might choose the 3 Mbps version; users watching the video over 3G with a smartphone might choose the 300 kbps version.

2.6.2 HTTP Streaming and DASH

In HTTP streaming, the video is simply stored at an HTTP server as an ordinary file with a specific URL. When a user wants to see the video, the client establishes a TCP connection with the server and issues an HTTP GET request for that URL. The server then sends the video file, within an HTTP response message, as quickly as the underlying network protocols and traffic conditions will allow. On the client side, the bytes are collected in a client application buffer. Once the number of bytes in this buffer exceeds a predetermined threshold, the client application begins playback—specifically, the streaming video application periodically grabs video frames from the client application buffer, decompresses the frames, and displays them on the user's screen. Thus, the video streaming application is displaying video as it is receiving and buffering frames corresponding to latter parts of the video.

Although HTTP streaming, as described in the previous paragraph, has been extensively deployed in practice (for example, by YouTube since its inception), it has a major shortcoming: All clients receive the same encoding of the video, despite the large variations in the amount of bandwidth available to a client, both across different clients and also over time for the same client. This has led to the development of a new type of HTTP-based streaming, often referred to as Dynamic Adaptive Streaming over HTTP (DASH). In DASH, the video is encoded into several different versions, with each version having a different bit rate and, correspondingly, a different quality level. The client dynamically requests chunks of video segments of a few seconds in length. When the amount of available bandwidth is high, the client naturally selects chunks from a high-rate version; and when the available bandwidth is low, it naturally selects from a low-rate version. The client selects different chunks one at a time with HTTP GET request messages [Akhshabi 2011].

DASH allows clients with different Internet access rates to stream video at different encoding rates. Clients with low-speed 3G connections can receive a low bit-rate (and low-quality) version, and clients with fiber connections can receive a high-quality version. DASH also allows a client to adapt to the available bandwidth if the available end-to-end bandwidth changes during the session. This feature is particularly important for mobile users, who typically see their bandwidth availability fluctuate as they move with respect to the base stations.
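The version-selection rule just described is easy to sketch. The following is a minimal illustration, not any particular player's algorithm; the bitrate ladder and the 80% safety factor are invented for the example, and real players also weigh buffer occupancy, as discussed next.

# A bare-bones DASH rate selection rule: request the next chunk from
# the highest-rate version the measured bandwidth can sustain.
VERSIONS_BPS = [300_000, 1_000_000, 3_000_000]  # 300 kbps, 1 Mbps, 3 Mbps

def choose_version(measured_bandwidth_bps, safety=0.8):
    """Pick the highest encoding rate below a fraction of the measured
    bandwidth; fall back to the lowest version otherwise."""
    affordable = [r for r in VERSIONS_BPS
                  if r <= safety * measured_bandwidth_bps]
    return max(affordable) if affordable else min(VERSIONS_BPS)

print(choose_version(2_500_000))  # -> 1000000, i.e., the 1 Mbps version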
With DASH, each video version is stored in the HTTP server, each with a different URL. The HTTP server also has a manifest file, which provides a URL for each version along with its bit rate. The client first requests the manifest file and learns about the various versions. The client then selects one chunk at a time by specifying a URL and a byte range in an HTTP GET request message for each chunk. While downloading chunks, the client also measures the received bandwidth and runs a rate determination algorithm to select the chunk to request next. Naturally, if the client has a lot of video buffered and if the measured receive bandwidth is high, it will choose a chunk from a high-bitrate version. And naturally if the client has little video buffered and the measured received bandwidth is low, it will choose a chunk from a low-bitrate version. DASH therefore allows the client to freely switch among different quality levels.

2.6.3 Content Distribution Networks

Today, many Internet video companies are distributing on-demand multi-Mbps streams to millions of users on a daily basis. YouTube, for example, with a library of hundreds of millions of videos, distributes hundreds of millions of video streams to users around the world every day. Streaming all this traffic to locations all over the world while providing continuous playout and high interactivity is clearly a challenging task.

For an Internet video company, perhaps the most straightforward approach to providing streaming video service is to build a single massive data center, store all of its videos in the data center, and stream the videos directly from the data center to clients worldwide. But there are three major problems with this approach. First, if the client is far from the data center, server-to-client packets will cross many communication links and likely pass through many ISPs, with some of the ISPs possibly located on different continents. If one of these links provides a throughput that is less than the video consumption rate, the end-to-end throughput will also be below the consumption rate, resulting in annoying freezing delays for the user. (Recall from Chapter 1 that the end-to-end throughput of a stream is governed by the throughput at the bottleneck link.) The likelihood of this happening increases as the number of links in the end-to-end path increases. A second drawback is that a popular video will likely be sent many times over the same communication links. Not only does this waste network bandwidth, but the Internet video company itself will be paying its provider ISP (connected to the data center) for sending the same bytes into the Internet over and over again. A third problem with this solution is that a single data center represents a single point of failure—if the data center or its links to the Internet go down, it would not be able to distribute any video streams.

In order to meet the challenge of distributing massive amounts of video data to users distributed around the world, almost all major video-streaming companies make use of Content Distribution Networks (CDNs). A CDN manages servers in multiple geographically distributed locations, stores copies of the videos (and other types of Web content, including documents, images, and audio) in its servers, and attempts to direct each user request to a CDN location that will provide the best user experience.
The CDN may be a private CDN, that is, owned by the content provider itself; for example, Google's CDN distributes YouTube videos and other types of content. The CDN may alternatively be a third-party CDN that distributes content on behalf of multiple content providers; Akamai, Limelight, and Level-3 all operate third-party CDNs. A very readable overview of modern CDNs is [Leighton 2009; Nygren 2010].

CDNs typically adopt one of two different server placement philosophies [Huang 2008]:

Enter Deep. One philosophy, pioneered by Akamai, is to enter deep into the access networks of Internet Service Providers, by deploying server clusters in access ISPs all over the world. (Access networks are described in Section 1.3.) Akamai takes this approach with clusters in approximately 1,700 locations. The goal is to get close to end users, thereby improving user-perceived delay and throughput by decreasing the number of links and routers between the end user and the CDN server from which it receives content. Because of this highly distributed design, the task of maintaining and managing the clusters becomes challenging.

Bring Home. A second design philosophy, taken by Limelight and many other CDN companies, is to bring the ISPs home by building large clusters at a smaller number (for example, tens) of sites. Instead of getting inside the access ISPs, these CDNs typically place their clusters in Internet Exchange Points (IXPs) (see Section 1.3). Compared with the enter-deep design philosophy, the bring-home design typically results in lower maintenance and management overhead, possibly at the expense of higher delay and lower throughput to end users.

Once its clusters are in place, the CDN replicates content across its clusters. The CDN may not want to place a copy of every video in each cluster, since some videos are rarely viewed or are only popular in some countries. In fact, many CDNs do not push videos to their clusters but instead use a simple pull strategy: If a client requests a video from a cluster that is not storing the video, then the cluster retrieves the video (from a central repository or from another cluster) and stores a copy locally while streaming the video to the client at the same time. Similar to Web caching (see Section 2.2.5), when a cluster's storage becomes full, it removes videos that are not frequently requested.

CASE STUDY

GOOGLE'S NETWORK INFRASTRUCTURE

To support its vast array of cloud services—including search, Gmail, calendar, YouTube video, maps, documents, and social networks—Google has deployed an extensive private network and CDN infrastructure. Google's CDN infrastructure has three tiers of server clusters:

Fourteen "mega data centers," with eight in North America, four in Europe, and two in Asia [Google Locations 2016], with each data center having on the order of 100,000 servers. These mega data centers are responsible for serving dynamic (and often personalized) content, including search results and Gmail messages.

An estimated 50 clusters in IXPs scattered throughout the world, with each cluster consisting of on the order of 100–500 servers [Adhikari 2011a]. These clusters are responsible for serving static content, including YouTube videos [Adhikari 2011a].

Many hundreds of "enter-deep" clusters located within an access ISP. Here a cluster typically consists of tens of servers within a single rack. These enter-deep servers perform TCP splitting (see Section 3.7) and serve static content [Chen 2011], including the static portions of Web pages that embody search results.

All of these data centers and cluster locations are networked together with Google's own private network. When a user makes a search query, often the query is first sent over the local ISP to a nearby enter-deep cache, from where the static content is retrieved; while providing the static content to the client, the nearby cache also forwards the query over Google's private network to one of the mega data centers, from where the personalized search results are retrieved. For a YouTube video, the video itself may come from one of the bring-home caches, whereas portions of the Web page surrounding the video may come from the nearby enter-deep cache, and the advertisements surrounding the video come from the data centers. In summary, except for the local ISPs, the Google cloud services are largely provided by a network infrastructure that is independent of the public Internet.

CDN Operation

Having identified the two major approaches toward deploying a CDN, let's now dive down into the nuts and bolts of how a CDN operates. When a browser in a user's host is instructed to retrieve a specific video (identified by a URL), the CDN must intercept the request so that it can (1) determine a suitable CDN server cluster for that client at that time, and (2) redirect the client's request to a server in that cluster. We'll shortly discuss how a CDN can determine a suitable cluster. But first let's examine the mechanics behind intercepting and redirecting a request.

Most CDNs take advantage of DNS to intercept and redirect requests; an interesting discussion of such a use of the DNS is [Vixie 2009]. Let's consider a simple example to illustrate how the DNS is typically involved. Suppose a content provider, NetCinema, employs the third-party CDN company, KingCDN, to distribute its videos to its customers. On the NetCinema Web pages, each of its videos is assigned a URL that includes the string "video" and a unique identifier for the video itself; for example, Transformers 7 might be assigned http://video.netcinema.com/6Y7B23V. Six steps then occur, as shown in Figure 2.25:

1. The user visits the Web page at NetCinema.

2. When the user clicks on the link http://video.netcinema.com/6Y7B23V, the user's host sends a DNS query for video.netcinema.com.

3. The user's Local DNS Server (LDNS) relays the DNS query to an authoritative DNS server for NetCinema, which observes the string "video" in the hostname video.netcinema.com. To "hand over" the DNS query to KingCDN, instead of returning an IP address, the NetCinema authoritative DNS server returns to the LDNS a hostname in the KingCDN's domain, for example, a1105.kingcdn.com.

4. From this point on, the DNS query enters into KingCDN's private DNS infrastructure. The user's LDNS then sends a second query, now for a1105.kingcdn.com, and KingCDN's DNS system eventually returns the IP addresses of a KingCDN content server to the LDNS. It is thus here, within the KingCDN's DNS system, that the CDN server from which the client will receive its content is specified.

5. The LDNS forwards the IP address of the content-serving CDN node to the user's host.

6. Once the client receives the IP address for a KingCDN content server, it establishes a direct TCP connection with the server at that IP address and issues an HTTP GET request for the video. If DASH is used, the server will first send to the client a manifest file with a list of URLs, one for each version of the video, and the client will dynamically select chunks from the different versions.

Figure 2.25 DNS redirects a user's request to a CDN server
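The hand-off in steps 3 and 4 relies on ordinary DNS machinery: the content provider's authoritative server answers video hostnames with a CNAME-style referral into the CDN's domain rather than an address record. The toy function below, reusing the hypothetical hostnames of the example above (the NetCinema address is invented), models only that decision; it is not a real DNS implementation.

def netcinema_authoritative_lookup(hostname):
    """Toy model of step 3: hand video hostnames over to KingCDN, and
    answer everything else with NetCinema's own server address."""
    if hostname.startswith('video.'):
        # Referral into the CDN's domain; KingCDN's DNS then picks the cluster.
        return ('CNAME', 'a1105.kingcdn.com')
    return ('A', '203.0.113.10')  # illustrative NetCinema Web server address

print(netcinema_authoritative_lookup('video.netcinema.com'))  # ('CNAME', 'a1105.kingcdn.com')
print(netcinema_authoritative_lookup('www.netcinema.com'))    # ('A', '203.0.113.10')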
Cluster Selection Strategies

At the core of any CDN deployment is a cluster selection strategy, that is, a mechanism for dynamically directing clients to a server cluster or a data center within the CDN. As we just saw, the CDN learns the IP address of the client's LDNS server via the client's DNS lookup. After learning this IP address, the CDN needs to select an appropriate cluster based on this IP address. CDNs generally employ proprietary cluster selection strategies. We now briefly survey a few approaches, each of which has its own advantages and disadvantages.

One simple strategy is to assign the client to the cluster that is geographically closest. Using commercial geo-location databases (such as Quova [Quova 2016] and MaxMind [MaxMind 2016]), each LDNS IP address is mapped to a geographic location. When a DNS request is received from a particular LDNS, the CDN chooses the geographically closest cluster, that is, the cluster that is the fewest kilometers from the LDNS "as the bird flies." Such a solution can work reasonably well for a large fraction of the clients [Agarwal 2009]. However, for some clients, the solution may perform poorly, since the geographically closest cluster may not be the closest cluster in terms of the length or number of hops of the network path. Furthermore, a problem inherent with all DNS-based approaches is that some end users are configured to use remotely located LDNSs [Shaikh 2001; Mao 2002], in which case the LDNS location may be far from the client's location. Moreover, this simple strategy ignores the variation in delay and available bandwidth over time of Internet paths, always assigning the same cluster to a particular client.

In order to determine the best cluster for a client based on the current traffic conditions, CDNs can instead perform periodic real-time measurements of delay and loss performance between their clusters and clients. For instance, a CDN can have each of its clusters periodically send probes (for example, ping messages or DNS queries) to all of the LDNSs around the world. One drawback of this approach is that many LDNSs are configured to not respond to such probes.
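As a toy illustration of the geographically-closest rule, the sketch below picks the cluster with the smallest great-circle ("as the bird flies") distance to the location that a geo-location database reports for an LDNS. The cluster sites and coordinates are invented for the example.

import math

# Hypothetical cluster sites: name -> (latitude, longitude) in degrees.
CLUSTERS = {'new-york': (40.7, -74.0), 'frankfurt': (50.1, 8.7), 'tokyo': (35.7, 139.7)}

def great_circle_km(a, b):
    """Haversine distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, a + b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))  # Earth radius is about 6371 km

def closest_cluster(ldns_location):
    """Return the cluster geographically closest to the LDNS's mapped location."""
    return min(CLUSTERS, key=lambda name: great_circle_km(CLUSTERS[name], ldns_location))

print(closest_cluster((48.9, 2.4)))  # an LDNS mapped near Paris -> 'frankfurt'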
2.6.4 Case Studies: Netflix, YouTube, and Kankan

We conclude our discussion of streaming stored video by taking a look at three highly successful large-scale deployments: Netflix, YouTube, and Kankan. We'll see that each of these systems takes a very different approach, yet employs many of the underlying principles discussed in this section.

Netflix

Generating 37% of the downstream traffic in residential ISPs in North America in 2015, Netflix has become the leading service provider for online movies and TV series in the United States [Sandvine 2015]. As we discuss below, Netflix video distribution has two major components: the Amazon cloud and its own private CDN infrastructure.

Netflix has a Web site that handles numerous functions, including user registration and login, billing, a movie catalogue for browsing and searching, and a movie recommendation system. As shown in Figure 2.26, this Web site (and its associated backend databases) runs entirely on Amazon servers in the Amazon cloud. Additionally, the Amazon cloud handles the following critical functions:

Content ingestion. Before Netflix can distribute a movie to its customers, it must first ingest and process the movie. Netflix receives studio master versions of movies and uploads them to hosts in the Amazon cloud.

Content processing. The machines in the Amazon cloud create many different formats for each movie, suitable for a diverse array of client video players running on desktop computers, smartphones, and game consoles connected to televisions. A different version is created for each of these formats and at multiple bit rates, allowing for adaptive streaming over HTTP using DASH.

Uploading versions to its CDN. Once all of the versions of a movie have been created, the hosts in the Amazon cloud upload the versions to its CDN.

Figure 2.26 Netflix video streaming platform

When Netflix first rolled out its video streaming service in 2007, it employed three third-party CDN companies to distribute its video content. Netflix has since created its own private CDN, from which it now streams all of its videos. (Netflix still uses Akamai to distribute its Web pages, however.) To create its own CDN, Netflix has installed server racks both in IXPs and within residential ISPs themselves. Netflix currently has server racks in over 50 IXP locations; see [Netflix Open Connect 2016] for a current list of IXPs housing Netflix racks. There are also hundreds of ISP locations housing Netflix racks; also see [Netflix Open Connect 2016], where Netflix provides to potential ISP partners instructions about installing a (free) Netflix rack for their networks. Each server in the rack has several 10 Gbps Ethernet ports and over 100 terabytes of storage. The number of servers in a rack varies: IXP installations often have tens of servers and contain the entire Netflix streaming video library, including multiple versions of the videos to support DASH; ISP racks may have only one server and contain only the most popular videos. Netflix does not use pull-caching (Section 2.2.5) to populate its CDN servers in the IXPs and ISPs. Instead, Netflix distributes by pushing the videos to its CDN servers during off-peak hours. For those locations that cannot hold the entire library, Netflix pushes only the most popular videos, which are determined on a day-to-day basis. The Netflix CDN design is described in some detail in the YouTube videos [Netflix Video 1] and [Netflix Video 2].

Having described the components of the Netflix architecture, let's take a closer look at the interaction between the client and the various servers that are involved in movie delivery. As indicated earlier, the Web pages for browsing the Netflix video library are served from servers in the Amazon cloud. When a user selects a movie to play, the Netflix software, running in the Amazon cloud, first determines which of its CDN servers have copies of the movie. Among the servers that have the movie, the software then determines the "best" server for that client request. If the client is using a residential ISP that has a Netflix CDN server rack installed in that ISP, and this rack has a copy of the requested movie, then a server in this rack is typically selected. If not, a server at a nearby IXP is typically selected.

Once Netflix determines the CDN server that is to deliver the content, it sends the client the IP address of the specific server as well as a manifest file, which has the URLs for the different versions of the requested movie. The client and that CDN server then directly interact using a proprietary version of DASH. Specifically, as described in Section 2.6.2, the client uses the byte-range header in HTTP GET request messages to request chunks from the different versions of the movie. Netflix uses chunks that are approximately four seconds long [Adhikari 2012]. While the chunks are being downloaded, the client measures the received throughput and runs a rate-determination algorithm to determine the quality of the next chunk to request.

Netflix embodies many of the key principles discussed earlier in this section, including adaptive streaming and CDN distribution. However, because Netflix uses its own private CDN, which distributes only video (and not Web pages), Netflix has been able to simplify and tailor its CDN design. In particular, Netflix does not need to employ DNS redirect, as discussed in Section 2.6.3, to connect a particular client to a CDN server; instead, the Netflix software (running in the Amazon cloud) directly tells the client to use a particular CDN server. Furthermore, the Netflix CDN uses push caching rather than pull caching (Section 2.2.5): content is pushed into the servers at scheduled times at off-peak hours, rather than dynamically during cache misses.
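As noted above, the client fetches each chunk with an HTTP byte-range request. A minimal sketch of such a request, using Python's standard urllib against a hypothetical chunk URL, is shown below; the URL and offsets are invented, and the chunk size simply follows from the arithmetic that four seconds of a 1 Mbps encoding is about 500,000 bytes.

from urllib.request import Request, urlopen

# Hypothetical URL for one encoding of a movie (not a real endpoint).
VIDEO_URL = 'http://cdn.example.com/movie/1mbps.mp4'

def fetch_chunk(url, first_byte, last_byte):
    """Issue an HTTP GET for one chunk, using the byte-range header."""
    req = Request(url, headers={'Range': 'bytes=%d-%d' % (first_byte, last_byte)})
    with urlopen(req) as resp:  # a cooperating server answers 206 Partial Content
        return resp.read()

# Four seconds of a 1 Mbps version: 4 s x 1,000,000 bits/s / 8 = 500,000 bytes.
chunk = fetch_chunk(VIDEO_URL, 0, 499_999)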
YouTube

With 300 hours of video uploaded to YouTube every minute and several billion video views per day [YouTube 2016], YouTube is indisputably the world's largest video-sharing site. YouTube began its service in April 2005 and was acquired by Google in November 2006. Although the Google/YouTube design and protocols are proprietary, through several independent measurement efforts we can gain a basic understanding about how YouTube operates [Zink 2009; Torres 2011; Adhikari 2011a]. As with Netflix, YouTube makes extensive use of CDN technology to distribute its videos [Torres 2011]. Similar to Netflix, Google uses its own private CDN to distribute YouTube videos, and has installed server clusters in many hundreds of different IXP and ISP locations. From these locations and directly from its huge data centers, Google distributes YouTube videos [Adhikari 2011a]. Unlike Netflix, however, Google uses pull caching, as described in Section 2.2.5, and DNS redirect, as described in Section 2.6.3. Most of the time, Google's cluster-selection strategy directs the client to the cluster for which the RTT between client and cluster is the lowest; however, in order to balance the load across clusters, sometimes the client is directed (via DNS) to a more distant cluster [Torres 2011].

YouTube employs HTTP streaming, often making a small number of different versions available for a video, each with a different bit rate and corresponding quality level. YouTube does not employ adaptive streaming (such as DASH), but instead requires the user to manually select a version. In order to save bandwidth and server resources that would be wasted by repositioning or early termination, YouTube uses the HTTP byte range request to limit the flow of transmitted data after a target amount of video is prefetched.

Several million videos are uploaded to YouTube every day. Not only are YouTube videos streamed from server to client over HTTP, but YouTube uploaders also upload their videos from client to server over HTTP. YouTube processes each video it receives, converting it to a YouTube video format and creating multiple versions at different bit rates. This processing takes place entirely within Google data centers. (See the case study on Google's network infrastructure in Section 2.6.3.)
Kankan

We just saw that dedicated servers, operated by private CDNs, stream Netflix and YouTube videos to clients. Netflix and YouTube have to pay not only for the server hardware but also for the bandwidth the servers use to distribute the videos. Given the scale of these services and the amount of bandwidth they are consuming, such a CDN deployment can be costly.

We conclude this section by describing an entirely different approach for providing video on demand over the Internet at a large scale—one that allows the service provider to significantly reduce its infrastructure and bandwidth costs. As you might suspect, this approach uses P2P delivery instead of (or along with) client-server delivery. Since 2011, Kankan (owned and operated by Xunlei) has been deploying P2P video delivery with great success, with tens of millions of users every month [Zhang 2015].

At a high level, P2P video streaming is very similar to BitTorrent file downloading. When a peer wants to see a video, it contacts a tracker to discover other peers in the system that have a copy of that video. This requesting peer then requests chunks of the video in parallel from the other peers that have the video. Different from downloading with BitTorrent, however, requests are preferentially made for chunks that are to be played back in the near future in order to ensure continuous playback [Dhungel 2012].

Recently, Kankan has migrated to a hybrid CDN-P2P streaming system [Zhang 2015]. Specifically, Kankan now deploys a few hundred servers within China and pushes video content to these servers. This Kankan CDN plays a major role in the start-up stage of video streaming. In most cases, the client requests the beginning of the content from CDN servers, and in parallel requests content from peers. When the total P2P traffic is sufficient for video playback, the client will cease streaming from the CDN and only stream from peers. But if the P2P streaming traffic becomes insufficient, the client will restart CDN connections and return to the mode of hybrid CDN-P2P streaming. In this manner, Kankan can ensure short initial start-up delays while minimally relying on costly infrastructure servers and bandwidth.
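The switching rule described in the last paragraph above can be captured in a few lines. The sketch below is a simplified model of that decision, with the playback rate and measured peer throughput as illustrative parameters; it is not Kankan's actual algorithm.

def choose_sources(p2p_throughput_bps, playback_rate_bps):
    """Toy model of hybrid CDN-P2P source selection: drop the CDN once peers
    alone can sustain playback, and bring it back when they cannot."""
    if p2p_throughput_bps >= playback_rate_bps:
        return ['p2p']          # peers alone keep up; stop using CDN servers
    return ['cdn', 'p2p']       # fall back to hybrid CDN-P2P delivery

print(choose_sources(2_000_000, 1_500_000))  # ['p2p']
print(choose_sources(800_000, 1_500_000))    # ['cdn', 'p2p']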
2.7 Socket Programming: Creating Network Applications

Now that we've looked at a number of important network applications, let's explore how network application programs are actually created. Recall from Section 2.1 that a typical network application consists of a pair of programs—a client program and a server program—residing in two different end systems. When these two programs are executed, a client process and a server process are created, and these processes communicate with each other by reading from, and writing to, sockets. When creating a network application, the developer's main task is therefore to write the code for both the client and server programs.

There are two types of network applications. One type is an implementation whose operation is specified in a protocol standard, such as an RFC or some other standards document; such an application is sometimes referred to as "open," since the rules specifying its operation are known to all. For such an implementation, the client and server programs must conform to the rules dictated by the RFC. For example, the client program could be an implementation of the client side of the HTTP protocol, described in Section 2.2 and precisely defined in RFC 2616; similarly, the server program could be an implementation of the HTTP server protocol, also precisely defined in RFC 2616. If one developer writes code for the client program and another developer writes code for the server program, and both developers carefully follow the rules of the RFC, then the two programs will be able to interoperate. Indeed, many of today's network applications involve communication between client and server programs that have been created by independent developers—for example, a Google Chrome browser communicating with an Apache Web server, or a BitTorrent client communicating with a BitTorrent tracker.

The other type of network application is a proprietary network application. In this case the client and server programs employ an application-layer protocol that has not been openly published in an RFC or elsewhere. A single developer (or development team) creates both the client and server programs, and the developer has complete control over what goes in the code. But because the code does not implement an open protocol, other independent developers will not be able to develop code that interoperates with the application.

In this section, we'll examine the key issues in developing a client-server application, and we'll "get our hands dirty" by looking at code that implements a very simple client-server application. During the development phase, one of the first decisions the developer must make is whether the application is to run over TCP or over UDP. Recall that TCP is connection oriented and provides a reliable byte-stream channel through which data flows between two end systems. UDP is connectionless and sends independent packets of data from one end system to the other, without any guarantees about delivery. Recall also that when a client or server program implements a protocol defined by an RFC, it should use the well-known port number associated with the protocol; conversely, when developing a proprietary application, the developer must be careful to avoid using such well-known port numbers. (Port numbers were briefly discussed in Section 2.1. They are covered in more detail in Chapter 3.)

We introduce UDP and TCP socket programming by way of a simple UDP application and a simple TCP application. We present the simple UDP and TCP applications in Python 3. We could have written the code in Java, C, or C++, but we chose Python mostly because Python clearly exposes the key socket concepts. With Python there are fewer lines of code, and each line can be explained to the novice programmer without difficulty. But there's no need to be frightened if you are not familiar with Python. You should be able to easily follow the code if you have experience programming in Java, C, or C++.

If you are interested in client-server programming with Java, you are encouraged to see the Companion Website for this textbook; in fact, you can find there all the examples in this section (and associated labs) in Java. For readers who are interested in client-server programming in C, there are several good references available [Donahoo 2001; Stevens 1997; Frost 1994; Kurose 1996]; our Python examples below have a similar look and feel to C.
2.7.1 Socket Programming with UDP

In this subsection, we'll write simple client-server programs that use UDP; in the following section, we'll write similar programs that use TCP.

Recall from Section 2.1 that processes running on different machines communicate with each other by sending messages into sockets. We said that each process is analogous to a house and the process's socket is analogous to a door. The application resides on one side of the door in the house; the transport-layer protocol resides on the other side of the door in the outside world. The application developer has control of everything on the application-layer side of the socket but has little control of the transport-layer side.

Now let's take a closer look at the interaction between two communicating processes that use UDP sockets. Before the sending process can push a packet of data out the socket door, when using UDP, it must first attach a destination address to the packet. After the packet passes through the sender's socket, the Internet will use this destination address to route the packet through the Internet to the socket in the receiving process. When the packet arrives at the receiving socket, the receiving process will retrieve the packet through the socket, and then inspect the packet's contents and take appropriate action.

So you may now be wondering, what goes into the destination address that is attached to the packet? As you might expect, the destination host's IP address is part of the destination address. By including the destination IP address in the packet, the routers in the Internet will be able to route the packet through the Internet to the destination host. But because a host may be running many network application processes, each with one or more sockets, it is also necessary to identify the particular socket in the destination host. When a socket is created, an identifier, called a port number, is assigned to it. So, as you might expect, the packet's destination address also includes the socket's port number. In summary, the sending process attaches to the packet a destination address, which consists of the destination host's IP address and the destination socket's port number. Moreover, as we shall soon see, the sender's source address—consisting of the IP address of the source host and the port number of the source socket—is also attached to the packet. However, attaching the source address to the packet is typically not done by the UDP application code; instead it is automatically done by the underlying operating system.

We'll use the following simple client-server application to demonstrate socket programming for both UDP and TCP:

1. The client reads a line of characters (data) from its keyboard and sends the data to the server.
2. The server receives the data and converts the characters to uppercase.
3. The server sends the modified data to the client.
4. The client receives the modified data and displays the line on its screen.

Figure 2.27 highlights the main socket-related activity of the client and server that communicate over the UDP transport service.

Now let's get our hands dirty and take a look at the client-server program pair for a UDP implementation of this simple application. We also provide a detailed, line-by-line analysis after each program. We'll begin with the UDP client, which will send a simple application-level message to the server.
Figure 2.27 The client-server application using UDP

In order for the server to be able to receive and reply to the client's message, it must be ready and running—that is, it must be running as a process before the client sends its message.

The client program is called UDPClient.py, and the server program is called UDPServer.py. In order to emphasize the key issues, we intentionally provide code that is minimal. "Good code" would certainly have a few more auxiliary lines, in particular for handling error cases. For this application, we have arbitrarily chosen 12000 for the server port number.

UDPClient.py

Here is the code for the client side of the application:

from socket import *
serverName = 'hostname'
serverPort = 12000
clientSocket = socket(AF_INET, SOCK_DGRAM)
message = input('Input lowercase sentence:')
clientSocket.sendto(message.encode(), (serverName, serverPort))
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
print(modifiedMessage.decode())
clientSocket.close()

Now let's take a look at the various lines of code in UDPClient.py.

from socket import *

The socket module forms the basis of all network communications in Python. By including this line, we will be able to create sockets within our program.

serverName = 'hostname'
serverPort = 12000

The first line sets the variable serverName to the string 'hostname'. Here, we provide a string containing either the IP address of the server (e.g., "128.138.32.126") or the hostname of the server (e.g., "cis.poly.edu"). (If we use the hostname, then a DNS lookup will automatically be performed to get the IP address.) The second line sets the integer variable serverPort to 12000.

clientSocket = socket(AF_INET, SOCK_DGRAM)

This line creates the client's socket, called clientSocket. The first parameter indicates the address family; in particular, AF_INET indicates that the underlying network is using IPv4. (Do not worry about this now—we will discuss IPv4 in Chapter 4.) The second parameter indicates that the socket is of type SOCK_DGRAM, which means it is a UDP socket (rather than a TCP socket). Note that we are not specifying the port number of the client socket when we create it; we are instead letting the operating system do this for us. Now that the client process's door has been created, we will want to create a message to send through the door.

message = input('Input lowercase sentence:')

input() is a built-in function in Python 3. When this command is executed, the user at the client is prompted with the words "Input lowercase sentence:" The user then uses her keyboard to input a line, which is put into the variable message. Now that we have a socket and a message, we will want to send the message through the socket to the destination host.

clientSocket.sendto(message.encode(), (serverName, serverPort))

In the above line, we first convert the message from string type to byte type, as we need to send bytes into a socket; this is done with the encode() method. The method sendto() attaches the destination address (serverName, serverPort) to the message and sends the resulting packet into the process's socket, clientSocket. (As mentioned earlier, the source address is also attached to the packet, although this is done automatically rather than explicitly by the code.) Sending a client-to-server message via a UDP socket is that simple! After sending the packet, the client waits to receive data from the server.

modifiedMessage, serverAddress = clientSocket.recvfrom(2048)

With the above line, when a packet arrives from the Internet at the client's socket, the packet's data is put into the variable modifiedMessage and the packet's source address is put into the variable serverAddress. The variable serverAddress contains both the server's IP address and the server's port number. The program UDPClient doesn't actually need this server address information, since it already knows the server address from the outset; but this line of Python provides the server address nevertheless. The method recvfrom also takes the buffer size 2048 as input. (This buffer size works for most purposes.)

print(modifiedMessage.decode())

This line prints out modifiedMessage on the user's display, after converting the message from bytes to string. It should be the original line that the user typed, but now capitalized.

clientSocket.close()

This line closes the socket. The process then terminates.
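As noted above, "good code" would add error handling that this minimal client omits. One common hardening step, sketched below for the same application, is to set a receive timeout and retry, since UDP makes no delivery guarantees. The one-second timeout and three attempts are arbitrary choices for illustration; settimeout() and the timeout exception come from the standard socket module already imported.

from socket import *

serverName = 'hostname'  # as before, the server's hostname or IP address
serverPort = 12000

clientSocket = socket(AF_INET, SOCK_DGRAM)
clientSocket.settimeout(1.0)  # give up on a reply after 1 second
message = input('Input lowercase sentence:')
for attempt in range(3):  # retry a few times; UDP may lose packets
    clientSocket.sendto(message.encode(), (serverName, serverPort))
    try:
        modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
        print(modifiedMessage.decode())
        break
    except timeout:  # socket.timeout, available via the star import
        print('No reply, retrying...')
clientSocket.close()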
UDPServer.py

Let's now take a look at the server side of the application:

from socket import *
serverPort = 12000
serverSocket = socket(AF_INET, SOCK_DGRAM)
serverSocket.bind(('', serverPort))
print('The server is ready to receive')
while True:
    message, clientAddress = serverSocket.recvfrom(2048)
    modifiedMessage = message.decode().upper()
    serverSocket.sendto(modifiedMessage.encode(), clientAddress)

Note that the beginning of UDPServer is similar to UDPClient. It also imports the socket module, also sets the integer variable serverPort to 12000, and also creates a socket of type SOCK_DGRAM (a UDP socket). The first line of code that is significantly different from UDPClient is:

serverSocket.bind(('', serverPort))

The above line binds (that is, assigns) the port number 12000 to the server's socket. Thus in UDPServer, the code (written by the application developer) is explicitly assigning a port number to the socket. In this manner, when anyone sends a packet to port 12000 at the IP address of the server, that packet will be directed to this socket. UDPServer then enters a while loop; the while loop will allow UDPServer to receive and process packets from clients indefinitely. In the while loop, UDPServer waits for a packet to arrive.

message, clientAddress = serverSocket.recvfrom(2048)

This line of code is similar to what we saw in UDPClient. When a packet arrives at the server's socket, the packet's data is put into the variable message and the packet's source address is put into the variable clientAddress. The variable clientAddress contains both the client's IP address and the client's port number. Here, UDPServer will make use of this address information, as it provides a return address, similar to the return address with ordinary postal mail. With this source address information, the server now knows to where it should direct its reply.

modifiedMessage = message.decode().upper()

This line is the heart of our simple application. It takes the line sent by the client and, after converting the message to a string, uses the method upper() to capitalize it.

serverSocket.sendto(modifiedMessage.encode(), clientAddress)

This last line attaches the client's address (IP address and port number) to the capitalized message (after converting the string to bytes), and sends the resulting packet into the server's socket. (As mentioned earlier, the server address is also attached to the packet, although this is done automatically rather than explicitly by the code.) The Internet will then deliver the packet to this client address. After the server sends the packet, it remains in the while loop, waiting for another UDP packet to arrive (from any client running on any host).
To test the pair of programs, you run UDPClient.py on one host and UDPServer.py on another host. Be sure to include the proper hostname or IP address of the server in UDPClient.py. Next, you execute UDPServer.py in the server host. This creates a process in the server that idles until it is contacted by some client. Then you execute UDPClient.py in the client host. This creates a process in the client. Finally, to use the application at the client, you type a sentence followed by a carriage return.

To develop your own UDP client-server application, you can begin by slightly modifying the client or server programs. For example, instead of converting all the letters to uppercase, the server could count the number of times the letter s appears and return this number (a sketch of this variation appears just below). Or you can modify the client so that after receiving a capitalized sentence, the user can continue to send more sentences to the server.
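Here is a minimal sketch of the first suggested variation: a UDP server that, instead of capitalizing the line, returns the number of times the letter s appears. Only the two lines inside the loop change relative to UDPServer.py.

from socket import *
serverPort = 12000
serverSocket = socket(AF_INET, SOCK_DGRAM)
serverSocket.bind(('', serverPort))
print('The server is ready to receive')
while True:
    message, clientAddress = serverSocket.recvfrom(2048)
    count = message.decode().count('s')  # count occurrences of the letter s
    serverSocket.sendto(str(count).encode(), clientAddress)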
2.7.2 Socket Programming with TCP

Unlike UDP, TCP is a connection-oriented protocol. This means that before the client and server can start to send data to each other, they first need to handshake and establish a TCP connection. One end of the TCP connection is attached to the client socket and the other end is attached to a server socket. When creating the TCP connection, we associate with it the client socket address (IP address and port number) and the server socket address (IP address and port number). With the TCP connection established, when one side wants to send data to the other side, it just drops the data into the TCP connection via its socket. This is different from UDP, for which the sender must attach a destination address to the packet before dropping it into the socket.

Now let's take a closer look at the interaction of client and server programs in TCP. The client has the job of initiating contact with the server. In order for the server to be able to react to the client's initial contact, the server has to be ready. This implies two things. First, as in the case of UDP, the TCP server must be running as a process before the client attempts to initiate contact. Second, the server program must have a special door—more precisely, a special socket—that welcomes some initial contact from a client process running on an arbitrary host. Using our house/door analogy for a process/socket, we will sometimes refer to the client's initial contact as "knocking on the welcoming door."

With the server process running, the client process can initiate a TCP connection to the server. This is done in the client program by creating a TCP socket. When the client creates its TCP socket, it specifies the address of the welcoming socket in the server, namely, the IP address of the server host and the port number of the socket. After creating its socket, the client initiates a three-way handshake and establishes a TCP connection with the server. The three-way handshake, which takes place within the transport layer, is completely invisible to the client and server programs.

During the three-way handshake, the client process knocks on the welcoming door of the server process. When the server "hears" the knocking, it creates a new door—more precisely, a new socket that is dedicated to that particular client. In our example below, the welcoming door is a TCP socket object that we call serverSocket; the newly created socket dedicated to the client making the connection is called connectionSocket. Students who are encountering TCP sockets for the first time sometimes confuse the welcoming socket (which is the initial point of contact for all clients wanting to communicate with the server) with the connection socket that is subsequently created on the server side for communicating with each client.

From the application's perspective, the client's socket and the server's connection socket are directly connected by a pipe. As shown in Figure 2.28, the client process can send arbitrary bytes into its socket, and TCP guarantees that the server process will receive (through the connection socket) each byte in the order sent. TCP thus provides a reliable service between the client and server processes. Furthermore, just as people can go in and out the same door, the client process not only sends bytes into but also receives bytes from its socket; similarly, the server process not only receives bytes from but also sends bytes into its connection socket.

We use the same simple client-server application to demonstrate socket programming with TCP: The client sends one line of data to the server, the server capitalizes the line and sends it back to the client. Figure 2.29 highlights the main socket-related activity of the client and server that communicate over the TCP transport service.

Figure 2.28 The TCPServer process has two sockets

Figure 2.29 The client-server application using TCP

TCPClient.py

Here is the code for the client side of the application:

from socket import *
serverName = 'servername'
serverPort = 12000
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName, serverPort))
sentence = input('Input lowercase sentence:')
clientSocket.send(sentence.encode())
modifiedSentence = clientSocket.recv(1024)
print('From Server: ', modifiedSentence.decode())
clientSocket.close()

Let's now take a look at the various lines in the code that differ significantly from the UDP implementation. The first such line is the creation of the client socket.

clientSocket = socket(AF_INET, SOCK_STREAM)

This line creates the client's socket, called clientSocket. The first parameter again indicates that the underlying network is using IPv4. The second parameter indicates that the socket is of type SOCK_STREAM, which means it is a TCP socket (rather than a UDP socket). Note that we are again not specifying the port number of the client socket when we create it; we are instead letting the operating system do this for us. Now the next line of code is very different from what we saw in UDPClient:

clientSocket.connect((serverName, serverPort))

Recall that before the client can send data to the server (or vice versa) using a TCP socket, a TCP connection must first be established between the client and server. The above line initiates the TCP connection between the client and server. The parameter of the connect() method is the address of the server side of the connection. After this line of code is executed, the three-way handshake is performed and a TCP connection is established between the client and server.

sentence = input('Input lowercase sentence:')

As with UDPClient, the above obtains a sentence from the user. The string sentence continues to gather characters until the user ends the line by typing a carriage return. The next line of code is also very different from UDPClient:

clientSocket.send(sentence.encode())

The above line sends the sentence through the client's socket and into the TCP connection. Note that the program does not explicitly create a packet and attach the destination address to the packet, as was the case with UDP sockets. Instead the client program simply drops the bytes in the string sentence into the TCP connection. The client then waits to receive bytes from the server.

modifiedSentence = clientSocket.recv(1024)

When characters arrive from the server, they get placed into the string modifiedSentence. Characters continue to accumulate in modifiedSentence until the line ends with a carriage return character. After printing the capitalized sentence, we close the client's socket:

clientSocket.close()

This last line closes the socket and, hence, closes the TCP connection between the client and the server. It causes TCP in the client to send a TCP message to TCP in the server (see Section 3.5).
TCPServer.py

Now let's take a look at the server program.

from socket import *
serverPort = 12000
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', serverPort))
serverSocket.listen(1)
print('The server is ready to receive')
while True:
    connectionSocket, addr = serverSocket.accept()
    sentence = connectionSocket.recv(1024).decode()
    capitalizedSentence = sentence.upper()
    connectionSocket.send(capitalizedSentence.encode())
    connectionSocket.close()

Let's now take a look at the lines that differ significantly from UDPServer and TCPClient. As with TCPClient, the server creates a TCP socket with:

serverSocket = socket(AF_INET, SOCK_STREAM)

Similar to UDPServer, we associate the server port number, serverPort, with this socket:

serverSocket.bind(('', serverPort))

But with TCP, serverSocket will be our welcoming socket. After establishing this welcoming door, we will wait and listen for some client to knock on the door:

serverSocket.listen(1)

This line has the server listen for TCP connection requests from the client. The parameter specifies the maximum number of queued connections (at least 1).

connectionSocket, addr = serverSocket.accept()

When a client knocks on this door, the program invokes the accept() method for serverSocket, which creates a new socket in the server, called connectionSocket, dedicated to this particular client. The client and server then complete the handshaking, creating a TCP connection between the client's clientSocket and the server's connectionSocket. With the TCP connection established, the client and server can now send bytes to each other over the connection. With TCP, all bytes sent from one side are not only guaranteed to arrive at the other side but are also guaranteed to arrive in order.

connectionSocket.close()

In this program, after sending the modified sentence to the client, we close the connection socket. But since serverSocket remains open, another client can now knock on the door and send the server a sentence to modify.
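Because serverSocket stays open, a natural modification is to hand each connection socket to its own thread, so that one slow client does not block the others. The sketch below uses the standard threading module for this; it is an illustrative extension of TCPServer.py, not code from the text.

from socket import *
from threading import Thread

def handle(connectionSocket):
    """Serve one client on its dedicated connection socket, then close it."""
    sentence = connectionSocket.recv(1024).decode()
    connectionSocket.send(sentence.upper().encode())
    connectionSocket.close()

serverPort = 12000
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', serverPort))
serverSocket.listen(5)  # allow a few queued connection requests
print('The server is ready to receive')
while True:
    connectionSocket, addr = serverSocket.accept()
    Thread(target=handle, args=(connectionSocket,)).start()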
This completes our discussion of socket programming in TCP. You are encouraged to run the two programs in two separate hosts, and also to modify them to achieve slightly different goals. You should compare the UDP program pair with the TCP program pair and see how they differ. You should also do many of the socket programming assignments described at the ends of Chapters 2, 4, and 9. Finally, we hope someday, after mastering these and more advanced socket programs, you will write your own popular network application, become very rich and famous, and remember the authors of this textbook!

2.8 Summary

In this chapter, we've studied the conceptual and the implementation aspects of network applications. We've learned about the ubiquitous client-server architecture adopted by many Internet applications and seen its use in the HTTP, SMTP, POP3, and DNS protocols. We've studied these important application-level protocols, and their associated applications (the Web, file transfer, e-mail, and DNS), in some detail. We've learned about the P2P architecture and how it is used in many applications. We've also learned about streaming video, and how modern video distribution systems leverage CDNs. We've examined how the socket API can be used to build network applications. We've walked through the use of sockets for connection-oriented (TCP) and connectionless (UDP) end-to-end transport services. The first step in our journey down the layered network architecture is now complete!

At the very beginning of this book, in Section 1.1, we gave a rather vague, bare-bones definition of a protocol: "the format and the order of messages exchanged between two or more communicating entities, as well as the actions taken on the transmission and/or receipt of a message or other event." The material in this chapter, and in particular our detailed study of the HTTP, SMTP, POP3, and DNS protocols, has now added considerable substance to this definition. Protocols are a key concept in networking; our study of application protocols has now given us the opportunity to develop a more intuitive feel for what protocols are all about.

In Section 2.1, we described the service models that TCP and UDP offer to applications that invoke them. We took an even closer look at these service models when we developed simple applications that run over TCP and UDP in Section 2.7. However, we have said little about how TCP and UDP provide these service models. For example, we know that TCP provides a reliable data service, but we haven't said yet how it does so. In the next chapter we'll take a careful look at not only the what, but also the how and why of transport protocols.

Equipped with knowledge about Internet application structure and application-level protocols, we're now ready to head further down the protocol stack and examine the transport layer in Chapter 3.

Homework Problems and Questions

Chapter 2 Review Questions

SECTION 2.1

R1. List five nonproprietary Internet applications and the application-layer protocols that they use.

R2. What is the difference between network architecture and application architecture?

R3. For a communication session between a pair of processes, which process is the client and which is the server?

R4. For a P2P file-sharing application, do you agree with the statement, "There is no notion of client and server sides of a communication session"? Why or why not?

R5. What information is used by a process running on one host to identify a process running on another host?
R6. Suppose you wanted to do a transaction from a remote client to a server as fast as possible. Would you use UDP or TCP? Why?

R7. Referring to Figure 2.4, we see that none of the applications listed in Figure 2.4 requires both no data loss and timing. Can you conceive of an application that requires no data loss and that is also highly time-sensitive?

R8. List the four broad classes of services that a transport protocol can provide. For each of the service classes, indicate if either UDP or TCP (or both) provides such a service.

R9. Recall that TCP can be enhanced with SSL to provide process-to-process security services, including encryption. Does SSL operate at the transport layer or the application layer? If the application developer wants TCP to be enhanced with SSL, what does the developer have to do?

SECTION 2.2–2.5

R10. What is meant by a handshaking protocol?

R11. Why do HTTP, SMTP, and POP3 run on top of TCP rather than on UDP?

R12. Consider an e-commerce site that wants to keep a purchase record for each of its customers. Describe how this can be done with cookies.

R13. Describe how Web caching can reduce the delay in receiving a requested object. Will Web caching reduce the delay for all objects requested by a user or for only some of the objects? Why?

R14. Telnet into a Web server and send a multiline request message. Include in the request message the If-modified-since: header line to force a response message with the 304 Not Modified status code.

R15. List several popular messaging apps. Do they use the same protocols as SMS?

R16. Suppose Alice, with a Web-based e-mail account (such as Hotmail or Gmail), sends a message to Bob, who accesses his mail from his mail server using POP3. Discuss how the message gets from Alice's host to Bob's host. Be sure to list the series of application-layer protocols that are used to move the message between the two hosts.

R17. Print out the header of an e-mail message you have recently received. How many Received: header lines are there? Analyze each of the header lines in the message.

R18. From a user's perspective, what is the difference between the download-and-delete mode and the download-and-keep mode in POP3?

R19. Is it possible for an organization's Web server and mail server to have exactly the same alias for a hostname (for example, foo.com)? What would be the type for the RR that contains the hostname of the mail server?

R20. Look over your received e-mails, and examine the header of a message sent from a user with a .edu e-mail address. Is it possible to determine from the header the IP address of the host from which the message was sent? Do the same for a message sent from a Gmail account.

SECTION 2.5

R21. In BitTorrent, suppose Alice provides chunks to Bob throughout a 30-second interval. Will Bob necessarily return the favor and provide chunks to Alice in this same interval? Why or why not?

R22. Consider a new peer Alice that joins BitTorrent without possessing any chunks. Without any chunks, she cannot become a top-four uploader for any of the other peers, since she has nothing to upload. How then will Alice get her first chunk?

R23. What is an overlay network? Does it include routers? What are the edges in the overlay network?

SECTION 2.6

R24. CDNs typically adopt one of two different server placement philosophies. Name and briefly describe them.
R25. Besides network-related considerations such as delay, loss, and bandwidth performance, there are other important factors that go into designing a CDN server selection strategy. What are they?

SECTION 2.7

R26. In Section 2.7, the UDP server described needed only one socket, whereas the TCP server needed two sockets. Why? If the TCP server were to support n simultaneous connections, each from a different client host, how many sockets would the TCP server need?

R27. For the client-server application over TCP described in Section 2.7, why must the server program be executed before the client program? For the client-server application over UDP, why may the client program be executed before the server program?

Problems

P1. True or false?

a. A user requests a Web page that consists of some text and three images. For this page, the client will send one request message and receive four response messages.

b. Two distinct Web pages (for example, www.mit.edu/research.html and www.mit.edu/students.html) can be sent over the same persistent connection.

c. With nonpersistent connections between browser and origin server, it is possible for a single TCP segment to carry two distinct HTTP request messages.

d. The Date: header in the HTTP response message indicates when the object in the response was last modified.

e. HTTP response messages never have an empty message body.

P2. SMS, iMessage, and WhatsApp are all smartphone real-time messaging systems. After doing some research on the Internet, for each of these systems write one paragraph about the protocols they use. Then write a paragraph explaining how they differ.

P3. Consider an HTTP client that wants to retrieve a Web document at a given URL. The IP address of the HTTP server is initially unknown. What transport and application-layer protocols besides HTTP are needed in this scenario?

P4. Consider the following string of ASCII characters that were captured by Wireshark when the browser sent an HTTP GET message (i.e., this is the actual content of an HTTP GET message). The characters <cr><lf> are carriage return and line-feed characters (that is, the character string <cr><lf> below represents the carriage-return and line-feed characters that were contained at that point in the HTTP header). Answer the following questions, indicating where in the HTTP GET message below you find the answer.

GET /cs453/index.html HTTP/1.1<cr><lf>
Host: gaia.cs.umass.edu<cr><lf>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax)<cr><lf>
Accept: text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png,*/*;q=0.5<cr><lf>
Accept-Language: en-us, en;q=0.5<cr><lf>
Accept-Encoding: zip, deflate<cr><lf>
Accept-Charset: ISO-8859-1, utf-8;q=0.7,*;q=0.7<cr><lf>
Keep-Alive: 300<cr><lf>
Connection: keep-alive<cr><lf>
<cr><lf>

a. What is the URL of the document requested by the browser?

b. What version of HTTP is the browser running?

c. Does the browser request a non-persistent or a persistent connection?

d. What is the IP address of the host on which the browser is running?

e. What type of browser initiates this message? Why is the browser type needed in an HTTP request message?

P5. The text below shows the reply sent from the server in response to the HTTP GET message in the question above. Answer the following questions, indicating where in the message below you find the answer.
HTTP/1.1 200 OK<cr><lf>
Date: Tue, 07 Mar 2008 12:39:45 GMT<cr><lf>
Server: Apache/2.0.52 (Fedora)<cr><lf>
Last-Modified: Sat, 10 Dec 2005 18:27:46 GMT<cr><lf>
ETag: "526c3-f22-a88a4c80"<cr><lf>
Accept-Ranges: bytes<cr><lf>
Content-Length: 3874<cr><lf>
Keep-Alive: timeout=max=100<cr><lf>
Connection: Keep-Alive<cr><lf>
Content-Type: text/html; charset=ISO-8859-1<cr><lf>
<cr><lf>
CMPSCI 453 / 591 / NTU-ST550A Spring 2005 homepage

a. Was the server able to successfully find the document or not? What time was the document reply provided?

b. When was the document last modified?

c. How many bytes are there in the document being returned?

d. What are the first 5 bytes of the document being returned? Did the server agree to a persistent connection?

P6. Obtain the HTTP/1.1 specification (RFC 2616). Answer the following questions:

a. Explain the mechanism used for signaling between the client and server to indicate that a persistent connection is being closed. Can the client, the server, or both signal the close of a connection?

b. What encryption services are provided by HTTP?

c. Can a client open three or more simultaneous connections with a given server?

d. Either a server or a client may close a transport connection between them if either one detects the connection has been idle for some time. Is it possible that one side starts closing a connection while the other side is transmitting data via this connection? Explain.

P7. Suppose within your Web browser you click on a link to obtain a Web page. The IP address for the associated URL is not cached in your local host, so a DNS lookup is necessary to obtain the IP address. Suppose that n DNS servers are visited before your host receives the IP address from DNS; the successive visits incur an RTT of RTT1, ..., RTTn. Further suppose that the Web page associated with the link contains exactly one object, consisting of a small amount of HTML text. Let RTT0 denote the RTT between the local host and the server containing the object. Assuming zero transmission time of the object, how much time elapses from when the client clicks on the link until the client receives the object?

P8. Referring to Problem P7, suppose the HTML file references eight very small objects on the same server. Neglecting transmission times, how much time elapses with

a. Non-persistent HTTP with no parallel TCP connections?

b. Non-persistent HTTP with the browser configured for 5 parallel connections?

c. Persistent HTTP?

P9. Consider Figure 2.12, for which there is an institutional network connected to the Internet. Suppose that the average object size is 850,000 bits and that the average request rate from the institution's browsers to the origin servers is 16 requests per second. Also suppose that the amount of time it takes from when the router on the Internet side of the access link forwards an HTTP request until it receives the response is three seconds on average (see Section 2.2.5). Model the total average response time as the sum of the average access delay (that is, the delay from Internet router to institution router) and the average Internet delay. For the average access delay, use Δ/(1−Δβ), where Δ is the average time required to send an object over the access link and β is the arrival rate of objects to the access link.

a. Find the total average response time.

b. Now suppose a cache is installed in the institutional LAN. Suppose the miss rate is 0.4. Find the total response time.

P10. Consider a short, 10-meter link, over which a sender can transmit at a rate of 150 bits/sec in both directions.
+P10. Consider a short, 10-meter link, over which a sender can transmit at a rate of 150 bits/sec
+in both directions. Suppose that packets containing data are 100,000 bits long, and packets
+containing only control information (e.g., ACK or handshaking) are 200 bits long. Assume that N
+parallel connections each get 1/N of the link bandwidth. Now consider the HTTP protocol, and
+suppose that each downloaded object is 100 Kbits long, and that the initial downloaded object
+contains 10 referenced objects from the same sender. Would parallel downloads via parallel
+instances of non-persistent HTTP make sense in this case? Now consider persistent HTTP. Do
+you expect significant gains over the non-persistent case? Justify and explain your answer.
+P11. Consider the scenario introduced in the previous problem. Now suppose that the link is
+shared by Bob with four other users. Bob uses parallel instances of non-persistent HTTP, and
+the other four users use non-persistent HTTP without parallel downloads.
+
+a. Do Bob’s parallel connections help him get Web pages more quickly? Why or why not?
+b. If all five users open five parallel instances of non-persistent HTTP, then would Bob’s
+parallel connections still be beneficial? Why or why not?
+P12. Write a simple TCP program for a server that accepts lines of input from a client and prints
+the lines onto the server’s standard output. (You can do this by modifying the TCPServer.py
+program in the text.) Compile and execute your program. On any other machine that contains a
+Web browser, set the proxy server in the browser to the host that is running your server
+program; also configure the port number appropriately. Your browser should now send its GET
+request messages to your server, and your server should display the messages on its standard
+output. Use this platform to determine whether your browser generates conditional GET
+messages for objects that are locally cached.
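+A minimal sketch of the kind of server P12 asks for, patterned on the TCPServer.py program
+from Section 2.7 (the port number is an arbitrary choice):
+
+from socket import socket, AF_INET, SOCK_STREAM
+
+serverPort = 12000                      # arbitrary; point your browser’s proxy setting here
+serverSocket = socket(AF_INET, SOCK_STREAM)
+serverSocket.bind(('', serverPort))
+serverSocket.listen(1)
+print('The server is ready to receive')
+while True:
+    connectionSocket, addr = serverSocket.accept()    # one request at a time
+    request = connectionSocket.recv(2048).decode()    # the browser’s GET message
+    print(request)                                    # print the lines on standard output
+    connectionSocket.close()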
+P13. What is the difference between MAIL FROM: in SMTP and From: in the mail message
+itself?
+P14. How does SMTP mark the end of a message body? How about HTTP? Can HTTP use the
+same method as SMTP to mark the end of a message body? Explain.
+P15. Read RFC 5321 for SMTP. What does MTA stand for? Consider the following received
+spam e-mail (modified from a real spam e-mail). Assuming only the originator of this spam e-mail
+is malicious and all other hosts are honest, identify the malicious host that has generated this
+spam e-mail.
+
+From - Fri Nov 07 13:41:30 2008
+Return-Path:
+Received: from barmail.cs.umass.edu (barmail.cs.umass.edu
+[128.119.240.3]) by cs.umass.edu (8.13.1/8.12.6) for
+; Fri, 7 Nov 2008 13:27:10 -0500
+Received: from asusus-4b96 (localhost [127.0.0.1]) by
+barmail.cs.umass.edu (Spam Firewall) for ; Fri, 7
+Nov 2008 13:27:07 -0500 (EST)
+Received: from asusus-4b96 ([58.88.21.177]) by barmail.cs.umass.edu
+for ; Fri, 07 Nov 2008 13:27:07 -0500 (EST)
+Received: from [58.88.21.177] by inbnd55.exchangeddd.com; Sat, 8
+Nov 2008 01:27:07 +0700
+From: "Jonny"
+To:
+Subject: How to secure your savings
+
+P16. Read the POP3 RFC, RFC 1939. What is the purpose of the UIDL POP3 command?
+P17. Consider accessing your e-mail with POP3.
+
+a. Suppose you have configured your POP mail client to operate in the download-and-delete
+mode. Complete the following transaction:
+
+C: list
+S: 1 498
+S: 2 912
+S: .
+C: retr 1
+S: blah blah ...
+S: ..........blah
+S: .
+?
+?
+
+b. Suppose you have configured your POP mail client to operate in the download-and-keep
+mode. Complete the following transaction:
+
+C: list
+S: 1 498
+S: 2 912
+S: .
+C: retr 1
+S: blah blah ...
+S: ..........blah
+S: .
+?
+?
+
+c. Suppose you have configured your POP mail client to operate in the download-and-keep
+mode. Using your transcript in part (b), suppose you retrieve messages 1 and 2, exit
+POP, and then five minutes later you again access POP to retrieve new e-mail. Suppose
+that in the five-minute interval no new messages have been sent to you. Provide a
+transcript of this second POP session.
+P18.
+
+a. What is a whois database?
+b. Use various whois databases on the Internet to obtain the names of two DNS servers.
+Indicate which whois databases you used.
+c. Use nslookup on your local host to send DNS queries to three DNS servers: your local
+DNS server and the two DNS servers you found in part (b). Try querying for Type A, NS,
+and MX records. Summarize your findings.
+d. Use nslookup to find a Web server that has multiple IP addresses. Does the Web server
+of your institution (school or company) have multiple IP addresses?
+e. Use the ARIN whois database to determine the IP address range used by your
+university.
+f. Describe how an attacker can use whois databases and the nslookup tool to perform
+reconnaissance on an institution before launching an attack.
+g. Discuss why whois databases should be publicly available.
+P19. In this problem, we use the useful dig tool available on Unix and Linux hosts to explore the
+hierarchy of DNS servers. Recall that in Figure 2.19, a DNS server in the DNS hierarchy
+delegates a DNS query to a DNS server lower in the hierarchy, by sending back to the DNS
+client the name of that lower-level DNS server. First read the man page for dig, and then answer
+the following questions.
+
+a. Starting with a root DNS server (from one of the root servers [a-m].root-servers.net),
+initiate a sequence of queries for the IP address for your department’s Web server by
+using dig. Show the list of the names of DNS servers in the delegation chain in
+answering your query.
+b. Repeat part (a) for several popular Web sites, such as google.com, yahoo.com, or
+amazon.com.
+P20. Suppose you can access the caches in the local DNS servers of your department. Can you
+propose a way to roughly determine the Web servers (outside your department) that are most
+popular among the users in your department? Explain.
+P21. Suppose that your department has a local DNS server for all computers in the department.
+You are an ordinary user (i.e., not a network/system administrator). Can you determine if an
+external Web site was likely accessed from a computer in your department a couple of seconds
+ago? Explain.
+P22. Consider distributing a file of F = 15 Gbits to N peers. The server has an upload rate of
+u_s = 30 Mbps, and each peer has a download rate of d_i = 2 Mbps and an upload rate of u. For
+N = 10, 100, and 1,000 and u = 300 Kbps, 700 Kbps, and 2 Mbps, prepare a chart giving the
+minimum distribution time for each of the combinations of N and u for both client-server
+distribution and P2P distribution. (A short script sketching this tabulation appears after P23
+below.)
+P23. Consider distributing a file of F bits to N peers using a client-server architecture. Assume a
+fluid model where the server can simultaneously transmit to multiple peers, transmitting to each
+peer at different rates, as long as the combined rate does not exceed u_s.
+
+a. Suppose that u_s/N ≤ d_min. Specify a distribution scheme that has a distribution time of
+NF/u_s.
+b. Suppose that u_s/N ≥ d_min. Specify a distribution scheme that has a distribution time of
+F/d_min.
+c. Conclude that the minimum distribution time is in general given by max{NF/u_s, F/d_min}.
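+The P22 chart can be tabulated directly from these bounds. A minimal sketch: the client-server
+bound is the one from P23(c); the P2P bound combines the result derived in P24 below with the
+F/d_min download term. The values are those given in P22, converted to bits per second:
+
+F = 15e9                      # file size: 15 Gbits
+us = 30e6                     # server upload rate: 30 Mbps
+d = 2e6                       # per-peer download rate: 2 Mbps
+for N in (10, 100, 1000):
+    for u in (300e3, 700e3, 2e6):                      # per-peer upload rates
+        t_cs = max(N * F / us, F / d)                  # client-server minimum (P23c)
+        t_p2p = max(F / us, F / d, N * F / (us + N * u))  # P2P minimum (P24c)
+        print(N, u, round(t_cs), round(t_p2p))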
+P24. Consider distributing a file of F bits to N peers using a P2P architecture. Assume a fluid
+model. For simplicity assume that d_min is very large, so that peer download bandwidth is never a
+bottleneck.
+
+a. Suppose that u_s ≤ (u_s + u_1 + ... + u_N)/N. Specify a distribution scheme that has a
+distribution time of F/u_s.
+b. Suppose that u_s ≥ (u_s + u_1 + ... + u_N)/N. Specify a distribution scheme that has a
+distribution time of NF/(u_s + u_1 + ... + u_N).
+c. Conclude that the minimum distribution time is in general given by
+max{F/u_s, NF/(u_s + u_1 + ... + u_N)}.
+P25. Consider an overlay network with N active peers, with each pair of peers having an active
+TCP connection. Additionally, suppose that the TCP connections pass through a total of M
+routers. How many nodes and edges are there in the corresponding overlay network?
+P26. Suppose Bob joins a BitTorrent torrent, but he does not want to upload any data to any
+other peers (so-called free-riding).
+
+a. Bob claims that he can receive a complete copy of the file that is shared by the swarm. Is
+Bob’s claim possible? Why or why not?
+b. Bob further claims that he can further make his “free-riding” more efficient by using a
+collection of multiple computers (with distinct IP addresses) in the computer lab in his
+department. How can he do that?
+P27. Consider a DASH system for which there are N video versions (at N different rates and
+qualities) and N audio versions (at N different rates and qualities). Suppose we want to allow the
+player to choose at any time any of the N video versions and any of the N audio versions.
+
+a. If we create files so that the audio is mixed in with the video, so that the server sends only
+one media stream at a given time, how many files will the server need to store (each a
+different URL)?
+b. If the server instead sends the audio and video streams separately and has the client
+synchronize the streams, how many files will the server need to store?
+P28. Install and compile the Python programs TCPClient and UDPClient on one host and
+TCPServer and UDPServer on another host.
+
+a. Suppose you run TCPClient before you run TCPServer. What happens? Why?
+b. Suppose you run UDPClient before you run UDPServer. What happens? Why?
+c. What happens if you use different port numbers for the client and server sides?
+P29. Suppose that in UDPClient.py, after we create the socket, we add the line:
+
+clientSocket.bind(('', 5432))
+
+Will it become necessary to change UDPServer.py? What are the port numbers for the sockets
+in UDPClient and UDPServer? What were they before making this change?
+P30. Can you configure your browser to open multiple simultaneous connections to a Web site?
+What are the advantages and disadvantages of having a large number of simultaneous TCP
+connections?
+P31. We have seen that Internet TCP sockets treat the data being sent as a byte stream but
+UDP sockets recognize message boundaries. What are one advantage and one disadvantage of
+the byte-oriented API versus having the API explicitly recognize and preserve application-defined
+message boundaries?
+P32. What is the Apache Web server? How much does it cost? What functionality does it
+currently have? You may want to look at Wikipedia to answer this question.
+
+Socket Programming Assignments
+The Companion Website includes six socket programming assignments. The first four assignments are
+summarized below. The fifth assignment makes use of the ICMP protocol and is summarized at the end
+of Chapter 5.
+The sixth assignment employs multimedia protocols and is summarized at the end of
+Chapter 9. It is highly recommended that students complete several, if not all, of these assignments.
+Students can find full details of these assignments, as well as important snippets of the Python code, at
+the Web site www.pearsonhighered.com/cs-resources.
+
+Assignment 1: Web Server
+In this assignment, you will develop a simple Web server in Python that is capable of processing only
+one request. Specifically, your Web server will (i) create a connection socket when contacted by a client
+(browser); (ii) receive the HTTP request from this connection; (iii) parse the request to determine the
+specific file being requested; (iv) get the requested file from the server’s file system; (v) create an HTTP
+response message consisting of the requested file preceded by header lines; and (vi) send the response
+over the TCP connection to the requesting browser. If a browser requests a file that is not present in
+your server, your server should return a “404 Not Found” error message.
+In the Companion Website, we provide the skeleton code for your server. Your job is to complete the
+code, run your server, and then test your server by sending requests from browsers running on different
+hosts. If you run your server on a host that already has a Web server running on it, then you should use
+a different port than port 80 for your Web server.
+
+Assignment 2: UDP Pinger
+In this programming assignment, you will write a client ping program in Python. Your client will send a
+simple ping message to a server, receive a corresponding pong message back from the server, and
+determine the delay between when the client sent the ping message and received the pong message.
+This delay is called the Round Trip Time (RTT). The functionality provided by the client and server is
+similar to the functionality provided by the standard ping program available in modern operating
+systems. However, standard ping programs use the Internet Control Message Protocol (ICMP) (which
+we will study in Chapter 5). Here we will create a nonstandard (but simple!) UDP-based ping program.
+Your ping program is to send 10 ping messages to the target server over UDP. For each message, your
+client is to determine and print the RTT when the corresponding pong message is returned. Because
+UDP is an unreliable protocol, a packet sent by the client or server may be lost. For this reason, the
+client cannot wait indefinitely for a reply to a ping message. You should have the client wait up to one
+second for a reply from the server; if no reply is received, the client should assume that the packet was
+lost and print a message accordingly.
+In this assignment, you will be given the complete code for the server (available in the Companion
+Website). Your job is to write the client code, which will be very similar to the server code. It is
+recommended that you first study carefully the server code. You can then write your client code, liberally
+cutting and pasting lines from the server code.
+
+Assignment 3: Mail Client
+The goal of this programming assignment is to create a simple mail client that sends e-mail to any
+recipient. Your client will need to establish a TCP connection with a mail server (e.g., a Google mail
+server), dialogue with the mail server using the SMTP protocol, send an e-mail message to a recipient
+(e.g., your friend) via the mail server, and finally close the TCP connection with the mail server.
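+A minimal sketch of the SMTP dialogue this assignment describes, over a raw TCP socket. The
+server name, addresses, and message below are placeholders, and a real mail server will likely
+also require TLS and authentication, which the skeleton code addresses:
+
+from socket import socket, AF_INET, SOCK_STREAM
+
+mailserver = ('smtp.example.com', 25)       # placeholder; use your mail server
+clientSocket = socket(AF_INET, SOCK_STREAM)
+clientSocket.connect(mailserver)
+print(clientSocket.recv(1024).decode())     # expect a "220 ..." greeting
+
+def send(cmd):
+    clientSocket.send((cmd + '\r\n').encode())
+    print(clientSocket.recv(1024).decode())  # print the server’s reply code
+
+send('HELO alice.example.com')
+send('MAIL FROM: <alice@example.com>')
+send('RCPT TO: <bob@example.com>')
+send('DATA')                                 # expect "354 ..."
+send('Subject: test\r\n\r\nHello from a raw SMTP client.\r\n.')  # lone "." ends the body (cf. P14)
+send('QUIT')
+clientSocket.close()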
+For this assignment, the Companion Website provides the skeleton code for your client. Your job is to +complete the code and test your client by sending e-mail to different user accounts. You may also try +sending through different servers (for example, through a Google mail server and through your +university mail server). +Assignment 4: Multi-Threaded Web Proxy +In this assignment, you will develop a Web proxy. When your proxy receives an HTTP request for an +object from a browser, it generates a new HTTP request for the same object and sends it to the origin +server. When the proxy receives the corresponding HTTP response with the object from the origin +server, it creates a new HTTP response, including the object, and sends it to the client. This proxy will +be multi-threaded, so that it will be able to handle multiple requests at the same time. +For this assignment, the Companion Website provides the skeleton code for the proxy server. Your job +is to complete the code, and then test it by having different browsers request Web objects via your +proxy. + +Wireshark Lab: HTTP +Having gotten our feet wet with the Wireshark packet sniffer in Lab 1, we’re now ready to use Wireshark +to investigate protocols in operation. In this lab, we’ll explore several aspects of the HTTP protocol: the +basic GET/reply interaction, HTTP message formats, retrieving large HTML files, retrieving HTML files +with embedded URLs, persistent and non-persistent connections, and HTTP authentication and +security. +As is the case with all Wireshark labs, the full description of this lab is available at this book’s Web site, +www.pearsonhighered.com/cs-resources. + +Wireshark Lab: DNS +In this lab, we take a closer look at the client side of the DNS, the protocol that translates Internet +hostnames to IP addresses. Recall from Section 2.5 that the client’s role in the DNS is relatively simple +—a client sends a query to its local DNS server and receives a response back. Much can go on under +the covers, invisible to the DNS clients, as the hierarchical DNS servers communicate with each other to +either recursively or iteratively resolve the client’s DNS query. From the DNS client’s standpoint, +however, the protocol is quite simple—a query is formulated to the local DNS server and a response is +received from that server. We observe DNS in action in this lab. + + As is the case with all Wireshark labs, the full description of this lab is available at this book’s Web site, +www.pearsonhighered.com/cs-resources. +An Interview With… +Marc Andreessen +Marc Andreessen is the co-creator of Mosaic, the Web browser that popularized the World Wide +Web in 1993. Mosaic had a clean, easily understood interface and was the first browser to +display images in-line with text. In 1994, Marc Andreessen and Jim Clark founded Netscape, +whose browser was by far the most popular browser through the mid-1990s. Netscape also +developed the Secure Sockets Layer (SSL) protocol and many Internet server products, +including mail servers and SSL-based Web servers. He is now a co-founder and general partner +of venture capital firm Andreessen Horowitz, overseeing portfolio development with holdings that +include Facebook, Foursquare, Groupon, Jawbone, Twitter, and Zynga. He serves on numerous +boards, including Bump, eBay, Glam Media, Facebook, and Hewlett-Packard. He holds a BS in +Computer Science from the University of Illinois at Urbana-Champaign. + +How did you become interested in computing? 
Did you always know that you wanted to work in +information technology? +The video game and personal computing revolutions hit right when I was growing up—personal +computing was the new technology frontier in the late 70’s and early 80’s. And it wasn’t just +Apple and the IBM PC, but hundreds of new companies like Commodore and Atari as well. I +taught myself to program out of a book called “Instant Freeze-Dried BASIC” at age 10, and got +my first computer (a TRS-80 Color Computer—look it up!) at age 12. +Please describe one or two of the most exciting projects you have worked on during your career. + + What were the biggest challenges? +Undoubtedly the most exciting project was the original Mosaic web browser in ’92–’93—and the +biggest challenge was getting anyone to take it seriously back then. At the time, everyone +thought the interactive future would be delivered as “interactive television” by huge companies, +not as the Internet by startups. +What excites you about the future of networking and the Internet? What are your biggest +concerns? +The most exciting thing is the huge unexplored frontier of applications and services that +programmers and entrepreneurs are able to explore—the Internet has unleashed creativity at a +level that I don’t think we’ve ever seen before. My biggest concern is the principle of unintended +consequences—we don’t always know the implications of what we do, such as the Internet +being used by governments to run a new level of surveillance on citizens. +Is there anything in particular students should be aware of as Web technology advances? +The rate of change—the most important thing to learn is how to learn—how to flexibly adapt to +changes in the specific technologies, and how to keep an open mind on the new opportunities +and possibilities as you move through your career. +What people inspired you professionally? +Vannevar Bush, Ted Nelson, Doug Engelbart, Nolan Bushnell, Bill Hewlett and Dave Packard, +Ken Olsen, Steve Jobs, Steve Wozniak, Andy Grove, Grace Hopper, Hedy Lamarr, Alan Turing, +Richard Stallman. +What are your recommendations for students who want to pursue careers in computing and +information technology? +Go as deep as you possibly can on understanding how technology is created, and then +complement with learning how business works. +Can technology solve the world’s problems? +No, but we advance the standard of living of people through economic growth, and most +economic growth throughout history has come from technology—so that’s as good as it gets. + + Chapter 3 Transport Layer + +Residing between the application and network layers, the transport layer is a central piece of the layered +network architecture. It has the critical role of providing communication services directly to the +application processes running on different hosts. The pedagogic approach we take in this chapter is to +alternate between discussions of transport-layer principles and discussions of how these principles are +implemented in existing protocols; as usual, particular emphasis will be given to Internet protocols, in +particular the TCP and UDP transport-layer protocols. +We’ll begin by discussing the relationship between the transport and network layers. This sets the stage +for examining the first critical function of the transport layer—extending the network layer’s delivery +service between two end systems to a delivery service between two application-layer processes running +on the end systems. 
We’ll illustrate this function in our coverage of the Internet’s connectionless
+transport protocol, UDP.
+We’ll then return to principles and confront one of the most fundamental problems in computer
+networking—how two entities can communicate reliably over a medium that may lose and corrupt data.
+Through a series of increasingly complicated (and realistic!) scenarios, we’ll build up an array of
+techniques that transport protocols use to solve this problem. We’ll then show how these principles are
+embodied in TCP, the Internet’s connection-oriented transport protocol.
+We’ll next move on to a second fundamentally important problem in networking—controlling the
+transmission rate of transport-layer entities in order to avoid, or recover from, congestion within the
+network. We’ll consider the causes and consequences of congestion, as well as commonly used
+congestion-control techniques. After obtaining a solid understanding of the issues behind congestion
+control, we’ll study TCP’s approach to congestion control.
+
+3.1 Introduction and Transport-Layer Services
+In the previous two chapters we touched on the role of the transport layer and the services that it
+provides. Let’s quickly review what we have already learned about the transport layer.
+A transport-layer protocol provides for logical communication between application processes running
+on different hosts. By logical communication, we mean that from an application’s perspective, it is as if
+the hosts running the processes were directly connected; in reality, the hosts may be on opposite sides
+of the planet, connected via numerous routers and a wide range of link types. Application processes use
+the logical communication provided by the transport layer to send messages to each other, free from the
+worry of the details of the physical infrastructure used to carry these messages. Figure 3.1 illustrates
+the notion of logical communication.
+As shown in Figure 3.1, transport-layer protocols are implemented in the end systems but not in
+network routers. On the sending side, the transport layer converts the application-layer messages it
+receives from a sending application process into transport-layer packets, known as transport-layer
+segments in Internet terminology. This is done by (possibly) breaking the application messages into
+smaller chunks and adding a transport-layer header to each chunk to create the transport-layer
+segment. The transport layer then passes the segment to the network layer at the sending end system,
+where the segment is encapsulated within a network-layer packet (a datagram) and sent to the
+destination. It’s important to note that network routers act only on the network-layer fields of the
+datagram; that is, they do not examine the fields of the transport-layer segment encapsulated within the
+datagram. On the receiving side, the network layer extracts the transport-layer segment from the
+datagram and passes the segment up to the transport layer. The transport layer then processes the
+received segment, making the data in the segment available to the receiving application.
+More than one transport-layer protocol may be available to network applications. For example, the
+Internet has two protocols—TCP and UDP. Each of these protocols provides a different set of
+transport-layer services to the invoking application.
+
+3.1.1 Relationship Between Transport and Network Layers
+Recall that the transport layer lies just above the network layer in the protocol stack.
+Whereas a transport-layer protocol provides logical communication between processes running on
+different hosts, a network-layer protocol provides logical communication between hosts. This distinction
+is subtle but important. Let’s examine this distinction with the aid of a household analogy.
+
+Figure 3.1 The transport layer provides logical rather than physical communication between
+application processes
+
+Consider two houses, one on the East Coast and the other on the West Coast, with each house being
+home to a dozen kids. The kids in the East Coast household are cousins of the kids in the West Coast
+household. The kids in the two households love to write to each other—each kid writes each cousin
+every week, with each letter delivered by the traditional postal service in a separate envelope. Thus,
+each household sends 144 letters to the other household every week. (These kids would save a lot of
+money if they had e-mail!) In each of the households there is one kid—Ann in the West Coast house
+and Bill in the East Coast house—responsible for mail collection and mail distribution. Each week Ann
+visits all her brothers and sisters, collects the mail, and gives the mail to a postal-service mail carrier,
+who makes daily visits to the house. When letters arrive at the West Coast house, Ann also has the job
+of distributing the mail to her brothers and sisters. Bill has a similar job on the East Coast.
+In this example, the postal service provides logical communication between the two houses—the postal
+service moves mail from house to house, not from person to person. On the other hand, Ann and Bill
+provide logical communication among the cousins—Ann and Bill pick up mail from, and deliver mail to,
+their brothers and sisters. Note that from the cousins’ perspective, Ann and Bill are the mail service,
+even though Ann and Bill are only a part (the end-system part) of the end-to-end delivery process. This
+household example serves as a nice analogy for explaining how the transport layer relates to the
+network layer:
+
+application messages = letters in envelopes
+processes = cousins
+hosts (also called end systems) = houses
+transport-layer protocol = Ann and Bill
+network-layer protocol = postal service (including mail carriers)
+
+Continuing with this analogy, note that Ann and Bill do all their work within their respective homes; they
+are not involved, for example, in sorting mail in any intermediate mail center or in moving mail from one
+mail center to another. Similarly, transport-layer protocols live in the end systems. Within an end system,
+a transport protocol moves messages from application processes to the network edge (that is, the
+network layer) and vice versa, but it doesn’t have any say about how the messages are moved within
+the network core. In fact, as illustrated in Figure 3.1, intermediate routers neither act on, nor recognize,
+any information that the transport layer may have added to the application messages.
+Continuing with our family saga, suppose now that when Ann and Bill go on vacation, another cousin
+pair—say, Susan and Harvey—substitute for them and provide the household-internal collection and
+delivery of mail. Unfortunately for the two families, Susan and Harvey do not do the collection and
+delivery in exactly the same way as Ann and Bill. Being younger kids, Susan and Harvey pick up and
+drop off the mail less frequently and occasionally lose letters (which are sometimes chewed up by the
+family dog).
Thus, the cousin-pair Susan and Harvey do not provide the same set of services (that is, +the same service model) as Ann and Bill. In an analogous manner, a computer network may make + + available multiple transport protocols, with each protocol offering a different service model to +applications. +The possible services that Ann and Bill can provide are clearly constrained by the possible services that +the postal service provides. For example, if the postal service doesn’t provide a maximum bound on how +long it can take to deliver mail between the two houses (for example, three days), then there is no way +that Ann and Bill can guarantee a maximum delay for mail delivery between any of the cousin pairs. In a +similar manner, the services that a transport protocol can provide are often constrained by the service +model of the underlying network-layer protocol. If the network-layer protocol cannot provide delay or +bandwidth guarantees for transport-layer segments sent between hosts, then the transport-layer +protocol cannot provide delay or bandwidth guarantees for application messages sent between +processes. +Nevertheless, certain services can be offered by a transport protocol even when the underlying network +protocol doesn’t offer the corresponding service at the network layer. For example, as we’ll see in this +chapter, a transport protocol can offer reliable data transfer service to an application even when the +underlying network protocol is unreliable, that is, even when the network protocol loses, garbles, or +duplicates packets. As another example (which we’ll explore in Chapter 8 when we discuss network +security), a transport protocol can use encryption to guarantee that application messages are not read +by intruders, even when the network layer cannot guarantee the confidentiality of transport-layer +segments. + +3.1.2 Overview of the Transport Layer in the Internet +Recall that the Internet makes two distinct transport-layer protocols available to the application layer. +One of these protocols is UDP (User Datagram Protocol), which provides an unreliable, connectionless +service to the invoking application. The second of these protocols is TCP (Transmission Control +Protocol), which provides a reliable, connection-oriented service to the invoking application. When +designing a network application, the application developer must specify one of these two transport +protocols. As we saw in Section 2.7, the application developer selects between UDP and TCP when +creating sockets. +To simplify terminology, we refer to the transport-layer packet as a segment. We mention, however, that +the Internet literature (for example, the RFCs) also refers to the transport-layer packet for TCP as a +segment but often refers to the packet for UDP as a datagram. But this same Internet literature also +uses the term datagram for the network-layer packet! For an introductory book on computer networking +such as this, we believe that it is less confusing to refer to both TCP and UDP packets as segments, +and reserve the term datagram for the network-layer packet. + + Before proceeding with our brief introduction of UDP and TCP, it will be useful to say a few words about +the Internet’s network layer. (We’ll learn about the network layer in detail in Chapters 4 and 5.) The +Internet’s network-layer protocol has a name—IP, for Internet Protocol. IP provides logical +communication between hosts. The IP service model is a best-effort delivery service. 
This means that IP makes its “best effort” to deliver segments between communicating hosts, but it
+makes no guarantees. In particular, it does not guarantee segment delivery, it does not guarantee
+orderly delivery of segments, and it does not guarantee the integrity of the data in the segments. For
+these reasons, IP is said to be an unreliable service. We also mention here that every host has at
+least one network-layer address, a so-called IP address. We’ll examine IP addressing in detail in
+Chapter 4; for this chapter we need only keep in mind that each host has an IP address.
+Having taken a glimpse at the IP service model, let’s now summarize the service models provided by
+UDP and TCP. The most fundamental responsibility of UDP and TCP is to extend IP’s delivery service
+between two end systems to a delivery service between two processes running on the end systems.
+Extending host-to-host delivery to process-to-process delivery is called transport-layer multiplexing
+and demultiplexing. We’ll discuss transport-layer multiplexing and demultiplexing in the next section.
+UDP and TCP also provide integrity checking by including error-detection fields in their segments’
+headers. These two minimal transport-layer services—process-to-process data delivery and error
+checking—are the only two services that UDP provides! In particular, like IP, UDP is an unreliable
+service—it does not guarantee that data sent by one process will arrive intact (or at all!) to the
+destination process. UDP is discussed in detail in Section 3.3.
+TCP, on the other hand, offers several additional services to applications. First and foremost, it provides
+reliable data transfer. Using flow control, sequence numbers, acknowledgments, and timers
+(techniques we’ll explore in detail in this chapter), TCP ensures that data is delivered from sending
+process to receiving process, correctly and in order. TCP thus converts IP’s unreliable service between
+end systems into a reliable data transport service between processes. TCP also provides congestion
+control. Congestion control is not so much a service provided to the invoking application as it is a
+service for the Internet as a whole, a service for the general good. Loosely speaking, TCP congestion
+control prevents any one TCP connection from swamping the links and routers between communicating
+hosts with an excessive amount of traffic. TCP strives to give each connection traversing a congested
+link an equal share of the link bandwidth. This is done by regulating the rate at which the sending sides
+of TCP connections can send traffic into the network. UDP traffic, on the other hand, is unregulated. An
+application using UDP transport can send at any rate it pleases, for as long as it pleases.
+A protocol that provides reliable data transfer and congestion control is necessarily complex. We’ll need
+several sections to cover the principles of reliable data transfer and congestion control, and additional
+sections to cover the TCP protocol itself. These topics are investigated in Sections 3.4 through 3.8. The
+approach taken in this chapter is to alternate between basic principles and the TCP protocol. For
+example, we’ll first discuss reliable data transfer in a general setting and then discuss how TCP
+specifically provides reliable data transfer. Similarly, we’ll first discuss congestion control in a general
+setting and then discuss how TCP performs congestion control. But before getting into all this good
+stuff, let’s first look at transport-layer multiplexing and demultiplexing.
+
+3.2 Multiplexing and Demultiplexing
+In this section, we discuss transport-layer multiplexing and demultiplexing, that is, extending the
+host-to-host delivery service provided by the network layer to a process-to-process delivery service for
+applications running on the hosts. In order to keep the discussion concrete, we’ll discuss this basic
+transport-layer service in the context of the Internet. We emphasize, however, that a
+multiplexing/demultiplexing service is needed for all computer networks.
+At the destination host, the transport layer receives segments from the network layer just below. The
+transport layer has the responsibility of delivering the data in these segments to the appropriate
+application process running in the host. Let’s take a look at an example. Suppose you are sitting in front
+of your computer, and you are downloading Web pages while running one FTP session and two Telnet
+sessions. You therefore have four network application processes running—two Telnet processes, one
+FTP process, and one HTTP process. When the transport layer in your computer receives data from the
+network layer below, it needs to direct the received data to one of these four processes. Let’s now
+examine how this is done.
+First recall from Section 2.7 that a process (as part of a network application) can have one or more
+sockets, doors through which data passes from the network to the process and through which data
+passes from the process to the network. Thus, as shown in Figure 3.2, the transport layer in the
+receiving host does not actually deliver data directly to a process, but instead to an intermediary socket.
+Because at any given time there can be more than one socket in the receiving host, each socket has a
+unique identifier. The format of the identifier depends on whether the socket is a UDP or a TCP socket,
+as we’ll discuss shortly.
+Now let’s consider how a receiving host directs an incoming transport-layer segment to the appropriate
+socket. Each transport-layer segment has a set of fields in the segment for this purpose. At the receiving
+end, the transport layer examines these fields to identify the receiving socket and then directs the
+segment to that socket. This job of delivering the data in a transport-layer segment to the correct socket
+is called demultiplexing. The job of gathering data chunks at the source host from different sockets,
+encapsulating each data chunk with header information (that will later be used in demultiplexing) to
+create segments, and passing the segments to the network layer is called multiplexing.
+
+Figure 3.2 Transport-layer multiplexing and demultiplexing
+
+Note that the transport layer in the middle host in Figure 3.2 must demultiplex segments arriving from
+the network layer below to either process P1 or P2 above; this is done by directing the arriving
+segment’s data to the corresponding process’s socket. The transport layer in the middle host must also
+gather outgoing data from these sockets, form transport-layer segments, and pass these segments
+down to the network layer. Although we have introduced multiplexing and demultiplexing in the context
+of the Internet transport protocols, it’s important to realize that they are concerns whenever a single
+protocol at one layer (at the transport layer or elsewhere) is used by multiple protocols at the next
+higher layer.
+To illustrate the demultiplexing job, recall the household analogy in the previous section. Each of the
+kids is identified by his or her name. When Bill receives a batch of mail from the mail carrier, he
+performs a demultiplexing operation by observing to whom the letters are addressed and then hand
+delivering the mail to his brothers and sisters. Ann performs a multiplexing operation when she collects
+letters from her brothers and sisters and gives the collected mail to the mail person.
+Now that we understand the roles of transport-layer multiplexing and demultiplexing, let us examine how
+it is actually done in a host. From the discussion above, we know that transport-layer multiplexing
+requires (1) that sockets have unique identifiers, and (2) that each segment have special fields that
+indicate the socket to which the segment is to be delivered. These special fields, illustrated in Figure
+3.3, are the source port number field and the destination port number field. (The UDP and TCP
+segments have other fields as well, as discussed in the subsequent sections of this chapter.) Each port
+number is a 16-bit number, ranging from 0 to 65535. The port numbers ranging from 0 to 1023 are
+called well-known port numbers and are restricted, which means that they are reserved for use by
+well-known application protocols such as HTTP (which uses port number 80) and FTP (which uses port
+number 21). The list of well-known port numbers is given in RFC 1700 and is updated at
+http://www.iana.org [RFC 3232]. When we develop a new application (such as the simple application
+developed in Section 2.7), we must assign the application a port number.
+
+Figure 3.3 Source and destination port-number fields in a transport-layer segment
+
+It should now be clear how the transport layer could implement the demultiplexing service: Each socket
+in the host could be assigned a port number, and when a segment arrives at the host, the transport layer
+examines the destination port number in the segment and directs the segment to the corresponding
+socket. The segment’s data then passes through the socket into the attached process. As we’ll see, this
+is basically how UDP does it. However, we’ll also see that multiplexing/demultiplexing in TCP is yet
+more subtle.
+
+Connectionless Multiplexing and Demultiplexing
+Recall from Section 2.7.1 that the Python program running in a host can create a UDP socket with the
+line
+
+clientSocket = socket(AF_INET, SOCK_DGRAM)
+
+When a UDP socket is created in this manner, the transport layer automatically assigns a port number
+to the socket. In particular, the transport layer assigns a port number in the range 1024 to 65535 that is
+currently not being used by any other UDP port in the host. Alternatively, we can add a line into our
+Python program after we create the socket to associate a specific port number (say, 19157) to this UDP
+socket via the socket bind() method:
+
+clientSocket.bind(('', 19157))
+
+If the application developer writing the code were implementing the server side of a “well-known
+protocol,” then the developer would have to assign the corresponding well-known port number.
+Typically, the client side of the application lets the transport layer automatically (and transparently)
+assign the port number, whereas the server side of the application assigns a specific port number.
+With port numbers assigned to UDP sockets, we can now precisely describe UDP
+multiplexing/demultiplexing.
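+A quick illustration of the two assignment styles just described, automatic versus explicit;
+getsockname() is the standard-library call that reveals which port was chosen, and the addresses
+are placeholders:
+
+from socket import socket, AF_INET, SOCK_DGRAM
+
+# Automatic (typical client): the transport layer picks an unused ephemeral port.
+autoSocket = socket(AF_INET, SOCK_DGRAM)
+autoSocket.sendto(b'hello', ('localhost', 19157))  # sending triggers the implicit bind
+print(autoSocket.getsockname())    # e.g., ('0.0.0.0', 49152), some port in 1024-65535
+
+# Explicit (typical server of a well-known protocol): bind a specific port.
+fixedSocket = socket(AF_INET, SOCK_DGRAM)
+fixedSocket.bind(('', 19157))
+print(fixedSocket.getsockname())   # ('0.0.0.0', 19157)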
+Suppose a process in Host A, with UDP port 19157, wants to send a chunk of application data to a
+process with UDP port 46428 in Host B. The transport layer in Host A creates a transport-layer segment
+that includes the application data, the source port number (19157), the destination port number (46428),
+and two other values (which will be discussed later, but are unimportant for the current discussion). The
+transport layer then passes the resulting segment to the network layer. The network layer encapsulates
+the segment in an IP datagram and makes a best-effort attempt to deliver the segment to the receiving
+host. If the segment arrives at the receiving Host B, the transport layer at the receiving host examines
+the destination port number in the segment (46428) and delivers the segment to its socket identified by
+port 46428. Note that Host B could be running multiple processes, each with its own UDP socket and
+associated port number. As UDP segments arrive from the network, Host B directs (demultiplexes) each
+segment to the appropriate socket by examining the segment’s destination port number.
+It is important to note that a UDP socket is fully identified by a two-tuple consisting of a destination IP
+address and a destination port number. As a consequence, if two UDP segments have different source
+IP addresses and/or source port numbers, but have the same destination IP address and destination
+port number, then the two segments will be directed to the same destination process via the same
+destination socket.
+You may be wondering now, what is the purpose of the source port number? As shown in Figure 3.4, in
+the A-to-B segment the source port number serves as part of a “return address”—when B wants to send
+a segment back to A, the destination port in the B-to-A segment will take its value from the source port
+value of the A-to-B segment. (The complete return address is A’s IP address and the source port
+number.) As an example, recall the UDP server program studied in Section 2.7. In UDPServer.py,
+the server uses the recvfrom() method to extract the client-side (source) port number from the
+segment it receives from the client; it then sends a new segment to the client, with the extracted source
+port number serving as the destination port number in this new segment.
+
+Figure 3.4 The inversion of source and destination port numbers
+
+Connection-Oriented Multiplexing and Demultiplexing
+In order to understand TCP demultiplexing, we have to take a close look at TCP sockets and TCP
+connection establishment. One subtle difference between a TCP socket and a UDP socket is that a TCP
+socket is identified by a four-tuple: (source IP address, source port number, destination IP address,
+destination port number). Thus, when a TCP segment arrives from the network to a host, the host uses
+all four values to direct (demultiplex) the segment to the appropriate socket. In particular, and in contrast
+with UDP, two arriving TCP segments with different source IP addresses or source port numbers will
+(with the exception of a TCP segment carrying the original connection-establishment request) be
+directed to two different sockets. To gain further insight, let’s reconsider the TCP client-server
+programming example in Section 2.7.2:
+The TCP server application has a “welcoming socket” that waits for connection-establishment
+requests from TCP clients (see Figure 2.29) on port number 12000.
+The TCP client creates a socket and sends a connection establishment request segment with the +lines: + +clientSocket = socket(AF_INET, SOCK_STREAM) +clientSocket.connect((serverName,12000)) + +A connection-establishment request is nothing more than a TCP segment with destination port +number 12000 and a special connection-establishment bit set in the TCP header (discussed in +Section 3.5). The segment also includes a source port number that was chosen by the client. +When the host operating system of the computer running the server process receives the incoming + + connection-request segment with destination port 12000, it locates the server process that is waiting +to accept a connection on port number 12000. The server process then creates a new socket: +connectionSocket, addr = serverSocket.accept() + +Also, the transport layer at the server notes the following four values in the connection-request +segment: (1) the source port number in the segment, (2) the IP address of the source host, (3) the +destination port number in the segment, and (4) its own IP address. The newly created connection +socket is identified by these four values; all subsequently arriving segments whose source port, +source IP address, destination port, and destination IP address match these four values will be +demultiplexed to this socket. With the TCP connection now in place, the client and server can now +send data to each other. +The server host may support many simultaneous TCP connection sockets, with each socket attached to +a process, and with each socket identified by its own four-tuple. When a TCP segment arrives at the +host, all four fields (source IP address, source port, destination IP address, destination port) are used to +direct (demultiplex) the segment to the appropriate socket. + +FOCUS ON SECURITY +Port Scanning +We’ve seen that a server process waits patiently on an open port for contact by a remote client. +Some ports are reserved for well-known applications (e.g., Web, FTP, DNS, and SMTP servers); +other ports are used by convention by popular applications (e.g., the Microsoft 2000 SQL server +listens for requests on UDP port 1434). Thus, if we determine that a port is open on a host, we +may be able to map that port to a specific application running on the host. This is very useful for +system administrators, who are often interested in knowing which network applications are +running on the hosts in their networks. But attackers, in order to “case the joint,” also want to +know which ports are open on target hosts. If a host is found to be running an application with a +known security flaw (e.g., a SQL server listening on port 1434 was subject to a buffer overflow, +allowing a remote user to execute arbitrary code on the vulnerable host, a flaw exploited by the +Slammer worm [CERT 2003–04]), then that host is ripe for attack. +Determining which applications are listening on which ports is a relatively easy task. Indeed +there are a number of public domain programs, called port scanners, that do just that. Perhaps +the most widely used of these is nmap, freely available at http://nmap.org and included in most +Linux distributions. For TCP, nmap sequentially scans ports, looking for ports that are accepting +TCP connections. For UDP, nmap again sequentially scans ports, looking for UDP ports that +respond to transmitted UDP segments. In both cases, nmap returns a list of open, closed, or +unreachable ports. 
A host running nmap can attempt to scan any target host anywhere in the + + Internet. We’ll revisit nmap in Section 3.5.6, when we discuss TCP connection management. + +Figure 3.5 Two clients, using the same destination port number (80) to communicate with the +same Web server application + +The situation is illustrated in Figure 3.5, in which Host C initiates two HTTP sessions to server B, and +Host A initiates one HTTP session to B. Hosts A and C and server B each have their own unique IP +address—A, C, and B, respectively. Host C assigns two different source port numbers (26145 and 7532) +to its two HTTP connections. Because Host A is choosing source port numbers independently of C, it +might also assign a source port of 26145 to its HTTP connection. But this is not a problem—server B will +still be able to correctly demultiplex the two connections having the same source port number, since the +two connections have different source IP addresses. +Web Servers and TCP +Before closing this discussion, it’s instructive to say a few additional words about Web servers and how +they use port numbers. Consider a host running a Web server, such as an Apache Web server, on port +80. When clients (for example, browsers) send segments to the server, all segments will have +destination port 80. In particular, both the initial connection-establishment segments and the segments +carrying HTTP request messages will have destination port 80. As we have just described, the server +distinguishes the segments from the different clients using source IP addresses and source port + + numbers. +Figure 3.5 shows a Web server that spawns a new process for each connection. As shown in Figure +3.5, each of these processes has its own connection socket through which HTTP requests arrive and +HTTP responses are sent. We mention, however, that there is not always a one-to-one correspondence +between connection sockets and processes. In fact, today’s high-performing Web servers often use only +one process, and create a new thread with a new connection socket for each new client connection. (A +thread can be viewed as a lightweight subprocess.) If you did the first programming assignment in +Chapter 2, you built a Web server that does just this. For such a server, at any given time there may be +many connection sockets (with different identifiers) attached to the same process. +If the client and server are using persistent HTTP, then throughout the duration of the persistent +connection the client and server exchange HTTP messages via the same server socket. However, if the +client and server use non-persistent HTTP, then a new TCP connection is created and closed for every +request/response, and hence a new socket is created and later closed for every request/response. This +frequent creating and closing of sockets can severely impact the performance of a busy Web server +(although a number of operating system tricks can be used to mitigate the problem). Readers interested +in the operating system issues surrounding persistent and non-persistent HTTP are encouraged to see +[Nielsen 1997; Nahum 2002]. +Now that we’ve discussed transport-layer multiplexing and demultiplexing, let’s move on and discuss +one of the Internet’s transport protocols, UDP. In the next section we’ll see that UDP adds little more to +the network-layer protocol than a multiplexing/demultiplexing service. + + 3.3 Connectionless Transport: UDP +In this section, we’ll take a close look at UDP, how it works, and what it does. 
+3.3 Connectionless Transport: UDP
+In this section, we’ll take a close look at UDP, how it works, and what it does. We encourage you to
+refer back to Section 2.1, which includes an overview of the UDP service model, and to Section 2.7.1,
+which discusses socket programming using UDP.
+To motivate our discussion about UDP, suppose you were interested in designing a no-frills, bare-bones
+transport protocol. How might you go about doing this? You might first consider using a vacuous
+transport protocol. In particular, on the sending side, you might consider taking the messages from the
+application process and passing them directly to the network layer; and on the receiving side, you might
+consider taking the messages arriving from the network layer and passing them directly to the
+application process. But as we learned in the previous section, we have to do a little more than nothing!
+At the very least, the transport layer has to provide a multiplexing/demultiplexing service in order to pass
+data between the network layer and the correct application-level process.
+UDP, defined in [RFC 768], does just about as little as a transport protocol can do. Aside from the
+multiplexing/demultiplexing function and some light error checking, it adds nothing to IP. In fact, if the
+application developer chooses UDP instead of TCP, then the application is almost directly talking with
+IP. UDP takes messages from the application process, attaches source and destination port number
+fields for the multiplexing/demultiplexing service, adds two other small fields, and passes the resulting
+segment to the network layer. The network layer encapsulates the transport-layer segment into an IP
+datagram and then makes a best-effort attempt to deliver the segment to the receiving host. If the
+segment arrives at the receiving host, UDP uses the destination port number to deliver the segment’s
+data to the correct application process. Note that with UDP there is no handshaking between sending
+and receiving transport-layer entities before sending a segment. For this reason, UDP is said to be
+connectionless.
+DNS is an example of an application-layer protocol that typically uses UDP. When the DNS application
+in a host wants to make a query, it constructs a DNS query message and passes the message to UDP.
+Without performing any handshaking with the UDP entity running on the destination end system, the
+host-side UDP adds header fields to the message and passes the resulting segment to the network
+layer. The network layer encapsulates the UDP segment into a datagram and sends the datagram to a
+name server. The DNS application at the querying host then waits for a reply to its query. If it doesn’t
+receive a reply (possibly because the underlying network lost the query or the reply), it might try
+resending the query, try sending the query to another name server, or inform the invoking application
+that it can’t get a reply.
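+That wait-then-resend-or-give-up logic is exactly the kind of thing an application running over
+UDP must code for itself. A minimal sketch; the server address is a placeholder and the query
+bytes are a stand-in, since a real client would build an actual DNS message:
+
+from socket import socket, AF_INET, SOCK_DGRAM, timeout
+
+server = ('203.0.113.5', 53)            # placeholder name-server address
+query = b'stand-in for a DNS query'
+clientSocket = socket(AF_INET, SOCK_DGRAM)
+clientSocket.settimeout(1.0)            # wait at most one second per attempt
+for attempt in range(3):                # resend a few times, then give up
+    clientSocket.sendto(query, server)
+    try:
+        reply, addr = clientSocket.recvfrom(2048)
+        print('got reply:', reply)
+        break
+    except timeout:
+        print('attempt', attempt + 1, 'timed out; resending')
+else:
+    print('no reply; inform the invoking application')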
+Now you might be wondering why an application developer would ever choose to build an application
+over UDP rather than over TCP. Isn’t TCP always preferable, since TCP provides a reliable data
+transfer service, while UDP does not? The answer is no, as some applications are better suited for UDP
+for the following reasons:
+Finer application-level control over what data is sent, and when. Under UDP, as soon as an
+application process passes data to UDP, UDP will package the data inside a UDP segment and
+immediately pass the segment to the network layer. TCP, on the other hand, has a congestion-control
+mechanism that throttles the transport-layer TCP sender when one or more links between the
+source and destination hosts become excessively congested. TCP will also continue to resend a
+segment until the receipt of the segment has been acknowledged by the destination, regardless of
+how long reliable delivery takes. Since real-time applications often require a minimum sending rate,
+do not want to overly delay segment transmission, and can tolerate some data loss, TCP’s service
+model is not particularly well matched to these applications’ needs. As discussed below, these
+applications can use UDP and implement, as part of the application, any additional functionality that
+is needed beyond UDP’s no-frills segment-delivery service.
+No connection establishment. As we’ll discuss later, TCP uses a three-way handshake before it
+starts to transfer data. UDP just blasts away without any formal preliminaries. Thus UDP does not
+introduce any delay to establish a connection. This is probably the principal reason why DNS runs
+over UDP rather than TCP—DNS would be much slower if it ran over TCP. HTTP uses TCP rather
+than UDP, since reliability is critical for Web pages with text. But, as we briefly discussed in Section
+2.2, the TCP connection-establishment delay in HTTP is an important contributor to the delays
+associated with downloading Web documents. Indeed, the QUIC protocol (Quick UDP Internet
+Connection, [Iyengar 2015]), used in Google’s Chrome browser, uses UDP as its underlying
+transport protocol and implements reliability in an application-layer protocol on top of UDP.
+No connection state. TCP maintains connection state in the end systems. This connection state
+includes receive and send buffers, congestion-control parameters, and sequence and
+acknowledgment number parameters. We will see in Section 3.5 that this state information is
+needed to implement TCP’s reliable data transfer service and to provide congestion control. UDP, on
+the other hand, does not maintain connection state and does not track any of these parameters. For
+this reason, a server devoted to a particular application can typically support many more active
+clients when the application runs over UDP rather than TCP.
+Small packet header overhead. The TCP segment has 20 bytes of header overhead in every
+segment, whereas UDP has only 8 bytes of overhead.
+Figure 3.6 lists popular Internet applications and the transport protocols that they use. As we expect,
+e-mail, remote terminal access, the Web, and file transfer run over TCP—all these applications need the
+reliable data transfer service of TCP. Nevertheless, many important applications run over UDP rather
+than TCP. For example, UDP is used to carry network management (SNMP; see Section 5.7) data.
+UDP is preferred to TCP in this case, since network management applications must often run when the
+network is in a stressed state—precisely when reliable, congestion-controlled data transfer is difficult to
+achieve. Also, as we mentioned earlier, DNS runs over UDP, thereby avoiding TCP’s
+connection-establishment delays.
+As shown in Figure 3.6, both UDP and TCP are sometimes used today with multimedia applications,
+such as Internet phone, real-time video conferencing, and streaming of stored audio and video. We’ll
+take a close look at these applications in Chapter 9.
We just mention now that all of these applications can tolerate a small amount of packet loss, so that reliable data transfer is not absolutely critical for the application's success. Furthermore, real-time applications, like Internet phone and video conferencing, react very poorly to TCP's congestion control. For these reasons, developers of multimedia applications may choose to run their applications over UDP instead of TCP. When packet loss rates are low, and with some organizations blocking UDP traffic for security reasons (see Chapter 8), TCP becomes an increasingly attractive protocol for streaming media transport.

Figure 3.6 Popular Internet applications and their underlying transport protocols

Although commonly done today, running multimedia applications over UDP is controversial. As we mentioned above, UDP has no congestion control. But congestion control is needed to prevent the network from entering a congested state in which very little useful work is done. If everyone were to start streaming high-bit-rate video without using any congestion control, there would be so much packet overflow at routers that very few UDP packets would successfully traverse the source-to-destination path. Moreover, the high loss rates induced by the uncontrolled UDP senders would cause the TCP senders (which, as we'll see, do decrease their sending rates in the face of congestion) to dramatically decrease their rates. Thus, the lack of congestion control in UDP can result in high loss rates between a UDP sender and receiver, and the crowding out of TCP sessions—a potentially serious problem [Floyd 1999]. Many researchers have proposed new mechanisms to force all sources, including UDP sources, to perform adaptive congestion control [Mahdavi 1997; Floyd 2000; Kohler 2006; RFC 4340].

Before discussing the UDP segment structure, we mention that it is possible for an application to have reliable data transfer when using UDP. This can be done if reliability is built into the application itself (for example, by adding acknowledgment and retransmission mechanisms, such as those we'll study in the next section). We mentioned earlier that the QUIC protocol [Iyengar 2015] used in Google's Chrome browser implements reliability in an application-layer protocol on top of UDP. But this is a nontrivial task that would keep an application developer busy debugging for a long time. Nevertheless, building reliability directly into the application allows the application to "have its cake and eat it too." That is, application processes can communicate reliably without being subjected to the transmission-rate constraints imposed by TCP's congestion-control mechanism.

3.3.1 UDP Segment Structure

The UDP segment structure, shown in Figure 3.7, is defined in RFC 768. The application data occupies the data field of the UDP segment. For example, for DNS, the data field contains either a query message or a response message. For a streaming audio application, audio samples fill the data field. The UDP header has only four fields, each consisting of two bytes. As discussed in the previous section, the port numbers allow the destination host to pass the application data to the correct process running on the destination end system (that is, to perform the demultiplexing function). The length field specifies the number of bytes in the UDP segment (header plus data). An explicit length value is needed since the size of the data field may differ from one UDP segment to the next.
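To make the four-field header concrete, here is a minimal sketch (ours, not from the text) of packing a UDP header in Python; the port values are hypothetical, and the checksum computation is deferred to Section 3.3.2.

import struct

def make_udp_header(src_port, dst_port, payload, checksum=0):
    # Four 16-bit fields: source port, destination port,
    # length (header plus data, in bytes), and checksum.
    length = 8 + len(payload)          # the UDP header is always 8 bytes
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)

# Hypothetical example: a DNS query carried in a UDP segment
header = make_udp_header(50000, 53, b"...query bytes...")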
The checksum is used by the receiving host to check whether errors have been introduced into the segment. In truth, the checksum is also calculated over a few of the fields in the IP header in addition to the UDP segment. But we ignore this detail in order to see the forest through the trees. We'll discuss the checksum calculation below. Basic principles of error detection are described in Section 6.2.

3.3.2 UDP Checksum

The UDP checksum provides for error detection. That is, the checksum is used to determine whether bits within the UDP segment have been altered (for example, by noise in the links or while stored in a router) as it moved from source to destination.

Figure 3.7 UDP segment structure

UDP at the sender side performs the 1s complement of the sum of all the 16-bit words in the segment, with any overflow encountered during the sum being wrapped around. This result is put in the checksum field of the UDP segment. Here we give a simple example of the checksum calculation. You can find details about efficient implementation of the calculation in RFC 1071 and performance over real data in [Stone 1998; Stone 2000]. As an example, suppose that we have the following three 16-bit words:

0110011001100000
0101010101010101
1000111100001100

The sum of the first two of these 16-bit words is

0110011001100000
0101010101010101
1011101110110101

Adding the third word to the above sum gives

1011101110110101
1000111100001100
0100101011000010

Note that this last addition had overflow, which was wrapped around. The 1s complement is obtained by converting all the 0s to 1s and converting all the 1s to 0s. Thus the 1s complement of the sum 0100101011000010 is 1011010100111101, which becomes the checksum. At the receiver, all four 16-bit words are added, including the checksum. If no errors are introduced into the packet, then clearly the sum at the receiver will be 1111111111111111. If one of the bits is a 0, then we know that errors have been introduced into the packet.

You may wonder why UDP provides a checksum in the first place, as many link-layer protocols (including the popular Ethernet protocol) also provide error checking. The reason is that there is no guarantee that all the links between source and destination provide error checking; that is, one of the links may use a link-layer protocol that does not provide error checking. Furthermore, even if segments are correctly transferred across a link, it's possible that bit errors could be introduced when a segment is stored in a router's memory. Given that neither link-by-link reliability nor in-memory error detection is guaranteed, UDP must provide error detection at the transport layer, on an end-end basis, if the end-end data transfer service is to provide error detection. This is an example of the celebrated end-end principle in system design [Saltzer 1984], which states that since certain functionality (error detection, in this case) must be implemented on an end-end basis: "functions placed at the lower levels may be redundant or of little value when compared to the cost of providing them at the higher level."

Because IP is supposed to run over just about any layer-2 protocol, it is useful for the transport layer to provide error checking as a safety measure. Although UDP provides error checking, it does not do anything to recover from an error.
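Before moving on, note that the wrap-around-and-complement computation just illustrated is easy to express in code. The following is a minimal sketch in Python (ours, not the book's); it assumes any odd-length data is padded with a zero byte.

import struct

def internet_checksum(data: bytes) -> int:
    # Sum the data as 16-bit words, wrapping any overflow
    # back into the low-order bits (1s-complement addition).
    if len(data) % 2:
        data += b"\x00"                            # pad to an even length
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # wrap around overflow
    return ~total & 0xFFFF                         # 1s complement of the sum

# The three 16-bit words from the example above:
words = bytes.fromhex("6660" "5555" "8f0c")
print(format(internet_checksum(words), "016b"))    # prints 1011010100111101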
Some implementations of UDP simply discard the damaged segment; others pass the damaged segment to the application with a warning.

That wraps up our discussion of UDP. We will soon see that TCP offers reliable data transfer to its applications as well as other services that UDP doesn't offer. Naturally, TCP is also more complex than UDP. Before discussing TCP, however, it will be useful to step back and first discuss the underlying principles of reliable data transfer.

3.4 Principles of Reliable Data Transfer

In this section, we consider the problem of reliable data transfer in a general context. This is appropriate since the problem of implementing reliable data transfer occurs not only at the transport layer, but also at the link layer and the application layer as well. The general problem is thus of central importance to networking. Indeed, if one had to identify a "top-ten" list of fundamentally important problems in all of networking, this would be a candidate to lead the list. In the next section we'll examine TCP and show, in particular, that TCP exploits many of the principles that we are about to describe.

Figure 3.8 illustrates the framework for our study of reliable data transfer. The service abstraction provided to the upper-layer entities is that of a reliable channel through which data can be transferred. With a reliable channel, no transferred data bits are corrupted (flipped from 0 to 1, or vice versa) or lost, and all are delivered in the order in which they were sent. This is precisely the service model offered by TCP to the Internet applications that invoke it.

It is the responsibility of a reliable data transfer protocol to implement this service abstraction. This task is made difficult by the fact that the layer below the reliable data transfer protocol may be unreliable. For example, TCP is a reliable data transfer protocol that is implemented on top of an unreliable (IP) end-to-end network layer. More generally, the layer beneath the two reliably communicating end points might consist of a single physical link (as in the case of a link-level data transfer protocol) or a global internetwork (as in the case of a transport-level protocol). For our purposes, however, we can view this lower layer simply as an unreliable point-to-point channel.

In this section, we will incrementally develop the sender and receiver sides of a reliable data transfer protocol, considering increasingly complex models of the underlying channel. For example, we'll consider what protocol mechanisms are needed when the underlying channel can corrupt bits or lose entire packets. One assumption we'll adopt throughout our discussion here is that packets will be delivered in the order in which they were sent, with some packets possibly being lost; that is, the underlying channel will not reorder packets.

Figure 3.8 Reliable data transfer: Service model and service implementation

Figure 3.8(b) illustrates the interfaces for our data transfer protocol. The sending side of the data transfer protocol will be invoked from above by a call to rdt_send(). It will pass the data to be delivered to the upper layer at the receiving side. (Here rdt stands for reliable data transfer protocol and _send indicates that the sending side of rdt is being called. The first step in developing any protocol is to choose a good name!) On the receiving side, rdt_rcv() will be called when a packet arrives from the receiving side of the channel.
When the rdt protocol wants to deliver data to the upper layer, it will do so by calling +deliver_data() . In the following we use the terminology “packet” rather than transport-layer +“segment.” Because the theory developed in this section applies to computer networks in general and +not just to the Internet transport layer, the generic term “packet” is perhaps more appropriate here. +In this section we consider only the case of unidirectional data transfer, that is, data transfer from the +sending to the receiving side. The case of reliable bidirectional (that is, full-duplex) data transfer is +conceptually no more difficult but considerably more tedious to explain. Although we consider only +unidirectional data transfer, it is important to note that the sending and receiving sides of our protocol +will nonetheless need to transmit packets in both directions, as indicated in Figure 3.8. We will see +shortly that, in addition to exchanging packets containing the data to be transferred, the sending and +receiving sides of rdt will also need to exchange control packets back and forth. Both the send and +receive sides of rdt send packets to the other side by a call to udt_send() (where udt stands for +unreliable data transfer). + +3.4.1 Building a Reliable Data Transfer Protocol +We now step through a series of protocols, each one becoming more complex, arriving at a flawless, +reliable data transfer protocol. +Reliable Data Transfer over a Perfectly Reliable Channel: rdt1.0 +We first consider the simplest case, in which the underlying channel is completely reliable. The protocol +itself, which we’ll call rdt1.0 , is trivial. The finite-state machine (FSM) definitions for the rdt1.0 +sender and receiver are shown in Figure 3.9. The FSM in Figure 3.9(a) defines the operation of the +sender, while the FSM in Figure 3.9(b) defines the operation of the receiver. It is important to note that +there are separate FSMs for the sender and for the receiver. The sender and receiver FSMs in Figure +3.9 each have just one state. The arrows in the FSM description indicate the transition of the protocol +from one state to another. (Since each FSM in Figure 3.9 has just one state, a transition is necessarily +from the one state back to itself; we’ll see more complicated state diagrams shortly.) The event causing + + the transition is shown above the horizontal line labeling the transition, and the actions taken when the +event occurs are shown below the horizontal line. When no action is taken on an event, or no event +occurs and an action is taken, we’ll use the symbol Λ below or above the horizontal, respectively, to +explicitly denote the lack of an action or event. The initial state of the FSM is indicated by the dashed +arrow. Although the FSMs in Figure 3.9 have but one state, the FSMs we will see shortly have multiple +states, so it will be important to identify the initial state of each FSM. +The sending side of rdt simply accepts data from the upper layer via the rdt_send(data) event, +creates a packet containing the data (via the action make_pkt(data) ) and sends the packet into the +channel. In practice, the rdt_send(data) event would result from a procedure call (for example, to +rdt_send() ) by the upper-layer application. 
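Before looking at the receiving side, here is a minimal sketch (our own illustration, not code from the text) of how these two one-state FSMs might be rendered in Python; make_pkt, extract, udt_send, and deliver_data stand in for the actions named in Figure 3.9.

# A sketch of rdt1.0, assuming a perfectly reliable channel.
# The helper functions mirror the FSM actions in Figure 3.9.

def make_pkt(data):                    # wrap application data in a packet
    return {"data": data}

def extract(packet):                   # pull the application data back out
    return packet["data"]

def rdt_send(data, udt_send):          # sender FSM: one state, one transition
    packet = make_pkt(data)
    udt_send(packet)                   # pass the packet into the channel

def rdt_rcv(packet, deliver_data):     # receiver FSM: one state, one transition
    data = extract(packet)
    deliver_data(data)                 # hand the data to the upper layer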
+ +Figure 3.9 rdt1.0 – A protocol for a completely reliable channel + +On the receiving side, rdt receives a packet from the underlying channel via the rdt_rcv(packet) +event, removes the data from the packet (via the action extract (packet, data) ) and passes the +data up to the upper layer (via the action deliver_data(data) ). In practice, the +rdt_rcv(packet) event would result from a procedure call (for example, to rdt_rcv() ) from the +lower-layer protocol. +In this simple protocol, there is no difference between a unit of data and a packet. Also, all packet flow is +from the sender to receiver; with a perfectly reliable channel there is no need for the receiver side to +provide any feedback to the sender since nothing can go wrong! Note that we have also assumed that + + the receiver is able to receive data as fast as the sender happens to send data. Thus, there is no need +for the receiver to ask the sender to slow down! +Reliable Data Transfer over a Channel with Bit Errors: rdt2.0 +A more realistic model of the underlying channel is one in which bits in a packet may be corrupted. Such +bit errors typically occur in the physical components of a network as a packet is transmitted, propagates, +or is buffered. We’ll continue to assume for the moment that all transmitted packets are received +(although their bits may be corrupted) in the order in which they were sent. +Before developing a protocol for reliably communicating over such a channel, first consider how people +might deal with such a situation. Consider how you yourself might dictate a long message over the +phone. In a typical scenario, the message taker might say “OK” after each sentence has been heard, +understood, and recorded. If the message taker hears a garbled sentence, you’re asked to repeat the +garbled sentence. This message-dictation protocol uses both positive acknowledgments (“OK”) and +negative acknowledgments (“Please repeat that.”). These control messages allow the receiver to let +the sender know what has been received correctly, and what has been received in error and thus +requires repeating. In a computer network setting, reliable data transfer protocols based on such +retransmission are known as ARQ (Automatic Repeat reQuest) protocols. +Fundamentally, three additional protocol capabilities are required in ARQ protocols to handle the +presence of bit errors: +Error detection. First, a mechanism is needed to allow the receiver to detect when bit errors have +occurred. Recall from the previous section that UDP uses the Internet checksum field for exactly this +purpose. In Chapter 6 we’ll examine error-detection and -correction techniques in greater detail; +these techniques allow the receiver to detect and possibly correct packet bit errors. For now, we +need only know that these techniques require that extra bits (beyond the bits of original data to be +transferred) be sent from the sender to the receiver; these bits will be gathered into the packet +checksum field of the rdt2.0 data packet. +Receiver feedback. Since the sender and receiver are typically executing on different end systems, +possibly separated by thousands of miles, the only way for the sender to learn of the receiver’s view +of the world (in this case, whether or not a packet was received correctly) is for the receiver to +provide explicit feedback to the sender. The positive (ACK) and negative (NAK) acknowledgment +replies in the message-dictation scenario are examples of such feedback. 
Our rdt2.0 protocol will similarly send ACK and NAK packets back from the receiver to the sender. In principle, these packets need only be one bit long; for example, a 0 value could indicate a NAK and a value of 1 could indicate an ACK.

Retransmission. A packet that is received in error at the receiver will be retransmitted by the sender.

Figure 3.10 shows the FSM representation of rdt2.0, a data transfer protocol employing error detection, positive acknowledgments, and negative acknowledgments.

Figure 3.10 rdt2.0 – A protocol for a channel with bit errors

The send side of rdt2.0 has two states. In the leftmost state, the send-side protocol is waiting for data to be passed down from the upper layer. When the rdt_send(data) event occurs, the sender will create a packet (sndpkt) containing the data to be sent, along with a packet checksum (for example, as discussed in Section 3.3.2 for the case of a UDP segment), and then send the packet via the udt_send(sndpkt) operation. In the rightmost state, the sender protocol is waiting for an ACK or a NAK packet from the receiver. If an ACK packet is received (the notation rdt_rcv(rcvpkt) && isACK(rcvpkt) in Figure 3.10 corresponds to this event), the sender knows that the most recently transmitted packet has been received correctly and thus the protocol returns to the state of waiting for data from the upper layer. If a NAK is received, the protocol retransmits the last packet and waits for an ACK or NAK to be returned by the receiver in response to the retransmitted data packet. It is important to note that when the sender is in the wait-for-ACK-or-NAK state, it cannot get more data from the upper layer; that is, the rdt_send() event cannot occur; that will happen only after the sender receives an ACK and leaves this state. Thus, the sender will not send a new piece of data until it is sure that the receiver has correctly received the current packet. Because of this behavior, protocols such as rdt2.0 are known as stop-and-wait protocols.

The receiver-side FSM for rdt2.0 still has a single state. On packet arrival, the receiver replies with either an ACK or a NAK, depending on whether or not the received packet is corrupted. In Figure 3.10, the notation rdt_rcv(rcvpkt) && corrupt(rcvpkt) corresponds to the event in which a packet is received and is found to be in error.

Protocol rdt2.0 may look as if it works but, unfortunately, it has a fatal flaw. In particular, we haven't accounted for the possibility that the ACK or NAK packet could be corrupted! (Before proceeding on, you should think about how this problem may be fixed.) Unfortunately, our slight oversight is not as innocuous as it may seem. Minimally, we will need to add checksum bits to ACK/NAK packets in order to detect such errors. The more difficult question is how the protocol should recover from errors in ACK or NAK packets. The difficulty here is that if an ACK or NAK is corrupted, the sender has no way of knowing whether or not the receiver has correctly received the last piece of transmitted data.

Consider three possibilities for handling corrupted ACKs or NAKs:

For the first possibility, consider what a human might do in the message-dictation scenario. If the speaker didn't understand the "OK" or "Please repeat that" reply from the receiver, the speaker would probably ask, "What did you say?" (thus introducing a new type of sender-to-receiver packet to our protocol). The receiver would then repeat the reply.
But what if the speaker's "What did you say?" is corrupted? The receiver, having no idea whether the garbled sentence was part of the dictation or a request to repeat the last reply, would probably then respond with "What did you say?" And then, of course, that response might be garbled. Clearly, we're heading down a difficult path.

A second alternative is to add enough checksum bits to allow the sender not only to detect, but also to recover from, bit errors. This solves the immediate problem for a channel that can corrupt packets but not lose them.

A third approach is for the sender simply to resend the current data packet when it receives a garbled ACK or NAK packet. This approach, however, introduces duplicate packets into the sender-to-receiver channel. The fundamental difficulty with duplicate packets is that the receiver doesn't know whether the ACK or NAK it last sent was received correctly at the sender. Thus, it cannot know a priori whether an arriving packet contains new data or is a retransmission!

A simple solution to this new problem (and one adopted in almost all existing data transfer protocols, including TCP) is to add a new field to the data packet and have the sender number its data packets by putting a sequence number into this field. The receiver then need only check this sequence number to determine whether or not the received packet is a retransmission. For this simple case of a stop-and-wait protocol, a 1-bit sequence number will suffice, since it will allow the receiver to know whether the sender is resending the previously transmitted packet (the received packet carries the same sequence number as the most recently received packet) or a new packet (the sequence number changes, moving "forward" in modulo-2 arithmetic). Since we are currently assuming a channel that does not lose packets, ACK and NAK packets do not themselves need to indicate the sequence number of the packet they are acknowledging. The sender knows that a received ACK or NAK packet (whether garbled or not) was generated in response to its most recently transmitted data packet.

Figures 3.11 and 3.12 show the FSM description for rdt2.1, our fixed version of rdt2.0. The rdt2.1 sender and receiver FSMs each now have twice as many states as before. This is because the protocol state must now reflect whether the packet currently being sent (by the sender) or expected (at the receiver) should have a sequence number of 0 or 1. Note that the actions in those states where a 0-numbered packet is being sent or expected are mirror images of those where a 1-numbered packet is being sent or expected; the only differences have to do with the handling of the sequence number.

Figure 3.11 rdt2.1 sender

Figure 3.12 rdt2.1 receiver

Protocol rdt2.1 uses both positive and negative acknowledgments from the receiver to the sender. When an out-of-order packet is received, the receiver sends a positive acknowledgment for the packet it has received. When a corrupted packet is received, the receiver sends a negative acknowledgment. We can accomplish the same effect as a NAK if, instead of sending a NAK, we send an ACK for the last correctly received packet. A sender that receives two ACKs for the same packet (that is, receives duplicate ACKs) knows that the receiver did not correctly receive the packet following the packet that is being ACKed twice.
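To see how a 1-bit sequence number lets a NAK-free receiver recognize a retransmission, here is a minimal sketch of the receiver-side check (our own illustration; corrupt, make_ack, udt_send, and deliver_data are hypothetical helpers standing in for the FSM actions):

# Receiver-side duplicate detection with a 1-bit sequence number.

expected_seq = 0                               # 0 or 1, alternating

def rdt_rcv(packet):
    global expected_seq
    if corrupt(packet):
        # Re-ACK the last correctly received packet (at start-up,
        # a default initial ACK would be needed; we gloss over that).
        udt_send(make_ack(1 - expected_seq))
    elif packet["seq"] == expected_seq:        # new data: deliver and ACK it
        deliver_data(packet["data"])
        udt_send(make_ack(expected_seq))
        expected_seq = 1 - expected_seq        # advance modulo 2
    else:                                      # duplicate: re-ACK it
        udt_send(make_ack(packet["seq"]))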
Our NAK-free reliable data transfer protocol for a channel with bit errors is rdt2.2, shown in Figures 3.13 and 3.14. One subtle change between rdt2.1 and rdt2.2 is that the receiver must now include the sequence number of the packet being acknowledged by an ACK message (this is done by including the ACK, 0 or ACK, 1 argument in make_pkt() in the receiver FSM), and the sender must now check the sequence number of the packet being acknowledged by a received ACK message (this is done by including the 0 or 1 argument in isACK() in the sender FSM).

Figure 3.13 rdt2.2 sender

Figure 3.14 rdt2.2 receiver

Reliable Data Transfer over a Lossy Channel with Bit Errors: rdt3.0

Suppose now that in addition to corrupting bits, the underlying channel can lose packets as well, a not-uncommon event in today's computer networks (including the Internet). Two additional concerns must now be addressed by the protocol: how to detect packet loss and what to do when packet loss occurs. The use of checksumming, sequence numbers, ACK packets, and retransmissions—the techniques already developed in rdt2.2—will allow us to answer the latter concern. Handling the first concern will require adding a new protocol mechanism.

There are many possible approaches toward dealing with packet loss (several more of which are explored in the exercises at the end of the chapter). Here, we'll put the burden of detecting and recovering from lost packets on the sender. Suppose that the sender transmits a data packet and either that packet, or the receiver's ACK of that packet, gets lost. In either case, no reply is forthcoming at the sender from the receiver. If the sender is willing to wait long enough so that it is certain that a packet has been lost, it can simply retransmit the data packet. You should convince yourself that this protocol does indeed work.

But how long must the sender wait to be certain that something has been lost? The sender must clearly wait at least as long as a round-trip delay between the sender and receiver (which may include buffering at intermediate routers) plus whatever amount of time is needed to process a packet at the receiver. In many networks, this worst-case maximum delay is very difficult even to estimate, much less know with certainty. Moreover, the protocol should ideally recover from packet loss as soon as possible; waiting for a worst-case delay could mean a long wait until error recovery is initiated. The approach thus adopted in practice is for the sender to judiciously choose a time value such that packet loss is likely, although not guaranteed, to have happened. If an ACK is not received within this time, the packet is retransmitted. Note that if a packet experiences a particularly large delay, the sender may retransmit the packet even though neither the data packet nor its ACK have been lost. This introduces the possibility of duplicate data packets in the sender-to-receiver channel. Happily, protocol rdt2.2 already has enough functionality (that is, sequence numbers) to handle the case of duplicate packets.

From the sender's viewpoint, retransmission is a panacea. The sender does not know whether a data packet was lost, an ACK was lost, or if the packet or ACK was simply overly delayed. In all cases, the action is the same: retransmit. Implementing a time-based retransmission mechanism requires a countdown timer that can interrupt the sender after a given amount of time has expired.
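Such a countdown timer might be sketched as follows; this is a hypothetical illustration using Python's threading.Timer, not code from the text, and it supports exactly the start, interrupt, and stop operations described next.

import threading

# A minimal countdown timer for a stop-and-wait sender (a sketch).
# timeout_handler would retransmit the outstanding packet.

class CountdownTimer:
    def __init__(self, interval, timeout_handler):
        self.interval = interval          # seconds until the interrupt fires
        self.handler = timeout_handler
        self._timer = None

    def start(self):                      # (re)start on each transmission
        self.stop()
        self._timer = threading.Timer(self.interval, self.handler)
        self._timer.start()

    def stop(self):                       # cancel when an ACK arrives
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None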
The sender will thus need to be able to (1) start the timer each time a packet (either a first-time packet or a retransmission) is sent, (2) respond to a timer interrupt (taking appropriate actions), and (3) stop the timer.

Figure 3.15 shows the sender FSM for rdt3.0, a protocol that reliably transfers data over a channel that can corrupt or lose packets; in the homework problems, you'll be asked to provide the receiver FSM for rdt3.0.

Figure 3.15 rdt3.0 sender

Figure 3.16 shows how the protocol operates with no lost or delayed packets and how it handles lost data packets. In Figure 3.16, time moves forward from the top of the diagram toward the bottom of the diagram; note that a receive time for a packet is necessarily later than the send time for a packet as a result of transmission and propagation delays. In Figures 3.16(b)–(d), the send-side brackets indicate the times at which a timer is set and later times out. Several of the more subtle aspects of this protocol are explored in the exercises at the end of this chapter. Because packet sequence numbers alternate between 0 and 1, protocol rdt3.0 is sometimes known as the alternating-bit protocol.

We have now assembled the key elements of a data transfer protocol. Checksums, sequence numbers, timers, and positive and negative acknowledgment packets each play a crucial and necessary role in the operation of the protocol. We now have a working reliable data transfer protocol!

3.4.2 Pipelined Reliable Data Transfer Protocols

Protocol rdt3.0 is a functionally correct protocol, but it is unlikely that anyone would be happy with its performance, particularly in today's high-speed networks. At the heart of rdt3.0's performance problem is the fact that it is a stop-and-wait protocol.

Figure 3.16 Operation of rdt3.0, the alternating-bit protocol

Figure 3.17 Stop-and-wait versus pipelined protocol

To appreciate the performance impact of this stop-and-wait behavior, consider an idealized case of two hosts, one located on the West Coast of the United States and the other located on the East Coast, as shown in Figure 3.17. The speed-of-light round-trip propagation delay between these two end systems, RTT, is approximately 30 milliseconds. Suppose that they are connected by a channel with a transmission rate, R, of 1 Gbps (10^9 bits per second). With a packet size, L, of 1,000 bytes (8,000 bits) per packet, including both header fields and data, the time needed to actually transmit the packet into the 1 Gbps link is

d_trans = L/R = (8,000 bits/packet) / (10^9 bits/sec) = 8 microseconds

Figure 3.18(a) shows that with our stop-and-wait protocol, if the sender begins sending the packet at t = 0, then at t = L/R = 8 microseconds, the last bit enters the channel at the sender side. The packet then makes its 15-msec cross-country journey, with the last bit of the packet emerging at the receiver at t = RTT/2 + L/R = 15.008 msec. Assuming for simplicity that ACK packets are extremely small (so that we can ignore their transmission time) and that the receiver can send an ACK as soon as the last bit of a data packet is received, the ACK emerges back at the sender at t = RTT + L/R = 30.008 msec. At this point, the sender can now transmit the next message. Thus, in 30.008 msec, the sender was sending for only 0.008 msec.
If we define the utilization of the sender (or the channel) as the fraction of time the sender is actually busy sending bits into the channel, the analysis in Figure 3.18(a) shows that the stop-and-wait protocol has a rather dismal sender utilization, U_sender, of

U_sender = (L/R) / (RTT + L/R) = 0.008 / 30.008 = 0.00027

Figure 3.18 Stop-and-wait and pipelined sending

That is, the sender was busy only 2.7 hundredths of one percent of the time! Viewed another way, the sender was able to send only 1,000 bytes in 30.008 milliseconds, an effective throughput of only 267 kbps—even though a 1 Gbps link was available! Imagine the unhappy network manager who just paid a fortune for a gigabit capacity link but manages to get a throughput of only 267 kilobits per second! This is a graphic example of how network protocols can limit the capabilities provided by the underlying network hardware. Also, we have neglected lower-layer protocol-processing times at the sender and receiver, as well as the processing and queuing delays that would occur at any intermediate routers between the sender and receiver. Including these effects would serve only to further increase the delay and further accentuate the poor performance.

The solution to this particular performance problem is simple: Rather than operate in a stop-and-wait manner, the sender is allowed to send multiple packets without waiting for acknowledgments, as illustrated in Figure 3.17(b). Figure 3.18(b) shows that if the sender is allowed to transmit three packets before having to wait for acknowledgments, the utilization of the sender is essentially tripled. Since the many in-transit sender-to-receiver packets can be visualized as filling a pipeline, this technique is known as pipelining. Pipelining has the following consequences for reliable data transfer protocols:

The range of sequence numbers must be increased, since each in-transit packet (not counting retransmissions) must have a unique sequence number and there may be multiple, in-transit, unacknowledged packets.

The sender and receiver sides of the protocols may have to buffer more than one packet. Minimally, the sender will have to buffer packets that have been transmitted but not yet acknowledged. Buffering of correctly received packets may also be needed at the receiver, as discussed below.

The range of sequence numbers needed and the buffering requirements will depend on the manner in which a data transfer protocol responds to lost, corrupted, and overly delayed packets. Two basic approaches toward pipelined error recovery can be identified: Go-Back-N and selective repeat.

3.4.3 Go-Back-N (GBN)

In a Go-Back-N (GBN) protocol, the sender is allowed to transmit multiple packets (when available) without waiting for an acknowledgment, but is constrained to have no more than some maximum allowable number, N, of unacknowledged packets in the pipeline. We describe the GBN protocol in some detail in this section. But before reading on, you are encouraged to play with the GBN applet (an awesome applet!) at the companion Web site.

Figure 3.19 shows the sender's view of the range of sequence numbers in a GBN protocol.

Figure 3.19 Sender's view of sequence numbers in Go-Back-N

If we define base to be the sequence number of the oldest unacknowledged packet and nextseqnum to be the smallest unused sequence number (that is, the sequence number of the next packet to be sent), then four intervals in the range of sequence numbers can be identified.
Sequence numbers in the interval [0, base-1] correspond to packets that have already been transmitted and acknowledged. The interval [base, nextseqnum-1] corresponds to packets that have been sent but not yet acknowledged. Sequence numbers in the interval [nextseqnum, base+N-1] can be used for packets that can be sent immediately, should data arrive from the upper layer. Finally, sequence numbers greater than or equal to base+N cannot be used until an unacknowledged packet currently in the pipeline (specifically, the packet with sequence number base) has been acknowledged.

As suggested by Figure 3.19, the range of permissible sequence numbers for transmitted but not yet acknowledged packets can be viewed as a window of size N over the range of sequence numbers. As the protocol operates, this window slides forward over the sequence number space. For this reason, N is often referred to as the window size and the GBN protocol itself as a sliding-window protocol. You might be wondering why we would even limit the number of outstanding, unacknowledged packets to a value of N in the first place. Why not allow an unlimited number of such packets? We'll see in Section 3.5 that flow control is one reason to impose a limit on the sender. We'll examine another reason to do so in Section 3.7, when we study TCP congestion control.

In practice, a packet's sequence number is carried in a fixed-length field in the packet header. If k is the number of bits in the packet sequence number field, the range of sequence numbers is thus [0, 2^k − 1]. With a finite range of sequence numbers, all arithmetic involving sequence numbers must then be done using modulo-2^k arithmetic. (That is, the sequence number space can be thought of as a ring of size 2^k, where sequence number 2^k − 1 is immediately followed by sequence number 0.) Recall that rdt3.0 had a 1-bit sequence number and a range of sequence numbers of [0, 1]. Several of the problems at the end of this chapter explore the consequences of a finite range of sequence numbers. We will see in Section 3.5 that TCP has a 32-bit sequence number field, where TCP sequence numbers count bytes in the byte stream rather than packets.

Figures 3.20 and 3.21 give an extended FSM description of the sender and receiver sides of an ACK-based, NAK-free, GBN protocol. We refer to this FSM description as an extended FSM because we have added variables (similar to programming-language variables) for base and nextseqnum, and added operations on these variables and conditional actions involving these variables. Note that the extended FSM specification is now beginning to look somewhat like a programming-language specification. [Bochman 1984] provides an excellent survey of additional extensions to FSM techniques as well as other programming-language-based techniques for specifying protocols.

Figure 3.20 Extended FSM description of the GBN sender

Figure 3.21 Extended FSM description of the GBN receiver

The GBN sender must respond to three types of events:

Invocation from above. When rdt_send() is called from above, the sender first checks to see if the window is full, that is, whether there are N outstanding, unacknowledged packets. If the window is not full, a packet is created and sent, and variables are appropriately updated. If the window is full, the sender simply returns the data back to the upper layer, an implicit indication that the window is full. The upper layer would presumably then have to try again later.
In a real implementation, the sender would more likely have either buffered (but not immediately sent) this data, or would have a synchronization mechanism (for example, a semaphore or a flag) that would allow the upper layer to call rdt_send() only when the window is not full.

Receipt of an ACK. In our GBN protocol, an acknowledgment for a packet with sequence number n will be taken to be a cumulative acknowledgment, indicating that all packets with a sequence number up to and including n have been correctly received at the receiver. We'll come back to this issue shortly when we examine the receiver side of GBN.

A timeout event. The protocol's name, "Go-Back-N," is derived from the sender's behavior in the presence of lost or overly delayed packets. As in the stop-and-wait protocol, a timer will again be used to recover from lost data or acknowledgment packets. If a timeout occurs, the sender resends all packets that have been previously sent but that have not yet been acknowledged. Our sender in Figure 3.20 uses only a single timer, which can be thought of as a timer for the oldest transmitted but not yet acknowledged packet. If an ACK is received but there are still additional transmitted but not yet acknowledged packets, the timer is restarted. If there are no outstanding, unacknowledged packets, the timer is stopped.

The receiver's actions in GBN are also simple. If a packet with sequence number n is received correctly and is in order (that is, the data last delivered to the upper layer came from a packet with sequence number n−1), the receiver sends an ACK for packet n and delivers the data portion of the packet to the upper layer. In all other cases, the receiver discards the packet and resends an ACK for the most recently received in-order packet. Note that since packets are delivered one at a time to the upper layer, if packet k has been received and delivered, then all packets with a sequence number lower than k have also been delivered. Thus, the use of cumulative acknowledgments is a natural choice for GBN.

In our GBN protocol, the receiver discards out-of-order packets. Although it may seem silly and wasteful to discard a correctly received (but out-of-order) packet, there is some justification for doing so. Recall that the receiver must deliver data in order to the upper layer. Suppose now that packet n is expected, but packet n+1 arrives. Because data must be delivered in order, the receiver could buffer (save) packet n+1 and then deliver this packet to the upper layer after it had later received and delivered packet n. However, if packet n is lost, both it and packet n+1 will eventually be retransmitted as a result of the GBN retransmission rule at the sender. Thus, the receiver can simply discard packet n+1. The advantage of this approach is the simplicity of receiver buffering—the receiver need not buffer any out-of-order packets. Thus, while the sender must maintain the upper and lower bounds of its window and the position of nextseqnum within this window, the only piece of information the receiver need maintain is the sequence number of the next in-order packet. This value is held in the variable expectedseqnum, shown in the receiver FSM in Figure 3.21. Of course, the disadvantage of throwing away a correctly received packet is that the subsequent retransmission of that packet might be lost or garbled and thus even more retransmissions would be required.
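The receiver's discard-and-re-ACK behavior, together with the sender's go-back-N timeout action, can be sketched compactly; this is our own illustration (with hypothetical helpers, and sequence numbers left unwrapped for simplicity), not the book's FSM code.

# A sketch of the GBN receiver and the sender's timeout action.
# make_pkt, udt_send, deliver_data, corrupt, and getseqnum stand in
# for the actions and predicates in Figures 3.20 and 3.21.

expectedseqnum = 0

def gbn_rcv(packet):
    global expectedseqnum
    if not corrupt(packet) and getseqnum(packet) == expectedseqnum:
        deliver_data(packet["data"])               # in-order: deliver it
        udt_send(make_pkt("ACK", expectedseqnum))  # ACK packet n
        expectedseqnum += 1
    else:
        # Discard corrupted or out-of-order packets and re-ACK the
        # most recently received in-order packet.
        udt_send(make_pkt("ACK", expectedseqnum - 1))

def gbn_timeout(sndpkt, base, nextseqnum, timer):
    # Resend every packet sent but not yet acknowledged, using a
    # countdown timer like the one sketched earlier.
    timer.start()
    for seq in range(base, nextseqnum):
        udt_send(sndpkt[seq])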
+Figure 3.22 shows the operation of the GBN protocol for the case of a window size of four packets. +Because of this window size limitation, the sender sends packets 0 through 3 but then must wait for one +or more of these packets to be acknowledged before proceeding. As each successive ACK (for +example, ACK0 and ACK1 ) is received, the window slides forward and the sender can transmit one +new packet (pkt4 and pkt5, respectively). On the receiver side, packet 2 is lost and thus packets 3, 4, +and 5 are found to be out of order and are discarded. +Before closing our discussion of GBN, it is worth noting that an implementation of this protocol in a +protocol stack would likely have a structure similar to that of the extended FSM in Figure 3.20. The +implementation would also likely be in the form of various procedures that implement the actions to be +taken in response to the various events that can occur. In such event-based programming, the various +procedures are called (invoked) either by other procedures in the protocol stack, or as the result of an +interrupt. In the sender, these events would be (1) a call from the upper-layer entity to invoke +rdt_send() , (2) a timer interrupt, and (3) a call from the lower layer to invoke rdt_rcv() when a +packet arrives. The programming exercises at the end of this chapter will give you a chance to actually +implement these routines in a simulated, but realistic, network setting. +We note here that the GBN protocol incorporates almost all of the techniques that we will encounter +when we study the reliable data transfer components of TCP in Section 3.5. These techniques include +the use of sequence numbers, cumulative acknowledgments, checksums, and a timeout/retransmit +operation. + + Figure 3.22 Go-Back-N in operation + +3.4.4 Selective Repeat (SR) +The GBN protocol allows the sender to potentially “fill the pipeline” in Figure 3.17 with packets, thus +avoiding the channel utilization problems we noted with stop-and-wait protocols. There are, however, +scenarios in which GBN itself suffers from performance problems. In particular, when the window size +and bandwidth-delay product are both large, many packets can be in the pipeline. A single packet error +can thus cause GBN to retransmit a large number of packets, many unnecessarily. As the probability of +channel errors increases, the pipeline can become filled with these unnecessary retransmissions. +Imagine, in our message-dictation scenario, that if every time a word was garbled, the surrounding +1,000 words (for example, a window size of 1,000 words) had to be repeated. The dictation would be + + slowed by all of the reiterated words. +As the name suggests, selective-repeat protocols avoid unnecessary retransmissions by having the +sender retransmit only those packets that it suspects were received in error (that is, were lost or +corrupted) at the receiver. This individual, as-needed, retransmission will require that the receiver +individually acknowledge correctly received packets. A window size of N will again be used to limit the +number of outstanding, unacknowledged packets in the pipeline. However, unlike GBN, the sender will +have already received ACKs for some of the packets in the window. Figure 3.23 shows the SR sender’s +view of the sequence number space. Figure 3.24 details the various actions taken by the SR sender. +The SR receiver will acknowledge a correctly received packet whether or not it is in order. 
Out-of-order packets are buffered until any missing packets (that is, packets with lower sequence numbers) are received, at which point a batch of packets can be delivered in order to the upper layer. Figure 3.25 itemizes the various actions taken by the SR receiver. Figure 3.26 shows an example of SR operation in the presence of lost packets. Note that in Figure 3.26, the receiver initially buffers packets 3, 4, and 5, and delivers them together with packet 2 to the upper layer when packet 2 is finally received.

Figure 3.23 Selective-repeat (SR) sender and receiver views of sequence-number space

Figure 3.24 SR sender events and actions

Figure 3.25 SR receiver events and actions

Figure 3.26 SR operation

It is important to note that in Step 2 in Figure 3.25, the receiver reacknowledges (rather than ignores) already received packets with certain sequence numbers below the current window base. You should convince yourself that this reacknowledgment is indeed needed. Given the sender and receiver sequence number spaces in Figure 3.23, for example, if there is no ACK for packet send_base propagating from the receiver to the sender, the sender will eventually retransmit packet send_base, even though it is clear (to us, not the sender!) that the receiver has already received that packet. If the receiver were not to acknowledge this packet, the sender's window would never move forward! This example illustrates an important aspect of SR protocols (and many other protocols as well). The sender and receiver will not always have an identical view of what has been received correctly and what has not. For SR protocols, this means that the sender and receiver windows will not always coincide.

The lack of synchronization between sender and receiver windows has important consequences when we are faced with the reality of a finite range of sequence numbers. Consider what could happen, for example, with a finite range of four packet sequence numbers, 0, 1, 2, 3, and a window size of three. Suppose packets 0 through 2 are transmitted and correctly received and acknowledged at the receiver. At this point, the receiver's window is over the fourth, fifth, and sixth packets, which have sequence numbers 3, 0, and 1, respectively. Now consider two scenarios. In the first scenario, shown in Figure 3.27(a), the ACKs for the first three packets are lost and the sender retransmits these packets. The receiver thus next receives a packet with sequence number 0—a copy of the first packet sent.

In the second scenario, shown in Figure 3.27(b), the ACKs for the first three packets are all delivered correctly. The sender thus moves its window forward and sends the fourth, fifth, and sixth packets, with sequence numbers 3, 0, and 1, respectively. The packet with sequence number 3 is lost, but the packet with sequence number 0 arrives—a packet containing new data.

Now consider the receiver's viewpoint in Figure 3.27, which has a figurative curtain between the sender and the receiver, since the receiver cannot "see" the actions taken by the sender. All the receiver observes is the sequence of messages it receives from the channel and sends into the channel. As far as it is concerned, the two scenarios in Figure 3.27 are identical. There is no way of distinguishing the retransmission of the first packet from an original transmission of the fifth packet. Clearly, a window size that is 1 less than the size of the sequence number space won't work.
Figure 3.27 SR receiver dilemma with too-large windows: A new packet or a retransmission?

But how small must the window size be? A problem at the end of the chapter asks you to show that the window size must be less than or equal to half the size of the sequence number space for SR protocols.

At the companion Web site, you will find an applet that animates the operation of the SR protocol. Try performing the same experiments that you did with the GBN applet. Do the results agree with what you expect?

This completes our discussion of reliable data transfer protocols. We've covered a lot of ground and introduced numerous mechanisms that together provide for reliable data transfer. Table 3.1 summarizes these mechanisms. Now that we have seen all of these mechanisms in operation and can see the "big picture," we encourage you to review this section again to see how these mechanisms were incrementally added to cover increasingly complex (and realistic) models of the channel connecting the sender and receiver, or to improve the performance of the protocols.

Table 3.1 Summary of reliable data transfer mechanisms and their use

Checksum: Used to detect bit errors in a transmitted packet.

Timer: Used to timeout/retransmit a packet, possibly because the packet (or its ACK) was lost within the channel. Because timeouts can occur when a packet is delayed but not lost (premature timeout), or when a packet has been received by the receiver but the receiver-to-sender ACK has been lost, duplicate copies of a packet may be received by a receiver.

Sequence number: Used for sequential numbering of packets of data flowing from sender to receiver. Gaps in the sequence numbers of received packets allow the receiver to detect a lost packet. Packets with duplicate sequence numbers allow the receiver to detect duplicate copies of a packet.

Acknowledgment: Used by the receiver to tell the sender that a packet or set of packets has been received correctly. Acknowledgments will typically carry the sequence number of the packet or packets being acknowledged. Acknowledgments may be individual or cumulative, depending on the protocol.

Negative acknowledgment: Used by the receiver to tell the sender that a packet has not been received correctly. Negative acknowledgments will typically carry the sequence number of the packet that was not received correctly.

Window, pipelining: The sender may be restricted to sending only packets with sequence numbers that fall within a given range. By allowing multiple packets to be transmitted but not yet acknowledged, sender utilization can be increased over a stop-and-wait mode of operation. We'll see shortly that the window size may be set on the basis of the receiver's ability to receive and buffer messages, or the level of congestion in the network, or both.

Let's conclude our discussion of reliable data transfer protocols by considering one remaining assumption in our underlying channel model. Recall that we have assumed that packets cannot be reordered within the channel between the sender and receiver. This is generally a reasonable assumption when the sender and receiver are connected by a single physical wire. However, when the "channel" connecting the two is a network, packet reordering can occur. One manifestation of packet reordering is that old copies of a packet with a sequence or acknowledgment
number of x can appear, even though neither the sender's nor the receiver's window contains x. With packet reordering, the channel can be thought of as essentially buffering packets and spontaneously emitting these packets at any point in the future. Because sequence numbers may be reused, some care must be taken to guard against such duplicate packets. The approach taken in practice is to ensure that a sequence number is not reused until the sender is "sure" that any previously sent packets with sequence number x are no longer in the network. This is done by assuming that a packet cannot "live" in the network for longer than some fixed maximum amount of time. A maximum packet lifetime of approximately three minutes is assumed in the TCP extensions for high-speed networks [RFC 1323]. [Sunshine 1978] describes a method for using sequence numbers such that reordering problems can be completely avoided.

3.5 Connection-Oriented Transport: TCP

Now that we have covered the underlying principles of reliable data transfer, let's turn to TCP—the Internet's transport-layer, connection-oriented, reliable transport protocol. In this section, we'll see that in order to provide reliable data transfer, TCP relies on many of the underlying principles discussed in the previous section, including error detection, retransmissions, cumulative acknowledgments, timers, and header fields for sequence and acknowledgment numbers. TCP is defined in RFC 793, RFC 1122, RFC 1323, RFC 2018, and RFC 2581.

3.5.1 The TCP Connection

TCP is said to be connection-oriented because before one application process can begin to send data to another, the two processes must first "handshake" with each other—that is, they must send some preliminary segments to each other to establish the parameters of the ensuing data transfer. As part of TCP connection establishment, both sides of the connection will initialize many TCP state variables (many of which will be discussed in this section and in Section 3.7) associated with the TCP connection.

The TCP "connection" is not an end-to-end TDM or FDM circuit as in a circuit-switched network. Instead, the "connection" is a logical one, with common state residing only in the TCPs in the two communicating end systems. Recall that because the TCP protocol runs only in the end systems and not in the intermediate network elements (routers and link-layer switches), the intermediate network elements do not maintain TCP connection state. In fact, the intermediate routers are completely oblivious to TCP connections; they see datagrams, not connections.

A TCP connection provides a full-duplex service: If there is a TCP connection between Process A on one host and Process B on another host, then application-layer data can flow from Process A to Process B at the same time as application-layer data flows from Process B to Process A. A TCP connection is also always point-to-point, that is, between a single sender and a single receiver. So-called "multicasting" (see the online supplementary materials for this text)—the transfer of data from one sender to many receivers in a single send operation—is not possible with TCP. With TCP, two hosts are company and three are a crowd!

Let's now take a look at how a TCP connection is established. Suppose a process running in one host wants to initiate a connection with another process in another host.
Recall that the process that is initiating the connection is called the client process, while the other process is called the server process. The client application process first informs the client transport layer that it wants to establish a connection to a process in the server.

CASE HISTORY

Vinton Cerf, Robert Kahn, and TCP/IP

In the early 1970s, packet-switched networks began to proliferate, with the ARPAnet—the precursor of the Internet—being just one of many networks. Each of these networks had its own protocol. Two researchers, Vinton Cerf and Robert Kahn, recognized the importance of interconnecting these networks and invented a cross-network protocol called TCP/IP, which stands for Transmission Control Protocol/Internet Protocol. Although Cerf and Kahn began by seeing the protocol as a single entity, it was later split into its two parts, TCP and IP, which operated separately. Cerf and Kahn published a paper on TCP/IP in May 1974 in IEEE Transactions on Communications Technology [Cerf 1974].

The TCP/IP protocol, which is the bread and butter of today's Internet, was devised before PCs, workstations, smartphones, and tablets, before the proliferation of Ethernet, cable, and DSL, WiFi, and other access network technologies, and before the Web, social media, and streaming video. Cerf and Kahn saw the need for a networking protocol that, on the one hand, provides broad support for yet-to-be-defined applications and, on the other hand, allows arbitrary hosts and link-layer protocols to interoperate.

In 2004, Cerf and Kahn received the ACM's Turing Award, considered the "Nobel Prize of Computing," for "pioneering work on internetworking, including the design and implementation of the Internet's basic communications protocols, TCP/IP, and for inspired leadership in networking."

Recall from Section 2.7.2 that a Python client program does this by issuing the command

clientSocket.connect((serverName, serverPort))

where serverName is the name of the server and serverPort identifies the process on the server. TCP in the client then proceeds to establish a TCP connection with TCP in the server. At the end of this section we discuss in some detail the connection-establishment procedure. For now it suffices to know that the client first sends a special TCP segment; the server responds with a second special TCP segment; and finally the client responds again with a third special segment. The first two segments carry no payload, that is, no application-layer data; the third of these segments may carry a payload. Because three segments are sent between the two hosts, this connection-establishment procedure is often referred to as a three-way handshake.

Once a TCP connection is established, the two application processes can send data to each other. Let's consider the sending of data from the client process to the server process. The client process passes a stream of data through the socket (the door of the process), as described in Section 2.7. Once the data passes through the door, the data is in the hands of TCP running in the client. As shown in Figure 3.28, TCP directs this data to the connection's send buffer, which is one of the buffers that is set aside during the initial three-way handshake. From time to time, TCP will grab chunks of data from the send buffer and pass the data to the network layer.
Once a TCP connection is established, the two application processes can send data to each other. Let’s consider the sending of data from the client process to the server process. The client process passes a stream of data through the socket (the door of the process), as described in Section 2.7. Once the data passes through the door, the data is in the hands of TCP running in the client. As shown in Figure 3.28, TCP directs this data to the connection’s send buffer, one of the buffers set aside during the initial three-way handshake. From time to time, TCP will grab chunks of data from the send buffer and pass the data to the network layer. Interestingly, the TCP specification [RFC 793] is very laid back about when TCP should actually send buffered data, stating only that TCP should “send that data in segments at its own convenience.”

The maximum amount of data that can be grabbed and placed in a segment is limited by the maximum segment size (MSS). The MSS is typically set by first determining the length of the largest link-layer frame that can be sent by the local sending host (the so-called maximum transmission unit, MTU), and then setting the MSS so that a TCP segment (when encapsulated in an IP datagram) plus the TCP/IP header length (typically 40 bytes) will fit into a single link-layer frame. Both the Ethernet and PPP link-layer protocols have an MTU of 1,500 bytes; a typical value of MSS is therefore 1,500 − 40 = 1,460 bytes. Approaches have also been proposed for discovering the path MTU—the largest link-layer frame that can be sent on all links from source to destination [RFC 1191]—and setting the MSS based on the path MTU value. Note that the MSS is the maximum amount of application-layer data in the segment, not the maximum size of the TCP segment including headers. (This terminology is confusing, but we have to live with it, as it is well entrenched.)
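As a quick sanity check on this arithmetic, here is a tiny illustrative helper; the function name and defaults are ours, not part of any TCP implementation:

def mss_from_mtu(mtu, ip_header=20, tcp_header=20):
    # MSS = MTU minus the typical 40 bytes of TCP/IP header overhead.
    return mtu - ip_header - tcp_header

print(mss_from_mtu(1500))   # 1460, the typical value for Ethernet and PPP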
TCP pairs each chunk of client data with a TCP header, thereby forming TCP segments. The segments are passed down to the network layer, where they are separately encapsulated within network-layer IP datagrams. The IP datagrams are then sent into the network. When TCP receives a segment at the other end, the segment’s data is placed in the TCP connection’s receive buffer, as shown in Figure 3.28. The application reads the stream of data from this buffer. Each side of the connection has its own send buffer and its own receive buffer. (You can see the online flow-control applet at http://www.awl.com/kurose-ross, which provides an animation of the send and receive buffers.)

Figure 3.28 TCP send and receive buffers

We see from this discussion that a TCP connection consists of buffers, variables, and a socket connection to a process in one host, and another set of buffers, variables, and a socket connection to a process in another host. As mentioned earlier, no buffers or variables are allocated to the connection in the network elements (routers, switches, and repeaters) between the hosts.

3.5.2 TCP Segment Structure

Having taken a brief look at the TCP connection, let’s examine the TCP segment structure. The TCP segment consists of header fields and a data field. The data field contains a chunk of application data. As mentioned above, the MSS limits the maximum size of a segment’s data field. When TCP sends a large file, such as an image as part of a Web page, it typically breaks the file into chunks of size MSS (except for the last chunk, which will often be less than the MSS). Interactive applications, however, often transmit data chunks that are smaller than the MSS; for example, with remote login applications like Telnet, the data field in the TCP segment is often only one byte. Because the TCP header is typically 20 bytes (12 bytes more than the UDP header), segments sent by Telnet may be only 21 bytes in length.

Figure 3.29 shows the structure of the TCP segment. As with UDP, the header includes source and destination port numbers, which are used for multiplexing/demultiplexing data from/to upper-layer applications. Also as with UDP, the header includes a checksum field. A TCP segment header also contains the following fields:

- The 32-bit sequence number field and the 32-bit acknowledgment number field are used by the TCP sender and receiver in implementing a reliable data transfer service, as discussed below.
- The 16-bit receive window field is used for flow control. We will see shortly that it is used to indicate the number of bytes that a receiver is willing to accept.
- The 4-bit header length field specifies the length of the TCP header in 32-bit words. The TCP header can be of variable length due to the TCP options field. (Typically, the options field is empty, so the length of the typical TCP header is 20 bytes.)
- The optional and variable-length options field is used when a sender and receiver negotiate the maximum segment size (MSS) or as a window scaling factor for use in high-speed networks. A time-stamping option is also defined. See RFC 854 and RFC 1323 for additional details.
- The flag field contains 6 bits. The ACK bit is used to indicate that the value carried in the acknowledgment field is valid; that is, the segment contains an acknowledgment for a segment that has been successfully received. The RST, SYN, and FIN bits are used for connection setup and teardown, as we will discuss at the end of this section. The CWR and ECE bits are used in explicit congestion notification, as discussed in Section 3.7.2. Setting the PSH bit indicates that the receiver should pass the data to the upper layer immediately. Finally, the URG bit is used to indicate that there is data in this segment that the sending-side upper-layer entity has marked as “urgent.” The location of the last byte of this urgent data is indicated by the 16-bit urgent data pointer field. TCP must inform the receiving-side upper-layer entity when urgent data exists and pass it a pointer to the end of the urgent data. (In practice, the PSH and URG bits and the urgent data pointer are not used. However, we mention these fields for completeness.)

Figure 3.29 TCP segment structure

Our experience as teachers is that our students sometimes find discussion of packet formats rather dry and perhaps a bit boring. For a fun and fanciful look at TCP header fields, particularly if you love Legos™ as we do, see [Pomeranz 2010].

Sequence Numbers and Acknowledgment Numbers

Two of the most important fields in the TCP segment header are the sequence number field and the acknowledgment number field. These fields are a critical part of TCP’s reliable data transfer service. But before discussing how these fields are used to provide reliable data transfer, let us first explain what exactly TCP puts in these fields.

Figure 3.30 Dividing file data into TCP segments

TCP views data as an unstructured, but ordered, stream of bytes. TCP’s use of sequence numbers reflects this view in that sequence numbers are over the stream of transmitted bytes and not over the series of transmitted segments. The sequence number for a segment is therefore the byte-stream number of the first byte in the segment. Let’s look at an example. Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection. The TCP in Host A will implicitly number each byte in the data stream. Suppose that the data stream consists of a file of 500,000 bytes, that the MSS is 1,000 bytes, and that the first byte of the data stream is numbered 0. As shown in Figure 3.30, TCP constructs 500 segments out of the data stream. The first segment gets assigned sequence number 0, the second segment gets assigned sequence number 1,000, the third segment gets assigned sequence number 2,000, and so on. Each sequence number is inserted in the sequence number field in the header of the appropriate TCP segment.
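To make the byte-stream numbering concrete, here is a small, hypothetical Python helper of ours that reproduces the arithmetic of this example:

def segment_sequence_numbers(file_size, mss, initial_seq=0):
    # The sequence number of a segment is the byte-stream number
    # of the first byte it carries.
    return [initial_seq + offset for offset in range(0, file_size, mss)]

seqs = segment_sequence_numbers(500_000, 1_000)
print(len(seqs))                              # 500 segments
print(seqs[0], seqs[1], seqs[2], seqs[-1])    # 0 1000 2000 499000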
Now let’s consider acknowledgment numbers. These are a little trickier than sequence numbers. Recall that TCP is full-duplex, so that Host A may be receiving data from Host B while it sends data to Host B (as part of the same TCP connection). Each of the segments that arrive from Host B has a sequence number for the data flowing from B to A. The acknowledgment number that Host A puts in its segment is the sequence number of the next byte Host A is expecting from Host B. It is good to look at a few examples to understand what is going on here. Suppose that Host A has received all bytes numbered 0 through 535 from B and suppose that it is about to send a segment to Host B. Host A is waiting for byte 536 and all the subsequent bytes in Host B’s data stream. So Host A puts 536 in the acknowledgment number field of the segment it sends to B.

As another example, suppose that Host A has received one segment from Host B containing bytes 0 through 535 and another segment containing bytes 900 through 1,000. For some reason Host A has not yet received bytes 536 through 899. In this example, Host A is still waiting for byte 536 (and beyond) in order to re-create B’s data stream. Thus, A’s next segment to B will contain 536 in the acknowledgment number field. Because TCP only acknowledges bytes up to the first missing byte in the stream, TCP is said to provide cumulative acknowledgments.

This last example also brings up an important but subtle issue. Host A received the third segment (bytes 900 through 1,000) before receiving the second segment (bytes 536 through 899). Thus, the third segment arrived out of order. The subtle issue is: What does a host do when it receives out-of-order segments in a TCP connection? Interestingly, the TCP RFCs do not impose any rules here; the decision is left to those implementing TCP. There are basically two choices: either (1) the receiver immediately discards out-of-order segments (which, as we discussed earlier, can simplify receiver design), or (2) the receiver keeps the out-of-order bytes and waits for the missing bytes to fill in the gaps. Clearly, the latter choice is more efficient in terms of network bandwidth, and it is the approach taken in practice.

In Figure 3.30, we assumed that the initial sequence number was zero. In truth, both sides of a TCP connection randomly choose an initial sequence number. This is done to minimize the possibility that a segment still present in the network from an earlier, already-terminated connection between two hosts is mistaken for a valid segment in a later connection between these same two hosts (which also happen to be using the same port numbers as the old connection) [Sunshine 1978].
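Before moving on to the Telnet example below, here is a short, hypothetical sketch of ours showing how a receiver that buffers out-of-order data could derive its cumulative acknowledgment number (illustrative only, not TCP’s actual implementation):

def cumulative_ack(received_bytes):
    # The ACK number is the byte-stream number of the first byte
    # not yet received, i.e., the start of the first gap.
    ack = 0
    while ack in received_bytes:
        ack += 1
    return ack

# Host A has bytes 0-535 and 900-1,000, but 536-899 are missing.
received = set(range(0, 536)) | set(range(900, 1001))
print(cumulative_ack(received))   # 536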
Telnet: A Case Study for Sequence and Acknowledgment Numbers

Telnet, defined in RFC 854, is a popular application-layer protocol used for remote login. It runs over TCP and is designed to work between any pair of hosts. Unlike the bulk data transfer applications discussed in Chapter 2, Telnet is an interactive application. We discuss a Telnet example here, as it nicely illustrates TCP sequence and acknowledgment numbers. We note that many users now prefer the SSH protocol over Telnet, since data sent over a Telnet connection (including passwords!) is not encrypted, making Telnet vulnerable to eavesdropping attacks (as discussed in Section 8.7).

Suppose Host A initiates a Telnet session with Host B. Because Host A initiates the session, it is labeled the client, and Host B is labeled the server. Each character typed by the user (at the client) will be sent to the remote host; the remote host will send back a copy of each character, which will be displayed on the Telnet user’s screen. This “echo back” is used to ensure that characters seen by the Telnet user have already been received and processed at the remote site. Each character thus traverses the network twice between the time the user hits the key and the time the character is displayed on the user’s monitor.

Now suppose the user types a single letter, ‘C,’ and then grabs a coffee. Let’s examine the TCP segments that are sent between the client and server. As shown in Figure 3.31, we suppose the starting sequence numbers are 42 and 79 for the client and server, respectively. Recall that the sequence number of a segment is the sequence number of the first byte in the data field. Thus, the first segment sent from the client will have sequence number 42; the first segment sent from the server will have sequence number 79. Recall that the acknowledgment number is the sequence number of the next byte of data that the host is waiting for. After the TCP connection is established but before any data is sent, the client is waiting for byte 79 and the server is waiting for byte 42.

Figure 3.31 Sequence and acknowledgment numbers for a simple Telnet application over TCP

As shown in Figure 3.31, three segments are sent. The first segment is sent from the client to the server, containing the 1-byte ASCII representation of the letter ‘C’ in its data field. This first segment also has 42 in its sequence number field, as we just described. Also, because the client has not yet received any data from the server, this first segment will have 79 in its acknowledgment number field.

The second segment is sent from the server to the client. It serves a dual purpose. First, it provides an acknowledgment of the data the server has received. By putting 43 in the acknowledgment field, the server is telling the client that it has successfully received everything up through byte 42 and is now waiting for bytes 43 onward. The second purpose of this segment is to echo back the letter ‘C.’ Thus, the second segment has the ASCII representation of ‘C’ in its data field. This second segment has sequence number 79, the initial sequence number of the server-to-client data flow of this TCP connection, as this is the very first byte of data that the server is sending. Note that the acknowledgment for client-to-server data is carried in a segment carrying server-to-client data; this acknowledgment is said to be piggybacked on the server-to-client data segment.

The third segment is sent from the client to the server. Its sole purpose is to acknowledge the data it has received from the server. (Recall that the second segment contained data—the letter ‘C’—from the server to the client.) This segment has an empty data field (that is, the acknowledgment is not being piggybacked with any client-to-server data). The segment has 80 in the acknowledgment number field because the client has received the stream of bytes up through byte sequence number 79 and is now waiting for bytes 80 onward. You might think it odd that this segment also has a sequence number, since the segment contains no data. But because TCP has a sequence number field, the segment needs to have some sequence number.
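The exchange can be summarized in a few lines of illustrative Python (a toy model of Figure 3.31 of ours, not real TCP code):

from dataclasses import dataclass

@dataclass
class Segment:
    seq: int     # byte-stream number of the first data byte
    ack: int     # next byte expected from the other side
    data: bytes

# Client ISN = 42, server ISN = 79, as in Figure 3.31.
seg1 = Segment(seq=42, ack=79, data=b'C')  # client -> server: the typed 'C'
seg2 = Segment(seq=79, ack=43, data=b'C')  # server -> client: echo, ACK piggybacked
seg3 = Segment(seq=43, ack=80, data=b'')   # client -> server: pure ACK

for s in (seg1, seg2, seg3):
    print(f"Seq={s.seq} ACK={s.ack} data={s.data!r}")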
3.5.3 Round-Trip Time Estimation and Timeout

TCP, like our rdt protocol in Section 3.4, uses a timeout/retransmit mechanism to recover from lost segments. Although this is conceptually simple, many subtle issues arise when we implement a timeout/retransmit mechanism in an actual protocol such as TCP. Perhaps the most obvious question is the length of the timeout intervals. Clearly, the timeout should be larger than the connection’s round-trip time (RTT), that is, the time from when a segment is sent until it is acknowledged; otherwise, unnecessary retransmissions would be sent. But how much larger? How should the RTT be estimated in the first place? Should a timer be associated with each and every unacknowledged segment? So many questions! Our discussion in this section is based on the TCP work in [Jacobson 1988] and the current IETF recommendations for managing TCP timers [RFC 6298].

Estimating the Round-Trip Time

Let’s begin our study of TCP timer management by considering how TCP estimates the round-trip time between sender and receiver. This is accomplished as follows. The sample RTT, denoted SampleRTT, for a segment is the amount of time between when the segment is sent (that is, passed to IP) and when an acknowledgment for the segment is received. Instead of measuring a SampleRTT for every transmitted segment, most TCP implementations take only one SampleRTT measurement at a time. That is, at any point in time, the SampleRTT is being estimated for only one of the transmitted but currently unacknowledged segments, leading to a new value of SampleRTT approximately once every RTT. Also, TCP never computes a SampleRTT for a segment that has been retransmitted; it only measures SampleRTT for segments that have been transmitted once [Karn 1987]. (A problem at the end of the chapter asks you to consider why.)

Obviously, the SampleRTT values will fluctuate from segment to segment due to congestion in the routers and to the varying load on the end systems. Because of this fluctuation, any given SampleRTT value may be atypical. In order to estimate a typical RTT, it is therefore natural to take some sort of average of the SampleRTT values. TCP maintains an average, called EstimatedRTT, of the SampleRTT values. Upon obtaining a new SampleRTT, TCP updates EstimatedRTT according to the following formula:

EstimatedRTT = (1 − α) · EstimatedRTT + α · SampleRTT

The formula above is written in the form of a programming-language statement: the new value of EstimatedRTT is a weighted combination of the previous value of EstimatedRTT and the new value for SampleRTT. The recommended value of α is α = 0.125 (that is, 1/8) [RFC 6298], in which case the formula above becomes:

EstimatedRTT = 0.875 · EstimatedRTT + 0.125 · SampleRTT

Note that EstimatedRTT is a weighted average of the SampleRTT values. As discussed in a homework problem at the end of this chapter, this weighted average puts more weight on recent samples than on old samples. This is natural, as the more recent samples better reflect the current congestion in the network. In statistics, such an average is called an exponential weighted moving average (EWMA). The word “exponential” appears in EWMA because the weight of a given SampleRTT decays exponentially fast as the updates proceed. In the homework problems, you will be asked to derive the exponential term in EstimatedRTT.
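The exponential decay of the weights is easy to see numerically. This illustrative snippet of ours prints the weight that a sample taken k updates ago carries in the current EstimatedRTT:

alpha = 0.125
# After each update an old sample's weight is multiplied by (1 - alpha),
# so a sample from k updates ago carries weight alpha * (1 - alpha)**k.
for k in range(5):
    print(k, round(alpha * (1 - alpha) ** k, 4))
# 0 0.125, 1 0.1094, 2 0.0957, 3 0.0837, 4 0.0733: geometric decay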
Figure 3.32 shows the SampleRTT values and EstimatedRTT for a value of α = 1/8 for a TCP connection between gaia.cs.umass.edu (in Amherst, Massachusetts) and fantasia.eurecom.fr (in the south of France). Clearly, the variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.

In addition to having an estimate of the RTT, it is also valuable to have a measure of the variability of the RTT. [RFC 6298] defines the RTT variation, DevRTT, as an estimate of how much SampleRTT typically deviates from EstimatedRTT:

DevRTT = (1 − β) · DevRTT + β · |SampleRTT − EstimatedRTT|

Note that DevRTT is an EWMA of the difference between SampleRTT and EstimatedRTT. If the SampleRTT values have little fluctuation, then DevRTT will be small; on the other hand, if there is a lot of fluctuation, DevRTT will be large. The recommended value of β is 0.25.

Setting and Managing the Retransmission Timeout Interval

PRINCIPLES IN PRACTICE

TCP provides reliable data transfer by using positive acknowledgments and timers in much the same way that we studied in Section 3.4. TCP acknowledges data that has been received correctly, and it then retransmits segments when segments or their corresponding acknowledgments are thought to be lost or corrupted. Certain versions of TCP also have an implicit NAK mechanism: with TCP’s fast retransmit mechanism, the receipt of three duplicate ACKs for a given segment serves as an implicit NAK for the following segment, triggering retransmission of that segment before timeout. TCP uses sequence numbers to allow the receiver to identify lost or duplicate segments. Just as in the case of our reliable data transfer protocol, rdt3.0, TCP cannot itself tell for certain if a segment, or its ACK, is lost, corrupted, or overly delayed. At the sender, TCP’s response will be the same: retransmit the segment in question.

TCP also uses pipelining, allowing the sender to have multiple transmitted but yet-to-be-acknowledged segments outstanding at any given time. We saw earlier that pipelining can greatly improve a session’s throughput when the ratio of the segment size to round-trip delay is small. The specific number of outstanding, unacknowledged segments that a sender can have is determined by TCP’s flow-control and congestion-control mechanisms. TCP flow control is discussed at the end of this section; TCP congestion control is discussed in Section 3.7. For the time being, we must simply be aware that the TCP sender uses pipelining.

Given values of EstimatedRTT and DevRTT, what value should be used for TCP’s timeout interval? Clearly, the interval should be greater than or equal to EstimatedRTT, or unnecessary retransmissions would be sent. But the timeout interval should not be too much larger than EstimatedRTT; otherwise, when a segment is lost, TCP would not quickly retransmit the segment, leading to large data transfer delays. It is therefore desirable to set the timeout equal to the EstimatedRTT plus some margin. The margin should be large when there is a lot of fluctuation in the SampleRTT values; it should be small when there is little fluctuation. The value of DevRTT should thus come into play here. All of these considerations are taken into account in TCP’s method for determining the retransmission timeout interval:

TimeoutInterval = EstimatedRTT + 4 · DevRTT

An initial TimeoutInterval value of 1 second is recommended [RFC 6298]. Also, when a timeout occurs, the value of TimeoutInterval is doubled to avoid a premature timeout for a subsequent segment that will soon be acknowledged. However, as soon as a segment is received and EstimatedRTT is updated, the TimeoutInterval is again computed using the formula above.
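The whole procedure fits in a few lines. The sketch below is our own illustrative code (not a real TCP implementation), using the RFC 6298 constants quoted above; it maintains EstimatedRTT, DevRTT, and TimeoutInterval, including the doubling on timeout discussed below:

class RttEstimator:
    def __init__(self):
        self.estimated_rtt = None
        self.dev_rtt = 0.0
        self.timeout_interval = 1.0   # recommended initial value [RFC 6298]

    def on_sample(self, sample_rtt, alpha=0.125, beta=0.25):
        # EWMA updates for EstimatedRTT and DevRTT, then recompute timeout.
        if self.estimated_rtt is None:
            self.estimated_rtt = sample_rtt   # first measurement (simplified)
        else:
            self.dev_rtt = (1 - beta) * self.dev_rtt + \
                beta * abs(sample_rtt - self.estimated_rtt)
            self.estimated_rtt = (1 - alpha) * self.estimated_rtt + \
                alpha * sample_rtt
        self.timeout_interval = self.estimated_rtt + 4 * self.dev_rtt

    def on_timeout(self):
        # Exponential backoff: double the interval after each expiration.
        self.timeout_interval *= 2

est = RttEstimator()
for sample in (0.10, 0.12, 0.30, 0.11):   # made-up SampleRTT values, in seconds
    est.on_sample(sample)
print(round(est.timeout_interval, 3))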
Figure 3.32 RTT samples and RTT estimates

3.5.4 Reliable Data Transfer

Recall that the Internet’s network-layer service (IP service) is unreliable. IP does not guarantee datagram delivery, does not guarantee in-order delivery of datagrams, and does not guarantee the integrity of the data in the datagrams. With IP service, datagrams can overflow router buffers and never reach their destination, datagrams can arrive out of order, and bits in the datagram can get corrupted (flipped from 0 to 1 and vice versa). Because transport-layer segments are carried across the network by IP datagrams, transport-layer segments can suffer from these problems as well.

TCP creates a reliable data transfer service on top of IP’s unreliable best-effort service. TCP’s reliable data transfer service ensures that the data stream that a process reads out of its TCP receive buffer is uncorrupted, without gaps, without duplication, and in sequence; that is, the byte stream is exactly the same byte stream that was sent by the end system on the other side of the connection. How TCP provides reliable data transfer involves many of the principles that we studied in Section 3.4.

In our earlier development of reliable data transfer techniques, it was conceptually easiest to assume that an individual timer is associated with each transmitted but not yet acknowledged segment. While this is great in theory, timer management can require considerable overhead. Thus, the recommended TCP timer management procedures [RFC 6298] use only a single retransmission timer, even if there are multiple transmitted but not yet acknowledged segments. The TCP protocol described in this section follows this single-timer recommendation.

We will discuss how TCP provides reliable data transfer in two incremental steps. We first present a highly simplified description of a TCP sender that uses only timeouts to recover from lost segments; we then present a more complete description that uses duplicate acknowledgments in addition to timeouts. In the ensuing discussion, we suppose that data is being sent in only one direction, from Host A to Host B, and that Host A is sending a large file.

Figure 3.33 presents a highly simplified description of a TCP sender. We see that there are three major events related to data transmission and retransmission in the TCP sender: data received from the application above; timer timeout; and ACK receipt.

Figure 3.33 Simplified TCP sender

Upon the occurrence of the first major event, TCP receives data from the application, encapsulates the data in a segment, and passes the segment to IP. Note that each segment includes a sequence number that is the byte-stream number of the first data byte in the segment, as described in Section 3.5.2. Also note that if the timer is not already running for some other segment, TCP starts the timer when the segment is passed to IP. (It is helpful to think of the timer as being associated with the oldest unacknowledged segment.) The expiration interval for this timer is the TimeoutInterval, which is calculated from EstimatedRTT and DevRTT, as described in Section 3.5.3.
The second major event is the timeout. TCP responds to the timeout event by retransmitting the segment that caused the timeout. TCP then restarts the timer.

The third major event that must be handled by the TCP sender is the arrival of an acknowledgment segment (ACK) from the receiver (more specifically, a segment containing a valid ACK field value). On the occurrence of this event, TCP compares the ACK value y with its variable SendBase. The TCP state variable SendBase is the sequence number of the oldest unacknowledged byte. (Thus SendBase − 1 is the sequence number of the last byte that is known to have been received correctly and in order at the receiver.) As indicated earlier, TCP uses cumulative acknowledgments, so that y acknowledges the receipt of all bytes before byte number y. If y > SendBase, then the ACK is acknowledging one or more previously unacknowledged segments. Thus the sender updates its SendBase variable; it also restarts the timer if there currently are any not-yet-acknowledged segments.

A Few Interesting Scenarios

We have just described a highly simplified version of how TCP provides reliable data transfer. But even this highly simplified version has many subtleties. To get a good feeling for how this protocol works, let’s now walk through a few simple scenarios. Figure 3.34 depicts the first scenario, in which Host A sends one segment to Host B. Suppose that this segment has sequence number 92 and contains 8 bytes of data. After sending this segment, Host A waits for a segment from B with acknowledgment number 100. Although the segment from A is received at B, the acknowledgment from B to A gets lost. In this case, the timeout event occurs, and Host A retransmits the same segment. Of course, when Host B receives the retransmission, it observes from the sequence number that the segment contains data that has already been received. Thus, TCP in Host B will discard the bytes in the retransmitted segment.

Figure 3.34 Retransmission due to a lost acknowledgment

In a second scenario, shown in Figure 3.35, Host A sends two segments back to back. The first segment has sequence number 92 and 8 bytes of data, and the second segment has sequence number 100 and 20 bytes of data. Suppose that both segments arrive intact at B, and B sends a separate acknowledgment for each of these segments. The first of these acknowledgments has acknowledgment number 100; the second has acknowledgment number 120. Suppose now that neither of the acknowledgments arrives at Host A before the timeout. When the timeout event occurs, Host A resends the first segment with sequence number 92 and restarts the timer. As long as the ACK for the second segment arrives before the new timeout, the second segment will not be retransmitted.

In a third and final scenario, suppose Host A sends the two segments, exactly as in the second example. The acknowledgment of the first segment is lost in the network, but just before the timeout event, Host A receives an acknowledgment with acknowledgment number 120. Host A therefore knows that Host B has received everything up through byte 119, so Host A does not resend either of the two segments. This scenario is illustrated in Figure 3.36.

Figure 3.36 A cumulative acknowledgment avoids retransmission of the first segment
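Before moving on, here is an illustrative Python rendering of the three event handlers of Figure 3.33. This is a toy model of ours: “sending” just records segments in a list, and the timer is a simple flag rather than a real clock:

class SimplifiedSender:
    def __init__(self):
        self.send_base = 0        # oldest unacknowledged byte
        self.next_seq_num = 0     # byte-stream number of next new byte
        self.timer_running = False
        self.sent = []            # stands in for handing segments to IP

    def data_from_application(self, data):
        # Event 1: create a segment and pass it to IP; start the timer
        # if it is not already running for some other segment.
        self.sent.append((self.next_seq_num, data))
        if not self.timer_running:
            self.timer_running = True
        self.next_seq_num += len(data)   # sequence numbers count bytes

    def timeout(self):
        # Event 2: retransmit the not-yet-acknowledged segment with the
        # smallest sequence number, then restart the timer.
        seq, data = min(s for s in self.sent if s[0] >= self.send_base)
        self.sent.append((seq, data))
        self.timer_running = True

    def ack_received(self, y):
        # Event 3: a cumulative ACK covers all bytes before y.
        if y > self.send_base:
            self.send_base = y
            self.timer_running = self.send_base < self.next_seq_num

sender = SimplifiedSender()
sender.data_from_application(b'x' * 8)
sender.ack_received(8)
print(sender.send_base, sender.timer_running)   # 8 False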
Doubling the Timeout Interval

We now discuss a few modifications that most TCP implementations employ. The first concerns the length of the timeout interval after a timer expiration. In this modification, whenever the timeout event occurs, TCP retransmits the not-yet-acknowledged segment with the smallest sequence number, as described above. But each time TCP retransmits, it sets the next timeout interval to twice the previous value, rather than deriving it from the last EstimatedRTT and DevRTT (as described in Section 3.5.3). For example, suppose the TimeoutInterval associated with the oldest not-yet-acknowledged segment is 0.75 sec when the timer first expires. TCP will then retransmit this segment and set the new expiration time to 1.5 sec. If the timer expires again 1.5 sec later, TCP will again retransmit this segment, now setting the expiration time to 3.0 sec. Thus the intervals grow exponentially after each retransmission. However, whenever the timer is started after either of the two other events (that is, data received from the application above, or ACK received), the TimeoutInterval is derived from the most recent values of EstimatedRTT and DevRTT.

Figure 3.35 Segment 100 not retransmitted

This modification provides a limited form of congestion control. (More comprehensive forms of TCP congestion control will be studied in Section 3.7.) The timer expiration is most likely caused by congestion in the network, that is, too many packets arriving at one (or more) router queues in the path between the source and destination, causing packets to be dropped and/or long queuing delays. In times of congestion, if the sources continue to retransmit packets persistently, the congestion may get worse. Instead, TCP acts more politely, with each sender retransmitting after longer and longer intervals. We will see that a similar idea is used by Ethernet when we study CSMA/CD in Chapter 6.

Fast Retransmit

One of the problems with timeout-triggered retransmissions is that the timeout period can be relatively long. When a segment is lost, this long timeout period forces the sender to delay resending the lost packet, thereby increasing the end-to-end delay. Fortunately, the sender can often detect packet loss well before the timeout event occurs by noting so-called duplicate ACKs. A duplicate ACK is an ACK that reacknowledges a segment for which the sender has already received an earlier acknowledgment. To understand the sender’s response to a duplicate ACK, we must look at why the receiver sends a duplicate ACK in the first place. Table 3.2 summarizes the TCP receiver’s ACK generation policy [RFC 5681].

Table 3.2 TCP ACK Generation Recommendation [RFC 5681]

Event: Arrival of in-order segment with expected sequence number. All data up to expected sequence number already acknowledged.
Action: Delayed ACK. Wait up to 500 msec for arrival of another in-order segment. If the next in-order segment does not arrive in this interval, send an ACK.

Event: Arrival of in-order segment with expected sequence number. One other in-order segment waiting for ACK transmission.
Action: Immediately send a single cumulative ACK, ACKing both in-order segments.

Event: Arrival of out-of-order segment with higher-than-expected sequence number. Gap detected.
Action: Immediately send a duplicate ACK, indicating the sequence number of the next expected byte (which is the lower end of the gap).

Event: Arrival of segment that partially or completely fills in a gap in the received data.
Action: Immediately send an ACK, provided that the segment starts at the lower end of the gap.

When a TCP receiver receives a segment with a sequence number that is larger than the next, expected, in-order sequence number, it detects a gap in the data stream—that is, a missing segment. This gap could be the result of lost or reordered segments within the network. Since TCP does not use negative acknowledgments, the receiver cannot send an explicit negative acknowledgment back to the sender. Instead, it simply reacknowledges (that is, generates a duplicate ACK for) the last in-order byte of data it has received. (Note that Table 3.2 allows for the case that the receiver does not discard out-of-order segments.)

Because a sender often sends a large number of segments back to back, if one segment is lost, there will likely be many back-to-back duplicate ACKs. If the TCP sender receives three duplicate ACKs for the same data, it takes this as an indication that the segment following the segment that has been ACKed three times has been lost. (In the homework problems, we consider the question of why the sender waits for three duplicate ACKs, rather than just a single duplicate ACK.) In the case that three duplicate ACKs are received, the TCP sender performs a fast retransmit [RFC 5681], retransmitting the missing segment before that segment’s timer expires. This is shown in Figure 3.37, where the second segment is lost, then retransmitted before its timer expires. For TCP with fast retransmit, the following code snippet replaces the ACK received event in Figure 3.33:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently any not-yet-acknowledged segments)
            start timer
    }
    else { /* a duplicate ACK for an already ACKed segment */
        increment number of duplicate ACKs received for y
        if (number of duplicate ACKs received for y == 3)
            /* TCP fast retransmit */
            resend segment with sequence number y
    }
    break;

Figure 3.37 Fast retransmit: retransmitting the missing segment before the segment’s timer expires

We noted earlier that many subtle issues arise when a timeout/retransmit mechanism is implemented in an actual protocol such as TCP. The procedures above, which have evolved as a result of more than 20 years of experience with TCP timers, should convince you that this is indeed the case!

Go-Back-N or Selective Repeat?

Let us close our study of TCP’s error-recovery mechanism by considering the following question: Is TCP a GBN or an SR protocol? Recall that TCP acknowledgments are cumulative, and correctly received but out-of-order segments are not individually ACKed by the receiver. Consequently, as shown in Figure 3.33 (see also Figure 3.19), the TCP sender need only maintain the smallest sequence number of a transmitted but unacknowledged byte (SendBase) and the sequence number of the next byte to be sent (NextSeqNum). In this sense, TCP looks a lot like a GBN-style protocol. But there are some striking differences between TCP and Go-Back-N. Many TCP implementations will buffer correctly received but out-of-order segments [Stevens 1994]. Consider also what happens when the sender sends a sequence of segments 1, 2, . . ., N, and all of the segments arrive in order without error at the receiver. Further suppose that the acknowledgment for packet n