Computer Networking: A Top-Down Approach
Seventh Edition

James F. Kurose, University of Massachusetts, Amherst
Keith W. Ross, NYU and NYU Shanghai

Boston Columbus Indianapolis New York San Francisco Hoboken Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montréal Toronto Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo

Vice President, Editorial Director, ECS: Marcia Horton
Acquisitions Editor: Matt Goldstein
Editorial Assistant: Kristy Alaura
Vice President of Marketing: Christy Lesko
Director of Field Marketing: Tim Galligan
Product Marketing Manager: Bram Van Kempen
Field Marketing Manager: Demetrius Hall
Marketing Assistant: Jon Bryant
Director of Product Management: Erin Gregg
Team Lead, Program and Project Management: Scott Disanno
Program Managers: Joanne Manning and Carole Snyder
Project Manager: Katrina Ostler, Ostler Editorial, Inc.
Senior Specialist, Program Planning and Support: Maura Zaldivar-Garcia
Cover Designer: Joyce Wells
Manager, Rights and Permissions: Ben Ferrini
Project Manager, Rights and Permissions: Jenny Hoffman, Aptara Corporation
Inventory Manager: Ann Lam
Cover Image: Marc Gutierrez/Getty Images
Media Project Manager: Steve Wright
Composition: Cenveo Publishing Services
Printer/Binder: Edwards Brothers Malloy
Cover and Insert Printer: Phoenix Color/Hagerstown

Credits and acknowledgments borrowed from other sources and reproduced, with permission, in this textbook appear on the appropriate page within the text.

Copyright © 2017, 2013, 2010 Pearson Education, Inc. All rights reserved. Manufactured in the United States of America. This publication is protected by copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, request forms, and the appropriate contacts within the Pearson Education Global Rights & Permissions Department, please visit www.pearsoned.com/permissions/.

Many of the designations by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Library of Congress Cataloging-in-Publication Data
Names: Kurose, James F. | Ross, Keith W., 1956-
Title: Computer networking: a top-down approach / James F. Kurose, University of Massachusetts, Amherst, Keith W. Ross, NYU and NYU Shanghai.
Description: Seventh edition. | Hoboken, New Jersey: Pearson, [2017] | Includes bibliographical references and index.
Identifiers: LCCN 2016004976 | ISBN 9780133594140 | ISBN 0133594149
Subjects: LCSH: Internet. | Computer networks.
Classification: LCC TK5105.875.I57 K88 2017 | DDC 004.6-dc23

LC record available at http://lccn.loc.gov/2016004976

ISBN-10: 0-13-359414-9
ISBN-13: 978-0-13-359414-0

About the Authors

Jim Kurose

Jim Kurose is a Distinguished University Professor of Computer Science at the University of Massachusetts, Amherst.
He is currently on leave from the University of Massachusetts, serving as an Assistant Director at the US National Science Foundation, where he leads the Directorate of Computer and Information Science and Engineering. Dr. Kurose has received a number of recognitions for his educational activities, including Outstanding Teacher Awards from the National Technological University (eight times), the University of Massachusetts, and the Northeast Association of Graduate Schools. He received the IEEE Taylor Booth Education Medal and was recognized for his leadership of Massachusetts' Commonwealth Information Technology Initiative. He has won several conference best paper awards and received the IEEE Infocom Achievement Award and the ACM Sigcomm Test of Time Award.

Dr. Kurose is a former Editor-in-Chief of IEEE Transactions on Communications and of IEEE/ACM Transactions on Networking. He has served as Technical Program co-Chair for IEEE Infocom, ACM SIGCOMM, ACM Internet Measurement Conference, and ACM SIGMETRICS. He is a Fellow of the IEEE and the ACM. His research interests include network protocols and architecture, network measurement, multimedia communication, and modeling and performance evaluation. He holds a PhD in Computer Science from Columbia University.

Keith Ross

Keith Ross is the Dean of Engineering and Computer Science at NYU Shanghai and the Leonard J. Shustek Chair Professor in the Computer Science and Engineering Department at NYU. Previously he was at the University of Pennsylvania (13 years), the Eurecom Institute (5 years), and Polytechnic University (10 years). He received a B.S.E.E. from Tufts University, an M.S.E.E. from Columbia University, and a Ph.D. in Computer and Control Engineering from the University of Michigan. Keith Ross is also the co-founder and original CEO of Wimba, which developed online multimedia applications for e-learning and was acquired by Blackboard in 2010.

Professor Ross's research interests are in privacy, social networks, peer-to-peer networking, Internet measurement, content distribution networks, and stochastic modeling. He is an ACM Fellow, an IEEE Fellow, recipient of the Infocom 2009 Best Paper Award, and recipient of the 2011 and 2008 Best Paper Awards for Multimedia Communications (awarded by the IEEE Communications Society). He has served on numerous journal editorial boards and conference program committees, including IEEE/ACM Transactions on Networking, ACM SIGCOMM, ACM CoNext, and ACM Internet Measurement Conference. He also has served as an advisor to the Federal Trade Commission on P2P file sharing.

To Julie and our three precious ones---Chris, Charlie, and Nina
JFK

A big THANKS to my professors, colleagues, and students all over the world.
KWR

Preface

Welcome to the seventh edition of Computer Networking: A Top-Down Approach. Since the publication of the first edition 16 years ago, our book has been adopted for use at many hundreds of colleges and universities, translated into 14 languages, and used by over one hundred thousand students and practitioners worldwide. We've heard from many of these readers and have been overwhelmed by the positive response.

What's New in the Seventh Edition?

We think one important reason for this success has been that our book continues to offer a fresh and timely approach to computer networking instruction.
We've made changes in this seventh edition, but we've also kept unchanged what we believe (and the instructors and students who have used our book have confirmed) to be the most important aspects of this book: its top-down approach, its focus on the Internet and a modern treatment of computer networking, its attention to both principles and practice, and its accessible style and approach toward learning about computer networking. Nevertheless, the seventh edition has been revised and updated substantially.

Long-time readers of our book will notice that for the first time since this text was published, we've changed the organization of the chapters themselves. The network layer, which had previously been covered in a single chapter, is now covered in Chapter 4 (which focuses on the so-called "data plane" component of the network layer) and Chapter 5 (which focuses on the network layer's "control plane"). This expanded coverage of the network layer reflects the swift rise in importance of software-defined networking (SDN), arguably the most important and exciting advance in networking in decades. Although a relatively recent innovation, SDN has been rapidly adopted in practice---so much so that it's already hard to imagine an introduction to modern computer networking that doesn't cover SDN. The topic of network management, previously covered in Chapter 9, has now been folded into the new Chapter 5. As always, we've also updated many other sections of the text to reflect recent changes in the dynamic field of networking since the sixth edition; material that has been retired from the printed text can always be found on this book's Companion Website. The most important updates are the following:

Chapter 1 has been updated to reflect the ever-growing reach and use of the Internet.

Chapter 2, which covers the application layer, has been significantly updated. We've removed the material on the FTP protocol and distributed hash tables to make room for a new section on application-level video streaming and content distribution networks, together with Netflix and YouTube case studies. The socket programming sections have been updated from Python 2 to Python 3.

Chapter 3, which covers the transport layer, has been modestly updated. The material on asynchronous transfer mode (ATM) networks has been replaced by more modern material on the Internet's explicit congestion notification (ECN), which teaches the same principles.

Chapter 4 covers the "data plane" component of the network layer---the per-router forwarding function that determines how a packet arriving on one of a router's input links is forwarded to one of that router's output links. We updated the material on traditional Internet forwarding found in all previous editions, and added material on packet scheduling. We've also added a new section on generalized forwarding, as practiced in SDN. There are also numerous updates throughout the chapter. Material on multicast and broadcast communication has been removed to make way for the new material.

In Chapter 5, we cover the control plane functions of the network layer---the network-wide logic that controls how a datagram is routed along an end-to-end path of routers from the source host to the destination host. As in previous editions, we cover routing algorithms, as well as routing protocols (with an updated treatment of BGP) used in today's Internet.
We've added a significant new section on the SDN control plane, where routing and other functions are implemented in so-called SDN controllers.

Chapter 6, which now covers the link layer, has an updated treatment of Ethernet, and of data center networking.

Chapter 7, which covers wireless and mobile networking, contains updated material on 802.11 (so-called "WiFi") networks and cellular networks, including 4G and LTE.

Chapter 8, which covers network security and was extensively updated in the sixth edition, has only modest updates in this seventh edition.

Chapter 9, on multimedia networking, is now slightly "thinner" than in the sixth edition, as material on video streaming and content distribution networks has been moved to Chapter 2, and material on packet scheduling has been incorporated into Chapter 4.

Significant new material involving end-of-chapter problems has been added. As with all previous editions, homework problems have been revised, added, and removed.

As always, our aim in creating this new edition of our book is to continue to provide a focused and modern treatment of computer networking, emphasizing both principles and practice.

Audience

This textbook is for a first course on computer networking. It can be used in both computer science and electrical engineering departments. In terms of programming languages, the book assumes only that the student has experience with C, C++, Java, or Python (and even then only in a few places). Although this book is more precise and analytical than many other introductory computer networking texts, it rarely uses any mathematical concepts that are not taught in high school. We have made a deliberate effort to avoid using any advanced calculus, probability, or stochastic process concepts (although we've included some homework problems for students with this advanced background). The book is therefore appropriate for undergraduate courses and for first-year graduate courses. It should also be useful to practitioners in the telecommunications industry.

What Is Unique About This Textbook?

The subject of computer networking is enormously complex, involving many concepts, protocols, and technologies that are woven together in an intricate manner. To cope with this scope and complexity, many computer networking texts are organized around the "layers" of a network architecture. With a layered organization, students can see through the complexity of computer networking---they learn about the distinct concepts and protocols in one part of the architecture while seeing the big picture of how all parts fit together. From a pedagogical perspective, our personal experience has been that such a layered approach indeed works well. Nevertheless, we have found that the traditional approach of teaching---bottom up, that is, from the physical layer toward the application layer---is not the best approach for a modern course on computer networking.

A Top-Down Approach

Our book broke new ground 16 years ago by treating networking in a top-down manner---that is, by beginning at the application layer and working its way down toward the physical layer. The feedback we received from teachers and students alike has confirmed that this top-down approach has many advantages and does indeed work well pedagogically. First, it places emphasis on the application layer (a "high growth area" in networking).
Indeed, many of the recent revolutions in computer networking---including the Web, peer-to-peer file sharing, and media streaming---have taken place at the application layer. An early emphasis on application-layer issues differs from the approaches taken in most other texts, which have only a small amount of material on network applications, their requirements, application-layer paradigms (e.g., client-server and peer-to-peer), and application programming interfaces. Second, our experience as instructors (and that of many instructors who have used this text) has been that teaching networking applications near the beginning of the course is a powerful motivational tool. Students are thrilled to learn about how networking applications work---applications such as e-mail and the Web, which most students use on a daily basis. Once a student understands the applications, the student can then understand the network services needed to support these applications. The student can then, in turn, examine the various ways in which such services might be provided and implemented in the lower layers. Covering applications early thus provides motivation for the remainder of the text.

Third, a top-down approach enables instructors to introduce network application development at an early stage. Students not only see how popular applications and protocols work, but also learn how easy it is to create their own network applications and application-level protocols. With the top-down approach, students get early exposure to the notions of socket programming, service models, and protocols---important concepts that resurface in all subsequent layers. By providing socket programming examples in Python, we highlight the central ideas without confusing students with complex code. Undergraduates in electrical engineering and computer science should not have difficulty following the Python code.

An Internet Focus

Although we dropped the phrase "Featuring the Internet" from the title of this book with the fourth edition, this doesn't mean that we dropped our focus on the Internet. Indeed, nothing could be further from the case! Instead, since the Internet has become so pervasive, we felt that any networking textbook must have a significant focus on the Internet, and thus this phrase was somewhat unnecessary. We continue to use the Internet's architecture and protocols as primary vehicles for studying fundamental computer networking concepts. Of course, we also include concepts and protocols from other network architectures. But the spotlight is clearly on the Internet, a fact reflected in our organizing the book around the Internet's five-layer architecture: the application, transport, network, link, and physical layers.

Another benefit of spotlighting the Internet is that most computer science and electrical engineering students are eager to learn about the Internet and its protocols. They know that the Internet has been a revolutionary and disruptive technology and can see that it is profoundly changing our world. Given the enormous relevance of the Internet, students are naturally curious about what is "under the hood." Thus, it is easy for an instructor to get students excited about basic principles when using the Internet as the guiding focus.

Teaching Networking Principles

Two of the unique features of the book---its top-down approach and its focus on the Internet---have appeared in the titles of our book.
If we could have squeezed a third phrase into the subtitle, it would have contained the word principles. The field of networking is now mature enough that a number of fundamentally important issues can be identified. For example, in the transport layer, the fundamental issues include reliable communication over an unreliable network layer, connection establishment/teardown and handshaking, congestion and flow control, and multiplexing. Three fundamentally important network-layer issues are determining "good" paths between two routers, interconnecting a large number of heterogeneous networks, and managing the complexity of a modern network. In the link layer, a fundamental problem is sharing a multiple access channel. In network security, techniques for providing confidentiality, authentication, and message integrity are all based on cryptographic fundamentals. This text identifies fundamental networking issues and studies approaches towards addressing these issues. The student learning these principles will gain knowledge with a long "shelf life"---long after today's network standards and protocols have become obsolete, the principles they embody will remain important and relevant. We believe that the combination of using the Internet to get the student's foot in the door and then emphasizing fundamental issues and solution approaches will allow the student to quickly understand just about any networking technology.

The Website

Each new copy of this textbook includes twelve months of access to a Companion Website for all book readers at http://www.pearsonhighered.com/cs-resources/, which includes:

Interactive learning material. The book's Companion Website contains VideoNotes---video presentations of important topics throughout the book done by the authors, as well as walkthroughs of solutions to problems similar to those at the end of the chapter. We've seeded the Web site with VideoNotes and online problems for Chapters 1 through 5 and will continue to actively add and update this material over time. As in earlier editions, the Web site contains the interactive Java applets that animate many key networking concepts. The site also has interactive quizzes that permit students to check their basic understanding of the subject matter. Professors can integrate these interactive features into their lectures or use them as mini labs.

Additional technical material. As we have added new material in each edition of our book, we've had to remove coverage of some existing topics to keep the book at a manageable length. For example, to make room for the new material in this edition, we've removed material on FTP, distributed hash tables, and multicasting. Material that appeared in earlier editions of the text is still of interest, and thus can be found on the book's Web site.

Programming assignments. The Web site also provides a number of detailed programming assignments, which include building a multithreaded Web server, building an e-mail client with a GUI interface, programming the sender and receiver sides of a reliable data transport protocol, programming a distributed routing algorithm, and more.

Wireshark labs. One's understanding of network protocols can be greatly deepened by seeing them in action. The Web site provides numerous Wireshark assignments that enable students to actually observe the sequence of messages exchanged between two protocol entities.
The Web site includes separate Wireshark labs on HTTP, DNS, TCP, UDP, IP, ICMP, Ethernet, ARP, WiFi, SSL, and on tracing all protocols involved in satisfying a request to fetch a Web page. We'll continue to add new labs over time.

In addition to the Companion Website, the authors maintain a public Web site, http://gaia.cs.umass.edu/kurose_ross/interactive, containing interactive exercises that create (and present solutions for) problems similar to selected end-of-chapter problems. Since students can generate (and view solutions for) an unlimited number of similar problem instances, they can work until the material is truly mastered.

Pedagogical Features

We have each been teaching computer networking for more than 30 years. Together, we bring more than 60 years of teaching experience to this text, during which time we have taught many thousands of students. We have also been active researchers in computer networking during this time. (In fact, Jim and Keith first met each other as master's students in a computer networking course taught by Mischa Schwartz in 1979 at Columbia University.) We think all this gives us a good perspective on where networking has been and where it is likely to go in the future. Nevertheless, we have resisted temptations to bias the material in this book towards our own pet research projects. We figure you can visit our personal Web sites if you are interested in our research. Thus, this book is about modern computer networking---it is about contemporary protocols and technologies as well as the underlying principles behind these protocols and technologies. We also believe that learning (and teaching!) about networking can be fun. A sense of humor, use of analogies, and real-world examples in this book will hopefully make this material more fun.

Supplements for Instructors

We provide a complete supplements package to aid instructors in teaching this course. This material can be accessed from Pearson's Instructor Resource Center (http://www.pearsonhighered.com/irc). Visit the Instructor Resource Center for information about accessing these instructor's supplements.

PowerPoint® slides. We provide PowerPoint slides for all nine chapters. The slides have been completely updated with this seventh edition. The slides cover each chapter in detail. They use graphics and animations (rather than relying only on monotonous text bullets) to make the slides interesting and visually appealing. We provide the original PowerPoint slides so you can customize them to best suit your own teaching needs. Some of these slides have been contributed by other instructors who have taught from our book.

Homework solutions. We provide a solutions manual for the homework problems in the text, programming assignments, and Wireshark labs. As noted earlier, we've introduced many new homework problems in the first six chapters of the book.

Chapter Dependencies

The first chapter of this text presents a self-contained overview of computer networking. Introducing many key concepts and terminology, this chapter sets the stage for the rest of the book. All of the other chapters directly depend on this first chapter. After completing Chapter 1, we recommend instructors cover Chapters 2 through 6 in sequence, following our top-down philosophy. Each of these five chapters leverages material from the preceding chapters. After completing the first six chapters, the instructor has quite a bit of flexibility.
There are no interdependencies among the last three chapters, so they can be taught in any order. However, each of the last three chapters depends on the material in the first six chapters. Many instructors first teach the first six chapters and then teach one of the last three chapters for "dessert."

One Final Note: We'd Love to Hear from You

We encourage students and instructors to e-mail us with any comments they might have about our book. It's been wonderful for us to hear from so many instructors and students from around the world about our first five editions. We've incorporated many of these suggestions into later editions of the book. We also encourage instructors to send us new homework problems (and solutions) that would complement the current homework problems. We'll post these on the instructor-only portion of the Web site. We also encourage instructors and students to create new Java applets that illustrate the concepts and protocols in this book. If you have an applet that you think would be appropriate for this text, please submit it to us. If the applet (including notation and terminology) is appropriate, we'll be happy to include it on the text's Web site, with an appropriate reference to the applet's authors. So, as the saying goes, "Keep those cards and letters coming!" Seriously, please do continue to send us interesting URLs, point out typos, disagree with any of our claims, and tell us what works and what doesn't work. Tell us what you think should or shouldn't be included in the next edition. Send your e-mail to kurose@cs.umass.edu and keithwross@nyu.edu.

Acknowledgments

Since we began writing this book in 1996, many people have given us invaluable help and have been influential in shaping our thoughts on how to best organize and teach a networking course. We want to say A BIG THANKS to everyone who has helped us from the earliest first drafts of this book, up to this seventh edition. We are also very thankful to the many hundreds of readers from around the world---students, faculty, practitioners---who have sent us thoughts and comments on earlier editions of the book and suggestions for future editions of the book. Special thanks go out to:

Al Aho (Columbia University)
Hisham Al-Mubaid (University of Houston-Clear Lake)
Pratima Akkunoor (Arizona State University)
Paul Amer (University of Delaware)
Shamiul Azom (Arizona State University)
Lichun Bao (University of California at Irvine)
Paul Barford (University of Wisconsin)
Bobby Bhattacharjee (University of Maryland)
Steven Bellovin (Columbia University)
Pravin Bhagwat (Wibhu)
Supratik Bhattacharyya (previously at Sprint)
Ernst Biersack (Eurécom Institute)
Shahid Bokhari (University of Engineering & Technology, Lahore)
Jean Bolot (Technicolor Research)
Daniel Brushteyn (former University of Pennsylvania student)
Ken Calvert (University of Kentucky)
Evandro Cantu (Federal University of Santa Catarina)
Jeff Case (SNMP Research International)
Jeff Chaltas (Sprint)
Vinton Cerf (Google)
Byung Kyu Choi (Michigan Technological University)
Bram Cohen (BitTorrent, Inc.)
Constantine Coutras (Pace University)
John Daigle (University of Mississippi)
Edmundo A. de Souza e Silva (Federal University of Rio de Janeiro)
Philippe Decuetos (Eurécom Institute)
Christophe Diot (Technicolor Research)
Prithula Dhunghel (Akamai)
Deborah Estrin (University of California, Los Angeles)
Michalis Faloutsos (University of California at Riverside)
Wu-chi Feng (Oregon Graduate Institute)
Sally Floyd (ICIR, University of California at Berkeley)
Paul Francis (Max Planck Institute)
David Fullager (Netflix)
Lixin Gao (University of Massachusetts)
JJ Garcia-Luna-Aceves (University of California at Santa Cruz)
Mario Gerla (University of California at Los Angeles)
David Goodman (NYU-Poly)
Yang Guo (Alcatel/Lucent Bell Labs)
Tim Griffin (Cambridge University)
Max Hailperin (Gustavus Adolphus College)
Bruce Harvey (Florida A&M University, Florida State University)
Carl Hauser (Washington State University)
Rachelle Heller (George Washington University)
Phillipp Hoschka (INRIA/W3C)
Wen Hsin (Park University)
Albert Huang (former University of Pennsylvania student)
Cheng Huang (Microsoft Research)
Esther A. Hughes (Virginia Commonwealth University)
Van Jacobson (Xerox PARC)
Pinak Jain (former NYU-Poly student)
Jobin James (University of California at Riverside)
Sugih Jamin (University of Michigan)
Shivkumar Kalyanaraman (IBM Research, India)
Jussi Kangasharju (University of Helsinki)
Sneha Kasera (University of Utah)
Parviz Kermani (formerly of IBM Research)
Hyojin Kim (former University of Pennsylvania student)
Leonard Kleinrock (University of California at Los Angeles)
David Kotz (Dartmouth College)
Beshan Kulapala (Arizona State University)
Rakesh Kumar (Bloomberg)
Miguel A. Labrador (University of South Florida)
Simon Lam (University of Texas)
Steve Lai (Ohio State University)
Tom LaPorta (Penn State University)
Tim Berners-Lee (World Wide Web Consortium)
Arnaud Legout (INRIA)
Lee Leitner (Drexel University)
Brian Levine (University of Massachusetts)
Chunchun Li (former NYU-Poly student)
Yong Liu (NYU-Poly)
William Liang (former University of Pennsylvania student)
Willis Marti (Texas A&M University)
Nick McKeown (Stanford University)
Josh McKinzie (Park University)
Deep Medhi (University of Missouri, Kansas City)
Bob Metcalfe (International Data Group)
Sue Moon (KAIST)
Jenni Moyer (Comcast)
Erich Nahum (IBM Research)
Christos Papadopoulos (Colorado State University)
Craig Partridge (BBN Technologies)
Radia Perlman (Intel)
Jitendra Padhye (Microsoft Research)
Vern Paxson (University of California at Berkeley)
Kevin Phillips (Sprint)
George Polyzos (Athens University of Economics and Business)
Sriram Rajagopalan (Arizona State University)
Ramachandran Ramjee (Microsoft Research)
Ken Reek (Rochester Institute of Technology)
Martin Reisslein (Arizona State University)
Jennifer Rexford (Princeton University)
Leon Reznik (Rochester Institute of Technology)
Pablo Rodriguez (Telefonica)
Sumit Roy (University of Washington)
Dan Rubenstein (Columbia University)
Avi Rubin (Johns Hopkins University)
Douglas Salane (John Jay College)
Despina Saparilla (Cisco Systems)
John Schanz (Comcast)
Henning Schulzrinne (Columbia University)
Mischa Schwartz (Columbia University)
Ardash Sethi (University of Delaware)
Harish Sethu (Drexel University)
K. Sam Shanmugan (University of Kansas)
Prashant Shenoy (University of Massachusetts)
Clay Shields (Georgetown University)
Subin Shrestra (University of Pennsylvania)
Bojie Shu (former NYU-Poly student)
Mihail L. Sichitiu (NC State University)
Peter Steenkiste (Carnegie Mellon University)
Tatsuya Suda (University of California at Irvine)
Kin Sun Tam (State University of New York at Albany)
Don Towsley (University of Massachusetts)
David Turner (California State University, San Bernardino)
Nitin Vaidya (University of Illinois)
Michele Weigle (Clemson University)
David Wetherall (University of Washington)
Ira Winston (University of Pennsylvania)
Di Wu (Sun Yat-sen University)
Shirley Wynn (NYU-Poly)
Raj Yavatkar (Intel)
Yechiam Yemini (Columbia University)
Dian Yu (NYU Shanghai)
Ming Yu (State University of New York at Binghamton)
Ellen Zegura (Georgia Institute of Technology)
Honggang Zhang (Suffolk University)
Hui Zhang (Carnegie Mellon University)
Lixia Zhang (University of California at Los Angeles)
Meng Zhang (former NYU-Poly student)
Shuchun Zhang (former University of Pennsylvania student)
Xiaodong Zhang (Ohio State University)
Zhi-Li Zhang (University of Minnesota)
Phil Zimmermann (independent consultant)
Mike Zink (University of Massachusetts)
Cliff C. Zou (University of Central Florida)

We also want to thank the entire Pearson team---in particular, Matt Goldstein and Joanne Manning---who have done an absolutely outstanding job on this seventh edition (and who have put up with two very finicky authors who seem congenitally unable to meet deadlines!). Thanks also to our artists, Janet Theurer and Patrice Rossi Calkin, for their work on the beautiful figures in this and earlier editions of our book, and to Katie Ostler and her team at Cenveo for their wonderful production work on this edition. Finally, a most special thanks go to our previous two editors at Addison-Wesley---Michael Hirsch and Susan Hartman. This book would not be what it is (and may well not have been at all) without their graceful management, constant encouragement, nearly infinite patience, good humor, and perseverance.

Table of Contents

Chapter 1 Computer Networks and the Internet
1.1 What Is the Internet?
1.1.1 A Nuts-and-Bolts Description
1.1.2 A Services Description
1.1.3 What Is a Protocol?
1.2 The Network Edge
1.2.1 Access Networks
1.2.2 Physical Media
1.3 The Network Core
1.3.1 Packet Switching
1.3.2 Circuit Switching
1.3.3 A Network of Networks
1.4 Delay, Loss, and Throughput in Packet-Switched Networks
1.4.1 Overview of Delay in Packet-Switched Networks
1.4.2 Queuing Delay and Packet Loss
1.4.3 End-to-End Delay
1.4.4 Throughput in Computer Networks
1.5 Protocol Layers and Their Service Models
1.5.1 Layered Architecture
1.5.2 Encapsulation
1.6 Networks Under Attack
1.7 History of Computer Networking and the Internet
1.7.1 The Development of Packet Switching: 1961--1972
1.7.2 Proprietary Networks and Internetworking: 1972--1980
1.7.3 A Proliferation of Networks: 1980--1990
1.7.4 The Internet Explosion: The 1990s
1.7.5 The New Millennium
1.8 Summary
Homework Problems and Questions
Wireshark Lab
Interview: Leonard Kleinrock

Chapter 2 Application Layer
2.1 Principles of Network Applications
2.1.1 Network Application Architectures
2.1.2 Processes Communicating
2.1.3 Transport Services Available to Applications
2.1.4 Transport Services Provided by the Internet
2.1.5 Application-Layer Protocols
2.1.6 Network Applications Covered in This Book
2.2 The Web and HTTP
2.2.1 Overview of HTTP
2.2.2 Non-Persistent and Persistent Connections
2.2.3 HTTP Message Format
2.2.4 User-Server Interaction: Cookies
2.2.5 Web Caching
2.3 Electronic Mail in the Internet
2.3.1 SMTP
2.3.2 Comparison with HTTP
2.3.3 Mail Message Formats
2.3.4 Mail Access Protocols
2.4 DNS---The Internet's Directory Service
2.4.1 Services Provided by DNS
2.4.2 Overview of How DNS Works
2.4.3 DNS Records and Messages
2.5 Peer-to-Peer Applications
2.5.1 P2P File Distribution
2.6 Video Streaming and Content Distribution Networks
2.6.1 Internet Video
2.6.2 HTTP Streaming and DASH
2.6.3 Content Distribution Networks
2.6.4 Case Studies: Netflix, YouTube, and Kankan
2.7 Socket Programming: Creating Network Applications
2.7.1 Socket Programming with UDP
2.7.2 Socket Programming with TCP
2.8 Summary
Homework Problems and Questions
Socket Programming Assignments
Wireshark Labs: HTTP, DNS
Interview: Marc Andreessen

Chapter 3 Transport Layer
3.1 Introduction and Transport-Layer Services
3.1.1 Relationship Between Transport and Network Layers
3.1.2 Overview of the Transport Layer in the Internet
3.2 Multiplexing and Demultiplexing
3.3 Connectionless Transport: UDP
3.3.1 UDP Segment Structure
3.3.2 UDP Checksum
3.4 Principles of Reliable Data Transfer
3.4.1 Building a Reliable Data Transfer Protocol
3.4.2 Pipelined Reliable Data Transfer Protocols
3.4.3 Go-Back-N (GBN)
3.4.4 Selective Repeat (SR)
3.5 Connection-Oriented Transport: TCP
3.5.1 The TCP Connection
3.5.2 TCP Segment Structure
3.5.3 Round-Trip Time Estimation and Timeout
3.5.4 Reliable Data Transfer
3.5.5 Flow Control
3.5.6 TCP Connection Management
3.6 Principles of Congestion Control
3.6.1 The Causes and the Costs of Congestion
3.6.2 Approaches to Congestion Control
3.7 TCP Congestion Control
3.7.1 Fairness
3.7.2 Explicit Congestion Notification (ECN): Network-assisted Congestion Control
3.8 Summary
Homework Problems and Questions
Programming Assignments
Wireshark Labs: Exploring TCP, UDP
Interview: Van Jacobson

Chapter 4 The Network Layer: Data Plane
4.1 Overview of Network Layer
4.1.1 Forwarding and Routing: The Network Data and Control Planes
4.1.2 Network Service Models
4.2 What's Inside a Router?
4.2.1 Input Port Processing and Destination-Based Forwarding
4.2.2 Switching
4.2.3 Output Port Processing
4.2.4 Where Does Queuing Occur?
4.2.5 Packet Scheduling
4.3 The Internet Protocol (IP): IPv4, Addressing, IPv6, and More
4.3.1 IPv4 Datagram Format
4.3.2 IPv4 Datagram Fragmentation
4.3.3 IPv4 Addressing
4.3.4 Network Address Translation (NAT)
4.3.5 IPv6
4.4 Generalized Forwarding and SDN
4.4.1 Match
4.4.2 Action
4.4.3 OpenFlow Examples of Match-plus-action in Action
4.5 Summary
Homework Problems and Questions
Wireshark Lab
Interview: Vinton G. Cerf

Chapter 5 The Network Layer: Control Plane
5.1 Introduction
5.2 Routing Algorithms
5.2.1 The Link-State (LS) Routing Algorithm
5.2.2 The Distance-Vector (DV) Routing Algorithm
5.3 Intra-AS Routing in the Internet: OSPF
5.4 Routing Among the ISPs: BGP
5.4.1 The Role of BGP
5.4.2 Advertising BGP Route Information
5.4.3 Determining the Best Routes
5.4.4 IP-Anycast
5.4.5 Routing Policy
5.4.6 Putting the Pieces Together: Obtaining Internet Presence
5.5 The SDN Control Plane
5.5.1 The SDN Control Plane: SDN Controller and SDN Control Applications
5.5.2 OpenFlow Protocol
5.5.3 Data and Control Plane Interaction: An Example
5.5.4 SDN: Past and Future
5.6 ICMP: The Internet Control Message Protocol
5.7 Network Management and SNMP
5.7.1 The Network Management Framework
5.7.2 The Simple Network Management Protocol (SNMP)
5.8 Summary
Homework Problems and Questions
Socket Programming Assignment
Programming Assignment
Wireshark Lab
Interview: Jennifer Rexford

Chapter 6 The Link Layer and LANs
6.1 Introduction to the Link Layer
6.1.1 The Services Provided by the Link Layer
6.1.2 Where Is the Link Layer Implemented?
6.2 Error-Detection and -Correction Techniques
6.2.1 Parity Checks
6.2.2 Checksumming Methods
6.2.3 Cyclic Redundancy Check (CRC)
6.3 Multiple Access Links and Protocols
6.3.1 Channel Partitioning Protocols
6.3.2 Random Access Protocols
6.3.3 Taking-Turns Protocols
6.3.4 DOCSIS: The Link-Layer Protocol for Cable Internet Access
6.4 Switched Local Area Networks
6.4.1 Link-Layer Addressing and ARP
6.4.2 Ethernet
6.4.3 Link-Layer Switches
6.4.4 Virtual Local Area Networks (VLANs)
6.5 Link Virtualization: A Network as a Link Layer
6.5.1 Multiprotocol Label Switching (MPLS)
6.6 Data Center Networking
6.7 Retrospective: A Day in the Life of a Web Page Request
6.7.1 Getting Started: DHCP, UDP, IP, and Ethernet
6.7.2 Still Getting Started: DNS and ARP
6.7.3 Still Getting Started: Intra-Domain Routing to the DNS Server
6.7.4 Web Client-Server Interaction: TCP and HTTP
6.8 Summary
Homework Problems and Questions
Wireshark Lab
Interview: Simon S. Lam

Chapter 7 Wireless and Mobile Networks
7.1 Introduction
7.2 Wireless Links and Network Characteristics
7.2.1 CDMA
7.3 WiFi: 802.11 Wireless LANs
7.3.1 The 802.11 Architecture
7.3.2 The 802.11 MAC Protocol
7.3.3 The IEEE 802.11 Frame
7.3.4 Mobility in the Same IP Subnet
7.3.5 Advanced Features in 802.11
7.3.6 Personal Area Networks: Bluetooth and Zigbee
7.4 Cellular Internet Access
7.4.1 An Overview of Cellular Network Architecture
7.4.2 3G Cellular Data Networks: Extending the Internet to Cellular Subscribers
7.4.3 On to 4G: LTE
7.5 Mobility Management: Principles
7.5.1 Addressing
7.5.2 Routing to a Mobile Node
7.6 Mobile IP
7.7 Managing Mobility in Cellular Networks
7.7.1 Routing Calls to a Mobile User
7.7.2 Handoffs in GSM
7.8 Wireless and Mobility: Impact on Higher-Layer Protocols
7.9 Summary
Homework Problems and Questions
Wireshark Lab
Interview: Deborah Estrin

Chapter 8 Security in Computer Networks
8.1 What Is Network Security?
8.2 Principles of Cryptography
8.2.1 Symmetric Key Cryptography
8.2.2 Public Key Encryption
8.3 Message Integrity and Digital Signatures
8.3.1 Cryptographic Hash Functions
8.3.2 Message Authentication Code
8.3.3 Digital Signatures
8.4 End-Point Authentication
8.4.1 Authentication Protocol ap1.0
8.4.2 Authentication Protocol ap2.0
8.4.3 Authentication Protocol ap3.0
8.4.4 Authentication Protocol ap3.1
8.4.5 Authentication Protocol ap4.0
8.5 Securing E-Mail
8.5.1 Secure E-Mail
8.5.2 PGP
8.6 Securing TCP Connections: SSL
8.6.1 The Big Picture
8.6.2 A More Complete Picture
8.7 Network-Layer Security: IPsec and Virtual Private Networks
8.7.1 IPsec and Virtual Private Networks (VPNs)
8.7.2 The AH and ESP Protocols
8.7.3 Security Associations
8.7.4 The IPsec Datagram
8.7.5 IKE: Key Management in IPsec
8.8 Securing Wireless LANs
8.8.1 Wired Equivalent Privacy (WEP)
8.8.2 IEEE 802.11i
8.9 Operational Security: Firewalls and Intrusion Detection Systems
8.9.1 Firewalls
8.9.2 Intrusion Detection Systems
8.10 Summary
Homework Problems and Questions
Wireshark Lab
IPsec Lab
Interview: Steven M. Bellovin

Chapter 9 Multimedia Networking
9.1 Multimedia Networking Applications
9.1.1 Properties of Video
9.1.2 Properties of Audio
9.1.3 Types of Multimedia Network Applications
9.2 Streaming Stored Video
9.2.1 UDP Streaming
9.2.2 HTTP Streaming
9.3 Voice-over-IP
9.3.1 Limitations of the Best-Effort IP Service
9.3.2 Removing Jitter at the Receiver for Audio
9.3.3 Recovering from Packet Loss
9.3.4 Case Study: VoIP with Skype
9.4 Protocols for Real-Time Conversational Applications
9.4.1 RTP
9.4.2 SIP
9.5 Network Support for Multimedia
9.5.1 Dimensioning Best-Effort Networks
9.5.2 Providing Multiple Classes of Service
9.5.3 Diffserv
9.5.4 Per-Connection Quality-of-Service (QoS) Guarantees: Resource Reservation and Call Admission
9.6 Summary
Homework Problems and Questions
Programming Assignment
Interview: Henning Schulzrinne

References
Index

Chapter 1 Computer Networks and the Internet

Today's Internet is arguably the largest engineered system ever created by mankind, with hundreds of millions of connected computers, communication links, and switches; with billions of users who connect via laptops, tablets, and smartphones; and with an array of new Internet-connected "things" including game consoles, surveillance systems, watches, eye glasses, thermostats, body scales, and cars. Given that the Internet is so large and has so many diverse components and uses, is there any hope of understanding how it works? Are there guiding principles and structure that can provide a foundation for understanding such an amazingly large and complex system? And if so, is it possible that it actually could be both interesting and fun to learn about computer networks? Fortunately, the answer to all of these questions is a resounding YES! Indeed, it's our aim in this book to provide you with a modern introduction to the dynamic field of computer networking, giving you the principles and practical insights you'll need to understand not only today's networks, but tomorrow's as well.

This first chapter presents a broad overview of computer networking and the Internet. Our goal here is to paint a broad picture and set the context for the rest of this book, to see the forest through the trees. We'll cover a lot of ground in this introductory chapter and discuss a lot of the pieces of a computer network, without losing sight of the big picture.

We'll structure our overview of computer networks in this chapter as follows. After introducing some basic terminology and concepts, we'll first examine the basic hardware and software components that make up a network. We'll begin at the network's edge and look at the end systems and network applications running in the network. We'll then explore the core of a computer network, examining the links and the switches that transport data, as well as the access networks and physical media that connect end systems to the network core. We'll learn that the Internet is a network of networks, and we'll learn how these networks connect with each other. After having completed this overview of the edge and core of a computer network, we'll take the broader and more abstract view in the second half of this chapter.
We'll examine delay, loss, and throughput of data in a computer network and provide simple quantitative models for end-to-end throughput and delay: models that take into account transmission, propagation, and queuing delays. We'll then introduce some of the key architectural principles in computer networking, namely, protocol layering and service models. We'll also learn that computer networks are vulnerable to many different types of attacks; we'll survey some of these attacks and consider how computer networks can be made more secure. Finally, we'll close this chapter with a brief history of computer networking.

1.1 What Is the Internet?

In this book, we'll use the public Internet, a specific computer network, as our principal vehicle for discussing computer networks and their protocols. But what is the Internet? There are a couple of ways to answer this question. First, we can describe the nuts and bolts of the Internet, that is, the basic hardware and software components that make up the Internet. Second, we can describe the Internet in terms of a networking infrastructure that provides services to distributed applications. Let's begin with the nuts-and-bolts description, using Figure 1.1 to illustrate our discussion.

1.1.1 A Nuts-and-Bolts Description

The Internet is a computer network that interconnects billions of computing devices throughout the world. Not too long ago, these computing devices were primarily traditional desktop PCs, Linux workstations, and so-called servers that store and transmit information such as Web pages and e-mail messages. Increasingly, however, nontraditional Internet "things" such as laptops, smartphones, tablets, TVs, gaming consoles, thermostats, home security systems, home appliances, watches, eye glasses, cars, traffic control systems, and more are being connected to the Internet. Indeed, the term computer network is beginning to sound a bit dated, given the many nontraditional devices that are being hooked up to the Internet. In Internet jargon, all of these devices are called hosts or end systems. By some estimates, in 2015 there were about 5 billion devices connected to the Internet, and the number will reach 25 billion by 2020 [Gartner 2014]. It is estimated that in 2015 there were over 3.2 billion Internet users worldwide, approximately 40% of the world population [ITU 2015].

Figure 1.1 Some pieces of the Internet

End systems are connected together by a network of communication links and packet switches. We'll see in Section 1.2 that there are many types of communication links, which are made up of different types of physical media, including coaxial cable, copper wire, optical fiber, and radio spectrum. Different links can transmit data at different rates, with the transmission rate of a link measured in bits/second. When one end system has data to send to another end system, the sending end system segments the data and adds header bytes to each segment. The resulting packages of information, known as packets in the jargon of computer networks, are then sent through the network to the destination end system, where they are reassembled into the original data. A packet switch takes a packet arriving on one of its incoming communication links and forwards that packet on one of its outgoing communication links. Packet switches come in many shapes and flavors, but the two most prominent types in today's Internet are routers and link-layer switches.
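To make the segmentation-and-reassembly idea just described concrete, here is a minimal Python sketch (Python is the language this book uses for its socket programming examples). Everything in it is invented for illustration: the 8-byte header format, the payload size, and the function names are not part of any real protocol. Real header formats, such as those of TCP and IP, are covered in later chapters.

```python
PAYLOAD_SIZE = 1000  # bytes of application data carried per packet (assumed)

def segment(data: bytes) -> list[bytes]:
    """Split data into packets, each prefixed with a simple header."""
    chunks = [data[i:i + PAYLOAD_SIZE] for i in range(0, len(data), PAYLOAD_SIZE)]
    packets = []
    for seq, chunk in enumerate(chunks):
        # Hypothetical header: 4-byte sequence number + 4-byte packet count.
        header = seq.to_bytes(4, "big") + len(chunks).to_bytes(4, "big")
        packets.append(header + chunk)
    return packets

def reassemble(packets: list[bytes]) -> bytes:
    """Rebuild the original data, even if packets arrive out of order."""
    payloads = {int.from_bytes(p[0:4], "big"): p[8:] for p in packets}
    return b"".join(payloads[seq] for seq in sorted(payloads))

message = b"x" * 2500            # 2,500 bytes of application data
pkts = segment(message)          # -> 3 packets (1000 + 1000 + 500 payload bytes)
assert reassemble(list(reversed(pkts))) == message

# Transmission rate ties in here: a link of rate R bits/second needs L/R
# seconds to push an L-bit packet onto the link. For the first packet above:
L_bits = len(pkts[0]) * 8        # 1,008 bytes -> 8,064 bits
R_bps = 1_000_000                # a 1 Mbps link (assumed)
print(L_bits / R_bps)            # 0.008064 seconds
```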
Both types of switches forward packets toward their ultimate destinations. Link-layer switches are typically used in access networks, while routers are typically used in the network core. The sequence of communication links and packet switches traversed by a packet from the sending end system to the receiving end system is known as a route or path through the network. Cisco predicts annual global IP traffic will pass the zettabyte (10^21 bytes) threshold by the end of 2016, and will reach 2 zettabytes per year by 2019 [Cisco VNI 2015].

Packet-switched networks (which transport packets) are in many ways similar to transportation networks of highways, roads, and intersections (which transport vehicles). Consider, for example, a factory that needs to move a large amount of cargo to some destination warehouse located thousands of kilometers away. At the factory, the cargo is segmented and loaded into a fleet of trucks. Each of the trucks then independently travels through the network of highways, roads, and intersections to the destination warehouse. At the destination warehouse, the cargo is unloaded and grouped with the rest of the cargo arriving from the same shipment. Thus, in many ways, packets are analogous to trucks, communication links are analogous to highways and roads, packet switches are analogous to intersections, and end systems are analogous to buildings. Just as a truck takes a path through the transportation network, a packet takes a path through a computer network.

End systems access the Internet through Internet Service Providers (ISPs), including residential ISPs such as local cable or telephone companies; corporate ISPs; university ISPs; ISPs that provide WiFi access in airports, hotels, coffee shops, and other public places; and cellular data ISPs, providing mobile access to our smartphones and other devices. Each ISP is in itself a network of packet switches and communication links. ISPs provide a variety of types of network access to the end systems, including residential broadband access such as cable modem or DSL, high-speed local area network access, and mobile wireless access. ISPs also provide Internet access to content providers, connecting Web sites and video servers directly to the Internet. The Internet is all about connecting end systems to each other, so the ISPs that provide access to end systems must also be interconnected. These lower-tier ISPs are interconnected through national and international upper-tier ISPs such as Level 3 Communications, AT&T, Sprint, and NTT. An upper-tier ISP consists of high-speed routers interconnected with high-speed fiber-optic links. Each ISP network, whether upper-tier or lower-tier, is managed independently, runs the IP protocol (see below), and conforms to certain naming and address conventions. We'll examine ISPs and their interconnection more closely in Section 1.3.

End systems, packet switches, and other pieces of the Internet run protocols that control the sending and receiving of information within the Internet. The Transmission Control Protocol (TCP) and the Internet Protocol (IP) are two of the most important protocols in the Internet. The IP protocol specifies the format of the packets that are sent and received among routers and end systems. The Internet's principal protocols are collectively known as TCP/IP. We'll begin looking into protocols in this introductory chapter. But that's just a start---much of this book is concerned with computer network protocols!
Given the importance of protocols to the Internet, it's important that everyone agree on what each and every protocol does, so that people can create systems and products that interoperate. This is where standards come into play. Internet standards are developed by the Internet Engineering Task Force (IETF) [IETF 2016]. The IETF standards documents are called requests for comments (RFCs). RFCs started out as general requests for comments (hence the name) to resolve network and protocol design problems that faced the precursor to the Internet [Allman 2011]. RFCs tend to be quite technical and detailed. They define protocols such as TCP, IP, HTTP (for the Web), and SMTP (for e-mail). There are currently more than 7,000 RFCs. Other bodies also specify standards for network components, most notably for network links. The IEEE 802 LAN/MAN Standards Committee [IEEE 802 2016], for example, specifies the Ethernet and wireless WiFi standards.

1.1.2 A Services Description

Our discussion above has identified many of the pieces that make up the Internet. But we can also describe the Internet from an entirely different angle---namely, as an infrastructure that provides services to applications. In addition to traditional applications such as e-mail and Web surfing, Internet applications include mobile smartphone and tablet applications, including Internet messaging, mapping with real-time road-traffic information, music streaming from the cloud, movie and television streaming, online social networks, video conferencing, multi-person games, and location-based recommendation systems. The applications are said to be distributed applications, since they involve multiple end systems that exchange data with each other. Importantly, Internet applications run on end systems---they do not run in the packet switches in the network core. Although packet switches facilitate the exchange of data among end systems, they are not concerned with the application that is the source or sink of data.

Let's explore a little more what we mean by an infrastructure that provides services to applications. To this end, suppose you have an exciting new idea for a distributed Internet application, one that may greatly benefit humanity or one that may simply make you rich and famous. How might you go about transforming this idea into an actual Internet application? Because applications run on end systems, you are going to need to write programs that run on the end systems. You might, for example, write your programs in Java, C, or Python. Now, because you are developing a distributed Internet application, the programs running on the different end systems will need to send data to each other. And here we get to a central issue---one that leads to the alternative way of describing the Internet as a platform for applications. How does one program running on one end system instruct the Internet to deliver data to another program running on another end system?

End systems attached to the Internet provide a socket interface that specifies how a program running on one end system asks the Internet infrastructure to deliver data to a specific destination program running on another end system. This Internet socket interface is a set of rules that the sending program must follow so that the Internet can deliver the data to the destination program. We'll discuss the Internet socket interface in detail in Chapter 2.
For now, +let's draw upon a simple analogy, one that we will frequently use in +this book. Suppose Alice wants to send a letter to Bob using the postal +service. Alice, of course, can't just write the letter (the data) and +drop the letter out her window. Instead, the postal service requires +that Alice put the letter in an envelope; write Bob's full name, +address, and zip code in the center of the envelope; seal the envelope; +put a stamp in the upper-right-hand corner of the envelope; and finally, +drop the envelope into an official postal service mailbox. Thus, the +postal service has its own "postal service interface," or set of rules, +that Alice must follow to have the postal service deliver her letter to +Bob. In a similar manner, the Internet has a socket interface that the +program sending data must follow to have the Internet deliver the data +to the program that will receive the data. The postal service, of +course, provides more than one service to its customers. It provides +express delivery, reception confirmation, ordinary use, and many more +services. In a similar manner, the Internet provides multiple services +to its applications. When you develop an Internet application, you too +must choose one of the Internet's services for your application. We'll +describe the Internet's services in Chapter 2. We have just given two +descriptions of the Internet; one in terms of its hardware and software +components, the other in terms of an infrastructure for providing +services to distributed applications. But perhaps you are still confused +as to what the Internet is. What are packet switching and TCP/IP? What +are routers? What kinds of communication links are present in the +Internet? What is a distributed application? How can a thermostat or +body scale be attached to the Internet? If you feel a bit overwhelmed by +all of this now, don't worry---the purpose of this book is to introduce +you to both the nuts and bolts of the Internet and the principles that +govern how and why it works. We'll explain these important terms and +questions in the following sections and chapters. + +1.1.3 What Is a Protocol? + +Now that we've got a bit of a feel for what the Internet is, let's +consider another important buzzword in computer networking: protocol. +What is a protocol? What does a protocol do? A Human Analogy It is +probably easiest to understand the notion of a computer network protocol +by first considering some human analogies, since we humans execute +protocols all of the time. Consider what you do when you want to ask +someone for the time of day. A typical exchange is shown in Figure 1.2. +Human protocol (or good manners, at least) dictates that one first offer +a greeting (the first "Hi" in Figure 1.2) to initiate communication with +someone else. The typical response to a "Hi" is a returned "Hi" message. +Implicitly, one then takes a cordial "Hi" response as an indication that +one can proceed and ask for the time of day. A different response to the +initial "Hi" (such as "Don't bother me!" or "I don't speak English," or +some unprintable reply) might + +Figure 1.2 A human protocol and a computer network protocol + +indicate an unwillingness or inability to communicate. In this case, the +human protocol would be not to ask for the time of day. Sometimes one +gets no response at all to a question, in which case one typically gives +up asking that person for the time. 
Note that in our human protocol, there are specific messages we send, and specific actions we take in response to the received reply messages or other events (such as no reply within some given amount of time). Clearly, transmitted and received messages, and actions taken when these messages are sent or received or other events occur, play a central role in a human protocol. If people run different protocols (for example, if one person has manners but the other does not, or if one understands the concept of time and the other does not) the protocols do not interoperate and no useful work can be accomplished. The same is true in networking---it takes two (or more) communicating entities running the same protocol in order to accomplish a task. Let's consider a second human analogy. Suppose you're in a college class (a computer networking class, for example!). The teacher is droning on about protocols and you're confused. The teacher stops to ask, "Are there any questions?" (a message that is transmitted to, and received by, all students who are not sleeping). You raise your hand (transmitting an implicit message to the teacher). Your teacher acknowledges you with a smile, saying "Yes . . ." (a transmitted message encouraging you to ask your question---teachers love to be asked questions), and you then ask your question (that is, transmit your message to your teacher). Your teacher hears your question (receives your question message) and answers (transmits a reply to you). Once again, we see that the transmission and receipt of messages, and a set of conventional actions taken when these messages are sent and received, are at the heart of this question-and-answer protocol. Network Protocols A network protocol is similar to a human protocol, except that the entities exchanging messages and taking actions are hardware or software components of some device (for example, computer, smartphone, tablet, router, or other network-capable device). All activity in the Internet that involves two or more communicating remote entities is governed by a protocol. For example, hardware-implemented protocols in two physically connected computers control the flow of bits on the "wire" between the two network interface cards; congestion-control protocols in end systems control the rate at which packets are transmitted between sender and receiver; protocols in routers determine a packet's path from source to destination. Protocols are running everywhere in the Internet, and consequently much of this book is about computer network protocols. As an example of a computer network protocol with which you are probably familiar, consider what happens when you make a request to a Web server, that is, when you type the URL of a Web page into your Web browser. The scenario is illustrated in the right half of Figure 1.2. First, your computer will send a connection request message to the Web server and wait for a reply. The Web server will eventually receive your connection request message and return a connection reply message. Knowing that it is now OK to request the Web document, your computer then sends the name of the Web page it wants to fetch from that Web server in a GET message. Finally, the Web server returns the Web page (file) to your computer.
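This exchange can be scripted directly. The sketch below uses Python's socket module to carry out the scenario just described: it opens a connection to a Web server, sends a GET message, and prints the start of the returned reply. The host name example.com is just an illustrative choice; any publicly reachable Web server would do.

```python
import socket

host = "example.com"  # any publicly reachable Web server

# Connection request and reply: TCP's handshake happens inside connect().
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, 80))

# The GET message names the Web page we want to fetch.
request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
sock.sendall(request.encode())

# The server returns the Web page, preceded by its response headers.
reply = b""
while chunk := sock.recv(4096):
    reply += chunk
sock.close()

print(reply[:300].decode(errors="replace"))
```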
Given the human and networking examples above, the exchange of messages and the actions taken when these messages are sent and received are the key defining elements of a protocol: A protocol defines the format and the order of messages exchanged between two or more communicating entities, as well as the actions taken on the transmission and/or receipt of a message or other event. The Internet, and computer networks in general, make extensive use of protocols. Different protocols are used to accomplish different communication tasks. As you read through this book, you will learn that some protocols are simple and straightforward, while others are complex and intellectually deep. Mastering the field of computer networking is equivalent to understanding the what, why, and how of networking protocols.

1.2 The Network Edge In the previous section we presented a high-level overview of the Internet and networking protocols. We are now going to delve a bit more deeply into the components of a computer network (and the Internet, in particular). We begin in this section at the edge of a network and look at the components with which we are most familiar---namely, the computers, smartphones and other devices that we use on a daily basis. In the next section we'll move from the network edge to the network core and examine switching and routing in computer networks. Recall from the previous section that in computer networking jargon, the computers and other devices connected to the Internet are often referred to as end systems. They are referred to as end systems because they sit at the edge of the Internet, as shown in Figure 1.3. The Internet's end systems include desktop computers (e.g., desktop PCs, Macs, and Linux boxes), servers (e.g., Web and e-mail servers), and mobile devices (e.g., laptops, smartphones, and tablets). Furthermore, an increasing number of non-traditional "things" are being attached to the Internet as end systems (see the Case History feature). End systems are also referred to as hosts because they host (that is, run) application programs such as a Web browser program, a Web server program, an e-mail client program, or an e-mail server program. Throughout this book we will use the terms hosts and end systems interchangeably; that is, host = end system.

Figure 1.3 End-system interaction

CASE HISTORY THE INTERNET OF THINGS Can you imagine a world in which just about everything is wirelessly connected to the Internet? A world in which most people, cars, bicycles, eye glasses, watches, toys, hospital equipment, home sensors, classrooms, video surveillance systems, atmospheric sensors, store-shelf products, and pets are connected? This world of the Internet of Things (IoT) may actually be just around the corner. By some estimates, as of 2015 there are already 5 billion things connected to the Internet, and the number could reach 25 billion by 2020 \[Gartner 2014\]. These things include our smartphones, which already follow us around in our homes, offices, and cars, reporting our geolocations and usage data to our ISPs and Internet applications. But in addition to our smartphones, a wide variety of non-traditional "things" are already available as products. For example, there are Internet-connected wearables, including watches (from Apple and many others) and eye glasses. Internet-connected glasses can, for example, upload everything we see to the cloud, allowing us to share our visual experiences with people around the world in real time. There are Internet-connected things already available for the smart home, including Internet-connected thermostats that can be controlled remotely from our smartphones, and Internet-connected body scales, enabling us to graphically review the progress of our diets from our smartphones. There are Internet-connected toys, including dolls that recognize and interpret a child's speech and respond appropriately. The IoT offers potentially revolutionary benefits to users. But at the same time there are also huge security and privacy risks. Attackers, via the Internet, might be able to hack into IoT devices or into the servers collecting data from IoT devices. For example, an attacker could hijack an Internet-connected doll and talk directly with a child; or an attacker could hack into a database that stores personal health and activity information collected from wearable devices. These security and privacy concerns could undermine the consumer confidence necessary for the technologies to meet their full potential and may result in less widespread adoption \[FTC 2015\].

Hosts are sometimes further divided into two categories: clients and servers. Informally, clients tend to be desktop and mobile PCs, smartphones, and so on, whereas servers tend to be more powerful machines that store and distribute Web pages, stream video, relay e-mail, and so on. Today, most of the servers from which we receive search results, e-mail, Web pages, and videos reside in large data centers. For example, Google has 50-100 data centers, including about 15 large centers, each with more than 100,000 servers.

1.2.1 Access Networks Having considered the applications and end systems at the "edge of the network," let's next consider the access network---the network that physically connects an end system to the first router (also known as the "edge router") on a path from the end system to any other distant end system. Figure 1.4 shows several types of access networks with thick, shaded lines and the settings (home, enterprise, and wide-area mobile wireless) in which they are used.

Figure 1.4 Access networks

Home Access: DSL, Cable, FTTH, Dial-Up, and Satellite In developed countries as of 2014, more than 78 percent of the households have Internet access, with Korea, Netherlands, Finland, and Sweden leading the way with more than 80 percent of households having Internet access, almost all via a high-speed broadband connection \[ITU 2015\]. Given this widespread use of home access networks, let's begin our overview of access networks by considering how homes connect to the Internet. Today, the two most prevalent types of broadband residential access are digital subscriber line (DSL) and cable. A residence typically obtains DSL Internet access from the same local telephone company (telco) that provides its wired local phone access. Thus, when DSL is used, a customer's telco is also its ISP. As shown in Figure 1.5, each customer's DSL modem uses the existing telephone line (twisted-pair copper wire, which we'll discuss in Section 1.2.2) to exchange data with a digital subscriber line access multiplexer (DSLAM) located in the telco's local central office (CO). The home's DSL modem takes digital data and translates it to high-frequency tones for transmission over telephone wires to the CO; the analog signals from many such houses are translated back into digital format at the DSLAM.
The residential telephone line carries both data and traditional telephone signals simultaneously, which are encoded at different frequencies: a high-speed downstream channel, in the 50 kHz to 1 MHz band; a medium-speed upstream channel, in the 4 kHz to 50 kHz band; and an ordinary two-way telephone channel, in the 0 to 4 kHz band. This approach makes the single DSL link appear as if there were three separate links, so that a telephone call and an Internet connection can share the DSL link at the same time. (We'll describe this technique of frequency-division multiplexing in Section 1.3.1.)

Figure 1.5 DSL Internet access

On the customer side, a splitter separates the data and telephone signals arriving at the home and forwards the data signal to the DSL modem. On the telco side, in the CO, the DSLAM separates the data and phone signals and sends the data into the Internet. Hundreds or even thousands of households connect to a single DSLAM \[Dischinger 2007\]. The DSL standards define multiple transmission rates, including 12 Mbps downstream and 1.8 Mbps upstream \[ITU 1999\], and 55 Mbps downstream and 15 Mbps upstream \[ITU 2006\]. Because the downstream and upstream rates are different, the access is said to be asymmetric. The actual downstream and upstream transmission rates achieved may be less than the rates noted above, as the DSL provider may purposefully limit a residential rate when tiered service (different rates, available at different prices) is offered. The maximum rate is also limited by the distance between the home and the CO, the gauge of the twisted-pair line, and the degree of electrical interference. Engineers have expressly designed DSL for short distances between the home and the CO; generally, if the residence is not located within 5 to 10 miles of the CO, the residence must resort to an alternative form of Internet access. While DSL makes use of the telco's existing local telephone infrastructure, cable Internet access makes use of the cable television company's existing cable television infrastructure. A residence obtains cable Internet access from the same company that provides its cable television. As illustrated in Figure 1.6, fiber optics connect the cable head end to neighborhood-level junctions, from which traditional coaxial cable is then used to reach individual houses and apartments. Each neighborhood junction typically supports 500 to 5,000 homes. Because both fiber and coaxial cable are employed in this system, it is often referred to as hybrid fiber coax (HFC).

Figure 1.6 A hybrid fiber-coaxial access network

Cable Internet access requires special modems, called cable modems. As with a DSL modem, the cable modem is typically an external device and connects to the home PC through an Ethernet port. (We will discuss Ethernet in great detail in Chapter 6.) At the cable head end, the cable modem termination system (CMTS) serves a function similar to that of the DSL network's DSLAM---turning the analog signal sent from the cable modems in many downstream homes back into digital format. Cable modems divide the HFC network into two channels, a downstream and an upstream channel. As with DSL, access is typically asymmetric, with the downstream channel typically allocated a higher transmission rate than the upstream channel. The DOCSIS 2.0 standard defines downstream rates of up to 42.8 Mbps and upstream rates of up to 30.7 Mbps.
As in the case of DSL networks, the maximum achievable rate may not be realized due to lower contracted data rates or media impairments. One important characteristic of cable Internet access is that it is a shared broadcast medium. In particular, every packet sent by the head end travels downstream on every link to every home, and every packet sent by a home travels on the upstream channel to the head end. For this reason, if several users are simultaneously downloading a video file on the downstream channel, the actual rate at which each user receives its video file will be significantly lower than the aggregate cable downstream rate. On the other hand, if there are only a few active users and they are all Web surfing, then each of the users may actually receive Web pages at the full cable downstream rate, because the users will rarely request a Web page at exactly the same time. Because the upstream channel is also shared, a distributed multiple access protocol is needed to coordinate transmissions and avoid collisions. (We'll discuss this collision issue in some detail in Chapter 6.) Although DSL and cable networks currently represent more than 85 percent of residential broadband access in the United States, an up-and-coming technology that provides even higher speeds is fiber to the home (FTTH) \[FTTH Council 2016\]. As the name suggests, the FTTH concept is simple---provide an optical fiber path from the CO directly to the home. Many countries today---including the UAE, South Korea, Hong Kong, Japan, Singapore, Taiwan, Lithuania, and Sweden---now have household penetration rates exceeding 30% \[FTTH Council 2016\]. There are several competing technologies for optical distribution from the CO to the homes. The simplest optical distribution network is called direct fiber, with one fiber leaving the CO for each home. More commonly, each fiber leaving the central office is actually shared by many homes; it is not until the fiber gets relatively close to the homes that it is split into individual customer-specific fibers. There are two competing optical-distribution network architectures that perform this splitting: active optical networks (AONs) and passive optical networks (PONs). AON is essentially switched Ethernet, which is discussed in Chapter 6. Here, we briefly discuss PON, which is used in Verizon's FIOS service. Figure 1.7 shows FTTH using the PON distribution architecture. Each home has an optical network terminator (ONT), which is connected by dedicated optical fiber to a neighborhood splitter. The splitter combines a number of homes (typically fewer than 100) onto a single, shared optical fiber, which connects to an optical line terminator (OLT) in the telco's CO.

Figure 1.7 FTTH Internet access

The OLT, providing conversion between optical and electrical signals, connects to the Internet via a telco router. In the home, users connect a home router (typically a wireless router) to the ONT and access the Internet via this home router. In the PON architecture, all packets sent from OLT to the splitter are replicated at the splitter (similar to a cable head end). FTTH can potentially provide Internet access rates in the gigabits per second range. However, most FTTH ISPs provide different rate offerings, with the higher rates naturally costing more money.
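To put these rates in perspective, the short calculation below (a back-of-the-envelope sketch, not a measurement) compares the time to download a 4-gigabyte movie at nominal rates drawn from this section; actual throughput is typically lower than these link rates.

```python
# Nominal downstream rates mentioned in this section, in bits per second.
rates = {
    "dial-up (56 kbps)": 56e3,
    "DSL (12 Mbps)": 12e6,
    "cable, DOCSIS 2.0 (42.8 Mbps)": 42.8e6,
    "FTTH (1 Gbps)": 1e9,
}

movie_bits = 4 * 8 * 1e9  # a 4-gigabyte movie, expressed in bits

for name, rate in rates.items():
    print(f"{name:30s} {movie_bits / rate:10.0f} seconds")
```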
The average downstream speed of US FTTH customers was approximately 20 Mbps in 2011 (compared with 13 Mbps for cable access networks and less than 5 Mbps for DSL) \[FTTH Council 2011b\]. Two other access network technologies are also used to provide Internet access to the home. In locations where DSL, cable, and FTTH are not available (e.g., in some rural settings), a satellite link can be used to connect a residence to the Internet at speeds of more than 1 Mbps; StarBand and HughesNet are two such satellite access providers. Dial-up access over traditional phone lines is based on the same model as DSL---a home modem connects over a phone line to a modem in the ISP. Compared with DSL and other broadband access networks, dial-up access is excruciatingly slow at 56 kbps. Access in the Enterprise (and the Home): Ethernet and WiFi On corporate and university campuses, and increasingly in home settings, a local area network (LAN) is used to connect an end system to the edge router. Although there are many types of LAN technologies, Ethernet is by far the most prevalent access technology in corporate, university, and home networks. As shown in Figure 1.8, Ethernet users use twisted-pair copper wire to connect to an Ethernet switch, a technology discussed in detail in Chapter 6. The Ethernet switch, or a network of such interconnected switches, is then in turn connected into the larger Internet.

Figure 1.8 Ethernet Internet access

With Ethernet access, users typically have 100 Mbps or 1 Gbps access to the Ethernet switch, whereas servers may have 1 Gbps or even 10 Gbps access. Increasingly, however, people are accessing the Internet wirelessly from laptops, smartphones, tablets, and other "things" (see earlier sidebar on "Internet of Things"). In a wireless LAN setting, wireless users transmit/receive packets to/from an access point that is connected into the enterprise's network (most likely using wired Ethernet), which in turn is connected to the wired Internet. A wireless LAN user must typically be within a few tens of meters of the access point. Wireless LAN access based on IEEE 802.11 technology, more colloquially known as WiFi, is now just about everywhere---universities, business offices, cafes, airports, homes, and even in airplanes. In many cities, one can stand on a street corner and be within range of ten or twenty base stations (for a browseable global map of 802.11 base stations that have been discovered and logged on a Web site by people who take great enjoyment in doing such things, see \[wigle.net 2016\]). As discussed in detail in Chapter 7, 802.11 today provides shared transmission rates of more than 100 Mbps. Even though Ethernet and WiFi access networks were initially deployed in enterprise (corporate, university) settings, they have recently become relatively common components of home networks. Many homes combine broadband residential access (that is, cable modems or DSL) with these inexpensive wireless LAN technologies to create powerful home networks \[Edwards 2011\]. Figure 1.9 shows a typical home network. This home network consists of a roaming laptop as well as a wired PC; a base station (the wireless access point), which communicates with the wireless PC and other wireless devices in the home; a cable modem, providing broadband access to the Internet; and a router, which interconnects the base station and the stationary PC with the cable modem.
This network allows household members to have broadband access to the Internet with one member roaming from the kitchen to the backyard to the bedrooms.

Figure 1.9 A typical home network

Wide-Area Wireless Access: 3G and LTE Increasingly, devices such as iPhones and Android devices are being used to message, share photos in social networks, watch movies, and stream music while on the run. These devices employ the same wireless infrastructure used for cellular telephony to send/receive packets through a base station that is operated by the cellular network provider. Unlike WiFi, a user need only be within a few tens of kilometers (as opposed to a few tens of meters) of the base station. Telecommunications companies have made enormous investments in so-called third-generation (3G) wireless, which provides packet-switched wide-area wireless Internet access at speeds in excess of 1 Mbps. But even higher-speed wide-area access technologies---a fourth-generation (4G) of wide-area wireless networks---are already being deployed. LTE (for "Long-Term Evolution"---a candidate for Bad Acronym of the Year Award) has its roots in 3G technology, and can achieve rates in excess of 10 Mbps. LTE downstream rates of many tens of Mbps have been reported in commercial deployments. We'll cover the basic principles of wireless networks and mobility, as well as WiFi, 3G, and LTE technologies (and more!) in Chapter 7.

1.2.2 Physical Media In the previous subsection, we gave an overview of some of the most important network access technologies in the Internet. As we described these technologies, we also indicated the physical media used. For example, we said that HFC uses a combination of fiber cable and coaxial cable. We said that DSL and Ethernet use copper wire. And we said that mobile access networks use the radio spectrum. In this subsection we provide a brief overview of these and other transmission media that are commonly used in the Internet.

In order to define what is meant by a physical medium, let us reflect on the brief life of a bit. Consider a bit traveling from one end system, through a series of links and routers, to another end system. This poor bit gets kicked around and transmitted many, many times! The source end system first transmits the bit, and shortly thereafter the first router in the series receives the bit; the first router then transmits the bit, and shortly thereafter the second router receives the bit; and so on. Thus our bit, when traveling from source to destination, passes through a series of transmitter-receiver pairs. For each transmitter-receiver pair, the bit is sent by propagating electromagnetic waves or optical pulses across a physical medium. The physical medium can take many shapes and forms and does not have to be of the same type for each transmitter-receiver pair along the path. Examples of physical media include twisted-pair copper wire, coaxial cable, multimode fiber-optic cable, terrestrial radio spectrum, and satellite radio spectrum. Physical media fall into two categories: guided media and unguided media. With guided media, the waves are guided along a solid medium, such as a fiber-optic cable, a twisted-pair copper wire, or a coaxial cable. With unguided media, the waves propagate in the atmosphere and in outer space, such as in a wireless LAN or a digital satellite channel. But before we get into the characteristics of the various media types, let us say a few words about their costs.
The actual cost of the physical link (copper wire, fiber-optic cable, and so on) is often relatively minor compared with other networking costs. In particular, the labor cost associated with the installation of the physical link can be orders of magnitude higher than the cost of the material. For this reason, many builders install twisted pair, optical fiber, and coaxial cable in every room in a building. Even if only one medium is initially used, there is a good chance that another medium could be used in the near future, and so money is saved by not having to lay additional wires in the future. Twisted-Pair Copper Wire The least expensive and most commonly used guided transmission medium is twisted-pair copper wire. For over a hundred years it has been used by telephone networks. In fact, more than 99 percent of the wired connections from the telephone handset to the local telephone switch use twisted-pair copper wire. Most of us have seen twisted pair in our homes (or those of our parents or grandparents!) and work environments. Twisted pair consists of two insulated copper wires, each about 1 mm thick, arranged in a regular spiral pattern. The wires are twisted together to reduce the electrical interference from similar pairs close by. Typically, a number of pairs are bundled together in a cable by wrapping the pairs in a protective shield. A wire pair constitutes a single communication link. Unshielded twisted pair (UTP) is commonly used for computer networks within a building, that is, for LANs. Data rates for LANs using twisted pair today range from 10 Mbps to 10 Gbps. The data rates that can be achieved depend on the thickness of the wire and the distance between transmitter and receiver. When fiber-optic technology emerged in the 1980s, many people disparaged twisted pair because of its relatively low bit rates. Some people even felt that fiber-optic technology would completely replace twisted pair. But twisted pair did not give up so easily. Modern twisted-pair technology, such as category 6a cable, can achieve data rates of 10 Gbps for distances up to a hundred meters. In the end, twisted pair has emerged as the dominant solution for high-speed LAN networking. As discussed earlier, twisted pair is also commonly used for residential Internet access. We saw that dial-up modem technology enables access at rates of up to 56 kbps over twisted pair. We also saw that DSL (digital subscriber line) technology has enabled residential users to access the Internet at tens of Mbps over twisted pair (when users live close to the ISP's central office). Coaxial Cable Like twisted pair, coaxial cable consists of two copper conductors, but the two conductors are concentric rather than parallel. With this construction and special insulation and shielding, coaxial cable can achieve high data transmission rates. Coaxial cable is quite common in cable television systems. As we saw earlier, cable television systems have recently been coupled with cable modems to provide residential users with Internet access at rates of tens of Mbps. In cable television and cable Internet access, the transmitter shifts the digital signal to a specific frequency band, and the resulting analog signal is sent from the transmitter to one or more receivers. Coaxial cable can be used as a guided shared medium. Specifically, a number of end systems can be connected directly to the cable, with each of the end systems receiving whatever is sent by the other end systems.
Fiber Optics An optical fiber is a thin, flexible medium that conducts pulses of light, with each pulse representing a bit. A single optical fiber can support tremendous bit rates, up to tens or even hundreds of gigabits per second. Optical fibers are immune to electromagnetic interference, have very low signal attenuation up to 100 kilometers, and are very hard to tap. These characteristics have made fiber optics the preferred long-haul guided transmission media, particularly for overseas links. Many of the long-distance telephone networks in the United States and elsewhere now use fiber optics exclusively. Fiber optics is also prevalent in the backbone of the Internet. However, the high cost of optical devices---such as transmitters, receivers, and switches---has hindered their deployment for short-haul transport, such as in a LAN or into the home in a residential access network. The Optical Carrier (OC) standard link speeds range from 51.8 Mbps to 39.8 Gbps; these specifications are often referred to as OC-n, where the link speed equals n × 51.8 Mbps. Standards in use today include OC-1, OC-3, OC-12, OC-24, OC-48, OC-96, OC-192, and OC-768. \[Mukherjee 2006, Ramaswami 2010\] provide coverage of various aspects of optical networking. Terrestrial Radio Channels Radio channels carry signals in the electromagnetic spectrum. They are an attractive medium because they require no physical wire to be installed, can penetrate walls, provide connectivity to a mobile user, and can potentially carry a signal for long distances. The characteristics of a radio channel depend significantly on the propagation environment and the distance over which a signal is to be carried. Environmental considerations determine path loss and shadow fading (which decrease the signal strength as the signal travels over a distance and around/through obstructing objects), multipath fading (due to signal reflection off of interfering objects), and interference (due to other transmissions and electromagnetic signals). Terrestrial radio channels can be broadly classified into three groups: those that operate over very short distances (e.g., within one or two meters); those that operate in local areas, typically spanning from ten to a few hundred meters; and those that operate in the wide area, spanning tens of kilometers. Personal devices such as wireless headsets, keyboards, and medical devices operate over short distances; the wireless LAN technologies described in Section 1.2.1 use local-area radio channels; the cellular access technologies use wide-area radio channels. We'll discuss radio channels in detail in Chapter 7. Satellite Radio Channels A communication satellite links two or more Earth-based microwave transmitter/receivers, known as ground stations. The satellite receives transmissions on one frequency band, regenerates the signal using a repeater (discussed below), and transmits the signal on another frequency. Two types of satellites are used in communications: geostationary satellites and low-earth orbiting (LEO) satellites \[Wiki Satellite 2016\]. Geostationary satellites permanently remain above the same spot on Earth. This stationary presence is achieved by placing the satellite in orbit at 36,000 kilometers above Earth's surface. This huge distance from ground station through satellite back to ground station introduces a substantial signal propagation delay of 280 milliseconds.
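This delay figure is easy to sanity-check with the rough calculation sketched below. The minimum path assumed here is straight up to the satellite and straight back down at the speed of light; the actual path is somewhat longer (ground stations are generally not directly beneath the satellite), which is one reason the quoted delay exceeds this lower bound.

```python
SPEED_OF_LIGHT = 3e8   # meters per second
ALTITUDE = 36_000e3    # geostationary orbit altitude, in meters

# Shortest possible path: straight up to the satellite, straight back down.
delay = 2 * ALTITUDE / SPEED_OF_LIGHT
print(f"minimum propagation delay: {delay * 1e3:.0f} ms")  # about 240 ms
```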
+Nevertheless, satellite links, which can operate at speeds of hundreds +of Mbps, are often used in areas without access to DSL or cable-based +Internet access. LEO satellites are placed much closer to Earth and do +not remain permanently above one spot on Earth. They rotate around Earth +(just as the Moon does) and may communicate with each other, as well as +with ground stations. To provide continuous coverage to an area, many +satellites need to be placed in orbit. There are currently many +low-altitude communication systems in development. LEO satellite +technology may be used for Internet access sometime in the future. + +1.3 The Network Core Having examined the Internet's edge, let us now +delve more deeply inside the network core---the mesh of packet switches +and links that interconnects the Internet's end systems. Figure 1.10 +highlights the network core with thick, shaded lines. + +Figure 1.10 The network core + +1.3.1 Packet Switching In a network application, end systems exchange +messages with each other. Messages can contain anything the application +designer wants. Messages may perform a control function (for example, +the "Hi" messages in our handshaking example in Figure 1.2) or can +contain data, such as an e-mail message, a JPEG image, or an MP3 audio +file. To send a message from a source end system to a destination end +system, the source breaks long messages into smaller chunks of data +known as packets. Between source and destination, each packet travels +through communication links and packet switches (for which there are two +predominant types, routers and link-layer switches). Packets are +transmitted over each communication link at a rate equal to the full +transmission rate of the link. So, if a source end system or a packet +switch is sending a packet of L bits over a link with transmission rate +R bits/sec, then the time to transmit the packet is L / R seconds. +Store-and-Forward Transmission Most packet switches use +store-and-forward transmission at the inputs to the links. +Store-and-forward transmission means that the packet switch must receive +the entire packet before it can begin to transmit the first bit of the +packet onto the outbound link. To explore store-and-forward transmission +in more detail, consider a simple network consisting of two end systems +connected by a single router, as shown in Figure 1.11. A router will +typically have many incident links, since its job is to switch an +incoming packet onto an outgoing link; in this simple example, the +router has the rather simple task of transferring a packet from one +(input) link to the only other attached link. In this example, the +source has three packets, each consisting of L bits, to send to the +destination. At the snapshot of time shown in Figure 1.11, the source +has transmitted some of packet 1, and the front of packet 1 has already +arrived at the router. Because the router employs store-and-forwarding, +at this instant of time, the router cannot transmit the bits it has +received; instead it must first buffer (i.e., "store") the packet's +bits. Only after the router has received all of the packet's bits can it +begin to transmit (i.e., "forward") the packet onto the outbound link. +To gain some insight into store-and-forward transmission, let's now +calculate the amount of time that elapses from when the source begins to +send the packet until the destination has received the entire packet. 
(Here we will ignore propagation delay---the time it takes for the bits to travel across the wire at near the speed of light---which will be discussed in Section 1.4.) The source begins to transmit at time 0; at time L/R seconds, the source has transmitted the entire packet, and the entire packet has been received and stored at the router (since there is no propagation delay). At time L/R seconds, since the router has just received the entire packet, it can begin to transmit the packet onto the outbound link towards the destination; at time 2L/R, the router has transmitted the entire packet, and the entire packet has been received by the destination. Thus, the total delay is 2L/R.

Figure 1.11 Store-and-forward packet switching

If the switch instead forwarded bits as soon as they arrive (without first receiving the entire packet), then the total delay would be L/R since bits are not held up at the router. But, as we will discuss in Section 1.4, routers need to receive, store, and process the entire packet before forwarding. Now let's calculate the amount of time that elapses from when the source begins to send the first packet until the destination has received all three packets. As before, at time L/R, the router begins to forward the first packet. But also at time L/R the source will begin to send the second packet, since it has just finished sending the entire first packet. Thus, at time 2L/R, the destination has received the first packet and the router has received the second packet. Similarly, at time 3L/R, the destination has received the first two packets and the router has received the third packet. Finally, at time 4L/R the destination has received all three packets! Let's now consider the general case of sending one packet from source to destination over a path consisting of N links each of rate R (thus, there are N-1 routers between source and destination). Applying the same logic as above, we see that the end-to-end delay is

d_end-to-end = N (L/R)    (1.1)

You may now want to try to determine what the delay would be for P packets sent over a series of N links.
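Equation 1.1 is also easy to evaluate numerically. The snippet below plugs in arbitrary example values for L, R, and N; none of these numbers come from the text.

```python
L = 12_000  # packet length in bits (example value)
R = 10e6    # transmission rate of every link, bits/sec (example value)
N = 3       # number of links on the source-to-destination path

# The packet must be fully received, then retransmitted, at each hop,
# so the L/R transmission delay is paid once per link.
delay = N * (L / R)
print(f"end-to-end delay: {delay * 1e3:.1f} ms")  # 3 x 1.2 ms = 3.6 ms
```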
Queuing Delays and Packet Loss Each packet switch has multiple links attached to it. For each attached link, the packet switch has an output buffer (also called an output queue), which stores packets that the router is about to send into that link. The output buffers play a key role in packet switching. If an arriving packet needs to be transmitted onto a link but finds the link busy with the transmission of another packet, the arriving packet must wait in the output buffer. Thus, in addition to the store-and-forward delays, packets suffer output buffer queuing delays. These delays are variable and depend on the level of congestion in the network. Since the amount of buffer space is finite, an arriving packet may find that the buffer is completely full with other packets waiting for transmission. In this case, packet loss will occur---either the arriving packet or one of the already-queued packets will be dropped.

Figure 1.12 Packet switching

Figure 1.12 illustrates a simple packet-switched network. As in Figure 1.11, packets are represented by three-dimensional slabs. The width of a slab represents the number of bits in the packet. In this figure, all packets have the same width and hence the same length. Suppose Hosts A and B are sending packets to Host E. Hosts A and B first send their packets along 100 Mbps Ethernet links to the first router. The router then directs these packets to the 15 Mbps link. If, during a short interval of time, the arrival rate of packets to the router (when converted to bits per second) exceeds 15 Mbps, congestion will occur at the router as packets queue in the link's output buffer before being transmitted onto the link. For example, if Hosts A and B each send a burst of five packets back-to-back at the same time, then most of these packets will spend some time waiting in the queue. The situation is, in fact, entirely analogous to many everyday situations---for example, when we wait in line for a bank teller or wait in front of a tollbooth. We'll examine this queuing delay in more detail in Section 1.4. Forwarding Tables and Routing Protocols Earlier, we said that a router takes a packet arriving on one of its attached communication links and forwards that packet onto another one of its attached communication links. But how does the router determine which link it should forward the packet onto? Packet forwarding is actually done in different ways in different types of computer networks. Here, we briefly describe how it is done in the Internet.

In the Internet, every end system has an address called an IP address. When a source end system wants to send a packet to a destination end system, the source includes the destination's IP address in the packet's header. As with postal addresses, this address has a hierarchical structure. When a packet arrives at a router in the network, the router examines a portion of the packet's destination address and forwards the packet to an adjacent router. More specifically, each router has a forwarding table that maps destination addresses (or portions of the destination addresses) to that router's outbound links. When a packet arrives at a router, the router examines the address and searches its forwarding table, using this destination address, to find the appropriate outbound link. The router then directs the packet to this outbound link. The end-to-end routing process is analogous to a car driver who does not use maps but instead prefers to ask for directions. For example, suppose Joe is driving from Philadelphia to 156 Lakeside Drive in Orlando, Florida. Joe first drives to his neighborhood gas station and asks how to get to 156 Lakeside Drive in Orlando, Florida. The gas station attendant extracts the Florida portion of the address and tells Joe that he needs to get onto the interstate highway I-95 South, which has an entrance just next to the gas station. He also tells Joe that once he enters Florida, he should ask someone else there. Joe then takes I-95 South until he gets to Jacksonville, Florida, at which point he asks another gas station attendant for directions. The attendant extracts the Orlando portion of the address and tells Joe that he should continue on I-95 to Daytona Beach and then ask someone else. In Daytona Beach, another gas station attendant also extracts the Orlando portion of the address and tells Joe that he should take I-4 directly to Orlando. Joe takes I-4 and gets off at the Orlando exit. Joe goes to another gas station attendant, and this time the attendant extracts the Lakeside Drive portion of the address and tells Joe the road he must follow to get to Lakeside Drive. Once Joe reaches Lakeside Drive, he asks a kid on a bicycle how to get to his destination. The kid extracts the 156 portion of the address and points to the house. Joe finally reaches his ultimate destination.
In the above analogy, the gas station attendants and kids on bicycles are analogous to routers. We just learned that a router uses a packet's destination address to index a forwarding table and determine the appropriate outbound link. But this statement raises yet another question: How do forwarding tables get set? Are they configured by hand in each and every router, or does the Internet use a more automated procedure? This issue will be studied in depth in Chapter 5. But to whet your appetite here, we'll note now that the Internet has a number of special routing protocols that are used to automatically set the forwarding tables. A routing protocol may, for example, determine the shortest path from each router to each destination and use the shortest path results to configure the forwarding tables in the routers. How would you actually like to see the end-to-end route that packets take in the Internet? We now invite you to get your hands dirty by interacting with the Traceroute program. Simply visit the site www.traceroute.org, choose a source in a particular country, and trace the route from that source to your computer. (For a discussion of Traceroute, see Section 1.4.)
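To make the forwarding idea concrete, here is a toy sketch. It is emphatically not how real routers are implemented (they match binary IP prefixes in specialized hardware), and the prefixes and link numbers below are invented for illustration: a destination address is matched against a table of address prefixes, and the most specific match determines the outbound link.

```python
# A toy forwarding table mapping destination-address prefixes to
# outbound link numbers (all values invented for this example).
forwarding_table = {
    "203.0.113.": 1,
    "203.0.": 2,
    "198.51.": 3,
}

def outbound_link(destination: str) -> int:
    """Return the link for the longest prefix matching the address."""
    matches = [p for p in forwarding_table if destination.startswith(p)]
    return forwarding_table[max(matches, key=len)] if matches else 0

print(outbound_link("203.0.113.7"))   # -> 1 (most specific match wins)
print(outbound_link("203.0.42.9"))    # -> 2
print(outbound_link("198.51.100.1"))  # -> 3
```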
1.3.2 Circuit Switching There are two fundamental approaches to moving data through a network of links and switches: circuit switching and packet switching. Having covered packet-switched networks in the previous subsection, we now turn our attention to circuit-switched networks. In circuit-switched networks, the resources needed along a path (buffers, link transmission rate) to provide for communication between the end systems are reserved for the duration of the communication session between the end systems. In packet-switched networks, these resources are not reserved; a session's messages use the resources on demand and, as a consequence, may have to wait (that is, queue) for access to a communication link. As a simple analogy, consider two restaurants, one that requires reservations and another that neither requires reservations nor accepts them. For the restaurant that requires reservations, we have to go through the hassle of calling before we leave home. But when we arrive at the restaurant we can, in principle, immediately be seated and order our meal. For the restaurant that does not require reservations, we don't need to bother to reserve a table. But when we arrive at the restaurant, we may have to wait for a table before we can be seated. Traditional telephone networks are examples of circuit-switched networks. Consider what happens when one person wants to send information (voice or facsimile) to another over a telephone network. Before the sender can send the information, the network must establish a connection between the sender and the receiver. This is a bona fide connection for which the switches on the path between the sender and receiver maintain connection state for that connection. In the jargon of telephony, this connection is called a circuit. When the network establishes the circuit, it also reserves a constant transmission rate in the network's links (representing a fraction of each link's transmission capacity) for the duration of the connection. Since a given transmission rate has been reserved for this sender-to-receiver connection, the sender can transfer the data to the receiver at the guaranteed constant rate. Figure 1.13 illustrates a circuit-switched network. In this network, the four circuit switches are interconnected by four links. Each of these links has four circuits, so that each link can support four simultaneous connections. The hosts (for example, PCs and workstations) are each directly connected to one of the switches. When two hosts want to communicate, the network establishes a dedicated end-to-end connection between the two hosts. Thus, in order for Host A to communicate with Host B, the network must first reserve one circuit on each of two links. In this example, the dedicated end-to-end connection uses the second circuit in the first link and the fourth circuit in the second link. Because each link has four circuits, for each link used by the end-to-end connection, the connection gets one fourth of the link's total transmission capacity for the duration of the connection. Thus, for example, if each link between adjacent switches has a transmission rate of 1 Mbps, then each end-to-end circuit-switched connection gets 250 kbps of dedicated transmission rate.

Figure 1.13 A simple circuit-switched network consisting of four switches and four links

In contrast, consider what happens when one host wants to send a packet to another host over a packet-switched network, such as the Internet. As with circuit switching, the packet is transmitted over a series of communication links. But different from circuit switching, the packet is sent into the network without reserving any link resources whatsoever. If one of the links is congested because other packets need to be transmitted over the link at the same time, then the packet will have to wait in a buffer at the sending side of the transmission link and suffer a delay. The Internet makes its best effort to deliver packets in a timely manner, but it does not make any guarantees. Multiplexing in Circuit-Switched Networks A circuit in a link is implemented with either frequency-division multiplexing (FDM) or time-division multiplexing (TDM). With FDM, the frequency spectrum of a link is divided up among the connections established across the link. Specifically, the link dedicates a frequency band to each connection for the duration of the connection. In telephone networks, this frequency band typically has a width of 4 kHz (that is, 4,000 hertz or 4,000 cycles per second). The width of the band is called, not surprisingly, the bandwidth. FM radio stations also use FDM to share the frequency spectrum between 88 MHz and 108 MHz, with each station being allocated a specific frequency band. For a TDM link, time is divided into frames of fixed duration, and each frame is divided into a fixed number of time slots. When the network establishes a connection across a link, the network dedicates one time slot in every frame to this connection. These slots are dedicated for the sole use of that connection, with one time slot available for use (in every frame) to transmit the connection's data.

Figure 1.14 With FDM, each circuit continuously gets a fraction of the bandwidth. With TDM, each circuit gets all of the bandwidth periodically during brief intervals of time (that is, during slots)

Figure 1.14 illustrates FDM and TDM for a specific network link supporting up to four circuits. For FDM, the frequency domain is segmented into four bands, each of bandwidth 4 kHz. For TDM, the time domain is segmented into frames, with four time slots in each frame; each circuit is assigned the same dedicated slot in the revolving TDM frames.
For TDM, the transmission rate of a circuit is equal to the frame rate multiplied by the number of bits in a slot. For example, if the link transmits 8,000 frames per second and each slot consists of 8 bits, then the transmission rate of each circuit is 64 kbps. Proponents of packet switching have always argued that circuit switching is wasteful because the dedicated circuits are idle during silent periods. For example, when one person in a telephone call stops talking, the idle network resources (frequency bands or time slots in the links along the connection's route) cannot be used by other ongoing connections. As another example of how these resources can be underutilized, consider a radiologist who uses a circuit-switched network to remotely access a series of x-rays. The radiologist sets up a connection, requests an image, contemplates the image, and then requests a new image. Network resources are allocated to the connection but are not used (i.e., are wasted) during the radiologist's contemplation periods. Proponents of packet switching also enjoy pointing out that establishing end-to-end circuits and reserving end-to-end transmission capacity is complicated and requires complex signaling software to coordinate the operation of the switches along the end-to-end path. Before we finish our discussion of circuit switching, let's work through a numerical example that should shed further insight on the topic. Let us consider how long it takes to send a file of 640,000 bits from Host A to Host B over a circuit-switched network. Suppose that all links in the network use TDM with 24 slots and have a bit rate of 1.536 Mbps. Also suppose that it takes 500 msec to establish an end-to-end circuit before Host A can begin to transmit the file. How long does it take to send the file? Each circuit has a transmission rate of (1.536 Mbps)/24 = 64 kbps, so it takes (640,000 bits)/(64 kbps) = 10 seconds to transmit the file. To this 10 seconds we add the circuit establishment time, giving 10.5 seconds to send the file. Note that the transmission time is independent of the number of links: The transmission time would be 10 seconds if the end-to-end circuit passed through one link or a hundred links. (The actual end-to-end delay also includes a propagation delay; see Section 1.4.)
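The arithmetic of this example is summarized in the short calculation below, using exactly the numbers given in the text.

```python
link_rate = 1.536e6     # bits per second
slots_per_frame = 24
setup_time = 0.5        # seconds to establish the end-to-end circuit
file_size = 640_000     # bits

circuit_rate = link_rate / slots_per_frame  # 64,000 bits/sec per circuit
total_time = setup_time + file_size / circuit_rate
print(f"total time to send the file: {total_time:.1f} seconds")  # 10.5
```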
Packet Switching Versus Circuit Switching Having described circuit switching and packet switching, let us compare the two. Critics of packet switching have often argued that packet switching is not suitable for real-time services (for example, telephone calls and video conference calls) because of its variable and unpredictable end-to-end delays (due primarily to variable and unpredictable queuing delays). Proponents of packet switching argue that (1) it offers better sharing of transmission capacity than circuit switching and (2) it is simpler, more efficient, and less costly to implement than circuit switching. An interesting discussion of packet switching versus circuit switching is \[Molinero-Fernandez 2002\]. Generally speaking, people who do not like to hassle with restaurant reservations prefer packet switching to circuit switching. Why is packet switching more efficient? Let's look at a simple example. Suppose users share a 1 Mbps link. Also suppose that each user alternates between periods of activity, when a user generates data at a constant rate of 100 kbps, and periods of inactivity, when a user generates no data. Suppose further that a user is active only 10 percent of the time (and is idly drinking coffee during the remaining 90 percent of the time). With circuit switching, 100 kbps must be reserved for each user at all times. For example, with circuit-switched TDM, if a one-second frame is divided into 10 time slots of 100 ms each, then each user would be allocated one time slot per frame. Thus, the circuit-switched link can support only 10 (= 1 Mbps/100 kbps) simultaneous users. With packet switching, the probability that a specific user is active is 0.1 (that is, 10 percent). If there are 35 users, the probability that there are 11 or more simultaneously active users is approximately 0.0004. (Homework Problem P8 outlines how this probability is obtained.) When there are 10 or fewer simultaneously active users (which happens with probability 0.9996), the aggregate arrival rate of data is less than or equal to 1 Mbps, the output rate of the link. Thus, when there are 10 or fewer active users, users' packets flow through the link essentially without delay, as is the case with circuit switching. When there are more than 10 simultaneously active users, then the aggregate arrival rate of packets exceeds the output capacity of the link, and the output queue will begin to grow. (It continues to grow until the aggregate input rate falls back below 1 Mbps, at which point the queue will begin to diminish in length.) Because the probability of having more than 10 simultaneously active users is minuscule in this example, packet switching provides essentially the same performance as circuit switching, but does so while allowing for more than three times the number of users.
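The 0.0004 figure quoted above can be checked directly: since each of the 35 users is active independently with probability 0.1, the number of simultaneously active users follows a binomial distribution, and summing its upper tail gives the probability in question.

```python
from math import comb

n, p = 35, 0.1  # 35 users, each independently active 10% of the time

# P(11 or more of the n users are active at the same time)
tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(11, n + 1))
print(f"P(11 or more active) = {tail:.6f}")  # approximately 0.0004
```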
Let's now consider a second simple example. Suppose there are 10 users and that one user suddenly generates one thousand 1,000-bit packets, while other users remain quiescent and do not generate packets. Under TDM circuit switching with 10 slots per frame and each slot consisting of 1,000 bits, the active user can only use its one time slot per frame to transmit data, while the remaining nine time slots in each frame remain idle. It will be 10 seconds before all of the active user's one million bits of data have been transmitted. In the case of packet switching, the active user can continuously send its packets at the full link rate of 1 Mbps, since there are no other users generating packets that need to be multiplexed with the active user's packets. In this case, all of the active user's data will be transmitted within 1 second. The above examples illustrate two ways in which the performance of packet switching can be superior to that of circuit switching. They also highlight the crucial difference between the two forms of sharing a link's transmission rate among multiple data streams. Circuit switching pre-allocates use of the transmission link regardless of demand, with allocated but unneeded link time going unused. Packet switching, on the other hand, allocates link use on demand. Link transmission capacity will be shared on a packet-by-packet basis only among those users who have packets that need to be transmitted over the link. Although packet switching and circuit switching are both prevalent in today's telecommunication networks, the trend has certainly been in the direction of packet switching. Even many of today's circuit-switched telephone networks are slowly migrating toward packet switching. In particular, telephone networks often use packet switching for the expensive overseas portion of a telephone call.

1.3.3 A Network of Networks We saw earlier that end systems (PCs, smartphones, Web servers, mail servers, and so on) connect into the Internet via an access ISP. The access ISP can provide either wired or wireless connectivity, using an array of access technologies including DSL, cable, FTTH, WiFi, and cellular. Note that the access ISP does not have to be a telco or a cable company; instead it can be, for example, a university (providing Internet access to students, staff, and faculty), or a company (providing access for its employees). But connecting end users and content providers into an access ISP is only a small piece of solving the puzzle of connecting the billions of end systems that make up the Internet. To complete this puzzle, the access ISPs themselves must be interconnected. This is done by creating a network of networks---understanding this phrase is the key to understanding the Internet. Over the years, the network of networks that forms the Internet has evolved into a very complex structure. Much of this evolution is driven by economics and national policy, rather than by performance considerations. In order to understand today's Internet network structure, let's incrementally build a series of network structures, with each new structure being a better approximation of the complex Internet that we have today. Recall that the overarching goal is to interconnect the access ISPs so that all end systems can send packets to each other. One naive approach would be to have each access ISP directly connect with every other access ISP. Such a mesh design is, of course, much too costly for the access ISPs, as it would require each access ISP to have a separate communication link to each of the hundreds of thousands of other access ISPs all over the world.
Network Structure 2, just described, is a two-tier hierarchy with global transit providers residing at the top tier and access ISPs at the bottom tier. This assumes that global transit ISPs are not only capable of getting close to each and every access ISP, but also find it economically desirable to do so. In reality, although some ISPs do have impressive global coverage and do directly connect with many access ISPs, no ISP has presence in each and every city in the world. Instead, in any given region, there may be a regional ISP to which the access ISPs in the region connect. Each regional ISP then connects to tier-1 ISPs. Tier-1 ISPs are similar to our (imaginary) global transit ISP; but tier-1 ISPs, which actually do exist, do not have a presence in every city in the world. There are approximately a dozen tier-1 ISPs, including Level 3 Communications, AT&T, Sprint, and NTT. Interestingly, no group officially sanctions tier-1 status; as the saying goes---if you have to ask if you're a member of a group, you're probably not.

Returning to this network of networks, not only are there multiple competing tier-1 ISPs, there may be multiple competing regional ISPs in a region. In such a hierarchy, each access ISP pays the regional ISP to which it connects, and each regional ISP pays the tier-1 ISP to which it connects. (An access ISP can also connect directly to a tier-1 ISP, in which case it pays the tier-1 ISP.) Thus, there is a customer-provider relationship at each level of the hierarchy. Note that the tier-1 ISPs do not pay anyone, as they are at the top of the hierarchy. To further complicate matters, in some regions, there may be a larger regional ISP (possibly spanning an entire country) to which the smaller regional ISPs in that region connect; the larger regional ISP then connects to a tier-1 ISP. For example, in China, there are access ISPs in each city, which connect to provincial ISPs, which in turn connect to national ISPs, which finally connect to tier-1 ISPs \[Tian 2012\]. We refer to this multi-tier hierarchy, which is still only a crude approximation of today's Internet, as Network Structure 3.

To build a network that more closely resembles today's Internet, we must add points of presence (PoPs), multi-homing, peering, and Internet exchange points (IXPs) to the hierarchical Network Structure 3. PoPs exist in all levels of the hierarchy, except for the bottom (access ISP) level. A PoP is simply a group of one or more routers (at the same location) in the provider's network where customer ISPs can connect into the provider ISP. For a customer network to connect to a provider's PoP, it can lease a high-speed link from a third-party telecommunications provider to directly connect one of its routers to a router at the PoP. Any ISP (except for tier-1 ISPs) may choose to multi-home, that is, to connect to two or more provider ISPs. So, for example, an access ISP may multi-home with two regional ISPs, or it may multi-home with two regional ISPs and also with a tier-1 ISP. Similarly, a regional ISP may multi-home with multiple tier-1 ISPs. When an ISP multi-homes, it can continue to send and receive packets into the Internet even if one of its providers has a failure. As we just learned, customer ISPs pay their provider ISPs to obtain global Internet interconnectivity. The amount that a customer ISP pays a provider ISP reflects the amount of traffic it exchanges with the provider.
To reduce these costs, a pair of nearby ISPs at the same level of the hierarchy can peer, that is, they can directly connect their networks together so that all the traffic between them passes over the direct connection rather than through upstream intermediaries. When two ISPs peer, it is typically settlement-free, that is, neither ISP pays the other. As noted earlier, tier-1 ISPs also peer with one another, settlement-free. For a readable discussion of peering and customer-provider relationships, see \[Van der Berg 2008\]. Along these same lines, a third-party company can create an Internet Exchange Point (IXP), which is a meeting point where multiple ISPs can peer together. An IXP is typically in a stand-alone building with its own switches \[Ager 2012\]. There are over 400 IXPs in the Internet today \[IXP List 2016\]. We refer to this ecosystem---consisting of access ISPs, regional ISPs, tier-1 ISPs, PoPs, multi-homing, peering, and IXPs---as Network Structure 4.

We now finally arrive at Network Structure 5, which describes today's Internet. Network Structure 5, illustrated in Figure 1.15, builds on top of Network Structure 4 by adding content-provider networks. Google is currently one of the leading examples of such a content-provider network. As of this writing, it is estimated that Google has 50--100 data centers distributed across North America, Europe, Asia, South America, and Australia. Some of these data centers house over one hundred thousand servers, while other data centers are smaller, housing only hundreds of servers. The Google data centers are all interconnected via Google's private TCP/IP network, which spans the entire globe but is nevertheless separate from the public Internet. Importantly, the Google private network only carries traffic to/from Google servers. As shown in Figure 1.15, the Google private network attempts to "bypass" the upper tiers of the Internet by peering (settlement-free) with lower-tier ISPs, either by directly connecting with them or by connecting with them at IXPs \[Labovitz 2010\]. However, because many access ISPs can still only be reached by transiting through tier-1 networks, the Google network also connects to tier-1 ISPs, and pays those ISPs for the traffic it exchanges with them. By creating its own network, a content provider not only reduces its payments to upper-tier ISPs, but also has greater control of how its services are ultimately delivered to end users. Google's network infrastructure is described in greater detail in Section 2.6.

In summary, today's Internet---a network of networks---is complex, consisting of a dozen or so tier-1 ISPs and hundreds of thousands of lower-tier ISPs. The ISPs are diverse in their coverage, with some spanning multiple continents and oceans, and others limited to narrow geographic regions. The lower-tier ISPs connect to the higher-tier ISPs, and the higher-tier ISPs interconnect with one another. Users and content providers are customers of lower-tier ISPs, and lower-tier ISPs are customers of higher-tier ISPs. In recent years, major content providers have also created their own networks and connect directly into lower-tier ISPs where possible.

Figure 1.15 Interconnection of ISPs

1.4 Delay, Loss, and Throughput in Packet-Switched Networks

Back in Section 1.1 we said that the Internet can be viewed as an infrastructure that provides services to distributed applications running on end systems.
Ideally, we would like Internet services to be able to move as much data as we want between any two end systems, instantaneously, without any loss of data. Alas, this is a lofty goal, one that is unachievable in reality. Instead, computer networks necessarily constrain throughput (the amount of data per second that can be transferred) between end systems, introduce delays between end systems, and can actually lose packets. On one hand, it is unfortunate that the physical laws of reality introduce delay and loss as well as constrain throughput. On the other hand, because computer networks have these problems, there are many fascinating issues surrounding how to deal with the problems---more than enough issues to fill a course on computer networking and to motivate thousands of PhD theses! In this section, we'll begin to examine and quantify delay, loss, and throughput in computer networks.

1.4.1 Overview of Delay in Packet-Switched Networks

Recall that a packet starts in a host (the source), passes through a series of routers, and ends its journey in another host (the destination). As a packet travels from one node (host or router) to the subsequent node (host or router) along this path, the packet suffers from several types of delays at each node along the path. The most important of these delays are the nodal processing delay, queuing delay, transmission delay, and propagation delay; together, these delays accumulate to give a total nodal delay. The performance of many Internet applications---such as search, Web browsing, e-mail, maps, instant messaging, and voice-over-IP---is greatly affected by network delays. In order to acquire a deep understanding of packet switching and computer networks, we must understand the nature and importance of these delays.

Types of Delay

Let's explore these delays in the context of Figure 1.16. As part of its end-to-end route between source and destination, a packet is sent from the upstream node through router A to router B. Our goal is to characterize the nodal delay at router A. Note that router A has an outbound link leading to router B. This link is preceded by a queue (also known as a buffer). When the packet arrives at router A from the upstream node, router A examines the packet's header to determine the appropriate outbound link for the packet and then directs the packet to this link. In this example, the outbound link for the packet is the one that leads to router B. A packet can be transmitted on a link only if there is no other packet currently being transmitted on the link and if there are no other packets preceding it in the queue; if the link is currently busy or if there are other packets already queued for the link, the newly arriving packet will then join the queue.

Figure 1.16 The nodal delay at router A

Processing Delay

The time required to examine the packet's header and determine where to direct the packet is part of the processing delay. The processing delay can also include other factors, such as the time needed to check for bit-level errors in the packet that occurred in transmitting the packet's bits from the upstream node to router A. Processing delays in high-speed routers are typically on the order of microseconds or less. After this nodal processing, the router directs the packet to the queue that precedes the link to router B. (In Chapter 4 we'll study the details of how a router operates.)
Queuing Delay

At the queue, the packet experiences a queuing delay as it waits to be transmitted onto the link. The length of the queuing delay of a specific packet will depend on the number of earlier-arriving packets that are queued and waiting for transmission onto the link. If the queue is empty and no other packet is currently being transmitted, then our packet's queuing delay will be zero. On the other hand, if the traffic is heavy and many other packets are also waiting to be transmitted, the queuing delay will be long. We will see shortly that the number of packets that an arriving packet might expect to find is a function of the intensity and nature of the traffic arriving at the queue. Queuing delays can be on the order of microseconds to milliseconds in practice.

Transmission Delay

Assuming that packets are transmitted in a first-come-first-served manner, as is common in packet-switched networks, our packet can be transmitted only after all the packets that have arrived before it have been transmitted. Denote the length of the packet by L bits, and denote the transmission rate of the link from router A to router B by R bits/sec. For example, for a 10 Mbps Ethernet link, the rate is R = 10 Mbps; for a 100 Mbps Ethernet link, the rate is R = 100 Mbps. The transmission delay is L/R. This is the amount of time required to push (that is, transmit) all of the packet's bits into the link. Transmission delays are typically on the order of microseconds to milliseconds in practice.

Propagation Delay

Once a bit is pushed into the link, it needs to propagate to router B. The time required to propagate from the beginning of the link to router B is the propagation delay. The bit propagates at the propagation speed of the link. The propagation speed depends on the physical medium of the link (that is, fiber optics, twisted-pair copper wire, and so on) and is in the range of 2⋅10^8 meters/sec to 3⋅10^8 meters/sec, which is equal to, or a little less than, the speed of light. The propagation delay is the distance between two routers divided by the propagation speed. That is, the propagation delay is d/s, where d is the distance between router A and router B and s is the propagation speed of the link. Once the last bit of the packet propagates to node B, it and all the preceding bits of the packet are stored in router B. The whole process then continues with router B now performing the forwarding. In wide-area networks, propagation delays are on the order of milliseconds.

Comparing Transmission and Propagation Delay

Newcomers to the field of computer networking sometimes have difficulty understanding the difference between transmission delay and propagation delay. The difference is subtle but important. The transmission delay is the amount of time required for the router to push out the packet; it is a function of the packet's length and the transmission rate of the link, but has nothing to do with the distance between the two routers. The propagation delay, on the other hand, is the time it takes a bit to propagate from one router to the next; it is a function of the distance between the two routers, but has nothing to do with the packet's length or the transmission rate of the link. An analogy might clarify the notions of transmission and propagation delay. Consider a highway that has a tollbooth every 100 kilometers, as shown in Figure 1.17.
You can think of the highway segments between tollbooths as links and the tollbooths as routers. Suppose that cars travel (that is, propagate) on the highway at a rate of 100 km/hour (that is, when a car leaves a tollbooth, it instantaneously accelerates to 100 km/hour and maintains that speed between tollbooths). Suppose next that 10 cars, traveling together as a caravan, follow each other in a fixed order. You can think of each car as a bit and the caravan as a packet. Also suppose that each tollbooth services (that is, transmits) a car at a rate of one car per 12 seconds, and that it is late at night so that the caravan's cars are the only cars on the highway. Finally, suppose that whenever the first car of the caravan arrives at a tollbooth, it waits at the entrance until the other nine cars have arrived and lined up behind it. (Thus the entire caravan must be stored at the tollbooth before it can begin to be forwarded.) The time required for the tollbooth to push the entire caravan onto the highway is (10 cars)/(5 cars/minute) = 2 minutes. This time is analogous to the transmission delay in a router. The time required for a car to travel from the exit of one tollbooth to the next tollbooth is 100 km/(100 km/hour) = 1 hour. This time is analogous to propagation delay. Therefore, the time from when the caravan is stored in front of a tollbooth until the caravan is stored in front of the next tollbooth is the sum of transmission delay and propagation delay---in this example, 62 minutes.

Figure 1.17 Caravan analogy

Let's explore this analogy a bit more. What would happen if the tollbooth service time for a caravan were greater than the time for a car to travel between tollbooths? For example, suppose now that the cars travel at the rate of 1,000 km/hour and the tollbooth services cars at the rate of one car per minute. Then the traveling delay between two tollbooths is 6 minutes and the time to serve a caravan is 10 minutes. In this case, the first few cars in the caravan will arrive at the second tollbooth before the last cars in the caravan leave the first tollbooth. This situation also arises in packet-switched networks---the first bits in a packet can arrive at a router while many of the remaining bits in the packet are still waiting to be transmitted by the preceding router.

If a picture speaks a thousand words, then an animation must speak a million words. The Web site for this textbook provides an interactive Java applet that nicely illustrates and contrasts transmission delay and propagation delay. The reader is highly encouraged to visit that applet. \[Smith 2009\] also provides a very readable discussion of propagation, queueing, and transmission delays.

If we let dproc, dqueue, dtrans, and dprop denote the processing, queuing, transmission, and propagation delays, then the total nodal delay is given by

dnodal = dproc + dqueue + dtrans + dprop    (1.1)

The contribution of these delay components can vary significantly. For example, dprop can be negligible (for example, a couple of microseconds) for a link connecting two routers on the same university campus; however, dprop is hundreds of milliseconds for two routers interconnected by a geostationary satellite link, and can be the dominant term in dnodal.
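To see how the four components of Equation 1.1 combine, here is a worked example with made-up numbers: a 1,500-byte packet crossing a 1,000-km, 100 Mbps link, assuming a 2-microsecond processing delay and an empty queue.

```python
L = 1500 * 8    # packet length in bits (1,500 bytes)
R = 100e6       # link transmission rate: 100 Mbps
d = 1000e3      # link length: 1,000 km, in meters
s = 2e8         # assumed propagation speed in the medium, in meters/sec

d_proc  = 2e-6  # assumed processing delay: 2 microseconds
d_queue = 0.0   # assume the packet finds an empty queue

d_trans = L / R              # 0.12 ms to push the bits onto the link
d_prop  = d / s              # 5 ms for a bit to reach the other end
d_nodal = d_proc + d_queue + d_trans + d_prop

print(f"d_trans = {d_trans*1e3:.3f} ms, d_prop = {d_prop*1e3:.3f} ms")
print(f"d_nodal = {d_nodal*1e3:.3f} ms")   # about 5.12 ms, dominated by d_prop
```

On this long link the propagation term dominates; on a short campus link, the transmission term would dominate instead.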
Similarly, dtrans can range from negligible to significant. Its contribution is typically negligible for transmission rates of 10 Mbps and higher (for example, for LANs); however, it can be hundreds of milliseconds for large Internet packets sent over low-speed dial-up modem links. The processing delay, dproc, is often negligible; however, it strongly influences a router's maximum throughput, which is the maximum rate at which a router can forward packets.

1.4.2 Queuing Delay and Packet Loss

The most complicated and interesting component of nodal delay is the queuing delay, dqueue. In fact, queuing delay is so important and interesting in computer networking that thousands of papers and numerous books have been written about it \[Bertsekas 1991; Daigle 1991; Kleinrock 1975; Kleinrock 1976; Ross 1995\]. We give only a high-level, intuitive discussion of queuing delay here; the more curious reader may want to browse through some of the books (or even eventually write a PhD thesis on the subject!). Unlike the other three delays (namely, dproc, dtrans, and dprop), the queuing delay can vary from packet to packet. For example, if 10 packets arrive at an empty queue at the same time, the first packet transmitted will suffer no queuing delay, while the last packet transmitted will suffer a relatively large queuing delay (while it waits for the other nine packets to be transmitted). Therefore, when characterizing queuing delay, one typically uses statistical measures, such as average queuing delay, variance of queuing delay, and the probability that the queuing delay exceeds some specified value.

When is the queuing delay large and when is it insignificant? The answer to this question depends on the rate at which traffic arrives at the queue, the transmission rate of the link, and the nature of the arriving traffic, that is, whether the traffic arrives periodically or arrives in bursts. To gain some insight here, let a denote the average rate at which packets arrive at the queue (a is in units of packets/sec). Recall that R is the transmission rate; that is, it is the rate (in bits/sec) at which bits are pushed out of the queue. Also suppose, for simplicity, that all packets consist of L bits. Then the average rate at which bits arrive at the queue is La bits/sec. Finally, assume that the queue is very big, so that it can hold essentially an infinite number of bits. The ratio La/R, called the traffic intensity, often plays an important role in estimating the extent of the queuing delay. If La/R > 1, then the average rate at which bits arrive at the queue exceeds the rate at which the bits can be transmitted from the queue. In this unfortunate situation, the queue will tend to increase without bound and the queuing delay will approach infinity! Therefore, one of the golden rules in traffic engineering is: Design your system so that the traffic intensity is no greater than 1.

Now consider the case La/R ≤ 1. Here, the nature of the arriving traffic impacts the queuing delay. For example, if packets arrive periodically---that is, one packet arrives every L/R seconds---then every packet will arrive at an empty queue and there will be no queuing delay. On the other hand, if packets arrive in bursts but periodically, there can be a significant average queuing delay. For example, suppose N packets arrive simultaneously every (L/R)N seconds.
Then the first packet transmitted has no queuing delay; the second packet transmitted has a queuing delay of L/R seconds; and more generally, the nth packet transmitted has a queuing delay of (n−1)L/R seconds. We leave it as an exercise for you to calculate the average queuing delay in this example.

The two examples of periodic arrivals described above are a bit academic. Typically, the arrival process to a queue is random; that is, the arrivals do not follow any pattern and the packets are spaced apart by random amounts of time. In this more realistic case, the quantity La/R is not usually sufficient to fully characterize the queuing delay statistics. Nonetheless, it is useful in gaining an intuitive understanding of the extent of the queuing delay. In particular, if the traffic intensity is close to zero, then packet arrivals are few and far between and it is unlikely that an arriving packet will find another packet in the queue. Hence, the average queuing delay will be close to zero. On the other hand, when the traffic intensity is close to 1, there will be intervals of time when the arrival rate exceeds the transmission capacity (due to variations in packet arrival rate), and a queue will form during these periods of time; when the arrival rate is less than the transmission capacity, the length of the queue will shrink. Nonetheless, as the traffic intensity approaches 1, the average queue length gets larger and larger. The qualitative dependence of average queuing delay on the traffic intensity is shown in Figure 1.18.

Figure 1.18 Dependence of average queuing delay on traffic intensity

One important aspect of Figure 1.18 is the fact that as the traffic intensity approaches 1, the average queuing delay increases rapidly. A small percentage increase in the intensity will result in a much larger percentage-wise increase in delay. Perhaps you have experienced this phenomenon on the highway. If you regularly drive on a road that is typically congested, the fact that the road is typically congested means that its traffic intensity is close to 1. If some event causes an even slightly larger-than-usual amount of traffic, the delays you experience can be huge.

To really get a good feel for what queuing delays are about, you are encouraged once again to visit the textbook Web site, which provides an interactive Java applet for a queue. If you set the packet arrival rate high enough so that the traffic intensity exceeds 1, you will see the queue slowly build up over time.

Packet Loss

In our discussions above, we have assumed that the queue is capable of holding an infinite number of packets. In reality a queue preceding a link has finite capacity, although the queuing capacity greatly depends on the router design and cost. Because the queue capacity is finite, packet delays do not really approach infinity as the traffic intensity approaches 1. Instead, a packet can arrive to find a full queue. With no place to store such a packet, a router will drop that packet; that is, the packet will be lost. This overflow at a queue can again be seen in the Java applet for a queue when the traffic intensity is greater than 1. From an end-system viewpoint, a packet loss will look like a packet having been transmitted into the network core but never emerging from the network at the destination. The fraction of lost packets increases as the traffic intensity increases. Therefore, performance at a node is often measured not only in terms of delay, but also in terms of the probability of packet loss. As we'll discuss in the subsequent chapters, a lost packet may be retransmitted on an end-to-end basis in order to ensure that all data are eventually transferred from source to destination.
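If you prefer code to applets, the behavior in Figure 1.18, together with packet loss at a finite buffer, can be reproduced with a small simulation. The sketch below is a simplified model of our own making: Poisson arrivals, fixed-size packets (time is measured in units of the transmission time L/R, so the arrival rate equals the traffic intensity La/R), and a 100-packet buffer.

```python
import random
from collections import deque

def simulate_queue(intensity, num_packets=200_000, buffer_pkts=100, seed=1):
    """FIFO queue on one outbound link: fixed-size packets, Poisson arrivals.

    Time unit = one packet transmission time (L/R = 1). Arrivals to a full
    buffer are dropped. Returns (mean queuing delay, fraction of packets lost).
    """
    rng = random.Random(seed)
    t = 0.0
    departures = deque()          # departure times of packets still in the system
    total_wait, sent, lost = 0.0, 0, 0
    for _ in range(num_packets):
        t += rng.expovariate(intensity)        # next Poisson arrival
        while departures and departures[0] <= t:
            departures.popleft()               # these packets have already left
        if len(departures) >= buffer_pkts:
            lost += 1                          # buffer full: the packet is dropped
            continue
        start = departures[-1] if departures else t   # when service can begin
        total_wait += start - t                # queuing delay (0 if queue is empty)
        departures.append(start + 1.0)         # one time unit to transmit
        sent += 1
    return total_wait / sent, lost / num_packets

for intensity in (0.5, 0.8, 0.95, 1.1):
    delay, loss = simulate_queue(intensity)
    print(f"La/R = {intensity:4.2f}: mean queuing delay = {delay:8.2f}, loss = {loss:.4f}")
```

Run it and you will see the average queuing delay grow sharply as La/R approaches 1, and the loss fraction become substantial once La/R exceeds 1.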
1.4.3 End-to-End Delay

Our discussion up to this point has focused on the nodal delay, that is, the delay at a single router. Let's now consider the total delay from source to destination. To get a handle on this concept, suppose there are N−1 routers between the source host and the destination host. Let's also suppose for the moment that the network is uncongested (so that queuing delays are negligible), the processing delay at each router and at the source host is dproc, the transmission rate out of each router and out of the source host is R bits/sec, and the propagation delay on each link is dprop. The nodal delays accumulate and give an end-to-end delay,

dend−end = N(dproc + dtrans + dprop)    (1.2)

where, once again, dtrans = L/R, where L is the packet size. Note that Equation 1.2 is a generalization of Equation 1.1, which did not take into account processing and propagation delays. We leave it to you to generalize Equation 1.2 to the case of heterogeneous delays at the nodes and to the presence of an average queuing delay at each node.

Traceroute

To get a hands-on feel for end-to-end delay in a computer network, we can make use of the Traceroute program. Traceroute is a simple program that can run in any Internet host. When the user specifies a destination hostname, the program in the source host sends multiple, special packets toward that destination. As these packets work their way toward the destination, they pass through a series of routers. When a router receives one of these special packets, it sends back to the source a short message that contains the name and address of the router.

More specifically, suppose there are N−1 routers between the source and the destination. Then the source will send N special packets into the network, with each packet addressed to the ultimate destination. These N special packets are marked 1 through N, with the first packet marked 1 and the last packet marked N. When the nth router receives the nth packet marked n, the router does not forward the packet toward its destination, but instead sends a message back to the source. When the destination host receives the Nth packet, it too returns a message back to the source. The source records the time that elapses between when it sends a packet and when it receives the corresponding return message; it also records the name and address of the router (or the destination host) that returns the message. In this manner, the source can reconstruct the route taken by packets flowing from source to destination, and the source can determine the round-trip delays to all the intervening routers. Traceroute actually repeats the experiment just described three times, so the source actually sends 3N packets to the destination. RFC 1393 describes Traceroute in detail.

Here is an example of the output of the Traceroute program, where the route was being traced from the source host gaia.cs.umass.edu (at the University of Massachusetts) to the host cis.poly.edu (at Polytechnic University in Brooklyn).
The output has six columns: the first column is the n value described above, that is, the number of the router along the route; the second column is the name of the router; the third column is the address of the router (of the form xxx.xxx.xxx.xxx); the last three columns are the round-trip delays for three experiments. If the source receives fewer than three messages from any given router (due to packet loss in the network), Traceroute places an asterisk just after the router number and reports fewer than three round-trip times for that router.

    1  cs-gw (128.119.240.254) 1.009 ms 0.899 ms 0.993 ms
    2  128.119.3.154 (128.119.3.154) 0.931 ms 0.441 ms 0.651 ms
    3  border4-rt-gi-1-3.gw.umass.edu (128.119.2.194) 1.032 ms 0.484 ms 0.451 ms
    4  acr1-ge-2-1-0.Boston.cw.net (208.172.51.129) 10.006 ms 8.150 ms 8.460 ms
    5  agr4-loopback.NewYork.cw.net (206.24.194.104) 12.272 ms 14.344 ms 13.267 ms
    6  acr2-loopback.NewYork.cw.net (206.24.194.62) 13.225 ms 12.292 ms 12.148 ms
    7  pos10-2.core2.NewYork1.Level3.net (209.244.160.133) 12.218 ms 11.823 ms 11.793 ms
    8  gige9-1-52.hsipaccess1.NewYork1.Level3.net (64.159.17.39) 13.081 ms 11.556 ms 13.297 ms
    9  p0-0.polyu.bbnplanet.net (4.25.109.122) 12.716 ms 13.052 ms 12.786 ms
    10 cis.poly.edu (128.238.32.126) 14.080 ms 13.035 ms 12.802 ms

In the trace above there are nine routers between the source and the destination. Most of these routers have a name, and all of them have addresses. For example, the name of Router 3 is border4-rt-gi-1-3.gw.umass.edu and its address is 128.119.2.194. Looking at the data provided for this same router, we see that in the first of the three trials the round-trip delay between the source and the router was 1.03 msec. The round-trip delays for the subsequent two trials were 0.48 and 0.45 msec. These round-trip delays include all of the delays just discussed, including transmission delays, propagation delays, router processing delays, and queuing delays. Because the queuing delay is varying with time, the round-trip delay of packet n sent to router n can sometimes be longer than the round-trip delay of packet n+1 sent to router n+1. Indeed, we observe this phenomenon in the above example: the delays to Router 6 are larger than the delays to Router 7!

Want to try out Traceroute for yourself? We highly recommend that you visit http://www.traceroute.org, which provides a Web interface to an extensive list of sources for route tracing. You choose a source and supply the hostname for any destination. The Traceroute program then does all the work. There are a number of free software programs that provide a graphical interface to Traceroute; one of our favorites is PingPlotter \[PingPlotter 2016\].

End System, Application, and Other Delays

In addition to processing, transmission, and propagation delays, there can be additional significant delays in the end systems. For example, an end system wanting to transmit a packet into a shared medium (e.g., as in a WiFi or cable modem scenario) may purposefully delay its transmission as part of its protocol for sharing the medium with other end systems; we'll consider such protocols in detail in Chapter 6. Another important delay is media packetization delay, which is present in Voice-over-IP (VoIP) applications. In VoIP, the sending side must first fill a packet with encoded digitized speech before passing the packet to the Internet. This time to fill a packet---called the packetization delay---can be significant and can impact the user-perceived quality of a VoIP call. This issue will be further explored in a homework problem at the end of this chapter.
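As a rough preview of that problem (the numbers here are assumed for illustration, not taken from the text): if the encoder generates digitized speech at 64 kbps and each packet carries 160 bytes of audio, the sender must wait 20 ms for each packet to fill.

```python
encoding_rate = 64_000   # assumed voice encoding rate: 64 kbps
payload_bits = 160 * 8   # assumed audio payload per packet: 160 bytes

packetization_delay = payload_bits / encoding_rate
print(f"packetization delay = {packetization_delay * 1e3:.1f} ms")   # 20.0 ms
```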
1.4.4 Throughput in Computer Networks

In addition to delay and packet loss, another critical performance measure in computer networks is end-to-end throughput. To define throughput, consider transferring a large file from Host A to Host B across a computer network. This transfer might be, for example, a large video clip from one peer to another in a P2P file sharing system. The instantaneous throughput at any instant of time is the rate (in bits/sec) at which Host B is receiving the file. (Many applications, including many P2P file sharing systems, display the instantaneous throughput during downloads in the user interface---perhaps you have observed this before!) If the file consists of F bits and the transfer takes T seconds for Host B to receive all F bits, then the average throughput of the file transfer is F/T bits/sec. For some applications, such as Internet telephony, it is desirable to have a low delay and an instantaneous throughput consistently above some threshold (for example, over 24 kbps for some Internet telephony applications and over 256 kbps for some real-time video applications). For other applications, including those involving file transfers, delay is not critical, but it is desirable to have the highest possible throughput.

To gain further insight into the important concept of throughput, let's consider a few examples. Figure 1.19(a) shows two end systems, a server and a client, connected by two communication links and a router. Consider the throughput for a file transfer from the server to the client. Let Rs denote the rate of the link between the server and the router; and Rc denote the rate of the link between the router and the client. Suppose that the only bits being sent in the entire network are those from the server to the client. We now ask, in this ideal scenario, what is the server-to-client throughput? To answer this question, we may think of bits as fluid and communication links as pipes. Clearly, the server cannot pump bits through its link at a rate faster than Rs bps; and the router cannot forward bits at a rate faster than Rc bps. If Rs < Rc, then the bits pumped by the server will "flow" right through the router and arrive at the client at a rate of Rs bps, giving a throughput of Rs bps. If, on the other hand, Rc < Rs, then the router will not be able to forward bits as quickly as it receives them. In this case, bits will only leave the router at rate Rc, giving an end-to-end throughput of Rc. (Note also that if bits continue to arrive at the router at rate Rs, and continue to leave the router at Rc, the backlog of bits at the router waiting for transmission to the client will grow and grow---a most undesirable situation!)

Figure 1.19 Throughput for a file transfer from server to client

Thus, for this simple two-link network, the throughput is min{Rc, Rs}, that is, it is the transmission rate of the bottleneck link. Having determined the throughput, we can now approximate the time it takes to transfer a large file of F bits from server to client as F/min{Rs, Rc}. For a specific example, suppose you are downloading an MP3 file of F = 32 million bits, the server has a transmission rate of Rs = 2 Mbps, and you have an access link of Rc = 1 Mbps.
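A one-line check of this example, using the fluid model's transfer-time approximation F/min{Rs, Rc}:

```python
F  = 32e6   # file size: 32 million bits
Rs = 2e6    # server link rate: 2 Mbps
Rc = 1e6    # client access link rate: 1 Mbps

print(f"transfer time = {F / min(Rs, Rc):.0f} seconds")   # 32 seconds
```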
The time needed to transfer the file is then 32 seconds. Of course, these expressions for throughput and transfer time are only approximations, as they do not account for store-and-forward and processing delays as well as protocol issues.

Figure 1.19(b) now shows a network with N links between the server and the client, with the transmission rates of the N links being R1, R2,..., RN. Applying the same analysis as for the two-link network, we find that the throughput for a file transfer from server to client is min{R1, R2,..., RN}, which is once again the transmission rate of the bottleneck link along the path between server and client.

Now consider another example motivated by today's Internet. Figure 1.20(a) shows two end systems, a server and a client, connected to a computer network. Consider the throughput for a file transfer from the server to the client. The server is connected to the network with an access link of rate Rs and the client is connected to the network with an access link of rate Rc. Now suppose that all the links in the core of the communication network have very high transmission rates, much higher than Rs and Rc. Indeed, today, the core of the Internet is over-provisioned with high-speed links that experience little congestion. Also suppose that the only bits being sent in the entire network are those from the server to the client. Because the core of the computer network is like a wide pipe in this example, the rate at which bits can flow from source to destination is again the minimum of Rs and Rc, that is, throughput = min{Rs, Rc}. Therefore, the constraining factor for throughput in today's Internet is typically the access network.

For a final example, consider Figure 1.20(b) in which there are 10 servers and 10 clients connected to the core of the computer network. In this example, there are 10 simultaneous downloads taking place, involving 10 client-server pairs. Suppose that these 10 downloads are the only traffic in the network at the current time. As shown in the figure, there is a link in the core that is traversed by all 10 downloads. Denote the transmission rate of this link by R. Let's suppose that all server access links have the same rate Rs, all client access links have the same rate Rc, and the transmission rates of all the links in the core---except the one common link of rate R---are much larger than Rs, Rc, and R. Now we ask, what are the throughputs of the downloads? Clearly, if the rate of the common link, R, is large---say a hundred times larger than both Rs and Rc---then the throughput for each download will once again be min{Rs, Rc}. But what if the rate of the common link is of the same order as Rs and Rc? What will the throughput be in this case? Let's take a look at a specific example. Suppose Rs = 2 Mbps, Rc = 1 Mbps, R = 5 Mbps, and the common link divides its transmission rate equally among the 10 downloads. Then the bottleneck for each download is no longer in the access network, but is now instead the shared link in the core, which only provides each download with 500 kbps of throughput. Thus the end-to-end throughput for each download is now reduced to 500 kbps.

Figure 1.20 End-to-end throughput: (a) Client downloads a file from server; (b) 10 clients downloading with 10 servers

The examples in Figure 1.19 and Figure 1.20(a) show that throughput depends on the transmission rates of the links over which the data flows. We saw that when there is no other intervening traffic, the throughput can simply be approximated as the minimum transmission rate along the path between source and destination. The example in Figure 1.20(b) shows that more generally the throughput depends not only on the transmission rates of the links along the path, but also on the intervening traffic. In particular, a link with a high transmission rate may nonetheless be the bottleneck link for a file transfer if many other data flows are also passing through that link. We will examine throughput in computer networks more closely in the homework problems and in the subsequent chapters.
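The bottleneck reasoning in this last example extends naturally to code; the sketch below assumes, as in the text, that the common link's rate R is divided equally among the 10 downloads:

```python
Rs, Rc, R = 2e6, 1e6, 5e6   # server links, client links, shared core link
downloads = 10

per_flow_share = R / downloads             # each download's share of the core link
throughput = min(Rs, Rc, per_flow_share)   # bottleneck among all three constraints
print(f"per-download throughput = {throughput / 1e3:.0f} kbps")   # 500 kbps
```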
1.5 Protocol Layers and Their Service Models

From our discussion thus far, it is apparent that the Internet is an extremely complicated system. We have seen that there are many pieces to the Internet: numerous applications and protocols, various types of end systems, packet switches, and various types of link-level media. Given this enormous complexity, is there any hope of organizing a network architecture, or at least our discussion of network architecture? Fortunately, the answer to both questions is yes.

1.5.1 Layered Architecture

Before attempting to organize our thoughts on Internet architecture, let's look for a human analogy. Actually, we deal with complex systems all the time in our everyday life. Imagine if someone asked you to describe, for example, the airline system. How would you find the structure to describe this complex system that has ticketing agents, baggage checkers, gate personnel, pilots, airplanes, air traffic control, and a worldwide system for routing airplanes? One way to describe this system might be to describe the series of actions you take (or others take for you) when you fly on an airline. You purchase your ticket, check your bags, go to the gate, and eventually get loaded onto the plane. The plane takes off and is routed to its destination. After your plane lands, you deplane at the gate and claim your bags. If the trip was bad, you complain about the flight to the ticket agent (getting nothing for your effort). This scenario is shown in Figure 1.21.

Figure 1.21 Taking an airplane trip: actions

Figure 1.22 Horizontal layering of airline functionality

Already, we can see some analogies here with computer networking: You are being shipped from source to destination by the airline; a packet is shipped from source host to destination host in the Internet. But this is not quite the analogy we are after. We are looking for some structure in Figure 1.21. Looking at Figure 1.21, we note that there is a ticketing function at each end; there is also a baggage function for already-ticketed passengers, and a gate function for already-ticketed and already-baggage-checked passengers. For passengers who have made it through the gate (that is, passengers who are already ticketed, baggage-checked, and through the gate), there is a takeoff and landing function, and while in flight, there is an airplane-routing function. This suggests that we can look at the functionality in Figure 1.21 in a horizontal manner, as shown in Figure 1.22.

Figure 1.22 has divided the airline functionality into layers, providing a framework in which we can discuss airline travel. Note that each layer, combined with the layers below it, implements some functionality, some service. At the ticketing layer and below, airline-counter-to-airline-counter transfer of a person is accomplished.
At the baggage layer and below, baggage-check-to-baggage-claim transfer of a person and bags is accomplished. Note that the baggage layer provides this service only to an already-ticketed person. At the gate layer, departure-gate-to-arrival-gate transfer of a person and bags is accomplished. At the takeoff/landing layer, runway-to-runway transfer of people and their bags is accomplished. Each layer provides its service by (1) performing certain actions within that layer (for example, at the gate layer, loading and unloading people from an airplane) and by (2) using the services of the layer directly below it (for example, in the gate layer, using the runway-to-runway passenger transfer service of the takeoff/landing layer).

A layered architecture allows us to discuss a well-defined, specific part of a large and complex system. This simplification itself is of considerable value by providing modularity, making it much easier to change the implementation of the service provided by the layer. As long as the layer provides the same service to the layer above it, and uses the same services from the layer below it, the remainder of the system remains unchanged when a layer's implementation is changed. (Note that changing the implementation of a service is very different from changing the service itself!) For example, if the gate functions were changed (for instance, to have people board and disembark by height), the remainder of the airline system would remain unchanged since the gate layer still provides the same function (loading and unloading people); it simply implements that function in a different manner after the change. For large and complex systems that are constantly being updated, the ability to change the implementation of a service without affecting other components of the system is another important advantage of layering.

Protocol Layering

But enough about airlines. Let's now turn our attention to network protocols. To provide structure to the design of network protocols, network designers organize protocols---and the network hardware and software that implement the protocols---in layers. Each protocol belongs to one of the layers, just as each function in the airline architecture in Figure 1.22 belonged to a layer. We are again interested in the services that a layer offers to the layer above---the so-called service model of a layer. Just as in the case of our airline example, each layer provides its service by (1) performing certain actions within that layer and by (2) using the services of the layer directly below it. For example, the services provided by layer n may include reliable delivery of messages from one edge of the network to the other. This might be implemented by using an unreliable edge-to-edge message delivery service of layer n−1, and adding layer n functionality to detect and retransmit lost messages.

A protocol layer can be implemented in software, in hardware, or in a combination of the two. Application-layer protocols---such as HTTP and SMTP---are almost always implemented in software in the end systems; so are transport-layer protocols. Because the physical layer and data link layers are responsible for handling communication over a specific link, they are typically implemented in a network interface card (for example, Ethernet or WiFi interface cards) associated with a given link. The network layer is often a mixed implementation of hardware and software.
Also note that just as the functions in the layered airline architecture were distributed among the various airports and flight control centers that make up the system, so too is a layer n protocol distributed among the end systems, packet switches, and other components that make up the network. That is, there's often a piece of a layer n protocol in each of these network components.

Protocol layering has conceptual and structural advantages \[RFC 3439\]. As we have seen, layering provides a structured way to discuss system components. Modularity makes it easier to update system components. We mention, however, that some researchers and networking engineers are vehemently opposed to layering \[Wakeman 1992\]. One potential drawback of layering is that one layer may duplicate lower-layer functionality. For example, many protocol stacks provide error recovery on both a per-link basis and an end-to-end basis. A second potential drawback is that functionality at one layer may need information (for example, a timestamp value) that is present only in another layer; this violates the goal of separation of layers.

Figure 1.23 The Internet protocol stack (a) and OSI reference model (b)

When taken together, the protocols of the various layers are called the protocol stack. The Internet protocol stack consists of five layers: the physical, link, network, transport, and application layers, as shown in Figure 1.23(a). If you examine the Table of Contents, you will see that we have roughly organized this book using the layers of the Internet protocol stack. We take a top-down approach, first covering the application layer and then proceeding downward.

Application Layer

The application layer is where network applications and their application-layer protocols reside. The Internet's application layer includes many protocols, such as the HTTP protocol (which provides for Web document request and transfer), SMTP (which provides for the transfer of e-mail messages), and FTP (which provides for the transfer of files between two end systems). We'll see that certain network functions, such as the translation of human-friendly names for Internet end systems like www.ietf.org to a 32-bit network address, are also done with the help of a specific application-layer protocol, namely, the domain name system (DNS). We'll see in Chapter 2 that it is very easy to create and deploy our own new application-layer protocols. An application-layer protocol is distributed over multiple end systems, with the application in one end system using the protocol to exchange packets of information with the application in another end system. We'll refer to this packet of information at the application layer as a message.

Transport Layer

The Internet's transport layer transports application-layer messages between application endpoints. In the Internet there are two transport protocols, TCP and UDP, either of which can transport application-layer messages. TCP provides a connection-oriented service to its applications. This service includes guaranteed delivery of application-layer messages to the destination and flow control (that is, sender/receiver speed matching). TCP also breaks long messages into shorter segments and provides a congestion-control mechanism, so that a source throttles its transmission rate when the network is congested. The UDP protocol provides a connectionless service to its applications. This is a no-frills service that provides no reliability, no flow control, and no congestion control. In this book, we'll refer to a transport-layer packet as a segment.
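Although socket programming is the subject of Chapter 2, a two-line preview is worthwhile here, since an application chooses between these two transport services simply by the type of socket it creates (a sketch in Python):

```python
import socket

# SOCK_STREAM requests TCP: connection-oriented, reliable byte stream.
tcp_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# SOCK_DGRAM requests UDP: connectionless, no delivery guarantees.
udp_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

tcp_socket.close()
udp_socket.close()
```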
Network Layer

The Internet's network layer is responsible for moving network-layer packets known as datagrams from one host to another. The Internet transport-layer protocol (TCP or UDP) in a source host passes a transport-layer segment and a destination address to the network layer, just as you would give the postal service a letter with a destination address. The network layer then provides the service of delivering the segment to the transport layer in the destination host. The Internet's network layer includes the celebrated IP protocol, which defines the fields in the datagram as well as how the end systems and routers act on these fields. There is only one IP protocol, and all Internet components that have a network layer must run the IP protocol. The Internet's network layer also contains routing protocols that determine the routes that datagrams take between sources and destinations. The Internet has many routing protocols. As we saw in Section 1.3, the Internet is a network of networks, and within a network, the network administrator can run any routing protocol desired. Although the network layer contains both the IP protocol and numerous routing protocols, it is often simply referred to as the IP layer, reflecting the fact that IP is the glue that binds the Internet together.

Link Layer

The Internet's network layer routes a datagram through a series of routers between the source and destination. To move a packet from one node (host or router) to the next node in the route, the network layer relies on the services of the link layer. In particular, at each node, the network layer passes the datagram down to the link layer, which delivers the datagram to the next node along the route. At this next node, the link layer passes the datagram up to the network layer. The services provided by the link layer depend on the specific link-layer protocol that is employed over the link. For example, some link-layer protocols provide reliable delivery, from transmitting node, over one link, to receiving node. Note that this reliable delivery service is different from the reliable delivery service of TCP, which provides reliable delivery from one end system to another. Examples of link-layer protocols include Ethernet, WiFi, and the cable access network's DOCSIS protocol. As datagrams typically need to traverse several links to travel from source to destination, a datagram may be handled by different link-layer protocols at different links along its route. For example, a datagram may be handled by Ethernet on one link and by PPP on the next link. The network layer will receive a different service from each of the different link-layer protocols. In this book, we'll refer to the link-layer packets as frames.

Physical Layer

While the job of the link layer is to move entire frames from one network element to an adjacent network element, the job of the physical layer is to move the individual bits within the frame from one node to the next. The protocols in this layer are again link dependent and further depend on the actual transmission medium of the link (for example, twisted-pair copper wire, single-mode fiber optics). For example, Ethernet has many physical-layer protocols: one for twisted-pair copper wire, another for coaxial cable, another for fiber, and so on.
In each case, a bit is moved across the link in a different way.

The OSI Model

Having discussed the Internet protocol stack in detail, we should mention that it is not the only protocol stack around. In particular, back in the late 1970s, the International Organization for Standardization (ISO) proposed that computer networks be organized around seven layers, called the Open Systems Interconnection (OSI) model \[ISO 2016\]. The OSI model took shape when the protocols that were to become the Internet protocols were in their infancy, and were but one of many different protocol suites under development; in fact, the inventors of the original OSI model probably did not have the Internet in mind when creating it. Nevertheless, beginning in the late 1970s, many training and university courses picked up on the ISO mandate and organized courses around the seven-layer model. Because of its early impact on networking education, the seven-layer model continues to linger on in some networking textbooks and training courses.

The seven layers of the OSI reference model, shown in Figure 1.23(b), are: application layer, presentation layer, session layer, transport layer, network layer, data link layer, and physical layer. The functionality of five of these layers is roughly the same as their similarly named Internet counterparts. Thus, let's consider the two additional layers present in the OSI reference model---the presentation layer and the session layer. The role of the presentation layer is to provide services that allow communicating applications to interpret the meaning of data exchanged. These services include data compression and data encryption (which are self-explanatory) as well as data description (which frees the applications from having to worry about the internal format in which data are represented/stored---formats that may differ from one computer to another). The session layer provides for delimiting and synchronization of data exchange, including the means to build a checkpointing and recovery scheme.

The fact that the Internet lacks two layers found in the OSI reference model poses a couple of interesting questions: Are the services provided by these layers unimportant? What if an application needs one of these services? The Internet's answer to both of these questions is the same---it's up to the application developer. It's up to the application developer to decide if a service is important, and if the service is important, it's up to the application developer to build that functionality into the application.

1.5.2 Encapsulation

Figure 1.24 shows the physical path that data takes down a sending end system's protocol stack, up and down the protocol stacks of an intervening link-layer switch and router, and then up the protocol stack at the receiving end system.

Figure 1.24 Hosts, routers, and link-layer switches; each contains a different set of layers, reflecting their differences in functionality

As we discuss later in this book, routers and link-layer switches are both packet switches. Similar to end systems, routers and link-layer switches organize their networking hardware and software into layers. But routers and link-layer switches do not implement all of the layers in the protocol stack; they typically implement only the bottom layers. As shown in Figure 1.24, link-layer switches implement layers 1 and 2; routers implement layers 1 through 3.
This means, for example, that Internet routers are capable of implementing the IP protocol (a layer 3 protocol), while link-layer switches are not. We'll see later that while link-layer switches do not recognize IP addresses, they are capable of recognizing layer 2 addresses, such as Ethernet addresses. Note that hosts implement all five layers; this is consistent with the view that the Internet architecture puts much of its complexity at the edges of the network.

Figure 1.24 also illustrates the important concept of encapsulation. At the sending host, an application-layer message (M in Figure 1.24) is passed to the transport layer. In the simplest case, the transport layer takes the message and appends additional information (so-called transport-layer header information, Ht in Figure 1.24) that will be used by the receiver-side transport layer. The application-layer message and the transport-layer header information together constitute the transport-layer segment. The transport-layer segment thus encapsulates the application-layer message. The added information might include information allowing the receiver-side transport layer to deliver the message up to the appropriate application, and error-detection bits that allow the receiver to determine whether bits in the message have been changed en route. The transport layer then passes the segment to the network layer, which adds network-layer header information (Hn in Figure 1.24) such as source and destination end system addresses, creating a network-layer datagram. The datagram is then passed to the link layer, which (of course!) will add its own link-layer header information and create a link-layer frame. Thus, we see that at each layer, a packet has two types of fields: header fields and a payload field. The payload is typically a packet from the layer above.

A useful analogy here is the sending of an interoffice memo from one corporate branch office to another via the public postal service. Suppose Alice, who is in one branch office, wants to send a memo to Bob, who is in another branch office. The memo is analogous to the application-layer message. Alice puts the memo in an interoffice envelope with Bob's name and department written on the front of the envelope. The interoffice envelope is analogous to a transport-layer segment---it contains header information (Bob's name and department number) and it encapsulates the application-layer message (the memo). When the sending branch-office mailroom receives the interoffice envelope, it puts the interoffice envelope inside yet another envelope, which is suitable for sending through the public postal service. The sending mailroom also writes the postal address of the sending and receiving branch offices on the postal envelope. Here, the postal envelope is analogous to the datagram---it encapsulates the transport-layer segment (the interoffice envelope), which encapsulates the original message (the memo). The postal service delivers the postal envelope to the receiving branch-office mailroom. There, the process of de-encapsulation begins. The mailroom extracts the interoffice memo and forwards it to Bob. Finally, Bob opens the envelope and removes the memo.

The process of encapsulation can be more complex than that described above. For example, a large message may be divided into multiple transport-layer segments (which might themselves each be divided into multiple network-layer datagrams). At the receiving end, such a segment must then be reconstructed from its constituent datagrams.
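Encapsulation is easy to mimic in a few lines of code. The toy sketch below (with placeholder header contents, not real protocol headers) nests an application-layer message inside transport-, network-, and link-layer headers, mirroring Figure 1.24:

```python
def encapsulate(payload: bytes, header: bytes) -> bytes:
    """Each layer prepends its own header to the payload handed down to it."""
    return header + payload

message  = b"GET /index.html"               # application-layer message (M)
segment  = encapsulate(message, b"[Ht]")    # transport layer adds header Ht
datagram = encapsulate(segment, b"[Hn]")    # network layer adds header Hn
frame    = encapsulate(datagram, b"[Hl]")   # link layer adds header Hl

print(frame)   # b'[Hl][Hn][Ht]GET /index.html'
# De-encapsulation at the receiver strips one header per layer, bottom up.
```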
1.6 Networks Under Attack

The Internet has become mission critical for many institutions today, including large and small companies, universities, and government agencies. Many individuals also rely on the Internet for many of their professional, social, and personal activities. Billions of "things," including wearables and home devices, are currently being connected to the Internet. But behind all this utility and excitement, there is a dark side, a side where "bad guys" attempt to wreak havoc in our daily lives by damaging our Internet-connected computers, violating our privacy, and rendering inoperable the Internet services on which we depend.

The field of network security is about how the bad guys can attack computer networks and about how we, soon-to-be experts in computer networking, can defend networks against those attacks, or better yet, design new architectures that are immune to such attacks in the first place. Given the frequency and variety of existing attacks as well as the threat of new and more destructive future attacks, network security has become a central topic in the field of computer networking. One of the features of this textbook is that it brings network security issues to the forefront. Since we don't yet have expertise in computer networking and Internet protocols, we'll begin here by surveying some of today's more prevalent security-related problems. This will whet our appetite for more substantial discussions in the upcoming chapters. So we begin here by simply asking, what can go wrong? How are computer networks vulnerable? What are some of the more prevalent types of attacks today?

The Bad Guys Can Put Malware into Your Host Via the Internet

We attach devices to the Internet because we want to receive/send data from/to the Internet. This includes all kinds of good stuff, including Instagram posts, Internet search results, streaming music, video conference calls, streaming movies, and so on. But, unfortunately, along with all that good stuff comes malicious stuff---collectively known as malware---that can also enter and infect our devices. Once malware infects our device, it can do all kinds of devious things, including deleting our files and installing spyware that collects our private information, such as social security numbers, passwords, and keystrokes, and then sends this (over the Internet, of course!) back to the bad guys. Our compromised host may also be enrolled in a network of thousands of similarly compromised devices, collectively known as a botnet, which the bad guys control and leverage for spam e-mail distribution or distributed denial-of-service attacks (soon to be discussed) against targeted hosts.

Much of the malware out there today is self-replicating: once it infects one host, from that host it seeks entry into other hosts over the Internet, and from the newly infected hosts, it seeks entry into yet more hosts. In this manner, self-replicating malware can spread exponentially fast. Malware can spread in the form of a virus or a worm. Viruses are malware that require some form of user interaction to infect the user's device. The classic example is an e-mail attachment containing malicious executable code. If a user receives and opens such an attachment, the user inadvertently runs the malware on the device.
Typically, such e-mail viruses are self-replicating: once executed, the virus may send an identical message with an identical malicious attachment to, for example, every recipient in the user's address book. Worms are malware that can enter a device without any explicit user interaction. For example, a user may be running a vulnerable network application to which an attacker can send malware. In some cases, without any user intervention, the application may accept the malware from the Internet and run it, creating a worm. The worm in the newly infected device then scans the Internet, searching for other hosts running the same vulnerable network application. When it finds other vulnerable hosts, it sends a copy of itself to those hosts. Today, malware is pervasive and costly to defend against. As you work through this textbook, we encourage you to think about the following question: What can computer network designers do to defend Internet-attached devices from malware attacks?

The Bad Guys Can Attack Servers and Network Infrastructure

Another broad class of security threats is known as denial-of-service (DoS) attacks. As the name suggests, a DoS attack renders a network, host, or other piece of infrastructure unusable by legitimate users. Web servers, e-mail servers, DNS servers (discussed in Chapter 2), and institutional networks can all be subject to DoS attacks. Internet DoS attacks are extremely common, with thousands of DoS attacks occurring every year \[Moore 2001\]. The site Digital Attack Map allows us to visualize the top daily DoS attacks worldwide \[DAM 2016\]. Most Internet DoS attacks fall into one of three categories:

- Vulnerability attack. This involves sending a few well-crafted messages to a vulnerable application or operating system running on a targeted host. If the right sequence of packets is sent to a vulnerable application or operating system, the service can stop or, worse, the host can crash.
- Bandwidth flooding. The attacker sends a deluge of packets to the targeted host---so many packets that the target's access link becomes clogged, preventing legitimate packets from reaching the server.
- Connection flooding. The attacker establishes a large number of half-open or fully open TCP connections (TCP connections are discussed in Chapter 3) at the target host. The host can become so bogged down with these bogus connections that it stops accepting legitimate connections.

Let's now explore the bandwidth-flooding attack in more detail. Recalling our delay and loss analysis discussion in Section 1.4.2, it's evident that if the server has an access rate of R bps, then the attacker will need to send traffic at a rate of approximately R bps to cause damage. If R is very large, a single attack source may not be able to generate enough traffic to harm the server. Furthermore, if all the traffic emanates from a single source, an upstream router may be able to detect the attack and block all traffic from that source before the traffic gets near the server. In a distributed DoS (DDoS) attack, illustrated in Figure 1.25, the attacker controls multiple sources and has each source blast traffic at the target. With this approach, the aggregate traffic rate across all the controlled sources needs to be approximately R to cripple the service. DDoS attacks leveraging botnets with thousands of compromised hosts are a common occurrence today \[DAM 2016\].
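As a back-of-the-envelope check on this reasoning (the numbers below are our own illustrative assumptions; none appear in the text), the number of sources needed is simply the target's access rate divided by the rate each compromised host can muster:

```python
# Rough bandwidth-flooding arithmetic with assumed, illustrative numbers:
# to clog an access link of rate R, the sources together must send ~R bps.
R = 10e9        # assumed target access-link rate: 10 Gbps
per_bot = 5e6   # assumed upload rate of one compromised host: 5 Mbps
print(f"hosts needed to saturate the link: {R / per_bot:.0f}")  # 2000
```

Under these assumed numbers, a botnet of only a couple thousand hosts suffices, which is one reason botnet-driven flooding is so common.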
DDoS attacks are much harder to detect and defend against than a DoS attack from a single host. We encourage you to consider the following question as you work your way through this book: What can computer network designers do to defend against DoS attacks? We will see that different defenses are needed for the three types of DoS attacks.

Figure 1.25 A distributed denial-of-service attack

The Bad Guys Can Sniff Packets

Many users today access the Internet via wireless devices, such as WiFi-connected laptops or handheld devices with cellular Internet connections (covered in Chapter 7). While ubiquitous Internet access is extremely convenient and enables marvelous new applications for mobile users, it also creates a major security vulnerability---by placing a passive receiver in the vicinity of the wireless transmitter, that receiver can obtain a copy of every packet that is transmitted! These packets can contain all kinds of sensitive information, including passwords, social security numbers, trade secrets, and private personal messages. A passive receiver that records a copy of every packet that flies by is called a packet sniffer.

Sniffers can be deployed in wired environments as well. In wired broadcast environments, as in many Ethernet LANs, a packet sniffer can obtain copies of broadcast packets sent over the LAN. As described in Section 1.2, cable access technologies also broadcast packets and are thus vulnerable to sniffing. Furthermore, a bad guy who gains access to an institution's access router or access link to the Internet may be able to plant a sniffer that makes a copy of every packet going to/from the organization. Sniffed packets can then be analyzed offline for sensitive information. Packet-sniffing software is freely available at various Web sites and as commercial products. Professors teaching a networking course have been known to assign lab exercises that involve writing a packet-sniffing and application-layer data reconstruction program. Indeed, the Wireshark \[Wireshark 2016\] labs associated with this text (see the introductory Wireshark lab at the end of this chapter) use exactly such a packet sniffer! Because packet sniffers are passive---that is, they do not inject packets into the channel---they are difficult to detect. So, when we send packets into a wireless channel, we must accept the possibility that some bad guy may be recording copies of our packets. As you may have guessed, some of the best defenses against packet sniffing involve cryptography. We will examine cryptography as it applies to network security in Chapter 8.

The Bad Guys Can Masquerade as Someone You Trust

It is surprisingly easy (you will have the knowledge to do so shortly as you proceed through this text!) to create a packet with an arbitrary source address, packet content, and destination address and then transmit this hand-crafted packet into the Internet, which will dutifully forward the packet to its destination. Imagine the unsuspecting receiver (say an Internet router) who receives such a packet, takes the (false) source address as being truthful, and then performs some command embedded in the packet's contents (say modifies its forwarding table). The ability to inject packets into the Internet with a false source address is known as IP spoofing, and is but one of many ways in which one user can masquerade as another user.
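To see just how easy such forgery is, consider the following sketch using the third-party Scapy library (our choice of tool for illustration; the text does not prescribe one). The addresses are drawn from reserved documentation ranges, and actually transmitting spoofed packets on a real network is illegal in many jurisdictions, so treat this purely as a thought experiment:

```python
# Sketch of IP spoofing using the third-party Scapy library (our
# illustration, not the book's method). Requires root/administrator
# privileges; the addresses are from reserved documentation ranges.
# Do not send spoofed traffic on networks you do not own.
from scapy.all import IP, ICMP, send

forged = IP(src="203.0.113.99",   # a source address the sender does not own
            dst="198.51.100.7") / ICMP()
send(forged)  # the Internet dutifully forwards it; nothing verifies src
```

Nothing in the packet itself lets the receiver tell that the source field is a lie.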
To solve this problem, we will need end-point authentication, that is, a mechanism that will allow us to determine with certainty whether a message originates from where we think it does. Once again, we encourage you to think about how this can be done for network applications and protocols as you progress through the chapters of this book. We will explore mechanisms for end-point authentication in Chapter 8.

In closing this section, it's worth considering how the Internet got to be such an insecure place in the first place. The answer, in essence, is that the Internet was originally designed to be that way, based on the model of "a group of mutually trusting users attached to a transparent network" \[Blumenthal 2001\]---a model in which (by definition) there is no need for security. Many aspects of the original Internet architecture deeply reflect this notion of mutual trust. For example, the ability for one user to send a packet to any other user is the default rather than a requested/granted capability, and user identity is taken at declared face value, rather than being authenticated by default. But today's Internet certainly does not involve "mutually trusting users." Nonetheless, today's users still need to communicate when they don't necessarily trust each other, may wish to communicate anonymously, may communicate indirectly through third parties (e.g., Web caches, which we'll study in Chapter 2, or mobility-assisting agents, which we'll study in Chapter 7), and may distrust the hardware, software, and even the air through which they communicate. We now have many security-related challenges before us as we progress through this book: We should seek defenses against sniffing, end-point masquerading, man-in-the-middle attacks, DDoS attacks, malware, and more. We should keep in mind that communication among mutually trusted users is the exception rather than the rule. Welcome to the world of modern computer networking!

1.7 History of Computer Networking and the Internet

Sections 1.1 through 1.6 presented an overview of the technology of computer networking and the Internet. You should know enough now to impress your family and friends! However, if you really want to be a big hit at the next cocktail party, you should sprinkle your discourse with tidbits about the fascinating history of the Internet \[Segaller 1998\].

1.7.1 The Development of Packet Switching: 1961--1972

The field of computer networking and today's Internet trace their beginnings back to the early 1960s, when the telephone network was the world's dominant communication network. Recall from Section 1.3 that the telephone network uses circuit switching to transmit information from a sender to a receiver---an appropriate choice given that voice is transmitted at a constant rate between sender and receiver. Given the increasing importance of computers in the early 1960s and the advent of time-shared computers, it was perhaps natural to consider how to hook computers together so that they could be shared among geographically distributed users. The traffic generated by such users was likely to be bursty---intervals of activity, such as the sending of a command to a remote computer, followed by periods of inactivity while waiting for a reply or while contemplating the received response. Three research groups around the world, each unaware of the others' work \[Leiner 1998\], began inventing packet switching as an efficient and robust alternative to circuit switching.
The first published work on packet-switching techniques was that of Leonard Kleinrock \[Kleinrock 1961; Kleinrock 1964\], then a graduate student at MIT. Using queuing theory, Kleinrock's work elegantly demonstrated the effectiveness of the packet-switching approach for bursty traffic sources. In 1964, Paul Baran \[Baran 1964\] at the Rand Institute had begun investigating the use of packet switching for secure voice over military networks, and at the National Physical Laboratory in England, Donald Davies and Roger Scantlebury were also developing their ideas on packet switching. The work at MIT, Rand, and the NPL laid the foundations for today's Internet. But the Internet also has a long history of a let's-build-it-and-demonstrate-it attitude that also dates back to the 1960s. J. C. R. Licklider \[DEC 1990\] and Lawrence Roberts, both colleagues of Kleinrock's at MIT, went on to lead the computer science program at the Advanced Research Projects Agency (ARPA) in the United States. Roberts published an overall plan for the ARPAnet \[Roberts 1967\], the first packet-switched computer network and a direct ancestor of today's public Internet. On Labor Day in 1969, the first packet switch was installed at UCLA under Kleinrock's supervision, and three additional packet switches were installed shortly thereafter at the Stanford Research Institute (SRI), UC Santa Barbara, and the University of Utah (Figure 1.26). The fledgling precursor to the Internet was four nodes large by the end of 1969. Kleinrock recalls the very first use of the network to perform a remote login from UCLA to SRI, crashing the system \[Kleinrock 2004\].

By 1972, ARPAnet had grown to approximately 15 nodes and was given its first public demonstration by Robert Kahn. The first host-to-host protocol between ARPAnet end systems, known as the network-control protocol (NCP), was completed \[RFC 001\]. With an end-to-end protocol available, applications could now be written. Ray Tomlinson wrote the first e-mail program in 1972.

1.7.2 Proprietary Networks and Internetworking: 1972--1980

The initial ARPAnet was a single, closed network. In order to communicate with an ARPAnet host, one had to be actually attached to another ARPAnet IMP. In the early to mid-1970s, additional stand-alone packet-switching networks besides ARPAnet came into being:

- ALOHAnet, a microwave network linking universities on the Hawaiian islands \[Abramson 1970\], as well as DARPA's packet-satellite \[RFC 829\] and packet-radio networks \[Kahn 1978\]
- Telenet, a BBN commercial packet-switching network based on ARPAnet technology
- Cyclades, a French packet-switching network pioneered by Louis Pouzin \[Think 2012\]
- Time-sharing networks such as Tymnet and the GE Information Services network, among others, in the late 1960s and early 1970s \[Schwartz 1977\]
- IBM's SNA (1969--1974), which paralleled the ARPAnet work \[Schwartz 1977\]

Figure 1.26 An early packet switch

The number of networks was growing. With perfect hindsight we can see that the time was ripe for developing an encompassing architecture for connecting networks together. Pioneering work on interconnecting networks (under the sponsorship of the Defense Advanced Research Projects Agency (DARPA)), in essence creating a network of networks, was done by Vinton Cerf and Robert Kahn \[Cerf 1974\]; the term internetting was coined to describe this work. These architectural principles were embodied in TCP.
The early versions of TCP, however, were quite different from today's TCP, combining reliable in-sequence delivery of data via end-system retransmission (still part of today's TCP) with forwarding functions (which today are performed by IP). Early experimentation with TCP, combined with the recognition of the importance of an unreliable, non-flow-controlled, end-to-end transport service for applications such as packetized voice, led to the separation of IP out of TCP and the development of the UDP protocol. The three key Internet protocols that we see today---TCP, UDP, and IP---were conceptually in place by the end of the 1970s.

In addition to the DARPA Internet-related research, many other important networking activities were underway. In Hawaii, Norman Abramson was developing ALOHAnet, a packet-based radio network that allowed multiple remote sites on the Hawaiian Islands to communicate with each other. The ALOHA protocol \[Abramson 1970\] was the first multiple-access protocol, allowing geographically distributed users to share a single broadcast communication medium (a radio frequency). Metcalfe and Boggs built on Abramson's multiple-access protocol work when they developed the Ethernet protocol \[Metcalfe 1976\] for wire-based shared broadcast networks. Interestingly, Metcalfe and Boggs' Ethernet protocol was motivated by the need to connect multiple PCs, printers, and shared disks \[Perkins 1994\]. Twenty-five years ago, well before the PC revolution and the explosion of networks, Metcalfe and Boggs were laying the foundation for today's PC LANs.

1.7.3 A Proliferation of Networks: 1980--1990

By the end of the 1970s, approximately two hundred hosts were connected to the ARPAnet. By the end of the 1980s the number of hosts connected to the public Internet, a confederation of networks looking much like today's Internet, would reach a hundred thousand. The 1980s would be a time of tremendous growth. Much of that growth resulted from several distinct efforts to create computer networks linking universities together. BITNET provided e-mail and file transfers among several universities in the Northeast. CSNET (computer science network) was formed to link university researchers who did not have access to ARPAnet. In 1986, NSFNET was created to provide access to NSF-sponsored supercomputing centers. Starting with an initial backbone speed of 56 kbps, NSFNET's backbone would be running at 1.5 Mbps by the end of the decade and would serve as a primary backbone linking regional networks.

In the ARPAnet community, many of the final pieces of today's Internet architecture were falling into place. January 1, 1983 saw the official deployment of TCP/IP as the new standard host protocol for ARPAnet (replacing the NCP protocol). The transition \[RFC 801\] from NCP to TCP/IP was a flag day event---all hosts were required to transfer over to TCP/IP as of that day. In the late 1980s, important extensions were made to TCP to implement host-based congestion control \[Jacobson 1988\]. The DNS, used to map between a human-readable Internet name (for example, gaia.cs.umass.edu) and its 32-bit IP address, was also developed \[RFC 1034\].

Paralleling this development of the ARPAnet (which was for the most part a US effort), in the early 1980s the French launched the Minitel project, an ambitious plan to bring data networking into everyone's home.
Sponsored by the French government, the Minitel system consisted of a public packet-switched network (based on the X.25 protocol suite), Minitel servers, and inexpensive terminals with built-in low-speed modems. The Minitel became a huge success in 1984 when the French government gave away a free Minitel terminal to each French household that wanted one. Minitel sites included free sites---such as a telephone directory site---as well as private sites, which collected a usage-based fee from each user. At its peak in the mid-1990s, it offered more than 20,000 services, ranging from home banking to specialized research databases. The Minitel was in a large proportion of French homes 10 years before most Americans had ever heard of the Internet.

1.7.4 The Internet Explosion: The 1990s

The 1990s were ushered in with a number of events that symbolized the continued evolution and the soon-to-arrive commercialization of the Internet. ARPAnet, the progenitor of the Internet, ceased to exist. In 1991, NSFNET lifted its restrictions on the use of NSFNET for commercial purposes. NSFNET itself would be decommissioned in 1995, with Internet backbone traffic being carried by commercial Internet Service Providers.

The main event of the 1990s was to be the emergence of the World Wide Web application, which brought the Internet into the homes and businesses of millions of people worldwide. The Web served as a platform for enabling and deploying hundreds of new applications that we take for granted today, including search (e.g., Google and Bing), Internet commerce (e.g., Amazon and eBay), and social networks (e.g., Facebook). The Web was invented at CERN by Tim Berners-Lee between 1989 and 1991 \[Berners-Lee 1989\], based on ideas originating in earlier work on hypertext from the 1940s by Vannevar Bush \[Bush 1945\] and since the 1960s by Ted Nelson \[Xanadu 2012\]. Berners-Lee and his associates developed initial versions of HTML, HTTP, a Web server, and a browser---the four key components of the Web. Around the end of 1993 there were about two hundred Web servers in operation, this collection of servers being just a harbinger of what was about to come. At about this time several researchers were developing Web browsers with GUI interfaces, including Marc Andreessen, who, along with Jim Clark, formed Mosaic Communications, which later became Netscape Communications Corporation \[Cusumano 1998; Quittner 1998\]. By 1995, university students were using Netscape browsers to surf the Web on a daily basis. At about this time companies---big and small---began to operate Web servers and transact commerce over the Web. In 1996, Microsoft started to make browsers, igniting the browser war between Netscape and Microsoft, which Microsoft won a few years later \[Cusumano 1998\].

The second half of the 1990s was a period of tremendous growth and innovation for the Internet, with major corporations and thousands of startups creating Internet products and services. By the end of the millennium the Internet was supporting hundreds of popular applications, including four killer applications:

- E-mail, including attachments and Web-accessible e-mail
- The Web, including Web browsing and Internet commerce
- Instant messaging, with contact lists
- Peer-to-peer file sharing of MP3s, pioneered by Napster

Interestingly, the first two killer applications came from the research community, whereas the last two were created by a few young entrepreneurs.
The period from 1995 to 2001 was a roller-coaster ride for the Internet in the financial markets. Before they were even profitable, hundreds of Internet startups made initial public offerings and started to be traded in a stock market. Many companies were valued in the billions of dollars without having any significant revenue streams. The Internet stocks collapsed in 2000--2001, and many startups shut down. Nevertheless, a number of companies emerged as big winners in the Internet space, including Microsoft, Cisco, Yahoo, eBay, Google, and Amazon.

1.7.5 The New Millennium

Innovation in computer networking continues at a rapid pace. Advances are being made on all fronts, including deployments of faster routers and higher transmission speeds in both access networks and network backbones. But the following developments merit special attention:

- Since the beginning of the millennium, we have been seeing aggressive deployment of broadband Internet access to homes---not only cable modems and DSL but also fiber to the home, as discussed in Section 1.2. This high-speed Internet access has set the stage for a wealth of video applications, including the distribution of user-generated video (for example, YouTube), on-demand streaming of movies and television shows (e.g., Netflix), and multi-person video conferencing (e.g., Skype, Facetime, and Google Hangouts).
- The increasing ubiquity of high-speed (54 Mbps and higher) public WiFi networks and medium-speed (tens of Mbps) Internet access via 4G cellular telephony networks is not only making it possible to remain constantly connected while on the move, but also enabling new location-specific applications such as Yelp, Tinder, Yik Yak, and Waze. The number of wireless devices connecting to the Internet surpassed the number of wired devices in 2011. This high-speed wireless access has set the stage for the rapid emergence of hand-held computers (iPhones, Androids, iPads, and so on), which enjoy constant and untethered access to the Internet.
- Online social networks---such as Facebook, Instagram, Twitter, and WeChat (hugely popular in China)---have created massive people networks on top of the Internet. Many of these social networks are extensively used for messaging as well as photo sharing. Many Internet users today "live" primarily within one or more social networks. Through their APIs, the online social networks create platforms for new networked applications and distributed games.
- As discussed in Section 1.3.3, online service providers, such as Google and Microsoft, have deployed their own extensive private networks, which not only connect together their globally distributed data centers, but are used to bypass the Internet as much as possible by peering directly with lower-tier ISPs. As a result, Google provides search results and e-mail access almost instantaneously, as if their data centers were running within one's own computer.
- Many Internet commerce companies are now running their applications in the "cloud"---such as in Amazon's EC2, in Google's App Engine, or in Microsoft's Azure. Many companies and universities have also migrated their Internet applications (e.g., e-mail and Web hosting) to the cloud. Cloud companies not only provide applications with scalable computing and storage environments, but also give the applications implicit access to their high-performance private networks.

1.8 Summary

In this chapter we've covered a tremendous amount of material!
We've looked at the various pieces of hardware and software that make up the Internet in particular and computer networks in general. We started at the edge of the network, looking at end systems and applications, and at the transport service provided to the applications running on the end systems. We also looked at the link-layer technologies and physical media typically found in the access network. We then dove deeper inside the network, into the network core, identifying packet switching and circuit switching as the two basic approaches for transporting data through a telecommunication network, and we examined the strengths and weaknesses of each approach. We also examined the structure of the global Internet, learning that the Internet is a network of networks. We saw that the Internet's hierarchical structure, consisting of higher- and lower-tier ISPs, has allowed it to scale to include thousands of networks.

In the second part of this introductory chapter, we examined several topics central to the field of computer networking. We first examined delay, throughput, and packet loss in a packet-switched network. We developed simple quantitative models for transmission, propagation, and queuing delays as well as for throughput; we'll make extensive use of these delay models in the homework problems throughout this book. Next we examined protocol layering and service models, key architectural principles in networking that we will also refer back to throughout this book. We also surveyed some of the more prevalent security attacks in the Internet today. We finished our introduction to networking with a brief history of computer networking. The first chapter in itself constitutes a minicourse in computer networking.

So, we have indeed covered a tremendous amount of ground in this first chapter! If you're a bit overwhelmed, don't worry. In the following chapters we'll revisit all of these ideas, covering them in much more detail (that's a promise, not a threat!). At this point, we hope you leave this chapter with a still-developing intuition for the pieces that make up a network, a still-developing command of the vocabulary of networking (don't be shy about referring back to this chapter), and an ever-growing desire to learn more about networking. That's the task ahead of us for the rest of this book.

Road-Mapping This Book

Before starting any trip, you should always glance at a road map in order to become familiar with the major roads and junctures that lie ahead. For the trip we are about to embark on, the ultimate destination is a deep understanding of the how, what, and why of computer networks. Our road map is the sequence of chapters of this book:

1. Computer Networks and the Internet
2. Application Layer
3. Transport Layer
4. Network Layer: Data Plane
5. Network Layer: Control Plane
6. The Link Layer and LANs
7. Wireless and Mobile Networks
8. Security in Computer Networks
9. Multimedia Networking

Chapters 2 through 6 are the five core chapters of this book. You should notice that these chapters are organized around the top four layers of the five-layer Internet protocol stack. Further note that our journey will begin at the top of the Internet protocol stack, namely, the application layer, and will work its way downward. The rationale behind this top-down journey is that once we understand the applications, we can understand the network services needed to support these applications.
We can then, in turn, examine the various ways in which such services might be implemented by a network architecture. Covering applications early thus provides motivation for the remainder of the text.

The second half of the book---Chapters 7 through 9---zooms in on three enormously important (and somewhat independent) topics in modern computer networking. In Chapter 7, we examine wireless and mobile networks, including wireless LANs (including WiFi and Bluetooth), cellular telephony networks (including GSM, 3G, and 4G), and mobility (in both IP and GSM networks). Chapter 8, which addresses security in computer networks, first looks at the underpinnings of encryption and network security, and then examines how the basic theory is applied in a broad range of Internet contexts. The last chapter, which addresses multimedia networking, examines audio and video applications such as Internet phone, video conferencing, and streaming of stored media. We also look at how a packet-switched network can be designed to provide consistent quality of service to audio and video applications.

Homework Problems and Questions

Chapter 1 Review Questions

SECTION 1.1

R1. What is the difference between a host and an end system? List several different types of end systems. Is a Web server an end system?

R2. The word protocol is often used to describe diplomatic relations. How does Wikipedia describe diplomatic protocol?

R3. Why are standards important for protocols?

SECTION 1.2

R4. List six access technologies. Classify each one as home access, enterprise access, or wide-area wireless access.

R5. Is HFC transmission rate dedicated or shared among users? Are collisions possible in a downstream HFC channel? Why or why not?

R6. List the available residential access technologies in your city. For each type of access, provide the advertised downstream rate, upstream rate, and monthly price.

R7. What is the transmission rate of Ethernet LANs?

R8. What are some of the physical media that Ethernet can run over?

R9. Dial-up modems, HFC, DSL, and FTTH are all used for residential access. For each of these access technologies, provide a range of transmission rates and comment on whether the transmission rate is shared or dedicated.

R10. Describe the most popular wireless Internet access technologies today. Compare and contrast them.

SECTION 1.3

R11. Suppose there is exactly one packet switch between a sending host and a receiving host. The transmission rates between the sending host and the switch and between the switch and the receiving host are R1 and R2, respectively. Assuming that the switch uses store-and-forward packet switching, what is the total end-to-end delay to send a packet of length L? (Ignore queuing, propagation delay, and processing delay.)

R12. What advantage does a circuit-switched network have over a packet-switched network? What advantages does TDM have over FDM in a circuit-switched network?

R13. Suppose users share a 2 Mbps link. Also suppose each user transmits continuously at 1 Mbps when transmitting, but each user transmits only 20 percent of the time. (See the discussion of statistical multiplexing in Section 1.3.)

a. When circuit switching is used, how many users can be supported?

b. For the remainder of this problem, suppose packet switching is used. Why will there be essentially no queuing delay before the link if two or fewer users transmit at the same time?
Why will there be a queuing delay if three users transmit at the same time?

c. Find the probability that a given user is transmitting.

d. Suppose now there are three users. Find the probability that at any given time, all three users are transmitting simultaneously. Find the fraction of time during which the queue grows.

R14. Why will two ISPs at the same level of the hierarchy often peer with each other? How does an IXP earn money?

R15. Some content providers have created their own networks. Describe Google's network. What motivates content providers to create these networks?

SECTION 1.4

R16. Consider sending a packet from a source host to a destination host over a fixed route. List the delay components in the end-to-end delay. Which of these delays are constant and which are variable?

R17. Visit the Transmission Versus Propagation Delay applet at the companion Web site. Among the rates, propagation delay, and packet sizes available, find a combination for which the sender finishes transmitting before the first bit of the packet reaches the receiver. Find another combination for which the first bit of the packet reaches the receiver before the sender finishes transmitting.

R18. How long does it take a packet of length 1,000 bytes to propagate over a link of distance 2,500 km, propagation speed 2.5⋅10^8 m/s, and transmission rate 2 Mbps? More generally, how long does it take a packet of length L to propagate over a link of distance d, propagation speed s, and transmission rate R bps? Does this delay depend on packet length? Does this delay depend on transmission rate?

R19. Suppose Host A wants to send a large file to Host B. The path from Host A to Host B has three links, of rates R1=500 kbps, R2=2 Mbps, and R3=1 Mbps.

a. Assuming no other traffic in the network, what is the throughput for the file transfer?

b. Suppose the file is 4 million bytes. Dividing the file size by the throughput, roughly how long will it take to transfer the file to Host B?

c. Repeat (a) and (b), but now with R2 reduced to 100 kbps.

R20. Suppose end system A wants to send a large file to end system B. At a very high level, describe how end system A creates packets from the file. When one of these packets arrives to a router, what information in the packet does the router use to determine the link onto which the packet is forwarded? Why is packet switching in the Internet analogous to driving from one city to another and asking directions along the way?

R21. Visit the Queuing and Loss applet at the companion Web site. What is the maximum emission rate and the minimum transmission rate? With those rates, what is the traffic intensity? Run the applet with these rates and determine how long it takes for packet loss to occur. Then repeat the experiment a second time and determine again how long it takes for packet loss to occur. Are the values different? Why or why not?

SECTION 1.5

R22. List five tasks that a layer can perform. Is it possible that one (or more) of these tasks could be performed by two (or more) layers?

R23. What are the five layers in the Internet protocol stack? What are the principal responsibilities of each of these layers?

R24. What is an application-layer message? A transport-layer segment? A network-layer datagram? A link-layer frame?

R25. Which layers in the Internet protocol stack does a router process? Which layers does a link-layer switch process? Which layers does a host process?

SECTION 1.6

R26.
What is the difference between a virus and a worm?

R27. Describe how a botnet can be created and how it can be used for a DDoS attack.

R28. Suppose Alice and Bob are sending packets to each other over a computer network. Suppose Trudy positions herself in the network so that she can capture all the packets sent by Alice and send whatever she wants to Bob; she can also capture all the packets sent by Bob and send whatever she wants to Alice. List some of the malicious things Trudy can do from this position.

Problems

P1. Design and describe an application-level protocol to be used between an automatic teller machine and a bank's centralized computer. Your protocol should allow a user's card and password to be verified, the account balance (which is maintained at the centralized computer) to be queried, and an account withdrawal to be made (that is, money disbursed to the user). Your protocol entities should be able to handle the all-too-common case in which there is not enough money in the account to cover the withdrawal. Specify your protocol by listing the messages exchanged and the action taken by the automatic teller machine or the bank's centralized computer on transmission and receipt of messages. Sketch the operation of your protocol for the case of a simple withdrawal with no errors, using a diagram similar to that in Figure 1.2. Explicitly state the assumptions made by your protocol about the underlying end-to-end transport service.

P2. Equation 1.1 gives a formula for the end-to-end delay of sending one packet of length L over N links of transmission rate R. Generalize this formula for sending P such packets back-to-back over the N links.

P3. Consider an application that transmits data at a steady rate (for example, the sender generates an N-bit unit of data every k time units, where k is small and fixed). Also, when such an application starts, it will continue running for a relatively long period of time. Answer the following questions, briefly justifying your answer:

a. Would a packet-switched network or a circuit-switched network be more appropriate for this application? Why?

b. Suppose that a packet-switched network is used and the only traffic in this network comes from such applications as described above. Furthermore, assume that the sum of the application data rates is less than the capacities of each and every link. Is some form of congestion control needed? Why?

P4. Consider the circuit-switched network in Figure 1.13. Recall that there are 4 circuits on each link. Label the four switches A, B, C, and D, going in the clockwise direction.

a. What is the maximum number of simultaneous connections that can be in progress at any one time in this network?

b. Suppose that all connections are between switches A and C. What is the maximum number of simultaneous connections that can be in progress?

c. Suppose we want to make four connections between switches A and C, and another four connections between switches B and D. Can we route these calls through the four links to accommodate all eight connections?

P5. Review the car-caravan analogy in Section 1.4. Assume a propagation speed of 100 km/hour.

a. Suppose the caravan travels 150 km, beginning in front of one tollbooth, passing through a second tollbooth, and finishing just after a third tollbooth. What is the end-to-end delay?

b. Repeat (a), now assuming that there are eight cars in the caravan instead of ten.

P6.
This elementary problem begins to explore propagation delay and transmission delay, two central concepts in data networking. Consider two hosts, A and B, connected by a single link of rate R bps. Suppose that the two hosts are separated by m meters, and suppose the propagation speed along the link is s meters/sec. Host A is to send a packet of size L bits to Host B.

Exploring propagation delay and transmission delay

a. Express the propagation delay, dprop, in terms of m and s.

b. Determine the transmission time of the packet, dtrans, in terms of L and R.

c. Ignoring processing and queuing delays, obtain an expression for the end-to-end delay.

d. Suppose Host A begins to transmit the packet at time t=0. At time t=dtrans, where is the last bit of the packet?

e. Suppose dprop is greater than dtrans. At time t=dtrans, where is the first bit of the packet?

f. Suppose dprop is less than dtrans. At time t=dtrans, where is the first bit of the packet?

g. Suppose s=2.5⋅10^8 m/s, L=120 bits, and R=56 kbps. Find the distance m so that dprop equals dtrans.

P7. In this problem, we consider sending real-time voice from Host A to Host B over a packet-switched network (VoIP). Host A converts analog voice to a digital 64 kbps bit stream on the fly. Host A then groups the bits into 56-byte packets. There is one link between Hosts A and B; its transmission rate is 2 Mbps and its propagation delay is 10 msec. As soon as Host A gathers a packet, it sends it to Host B. As soon as Host B receives an entire packet, it converts the packet's bits to an analog signal. How much time elapses from the time a bit is created (from the original analog signal at Host A) until the bit is decoded (as part of the analog signal at Host B)?

P8. Suppose users share a 3 Mbps link. Also suppose each user requires 150 kbps when transmitting, but each user transmits only 10 percent of the time. (See the discussion of packet switching versus circuit switching in Section 1.3.)

a. When circuit switching is used, how many users can be supported?

b. For the remainder of this problem, suppose packet switching is used. Find the probability that a given user is transmitting.

c. Suppose there are 120 users. Find the probability that at any given time, exactly n users are transmitting simultaneously. (Hint: Use the binomial distribution.)

d. Find the probability that there are 21 or more users transmitting simultaneously.

P9. Consider the discussion in Section 1.3 of packet switching versus circuit switching in which an example is provided with a 1 Mbps link. Users are generating data at a rate of 100 kbps when busy, but are busy generating data only with probability p=0.1. Suppose that the 1 Mbps link is replaced by a 1 Gbps link.

a. What is N, the maximum number of users that can be supported simultaneously under circuit switching?

b. Now consider packet switching and a user population of M users. Give a formula (in terms of p, M, N) for the probability that more than N users are sending data.

P10. Consider a packet of length L that begins at end system A and travels over three links to a destination end system. These three links are connected by two packet switches. Let di, si, and Ri denote the length, propagation speed, and the transmission rate of link i, for i=1,2,3. The packet switch delays each packet by dproc.
Assuming no queuing delays, in terms of di, si, Ri (i=1,2,3), and L, what is the total end-to-end delay for the packet? Suppose now the packet is 1,500 bytes, the propagation speed on all three links is 2.5⋅10^8 m/s, the transmission rates of all three links are 2 Mbps, the packet switch processing delay is 3 msec, the length of the first link is 5,000 km, the length of the second link is 4,000 km, and the length of the last link is 1,000 km. For these values, what is the end-to-end delay?

P11. In the above problem, suppose R1=R2=R3=R and dproc=0. Further suppose the packet switch does not store-and-forward packets but instead immediately transmits each bit it receives without waiting for the entire packet to arrive. What is the end-to-end delay?

P12. A packet switch receives a packet and determines the outbound link to which the packet should be forwarded. When the packet arrives, one other packet is halfway done being transmitted on this outbound link and four other packets are waiting to be transmitted. Packets are transmitted in order of arrival. Suppose all packets are 1,500 bytes and the link rate is 2 Mbps. What is the queuing delay for the packet? More generally, what is the queuing delay when all packets have length L, the transmission rate is R, x bits of the currently-being-transmitted packet have been transmitted, and n packets are already in the queue?

P13.

a. Suppose N packets arrive simultaneously to a link at which no packets are currently being transmitted or queued. Each packet is of length L and the link has transmission rate R. What is the average queuing delay for the N packets?

b. Now suppose that N such packets arrive to the link every LN/R seconds. What is the average queuing delay of a packet?

P14. Consider the queuing delay in a router buffer. Let I denote traffic intensity; that is, I=La/R. Suppose that the queuing delay takes the form IL/(R(1−I)) for I < 1.

a. Provide a formula for the total delay, that is, the queuing delay plus the transmission delay.

b. Plot the total delay as a function of L/R.

P15. Let a denote the rate of packets arriving at a link in packets/sec, and let µ denote the link's transmission rate in packets/sec. Based on the formula for the total delay (i.e., the queuing delay plus the transmission delay) derived in the previous problem, derive a formula for the total delay in terms of a and µ.

P16. Consider a router buffer preceding an outbound link. In this problem, you will use Little's formula, a famous formula from queuing theory. Let N denote the average number of packets in the buffer plus the packet being transmitted. Let a denote the rate of packets arriving at the link. Let d denote the average total delay (i.e., the queuing delay plus the transmission delay) experienced by a packet. Little's formula is N=a⋅d. Suppose that on average, the buffer contains 10 packets, and the average packet queuing delay is 10 msec. The link's transmission rate is 100 packets/sec. Using Little's formula, what is the average packet arrival rate, assuming there is no packet loss?

P17.

a. Generalize Equation 1.2 in Section 1.4.3 for heterogeneous processing rates, transmission rates, and propagation delays.

b. Repeat (a), but now also suppose that there is an average queuing delay of dqueue at each node.

P18. Perform a Traceroute between source and destination on the same continent at three different hours of the day.
Using Traceroute to discover network paths and measure network delay

a. Find the average and standard deviation of the round-trip delays at each of the three hours.

b. Find the number of routers in the path at each of the three hours. Did the paths change during any of the hours?

c. Try to identify the number of ISP networks that the Traceroute packets pass through from source to destination. Routers with similar names and/or similar IP addresses should be considered as part of the same ISP. In your experiments, do the largest delays occur at the peering interfaces between adjacent ISPs?

d. Repeat the above for a source and destination on different continents. Compare the intra-continent and inter-continent results.

P19.

a. Visit the site www.traceroute.org and perform traceroutes from two different cities in France to the same destination host in the United States. How many links are the same in the two traceroutes? Is the transatlantic link the same?

b. Repeat (a) but this time choose one city in France and another city in Germany.

c. Pick a city in the United States, and perform traceroutes to two hosts, each in a different city in China. How many links are common in the two traceroutes? Do the two traceroutes diverge before reaching China?

P20. Consider the throughput example corresponding to Figure 1.20(b). Now suppose that there are M client-server pairs rather than 10. Let Rs, Rc, and R denote the rates of the server links, client links, and network link. Assume all other links have abundant capacity and that there is no other traffic in the network besides the traffic generated by the M client-server pairs. Derive a general expression for throughput in terms of Rs, Rc, R, and M.

P21. Consider Figure 1.19(b). Now suppose that there are M paths between the server and the client. No two paths share any link. Path k (k=1, ..., M) consists of N links with transmission rates R1^k, R2^k, ..., RN^k. If the server can only use one path to send data to the client, what is the maximum throughput that the server can achieve? If the server can use all M paths to send data, what is the maximum throughput that the server can achieve?

P22. Consider Figure 1.19(b). Suppose that each link between the server and the client has a packet loss probability p, and the packet loss probabilities for these links are independent. What is the probability that a packet (sent by the server) is successfully received by the receiver? If a packet is lost in the path from the server to the client, then the server will re-transmit the packet. On average, how many times will the server re-transmit the packet in order for the client to successfully receive the packet?

P23. Consider Figure 1.19(a). Assume that we know the bottleneck link along the path from the server to the client is the first link with rate Rs bits/sec. Suppose we send a pair of packets back to back from the server to the client, and there is no other traffic on this path. Assume each packet is L bits in size, and both links have the same propagation delay dprop.

a. What is the packet inter-arrival time at the destination? That is, how much time elapses from when the last bit of the first packet arrives until the last bit of the second packet arrives?

b. Now assume that the second link is the bottleneck link (i.e., Rc < Rs). Is it possible that the second packet queues at the input queue of the second link? Explain.
Now suppose that the server sends the second packet T seconds after sending the first packet. How large must T be to ensure no queuing before the second link? Explain.

P24. Suppose you would like to urgently deliver 40 terabytes of data from Boston to Los Angeles. You have available a 100 Mbps dedicated link for data transfer. Would you prefer to transmit the data via this link or instead use FedEx overnight delivery? Explain.

P25. Suppose two hosts, A and B, are separated by 20,000 kilometers and are connected by a direct link of R=2 Mbps. Suppose the propagation speed over the link is 2.5⋅10^8 meters/sec.

a. Calculate the bandwidth-delay product, R⋅dprop.

b. Consider sending a file of 800,000 bits from Host A to Host B. Suppose the file is sent continuously as one large message. What is the maximum number of bits that will be in the link at any given time?

c. Provide an interpretation of the bandwidth-delay product.

d. What is the width (in meters) of a bit in the link? Is it longer than a football field?

e. Derive a general expression for the width of a bit in terms of the propagation speed s, the transmission rate R, and the length of the link m.

P26. Referring to problem P25, suppose we can modify R. For what value of R is the width of a bit as long as the length of the link?

P27. Consider problem P25 but now with a link of R=1 Gbps.

a. Calculate the bandwidth-delay product, R⋅dprop.

b. Consider sending a file of 800,000 bits from Host A to Host B. Suppose the file is sent continuously as one big message. What is the maximum number of bits that will be in the link at any given time?

c. What is the width (in meters) of a bit in the link?

P28. Refer again to problem P25.

a. How long does it take to send the file, assuming it is sent continuously?

b. Suppose now the file is broken up into 20 packets with each packet containing 40,000 bits. Suppose that each packet is acknowledged by the receiver and the transmission time of an acknowledgment packet is negligible. Finally, assume that the sender cannot send a packet until the preceding one is acknowledged. How long does it take to send the file?

c. Compare the results from (a) and (b).

P29. Suppose there is a 10 Mbps microwave link between a geostationary satellite and its base station on Earth. Every minute the satellite takes a digital photo and sends it to the base station. Assume a propagation speed of 2.4⋅10^8 meters/sec.

a. What is the propagation delay of the link?

b. What is the bandwidth-delay product, R⋅dprop?

c. Let x denote the size of the photo. What is the minimum value of x for the microwave link to be continuously transmitting?

P30. Consider the airline travel analogy in our discussion of layering in Section 1.5, and the addition of headers to protocol data units as they flow down the protocol stack. Is there an equivalent notion of header information that is added to passengers and baggage as they move down the airline protocol stack?

P31. In modern packet-switched networks, including the Internet, the source host segments long, application-layer messages (for example, an image or a music file) into smaller packets and sends the packets into the network. The receiver then reassembles the packets back into the original message. We refer to this process as message segmentation. Figure 1.27 illustrates the end-to-end transport of a message with and without message segmentation.
Consider a message that is 8⋅10^6 bits long that is to be sent from source to destination in Figure 1.27. Suppose each link in the figure is 2 Mbps. Ignore propagation, queuing, and processing delays.

a. Consider sending the message from source to destination without message segmentation. How long does it take to move the message from the source host to the first packet switch? Keeping in mind that each switch uses store-and-forward packet switching, what is the total time to move the message from source host to destination host?

b. Now suppose that the message is segmented into 800 packets, with each packet being 10,000 bits long. How long does it take to move the first packet from source host to the first switch? When the first packet is being sent from the first switch to the second switch, the second packet is being sent from the source host to the first switch. At what time will the second packet be fully received at the first switch?

c. How long does it take to move the file from source host to destination host when message segmentation is used? Compare this result with your answer in part (a) and comment.

d. In addition to reducing delay, what are reasons to use message segmentation?

e. Discuss the drawbacks of message segmentation.

Figure 1.27 End-to-end message transport: (a) without message segmentation; (b) with message segmentation

P32. Experiment with the Message Segmentation applet at the book's Web site. Do the delays in the applet correspond to the delays in the previous problem? How do link propagation delays affect the overall end-to-end delay for packet switching (with message segmentation) and for message switching?

P33. Consider sending a large file of F bits from Host A to Host B. There are three links (and two switches) between A and B, and the links are uncongested (that is, no queuing delays). Host A segments the file into segments of S bits each and adds 80 bits of header to each segment, forming packets of L=80+S bits. Each link has a transmission rate of R bps. Find the value of S that minimizes the delay of moving the file from Host A to Host B. Disregard propagation delay.

P34. Skype offers a service that allows you to make a phone call from a PC to an ordinary phone. This means that the voice call must pass through both the Internet and through a telephone network. Discuss how this might be done.

Wireshark Lab

"Tell me and I forget. Show me and I remember. Involve me and I understand." Chinese proverb

One's understanding of network protocols can often be greatly deepened by seeing them in action and by playing around with them---observing the sequence of messages exchanged between two protocol entities, delving into the details of protocol operation, causing protocols to perform certain actions, and observing these actions and their consequences. This can be done in simulated scenarios or in a real network environment such as the Internet. The Java applets at the textbook Web site take the first approach. In the Wireshark labs, we'll take the latter approach. You'll run network applications in various scenarios using a computer on your desk, at home, or in a lab. You'll observe the network protocols in your computer, interacting and exchanging messages with protocol entities executing elsewhere in the Internet. Thus, you and your computer will be an integral part of these live labs. You'll observe---and you'll learn---by doing.

The basic tool for observing the messages exchanged between executing protocol entities is called a packet sniffer. As the name suggests, a packet sniffer passively copies (sniffs) messages being sent from and received by your computer; it also displays the contents of the various protocol fields of these captured messages. A screenshot of the Wireshark packet sniffer is shown in Figure 1.28. Wireshark is a free packet sniffer that runs on Windows, Linux/Unix, and Mac computers.

Figure 1.28 A Wireshark screenshot (Wireshark screenshot reprinted by permission of the Wireshark Foundation.)
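For a small programmatic taste of what a sniffer does before you fire up Wireshark's GUI, here is a minimal capture sketch using the third-party Scapy library (our own illustration; the labs themselves use Wireshark). Capturing packets typically requires root or administrator privileges:

```python
# Minimal packet-capture sketch using the third-party Scapy library
# (our illustration; the Wireshark labs use Wireshark itself).
from scapy.all import sniff

# Passively copy the next five packets seen by the default interface,
# printing a one-line summary of each packet's protocol layers.
sniff(count=5, prn=lambda pkt: print(pkt.summary()))
```

Like Wireshark, this injects nothing into the channel; it merely copies what flows past.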
The basic tool for observing the messages exchanged between executing protocol entities is called a packet sniffer. As the name suggests, a packet sniffer passively copies (sniffs) messages being sent from and received by your computer; it also displays the contents of the various protocol fields of these captured messages. A screenshot of the Wireshark packet sniffer is shown in Figure 1.28. Wireshark is a free packet sniffer that runs on Windows, Linux/Unix, and Mac computers.

Figure 1.28 A Wireshark screenshot (Wireshark screenshot reprinted by permission of the Wireshark Foundation.)

Throughout the textbook, you will find Wireshark labs that allow you to explore a number of the protocols studied in the chapter. In this first Wireshark lab, you'll obtain and install a copy of Wireshark, access a Web site, and capture and examine the protocol messages being exchanged between your Web browser and the Web server. You can find full details about this first Wireshark lab (including instructions about how to obtain and install Wireshark) at the Web site http://www.pearsonhighered.com/csresources/.

AN INTERVIEW WITH... Leonard Kleinrock

Leonard Kleinrock is a professor of computer science at the University of California, Los Angeles. In 1969, his computer at UCLA became the first node of the Internet. His creation of packet-switching principles in 1961 became the technology behind the Internet. He received his B.E.E. from the City College of New York (CCNY) and his master's and PhD in electrical engineering from MIT.

What made you decide to specialize in networking/Internet technology?

As a PhD student at MIT in 1959, I looked around and found that most of my classmates were doing research in the area of information theory and coding theory. At MIT, there was the great researcher Claude Shannon, who had launched these fields and had solved most of the important problems already. The research problems that were left were hard and of lesser consequence. So I decided to launch out in a new area that no one else had yet conceived of. Remember that at MIT I was surrounded by lots of computers, and it was clear to me that soon these machines would need to communicate with each other. At the time, there was no effective way for them to do so, so I decided to develop the technology that would permit efficient and reliable data networks to be created.

What was your first job in the computer industry? What did it entail?

I went to the evening session at CCNY from 1951 to 1957 for my bachelor's degree in electrical engineering. During the day, I worked first as a technician and then as an engineer at a small, industrial electronics firm called Photobell. While there, I introduced digital technology to their product line. Essentially, we were using photoelectric devices to detect the presence of certain items (boxes, people, etc.) and the use of a circuit known then as a bistable multivibrator was just the kind of technology we needed to bring digital processing into this field of detection. These circuits happen to be the building blocks for computers, and have come to be known as flip-flops or switches in today's vernacular.

What was going through your mind when you sent the first host-to-host message (from UCLA to the Stanford Research Institute)?

Frankly, we had no idea of the importance of that event.
We had not prepared a special message of historic significance, as did so many inventors of the past (Samuel Morse with "What hath God wrought." or Alexander Graham Bell with "Watson, come here! I want you." or Neil Armstrong with "That's one small step for a man, one giant leap for mankind.") Those guys were smart! They understood media and public relations. All we wanted to do was to login to the SRI computer. So we typed the "L", which was correctly received, we typed the "o" which was received, and then we typed the "g" which caused the SRI host computer to crash! So, it turned out that our message was the shortest and perhaps the most prophetic message ever, namely "Lo!" as in "Lo and behold!"

Earlier that year, I was quoted in a UCLA press release saying that once the network was up and running, it would be possible to gain access to computer utilities from our homes and offices as easily as we gain access to electricity and telephone connectivity. So my vision at that time was that the Internet would be ubiquitous, always on, always available, anyone with any device could connect from any location, and it would be invisible. However, I never anticipated that my 99-year-old mother would use the Internet---and indeed she did!

What is your vision for the future of networking?

The easy part of the vision is to predict the infrastructure itself. I anticipate that we'll see considerable deployment of nomadic computing, mobile devices, and smart spaces. Indeed, the availability of lightweight, inexpensive, high-performance portable computing and communication devices (plus the ubiquity of the Internet) has enabled us to become nomads. Nomadic computing refers to the technology that enables end users who travel from place to place to gain access to Internet services in a transparent fashion, no matter where they travel and no matter what device they carry or gain access to. The harder part of the vision is to predict the applications and services, which have consistently surprised us in dramatic ways (e-mail, search technologies, the World Wide Web, blogs, social networks, user generation and sharing of music, photos, and videos, etc.). We are on the verge of a new class of surprising and innovative mobile applications delivered to our hand-held devices. The next step will enable us to move out from the netherworld of cyberspace to the physical world of smart spaces. Our environments (desks, walls, vehicles, watches, belts, and so on) will come alive with technology, through actuators, sensors, logic, processing, storage, cameras, microphones, speakers, displays, and communication. This embedded technology will allow our environment to provide the IP services we want. When I walk into a room, the room will know I entered. I will be able to communicate with my environment naturally, as in spoken English; my requests will generate replies that present Web pages to me from wall displays, through my eyeglasses, as speech, holograms, and so forth.

Looking a bit further out, I see a networking future that includes the following additional key components. I see intelligent software agents deployed across the network whose function it is to mine data, act on that data, observe trends, and carry out tasks dynamically and adaptively. I see considerably more network traffic generated not so much by humans, but by these embedded devices and these intelligent software agents.
I see large collections of self-organizing systems controlling this vast, fast network. I see huge amounts of information flashing across this network instantaneously with this information undergoing enormous processing and filtering. The Internet will essentially be a pervasive global nervous system. I see all these things and more as we move headlong through the twenty-first century.

What people have inspired you professionally?

By far, it was Claude Shannon from MIT, a brilliant researcher who had the ability to relate his mathematical ideas to the physical world in highly intuitive ways. He was on my PhD thesis committee.

Do you have any advice for students entering the networking/Internet field?

The Internet and all that it enables is a vast new frontier, full of amazing challenges. There is room for great innovation. Don't be constrained by today's technology. Reach out and imagine what could be and then make it happen.

Chapter 2 Application Layer

Network applications are the raisons d'être of a computer network---if we couldn't conceive of any useful applications, there wouldn't be any need for networking infrastructure and protocols to support them. Since the Internet's inception, numerous useful and entertaining applications have indeed been created. These applications have been the driving force behind the Internet's success, motivating people in homes, schools, governments, and businesses to make the Internet an integral part of their daily activities. Internet applications include the classic text-based applications that became popular in the 1970s and 1980s: text e-mail, remote access to computers, file transfers, and newsgroups. They include the killer application of the mid-1990s, the World Wide Web, encompassing Web surfing, search, and electronic commerce. They include instant messaging and P2P file sharing, the two killer applications introduced at the end of the millennium. In the new millennium, new and highly compelling applications continue to emerge, including voice over IP and video conferencing such as Skype, Facetime, and Google Hangouts; user-generated video such as YouTube and movies on demand such as Netflix; and multiplayer online games such as Second Life and World of Warcraft. During this same period, we have seen the emergence of a new generation of social networking applications---such as Facebook, Instagram, Twitter, and WeChat---which have created engaging human networks on top of the Internet's network of routers and communication links. And most recently, along with the arrival of the smartphone, there has been a profusion of location-based mobile apps, including popular check-in, dating, and road-traffic forecasting apps (such as Yelp, Tinder, Waze, and Yik Yak). Clearly, there has been no slowing down of new and exciting Internet applications. Perhaps some of the readers of this text will create the next generation of killer Internet applications!

In this chapter we study the conceptual and implementation aspects of network applications. We begin by defining key application-layer concepts, including network services required by applications, clients and servers, processes, and transport-layer interfaces. We examine several network applications in detail, including the Web, e-mail, DNS, peer-to-peer (P2P) file distribution, and video streaming. (Chapter 9 will further examine multimedia applications, including streaming video and VoIP.)
We then cover network application development over both TCP and UDP. In particular, we study the socket interface and walk through some simple client-server applications in Python. We also provide several fun and interesting socket programming assignments at the end of the chapter.

The application layer is a particularly good place to start our study of protocols. It's familiar ground. We're acquainted with many of the applications that rely on the protocols we'll study. It will give us a good feel for what protocols are all about and will introduce us to many of the same issues that we'll see again when we study transport, network, and link-layer protocols.

2.1 Principles of Network Applications

Suppose you have an idea for a new network application. Perhaps this application will be a great service to humanity, or will please your professor, or will bring you great wealth, or will simply be fun to develop. Whatever the motivation may be, let's now examine how you transform the idea into a real-world network application.

At the core of network application development is writing programs that run on different end systems and communicate with each other over the network. For example, in the Web application there are two distinct programs that communicate with each other: the browser program running in the user's host (desktop, laptop, tablet, smartphone, and so on); and the Web server program running in the Web server host. As another example, in a P2P file-sharing system there is a program in each host that participates in the file-sharing community. In this case, the programs in the various hosts may be similar or identical. Thus, when developing your new application, you need to write software that will run on multiple end systems. This software could be written, for example, in C, Java, or Python. Importantly, you do not need to write software that runs on network-core devices, such as routers or link-layer switches. Even if you wanted to write application software for these network-core devices, you wouldn't be able to do so. As we learned in Chapter 1, and as shown earlier in Figure 1.24, network-core devices do not function at the application layer but instead function at lower layers---specifically at the network layer and below. This basic design---namely, confining application software to the end systems---as shown in Figure 2.1, has facilitated the rapid development and deployment of a vast array of network applications.

Figure 2.1 Communication for a network application takes place between end systems at the application layer

2.1.1 Network Application Architectures

Before diving into software coding, you should have a broad architectural plan for your application. Keep in mind that an application's architecture is distinctly different from the network architecture (e.g., the five-layer Internet architecture discussed in Chapter 1). From the application developer's perspective, the network architecture is fixed and provides a specific set of services to applications. The application architecture, on the other hand, is designed by the application developer and dictates how the application is structured over the various end systems. In choosing the application architecture, an application developer will likely draw on one of the two predominant architectural paradigms used in modern network applications: the client-server architecture or the peer-to-peer (P2P) architecture.
In a client-server architecture, there is an always-on host, called the server, which services requests from many other hosts, called clients. A classic example is the Web application for which an always-on Web server services requests from browsers running on client hosts. When a Web server receives a request for an object from a client host, it responds by sending the requested object to the client host. Note that with the client-server architecture, clients do not directly communicate with each other; for example, in the Web application, two browsers do not directly communicate. Another characteristic of the client-server architecture is that the server has a fixed, well-known address, called an IP address (which we'll discuss soon). Because the server has a fixed, well-known address, and because the server is always on, a client can always contact the server by sending a packet to the server's IP address. Some of the better-known applications with a client-server architecture include the Web, FTP, Telnet, and e-mail. The client-server architecture is shown in Figure 2.2(a).

Often in a client-server application, a single server host is incapable of keeping up with all the requests from clients. For example, a popular social-networking site can quickly become overwhelmed if it has only one server handling all of its requests. For this reason, a data center, housing a large number of hosts, is often used to create a powerful virtual server. The most popular Internet services---such as search engines (e.g., Google, Bing, Baidu), Internet commerce (e.g., Amazon, eBay, Alibaba), Web-based e-mail (e.g., Gmail and Yahoo Mail), and social networking (e.g., Facebook, Instagram, Twitter, and WeChat)---employ one or more data centers. As discussed in Section 1.3.3, Google has 30 to 50 data centers distributed around the world, which collectively handle search, YouTube, Gmail, and other services. A data center can have hundreds of thousands of servers, which must be powered and maintained. Additionally, the service providers must pay recurring interconnection and bandwidth costs for sending data from their data centers.

In a P2P architecture, there is minimal (or no) reliance on dedicated servers in data centers. Instead the application exploits direct communication between pairs of intermittently connected hosts, called peers. The peers are not owned by the service provider, but are instead desktops and laptops controlled by users, with most of the peers residing in homes, universities, and offices. Because the peers communicate without passing through a dedicated server, the architecture is called peer-to-peer.

Figure 2.2 (a) Client-server architecture; (b) P2P architecture

Many of today's most popular and traffic-intensive applications are based on P2P architectures. These applications include file sharing (e.g., BitTorrent), peer-assisted download acceleration (e.g., Xunlei), and Internet telephony and video conferencing (e.g., Skype). The P2P architecture is illustrated in Figure 2.2(b). We mention that some applications have hybrid architectures, combining both client-server and P2P elements. For example, for many instant messaging applications, servers are used to track the IP addresses of users, but user-to-user messages are sent directly between user hosts (without passing through intermediate servers). One of the most compelling features of P2P architectures is their self-scalability.
For example, in a P2P file-sharing application, although each peer generates workload by requesting files, each peer also adds service capacity to the system by distributing files to other peers. P2P architectures are also cost effective, since they normally don't require significant server infrastructure and server bandwidth (in contrast with client-server designs with data centers). However, P2P applications face challenges of security, performance, and reliability due to their highly decentralized structure.

2.1.2 Processes Communicating

Before building your network application, you also need a basic understanding of how the programs, running in multiple end systems, communicate with each other. In the jargon of operating systems, it is not actually programs but processes that communicate. A process can be thought of as a program that is running within an end system. When processes are running on the same end system, they can communicate with each other with interprocess communication, using rules that are governed by the end system's operating system. But in this book we are not particularly interested in how processes on the same host communicate; we are interested instead in how processes running on different hosts (with potentially different operating systems) communicate.

Processes on two different end systems communicate with each other by exchanging messages across the computer network. A sending process creates and sends messages into the network; a receiving process receives these messages and possibly responds by sending messages back. Figure 2.1 illustrates that processes communicating with each other reside in the application layer of the five-layer protocol stack.

Client and Server Processes

A network application consists of pairs of processes that send messages to each other over a network. For example, in the Web application a client browser process exchanges messages with a Web server process. In a P2P file-sharing system, a file is transferred from a process in one peer to a process in another peer. For each pair of communicating processes, we typically label one of the two processes as the client and the other process as the server. With the Web, a browser is a client process and a Web server is a server process. With P2P file sharing, the peer that is downloading the file is labeled as the client, and the peer that is uploading the file is labeled as the server. You may have observed that in some applications, such as in P2P file sharing, a process can be both a client and a server. Indeed, a process in a P2P file-sharing system can both upload and download files. Nevertheless, in the context of any given communication session between a pair of processes, we can still label one process as the client and the other process as the server. We define the client and server processes as follows:

In the context of a communication session between a pair of processes, the process that initiates the communication (that is, initially contacts the other process at the beginning of the session) is labeled as the client. The process that waits to be contacted to begin the session is the server.

In the Web, a browser process initiates contact with a Web server process; hence the browser process is the client and the Web server process is the server. In P2P file sharing, when Peer A asks Peer B to send a specific file, Peer A is the client and Peer B is the server in the context of this specific communication session.
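To make these role definitions concrete, here is a bare-bones sketch in Python, whose socket API we'll study in detail in Section 2.7. The port number 12000 and the message contents are illustrative choices of ours, not part of any standard.

```python
from socket import socket, AF_INET, SOCK_STREAM

def run_server():
    # The server process waits to be contacted.
    server_socket = socket(AF_INET, SOCK_STREAM)
    server_socket.bind(('', 12000))               # illustrative port number
    server_socket.listen(1)
    connection, address = server_socket.accept()  # blocks until a client initiates a session
    message = connection.recv(1024)
    connection.send(message.upper())              # respond to the client
    connection.close()

def run_client():
    # The client process initiates the communication session.
    client_socket = socket(AF_INET, SOCK_STREAM)
    client_socket.connect(('localhost', 12000))   # initiating contact: the client role
    client_socket.send(b'hello')
    print(client_socket.recv(1024).decode())      # prints HELLO
    client_socket.close()
```

Run in two terminals (server first), the process that calls connect() plays the client role for this session, and the process waiting in accept() plays the server role.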
When there's no confusion, we'll sometimes also use the terminology "client side and server side of an application." At the end of this chapter, we'll step through simple code for both the client and server sides of network applications.

The Interface Between the Process and the Computer Network

As noted above, most applications consist of pairs of communicating processes, with the two processes in each pair sending messages to each other. Any message sent from one process to another must go through the underlying network. A process sends messages into, and receives messages from, the network through a software interface called a socket. Let's consider an analogy to help us understand processes and sockets. A process is analogous to a house and its socket is analogous to its door. When a process wants to send a message to another process on another host, it shoves the message out its door (socket). This sending process assumes that there is a transportation infrastructure on the other side of its door that will transport the message to the door of the destination process. Once the message arrives at the destination host, the message passes through the receiving process's door (socket), and the receiving process then acts on the message. Figure 2.3 illustrates socket communication between two processes that communicate over the Internet. (Figure 2.3 assumes that the underlying transport protocol used by the processes is the Internet's TCP protocol.) As shown in this figure, a socket is the interface between the application layer and the transport layer within a host. It is also referred to as the Application Programming Interface (API) between the application and the network, since the socket is the programming interface with which network applications are built. The application developer has control of everything on the application-layer side of the socket but has little control of the transport-layer side of the socket. The only control that the application developer has on the transport-layer side is (1) the choice of transport protocol and (2) perhaps the ability to fix a few transport-layer parameters such as maximum buffer and maximum segment sizes (to be covered in Chapter 3). Once the application developer chooses a transport protocol (if a choice is available), the application is built using the transport-layer services provided by that protocol. We'll explore sockets in some detail in Section 2.7.

Figure 2.3 Application processes, sockets, and underlying transport protocol

Addressing Processes

In order to send postal mail to a particular destination, the destination needs to have an address. Similarly, in order for a process running on one host to send packets to a process running on another host, the receiving process needs to have an address. To identify the receiving process, two pieces of information need to be specified: (1) the address of the host and (2) an identifier that specifies the receiving process in the destination host. In the Internet, the host is identified by its IP address. We'll discuss IP addresses in great detail in Chapter 4. For now, all we need to know is that an IP address is a 32-bit quantity that we can think of as uniquely identifying the host. In addition to knowing the address of the host to which a message is destined, the sending process must also identify the receiving process (more specifically, the receiving socket) running in the host. This information is needed because in general a host could be running many network applications. A destination port number serves this purpose. Popular applications have been assigned specific port numbers. For example, a Web server is identified by port number 80. A mail server process (using the SMTP protocol) is identified by port number 25. A list of well-known port numbers for all Internet standard protocols can be found at www.iana.org. We'll examine port numbers in detail in Chapter 3.
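In code, these two pieces of information appear together as the (IP address, port number) pair that a socket connects to. Here is a minimal sketch in Python, reusing the book's illustrative hostname www.someSchool.edu (which won't actually resolve on the real Internet):

```python
from socket import socket, AF_INET, SOCK_STREAM, gethostbyname

# Translate the hostname to an IP address; the hostname is the book's
# illustrative example, so this lookup would fail on the real Internet.
server_ip = gethostbyname('www.someSchool.edu')

# The (IP address, port number) pair identifies the receiving process:
# the IP address identifies the host, and well-known port 80 identifies
# the Web server process running on that host.
sock = socket(AF_INET, SOCK_STREAM)
sock.connect((server_ip, 80))
```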
2.1.3 Transport Services Available to Applications

Recall that a socket is the interface between the application process and the transport-layer protocol. The application at the sending side pushes messages through the socket. At the other side of the socket, the transport-layer protocol has the responsibility of getting the messages to the socket of the receiving process. Many networks, including the Internet, provide more than one transport-layer protocol. When you develop an application, you must choose one of the available transport-layer protocols. How do you make this choice? Most likely, you would study the services provided by the available transport-layer protocols, and then pick the protocol with the services that best match your application's needs. The situation is similar to choosing either train or airplane transport for travel between two cities. You have to choose one or the other, and each transportation mode offers different services. (For example, the train offers downtown pickup and drop-off, whereas the plane offers shorter travel time.)

What are the services that a transport-layer protocol can offer to applications invoking it? We can broadly classify the possible services along four dimensions: reliable data transfer, throughput, timing, and security.

Reliable Data Transfer

As discussed in Chapter 1, packets can get lost within a computer network. For example, a packet can overflow a buffer in a router, or can be discarded by a host or router after having some of its bits corrupted. For many applications---such as electronic mail, file transfer, remote host access, Web document transfers, and financial applications---data loss can have devastating consequences (in the latter case, for either the bank or the customer!). Thus, to support these applications, something has to be done to guarantee that the data sent by one end of the application is delivered correctly and completely to the other end of the application. If a protocol provides such a guaranteed data delivery service, it is said to provide reliable data transfer. One important service that a transport-layer protocol can potentially provide to an application is process-to-process reliable data transfer. When a transport protocol provides this service, the sending process can just pass its data into the socket and know with complete confidence that the data will arrive without errors at the receiving process. When a transport-layer protocol doesn't provide reliable data transfer, some of the data sent by the sending process may never arrive at the receiving process. This may be acceptable for loss-tolerant applications, most notably multimedia applications such as conversational audio/video that can tolerate some amount of data loss. In these multimedia applications, lost data might result in a small glitch in the audio/video---not a crucial impairment.
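As we'll see in Section 2.1.4, an Internet application opts for reliable data transfer (or forgoes it) simply through the transport protocol it selects when creating its socket; a two-line sketch in Python:

```python
from socket import socket, AF_INET, SOCK_STREAM, SOCK_DGRAM

tcp_socket = socket(AF_INET, SOCK_STREAM)  # TCP: reliable data transfer
udp_socket = socket(AF_INET, SOCK_DGRAM)   # UDP: no delivery guarantee
```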
Throughput

In Chapter 1 we introduced the concept of available throughput, which, in the context of a communication session between two processes along a network path, is the rate at which the sending process can deliver bits to the receiving process. Because other sessions will be sharing the bandwidth along the network path, and because these other sessions will be coming and going, the available throughput can fluctuate with time. These observations lead to another natural service that a transport-layer protocol could provide, namely, guaranteed available throughput at some specified rate. With such a service, the application could request a guaranteed throughput of r bits/sec, and the transport protocol would then ensure that the available throughput is always at least r bits/sec. Such a guaranteed throughput service would appeal to many applications. For example, if an Internet telephony application encodes voice at 32 kbps, it needs to send data into the network and have data delivered to the receiving application at this rate. If the transport protocol cannot provide this throughput, the application would need to encode at a lower rate (and receive enough throughput to sustain this lower coding rate) or may have to give up, since receiving, say, half of the needed throughput is of little or no use to this Internet telephony application. Applications that have throughput requirements are said to be bandwidth-sensitive applications. Many current multimedia applications are bandwidth sensitive, although some multimedia applications may use adaptive coding techniques to encode digitized voice or video at a rate that matches the currently available throughput.

While bandwidth-sensitive applications have specific throughput requirements, elastic applications can make use of as much, or as little, throughput as happens to be available. Electronic mail, file transfer, and Web transfers are all elastic applications. Of course, the more throughput, the better. There's an adage that says that one cannot be too rich, too thin, or have too much throughput!

Timing

A transport-layer protocol can also provide timing guarantees. As with throughput guarantees, timing guarantees can come in many shapes and forms. An example guarantee might be that every bit that the sender pumps into the socket arrives at the receiver's socket no more than 100 msec later. Such a service would be appealing to interactive real-time applications, such as Internet telephony, virtual environments, teleconferencing, and multiplayer games, all of which require tight timing constraints on data delivery in order to be effective. (See Chapter 9, \[Gauthier 1999; Ramjee 1994\].) Long delays in Internet telephony, for example, tend to result in unnatural pauses in the conversation; in a multiplayer game or virtual interactive environment, a long delay between taking an action and seeing the response from the environment (for example, from another player at the end of an end-to-end connection) makes the application feel less realistic. For non-real-time applications, lower delay is always preferable to higher delay, but no tight constraint is placed on the end-to-end delays.

Security

Finally, a transport protocol can provide an application with one or more security services.
For example, in the sending host, a transport protocol can encrypt all data transmitted by the sending process, and in the receiving host, the transport-layer protocol can decrypt the data before delivering the data to the receiving process. Such a service would provide confidentiality between the two processes, even if the data is somehow observed between sending and receiving processes. A transport protocol can also provide other security services in addition to confidentiality, including data integrity and end-point authentication, topics that we'll cover in detail in Chapter 8.

2.1.4 Transport Services Provided by the Internet

Up until this point, we have been considering transport services that a computer network could provide in general. Let's now get more specific and examine the type of transport services provided by the Internet. The Internet (and, more generally, TCP/IP networks) makes two transport protocols available to applications, UDP and TCP. When you (as an application developer) create a new network application for the Internet, one of the first decisions you have to make is whether to use UDP or TCP. Each of these protocols offers a different set of services to the invoking applications. Figure 2.4 shows the service requirements for some selected applications.

Figure 2.4 Requirements of selected network applications

TCP Services

The TCP service model includes a connection-oriented service and a reliable data transfer service. When an application invokes TCP as its transport protocol, the application receives both of these services from TCP.

Connection-oriented service. TCP has the client and server exchange transport-layer control information with each other before the application-level messages begin to flow. This so-called handshaking procedure alerts the client and server, allowing them to prepare for an onslaught of packets. After the handshaking phase, a TCP connection is said to exist between the sockets of the two processes. The connection is a full-duplex connection in that the two processes can send messages to each other over the connection at the same time. When the application finishes sending messages, it must tear down the connection. In Chapter 3 we'll discuss connection-oriented service in detail and examine how it is implemented.

Reliable data transfer service. The communicating processes can rely on TCP to deliver all data sent without error and in the proper order. When one side of the application passes a stream of bytes into a socket, it can count on TCP to deliver the same stream of bytes to the receiving socket, with no missing or duplicate bytes.

TCP also includes a congestion-control mechanism, a service for the general welfare of the Internet rather than for the direct benefit of the communicating processes. The TCP congestion-control mechanism throttles a sending process (client or server) when the network is congested between sender and receiver.

FOCUS ON SECURITY

SECURING TCP

Neither TCP nor UDP provides any encryption---the data that the sending process passes into its socket is the same data that travels over the network to the destination process. So, for example, if the sending process sends a password in cleartext (i.e., unencrypted) into its socket, the cleartext password will travel over all the links between sender and receiver, potentially getting sniffed and discovered at any of the intervening links.
Because privacy and other security issues have become critical for many applications, the Internet community has developed an enhancement for TCP, called Secure Sockets Layer (SSL). TCP-enhanced-with-SSL not only does everything that traditional TCP does but also provides critical process-to-process security services, including encryption, data integrity, and end-point authentication. We emphasize that SSL is not a third Internet transport protocol, on the same level as TCP and UDP, but instead is an enhancement of TCP, with the enhancements being implemented in the application layer. In particular, if an application wants to use the services of SSL, it needs to include SSL code (existing, highly optimized libraries and classes) in both the client and server sides of the application. SSL has its own socket API that is similar to the traditional TCP socket API. When an application uses SSL, the sending process passes cleartext data to the SSL socket; SSL in the sending host then encrypts the data and passes the encrypted data to the TCP socket. The encrypted data travels over the Internet to the TCP socket in the receiving process. The receiving socket passes the encrypted data to SSL, which decrypts the data. Finally, SSL passes the cleartext data through its SSL socket to the receiving process. We'll cover SSL in some detail in Chapter 8.

As we will see in Chapter 3, TCP congestion control also attempts to limit each TCP connection to its fair share of network bandwidth.

UDP Services

UDP is a no-frills, lightweight transport protocol, providing minimal services. UDP is connectionless, so there is no handshaking before the two processes start to communicate. UDP provides an unreliable data transfer service---that is, when a process sends a message into a UDP socket, UDP provides no guarantee that the message will ever reach the receiving process. Furthermore, messages that do arrive at the receiving process may arrive out of order. UDP does not include a congestion-control mechanism, so the sending side of UDP can pump data into the layer below (the network layer) at any rate it pleases. (Note, however, that the actual end-to-end throughput may be less than this rate due to the limited transmission capacity of intervening links or due to congestion.)

Services Not Provided by Internet Transport Protocols

We have organized transport protocol services along four dimensions: reliable data transfer, throughput, timing, and security. Which of these services are provided by TCP and UDP? We have already noted that TCP provides reliable end-to-end data transfer. And we also know that TCP can be easily enhanced at the application layer with SSL to provide security services. But in our brief description of TCP and UDP, conspicuously missing was any mention of throughput or timing guarantees---services not provided by today's Internet transport protocols. Does this mean that time-sensitive applications such as Internet telephony cannot run in today's Internet? The answer is clearly no---the Internet has been hosting time-sensitive applications for many years. These applications often work fairly well because they have been designed to cope, to the greatest extent possible, with this lack of guarantee. We'll investigate several of these design tricks in Chapter 9. Nevertheless, clever design has its limitations when delay is excessive, or the end-to-end throughput is limited.
In summary, today's Internet can often provide satisfactory service to time-sensitive applications, but it cannot provide any timing or throughput guarantees.

Figure 2.5 indicates the transport protocols used by some popular Internet applications. We see that e-mail, remote terminal access, the Web, and file transfer all use TCP. These applications have chosen TCP primarily because TCP provides reliable data transfer, guaranteeing that all data will eventually get to its destination. Because Internet telephony applications (such as Skype) can often tolerate some loss but require a minimal rate to be effective, developers of Internet telephony applications usually prefer to run their applications over UDP, thereby circumventing TCP's congestion control mechanism and packet overheads. But because many firewalls are configured to block (most types of) UDP traffic, Internet telephony applications often are designed to use TCP as a backup if UDP communication fails.

Figure 2.5 Popular Internet applications, their application-layer protocols, and their underlying transport protocols

2.1.5 Application-Layer Protocols

We have just learned that network processes communicate with each other by sending messages into sockets. But how are these messages structured? What are the meanings of the various fields in the messages? When do the processes send the messages? These questions bring us into the realm of application-layer protocols. An application-layer protocol defines how an application's processes, running on different end systems, pass messages to each other. In particular, an application-layer protocol defines:

- The types of messages exchanged, for example, request messages and response messages
- The syntax of the various message types, such as the fields in the message and how the fields are delineated
- The semantics of the fields, that is, the meaning of the information in the fields
- Rules for determining when and how a process sends messages and responds to messages

Some application-layer protocols are specified in RFCs and are therefore in the public domain. For example, the Web's application-layer protocol, HTTP (the HyperText Transfer Protocol \[RFC 2616\]), is available as an RFC. If a browser developer follows the rules of the HTTP RFC, the browser will be able to retrieve Web pages from any Web server that has also followed the rules of the HTTP RFC. Many other application-layer protocols are proprietary and intentionally not available in the public domain. For example, Skype uses proprietary application-layer protocols.

It is important to distinguish between network applications and application-layer protocols. An application-layer protocol is only one piece of a network application (albeit, a very important piece of the application from our point of view!). Let's look at a couple of examples. The Web is a client-server application that allows users to obtain documents from Web servers on demand. The Web application consists of many components, including a standard for document formats (that is, HTML), Web browsers (for example, Firefox and Microsoft Internet Explorer), Web servers (for example, Apache and Microsoft servers), and an application-layer protocol. The Web's application-layer protocol, HTTP, defines the format and sequence of messages exchanged between browser and Web server. Thus, HTTP is only one piece (albeit, an important piece) of the Web application.
As another example, an Internet e-mail application also has many components, including mail servers that house user mailboxes; mail clients (such as Microsoft Outlook) that allow users to read and create messages; a standard for defining the structure of an e-mail message; and application-layer protocols that define how messages are passed between servers, how messages are passed between servers and mail clients, and how the contents of message headers are to be interpreted. The principal application-layer protocol for electronic mail is SMTP (Simple Mail Transfer Protocol) \[RFC 5321\]. Thus, e-mail's principal application-layer protocol, SMTP, is only one piece (albeit an important piece) of the e-mail application.

2.1.6 Network Applications Covered in This Book

New public domain and proprietary Internet applications are being developed every day. Rather than covering a large number of Internet applications in an encyclopedic manner, we have chosen to focus on a small number of applications that are both pervasive and important. In this chapter we discuss five important applications: the Web, electronic mail, directory service, video streaming, and P2P applications. We first discuss the Web, not only because it is an enormously popular application, but also because its application-layer protocol, HTTP, is straightforward and easy to understand. We then discuss electronic mail, the Internet's first killer application. E-mail is more complex than the Web in the sense that it makes use of not one but several application-layer protocols. After e-mail, we cover DNS, which provides a directory service for the Internet. Most users do not interact with DNS directly; instead, users invoke DNS indirectly through other applications (including the Web, file transfer, and electronic mail). DNS illustrates nicely how a piece of core network functionality (network-name to network-address translation) can be implemented at the application layer in the Internet. We then discuss P2P file-sharing applications, and complete our application study by discussing video streaming on demand, including distributing stored video over content distribution networks. In Chapter 9, we'll cover multimedia applications in more depth, including voice over IP and video conferencing.

2.2 The Web and HTTP

Until the early 1990s the Internet was used primarily by researchers, academics, and university students to log in to remote hosts, to transfer files from local hosts to remote hosts and vice versa, to receive and send news, and to receive and send electronic mail. Although these applications were (and continue to be) extremely useful, the Internet was essentially unknown outside of the academic and research communities. Then, in the early 1990s, a major new application arrived on the scene---the World Wide Web \[Berners-Lee 1994\]. The Web was the first Internet application that caught the general public's eye. It dramatically changed, and continues to change, how people interact inside and outside their work environments. It elevated the Internet from just one of many data networks to essentially the one and only data network.

Perhaps what appeals the most to users is that the Web operates on demand. Users receive what they want, when they want it. This is unlike traditional broadcast radio and television, which force users to tune in when the content provider makes the content available.
In +addition to being available on demand, the Web has many other wonderful +features that people love and cherish. It is enormously easy for any +individual to make information available over the Web---everyone can +become a publisher at extremely low cost. Hyperlinks and search engines +help us navigate through an ocean of information. Photos and videos +stimulate our senses. Forms, JavaScript, Java applets, and many other +devices enable us to interact with pages and sites. And the Web and its +protocols serve as a platform for YouTube, Web-based e-mail (such as +Gmail), and most mobile Internet applications, including Instagram and +Google Maps. + +2.2.1 Overview of HTTP The HyperText Transfer Protocol (HTTP), the Web's +application-layer protocol, is at the heart of the Web. It is defined in +\[RFC 1945\] and \[RFC 2616\]. HTTP is implemented in two programs: a +client program and a server program. The client program and server +program, executing on different end systems, talk to each other by +exchanging HTTP messages. HTTP defines the structure of these messages +and how the client and server exchange the messages. Before explaining +HTTP in detail, we should review some Web terminology. A Web page (also +called a document) consists of objects. An object is simply a +file---such as an HTML file, a JPEG image, a Java applet, or a video +clip---that is addressable by a single URL. Most Web pages consist of a +base HTML file and several referenced objects. For example, if a Web +page + +contains HTML text and five JPEG images, then the Web page has six +objects: the base HTML file plus the five images. The base HTML file +references the other objects in the page with the objects' URLs. Each +URL has two components: the hostname of the server that houses the +object and the object's path name. For example, the URL + +http://www.someSchool.edu/someDepartment/picture.gif + +has www.someSchool.edu for a hostname and /someDepartment/picture.gif +for a path name. Because Web browsers (such as Internet Explorer and +Firefox) implement the client side of HTTP, in the context of the Web, +we will use the words browser and client interchangeably. Web servers, +which implement the server side of HTTP, house Web objects, each +addressable by a URL. Popular Web servers include Apache and Microsoft +Internet Information Server. HTTP defines how Web clients request Web +pages from Web servers and how servers transfer Web pages to clients. We +discuss the interaction between client and server in detail later, but +the general idea is illustrated in Figure 2.6. When a user requests a +Web page (for example, clicks on a hyperlink), the browser sends HTTP +request messages for the objects in the page to the server. The server +receives the requests and responds with HTTP response messages that +contain the objects. HTTP uses TCP as its underlying transport protocol +(rather than running on top of UDP). The HTTP client first initiates a +TCP connection with the server. Once the connection is established, the +browser and the server processes access TCP through their socket +interfaces. As described in Section 2.1, on the client side the socket +interface is the door between the client process and the TCP connection; +on the server side it is the door between the server process and the TCP +connection. The client sends HTTP request messages into its socket +interface and receives HTTP response messages from its socket interface. 
Similarly, the HTTP server receives request messages from its socket interface and sends response messages into its socket interface.

Figure 2.6 HTTP request-response behavior

Once the client sends a message into its socket interface, the message is out of the client's hands and is "in the hands" of TCP. Recall from Section 2.1 that TCP provides a reliable data transfer service to HTTP. This implies that each HTTP request message sent by a client process eventually arrives intact at the server; similarly, each HTTP response message sent by the server process eventually arrives intact at the client. Here we see one of the great advantages of a layered architecture---HTTP need not worry about lost data or the details of how TCP recovers from loss or reordering of data within the network. That is the job of TCP and the protocols in the lower layers of the protocol stack.

It is important to note that the server sends requested files to clients without storing any state information about the client. If a particular client asks for the same object twice in a period of a few seconds, the server does not respond by saying that it just served the object to the client; instead, the server resends the object, as it has completely forgotten what it did earlier. Because an HTTP server maintains no information about the clients, HTTP is said to be a stateless protocol. We also remark that the Web uses the client-server application architecture, as described in Section 2.1. A Web server is always on, with a fixed IP address, and it services requests from potentially millions of different browsers.

2.2.2 Non-Persistent and Persistent Connections

In many Internet applications, the client and server communicate for an extended period of time, with the client making a series of requests and the server responding to each of the requests. Depending on the application and on how the application is being used, the series of requests may be made back-to-back, periodically at regular intervals, or intermittently. When this client-server interaction is taking place over TCP, the application developer needs to make an important decision---should each request/response pair be sent over a separate TCP connection, or should all of the requests and their corresponding responses be sent over the same TCP connection? In the former approach, the application is said to use non-persistent connections; and in the latter approach, persistent connections. To gain a deep understanding of this design issue, let's examine the advantages and disadvantages of persistent connections in the context of a specific application, namely, HTTP, which can use both non-persistent connections and persistent connections. Although HTTP uses persistent connections in its default mode, HTTP clients and servers can be configured to use non-persistent connections instead.

HTTP with Non-Persistent Connections

Let's walk through the steps of transferring a Web page from server to client for the case of non-persistent connections. Let's suppose the page consists of a base HTML file and 10 JPEG images, and that all 11 of these objects reside on the same server. Further suppose the URL for the base HTML file is

http://www.someSchool.edu/someDepartment/home.index

Here is what happens:

1. The HTTP client process initiates a TCP connection to the server www.someSchool.edu on port number 80, which is the default port number for HTTP.
Associated with the TCP connection, there will be a socket at the client and a socket at the server.

2. The HTTP client sends an HTTP request message to the server via its socket. The request message includes the path name /someDepartment/home.index. (We will discuss HTTP messages in some detail below.)

3. The HTTP server process receives the request message via its socket, retrieves the object /someDepartment/home.index from its storage (RAM or disk), encapsulates the object in an HTTP response message, and sends the response message to the client via its socket.

4. The HTTP server process tells TCP to close the TCP connection. (But TCP doesn't actually terminate the connection until it knows for sure that the client has received the response message intact.)

5. The HTTP client receives the response message. The TCP connection terminates. The message indicates that the encapsulated object is an HTML file. The client extracts the file from the response message, examines the HTML file, and finds references to the 10 JPEG objects.

6. The first four steps are then repeated for each of the referenced JPEG objects.

As the browser receives the Web page, it displays the page to the user. Two different browsers may interpret (that is, display to the user) a Web page in somewhat different ways. HTTP has nothing to do with how a Web page is interpreted by a client. The HTTP specifications (\[RFC 1945\] and \[RFC 2616\]) define only the communication protocol between the client HTTP program and the server HTTP program.

The steps above illustrate the use of non-persistent connections, where each TCP connection is closed after the server sends the object---the connection does not persist for other objects. Note that each TCP connection transports exactly one request message and one response message. Thus, in this example, when a user requests the Web page, 11 TCP connections are generated.

In the steps described above, we were intentionally vague about whether the client obtains the 10 JPEGs over 10 serial TCP connections, or whether some of the JPEGs are obtained over parallel TCP connections. Indeed, users can configure modern browsers to control the degree of parallelism. In their default modes, most browsers open 5 to 10 parallel TCP connections, and each of these connections handles one request-response transaction. If the user prefers, the maximum number of parallel connections can be set to one, in which case the 10 connections are established serially. As we'll see in the next chapter, the use of parallel connections shortens the response time.
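The sketch below, our own illustration in Python rather than code from the book, mimics a single such request-response transaction over a non-persistent connection: open a TCP connection, send one request carrying the Connection: close header, and read the one response until the server closes the connection. The hostname is the book's illustrative example.

```python
from socket import socket, AF_INET, SOCK_STREAM

# One non-persistent HTTP transaction: one TCP connection,
# one request message, one response message.
sock = socket(AF_INET, SOCK_STREAM)
sock.connect(('www.someSchool.edu', 80))   # illustrative hostname

request = ('GET /someDepartment/home.index HTTP/1.1\r\n'
           'Host: www.someSchool.edu\r\n'
           'Connection: close\r\n'
           '\r\n')
sock.send(request.encode())

response = b''
while True:
    chunk = sock.recv(4096)   # the server closes the connection after the
    if not chunk:             # object is sent, which terminates this loop
        break
    response += chunk
sock.close()                  # fetching another object needs a new connection
```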
Before continuing, let's do a back-of-the-envelope calculation to estimate the amount of time that elapses from when a client requests the base HTML file until the entire file is received by the client. To this end, we define the round-trip time (RTT), which is the time it takes for a small packet to travel from client to server and then back to the client. The RTT includes packet-propagation delays, packet-queuing delays in intermediate routers and switches, and packet-processing delays. (These delays were discussed in Section 1.4.) Now consider what happens when a user clicks on a hyperlink. As shown in Figure 2.7, this causes the browser to initiate a TCP connection between the browser and the Web server; this involves a "three-way handshake"---the client sends a small TCP segment to the server, the server acknowledges and responds with a small TCP segment, and, finally, the client acknowledges back to the server. The first two parts of the three-way handshake take one RTT. After completing the first two parts of the handshake, the client sends the HTTP request message combined with the third part of the three-way handshake (the acknowledgment) into the TCP connection. Once the request message arrives at the server, the server sends the HTML file into the TCP connection. This HTTP request/response eats up another RTT. Thus, roughly, the total response time is two RTTs plus the transmission time at the server of the HTML file.

Figure 2.7 Back-of-the-envelope calculation for the time needed to request and receive an HTML file

HTTP with Persistent Connections

Non-persistent connections have some shortcomings. First, a brand-new connection must be established and maintained for each requested object. For each of these connections, TCP buffers must be allocated and TCP variables must be kept in both the client and server. This can place a significant burden on the Web server, which may be serving requests from hundreds of different clients simultaneously. Second, as we just described, each object suffers a delivery delay of two RTTs---one RTT to establish the TCP connection and one RTT to request and receive an object.

With HTTP 1.1 persistent connections, the server leaves the TCP connection open after sending a response. Subsequent requests and responses between the same client and server can be sent over the same connection. In particular, an entire Web page (in the example above, the base HTML file and the 10 images) can be sent over a single persistent TCP connection. Moreover, multiple Web pages residing on the same server can be sent from the server to the same client over a single persistent TCP connection. These requests for objects can be made back-to-back, without waiting for replies to pending requests (pipelining). Typically, the HTTP server closes a connection when it isn't used for a certain time (a configurable timeout interval). When the server receives the back-to-back requests, it sends the objects back-to-back. The default mode of HTTP uses persistent connections with pipelining. Most recently, HTTP/2 \[RFC 7540\] builds on HTTP 1.1 by allowing multiple requests and replies to be interleaved in the same connection, and by adding a mechanism for prioritizing HTTP message requests and replies within this connection. We'll quantitatively compare the performance of non-persistent and persistent connections in the homework problems of Chapters 2 and 3. You are also encouraged to see \[Heidemann 1997; Nielsen 1997; RFC 7540\].

2.2.3 HTTP Message Format

The HTTP specifications \[RFC 1945; RFC 2616; RFC 7540\] include the definitions of the HTTP message formats. There are two types of HTTP messages, request messages and response messages, both of which are discussed below.

HTTP Request Message

Below we provide a typical HTTP request message:

```
GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: fr
```

We can learn a lot by taking a close look at this simple request message.
2.2.3 HTTP Message Format

The HTTP specifications \[RFC 1945; RFC 2616; RFC 7540\] include the definitions of the HTTP message formats. There are two types of HTTP messages, request messages and response messages, both of which are discussed below.

HTTP Request Message

Below we provide a typical HTTP request message:

GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: fr

We can learn a lot by taking a close look at this simple request message. First of all, we see that the message is written in ordinary ASCII text, so that your ordinary computer-literate human being can read it. Second, we see that the message consists of five lines, each followed by a carriage return and a line feed. The last line is followed by an additional carriage return and line feed. Although this particular request message has five lines, a request message can have many more lines or as few as one line. The first line of an HTTP request message is called the request line; the subsequent lines are called the header lines. The request line has three fields: the method field, the URL field, and the HTTP version field. The method field can take on several different values, including GET, POST, HEAD, PUT, and DELETE. The great majority of HTTP request messages use the GET method. The GET method is used when the browser requests an object, with the requested object identified in the URL field. In this example, the browser is requesting the object /somedir/page.html. The version is self-explanatory; in this example, the browser implements version HTTP/1.1.

Now let's look at the header lines in the example. The header line Host: www.someschool.edu specifies the host on which the object resides. You might think that this header line is unnecessary, as there is already a TCP connection in place to the host. But, as we'll see in Section 2.2.5, the information provided by the host header line is required by Web proxy caches. By including the Connection: close header line, the browser is telling the server that it doesn't want to bother with persistent connections; it wants the server to close the connection after sending the requested object. The User-agent: header line specifies the user agent, that is, the browser type that is making the request to the server. Here the user agent is Mozilla/5.0, a Firefox browser. This header line is useful because the server can actually send different versions of the same object to different types of user agents. (Each of the versions is addressed by the same URL.) Finally, the Accept-language: header indicates that the user prefers to receive a French version of the object, if such an object exists on the server; otherwise, the server should send its default version. The Accept-language: header is just one of many content negotiation headers available in HTTP.

Having looked at an example, let's now look at the general format of a request message, as shown in Figure 2.8. We see that the general format closely follows our earlier example. You may have noticed, however, that after the header lines (and the additional carriage return and line feed) there is an "entity body." The entity body is empty with the GET method, but is used with the POST method. An HTTP client often uses the POST method when the user fills out a form---for example, when a user provides search words to a search engine. With a POST message, the user is still requesting a Web page from the server, but the specific contents of the Web page depend on what the user entered into the form fields. If the value of the method field is POST, then the entity body contains what the user entered into the form fields.

Figure 2.8 General format of an HTTP request message
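The request format above is easy to generate directly. The following Python sketch builds the example request character for character, including the carriage return and line feed that end each line and the extra blank line that ends the header section, and sends it into a TCP socket. The hostname is the fictitious server used in this example, so substitute a real Web server to actually run it:

    import socket

    # Each header line ends with carriage return + line feed; a final
    # blank line marks the end of the header section.
    request = (
        "GET /somedir/page.html HTTP/1.1\r\n"
        "Host: www.someschool.edu\r\n"
        "Connection: close\r\n"
        "User-agent: Mozilla/5.0\r\n"
        "Accept-language: fr\r\n"
        "\r\n"
    )

    sock = socket.create_connection(("www.someschool.edu", 80))
    sock.sendall(request.encode("ascii"))

    # Because of Connection: close, read until the server closes.
    response = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        response += chunk
    sock.close()
    print(response[:200])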
We would be remiss if we didn't mention that a request generated with a form does not necessarily use the POST method. Instead, HTML forms often use the GET method and include the inputted data (in the form fields) in the requested URL. For example, if a form uses the GET method, has two fields, and the inputs to the two fields are monkeys and bananas, then the URL will have the structure www.somesite.com/animalsearch?monkeys&bananas. In your day-to-day Web surfing, you have probably noticed extended URLs of this sort.

The HEAD method is similar to the GET method. When a server receives a request with the HEAD method, it responds with an HTTP message but it leaves out the requested object. Application developers often use the HEAD method for debugging. The PUT method is often used in conjunction with Web publishing tools. It allows a user to upload an object to a specific path (directory) on a specific Web server. The PUT method is also used by applications that need to upload objects to Web servers. The DELETE method allows a user, or an application, to delete an object on a Web server.

HTTP Response Message

Below we provide a typical HTTP response message. This response message could be the response to the example request message just discussed.

HTTP/1.1 200 OK
Connection: close
Date: Tue, 18 Aug 2015 15:44:04 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Tue, 18 Aug 2015 15:11:03 GMT
Content-Length: 6821
Content-Type: text/html

(data data data data data ...)

Let's take a careful look at this response message. It has three sections: an initial status line, six header lines, and then the entity body. The entity body is the meat of the message---it contains the requested object itself (represented by data data data data data ...). The status line has three fields: the protocol version field, a status code, and a corresponding status message. In this example, the status line indicates that the server is using HTTP/1.1 and that everything is OK (that is, the server has found, and is sending, the requested object).

Now let's look at the header lines. The server uses the Connection: close header line to tell the client that it is going to close the TCP connection after sending the message. The Date: header line indicates the time and date when the HTTP response was created and sent by the server. Note that this is not the time when the object was created or last modified; it is the time when the server retrieves the object from its file system, inserts the object into the response message, and sends the response message. The Server: header line indicates that the message was generated by an Apache Web server; it is analogous to the User-agent: header line in the HTTP request message. The Last-Modified: header line indicates the time and date when the object was created or last modified. The Last-Modified: header, which we will soon cover in more detail, is critical for object caching, both in the local client and in network cache servers (also known as proxy servers). The Content-Length: header line indicates the number of bytes in the object being sent. The Content-Type: header line indicates that the object in the entity body is HTML text. (The object type is officially indicated by the Content-Type: header and not by the file extension.)

Having looked at an example, let's now examine the general format of a response message, which is shown in Figure 2.9. This general format of the response message matches the previous example of a response message.

Figure 2.9 General format of an HTTP response message
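For completeness, here is a sketch of the parsing direction: splitting a raw response into the status line, the header lines, and the entity body. This is a minimal illustration under the assumption that the entire response is already in memory, not a robust HTTP parser:

    def parse_response(raw: bytes):
        """Split an HTTP response into status line, headers, and body."""
        head, _, body = raw.partition(b"\r\n\r\n")
        lines = head.decode("iso-8859-1").split("\r\n")
        version, code, phrase = lines[0].split(" ", 2)  # e.g., HTTP/1.1 200 OK
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        return version, int(code), phrase, headers, body

    raw = b"HTTP/1.1 200 OK\r\nContent-Length: 4\r\n\r\ndata"
    version, code, phrase, headers, body = parse_response(raw)
    print(code, phrase, headers["content-length"], body)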
Let's say a few additional words about status codes and their phrases. The status code and associated phrase indicate the result of the request. Some common status codes and associated phrases include:

200 OK: Request succeeded and the information is returned in the response.

301 Moved Permanently: Requested object has been permanently moved; the new URL is specified in the Location: header of the response message. The client software will automatically retrieve the new URL.

400 Bad Request: This is a generic error code indicating that the request could not be understood by the server.

404 Not Found: The requested document does not exist on this server.

505 HTTP Version Not Supported: The requested HTTP protocol version is not supported by the server.

How would you like to see a real HTTP response message? This is highly recommended and very easy to do! First, Telnet into your favorite Web server. Then type in a one-line request message for some object that is housed on the server. For example, if you have access to a command prompt, type:

telnet gaia.cs.umass.edu 80
GET /kurose_ross/interactive/index.php HTTP/1.1
Host: gaia.cs.umass.edu

(Press the carriage return twice after typing the last line.) This opens a TCP connection to port 80 of the host gaia.cs.umass.edu and then sends the HTTP request message. You should see a response message that includes the base HTML file for the interactive homework problems for this textbook. If you'd rather just see the HTTP message lines and not receive the object itself, replace GET with HEAD.

In this section we discussed a number of header lines that can be used within HTTP request and response messages. The HTTP specification defines many, many more header lines that can be inserted by browsers, Web servers, and network cache servers. We have covered only a small number of the totality of header lines. We'll cover a few more below and another small number when we discuss network Web caching in Section 2.2.5. A highly readable and comprehensive discussion of the HTTP protocol, including its headers and status codes, is given in \[Krishnamurthy 2001\].

How does a browser decide which header lines to include in a request message? How does a Web server decide which header lines to include in a response message? A browser will generate header lines as a function of the browser type and version (for example, an HTTP/1.0 browser will not generate any 1.1 header lines), the user configuration of the browser (for example, preferred language), and whether the browser currently has a cached, but possibly out-of-date, version of the object. Web servers behave similarly: There are different products, versions, and configurations, all of which influence which header lines are included in response messages.

2.2.4 User-Server Interaction: Cookies

We mentioned above that an HTTP server is stateless. This simplifies server design and has permitted engineers to develop high-performance Web servers that can handle thousands of simultaneous TCP connections. However, it is often desirable for a Web site to identify users, either because the server wishes to restrict user access or because it wants to serve content as a function of the user identity. For these purposes, HTTP uses cookies. Cookies, defined in \[RFC 6265\], allow sites to keep track of users. Most major commercial Web sites use cookies today.
As shown in Figure 2.10, cookie technology has four components: (1) a cookie header line in the HTTP response message; (2) a cookie header line in the HTTP request message; (3) a cookie file kept on the user's end system and managed by the user's browser; and (4) a back-end database at the Web site. Using Figure 2.10, let's walk through an example of how cookies work. Suppose Susan, who always accesses the Web using Internet Explorer from her home PC, contacts Amazon.com for the first time. Let us suppose that in the past she has already visited the eBay site. When the request comes into the Amazon Web server, the server creates a unique identification number and creates an entry in its back-end database that is indexed by the identification number. The Amazon Web server then responds to Susan's browser, including in the HTTP response a Set-cookie: header, which contains the identification number. For example, the header line might be:

Set-cookie: 1678

When Susan's browser receives the HTTP response message, it sees the Set-cookie: header. The browser then appends a line to the special cookie file that it manages. This line includes the hostname of the server and the identification number in the Set-cookie: header. Note that the cookie file already has an entry for eBay, since Susan has visited that site in the past. As Susan continues to browse the Amazon site, each time she requests a Web page, her browser consults her cookie file, extracts her identification number for this site, and puts a cookie header line that includes the identification number in the HTTP request. Specifically, each of her HTTP requests to the Amazon server includes the header line:

Cookie: 1678

Figure 2.10 Keeping user state with cookies

In this manner, the Amazon server is able to track Susan's activity at the Amazon site. Although the Amazon Web site does not necessarily know Susan's name, it knows exactly which pages user 1678 visited, in which order, and at what times! Amazon uses cookies to provide its shopping cart service---Amazon can maintain a list of all of Susan's intended purchases, so that she can pay for them collectively at the end of the session. If Susan returns to Amazon's site, say, one week later, her browser will continue to put the header line Cookie: 1678 in the request messages. Amazon also recommends products to Susan based on Web pages she has visited at Amazon in the past. If Susan also registers herself with Amazon---providing full name, e-mail address, postal address, and credit card information---Amazon can then include this information in its database, thereby associating Susan's name with her identification number (and all of the pages she has visited at the site in the past!). This is how Amazon and other e-commerce sites provide "one-click shopping"---when Susan chooses to purchase an item during a subsequent visit, she doesn't need to re-enter her name, credit card number, or address.

From this discussion we see that cookies can be used to identify a user. The first time a user visits a site, the user can provide a user identification (possibly his or her name). During the subsequent sessions, the browser passes a cookie header to the server, thereby identifying the user to the server. Cookies can thus be used to create a user session layer on top of stateless HTTP. For example, when a user logs in to a Web-based e-mail application (such as Hotmail), the browser sends cookie information to the server, permitting the server to identify the user throughout the user's session with the application.
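A toy sketch of the browser's side of this machinery, with a Python dictionary standing in for the cookie file; the hostname and identification number are taken from the example above:

    cookie_jar = {}  # hostname -> identification number (the "cookie file")

    def build_request(host, path):
        """Attach a Cookie: header when we already hold one for this host."""
        lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
        if host in cookie_jar:
            lines.append(f"Cookie: {cookie_jar[host]}")
        return "\r\n".join(lines) + "\r\n\r\n"

    def note_set_cookie(host, response_headers):
        """Record a Set-cookie: value, as a browser appends to its cookie file."""
        if "set-cookie" in response_headers:
            cookie_jar[host] = response_headers["set-cookie"]

    # First visit: no cookie is sent; the response carries Set-cookie: 1678.
    note_set_cookie("www.amazon.com", {"set-cookie": "1678"})
    # Every later request to this host now carries Cookie: 1678.
    print(build_request("www.amazon.com", "/"))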
Although cookies often simplify the Internet shopping experience for the user, they are controversial because they can also be considered as an invasion of privacy. As we just saw, using a combination of cookies and user-supplied account information, a Web site can learn a lot about a user and potentially sell this information to a third party. Cookie Central \[Cookie Central 2016\] includes extensive information on the cookie controversy.

2.2.5 Web Caching

A Web cache---also called a proxy server---is a network entity that satisfies HTTP requests on the behalf of an origin Web server. The Web cache has its own disk storage and keeps copies of recently requested objects in this storage. As shown in Figure 2.11, a user's browser can be configured so that all of the user's HTTP requests are first directed to the Web cache. Once a browser is configured, each browser request for an object is first directed to the Web cache. As an example, suppose a browser is requesting the object http://www.someschool.edu/campus.gif. Here is what happens:

1. The browser establishes a TCP connection to the Web cache and sends an HTTP request for the object to the Web cache.

2. The Web cache checks to see if it has a copy of the object stored locally. If it does, the Web cache returns the object within an HTTP response message to the client browser.

3. If the Web cache does not have the object, the Web cache opens a TCP connection to the origin server, that is, to www.someschool.edu. The Web cache then sends an HTTP request for the object into the cache-to-server TCP connection. After receiving this request, the origin server sends the object within an HTTP response to the Web cache.

4. When the Web cache receives the object, it stores a copy in its local storage and sends a copy, within an HTTP response message, to the client browser (over the existing TCP connection between the client browser and the Web cache).

Figure 2.11 Clients requesting objects through a Web cache

Note that a cache is both a server and a client at the same time. When it receives requests from and sends responses to a browser, it is a server. When it sends requests to and receives responses from an origin server, it is a client. Typically a Web cache is purchased and installed by an ISP. For example, a university might install a cache on its campus network and configure all of the campus browsers to point to the cache. Or a major residential ISP (such as Comcast) might install one or more caches in its network and preconfigure its shipped browsers to point to the installed caches.

Web caching has seen deployment in the Internet for two reasons. First, a Web cache can substantially reduce the response time for a client request, particularly if the bottleneck bandwidth between the client and the origin server is much less than the bottleneck bandwidth between the client and the cache. If there is a high-speed connection between the client and the cache, as there often is, and if the cache has the requested object, then the cache will be able to deliver the object rapidly to the client.
Second, as we will soon illustrate with an example, Web caches can substantially reduce traffic on an institution's access link to the Internet. By reducing traffic, the institution (for example, a company or a university) does not have to upgrade bandwidth as quickly, thereby reducing costs. Furthermore, Web caches can substantially reduce Web traffic in the Internet as a whole, thereby improving performance for all applications.

To gain a deeper understanding of the benefits of caches, let's consider an example in the context of Figure 2.12. This figure shows two networks---the institutional network and the rest of the public Internet. The institutional network is a high-speed LAN. A router in the institutional network and a router in the Internet are connected by a 15 Mbps link. The origin servers are attached to the Internet but are located all over the globe. Suppose that the average object size is 1 Mbits and that the average request rate from the institution's browsers to the origin servers is 15 requests per second. Suppose that the HTTP request messages are negligibly small and thus create no traffic in the networks or in the access link (from institutional router to Internet router). Also suppose that the amount of time it takes from when the router on the Internet side of the access link in Figure 2.12 forwards an HTTP request (within an IP datagram) until it receives the response (typically within many IP datagrams) is two seconds on average. Informally, we refer to this last delay as the "Internet delay."

Figure 2.12 Bottleneck between an institutional network and the Internet

The total response time---that is, the time from the browser's request of an object until its receipt of the object---is the sum of the LAN delay, the access delay (that is, the delay between the two routers), and the Internet delay. Let's now do a very crude calculation to estimate this delay. The traffic intensity on the LAN (see Section 1.4.2) is

(15 requests/sec)⋅(1 Mbits/request)/(100 Mbps)=0.15

whereas the traffic intensity on the access link (from the Internet router to the institutional router) is

(15 requests/sec)⋅(1 Mbits/request)/(15 Mbps)=1

A traffic intensity of 0.15 on a LAN typically results in, at most, tens of milliseconds of delay; hence, we can neglect the LAN delay. However, as discussed in Section 1.4.2, as the traffic intensity approaches 1 (as is the case of the access link in Figure 2.12), the delay on a link becomes very large and grows without bound. Thus, the average response time to satisfy requests is going to be on the order of minutes, if not more, which is unacceptable for the institution's users. Clearly something must be done.

One possible solution is to increase the access rate from 15 Mbps to, say, 100 Mbps. This will lower the traffic intensity on the access link to 0.15, which translates to negligible delays between the two routers. In this case, the total response time will roughly be two seconds, that is, the Internet delay. But this solution also means that the institution must upgrade its access link from 15 Mbps to 100 Mbps, a costly proposition. Now consider the alternative solution of not upgrading the access link but instead installing a Web cache in the institutional network. This solution is illustrated in Figure 2.13. Hit rates---the fraction of requests that are satisfied by a cache---typically range from 0.2 to 0.7 in practice.
For illustrative purposes, let's suppose that the cache provides a hit rate of 0.4 for this institution. Because the clients and the cache are connected to the same high-speed LAN, 40 percent of the requests will be satisfied almost immediately, say, within 10 milliseconds, by the cache. Nevertheless, the remaining 60 percent of the requests still need to be satisfied by the origin servers. But with only 60 percent of the requested objects passing through the access link, the traffic intensity on the access link is reduced from 1.0 to 0.6. Typically, a traffic intensity less than 0.8 corresponds to a small delay, say, tens of milliseconds, on a 15 Mbps link. This delay is negligible compared with the two-second Internet delay. Given these considerations, the average delay therefore is

0.4⋅(0.01 seconds)+0.6⋅(2.01 seconds)

which is just slightly greater than 1.2 seconds. Thus, this second solution provides an even lower response time than the first solution, and it doesn't require the institution to upgrade its link to the Internet. The institution does, of course, have to purchase and install a Web cache. But this cost is low---many caches use public-domain software that runs on inexpensive PCs.

Figure 2.13 Adding a cache to the institutional network

Through the use of Content Distribution Networks (CDNs), Web caches are increasingly playing an important role in the Internet. A CDN company installs many geographically distributed caches throughout the Internet, thereby localizing much of the traffic. There are shared CDNs (such as Akamai and Limelight) and dedicated CDNs (such as Google and Netflix). We will discuss CDNs in more detail in Section 2.6.
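The arithmetic of this example is easy to reproduce. The following sketch recomputes the access-link traffic intensity and the average delay, with the hit rate and the 0.01-second and 2.01-second delays taken from the discussion above:

    object_size = 1e6      # bits per object
    request_rate = 15      # requests per second
    access_rate = 15e6     # bits per second on the access link
    hit_rate = 0.4
    hit_delay = 0.01       # seconds for a request satisfied by the cache
    miss_delay = 2.01      # seconds: Internet delay plus the small access delay

    intensity = request_rate * object_size / access_rate   # 1.0 without a cache
    intensity_with_cache = (1 - hit_rate) * intensity      # 0.6 with the cache

    average_delay = hit_rate * hit_delay + (1 - hit_rate) * miss_delay
    print(intensity_with_cache, average_delay)   # 0.6 and about 1.21 seconds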
The Conditional GET

Although caching can reduce user-perceived response times, it introduces a new problem---the copy of an object residing in the cache may be stale. In other words, the object housed in the Web server may have been modified since the copy was cached at the client. Fortunately, HTTP has a mechanism that allows a cache to verify that its objects are up to date. This mechanism is called the conditional GET. An HTTP request message is a so-called conditional GET message if (1) the request message uses the GET method and (2) the request message includes an If-Modified-Since: header line.

To illustrate how the conditional GET operates, let's walk through an example. First, on the behalf of a requesting browser, a proxy cache sends a request message to a Web server:

GET /fruit/kiwi.gif HTTP/1.1
Host: www.exotiquecuisine.com

Second, the Web server sends a response message with the requested object to the cache:

HTTP/1.1 200 OK
Date: Sat, 3 Oct 2015 15:39:29
Server: Apache/1.3.0 (Unix)
Last-Modified: Wed, 9 Sep 2015 09:23:24
Content-Type: image/gif

(data data data data data ...)

The cache forwards the object to the requesting browser but also caches the object locally. Importantly, the cache also stores the last-modified date along with the object. Third, one week later, another browser requests the same object via the cache, and the object is still in the cache. Since this object may have been modified at the Web server in the past week, the cache performs an up-to-date check by issuing a conditional GET. Specifically, the cache sends:

GET /fruit/kiwi.gif HTTP/1.1
Host: www.exotiquecuisine.com
If-modified-since: Wed, 9 Sep 2015 09:23:24

Note that the value of the If-modified-since: header line is exactly equal to the value of the Last-Modified: header line that was sent by the server one week ago. This conditional GET is telling the server to send the object only if the object has been modified since the specified date. Suppose the object has not been modified since 9 Sep 2015 09:23:24. Then, fourth, the Web server sends a response message to the cache:

HTTP/1.1 304 Not Modified
Date: Sat, 10 Oct 2015 15:39:29
Server: Apache/1.3.0 (Unix)

(empty entity body)

We see that in response to the conditional GET, the Web server still sends a response message but does not include the requested object in the response message. Including the requested object would only waste bandwidth and increase user-perceived response time, particularly if the object is large. Note that this last response message has 304 Not Modified in the status line, which tells the cache that it can go ahead and forward its (the proxy cache's) cached copy of the object to the requesting browser.

This ends our discussion of HTTP, the first Internet protocol (an application-layer protocol) that we've studied in detail. We've seen the format of HTTP messages and the actions taken by the Web client and server as these messages are sent and received. We've also studied a bit of the Web's application infrastructure, including caches, cookies, and back-end databases, all of which are tied in some way to the HTTP protocol.
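A minimal sketch of the cache's side of this exchange, assuming we already hold the object and its Last-Modified value. The server name is the fictitious one from the example, so the final call is left commented out:

    import socket

    def conditional_get(host, path, last_modified):
        """Issue a conditional GET; report whether the cached copy is current."""
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"If-modified-since: {last_modified}\r\n"
            "Connection: close\r\n\r\n"
        )
        sock = socket.create_connection((host, 80))
        sock.sendall(request.encode("ascii"))
        reply = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
        sock.close()
        status_line = reply.split(b"\r\n", 1)[0].decode()
        if status_line.split()[1] == "304":
            return "not modified: forward the cached copy"
        return "modified: cache and forward the new copy"

    # print(conditional_get("www.exotiquecuisine.com", "/fruit/kiwi.gif",
    #                       "Wed, 9 Sep 2015 09:23:24"))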
2.3 Electronic Mail in the Internet

Electronic mail has been around since the beginning of the Internet. It was the most popular application when the Internet was in its infancy \[Segaller 1998\], and has become more elaborate and powerful over the years. It remains one of the Internet's most important and utilized applications. As with ordinary postal mail, e-mail is an asynchronous communication medium---people send and read messages when it is convenient for them, without having to coordinate with other people's schedules. In contrast with postal mail, electronic mail is fast, easy to distribute, and inexpensive. Modern e-mail has many powerful features, including messages with attachments, hyperlinks, HTML-formatted text, and embedded photos.

In this section, we examine the application-layer protocols that are at the heart of Internet e-mail. But before we jump into an in-depth discussion of these protocols, let's take a high-level view of the Internet mail system and its key components. Figure 2.14 presents a high-level view of the Internet mail system. We see from this diagram that it has three major components: user agents, mail servers, and the Simple Mail Transfer Protocol (SMTP). We now describe each of these components in the context of a sender, Alice, sending an e-mail message to a recipient, Bob. User agents allow users to read, reply to, forward, save, and compose messages. Microsoft Outlook and Apple Mail are examples of user agents for e-mail. When Alice is finished composing her message, her user agent sends the message to her mail server, where the message is placed in the mail server's outgoing message queue. When Bob wants to read a message, his user agent retrieves the message from his mailbox in his mail server.

Figure 2.14 A high-level view of the Internet e-mail system

Mail servers form the core of the e-mail infrastructure. Each recipient, such as Bob, has a mailbox located in one of the mail servers. Bob's mailbox manages and maintains the messages that have been sent to him. A typical message starts its journey in the sender's user agent, travels to the sender's mail server, and travels to the recipient's mail server, where it is deposited in the recipient's mailbox. When Bob wants to access the messages in his mailbox, the mail server containing his mailbox authenticates Bob (with usernames and passwords). Alice's mail server must also deal with failures in Bob's mail server. If Alice's server cannot deliver mail to Bob's server, Alice's server holds the message in a message queue and attempts to transfer the message later. Reattempts are often done every 30 minutes or so; if there is no success after several days, the server removes the message and notifies the sender (Alice) with an e-mail message.

SMTP is the principal application-layer protocol for Internet electronic mail. It uses the reliable data transfer service of TCP to transfer mail from the sender's mail server to the recipient's mail server. As with most application-layer protocols, SMTP has two sides: a client side, which executes on the sender's mail server, and a server side, which executes on the recipient's mail server. Both the client and server sides of SMTP run on every mail server. When a mail server sends mail to other mail servers, it acts as an SMTP client. When a mail server receives mail from other mail servers, it acts as an SMTP server.

2.3.1 SMTP

SMTP, defined in RFC 5321, is at the heart of Internet electronic mail. As mentioned above, SMTP transfers messages from senders' mail servers to the recipients' mail servers. SMTP is much older than HTTP. (The original SMTP RFC dates back to 1982, and SMTP was around long before that.) Although SMTP has numerous wonderful qualities, as evidenced by its ubiquity in the Internet, it is nevertheless a legacy technology that possesses certain archaic characteristics. For example, it restricts the body (not just the headers) of all mail messages to simple 7-bit ASCII. This restriction made sense in the early 1980s when transmission capacity was scarce and no one was e-mailing large attachments or large image, audio, or video files. But today, in the multimedia era, the 7-bit ASCII restriction is a bit of a pain---it requires binary multimedia data to be encoded to ASCII before being sent over SMTP; and it requires the corresponding ASCII message to be decoded back to binary after SMTP transport. Recall from Section 2.2 that HTTP does not require multimedia data to be ASCII encoded before transfer.

To illustrate the basic operation of SMTP, let's walk through a common scenario. Suppose Alice wants to send Bob a simple ASCII message.

1. Alice invokes her user agent for e-mail, provides Bob's e-mail address (for example, bob@someschool.edu), composes a message, and instructs the user agent to send the message.

2. Alice's user agent sends the message to her mail server, where it is placed in a message queue.

3. The client side of SMTP, running on Alice's mail server, sees the message in the message queue. It opens a TCP connection to an SMTP server, running on Bob's mail server.

4. After some initial SMTP handshaking, the SMTP client sends Alice's message into the TCP connection.

5. At Bob's mail server, the server side of SMTP receives the message. Bob's mail server then places the message in Bob's mailbox.
6. Bob invokes his user agent to read the message at his convenience.

The scenario is summarized in Figure 2.15. It is important to observe that SMTP does not normally use intermediate mail servers for sending mail, even when the two mail servers are located at opposite ends of the world. If Alice's server is in Hong Kong and Bob's server is in St. Louis, the TCP connection is a direct connection between the Hong Kong and St. Louis servers. In particular, if Bob's mail server is down, the message remains in Alice's mail server and waits for a new attempt---the message does not get placed in some intermediate mail server.

Figure 2.15 Alice sends a message to Bob

Let's now take a closer look at how SMTP transfers a message from a sending mail server to a receiving mail server. We will see that the SMTP protocol has many similarities with protocols that are used for face-to-face human interaction. First, the client SMTP (running on the sending mail server host) has TCP establish a connection to port 25 at the server SMTP (running on the receiving mail server host). If the server is down, the client tries again later. Once this connection is established, the server and client perform some application-layer handshaking---just as humans often introduce themselves before transferring information from one to another, SMTP clients and servers introduce themselves before transferring information. During this SMTP handshaking phase, the SMTP client indicates the e-mail address of the sender (the person who generated the message) and the e-mail address of the recipient. Once the SMTP client and server have introduced themselves to each other, the client sends the message. SMTP can count on the reliable data transfer service of TCP to get the message to the server without errors. The client then repeats this process over the same TCP connection if it has other messages to send to the server; otherwise, it instructs TCP to close the connection.

Let's next take a look at an example transcript of messages exchanged between an SMTP client (C) and an SMTP server (S). The hostname of the client is crepes.fr and the hostname of the server is hamburger.edu. The ASCII text lines prefaced with C: are exactly the lines the client sends into its TCP socket, and the ASCII text lines prefaced with S: are exactly the lines the server sends into its TCP socket. The following transcript begins as soon as the TCP connection is established.

S: 220 hamburger.edu
C: HELO crepes.fr
S: 250 Hello crepes.fr, pleased to meet you
C: MAIL FROM: <alice@crepes.fr>
S: 250 alice@crepes.fr ... Sender ok
C: RCPT TO: <bob@hamburger.edu>
S: 250 bob@hamburger.edu ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 hamburger.edu closing connection

In the example above, the client sends a message ("Do you like ketchup? How about pickles?") from mail server crepes.fr to mail server hamburger.edu. As part of the dialogue, the client issued five commands: HELO (an abbreviation for HELLO), MAIL FROM, RCPT TO, DATA, and QUIT. These commands are self-explanatory. The client also sends a line consisting of a single period, which indicates the end of the message to the server. (In ASCII jargon, each message ends with CRLF.CRLF, where CR and LF stand for carriage return and line feed, respectively.) The server issues replies to each command, with each reply having a reply code and some (optional) English-language explanation.
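This dialogue can also be reproduced programmatically. Below is a minimal Python sketch that plays the client's role over a raw TCP socket, sending the same five commands. The hostnames are the fictitious ones from the transcript, so running this verbatim requires substituting a real local mail server:

    import socket

    commands = [
        "HELO crepes.fr",
        "MAIL FROM: <alice@crepes.fr>",
        "RCPT TO: <bob@hamburger.edu>",
        "DATA",
        "Do you like ketchup?",   # message body line
        "How about pickles?",     # message body line
        ".",                      # lone period ends the message
        "QUIT",
    ]
    body_lines = {"Do you like ketchup?", "How about pickles?"}

    sock = socket.create_connection(("hamburger.edu", 25))
    print(sock.recv(1024).decode())           # 220 greeting
    for cmd in commands:
        sock.sendall((cmd + "\r\n").encode("ascii"))
        if cmd not in body_lines:             # body lines draw no reply
            print(sock.recv(1024).decode())   # 250 / 354 / 221 replies
    sock.close()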
We mention here that SMTP uses persistent connections: If the sending mail server has several messages to send to the same receiving mail server, it can send all of the messages over the same TCP connection. For each message, the client begins the process with a new MAIL FROM: crepes.fr, designates the end of message with an isolated period, and issues QUIT only after all messages have been sent.

It is highly recommended that you use Telnet to carry out a direct dialogue with an SMTP server. To do this, issue

telnet serverName 25

where serverName is the name of a local mail server. When you do this, you are simply establishing a TCP connection between your local host and the mail server. After typing this line, you should immediately receive the 220 reply from the server. Then issue the SMTP commands HELO, MAIL FROM, RCPT TO, DATA, CRLF.CRLF, and QUIT at the appropriate times. It is also highly recommended that you do Programming Assignment 3 at the end of this chapter. In that assignment, you'll build a simple user agent that implements the client side of SMTP. It will allow you to send an e-mail message to an arbitrary recipient via a local mail server.

2.3.2 Comparison with HTTP

Let's now briefly compare SMTP with HTTP. Both protocols are used to transfer files from one host to another: HTTP transfers files (also called objects) from a Web server to a Web client (typically a browser); SMTP transfers files (that is, e-mail messages) from one mail server to another mail server. When transferring the files, both persistent HTTP and SMTP use persistent connections. Thus, the two protocols have common characteristics. However, there are important differences. First, HTTP is mainly a pull protocol---someone loads information on a Web server and users use HTTP to pull the information from the server at their convenience. In particular, the TCP connection is initiated by the machine that wants to receive the file. On the other hand, SMTP is primarily a push protocol---the sending mail server pushes the file to the receiving mail server. In particular, the TCP connection is initiated by the machine that wants to send the file. A second difference, which we alluded to earlier, is that SMTP requires each message, including the body of each message, to be in 7-bit ASCII format. If the message contains characters that are not 7-bit ASCII (for example, French characters with accents) or contains binary data (such as an image file), then the message has to be encoded into 7-bit ASCII. HTTP data does not impose this restriction. A third important difference concerns how a document consisting of text and images (along with possibly other media types) is handled. As we learned in Section 2.2, HTTP encapsulates each object in its own HTTP response message. SMTP places all of the message's objects into one message.

2.3.3 Mail Message Formats

When Alice writes an ordinary snail-mail letter to Bob, she may include all kinds of peripheral header information at the top of the letter, such as Bob's address, her own return address, and the date. Similarly, when an e-mail message is sent from one person to another, a header containing peripheral information precedes the body of the message itself.
This peripheral information is contained in a series of header lines, which are defined in RFC 5322. The header lines and the body of the message are separated by a blank line (that is, by CRLF). RFC 5322 specifies the exact format for mail header lines as well as their semantic interpretations. As with HTTP, each header line contains readable text, consisting of a keyword followed by a colon followed by a value. Some of the keywords are required and others are optional. Every header must have a From: header line and a To: header line; a header may include a Subject: header line as well as other optional header lines. It is important to note that these header lines are different from the SMTP commands we studied in Section 2.3.1 (even though they contain some common words such as "from" and "to"). The commands in that section were part of the SMTP handshaking protocol; the header lines examined in this section are part of the mail message itself. A typical message header looks like this:

From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Searching for the meaning of life.

After the message header, a blank line follows; then the message body (in ASCII) follows. You should use Telnet to send a message to a mail server that contains some header lines, including the Subject: header line. To do this, issue telnet serverName 25, as discussed in Section 2.3.1.
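Python's standard library can construct such a message. In the sketch below, email.message.EmailMessage produces the header lines, the separating blank line, and the body; the commented-out smtplib call would hand the result to a (hypothetical) local mail server:

    from email.message import EmailMessage
    import smtplib

    msg = EmailMessage()
    msg["From"] = "alice@crepes.fr"
    msg["To"] = "bob@hamburger.edu"
    msg["Subject"] = "Searching for the meaning of life."
    msg.set_content("Do you like ketchup?\nHow about pickles?\n")

    # The blank line separating header from body is added automatically.
    print(msg.as_string())

    # Handing the message to a local mail server over SMTP:
    # with smtplib.SMTP("localhost") as server:
    #     server.send_message(msg)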
2.3.4 Mail Access Protocols

Once SMTP delivers the message from Alice's mail server to Bob's mail server, the message is placed in Bob's mailbox. Throughout this discussion we have tacitly assumed that Bob reads his mail by logging onto the server host and then executing a mail reader that runs on that host. Up until the early 1990s this was the standard way of doing things. But today, mail access uses a client-server architecture---the typical user reads e-mail with a client that executes on the user's end system, for example, on an office PC, a laptop, or a smartphone. By executing a mail client on a local PC, users enjoy a rich set of features, including the ability to view multimedia messages and attachments.

Given that Bob (the recipient) executes his user agent on his local PC, it is natural to consider placing a mail server on his local PC as well. With this approach, Alice's mail server would dialogue directly with Bob's PC. There is a problem with this approach, however. Recall that a mail server manages mailboxes and runs the client and server sides of SMTP. If Bob's mail server were to reside on his local PC, then Bob's PC would have to remain always on, and connected to the Internet, in order to receive new mail, which can arrive at any time. This is impractical for many Internet users. Instead, a typical user runs a user agent on the local PC but accesses its mailbox stored on an always-on shared mail server. This mail server is shared with other users and is typically maintained by the user's ISP (for example, university or company).

Now let's consider the path an e-mail message takes when it is sent from Alice to Bob. We just learned that at some point along the path the e-mail message needs to be deposited in Bob's mail server. This could be done simply by having Alice's user agent send the message directly to Bob's mail server. And this could be done with SMTP---indeed, SMTP has been designed for pushing e-mail from one host to another. However, typically the sender's user agent does not dialogue directly with the recipient's mail server. Instead, as shown in Figure 2.16, Alice's user agent uses SMTP to push the e-mail message into her mail server, then Alice's mail server uses SMTP (as an SMTP client) to relay the e-mail message to Bob's mail server. Why the two-step procedure? Primarily because without relaying through Alice's mail server, Alice's user agent doesn't have any recourse to an unreachable destination mail server. By having Alice first deposit the e-mail in her own mail server, Alice's mail server can repeatedly try to send the message to Bob's mail server, say every 30 minutes, until Bob's mail server becomes operational. (And if Alice's mail server is down, then she has the recourse of complaining to her system administrator!) The SMTP RFC defines how the SMTP commands can be used to relay a message across multiple SMTP servers.

Figure 2.16 E-mail protocols and their communicating entities

But there is still one missing piece to the puzzle! How does a recipient like Bob, running a user agent on his local PC, obtain his messages, which are sitting in a mail server within Bob's ISP? Note that Bob's user agent can't use SMTP to obtain the messages because obtaining the messages is a pull operation, whereas SMTP is a push protocol. The puzzle is completed by introducing a special mail access protocol that transfers messages from Bob's mail server to his local PC. There are currently a number of popular mail access protocols, including Post Office Protocol---Version 3 (POP3), Internet Mail Access Protocol (IMAP), and HTTP. Figure 2.16 provides a summary of the protocols that are used for Internet mail: SMTP is used to transfer mail from the sender's mail server to the recipient's mail server; SMTP is also used to transfer mail from the sender's user agent to the sender's mail server. A mail access protocol, such as POP3, is used to transfer mail from the recipient's mail server to the recipient's user agent.

POP3

POP3 is an extremely simple mail access protocol. It is defined in \[RFC 1939\], which is short and quite readable. Because the protocol is so simple, its functionality is rather limited. POP3 begins when the user agent (the client) opens a TCP connection to the mail server (the server) on port 110. With the TCP connection established, POP3 progresses through three phases: authorization, transaction, and update. During the first phase, authorization, the user agent sends a username and a password (in the clear) to authenticate the user. During the second phase, transaction, the user agent retrieves messages; also during this phase, the user agent can mark messages for deletion, remove deletion marks, and obtain mail statistics. The third phase, update, occurs after the client has issued the quit command, ending the POP3 session; at this time, the mail server deletes the messages that were marked for deletion. In a POP3 transaction, the user agent issues commands, and the server responds to each command with a reply. There are two possible responses: +OK (sometimes followed by server-to-client data), used by the server to indicate that the previous command was fine; and -ERR, used by the server to indicate that something was wrong with the previous command.
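Python's poplib module speaks this protocol directly. The sketch below runs a complete download-and-delete session, with the three phases marked in comments; the server name and credentials are placeholders:

    import poplib

    # Port 110 is poplib's default; name and credentials are placeholders.
    mailbox = poplib.POP3("mailServer")
    mailbox.user("bob")        # authorization phase
    mailbox.pass_("hungry")

    count, _ = mailbox.stat()  # transaction phase: message count
    for i in range(1, count + 1):
        response, lines, octets = mailbox.retr(i)   # retrieve message i
        print(b"\n".join(lines).decode("ascii", "replace"))
        mailbox.dele(i)        # mark for deletion (download-and-delete mode)

    mailbox.quit()             # update phase: marked messages are removed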
The authorization phase has two principal commands: user `<username>` and pass `<password>`. To illustrate these two commands, we suggest that you Telnet directly into a POP3 server, using port 110, and issue these commands. Suppose that mailServer is the name of your mail server. You will see something like:

telnet mailServer 110
+OK POP3 server ready
user bob
+OK
pass hungry
+OK user successfully logged on

If you misspell a command, the POP3 server will reply with an -ERR message.

Now let's take a look at the transaction phase. A user agent using POP3 can often be configured (by the user) to "download and delete" or to "download and keep." The sequence of commands issued by a POP3 user agent depends on which of these two modes the user agent is operating in. In the download-and-delete mode, the user agent will issue the list, retr, and dele commands. As an example, suppose the user has two messages in his or her mailbox. In the dialogue below, C: (standing for client) is the user agent and S: (standing for server) is the mail server. The transaction will look something like:

C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 1
C: retr 2
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 2
C: quit
S: +OK POP3 server signing off

The user agent first asks the mail server to list the size of each of the stored messages. The user agent then retrieves and deletes each message from the server. Note that after the authorization phase, the user agent employed only four commands: list, retr, dele, and quit. The syntax for these commands is defined in RFC 1939. After processing the quit command, the POP3 server enters the update phase and removes messages 1 and 2 from the mailbox.

A problem with this download-and-delete mode is that the recipient, Bob, may be nomadic and may want to access his mail messages from multiple machines, for example, his office PC, his home PC, and his portable computer. The download-and-delete mode partitions Bob's mail messages over these three machines; in particular, if Bob first reads a message on his office PC, he will not be able to reread the message from his portable at home later in the evening. In the download-and-keep mode, the user agent leaves the messages on the mail server after downloading them. In this case, Bob can reread messages from different machines; he can access a message from work and access it again later in the week from home. During a POP3 session between a user agent and the mail server, the POP3 server maintains some state information; in particular, it keeps track of which user messages have been marked deleted. However, the POP3 server does not carry state information across POP3 sessions. This lack of state information across sessions greatly simplifies the implementation of a POP3 server.

IMAP

With POP3 access, once Bob has downloaded his messages to the local machine, he can create mail folders and move the downloaded messages into the folders. Bob can then delete messages, move messages across folders, and search for messages (by sender name or subject). But this paradigm---namely, folders and messages in the local machine---poses a problem for the nomadic user, who would prefer to maintain a folder hierarchy on a remote server that can be accessed from any computer. This is not possible with POP3---the POP3 protocol does not provide any means for a user to create remote folders and assign messages to folders. To solve this and other problems, the IMAP protocol, defined in \[RFC 3501\], was invented. Like POP3, IMAP is a mail access protocol.
It has many more features than +POP3, but it is also significantly more complex. (And thus the client +and server side implementations are significantly more complex.) An IMAP +server will associate each message with a folder; when a message first +arrives at the server, it is associated with the recipient's INBOX +folder. The recipient can then move the message into a new, user-created +folder, read the message, delete the message, and so on. The IMAP +protocol provides commands to allow users to create folders and move +messages from one folder to another. IMAP also provides commands that +allow users to search remote folders for messages matching specific +criteria. Note that, unlike POP3, an IMAP server maintains user state +information across IMAP sessions---for example, the names of the folders +and which messages are associated with which folders. Another important +feature of IMAP is that it has commands that permit a user agent to +obtain components of messages. For example, a user agent can obtain just +the message header of a message or just one part of a multipart MIME +message. This feature is useful when there is a low-bandwidth connection +(for example, a slow-speed modem link) between the user agent and its +mail server. With a low-bandwidth connection, the user may not want to +download all of the messages in its mailbox, particularly avoiding long +messages that might contain, for example, an audio or video clip. +Web-Based E-Mail More and more users today are sending and accessing +their e-mail through their Web browsers. Hotmail introduced Web-based +access in the mid 1990s. Now Web-based e-mail is also provided by +Google, Yahoo!, as well as just about every major university and +corporation. With this service, the user agent is an ordinary Web +browser, and the user communicates with its remote mailbox via HTTP. +When a recipient, such as Bob, wants to access a message in his mailbox, +the e-mail message is sent from Bob's mail server to Bob's browser using +the HTTP protocol rather than the POP3 or IMAP protocol. When a sender, +such as Alice, wants to send an e-mail message, the e-mail message is +sent from her browser to her mail server over HTTP rather than over +SMTP. Alice's mail server, however, still sends messages to, and +receives messages from, other mail servers using SMTP. + +2.4 DNS---The Internet's Directory Service We human beings can be +identified in many ways. For example, we can be identified by the names +that appear on our birth certificates. We can be identified by our +social security numbers. We can be identified by our driver's license +numbers. Although each of these identifiers can be used to identify +people, within a given context one identifier may be more appropriate +than another. For example, the computers at the IRS (the infamous +tax-collecting agency in the United States) prefer to use fixed-length +social security numbers rather than birth certificate names. On the +other hand, ordinary people prefer the more mnemonic birth certificate +names rather than social security numbers. (Indeed, can you imagine +saying, "Hi. My name is 132-67-9875. Please meet my husband, +178-87-1146.") Just as humans can be identified in many ways, so too can +Internet hosts. One identifier for a host is its hostname. +Hostnames---such as www.facebook.com, www.google.com , gaia.cs.umass.edu +---are mnemonic and are therefore appreciated by humans. 
However, +hostnames provide little, if any, information about the location within +the Internet of the host. (A hostname such as www.eurecom.fr , which +ends with the country code .fr , tells us that the host is probably in +France, but doesn't say much more.) Furthermore, because hostnames can +consist of variable-length alphanumeric characters, they would be +difficult to process by routers. For these reasons, hosts are also +identified by so-called IP addresses. We discuss IP addresses in some +detail in Chapter 4, but it is useful to say a few brief words about +them now. An IP address consists of four bytes and has a rigid +hierarchical structure. An IP address looks like 121.7.106.83 , where +each period separates one of the bytes expressed in decimal notation +from 0 to 255. An IP address is hierarchical because as we scan the +address from left to right, we obtain more and more specific information +about where the host is located in the Internet (that is, within which +network, in the network of networks). Similarly, when we scan a postal +address from bottom to top, we obtain more and more specific information +about where the addressee is located. + +2.4.1 Services Provided by DNS We have just seen that there are two ways +to identify a host---by a hostname and by an IP address. People prefer +the more mnemonic hostname identifier, while routers prefer +fixed-length, hierarchically structured IP addresses. In order to +reconcile these preferences, we need a directory service that translates +hostnames to IP addresses. This is the main task of the Internet's +domain name system (DNS). The DNS is (1) a distributed database +implemented in a hierarchy of DNS servers, and (2) an + +application-layer protocol that allows hosts to query the distributed +database. The DNS servers are often UNIX machines running the Berkeley +Internet Name Domain (BIND) software \[BIND 2016\]. The DNS protocol +runs over UDP and uses port 53. DNS is commonly employed by other +application-layer protocols---including HTTP and SMTP to translate +user-supplied hostnames to IP addresses. As an example, consider what +happens when a browser (that is, an HTTP client), running on some user's +host, requests the URL www.someschool.edu/index.html . In order for the +user's host to be able to send an HTTP request message to the Web server +www.someschool.edu , the user's host must first obtain the IP address of +www.someschool.edu . This is done as follows. + +1. The same user machine runs the client side of the DNS application. + +2. The browser extracts the hostname, www.someschool.edu , from the URL + and passes the hostname to the client side of the DNS application. + +3. The DNS client sends a query containing the hostname to a DNS + server. + +4. The DNS client eventually receives a reply, which includes the IP + address for the hostname. + +5. Once the browser receives the IP address from DNS, it can initiate a + TCP connection to the HTTP server process located at port 80 at that + IP address. We see from this example that DNS adds an additional + delay---sometimes substantial---to the Internet applications that + use it. Fortunately, as we discuss below, the desired IP address is + often cached in a "nearby" DNS server, which helps to reduce DNS + network traffic as well as the average DNS delay. DNS provides a few + other important services in addition to translating hostnames to IP + addresses: Host aliasing. A host with a complicated hostname can + have one or more alias names. 
For example, a hostname such as relay1.west-coast.enterprise.com could have, say, two aliases such as enterprise.com and www.enterprise.com. In this case, the hostname relay1.west-coast.enterprise.com is said to be a canonical hostname. Alias hostnames, when present, are typically more mnemonic than canonical hostnames. DNS can be invoked by an application to obtain the canonical hostname for a supplied alias hostname as well as the IP address of the host.

Mail server aliasing. For obvious reasons, it is highly desirable that e-mail addresses be mnemonic. For example, if Bob has an account with Yahoo Mail, Bob's e-mail address might be as simple as bob@yahoo.mail. However, the hostname of the Yahoo mail server is more complicated and much less mnemonic than simply yahoo.com (for example, the canonical hostname might be something like relay1.west-coast.yahoo.com). DNS can be invoked by a mail application to obtain the canonical hostname for a supplied alias hostname as well as the IP address of the host. In fact, the MX record (see below) permits a company's mail server and Web server to have identical (aliased) hostnames; for example, a company's Web server and mail server can both be called enterprise.com.

Load distribution. DNS is also used to perform load distribution among replicated servers, such as replicated Web servers. Busy sites, such as cnn.com, are replicated over multiple servers, with each server running on a different end system and each having a different IP address. For replicated Web servers, a set of IP addresses is thus associated with one canonical hostname. The DNS database contains this set of IP addresses. When clients make a DNS query for a name mapped to a set of addresses, the server responds with the entire set of IP addresses, but rotates the ordering of the addresses within each reply. Because a client typically sends its HTTP request message to the IP address that is listed first in the set, DNS rotation distributes the traffic among the replicated servers. DNS rotation is also used for e-mail so that multiple mail servers can have the same alias name. Also, content distribution companies such as Akamai have used DNS in more sophisticated ways \[Dilley 2002\] to provide Web content distribution (see Section 2.6.3).

The DNS is specified in RFC 1034 and RFC 1035, and updated in several additional RFCs. It is a complex system, and we only touch upon key aspects of its operation here. The interested reader is referred to these RFCs and the book by Albitz and Liu \[Albitz 1993\]; see also the retrospective paper \[Mockapetris 1988\], which provides a nice description of the what and why of DNS, and \[Mockapetris 2005\].

PRINCIPLES IN PRACTICE

DNS: CRITICAL NETWORK FUNCTIONS VIA THE CLIENT-SERVER PARADIGM

Like HTTP, FTP, and SMTP, the DNS protocol is an application-layer protocol since it (1) runs between communicating end systems using the client-server paradigm and (2) relies on an underlying end-to-end transport protocol to transfer DNS messages between communicating end systems. In another sense, however, the role of the DNS is quite different from Web, file transfer, and e-mail applications. Unlike these applications, the DNS is not an application with which a user directly interacts. Instead, the DNS provides a core Internet function---namely, translating hostnames to their underlying IP addresses, for user applications and other software in the Internet. We noted in Section 1.2 that much of the complexity in the Internet architecture is located at the "edges" of the network.
The DNS, which +implements the critical name-toaddress translation process using clients +and servers located at the edge of the network, is yet another example +of that design philosophy. + +operation here. The interested reader is referred to these RFCs and the +book by Albitz and Liu \[Albitz 1993\]; see also the retrospective paper +\[Mockapetris 1988\], which provides a nice description of the what and +why of DNS, and \[Mockapetris 2005\]. + +2.4.2 Overview of How DNS Works We now present a high-level overview of +how DNS works. Our discussion will focus on the hostname-to- + +IP-address translation service. Suppose that some application (such as a +Web browser or a mail reader) running in a user's host needs to +translate a hostname to an IP address. The application will invoke the +client side of DNS, specifying the hostname that needs to be translated. +(On many UNIX-based machines, gethostbyname() is the function call that +an application calls in order to perform the translation.) DNS in the +user's host then takes over, sending a query message into the network. +All DNS query and reply messages are sent within UDP datagrams to port +53. After a delay, ranging from milliseconds to seconds, DNS in the +user's host receives a DNS reply message that provides the desired +mapping. This mapping is then passed to the invoking application. Thus, +from the perspective of the invoking application in the user's host, DNS +is a black box providing a simple, straightforward translation service. +But in fact, the black box that implements the service is complex, +consisting of a large number of DNS servers distributed around the +globe, as well as an application-layer protocol that specifies how the +DNS servers and querying hosts communicate. A simple design for DNS +would have one DNS server that contains all the mappings. In this +centralized design, clients simply direct all queries to the single DNS +server, and the DNS server responds directly to the querying clients. +Although the simplicity of this design is attractive, it is +inappropriate for today's Internet, with its vast (and growing) number +of hosts. The problems with a centralized design include: A single point +of failure. If the DNS server crashes, so does the entire Internet! +Traffic volume. A single DNS server would have to handle all DNS queries +(for all the HTTP requests and e-mail messages generated from hundreds +of millions of hosts). Distant centralized database. A single DNS server +cannot be "close to" all the querying clients. If we put the single DNS +server in New York City, then all queries from Australia must travel to +the other side of the globe, perhaps over slow and congested links. This +can lead to significant delays. Maintenance. The single DNS server would +have to keep records for all Internet hosts. Not only would this +centralized database be huge, but it would have to be updated frequently +to account for every new host. In summary, a centralized database in a +single DNS server simply doesn't scale. Consequently, the DNS is +distributed by design. In fact, the DNS is a wonderful example of how a +distributed database can be implemented in the Internet. A Distributed, +Hierarchical Database In order to deal with the issue of scale, the DNS +uses a large number of servers, organized in a hierarchical fashion and +distributed around the world. No single DNS server has all of the +mappings for all of the hosts in the Internet. Instead, the mappings are +distributed across the DNS servers. 
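Since DNS appears to the application as a black box behind a single library call, it is worth seeing that call once. Below is a minimal sketch using Python's standard socket module; the hostname is the illustrative one from the example above and may not actually resolve.

```python
import socket

# Illustrative hostname from the example above; it may not actually resolve.
hostname = "www.someschool.edu"

try:
    # Analogous to the UNIX gethostbyname() mentioned above: the resolver
    # sends a DNS query into the network and blocks until the reply arrives.
    print(socket.gethostbyname(hostname))

    # getaddrinfo() is the modern equivalent; note that a single hostname
    # may map to several IP addresses (replicated servers, Section 2.4.1).
    for *_, sockaddr in socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP):
        print(sockaddr[0])
except socket.gaierror as err:
    print(f"DNS lookup failed: {err}")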
To a first approximation, there are three classes of DNS servers---root DNS servers, top-level domain (TLD) DNS servers, and authoritative DNS servers---organized in a hierarchy as shown in Figure 2.17. To understand how these three classes of servers interact, suppose a DNS client wants to determine the IP address for the hostname www.amazon.com.

Figure 2.17 Portion of the hierarchy of DNS servers

To a first approximation, the following events will take place. The client first contacts one of the root servers, which returns IP addresses for TLD servers for the top-level domain com. The client then contacts one of these TLD servers, which returns the IP address of an authoritative server for amazon.com. Finally, the client contacts one of the authoritative servers for amazon.com, which returns the IP address for the hostname www.amazon.com. We'll soon examine this DNS lookup process in more detail. But let's first take a closer look at these three classes of DNS servers:

Root DNS servers. There are over 400 root name servers scattered all over the world. Figure 2.18 shows the countries that have root name servers, with countries having more than ten darkly shaded. These root name servers are managed by 13 different organizations. The full list of root name servers, along with the organizations that manage them and their IP addresses, can be found at [Root Servers 2016]. Root name servers provide the IP addresses of the TLD servers.

Top-level domain (TLD) servers. For each of the top-level domains---such as com, org, net, edu, and gov---and all of the country top-level domains---such as uk, fr, ca, and jp---there is a TLD server (or server cluster). The company Verisign Global Registry Services maintains the TLD servers for the com top-level domain, and the company Educause maintains the TLD servers for the edu top-level domain. The network infrastructure supporting a TLD can be large and complex; see [Osterweil 2012] for a nice overview of the Verisign network. See [TLD list 2016] for a list of all top-level domains. TLD servers provide the IP addresses for authoritative DNS servers.

Figure 2.18 DNS root servers in 2016

Authoritative DNS servers. Every organization with publicly accessible hosts (such as Web servers and mail servers) on the Internet must provide publicly accessible DNS records that map the names of those hosts to IP addresses. An organization's authoritative DNS server houses these DNS records. An organization can choose to implement its own authoritative DNS server to hold these records; alternatively, the organization can pay to have these records stored in an authoritative DNS server of some service provider. Most universities and large companies implement and maintain their own primary and secondary (backup) authoritative DNS servers.

The root, TLD, and authoritative DNS servers all belong to the hierarchy of DNS servers, as shown in Figure 2.17. There is another important type of DNS server called the local DNS server. A local DNS server does not strictly belong to the hierarchy of servers but is nevertheless central to the DNS architecture. Each ISP---such as a residential ISP or an institutional ISP---has a local DNS server (also called a default name server). When a host connects to an ISP, the ISP provides the host with the IP addresses of one or more of its local DNS servers (typically through DHCP, which is discussed in Chapter 4).
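To make the hierarchy concrete, the following sketch walks the root-to-TLD-to-authoritative chain by hand, issuing the iterative queries that would otherwise be issued on the host's behalf. It is a hedged illustration rather than resolver code: it assumes the third-party dnspython package (installed with pip install dnspython), starts from the well-known address of the a.root-servers.net root server, and gives up on referrals that arrive without glue records.

```python
import dns.message    # third-party package: pip install dnspython
import dns.query
import dns.rdatatype

def iterate(hostname, server="198.41.0.4"):   # a.root-servers.net
    """Follow referrals from a root server down to a server that answers."""
    while True:
        query = dns.message.make_query(hostname, dns.rdatatype.A)
        reply = dns.query.udp(query, server, timeout=5)  # DNS over UDP, port 53
        if reply.answer:                                 # final A (or CNAME) records
            return [rrset.to_text() for rrset in reply.answer]
        # No answer yet: this reply is a referral. The additional section
        # carries "glue" Type A records with the addresses of the next
        # (TLD or authoritative) servers named in the authority section.
        glue = [rdata.to_text() for rrset in reply.additional
                if rrset.rdtype == dns.rdatatype.A
                for rdata in rrset]
        if not glue:
            raise RuntimeError("referral without glue; a full resolver would "
                               "now resolve the NS names itself")
        server = glue[0]                                 # descend one level
        print("referred to", server)

print(iterate("www.amazon.com"))
```

In practice, of course, a host delegates all of this work to its local DNS server, described next.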
You can easily determine the IP address of your local DNS server by accessing network status windows in Windows or UNIX. A host's local DNS server is typically "close to" the host. For an institutional ISP, the local DNS server may be on the same LAN as the host; for a residential ISP, it is typically separated from the host by no more than a few routers. When a host makes a DNS query, the query is sent to the local DNS server, which acts as a proxy, forwarding the query into the DNS server hierarchy, as we'll discuss in more detail below.

Let's take a look at a simple example. Suppose the host cse.nyu.edu desires the IP address of gaia.cs.umass.edu. Also suppose that NYU's local DNS server for cse.nyu.edu is called dns.nyu.edu and that an authoritative DNS server for gaia.cs.umass.edu is called dns.umass.edu. As shown in Figure 2.19, the host cse.nyu.edu first sends a DNS query message to its local DNS server, dns.nyu.edu. The query message contains the hostname to be translated, namely, gaia.cs.umass.edu. The local DNS server forwards the query message to a root DNS server. The root DNS server takes note of the edu suffix and returns to the local DNS server a list of IP addresses for TLD servers responsible for edu. The local DNS server then resends the query message to one of these TLD servers. The TLD server takes note of the umass.edu suffix and responds with the IP address of the authoritative DNS server for the University of Massachusetts, namely, dns.umass.edu. Finally, the local DNS server resends the query message directly to dns.umass.edu, which responds with the IP address of gaia.cs.umass.edu. Note that in this example, in order to obtain the mapping for one hostname, eight DNS messages were sent: four query messages and four reply messages! We'll soon see how DNS caching reduces this query traffic.

Figure 2.19 Interaction of the various DNS servers

Our previous example assumed that the TLD server knows the authoritative DNS server for the hostname. In general, this is not always true. Instead, the TLD server may know only of an intermediate DNS server, which in turn knows the authoritative DNS server for the hostname. For example, suppose again that the University of Massachusetts has a DNS server for the university, called dns.umass.edu. Also suppose that each of the departments at the University of Massachusetts has its own DNS server, and that each departmental DNS server is authoritative for all hosts in the department. In this case, when the intermediate DNS server, dns.umass.edu, receives a query for a host with a hostname ending with cs.umass.edu, it returns to dns.nyu.edu the IP address of dns.cs.umass.edu, which is authoritative for all hostnames ending with cs.umass.edu. The local DNS server dns.nyu.edu then sends the query to the authoritative DNS server, which returns the desired mapping to the local DNS server, which in turn returns the mapping to the requesting host. In this case, a total of 10 DNS messages are sent!

The example shown in Figure 2.19 makes use of both recursive queries and iterative queries. The query sent from cse.nyu.edu to dns.nyu.edu is a recursive query, since the query asks dns.nyu.edu to obtain the mapping on its behalf. But the subsequent three queries are iterative since all of the replies are directly returned to dns.nyu.edu. In theory, any DNS query can be iterative or recursive. For example, Figure 2.20 shows a DNS query chain for which all of the queries are recursive.
In practice, the queries typically follow the pattern in Figure 2.19: The query from the requesting host to the local DNS server is recursive, and the remaining queries are iterative.

Figure 2.20 Recursive queries in DNS

DNS Caching

Our discussion thus far has ignored DNS caching, a critically important feature of the DNS system. In truth, DNS extensively exploits caching in order to improve the delay performance and to reduce the number of DNS messages ricocheting around the Internet. The idea behind DNS caching is very simple. In a query chain, when a DNS server receives a DNS reply (containing, for example, a mapping from a hostname to an IP address), it can cache the mapping in its local memory. For example, in Figure 2.19, each time the local DNS server dns.nyu.edu receives a reply from some DNS server, it can cache any of the information contained in the reply. If a hostname/IP address pair is cached in a DNS server and another query arrives at the DNS server for the same hostname, the DNS server can provide the desired IP address, even if it is not authoritative for the hostname. Because hosts and mappings between hostnames and IP addresses are by no means permanent, DNS servers discard cached information after a period of time (often set to two days).

As an example, suppose that a host apricot.nyu.edu queries dns.nyu.edu for the IP address of the hostname cnn.com. Furthermore, suppose that a few hours later, another NYU host, say, kiwi.nyu.edu, also queries dns.nyu.edu with the same hostname. Because of caching, the local DNS server will be able to immediately return the IP address of cnn.com to this second requesting host without having to query any other DNS servers. A local DNS server can also cache the IP addresses of TLD servers, thereby allowing the local DNS server to bypass the root DNS servers in a query chain. In fact, because of caching, root servers are bypassed for all but a very small fraction of DNS queries.

2.4.3 DNS Records and Messages

The DNS servers that together implement the DNS distributed database store resource records (RRs), including RRs that provide hostname-to-IP address mappings. Each DNS reply message carries one or more resource records. In this and the following subsection, we provide a brief overview of DNS resource records and messages; more details can be found in [Albitz 1993] or in the DNS RFCs [RFC 1034; RFC 1035]. A resource record is a four-tuple that contains the following fields:

(Name, Value, Type, TTL)

TTL is the time to live of the resource record; it determines when a resource should be removed from a cache. In the example records given below, we ignore the TTL field. The meanings of Name and Value depend on Type:

If Type=A, then Name is a hostname and Value is the IP address for the hostname. Thus, a Type A record provides the standard hostname-to-IP address mapping. As an example, (relay1.bar.foo.com, 145.37.93.126, A) is a Type A record.

If Type=NS, then Name is a domain (such as foo.com) and Value is the hostname of an authoritative DNS server that knows how to obtain the IP addresses for hosts in the domain. This record is used to route DNS queries further along in the query chain. As an example, (foo.com, dns.foo.com, NS) is a Type NS record.

If Type=CNAME, then Value is a canonical hostname for the alias hostname Name. This record can provide querying hosts the canonical name for a hostname.
As an example, (foo.com, relay1.bar.foo.com, CNAME) is a CNAME record.

If Type=MX, then Value is the canonical name of a mail server that has an alias hostname Name. As an example, (foo.com, mail.bar.foo.com, MX) is an MX record. MX records allow the hostnames of mail servers to have simple aliases. Note that by using the MX record, a company can have the same aliased name for its mail server and for one of its other servers (such as its Web server). To obtain the canonical name for the mail server, a DNS client would query for an MX record; to obtain the canonical name for the other server, the DNS client would query for the CNAME record.

If a DNS server is authoritative for a particular hostname, then the DNS server will contain a Type A record for the hostname. (Even if the DNS server is not authoritative, it may contain a Type A record in its cache.) If a server is not authoritative for a hostname, then the server will contain a Type NS record for the domain that includes the hostname; it will also contain a Type A record that provides the IP address of the DNS server in the Value field of the NS record. As an example, suppose an edu TLD server is not authoritative for the host gaia.cs.umass.edu. Then this server will contain a record for a domain that includes the host gaia.cs.umass.edu, for example, (umass.edu, dns.umass.edu, NS). The edu TLD server would also contain a Type A record, which maps the DNS server dns.umass.edu to an IP address, for example, (dns.umass.edu, 128.119.40.111, A).

DNS Messages

Earlier in this section, we referred to DNS query and reply messages. These are the only two kinds of DNS messages. Furthermore, both query and reply messages have the same format, as shown in Figure 2.21.

Figure 2.21 DNS message format

The semantics of the various fields in a DNS message are as follows:

The first 12 bytes is the header section, which has a number of fields. The first field is a 16-bit number that identifies the query. This identifier is copied into the reply message to a query, allowing the client to match received replies with sent queries. There are a number of flags in the flag field. A 1-bit query/reply flag indicates whether the message is a query (0) or a reply (1). A 1-bit authoritative flag is set in a reply message when a DNS server is an authoritative server for a queried name. A 1-bit recursion-desired flag is set when a client (host or DNS server) desires that the DNS server perform recursion when it doesn't have the record. A 1-bit recursion-available field is set in a reply if the DNS server supports recursion. In the header, there are also four "number-of" fields. These fields indicate the number of occurrences of the four types of data sections that follow the header.

The question section contains information about the query that is being made. This section includes (1) a name field that contains the name that is being queried, and (2) a type field that indicates the type of question being asked about the name---for example, a host address associated with a name (Type A) or the mail server for a name (Type MX).

In a reply from a DNS server, the answer section contains the resource records for the name that was originally queried. Recall that in each resource record there is the Type (for example, A, NS, CNAME, and MX), the Value, and the TTL.
A reply can return multiple RRs in the answer, since a hostname can have multiple IP addresses (for example, for replicated Web servers, as discussed earlier in this section).

The authority section contains records of other authoritative servers.

The additional section contains other helpful records. For example, the answer field in a reply to an MX query contains a resource record providing the canonical hostname of a mail server. The additional section contains a Type A record providing the IP address for the canonical hostname of the mail server.

How would you like to send a DNS query message directly from the host you're working on to some DNS server? This can easily be done with the nslookup program, which is available on most Windows and UNIX platforms. For example, from a Windows host, open the Command Prompt and invoke the nslookup program by simply typing "nslookup." After invoking nslookup, you can send a DNS query to any DNS server (root, TLD, or authoritative). After receiving the reply message from the DNS server, nslookup will display the records included in the reply (in a human-readable format). As an alternative to running nslookup from your own host, you can visit one of many Web sites that allow you to remotely employ nslookup. (Just type "nslookup" into a search engine and you'll be brought to one of these sites.) The DNS Wireshark lab at the end of this chapter will allow you to explore the DNS in much more detail.

Inserting Records into the DNS Database

The discussion above focused on how records are retrieved from the DNS database. You might be wondering how records get into the database in the first place. Let's look at how this is done in the context of a specific example. Suppose you have just created an exciting new startup company called Network Utopia. The first thing you'll surely want to do is register the domain name networkutopia.com at a registrar. A registrar is a commercial entity that verifies the uniqueness of the domain name, enters the domain name into the DNS database (as discussed below), and collects a small fee from you for its services. Prior to 1999, a single registrar, Network Solutions, had a monopoly on domain name registration for com, net, and org domains. But now there are many registrars competing for customers, and the Internet Corporation for Assigned Names and Numbers (ICANN) accredits the various registrars. A complete list of accredited registrars is available at http://www.internic.net.

When you register the domain name networkutopia.com with some registrar, you also need to provide the registrar with the names and IP addresses of your primary and secondary authoritative DNS servers. Suppose the names and IP addresses are dns1.networkutopia.com, dns2.networkutopia.com, 212.212.212.1, and 212.212.212.2. For each of these two authoritative DNS servers, the registrar would then make sure that a Type NS and a Type A record are entered into the TLD com servers.
Specifically, for the primary authoritative server for networkutopia.com, the registrar would insert the following two resource records into the DNS system:

(networkutopia.com, dns1.networkutopia.com, NS)
(dns1.networkutopia.com, 212.212.212.1, A)

FOCUS ON SECURITY

DNS VULNERABILITIES

We have seen that DNS is a critical component of the Internet infrastructure, with many important services---including the Web and e-mail---simply incapable of functioning without it. We therefore naturally ask, how can DNS be attacked? Is DNS a sitting duck, waiting to be knocked out of service, while taking most Internet applications down with it?

The first type of attack that comes to mind is a DDoS bandwidth-flooding attack (see Section 1.6) against DNS servers. For example, an attacker could attempt to send to each DNS root server a deluge of packets, so many that the majority of legitimate DNS queries never get answered. Such a large-scale DDoS attack against DNS root servers actually took place on October 21, 2002. In this attack, the attackers leveraged a botnet to send truckloads of ICMP ping messages to each of the 13 DNS root IP addresses. (ICMP messages are discussed in Section 5.6. For now, it suffices to know that ICMP packets are special types of IP datagrams.) Fortunately, this large-scale attack caused minimal damage, having little or no impact on users' Internet experience. The attackers did succeed at directing a deluge of packets at the root servers. But many of the DNS root servers were protected by packet filters, configured to always block all ICMP ping messages directed at the root servers. These protected servers were thus spared and functioned as normal. Furthermore, most local DNS servers cache the IP addresses of top-level-domain servers, allowing the query process to often bypass the DNS root servers.

A potentially more effective DDoS attack against DNS would be to send a deluge of DNS queries to top-level-domain servers, for example, to all the top-level-domain servers that handle the .com domain. It would be harder to filter DNS queries directed to DNS servers; and top-level-domain servers are not as easily bypassed as are root servers. But the severity of such an attack would be partially mitigated by caching in local DNS servers.

DNS could potentially be attacked in other ways. In a man-in-the-middle attack, the attacker intercepts queries from hosts and returns bogus replies. In the DNS poisoning attack, the attacker sends bogus replies to a DNS server, tricking the server into accepting bogus records into its cache. Either of these attacks could be used, for example, to redirect an unsuspecting Web user to the attacker's Web site. These attacks, however, are difficult to implement, as they require intercepting packets or throttling servers [Skoudis 2006].

In summary, DNS has demonstrated itself to be surprisingly robust against attacks. To date, there hasn't been an attack that has successfully impeded the DNS service.

You'll also have to make sure that the Type A resource record for your Web server www.networkutopia.com and the Type MX resource record for your mail server mail.networkutopia.com are entered into your authoritative DNS servers. (Until recently, the contents of each DNS server were configured statically, for example, from a configuration file created by a system manager.
More recently, an UPDATE option has been added to the DNS protocol to allow data to be dynamically added or deleted from the database via DNS messages. [RFC 2136] and [RFC 3007] specify DNS dynamic updates.)

Once all of these steps are completed, people will be able to visit your Web site and send e-mail to the employees at your company. Let's conclude our discussion of DNS by verifying that this statement is true. This verification also helps to solidify what we have learned about DNS. Suppose Alice in Australia wants to view the Web page www.networkutopia.com. As discussed earlier, her host will first send a DNS query to her local DNS server. The local DNS server will then contact a TLD com server. (The local DNS server will also have to contact a root DNS server if the address of a TLD com server is not cached.) This TLD server contains the Type NS and Type A resource records listed above, because the registrar had these resource records inserted into all of the TLD com servers. The TLD com server sends a reply to Alice's local DNS server, with the reply containing the two resource records. The local DNS server then sends a DNS query to 212.212.212.1, asking for the Type A record corresponding to www.networkutopia.com. This record provides the IP address of the desired Web server, say, 212.212.71.4, which the local DNS server passes back to Alice's host. Alice's browser can now initiate a TCP connection to the host 212.212.71.4 and send an HTTP request over the connection. Whew! There's a lot more going on than what meets the eye when one surfs the Web!

2.5 Peer-to-Peer File Distribution

The applications described in this chapter thus far---including the Web, e-mail, and DNS---all employ client-server architectures with significant reliance on always-on infrastructure servers. Recall from Section 2.1.1 that with a P2P architecture, there is minimal (or no) reliance on always-on infrastructure servers. Instead, pairs of intermittently connected hosts, called peers, communicate directly with each other. The peers are not owned by a service provider, but are instead desktops and laptops controlled by users.

In this section we consider a very natural P2P application, namely, distributing a large file from a single server to a large number of hosts (called peers). The file might be a new version of the Linux operating system, a software patch for an existing operating system or application, an MP3 music file, or an MPEG video file. In client-server file distribution, the server must send a copy of the file to each of the peers---placing an enormous burden on the server and consuming a large amount of server bandwidth. In P2P file distribution, each peer can redistribute any portion of the file it has received to any other peers, thereby assisting the server in the distribution process. As of 2016, the most popular P2P file distribution protocol is BitTorrent. The protocol was originally developed by Bram Cohen; there are now many independent BitTorrent clients conforming to the BitTorrent protocol, just as there are a number of Web browser clients that conform to the HTTP protocol. In this subsection, we first examine the self-scalability of P2P architectures in the context of file distribution. We then describe BitTorrent in some detail, highlighting its most important characteristics and features.
Scalability of P2P Architectures

To compare client-server architectures with peer-to-peer architectures, and illustrate the inherent self-scalability of P2P, we now consider a simple quantitative model for distributing a file to a fixed set of peers for both architecture types. As shown in Figure 2.22, the server and the peers are connected to the Internet with access links. Denote the upload rate of the server's access link by $u_s$, the upload rate of the $i$th peer's access link by $u_i$, and the download rate of the $i$th peer's access link by $d_i$. Also denote the size of the file to be distributed (in bits) by $F$ and the number of peers that want to obtain a copy of the file by $N$. The distribution time is the time it takes to get a copy of the file to all $N$ peers.

Figure 2.22 An illustrative file distribution problem

In our analysis of the distribution time below, for both client-server and P2P architectures, we make the simplifying (and generally accurate [Akella 2003]) assumption that the Internet core has abundant bandwidth, implying that all of the bottlenecks are in access networks. We also suppose that the server and clients are not participating in any other network applications, so that all of their upload and download access bandwidth can be fully devoted to distributing this file.

Let's first determine the distribution time for the client-server architecture, which we denote by $D_{cs}$. In the client-server architecture, none of the peers aids in distributing the file. We make the following observations:

The server must transmit one copy of the file to each of the $N$ peers. Thus the server must transmit $NF$ bits. Since the server's upload rate is $u_s$, the time to distribute the file must be at least $NF/u_s$.

Let $d_{\min}$ denote the download rate of the peer with the lowest download rate, that is, $d_{\min} = \min\{d_1, d_2, \ldots, d_N\}$. The peer with the lowest download rate cannot obtain all $F$ bits of the file in less than $F/d_{\min}$ seconds. Thus the minimum distribution time is at least $F/d_{\min}$.

Putting these two observations together, we obtain

$$D_{cs} \geq \max\left\{\frac{NF}{u_s}, \frac{F}{d_{\min}}\right\}$$

This provides a lower bound on the minimum distribution time for the client-server architecture. In the homework problems you will be asked to show that the server can schedule its transmissions so that the lower bound is actually achieved. So let's take this lower bound provided above as the actual distribution time, that is,

$$D_{cs} = \max\left\{\frac{NF}{u_s}, \frac{F}{d_{\min}}\right\} \qquad (2.1)$$

We see from Equation 2.1 that for $N$ large enough, the client-server distribution time is given by $NF/u_s$. Thus, the distribution time increases linearly with the number of peers $N$. So, for example, if the number of peers from one week to the next increases a thousand-fold from a thousand to a million, the time required to distribute the file to all peers increases by a factor of 1,000.

Let's now go through a similar analysis for the P2P architecture, where each peer can assist the server in distributing the file. In particular, when a peer receives some file data, it can use its own upload capacity to redistribute the data to other peers. Calculating the distribution time for the P2P architecture is somewhat more complicated than for the client-server architecture, since the distribution time depends on how each peer distributes portions of the file to the other peers. Nevertheless, a simple expression for the minimal distribution time can be obtained [Kumar 2006].
To this end, we first make the following observations:

At the beginning of the distribution, only the server has the file. To get this file into the community of peers, the server must send each bit of the file at least once into its access link. Thus, the minimum distribution time is at least $F/u_s$. (Unlike the client-server scheme, a bit sent once by the server may not have to be sent by the server again, as the peers may redistribute the bit among themselves.)

As with the client-server architecture, the peer with the lowest download rate cannot obtain all $F$ bits of the file in less than $F/d_{\min}$ seconds. Thus the minimum distribution time is at least $F/d_{\min}$.

Finally, observe that the total upload capacity of the system as a whole is equal to the upload rate of the server plus the upload rates of each of the individual peers, that is, $u_{total} = u_s + u_1 + \cdots + u_N$. The system must deliver (upload) $F$ bits to each of the $N$ peers, thus delivering a total of $NF$ bits. This cannot be done at a rate faster than $u_{total}$. Thus, the minimum distribution time is also at least $NF/(u_s + u_1 + \cdots + u_N)$.

Putting these three observations together, we obtain the minimum distribution time for P2P, denoted by $D_{P2P}$:

$$D_{P2P} \geq \max\left\{\frac{F}{u_s}, \frac{F}{d_{\min}}, \frac{NF}{u_s + \sum_{i=1}^{N} u_i}\right\} \qquad (2.2)$$

Equation 2.2 provides a lower bound for the minimum distribution time for the P2P architecture. It turns out that if we imagine that each peer can redistribute a bit as soon as it receives the bit, then there is a redistribution scheme that actually achieves this lower bound [Kumar 2006]. (We will prove a special case of this result in the homework.) In reality, where chunks of the file are redistributed rather than individual bits, Equation 2.2 serves as a good approximation of the actual minimum distribution time. Thus, let's take the lower bound provided by Equation 2.2 as the actual minimum distribution time, that is,

$$D_{P2P} = \max\left\{\frac{F}{u_s}, \frac{F}{d_{\min}}, \frac{NF}{u_s + \sum_{i=1}^{N} u_i}\right\} \qquad (2.3)$$

Figure 2.23 compares the minimum distribution time for the client-server and P2P architectures assuming that all peers have the same upload rate $u$. In Figure 2.23, we have set $F/u = 1$ hour, $u_s = 10u$, and $d_{\min} \geq u_s$. Thus, a peer can transmit the entire file in one hour, the server transmission rate is 10 times the peer upload rate, and (for simplicity) the peer download rates are set large enough so as not to have an effect.

Figure 2.23 Distribution time for P2P and client-server architectures

We see from Figure 2.23 that for the client-server architecture, the distribution time increases linearly and without bound as the number of peers increases. However, for the P2P architecture, the minimal distribution time is not only always less than the distribution time of the client-server architecture; it is also less than one hour for any number of peers $N$. Thus, applications with the P2P architecture can be self-scaling. This scalability is a direct consequence of peers being redistributors as well as consumers of bits.

BitTorrent

BitTorrent is a popular P2P protocol for file distribution [Chao 2011]. In BitTorrent lingo, the collection of all peers participating in the distribution of a particular file is called a torrent. Peers in a torrent download equal-size chunks of the file from one another, with a typical chunk size of 256 KBytes. When a peer first joins a torrent, it has no chunks. Over time it accumulates more and more chunks. While it downloads chunks it also uploads chunks to other peers.
Once a peer has acquired the entire file, it may +(selfishly) leave the torrent, or (altruistically) remain in the torrent +and continue to upload chunks to other peers. Also, any peer may leave +the torrent at any time with only a subset of chunks, and later rejoin +the torrent. Let's now take a closer look at how BitTorrent operates. +Since BitTorrent is a rather complicated protocol and system, we'll only +describe its most important mechanisms, sweeping some of the details +under the rug; this will allow us to see the forest through the trees. +Each torrent has an infrastructure node called a tracker. + +Figure 2.24 File distribution with BitTorrent + +When a peer joins a torrent, it registers itself with the tracker and +periodically informs the tracker that it is still in the torrent. In +this manner, the tracker keeps track of the peers that are participating +in the torrent. A given torrent may have fewer than ten or more than a +thousand peers participating at any instant of time. + +As shown in Figure 2.24, when a new peer, Alice, joins the torrent, the +tracker randomly selects a subset of peers (for concreteness, say 50) +from the set of participating peers, and sends the IP addresses of these +50 peers to Alice. Possessing this list of peers, Alice attempts to +establish concurrent TCP connections with all the peers on this list. +Let's call all the peers with which Alice succeeds in establishing a TCP +connection "neighboring peers." (In Figure 2.24, Alice is shown to have +only three neighboring peers. Normally, she would have many more.) As +time evolves, some of these peers may leave and other peers (outside the +initial 50) may attempt to establish TCP connections with Alice. So a +peer's neighboring peers will fluctuate over time. At any given time, +each peer will have a subset of chunks from the file, with different +peers having different subsets. Periodically, Alice will ask each of her +neighboring peers (over the TCP connections) for the list of the chunks +they have. If Alice has L different neighbors, she will obtain L lists +of chunks. With this knowledge, Alice will issue requests (again over +the TCP connections) for chunks she currently does not have. So at any +given instant of time, Alice will have a subset of chunks and will know +which chunks her neighbors have. With this information, Alice will have +two important decisions to make. First, which chunks should she request +first from her neighbors? And second, to which of her neighbors should +she send requested chunks? In deciding which chunks to request, Alice +uses a technique called rarest first. The idea is to determine, from +among the chunks she does not have, the chunks that are the rarest among +her neighbors (that is, the chunks that have the fewest repeated copies +among her neighbors) and then request those rarest chunks first. In this +manner, the rarest chunks get more quickly redistributed, aiming to +(roughly) equalize the numbers of copies of each chunk in the torrent. +To determine which requests she responds to, BitTorrent uses a clever +trading algorithm. The basic idea is that Alice gives priority to the +neighbors that are currently supplying her data at the highest rate. +Specifically, for each of her neighbors, Alice continually measures the +rate at which she receives bits and determines the four peers that are +feeding her bits at the highest rate. She then reciprocates by sending +chunks to these same four peers. 
Every 10 seconds, she recalculates the rates and possibly modifies the set of four peers. In BitTorrent lingo, these four peers are said to be unchoked. Importantly, every 30 seconds, she also picks one additional neighbor at random and sends it chunks. Let's call the randomly chosen peer Bob. In BitTorrent lingo, Bob is said to be optimistically unchoked. Because Alice is sending data to Bob, she may become one of Bob's top four uploaders, in which case Bob would start to send data to Alice. If the rate at which Bob sends data to Alice is high enough, Bob could then, in turn, become one of Alice's top four uploaders. In other words, every 30 seconds, Alice will randomly choose a new trading partner and initiate trading with that partner. If the two peers are satisfied with the trading, they will put each other in their top four lists and continue trading with each other until one of the peers finds a better partner. The effect is that peers capable of uploading at compatible rates tend to find each other. The random neighbor selection also allows new peers to get chunks, so that they can have something to trade. All other neighboring peers besides these five peers (four "top" peers and one probing peer) are "choked," that is, they do not receive any chunks from Alice.

BitTorrent has a number of interesting mechanisms that are not discussed here, including pieces (mini-chunks), pipelining, random first selection, endgame mode, and anti-snubbing [Cohen 2003]. The incentive mechanism for trading just described is often referred to as tit-for-tat [Cohen 2003]. It has been shown that this incentive scheme can be circumvented [Liogkas 2006; Locher 2006; Piatek 2007]. Nevertheless, the BitTorrent ecosystem is wildly successful, with millions of simultaneous peers actively sharing files in hundreds of thousands of torrents. If BitTorrent had been designed without tit-for-tat (or a variant), but otherwise exactly the same, BitTorrent would likely not even exist now, as the majority of the users would have been freeriders [Saroiu 2002].

We close our discussion on P2P by briefly mentioning another application of P2P, namely, Distributed Hash Tables (DHTs). A distributed hash table is a simple database, with the database records being distributed over the peers in a P2P system. DHTs have been widely implemented (e.g., in BitTorrent) and have been the subject of extensive research. An overview is provided in a Video Note in the companion website.

Walking through distributed hash tables
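Before turning to video streaming, it's worth plugging numbers into the analysis above. The following short sketch evaluates Equations 2.1 and 2.3 under the same assumptions as Figure 2.23 (all N peers upload at rate u, F/u = 1 hour, the server uploads at 10u, and download rates are too large to matter); the particular values of N are illustrative.

```python
def d_cs(N, F, u_s, d_min):
    """Client-server distribution time, Equation 2.1."""
    return max(N * F / u_s, F / d_min)

def d_p2p(N, F, u_s, d_min, peer_uploads):
    """Minimum P2P distribution time, Equation 2.3."""
    return max(F / u_s, F / d_min, N * F / (u_s + sum(peer_uploads)))

# Setup of Figure 2.23: measure rates in units of u, so F/u = 1 hour means F = 1.
u, F = 1.0, 1.0
u_s, d_min = 10 * u, float("inf")     # server uploads at 10u; downloads don't bind

for N in (1, 5, 10, 20, 30):
    print(f"N={N:2d}  client-server {d_cs(N, F, u_s, d_min):.2f} h   "
          f"P2P {d_p2p(N, F, u_s, d_min, [u] * N):.2f} h")
```

As in Figure 2.23, the client-server time grows linearly with N (3 hours at N = 30), while the P2P time stays below one hour for any N.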
These prerecorded videos are placed on servers, and users send requests to the servers to view the videos on demand. Many Internet companies today provide streaming video, including Netflix, YouTube (Google), Amazon, and Youku. But before launching into a discussion of video streaming, we should first get a quick feel for the video medium itself. A video is a sequence of images, typically being displayed at a constant rate, for example, at 24 or 30 images per second. An uncompressed, digitally encoded image consists of an array of pixels, with each pixel encoded into a number of bits to represent luminance and color. An important characteristic of video is that it can be compressed, thereby trading off video quality with bit rate. Today's off-the-shelf compression algorithms can compress a video to essentially any bit rate desired. Of course, the higher the bit rate, the better the image quality and the better the overall user viewing experience.

From a networking perspective, perhaps the most salient characteristic of video is its high bit rate. Compressed Internet video typically ranges from 100 kbps for low-quality video to over 3 Mbps for streaming high-definition movies; 4K streaming envisions a bit rate of more than 10 Mbps. This can translate to a huge amount of traffic and storage, particularly for high-end video. For example, a single 2 Mbps video with a duration of 67 minutes will consume 1 gigabyte of storage and traffic. By far, the most important performance measure for streaming video is average end-to-end throughput. In order to provide continuous playout, the network must provide an average throughput to the streaming application that is at least as large as the bit rate of the compressed video.

We can also use compression to create multiple versions of the same video, each at a different quality level. For example, we can use compression to create, say, three versions of the same video, at rates of 300 kbps, 1 Mbps, and 3 Mbps. Users can then decide which version they want to watch as a function of their current available bandwidth. Users with high-speed Internet connections might choose the 3 Mbps version; users watching the video over 3G with a smartphone might choose the 300 kbps version.

2.6.2 HTTP Streaming and DASH

In HTTP streaming, the video is simply stored at an HTTP server as an ordinary file with a specific URL. When a user wants to see the video, the client establishes a TCP connection with the server and issues an HTTP GET request for that URL. The server then sends the video file, within an HTTP response message, as quickly as the underlying network protocols and traffic conditions will allow. On the client side, the bytes are collected in a client application buffer. Once the number of bytes in this buffer exceeds a predetermined threshold, the client application begins playback---specifically, the streaming video application periodically grabs video frames from the client application buffer, decompresses the frames, and displays them on the user's screen. Thus, the video streaming application is displaying video as it is receiving and buffering frames corresponding to latter parts of the video.
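The interplay between the arrival rate and the playout rate around this buffer threshold is easy to see in a toy simulation. The numbers below (a constant 2.5 Mbps delivery rate, a 2 Mbps video, and a 1 MB playback threshold) are illustrative assumptions, not values from any real player.

```python
network_rate = 2_500_000 / 8   # bytes/s delivered by the TCP connection (2.5 Mbps)
video_rate   = 2_000_000 / 8   # bytes/s consumed once playback starts (2 Mbps video)
threshold    = 1_000_000       # playback begins after 1 MB has been buffered

buffered, playing = 0.0, False
for t in range(1, 11):                       # one-second time steps
    buffered += network_rate                 # bytes keep arriving from the server
    if not playing and buffered >= threshold:
        playing = True
        print(f"t={t}s: threshold reached, playback starts")
    if playing:
        buffered -= video_rate               # the player drains the buffer
    print(f"t={t}s: {buffered / 1e6:.2f} MB buffered")
```

With delivery faster than the consumption rate, the buffer keeps growing after playback begins; if the network rate dropped below the video rate, the buffer would drain and playback would eventually freeze, which is the motivation for the adaptive scheme described next.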
Although HTTP streaming, as just described, has been extensively deployed in practice (for example, by YouTube since its inception), it has a major shortcoming: All clients receive the same encoding of the video, despite the large variations in the amount of bandwidth available to a client, both across different clients and also over time for the same client. This has led to the development of a new type of HTTP-based streaming, often referred to as Dynamic Adaptive Streaming over HTTP (DASH). In DASH, the video is encoded into several different versions, with each version having a different bit rate and, correspondingly, a different quality level. The client dynamically requests chunks of video segments of a few seconds in length. When the amount of available bandwidth is high, the client naturally selects chunks from a high-rate version; and when the available bandwidth is low, it naturally selects from a low-rate version. The client selects different chunks one at a time with HTTP GET request messages [Akhshabi 2011].

DASH allows clients with different Internet access rates to stream video at different encoding rates. Clients with low-speed 3G connections can receive a low bit-rate (and low-quality) version, and clients with fiber connections can receive a high-quality version. DASH also allows a client to adapt to the available bandwidth if the available end-to-end bandwidth changes during the session. This feature is particularly important for mobile users, who typically see their bandwidth availability fluctuate as they move with respect to the base stations.

With DASH, each video version is stored in the HTTP server, each with a different URL. The HTTP server also has a manifest file, which provides a URL for each version along with its bit rate. The client first requests the manifest file and learns about the various versions. The client then selects one chunk at a time by specifying a URL and a byte range in an HTTP GET request message for each chunk. While downloading chunks, the client also measures the received bandwidth and runs a rate determination algorithm to select the chunk to request next. Naturally, if the client has a lot of video buffered and if the measured receive bandwidth is high, it will choose a chunk from a high-bitrate version. And naturally if the client has little video buffered and the measured received bandwidth is low, it will choose a chunk from a low-bitrate version. DASH therefore allows the client to freely switch among different quality levels.

2.6.3 Content Distribution Networks

Today, many Internet video companies are distributing on-demand multi-Mbps streams to millions of users on a daily basis. YouTube, for example, with a library of hundreds of millions of videos, distributes hundreds of millions of video streams to users around the world every day. Streaming all this traffic to locations all over the world while providing continuous playout and high interactivity is clearly a challenging task. For an Internet video company, perhaps the most straightforward approach to providing streaming video service is to build a single massive data center, store all of its videos in the data center, and stream the videos directly from the data center to clients worldwide. But there are three major problems with this approach.
First, if the client is far from the data center, server-to-client packets will cross many communication links and likely pass through many ISPs, with some of the ISPs possibly located on different continents. If one of these links provides a throughput that is less than the video consumption rate, the end-to-end throughput will also be below the consumption rate, resulting in annoying freezing delays for the user. (Recall from Chapter 1 that the end-to-end throughput of a stream is governed by the throughput at the bottleneck link.) The likelihood of this happening increases as the number of links in the end-to-end path increases. A second drawback is that a popular video will likely be sent many times over the same communication links. Not only does this waste network bandwidth, but the Internet video company itself will be paying its provider ISP (connected to the data center) for sending the same bytes into the Internet over and over again. A third problem with this solution is that a single data center represents a single point of failure---if the data center or its links to the Internet go down, it would not be able to distribute any video streams.

In order to meet the challenge of distributing massive amounts of video data to users distributed around the world, almost all major video-streaming companies make use of Content Distribution Networks (CDNs). A CDN manages servers in multiple geographically distributed locations, stores copies of the videos (and other types of Web content, including documents, images, and audio) in its servers, and attempts to direct each user request to a CDN location that will provide the best user experience. The CDN may be a private CDN, that is, owned by the content provider itself; for example, Google's CDN distributes YouTube videos and other types of content. The CDN may alternatively be a third-party CDN that distributes content on behalf of multiple content providers; Akamai, Limelight, and Level-3 all operate third-party CDNs. A very readable overview of modern CDNs is [Leighton 2009; Nygren 2010].

CDNs typically adopt one of two different server placement philosophies [Huang 2008]:

Enter Deep. One philosophy, pioneered by Akamai, is to enter deep into the access networks of Internet Service Providers, by deploying server clusters in access ISPs all over the world. (Access networks are described in Section 1.3.) Akamai takes this approach with clusters in approximately 1,700 locations. The goal is to get close to end users, thereby improving user-perceived delay and throughput by decreasing the number of links and routers between the end user and the CDN server from which it receives content. Because of this highly distributed design, the task of maintaining and managing the clusters becomes challenging.

Bring Home. A second design philosophy, taken by Limelight and many other CDN companies, is to bring the ISPs home by building large clusters at a smaller number (for example, tens) of sites. Instead of getting inside the access ISPs, these CDNs typically place their clusters in Internet Exchange Points (IXPs) (see Section 1.3). Compared with the enter-deep design philosophy, the bring-home design typically results in lower maintenance and management overhead, possibly at the expense of higher delay and lower throughput to end users.
Once its clusters are in place, the CDN replicates content across its clusters. The CDN may not want to place a copy of every video in each cluster, since some videos are rarely viewed or are only popular in some countries. In fact, many CDNs do not push videos to their clusters but instead use a simple pull strategy: If a client requests a video from a cluster that is not storing the video, then the cluster retrieves the video (from a central repository or from another cluster) and stores a copy locally while streaming the video to the client at the same time. Similar to Web caching (see Section 2.2.5), when a cluster's storage becomes full, it removes videos that are not frequently requested.

CASE STUDY

GOOGLE'S NETWORK INFRASTRUCTURE

To support its vast array of cloud services---including search, Gmail, calendar, YouTube video, maps, documents, and social networks---Google has deployed an extensive private network and CDN infrastructure. Google's CDN infrastructure has three tiers of server clusters:

Fourteen "mega data centers," with eight in North America, four in Europe, and two in Asia [Google Locations 2016], with each data center having on the order of 100,000 servers. These mega data centers are responsible for serving dynamic (and often personalized) content, including search results and Gmail messages.

An estimated 50 clusters in IXPs scattered throughout the world, with each cluster consisting of on the order of 100-500 servers [Adhikari 2011a]. These clusters are responsible for serving static content, including YouTube videos [Adhikari 2011a].

Many hundreds of "enter-deep" clusters located within an access ISP. Here a cluster typically consists of tens of servers within a single rack. These enter-deep servers perform TCP splitting (see Section 3.7) and serve static content [Chen 2011], including the static portions of Web pages that embody search results.

All of these data centers and cluster locations are networked together with Google's own private network. When a user makes a search query, often the query is first sent over the local ISP to a nearby enter-deep cache, from where the static content is retrieved; while providing the static content to the client, the nearby cache also forwards the query over Google's private network to one of the mega data centers, from where the personalized search results are retrieved. For a YouTube video, the video itself may come from one of the bring-home caches, whereas portions of the Web page surrounding the video may come from the nearby enter-deep cache, and the advertisements surrounding the video come from the data centers. In summary, except for the local ISPs, the Google cloud services are largely provided by a network infrastructure that is independent of the public Internet.

CDN Operation

Having identified the two major approaches toward deploying a CDN, let's now dive down into the nuts and bolts of how a CDN operates. When a browser in a user's host is instructed to retrieve a specific video (identified by a URL), the CDN must intercept the request so that it can (1) determine a suitable CDN server cluster for that client at that time, and (2) redirect the client's request to a server in that cluster. We'll shortly discuss how a CDN can determine a suitable cluster. But first let's examine the mechanics behind intercepting and redirecting a request. Most CDNs take advantage of DNS to intercept and redirect requests; an interesting discussion of such a use of the DNS is [Vixie 2009].
Let's consider a simple example to illustrate how the DNS is typically involved. Suppose a content provider, NetCinema, employs the third-party CDN company, KingCDN, to distribute its videos to its customers. On the NetCinema Web pages, each of its videos is assigned a URL that includes the string "video" and a unique identifier for the video itself; for example, Transformers 7 might be assigned http://video.netcinema.com/6Y7B23V. Six steps then occur, as shown in Figure 2.25:

1. The user visits the Web page at NetCinema.

2. When the user clicks on the link http://video.netcinema.com/6Y7B23V, the user's host sends a DNS query for video.netcinema.com.

3. The user's Local DNS Server (LDNS) relays the DNS query to an authoritative DNS server for NetCinema, which observes the string "video" in the hostname video.netcinema.com. To "hand over" the DNS query to KingCDN, instead of returning an IP address, the NetCinema authoritative DNS server returns to the LDNS a hostname in the KingCDN's domain, for example, a1105.kingcdn.com.

4. From this point on, the DNS query enters into KingCDN's private DNS infrastructure. The user's LDNS then sends a second query, now for a1105.kingcdn.com, and KingCDN's DNS system eventually returns the IP address of a KingCDN content server to the LDNS. It is thus here, within the KingCDN's DNS system, that the CDN server from which the client will receive its content is specified.

Figure 2.25 DNS redirects a user's request to a CDN server

5. The LDNS forwards the IP address of the content-serving CDN node to the user's host.

6. Once the client receives the IP address for a KingCDN content server, it establishes a direct TCP connection with the server at that IP address and issues an HTTP GET request for the video. If DASH is used, the server will first send to the client a manifest file with a list of URLs, one for each version of the video, and the client will dynamically select chunks from the different versions.

Cluster Selection Strategies

At the core of any CDN deployment is a cluster selection strategy, that is, a mechanism for dynamically directing clients to a server cluster or a data center within the CDN. As we just saw, the CDN learns the IP address of the client's LDNS server via the client's DNS lookup. After learning this IP address, the CDN needs to select an appropriate cluster based on this IP address. CDNs generally employ proprietary cluster selection strategies. We now briefly survey a few approaches, each of which has its own advantages and disadvantages.

One simple strategy is to assign the client to the cluster that is geographically closest. Using commercial geo-location databases (such as Quova [Quova 2016] and MaxMind [MaxMind 2016]), each LDNS IP address is mapped to a geographic location. When a DNS request is received from a particular LDNS, the CDN chooses the geographically closest cluster, that is, the cluster that is the fewest kilometers from the LDNS "as the bird flies." Such a solution can work reasonably well for a large fraction of the clients [Agarwal 2009]. However, for some clients, the solution may perform poorly, since the geographically closest cluster may not be the closest cluster in terms of the length or number of hops of the network path.
A further problem, inherent with all DNS-based approaches, is that some end users are configured to use remotely located LDNSs [Shaikh 2001; Mao 2002], in which case the LDNS location may be far from the client's location. Moreover, this simple strategy ignores the variation in delay and available bandwidth over time of Internet paths, always assigning the same cluster to a particular client. In order to determine the best cluster for a client based on the current traffic conditions, CDNs can instead perform periodic real-time measurements of delay and loss performance between their clusters and clients. For instance, a CDN can have each of its clusters periodically send probes (for example, ping messages or DNS queries) to all of the LDNSs around the world. One drawback of this approach is that many LDNSs are configured to not respond to such probes.

2.6.4 Case Studies: Netflix, YouTube, and Kankan

We conclude our discussion of streaming stored video by taking a look at three highly successful large-scale deployments: Netflix, YouTube, and Kankan. We'll see that each of these systems takes a very different approach, yet employs many of the underlying principles discussed in this section.

Netflix

Generating 37% of the downstream traffic in residential ISPs in North America in 2015, Netflix has become the leading service provider for online movies and TV series in the United States [Sandvine 2015]. As we discuss below, Netflix video distribution has two major components: the Amazon cloud and its own private CDN infrastructure. Netflix has a Web site that handles numerous functions, including user registration and login, billing, a movie catalogue for browsing and searching, and a movie recommendation system. As shown in Figure 2.26, this Web site (and its associated backend databases) runs entirely on Amazon servers in the Amazon cloud. Additionally, the Amazon cloud handles the following critical functions:

- Content ingestion. Before Netflix can distribute a movie to its customers, it must first ingest and process the movie. Netflix receives studio master versions of movies and uploads them to hosts in the Amazon cloud.
- Content processing. The machines in the Amazon cloud create many different formats for each movie, suitable for a diverse array of client video players running on desktop computers, smartphones, and game consoles connected to televisions. A different version is created for each of these formats and at multiple bit rates, allowing for adaptive streaming over HTTP using DASH.
- Uploading versions to its CDN. Once all of the versions of a movie have been created, the hosts in the Amazon cloud upload the versions to its CDN.

Figure 2.26 Netflix video streaming platform

When Netflix first rolled out its video streaming service in 2007, it employed three third-party CDN companies to distribute its video content. Netflix has since created its own private CDN, from which it now streams all of its videos. (Netflix still uses Akamai to distribute its Web pages, however.) To create its own CDN, Netflix has installed server racks both in IXPs and within residential ISPs themselves. Netflix currently has server racks in over 50 IXP locations; see [Netflix Open Connect 2016] for a current list of IXPs housing Netflix racks.
There are also hundreds of ISP locations housing Netflix racks; also see [Netflix Open Connect 2016], where Netflix provides to potential ISP partners instructions about installing a (free) Netflix rack for their networks. Each server in the rack has several 10 Gbps Ethernet ports and over 100 terabytes of storage. The number of servers in a rack varies: IXP installations often have tens of servers and contain the entire Netflix streaming video library, including multiple versions of the videos to support DASH; local ISPs may have only one server and contain only the most popular videos. Netflix does not use pull-caching (Section 2.2.5) to populate its CDN servers in the IXPs and ISPs. Instead, Netflix distributes by pushing the videos to its CDN servers during off-peak hours. For those locations that cannot hold the entire library, Netflix pushes only the most popular videos, which are determined on a day-to-day basis. The Netflix CDN design is described in some detail in the YouTube videos [Netflix Video 1] and [Netflix Video 2].

Having described the components of the Netflix architecture, let's take a closer look at the interaction between the client and the various servers that are involved in movie delivery. As indicated earlier, the Web pages for browsing the Netflix video library are served from servers in the Amazon cloud. When a user selects a movie to play, the Netflix software, running in the Amazon cloud, first determines which of its CDN servers have copies of the movie. Among the servers that have the movie, the software then determines the "best" server for that client request. If the client is using a residential ISP that has a Netflix CDN server rack installed in that ISP, and this rack has a copy of the requested movie, then a server in this rack is typically selected. If not, a server at a nearby IXP is typically selected.

Once Netflix determines the CDN server that is to deliver the content, it sends the client the IP address of the specific server as well as a manifest file, which has the URLs for the different versions of the requested movie. The client and that CDN server then directly interact using a proprietary version of DASH. Specifically, as described in Section 2.6.2, the client uses the byte-range header in HTTP GET request messages to request chunks from the different versions of the movie. Netflix uses chunks that are approximately four seconds long [Adhikari 2012]. While the chunks are being downloaded, the client measures the received throughput and runs a rate-determination algorithm to determine the quality of the next chunk to request.

Netflix embodies many of the key principles discussed earlier in this section, including adaptive streaming and CDN distribution. However, because Netflix uses its own private CDN, which distributes only video (and not Web pages), Netflix has been able to simplify and tailor its CDN design. In particular, Netflix does not need to employ DNS redirect, as discussed in Section 2.6.3, to connect a particular client to a CDN server; instead, the Netflix software (running in the Amazon cloud) directly tells the client to use a particular CDN server. Furthermore, the Netflix CDN uses push caching rather than pull caching (Section 2.2.5): content is pushed into the servers at scheduled times at off-peak hours, rather than dynamically during cache misses.
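The rate-determination algorithm itself is proprietary, but the mechanics just described (measure the throughput of each byte-range download, then request the highest-quality version the connection appears able to sustain) can be sketched in a few lines. Everything concrete below is an assumption for illustration: the version table a manifest might yield, the 0.8 safety factor, and the helper names are ours, not Netflix's.

```python
# Illustrative DASH-style adaptation over HTTP byte-range requests.
# The version table, safety factor, and server names are assumptions,
# not Netflix's actual protocol.
import time
from http.client import HTTPConnection

VERSIONS = {  # hypothetical manifest: bit rate (bps) -> URL path
    500_000: "/movie_500k.mp4",
    1_500_000: "/movie_1500k.mp4",
    3_000_000: "/movie_3000k.mp4",
}

def fetch_chunk(conn, path, first_byte, last_byte):
    """Fetch one chunk with a byte-range header; return (data, throughput in bps)."""
    t0 = time.time()
    conn.request("GET", path,
                 headers={"Range": "bytes=%d-%d" % (first_byte, last_byte)})
    body = conn.getresponse().read()
    elapsed = max(time.time() - t0, 1e-6)
    return body, 8 * len(body) / elapsed

def next_version(measured_bps, safety=0.8):
    """Pick the highest bit rate the measured throughput can sustain."""
    affordable = [rate for rate in VERSIONS if rate <= safety * measured_bps]
    return VERSIONS[max(affordable)] if affordable else VERSIONS[min(VERSIONS)]

# Example use (hypothetical server):
#   conn = HTTPConnection("cdn.example.net")
#   data, bps = fetch_chunk(conn, VERSIONS[500_000], 0, 999_999)
#   path = next_version(bps)  # quality for the next four-second chunk
```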
YouTube

With 300 hours of video uploaded to YouTube every minute and several billion video views per day [YouTube 2016], YouTube is indisputably the world's largest video-sharing site. YouTube began its service in April 2005 and was acquired by Google in November 2006. Although the Google/YouTube design and protocols are proprietary, through several independent measurement efforts we can gain a basic understanding of how YouTube operates [Zink 2009; Torres 2011; Adhikari 2011a]. As with Netflix, YouTube makes extensive use of CDN technology to distribute its videos [Torres 2011]. Similar to Netflix, Google uses its own private CDN to distribute YouTube videos, and has installed server clusters in many hundreds of different IXP and ISP locations. From these locations and directly from its huge data centers, Google distributes YouTube videos [Adhikari 2011a]. Unlike Netflix, however, Google uses pull caching, as described in Section 2.2.5, and DNS redirect, as described in Section 2.6.3. Most of the time, Google's cluster-selection strategy directs the client to the cluster for which the RTT between client and cluster is the lowest; however, in order to balance the load across clusters, sometimes the client is directed (via DNS) to a more distant cluster [Torres 2011].

YouTube employs HTTP streaming, often making a small number of different versions available for a video, each with a different bit rate and corresponding quality level. YouTube does not employ adaptive streaming (such as DASH), but instead requires the user to manually select a version. In order to save bandwidth and server resources that would be wasted by repositioning or early termination, YouTube uses the HTTP byte-range request to limit the flow of transmitted data after a target amount of video is prefetched.

Several million videos are uploaded to YouTube every day. Not only are YouTube videos streamed from server to client over HTTP, but YouTube uploaders also upload their videos from client to server over HTTP. YouTube processes each video it receives, converting it to a YouTube video format and creating multiple versions at different bit rates. This processing takes place entirely within Google data centers. (See the case study on Google's network infrastructure in Section 2.6.3.)

Kankan

We just saw that dedicated servers, operated by private CDNs, stream Netflix and YouTube videos to clients. Netflix and YouTube have to pay not only for the server hardware but also for the bandwidth the servers use to distribute the videos. Given the scale of these services and the amount of bandwidth they are consuming, such a CDN deployment can be costly. We conclude this section by describing an entirely different approach for providing video on demand over the Internet at a large scale---one that allows the service provider to significantly reduce its infrastructure and bandwidth costs. As you might suspect, this approach uses P2P delivery instead of (or along with) client-server delivery. Since 2011, Kankan (owned and operated by Xunlei) has been deploying P2P video delivery with great success, with tens of millions of users every month [Zhang 2015].

At a high level, P2P video streaming is very similar to BitTorrent file downloading. When a peer wants to see a video, it contacts a tracker to discover other peers in the system that have a copy of that video.
This requesting peer then requests chunks of the video in parallel from the other peers that have the video. Different from downloading with BitTorrent, however, requests are preferentially made for chunks that are to be played back in the near future, in order to ensure continuous playback [Dhungel 2012].

Recently, Kankan has migrated to a hybrid CDN-P2P streaming system [Zhang 2015]. Specifically, Kankan now deploys a few hundred servers within China and pushes video content to these servers. This Kankan CDN plays a major role in the start-up stage of video streaming. In most cases, the client requests the beginning of the content from CDN servers, and in parallel requests content from peers. When the total P2P traffic is sufficient for video playback, the client will cease streaming from the CDN and only stream from peers. But if the P2P streaming traffic becomes insufficient, the client will restart CDN connections and return to the mode of hybrid CDN-P2P streaming. In this manner, Kankan can ensure short initial start-up delays while minimally relying on costly infrastructure servers and bandwidth.

2.7 Socket Programming: Creating Network Applications

Now that we've looked at a number of important network applications, let's explore how network application programs are actually created. Recall from Section 2.1 that a typical network application consists of a pair of programs---a client program and a server program---residing in two different end systems. When these two programs are executed, a client process and a server process are created, and these processes communicate with each other by reading from, and writing to, sockets. When creating a network application, the developer's main task is therefore to write the code for both the client and server programs.

There are two types of network applications. One type is an implementation whose operation is specified in a protocol standard, such as an RFC or some other standards document; such an application is sometimes referred to as "open," since the rules specifying its operation are known to all. For such an implementation, the client and server programs must conform to the rules dictated by the RFC. For example, the client program could be an implementation of the client side of the HTTP protocol, described in Section 2.2 and precisely defined in RFC 2616; similarly, the server program could be an implementation of the HTTP server protocol, also precisely defined in RFC 2616. If one developer writes code for the client program and another developer writes code for the server program, and both developers carefully follow the rules of the RFC, then the two programs will be able to interoperate. Indeed, many of today's network applications involve communication between client and server programs that have been created by independent developers---for example, a Google Chrome browser communicating with an Apache Web server, or a BitTorrent client communicating with a BitTorrent tracker.

The other type of network application is a proprietary network application. In this case the client and server programs employ an application-layer protocol that has not been openly published in an RFC or elsewhere. A single developer (or development team) creates both the client and server programs, and the developer has complete control over what goes in the code.
But because the code does not implement an open protocol, other independent developers will not be able to develop code that interoperates with the application. In this section, we'll examine the key issues in developing a client-server application, and we'll "get our hands dirty" by looking at code that implements a very simple client-server application.

During the development phase, one of the first decisions the developer must make is whether the application is to run over TCP or over UDP. Recall that TCP is connection oriented and provides a reliable byte-stream channel through which data flows between two end systems. UDP is connectionless and sends independent packets of data from one end system to the other, without any guarantees about delivery. Recall also that when a client or server program implements a protocol defined by an RFC, it should use the well-known port number associated with the protocol; conversely, when developing a proprietary application, the developer must be careful to avoid using such well-known port numbers. (Port numbers were briefly discussed in Section 2.1. They are covered in more detail in Chapter 3.)

We introduce UDP and TCP socket programming by way of a simple UDP application and a simple TCP application. We present the simple UDP and TCP applications in Python 3. We could have written the code in Java, C, or C++, but we chose Python mostly because Python clearly exposes the key socket concepts. With Python there are fewer lines of code, and each line can be explained to the novice programmer without difficulty. But there's no need to be frightened if you are not familiar with Python. You should be able to easily follow the code if you have experience programming in Java, C, or C++. If you are interested in client-server programming with Java, you are encouraged to see the Companion Website for this textbook; in fact, you can find there all the examples in this section (and associated labs) in Java. For readers who are interested in client-server programming in C, there are several good references available [Donahoo 2001; Stevens 1997; Frost 1994; Kurose 1996]; our Python examples below have a similar look and feel to C.

2.7.1 Socket Programming with UDP

In this subsection, we'll write simple client-server programs that use UDP; in the following section, we'll write similar programs that use TCP. Recall from Section 2.1 that processes running on different machines communicate with each other by sending messages into sockets. We said that each process is analogous to a house and the process's socket is analogous to a door. The application resides on one side of the door in the house; the transport-layer protocol resides on the other side of the door in the outside world. The application developer has control of everything on the application-layer side of the socket but has little control of the transport-layer side.

Now let's take a closer look at the interaction between two communicating processes that use UDP sockets. Before the sending process can push a packet of data out the socket door, when using UDP, it must first attach a destination address to the packet. After the packet passes through the sender's socket, the Internet will use this destination address to route the packet through the Internet to the socket in the receiving process.
When the packet arrives at the receiving socket, the receiving process will retrieve the packet through the socket, and then inspect the packet's contents and take appropriate action. So you may now be wondering, what goes into the destination address that is attached to the packet?

As you might expect, the destination host's IP address is part of the destination address. By including the destination IP address in the packet, the routers in the Internet will be able to route the packet through the Internet to the destination host. But because a host may be running many network application processes, each with one or more sockets, it is also necessary to identify the particular socket in the destination host. When a socket is created, an identifier, called a port number, is assigned to it. So, as you might expect, the packet's destination address also includes the socket's port number. In summary, the sending process attaches to the packet a destination address, which consists of the destination host's IP address and the destination socket's port number. Moreover, as we shall soon see, the sender's source address---consisting of the IP address of the source host and the port number of the source socket---is also attached to the packet. However, attaching the source address to the packet is typically not done by the UDP application code; instead it is automatically done by the underlying operating system.

We'll use the following simple client-server application to demonstrate socket programming for both UDP and TCP:

1. The client reads a line of characters (data) from its keyboard and sends the data to the server.
2. The server receives the data and converts the characters to uppercase.
3. The server sends the modified data to the client.
4. The client receives the modified data and displays the line on its screen.

Figure 2.27 highlights the main socket-related activity of the client and server that communicate over the UDP transport service.

Figure 2.27 The client-server application using UDP

Now let's get our hands dirty and take a look at the client-server program pair for a UDP implementation of this simple application. We also provide a detailed, line-by-line analysis after each program. We'll begin with the UDP client, which will send a simple application-level message to the server. In order for the server to be able to receive and reply to the client's message, it must be ready and running---that is, it must be running as a process before the client sends its message.

The client program is called UDPClient.py, and the server program is called UDPServer.py. In order to emphasize the key issues, we intentionally provide code that is minimal. "Good code" would certainly have a few more auxiliary lines, in particular for handling error cases. For this application, we have arbitrarily chosen 12000 for the server port number.

UDPClient.py

Here is the code for the client side of the application:

```python
from socket import *
serverName = 'hostname'
serverPort = 12000
clientSocket = socket(AF_INET, SOCK_DGRAM)
message = input('Input lowercase sentence:')
clientSocket.sendto(message.encode(), (serverName, serverPort))
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
print(modifiedMessage.decode())
clientSocket.close()
```

Now let's take a look at the various lines of code in UDPClient.py.

```python
from socket import *
```

The socket module forms the basis of all network communications in Python.
By including this line, we will be able to create sockets within our program.

```python
serverName = 'hostname'
serverPort = 12000
```

The first line sets the variable serverName to the string 'hostname'. Here, we provide a string containing either the IP address of the server (e.g., "128.138.32.126") or the hostname of the server (e.g., "cis.poly.edu"). If we use the hostname, then a DNS lookup will automatically be performed to get the IP address. The second line sets the integer variable serverPort to 12000.

```python
clientSocket = socket(AF_INET, SOCK_DGRAM)
```

This line creates the client's socket, called clientSocket. The first parameter indicates the address family; in particular, AF_INET indicates that the underlying network is using IPv4. (Do not worry about this now---we will discuss IPv4 in Chapter 4.) The second parameter indicates that the socket is of type SOCK_DGRAM, which means it is a UDP socket (rather than a TCP socket). Note that we are not specifying the port number of the client socket when we create it; we are instead letting the operating system do this for us. Now that the client process's door has been created, we will want to create a message to send through the door.

```python
message = input('Input lowercase sentence:')
```

input() is a built-in function in Python 3. When this command is executed, the user at the client is prompted with the words "Input lowercase sentence:" The user then uses her keyboard to input a line, which is put into the variable message. Now that we have a socket and a message, we will want to send the message through the socket to the destination host.

```python
clientSocket.sendto(message.encode(), (serverName, serverPort))
```

In the above line, we first convert the message from string type to byte type, as we need to send bytes into a socket; this is done with the encode() method. The method sendto() attaches the destination address (serverName, serverPort) to the message and sends the resulting packet into the process's socket, clientSocket. (As mentioned earlier, the source address is also attached to the packet, although this is done automatically rather than explicitly by the code.) Sending a client-to-server message via a UDP socket is that simple! After sending the packet, the client waits to receive data from the server.

```python
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
```

With the above line, when a packet arrives from the Internet at the client's socket, the packet's data is put into the variable modifiedMessage and the packet's source address is put into the variable serverAddress. The variable serverAddress contains both the server's IP address and the server's port number. The program UDPClient doesn't actually need this server address information, since it already knows the server address from the outset; but this line of Python provides the server address nevertheless. The method recvfrom also takes the buffer size 2048 as input. (This buffer size works for most purposes.)

```python
print(modifiedMessage.decode())
```

This line prints out modifiedMessage on the user's display, after converting the message from bytes to string. It should be the original line that the user typed, but now capitalized.

```python
clientSocket.close()
```

This line closes the socket. The process then terminates.
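As noted earlier, this code is intentionally minimal, and "good code" would handle error cases. Since UDP makes no delivery guarantee, one natural refinement (our own sketch, not part of the book's example) is to set a timeout on the socket and retransmit if no reply arrives:

```python
# Sketch of UDPClient.py with a timeout and retries; the one-second
# timeout and three attempts are arbitrary illustrative choices.
from socket import socket, AF_INET, SOCK_DGRAM, timeout

serverName = 'hostname'  # as before, the server's hostname or IP address
serverPort = 12000
message = input('Input lowercase sentence:')

clientSocket = socket(AF_INET, SOCK_DGRAM)
clientSocket.settimeout(1.0)  # wait at most one second for the reply
for attempt in range(3):
    clientSocket.sendto(message.encode(), (serverName, serverPort))
    try:
        modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
        print(modifiedMessage.decode())
        break
    except timeout:  # no reply in time; the request or reply may be lost
        print('No reply on attempt %d' % (attempt + 1))
clientSocket.close()
```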
UDPServer.py

Let's now take a look at the server side of the application:

```python
from socket import *
serverPort = 12000
serverSocket = socket(AF_INET, SOCK_DGRAM)
serverSocket.bind(('', serverPort))
print("The server is ready to receive")
while True:
    message, clientAddress = serverSocket.recvfrom(2048)
    modifiedMessage = message.decode().upper()
    serverSocket.sendto(modifiedMessage.encode(), clientAddress)
```

Note that the beginning of UDPServer is similar to UDPClient. It also imports the socket module, also sets the integer variable serverPort to 12000, and also creates a socket of type SOCK_DGRAM (a UDP socket). The first line of code that is significantly different from UDPClient is:

```python
serverSocket.bind(('', serverPort))
```

The above line binds (that is, assigns) the port number 12000 to the server's socket. Thus in UDPServer, the code (written by the application developer) is explicitly assigning a port number to the socket. In this manner, when anyone sends a packet to port 12000 at the IP address of the server, that packet will be directed to this socket. UDPServer then enters a while loop; the while loop will allow UDPServer to receive and process packets from clients indefinitely. In the while loop, UDPServer waits for a packet to arrive.

```python
message, clientAddress = serverSocket.recvfrom(2048)
```

This line of code is similar to what we saw in UDPClient. When a packet arrives at the server's socket, the packet's data is put into the variable message and the packet's source address is put into the variable clientAddress. The variable clientAddress contains both the client's IP address and the client's port number. Here, UDPServer will make use of this address information, as it provides a return address, similar to the return address with ordinary postal mail. With this source address information, the server now knows to where it should direct its reply.

```python
modifiedMessage = message.decode().upper()
```

This line is the heart of our simple application. It takes the line sent by the client and, after converting the message to a string, uses the method upper() to capitalize it.

```python
serverSocket.sendto(modifiedMessage.encode(), clientAddress)
```

This last line attaches the client's address (IP address and port number) to the capitalized message (after converting the string to bytes), and sends the resulting packet into the server's socket. (As mentioned earlier, the server address is also attached to the packet, although this is done automatically rather than explicitly by the code.) The Internet will then deliver the packet to this client address. After the server sends the packet, it remains in the while loop, waiting for another UDP packet to arrive (from any client running on any host).

To test the pair of programs, you place UDPClient.py on one host and UDPServer.py on another host. Be sure to include the proper hostname or IP address of the server in UDPClient.py. Next, you run UDPServer.py in the server host. This creates a process in the server that idles until it is contacted by some client. Then you run UDPClient.py in the client host. This creates a process in the client. Finally, to use the application at the client, you type a sentence followed by a carriage return.

To develop your own UDP client-server application, you can begin by slightly modifying the client or server programs. For example, instead of converting all the letters to uppercase, the server could count the number of times the letter s appears and return this number, as in the sketch below.
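Here is one way that counting version might look (a sketch of the modification just suggested, not code from the book); only the body of the while loop changes:

```python
# Variant of UDPServer.py: reply with the number of times 's' appears.
from socket import socket, AF_INET, SOCK_DGRAM

serverPort = 12000
serverSocket = socket(AF_INET, SOCK_DGRAM)
serverSocket.bind(('', serverPort))
print('The counting server is ready to receive')
while True:
    message, clientAddress = serverSocket.recvfrom(2048)
    count = message.decode().count('s')  # count occurrences of the letter s
    serverSocket.sendto(str(count).encode(), clientAddress)
```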
Or you can modify the client so that, after receiving a capitalized sentence, the user can continue to send more sentences to the server.

2.7.2 Socket Programming with TCP

Unlike UDP, TCP is a connection-oriented protocol. This means that before the client and server can start to send data to each other, they first need to handshake and establish a TCP connection. One end of the TCP connection is attached to the client socket and the other end is attached to a server socket. When creating the TCP connection, we associate with it the client socket address (IP address and port number) and the server socket address (IP address and port number). With the TCP connection established, when one side wants to send data to the other side, it just drops the data into the TCP connection via its socket. This is different from UDP, for which the sender must attach a destination address to the packet before dropping it into the socket.

Now let's take a closer look at the interaction of client and server programs in TCP. The client has the job of initiating contact with the server. In order for the server to be able to react to the client's initial contact, the server has to be ready. This implies two things. First, as in the case of UDP, the TCP server must be running as a process before the client attempts to initiate contact. Second, the server program must have a special door---more precisely, a special socket---that welcomes some initial contact from a client process running on an arbitrary host. Using our house/door analogy for a process/socket, we will sometimes refer to the client's initial contact as "knocking on the welcoming door."

With the server process running, the client process can initiate a TCP connection to the server. This is done in the client program by creating a TCP socket. When the client creates its TCP socket, it specifies the address of the welcoming socket in the server, namely, the IP address of the server host and the port number of the socket. After creating its socket, the client initiates a three-way handshake and establishes a TCP connection with the server. The three-way handshake, which takes place within the transport layer, is completely invisible to the client and server programs.

During the three-way handshake, the client process knocks on the welcoming door of the server process. When the server "hears" the knocking, it creates a new door---more precisely, a new socket that is dedicated to that particular client. In our example below, the welcoming door is a TCP socket object that we call serverSocket; the newly created socket dedicated to the client making the connection is called connectionSocket. Students who are encountering TCP sockets for the first time sometimes confuse the welcoming socket (which is the initial point of contact for all clients wanting to communicate with the server) and each newly created server-side connection socket that is subsequently created for communicating with each client.

From the application's perspective, the client's socket and the server's connection socket are directly connected by a pipe. As shown in Figure 2.28, the client process can send arbitrary bytes into its socket, and TCP guarantees that the server process will receive (through the connection socket) each byte in the order sent.
TCP thus provides a reliable service between the client and server processes. Furthermore, just as people can go in and out the same door, the client process not only sends bytes into but also receives bytes from its socket; similarly, the server process not only receives bytes from but also sends bytes into its connection socket.

We use the same simple client-server application to demonstrate socket programming with TCP: The client sends one line of data to the server, the server capitalizes the line and sends it back to the client. Figure 2.29 highlights the main socket-related activity of the client and server that communicate over the TCP transport service.

Figure 2.28 The TCPServer process has two sockets

Figure 2.29 The client-server application using TCP

TCPClient.py

Here is the code for the client side of the application:

```python
from socket import *
serverName = 'servername'
serverPort = 12000
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName, serverPort))
sentence = input('Input lowercase sentence:')
clientSocket.send(sentence.encode())
modifiedSentence = clientSocket.recv(1024)
print('From Server: ', modifiedSentence.decode())
clientSocket.close()
```

Let's now take a look at the various lines in the code that differ significantly from the UDP implementation. The first such line is the creation of the client socket.

```python
clientSocket = socket(AF_INET, SOCK_STREAM)
```

This line creates the client's socket, called clientSocket. The first parameter again indicates that the underlying network is using IPv4. The second parameter indicates that the socket is of type SOCK_STREAM, which means it is a TCP socket (rather than a UDP socket). Note that we are again not specifying the port number of the client socket when we create it; we are instead letting the operating system do this for us. Now the next line of code is very different from what we saw in UDPClient:

```python
clientSocket.connect((serverName, serverPort))
```

Recall that before the client can send data to the server (or vice versa) using a TCP socket, a TCP connection must first be established between the client and server. The above line initiates the TCP connection between the client and server. The parameter of the connect() method is the address of the server side of the connection. After this line of code is executed, the three-way handshake is performed and a TCP connection is established between the client and server.

```python
sentence = input('Input lowercase sentence:')
```

As with UDPClient, the above obtains a sentence from the user. The string sentence continues to gather characters until the user ends the line by typing a carriage return. The next line of code is also very different from UDPClient:

```python
clientSocket.send(sentence.encode())
```

The above line sends the sentence through the client's socket and into the TCP connection. Note that the program does not explicitly create a packet and attach the destination address to the packet, as was the case with UDP sockets. Instead the client program simply drops the bytes in the string sentence into the TCP connection. The client then waits to receive bytes from the server.

```python
modifiedSentence = clientSocket.recv(1024)
```

When characters arrive from the server, they get placed into the string modifiedSentence. Characters continue to accumulate in modifiedSentence until the line ends with a carriage return character.
After printing the capitalized sentence, we close the client's socket:

```python
clientSocket.close()
```

This last line closes the socket and, hence, closes the TCP connection between the client and the server. It causes TCP in the client to send a TCP message to TCP in the server (see Section 3.5).

TCPServer.py

Now let's take a look at the server program.

```python
from socket import *
serverPort = 12000
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', serverPort))
serverSocket.listen(1)
print('The server is ready to receive')
while True:
    connectionSocket, addr = serverSocket.accept()
    sentence = connectionSocket.recv(1024).decode()
    capitalizedSentence = sentence.upper()
    connectionSocket.send(capitalizedSentence.encode())
    connectionSocket.close()
```

Let's now take a look at the lines that differ significantly from UDPServer and TCPClient. As with TCPClient, the server creates a TCP socket with:

```python
serverSocket = socket(AF_INET, SOCK_STREAM)
```

Similar to UDPServer, we associate the server port number, serverPort, with this socket:

```python
serverSocket.bind(('', serverPort))
```

But with TCP, serverSocket will be our welcoming socket. After establishing this welcoming door, we will wait and listen for some client to knock on the door:

```python
serverSocket.listen(1)
```

This line has the server listen for TCP connection requests from the client. The parameter specifies the maximum number of queued connections (at least 1).

```python
connectionSocket, addr = serverSocket.accept()
```

When a client knocks on this door, the program invokes the accept() method for serverSocket, which creates a new socket in the server, called connectionSocket, dedicated to this particular client. The client and server then complete the handshaking, creating a TCP connection between the client's clientSocket and the server's connectionSocket. With the TCP connection established, the client and server can now send bytes to each other over the connection. With TCP, all bytes sent from one side are not only guaranteed to arrive at the other side but also guaranteed to arrive in order.

```python
connectionSocket.close()
```

In this program, after sending the modified sentence to the client, we close the connection socket. But since serverSocket remains open, another client can now knock on the door and send the server a sentence to modify.

This completes our discussion of socket programming in TCP. You are encouraged to run the two programs in two separate hosts, and also to modify them to achieve slightly different goals. You should compare the UDP program pair with the TCP program pair and see how they differ. You should also do many of the socket programming assignments described at the ends of Chapters 2, 4, and 9. Finally, we hope someday, after mastering these and more advanced socket programs, you will write your own popular network application, become very rich and famous, and remember the authors of this textbook!

2.8 Summary

In this chapter, we've studied the conceptual and the implementation aspects of network applications. We've learned about the ubiquitous client-server architecture adopted by many Internet applications and seen its use in the HTTP, SMTP, POP3, and DNS protocols. We've studied these important application-level protocols, and their associated applications (the Web, file transfer, e-mail, and DNS), in some detail. We've learned about the P2P architecture and how it is used in many applications.
We've also learned about streaming video, and how modern video distribution systems leverage CDNs. We've examined how the socket API can be used to build network applications. We've walked through the use of sockets for connection-oriented (TCP) and connectionless (UDP) end-to-end transport services. The first step in our journey down the layered network architecture is now complete!

At the very beginning of this book, in Section 1.1, we gave a rather vague, bare-bones definition of a protocol: "the format and the order of messages exchanged between two or more communicating entities, as well as the actions taken on the transmission and/or receipt of a message or other event." The material in this chapter, and in particular our detailed study of the HTTP, SMTP, POP3, and DNS protocols, has now added considerable substance to this definition. Protocols are a key concept in networking; our study of application protocols has now given us the opportunity to develop a more intuitive feel for what protocols are all about.

In Section 2.1, we described the service models that TCP and UDP offer to applications that invoke them. We took an even closer look at these service models when we developed simple applications that run over TCP and UDP in Section 2.7. However, we have said little about how TCP and UDP provide these service models. For example, we know that TCP provides a reliable data service, but we haven't said yet how it does so. In the next chapter we'll take a careful look at not only the what, but also the how and why of transport protocols. Equipped with knowledge about Internet application structure and application-level protocols, we're now ready to head further down the protocol stack and examine the transport layer in Chapter 3.

Homework Problems and Questions

Chapter 2 Review Questions

SECTION 2.1

R1. List five nonproprietary Internet applications and the application-layer protocols that they use.

R2. What is the difference between network architecture and application architecture?

R3. For a communication session between a pair of processes, which process is the client and which is the server?

R4. For a P2P file-sharing application, do you agree with the statement, "There is no notion of client and server sides of a communication session"? Why or why not?

R5. What information is used by a process running on one host to identify a process running on another host?

R6. Suppose you wanted to do a transaction from a remote client to a server as fast as possible. Would you use UDP or TCP? Why?

R7. Referring to Figure 2.4, we see that none of the applications listed in Figure 2.4 requires both no data loss and timing. Can you conceive of an application that requires no data loss and that is also highly time-sensitive?

R8. List the four broad classes of services that a transport protocol can provide. For each of the service classes, indicate if either UDP or TCP (or both) provides such a service.

R9. Recall that TCP can be enhanced with SSL to provide process-to-process security services, including encryption. Does SSL operate at the transport layer or the application layer? If the application developer wants TCP to be enhanced with SSL, what does the developer have to do?

SECTIONS 2.2–2.4

R10. What is meant by a handshaking protocol?

R11. Why do HTTP, SMTP, and POP3 run on top of TCP rather than on UDP?

R12. Consider an e-commerce site that wants to keep a purchase record for each of its customers. Describe how this can be done with cookies.
R13. Describe how Web caching can reduce the delay in receiving a requested object. Will Web caching reduce the delay for all objects requested by a user or for only some of the objects? Why?

R14. Telnet into a Web server and send a multiline request message. Include in the request message the If-modified-since: header line to force a response message with the 304 Not Modified status code.

R15. List several popular messaging apps. Do they use the same protocols as SMS?

R16. Suppose Alice, with a Web-based e-mail account (such as Hotmail or Gmail), sends a message to Bob, who accesses his mail from his mail server using POP3. Discuss how the message gets from Alice's host to Bob's host. Be sure to list the series of application-layer protocols that are used to move the message between the two hosts.

R17. Print out the header of an e-mail message you have recently received. How many Received: header lines are there? Analyze each of the header lines in the message.

R18. From a user's perspective, what is the difference between the download-and-delete mode and the download-and-keep mode in POP3?

R19. Is it possible for an organization's Web server and mail server to have exactly the same alias for a hostname (for example, foo.com)? What would be the type for the RR that contains the hostname of the mail server?

R20. Look over your received e-mails, and examine the header of a message sent from a user with a .edu e-mail address. Is it possible to determine from the header the IP address of the host from which the message was sent? Do the same for a message sent from a Gmail account.

SECTION 2.5

R21. In BitTorrent, suppose Alice provides chunks to Bob throughout a 30-second interval. Will Bob necessarily return the favor and provide chunks to Alice in this same interval? Why or why not?

R22. Consider a new peer Alice that joins BitTorrent without possessing any chunks. Without any chunks, she cannot become a top-four uploader for any of the other peers, since she has nothing to upload. How then will Alice get her first chunk?

R23. What is an overlay network? Does it include routers? What are the edges in the overlay network?

SECTION 2.6

R24. CDNs typically adopt one of two different server placement philosophies. Name and briefly describe them.

R25. Besides network-related considerations such as delay, loss, and bandwidth performance, there are other important factors that go into designing a CDN server selection strategy. What are they?

SECTION 2.7

R26. In Section 2.7, the UDP server described needed only one socket, whereas the TCP server needed two sockets. Why? If the TCP server were to support n simultaneous connections, each from a different client host, how many sockets would the TCP server need?

R27. For the client-server application over TCP described in Section 2.7, why must the server program be executed before the client program? For the client-server application over UDP, why may the client program be executed before the server program?

Problems

P1. True or false?

a. A user requests a Web page that consists of some text and three images. For this page, the client will send one request message and receive four response messages.

b. Two distinct Web pages (for example, www.mit.edu/research.html and www.mit.edu/students.html) can be sent over the same persistent connection.
c. With nonpersistent connections between browser and origin server, it is possible for a single TCP segment to carry two distinct HTTP request messages.

d. The Date: header in the HTTP response message indicates when the object in the response was last modified.

e. HTTP response messages never have an empty message body.

P2. SMS, iMessage, and WhatsApp are all smartphone real-time messaging systems. After doing some research on the Internet, for each of these systems write one paragraph about the protocols they use. Then write a paragraph explaining how they differ.

P3. Consider an HTTP client that wants to retrieve a Web document at a given URL. The IP address of the HTTP server is initially unknown. What transport and application-layer protocols besides HTTP are needed in this scenario?

P4. Consider the following string of ASCII characters that were captured by Wireshark when the browser sent an HTTP GET message (i.e., this is the actual content of an HTTP GET message). The characters <cr><lf> are carriage return and line-feed characters (that is, the italicized character string <cr> in the text below represents the single carriage-return character that was contained at that point in the HTTP header). Answer the following questions, indicating where in the HTTP GET message below you find the answer.

```
GET /cs453/index.html HTTP/1.1<cr><lf>Host: gaia.cs.umass.edu<cr><lf>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2)
Gecko/20040804 Netscape/7.2 (ax)<cr><lf>Accept: text/xml,
application/xml, application/xhtml+xml, text/html;q=0.9,
text/plain;q=0.8, image/png, */*;q=0.5<cr><lf>Accept-Language: en-us,
en;q=0.5<cr><lf>Accept-Encoding: zip, deflate<cr><lf>Accept-Charset:
ISO-8859-1, utf-8;q=0.7, *;q=0.7<cr><lf>Keep-Alive: 300<cr><lf>
Connection: keep-alive<cr><lf><cr><lf>
```

a. What is the URL of the document requested by the browser?

b. What version of HTTP is the browser running?

c. Does the browser request a non-persistent or a persistent connection?

d. What is the IP address of the host on which the browser is running?

e. What type of browser initiates this message? Why is the browser type needed in an HTTP request message?

P5. The text below shows the reply sent from the server in response to the HTTP GET message in the question above. Answer the following questions, indicating where in the message below you find the answer.
```
HTTP/1.1 200 OK<cr><lf>Date: Tue, 07 Mar 2008 12:39:45 GMT<cr><lf>
Server: Apache/2.0.52 (Fedora)<cr><lf>Last-Modified: Sat, 10 Dec 2005
18:27:46 GMT<cr><lf>ETag: "526c3-f22-a88a4c80"<cr><lf>Accept-Ranges:
bytes<cr><lf>Content-Length: 3874<cr><lf>Keep-Alive:
timeout=max=100<cr><lf>Connection: Keep-Alive<cr><lf>Content-Type:
text/html; charset=ISO-8859-1<cr><lf><cr><lf><!doctype html public
"-//w3c//dtd html 4.0 transitional//en"><lf><html><lf><head><lf>
<meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1"><lf><meta name="GENERATOR" content="Mozilla/4.79
[en] (Windows NT 5.0; U) Netscape]"><lf><title>CMPSCI 453 / 591 /
NTU-ST550A Spring 2005 homepage</title><lf></head><lf>
<much more document text following here (not shown)>
```

a. Was the server able to successfully find the document or not? What time was the document reply provided?

b. When was the document last modified?

c. How many bytes are there in the document being returned?

d. What are the first 5 bytes of the document being returned? Did the server agree to a persistent connection?

P6. Obtain the HTTP/1.1 specification (RFC 2616). Answer the following questions:

a. Explain the mechanism used for signaling between the client and server to indicate that a persistent connection is being closed. Can the client, the server, or both signal the close of a connection?

b. What encryption services are provided by HTTP?

c. Can a client open three or more simultaneous connections with a given server?

d. Either a server or a client may close a transport connection between them if either one detects the connection has been idle for some time. Is it possible that one side starts closing a connection while the other side is transmitting data via this connection? Explain.

P7. Suppose within your Web browser you click on a link to obtain a Web page. The IP address for the associated URL is not cached in your local host, so a DNS lookup is necessary to obtain the IP address. Suppose that n DNS servers are visited before your host receives the IP address from DNS; the successive visits incur an RTT of RTT1, . . ., RTTn. Further suppose that the Web page associated with the link contains exactly one object, consisting of a small amount of HTML text. Let RTT0 denote the RTT between the local host and the server containing the object. Assuming zero transmission time of the object, how much time elapses from when the client clicks on the link until the client receives the object?

P8. Referring to Problem P7, suppose the HTML file references eight very small objects on the same server. Neglecting transmission times, how much time elapses with

a. Non-persistent HTTP with no parallel TCP connections?

b. Non-persistent HTTP with the browser configured for 5 parallel connections?

c. Persistent HTTP?

P9. Consider Figure 2.12, for which there is an institutional network connected to the Internet.
Suppose that the average object size is 850,000 bits and that the average request rate from the institution's browsers to the origin servers is 16 requests per second. Also suppose that the amount of time it takes from when the router on the Internet side of the access link forwards an HTTP request until it receives the response is three seconds on average (see Section 2.2.5). Model the total average response time as the sum of the average access delay (that is, the delay from Internet router to institution router) and the average Internet delay. For the average access delay, use Δ/(1−Δβ), where Δ is the average time required to send an object over the access link and β is the arrival rate of objects to the access link.

a. Find the total average response time.

b. Now suppose a cache is installed in the institutional LAN. Suppose the miss rate is 0.4. Find the total response time.

P10. Consider a short, 10-meter link, over which a sender can transmit at a rate of 150 bits/sec in both directions. Suppose that packets containing data are 100,000 bits long, and packets containing only control (e.g., ACK or handshaking) are 200 bits long. Assume that N parallel connections each get 1/N of the link bandwidth. Now consider the HTTP protocol, and suppose that each downloaded object is 100 Kbits long, and that the initial downloaded object contains 10 referenced objects from the same sender. Would parallel downloads via parallel instances of non-persistent HTTP make sense in this case? Now consider persistent HTTP. Do you expect significant gains over the non-persistent case? Justify and explain your answer.

P11. Consider the scenario introduced in the previous problem. Now suppose that the link is shared by Bob with four other users. Bob uses parallel instances of non-persistent HTTP, and the other four users use non-persistent HTTP without parallel downloads.

a. Do Bob's parallel connections help him get Web pages more quickly? Why or why not?

b. If all five users open five parallel instances of non-persistent HTTP, then would Bob's parallel connections still be beneficial? Why or why not?

P12. Write a simple TCP program for a server that accepts lines of input from a client and prints the lines onto the server's standard output. (You can do this by modifying the TCPServer.py program in the text.) Compile and execute your program. On any other machine that contains a Web browser, set the proxy server in the browser to the host that is running your server program; also configure the port number appropriately. Your browser should now send its GET request messages to your server, and your server should display the messages on its standard output. Use this platform to determine whether your browser generates conditional GET messages for objects that are locally cached.

P13. What is the difference between MAIL FROM: in SMTP and From: in the mail message itself?

P14. How does SMTP mark the end of a message body? How about HTTP? Can HTTP use the same method as SMTP to mark the end of a message body? Explain.

P15. Read RFC 5321 for SMTP. What does MTA stand for? Consider the following received spam e-mail (modified from a real spam e-mail). Assuming only the originator of this spam e-mail is malicious and all other hosts are honest, identify the malicious host that has generated this spam e-mail.
```
From - Fri Nov 07 13:41:30 2008
Return-Path: <tennis5@pp33head.com>
Received: from barmail.cs.umass.edu (barmail.cs.umass.edu
  [128.119.240.3]) by cs.umass.edu (8.13.1/8.12.6) for
  <hg@cs.umass.edu>; Fri, 7 Nov 2008 13:27:10 -0500
Received: from asusus-4b96 (localhost [127.0.0.1]) by
  barmail.cs.umass.edu (Spam Firewall) for <hg@cs.umass.edu>;
  Fri, 7 Nov 2008 13:27:07 -0500 (EST)
Received: from asusus-4b96 ([58.88.21.177]) by barmail.cs.umass.edu
  for <hg@cs.umass.edu>; Fri, 07 Nov 2008 13:27:07 -0500 (EST)
Received: from [58.88.21.177] by inbnd55.exchangeddd.com;
  Sat, 8 Nov 2008 01:27:07 +0700
From: "Jonny" <tennis5@pp33head.com>
To: <hg@cs.umass.edu>
Subject: How to secure your savings
```

P16. Read the POP3 RFC, RFC 1939. What is the purpose of the UIDL POP3 command?

P17. Consider accessing your e-mail with POP3.

a. Suppose you have configured your POP mail client to operate in the download-and-delete mode. Complete the following transaction:

```
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: blah blah ...
S: ..........blah
S: .
?
?
```

b. Suppose you have configured your POP mail client to operate in the download-and-keep mode. Complete the following transaction:

```
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: blah blah ...
S: ..........blah
S: .
?
?
```

c. Suppose you have configured your POP mail client to operate in the download-and-keep mode. Using your transcript in part (b), suppose you retrieve messages 1 and 2, exit POP, and then five minutes later you again access POP to retrieve new e-mail. Suppose that in the five-minute interval no new messages have been sent to you. Provide a transcript of this second POP session.

P18.

a. What is a whois database?

b. Use various whois databases on the Internet to obtain the names of two DNS servers. Indicate which whois databases you used.

c. Use nslookup on your local host to send DNS queries to three DNS servers: your local DNS server and the two DNS servers you found in part (b). Try querying for Type A, NS, and MX reports. Summarize your findings.

d. Use nslookup to find a Web server that has multiple IP addresses. Does the Web server of your institution (school or company) have multiple IP addresses?

e. Use the ARIN whois database to determine the IP address range used by your university.

f. Describe how an attacker can use whois databases and the nslookup tool to perform reconnaissance on an institution before launching an attack.

g. Discuss why whois databases should be publicly available.

P19. In this problem, we use the useful dig tool available on Unix and Linux hosts to explore the hierarchy of DNS servers. Recall that in Figure 2.19, a DNS server in the DNS hierarchy delegates a DNS query to a DNS server lower in the hierarchy, by sending back to the DNS client the name of that lower-level DNS server. First read the man page for dig, and then answer the following questions.

a. Starting with a root DNS server (from one of the root servers [a-m].root-servers.net), initiate a sequence of queries for the IP address for your department's Web server by using dig. Show the list of the names of DNS servers in the delegation chain in answering your query.

b. Repeat part (a) for several popular Web sites, such as google.com, yahoo.com, or amazon.com.

P20. Suppose you can access the caches in the local DNS servers of your department.
Can you propose a way to roughly determine the Web servers (outside your department) that are most popular among the users in your department? Explain.

P21. Suppose that your department has a local DNS server for all computers in the department. You are an ordinary user (i.e., not a network/system administrator). Can you determine if an external Web site was likely accessed from a computer in your department a couple of seconds ago? Explain.

P22. Consider distributing a file of F=15 Gbits to N peers. The server has an upload rate of us=30 Mbps, and each peer has a download rate of di=2 Mbps and an upload rate of u. For N=10, 100, and 1,000 and u=300 Kbps, 700 Kbps, and 2 Mbps, prepare a chart giving the minimum distribution time for each of the combinations of N and u for both client-server distribution and P2P distribution. (A short computational sketch of these distribution-time formulas appears after P27 below.)

P23. Consider distributing a file of F bits to N peers using a client-server architecture. Assume a fluid model where the server can simultaneously transmit to multiple peers, transmitting to each peer at different rates, as long as the combined rate does not exceed us.

a. Suppose that us/N≤dmin. Specify a distribution scheme that has a distribution time of NF/us.

b. Suppose that us/N≥dmin. Specify a distribution scheme that has a distribution time of F/dmin.

c. Conclude that the minimum distribution time is in general given by max{NF/us, F/dmin}.

P24. Consider distributing a file of F bits to N peers using a P2P architecture. Assume a fluid model. For simplicity assume that dmin is very large, so that peer download bandwidth is never a bottleneck.

a. Suppose that us≤(us+u1+...+uN)/N. Specify a distribution scheme that has a distribution time of F/us.

b. Suppose that us≥(us+u1+...+uN)/N. Specify a distribution scheme that has a distribution time of NF/(us+u1+...+uN).

c. Conclude that the minimum distribution time is in general given by max{F/us, NF/(us+u1+...+uN)}.

P25. Consider an overlay network with N active peers, with each pair of peers having an active TCP connection. Additionally, suppose that the TCP connections pass through a total of M routers. How many nodes and edges are there in the corresponding overlay network?

P26. Suppose Bob joins a BitTorrent torrent, but he does not want to upload any data to any other peers (so-called free-riding).

a. Bob claims that he can receive a complete copy of the file that is shared by the swarm. Is Bob's claim possible? Why or why not?

b. Bob further claims that he can make his "free-riding" more efficient by using a collection of multiple computers (with distinct IP addresses) in the computer lab in his department. How can he do that?

P27. Consider a DASH system for which there are N video versions (at N different rates and qualities) and N audio versions (at N different rates and qualities). Suppose we want to allow the player to choose at any time any of the N video versions and any of the N audio versions.

a. If we create files so that the audio is mixed in with the video, so that the server sends only one media stream at a given time, how many files will the server need to store (each a different URL)?

b. If the server instead sends the audio and video streams separately and has the client synchronize the streams, how many files will the server need to store?
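
The following short Python sketch (our own illustration, not part of the original problem set; the variable names are ours) evaluates the minimum distribution times for the parameter combinations in P22, using the client-server bound max{NF/us, F/dmin} from P23 and the P2P bound from P24. Because peer download rates are finite in P22, the P2P time here also includes the F/dmin term.

    # Minimum distribution times for P22, using the bounds from P23 and P24.
    # All rates are in bits per second; the file size is in bits.
    F = 15e9     # 15 Gbits
    u_s = 30e6   # server upload rate, 30 Mbps
    d = 2e6      # per-peer download rate, 2 Mbps

    for N in (10, 100, 1000):
        for u in (300e3, 700e3, 2e6):   # per-peer upload rate
            t_cs = max(N * F / u_s, F / d)                        # client-server (P23)
            t_p2p = max(F / u_s, F / d, N * F / (u_s + N * u))    # P2P (P24)
            print(f"N={N:5d} u={u/1e3:6.0f} Kbps  "
                  f"client-server: {t_cs:8.0f} s  P2P: {t_p2p:8.0f} s")

For example, for N=10 and u=300 Kbps the sketch reports 7,500 seconds for both architectures, since the peer download rate of 2 Mbps is the bottleneck in both cases.
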
P28. Install and compile the Python programs TCPClient and UDPClient on one host and TCPServer and UDPServer on another host.

a. Suppose you run TCPClient before you run TCPServer. What happens? Why?

b. Suppose you run UDPClient before you run UDPServer. What happens? Why?

c. What happens if you use different port numbers for the client and server sides?

P29. Suppose that in UDPClient.py, after we create the socket, we add the line:

    clientSocket.bind(('', 5432))

Will it become necessary to change UDPServer.py? What are the port numbers for the sockets in UDPClient and UDPServer? What were they before making this change?

P30. Can you configure your browser to open multiple simultaneous connections to a Web site? What are the advantages and disadvantages of having a large number of simultaneous TCP connections?

P31. We have seen that Internet TCP sockets treat the data being sent as a byte stream but UDP sockets recognize message boundaries. What are one advantage and one disadvantage of a byte-oriented API versus having the API explicitly recognize and preserve application-defined message boundaries?

P32. What is the Apache Web server? How much does it cost? What functionality does it currently have? You may want to look at Wikipedia to answer this question.

Socket Programming Assignments

The Companion Website includes six socket programming assignments. The first four assignments are summarized below. The fifth assignment makes use of the ICMP protocol and is summarized at the end of Chapter 5. The sixth assignment employs multimedia protocols and is summarized at the end of Chapter 9. It is highly recommended that students complete several, if not all, of these assignments. Students can find full details of these assignments, as well as important snippets of the Python code, at the Web site www.pearsonhighered.com/cs-resources.

Assignment 1: Web Server

In this assignment, you will develop a simple Web server in Python that is capable of processing only one request. Specifically, your Web server will (i) create a connection socket when contacted by a client (browser); (ii) receive the HTTP request from this connection; (iii) parse the request to determine the specific file being requested; (iv) get the requested file from the server's file system; (v) create an HTTP response message consisting of the requested file preceded by header lines; and (vi) send the response over the TCP connection to the requesting browser. If a browser requests a file that is not present in your server, your server should return a "404 Not Found" error message. In the Companion Website, we provide the skeleton code for your server. Your job is to complete the code, run your server, and then test your server by sending requests from browsers running on different hosts. If you run your server on a host that already has a Web server running on it, then you should use a different port than port 80 for your Web server.

Assignment 2: UDP Pinger

In this programming assignment, you will write a client ping program in Python. Your client will send a simple ping message to a server, receive a corresponding pong message back from the server, and determine the delay between when the client sent the ping message and received the pong message. This delay is called the Round Trip Time (RTT). The functionality provided by the client and server is similar to the functionality provided by the standard ping program available in modern operating systems. However, standard ping programs use the Internet Control Message Protocol (ICMP) (which we will study in Chapter 5).
Here we will create a nonstandard (but simple!) UDP-based ping program. Your ping program is to send 10 ping messages to the target server over UDP. For each message, your client is to determine and print the RTT when the corresponding pong message is returned. Because UDP is an unreliable protocol, a packet sent by the client or server may be lost. For this reason, the client cannot wait indefinitely for a reply to a ping message. You should have the client wait up to one second for a reply from the server; if no reply is received, the client should assume that the packet was lost and print a message accordingly. In this assignment, you will be given the complete code for the server (available in the Companion Website). Your job is to write the client code, which will be very similar to the server code. It is recommended that you first study carefully the server code. You can then write your client code, liberally cutting and pasting lines from the server code.

Assignment 3: Mail Client

The goal of this programming assignment is to create a simple mail client that sends e-mail to any recipient. Your client will need to establish a TCP connection with a mail server (e.g., a Google mail server), dialogue with the mail server using the SMTP protocol, send an e-mail message to a recipient (e.g., your friend) via the mail server, and finally close the TCP connection with the mail server. For this assignment, the Companion Website provides the skeleton code for your client. Your job is to complete the code and test your client by sending e-mail to different user accounts. You may also try sending through different servers (for example, through a Google mail server and through your university mail server).

Assignment 4: Multi-Threaded Web Proxy

In this assignment, you will develop a Web proxy. When your proxy receives an HTTP request for an object from a browser, it generates a new HTTP request for the same object and sends it to the origin server. When the proxy receives the corresponding HTTP response with the object from the origin server, it creates a new HTTP response, including the object, and sends it to the client. This proxy will be multi-threaded, so that it will be able to handle multiple requests at the same time. For this assignment, the Companion Website provides the skeleton code for the proxy server. Your job is to complete the code, and then test it by having different browsers request Web objects via your proxy.

Wireshark Lab: HTTP

Having gotten our feet wet with the Wireshark packet sniffer in Lab 1, we're now ready to use Wireshark to investigate protocols in operation. In this lab, we'll explore several aspects of the HTTP protocol: the basic GET/reply interaction, HTTP message formats, retrieving large HTML files, retrieving HTML files with embedded URLs, persistent and non-persistent connections, and HTTP authentication and security. As is the case with all Wireshark labs, the full description of this lab is available at this book's Web site, www.pearsonhighered.com/cs-resources.

Wireshark Lab: DNS

In this lab, we take a closer look at the client side of the DNS, the protocol that translates Internet hostnames to IP addresses. Recall from Section 2.5 that the client's role in the DNS is relatively simple---a client sends a query to its local DNS server and receives a response back.
Much can go on under the covers, invisible to the DNS clients, as the hierarchical DNS servers communicate with each other to either recursively or iteratively resolve the client's DNS query. From the DNS client's standpoint, however, the protocol is quite simple---a query is formulated to the local DNS server and a response is received from that server. We observe DNS in action in this lab.

As is the case with all Wireshark labs, the full description of this lab is available at this book's Web site, www.pearsonhighered.com/cs-resources.

An Interview With... Marc Andreessen

Marc Andreessen is the co-creator of Mosaic, the Web browser that popularized the World Wide Web in 1993. Mosaic had a clean, easily understood interface and was the first browser to display images in-line with text. In 1994, Marc Andreessen and Jim Clark founded Netscape, whose browser was by far the most popular browser through the mid-1990s. Netscape also developed the Secure Sockets Layer (SSL) protocol and many Internet server products, including mail servers and SSL-based Web servers. He is now a co-founder and general partner of venture capital firm Andreessen Horowitz, overseeing portfolio development with holdings that include Facebook, Foursquare, Groupon, Jawbone, Twitter, and Zynga. He serves on numerous boards, including Bump, eBay, Glam Media, Facebook, and Hewlett-Packard. He holds a BS in Computer Science from the University of Illinois at Urbana-Champaign.

How did you become interested in computing? Did you always know that you wanted to work in information technology?

The video game and personal computing revolutions hit right when I was growing up---personal computing was the new technology frontier in the late '70s and early '80s. And it wasn't just Apple and the IBM PC, but hundreds of new companies like Commodore and Atari as well. I taught myself to program out of a book called "Instant Freeze-Dried BASIC" at age 10, and got my first computer (a TRS-80 Color Computer---look it up!) at age 12.

Please describe one or two of the most exciting projects you have worked on during your career. What were the biggest challenges?

Undoubtedly the most exciting project was the original Mosaic web browser in '92--'93---and the biggest challenge was getting anyone to take it seriously back then. At the time, everyone thought the interactive future would be delivered as "interactive television" by huge companies, not as the Internet by startups.

What excites you about the future of networking and the Internet? What are your biggest concerns?

The most exciting thing is the huge unexplored frontier of applications and services that programmers and entrepreneurs are able to explore---the Internet has unleashed creativity at a level that I don't think we've ever seen before. My biggest concern is the principle of unintended consequences---we don't always know the implications of what we do, such as the Internet being used by governments to run a new level of surveillance on citizens.

Is there anything in particular students should be aware of as Web technology advances?

The rate of change---the most important thing to learn is how to learn---how to flexibly adapt to changes in the specific technologies, and how to keep an open mind on the new opportunities and possibilities as you move through your career.

What people inspired you professionally?

Vannevar Bush, Ted Nelson, Doug Engelbart, Nolan Bushnell, Bill Hewlett and Dave Packard, Ken Olsen, Steve Jobs, Steve Wozniak, Andy Grove, Grace Hopper, Hedy Lamarr, Alan Turing, Richard Stallman.

What are your recommendations for students who want to pursue careers in computing and information technology?

Go as deep as you possibly can on understanding how technology is created, and then complement with learning how business works.

Can technology solve the world's problems?

No, but we advance the standard of living of people through economic growth, and most economic growth throughout history has come from technology---so that's as good as it gets.

Chapter 3 Transport Layer

Residing between the application and network layers, the transport layer is a central piece of the layered network architecture. It has the critical role of providing communication services directly to the application processes running on different hosts. The pedagogic approach we take in this chapter is to alternate between discussions of transport-layer principles and discussions of how these principles are implemented in existing protocols; as usual, particular emphasis will be given to Internet protocols, notably the TCP and UDP transport-layer protocols.

We'll begin by discussing the relationship between the transport and network layers. This sets the stage for examining the first critical function of the transport layer---extending the network layer's delivery service between two end systems to a delivery service between two application-layer processes running on the end systems. We'll illustrate this function in our coverage of the Internet's connectionless transport protocol, UDP. We'll then return to principles and confront one of the most fundamental problems in computer networking---how two entities can communicate reliably over a medium that may lose and corrupt data. Through a series of increasingly complicated (and realistic!) scenarios, we'll build up an array of techniques that transport protocols use to solve this problem. We'll then show how these principles are embodied in TCP, the Internet's connection-oriented transport protocol. We'll next move on to a second fundamentally important problem in networking---controlling the transmission rate of transport-layer entities in order to avoid, or recover from, congestion within the network. We'll consider the causes and consequences of congestion, as well as commonly used congestion-control techniques. After obtaining a solid understanding of the issues behind congestion control, we'll study TCP's approach to congestion control.

3.1 Introduction and Transport-Layer Services

In the previous two chapters we touched on the role of the transport layer and the services that it provides. Let's quickly review what we have already learned about the transport layer. A transport-layer protocol provides for logical communication between application processes running on different hosts. By logical communication, we mean that from an application's perspective, it is as if the hosts running the processes were directly connected; in reality, the hosts may be on opposite sides of the planet, connected via numerous routers and a wide range of link types. Application processes use the logical communication provided by the transport layer to send messages to each other, free from the worry of the details of the physical infrastructure used to carry these messages.
Figure 3.1 illustrates the notion of logical communication.

As shown in Figure 3.1, transport-layer protocols are implemented in the end systems but not in network routers. On the sending side, the transport layer converts the application-layer messages it receives from a sending application process into transport-layer packets, known as transport-layer segments in Internet terminology. This is done by (possibly) breaking the application messages into smaller chunks and adding a transport-layer header to each chunk to create the transport-layer segment. The transport layer then passes the segment to the network layer at the sending end system, where the segment is encapsulated within a network-layer packet (a datagram) and sent to the destination. It's important to note that network routers act only on the network-layer fields of the datagram; that is, they do not examine the fields of the transport-layer segment encapsulated within the datagram. On the receiving side, the network layer extracts the transport-layer segment from the datagram and passes the segment up to the transport layer. The transport layer then processes the received segment, making the data in the segment available to the receiving application. More than one transport-layer protocol may be available to network applications. For example, the Internet has two protocols---TCP and UDP. Each of these protocols provides a different set of transport-layer services to the invoking application.

3.1.1 Relationship Between Transport and Network Layers

Recall that the transport layer lies just above the network layer in the protocol stack. Whereas a transport-layer protocol provides logical communication between processes running on different hosts, a network-layer protocol provides logical communication between hosts. This distinction is subtle but important. Let's examine this distinction with the aid of a household analogy.

Figure 3.1 The transport layer provides logical rather than physical communication between application processes

Consider two houses, one on the East Coast and the other on the West Coast, with each house being home to a dozen kids. The kids in the East Coast household are cousins of the kids in the West Coast household. The kids in the two households love to write to each other---each kid writes each cousin every week, with each letter delivered by the traditional postal service in a separate envelope. Thus, each household sends 144 letters to the other household every week. (These kids would save a lot of money if they had e-mail!) In each of the households there is one kid---Ann in the West Coast house and Bill in the East Coast house---responsible for mail collection and mail distribution. Each week Ann visits all her brothers and sisters, collects the mail, and gives the mail to a postal-service mail carrier, who makes daily visits to the house. When letters arrive at the West Coast house, Ann also has the job of distributing the mail to her brothers and sisters. Bill has a similar job on the East Coast. In this example, the postal service provides logical communication between the two houses---the postal service moves mail from house to house, not from person to person. On the other hand, Ann and Bill provide logical communication among the cousins---Ann and Bill pick up mail from, and deliver mail to, their brothers and sisters.
Note that from the cousins' perspective, Ann and Bill are the mail service, even though Ann and Bill are only a part (the end-system part) of the end-to-end delivery process. This household example serves as a nice analogy for explaining how the transport layer relates to the network layer:

application messages = letters in envelopes
processes = cousins
hosts (also called end systems) = houses
transport-layer protocol = Ann and Bill
network-layer protocol = postal service (including mail carriers)

Continuing with this analogy, note that Ann and Bill do all their work within their respective homes; they are not involved, for example, in sorting mail in any intermediate mail center or in moving mail from one mail center to another. Similarly, transport-layer protocols live in the end systems. Within an end system, a transport protocol moves messages from application processes to the network edge (that is, the network layer) and vice versa, but it doesn't have any say about how the messages are moved within the network core. In fact, as illustrated in Figure 3.1, intermediate routers neither act on, nor recognize, any information that the transport layer may have added to the application messages.

Continuing with our family saga, suppose now that when Ann and Bill go on vacation, another cousin pair---say, Susan and Harvey---substitute for them and provide the household-internal collection and delivery of mail. Unfortunately for the two families, Susan and Harvey do not do the collection and delivery in exactly the same way as Ann and Bill. Being younger kids, Susan and Harvey pick up and drop off the mail less frequently and occasionally lose letters (which are sometimes chewed up by the family dog). Thus, the cousin-pair Susan and Harvey do not provide the same set of services (that is, the same service model) as Ann and Bill. In an analogous manner, a computer network may make available multiple transport protocols, with each protocol offering a different service model to applications.

The possible services that Ann and Bill can provide are clearly constrained by the possible services that the postal service provides. For example, if the postal service doesn't provide a maximum bound on how long it can take to deliver mail between the two houses (for example, three days), then there is no way that Ann and Bill can guarantee a maximum delay for mail delivery between any of the cousin pairs. In a similar manner, the services that a transport protocol can provide are often constrained by the service model of the underlying network-layer protocol. If the network-layer protocol cannot provide delay or bandwidth guarantees for transport-layer segments sent between hosts, then the transport-layer protocol cannot provide delay or bandwidth guarantees for application messages sent between processes. Nevertheless, certain services can be offered by a transport protocol even when the underlying network protocol doesn't offer the corresponding service at the network layer. For example, as we'll see in this chapter, a transport protocol can offer reliable data transfer service to an application even when the underlying network protocol is unreliable, that is, even when the network protocol loses, garbles, or duplicates packets.
As another example (which we'll explore in Chapter 8 when we discuss network security), a transport protocol can use encryption to guarantee that application messages are not read by intruders, even when the network layer cannot guarantee the confidentiality of transport-layer segments.

3.1.2 Overview of the Transport Layer in the Internet

Recall that the Internet makes two distinct transport-layer protocols available to the application layer. One of these protocols is UDP (User Datagram Protocol), which provides an unreliable, connectionless service to the invoking application. The second of these protocols is TCP (Transmission Control Protocol), which provides a reliable, connection-oriented service to the invoking application. When designing a network application, the application developer must specify one of these two transport protocols. As we saw in Section 2.7, the application developer selects between UDP and TCP when creating sockets.

To simplify terminology, we refer to the transport-layer packet as a segment. We mention, however, that the Internet literature (for example, the RFCs) also refers to the transport-layer packet for TCP as a segment but often refers to the packet for UDP as a datagram. But this same Internet literature also uses the term datagram for the network-layer packet! For an introductory book on computer networking such as this, we believe that it is less confusing to refer to both TCP and UDP packets as segments, and reserve the term datagram for the network-layer packet.

Before proceeding with our brief introduction of UDP and TCP, it will be useful to say a few words about the Internet's network layer. (We'll learn about the network layer in detail in Chapters 4 and 5.) The Internet's network-layer protocol has a name---IP, for Internet Protocol. IP provides logical communication between hosts. The IP service model is a best-effort delivery service. This means that IP makes its "best effort" to deliver segments between communicating hosts, but it makes no guarantees. In particular, it does not guarantee segment delivery, it does not guarantee orderly delivery of segments, and it does not guarantee the integrity of the data in the segments. For these reasons, IP is said to be an unreliable service. We also mention here that every host has at least one network-layer address, a so-called IP address. We'll examine IP addressing in detail in Chapter 4; for this chapter we need only keep in mind that each host has an IP address.

Having taken a glimpse at the IP service model, let's now summarize the service models provided by UDP and TCP. The most fundamental responsibility of UDP and TCP is to extend IP's delivery service between two end systems to a delivery service between two processes running on the end systems. Extending host-to-host delivery to process-to-process delivery is called transport-layer multiplexing and demultiplexing. We'll discuss transport-layer multiplexing and demultiplexing in the next section. UDP and TCP also provide integrity checking by including error-detection fields in their segments' headers. These two minimal transport-layer services---process-to-process data delivery and error checking---are the only two services that UDP provides! In particular, like IP, UDP is an unreliable service---it does not guarantee that data sent by one process will arrive intact (or at all!) to the destination process. UDP is discussed in detail in Section 3.3.
TCP, on the other hand, offers several additional services to applications. First and foremost, it provides reliable data transfer. Using flow control, sequence numbers, acknowledgments, and timers (techniques we'll explore in detail in this chapter), TCP ensures that data is delivered from sending process to receiving process, correctly and in order. TCP thus converts IP's unreliable service between end systems into a reliable data transport service between processes. TCP also provides congestion control. Congestion control is not so much a service provided to the invoking application as it is a service for the Internet as a whole, a service for the general good. Loosely speaking, TCP congestion control prevents any one TCP connection from swamping the links and routers between communicating hosts with an excessive amount of traffic. TCP strives to give each connection traversing a congested link an equal share of the link bandwidth. This is done by regulating the rate at which the sending sides of TCP connections can send traffic into the network. UDP traffic, on the other hand, is unregulated. An application using UDP transport can send at any rate it pleases, for as long as it pleases.

A protocol that provides reliable data transfer and congestion control is necessarily complex. We'll need several sections to cover the principles of reliable data transfer and congestion control, and additional sections to cover the TCP protocol itself. These topics are investigated in Sections 3.4 through 3.8. The approach taken in this chapter is to alternate between basic principles and the TCP protocol. For example, we'll first discuss reliable data transfer in a general setting and then discuss how TCP specifically provides reliable data transfer. Similarly, we'll first discuss congestion control in a general setting and then discuss how TCP performs congestion control. But before getting into all this good stuff, let's first look at transport-layer multiplexing and demultiplexing.

3.2 Multiplexing and Demultiplexing

In this section, we discuss transport-layer multiplexing and demultiplexing, that is, extending the host-to-host delivery service provided by the network layer to a process-to-process delivery service for applications running on the hosts. In order to keep the discussion concrete, we'll discuss this basic transport-layer service in the context of the Internet. We emphasize, however, that a multiplexing/demultiplexing service is needed for all computer networks.

At the destination host, the transport layer receives segments from the network layer just below. The transport layer has the responsibility of delivering the data in these segments to the appropriate application process running in the host. Let's take a look at an example. Suppose you are sitting in front of your computer, and you are downloading Web pages while running one FTP session and two Telnet sessions. You therefore have four network application processes running---two Telnet processes, one FTP process, and one HTTP process. When the transport layer in your computer receives data from the network layer below, it needs to direct the received data to one of these four processes. Let's now examine how this is done.

First recall from Section 2.7 that a process (as part of a network application) can have one or more sockets, doors through which data passes from the network to the process and through which data passes from the process to the network.
Thus, as shown in Figure 3.2, the transport layer in the receiving host does not actually deliver data directly to a process, but instead to an intermediary socket. Because at any given time there can be more than one socket in the receiving host, each socket has a unique identifier. The format of the identifier depends on whether the socket is a UDP or a TCP socket, as we'll discuss shortly.

Now let's consider how a receiving host directs an incoming transport-layer segment to the appropriate socket. Each transport-layer segment has a set of fields in the segment for this purpose. At the receiving end, the transport layer examines these fields to identify the receiving socket and then directs the segment to that socket. This job of delivering the data in a transport-layer segment to the correct socket is called demultiplexing. The job of gathering data chunks at the source host from different sockets, encapsulating each data chunk with header information (that will later be used in demultiplexing) to create segments, and passing the segments to the network layer is called multiplexing.

Figure 3.2 Transport-layer multiplexing and demultiplexing

Note that the transport layer in the middle host in Figure 3.2 must demultiplex segments arriving from the network layer below to either process P1 or P2 above; this is done by directing the arriving segment's data to the corresponding process's socket. The transport layer in the middle host must also gather outgoing data from these sockets, form transport-layer segments, and pass these segments down to the network layer. Although we have introduced multiplexing and demultiplexing in the context of the Internet transport protocols, it's important to realize that they are concerns whenever a single protocol at one layer (at the transport layer or elsewhere) is used by multiple protocols at the next higher layer.

To illustrate the demultiplexing job, recall the household analogy in the previous section. Each of the kids is identified by his or her name. When Bill receives a batch of mail from the mail carrier, he performs a demultiplexing operation by observing to whom the letters are addressed and then hand delivering the mail to his brothers and sisters. Ann performs a multiplexing operation when she collects letters from her brothers and sisters and gives the collected mail to the mail person.

Now that we understand the roles of transport-layer multiplexing and demultiplexing, let us examine how it is actually done in a host. From the discussion above, we know that transport-layer multiplexing requires (1) that sockets have unique identifiers, and (2) that each segment have special fields that indicate the socket to which the segment is to be delivered. These special fields, illustrated in Figure 3.3, are the source port number field and the destination port number field. (The UDP and TCP segments have other fields as well, as discussed in the subsequent sections of this chapter.) Each port number is a 16-bit number, ranging from 0 to 65535. The port numbers ranging from 0 to 1023 are called well-known port numbers and are restricted, which means that they are reserved for use by well-known application protocols such as HTTP (which uses port number 80) and FTP (which uses port number 21). The list of well-known port numbers is given in RFC 1700 and is updated at http://www.iana.org \[RFC 3232\].

Figure 3.3 Source and destination port-number fields in a transport-layer segment
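
To make these two 16-bit fields concrete, the following minimal Python sketch (our own illustration, not code from the book) packs and unpacks a source and a destination port number in network byte order, just as they appear at the front of a UDP or TCP header.

    import struct

    # Pack source port 19157 and destination port 80 as two 16-bit
    # fields in network (big-endian) byte order, as in Figure 3.3.
    header_start = struct.pack('!HH', 19157, 80)
    print(header_start.hex())      # '4ad50050'

    # A receiving host unpacks the same bytes in order to demultiplex:
    src_port, dst_port = struct.unpack('!HH', header_start)
    print(src_port, dst_port)      # 19157 80
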
When we develop a new application (such as the simple application developed in Section 2.7), we must assign the application a port number. It should now be clear how the transport layer could implement the demultiplexing service: Each socket in the host could be assigned a port number, and when a segment arrives at the host, the transport layer examines the destination port number in the segment and directs the segment to the corresponding socket. The segment's data then passes through the socket into the attached process. As we'll see, this is basically how UDP does it. However, we'll also see that multiplexing/demultiplexing in TCP is yet more subtle.

Connectionless Multiplexing and Demultiplexing

Recall from Section 2.7.1 that the Python program running in a host can create a UDP socket with the line

    clientSocket = socket(AF_INET, SOCK_DGRAM)

When a UDP socket is created in this manner, the transport layer automatically assigns a port number to the socket. In particular, the transport layer assigns a port number in the range 1024 to 65535 that is currently not being used by any other UDP port in the host. Alternatively, we can add a line into our Python program after we create the socket to associate a specific port number (say, 19157) to this UDP socket via the socket bind() method:

    clientSocket.bind(('', 19157))

If the application developer writing the code were implementing the server side of a "well-known protocol," then the developer would have to assign the corresponding well-known port number. Typically, the client side of the application lets the transport layer automatically (and transparently) assign the port number, whereas the server side of the application assigns a specific port number.

With port numbers assigned to UDP sockets, we can now precisely describe UDP multiplexing/demultiplexing. Suppose a process in Host A, with UDP port 19157, wants to send a chunk of application data to a process with UDP port 46428 in Host B. The transport layer in Host A creates a transport-layer segment that includes the application data, the source port number (19157), the destination port number (46428), and two other values (which will be discussed later, but are unimportant for the current discussion). The transport layer then passes the resulting segment to the network layer. The network layer encapsulates the segment in an IP datagram and makes a best-effort attempt to deliver the segment to the receiving host. If the segment arrives at the receiving Host B, the transport layer at the receiving host examines the destination port number in the segment (46428) and delivers the segment to its socket identified by port 46428. Note that Host B could be running multiple processes, each with its own UDP socket and associated port number. As UDP segments arrive from the network, Host B directs (demultiplexes) each segment to the appropriate socket by examining the segment's destination port number.

It is important to note that a UDP socket is fully identified by a two-tuple consisting of a destination IP address and a destination port number. As a consequence, if two UDP segments have different source IP addresses and/or source port numbers, but have the same destination IP address and destination port number, then the two segments will be directed to the same destination process via the same destination socket. You may be wondering now, what is the purpose of the source port number?
As shown in Figure 3.4, in the A-to-B segment the source port number serves as part of a "return address"---when B wants to send a segment back to A, the destination port in the B-to-A segment will take its value from the source port value of the A-to-B segment. (The complete return address is A's IP address and the source port number.) As an example, recall the UDP server program studied in Section 2.7. In UDPServer.py, the server uses the recvfrom() method to extract the client-side (source) port number from the segment it receives from the client; it then sends a new segment to the client, with the extracted source port number serving as the destination port number in this new segment.

Connection-Oriented Multiplexing and Demultiplexing

In order to understand TCP demultiplexing, we have to take a close look at TCP sockets and TCP connection establishment. One subtle difference between a TCP socket and a UDP socket is that a TCP socket is identified by a four-tuple: (source IP address, source port number, destination IP address, destination port number). Thus, when a TCP segment arrives from the network to a host, the host uses all four values to direct (demultiplex) the segment to the appropriate socket.

Figure 3.4 The inversion of source and destination port numbers

In particular, and in contrast with UDP, two arriving TCP segments with different source IP addresses or source port numbers will (with the exception of a TCP segment carrying the original connection-establishment request) be directed to two different sockets. To gain further insight, let's reconsider the TCP client-server programming example in Section 2.7.2: The TCP server application has a "welcoming socket" that waits for connection-establishment requests from TCP clients (see Figure 2.29) on port number 12000. The TCP client creates a socket and sends a connection-establishment request segment with the lines:

    clientSocket = socket(AF_INET, SOCK_STREAM)
    clientSocket.connect((serverName, 12000))

A connection-establishment request is nothing more than a TCP segment with destination port number 12000 and a special connection-establishment bit set in the TCP header (discussed in Section 3.5). The segment also includes a source port number that was chosen by the client. When the host operating system of the computer running the server process receives the incoming connection-request segment with destination port 12000, it locates the server process that is waiting to accept a connection on port number 12000. The server process then creates a new socket:

    connectionSocket, addr = serverSocket.accept()

Also, the transport layer at the server notes the following four values in the connection-request segment: (1) the source port number in the segment, (2) the IP address of the source host, (3) the destination port number in the segment, and (4) its own IP address. The newly created connection socket is identified by these four values; all subsequently arriving segments whose source port, source IP address, destination port, and destination IP address match these four values will be demultiplexed to this socket. With the TCP connection now in place, the client and server can now send data to each other.

The server host may support many simultaneous TCP connection sockets, with each socket attached to a process, and with each socket identified by its own four-tuple.
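
The following short sketch (our own illustration, built on the Python socket API used in Section 2.7.2; the port number follows the book's example) makes this concrete: each call to accept() returns a new connection socket, and the getsockname() and getpeername() methods expose the four-tuple that identifies it.

    from socket import socket, AF_INET, SOCK_STREAM

    serverSocket = socket(AF_INET, SOCK_STREAM)
    serverSocket.bind(('', 12000))    # all connection requests arrive on port 12000
    serverSocket.listen(5)

    while True:
        # Each accepted connection gets its own socket; the OS demultiplexes
        # arriving segments among these sockets using the full four-tuple.
        connectionSocket, addr = serverSocket.accept()
        print('new connection socket, four-tuple:',
              connectionSocket.getsockname(),   # (destination IP, destination port)
              connectionSocket.getpeername())   # (source IP, source port)
        connectionSocket.close()
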
When a TCP segment arrives at the host, all four fields (source IP address, source port, destination IP address, destination port) are used to direct (demultiplex) the segment to the appropriate socket.

FOCUS ON SECURITY

Port Scanning

We've seen that a server process waits patiently on an open port for contact by a remote client. Some ports are reserved for well-known applications (e.g., Web, FTP, DNS, and SMTP servers); other ports are used by convention by popular applications (e.g., Microsoft SQL Server 2000 listens for requests on UDP port 1434). Thus, if we determine that a port is open on a host, we may be able to map that port to a specific application running on the host. This is very useful for system administrators, who are often interested in knowing which network applications are running on the hosts in their networks. But attackers, in order to "case the joint," also want to know which ports are open on target hosts. If a host is found to be running an application with a known security flaw (e.g., a SQL server listening on port 1434 was subject to a buffer overflow, allowing a remote user to execute arbitrary code on the vulnerable host, a flaw exploited by the Slammer worm \[CERT 2003--04\]), then that host is ripe for attack. Determining which applications are listening on which ports is a relatively easy task. Indeed there are a number of public domain programs, called port scanners, that do just that. Perhaps the most widely used of these is nmap, freely available at http://nmap.org and included in most Linux distributions. For TCP, nmap sequentially scans ports, looking for ports that are accepting TCP connections. For UDP, nmap again sequentially scans ports, looking for UDP ports that respond to transmitted UDP segments. In both cases, nmap returns a list of open, closed, or unreachable ports. A host running nmap can attempt to scan any target host anywhere in the Internet. We'll revisit nmap in Section 3.5.6, when we discuss TCP connection management.

Figure 3.5 Two clients, using the same destination port number (80) to communicate with the same Web server application

The situation is illustrated in Figure 3.5, in which Host C initiates two HTTP sessions to server B, and Host A initiates one HTTP session to B. Hosts A and C and server B each have their own unique IP address---A, C, and B, respectively. Host C assigns two different source port numbers (26145 and 7532) to its two HTTP connections. Because Host A is choosing source port numbers independently of C, it might also assign a source port of 26145 to its HTTP connection. But this is not a problem---server B will still be able to correctly demultiplex the two connections having the same source port number, since the two connections have different source IP addresses.

Web Servers and TCP

Before closing this discussion, it's instructive to say a few additional words about Web servers and how they use port numbers. Consider a host running a Web server, such as an Apache Web server, on port 80. When clients (for example, browsers) send segments to the server, all segments will have destination port 80. In particular, both the initial connection-establishment segments and the segments carrying HTTP request messages will have destination port 80. As we have just described, the server distinguishes the segments from the different clients using source IP addresses and source port numbers.
Figure 3.5 shows a Web server that spawns a new process for +each connection. As shown in Figure 3.5, each of these processes has its +own connection socket through which HTTP requests arrive and HTTP +responses are sent. We mention, however, that there is not always a +one-to-one correspondence between connection sockets and processes. In +fact, today's high-performing Web servers often use only one process, +and create a new thread with a new connection socket for each new client +connection. (A thread can be viewed as a lightweight subprocess.) If you +did the first programming assignment in Chapter 2, you built a Web +server that does just this. For such a server, at any given time there +may be many connection sockets (with different identifiers) attached to +the same process. If the client and server are using persistent HTTP, +then throughout the duration of the persistent connection the client and +server exchange HTTP messages via the same server socket. However, if +the client and server use non-persistent HTTP, then a new TCP connection +is created and closed for every request/response, and hence a new socket +is created and later closed for every request/response. This frequent +creating and closing of sockets can severely impact the performance of a +busy Web server (although a number of operating system tricks can be +used to mitigate the problem). Readers interested in the operating +system issues surrounding persistent and non-persistent HTTP are +encouraged to see \[Nielsen 1997; Nahum 2002\]. Now that we've discussed +transport-layer multiplexing and demultiplexing, let's move on and +discuss one of the Internet's transport protocols, UDP. In the next +section we'll see that UDP adds little more to the network-layer +protocol than a multiplexing/demultiplexing service. + +3.3 Connectionless Transport: UDP In this section, we'll take a close +look at UDP, how it works, and what it does. We encourage you to refer +back to Section 2.1, which includes an overview of the UDP service +model, and to Section 2.7.1, which discusses socket programming using +UDP. To motivate our discussion about UDP, suppose you were interested +in designing a no-frills, bare-bones transport protocol. How might you +go about doing this? You might first consider using a vacuous transport +protocol. In particular, on the sending side, you might consider taking +the messages from the application process and passing them directly to +the network layer; and on the receiving side, you might consider taking +the messages arriving from the network layer and passing them directly +to the application process. But as we learned in the previous section, +we have to do a little more than nothing! At the very least, the +transport layer has to provide a multiplexing/demultiplexing service in +order to pass data between the network layer and the correct +application-level process. UDP, defined in \[RFC 768\], does just about +as little as a transport protocol can do. Aside from the +multiplexing/demultiplexing function and some light error checking, it +adds nothing to IP. In fact, if the application developer chooses UDP +instead of TCP, then the application is almost directly talking with IP. +UDP takes messages from the application process, attaches source and +destination port number fields for the multiplexing/demultiplexing +service, adds two other small fields, and passes the resulting segment +to the network layer. 
The network layer encapsulates the transport-layer segment into an IP datagram and then makes a best-effort attempt to deliver the segment to the receiving host. If the segment arrives at the receiving host, UDP uses the destination port number to deliver the segment's data to the correct application process. Note that with UDP there is no handshaking between sending and receiving transport-layer entities before sending a segment. For this reason, UDP is said to be connectionless.

DNS is an example of an application-layer protocol that typically uses UDP. When the DNS application in a host wants to make a query, it constructs a DNS query message and passes the message to UDP. Without performing any handshaking with the UDP entity running on the destination end system, the host-side UDP adds header fields to the message and passes the resulting segment to the network layer. The network layer encapsulates the UDP segment into a datagram and sends the datagram to a name server. The DNS application at the querying host then waits for a reply to its query. If it doesn't receive a reply (possibly because the underlying network lost the query or the reply), it might try resending the query, try sending the query to another name server, or inform the invoking application that it can't get a reply.

Now you might be wondering why an application developer would ever choose to build an application over UDP rather than over TCP. Isn't TCP always preferable, since TCP provides a reliable data transfer service, while UDP does not? The answer is no, as some applications are better suited for UDP for the following reasons:

Finer application-level control over what data is sent, and when. Under UDP, as soon as an application process passes data to UDP, UDP will package the data inside a UDP segment and immediately pass the segment to the network layer. TCP, on the other hand, has a congestion-control mechanism that throttles the transport-layer TCP sender when one or more links between the source and destination hosts become excessively congested. TCP will also continue to resend a segment until the receipt of the segment has been acknowledged by the destination, regardless of how long reliable delivery takes. Since real-time applications often require a minimum sending rate, do not want to overly delay segment transmission, and can tolerate some data loss, TCP's service model is not particularly well matched to these applications' needs. As discussed below, these applications can use UDP and implement, as part of the application, any additional functionality that is needed beyond UDP's no-frills segment-delivery service.

No connection establishment. As we'll discuss later, TCP uses a three-way handshake before it starts to transfer data. UDP just blasts away without any formal preliminaries. Thus UDP does not introduce any delay to establish a connection. This is probably the principal reason why DNS runs over UDP rather than TCP---DNS would be much slower if it ran over TCP. HTTP uses TCP rather than UDP, since reliability is critical for Web pages with text. But, as we briefly discussed in Section 2.2, the TCP connection-establishment delay in HTTP is an important contributor to the delays associated with downloading Web documents.
Indeed, the QUIC protocol (Quick UDP Internet Connection, \[Iyengar 2015\]), used in Google's Chrome browser, uses UDP as its underlying transport protocol and implements reliability in an application-layer protocol on top of UDP.

No connection state. TCP maintains connection state in the end systems. This connection state includes receive and send buffers, congestion-control parameters, and sequence and acknowledgment number parameters. We will see in Section 3.5 that this state information is needed to implement TCP's reliable data transfer service and to provide congestion control. UDP, on the other hand, does not maintain connection state and does not track any of these parameters. For this reason, a server devoted to a particular application can typically support many more active clients when the application runs over UDP rather than TCP.

Small packet header overhead. The TCP segment has 20 bytes of header overhead in every segment, whereas UDP has only 8 bytes of overhead.

Figure 3.6 lists popular Internet applications and the transport protocols that they use. As we expect, e-mail, remote terminal access, the Web, and file transfer run over TCP---all these applications need the reliable data transfer service of TCP. Nevertheless, many important applications run over UDP rather than TCP. For example, UDP is used to carry network management (SNMP; see Section 5.7) data. UDP is preferred to TCP in this case, since network management applications must often run when the network is in a stressed state---precisely when reliable, congestion-controlled data transfer is difficult to achieve. Also, as we mentioned earlier, DNS runs over UDP, thereby avoiding TCP's connection-establishment delays.

Figure 3.6 Popular Internet applications and their underlying transport protocols

As shown in Figure 3.6, both UDP and TCP are sometimes used today with multimedia applications, such as Internet phone, real-time video conferencing, and streaming of stored audio and video. We'll take a close look at these applications in Chapter 9. We just mention now that all of these applications can tolerate a small amount of packet loss, so that reliable data transfer is not absolutely critical for the application's success. Furthermore, real-time applications, like Internet phone and video conferencing, react very poorly to TCP's congestion control. For these reasons, developers of multimedia applications may choose to run their applications over UDP instead of TCP. When packet loss rates are low, and with some organizations blocking UDP traffic for security reasons (see Chapter 8), TCP becomes an increasingly attractive protocol for streaming media transport.

Although commonly done today, running multimedia applications over UDP is controversial. As we mentioned above, UDP has no congestion control. But congestion control is needed to prevent the network from entering a congested state in which very little useful work is done. If everyone were to start streaming high-bit-rate video without using any congestion control, there would be so much packet overflow at routers that very few UDP packets would successfully traverse the source-to-destination path. Moreover, the high loss rates induced by the uncontrolled UDP senders would cause the TCP senders (which, as we'll see, do decrease their sending rates in the face of congestion) to dramatically decrease their rates.
Thus, the lack of congestion control in UDP can result in high loss rates between a UDP sender and receiver, and the crowding out of TCP sessions---a potentially serious problem \[Floyd 1999\]. Many researchers have proposed new mechanisms to force all sources, including UDP sources, to perform adaptive congestion control \[Mahdavi 1997; Floyd 2000; Kohler 2006; RFC 4340\].

Before discussing the UDP segment structure, we mention that it is possible for an application to have reliable data transfer when using UDP. This can be done if reliability is built into the application itself (for example, by adding acknowledgment and retransmission mechanisms, such as those we'll study in the next section). We mentioned earlier that the QUIC protocol \[Iyengar 2015\] used in Google's Chrome browser implements reliability in an application-layer protocol on top of UDP. But this is a nontrivial task that would keep an application developer busy debugging for a long time. Nevertheless, building reliability directly into the application allows the application to "have its cake and eat it too." That is, application processes can communicate reliably without being subjected to the transmission-rate constraints imposed by TCP's congestion-control mechanism.

3.3.1 UDP Segment Structure

The UDP segment structure, shown in Figure 3.7, is defined in RFC 768. The application data occupies the data field of the UDP segment. For example, for DNS, the data field contains either a query message or a response message. For a streaming audio application, audio samples fill the data field. The UDP header has only four fields, each consisting of two bytes. As discussed in the previous section, the port numbers allow the destination host to pass the application data to the correct process running on the destination end system (that is, to perform the demultiplexing function). The length field specifies the number of bytes in the UDP segment (header plus data). An explicit length value is needed since the size of the data field may differ from one UDP segment to the next. The checksum is used by the receiving host to check whether errors have been introduced into the segment. In truth, the checksum is also calculated over a few of the fields in the IP header in addition to the UDP segment. But we ignore this detail in order to see the forest through the trees. We'll discuss the checksum calculation below. Basic principles of error detection are described in Section 6.2.

3.3.2 UDP Checksum

The UDP checksum provides for error detection. That is, the checksum is used to determine whether bits within the UDP segment have been altered (for example, by noise in the links or while stored in a router) as it moved from source to destination.

Figure 3.7 UDP segment structure

UDP at the sender side performs the 1s complement of the sum of all the 16-bit words in the segment, with any overflow encountered during the sum being wrapped around. This result is put in the checksum field of the UDP segment. Here we give a simple example of the checksum calculation. You can find details about efficient implementation of the calculation in RFC 1071 and performance over real data in \[Stone 1998; Stone 2000\].
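
The calculation is also easy to express in code. The short Python sketch below (our own illustration; the function name is ours, and it is not the optimized algorithm of RFC 1071) computes the 16-bit 1s-complement sum with overflow wrapped around, and can be used to verify the by-hand arithmetic that follows.

    def ones_complement_checksum(words):
        # 16-bit 1s-complement sum over a list of 16-bit words, with any
        # overflow wrapped around; returns the complemented result.
        total = 0
        for w in words:
            total += w
            total = (total & 0xFFFF) + (total >> 16)   # wrap overflow around
        return ~total & 0xFFFF                         # take the 1s complement

    words = [0b0110011001100000, 0b0101010101010101, 0b1000111100001100]
    print(format(ones_complement_checksum(words), '016b'))
    # prints 1011010100111101, matching the checksum computed by hand below
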
As an example, suppose that we have the following three 16-bit words:

    0110011001100000
    0101010101010101
    1000111100001100

The sum of the first two of these 16-bit words is

    0110011001100000
    0101010101010101
    1011101110110101

Adding the third word to the above sum gives

    1011101110110101
    1000111100001100
    0100101011000010

Note that this last addition had overflow, which was wrapped around. The 1s complement is obtained by converting all the 0s to 1s and converting all the 1s to 0s. Thus the 1s complement of the sum 0100101011000010 is 1011010100111101, which becomes the checksum. At the receiver, all four 16-bit words are added, including the checksum. If no errors are introduced into the packet, then clearly the sum at the receiver will be 1111111111111111. If one of the bits is a 0, then we know that errors have been introduced into the packet.

You may wonder why UDP provides a checksum in the first place, as many link-layer protocols (including the popular Ethernet protocol) also provide error checking. The reason is that there is no guarantee that all the links between source and destination provide error checking; that is, one of the links may use a link-layer protocol that does not provide error checking. Furthermore, even if segments are correctly transferred across a link, it's possible that bit errors could be introduced when a segment is stored in a router's memory. Given that neither link-by-link reliability nor in-memory error detection is guaranteed, UDP must provide error detection at the transport layer, on an end-end basis, if the end-end data transfer service is to provide error detection. This is an example of the celebrated end-end principle in system design \[Saltzer 1984\], which states that since certain functionality (error detection, in this case) must be implemented on an end-end basis: "functions placed at the lower levels may be redundant or of little value when compared to the cost of providing them at the higher level." Because IP is supposed to run over just about any layer-2 protocol, it is useful for the transport layer to provide error checking as a safety measure. Although UDP provides error checking, it does not do anything to recover from an error. Some implementations of UDP simply discard the damaged segment; others pass the damaged segment to the application with a warning.

That wraps up our discussion of UDP. We will soon see that TCP offers reliable data transfer to its applications as well as other services that UDP doesn't offer. Naturally, TCP is also more complex than UDP. Before discussing TCP, however, it will be useful to step back and first discuss the underlying principles of reliable data transfer.

3.4 Principles of Reliable Data Transfer

In this section, we consider the problem of reliable data transfer in a general context. This is appropriate since the problem of implementing reliable data transfer occurs not only at the transport layer, but also at the link layer and the application layer as well. The general problem is thus of central importance to networking. Indeed, if one had to identify a "top-ten" list of fundamentally important problems in all of networking, this would be a candidate to lead the list. In the next section we'll examine TCP and show, in particular, that TCP exploits many of the principles that we are about to describe. Figure 3.8 illustrates the framework for our study of reliable data transfer.
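The wraparound-and-complement arithmetic just illustrated is straightforward to express in code. The sketch below is a simple illustrative version, not an efficient one (RFC 1071 discusses far faster implementations); running it on the three example words reproduces the checksum computed above.

```python
def internet_checksum(data: bytes) -> int:
    if len(data) % 2:            # pad odd-length data with a zero byte
        data += b'\x00'
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add the next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # wrap any overflow around
    return ~total & 0xFFFF                         # take the 1s complement

# The three 16-bit words from the example above, written in hex:
words = bytes.fromhex('66605555 8f0c')
print(format(internet_checksum(words), '016b'))    # prints 1011010100111101
```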
Figure 3.8 Reliable data transfer: Service model and service implementation

The service abstraction provided to the upper-layer entities is that of a reliable channel through which data can be transferred. With a reliable channel, no transferred data bits are corrupted (flipped from 0 to 1, or vice versa) or lost, and all are delivered in the order in which they were sent. This is precisely the service model offered by TCP to the Internet applications that invoke it. It is the responsibility of a reliable data transfer protocol to implement this service abstraction. This task is made difficult by the fact that the layer below the reliable data transfer protocol may be unreliable. For example, TCP is a reliable data transfer protocol that is implemented on top of an unreliable (IP) end-to-end network layer. More generally, the layer beneath the two reliably communicating end points might consist of a single physical link (as in the case of a link-level data transfer protocol) or a global internetwork (as in the case of a transport-level protocol). For our purposes, however, we can view this lower layer simply as an unreliable point-to-point channel.

In this section, we will incrementally develop the sender and receiver sides of a reliable data transfer protocol, considering increasingly complex models of the underlying channel. For example, we'll consider what protocol mechanisms are needed when the underlying channel can corrupt bits or lose entire packets. One assumption we'll adopt throughout our discussion here is that packets will be delivered in the order in which they were sent, with some packets possibly being lost; that is, the underlying channel will not reorder packets. Figure 3.8(b) illustrates the interfaces for our data transfer protocol. The sending side of the data transfer protocol will be invoked from above by a call to rdt_send() . It will pass the data to be delivered to the upper layer at the receiving side. (Here rdt stands for reliable data transfer protocol and \_send indicates that the sending side of rdt is being called. The first step in developing any protocol is to choose a good name!) On the receiving side, rdt_rcv() will be called when a packet arrives from the receiving side of the channel. When the rdt protocol wants to deliver data to the upper layer, it will do so by calling deliver_data() . In the following we use the terminology "packet" rather than transport-layer "segment." Because the theory developed in this section applies to computer networks in general and not just to the Internet transport layer, the generic term "packet" is perhaps more appropriate here.

In this section we consider only the case of unidirectional data transfer, that is, data transfer from the sending to the receiving side. The case of reliable bidirectional (that is, full-duplex) data transfer is conceptually no more difficult but considerably more tedious to explain. Although we consider only unidirectional data transfer, it is important to note that the sending and receiving sides of our protocol will nonetheless need to transmit packets in both directions, as indicated in Figure 3.8. We will see shortly that, in addition to exchanging packets containing the data to be transferred, the sending and receiving sides of rdt will also need to exchange control packets back and forth. Both the send and receive sides of rdt send packets to the other side by a call to udt_send() (where udt stands for unreliable data transfer).
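To keep the upcoming protocol discussions concrete, we will accompany several of them with Python sketches built on a small scaffold of our own invention (not from the text's FSM notation), whose methods mirror the four interfaces of Figure 3.8(b); the channel and upper_layer objects are assumed stand-ins for the layers below and above.

```python
class RdtEndpoint:
    """Scaffolding whose methods mirror rdt_send(), rdt_rcv(),
    deliver_data(), and udt_send() from Figure 3.8(b)."""

    def __init__(self, channel, upper_layer):
        self.channel = channel          # the unreliable channel below us
        self.upper_layer = upper_layer  # the application above us

    def rdt_send(self, data):
        """Called from above on the sending side; accepts data for delivery."""
        raise NotImplementedError

    def rdt_rcv(self, packet):
        """Called from below when a packet arrives from the channel."""
        raise NotImplementedError

    def deliver_data(self, data):
        """Pass correctly received data up to the upper layer."""
        self.upper_layer.receive(data)

    def udt_send(self, packet):
        """Send a packet over the unreliable channel (udt = unreliable)."""
        self.channel.send(packet)
```

The protocol sketches that follow simply subclass this scaffold and fill in rdt_send() and rdt_rcv() for each protocol variant.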
3.4.1 Building a Reliable Data Transfer Protocol

We now step through a series of protocols, each one becoming more complex, arriving at a flawless, reliable data transfer protocol.

Reliable Data Transfer over a Perfectly Reliable Channel: rdt1.0

We first consider the simplest case, in which the underlying channel is completely reliable. The protocol itself, which we'll call rdt1.0 , is trivial. The finite-state machine (FSM) definitions for the rdt1.0 sender and receiver are shown in Figure 3.9. The FSM in Figure 3.9(a) defines the operation of the sender, while the FSM in Figure 3.9(b) defines the operation of the receiver. It is important to note that there are separate FSMs for the sender and for the receiver. The sender and receiver FSMs in Figure 3.9 each have just one state. The arrows in the FSM description indicate the transition of the protocol from one state to another. (Since each FSM in Figure 3.9 has just one state, a transition is necessarily from the one state back to itself; we'll see more complicated state diagrams shortly.) The event causing the transition is shown above the horizontal line labeling the transition, and the actions taken when the event occurs are shown below the horizontal line. When no action is taken on an event, or no event occurs and an action is taken, we'll use the symbol Λ below or above the horizontal, respectively, to explicitly denote the lack of an action or event. The initial state of the FSM is indicated by the dashed arrow. Although the FSMs in Figure 3.9 have but one state, the FSMs we will see shortly have multiple states, so it will be important to identify the initial state of each FSM.

The sending side of rdt simply accepts data from the upper layer via the rdt_send(data) event, creates a packet containing the data (via the action make_pkt(data) ) and sends the packet into the channel. In practice, the rdt_send(data) event would result from a procedure call (for example, to rdt_send() ) by the upper-layer application.

Figure 3.9 rdt1.0 -- A protocol for a completely reliable channel

On the receiving side, rdt receives a packet from the underlying channel via the rdt_rcv(packet) event, removes the data from the packet (via the action extract(packet, data) ) and passes the data up to the upper layer (via the action deliver_data(data) ). In practice, the rdt_rcv(packet) event would result from a procedure call (for example, to rdt_rcv() ) from the lower-layer protocol. In this simple protocol, there is no difference between a unit of data and a packet. Also, all packet flow is from the sender to receiver; with a perfectly reliable channel there is no need for the receiver side to provide any feedback to the sender since nothing can go wrong! Note that we have also assumed that the receiver is able to receive data as fast as the sender happens to send data. Thus, there is no need for the receiver to ask the sender to slow down!

Reliable Data Transfer over a Channel with Bit Errors: rdt2.0

A more realistic model of the underlying channel is one in which bits in a packet may be corrupted. Such bit errors typically occur in the physical components of a network as a packet is transmitted, propagates, or is buffered. We'll continue to assume for the moment that all transmitted packets are received (although their bits may be corrupted) in the order in which they were sent.
Before developing a protocol for reliably communicating over such a channel, first consider how people might deal with such a situation. Consider how you yourself might dictate a long message over the phone. In a typical scenario, the message taker might say "OK" after each sentence has been heard, understood, and recorded. If the message taker hears a garbled sentence, you're asked to repeat the garbled sentence. This message-dictation protocol uses both positive acknowledgments ("OK") and negative acknowledgments ("Please repeat that."). These control messages allow the receiver to let the sender know what has been received correctly, and what has been received in error and thus requires repeating. In a computer network setting, reliable data transfer protocols based on such retransmission are known as ARQ (Automatic Repeat reQuest) protocols. Fundamentally, three additional protocol capabilities are required in ARQ protocols to handle the presence of bit errors:

Error detection. First, a mechanism is needed to allow the receiver to detect when bit errors have occurred. Recall from the previous section that UDP uses the Internet checksum field for exactly this purpose. In Chapter 6 we'll examine error-detection and -correction techniques in greater detail; these techniques allow the receiver to detect and possibly correct packet bit errors. For now, we need only know that these techniques require that extra bits (beyond the bits of original data to be transferred) be sent from the sender to the receiver; these bits will be gathered into the packet checksum field of the rdt2.0 data packet.

Receiver feedback. Since the sender and receiver are typically executing on different end systems, possibly separated by thousands of miles, the only way for the sender to learn of the receiver's view of the world (in this case, whether or not a packet was received correctly) is for the receiver to provide explicit feedback to the sender. The positive (ACK) and negative (NAK) acknowledgment replies in the message-dictation scenario are examples of such feedback. Our rdt2.0 protocol will similarly send ACK and NAK packets back from the receiver to the sender. In principle, these packets need only be one bit long; for example, a 0 value could indicate a NAK and a value of 1 could indicate an ACK.

Retransmission. A packet that is received in error at the receiver will be retransmitted by the sender.

Figure 3.10 shows the FSM representation of rdt2.0 , a data transfer protocol employing error detection, positive acknowledgments, and negative acknowledgments.

Figure 3.10 rdt2.0 -- A protocol for a channel with bit errors

The send side of rdt2.0 has two states. In the leftmost state, the send-side protocol is waiting for data to be passed down from the upper layer. When the rdt_send(data) event occurs, the sender will create a packet ( sndpkt ) containing the data to be sent, along with a packet checksum (for example, as discussed in Section 3.3.2 for the case of a UDP segment), and then send the packet via the udt_send(sndpkt) operation. In the rightmost state, the sender protocol is waiting for an ACK or a NAK packet from the receiver. If an ACK packet is received (the notation rdt_rcv(rcvpkt) && isACK(rcvpkt) in Figure 3.10 corresponds to this event), the sender knows that the most recently transmitted packet has been received correctly and thus the protocol returns to the state of waiting for data from the upper layer.
If a NAK is received, the protocol retransmits the last packet and waits for an ACK or NAK to be returned by the receiver in response to the retransmitted data packet. It is important to note that when the sender is in the wait-for-ACK-or-NAK state, it cannot get more data from the upper layer; that is, the rdt_send() event cannot occur; that will happen only after the sender receives an ACK and leaves this state. Thus, the sender will not send a new piece of data until it is sure that the receiver has correctly received the current packet. Because of this behavior, protocols such as rdt2.0 are known as stop-and-wait protocols.

The receiver-side FSM for rdt2.0 still has a single state. On packet arrival, the receiver replies with either an ACK or a NAK, depending on whether or not the received packet is corrupted. In Figure 3.10, the notation rdt_rcv(rcvpkt) && corrupt(rcvpkt) corresponds to the event in which a packet is received and is found to be in error.

Protocol rdt2.0 may look as if it works but, unfortunately, it has a fatal flaw. In particular, we haven't accounted for the possibility that the ACK or NAK packet could be corrupted! (Before proceeding on, you should think about how this problem may be fixed.) Unfortunately, our slight oversight is not as innocuous as it may seem. Minimally, we will need to add checksum bits to ACK/NAK packets in order to detect such errors. The more difficult question is how the protocol should recover from errors in ACK or NAK packets. The difficulty here is that if an ACK or NAK is corrupted, the sender has no way of knowing whether or not the receiver has correctly received the last piece of transmitted data. Consider three possibilities for handling corrupted ACKs or NAKs:

For the first possibility, consider what a human might do in the message-dictation scenario. If the speaker didn't understand the "OK" or "Please repeat that" reply from the receiver, the speaker would probably ask, "What did you say?" (thus introducing a new type of sender-to-receiver packet to our protocol). The receiver would then repeat the reply. But what if the speaker's "What did you say?" is corrupted? The receiver, having no idea whether the garbled sentence was part of the dictation or a request to repeat the last reply, would probably then respond with "What did you say?" And then, of course, that response might be garbled. Clearly, we're heading down a difficult path.

A second alternative is to add enough checksum bits to allow the sender not only to detect, but also to recover from, bit errors. This solves the immediate problem for a channel that can corrupt packets but not lose them.

A third approach is for the sender simply to resend the current data packet when it receives a garbled ACK or NAK packet. This approach, however, introduces duplicate packets into the sender-to-receiver channel. The fundamental difficulty with duplicate packets is that the receiver doesn't know whether the ACK or NAK it last sent was received correctly at the sender. Thus, it cannot know a priori whether an arriving packet contains new data or is a retransmission!

A simple solution to this new problem (and one adopted in almost all existing data transfer protocols, including TCP) is to add a new field to the data packet and have the sender number its data packets by putting a sequence number into this field. The receiver then need only check this sequence number to determine whether or not the received packet is a retransmission.
For this simple case of a stop-and-wait protocol, a 1-bit sequence number will suffice, since it will allow the receiver to know whether the sender is resending the previously transmitted packet (the received packet has the same sequence number as the most recently received packet) or a new packet (the sequence number changes, moving "forward" in modulo-2 arithmetic). Since we are currently assuming a channel that does not lose packets, ACK and NAK packets do not themselves need to indicate the sequence number of the packet they are acknowledging. The sender knows that a received ACK or NAK packet (whether garbled or not) was generated in response to its most recently transmitted data packet.

Figures 3.11 and 3.12 show the FSM description for rdt2.1 , our fixed version of rdt2.0 . The rdt2.1 sender and receiver FSMs each now have twice as many states as before. This is because the protocol state must now reflect whether the packet currently being sent (by the sender) or expected (at the receiver) should have a sequence number of 0 or 1. Note that the actions in those states where a 0-numbered packet is being sent or expected are mirror images of those where a 1-numbered packet is being sent or expected; the only differences have to do with the handling of the sequence number.

Figure 3.11 rdt2.1 sender

Figure 3.12 rdt2.1 receiver

Protocol rdt2.1 uses both positive and negative acknowledgments from the receiver to the sender. When an out-of-order packet is received, the receiver sends a positive acknowledgment for the packet it has received. When a corrupted packet is received, the receiver sends a negative acknowledgment. We can accomplish the same effect as a NAK if, instead of sending a NAK, we send an ACK for the last correctly received packet. A sender that receives two ACKs for the same packet (that is, receives duplicate ACKs) knows that the receiver did not correctly receive the packet following the packet that is being ACKed twice. Our NAK-free reliable data transfer protocol for a channel with bit errors is rdt2.2 , shown in Figures 3.13 and 3.14. One subtle change between rdt2.1 and rdt2.2 is that the receiver must now include the sequence number of the packet being acknowledged by an ACK message (this is done by including the ACK , 0 or ACK , 1 argument in make_pkt() in the receiver FSM), and the sender must now check the sequence number of the packet being acknowledged by a received ACK message (this is done by including the 0 or 1 argument in isACK() in the sender FSM).

Figure 3.13 rdt2.2 sender

Reliable Data Transfer over a Lossy Channel with Bit Errors: rdt3.0

Suppose now that in addition to corrupting bits, the underlying channel can lose packets as well, a not-uncommon event in today's computer networks (including the Internet). Two additional concerns must now be addressed by the protocol: how to detect packet loss and what to do when packet loss occurs. The use of checksumming, sequence numbers, ACK packets, and retransmissions---the techniques already developed in rdt2.2 ---will allow us to answer the latter concern. Handling the first concern will require adding a new protocol mechanism. There are many possible approaches toward dealing with packet loss (several more of which are explored in the exercises at the end of the chapter). Here, we'll put the burden of detecting and recovering from lost packets on the sender.
Suppose that the sender transmits a data packet and either that packet, or the receiver's ACK of that packet, gets lost. In either case, no reply is forthcoming at the sender from the receiver. If the sender is willing to wait long enough so that it is certain that a packet has been lost, it can simply retransmit the data packet. You should convince yourself that this protocol does indeed work.

But how long must the sender wait to be certain that something has been lost? The sender must clearly wait at least as long as a round-trip delay between the sender and receiver (which may include buffering at intermediate routers) plus whatever amount of time is needed to process a packet at the receiver. In many networks, this worst-case maximum delay is very difficult even to estimate, much less know with certainty. Moreover, the protocol should ideally recover from packet loss as soon as possible; waiting for a worst-case delay could mean a long wait until error recovery is initiated. The approach thus adopted in practice is for the sender to judiciously choose a time value such that packet loss is likely, although not guaranteed, to have happened. If an ACK is not received within this time, the packet is retransmitted. Note that if a packet experiences a particularly large delay, the sender may retransmit the packet even though neither the data packet nor its ACK has been lost. This introduces the possibility of duplicate data packets in the sender-to-receiver channel. Happily, protocol rdt2.2 already has enough functionality (that is, sequence numbers) to handle the case of duplicate packets.

Figure 3.14 rdt2.2 receiver

From the sender's viewpoint, retransmission is a panacea. The sender does not know whether a data packet was lost, an ACK was lost, or if the packet or ACK was simply overly delayed. In all cases, the action is the same: retransmit. Implementing a time-based retransmission mechanism requires a countdown timer that can interrupt the sender after a given amount of time has expired. The sender will thus need to be able to (1) start the timer each time a packet (either a first-time packet or a retransmission) is sent, (2) respond to a timer interrupt (taking appropriate actions), and (3) stop the timer. Figure 3.15 shows the sender FSM for rdt3.0 , a protocol that reliably transfers data over a channel that can corrupt or lose packets; in the homework problems, you'll be asked to provide the receiver FSM for rdt3.0 .

Figure 3.15 rdt3.0 sender

Figure 3.16 shows how the protocol operates with no lost or delayed packets and how it handles lost data packets. In Figure 3.16, time moves forward from the top of the diagram toward the bottom of the diagram; note that a receive time for a packet is necessarily later than the send time for a packet as a result of transmission and propagation delays. In Figures 3.16(b)--(d), the send-side brackets indicate the times at which a timer is set and later times out. Several of the more subtle aspects of this protocol are explored in the exercises at the end of this chapter. Because packet sequence numbers alternate between 0 and 1, protocol rdt3.0 is sometimes known as the alternating-bit protocol.

We have now assembled the key elements of a data transfer protocol. Checksums, sequence numbers, timers, and positive and negative acknowledgment packets each play a crucial and necessary role in the operation of the protocol. We now have a working reliable data transfer protocol!
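Before moving on, here is a sketch of how the rdt3.0 sender FSM of Figure 3.15 might look in code, using the RdtEndpoint scaffold from earlier. The make_pkt, corrupt, and is_ack helpers are assumed to attach and check a checksum and the alternating sequence number, and the timer is an assumed countdown-timer object that calls timeout() when it expires; this is an illustrative rendering, not the text's own implementation.

```python
class Rdt30Sender(RdtEndpoint):
    """Alternating-bit sender: wait-for-call and wait-for-ACK states, seq 0 and 1."""

    def __init__(self, channel, upper_layer, timer):
        super().__init__(channel, upper_layer)
        self.timer = timer            # countdown timer; invokes self.timeout() on expiry
        self.seq = 0                  # sequence number alternates between 0 and 1
        self.waiting_for_ack = False
        self.sndpkt = None            # copy of the outstanding packet, for retransmission

    def rdt_send(self, data):
        if self.waiting_for_ack:
            return False              # stop-and-wait: refuse new data while waiting
        self.sndpkt = make_pkt(data, seq=self.seq)
        self.udt_send(self.sndpkt)
        self.timer.start()
        self.waiting_for_ack = True
        return True

    def rdt_rcv(self, packet):
        # Only an uncorrupted ACK for the current sequence number advances the FSM;
        # corrupted or duplicate ACKs are ignored, and the timer covers loss.
        if self.waiting_for_ack and not corrupt(packet) and is_ack(packet, self.seq):
            self.timer.stop()
            self.seq = 1 - self.seq
            self.waiting_for_ack = False

    def timeout(self):
        self.udt_send(self.sndpkt)    # retransmit the outstanding packet
        self.timer.start()            # and restart the timer
```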
3.4.2 Pipelined Reliable Data Transfer Protocols

Protocol rdt3.0 is a functionally correct protocol, but it is unlikely that anyone would be happy with its performance, particularly in today's high-speed networks. At the heart of rdt3.0 's performance problem is the fact that it is a stop-and-wait protocol.

Figure 3.16 Operation of rdt3.0 , the alternating-bit protocol

Figure 3.17 Stop-and-wait versus pipelined protocol

To appreciate the performance impact of this stop-and-wait behavior, consider an idealized case of two hosts, one located on the West Coast of the United States and the other located on the East Coast, as shown in Figure 3.17. The speed-of-light round-trip propagation delay between these two end systems, RTT, is approximately 30 milliseconds. Suppose that they are connected by a channel with a transmission rate, R, of 1 Gbps (10^9 bits per second). With a packet size, L, of 1,000 bytes (8,000 bits) per packet, including both header fields and data, the time needed to actually transmit the packet into the 1 Gbps link is

d_trans = L/R = (8,000 bits/packet)/(10^9 bits/sec) = 8 microseconds

Figure 3.18(a) shows that with our stop-and-wait protocol, if the sender begins sending the packet at t = 0, then at t = L/R = 8 microseconds, the last bit enters the channel at the sender side. The packet then makes its 15-msec cross-country journey, with the last bit of the packet emerging at the receiver at t = RTT/2 + L/R = 15.008 msec. Assuming for simplicity that ACK packets are extremely small (so that we can ignore their transmission time) and that the receiver can send an ACK as soon as the last bit of a data packet is received, the ACK emerges back at the sender at t = RTT + L/R = 30.008 msec. At this point, the sender can now transmit the next message. Thus, in 30.008 msec, the sender was sending for only 0.008 msec. If we define the utilization of the sender (or the channel) as the fraction of time the sender is actually busy sending bits into the channel, the analysis in Figure 3.18(a) shows that the stop-and-wait protocol has a rather dismal sender utilization, U_sender, of

U_sender = (L/R)/(RTT + L/R) = 0.008/30.008 = 0.00027

Figure 3.18 Stop-and-wait and pipelined sending

That is, the sender was busy only 2.7 hundredths of one percent of the time! Viewed another way, the sender was able to send only 1,000 bytes in 30.008 milliseconds, an effective throughput of only 267 kbps---even though a 1 Gbps link was available! Imagine the unhappy network manager who just paid a fortune for a gigabit capacity link but manages to get a throughput of only 267 kilobits per second! This is a graphic example of how network protocols can limit the capabilities provided by the underlying network hardware. Also, we have neglected lower-layer protocol-processing times at the sender and receiver, as well as the processing and queuing delays that would occur at any intermediate routers between the sender and receiver. Including these effects would serve only to further increase the delay and further accentuate the poor performance.

The solution to this particular performance problem is simple: Rather than operate in a stop-and-wait manner, the sender is allowed to send multiple packets without waiting for acknowledgments, as illustrated in Figure 3.17(b).
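These numbers are easy to check with a quick back-of-the-envelope script (plain Python arithmetic, not protocol code); it also computes the utilization for the three-packet pipelining case discussed next.

```python
L = 8000        # packet size in bits
R = 1e9         # transmission rate in bits/sec
RTT = 0.030     # round-trip time in seconds

d_trans = L / R                                  # 8e-06 sec = 8 microseconds
U_stop_and_wait = (L / R) / (RTT + L / R)        # ~0.00027
U_pipelined_3 = 3 * (L / R) / (RTT + L / R)      # ~0.0008: three packets per RTT
throughput = L / (RTT + L / R)                   # ~266,600 bits/sec, i.e., ~267 kbps

print(d_trans, U_stop_and_wait, U_pipelined_3, throughput)
```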
Figure 3.18(b) shows that if the sender is allowed to transmit three packets before having to wait for acknowledgments, the utilization of the sender is essentially tripled. Since the many in-transit sender-to-receiver packets can be visualized as filling a pipeline, this technique is known as pipelining. Pipelining has the following consequences for reliable data transfer protocols:

The range of sequence numbers must be increased, since each in-transit packet (not counting retransmissions) must have a unique sequence number and there may be multiple, in-transit, unacknowledged packets.

The sender and receiver sides of the protocols may have to buffer more than one packet. Minimally, the sender will have to buffer packets that have been transmitted but not yet acknowledged. Buffering of correctly received packets may also be needed at the receiver, as discussed below.

The range of sequence numbers needed and the buffering requirements will depend on the manner in which a data transfer protocol responds to lost, corrupted, and overly delayed packets. Two basic approaches toward pipelined error recovery can be identified: Go-Back-N and selective repeat.

3.4.3 Go-Back-N (GBN)

In a Go-Back-N (GBN) protocol, the sender is allowed to transmit multiple packets (when available) without waiting for an acknowledgment, but is constrained to have no more than some maximum allowable number, N, of unacknowledged packets in the pipeline. We describe the GBN protocol in some detail in this section. But before reading on, you are encouraged to play with the GBN applet (an awesome applet!) at the companion Web site. Figure 3.19 shows the sender's view of the range of sequence numbers in a GBN protocol.

Figure 3.19 Sender's view of sequence numbers in Go-Back-N

If we define base to be the sequence number of the oldest unacknowledged packet and nextseqnum to be the smallest unused sequence number (that is, the sequence number of the next packet to be sent), then four intervals in the range of sequence numbers can be identified. Sequence numbers in the interval \[0, base-1\] correspond to packets that have already been transmitted and acknowledged. The interval \[base, nextseqnum-1\] corresponds to packets that have been sent but not yet acknowledged. Sequence numbers in the interval \[nextseqnum, base+N-1\] can be used for packets that can be sent immediately, should data arrive from the upper layer. Finally, sequence numbers greater than or equal to base+N cannot be used until an unacknowledged packet currently in the pipeline (specifically, the packet with sequence number base ) has been acknowledged.

As suggested by Figure 3.19, the range of permissible sequence numbers for transmitted but not yet acknowledged packets can be viewed as a window of size N over the range of sequence numbers. As the protocol operates, this window slides forward over the sequence number space. For this reason, N is often referred to as the window size and the GBN protocol itself as a sliding-window protocol. You might be wondering why we would even limit the number of outstanding, unacknowledged packets to a value of N in the first place. Why not allow an unlimited number of such packets? We'll see in Section 3.5 that flow control is one reason to impose a limit on the sender. We'll examine another reason to do so in Section 3.7, when we study TCP congestion control. In practice, a packet's sequence number is carried in a fixed-length field in the packet header.
If k is the number of bits in the packet sequence number field, the range of sequence numbers is thus \[0, 2^k − 1\]. With a finite range of sequence numbers, all arithmetic involving sequence numbers must then be done using modulo-2^k arithmetic. (That is, the sequence number space can be thought of as a ring of size 2^k, where sequence number 2^k − 1 is immediately followed by sequence number 0.) Recall that rdt3.0 had a 1-bit sequence number and a range of sequence numbers of \[0,1\]. Several of the problems at the end of this chapter explore the consequences of a finite range of sequence numbers. We will see in Section 3.5 that TCP has a 32-bit sequence number field, where TCP sequence numbers count bytes in the byte stream rather than packets.

Figures 3.20 and 3.21 give an extended FSM description of the sender and receiver sides of an ACK-based, NAK-free, GBN protocol. We refer to this FSM description as an extended FSM because we have added variables (similar to programming-language variables) for base and nextseqnum , and added operations on these variables and conditional actions involving these variables. Note that the extended FSM specification is now beginning to look somewhat like a programming-language specification. \[Bochman 1984\] provides an excellent survey of additional extensions to FSM techniques as well as other programming-language-based techniques for specifying protocols.

Figure 3.20 Extended FSM description of the GBN sender

Figure 3.21 Extended FSM description of the GBN receiver

The GBN sender must respond to three types of events:

Invocation from above. When rdt_send() is called from above, the sender first checks to see if the window is full, that is, whether there are N outstanding, unacknowledged packets. If the window is not full, a packet is created and sent, and variables are appropriately updated. If the window is full, the sender simply returns the data back to the upper layer, an implicit indication that the window is full. The upper layer would presumably then have to try again later. In a real implementation, the sender would more likely have either buffered (but not immediately sent) this data, or would have a synchronization mechanism (for example, a semaphore or a flag) that would allow the upper layer to call rdt_send() only when the window is not full.

Receipt of an ACK. In our GBN protocol, an acknowledgment for a packet with sequence number n will be taken to be a cumulative acknowledgment, indicating that all packets with a sequence number up to and including n have been correctly received at the receiver. We'll come back to this issue shortly when we examine the receiver side of GBN.

A timeout event. The protocol's name, "Go-Back-N," is derived from the sender's behavior in the presence of lost or overly delayed packets. As in the stop-and-wait protocol, a timer will again be used to recover from lost data or acknowledgment packets. If a timeout occurs, the sender resends all packets that have been previously sent but that have not yet been acknowledged. Our sender in Figure 3.20 uses only a single timer, which can be thought of as a timer for the oldest transmitted but not yet acknowledged packet. If an ACK is received but there are still additional transmitted but not yet acknowledged packets, the timer is restarted. If there are no outstanding, unacknowledged packets, the timer is stopped.

The receiver's actions in GBN are also simple.
If a packet with sequence number n is received correctly and is in order (that is, the data last delivered to the upper layer came from a packet with sequence number n−1), the receiver sends an ACK for packet n and delivers the data portion of the packet to the upper layer. In all other cases, the receiver discards the packet and resends an ACK for the most recently received in-order packet. Note that since packets are delivered one at a time to the upper layer, if packet k has been received and delivered, then all packets with a sequence number lower than k have also been delivered. Thus, the use of cumulative acknowledgments is a natural choice for GBN.

In our GBN protocol, the receiver discards out-of-order packets. Although it may seem silly and wasteful to discard a correctly received (but out-of-order) packet, there is some justification for doing so. Recall that the receiver must deliver data in order to the upper layer. Suppose now that packet n is expected, but packet n+1 arrives. Because data must be delivered in order, the receiver could buffer (save) packet n+1 and then deliver this packet to the upper layer after it had later received and delivered packet n. However, if packet n is lost, both it and packet n+1 will eventually be retransmitted as a result of the GBN retransmission rule at the sender. Thus, the receiver can simply discard packet n+1. The advantage of this approach is the simplicity of receiver buffering---the receiver need not buffer any out-of-order packets. Thus, while the sender must maintain the upper and lower bounds of its window and the position of nextseqnum within this window, the only piece of information the receiver need maintain is the sequence number of the next in-order packet. This value is held in the variable expectedseqnum , shown in the receiver FSM in Figure 3.21. Of course, the disadvantage of throwing away a correctly received packet is that the subsequent retransmission of that packet might be lost or garbled and thus even more retransmissions would be required.

Figure 3.22 shows the operation of the GBN protocol for the case of a window size of four packets. Because of this window size limitation, the sender sends packets 0 through 3 but then must wait for one or more of these packets to be acknowledged before proceeding. As each successive ACK (for example, ACK0 and ACK1 ) is received, the window slides forward and the sender can transmit one new packet (pkt4 and pkt5, respectively). On the receiver side, packet 2 is lost and thus packets 3, 4, and 5 are found to be out of order and are discarded.

Before closing our discussion of GBN, it is worth noting that an implementation of this protocol in a protocol stack would likely have a structure similar to that of the extended FSM in Figure 3.20. The implementation would also likely be in the form of various procedures that implement the actions to be taken in response to the various events that can occur. In such event-based programming, the various procedures are called (invoked) either by other procedures in the protocol stack, or as the result of an interrupt. In the sender, these events would be (1) a call from the upper-layer entity to invoke rdt_send() , (2) a timer interrupt, and (3) a call from the lower layer to invoke rdt_rcv() when a packet arrives. The programming exercises at the end of this chapter will give you a chance to actually implement these routines in a simulated, but realistic, network setting.
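The two FSMs of Figures 3.20 and 3.21 translate almost line for line into code. The sketch below reuses the same scaffold and assumed helpers as the earlier sketches (make_pkt, make_ack, corrupt, seq_num, ack_num, extract), and keeps sequence numbers as unbounded Python integers rather than a modulo-2^k header field; it is illustrative, not production-ready.

```python
class GbnSender(RdtEndpoint):
    def __init__(self, channel, upper_layer, timer, N=4):
        super().__init__(channel, upper_layer)
        self.timer, self.N = timer, N
        self.base = 0                      # oldest unacknowledged sequence number
        self.nextseqnum = 0                # next sequence number to use
        self.sndpkt = {}                   # copies of sent-but-unACKed packets

    def rdt_send(self, data):
        if self.nextseqnum >= self.base + self.N:
            return False                   # window full: refuse the data
        self.sndpkt[self.nextseqnum] = make_pkt(data, seq=self.nextseqnum)
        self.udt_send(self.sndpkt[self.nextseqnum])
        if self.base == self.nextseqnum:
            self.timer.start()             # timer runs for the oldest unACKed packet
        self.nextseqnum += 1
        return True

    def rdt_rcv(self, packet):
        if corrupt(packet):
            return
        self.base = ack_num(packet) + 1    # cumulative ACK: all packets up to n are in
        for n in list(self.sndpkt):
            if n < self.base:
                del self.sndpkt[n]         # drop buffered copies that are now ACKed
        if self.base == self.nextseqnum:
            self.timer.stop()              # nothing outstanding
        else:
            self.timer.start()             # restart for the remaining unACKed packets

    def timeout(self):
        self.timer.start()
        for n in range(self.base, self.nextseqnum):
            self.udt_send(self.sndpkt[n])  # go back N: resend everything unACKed


class GbnReceiver(RdtEndpoint):
    def __init__(self, channel, upper_layer):
        super().__init__(channel, upper_layer)
        self.expectedseqnum = 0

    def rdt_rcv(self, packet):
        if not corrupt(packet) and seq_num(packet) == self.expectedseqnum:
            self.deliver_data(extract(packet))
            self.expectedseqnum += 1
        # In order or not, (re)send an ACK for the last in-order packet delivered
        # (before anything has arrived, this is the FSM's initial default ACK).
        self.udt_send(make_ack(self.expectedseqnum - 1))
```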
We note here that the GBN protocol incorporates almost all of the techniques that we will encounter when we study the reliable data transfer components of TCP in Section 3.5. These techniques include the use of sequence numbers, cumulative acknowledgments, checksums, and a timeout/retransmit operation.

Figure 3.22 Go-Back-N in operation

3.4.4 Selective Repeat (SR)

The GBN protocol allows the sender to potentially "fill the pipeline" in Figure 3.17 with packets, thus avoiding the channel utilization problems we noted with stop-and-wait protocols. There are, however, scenarios in which GBN itself suffers from performance problems. In particular, when the window size and bandwidth-delay product are both large, many packets can be in the pipeline. A single packet error can thus cause GBN to retransmit a large number of packets, many unnecessarily. As the probability of channel errors increases, the pipeline can become filled with these unnecessary retransmissions. Imagine, in our message-dictation scenario, that if every time a word was garbled, the surrounding 1,000 words (for example, a window size of 1,000 words) had to be repeated. The dictation would be slowed by all of the reiterated words.

As the name suggests, selective-repeat protocols avoid unnecessary retransmissions by having the sender retransmit only those packets that it suspects were received in error (that is, were lost or corrupted) at the receiver. This individual, as-needed, retransmission will require that the receiver individually acknowledge correctly received packets. A window size of N will again be used to limit the number of outstanding, unacknowledged packets in the pipeline. However, unlike GBN, the sender will have already received ACKs for some of the packets in the window. Figure 3.23 shows the SR sender's view of the sequence number space. Figure 3.24 details the various actions taken by the SR sender.

Figure 3.23 Selective-repeat (SR) sender and receiver views of sequence-number space

Figure 3.24 SR sender events and actions

The SR receiver will acknowledge a correctly received packet whether or not it is in order. Out-of-order packets are buffered until any missing packets (that is, packets with lower sequence numbers) are received, at which point a batch of packets can be delivered in order to the upper layer. Figure 3.25 itemizes the various actions taken by the SR receiver. Figure 3.26 shows an example of SR operation in the presence of lost packets. Note that in Figure 3.26, the receiver initially buffers packets 3, 4, and 5, and delivers them together with packet 2 to the upper layer when packet 2 is finally received.

Figure 3.25 SR receiver events and actions

Figure 3.26 SR operation

It is important to note that in Step 2 in Figure 3.25, the receiver reacknowledges (rather than ignores) already received packets with certain sequence numbers below the current window base. You should convince yourself that this reacknowledgment is indeed needed. Given the sender and receiver sequence number spaces in Figure 3.23, for example, if there is no ACK for packet send_base propagating from the receiver to the sender, the sender will eventually retransmit packet send_base , even though it is clear (to us, not the sender!) that the receiver has already received that packet. If the receiver were not to acknowledge this packet, the sender's window would never move forward!
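The receiver is where SR departs most sharply from GBN, so here is a sketch of just the SR receiver's actions from Figure 3.25, with the same scaffold, assumed helpers, and plain-integer sequence numbers as the earlier sketches.

```python
class SrReceiver(RdtEndpoint):
    def __init__(self, channel, upper_layer, N=4):
        super().__init__(channel, upper_layer)
        self.N = N
        self.rcv_base = 0    # sequence number of the oldest not-yet-delivered packet
        self.buffered = {}   # correctly received but out-of-order packets

    def rdt_rcv(self, packet):
        if corrupt(packet):
            return
        n = seq_num(packet)
        if self.rcv_base - self.N <= n < self.rcv_base:
            self.udt_send(make_ack(n))   # Step 2: re-ACK packets below the window
        elif self.rcv_base <= n < self.rcv_base + self.N:
            self.udt_send(make_ack(n))   # ACK the packet whether or not it is in order
            self.buffered[n] = packet
            while self.rcv_base in self.buffered:        # a gap just filled: deliver
                pkt = self.buffered.pop(self.rcv_base)   # the in-order batch and
                self.deliver_data(extract(pkt))          # slide the window forward
                self.rcv_base += 1
```

Note how packets in the interval below the window are re-ACKed even though they were already delivered; this is precisely the reacknowledgment that the send_base example above argued is necessary.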
This example illustrates an important aspect of SR protocols (and many other protocols as well). The sender and receiver will not always have an identical view of what has been received correctly and what has not. For SR protocols, this means that the sender and receiver windows will not always coincide. The lack of synchronization between sender and receiver windows has important consequences when we are faced with the reality of a finite range of sequence numbers. Consider what could happen, for example, with a finite range of four packet sequence numbers, 0, 1, 2, 3, and a window size of three.

Suppose packets 0 through 2 are transmitted and correctly received and acknowledged at the receiver. At this point, the receiver's window is over the fourth, fifth, and sixth packets, which have sequence numbers 3, 0, and 1, respectively. Now consider two scenarios. In the first scenario, shown in Figure 3.27(a), the ACKs for the first three packets are lost and the sender retransmits these packets. The receiver thus next receives a packet with sequence number 0---a copy of the first packet sent. In the second scenario, shown in Figure 3.27(b), the ACKs for the first three packets are all delivered correctly. The sender thus moves its window forward and sends the fourth, fifth, and sixth packets, with sequence numbers 3, 0, and 1, respectively. The packet with sequence number 3 is lost, but the packet with sequence number 0 arrives---a packet containing new data.

Now consider the receiver's viewpoint in Figure 3.27, which has a figurative curtain between the sender and the receiver, since the receiver cannot "see" the actions taken by the sender. All the receiver observes is the sequence of messages it receives from the channel and sends into the channel. As far as it is concerned, the two scenarios in Figure 3.27 are identical. There is no way of distinguishing the retransmission of the first packet from an original transmission of the fifth packet. Clearly, a window size that is 1 less than the size of the sequence number space won't work. But how small must the window size be? A problem at the end of the chapter asks you to show that the window size must be less than or equal to half the size of the sequence number space for SR protocols. At the companion Web site, you will find an applet that animates the operation of the SR protocol. Try performing the same experiments that you did with the GBN applet. Do the results agree with what you expect?

This completes our discussion of reliable data transfer protocols. We've covered a lot of ground and introduced numerous mechanisms that together provide for reliable data transfer. Table 3.1 summarizes these mechanisms. Now that we have seen all of these mechanisms in operation and can see the "big picture," we encourage you to review this section again to see how these mechanisms were incrementally added to cover increasingly complex (and realistic) models of the channel connecting the sender and receiver, or to improve the performance of the protocols. Let's conclude our discussion of reliable data transfer protocols by considering one remaining assumption in our underlying channel model. Recall that we have assumed that packets cannot be reordered within the channel between the sender and receiver. This is generally a reasonable assumption when the sender and receiver are connected by a single physical wire.
However, when the "channel" connecting the two is a network, packet reordering can occur. One manifestation of packet reordering is that old copies of a packet with a sequence or acknowledgment number of x can appear, even though neither the sender's nor the receiver's window contains x. With packet reordering, the channel can be thought of as essentially buffering packets and spontaneously emitting these packets at any point in the future. Because sequence numbers may be reused, some care must be taken to guard against such duplicate packets. The approach taken in practice is to ensure that a sequence number is not reused until the sender is "sure" that any previously sent packets with sequence number x are no longer in the network. This is done by assuming that a packet cannot "live" in the network for longer than some fixed maximum amount of time. A maximum packet lifetime of approximately three minutes is assumed in the TCP extensions for high-speed networks \[RFC 1323\]. \[Sunshine 1978\] describes a method for using sequence numbers such that reordering problems can be completely avoided.

Figure 3.27 SR receiver dilemma with too-large windows: A new packet or a retransmission?

Table 3.1 Summary of reliable data transfer mechanisms and their use

| Mechanism | Use, Comments |
| --- | --- |
| Checksum | Used to detect bit errors in a transmitted packet. |
| Timer | Used to timeout/retransmit a packet, possibly because the packet (or its ACK) was lost within the channel. Because timeouts can occur when a packet is delayed but not lost (premature timeout), or when a packet has been received by the receiver but the receiver-to-sender ACK has been lost, duplicate copies of a packet may be received by a receiver. |
| Sequence number | Used for sequential numbering of packets of data flowing from sender to receiver. Gaps in the sequence numbers of received packets allow the receiver to detect a lost packet. Packets with duplicate sequence numbers allow the receiver to detect duplicate copies of a packet. |
| Acknowledgment | Used by the receiver to tell the sender that a packet or set of packets has been received correctly. Acknowledgments will typically carry the sequence number of the packet or packets being acknowledged. Acknowledgments may be individual or cumulative, depending on the protocol. |
| Negative acknowledgment | Used by the receiver to tell the sender that a packet has not been received correctly. Negative acknowledgments will typically carry the sequence number of the packet that was not received correctly. |
| Window, pipelining | The sender may be restricted to sending only packets with sequence numbers that fall within a given range. By allowing multiple packets to be transmitted but not yet acknowledged, sender utilization can be increased over a stop-and-wait mode of operation. We'll see shortly that the window size may be set on the basis of the receiver's ability to receive and buffer messages, or the level of congestion in the network, or both. |

3.5 Connection-Oriented Transport: TCP

Now that we have covered the underlying principles of reliable data transfer, let's turn to TCP---the Internet's transport-layer, connection-oriented, reliable transport protocol.
In this section, we'll see that in order to provide reliable data transfer, TCP relies on many of the underlying principles discussed in the previous section, including error detection, retransmissions, cumulative acknowledgments, timers, and header fields for sequence and acknowledgment numbers. TCP is defined in RFC 793, RFC 1122, RFC 1323, RFC 2018, and RFC 2581.

3.5.1 The TCP Connection

TCP is said to be connection-oriented because before one application process can begin to send data to another, the two processes must first "handshake" with each other---that is, they must send some preliminary segments to each other to establish the parameters of the ensuing data transfer. As part of TCP connection establishment, both sides of the connection will initialize many TCP state variables (many of which will be discussed in this section and in Section 3.7) associated with the TCP connection.

The TCP "connection" is not an end-to-end TDM or FDM circuit as in a circuit-switched network. Instead, the "connection" is a logical one, with common state residing only in the TCPs in the two communicating end systems. Recall that because the TCP protocol runs only in the end systems and not in the intermediate network elements (routers and link-layer switches), the intermediate network elements do not maintain TCP connection state. In fact, the intermediate routers are completely oblivious to TCP connections; they see datagrams, not connections.

A TCP connection provides a full-duplex service: If there is a TCP connection between Process A on one host and Process B on another host, then application-layer data can flow from Process A to Process B at the same time as application-layer data flows from Process B to Process A. A TCP connection is also always point-to-point, that is, between a single sender and a single receiver. So-called "multicasting" (see the online supplementary materials for this text)---the transfer of data from one sender to many receivers in a single send operation---is not possible with TCP. With TCP, two hosts are company and three are a crowd!

Let's now take a look at how a TCP connection is established. Suppose a process running in one host wants to initiate a connection with another process in another host. Recall that the process that is initiating the connection is called the client process, while the other process is called the server process. The client application process first informs the client transport layer that it wants to establish a connection to a process in the server. Recall from Section 2.7.2 that a Python client program does this by issuing the command

```python
clientSocket.connect((serverName, serverPort))
```

where serverName is the name of the server and serverPort identifies the process on the server. TCP in the client then proceeds to establish a TCP connection with TCP in the server. At the end of this section we discuss in some detail the connection-establishment procedure. For now it suffices to know that the client first sends a special TCP segment; the server responds with a second special TCP segment; and finally the client responds again with a third special segment. The first two segments carry no payload, that is, no application-layer data; the third of these segments may carry a payload. Because three segments are sent between the two hosts, this connection-establishment procedure is often referred to as a three-way handshake.

CASE HISTORY

Vinton Cerf, Robert Kahn, and TCP/IP

In the early 1970s, packet-switched networks began to proliferate, with the ARPAnet---the precursor of the Internet---being just one of many networks. Each of these networks had its own protocol. Two researchers, Vinton Cerf and Robert Kahn, recognized the importance of interconnecting these networks and invented a cross-network protocol called TCP/IP, which stands for Transmission Control Protocol/Internet Protocol. Although Cerf and Kahn began by seeing the protocol as a single entity, it was later split into its two parts, TCP and IP, which operated separately. Cerf and Kahn published a paper on TCP/IP in May 1974 in IEEE Transactions on Communications Technology \[Cerf 1974\]. The TCP/IP protocol, which is the bread and butter of today's Internet, was devised before PCs, workstations, smartphones, and tablets, before the proliferation of Ethernet, cable, and DSL, WiFi, and other access network technologies, and before the Web, social media, and streaming video. Cerf and Kahn saw the need for a networking protocol that, on the one hand, provides broad support for yet-to-be-defined applications and, on the other hand, allows arbitrary hosts and link-layer protocols to interoperate. In 2004, Cerf and Kahn received the ACM's Turing Award, considered the "Nobel Prize of Computing," for "pioneering work on internetworking, including the design and implementation of the Internet's basic communications protocols, TCP/IP, and for inspired leadership in networking."

Once a TCP connection is established, the two application processes can send data to each other. Let's consider the sending of data from the client process to the server process. The client process passes a stream of data through the socket (the door of the process), as described in Section 2.7. Once the data passes through the door, the data is in the hands of TCP running in the client. As shown in Figure 3.28, TCP directs this data to the connection's send buffer, which is one of the buffers that is set aside during the initial three-way handshake. From time to time, TCP will grab chunks of data from the send buffer and pass the data to the network layer. Interestingly, the TCP specification \[RFC 793\] is very laid back about specifying when TCP should actually send buffered data, stating that TCP should "send that data in segments at its own convenience." The maximum amount of data that can be grabbed and placed in a segment is limited by the maximum segment size (MSS). The MSS is typically set by first determining the length of the largest link-layer frame that can be sent by the local sending host (the so-called maximum transmission unit, MTU), and then setting the MSS to ensure that a TCP segment (when encapsulated in an IP datagram) plus the TCP/IP header length (typically 40 bytes) will fit into a single link-layer frame. Both Ethernet and PPP link-layer protocols have an MTU of 1,500 bytes. Thus a typical value of MSS is 1460 bytes. Approaches have also been proposed for discovering the path MTU---the largest link-layer frame that can be sent on all links from source to destination \[RFC 1191\]---and setting the MSS based on the path MTU value.
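The MSS arithmetic is simple enough to state in a few lines; the figures below assume the typical case of no TCP or IP options.

```python
MTU = 1500                  # Ethernet/PPP link-layer payload limit, in bytes
TCP_IP_HEADERS = 20 + 20    # 20-byte TCP header + 20-byte IP header, no options
MSS = MTU - TCP_IP_HEADERS  # = 1460 bytes of application data per segment
print(MSS)
```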
Note that the MSS is the maximum amount of application-layer data in the segment, not the maximum size of the TCP segment including headers. (This terminology is confusing, but we have to live with it, as it is well entrenched.) TCP pairs each chunk of client data with a TCP header, thereby forming TCP segments. The segments are passed down to the network layer, where they are separately encapsulated within network-layer IP datagrams. The IP datagrams are then sent into the network. When TCP receives a segment at the other end, the segment's data is placed in the TCP connection's receive buffer, as shown in Figure 3.28. The application reads the stream of data from this buffer. Each side of the connection has its own send buffer and its own receive buffer. (You can see the online flow-control applet at http://www.awl.com/kurose-ross, which provides an animation of the send and receive buffers.)

Figure 3.28 TCP send and receive buffers

We see from this discussion that a TCP connection consists of buffers, variables, and a socket connection to a process in one host, and another set of buffers, variables, and a socket connection to a process in another host. As mentioned earlier, no buffers or variables are allocated to the connection in the network elements (routers, switches, and repeaters) between the hosts.

3.5.2 TCP Segment Structure

Having taken a brief look at the TCP connection, let's examine the TCP segment structure. The TCP segment consists of header fields and a data field. The data field contains a chunk of application data. As mentioned above, the MSS limits the maximum size of a segment's data field. When TCP sends a large file, such as an image as part of a Web page, it typically breaks the file into chunks of size MSS (except for the last chunk, which will often be less than the MSS). Interactive applications, however, often transmit data chunks that are smaller than the MSS; for example, with remote login applications like Telnet, the data field in the TCP segment is often only one byte. Because the TCP header is typically 20 bytes (12 bytes more than the UDP header), segments sent by Telnet may be only 21 bytes in length. Figure 3.29 shows the structure of the TCP segment. As with UDP, the header includes source and destination port numbers, which are used for multiplexing/demultiplexing data from/to upper-layer applications. Also, as with UDP, the header includes a checksum field. A TCP segment header also contains the following fields:

The 32-bit sequence number field and the 32-bit acknowledgment number field are used by the TCP sender and receiver in implementing a reliable data transfer service, as discussed below.

The 16-bit receive window field is used for flow control. We will see shortly that it is used to indicate the number of bytes that a receiver is willing to accept.

The 4-bit header length field specifies the length of the TCP header in 32-bit words. The TCP header can be of variable length due to the TCP options field. (Typically, the options field is empty, so that the length of the typical TCP header is 20 bytes.)

The optional and variable-length options field is used when a sender and receiver negotiate the maximum segment size (MSS) or as a window scaling factor for use in high-speed networks. A timestamping option is also defined. See RFC 854 and RFC 1323 for additional details.

The flag field contains 6 bits.
Figure 3.29 TCP segment structure

The ACK bit is used to indicate that the value carried in the acknowledgment field is valid; that is, the segment contains an acknowledgment for a segment that has been successfully received. The RST, SYN, and FIN bits are used for connection setup and teardown, as we will discuss at the end of this section. The CWR and ECE bits are used in explicit congestion notification, as discussed in Section 3.7.2. Setting the PSH bit indicates that the receiver should pass the data to the upper layer immediately. Finally, the URG bit is used to indicate that there is data in this segment that the sending-side upper-layer entity has marked as "urgent." The location of the last byte of this urgent data is indicated by the 16-bit urgent data pointer field. TCP must inform the receiving-side upper-layer entity when urgent data exists and pass it a pointer to the end of the urgent data. (In practice, the PSH, URG, and the urgent data pointer are not used. However, we mention these fields for completeness.) Our experience as teachers is that our students sometimes find discussion of packet formats rather dry and perhaps a bit boring. For a fun and fanciful look at TCP header fields, particularly if you love Legos™ as we do, see \[Pomeranz 2010\].

Sequence Numbers and Acknowledgment Numbers

Two of the most important fields in the TCP segment header are the sequence number field and the acknowledgment number field. These fields are a critical part of TCP's reliable data transfer service. But before discussing how these fields are used to provide reliable data transfer, let us first explain what exactly TCP puts in these fields.

Figure 3.30 Dividing file data into TCP segments

TCP views data as an unstructured, but ordered, stream of bytes. TCP's use of sequence numbers reflects this view in that sequence numbers are over the stream of transmitted bytes and not over the series of transmitted segments. The sequence number for a segment is therefore the byte-stream number of the first byte in the segment. Let's look at an example. Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection. The TCP in Host A will implicitly number each byte in the data stream. Suppose that the data stream consists of a file consisting of 500,000 bytes, that the MSS is 1,000 bytes, and that the first byte of the data stream is numbered 0. As shown in Figure 3.30, TCP constructs 500 segments out of the data stream. The first segment gets assigned sequence number 0, the second segment gets assigned sequence number 1,000, the third segment gets assigned sequence number 2,000, and so on. Each sequence number is inserted in the sequence number field in the header of the appropriate TCP segment. Now let's consider acknowledgment numbers. These are a little trickier than sequence numbers. Recall that TCP is full-duplex, so that Host A may be receiving data from Host B while it sends data to Host B (as part of the same TCP connection). Each of the segments that arrive from Host B has a sequence number for the data flowing from B to A. The acknowledgment number that Host A puts in its segment is the sequence number of the next byte Host A is expecting from Host B. It is good to look at a few examples to understand what is going on here. Suppose that Host A has received all bytes numbered 0 through 535 from B and suppose that it is about to send a segment to Host B.
Host A is waiting for byte 536 and all the subsequent bytes in Host B's data stream. So Host A puts 536 in the acknowledgment number field of the segment it sends to B. As another example, suppose that Host A has received one segment from Host B containing bytes 0 through 535 and another segment containing bytes 900 through 1,000. For some reason Host A has not yet received bytes 536 through 899. In this example, Host A is still waiting for byte 536 (and beyond) in order to re-create B's data stream. Thus, A's next segment to B will contain 536 in the acknowledgment number field. Because TCP only acknowledges bytes up to the first missing byte in the stream, TCP is said to provide cumulative acknowledgments. This last example also brings up an important but subtle issue. Host A received the third segment (bytes 900 through 1,000) before receiving the second segment (bytes 536 through 899). Thus, the third segment arrived out of order. The subtle issue is: What does a host do when it receives out-of-order segments in a TCP connection? Interestingly, the TCP RFCs do not impose any rules here and leave the decision up to the programmers implementing TCP. There are basically two choices: either (1) the receiver immediately discards out-of-order segments (which, as we discussed earlier, can simplify receiver design), or (2) the receiver keeps the out-of-order bytes and waits for the missing bytes to fill in the gaps. Clearly, the latter choice is more efficient in terms of network bandwidth, and is the approach taken in practice. In Figure 3.30, we assumed that the initial sequence number was zero. In truth, both sides of a TCP connection randomly choose an initial sequence number. This is done to minimize the possibility that a segment that is still present in the network from an earlier, already-terminated connection between two hosts is mistaken for a valid segment in a later connection between these same two hosts (which also happen to be using the same port numbers as the old connection) \[Sunshine 1978\].

Telnet: A Case Study for Sequence and Acknowledgment Numbers

Telnet, defined in RFC 854, is a popular application-layer protocol used for remote login. It runs over TCP and is designed to work between any pair of hosts. Unlike the bulk data transfer applications discussed in Chapter 2, Telnet is an interactive application. We discuss a Telnet example here, as it nicely illustrates TCP sequence and acknowledgment numbers. We note that many users now prefer to use the SSH protocol rather than Telnet, since data sent in a Telnet connection (including passwords!) are not encrypted, making Telnet vulnerable to eavesdropping attacks (as discussed in Section 8.7). Suppose Host A initiates a Telnet session with Host B. Because Host A initiates the session, it is labeled the client, and Host B is labeled the server. Each character typed by the user (at the client) will be sent to the remote host; the remote host will send back a copy of each character, which will be displayed on the Telnet user's screen. This "echo back" is used to ensure that characters seen by the Telnet user have already been received and processed at the remote site. Each character thus traverses the network twice between the time the user hits the key and the time the character is displayed on the user's monitor. Now suppose the user types a single letter, 'C,' and then grabs a coffee.
Let's examine the TCP segments that are sent between the client and server. As shown in Figure 3.31, we suppose the starting sequence numbers are 42 and 79 for the client and server, respectively. Recall that the sequence number of a segment is the sequence number of the first byte in the data field. Thus, the first segment sent from the client will have sequence number 42; the first segment sent from the server will have sequence number 79. Recall that the acknowledgment number is the sequence number of the next byte of data that the host is waiting for. After the TCP connection is established but before any data is sent, the client is waiting for byte 79 and the server is waiting for byte 42.

Figure 3.31 Sequence and acknowledgment numbers for a simple Telnet application over TCP

As shown in Figure 3.31, three segments are sent. The first segment is sent from the client to the server, containing the 1-byte ASCII representation of the letter 'C' in its data field. This first segment also has 42 in its sequence number field, as we just described. Also, because the client has not yet received any data from the server, this first segment will have 79 in its acknowledgment number field. The second segment is sent from the server to the client. It serves a dual purpose. First it provides an acknowledgment of the data the server has received. By putting 43 in the acknowledgment field, the server is telling the client that it has successfully received everything up through byte 42 and is now waiting for bytes 43 onward. The second purpose of this segment is to echo back the letter 'C.' Thus, the second segment has the ASCII representation of 'C' in its data field. This second segment has the sequence number 79, the initial sequence number of the server-to-client data flow of this TCP connection, as this is the very first byte of data that the server is sending. Note that the acknowledgment for client-to-server data is carried in a segment carrying server-to-client data; this acknowledgment is said to be piggybacked on the server-to-client data segment. The third segment is sent from the client to the server. Its sole purpose is to acknowledge the data it has received from the server. (Recall that the second segment contained data---the letter 'C'---from the server to the client.) This segment has an empty data field (that is, the acknowledgment is not being piggybacked with any client-to-server data). The segment has 80 in the acknowledgment number field because the client has received the stream of bytes up through byte sequence number 79 and it is now waiting for bytes 80 onward. You might think it odd that this segment also has a sequence number since the segment contains no data. But because TCP has a sequence number field, the segment needs to have some sequence number.

3.5.3 Round-Trip Time Estimation and Timeout

TCP, like our rdt protocol in Section 3.4, uses a timeout/retransmit mechanism to recover from lost segments. Although this is conceptually simple, many subtle issues arise when we implement a timeout/retransmit mechanism in an actual protocol such as TCP. Perhaps the most obvious question is the length of the timeout intervals. Clearly, the timeout should be larger than the connection's round-trip time (RTT), that is, the time from when a segment is sent until it is acknowledged. Otherwise, unnecessary retransmissions would be sent. But how much larger? How should the RTT be estimated in the first place?
Should a timer be associated with each and every unacknowledged segment? So many questions! Our discussion in this section is based on the TCP work in \[Jacobson 1988\] and the current IETF recommendations for managing TCP timers \[RFC 6298\].

Estimating the Round-Trip Time

Let's begin our study of TCP timer management by considering how TCP estimates the round-trip time between sender and receiver. This is accomplished as follows. The sample RTT, denoted SampleRTT, for a segment is the amount of time between when the segment is sent (that is, passed to IP) and when an acknowledgment for the segment is received. Instead of measuring a SampleRTT for every transmitted segment, most TCP implementations take only one SampleRTT measurement at a time. That is, at any point in time, the SampleRTT is being estimated for only one of the transmitted but currently unacknowledged segments, leading to a new value of SampleRTT approximately once every RTT. Also, TCP never computes a SampleRTT for a segment that has been retransmitted; it only measures SampleRTT for segments that have been transmitted once \[Karn 1987\]. (A problem at the end of the chapter asks you to consider why.) Obviously, the SampleRTT values will fluctuate from segment to segment due to congestion in the routers and to the varying load on the end systems. Because of this fluctuation, any given SampleRTT value may be atypical. In order to estimate a typical RTT, it is therefore natural to take some sort of average of the SampleRTT values. TCP maintains an average, called EstimatedRTT, of the SampleRTT values. Upon obtaining a new SampleRTT, TCP updates EstimatedRTT according to the following formula:

EstimatedRTT=(1−α)⋅EstimatedRTT+α⋅SampleRTT

The formula above is written in the form of a programming-language statement---the new value of EstimatedRTT is a weighted combination of the previous value of EstimatedRTT and the new value for SampleRTT. The recommended value of α is α = 0.125 (that is, 1/8) \[RFC 6298\], in which case the formula above becomes:

EstimatedRTT=0.875⋅EstimatedRTT+0.125⋅SampleRTT

Note that EstimatedRTT is a weighted average of the SampleRTT values. As discussed in a homework problem at the end of this chapter, this weighted average puts more weight on recent samples than on old samples. This is natural, as the more recent samples better reflect the current congestion in the network. In statistics, such an average is called an exponential weighted moving average (EWMA). The word "exponential" appears in EWMA because the weight of a given SampleRTT decays exponentially fast as the updates proceed. In the homework problems you will be asked to derive the exponential term in EstimatedRTT. Figure 3.32 shows the SampleRTT values and EstimatedRTT for a value of α = 1/8 for a TCP connection between gaia.cs.umass.edu (in Amherst, Massachusetts) and fantasia.eurecom.fr (in the south of France). Clearly, the variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT. In addition to having an estimate of the RTT, it is also valuable to have a measure of the variability of the RTT. \[RFC 6298\] defines the RTT variation, DevRTT, as an estimate of how much SampleRTT typically deviates from EstimatedRTT:

DevRTT=(1−β)⋅DevRTT+β⋅\|SampleRTT−EstimatedRTT\|

Note that DevRTT is an EWMA of the difference between SampleRTT and EstimatedRTT.
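Both estimators are one-line updates and are easy to express in code. The following sketch (ours, using the RFC 6298 gains of α = 1/8 and β = 1/4; the RFC recommends updating DevRTT before EstimatedRTT, as done here) tracks the two quantities as SampleRTT measurements arrive:

ALPHA = 0.125   # recommended weight for EstimatedRTT (RFC 6298)
BETA = 0.25     # recommended weight for DevRTT (RFC 6298)

def update_estimates(estimated_rtt, dev_rtt, sample_rtt):
    # EWMA of the deviation, computed against the current estimate
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    # EWMA of the samples themselves
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    return estimated_rtt, dev_rtt

# Example: EstimatedRTT = 0.30 sec, DevRTT = 0.06 sec, new SampleRTT = 0.50 sec
est, dev = update_estimates(0.30, 0.06, 0.50)
print(est, dev)   # 0.325 and 0.095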
If the SampleRTT values have little fluctuation, then DevRTT will be small; on the other hand, if there is a lot of fluctuation, DevRTT will be large. The recommended value of β is 0.25.

Setting and Managing the Retransmission Timeout Interval

Given values of EstimatedRTT and DevRTT, what value should be used for TCP's timeout interval? Clearly, the interval should be greater than or equal to EstimatedRTT, or unnecessary retransmissions would be sent. But the timeout interval should not be too much larger than EstimatedRTT; otherwise, when a segment is lost, TCP would not quickly retransmit the segment, leading to large data transfer delays. It is therefore desirable to set the timeout equal to the EstimatedRTT plus some margin. The margin should be large when there is a lot of fluctuation in the SampleRTT values; it should be small when there is little fluctuation. The value of DevRTT should thus come into play here. All of these considerations are taken into account in TCP's method for determining the retransmission timeout interval:

TimeoutInterval=EstimatedRTT+4⋅DevRTT

An initial TimeoutInterval value of 1 second is recommended \[RFC 6298\]. Also, when a timeout occurs, the value of TimeoutInterval is doubled to avoid a premature timeout occurring for a subsequent segment that will soon be acknowledged. However, as soon as a segment is received and EstimatedRTT is updated, the TimeoutInterval is again computed using the formula above.

PRINCIPLES IN PRACTICE

TCP provides reliable data transfer by using positive acknowledgments and timers in much the same way that we studied in Section 3.4. TCP acknowledges data that has been received correctly, and it then retransmits segments when segments or their corresponding acknowledgments are thought to be lost or corrupted. Certain versions of TCP also have an implicit NAK mechanism---with TCP's fast retransmit mechanism, the receipt of three duplicate ACKs for a given segment serves as an implicit NAK for the following segment, triggering retransmission of that segment before timeout. TCP uses sequence numbers to allow the receiver to identify lost or duplicate segments. Just as in the case of our reliable data transfer protocol, rdt3.0, TCP cannot itself tell for certain if a segment, or its ACK, is lost, corrupted, or overly delayed. At the sender, TCP's response will be the same: retransmit the segment in question. TCP also uses pipelining, allowing the sender to have multiple transmitted but yet-to-be-acknowledged segments outstanding at any given time. We saw earlier that pipelining can greatly improve a session's throughput when the ratio of the segment size to round-trip delay is small. The specific number of outstanding, unacknowledged segments that a sender can have is determined by TCP's flow-control and congestion-control mechanisms. TCP flow control is discussed at the end of this section; TCP congestion control is discussed in Section 3.7. For the time being, we must simply be aware that the TCP sender uses pipelining.

Figure 3.32 RTT samples and RTT estimates

3.5.4 Reliable Data Transfer

Recall that the Internet's network-layer service (IP service) is unreliable. IP does not guarantee datagram delivery, does not guarantee in-order delivery of datagrams, and does not guarantee the integrity of the data in the datagrams.
With IP service, datagrams can overflow router buffers and never reach their destination, datagrams can arrive out of order, and bits in the datagram can get corrupted (flipped from 0 to 1 and vice versa). Because transport-layer segments are carried across the network by IP datagrams, transport-layer segments can suffer from these problems as well. TCP creates a reliable data transfer service on top of IP's unreliable best-effort service. TCP's reliable data transfer service ensures that the data stream that a process reads out of its TCP receive buffer is uncorrupted, without gaps, without duplication, and in sequence; that is, the byte stream is exactly the same byte stream that was sent by the end system on the other side of the connection. How TCP provides a reliable data transfer involves many of the principles that we studied in Section 3.4. In our earlier development of reliable data transfer techniques, it was conceptually easiest to assume that an individual timer is associated with each transmitted but not yet acknowledged segment. While this is great in theory, timer management can require considerable overhead. Thus, the recommended TCP timer management procedures \[RFC 6298\] use only a single retransmission timer, even if there are multiple transmitted but not yet acknowledged segments. The TCP protocol described in this section follows this single-timer recommendation. We will discuss how TCP provides reliable data transfer in two incremental steps. We first present a highly simplified description of a TCP sender that uses only timeouts to recover from lost segments; we then present a more complete description that uses duplicate acknowledgments in addition to timeouts. In the ensuing discussion, we suppose that data is being sent in only one direction, from Host A to Host B, and that Host A is sending a large file. Figure 3.33 presents a highly simplified description of a TCP sender. We see that there are three major events related to data transmission and retransmission in the TCP sender: data received from application above; timer timeout; and ACK receipt.

Figure 3.33 Simplified TCP sender

Upon the occurrence of the first major event, TCP receives data from the application, encapsulates the data in a segment, and passes the segment to IP. Note that each segment includes a sequence number that is the byte-stream number of the first data byte in the segment, as described in Section 3.5.2. Also note that if the timer is not already running for some other segment, TCP starts the timer when the segment is passed to IP. (It is helpful to think of the timer as being associated with the oldest unacknowledged segment.) The expiration interval for this timer is the TimeoutInterval, which is calculated from EstimatedRTT and DevRTT, as described in Section 3.5.3. The second major event is the timeout. TCP responds to the timeout event by retransmitting the segment that caused the timeout. TCP then restarts the timer. The third major event that must be handled by the TCP sender is the arrival of an acknowledgment segment (ACK) from the receiver (more specifically, a segment containing a valid ACK field value). On the occurrence of this event, TCP compares the ACK value y with its variable SendBase. The TCP state variable SendBase is the sequence number of the oldest unacknowledged byte. (Thus SendBase--1 is the sequence number of the last byte that is known to have been received correctly and in order at the receiver.) As indicated earlier, TCP uses cumulative acknowledgments, so that y acknowledges the receipt of all bytes before byte number y. If y \> SendBase, then the ACK is acknowledging one or more previously unacknowledged segments. Thus the sender updates its SendBase variable; it also restarts the timer if there currently are any not-yet-acknowledged segments.
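The event-driven description of Figure 3.33 translates naturally into code. Here is a rough, self-contained rendition (ours; the send_to_ip and timer arguments are stand-ins for the real interfaces to IP and to the timer machinery):

class SimplifiedTCPSender:
    # The three-event sender of Figure 3.33: no flow control, no congestion
    # control, and one retransmission timer for the oldest unACKed segment.

    def __init__(self, initial_seq, send_to_ip, timer):
        self.send_base = initial_seq   # oldest unacknowledged byte
        self.next_seq = initial_seq    # sequence number of next new byte
        self.unacked = {}              # seq -> data, kept for retransmission
        self.send_to_ip = send_to_ip   # callable taking (seq, data)
        self.timer = timer             # object with start(), stop(), running()

    def on_app_data(self, data):       # event 1: data from application above
        self.send_to_ip(self.next_seq, data)
        self.unacked[self.next_seq] = data
        if not self.timer.running():
            self.timer.start()
        self.next_seq += len(data)

    def on_timeout(self):              # event 2: retransmit oldest segment
        self.send_to_ip(self.send_base, self.unacked[self.send_base])
        self.timer.start()

    def on_ack(self, y):               # event 3: cumulative ACK with value y
        if y > self.send_base:
            self.unacked = {s: d for s, d in self.unacked.items() if s >= y}
            self.send_base = y
            self.timer.stop()
            if self.unacked:           # restart only if data is still in flight
                self.timer.start()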
A Few Interesting Scenarios

We have just described a highly simplified version of how TCP provides reliable data transfer. But even this highly simplified version has many subtleties. To get a good feeling for how this protocol works, let's now walk through a few simple scenarios. Figure 3.34 depicts the first scenario, in which Host A sends one segment to Host B. Suppose that this segment has sequence number 92 and contains 8 bytes of data. After sending this segment, Host A waits for a segment from B with acknowledgment number 100. Although the segment from A is received at B, the acknowledgment from B to A gets lost. In this case, the timeout event occurs, and Host A retransmits the same segment. Of course, when Host B receives the retransmission, it observes from the sequence number that the segment contains data that has already been received. Thus, TCP in Host B will discard the bytes in the retransmitted segment.

Figure 3.34 Retransmission due to a lost acknowledgment

In a second scenario, shown in Figure 3.35, Host A sends two segments back to back. The first segment has sequence number 92 and 8 bytes of data, and the second segment has sequence number 100 and 20 bytes of data. Suppose that both segments arrive intact at B, and B sends two separate acknowledgments for each of these segments. The first of these acknowledgments has acknowledgment number 100; the second has acknowledgment number 120. Suppose now that neither of the acknowledgments arrives at Host A before the timeout. When the timeout event occurs, Host A resends the first segment with sequence number 92 and restarts the timer. As long as the ACK for the second segment arrives before the new timeout, the second segment will not be retransmitted. In a third and final scenario, suppose Host A sends the two segments, exactly as in the second example. The acknowledgment of the first segment is lost in the network, but just before the timeout event, Host A receives an acknowledgment with acknowledgment number 120. Host A therefore knows that Host B has received everything up through byte 119; so Host A does not resend either of the two segments. This scenario is illustrated in Figure 3.36.

Doubling the Timeout Interval

We now discuss a few modifications that most TCP implementations employ. The first concerns the length of the timeout interval after a timer expiration. In this modification, whenever the timeout event occurs, TCP retransmits the not-yet-acknowledged segment with the smallest sequence number, as described above. But each time TCP retransmits, it sets the next timeout interval to twice the previous value, rather than deriving it from the last EstimatedRTT and DevRTT (as described in Section 3.5.3).

Figure 3.35 Segment 100 not retransmitted

For example, suppose the TimeoutInterval associated with the oldest not yet acknowledged segment is 0.75 sec when the timer first expires. TCP will then retransmit this segment and set the new expiration time to 1.5 sec. If the timer expires again 1.5 sec later, TCP will again retransmit this segment, now setting the expiration time to 3.0 sec.
Thus the intervals grow exponentially after each retransmission. However, whenever the timer is started after either of the two other events (that is, data received from application above, and ACK received), the TimeoutInterval is derived from the most recent values of EstimatedRTT and DevRTT. This modification provides a limited form of congestion control. (More comprehensive forms of TCP congestion control will be studied in Section 3.7.) The timer expiration is most likely caused by congestion in the network, that is, too many packets arriving at one (or more) router queues in the path between the source and destination, causing packets to be dropped and/or long queuing delays. In times of congestion, if the sources continue to retransmit packets persistently, the congestion may get worse. Instead, TCP acts more politely, with each sender retransmitting after longer and longer intervals. We will see that a similar idea is used by Ethernet when we study CSMA/CD in Chapter 6.

Figure 3.36 A cumulative acknowledgment avoids retransmission of the first segment

Fast Retransmit

One of the problems with timeout-triggered retransmissions is that the timeout period can be relatively long. When a segment is lost, this long timeout period forces the sender to delay resending the lost packet, thereby increasing the end-to-end delay. Fortunately, the sender can often detect packet loss well before the timeout event occurs by noting so-called duplicate ACKs. A duplicate ACK is an ACK that reacknowledges a segment for which the sender has already received an earlier acknowledgment. To understand the sender's response to a duplicate ACK, we must look at why the receiver sends a duplicate ACK in the first place. Table 3.2 summarizes the TCP receiver's ACK generation policy \[RFC 5681\]. When a TCP receiver receives a segment with a sequence number that is larger than the next, expected, in-order sequence number, it detects a gap in the data stream---that is, a missing segment. This gap could be the result of lost or reordered segments within the network. Since TCP does not use negative acknowledgments, the receiver cannot send an explicit negative acknowledgment back to the sender. Instead, it simply reacknowledges (that is, generates a duplicate ACK for) the last in-order byte of data it has received. (Note that Table 3.2 allows for the case that the receiver does not discard out-of-order segments.)

Table 3.2 TCP ACK Generation Recommendation \[RFC 5681\]

| Event | TCP Receiver Action |
| --- | --- |
| Arrival of in-order segment with expected sequence number. All data up to expected sequence number already acknowledged. | Delayed ACK. Wait up to 500 msec for arrival of another in-order segment. If next in-order segment does not arrive in this interval, send an ACK. |
| Arrival of in-order segment with expected sequence number. One other in-order segment waiting for ACK transmission. | Immediately send single cumulative ACK, ACKing both in-order segments. |
| Arrival of out-of-order segment with higher-than-expected sequence number. Gap detected. | Immediately send duplicate ACK, indicating sequence number of next expected byte (which is the lower end of the gap). |
| Arrival of segment that partially or completely fills in gap in received data. | Immediately send ACK, provided that segment starts at the lower end of gap. |
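The first three cases of Table 3.2 can be captured by a small decision function. The sketch below (ours, and deliberately simplified: it omits the gap-filling case, which requires tracking the buffered out-of-order data) returns the updated receiver state together with the ACK to send, if any:

def ack_policy(seq, length, next_expected, delayed_ack_pending):
    # Returns (next_expected, delayed_ack_pending, ack_to_send or None).
    if seq == next_expected:
        next_expected += length
        if delayed_ack_pending:
            # Table 3.2, case 2: a second in-order segment is waiting;
            # send one immediate cumulative ACK covering both.
            return next_expected, False, next_expected
        # Case 1: in-order, nothing pending; wait up to 500 msec for
        # another in-order segment before ACKing (delayed ACK).
        return next_expected, True, None
    if seq > next_expected:
        # Case 3: gap detected; immediately send a duplicate ACK carrying
        # the sequence number of the next expected byte.
        return next_expected, delayed_ack_pending, next_expected
    # Old or duplicate data: simply re-ACK the next expected byte.
    return next_expected, delayed_ack_pending, next_expected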
Because a sender often sends a large number of segments back to back, if one segment is lost, there will likely be many back-to-back duplicate ACKs. If the TCP sender receives three duplicate ACKs for the same data, it takes this as an indication that the segment following the segment that has been ACKed three times has been lost. (In the homework problems, we consider the question of why the sender waits for three duplicate ACKs, rather than just a single duplicate ACK.) In the case that three duplicate ACKs are received, the TCP sender performs a fast retransmit \[RFC 5681\], retransmitting the missing segment before that segment's timer expires. This is shown in Figure 3.37, where the second segment is lost, then retransmitted before its timer expires. For TCP with fast retransmit, the following code snippet replaces the ACK received event in Figure 3.33:

event: ACK received, with ACK field value of y
    if (y \> SendBase) {
        SendBase=y
        if (there are currently any not yet acknowledged segments)
            start timer
    }
    else { /\* a duplicate ACK for already ACKed segment \*/
        increment number of duplicate ACKs received for y
        if (number of duplicate ACKs received for y==3)
            /\* TCP fast retransmit \*/
            resend segment with sequence number y
    }
    break;

Figure 3.37 Fast retransmit: retransmitting the missing segment before the segment's timer expires

We noted earlier that many subtle issues arise when a timeout/retransmit mechanism is implemented in an actual protocol such as TCP. The procedures above, which have evolved as a result of more than 20 years of experience with TCP timers, should convince you that this is indeed the case!

Go-Back-N or Selective Repeat?

Let us close our study of TCP's error-recovery mechanism by considering the following question: Is TCP a GBN or an SR protocol? Recall that TCP acknowledgments are cumulative and correctly received but out-of-order segments are not individually ACKed by the receiver. Consequently, as shown in Figure 3.33 (see also Figure 3.19), the TCP sender need only maintain the smallest sequence number of a transmitted but unacknowledged byte (SendBase) and the sequence number of the next byte to be sent (NextSeqNum). In this sense, TCP looks a lot like a GBN-style protocol. But there are some striking differences between TCP and Go-Back-N. Many TCP implementations will buffer correctly received but out-of-order segments \[Stevens 1994\]. Consider also what happens when the sender sends a sequence of segments 1, 2, ..., N, and all of the segments arrive in order without error at the receiver. Further suppose that the acknowledgment for packet n\<N gets lost, but the remaining N−1 acknowledgments arrive at the sender before their respective timeouts. In this example, GBN would retransmit not only packet n, but also all of the subsequent packets n+1, n+2, ..., N. TCP, on the other hand, would retransmit at most one segment, namely, segment n. Moreover, TCP would not even retransmit segment n if the acknowledgment for segment n+1 arrived before the timeout for segment n. A proposed modification to TCP, the so-called selective acknowledgment \[RFC 2018\], allows a TCP receiver to acknowledge out-of-order segments selectively rather than just cumulatively acknowledging the last correctly received, in-order segment.
When combined with selective retransmission---skipping the retransmission of segments that have already been selectively acknowledged by the receiver---TCP looks a lot like our generic SR protocol. Thus, TCP's error-recovery mechanism is probably best categorized as a hybrid of GBN and SR protocols.

3.5.5 Flow Control

Recall that the hosts on each side of a TCP connection set aside a receive buffer for the connection. When the TCP connection receives bytes that are correct and in sequence, it places the data in the receive buffer. The associated application process will read data from this buffer, but not necessarily at the instant the data arrives. Indeed, the receiving application may be busy with some other task and may not even attempt to read the data until long after it has arrived. If the application is relatively slow at reading the data, the sender can very easily overflow the connection's receive buffer by sending too much data too quickly. TCP provides a flow-control service to its applications to eliminate the possibility of the sender overflowing the receiver's buffer. Flow control is thus a speed-matching service---matching the rate at which the sender is sending against the rate at which the receiving application is reading. As noted earlier, a TCP sender can also be throttled due to congestion within the IP network; this form of sender control is referred to as congestion control, a topic we will explore in detail in Sections 3.6 and 3.7. Even though the actions taken by flow and congestion control are similar (the throttling of the sender), they are obviously taken for very different reasons. Unfortunately, many authors use the terms interchangeably, and the savvy reader would be wise to distinguish between them. Let's now discuss how TCP provides its flow-control service. In order to see the forest for the trees, we suppose throughout this section that the TCP implementation is such that the TCP receiver discards out-of-order segments. TCP provides flow control by having the sender maintain a variable called the receive window. Informally, the receive window is used to give the sender an idea of how much free buffer space is available at the receiver. Because TCP is full-duplex, the sender at each side of the connection maintains a distinct receive window. Let's investigate the receive window in the context of a file transfer. Suppose that Host A is sending a large file to Host B over a TCP connection. Host B allocates a receive buffer to this connection; denote its size by RcvBuffer. From time to time, the application process in Host B reads from the buffer. Define the following variables:

LastByteRead: the number of the last byte in the data stream read from the buffer by the application process in B

LastByteRcvd: the number of the last byte in the data stream that has arrived from the network and has been placed in the receive buffer at B

Because TCP is not permitted to overflow the allocated buffer, we must have

LastByteRcvd−LastByteRead≤RcvBuffer

The receive window, denoted rwnd, is set to the amount of spare room in the buffer:

rwnd=RcvBuffer−\[LastByteRcvd−LastByteRead\]

Because the spare room changes with time, rwnd is dynamic. The variable rwnd is illustrated in Figure 3.38.
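The receiver-side arithmetic is exactly the formula above; a tiny function (ours, using the text's variable names) makes it concrete:

def receive_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    # Spare room: rwnd = RcvBuffer - (LastByteRcvd - LastByteRead).
    # TCP never lets LastByteRcvd - LastByteRead exceed RcvBuffer.
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

# For example, a 65,536-byte buffer with 51,200 bytes arrived and 32,768
# bytes already read by the application leaves rwnd = 47,104 bytes:
print(receive_window(65536, 51200, 32768))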
How does the connection use the variable rwnd to provide the flow-control service? Host B tells Host A how much spare room it has in the connection buffer by placing its current value of rwnd in the receive window field of every segment it sends to A. Initially, Host B sets rwnd = RcvBuffer. Note that to pull this off, Host B must keep track of several connection-specific variables. Host A in turn keeps track of two variables, LastByteSent and LastByteAcked, which have obvious meanings. Note that the difference between these two variables, LastByteSent − LastByteAcked, is the amount of unacknowledged data that A has sent into the connection. By keeping the amount of unacknowledged data less than the value of rwnd, Host A is assured that it is not overflowing the receive buffer at Host B.

Figure 3.38 The receive window (rwnd) and the receive buffer (RcvBuffer)

Thus, Host A makes sure throughout the connection's life that

LastByteSent−LastByteAcked≤rwnd

There is one minor technical problem with this scheme. To see this, suppose Host B's receive buffer becomes full so that rwnd = 0. After advertising rwnd = 0 to Host A, also suppose that B has nothing to send to A. Now consider what happens. As the application process at B empties the buffer, TCP does not send new segments with new rwnd values to Host A; indeed, TCP sends a segment to Host A only if it has data to send or if it has an acknowledgment to send. Therefore, Host A is never informed that some space has opened up in Host B's receive buffer---Host A is blocked and can transmit no more data! To solve this problem, the TCP specification requires Host A to continue to send segments with one data byte when B's receive window is zero. These segments will be acknowledged by the receiver. Eventually the buffer will begin to empty and the acknowledgments will contain a nonzero rwnd value. The online site at http://www.awl.com/kurose-ross for this book provides an interactive Java applet that illustrates the operation of the TCP receive window. Having described TCP's flow-control service, we briefly mention here that UDP does not provide flow control and consequently, segments may be lost at the receiver due to buffer overflow. For example, consider sending a series of UDP segments from a process on Host A to a process on Host B. For a typical UDP implementation, UDP will append the segments to a finite-sized buffer that "precedes" the corresponding socket (that is, the door to the process). The process reads one entire segment at a time from the buffer. If the process does not read the segments fast enough from the buffer, the buffer will overflow and segments will get dropped.

3.5.6 TCP Connection Management

In this subsection we take a closer look at how a TCP connection is established and torn down. Although this topic may not seem particularly thrilling, it is important because TCP connection establishment can significantly add to perceived delays (for example, when surfing the Web). Furthermore, many of the most common network attacks---including the incredibly popular SYN flood attack---exploit vulnerabilities in TCP connection management. Let's first take a look at how a TCP connection is established. Suppose a process running in one host (client) wants to initiate a connection with another process in another host (server). The client application process first informs the client TCP that it wants to establish a connection to a process in the server.
The TCP in the client then proceeds to establish a TCP connection with the TCP in the server in the following manner:

Step 1. The client-side TCP first sends a special TCP segment to the server-side TCP. This special segment contains no application-layer data. But one of the flag bits in the segment's header (see Figure 3.29), the SYN bit, is set to 1. For this reason, this special segment is referred to as a SYN segment. In addition, the client randomly chooses an initial sequence number (client_isn) and puts this number in the sequence number field of the initial TCP SYN segment. This segment is encapsulated within an IP datagram and sent to the server. There has been considerable interest in properly randomizing the choice of the client_isn in order to avoid certain security attacks \[CERT 2001--09\].

Step 2. Once the IP datagram containing the TCP SYN segment arrives at the server host (assuming it does arrive!), the server extracts the TCP SYN segment from the datagram, allocates the TCP buffers and variables to the connection, and sends a connection-granted segment to the client TCP. (We'll see in Chapter 8 that the allocation of these buffers and variables before completing the third step of the three-way handshake makes TCP vulnerable to a denial-of-service attack known as SYN flooding.) This connection-granted segment also contains no application-layer data. However, it does contain three important pieces of information in the segment header. First, the SYN bit is set to 1. Second, the acknowledgment field of the TCP segment header is set to client_isn+1. Finally, the server chooses its own initial sequence number (server_isn) and puts this value in the sequence number field of the TCP segment header. This connection-granted segment is saying, in effect, "I received your SYN packet to start a connection with your initial sequence number, client_isn. I agree to establish this connection. My own initial sequence number is server_isn." The connection-granted segment is referred to as a SYNACK segment.

Step 3. Upon receiving the SYNACK segment, the client also allocates buffers and variables to the connection. The client host then sends the server yet another segment; this last segment acknowledges the server's connection-granted segment (the client does so by putting the value server_isn+1 in the acknowledgment field of the TCP segment header). The SYN bit is set to zero, since the connection is established. This third stage of the three-way handshake may carry client-to-server data in the segment payload.

Once these three steps have been completed, the client and server hosts can send segments containing data to each other. In each of these future segments, the SYN bit will be set to zero. Note that in order to establish the connection, three packets are sent between the two hosts, as illustrated in Figure 3.39. For this reason, this connection-establishment procedure is often referred to as a three-way handshake. Several aspects of the TCP three-way handshake are explored in the homework problems (Why are initial sequence numbers needed? Why is a three-way handshake, as opposed to a two-way handshake, needed?). It's interesting to note that a rock climber and a belayer (who is stationed below the rock climber and whose job it is to handle the climber's safety rope) use a three-way-handshake communication protocol that is identical to TCP's to ensure that both sides are ready before the climber begins the ascent.
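From the application's perspective, the entire handshake is hidden inside the socket calls of Chapter 2: it is triggered by connect() at the client and has already completed by the time accept() returns at the server. A minimal sketch (ours, in the style of the Chapter 2 programs; the port number is an arbitrary choice):

from socket import socket, AF_INET, SOCK_STREAM

# Server side: bind, listen, and accept; accept() returns only after the
# three-way handshake for an incoming connection has completed.
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', 12000))
serverSocket.listen(1)
connectionSocket, addr = serverSocket.accept()

# Client side (run in a separate process): connect() sends the SYN, waits
# for the SYNACK, and sends the final ACK before returning.
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect(('serverName', 12000))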
All good things must come to an end, and the same is true with a TCP connection. Either of the two processes participating in a TCP connection can end the connection. When a connection ends, the "resources" (that is, the buffers and variables) in the hosts are deallocated.

Figure 3.39 TCP three-way handshake: segment exchange

Figure 3.40 Closing a TCP connection

As an example, suppose the client decides to close the connection, as shown in Figure 3.40. The client application process issues a close command. This causes the client TCP to send a special TCP segment to the server process. This special segment has a flag bit in the segment's header, the FIN bit (see Figure 3.29), set to 1. When the server receives this segment, it sends the client an acknowledgment segment in return. The server then sends its own shutdown segment, which has the FIN bit set to 1. Finally, the client acknowledges the server's shutdown segment. At this point, all the resources in the two hosts are now deallocated. During the life of a TCP connection, the TCP protocol running in each host makes transitions through various TCP states. Figure 3.41 illustrates a typical sequence of TCP states that are visited by the client TCP. The client TCP begins in the CLOSED state. The application on the client side initiates a new TCP connection (by creating a socket object, as in the Python examples from Chapter 2). This causes TCP in the client to send a SYN segment to TCP in the server. After having sent the SYN segment, the client TCP enters the SYN_SENT state. While in the SYN_SENT state, the client TCP waits for a segment from the server TCP that includes an acknowledgment for the client's previous segment and has the SYN bit set to 1. Having received such a segment, the client TCP enters the ESTABLISHED state.

Figure 3.41 A typical sequence of TCP states visited by a client TCP

While in the ESTABLISHED state, the TCP client can send and receive TCP segments containing payload (that is, application-generated) data. Suppose that the client application decides it wants to close the connection. (Note that the server could also choose to close the connection.) This causes the client TCP to send a TCP segment with the FIN bit set to 1 and to enter the FIN_WAIT_1 state. While in the FIN_WAIT_1 state, the client TCP waits for a TCP segment from the server with an acknowledgment. When it receives this segment, the client TCP enters the FIN_WAIT_2 state. While in the FIN_WAIT_2 state, the client waits for another segment from the server with the FIN bit set to 1; after receiving this segment, the client TCP acknowledges the server's segment and enters the TIME_WAIT state. The TIME_WAIT state lets the TCP client resend the final acknowledgment in case the ACK is lost. The time spent in the TIME_WAIT state is implementation-dependent, but typical values are 30 seconds, 1 minute, and 2 minutes. After the wait, the connection formally closes and all resources on the client side (including port numbers) are released. Figure 3.42 illustrates the series of states typically visited by the server-side TCP, assuming the client begins connection teardown. The transitions are self-explanatory. In these two state-transition diagrams, we have only shown how a TCP connection is normally established and shut down.
We have not described what happens in certain pathological scenarios, for example, when both sides of a connection want to initiate or shut down at the same time. If you are interested in learning about this and other advanced issues concerning TCP, you are encouraged to see Stevens' comprehensive book \[Stevens 1994\].

Figure 3.42 A typical sequence of TCP states visited by a server-side TCP

Our discussion above has assumed that both the client and server are prepared to communicate, i.e., that the server is listening on the port to which the client sends its SYN segment. Let's consider what happens when a host receives a TCP segment whose port numbers or source IP address do not match with any of the ongoing sockets in the host. For example, suppose a host receives a TCP SYN packet with destination port 80, but the host is not accepting connections on port 80 (that is, it is not running a Web server on port 80). Then the host will send a special reset segment to the source. This TCP segment has the RST flag bit (see Section 3.5.2) set to 1. Thus, when a host sends a reset segment, it is telling the source "I don't have a socket for that segment. Please do not resend the segment." When a host receives a UDP packet whose destination port number doesn't match with an ongoing UDP socket, the host sends a special ICMP datagram, as discussed in Chapter 5. Now that we have a good understanding of TCP connection management, let's revisit the nmap port-scanning tool and examine more closely how it works. To explore a specific TCP port, say port 6789, on a target host, nmap will send a TCP SYN segment with destination port 6789 to that host. There are three possible outcomes: The source host receives a TCP SYNACK segment from the target host. Since this means that an application is running with TCP port 6789 on the target host, nmap returns "open."

FOCUS ON SECURITY: The SYN Flood Attack

We've seen in our discussion of TCP's three-way handshake that a server allocates and initializes connection variables and buffers in response to a received SYN. The server then sends a SYNACK in response, and awaits an ACK segment from the client. If the client does not send an ACK to complete the third step of this 3-way handshake, eventually (often after a minute or more) the server will terminate the half-open connection and reclaim the allocated resources. This TCP connection management protocol sets the stage for a classic Denial of Service (DoS) attack known as the SYN flood attack. In this attack, the attacker(s) send a large number of TCP SYN segments, without completing the third handshake step. With this deluge of SYN segments, the server's connection resources become exhausted as they are allocated (but never used!) for half-open connections; legitimate clients are then denied service. Such SYN flooding attacks were among the first documented DoS attacks \[CERT SYN 1996\]. Fortunately, an effective defense known as SYN cookies \[RFC 4987\] is now deployed in most major operating systems. SYN cookies work as follows: When the server receives a SYN segment, it does not know if the segment is coming from a legitimate user or is part of a SYN flood attack. So, instead of creating a half-open TCP connection for this SYN, the server creates an initial TCP sequence number that is a complicated function (hash function) of the source and destination IP addresses and port numbers of the SYN segment, as well as a secret number known only to the server.
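In outline, the computation might look like the following sketch (ours, not an actual kernel implementation; real SYN cookies also fold in a coarse timestamp and an encoding of the MSS):

import hashlib

SERVER_SECRET = b"known-only-to-this-server"   # illustrative secret

def syn_cookie(src_ip, src_port, dst_ip, dst_port):
    # Hash the connection 4-tuple together with the server's secret and
    # use the low-order 32 bits as the initial sequence number.
    msg = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}".encode() + SERVER_SECRET
    return int.from_bytes(hashlib.sha256(msg).digest()[:4], "big")

# On a SYN: reply with a SYNACK whose sequence number is syn_cookie(...),
# keeping no per-connection state. On a returning ACK: the connection is
# legitimate only if the acknowledgment field equals syn_cookie(...) + 1.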
This carefully crafted initial sequence number is the so-called "cookie." The server then sends the client a SYNACK packet with this special initial sequence number. Importantly, the server does not remember the cookie or any other state information corresponding to the SYN. A legitimate client will return an ACK segment. When the server receives this ACK, it must verify that the ACK corresponds to some SYN sent earlier. But how is this done if the server maintains no memory about SYN segments? As you may have guessed, it is done with the cookie. Recall that for a legitimate ACK, the value in the acknowledgment field is equal to the initial sequence number in the SYNACK (the cookie value in this case) plus one (see Figure 3.39). The server can then run the same hash function using the source and destination IP address and port numbers in the arriving ACK (which are the same as in the original SYN) and the secret number. If the result of the function plus one is the same as the acknowledgment value in the client's ACK, the server concludes that the ACK corresponds to an earlier SYN segment and is hence valid. The server then creates a fully open connection along with a socket. On the other hand, if the client does not return an ACK segment, then the original SYN has done no harm at the server, since the server hasn't yet allocated any resources in response to the original bogus SYN.

Returning to the nmap port scan, the second and third possible outcomes are as follows. The source host receives a TCP RST segment from the target host. This means that the SYN segment reached the target host, but the target host is not running an application with TCP port 6789. But the attacker at least knows that the segments destined to the host at port 6789 are not blocked by any firewall on the path between source and target hosts. (Firewalls are discussed in Chapter 8.) The source receives nothing. This likely means that the SYN segment was blocked by an intervening firewall and never reached the target host. Nmap is a powerful tool that can "case the joint" not only for open TCP ports, but also for open UDP ports, for firewalls and their configurations, and even for the versions of applications and operating systems. Most of this is done by manipulating TCP connection-management segments \[Skoudis 2006\]. You can download nmap from www.nmap.org. This completes our introduction to error control and flow control in TCP. In Section 3.7 we'll return to TCP and look at TCP congestion control in some depth. Before doing so, however, we first step back and examine congestion-control issues in a broader context.

3.6 Principles of Congestion Control

In the previous sections, we examined both the general principles and specific TCP mechanisms used to provide for a reliable data transfer service in the face of packet loss. We mentioned earlier that, in practice, such loss typically results from the overflowing of router buffers as the network becomes congested. Packet retransmission thus treats a symptom of network congestion (the loss of a specific transport-layer segment) but does not treat the cause of network congestion---too many sources attempting to send data at too high a rate. To treat the cause of network congestion, mechanisms are needed to throttle senders in the face of network congestion.
In this section, we consider the problem of congestion control in a general context, seeking to understand why congestion is a bad thing, how network congestion is manifested in the performance received by upper-layer applications, and various approaches that can be taken to avoid, or react to, network congestion. This more general study of congestion control is appropriate since, as with reliable data transfer, it is high on our "top-ten" list of fundamentally important problems in networking. The following section contains a detailed study of TCP's congestion-control algorithm.

3.6.1 The Causes and the Costs of Congestion

Let's begin our general study of congestion control by examining three increasingly complex scenarios in which congestion occurs. In each case, we'll look at why congestion occurs in the first place and at the cost of congestion (in terms of resources not fully utilized and poor performance received by the end systems). We'll not (yet) focus on how to react to, or avoid, congestion but rather focus on the simpler issue of understanding what happens as hosts increase their transmission rate and the network becomes congested.

Scenario 1: Two Senders, a Router with Infinite Buffers

We begin by considering perhaps the simplest congestion scenario possible: Two hosts (A and B) each have a connection that shares a single hop between source and destination, as shown in Figure 3.43. Let's assume that the application in Host A is sending data into the connection (for example, passing data to the transport-level protocol via a socket) at an average rate of λin bytes/sec. These data are original in the sense that each unit of data is sent into the socket only once. The underlying transport-level protocol is a simple one. Data is encapsulated and sent; no error recovery (for example, retransmission), flow control, or congestion control is performed. Ignoring the additional overhead due to adding transport- and lower-layer header information, the rate at which Host A offers traffic to the router in this first scenario is thus λin bytes/sec. Host B operates in a similar manner, and we assume for simplicity that it too is sending at a rate of λin bytes/sec. Packets from Hosts A and B pass through a router and over a shared outgoing link of capacity R. The router has buffers that allow it to store incoming packets when the packet-arrival rate exceeds the outgoing link's capacity. In this first scenario, we assume that the router has an infinite amount of buffer space.

Figure 3.43 Congestion scenario 1: Two connections sharing a single hop with infinite buffers

Figure 3.44 Congestion scenario 1: Throughput and delay as a function of host sending rate

Figure 3.44 plots the performance of Host A's connection under this first scenario. The left graph plots the per-connection throughput (number of bytes per second at the receiver) as a function of the connection-sending rate. For a sending rate between 0 and R/2, the throughput at the receiver equals the sender's sending rate---everything sent by the sender is received at the receiver with a finite delay. When the sending rate is above R/2, however, the throughput is only R/2. This upper limit on throughput is a consequence of the sharing of link capacity between two connections. The link simply cannot deliver packets to a receiver at a steady-state rate that exceeds R/2.
No matter how high Hosts A and B set their sending rates, they will each never see a throughput higher than R/2. Achieving a per-connection throughput of R/2 might actually appear to be a good thing, because the link is fully utilized in delivering packets to their destinations. The right-hand graph in Figure 3.44, however, shows the consequence of operating near link capacity. As the sending rate approaches R/2 (from the left), the average delay becomes larger and larger. When the sending rate exceeds R/2, the average number of queued packets in the router is unbounded, and the average delay between source and destination becomes infinite (assuming that the connections operate at these sending rates for an infinite period of time and there is an infinite amount of buffering available). Thus, while operating at an aggregate throughput of near R may be ideal from a throughput standpoint, it is far from ideal from a delay standpoint. Even in this (extremely) idealized scenario, we've already found one cost of a congested network---large queuing delays are experienced as the packet-arrival rate nears the link capacity.

Scenario 2: Two Senders and a Router with Finite Buffers

Let's now slightly modify scenario 1 in the following two ways (see Figure 3.45). First, the amount of router buffering is assumed to be finite. A consequence of this real-world assumption is that packets will be dropped when arriving to an already-full buffer. Second, we assume that each connection is reliable. If a packet containing a transport-level segment is dropped at the router, the sender will eventually retransmit it.

Figure 3.45 Scenario 2: Two hosts (with retransmissions) and a router with finite buffers

Because packets can be retransmitted, we must now be more careful with our use of the term sending rate. Specifically, let us again denote the rate at which the application sends original data into the socket by λin bytes/sec. The rate at which the transport layer sends segments (containing original data and retransmitted data) into the network will be denoted λ′in bytes/sec. λ′in is sometimes referred to as the offered load to the network. The performance realized under scenario 2 will now depend strongly on how retransmission is performed. First, consider the unrealistic case that Host A is able to somehow (magically!) determine whether or not a buffer is free in the router and thus sends a packet only when a buffer is free. In this case, no loss would occur, λin would be equal to λ′in, and the throughput of the connection would be equal to λin. This case is shown in Figure 3.46(a). From a throughput standpoint, performance is ideal---everything that is sent is received. Note that the average host sending rate cannot exceed R/2 under this scenario, since packet loss is assumed never to occur. Consider next the slightly more realistic case that the sender retransmits only when a packet is known for certain to be lost. (Again, this assumption is a bit of a stretch. However, it is possible that the sending host might set its timeout large enough to be virtually assured that a packet that has not been acknowledged has been lost.) In this case, the performance might look something like that shown in Figure 3.46(b). To appreciate what is happening here, consider the case that the offered load, λ′in (the rate of original data transmission plus retransmissions), equals R/2.
According to Figure 3.46(b), at this value of the offered load, the rate at which data are delivered to the receiver application is R/3. Thus, out of the 0.5R units of data transmitted, 0.333R bytes/sec (on average) are original data and 0.166R bytes/sec (on average) are retransmitted data. We see here another cost of a congested network---the sender must perform retransmissions in order to compensate for dropped (lost) packets due to buffer overflow.

Figure 3.46 Scenario 2 performance with finite buffers

Finally, let us consider the case that the sender may time out prematurely and retransmit a packet that has been delayed in the queue but not yet lost. In this case, both the original data packet and the retransmission may reach the receiver. Of course, the receiver needs but one copy of this packet and will discard the retransmission. In this case, the work done by the router in forwarding the retransmitted copy of the original packet was wasted, as the receiver will have already received the original copy of this packet. The router would have better used the link transmission capacity to send a different packet instead. Here then is yet another cost of a congested network---unneeded retransmissions by the sender in the face of large delays may cause a router to use its link bandwidth to forward unneeded copies of a packet. Figure 3.46(c) shows the throughput versus offered load when each packet is assumed to be forwarded (on average) twice by the router. Since each packet is forwarded twice, the throughput will have an asymptotic value of R/4 as the offered load approaches R/2.

Scenario 3: Four Senders, Routers with Finite Buffers, and Multihop Paths

In our final congestion scenario, four hosts transmit packets, each over overlapping two-hop paths, as shown in Figure 3.47. We again assume that each host uses a timeout/retransmission mechanism to implement a reliable data transfer service, that all hosts have the same value of λin, and that all router links have capacity R bytes/sec.

Figure 3.47 Four senders, routers with finite buffers, and multihop paths

Let's consider the connection from Host A to Host C, passing through routers R1 and R2. The A--C connection shares router R1 with the D--B connection and shares router R2 with the B--D connection. For extremely small values of λin, buffer overflows are rare (as in congestion scenarios 1 and 2), and the throughput approximately equals the offered load. For slightly larger values of λin, the corresponding throughput is also larger, since more original data is being transmitted into the network and delivered to the destination, and overflows are still rare. Thus, for small values of λin, an increase in λin results in an increase in λout.

Having considered the case of extremely low traffic, let's next examine the case that λin (and hence λ′in) is extremely large. Consider router R2. The A--C traffic arriving to router R2 (which arrives at R2 after being forwarded from R1) can have an arrival rate at R2 that is at most R, the capacity of the link from R1 to R2, regardless of the value of λin. If λ′in is extremely large for all connections (including the B--D connection), then the arrival rate of B--D traffic at R2 can be much larger than that of the A--C traffic.
Because the A--C and B--D traffic must compete at router R2 for the limited amount of buffer space, the amount of A--C traffic that successfully gets through R2 (that is, is not lost due to buffer overflow) becomes smaller and smaller as the offered load from B--D gets larger and larger. In the limit, as the offered load approaches infinity, an empty buffer at R2 is immediately filled by a B--D packet, and the throughput of the A--C connection at R2 goes to zero. This, in turn, implies that the A--C end-to-end throughput goes to zero in the limit of heavy traffic. These considerations give rise to the offered load versus throughput tradeoff shown in Figure 3.48.

Figure 3.48 Scenario 3 performance with finite buffers and multihop paths

The reason for the eventual decrease in throughput with increasing offered load is evident when one considers the amount of wasted work done by the network. In the high-traffic scenario outlined above, whenever a packet is dropped at a second-hop router, the work done by the first-hop router in forwarding a packet to the second-hop router ends up being "wasted." The network would have been equally well off (more accurately, equally bad off) if the first router had simply discarded that packet and remained idle. More to the point, the transmission capacity used at the first router to forward the packet to the second router could have been much more profitably used to transmit a different packet. (For example, when selecting a packet for transmission, it might be better for a router to give priority to packets that have already traversed some number of upstream routers.) So here we see yet another cost of dropping a packet due to congestion---when a packet is dropped along a path, the transmission capacity that was used at each of the upstream links to forward that packet to the point at which it is dropped ends up having been wasted.

3.6.2 Approaches to Congestion Control

In Section 3.7, we'll examine TCP's specific approach to congestion control in great detail. Here, we identify the two broad approaches to congestion control that are taken in practice and discuss specific network architectures and congestion-control protocols embodying these approaches. At the highest level, we can distinguish among congestion-control approaches by whether the network layer provides explicit assistance to the transport layer for congestion-control purposes:

End-to-end congestion control. In an end-to-end approach to congestion control, the network layer provides no explicit support to the transport layer for congestion-control purposes. Even the presence of network congestion must be inferred by the end systems based only on observed network behavior (for example, packet loss and delay). We'll see shortly in Section 3.7.1 that TCP takes this end-to-end approach toward congestion control, since the IP layer is not required to provide feedback to hosts regarding network congestion. TCP segment loss (as indicated by a timeout or the receipt of three duplicate acknowledgments) is taken as an indication of network congestion, and TCP decreases its window size accordingly. We'll also see a more recent proposal for TCP congestion control that uses increasing round-trip segment delay as an indicator of increased network congestion.

Network-assisted congestion control. With network-assisted congestion control, routers provide explicit feedback to the sender and/or receiver regarding the congestion state of the network.
This feedback may be as simple as a single bit indicating congestion at a link---an approach taken in the early IBM SNA \[Schwartz 1982\], DEC DECnet \[Jain 1989; Ramakrishnan 1990\], and ATM \[Black 1995\] network architectures. More sophisticated feedback is also possible. For example, in ATM Available Bit Rate (ABR) congestion control, a router informs the sender of the maximum host sending rate it (the router) can support on an outgoing link. As noted above, the Internet-default versions of IP and TCP adopt an end-to-end approach towards congestion control. We'll see, however, in Section 3.7.2 that, more recently, IP and TCP may also optionally implement network-assisted congestion control.

For network-assisted congestion control, congestion information is typically fed back from the network to the sender in one of two ways, as shown in Figure 3.49. Direct feedback may be sent from a network router to the sender. This form of notification typically takes the form of a choke packet (essentially saying, "I'm congested!"). The second and more common form of notification occurs when a router marks/updates a field in a packet flowing from sender to receiver to indicate congestion. Upon receipt of a marked packet, the receiver then notifies the sender of the congestion indication. This latter form of notification takes a full round-trip time.

Figure 3.49 Two feedback pathways for network-indicated congestion information

3.7 TCP Congestion Control

In this section we return to our study of TCP. As we learned in Section 3.5, TCP provides a reliable transport service between two processes running on different hosts. Another key component of TCP is its congestion-control mechanism. As indicated in the previous section, TCP must use end-to-end congestion control rather than network-assisted congestion control, since the IP layer provides no explicit feedback to the end systems regarding network congestion.

The approach taken by TCP is to have each sender limit the rate at which it sends traffic into its connection as a function of perceived network congestion. If a TCP sender perceives that there is little congestion on the path between itself and the destination, then the TCP sender increases its send rate; if the sender perceives that there is congestion along the path, then the sender reduces its send rate. But this approach raises three questions. First, how does a TCP sender limit the rate at which it sends traffic into its connection? Second, how does a TCP sender perceive that there is congestion on the path between itself and the destination? And third, what algorithm should the sender use to change its send rate as a function of perceived end-to-end congestion?

Let's first examine how a TCP sender limits the rate at which it sends traffic into its connection. In Section 3.5 we saw that each side of a TCP connection consists of a receive buffer, a send buffer, and several variables ( LastByteRead , rwnd , and so on). The TCP congestion-control mechanism operating at the sender keeps track of an additional variable, the congestion window. The congestion window, denoted cwnd , imposes a constraint on the rate at which a TCP sender can send traffic into the network. Specifically, the amount of unacknowledged data at a sender may not exceed the minimum of cwnd and rwnd , that is:

LastByteSent − LastByteAcked ≤ min{cwnd, rwnd}
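In code, this constraint might be maintained as in the minimal Python sketch below. This is our own illustration, not real TCP: the variable names mirror the text, and the byte counts in the example are invented.

```python
# A minimal sketch of the sender-side window constraint just stated.
def usable_window(last_byte_sent, last_byte_acked, cwnd, rwnd):
    """Bytes the sender may still transmit without violating
    LastByteSent - LastByteAcked <= min(cwnd, rwnd)."""
    in_flight = last_byte_sent - last_byte_acked   # unacknowledged data
    return max(0, min(cwnd, rwnd) - in_flight)

# Example: 6,000 bytes in flight, cwnd = 10,000 bytes, rwnd = 8,000 bytes.
print(usable_window(26_000, 20_000, cwnd=10_000, rwnd=8_000))  # -> 2000
```

Note that when rwnd is smaller than cwnd, flow control rather than congestion control is the binding constraint, which motivates the simplifying assumption made next.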
In order to focus on congestion control (as opposed to flow control), let us henceforth assume that the TCP receive buffer is so large that the receive-window constraint can be ignored; thus, the amount of unacknowledged data at the sender is solely limited by cwnd . We will also assume that the sender always has data to send, i.e., that all segments in the congestion window are sent. The constraint above limits the amount of unacknowledged data at the sender and therefore indirectly limits the sender's send rate. To see this, consider a connection for which loss and packet transmission delays are negligible. Then, roughly, at the beginning of every RTT, the constraint permits the sender to send cwnd bytes of data into the connection; at the end of the RTT the sender receives acknowledgments for the data. Thus the sender's send rate is roughly cwnd/RTT bytes/sec. By adjusting the value of cwnd , the sender can therefore adjust the rate at which it sends data into its connection.

Let's next consider how a TCP sender perceives that there is congestion on the path between itself and the destination. Let us define a "loss event" at a TCP sender as the occurrence of either a timeout or the receipt of three duplicate ACKs from the receiver. (Recall our discussion in Section 3.5.4 of the timeout event in Figure 3.33 and the subsequent modification to include fast retransmit on receipt of three duplicate ACKs.) When there is excessive congestion, then one (or more) router buffers along the path overflows, causing a datagram (containing a TCP segment) to be dropped. The dropped datagram, in turn, results in a loss event at the sender---either a timeout or the receipt of three duplicate ACKs---which is taken by the sender to be an indication of congestion on the sender-to-receiver path.

Having considered how congestion is detected, let's next consider the more optimistic case when the network is congestion-free, that is, when a loss event doesn't occur. In this case, acknowledgments for previously unacknowledged segments will be received at the TCP sender. As we'll see, TCP will take the arrival of these acknowledgments as an indication that all is well---that segments being transmitted into the network are being successfully delivered to the destination---and will use acknowledgments to increase its congestion window size (and hence its transmission rate). Note that if acknowledgments arrive at a relatively slow rate (e.g., if the end-end path has high delay or contains a low-bandwidth link), then the congestion window will be increased at a relatively slow rate. On the other hand, if acknowledgments arrive at a high rate, then the congestion window will be increased more quickly. Because TCP uses acknowledgments to trigger (or clock) its increase in congestion window size, TCP is said to be self-clocking.

Given the mechanism of adjusting the value of cwnd to control the sending rate, the critical question remains: How should a TCP sender determine the rate at which it should send? If TCP senders collectively send too fast, they can congest the network, leading to the type of congestion collapse that we saw in Figure 3.48.
Indeed, the version of TCP that we'll study shortly was developed in response to observed Internet congestion collapse \[Jacobson 1988\] under earlier versions of TCP. However, if TCP senders are too cautious and send too slowly, they could underutilize the bandwidth in the network; that is, the TCP senders could send at a higher rate without congesting the network. How then do the TCP senders determine their sending rates such that they don't congest the network but at the same time make use of all the available bandwidth? Are TCP senders explicitly coordinated, or is there a distributed approach in which the TCP senders can set their sending rates based only on local information? TCP answers these questions using the following guiding principles:

A lost segment implies congestion, and hence, the TCP sender's rate should be decreased when a segment is lost. Recall from our discussion in Section 3.5.4 that a timeout event or the receipt of four acknowledgments for a given segment (one original ACK and then three duplicate ACKs) is interpreted as an implicit "loss event" indication of the segment following the quadruply ACKed segment, triggering a retransmission of the lost segment. From a congestion-control standpoint, the question is how the TCP sender should decrease its congestion window size, and hence its sending rate, in response to this inferred loss event.

An acknowledged segment indicates that the network is delivering the sender's segments to the receiver, and hence, the sender's rate can be increased when an ACK arrives for a previously unacknowledged segment. The arrival of acknowledgments is taken as an implicit indication that all is well---segments are being successfully delivered from sender to receiver, and the network is thus not congested. The congestion window size can thus be increased.

Bandwidth probing. Given ACKs indicating a congestion-free source-to-destination path and loss events indicating a congested path, TCP's strategy for adjusting its transmission rate is to increase its rate in response to arriving ACKs until a loss event occurs, at which point, the transmission rate is decreased. The TCP sender thus increases its transmission rate to probe for the rate at which congestion onset begins, backs off from that rate, and then begins probing again to see if the congestion onset rate has changed. The TCP sender's behavior is perhaps analogous to the child who requests (and gets) more and more goodies until he/she is finally told "No!", backs off a bit, but then begins making requests again shortly afterwards. Note that there is no explicit signaling of congestion state by the network---ACKs and loss events serve as implicit signals---and that each TCP sender acts on local information asynchronously from other TCP senders.

Given this overview of TCP congestion control, we're now in a position to consider the details of the celebrated TCP congestion-control algorithm, which was first described in \[Jacobson 1988\] and is standardized in \[RFC 5681\]. The algorithm has three major components: (1) slow start, (2) congestion avoidance, and (3) fast recovery. Slow start and congestion avoidance are mandatory components of TCP, differing in how they increase the size of cwnd in response to received ACKs. We'll see shortly that slow start increases the size of cwnd more rapidly (despite its name!) than congestion avoidance. Fast recovery is recommended, but not required, for TCP senders.
Slow Start

When a TCP connection begins, the value of cwnd is typically initialized to a small value of 1 MSS \[RFC 3390\], resulting in an initial sending rate of roughly MSS/RTT. For example, if MSS = 500 bytes and RTT = 200 msec, the resulting initial sending rate is only about 20 kbps. Since the available bandwidth to the TCP sender may be much larger than MSS/RTT, the TCP sender would like to find the amount of available bandwidth quickly. Thus, in the slow-start state, the value of cwnd begins at 1 MSS and increases by 1 MSS every time a transmitted segment is first acknowledged. In the example of Figure 3.50, TCP sends the first segment into the network and waits for an acknowledgment. When this acknowledgment arrives, the TCP sender increases the congestion window by one MSS and sends out two maximum-sized segments. These segments are then acknowledged, with the sender increasing the congestion window by 1 MSS for each of the acknowledged segments, giving a congestion window of 4 MSS, and so on. This process results in a doubling of the sending rate every RTT. Thus, the TCP send rate starts slow but grows exponentially during the slow start phase.

Figure 3.50 TCP slow start

But when should this exponential growth end? Slow start provides several answers to this question. First, if there is a loss event (i.e., congestion) indicated by a timeout, the TCP sender sets the value of cwnd to 1 MSS and begins the slow start process anew. It also sets the value of a second state variable, ssthresh (shorthand for "slow start threshold") to cwnd/2 ---half of the value of the congestion window when congestion was detected. The second way in which slow start may end is directly tied to the value of ssthresh . Since ssthresh is half the value of cwnd when congestion was last detected, it might be a bit reckless to keep doubling cwnd when it reaches or surpasses the value of ssthresh . Thus, when the value of cwnd equals ssthresh , slow start ends and TCP transitions into congestion avoidance mode. As we'll see, TCP increases cwnd more cautiously when in congestion-avoidance mode. The final way in which slow start can end is if three duplicate ACKs are detected, in which case TCP performs a fast retransmit (see Section 3.5.4) and enters the fast recovery state, as discussed below. TCP's behavior in slow start is summarized in the FSM description of TCP congestion control in Figure 3.51. The slow-start algorithm traces its roots to \[Jacobson 1988\]; an approach similar to slow start was also proposed independently in \[Jain 1986\].

Congestion Avoidance

On entry to the congestion-avoidance state, the value of cwnd is approximately half its value when congestion was last encountered---congestion could be just around the corner! Thus, rather than doubling the value of cwnd every RTT, TCP adopts a more conservative approach and increases the value of cwnd by just a single MSS every RTT \[RFC 5681\]. This can be accomplished in several ways. A common approach is for the TCP sender to increase cwnd by MSS ⋅ (MSS/cwnd) bytes whenever a new acknowledgment arrives. For example, if MSS is 1,460 bytes and cwnd is 14,600 bytes, then 10 segments are being sent within an RTT. Each arriving ACK (assuming one ACK per segment) increases the congestion window size by 1/10 MSS, and thus, the value of the congestion window will have increased by one MSS after ACKs from all 10 segments have been received.
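The two window-growth rules can be summarized in a few lines of code. The following is a minimal Python sketch of our own under the assumptions above (one ACK per full-sized segment, no loss), not a complete TCP implementation.

```python
# Per-ACK cwnd growth: slow start versus congestion avoidance (sketch).
MSS = 1_460  # bytes

def on_new_ack(cwnd, ssthresh):
    """Return the new cwnd after one new (non-duplicate) ACK arrives."""
    if cwnd < ssthresh:
        return cwnd + MSS                # slow start: +1 MSS per ACK (doubles per RTT)
    return cwnd + MSS * MSS / cwnd       # congestion avoidance: ~ +1 MSS per RTT

cwnd, ssthresh = MSS, 8 * MSS
for rtt in range(1, 7):
    for _ in range(int(cwnd // MSS)):    # one ACK per in-flight segment
        cwnd = on_new_ack(cwnd, ssthresh)
    print(f"after RTT {rtt}: cwnd = {cwnd / MSS:.1f} MSS")
```

The printed trajectory (2.0, 4.0, 8.0, then roughly 9.0, 9.9, 10.8 MSS) shows exponential doubling up to ssthresh followed by approximately linear growth.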
But when should congestion avoidance's linear increase (of 1 MSS per RTT) end? TCP's congestion-avoidance algorithm behaves the same as slow start when a timeout occurs: The value of cwnd is set to 1 MSS, and the value of ssthresh is updated to half the value of cwnd when the loss event occurred. Recall, however, that a loss event also can be triggered by a triple duplicate ACK event.

Figure 3.51 FSM description of TCP congestion control

In this case, the network is continuing to deliver segments from sender to receiver (as indicated by the receipt of duplicate ACKs). So TCP's behavior to this type of loss event should be less drastic than with a timeout-indicated loss: TCP halves the value of cwnd (adding in 3 MSS for good measure to account for the triple duplicate ACKs received) and records the value of ssthresh to be half the value of cwnd when the triple duplicate ACKs were received. The fast-recovery state is then entered.

Fast Recovery

In fast recovery, the value of cwnd is increased by 1 MSS for every duplicate ACK received for the missing segment that caused TCP to enter the fast-recovery state. Eventually, when an ACK arrives for the missing segment, TCP enters the congestion-avoidance state after deflating cwnd . If a timeout event occurs, fast recovery transitions to the slow-start state after performing the same actions as in slow start and congestion avoidance: The value of cwnd is set to 1 MSS, and the value of ssthresh is set to half the value of cwnd when the loss event occurred.

PRINCIPLES IN PRACTICE

TCP SPLITTING: OPTIMIZING THE PERFORMANCE OF CLOUD SERVICES

For cloud services such as search, e-mail, and social networks, it is desirable to provide a high level of responsiveness, ideally giving users the illusion that the services are running within their own end systems (including their smartphones). This can be a major challenge, as users are often located far away from the data centers responsible for serving the dynamic content associated with the cloud services. Indeed, if the end system is far from a data center, then the RTT will be large, potentially leading to poor response time performance due to TCP slow start. As a case study, consider the delay in receiving a response for a search query. Typically, the server requires three TCP windows during slow start to deliver the response \[Pathak 2010\]. Thus the time from when an end system initiates a TCP connection until the time when it receives the last packet of the response is roughly 4 ⋅ RTT (one RTT to set up the TCP connection plus three RTTs for the three windows of data) plus the processing time in the data center. These RTT delays can lead to a noticeable delay in returning search results for a significant fraction of queries. Moreover, there can be significant packet loss in access networks, leading to TCP retransmissions and even larger delays.

One way to mitigate this problem and improve user-perceived performance is to (1) deploy front-end servers closer to the users, and (2) utilize TCP splitting by breaking the TCP connection at the front-end server. With TCP splitting, the client establishes a TCP connection to the nearby front-end, and the front-end maintains a persistent TCP connection to the data center with a very large TCP congestion window \[Tariq 2008, Pathak 2010, Chen 2011\]. With this approach, the response time roughly becomes 4 ⋅ RTT_FE + RTT_BE + processing time, where RTT_FE is the round-trip time between client and front-end server, and RTT_BE is the round-trip time between the front-end server and the data center (back-end server). If the front-end server is close to the client, then this response time approximately becomes RTT plus processing time, since RTT_FE is negligibly small and RTT_BE is approximately RTT.
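To get a feel for these two expressions, here is a quick back-of-the-envelope computation in Python. The RTT values are invented for illustration; only the two response-time formulas come from the discussion above.

```python
# Hypothetical delays illustrating the TCP-splitting formulas (sketch).
RTT_FE = 0.010            # client <-> front-end server, seconds (assumed)
RTT_BE = 0.090            # front-end <-> data center, seconds (assumed)
RTT = RTT_FE + RTT_BE     # client <-> data center without splitting

without_splitting = 4 * RTT               # roughly 4 RTT
with_splitting = 4 * RTT_FE + RTT_BE      # roughly RTT when RTT_FE is small
print(f"without splitting: {without_splitting * 1000:.0f} ms")  # 400 ms
print(f"with splitting:    {with_splitting * 1000:.0f} ms")     # 130 ms
```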
In summary, TCP splitting can reduce the networking delay roughly from 4 ⋅ RTT to RTT, significantly improving user-perceived performance, particularly for users who are far from the nearest data center. TCP splitting also helps reduce TCP retransmission delays caused by losses in access networks. Google and Akamai have made extensive use of their CDN servers in access networks (recall our discussion in Section 2.6) to perform TCP splitting for the cloud services they support \[Chen 2011\].

Fast recovery is a recommended, but not required, component of TCP \[RFC 5681\]. It is interesting that an early version of TCP, known as TCP Tahoe, unconditionally cut its congestion window to 1 MSS and entered the slow-start phase after either a timeout-indicated or triple-duplicate-ACK-indicated loss event. The newer version of TCP, TCP Reno, incorporated fast recovery. Figure 3.52 illustrates the evolution of TCP's congestion window for both Reno and Tahoe. In this figure, the threshold is initially equal to 8 MSS. For the first eight transmission rounds, Tahoe and Reno take identical actions. The congestion window climbs exponentially fast during slow start and hits the threshold at the fourth round of transmission. The congestion window then climbs linearly until a triple-duplicate-ACK event occurs, just after transmission round 8. Note that the congestion window is 12 ⋅ MSS when this loss event occurs. The value of ssthresh is then set to 0.5 ⋅ cwnd = 6 ⋅ MSS. Under TCP Reno, the congestion window is set to cwnd = 9 ⋅ MSS and then grows linearly. Under TCP Tahoe, the congestion window is set to 1 MSS and grows exponentially until it reaches the value of ssthresh , at which point it grows linearly.

Figure 3.52 Evolution of TCP's congestion window (Tahoe and Reno)
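The different reactions of Tahoe and Reno to a loss event can be captured in a small Python sketch. This is our own simplified rendering of the rules described above (window values in units of MSS), not the actual FSM of Figure 3.51.

```python
# Loss-event reactions for Tahoe and Reno (sketch; cwnd in units of MSS).
def on_loss(flavor, cwnd, kind):
    """Return (new_cwnd, new_ssthresh) after a loss event."""
    ssthresh = cwnd // 2                      # both flavors halve ssthresh
    if flavor == "tahoe" or kind == "timeout":
        return 1, ssthresh                    # back to slow start at 1 MSS
    return ssthresh + 3, ssthresh             # Reno fast recovery: half plus 3 MSS

# The triple-duplicate-ACK event of Figure 3.52, with cwnd = 12 MSS:
print(on_loss("reno", 12, "triple_dup_ack"))   # -> (9, 6)
print(on_loss("tahoe", 12, "triple_dup_ack"))  # -> (1, 6)
```

The two printed pairs match the figure: both flavors set ssthresh to 6 MSS, but Reno resumes from 9 MSS while Tahoe restarts from 1 MSS.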
AIMD congestion control gives rise to +the "saw tooth" behavior shown in Figure 3.53, which also nicely +illustrates our earlier intuition of TCP "probing" for bandwidth---TCP +linearly increases its congestion window size (and hence its +transmission rate) until a triple duplicate-ACK event occurs. It then +decreases its congestion window size by a factor of two but then again +begins increasing it linearly, probing to see if there is additional +available bandwidth. + +As noted previously, many TCP implementations use the Reno algorithm +\[Padhye 2001\]. Many variations of the Reno algorithm have been +proposed \[RFC 3782; RFC 2018\]. The TCP Vegas algorithm \[Brakmo 1995; +Ahn 1995\] attempts to avoid congestion while maintaining good +throughput. The basic idea of Vegas is to (1) detect congestion in the +routers between source and destination before packet loss occurs, and +(2) lower the rate linearly when this imminent packet loss is detected. +Imminent packet loss is predicted by observing the RTT. The longer the +RTT of the packets, the greater the congestion in the routers. As of +late 2015, the Ubuntu Linux implementation of TCP provided slowstart, +congestion avoidance, fast recovery, fast retransmit, and SACK, by +default; alternative congestion control algorithms, such as TCP Vegas +and BIC \[Xu 2004\], are also provided. For a survey of the many flavors +of TCP, see \[Afanasyev 2010\]. TCP's AIMD algorithm was developed based +on a tremendous amount of engineering insight and experimentation with +congestion control in operational networks. Ten years after TCP's +development, theoretical analyses showed that TCP's congestion-control +algorithm serves as a distributed asynchronous-optimization algorithm +that results in several important aspects of user and network +performance being simultaneously optimized \[Kelly 1998\]. A rich theory +of congestion control has since been developed \[Srikant 2004\]. +Macroscopic Description of TCP Throughput Given the saw-toothed behavior +of TCP, it's natural to consider what the average throughput (that is, +the average rate) of a long-lived TCP connection might be. In this +analysis we'll ignore the slow-start phases that occur after timeout +events. (These phases are typically very short, since the sender grows +out of the phase exponentially fast.) During a particular round-trip +interval, the rate at which TCP sends data is a function of the +congestion window and the current RTT. When the window size is w bytes +and the current round-trip time is RTT seconds, then TCP's transmission +rate is roughly w/RTT. TCP then probes for additional bandwidth by +increasing w by 1 MSS each RTT until a loss event occurs. Denote by W +the value of w when a loss event occurs. Assuming that RTT and W are +approximately constant over the duration of the connection, the TCP +transmission rate ranges from W/(2 · RTT) to W/RTT. These assumptions +lead to a highly simplified macroscopic model for the steady-state +behavior of TCP. The network drops a packet from the connection when the +rate increases to W/RTT; the rate is then cut in half and then increases +by MSS/RTT every RTT until it again reaches W/RTT. This process repeats +itself over and over again. 
Using this highly idealized model for the steady-state dynamics of TCP, we can also derive an interesting expression that relates a connection's loss rate to its available bandwidth \[Mahdavi 1997\]. This derivation is outlined in the homework problems. A more sophisticated model that has been found empirically to agree with measured data is \[Padhye 2000\].

TCP Over High-Bandwidth Paths

It is important to realize that TCP congestion control has evolved over the years and indeed continues to evolve. For a summary of current TCP variants and discussion of TCP evolution, see \[Floyd 2001, RFC 5681, Afanasyev 2010\]. What was good for the Internet when the bulk of the TCP connections carried SMTP, FTP, and Telnet traffic is not necessarily good for today's HTTP-dominated Internet or for a future Internet with services that are still undreamed of. The need for continued evolution of TCP can be illustrated by considering the high-speed TCP connections that are needed for grid- and cloud-computing applications. For example, consider a TCP connection with 1,500-byte segments and a 100 ms RTT, and suppose we want to send data through this connection at 10 Gbps. Following \[RFC 3649\], we note that using the TCP throughput formula above, in order to achieve a 10 Gbps throughput, the average congestion window size would need to be 83,333 segments. That's a lot of segments, leading us to be rather concerned that one of these 83,333 in-flight segments might be lost. What would happen in the case of a loss? Or, put another way, what fraction of the transmitted segments could be lost while still allowing the TCP congestion-control algorithm specified in Figure 3.51 to achieve the desired 10 Gbps rate? In the homework questions for this chapter, you are led through the derivation of a formula relating the throughput of a TCP connection as a function of the loss rate (L), the round-trip time (RTT), and the maximum segment size (MSS):

average throughput of a connection = (1.22 ⋅ MSS)/(RTT ⋅ √L)

Using this formula, we can see that in order to achieve a throughput of 10 Gbps, today's TCP congestion-control algorithm can only tolerate a segment loss probability of 2 ⋅ 10⁻¹⁰ (or equivalently, one loss event for every 5,000,000,000 segments)---a very low rate. This observation has led a number of researchers to investigate new versions of TCP that are specifically designed for such high-speed environments; see \[Jin 2004; Kelly 2003; Ha 2008; RFC 7323\] for discussions of these efforts.
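The numbers quoted above are easy to reproduce. The short Python computation below solves the throughput formula for the loss rate L; the only inputs are the segment size, RTT, and target rate from the example.

```python
# Reproducing the RFC 3649-style calculation for a 10 Gbps TCP connection.
MSS = 1_500                  # bytes
RTT = 0.1                    # seconds
target = 10e9 / 8            # 10 Gbps expressed in bytes/sec

W = target * RTT / MSS       # window needed to sustain the target rate
# average throughput = 1.22 * MSS / (RTT * sqrt(L))  =>  solve for L:
L = (1.22 * MSS / (RTT * target)) ** 2

print(f"required average window: {W:,.0f} segments")   # ~ 83,333
print(f"tolerable loss rate:     {L:.1e}")             # ~ 2.1e-10
print(f"segments per loss:       {1 / L:.2e}")         # ~ 4.7e9
```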
3.7.1 Fairness

Consider K TCP connections, each with a different end-to-end path, but all passing through a bottleneck link with transmission rate R bps. (By bottleneck link, we mean that for each connection, all the other links along the connection's path are not congested and have abundant transmission capacity as compared with the transmission capacity of the bottleneck link.) Suppose each connection is transferring a large file and there is no UDP traffic passing through the bottleneck link. A congestion-control mechanism is said to be fair if the average transmission rate of each connection is approximately R/K; that is, each connection gets an equal share of the link bandwidth.

Is TCP's AIMD algorithm fair, particularly given that different TCP connections may start at different times and thus may have different window sizes at a given point in time? \[Chiu 1989\] provides an elegant and intuitive explanation of why TCP congestion control converges to provide an equal share of a bottleneck link's bandwidth among competing TCP connections.

Let's consider the simple case of two TCP connections sharing a single link with transmission rate R, as shown in Figure 3.54.

Figure 3.54 Two TCP connections sharing a single bottleneck link

Assume that the two connections have the same MSS and RTT (so that if they have the same congestion window size, then they have the same throughput), that they have a large amount of data to send, and that no other TCP connections or UDP datagrams traverse this shared link. Also, ignore the slow-start phase of TCP and assume the TCP connections are operating in CA mode (AIMD) at all times. Figure 3.55 plots the throughput realized by the two TCP connections. If TCP is to share the link bandwidth equally between the two connections, then the realized throughput should fall along the 45-degree arrow (equal bandwidth share) emanating from the origin. Ideally, the sum of the two throughputs should equal R. (Certainly, each connection receiving an equal, but zero, share of the link capacity is not a desirable situation!) So the goal should be to have the achieved throughputs fall somewhere near the intersection of the equal bandwidth share line and the full bandwidth utilization line in Figure 3.55.

Suppose that the TCP window sizes are such that at a given point in time, connections 1 and 2 realize throughputs indicated by point A in Figure 3.55. Because the amount of link bandwidth jointly consumed by the two connections is less than R, no loss will occur, and both connections will increase their window by 1 MSS per RTT as a result of TCP's congestion-avoidance algorithm. Thus, the joint throughput of the two connections proceeds along a 45-degree line (equal increase for both connections) starting from point A. Eventually, the link bandwidth jointly consumed by the two connections will be greater than R, and eventually packet loss will occur. Suppose that connections 1 and 2 experience packet loss when they realize throughputs indicated by point B. Connections 1 and 2 then decrease their windows by a factor of two. The resulting throughputs realized are thus at point C, halfway along a vector starting at B and ending at the origin. Because the joint bandwidth use is less than R at point C, the two connections again increase their throughputs along a 45-degree line starting from C. Eventually, loss will again occur, for example, at point D, and the two connections again decrease their window sizes by a factor of two, and so on.

Figure 3.55 Throughput realized by TCP connections 1 and 2

You should convince yourself that the bandwidth realized by the two connections eventually fluctuates along the equal bandwidth share line. You should also convince yourself that the two connections will converge to this behavior regardless of where they are in the two-dimensional space! Although a number of idealized assumptions lie behind this scenario, it still provides an intuitive feel for why TCP results in an equal sharing of bandwidth among connections.
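The convergence argument can also be seen in a toy simulation. The following Python sketch is ours, not from the text: it abstracts each connection to a single throughput number, uses arbitrary illustrative parameters, and applies the AIMD rules synchronously (equal RTTs, as assumed above).

```python
# Toy simulation of the Figure 3.55 argument: two AIMD flows converge
# toward an equal share of a bottleneck link (illustrative parameters).
R = 100.0                      # link capacity (arbitrary units)
x1, x2 = 10.0, 60.0            # unequal starting throughputs (a point like "A")

for _ in range(500):
    if x1 + x2 <= R:           # no loss: each flow adds 1 unit per RTT
        x1 += 1.0
        x2 += 1.0
    else:                      # loss: both flows halve their rates
        x1 /= 2.0
        x2 /= 2.0

print(f"x1 = {x1:.1f}, x2 = {x2:.1f}")  # x1 ~ x2: on the equal-share line
```

Each additive phase preserves the gap between the two flows while each multiplicative decrease halves it, so the gap shrinks geometrically and the pair ends up fluctuating along the equal bandwidth share line.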
In our idealized scenario, we assumed that only TCP connections traverse the bottleneck link, that the connections have the same RTT value, and that only a single TCP connection is associated with a host-destination pair. In practice, these conditions are typically not met, and client-server applications can thus obtain very unequal portions of link bandwidth. In particular, it has been shown that when multiple connections share a common bottleneck, those sessions with a smaller RTT are able to grab the available bandwidth at that link more quickly as it becomes free (that is, open their congestion windows faster) and thus will enjoy higher throughput than those connections with larger RTTs \[Lakshman 1997\].

Fairness and UDP

We have just seen how TCP congestion control regulates an application's transmission rate via the congestion window mechanism. Many multimedia applications, such as Internet phone and video conferencing, often do not run over TCP for this very reason---they do not want their transmission rate throttled, even if the network is very congested. Instead, these applications prefer to run over UDP, which does not have built-in congestion control. When running over UDP, applications can pump their audio and video into the network at a constant rate and occasionally lose packets, rather than reduce their rates to "fair" levels at times of congestion and not lose any packets. From the perspective of TCP, the multimedia applications running over UDP are not being fair---they do not cooperate with the other connections nor adjust their transmission rates appropriately. Because TCP congestion control will decrease its transmission rate in the face of increasing congestion (loss), while UDP sources need not, it is possible for UDP sources to crowd out TCP traffic. An area of research today is thus the development of congestion-control mechanisms for the Internet that prevent UDP traffic from bringing the Internet's throughput to a grinding halt \[Floyd 1999; Floyd 2000; Kohler 2006; RFC 4340\].

Fairness and Parallel TCP Connections

But even if we could force UDP traffic to behave fairly, the fairness problem would still not be completely solved. This is because there is nothing to stop a TCP-based application from using multiple parallel connections. For example, Web browsers often use multiple parallel TCP connections to transfer the multiple objects within a Web page. (The exact number of multiple connections is configurable in most browsers.) When an application uses multiple parallel connections, it gets a larger fraction of the bandwidth in a congested link. As an example, consider a link of rate R supporting nine ongoing client-server applications, with each of the applications using one TCP connection. If a new application comes along and also uses one TCP connection, then each application gets approximately the same transmission rate of R/10. But if this new application instead uses 11 parallel TCP connections, then the new application gets an unfair allocation of more than R/2 (11 of the 20 connections, or 11R/20). Because Web traffic is so pervasive in the Internet, multiple parallel connections are not uncommon.

3.7.2 Explicit Congestion Notification (ECN): Network-assisted Congestion Control

Since the initial standardization of slow start and congestion avoidance in the late 1980s \[RFC 1122\], TCP has implemented the form of end-to-end congestion control that we studied in Section 3.7: a TCP sender receives no explicit congestion indications from the network layer, and instead infers congestion through observed packet loss. More recently, extensions to both IP and TCP \[RFC 3168\] have been proposed, implemented, and deployed that allow the network to explicitly signal congestion to a TCP sender and receiver. This form of network-assisted congestion control is known as Explicit Congestion Notification. As shown in Figure 3.56, the TCP and IP protocols are involved.

Figure 3.56 Explicit Congestion Notification: network-assisted congestion control

At the network layer, two bits (with four possible values, overall) in the Type of Service field of the IP datagram header (which we'll discuss in Section 4.3) are used for ECN. One setting of the ECN bits is used by a router to indicate that it (the router) is experiencing congestion. This congestion indication is then carried in the marked IP datagram to the destination host, which then informs the sending host, as shown in Figure 3.56. RFC 3168 does not provide a definition of when a router is congested; that decision is a configuration choice made possible by the router vendor, and decided by the network operator. However, RFC 3168 does recommend that an ECN congestion indication be set only in the face of persistent congestion. A second setting of the ECN bits is used by the sending host to inform routers that the sender and receiver are ECN-capable, and thus capable of taking action in response to ECN-indicated network congestion.

As shown in Figure 3.56, when the TCP in the receiving host receives an ECN congestion indication via a received datagram, the TCP in the receiving host informs the TCP in the sending host of the congestion indication by setting the ECE (Explicit Congestion Notification Echo) bit (see Figure 3.29) in a receiver-to-sender TCP ACK segment. The TCP sender, in turn, reacts to an ACK with an ECE congestion indication by halving the congestion window, as it would react to a lost segment using fast retransmit, and sets the CWR (Congestion Window Reduced) bit in the header of the next transmitted TCP sender-to-receiver segment.

Other transport-layer protocols besides TCP may also make use of network-layer-signaled ECN. The Datagram Congestion Control Protocol (DCCP) \[RFC 4340\] provides a low-overhead, congestion-controlled UDP-like unreliable service that utilizes ECN. DCTCP (Data Center TCP) \[Alizadeh 2010\], a version of TCP designed specifically for data center networks, also makes use of ECN.
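As a rough illustration of the sender-side ECN reaction described above, consider the following Python sketch. It is not a real TCP stack, and the 1-MSS lower bound on the window is our own simplifying assumption.

```python
# Sender-side reaction to an ACK carrying the ECE bit (sketch).
def on_ack(cwnd, mss, ece_set):
    """Return (new_cwnd, set_cwr_on_next_segment)."""
    if ece_set:
        # Halve cwnd, as for a fast-retransmit loss; floor of 1 MSS assumed.
        return max(cwnd // 2, mss), True   # mark CWR on the next segment
    return cwnd, False

print(on_ack(20_000, 1_460, ece_set=True))   # -> (10000, True)
```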
3.8 Summary

We began this chapter by studying the services that a transport-layer protocol can provide to network applications. At one extreme, the transport-layer protocol can be very simple and offer a no-frills service to applications, providing only a multiplexing/demultiplexing function for communicating processes. The Internet's UDP protocol is an example of such a no-frills transport-layer protocol. At the other extreme, a transport-layer protocol can provide a variety of guarantees to applications, such as reliable delivery of data, delay guarantees, and bandwidth guarantees. Nevertheless, the services that a transport protocol can provide are often constrained by the service model of the underlying network-layer protocol. If the network-layer protocol cannot provide delay or bandwidth guarantees to transport-layer segments, then the transport-layer protocol cannot provide delay or bandwidth guarantees for the messages sent between processes.

We learned in Section 3.4 that a transport-layer protocol can provide reliable data transfer even if the underlying network layer is unreliable. We saw that providing reliable data transfer has many subtle points, but that the task can be accomplished by carefully combining acknowledgments, timers, retransmissions, and sequence numbers. Although we covered reliable data transfer in this chapter, we should keep in mind that reliable data transfer can be provided by link-, network-, transport-, or application-layer protocols. Any of the upper four layers of the protocol stack can implement acknowledgments, timers, retransmissions, and sequence numbers and provide reliable data transfer to the layer above. In fact, over the years, engineers and computer scientists have independently designed and implemented link-, network-, transport-, and application-layer protocols that provide reliable data transfer (although many of these protocols have quietly disappeared).

In Section 3.5, we took a close look at TCP, the Internet's connection-oriented and reliable transport-layer protocol. We learned that TCP is complex, involving connection management, flow control, and round-trip time estimation, as well as reliable data transfer. In fact, TCP is actually more complex than our description---we intentionally did not discuss a variety of TCP patches, fixes, and improvements that are widely implemented in various versions of TCP. All of this complexity, however, is hidden from the network application. If a client on one host wants to send data reliably to a server on another host, it simply opens a TCP socket to the server and pumps data into that socket. The client-server application is blissfully unaware of TCP's complexity.

In Section 3.6, we examined congestion control from a broad perspective, and in Section 3.7, we showed how TCP implements congestion control. We learned that congestion control is imperative for the well-being of the network. Without congestion control, a network can easily become gridlocked, with little or no data being transported end-to-end. In Section 3.7 we learned that TCP implements an end-to-end congestion-control mechanism that additively increases its transmission rate when the TCP connection's path is judged to be congestion-free, and multiplicatively decreases its transmission rate when loss occurs. This mechanism also strives to give each TCP connection passing through a congested link an equal share of the link bandwidth. We also examined in some depth the impact of TCP connection establishment and slow start on latency. We observed that in many important scenarios, connection establishment and slow start significantly contribute to end-to-end delay. We emphasize once more that while TCP congestion control has evolved over the years, it remains an area of intensive research and will likely continue to evolve in the upcoming years.

Our discussion of specific Internet transport protocols in this chapter has focused on UDP and TCP---the two "work horses" of the Internet transport layer.
However, two decades of experience with these two protocols has identified circumstances in which neither is ideally suited. Researchers have thus been busy developing additional transport-layer protocols, several of which are now IETF proposed standards.

The Datagram Congestion Control Protocol (DCCP) \[RFC 4340\] provides a low-overhead, message-oriented, UDP-like unreliable service, but with an application-selected form of congestion control that is compatible with TCP. If reliable or semi-reliable data transfer is needed by an application, then this would be performed within the application itself, perhaps using the mechanisms we have studied in Section 3.4. DCCP is envisioned for use in applications such as streaming media (see Chapter 9) that can exploit the tradeoff between timeliness and reliability of data delivery, but that want to be responsive to network congestion.

Google's QUIC (Quick UDP Internet Connections) protocol \[Iyengar 2016\], implemented in Google's Chromium browser, provides reliability via retransmission as well as error correction, fast connection setup, and a rate-based congestion control algorithm that aims to be TCP friendly---all implemented as an application-level protocol on top of UDP. In early 2015, Google reported that roughly half of all requests from Chrome to Google servers are served over QUIC.

DCTCP (Data Center TCP) \[Alizadeh 2010\] is a version of TCP designed specifically for data center networks, and uses ECN to better support the mix of short- and long-lived flows that characterize data center workloads.

The Stream Control Transmission Protocol (SCTP) \[RFC 4960, RFC 3286\] is a reliable, message-oriented protocol that allows several different application-level "streams" to be multiplexed through a single SCTP connection (an approach known as "multi-streaming"). From a reliability standpoint, the different streams within the connection are handled separately, so that packet loss in one stream does not affect the delivery of data in other streams. QUIC provides similar multi-stream semantics. SCTP also allows data to be transferred over two outgoing paths when a host is connected to two or more networks, optional delivery of out-of-order data, and a number of other features. SCTP's flow- and congestion-control algorithms are essentially the same as in TCP.

The TCP-Friendly Rate Control (TFRC) protocol \[RFC 5348\] is a congestion-control protocol rather than a full-fledged transport-layer protocol. It specifies a congestion-control mechanism that could be used in another transport protocol such as DCCP (indeed one of the two application-selectable protocols available in DCCP is TFRC). The goal of TFRC is to smooth out the "saw tooth" behavior (see Figure 3.53) in TCP congestion control, while maintaining a long-term sending rate that is "reasonably" close to that of TCP. With a smoother sending rate than TCP, TFRC is well-suited for multimedia applications such as IP telephony or streaming media where such a smooth rate is important. TFRC is an "equation-based" protocol that uses the measured packet loss rate as input to an equation \[Padhye 2000\] that estimates what TCP's throughput would be if a TCP session experiences that loss rate. This rate is then taken as TFRC's target sending rate. Only the future will tell whether DCCP, SCTP, QUIC, or TFRC will see widespread deployment.
While these protocols clearly provide enhanced capabilities over TCP and UDP, TCP and UDP have proven themselves "good enough" over the years. Whether "better" wins out over "good enough" will depend on a complex mix of technical, social, and business considerations.

In Chapter 1, we said that a computer network can be partitioned into the "network edge" and the "network core." The network edge covers everything that happens in the end systems. Having now covered the application layer and the transport layer, our discussion of the network edge is complete. It is time to explore the network core! This journey begins in the next two chapters, where we'll study the network layer, and continues into Chapter 6, where we'll study the link layer.

Homework Problems and Questions

Chapter 3 Review Questions

SECTIONS 3.1--3.3

R1. Suppose the network layer provides the following service. The network layer in the source host accepts a segment of maximum size 1,200 bytes and a destination host address from the transport layer. The network layer then guarantees to deliver the segment to the transport layer at the destination host. Suppose many network application processes can be running at the destination host.

a. Design the simplest possible transport-layer protocol that will get application data to the desired process at the destination host. Assume the operating system in the destination host has assigned a 4-byte port number to each running application process.

b. Modify this protocol so that it provides a "return address" to the destination process.

c. In your protocols, does the transport layer "have to do anything" in the core of the computer network?

R2. Consider a planet where everyone belongs to a family of six, every family lives in its own house, each house has a unique address, and each person in a given house has a unique name. Suppose this planet has a mail service that delivers letters from source house to destination house. The mail service requires that (1) the letter be in an envelope, and that (2) the address of the destination house (and nothing more) be clearly written on the envelope. Suppose each family has a delegate family member who collects and distributes letters for the other family members. The letters do not necessarily provide any indication of the recipients of the letters.

a. Using the solution to Problem R1 above as inspiration, describe a protocol that the delegates can use to deliver letters from a sending family member to a receiving family member.

b. In your protocol, does the mail service ever have to open the envelope and examine the letter in order to provide its service?

R3. Consider a TCP connection between Host A and Host B. Suppose that the TCP segments traveling from Host A to Host B have source port number x and destination port number y. What are the source and destination port numbers for the segments traveling from Host B to Host A?

R4. Describe why an application developer might choose to run an application over UDP rather than TCP.

R5. Why is it that voice and video traffic is often sent over TCP rather than UDP in today's Internet? (Hint: The answer we are looking for has nothing to do with TCP's congestion-control mechanism.)

R6. Is it possible for an application to enjoy reliable data transfer even when the application runs over UDP? If so, how?

R7. Suppose a process in Host C has a UDP socket with port number 6789.
Suppose both Host A and Host B each send a UDP segment to Host C with destination port number 6789. Will both of these segments be directed to the same socket at Host C? If so, how will the process at Host C know that these two segments originated from two different hosts?

R8. Suppose that a Web server runs in Host C on port 80. Suppose this Web server uses persistent connections, and is currently receiving requests from two different Hosts, A and B. Are all of the requests being sent through the same socket at Host C? If they are being passed through different sockets, do both of the sockets have port 80? Discuss and explain.

SECTION 3.4

R9. In our rdt protocols, why did we need to introduce sequence numbers?

R10. In our rdt protocols, why did we need to introduce timers?

R11. Suppose that the roundtrip delay between sender and receiver is constant and known to the sender. Would a timer still be necessary in protocol rdt3.0 , assuming that packets can be lost? Explain.

R12. Visit the Go-Back-N Java applet at the companion Web site.

a. Have the source send five packets, and then pause the animation before any of the five packets reach the destination. Then kill the first packet and resume the animation. Describe what happens.

b. Repeat the experiment, but now let the first packet reach the destination and kill the first acknowledgment. Describe again what happens.

c. Finally, try sending six packets. What happens?

R13. Repeat R12, but now with the Selective Repeat Java applet. How are Selective Repeat and Go-Back-N different?

SECTION 3.5

R14. True or false?

a. Host A is sending Host B a large file over a TCP connection. Assume Host B has no data to send Host A. Host B will not send acknowledgments to Host A because Host B cannot piggyback the acknowledgments on data.

b. The size of the TCP rwnd never changes throughout the duration of the connection.

c. Suppose Host A is sending Host B a large file over a TCP connection. The number of unacknowledged bytes that A sends cannot exceed the size of the receive buffer.

d. Suppose Host A is sending a large file to Host B over a TCP connection. If the sequence number for a segment of this connection is m, then the sequence number for the subsequent segment will necessarily be m+1.

e. The TCP segment has a field in its header for rwnd .

f. Suppose that the last SampleRTT in a TCP connection is equal to 1 sec. The current value of TimeoutInterval for the connection will necessarily be ≥1 sec.

g. Suppose Host A sends one segment with sequence number 38 and 4 bytes of data over a TCP connection to Host B. In this same segment the acknowledgment number is necessarily 42.

R15. Suppose Host A sends two TCP segments back to back to Host B over a TCP connection. The first segment has sequence number 90; the second has sequence number 110.

a. How much data is in the first segment?

b. Suppose that the first segment is lost but the second segment arrives at B. In the acknowledgment that Host B sends to Host A, what will be the acknowledgment number?

R16. Consider the Telnet example discussed in Section 3.5 . A few seconds after the user types the letter 'C,' the user types the letter 'R.' After typing the letter 'R,' how many segments are sent, and what is put in the sequence number and acknowledgment fields of the segments?

SECTION 3.7

R17. Suppose two TCP connections are present over some bottleneck link of rate R bps.
Both connections have a huge file to send (in the same direction over the bottleneck link). The transmissions of the files start at the same time. What transmission rate would TCP like to give to each of the connections?

R18. True or false? Consider congestion control in TCP. When the timer expires at the sender, the value of ssthresh is set to one half of its previous value.

R19. In the discussion of TCP splitting in the sidebar in Section 3.7, it was claimed that the response time with TCP splitting is approximately 4 ⋅ RTT_FE + RTT_BE + processing time. Justify this claim.

Problems

P1. Suppose Client A initiates a Telnet session with Server S. At about the same time, Client B also initiates a Telnet session with Server S. Provide possible source and destination port numbers for

a. The segments sent from A to S.

b. The segments sent from B to S.

c. The segments sent from S to A.

d. The segments sent from S to B.

e. If A and B are different hosts, is it possible that the source port number in the segments from A to S is the same as that from B to S?

f. How about if they are the same host?

P2. Consider Figure 3.5. What are the source and destination port values in the segments flowing from the server back to the clients' processes? What are the IP addresses in the network-layer datagrams carrying the transport-layer segments?

P3. UDP and TCP use 1s complement for their checksums. Suppose you have the following three 8-bit bytes: 01010011, 01100110, 01110100. What is the 1s complement of the sum of these 8-bit bytes? (Note that although UDP and TCP use 16-bit words in computing the checksum, for this problem you are being asked to consider 8-bit sums.) Show all work. Why is it that UDP takes the 1s complement of the sum; that is, why not just use the sum? With the 1s complement scheme, how does the receiver detect errors? Is it possible that a 1-bit error will go undetected? How about a 2-bit error?

P4.

a. Suppose you have the following 2 bytes: 01011100 and 01100101. What is the 1s complement of the sum of these 2 bytes?

b. Suppose you have the following 2 bytes: 11011010 and 01100101. What is the 1s complement of the sum of these 2 bytes?

c. For the bytes in part (a), give an example where one bit is flipped in each of the 2 bytes and yet the 1s complement doesn't change.

P5. Suppose that the UDP receiver computes the Internet checksum for the received UDP segment and finds that it matches the value carried in the checksum field. Can the receiver be absolutely certain that no bit errors have occurred? Explain.

P6. Consider our motivation for correcting protocol rdt2.1. Show that the receiver, shown in Figure 3.57, when operating with the sender shown in Figure 3.11, can lead the sender and receiver to enter into a deadlock state, where each is waiting for an event that will never occur.

P7. In protocol rdt3.0, the ACK packets flowing from the receiver to the sender do not have sequence numbers (although they do have an ACK field that contains the sequence number of the packet they are acknowledging). Why is it that our ACK packets do not require sequence numbers?

Figure 3.57 An incorrect receiver for protocol rdt 2.1

P8. Draw the FSM for the receiver side of protocol rdt3.0.

P9. Give a trace of the operation of protocol rdt3.0 when data packets and acknowledgment packets are garbled. Your trace should be similar to that used in Figure 3.16.
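Problems P3–P5 above ask you to work through 1s-complement arithmetic by hand. If you want to check your hand computation, here is a minimal sketch of 8-bit 1s-complement addition; note that it uses the 8-bit words the problems specify, not the 16-bit words UDP and TCP actually use.

```python
# 8-bit 1s-complement helper for checking answers to P3-P5.

def ones_complement_sum8(words):
    """Sum 8-bit words, wrapping any carry out of bit 7 back into bit 0."""
    total = 0
    for w in words:
        total += w
        if total > 0xFF:                  # carry out of the high-order bit...
            total = (total & 0xFF) + 1    # ...wraps around and is added back
    return total

def checksum8(words):
    """The checksum is the 1s complement (bitwise NOT) of the sum."""
    return ~ones_complement_sum8(words) & 0xFF

if __name__ == '__main__':
    p3 = [0b01010011, 0b01100110, 0b01110100]
    print(format(checksum8(p3), '08b'))
    # At the receiver: summing the words *and* the checksum yields
    # 11111111 whenever no error is detected.
    print(format(ones_complement_sum8(p3 + [checksum8(p3)]), '08b'))
```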
P10. Consider a channel that can lose packets but has a maximum delay that is known. Modify protocol rdt2.1 to include sender timeout and retransmit. Informally argue why your protocol can communicate correctly over this channel.

P11. Consider the rdt2.2 receiver in Figure 3.14, and the creation of a new packet in the self-transition (i.e., the transition from the state back to itself) in the Wait-for-0-from-below and the Wait-for-1-from-below states: sndpkt=make_pkt(ACK, 1, checksum) and sndpkt=make_pkt(ACK, 0, checksum). Would the protocol work correctly if this action were removed from the self-transition in the Wait-for-1-from-below state? Justify your answer. What if this event were removed from the self-transition in the Wait-for-0-from-below state? \[Hint: In this latter case, consider what would happen if the first sender-to-receiver packet were corrupted.\]

P12. The sender side of rdt3.0 simply ignores (that is, takes no action on) all received packets that are either in error or have the wrong value in the acknum field of an acknowledgment packet. Suppose that in such circumstances, rdt3.0 were simply to retransmit the current data packet. Would the protocol still work? (Hint: Consider what would happen if there were only bit errors; there are no packet losses but premature timeouts can occur. Consider how many times the nth packet is sent, in the limit as n approaches infinity.)

P13. Consider the rdt 3.0 protocol. Draw a diagram showing that if the network connection between the sender and receiver can reorder messages (that is, that two messages propagating in the medium between the sender and receiver can be reordered), then the alternating-bit protocol will not work correctly (make sure you clearly identify the sense in which it will not work correctly). Your diagram should have the sender on the left and the receiver on the right, with the time axis running down the page, showing data (D) and acknowledgment (A) message exchange. Make sure you indicate the sequence number associated with any data or acknowledgment segment.

P14. Consider a reliable data transfer protocol that uses only negative acknowledgments. Suppose the sender sends data only infrequently. Would a NAK-only protocol be preferable to a protocol that uses ACKs? Why? Now suppose the sender has a lot of data to send and the end-to-end connection experiences few losses. In this second case, would a NAK-only protocol be preferable to a protocol that uses ACKs? Why?

P15. Consider the cross-country example shown in Figure 3.17. How big would the window size have to be for the channel utilization to be greater than 98 percent? Suppose that the size of a packet is 1,500 bytes, including both header fields and data.

P16. Suppose an application uses rdt 3.0 as its transport layer protocol. As the stop-and-wait protocol has very low channel utilization (shown in the cross-country example), the designers of this application let the receiver keep sending back a number (more than two) of alternating ACK 0 and ACK 1 even if the corresponding data have not arrived at the receiver. Would this application design increase the channel utilization? Why? Are there any potential problems with this approach? Explain.

P17. Consider two network entities, A and B, which are connected by a perfect bi-directional channel (i.e., any message sent will be received correctly; the channel will not corrupt, lose, or re-order packets).
A and B are to deliver data messages to each other in an alternating manner: First, A must deliver a message to B, then B must deliver a message to A, then A must deliver a message to B and so on. If an entity is in a state where it should not attempt to deliver a message to the other side, and there is an event like rdt_send(data) call from above that attempts to pass data down for transmission to the other side, this call from above can simply be ignored with a call to rdt_unable_to_send(data), which informs the higher layer that it is currently not able to send data. \[Note: This simplifying assumption is made so you don't have to worry about buffering data.\] Draw an FSM specification for this protocol (one FSM for A, and one FSM for B!). Note that you do not have to worry about a reliability mechanism here; the main point of this question is to create an FSM specification that reflects the synchronized behavior of the two entities. You should use the following events and actions that have the same meaning as protocol rdt1.0 in Figure 3.9: rdt_send(data), packet = make_pkt(data), udt_send(packet), rdt_rcv(packet), extract(packet, data), deliver_data(data). Make sure your protocol reflects the strict alternation of sending between A and B. Also, make sure to indicate the initial states for A and B in your FSM descriptions.

P18. In the generic SR protocol that we studied in Section 3.4.4, the sender transmits a message as soon as it is available (if it is in the window) without waiting for an acknowledgment. Suppose now that we want an SR protocol that sends messages two at a time. That is, the sender will send a pair of messages and will send the next pair of messages only when it knows that both messages in the first pair have been received correctly. Suppose that the channel may lose messages but will not corrupt or reorder messages. Design an error-control protocol for the unidirectional reliable transfer of messages. Give an FSM description of the sender and receiver. Describe the format of the packets sent between sender and receiver, and vice versa. If you use any procedure calls other than those in Section 3.4 (for example, udt_send(), start_timer(), rdt_rcv(), and so on), clearly state their actions. Give an example (a timeline trace of sender and receiver) showing how your protocol recovers from a lost packet.

P19. Consider a scenario in which Host A wants to simultaneously send packets to Hosts B and C. A is connected to B and C via a broadcast channel---a packet sent by A is carried by the channel to both B and C. Suppose that the broadcast channel connecting A, B, and C can independently lose and corrupt packets (and so, for example, a packet sent from A might be correctly received by B, but not by C). Design a stop-and-wait-like error-control protocol for reliably transferring packets from A to B and C, such that A will not get new data from the upper layer until it knows that both B and C have correctly received the current packet. Give FSM descriptions of A and C. (Hint: The FSM for B should be essentially the same as for C.) Also, give a description of the packet format(s) used.
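Problems P17–P21 in this stretch all ask for event-driven FSM designs. If it helps to think of an FSM concretely before drawing one, here is a minimal sketch (an illustration only, not the required answer format) of how P17's strictly alternating entities might be expressed in code; the two state names and the rdt_unable_to_send behavior mirror the problem statement, and the in-memory list standing in for the channel is an assumption of the sketch.

```python
# A sketch of P17's strictly alternating entities as code. Each entity
# has two states; the event names follow rdt1.0 in Figure 3.9.

class AlternatingEntity:
    def __init__(self, name, my_turn_first):
        self.name = name
        # States: 'WAIT_FROM_ABOVE' (may send) or 'WAIT_FROM_BELOW' (must receive)
        self.state = 'WAIT_FROM_ABOVE' if my_turn_first else 'WAIT_FROM_BELOW'

    def rdt_send(self, data, channel):
        if self.state != 'WAIT_FROM_ABOVE':
            return self.rdt_unable_to_send(data)
        packet = ('DATA', data)             # make_pkt(data)
        channel.append(packet)              # udt_send(packet)
        self.state = 'WAIT_FROM_BELOW'      # now it is the other side's turn

    def rdt_rcv(self, packet):
        assert self.state == 'WAIT_FROM_BELOW'
        kind, data = packet                 # extract(packet, data)
        print(self.name, 'delivers', data)  # deliver_data(data)
        self.state = 'WAIT_FROM_ABOVE'      # now it is our turn to send

    def rdt_unable_to_send(self, data):
        print(self.name, 'cannot send now; dropping', data)

# A starts (initial state WAIT_FROM_ABOVE); B starts waiting.
channel = []
a, b = AlternatingEntity('A', True), AlternatingEntity('B', False)
a.rdt_send('m1', channel); b.rdt_rcv(channel.pop())
b.rdt_send('m2', channel); a.rdt_rcv(channel.pop())
a.rdt_send('m3', channel)   # allowed again: strict alternation
```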
P20. Consider a scenario in which Host A and Host B want to send messages to Host C. Hosts A and C are connected by a channel that can lose and corrupt (but not reorder) messages. Hosts B and C are connected by another channel (independent of the channel connecting A and C) with the same properties. The transport layer at Host C should alternate in delivering messages from A and B to the layer above (that is, it should first deliver the data from a packet from A, then the data from a packet from B, and so on). Design a stop-and-wait-like error-control protocol for reliably transferring packets from A and B to C, with alternating delivery at C as described above. Give FSM descriptions of A and C. (Hint: The FSM for B should be essentially the same as for A.) Also, give a description of the packet format(s) used.

P21. Suppose we have two network entities, A and B. B has a supply of data messages that will be sent to A according to the following conventions. When A gets a request from the layer above to get the next data (D) message from B, A must send a request (R) message to B on the A-to-B channel. Only when B receives an R message can it send a data (D) message back to A on the B-to-A channel. A should deliver exactly one copy of each D message to the layer above. R messages can be lost (but not corrupted) in the A-to-B channel; D messages, once sent, are always delivered correctly. The delay along both channels is unknown and variable. Design (give an FSM description of) a protocol that incorporates the appropriate mechanisms to compensate for the loss-prone A-to-B channel and implements message passing to the layer above at entity A, as discussed above. Use only those mechanisms that are absolutely necessary.

P22. Consider the GBN protocol with a sender window size of 4 and a sequence number range of 1,024. Suppose that at time t, the next in-order packet that the receiver is expecting has a sequence number of k. Assume that the medium does not reorder messages. Answer the following questions:

a. What are the possible sets of sequence numbers inside the sender's window at time t? Justify your answer.

b. What are all possible values of the ACK field in all possible messages currently propagating back to the sender at time t? Justify your answer.

P23. Consider the GBN and SR protocols. Suppose the sequence number space is of size k. What is the largest allowable sender window that will avoid the occurrence of problems such as that in Figure 3.27 for each of these protocols?

P24. Answer true or false to the following questions and briefly justify your answer:

a. With the SR protocol, it is possible for the sender to receive an ACK for a packet that falls outside of its current window.

b. With GBN, it is possible for the sender to receive an ACK for a packet that falls outside of its current window.

c. The alternating-bit protocol is the same as the SR protocol with a sender and receiver window size of 1.

d. The alternating-bit protocol is the same as the GBN protocol with a sender and receiver window size of 1.

P25. We have said that an application may choose UDP for a transport protocol because UDP offers finer application control (than TCP) of what data is sent in a segment and when.

a. Why does an application have more control of what data is sent in a segment?

b. Why does an application have more control on when the segment is sent?

P26. Consider transferring an enormous file of L bytes from Host A to Host B. Assume an MSS of 536 bytes.

a. What is the maximum value of L such that TCP sequence numbers are not exhausted? Recall that the TCP sequence number field has 4 bytes.

b. For the L you obtain in (a), find how long it takes to transmit the file.
Assume that a total of 66 bytes of transport, network, and data-link header are added to each segment before the resulting packet is sent out over a 155 Mbps link. Ignore flow control and congestion control so A can pump out the segments back to back and continuously.

P27. Host A and B are communicating over a TCP connection, and Host B has already received from A all bytes up through byte 126. Suppose Host A then sends two segments to Host B back-to-back. The first and second segments contain 80 and 40 bytes of data, respectively. In the first segment, the sequence number is 127, the source port number is 302, and the destination port number is 80. Host B sends an acknowledgment whenever it receives a segment from Host A.

a. In the second segment sent from Host A to B, what are the sequence number, source port number, and destination port number?

b. If the first segment arrives before the second segment, in the acknowledgment of the first arriving segment, what is the acknowledgment number, the source port number, and the destination port number?

c. If the second segment arrives before the first segment, in the acknowledgment of the first arriving segment, what is the acknowledgment number?

d. Suppose the two segments sent by A arrive in order at B. The first acknowledgment is lost and the second acknowledgment arrives after the first timeout interval. Draw a timing diagram, showing these segments and all other segments and acknowledgments sent. (Assume there is no additional packet loss.) For each segment in your figure, provide the sequence number and the number of bytes of data; for each acknowledgment that you add, provide the acknowledgment number.

P28. Host A and B are directly connected with a 100 Mbps link. There is one TCP connection between the two hosts, and Host A is sending to Host B an enormous file over this connection. Host A can send its application data into its TCP socket at a rate as high as 120 Mbps but Host B can read out of its TCP receive buffer at a maximum rate of 50 Mbps. Describe the effect of TCP flow control.

P29. SYN cookies were discussed in Section 3.5.6.

a. Why is it necessary for the server to use a special initial sequence number in the SYNACK?

b. Suppose an attacker knows that a target host uses SYN cookies. Can the attacker create half-open or fully open connections by simply sending an ACK packet to the target? Why or why not?

c. Suppose an attacker collects a large amount of initial sequence numbers sent by the server. Can the attacker cause the server to create many fully open connections by sending ACKs with those initial sequence numbers? Why?

P30. Consider the network shown in Scenario 2 in Section 3.6.1. Suppose both sending hosts A and B have some fixed timeout values.

a. Argue that increasing the size of the finite buffer of the router might possibly decrease the throughput (λout).

b. Now suppose both hosts dynamically adjust their timeout values (like what TCP does) based on the buffering delay at the router. Would increasing the buffer size help to increase the throughput? Why?

P31. Suppose that the five measured SampleRTT values (see Section 3.5.3) are 106 ms, 120 ms, 140 ms, 90 ms, and 115 ms. Compute the EstimatedRTT after each of these SampleRTT values is obtained, using a value of α=0.125 and assuming that the value of EstimatedRTT was 100 ms just before the first of these five samples was obtained. Compute also the DevRTT after each sample is obtained, assuming a value of β=0.25 and assuming the value of DevRTT was 5 ms just before the first of these five samples was obtained. Last, compute the TCP TimeoutInterval after each of these samples is obtained.
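P31 and P32 exercise the RTT-estimation equations of Section 3.5.3. A short sketch for checking your hand arithmetic, using the standard formulas EstimatedRTT = (1−α)·EstimatedRTT + α·SampleRTT, DevRTT = (1−β)·DevRTT + β·|SampleRTT − EstimatedRTT|, and TimeoutInterval = EstimatedRTT + 4·DevRTT:

```python
# Check of P31: EWMA RTT estimation per Section 3.5.3.
alpha, beta = 0.125, 0.25
estimated_rtt, dev_rtt = 100.0, 5.0   # values just before the first sample (ms)

for sample_rtt in [106, 120, 140, 90, 115]:
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    timeout_interval = estimated_rtt + 4 * dev_rtt
    print('Sample %3d ms -> EstimatedRTT %6.2f ms, DevRTT %5.2f ms, '
          'TimeoutInterval %6.2f ms'
          % (sample_rtt, estimated_rtt, dev_rtt, timeout_interval))
```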
P32. Consider the TCP procedure for estimating RTT. Suppose that α=0.1. Let SampleRTT1 be the most recent sample RTT, let SampleRTT2 be the next most recent sample RTT, and so on.

a. For a given TCP connection, suppose four acknowledgments have been returned with corresponding sample RTTs: SampleRTT4, SampleRTT3, SampleRTT2, and SampleRTT1. Express EstimatedRTT in terms of the four sample RTTs.

b. Generalize your formula for n sample RTTs.

c. For the formula in part (b) let n approach infinity. Comment on why this averaging procedure is called an exponential moving average.

P33. In Section 3.5.3, we discussed TCP's estimation of RTT. Why do you think TCP avoids measuring the SampleRTT for retransmitted segments?

P34. What is the relationship between the variable SendBase in Section 3.5.4 and the variable LastByteRcvd in Section 3.5.5?

P35. What is the relationship between the variable LastByteRcvd in Section 3.5.5 and the variable y in Section 3.5.4?

P36. In Section 3.5.4, we saw that TCP waits until it has received three duplicate ACKs before performing a fast retransmit. Why do you think the TCP designers chose not to perform a fast retransmit after the first duplicate ACK for a segment is received?

P37. Compare GBN, SR, and TCP (no delayed ACK). Assume that the timeout values for all three protocols are sufficiently long such that 5 consecutive data segments and their corresponding ACKs can be received (if not lost in the channel) by the receiving host (Host B) and the sending host (Host A) respectively. Suppose Host A sends 5 data segments to Host B, and the 2nd segment (sent from A) is lost. In the end, all 5 data segments have been correctly received by Host B.

a. How many segments has Host A sent in total and how many ACKs has Host B sent in total? What are their sequence numbers? Answer this question for all three protocols.

b. If the timeout values for all three protocols are much longer than 5 RTT, then which protocol successfully delivers all five data segments in the shortest time interval?

P38. In our description of TCP in Figure 3.53, the value of the threshold, ssthresh, is set as ssthresh=cwnd/2 in several places, and the ssthresh value is referred to as being set to half the window size when a loss event occurred. Must the rate at which the sender is sending when the loss event occurred be approximately equal to cwnd segments per RTT? Explain your answer. If your answer is no, can you suggest a different manner in which ssthresh should be set?

P39. Consider Figure 3.46(b). If λ′in increases beyond R/2, can λout increase beyond R/3? Explain. Now consider Figure 3.46(c). If λ′in increases beyond R/2, can λout increase beyond R/4 under the assumption that a packet will be forwarded twice on average from the router to the receiver? Explain.

P40. Consider Figure 3.58. Assuming TCP Reno is the protocol experiencing the behavior shown above, answer the following questions. In all cases, you should provide a short discussion justifying your answer.

a. Identify the intervals of time when TCP slow start is operating.
b. Identify the intervals of time when TCP congestion avoidance is operating.

c. After the 16th transmission round, is segment loss detected by a triple duplicate ACK or by a timeout?

d. After the 22nd transmission round, is segment loss detected by a triple duplicate ACK or by a timeout?

Figure 3.58 TCP window size as a function of time

e. What is the initial value of ssthresh at the first transmission round?

f. What is the value of ssthresh at the 18th transmission round?

g. What is the value of ssthresh at the 24th transmission round?

h. During what transmission round is the 70th segment sent?

i. Assuming a packet loss is detected after the 26th round by the receipt of a triple duplicate ACK, what will be the values of the congestion window size and of ssthresh?

j. Suppose TCP Tahoe is used (instead of TCP Reno), and assume that triple duplicate ACKs are received at the 16th round. What are the ssthresh and the congestion window size at the 19th round?

k. Again suppose TCP Tahoe is used, and there is a timeout event at the 22nd round. How many packets have been sent out from the 17th round till the 22nd round, inclusive?

P41. Refer to Figure 3.55, which illustrates the convergence of TCP's AIMD algorithm. Suppose that instead of a multiplicative decrease, TCP decreased the window size by a constant amount. Would the resulting AIAD algorithm converge to an equal share algorithm? Justify your answer using a diagram similar to Figure 3.55.

P42. In Section 3.5.4, we discussed the doubling of the timeout interval after a timeout event. This mechanism is a form of congestion control. Why does TCP need a window-based congestion-control mechanism (as studied in Section 3.7) in addition to this doubling-timeout-interval mechanism?

P43. Host A is sending an enormous file to Host B over a TCP connection. Over this connection there is never any packet loss and the timers never expire. Denote the transmission rate of the link connecting Host A to the Internet by R bps. Suppose that the process in Host A is capable of sending data into its TCP socket at a rate S bps, where S=10⋅R. Further suppose that the TCP receive buffer is large enough to hold the entire file, and the send buffer can hold only one percent of the file. What would prevent the process in Host A from continuously passing data to its TCP socket at rate S bps? TCP flow control? TCP congestion control? Or something else? Elaborate.

P44. Consider sending a large file from a host to another over a TCP connection that has no loss.

a. Suppose TCP uses AIMD for its congestion control without slow start. Assuming cwnd increases by 1 MSS every time a batch of ACKs is received and assuming approximately constant round-trip times, how long does it take for cwnd to increase from 6 MSS to 12 MSS (assuming no loss events)?

b. What is the average throughput (in terms of MSS and RTT) for this connection up through time = 6 RTT?

P45. Recall the macroscopic description of TCP throughput. In the period of time from when the connection's rate varies from W/(2 · RTT) to W/RTT, only one packet is lost (at the very end of the period).

a. Show that the loss rate (fraction of packets lost) is equal to

L = loss rate = 1 / ((3/8)⋅W² + (3/4)⋅W)

b. Use the result above to show that if a connection has loss rate L, then its average rate is approximately given by

average rate ≈ (1.22 ⋅ MSS) / (RTT ⋅ √L)
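A quick numeric sanity check of P45's two formulas (a sketch only; the closed forms are those in the problem statement, and the 1,500-byte segment size and 100 ms RTT are assumed values chosen for illustration): for a given maximum window W, compute the loss rate L from part (a), then confirm that part (b)'s approximation tracks the exact average of W/2 and W segments per RTT.

```python
# Numeric check of the P45 formulas relating loss rate and throughput.
from math import sqrt

MSS = 1500 * 8      # bits per segment (assumed segment size: 1,500 bytes)
RTT = 0.1           # seconds (an assumed round-trip time)

for W in [10, 100, 1000]:                          # max window, in segments
    L = 1.0 / ((3.0 / 8) * W**2 + (3.0 / 4) * W)   # part (a): loss rate
    approx_rate = 1.22 * MSS / (RTT * sqrt(L))     # part (b): approximation
    exact_avg = 0.75 * W * MSS / RTT               # average of W/2 and W per RTT
    print('W=%5d  L=%.2e  approx=%12.0f bps  exact avg=%12.0f bps'
          % (W, L, approx_rate, exact_avg))
```

As W grows, the two rates agree closely, which is exactly the point of the approximation.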
P46. Consider that only a single TCP (Reno) connection uses one 10 Mbps link which does not buffer any data. Suppose that this link is the only congested link between the sending and receiving hosts. Assume that the TCP sender has a huge file to send to the receiver, and the receiver's receive buffer is much larger than the congestion window. We also make the following assumptions: each TCP segment size is 1,500 bytes; the two-way propagation delay of this connection is 150 msec; and this TCP connection is always in congestion avoidance phase, that is, ignore slow start.

a. What is the maximum window size (in segments) that this TCP connection can achieve?

b. What is the average window size (in segments) and average throughput (in bps) of this TCP connection?

c. How long would it take for this TCP connection to reach its maximum window again after recovering from a packet loss?

P47. Consider the scenario described in the previous problem. Suppose that the 10 Mbps link can buffer a finite number of segments. Argue that in order for the link to always be busy sending data, we would like to choose a buffer size that is at least the product of the link speed C and the two-way propagation delay between the sender and the receiver.

P48. Repeat Problem 46, but replacing the 10 Mbps link with a 10 Gbps link. Note that in your answer to part c, you will realize that it takes a very long time for the congestion window size to reach its maximum window size after recovering from a packet loss. Sketch a solution to solve this problem.

P49. Let T (measured by RTT) denote the time interval that a TCP connection takes to increase its congestion window size from W/2 to W, where W is the maximum congestion window size. Argue that T is a function of TCP's average throughput.

P50. Consider a simplified TCP's AIMD algorithm where the congestion window size is measured in number of segments, not in bytes. In additive increase, the congestion window size increases by one segment in each RTT. In multiplicative decrease, the congestion window size decreases by half (if the result is not an integer, round down to the nearest integer). Suppose that two TCP connections, C1 and C2, share a single congested link of speed 30 segments per second. Assume that both C1 and C2 are in the congestion avoidance phase. Connection C1's RTT is 50 msec and connection C2's RTT is 100 msec. Assume that when the data rate in the link exceeds the link's speed, all TCP connections experience data segment loss.

a. If both C1 and C2 at time t0 have a congestion window of 10 segments, what are their congestion window sizes after 1000 msec?

b. In the long run, will these two connections get the same share of the bandwidth of the congested link? Explain.
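P50 and P51 can be checked with a small discrete-time simulation. The sketch below steps the two connections in 50 msec increments under the problem's rules (window +1 per connection RTT; halve, rounding down, whenever the combined rate exceeds 30 segments per second). Two modeling choices here are assumptions, not part of the problem: loss is tested once per 50 msec step, and a window is never allowed to drop below 1 segment.

```python
# Discrete-time check for P50: two AIMD flows sharing a 30 segment/s link.
LINK_SPEED = 30.0          # segments per second
RTT1, RTT2 = 0.050, 0.100  # seconds
w1, w2 = 10, 10            # congestion windows at t0, in segments

for step in range(1, 21):                    # 20 steps of 50 ms = 1000 ms
    w1 += 1                                  # C1 completes an RTT every 50 ms
    if step % 2 == 0:
        w2 += 1                              # C2 completes an RTT every 100 ms
    if w1 / RTT1 + w2 / RTT2 > LINK_SPEED:   # combined rate exceeds the link
        w1, w2 = max(1, w1 // 2), max(1, w2 // 2)   # multiplicative decrease
    print('t=%4d ms  cwnd1=%2d  cwnd2=%2d' % (step * 50, w1, w2))
```

Because C1 completes twice as many RTTs per second, it grows its window twice as fast, which is the asymmetry part (b) asks you to reason about.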
P51. Consider the network described in the previous problem. Now suppose that the two TCP connections, C1 and C2, have the same RTT of 100 msec. Suppose that at time t0, C1's congestion window size is 15 segments but C2's congestion window size is 10 segments.

a. What are their congestion window sizes after 2200 msec?

b. In the long run, will these two connections get about the same share of the bandwidth of the congested link?

c. We say that two connections are synchronized, if both connections reach their maximum window sizes at the same time and reach their minimum window sizes at the same time. In the long run, will these two connections get synchronized eventually? If so, what are their maximum window sizes?

d. Will this synchronization help to improve the utilization of the shared link? Why? Sketch some idea to break this synchronization.

P52. Consider a modification to TCP's congestion control algorithm. Instead of additive increase, we can use multiplicative increase. A TCP sender increases its window size by a small positive constant a (0 \< a \< 1) whenever it receives a valid ACK. Find the functional relationship between loss rate L and maximum congestion window W. Argue that for this modified TCP, regardless of TCP's average throughput, a TCP connection always spends the same amount of time to increase its congestion window size from W/2 to W.

P53. In our discussion of TCP futures in Section 3.7, we noted that to achieve a throughput of 10 Gbps, TCP could only tolerate a segment loss probability of 2 ⋅ 10^−10 (or equivalently, one loss event for every 5,000,000,000 segments). Show the derivation for the values of 2 ⋅ 10^−10 (1 out of 5,000,000,000) for the RTT and MSS values given in Section 3.7. If TCP needed to support a 100 Gbps connection, what would the tolerable loss be?

P54. In our discussion of TCP congestion control in Section 3.7, we implicitly assumed that the TCP sender always had data to send. Consider now the case that the TCP sender sends a large amount of data and then goes idle (since it has no more data to send) at t1. TCP remains idle for a relatively long period of time and then wants to send more data at t2. What are the advantages and disadvantages of having TCP use the cwnd and ssthresh values from t1 when starting to send data at t2? What alternative would you recommend? Why?

P55. In this problem we investigate whether either UDP or TCP provides a degree of end-point authentication.

a. Consider a server that receives a request within a UDP packet and responds to that request within a UDP packet (for example, as done by a DNS server). If a client with IP address X spoofs its address with address Y, where will the server send its response?

b. Suppose a server receives a SYN with IP source address Y, and after responding with a SYNACK, receives an ACK with IP source address Y with the correct acknowledgment number. Assuming the server chooses a random initial sequence number and there is no "man-in-the-middle," can the server be certain that the client is indeed at Y (and not at some other address X that is spoofing Y)?

P56. In this problem, we consider the delay introduced by the TCP slow-start phase. Consider a client and a Web server directly connected by one link of rate R. Suppose the client wants to retrieve an object whose size is exactly equal to 15 S, where S is the maximum segment size (MSS). Denote the round-trip time between client and server as RTT (assumed to be constant). Ignoring protocol headers, determine the time to retrieve the object (including TCP connection establishment) when

a. 4 S/R \> S/R + RTT \> 2 S/R

b. S/R + RTT \> 4 S/R

c. S/R \> RTT.

Programming Assignments

Implementing a Reliable Transport Protocol

In this laboratory programming assignment, you will be writing the sending and receiving transport-level code for implementing a simple reliable data transfer protocol. There are two versions of this lab, the alternating-bit-protocol version and the GBN version. This lab should be fun---your implementation will differ very little from what would be required in a real-world situation.
Since you probably don't have +standalone machines (with an OS that you can modify), your code will +have to execute in a simulated hardware/software environment. However, +the programming interface provided to your routines---the code that +would call your entities from above and from below---is very close to +what is done in an actual UNIX environment. (Indeed, the software +interfaces described in this programming assignment are much more +realistic than the infinite loop senders and receivers that many texts +describe.) Stopping and starting timers are also simulated, and timer +interrupts will cause your timer handling routine to be activated. The +full lab assignment, as well as code you will need to compile with your +own code, are available at this book's Web site: +www.pearsonhighered.com/cs-resources. + +Wireshark Lab: Exploring TCP + +In this lab, you'll use your Web browser to access a file from a Web +server. As in earlier Wireshark labs, you'll use Wireshark to capture +the packets arriving at your computer. Unlike earlier labs, you'll also +be able to download a Wireshark-readable packet trace from the Web +server from which you downloaded the file. In this server trace, you'll +find the packets that were generated by your own access of the Web +server. You'll analyze the client- and server-side traces to explore +aspects of TCP. In particular, you'll evaluate the performance of the +TCP connection between your computer and the Web server. You'll trace +TCP's window behavior, and infer packet loss, retransmission, flow +control and congestion control behavior, and estimated roundtrip time. +As is the case with all Wireshark labs, the full description of this lab +is available at this book's Web site, +www.pearsonhighered.com/cs-resources. + +Wireshark Lab: Exploring UDP In this short lab, you'll do a packet +capture and analysis of your favorite application that uses UDP (for +example, DNS or a multimedia application such as Skype). As we learned +in Section 3.3, UDP is a simple, no-frills transport protocol. In this +lab, you'll investigate the header fields in the UDP segment as well as +the checksum calculation. As is the case with all Wireshark labs, the +full description of this lab is available at this book's Web site, +www.pearsonhighered.com/cs-resources. AN INTERVIEW WITH... Van Jacobson +Van Jacobson works at Google and was previously a Research Fellow at +PARC. Prior to that, he was co-founder and Chief Scientist of Packet +Design. Before that, he was Chief Scientist at Cisco. Before joining +Cisco, he was head of the Network Research Group at Lawrence Berkeley +National Laboratory and taught at UC Berkeley and Stanford. Van received +the ACM SIGCOMM Award in 2001 for outstanding lifetime contribution to +the field of communication networks and the IEEE Kobayashi Award in 2002 +for "contributing to the understanding of network congestion and +developing congestion control mechanisms that enabled the successful +scaling of the Internet". He was elected to the U.S. National Academy of +Engineering in 2004. + +Please describe one or two of the most exciting projects you have worked +on during your career. What were the biggest challenges? School teaches +us lots of ways to find answers. In every interesting problem I've +worked on, the challenge has been finding the right question. When Mike +Karels and I started looking at TCP congestion, we spent months staring +at protocol and packet traces asking "Why is it failing?". 
One day in Mike's office, one of us said "The reason I can't figure out why it fails is because I don't understand how it ever worked to begin with." That turned out to be the right question and it forced us to figure out the "ack clocking" that makes TCP work. After that, the rest was easy.

More generally, where do you see the future of networking and the Internet?

For most people, the Web is the Internet. Networking geeks smile politely since we know the Web is an application running over the Internet but what if they're right? The Internet is about enabling conversations between pairs of hosts. The Web is about distributed information production and consumption. "Information propagation" is a very general view of communication of which "pairwise conversation" is a tiny subset. We need to move into the larger tent. Networking today deals with broadcast media (radios, PONs, etc.) by pretending it's a point-to-point wire. That's massively inefficient. Terabits-per-second of data are being exchanged all over the world via thumb drives or smart phones but we don't know how to treat that as "networking". ISPs are busily setting up caches and CDNs to scalably distribute video and audio. Caching is a necessary part of the solution but there's no part of today's networking---from Information, Queuing or Traffic Theory down to the Internet protocol specs---that tells us how to engineer and deploy it. I think and hope that over the next few years, networking will evolve to embrace the much larger vision of communication that underlies the Web.

What people inspired you professionally?

When I was in grad school, Richard Feynman visited and gave a colloquium. He talked about a piece of Quantum theory that I'd been struggling with all semester and his explanation was so simple and lucid that what had been incomprehensible gibberish to me became obvious and inevitable. That ability to see and convey the simplicity that underlies our complex world seems to me a rare and wonderful gift.

What are your recommendations for students who want careers in computer science and networking?

It's a wonderful field---computers and networking have probably had more impact on society than any invention since the book. Networking is fundamentally about connecting stuff, and studying it helps you make intellectual connections: Ant foraging & Bee dances demonstrate protocol design better than RFCs, traffic jams or people leaving a packed stadium are the essence of congestion, and students finding flights back to school in a post-Thanksgiving blizzard are the core of dynamic routing. If you're interested in lots of stuff and want to have an impact, it's hard to imagine a better field.

Chapter 4 The Network Layer: Data Plane

We learned in the previous chapter that the transport layer provides various forms of process-to-process communication by relying on the network layer's host-to-host communication service. We also learned that the transport layer does so without any knowledge about how the network layer actually implements this service. So perhaps you're now wondering, what's under the hood of the host-to-host communication service, what makes it tick? In this chapter and the next, we'll learn exactly how the network layer can provide its host-to-host communication service. We'll see that unlike the transport and application layers, there is a piece of the network layer in each and every host and router in the network.
Because of this, network-layer protocols are among the most challenging (and therefore among the most interesting!) in the protocol stack. Since the network layer is arguably the most complex layer in the protocol stack, we'll have a lot of ground to cover here. Indeed, there is so much to cover that we cover the network layer in two chapters. We'll see that the network layer can be decomposed into two interacting parts, the data plane and the control plane. In Chapter 4, we'll first cover the data plane functions of the network layer---the per-router functions in the network layer that determine how a datagram (that is, a network-layer packet) arriving on one of a router's input links is forwarded to one of that router's output links. We'll cover both traditional IP forwarding (where forwarding is based on a datagram's destination address) and generalized forwarding (where forwarding and other functions may be performed using values in several different fields in the datagram's header). We'll study the IPv4 and IPv6 protocols and addressing in detail. In Chapter 5, we'll cover the control plane functions of the network layer---the network-wide logic that controls how a datagram is routed among routers along an end-to-end path from the source host to the destination host. We'll cover routing algorithms, as well as routing protocols, such as OSPF and BGP, that are in widespread use in today's Internet. Traditionally, these control-plane routing protocols and data-plane forwarding functions have been implemented together, monolithically, within a router. Software-defined networking (SDN) explicitly separates the data plane and control plane by implementing these control plane functions as a separate service, typically in a remote "controller." We'll also cover SDN controllers in Chapter 5. This distinction between data-plane and control-plane functions in the network layer is an important concept to keep in mind as you learn about the network layer---it will help structure your thinking about the network layer and reflects a modern view of the network layer's role in computer networking.

4.1 Overview of Network Layer

Figure 4.1 shows a simple network with two hosts, H1 and H2, and several routers on the path between H1 and H2. Let's suppose that H1 is sending information to H2, and consider the role of the network layer in these hosts and in the intervening routers. The network layer in H1 takes segments from the transport layer in H1, encapsulates each segment into a datagram, and then sends the datagrams to its nearby router, R1. At the receiving host, H2, the network layer receives the datagrams from its nearby router R2, extracts the transport-layer segments, and delivers the segments up to the transport layer at H2. The primary data-plane role of each router is to forward datagrams from its input links to its output links; the primary role of the network control plane is to coordinate these local, per-router forwarding actions so that datagrams are ultimately transferred end-to-end, along paths of routers between source and destination hosts. Note that the routers in Figure 4.1 are shown with a truncated protocol stack, that is, with no upper layers above the network layer, because routers do not run application- and transport-layer protocols such as those we examined in Chapters 2 and 3.
4.1.1 Forwarding and Routing: The Data and Control Planes

The primary role of the network layer is deceptively simple---to move packets from a sending host to a receiving host. To do so, two important network-layer functions can be identified:

Forwarding. When a packet arrives at a router's input link, the router must move the packet to the appropriate output link. For example, a packet arriving from Host H1 to Router R1 in Figure 4.1 must be forwarded to the next router on a path to H2. As we will see, forwarding is but one function (albeit the most common and important one!) implemented in the data plane. In the more general case, which we'll cover in Section 4.4, a packet might also be blocked from exiting a router (e.g., if the packet originated at a known malicious sending host, or if the packet were destined to a forbidden destination host), or might be duplicated and sent over multiple outgoing links.

Figure 4.1 The network layer

Routing. The network layer must determine the route or path taken by packets as they flow from a sender to a receiver. The algorithms that calculate these paths are referred to as routing algorithms. A routing algorithm would determine, for example, the path along which packets flow from H1 to H2 in Figure 4.1. Routing is implemented in the control plane of the network layer.

The terms forwarding and routing are often used interchangeably by authors discussing the network layer. We'll use these terms much more precisely in this book. Forwarding refers to the router-local action of transferring a packet from an input link interface to the appropriate output link interface. Forwarding takes place at very short timescales (typically a few nanoseconds), and thus is typically implemented in hardware. Routing refers to the network-wide process that determines the end-to-end paths that packets take from source to destination. Routing takes place on much longer timescales (typically seconds), and as we will see is often implemented in software. Using our driving analogy, consider the trip from Pennsylvania to Florida undertaken by our traveler back in Section 1.3.1. During this trip, our driver passes through many interchanges en route to Florida. We can think of forwarding as the process of getting through a single interchange: A car enters the interchange from one road and determines which road it should take to leave the interchange. We can think of routing as the process of planning the trip from Pennsylvania to Florida: Before embarking on the trip, the driver has consulted a map and chosen one of many paths possible, with each path consisting of a series of road segments connected at interchanges.

A key element in every network router is its forwarding table. A router forwards a packet by examining the value of one or more fields in the arriving packet's header, and then using these header values to index into its forwarding table. The value stored in the forwarding table entry for those values indicates the outgoing link interface at that router to which that packet is to be forwarded. For example, in Figure 4.2, a packet with header field value of 0110 arrives to a router. The router indexes into its forwarding table and determines that the output link interface for this packet is interface 2. The router then internally forwards the packet to interface 2. In Section 4.2, we'll look inside a router and examine the forwarding function in much greater detail.
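To make the forwarding-table idea concrete, here is a toy sketch. Only the 0110-to-interface-2 mapping comes from the Figure 4.2 example just described; the other entries are invented for illustration, and representing the table as a Python dictionary is purely a teaching device, not how router hardware stores it.

```python
# Toy illustration of forwarding-table lookup (see Figure 4.2).
forwarding_table = {
    '0100': 3,   # invented entry
    '0101': 2,   # invented entry
    '0110': 2,   # from the example in the text
    '0111': 1,   # invented entry
}

def forward(header_bits):
    """Return the output link interface for an arriving packet."""
    return forwarding_table[header_bits]

print(forward('0110'))   # prints 2: the packet is switched to interface 2
```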
Forwarding is the key function performed by the data-plane functionality of the network layer.

Control Plane: The Traditional Approach

But now you are undoubtedly wondering how a router's forwarding tables are configured in the first place. This is a crucial issue, one that exposes the important interplay between forwarding (in the data plane) and routing (in the control plane). As shown

Figure 4.2 Routing algorithms determine values in forwarding tables

in Figure 4.2, the routing algorithm determines the contents of the routers' forwarding tables. In this example, a routing algorithm runs in each and every router and both forwarding and routing functions are contained within a router. As we'll see in Sections 5.3 and 5.4, the routing algorithm function in one router communicates with the routing algorithm function in other routers to compute the values for its forwarding table. How is this communication performed? By exchanging routing messages containing routing information according to a routing protocol! We'll cover routing algorithms and protocols in Sections 5.2 through 5.4. The distinct and different purposes of the forwarding and routing functions can be further illustrated by considering the hypothetical (and unrealistic, but technically feasible) case of a network in which all forwarding tables are configured directly by human network operators physically present at the routers. In this case, no routing protocols would be required! Of course, the human operators would need to interact with each other to ensure that the forwarding tables were configured in such a way that packets reached their intended destinations. It's also likely that human configuration would be more error-prone and much slower to respond to changes in the network topology than a routing protocol. We're thus fortunate that all networks have both a forwarding and a routing function!

Control Plane: The SDN Approach

The approach to implementing routing functionality shown in Figure 4.2---with each router having a routing component that communicates with the routing component of other routers---has been the traditional approach adopted by routing vendors in their products, at least until recently. Our observation that humans could manually configure forwarding tables does suggest, however, that there may be other ways for control-plane functionality to determine the contents of the data-plane forwarding tables. Figure 4.3 shows an alternate approach in which a physically separate (from the routers), remote controller computes and distributes the forwarding tables to be used by each and every router. Note that the data plane components of Figures 4.2 and 4.3 are identical. In Figure 4.3, however, control-plane routing functionality is separated

Figure 4.3 A remote controller determines and distributes values in forwarding tables

from the physical router---the routing device performs forwarding only, while the remote controller computes and distributes forwarding tables. The remote controller might be implemented in a remote data center with high reliability and redundancy, and might be managed by the ISP or some third party. How might the routers and the remote controller communicate? By exchanging messages containing forwarding tables and other pieces of routing information.
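As a thought experiment, the split in Figure 4.3 can be sketched in a few lines of code. Every name here (Controller, install_table, and the tiny made-up tables) is invented for illustration; this is not any real SDN API, just the shape of the idea.

```python
# Hypothetical sketch of Figure 4.3's split: routers only forward;
# a remote controller computes and installs their forwarding tables.

class Router:
    def __init__(self):
        self.forwarding_table = {}     # data-plane state only; no routing code

    def install_table(self, table):
        """Called by the (remote) controller, not by a local routing process."""
        self.forwarding_table = table

    def forward(self, header_value):
        return self.forwarding_table[header_value]

class Controller:
    """Logically centralized control plane: computes every router's table."""
    def distribute(self, routers, topology):
        for name, router in routers.items():
            router.install_table(self.compute_table(name, topology))

    def compute_table(self, name, topology):
        # Placeholder: a real controller would run a path-computation
        # algorithm over its network-wide view (see Section 5.5).
        return topology[name]

routers = {'R1': Router(), 'R2': Router()}
topology = {'R1': {'0110': 2}, 'R2': {'0110': 1}}   # made-up tables
Controller().distribute(routers, topology)
print(routers['R1'].forward('0110'))                # -> 2
```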
The control-plane approach shown in Figure 4.3 is at the heart of software-defined networking (SDN), where the network is "software-defined" because the controller that computes forwarding tables and interacts with routers is implemented in software. Increasingly, these software implementations are also open, i.e., similar to Linux OS code, the code is publicly available, allowing ISPs (and networking researchers and students!) to innovate and propose changes to the software that controls network-layer functionality. We will cover the SDN control plane in Section 5.5.

4.1.2 Network Service Model

Before delving into the network layer's data plane, let's wrap up our introduction by taking the broader view and considering the different types of service that might be offered by the network layer. When the transport layer at a sending host transmits a packet into the network (that is, passes it down to the network layer at the sending host), can the transport layer rely on the network layer to deliver the packet to the destination? When multiple packets are sent, will they be delivered to the transport layer in the receiving host in the order in which they were sent? Will the amount of time between the sending of two sequential packet transmissions be the same as the amount of time between their reception? Will the network provide any feedback about congestion in the network? The answers to these questions and others are determined by the service model provided by the network layer. The network service model defines the characteristics of end-to-end delivery of packets between sending and receiving hosts.

Let's now consider some possible services that the network layer could provide. These services could include:

Guaranteed delivery. This service guarantees that a packet sent by a source host will eventually arrive at the destination host.

Guaranteed delivery with bounded delay. This service not only guarantees delivery of the packet, but delivery within a specified host-to-host delay bound (for example, within 100 msec).

In-order packet delivery. This service guarantees that packets arrive at the destination in the order that they were sent.

Guaranteed minimal bandwidth. This network-layer service emulates the behavior of a transmission link of a specified bit rate (for example, 1 Mbps) between sending and receiving hosts. As long as the sending host transmits bits (as part of packets) at a rate below the specified bit rate, then all packets are eventually delivered to the destination host.

Security. The network layer could encrypt all datagrams at the source and decrypt them at the destination, thereby providing confidentiality to all transport-layer segments.

This is only a partial list of services that a network layer could provide---there are countless variations possible. The Internet's network layer provides a single service, known as best-effort service. With best-effort service, packets are neither guaranteed to be received in the order in which they were sent, nor is their eventual delivery even guaranteed. There is no guarantee on the end-to-end delay nor is there a
For +example, the ATM network architecture \[MFA Forum 2016, Black 1995\] +provides for guaranteed in-order delay, bounded delay, and guaranteed +minimal bandwidth. There have also been proposed service model +extensions to the Internet architecture; for example, the Intserv +architecture \[RFC 1633\] aims to provide end-end delay guarantees and +congestion-free communication. Interestingly, in spite of these +well-developed alternatives, the Internet's basic best-effort service +model combined with adequate bandwidth provisioning have arguably proven +to be more than "good enough" to enable an amazing range of +applications, including streaming video services such as Netflix and +voice-and-video-over-IP, real-time conferencing applications such as +Skype and Facetime. + +An Overview of Chapter 4 Having now provided an overview of the network +layer, we'll cover the data-plane component of the network layer in the +following sections in this chapter. In Section 4.2, we'll dive down into +the internal hardware operations of a router, including input and output +packet processing, the router's internal switching mechanism, and packet +queueing and scheduling. In Section 4.3, we'll take a look at +traditional IP forwarding, in which packets are forwarded to output +ports based on their destination IP addresses. We'll encounter IP +addressing, the celebrated IPv4 and IPv6 protocols and more. In Section +4.4, we'll cover more generalized forwarding, where packets may be +forwarded to output ports based on a large number of header values +(i.e., not only based on destination IP address). Packets may be blocked +or duplicated at the router, or may have certain header field values +rewritten---all under software control. This more generalized form of +packet forwarding is a key component of a modern network data plane, +including the data plane in software-defined networks (SDN). We mention +here in passing that the terms forwarding and switching are often used +interchangeably by computer-networking researchers and practitioners; +we'll use both terms interchangeably in this textbook as well. While +we're on the topic of terminology, it's also worth mentioning two other +terms that are often used interchangeably, but that we will use more +carefully. We'll reserve the term packet switch to mean a general +packet-switching device that transfers a packet from input link +interface to output link interface, according to values in a packet's +header fields. Some packet switches, called link-layer switches +(examined in Chapter 6), base their forwarding decision on values in the +fields of the linklayer frame; switches are thus referred to as +link-layer (layer 2) devices. Other packet switches, called routers, +base their forwarding decision on header field values in the +network-layer datagram. Routers are thus network-layer (layer 3) +devices. (To fully appreciate this important distinction, you might want +to review Section 1.5.2, where we discuss network-layer datagrams and +link-layer frames and their relationship.) Since our focus in this +chapter is on the network layer, we'll mostly use the term router in +place of packet switch. + +4.2 What's Inside a Router? 
Now that we've overviewed the data and +control planes within the network layer, the important distinction +between forwarding and routing, and the services and functions of the +network layer, let's turn our attention to its forwarding function---the +actual transfer of packets from a router's incoming links to the +appropriate outgoing links at that router. A high-level view of a +generic router architecture is shown in Figure 4.4. Four router +components can be identified: + +Figure 4.4 Router architecture + +Input ports. An input port performs several key functions. It performs +the physical layer function of terminating an incoming physical link at +a router; this is shown in the leftmost box of an input port and the +rightmost box of an output port in Figure 4.4. An input port also +performs link-layer functions needed to interoperate with the link layer +at the other side of the incoming link; this is represented by the +middle boxes in the input and output ports. Perhaps most crucially, a +lookup function is also performed at the input port; this will occur in +the rightmost box of the input port. It is here that the forwarding +table is consulted to determine the router output port to which an +arriving packet will be forwarded via the switching fabric. Control +packets (for example, packets carrying routing protocol information) are +forwarded from an input port to the routing processor. Note that the +term "port" here ---referring to the physical input and output router +interfaces---is distinctly different from the software + +ports associated with network applications and sockets discussed in +Chapters 2 and 3. In practice, the number of ports supported by a router +can range from a relatively small number in enterprise routers, to +hundreds of 10 Gbps ports in a router at an ISP's edge, where the number +of incoming lines tends to be the greatest. The Juniper MX2020, edge +router, for example, supports up to 960 10 Gbps Ethernet ports, with an +overall router system capacity of 80 Tbps \[Juniper MX 2020 2016\]. +Switching fabric. The switching fabric connects the router's input ports +to its output ports. This switching fabric is completely contained +within the router---a network inside of a network router! Output ports. +An output port stores packets received from the switching fabric and +transmits these packets on the outgoing link by performing the necessary +link-layer and physical-layer functions. When a link is bidirectional +(that is, carries traffic in both directions), an output port will +typically be paired with the input port for that link on the same line +card. Routing processor. The routing processor performs control-plane +functions. In traditional routers, it executes the routing protocols +(which we'll study in Sections 5.3 and 5.4), maintains routing tables +and attached link state information, and computes the forwarding table +for the router. In SDN routers, the routing processor is responsible for +communicating with the remote controller in order to (among other +activities) receive forwarding table entries computed by the remote +controller, and install these entries in the router's input ports. The +routing processor also performs the network management functions that +we'll study in Section 5.7. A router's input ports, output ports, and +switching fabric are almost always implemented in hardware, as shown in +Figure 4.4. 
To appreciate why a hardware implementation is needed, consider that with a 10 Gbps input link and a 64-byte IP datagram, the input port has only 51.2 ns to process the datagram before another datagram may arrive (a 64-byte datagram is 512 bits, and 512 bits / 10 Gbps = 51.2 ns). If N ports are combined on a line card (as is often done in practice), the datagram-processing pipeline must operate N times faster---far too fast for software implementation. Forwarding hardware can be implemented either using a router vendor's own hardware designs or using purchased merchant-silicon chips (e.g., as sold by companies such as Intel and Broadcom). While the data plane operates at the nanosecond time scale, a router's control functions---executing the routing protocols, responding to attached links that go up or down, communicating with the remote controller (in the SDN case), and performing management functions---operate at the millisecond or second timescale. These control-plane functions are thus usually implemented in software and execute on the routing processor (typically a traditional CPU).

Before delving into the details of router internals, let's return to our analogy from the beginning of this chapter, where packet forwarding was compared to cars entering and leaving an interchange. Let's suppose that the interchange is a roundabout, and that as a car enters the roundabout, a bit of processing is required. Let's consider what information is required for this processing:

Destination-based forwarding. Suppose the car stops at an entry station and indicates its final destination (not at the local roundabout, but the ultimate destination of its journey). An attendant at the entry station looks up the final destination, determines the roundabout exit that leads to that final destination, and tells the driver which roundabout exit to take.

Generalized forwarding. The attendant could also determine the car's exit ramp on the basis of many other factors besides the destination. For example, the selected exit ramp might depend on the car's origin, for example, the state that issued the car's license plate. Cars from a certain set of states might be directed to use one exit ramp (that leads to the destination via a slow road), while cars from other states might be directed to use a different exit ramp (that leads to the destination via a superhighway). The same decision might be made based on the model, make, and year of the car. Or a car not deemed roadworthy might be blocked and not be allowed to pass through the roundabout.

In the case of generalized forwarding, any number of factors may contribute to the attendant's choice of the exit ramp for a given car. Once the car enters the roundabout (which may be filled with other cars entering from other input roads and heading to other roundabout exits), it eventually leaves at the prescribed roundabout exit ramp, where it may encounter other cars leaving the roundabout at that exit. We can easily recognize the principal router components in Figure 4.4 in this analogy---the entry road and entry station correspond to the input port (with a lookup function to determine the local outgoing port); the roundabout corresponds to the switch fabric; and the roundabout exit road corresponds to the output port. With this analogy, it's instructive to consider where bottlenecks might occur. What happens if cars arrive blazingly fast (for example, the roundabout is in Germany or Italy!) but the station attendant is slow?
How fast must the attendant work to ensure there's no backup on an entry road? Even with a blazingly fast attendant, what happens if cars traverse the roundabout slowly---can backups still occur? And what happens if most of the cars entering at all of the roundabout's entrance ramps all want to leave the roundabout at the same exit ramp---can backups occur at the exit ramp or elsewhere? How should the roundabout operate if we want to assign priorities to different cars, or block certain cars from entering the roundabout in the first place? These are all analogous to critical questions faced by router and switch designers. In the following subsections, we'll look at router functions in more detail. \[Iyer 2008; Chao 2001; Chuang 2005; Turner 1988; McKeown 1997a; Partridge 1998; Serpanos 2011\] provide a discussion of specific router architectures. For concreteness and simplicity, we'll initially assume in this section that forwarding decisions are based only on the packet's destination address, rather than on a generalized set of packet header fields. We will cover the case of more generalized packet forwarding in Section 4.4.

4.2.1 Input Port Processing and Destination-Based Forwarding

A more detailed view of input processing is shown in Figure 4.5. As just discussed, the input port's line-termination function and link-layer processing implement the physical and link layers for that individual input link. The lookup performed in the input port is central to the router's operation---it is here that the router uses the forwarding table to look up the output port to which an arriving packet will be forwarded via the switching fabric. The forwarding table is either computed and updated by the routing processor (using a routing protocol to interact with the routing processors in other network routers) or is received from a remote SDN controller. The forwarding table is copied from the routing processor to the line cards over a separate bus (e.g., a PCI bus) indicated by the dashed line from the routing processor to the input line cards in Figure 4.4. With such a shadow copy at each line card, forwarding decisions can be made locally, at each input port, without invoking the centralized routing processor on a per-packet basis and thus avoiding a centralized processing bottleneck.

Let's now consider the "simplest" case, in which the output port to which an incoming packet is to be switched is based on the packet's destination address. In the case of 32-bit IP addresses, a brute-force implementation of the forwarding table would have one entry for every possible destination address. Since there are more than 4 billion possible addresses, this option is totally out of the question.

Figure 4.5 Input port processing

As an example of how this issue of scale can be handled, let's suppose that our router has four links, numbered 0 through 3, and that packets are to be forwarded to the link interfaces as follows:

| Destination Address Range | Link Interface |
| --- | --- |
| 11001000 00010111 00010000 00000000 through 11001000 00010111 00010111 11111111 | 0 |
| 11001000 00010111 00011000 00000000 through 11001000 00010111 00011000 11111111 | 1 |
| 11001000 00010111 00011001 00000000 through 11001000 00010111 00011111 11111111 | 2 |
| Otherwise | 3 |

Clearly, for this example, it is not necessary to have 4 billion entries in the router's forwarding table.
We could, for example, have the following forwarding table with just four entries:

| Prefix | Link Interface |
| --- | --- |
| 11001000 00010111 00010 | 0 |
| 11001000 00010111 00011000 | 1 |
| 11001000 00010111 00011 | 2 |
| Otherwise | 3 |

With this style of forwarding table, the router matches a prefix of the packet's destination address with the entries in the table; if there's a match, the router forwards the packet to a link associated with the match. For example, suppose the packet's destination address is 11001000 00010111 00010110 10100001; because the 21-bit prefix of this address matches the first entry in the table, the router forwards the packet to link interface 0. If a prefix doesn't match any of the first three entries, then the router forwards the packet to the default interface 3. Although this sounds simple enough, there's a very important subtlety here. You may have noticed that it is possible for a destination address to match more than one entry. For example, the first 24 bits of the address 11001000 00010111 00011000 10101010 match the second entry in the table, and the first 21 bits of the address match the third entry in the table. When there are multiple matches, the router uses the longest prefix matching rule; that is, it finds the longest matching entry in the table and forwards the packet to the link interface associated with the longest prefix match. We'll see exactly why this longest prefix matching rule is used when we study Internet addressing in more detail in Section 4.3.

Given the existence of a forwarding table, lookup is conceptually simple---hardware logic just searches through the forwarding table looking for the longest prefix match. But at gigabit transmission rates, this lookup must be performed in nanoseconds (recall our earlier example of a 10 Gbps link and a 64-byte IP datagram). Thus, not only must lookup be performed in hardware, but techniques beyond a simple linear search through a large table are needed; surveys of fast lookup algorithms can be found in \[Gupta 2001, Ruiz-Sanchez 2001\]. Special attention must also be paid to memory access times, resulting in designs with embedded on-chip DRAM and faster SRAM (used as a DRAM cache) memories. In practice, Ternary Content Addressable Memories (TCAMs) are also often used for lookup \[Yu 2004\]. With a TCAM, a 32-bit IP address is presented to the memory, which returns the content of the forwarding table entry for that address in essentially constant time. The Cisco Catalyst 6500 and 7600 Series routers and switches can hold upwards of a million TCAM forwarding table entries \[Cisco TCAM 2014\]. Once a packet's output port has been determined via the lookup, the packet can be sent into the switching fabric. In some designs, a packet may be temporarily blocked from entering the switching fabric if packets from other input ports are currently using the fabric. A blocked packet will be queued at the input port and then scheduled to cross the fabric at a later point in time. We'll take a closer look at the blocking, queuing, and scheduling of packets (at both input ports and output ports) shortly.
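Before moving on, let's make the longest-prefix-match rule concrete. The following minimal Python sketch (our own illustration, not from the book) encodes the four-entry table above and scans it linearly; the table layout and helper names are hypothetical, and a real router would use a TCAM or an optimized trie structure rather than a linear scan:

```python
# Longest-prefix-match sketch over the four-entry forwarding table above.
FORWARDING_TABLE = [
    ("110010000001011100010", 0),      # 11001000 00010111 00010
    ("110010000001011100011000", 1),   # 11001000 00010111 00011000
    ("110010000001011100011", 2),      # 11001000 00010111 00011
]
DEFAULT_INTERFACE = 3                  # the "Otherwise" entry

def to_bits(dotted: str) -> str:
    """Convert a dotted-decimal IPv4 address to a 32-bit string."""
    return "".join(f"{int(byte):08b}" for byte in dotted.split("."))

def lookup(dest: str) -> int:
    """Return the output link interface for dest via longest prefix match."""
    bits = to_bits(dest)
    best_len, best_iface = -1, DEFAULT_INTERFACE
    for prefix, iface in FORWARDING_TABLE:
        if bits.startswith(prefix) and len(prefix) > best_len:
            best_len, best_iface = len(prefix), iface
    return best_iface

# 11001000 00010111 00011000 10101010 (i.e., 200.23.24.170) matches both
# the 24-bit entry (interface 1) and the 21-bit entry (interface 2);
# the longest match wins:
print(lookup("200.23.24.170"))   # -> 1
```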
Although "lookup" is arguably the most important action in input port processing, many other actions must be taken: (1) physical- and link-layer processing must occur, as discussed previously; (2) the packet's version number, checksum, and time-to-live field---all of which we'll study in Section 4.3---must be checked and the latter two fields rewritten; and (3) counters used for network management (such as the number of IP datagrams received) must be updated.

Let's close our discussion of input port processing by noting that the input port steps of looking up a destination IP address ("match") and then sending the packet into the switching fabric to the specified output port ("action") are a specific case of a more general "match plus action" abstraction that is performed in many networked devices, not just routers. In link-layer switches (covered in Chapter 6), link-layer destination addresses are looked up and several actions may be taken in addition to sending the frame into the switching fabric towards the output port. In firewalls (covered in Chapter 8)---devices that filter out selected incoming packets---an incoming packet whose header matches given criteria (e.g., a combination of source/destination IP addresses and transport-layer port numbers) may be dropped (action). In a network address translator (NAT, covered in Section 4.3), an incoming packet whose transport-layer port number matches a given value will have its port number rewritten before forwarding (action). Indeed, the "match plus action" abstraction is both powerful and prevalent in network devices today, and is central to the notion of generalized forwarding that we'll study in Section 4.4.

4.2.2 Switching

The switching fabric is at the very heart of a router, as it is through this fabric that the packets are actually switched (that is, forwarded) from an input port to an output port. Switching can be accomplished in a number of ways, as shown in Figure 4.6:

Figure 4.6 Three switching techniques

Switching via memory. The simplest, earliest routers were traditional computers, with switching between input and output ports being done under direct control of the CPU (routing processor). Input and output ports functioned as traditional I/O devices in a traditional operating system. An input port with an arriving packet first signaled the routing processor via an interrupt. The packet was then copied from the input port into processor memory. The routing processor then extracted the destination address from the header, looked up the appropriate output port in the forwarding table, and copied the packet to the output port's buffers. In this scenario, if the memory bandwidth is such that a maximum of B packets per second can be written into, or read from, memory, then the overall forwarding throughput (the total rate at which packets are transferred from input ports to output ports) must be less than B/2, since each forwarded packet requires two memory operations: one write into memory and one read out of it. Note also that two packets cannot be forwarded at the same time, even if they have different destination ports, since only one memory read/write can be done at a time over the shared system bus. Some modern routers switch via memory. A major difference from early routers, however, is that the lookup of the destination address and the storing of the packet into the appropriate memory location are performed by processing on the input line cards.
In some ways, routers that switch via memory look very much like shared-memory multiprocessors, with the processing on a line card switching (writing) packets into the memory of the appropriate output port. Cisco's Catalyst 8500 series switches \[Cisco 8500 2016\] internally switch packets via a shared memory.

Switching via a bus. In this approach, an input port transfers a packet directly to the output port over a shared bus, without intervention by the routing processor. This is typically done by having the input port pre-pend a switch-internal label (header) to the packet indicating the local output port to which this packet is being transferred and transmitting the packet onto the bus. All output ports receive the packet, but only the port that matches the label will keep the packet. The label is then removed at the output port, as this label is only used within the switch to cross the bus. If multiple packets arrive at the router at the same time, each at a different input port, all but one must wait since only one packet can cross the bus at a time. Because every packet must cross the single bus, the switching speed of the router is limited to the bus speed; in our roundabout analogy, this is as if the roundabout could only contain one car at a time. Nonetheless, switching via a bus is often sufficient for routers that operate in small local area and enterprise networks. The Cisco 6500 router \[Cisco 6500 2016\] internally switches packets over a 32-Gbps backplane bus.

Switching via an interconnection network. One way to overcome the bandwidth limitation of a single, shared bus is to use a more sophisticated interconnection network, such as those that have been used in the past to interconnect processors in a multiprocessor computer architecture. A crossbar switch is an interconnection network consisting of 2N buses that connect N input ports to N output ports, as shown in Figure 4.6. Each vertical bus intersects each horizontal bus at a crosspoint, which can be opened or closed at any time by the switch fabric controller (whose logic is part of the switching fabric itself). When a packet arrives from port A and needs to be forwarded to port Y, the switch controller closes the crosspoint at the intersection of buses A and Y, and port A then sends the packet onto its bus, which is picked up (only) by bus Y. Note that a packet from port B can be forwarded to port X at the same time, since the A-to-Y and B-to-X packets use different input and output buses. Thus, unlike the previous two switching approaches, crossbar switches are capable of forwarding multiple packets in parallel. A crossbar switch is non-blocking---a packet being forwarded to an output port will not be blocked from reaching that output port as long as no other packet is currently being forwarded to that output port. However, if two packets from two different input ports are destined to that same output port, then one will have to wait at the input, since only one packet can be sent over any given bus at a time. Cisco 12000 series switches \[Cisco 12000 2016\] use a crossbar switching network; the Cisco 7600 series can be configured to use either a bus or a crossbar switch \[Cisco 7600 2016\]. More sophisticated interconnection networks use multiple stages of switching elements to allow packets from different input ports to proceed towards the same output port at the same time through the multi-stage switching fabric.
See \[Tobagi 1990\] for a survey of switch architectures. The Cisco CRS employs a three-stage non-blocking switching strategy. A router's switching capacity can also be scaled by running multiple switching fabrics in parallel. In this approach, input ports and output ports are connected to N switching fabrics that operate in parallel. An input port breaks a packet into K smaller chunks, and sends ("sprays") the chunks through K of these N switching fabrics to the selected output port, which reassembles the K chunks back into the original packet.

4.2.3 Output Port Processing

Output port processing, shown in Figure 4.7, takes packets that have been stored in the output port's memory and transmits them over the output link. This includes selecting and de-queueing packets for transmission, and performing the needed link-layer and physical-layer transmission functions.

4.2.4 Where Does Queuing Occur?

If we consider input and output port functionality and the configurations shown in Figure 4.6, it's clear that packet queues may form at both the input ports and the output ports, just as we identified cases where cars may wait at the inputs and outputs of the traffic intersection in our roundabout analogy. The location and extent of queueing (either at the input port queues or the output port queues) will depend on the traffic load, the relative speed of the switching fabric, and the line speed. Let's now consider these queues in a bit more detail, since as these queues grow large, the router's memory can eventually be exhausted and packet loss will occur when no memory is available to store arriving packets. Recall that in our earlier discussions, we said that packets were "lost within the network" or "dropped at a router." It is here, at these queues within a router, where such packets are actually dropped and lost.

Figure 4.7 Output port processing

Suppose that all of the input and output links have an identical transmission rate of Rline packets per second, and that there are N input ports and N output ports. To further simplify the discussion, let's assume that all packets have the same fixed length, and that packets arrive at input ports in a synchronous manner. That is, the time to send a packet on any link is equal to the time to receive a packet on any link, and during such an interval of time, either zero or one packet can arrive on an input link. Define the switching fabric transfer rate Rswitch as the rate at which packets can be moved from input port to output port. If Rswitch is N times faster than Rline, then only negligible queuing will occur at the input ports. This is because even in the worst case, where all N input lines are receiving packets, and all packets are to be forwarded to the same output port, each batch of N packets (one packet per input port) can be cleared through the switch fabric before the next batch arrives.

Input Queueing

But what happens if the switch fabric is not fast enough (relative to the input line speeds) to transfer all arriving packets through the fabric without delay? In this case, packet queuing can also occur at the input ports, as packets must join input port queues to wait their turn to be transferred through the switching fabric to the output port.
To illustrate an important consequence of this queuing, consider a crossbar switching fabric and suppose that (1) all link speeds are identical, (2) one packet can be transferred from any one input port to a given output port in the same amount of time it takes for a packet to be received on an input link, and (3) packets are moved from a given input queue to their desired output queue in an FCFS manner. Multiple packets can be transferred in parallel, as long as their output ports are different. However, if two packets at the front of two input queues are destined for the same output queue, then one of the packets will be blocked and must wait at the input queue---the switching fabric can transfer only one packet to a given output port at a time. Figure 4.8 shows an example in which two packets (darkly shaded) at the front of their input queues are destined for the same upper-right output port. Suppose that the switch fabric chooses to transfer the packet from the front of the upper-left queue. In this case, the darkly shaded packet in the lower-left queue must wait. But not only must this darkly shaded packet wait, so too must the lightly shaded packet that is queued behind that packet in the lower-left queue, even though there is no contention for the middle-right output port (the destination for the lightly shaded packet). This phenomenon is known as head-of-the-line (HOL) blocking in an input-queued switch---a queued packet in an input queue must wait for transfer through the fabric (even though its output port is free) because it is blocked by another packet at the head of the line. \[Karol 1987\] shows that due to HOL blocking, the input queue will grow to unbounded length (informally, this is equivalent to saying that significant packet loss will occur) under certain assumptions as soon as the packet arrival rate on the input links reaches only 58 percent of their capacity. A number of solutions to HOL blocking are discussed in \[McKeown 1997\].

Figure 4.8 HOL blocking at an input-queued switch
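The 58 percent figure can be checked with a short simulation. The Python sketch below (our own, under assumptions in the spirit of the analysis in \[Karol 1987\]: fixed-length packets, saturated FIFO input queues, and each packet equally likely to be destined to any output port) transfers at most one head-of-line packet per output port per time slot and measures the resulting throughput; for a moderately large switch the estimate lands close to the 2 − √2 ≈ 0.586 limit:

```python
import random

def hol_throughput(n_ports: int = 32, n_slots: int = 20000, seed: int = 1) -> float:
    """Estimate the saturation throughput of an input-queued crossbar
    with FIFO input queues and uniformly random output ports."""
    rng = random.Random(seed)
    # Every input queue is saturated; hol[i] is the output port of the
    # packet currently at the head of input queue i.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        contenders = {}                       # output port -> contending inputs
        for i, out in enumerate(hol):
            contenders.setdefault(out, []).append(i)
        for out, inputs in contenders.items():
            winner = rng.choice(inputs)       # fabric serves one packet per output
            hol[winner] = rng.randrange(n_ports)  # next queued packet moves up
            delivered += 1                    # losers stay blocked at the HOL
    return delivered / (n_ports * n_slots)

print(round(hol_throughput(), 3))   # roughly 0.59 for a 32-port switch
```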
Output Queueing

Let's next consider whether queueing can occur at a switch's output ports. Suppose that Rswitch is again N times faster than Rline and that packets arriving at each of the N input ports are destined to the same output port. In this case, in the time it takes to send a single packet onto the outgoing link, N new packets will arrive at this output port (one from each of the N input ports). Since the output port can transmit only a single packet in a unit of time (the packet transmission time), the N arriving packets will have to queue (wait) for transmission over the outgoing link. Then N more packets can possibly arrive in the time it takes to transmit just one of the N packets that had just previously been queued. And so on. Thus, packet queues can form at the output ports even when the switching fabric is N times faster than the port line speeds. Eventually, the number of queued packets can grow large enough to exhaust available memory at the output port.

Figure 4.9 Output port queueing

When there is not enough memory to buffer an incoming packet, a decision must be made to either drop the arriving packet (a policy known as drop-tail) or remove one or more already-queued packets to make room for the newly arrived packet. In some cases, it may be advantageous to drop (or mark the header of) a packet before the buffer is full in order to provide a congestion signal to the sender. A number of proactive packet-dropping and -marking policies (which collectively have become known as active queue management (AQM) algorithms) have been proposed and analyzed \[Labrador 1999, Hollot 2002\]. One of the most widely studied and implemented AQM algorithms is the Random Early Detection (RED) algorithm \[Christiansen 2001; Floyd 2016\].

Output port queuing is illustrated in Figure 4.9. At time t, a packet has arrived at each of the incoming input ports, each destined for the uppermost outgoing port. Assuming identical line speeds and a switch operating at three times the line speed, one time unit later (that is, in the time needed to receive or send a packet), all three original packets have been transferred to the outgoing port and are queued awaiting transmission. In the next time unit, one of these three packets will have been transmitted over the outgoing link. In our example, two new packets have arrived at the incoming side of the switch; one of these packets is destined for this uppermost output port. A consequence of such queuing is that a packet scheduler at the output port must choose one packet, among those queued, for transmission---a topic we'll cover in the following section.

Given that router buffers are needed to absorb the fluctuations in traffic load, a natural question to ask is how much buffering is required. For many years, the rule of thumb \[RFC 3439\] for buffer sizing was that the amount of buffering (B) should be equal to an average round-trip time (RTT, say 250 msec) times the link capacity (C). This result is based on an analysis of the queueing dynamics of a relatively small number of TCP flows \[Villamizar 1994\]. Thus, a 10 Gbps link with an RTT of 250 msec would need an amount of buffering equal to B = RTT · C = 2.5 Gbits of buffers. More recent theoretical and experimental efforts \[Appenzeller 2004\], however, suggest that when there are a large number of TCP flows (N) passing through a link, the amount of buffering needed is B = RTT · C/√N. With a large number of flows typically passing through large backbone router links (see, e.g., \[Fraleigh 2003\]), the value of N can be large, with the decrease in needed buffer size becoming quite significant. \[Appenzeller 2004; Wischik 2005; Beheshti 2008\] provide very readable discussions of the buffer-sizing problem from a theoretical, implementation, and operational standpoint.
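The two sizing rules are easy to compare numerically. The following sketch (our own illustration) reproduces the 2.5 Gbit figure from the text and shows how sharply the required buffering drops once many flows share the link; the flow count of 10,000 is a hypothetical value for a busy backbone link:

```python
from math import sqrt

def buffer_rtt_c(rtt_s: float, capacity_bps: float) -> float:
    """Classic rule of thumb [RFC 3439; Villamizar 1994]: B = RTT * C, in bits."""
    return rtt_s * capacity_bps

def buffer_sqrt_n(rtt_s: float, capacity_bps: float, n_flows: int) -> float:
    """Sizing for N long-lived TCP flows [Appenzeller 2004]: B = RTT * C / sqrt(N)."""
    return rtt_s * capacity_bps / sqrt(n_flows)

rtt, link = 0.250, 10e9                        # 250 msec RTT, 10 Gbps link
print(buffer_rtt_c(rtt, link) / 1e9)           # 2.5 Gbits, as in the text
print(buffer_sqrt_n(rtt, link, 10000) / 1e9)   # 0.025 Gbits with N = 10,000 flows
```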
4.2.5 Packet Scheduling

Let's now return to the question of determining the order in which queued packets are transmitted over an outgoing link. Since you yourself have undoubtedly had to wait in long lines on many occasions and observed how waiting customers are served, you're no doubt familiar with many of the queueing disciplines commonly used in routers. There is first-come-first-served (FCFS, also known as first-in-first-out, FIFO). The British are famous for patient and orderly FCFS queueing at bus stops and in the marketplace ("Oh, are you queueing?"). Other countries operate on a priority basis, with one class of waiting customers given priority service over other waiting customers. There is also round-robin queueing, where customers are again divided into classes (as in priority queueing) but each class of customer is given service in turn.

First-in-First-Out (FIFO)

Figure 4.10 shows the queuing model abstraction for the FIFO link-scheduling discipline. Packets arriving at the link output queue wait for transmission if the link is currently busy transmitting another packet. If there is not sufficient buffering space to hold the arriving packet, the queue's packet-discarding policy then determines whether the packet will be dropped (lost) or whether other packets will be removed from the queue to make space for the arriving packet, as discussed above. In our discussion below, we'll ignore packet discard. When a packet is completely transmitted over the outgoing link (that is, receives service) it is removed from the queue. The FIFO (also known as first-come-first-served, or FCFS) scheduling discipline selects packets for link transmission in the same order in which they arrived at the output link queue. We're all familiar with FIFO queuing from service centers, where arriving customers join the back of the single waiting line, remain in order, and are then served when they reach the front of the line.

Figure 4.10 FIFO queueing abstraction

Figure 4.11 shows the FIFO queue in operation. Packet arrivals are indicated by numbered arrows above the upper timeline, with the number indicating the order in which the packet arrived. Individual packet departures are shown below the lower timeline. The time that a packet spends in service (being transmitted) is indicated by the shaded rectangle between the two timelines. In our examples here, let's assume that each packet takes three units of time to be transmitted. Under the FIFO discipline, packets leave in the same order in which they arrived. Note that after the departure of packet 4, the link remains idle (since packets 1 through 4 have been transmitted and removed from the queue) until the arrival of packet 5.

Figure 4.11 The FIFO queue in operation

Priority Queuing

Under priority queuing, packets arriving at the output link are classified into priority classes upon arrival at the queue, as shown in Figure 4.12. In practice, a network operator may configure a queue so that packets carrying network management information (e.g., as indicated by the source or destination TCP/UDP port number) receive priority over user traffic; additionally, real-time voice-over-IP packets might receive priority over non-real-time traffic such as SMTP or IMAP e-mail packets.

Figure 4.12 The priority queueing model

Each priority class typically has its own queue. When choosing a packet to transmit, the priority queuing discipline will transmit a packet from the highest priority class that has a nonempty queue (that is, has packets waiting for transmission). The choice among packets in the same priority class is typically done in a FIFO manner. Figure 4.13 illustrates the operation of a priority queue with two priority classes. Packets 1, 3, and 4 belong to the high-priority class, and packets 2 and 5 belong to the low-priority class. Packet 1 arrives and, finding the link idle, begins transmission. During the transmission of packet 1, packets 2 and 3 arrive and are queued in the low- and high-priority queues, respectively. After the transmission of packet 1, packet 3 (a high-priority packet) is selected for transmission over packet 2 (which, even though it arrived earlier, is a low-priority packet). At the end of the transmission of packet 3, packet 2 then begins transmission. Packet 4 (a high-priority packet) arrives during the transmission of packet 2 (a low-priority packet).
Under a non-preemptive priority queuing discipline, the transmission of a packet is not interrupted once it has begun. In this case, packet 4 queues for transmission and begins being transmitted after the transmission of packet 2 is completed.

Figure 4.13 The priority queue in operation

Round Robin and Weighted Fair Queuing (WFQ)

Under the round robin queuing discipline, packets are sorted into classes as with priority queuing. However, rather than there being a strict service priority among classes, a round robin scheduler alternates service among the classes. In the simplest form of round robin scheduling, a class 1 packet is transmitted, followed by a class 2 packet, followed by a class 1 packet, followed by a class 2 packet, and so on. A so-called work-conserving queuing discipline will never allow the link to remain idle whenever there are packets (of any class) queued for transmission. A work-conserving round robin discipline that looks for a packet of a given class but finds none will immediately check the next class in the round robin sequence. Figure 4.14 illustrates the operation of a two-class round robin queue. In this example, packets 1, 2, and 4 belong to class 1, and packets 3 and 5 belong to the second class. Packet 1 begins transmission immediately upon arrival at the output queue. Packets 2 and 3 arrive during the transmission of packet 1 and thus queue for transmission. After the transmission of packet 1, the link scheduler looks for a class 2 packet and thus transmits packet 3. After the transmission of packet 3, the scheduler looks for a class 1 packet and thus transmits packet 2. After the transmission of packet 2, packet 4 is the only queued packet; it is thus transmitted immediately after packet 2.

Figure 4.14 The two-class round robin queue in operation

A generalized form of round robin queuing that has been widely implemented in routers is the so-called weighted fair queuing (WFQ) discipline \[Demers 1990; Parekh 1993; Cisco QoS 2016\]. WFQ is illustrated in Figure 4.15. Here, arriving packets are classified and queued in the appropriate per-class waiting area. As in round robin scheduling, a WFQ scheduler will serve classes in a circular manner---first serving class 1, then serving class 2, then serving class 3, and then (assuming there are three classes) repeating the service pattern. WFQ is also a work-conserving queuing discipline and thus will immediately move on to the next class in the service sequence when it finds an empty class queue.

Figure 4.15 Weighted fair queueing

WFQ differs from round robin in that each class may receive a differential amount of service in any interval of time. Specifically, each class, i, is assigned a weight, wi. Under WFQ, during any interval of time during which there are class i packets to send, class i will then be guaranteed to receive a fraction of service equal to wi/(∑wj), where the sum in the denominator is taken over all classes that also have packets queued for transmission. In the worst case, even if all classes have queued packets, class i will still be guaranteed to receive a fraction wi/(∑wj) of the bandwidth, where in this worst case the sum in the denominator is over all classes. Thus, for a link with transmission rate R, class i will always achieve a throughput of at least R⋅wi/(∑wj).
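The guaranteed rates follow directly from this formula. Here is a small sketch (our own, with hypothetical class names and weights) that computes the worst-case per-class throughput for whichever classes are currently backlogged:

```python
def wfq_guaranteed_rates(link_rate_bps, weights, backlogged):
    """Each backlogged class i is guaranteed at least
    link_rate * w_i / sum(w_j), summed over the backlogged classes j."""
    total = sum(weights[c] for c in backlogged)
    return {c: link_rate_bps * weights[c] / total for c in backlogged}

weights = {"voice": 3, "video": 2, "data": 1}      # hypothetical weights

# All three classes backlogged on a 1 Gbps link:
print(wfq_guaranteed_rates(1e9, weights, {"voice", "video", "data"}))
# voice 500 Mbps, video ~333 Mbps, data ~167 Mbps

# If only voice and data are backlogged, they split the link 3:1 instead:
print(wfq_guaranteed_rates(1e9, weights, {"voice", "data"}))
# voice 750 Mbps, data 250 Mbps
```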
Our description of WFQ has been idealized, as we have not considered the fact that packets are discrete and a packet's transmission will not be interrupted to begin transmission of another packet; \[Demers 1990; Parekh 1993\] discuss this packetization issue.

4.3 The Internet Protocol (IP): IPv4, Addressing, IPv6, and More

Our study of the network layer thus far in Chapter 4---the notion of the data and control plane components of the network layer, our distinction between forwarding and routing, the identification of various network service models, and our look inside a router---has often been without reference to any specific computer network architecture or protocol. In this section we'll focus on key aspects of the network layer on today's Internet and the celebrated Internet Protocol (IP). There are two versions of IP in use today. We'll first examine the widely deployed IP protocol version 4, which is usually referred to simply as IPv4 \[RFC 791\], in Section 4.3.1. We'll examine IP version 6 \[RFC 2460; RFC 4291\], which has been proposed to replace IPv4, in Section 4.3.5. In between, we'll primarily cover Internet addressing---a topic that might seem rather dry and detail-oriented but we'll see is crucial to understanding how the Internet's network layer works. To master IP addressing is to master the Internet's network layer itself!

4.3.1 IPv4 Datagram Format

Recall that the Internet's network-layer packet is referred to as a datagram. We begin our study of IP with an overview of the syntax and semantics of the IPv4 datagram. You might be thinking that nothing could be drier than the syntax and semantics of a packet's bits. Nevertheless, the datagram plays a central role in the Internet---every networking student and professional needs to see it, absorb it, and master it. (And just to see that protocol headers can indeed be fun to study, check out \[Pomeranz 2010\].) The IPv4 datagram format is shown in Figure 4.16.

Figure 4.16 IPv4 datagram format

The key fields in the IPv4 datagram are the following:

Version number. These 4 bits specify the IP protocol version of the datagram. By looking at the version number, the router can determine how to interpret the remainder of the IP datagram. Different versions of IP use different datagram formats. The datagram format for IPv4 is shown in Figure 4.16. The datagram format for the new version of IP (IPv6) is discussed in Section 4.3.5.

Header length. Because an IPv4 datagram can contain a variable number of options (which are included in the IPv4 datagram header), these 4 bits are needed to determine where in the IP datagram the payload (e.g., the transport-layer segment being encapsulated in this datagram) actually begins. Most IP datagrams do not contain options, so the typical IP datagram has a 20-byte header.

Type of service. The type of service (TOS) bits were included in the IPv4 header to allow different types of IP datagrams to be distinguished from each other. For example, it might be useful to distinguish real-time datagrams (such as those used by an IP telephony application) from non-real-time traffic (for example, FTP). The specific level of service to be provided is a policy issue determined and configured by the network administrator for that router. We also learned in Section 3.7.2 that two of the TOS bits are used for Explicit Congestion Notification.

Datagram length. This is the total length of the IP datagram (header plus data), measured in bytes.
Since this field is 16 bits long, the theoretical maximum size of the IP datagram is 65,535 bytes. However, datagrams are rarely larger than 1,500 bytes, which allows an IP datagram to fit in the payload field of a maximally sized Ethernet frame.

Identifier, flags, fragmentation offset. These three fields have to do with so-called IP fragmentation, a topic we will consider shortly. Interestingly, the new version of IP, IPv6, does not allow for fragmentation.

Time-to-live. The time-to-live (TTL) field is included to ensure that datagrams do not circulate forever (due to, for example, a long-lived routing loop) in the network. This field is decremented by one each time the datagram is processed by a router. If the TTL field reaches 0, a router must drop that datagram.

Protocol. This field is typically used only when an IP datagram reaches its final destination. The value of this field indicates the specific transport-layer protocol to which the data portion of this IP datagram should be passed. For example, a value of 6 indicates that the data portion is passed to TCP, while a value of 17 indicates that the data is passed to UDP. For a list of all possible values, see \[IANA Protocol Numbers 2016\]. Note that the protocol number in the IP datagram has a role that is analogous to the role of the port number field in the transport-layer segment. The protocol number is the glue that binds the network and transport layers together, whereas the port number is the glue that binds the transport and application layers together. We'll see in Chapter 6 that the link-layer frame also has a special field that binds the link layer to the network layer.

Header checksum. The header checksum aids a router in detecting bit errors in a received IP datagram. The header checksum is computed by treating each 2 bytes in the header as a number and summing these numbers using 1s complement arithmetic. As discussed in Section 3.3, the 1s complement of this sum, known as the Internet checksum, is stored in the checksum field. A router computes the header checksum for each received IP datagram and detects an error condition if the checksum carried in the datagram header does not equal the computed checksum. Routers typically discard datagrams for which an error has been detected. Note that the checksum must be recomputed and stored again at each router, since the TTL field, and possibly the options field as well, will change. An interesting discussion of fast algorithms for computing the Internet checksum is \[RFC 1071\]. (A short code sketch of this computation appears just after this field list.) A question often asked at this point is, why does TCP/IP perform error checking at both the transport and network layers? There are several reasons for this repetition. First, note that only the IP header is checksummed at the IP layer, while the TCP/UDP checksum is computed over the entire TCP/UDP segment. Second, TCP/UDP and IP do not necessarily both have to belong to the same protocol stack. TCP can, in principle, run over a different network-layer protocol (for example, ATM \[Black 1995\]) and IP can carry data that will not be passed to TCP/UDP.

Source and destination IP addresses. When a source creates a datagram, it inserts its IP address into the source IP address field and inserts the address of the ultimate destination into the destination IP address field. Often the source host determines the destination address via a DNS lookup, as discussed in Chapter 2. We'll discuss IP addressing in detail in Section 4.3.3.
Options. The options fields allow an IP header to be extended. Header options were meant to be used rarely---hence the decision to save overhead by not including the information in options fields in every datagram header. However, the mere existence of options does complicate matters---since datagram headers can be of variable length, one cannot determine a priori where the data field will start. Also, since some datagrams may require options processing and others may not, the amount of time needed to process an IP datagram at a router can vary greatly. These considerations become particularly important for IP processing in high-performance routers and hosts. For these reasons and others, IP options were not included in the IPv6 header, as discussed in Section 4.3.5.

Data (payload). Finally, we come to the last and most important field---the raison d'être for the datagram in the first place! In most circumstances, the data field of the IP datagram contains the transport-layer segment (TCP or UDP) to be delivered to the destination. However, the data field can carry other types of data, such as ICMP messages (discussed in Section 5.6).

Note that an IP datagram has a total of 20 bytes of header (assuming no options). If the datagram carries a TCP segment, then each (non-fragmented) datagram carries a total of 40 bytes of header (20 bytes of IP header plus 20 bytes of TCP header) along with the application-layer message.
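To make the header-checksum computation concrete, here is a minimal sketch (our own illustration, following the algorithm described above and the approach of \[RFC 1071\]) that sums a 20-byte header as 16-bit words in 1s complement arithmetic and returns the complement of the sum. The header bytes shown are hypothetical, with the checksum field zeroed before computing:

```python
def ipv4_header_checksum(header: bytes) -> int:
    """1s complement sum of the header's 16-bit words, then complemented."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

# A hypothetical 20-byte header: version/IHL 0x45, length 40, TTL 64,
# protocol 6 (TCP), source 193.32.216.9, destination 223.1.1.1, with
# the checksum field (bytes 10-11) set to zero for the computation.
hdr = bytes([0x45, 0x00, 0x00, 0x28, 0x1C, 0x46, 0x40, 0x00, 0x40, 0x06,
             0x00, 0x00, 193, 32, 216, 9, 223, 1, 1, 1])
print(hex(ipv4_header_checksum(hdr)))
```

A receiver can repeat the same sum over the header as received, this time with the checksum field included; a complemented result of 0 indicates that no error was detected.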
4.3.2 IPv4 Datagram Fragmentation

We'll see in Chapter 6 that not all link-layer protocols can carry network-layer packets of the same size. Some protocols can carry big datagrams, whereas other protocols can carry only little datagrams. For example, Ethernet frames can carry up to 1,500 bytes of data, whereas frames for some wide-area links can carry no more than 576 bytes. The maximum amount of data that a link-layer frame can carry is called the maximum transmission unit (MTU). Because each IP datagram is encapsulated within the link-layer frame for transport from one router to the next router, the MTU of the link-layer protocol places a hard limit on the length of an IP datagram. Having a hard limit on the size of an IP datagram is not much of a problem. What is a problem is that each of the links along the route between sender and destination can use different link-layer protocols, and each of these protocols can have different MTUs. To understand the forwarding issue better, imagine that you are a router that interconnects several links, each running different link-layer protocols with different MTUs. Suppose you receive an IP datagram from one link. You check your forwarding table to determine the outgoing link, and this outgoing link has an MTU that is smaller than the length of the IP datagram. Time to panic---how are you going to squeeze this oversized IP datagram into the payload field of the link-layer frame? The solution is to fragment the payload in the IP datagram into two or more smaller IP datagrams, encapsulate each of these smaller IP datagrams in a separate link-layer frame, and send these frames over the outgoing link. Each of these smaller datagrams is referred to as a fragment. Fragments need to be reassembled before they reach the transport layer at the destination. Indeed, both TCP and UDP are expecting to receive complete, unfragmented segments from the network layer. The designers of IPv4 felt that reassembling datagrams in the routers would introduce significant complication into the protocol and put a damper on router performance. (If you were a router, would you want to be reassembling fragments on top of everything else you had to do?) Sticking to the principle of keeping the network core simple, the designers of IPv4 decided to put the job of datagram reassembly in the end systems rather than in network routers.

When a destination host receives a series of datagrams from the same source, it needs to determine whether any of these datagrams are fragments of some original, larger datagram. If some datagrams are fragments, it must further determine when it has received the last fragment and how the fragments it has received should be pieced back together to form the original datagram. To allow the destination host to perform these reassembly tasks, the designers of IP (version 4) put identification, flag, and fragmentation offset fields in the IP datagram header. When a datagram is created, the sending host stamps the datagram with an identification number as well as source and destination addresses. Typically, the sending host increments the identification number for each datagram it sends. When a router needs to fragment a datagram, each resulting datagram (that is, fragment) is stamped with the source address, destination address, and identification number of the original datagram. When the destination receives a series of datagrams from the same sending host, it can examine the identification numbers of the datagrams to determine which of the datagrams are actually fragments of the same larger datagram. Because IP is an unreliable service, one or more of the fragments may never arrive at the destination. For this reason, in order for the destination host to be absolutely sure it has received the last fragment of the original datagram, the last fragment has a flag bit set to 0, whereas all the other fragments have this flag bit set to 1. Also, in order for the destination host to determine whether a fragment is missing (and also to be able to reassemble the fragments in their proper order), the offset field is used to specify where the fragment fits within the original IP datagram.

Figure 4.17 IP fragmentation and reassembly

Figure 4.17 illustrates an example. A datagram of 4,000 bytes (20 bytes of IP header plus 3,980 bytes of IP payload) arrives at a router and must be forwarded to a link with an MTU of 1,500 bytes. This implies that the 3,980 data bytes in the original datagram must be allocated to three separate fragments (each of which is also an IP datagram). The online material for this book and the problems at the end of this chapter will allow you to explore fragmentation in more detail. Also, on this book's Web site, we provide a Java applet that generates fragments. You provide the incoming datagram size, the MTU, and the incoming datagram identification. The applet automatically generates the fragments for you. See http://www.pearsonhighered.com/csresources/. (The short code sketch below mimics what that applet does.)
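In the same spirit as that applet, the sketch below (our own, not the book's applet) computes the fragments for a given datagram size, MTU, and identification number. Fragment data sizes (other than the last) must be multiples of 8 bytes, since the offset field counts in 8-byte units:

```python
def fragment(datagram_len, mtu, ident, header_len=20):
    """List the (ident, flag, offset, length) values for each fragment of an
    IPv4 datagram of datagram_len bytes sent over a link with the given MTU."""
    data_len = datagram_len - header_len
    max_data = (mtu - header_len) // 8 * 8    # per-fragment data, multiple of 8
    fragments, offset = [], 0
    while data_len > 0:
        chunk = min(max_data, data_len)
        data_len -= chunk
        fragments.append({"ident": ident,
                          "flag": 1 if data_len > 0 else 0,  # more fragments?
                          "offset": offset // 8,             # in 8-byte units
                          "length": chunk + header_len})
        offset += chunk
    return fragments

# The example above: a 4,000-byte datagram and an MTU of 1,500 bytes
# (the identification value 777 is arbitrary):
for frag in fragment(4000, 1500, ident=777):
    print(frag)
# -> lengths 1500, 1500, 1040; offsets 0, 185, 370; flags 1, 1, 0
```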
4.3.3 IPv4 Addressing

We now turn our attention to IPv4 addressing. Although you may be thinking that addressing must be a straightforward topic, hopefully by the end of this section you'll be convinced that Internet addressing is not only a juicy, subtle, and interesting topic but also one that is of central importance to the Internet. An excellent treatment of IPv4 addressing can be found in the first chapter in \[Stewart 1999\].

Before discussing IP addressing, however, we'll need to say a few words about how hosts and routers are connected into the Internet. A host typically has only a single link into the network; when IP in the host wants to send a datagram, it does so over this link. The boundary between the host and the physical link is called an interface. Now consider a router and its interfaces. Because a router's job is to receive a datagram on one link and forward the datagram on some other link, a router necessarily has two or more links to which it is connected. The boundary between the router and any one of its links is also called an interface. A router thus has multiple interfaces, one for each of its links. Because every host and router is capable of sending and receiving IP datagrams, IP requires each host and router interface to have its own IP address. Thus, an IP address is technically associated with an interface, rather than with the host or router containing that interface.

Each IP address is 32 bits long (equivalently, 4 bytes), and there are thus a total of 2^32 (or approximately 4 billion) possible IP addresses. These addresses are typically written in so-called dotted-decimal notation, in which each byte of the address is written in its decimal form and is separated by a period (dot) from other bytes in the address. For example, consider the IP address 193.32.216.9. The 193 is the decimal equivalent of the first 8 bits of the address; the 32 is the decimal equivalent of the second 8 bits of the address, and so on. Thus, the address 193.32.216.9 in binary notation is 11000001 00100000 11011000 00001001. Each interface on every host and router in the global Internet must have an IP address that is globally unique (except for interfaces behind NATs, as discussed in Section 4.3.4). These addresses cannot be chosen in a willy-nilly manner, however. A portion of an interface's IP address will be determined by the subnet to which it is connected.

Figure 4.18 Interface addresses and subnets

Figure 4.18 provides an example of IP addressing and interfaces. In this figure, one router (with three interfaces) is used to interconnect seven hosts. Take a close look at the IP addresses assigned to the host and router interfaces, as there are several things to notice. The three hosts in the upper-left portion of Figure 4.18, and the router interface to which they are connected, all have an IP address of the form 223.1.1.xxx. That is, they all have the same leftmost 24 bits in their IP address. These four interfaces are also interconnected to each other by a network that contains no routers. This network could be interconnected by an Ethernet LAN, in which case the interfaces would be interconnected by an Ethernet switch (as we'll discuss in Chapter 6), or by a wireless access point (as we'll discuss in Chapter 7). We'll represent this routerless network connecting these hosts as a cloud for now, and dive into the internals of such networks in Chapters 6 and 7. In IP terms, this network interconnecting three host interfaces and one router interface forms a subnet \[RFC 950\]. (A subnet is also called an IP network or simply a network in the Internet literature.)
IP addressing assigns an address to this subnet: 223.1.1.0/24, where the /24 ("slash-24") notation, sometimes known as a subnet mask, indicates that the leftmost 24 bits of the 32-bit quantity define the subnet address. The 223.1.1.0/24 subnet thus consists of the three host interfaces (223.1.1.1, 223.1.1.2, and 223.1.1.3) and one router interface (223.1.1.4). Any additional hosts attached to the 223.1.1.0/24 subnet would be required to have an address of the form 223.1.1.xxx. There are two additional subnets shown in Figure 4.18: the 223.1.2.0/24 and 223.1.3.0/24 subnets. Figure 4.19 illustrates the three IP subnets present in Figure 4.18.

Figure 4.19 Subnet addresses

The IP definition of a subnet is not restricted to Ethernet segments that connect multiple hosts to a router interface. To get some insight here, consider Figure 4.20, which shows three routers that are interconnected with each other by point-to-point links. Each router has three interfaces, one for each point-to-point link and one for the broadcast link that directly connects the router to a pair of hosts. What subnets are present here? Three subnets, 223.1.1.0/24, 223.1.2.0/24, and 223.1.3.0/24, are similar to the subnets we encountered in Figure 4.18. But note that there are three additional subnets in this example as well: one subnet, 223.1.9.0/24, for the interfaces that connect routers R1 and R2; another subnet, 223.1.8.0/24, for the interfaces that connect routers R2 and R3; and a third subnet, 223.1.7.0/24, for the interfaces that connect routers R3 and R1.

Figure 4.20 Three routers interconnecting six subnets

For a general interconnected system of routers and hosts, we can use the following recipe to define the subnets in the system:

To determine the subnets, detach each interface from its host or router, creating islands of isolated networks, with interfaces terminating the end points of the isolated networks. Each of these isolated networks is called a subnet.

If we apply this procedure to the interconnected system in Figure 4.20, we get six islands or subnets. From the discussion above, it's clear that an organization (such as a company or academic institution) with multiple Ethernet segments and point-to-point links will have multiple subnets, with all of the devices on a given subnet having the same subnet address. In principle, the different subnets could have quite different subnet addresses. In practice, however, their subnet addresses often have much in common. To understand why, let's next turn our attention to how addressing is handled in the global Internet. The Internet's address assignment strategy is known as Classless Interdomain Routing (CIDR---pronounced "cider") \[RFC 4632\]. CIDR generalizes the notion of subnet addressing. As with subnet addressing, the 32-bit IP address is divided into two parts and again has the dotted-decimal form a.b.c.d/x, where x indicates the number of bits in the first part of the address. The x most significant bits of an address of the form a.b.c.d/x constitute the network portion of the IP address, and are often referred to as the prefix (or network prefix) of the address. An organization is typically assigned a block of contiguous addresses, that is, a range of addresses with a common prefix (see the Principles in Practice feature). In this case, the IP addresses of devices within the organization will share the common prefix.
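To make the a.b.c.d/x notation concrete, the following sketch (our own illustration) converts a dotted-decimal address to its 32-bit value and tests whether an address falls within a given prefix:

```python
def to_int(dotted: str) -> int:
    """Dotted-decimal IPv4 address -> 32-bit integer."""
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def in_prefix(addr: str, prefix: str) -> bool:
    """True if addr lies inside the block written as a.b.c.d/x."""
    net, x = prefix.split("/")
    x = int(x)
    mask = 0xFFFFFFFF ^ ((1 << (32 - x)) - 1)   # the leading x bits
    return (to_int(addr) & mask) == (to_int(net) & mask)

print(f"{to_int('193.32.216.9'):032b}")          # 11000001001000001101100000001001
print(in_prefix("223.1.1.2", "223.1.1.0/24"))    # True: same leftmost 24 bits
print(in_prefix("223.1.2.5", "223.1.1.0/24"))    # False: a different subnet
```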
When we cover the Internet's BGP routing protocol in Section 5.4, we'll see that only these x leading prefix bits are considered by routers outside the organization's network. That is, when a router outside the organization forwards a datagram whose destination address is inside the organization, only the leading x bits of the address need be considered. This considerably reduces the size of the forwarding table in these routers, since a single entry of the form a.b.c.d/x will be sufficient to forward packets to any destination within the organization. The remaining 32 − x bits of an address can be thought of as distinguishing among the devices within the organization, all of which have the same network prefix. These are the bits that will be considered when forwarding packets at routers within the organization. These lower-order bits may (or may not) have an additional subnetting structure, such as that discussed above. For example, suppose the first 21 bits of the CIDRized address a.b.c.d/21 specify the organization's network prefix and are common to the IP addresses of all devices in that organization. The remaining 11 bits then identify the specific hosts in the organization. The organization's internal structure might be such that these 11 rightmost bits are used for subnetting within the organization, as discussed above. For example, a.b.c.d/24 might refer to a specific subnet within the organization.

Before CIDR was adopted, the network portions of an IP address were constrained to be 8, 16, or 24 bits in length, an addressing scheme known as classful addressing, since subnets with 8-, 16-, and 24-bit subnet addresses were known as class A, B, and C networks, respectively. The requirement that the subnet portion of an IP address be exactly 1, 2, or 3 bytes long turned out to be problematic for supporting the rapidly growing number of organizations with small and medium-sized subnets. A class C (/24) subnet could accommodate only up to 2^8 − 2 = 254 hosts (two of the 2^8 = 256 addresses are reserved for special use)---too small for many organizations. However, a class B (/16) subnet, which supports up to 65,534 hosts, was too large. Under classful addressing, an organization with, say, 2,000 hosts was typically allocated a class B (/16) subnet address. This led to a rapid depletion of the class B address space and poor utilization of the assigned address space. For example, the organization that used a class B address for its 2,000 hosts was allocated enough of the address space for up to 65,534 interfaces---leaving more than 63,000 addresses that could not be used by other organizations.

PRINCIPLES IN PRACTICE

This example of an ISP that connects eight organizations to the Internet nicely illustrates how carefully allocated CIDRized addresses facilitate routing. Suppose, as shown in Figure 4.21, that the ISP (which we'll call Fly-By-Night-ISP) advertises to the outside world that it should be sent any datagrams whose first 20 address bits match 200.23.16.0/20. The rest of the world need not know that within the address block 200.23.16.0/20 there are in fact eight other organizations, each with its own subnets. This ability to use a single prefix to advertise multiple networks is often referred to as address aggregation (also route aggregation or route summarization).
PRINCIPLES IN PRACTICE

This example of an ISP that connects eight organizations to the Internet nicely illustrates how carefully allocated CIDRized addresses facilitate routing. Suppose, as shown in Figure 4.21, that the ISP (which we'll call Fly-By-Night-ISP) advertises to the outside world that it should be sent any datagrams whose first 20 address bits match 200.23.16.0/20. The rest of the world need not know that within the address block 200.23.16.0/20 there are in fact eight other organizations, each with its own subnets. This ability to use a single prefix to advertise multiple networks is often referred to as address aggregation (also route aggregation or route summarization).

Address aggregation works extremely well when addresses are allocated in blocks to ISPs and then from ISPs to client organizations. But what happens when addresses are not allocated in such a hierarchical manner? What would happen, for example, if Fly-By-Night-ISP acquires ISPs-R-Us and then has Organization 1 connect to the Internet through its subsidiary ISPs-R-Us? As shown in Figure 4.21, the subsidiary ISPs-R-Us owns the address block 199.31.0.0/16, but Organization 1's IP addresses are unfortunately outside of this address block. What should be done here? Certainly, Organization 1 could renumber all of its routers and hosts to have addresses within the ISPs-R-Us address block. But this is a costly solution, and Organization 1 might well be reassigned to another subsidiary in the future. The solution typically adopted is for Organization 1 to keep its IP addresses in 200.23.18.0/23. In this case, as shown in Figure 4.22, Fly-By-Night-ISP continues to advertise the address block 200.23.16.0/20 and ISPs-R-Us continues to advertise 199.31.0.0/16. However, ISPs-R-Us now also advertises the block of addresses for Organization 1, 200.23.18.0/23. When other routers in the larger Internet see the address blocks 200.23.16.0/20 (from Fly-By-Night-ISP) and 200.23.18.0/23 (from ISPs-R-Us) and want to route to an address in the block 200.23.18.0/23, they will use longest prefix matching (see Section 4.2.1), and route toward ISPs-R-Us, as it advertises the longest (i.e., most-specific) address prefix that matches the destination address.

Figure 4.21 Hierarchical addressing and route aggregation

Figure 4.22 ISPs-R-Us has a more specific route to Organization 1

We would be remiss if we did not mention yet another type of IP address, the IP broadcast address 255.255.255.255. When a host sends a datagram with destination address 255.255.255.255, the message is delivered to all hosts on the same subnet. Routers optionally forward the message into neighboring subnets as well (although they usually don't).

Having now studied IP addressing in detail, we need to know how hosts and subnets get their addresses in the first place. Let's begin by looking at how an organization gets a block of addresses for its devices, and then look at how a device (such as a host) is assigned an address from within the organization's block of addresses.

Obtaining a Block of Addresses

In order to obtain a block of IP addresses for use within an organization's subnet, a network administrator might first contact its ISP, which would provide addresses from a larger block of addresses that had already been allocated to the ISP. For example, the ISP may itself have been allocated the address block 200.23.16.0/20. The ISP, in turn, could divide its address block into eight equal-sized contiguous address blocks and give one of these address blocks out to each of up to eight organizations that are supported by this ISP, as shown below. (The subnet part of each address is its leftmost 20 bits for the ISP's block and its leftmost 23 bits for each organization's block.)

| Block | Prefix | Binary representation |
|---|---|---|
| ISP's block | 200.23.16.0/20 | 11001000 00010111 00010000 00000000 |
| Organization 0 | 200.23.16.0/23 | 11001000 00010111 00010000 00000000 |
| Organization 1 | 200.23.18.0/23 | 11001000 00010111 00010010 00000000 |
| Organization 2 | 200.23.20.0/23 | 11001000 00010111 00010100 00000000 |
| ... | ... | ... |
| Organization 7 | 200.23.30.0/23 | 11001000 00010111 00011110 00000000 |
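The eight /23 blocks in this table can be derived mechanically from the ISP's /20 block: each extra prefix bit halves the block, so going from /20 to /23 yields 2^3 = 8 sub-blocks. The following sketch uses Python's standard ipaddress module to reproduce the allocation (the loop and labels are ours).

```python
import ipaddress

isp_block = ipaddress.ip_network("200.23.16.0/20")
# subnets(new_prefix=23) enumerates the eight contiguous /23 blocks.
for i, org_block in enumerate(isp_block.subnets(new_prefix=23)):
    print(f"Organization {i}: {org_block}")
# Organization 0: 200.23.16.0/23 ... Organization 7: 200.23.30.0/23
```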
While obtaining a set of addresses from an ISP is one way to get a block of addresses, it is not the only way. Clearly, there must also be a way for the ISP itself to get a block of addresses. Is there a global authority that has ultimate responsibility for managing the IP address space and allocating address blocks to ISPs and other organizations? Indeed there is! IP addresses are managed under the authority of the Internet Corporation for Assigned Names and Numbers (ICANN) \[ICANN 2016\], based on guidelines set forth in \[RFC 7020\]. The role of the nonprofit ICANN organization \[NTIA 1998\] is not only to allocate IP addresses, but also to manage the DNS root servers. It also has the very contentious job of assigning domain names and resolving domain name disputes. ICANN allocates addresses to regional Internet registries (for example, ARIN, RIPE, APNIC, and LACNIC), which together form the Address Supporting Organization of ICANN \[ASO-ICANN 2016\] and which handle the allocation/management of addresses within their regions.

Obtaining a Host Address: The Dynamic Host Configuration Protocol

Once an organization has obtained a block of addresses, it can assign individual IP addresses to the host and router interfaces in its organization. A system administrator will typically manually configure the IP addresses into the router (often remotely, with a network management tool). Host addresses can also be configured manually, but typically this is done using the Dynamic Host Configuration Protocol (DHCP) \[RFC 2131\]. DHCP allows a host to obtain (be allocated) an IP address automatically. A network administrator can configure DHCP so that a given host receives the same IP address each time it connects to the network, or a host may be assigned a temporary IP address that will be different each time the host connects to the network. In addition to host IP address assignment, DHCP also allows a host to learn additional information, such as its subnet mask, the address of its first-hop router (often called the default gateway), and the address of its local DNS server.

Because of DHCP's ability to automate the network-related aspects of connecting a host into a network, it is often referred to as a plug-and-play or zeroconf (zero-configuration) protocol. This capability makes it very attractive to the network administrator who would otherwise have to perform these tasks manually! DHCP is also enjoying widespread use in residential Internet access networks, enterprise networks, and in wireless LANs, where hosts join and leave the network frequently. Consider, for example, the student who carries a laptop from a dormitory room to a library to a classroom. It is likely that in each location, the student will be connecting into a new subnet and hence will need a new IP address at each location. DHCP is ideally suited to this situation, as there are many users coming and going, and addresses are needed for only a limited amount of time. The value of DHCP's plug-and-play capability is clear, since it's unimaginable that a system administrator would be able to reconfigure laptops at each location, and few students (except those taking a computer networking class!) would have the expertise to configure their laptops manually.

DHCP is a client-server protocol. A client is typically a newly arriving host wanting to obtain network configuration information, including an IP address for itself.
In the simplest case, each subnet (in the addressing sense of Figure 4.20) will have a DHCP server. If no server is present on the subnet, a DHCP relay agent (typically a router) that knows the address of a DHCP server for that network is needed. Figure 4.23 shows a DHCP server attached to subnet 223.1.2/24, with the router serving as the relay agent for arriving clients attached to subnets 223.1.1/24 and 223.1.3/24. In our discussion below, we'll assume that a DHCP server is available on the subnet.

Figure 4.23 DHCP client and server

For a newly arriving host, the DHCP protocol is a four-step process, as shown in Figure 4.24 for the network setting shown in Figure 4.23. In this figure, yiaddr (as in "your Internet address") indicates the address being allocated to the newly arriving client. The four steps are:

- DHCP server discovery. The first task of a newly arriving host is to find a DHCP server with which to interact. This is done using a DHCP discover message, which a client sends within a UDP packet to port 67. The UDP packet is encapsulated in an IP datagram. But to whom should this datagram be sent? The host doesn't even know the IP address of the network to which it is attaching, much less the address of a DHCP server for this network. Given this, the DHCP client creates an IP datagram containing its DHCP discover message along with the broadcast destination IP address of 255.255.255.255 and a "this host" source IP address of 0.0.0.0. The DHCP client passes the IP datagram to the link layer, which then broadcasts this frame to all nodes attached to the subnet (we will cover the details of link-layer broadcasting in Section 6.4).
- DHCP server offer(s). A DHCP server receiving a DHCP discover message responds to the client with a DHCP offer message that is broadcast to all nodes on the subnet, again using the IP broadcast address of 255.255.255.255. (You might want to think about why this server reply must also be broadcast.) Since several DHCP servers can be present on the subnet, the client may find itself in the enviable position of being able to choose from among several offers. Each server offer message contains the transaction ID of the received discover message, the proposed IP address for the client, the network mask, and an IP address lease time---the amount of time for which the IP address will be valid. It is common for the server to set the lease time to several hours or days \[Droms 2002\].
- DHCP request. The newly arriving client will choose from among one or more server offers and respond to its selected offer with a DHCP request message, echoing back the configuration parameters.
- DHCP ACK. The server responds to the DHCP request message with a DHCP ACK message, confirming the requested parameters. Once the client receives the DHCP ACK, the interaction is complete and the client can use the DHCP-allocated IP address for the lease duration. Since a client may want to use its address beyond the lease's expiration, DHCP also provides a mechanism that allows a client to renew its lease on an IP address.

Figure 4.24 DHCP client-server interaction
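To make the discover step concrete, here is a sketch of a minimal DHCP discover message, built according to the RFC 2131 layout described above. It is illustrative only: real clients include more options (such as a parameter request list), the MAC address shown is made up, and binding the DHCP client port 68 typically requires administrator privileges.

```python
import socket
import struct
import random

def build_dhcp_discover(mac: bytes) -> bytes:
    """Build a minimal DHCP discover message (a sketch of RFC 2131 framing)."""
    xid = random.getrandbits(32)        # transaction ID, echoed back in offers
    msg = struct.pack("!BBBBIHH4s4s4s4s",
                      1, 1, 6, 0,       # op=BOOTREQUEST, htype=Ethernet, hlen=6, hops=0
                      xid,
                      0, 0x8000,        # secs=0; broadcast flag: reply to 255.255.255.255
                      b"\x00" * 4,      # ciaddr: client has no address yet ("0.0.0.0")
                      b"\x00" * 4,      # yiaddr: to be filled in by the server's offer
                      b"\x00" * 4,      # siaddr
                      b"\x00" * 4)      # giaddr: used by a DHCP relay agent
    msg += mac.ljust(16, b"\x00")       # chaddr: client hardware address
    msg += b"\x00" * 192                # sname and file fields, unused here
    msg += b"\x63\x82\x53\x63"          # DHCP magic cookie
    msg += bytes([53, 1, 1])            # option 53, length 1, type 1 = DHCPDISCOVER
    msg += bytes([255])                 # end-of-options marker
    return msg

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
sock.bind(("0.0.0.0", 68))              # client port; usually needs privileges
sock.sendto(build_dhcp_discover(b"\xaa\xbb\xcc\xdd\xee\xff"),
            ("255.255.255.255", 67))    # broadcast to the DHCP server port
```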
From a mobility aspect, DHCP does have one very significant shortcoming. Since a new IP address is obtained from DHCP each time a node connects to a new subnet, a TCP connection to a remote application cannot be maintained as a mobile node moves between subnets. In Chapter 7, we will examine mobile IP---an extension to the IP infrastructure that allows a mobile node to use a single permanent address as it moves between subnets. Additional details about DHCP can be found in \[Droms 2002\] and \[dhc 2016\]. An open source reference implementation of DHCP is available from the Internet Systems Consortium \[ISC 2016\].

4.3.4 Network Address Translation (NAT)

Given our discussion about Internet addresses and the IPv4 datagram format, we're now well aware that every IP-capable device needs an IP address. With the proliferation of small office, home office (SOHO) subnets, this would seem to imply that whenever a SOHO wants to install a LAN to connect multiple machines, a range of addresses would need to be allocated by the ISP to cover all of the SOHO's IP devices (including phones, tablets, gaming devices, IP TVs, printers, and more). If the subnet grew bigger, a larger block of addresses would have to be allocated. But what if the ISP had already allocated the contiguous portions of the SOHO network's current address range? And what typical homeowner wants (or should need) to know how to manage IP addresses in the first place? Fortunately, there is a simpler approach to address allocation that has found increasingly widespread use in such scenarios: network address translation (NAT) \[RFC 2663; RFC 3022; Huston 2004; Zhang 2007; Cisco NAT 2016\].

Figure 4.25 shows the operation of a NAT-enabled router. The NAT-enabled router, residing in the home, has an interface that is part of the home network on the right of Figure 4.25. Addressing within the home network is exactly as we have seen above---all four interfaces in the home network have the same subnet address of 10.0.0/24. The address space 10.0.0.0/8 is one of three portions of the IP address space that is reserved in \[RFC 1918\] for a private network or a realm with private addresses, such as the home network in Figure 4.25. A realm with private addresses refers to a network whose addresses only have meaning to devices within that network. To see why this is important, consider the fact that there are hundreds of thousands of home networks, many using the same address space, 10.0.0.0/24. Devices within a given home network can send packets to each other using 10.0.0.0/24 addressing. However, packets forwarded beyond the home network into the larger global Internet clearly cannot use these addresses (as either a source or a destination address) because there are hundreds of thousands of networks using this block of addresses. That is, the 10.0.0.0/24 addresses can only have meaning within the given home network. But if private addresses only have meaning within a given network, how is addressing handled when packets are sent to or received from the global Internet, where addresses are necessarily unique? The answer lies in understanding NAT.

Figure 4.25 Network address translation

The NAT-enabled router does not look like a router to the outside world. Instead the NAT router behaves to the outside world as a single device with a single IP address. In Figure 4.25, all traffic leaving the home router for the larger Internet has a source IP address of 138.76.29.7, and all traffic entering the home router must have a destination address of 138.76.29.7. In essence, the NAT-enabled router is hiding the details of the home network from the outside world.
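Python's standard ipaddress module makes it easy to check whether an address falls in one of the three RFC 1918 private blocks (a small sketch of our own; the sample addresses come from Figure 4.25):

```python
import ipaddress

# The three RFC 1918 private blocks. An address in any of these ranges is
# meaningful only within its own realm, as discussed above.
private_blocks = [ipaddress.ip_network(n)
                  for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

for addr in ("10.0.0.1", "192.168.1.5", "138.76.29.7"):
    ip = ipaddress.ip_address(addr)
    in_private = any(ip in block for block in private_blocks)
    print(addr, "private" if in_private else "globally routable")
```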
(As an aside, you might wonder where the home network computers get their addresses and where the router gets its single IP address. Often, the answer is the same---DHCP! The router gets its address from the ISP's DHCP server, and the router runs a DHCP server to provide addresses to computers within the NAT-DHCP-router-controlled home network's address space.)

If all datagrams arriving at the NAT router from the WAN have the same destination IP address (specifically, that of the WAN-side interface of the NAT router), then how does the router know the internal host to which it should forward a given datagram? The trick is to use a NAT translation table at the NAT router, and to include port numbers as well as IP addresses in the table entries.

Consider the example in Figure 4.25. Suppose a user sitting in a home network behind host 10.0.0.1 requests a Web page on some Web server (port 80) with IP address 128.119.40.186. The host 10.0.0.1 assigns the (arbitrary) source port number 3345 and sends the datagram into the LAN. The NAT router receives the datagram, generates a new source port number 5001 for the datagram, replaces the source IP address with its WAN-side IP address 138.76.29.7, and replaces the original source port number 3345 with the new source port number 5001. When generating a new source port number, the NAT router can select any source port number that is not currently in the NAT translation table. (Note that because a port number field is 16 bits long, the NAT protocol can support over 60,000 simultaneous connections with a single WAN-side IP address for the router!) NAT in the router also adds an entry to its NAT translation table. The Web server, blissfully unaware that the arriving datagram containing the HTTP request has been manipulated by the NAT router, responds with a datagram whose destination address is the IP address of the NAT router, and whose destination port number is 5001. When this datagram arrives at the NAT router, the router indexes the NAT translation table using the destination IP address and destination port number to obtain the appropriate IP address (10.0.0.1) and destination port number (3345) for the browser in the home network. The router then rewrites the datagram's destination address and destination port number, and forwards the datagram into the home network.

NAT has enjoyed widespread deployment in recent years. But NAT is not without detractors. First, one might argue that port numbers are meant to be used for addressing processes, not for addressing hosts. This violation can indeed cause problems for servers running on the home network, since, as we have seen in Chapter 2, server processes wait for incoming requests at well-known port numbers and peers in a P2P protocol need to accept incoming connections when acting as servers. Technical solutions to these problems include NAT traversal tools \[RFC 5389\] and Universal Plug and Play (UPnP), a protocol that allows a host to discover and configure a nearby NAT \[UPnP Forum 2016\]. More "philosophical" arguments have also been raised against NAT by architectural purists. Here, the concern is that routers are meant to be layer 3 (i.e., network-layer) devices, and should process packets only up to the network layer. NAT violates the principle that hosts should be talking directly with each other, without interfering nodes modifying IP addresses, much less port numbers.
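The translation mechanics just described amount to a small table plus two lookups. The sketch below is a toy model of ours, using the addresses from Figure 4.25; it ignores the transport protocol and the remote endpoint, both of which real NATs also record in their table entries.

```python
import itertools

WAN_ADDR = "138.76.29.7"   # the router's WAN-side address in Figure 4.25

class NatRouter:
    def __init__(self):
        self.table = {}                       # wan_port -> (lan_addr, lan_port)
        self.next_port = itertools.count(5001)

    def translate_outbound(self, src_addr, src_port):
        """Rewrite a LAN (source address, port) to (WAN address, fresh port)."""
        wan_port = next(self.next_port)       # any port not already in the table
        self.table[wan_port] = (src_addr, src_port)
        return WAN_ADDR, wan_port

    def translate_inbound(self, dst_port):
        """Look up the internal host for a datagram arriving from the WAN."""
        return self.table[dst_port]           # a KeyError means no mapping: drop

nat = NatRouter()
print(nat.translate_outbound("10.0.0.1", 3345))  # -> ('138.76.29.7', 5001)
print(nat.translate_inbound(5001))               # -> ('10.0.0.1', 3345)
```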
But like it or not, NAT has become an important component of the Internet, as have other so-called middleboxes \[Sekar 2011\] that operate at the network layer but have functions that are quite different from routers. Middleboxes do not perform traditional datagram forwarding, but instead perform functions such as NAT, load balancing of traffic flows, traffic firewalling (see accompanying sidebar), and more. The generalized forwarding paradigm that we'll study shortly in Section 4.4 allows a number of these middlebox functions, as well as traditional router forwarding, to be accomplished in a common, integrated manner.

FOCUS ON SECURITY

INSPECTING DATAGRAMS: FIREWALLS AND INTRUSION DETECTION SYSTEMS

Suppose you are assigned the task of administering a home, departmental, university, or corporate network. Attackers, knowing the IP address range of your network, can easily send IP datagrams to addresses in your range. These datagrams can do all kinds of devious things, including mapping your network with ping sweeps and port scans, crashing vulnerable hosts with malformed packets, scanning for open TCP/UDP ports on servers in your network, and infecting hosts by including malware in the packets. As the network administrator, what are you going to do about all those bad guys out there, each capable of sending malicious packets into your network? Two popular defense mechanisms to malicious packet attacks are firewalls and intrusion detection systems (IDSs).

As a network administrator, you may first try installing a firewall between your network and the Internet. (Most access routers today have firewall capability.) Firewalls inspect the datagram and segment header fields, denying suspicious datagrams entry into the internal network. For example, a firewall may be configured to block all ICMP echo request packets (see Section 5.6), thereby preventing an attacker from doing a traditional ping sweep across your IP address range. Firewalls can also block packets based on source and destination IP addresses and port numbers. Additionally, firewalls can be configured to track TCP connections, granting entry only to datagrams that belong to approved connections.

Additional protection can be provided with an IDS. An IDS, typically situated at the network boundary, performs "deep packet inspection," examining not only header fields but also the payloads in the datagram (including application-layer data). An IDS has a database of packet signatures that are known to be part of attacks. This database is automatically updated as new attacks are discovered. As packets pass through the IDS, the IDS attempts to match header fields and payloads to the signatures in its signature database. If such a match is found, an alert is created. An intrusion prevention system (IPS) is similar to an IDS, except that it actually blocks packets in addition to creating alerts. In Chapter 8, we'll explore firewalls and IDSs in more detail.

Can firewalls and IDSs fully shield your network from all attacks? The answer is clearly no, as attackers continually find new attacks for which signatures are not yet available. But firewalls and traditional signature-based IDSs are useful in protecting your network from known attacks.
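The header-field filtering described in the sidebar can be sketched as an ordered rule list with a default-deny policy; the rules and field names below are illustrative assumptions of ours, not a real firewall configuration.

```python
RULES = [
    # (action, protocol, destination port); None acts as a wildcard
    ("deny",  "icmp", None),  # drop echo requests: no easy ping sweeps
    ("allow", "tcp",  80),    # permit inbound Web traffic
    ("allow", "tcp",  443),
]

def filter_packet(protocol, dst_port):
    """Return the action of the first rule that matches, else deny."""
    for action, proto, port in RULES:
        if proto == protocol and port in (None, dst_port):
            return action
    return "deny"             # default: deny anything not explicitly allowed

print(filter_packet("icmp", None))  # deny
print(filter_packet("tcp", 80))     # allow
print(filter_packet("udp", 53))     # deny (no matching rule)
```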
4.3.5 IPv6

In the early 1990s, the Internet Engineering Task Force began an effort to develop a successor to the IPv4 protocol. A prime motivation for this effort was the realization that the 32-bit IPv4 address space was beginning to be used up, with new subnets and IP nodes being attached to the Internet (and being allocated unique IP addresses) at a breathtaking rate. To respond to this need for a large IP address space, a new IP protocol, IPv6, was developed. The designers of IPv6 also took this opportunity to tweak and augment other aspects of IPv4, based on the accumulated operational experience with IPv4.

The point in time when IPv4 addresses would be completely allocated (and hence no new networks could attach to the Internet) was the subject of considerable debate. The estimates of the two leaders of the IETF's Address Lifetime Expectations working group were that addresses would become exhausted in 2008 and 2018, respectively \[Solensky 1996\]. In February 2011, IANA allocated out the last remaining pool of unassigned IPv4 addresses to a regional registry. While these registries still have available IPv4 addresses within their pool, once these addresses are exhausted, there are no more available address blocks that can be allocated from a central pool \[Huston 2011a\]. A recent survey of IPv4 address-space exhaustion, and the steps taken to prolong the life of the address space, is \[Richter 2015\].

Although the mid-1990s estimates of IPv4 address depletion suggested that a considerable amount of time might be left until the IPv4 address space was exhausted, it was realized that considerable time would be needed to deploy a new technology on such an extensive scale, and so the process to develop IP version 6 (IPv6) \[RFC 2460\] was begun \[RFC 1752\]. (An often-asked question is what happened to IPv5? It was initially envisioned that the ST-2 protocol would become IPv5, but ST-2 was later dropped.) An excellent source of information about IPv6 is \[Huitema 1998\].

IPv6 Datagram Format

The format of the IPv6 datagram is shown in Figure 4.26.

Figure 4.26 IPv6 datagram format

The most important changes introduced in IPv6 are evident in the datagram format:

- Expanded addressing capabilities. IPv6 increases the size of the IP address from 32 to 128 bits. This ensures that the world won't run out of IP addresses. Now, every grain of sand on the planet can be IP-addressable. In addition to unicast and multicast addresses, IPv6 has introduced a new type of address, called an anycast address, that allows a datagram to be delivered to any one of a group of hosts. (This feature could be used, for example, to send an HTTP GET to the nearest of a number of mirror sites that contain a given document.)
- A streamlined 40-byte header. As discussed below, a number of IPv4 fields have been dropped or made optional. The resulting 40-byte fixed-length header allows for faster processing of the IP datagram by a router. A new encoding of options allows for more flexible options processing.
- Flow labeling. IPv6 has an elusive definition of a flow. RFC 2460 states that this allows "labeling of packets belonging to particular flows for which the sender requests special handling, such as a non-default quality of service or real-time service." For example, audio and video transmission might likely be treated as a flow. On the other hand, the more traditional applications, such as file transfer and e-mail, might not be treated as flows.
  It is possible that the traffic carried by a high-priority user (for example, someone paying for better service for their traffic) might also be treated as a flow. What is clear, however, is that the designers of IPv6 foresaw the eventual need to be able to differentiate among the flows, even if the exact meaning of a flow had yet to be determined.

As noted above, a comparison of Figure 4.26 with Figure 4.16 reveals the simpler, more streamlined structure of the IPv6 datagram. The following fields are defined in IPv6:

- Version. This 4-bit field identifies the IP version number. Not surprisingly, IPv6 carries a value of 6 in this field. Note that putting a 4 in this field does not create a valid IPv4 datagram. (If it did, life would be a lot simpler---see the discussion below regarding the transition from IPv4 to IPv6.)
- Traffic class. The 8-bit traffic class field, like the TOS field in IPv4, can be used to give priority to certain datagrams within a flow, or it can be used to give priority to datagrams from certain applications (for example, voice-over-IP) over datagrams from other applications (for example, SMTP e-mail).
- Flow label. As discussed above, this 20-bit field is used to identify a flow of datagrams.
- Payload length. This 16-bit value is treated as an unsigned integer giving the number of bytes in the IPv6 datagram following the fixed-length, 40-byte datagram header.
- Next header. This field identifies the protocol to which the contents (data field) of this datagram will be delivered (for example, to TCP or UDP). The field uses the same values as the protocol field in the IPv4 header.
- Hop limit. The contents of this field are decremented by one by each router that forwards the datagram. If the hop limit count reaches zero, the datagram is discarded.
- Source and destination addresses. The various formats of the IPv6 128-bit address are described in RFC 4291.
- Data. This is the payload portion of the IPv6 datagram. When the datagram reaches its destination, the payload will be removed from the IP datagram and passed on to the protocol specified in the next header field.

The discussion above identified the purpose of the fields that are included in the IPv6 datagram. Comparing the IPv6 datagram format in Figure 4.26 with the IPv4 datagram format that we saw in Figure 4.16, we notice that several fields appearing in the IPv4 datagram are no longer present in the IPv6 datagram:

- Fragmentation/reassembly. IPv6 does not allow for fragmentation and reassembly at intermediate routers; these operations can be performed only by the source and destination. If an IPv6 datagram received by a router is too large to be forwarded over the outgoing link, the router simply drops the datagram and sends a "Packet Too Big" ICMP error message (see Section 5.6) back to the sender. The sender can then resend the data, using a smaller IP datagram size. Fragmentation and reassembly is a time-consuming operation; removing this functionality from the routers and placing it squarely in the end systems considerably speeds up IP forwarding within the network.
- Header checksum. Because the transport-layer (for example, TCP and UDP) and link-layer (for example, Ethernet) protocols in the Internet perform checksumming, the designers of IP probably felt that this functionality was sufficiently redundant in the network layer that it could be removed. Once again, fast processing of IP packets was a central concern. Recall from our discussion of IPv4 in Section 4.3.1 that since the IPv4 header contains a TTL field (similar to the hop limit field in IPv6), the IPv4 header checksum needed to be recomputed at every router. As with fragmentation and reassembly, this too was a costly operation in IPv4.
- Options. An options field is no longer a part of the standard IP header. However, it has not gone away. Instead, the options field is one of the possible next headers pointed to from within the IPv6 header. That is, just as TCP or UDP protocol headers can be the next header within an IP packet, so too can an options field. The removal of the options field results in a fixed-length, 40-byte IP header.
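Since the IPv6 header is a fixed 40 bytes, parsing it is a single unpacking step. The following sketch (ours; the function name is arbitrary) extracts the fields just listed:

```python
import struct

def parse_ipv6_header(datagram: bytes) -> dict:
    """Parse the fixed 40-byte IPv6 header described above (a sketch)."""
    if len(datagram) < 40:
        raise ValueError("an IPv6 header is always 40 bytes")
    # First 32-bit word: version (4 bits), traffic class (8), flow label (20).
    vtf, payload_len, next_header, hop_limit = struct.unpack("!IHBB", datagram[:8])
    return {
        "version":        vtf >> 28,           # always 6
        "traffic_class":  (vtf >> 20) & 0xFF,
        "flow_label":     vtf & 0xFFFFF,
        "payload_length": payload_len,          # bytes following the 40-byte header
        "next_header":    next_header,          # e.g., 6 = TCP, 17 = UDP
        "hop_limit":      hop_limit,
        "src":            datagram[8:24],       # 128-bit source address
        "dst":            datagram[24:40],      # 128-bit destination address
    }
```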
Transitioning from IPv4 to IPv6

Now that we have seen the technical details of IPv6, let us consider a very practical matter: How will the public Internet, which is based on IPv4, be transitioned to IPv6? The problem is that while new IPv6-capable systems can be made backward-compatible, that is, can send, route, and receive IPv4 datagrams, already deployed IPv4-capable systems are not capable of handling IPv6 datagrams. Several options are possible \[Huston 2011b, RFC 4213\].

One option would be to declare a flag day---a given time and date when all Internet machines would be turned off and upgraded from IPv4 to IPv6. The last major technology transition (from using NCP to using TCP for reliable transport service) occurred almost 35 years ago. Even back then \[RFC 801\], when the Internet was tiny and still being administered by a small number of "wizards," it was realized that such a flag day was not possible. A flag day involving billions of devices is even more unthinkable today.

The approach to IPv4-to-IPv6 transition that has been most widely adopted in practice involves tunneling \[RFC 4213\]. The basic idea behind tunneling---a key concept with applications in many other scenarios beyond IPv4-to-IPv6 transition, including wide use in the all-IP cellular networks that we'll cover in Chapter 7---is the following. Suppose two IPv6 nodes (in this example, B and E in Figure 4.27) want to interoperate using IPv6 datagrams but are connected to each other by intervening IPv4 routers. We refer to the intervening set of IPv4 routers between two IPv6 routers as a tunnel, as illustrated in Figure 4.27. With tunneling, the IPv6 node on the sending side of the tunnel (in this example, B) takes the entire IPv6 datagram and puts it in the data (payload) field of an IPv4 datagram. This IPv4 datagram is then addressed to the IPv6 node on the receiving side of the tunnel (in this example, E) and sent to the first node in the tunnel (in this example, C). The intervening IPv4 routers in the tunnel route this IPv4 datagram among themselves, just as they would any other datagram, blissfully unaware that the IPv4 datagram itself contains a complete IPv6 datagram. The IPv6 node on the receiving side of the tunnel eventually receives the IPv4 datagram (it is the destination of the IPv4 datagram!), determines that the IPv4 datagram contains an IPv6 datagram (by observing that the protocol number field in the IPv4 datagram is 41 \[RFC 4213\], indicating that the IPv4 payload is an IPv6 datagram), extracts the IPv6 datagram, and then routes the IPv6 datagram exactly as it would if it had received the IPv6 datagram from a directly connected IPv6 neighbor.
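The encapsulation step performed at the tunnel entrance is easy to sketch: prepend a minimal IPv4 header whose protocol field is 41 and whose payload is the complete IPv6 datagram. This is our own simplified illustration (fixed identification field, no options, no fragmentation handling), not a full implementation; src and dst would be the IPv4 addresses of the tunnel endpoints (B and E in Figure 4.27).

```python
import socket
import struct

def ipv4_checksum(header: bytes) -> int:
    """Standard 16-bit one's-complement Internet checksum."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def encapsulate(ipv6_datagram: bytes, src: str, dst: str) -> bytes:
    """Wrap a complete IPv6 datagram in a minimal IPv4 header (a sketch)."""
    fields = struct.pack("!BBHHHBBH4s4s",
                         0x45,                     # version 4, 5-word (20-byte) header
                         0,                        # type of service
                         20 + len(ipv6_datagram),  # total length
                         0, 0,                     # identification, flags/fragment offset
                         64,                       # TTL
                         41,                       # protocol 41 = IPv6-in-IPv4
                         0,                        # checksum placeholder
                         socket.inet_aton(src),
                         socket.inet_aton(dst))
    csum = ipv4_checksum(fields)                   # computed with checksum field zeroed
    return fields[:10] + struct.pack("!H", csum) + fields[12:] + ipv6_datagram
```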
We end this section by noting that while the adoption of IPv6 was initially slow to take off \[Lawton 2001; Huston 2008b\], momentum has been building. NIST \[NIST IPv6 2015\] reports that more than a third of US government second-level domains are IPv6-enabled. On the client side, Google reports that only about 8 percent of the clients accessing Google services do so via IPv6 \[Google IPv6 2015\]. But other recent measurements \[Czyz 2014\] indicate that IPv6 adoption is accelerating. The proliferation of devices such as IP-enabled phones and other portable devices provides an additional push for more widespread deployment of IPv6. Europe's Third Generation Partnership Program \[3GPP 2016\] has specified IPv6 as the standard addressing scheme for mobile multimedia.

Figure 4.27 Tunneling

One important lesson that we can learn from the IPv6 experience is that it is enormously difficult to change network-layer protocols. Since the early 1990s, numerous new network-layer protocols have been trumpeted as the next major revolution for the Internet, but most of these protocols have had limited penetration to date. These protocols include IPv6, multicast protocols, and resource reservation protocols; a discussion of these latter two protocols can be found in the online supplement to this text. Indeed, introducing new protocols into the network layer is like replacing the foundation of a house---it is difficult to do without tearing the whole house down or at least temporarily relocating the house's residents. On the other hand, the Internet has witnessed rapid deployment of new protocols at the application layer. The classic examples, of course, are the Web, instant messaging, streaming media, distributed games, and various forms of social media. Introducing new application-layer protocols is like adding a new layer of paint to a house---it is relatively easy to do, and if you choose an attractive color, others in the neighborhood will copy you. In summary, in the future we can certainly expect to see changes in the Internet's network layer, but these changes will likely occur on a time scale that is much slower than the changes that will occur at the application layer.

4.4 Generalized Forwarding and SDN

In Section 4.2.1, we noted that an Internet router's forwarding decision has traditionally been based solely on a packet's destination address. In the previous section, however, we've also seen that there has been a proliferation of middleboxes that perform many layer-3 functions. NAT boxes rewrite header IP addresses and port numbers; firewalls block traffic based on header-field values or redirect packets for additional processing, such as deep packet inspection (DPI). Load-balancers forward packets requesting a given service (e.g., an HTTP request) to one of a set of servers that provide that service. \[RFC 3234\] lists a number of common middlebox functions. This proliferation of middleboxes, layer-2 switches, and layer-3 routers \[Qazi 2013\]---each with its own specialized hardware, software, and management interfaces---has undoubtedly resulted in costly headaches for many network operators. However, recent advances in software-defined networking have promised, and are now delivering, a unified approach towards providing many of these network-layer functions, and certain link-layer functions as well, in a modern, elegant, and integrated manner.
Recall that Section 4.2.1 characterized destination-based forwarding as the two steps of looking up a destination IP address ("match"), then sending the packet into the switching fabric to the specified output port ("action"). Let's now consider a significantly more general "match-plus-action" paradigm, where the "match" can be made over multiple header fields associated with different protocols at different layers in the protocol stack. The "action" can include forwarding the packet to one or more output ports (as in destination-based forwarding), load balancing packets across multiple outgoing interfaces that lead to a service (as in load balancing), rewriting header values (as in NAT), purposefully blocking/dropping a packet (as in a firewall), sending a packet to a special server for further processing and action (as in DPI), and more.

In generalized forwarding, a match-plus-action table generalizes the notion of the destination-based forwarding table that we encountered in Section 4.2.1. Because forwarding decisions may be made using network-layer and/or link-layer source and destination addresses, the forwarding devices shown in Figure 4.28 are more accurately described as "packet switches" rather than layer 3 "routers" or layer 2 "switches." Thus, in the remainder of this section, and in Section 5.5, we'll refer to these devices as packet switches, adopting the terminology that is gaining widespread adoption in the SDN literature.

Figure 4.28 Generalized forwarding: Each packet switch contains a match-plus-action table that is computed and distributed by a remote controller

Figure 4.28 shows a match-plus-action table in each packet switch, with the table being computed, installed, and updated by a remote controller. We note that while it is possible for the control components at the individual packet switch to interact with each other (e.g., in a manner similar to that in Figure 4.2), in practice generalized match-plus-action capabilities are implemented via a remote controller that computes, installs, and updates these tables. You might take a minute to compare Figures 4.2, 4.3 and 4.28---what similarities and differences do you notice between destination-based forwarding shown in Figure 4.2 and 4.3, and generalized forwarding shown in Figure 4.28?

Our following discussion of generalized forwarding will be based on OpenFlow \[McKeown 2008, OpenFlow 2009, Casado 2014, Tourrilhes 2014\]---a highly visible and successful standard that has pioneered the notion of the match-plus-action forwarding abstraction and controllers, as well as the SDN revolution more generally \[Feamster 2013\]. We'll primarily consider OpenFlow 1.0, which introduced key SDN abstractions and functionality in a particularly clear and concise manner. Later versions of OpenFlow introduced additional capabilities as a result of experience gained through implementation and use; current and earlier versions of the OpenFlow standard can be found at \[ONF 2016\].

Each entry in the match-plus-action forwarding table, known as a flow table in OpenFlow, includes:

- A set of header field values to which an incoming packet will be matched. As in the case of destination-based forwarding, hardware-based matching is most rapidly performed in TCAM memory, with more than a million destination address entries being possible \[Bosshart 2013\]. A packet that matches no flow table entry can be dropped or sent to the remote controller for more processing.
  In practice, a flow table may be implemented by multiple flow tables for performance or cost reasons \[Bosshart 2013\], but we'll focus here on the abstraction of a single flow table.
- A set of counters that are updated as packets are matched to flow table entries. These counters might include the number of packets that have been matched by that table entry, and the time since the table entry was last updated.
- A set of actions to be taken when a packet matches a flow table entry. These actions might be to forward the packet to a given output port, to drop the packet, to make copies of the packet and send them to multiple output ports, and/or to rewrite selected header fields.

We'll explore matching and actions in more detail in Sections 4.4.1 and 4.4.2, respectively. We'll then study how the network-wide collection of per-packet switch matching rules can be used to implement a wide range of functions including routing, layer-2 switching, firewalling, load-balancing, virtual networks, and more in Section 4.4.3. In closing, we note that the flow table is essentially an API, the abstraction through which an individual packet switch's behavior can be programmed; we'll see in Section 4.4.3 that network-wide behaviors can similarly be programmed by appropriately programming/configuring these tables in a collection of network packet switches \[Casado 2014\].

4.4.1 Match

Figure 4.29 shows the eleven packet-header fields and the incoming port ID that can be matched in an OpenFlow 1.0 match-plus-action rule. Recall from Section 1.5.2 that a link-layer (layer 2) frame arriving to a packet switch will contain a network-layer (layer 3) datagram as its payload, which in turn will typically contain a transport-layer (layer 4) segment. The first observation we make is that OpenFlow's match abstraction allows for a match to be made on selected fields from three layers of protocol headers (thus rather brazenly defying the layering principle we studied in Section 1.5). Since we've not yet covered the link layer, suffice it to say that the source and destination MAC addresses shown in Figure 4.29 are the link-layer addresses associated with the frame's sending and receiving interfaces; by forwarding on the basis of Ethernet addresses rather than IP addresses, we can see that an OpenFlow-enabled device can equally perform as a router (layer-3 device) forwarding datagrams as well as a switch (layer-2 device) forwarding frames. The Ethernet type field corresponds to the upper layer protocol (e.g., IP) to which the frame's payload will be demultiplexed, and the VLAN fields are concerned with so-called virtual local area networks that we'll study in Chapter 6. The set of twelve values that can be matched in the OpenFlow 1.0 specification has grown to 41 values in more recent OpenFlow specifications \[Bosshart 2014\].

Figure 4.29 Packet matching fields, OpenFlow 1.0 flow table

The ingress port refers to the input port at the packet switch on which a packet is received. The packet's IP source address, IP destination address, IP protocol field, and IP type of service fields were discussed earlier in Section 4.3.1. The transport-layer source and destination port number fields can also be matched.

Flow table entries may also have wildcards. For example, an IP address of 128.119.*.* in a flow table will match the corresponding address field of any datagram that has 128.119 as the first 16 bits of its address. Each flow table entry also has an associated priority. If a packet matches multiple flow table entries, the selected match and corresponding action will be that of the highest priority entry with which the packet matches.
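Wildcard matching with priorities can be sketched in a few lines; the field names, table entries, and actions below are illustrative assumptions of ours in the spirit of OpenFlow 1.0, not its actual message format.

```python
flow_table = [
    {"priority": 10, "match": {"ip_dst_prefix": "128.119."}, "action": "forward(2)"},
    {"priority": 5,  "match": {"ip_proto": 17},              "action": "drop"},  # all UDP
    {"priority": 0,  "match": {},                            "action": "send_to_controller"},
]

def lookup(packet):
    """Return the action of the highest-priority matching entry."""
    def matches(entry):
        m = entry["match"]                 # missing fields act as wildcards
        if "ip_proto" in m and packet["ip_proto"] != m["ip_proto"]:
            return False
        if "ip_dst_prefix" in m and not packet["ip_dst"].startswith(m["ip_dst_prefix"]):
            return False
        return True
    candidates = [e for e in flow_table if matches(e)]
    return max(candidates, key=lambda e: e["priority"])["action"]

print(lookup({"ip_dst": "128.119.40.186", "ip_proto": 6}))   # forward(2)
print(lookup({"ip_dst": "10.0.0.1",       "ip_proto": 17}))  # drop
print(lookup({"ip_dst": "10.0.0.1",       "ip_proto": 6}))   # send_to_controller
```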
Lastly, we observe that not all fields in an IP header can be matched. For example, OpenFlow does not allow matching on the basis of the TTL field or datagram length field. Why are some fields allowed for matching, while others are not? Undoubtedly, the answer has to do with the tradeoff between functionality and complexity. The "art" in choosing an abstraction is to provide for enough functionality to accomplish a task (in this case to implement, configure, and manage a wide range of network-layer functions that had previously been implemented through an assortment of network-layer devices), without over-burdening the abstraction with so much detail and generality that it becomes bloated and unusable. Butler Lampson has famously noted \[Lampson 1983\]:

> Do one thing at a time, and do it well. An interface should capture the minimum essentials of an abstraction. Don't generalize; generalizations are generally wrong.

Given OpenFlow's success, one can surmise that its designers indeed chose their abstraction well. Additional details of OpenFlow matching can be found in \[OpenFlow 2009, ONF 2016\].

4.4.2 Action

As shown in Figure 4.28, each flow table entry has a list of zero or more actions that determine the processing that is to be applied to a packet that matches a flow table entry. If there are multiple actions, they are performed in the order specified in the list. Among the most important possible actions are:

- Forwarding. An incoming packet may be forwarded to a particular physical output port, broadcast over all ports (except the port on which it arrived) or multicast over a selected set of ports. The packet may be encapsulated and sent to the remote controller for this device. That controller then may (or may not) take some action on that packet, including installing new flow table entries, and may return the packet to the device for forwarding under the updated set of flow table rules.
- Dropping. A flow table entry with no action indicates that a matched packet should be dropped.
- Modify-field. The values in ten packet header fields (all layer 2, 3, and 4 fields shown in Figure 4.29 except the IP Protocol field) may be re-written before the packet is forwarded to the chosen output port.

4.4.3 OpenFlow Examples of Match-plus-action in Action

Having now considered both the match and action components of generalized forwarding, let's put these ideas together in the context of the sample network shown in Figure 4.30. The network has 6 hosts (h1, h2, h3, h4, h5 and h6) and three packet switches (s1, s2 and s3), each with four local interfaces (numbered 1 through 4). We'll consider a number of network-wide behaviors that we'd like to implement, and the flow table entries in s1, s2 and s3 needed to implement this behavior.

Figure 4.30 OpenFlow match-plus-action network with three packet switches, 6 hosts, and an OpenFlow controller

A First Example: Simple Forwarding

As a very simple example, suppose that the desired forwarding behavior is that packets from h5 or h6 destined to h3 or h4 are to be forwarded from s3 to s1, and then from s1 to s2 (thus completely avoiding the use of the link between s3 and s2). The flow table entry in s1 would be:

s1 Flow Table (Example 1)

| Match | Action |
|---|---|
| Ingress Port = 1; IP Src = 10.3.*.*; IP Dst = 10.2.*.* | Forward(4) |
| ... | ... |
Of course, we'll also need a flow table entry in s3 so that datagrams sent from h5 or h6 are forwarded to s1 over outgoing interface 3:

s3 Flow Table (Example 1)

| Match | Action |
|---|---|
| IP Src = 10.3.*.*; IP Dst = 10.2.*.* | Forward(3) |
| ... | ... |

Lastly, we'll also need a flow table entry in s2 to complete this first example, so that datagrams arriving from s1 are forwarded to their destination, either host h3 or h4:

s2 Flow Table (Example 1)

| Match | Action |
|---|---|
| Ingress port = 2; IP Dst = 10.2.0.3 | Forward(3) |
| Ingress port = 2; IP Dst = 10.2.0.4 | Forward(4) |
| ... | ... |

A Second Example: Load Balancing

As a second example, let's consider a load-balancing scenario, where datagrams from h3 destined to 10.1.*.* are to be forwarded over the direct link between s2 and s1, while datagrams from h4 destined to 10.1.*.* are to be forwarded over the link between s2 and s3 (and then from s3 to s1). Note that this behavior couldn't be achieved with IP's destination-based forwarding. In this case, the flow table in s2 would be:

s2 Flow Table (Example 2)

| Match | Action |
|---|---|
| Ingress port = 3; IP Dst = 10.1.*.* | Forward(2) |
| Ingress port = 4; IP Dst = 10.1.*.* | Forward(1) |
| ... | ... |

Flow table entries are also needed at s1 to forward the datagrams received from s2 to either h1 or h2; and flow table entries are needed at s3 to forward datagrams received on interface 4 from s2 over interface 3 towards s1. See if you can figure out these flow table entries at s1 and s3.

A Third Example: Firewalling

As a third example, let's consider a firewall scenario in which s2 wants only to receive (on any of its interfaces) traffic sent from hosts attached to s3.

s2 Flow Table (Example 3)

| Match | Action |
|---|---|
| IP Src = 10.3.*.*; IP Dst = 10.2.0.3 | Forward(3) |
| IP Src = 10.3.*.*; IP Dst = 10.2.0.4 | Forward(4) |
| ... | ... |

If there were no other entries in s2's flow table, then only traffic from 10.3.*.* would be forwarded to the hosts attached to s2.

Although we've only considered a few basic scenarios here, the versatility and advantages of generalized forwarding are hopefully apparent. In homework problems, we'll explore how flow tables can be used to create many different logical behaviors, including virtual networks---two or more logically separate networks (each with their own independent and distinct forwarding behavior)---that use the same physical set of packet switches and links. In Section 5.5, we'll return to flow tables when we study the SDN controllers that compute and distribute the flow tables, and the protocol used for communicating between a packet switch and its controller.

4.5 Summary

In this chapter we've covered the data plane functions of the network layer---the per-router functions that determine how packets arriving on one of a router's input links are forwarded to one of that router's output links. We began by taking a detailed look at the internal operations of a router, studying input and output port functionality and destination-based forwarding, a router's internal switching mechanism, packet queue management and more. We covered both traditional IP forwarding (where forwarding is based on a datagram's destination address) and generalized forwarding (where forwarding and other functions may be performed using values in several different fields in the datagram's header) and have seen the versatility of the latter approach.
We also studied the IPv4 and IPv6 protocols in detail, and Internet addressing, which we found to be much deeper, subtler, and more interesting than we might have expected. With our newfound understanding of the network-layer's data plane, we're now ready to dive into the network layer's control plane in Chapter 5!

Homework Problems and Questions

Chapter 4 Review Questions

SECTION 4.1

R1. Let's review some of the terminology used in this textbook. Recall that the name of a transport-layer packet is segment and that the name of a link-layer packet is frame. What is the name of a network-layer packet? Recall that both routers and link-layer switches are called packet switches. What is the fundamental difference between a router and link-layer switch?

R2. We noted that network layer functionality can be broadly divided into data plane functionality and control plane functionality. What are the main functions of the data plane? Of the control plane?

R3. We made a distinction between the forwarding function and the routing function performed in the network layer. What are the key differences between routing and forwarding?

R4. What is the role of the forwarding table within a router?

R5. We said that a network layer's service model "defines the characteristics of end-to-end transport of packets between sending and receiving hosts." What is the service model of the Internet's network layer? What guarantees are made by the Internet's service model regarding the host-to-host delivery of datagrams?

SECTION 4.2

R6. In Section 4.2, we saw that a router typically consists of input ports, output ports, a switching fabric and a routing processor. Which of these are implemented in hardware and which are implemented in software? Why? Returning to the notion of the network layer's data plane and control plane, which are implemented in hardware and which are implemented in software? Why?

R7. Discuss why each input port in a high-speed router stores a shadow copy of the forwarding table.

R8. What is meant by destination-based forwarding? How does this differ from generalized forwarding (assuming you've read Section 4.4, which of the two approaches is adopted by Software-Defined Networking)?

R9. Suppose that an arriving packet matches two or more entries in a router's forwarding table. With traditional destination-based forwarding, what rule does a router apply to determine which of these rules should be applied to determine the output port to which the arriving packet should be switched?

R10. Three types of switching fabrics are discussed in Section 4.2. List and briefly describe each type. Which, if any, can send multiple packets across the fabric in parallel?

R11. Describe how packet loss can occur at input ports. Describe how packet loss at input ports can be eliminated (without using infinite buffers).

R12. Describe how packet loss can occur at output ports. Can this loss be prevented by increasing the switch fabric speed?

R13. What is HOL blocking? Does it occur in input ports or output ports?

R14. In Section 4.2, we studied FIFO, Priority, Round Robin (RR), and Weighted Fair Queueing (WFQ) packet scheduling disciplines. Which of these queueing disciplines ensure that all packets depart in the order in which they arrived?

R15. Give an example showing why a network operator might want one class of packets to be given priority over another class of packets.

R16. What is an essential difference between RR and WFQ packet scheduling?
Is there a case (Hint: Consider the WFQ weights) where RR and WFQ will behave exactly the same?

SECTION 4.3

R17. Suppose Host A sends Host B a TCP segment encapsulated in an IP datagram. When Host B receives the datagram, how does the network layer in Host B know it should pass the segment (that is, the payload of the datagram) to TCP rather than to UDP or to some other upper-layer protocol?

R18. What field in the IP header can be used to ensure that a packet is forwarded through no more than N routers?

R19. Recall that we saw the Internet checksum being used in both transport-layer segments (in UDP and TCP headers, Figures 3.7 and 3.29 respectively) and in network-layer datagrams (IP header, Figure 4.16). Now consider a transport layer segment encapsulated in an IP datagram. Are the checksums in the segment header and datagram header computed over any common bytes in the IP datagram? Explain your answer.

R20. When a large datagram is fragmented into multiple smaller datagrams, where are these smaller datagrams reassembled into a single larger datagram?

R21. Do routers have IP addresses? If so, how many?

R22. What is the 32-bit binary equivalent of the IP address 223.1.3.27?

R23. Visit a host that uses DHCP to obtain its IP address, network mask, default router, and IP address of its local DNS server. List these values.

R24. Suppose there are three routers between a source host and a destination host. Ignoring fragmentation, an IP datagram sent from the source host to the destination host will travel over how many interfaces? How many forwarding tables will be indexed to move the datagram from the source to the destination?

R25. Suppose an application generates chunks of 40 bytes of data every 20 msec, and each chunk gets encapsulated in a TCP segment and then an IP datagram. What percentage of each datagram will be overhead, and what percentage will be application data?

R26. Suppose you purchase a wireless router and connect it to your cable modem. Also suppose that your ISP dynamically assigns your connected device (that is, your wireless router) one IP address. Also suppose that you have five PCs at home that use 802.11 to wirelessly connect to your wireless router. How are IP addresses assigned to the five PCs? Does the wireless router use NAT? Why or why not?

R27. What is meant by the term "route aggregation"? Why is it useful for a router to perform route aggregation?

R28. What is meant by a "plug-and-play" or "zeroconf" protocol?

R29. What is a private network address? Should a datagram with a private network address ever be present in the larger public Internet? Explain.

R30. Compare and contrast the IPv4 and the IPv6 header fields. Do they have any fields in common?

R31. It has been said that when IPv6 tunnels through IPv4 routers, IPv6 treats the IPv4 tunnels as link-layer protocols. Do you agree with this statement? Why or why not?

SECTION 4.4

R32. How does generalized forwarding differ from destination-based forwarding?

R33. What is the difference between a forwarding table that we encountered in destination-based forwarding in Section 4.1 and OpenFlow's flow table that we encountered in Section 4.4?

R34. What is meant by the "match plus action" operation of a router or switch? In the case of a destination-based forwarding packet switch, what is matched and what is the action taken? In the case of an SDN, name three fields that can be matched, and three actions that can be taken.
R35. Name three header fields in an IP datagram that can be "matched" in OpenFlow 1.0 generalized forwarding. What are three IP datagram header fields that cannot be "matched" in OpenFlow?

Problems

P1. Consider the network below.

a. Show the forwarding table in router A, such that all traffic destined to host H3 is forwarded through interface 3.

b. Can you write down a forwarding table in router A, such that all traffic from H1 destined to host H3 is forwarded through interface 3, while all traffic from H2 destined to host H3 is forwarded through interface 4? (Hint: This is a trick question.)

P2. Suppose two packets arrive to two different input ports of a router at exactly the same time. Also suppose there are no other packets anywhere in the router.

a. Suppose the two packets are to be forwarded to two different output ports. Is it possible to forward the two packets through the switch fabric at the same time when the fabric uses a shared bus?

b. Suppose the two packets are to be forwarded to two different output ports. Is it possible to forward the two packets through the switch fabric at the same time when the fabric uses switching via memory?

c. Suppose the two packets are to be forwarded to the same output port. Is it possible to forward the two packets through the switch fabric at the same time when the fabric uses a crossbar?

P3. In Section 4.2, we noted that the maximum queuing delay is (n--1)D if the switching fabric is n times faster than the input line rates. Suppose that all packets are of the same length, n packets arrive at the same time to the n input ports, and all n packets want to be forwarded to different output ports. What is the maximum delay for a packet for the (a) memory, (b) bus, and (c) crossbar switching fabrics?

P4. Consider the switch shown below. Suppose that all datagrams have the same fixed length, that the switch operates in a slotted, synchronous manner, and that in one time slot a datagram can be transferred from an input port to an output port. The switch fabric is a crossbar so that at most one datagram can be transferred to a given output port in a time slot, but different output ports can receive datagrams from different input ports in a single time slot. What is the minimal number of time slots needed to transfer the packets shown from input ports to their output ports, assuming any input queue scheduling order you want (i.e., it need not have HOL blocking)? What is the largest number of slots needed, assuming the worst-case scheduling order you can devise, assuming that a non-empty input queue is never idle?

P5. Consider a datagram network using 32-bit host addresses. Suppose a router has four links, numbered 0 through 3, and packets are to be forwarded to the link interfaces as follows:

| Destination Address Range | Link Interface |
|---|---|
| 11100000 00000000 00000000 00000000 through 11100000 00111111 11111111 11111111 | 0 |
| 11100000 01000000 00000000 00000000 through 11100000 01000000 11111111 11111111 | 1 |
| 11100000 01000001 00000000 00000000 through 11100001 01111111 11111111 11111111 | 2 |
| otherwise | 3 |

a. Provide a forwarding table that has five entries, uses longest prefix matching, and forwards packets to the correct link interfaces.
b. Describe how your forwarding table determines the appropriate link interface for datagrams with these destination addresses:

11001000 10010001 01010001 01010101
11100001 01000000 11000011 00111100
11100001 10000000 00010001 01110111

P6. Consider a datagram network using 8-bit host addresses. Suppose a router uses longest prefix matching and has the following forwarding table:

| Prefix Match | Interface |
|---|---|
| 00 | 0 |
| 010 | 1 |
| 011 | 2 |
| 10 | 2 |
| 11 | 3 |

For each of the four interfaces, give the associated range of destination host addresses and the number of addresses in the range.

P7. Consider a datagram network using 8-bit host addresses. Suppose a router uses longest prefix matching and has the following forwarding table:

| Prefix Match | Interface |
|---|---|
| 1 | 0 |
| 10 | 1 |
| 111 | 2 |
| otherwise | 3 |

For each of the four interfaces, give the associated range of destination host addresses and the number of addresses in the range.

P8. Consider a router that interconnects three subnets: Subnet 1, Subnet 2, and Subnet 3. Suppose all of the interfaces in each of these three subnets are required to have the prefix 223.1.17/24. Also suppose that Subnet 1 is required to support at least 60 interfaces, Subnet 2 is to support at least 90 interfaces, and Subnet 3 is to support at least 12 interfaces. Provide three network addresses (of the form a.b.c.d/x) that satisfy these constraints.

P9. In Section 4.2.2 an example forwarding table (using longest prefix matching) is given. Rewrite this forwarding table using the a.b.c.d/x notation instead of the binary string notation.

P10. In Problem P5 you are asked to provide a forwarding table (using longest prefix matching). Rewrite this forwarding table using the a.b.c.d/x notation instead of the binary string notation.

P11. Consider a subnet with prefix 128.119.40.128/26. Give an example of one IP address (of the form xxx.xxx.xxx.xxx) that can be assigned to this network. Suppose an ISP owns the block of addresses of the form 128.119.40.64/26. Suppose it wants to create four subnets from this block, with each block having the same number of IP addresses. What are the prefixes (of the form a.b.c.d/x) for the four subnets?

P12. Consider the topology shown in Figure 4.20. Denote the three subnets with hosts (starting clockwise at 12:00) as Networks A, B, and C. Denote the subnets without hosts as Networks D, E, and F.

a. Assign network addresses to each of these six subnets, with the following constraints: All addresses must be allocated from 214.97.254/23; Subnet A should have enough addresses to support 250 interfaces; Subnet B should have enough addresses to support 120 interfaces; and Subnet C should have enough addresses to support 120 interfaces. Of course, subnets D, E and F should each be able to support two interfaces. For each subnet, the assignment should take the form a.b.c.d/x or a.b.c.d/x -- e.f.g.h/y.

b. Using your answer to part (a), provide the forwarding tables (using longest prefix matching) for each of the three routers.

P13. Use the whois service at the American Registry for Internet Numbers (http://www.arin.net/whois) to determine the IP address blocks for three universities. Can the whois services be used to determine with certainty the geographical location of a specific IP address? Use www.maxmind.com to determine the locations of the Web servers at each of these universities.

P14. Consider sending a 2400-byte datagram into a link that has an MTU of 700 bytes.
Suppose the original datagram is stamped with the identification number 422. How many fragments are generated? What are the values in the various fields in the IP datagram(s) generated related to fragmentation?

P15. Suppose datagrams are limited to 1,500 bytes (including header) between source Host A and destination Host B. Assuming a 20-byte IP header, how many datagrams would be required to send an MP3 consisting of 5 million bytes? Explain how you computed your answer.

P16. Consider the network setup in Figure 4.25. Suppose that the ISP instead assigns the router the address 24.34.112.235 and that the network address of the home network is 192.168.1/24.

a. Assign addresses to all interfaces in the home network.

b. Suppose each host has two ongoing TCP connections, all to port 80 at host 128.119.40.86. Provide the six corresponding entries in the NAT translation table.

P17. Suppose you are interested in detecting the number of hosts behind a NAT. You observe that the IP layer stamps an identification number sequentially on each IP packet. The identification number of the first IP packet generated by a host is a random number, and the identification numbers of the subsequent IP packets are sequentially assigned. Assume all IP packets generated by hosts behind the NAT are sent to the outside world.

a. Based on this observation, and assuming you can sniff all packets sent by the NAT to the outside, can you outline a simple technique that detects the number of unique hosts behind a NAT? Justify your answer.

b. If the identification numbers are not sequentially assigned but randomly assigned, would your technique work? Justify your answer.

P18. In this problem we'll explore the impact of NATs on P2P applications. Suppose a peer with username Arnold discovers through querying that a peer with username Bernard has a file it wants to download. Also suppose that Bernard and Arnold are both behind a NAT. Try to devise a technique that will allow Arnold to establish a TCP connection with Bernard without application-specific NAT configuration. If you have difficulty devising such a technique, discuss why.

P19. Consider the SDN OpenFlow network shown in Figure 4.30. Suppose that the desired forwarding behavior for datagrams arriving at s2 is as follows: any datagrams arriving on input port 1 from hosts h5 or h6 that are destined to hosts h1 or h2 should be forwarded over output port 2; any datagrams arriving on input port 2 from hosts h1 or h2 that are destined to hosts h5 or h6 should be forwarded over output port 1; any arriving datagrams on input ports 1 or 2 and destined to hosts h3 or h4 should be delivered to the host specified; hosts h3 and h4 should be able to send datagrams to each other. Specify the flow table entries in s2 that implement this forwarding behavior.

P20. Consider again the SDN OpenFlow network shown in Figure 4.30. Suppose that the desired forwarding behavior for datagrams arriving from hosts h3 or h4 at s2 is as follows: any datagrams arriving from host h3 and destined for h1, h2, h5 or h6 should be forwarded in a clockwise direction in the network; any datagrams arriving from host h4 and destined for h1, h2, h5 or h6 should be forwarded in a counter-clockwise direction in the network. Specify the flow table entries in s2 that implement this forwarding behavior.

P21. Consider again the scenario from P19 above.
Give the flow table entries at packet switches s1 and s3, such that any arriving datagrams with a source address of h3 or h4 are routed to the destination hosts specified in the destination address field in the IP datagram. (Hint: Your forwarding table rules should include the cases that an arriving datagram is destined for a directly attached host or should be forwarded to a neighboring router for eventual host delivery there.)

P22. Consider again the SDN OpenFlow network shown in Figure 4.30. Suppose we want switch s2 to function as a firewall. Specify the flow table in s2 that implements the following firewall behaviors (specify a different flow table for each of the four firewalling behaviors below) for delivery of datagrams destined to h3 and h4. You do not need to specify the forwarding behavior in s2 that forwards traffic to other routers.

- Only traffic arriving from hosts h1 and h6 should be delivered to hosts h3 or h4 (i.e., arriving traffic from hosts h2 and h5 is blocked).
- Only TCP traffic is allowed to be delivered to hosts h3 or h4 (i.e., UDP traffic is blocked).
- Only traffic destined to h3 is to be delivered (i.e., all traffic to h4 is blocked).
- Only UDP traffic from h1 and destined to h3 is to be delivered. All other traffic is blocked.

Wireshark Lab

In the Web site for this textbook, www.pearsonhighered.com/cs-resources, you'll find a Wireshark lab assignment that examines the operation of the IP protocol, and the IP datagram format in particular.

AN INTERVIEW WITH... Vinton G. Cerf

Vinton G. Cerf is Vice President and Chief Internet Evangelist for Google. He served for over 16 years at MCI in various positions, ending his tenure there as Senior Vice President for Technology Strategy. He is widely known as the co-designer of the TCP/IP protocols and the architecture of the Internet. During his time from 1976 to 1982 at the US Department of Defense Advanced Research Projects Agency (DARPA), he played a key role leading the development of Internet and Internet-related data packet and security techniques. He received the US Presidential Medal of Freedom in 2005 and the US National Medal of Technology in 1997. He holds a BS in Mathematics from Stanford University and an MS and PhD in computer science from UCLA.

What brought you to specialize in networking?

I was working as a programmer at UCLA in the late 1960s. My job was supported by the US Defense Advanced Research Projects Agency (called ARPA then, called DARPA now). I was working in the laboratory of Professor Leonard Kleinrock on the Network Measurement Center of the newly created ARPAnet. The first node of the ARPAnet was installed at UCLA on September 1, 1969. I was responsible for programming a computer that was used to capture performance information about the ARPAnet and to report this information back for comparison with mathematical models and predictions of the performance of the network. Several of the other graduate students and I were made responsible for working on the so-called host-level protocols of the ARPAnet---the procedures and formats that would allow many different kinds of computers on the network to interact with each other. It was a fascinating exploration into a new world (for me) of distributed computing and communication.

Did you imagine that IP would become as pervasive as it is today when you first designed the protocol?
When Bob Kahn and I first worked on this in 1973, I think we +were mostly very focused on the central question: How can we make +heterogeneous packet networks interoperate with one another, assuming we +cannot actually change the networks themselves? We hoped that we could +find a way to permit an arbitrary collection of packet-switched networks +to be interconnected in a transparent fashion, so that host computers +could communicate end-to-end without having to do any translations in +between. I think we knew that we were dealing with powerful and +expandable technology, but I doubt we had a clear image of what the +world would be like with hundreds of millions of computers all +interlinked on the Internet. What do you now envision for the future of +networking and the Internet? What major challenges/obstacles do you +think lie ahead in their development? I believe the Internet itself and +networks in general will continue to proliferate. Already there is +convincing evidence that there will be billions of Internet-enabled +devices on the Internet, including appliances like cell phones, +refrigerators, personal digital assistants, home servers, televisions, +as well as the usual array of laptops, servers, and so on. Big +challenges include support for mobility, battery life, capacity of the +access links to the network, and ability to scale the optical core of +the network up in an unlimited fashion. Designing an interplanetary +extension of the Internet is a project in which I am deeply engaged at +the Jet Propulsion Laboratory. We will need to cut over from IPv4 +\[32-bit addresses\] to IPv6 \[128 bits\]. The list is long! Who has +inspired you professionally? My colleague Bob Kahn; my thesis advisor, +Gerald Estrin; my best friend, Steve Crocker (we met in high school and +he introduced me to computers in 1960!); and the thousands of engineers +who continue to evolve the Internet today. Do you have any advice for +students entering the networking/Internet field? Think outside the +limitations of existing systems---imagine what might be possible; but +then do the hard work of figuring out how to get there from the current +state of affairs. Dare to dream: A half dozen colleagues and I at the +Jet Propulsion Laboratory have been working on the design of an +interplanetary extension of the terrestrial Internet. It may take +decades to implement this, + +mission by mission, but to paraphrase: "A man's reach should exceed his +grasp, or what are the heavens for?" + +Chapter 5 The Network Layer: Control Plane + +In this chapter, we'll complete our journey through the network layer by +covering the control-plane component of the network layer---the +network-wide logic that controls not only how a datagram is forwarded +among routers along an end-to-end path from the source host to the +destination host, but also how network-layer components and services are +configured and managed. In Section 5.2, we'll cover traditional routing +algorithms for computing least cost paths in a graph; these algorithms +are the basis for two widely deployed Internet routing protocols: OSPF +and BGP, that we'll cover in Sections 5.3 and 5.4, respectively. As +we'll see, OSPF is a routing protocol that operates within a single +ISP's network. BGP is a routing protocol that serves to interconnect all +of the networks in the Internet; BGP is thus often referred to as the +"glue" that holds the Internet together. 
Traditionally, control-plane +routing protocols have been implemented together with data-plane +forwarding functions, monolithically, within a router. As we learned in +the introduction to Chapter 4, software-defined networking (SDN) makes a +clear separation between the data and control planes, implementing +control-plane functions in a separate "controller" service that is +distinct, and remote, from the forwarding components of the routers it +controls. We'll cover SDN controllers in Section 5.5. In Sections 5.6 +and 5.7 we'll cover some of the nuts and bolts of managing an IP +network: ICMP (the Internet Control Message Protocol) and SNMP (the +Simple Network Management Protocol). + +5.1 Introduction Let's quickly set the context for our study of the +network control plane by recalling Figures 4.2 and 4.3. There, we saw +that the forwarding table (in the case of destination-based forwarding) +and the flow table (in the case of generalized forwarding) were the +principal elements that linked the network layer's data and control +planes. We learned that these tables specify the local data-plane +forwarding behavior of a router. We saw that in the case of generalized +forwarding, the actions taken (Section 4.4.2) could include not only +forwarding a packet to a router's output port, but also dropping a +packet, replicating a packet, and/or rewriting layer 2, 3 or 4 +packet-header fields. In this chapter, we'll study how those forwarding +and flow tables are computed, maintained and installed. In our +introduction to the network layer in Section 4.1, we learned that there +are two possible approaches for doing so. Per-router control. Figure 5.1 +illustrates the case where a routing algorithm runs in each and every +router; both a forwarding and a routing function are contained + +Figure 5.1 Per-router control: Individual routing algorithm components +interact in the control plane + +within each router. Each router has a routing component that +communicates with the routing components in other routers to compute the +values for its forwarding table. This per-router control approach has +been used in the Internet for decades. The OSPF and BGP protocols that +we'll study in Sections 5.3 and 5.4 are based on this per-router +approach to control. Logically centralized control. Figure 5.2 +illustrates the case in which a logically centralized controller +computes and distributes the forwarding tables to be used by each and +every router. As we saw in Section 4.4, the generalized +match-plus-action abstraction allows the router to perform traditional +IP forwarding as well as a rich set of other functions (load sharing, +firewalling, and NAT) that had been previously implemented in separate +middleboxes. + +Figure 5.2 Logically centralized control: A distinct, typically remote, +controller interacts with local control agents (CAs) + +The controller interacts with a control agent (CA) in each of the +routers via a well-defined protocol to configure and manage that +router's flow table. Typically, the CA has minimum functionality; its +job is to communicate with the controller, and to do as the controller +commands. Unlike the routing algorithms in Figure 5.1, the CAs do not +directly interact with each other nor do they actively take part in +computing + +the forwarding table. This is a key distinction between per-router +control and logically centralized control. 
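To make this division of labor concrete, here is a minimal sketch in Python. The class and method names (Controller, ControlAgent, push_tables, install) are invented for illustration and do not correspond to any real SDN controller API; the point is simply that the controller computes every router's table, while each CA only installs what it is told and never talks to other CAs.

```python
# A minimal, invented sketch of logically centralized control: the
# controller computes all forwarding state; control agents (CAs) on the
# routers merely install it and do not interact with one another.

class ControlAgent:
    """Runs on one router; installs whatever the controller sends."""
    def __init__(self, router_id):
        self.router_id = router_id
        self.flow_table = {}               # match -> action

    def install(self, flow_table):         # called only by the controller
        self.flow_table = dict(flow_table)

class Controller:
    """Logically centralized: sees the whole topology, computes all tables."""
    def __init__(self, agents):
        self.agents = {a.router_id: a for a in agents}

    def push_tables(self, tables):         # tables: router_id -> flow table
        for router_id, table in tables.items():
            self.agents[router_id].install(table)

# The controller, not the routers, decides how traffic to h3 is forwarded.
ctrl = Controller([ControlAgent("s1"), ControlAgent("s2")])
ctrl.push_tables({"s1": {("dst", "h3"): "forward(3)"},
                  "s2": {("dst", "h3"): "forward(1)"}})
```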
By "logically centralized" +control \[Levin 2012\] we mean that the routing control service is +accessed as if it were a single central service point, even though the +service is likely to be implemented via multiple servers for +fault-tolerance, and performance scalability reasons. As we will see in +Section 5.5, SDN adopts this notion of a logically centralized +controller---an approach that is finding increased use in production +deployments. Google uses SDN to control the routers in its internal B4 +global wide-area network that interconnects its data centers \[Jain +2013\]. SWAN \[Hong 2013\], from Microsoft Research, uses a logically +centralized controller to manage routing and forwarding between a wide +area network and a data center network. China Telecom and China Unicom +are using SDN both within data centers and between data centers \[Li +2015\]. AT&T has noted \[AT&T 2013\] that it "supports many SDN +capabilities and independently defined, proprietary mechanisms that fall +under the SDN architectural framework." + +5.2 Routing Algorithms In this section we'll study routing algorithms, +whose goal is to determine good paths (equivalently, routes), from +senders to receivers, through the network of routers. Typically, a +"good" path is one that has the least cost. We'll see that in practice, +however, real-world concerns such as policy issues (for example, a rule +such as "router x, belonging to organization Y, should not forward any +packets originating from the network owned by organization Z") also come +into play. We note that whether the network control plane adopts a +per-router control approach or a logically centralized approach, there +must always be a welldefined sequence of routers that a packet will +cross in traveling from sending to receiving host. Thus, the routing +algorithms that compute these paths are of fundamental importance, and +another candidate for our top-10 list of fundamentally important +networking concepts. A graph is used to formulate routing problems. +Recall that a graph G=(N, E) is a set N of nodes and a collection E of +edges, where each edge is a pair of nodes from N. In the context of +network-layer routing, the nodes in the graph represent + +Figure 5.3 Abstract graph model of a computer network + +routers---the points at which packet-forwarding decisions are made---and +the edges connecting these nodes represent the physical links between +these routers. Such a graph abstraction of a computer network is shown +in Figure 5.3. To view some graphs representing real network maps, see +\[Dodge 2016, Cheswick 2000\]; for a discussion of how well different +graph-based models model the Internet, see \[Zegura 1997, Faloutsos +1999, Li 2004\]. As shown in Figure 5.3, an edge also has a value +representing its cost. Typically, an edge's cost may reflect the +physical length of the corresponding link (for example, a transoceanic +link might have a higher + +cost than a short-haul terrestrial link), the link speed, or the +monetary cost associated with a link. For our purposes, we'll simply +take the edge costs as a given and won't worry about how they are +determined. For any edge (x, y) in E, we denote c(x, y) as the cost of +the edge between nodes x and y. If the pair (x, y) does not belong to E, +we set c(x, y)=∞. 
Also, we'll only consider undirected graphs (i.e., graphs whose edges do not have a direction) in our discussion here, so that edge (x, y) is the same as edge (y, x) and c(x, y)=c(y, x); however, the algorithms we'll study can be easily extended to the case of directed links with a different cost in each direction. Also, a node y is said to be a neighbor of node x if (x, y) belongs to E.

Given that costs are assigned to the various edges in the graph abstraction, a natural goal of a routing algorithm is to identify the least costly paths between sources and destinations. To make this problem more precise, recall that a path in a graph G=(N, E) is a sequence of nodes $(x_1, x_2, \ldots, x_p)$ such that each of the pairs $(x_1, x_2), (x_2, x_3), \ldots, (x_{p-1}, x_p)$ are edges in E. The cost of a path $(x_1, x_2, \ldots, x_p)$ is simply the sum of all the edge costs along the path, that is, $c(x_1, x_2) + c(x_2, x_3) + \cdots + c(x_{p-1}, x_p)$. Given any two nodes x and y, there are typically many paths between the two nodes, with each path having a cost. One or more of these paths is a least-cost path. The least-cost problem is therefore clear: Find a path between the source and destination that has least cost. In Figure 5.3, for example, the least-cost path between source node u and destination node w is (u, x, y, w) with a path cost of 3. Note that if all edges in the graph have the same cost, the least-cost path is also the shortest path (that is, the path with the smallest number of links between the source and the destination).

As a simple exercise, try finding the least-cost path from node u to z in Figure 5.3 and reflect for a moment on how you calculated that path. If you are like most people, you found the path from u to z by examining Figure 5.3, tracing a few routes from u to z, and somehow convincing yourself that the path you had chosen had the least cost among all possible paths. (Did you check all of the 17 possible paths between u and z? Probably not!) Such a calculation is an example of a centralized routing algorithm---the routing algorithm was run in one location, your brain, with complete information about the network.

Broadly, one way in which we can classify routing algorithms is according to whether they are centralized or decentralized. A centralized routing algorithm computes the least-cost path between a source and destination using complete, global knowledge about the network. That is, the algorithm takes the connectivity between all nodes and all link costs as inputs. This then requires that the algorithm somehow obtain this information before actually performing the calculation. The calculation itself can be run at one site (e.g., a logically centralized controller as in Figure 5.2) or could be replicated in the routing component of each and every router (e.g., as in Figure 5.1). The key distinguishing feature here, however, is that the algorithm has complete information about connectivity and link costs. Algorithms with global state information are often referred to as link-state (LS) algorithms, since the algorithm must be aware of the cost of each link in the network. We'll study LS algorithms in Section 5.2.1.

In a decentralized routing algorithm, the calculation of the least-cost path is carried out in an iterative, distributed manner by the routers. No node has complete information about the costs of all network links. Instead, each node begins with only the knowledge of the costs of its own directly attached links.
Then, through an iterative process of calculation and exchange of information with its neighboring nodes, a node gradually calculates the least-cost path to a destination or set of destinations. The decentralized routing algorithm we'll study below in Section 5.2.2 is called a distance-vector (DV) algorithm, because each node maintains a vector of estimates of the costs (distances) to all other nodes in the network. Such decentralized algorithms, with interactive message exchange between neighboring routers, are perhaps more naturally suited to control planes where the routers interact directly with each other, as in Figure 5.1.

A second broad way to classify routing algorithms is according to whether they are static or dynamic. In static routing algorithms, routes change very slowly over time, often as a result of human intervention (for example, a human manually editing link costs). Dynamic routing algorithms change the routing paths as the network traffic loads or topology change. A dynamic algorithm can be run either periodically or in direct response to topology or link cost changes. While dynamic algorithms are more responsive to network changes, they are also more susceptible to problems such as routing loops and route oscillation.

A third way to classify routing algorithms is according to whether they are load-sensitive or load-insensitive. In a load-sensitive algorithm, link costs vary dynamically to reflect the current level of congestion in the underlying link. If a high cost is associated with a link that is currently congested, a routing algorithm will tend to choose routes around such a congested link. While early ARPAnet routing algorithms were load-sensitive \[McQuillan 1980\], a number of difficulties were encountered \[Huitema 1998\]. Today's Internet routing algorithms (such as RIP, OSPF, and BGP) are load-insensitive, as a link's cost does not explicitly reflect its current (or recent past) level of congestion.

5.2.1 The Link-State (LS) Routing Algorithm

Recall that in a link-state algorithm, the network topology and all link costs are known, that is, available as input to the LS algorithm. In practice this is accomplished by having each node broadcast link-state packets to all other nodes in the network, with each link-state packet containing the identities and costs of its attached links. In practice (for example, with the Internet's OSPF routing protocol, discussed in Section 5.3) this is often accomplished by a link-state broadcast algorithm \[Perlman 1999\]. The result of the nodes' broadcast is that all nodes have an identical and complete view of the network. Each node can then run the LS algorithm and compute the same set of least-cost paths as every other node.

The link-state routing algorithm we present below is known as Dijkstra's algorithm, named after its inventor. A closely related algorithm is Prim's algorithm; see \[Cormen 2001\] for a general discussion of graph algorithms. Dijkstra's algorithm computes the least-cost path from one node (the source, which we will refer to as u) to all other nodes in the network. Dijkstra's algorithm is iterative and has the property that after the kth iteration of the algorithm, the least-cost paths are known to k destination nodes, and among the least-cost paths to all destination nodes, these k paths will have the k smallest costs.
Let us define the following notation:

D(v): cost of the least-cost path from the source node to destination v as of this iteration of the algorithm.

p(v): previous node (neighbor of v) along the current least-cost path from the source to v.

N′: subset of nodes; v is in N′ if the least-cost path from the source to v is definitively known.

The centralized routing algorithm consists of an initialization step followed by a loop. The number of times the loop is executed is equal to the number of nodes in the network. Upon termination, the algorithm will have calculated the shortest paths from the source node u to every other node in the network.

Link-State (LS) Algorithm for Source Node u

```
1   Initialization:
2     N' = {u}
3     for all nodes v
4       if v is a neighbor of u
5         then D(v) = c(u, v)
6         else D(v) = ∞
7
8   Loop
9     find w not in N' such that D(w) is a minimum
10    add w to N'
11    update D(v) for each neighbor v of w and not in N':
12      D(v) = min( D(v), D(w) + c(w, v) )
13    /* new cost to v is either old cost to v or known
14       least path cost to w plus cost from w to v */
15  until N' = N
```

As an example, let's consider the network in Figure 5.3 and compute the least-cost paths from u to all possible destinations. A tabular summary of the algorithm's computation is shown in Table 5.1, where each line in the table gives the values of the algorithm's variables at the end of the iteration. Let's consider the first few steps in detail.

Table 5.1 Running the link-state algorithm on the network in Figure 5.3

| Step | N′ | D(v), p(v) | D(w), p(w) | D(x), p(x) | D(y), p(y) | D(z), p(z) |
|---|---|---|---|---|---|---|
| 0 | u | 2, u | 5, u | 1, u | ∞ | ∞ |
| 1 | ux | 2, u | 4, x | | 2, x | ∞ |
| 2 | uxy | 2, u | 3, y | | | 4, y |
| 3 | uxyv | | 3, y | | | 4, y |
| 4 | uxyvw | | | | | 4, y |
| 5 | uxyvwz | | | | | |

In the initialization step, the currently known least-cost paths from u to its directly attached neighbors, v, x, and w, are initialized to 2, 1, and 5, respectively. Note in particular that the cost to w is set to 5 (even though we will soon see that a lesser-cost path does indeed exist) since this is the cost of the direct (one hop) link from u to w. The costs to y and z are set to infinity because they are not directly connected to u.

In the first iteration, we look among those nodes not yet added to the set N′ and find that node with the least cost as of the end of the previous iteration. That node is x, with a cost of 1, and thus x is added to the set N′. Line 12 of the LS algorithm is then performed to update D(v) for all nodes v, yielding the results shown in the second line (Step 1) in Table 5.1. The cost of the path to v is unchanged. The cost of the path to w (which was 5 at the end of the initialization) through node x is found to have a cost of 4. Hence this lower-cost path is selected and w's predecessor along the shortest path from u is set to x. Similarly, the cost to y (through x) is computed to be 2, and the table is updated accordingly.

In the second iteration, nodes v and y are found to have the least-cost paths (2), and we break the tie arbitrarily and add y to the set N′ so that N′ now contains u, x, and y. The cost to the remaining nodes not yet in N′, that is, nodes v, w, and z, are updated via line 12 of the LS algorithm, yielding the results shown in the third row in Table 5.1. And so on . . .
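As a concrete companion to the pseudocode and Table 5.1, here is a short, runnable Python sketch of the algorithm. It is our own illustration, not code from any deployed protocol; the edge costs are those used in the worked example above, and ties in line 9 are broken arbitrarily, just as in the second iteration of Table 5.1.

```python
# A runnable sketch of the LS algorithm above, applied to the network of
# Figure 5.3. The comments refer to the numbered lines of the pseudocode.
import math

# Edge costs c(x, y), as used in the Table 5.1 example
cost = {
    "u": {"v": 2, "w": 5, "x": 1},
    "v": {"u": 2, "w": 3, "x": 2},
    "w": {"u": 5, "v": 3, "x": 3, "y": 1, "z": 5},
    "x": {"u": 1, "v": 2, "w": 3, "y": 1},
    "y": {"w": 1, "x": 1, "z": 2},
    "z": {"w": 5, "y": 2},
}

def link_state(u):
    nodes = set(cost)
    N_prime = {u}                                        # line 2
    D = {v: cost[u].get(v, math.inf) for v in nodes}     # lines 3-6
    D[u] = 0
    p = {v: u for v in cost[u]}                          # predecessors p(v)
    while N_prime != nodes:                              # lines 8-15
        w = min(nodes - N_prime, key=lambda v: D[v])     # line 9, ties arbitrary
        N_prime.add(w)                                   # line 10
        for v, c_wv in cost[w].items():                  # lines 11-12
            if v not in N_prime and D[w] + c_wv < D[v]:
                D[v], p[v] = D[w] + c_wv, w
    return D, p

D, p = link_state("u")
print(D)  # {'u': 0, 'v': 2, 'w': 3, 'x': 1, 'y': 2, 'z': 4}, matching Table 5.1
```

Walking the predecessor entries p(v) back toward the source recovers each least-cost path, which is exactly how the forwarding table discussed next is obtained.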
When the LS algorithm terminates, we have, for each node, its predecessor along the least-cost path from the source node. For each predecessor, we also have its predecessor, and so in this manner we can construct the entire path from the source to all destinations. The forwarding table in a node, say node u, can then be constructed from this information by storing, for each destination, the next-hop node on the least-cost path from u to the destination. Figure 5.4 shows the resulting least-cost paths and forwarding table in u for the network in Figure 5.3.

Figure 5.4 Least cost path and forwarding table for node u

What is the computational complexity of this algorithm? That is, given n nodes (not counting the source), how much computation must be done in the worst case to find the least-cost paths from the source to all destinations? In the first iteration, we need to search through all n nodes to determine the node, w, not in N′ that has the minimum cost. In the second iteration, we need to check n−1 nodes to determine the minimum cost; in the third iteration n−2 nodes, and so on. Overall, the total number of nodes we need to search through over all the iterations is n(n+1)/2, and thus we say that the preceding implementation of the LS algorithm has worst-case complexity of order n squared: $O(n^2)$. (A more sophisticated implementation of this algorithm, using a data structure known as a heap, can find the minimum in line 9 in logarithmic rather than linear time, thus reducing the complexity.)

Before completing our discussion of the LS algorithm, let us consider a pathology that can arise. Figure 5.5 shows a simple network topology where link costs are equal to the load carried on the link, for example, reflecting the delay that would be experienced. In this example, link costs are not symmetric; that is, c(u, v) equals c(v, u) only if the load carried on both directions on the link (u, v) is the same. In this example, node z originates a unit of traffic destined for w, node x also originates a unit of traffic destined for w, and node y injects an amount of traffic equal to e, also destined for w. The initial routing is shown in Figure 5.5(a) with the link costs corresponding to the amount of traffic carried.

Figure 5.5 Oscillations with congestion-sensitive routing

When the LS algorithm is next run, node y determines (based on the link costs shown in Figure 5.5(a)) that the clockwise path to w has a cost of 1, while the counterclockwise path to w (which it had been using) has a cost of 1+e. Hence y's least-cost path to w is now clockwise. Similarly, x determines that its new least-cost path to w is also clockwise, resulting in costs shown in Figure 5.5(b). When the LS algorithm is run next, nodes x, y, and z all detect a zero-cost path to w in the counterclockwise direction, and all route their traffic to the counterclockwise routes. The next time the LS algorithm is run, x, y, and z all then route their traffic to the clockwise routes.

What can be done to prevent such oscillations (which can occur in any algorithm, not just an LS algorithm, that uses a congestion or delay-based link metric)? One solution would be to mandate that link costs not depend on the amount of traffic carried---an unacceptable solution since one goal of routing is to avoid highly congested (for example, high-delay) links. Another solution is to ensure that not all routers run the LS algorithm at the same time.
This seems a more reasonable solution, since we would hope that even if routers ran the LS algorithm with the same periodicity, the execution instance of the algorithm would not be the same at each node. Interestingly, researchers have found that routers in the Internet can self-synchronize among themselves \[Floyd Synchronization 1994\]. That is, even though they initially execute the algorithm with the same period but at different instants of time, the algorithm execution instance can eventually become, and remain, synchronized at the routers. One way to avoid such self-synchronization is for each router to randomize the time it sends out a link advertisement.

Having studied the LS algorithm, let's consider the other major routing algorithm that is used in practice today---the distance-vector routing algorithm.

5.2.2 The Distance-Vector (DV) Routing Algorithm

Whereas the LS algorithm is an algorithm using global information, the distance-vector (DV) algorithm is iterative, asynchronous, and distributed. It is distributed in that each node receives some information from one or more of its directly attached neighbors, performs a calculation, and then distributes the results of its calculation back to its neighbors. It is iterative in that this process continues on until no more information is exchanged between neighbors. (Interestingly, the algorithm is also self-terminating---there is no signal that the computation should stop; it just stops.) The algorithm is asynchronous in that it does not require all of the nodes to operate in lockstep with each other. We'll see that an asynchronous, iterative, self-terminating, distributed algorithm is much more interesting and fun than a centralized algorithm!

Before we present the DV algorithm, it will prove beneficial to discuss an important relationship that exists among the costs of the least-cost paths. Let dx(y) be the cost of the least-cost path from node x to node y. Then the least costs are related by the celebrated Bellman-Ford equation, namely,

$$d_x(y) = \min_v\{c(x, v) + d_v(y)\} \qquad (5.1)$$

where the $\min_v$ in the equation is taken over all of x's neighbors. The Bellman-Ford equation is rather intuitive. Indeed, after traveling from x to v, if we then take the least-cost path from v to y, the path cost will be c(x, v)+dv(y). Since we must begin by traveling to some neighbor v, the least cost from x to y is the minimum of c(x, v)+dv(y) taken over all neighbors v.

But for those who might be skeptical about the validity of the equation, let's check it for source node u and destination node z in Figure 5.3. The source node u has three neighbors: nodes v, x, and w. By walking along various paths in the graph, it is easy to see that dv(z)=5, dx(z)=3, and dw(z)=3. Plugging these values into Equation 5.1, along with the costs c(u, v)=2, c(u, x)=1, and c(u, w)=5, gives du(z)=min{2+5, 5+3, 1+3}=4, which is obviously true and which is exactly what the Dijkstra algorithm gave us for the same network. This quick verification should help relieve any skepticism you may have.

The Bellman-Ford equation is not just an intellectual curiosity. It actually has significant practical importance: the solution to the Bellman-Ford equation provides the entries in node x's forwarding table. To see this, let v\* be any neighboring node that achieves the minimum in Equation 5.1. Then, if node x wants to send a packet to node y along a least-cost path, it should first forward the packet to node v\*.
Thus, node x's forwarding table would specify node v\* as the next-hop router for the ultimate destination y. Another important practical contribution of the Bellman-Ford equation is that it suggests the form of the neighbor-to-neighbor communication that will take place in the DV algorithm.

The basic idea is as follows. Each node x begins with Dx(y), an estimate of the cost of the least-cost path from itself to node y, for all nodes, y, in N. Let Dx=\[Dx(y): y in N\] be node x's distance vector, which is the vector of cost estimates from x to all other nodes, y, in N. With the DV algorithm, each node x maintains the following routing information:

- For each neighbor v, the cost c(x, v) from x to directly attached neighbor, v
- Node x's distance vector, that is, Dx=\[Dx(y): y in N\], containing x's estimate of its cost to all destinations, y, in N
- The distance vectors of each of its neighbors, that is, Dv=\[Dv(y): y in N\] for each neighbor v of x

In the distributed, asynchronous algorithm, from time to time, each node sends a copy of its distance vector to each of its neighbors. When a node x receives a new distance vector from any of its neighbors w, it saves w's distance vector, and then uses the Bellman-Ford equation to update its own distance vector as follows:

$$D_x(y) = \min_v\{c(x, v) + D_v(y)\} \qquad \text{for each node } y \text{ in } N$$

If node x's distance vector has changed as a result of this update step, node x will then send its updated distance vector to each of its neighbors, which can in turn update their own distance vectors. Miraculously enough, as long as all the nodes continue to exchange their distance vectors in an asynchronous fashion, each cost estimate Dx(y) converges to dx(y), the actual cost of the least-cost path from node x to node y \[Bertsekas 1991\]!

Distance-Vector (DV) Algorithm

At each node, x:

```
1   Initialization:
2     for all destinations y in N:
3       Dx(y) = c(x, y)  /* if y is not a neighbor then c(x, y) = ∞ */
4     for each neighbor w
5       Dw(y) = ? for all destinations y in N
6     for each neighbor w
7       send distance vector Dx = [Dx(y): y in N] to w
8
9   loop
10    wait (until I see a link cost change to some neighbor w or
11          until I receive a distance vector from some neighbor w)
12
13    for each y in N:
14      Dx(y) = min_v{c(x, v) + Dv(y)}
15
16    if Dx(y) changed for any destination y
17      send distance vector Dx = [Dx(y): y in N] to all neighbors
18
19  forever
```

In the DV algorithm, a node x updates its distance-vector estimate when it either sees a cost change in one of its directly attached links or receives a distance-vector update from some neighbor. But to update its own forwarding table for a given destination y, what node x really needs to know is not the shortest-path distance to y but instead the neighboring node v\*(y) that is the next-hop router along the shortest path to y. As you might expect, the next-hop router v\*(y) is the neighbor v that achieves the minimum in Line 14 of the DV algorithm. (If there are multiple neighbors v that achieve the minimum, then v\*(y) can be any of the minimizing neighbors.) Thus, in Lines 13--14, for each destination y, node x also determines v\*(y) and updates its forwarding table for destination y.

Recall that the LS algorithm is a centralized algorithm in the sense that it requires each node to first obtain a complete map of the network before running the Dijkstra algorithm. The DV algorithm is decentralized and does not use such global information.
Indeed, the only information a node will have is the costs of the links to its directly attached neighbors and information it receives from these neighbors. Each node waits for an update from any neighbor (Lines 10--11), calculates its new distance vector when receiving an update (Line 14), and distributes its new distance vector to its neighbors (Lines 16--17). DV-like algorithms are used in many routing protocols in practice, including the Internet's RIP and BGP, ISO IDRP, Novell IPX, and the original ARPAnet.

Figure 5.6 illustrates the operation of the DV algorithm for the simple three-node network shown at the top of the figure. The operation of the algorithm is illustrated in a synchronous manner, where all nodes simultaneously receive distance vectors from their neighbors, compute their new distance vectors, and inform their neighbors if their distance vectors have changed. After studying this example, you should convince yourself that the algorithm operates correctly in an asynchronous manner as well, with node computations and update generation/reception occurring at any time.

Figure 5.6 Distance-vector (DV) algorithm in operation

The leftmost column of the figure displays three initial routing tables for each of the three nodes. For example, the table in the upper-left corner is node x's initial routing table. Within a specific routing table, each row is a distance vector---specifically, each node's routing table includes its own distance vector and that of each of its neighbors. Thus, the first row in node x's initial routing table is Dx=\[Dx(x), Dx(y), Dx(z)\]=\[0, 2, 7\]. The second and third rows in this table are the most recently received distance vectors from nodes y and z, respectively. Because at initialization node x has not received anything from node y or z, the entries in the second and third rows are initialized to infinity.

After initialization, each node sends its distance vector to each of its two neighbors. This is illustrated in Figure 5.6 by the arrows from the first column of tables to the second column of tables. For example, node x sends its distance vector Dx = \[0, 2, 7\] to both nodes y and z. After receiving the updates, each node recomputes its own distance vector. For example, node x computes

$$\begin{aligned}
D_x(x) &= 0\\
D_x(y) &= \min\{c(x,y) + D_y(y),\; c(x,z) + D_z(y)\} = \min\{2+0,\, 7+1\} = 2\\
D_x(z) &= \min\{c(x,y) + D_y(z),\; c(x,z) + D_z(z)\} = \min\{2+1,\, 7+0\} = 3
\end{aligned}$$

The second column therefore displays, for each node, the node's new distance vector along with distance vectors just received from its neighbors. Note, for example, that node x's estimate for the least cost to node z, Dx(z), has changed from 7 to 3. Also note that for node x, neighboring node y achieves the minimum in line 14 of the DV algorithm; thus at this stage of the algorithm, we have at node x that v\*(y)=y and v\*(z)=y.

After the nodes recompute their distance vectors, they again send their updated distance vectors to their neighbors (if there has been a change). This is illustrated in Figure 5.6 by the arrows from the second column of tables to the third column of tables. Note that only nodes x and z send updates: node y's distance vector didn't change so node y doesn't send an update. After receiving the updates, the nodes then recompute their distance vectors and update their routing tables, which are shown in the third column.
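The synchronous rounds of Figure 5.6 are easy to reproduce in a few lines of code. The sketch below is our own illustration (not RIP or BGP code) of the update in line 14, run on the three-node network of Figure 5.6 with link costs c(x, y)=2, c(x, z)=7, and c(y, z)=1; it iterates until no distance vector changes, that is, until the algorithm is quiescent.

```python
# Synchronous DV iteration for the three-node network of Figure 5.6.
import math

# Link costs of the network at the top of Figure 5.6
c = {"x": {"y": 2, "z": 7},
     "y": {"x": 2, "z": 1},
     "z": {"x": 7, "y": 1}}
nodes = sorted(c)

# Initialization: Dx(y) = c(x, y); 0 to itself (∞ would apply to non-neighbors)
D = {x: {y: 0 if y == x else c[x].get(y, math.inf) for y in nodes}
     for x in nodes}

while True:
    # One synchronous round: every node applies line 14 of the DV algorithm,
    # Dx(y) = min over neighbors v of { c(x, v) + Dv(y) }
    new = {x: {y: 0 if y == x else min(c[x][v] + D[v][y] for v in c[x])
               for y in nodes}
           for x in nodes}
    if new == D:     # quiescent: nothing changed, so no updates are sent
        break
    D = new

print(D["x"])  # {'x': 0, 'y': 2, 'z': 3} -- Dx(z) has fallen from 7 to 3
```

Replacing the synchronous loop with per-node events (a link-cost change or an arriving vector) gives the asynchronous behavior described in the text; the converged vectors are the same.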
The process of receiving updated distance vectors from neighbors, recomputing routing table entries, and informing neighbors of changed costs of the least-cost path to a destination continues until no update messages are sent. At this point, since no update messages are sent, no further routing table calculations will occur and the algorithm will enter a quiescent state; that is, all nodes will be performing the wait in Lines 10--11 of the DV algorithm. The algorithm remains in the quiescent state until a link cost changes, as discussed next.

Distance-Vector Algorithm: Link-Cost Changes and Link Failure

When a node running the DV algorithm detects a change in the link cost from itself to a neighbor (Lines 10--11), it updates its distance vector (Lines 13--14) and, if there's a change in the cost of the least-cost path, informs its neighbors (Lines 16--17) of its new distance vector. Figure 5.7(a) illustrates a scenario where the link cost from y to x changes from 4 to 1. We focus here only on y's and z's distance table entries to destination x. The DV algorithm causes the following sequence of events to occur:

- At time t0, y detects the link-cost change (the cost has changed from 4 to 1), updates its distance vector, and informs its neighbors of this change since its distance vector has changed.
- At time t1, z receives the update from y and updates its table. It computes a new least cost to x (it has decreased from a cost of 5 to a cost of 2) and sends its new distance vector to its neighbors.
- At time t2, y receives z's update and updates its distance table. y's least costs do not change and hence y does not send any message to z. The algorithm comes to a quiescent state.

Thus, only two iterations are required for the DV algorithm to reach a quiescent state. The good news about the decreased cost between x and y has propagated quickly through the network.

Figure 5.7 Changes in link cost

Let's now consider what can happen when a link cost increases. Suppose that the link cost between x and y increases from 4 to 60, as shown in Figure 5.7(b).

1. Before the link cost changes, Dy(x)=4, Dy(z)=1, Dz(y)=1, and Dz(x)=5. At time t0, y detects the link-cost change (the cost has changed from 4 to 60). y computes its new minimum-cost path to x to have a cost of Dy(x)=min{c(y,x)+Dx(x), c(y,z)+Dz(x)}=min{60+0, 1+5}=6. Of course, with our global view of the network, we can see that this new cost via z is wrong. But the only information node y has is that its direct cost to x is 60 and that z has last told y that z could get to x with a cost of 5. So in order to get to x, y would now route through z, fully expecting that z will be able to get to x with a cost of 5. As of t1 we have a routing loop---in order to get to x, y routes through z, and z routes through y. A routing loop is like a black hole---a packet destined for x arriving at y or z as of t1 will bounce back and forth between these two nodes forever (or until the forwarding tables are changed).

2. Since node y has computed a new minimum cost to x, it informs z of its new distance vector at time t1.

3. Sometime after t1, z receives y's new distance vector, which indicates that y's minimum cost to x is 6. z knows it can get to y with a cost of 1 and hence computes a new least cost to x of Dz(x)=min{50+0, 1+6}=7. Since z's least cost to x has increased, it then informs y of its new distance vector at t2.
4. In a similar manner, after receiving z's new distance vector, y determines Dy(x)=8 and sends z its distance vector. z then determines Dz(x)=9 and sends y its distance vector, and so on.

How long will the process continue? You should convince yourself that the loop will persist for 44 iterations (message exchanges between y and z)---until z eventually computes the cost of its path via y to be greater than 50. At this point, z will (finally!) determine that its least-cost path to x is via its direct connection to x. y will then route to x via z. The result of the bad news about the increase in link cost has indeed traveled slowly! What would have happened if the link cost c(y, x) had changed from 4 to 10,000 and the cost c(z, x) had been 9,999? Because of such scenarios, the problem we have seen is sometimes referred to as the count-to-infinity problem.

Distance-Vector Algorithm: Adding Poisoned Reverse

The specific looping scenario just described can be avoided using a technique known as poisoned reverse. The idea is simple---if z routes through y to get to destination x, then z will advertise to y that its distance to x is infinity, that is, z will advertise to y that Dz(x)=∞ (even though z knows Dz(x)=5 in truth). z will continue telling this little white lie to y as long as it routes to x via y. Since y believes that z has no path to x, y will never attempt to route to x via z, as long as z continues to route to x via y (and lies about doing so).

Let's now see how poisoned reverse solves the particular looping problem we encountered before in Figure 5.7(b). As a result of the poisoned reverse, y's distance table indicates Dz(x)=∞. When the cost of the (x, y) link changes from 4 to 60 at time t0, y updates its table and continues to route directly to x, albeit at a higher cost of 60, and informs z of its new cost to x, that is, Dy(x)=60. After receiving the update at t1, z immediately shifts its route to x to be via the direct (z, x) link at a cost of 50. Since this is a new least-cost path to x, and since the path no longer passes through y, z now informs y that Dz(x)=50 at t2. After receiving the update from z, y updates its distance table with Dy(x)=51. Also, since z is now on y's least-cost path to x, y poisons the reverse path from z to x by informing z at time t3 that Dy(x)=∞ (even though y knows that Dy(x)=51 in truth).

Does poisoned reverse solve the general count-to-infinity problem? It does not. You should convince yourself that loops involving three or more nodes (rather than simply two immediately neighboring nodes) will not be detected by the poisoned reverse technique.

A Comparison of LS and DV Routing Algorithms

The DV and LS algorithms take complementary approaches toward computing routing. In the DV algorithm, each node talks to only its directly connected neighbors, but it provides its neighbors with least-cost estimates from itself to all the nodes (that it knows about) in the network. The LS algorithm requires global information. Consequently, when implemented in each and every router, e.g., as in Figures 4.2 and 5.1, each node would need to communicate with all other nodes (via broadcast), but it tells them only the costs of its directly connected links. Let's conclude our study of LS and DV algorithms with a quick comparison of some of their attributes. Recall that N is the set of nodes (routers) and E is the set of edges (links).

Message complexity.
We have seen that LS requires each node to know the cost of each link in the network. This requires $O(|N|\,|E|)$ messages to be sent. Also, whenever a link cost changes, the new link cost must be sent to all nodes. The DV algorithm requires message exchanges between directly connected neighbors at each iteration. We have seen that the time needed for the algorithm to converge can depend on many factors. When link costs change, the DV algorithm will propagate the results of the changed link cost only if the new link cost results in a changed least-cost path for one of the nodes attached to that link.

Speed of convergence. We have seen that our implementation of LS is an $O(|N|^2)$ algorithm requiring $O(|N|\,|E|)$ messages. The DV algorithm can converge slowly and can have routing loops while the algorithm is converging. DV also suffers from the count-to-infinity problem.

Robustness. What can happen if a router fails, misbehaves, or is sabotaged? Under LS, a router could broadcast an incorrect cost for one of its attached links (but no others). A node could also corrupt or drop any packets it received as part of an LS broadcast. But an LS node is computing only its own forwarding tables; other nodes are performing similar calculations for themselves. This means route calculations are somewhat separated under LS, providing a degree of robustness. Under DV, a node can advertise incorrect least-cost paths to any or all destinations. (Indeed, in 1997, a malfunctioning router in a small ISP provided national backbone routers with erroneous routing information. This caused other routers to flood the malfunctioning router with traffic and caused large portions of the Internet to become disconnected for up to several hours \[Neumann 1997\].) More generally, we note that, at each iteration, a node's calculation in DV is passed on to its neighbor and then indirectly to its neighbor's neighbor on the next iteration. In this sense, an incorrect node calculation can be diffused through the entire network under DV.

In the end, neither algorithm is an obvious winner over the other; indeed, both algorithms are used in the Internet.

5.3 Intra-AS Routing in the Internet: OSPF

In our study of routing algorithms so far, we've viewed the network simply as a collection of interconnected routers. One router was indistinguishable from another in the sense that all routers executed the same routing algorithm to compute routing paths through the entire network. In practice, this model and its view of a homogeneous set of routers all executing the same routing algorithm is simplistic for two important reasons:

Scale. As the number of routers becomes large, the overhead involved in communicating, computing, and storing routing information becomes prohibitive. Today's Internet consists of hundreds of millions of routers. Storing routing information for possible destinations at each of these routers would clearly require enormous amounts of memory. The overhead required to broadcast connectivity and link cost updates among all of the routers would be huge! A distance-vector algorithm that iterated among such a large number of routers would surely never converge. Clearly, something must be done to reduce the complexity of route computation in a network as large as the Internet.

Administrative autonomy. As described in Section 1.3, the Internet is a network of ISPs, with each ISP consisting of its own network of routers.
An ISP generally desires to operate its network as it pleases (for example, to run whatever routing algorithm it chooses within its network) or to hide aspects of its network's internal organization from the outside. Ideally, an organization should be able to operate and administer its network as it wishes, while still being able to connect its network to other outside networks.

Both of these problems can be solved by organizing routers into autonomous systems (ASs), with each AS consisting of a group of routers that are under the same administrative control. Often the routers in an ISP, and the links that interconnect them, constitute a single AS. Some ISPs, however, partition their network into multiple ASs. In particular, some tier-1 ISPs use one gigantic AS for their entire network, whereas others break up their ISP into tens of interconnected ASs. An autonomous system is identified by its globally unique autonomous system number (ASN) \[RFC 1930\]. AS numbers, like IP addresses, are assigned by ICANN regional registries \[ICANN 2016\]. Routers within the same AS all run the same routing algorithm and have information about each other. The routing algorithm running within an autonomous system is called an intra-autonomous system routing protocol.

Open Shortest Path First (OSPF)

OSPF routing and its closely related cousin, IS-IS, are widely used for intra-AS routing in the Internet. The Open in OSPF indicates that the routing protocol specification is publicly available (for example, as opposed to Cisco's EIGRP protocol, which only recently became open \[Savage 2015\], after roughly 20 years as a Cisco-proprietary protocol). The most recent version of OSPF, version 2, is defined in \[RFC 2328\], a public document. OSPF is a link-state protocol that uses flooding of link-state information and Dijkstra's least-cost path algorithm. With OSPF, each router constructs a complete topological map (that is, a graph) of the entire autonomous system. Each router then locally runs Dijkstra's shortest-path algorithm to determine a shortest-path tree to all subnets, with itself as the root node. Individual link costs are configured by the network administrator (see sidebar, Principles and Practice: Setting OSPF Weights). The administrator might choose to set all link costs to 1,

PRINCIPLES IN PRACTICE

SETTING OSPF LINK WEIGHTS

Our discussion of link-state routing has implicitly assumed that link weights are set, a routing algorithm such as OSPF is run, and traffic flows according to the routing tables computed by the LS algorithm. In terms of cause and effect, the link weights are given (i.e., they come first) and result (via Dijkstra's algorithm) in routing paths that minimize overall cost. In this viewpoint, link weights reflect the cost of using a link (e.g., if link weights are inversely proportional to capacity, then the use of high-capacity links would have smaller weight and thus be more attractive from a routing standpoint) and Dijkstra's algorithm serves to minimize overall cost.

In practice, the cause and effect relationship between link weights and routing paths may be reversed, with network operators configuring link weights in order to obtain routing paths that achieve certain traffic engineering goals \[Fortz 2000, Fortz 2002\]. For example, suppose a network operator has an estimate of traffic flow entering the network at each ingress point and destined for each egress point.
PRINCIPLES IN PRACTICE

SETTING OSPF LINK WEIGHTS

Our discussion of link-state routing has implicitly assumed that link weights are set, a routing algorithm such as OSPF is run, and traffic flows according to the routing tables computed by the LS algorithm. In terms of cause and effect, the link weights are given (i.e., they come first) and result (via Dijkstra's algorithm) in routing paths that minimize overall cost. In this viewpoint, link weights reflect the cost of using a link (e.g., if link weights are inversely proportional to capacity, then the use of high-capacity links would have smaller weight and thus be more attractive from a routing standpoint) and Dijkstra's algorithm serves to minimize overall cost. In practice, the cause and effect relationship between link weights and routing paths may be reversed, with network operators configuring link weights in order to obtain routing paths that achieve certain traffic engineering goals \[Fortz 2000, Fortz 2002\]. For example, suppose a network operator has an estimate of traffic flow entering the network at each ingress point and destined for each egress point. The operator may then want to put in place a specific routing of ingress-to-egress flows that minimizes the maximum utilization over all of the network's links. But with a routing algorithm such as OSPF, the operator's main "knobs" for tuning the routing of flows through the network are the link weights. Thus, in order to achieve the goal of minimizing the maximum link utilization, the operator must find the set of link weights that achieves this goal. This is a reversal of the cause and effect relationship---the desired routing of flows is known, and the OSPF link weights must be found such that the OSPF routing algorithm results in this desired routing of flows.

With OSPF, a router broadcasts routing information to all other routers in the autonomous system, not just to its neighboring routers. A router broadcasts link-state information whenever there is a change in a link's state (for example, a change in cost or a change in up/down status). It also broadcasts a link's state periodically (at least once every 30 minutes), even if the link's state has not changed. RFC 2328 notes that "this periodic updating of link state advertisements adds robustness to the link state algorithm." OSPF advertisements are contained in OSPF messages that are carried directly by IP, with an upper-layer protocol number of 89 for OSPF. Thus, the OSPF protocol must itself implement functionality such as reliable message transfer and link-state broadcast. The OSPF protocol also checks that links are operational (via a HELLO message that is sent to an attached neighbor) and allows an OSPF router to obtain a neighboring router's database of network-wide link state. Some of the advances embodied in OSPF include the following:

Security. Exchanges between OSPF routers (for example, link-state updates) can be authenticated. With authentication, only trusted routers can participate in the OSPF protocol within an AS, thus preventing malicious intruders (or networking students taking their newfound knowledge out for a joyride) from injecting incorrect information into router tables. By default, OSPF packets between routers are not authenticated and could be forged. Two types of authentication can be configured---simple and MD5 (see Chapter 8 for a discussion on MD5 and authentication in general). With simple authentication, the same password is configured on each router. When a router sends an OSPF packet, it includes the password in plaintext. Clearly, simple authentication is not very secure. MD5 authentication is based on shared secret keys that are configured in all the routers. For each OSPF packet that it sends, the router computes the MD5 hash of the content of the OSPF packet appended with the secret key. (See the discussion of message authentication codes in Chapter 8.) Then the router includes the resulting hash value in the OSPF packet. The receiving router, using the preconfigured secret key, will compute an MD5 hash of the packet and compare it with the hash value that the packet carries, thus verifying the packet's authenticity.
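A minimal sketch of this keyed-hash computation, assuming the OSPF packet has already been serialized to bytes; the real OSPF cryptographic authentication scheme also carries a key ID and a cryptographic sequence number, which are omitted here.

```python
import hashlib

def ospf_md5_digest(ospf_packet: bytes, secret_key: bytes) -> bytes:
    # MD5 over the packet contents with the shared secret appended,
    # as described above (simplified; key ID and sequence fields omitted).
    return hashlib.md5(ospf_packet + secret_key).digest()

def authentic(ospf_packet: bytes, carried_digest: bytes, secret_key: bytes) -> bool:
    # The receiver recomputes the hash with its preconfigured key and
    # compares it against the digest carried in the packet.
    return ospf_md5_digest(ospf_packet, secret_key) == carried_digest
```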
Sequence numbers are also used with MD5 authentication to protect against replay attacks.

Multiple same-cost paths. When multiple paths to a destination have the same cost, OSPF allows multiple paths to be used (that is, a single path need not be chosen for carrying all traffic when multiple equal-cost paths exist).

Integrated support for unicast and multicast routing. Multicast OSPF (MOSPF) \[RFC 1584\] provides simple extensions to OSPF to provide for multicast routing. MOSPF uses the existing OSPF link database and adds a new type of link-state advertisement to the existing OSPF link-state broadcast mechanism.

Support for hierarchy within a single AS. An OSPF autonomous system can be configured hierarchically into areas. Each area runs its own OSPF link-state routing algorithm, with each router in an area broadcasting its link state to all other routers in that area. Within each area, one or more area border routers are responsible for routing packets outside the area. Lastly, exactly one OSPF area in the AS is configured to be the backbone area. The primary role of the backbone area is to route traffic between the other areas in the AS. The backbone always contains all area border routers in the AS and may contain non-border routers as well. Inter-area routing within the AS requires that the packet be first routed to an area border router (intra-area routing), then routed through the backbone to the area border router that is in the destination area, and then routed to the final destination.

OSPF is a relatively complex protocol, and our coverage here has been necessarily brief; \[Huitema 1998; Moy 1998; RFC 2328\] provide additional details.

5.4 Routing Among the ISPs: BGP

We just learned that OSPF is an example of an intra-AS routing protocol. When routing a packet between a source and destination within the same AS, the route the packet follows is entirely determined by the intra-AS routing protocol. However, to route a packet across multiple ASs, say from a smartphone in Timbuktu to a server in a datacenter in Silicon Valley, we need an inter-autonomous system routing protocol. Since an inter-AS routing protocol involves coordination among multiple ASs, communicating ASs must run the same inter-AS routing protocol. In fact, in the Internet, all ASs run the same inter-AS routing protocol, called the Border Gateway Protocol, more commonly known as BGP \[RFC 4271; Stewart 1999\]. BGP is arguably the most important of all the Internet protocols (the only other contender would be the IP protocol that we studied in Section 4.3), as it is the protocol that glues the thousands of ISPs in the Internet together. As we will soon see, BGP is a decentralized and asynchronous protocol in the vein of distance-vector routing described in Section 5.2.2. Although BGP is a complex and challenging protocol, to understand the Internet on a deep level, we need to become familiar with its underpinnings and operation. The time we devote to learning BGP will be well worth the effort.

5.4.1 The Role of BGP

To understand the responsibilities of BGP, consider an AS and an arbitrary router in that AS. Recall that every router has a forwarding table, which plays the central role in the process of forwarding arriving packets to outbound router links. As we have learned, for destinations that are within the same AS, the entries in the router's forwarding table are determined by the AS's intra-AS routing protocol.
But what about destinations that are outside of the AS? This is precisely where BGP comes to the rescue. In BGP, packets are not routed to a specific destination address, but instead to CIDRized prefixes, with each prefix representing a subnet or a collection of subnets. In the world of BGP, a destination may take the form 138.16.68/22, which for this example includes 1,024 IP addresses. Thus, a router's forwarding table will have entries of the form (x, I), where x is a prefix (such as 138.16.68/22) and I is an interface number for one of the router's interfaces. As an inter-AS routing protocol, BGP provides each router a means to:

1. Obtain prefix reachability information from neighboring ASs. In particular, BGP allows each subnet to advertise its existence to the rest of the Internet. A subnet screams, "I exist and I am here," and BGP makes sure that all the routers in the Internet know about this subnet. If it weren't for BGP, each subnet would be an isolated island---alone, unknown and unreachable by the rest of the Internet.

2. Determine the "best" routes to the prefixes. A router may learn about two or more different routes to a specific prefix. To determine the best route, the router will locally run a BGP route-selection procedure (using the prefix reachability information it obtained via neighboring routers). The best route will be determined based on policy as well as the reachability information.

Let us now delve into how BGP carries out these two tasks.

5.4.2 Advertising BGP Route Information

Consider the network shown in Figure 5.8. As we can see, this simple network has three autonomous systems: AS1, AS2, and AS3. As shown, AS3 includes a subnet with prefix x. For each AS, each router is either a gateway router or an internal router. A gateway router is a router on the edge of an AS that directly connects to one or more routers in other ASs. An internal router connects only to hosts and routers within its own AS. In AS1, for example, router 1c is a gateway router; routers 1a, 1b, and 1d are internal routers.

Figure 5.8 Network with three autonomous systems. AS3 includes a subnet with prefix x

Let's consider the task of advertising reachability information for prefix x to all of the routers shown in Figure 5.8. At a high level, this is straightforward. First, AS3 sends a BGP message to AS2, saying that x exists and is in AS3; let's denote this message as "AS3 x". Then AS2 sends a BGP message to AS1, saying that x exists and that you can get to x by first passing through AS2 and then going to AS3; let's denote that message as "AS2 AS3 x". In this manner, each of the autonomous systems will not only learn about the existence of x, but also learn about a path of autonomous systems that leads to x. Although the discussion in the above paragraph about advertising BGP reachability information should get the general idea across, it is not precise in the sense that autonomous systems do not actually send messages to each other, but instead routers do. To understand this, let's now re-examine the example in Figure 5.8. In BGP, pairs of routers exchange routing information over semi-permanent TCP connections using port 179. Each such TCP connection, along with all the BGP messages sent over the connection, is called a BGP connection.
Furthermore, a BGP connection that spans two ASs is called an external BGP (eBGP) connection, and a BGP connection between routers in the same AS is called an internal BGP (iBGP) connection. Examples of BGP connections for the network in Figure 5.8 are shown in Figure 5.9. There is typically one eBGP connection for each link that directly connects gateway routers in different ASs; thus, in Figure 5.9, there is an eBGP connection between gateway routers 1c and 2a and an eBGP connection between gateway routers 2c and 3a. There are also iBGP connections between routers within each of the ASs. In particular, Figure 5.9 displays a common configuration of one BGP connection for each pair of routers internal to an AS, creating a mesh of TCP connections within each AS. In Figure 5.9, the eBGP connections are shown with the long dashes; the iBGP connections are shown with the short dashes. Note that iBGP connections do not always correspond to physical links.

Figure 5.9 eBGP and iBGP connections

In order to propagate the reachability information, both iBGP and eBGP sessions are used. Consider again advertising the reachability information for prefix x to all routers in AS1 and AS2. In this process, gateway router 3a first sends an eBGP message "AS3 x" to gateway router 2c. Gateway router 2c then sends the iBGP message "AS3 x" to all of the other routers in AS2, including to gateway router 2a. Gateway router 2a then sends the eBGP message "AS2 AS3 x" to gateway router 1c. Finally, gateway router 1c uses iBGP to send the message "AS2 AS3 x" to all the routers in AS1. After this process is complete, each router in AS1 and AS2 is aware of the existence of x and is also aware of an AS path that leads to x. Of course, in a real network, from a given router there may be many different paths to a given destination, each through a different sequence of ASs. For example, consider the network in Figure 5.10, which is the original network in Figure 5.8, with an additional physical link from router 1d to router 3d. In this case, there are two paths from AS1 to x: the path "AS2 AS3 x" via router 1c; and the new path "AS3 x" via the router 1d.

Figure 5.10 Network augmented with peering link between AS1 and AS3

5.4.3 Determining the Best Routes

As we have just learned, there may be many paths from a given router to a destination subnet. In fact, in the Internet, routers often receive reachability information about dozens of different possible paths. How does a router choose among these paths (and then configure its forwarding table accordingly)? Before addressing this critical question, we need to introduce a little more BGP terminology. When a router advertises a prefix across a BGP connection, it includes with the prefix several BGP attributes. In BGP jargon, a prefix along with its attributes is called a route. Two of the more important attributes are AS-PATH and NEXT-HOP. The AS-PATH attribute contains the list of ASs through which the advertisement has passed, as we've seen in our examples above. To generate the AS-PATH value, when a prefix is passed to an AS, the AS adds its ASN to the existing list in the AS-PATH. For example, in Figure 5.10, there are two routes from AS1 to subnet x: one which uses the AS-PATH "AS2 AS3"; and another that uses the AS-PATH "AS3".
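The AS-PATH mechanics can be sketched in a few lines; the dict-based route representation is our own simplification. The accept check implements the loop prevention described in the next paragraph.

```python
def advertise(route, my_asn):
    # Prepend our ASN before passing the route on, as in "AS2 AS3 x".
    return {"prefix": route["prefix"], "as_path": [my_asn] + route["as_path"]}

def accept(route, my_asn):
    # Reject any advertisement whose AS-PATH already contains our own AS.
    return my_asn not in route["as_path"]

adv = advertise({"prefix": "x", "as_path": ["AS3"]}, "AS2")
assert adv["as_path"] == ["AS2", "AS3"]
assert accept(adv, "AS1") and not accept(adv, "AS3")
```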
BGP routers also use the AS-PATH attribute to detect and prevent looping advertisements; specifically, if a router sees that its own AS is contained in the path list, it will reject the advertisement. Providing the critical link between the inter-AS and intra-AS routing protocols, the NEXT-HOP attribute has a subtle but important use. The NEXT-HOP is the IP address of the router interface that begins the AS-PATH. To gain insight into this attribute, let's again refer to Figure 5.10. As indicated in Figure 5.10, the NEXT-HOP attribute for the route "AS2 AS3 x" from AS1 to x that passes through AS2 is the IP address of the left interface on router 2a. The NEXT-HOP attribute for the route "AS3 x" from AS1 to x that bypasses AS2 is the IP address of the leftmost interface of router 3d. In summary, in this toy example, each router in AS1 becomes aware of two BGP routes to prefix x:

IP address of leftmost interface for router 2a; AS2 AS3; x

IP address of leftmost interface of router 3d; AS3; x

Here, each BGP route is written as a list with three components: NEXT-HOP; AS-PATH; destination prefix. In practice, a BGP route includes additional attributes, which we will ignore for the time being. Note that the NEXT-HOP attribute is an IP address of a router that does not belong to AS1; however, the subnet that contains this IP address directly attaches to AS1.

Hot Potato Routing

We are now finally in position to talk about BGP routing algorithms in a precise manner. We will begin with one of the simplest routing algorithms, namely, hot potato routing. Consider router 1b in the network in Figure 5.10. As just described, this router will learn about two possible BGP routes to prefix x. In hot potato routing, the route chosen (from among all possible routes) is that route with the least cost to the NEXT-HOP router beginning that route. In this example, router 1b will consult its intra-AS routing information to find the least-cost intra-AS path to NEXT-HOP router 2a and the least-cost intra-AS path to NEXT-HOP router 3d, and then select the route with the smaller of these least-cost paths. For example, suppose that cost is defined as the number of links traversed. Then the least cost from router 1b to router 2a is 2, the least cost from router 1b to router 3d is 3, and router 2a would therefore be selected. Router 1b would then consult its forwarding table (configured by its intra-AS algorithm) and find the interface I that is on the least-cost path to router 2a. It then adds (x, I) to its forwarding table. The steps for adding an outside-AS prefix in a router's forwarding table for hot potato routing are summarized in Figure 5.11. It is important to note that when adding an outside-AS prefix into a forwarding table, both the inter-AS routing protocol (BGP) and the intra-AS routing protocol (e.g., OSPF) are used. The idea behind hot-potato routing is for router 1b to get packets out of its AS as quickly as possible (more specifically, with the least cost possible) without worrying about the cost of the remaining portions of the path outside of its AS to the destination. In the name "hot potato routing," a packet is analogous to a hot potato that is burning in your hands. Because it is burning hot, you want to pass it off to another person (another AS) as quickly as possible.
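A minimal sketch of this selection, using the numbers from the example above; routes are simplified to dicts of (NEXT-HOP, AS-PATH, prefix), and the intra-AS least costs are taken as given rather than computed.

```python
def hot_potato_choice(bgp_routes, intra_as_cost):
    # Pick the route whose NEXT-HOP router is cheapest to reach
    # within our own AS, ignoring everything beyond the AS boundary.
    return min(bgp_routes, key=lambda r: intra_as_cost[r["next_hop"]])

routes = [
    {"next_hop": "2a", "as_path": ["AS2", "AS3"], "prefix": "x"},
    {"next_hop": "3d", "as_path": ["AS3"], "prefix": "x"},
]
cost_from_1b = {"2a": 2, "3d": 3}  # least-cost intra-AS paths from router 1b
print(hot_potato_choice(routes, cost_from_1b))  # chooses the route via 2a
```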
Hot potato routing is thus a selfish algorithm---it tries to reduce the cost in its own AS while ignoring the other components of the end-to-end costs outside its AS. Note that with hot potato routing, two routers in the same AS may choose two different AS paths to the same prefix. For example, we just saw that router 1b would send packets through AS2 to reach x. However, router 1d would bypass AS2 and send packets directly to AS3 to reach x.

Figure 5.11 Steps in adding outside-AS destination in a router's forwarding table

Route-Selection Algorithm

In practice, BGP uses an algorithm that is more complicated than hot potato routing, but nevertheless incorporates hot potato routing. For any given destination prefix, the input into BGP's route-selection algorithm is the set of all routes to that prefix that have been learned and accepted by the router. If there is only one such route, then BGP obviously selects that route. If there are two or more routes to the same prefix, then BGP sequentially invokes the following elimination rules until one route remains (a code sketch of these rules appears at the end of this subsection):

1. A route is assigned a local preference value as one of its attributes (in addition to the AS-PATH and NEXT-HOP attributes). The local preference of a route could have been set by the router or could have been learned from another router in the same AS. The value of the local preference attribute is a policy decision that is left entirely up to the AS's network administrator. (We will shortly discuss BGP policy issues in some detail.) The routes with the highest local preference values are selected.

2. From the remaining routes (all with the same highest local preference value), the route with the shortest AS-PATH is selected. If this rule were the only rule for route selection, then BGP would be using a DV algorithm for path determination, where the distance metric uses the number of AS hops rather than the number of router hops.

3. From the remaining routes (all with the same highest local preference value and the same AS-PATH length), hot potato routing is used, that is, the route with the closest NEXT-HOP router is selected.

4. If more than one route still remains, the router uses BGP identifiers to select the route; see \[Stewart 1999\].

As an example, let's again consider router 1b in Figure 5.10. Recall that there are exactly two BGP routes to prefix x, one that passes through AS2 and one that bypasses AS2. Also recall that if hot potato routing on its own were used, then BGP would route packets through AS2 to prefix x. But in the above route-selection algorithm, rule 2 is applied before rule 3, causing BGP to select the route that bypasses AS2, since that route has a shorter AS-PATH. So we see that with the above route-selection algorithm, BGP is no longer a selfish algorithm---it first looks for routes with short AS paths (thereby likely reducing end-to-end delay).

As noted above, BGP is the de facto standard for inter-AS routing for the Internet. To see the contents of various BGP routing tables (large!) extracted from routers in tier-1 ISPs, see http://www.routeviews.org. BGP routing tables often contain over half a million routes (that is, prefixes and corresponding attributes). Statistics about the size and characteristics of BGP routing tables are presented in \[Potaroo 2016\].
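Here is the promised sketch of the elimination rules, under simplifying assumptions: routes are plain dicts, rule 4's tie-break is reduced to a numeric router identifier, and the many other real BGP attributes are ignored.

```python
def bgp_select(routes, intra_as_cost):
    best_pref = max(r["local_pref"] for r in routes)
    routes = [r for r in routes if r["local_pref"] == best_pref]      # rule 1
    shortest = min(len(r["as_path"]) for r in routes)
    routes = [r for r in routes if len(r["as_path"]) == shortest]     # rule 2
    closest = min(intra_as_cost[r["next_hop"]] for r in routes)
    routes = [r for r in routes
              if intra_as_cost[r["next_hop"]] == closest]             # rule 3 (hot potato)
    return min(routes, key=lambda r: r["router_id"])                  # rule 4 (simplified)

# Router 1b in Figure 5.10: rule 2 eliminates the AS2 route before
# hot potato routing (rule 3) is ever consulted.
routes = [
    {"local_pref": 100, "as_path": ["AS2", "AS3"], "next_hop": "2a", "router_id": 1},
    {"local_pref": 100, "as_path": ["AS3"], "next_hop": "3d", "router_id": 2},
]
assert bgp_select(routes, {"2a": 2, "3d": 3})["as_path"] == ["AS3"]
```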
5.4.4 IP-Anycast

In addition to being the Internet's inter-AS routing protocol, BGP is often used to implement the IP-anycast service \[RFC 1546, RFC 7094\], which is commonly used in DNS. To motivate IP-anycast, consider that in many applications, we are interested in (1) replicating the same content on different servers in many different dispersed geographical locations, and (2) having each user access the content from the server that is closest. For example, a CDN may replicate videos and other objects on servers in different countries. Similarly, the DNS system can replicate DNS records on DNS servers throughout the world. When a user wants to access this replicated content, it is desirable to point the user to the "nearest" server with the replicated content. BGP's route-selection algorithm provides an easy and natural mechanism for doing so.

To make our discussion concrete, let's describe how a CDN might use IP-anycast. As shown in Figure 5.12, during the IP-anycast configuration stage, the CDN company assigns the same IP address to each of its servers, and uses standard BGP to advertise this IP address from each of the servers. When a BGP router receives multiple route advertisements for this IP address, it treats these advertisements as providing different paths to the same physical location (when, in fact, the advertisements are for different paths to different physical locations). When configuring its routing table, each router will locally use the BGP route-selection algorithm to pick the "best" (for example, closest, as determined by AS-hop counts) route to that IP address. For example, if one BGP route (corresponding to one location) is only one AS hop away from the router, and all other BGP routes (corresponding to other locations) are two or more AS hops away, then the BGP router would choose to route packets to the location that is one hop away. After this initial BGP address-advertisement phase, the CDN can do its main job of distributing content. When a client requests the video, the CDN returns to the client the common IP address used by the geographically dispersed servers, no matter where the client is located. When the client sends a request to that IP address, Internet routers then forward the request packet to the "closest" server, as defined by the BGP route-selection algorithm.

Figure 5.12 Using IP-anycast to bring users to the closest CDN server

Although the above CDN example nicely illustrates how IP-anycast can be used, in practice CDNs generally choose not to use IP-anycast because BGP routing changes can result in different packets of the same TCP connection arriving at different instances of the Web server. But IP-anycast is extensively used by the DNS system to direct DNS queries to the closest root DNS server. Recall from Section 2.4 that there are currently 13 IP addresses for root DNS servers. But corresponding to each of these addresses, there are multiple DNS root servers, with some of these addresses having over 100 DNS root servers scattered over all corners of the world. When a DNS query is sent to one of these 13 IP addresses, IP anycast is used to route the query to the nearest of the DNS root servers that is responsible for that address.

5.4.5 Routing Policy

When a router selects a route to a destination, the AS routing policy can trump all other considerations, such as shortest AS path or hot potato routing.
Indeed, in the route-selection algorithm, routes are first selected according to the local-preference attribute, whose value is fixed by the policy of the local AS. Let's illustrate some of the basic concepts of BGP routing policy with a simple example. Figure 5.13 shows six interconnected autonomous systems: A, B, C, W, X, and Y. It is important to note that A, B, C, W, X, and Y are ASs, not routers.

Figure 5.13 A simple BGP policy scenario

Let's assume that autonomous systems W, X, and Y are access ISPs and that A, B, and C are backbone provider networks. We'll also assume that A, B, and C directly send traffic to each other, and provide full BGP information to their customer networks. All traffic entering an ISP access network must be destined for that network, and all traffic leaving an ISP access network must have originated in that network. W and Y are clearly access ISPs. X is a multi-homed access ISP, since it is connected to the rest of the network via two different providers (a scenario that is becoming increasingly common in practice). However, like W and Y, X itself must be the source/destination of all traffic leaving/entering X. But how will this stub network behavior be implemented and enforced? How will X be prevented from forwarding traffic between B and C? This can easily be accomplished by controlling the manner in which BGP routes are advertised. In particular, X will function as an access ISP network if it advertises (to its neighbors B and C) that it has no paths to any other destinations except itself. That is, even though X may know of a path, say XCY, that reaches network Y, it will not advertise this path to B. Since B is unaware that X has a path to Y, B would never forward traffic destined to Y (or C) via X. This simple example illustrates how a selective route advertisement policy can be used to implement customer/provider routing relationships.
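A sketch of such a selective-advertisement (export) policy from access ISP X's point of view; the relationship labels and all helper names are our own illustration, not part of BGP itself.

```python
from dataclasses import dataclass

@dataclass
class Route:
    prefix: str
    as_path: list
    learned_from: str  # neighbor AS the route was learned from, or "self"

# X's view in Figure 5.13: both B and C are providers; X has no customers.
RELATIONSHIPS = {"B": "provider", "C": "provider"}

def should_export(route: Route, to_neighbor: str) -> bool:
    # Export our own prefixes (and any customer routes) to everyone;
    # re-advertise provider- or peer-learned routes to customers only.
    origin = "self" if route.learned_from == "self" else RELATIONSHIPS[route.learned_from]
    if origin in ("self", "customer"):
        return True
    return RELATIONSHIPS.get(to_neighbor) == "customer"

# X knows the path XCY to Y (learned from provider C) but will not
# advertise it to B, so B never sends traffic destined to Y via X.
xcy = Route(prefix="y", as_path=["C", "Y"], learned_from="C")
assert should_export(xcy, "B") is False
```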
Let's next focus on a provider network, say AS B. Suppose that B has learned (from A) that A has a path AW to W. B can thus install the route AW into its routing information base. Clearly, B also wants to advertise the path BAW to its customer, X, so that X knows that it can route to W via B. But should B advertise the path BAW to C? If it does so, then C could route traffic to W via BAW. If A, B, and C are all backbone providers, then B might rightly feel that it should not have to shoulder the burden (and cost!) of carrying transit traffic between A and C. B might rightly feel that it is A's and C's job (and cost!) to make sure that C can route to/from A's customers via a direct connection between A and C. There are currently no official standards that govern how backbone ISPs route among themselves. However, a rule of thumb followed by commercial ISPs is that any traffic flowing across an ISP's backbone network must have either a source or a destination (or both) in a network that is a customer of that ISP; otherwise the traffic would be getting a free ride on the ISP's network. Individual peering agreements (that would govern questions such as those raised above) are typically negotiated between pairs of ISPs and are often confidential; \[Huston 1999a\] provides an interesting discussion of peering agreements. For a detailed description of how routing policy reflects commercial relationships among ISPs, see \[Gao 2001; Dmitiropoulos 2007\]. For a discussion of BGP routing policies from an ISP standpoint, see \[Caesar 2005b\].

This completes our brief introduction to BGP. Understanding BGP is important because it plays a central role in the Internet. We encourage you to see the references \[Griffin 2012; Stewart 1999; Labovitz 1997; Halabi 2000; Huitema 1998; Gao 2001; Feamster 2004; Caesar 2005b; Li 2007\] to learn more about BGP.

PRINCIPLES IN PRACTICE

WHY ARE THERE DIFFERENT INTER-AS AND INTRA-AS ROUTING PROTOCOLS?

Having now studied the details of specific inter-AS and intra-AS routing protocols deployed in today's Internet, let's conclude by considering perhaps the most fundamental question we could ask about these protocols in the first place (hopefully, you have been wondering this all along, and have not lost the forest for the trees!): Why are different inter-AS and intra-AS routing protocols used? The answer to this question gets at the heart of the differences between the goals of routing within an AS and among ASs:

Policy. Among ASs, policy issues dominate. It may well be important that traffic originating in a given AS not be able to pass through another specific AS. Similarly, a given AS may well want to control what transit traffic it carries between other ASs. We have seen that BGP carries path attributes and provides for controlled distribution of routing information so that such policy-based routing decisions can be made. Within an AS, everything is nominally under the same administrative control, and thus policy issues play a much less important role in choosing routes within the AS.

Scale. The ability of a routing algorithm and its data structures to scale to handle routing to/among large numbers of networks is a critical issue in inter-AS routing. Within an AS, scalability is less of a concern. For one thing, if a single ISP becomes too large, it is always possible to divide it into two ASs and perform inter-AS routing between the two new ASs. (Recall that OSPF allows such a hierarchy to be built by splitting an AS into areas.)

Performance. Because inter-AS routing is so policy oriented, the quality (for example, performance) of the routes used is often of secondary concern (that is, a longer or more costly route that satisfies certain policy criteria may well be taken over a route that is shorter but does not meet those criteria). Indeed, we saw that among ASs, there is not even the notion of cost (other than AS hop count) associated with routes. Within a single AS, however, such policy concerns are of less importance, allowing routing to focus more on the level of performance realized on a route.

5.4.6 Putting the Pieces Together: Obtaining Internet Presence

Although this subsection is not about BGP per se, it brings together many of the protocols and concepts we've seen thus far, including IP addressing, DNS, and BGP. Suppose you have just created a small company that has a number of servers, including a public Web server that describes your company's products and services, a mail server from which your employees obtain their e-mail messages, and a DNS server. Naturally, you would like the entire world to be able to visit your Web site in order to learn about your exciting products and services.
Moreover, you would like your employees to be able to send and receive e-mail to potential customers throughout the world. To meet these goals, you first need to obtain Internet connectivity, which is done by contracting with, and connecting to, a local ISP. Your company will have a gateway router, which will be connected to a router in your local ISP. This connection might be a DSL connection through the existing telephone infrastructure, a leased line to the ISP's router, or one of the many other access solutions described in Chapter 1. Your local ISP will also provide you with an IP address range, e.g., a /24 address range consisting of 256 addresses. Once you have your physical connectivity and your IP address range, you will assign one of the IP addresses (in your address range) to your Web server, one to your mail server, one to your DNS server, one to your gateway router, and other IP addresses to other servers and networking devices in your company's network.

In addition to contracting with an ISP, you will also need to contract with an Internet registrar to obtain a domain name for your company, as described in Chapter 2. For example, if your company's name is, say, Xanadu Inc., you will naturally try to obtain the domain name xanadu.com. Your company must also obtain presence in the DNS system. Specifically, because outsiders will want to contact your DNS server to obtain the IP addresses of your servers, you will also need to provide your registrar with the IP address of your DNS server. Your registrar will then put an entry for your DNS server (domain name and corresponding IP address) in the .com top-level-domain servers, as described in Chapter 2. After this step is completed, any user who knows your domain name (e.g., xanadu.com) will be able to obtain the IP address of your DNS server via the DNS system. So that people can discover the IP addresses of your Web server, in your DNS server you will need to include entries that map the host name of your Web server (e.g., www.xanadu.com) to its IP address. You will want to have similar entries for other publicly available servers in your company, including your mail server. In this manner, if Alice wants to browse your Web server, the DNS system will contact your DNS server, find the IP address of your Web server, and give it to Alice. Alice can then establish a TCP connection directly with your Web server.

However, there still remains one other necessary and crucial step to allow outsiders from around the world to access your Web server. Consider what happens when Alice, who knows the IP address of your Web server, sends an IP datagram (e.g., a TCP SYN segment) to that IP address. This datagram will be routed through the Internet, visiting a series of routers in many different ASs, and eventually reach your Web server. When any one of the routers receives the datagram, it is going to look for an entry in its forwarding table to determine on which outgoing port it should forward the datagram. Therefore, each of the routers needs to know about the existence of your company's /24 prefix (or some aggregate entry). How does a router become aware of your company's prefix? As we have just seen, it becomes aware of it from BGP! Specifically, when your company contracts with a local ISP and gets assigned a prefix (i.e., an address range), your local ISP will use BGP to advertise your prefix to the ISPs to which it connects.
Those ISPs will then, in turn, use BGP to propagate the advertisement. Eventually, all Internet routers will know about your prefix (or about some aggregate that includes your prefix) and thus be able to appropriately forward datagrams destined to your Web and mail servers.

5.5 The SDN Control Plane

In this section, we'll dive into the SDN control plane---the network-wide logic that controls packet forwarding among a network's SDN-enabled devices, as well as the configuration and management of these devices and their services. Our study here builds on our earlier discussion of generalized SDN forwarding in Section 4.4, so you might want to first review that section, as well as Section 5.1 of this chapter, before continuing on. As in Section 4.4, we'll again adopt the terminology used in the SDN literature and refer to the network's forwarding devices as "packet switches" (or just switches, with "packet" being understood), since forwarding decisions can be made on the basis of network-layer source/destination addresses, link-layer source/destination addresses, as well as many other values in transport-, network-, and link-layer packet-header fields. Four key characteristics of an SDN architecture can be identified \[Kreutz 2015\]:

Flow-based forwarding. Packet forwarding by SDN-controlled switches can be based on any number of header field values in the transport-layer, network-layer, or link-layer header. We saw in Section 4.4 that the OpenFlow 1.0 abstraction allows forwarding based on eleven different header field values. This contrasts sharply with the traditional approach to router-based forwarding that we studied in Sections 5.2--5.4, where forwarding of IP datagrams was based solely on a datagram's destination IP address. Recall from Figure 5.2 that packet forwarding rules are specified in a switch's flow table; it is the job of the SDN control plane to compute, manage and install flow table entries in all of the network's switches.

Separation of data plane and control plane. This separation is shown clearly in Figures 5.2 and 5.14. The data plane consists of the network's switches---relatively simple (but fast) devices that execute the "match plus action" rules in their flow tables. The control plane consists of servers and software that determine and manage the switches' flow tables.

Network control functions: external to data-plane switches. Given that the "S" in SDN is for "software," it's perhaps not surprising that the SDN control plane is implemented in software. Unlike traditional routers, however, this software executes on servers that are both distinct and remote from the network's switches. As shown in Figure 5.14, the control plane itself consists of two components---an SDN controller (or network operating system \[Gude 2008\]) and a set of network-control applications. The controller maintains accurate network state information (e.g., the state of remote links, switches, and hosts); provides this information to the network-control applications running in the control plane; and provides the means through which these applications can monitor, program, and control the underlying network devices. Although the controller in Figure 5.14 is shown as a single central server, in practice the controller is only logically centralized; it is typically implemented on several servers that provide coordinated, scalable performance and high availability.

A programmable network.
The network is programmable through the network-control applications running in the control plane. These applications represent the "brains" of the SDN control plane, using the APIs provided by the SDN controller to specify and control the data plane in the network devices. For example, a routing network-control application might determine the end-end paths between sources and destinations (e.g., by executing Dijkstra's algorithm using the node-state and link-state information maintained by the SDN controller). Another network application might perform access control, i.e., determine which packets are to be blocked at a switch, as in our third example in Section 4.4.3. Yet another application might forward packets in a manner that performs server load balancing (the second example we considered in Section 4.4.3).

From this discussion, we can see that SDN represents a significant "unbundling" of network functionality---data plane switches, SDN controllers, and network-control applications are separate entities that may each be provided by different vendors and organizations. This contrasts with the pre-SDN model in which a switch/router (together with its embedded control plane software and protocol implementations) was monolithic, vertically integrated, and sold by a single vendor. This unbundling of network functionality in SDN has been likened to the earlier evolution from mainframe computers (where hardware, system software, and applications were provided by a single vendor) to personal computers (with their separate hardware, operating systems, and applications). The unbundling of computing hardware, system software, and applications has arguably led to a rich, open ecosystem driven by innovation in all three of these areas; one hope for SDN is that it too will lead to such rich innovation.

Given our understanding of the SDN architecture of Figure 5.14, many questions naturally arise. How and where are the flow tables actually computed? How are these tables updated in response to events at SDN-controlled devices (e.g., an attached link going up/down)? And how are the flow table entries at multiple switches coordinated in such a way as to result in orchestrated and consistent network-wide functionality (e.g., end-to-end paths for forwarding packets from sources to destinations, or coordinated distributed firewalls)? It is the role of the SDN control plane to provide these, and many other, capabilities.

Figure 5.14 Components of the SDN architecture: SDN-controlled switches, the SDN controller, network-control applications

5.5.1 The SDN Control Plane: SDN Controller and SDN Network-Control Applications

Let's begin our discussion of the SDN control plane in the abstract, by considering the generic capabilities that the control plane must provide. As we'll see, this abstract, "first principles" approach will lead us to an overall architecture that reflects how SDN control planes have been implemented in practice. As noted above, the SDN control plane divides broadly into two components---the SDN controller and the SDN network-control applications. Let's explore the controller first. Many SDN controllers have been developed since the earliest SDN controller \[Gude 2008\]; see \[Kreutz 2015\] for an extremely thorough and up-to-date survey. Figure 5.15 provides a more detailed view of a generic SDN controller. A controller's functionality can be broadly organized into three layers.
Let's consider these layers in an uncharacteristically bottom-up fashion:

A communication layer: communicating between the SDN controller and controlled network devices. Clearly, if an SDN controller is going to control the operation of a remote SDN-enabled switch, host, or other device, a protocol is needed to transfer information between the controller and that device. In addition, a device must be able to communicate locally observed events to the controller (e.g., a message indicating that an attached link has gone up or down, that a device has just joined the network, or a heartbeat indicating that a device is up and operational). These events provide the SDN controller with an up-to-date view of the network's state. This protocol constitutes the lowest layer of the controller architecture, as shown in Figure 5.15. The communication between the controller and the controlled devices crosses what has come to be known as the controller's "southbound" interface. In Section 5.5.2, we'll study OpenFlow---a specific protocol that provides this communication functionality. OpenFlow is implemented in most, if not all, SDN controllers.

A network-wide state-management layer. The ultimate control decisions made by the SDN control plane---e.g., configuring flow tables in all switches to achieve the desired end-end forwarding, to implement load balancing, or to implement a particular firewalling capability---will require that the controller have up-to-date information about the state of the network's hosts, links, switches, and other SDN-controlled devices. A switch's flow table contains counters whose values might also be profitably used by network-control applications; these values should thus be available to the applications. Since the ultimate aim of the control plane is to determine flow tables for the various controlled devices, a controller might also maintain a copy of these tables. These pieces of information all constitute examples of the network-wide "state" maintained by the SDN controller.

The interface to the network-control application layer. The controller interacts with network-control applications through its "northbound" interface. This API allows network-control applications to read/write network state and flow tables within the state-management layer. Applications can register to be notified when state-change events occur, so that they can take actions in response to network event notifications sent from SDN-controlled devices. Different types of APIs may be provided; we'll see that two popular SDN controllers communicate with their applications using a REST \[Fielding 2000\] request-response interface.

Figure 5.15 Components of an SDN controller

We have noted several times that an SDN controller can be considered to be "logically centralized," i.e., that the controller may be viewed externally (e.g., from the point of view of SDN-controlled devices and external network-control applications) as a single, monolithic service. However, these services and the databases used to hold state information are implemented in practice by a distributed set of servers for fault tolerance, high availability, or for performance reasons. With controller functions being implemented by a set of servers, the semantics of the controller's internal operations (e.g., maintaining logical time ordering of events, consistency, consensus, and more) must be considered \[Panda 2013\].
Such concerns are common across many different distributed systems; see \[Lamport 1989, Lampson 1996\] for elegant solutions to these challenges. Modern controllers such as OpenDaylight \[OpenDaylight Lithium 2016\] and ONOS \[ONOS 2016\] (see sidebar) have placed considerable emphasis on architecting a logically centralized but physically distributed controller platform that provides scalable services and high availability to the controlled devices and network-control applications alike. The architecture depicted in Figure 5.15 closely resembles the architecture of the originally proposed NOX controller in 2008 \[Gude 2008\], as well as that of today's OpenDaylight \[OpenDaylight Lithium 2016\] and ONOS \[ONOS 2016\] SDN controllers (see sidebar). We'll cover an example of controller operation in Section 5.5.3. First, however, let's examine the OpenFlow protocol, which lies in the controller's communication layer.

5.5.2 OpenFlow Protocol

The OpenFlow protocol \[OpenFlow 2009, ONF 2016\] operates between an SDN controller and an SDN-controlled switch or other device implementing the OpenFlow API that we studied earlier in Section 4.4. The OpenFlow protocol operates over TCP, with a default port number of 6653. Among the important messages flowing from the controller to the controlled switch are the following:

Configuration. This message allows the controller to query and set a switch's configuration parameters.

Modify-State. This message is used by a controller to add/delete or modify entries in the switch's flow table, and to set switch port properties.

Read-State. This message is used by a controller to collect statistics and counter values from the switch's flow table and ports.

Send-Packet. This message is used by the controller to send a specific packet out of a specified port at the controlled switch. The message itself contains the packet to be sent in its payload.

Among the messages flowing from the SDN-controlled switch to the controller are the following:

Flow-Removed. This message informs the controller that a flow table entry has been removed, for example by a timeout or as the result of a received modify-state message.

Port-status. This message is used by a switch to inform the controller of a change in port status.

Packet-in. Recall from Section 4.4 that a packet arriving at a switch port and not matching any flow table entry is sent to the controller for additional processing. Matched packets may also be sent to the controller, as an action to be taken on a match. The packet-in message is used to send such packets to the controller.

Additional OpenFlow messages are defined in \[OpenFlow 2009, ONF 2016\].
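To suggest how these messages fit together, here is a toy dispatch loop on the controller side; the enum names mirror the message list above, but the handler logic and all function names are purely illustrative (a real controller parses the OpenFlow wire format over TCP port 6653).

```python
from enum import Enum, auto

class OFMsg(Enum):
    # Controller-to-switch messages
    CONFIGURATION = auto()
    MODIFY_STATE = auto()
    READ_STATE = auto()
    SEND_PACKET = auto()
    # Switch-to-controller messages
    FLOW_REMOVED = auto()
    PORT_STATUS = auto()
    PACKET_IN = auto()

def handle_switch_message(msg, payload, send_to_switch):
    if msg is OFMsg.PORT_STATUS:
        # A port (and so perhaps a link) changed state: recompute
        # affected routes, then push new rules via Modify-State.
        send_to_switch(OFMsg.MODIFY_STATE, recompute_rules(payload))
    elif msg is OFMsg.PACKET_IN:
        # Unmatched packet delivered to the controller: choose an
        # action, install a matching rule, and reinject the packet.
        send_to_switch(OFMsg.MODIFY_STATE, rule_for(payload))
        send_to_switch(OFMsg.SEND_PACKET, payload)

def recompute_rules(event):   # placeholder for a routing computation
    return {"event": event}

def rule_for(packet):         # placeholder for a policy decision
    return {"match": packet}
```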
PRINCIPLES IN PRACTICE

GOOGLE'S SOFTWARE-DEFINED GLOBAL NETWORK

Recall from the case study in Section 2.6 that Google deploys a dedicated wide-area network (WAN) that interconnects its data centers and server clusters (in IXPs and ISPs). This network, called B4, has a Google-designed SDN control plane built on OpenFlow. Google's network is able to drive WAN links at near 70% utilization over the long run (a two to three fold increase over typical link utilizations) and split application flows among multiple paths based on application priority and existing flow demands \[Jain 2013\]. The Google B4 network is particularly well-suited for SDN: (i) Google controls all devices from the edge servers in IXPs and ISPs to routers in their network core; (ii) the most bandwidth-intensive applications are large-scale data copies between sites that can defer to higher-priority interactive applications during times of resource congestion; (iii) with only a few dozen data centers being connected, centralized control is feasible. Google's B4 network uses custom-built switches, each implementing a slightly extended version of OpenFlow, with a local OpenFlow Agent (OFA) that is similar in spirit to the control agent we encountered in Figure 5.2. Each OFA in turn connects to an OpenFlow Controller (OFC) in the network control server (NCS), using a separate "out of band" network, distinct from the network that carries data-center traffic between data centers. The OFC thus provides the services used by the NCS to communicate with its controlled switches, similar in spirit to the lowest layer in the SDN architecture shown in Figure 5.15. In B4, the OFC also performs state management functions, keeping node and link status in a Network Information Base (NIB). Google's implementation of the OFC is based on the ONIX SDN controller \[Koponen 2010\]. Two routing protocols, BGP (for routing between the data centers) and IS-IS (a close relative of OSPF, for routing within a data center), are implemented. Paxos \[Chandra 2007\] is used to execute hot replicas of NCS components to protect against failure. A traffic engineering network-control application, sitting logically above the set of network control servers, interacts with these servers to provide global, network-wide bandwidth provisioning for groups of application flows. With B4, SDN made an important leap forward into the operational networks of a global network provider. See \[Jain 2013\] for a detailed description of B4.

5.5.3 Data and Control Plane Interaction: An Example

In order to solidify our understanding of the interaction between SDN-controlled switches and the SDN controller, let's consider the example shown in Figure 5.16, in which Dijkstra's algorithm (which we studied in Section 5.2) is used to determine shortest path routes. The SDN scenario in Figure 5.16 has two important differences from the earlier per-router-control scenario of Sections 5.2.1 and 5.3, where Dijkstra's algorithm was implemented in each and every router and link-state updates were flooded among all network routers: Dijkstra's algorithm is executed as a separate application, outside of the packet switches, and packet switches send link updates to the SDN controller and not to each other.

Figure 5.16 SDN controller scenario: Link-state change

In this example, let's assume that the link between switch s1 and s2 goes down; that shortest path routing is implemented, and that, consequently, incoming and outgoing flow forwarding rules at s1, s3, and s4 are affected, but that s2's operation is unchanged. Let's also assume that OpenFlow is used as the communication layer protocol, and that the control plane performs no other function other than link-state routing.

1. Switch s1, experiencing a link failure between itself and s2, notifies the SDN controller of the link-state change using the OpenFlow port-status message.

2. The SDN controller receives the OpenFlow message indicating the link-state change, and notifies the link-state manager, which updates a link-state database.
3. The network-control application that implements Dijkstra's link-state routing has previously registered to be notified when link state changes. That application receives the notification of the link-state change.

4. The link-state routing application interacts with the link-state manager to get updated link state; it might also consult other components in the state-management layer. It then computes the new least-cost paths.

5. The link-state routing application then interacts with the flow table manager, which determines the flow tables to be updated.

6. The flow table manager then uses the OpenFlow protocol to update flow table entries at affected switches---s1 (which will now route packets destined to s2 via s4), s2 (which will now begin receiving packets from s1 via intermediate switch s4), and s4 (which must now forward packets from s1 destined to s2).

This example is simple but illustrates how the SDN control plane provides control-plane services (in this case network-layer routing) that had been previously implemented with per-router control exercised in each and every network router; a code sketch of this event flow appears below, after Section 5.5.4. One can now easily appreciate how an SDN-enabled ISP could easily switch from least-cost path routing to a more hand-tailored approach to routing. Indeed, since the controller can tailor the flow tables as it pleases, it can implement any form of forwarding that it pleases---simply by changing its application-control software. This ease of change should be contrasted to the case of a traditional per-router control plane, where software in all routers (which might be provided to the ISP by multiple independent vendors) must be changed.

5.5.4 SDN: Past and Future

Although the intense interest in SDN is a relatively recent phenomenon, the technical roots of SDN, and the separation of the data and control planes in particular, go back considerably further. In 2004, \[Feamster 2004, Lakshman 2004, RFC 3746\] all argued for the separation of the network's data and control planes. \[van der Merwe 1998\] describes a control framework for ATM networks \[Black 1995\] with multiple controllers, each controlling a number of ATM switches. The Ethane project \[Casado 2007\] pioneered the notion of a network of simple flow-based Ethernet switches with match-plus-action flow tables, a centralized controller that managed flow admission and routing, and the forwarding of unmatched packets from the switch to the controller. A network of more than 300 Ethane switches was operational in 2007. Ethane quickly evolved into the OpenFlow project, and the rest (as the saying goes) is history!

Numerous research efforts are aimed at developing future SDN architectures and capabilities. As we have seen, the SDN revolution is leading to the disruptive replacement of dedicated monolithic switches and routers (with both data and control planes) by simple commodity switching hardware and a sophisticated software control plane. A generalization of SDN known as network functions virtualization (NFV) similarly aims at disruptive replacement of sophisticated middleboxes (such as middleboxes with dedicated hardware and proprietary software for media caching/service) with simple commodity servers, switching, and storage \[Gember-Jacobson 2014\]. A second area of important research seeks to extend SDN concepts from the intra-AS setting to the inter-AS setting \[Gupta 2014\].
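Returning to the six-step link-state-change example of Section 5.5.3, the division of labor between the controller's layers and a routing application might be sketched as follows; every class and method name here is our own invention, not a real controller API.

```python
class LinkStateRoutingApp:
    def __init__(self, controller):
        self.controller = controller
        controller.listeners.append(self)   # step 3: register for link events

    def on_link_change(self):
        links = self.controller.link_state                 # step 4: read updated state
        paths = self.compute_least_cost_paths(links)
        self.controller.flow_table_manager.push(paths)     # steps 5-6: update switches

    def compute_least_cost_paths(self, links):
        return {}   # placeholder for Dijkstra over the link-state database

class Controller:
    def __init__(self, flow_table_manager):
        self.link_state = {}
        self.flow_table_manager = flow_table_manager
        self.listeners = []

    def on_port_status(self, link, is_up):     # steps 1-2: OpenFlow port-status arrives
        self.link_state[link] = is_up          # update the link-state database
        for app in self.listeners:             # step 3: notify registered applications
            app.on_link_change()
```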
PRINCIPLES IN PRACTICE

SDN CONTROLLER CASE STUDIES: THE OPENDAYLIGHT AND ONOS CONTROLLERS

In the earliest days of SDN, there was a single SDN protocol (OpenFlow \[McKeown 2008; OpenFlow 2009\]) and a single SDN controller (NOX \[Gude 2008\]). Since then, the number of SDN controllers in particular has grown significantly \[Kreutz 2015\]. Some SDN controllers are company-specific and proprietary, e.g., ONIX \[Koponen 2010\], Juniper Networks Contrail \[Juniper Contrail 2016\], and Google's controller \[Jain 2013\] for its B4 wide-area network. But many more controllers are open-source and implemented in a variety of programming languages \[Erickson 2013\]. Most recently, the OpenDaylight controller \[OpenDaylight Lithium 2016\] and the ONOS controller \[ONOS 2016\] have found considerable industry support. They are both open-source and are being developed in partnership with the Linux Foundation.

The OpenDaylight Controller

Figure 5.17 presents a simplified view of the OpenDaylight Lithium SDN controller platform \[OpenDaylight Lithium 2016\]. ODL's main set of controller components corresponds closely to those we developed in Figure 5.15. Network-Service Applications are the applications that determine how data-plane forwarding and other services, such as firewalling and load balancing, are accomplished in the controlled switches. Unlike the canonical controller in Figure 5.15, the ODL controller has two interfaces through which applications may communicate with native controller services and each other: external applications communicate with controller modules using a REST request-response API running over HTTP. Internal applications communicate with each other via the Service Abstraction Layer (SAL). The choice as to whether a controller application is implemented externally or internally is up to the application designer; the particular configuration of applications shown in Figure 5.17 is only meant as an example.

Figure 5.17 The OpenDaylight controller

ODL's Basic Network-Service Functions are at the heart of the controller, and they correspond closely to the network-wide state management capabilities that we encountered in Figure 5.15. The SAL is the controller's nerve center, allowing controller components and applications to invoke each other's services and to subscribe to events they generate. It also provides a uniform abstract interface to the specific underlying communications protocols in the communication layer, including OpenFlow and SNMP (the Simple Network Management Protocol---a network management protocol that we will cover in Section 5.7). OVSDB is a protocol used to manage data center switching, an important application area for SDN technology. We'll introduce data center networking in Chapter 6.

The ONOS Controller

Figure 5.18 presents a simplified view of the ONOS controller \[ONOS 2016\]. Similar to the canonical controller in Figure 5.15, three layers can be identified in the ONOS controller:

Figure 5.18 ONOS controller architecture

Northbound abstractions and protocols. A unique feature of ONOS is its intent framework, which allows an application to request a high-level service (e.g., to set up a connection between Host A and Host B, or conversely to not allow Host A and Host B to communicate) without having to know the details of how this service is performed.
Figure 5.18 ONOS controller architecture

The ONOS Controller

Figure 5.18 presents a simplified view of the ONOS controller \[ONOS 2016\]. Similar to the canonical controller in Figure 5.15, three layers can be identified in the ONOS controller:

Northbound abstractions and protocols. A unique feature of ONOS is its intent framework, which allows an application to request a high-level service (e.g., to set up a connection between Host A and Host B, or conversely to not allow Host A and Host B to communicate) without having to know the details of how this service is performed. State information is provided to network-control applications across the northbound API either synchronously (via query) or asynchronously (via listener callbacks, e.g., when network state changes).

Distributed core. The state of the network's links, hosts, and devices is maintained in ONOS's distributed core. ONOS is deployed as a service on a set of interconnected servers, with each server running an identical copy of the ONOS software; an increased number of servers offers an increased service capacity. The ONOS core provides the mechanisms for service replication and coordination among instances, providing the applications above and the network devices below with the abstraction of logically centralized core services.

Southbound abstractions and protocols. The southbound abstractions mask the heterogeneity of the underlying hosts, links, switches, and protocols, allowing the distributed core to be both device and protocol agnostic. Because of this abstraction, the southbound interface below the distributed core is logically higher than in our canonical controller in Figure 5.15 or the ODL controller in Figure 5.17.

5.6 ICMP: The Internet Control Message Protocol

The Internet Control Message Protocol (ICMP), specified in \[RFC 792\], is used by hosts and routers to communicate network-layer information to each other. The most typical use of ICMP is for error reporting. For example, when running an HTTP session, you may have encountered an error message such as "Destination network unreachable." This message had its origins in ICMP. At some point, an IP router was unable to find a path to the host specified in your HTTP request. That router created and sent an ICMP message to your host indicating the error.

ICMP is often considered part of IP, but architecturally it lies just above IP, as ICMP messages are carried inside IP datagrams. That is, ICMP messages are carried as IP payload, just as TCP or UDP segments are carried as IP payload. Similarly, when a host receives an IP datagram with ICMP specified as the upper-layer protocol (an upper-layer protocol number of 1), it demultiplexes the datagram's contents to ICMP, just as it would demultiplex a datagram's content to TCP or UDP. ICMP messages have a type and a code field, and contain the header and the first 8 bytes of the IP datagram that caused the ICMP message to be generated in the first place (so that the sender can determine the datagram that caused the error). Selected ICMP message types are shown in Figure 5.19.

Figure 5.19 ICMP message types

Note that ICMP messages are used not only for signaling error conditions. The well-known ping program sends an ICMP type 8 code 0 message to the specified host. The destination host, seeing the echo request, sends back a type 0 code 0 ICMP echo reply. Most TCP/IP implementations support the ping server directly in the operating system; that is, the server is not a process. Chapter 11 of \[Stevens 1990\] provides the source code for the ping client program. Note that the client program needs to be able to instruct the operating system to generate an ICMP message of type 8 code 0.
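To make the type and code fields concrete, here is a minimal sketch of how such a client might build and send an echo request in Python; it anticipates the ICMP ping programming assignment at the end of this chapter. The identifier, payload, and destination address are arbitrary choices for illustration, and raw sockets typically require administrator privileges.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words (in the style of RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                       # fold carry bits back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def echo_request(ident: int, seq: int, payload: bytes = b"ping") -> bytes:
    # Type 8 (echo request), code 0; the checksum is computed over the
    # whole ICMP message with the checksum field initially zero.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

# Sending requires a raw socket (and usually administrator privileges);
# 192.0.2.1 is a documentation address standing in for a real host.
sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
sock.sendto(echo_request(ident=0x1234, seq=1), ("192.0.2.1", 0))
```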
Another interesting ICMP message is the source quench message. This message is seldom used in practice. Its original purpose was to perform congestion control---to allow a congested router to send an ICMP source quench message to a host to force that host to reduce its transmission rate. We have seen in Chapter 3 that TCP has its own congestion-control mechanism that operates at the transport layer, without the use of network-layer feedback such as the ICMP source quench message.

In Chapter 1 we introduced the Traceroute program, which allows us to trace a route from a host to any other host in the world. Interestingly, Traceroute is implemented with ICMP messages. To determine the names and addresses of the routers between source and destination, Traceroute in the source sends a series of ordinary IP datagrams to the destination. Each of these datagrams carries a UDP segment with an unlikely UDP port number. The first of these datagrams has a TTL of 1, the second of 2, the third of 3, and so on. The source also starts timers for each of the datagrams. When the nth datagram arrives at the nth router, the nth router observes that the TTL of the datagram has just expired. According to the rules of the IP protocol, the router discards the datagram and sends an ICMP warning message to the source (type 11 code 0). This warning message includes the name of the router and its IP address. When this ICMP message arrives back at the source, the source obtains the round-trip time from the timer and the name and IP address of the nth router from the ICMP message.

How does a Traceroute source know when to stop sending UDP segments? Recall that the source increments the TTL field for each datagram it sends. Thus, one of the datagrams will eventually make it all the way to the destination host. Because this datagram contains a UDP segment with an unlikely port number, the destination host sends a port unreachable ICMP message (type 3 code 3) back to the source. When the source host receives this particular ICMP message, it knows it does not need to send additional probe packets. (The standard Traceroute program actually sends sets of three packets with the same TTL; thus the Traceroute output provides three results for each TTL.) In this manner, the source host learns the number and the identities of routers that lie between it and the destination host and the round-trip time between the two hosts. Note that the Traceroute client program must be able to instruct the operating system to generate UDP datagrams with specific TTL values and must also be able to be notified by its operating system when ICMP messages arrive. Now that you understand how Traceroute works, you may want to go back and play with it some more.
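In that spirit, the TTL-stepping loop at the heart of Traceroute can be sketched in a few lines of Python. This is a bare-bones illustration under several simplifying assumptions: one probe per TTL, no round-trip timing, a fixed 20-byte IP header in the reply, no matching of replies to probes, and a placeholder destination address; the raw ICMP socket again typically requires administrator privileges.

```python
import socket
import struct

DEST = "192.0.2.1"            # placeholder destination address
UNLIKELY_PORT = 33434          # traditional traceroute-style UDP port

recv = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
recv.settimeout(3.0)
send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

for ttl in range(1, 31):
    send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    send.sendto(b"probe", (DEST, UNLIKELY_PORT))
    try:
        packet, (router, _) = recv.recvfrom(512)
    except socket.timeout:
        print(ttl, "*")
        continue
    # The ICMP header follows the (assumed 20-byte) IP header of the reply.
    icmp_type, icmp_code = struct.unpack("!BB", packet[20:22])
    print(ttl, router, "ICMP type", icmp_type, "code", icmp_code)
    if icmp_type == 3 and icmp_code == 3:   # port unreachable: we're done
        break
```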
A new version of ICMP has been defined for IPv6 in RFC 4443. In addition to reorganizing the existing ICMP type and code definitions, ICMPv6 also added new types and codes required by the new IPv6 functionality. These include the "Packet Too Big" type and an "unrecognized IPv6 options" error code.

5.7 Network Management and SNMP

Having now made our way to the end of our study of the network layer, with only the link layer before us, we're well aware that a network consists of many complex, interacting pieces of hardware and software---from the links, switches, routers, hosts, and other devices that comprise the physical components of the network to the many protocols that control and coordinate these devices. When hundreds or thousands of such components are brought together by an organization to form a network, the job of the network administrator to keep the network "up and running" is surely a challenge. We saw in Section 5.5 that the logically centralized controller can help with this process in an SDN context. But the challenge of network management has been around since long before SDN, and a rich set of network management tools and approaches has evolved to help the network administrator monitor, manage, and control the network. We'll study these tools and techniques in this section.

An often-asked question is "What is network management?" A well-conceived, single-sentence (albeit a rather long run-on sentence) definition of network management from \[Saydam 1996\] is:

Network management includes the deployment, integration, and coordination of the hardware, software, and human elements to monitor, test, poll, configure, analyze, evaluate, and control the network and element resources to meet the real-time, operational performance, and Quality of Service requirements at a reasonable cost.

Given this broad definition, we'll cover only the rudiments of network management in this section---the architecture, protocols, and information base used by a network administrator in performing this task. We'll not cover the administrator's decision-making processes, where topics such as fault identification \[Labovitz 1997; Steinder 2002; Feamster 2005; Wu 2005; Teixeira 2006\], anomaly detection \[Lakhina 2005; Barford 2009\], network design/engineering to meet contracted Service Level Agreements (SLAs) \[Huston 1999a\], and more come into consideration. Our focus is thus purposefully narrow; the interested reader should consult these references, the excellent network-management text by Subramanian \[Subramanian 2000\], and the more detailed treatment of network management available on the Web site for this text.

5.7.1 The Network Management Framework

Figure 5.20 shows the key components of network management:

The managing server is an application, typically with a human in the loop, running in a centralized network management station in the network operations center (NOC). The managing server is the locus of activity for network management; it controls the collection, processing, analysis, and/or display of network management information. It is here that actions are initiated to control network behavior and here that the human network administrator interacts with the network's devices.

A managed device is a piece of network equipment (including its software) that resides on a managed network. A managed device might be a host, router, switch, middlebox, modem, thermometer, or other network-connected device. There may be several so-called managed objects within a managed device. These managed objects are the actual pieces of hardware within the managed device (for example, a network interface card is but one component of a host or router) and the configuration parameters for these hardware and software components (for example, an intra-AS routing protocol such as OSPF). Each managed object within a managed device has associated information that is collected into a Management Information Base (MIB); we'll see that the values of these pieces of information are available to (and in many cases able to be set by) the managing server. A MIB object might be a counter, such as the number of IP datagrams discarded at a router due to errors in an IP datagram header, or the number of UDP segments received at a host; descriptive information such as the version of the software running on a DNS server; status information such as whether a particular device is functioning correctly; or protocol-specific information such as a routing path to a destination.
MIB objects are specified in a data description language known as SMI (Structure of Management Information) \[RFC 2578; RFC 2579; RFC 2580\]. A formal definition language is used to ensure that the syntax and semantics of the network management data are well defined and unambiguous. Related MIB objects are gathered into MIB modules. As of mid-2015, there were nearly 400 MIB modules defined by RFCs, and a much larger number of vendor-specific (private) MIB modules.

Also resident in each managed device is a network management agent, a process running in the managed device that communicates with the managing server, taking local actions at the managed device under the command and control of the managing server. The network management agent is similar to the routing agent that we saw in Figure 5.2.

Figure 5.20 Elements of network management: Managing server, managed devices, MIB data, remote agents, SNMP

The final component of a network management framework is the network management protocol. The protocol runs between the managing server and the managed devices, allowing the managing server to query the status of managed devices and indirectly take actions at these devices via its agents. Agents can use the network management protocol to inform the managing server of exceptional events (for example, component failures or violation of performance thresholds). It's important to note that the network management protocol does not itself manage the network. Instead, it provides capabilities that a network administrator can use to manage ("monitor, test, poll, configure, analyze, evaluate, and control") the network. This is a subtle, but important, distinction. In the following section, we'll cover the Internet's SNMP (Simple Network Management Protocol) protocol.

5.7.2 The Simple Network Management Protocol (SNMP)

The Simple Network Management Protocol version 2 (SNMPv2) \[RFC 3416\] is an application-layer protocol used to convey network-management control and information messages between a managing server and an agent executing on behalf of that managing server. The most common usage of SNMP is in a request-response mode in which an SNMP managing server sends a request to an SNMP agent, who receives the request, performs some action, and sends a reply to the request. Typically, a request will be used to query (retrieve) or modify (set) MIB object values associated with a managed device. A second common usage of SNMP is for an agent to send an unsolicited message, known as a trap message, to a managing server. Trap messages are used to notify a managing server of an exceptional situation (e.g., a link interface going up or down) that has resulted in changes to MIB object values. SNMPv2 defines seven types of messages, known generically as protocol data units---PDUs---as shown in Table 5.2 and described below. The format of the PDU is shown in Figure 5.21. The GetRequest, GetNextRequest, and GetBulkRequest PDUs are all sent from a managing server to an agent to request the value of one or more MIB objects at the agent's managed device.
The MIB objects whose values are being requested are specified in the variable binding portion of the PDU.

Table 5.2 SNMPv2 PDU types

| SNMPv2 PDU Type | Sender-receiver | Description |
| --- | --- | --- |
| GetRequest | manager-to-agent | get value of one or more MIB object instances |
| GetNextRequest | manager-to-agent | get value of next MIB object instance in list or table |
| GetBulkRequest | manager-to-agent | get values in large block of data, for example, values in a large table |
| InformRequest | manager-to-manager | inform remote managing entity of MIB values remote to its access |
| SetRequest | manager-to-agent | set value of one or more MIB object instances |
| Response | agent-to-manager or manager-to-manager | generated in response to GetRequest, GetNextRequest, GetBulkRequest, SetRequest PDU, or InformRequest |
| SNMPv2-Trap | agent-to-manager | inform manager of an exceptional event |

Figure 5.21 SNMP PDU format

GetRequest, GetNextRequest, and GetBulkRequest differ in the granularity of their data requests. GetRequest can request an arbitrary set of MIB values; multiple GetNextRequest PDUs can be used to sequence through a list or table of MIB objects; GetBulkRequest allows a large block of data to be returned, avoiding the overhead incurred if multiple GetRequest or GetNextRequest messages were to be sent. In all three cases, the agent responds with a Response PDU containing the object identifiers and their associated values.

The SetRequest PDU is used by a managing server to set the value of one or more MIB objects in a managed device. An agent replies with a Response PDU with the "noError" error status to confirm that the value has indeed been set. The InformRequest PDU is used by a managing server to notify another managing server of MIB information that is remote to the receiving server. The Response PDU is typically sent from a managed device to the managing server in response to a request message from that server, returning the requested information.

The final type of SNMPv2 PDU is the trap message. Trap messages are generated asynchronously; that is, they are not generated in response to a received request but rather in response to an event for which the managing server requires notification. RFC 3418 defines well-known trap types that include a cold or warm start by a device, a link going up or down, the loss of a neighbor, or an authentication failure event. A received trap message has no required response from a managing server.

Given the request-response nature of SNMP, it is worth noting here that although SNMP PDUs can be carried via many different transport protocols, the SNMP PDU is typically carried in the payload of a UDP datagram. Indeed, RFC 3417 states that UDP is "the preferred transport mapping." However, since UDP is an unreliable transport protocol, there is no guarantee that a request, or its response, will be received at the intended destination. The request ID field of the PDU (see Figure 5.21) is used by the managing server to number its requests to an agent; the agent's response takes its request ID from that of the received request. Thus, the request ID field can be used by the managing server to detect lost requests or replies. It is up to the managing server to decide whether to retransmit a request if no corresponding response is received after a given amount of time.
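The request-ID bookkeeping just described might look like the following on the manager side. This is a sketch under loud assumptions: the agent address, timeout, and retry count are placeholders, and a bare 4-byte request ID stands in for the BER-encoded PDU a real SNMP manager would send.

```python
import socket

AGENT = ("192.0.2.1", 161)   # placeholder agent address; 161 is SNMP's UDP port
MAX_RETRIES = 3               # the standard leaves both choices to the manager
TIMEOUT = 2.0                 # seconds

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(TIMEOUT)

def send_request(request_id: int, pdu: bytes) -> bytes:
    """Send a request over UDP, match the reply by request ID, and
    retransmit on timeout, since UDP gives no delivery guarantee."""
    for _ in range(MAX_RETRIES):
        sock.sendto(request_id.to_bytes(4, "big") + pdu, AGENT)
        try:
            while True:
                reply, _addr = sock.recvfrom(4096)
                if int.from_bytes(reply[:4], "big") == request_id:
                    return reply          # the response to *this* request
                # otherwise: a stale reply to an earlier request; keep waiting
        except socket.timeout:
            continue                      # no reply in time; retransmit
    raise TimeoutError("agent did not respond")
```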
In particular, the SNMP standard does not mandate any particular procedure for retransmission, or even if retransmission is to be done in the first place. It only requires that the managing server "needs to act responsibly in respect to the frequency and duration of retransmissions." This, of course, leads one to wonder how a "responsible" protocol should act!

SNMP has evolved through three versions. The designers of SNMPv3 have said that "SNMPv3 can be thought of as SNMPv2 with additional security and administration capabilities" \[RFC 3410\]. Certainly, there are changes in SNMPv3 over SNMPv2, but nowhere are those changes more evident than in the area of administration and security. The central role of security in SNMPv3 was particularly important, since the lack of adequate security resulted in SNMP being used primarily for monitoring rather than control (for example, SetRequest is rarely used in SNMPv1). Once again, we see that security---a topic we'll cover in detail in Chapter 8---is of critical concern, but once again a concern whose importance had been realized perhaps a bit late and only then "added on."

5.8 Summary

We have now completed our two-chapter journey into the network core---a journey that began with our study of the network layer's data plane in Chapter 4 and finished here with our study of the network layer's control plane. We learned that the control plane is the network-wide logic that controls not only how a datagram is forwarded among routers along an end-to-end path from the source host to the destination host, but also how network-layer components and services are configured and managed.

We learned that there are two broad approaches towards building a control plane: traditional per-router control (where a routing algorithm runs in each and every router and the routing component in the router communicates with the routing components in other routers) and software-defined networking (SDN) control (where a logically centralized controller computes and distributes the forwarding tables to be used by each and every router). We studied two fundamental routing algorithms for computing least-cost paths in a graph---link-state routing and distance-vector routing---in Section 5.2; these algorithms find application in both per-router control and in SDN control. These algorithms are the basis for two widely deployed Internet routing protocols, OSPF and BGP, that we covered in Sections 5.3 and 5.4. We covered the SDN approach to the network-layer control plane in Section 5.5, investigating SDN network-control applications, the SDN controller, and the OpenFlow protocol for communicating between the controller and SDN-controlled devices. In Sections 5.6 and 5.7, we covered some of the nuts and bolts of managing an IP network: ICMP (the Internet Control Message Protocol) and SNMP (the Simple Network Management Protocol).

Having completed our study of the network layer, our journey now takes us one step further down the protocol stack, namely, to the link layer. Like the network layer, the link layer is part of each and every network-connected device. But we will see in the next chapter that the link layer has the much more localized task of moving packets between nodes on the same link or LAN.
Although this task may appear on the surface to be rather simple compared with that of the network layer's tasks, we will see that the link layer involves a number of important and fascinating issues that can keep us busy for a long time.

Homework Problems and Questions

Chapter 5 Review Questions

SECTION 5.1

R1. What is meant by a control plane that is based on per-router control? In such cases, when we say the network control and data planes are implemented "monolithically," what do we mean?

R2. What is meant by a control plane that is based on logically centralized control? In such cases, are the data plane and the control plane implemented within the same device or in separate devices? Explain.

SECTION 5.2

R3. Compare and contrast the properties of a centralized and a distributed routing algorithm. Give an example of a routing protocol that takes a centralized and a decentralized approach.

R4. Compare and contrast link-state and distance-vector routing algorithms.

R5. What is the "count to infinity" problem in distance-vector routing?

R6. Is it necessary that every autonomous system use the same intra-AS routing algorithm? Why or why not?

SECTIONS 5.3--5.4

R7. Why are different inter-AS and intra-AS protocols used in the Internet?

R8. True or false: When an OSPF router sends its link-state information, it is sent only to its directly attached neighbors. Explain.

R9. What is meant by an area in an OSPF autonomous system? Why was the concept of an area introduced?

R10. Define and contrast the following terms: subnet, prefix, and BGP route.

R11. How does BGP use the NEXT-HOP attribute? How does it use the AS-PATH attribute?

R12. Describe how a network administrator of an upper-tier ISP can implement policy when configuring BGP.

R13. True or false: When a BGP router receives an advertised path from its neighbor, it must add its own identity to the received path and then send that new path on to all of its neighbors. Explain.

SECTION 5.5

R14. Describe the main role of the communication layer, the network-wide state-management layer, and the network-control application layer in an SDN controller.

R15. Suppose you wanted to implement a new routing protocol in the SDN control plane. At which layer would you implement that protocol? Explain.

R16. What types of messages flow across an SDN controller's northbound and southbound APIs? Who is the recipient of these messages sent from the controller across the southbound interface, and who sends messages to the controller across the northbound interface?

R17. Describe the purpose of two types of OpenFlow messages (of your choosing) that are sent from a controlled device to the controller. Describe the purpose of two types of OpenFlow messages (of your choosing) that are sent from the controller to a controlled device.

R18. What is the purpose of the service abstraction layer in the OpenDaylight SDN controller?

SECTIONS 5.6--5.7

R19. Name four different types of ICMP messages.

R20. What two types of ICMP messages are received at the sending host executing the Traceroute program?

R21. Define the following terms in the context of SNMP: managing server, managed device, network management agent, and MIB.

R22. What are the purposes of the SNMP GetRequest and SetRequest messages?

R23. What is the purpose of the SNMP trap message?

Problems

P1. Looking at Figure 5.3, enumerate the paths from y to u that do not contain any loops.

P2.
Repeat Problem P1 for paths from x to z, z to u, and z to w.

P3. Consider the following network. With the indicated link costs, use Dijkstra's shortest-path algorithm to compute the shortest path from x to all network nodes. Show how the algorithm works by computing a table similar to Table 5.1.

Dijkstra's algorithm: discussion and example

P4. Consider the network shown in Problem P3. Using Dijkstra's algorithm, and showing your work using a table similar to Table 5.1, do the following:

a. Compute the shortest path from t to all network nodes.
b. Compute the shortest path from u to all network nodes.
c. Compute the shortest path from v to all network nodes.
d. Compute the shortest path from w to all network nodes.
e. Compute the shortest path from y to all network nodes.
f. Compute the shortest path from z to all network nodes.

P5. Consider the network shown below, and assume that each node initially knows the costs to each of its neighbors. Consider the distance-vector algorithm and show the distance table entries at node z.

P6. Consider a general topology (that is, not the specific network shown above) and a synchronous version of the distance-vector algorithm. Suppose that at each iteration, a node exchanges its distance vectors with its neighbors and receives their distance vectors. Assuming that the algorithm begins with each node knowing only the costs to its immediate neighbors, what is the maximum number of iterations required before the distributed algorithm converges? Justify your answer.

P7. Consider the network fragment shown below. x has only two attached neighbors, w and y. w has a minimum-cost path to destination u (not shown) of 5, and y has a minimum-cost path to u of 6. The complete paths from w and y to u (and between w and y) are not shown. All link costs in the network have strictly positive integer values.

a. Give x's distance vector for destinations w, y, and u.

b. Give a link-cost change for either c(x, w) or c(x, y) such that x will inform its neighbors of a new minimum-cost path to u as a result of executing the distance-vector algorithm.

c. Give a link-cost change for either c(x, w) or c(x, y) such that x will not inform its neighbors of a new minimum-cost path to u as a result of executing the distance-vector algorithm.

P8. Consider the three-node topology shown in Figure 5.6. Rather than having the link costs shown in Figure 5.6, the link costs are c(x,y)=3, c(y,z)=6, c(z,x)=4. Compute the distance tables after the initialization step and after each iteration of a synchronous version of the distance-vector algorithm (as we did in our earlier discussion of Figure 5.6).

P9. Consider the count-to-infinity problem in distance-vector routing. Will the count-to-infinity problem occur if we decrease the cost of a link? Why? How about if we connect two nodes which do not have a link?

P10. Argue that for the distance-vector algorithm in Figure 5.6, each value in the distance vector D(x) is non-increasing and will eventually stabilize in a finite number of steps.

P11. Consider Figure 5.7. Suppose there is another router w, connected to router y and z. The costs of all links are given as follows: c(x,y)=4, c(x,z)=50, c(y,w)=1, c(z,w)=1, c(y,z)=3. Suppose that poisoned reverse is used in the distance-vector routing algorithm.

a. When the distance-vector routing is stabilized, routers w, y, and z inform their distances to x to each other.
What distance values do they tell each other?

b. Now suppose that the link cost between x and y increases to 60. Will there be a count-to-infinity problem even if poisoned reverse is used? Why or why not? If there is a count-to-infinity problem, then how many iterations are needed for the distance-vector routing to reach a stable state again? Justify your answer.

c. How do you modify c(y, z) such that there is no count-to-infinity problem at all if c(y,x) changes from 4 to 60?

P12. Describe how loops in paths can be detected in BGP.

P13. Will a BGP router always choose the loop-free route with the shortest AS-path length? Justify your answer.

P14. Consider the network shown below. Suppose AS3 and AS2 are running OSPF for their intra-AS routing protocol. Suppose AS1 and AS4 are running RIP for their intra-AS routing protocol. Suppose eBGP and iBGP are used for the inter-AS routing protocol. Initially suppose there is no physical link between AS2 and AS4.

a. Router 3c learns about prefix x from which routing protocol: OSPF, RIP, eBGP, or iBGP?

b. Router 3a learns about x from which routing protocol?

c. Router 1c learns about x from which routing protocol?

d. Router 1d learns about x from which routing protocol?

P15. Referring to the previous problem, once router 1d learns about x it will put an entry (x, I) in its forwarding table.

a. Will I be equal to I1 or I2 for this entry? Explain why in one sentence.

b. Now suppose that there is a physical link between AS2 and AS4, shown by the dotted line. Suppose router 1d learns that x is accessible via AS2 as well as via AS3. Will I be set to I1 or I2? Explain why in one sentence.

c. Now suppose there is another AS, called AS5, which lies on the path between AS2 and AS4 (not shown in diagram). Suppose router 1d learns that x is accessible via AS2 AS5 AS4 as well as via AS3 AS4. Will I be set to I1 or I2? Explain why in one sentence.

P16. Consider the following network. ISP B provides national backbone service to regional ISP A. ISP C provides national backbone service to regional ISP D. Each ISP consists of one AS. B and C peer with each other in two places using BGP. Consider traffic going from A to D. B would prefer to hand that traffic over to C on the West Coast (so that C would have to absorb the cost of carrying the traffic cross-country), while C would prefer to get the traffic via its East Coast peering point with B (so that B would have carried the traffic across the country). What BGP mechanism might C use, so that B would hand over A-to-D traffic at its East Coast peering point? To answer this question, you will need to dig into the BGP specification.

P17. In Figure 5.13, consider the path information that reaches stub networks W, X, and Y. Based on the information available at W and X, what are their respective views of the network topology? Justify your answer. The topology view at Y is shown below.

P18. Consider Figure 5.13. B would never forward traffic destined to Y via X based on BGP routing. But there are some very popular applications for which data packets go to X first and then flow to Y. Identify one such application, and describe how data packets follow a path not given by BGP routing.

P19. In Figure 5.13, suppose that there is another stub network V that is a customer of ISP A. Suppose that B and C have a peering relationship, and A is a customer of both B and C.
Suppose that A would like to have the traffic destined to W come from B only, and the traffic destined to V from either B or C. How should A advertise its routes to B and C? What AS routes does C receive?

P20. Suppose ASs X and Z are not directly connected but instead are connected by AS Y. Further suppose that X has a peering agreement with Y, and that Y has a peering agreement with Z. Finally, suppose that Z wants to transit all of Y's traffic but does not want to transit X's traffic. Does BGP allow Z to implement this policy?

P21. Consider the two ways in which communication occurs between a managing entity and a managed device: request-response mode and trapping. What are the pros and cons of these two approaches, in terms of (1) overhead, (2) notification time when exceptional events occur, and (3) robustness with respect to lost messages between the managing entity and the device?

P22. In Section 5.7 we saw that it was preferable to transport SNMP messages in unreliable UDP datagrams. Why do you think the designers of SNMP chose UDP rather than TCP as the transport protocol of choice for SNMP?

Socket Programming Assignment

At the end of Chapter 2, there are four socket programming assignments. Below, you will find a fifth assignment which employs ICMP, a protocol discussed in this chapter.

Assignment 5: ICMP Ping

Ping is a popular networking application used to test from a remote location whether a particular host is up and reachable. It is also often used to measure latency between the client host and the target host. It works by sending ICMP "echo request" packets (i.e., ping packets) to the target host and listening for ICMP "echo response" replies (i.e., pong packets). Ping measures the RTT, records packet loss, and calculates a statistical summary of multiple ping-pong exchanges (the minimum, mean, maximum, and standard deviation of the round-trip times). In this lab, you will write your own Ping application in Python. Your application will use ICMP. But in order to keep your program simple, you will not exactly follow the official specification in RFC 1739. Note that you will only need to write the client side of the program, as the functionality needed on the server side is built into almost all operating systems. You can find full details of this assignment, as well as important snippets of the Python code, at the Web site http://www.pearsonhighered.com/cs-resources.

Programming Assignment

In this programming assignment, you will be writing a "distributed" set of procedures that implements a distributed asynchronous distance-vector routing for the network shown below.

You are to write the following routines that will "execute" asynchronously within the emulated environment provided for this assignment. For node 0, you will write the routines:

rtinit0(). This routine will be called once at the beginning of the emulation. rtinit0() has no arguments. It should initialize your distance table in node 0 to reflect the direct costs of 1, 3, and 7 to nodes 1, 2, and 3, respectively. In the figure above, all links are bidirectional and the costs in both directions are identical. After initializing the distance table and any other data structures needed by your node 0 routines, it should then send its directly connected neighbors (in this case, 1, 2, and 3) the cost of its minimum-cost paths to all other network nodes.
This minimum-cost information is sent to neighboring nodes in a routing update packet by calling the routine tolayer2(), as described in the full assignment. The format of the routing update packet is also described in the full assignment.

rtupdate0(struct rtpkt *rcvdpkt). This routine will be called when node 0 receives a routing packet that was sent to it by one of its directly connected neighbors. The parameter *rcvdpkt is a pointer to the packet that was received. rtupdate0() is the "heart" of the distance-vector algorithm. The values it receives in a routing update packet from some other node i contain i's current shortest-path costs to all other network nodes. rtupdate0() uses these received values to update its own distance table (as specified by the distance-vector algorithm). If its own minimum cost to another node changes as a result of the update, node 0 informs its directly connected neighbors of this change in minimum cost by sending them a routing packet. Recall that in the distance-vector algorithm, only directly connected nodes will exchange routing packets. Thus, nodes 1 and 2 will communicate with each other, but nodes 1 and 3 will not communicate with each other.

Similar routines are defined for nodes 1, 2, and 3. Thus, you will write eight procedures in all: rtinit0(), rtinit1(), rtinit2(), rtinit3(), rtupdate0(), rtupdate1(), rtupdate2(), and rtupdate3(). These routines will together implement a distributed, asynchronous computation of the distance tables for the topology and costs shown in the figure on the preceding page. You can find the full details of the programming assignment, as well as C code that you will need to create the simulated hardware/software environment, at http://www.pearsonhighered.com/cs-resources. A Java version of the assignment is also available.

Wireshark Lab

In the Web site for this textbook, www.pearsonhighered.com/cs-resources, you'll find a Wireshark lab assignment that examines the use of the ICMP protocol in the ping and traceroute commands.

An Interview With... Jennifer Rexford

Jennifer Rexford is a Professor in the Computer Science department at Princeton University. Her research has the broad goal of making computer networks easier to design and manage, with particular emphasis on routing protocols. From 1996--2004, she was a member of the Network Management and Performance department at AT&T Labs--Research. While at AT&T, she designed techniques and tools for network measurement, traffic engineering, and router configuration that were deployed in AT&T's backbone network. Jennifer is co-author of the book "Web Protocols and Practice: Networking Protocols, Caching, and Traffic Measurement," published by Addison-Wesley in May 2001. She served as the chair of ACM SIGCOMM from 2003 to 2007. She received her BSE degree in electrical engineering from Princeton University in 1991, and her PhD degree in electrical engineering and computer science from the University of Michigan in 1996. In 2004, Jennifer was the winner of ACM's Grace Murray Hopper Award for outstanding young computer professional and appeared on the MIT TR-100 list of top innovators under the age of 35.

Please describe one or two of the most exciting projects you have worked on during your career. What were the biggest challenges?

When I was a researcher at AT&T, a group of us designed a new way to manage routing in Internet Service Provider backbone networks.
Traditionally, network operators configure each router individually, and these routers run distributed protocols to compute paths through the network. We believed that network management would be simpler and more flexible if network operators could exercise direct control over how routers forward traffic based on a network-wide view of the topology and traffic. The Routing Control Platform (RCP) we designed and built could compute the routes for all of AT&T's backbone on a single commodity computer, and could control legacy routers without modification. To me, this project was exciting because we had a provocative idea, a working system, and ultimately a real deployment in an operational network. Fast forward a few years, and software-defined networking (SDN) has become a mainstream technology, and standard protocols (like OpenFlow) have made it much easier to tell the underlying switches what to do.

How do you think software-defined networking should evolve in the future?

In a major break from the past, control-plane software can be created by many different programmers, not just at companies selling network equipment. Yet, unlike the applications running on a server or a smart phone, controller apps must work together to handle the same traffic. Network operators do not want to perform load balancing on some traffic and routing on other traffic; instead, they want to perform load balancing and routing, together, on the same traffic. Future SDN controller platforms should offer good programming abstractions for composing multiple independently written controller applications together. More broadly, good programming abstractions can make it easier to create controller applications, without having to worry about low-level details like flow table entries, traffic counters, bit patterns in packet headers, and so on. Also, while an SDN controller is logically centralized, the network still consists of a distributed collection of devices. Future controllers should offer good abstractions for updating the flow tables across the network, so apps can reason about what happens to packets in flight while the devices are updated. Programming abstractions for control-plane software is an exciting area for interdisciplinary research between computer networking, distributed systems, and programming languages, with a real chance for practical impact in the years ahead.

Where do you see the future of networking and the Internet?

Networking is an exciting field because the applications and the underlying technologies change all the time. We are always reinventing ourselves! Who would have predicted even ten years ago the dominance of smart phones, allowing mobile users to access existing applications as well as new location-based services? The emergence of cloud computing is fundamentally changing the relationship between users and the applications they run, and networked sensors and actuators (the "Internet of Things") are enabling a wealth of new applications (and security vulnerabilities!). The pace of innovation is truly inspiring. The underlying network is a crucial component in all of these innovations. Yet, the network is notoriously "in the way"---limiting performance, compromising reliability, constraining applications, and complicating the deployment and management of services. We should strive to make the network of the future as invisible as the air we breathe, so it never stands in the way of new ideas and valuable services.
To do this, we need to raise the level of abstraction above individual network devices and protocols (and their attendant acronyms!), so we can reason about the network and the user's high-level goals as a whole.

What people inspired you professionally?

I've long been inspired by Sally Floyd at the International Computer Science Institute. Her research is always purposeful, focusing on the important challenges facing the Internet. She digs deeply into hard questions until she understands the problem and the space of solutions completely, and she devotes serious energy into "making things happen," such as pushing her ideas into protocol standards and network equipment. Also, she gives back to the community, through professional service in numerous standards and research organizations and by creating tools (such as the widely used ns-2 and ns-3 simulators) that enable other researchers to succeed. She retired in 2009 but her influence on the field will be felt for years to come.

What are your recommendations for students who want careers in computer science and networking?

Networking is an inherently interdisciplinary field. Breakthroughs in networking come from applying techniques from such diverse areas as queuing theory, game theory, control theory, distributed systems, network optimization, programming languages, machine learning, algorithms, data structures, and so on. I think that becoming conversant in a related field, or collaborating closely with experts in those fields, is a wonderful way to put networking on a stronger foundation, so we can learn how to build networks that are worthy of society's trust. Beyond the theoretical disciplines, networking is exciting because we create real artifacts that real people use. Mastering how to design and build systems---by gaining experience in operating systems, computer architecture, and so on---is another fantastic way to amplify your knowledge of networking to help make the world a better place.

Chapter 6 The Link Layer and LANs

In the previous two chapters we learned that the network layer provides a communication service between any two network hosts. Between the two hosts, datagrams travel over a series of communication links, some wired and some wireless, starting at the source host, passing through a series of packet switches (switches and routers) and ending at the destination host. As we continue down the protocol stack, from the network layer to the link layer, we naturally wonder how packets are sent across the individual links that make up the end-to-end communication path. How are the network-layer datagrams encapsulated in the link-layer frames for transmission over a single link? Are different link-layer protocols used in the different links along the communication path? How are transmission conflicts in broadcast links resolved? Is there addressing at the link layer and, if so, how does the link-layer addressing operate with the network-layer addressing we learned about in Chapter 4? And what exactly is the difference between a switch and a router? We'll answer these and other important questions in this chapter.

In discussing the link layer, we'll see that there are two fundamentally different types of link-layer channels. The first type consists of broadcast channels, which connect multiple hosts in wireless LANs, satellite networks, and hybrid fiber-coaxial cable (HFC) access networks.
Since many hosts are connected to the same broadcast communication channel, a so-called medium access protocol is needed to coordinate frame transmission. In some cases, a central controller may be used to coordinate transmissions; in other cases, the hosts themselves coordinate transmissions. The second type of link-layer channel is the point-to-point communication link, such as that often found between two routers connected by a long-distance link, or between a user's office computer and the nearby Ethernet switch to which it is connected. Coordinating access to a point-to-point link is simpler; the reference material on this book's Web site has a detailed discussion of the Point-to-Point Protocol (PPP), which is used in settings ranging from dial-up service over a telephone line to high-speed point-to-point frame transport over fiber-optic links.

We'll explore several important link-layer concepts and technologies in this chapter. We'll dive deeper into error detection and correction, a topic we touched on briefly in Chapter 3. We'll consider multiple access networks and switched LANs, including Ethernet---by far the most prevalent wired LAN technology. We'll also look at virtual LANs, and data center networks. Although WiFi, and more generally wireless LANs, are link-layer topics, we'll postpone our study of these important topics until Chapter 7.

6.1 Introduction to the Link Layer

Let's begin with some important terminology. We'll find it convenient in this chapter to refer to any device that runs a link-layer (i.e., layer 2) protocol as a node. Nodes include hosts, routers, switches, and WiFi access points (discussed in Chapter 7). We will also refer to the communication channels that connect adjacent nodes along the communication path as links. In order for a datagram to be transferred from source host to destination host, it must be moved over each of the individual links in the end-to-end path. As an example, in the company network shown at the bottom of Figure 6.1, consider sending a datagram from one of the wireless hosts to one of the servers. This datagram will actually pass through six links: a WiFi link between the sending host and a WiFi access point, an Ethernet link between the access point and a link-layer switch, a link between the link-layer switch and a router, a link between the two routers, an Ethernet link between the router and a link-layer switch, and finally an Ethernet link between the switch and the server. Over a given link, a transmitting node encapsulates the datagram in a link-layer frame and transmits the frame into the link.

In order to gain further insight into the link layer and how it relates to the network layer, let's consider a transportation analogy. Consider a travel agent who is planning a trip for a tourist traveling from Princeton, New Jersey, to Lausanne, Switzerland. The travel agent decides that it is most convenient for the tourist to take a limousine from Princeton to JFK airport, then a plane from JFK airport to Geneva's airport, and finally a train from Geneva's airport to Lausanne's train station.
Once the travel agent makes the three reservations, it is the responsibility of the Princeton limousine company to get the tourist from Princeton to JFK; it is the responsibility of the airline company to get the tourist from JFK to Geneva; and it is the responsibility of the Swiss train service to get the tourist from Geneva to Lausanne.

Figure 6.1 Six link-layer hops between wireless host and server

Each of the three segments of the trip is "direct" between two "adjacent" locations. Note that the three transportation segments are managed by different companies and use entirely different transportation modes (limousine, plane, and train). Although the transportation modes are different, they each provide the basic service of moving passengers from one location to an adjacent location. In this transportation analogy, the tourist is a datagram, each transportation segment is a link, the transportation mode is a link-layer protocol, and the travel agent is a routing protocol.

6.1.1 The Services Provided by the Link Layer

Although the basic service of any link layer is to move a datagram from one node to an adjacent node over a single communication link, the details of the provided service can vary from one link-layer protocol to the next. Possible services that can be offered by a link-layer protocol include:

Framing. Almost all link-layer protocols encapsulate each network-layer datagram within a link-layer frame before transmission over the link. A frame consists of a data field, in which the network-layer datagram is inserted, and a number of header fields. The structure of the frame is specified by the link-layer protocol. We'll see several different frame formats when we examine specific link-layer protocols in the second half of this chapter.

Link access. A medium access control (MAC) protocol specifies the rules by which a frame is transmitted onto the link. For point-to-point links that have a single sender at one end of the link and a single receiver at the other end of the link, the MAC protocol is simple (or nonexistent)---the sender can send a frame whenever the link is idle. The more interesting case is when multiple nodes share a single broadcast link---the so-called multiple access problem. Here, the MAC protocol serves to coordinate the frame transmissions of the many nodes.

Reliable delivery. When a link-layer protocol provides reliable delivery service, it guarantees to move each network-layer datagram across the link without error. Recall that certain transport-layer protocols (such as TCP) also provide a reliable delivery service. Similar to a transport-layer reliable delivery service, a link-layer reliable delivery service can be achieved with acknowledgments and retransmissions (see Section 3.4). A link-layer reliable delivery service is often used for links that are prone to high error rates, such as a wireless link, with the goal of correcting an error locally---on the link where the error occurs---rather than forcing an end-to-end retransmission of the data by a transport- or application-layer protocol. However, link-layer reliable delivery can be considered an unnecessary overhead for low bit-error links, including fiber, coax, and many twisted-pair copper links. For this reason, many wired link-layer protocols do not provide a reliable delivery service.

Error detection and correction.
The link-layer hardware in a receiving node can incorrectly decide that a bit in a frame is zero when it was transmitted as a one, and vice versa. Such bit errors are introduced by signal attenuation and electromagnetic noise. Because there is no need to forward a datagram that has an error, many link-layer protocols provide a mechanism to detect such bit errors. This is done by having the transmitting node include error-detection bits in the frame, and having the receiving node perform an error check. Recall from Chapters 3 and 4 that the Internet's transport layer and network layer also provide a limited form of error detection---the Internet checksum. Error detection in the link layer is usually more sophisticated and is implemented in hardware. Error correction is similar to error detection, except that a receiver not only detects when bit errors have occurred in the frame but also determines exactly where in the frame the errors have occurred (and then corrects these errors).

6.1.2 Where Is the Link Layer Implemented?

Before diving into our detailed study of the link layer, let's conclude this introduction by considering the question of where the link layer is implemented. We'll focus here on an end system, since we learned in Chapter 4 that the link layer is implemented in a router's line card. Is a host's link layer implemented in hardware or software? Is it implemented on a separate card or chip, and how does it interface with the rest of a host's hardware and operating system components?

Figure 6.2 shows a typical host architecture. For the most part, the link layer is implemented in a network adapter, also sometimes known as a network interface card (NIC). At the heart of the network adapter is the link-layer controller, usually a single, special-purpose chip that implements many of the link-layer services (framing, link access, error detection, and so on). Thus, much of a link-layer controller's functionality is implemented in hardware. For example, Intel's 710 adapter \[Intel 2016\] implements the Ethernet protocols we'll study in Section 6.5; the Atheros AR5006 \[Atheros 2016\] controller implements the 802.11 WiFi protocols we'll study in Chapter 7. Until the late 1990s, most network adapters were physically separate cards (such as a PCMCIA card or a plug-in card fitting into a PC's PCI card slot) but increasingly, network adapters are being integrated onto the host's motherboard---a so-called LAN-on-motherboard configuration.

On the sending side, the controller takes a datagram that has been created and stored in host memory by the higher layers of the protocol stack, encapsulates the datagram in a link-layer frame (filling in the frame's various fields), and then transmits the frame into the communication link, following the link-access protocol. On the receiving side, a controller receives the entire frame, and extracts the network-layer datagram. If the link layer performs error detection, then it is the sending controller that sets the error-detection bits in the frame header and it is the receiving controller that performs error detection. Figure 6.2 shows a network adapter attaching to a host's bus (e.g., a PCI or PCI-X bus), where it looks much like any other I/O device to the other host components.

Figure 6.2 Network adapter: Its relationship to other host components and to protocol stack functionality
Figure 6.2 also shows that while most of the link layer is implemented in hardware, part of the link layer is implemented in software that runs on the host's CPU. The software components of the link layer implement higher-level link-layer functionality such as assembling link-layer addressing information and activating the controller hardware. On the receiving side, link-layer software responds to controller interrupts (e.g., due to the receipt of one or more frames), handling error conditions and passing a datagram up to the network layer. Thus, the link layer is a combination of hardware and software---the place in the protocol stack where software meets hardware. \[Intel 2016\] provides a readable overview (as well as a detailed description) of the XL710 controller from a software-programming point of view.

6.2 Error-Detection and -Correction Techniques

In the previous section, we noted that bit-level error detection and correction---detecting and correcting the corruption of bits in a link-layer frame sent from one node to another physically connected neighboring node---are two services often provided by the link layer. We saw in Chapter 3 that error-detection and -correction services are also often offered at the transport layer. In this section, we'll examine a few of the simplest techniques that can be used to detect and, in some cases, correct such bit errors. A full treatment of the theory and implementation of this topic is itself the topic of many textbooks (for example, \[Schwartz 1980\] or \[Bertsekas 1991\]), and our treatment here is necessarily brief. Our goal here is to develop an intuitive feel for the capabilities that error-detection and -correction techniques provide and to see how a few simple techniques work and are used in practice in the link layer.

Figure 6.3 illustrates the setting for our study. At the sending node, data, D, to be protected against bit errors is augmented with error-detection and -correction bits (EDC). Typically, the data to be protected includes not only the datagram passed down from the network layer for transmission across the link, but also link-level addressing information, sequence numbers, and other fields in the link frame header. Both D and EDC are sent to the receiving node in a link-level frame. At the receiving node, a sequence of bits, D′ and EDC′, is received. Note that D′ and EDC′ may differ from the original D and EDC as a result of in-transit bit flips.

Figure 6.3 Error-detection and -correction scenario

The receiver's challenge is to determine whether or not D′ is the same as the original D, given that it has only received D′ and EDC′. The exact wording of the receiver's decision in Figure 6.3 (we ask whether an error is detected, not whether an error has occurred!) is important. Error-detection and -correction techniques allow the receiver to sometimes, but not always, detect that bit errors have occurred. Even with the use of error-detection bits there still may be undetected bit errors; that is, the receiver may be unaware that the received information contains bit errors. As a consequence, the receiver might deliver a corrupted datagram to the network layer, or be unaware that the contents of a field in the frame's header has been corrupted. We thus want to choose an error-detection scheme that keeps the probability of such occurrences small.
Generally, more sophisticated error-detection and -correction techniques (that is, those that have a smaller probability of allowing undetected bit errors) incur a larger overhead---more computation is needed to compute and transmit a larger number of error-detection and -correction bits. Let's now examine three techniques for detecting errors in the transmitted data---parity checks (to illustrate the basic ideas behind error detection and correction), checksumming methods (which are more typically used in the transport layer), and cyclic redundancy checks (which are more typically used in the link layer in an adapter).

6.2.1 Parity Checks

Perhaps the simplest form of error detection is the use of a single parity bit. Suppose that the information to be sent, D in Figure 6.4, has d bits. In an even parity scheme, the sender simply includes one additional bit and chooses its value such that the total number of 1s in the d+1 bits (the original information plus a parity bit) is even. For odd parity schemes, the parity bit value is chosen such that there is an odd number of 1s. Figure 6.4 illustrates an even parity scheme, with the single parity bit being stored in a separate field.

Figure 6.4 One-bit even parity

Receiver operation is also simple with a single parity bit. The receiver need only count the number of 1s in the received d+1 bits. If an odd number of 1-valued bits are found with an even parity scheme, the receiver knows that at least one bit error has occurred. More precisely, it knows that some odd number of bit errors have occurred. But what happens if an even number of bit errors occur? You should convince yourself that this would result in an undetected error. If the probability of bit errors is small and errors can be assumed to occur independently from one bit to the next, the probability of multiple bit errors in a packet would be extremely small. In this case, a single parity bit might suffice. However, measurements have shown that, rather than occurring independently, errors are often clustered together in "bursts." Under burst error conditions, the probability of undetected errors in a frame protected by single-bit parity can approach 50 percent \[Spragins 1991\]. Clearly, a more robust error-detection scheme is needed (and, fortunately, is used in practice!). But before examining error-detection schemes that are used in practice, let's consider a simple generalization of one-bit parity that will provide us with insight into error-correction techniques.

Figure 6.5 shows a two-dimensional generalization of the single-bit parity scheme. Here, the d bits in D are divided into i rows and j columns. A parity value is computed for each row and for each column. The resulting i+j+1 parity bits comprise the link-layer frame's error-detection bits. Suppose now that a single bit error occurs in the original d bits of information. With this two-dimensional parity scheme, the parity of both the column and the row containing the flipped bit will be in error. The receiver can thus not only detect the fact that a single bit error has occurred, but can use the column and row indices of the column and row with parity errors to actually identify the bit that was corrupted and correct that error! Figure 6.5 shows an example in which the 1-valued bit in position (2,2) is corrupted and switched to a 0---an error that is both detectable and correctable at the receiver.
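To make the two-dimensional scheme concrete, here is a minimal sketch in Python (our own illustration, not code from a real adapter; the function names and the row-major layout of the block are our choices). It computes even parity over the rows and columns of an i-by-j block and, at the receiver, uses the one mismatched row and column to locate a single flipped bit, just as in Figure 6.5:

```python
def parities(bits, i, j):
    """Even-parity bit for each row and each column of an i-by-j block (row-major)."""
    rows = [sum(bits[r * j:(r + 1) * j]) % 2 for r in range(i)]
    cols = [sum(bits[c::j]) % 2 for c in range(j)]
    return rows, cols

def locate_single_error(received, i, j, sent_rows, sent_cols):
    """Return (row, col) of a flipped bit, or None; assumes at most one bit error."""
    rows, cols = parities(received, i, j)
    bad_rows = [r for r in range(i) if rows[r] != sent_rows[r]]
    bad_cols = [c for c in range(j) if cols[c] != sent_cols[c]]
    if not bad_rows and not bad_cols:
        return None                      # parities match: no error detected
    return bad_rows[0], bad_cols[0]      # one bad row, one bad column pinpoint the bit

# Sender computes parities over a 2-by-3 block of data bits
data = [1, 0, 1,
        0, 1, 1]
sent_rows, sent_cols = parities(data, 2, 3)

# The channel flips the bit at row 1, column 1; the receiver locates it
received = data[:]
received[1 * 3 + 1] ^= 1
print(locate_single_error(received, 2, 3, sent_rows, sent_cols))   # (1, 1)
```

Flipping `received[r * j + c]` back corrects the error, which is exactly the forward error correction discussed next.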
Although our discussion has focused on the original d bits of information, a single error in the parity bits themselves is also detectable and correctable. Two-dimensional parity can also detect (but not correct!) any combination of two errors in a packet. Other properties of the two-dimensional parity scheme are explored in the problems at the end of the chapter.

Figure 6.5 Two-dimensional even parity

The ability of the receiver to both detect and correct errors is known as forward error correction (FEC). These techniques are commonly used in audio storage and playback devices such as audio CDs. In a network setting, FEC techniques can be used by themselves, or in conjunction with link-layer ARQ techniques similar to those we examined in Chapter 3. FEC techniques are valuable because they can decrease the number of sender retransmissions required. Perhaps more important, they allow for immediate correction of errors at the receiver. This avoids having to wait for the round-trip propagation delay needed for the sender to receive a NAK packet and for the retransmitted packet to propagate back to the receiver---a potentially important advantage for real-time network applications \[Rubenstein 1998\] or links (such as deep-space links) with long propagation delays. Research examining the use of FEC in error-control protocols includes \[Biersack 1992; Nonnenmacher 1998; Byers 1998; Shacham 1990\].

6.2.2 Checksumming Methods

In checksumming techniques, the d bits of data in Figure 6.4 are treated as a sequence of k-bit integers. One simple checksumming method is to simply sum these k-bit integers and use the resulting sum as the error-detection bits. The Internet checksum is based on this approach---bytes of data are treated as 16-bit integers and summed. The 1s complement of this sum then forms the Internet checksum that is carried in the segment header. As discussed in Section 3.3, the receiver checks the checksum by taking the 1s complement of the sum of the received data (including the checksum) and checking whether the result is all 1 bits. If any of the bits are 0, an error is indicated. RFC 1071 discusses the Internet checksum algorithm and its implementation in detail. In the TCP and UDP protocols, the Internet checksum is computed over all fields (header and data fields included). In IP, the checksum is computed over the IP header (since the UDP or TCP segment has its own checksum). In other protocols, for example, XTP \[Strayer 1992\], one checksum is computed over the header and another checksum is computed over the entire packet.

Checksumming methods require relatively little packet overhead. For example, the checksums in TCP and UDP use only 16 bits. However, they provide relatively weak protection against errors as compared with cyclic redundancy check, which is discussed below and which is often used in the link layer. A natural question at this point is, Why is checksumming used at the transport layer and cyclic redundancy check used at the link layer? Recall that the transport layer is typically implemented in software in a host as part of the host's operating system. Because transport-layer error detection is implemented in software, it is important to have a simple and fast error-detection scheme such as checksumming. On the other hand, error detection at the link layer is implemented in dedicated hardware in adapters, which can rapidly perform the more complex CRC operations.
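To see the 16-bit 1s-complement arithmetic in action, here is a simplified sketch of the Internet checksum (a toy version for clarity; production implementations use the optimizations surveyed in RFC 1071 and in the Feldmeier paper cited below):

```python
def internet_checksum(data: bytes) -> int:
    """1s complement of the 1s-complement sum of the data, taken 16 bits at a time."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # next 16-bit integer
        total = (total & 0xFFFF) + (total >> 16)  # wrap any carry back into the sum
    return ~total & 0xFFFF

segment = b"\x12\x34\x56\x78"
checksum = internet_checksum(segment)

# Receiver: summing the data plus the checksum must give all 1s, so the
# complement of that sum is zero whenever no error is detected
residue = internet_checksum(segment + checksum.to_bytes(2, "big"))
print(hex(checksum), residue == 0)                # 0x9753 True
```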
Feldmeier \[Feldmeier 1995\] presents fast software implementation techniques for not only weighted checksum codes, but CRC (see below) and other codes as well.

6.2.3 Cyclic Redundancy Check (CRC)

An error-detection technique used widely in today's computer networks is based on cyclic redundancy check (CRC) codes. CRC codes are also known as polynomial codes, since it is possible to view the bit string to be sent as a polynomial whose coefficients are the 0 and 1 values in the bit string, with operations on the bit string interpreted as polynomial arithmetic.

CRC codes operate as follows. Consider the d-bit piece of data, D, that the sending node wants to send to the receiving node. The sender and receiver must first agree on an r+1 bit pattern, known as a generator, which we will denote as G. We will require that the most significant (leftmost) bit of G be a 1. The key idea behind CRC codes is shown in Figure 6.6. For a given piece of data, D, the sender will choose r additional bits, R, and append them to D such that the resulting d+r bit pattern (interpreted as a binary number) is exactly divisible by G (i.e., has no remainder) using modulo-2 arithmetic. The process of error checking with CRCs is thus simple: The receiver divides the d+r received bits by G. If the remainder is nonzero, the receiver knows that an error has occurred; otherwise the data is accepted as being correct.

All CRC calculations are done in modulo-2 arithmetic without carries in addition or borrows in subtraction. This means that addition and subtraction are identical, and both are equivalent to the bitwise exclusive-or (XOR) of the operands. Thus, for example,

1011 XOR 0101 = 1110
1001 XOR 1101 = 0100

Also, we similarly have

1011 − 0101 = 1110
1001 − 1101 = 0100

Multiplication and division are the same as in base-2 arithmetic, except that any required addition or subtraction is done without carries or borrows. As in regular binary arithmetic, multiplication by 2^k left shifts a bit pattern by k places. Thus, given D and R, the quantity D · 2^r XOR R yields the d+r bit pattern shown in Figure 6.6. We'll use this algebraic characterization of the d+r bit pattern from Figure 6.6 in our discussion below.

Figure 6.6 CRC

Let us now turn to the crucial question of how the sender computes R. Recall that we want to find R such that there is an n such that

D · 2^r XOR R = nG

That is, we want to choose R such that G divides into D · 2^r XOR R without remainder. If we XOR (that is, add modulo-2, without carry) R to both sides of the above equation, we get

D · 2^r = nG XOR R

This equation tells us that if we divide D · 2^r by G, the value of the remainder is precisely R. In other words, we can calculate R as

R = remainder(D · 2^r / G)

Figure 6.7 illustrates this calculation for the case of D = 101110, d = 6, G = 1001, and r = 3. The 9 bits transmitted in this case are 101110 011. You should check these calculations for yourself and also check that indeed D · 2^r = 101011 · G XOR R.

Figure 6.7 A sample CRC calculation

International standards have been defined for 8-, 12-, 16-, and 32-bit generators, G. The CRC-32 32-bit standard, which has been adopted in a number of link-level IEEE protocols, uses a generator of

G_CRC-32 = 100000100110000010001110110110111

Each of the CRC standards can detect burst errors of fewer than r+1 bits. (This means that all consecutive bit errors of r bits or fewer will be detected.)
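The remainder computation itself is just long division with XOR, and is easy to sketch in software (a toy bit-string version of our own for clarity; real adapters use hardware shift registers or the table-driven methods in \[Williams 1993\]). It reproduces the Figure 6.7 example:

```python
def mod2_remainder(bits: str, G: str) -> str:
    """Remainder of bits divided by G, using modulo-2 (XOR) long division."""
    b = [int(x) for x in bits]
    g = [int(x) for x in G]
    for i in range(len(b) - len(g) + 1):
        if b[i] == 1:                    # subtract (XOR) G at this position
            for j in range(len(g)):
                b[i + j] ^= g[j]
    return "".join(map(str, b[-(len(g) - 1):]))   # last r bits are the remainder

D, G = "101110", "1001"
r = len(G) - 1
R = mod2_remainder(D + "0" * r, G)   # R = remainder(D * 2^r / G)
print(R)                             # '011', so the sender transmits 101110 011

# Receiver: dividing the received d+r bits by G must leave a zero remainder
assert mod2_remainder(D + R, G) == "0" * r
```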
Furthermore, under appropriate assumptions, a burst of length greater than r+1 bits is detected with probability 1 − 0.5^r. Also, each of the CRC standards can detect any odd number of bit errors. See \[Williams 1993\] for a discussion of implementing CRC checks. The theory behind CRC codes and even more powerful codes is beyond the scope of this text. The text \[Schwartz 1980\] provides an excellent introduction to this topic.

6.3 Multiple Access Links and Protocols

In the introduction to this chapter, we noted that there are two types of network links: point-to-point links and broadcast links. A point-to-point link consists of a single sender at one end of the link and a single receiver at the other end of the link. Many link-layer protocols have been designed for point-to-point links; the point-to-point protocol (PPP) and high-level data link control (HDLC) are two such protocols. The second type of link, a broadcast link, can have multiple sending and receiving nodes all connected to the same, single, shared broadcast channel. The term broadcast is used here because when any one node transmits a frame, the channel broadcasts the frame and each of the other nodes receives a copy. Ethernet and wireless LANs are examples of broadcast link-layer technologies. In this section we'll take a step back from specific link-layer protocols and first examine a problem of central importance to the link layer: how to coordinate the access of multiple sending and receiving nodes to a shared broadcast channel---the multiple access problem. Broadcast channels are often used in LANs, networks that are geographically concentrated in a single building (or on a corporate or university campus). Thus, we'll look at how multiple access channels are used in LANs at the end of this section.

We are all familiar with the notion of broadcasting---television has been using it since its invention. But traditional television is a one-way broadcast (that is, one fixed node transmitting to many receiving nodes), while nodes on a computer network broadcast channel can both send and receive. Perhaps a more apt human analogy for a broadcast channel is a cocktail party, where many people gather in a large room (the air providing the broadcast medium) to talk and listen. A second good analogy is something many readers will be familiar with---a classroom---where teacher(s) and student(s) similarly share the same, single, broadcast medium. A central problem in both scenarios is that of determining who gets to talk (that is, transmit into the channel) and when. As humans, we've evolved an elaborate set of protocols for sharing the broadcast channel: "Give everyone a chance to speak." "Don't speak until you are spoken to." "Don't monopolize the conversation." "Raise your hand if you have a question." "Don't interrupt when someone is speaking." "Don't fall asleep when someone is talking."

Computer networks similarly have protocols---so-called multiple access protocols---by which nodes regulate their transmission into the shared broadcast channel. As shown in Figure 6.8, multiple access protocols are needed in a wide variety of network settings, including both wired and wireless access networks, and satellite networks. Although technically each node accesses the broadcast channel through its adapter, in this section we will refer to the node as the sending and receiving device.

Figure 6.8 Various multiple access channels
In practice, hundreds or even thousands of nodes can directly communicate over a broadcast channel. Because all nodes are capable of transmitting frames, more than two nodes can transmit frames at the same time. When this happens, all of the nodes receive multiple frames at the same time; that is, the transmitted frames collide at all of the receivers. Typically, when there is a collision, none of the receiving nodes can make any sense of any of the frames that were transmitted; in a sense, the signals of the colliding frames become inextricably tangled together. Thus, all the frames involved in the collision are lost, and the broadcast channel is wasted during the collision interval. Clearly, if many nodes want to transmit frames frequently, many transmissions will result in collisions, and much of the bandwidth of the broadcast channel will be wasted.

In order to ensure that the broadcast channel performs useful work when multiple nodes are active, it is necessary to somehow coordinate the transmissions of the active nodes. This coordination job is the responsibility of the multiple access protocol. Over the past 40 years, thousands of papers and hundreds of PhD dissertations have been written on multiple access protocols; a comprehensive survey of the first 20 years of this body of work is \[Rom 1990\]. Furthermore, active research in multiple access protocols continues due to the continued emergence of new types of links, particularly new wireless links. Over the years, dozens of multiple access protocols have been implemented in a variety of link-layer technologies. Nevertheless, we can classify just about any multiple access protocol as belonging to one of three categories: channel partitioning protocols, random access protocols, and taking-turns protocols. We'll cover these categories of multiple access protocols in the following three subsections.

Let's conclude this overview by noting that, ideally, a multiple access protocol for a broadcast channel of rate R bits per second should have the following desirable characteristics:

1. When only one node has data to send, that node has a throughput of R bps.

2. When M nodes have data to send, each of these nodes has a throughput of R/M bps. This need not necessarily imply that each of the M nodes always has an instantaneous rate of R/M, but rather that each node should have an average transmission rate of R/M over some suitably defined interval of time.

3. The protocol is decentralized; that is, there is no master node that represents a single point of failure for the network.

4. The protocol is simple, so that it is inexpensive to implement.

6.3.1 Channel Partitioning Protocols

Recall from our early discussion back in Section 1.3 that time-division multiplexing (TDM) and frequency-division multiplexing (FDM) are two techniques that can be used to partition a broadcast channel's bandwidth among all nodes sharing that channel.

Figure 6.9 A four-node TDM and FDM example

As an example, suppose the channel supports N nodes and that the transmission rate of the channel is R bps. TDM divides time into time frames and further divides each time frame into N time slots. (The TDM time frame should not be confused with the link-layer unit of data exchanged between sending and receiving adapters, which is also called a frame. In order to reduce confusion, in this subsection we'll refer to the link-layer unit of data exchanged as a packet.)
Each time slot is then assigned to one of the N nodes. Whenever a node has a packet to send, it transmits the packet's bits during its assigned time slot in the revolving TDM frame. Typically, slot sizes are chosen so that a single packet can be transmitted during a slot time. Figure 6.9 shows a simple four-node TDM example. Returning to our cocktail party analogy, a TDM-regulated cocktail party would allow one partygoer to speak for a fixed period of time, then allow another partygoer to speak for the same amount of time, and so on. Once everyone had had a chance to talk, the pattern would repeat. TDM is appealing because it eliminates collisions and is perfectly fair: Each node gets a dedicated transmission rate of R/N bps during each frame time. However, it has two major drawbacks. First, a node is limited to an average rate of R/N bps even when it is the only node with packets to send. A second drawback is that a node must always wait for its turn in the transmission sequence---again, even when it is the only node with a frame to send. Imagine the partygoer who is the only one with anything to say (and imagine that this is the even rarer circumstance where everyone wants to hear what that one person has to say). Clearly, TDM would be a poor choice for a multiple access protocol for this particular party.

While TDM shares the broadcast channel in time, FDM divides the R bps channel into different frequencies (each with a bandwidth of R/N) and assigns each frequency to one of the N nodes. FDM thus creates N smaller channels of R/N bps out of the single, larger R bps channel. FDM shares both the advantages and drawbacks of TDM. It avoids collisions and divides the bandwidth fairly among the N nodes. However, FDM also shares a principal disadvantage with TDM---a node is limited to a bandwidth of R/N, even when it is the only node with packets to send.

A third channel partitioning protocol is code division multiple access (CDMA). While TDM and FDM assign time slots and frequencies, respectively, to the nodes, CDMA assigns a different code to each node. Each node then uses its unique code to encode the data bits it sends. If the codes are chosen carefully, CDMA networks have the wonderful property that different nodes can transmit simultaneously and yet have their respective receivers correctly receive a sender's encoded data bits (assuming the receiver knows the sender's code) in spite of interfering transmissions by other nodes. CDMA has been used in military systems for some time (due to its anti-jamming properties) and now has widespread civilian use, particularly in cellular telephony. Because CDMA's use is so tightly tied to wireless channels, we'll save our discussion of the technical details of CDMA until Chapter 7. For now, it will suffice to know that CDMA codes, like time slots in TDM and frequencies in FDM, can be allocated to the multiple access channel users.

6.3.2 Random Access Protocols

The second broad class of multiple access protocols is random access protocols. In a random access protocol, a transmitting node always transmits at the full rate of the channel, namely, R bps. When there is a collision, each node involved in the collision repeatedly retransmits its frame (that is, packet) until its frame gets through without a collision. But when a node experiences a collision, it doesn't necessarily retransmit the frame right away. Instead, it waits a random delay before retransmitting the frame.
Each node involved in a collision chooses independent random delays. Because the random delays are independently chosen, it is possible that one of the nodes will pick a delay that is sufficiently less than the delays of the other colliding nodes and will therefore be able to sneak its frame into the channel without a collision.

There are dozens if not hundreds of random access protocols described in the literature \[Rom 1990; Bertsekas 1991\]. In this section we'll describe a few of the most commonly used random access protocols---the ALOHA protocols \[Abramson 1970; Abramson 1985; Abramson 2009\] and the carrier sense multiple access (CSMA) protocols \[Kleinrock 1975b\]. Ethernet \[Metcalfe 1976\] is a popular and widely deployed CSMA protocol.

Slotted ALOHA

Let's begin our study of random access protocols with one of the simplest random access protocols, the slotted ALOHA protocol. In our description of slotted ALOHA, we assume the following: All frames consist of exactly L bits. Time is divided into slots of size L/R seconds (that is, a slot equals the time to transmit one frame). Nodes start to transmit frames only at the beginnings of slots. The nodes are synchronized so that each node knows when the slots begin. If two or more frames collide in a slot, then all the nodes detect the collision event before the slot ends.

Let p be a probability, that is, a number between 0 and 1. The operation of slotted ALOHA in each node is simple: When the node has a fresh frame to send, it waits until the beginning of the next slot and transmits the entire frame in the slot. If there isn't a collision, the node has successfully transmitted its frame and thus need not consider retransmitting the frame. (The node can prepare a new frame for transmission, if it has one.) If there is a collision, the node detects the collision before the end of the slot. The node retransmits its frame in each subsequent slot with probability p until the frame is transmitted without a collision.

By retransmitting with probability p, we mean that the node effectively tosses a biased coin; the event heads corresponds to "retransmit," which occurs with probability p. The event tails corresponds to "skip the slot and toss the coin again in the next slot"; this occurs with probability (1−p). All nodes involved in the collision toss their coins independently.

Slotted ALOHA would appear to have many advantages. Unlike channel partitioning, slotted ALOHA allows a node to transmit continuously at the full rate, R, when that node is the only active node. (A node is said to be active if it has frames to send.) Slotted ALOHA is also highly decentralized, because each node detects collisions and independently decides when to retransmit. (Slotted ALOHA does, however, require the slots to be synchronized in the nodes; shortly we'll discuss an unslotted version of the ALOHA protocol, as well as CSMA protocols, none of which require such synchronization.) Slotted ALOHA is also an extremely simple protocol.

Slotted ALOHA works well when there is only one active node, but how efficient is it when there are multiple active nodes? There are two possible efficiency concerns here.

Figure 6.10 Nodes 1, 2, and 3 collide in the first slot. Node 2 finally succeeds in the fourth slot, node 1 in the eighth slot, and node 3 in the ninth slot
First, as shown in Figure 6.10, when there are multiple +active nodes, a certain fraction of the slots will have collisions and +will therefore be "wasted." The second concern is that another fraction +of the slots will be empty because all active nodes refrain from +transmitting as a result of the probabilistic transmission policy. The +only "unwasted" slots will be those in which exactly one node transmits. +A slot in which exactly one node transmits is said to be a successful +slot. The efficiency of a slotted multiple access protocol is defined to +be the long-run fraction of successful slots in the case when there are +a large number of active nodes, each always having a large number of +frames to send. Note that if no form of access control were used, and +each node were to immediately retransmit after each collision, the +efficiency would be zero. Slotted ALOHA clearly increases the efficiency +beyond zero, but by how much? We now proceed to outline the derivation +of the maximum efficiency of slotted ALOHA. To keep this derivation +simple, let's modify the protocol a little and assume that each node +attempts to transmit a frame in each slot with probability p. (That is, +we assume that each node always has a frame to send and that the node +transmits with probability p for a fresh frame as well as for a frame +that has already suffered a collision.) Suppose there are N nodes. Then +the probability that a given slot is a successful slot is the +probability that one of the nodes transmits and that the remaining N−1 +nodes do not transmit. The probability that a given node transmits is p; +the probability that the remaining nodes do not transmit is (1−p)N−1. +Therefore the probability a given node has a success is p(1−p)N−1. +Because there are N nodes, the probability that any one of the N nodes +has a success is Np(1−p)N−1. Thus, when there are N active nodes, the +efficiency of slotted ALOHA is Np(1−p)N−1. To obtain the maximum +efficiency for N active nodes, we have to find the p\* that maximizes +this expression. (See the + +homework problems for a general outline of this derivation.) And to +obtain the maximum efficiency for a large number of active nodes, we +take the limit of Np*(1−p*)N−1 as N approaches infinity. (Again, see the +homework problems.) After performing these calculations, we'll find that +the maximum efficiency of the protocol is given by 1/e=0.37. That is, +when a large number of nodes have many frames to transmit, then (at +best) only 37 percent of the slots do useful work. Thus the effective +transmission rate of the channel is not R bps but only 0.37 R bps! A +similar analysis also shows that 37 percent of the slots go empty and 26 +percent of slots have collisions. Imagine the poor network administrator +who has purchased a 100-Mbps slotted ALOHA system, expecting to be able +to use the network to transmit data among a large number of users at an +aggregate rate of, say, 80 Mbps! Although the channel is capable of +transmitting a given frame at the full channel rate of 100 Mbps, in the +long run, the successful throughput of this channel will be less than 37 +Mbps. ALOHA The slotted ALOHA protocol required that all nodes +synchronize their transmissions to start at the beginning of a slot. The +first ALOHA protocol \[Abramson 1970\] was actually an unslotted, fully +decentralized protocol. 
In pure ALOHA, when a frame first arrives (that is, a network-layer datagram is passed down from the network layer at the sending node), the node immediately transmits the frame in its entirety into the broadcast channel. If a transmitted frame experiences a collision with one or more other transmissions, the node will then immediately (after completely transmitting its collided frame) retransmit the frame with probability p. Otherwise, the node waits for a frame transmission time. After this wait, it then transmits the frame with probability p, or waits (remaining idle) for another frame time with probability 1−p.

To determine the maximum efficiency of pure ALOHA, we focus on an individual node. We'll make the same assumptions as in our slotted ALOHA analysis and take the frame transmission time to be the unit of time. At any given time, the probability that a node is transmitting a frame is p. Suppose this frame begins transmission at time t0. As shown in Figure 6.11, in order for this frame to be successfully transmitted, no other nodes can begin their transmission in the interval of time [t0−1, t0]. Such a transmission would overlap with the beginning of the transmission of node i's frame. The probability that all other nodes do not begin a transmission in this interval is (1−p)^(N−1). Similarly, no other node can begin a transmission while node i is transmitting, as such a transmission would overlap with the latter part of node i's transmission. The probability that all other nodes do not begin a transmission in this interval is also (1−p)^(N−1). Thus, the probability that a given node has a successful transmission is p(1−p)^(2(N−1)). By taking limits as in the slotted ALOHA case, we find that the maximum efficiency of the pure ALOHA protocol is only 1/(2e)---exactly half that of slotted ALOHA. This then is the price to be paid for a fully decentralized ALOHA protocol.

Figure 6.11 Interfering transmissions in pure ALOHA

Carrier Sense Multiple Access (CSMA)

In both slotted and pure ALOHA, a node's decision to transmit is made independently of the activity of the other nodes attached to the broadcast channel. In particular, a node neither pays attention to whether another node happens to be transmitting when it begins to transmit, nor stops transmitting if another node begins to interfere with its transmission. In our cocktail party analogy, ALOHA protocols are quite like a boorish partygoer who continues to chatter away regardless of whether other people are talking. As humans, we have human protocols that allow us not only to behave with more civility, but also to decrease the amount of time spent "colliding" with each other in conversation and, consequently, to increase the amount of data we exchange in our conversations. Specifically, there are two important rules for polite human conversation:

Listen before speaking. If someone else is speaking, wait until they are finished. In the networking world, this is called carrier sensing---a node listens to the channel before transmitting. If a frame from another node is currently being transmitted into the channel, a node then waits until it detects no transmissions for a short amount of time and then begins transmission.

If someone else begins talking at the same time, stop talking. In the networking world, this is called collision detection---a transmitting node listens to the channel while it is transmitting.
If it detects that another node is transmitting an interfering frame, it stops transmitting and waits a random amount of time before repeating the sense-and-transmit-when-idle cycle.

These two rules are embodied in the family of carrier sense multiple access (CSMA) and CSMA with collision detection (CSMA/CD) protocols \[Kleinrock 1975b; Metcalfe 1976; Lam 1980; Rom 1990\]. Many variations on CSMA and CSMA/CD have been proposed. Here, we'll consider a few of the most important, and fundamental, characteristics of CSMA and CSMA/CD.

CASE HISTORY

NORM ABRAMSON AND ALOHANET

Norm Abramson, a PhD engineer, had a passion for surfing and an interest in packet switching. This combination of interests brought him to the University of Hawaii in 1969. Hawaii consists of many mountainous islands, making it difficult to install and operate land-based networks. When not surfing, Abramson thought about how to design a network that does packet switching over radio. The network he designed had one central host and several secondary nodes scattered over the Hawaiian Islands. The network had two channels, each using a different frequency band. The downlink channel broadcasted packets from the central host to the secondary hosts; and the upstream channel sent packets from the secondary hosts to the central host. In addition to sending informational packets, the central host also sent on the downstream channel an acknowledgment for each packet successfully received from the secondary hosts. Because the secondary hosts transmitted packets in a decentralized fashion, collisions on the upstream channel inevitably occurred. This observation led Abramson to devise the pure ALOHA protocol, as described in this chapter. In 1970, with continued funding from ARPA, Abramson connected his ALOHAnet to the ARPAnet. Abramson's work is important not only because it was the first example of a radio packet network, but also because it inspired Bob Metcalfe. A few years later, Metcalfe modified the ALOHA protocol to create the CSMA/CD protocol and the Ethernet LAN.

The first question that you might ask about CSMA is why, if all nodes perform carrier sensing, do collisions occur in the first place? After all, a node will refrain from transmitting whenever it senses that another node is transmitting. The answer to the question can best be illustrated using space-time diagrams \[Molle 1987\]. Figure 6.12 shows a space-time diagram of four nodes (A, B, C, D) attached to a linear broadcast bus. The horizontal axis shows the position of each node in space; the vertical axis represents time.

Figure 6.12 Space-time diagram of two CSMA nodes with colliding transmissions

At time t0, node B senses the channel is idle, as no other nodes are currently transmitting. Node B thus begins transmitting, with its bits propagating in both directions along the broadcast medium. The downward propagation of B's bits in Figure 6.12 with increasing time indicates that a nonzero amount of time is needed for B's bits actually to propagate (albeit at near the speed of light) along the broadcast medium. At time t1 (t1 > t0), node D has a frame to send. Although node B is currently transmitting at time t1, the bits being transmitted by B have yet to reach D, and thus D senses the channel idle at t1. In accordance with the CSMA protocol, D thus begins transmitting its frame. A short time later, B's transmission begins to interfere with D's transmission at D.
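The timing in Figure 6.12 is easy to reproduce with a little arithmetic. The sketch below uses made-up numbers of our own (a 500 m bus and a propagation speed of 2 × 10^8 m/s, a common rule of thumb for copper) to show why D can sense an idle channel even though B is already transmitting:

```python
PROP_SPEED = 2e8    # assumed propagation speed in the medium, meters per second

def senses_busy(t_other_started: float, t_now: float, distance_m: float) -> bool:
    """Has the other node's signal reached this node by time t_now (seconds)?"""
    return t_now >= t_other_started + distance_m / PROP_SPEED

t0, distance = 0.0, 500.0                 # B starts at t0; B and D are 500 m apart
print(distance / PROP_SPEED)              # 2.5e-06: B's bits take 2.5 microseconds to reach D

print(senses_busy(t0, 1e-6, distance))    # False: at t1 = 1 us, D senses idle and transmits
print(senses_busy(t0, 3e-6, distance))    # True: by 3 us, D would have sensed B's carrier
```

Any frame that D begins before t0 + 2.5 microseconds collides with B's frame, which is exactly the role the propagation delay plays in the discussion that follows.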
From Figure 6.12, it is evident that the end-to-end channel propagation delay of a broadcast channel---the time it takes for a signal to propagate from one of the nodes to another---will play a crucial role in determining its performance. The longer this propagation delay, the larger the chance that a carrier-sensing node is not yet able to sense a transmission that has already begun at another node in the network.

Carrier Sense Multiple Access with Collision Detection (CSMA/CD)

In Figure 6.12, nodes do not perform collision detection; both B and D continue to transmit their frames in their entirety even though a collision has occurred. When a node performs collision detection, it ceases transmission as soon as it detects a collision. Figure 6.13 shows the same scenario as in Figure 6.12, except that the two nodes each abort their transmission a short time after detecting a collision. Clearly, adding collision detection to a multiple access protocol will help protocol performance by not transmitting a useless, damaged (by interference with a frame from another node) frame in its entirety.

Figure 6.13 CSMA with collision detection

Before analyzing the CSMA/CD protocol, let us now summarize its operation from the perspective of an adapter (in a node) attached to a broadcast channel:

1. The adapter obtains a datagram from the network layer, prepares a link-layer frame, and puts the frame in an adapter buffer.

2. If the adapter senses that the channel is idle (that is, there is no signal energy entering the adapter from the channel), it starts to transmit the frame. If, on the other hand, the adapter senses that the channel is busy, it waits until it senses no signal energy and then starts to transmit the frame.

3. While transmitting, the adapter monitors for the presence of signal energy coming from other adapters using the broadcast channel.

4. If the adapter transmits the entire frame without detecting signal energy from other adapters, the adapter is finished with the frame. If, on the other hand, the adapter detects signal energy from other adapters while transmitting, it aborts the transmission (that is, it stops transmitting its frame).

5. After aborting, the adapter waits a random amount of time and then returns to step 2.

The need to wait a random (rather than fixed) amount of time is hopefully clear---if two nodes transmitted frames at the same time and then both waited the same fixed amount of time, they'd continue colliding forever. But what is a good interval of time from which to choose the random backoff time? If the interval is large and the number of colliding nodes is small, nodes are likely to wait a large amount of time (with the channel remaining idle) before repeating the sense-and-transmit-when-idle step. On the other hand, if the interval is small and the number of colliding nodes is large, it's likely that the chosen random values will be nearly the same, and transmitting nodes will again collide. What we'd like is an interval that is short when the number of colliding nodes is small, and long when the number of colliding nodes is large.

The binary exponential backoff algorithm, used in Ethernet as well as in DOCSIS cable network multiple access protocols \[DOCSIS 2011\], elegantly solves this problem. Specifically, when transmitting a frame that has already experienced n collisions, a node chooses the value of K at random from {0, 1, 2, ..., 2^n − 1}.
Thus, the more collisions experienced by a frame, the larger the interval from which K is chosen. For Ethernet, the actual amount of time a node waits is K · 512 bit times (i.e., K times the amount of time needed to send 512 bits into the Ethernet) and the maximum value that n can take is capped at 10.

Let's look at an example. Suppose that a node attempts to transmit a frame for the first time and while transmitting it detects a collision. The node then chooses K=0 with probability 0.5 or chooses K=1 with probability 0.5. If the node chooses K=0, then it immediately begins sensing the channel. If the node chooses K=1, it waits 512 bit times (e.g., 5.12 microseconds for a 100 Mbps Ethernet) before beginning the sense-and-transmit-when-idle cycle. After a second collision, K is chosen with equal probability from {0, 1, 2, 3}. After three collisions, K is chosen with equal probability from {0, 1, 2, 3, 4, 5, 6, 7}. After 10 or more collisions, K is chosen with equal probability from {0, 1, 2, ..., 1023}. Thus, the size of the sets from which K is chosen grows exponentially with the number of collisions; for this reason this algorithm is referred to as binary exponential backoff.

We also note here that each time a node prepares a new frame for transmission, it runs the CSMA/CD algorithm, not taking into account any collisions that may have occurred in the recent past. So it is possible that a node with a new frame will immediately be able to sneak in a successful transmission while several other nodes are in the exponential backoff state.

CSMA/CD Efficiency

When only one node has a frame to send, the node can transmit at the full channel rate (e.g., for Ethernet typical rates are 10 Mbps, 100 Mbps, or 1 Gbps). However, if many nodes have frames to transmit, the effective transmission rate of the channel can be much less. We define the efficiency of CSMA/CD to be the long-run fraction of time during which frames are being transmitted on the channel without collisions when there is a large number of active nodes, with each node having a large number of frames to send. In order to present a closed-form approximation of the efficiency of Ethernet, let d_prop denote the maximum time it takes signal energy to propagate between any two adapters. Let d_trans be the time to transmit a maximum-size frame (approximately 1.2 msecs for a 10 Mbps Ethernet). A derivation of the efficiency of CSMA/CD is beyond the scope of this book (see \[Lam 1980\] and \[Bertsekas 1991\]). Here we simply state the following approximation:

Efficiency = 1 / (1 + 5 d_prop / d_trans)

We see from this formula that as d_prop approaches 0, the efficiency approaches 1. This matches our intuition that if the propagation delay is zero, colliding nodes will abort immediately without wasting the channel. Also, as d_trans becomes very large, efficiency approaches 1. This is also intuitive because when a frame grabs the channel, it will hold on to the channel for a very long time; thus, the channel will be doing productive work most of the time.

6.3.3 Taking-Turns Protocols

Recall that two desirable properties of a multiple access protocol are (1) when only one node is active, the active node has a throughput of R bps, and (2) when M nodes are active, then each active node has a throughput of nearly R/M bps. The ALOHA and CSMA protocols have this first property but not the second. This has motivated researchers to create another class of protocols---the taking-turns protocols.
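Before turning to those protocols, the Ethernet backoff rule just described can be sketched in a few lines (our own illustration; the helper name and the 10 Mbps rate used to convert bit times into seconds are assumptions for the example):

```python
import random

def backoff_bit_times(n_collisions: int) -> int:
    """Pick K uniformly from {0, 1, ..., 2^min(n,10) - 1}; wait K * 512 bit times."""
    m = min(n_collisions, 10)             # the exponent n is capped at 10
    K = random.randrange(2 ** m)
    return K * 512

random.seed(1)                            # for a reproducible illustration
for n in (1, 2, 3, 10):
    wait = backoff_bit_times(n)
    # On 10 Mbps Ethernet one bit time is 0.1 microseconds
    print(n, wait, "bit times =", wait * 0.1, "microseconds")
```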
As with random access protocols, there are dozens of taking-turns protocols, and each one of these protocols has many variations. We'll discuss two of the more important protocols here. The first one is the polling protocol. The polling protocol requires one of the nodes to be designated as a master node. The master node polls each of the nodes in a round-robin fashion. In particular, the master node first sends a message to node 1, saying that it (node 1) can transmit up to some maximum number of frames. After node 1 transmits some frames, the master node tells node 2 it (node 2) can transmit up to the maximum number of frames. (The master node can determine when a node has finished sending its frames by observing the lack of a signal on the channel.) The procedure continues in this manner, with the master node polling each of the nodes in a cyclic manner.

The polling protocol eliminates the collisions and empty slots that plague random access protocols. This allows polling to achieve a much higher efficiency. But it also has a few drawbacks. The first drawback is that the protocol introduces a polling delay---the amount of time required to notify a node that it can transmit. If, for example, only one node is active, then the node will transmit at a rate less than R bps, as the master node must poll each of the inactive nodes in turn each time the active node has sent its maximum number of frames. The second drawback, which is potentially more serious, is that if the master node fails, the entire channel becomes inoperative. The 802.15 protocol and the Bluetooth protocol we will study in Section 7.3 are examples of polling protocols.

The second taking-turns protocol is the token-passing protocol. In this protocol there is no master node. A small, special-purpose frame known as a token is exchanged among the nodes in some fixed order. For example, node 1 might always send the token to node 2, node 2 might always send the token to node 3, and node N might always send the token to node 1. When a node receives a token, it holds onto the token only if it has some frames to transmit; otherwise, it immediately forwards the token to the next node. If a node does have frames to transmit when it receives the token, it sends up to a maximum number of frames and then forwards the token to the next node. Token passing is decentralized and highly efficient. But it has its problems as well. For example, the failure of one node can crash the entire channel. Or if a node accidentally neglects to release the token, then some recovery procedure must be invoked to get the token back in circulation. Over the years many token-passing protocols have been developed, including the fiber distributed data interface (FDDI) protocol \[Jain 1994\] and the IEEE 802.5 token ring protocol \[IEEE 802.5 2012\], and each one had to address these as well as other sticky issues.

6.3.4 DOCSIS: The Link-Layer Protocol for Cable Internet Access

In the previous three subsections, we've learned about three broad classes of multiple access protocols: channel partitioning protocols, random access protocols, and taking-turns protocols. A cable access network will make for an excellent case study here, as we'll find aspects of each of these three classes of multiple access protocols within the cable access network!
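Before diving into DOCSIS, here is a toy round-robin simulation of the token passing just described (entirely our own sketch; it ignores token loss, node failures, and transmission timing, the very issues noted above):

```python
from collections import deque

def token_circulation(queues, max_frames=2):
    """One full trip of the token: each node sends at most max_frames queued frames."""
    sent = []
    for node, q in enumerate(queues):          # token visits nodes in a fixed order
        for _ in range(min(max_frames, len(q))):
            sent.append((node, q.popleft()))   # node transmits while holding the token
        # the token is then forwarded to the next node
    return sent

# Three nodes; nodes 0 and 2 have frames queued, node 1 has none
queues = [deque(["f0a", "f0b", "f0c"]), deque(), deque(["f2a"])]
print(token_circulation(queues))   # [(0, 'f0a'), (0, 'f0b'), (2, 'f2a')]
print(token_circulation(queues))   # [(0, 'f0c')]
```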
Recall from Section 1.2.1 that a cable access network typically connects several thousand residential cable modems to a cable modem termination system (CMTS) at the cable network headend. The Data-Over-Cable Service Interface Specifications (DOCSIS) \[DOCSIS 2011\] specifies the cable data network architecture and its protocols. DOCSIS uses FDM to divide the downstream (CMTS to modem) and upstream (modem to CMTS) network segments into multiple frequency channels. Each downstream channel is 6 MHz wide, with a maximum throughput of approximately 40 Mbps per channel (although this data rate is seldom seen at a cable modem in practice); each upstream channel has a maximum channel width of 6.4 MHz, and a maximum upstream throughput of approximately 30 Mbps. Each upstream and downstream channel is a broadcast channel.

Figure 6.14 Upstream and downstream channels between CMTS and cable modems

Frames transmitted on the downstream channel by the CMTS are received by all cable modems receiving that channel; since there is just a single CMTS transmitting into the downstream channel, however, there is no multiple access problem. The upstream direction, however, is more interesting and technically challenging, since multiple cable modems share the same upstream channel (frequency) to the CMTS, and thus collisions can potentially occur.

As illustrated in Figure 6.14, each upstream channel is divided into intervals of time (TDM-like), each containing a sequence of mini-slots during which cable modems can transmit to the CMTS. The CMTS explicitly grants permission to individual cable modems to transmit during specific mini-slots. The CMTS accomplishes this by sending a control message known as a MAP message on a downstream channel to specify which cable modem (with data to send) can transmit during which mini-slot for the interval of time specified in the control message. Since mini-slots are explicitly allocated to cable modems, the CMTS can ensure there are no colliding transmissions during a mini-slot. But how does the CMTS know which cable modems have data to send in the first place? This is accomplished by having cable modems send mini-slot-request frames to the CMTS during a special set of interval mini-slots that are dedicated for this purpose, as shown in Figure 6.14. These mini-slot-request frames are transmitted in a random access manner and so may collide with each other. A cable modem can neither sense whether the upstream channel is busy nor detect collisions. Instead, the cable modem infers that its mini-slot-request frame experienced a collision if it does not receive a response to the requested allocation in the next downstream control message. When a collision is inferred, a cable modem uses binary exponential backoff to defer the retransmission of its mini-slot-request frame to a future time slot. When there is little traffic on the upstream channel, a cable modem may actually transmit data frames during slots nominally assigned for mini-slot-request frames (and thus avoid having to wait for a mini-slot assignment).

A cable access network thus serves as a terrific example of multiple access protocols in action---FDM, TDM, random access, and centrally allocated time slots all within one network!

6.4 Switched Local Area Networks

Having covered broadcast networks and multiple access protocols in the previous section, let's turn our attention next to switched local networks.
Figure 6.15 shows a switched local network connecting three departments, two servers, and a router with four switches. Because these switches operate at the link layer, they switch link-layer frames (rather than network-layer datagrams), don't recognize network-layer addresses, and don't use routing algorithms like RIP or OSPF to determine paths through the network of layer-2 switches. Instead of using IP addresses, we will soon see that they use link-layer addresses to forward link-layer frames through the network of switches.

Figure 6.15 An institutional network connected together by four switches

We'll begin our study of switched LANs by first covering link-layer addressing (Section 6.4.1). We then examine the celebrated Ethernet protocol (Section 6.4.2). After examining link-layer addressing and Ethernet, we'll look at how link-layer switches operate (Section 6.4.3), and then see (Section 6.4.4) how these switches are often used to build large-scale LANs.

6.4.1 Link-Layer Addressing and ARP

Hosts and routers have link-layer addresses. Now you might find this surprising, recalling from Chapter 4 that hosts and routers have network-layer addresses as well. You might be asking, why in the world do we need to have addresses at both the network and link layers? In addition to describing the syntax and function of the link-layer addresses, in this section we hope to shed some light on why the two layers of addresses are useful and, in fact, indispensable. We'll also cover the Address Resolution Protocol (ARP), which provides a mechanism to translate IP addresses to link-layer addresses.

MAC Addresses

In truth, it is not hosts and routers that have link-layer addresses but rather their adapters (that is, network interfaces) that have link-layer addresses. A host or router with multiple network interfaces will thus have multiple link-layer addresses associated with it, just as it would also have multiple IP addresses associated with it. It's important to note, however, that link-layer switches do not have link-layer addresses associated with their interfaces that connect to hosts and routers. This is because the job of the link-layer switch is to carry datagrams between hosts and routers; a switch does this job transparently, that is, without the host or router having to explicitly address the frame to the intervening switch. This is illustrated in Figure 6.16.

A link-layer address is variously called a LAN address, a physical address, or a MAC address. Because MAC address seems to be the most popular term, we'll henceforth refer to link-layer addresses as MAC addresses. For most LANs (including Ethernet and 802.11 wireless LANs), the MAC address is 6 bytes long, giving 2^48 possible MAC addresses. As shown in Figure 6.16, these 6-byte addresses are typically expressed in hexadecimal notation, with each byte of the address expressed as a pair of hexadecimal numbers. Although MAC addresses were designed to be permanent, it is now possible to change an adapter's MAC address via software. For the rest of this section, however, we'll assume that an adapter's MAC address is fixed. One interesting property of MAC addresses is that no two adapters have the same address. This might seem surprising given that adapters are manufactured in many countries by many companies.
How does a company manufacturing adapters in Taiwan make sure that it is using different addresses from a company manufacturing adapters in Belgium? The answer is that the IEEE manages the MAC address space. In particular, when a company wants to manufacture adapters, it purchases a chunk of the address space consisting of 2^24 addresses for a nominal fee. IEEE allocates the chunk of 2^24 addresses by fixing the first 24 bits of a MAC address and letting the company create unique combinations of the last 24 bits for each adapter.

Figure 6.16 Each interface connected to a LAN has a unique MAC address

An adapter's MAC address has a flat structure (as opposed to a hierarchical structure) and doesn't change no matter where the adapter goes. A laptop with an Ethernet interface always has the same MAC address, no matter where the computer goes. A smartphone with an 802.11 interface always has the same MAC address, no matter where the smartphone goes. Recall that, in contrast, IP addresses have a hierarchical structure (that is, a network part and a host part), and a host's IP address needs to be changed when the host moves, i.e., changes the network to which it is attached. An adapter's MAC address is analogous to a person's social security number, which also has a flat addressing structure and which doesn't change no matter where the person goes. An IP address is analogous to a person's postal address, which is hierarchical and which must be changed whenever a person moves. Just as a person may find it useful to have both a postal address and a social security number, it is useful for host and router interfaces to have both a network-layer address and a MAC address.

When an adapter wants to send a frame to some destination adapter, the sending adapter inserts the destination adapter's MAC address into the frame and then sends the frame into the LAN. As we will soon see, a switch occasionally broadcasts an incoming frame onto all of its interfaces. We'll see in Chapter 7 that 802.11 also broadcasts frames. Thus, an adapter may receive a frame that isn't addressed to it. Thus, when an adapter receives a frame, it will check to see whether the destination MAC address in the frame matches its own MAC address. If there is a match, the adapter extracts the enclosed datagram and passes the datagram up the protocol stack. If there isn't a match, the adapter discards the frame, without passing the network-layer datagram up. Thus, only the destination will be interrupted when the frame is received.

However, sometimes a sending adapter does want all the other adapters on the LAN to receive and process the frame it is about to send. In this case, the sending adapter inserts a special MAC broadcast address into the destination address field of the frame. For LANs that use 6-byte addresses (such as Ethernet and 802.11), the broadcast address is a string of 48 consecutive 1s (that is, FF-FF-FF-FF-FF-FF in hexadecimal notation).

Address Resolution Protocol (ARP)

Because there are both network-layer addresses (for example, Internet IP addresses) and link-layer addresses (that is, MAC addresses), there is a need to translate between them. For the Internet, this is the job of the Address Resolution Protocol (ARP) \[RFC 826\]. To understand the need for a protocol such as ARP, consider the network shown in Figure 6.17. In this simple example, each host and router has a single IP address and single MAC address.
As usual, IP addresses are shown in dotted-decimal notation and MAC addresses are shown in hexadecimal notation. For the purposes of this discussion, we will assume in this section that the switch broadcasts all frames; that is, whenever a switch receives a frame on one interface, it forwards the frame on all of its other interfaces. In the next section, we will provide a more accurate explanation of how switches operate.

PRINCIPLES IN PRACTICE

KEEPING THE LAYERS INDEPENDENT

There are several reasons why hosts and router interfaces have MAC addresses in addition to network-layer addresses. First, LANs are designed for arbitrary network-layer protocols, not just for IP and the Internet. If adapters were assigned IP addresses rather than "neutral" MAC addresses, then adapters would not easily be able to support other network-layer protocols (for example, IPX or DECnet). Second, if adapters were to use network-layer addresses instead of MAC addresses, the network-layer address would have to be stored in the adapter RAM and reconfigured every time the adapter was moved (or powered up). Another option is to not use any addresses in the adapters and have each adapter pass the data (typically, an IP datagram) of each frame it receives up the protocol stack. The network layer could then check for a matching network-layer address. One problem with this option is that the host would be interrupted by every frame sent on the LAN, including by frames that were destined for other hosts on the same broadcast LAN. In summary, in order for the layers to be largely independent building blocks in a network architecture, different layers need to have their own addressing scheme. We have now seen three types of addresses: host names for the application layer, IP addresses for the network layer, and MAC addresses for the link layer.

Figure 6.17 Each interface on a LAN has an IP address and a MAC address

Now suppose that the host with IP address 222.222.222.220 wants to send an IP datagram to host 222.222.222.222. In this example, both the source and destination are in the same subnet, in the addressing sense of Section 4.3.3. To send a datagram, the source must give its adapter not only the IP datagram but also the MAC address for destination 222.222.222.222. The sending adapter will then construct a link-layer frame containing the destination's MAC address and send the frame into the LAN.

The important question addressed in this section is, How does the sending host determine the MAC address for the destination host with IP address 222.222.222.222? As you might have guessed, it uses ARP. An ARP module in the sending host takes any IP address on the same LAN as input, and returns the corresponding MAC address. In the example at hand, sending host 222.222.222.220 provides its ARP module the IP address 222.222.222.222, and the ARP module returns the corresponding MAC address 49-BD-D2-C7-56-2A.

So we see that ARP resolves an IP address to a MAC address. In many ways it is analogous to DNS (studied in Section 2.5), which resolves host names to IP addresses. However, one important difference between the two resolvers is that DNS resolves host names for hosts anywhere in the Internet, whereas ARP resolves IP addresses only for hosts and router interfaces on the same subnet. If a node in California were to try to use ARP to resolve the IP address for a node in Mississippi, ARP would return with an error.
Figure 6.18 A possible ARP table in 222.222.222.220

Now that we have explained what ARP does, let's look at how it works. Each host and router has an ARP table in its memory, which contains mappings of IP addresses to MAC addresses. Figure 6.18 shows what an ARP table in host 222.222.222.220 might look like. The ARP table also contains a time-to-live (TTL) value, which indicates when each mapping will be deleted from the table. Note that a table does not necessarily contain an entry for every host and router on the subnet; some may have never been entered into the table, and others may have expired. A typical expiration time for an entry is 20 minutes from when an entry is placed in an ARP table. Now suppose that host 222.222.222.220 wants to send a datagram that is IP-addressed to another host or router on that subnet. The sending host needs to obtain the MAC address of the destination given the IP address. This task is easy if the sender's ARP table has an entry for the destination node. But what if the ARP table doesn't currently have an entry for the destination? In particular, suppose 222.222.222.220 wants to send a datagram to 222.222.222.222. In this case, the sender uses the ARP protocol to resolve the address. First, the sender constructs a special packet called an ARP packet. An ARP packet has several fields, including the sending and receiving IP and MAC addresses. Both ARP query and response packets have the same format. The purpose of the ARP query packet is to query all the other hosts and routers on the subnet to determine the MAC address corresponding to the IP address that is being resolved. Returning to our example, 222.222.222.220 passes an ARP query packet to the adapter along with an indication that the adapter should send the packet to the MAC broadcast address, namely, FF-FF-FF-FF-FF-FF. The adapter encapsulates the ARP packet in a link-layer frame, uses the broadcast address for the frame's destination address, and transmits the frame into the subnet. Recalling our social security number/postal address analogy, an ARP query is equivalent to a person shouting out in a crowded room of cubicles in some company (say, AnyCorp): "What is the social security number of the person whose postal address is Cubicle 13, Room 112, AnyCorp, Palo Alto, California?" The frame containing the ARP query is received by all the other adapters on the subnet, and (because of the broadcast address) each adapter passes the ARP packet within the frame up to its ARP module. Each of these ARP modules checks to see if its IP address matches the destination IP address in the ARP packet. The one with a match sends back to the querying host a response ARP packet with the desired mapping. The querying host 222.222.222.220 can then update its ARP table and send its IP datagram, encapsulated in a link-layer frame whose destination MAC address is that of the host or router responding to the earlier ARP query.

There are a couple of interesting things to note about the ARP protocol. First, the query ARP message is sent within a broadcast frame, whereas the response ARP message is sent within a standard frame. Before reading on you should think about why this is so. Second, ARP is plug-and-play; that is, an ARP table gets built automatically---it doesn't have to be configured by a system administrator. And if a host becomes disconnected from the subnet, its entry is eventually deleted from the other ARP tables in the subnet.
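
To make the ARP table's behavior concrete, here is a minimal sketch of a cache with the 20-minute expiration described above. It is an illustration only: the class and method names are our own, and a real host would broadcast an ARP query on a miss rather than simply reporting one.

```python
import time

class ArpCache:
    """A minimal ARP table: IP address -> (MAC address, expiry time)."""

    def __init__(self, ttl_seconds=20 * 60):   # 20-minute expiration, as in the text
        self.ttl = ttl_seconds
        self.table = {}

    def add(self, ip, mac):
        # (Re)learn a mapping; the entry ages out after self.ttl seconds.
        self.table[ip] = (mac, time.time() + self.ttl)

    def lookup(self, ip):
        entry = self.table.get(ip)
        if entry is None:
            return None                        # miss: caller would broadcast an ARP query
        mac, expires = entry
        if time.time() > expires:
            del self.table[ip]                 # expired: purge and treat as a miss
            return None
        return mac

cache = ArpCache()
cache.add("222.222.222.222", "49-BD-D2-C7-56-2A")
print(cache.lookup("222.222.222.222"))         # 49-BD-D2-C7-56-2A
print(cache.lookup("222.222.222.221"))         # None -> would trigger an ARP query
```
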
Students often wonder if ARP is a link-layer protocol or a network-layer protocol. As we've seen, an ARP packet is encapsulated within a link-layer frame and thus lies architecturally above the link layer. However, an ARP packet has fields containing link-layer addresses and thus is arguably a link-layer protocol, but it also contains network-layer addresses and thus is also arguably a network-layer protocol. In the end, ARP is probably best considered a protocol that straddles the boundary between the link and network layers---not fitting neatly into the simple layered protocol stack we studied in Chapter 1. Such are the complexities of real-world protocols!

Sending a Datagram off the Subnet

It should now be clear how ARP operates when a host wants to send a datagram to another host on the same subnet. But now let's look at the more complicated situation when a host on a subnet wants to send a network-layer datagram to a host off the subnet (that is, across a router onto another subnet). Let's discuss this issue in the context of Figure 6.19, which shows a simple network consisting of two subnets interconnected by a router. There are several interesting things to note about Figure 6.19. Each host has exactly one IP address and one adapter. But, as discussed in Chapter 4, a router has an IP address for each of its interfaces. For each router interface there is also an ARP module (in the router) and an adapter. Because the router in Figure 6.19 has two interfaces, it has two IP addresses, two ARP modules, and two adapters. Of course, each adapter in the network has its own MAC address.

Figure 6.19 Two subnets interconnected by a router

Also note that Subnet 1 has the network address 111.111.111/24 and that Subnet 2 has the network address 222.222.222/24. Thus all of the interfaces connected to Subnet 1 have addresses of the form 111.111.111.xxx and all of the interfaces connected to Subnet 2 have addresses of the form 222.222.222.xxx. Now let's examine how a host on Subnet 1 would send a datagram to a host on Subnet 2. Specifically, suppose that host 111.111.111.111 wants to send an IP datagram to host 222.222.222.222. The sending host passes the datagram to its adapter, as usual. But the sending host must also indicate to its adapter an appropriate destination MAC address. What MAC address should the adapter use? One might be tempted to guess that the appropriate MAC address is that of the adapter for host 222.222.222.222, namely, 49-BD-D2-C7-56-2A. This guess, however, would be wrong! If the sending adapter were to use that MAC address, then none of the adapters on Subnet 1 would bother to pass the IP datagram up to its network layer, since the frame's destination address would not match the MAC address of any adapter on Subnet 1. The datagram would just die and go to datagram heaven. If we look carefully at Figure 6.19, we see that in order for a datagram to go from 111.111.111.111 to a host on Subnet 2, the datagram must first be sent to the router interface 111.111.111.110, which is the IP address of the first-hop router on the path to the final destination. Thus, the appropriate MAC address for the frame is the address of the adapter for router interface 111.111.111.110, namely, E6-E9-00-17-BB-4B. How does the sending host acquire the MAC address for 111.111.111.110? By using ARP, of course!
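
The sender's first decision, then, is which IP address to hand to its ARP module: the destination itself if it lies on the sender's subnet, or the first-hop router otherwise. Here is a minimal sketch of that decision using the addresses of Figure 6.19 and Python's standard ipaddress module; the function name and default arguments are our own.

```python
import ipaddress

def next_hop_ip(dest_ip, my_interface="111.111.111.111/24",
                default_gateway="111.111.111.110"):
    """Return the IP address whose MAC address the sender must resolve
    with ARP: the destination itself if it is on the sender's subnet,
    otherwise the first-hop router (default gateway)."""
    subnet = ipaddress.ip_interface(my_interface).network
    if ipaddress.ip_address(dest_ip) in subnet:
        return dest_ip
    return default_gateway

print(next_hop_ip("111.111.111.112"))   # on-subnet: ARP for the destination itself
print(next_hop_ip("222.222.222.222"))   # off-subnet: ARP for 111.111.111.110
```
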
Once the sending adapter has this MAC address, it creates a frame (containing the datagram addressed to 222.222.222.222) and sends the frame into Subnet 1. The router adapter on Subnet 1 sees that the link-layer frame is addressed to it, and therefore passes the frame to the network layer of the router. Hooray---the IP datagram has successfully been moved from source host to the router! But we are not finished. We still have to move the datagram from the router to the destination. The router now has to determine the correct interface on which the datagram is to be forwarded. As discussed in Chapter 4, this is done by consulting a forwarding table in the router. The forwarding table tells the router that the datagram is to be forwarded via router interface 222.222.222.220. This interface then passes the datagram to its adapter, which encapsulates the datagram in a new frame and sends the frame into Subnet 2. This time, the destination MAC address of the frame is indeed the MAC address of the ultimate destination. And how does the router obtain this destination MAC address? From ARP, of course! ARP for Ethernet is defined in RFC 826. A nice introduction to ARP is given in the TCP/IP tutorial, RFC 1180. We'll explore ARP in more detail in the homework problems.

6.4.2 Ethernet

Ethernet has pretty much taken over the wired LAN market. In the 1980s and the early 1990s, Ethernet faced many challenges from other LAN technologies, including token ring, FDDI, and ATM. Some of these other technologies succeeded in capturing a part of the LAN market for a few years. But since its invention in the mid-1970s, Ethernet has continued to evolve and grow and has held on to its dominant position. Today, Ethernet is by far the most prevalent wired LAN technology, and it is likely to remain so for the foreseeable future. One might say that Ethernet has been to local area networking what the Internet has been to global networking. There are many reasons for Ethernet's success. First, Ethernet was the first widely deployed high-speed LAN. Because it was deployed early, network administrators became intimately familiar with Ethernet---its wonders and its quirks---and were reluctant to switch over to other LAN technologies when they came on the scene. Second, token ring, FDDI, and ATM were more complex and expensive than Ethernet, which further discouraged network administrators from switching over. Third, the most compelling reason to switch to another LAN technology (such as FDDI or ATM) was usually the higher data rate of the new technology; however, Ethernet always fought back, producing versions that operated at equal data rates or higher. Switched Ethernet was also introduced in the early 1990s, which further increased its effective data rates. Finally, because Ethernet has been so popular, Ethernet hardware (in particular, adapters and switches) has become a commodity and is remarkably cheap. The original Ethernet LAN was invented in the mid-1970s by Bob Metcalfe and David Boggs; it used a coaxial bus to interconnect the nodes. Bus topologies for Ethernet actually persisted throughout the 1980s and into the mid-1990s. Ethernet with a bus topology is a broadcast LAN---all transmitted frames travel to and are processed by all adapters connected to the bus. Recall that we covered Ethernet's CSMA/CD multiple access protocol with binary exponential backoff in Section 6.3.2.
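
As a quick refresher on that protocol, here is a minimal sketch of binary exponential backoff as presented in Section 6.3.2: after the nth consecutive collision, an adapter waits K * 512 bit times, with K drawn uniformly from {0, 1, ..., 2^min(n,10) - 1}. The constant and function names below are our own.

```python
import random

BIT_TIME_10MBPS = 0.1e-6          # one bit time at 10 Mbps, in seconds

def backoff_delay(n_collisions, bit_time=BIT_TIME_10MBPS):
    """Binary exponential backoff: after the nth consecutive collision,
    pick K uniformly from {0, 1, ..., 2^min(n,10) - 1} and wait
    K * 512 bit times before re-sensing the channel."""
    k = random.randrange(2 ** min(n_collisions, 10))
    return k * 512 * bit_time

# After a 3rd straight collision, K is drawn from {0, ..., 7}:
print(backoff_delay(3))
```
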
By the late 1990s, most companies and universities had replaced their LANs with Ethernet installations using a hub-based star topology. In such an installation the hosts (and routers) are directly connected to a hub with twisted-pair copper wire. A hub is a physical-layer device that acts on individual bits rather than frames. When a bit, representing a zero or a one, arrives from one interface, the hub simply recreates the bit, boosts its energy strength, and transmits the bit onto all the other interfaces. Thus, Ethernet with a hub-based star topology is also a broadcast LAN---whenever a hub receives a bit from one of its interfaces, it sends a copy out on all of its other interfaces. In particular, if a hub receives frames from two different interfaces at the same time, a collision occurs and the nodes that created the frames must retransmit. In the early 2000s Ethernet experienced yet another major evolutionary change. Ethernet installations continued to use a star topology, but the hub at the center was replaced with a switch. We'll be examining switched Ethernet in depth later in this chapter. For now, we only mention that a switch is not only "collision-less" but is also a bona fide store-and-forward packet switch; but unlike routers, which operate up through layer 3, a switch operates only up through layer 2.

Figure 6.20 Ethernet frame structure

Ethernet Frame Structure

We can learn a lot about Ethernet by examining the Ethernet frame, which is shown in Figure 6.20. To give this discussion about Ethernet frames a tangible context, let's consider sending an IP datagram from one host to another host, with both hosts on the same Ethernet LAN (for example, the Ethernet LAN in Figure 6.17). (Although the payload of our Ethernet frame is an IP datagram, we note that an Ethernet frame can carry other network-layer packets as well.) Let the sending adapter, adapter A, have the MAC address AA-AA-AA-AA-AA-AA and the receiving adapter, adapter B, have the MAC address BB-BB-BB-BB-BB-BB. The sending adapter encapsulates the IP datagram within an Ethernet frame and passes the frame to the physical layer. The receiving adapter receives the frame from the physical layer, extracts the IP datagram, and passes the IP datagram to the network layer. In this context, let's now examine the six fields of the Ethernet frame, as shown in Figure 6.20.

- Data field (46 to 1,500 bytes). This field carries the IP datagram. The maximum transmission unit (MTU) of Ethernet is 1,500 bytes. This means that if the IP datagram exceeds 1,500 bytes, then the host has to fragment the datagram, as discussed in Section 4.3.2. The minimum size of the data field is 46 bytes. This means that if the IP datagram is less than 46 bytes, the data field has to be "stuffed" to fill it out to 46 bytes. When stuffing is used, the data passed to the network layer contains the stuffing as well as an IP datagram. The network layer uses the length field in the IP datagram header to remove the stuffing.
- Destination address (6 bytes). This field contains the MAC address of the destination adapter, BB-BB-BB-BB-BB-BB. When adapter B receives an Ethernet frame whose destination address is either BB-BB-BB-BB-BB-BB or the MAC broadcast address, it passes the contents of the frame's data field to the network layer; if it receives a frame with any other MAC address, it discards the frame.
- Source address (6 bytes).
This field contains the MAC address of the adapter that transmits the frame onto the LAN, in this example, AA-AA-AA-AA-AA-AA.
- Type field (2 bytes). The type field permits Ethernet to multiplex network-layer protocols. To understand this, we need to keep in mind that hosts can use other network-layer protocols besides IP. In fact, a given host may support multiple network-layer protocols using different protocols for different applications. For this reason, when the Ethernet frame arrives at adapter B, adapter B needs to know to which network-layer protocol it should pass (that is, demultiplex) the contents of the data field. IP and other network-layer protocols (for example, Novell IPX or AppleTalk) each have their own, standardized type number. Furthermore, the ARP protocol (discussed in the previous section) has its own type number, and if the arriving frame contains an ARP packet (i.e., has a type field of 0806 hexadecimal), the ARP packet will be demultiplexed up to the ARP protocol. Note that the type field is analogous to the protocol field in the network-layer datagram and the port-number fields in the transport-layer segment; all of these fields serve to glue a protocol at one layer to a protocol at the layer above.
- Cyclic redundancy check (CRC) (4 bytes). As discussed in Section 6.2.3, the purpose of the CRC field is to allow the receiving adapter, adapter B, to detect bit errors in the frame.
- Preamble (8 bytes). The Ethernet frame begins with an 8-byte preamble field. Each of the first 7 bytes of the preamble has a value of 10101010; the last byte is 10101011. The first 7 bytes of the preamble serve to "wake up" the receiving adapters and to synchronize their clocks to that of the sender's clock. Why should the clocks be out of synchronization? Keep in mind that adapter A aims to transmit the frame at 10 Mbps, 100 Mbps, or 1 Gbps, depending on the type of Ethernet LAN. However, because nothing is absolutely perfect, adapter A will not transmit the frame at exactly the target rate; there will always be some drift from the target rate, a drift which is not known a priori by the other adapters on the LAN. A receiving adapter can lock onto adapter A's clock simply by locking onto the bits in the first 7 bytes of the preamble. The last 2 bits of the eighth byte of the preamble (the first two consecutive 1s) alert adapter B that the "important stuff" is about to come.

All of the Ethernet technologies provide connectionless service to the network layer. That is, when adapter A wants to send a datagram to adapter B, adapter A encapsulates the datagram in an Ethernet frame and sends the frame into the LAN, without first handshaking with adapter B. This layer-2 connectionless service is analogous to IP's layer-3 datagram service and UDP's layer-4 connectionless service. Ethernet technologies provide an unreliable service to the network layer. Specifically, when adapter B receives a frame from adapter A, it runs the frame through a CRC check, but neither sends an acknowledgment when a frame passes the CRC check nor sends a negative acknowledgment when a frame fails the CRC check. When a frame fails the CRC check, adapter B simply discards the frame. Thus, adapter A has no idea whether its transmitted frame reached adapter B and passed the CRC check. This lack of reliable transport (at the link layer) helps to make Ethernet simple and cheap. But it also means that the stream of datagrams passed to the network layer can have gaps.
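
To make adapter B's accept-or-discard decision and the type-field demultiplexing concrete, here is a minimal sketch that splits a received frame into the header fields discussed above. It assumes the preamble and CRC have already been handled by the hardware, and the function name is our own.

```python
import struct

BROADCAST = bytes.fromhex("ffffffffffff")

def parse_ethernet(frame, my_mac):
    """Split a received frame (preamble already stripped by the adapter)
    into its header fields and decide whether to pass the payload up."""
    dst, src = frame[:6], frame[6:12]
    eth_type = struct.unpack("!H", frame[12:14])[0]
    payload = frame[14:]                     # CRC trailer omitted in this sketch
    if dst not in (my_mac, BROADCAST):
        return None                          # not for us: discard silently
    # Demultiplex on the type field, as the text describes.
    proto = {0x0800: "IPv4", 0x0806: "ARP"}.get(eth_type, hex(eth_type))
    return proto, src, payload

my_mac = bytes.fromhex("bbbbbbbbbbbb")
frame = bytes.fromhex("bbbbbbbbbbbb" "aaaaaaaaaaaa" "0800") + b"...IP datagram..."
print(parse_ethernet(frame, my_mac))        # ('IPv4', b'\xaa...', b'...IP datagram...')
```
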
CASE HISTORY

BOB METCALFE AND ETHERNET

As a PhD student at Harvard University in the early 1970s, Bob Metcalfe worked on the ARPAnet at MIT. During his studies, he also became exposed to Abramson's work on ALOHA and random access protocols. After completing his PhD and just before beginning a job at Xerox Palo Alto Research Center (Xerox PARC), he visited Abramson and his University of Hawaii colleagues for three months, getting a firsthand look at ALOHAnet. At Xerox PARC, Metcalfe became exposed to Alto computers, which in many ways were the forerunners of the personal computers of the 1980s. Metcalfe saw the need to network these computers in an inexpensive manner. So armed with his knowledge about ARPAnet, ALOHAnet, and random access protocols, Metcalfe---along with colleague David Boggs---invented Ethernet. Metcalfe and Boggs's original Ethernet ran at 2.94 Mbps and linked up to 256 hosts separated by up to one mile. Metcalfe and Boggs succeeded at getting most of the researchers at Xerox PARC to communicate through their Alto computers. Metcalfe then forged an alliance between Xerox, Digital, and Intel to establish Ethernet as a 10 Mbps Ethernet standard, ratified by the IEEE. Xerox did not show much interest in commercializing Ethernet. In 1979, Metcalfe formed his own company, 3Com, which developed and commercialized networking technology, including Ethernet technology. In particular, 3Com developed and marketed Ethernet cards in the early 1980s for the immensely popular IBM PCs.

If there are gaps due to discarded Ethernet frames, does the application at Host B see gaps as well? As we learned in Chapter 3, this depends on whether the application is using UDP or TCP. If the application is using UDP, then the application in Host B will indeed see gaps in the data. On the other hand, if the application is using TCP, then TCP in Host B will not acknowledge the data contained in discarded frames, causing TCP in Host A to retransmit. Note that when TCP retransmits data, the data will eventually return to the Ethernet adapter at which it was discarded. Thus, in this sense, Ethernet does retransmit data, although Ethernet is unaware of whether it is transmitting a brand-new datagram with brand-new data, or a datagram that contains data that has already been transmitted at least once.

Ethernet Technologies

In our discussion above, we've referred to Ethernet as if it were a single protocol standard. But in fact, Ethernet comes in many different flavors, with somewhat bewildering acronyms such as 10BASE-T, 10BASE-2, 100BASE-T, 1000BASE-LX, 10GBASE-T, and 40GBASE-T. These and many other Ethernet technologies have been standardized over the years by the IEEE 802.3 CSMA/CD (Ethernet) working group \[IEEE 802.3 2012\]. While these acronyms may appear bewildering, there is actually considerable order here. The first part of the acronym refers to the speed of the standard: 10, 100, 1000, 10G, or 40G, for 10 Megabit (per second), 100 Megabit, Gigabit, 10 Gigabit, and 40 Gigabit Ethernet, respectively. "BASE" refers to baseband Ethernet, meaning that the physical media only carries Ethernet traffic; almost all of the 802.3 standards are for baseband Ethernet. The final part of the acronym refers to the physical media itself; Ethernet is both a link-layer and a physical-layer specification and is carried over a variety of physical media including coaxial cable, copper wire, and fiber. Generally, a "T" refers to twisted-pair copper wires.
Historically, an Ethernet was initially conceived of as a segment of coaxial cable. The early 10BASE-2 and 10BASE-5 standards specify 10 Mbps Ethernet over two types of coaxial cable, each limited in length to 500 meters. Longer runs could be obtained by using a repeater---a physical-layer device that receives a signal on the input side, and regenerates the signal on the output side. A coaxial cable corresponds nicely to our view of Ethernet as a broadcast medium---all frames transmitted by one interface are received at other interfaces, and Ethernet's CSMA/CD protocol nicely solves the multiple access problem. Nodes simply attach to the cable, and voila, we have a local area network! Ethernet has passed through a series of evolutionary steps over the years, and today's Ethernet is very different from the original bus-topology designs using coaxial cable. In most installations today, nodes are connected to a switch via point-to-point segments made of twisted-pair copper wires or fiber-optic cables, as shown in Figures 6.15--6.17. In the mid-1990s, Ethernet was standardized at 100 Mbps, 10 times faster than 10 Mbps Ethernet. The original Ethernet MAC protocol and frame format were preserved, but higher-speed physical layers were defined for copper wire (100BASE-T) and fiber (100BASE-FX, 100BASE-SX, 100BASE-BX). Figure 6.21 shows these different standards and the common Ethernet MAC protocol and frame format. 100 Mbps Ethernet is limited to a 100-meter distance over twisted pair, and to several kilometers over fiber, allowing Ethernet switches in different buildings to be connected.

Figure 6.21 100 Mbps Ethernet standards: A common link layer, different physical layers

Gigabit Ethernet is an extension to the highly successful 10 Mbps and 100 Mbps Ethernet standards. Offering a raw data rate of 40,000 Mbps, 40 Gigabit Ethernet maintains full compatibility with the huge installed base of Ethernet equipment. The standard for Gigabit Ethernet, referred to as IEEE 802.3z, does the following:

- Uses the standard Ethernet frame format (Figure 6.20) and is backward compatible with 10BASE-T and 100BASE-T technologies. This allows for easy integration of Gigabit Ethernet with the existing installed base of Ethernet equipment.
- Allows for point-to-point links as well as shared broadcast channels. Point-to-point links use switches while broadcast channels use hubs, as described earlier. In Gigabit Ethernet jargon, hubs are called buffered distributors.
- Uses CSMA/CD for shared broadcast channels. In order to have acceptable efficiency, the maximum distance between nodes must be severely restricted.
- Allows for full-duplex operation at 40 Gbps in both directions for point-to-point channels.

Initially operating over optical fiber, Gigabit Ethernet is now able to run over category 5 UTP cabling. Let's conclude our discussion of Ethernet technology by posing a question that may have begun troubling you. In the days of bus topologies and hub-based star topologies, Ethernet was clearly a broadcast link (as defined in Section 6.3) in which frame collisions occurred when nodes transmitted at the same time. To deal with these collisions, the Ethernet standard included the CSMA/CD protocol, which is particularly effective for a wired broadcast LAN spanning a small geographical region. But if the prevalent use of Ethernet today is a switch-based star topology, using store-and-forward packet switching, is there really a need anymore for an Ethernet MAC protocol?
As we'll see shortly, a switch coordinates its transmissions and never forwards more than one frame onto the same interface at any time. Furthermore, modern switches are full-duplex, so that a switch and a node can each send frames to each other at the same time without interference. In other words, in a switch-based Ethernet LAN there are no collisions and, therefore, there is no need for a MAC protocol! As we've seen, today's Ethernets are very different from the original Ethernet conceived by Metcalfe and Boggs more than 30 years ago---speeds have increased by three orders of magnitude, Ethernet frames are carried over a variety of media, switched-Ethernets have become dominant, and now even the MAC protocol is often unnecessary! Is all of this really still Ethernet? The answer, of course, is "yes, by definition." It is interesting to note, however, that through all of these changes, there has indeed been one enduring constant that has remained unchanged over 30 years---Ethernet's frame format. Perhaps this then is the one true and timeless centerpiece of the Ethernet standard.

6.4.3 Link-Layer Switches

Up until this point, we have been purposefully vague about what a switch actually does and how it works. The role of the switch is to receive incoming link-layer frames and forward them onto outgoing links; we'll study this forwarding function in detail in this subsection. We'll see that the switch itself is transparent to the hosts and routers in the subnet; that is, a host/router addresses a frame to another host/router (rather than addressing the frame to the switch) and happily sends the frame into the LAN, unaware that a switch will be receiving the frame and forwarding it. The rate at which frames arrive to any one of the switch's output interfaces may temporarily exceed the link capacity of that interface. To accommodate this problem, switch output interfaces have buffers, in much the same way that router output interfaces have buffers for datagrams. Let's now take a closer look at how switches operate.

Forwarding and Filtering

Filtering is the switch function that determines whether a frame should be forwarded to some interface or should just be dropped. Forwarding is the switch function that determines the interfaces to which a frame should be directed, and then moves the frame to those interfaces. Switch filtering and forwarding are done with a switch table. The switch table contains entries for some, but not necessarily all, of the hosts and routers on a LAN. An entry in the switch table contains (1) a MAC address, (2) the switch interface that leads toward that MAC address, and (3) the time at which the entry was placed in the table. An example switch table for the uppermost switch in Figure 6.15 is shown in Figure 6.22. This description of frame forwarding may sound similar to our discussion of datagram forwarding in Chapter 4.

Figure 6.22 Portion of a switch table for the uppermost switch in Figure 6.15

Indeed, in our discussion of generalized forwarding in Section 4.4, we learned that many modern packet switches can be configured to forward on the basis of layer-2 destination MAC addresses (i.e., function as a layer-2 switch) or layer-3 IP destination addresses (i.e., function as a layer-3 router). Nonetheless, we'll make the important distinction that switches forward packets based on MAC addresses rather than on IP addresses.
We will also see that a traditional (i.e., in a non-SDN context) switch table is constructed in a very different manner from a router's forwarding table. To understand how switch filtering and forwarding work, suppose a frame with destination address DD-DD-DD-DD-DD-DD arrives at the switch on interface x. The switch indexes its table with the MAC address DD-DD-DD-DD-DD-DD. There are three possible cases:

- There is no entry in the table for DD-DD-DD-DD-DD-DD. In this case, the switch forwards copies of the frame to the output buffers preceding all interfaces except for interface x. In other words, if there is no entry for the destination address, the switch broadcasts the frame.
- There is an entry in the table, associating DD-DD-DD-DD-DD-DD with interface x. In this case, the frame is coming from a LAN segment that contains adapter DD-DD-DD-DD-DD-DD. There being no need to forward the frame to any of the other interfaces, the switch performs the filtering function by discarding the frame.
- There is an entry in the table, associating DD-DD-DD-DD-DD-DD with interface y≠x. In this case, the frame needs to be forwarded to the LAN segment attached to interface y. The switch performs its forwarding function by putting the frame in an output buffer that precedes interface y.

Let's walk through these rules for the uppermost switch in Figure 6.15 and its switch table in Figure 6.22. Suppose that a frame with destination address 62-FE-F7-11-89-A3 arrives at the switch from interface 1. The switch examines its table and sees that the destination is on the LAN segment connected to interface 1 (that is, Electrical Engineering). This means that the frame has already been broadcast on the LAN segment that contains the destination. The switch therefore filters (that is, discards) the frame. Now suppose a frame with the same destination address arrives from interface 2. The switch again examines its table and sees that the destination is in the direction of interface 1; it therefore forwards the frame to the output buffer preceding interface 1. It should be clear from this example that as long as the switch table is complete and accurate, the switch forwards frames toward destinations without any broadcasting. In this sense, a switch is "smarter" than a hub. But how does this switch table get configured in the first place? Are there link-layer equivalents to network-layer routing protocols? Or must an overworked manager manually configure the switch table?

Self-Learning

A switch has the wonderful property (particularly for the already-overworked network administrator) that its table is built automatically, dynamically, and autonomously---without any intervention from a network administrator or from a configuration protocol. In other words, switches are self-learning. This capability is accomplished as follows:

1. The switch table is initially empty.
2. For each incoming frame received on an interface, the switch stores in its table (1) the MAC address in the frame's source address field, (2) the interface from which the frame arrived, and (3) the current time. In this manner the switch records in its table the LAN segment on which the sender resides. If every host in the LAN eventually sends a frame, then every host will eventually get recorded in the table.
3. The switch deletes an address in the table if no frames are received with that address as the source address after some period of time (the aging time). In this manner, if a PC is replaced by another PC (with a different adapter), the MAC address of the original PC will eventually be purged from the switch table.
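
Putting the three forwarding cases and the self-learning steps together, here is a minimal sketch of a learning switch. The class and method names are our own, and a real switch implements this logic in hardware rather than with a dictionary.

```python
import time

class LearningSwitch:
    """A minimal self-learning switch table: MAC -> (interface, time learned)."""

    def __init__(self, interfaces, aging_time=60 * 60):   # 60-minute aging, as in the example
        self.interfaces = set(interfaces)
        self.aging_time = aging_time
        self.table = {}

    def handle_frame(self, src_mac, dst_mac, in_iface):
        now = time.time()
        # Self-learning: record where the sender lives (step 2 above).
        self.table[src_mac] = (in_iface, now)
        # Purge entries older than the aging time (step 3 above).
        self.table = {m: (i, t) for m, (i, t) in self.table.items()
                      if now - t <= self.aging_time}
        entry = self.table.get(dst_mac)
        if entry is None:
            return self.interfaces - {in_iface}     # unknown destination: broadcast
        out_iface, _ = entry
        if out_iface == in_iface:
            return set()                            # same segment: filter (drop)
        return {out_iface}                          # known destination: forward

sw = LearningSwitch(interfaces=[1, 2, 3])
print(sw.handle_frame("62-FE-F7-11-89-A3", "01-12-23-34-45-56", 1))  # flood: {2, 3}
print(sw.handle_frame("01-12-23-34-45-56", "62-FE-F7-11-89-A3", 2))  # forward: {1}
```
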
Let's walk through the self-learning property for the uppermost switch in Figure 6.15 and its corresponding switch table in Figure 6.22. Suppose at time 9:39 a frame with source address 01-12-23-34-45-56 arrives from interface 2. Suppose that this address is not in the switch table. Then the switch adds a new entry to the table, as shown in Figure 6.23. Continuing with this same example, suppose that the aging time for this switch is 60 minutes, and no frames with source address 62-FE-F7-11-89-A3 arrive to the switch between 9:32 and 10:32. Then at time 10:32, the switch removes this address from its table.

Figure 6.23 Switch learns about the location of an adapter with address 01-12-23-34-45-56

Switches are plug-and-play devices because they require no intervention from a network administrator or user. A network administrator wanting to install a switch need do nothing more than connect the LAN segments to the switch interfaces. The administrator need not configure the switch tables at the time of installation or when a host is removed from one of the LAN segments. Switches are also full-duplex, meaning any switch interface can send and receive at the same time.

Properties of Link-Layer Switching

Having described the basic operation of a link-layer switch, let's now consider their features and properties. We can identify several advantages of using switches, rather than broadcast links such as buses or hub-based star topologies:

- Elimination of collisions. In a LAN built from switches (and without hubs), there is no wasted bandwidth due to collisions! The switches buffer frames and never transmit more than one frame on a segment at any one time. As with a router, the maximum aggregate throughput of a switch is the sum of all the switch interface rates. Thus, switches provide a significant performance improvement over LANs with broadcast links.
- Heterogeneous links. Because a switch isolates one link from another, the different links in the LAN can operate at different speeds and can run over different media. For example, the uppermost switch in Figure 6.15 might have three 1 Gbps 1000BASE-T copper links, two 100 Mbps 100BASE-FX fiber links, and one 100BASE-T copper link. Thus, a switch is ideal for mixing legacy equipment with new equipment.
- Management. In addition to providing enhanced security (see sidebar on Focus on Security), a switch also eases network management. For example, if an adapter malfunctions and continually sends Ethernet frames (called a jabbering adapter), a switch can detect the problem and internally disconnect the malfunctioning adapter. With this feature, the network administrator need not get out of bed and drive back to work in order to correct the problem. Similarly, a cable cut disconnects only the host that was using the cut cable to connect to the switch. In the days of coaxial cable, many a network manager spent hours "walking the line" (or more accurately, "crawling the floor") to find the cable break that brought down the entire network.
Switches also gather statistics on bandwidth usage, collision rates, and traffic types, and make this information available to the network manager. This information can be used to debug and correct problems, and to plan how the LAN should evolve in the future. Researchers are exploring adding yet more management functionality into Ethernet LANs in prototype deployments \[Casado 2007; Koponen 2011\].

FOCUS ON SECURITY

SNIFFING A SWITCHED LAN: SWITCH POISONING

When a host is connected to a switch, it typically only receives frames that are intended for it. For example, consider the switched LAN in Figure 6.17. When host A sends a frame to host B, and there is an entry for host B in the switch table, then the switch will forward the frame only to host B. If host C happens to be running a sniffer, host C will not be able to sniff this A-to-B frame. Thus, in a switched-LAN environment (in contrast to a broadcast link environment such as 802.11 LANs or hub-based Ethernet LANs), it is more difficult for an attacker to sniff frames. However, because the switch broadcasts frames that have destination addresses that are not in the switch table, the sniffer at C can still sniff some frames that are not intended for C. Furthermore, a sniffer will be able to sniff all Ethernet broadcast frames with broadcast destination address FF-FF-FF-FF-FF-FF. A well-known attack against a switch, called switch poisoning, is to send tons of packets to the switch with many different bogus source MAC addresses, thereby filling the switch table with bogus entries and leaving no room for the MAC addresses of the legitimate hosts. This causes the switch to broadcast most frames, which can then be picked up by the sniffer \[Skoudis 2006\]. As this attack is rather involved even for a sophisticated attacker, switches are significantly less vulnerable to sniffing than are hubs and wireless LANs.

Switches Versus Routers

As we learned in Chapter 4, routers are store-and-forward packet switches that forward packets using network-layer addresses. Although a switch is also a store-and-forward packet switch, it is fundamentally different from a router in that it forwards packets using MAC addresses. Whereas a router is a layer-3 packet switch, a switch is a layer-2 packet switch. Recall, however, that we learned in Section 4.4 that modern switches using the "match plus action" operation can be used to forward a layer-2 frame based on the frame's destination MAC address, as well as a layer-3 datagram using the datagram's destination IP address. Indeed, we saw that switches using the OpenFlow standard can perform generalized packet forwarding based on any of eleven different frame, datagram, and transport-layer header fields.

Even though switches and routers are fundamentally different, network administrators must often choose between them when installing an interconnection device. For example, for the network in Figure 6.15, the network administrator could just as easily have used a router instead of a switch to connect the department LANs, servers, and internet gateway router. Indeed, a router would permit interdepartmental communication without creating collisions. Given that both switches and routers are candidates for interconnection devices, what are the pros and cons of the two approaches?

Figure 6.24 Packet processing in switches, routers, and hosts

First consider the pros and cons of switches.
As mentioned above, switches are plug-and-play, a property that is cherished by all the overworked network administrators of the world. Switches can also have relatively high filtering and forwarding rates---as shown in Figure 6.24, switches have to process frames only up through layer 2, whereas routers have to process datagrams up through layer 3. On the other hand, to prevent the cycling of broadcast frames, the active topology of a switched network is restricted to a spanning tree. Also, a large switched network would require large ARP tables in the hosts and routers and would generate substantial ARP traffic and processing. Furthermore, switches are susceptible to broadcast storms---if one host goes haywire and transmits an endless stream of Ethernet broadcast frames, the switches will forward all of these frames, causing the entire network to collapse. Now consider the pros and cons of routers. Because network addressing is often hierarchical (and not flat, as is MAC addressing), packets do not normally cycle through routers even when the network has redundant paths. (However, packets can cycle when router tables are misconfigured; but as we learned in Chapter 4, IP uses a special datagram header field to limit the cycling.) Thus, packets are not restricted to a spanning tree and can use the best path between source and destination. Because routers do not have the spanning tree restriction, they have allowed the Internet to be built with a rich topology that includes, for example, multiple active links between Europe and North America. Another feature of routers is that they provide firewall protection against layer-2 broadcast storms. Perhaps the most significant drawback of routers, though, is that they are not plug-and-play---they and the hosts that connect to them need their IP addresses to be configured. Also, routers often have a larger per-packet processing time than switches, because they have to process up through the layer-3 fields. Finally, there are two different ways to pronounce the word router, either as "rootor" or as "rowter," and people waste a lot of time arguing over the proper pronunciation \[Perlman 1999\].

Given that both switches and routers have their pros and cons (as summarized in Table 6.1), when should an institutional network (for example, a university campus network or a corporate campus network) use switches, and when should it use routers?

Table 6.1 Comparison of the typical features of popular interconnection devices

|                   | Hubs | Routers | Switches |
|-------------------|------|---------|----------|
| Traffic isolation | No   | Yes     | Yes      |
| Plug and play     | Yes  | No      | Yes      |
| Optimal routing   | No   | Yes     | No       |

Typically, small networks consisting of a few hundred hosts have a few LAN segments. Switches suffice for these small networks, as they localize traffic and increase aggregate throughput without requiring any configuration of IP addresses. But larger networks consisting of thousands of hosts typically include routers within the network (in addition to switches). The routers provide a more robust isolation of traffic, control broadcast storms, and use more "intelligent" routes among the hosts in the network. For more discussion of the pros and cons of switched versus routed networks, as well as a discussion of how switched LAN technology can be extended to accommodate two orders of magnitude more hosts than today's Ethernets, see \[Meyers 2004; Kim 2008\].

6.4.4 Virtual Local Area Networks (VLANs)

In our earlier discussion of Figure 6.15, we noted that modern institutional LANs are often configured hierarchically, with each workgroup (department) having its own switched LAN connected to the switched LANs of other groups via a switch hierarchy. While such a configuration works well in an ideal world, the real world is often far from ideal. Three drawbacks can be identified in the configuration in Figure 6.15:

- Lack of traffic isolation. Although the hierarchy localizes group traffic to within a single switch, broadcast traffic (e.g., frames carrying ARP and DHCP messages or frames whose destination has not yet been learned by a self-learning switch) must still traverse the entire institutional network. Limiting the scope of such broadcast traffic would improve LAN performance. Perhaps more importantly, it also may be desirable to limit LAN broadcast traffic for security/privacy reasons. For example, if one group contains the company's executive management team and another group contains disgruntled employees running Wireshark packet sniffers, the network manager may well prefer that the executives' traffic never even reaches employee hosts. This type of isolation could be provided by replacing the center switch in Figure 6.15 with a router. We'll see shortly that this isolation also can be achieved via a switched (layer 2) solution.
- Inefficient use of switches. If instead of three groups, the institution had 10 groups, then 10 first-level switches would be required. If each group were small, say less than 10 people, then a single 96-port switch would likely be large enough to accommodate everyone, but this single switch would not provide traffic isolation.
- Managing users. If an employee moves between groups, the physical cabling must be changed to connect the employee to a different switch in Figure 6.15. Employees belonging to two groups make the problem even harder.

Fortunately, each of these difficulties can be handled by a switch that supports virtual local area networks (VLANs). As the name suggests, a switch that supports VLANs allows multiple virtual local area networks to be defined over a single physical local area network infrastructure. Hosts within a VLAN communicate with each other as if they (and no other hosts) were connected to the switch. In a port-based VLAN, the switch's ports (interfaces) are divided into groups by the network manager. Each group constitutes a VLAN, with the ports in each VLAN forming a broadcast domain (i.e., broadcast traffic from one port can only reach other ports in the group). Figure 6.25 shows a single switch with 16 ports. Ports 2 to 8 belong to the EE VLAN, while ports 9 to 15 belong to the CS VLAN (ports 1 and 16 are unassigned). This VLAN solves all of the difficulties noted above---EE and CS VLAN frames are isolated from each other, the two switches in Figure 6.15 have been replaced by a single switch, and if the user at switch port 8 joins the CS Department, the network operator simply reconfigures the VLAN software so that port 8 is now associated with the CS VLAN.
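
Here is a minimal sketch of the port-to-VLAN bookkeeping that makes this work, using the port assignments of Figure 6.25; the variable and function names are our own.

```python
# Port-to-VLAN assignments for the 16-port switch of Figure 6.25
# (ports 1 and 16 left unassigned).
port_vlan = {**{p: "EE" for p in range(2, 9)},
             **{p: "CS" for p in range(9, 16)}}

def same_broadcast_domain(in_port, out_port):
    """A port-based VLAN switch delivers a frame (including broadcasts)
    only between ports configured into the same VLAN."""
    return (in_port in port_vlan and out_port in port_vlan
            and port_vlan[in_port] == port_vlan[out_port])

print(same_broadcast_domain(2, 8))    # True: both ports are in the EE VLAN
print(same_broadcast_domain(2, 9))    # False: an EE frame never reaches a CS port
```
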
One can easily imagine how the VLAN switch is configured and operates---the network manager declares a port to belong to a given VLAN (with undeclared ports belonging to a default VLAN) using switch management software, a table of port-to-VLAN mappings is maintained within the switch, and switch hardware only delivers frames between ports belonging to the same VLAN.

Figure 6.25 A single switch with two configured VLANs

But by completely isolating the two VLANs, we have introduced a new difficulty! How can traffic from the EE Department be sent to the CS Department? One way to handle this would be to connect a VLAN switch port (e.g., port 1 in Figure 6.25) to an external router and configure that port to belong to both the EE and CS VLANs. In this case, even though the EE and CS departments share the same physical switch, the logical configuration would look as if the EE and CS departments had separate switches connected via a router. An IP datagram going from the EE to the CS department would first cross the EE VLAN to reach the router and then be forwarded by the router back over the CS VLAN to the CS host. Fortunately, switch vendors make such configurations easy for the network manager by building a single device that contains both a VLAN switch and a router, so a separate external router is not needed. A homework problem at the end of the chapter explores this scenario in more detail. Returning again to Figure 6.15, let's now suppose that rather than having a separate Computer Engineering department, some EE and CS faculty are housed in a separate building, where (of course!) they need network access, and (of course!) they'd like to be part of their department's VLAN. Figure 6.26 shows a second 8-port switch, where the switch ports have been defined as belonging to the EE or the CS VLAN, as needed. But how should these two switches be interconnected? One easy solution would be to define a port belonging to the CS VLAN on each switch (similarly for the EE VLAN) and to connect these ports to each other, as shown in Figure 6.26(a). This solution doesn't scale, however, since N VLANs would require N ports on each switch simply to interconnect the two switches. A more scalable approach to interconnecting VLAN switches is known as VLAN trunking. In the VLAN trunking approach shown in Figure 6.26(b), a special port on each switch (port 16 on the left switch and port 1 on the right switch) is configured as a trunk port to interconnect the two VLAN switches. The trunk port belongs to all VLANs, and frames sent to any VLAN are forwarded over the trunk link to the other switch. But this raises yet another question: How does a switch know that a frame arriving on a trunk port belongs to a particular VLAN? The IEEE has defined an extended Ethernet frame format, 802.1Q, for frames crossing a VLAN trunk. As shown in Figure 6.27, the 802.1Q frame consists of the standard Ethernet frame with a four-byte VLAN tag added into the header that carries the identity of the VLAN to which the frame belongs. The VLAN tag is added into a frame by the switch at the sending side of a VLAN trunk, and parsed and removed by the switch at the receiving side of the trunk. The VLAN tag itself consists of a 2-byte Tag Protocol Identifier (TPID) field (with a fixed hexadecimal value of 81-00), a 2-byte Tag Control Information field that contains a 12-bit VLAN identifier field, and a 3-bit priority field that is similar in intent to the IP datagram TOS field.
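
Here is a minimal sketch of tagging and untagging at the two ends of a trunk, under the simplifying assumptions that the CRC is recomputed elsewhere and that the priority bits occupy the top three bits of the Tag Control Information field; the function names are our own.

```python
import struct

TPID = 0x8100    # fixed Tag Protocol Identifier, 81-00 hexadecimal

def add_vlan_tag(frame, vlan_id, priority=0):
    """Insert a 4-byte 802.1Q tag between the source-address and type
    fields of an untagged Ethernet frame (CRC handling omitted)."""
    tci = (priority << 13) | (vlan_id & 0x0FFF)       # 3-bit priority, 12-bit VLAN ID
    return frame[:12] + struct.pack("!HH", TPID, tci) + frame[12:]

def strip_vlan_tag(frame):
    """Parse and remove the tag at the receiving side of the trunk."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    assert tpid == TPID, "not an 802.1Q-tagged frame"
    return tci & 0x0FFF, frame[:12] + frame[16:]      # (VLAN ID, untagged frame)

tagged = add_vlan_tag(bytes(12) + b"\x08\x00payload", vlan_id=42)
print(strip_vlan_tag(tagged))                         # (42, original frame)
```
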
Figure 6.26 Connecting two VLAN switches with two VLANs: (a) two cables (b) trunked

Figure 6.27 Original Ethernet frame (top), 802.1Q-tagged Ethernet VLAN frame (below)

In this discussion, we've only briefly touched on VLANs and have focused on port-based VLANs. We should also mention that VLANs can be defined in several other ways. In MAC-based VLANs, the network manager specifies the set of MAC addresses that belong to each VLAN; whenever a device attaches to a port, the port is connected into the appropriate VLAN based on the MAC address of the device. VLANs can also be defined based on network-layer protocols (e.g., IPv4, IPv6, or AppleTalk) and other criteria. It is also possible for VLANs to be extended across IP routers, allowing islands of LANs to be connected together to form a single VLAN that could span the globe \[Yu 2011\]. See the 802.1Q standard \[IEEE 802.1q 2005\] for more details.

6.5 Link Virtualization: A Network as a Link Layer

Because this chapter concerns link-layer protocols, and given that we're now nearing the chapter's end, let's reflect on how our understanding of the term link has evolved. We began this chapter by viewing the link as a physical wire connecting two communicating hosts. In studying multiple access protocols, we saw that multiple hosts could be connected by a shared wire and that the "wire" connecting the hosts could be radio spectra or other media. This led us to consider the link a bit more abstractly as a channel, rather than as a wire. In our study of Ethernet LANs (Figure 6.15) we saw that the interconnecting media could actually be a rather complex switched infrastructure. Throughout this evolution, however, the hosts themselves maintained the view that the interconnecting medium was simply a link-layer channel connecting two or more hosts. We saw, for example, that an Ethernet host can be blissfully unaware of whether it is connected to other LAN hosts by a single short LAN segment (Figure 6.17) or by a geographically dispersed switched LAN (Figure 6.15) or by a VLAN (Figure 6.26). In the case of a dialup modem connection between two hosts, the link connecting the two hosts is actually the telephone network---a logically separate, global telecommunications network with its own switches, links, and protocol stacks for data transfer and signaling. From the Internet link-layer point of view, however, the dial-up connection through the telephone network is viewed as a simple "wire." In this sense, the Internet virtualizes the telephone network, viewing the telephone network as a link-layer technology providing link-layer connectivity between two Internet hosts. You may recall from our discussion of overlay networks in Chapter 2 that an overlay network similarly views the Internet as a means for providing connectivity between overlay nodes, seeking to overlay the Internet in the same way that the Internet overlays the telephone network. In this section, we'll consider Multiprotocol Label Switching (MPLS) networks. Unlike the circuit-switched telephone network, MPLS is a packet-switched, virtual-circuit network in its own right. It has its own packet formats and forwarding behaviors. Thus, from a pedagogical viewpoint, a discussion of MPLS fits well into a study of either the network layer or the link layer. From an Internet viewpoint, however, we can consider MPLS, like the telephone network and switched-Ethernets, as a link-layer technology that serves to interconnect IP devices.
Thus, we'll consider MPLS in our discussion of the link layer. Frame-relay and ATM networks can also be used to interconnect IP devices, though they represent a slightly older (but still deployed) technology and will not be covered here; see the very readable book \[Goralski 1999\] for details. Our treatment of MPLS will be necessarily brief, as entire books could be (and have been) written on these networks. We recommend \[Davie 2000\] for details on MPLS. We'll focus here primarily on how MPLS serves to interconnect IP devices, although we'll dive a bit deeper into the underlying technologies as well.

6.5.1 Multiprotocol Label Switching (MPLS)

Multiprotocol Label Switching (MPLS) evolved from a number of industry efforts in the mid-to-late 1990s to improve the forwarding speed of IP routers by adopting a key concept from the world of virtual-circuit networks: a fixed-length label. The goal was not to abandon the destination-based IP datagram-forwarding infrastructure for one based on fixed-length labels and virtual circuits, but to augment it by selectively labeling datagrams and allowing routers to forward datagrams based on fixed-length labels (rather than destination IP addresses) when possible. Importantly, these techniques work hand-in-hand with IP, using IP addressing and routing. The IETF unified these efforts in the MPLS protocol \[RFC 3031, RFC 3032\], effectively blending VC techniques into a routed datagram network. Let's begin our study of MPLS by considering the format of a link-layer frame that is handled by an MPLS-capable router. Figure 6.28 shows that a link-layer frame transmitted between MPLS-capable devices has a small MPLS header added between the layer-2 (e.g., Ethernet) header and layer-3 (i.e., IP) header. RFC 3032 defines the format of the MPLS header for such links; headers are defined for ATM and frame-relay networks as well in other RFCs. Among the fields in the MPLS header are the label, 3 bits reserved for experimental use, a single S bit, which is used to indicate the end of a series of "stacked" MPLS headers (an advanced topic that we'll not cover here), and a time-to-live field.

Figure 6.28 MPLS header: Located between link- and network-layer headers

It's immediately evident from Figure 6.28 that an MPLS-enhanced frame can only be sent between routers that are both MPLS capable (since a non-MPLS-capable router would be quite confused when it found an MPLS header where it had expected to find the IP header!). An MPLS-capable router is often referred to as a label-switched router, since it forwards an MPLS frame by looking up the MPLS label in its forwarding table and then immediately passing the datagram to the appropriate output interface. Thus, the MPLS-capable router need not extract the destination IP address and perform a lookup of the longest prefix match in the forwarding table. But how does a router know if its neighbor is indeed MPLS capable, and how does a router know what label to associate with the given IP destination? To answer these questions, we'll need to take a look at the interaction among a group of MPLS-capable routers.
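
Setting those questions aside for a moment, the forwarding operation itself is simple to sketch. The table entries and label values below are hypothetical; the point is that forwarding is a single exact-match lookup on the incoming label, with no longest-prefix match on a destination IP address.

```python
# A label-switched router's forwarding table maps an incoming label to an
# (outgoing interface, outgoing label) pair. All values here are hypothetical.
mpls_table = {
    6: ("interface0", 10),   # one labeled path toward some destination A
    9: ("interface1", 8),    # an alternate labeled path toward the same A
}

def forward_mpls(in_label):
    """One exact-match lookup on the label: pick the output interface
    and swap in the outgoing label. The IP header is never examined,
    so no longest-prefix match is needed."""
    out_iface, out_label = mpls_table[in_label]
    return out_iface, out_label

print(forward_mpls(6))   # ('interface0', 10)
```
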
Router R3 has advertised to +router R4 that it can route to destinations A and D, and that incoming +frames with MPLS labels 10 and 12, respectively, will be switched toward +those destinations. Router R2 has also advertised to router R4 that it +(R2) can reach destination A, and that a received frame with MPLS label +8 will be switched toward A. Note that router R4 is now in the +interesting position of having + +Figure 6.29 MPLS-enhanced forwarding + +two MPLS paths to reach A: via interface 0 with outbound MPLS label 10, +and via interface 1 with an MPLS label of 8. The broad picture painted +in Figure 6.29 is that IP devices R5, R6, A, and D are connected +together via an MPLS infrastructure (MPLS-capable routers R1, R2, R3, +and R4) in much the same way that a switched LAN or an ATM network can +connect together IP devices. And like a switched LAN or ATM network, the +MPLS-capable routers R1 through R4 do so without ever touching the IP +header of a packet. In our discussion above, we've not specified the +specific protocol used to distribute labels among the MPLS-capable +routers, as the details of this signaling are well beyond the scope of +this book. We note, however, that the IETF working group on MPLS has +specified in \[RFC 3468\] that an extension of the RSVP protocol, known +as RSVP-TE \[RFC 3209\], will be the focus of its efforts for MPLS +signaling. We've also not discussed how MPLS actually computes the paths +for packets among MPLS capable routers, nor how it gathers link-state +information (e.g., amount of link bandwidth unreserved by MPLS) to + +use in these path computations. Existing link-state routing algorithms +(e.g., OSPF) have been extended to flood this information to +MPLS-capable routers. Interestingly, the actual path computation +algorithms are not standardized, and are currently vendor-specific. Thus +far, the emphasis of our discussion of MPLS has been on the fact that +MPLS performs switching based on labels, without needing to consider the +IP address of a packet. The true advantages of MPLS and the reason for +current interest in MPLS, however, lie not in the potential increases in +switching speeds, but rather in the new traffic management capabilities +that MPLS enables. As noted above, R4 has two MPLS paths to A. If +forwarding were performed up at the IP layer on the basis of IP address, +the IP routing protocols we studied in Chapter 5 would specify only a +single, least-cost path to A. Thus, MPLS provides the ability to forward +packets along routes that would not be possible using standard IP +routing protocols. This is one simple form of traffic engineering using +MPLS \[RFC 3346; RFC 3272; RFC 2702; Xiao 2000\], in which a network +operator can override normal IP routing and force some of the traffic +headed toward a given destination along one path, and other traffic +destined toward the same destination along another path (whether for +policy, performance, or some other reason). It is also possible to use +MPLS for many other purposes as well. It can be used to perform fast +restoration of MPLS forwarding paths, e.g., to reroute traffic over a +precomputed failover path in response to link failure \[Kar 2000; Huang +2002; RFC 3469\]. Finally, we note that MPLS can, and has, been used to +implement so-called virtual private networks (VPNs). In implementing a +VPN for a customer, an ISP uses its MPLS-enabled network to connect +together the customer's various networks. 
MPLS can be used to isolate both the resources and the addressing used by the customer's VPN from those of other users crossing the ISP's network; see \[DeClercq 2002\] for details. Our discussion of MPLS has been brief, and we encourage you to consult the references we've mentioned. We note that with so many possible uses for MPLS, it appears that it is rapidly becoming the Swiss Army knife of Internet traffic engineering!

6.6 Data Center Networking

In recent years, Internet companies such as Google, Microsoft, Facebook, and Amazon (as well as their counterparts in Asia and Europe) have built massive data centers, each housing tens to hundreds of thousands of hosts, and concurrently supporting many distinct cloud applications (e.g., search, e-mail, social networking, and e-commerce). Each data center has its own data center network that interconnects its hosts with each other and interconnects the data center with the Internet. In this section, we provide a brief introduction to data center networking for cloud applications.

The cost of a large data center is huge, exceeding \$12 million per month for a 100,000-host data center \[Greenberg 2009a\]. Of these costs, about 45 percent can be attributed to the hosts themselves (which need to be replaced every 3--4 years); 25 percent to infrastructure, including transformers, uninterruptible power supply (UPS) systems, generators for long-term outages, and cooling systems; 15 percent for electric utility costs for the power draw; and 15 percent for networking, including network gear (switches, routers, and load balancers), external links, and transit traffic costs. (In these percentages, costs for equipment are amortized so that a common cost metric is applied for one-time purchases and ongoing expenses such as power.) While networking is not the largest cost, networking innovation is the key to reducing overall cost and maximizing performance \[Greenberg 2009a\].

The worker bees in a data center are the hosts: They serve content (e.g., Web pages and videos), store e-mails and documents, and collectively perform massively distributed computations (e.g., distributed index computations for search engines). The hosts in data centers, called blades and resembling pizza boxes, are generally commodity hosts that include CPU, memory, and disk storage. The hosts are stacked in racks, with each rack typically having 20 to 40 blades. At the top of each rack there is a switch, aptly named the Top of Rack (TOR) switch, that interconnects the hosts in the rack with each other and with other switches in the data center. Specifically, each host in the rack has a network interface card that connects to its TOR switch, and each TOR switch has additional ports that can be connected to other switches. Today hosts typically have 40 Gbps Ethernet connections to their TOR switches \[Greenberg 2015\]. Each host is also assigned its own data-center-internal IP address.

The data center network supports two types of traffic: traffic flowing between external clients and internal hosts, and traffic flowing between internal hosts. To handle flows between external clients and internal hosts, the data center network includes one or more border routers, connecting the data center network to the public Internet. The data center network therefore interconnects the racks with each other and connects the racks to the border routers. Figure 6.30 shows an example of a data center network.
Data center network design, the art of designing the interconnection network and protocols that connect the racks with each other and with the border routers, has become an important branch of computer networking research in recent years \[Al-Fares 2008; Greenberg 2009a; Greenberg 2009b; Mysore 2009; Guo 2009; Wang 2010\].

Figure 6.30 A data center network with a hierarchical topology

Load Balancing

A cloud data center, such as a Google or Microsoft data center, provides many applications concurrently, such as search, e-mail, and video applications. To support requests from external clients, each application is associated with a publicly visible IP address to which clients send their requests and from which they receive responses. Inside the data center, the external requests are first directed to a load balancer whose job it is to distribute requests to the hosts, balancing the load across the hosts as a function of their current load. A large data center will often have several load balancers, each one devoted to a set of specific cloud applications. Such a load balancer is sometimes referred to as a "layer-4 switch" since it makes decisions based on the destination port number (layer 4) as well as the destination IP address in the packet. Upon receiving a request for a particular application, the load balancer forwards it to one of the hosts that handles the application. (A host may then invoke the services of other hosts to help process the request.) When the host finishes processing the request, it sends its response back to the load balancer, which in turn relays the response back to the external client. The load balancer not only balances the work load across hosts, but also provides a NAT-like function, translating the public external IP address to the internal IP address of the appropriate host, and then translating back for packets traveling in the reverse direction back to the clients. This prevents clients from contacting hosts directly, which has the security benefit of hiding the internal network structure.
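The NAT-like relaying just described can be sketched in a few lines. This is a minimal sketch under simplifying assumptions: the addresses are hypothetical, and a real layer-4 switch would typically choose a host based on its measured load, rather than on the simple client hash used here for brevity.

```python
import hashlib

PUBLIC_ADDRESS = "198.51.100.7"                      # hypothetical public IP
APP_HOSTS = ["10.0.1.11", "10.0.1.12", "10.0.1.13"]  # hypothetical internal hosts

flows = {}  # (client IP, client port) -> internal host

def forward_request(client_ip, client_port):
    # Pick a host for a new flow (hashing keeps all of a flow's packets
    # on one host) and remember the mapping; the datagram's destination
    # is then rewritten from PUBLIC_ADDRESS to the chosen internal host.
    key = (client_ip, client_port)
    if key not in flows:
        digest = hashlib.sha256(f"{client_ip}:{client_port}".encode()).digest()
        flows[key] = APP_HOSTS[digest[0] % len(APP_HOSTS)]
    return flows[key]

def forward_response(client_ip, client_port):
    # Reverse direction: the internal host's source address is rewritten
    # back to the application's public address before the packet leaves.
    assert (client_ip, client_port) in flows
    return PUBLIC_ADDRESS
```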
Hierarchical Architecture

For a small data center housing only a few thousand hosts, a simple network consisting of a border router, a load balancer, and a few tens of racks all interconnected by a single Ethernet switch could possibly suffice. But to scale to tens to hundreds of thousands of hosts, a data center often employs a hierarchy of routers and switches, such as the topology shown in Figure 6.30. At the top of the hierarchy, the border router connects to access routers (only two are shown in Figure 6.30, but there can be many more). Below each access router there are three tiers of switches. Each access router connects to a top-tier switch, and each top-tier switch connects to multiple second-tier switches and a load balancer. Each second-tier switch in turn connects to multiple racks via the racks' TOR switches (third-tier switches). All links typically use Ethernet for their link-layer and physical-layer protocols, with a mix of copper and fiber cabling. With such a hierarchical design, it is possible to scale a data center to hundreds of thousands of hosts.

Because it is critical for a cloud application provider to continually provide applications with high availability, data centers also include redundant network equipment and redundant links in their designs (not shown in Figure 6.30). For example, each TOR switch can connect to two tier-2 switches, and each access router, tier-1 switch, and tier-2 switch can be duplicated and integrated into the design \[Cisco 2012; Greenberg 2009b\]. In the hierarchical design in Figure 6.30, observe that the hosts below each access router form a single subnet. In order to localize ARP broadcast traffic, each of these subnets is further partitioned into smaller VLAN subnets, each comprising a few hundred hosts \[Greenberg 2009a\].

Although the conventional hierarchical architecture just described solves the problem of scale, it suffers from limited host-to-host capacity \[Greenberg 2009b\]. To understand this limitation, consider again Figure 6.30, and suppose each host connects to its TOR switch with a 1 Gbps link, whereas the links between switches are 10 Gbps Ethernet links. Two hosts in the same rack can always communicate at a full 1 Gbps, limited only by the rate of the hosts' network interface cards. However, if there are many simultaneous flows in the data center network, the maximum rate between two hosts in different racks can be much less. To gain insight into this issue, consider a traffic pattern consisting of 40 simultaneous flows between 40 pairs of hosts in different racks. Specifically, suppose each of 10 hosts in rack 1 in Figure 6.30 sends a flow to a corresponding host in rack 5. Similarly, there are 10 simultaneous flows between pairs of hosts in racks 2 and 6, 10 simultaneous flows between racks 3 and 7, and 10 simultaneous flows between racks 4 and 8. If each flow evenly shares a link's capacity with other flows traversing that link, then the 40 flows crossing the 10 Gbps A-to-B link (as well as the 10 Gbps B-to-C link) will each only receive 10 Gbps/40 = 250 Mbps, which is significantly less than the 1 Gbps network interface card rate. The problem becomes even more acute for flows that need to travel higher up the hierarchy. One possible solution to this limitation is to deploy higher-rate switches and routers. But this would significantly increase the cost of the data center, because switches and routers with high port speeds are very expensive.

Supporting high-bandwidth host-to-host communication is important because a key requirement in data centers is flexibility in placement of computation and services \[Greenberg 2009b; Farrington 2010\]. For example, a large-scale Internet search engine may run on thousands of hosts spread across multiple racks with significant bandwidth requirements between all pairs of hosts. Similarly, a cloud computing service such as EC2 may wish to place the multiple virtual machines comprising a customer's service on the physical hosts with the most capacity irrespective of their location in the data center. If these physical hosts are spread across multiple racks, network bottlenecks as described above may result in poor performance.

Trends in Data Center Networking

In order to reduce the cost of data centers, and at the same time improve their delay and throughput performance, Internet cloud giants such as Google, Facebook, Amazon, and Microsoft are continually deploying new data center network designs. Although these designs are proprietary, many important trends can nevertheless be identified. One such trend is to deploy new interconnection architectures and network protocols that overcome the drawbacks of the traditional hierarchical designs.
One such approach is to replace the +hierarchy of switches and routers with a fully connected topology +\[Facebook 2014; Al-Fares 2008; Greenberg 2009b; Guo 2009\], such as the +topology shown in Figure 6.31. In this design, each tier-1 switch +connects to all of the tier-2 switches so that (1) host-to-host traffic +never has to rise above the switch tiers, and (2) with n tier-1 +switches, between any two tier-2 switches there are n disjoint paths. +Such a design can significantly improve the host-to-host capacity. To +see this, consider again our example of 40 flows. The topology in Figure +6.31 can handle such a flow pattern since there are four distinct paths +between the first tier-2 switch and the second tier-2 switch, together +providing an aggregate capacity of 40 Gbps between the first two tier-2 +switches. Such a design not only alleviates the host-to-host capacity +limitation, but also creates a more flexible computation and service +environment in which communication between any two racks not connected +to the same switch is logically equivalent, irrespective of their +locations in the data center. Another major trend is to employ shipping +container--based modular data centers (MDCs) \[YouTube 2009; Waldrop +2007\]. In an MDC, a factory builds, within a + +Figure 6.31 Highly interconnected data network topology + +standard 12-meter shipping container, a "mini data center" and ships the +container to the data center location. Each container has up to a few +thousand hosts, stacked in tens of racks, which are packed closely +together. At the data center location, multiple containers are +interconnected with each other and also with the Internet. Once a +prefabricated container is deployed at a data center, it is often +difficult to service. Thus, each container is designed for graceful +performance degradation: as components (servers and switches) fail over +time, the container continues to operate but with degraded performance. +When many components have failed and performance has dropped below a +threshold, the entire container is removed and replaced with a fresh +one. Building a data center out of containers creates new networking +challenges. With an MDC, there are two types of networks: the +container-internal networks within each of the containers and the core +network connecting each container \[Guo 2009; Farrington 2010\]. Within +each container, at the scale of up to a few thousand hosts, it is +possible to build a fully connected network (as described above) using +inexpensive commodity Gigabit Ethernet switches. However, the design of +the core network, interconnecting hundreds to thousands of containers +while providing high host-to-host bandwidth across containers for +typical workloads, remains a challenging problem. A hybrid +electrical/optical switch architecture for interconnecting the +containers is proposed in \[Farrington 2010\]. When using highly +interconnected topologies, one of the major issues is designing routing +algorithms among the switches. One possibility \[Greenberg 2009b\] is to +use a form of random routing. Another possibility \[Guo 2009\] is to +deploy multiple network interface cards in each host, connect each host +to multiple low-cost commodity switches, and allow the hosts themselves +to intelligently route traffic among the switches. Variations and +extensions of these approaches are currently being deployed in +contemporary data centers. 
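The capacity figures in the two examples above can be checked with a few lines of arithmetic, using the rates assumed throughout this section (1 Gbps host links, 10 Gbps switch-to-switch links, 40 inter-rack flows, and four tier-1 switches in the fully connected design):

```python
NIC_GBPS = 1.0        # host-to-TOR link rate
TRUNK_GBPS = 10.0     # switch-to-switch link rate
FLOWS = 40            # simultaneous inter-rack flows
TIER1_SWITCHES = 4    # disjoint paths in the fully connected design

# Hierarchical topology (Figure 6.30): all 40 flows share one trunk.
per_flow = min(NIC_GBPS, TRUNK_GBPS / FLOWS)
print(f"hierarchical: {per_flow * 1000:.0f} Mbps per flow")      # 250 Mbps

# Fully connected topology (Figure 6.31): 4 x 10 Gbps = 40 Gbps of
# aggregate capacity between any two tier-2 switches, so each flow is
# limited only by its 1 Gbps network interface card.
per_flow = min(NIC_GBPS, TIER1_SWITCHES * TRUNK_GBPS / FLOWS)
print(f"fully connected: {per_flow * 1000:.0f} Mbps per flow")   # 1000 Mbps
```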
Another important trend is that large cloud providers are increasingly building or customizing just about everything that is in their data centers, including network adapters, switches, routers, TORs, software, and networking protocols \[Greenberg 2015, Singh 2015\]. Another trend, pioneered by Amazon, is to improve reliability with "availability zones," which essentially replicate distinct data centers in different nearby buildings. By having the buildings nearby (a few kilometers apart), transactional data can be synchronized across the data centers in the same availability zone while providing fault tolerance \[Amazon 2014\]. Many more innovations in data center design are likely to continue to come; interested readers are encouraged to see the recent papers and videos on data center network design.

6.7 Retrospective: A Day in the Life of a Web Page Request

Now that we've covered the link layer in this chapter, and the network, transport, and application layers in earlier chapters, our journey down the protocol stack is complete! In the very beginning of this book (Section 1.1), we wrote "much of this book is concerned with computer network protocols," and in the first six chapters, we've certainly seen that this is indeed the case! Before heading into the topical chapters in the second part of this book, we'd like to wrap up our journey down the protocol stack by taking an integrated, holistic view of the protocols we've learned about so far. One way to take this "big picture" view is to identify the many (many!) protocols that are involved in satisfying even the simplest request: downloading a Web page. Figure 6.32 illustrates our setting: a student, Bob, connects a laptop to his school's Ethernet switch and downloads a Web page (say the home page of www.google.com). As we now know, there's a lot going on "under the hood" to satisfy this seemingly simple request. A Wireshark lab at the end of this chapter examines trace files containing a number of the packets involved in similar scenarios in more detail.

6.7.1 Getting Started: DHCP, UDP, IP, and Ethernet

Let's suppose that Bob boots up his laptop and then connects it to an Ethernet cable connected to the school's Ethernet switch, which in turn is connected to the school's router, as shown in Figure 6.32. The school's router is connected to an ISP, in this example, comcast.net. In this example, comcast.net is providing the DNS service for the school; thus, the DNS server resides in the Comcast network rather than the school network. We'll assume that the DHCP server is running within the router, as is often the case. When Bob first connects his laptop to the network, he can't do anything (e.g., download a Web page) without an IP address. Thus, the first network-related

Figure 6.32 A day in the life of a Web page request: Network setting and actions

action taken by Bob's laptop is to run the DHCP protocol to obtain an IP address, as well as other information, from the local DHCP server:

1. The operating system on Bob's laptop creates a DHCP request message (Section 4.3.3) and puts this message within a UDP segment (Section 3.3) with destination port 67 (DHCP server) and source port 68 (DHCP client). The UDP segment is then placed within an IP datagram (Section 4.3.1) with a broadcast IP destination address (255.255.255.255) and a source IP address of 0.0.0.0, since Bob's laptop doesn't yet have an IP address.
2. The IP datagram containing the DHCP request message is then placed within an Ethernet frame (Section 6.4.2). The Ethernet frame has a destination MAC address of FF:FF:FF:FF:FF:FF so that the frame will be broadcast to all devices connected to the switch (hopefully including a DHCP server); the frame's source MAC address is that of Bob's laptop, 00:16:D3:23:68:8A.

3. The broadcast Ethernet frame containing the DHCP request is the first frame sent by Bob's laptop to the Ethernet switch. The switch broadcasts the incoming frame on all outgoing ports, including the port connected to the router.

4. The router receives the broadcast Ethernet frame containing the DHCP request on its interface with MAC address 00:22:6B:45:1F:1B, and the IP datagram is extracted from the Ethernet frame. The datagram's broadcast IP destination address indicates that this IP datagram should be processed by upper layer protocols at this node, so the datagram's payload (a UDP segment) is thus demultiplexed (Section 3.2) up to UDP, and the DHCP request message is extracted from the UDP segment. The DHCP server now has the DHCP request message.

5. Let's suppose that the DHCP server running within the router can allocate IP addresses in the CIDR (Section 4.3.3) block 68.85.2.0/24. In this example, all IP addresses used within the school are thus within Comcast's address block. Let's suppose the DHCP server allocates address 68.85.2.101 to Bob's laptop. The DHCP server creates a DHCP ACK message (Section 4.3.3) containing this IP address, as well as the IP address of the DNS server (68.87.71.226), the IP address for the default gateway router (68.85.2.1), and the subnet block (68.85.2.0/24) (equivalently, the "network mask"). The DHCP message is put inside a UDP segment, which is put inside an IP datagram, which is put inside an Ethernet frame. The Ethernet frame has a source MAC address of the router's interface to the school network (00:22:6B:45:1F:1B) and a destination MAC address of Bob's laptop (00:16:D3:23:68:8A).

6. The Ethernet frame containing the DHCP ACK is sent (unicast) by the router to the switch. Because the switch is self-learning (Section 6.4.3) and previously received an Ethernet frame (containing the DHCP request) from Bob's laptop, the switch knows to forward a frame addressed to 00:16:D3:23:68:8A only to the output port leading to Bob's laptop.

7. Bob's laptop receives the Ethernet frame containing the DHCP ACK, extracts the IP datagram from the Ethernet frame, extracts the UDP segment from the IP datagram, and extracts the DHCP ACK message from the UDP segment. Bob's DHCP client then records its IP address and the IP address of its DNS server. It also installs the address of the default gateway into its IP forwarding table (Section 4.1). Bob's laptop will send all datagrams with destination address outside of its subnet 68.85.2.0/24 to the default gateway. At this point, Bob's laptop has initialized its networking components and is ready to begin processing the Web page fetch. (Note that only the last two DHCP steps of the four presented in Chapter 4 are actually necessary.)
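The nested encapsulation of steps 1 and 2 can be pictured schematically as follows. This is only a sketch: the DHCP payload is abbreviated to its message type, and the ports and addresses are exactly the ones given in the steps above.

```python
# Step 1: the DHCP request rides in a UDP segment inside an IP datagram.
dhcp_request = {"message-type": "DHCP request"}    # payload, abbreviated

udp_segment = {"src-port": 68,                     # DHCP client
               "dst-port": 67,                     # DHCP server
               "payload": dhcp_request}

ip_datagram = {"src-addr": "0.0.0.0",              # laptop has no address yet
               "dst-addr": "255.255.255.255",      # IP broadcast
               "payload": udp_segment}

# Step 2: the datagram is framed for the link, with a broadcast
# destination MAC address so that the frame reaches the DHCP server.
ethernet_frame = {"src-mac": "00:16:D3:23:68:8A",  # Bob's laptop
                  "dst-mac": "FF:FF:FF:FF:FF:FF",  # link-layer broadcast
                  "payload": ip_datagram}
```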
6.7.2 Still Getting Started: DNS and ARP

When Bob types the URL for www.google.com into his Web browser, he begins the long chain of events that will eventually result in Google's home page being displayed by his Web browser. Bob's Web browser begins the process by creating a TCP socket (Section 2.7) that will be used to send the HTTP request (Section 2.2) to www.google.com. In order to create the socket, Bob's laptop will need to know the IP address of www.google.com. We learned in Section 2.5 that the DNS protocol is used to provide this name-to-IP-address translation service.

8. The operating system on Bob's laptop thus creates a DNS query message (Section 2.5.3), putting the string "www.google.com" in the question section of the DNS message. This DNS message is then placed within a UDP segment with a destination port of 53 (DNS server). The UDP segment is then placed within an IP datagram with an IP destination address of 68.87.71.226 (the address of the DNS server returned in the DHCP ACK in step 5) and a source IP address of 68.85.2.101.

9. Bob's laptop then places the datagram containing the DNS query message in an Ethernet frame. This frame will be sent (addressed, at the link layer) to the gateway router in Bob's school's network. However, even though Bob's laptop knows the IP address of the school's gateway router (68.85.2.1) via the DHCP ACK message in step 5 above, it doesn't know the gateway router's MAC address. In order to obtain the MAC address of the gateway router, Bob's laptop will need to use the ARP protocol (Section 6.4.1).

10. Bob's laptop creates an ARP query message with a target IP address of 68.85.2.1 (the default gateway), places the ARP message within an Ethernet frame with a broadcast destination address (FF:FF:FF:FF:FF:FF), and sends the Ethernet frame to the switch, which delivers the frame to all connected devices, including the gateway router.

11. The gateway router receives the frame containing the ARP query message on the interface to the school network, and finds that the target IP address of 68.85.2.1 in the ARP message matches the IP address of its interface. The gateway router thus prepares an ARP reply, indicating that its MAC address of 00:22:6B:45:1F:1B corresponds to IP address 68.85.2.1. It places the ARP reply message in an Ethernet frame, with a destination address of 00:16:D3:23:68:8A (Bob's laptop), and sends the frame to the switch, which delivers the frame to Bob's laptop.

12. Bob's laptop receives the frame containing the ARP reply message and extracts the MAC address of the gateway router (00:22:6B:45:1F:1B) from the ARP reply message.

13. Bob's laptop can now (finally!) address the Ethernet frame containing the DNS query to the gateway router's MAC address. Note that the IP datagram in this frame has an IP destination address of 68.87.71.226 (the DNS server), while the frame has a destination address of 00:22:6B:45:1F:1B (the gateway router). Bob's laptop sends this frame to the switch, which delivers the frame to the gateway router.
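The laptop's logic in steps 9 through 13 amounts to an ARP-cache lookup before the frame can be addressed. The sketch below uses the addresses from the walkthrough above; the helper standing in for the broadcast query and reply of steps 10 through 12 is, of course, a hypothetical placeholder.

```python
GATEWAY_IP = "68.85.2.1"   # from the DHCP ACK in step 5
arp_cache = {}             # IP address -> MAC address, initially empty

def broadcast_arp_query(target_ip):
    # Placeholder for steps 10-12: an ARP query inside a broadcast frame
    # (dst FF:FF:FF:FF:FF:FF); the gateway replies, unicast, with its MAC.
    assert target_ip == GATEWAY_IP
    return "00:22:6B:45:1F:1B"

def next_hop_mac(next_hop_ip):
    # Return the MAC address for the outgoing frame, issuing an ARP
    # query only on a cache miss.
    if next_hop_ip not in arp_cache:
        arp_cache[next_hop_ip] = broadcast_arp_query(next_hop_ip)
    return arp_cache[next_hop_ip]

# Step 13: the frame carrying the DNS query is addressed at the link
# layer to the gateway, although its IP destination is the DNS server.
frame = {"dst-mac": next_hop_mac(GATEWAY_IP),   # 00:22:6B:45:1F:1B
         "ip-dst": "68.87.71.226"}              # the DNS server
```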
6.7.3 Still Getting Started: Intra-Domain Routing to the DNS Server

14. The gateway router receives the frame and extracts the IP datagram containing the DNS query. The router looks up the destination address of this datagram (68.87.71.226) and determines from its forwarding table that the datagram should be sent to the leftmost router in the Comcast network in Figure 6.32. The IP datagram is placed inside a link-layer frame appropriate for the link connecting the school's router to the leftmost Comcast router, and the frame is sent over this link.

15. The leftmost router in the Comcast network receives the frame, extracts the IP datagram, examines the datagram's destination address (68.87.71.226), and determines from its forwarding table the outgoing interface on which to forward the datagram toward the DNS server. The forwarding table has been filled in by Comcast's intra-domain protocol (such as RIP, OSPF, or IS-IS; Section 5.3) as well as by the Internet's inter-domain protocol, BGP (Section 5.4).

16. Eventually the IP datagram containing the DNS query arrives at the DNS server. The DNS server extracts the DNS query message, looks up the name www.google.com in its DNS database (Section 2.5), and finds the DNS resource record that contains the IP address (64.233.169.105) for www.google.com (assuming that it is currently cached in the DNS server). Recall that this cached data originated in the authoritative DNS server (Section 2.5.2) for google.com. The DNS server forms a DNS reply message containing this hostname-to-IP-address mapping, places the DNS reply message in a UDP segment, and places the segment within an IP datagram addressed to Bob's laptop (68.85.2.101). This datagram will be forwarded back through the Comcast network to the school's router and from there, via the Ethernet switch, to Bob's laptop.

17. Bob's laptop extracts the IP address of the server www.google.com from the DNS message. Finally, after a lot of work, Bob's laptop is now ready to contact the www.google.com server!

6.7.4 Web Client-Server Interaction: TCP and HTTP

18. Now that Bob's laptop has the IP address of www.google.com, it can create the TCP socket (Section 2.7) that will be used to send the HTTP GET message (Section 2.2.3) to www.google.com. When Bob creates the TCP socket, the TCP in Bob's laptop must first perform a three-way handshake (Section 3.5.6) with the TCP in www.google.com. Bob's laptop thus first creates a TCP SYN segment with destination port 80 (for HTTP), places the TCP segment inside an IP datagram with a destination IP address of 64.233.169.105 (www.google.com), places the datagram inside a frame with a destination MAC address of 00:22:6B:45:1F:1B (the gateway router), and sends the frame to the switch.

19. The routers in the school network, Comcast's network, and Google's network forward the datagram containing the TCP SYN toward www.google.com, using the forwarding table in each router, as in steps 14--16 above. Recall that the router forwarding table entries governing forwarding of packets over the inter-domain link between the Comcast and Google networks are determined by the BGP protocol (Chapter 5).

20. Eventually, the datagram containing the TCP SYN arrives at www.google.com. The TCP SYN message is extracted from the datagram and demultiplexed to the welcome socket associated with port 80. A connection socket (Section 2.7) is created for the TCP connection between the Google HTTP server and Bob's laptop. A TCP SYNACK (Section 3.5.6) segment is generated, placed inside a datagram addressed to Bob's laptop, and finally placed inside a link-layer frame appropriate for the link connecting www.google.com to its first-hop router.

21. The datagram containing the TCP SYNACK segment is forwarded through the Google, Comcast, and school networks, eventually arriving at the Ethernet card in Bob's laptop. The datagram is demultiplexed within the operating system to the TCP socket created in step 18, which enters the connected state.
22. With the socket on Bob's laptop now (finally!) ready to send bytes to www.google.com, Bob's browser creates the HTTP GET message (Section 2.2.3) containing the URL to be fetched. The HTTP GET message is then written into the socket, with the GET message becoming the payload of a TCP segment. The TCP segment is placed in a datagram and sent and delivered to www.google.com as in steps 18--20 above.

23. The HTTP server at www.google.com reads the HTTP GET message from the TCP socket, creates an HTTP response message (Section 2.2), places the requested Web page content in the body of the HTTP response message, and sends the message into the TCP socket.

24. The datagram containing the HTTP reply message is forwarded through the Google, Comcast, and school networks, and arrives at Bob's laptop. Bob's Web browser program reads the HTTP response from the socket, extracts the HTML for the Web page from the body of the HTTP response, and finally (finally!) displays the Web page!

Our scenario above has covered a lot of networking ground! If you've understood most or all of the above example, then you've also covered a lot of ground since you first read Section 1.1, where we wrote "much of this book is concerned with computer network protocols" and you may have wondered what a protocol actually was! As detailed as the above example might seem, we've omitted a number of possible additional protocols (e.g., NAT running in the school's gateway router, wireless access to the school's network, security protocols for accessing the school network or encrypting segments or datagrams, network management protocols), and considerations (Web caching, the DNS hierarchy) that one would encounter in the public Internet. We'll cover a number of these topics and more in the second part of this book. Lastly, we note that our example above was an integrated and holistic, but also very "nuts and bolts," view of many of the protocols that we've studied in the first part of this book. The example focused more on the "how" than the "why." For a broader, more reflective view on the design of network protocols in general, see \[Clark 1988, RFC 5218\].

6.8 Summary

In this chapter, we've examined the link layer---its services, the principles underlying its operation, and a number of important specific protocols that use these principles in implementing link-layer services. We saw that the basic service of the link layer is to move a network-layer datagram from one node (host, switch, router, WiFi access point) to an adjacent node. We saw that all link-layer protocols operate by encapsulating a network-layer datagram within a link-layer frame before transmitting the frame over the link to the adjacent node. Beyond this common framing function, however, we learned that different link-layer protocols provide very different link access, delivery, and transmission services. These differences are due in part to the wide variety of link types over which link-layer protocols must operate. A simple point-to-point link has a single sender and receiver communicating over a single "wire." A multiple access link is shared among many senders and receivers; consequently, the link-layer protocol for a multiple access channel has a protocol (its multiple access protocol) for coordinating link access.
In the case of MPLS, the "link" connecting two adjacent nodes (for example, two IP routers that are adjacent in an IP sense---in that they are next-hop IP routers toward some destination) may actually be a network in and of itself. In one sense, the idea of a network being considered as a link should not seem odd. A telephone link connecting a home modem/computer to a remote modem/router, for example, is actually a path through a sophisticated and complex telephone network.

Among the principles underlying link-layer communication, we examined error-detection and -correction techniques, multiple access protocols, link-layer addressing, virtualization (VLANs), and the construction of extended switched LANs and data center networks. Much of the focus today at the link layer is on these switched networks. In the case of error detection/correction, we examined how it is possible to add additional bits to a frame's header in order to detect, and in some cases correct, bit-flip errors that might occur when the frame is transmitted over the link. We covered simple parity and checksumming schemes, as well as the more robust cyclic redundancy check. We then moved on to the topic of multiple access protocols. We identified and studied three broad approaches for coordinating access to a broadcast channel: channel partitioning approaches (TDM, FDM), random access approaches (the ALOHA protocols and CSMA protocols), and taking-turns approaches (polling and token passing). We studied the cable access network and found that it uses many of these multiple access methods. We saw that a consequence of having multiple nodes share a single broadcast channel was the need to provide node addresses at the link layer. We learned that link-layer addresses were quite different from network-layer addresses and that, in the case of the Internet, a special protocol (ARP---the Address Resolution Protocol) is used to translate between these two forms of addressing, and we studied the hugely successful Ethernet protocol in detail. We then examined how nodes sharing a broadcast channel form a LAN and how multiple LANs can be connected together to form larger LANs---all without the intervention of network-layer routing to interconnect these local nodes. We also learned how multiple virtual LANs can be created on a single physical LAN infrastructure.

We ended our study of the link layer by focusing on how MPLS networks provide link-layer services when they interconnect IP routers, and with an overview of the network designs for today's massive data centers. We wrapped up this chapter (and indeed the first six chapters) by identifying the many protocols that are needed to fetch a simple Web page. Having covered the link layer, our journey down the protocol stack is now over! Certainly, the physical layer lies below the link layer, but the details of the physical layer are probably best left for another course (for example, in communication theory, rather than computer networking). We have, however, touched upon several aspects of the physical layer in this chapter and in Chapter 1 (our discussion of physical media in Section 1.2). We'll consider the physical layer again when we study wireless link characteristics in the next chapter.

Although our journey down the protocol stack is over, our study of computer networking is not yet at an end. In the following three chapters we cover wireless networking, network security, and multimedia networking.
These three topics do not fit conveniently into any one layer; indeed, each topic crosscuts many layers. Understanding these topics (billed as advanced topics in some networking texts) thus requires a firm foundation in all layers of the protocol stack---a foundation that our study of the link layer has now completed!

Homework Problems and Questions

Chapter 6 Review Questions

SECTIONS 6.1--6.2

R1. Consider the transportation analogy in Section 6.1.1. If the passenger is analogous to a datagram, what is analogous to the link-layer frame?

R2. If all the links in the Internet were to provide reliable delivery service, would the TCP reliable delivery service be redundant? Why or why not?

R3. What are some of the possible services that a link-layer protocol can offer to the network layer? Which of these link-layer services have corresponding services in IP? In TCP?

SECTION 6.3

R4. Suppose two nodes start to transmit at the same time a packet of length L over a broadcast channel of rate R. Denote the propagation delay between the two nodes as d_prop. Will there be a collision if d_prop < L/R? Why or why not?

R5. In Section 6.3, we listed four desirable characteristics of a broadcast channel. Which of these characteristics does slotted ALOHA have? Which of these characteristics does token passing have?

R6. In CSMA/CD, after the fifth collision, what is the probability that a node chooses K=4? The result K=4 corresponds to a delay of how many seconds on a 10 Mbps Ethernet?

R7. Describe polling and token-passing protocols using the analogy of cocktail party interactions.

R8. Why would the token-ring protocol be inefficient if a LAN had a very large perimeter?

SECTION 6.4

R9. How big is the MAC address space? The IPv4 address space? The IPv6 address space?

R10. Suppose nodes A, B, and C each attach to the same broadcast LAN (through their adapters). If A sends thousands of IP datagrams to B with each encapsulating frame addressed to the MAC address of B, will C's adapter process these frames? If so, will C's adapter pass the IP datagrams in these frames to the network layer at C? How would your answers change if A sends frames with the MAC broadcast address?

R11. Why is an ARP query sent within a broadcast frame? Why is an ARP response sent within a frame with a specific destination MAC address?

R12. For the network in Figure 6.19, the router has two ARP modules, each with its own ARP table. Is it possible that the same MAC address appears in both tables?

R13. Compare the frame structures for 10BASE-T, 100BASE-T, and Gigabit Ethernet. How do they differ?

R14. Consider Figure 6.15. How many subnetworks are there, in the addressing sense of Section 4.3?

R15. What is the maximum number of VLANs that can be configured on a switch supporting the 802.1Q protocol? Why?

R16. Suppose that N switches supporting K VLAN groups are to be connected via a trunking protocol. How many ports are needed to connect the switches? Justify your answer.

Problems

P1. Suppose the information content of a packet is the bit pattern 1110 0110 1001 1101 and an even parity scheme is being used. What would the value of the field containing the parity bits be for the case of a two-dimensional parity scheme? Your answer should be such that a minimum-length checksum field is used.

P2. Show (give an example other than the one in Figure 6.5) that two-dimensional parity checks can detect and correct a single bit error.
Show (give an example of) a double-bit error that can be detected but not corrected.

P3. Suppose the information portion of a packet (D in Figure 6.3) contains 10 bytes consisting of the 8-bit unsigned binary ASCII representation of the string "Networking." Compute the Internet checksum for this data.

P4. Consider the previous problem, but instead suppose these 10 bytes contain

a. the binary representation of the numbers 1 through 10.

b. the ASCII representation of the letters B through K (uppercase).

c. the ASCII representation of the letters b through k (lowercase).

Compute the Internet checksum for this data.

P5. Consider the 5-bit generator, G=10011, and suppose that D has the value 1010101010. What is the value of R?

P6. Consider the previous problem, but suppose that D has the value

a. 1001010101.

b. 101101010.

c. 1010100000.

P7. In this problem, we explore some of the properties of the CRC. For the generator G (=1001) given in Section 6.2.3, answer the following questions.

a. Why can it detect any single bit error in data D?

b. Can the above G detect any odd number of bit errors? Why?

P8. In Section 6.3, we provided an outline of the derivation of the efficiency of slotted ALOHA. In this problem we'll complete the derivation.

a. Recall that when there are N active nodes, the efficiency of slotted ALOHA is Np(1−p)^(N−1). Find the value of p that maximizes this expression.

b. Using the value of p found in (a), find the efficiency of slotted ALOHA by letting N approach infinity. Hint: (1−1/N)^N approaches 1/e as N approaches infinity.

P9. Show that the maximum efficiency of pure ALOHA is 1/(2e). Note: This problem is easy if you have completed the problem above!

P10. Consider two nodes, A and B, that use the slotted ALOHA protocol to contend for a channel. Suppose node A has more data to transmit than node B, and node A's retransmission probability p_A is greater than node B's retransmission probability, p_B.

a. Provide a formula for node A's average throughput. What is the total efficiency of the protocol with these two nodes?

b. If p_A = 2p_B, is node A's average throughput twice as large as that of node B? Why or why not? If not, how can you choose p_A and p_B to make that happen?

c. In general, suppose there are N nodes, among which node A has retransmission probability 2p and all other nodes have retransmission probability p. Provide expressions to compute the average throughputs of node A and of any other node.

P11. Suppose four active nodes---nodes A, B, C and D---are competing for access to a channel using slotted ALOHA. Assume each node has an infinite number of packets to send. Each node attempts to transmit in each slot with probability p. The first slot is numbered slot 1, the second slot is numbered slot 2, and so on.

a. What is the probability that node A succeeds for the first time in slot 5?

b. What is the probability that some node (either A, B, C or D) succeeds in slot 4?

c. What is the probability that the first success occurs in slot 3?

d. What is the efficiency of this four-node system?

P12. Graph the efficiency of slotted ALOHA and pure ALOHA as a function of p for the following values of N:

a. N=15.

b. N=25.

c. N=35.

P13. Consider a broadcast channel with N nodes and a transmission rate of R bps. Suppose the broadcast channel uses polling (with an additional polling node) for multiple access.
Suppose the amount of time from when a node completes transmission until the subsequent node is permitted to transmit (that is, the polling delay) is d_poll. Suppose that within a polling round, a given node is allowed to transmit at most Q bits. What is the maximum throughput of the broadcast channel?

P14. Consider three LANs interconnected by two routers, as shown in Figure 6.33.

a. Assign IP addresses to all of the interfaces. For Subnet 1 use addresses of the form 192.168.1.xxx; for Subnet 2 use addresses of the form 192.168.2.xxx; and for Subnet 3 use addresses of the form 192.168.3.xxx.

b. Assign MAC addresses to all of the adapters.

c. Consider sending an IP datagram from Host E to Host B. Suppose all of the ARP tables are up to date. Enumerate all the steps, as done for the single-router example in Section 6.4.1.

d. Repeat (c), now assuming that the ARP table in the sending host is empty (and the other tables are up to date).

P15. Consider Figure 6.33. Now we replace the router between subnets 1 and 2 with a switch S1, and label the router between subnets 2 and 3 as R1.

Figure 6.33 Three subnets, interconnected by routers

a. Consider sending an IP datagram from Host E to Host F. Will Host E ask router R1 to help forward the datagram? Why? In the Ethernet frame containing the IP datagram, what are the source and destination IP and MAC addresses?

b. Suppose E would like to send an IP datagram to B, and assume that E's ARP cache does not contain B's MAC address. Will E perform an ARP query to find B's MAC address? Why? In the Ethernet frame (containing the IP datagram destined to B) that is delivered to router R1, what are the source and destination IP and MAC addresses?

c. Suppose Host A would like to send an IP datagram to Host B, and neither A's ARP cache contains B's MAC address nor does B's ARP cache contain A's MAC address. Further suppose that the switch S1's forwarding table contains entries for Host B and router R1 only. Thus, A will broadcast an ARP request message. What actions will switch S1 perform once it receives the ARP request message? Will router R1 also receive this ARP request message? If so, will R1 forward the message to Subnet 3? Once Host B receives this ARP request message, it will send back to Host A an ARP response message. But will it send an ARP query message to ask for A's MAC address? Why? What will switch S1 do once it receives an ARP response message from Host B?

P16. Consider the previous problem, but suppose now that the router between subnets 2 and 3 is replaced by a switch. Answer questions (a)--(c) in the previous problem in this new context.

P17. Recall that with the CSMA/CD protocol, the adapter waits K⋅512 bit times after a collision, where K is drawn randomly. For K=100, how long does the adapter wait until returning to Step 2 for a 10 Mbps broadcast channel? For a 100 Mbps broadcast channel?

P18. Suppose nodes A and B are on the same 10 Mbps broadcast channel, and the propagation delay between the two nodes is 325 bit times. Suppose CSMA/CD and Ethernet packets are used for this broadcast channel. Suppose node A begins transmitting a frame and, before it finishes, node B begins transmitting a frame. Can A finish transmitting before it detects that B has transmitted? Why or why not? If the answer is yes, then A incorrectly believes that its frame was successfully transmitted without a collision.
Hint: Suppose at time t=0 bit times, A begins transmitting a frame. In the worst case, A transmits a minimum-sized frame of 512+64 bit times. So A would finish transmitting the frame at t=512+64 bit times. Thus, the answer is no, if B's signal reaches A before bit time t=512+64 bits. In the worst case, when does B's signal reach A?

P19. Suppose nodes A and B are on the same 10 Mbps broadcast channel, and the propagation delay between the two nodes is 245 bit times. Suppose A and B send Ethernet frames at the same time, the frames collide, and then A and B choose different values of K in the CSMA/CD algorithm. Assuming no other nodes are active, can the retransmissions from A and B collide? For our purposes, it suffices to work out the following example. Suppose A and B begin transmission at t=0 bit times. They both detect collisions at t=245 bit times. Suppose K_A=0 and K_B=1. At what time does B schedule its retransmission? At what time does A begin transmission? (Note: The nodes must wait for an idle channel after returning to Step 2---see protocol.) At what time does A's signal reach B? Does B refrain from transmitting at its scheduled time?

P20. In this problem, you will derive the efficiency of a CSMA/CD-like multiple access protocol. In this protocol, time is slotted and all adapters are synchronized to the slots. Unlike slotted ALOHA, however, the length of a slot (in seconds) is much less than a frame time (the time to transmit a frame). Let S be the length of a slot. Suppose all frames are of constant length L=kRS, where R is the transmission rate of the channel and k is a large integer. Suppose there are N nodes, each with an infinite number of frames to send. We also assume that d_prop < S, so that all nodes can detect a collision before the end of a slot time. The protocol is as follows: If, for a given slot, no node has possession of the channel, all nodes contend for the channel; in particular, each node transmits in the slot with probability p. If exactly one node transmits in the slot, that node takes possession of the channel for the subsequent k−1 slots and transmits its entire frame. If some node has possession of the channel, all other nodes refrain from transmitting until the node that possesses the channel has finished transmitting its frame. Once this node has transmitted its frame, all nodes contend for the channel. Note that the channel alternates between two states: the productive state, which lasts exactly k slots, and the nonproductive state, which lasts for a random number of slots. Clearly, the channel efficiency is the ratio k/(k+x), where x is the expected number of consecutive unproductive slots.

a. For fixed N and p, determine the efficiency of this protocol.

b. For fixed N, determine the p that maximizes the efficiency.

c. Using the p (which is a function of N) found in (b), determine the efficiency as N approaches infinity.

d. Show that this efficiency approaches 1 as the frame length becomes large.

P21. Consider Figure 6.33 in problem P14. Provide MAC addresses and IP addresses for the interfaces at Host A, both routers, and Host F. Suppose Host A sends a datagram to Host F. Give the source and destination MAC addresses in the frame encapsulating this IP datagram as the frame is transmitted (i) from A to the left router, (ii) from the left router to the right router, (iii) from the right router to F.
Also give the source and destination IP addresses in the IP datagram encapsulated within the frame at each of these points in time.

P22. Suppose now that the leftmost router in Figure 6.33 is replaced by a switch. Hosts A, B, C, and D and the right router are all star-connected into this switch. Give the source and destination MAC addresses in the frame encapsulating this IP datagram as the frame is transmitted (i) from A to the switch, (ii) from the switch to the right router, (iii) from the right router to F. Also give the source and destination IP addresses in the IP datagram encapsulated within the frame at each of these points in time.

P23. Consider Figure 6.15. Suppose that all links are 100 Mbps. What is the maximum total aggregate throughput that can be achieved among the 9 hosts and 2 servers in this network? You can assume that any host or server can send to any other host or server. Why?

P24. Suppose the three departmental switches in Figure 6.15 are replaced by hubs. All links are 100 Mbps. Now answer the questions posed in problem P23.

P25. Suppose that all the switches in Figure 6.15 are replaced by hubs. All links are 100 Mbps. Now answer the questions posed in problem P23.

P26. Let's consider the operation of a learning switch in the context of a network in which 6 nodes labeled A through F are star-connected into an Ethernet switch. Suppose that (i) B sends a frame to E, (ii) E replies with a frame to B, (iii) A sends a frame to B, (iv) B replies with a frame to A. The switch table is initially empty. Show the state of the switch table before and after each of these events. For each of these events, identify the link(s) on which the transmitted frame will be forwarded, and briefly justify your answers.

P27. In this problem, we explore the use of small packets for Voice-over-IP applications. One of the drawbacks of a small packet size is that a large fraction of link bandwidth is consumed by overhead bytes. To this end, suppose that the packet consists of L bytes and 5 bytes of header.

a. Consider sending a digitally encoded voice source directly. Suppose the source is encoded at a constant rate of 128 kbps. Assume each packet is entirely filled before the source sends the packet into the network. The time required to fill a packet is the packetization delay. In terms of L, determine the packetization delay in milliseconds.

b. Packetization delays greater than 20 msec can cause a noticeable and unpleasant echo. Determine the packetization delay for L=1,500 bytes (roughly corresponding to a maximum-sized Ethernet packet) and for L=50 (corresponding to an ATM packet).

c. Calculate the store-and-forward delay at a single switch for a link rate of R=622 Mbps for L=1,500 bytes, and for L=50 bytes.

d. Comment on the advantages of using a small packet size.

P28. Consider the single switch VLAN in Figure 6.25, and assume an external router is connected to switch port 1. Assign IP addresses to the EE and CS hosts and router interface. Trace the steps taken at both the network layer and the link layer to transfer an IP datagram from an EE host to a CS host (Hint: Reread the discussion of Figure 6.19 in the text).

P29. Consider the MPLS network shown in Figure 6.29, and suppose that routers R5 and R6 are now MPLS enabled.
Suppose that we want to perform traffic engineering so that packets from R6 destined for A are switched to A via R6-R4-R3-R1, and packets from R5 destined for A are switched via R5-R4-R2-R1. Show the MPLS tables in R5 and R6, as well as the modified table in R4, that would make this possible.

P30. Consider again the same scenario as in the previous problem, but suppose that packets from R6 destined for D are switched via R6-R4-R3, while packets from R5 destined to D are switched via R4-R2-R1-R3. Show the MPLS tables in all routers that would make this possible.

P31. In this problem, you will put together much of what you have learned about Internet protocols. Suppose you walk into a room, connect to Ethernet, and want to download a Web page. What are all the protocol steps that take place, starting from powering on your PC to getting the Web page? Assume there is nothing in your DNS or browser caches when you power on your PC. (Hint: The steps include the use of Ethernet, DHCP, ARP, DNS, TCP, and HTTP protocols.) Explicitly indicate in your steps how you obtain the IP and MAC addresses of a gateway router.

P32. Consider the data center network with hierarchical topology in Figure 6.30. Suppose now there are 80 pairs of flows, with ten flows between the first and ninth rack, ten flows between the second and tenth rack, and so on. Further suppose that all links in the network are 10 Gbps, except for the links between hosts and TOR switches, which are 1 Gbps.

a. Each flow has the same data rate; determine the maximum rate of a flow.

b. For the same traffic pattern, determine the maximum rate of a flow for the highly interconnected topology in Figure 6.31.

c. Now suppose there is a similar traffic pattern, but involving 20 hosts on each rack and 160 pairs of flows. Determine the maximum flow rates for the two topologies.

P33. Consider the hierarchical network in Figure 6.30 and suppose that the data center needs to support e-mail and video distribution among other applications. Suppose four racks of servers are reserved for e-mail and four racks are reserved for video. For each of the applications, all four racks must lie below a single tier-2 switch since the tier-2 to tier-1 links do not have sufficient bandwidth to support the intra-application traffic. For the e-mail application, suppose that for 99.9 percent of the time only three racks are used, and that the video application has identical usage patterns.

a. For what fraction of time does the e-mail application need to use a fourth rack? How about for the video application?

b. Assuming e-mail usage and video usage are independent, for what fraction of time do (equivalently, what is the probability that) both applications need their fourth rack?

c. Suppose that it is acceptable for an application to have a shortage of servers for 0.001 percent of time or less (causing rare periods of performance degradation for users). Discuss how the topology in Figure 6.31 can be used so that only seven racks are collectively assigned to the two applications (assuming that the topology can support all the traffic).

Wireshark Labs

At the Companion website for this textbook, http://www.pearsonhighered.com/cs-resources/, you'll find a Wireshark lab that examines the operation of the IEEE 802.3 protocol and the Wireshark frame format. A second Wireshark lab examines packet traces taken in a home network scenario.
AN INTERVIEW WITH... Simon S. Lam

Simon S. Lam is Professor and Regents Chair in Computer Sciences at the University of Texas at Austin. From 1971 to 1974, he was with the ARPA Network Measurement Center at UCLA, where he worked on satellite and radio packet switching. He led a research group that invented secure sockets and prototyped, in 1993, the first secure sockets layer, named Secure Network Programming, which won the 2004 ACM Software System Award. His research interests are in design and analysis of network protocols and security services. He received his BSEE from Washington State University and his MS and PhD from UCLA. He was elected to the National Academy of Engineering in 2007.

Why did you decide to specialize in networking?

When I arrived at UCLA as a new graduate student in Fall 1969, my intention was to study control theory. Then I took the queuing theory classes of Leonard Kleinrock and was very impressed by him. For a while, I was working on adaptive control of queuing systems as a possible thesis topic. In early 1972, Larry Roberts initiated the ARPAnet Satellite System project (later called Packet Satellite). Professor Kleinrock asked me to join the project. The first thing we did was to introduce a simple, yet realistic, backoff algorithm to the slotted ALOHA protocol. Shortly thereafter, I found many interesting research problems, such as ALOHA's instability problem and need for adaptive backoff, which would form the core of my thesis.

You were active in the early days of the Internet in the 1970s, beginning with your student days at UCLA. What was it like then? Did people have any inkling of what the Internet would become?

The atmosphere was really no different from other system-building projects I have seen in industry and academia. The initially stated goal of the ARPAnet was fairly modest, that is, to provide access to expensive computers from remote locations so that many more scientists could use them. However, with the startup of the Packet Satellite project in 1972 and the Packet Radio project in 1973, ARPA's goal had expanded substantially. By 1973, ARPA was building three different packet networks at the same time, and it became necessary for Vint Cerf and Bob Kahn to develop an interconnection strategy.

Back then, all of these progressive developments in networking were viewed (I believe) as logical rather than magical. No one could have envisioned the scale of the Internet and power of personal computers today. It was a decade before the appearance of the first PCs. To put things in perspective, most students submitted their computer programs as decks of punched cards for batch processing. Only some students had direct access to computers, which were typically housed in a restricted area. Modems were slow and still a rarity. As a graduate student, I had only a phone on my desk, and I used pencil and paper to do most of my work.

Where do you see the field of networking and the Internet heading in the future?

In the past, the simplicity of the Internet's IP protocol was its greatest strength in vanquishing competition and becoming the de facto standard for internetworking. Unlike competitors, such as X.25 in the 1980s and ATM in the 1990s, IP can run on top of any link-layer networking technology, because it offers only a best-effort datagram service. Thus, any packet network can connect to the Internet. Today, IP's greatest strength is actually a shortcoming.
IP is like a straitjacket that confines the Internet's development to specific directions. In recent years, many researchers have redirected their efforts to the application layer only. There is also a great deal of research on wireless ad hoc networks, sensor networks, and satellite networks. These networks can be viewed either as stand-alone systems or link-layer systems, which can flourish because they are outside of the IP straitjacket. Many people are excited about the possibility of P2P systems as a platform for novel Internet applications. However, P2P systems are highly inefficient in their use of Internet resources. A concern of mine is whether the transmission and switching capacity of the Internet core will continue to increase faster than the traffic demand on the Internet as it grows to interconnect all kinds of devices and support future P2P-enabled applications. Without substantial overprovisioning of capacity, ensuring network stability in the presence of malicious attacks and congestion will continue to be a significant challenge.

The Internet's phenomenal growth also requires the allocation of new IP addresses at a rapid rate to network operators and enterprises worldwide. At the current rate, the pool of unallocated IPv4 addresses would be depleted in a few years. When that happens, large contiguous blocks of address space can only be allocated from the IPv6 address space. Since adoption of IPv6 is off to a slow start, due to lack of incentives for early adopters, IPv4 and IPv6 will most likely coexist on the Internet for many years to come. Successful migration from an IPv4-dominant Internet to an IPv6-dominant Internet will require a substantial global effort.

What is the most challenging part of your job?

The most challenging part of my job as a professor is teaching and motivating every student in my class, and every doctoral student under my supervision, rather than just the high achievers. The very bright and motivated may require a little guidance but not much else. I often learn more from these students than they learn from me. Educating and motivating the underachievers present a major challenge.

What impacts do you foresee technology having on learning in the future?

Eventually, almost all human knowledge will be accessible through the Internet, which will be the most powerful tool for learning. This vast knowledge base will have the potential of leveling the playing field for students all over the world. For example, motivated students in any country will be able to access the best Web sites, multimedia lectures, and teaching materials. Already, it has been said that the IEEE and ACM digital libraries have accelerated the development of computer science researchers in China. In time, the Internet will transcend all geographic barriers to learning.

Chapter 7 Wireless and Mobile Networks

In the telephony world, the past 20 years have arguably been the golden years of cellular telephony. The number of worldwide mobile cellular subscribers increased from 34 million in 1993 to nearly 7.0 billion subscribers by 2014, with the number of cellular subscribers now surpassing the number of wired telephone lines. There are now more mobile phone subscriptions than there are people on our planet. The many advantages of cell phones are evident to all---anywhere, anytime, untethered access to the global telephone network via a highly portable lightweight device.
More recently, laptops, smartphones, and tablets are wirelessly connected to the Internet via a cellular or WiFi network. And increasingly, devices such as gaming consoles, thermostats, home security systems, home appliances, watches, eyeglasses, cars, traffic control systems, and more are being wirelessly connected to the Internet.

From a networking standpoint, the challenges posed by networking these wireless and mobile devices, particularly at the link layer and the network layer, are so different from traditional wired computer networks that an individual chapter devoted to the study of wireless and mobile networks (i.e., this chapter) is appropriate. We'll begin this chapter with a discussion of mobile users, wireless links, and networks, and their relationship to the larger (typically wired) networks to which they connect. We'll draw a distinction between the challenges posed by the wireless nature of the communication links in such networks, and by the mobility that these wireless links enable. Making this important distinction---between wireless and mobility---will allow us to better isolate, identify, and master the key concepts in each area. Note that there are indeed many networked environments in which the network nodes are wireless but not mobile (e.g., wireless home or office networks with stationary workstations and large displays), and that there are limited forms of mobility that do not require wireless links (e.g., a worker who uses a wired laptop at home, shuts down the laptop, drives to work, and attaches the laptop to the company's wired network). Of course, many of the most exciting networked environments are those in which users are both wireless and mobile---for example, a scenario in which a mobile user (say, in the back seat of a car) maintains a Voice-over-IP call and multiple ongoing TCP connections while racing down the autobahn at 160 kilometers per hour, soon in an autonomous vehicle. It is here, at the intersection of wireless and mobility, that we'll find the most interesting technical challenges!

We'll begin by illustrating the setting in which we'll consider wireless communication and mobility---a network in which wireless (and possibly mobile) users are connected into the larger network infrastructure by a wireless link at the network's edge. We'll then consider the characteristics of this wireless link in Section 7.2. We include a brief introduction to code division multiple access (CDMA), a shared-medium access protocol that is often used in wireless networks, in Section 7.2. In Section 7.3, we'll examine the link-level aspects of the IEEE 802.11 (WiFi) wireless LAN standard in some depth; we'll also say a few words about Bluetooth and other wireless personal area networks. In Section 7.4, we'll provide an overview of cellular Internet access, including 3G and emerging 4G cellular technologies that provide both voice and high-speed Internet access. In Section 7.5, we'll turn our attention to mobility, focusing on the problems of locating a mobile user, routing to the mobile user, and "handing off" the mobile user who dynamically moves from one point of attachment to the network to another. We'll examine how these mobility services are implemented in the mobile IP standard, in enterprise 802.11 networks, and in LTE cellular networks in Sections 7.6 and 7.7.
Finally, we'll consider the impact of wireless links and mobility on transport-layer protocols and networked applications in Section 7.8.

7.1 Introduction

Figure 7.1 shows the setting in which we'll consider the topics of wireless data communication and mobility. We'll begin by keeping our discussion general enough to cover a wide range of networks, including both wireless LANs such as IEEE 802.11 and cellular networks such as a 4G network; we'll drill down into a more detailed discussion of specific wireless architectures in later sections. We can identify the following elements in a wireless network:

Wireless hosts. As in the case of wired networks, hosts are the end-system devices that run applications. A wireless host might be a laptop, tablet, smartphone, or desktop computer. The hosts themselves may or may not be mobile.

Figure 7.1 Elements of a wireless network

Wireless links. A host connects to a base station (defined below) or to another wireless host through a wireless communication link. Different wireless link technologies have different transmission rates and can transmit over different distances. Figure 7.2 shows two key characteristics (coverage area and link rate) of the more popular wireless network standards. (The figure is only meant to provide a rough idea of these characteristics. For example, some of these types of networks are only now being deployed, and some link rates can increase or decrease beyond the values shown depending on distance, channel conditions, and the number of users in the wireless network.) We'll cover these standards later in the first half of this chapter; we'll also consider other wireless link characteristics (such as their bit error rates and the causes of bit errors) in Section 7.2.

Figure 7.2 Link characteristics of selected wireless network standards

In Figure 7.1, wireless links connect wireless hosts located at the edge of the network into the larger network infrastructure. We hasten to add that wireless links are also sometimes used within a network to connect routers, switches, and other network equipment. However, our focus in this chapter will be on the use of wireless communication at the network edge, as it is here that many of the most exciting technical challenges, and most of the growth, are occurring.

Base station. The base station is a key part of the wireless network infrastructure. Unlike the wireless host and wireless link, a base station has no obvious counterpart in a wired network. A base station is responsible for sending and receiving data (e.g., packets) to and from a wireless host that is associated with that base station. A base station will often be responsible for coordinating the transmission of multiple wireless hosts with which it is associated. When we say a wireless host is "associated" with a base station, we mean that (1) the host is within the wireless communication distance of the base station, and (2) the host uses that base station to relay data between it (the host) and the larger network. Cell towers in cellular networks and access points in 802.11 wireless LANs are examples of base stations.

In Figure 7.1, the base station is connected to the larger network (e.g., the Internet, corporate or home network, or telephone network), thus functioning as a link-layer relay between the wireless host and the rest of the world with which the host communicates.
Hosts associated with a base station are often referred to as operating in infrastructure mode, since all traditional network services (e.g., address assignment and routing) are provided by the network to which a host is connected via the base station. In ad hoc networks, wireless hosts have no such infrastructure with which to connect. In the absence of such infrastructure, the hosts themselves must provide for services such as routing, address assignment, DNS-like name translation, and more.

CASE HISTORY

PUBLIC WIFI ACCESS: COMING SOON TO A LAMP POST NEAR YOU?

WiFi hotspots---public locations where users can find 802.11 wireless access---are becoming increasingly common in hotels, airports, and cafés around the world. Most college campuses offer ubiquitous wireless access, and it's hard to find a hotel that doesn't offer wireless Internet access. Over the past decade a number of cities have designed, deployed, and operated municipal WiFi networks. The vision of providing ubiquitous WiFi access to the community as a public service (much like streetlights)---helping to bridge the digital divide by providing Internet access to all citizens and to promote economic development---is compelling. Many cities around the world, including Philadelphia, Toronto, Hong Kong, Minneapolis, London, and Auckland, have plans to provide ubiquitous wireless within the city, or have already done so to varying degrees. The goal in Philadelphia was to "turn Philadelphia into the nation's largest WiFi hotspot and help to improve education, bridge the digital divide, enhance neighborhood development, and reduce the costs of government." The ambitious program---an agreement between the city, Wireless Philadelphia (a nonprofit entity), and the Internet Service Provider Earthlink---built an operational network of 802.11b hotspots on streetlamp pole arms and traffic control devices that covered 80 percent of the city. But financial and operational concerns caused the network to be sold to a group of private investors in 2008, who later sold the network back to the city in 2010. Other cities, such as Minneapolis, Toronto, Hong Kong, and Auckland, have had success with smaller-scale efforts. The fact that 802.11 networks operate in the unlicensed spectrum (and hence can be deployed without purchasing expensive spectrum use rights) would seem to make them financially attractive. However, 802.11 access points (see Section 7.3) have much shorter ranges than 4G cellular base stations (see Section 7.4), requiring a larger number of deployed endpoints to cover the same geographic region. Cellular data networks providing Internet access, on the other hand, operate in the licensed spectrum. Cellular providers pay billions of dollars for spectrum access rights for their networks, making cellular data networks a business rather than a municipal undertaking.

When a mobile host moves beyond the range of one base station and into the range of another, it will change its point of attachment into the larger network (i.e., change the base station with which it is associated)---a process referred to as handoff. Such mobility raises many challenging questions. If a host can move, how does one find the mobile host's current location in the network so that data can be forwarded to that mobile host? How is addressing performed, given that a host can be in one of many possible locations?
If the host moves during a TCP connection or phone call, how is data routed so that the connection continues uninterrupted? These and many (many!) other questions make wireless and mobile networking an area of exciting networking research.

Network infrastructure. This is the larger network with which a wireless host may wish to communicate.

Having discussed the "pieces" of a wireless network, we note that these pieces can be combined in many different ways to form different types of wireless networks. You may find a taxonomy of these types of wireless networks useful as you read on in this chapter, or read/learn more about wireless networks beyond this book. At the highest level we can classify wireless networks according to two criteria: (i) whether a packet in the wireless network crosses exactly one wireless hop or multiple wireless hops, and (ii) whether there is infrastructure such as a base station in the network:

Single-hop, infrastructure-based. These networks have a base station that is connected to a larger wired network (e.g., the Internet). Furthermore, all communication is between this base station and a wireless host over a single wireless hop. The 802.11 networks you use in the classroom, café, or library, and the 4G LTE data networks that we will learn about shortly, all fall in this category. The vast majority of our daily interactions are with single-hop, infrastructure-based wireless networks.

Single-hop, infrastructure-less. In these networks, there is no base station that is connected to a larger network. However, as we will see, one of the nodes in this single-hop network may coordinate the transmissions of the other nodes. Bluetooth networks (that connect small wireless devices such as keyboards, speakers, and headsets, and which we will study in Section 7.3.6) and 802.11 networks in ad hoc mode are single-hop, infrastructure-less networks.

Multi-hop, infrastructure-based. In these networks, a base station is present that is wired to the larger network. However, some wireless nodes may have to relay their communication through other wireless nodes in order to communicate via the base station. Some wireless sensor networks and so-called wireless mesh networks fall in this category.

Multi-hop, infrastructure-less. There is no base station in these networks, and nodes may have to relay messages among several other nodes in order to reach a destination. Nodes may also be mobile, with connectivity changing among nodes---a class of networks known as mobile ad hoc networks (MANETs). If the mobile nodes are vehicles, the network is a vehicular ad hoc network (VANET). As you might imagine, the development of protocols for such networks is challenging and is the subject of much ongoing research.

In this chapter, we'll mostly confine ourselves to single-hop networks, and then mostly to infrastructure-based networks. Let's now dig deeper into the technical challenges that arise in wireless and mobile networks. We'll begin by first considering the individual wireless link, deferring our discussion of mobility until later in this chapter.

7.2 Wireless Links and Network Characteristics

Let's begin by considering a simple wired network, say a home network, with a wired Ethernet switch (see Section 6.4) interconnecting the hosts.
If we replace the wired Ethernet with a wireless 802.11 network, a wireless network interface would replace the host's wired Ethernet interface, and an access point would replace the Ethernet switch, but virtually no changes would be needed at the network layer or above. This suggests that we focus our attention on the link layer when looking for important differences between wired and wireless networks. Indeed, we can find a number of important differences between a wired link and a wireless link:

Decreasing signal strength. Electromagnetic radiation attenuates as it passes through matter (e.g., a radio signal passing through a wall). Even in free space, the signal will disperse, resulting in decreased signal strength (sometimes referred to as path loss) as the distance between sender and receiver increases.

Interference from other sources. Radio sources transmitting in the same frequency band will interfere with each other. For example, 2.4 GHz wireless phones and 802.11b wireless LANs transmit in the same frequency band. Thus, the 802.11b wireless LAN user talking on a 2.4 GHz wireless phone can expect that neither the network nor the phone will perform particularly well. In addition to interference from transmitting sources, electromagnetic noise within the environment (e.g., a nearby motor, a microwave) can result in interference.

Multipath propagation. Multipath propagation occurs when portions of the electromagnetic wave reflect off objects and the ground, taking paths of different lengths between a sender and receiver. This results in the blurring of the received signal at the receiver. Moving objects between the sender and receiver can cause multipath propagation to change over time.

For a detailed discussion of wireless channel characteristics, models, and measurements, see \[Anderson 1995\].

The discussion above suggests that bit errors will be more common in wireless links than in wired links. For this reason, it is perhaps not surprising that wireless link protocols (such as the 802.11 protocol we'll examine in the following section) employ not only powerful CRC error detection codes, but also link-level reliable-data-transfer protocols that retransmit corrupted frames.

Having considered the impairments that can occur on a wireless channel, let's next turn our attention to the host receiving the wireless signal. This host receives an electromagnetic signal that is a combination of a degraded form of the original signal transmitted by the sender (degraded due to the attenuation and multipath propagation effects that we discussed above, among others) and background noise in the environment. The signal-to-noise ratio (SNR) is a relative measure of the strength of the received signal (i.e., the information being transmitted) and this noise. The SNR is typically measured in units of decibels (dB), a unit of measure that some think is used by electrical engineers primarily to confuse computer scientists. The SNR, measured in dB, is twenty times the base-10 logarithm of the ratio of the amplitude of the received signal to the amplitude of the noise; that is, $\mathrm{SNR}_{\mathrm{dB}} = 20 \log_{10}(A_{\mathrm{signal}}/A_{\mathrm{noise}})$. For our purposes here, we need only know that a larger SNR makes it easier for the receiver to extract the transmitted signal from the background noise.
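To make the decibel arithmetic concrete, here is a two-line sketch of the definition just given; the amplitude values in the example are made up purely for illustration.

```python
import math

def snr_db(signal_amplitude: float, noise_amplitude: float) -> float:
    """SNR in decibels: twenty times the base-10 log of the amplitude ratio."""
    return 20 * math.log10(signal_amplitude / noise_amplitude)

# Each factor of 10 in relative amplitude adds 20 dB:
print(snr_db(10.0, 1.0))   # 20.0
print(snr_db(100.0, 1.0))  # 40.0
```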
Figure 7.3 (adapted from \[Holland 2001\]) shows the bit error rate (BER)---roughly speaking, the probability that a transmitted bit is received in error at the receiver---versus the SNR for three different modulation techniques for encoding information for transmission on an idealized wireless channel. The theory of modulation and coding, as well as signal extraction and BER, is well beyond the scope of this text (see \[Schwartz 1980\] for a discussion of these topics).

Figure 7.3 Bit error rate, transmission rate, and SNR

Figure 7.4 Hidden terminal problem caused by obstacle (a) and fading (b)

Nonetheless, Figure 7.3 illustrates several physical-layer characteristics that are important in understanding higher-layer wireless communication protocols:

For a given modulation scheme, the higher the SNR, the lower the BER. Since a sender can increase the SNR by increasing its transmission power, a sender can decrease the probability that a frame is received in error by increasing its transmission power. Note, however, that there is arguably little practical gain in increasing the power beyond a certain threshold, say to decrease the BER from $10^{-12}$ to $10^{-13}$. There are also disadvantages associated with increasing the transmission power: More energy must be expended by the sender (an important concern for battery-powered mobile users), and the sender's transmissions are more likely to interfere with the transmissions of another sender (see Figure 7.4(b)).

For a given SNR, a modulation technique with a higher bit transmission rate (whether in error or not) will have a higher BER. For example, in Figure 7.3, with an SNR of 10 dB, BPSK modulation with a transmission rate of 1 Mbps has a BER of less than $10^{-7}$, while with QAM16 modulation with a transmission rate of 4 Mbps, the BER is $10^{-1}$, far too high to be practically useful. However, with an SNR of 20 dB, QAM16 modulation has a transmission rate of 4 Mbps and a BER of $10^{-7}$, while BPSK modulation has a transmission rate of only 1 Mbps and a BER that is so low as to be (literally) "off the charts." If one can tolerate a BER of $10^{-7}$, the higher transmission rate offered by QAM16 would make it the preferred modulation technique in this situation. These considerations give rise to the final characteristic, described next.

Dynamic selection of the physical-layer modulation technique can be used to adapt the modulation technique to channel conditions. The SNR (and hence the BER) may change as a result of mobility or due to changes in the environment. Adaptive modulation and coding are used in cellular data systems and in the 802.11 WiFi and 4G cellular data networks that we'll study in Sections 7.3 and 7.4. This allows, for example, the selection of a modulation technique that provides the highest transmission rate possible subject to a constraint on the BER, for given channel characteristics.
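This selection rule can be sketched in a few lines of code. The operating points below are simply the two values quoted from Figure 7.3 in the paragraph above (with a stand-in BER for BPSK at 20 dB, which the figure shows only as "off the charts"); everything else is illustrative.

```python
# Pick the highest-rate modulation scheme whose BER at the current SNR
# meets the target. The (SNR, BER) points are read from the discussion of
# Figure 7.3, not from a real channel model.
BER_TABLE = {
    # scheme: (rate in Mbps, {snr_db: ber})
    "BPSK":  (1, {10: 1e-7, 20: 1e-12}),   # 1e-12 is a stand-in "off the charts" value
    "QAM16": (4, {10: 1e-1, 20: 1e-7}),
}

def pick_modulation(snr_db: int, ber_target: float) -> str:
    candidates = [
        (rate, name)
        for name, (rate, ber_at) in BER_TABLE.items()
        if ber_at.get(snr_db, 1.0) <= ber_target
    ]
    if not candidates:
        return "no scheme meets the BER target"
    return max(candidates)[1]          # highest rate wins

print(pick_modulation(10, 1e-7))  # BPSK: QAM16's BER at 10 dB is far too high
print(pick_modulation(20, 1e-7))  # QAM16: same BER target at four times the rate
```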
A higher and time-varying bit error rate is not the only difference between a wired and wireless link. Recall that in the case of wired broadcast links, all nodes receive the transmissions from all other nodes. In the case of wireless links, the situation is not as simple, as shown in Figure 7.4. Suppose that Station A is transmitting to Station B. Suppose also that Station C is transmitting to Station B. With the so-called hidden terminal problem, physical obstructions in the environment (for example, a mountain or a building) may prevent A and C from hearing each other's transmissions, even though A's and C's transmissions are indeed interfering at the destination, B. This is shown in Figure 7.4(a). A second scenario that results in undetectable collisions at the receiver results from the fading of a signal's strength as it propagates through the wireless medium. Figure 7.4(b) illustrates the case where A and C are placed such that their signals are not strong enough to detect each other's transmissions, yet their signals are strong enough to interfere with each other at station B. As we'll see in Section 7.3, the hidden terminal problem and fading make multiple access in a wireless network considerably more complex than in a wired network.

7.2.1 CDMA

Recall from Chapter 6 that when hosts communicate over a shared medium, a protocol is needed so that the signals sent by multiple senders do not interfere at the receivers. In Chapter 6 we described three classes of medium access protocols: channel partitioning, random access, and taking turns. Code division multiple access (CDMA) belongs to the family of channel partitioning protocols. It is prevalent in wireless LAN and cellular technologies. Because CDMA is so important in the wireless world, we'll take a quick look at CDMA now, before getting into specific wireless access technologies in the subsequent sections.

In a CDMA protocol, each bit being sent is encoded by multiplying the bit by a signal (the code) that changes at a much faster rate (known as the chipping rate) than the original sequence of data bits. Figure 7.5 shows a simple, idealized CDMA encoding/decoding scenario. Suppose that the rate at which original data bits reach the CDMA encoder defines the unit of time; that is, each original data bit to be transmitted requires a one-bit slot time. Let $d_i$ be the value of the data bit for the $i$th bit slot. For mathematical convenience, we represent a data bit with a 0 value as −1. Each bit slot is further subdivided into $M$ mini-slots; in Figure 7.5, $M = 8$, although in practice $M$ is much larger.

Figure 7.5 A simple CDMA example: Sender encoding, receiver decoding

The CDMA code used by the sender consists of a sequence of $M$ values, $c_m$, $m = 1, \ldots, M$, each taking a +1 or −1 value. In the example in Figure 7.5, the $M$-bit CDMA code being used by the sender is (1, 1, 1, −1, 1, −1, −1, −1). To illustrate how CDMA works, let us focus on the $i$th data bit, $d_i$. For the $m$th mini-slot of the bit-transmission time of $d_i$, the output of the CDMA encoder, $Z_{i,m}$, is the value of $d_i$ multiplied by the $m$th bit in the assigned CDMA code, $c_m$:

$$Z_{i,m} = d_i \cdot c_m \qquad (7.1)$$

In a simple world, with no interfering senders, the receiver would receive the encoded bits, $Z_{i,m}$, and recover the original data bit, $d_i$, by computing:

$$d_i = \frac{1}{M} \sum_{m=1}^{M} Z_{i,m} \cdot c_m \qquad (7.2)$$

The reader might want to work through the details of the example in Figure 7.5 to see that the original data bits are indeed correctly recovered at the receiver using Equation 7.2.
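Equations 7.1 and 7.2 are easy to check in a few lines of code; the sketch below does so, and also previews the two-sender case (Equation 7.3 and Figure 7.6) discussed next. The two 8-chip codes are the ones used by the senders in Figures 7.5 and 7.6; the data bits are arbitrary examples.

```python
# A compact sketch of CDMA encoding/decoding. Data bits use the +1/-1
# convention from the text.
CODE1 = [1, 1, 1, -1, 1, -1, -1, -1]   # code of the sender in Figure 7.5
CODE2 = [1, -1, 1, 1, 1, -1, 1, 1]     # code of the second sender in Figure 7.6

def encode(bits, code):
    """Equation (7.1): each data bit is multiplied by every chip of the code."""
    return [d * c for d in bits for c in code]

def decode(channel, code):
    """Equations (7.2)/(7.3): correlate each bit slot with the code and average."""
    M = len(code)
    slots = [channel[i:i + M] for i in range(0, len(channel), M)]
    return [round(sum(z * c for z, c in zip(slot, code)) / M) for slot in slots]

bits1, bits2 = [1, -1], [-1, -1]       # two senders, two bit slots each
# The channel simply adds the two transmitted signals (the additivity
# assumption discussed next):
channel = [a + b for a, b in zip(encode(bits1, CODE1), encode(bits2, CODE2))]

print(decode(channel, CODE1))  # [1, -1]  -> sender 1's bits recovered
print(decode(channel, CODE2))  # [-1, -1] -> sender 2's bits recovered
```

The recovery works exactly here because the two codes are orthogonal (their chip-by-chip products sum to zero), which is the "carefully chosen codes" condition mentioned below.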
The world is far from ideal, however, and as noted above, CDMA must work in the presence of interfering senders that are encoding and transmitting their data using a different assigned code. But how can a CDMA receiver recover a sender's original data bits when those data bits are being tangled with bits being transmitted by other senders? CDMA works under the assumption that the interfering transmitted bit signals are additive. This means, for example, that if three senders send a 1 value, and a fourth sender sends a −1 value during the same mini-slot, then the received signal at all receivers during that mini-slot is a 2 (since 1 + 1 + 1 − 1 = 2). In the presence of multiple senders, sender $s$ computes its encoded transmissions, $Z_{i,m}^s$, in exactly the same manner as in Equation 7.1. The value received at a receiver during the $m$th mini-slot of the $i$th bit slot, however, is now the sum of the transmitted bits from all $N$ senders during that mini-slot:

$$Z_{i,m}^* = \sum_{s=1}^{N} Z_{i,m}^s$$

Amazingly, if the senders' codes are chosen carefully, each receiver can recover the data sent by a given sender out of the aggregate signal simply by using the sender's code in exactly the same manner as in Equation 7.2:

$$d_i = \frac{1}{M} \sum_{m=1}^{M} Z_{i,m}^* \cdot c_m \qquad (7.3)$$

as shown in Figure 7.6, for a two-sender CDMA example. The $M$-bit CDMA code being used by the upper sender is (1, 1, 1, −1, 1, −1, −1, −1), while the CDMA code being used by the lower sender is (1, −1, 1, 1, 1, −1, 1, 1). Figure 7.6 illustrates a receiver recovering the original data bits from the upper sender. Note that the receiver is able to extract the data from sender 1 in spite of the interfering transmission from sender 2.

Figure 7.6 A two-sender CDMA example

Recall our cocktail party analogy from Chapter 6. A CDMA protocol is similar to having partygoers speaking in multiple languages; in such circumstances humans are actually quite good at locking into the conversation in the language they understand, while filtering out the remaining conversations. We see here that CDMA is a partitioning protocol in that it partitions the codespace (as opposed to time or frequency) and assigns each node a dedicated piece of the codespace. Our discussion here of CDMA is necessarily brief; in practice a number of difficult issues must be addressed. First, in order for the CDMA receivers to be able to extract a particular sender's signal, the CDMA codes must be carefully chosen. Second, our discussion has assumed that the received signal strengths from various senders are the same; in reality this can be difficult to achieve. There is a considerable body of literature addressing these and other issues related to CDMA; see \[Pickholtz 1982; Viterbi 1995\] for details.

7.3 WiFi: 802.11 Wireless LANs

Pervasive in the workplace, the home, educational institutions, cafés, airports, and street corners, wireless LANs are now one of the most important access network technologies in the Internet today. Although many technologies and standards for wireless LANs were developed in the 1990s, one particular class of standards has clearly emerged as the winner: the IEEE 802.11 wireless LAN, also known as WiFi. In this section, we'll take a close look at 802.11 wireless LANs, examining its frame structure, its medium access protocol, and the internetworking of 802.11 LANs with wired Ethernet LANs. There are several 802.11 standards for wireless LAN technology in the IEEE 802.11 ("WiFi") family, as summarized in Table 7.1. The different 802.11 standards all share some common characteristics. They all use the same medium access protocol, CSMA/CA, which we'll discuss shortly. They all use the same frame structure for their link-layer frames as well. And they all have the ability to reduce their transmission rate in order to reach out over greater distances.
And, importantly, 802.11 products are also all backwards compatible, meaning, for example, that a mobile device capable only of 802.11g may still interact with a newer 802.11ac base station.

However, as shown in Table 7.1, the standards have some major differences at the physical layer. 802.11 devices operate in two different frequency ranges: 2.4--2.485 GHz (referred to as the 2.4 GHz range) and 5.1--5.8 GHz (referred to as the 5 GHz range). The 2.4 GHz range is an unlicensed frequency band, where 802.11 devices may compete for frequency spectrum with 2.4 GHz phones and microwave ovens. At 5 GHz, 802.11 LANs have a shorter transmission distance for a given power level and suffer more from multipath propagation. The two most recent standards, 802.11n \[IEEE 802.11n 2012\] and 802.11ac \[IEEE 802.11ac 2013; Cisco 802.11ac 2015\], use multiple-input multiple-output (MIMO) antennas; i.e., two or more antennas on the sending side and two or more antennas on the receiving side that are transmitting/receiving different signals \[Diggavi 2004\].

Table 7.1 Summary of IEEE 802.11 standards

| Standard | Frequency Range | Data Rate |
| --- | --- | --- |
| 802.11b | 2.4 GHz | up to 11 Mbps |
| 802.11a | 5 GHz | up to 54 Mbps |
| 802.11g | 2.4 GHz | up to 54 Mbps |
| 802.11n | 2.4 GHz and 5 GHz | up to 450 Mbps |
| 802.11ac | 5 GHz | up to 1300 Mbps |

802.11ac base stations may transmit to multiple stations simultaneously, and use "smart" antennas to adaptively beamform to target transmissions in the direction of a receiver. This decreases interference and increases the distance reached at a given data rate. The data rates shown in Table 7.1 are for an idealized environment, e.g., a receiver placed 1 meter away from the base station, with no interference---a scenario that we're unlikely to experience in practice! So as the saying goes, YMMV: Your Mileage (or in this case your wireless data rate) May Vary.

7.3.1 The 802.11 Architecture

Figure 7.7 illustrates the principal components of the 802.11 wireless LAN architecture. The fundamental building block of the 802.11 architecture is the basic service set (BSS). A BSS contains one or more wireless stations and a central base station, known as an access point (AP) in 802.11 parlance. Figure 7.7 shows the AP in each of two BSSs connecting to an interconnection device (such as a switch or router), which in turn leads to the Internet. In a typical home network, there is one AP and one router (typically integrated together as one unit) that connects the BSS to the Internet. As with Ethernet devices, each 802.11 wireless station has a 6-byte MAC address that is stored in the firmware of the station's adapter (that is, 802.11 network interface card). Each AP also has a MAC address for its wireless interface. As with Ethernet, these MAC addresses are administered by IEEE and are (in theory) globally unique.

Figure 7.7 IEEE 802.11 LAN architecture

Figure 7.8 An IEEE 802.11 ad hoc network

As noted in Section 7.1, wireless LANs that deploy APs are often referred to as infrastructure wireless LANs, with the "infrastructure" being the APs along with the wired Ethernet infrastructure that interconnects the APs and a router. Figure 7.8 shows that IEEE 802.11 stations can also group themselves together to form an ad hoc network---a network with no central control and with no connections to the "outside world."
Here, the network is formed "on the fly," by mobile devices that have found themselves in proximity to each other, that have a need to communicate, and that find no preexisting network infrastructure in their location. An ad hoc network might be formed when people with laptops get together (for example, in a conference room, a train, or a car) and want to exchange data in the absence of a centralized AP. There has been tremendous interest in ad hoc networking, as communicating portable devices continue to proliferate. In this section, though, we'll focus our attention on infrastructure wireless LANs.

Channels and Association

In 802.11, each wireless station needs to associate with an AP before it can send or receive network-layer data. Although all of the 802.11 standards use association, we'll discuss this topic specifically in the context of IEEE 802.11b/g. When a network administrator installs an AP, the administrator assigns a one- or two-word Service Set Identifier (SSID) to the access point. (When you choose Wi-Fi under Settings on your iPhone, for example, a list is displayed showing the SSID of each AP in range.) The administrator must also assign a channel number to the AP. To understand channel numbers, recall that 802.11 operates in the frequency range of 2.4 GHz to 2.4835 GHz. Within this 83.5 MHz band, 802.11 defines 11 partially overlapping channels. Any two channels are non-overlapping if and only if they are separated by four or more channels. In particular, the set of channels 1, 6, and 11 is the only set of three non-overlapping channels. This means that an administrator could create a wireless LAN with an aggregate maximum transmission rate of 33 Mbps by installing three 802.11b APs at the same physical location, assigning channels 1, 6, and 11 to the APs, and interconnecting each of the APs with a switch.
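As a small sanity check, the channel arithmetic can be written out in code. The helper function below is ours; it encodes the separated-by-four-or-more rule stated above (channel numbers must differ by at least five for no channels to be shared between them):

```python
# Two 802.11b channels do not overlap only when four or more channels lie
# between them, i.e., when their channel numbers differ by five or more.
def non_overlapping(ch1: int, ch2: int) -> bool:
    return abs(ch1 - ch2) >= 5

triple = (1, 6, 11)
print(all(non_overlapping(a, b) for a in triple for b in triple if a != b))  # True

# Three 802.11b APs on channels 1, 6, and 11 can therefore operate side by
# side, for an aggregate maximum rate of:
print(3 * 11, "Mbps")  # 33 Mbps
```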
Your wireless device, knowing that APs are +sending out beacon frames, scans the 11 channels, seeking beacon frames +from any APs that may be out there (some of which may be transmitting on +the same channel---it's a jungle out there!). Having learned about +available APs from the beacon frames, you (or your wireless device) +select one of the APs for association. The 802.11 standard does not +specify an algorithm for selecting which of the available APs to +associate with; that algorithm is left up to the designers of the 802.11 +firmware and software in your wireless device. Typically, the device +chooses the AP whose beacon frame is received with the highest signal +strength. While a high signal strength is good (see, e.g., Figure 7.3), +signal strength is not the only AP characteristic that will determine +the performance a device receives. In particular, it's possible that the +selected AP may have a strong signal, but may be overloaded with other +affiliated devices (that will need to share the wireless bandwidth at +that AP), while an unloaded AP is not selected due to a slightly weaker +signal. A number of alternative ways of choosing APs have thus recently +been proposed \[Vasudevan 2005; Nicholson 2006; Sundaresan 2006\]. For +an interesting and down-to-earth discussion of how signal strength is +measured, see \[Bardwell 2004\]. + +Figure 7.9 Active and passive scanning for access points + +The process of scanning channels and listening for beacon frames is +known as passive scanning (see Figure 7.9a). A wireless device can also +perform active scanning, by broadcasting a probe frame that will be +received by all APs within the wireless device's range, as shown in +Figure 7.9b. APs respond to the probe request frame with a probe +response frame. The wireless device can then choose the AP with which to +associate from among the responding APs. + +After selecting the AP with which to associate, the wireless device +sends an association request frame to the AP, and the AP responds with +an association response frame. Note that this second request/response +handshake is needed with active scanning, since an AP responding to the +initial probe request frame doesn't know which of the (possibly many) +responding APs the device will choose to associate with, in much the +same way that a DHCP client can choose from among multiple DHCP servers +(see Figure 4.21). Once associated with an AP, the device will want to +join the subnet (in the IP addressing sense of Section 4.3.3) to which +the AP belongs. Thus, the device will typically send a DHCP discovery +message (see Figure 4.21) into the subnet via the AP in order to obtain +an IP address on the subnet. Once the address is obtained, the rest of +the world then views that device simply as another host with an IP +address in that subnet. In order to create an association with a +particular AP, the wireless device may be required to authenticate +itself to the AP. 802.11 wireless LANs provide a number of alternatives +for authentication and access. One approach, used by many companies, is +to permit access to a wireless network based on a device's MAC address. +A second approach, used by many Internet cafés, employs usernames and +passwords. In both cases, the AP typically communicates with an +authentication server, relaying information between the wireless device +and the authentication server using a protocol such as RADIUS \[RFC +2865\] or DIAMETER \[RFC 3588\]. 
Separating the authentication server from the AP allows one authentication server to serve many APs, centralizing the (often sensitive) decisions of authentication and access within the single server, and keeping AP costs and complexity low. We'll see in Chapter 8 that the new IEEE 802.11i protocol defining security aspects of the 802.11 protocol family takes precisely this approach.

7.3.2 The 802.11 MAC Protocol

Once a wireless device is associated with an AP, it can start sending and receiving data frames to and from the access point. But because multiple wireless devices, or the AP itself, may want to transmit data frames at the same time over the same channel, a multiple access protocol is needed to coordinate the transmissions. In the following, we'll refer to the devices or the AP as wireless "stations" that share the multiple access channel. As discussed in Chapter 6 and Section 7.2.1, broadly speaking there are three classes of multiple access protocols: channel partitioning (including CDMA), random access, and taking turns. Inspired by the huge success of Ethernet and its random access protocol, the designers of 802.11 chose a random access protocol for 802.11 wireless LANs. This random access protocol is referred to as CSMA with collision avoidance, or more succinctly as CSMA/CA. As with Ethernet's CSMA/CD, the "CSMA" in CSMA/CA stands for "carrier sense multiple access," meaning that each station senses the channel before transmitting, and refrains from transmitting when the channel is sensed busy. Although both Ethernet and 802.11 use carrier-sensing random access, the two MAC protocols have important differences. First, instead of using collision detection, 802.11 uses collision-avoidance techniques. Second, because of the relatively high bit error rates of wireless channels, 802.11 (unlike Ethernet) uses a link-layer acknowledgment/retransmission (ARQ) scheme. We'll describe 802.11's collision-avoidance and link-layer acknowledgment schemes below.

Recall from Sections 6.3.2 and 6.4.2 that with Ethernet's collision-detection algorithm, an Ethernet station listens to the channel as it transmits. If, while transmitting, it detects that another station is also transmitting, it aborts its transmission and tries to transmit again after waiting a small, random amount of time. Unlike the 802.3 Ethernet protocol, the 802.11 MAC protocol does not implement collision detection. There are two important reasons for this:

The ability to detect collisions requires the ability to send (the station's own signal) and receive (to determine whether another station is also transmitting) at the same time. Because the strength of the received signal is typically very small compared to the strength of the transmitted signal at the 802.11 adapter, it is costly to build hardware that can detect a collision.

More importantly, even if the adapter could transmit and listen at the same time (and presumably abort transmission when it senses a busy channel), the adapter would still not be able to detect all collisions, due to the hidden terminal problem and fading, as discussed in Section 7.2.

Because 802.11 wireless LANs do not use collision detection, once a station begins to transmit a frame, it transmits the frame in its entirety; that is, once a station gets started, there is no turning back.
As one might expect, transmitting entire frames (particularly long frames) when collisions are prevalent can significantly degrade a multiple access protocol's performance. In order to reduce the likelihood of collisions, 802.11 employs several collision-avoidance techniques, which we'll shortly discuss.

Before considering collision avoidance, however, we'll first need to examine 802.11's link-layer acknowledgment scheme. Recall from Section 7.2 that when a station in a wireless LAN sends a frame, the frame may not reach the destination station intact for a variety of reasons. To deal with this non-negligible chance of failure, the 802.11 MAC protocol uses link-layer acknowledgments. As shown in Figure 7.10, when the destination station receives a frame that passes the CRC, it waits a short period of time known as the Short Inter-frame Spacing (SIFS) and then sends back an acknowledgment frame. If the transmitting station does not receive an acknowledgment within a given amount of time, it assumes that an error has occurred and retransmits the frame, using the CSMA/CA protocol to access the channel. If an acknowledgment is not received after some fixed number of retransmissions, the transmitting station gives up and discards the frame.

Figure 7.10 802.11 uses link-layer acknowledgments

Having discussed how 802.11 uses link-layer acknowledgments, we're now in a position to describe the 802.11 CSMA/CA protocol. Suppose that a station (wireless device or an AP) has a frame to transmit.

1. If initially the station senses the channel idle, it transmits its frame after a short period of time known as the Distributed Inter-frame Space (DIFS); see Figure 7.10.

2. Otherwise, the station chooses a random backoff value using binary exponential backoff (as we encountered in Section 6.3.2) and counts down this value after DIFS when the channel is sensed idle. While the channel is sensed busy, the counter value remains frozen.

3. When the counter reaches zero (note that this can only occur while the channel is sensed idle), the station transmits the entire frame and then waits for an acknowledgment.

4. If an acknowledgment is received, the transmitting station knows that its frame has been correctly received at the destination station. If the station has another frame to send, it begins the CSMA/CA protocol at step 2. If the acknowledgment isn't received, the transmitting station reenters the backoff phase in step 2, with the random value chosen from a larger interval.
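The four steps above can be condensed into a sketch of the sender loop. It is purely illustrative: the contention-window values are stand-ins rather than real 802.11 parameters, and `channel` is a hypothetical object providing the carrier-sensing and acknowledgment primitives named below.

```python
import random

class CsmaCaSender:
    """Illustrative CSMA/CA sender loop following steps 1-4 above."""

    def send(self, frame, channel, max_retries=7):
        cw = 16                                    # contention window (stand-in value)
        # Step 1: if the channel is initially idle, transmit after a DIFS;
        # otherwise, back off first (step 2).
        if not channel.sensed_idle_for_difs():
            self.backoff(channel, cw)
        for _ in range(max_retries):
            channel.transmit(frame)                # step 3: whole frame, no turning back
            if channel.ack_received():             # step 4: success
                return True
            cw *= 2                                # no ACK: retry from a larger interval
            self.backoff(channel, cw)
        return False                               # give up and discard the frame

    def backoff(self, channel, cw):
        counter = random.randrange(cw)
        while counter > 0:
            if channel.sensed_idle():              # counter decreases only while idle;
                counter -= 1                       # it stays frozen while busy
```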
Because 802.11 does not detect a collision and abort +transmission, a frame suffering a collision will be transmitted in its +entirety. The goal in 802.11 is thus to avoid collisions whenever +possible. In 802.11, if the two stations sense the channel busy, they +both immediately enter random backoff, hopefully choosing different +backoff values. If these values are indeed different, once the channel +becomes idle, one of the two stations will begin transmitting before the +other, and (if the two stations are not hidden from each other) the +"losing station" will hear the "winning station's" signal, freeze its +counter, and refrain from transmitting until the winning station has +completed its transmission. In this manner, a costly collision is +avoided. Of course, collisions can still occur with 802.11 in this +scenario: The two stations could be hidden from each other, or the two +stations could choose random backoff values that are close enough that +the transmission from the station starting first have yet to reach the +second station. Recall that we encountered this problem earlier in our +discussion of random access algorithms in the context of Figure 6.12. +Dealing with Hidden Terminals: RTS and CTS The 802.11 MAC protocol also +includes a nifty (but optional) reservation scheme that helps avoid +collisions even in the presence of hidden terminals. Let's investigate +this scheme in the context of Figure 7.11, which shows two wireless +stations and one access point. Both of the wireless stations are within +range of the AP (whose coverage is shown as a shaded circle) and both +have associated with the AP. However, due to fading, the signal ranges +of wireless stations are limited to the interiors of the shaded circles +shown in Figure 7.11. Thus, each of the wireless stations is hidden from +the other, although neither is hidden from the AP. Let's now consider +why hidden terminals can be problematic. Suppose Station H1 is +transmitting a frame and halfway through H1's transmission, Station H2 +wants to send a frame to the AP. H2, not hearing the transmission from +H1, will first wait a DIFS interval and then transmit the frame, +resulting in + +a collision. The channel will therefore be wasted during the entire +period of H1's transmission as well as during H2's transmission. In +order to avoid this problem, the IEEE 802.11 protocol allows a station +to use a short Request to Send (RTS) control frame and a short Clear to +Send (CTS) control frame to reserve access to the channel. When a sender +wants to send a DATA + +Figure 7.11 Hidden terminal example: H1 is hidden from H2, and vice +versa + +frame, it can first send an RTS frame to the AP, indicating the total +time required to transmit the DATA frame and the acknowledgment (ACK) +frame. When the AP receives the RTS frame, it responds by broadcasting a +CTS frame. This CTS frame serves two purposes: It gives the sender +explicit permission to send and also instructs the other stations not to +send for the reserved duration. Thus, in Figure 7.12, before +transmitting a DATA frame, H1 first broadcasts an RTS frame, which is +heard by all stations in its circle, including the AP. The AP then +responds + +Figure 7.12 Collision avoidance using the RTS and CTS frames + +with a CTS frame, which is heard by all stations within its range, +including H1 and H2. Station H2, having heard the CTS, refrains from +transmitting for the time specified in the CTS frame. The RTS, CTS, +DATA, and ACK frames are shown in Figure 7.12. 
The use of the RTS and CTS frames can improve performance in two important ways: The hidden station problem is mitigated, since a long DATA frame is transmitted only after the channel has been reserved. And because the RTS and CTS frames are short, a collision involving an RTS or CTS frame will last only for the duration of the short RTS or CTS frame. Once the RTS and CTS frames are correctly transmitted, the following DATA and ACK frames should be transmitted without collisions.

You are encouraged to check out the 802.11 applet at the textbook's Web site. This interactive applet illustrates the CSMA/CA protocol, including the RTS/CTS exchange sequence. Although the RTS/CTS exchange can help reduce collisions, it also introduces delay and consumes channel resources. For this reason, the RTS/CTS exchange is only used (if at all) to reserve the channel for the transmission of a long DATA frame. In practice, each wireless station can set an RTS threshold such that the RTS/CTS sequence is used only when the frame is longer than the threshold. For many wireless stations, the default RTS threshold value is larger than the maximum frame length, so the RTS/CTS sequence is skipped for all DATA frames sent.

Using 802.11 as a Point-to-Point Link

Our discussion so far has focused on the use of 802.11 in a multiple access setting. We should mention that if two nodes each have a directional antenna, they can point their directional antennas at each other and run the 802.11 protocol over what is essentially a point-to-point link. Given the low cost of commodity 802.11 hardware, the use of directional antennas and an increased transmission power allow 802.11 to be used as an inexpensive means of providing wireless point-to-point connections over distances of tens of kilometers. \[Raman 2007\] describes one of the first such multi-hop wireless networks, operating in the rural Ganges plains in India using point-to-point 802.11 links.

7.3.3 The IEEE 802.11 Frame

Although the 802.11 frame shares many similarities with an Ethernet frame, it also contains a number of fields that are specific to its use for wireless links. The 802.11 frame is shown in Figure 7.13. The numbers above each of the fields in the frame represent the lengths of the fields in bytes; the numbers above each of the subfields in the frame control field represent the lengths of the subfields in bits. Let's now examine the fields in the frame as well as some of the more important subfields in the frame's control field.

Figure 7.13 The 802.11 frame
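Before examining the fields one at a time, it may help to write out the layout of Figure 7.13 in code form. The byte counts below are the field lengths from the figure (the bit-level subfields of the frame control field are omitted); the field-name strings are our own shorthand.

```python
# The data-frame layout of Figure 7.13, with field widths in bytes.
FRAME_FIELDS = [
    ("frame control", 2),
    ("duration",      2),
    ("address 1",     6),     # receiving station's MAC address (discussed below)
    ("address 2",     6),     # transmitting station's MAC address
    ("address 3",     6),     # router-interface MAC address (discussed below)
    ("seq control",   2),
    ("address 4",     6),     # used only when APs forward frames in ad hoc mode
    ("payload",       2312),  # maximum size; typically fewer than 1,500 bytes
    ("CRC",           4),     # 32-bit cyclic redundancy check
]

overhead = sum(size for name, size in FRAME_FIELDS if name != "payload")
print(f"header + trailer overhead: {overhead} bytes")  # 34 bytes
```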
Payload and CRC Fields

At the heart of the frame is the payload, which typically consists of an IP datagram or an ARP packet. Although the field is permitted to be as long as 2,312 bytes, it is typically fewer than 1,500 bytes, holding an IP datagram or an ARP packet. As with an Ethernet frame, an 802.11 frame includes a 32-bit cyclic redundancy check (CRC) so that the receiver can detect bit errors in the received frame. As we've seen, bit errors are much more common in wireless LANs than in wired LANs, so the CRC is even more useful here.

Address Fields

Perhaps the most striking difference in the 802.11 frame is that it has four address fields, each of which can hold a 6-byte MAC address. But why four address fields? Doesn't a source MAC field and a destination MAC field suffice, as they do for Ethernet? It turns out that three address fields are needed for internetworking purposes---specifically, for moving the network-layer datagram from a wireless station through an AP to a router interface. The fourth address field is used when APs forward frames to each other in ad hoc mode. Since we are only considering infrastructure networks here, let's focus our attention on the first three address fields. The 802.11 standard defines these fields as follows:

Address 2 is the MAC address of the station that transmits the frame. Thus, if a wireless station transmits the frame, that station's MAC address is inserted in the address 2 field. Similarly, if an AP transmits the frame, the AP's MAC address is inserted in the address 2 field.

Address 1 is the MAC address of the wireless station that is to receive the frame. Thus, if a mobile wireless station transmits the frame, address 1 contains the MAC address of the destination AP. Similarly, if an AP transmits the frame, address 1 contains the MAC address of the destination wireless station.

Figure 7.14 The use of address fields in 802.11 frames: Sending frames between H1 and R1

To understand address 3, recall that the BSS (consisting of the AP and wireless stations) is part of a subnet, and that this subnet connects to other subnets via some router interface. Address 3 contains the MAC address of this router interface.

To gain further insight into the purpose of address 3, let's walk through an internetworking example in the context of Figure 7.14. In this figure, there are two APs, each of which is responsible for a number of wireless stations. Each of the APs has a direct connection to a router, which in turn connects to the global Internet. We should keep in mind that an AP is a link-layer device, and thus neither "speaks" IP nor understands IP addresses. Consider now moving a datagram from the router interface R1 to the wireless Station H1. The router is not aware that there is an AP between it and H1; from the router's perspective, H1 is just a host in one of the subnets to which it (the router) is connected. The router, which knows the IP address of H1 (from the destination address of the datagram), uses ARP to determine the MAC address of H1, just as in an ordinary Ethernet LAN. After obtaining H1's MAC address, router interface R1 encapsulates the datagram within an Ethernet frame. The source address field of this frame contains R1's MAC address, and the destination address field contains H1's MAC address. When the Ethernet frame arrives at the AP, the AP converts the 802.3 Ethernet frame to an 802.11 frame before transmitting the frame into the wireless channel. The AP fills in address 1 and address 2 with H1's MAC address and its own MAC address, respectively, as described above. For address 3, the AP inserts the MAC address of R1. In this manner, H1 can determine (from address 3) the MAC address of the router interface that sent the datagram into the subnet.

Now consider what happens when the wireless station H1 responds by moving a datagram from H1 to R1. H1 creates an 802.11 frame, filling the fields for address 1 and address 2 with the AP's MAC address and H1's MAC address, respectively, as described above. For address 3, H1 inserts R1's MAC address. When the AP receives the 802.11 frame, it converts the frame to an Ethernet frame. The source address field for this frame is H1's MAC address, and the destination address field is R1's MAC address.
Thus, address 3 allows the AP to determine the appropriate destination MAC address when constructing the Ethernet frame. In summary, address 3 plays a crucial role for internetworking the BSS with a wired LAN.

Sequence Number, Duration, and Frame Control Fields Recall that in 802.11, whenever a station correctly receives a frame from another station, it sends back an acknowledgment. Because acknowledgments can get lost, the sending station may send multiple copies of a given frame. As we saw in our discussion of the rdt2.1 protocol (Section 3.4.1), the use of sequence numbers allows the receiver to distinguish between a newly transmitted frame and the retransmission of a previous frame. The sequence number field in the 802.11 frame thus serves exactly the same purpose here at the link layer as it did in the transport layer in Chapter 3. Recall that the 802.11 protocol allows a transmitting station to reserve the channel for a period of time that includes the time to transmit its data frame and the time to transmit an acknowledgment. This duration value is included in the frame's duration field (both for data frames and for the RTS and CTS frames). As shown in Figure 7.13, the frame control field includes many subfields. We'll say just a few words about some of the more important subfields; for a more complete discussion, you are encouraged to consult the 802.11 specification [Held 2001; Crow 1997; IEEE 802.11 1999]. The type and subtype fields are used to distinguish the association, RTS, CTS, ACK, and data frames. The to and from fields are used to define the meanings of the different address fields. (These meanings change depending on whether ad hoc or infrastructure modes are used and, in the case of infrastructure mode, whether a wireless station or an AP is sending the frame.) Finally, the WEP field indicates whether encryption is being used or not (WEP is discussed in Chapter 8).

7.3.4 Mobility in the Same IP Subnet

In order to increase the physical range of a wireless LAN, companies and universities will often deploy multiple BSSs within the same IP subnet. This naturally raises the issue of mobility among the BSSs---how do wireless stations seamlessly move from one BSS to another while maintaining ongoing TCP sessions? As we'll see in this subsection, mobility can be handled in a relatively straightforward manner when the BSSs are part of the same subnet. When stations move between subnets, more sophisticated mobility management protocols will be needed, such as those we'll study in Sections 7.5 and 7.6. Let's now look at a specific example of mobility between BSSs in the same subnet. Figure 7.15 shows two interconnected BSSs with a host, H1, moving from BSS1 to BSS2. Because in this example the interconnection device that connects the two BSSs is not a router, all of the stations in the two BSSs, including the APs, belong to the same IP subnet. Thus, when H1 moves from BSS1 to BSS2, it may keep its IP address and all of its ongoing TCP connections. If the interconnection device were a router, then H1 would have to obtain a new IP address in the subnet into which it was moving. This address change would disrupt (and eventually terminate) any ongoing TCP connections at H1. In Section 7.6, we'll see how a network-layer mobility protocol, such as mobile IP, can be used to avoid this problem. But what specifically happens when H1 moves from BSS1 to BSS2?
As H1 wanders away from AP1, H1 detects a weakening signal from AP1 and starts to scan for a stronger signal. H1 receives beacon frames from AP2 (which in many corporate and university settings will have the same SSID as AP1). H1 then disassociates from AP1 and associates with AP2, while keeping its IP address and maintaining its ongoing TCP sessions. This addresses the handoff problem from the host and AP viewpoint. But what about the switch in Figure 7.15? How does it know that the host has moved from one AP to another? As you may recall from Chapter 6, switches are "self-learning" and automatically build their forwarding tables. This self-learning feature nicely handles

Figure 7.15 Mobility in the same subnet

occasional moves (for example, when an employee gets transferred from one department to another); however, switches were not designed to support highly mobile users who want to maintain TCP connections while moving between BSSs. To appreciate the problem here, recall that before the move, the switch has an entry in its forwarding table that pairs H1's MAC address with the outgoing switch interface through which H1 can be reached. If H1 is initially in BSS1, then a datagram destined to H1 will be directed to H1 via AP1. Once H1 associates with BSS2, however, its frames should be directed to AP2. One solution (a bit of a hack, really) is for AP2 to send a broadcast Ethernet frame with H1's source address to the switch just after the new association; the sketch at the end of this subsection illustrates the idea. When the switch receives the frame, it updates its forwarding table, allowing H1 to be reached via AP2. The 802.11f standards group is developing an inter-AP protocol to handle these and related issues. Our discussion above has focused on mobility within the same LAN subnet. Recall that VLANs, which we studied in Section 6.4.4, can be used to connect together islands of LANs into a large virtual LAN that can span a large geographical region. Mobility among base stations within such a VLAN can be handled in exactly the same manner as above [Yu 2011].
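
To make the broadcast-frame hack concrete, here is a minimal sketch in Python of a self-learning switch table being updated by it. The class and method names are our own, and the switch is reduced to a single dictionary; real switches also age out entries, which we omit.

```python
# A self-learning switch reduced to its forwarding table: it remembers,
# for each source MAC address it sees, the interface the frame came in on.

class LearningSwitch:
    def __init__(self):
        self.table = {}                  # MAC address -> switch interface

    def frame_arrived(self, src_mac, in_interface):
        self.table[src_mac] = in_interface   # (re)learn where src_mac lives

    def interface_for(self, dest_mac):
        return self.table.get(dest_mac)      # None would mean flood

switch = LearningSwitch()
switch.frame_arrived("H1-MAC", in_interface="port-to-AP1")   # H1 in BSS1
assert switch.interface_for("H1-MAC") == "port-to-AP1"

# After H1 associates with AP2, AP2 sends a broadcast frame with H1's
# source address; the switch relearns H1's location in one step.
switch.frame_arrived("H1-MAC", in_interface="port-to-AP2")
assert switch.interface_for("H1-MAC") == "port-to-AP2"
```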
7.3.5 Advanced Features in 802.11 We'll wrap up our coverage of 802.11 with a short discussion of two advanced capabilities found in 802.11 networks. As we'll see, these capabilities are not completely specified in the 802.11 standard, but rather are made possible by mechanisms specified in the standard. This allows different vendors to implement these capabilities using their own (proprietary) approaches, presumably giving them an edge over the competition.

802.11 Rate Adaptation We saw earlier in Figure 7.3 that different modulation techniques (with the different transmission rates that they provide) are appropriate for different SNR scenarios. Consider for example a mobile 802.11 user who is initially 20 meters away from the base station, with a high signal-to-noise ratio. Given the high SNR, the user can communicate with the base station using a physical-layer modulation technique that provides high transmission rates while maintaining a low BER. This is one happy user! Suppose now that the user becomes mobile, walking away from the base station, with the SNR falling as the distance from the base station increases. In this case, if the modulation technique used in the 802.11 protocol operating between the base station and the user does not change, the BER will become unacceptably high as the SNR decreases, and eventually no transmitted frames will be received correctly. For this reason, some 802.11 implementations have a rate adaptation capability that adaptively selects the underlying physical-layer modulation technique to use based on current or recent channel characteristics. If a node sends two frames in a row without receiving an acknowledgment (an implicit indication of bit errors on the channel), the transmission rate falls back to the next lower rate. If 10 frames in a row are acknowledged, or if a timer that tracks the time since the last fallback expires, the transmission rate increases to the next higher rate. This rate adaptation mechanism shares the same "probing" philosophy as TCP's congestion-control mechanism---when conditions are good (reflected by ACK receipts), the transmission rate is increased until something "bad" happens (the lack of ACK receipts); when something "bad" happens, the transmission rate is reduced. 802.11 rate adaptation and TCP congestion control are thus similar to the young child who is constantly pushing his/her parents for more and more (say candy for a young child, later curfew hours for the teenager) until the parents finally say "Enough!" and the child backs off (only to try again later after conditions have hopefully improved!). A number of other schemes have also been proposed to improve on this basic automatic rate-adjustment scheme [Kamerman 1997; Holland 2001; Lacage 2004].
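
The fallback/probe rule just described is easy to state as a small state machine. The following Python sketch implements exactly that rule (two consecutive losses step the rate down, ten consecutive ACKs step it up); the rate table and class names are illustrative, and real vendor algorithms such as those cited above differ in their details.

```python
# A sketch of the simple rate-adaptation rule described in the text:
# fall back after 2 consecutive unacknowledged frames, move up after
# 10 consecutive acknowledged frames. (Timer-based probing is omitted.)

RATES_MBPS = [6, 12, 24, 36, 48, 54]      # illustrative 802.11 rate set

class RateAdapter:
    def __init__(self):
        self.index = len(RATES_MBPS) - 1  # start optimistically at 54 Mbps
        self.acked_run = 0
        self.lost_run = 0

    @property
    def rate(self):
        return RATES_MBPS[self.index]

    def frame_acked(self):
        self.lost_run = 0
        self.acked_run += 1
        if self.acked_run >= 10 and self.index < len(RATES_MBPS) - 1:
            self.index += 1               # probe the next higher rate
            self.acked_run = 0

    def frame_lost(self):
        self.acked_run = 0
        self.lost_run += 1
        if self.lost_run >= 2 and self.index > 0:
            self.index -= 1               # implicit sign of bit errors: back off
            self.lost_run = 0

adapter = RateAdapter()
adapter.frame_lost(); adapter.frame_lost()
assert adapter.rate == 48                 # two losses in a row: one step down
for _ in range(10):
    adapter.frame_acked()
assert adapter.rate == 54                 # ten ACKs in a row: one step up
```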
Two other wireless protocols in the IEEE 802 +family are Bluetooth and Zigbee (defined in the IEEE 802.15.1 and IEEE +802.15.4 standards \[IEEE 802.15 2012\]). Bluetooth An IEEE 802.15.1 +network operates over a short range, at low power, and at low cost. It +is essentially a low-power, short-range, low-rate "cable replacement" +technology for interconnecting a computer with its wireless keyboard, +mouse or other peripheral device; cellular phones, speakers, headphones, +and many other devices, whereas 802.11 is a higher-power, medium-range, +higher-rate "access" technology. For this reason, 802.15.1 networks are +sometimes referred to as wireless personal area networks (WPANs). The +link and physical layers of 802.15.1 are based on the earlier Bluetooth +specification for personal area networks \[Held 2001, Bisdikian 2001\]. +802.15.1 networks operate in the 2.4 GHz unlicensed radio band in a TDM +manner, with time slots of 625 microseconds. During each time slot, a +sender transmits on one of 79 channels, with the channel changing in a +known but pseudo-random manner from slot to slot. This form of channel +hopping, known as frequency-hopping spread spectrum (FHSS), spreads +transmissions in time over the frequency spectrum. 802.15.1 can provide +data rates up to 4 Mbps. 802.15.1 networks are ad hoc networks: No +network infrastructure (e.g., an access point) is needed to interconnect +802.15.1 devices. Thus, 802.15.1 devices must organize themselves. +802.15.1 devices are first organized into a piconet of up to eight +active devices, as shown in Figure 7.16. One of these devices is +designated as the master, with the remaining devices acting as slaves. +The master node truly rules the piconet---its clock determines time in +the piconet, it can transmit in each odd-numbered slot, and a + +Figure 7.16 A Bluetooth piconet + +slave can transmit only after the master has communicated with it in the +previous slot and even then the slave can only transmit to the master. +In addition to the slave devices, there can also be up to 255 parked +devices in the network. These devices cannot communicate until their +status has been changed from parked to active by the master node. For +more information about WPANs, the interested reader should consult the +Bluetooth references \[Held 2001, Bisdikian 2001\] or the official IEEE +802.15 Web site \[IEEE 802.15 2012\]. Zigbee A second personal area +network standardized by the IEEE is the 802.15.4 standard \[IEEE 802.15 +2012\] known as Zigbee. While Bluetooth networks provide a "cable +replacement" data rate of over a Megabit per second, Zigbee is targeted +at lower-powered, lower-data-rate, lower-duty-cycle applications than +Bluetooth. While we may tend to think that "bigger and faster is +better," not all network applications need high bandwidth and the +consequent higher costs (both economic and power costs). For example, +home temperature and light sensors, security devices, and wall-mounted +switches are all very simple, lowpower, low-duty-cycle, low-cost +devices. Zigbee is thus well-suited for these devices. Zigbee defines +channel rates of 20, 40, 100, and 250 Kbps, depending on the channel +frequency. Nodes in a Zigbee network come in two flavors. So-called +"reduced-function devices" operate as slave devices under the control of +a single "full-function device," much as Bluetooth slave devices. 
7.3.6 Personal Area Networks: Bluetooth and Zigbee As illustrated in Figure 7.2, the IEEE 802.11 WiFi standard is aimed at communication among devices separated by up to 100 meters (except when 802.11 is used in a point-to-point configuration with a directional antenna). Two other wireless protocols in the IEEE 802 family are Bluetooth and Zigbee (defined in the IEEE 802.15.1 and IEEE 802.15.4 standards [IEEE 802.15 2012]).

Bluetooth An IEEE 802.15.1 network operates over a short range, at low power, and at low cost. It is essentially a low-power, short-range, low-rate "cable replacement" technology for interconnecting a computer with its wireless keyboard, mouse, or other peripherals, as well as with cellular phones, speakers, headphones, and many other devices; 802.11, by contrast, is a higher-power, medium-range, higher-rate "access" technology. For this reason, 802.15.1 networks are sometimes referred to as wireless personal area networks (WPANs). The link and physical layers of 802.15.1 are based on the earlier Bluetooth specification for personal area networks [Held 2001, Bisdikian 2001]. 802.15.1 networks operate in the 2.4 GHz unlicensed radio band in a TDM manner, with time slots of 625 microseconds. During each time slot, a sender transmits on one of 79 channels, with the channel changing in a known but pseudo-random manner from slot to slot. This form of channel hopping, known as frequency-hopping spread spectrum (FHSS), spreads transmissions in time over the frequency spectrum. 802.15.1 can provide data rates up to 4 Mbps. 802.15.1 networks are ad hoc networks: No network infrastructure (e.g., an access point) is needed to interconnect 802.15.1 devices. Thus, 802.15.1 devices must organize themselves. 802.15.1 devices are first organized into a piconet of up to eight active devices, as shown in Figure 7.16. One of these devices is designated as the master, with the remaining devices acting as slaves. The master node truly rules the piconet---its clock determines time in the piconet, it can transmit in each odd-numbered slot, and a slave can transmit only after the master has communicated with it in the previous slot, and even then, the slave can transmit only to the master.

Figure 7.16 A Bluetooth piconet

In addition to the slave devices, there can also be up to 255 parked devices in the network. These devices cannot communicate until their status has been changed from parked to active by the master node. For more information about WPANs, the interested reader should consult the Bluetooth references [Held 2001, Bisdikian 2001] or the official IEEE 802.15 Web site [IEEE 802.15 2012].

Zigbee A second personal area network standardized by the IEEE is the 802.15.4 standard [IEEE 802.15 2012] known as Zigbee. While Bluetooth networks provide a "cable replacement" data rate of over a megabit per second, Zigbee is targeted at lower-powered, lower-data-rate, lower-duty-cycle applications than Bluetooth. While we may tend to think that "bigger and faster is better," not all network applications need high bandwidth and the consequent higher costs (both economic and power costs). For example, home temperature and light sensors, security devices, and wall-mounted switches are all very simple, low-power, low-duty-cycle, low-cost devices. Zigbee is thus well-suited for these devices. Zigbee defines channel rates of 20, 40, 100, and 250 Kbps, depending on the channel frequency. Nodes in a Zigbee network come in two flavors. So-called "reduced-function devices" operate as slave devices under the control of a single "full-function device," much as Bluetooth slave devices do. A full-function device can operate as a master device as in Bluetooth by controlling multiple slave devices, and multiple full-function devices can additionally be configured into a mesh network in which full-function devices route frames amongst themselves. Zigbee shares many protocol mechanisms that we've already encountered in other link-layer protocols: beacon frames and link-layer acknowledgments (similar to 802.11), carrier-sense random access protocols with binary exponential backoff (similar to 802.11 and Ethernet), and fixed, guaranteed allocation of time slots (similar to DOCSIS). Zigbee networks can be configured in many different ways. Let's consider the simple case of a single full-function device controlling multiple reduced-function devices in a time-slotted manner using beacon frames. Figure 7.17 shows the case where the Zigbee network divides time into recurring superframes, each of which begins with a beacon frame.

Figure 7.17 Zigbee 802.15.4 super-frame structure

Each beacon frame divides the superframe into an active period (during which devices may transmit) and an inactive period (during which all devices, including the controller, can sleep and thus conserve power). The active period consists of 16 time slots, some of which are used by devices in a CSMA/CA random access manner, and some of which are allocated by the controller to specific devices, thus providing guaranteed channel access for those devices. More details about Zigbee networks can be found at [Baronti 2007, IEEE 802.15.4 2012].
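
The superframe layout just described can be made concrete with a small data model. The sketch below is illustrative Python with invented names; the 16-slot active period follows the text, but the split between contention slots and guaranteed slots is an arbitrary example of our own.

```python
# A toy model of the Zigbee superframe described above: a beacon, an
# active period of 16 slots (some contention-based, some guaranteed),
# and an inactive period in which every device may sleep.

from dataclasses import dataclass

@dataclass
class Superframe:
    active_slots: int = 16
    guaranteed: dict = None      # slot number -> device holding that slot

    def slot_owner(self, slot):
        """Who may transmit in a given active slot?"""
        if self.guaranteed and slot in self.guaranteed:
            return self.guaranteed[slot]          # guaranteed channel access
        return "CSMA/CA contention"               # random access slot

# Example: the controller grants the last two slots to a door sensor.
frame = Superframe(guaranteed={14: "door-sensor", 15: "door-sensor"})
for slot in range(frame.active_slots):
    print(slot, frame.slot_owner(slot))
```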
7.4 Cellular Internet Access In the previous section we examined how an Internet host can access the Internet when inside a WiFi hotspot---that is, when it is within the vicinity of an 802.11 access point. But most WiFi hotspots have a small coverage area of between 10 and 100 meters in diameter. What do we do then when we have a desperate need for wireless Internet access and we cannot access a WiFi hotspot? Given that cellular telephony is now ubiquitous in many areas throughout the world, a natural strategy is to extend cellular networks so that they support not only voice telephony but wireless Internet access as well. Ideally, this Internet access would be at a reasonably high speed and would provide for seamless mobility, allowing users to maintain their TCP sessions while traveling, for example, on a bus or a train. With sufficiently high upstream and downstream bit rates, the user could even maintain videoconferencing sessions while roaming about. This scenario is not that far-fetched. Data rates of several megabits per second are becoming available as broadband data services such as those we will cover here become more widely deployed. In this section, we provide a brief overview of current and emerging cellular Internet access technologies. Our focus here will be on both the wireless first hop as well as the network that connects the wireless first hop into the larger telephone network and/or the Internet; in Section 7.7 we'll consider how calls are routed to a user moving between base stations. Our brief discussion will necessarily provide only a simplified and high-level description of cellular technologies. Modern cellular communications, of course, has great breadth and depth, with many universities offering several courses on the topic. Readers seeking a deeper understanding are encouraged to see [Goodman 1997; Kaaranen 2001; Lin 2001; Korhonen 2003; Schiller 2003; Palat 2009; Scourias 2012; Turner 2012; Akyildiz 2010], as well as the particularly excellent and exhaustive references [Mouly 1992; Sauter 2014].

7.4.1 An Overview of Cellular Network Architecture In our description of cellular network architecture in this section, we'll adopt the terminology of the Global System for Mobile Communications (GSM) standards. (For history buffs, the GSM acronym was originally derived from Groupe Spécial Mobile, until the more anglicized name was adopted, preserving the original acronym letters.) In the 1980s, Europeans recognized the need for a pan-European digital cellular telephony system that would replace the numerous incompatible analog cellular telephony systems, leading to the GSM standard [Mouly 1992]. Europeans deployed GSM technology with great success in the early 1990s, and since then GSM has grown to be the 800-pound gorilla of the cellular telephone world, with more than 80% of all cellular subscribers worldwide using GSM.

CASE HISTORY 4G Cellular Mobile Versus Wireless LANs Many cellular mobile phone operators are deploying 4G cellular mobile systems. In some countries (e.g., Korea and Japan), 4G LTE coverage is higher than 90%---nearly ubiquitous. In 2015, average download rates over deployed LTE systems ranged from 10 Mbps in the US and India to close to 40 Mbps in New Zealand. These 4G systems are being deployed in licensed radio-frequency bands, with some operators paying considerable sums to governments for spectrum-use licenses. 4G systems allow users to access the Internet from remote outdoor locations while on the move, in a manner similar to today's cellular phone-only access. In many cases, a user may have simultaneous access to both wireless LANs and 4G. Because 4G capacity is both more constrained and more expensive, many mobile devices default to the use of WiFi rather than 4G when both are available. Whether wireless edge network access will be primarily over wireless LANs or over cellular systems remains an open question: The emerging wireless LAN infrastructure may become nearly ubiquitous. IEEE 802.11 wireless LANs, operating at 54 Mbps and higher, are enjoying widespread deployment. Essentially all laptops, tablets, and smartphones are factory-equipped with 802.11 LAN capabilities. Furthermore, emerging Internet appliances---such as wireless cameras and picture frames---also have low-powered wireless LAN capabilities. Wireless LAN base stations can also handle mobile phone appliances. Many phones are already capable of connecting to the cellular phone network or to an IP network either natively or using a Skype-like Voice-over-IP service, thus bypassing the operator's cellular voice and 4G data services. Of course, many other experts believe that 4G not only will be a major success, but will also dramatically revolutionize the way we work and live. Most likely, WiFi and 4G will both become prevalent wireless technologies, with roaming wireless devices automatically selecting the access technology that provides the best service at their current physical location.

When people talk about cellular technology, they often classify the technology as belonging to one of several "generations." The earliest generations were designed primarily for voice traffic.
First generation (1G) systems were analog FDMA systems designed exclusively for voice communication. These 1G systems are almost extinct now, having been replaced by digital 2G systems. The original 2G systems were also designed for voice, but later extended (2.5G) to support data (i.e., Internet) as well as voice service. 3G systems also support voice and data, but with an emphasis on data capabilities and higher-speed radio access links. The 4G systems being deployed today are based on LTE technology, feature an all-IP core network, and provide integrated voice and data at multi-megabit speeds.

Cellular Network Architecture, 2G: Voice Connections to the Telephone Network The term cellular refers to the fact that the region covered by a cellular network is partitioned into a number of geographic coverage areas, known as cells, shown as hexagons on the left side of Figure 7.18. As with the 802.11 WiFi standard we studied in Section 7.3.1, GSM has its own particular nomenclature. Each cell contains a base transceiver station (BTS) that transmits signals to and receives signals from the mobile stations in its cell.

Figure 7.18 Components of the GSM 2G cellular network architecture

The coverage area of a cell depends on many factors, including the transmitting power of the BTS, the transmitting power of the user devices, obstructing buildings in the cell, and the height of base station antennas. Although Figure 7.18 shows each cell containing one base transceiver station residing in the middle of the cell, many systems today place the BTS at corners where three cells intersect, so that a single BTS with directional antennas can service three cells. The GSM standard for 2G cellular systems uses combined FDM/TDM (radio) for the air interface. Recall from Chapter 1 that, with pure FDM, the channel is partitioned into a number of frequency bands with each band devoted to a call. Also recall from Chapter 1 that, with pure TDM, time is partitioned into frames with each frame further partitioned into slots and each call being assigned the use of a particular slot in the revolving frame. In combined FDM/TDM systems, the channel is partitioned into a number of frequency sub-bands; within each sub-band, time is partitioned into frames and slots. Thus, for a combined FDM/TDM system, if the channel is partitioned into F sub-bands and time is partitioned into T slots, then the channel will be able to support F·T simultaneous calls. Recall that we saw in Section 6.3.4 that cable access networks also use a combined FDM/TDM approach. GSM systems use 200-kHz frequency bands, with each band supporting eight TDM calls. GSM encodes speech at 13 kbps and 12.2 kbps.
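
The F·T capacity formula is worth a quick worked check. In the sketch below, the per-band numbers (200 kHz, eight slots) come from the GSM description above, while the total allocation of 5 MHz is our own illustrative assumption.

```python
# Capacity of a combined FDM/TDM system: F sub-bands x T slots per band.
allocation_hz  = 5_000_000       # assumed spectrum allocation: 5 MHz
band_hz        = 200_000         # GSM sub-band width: 200 kHz
slots_per_band = 8               # GSM: eight TDM calls per sub-band

F = allocation_hz // band_hz     # 25 frequency sub-bands
T = slots_per_band
print(F * T, "simultaneous calls")   # 25 * 8 = 200 simultaneous calls
```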
A GSM network's base station controller (BSC) will typically service several tens of base transceiver stations. The role of the BSC is to allocate BTS radio channels to mobile subscribers, perform paging (finding the cell in which a mobile user is resident), and perform handoff of mobile users---a topic we'll cover shortly in Section 7.7.2. The base station controller and its controlled base transceiver stations collectively constitute a GSM base station subsystem (BSS). As we'll see in Section 7.7, the mobile switching center (MSC) plays the central role in user authorization and accounting (e.g., determining whether a mobile device is allowed to connect to the cellular network), call establishment and teardown, and handoff. A single MSC will typically serve up to five BSCs, resulting in approximately 200K subscribers per MSC. A cellular provider's network will have a number of MSCs, with special MSCs known as gateway MSCs connecting the provider's cellular network to the larger public telephone network.

7.4.2 3G Cellular Data Networks: Extending the Internet to Cellular Subscribers Our discussion in Section 7.4.1 focused on connecting cellular voice users to the public telephone network. But, of course, when we're on the go, we'd also like to read e-mail, access the Web, get location-dependent services (e.g., maps and restaurant recommendations), and perhaps even watch streaming video. To do this, our smartphone will need to run a full TCP/IP protocol stack (including the physical, link, network, transport, and application layers) and connect into the Internet via the cellular data network. The topic of cellular data networks involves a rather bewildering collection of competing and ever-evolving standards, as one generation (and half-generation) succeeds the former and introduces new technologies and services with new acronyms. To make matters worse, there's no single official body that sets requirements for 2.5G, 3G, 3.5G, or 4G technologies, making it hard to sort out the differences among competing standards. In our discussion below, we'll focus on the UMTS (Universal Mobile Telecommunications Service) 3G and 4G standards developed by the 3rd Generation Partnership Project (3GPP) [3GPP 2016]. Let's first take a top-down look at the 3G cellular data network architecture shown in Figure 7.19.

Figure 7.19 3G system architecture

3G Core Network The 3G core cellular data network connects radio access networks to the public Internet. The core network interoperates with components of the existing cellular voice network (in particular, the MSC) that we previously encountered in Figure 7.18. Given the considerable amount of existing infrastructure (and profitable services!) in the existing cellular voice network, the approach taken by the designers of 3G data services is clear: leave the existing core GSM cellular voice network untouched, adding additional cellular data functionality in parallel to the existing cellular voice network. The alternative---integrating new data services directly into the core of the existing cellular voice network---would have raised the same challenges encountered in Section 4.3, where we discussed integrating new (IPv6) and legacy (IPv4) technologies in the Internet.

There are two types of nodes in the 3G core network: Serving GPRS Support Nodes (SGSNs) and Gateway GPRS Support Nodes (GGSNs). (GPRS stands for General Packet Radio Service, an early cellular data service in 2G networks; here we discuss the evolved version of GPRS in 3G networks.) An SGSN is responsible for delivering datagrams to/from the mobile nodes in the radio access network to which the SGSN is attached. The SGSN interacts with the cellular voice network's MSC for that area, providing user authorization and handoff, maintaining location (cell) information about active mobile nodes, and performing datagram forwarding between mobile nodes in the radio access network and a GGSN. The GGSN acts as a gateway, connecting multiple SGSNs into the larger Internet. A GGSN is thus the last piece of 3G infrastructure that a datagram originating at a mobile node encounters before entering the larger Internet.
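
The division of labor between the SGSN and the GGSN can be sketched in a few lines of code. The model below is deliberately schematic Python under our own naming (it is not the GTP tunneling protocol that real 3G cores use): the GGSN keeps track of which SGSN currently serves each mobile, so that mobility stays hidden from the Internet side.

```python
# A schematic model of 3G core data forwarding: the GGSN forwards each
# mobile's traffic to whichever SGSN currently serves it. Class and
# method names are illustrative, not part of any 3GPP specification.

class SGSN:
    def __init__(self, name):
        self.name = name

    def deliver(self, datagram, mobile_ip):
        # A real SGSN would hand the datagram to the radio access network
        # serving the cell where the mobile currently resides.
        print(f"{self.name}: delivering {datagram!r} to {mobile_ip} over the RAN")

class GGSN:
    """Gateway between the cellular data network and the public Internet."""
    def __init__(self):
        self.serving_sgsn = {}            # mobile IP -> SGSN serving it

    def update_location(self, mobile_ip, sgsn):
        self.serving_sgsn[mobile_ip] = sgsn   # updated as the mobile moves

    def from_internet(self, datagram, dest_ip):
        # Externally the GGSN looks like an ordinary gateway router;
        # internally it relays to the SGSN attached to the mobile's RAN.
        self.serving_sgsn[dest_ip].deliver(datagram, dest_ip)

ggsn = GGSN()
ggsn.update_location("10.0.7.42", SGSN("SGSN-east"))
ggsn.from_internet("hello", "10.0.7.42")
ggsn.update_location("10.0.7.42", SGSN("SGSN-west"))  # mobile moved
ggsn.from_internet("hello again", "10.0.7.42")        # Internet side unchanged
```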
To the outside world, the GGSN looks like any other gateway router; the mobility of the 3G nodes within the GGSN's network is hidden from the outside world behind the GGSN.

3G Radio Access Network: The Wireless Edge The 3G radio access network is the wireless first-hop network that we see as a 3G user. The Radio Network Controller (RNC) typically controls several cell base transceiver stations similar to the base stations that we encountered in 2G systems (but officially known in 3G UMTS parlance as "Node Bs"---a rather non-descriptive name!). Each cell's wireless link operates between the mobile nodes and a base transceiver station, just as in 2G networks. The RNC connects to both the circuit-switched cellular voice network via an MSC, and to the packet-switched Internet via an SGSN. Thus, while 3G cellular voice and cellular data services use different core networks, they share a common first/last-hop radio access network. A significant change in 3G UMTS over 2G networks is that rather than using GSM's FDMA/TDMA scheme, UMTS uses a CDMA technique known as Direct Sequence Wideband CDMA (DS-WCDMA) [Dahlman 1998] within TDMA slots; TDMA slots, in turn, are available on multiple frequencies---an interesting use of all three dedicated channel-sharing approaches that we earlier identified in Chapter 6 and similar to the approach taken in wired cable access networks (see Section 6.3.4). This change requires a new 3G cellular wireless-access network operating in parallel with the 2G BSS radio network shown in Figure 7.19. The data service associated with the WCDMA specification is known as HSPA (High Speed Packet Access) and promises downlink data rates of up to 14 Mbps. Details regarding 3G networks can be found at the 3rd Generation Partnership Project (3GPP) Web site [3GPP 2016].

7.4.3 On to 4G: LTE Fourth generation (4G) cellular systems are becoming widely deployed. In 2015, more than 50 countries had 4G coverage exceeding 50%. The 4G Long-Term Evolution (LTE) standard [Sauter 2014] put forward by the 3GPP has two important innovations over 3G systems: an all-IP core network and an enhanced radio access network, as discussed below.

4G System Architecture: An All-IP Core Network Figure 7.20 shows the overall 4G network architecture, which (unfortunately) introduces yet another (rather impenetrable) new vocabulary and set of acronyms for network components.

Figure 7.20 4G network architecture

But let's not get lost in these acronyms! There are several important high-level observations about the 4G architecture: A unified, all-IP network architecture. Unlike the 3G network shown in Figure 7.19, which has separate network components and paths for voice and data traffic, the 4G architecture shown in Figure 7.20 is "all-IP"---both voice and data are carried in IP datagrams between the wireless device (the User Equipment, or UE, in 4G parlance) and the packet gateway (P-GW), which connects the 4G edge network to the rest of the network. With 4G, the last vestiges of cellular networks' roots in telephony have disappeared, giving way to universal IP service! A clear separation of the 4G data plane and 4G control plane. Mirroring our distinction between the data and control planes for IP's network layer in Chapters 4 and 5, respectively, the 4G network architecture also clearly separates the data and control planes. We'll discuss their functionality below.
A clear separation between the radio access network and the all-IP core network. IP datagrams carrying user data are forwarded between the user (UE) and the gateway (P-GW in Figure 7.20) over a 4G-internal IP network to the external Internet. Control packets are exchanged over this same internal network among the 4G's control services components, whose roles are described below. The principal components of the 4G architecture are as follows. The eNodeB is the logical descendant of the 2G base station and the 3G Radio Network Controller (a.k.a. Node B) and again plays a central role here. Its data-plane role is to forward datagrams between the UE (over the LTE radio access network) and the P-GW. UE datagrams are encapsulated at the eNodeB and tunneled to the P-GW through the 4G network's all-IP enhanced packet core (EPC). This tunneling between the eNodeB and P-GW is similar to the tunneling we saw in Section 4.3 of IPv6 datagrams between two IPv6 endpoints through a network of IPv4 routers. These tunnels may have associated quality of service (QoS) guarantees. For example, a 4G network may guarantee that voice traffic experiences no more than a 100 msec delay between UE and P-GW, and has a packet loss rate of less than 1%; TCP traffic might have a guarantee of 300 msec and a packet loss rate of less than 0.0001% [Palat 2009]. We'll cover QoS in Chapter 9. In the control plane, the eNodeB handles registration and mobility signaling traffic on behalf of the UE. The Packet Data Network Gateway (P-GW) allocates IP addresses to the UEs and performs QoS enforcement. As a tunnel endpoint it also performs datagram encapsulation/decapsulation when forwarding a datagram to/from a UE. The Serving Gateway (S-GW) is the data-plane mobility anchor point---all UE traffic will pass through the S-GW. The S-GW also performs charging/billing functions and lawful traffic interception. The Mobility Management Entity (MME) performs connection and mobility management on behalf of the UEs resident in the cell it controls. It receives UE subscription information from the HSS. We cover mobility in cellular networks in detail in Section 7.7. The Home Subscriber Server (HSS) contains UE information including roaming access capabilities, quality of service profiles, and authentication information. As we'll see in Section 7.7, the HSS obtains this information from the UE's home cellular provider. Very readable introductions to 4G network architecture and its EPC are [Motorola 2007; Palat 2009; Sauter 2014].

LTE Radio Access Network LTE uses a combination of frequency division multiplexing and time division multiplexing on the downstream channel, known as orthogonal frequency division multiplexing (OFDM) [Rohde 2008; Ericsson 2011]. (The term "orthogonal" comes from the fact that the signals being sent on different frequency channels are created so that they interfere very little with each other, even when channel frequencies are tightly spaced.) In LTE, each active mobile node is allocated one or more 0.5 ms time slots in one or more of the channel frequencies. Figure 7.21 shows an allocation of eight time slots over four frequencies. By being allocated increasingly more time slots (whether on the same frequency or on different frequencies), a mobile node is able to achieve increasingly higher transmission rates.

Figure 7.21 Twenty 0.5 ms slots organized into 10 ms frames at each frequency. An eight-slot allocation is shown shaded.

Slot (re)allocation among mobile nodes can be performed as often as once every millisecond. Different modulation schemes can also be used to change the transmission rate; see our earlier discussion of Figure 7.3 and dynamic selection of modulation schemes in WiFi networks. The particular allocation of time slots to mobile nodes is not mandated by the LTE standard. Instead, the decision of which mobile nodes will be allowed to transmit in a given time slot on a given frequency is determined by the scheduling algorithms provided by the LTE equipment vendor and/or the network operator. With opportunistic scheduling [Bender 2000; Kolding 2003; Kulkarni 2005], matching the physical-layer protocol to the channel conditions between the sender and receiver and choosing the receivers to which packets will be sent based on channel conditions allow the radio network controller to make best use of the wireless medium. In addition, user priorities and contracted levels of service (e.g., silver, gold, or platinum) can be used in scheduling downstream packet transmissions. In addition to the LTE capabilities described above, LTE-Advanced allows for downstream bandwidths of hundreds of Mbps by allocating aggregated channels to a mobile node [Akyildiz 2010].
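
The slot-allocation idea can be pictured as a small grid of (frequency, slot) pairs. The sketch below is a toy Python scheduler over such a grid; the greedy policy and all names are our own illustration, since, as noted above, the LTE standard leaves the actual scheduling algorithm to vendors and operators.

```python
# A toy LTE-style downstream grid: 4 channel frequencies x 20 slots of
# 0.5 ms per 10 ms frame. A scheduler assigns (frequency, slot) pairs
# to mobile nodes; more pairs mean a higher transmission rate.

FREQUENCIES, SLOTS = 4, 20

def schedule(demands):
    """Greedily hand out grid cells: demands maps node -> requested slots."""
    grid = {}                                   # (freq, slot) -> node
    free = ((f, s) for f in range(FREQUENCIES) for s in range(SLOTS))
    for node, wanted in demands.items():
        for _ in range(wanted):
            cell = next(free, None)
            if cell is None:
                return grid                     # grid exhausted
            grid[cell] = node
    return grid

grid = schedule({"UE-1": 8, "UE-2": 4})         # UE-1 gets 8 slots, as in Figure 7.21
assert sum(1 for n in grid.values() if n == "UE-1") == 8
```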
An additional 4G wireless technology---WiMAX (Worldwide Interoperability for Microwave Access)---is a family of IEEE 802.16 standards that differ significantly from LTE. WiMAX has not yet been able to enjoy the widespread deployment of LTE. A detailed discussion of WiMAX can be found on this book's Web site.

7.5 Mobility Management: Principles Having covered the wireless nature of the communication links in a wireless network, it's now time to turn our attention to the mobility that these wireless links enable. In the broadest sense, a mobile node is one that changes its point of attachment into the network over time. Because the term mobility has taken on many meanings in both the computer and telephony worlds, it will serve us well first to consider several dimensions of mobility in some detail. From the network layer's standpoint, how mobile is a user? A physically mobile user will present a very different set of challenges to the network layer, depending on how he or she moves between points of attachment to the network. At one end of the spectrum in Figure 7.22, a user may carry a laptop with a wireless network interface card around in a building. As we saw in Section 7.3.4, this user is not mobile from a network-layer perspective. Moreover, if the user associates with the same access point regardless of location, the user is not even mobile from the perspective of the link layer. At the other end of the spectrum, consider the user zooming along the autobahn in a BMW or Tesla at 150 kilometers per hour, passing through multiple wireless access networks and wanting to maintain an uninterrupted TCP connection to a remote application throughout the trip. This user is definitely mobile!

Figure 7.22 Various degrees of mobility, from the network layer's point of view

In between these extremes is a user who takes a laptop from one location (e.g., office or dormitory) into another (e.g., coffeeshop, classroom) and wants to connect into the network in the new location. This user is also mobile (although less so than the BMW driver!) but does not need to maintain an ongoing connection while moving between points of attachment to the network. Figure 7.22 illustrates this spectrum of user mobility from the network layer's perspective. How important is it for the mobile node's address to always remain the same? With mobile telephony, your phone number---essentially the network-layer address of your phone---remains the same as you travel from one provider's mobile phone network to another. Must a laptop similarly maintain the same IP address while moving between IP networks? The answer to this question will depend strongly on the applications being run. For the BMW or Tesla driver who wants to maintain an uninterrupted TCP connection to a remote application while zipping along the autobahn, it would be convenient to maintain the same IP address. Recall from Chapter 3 that an Internet application needs to know the IP address and port number of the remote entity with which it is communicating. If a mobile entity is able to maintain its IP address as it moves, mobility becomes invisible from the application standpoint. There is great value to this transparency---an application need not be concerned with a potentially changing IP address, and the same application code serves mobile and nonmobile connections alike. We'll see in the following section that mobile IP provides this transparency, allowing a mobile node to maintain its permanent IP address while moving among networks. On the other hand, a less glamorous mobile user might simply want to turn off an office laptop, bring that laptop home, power up, and work from home. If the laptop functions primarily as a client in client-server applications (e.g., send/read e-mail, browse the Web, Telnet to a remote host) from home, the particular IP address used by the laptop is not that important. In particular, one could get by fine with an address that is temporarily allocated to the laptop by the ISP serving the home. We saw in Section 4.3 that DHCP already provides this functionality. What supporting wired infrastructure is available? In all of our scenarios above, we've implicitly assumed that there is a fixed infrastructure to which the mobile user can connect---for example, the home's ISP network, the wireless access network in the office, or the wireless access networks lining the autobahn. What if no such infrastructure exists? If two users are within communication proximity of each other, can they establish a network connection in the absence of any other network-layer infrastructure? Ad hoc networking provides precisely these capabilities. This rapidly developing area is at the cutting edge of mobile networking research and is beyond the scope of this book. [Perkins 2000] and the IETF Mobile Ad Hoc Network (manet) working group Web pages [manet 2016] provide thorough treatments of the subject. In order to illustrate the issues involved in allowing a mobile user to maintain ongoing connections while moving between networks, let's consider a human analogy. A twenty-something adult moving out of the family home becomes mobile, living in a series of dormitories and/or apartments, and often changing addresses. If an old friend wants to get in touch, how can that friend find the address of her mobile friend? One common way is to contact the family, since a mobile adult will often register his or her current address with the family (if for no other reason than so that the parents can send money to help pay the rent!).
The family home, with its permanent address, becomes that one place that others can go as a first step in communicating with the mobile adult. Later communication from the friend may be either indirect (for example, with mail being sent first to the parents' home and then forwarded to the mobile adult) or direct (for example, with the friend using the address obtained from the parents to send mail directly to her mobile friend).

In a network setting, the permanent home of a mobile node (such as a laptop or smartphone) is known as the home network, and the entity within the home network that performs the mobility management functions discussed below on behalf of the mobile node is known as the home agent. The network in which the mobile node is currently residing is known as the foreign (or visited) network, and the entity within the foreign network that helps the mobile node with the mobility management functions discussed below is known as a foreign agent. For mobile professionals, their home network might well be their company network, while the visited network might be the network of a colleague they are visiting. A correspondent is the entity wishing to communicate with the mobile node. Figure 7.23 illustrates these concepts, as well as addressing concepts considered below. In Figure 7.23, note that agents are shown as being collocated with routers (e.g., as processes running on routers), but alternatively they could be executing on other hosts or servers in the network.

7.5.1 Addressing We noted above that in order for user mobility to be transparent to network applications, it is desirable for a mobile node to keep its address as it moves from one network to another.

Figure 7.23 Initial elements of a mobile network architecture

When a mobile node is resident in a foreign network, all traffic addressed to the node's permanent address now needs to be routed to the foreign network. How can this be done? One option is for the foreign network to advertise to all other networks that the mobile node is resident in its network. This could be via the usual exchange of intradomain and interdomain routing information and would require few changes to the existing routing infrastructure. The foreign network could simply advertise to its neighbors that it has a highly specific route to the mobile node's permanent address (that is, essentially inform other networks that it has the correct path for routing datagrams to the mobile node's permanent address; see Section 4.3). These neighbors would then propagate this routing information throughout the network as part of the normal procedure of updating routing information and forwarding tables. When the mobile node leaves one foreign network and joins another, the new foreign network would advertise a new, highly specific route to the mobile node, and the old foreign network would withdraw its routing information regarding the mobile node. This solves two problems at once, and it does so without making significant changes to the network-layer infrastructure. Other networks know the location of the mobile node, and it is easy to route datagrams to the mobile node, since the forwarding tables will direct datagrams to the foreign network. A significant drawback, however, is that of scalability.
If mobility management were to be the responsibility of network routers, the routers would have to maintain forwarding table entries for potentially millions of mobile nodes, and update these entries as nodes move. Some additional drawbacks are explored in the problems at the end of this chapter. An alternative approach (and one that has been adopted in practice) is to push mobility functionality from the network core to the network edge---a recurring theme in our study of Internet architecture. A natural way to do this is via the mobile node's home network. In much the same way that parents of the mobile twenty-something track their child's location, the home agent in the mobile node's home network can track the foreign network in which the mobile node resides. A protocol between the mobile node (or a foreign agent representing the mobile node) and the home agent will certainly be needed to update the mobile node's location. Let's now consider the foreign agent in more detail. The conceptually simplest approach, shown in Figure 7.23, is to locate foreign agents at the edge routers in the foreign network. One role of the foreign agent is to create a so-called care-of address (COA) for the mobile node, with the network portion of the COA matching that of the foreign network. There are thus two addresses associated with a mobile node, its permanent address (analogous to our mobile youth's family's home address) and its COA, sometimes known as a foreign address (analogous to the address of the house in which our mobile youth is currently residing). In the example in Figure 7.23, the permanent address of the mobile node is 128.119.40.186. When visiting network 79.129.13/24, the mobile node has a COA of 79.129.13.2. A second role of the foreign agent is to inform the home agent that the mobile node is resident in its (the foreign agent's) network and has the given COA. We'll see shortly that the COA will be used to "reroute" datagrams to the mobile node via its foreign agent. Although we have separated the functionality of the mobile node and the foreign agent, it is worth noting that the mobile node can also assume the responsibilities of the foreign agent. For example, the mobile node could obtain a COA in the foreign network (for example, using a protocol such as DHCP) and itself inform the home agent of its COA.
Such datagrams are +first routed, as usual, to the mobile node's home network. This is +illustrated in step 1 in Figure 7.24. Let's now turn our attention to +the home agent. In addition to being responsible for interacting with a +foreign agent to track the mobile node's COA, the home agent has another +very important function. Its second job is to be on the lookout for +arriving datagrams addressed to nodes whose home network is that of the +home agent but that are currently resident in a foreign network. The +home agent intercepts these datagrams and then forwards them to a mobile +node in a two-step process. The datagram is first forwarded to the +foreign agent, using the mobile node's COA (step 2 in Figure 7.24), and +then forwarded from the foreign agent to the mobile node (step 3 in +Figure 7.24). + +Figure 7.24 Indirect routing to a mobile node + +It is instructive to consider this rerouting in more detail. The home +agent will need to address the datagram using the mobile node's COA, so +that the network layer will route the datagram to the foreign network. +On the other hand, it is desirable to leave the correspondent's datagram +intact, since the application receiving the datagram should be unaware +that the datagram was forwarded via the home agent. Both goals can be +satisfied by having the home agent encapsulate the correspondent's +original complete datagram within a new (larger) datagram. This larger +datagram is addressed and delivered to the mobile node's COA. The +foreign agent, who "owns" the COA, will receive and decapsulate the +datagram---that is, remove the correspondent's original datagram from +within the larger encapsulating datagram and forward (step 3 in Figure +7.24) the original datagram to the mobile node. Figure 7.25 shows a +correspondent's original datagram being sent to the home network, an +encapsulated datagram being sent to the foreign agent, and the original +datagram being delivered to the mobile node. The sharp reader will note +that the encapsulation/decapsulation described here is identical to the +notion of tunneling, discussed in Section 4.3 in the context of IP +multicast and IPv6. Let's next consider how a mobile node sends +datagrams to a correspondent. This is quite simple, as the mobile node +can address its datagram directly to the correspondent (using its own +permanent address as the source address, and the + +Figure 7.25 Encapsulation and decapsulation + +correspondent's address as the destination address). Since the mobile +node knows the correspondent's address, there is no need to route the +datagram back through the home agent. This is shown as step 4 in Figure +7.24. Let's summarize our discussion of indirect routing by listing the +new network-layer functionality required to support mobility. A +mobile-node--to--foreign-agent protocol. The mobile node will register +with the foreign agent when attaching to the foreign network. Similarly, +a mobile node will deregister with the foreign agent when it leaves the +foreign network. A foreign-agent--to--home-agent registration protocol. +The foreign agent will register the mobile node's COA with the home +agent. A foreign agent need not explicitly deregister a COA when a +mobile node leaves its network, because the subsequent registration of a +new COA, when the mobile node moves to a new network, will take care of +this. A home-agent datagram encapsulation protocol. Encapsulation and +forwarding of the correspondent's original datagram within a datagram +addressed to the COA. 
Let's next consider how a mobile node sends datagrams to a correspondent. This is quite simple, as the mobile node can address its datagram directly to the correspondent (using its own permanent address as the source address, and the correspondent's address as the destination address). Since the mobile node knows the correspondent's address, there is no need to route the datagram back through the home agent. This is shown as step 4 in Figure 7.24. Let's summarize our discussion of indirect routing by listing the new network-layer functionality required to support mobility. A mobile-node-to-foreign-agent protocol. The mobile node will register with the foreign agent when attaching to the foreign network. Similarly, a mobile node will deregister with the foreign agent when it leaves the foreign network. A foreign-agent-to-home-agent registration protocol. The foreign agent will register the mobile node's COA with the home agent. A foreign agent need not explicitly deregister a COA when a mobile node leaves its network, because the subsequent registration of a new COA, when the mobile node moves to a new network, will take care of this. A home-agent datagram encapsulation protocol. Encapsulation and forwarding of the correspondent's original datagram within a datagram addressed to the COA. A foreign-agent decapsulation protocol. Extraction of the correspondent's original datagram from the encapsulating datagram, and the forwarding of the original datagram to the mobile node. The previous discussion provides all the pieces---foreign agents, the home agent, and indirect forwarding---needed for a mobile node to maintain an ongoing connection while moving among networks. As an example of how these pieces fit together, assume the mobile node is attached to foreign network A, has registered a COA in network A with its home agent, and is receiving datagrams that are being indirectly routed through its home agent. The mobile node now moves to foreign network B and registers with the foreign agent in network B, which informs the home agent of the mobile node's new COA. From this point on, the home agent will reroute datagrams to foreign network B. As far as a correspondent is concerned, mobility is transparent---datagrams are routed via the same home agent both before and after the move. As far as the home agent is concerned, there is no disruption in the flow of datagrams---arriving datagrams are first forwarded to foreign network A; after the change in COA, datagrams are forwarded to foreign network B. But will the mobile node see an interrupted flow of datagrams as it moves between networks? As long as the time between the mobile node's disconnection from network A (at which point it can no longer receive datagrams via A) and its attachment to network B (at which point it will register a new COA with its home agent) is small, few datagrams will be lost. Recall from Chapter 3 that end-to-end connections can suffer datagram loss due to network congestion. Hence, occasional datagram loss within a connection when a node moves between networks is by no means a catastrophic problem. If loss-free communication is required, upper-layer mechanisms will recover from datagram loss, whether such loss results from network congestion or from user mobility. An indirect routing approach is used in the mobile IP standard [RFC 5944], as discussed in Section 7.6.

Direct Routing to a Mobile Node The indirect routing approach illustrated in Figure 7.24 suffers from an inefficiency known as the triangle routing problem---datagrams addressed to the mobile node must be routed first to the home agent and then to the foreign network, even when a much more efficient route exists between the correspondent and the mobile node. In the worst case, imagine a mobile user who is visiting the foreign network of a colleague. The two are sitting side by side and exchanging data over the network. Datagrams from the correspondent (in this case the colleague of the visitor) are routed to the mobile user's home agent and then back again to the foreign network! Direct routing overcomes the inefficiency of triangle routing, but does so at the cost of additional complexity. In the direct routing approach, a correspondent agent in the correspondent's network first learns the COA of the mobile node. This can be done by having the correspondent agent query the home agent, assuming that (as in the case of indirect routing) the mobile node has an up-to-date value for its COA registered with its home agent. It is also possible for the correspondent itself to perform the function of the correspondent agent, just as a mobile node could perform the function of the foreign agent. This is shown as steps 1 and 2 in Figure 7.26.
The correspondent agent then tunnels datagrams directly to the mobile node's COA, in a manner analogous to the tunneling performed by the home agent (steps 3 and 4 in Figure 7.26).

While direct routing overcomes the triangle routing problem, it introduces two important additional challenges: A mobile-user location protocol is needed for the correspondent agent to query the home agent to obtain the mobile node's COA (steps 1 and 2 in Figure 7.26). When the mobile node moves from one foreign network to another, how will data now be forwarded to the new foreign network? In the case of indirect routing, this problem was easily solved by updating the COA maintained by the home agent. However, with direct routing, the home agent is queried for the COA by the correspondent agent only once, at the beginning of the session. Thus, updating the COA at the home agent, while necessary, will not be enough to solve the problem of routing data to the mobile node's new foreign network. One solution would be to create a new protocol to notify the correspondent of the changing COA. An alternative solution, and one that we'll see adopted in practice in GSM networks, works as follows.

Figure 7.26 Direct routing to a mobile user

Suppose data is currently being forwarded to the mobile node in the foreign network where the mobile node was located when the session first started (step 1 in Figure 7.27). We'll identify the foreign agent in that foreign network where the mobile node was first found as the anchor foreign agent. When the mobile node moves to a new foreign network (step 2 in Figure 7.27), the mobile node registers with the new foreign agent (step 3), and the new foreign agent provides the anchor foreign agent with the mobile node's new COA (step 4). When the anchor foreign agent receives an encapsulated datagram for a departed mobile node, it can then re-encapsulate the datagram and forward it to the mobile node (step 5) using the new COA. If the mobile node later moves yet again to a new foreign network, the foreign agent in that new visited network would then contact the anchor foreign agent in order to set up forwarding to this new foreign network.

Figure 7.27 Mobile transfer between networks with direct routing
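
The anchor-foreign-agent scheme just described amounts to keeping one forwarding pointer up to date. Here is a minimal illustrative sketch (invented names, Python dictionaries in place of real registration messages) of the anchor re-encapsulating datagrams toward the mobile node's latest COA.

```python
# The anchor foreign agent remembers the mobile node's most recent COA
# and re-encapsulates arriving datagrams toward it (step 5 in Figure 7.27).

class AnchorForeignAgent:
    def __init__(self):
        self.current_coa = {}        # permanent address -> latest COA

    def update_coa(self, permanent_addr, new_coa):
        # Invoked when a new foreign agent reports the move (step 4).
        self.current_coa[permanent_addr] = new_coa

    def datagram_arrived(self, encapsulated):
        inner = encapsulated["payload"]
        dest = inner["dest"]
        if dest in self.current_coa:           # mobile has departed:
            return {"dest": self.current_coa[dest], "payload": inner}
        return inner                           # still local: deliver directly

anchor = AnchorForeignAgent()
anchor.update_coa("128.119.40.186", "79.130.14.7")   # hypothetical new COA
out = anchor.datagram_arrived(
    {"dest": "79.129.13.2", "payload": {"dest": "128.119.40.186", "data": "x"}})
assert out["dest"] == "79.130.14.7"                  # re-tunneled to the new COA
```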
7.6 Mobile IP The Internet architecture and protocols for supporting mobility, collectively known as mobile IP, are defined primarily in RFC 5944 for IPv4. Mobile IP is a flexible standard, supporting many different modes of operation (for example, operation with or without a foreign agent), multiple ways for agents and mobile nodes to discover each other, use of single or multiple COAs, and multiple forms of encapsulation. As such, mobile IP is a complex standard, and would require an entire book to describe in detail; indeed one such book is \[Perkins 1998b\]. Our modest goal here is to provide an overview of the most important aspects of mobile IP and to illustrate its use in a few common-case scenarios. The mobile IP architecture contains many of the elements we have considered above, including the concepts of home agents, foreign agents, care-of addresses, and encapsulation/decapsulation. The current standard \[RFC 5944\] specifies the use of indirect routing to the mobile node. The mobile IP standard consists of three main pieces: Agent discovery. Mobile IP defines the protocols used by a home or foreign agent to advertise its services to mobile nodes, and protocols for mobile nodes to solicit the services of a foreign or home agent. Registration with the home agent. Mobile IP defines the protocols used by the mobile node and/or foreign agent to register and deregister COAs with a mobile node's home agent. Indirect routing of datagrams. The standard also defines the manner in which datagrams are forwarded to mobile nodes by a home agent, including rules for forwarding datagrams, rules for handling error conditions, and several forms of encapsulation \[RFC 2003, RFC 2004\]. Security considerations are prominent throughout the mobile IP standard. For example, authentication of a mobile node is clearly needed to ensure that a malicious user does not register a bogus care-of address with a home agent, which could cause all datagrams addressed to an IP address to be redirected to the malicious user. Mobile IP achieves security using many of the mechanisms that we will examine in Chapter 8, so we will not address security considerations in our discussion below. Agent Discovery A mobile IP node arriving at a new network, whether attaching to a foreign network or returning to its home network, must learn the identity of the corresponding foreign or home agent. Indeed it is the discovery of a new foreign agent, with a new network address, that allows the network layer in a mobile node to learn that it has moved into a new foreign network. This process is known as agent discovery. Agent discovery can be accomplished in one of two ways: via agent advertisement or via agent solicitation. With agent advertisement, a foreign or home agent advertises its services using an extension to the existing router discovery protocol \[RFC 1256\]. The agent periodically broadcasts an ICMP message with a type field of 9 (router discovery) on all links to which it is connected. The router discovery message contains the IP address of the router (that is, the agent), thus allowing a mobile node to learn the agent's IP address. The router discovery message also contains a mobility agent advertisement extension that contains additional information needed by the mobile node. Among the more important fields in the extension are the following: Home agent bit (H). Indicates that the agent is a home agent for the network in which it resides. Foreign agent bit (F). Indicates that the agent is a foreign agent for the network in which it resides. Registration required bit (R). Indicates that a mobile user in this network must register with a foreign agent. In particular, a mobile user cannot obtain a care-of address in the foreign network (for example, using DHCP) and assume the functionality of the foreign agent for itself, without registering with the foreign agent. M, G encapsulation bits. Indicate whether a form of encapsulation other than IP-in-IP encapsulation will be used. Care-of address (COA) fields. A list of one or more care-of addresses provided by the foreign agent. In our example below, the COA will be associated with the foreign agent, who will receive datagrams sent to the COA and then forward them to the appropriate mobile node. The mobile user will select one of these addresses as its COA when registering with its home agent. Figure 7.28 illustrates some of the key fields in the agent advertisement message.

Figure 7.28 ICMP router discovery message with mobility agent advertisement extension
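A short Python sketch shows how a mobile node might pull the H, F, R, M, and G bits and the COA list out of a received extension. The offsets and bit positions below follow our reading of RFC 5944's mobility agent advertisement extension (type 16; sequence number; registration lifetime; a flags byte; then zero or more COAs); treat this as a sketch and consult the RFC and Figure 7.28 for the authoritative layout.

```python
# Parsing a mobility agent advertisement extension (illustrative sketch;
# bit positions follow RFC 5944: flags byte is R|B|H|F|M|G|r|T).
import struct

def parse_mobility_extension(data: bytes) -> dict:
    ext_type, length, seq, lifetime, flags, _rsvd = \
        struct.unpack_from("!BBHHBB", data)
    assert ext_type == 16, "not a mobility agent advertisement extension"
    n_coas = (length - 6) // 4        # length = 6 + 4 * (number of COAs)
    coas = [".".join(map(str, struct.unpack_from("!BBBB", data, 8 + 4 * i)))
            for i in range(n_coas)]
    return {
        "registration_required": bool(flags & 0x80),  # R bit
        "home_agent":            bool(flags & 0x20),  # H bit
        "foreign_agent":         bool(flags & 0x10),  # F bit
        "min_encap":             bool(flags & 0x08),  # M bit
        "gre_encap":             bool(flags & 0x04),  # G bit
        "sequence": seq, "lifetime": lifetime, "care_of_addresses": coas,
    }

# Example advertisement: foreign agent, registration required, one COA.
sample = struct.pack("!BBHHBB4B", 16, 10, 1, 3600, 0x90, 0, 79, 129, 13, 2)
print(parse_mobility_extension(sample))
```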
+With agent solicitation, a mobile node wanting to learn about agents +without waiting to receive an agent advertisement can broadcast an agent +solicitation message, which is simply an ICMP message with type value +10. An agent receiving the solicitation will unicast an agent +advertisement directly to the mobile node, which can then proceed as if +it had received an unsolicited advertisement. Registration with the Home +Agent Once a mobile IP node has received a COA, that address must be +registered with the home agent. This can be done either via the foreign +agent (who then registers the COA with the home agent) or directly by +the mobile IP node itself. We consider the former case below. Four steps +are involved. + +1. Following the receipt of a foreign agent advertisement, a mobile + node sends a mobile IP registration message to the foreign agent. + The registration message is carried within a UDP datagram and sent + to port 434. The registration message carries a COA advertised by + the foreign agent, the address of the home agent (HA), the permanent + address of the mobile node (MA), the requested lifetime of the + registration, and a 64-bit registration identification. The + requested registration lifetime is the number of seconds that the + registration is to be valid. If the registration is not renewed at + the home agent within the specified lifetime, the registration will + become invalid. The registration identifier acts like a sequence + number and serves to match a received registration reply with a + registration request, as discussed below. + +2. The foreign agent receives the registration message and records the + mobile node's permanent IP address. The foreign agent now knows that + it should be looking for datagrams containing an encapsulated + datagram whose destination address matches the permanent address of + the mobile node. The foreign agent then sends a mobile IP + registration message (again, within a UDP datagram) to port 434 of + the home agent. The message contains the COA, HA, MA, encapsulation + format requested, requested registration lifetime, and registration + identification. + +3. The home agent receives the registration request and checks for + authenticity and correctness. The home agent binds the mobile node's + permanent IP address with the COA; in the future, datagrams arriving + at the home agent and addressed to the mobile node will now be + encapsulated and tunneled to the COA. The home agent sends a mobile + IP registration reply containing the HA, MA, actual registration + lifetime, and the registration identification of the request that is + being satisfied with this reply. + +4. The foreign agent receives the registration reply and then forwards + it to the mobile node. + +At this point, registration is complete, and the mobile node can receive +datagrams sent to its permanent address. Figure 7.29 illustrates these +steps. Note that the home agent specifies a lifetime that is smaller +than the lifetime requested by the mobile node. A foreign agent need not +explicitly deregister a COA when a mobile node leaves its network. This +will occur automatically, when the mobile node moves to a new network +(whether another foreign network or its home network) and registers a +new COA. The mobile IP standard allows many additional scenarios and +capabilities in addition to those described previously. The interested +reader should consult \[Perkins 1998b; RFC 5944\]. 
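The following Python sketch packs a registration request carrying exactly the fields named in step 1: the COA, the home agent address (HA), the mobile's permanent address (MA), a requested lifetime, and a 64-bit identification, sent in a UDP datagram to port 434. The field layout loosely follows RFC 5944's registration request; the addresses are examples, and the authentication extension that a real registration must carry is omitted here, consistent with our decision to defer security to Chapter 8.

```python
# A sketch of step 1: the mobile node sends a registration request to
# the foreign agent on UDP port 434. Addresses are illustrative.
import os
import socket
import struct

def build_registration_request(home_addr, home_agent, coa, lifetime_s):
    ident = os.urandom(8)   # 64-bit identification; matched in the reply
    # type = 1 (registration request); flags = 0 (default IP-in-IP
    # encapsulation); lifetime in seconds (16 bits).
    fixed = struct.pack("!BBH", 1, 0, lifetime_s)
    addrs = b"".join(socket.inet_aton(a)
                     for a in (home_addr, home_agent, coa))
    return fixed + addrs + ident          # 24 bytes before extensions

msg = build_registration_request(
    home_addr="128.119.40.186",   # MA: permanent address
    home_agent="128.119.40.7",    # HA
    coa="79.129.13.2",            # COA advertised by the foreign agent
    lifetime_s=9999)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(msg, ("79.129.13.1", 434))    # foreign agent (example IP)
```

In step 2 the foreign agent relays an analogous message (again to UDP port 434) to the home agent, which is why the identification field is needed to match replies to requests.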
Figure 7.29 Agent advertisement and mobile IP registration

7.7 Managing Mobility in Cellular Networks Having examined how mobility is managed in IP networks, let's now turn our attention to networks with an even longer history of supporting mobility---cellular telephony networks. Whereas we focused on the first-hop wireless link in cellular networks in Section 7.4, we'll focus here on mobility, using the GSM cellular network \[Goodman 1997; Mouly 1992; Scourias 2012; Kaaranen 2001; Korhonen 2003; Turner 2012\] as our case study, since it is a mature and widely deployed technology. Mobility in 3G and 4G networks is similar in principle to that used in GSM. As in the case of mobile IP, we'll see that a number of the fundamental principles we identified in Section 7.5 are embodied in GSM's network architecture. Like mobile IP, GSM adopts an indirect routing approach (see Section 7.5.2), first routing the correspondent's call to the mobile user's home network and from there to the visited network. In GSM terminology, the mobile user's home network is referred to as the mobile user's home public land mobile network (home PLMN). Since the PLMN acronym is a bit of a mouthful, and mindful of our quest to avoid an alphabet soup of acronyms, we'll refer to the GSM home PLMN simply as the home network. The home network is the cellular provider with which the mobile user has a subscription (i.e., the provider that bills the user for monthly cellular service). The visited PLMN, which we'll refer to simply as the visited network, is the network in which the mobile user is currently residing. As in the case of mobile IP, the responsibilities of the home and visited networks are quite different. The home network maintains a database known as the home location register (HLR), which contains the permanent cell phone number and subscriber profile information for each of its subscribers. Importantly, the HLR also contains information about the current locations of these subscribers. That is, if a mobile user is currently roaming in another provider's cellular network, the HLR contains enough information to obtain (via a process we'll describe shortly) an address in the visited network to which a call to the mobile user should be routed. As we'll see, a special switch in the home network, known as the Gateway Mobile services Switching Center (GMSC), is contacted by a correspondent when a call is placed to a mobile user. Again, in our quest to avoid an alphabet soup of acronyms, we'll refer to the GMSC here by a more descriptive term, home MSC. The visited network maintains a database known as the visitor location register (VLR). The VLR contains an entry for each mobile user that is currently in the portion of the network served by the VLR. VLR entries thus come and go as mobile users enter and leave the network. A VLR is usually co-located with the mobile switching center (MSC) that coordinates the setup of a call to and from the visited network.

In practice, a provider's cellular network will serve as a home network for its subscribers and as a visited network for mobile users whose subscription is with a different cellular provider.

Figure 7.30 Placing a call to a mobile user: Indirect routing

7.7.1 Routing Calls to a Mobile User We're now in a position to describe how a call is placed to a mobile GSM user in a visited network. We'll consider a simple example below; more complex scenarios are described in \[Mouly 1992\].
The steps, as illustrated in Figure 7.30, are as follows:

1. The correspondent dials the mobile user's phone number. This number
   itself does not refer to a particular telephone line or location
   (after all, the phone number is fixed and the user is mobile!). The
   leading digits in the number are sufficient to globally identify the
   mobile's home network. The call is routed from the correspondent
   through the PSTN to the home MSC in the mobile's home network. This
   is the first leg of the call.

2. The home MSC receives the call and interrogates the HLR to determine
   the location of the mobile user. In the simplest case, the HLR
   returns the mobile station roaming number (MSRN), which we will
   refer to as the roaming number. Note that this number is different
   from the mobile's permanent phone number, which is associated with
   the mobile's home network. The roaming number is ephemeral: It is
   temporarily assigned to a mobile when it enters a visited network.
   The roaming number serves a role similar to that of the care-of
   address in mobile IP and, like the COA, is invisible to the
   correspondent and the mobile. If the HLR does not have the roaming
   number, it returns the address of the VLR in the visited network. In
   this case (not shown in Figure 7.30), the home MSC will need to
   query the VLR to obtain the roaming number of the mobile node. But
   how does the HLR get the roaming number or the VLR address in the
   first place? What happens to these values when the mobile user moves
   to another visited network? We'll consider these important questions
   shortly.

3. Given the roaming number, the home MSC sets up the second leg of the
   call through the network to the MSC in the visited network. The call
   is completed, being routed from the correspondent to the home MSC,
   and from there to the visited MSC, and from there to the base
   station serving the mobile user.

An unresolved question in step 2 is how the HLR obtains information about the location of the mobile user. When a mobile telephone is switched on or enters a part of a visited network that is covered by a new VLR, the mobile must register with the visited network. This is done through the exchange of signaling messages between the mobile and the VLR. The visited VLR, in turn, sends a location update request message to the mobile's HLR. This message informs the HLR of either the roaming number at which the mobile can be contacted, or the address of the VLR (which can then later be queried to obtain the mobile number). As part of this exchange, the VLR also obtains subscriber information from the HLR about the mobile and determines what services (if any) should be accorded the mobile user by the visited network.
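The HLR lookup in step 2, including the indirect case where only a VLR address is stored, can be sketched as follows. The function and variable names here are our own; in a real GSM network these steps are signaling exchanges (and a location update, not a Python call) between the VLR, HLR, and home MSC.

```python
# A sketch of step 2: the home MSC interrogates the HLR, which returns
# either the roaming number (MSRN) or the visited VLR's address, which
# must then itself be queried. Illustrative names and numbers only.

HLR = {}  # permanent number -> {"msrn": ..., "vlr": ...}

def location_update(permanent_number, msrn=None, vlr=None):
    """Sent by the visited VLR when the mobile registers."""
    HLR[permanent_number] = {"msrn": msrn, "vlr": vlr}

def query_vlr(vlr_addr, permanent_number):
    # Placeholder for the signaling exchange with the visited VLR.
    return f"msrn-assigned-by-{vlr_addr}"

def route_call(permanent_number):
    """Home MSC resolves a dialed number to a routable roaming number."""
    entry = HLR[permanent_number]
    if entry["msrn"]:                                  # simplest case
        return entry["msrn"]
    return query_vlr(entry["vlr"], permanent_number)   # indirect case

location_update("+1-413-555-0199", msrn="+39-06-555-7171")
assert route_call("+1-413-555-0199") == "+39-06-555-7171"
```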
7.7.2 Handoffs in GSM A handoff occurs when a mobile station changes its association from one base station to another during a call. As shown in Figure 7.31, a mobile's call is initially (before handoff) routed to the mobile through one base station (which we'll refer to as the old base station), and after handoff is routed to the mobile through another base station (which we'll refer to as the new base station).

Figure 7.31 Handoff scenario between base stations with a common MSC

Note that a handoff between base stations results not only in the mobile transmitting/receiving to/from a new base station, but also in the rerouting of the ongoing call from a switching point within the network to the new base station. Let's initially assume that the old and new base stations share the same MSC, and that the rerouting occurs at this MSC. There may be several reasons for handoff to occur, including (1) the signal between the current base station and the mobile may have deteriorated to such an extent that the call is in danger of being dropped, and (2) a cell may have become overloaded, handling a large number of calls. This congestion may be alleviated by handing off mobiles to less congested nearby cells. While it is associated with a base station, a mobile periodically measures the strength of a beacon signal from its current base station as well as beacon signals from nearby base stations that it can "hear." These measurements are reported once or twice a second to the mobile's current base station. Handoff in GSM is initiated by the old base station based on these measurements, the current loads of mobiles in nearby cells, and other factors \[Mouly 1992\]. The GSM standard does not specify the specific algorithm to be used by a base station to determine whether or not to perform handoff. Figure 7.32 illustrates the steps involved when a base station does decide to hand off a mobile user:

1. The old base station (BS) informs the visited MSC that a handoff is
   to be performed and the BS (or possible set of BSs) to which the
   mobile is to be handed off.

2. The visited MSC initiates path setup to the new BS, allocating the
   resources needed to carry the rerouted call, and signaling the new
   BS that a handoff is about to occur.

3. The new BS allocates and activates a radio channel for use by the
   mobile.

4. The new BS signals back to the visited MSC and the old BS that the
   visited-MSC-to-new-BS path has been established and that the mobile
   should be informed of the impending handoff. The new BS provides all
   of the information that the mobile will need to associate with the
   new BS.

5. The mobile is informed that it should perform a handoff. Note that
   up until this point, the mobile has been blissfully unaware that the
   network has been laying the groundwork (e.g., allocating a channel
   in the new BS and allocating a path from the visited MSC to the new
   BS) for a handoff.

6. The mobile and the new BS exchange one or more messages to fully
   activate the new channel in the new BS.

7. The mobile sends a handoff complete message to the new BS, which is
   forwarded up to the visited MSC. The visited MSC then reroutes the
   ongoing call to the mobile via the new BS.

8. The resources allocated along the path to the old BS are then
   released.

Figure 7.32 Steps in accomplishing a handoff between base stations with a common MSC

Let's conclude our discussion of handoff by considering what happens when the mobile moves to a BS that is associated with a different MSC than the old BS, and what happens when this inter-MSC handoff occurs more than once. As shown in Figure 7.33, GSM defines the notion of an anchor MSC. The anchor MSC is the MSC visited by the mobile when a call first begins; the anchor MSC thus remains unchanged during the call. Throughout the call's duration and regardless of the number of inter-MSC transfers performed by the mobile, the call is routed from the home MSC to the anchor MSC, and then from the anchor MSC to the visited MSC where the mobile is currently located. When a mobile moves from the coverage area of one MSC to another, the ongoing call is rerouted from the anchor MSC to the new visited MSC containing the new base station. Thus, at all times there are at most three MSCs (the home MSC, the anchor MSC, and the visited MSC) between the correspondent and the mobile. Figure 7.33 illustrates the routing of a call among the MSCs visited by a mobile user.

Figure 7.33 Rerouting via the anchor MSC

Rather than maintaining a single MSC hop from the anchor MSC to the current MSC, an alternative approach would have been to simply chain the MSCs visited by the mobile, having an old MSC forward the ongoing call to the new MSC each time the mobile moves to a new MSC. Such MSC chaining can in fact occur in IS-41 cellular networks, with an optional path minimization step to remove MSCs between the anchor MSC and the current visited MSC \[Lin 2001\].
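The contrast between the anchor-MSC approach and MSC chaining is easy to see in a few lines of Python. This is a toy model with made-up MSC names: with an anchor MSC the call path is always home MSC, anchor MSC, current visited MSC (at most three switches), while chaining appends a switch on every inter-MSC move.

```python
# A toy comparison of call paths: anchor MSC versus IS-41-style chaining.
# MSC names are illustrative.

def path_with_anchor(home, anchor, visited_sequence):
    current = visited_sequence[-1]
    # At most three MSCs: home -> anchor -> current visited MSC.
    return [home, anchor] if current == anchor else [home, anchor, current]

def path_with_chaining(home, visited_sequence):
    # Every MSC the mobile has visited stays on the forwarding path.
    return [home] + visited_sequence

visits = ["MSC-A", "MSC-B", "MSC-C"]     # MSC-A is where the call began
print(path_with_anchor("home-MSC", "MSC-A", visits))
# ['home-MSC', 'MSC-A', 'MSC-C']
print(path_with_chaining("home-MSC", visits))
# ['home-MSC', 'MSC-A', 'MSC-B', 'MSC-C']  -- grows with each handoff
```

The optional path minimization step mentioned above can be thought of as collapsing the chained path back toward the three-MSC anchor path.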
Let's wrap up our discussion of GSM mobility management with a comparison of mobility management in GSM and Mobile IP. The comparison in Table 7.2 indicates that although IP and cellular networks are fundamentally different in many ways, they share a surprising number of common functional elements and overall approaches in handling mobility.

Table 7.2 Commonalities between mobile IP and GSM mobility

| GSM element | Comment on GSM element | Mobile IP element |
| --- | --- | --- |
| Home system | Network to which the mobile user's permanent phone number belongs. | Home network |
| Gateway mobile switching center or simply home MSC, Home location register (HLR) | Home MSC: point of contact to obtain routable address of mobile user. HLR: database in home system containing permanent phone number, profile information, current location of mobile user, subscription information. | Home agent |
| Visited system | Network other than home system where mobile user is currently residing. | Visited network |
| Visited mobile services switching center, Visitor location register (VLR) | Visited MSC: responsible for setting up calls to/from mobile nodes in cells associated with MSC. VLR: temporary database entry in visited system, containing subscription information for each visiting mobile user. | Foreign agent |
| Mobile station roaming number (MSRN) or simply roaming number | Routable address for telephone call segment between home MSC and visited MSC, visible to neither the mobile nor the correspondent. | Care-of address |

7.8 Wireless and Mobility: Impact on Higher-Layer Protocols In this chapter, we've seen that wireless networks differ significantly from their wired counterparts at both the link layer (as a result of wireless channel characteristics such as fading, multipath, and hidden terminals) and at the network layer (as a result of mobile users who change their points of attachment to the network). But are there important differences at the transport and application layers? It's tempting to think that these differences will be minor, since the network layer provides the same best-effort delivery service model to upper layers in both wired and wireless networks. Similarly, if protocols such as TCP or UDP are used to provide transport-layer services to applications in both wired and wireless networks, then the application layer should remain unchanged as well. In one sense our intuition is right---TCP and UDP can (and do) operate in networks with wireless links.
On the other hand, transport protocols in general, and TCP in particular, can sometimes have very different performance in wired and wireless networks, and it is here, in terms of performance, that differences are manifested. Let's see why. Recall that TCP retransmits a segment that is either lost or corrupted on the path between sender and receiver. In the case of mobile users, loss can result from either network congestion (router buffer overflow) or from handoff (e.g., from delays in rerouting segments to a mobile's new point of attachment to the network). In all cases, TCP's receiver-to-sender feedback (duplicate ACKs, or the absence of an ACK leading to a timeout) indicates only that a segment was not received intact; the sender is unaware of whether the segment was lost due to congestion, during handoff, or due to detected bit errors. In all cases, the sender's response is the same---to retransmit the segment. TCP's congestion-control response is also the same in all cases---TCP decreases its congestion window, as discussed in Section 3.7. By unconditionally decreasing its congestion window, TCP implicitly assumes that segment loss results from congestion rather than corruption or handoff. We saw in Section 7.2 that bit errors are much more common in wireless networks than in wired networks. When such bit errors occur or when handoff loss occurs, there's really no reason for the TCP sender to decrease its congestion window (and thus decrease its sending rate). Indeed, it may well be the case that router buffers are empty and packets are flowing along the end-to-end path unimpeded by congestion. Researchers realized in the early to mid 1990s that given high bit error rates on wireless links and the possibility of handoff loss, TCP's congestion-control response could be problematic in a wireless setting. Three broad classes of approaches are possible for dealing with this problem: Local recovery. Local recovery protocols recover from bit errors when and where (e.g., at the wireless link) they occur, e.g., the 802.11 ARQ protocol we studied in Section 7.3, or more sophisticated approaches that use both ARQ and FEC \[Ayanoglu 1995\].

TCP sender awareness of wireless links. In the local recovery approaches, the TCP sender is blissfully unaware that its segments are traversing a wireless link. An alternative approach is for the TCP sender and receiver to be aware of the existence of a wireless link, to distinguish between congestive losses occurring in the wired network and corruption/loss occurring at the wireless link, and to invoke congestion control only in response to congestive wired-network losses. \[Balakrishnan 1997\] investigates various types of TCP, assuming that end systems can make this distinction. \[Liu 2003\] investigates techniques for distinguishing between losses on the wired and wireless segments of an end-to-end path. Split-connection approaches. In a split-connection approach \[Bakre 1995\], the end-to-end connection between the mobile user and the other end point is broken into two transport-layer connections: one from the mobile host to the wireless access point, and one from the wireless access point to the other communication end point (which we'll assume here is a wired host). The end-to-end connection is thus formed by the concatenation of a wireless part and a wired part. The transport layer over the wireless segment can be a standard TCP connection \[Bakre 1995\], or a specially tailored error recovery protocol on top of UDP.
\[Yavatkar 1994\] investigates +the use of a transport-layer selective repeat protocol over the wireless +connection. Measurements reported in \[Wei 2006\] indicate that split +TCP connections are widely used in cellular data networks, and that +significant improvements can indeed be made through the use of split TCP +connections. Our treatment of TCP over wireless links has been +necessarily brief here. In-depth surveys of TCP challenges and solutions +in wireless networks can be found in \[Hanabali 2005; Leung 2006\]. We +encourage you to consult the references for details of this ongoing area +of research. Having considered transport-layer protocols, let us next +consider the effect of wireless and mobility on application-layer +protocols. Here, an important consideration is that wireless links often +have relatively low bandwidths, as we saw in Figure 7.2. As a result, +applications that operate over wireless links, particularly over +cellular wireless links, must treat bandwidth as a scarce commodity. For +example, a Web server serving content to a Web browser executing on a 4G +phone will likely not be able to provide the same image-rich content +that it gives to a browser operating over a wired connection. Although +wireless links do provide challenges at the application layer, the +mobility they enable also makes possible a rich set of location-aware +and context-aware applications \[Chen 2000; Baldauf 2007\]. More +generally, wireless and mobile networks will play a key role in +realizing the ubiquitous computing environments of the future \[Weiser +1991\]. It's fair to say that we've only seen the tip of the iceberg +when it comes to the impact of wireless and mobile networks on networked +applications and their protocols! + +7.9 Summary Wireless and mobile networks have revolutionized telephony +and are having an increasingly profound impact in the world of computer +networks as well. With their anytime, anywhere, untethered access into +the global network infrastructure, they are not only making network +access more ubiquitous, they are also enabling an exciting new set of +location-dependent services. Given the growing importance of wireless +and mobile networks, this chapter has focused on the principles, common +link technologies, and network architectures for supporting wireless and +mobile communication. We began this chapter with an introduction to +wireless and mobile networks, drawing an important distinction between +the challenges posed by the wireless nature of the communication links +in such networks, and by the mobility that these wireless links enable. +This allowed us to better isolate, identify, and master the key concepts +in each area. We focused first on wireless communication, considering +the characteristics of a wireless link in Section 7.2. In Sections 7.3 +and 7.4, we examined the link-level aspects of the IEEE 802.11 (WiFi) +wireless LAN standard, two IEEE 802.15 personal area networks (Bluetooth +and Zigbee), and 3G and 4G cellular Internet access. We then turned our +attention to the issue of mobility. In Section 7.5, we identified +several forms of mobility, with points along this spectrum posing +different challenges and admitting different solutions. We considered +the problems of locating and routing to a mobile user, as well as +approaches for handing off the mobile user who dynamically moves from +one point of attachment to the network to another. 
We examined how these issues were addressed in the mobile IP standard and in GSM, in Sections 7.6 and 7.7, respectively. Finally, we considered the impact of wireless links and mobility on transport-layer protocols and networked applications in Section 7.8. Although we have devoted an entire chapter to the study of wireless and mobile networks, an entire book (or more) would be required to fully explore this exciting and rapidly expanding field. We encourage you to delve more deeply into this field by consulting the many references provided in this chapter.

Homework Problems and Questions

Chapter 7 Review Questions

Section 7.1 R1. What does it mean for a wireless network to be operating in "infrastructure mode"? If the network is not in infrastructure mode, what mode of operation is it in, and what is the difference between that mode of operation and infrastructure mode? R2. What are the four types of wireless networks identified in our taxonomy in Section 7.1? Which of these types of wireless networks have you used?

Section 7.2 R3. What are the differences between the following types of wireless channel impairments: path loss, multipath propagation, interference from other sources? R4. As a mobile node gets farther and farther away from a base station, what are two actions that a base station could take to ensure that the loss probability of a transmitted frame does not increase?

Sections 7.3 and 7.4 R5. Describe the role of the beacon frames in 802.11. R6. True or false: Before an 802.11 station transmits a data frame, it must first send an RTS frame and receive a corresponding CTS frame. R7. Why are acknowledgments used in 802.11 but not in wired Ethernet? R8. True or false: Ethernet and 802.11 use the same frame structure. R9. Describe how the RTS threshold works. R10. Suppose the IEEE 802.11 RTS and CTS frames were as long as the standard DATA and ACK frames. Would there be any advantage to using the CTS and RTS frames? Why or why not? R11. Section 7.3.4 discusses 802.11 mobility, in which a wireless station moves from one BSS to another within the same subnet. When the APs are interconnected with a switch, an AP may need to send a frame with a spoofed MAC address to get the switch to forward the frame properly. Why?

R12. What are the differences between a master device in a Bluetooth network and a base station in an 802.11 network? R13. What is meant by a super frame in the 802.15.4 Zigbee standard? R14. What is the role of the "core network" in the 3G cellular data architecture? R15. What is the role of the RNC in the 3G cellular data network architecture? What role does the RNC play in the cellular voice network? R16. What is the role of the eNodeB, MME, P-GW, and S-GW in 4G architecture? R17. What are three important differences between the 3G and 4G cellular architectures?

Sections 7.5 and 7.6 R18. If a node has a wireless connection to the Internet, does that node have to be mobile? Explain. Suppose that a user with a laptop walks around her house with her laptop, and always accesses the Internet through the same access point. Is this user mobile from a network standpoint? Explain. R19. What is the difference between a permanent address and a care-of address? Who assigns a care-of address? R20. Consider a TCP connection going over Mobile IP.
True or false: The TCP connection phase between the correspondent and the mobile host goes through the mobile's home network, but the data transfer phase is directly between the correspondent and the mobile host, bypassing the home network.

Section 7.7 R21. What are the purposes of the HLR and VLR in GSM networks? What elements of mobile IP are similar to the HLR and VLR? R22. What is the role of the anchor MSC in GSM networks?

Section 7.8 R23. What are three approaches that can be taken to avoid having a single wireless link degrade the performance of an end-to-end transport-layer TCP connection?

Problems P1. Consider the single-sender CDMA example in Figure 7.5. What would be the sender's output (for the 2 data bits shown) if the sender's CDMA code were (1, −1, 1, −1, 1, −1, 1, −1)? P2. Consider sender 2 in Figure 7.6. What is the sender's output to the channel (before it is added to the signal from sender 1), Z^2_{i,m}? P3. Suppose that the receiver in Figure 7.6 wanted to receive the data being sent by sender 2. Show (by calculation) that the receiver is indeed able to recover sender 2's data from the aggregate channel signal by using sender 2's code. P4. For the two-sender, two-receiver example, give an example of two CDMA codes containing 1 and −1 values that do not allow the two receivers to extract the original transmitted bits from the two CDMA senders. P5. Suppose there are two ISPs providing WiFi access in a particular café, with each ISP operating its own AP and having its own IP address block.

a. Further suppose that by accident, each ISP has configured its AP to
   operate over channel 11. Will the 802.11 protocol completely break
   down in this situation? Discuss what happens when two stations, each
   associated with a different ISP, attempt to transmit at the same
   time.

b. Now suppose that one AP operates over channel 1 and the other over
   channel 11. How do your answers change?

P6. In step 4 of the CSMA/CA protocol, a station that successfully transmits a frame begins the CSMA/CA protocol for a second frame at step 2, rather than at step 1. What rationale might the designers of CSMA/CA have had in mind by having such a station not transmit the second frame immediately (if the channel is sensed idle)? P7. Suppose an 802.11b station is configured to always reserve the channel with the RTS/CTS sequence. Suppose this station suddenly wants to transmit 1,000 bytes of data, and all other stations are idle at this time. As a function of SIFS and DIFS, and ignoring propagation delay and assuming no bit errors, calculate the time required to transmit the frame and receive the acknowledgment. P8. Consider the scenario shown in Figure 7.34, in which there are four wireless nodes, A, B, C, and D. The radio coverage of the four nodes is shown via the shaded ovals; all nodes share the same frequency. When A transmits, it can only be heard/received by B; when B transmits, both A and C can hear/receive from B; when C transmits, both B and D can hear/receive from C; when D transmits, only C can hear/receive from D.

Figure 7.34 Scenario for problem P8

Suppose now that each node has an infinite supply of messages that it wants to send to each of the other nodes. If a message's destination is not an immediate neighbor, then the message must be relayed. For example, if A wants to send to D, a message from A must first be sent to B, which then sends the message to C, which then sends the message to D.
Time is slotted, with a message transmission time taking exactly one time slot, e.g., as in slotted Aloha. During a slot, a node can do one of the following: (i) send a message, (ii) receive a message (if exactly one message is being sent to it), (iii) remain silent. As always, if a node hears two or more simultaneous transmissions, a collision occurs and none of the transmitted messages are received successfully. You can assume here that there are no bit-level errors, and thus if exactly one message is sent, it will be received correctly by those within the transmission radius of the sender.

a. Suppose now that an omniscient controller (i.e., a controller that
   knows the state of every node in the network) can command each node
   to do whatever it (the omniscient controller) wishes, i.e., to send
   a message, to receive a message, or to remain silent. Given this
   omniscient controller, what is the maximum rate at which a data
   message can be transferred from C to A, given that there are no
   other messages between any other source/destination pairs?

b. Suppose now that A sends messages to B, and D sends messages to C.
   What is the combined maximum rate at which data messages can flow
   from A to B and from D to C?

c. Suppose now that A sends messages to B, and C sends messages to D.
   What is the combined maximum rate at which data messages can flow
   from A to B and from C to D?

d. Suppose now that the wireless links are replaced by wired links.
   Repeat questions (a) through (c) again in this wired scenario.

e. Now suppose we are again in the wireless scenario, and that for
   every data message sent from source to destination, the destination
   will send an ACK message back to the source (e.g., as in TCP). Also
   suppose that each ACK message takes up one slot. Repeat questions
   (a)--(c) above for this scenario.

P9. Describe the format of the 802.15.1 Bluetooth frame. You will have to do some reading outside of the text to find this information. Is there anything in the frame format that inherently limits the number of active nodes in an 802.15.1 network to eight active nodes? Explain. P10. Consider the following idealized LTE scenario. The downstream channel (see Figure 7.21) is slotted in time, across F frequencies. There are four nodes, A, B, C, and D, reachable from the base station at rates of 10 Mbps, 5 Mbps, 2.5 Mbps, and 1 Mbps, respectively, on the downstream channel. These rates assume that the base station utilizes all time slots available on all F frequencies to send to just one station. The base station has an infinite amount of data to send to each of the nodes, and can send to any one of these four nodes using any of the F frequencies during any time slot in the downstream sub-frame.

a. What is the maximum rate at which the base station can send to the
   nodes, assuming it can send to any node it chooses during each time
   slot? Is your solution fair? Explain and define what you mean by
   "fair."

b. If there is a fairness requirement that each node must receive an
   equal amount of data during each one second interval, what is the
   average transmission rate by the base station (to all nodes) during
   the downstream sub-frame? Explain how you arrived at your answer.

c. Suppose that the fairness criterion is that any node can receive at
   most twice as much data as any other node during the sub-frame. What
   is the average transmission rate by the base station (to all nodes)
   during the sub-frame?
Explain how you arrived at your answer.

P11. In Section 7.5, one proposed solution that allowed mobile users to maintain their IP addresses as they moved among foreign networks was to have a foreign network advertise a highly specific route to the mobile user and use the existing routing infrastructure to propagate this information throughout the network. We identified scalability as one concern. Suppose that when a mobile user moves from one network to another, the new foreign network advertises a specific route to the mobile user, and the old foreign network withdraws its route. Consider how routing information propagates in a distance-vector algorithm (particularly for the case of interdomain routing among networks that span the globe).

a. Will other routers be able to route datagrams immediately to the new
   foreign network as soon as the foreign network begins advertising
   its route?

b. Is it possible for different routers to believe that different
   foreign networks contain the mobile user?

c. Discuss the timescale over which other routers in the network will
   eventually learn the path to the mobile user.

P12. Suppose the correspondent in Figure 7.23 were mobile. Sketch the additional network-layer infrastructure that would be needed to route the datagram from the original mobile user to the (now mobile) correspondent. Show the structure of the datagram(s) between the original mobile user and the (now mobile) correspondent, as in Figure 7.24. P13. In mobile IP, what effect will mobility have on end-to-end delays of datagrams between the source and destination? P14. Consider the chaining example discussed at the end of Section 7.7.2. Suppose a mobile user visits foreign networks A, B, and C, and that a correspondent begins a connection to the mobile user when it is resident in foreign network A. List the sequence of messages between foreign agents, and between foreign agents and the home agent as the mobile user moves from network A to network B to network C. Next, suppose chaining is not performed, and the correspondent (as well as the home agent) must be explicitly notified of the changes in the mobile user's care-of address. List the sequence of messages that would need to be exchanged in this second scenario.

P15. Consider two mobile nodes in a foreign network having a foreign agent. Is it possible for the two mobile nodes to use the same care-of address in mobile IP? Explain your answer. P16. In our discussion of how the VLR updated the HLR with information about the mobile's current location, what are the advantages and disadvantages of providing the MSRN as opposed to the address of the VLR to the HLR?

Wireshark Lab At the Web site for this textbook, www.pearsonhighered.com/cs-resources, you'll find a Wireshark lab for this chapter that captures and studies the 802.11 frames exchanged between a wireless laptop and an access point.

AN INTERVIEW WITH... Deborah Estrin Deborah Estrin is a Professor of Computer Science at Cornell Tech in New York City and a Professor of Public Health at Weill Cornell Medical College. She is founder of the Health Tech Hub at Cornell Tech and co-founder of the non-profit startup Open mHealth. She received her Ph.D. (1985) in Computer Science from M.I.T. and her B.S. (1980) from UC Berkeley. Estrin's early research focused on the design of network protocols, including multicast and inter-domain routing.
In 2002 Estrin founded the NSF-funded Science and Technology Center at UCLA, the Center for Embedded Networked Sensing (CENS, http://cens.ucla.edu). CENS launched new areas of multi-disciplinary computer systems research from sensor networks for environmental monitoring, to participatory sensing for citizen science. Her current focus is on mobile health and small data, leveraging the pervasiveness of mobile devices and digital interactions for health and life management, as described in her 2013 TEDMED talk. Professor Estrin is an elected member of the American Academy of Arts and Sciences (2007) and the National Academy of Engineering (2009). She is a fellow of the IEEE, ACM, and AAAS. She was selected as the first ACM-W Athena Lecturer (2006), awarded the Anita Borg Institute's Women of Vision Award for Innovation (2007), inducted into the WITI hall of fame (2008) and awarded Doctor Honoris Causa from EPFL (2008) and Uppsala University (2011).

Please describe a few of the most exciting projects you have worked on during your career. What were the biggest challenges? In the mid-90s at USC and ISI, I had the great fortune to work with the likes of Steve Deering, Mark Handley, and Van Jacobson on the design of multicast routing protocols (in particular, PIM). I tried to carry many of the architectural design lessons from multicast into the design of ecological monitoring arrays, where for the first time I really began to take applications and multidisciplinary research seriously. That interest in jointly innovating in the social and technological space is what interests me so much about my latest area of research, mobile health. The challenges in these projects were as diverse as the problem domains, but what they all had in common was the need to keep our eyes open to whether we had the problem definition right as we iterated between design and deployment, prototype and pilot. None of them were problems that could be solved analytically, with simulation or even in constructed laboratory experiments. They all challenged our ability to retain clean architectures in the presence of messy problems and contexts, and they all called for extensive collaboration. What changes and innovations do you see happening in wireless networks and mobility in the future? In a prior edition of this interview I said that I have never put much faith into predicting the future, but I did go on to speculate that we might see the end of feature phones (i.e., those that are not programmable and are used only for voice and text messaging) as smart phones become more and more powerful and the primary point of Internet access for many---and now not so many years later that is clearly the case. I also predicted that we would see the continued proliferation of embedded SIMs by which all sorts of devices have the ability to communicate via the cellular network at low data rates. While that has occurred, we see many devices and "Internet of Things" that use embedded WiFi and other lower power, shorter range, forms of connectivity to local hubs. I did not anticipate at that time the emergence of a large consumer wearables market. By the time the next edition is published I expect broad proliferation of personal applications that leverage data from IoT and other digital traces. Where do you see the future of networking and the Internet? Again I think it's useful to look both back and forward.
Previously I observed that the +efforts in named data and software-defined networking would emerge to +create a more manageable, evolvable, and richer infrastructure and more +generally represent moving the role of architecture higher up in the +stack. In the beginnings of the Internet, architecture was layer 4 and +below, with + +applications being more siloed/monolithic, sitting on top. Now data and +analytics dominate transport. The adoption of SDN (which I'm really +happy to see is featured in this 7th edition of this book) has been well +beyond what I ever anticipated. However, looking up the stack, our +dominant applications increasingly live in walled gardens, whether +mobile apps or large consumer platforms such as Facebook. As Data +Science and Big Data techniques develop, they might help to lure these +applications out of their silos because of the value in connecting with +other apps and platforms. What people inspired you professionally? There +are three people who come to mind. First, Dave Clark, the secret sauce +and under-sung hero of the Internet community. I was lucky to be around +in the early days to see him act as the "organizing principle" of the +IAB and Internet governance; the priest of rough consensus and running +code. Second, Scott Shenker, for his intellectual brilliance, integrity, +and persistence. I strive for, but rarely attain, his clarity in +defining problems and solutions. He is always the first person I e-mail +for advice on matters large and small. Third, my sister Judy Estrin, who +had the creativity and courage to spend her career bringing ideas and +concepts to market. Without the Judys of the world the Internet +technologies would never have transformed our lives. What are your +recommendations for students who want careers in computer science and +networking? First, build a strong foundation in your academic work, +balanced with any and every real-world work experience you can get. As +you look for a working environment, seek opportunities in problem areas +you really care about and with smart teams that you can learn from. + +Chapter 8 Security in Computer Networks + +Way back in Section 1.6 we described some of the more prevalent and +damaging classes of Internet attacks, including malware attacks, denial +of service, sniffing, source masquerading, and message modification and +deletion. Although we have since learned a tremendous amount about +computer networks, we still haven't examined how to secure networks from +those attacks. Equipped with our newly acquired expertise in computer +networking and Internet protocols, we'll now study in-depth secure +communication and, in particular, how computer networks can be defended +from those nasty bad guys. Let us introduce Alice and Bob, two people +who want to communicate and wish to do so "securely." This being a +networking text, we should remark that Alice and Bob could be two +routers that want to exchange routing tables securely, a client and +server that want to establish a secure transport connection, or two +e-mail applications that want to exchange secure e-mail---all case +studies that we will consider later in this chapter. Alice and Bob are +well-known fixtures in the security community, perhaps because their +names are more fun than a generic entity named "A" that wants to +communicate securely with a generic entity named "B." 
Love affairs, wartime communication, and business transactions are the commonly cited human needs for secure communications; preferring the first to the latter two, we're happy to use Alice and Bob as our sender and receiver, and imagine them in this first scenario. We said that Alice and Bob want to communicate and wish to do so "securely," but what precisely does this mean? As we will see, security (like love) is a many-splendored thing; that is, there are many facets to security. Certainly, Alice and Bob would like for the contents of their communication to remain secret from an eavesdropper. They probably would also like to make sure that when they are communicating, they are indeed communicating with each other, and that if their communication is tampered with by an eavesdropper, this tampering is detected. In the first part of this chapter, we'll cover the fundamental cryptography techniques that allow for encrypting communication, authenticating the party with whom one is communicating, and ensuring message integrity. In the second part of this chapter, we'll examine how the fundamental cryptography principles can be used to create secure networking protocols. Once again taking a top-down approach, we'll examine secure protocols in each of the (top four) layers, beginning with the application layer. We'll examine how to secure e-mail, how to secure a TCP connection, how to provide blanket security at the network layer, and how to secure a wireless LAN. In the third part of this chapter we'll consider operational security,
Extensions to the checksumming techniques that we encountered in reliable transport and data link protocols can be used to provide such message integrity. We will study message integrity in Section 8.3. End-point authentication. Both the sender and receiver should be able to confirm the identity of the other party involved in the communication---to confirm that the other party is indeed who or what they claim to be. Face-to-face human communication solves this problem easily by visual recognition. When communicating entities exchange messages over a medium where they cannot see the other party, authentication is not so simple. When a user wants to access an inbox, how does the mail server verify that the user is the person he or she claims to be? We study end-point authentication in Section 8.4. Operational security. Almost all organizations (companies, universities, and so on) today have networks that are attached to the public Internet. These networks therefore can potentially be compromised. Attackers can attempt to deposit worms into the hosts in the network, obtain corporate secrets, map the internal network configurations, and launch DoS attacks. We'll see in Section 8.9 that operational devices such as firewalls and intrusion detection systems are used to counter attacks against an organization's network. A firewall sits between the organization's network and the public network, controlling packet access to and from the network. An intrusion detection system performs "deep packet inspection," alerting the network administrators about suspicious activity. Having established what we mean by network security, let's next consider exactly what information an intruder may have access to, and what actions can be taken by the intruder. Figure 8.1 illustrates the scenario. Alice, the sender, wants to send data to Bob, the receiver. In order to exchange data securely, while meeting the requirements of confidentiality, end-point authentication, and message integrity, Alice and Bob will exchange control messages and data messages (in much the same way that TCP senders and receivers exchange control segments and data segments).

Figure 8.1 Sender, receiver, and intruder (Alice, Bob, and Trudy)

All or some of these messages will typically be encrypted. As discussed in Section 1.6, an intruder can potentially perform eavesdropping---sniffing and recording control and data messages on the channel---as well as modification, insertion, or deletion of messages or message content. As we'll see, unless appropriate countermeasures are taken, these capabilities allow an intruder to mount a wide variety of security attacks: snooping on communication (possibly stealing passwords and data), impersonating another entity, hijacking an ongoing session, denying service to legitimate network users by overloading system resources, and so on. A summary of reported attacks is maintained at the CERT Coordination Center \[CERT 2016\]. Having established that there are indeed real threats loose in the Internet, what are the Internet equivalents of Alice and Bob, our friends who need to communicate securely? Certainly, Bob and Alice might be human users at two end systems, for example, a real Alice and a real Bob who really do want to exchange secure e-mail. They might also be participants in an electronic commerce transaction. For example, a real Bob might want to transfer his credit card number securely to a Web server to purchase an item online.
Similarly, a real Alice might want to interact with her bank online. The parties needing secure communication might themselves also be part of the network infrastructure. Recall that the domain name system (DNS, see Section 2.4) or routing daemons that exchange routing information (see Chapter 5) require secure communication between two parties. The same is true for network management applications, a topic we examined in Chapter 5. An intruder that could actively interfere with DNS lookups (as discussed in Section 2.4), routing computations \[RFC 4272\], or network management functions \[RFC 3414\] could wreak havoc in the Internet. Having now established the framework, a few of the most important definitions, and the need for network security, let us next delve into cryptography. While the use of cryptography in providing confidentiality is self-evident, we'll see shortly that it is also central to providing end-point authentication and message integrity---making cryptography a cornerstone of network security.

8.2 Principles of Cryptography Although cryptography has a long history dating back at least as far as Julius Caesar, modern cryptographic techniques, including many of those used in the Internet, are based on advances made in the past 30 years. Kahn's book, The Codebreakers \[Kahn 1967\], and Singh's book, The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography \[Singh 1999\], provide a fascinating look at the long history of cryptography. A complete discussion of cryptography itself requires a complete book \[Kaufman 1995; Schneier 1995\] and so we only touch on the essential aspects of cryptography, particularly as they are practiced on the Internet. We also note that while our focus in this section will be on the use of cryptography for confidentiality, we'll see shortly that cryptographic techniques are inextricably woven into authentication, message integrity, nonrepudiation, and more. Cryptographic techniques allow a sender to disguise data so that an intruder can gain no information from the intercepted data. The receiver, of course, must be able to recover the original data from the disguised data. Figure 8.2 illustrates some of the important terminology.

Figure 8.2 Cryptographic components

Suppose now that Alice wants to send a message to Bob. Alice's message in its original form (for example, " Bob, I love you. Alice ") is known as plaintext, or cleartext. Alice encrypts her plaintext message using an encryption algorithm so that the encrypted message, known as ciphertext, looks unintelligible to any intruder. Interestingly, in many modern cryptographic systems, including those used in the Internet, the encryption technique itself is known---published, standardized, and available to everyone (for example, \[RFC 1321; RFC 3447; RFC 2420; NIST 2001\]), even a potential intruder! Clearly, if everyone knows the method for encoding data, then there must be some secret information that prevents an intruder from decrypting the transmitted data. This is where keys come in. In Figure 8.2, Alice provides a key, K_A, a string of numbers or characters, as input to the encryption algorithm. The encryption algorithm takes the key and the plaintext message, m, as input and produces ciphertext as output. The notation K_A(m) refers to the ciphertext form (encrypted using the key K_A) of the plaintext message, m. The actual encryption algorithm that uses key K_A will be evident from the context.
Similarly, Bob will provide a key, KB, to the decryption algorithm that takes the ciphertext and Bob's key as input and produces the original plaintext as output. That is, if Bob receives an encrypted message KA(m), he decrypts it by computing KB(KA(m))=m. In symmetric key systems, Alice's and Bob's keys are identical and are secret. In public key systems, a pair of keys is used. One of the keys is known to both Bob and Alice (indeed, it is known to the whole world). The other key is known only by either Bob or Alice (but not both). In the following two subsections, we consider symmetric key and public key systems in more detail.

8.2.1 Symmetric Key Cryptography

All cryptographic algorithms involve substituting one thing for another, for example, taking a piece of plaintext and then computing and substituting the appropriate ciphertext to create the encrypted message. Before studying a modern key-based cryptographic system, let us first get our feet wet by studying a very old, very simple symmetric key algorithm attributed to Julius Caesar, known as the Caesar cipher (a cipher is a method for encrypting data). For English text, the Caesar cipher would work by taking each letter in the plaintext message and substituting the letter that is k letters later (allowing wraparound; that is, having the letter z followed by the letter a) in the alphabet. For example, if k=3, then the letter a in plaintext becomes d in ciphertext; b in plaintext becomes e in ciphertext, and so on. Here, the value of k serves as the key. As an example, the plaintext message "bob, i love you. Alice" becomes "ere, l oryh brx. dolfh" in ciphertext. While the ciphertext does indeed look like gibberish, it wouldn't take long to break the code if you knew that the Caesar cipher was being used, as there are only 25 possible key values.

An improvement on the Caesar cipher is the monoalphabetic cipher, which also substitutes one letter of the alphabet with another letter of the alphabet. However, rather than substituting according to a regular pattern (for example, substitution with an offset of k for all letters), any letter can be substituted for any other letter, as long as each letter has a unique substitute letter, and vice versa. The substitution rule in Figure 8.3 shows one possible rule for encoding plaintext.

Figure 8.3 A monoalphabetic cipher

The plaintext message "bob, i love you. Alice" becomes "nkn, s gktc wky. Mgsbc." Thus, as in the case of the Caesar cipher, this looks like gibberish. A monoalphabetic cipher would also appear to be better than the Caesar cipher in that there are 26! (on the order of 10^26) possible pairings of letters rather than 25 possible pairings. A brute-force approach of trying all 10^26 possible pairings would require far too much work to be a feasible way of breaking the encryption algorithm and decoding the message. However, by statistical analysis of the plaintext language, for example, knowing that the letters e and t are the most frequently occurring letters in typical English text (accounting for 13 percent and 9 percent of letter occurrences), and knowing that particular two- and three-letter occurrences of letters appear quite often together (for example, "in," "it," "the," "ion," "ing," and so forth), it is relatively easy to break this code.
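To see just how little protection the Caesar cipher offers, consider the following minimal Python sketch (our illustration, not part of the text): it implements the cipher and then breaks it by simply trying all 25 possible keys.

```python
# A minimal sketch (not from the text) of the Caesar cipher for
# lowercase English text, plus the 25-key brute-force attack that
# makes the cipher so weak.

import string

ALPHABET = string.ascii_lowercase

def caesar(text, k):
    """Shift each letter k positions, wrapping z around to a."""
    shifted = {c: ALPHABET[(i + k) % 26] for i, c in enumerate(ALPHABET)}
    return "".join(shifted.get(c, c) for c in text)  # non-letters pass through

ciphertext = caesar("bob, i love you. alice", 3)
print(ciphertext)  # ere, l oryh brx. dolfh

# Breaking the cipher: simply try every possible key.
for k in range(1, 26):
    print(k, caesar(ciphertext, -k))  # one of the 25 lines is readable
```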
If the intruder has some knowledge about the possible contents of the message, then it is even easier to break the code. For example, if Trudy the intruder is Bob's wife and suspects Bob of having an affair with Alice, then she might suspect that the names "bob" and "alice" appear in the text. If Trudy knew for certain that those two names appeared in the ciphertext and had a copy of the example ciphertext message above, then she could immediately determine seven of the 26 letter pairings, requiring 10^9 fewer possibilities to be checked by a brute-force method. Indeed, if Trudy suspected Bob of having an affair, she might well expect to find some other choice words in the message as well.

When considering how easy it might be for Trudy to break Bob and Alice's encryption scheme, one can distinguish three different scenarios, depending on what information the intruder has.

- Ciphertext-only attack. In some cases, the intruder may have access only to the intercepted ciphertext, with no certain information about the contents of the plaintext message. We have seen how statistical analysis can help in a ciphertext-only attack on an encryption scheme.
- Known-plaintext attack. We saw above that if Trudy somehow knew for sure that "bob" and "alice" appeared in the ciphertext message, then she could have determined the (plaintext, ciphertext) pairings for the letters a, l, i, c, e, b, and o. Trudy might also have been fortunate enough to have recorded all of the ciphertext transmissions and then found Bob's own decrypted version of one of the transmissions scribbled on a piece of paper. When an intruder knows some of the (plaintext, ciphertext) pairings, we refer to this as a known-plaintext attack on the encryption scheme.
- Chosen-plaintext attack. In a chosen-plaintext attack, the intruder is able to choose the plaintext message and obtain its corresponding ciphertext form. For the simple encryption algorithms we've seen so far, if Trudy could get Alice to send the message "The quick brown fox jumps over the lazy dog," she could completely break the encryption scheme. We'll see shortly that for more sophisticated encryption techniques, a chosen-plaintext attack does not necessarily mean that the encryption technique can be broken.

Five hundred years ago, techniques improving on monoalphabetic encryption, known as polyalphabetic encryption, were invented. The idea behind polyalphabetic encryption is to use multiple monoalphabetic ciphers, with a specific monoalphabetic cipher to encode a letter in a specific position in the plaintext message. Thus, the same letter, appearing in different positions in the plaintext message, might be encoded differently. An example of a polyalphabetic encryption scheme is shown in Figure 8.4. It has two Caesar ciphers (with k=5 and k=19), shown as rows. We might choose to use these two Caesar ciphers, C1 and C2, in the repeating pattern C1, C2, C2, C1, C2. That is, the first letter of plaintext is to be encoded using C1, the second and third using C2, the fourth using C1, and the fifth using C2. The pattern then repeats, with the sixth letter being encoded using C1, the seventh with C2, and so on. The plaintext message "bob, i love you." is thus encrypted "ghu, n etox dhz." Note that the first b in the plaintext message is encrypted using C1, while the second b is encrypted using C2. In this example, the encryption and decryption "key" is the knowledge of the two Caesar keys (k=5, k=19) and the pattern C1, C2, C2, C1, C2.

Figure 8.4 A polyalphabetic cipher using two Caesar ciphers
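A similar sketch (again ours, with the text's example plaintext) implements the two-Caesar polyalphabetic scheme just described.

```python
# A minimal sketch (ours, not the book's) of the two-Caesar
# polyalphabetic cipher: letters are shifted by k=5 or k=19
# according to the repeating pattern C1, C2, C2, C1, C2.

import string
from itertools import cycle

ALPHABET = string.ascii_lowercase
PATTERN = [5, 19, 19, 5, 19]  # C1 = Caesar(k=5), C2 = Caesar(k=19)

def poly_encrypt(text):
    shifts = cycle(PATTERN)            # the pattern repeats forever
    out = []
    for c in text:
        if c in ALPHABET:              # only letters consume a shift
            k = next(shifts)
            out.append(ALPHABET[(ALPHABET.index(c) + k) % 26])
        else:
            out.append(c)              # punctuation and spaces pass through
    return "".join(out)

print(poly_encrypt("bob, i love you."))  # ghu, n etox dhz.
```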
Block Ciphers

Let us now move forward to modern times and examine how symmetric key encryption is done today. There are two broad classes of symmetric encryption techniques: stream ciphers and block ciphers. We'll briefly examine stream ciphers in Section 8.7 when we investigate security for wireless LANs. In this section, we focus on block ciphers, which are used in many secure Internet protocols, including PGP (for secure e-mail), SSL (for securing TCP connections), and IPsec (for securing the network-layer transport).

In a block cipher, the message to be encrypted is processed in blocks of k bits. For example, if k=64, then the message is broken into 64-bit blocks, and each block is encrypted independently. To encode a block, the cipher uses a one-to-one mapping to map the k-bit block of cleartext to a k-bit block of ciphertext. Let's look at an example. Suppose that k=3, so that the block cipher maps 3-bit inputs (cleartext) to 3-bit outputs (ciphertext). One possible mapping is given in Table 8.1. Notice that this is a one-to-one mapping; that is, there is a different output for each input. This block cipher breaks the message up into 3-bit blocks and encrypts each block according to the above mapping. You should verify that the message 010110001111 gets encrypted into 101000111001.

Table 8.1 A specific 3-bit block cipher

| input | output |
| ----- | ------ |
| 000   | 110    |
| 001   | 111    |
| 010   | 101    |
| 011   | 100    |
| 100   | 011    |
| 101   | 010    |
| 110   | 000    |
| 111   | 001    |

Continuing with this 3-bit block example, note that the mapping in Table 8.1 is just one mapping of many possible mappings. How many possible mappings are there? To answer this question, observe that a mapping is nothing more than a permutation of all the possible inputs. There are 2^3 (= 8) possible inputs (listed under the input column). These eight inputs can be permuted in 8! = 40,320 different ways. Since each of these permutations specifies a mapping, there are 40,320 possible mappings. We can view each of these mappings as a key---if Alice and Bob both know the mapping (the key), they can encrypt and decrypt the messages sent between them. The brute-force attack for this cipher is to try to decrypt ciphertext by using all mappings. With only 40,320 mappings (when k=3), this can quickly be accomplished on a desktop PC. To thwart brute-force attacks, block ciphers typically use much larger blocks, consisting of k=64 bits or even larger. Note that the number of possible mappings for a general k-bit block cipher is (2^k)!, which is astronomical for even moderate values of k (such as k=64).

Although full-table block ciphers, as just described, with moderate values of k can produce robust symmetric key encryption schemes, they are unfortunately difficult to implement. For k=64 and for a given mapping, Alice and Bob would need to maintain a table with 2^64 input values, which is an infeasible task. Moreover, if Alice and Bob were to change keys, they would have to each regenerate the table. Thus, a full-table block cipher, providing predetermined mappings between all inputs and outputs (as in the example above), is simply out of the question.

Instead, block ciphers typically use functions that simulate randomly permuted tables. An example (adapted from \[Kaufman 1995\]) of such a function for k=64 bits is shown in Figure 8.5. The function first breaks a 64-bit block into 8 chunks, with each chunk consisting of 8 bits. Each 8-bit chunk is processed by an 8-bit to 8-bit table, which is of manageable size. For example, the first chunk is processed by the table denoted by T1. Next, the 8 output chunks are reassembled into a 64-bit block. The positions of the 64 bits in the block are then scrambled (permuted) to produce a 64-bit output. This output is fed back to the 64-bit input, where another cycle begins. After n such cycles, the function provides a 64-bit block of ciphertext. The purpose of the rounds is to make each input bit affect most (if not all) of the final output bits. (If only one round were used, a given input bit would affect only 8 of the 64 output bits.) The key for this block cipher algorithm would be the eight permutation tables (assuming the scramble function is publicly known).

Figure 8.5 An example of a block cipher

Today there are a number of popular block ciphers, including DES (standing for Data Encryption Standard), 3DES, and AES (standing for Advanced Encryption Standard). Each of these standards uses functions, rather than predetermined tables, along the lines of Figure 8.5 (albeit more complicated and specific to each cipher). Each of these algorithms also uses a string of bits for a key. For example, DES uses 64-bit blocks with a 56-bit key. AES uses 128-bit blocks and can operate with keys that are 128, 192, and 256 bits long. An algorithm's key determines the specific "mini-table" mappings and permutations within the algorithm's internals. The brute-force attack for each of these ciphers is to cycle through all the keys, applying the decryption algorithm with each key. Observe that with a key length of n, there are 2^n possible keys. NIST \[NIST 2001\] estimates that a machine that could crack 56-bit DES in one second (that is, try all 2^56 keys in one second) would take approximately 149 trillion years to crack a 128-bit AES key.

Cipher-Block Chaining

In computer networking applications, we typically need to encrypt long messages (or long streams of data). If we apply a block cipher as described by simply chopping up the message into k-bit blocks and independently encrypting each block, a subtle but important problem occurs. To see this, observe that two or more of the cleartext blocks can be identical. For example, the cleartext in two or more blocks could be "HTTP/1.1". For these identical blocks, a block cipher would, of course, produce the same ciphertext. An attacker could potentially guess the cleartext when it sees identical ciphertext blocks and may even be able to decrypt the entire message by identifying identical ciphertext blocks and using knowledge about the underlying protocol structure \[Kaufman 1995\].

To address this problem, we can mix some randomness into the ciphertext so that identical plaintext blocks produce different ciphertext blocks. To explain this idea, let m(i) denote the ith plaintext block, c(i) denote the ith ciphertext block, and a⊕b denote the exclusive-or (XOR) of two bit strings, a and b. (Recall that 0⊕0=1⊕1=0 and 0⊕1=1⊕0=1, and the XOR of two bit strings is done on a bit-by-bit basis. So, for example, 10101010⊕11110000=01011010.) Also, denote the block-cipher encryption algorithm with key S as KS. The basic idea is as follows. The sender creates a random k-bit number r(i) for the ith block and calculates c(i)=KS(m(i)⊕r(i)). Note that a new k-bit random number is chosen for each block. The sender then sends c(1), r(1), c(2), r(2), c(3), r(3), and so on.
Since the receiver receives c(i) and r(i), it can recover each block of the plaintext by computing m(i)=KS(c(i))⊕r(i). It is important to note that, although r(i) is sent in the clear and thus can be sniffed by Trudy, she cannot obtain the plaintext m(i), since she does not know the key KS. Also note that if two plaintext blocks m(i) and m(j) are the same, the corresponding ciphertext blocks c(i) and c(j) will be different (as long as the random numbers r(i) and r(j) are different, which occurs with very high probability).

As an example, consider the 3-bit block cipher in Table 8.1. Suppose the plaintext is 010010010. If Alice encrypts this directly, without including the randomness, the resulting ciphertext becomes 101101101. If Trudy sniffs this ciphertext, because each of the three cipher blocks is the same, she can correctly surmise that each of the three plaintext blocks are the same. Now suppose instead Alice generates the random blocks r(1)=001, r(2)=111, and r(3)=100 and uses the above technique to generate the ciphertext c(1)=100, c(2)=010, and c(3)=000. Note that the three ciphertext blocks are different even though the plaintext blocks are the same. Alice then sends c(1), r(1), c(2), r(2), c(3), and r(3). You should verify that Bob can obtain the original plaintext using the shared key KS.

The astute reader will note that introducing randomness solves one problem but creates another: namely, Alice must transmit twice as many bits as before. Indeed, for each cipher bit, she must now also send a random bit, doubling the required bandwidth. In order to have our cake and eat it too, block ciphers typically use a technique called Cipher Block Chaining (CBC). The basic idea is to send only one random value along with the very first message, and then have the sender and receiver use the computed coded blocks in place of the subsequent random number. Specifically, CBC operates as follows:

1. Before encrypting the message (or the stream of data), the sender generates a random k-bit string, called the Initialization Vector (IV). Denote this initialization vector by c(0). The sender sends the IV to the receiver in cleartext.

2. For the first block, the sender calculates m(1)⊕c(0), that is, calculates the exclusive-or of the first block of cleartext with the IV. It then runs the result through the block-cipher algorithm to get the corresponding ciphertext block; that is, c(1)=KS(m(1)⊕c(0)). The sender sends the encrypted block c(1) to the receiver.

3. For the ith block, the sender generates the ith ciphertext block from c(i)=KS(m(i)⊕c(i−1)).

Let's now examine some of the consequences of this approach. First, the receiver will still be able to recover the original message. Indeed, when the receiver receives c(i), it decrypts it with KS to obtain s(i)=m(i)⊕c(i−1); since the receiver also knows c(i−1), it then obtains the cleartext block from m(i)=s(i)⊕c(i−1). Second, even if two cleartext blocks are identical, the corresponding ciphertexts (almost always) will be different. Third, although the sender sends the IV in the clear, an intruder will still not be able to decrypt the ciphertext blocks, since the intruder does not know the secret key, S. Finally, the sender only sends one overhead block (the IV), thereby negligibly increasing the bandwidth usage for long messages (consisting of hundreds of blocks). A short sketch of this procedure, using the toy cipher of Table 8.1, appears below.
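Before working through the numbers by hand, here is a minimal Python sketch (our illustration, not the book's) of CBC over the toy 3-bit block cipher of Table 8.1; it reproduces the worked example that follows.

```python
# CBC over the toy 3-bit block cipher of Table 8.1 (a sketch, not a
# real cipher). The "key" KS is the permutation table itself.

KS = {"000": "110", "001": "111", "010": "101", "011": "100",
      "100": "011", "101": "010", "110": "000", "111": "001"}
KS_INV = {v: k for k, v in KS.items()}           # decryption table

def xor(a, b):
    return format(int(a, 2) ^ int(b, 2), "03b")  # bitwise XOR of 3-bit strings

def cbc_encrypt(blocks, iv):
    prev, out = iv, []
    for m in blocks:                 # c(i) = KS(m(i) XOR c(i-1))
        prev = KS[xor(m, prev)]
        out.append(prev)
    return out

def cbc_decrypt(blocks, iv):
    prev, out = iv, []
    for c in blocks:                 # m(i) = KS^-1(c(i)) XOR c(i-1)
        out.append(xor(KS_INV[c], prev))
        prev = c
    return out

cipher = cbc_encrypt(["010", "010", "010"], iv="001")
print(cipher)                        # ['100', '000', '101']
print(cbc_decrypt(cipher, "001"))    # ['010', '010', '010']
```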
As an example, let's now determine the ciphertext for the 3-bit block cipher in Table 8.1 with plaintext 010010010 and IV=c(0)=001. The sender first uses the IV to calculate c(1)=KS(m(1)⊕c(0))=100. The sender then calculates c(2)=KS(m(2)⊕c(1))=KS(010⊕100)=000, and c(3)=KS(m(3)⊕c(2))=KS(010⊕000)=101. The reader should verify that the receiver, knowing the IV and KS, can recover the original plaintext. CBC has an important consequence when designing secure network protocols: we'll need to provide a mechanism within the protocol to distribute the IV from sender to receiver. We'll see how this is done for several protocols later in this chapter.

8.2.2 Public Key Encryption

For more than 2,000 years (since the time of the Caesar cipher and up to the 1970s), encrypted communication required that the two communicating parties share a common secret---the symmetric key used for encryption and decryption. One difficulty with this approach is that the two parties must somehow agree on the shared key; but to do so requires (presumably secure) communication! Perhaps the parties could first meet and agree on the key in person (for example, two of Caesar's centurions might meet at the Roman baths) and thereafter communicate with encryption. In a networked world, however, communicating parties may never meet and may never converse except over the network. Is it possible for two parties to communicate with encryption without having a shared secret key that is known in advance? In 1976, Diffie and Hellman \[Diffie 1976\] demonstrated an algorithm (known now as Diffie-Hellman Key Exchange) to do just that---a radically different and marvelously elegant approach toward secure communication that has led to the development of today's public key cryptography systems. We'll see shortly that public key cryptography systems also have several wonderful properties that make them useful not only for encryption, but for authentication and digital signatures as well. Interestingly, it has recently come to light that ideas similar to those in \[Diffie 1976\] and \[RSA 1978\] had been independently developed in the early 1970s in a series of secret reports by researchers at the Communications-Electronics Security Group in the United Kingdom \[Ellis 1987\]. As is often the case, great ideas can spring up independently in many places; fortunately, public key advances took place not only in private, but also in the public view, as well.

The use of public key cryptography is conceptually quite simple. Suppose Alice wants to communicate with Bob. As shown in Figure 8.6, rather than Bob and Alice sharing a single secret key (as in the case of symmetric key systems), Bob (the recipient of Alice's messages) instead has two keys---a public key that is available to everyone in the world (including Trudy the intruder) and a private key that is known only to Bob. We will use the notation KB+ and KB− to refer to Bob's public and private keys, respectively.

Figure 8.6 Public key cryptography

In order to communicate with Bob, Alice first fetches Bob's public key. Alice then encrypts her message, m, to Bob using Bob's public key and a known (for example, standardized) encryption algorithm; that is, Alice computes KB+(m). Bob receives Alice's encrypted message and uses his private key and a known (for example, standardized) decryption algorithm to decrypt Alice's encrypted message. That is, Bob computes KB−(KB+(m)). We will see below that there are encryption/decryption algorithms and techniques for choosing public and private keys such that KB−(KB+(m))=m; that is, applying Bob's public key, KB+, to a message, m (to get KB+(m)), and then applying Bob's private key, KB−, to the encrypted version of m (that is, computing KB−(KB+(m))) gives back m. This is a remarkable result! In this manner, Alice can use Bob's publicly available key to send a secret message to Bob without either of them having to distribute any secret keys! We will see shortly that we can interchange the public key and private key encryption and get the same remarkable result; that is, KB−(KB+(m))=KB+(KB−(m))=m.

The use of public key cryptography is thus conceptually simple. But two immediate worries may spring to mind. A first concern is that although an intruder intercepting Alice's encrypted message will see only gibberish, the intruder knows both the key (Bob's public key, which is available for all the world to see) and the algorithm that Alice used for encryption. Trudy can thus mount a chosen-plaintext attack, using the known standardized encryption algorithm and Bob's publicly available encryption key to encode any message she chooses! Trudy might well try, for example, to encode messages, or parts of messages, that she suspects that Alice might send. Clearly, if public key cryptography is to work, key selection and encryption/decryption must be done in such a way that it is impossible (or at least so hard as to be nearly impossible) for an intruder to either determine Bob's private key or somehow otherwise decrypt or guess Alice's message to Bob. A second concern is that since Bob's encryption key is public, anyone can send an encrypted message to Bob, including Alice or someone claiming to be Alice. In the case of a single shared secret key, the fact that the sender knows the secret key implicitly identifies the sender to the receiver. In the case of public key cryptography, however, this is no longer the case since anyone can send an encrypted message to Bob using Bob's publicly available key. A digital signature, a topic we will study in Section 8.3, is needed to bind a sender to a message.

RSA

While there may be many algorithms that address these concerns, the RSA algorithm (named after its founders, Ron Rivest, Adi Shamir, and Leonard Adleman) has become almost synonymous with public key cryptography. Let's first see how RSA works and then examine why it works. RSA makes extensive use of arithmetic operations using modulo-n arithmetic. So let's briefly review modular arithmetic. Recall that x mod n simply means the remainder of x when divided by n; so, for example, 19 mod 5 = 4. In modular arithmetic, one performs the usual operations of addition, multiplication, and exponentiation. However, the result of each operation is replaced by the integer remainder that is left when the result is divided by n. Adding and multiplying with modular arithmetic is facilitated with the following handy facts:

\[(a mod n) + (b mod n)\] mod n = (a + b) mod n
\[(a mod n) − (b mod n)\] mod n = (a − b) mod n
\[(a mod n) ⋅ (b mod n)\] mod n = (a ⋅ b) mod n

It follows from the third fact that (a mod n)^d mod n = a^d mod n, which is an identity that we will soon find very useful.

Now suppose that Alice wants to send to Bob an RSA-encrypted message, as shown in Figure 8.6.
In our discussion of RSA, let's always keep in mind that a message is nothing but a bit pattern, and every bit pattern can be uniquely represented by an integer number (along with the length of the bit pattern). For example, suppose a message is the bit pattern 1001; this message can be represented by the decimal integer 9. Thus, when encrypting a message with RSA, it is equivalent to encrypting the unique integer number that represents the message.

There are two interrelated components of RSA:

- The choice of the public key and the private key
- The encryption and decryption algorithm

To generate the public and private RSA keys, Bob performs the following steps:

1. Choose two large prime numbers, p and q. How large should p and q be? The larger the values, the more difficult it is to break RSA, but the longer it takes to perform the encoding and decoding. RSA Laboratories recommends that the product of p and q be on the order of 1,024 bits. For a discussion of how to find large prime numbers, see \[Caldwell 2012\].

2. Compute n = pq and z = (p − 1)(q − 1).

3. Choose a number, e, less than n, that has no common factors (other than 1) with z. (In this case, e and z are said to be relatively prime.) The letter e is used since this value will be used in encryption.

4. Find a number, d, such that ed − 1 is exactly divisible (that is, with no remainder) by z. The letter d is used because this value will be used in decryption. Put another way, given e, we choose d such that ed mod z = 1.

5. The public key that Bob makes available to the world, KB+, is the pair of numbers (n, e); his private key, KB−, is the pair of numbers (n, d).

The encryption by Alice and the decryption by Bob are done as follows: Suppose Alice wants to send Bob a bit pattern represented by the integer number m (with m < n). To encode, Alice performs the exponentiation m^e, and then computes the integer remainder when m^e is divided by n. In other words, the encrypted value, c, of Alice's plaintext message, m, is

c = m^e mod n

The bit pattern corresponding to this ciphertext c is sent to Bob. To decrypt the received ciphertext message, c, Bob computes

m = c^d mod n

which requires the use of his private key (n, d).

As a simple example of RSA, suppose Bob chooses p=5 and q=7. (Admittedly, these values are far too small to be secure.) Then n=35 and z=24. Bob chooses e=5, since 5 and 24 have no common factors. Finally, Bob chooses d=29, since 5⋅29−1 (that is, ed−1) is exactly divisible by 24. Bob makes the two values, n=35 and e=5, public and keeps the value d=29 secret. Observing these two public values, suppose Alice now wants to send the letters l, o, v, and e to Bob. Interpreting each letter as a number between 1 and 26 (with a being 1, and z being 26), Alice and Bob perform the encryption and decryption shown in Tables 8.2 and 8.3, respectively. Note that in this example, we consider each of the four letters as a distinct message. A more realistic example would be to convert the four letters into their 8-bit ASCII representations and then encrypt the integer corresponding to the resulting 32-bit bit pattern. (Such a realistic example generates numbers that are much too long to print in a textbook!)

Table 8.2 Alice's RSA encryption, e=5, n=35

| Plaintext letter | m: numeric representation | m^e     | Ciphertext c = m^e mod n |
| ---------------- | ------------------------- | ------- | ------------------------ |
| l                | 12                        | 248832  | 17                       |
| o                | 15                        | 759375  | 15                       |
| v                | 22                        | 5153632 | 22                       |
| e                | 5                         | 3125    | 10                       |
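The toy example lends itself to a few lines of code. The sketch below (our illustration; the parameters are the text's and are far too small to be secure) reproduces the encryption of Table 8.2 and the decryption of Table 8.3 using Python's built-in modular exponentiation.

```python
# A toy RSA sketch (our illustration; the parameters come from the
# text's example and are far too small to be secure).

p, q = 5, 7
n, z = p * q, (p - 1) * (q - 1)       # n = 35, z = 24
e, d = 5, 29                          # e relatively prime to z; ed mod z == 1
assert (e * d) % z == 1

def encrypt(m):
    return pow(m, e, n)               # c = m^e mod n

def decrypt(c):
    return pow(c, d, n)               # m = c^d mod n

for letter in "love":
    m = ord(letter) - ord("a") + 1    # a=1, ..., z=26
    c = encrypt(m)
    print(letter, m, c, decrypt(c))   # l 12 17 12, o 15 15 15, ...
```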
Given that the "toy" example in Tables 8.2 and 8.3 +has already produced some extremely large numbers, and given that we saw +earlier that p and q should each be several hundred bits long, several +practical issues regarding RSA come to mind. How does one choose large +prime numbers? How does one then choose e and d? How does one perform +exponentiation with large numbers? A discussion of these important +issues is beyond the scope of this book; see \[Kaufman 1995\] and the +references therein for details. Table 8.3 Bob's RSA decryption, d=29, +n=35 Ciphertext c + +cd + +m = cd mod n + +Plaintext Letter + +17 + +4819685721067509150915091411825223071697 + +12 + +l + +15 + +127834039403948858939111232757568359375 + +15 + +o + +22 + +851643319086537701956194499721106030592 + +22 + +v + +10 + +1000000000000000000000000000000 + +5 + +e + +Session Keys We note here that the exponentiation required by RSA is a +rather time-consuming process. By contrast, DES is at least 100 times +faster in software and between 1,000 and 10,000 times faster in hardware +\[RSA Fast 2012\]. As a result, RSA is often used in practice in +combination with symmetric key cryptography. For example, if Alice wants +to send Bob a large amount of encrypted data, she could do the +following. First Alice chooses a key that will be used to encode the +data itself; this key is referred to as a session key, and is denoted by +KS. Alice must inform Bob of the session key, since this is the shared +symmetric key they will use with a symmetric key cipher (e.g., with DES +or AES). Alice encrypts the session key using Bob's public key, that is, +computes c=(KS)e mod n. Bob receives the RSA-encrypted session key, c, +and decrypts it to obtain the session key, KS. Bob now knows the session +key that Alice will use for her encrypted data transfer. Why Does RSA +Work? RSA encryption/decryption appears rather magical. Why should it be +that by applying the encryption algorithm and then the decryption +algorithm, one recovers the original message? In order to understand why +RSA works, again denote n=pq, where p and q are the large prime numbers +used in the RSA algorithm. Recall that, under RSA encryption, a message +(uniquely represented by an integer), m, is exponentiated to the power e +using modulo-n arithmetic, that is, c=memod n Decryption is performed by +raising this value to the power d, again using modulo-n arithmetic. The +result of an encryption step followed by a decryption step is thus (me +mod n)d mod n. Let's now see what we can say about this quantity. As +mentioned earlier, one important property of modulo arithmetic is (a mod +n)d mod n=ad mod n for any values a, n, and d. Thus, using a=me in this +property, we have (memod n)dmod n=medmod n + +It therefore remains to show that medmod n=m. Although we're trying to +remove some of the magic about why RSA works, to establish this, we'll +need to use a rather magical result from number theory here. +Specifically, we'll need the result that says if p and q are prime, +n=pq, and z=(p−1)(q−1), then xy mod n is the same as x(y mod z) mod n +\[Kaufman 1995\]. Applying this result with x=m and y=ed we have medmod +n=m(edmod z)mod n But remember that we have chosen e and d such that +edmod z=1. This gives us medmod n=m1mod n=m which is exactly the result +we are looking for! By first exponentiating to the power of e (that is, +encrypting) and then exponentiating to the power of d (that is, +decrypting), we obtain the original value, m. 
Even more wonderful is the +fact that if we first exponentiate to the power of d and then +exponentiate to the power of e---that is, we reverse the order of +encryption and decryption, performing the decryption operation first and +then applying the encryption operation---we also obtain the original +value, m. This wonderful result follows immediately from the modular +arithmetic: (mdmod n)emod n=mdemod n=medmod n=(memod n)dmod n The +security of RSA relies on the fact that there are no known algorithms +for quickly factoring a number, in this case the public value n, into +the primes p and q. If one knew p and q, then given the public value e, +one could easily compute the secret key, d. On the other hand, it is not +known whether or not there exist fast algorithms for factoring a number, +and in this sense, the security of RSA is not guaranteed. Another +popular public-key encryption algorithm is the Diffie-Hellman algorithm, +which we will briefly explore in the homework problems. Diffie-Hellman +is not as versatile as RSA in that it cannot be used to encrypt messages +of arbitrary length; it can be used, however, to establish a symmetric +session key, which is in turn used to encrypt messages. + +8.3 Message Integrity and Digital Signatures In the previous section we +saw how encryption can be used to provide confidentiality to two +communicating entities. In this section we turn to the equally important +cryptography topic of providing message integrity (also known as message +authentication). Along with message integrity, we will discuss two +related topics in this section: digital signatures and end-point +authentication. We define the message integrity problem using, once +again, Alice and Bob. Suppose Bob receives a message (which may be +encrypted or may be in plaintext) and he believes this message was sent +by Alice. To authenticate this message, Bob needs to verify: + +1. The message indeed originated from Alice. +2. The message was not tampered with on its way to Bob. We'll see in + Sections 8.4 through 8.7 that this problem of message integrity is a + critical concern in just about all secure networking protocols. As a + specific example, consider a computer network using a link-state + routing algorithm (such as OSPF) for determining routes between each + pair of routers in the network (see Chapter 5). In a link-state + algorithm, each router needs to broadcast a link-state message to + all other routers in the network. A router's link-state message + includes a list of its directly connected neighbors and the direct + costs to these neighbors. Once a router receives link-state messages + from all of the other routers, it can create a complete map of the + network, run its least-cost routing algorithm, and configure its + forwarding table. One relatively easy attack on the routing + algorithm is for Trudy to distribute bogus link-state messages with + incorrect link-state information. Thus the need for message + integrity---when router B receives a linkstate message from router + A, router B should verify that router A actually created the message + and, further, that no one tampered with the message in transit. In + this section, we describe a popular message integrity technique that + is used by many secure networking protocols. But before doing so, we + need to cover another important topic in cryptography--- + cryptographic hash functions. 
+ +8.3.1 Cryptographic Hash Functions As shown in Figure 8.7, a hash +function takes an input, m, and computes a fixed-size string H(m) + +known as a hash. The Internet checksum (Chapter 3) and CRCs (Chapter 6) +meet this definition. A cryptographic hash function is required to have +the following additional property: It is computationally infeasible to +find any two different messages x and y such that H(x)=H(y). Informally, +this property means that it is computationally infeasible for an +intruder to substitute one message for another message that is protected +by the hash + +Figure 8.7 Hash functions + +Figure 8.8 Initial message and fraudulent message have the same +checksum! + +function. That is, if (m, H(m)) are the message and the hash of the +message created by the sender, then + +an intruder cannot forge the contents of another message, y, that has +the same hash value as the original message. Let's convince ourselves +that a simple checksum, such as the Internet checksum, would make a poor +cryptographic hash function. Rather than performing 1s complement +arithmetic (as in the Internet checksum), let us compute a checksum by +treating each character as a byte and adding the bytes together using +4-byte chunks at a time. Suppose Bob owes Alice \$100.99 and sends an +IOU to Alice consisting of the text string " IOU100.99BOB. " The ASCII +representation (in hexadecimal notation) for these letters is 49 , 4F , +55 , 31 , 30 , 30 , 2E , 39 , 39 , 42 , 4F , 42 . Figure 8.8 (top) shows +that the 4-byte checksum for this message is B2 C1 D2 AC. A slightly +different message (and a much more costly one for Bob) is shown in the +bottom half of Figure 8.8. The messages " IOU100.99BOB " and " +IOU900.19BOB " have the same checksum. Thus, this simple checksum +algorithm violates the requirement above. Given the original data, it is +simple to find another set of data with the same checksum. Clearly, for +security purposes, we are going to need a more powerful hash function +than a checksum. The MD5 hash algorithm of Ron Rivest \[RFC 1321\] is in +wide use today. It computes a 128-bit hash in a four-step process +consisting of a padding step (adding a one followed by enough zeros so +that the length of the message satisfies certain conditions), an append +step (appending a 64-bit representation of the message length before +padding), an initialization of an accumulator, and a final looping step +in which the message's 16-word blocks are processed (mangled) in four +rounds. For a description of MD5 (including a C source code +implementation) see \[RFC 1321\]. The second major hash algorithm in use +today is the Secure Hash Algorithm (SHA-1) \[FIPS 1995\]. This algorithm +is based on principles similar to those used in the design of MD4 \[RFC +1320\], the predecessor to MD5. SHA-1, a US federal standard, is +required for use whenever a cryptographic hash algorithm is needed for +federal applications. It produces a 160-bit message digest. The longer +output length makes SHA-1 more secure. + +8.3.2 Message Authentication Code Let's now return to the problem of +message integrity. Now that we understand hash functions, let's take a +first stab at how we might perform message integrity: + +1. Alice creates message m and calculates the hash H(m) (for example + with SHA-1). +2. Alice then appends H(m) to the message m, creating an extended + message (m, H(m)), and sends the extended message to Bob. + +3. Bob receives an extended message (m, h) and calculates H(m). 
If H(m) = h, Bob concludes that everything is fine.

This approach is obviously flawed. Trudy can create a bogus message m´ in which she says she is Alice, calculate H(m´), and send Bob (m´, H(m´)). When Bob receives the message, everything checks out in step 3, so Bob doesn't suspect any funny business.

To perform message integrity, in addition to using cryptographic hash functions, Alice and Bob will need a shared secret s. This shared secret, which is nothing more than a string of bits, is called the authentication key. Using this shared secret, message integrity can be performed as follows:

1. Alice creates message m, concatenates s with m to create m+s, and calculates the hash H(m+s) (for example with SHA-1). H(m+s) is called the message authentication code (MAC).

2. Alice then appends the MAC to the message m, creating an extended message (m, H(m+s)), and sends the extended message to Bob.

3. Bob receives an extended message (m, h) and, knowing s, calculates the MAC H(m+s). If H(m+s) = h, Bob concludes that everything is fine.

A summary of the procedure is shown in Figure 8.9. Readers should note that the MAC here (standing for "message authentication code") is not the same MAC used in link-layer protocols (standing for "medium access control")!

Figure 8.9 Message authentication code (MAC)

One nice feature of a MAC is that it does not require an encryption algorithm. Indeed, in many applications, including the link-state routing algorithm described earlier, communicating entities are only concerned with message integrity and are not concerned with message confidentiality. Using a MAC, the entities can authenticate the messages they send to each other without having to integrate complex encryption algorithms into the integrity process.

As you might expect, a number of different standards for MACs have been proposed over the years. The most popular standard today is HMAC, which can be used either with MD5 or SHA-1. HMAC actually runs data and the authentication key through the hash function twice \[Kaufman 1995; RFC 2104\]. There still remains an important issue. How do we distribute the shared authentication key to the communicating entities? For example, in the link-state routing algorithm, we would somehow need to distribute the secret authentication key to each of the routers in the autonomous system. (Note that the routers can all use the same authentication key.) A network administrator could actually accomplish this by physically visiting each of the routers. Or, if the network administrator is a lazy guy, and if each router has its own public key, the network administrator could distribute the authentication key to any one of the routers by encrypting it with the router's public key and then sending the encrypted key over the network to the router.
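To make the MAC computation concrete, here is a minimal sketch using Python's standard hashlib and hmac modules; the shared secret and message are illustrative values of our own choosing. The hmac module implements the HMAC construction just mentioned.

```python
# A sketch of MAC computation and verification. The shared secret s
# and the message are illustrative values, not from the text.

import hashlib
import hmac

s = b"shared-authentication-key"
m = b"router A: neighbors=B,C costs=1,4"

# Naive MAC from the text: hash the message concatenated with the secret.
naive_mac = hashlib.sha1(m + s).hexdigest()

# HMAC, the standard construction, runs the key and data through the
# hash function twice internally.
mac = hmac.new(s, m, hashlib.sha1).hexdigest()

# Receiver side: recompute and compare using a constant-time check.
received_m, received_mac = m, mac
ok = hmac.compare_digest(
    hmac.new(s, received_m, hashlib.sha1).hexdigest(), received_mac)
print(naive_mac, mac, ok, sep="\n")
```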
8.3.3 Digital Signatures

Think of the number of times you've signed your name to a piece of paper during the last week. You sign checks, credit card receipts, legal documents, and letters. Your signature attests to the fact that you (as opposed to someone else) have acknowledged and/or agreed with the document's contents. In a digital world, one often wants to indicate the owner or creator of a document, or to signify one's agreement with a document's content. A digital signature is a cryptographic technique for achieving these goals in a digital world.

Just as with handwritten signatures, digital signing should be done in a way that is verifiable and nonforgeable. That is, it must be possible to prove that a document signed by an individual was indeed signed by that individual (the signature must be verifiable) and that only that individual could have signed the document (the signature cannot be forged).

Let's now consider how we might design a digital signature scheme. Observe that when Bob signs a message, Bob must put something on the message that is unique to him. Bob could consider attaching a MAC for the signature, where the MAC is created by appending his key (unique to him) to the message, and then taking the hash. But for Alice to verify the signature, she must also have a copy of the key, in which case the key would not be unique to Bob. Thus, MACs are not going to get the job done here.

Recall that with public-key cryptography, Bob has both a public and private key, with both of these keys being unique to Bob. Thus, public-key cryptography is an excellent candidate for providing digital signatures. Let us now examine how it is done. Suppose that Bob wants to digitally sign a document, m. We can think of the document as a file or a message that Bob is going to sign and send. As shown in Figure 8.10, to sign this document, Bob simply uses his private key, KB−, to compute KB−(m). At first, it might seem odd that Bob is using his private key (which, as we saw in Section 8.2, was used to decrypt a message that had been encrypted with his public key) to sign a document. But recall that encryption and decryption are nothing more than mathematical operations (exponentiation to the power of e or d in RSA; see Section 8.2) and recall that Bob's goal is not to scramble or obscure the contents of the document, but rather to sign the document in a manner that is verifiable and nonforgeable. Bob's digital signature of the document is KB−(m).

Figure 8.10 Creating a digital signature for a document

Does the digital signature KB−(m) meet our requirements of being verifiable and nonforgeable? Suppose Alice has m and KB−(m). She wants to prove in court (being litigious) that Bob had indeed signed the document and was the only person who could have possibly signed the document. Alice takes Bob's public key, KB+, and applies it to the digital signature, KB−(m), associated with the document, m. That is, she computes KB+(KB−(m)), and voilà, with a dramatic flurry, she produces m, which exactly matches the original document! Alice then argues that only Bob could have signed the document, for the following reasons: Whoever signed the message must have used the private key, KB−, in computing the signature KB−(m), such that KB+(KB−(m))=m. The only person who could have known the private key, KB−, is Bob. Recall from our discussion of RSA in Section 8.2 that knowing the public key, KB+, is of no help in learning the private key, KB−. Therefore, the only person who could know KB− is the person who generated the pair of keys, (KB+, KB−), in the first place, Bob. (Note that this assumes, though, that Bob has not given KB− to anyone, nor has anyone stolen KB− from Bob.)

It is also important to note that if the original document, m, is ever modified to some alternate form, m´, the signature that Bob created for m will not be valid for m´, since KB+(KB−(m)) does not equal m´. Thus we see that digital signatures also provide message integrity, allowing the receiver to verify that the message was unaltered as well as the source of the message.

One concern with signing data by encryption is that encryption and decryption are computationally expensive. Given the overheads of encryption and decryption, signing data via complete encryption/decryption can be overkill. A more efficient approach is to introduce hash functions into the digital signature. Recall from Section 8.3.2 that a hash algorithm takes a message, m, of arbitrary length and computes a fixed-length "fingerprint" of the message, denoted by H(m). Using a hash function, Bob signs the hash of a message rather than the message itself, that is, Bob calculates KB−(H(m)). Since H(m) is generally much smaller than the original message m, the computational effort required to create the digital signature is substantially reduced.

In the context of Bob sending a message to Alice, Figure 8.11 provides a summary of the operational procedure of creating a digital signature. Bob puts his original long message through a hash function. He then digitally signs the resulting hash with his private key. The original message (in cleartext) along with the digitally signed message digest (henceforth referred to as the digital signature) is then sent to Alice.

Figure 8.11 Sending a digitally signed message

Figure 8.12 provides a summary of the operational procedure of verifying the signature. Alice applies the sender's public key to the signed message digest to obtain a hash result. Alice also applies the hash function to the cleartext message to obtain a second hash result. If the two hashes match, then Alice can be sure about the integrity and author of the message.

Figure 8.12 Verifying a signed message

Before moving on, let's briefly compare digital signatures with MACs, since they have parallels, but also have important subtle differences. Both digital signatures and MACs start with a message (or a document). To create a MAC out of the message, we append an authentication key to the message, and then take the hash of the result. Note that neither public key nor symmetric key encryption is involved in creating the MAC. To create a digital signature, we first take the hash of the message and then encrypt the hash with our private key (using public key cryptography). Thus, a digital signature is a "heavier" technique, since it requires an underlying Public Key Infrastructure (PKI) with certification authorities as described below. We'll see in Section 8.5 that PGP---a popular secure e-mail system---uses digital signatures for message integrity. We've seen already that OSPF uses MACs for message integrity. We'll see in Sections 8.5 and 8.6 that MACs are also used for popular transport-layer and network-layer security protocols.
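The sign-the-hash idea can be sketched with toy RSA machinery. The parameters below (p=61, q=53, e=17, d=2753) are our own illustrative choices, still far too small to be secure, and reducing the digest modulo n is done purely so the toy numbers fit; real systems sign the full digest under a 2048-bit or larger modulus.

```python
# A toy sign-the-hash sketch (ours, not the book's). Real systems use
# RSA moduli of 2048+ bits; here n is tiny, so we shrink the SHA-1
# digest mod n purely for illustration.

import hashlib

p, q = 61, 53
n, z = p * q, (p - 1) * (q - 1)     # n = 3233, z = 3120
e, d = 17, 2753                     # public exponent e, private exponent d

def digest(message):
    h = hashlib.sha1(message).digest()
    return int.from_bytes(h, "big") % n   # toy-sized "fingerprint"

def sign(message):
    return pow(digest(message), d, n)     # KB-(H(m)): sign with private key

def verify(message, signature):
    return pow(signature, e, n) == digest(message)  # apply KB+ and compare

m = b"Bob agrees to pay Alice $100.99"
sig = sign(m)
print(verify(m, sig))                             # True
print(verify(b"Bob agrees to pay $900.19", sig))  # False: signature invalid
```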
In this +message, Bob also includes a digital signature (that is, a signed hash +of the original plaintext message) to prove to Alice that he is the true +source of the message. To verify the signature, Alice obtains Bob's +public key (perhaps from a public key server or from the e-mail message) +and checks the digital signature. In this manner she makes sure that +Bob, rather than some adolescent prankster, placed the order. This all +sounds fine until clever Trudy comes along. As shown in Figure 8.13, +Trudy is indulging in a prank. She sends a message to Alice in which she +says she is Bob, gives Bob's home address, and orders a pizza. In this +message she also includes her (Trudy's) public key, although Alice +naturally assumes it is Bob's public key. Trudy also attaches a digital +signature, which was created with her own (Trudy's) private key. After +receiving the message, Alice applies Trudy's public key (thinking that +it is Bob's) to the digital signature and concludes that the plaintext +message was + +Figure 8.13 Trudy masquerades as Bob using public key cryptography + +indeed created by Bob. Bob will be very surprised when the delivery +person brings a pizza with pepperoni and anchovies to his home! We see +from this example that for public key cryptography to be useful, you +need to be able to verify that you have the actual public key of the +entity (person, router, browser, and so on) with whom you want to +communicate. For example, when Alice wants to communicate with Bob using +public key cryptography, she needs to verify that the public key that is +supposed to be Bob's is indeed Bob's. Binding a public key to a +particular entity is typically done by a Certification Authority (CA), +whose job is to validate identities and issue certificates. A CA has the +following roles: + +1. A CA verifies that an entity (a person, a router, and so on) is who + it says it is. There are no mandated procedures for how + certification is done. When dealing with a CA, one must trust the CA + to have performed a suitably rigorous identity verification. For + example, if Trudy were able to walk into the Fly-by-Night + +Figure 8.14 Bob has his public key certified by the CA + +CA and simply announce "I am Alice" and receive certificates associated +with the identity of Alice, then one shouldn't put much faith in public +keys certified by the Fly-by-Night CA. On the other hand, one might (or +might not!) be more willing to trust a CA that is part of a federal or +state program. You can trust the identity associated with a public key +only to the extent to which you can trust a CA and its identity +verification techniques. What a tangled web of trust we spin! + +2. Once the CA verifies the identity of the entity, the CA creates a + certificate that binds the public key of the entity to the identity. + The certificate contains the public key and globally unique + identifying information about the owner of the public key (for + example, a human name or an IP address). The certificate is + digitally signed by the CA. These steps are shown in Figure 8.14. + Let us now see how certificates can be used to combat pizza-ordering + pranksters, like Trudy, and other undesirables. When Bob places his + order he also sends his CA-signed certificate. Alice uses the CA's + public key to check the validity of Bob's certificate and extract + Bob's public key. Both the International Telecommunication Union + (ITU) and the IETF have developed standards for CAs. 
ITU X.509 \[ITU + 2005a\] specifies an authentication service as well as a specific + syntax for certificates. \[RFC 1422\] describes CA-based key + management for use with secure Internet e-mail. It is compatible + with X.509 but goes beyond X.509 by establishing procedures and + conventions for a key management architecture. Table 8.4 describes + some of the important fields in a certificate. Table 8.4 Selected + fields in an X.509 and RFC 1422 public key + +Field Name + +Description + +Version + +Version number of X.509 specification + +Serial + +CA-issued unique identifier for a certificate + +number Signature + +Specifies the algorithm used by CA to sign this certificate + +Issuer + +Identity of CA issuing this certificate, in distinguished name (DN) +\[RFC 4514\] format + +name Validity + +Start and end of period of validity for certificate + +period Subject + +Identity of entity whose public key is associated with this certificate, +in DN format + +name Subject + +The subject's public key as well indication of the public key algorithm +(and algorithm + +public key + +parameters) to be used with this key + +8.4 End-Point Authentication End-point authentication is the process of +one entity proving its identity to another entity over a computer +network, for example, a user proving its identity to an e-mail server. +As humans, we authenticate each other in many ways: We recognize each +other's faces when we meet, we recognize each other's voices on the +telephone, we are authenticated by the customs official who checks us +against the picture on our passport. In this section, we consider how +one party can authenticate another party when the two are communicating +over a network. We focus here on authenticating a "live" party, at the +point in time when communication is actually occurring. A concrete +example is a user authenticating him or herself to an email server. This +is a subtly different problem from proving that a message received at +some point in the past did indeed come from that claimed sender, as +studied in Section 8.3. When performing authentication over the network, +the communicating parties cannot rely on biometric information, such as +a visual appearance or a voiceprint. Indeed, we will see in our later +case studies that it is often network elements such as routers and +client/server processes that must authenticate each other. Here, +authentication must be done solely on the basis of messages and data +exchanged as part of an authentication protocol. Typically, an +authentication protocol would run before the two communicating parties +run some other protocol (for example, a reliable data transfer protocol, +a routing information exchange protocol, or an e-mail protocol). The +authentication protocol first establishes the identities of the parties +to each other's satisfaction; only after authentication do the parties +get down to the work at hand. As in the case of our development of a +reliable data transfer (rdt) protocol in Chapter 3, we will find it +instructive here to develop various versions of an authentication +protocol, which we will call ap (authentication protocol), and poke +holes in each version + +Figure 8.15 Protocol ap1.0 and a failure scenario + +as we proceed. (If you enjoy this stepwise evolution of a design, you +might also enjoy \[Bryant 1988\], which recounts a fictitious narrative +between designers of an open-network authentication system, and their +discovery of the many subtle issues involved.) 
Let's assume that Alice +needs to authenticate herself to Bob. + +8.4.1 Authentication Protocol ap1.0 Perhaps the simplest authentication +protocol we can imagine is one where Alice simply sends a message to Bob +saying she is Alice. This protocol is shown in Figure 8.15. The flaw +here is obvious--- there is no way for Bob actually to know that the +person sending the message "I am Alice" is indeed Alice. For example, +Trudy (the intruder) could just as well send such a message. + +8.4.2 Authentication Protocol ap2.0 If Alice has a well-known network +address (e.g., an IP address) from which she always communicates, Bob +could attempt to authenticate Alice by verifying that the source address +on the IP datagram carrying the authentication message matches Alice's +well-known address. In this case, Alice would be authenticated. This +might stop a very network-naive intruder from impersonating Alice, but +it wouldn't stop the determined student studying this book, or many +others! From our study of the network and data link layers, we know that +it is not that hard (for example, if one had access to the operating +system code and could build one's own operating system kernel, as is the + +case with Linux and several other freely available operating systems) to +create an IP datagram, put whatever IP source address we want (for +example, Alice's well-known IP address) into the IP datagram, and send +the datagram over the link-layer protocol to the first-hop router. From +then + +Figure 8.16 Protocol ap2.0 and a failure scenario + +on, the incorrectly source-addressed datagram would be dutifully +forwarded to Bob. This approach, shown in Figure 8.16, is a form of IP +spoofing. IP spoofing can be avoided if Trudy's first-hop router is +configured to forward only datagrams containing Trudy's IP source +address \[RFC 2827\]. However, this capability is not universally +deployed or enforced. Bob would thus be foolish to assume that Trudy's +network manager (who might be Trudy herself) had configured Trudy's +first-hop router to forward only appropriately addressed datagrams. + +8.4.3 Authentication Protocol ap3.0 One classic approach to +authentication is to use a secret password. The password is a shared +secret between the authenticator and the person being authenticated. +Gmail, Facebook, telnet, FTP, and many other services use password +authentication. In protocol ap3.0, Alice thus sends her secret password +to Bob, as shown in Figure 8.17. Since passwords are so widely used, we +might suspect that protocol ap3.0 is fairly secure. If so, we'd be +wrong! The security flaw here is clear. If Trudy eavesdrops on Alice's +communication, then she can learn Alice's password. Lest you think this +is unlikely, consider the fact that when you Telnet to another machine +and log in, the login password is sent unencrypted to the Telnet server. +Someone connected to the Telnet client or server's LAN can possibly +sniff (read and store) all packets transmitted on the LAN and thus steal +the login password. In fact, this is a well-known approach for stealing +passwords (see, for example, \[Jimenez 1997\]). Such a threat is +obviously very real, so ap3.0 clearly won't do. + +8.4.4 Authentication Protocol ap3.1 Our next idea for fixing ap3.0 is +naturally to encrypt the password. By encrypting the password, we can +prevent Trudy from learning Alice's password. 
Figure 8.17 Protocol ap3.0 and a failure scenario

If we assume that Alice and Bob share a symmetric secret key, $K_{A-B}$, then Alice can encrypt the password and send her identification message, "I am Alice," and her encrypted password to Bob. Bob then decrypts the password and, assuming the password is correct, authenticates Alice. Bob feels comfortable in authenticating Alice since Alice not only knows the password, but also knows the shared secret key value needed to encrypt the password. Let's call this protocol ap3.1. While it is true that ap3.1 prevents Trudy from learning Alice's password, the use of cryptography here does not solve the authentication problem. Bob is subject to a playback attack: Trudy need only eavesdrop on Alice's communication, record the encrypted version of the password, and play back the encrypted version of the password to Bob to pretend that she is Alice. The use of an encrypted password in ap3.1 doesn't make the situation manifestly different from that of protocol ap3.0 in Figure 8.17.

8.4.5 Authentication Protocol ap4.0

The failure scenario in Figure 8.17 resulted from the fact that Bob could not distinguish between the original authentication of Alice and the later playback of Alice's original authentication. That is, Bob could not tell if Alice was live (that is, was currently really on the other end of the connection) or whether the messages he was receiving were a recorded playback of a previous authentication of Alice. The very (very) observant reader will recall that the three-way TCP handshake protocol needed to address the same problem---the server side of a TCP connection did not want to accept a connection if the received SYN segment was an old copy (retransmission) of a SYN segment from an earlier connection. How did the TCP server side solve the problem of determining whether the client was really live? It chose an initial sequence number that had not been used in a very long time, sent that number to the client, and then waited for the client to respond with an ACK segment containing that number. We can adopt the same idea here for authentication purposes. A nonce is a number that a protocol will use only once in a lifetime. That is, once a protocol uses a nonce, it will never use that number again. Our ap4.0 protocol uses a nonce as follows:

1. Alice sends the message "I am Alice" to Bob.

2. Bob chooses a nonce, R, and sends it to Alice.

3. Alice encrypts the nonce using Alice and Bob's symmetric secret key, $K_{A-B}$, and sends the encrypted nonce, $K_{A-B}(R)$, back to Bob. As in protocol ap3.1, it is the fact that Alice knows $K_{A-B}$ and uses it to encrypt a value that lets Bob know that the message he receives was generated by Alice. The nonce is used to ensure that Alice is live.

4. Bob decrypts the received message. If the decrypted nonce equals the nonce he sent Alice, then Alice is authenticated.

Protocol ap4.0 is illustrated in Figure 8.18. By using the once-in-a-lifetime value, R, and then checking the returned value, $K_{A-B}(R)$, Bob can be sure that Alice is both who she says she is (since she knows the secret key value needed to encrypt R) and live (since she has encrypted the nonce, R, that Bob just created). The use of a nonce and symmetric key cryptography forms the basis of ap4.0. A natural question is whether we can use a nonce and public key cryptography (rather than symmetric key cryptography) to solve the authentication problem. This issue is explored in the problems at the end of the chapter.

Figure 8.18 Protocol ap4.0 and a failure scenario
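Before moving on, it may help to see ap4.0's challenge-response exchange in running code. The sketch below is a minimal, single-process illustration using only Python's standard hmac and secrets modules; it proves possession of the shared key by computing a MAC over the nonce rather than literally encrypting it, a common variant of the same nonce idea, and the key value is of course made up.

```python
import hashlib
import hmac
import secrets

SHARED_KEY = b"k_ab-known-only-to-alice-and-bob"  # plays the role of K_A-B

def bob_choose_nonce() -> bytes:
    return secrets.token_bytes(16)  # R: fresh, never to be reused

def alice_respond(nonce: bytes) -> bytes:
    # Alice's reply plays the role of K_A-B(R): only a key holder can compute it
    return hmac.new(SHARED_KEY, nonce, hashlib.sha256).digest()

def bob_verify(nonce: bytes, reply: bytes) -> bool:
    expected = hmac.new(SHARED_KEY, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, reply)  # constant-time comparison

nonce = bob_choose_nonce()
assert bob_verify(nonce, alice_respond(nonce))  # Alice is authenticated

# Trudy replays Alice's old reply, but Bob has chosen a new nonce by then,
# so the playback attack that broke ap3.1 fails here:
assert not bob_verify(bob_choose_nonce(), alice_respond(nonce))
```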
8.5 Securing E-Mail

In previous sections, we examined fundamental issues in network security, including symmetric key and public key cryptography, end-point authentication, key distribution, message integrity, and digital signatures. We are now going to examine how these tools are being used to provide security in the Internet. Interestingly, it is possible to provide security services in any of the top four layers of the Internet protocol stack. When security is provided for a specific application-layer protocol, the application using the protocol will enjoy one or more security services, such as confidentiality, authentication, or integrity. When security is provided for a transport-layer protocol, all applications that use that protocol enjoy the security services of the transport protocol. When security is provided at the network layer on a host-to-host basis, all transport-layer segments (and hence all application-layer data) enjoy the security services of the network layer. When security is provided on a link basis, then the data in all frames traveling over the link receive the security services of the link. In Sections 8.5 through 8.8, we examine how security tools are being used in the application, transport, network, and link layers. Being consistent with the general structure of this book, we begin at the top of the protocol stack and discuss security at the application layer. Our approach is to use a specific application, e-mail, as a case study for application-layer security. We then move down the protocol stack. We'll examine the SSL protocol (which provides security at the transport layer), IPsec (which provides security at the network layer), and the security of the IEEE 802.11 wireless LAN protocol. You might be wondering why security functionality is being provided at more than one layer in the Internet. Wouldn't it suffice simply to provide the security functionality at the network layer and be done with it? There are two answers to this question. First, although security at the network layer can offer "blanket coverage" by encrypting all the data in the datagrams (that is, all the transport-layer segments) and by authenticating all the source IP addresses, it can't provide user-level security. For example, a commerce site cannot rely on IP-layer security to authenticate a customer who is purchasing goods at the commerce site. Thus, there is a need for security functionality at higher layers as well as blanket coverage at lower layers. Second, it is generally easier to deploy new Internet services, including security services, at the higher layers of the protocol stack. While waiting for security to be broadly deployed at the network layer, which is probably still many years in the future, many application developers "just do it" and introduce security functionality into their favorite applications. A classic example is Pretty Good Privacy (PGP), which provides secure e-mail (discussed later in this section). Requiring only client and server application code, PGP was one of the first security technologies to be broadly used in the Internet.

8.5.1 Secure E-Mail

We now use the cryptographic principles of Sections 8.2 through 8.3 to create a secure e-mail system. We create this high-level design in an incremental manner, at each step introducing new security services.
When designing a secure e-mail system, let us keep in mind the racy example introduced in Section 8.1---the love affair between Alice and Bob. Imagine that Alice wants to send an e-mail message to Bob, and Trudy wants to intrude. Before plowing ahead and designing a secure e-mail system for Alice and Bob, we should consider which security features would be most desirable for them. First and foremost is confidentiality. As discussed in Section 8.1, neither Alice nor Bob wants Trudy to read Alice's e-mail message. The second feature that Alice and Bob would most likely want to see in the secure e-mail system is sender authentication. In particular, when Bob receives the message "I don't love you anymore. I never want to see you again. Formerly yours, Alice," he would naturally want to be sure that the message came from Alice and not from Trudy. Another feature that the two lovers would appreciate is message integrity, that is, assurance that the message Alice sends is not modified while en route to Bob. Finally, the e-mail system should provide receiver authentication; that is, Alice wants to make sure that she is indeed sending the letter to Bob and not to someone else (for example, Trudy) who is impersonating Bob.

So let's begin by addressing the foremost concern, confidentiality. The most straightforward way to provide confidentiality is for Alice to encrypt the message with symmetric key technology (such as DES or AES) and for Bob to decrypt the message on receipt. As discussed in Section 8.2, if the symmetric key is long enough, and if only Alice and Bob have the key, then it is extremely difficult for anyone else (including Trudy) to read the message. Although this approach is straightforward, it has the fundamental difficulty that we discussed in Section 8.2---distributing a symmetric key so that only Alice and Bob have copies of it. So we naturally consider an alternative approach---public key cryptography (using, for example, RSA). In the public key approach, Bob makes his public key publicly available (e.g., in a public key server or on his personal Web page), Alice encrypts her message with Bob's public key, and she sends the encrypted message to Bob's e-mail address. When Bob receives the message, he simply decrypts it with his private key. Assuming that Alice knows for sure that the public key is Bob's public key, this approach is an excellent means to provide the desired confidentiality. One problem, however, is that public key encryption is relatively inefficient, particularly for long messages. To overcome the efficiency problem, let's make use of a session key (discussed in Section 8.2.2). In particular, Alice (1) selects a random symmetric session key, $K_S$, (2) encrypts her message, m, with the symmetric key, (3) encrypts the symmetric key with Bob's public key, $K_B^+$, (4) concatenates the encrypted message and the encrypted symmetric key to form a "package," and (5) sends the package to Bob's e-mail address.

Figure 8.19 Alice used a symmetric session key, $K_S$, to send a secret e-mail to Bob

The steps are illustrated in Figure 8.19. (In this and the subsequent figures, the circled "+" represents concatenation and the circled "−" represents deconcatenation.) When Bob receives the package, he (1) uses his private key, $K_B^-$, to obtain the symmetric key, $K_S$, and (2) uses the symmetric key $K_S$ to decrypt the message m.
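The five sender steps and two receiver steps just described can be expressed directly in code. Here is a minimal sketch, assuming the third-party Python cryptography package; Fernet stands in for the symmetric cipher and RSA-OAEP for the public key encryption, both illustrative choices rather than anything required by the design.

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

bob_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)  # K_B-
bob_public = bob_private.public_key()                                         # K_B+

# Alice's side, steps (1)-(5) of Figure 8.19
k_s = Fernet.generate_key()                         # (1) random session key K_S
encrypted_msg = Fernet(k_s).encrypt(b"my message")  # (2) K_S(m)
encrypted_key = bob_public.encrypt(k_s, oaep)       # (3) K_B+(K_S)
package = (encrypted_key, encrypted_msg)            # (4) concatenate; (5) send

# Bob's side
recovered_key = bob_private.decrypt(package[0], oaep)  # (1) recover K_S with K_B-
m = Fernet(recovered_key).decrypt(package[1])          # (2) recover m with K_S
assert m == b"my message"
```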
Having designed a secure e-mail system that provides confidentiality, let's now design another system that provides both sender authentication and message integrity. We'll suppose, for the moment, that Alice and Bob are no longer concerned with confidentiality (they want to share their feelings with everyone!), and are concerned only about sender authentication and message integrity. To accomplish this task, we use digital signatures and message digests, as described in Section 8.3. Specifically, Alice (1) applies a hash function, H (for example, MD5), to her message, m, to obtain a message digest, (2) signs the result of the hash function with her private key, $K_A^-$, to create a digital signature, (3) concatenates the original (unencrypted) message with the signature to create a package, and (4) sends the package to Bob's e-mail address. When Bob receives the package, he (1) applies Alice's public key, $K_A^+$, to the signed message digest and (2) compares the result of this operation with his own hash, H, of the message. The steps are illustrated in Figure 8.20. As discussed in Section 8.3, if the two results are the same, Bob can be pretty confident that the message came from Alice and is unaltered.

Figure 8.20 Using hash functions and digital signatures to provide sender authentication and message integrity

Now let's consider designing an e-mail system that provides confidentiality, sender authentication, and message integrity. This can be done by combining the procedures in Figures 8.19 and 8.20. Alice first creates a preliminary package, exactly as in Figure 8.20, that consists of her original message along with a digitally signed hash of the message. She then treats this preliminary package as a message in itself and sends this new message through the sender steps in Figure 8.19, creating a new package that is sent to Bob. The steps applied by Alice are shown in Figure 8.21. When Bob receives the package, he first applies his side of Figure 8.19 and then his side of Figure 8.20. It should be clear that this design achieves the goal of providing confidentiality, sender authentication, and message integrity. Note that, in this scheme, Alice uses public key cryptography twice: once with her own private key and once with Bob's public key. Similarly, Bob also uses public key cryptography twice---once with his private key and once with Alice's public key.

Figure 8.21 Alice uses symmetric key cryptography, public key cryptography, a hash function, and a digital signature to provide secrecy, sender authentication, and message integrity

The secure e-mail design outlined in Figure 8.21 probably provides satisfactory security for most e-mail users for most occasions. But there is still one important issue that remains to be addressed. The design in Figure 8.21 requires Alice to obtain Bob's public key, and requires Bob to obtain Alice's public key. The distribution of these public keys is a nontrivial problem. For example, Trudy might masquerade as Bob and give Alice her own public key while saying that it is Bob's public key, enabling her to receive the message meant for Bob. As we learned in Section 8.3, a popular approach for securely distributing public keys is to certify the public keys using a CA.
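The signature half of the design (Figure 8.20) can be sketched in the same style. The fragment below again assumes the cryptography package, and substitutes RSA-PSS with SHA-256 for the MD5 example in the text, since MD5 is no longer considered safe for signatures; note that the sign call hashes the message internally, so it performs steps (1) and (2) together.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)

alice_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)  # K_A-
alice_public = alice_private.public_key()                                       # K_A+

m = b"I don't love you anymore. Formerly yours, Alice"
signature = alice_private.sign(m, pss, hashes.SHA256())  # K_A-(H(m))
package = (m, signature)  # (3) message travels unencrypted, next to its signature

# Bob's side: hash the received message and compare with the signed digest
try:
    alice_public.verify(package[1], package[0], pss, hashes.SHA256())
    print("Message came from Alice and is unaltered")
except InvalidSignature:
    print("Reject: tampered, or not signed with Alice's key")
```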
8.5.2 PGP

Written by Phil Zimmermann in 1991, Pretty Good Privacy (PGP) is a nice example of an e-mail encryption scheme \[PGPI 2016\]. Versions of PGP are available in the public domain; for example, you can find the PGP software for your favorite platform as well as lots of interesting reading at the International PGP Home Page \[PGPI 2016\]. The PGP design is, in essence, the same as the design shown in Figure 8.21. Depending on the version, the PGP software uses MD5 or SHA for calculating the message digest; CAST, triple-DES, or IDEA for symmetric key encryption; and RSA for the public key encryption. When PGP is installed, the software creates a public key pair for the user. The public key can be posted on the user's Web site or placed in a public key server. The private key is protected by the use of a password. The password has to be entered every time the user accesses the private key. PGP gives the user the option of digitally signing the message, encrypting the message, or both digitally signing and encrypting. Figure 8.22 shows a PGP signed message. This message appears after the MIME header. The encoded data in the message is $K_A^-(H(m))$, that is, the digitally signed message digest. As we discussed above, in order for Bob to verify the integrity of the message, he needs to have access to Alice's public key.

Figure 8.22 A PGP signed message

Figure 8.23 shows a secret PGP message. This message also appears after the MIME header. Of course, the plaintext message is not included within the secret e-mail message. When a sender (such as Alice) wants both confidentiality and integrity, PGP contains a message like that of Figure 8.23 within the message of Figure 8.22.

Figure 8.23 A secret PGP message

PGP also provides a mechanism for public key certification, but the mechanism is quite different from the more conventional CA. PGP public keys are certified by a web of trust. Alice herself can certify any key/username pair when she believes the pair really belong together. In addition, PGP permits Alice to say that she trusts another user to vouch for the authenticity of more keys. Some PGP users sign each other's keys by holding key-signing parties. Users physically gather, exchange public keys, and certify each other's keys by signing them with their private keys.

8.6 Securing TCP Connections: SSL

In the previous section, we saw how cryptographic techniques can provide confidentiality, data integrity, and end-point authentication to a specific application, namely, e-mail. In this section, we'll drop down a layer in the protocol stack and examine how cryptography can enhance TCP with security services, including confidentiality, data integrity, and end-point authentication. This enhanced version of TCP is commonly known as Secure Sockets Layer (SSL). A slightly modified version of SSL version 3, called Transport Layer Security (TLS), has been standardized by the IETF \[RFC 4346\]. The SSL protocol was originally designed by Netscape, but the basic ideas behind securing TCP had predated Netscape's work (for example, see Woo \[Woo 1994\]). Since its inception, SSL has enjoyed broad deployment. SSL is supported by all popular Web browsers and Web servers, and it is used by Gmail and essentially all Internet commerce sites (including Amazon, eBay, and TaoBao). Hundreds of billions of dollars are spent over SSL every year. In fact, if you have ever purchased anything over the Internet with your credit card, the communication between your browser and the server for this purchase almost certainly went over SSL.
(You can identify that SSL is being used by your browser when the URL begins with https: rather than http.) To understand the need for SSL, let's walk through a typical Internet commerce scenario. Bob is surfing the Web and arrives at the Alice Incorporated site, which is selling perfume. The Alice Incorporated site displays a form in which Bob is supposed to enter the type of perfume and quantity desired, his address, and his payment card number. Bob enters this information, clicks on Submit, and expects to receive (via ordinary postal mail) the purchased perfumes; he also expects to receive a charge for his order in his next payment card statement. This all sounds good, but if no security measures are taken, Bob could be in for a few surprises. If no confidentiality (encryption) is used, an intruder could intercept Bob's order and obtain his payment card information. The intruder could then make purchases at Bob's expense. If no data integrity is used, an intruder could modify Bob's order, having him purchase ten times more bottles of perfume than desired. Finally, if no server authentication is used, a server could display Alice Incorporated's famous logo when in actuality the site is maintained by Trudy, who is masquerading as Alice Incorporated. After receiving Bob's order, Trudy could take Bob's money and run. Or Trudy could carry out an identity theft by collecting Bob's name, address, and credit card number. SSL addresses these issues by enhancing TCP with confidentiality, data integrity, server authentication, and client authentication.

SSL is often used to provide security to transactions that take place over HTTP. However, because SSL secures TCP, it can be employed by any application that runs over TCP. SSL provides a simple Application Programmer Interface (API) with sockets, which is similar to TCP's API. When an application wants to employ SSL, the application includes SSL classes/libraries. As shown in Figure 8.24, although SSL technically resides in the application layer, from the developer's perspective it is a transport protocol that provides TCP's services enhanced with security services.

Figure 8.24 Although SSL technically resides in the application layer, from the developer's perspective it is a transport-layer protocol
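This socket-like flavor of the API is easy to see in practice. The snippet below is a minimal sketch using Python's standard ssl and socket modules (which today speak TLS, SSL's standardized successor): an ordinary TCP socket is created first, then wrapped with the security services. The host www.example.com is just a placeholder.

```python
import socket
import ssl

ctx = ssl.create_default_context()  # loads CA certificates for server authentication

# First a plain TCP connection, then the SSL/TLS handshake on top of it
with socket.create_connection(("www.example.com", 443)) as tcp_sock:
    with ctx.wrap_socket(tcp_sock, server_hostname="www.example.com") as s:
        # From here on, the application uses the socket as usual;
        # encryption and integrity checking happen transparently.
        s.sendall(b"GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n")
        print(s.version())  # negotiated protocol version, e.g., TLSv1.3
        print(s.recv(200))  # beginning of the server's response
```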
8.6.1 The Big Picture

We begin by describing a simplified version of SSL, one that will allow us to get a big-picture understanding of the why and how of SSL. We will refer to this simplified version of SSL as "almost-SSL." After describing almost-SSL, in the next subsection we'll then describe the real SSL, filling in the details. Almost-SSL (and SSL) has three phases: handshake, key derivation, and data transfer. We now describe these three phases for a communication session between a client (Bob) and a server (Alice), with Alice having a private/public key pair and a certificate that binds her identity to her public key.

Handshake

During the handshake phase, Bob needs to (a) establish a TCP connection with Alice, (b) verify that Alice is really Alice, and (c) send Alice a master secret key, which will be used by both Alice and Bob to generate all the symmetric keys they need for the SSL session. These three steps are shown in Figure 8.25. Note that once the TCP connection is established, Bob sends Alice a hello message. Alice then responds with her certificate, which contains her public key. As discussed in Section 8.3, because the certificate has been certified by a CA, Bob knows for sure that the public key in the certificate belongs to Alice. Bob then generates a Master Secret (MS) (which will only be used for this SSL session), encrypts the MS with Alice's public key to create the Encrypted Master Secret (EMS), and sends the EMS to Alice. Alice decrypts the EMS with her private key to get the MS. After this phase, both Bob and Alice (and no one else) know the master secret for this SSL session.

Figure 8.25 The almost-SSL handshake, beginning with a TCP connection

Key Derivation

In principle, the MS, now shared by Bob and Alice, could be used as the symmetric session key for all subsequent encryption and data integrity checking. It is, however, generally considered safer for Alice and Bob to each use different cryptographic keys, and also to use different keys for encryption and integrity checking. Thus, both Alice and Bob use the MS to generate four keys:

- $E_B$ = session encryption key for data sent from Bob to Alice
- $M_B$ = session MAC key for data sent from Bob to Alice
- $E_A$ = session encryption key for data sent from Alice to Bob
- $M_A$ = session MAC key for data sent from Alice to Bob

Alice and Bob each generate the four keys from the MS. This could be done by simply slicing the MS into four keys. (But in real SSL it is a little more complicated, as we'll see.) At the end of the key derivation phase, both Alice and Bob have all four keys. The two encryption keys will be used to encrypt data; the two MAC keys will be used to verify the integrity of the data.

Data Transfer

Now that Alice and Bob share the same four session keys ($E_B$, $M_B$, $E_A$, and $M_A$), they can start to send secured data to each other over the TCP connection. Since TCP is a byte-stream protocol, a natural approach would be for SSL to encrypt application data on the fly and then pass the encrypted data on the fly to TCP. But if we were to do this, where would we put the MAC for the integrity check? We certainly do not want to wait until the end of the TCP session to verify the integrity of all of Bob's data that was sent over the entire session! To address this issue, SSL breaks the data stream into records, appends a MAC to each record for integrity checking, and then encrypts the record+MAC. To create the MAC, Bob inputs the record data along with the key $M_B$ into a hash function, as discussed in Section 8.3. To encrypt the package record+MAC, Bob uses his session encryption key $E_B$. This encrypted package is then passed to TCP for transport over the Internet. Although this approach goes a long way, it still isn't bullet-proof when it comes to providing data integrity for the entire message stream. In particular, suppose Trudy is a woman-in-the-middle and has the ability to insert, delete, and replace segments in the stream of TCP segments sent between Alice and Bob. Trudy, for example, could capture two segments sent by Bob, reverse the order of the segments, adjust the TCP sequence numbers (which are not encrypted), and then send the two reverse-ordered segments to Alice. Assuming that each TCP segment encapsulates exactly one record, let's now take a look at how Alice would process these segments.

1. TCP running in Alice would think everything is fine and pass the two records to the SSL sublayer.

2. SSL in Alice would decrypt the two records.

3. SSL in Alice would use the MAC in each record to verify the data integrity of the two records.
4. SSL would then pass the decrypted byte streams of the two records to the application layer; but the complete byte stream received by Alice would not be in the correct order due to reversal of the records!

You are encouraged to walk through similar scenarios for when Trudy removes segments or when Trudy replays segments. The solution to this problem, as you probably guessed, is to use sequence numbers. SSL does this as follows. Bob maintains a sequence number counter, which begins at zero and is incremented for each SSL record he sends. Bob doesn't actually include a sequence number in the record itself, but when he calculates the MAC, he includes the sequence number in the MAC calculation. Thus, the MAC is now a hash of the data plus the MAC key $M_B$ plus the current sequence number. Alice tracks Bob's sequence numbers, allowing her to verify the data integrity of a record by including the appropriate sequence number in the MAC calculation. This use of SSL sequence numbers prevents Trudy from carrying out a woman-in-the-middle attack, such as reordering or replaying segments. (Why?)
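A few lines of code make the defense concrete. This is a minimal sketch, assuming HMAC-SHA256 as the MAC algorithm and the key name $M_B$ from the text; real SSL/TLS specifies its own MAC construction, so treat this only as an illustration of the idea.

```python
import hashlib
import hmac

M_B = b"bob-to-alice-mac-key"  # session MAC key from key derivation (made up)

def record_mac(record_data: bytes, seq: int) -> bytes:
    # The sequence number is fed into the MAC but never sent on the wire
    return hmac.new(M_B, seq.to_bytes(8, "big") + record_data,
                    hashlib.sha256).digest()

mac0 = record_mac(b"first record", 0)
mac1 = record_mac(b"second record", 1)

# If Trudy swaps the two records, Alice MACs b"second record" against her own
# counter value 0, and the result does not match the MAC Bob attached:
assert record_mac(b"second record", 0) != mac1
assert record_mac(b"first record", 1) != mac0
```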
SSL Record

The SSL record (as well as the almost-SSL record) is shown in Figure 8.26. The record consists of a type field, version field, length field, data field, and MAC field. Note that the first three fields are not encrypted. The type field indicates whether the record is a handshake message or a message that contains application data. It is also used to close the SSL connection, as discussed below. SSL at the receiving end uses the length field to extract the SSL records out of the incoming TCP byte stream. The version field is self-explanatory.

Figure 8.26 Record format for SSL

8.6.2 A More Complete Picture

The previous subsection covered the almost-SSL protocol; it served to give us a basic understanding of the why and how of SSL. Now that we have a basic understanding of SSL, we can dig a little deeper and examine the essentials of the actual SSL protocol. In parallel to reading this description of the SSL protocol, you are encouraged to complete the Wireshark SSL lab, available at the textbook's Web site.

SSL Handshake

SSL does not mandate that Alice and Bob use a specific symmetric key algorithm, a specific public-key algorithm, or a specific MAC. Instead, SSL allows Alice and Bob to agree on the cryptographic algorithms at the beginning of the SSL session, during the handshake phase. Additionally, during the handshake phase, Alice and Bob send nonces to each other, which are used in the creation of the session keys ($E_B$, $M_B$, $E_A$, and $M_A$). The steps of the real SSL handshake are as follows:

1. The client sends a list of cryptographic algorithms it supports, along with a client nonce.

2. From the list, the server chooses a symmetric algorithm (for example, AES), a public key algorithm (for example, RSA with a specific key length), and a MAC algorithm. It sends back to the client its choices, as well as a certificate and a server nonce.

3. The client verifies the certificate, extracts the server's public key, generates a Pre-Master Secret (PMS), encrypts the PMS with the server's public key, and sends the encrypted PMS to the server.

4. Using the same key derivation function (as specified by the SSL standard), the client and server independently compute the Master Secret (MS) from the PMS and nonces. The MS is then sliced up to generate the two encryption and two MAC keys. Furthermore, when the chosen symmetric cipher employs CBC (such as 3DES or AES), then two Initialization Vectors (IVs)---one for each side of the connection---are also obtained from the MS. Henceforth, all messages sent between client and server are encrypted and authenticated (with the MAC).

5. The client sends a MAC of all the handshake messages.

6. The server sends a MAC of all the handshake messages.

The last two steps protect the handshake from tampering. To see this, observe that in step 1, the client typically offers a list of algorithms---some strong, some weak. This list of algorithms is sent in cleartext, since the encryption algorithms and keys have not yet been agreed upon. Trudy, as a woman-in-the-middle, could delete the stronger algorithms from the list, forcing the client to select a weak algorithm. To prevent such a tampering attack, in step 5 the client sends a MAC of the concatenation of all the handshake messages it sent and received. The server can compare this MAC with the MAC of the handshake messages it received and sent. If there is an inconsistency, the server can terminate the connection. Similarly, the server sends a MAC of the handshake messages it has seen, allowing the client to check for inconsistencies.

You may be wondering why there are nonces in steps 1 and 2. Don't sequence numbers suffice for preventing the segment replay attack? The answer is yes, but they don't alone prevent the "connection replay attack." Consider the following connection replay attack. Suppose Trudy sniffs all messages between Alice and Bob. The next day, Trudy masquerades as Bob and sends to Alice exactly the same sequence of messages that Bob sent to Alice on the previous day. If Alice doesn't use nonces, she will respond with exactly the same sequence of messages she sent the previous day. Alice will not suspect any funny business, as each message she receives will pass the integrity check. If Alice is an e-commerce server, she will think that Bob is placing a second order (for exactly the same thing). On the other hand, by including a nonce in the protocol, Alice will send different nonces for each TCP session, causing the encryption keys to be different on the two days. Therefore, when Alice receives played-back SSL records from Trudy, the records will fail the integrity checks, and the bogus e-commerce transaction will not succeed. In summary, in SSL, nonces are used to defend against the "connection replay attack" and sequence numbers are used to defend against replaying individual packets during an ongoing session.

Connection Closure

At some point, either Bob or Alice will want to end the SSL session. One approach would be to let Bob end the SSL session by simply terminating the underlying TCP connection---that is, by having Bob send a TCP FIN segment to Alice. But such a naive design sets the stage for the truncation attack whereby Trudy once again gets in the middle of an ongoing SSL session and ends the session early with a TCP FIN. If Trudy were to do this, Alice would think she received all of Bob's data when in actuality she received only a portion of it. The solution to this problem is to indicate in the type field whether the record serves to terminate the SSL session. (Although the SSL type is sent in the clear, it is authenticated at the receiver using the record's MAC.)
By including such a field, if Alice were to receive a TCP FIN before receiving a closure SSL record, she would know that something funny was going on. This completes our introduction to SSL. We've seen that it uses many of the cryptography principles discussed in Sections 8.2 and 8.3. Readers who want to explore SSL on yet a deeper level can read Rescorla's highly readable book on SSL \[Rescorla 2001\].

8.7 Network-Layer Security: IPsec and Virtual Private Networks

The IP security protocol, more commonly known as IPsec, provides security at the network layer. IPsec secures IP datagrams between any two network-layer entities, including hosts and routers. As we will soon describe, many institutions (corporations, government branches, non-profit organizations, and so on) use IPsec to create virtual private networks (VPNs) that run over the public Internet. Before getting into the specifics of IPsec, let's step back and consider what it means to provide confidentiality at the network layer. With network-layer confidentiality between a pair of network entities (for example, between two routers, between two hosts, or between a router and a host), the sending entity encrypts the payloads of all the datagrams it sends to the receiving entity. The encrypted payload could be a TCP segment, a UDP segment, an ICMP message, and so on. If such a network-layer service were in place, all data sent from one entity to the other---including e-mail, Web pages, TCP handshake messages, and management messages (such as ICMP and SNMP)---would be hidden from any third party that might be sniffing the network. For this reason, network-layer security is said to provide "blanket coverage."

In addition to confidentiality, a network-layer security protocol could potentially provide other security services. For example, it could provide source authentication, so that the receiving entity can verify the source of the secured datagram. A network-layer security protocol could provide data integrity, so that the receiving entity can check for any tampering of the datagram that may have occurred while the datagram was in transit. A network-layer security service could also provide replay-attack prevention, meaning that Bob could detect any duplicate datagrams that an attacker might insert. We will soon see that IPsec indeed provides mechanisms for all these security services, that is, for confidentiality, source authentication, data integrity, and replay-attack prevention.

8.7.1 IPsec and Virtual Private Networks (VPNs)

An institution that extends over multiple geographical regions often desires its own IP network, so that its hosts and servers can send data to each other in a secure and confidential manner. To achieve this goal, the institution could actually deploy a stand-alone physical network---including routers, links, and a DNS infrastructure---that is completely separate from the public Internet. Such a disjoint network, dedicated to a particular institution, is called a private network. Not surprisingly, a private network can be very costly, as the institution needs to purchase, install, and maintain its own physical network infrastructure.

Instead of deploying and maintaining a private network, many institutions today create VPNs over the existing public Internet. With a VPN, the institution's inter-office traffic is sent over the public Internet rather than over a physically independent network.
But to provide confidentiality, the inter-office traffic is encrypted before it enters the public Internet. A simple example of a VPN is shown in Figure 8.27. Here the institution consists of a headquarters, a branch office, and traveling salespersons that typically access the Internet from their hotel rooms. (There is only one salesperson shown in the figure.) In this VPN, whenever two hosts within headquarters send IP datagrams to each other or whenever two hosts within the branch office want to communicate, they use good-old vanilla IPv4 (that is, without IPsec services). However, when two of the institution's hosts communicate over a path that traverses the public Internet, the traffic is encrypted before it enters the Internet.

Figure 8.27 Virtual private network (VPN)

To get a feel for how a VPN works, let's walk through a simple example in the context of Figure 8.27. When a host in headquarters sends an IP datagram to a salesperson in a hotel, the gateway router in headquarters converts the vanilla IPv4 datagram into an IPsec datagram and then forwards this IPsec datagram into the Internet. This IPsec datagram actually has a traditional IPv4 header, so that the routers in the public Internet process the datagram as if it were an ordinary IPv4 datagram---to them, the datagram is a perfectly ordinary datagram. But, as shown in Figure 8.27, the payload of the IPsec datagram includes an IPsec header, which is used for IPsec processing; furthermore, the payload of the IPsec datagram is encrypted. When the IPsec datagram arrives at the salesperson's laptop, the OS in the laptop decrypts the payload (and provides other security services, such as verifying data integrity) and passes the unencrypted payload to the upper-layer protocol (for example, to TCP or UDP). We have just given a high-level overview of how an institution can employ IPsec to create a VPN. To see the forest through the trees, we have brushed aside many important details. Let's now take a closer look.

8.7.2 The AH and ESP Protocols

IPsec is a rather complex animal---it is defined in more than a dozen RFCs. Two important RFCs are RFC 4301, which describes the overall IP security architecture, and RFC 6071, which provides an overview of the IPsec protocol suite. Our goal in this textbook, as usual, is not simply to re-hash the dry and arcane RFCs, but instead to take a more operational and pedagogic approach to describing the protocols. In the IPsec protocol suite, there are two principal protocols: the Authentication Header (AH) protocol and the Encapsulation Security Payload (ESP) protocol. When a source IPsec entity (typically a host or a router) sends secure datagrams to a destination entity (also a host or a router), it does so with either the AH protocol or the ESP protocol. The AH protocol provides source authentication and data integrity but does not provide confidentiality. The ESP protocol provides source authentication, data integrity, and confidentiality. Because confidentiality is often critical for VPNs and other IPsec applications, the ESP protocol is much more widely used than the AH protocol. In order to de-mystify IPsec and avoid much of its complication, we will henceforth focus exclusively on the ESP protocol. Readers wanting to learn also about the AH protocol are encouraged to explore the RFCs and other online resources.
8.7.3 Security Associations

IPsec datagrams are sent between pairs of network entities, such as between two hosts, between two routers, or between a host and router. Before sending IPsec datagrams from source entity to destination entity, the source and destination entities create a network-layer logical connection. This logical connection is called a security association (SA). An SA is a simplex logical connection; that is, it is unidirectional from source to destination. If both entities want to send secure datagrams to each other, then two SAs (that is, two logical connections) need to be established, one in each direction. For example, consider once again the institutional VPN in Figure 8.27. This institution consists of a headquarters office, a branch office and, say, n traveling salespersons. For the sake of example, let's suppose that there is bi-directional IPsec traffic between headquarters and the branch office and bi-directional IPsec traffic between headquarters and the salespersons. In this VPN, how many SAs are there? To answer this question, note that there are two SAs between the headquarters gateway router and the branch-office gateway router (one in each direction); for each salesperson's laptop, there are two SAs between the headquarters gateway router and the laptop (again, one in each direction). So, in total, there are (2+2n) SAs. Keep in mind, however, that not all traffic sent into the Internet by the gateway routers or by the laptops will be IPsec secured. For example, a host in headquarters may want to access a Web server (such as Amazon or Google) in the public Internet. Thus, the gateway router (and the laptops) will emit into the Internet both vanilla IPv4 datagrams and secured IPsec datagrams.

Figure 8.28 Security association (SA) from R1 to R2

Let's now take a look "inside" an SA. To make the discussion tangible and concrete, let's do this in the context of an SA from router R1 to router R2 in Figure 8.28. (You can think of Router R1 as the headquarters gateway router and Router R2 as the branch office gateway router from Figure 8.27.) Router R1 will maintain state information about this SA, which will include:

- A 32-bit identifier for the SA, called the Security Parameter Index (SPI)
- The origin interface of the SA (in this case 200.168.1.100) and the destination interface of the SA (in this case 193.68.2.23)
- The type of encryption to be used (for example, 3DES with CBC)
- The encryption key
- The type of integrity check (for example, HMAC with MD5)
- The authentication key

Whenever router R1 needs to construct an IPsec datagram for forwarding over this SA, it accesses this state information to determine how it should authenticate and encrypt the datagram. Similarly, router R2 will maintain the same state information for this SA and will use this information to authenticate and decrypt any IPsec datagram that arrives from the SA. An IPsec entity (router or host) often maintains state information for many SAs. For example, in the VPN example in Figure 8.27 with n salespersons, the headquarters gateway router maintains state information for (2+2n) SAs. An IPsec entity stores the state information for all of its SAs in its Security Association Database (SAD), which is a data structure in the entity's OS kernel.
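The per-SA state just listed maps naturally onto a small data structure. Below is a minimal sketch in Python (the class and field names are our own, not from any real IPsec implementation), populating one SA with the example values from the text and storing it in a dictionary that plays the role of the SAD.

```python
from dataclasses import dataclass

@dataclass
class SecurityAssociation:
    spi: int         # 32-bit Security Parameter Index
    src: str         # origin interface of the SA
    dst: str         # destination interface of the SA
    enc_alg: str     # type of encryption, e.g., "3DES-CBC"
    enc_key: bytes   # encryption key
    auth_alg: str    # type of integrity check, e.g., "HMAC-MD5"
    auth_key: bytes  # authentication key

# The SAD: one entry per SA, indexed by SPI, so the receiving entity can
# look up the right algorithms and keys for each arriving IPsec datagram.
sad: dict[int, SecurityAssociation] = {}

sa = SecurityAssociation(
    spi=0x00001234, src="200.168.1.100", dst="193.68.2.23",
    enc_alg="3DES-CBC", enc_key=b"<24-byte key>",
    auth_alg="HMAC-MD5", auth_key=b"<16-byte key>",
)
sad[sa.spi] = sa
```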
8.7.4 The IPsec Datagram

Having described SAs, we can now describe the actual IPsec datagram. IPsec has two different packet forms, one for the so-called tunnel mode and the other for the so-called transport mode. The tunnel mode, being more appropriate for VPNs, is more widely deployed than the transport mode. In order to further de-mystify IPsec and avoid much of its complication, we henceforth focus exclusively on the tunnel mode. Once you have a solid grip on the tunnel mode, you should be able to easily learn about the transport mode on your own.

Figure 8.29 IPsec datagram format

The packet format of the IPsec datagram is shown in Figure 8.29. You might think that packet formats are boring and insipid, but we will soon see that the IPsec datagram actually looks and tastes like a popular Tex-Mex delicacy! Let's examine the IPsec fields in the context of Figure 8.28. Suppose router R1 receives an ordinary IPv4 datagram from host 172.16.1.17 (in the headquarters network) which is destined to host 172.16.2.48 (in the branch-office network). Router R1 uses the following recipe to convert this "original IPv4 datagram" into an IPsec datagram:

- Appends to the back of the original IPv4 datagram (which includes the original header fields!) an "ESP trailer" field
- Encrypts the result using the algorithm and key specified by the SA
- Appends to the front of this encrypted quantity a field called "ESP header"; the resulting package is called the "enchilada"
- Creates an authentication MAC over the whole enchilada using the algorithm and key specified in the SA
- Appends the MAC to the back of the enchilada, forming the payload
- Finally, creates a brand-new IP header with all the classic IPv4 header fields (together normally 20 bytes long), which it appends before the payload

Note that the resulting IPsec datagram is a bona fide IPv4 datagram, with the traditional IPv4 header fields followed by a payload. But in this case, the payload contains an ESP header, the original IP datagram, an ESP trailer, and an ESP authentication field (with the original datagram and ESP trailer encrypted). The original IP datagram has 172.16.1.17 for the source IP address and 172.16.2.48 for the destination IP address. Because the IPsec datagram includes the original IP datagram, these addresses are included (and encrypted) as part of the payload of the IPsec packet. But what about the source and destination IP addresses that are in the new IP header, that is, in the left-most header of the IPsec datagram? As you might expect, they are set to the source and destination router interfaces at the two ends of the tunnels, namely, 200.168.1.100 and 193.68.2.23. Also, the protocol number in this new IPv4 header field is not set to that of TCP, UDP, or SMTP, but instead to 50, designating that this is an IPsec datagram using the ESP protocol. After R1 sends the IPsec datagram into the public Internet, it will pass through many routers before reaching R2. Each of these routers will process the datagram as if it were an ordinary datagram---they are completely oblivious to the fact that the datagram is carrying IPsec-encrypted data. For these public Internet routers, because the destination IP address in the outer header is R2, the ultimate destination of the datagram is R2.

Having walked through an example of how an IPsec datagram is constructed, let's now take a closer look at the ingredients in the enchilada. We see in Figure 8.29 that the ESP trailer consists of three fields: padding; pad length; and next header.
Recall that block ciphers require the message to be encrypted to be an integer multiple of the block length. Padding (consisting of meaningless bytes) is used so that when added to the original datagram (along with the pad length and next header fields), the resulting "message" is an integer number of blocks. The pad-length field indicates to the receiving entity how much padding was inserted (and thus needs to be removed). The next header identifies the type (e.g., UDP) of data contained in the payload-data field. The payload data (typically the original IP datagram) and the ESP trailer are concatenated and then encrypted. Appended to the front of this encrypted unit is the ESP header, which is sent in the clear and consists of two fields: the SPI and the sequence number field. The SPI indicates to the receiving entity the SA to which the datagram belongs; the receiving entity can then index its SAD with the SPI to determine the appropriate authentication/decryption algorithms and keys. The sequence number field is used to defend against replay attacks. The sending entity also appends an authentication MAC. As stated earlier, the sending entity calculates a MAC over the whole enchilada (consisting of the ESP header, the original IP datagram, and the ESP trailer---with the datagram and trailer being encrypted). Recall that to calculate a MAC, the sender appends a secret MAC key to the enchilada and then calculates a fixed-length hash of the result.

When R2 receives the IPsec datagram, R2 observes that the destination IP address of the datagram is R2 itself. R2 therefore processes the datagram. Because the protocol field (in the left-most IP header) is 50, R2 sees that it should apply IPsec ESP processing to the datagram. First, peering into the enchilada, R2 uses the SPI to determine to which SA the datagram belongs. Second, it calculates the MAC of the enchilada and verifies that the MAC is consistent with the value in the ESP MAC field. If it is, it knows that the enchilada comes from R1 and has not been tampered with. Third, it checks the sequence-number field to verify that the datagram is fresh (and not a replayed datagram). Fourth, it decrypts the encrypted unit using the decryption algorithm and key associated with the SA. Fifth, it removes padding and extracts the original, vanilla IP datagram. And finally, sixth, it forwards the original datagram into the branch office network toward its ultimate destination. Whew, what a complicated recipe, huh? Well no one ever said that preparing and unraveling an enchilada was easy!

There is actually another important subtlety that needs to be addressed. It centers on the following question: When R1 receives an (unsecured) datagram from a host in the headquarters network, and that datagram is destined to some destination IP address outside of headquarters, how does R1 know whether it should be converted to an IPsec datagram? And if it is to be processed by IPsec, how does R1 know which SA (of many SAs in its SAD) should be used to construct the IPsec datagram? The problem is solved as follows. Along with a SAD, the IPsec entity also maintains another data structure called the Security Policy Database (SPD). The SPD indicates what types of datagrams (as a function of source IP address, destination IP address, and protocol type) are to be IPsec processed; and for those that are to be IPsec processed, which SA should be used. In a sense, the information in an SPD indicates "what" to do with an arriving datagram; the information in the SAD indicates "how" to do it.
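To make the recipe's ordering visible, here is a schematic (deliberately not wire-accurate) sketch of tunnel-mode encapsulation in Python. It reuses the SecurityAssociation object sa sketched above, substitutes HMAC-SHA1 and a toy XOR "cipher" for the SA's real algorithms, and uses a stubbed build_ipv4_header helper of our own invention; the real formats are specified in RFC 4303.

```python
import hashlib
import hmac

BLOCK = 8  # block size of the (stand-in) cipher

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    # Toy XOR cipher standing in for, say, 3DES-CBC; never use for real traffic
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def build_ipv4_header(src: str, dst: str, proto: int) -> bytes:
    # Stub: a real implementation would emit the 20-byte IPv4 header here
    return f"IPv4[{src}->{dst} proto={proto}]".encode()

def esp_encapsulate(original_dgram: bytes, sa, seq: int) -> bytes:
    # ESP trailer: padding + pad-length byte + next-header byte (4 = IP-in-IP)
    pad_len = -(len(original_dgram) + 2) % BLOCK
    trailer = bytes(pad_len) + bytes([pad_len, 4])
    # Encrypt the whole original datagram plus trailer, per the SA
    encrypted = toy_encrypt(original_dgram + trailer, sa.enc_key)
    # ESP header (SPI + sequence number) is prepended, sent in the clear
    esp_header = sa.spi.to_bytes(4, "big") + seq.to_bytes(4, "big")
    enchilada = esp_header + encrypted
    mac = hmac.new(sa.auth_key, enchilada, hashlib.sha1).digest()  # over the enchilada
    outer = build_ipv4_header(src=sa.src, dst=sa.dst, proto=50)    # brand-new header
    return outer + enchilada + mac

ipsec_dgram = esp_encapsulate(b"<original IPv4 datagram bytes>", sa, seq=1)
```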
Summary of IPsec Services

So what services does IPsec provide, exactly? Let us examine these services from the perspective of an attacker, say Trudy, who is a woman-in-the-middle, sitting somewhere on the path between R1 and R2 in Figure 8.28. Assume throughout this discussion that Trudy does not know the authentication and encryption keys used by the SA. What can and cannot Trudy do? First, Trudy cannot see the original datagram. In fact, not only is the data in the original datagram hidden from Trudy, but so is the protocol number, the source IP address, and the destination IP address. For datagrams sent over the SA, Trudy only knows that the datagram originated from some host in 172.16.1.0/24 and is destined to some host in 172.16.2.0/24. She does not know if it is carrying TCP, UDP, or ICMP data; she does not know if it is carrying HTTP, SMTP, or some other type of application data. This confidentiality thus goes a lot farther than SSL. Second, suppose Trudy tries to tamper with a datagram in the SA by flipping some of its bits. When this tampered datagram arrives at R2, it will fail the integrity check (using the MAC), thwarting Trudy's vicious attempts once again. Third, suppose Trudy tries to masquerade as R1, creating an IPsec datagram with source 200.168.1.100 and destination 193.68.2.23. Trudy's attack will be futile, as this datagram will again fail the integrity check at R2. Finally, because IPsec includes sequence numbers, Trudy will not be able to create a successful replay attack. In summary, as claimed at the beginning of this section, IPsec provides---between any pair of devices that process packets through the network layer---confidentiality, source authentication, data integrity, and replay-attack prevention.

8.7.5 IKE: Key Management in IPsec

When a VPN has a small number of end points (for example, just two routers as in Figure 8.28), the network administrator can manually enter the SA information (encryption/authentication algorithms and keys, and the SPIs) into the SADs of the endpoints. Such "manual keying" is clearly impractical for a large VPN, which may consist of hundreds or even thousands of IPsec routers and hosts. Large, geographically distributed deployments require an automated mechanism for creating the SAs. IPsec does this with the Internet Key Exchange (IKE) protocol, specified in RFC 5996.

IKE has some similarities with the handshake in SSL (see Section 8.6). Each IPsec entity has a certificate, which includes the entity's public key. As with SSL, the IKE protocol has the two entities exchange certificates, negotiate authentication and encryption algorithms, and securely exchange key material for creating session keys in the IPsec SAs. Unlike SSL, IKE employs two phases to carry out these tasks. Let's investigate these two phases in the context of two routers, R1 and R2, in Figure 8.28. The first phase consists of two exchanges of message pairs between R1 and R2: During the first exchange of messages, the two sides use Diffie-Hellman (see Homework Problems) to create a bi-directional IKE SA between the routers. To keep us all confused, this bi-directional IKE SA is entirely different from the IPsec SAs discussed in Sections 8.7.3 and 8.7.4. The IKE SA provides an authenticated and encrypted channel between the two routers.
During this first message-pair exchange, keys are established for encryption and authentication for the IKE SA. Also established is a master secret that will be used to compute IPsec SA keys later in phase 2. Observe that during this first step, RSA public and private keys are not used. In particular, neither R1 nor R2 reveals its identity by signing a message with its private key. During the second exchange of messages, both sides reveal their identity to each other by signing their messages. However, the identities are not revealed to a passive sniffer, since the messages are sent over the secured IKE SA channel. Also during this phase, the two sides negotiate the IPsec encryption and authentication algorithms to be employed by the IPsec SAs. In phase 2 of IKE, the two sides create an SA in each direction. At the end of phase 2, the encryption and authentication session keys are established on both sides for the two SAs. The two sides can then use the SAs to send secured datagrams, as described in Sections 8.7.3 and 8.7.4. The primary motivation for having two phases in IKE is computational cost---since the second phase doesn't involve any public-key cryptography, IKE can generate a large number of SAs between the two IPsec entities with relatively little computational cost.

8.8 Securing Wireless LANs

Security is a particularly important concern in wireless networks, where radio waves carrying frames can propagate far beyond the building containing the wireless base station and hosts. In this section we present a brief introduction to wireless security. For a more in-depth treatment, see the highly readable book by Edney and Arbaugh \[Edney 2003\]. The issue of security in 802.11 has attracted considerable attention in both technical circles and in the media. While there has been considerable discussion, there has been little debate---there seems to be universal agreement that the original 802.11 specification contains a number of serious security flaws. Indeed, public domain software can now be downloaded that exploits these holes, making those who use the vanilla 802.11 security mechanisms as open to security attacks as users who use no security features at all. In the following section, we discuss the security mechanisms initially standardized in the 802.11 specification, known collectively as Wired Equivalent Privacy (WEP). As the name suggests, WEP is meant to provide a level of security similar to that found in wired networks. We'll then discuss a few of the security holes in WEP and discuss the 802.11i standard, a fundamentally more secure version of 802.11 adopted in 2004.

8.8.1 Wired Equivalent Privacy (WEP)

The IEEE 802.11 WEP protocol was designed in 1999 to provide authentication and data encryption between a host and a wireless access point (that is, base station) using a symmetric shared key approach. WEP does not specify a key management algorithm, so it is assumed that the host and wireless access point have somehow agreed on the key via an out-of-band method. Authentication is carried out as follows:

1. A wireless host requests authentication by an access point.

2. The access point responds to the authentication request with a 128-byte nonce value.

3. The wireless host encrypts the nonce using the symmetric key that it shares with the access point.

4. The access point decrypts the host-encrypted nonce.
If the decrypted nonce matches the nonce value originally sent to the host, then the host is authenticated by the access point.

The WEP data encryption algorithm is illustrated in Figure 8.30. A secret 40-bit symmetric key, $K_S$, is assumed to be known by both a host and the access point. In addition, a 24-bit Initialization Vector (IV) is appended to the 40-bit key to create a 64-bit key that will be used to encrypt a single frame. The IV will change from one frame to another, and hence each frame will be encrypted with a different 64-bit key.

Figure 8.30 802.11 WEP protocol

Encryption is performed as follows. First a 4-byte CRC value (see Section 6.2) is computed for the data payload. The payload and the four CRC bytes are then encrypted using the RC4 stream cipher. We will not cover the details of RC4 here (see \[Schneier 1995\] and \[Edney 2003\] for details). For our purposes, it is enough to know that when presented with a key value (in this case, the 64-bit $(K_S, IV)$ key), the RC4 algorithm produces a stream of key values, $k_1^{IV}, k_2^{IV}, k_3^{IV}, \ldots$ that are used to encrypt the data and CRC value in a frame. For practical purposes, we can think of these operations being performed a byte at a time. Encryption is performed by XOR-ing the ith byte of data, $d_i$, with the ith key, $k_i^{IV}$, in the stream of key values generated by the $(K_S, IV)$ pair to produce the ith byte of ciphertext, $c_i$:

$$c_i = d_i \oplus k_i^{IV}$$

The IV value changes from one frame to the next and is included in plaintext in the header of each WEP-encrypted 802.11 frame, as shown in Figure 8.30. The receiver takes the secret 40-bit symmetric key that it shares with the sender, appends the IV, and uses the resulting 64-bit key (which is identical to the key used by the sender to perform encryption) to decrypt the frame:

$$d_i = c_i \oplus k_i^{IV}$$

Proper use of the RC4 algorithm requires that the same 64-bit key value never be used more than once. Recall that the WEP key changes on a frame-by-frame basis. For a given $K_S$ (which changes rarely, if ever), this means that there are only $2^{24}$ unique keys. If these keys are chosen randomly, we can show \[Edney 2003\] that the probability of having chosen the same IV value (and hence used the same 64-bit key) is more than 99 percent after only 12,000 frames. With 1 Kbyte frame sizes and a data transmission rate of 11 Mbps, only a few seconds are needed before 12,000 frames are transmitted. Furthermore, since the IV is transmitted in plaintext in the frame, an eavesdropper will know whenever a duplicate IV value is used.

To see one of the several problems that occur when a duplicate key is used, consider the following chosen-plaintext attack taken by Trudy against Alice. Suppose that Trudy (possibly using IP spoofing) sends a request (for example, an HTTP or FTP request) to Alice to transmit a file with known content, $d_1, d_2, d_3, d_4, \ldots$. Trudy also observes the encrypted data $c_1, c_2, c_3, c_4, \ldots$. Since $c_i = d_i \oplus k_i^{IV}$, if we XOR $c_i$ with each side of this equality we have

$$d_i \oplus c_i = k_i^{IV}$$

With this relationship, Trudy can use the known values of $d_i$ and $c_i$ to compute $k_i^{IV}$. The next time Trudy sees the same value of IV being used, she will know the key sequence $k_1^{IV}, k_2^{IV}, k_3^{IV}, \ldots$ and will thus be able to decrypt the encrypted message.
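The keystream-recovery algebra is easy to verify in a few lines. The sketch below is a toy demonstration, assuming a random byte string in place of genuine RC4 output (the XOR relationship $c_i = d_i \oplus k_i^{IV}$ is all that matters) and invented plaintexts.

```python
import secrets

keystream = secrets.token_bytes(32)  # k^IV: the per-(K_S, IV) RC4 output

def wep_like_encrypt(data: bytes) -> bytes:
    # Reusing the same IV means reusing exactly this keystream
    return bytes(d ^ k for d, k in zip(data, keystream))

# Trudy asked Alice to transmit known content, then sniffed its ciphertext:
known_plain = b"GET /index.html "
c1 = wep_like_encrypt(known_plain)
recovered = bytes(d ^ c for d, c in zip(known_plain, c1))  # k_i = d_i XOR c_i

# A later frame that reuses the same IV is now an open book to Trudy:
c2 = wep_like_encrypt(b"secret password")
print(bytes(c ^ k for c, k in zip(c2, recovered)))  # -> b'secret password'
```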
Another concern with WEP involves the CRC bits shown in Figure 8.30 and transmitted in the 802.11 frame to detect altered bits in the payload. However, an attacker who changes the encrypted content (e.g., substituting gibberish for the original encrypted data), computes a CRC over the substituted gibberish, and places the CRC into a WEP frame can produce an 802.11 frame that will be accepted by the receiver. What is needed here are message integrity techniques such as those we studied in Section 8.3 to detect content tampering or substitution. For more details of WEP security, see \[Edney 2003; Wright 2015\] and the references therein.

8.8.2 IEEE 802.11i

Soon after the 1999 release of IEEE 802.11, work began on developing a new and improved version of 802.11 with stronger security mechanisms. The new standard, known as 802.11i, underwent final ratification in 2004. As we'll see, while WEP provided relatively weak encryption, only a single way to perform authentication, and no key distribution mechanisms, IEEE 802.11i provides for much stronger forms of encryption, an extensible set of authentication mechanisms, and a key distribution mechanism. In the following, we present an overview of 802.11i; an excellent (streaming audio) technical overview of 802.11i is \[TechOnline 2012\]. Figure 8.31 overviews the 802.11i framework. In addition to the wireless client and access point, 802.11i defines an authentication server with which the AP can communicate. Separating the authentication server from the AP allows one authentication server to serve many APs, centralizing the (often sensitive) decisions regarding authentication and access within the single server, and keeping AP costs and complexity low.

Figure 8.31 802.11i: Four phases of operation

802.11i operates in four phases:

1. Discovery. In the discovery phase, the AP advertises its presence and the forms of authentication and encryption that can be provided to the wireless client node. The client then requests the specific forms of authentication and encryption that it desires. Although the client and AP are already exchanging messages, the client has not yet been authenticated nor does it have an encryption key, and so several more steps will be required before the client can communicate with an arbitrary remote host over the wireless channel.

2. Mutual authentication and Master Key (MK) generation. Authentication takes place between the wireless client and the authentication server. In this phase, the access point acts essentially as a relay, forwarding messages between the client and the authentication server. The Extensible Authentication Protocol (EAP) \[RFC 3748\] defines the end-to-end message formats used in a simple request/response mode of interaction between the client and authentication server. As shown in Figure 8.32, EAP messages are encapsulated using EAPoL (EAP over LAN, \[IEEE 802.1X\]) and sent over the 802.11 wireless link. These EAP messages are then decapsulated at the access point, and then re-encapsulated using the RADIUS protocol for transmission over UDP/IP to the authentication server.

Figure 8.32 EAP is an end-to-end protocol. EAP messages are encapsulated using EAPoL over the wireless link between the client and the access point, and using RADIUS over UDP/IP between the access point and the authentication server
While the RADIUS server and protocol \[RFC 2865\] are not required by the 802.11i protocol, they are de facto standard components for 802.11i. The recently standardized DIAMETER protocol \[RFC 3588\] is likely to replace RADIUS in the near future. With EAP, the authentication server can choose one of a number of ways to perform authentication. While 802.11i does not mandate a particular authentication method, the EAP-TLS authentication scheme \[RFC 5216\] is often used. EAP-TLS uses public key techniques (including nonce encryption and message digests) similar to those we studied in Section 8.3 to allow the client and the authentication server to mutually authenticate each other, and to derive a Master Key (MK) that is known to both parties.

3. Pairwise Master Key (PMK) generation. The MK is a shared secret known only to the client and the authentication server, which they each use to generate a second key, the Pairwise Master Key (PMK). The authentication server then sends the PMK to the AP. This is where we wanted to be! The client and AP now have a shared key (recall that in WEP, the problem of key distribution was not addressed at all) and have mutually authenticated each other. They're just about ready to get down to business.

4. Temporal Key (TK) generation. With the PMK, the wireless client and AP can now generate additional keys that will be used for communication. Of particular interest is the Temporal Key (TK), which will be used to perform the link-level encryption of data sent over the wireless link and to an arbitrary remote host. 802.11i provides several forms of encryption, including an AES-based encryption scheme and a strengthened version of WEP encryption.

8.9 Operational Security: Firewalls and Intrusion Detection Systems

We've seen throughout this chapter that the Internet is not a very safe place---bad guys are out there, wreaking all sorts of havoc. Given the hostile nature of the Internet, let's now consider an organization's network and the network administrator who administers it. From a network administrator's point of view, the world divides quite neatly into two camps---the good guys (who belong to the organization's network, and who should be able to access resources inside the organization's network in a relatively unconstrained manner) and the bad guys (everyone else, whose access to network resources must be carefully scrutinized). In many organizations, ranging from medieval castles to modern corporate office buildings, there is a single point of entry/exit where both good guys and bad guys entering and leaving the organization are security-checked. In a castle, this was done at a gate at one end of the drawbridge; in a corporate building, this is done at the security desk. In a computer network, when traffic entering/leaving a network is security-checked, logged, dropped, or forwarded, it is done by operational devices known as firewalls, intrusion detection systems (IDSs), and intrusion prevention systems (IPSs).

8.9.1 Firewalls

A firewall is a combination of hardware and software that isolates an organization's internal network from the Internet at large, allowing some packets to pass and blocking others.
A firewall allows a network administrator to control access between the outside world and resources within the administered network by managing the traffic flow to and from these resources. A firewall has three goals:

- All traffic from outside to inside, and vice versa, passes through the firewall. Figure 8.33 shows a firewall, sitting squarely at the boundary between the administered network and the rest of the Internet. While large organizations may use multiple levels of firewalls or distributed firewalls \[Skoudis 2006\], locating a firewall at a single access point to the network, as shown in Figure 8.33, makes it easier to manage and enforce a security-access policy.

- Only authorized traffic, as defined by the local security policy, will be allowed to pass. With all traffic entering and leaving the institutional network passing through the firewall, the firewall can restrict access to authorized traffic.

- The firewall itself is immune to penetration. The firewall itself is a device connected to the network. If not designed or installed properly, it can be compromised, in which case it provides only a false sense of security (which is worse than no firewall at all!).

Figure 8.33 Firewall placement between the administered network and the outside world

Cisco and Check Point are two of the leading firewall vendors today. You can also easily create a firewall (packet filter) from a Linux box using iptables (public-domain software that is normally shipped with Linux). Furthermore, as discussed in Chapters 4 and 5, firewalls are now frequently implemented in routers and controlled remotely using SDNs. Firewalls can be classified in three categories: traditional packet filters, stateful filters, and application gateways. We'll cover each of these in turn in the following subsections.

Traditional Packet Filters

As shown in Figure 8.33, an organization typically has a gateway router connecting its internal network to its ISP (and hence to the larger public Internet). All traffic leaving and entering the internal network passes through this router, and it is at this router where packet filtering occurs. A packet filter examines each datagram in isolation, determining whether the datagram should be allowed to pass or should be dropped based on administrator-specific rules. Filtering decisions are typically based on:

- IP source or destination address
- Protocol type in IP datagram field: TCP, UDP, ICMP, OSPF, and so on
- TCP or UDP source and destination port
- TCP flag bits: SYN, ACK, and so on
- ICMP message type
- Different rules for datagrams leaving and entering the network
- Different rules for the different router interfaces

Table 8.5 Policies and corresponding filtering rules for an organization's network 130.207/16 with Web server at 130.207.244.203

| Policy | Firewall Setting |
| --- | --- |
| No outside Web access. | Drop all outgoing packets to any IP address, port 80. |
| No incoming TCP connections, except those for organization's public Web server only. | Drop all incoming TCP SYN packets to any IP except 130.207.244.203, port 80. |
| Prevent Web-radios from eating up the available bandwidth. | Drop all incoming UDP packets---except DNS packets. |
| Prevent your network from being used for a smurf DoS attack. | Drop all ICMP ping packets going to a "broadcast" address (e.g., 130.207.255.255). |
| Prevent your network from being tracerouted. | Drop all outgoing ICMP TTL expired traffic. |
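Since the text mentions iptables, here is roughly how a few of the Table 8.5 policies might be expressed on a Linux gateway. This is a hedged sketch, not a complete or hardened configuration: it assumes the external interface is named eth0 and that filtering happens on the FORWARD chain, and it ignores many practical details (existing rules, connection tracking, logging).

```
# No outside Web access: drop outgoing packets to any IP address, port 80
iptables -A FORWARD -o eth0 -p tcp --dport 80 -j DROP

# Incoming TCP SYNs allowed only to the public Web server (first match wins)
iptables -A FORWARD -i eth0 -p tcp --syn -d 130.207.244.203 --dport 80 -j ACCEPT
iptables -A FORWARD -i eth0 -p tcp --syn -j DROP

# Drop incoming UDP except DNS replies (Web radio often runs over UDP)
iptables -A FORWARD -i eth0 -p udp ! --sport 53 -j DROP

# Smurf protection: drop pings addressed to the broadcast address
iptables -A FORWARD -p icmp --icmp-type echo-request -d 130.207.255.255 -j DROP

# Prevent outsiders from tracerouting the network
iptables -A FORWARD -o eth0 -p icmp --icmp-type time-exceeded -j DROP
```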
A network administrator configures the firewall based on the policy of the organization. The policy may take user productivity and bandwidth usage into account as well as the security concerns of an organization. Table 8.5 lists a number of possible policies an organization may have, and how they would be addressed with a packet filter. For example, if the organization doesn't want any incoming TCP connections except those for its public Web server, it can block all incoming TCP SYN segments except TCP SYN segments with destination port 80 and the destination IP address corresponding to the Web server. If the organization doesn't want its users to monopolize access bandwidth with Internet radio applications, it can block all not-critical UDP traffic (since Internet radio is often sent over UDP). If the organization doesn't want its internal network to be mapped (tracerouted) by an outsider, it can block all ICMP TTL expired messages leaving the organization's network. A filtering policy can be based on a combination of addresses and port numbers. For example, a filtering router could forward all Telnet datagrams (those with a port number of 23) except those going to and coming from a list of specific IP addresses. This policy permits Telnet connections to and from hosts on the allowed list. Unfortunately, basing the policy on external addresses provides no protection against datagrams that have had their source addresses spoofed. Filtering can also be based on whether or not the TCP ACK bit is set. This trick is quite useful if an organization wants to let its internal clients connect to external servers but wants to prevent external clients from connecting to internal servers.

Table 8.6 An access control list for a router interface

| action | source address | dest address | protocol | source port | dest port | flag bit |
| --- | --- | --- | --- | --- | --- | --- |
| allow | 222.22/16 | outside of 222.22/16 | TCP | > 1023 | 80 | any |
| allow | outside of 222.22/16 | 222.22/16 | TCP | 80 | > 1023 | ACK |
| allow | 222.22/16 | outside of 222.22/16 | UDP | > 1023 | 53 | --- |
| allow | outside of 222.22/16 | 222.22/16 | UDP | 53 | > 1023 | --- |
| deny | all | all | all | all | all | all |

Recall from Section 3.5 that the first segment in every TCP connection has the ACK bit set to 0, whereas all the other segments in the connection have the ACK bit set to 1. Thus, if an organization wants to prevent external clients from initiating connections to internal servers, it simply filters all incoming segments with the ACK bit set to 0. This policy kills all TCP connections originating from the outside, but permits connections originating internally. Firewall rules are implemented in routers with access control lists, with each router interface having its own list. An example of an access control list for an organization 222.22/16 is shown in Table 8.6. This access control list is for an interface that connects the router to the organization's external ISPs. Rules are applied to each datagram that passes through the interface from top to bottom.
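To make the top-to-bottom, first-match semantics concrete, the following sketch evaluates a packet against a Table 8.6-style list. The rule encoding and helper names here are our own, invented for illustration.

```python
from ipaddress import ip_address, ip_network

INTERNAL = ip_network("222.22.0.0/16")

def side(addr):
    """Classify an address as inside or outside the organization."""
    return "in" if ip_address(addr) in INTERNAL else "out"

# Table 8.6 as (action, src side, dst side, proto, sport test, dport test, ACK?)
RULES = [
    ("allow", "in",  "out", "TCP", lambda p: p > 1023, lambda p: p == 80,  None),
    ("allow", "out", "in",  "TCP", lambda p: p == 80,  lambda p: p > 1023, True),
    ("allow", "in",  "out", "UDP", lambda p: p > 1023, lambda p: p == 53,  None),
    ("allow", "out", "in",  "UDP", lambda p: p == 53,  lambda p: p > 1023, None),
]

def decide(src, dst, proto, sport, dport, ack=False):
    for action, s, d, p, sp, dp, need_ack in RULES:
        if (side(src) == s and side(dst) == d and proto == p
                and sp(sport) and dp(dport)
                and (need_ack is None or ack == need_ack)):
            return action            # first matching rule wins
    return "deny"                    # the final catch-all rule

# An external SYN (ACK bit 0) aimed at an internal server matches no rule:
print(decide("150.23.23.155", "222.22.1.7", "TCP", 12543, 80))   # deny
```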
The first two rules together allow internal users to surf the Web: The first rule allows any TCP packet with destination port 80 to leave the organization's network; the second rule allows any TCP packet with source port 80 and the ACK bit set to enter the organization's network. Note that if an external source attempts to establish a TCP connection with an internal host, the connection will be blocked, even if the source or destination port is 80. The second two rules together allow DNS packets to enter and leave the organization's network. In summary, this rather restrictive access control list blocks all traffic except Web traffic initiated from within the organization and DNS traffic. \[CERT Filtering 2012\] provides a list of recommended port/protocol packet filterings to avoid a number of well-known security holes in existing network applications.

Stateful Packet Filters

In a traditional packet filter, filtering decisions are made on each packet in isolation. Stateful filters actually track TCP connections, and use this knowledge to make filtering decisions.

Table 8.7 Connection table for stateful filter

| source address | dest address | source port | dest port |
| --- | --- | --- | --- |
| 222.22.1.7 | 37.96.87.123 | 12699 | 80 |
| 222.22.93.2 | 199.1.205.23 | 37654 | 80 |
| 222.22.65.143 | 203.77.240.43 | 48712 | 80 |

To understand stateful filters, let's reexamine the access control list in Table 8.6. Although rather restrictive, the access control list in Table 8.6 nevertheless allows any packet arriving from the outside with ACK = 1 and source port 80 to get through the filter. Such packets could be used by attackers in attempts to crash internal systems with malformed packets, carry out denial-of-service attacks, or map the internal network. The naive solution is to block TCP ACK packets as well, but such an approach would prevent the organization's internal users from surfing the Web. Stateful filters solve this problem by tracking all ongoing TCP connections in a connection table. This is possible because the firewall can observe the beginning of a new connection by observing a three-way handshake (SYN, SYNACK, and ACK); and it can observe the end of a connection when it sees a FIN packet for the connection. The firewall can also (conservatively) assume that the connection is over when it hasn't seen any activity over the connection for, say, 60 seconds. An example connection table for a firewall is shown in Table 8.7. This connection table indicates that there are currently three ongoing TCP connections, all of which have been initiated from within the organization. Additionally, the stateful filter includes a new column, "check connection," in its access control list, as shown in Table 8.8. Note that Table 8.8 is identical to the access control list in Table 8.6, except now it indicates that the connection should be checked for two of the rules. Let's walk through some examples to see how the connection table and the extended access control list work hand-in-hand.
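In code, the stateful check amounts to a set lookup against the connection table. Here is a sketch using Table 8.7's entries; the key layout and function name are our own, and the examples that follow in the text exercise exactly this check.

```python
# The connection table from Table 8.7, keyed as
# (internal address, internal port, external address, external port).
connections = {
    ("222.22.1.7",    12699, "37.96.87.123",  80),
    ("222.22.93.2",   37654, "199.1.205.23",  80),
    ("222.22.65.143", 48712, "203.77.240.43", 80),
}

def inbound_tcp_ok(src_ip, src_port, dst_ip, dst_port, ack):
    """Admit an inbound TCP packet only if its ACL rule says 'check
    connection' and the packet belongs to a connection initiated from
    inside (i.e., one recorded in the connection table)."""
    if not ack:
        return False   # inbound SYNs are never admitted by this ACL
    return (dst_ip, dst_port, src_ip, src_port) in connections

print(inbound_tcp_ok("150.23.23.155", 80, "222.22.1.7", 12543, ack=True))  # False
print(inbound_tcp_ok("37.96.87.123", 80, "222.22.1.7", 12699, ack=True))   # True
```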
Suppose an attacker attempts to send a malformed packet into the organization's network by sending a datagram with TCP source port 80 and with the ACK flag set. Further suppose that this packet has destination port number 12543 and source IP address 150.23.23.155. When this packet reaches the firewall, the firewall checks the access control list in Table 8.8, which indicates that the connection table must also be checked before permitting this packet to enter the organization's network. The firewall duly checks the connection table, sees that this packet is not part of an ongoing TCP connection, and rejects the packet. As a second example, suppose that an internal user wants to surf an external Web site. Because this user first sends a TCP SYN segment, the user's TCP connection gets recorded in the connection table. When the Web server sends back packets (with the ACK bit necessarily set), the firewall checks the table and sees that a corresponding connection is in progress. The firewall will thus let these packets pass, thereby not interfering with the internal user's Web surfing activity.

Table 8.8 Access control list for stateful filter

| action | source address | dest address | protocol | source port | dest port | flag bit | check connection |
| --- | --- | --- | --- | --- | --- | --- | --- |
| allow | 222.22/16 | outside of 222.22/16 | TCP | > 1023 | 80 | any | |
| allow | outside of 222.22/16 | 222.22/16 | TCP | 80 | > 1023 | ACK | X |
| allow | 222.22/16 | outside of 222.22/16 | UDP | > 1023 | 53 | --- | |
| allow | outside of 222.22/16 | 222.22/16 | UDP | 53 | > 1023 | --- | X |
| deny | all | all | all | all | all | all | |

Application Gateway

In the examples above, we have seen that packet-level filtering allows an organization to perform coarse-grain filtering on the basis of the contents of IP and TCP/UDP headers, including IP addresses, port numbers, and acknowledgment bits. But what if an organization wants to provide a Telnet service to a restricted set of internal users (as opposed to IP addresses)? And what if the organization wants such privileged users to authenticate themselves first before being allowed to create Telnet sessions to the outside world? Such tasks are beyond the capabilities of traditional and stateful filters. Indeed, information about the identity of the internal users is application-layer data and is not included in the IP/TCP/UDP headers. To have finer-level security, firewalls must combine packet filters with application gateways. Application gateways look beyond the IP/TCP/UDP headers and make policy decisions based on application data. An application gateway is an application-specific server through which all application data (inbound and outbound) must pass. Multiple application gateways can run on the same host, but each gateway is a separate server with its own processes. To get some insight into application gateways, let's design a firewall that allows only a restricted set of internal users to Telnet outside and prevents all external clients from Telneting inside. Such a policy can be accomplished by implementing a combination of a packet filter (in a router) and a Telnet application gateway, as shown in Figure 8.34. The router's filter is configured to block all Telnet connections except those that originate from the IP address of the application gateway. Such a filter configuration forces all outbound Telnet connections to pass through the application gateway.

Figure 8.34 Firewall consisting of an application gateway and a filter

Consider now an internal user who wants to Telnet to the outside world. The user must first set up a Telnet session with the application gateway. An application running in the gateway, which listens for incoming Telnet sessions, prompts the user for a user ID and password. When the user supplies this information, the application gateway checks to see if the user has permission to Telnet to the outside world. If not, the Telnet connection from the internal user to the gateway is terminated by the gateway. If the user has permission, then the gateway (1) prompts the user for the host name of the external host to which the user wants to connect, (2) sets up a Telnet session between the gateway and the external host, and (3) relays to the external host all data arriving from the user, and relays to the user all data arriving from the external host. Thus, the Telnet application gateway not only performs user authorization but also acts as a Telnet server and a Telnet client, relaying information between the user and the remote Telnet server. Note that the filter will permit step 2 because the gateway initiates the Telnet connection to the outside world.
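The relaying in step 3 is just bidirectional byte copying between two sockets. Below is a toy Python sketch of that step only; everything in it is invented for illustration (the listening port 2323, the way the client names the destination host), and a real Telnet gateway would of course perform the user ID/password check of the preceding paragraph before opening the outbound connection.

```python
import socket
import threading

def pump(src, dst):
    """Relay bytes from one socket to the other until EOF."""
    while (data := src.recv(4096)):
        dst.sendall(data)
    dst.close()

def handle(client):
    # A real gateway would authenticate the user and check policy here;
    # we skip straight to step (1): the user names the external host.
    host = client.recv(256).decode().strip()
    remote = socket.create_connection((host, 23))   # step (2)
    # Step (3): relay data in both directions between user and host.
    threading.Thread(target=pump, args=(client, remote), daemon=True).start()
    pump(remote, client)

server = socket.socket()
server.bind(("", 2323))      # an assumed gateway port
server.listen()
while True:
    conn, _ = server.accept()
    threading.Thread(target=handle, args=(conn,), daemon=True).start()
```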
CASE HISTORY

ANONYMITY AND PRIVACY

Suppose you want to visit a controversial Web site (for example, a political activist site) and you (1) don't want to reveal your IP address to the Web site, (2) don't want your local ISP (which may be your home or office ISP) to know that you are visiting the site, and (3) don't want your local ISP to see the data you are exchanging with the site. If you use the traditional approach of connecting directly to the Web site without any encryption, you fail on all three counts. Even if you use SSL, you fail on the first two counts: Your source IP address is presented to the Web site in every datagram you send; and the destination address of every packet you send can easily be sniffed by your local ISP. To obtain privacy and anonymity, you can instead use a combination of a trusted proxy server and SSL, as shown in Figure 8.35. With this approach, you first make an SSL connection to the trusted proxy. You then send, into this SSL connection, an HTTP request for a page at the desired site. When the proxy receives the SSL-encrypted HTTP request, it decrypts the request and forwards the cleartext HTTP request to the Web site. The Web site then responds to the proxy, which in turn forwards the response to you over SSL. Because the Web site only sees the IP address of the proxy, and not your client's address, you are indeed obtaining anonymous access to the Web site. And because all traffic between you and the proxy is encrypted, your local ISP cannot invade your privacy by logging the site you visited or recording the data you are exchanging. Many companies today (such as proxify.com) make available such proxy services. Of course, in this solution, your proxy knows everything: It knows your IP address and the IP address of the site you're surfing; and it can see all the traffic in cleartext exchanged between you and the Web site. Such a solution, therefore, is only as good as the trustworthiness of the proxy. A more robust approach, taken by the TOR anonymizing and privacy service, is to route your traffic through a series of non-colluding proxy servers \[TOR 2016\]. In particular, TOR allows independent individuals to contribute proxies to its proxy pool. When a user connects to a server using TOR, TOR randomly chooses (from its proxy pool) a chain of three proxies and routes all traffic between client and server over the chain. In this manner, assuming the proxies do not collude, no one knows that communication took place between your IP address and the target Web site. Furthermore, although cleartext is sent between the last proxy and the server, the last proxy doesn't know what IP address is sending and receiving the cleartext.

Figure 8.35 Providing anonymity and privacy with a proxy
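One way to see why no single proxy learns both endpoints is layered encryption: the client wraps its request once per hop, and each proxy can strip only its own layer (Problem P26 at the end of this chapter explores such a design). The sketch below is our own illustration, assuming per-hop session keys S1, S2, S3 have already been established and using a toy XOR stand-in for a real symmetric cipher.

```python
def E(key, data):
    """Stand-in symmetric cipher: repeating-key XOR (toy only)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def wrap(request, hop_keys):
    """Encrypt for the last hop first, then add a layer per earlier hop."""
    msg = request
    for k in reversed(hop_keys):
        msg = E(k, msg)
    return msg

S1, S2, S3 = b"k1", b"k2", b"k3"          # per-hop session keys (assumed)
onion = wrap(b"GET /page", [S1, S2, S3])

# Each proxy strips exactly one layer; only the last sees the cleartext,
# and only the first knows the client's address.
after_proxy1 = E(S1, onion)
after_proxy2 = E(S2, after_proxy1)
after_proxy3 = E(S3, after_proxy2)
assert after_proxy3 == b"GET /page"
```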
Internal networks often have multiple application gateways, for example, gateways for Telnet, HTTP, FTP, and e-mail. In fact, an organization's mail server (see Section 2.3) and Web cache are application gateways. Application gateways do not come without their disadvantages. First, a different application gateway is needed for each application. Second, there is a performance penalty to be paid, since all data will be relayed via the gateway. This becomes a concern particularly when multiple users or applications are using the same gateway machine. Finally, the client software must know how to contact the gateway when the user makes a request, and must know how to tell the application gateway what external server to connect to.

8.9.2 Intrusion Detection Systems

We've just seen that a packet filter (traditional and stateful) inspects IP, TCP, UDP, and ICMP header fields when deciding which packets to let pass through the firewall. However, to detect many attack types, we need to perform deep packet inspection, that is, look beyond the header fields and into the actual application data that the packets carry. As we saw in Section 8.9.1, application gateways often do deep packet inspection. But an application gateway only does this for a specific application. Clearly, there is a niche for yet another device---a device that not only examines the headers of all packets passing through it (like a packet filter), but also performs deep packet inspection (unlike a packet filter). When such a device observes a suspicious packet, or a suspicious series of packets, it could prevent those packets from entering the organizational network. Or, because the activity is only deemed suspicious, the device could let the packets pass, but send alerts to a network administrator, who can then take a closer look at the traffic and take appropriate actions. A device that generates alerts when it observes potentially malicious traffic is called an intrusion detection system (IDS). A device that filters out suspicious traffic is called an intrusion prevention system (IPS). In this section we study both systems---IDS and IPS---together, since the most interesting technical aspect of these systems is how they detect suspicious traffic (and not whether they send alerts or drop packets). We will henceforth collectively refer to IDS systems and IPS systems as IDS systems. An IDS can be used to detect a wide range of attacks, including network mapping (emanating, for example, from nmap), port scans, TCP stack scans, DoS bandwidth-flooding attacks, worms and viruses, OS vulnerability attacks, and application vulnerability attacks. (See Section 1.6 for a survey of network attacks.) Today, thousands of organizations employ IDS systems. Many of these deployed systems are proprietary, marketed by Cisco, Check Point, and other security equipment vendors. But many of the deployed IDS systems are public-domain systems, such as the immensely popular Snort IDS system (which we'll discuss shortly). An organization may deploy one or more IDS sensors in its organizational network.
Figure 8.36 shows an organization that has three IDS sensors. When multiple sensors are deployed, they typically work in concert, sending information about suspicious traffic activity to a central IDS processor, which collects and integrates the information and sends alarms to network administrators when deemed appropriate.

Figure 8.36 An organization deploying a filter, an application gateway, and IDS sensors

In Figure 8.36, the organization has partitioned its network into two regions: a high-security region, protected by a packet filter and an application gateway and monitored by IDS sensors; and a lower-security region---referred to as the demilitarized zone (DMZ)---which is protected only by the packet filter, but also monitored by IDS sensors. Note that the DMZ includes the organization's servers that need to communicate with the outside world, such as its public Web server and its authoritative DNS server. You may be wondering at this stage, why multiple IDS sensors? Why not just place one IDS sensor just behind the packet filter (or even integrated with the packet filter) in Figure 8.36? We will soon see that an IDS not only needs to do deep packet inspection, but must also compare each passing packet with tens of thousands of "signatures"; this can be a significant amount of processing, particularly if the organization receives gigabits/sec of traffic from the Internet. By placing the IDS sensors further downstream, each sensor sees only a fraction of the organization's traffic, and can more easily keep up. Nevertheless, high-performance IDS and IPS systems are available today, and many organizations can actually get by with just one sensor located near its access router. IDS systems are broadly classified as either signature-based systems or anomaly-based systems. A signature-based IDS maintains an extensive database of attack signatures. Each signature is a set of rules pertaining to an intrusion activity. A signature may simply be a list of characteristics about a single packet (e.g., source and destination port numbers, protocol type, and a specific string of bits in the packet payload), or may relate to a series of packets. The signatures are normally created by skilled network security engineers who research known attacks. An organization's network administrator can customize the signatures or add its own to the database. Operationally, a signature-based IDS sniffs every packet passing by it, comparing each sniffed packet with the signatures in its database. If a packet (or series of packets) matches a signature in the database, the IDS generates an alert. The alert could be sent to the network administrator in an e-mail message, could be sent to the network management system, or could simply be logged for future inspection. Signature-based IDS systems, although widely deployed, have a number of limitations. Most importantly, they require previous knowledge of the attack to generate an accurate signature. In other words, a signature-based IDS is completely blind to new attacks that have yet to be recorded. Another disadvantage is that even if a signature is matched, it may not be the result of an attack, so that a false alarm is generated. Finally, because every packet must be compared with an extensive collection of signatures, the IDS can become overwhelmed with processing and actually fail to detect many malicious packets.
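A toy sketch of the matching loop at the heart of a signature-based sensor follows. The packet representation and field names are our own invention; real systems compile thousands of signatures into far more efficient matching structures. The single signature shown anticipates the Snort rule discussed shortly.

```python
# Each signature lists per-packet conditions; all must hold to match.
SIGNATURES = [
    {"proto": "icmp", "itype": 8, "dsize": 0, "msg": "ICMP PING NMAP"},
]

def inspect(pkt):
    """Return the alert messages of every signature this packet matches."""
    alerts = []
    for sig in SIGNATURES:
        if (pkt.get("proto") == sig["proto"]
                and pkt.get("itype") == sig["itype"]
                and len(pkt.get("payload", b"")) == sig["dsize"]):
            alerts.append(sig["msg"])
    return alerts

# An empty ICMP echo request, as nmap emits during its ping sweeps:
print(inspect({"proto": "icmp", "itype": 8, "payload": b""}))
```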
An anomaly-based IDS creates a traffic profile as it observes traffic in normal operation. It then looks for packet streams that are statistically unusual, for example, an inordinate percentage of ICMP packets or a sudden exponential growth in port scans and ping sweeps. The great thing about anomaly-based IDS systems is that they don't rely on previous knowledge about existing attacks---that is, they can potentially detect new, undocumented attacks. On the other hand, it is an extremely challenging problem to distinguish between normal traffic and statistically unusual traffic. To date, most IDS deployments are primarily signature-based, although some include some anomaly-based features.

Snort

Snort is a public-domain, open source IDS with hundreds of thousands of existing deployments \[Snort 2012; Koziol 2003\]. It can run on Linux, UNIX, and Windows platforms. It uses the generic sniffing interface libpcap, which is also used by Wireshark and many other packet sniffers. It can easily handle 100 Mbps of traffic; for installations with gigabit/sec traffic rates, multiple Snort sensors may be needed. To gain some insight into Snort, let's take a look at an example of a Snort signature:

```
alert icmp $EXTERNAL_NET any -> $HOME_NET any
    (msg:"ICMP PING NMAP"; dsize: 0; itype: 8;)
```

This signature is matched by any ICMP packet that enters the organization's network ($HOME_NET) from the outside ($EXTERNAL_NET), is of type 8 (ICMP ping), and has an empty payload (dsize = 0). Since nmap (see Section 1.6) generates ping packets with these specific characteristics, this signature is designed to detect nmap ping sweeps. When a packet matches this signature, Snort generates an alert that includes the message "ICMP PING NMAP". Perhaps what is most impressive about Snort is the vast community of users and security experts that maintain its signature database. Typically within a few hours of a new attack, the Snort community writes and releases an attack signature, which is then downloaded by the hundreds of thousands of Snort deployments distributed around the world. Moreover, using the Snort signature syntax, network administrators can tailor the signatures to their own organization's needs by either modifying existing signatures or creating entirely new ones.

8.10 Summary

In this chapter, we've examined the various mechanisms that our secret lovers, Bob and Alice, can use to communicate securely. We've seen that Bob and Alice are interested in confidentiality (so they alone are able to understand the contents of a transmitted message), end-point authentication (so they are sure that they are talking with each other), and message integrity (so they are sure that their messages are not altered in transit). Of course, the need for secure communication is not confined to secret lovers. Indeed, we saw in Sections 8.5 through 8.8 that security can be used in various layers in a network architecture to protect against bad guys who have a large arsenal of possible attacks at hand. The first part of this chapter presented various principles underlying secure communication. In Section 8.2, we covered cryptographic techniques for encrypting and decrypting data, including symmetric key cryptography and public key cryptography. DES and RSA were examined as specific case studies of these two major classes of cryptographic techniques in use in today's networks. In Section 8.3, we examined two approaches for providing message integrity: message authentication codes (MACs) and digital signatures. The two approaches have a number of parallels.
Both use cryptographic hash functions and both techniques enable us to verify the source of the message as well as the integrity of the message itself. One important difference is that MACs do not rely on encryption whereas digital signatures require a public key infrastructure. Both techniques are extensively used in practice, as we saw in Sections 8.5 through 8.8. Furthermore, digital signatures are used to create digital certificates, which are important for verifying the validity of public keys. In Section 8.4, we examined endpoint authentication and introduced nonces to defend against the replay attack. In Sections 8.5 through 8.8 we examined several security networking protocols that enjoy extensive use in practice. We saw that symmetric key cryptography is at the core of PGP, SSL, IPsec, and wireless security. We saw that public key cryptography is crucial for both PGP and SSL. We saw that PGP uses digital signatures for message integrity, whereas SSL and IPsec use MACs. Now that you have an understanding of the basic principles of cryptography, and have studied how these principles are actually used, you are in a position to design your own secure network protocols! Armed with the techniques covered in Sections 8.2 through 8.8, Bob and Alice can communicate securely. (One can only hope that they are networking students who have learned this material and can thus avoid having their tryst uncovered by Trudy!) But confidentiality is only a small part of the network security picture. As we learned in Section 8.9, increasingly, the focus in network security has been on securing the network infrastructure against a potential onslaught by the bad guys. In the latter part of this chapter, we thus covered firewalls and IDS systems which inspect packets entering and leaving an organization's network. This chapter has covered a lot of ground, while focusing on the most important topics in modern network security. Readers who desire to dig deeper are encouraged to investigate the references cited in this chapter. In particular, we recommend \[Skoudis 2006\] for attacks and operational security, \[Kaufman 1995\] for cryptography and how it applies to network security, \[Rescorla 2001\] for an in-depth but readable treatment of SSL, and \[Edney 2003\] for a thorough discussion of 802.11 security, including an insightful investigation into WEP and its flaws.

Homework Problems and Questions

Chapter 8 Review Problems

SECTION 8.1

R1. What are the differences between message confidentiality and message integrity? Can you have confidentiality without integrity? Can you have integrity without confidentiality? Justify your answer.

R2. Internet entities (routers, switches, DNS servers, Web servers, user end systems, and so on) often need to communicate securely. Give three specific example pairs of Internet entities that may want secure communication.

SECTION 8.2

R3. From a service perspective, what is an important difference between a symmetric-key system and a public-key system?

R4. Suppose that an intruder has an encrypted message as well as the decrypted version of that message. Can the intruder mount a ciphertext-only attack, a known-plaintext attack, or a chosen-plaintext attack?

R5. Consider an 8-block cipher. How many possible input blocks does this cipher have? How many possible mappings are there? If we view each mapping as a key, then how many possible keys does this cipher have?
R6. Suppose N people want to communicate with each of N−1 other people using symmetric key encryption. All communication between any two people, i and j, is visible to all other people in this group of N, and no other person in this group should be able to decode their communication. How many keys are required in the system as a whole? Now suppose that public key encryption is used. How many keys are required in this case?

R7. Suppose n = 10,000, a = 10,023, and b = 10,004. Use an identity of modular arithmetic to calculate in your head (a · b) mod n.

R8. Suppose you want to encrypt the message 10101111 by encrypting the decimal number that corresponds to the message. What is the decimal number?

SECTIONS 8.3--8.4

R9. In what way does a hash provide a better message integrity check than a checksum (such as the Internet checksum)?

R10. Can you "decrypt" a hash of a message to get the original message? Explain your answer.

R11. Consider a variation of the MAC algorithm (Figure 8.9) where the sender sends (m, H(m) + s), where H(m) + s is the concatenation of H(m) and s. Is this variation flawed? Why or why not?

R12. What does it mean for a signed document to be verifiable and nonforgeable?

R13. In what way does the public-key encrypted message hash provide a better digital signature than the public-key encrypted message?

R14. Suppose certifier.com creates a certificate for foo.com. Typically, the entire certificate would be encrypted with certifier.com's public key. True or false?

R15. Suppose Alice has a message that she is ready to send to anyone who asks. Thousands of people want to obtain Alice's message, but each wants to be sure of the integrity of the message. In this context, do you think a MAC-based or a digital-signature-based integrity scheme is more suitable? Why?

R16. What is the purpose of a nonce in an end-point authentication protocol?

R17. What does it mean to say that a nonce is a once-in-a-lifetime value? In whose lifetime?

R18. Is the message integrity scheme based on HMAC susceptible to playback attacks? If so, how can a nonce be incorporated into the scheme to remove this susceptibility?

SECTIONS 8.5--8.8

R19. Suppose that Bob receives a PGP message from Alice. How does Bob know for sure that Alice created the message (rather than, say, Trudy)? Does PGP use a MAC for message integrity?

R20. In the SSL record, there is a field for SSL sequence numbers. True or false?

R21. What is the purpose of the random nonces in the SSL handshake?

R22. Suppose an SSL session employs a block cipher with CBC. True or false: The server sends to the client the IV in the clear.

R23. Suppose Bob initiates a TCP connection to Trudy who is pretending to be Alice. During the handshake, Trudy sends Bob Alice's certificate. In what step of the SSL handshake algorithm will Bob discover that he is not communicating with Alice?

R24. Consider sending a stream of packets from Host A to Host B using IPsec. Typically, a new SA will be established for each packet sent in the stream. True or false?

R25. Suppose that TCP is being run over IPsec between headquarters and the branch office in Figure 8.28. If TCP retransmits the same packet, then the two corresponding packets sent by R1 will have the same sequence number in the ESP header. True or false?

R26. An IKE SA and an IPsec SA are the same thing. True or false?

R27. Consider WEP for 802.11. Suppose that the data is 10101100 and the keystream is 11110000. What is the resulting ciphertext?
R28. In WEP, an IV is sent in the clear in every frame. True or false?

SECTION 8.9

R29. Stateful packet filters maintain two data structures. Name them and briefly describe what they do.

R30. Consider a traditional (stateless) packet filter. This packet filter may filter packets based on TCP flag bits as well as other header fields. True or false?

R31. In a traditional packet filter, each interface can have its own access control list. True or false?

R32. Why must an application gateway work in conjunction with a router filter to be effective?

R33. Signature-based IDSs and IPSs inspect into the payloads of TCP and UDP segments. True or false?

Problems

P1. Using the monoalphabetic cipher in Figure 8.3, encode the message "This is an easy problem." Decode the message "rmij'u uamu xyj."

P2. Show that Trudy's known-plaintext attack, in which she knows the (ciphertext, plaintext) translation pairs for seven letters, reduces the number of possible substitutions to be checked in the example in Section 8.2.1 by approximately 10^9.

P3. Consider the polyalphabetic system shown in Figure 8.4. Will a chosen-plaintext attack that is able to get the plaintext encoding of the message "The quick brown fox jumps over the lazy dog." be sufficient to decode all messages? Why or why not?

P4. Consider the block cipher in Figure 8.5. Suppose that each block cipher Ti simply reverses the order of the eight input bits (so that, for example, 11110000 becomes 00001111). Further suppose that the 64-bit scrambler does not modify any bits (so that the output value of the mth bit is equal to the input value of the mth bit). (a) With n = 3 and the original 64-bit input equal to 10100000 repeated eight times, what is the value of the output? (b) Repeat part (a) but now change the last bit of the original 64-bit input from a 0 to a 1. (c) Repeat parts (a) and (b) but now suppose that the 64-bit scrambler inverses the order of the 64 bits.

P5. Consider the block cipher in Figure 8.5. For a given "key" Alice and Bob would need to keep eight tables, each 8 bits by 8 bits. For Alice (or Bob) to store all eight tables, how many bits of storage are necessary? How does this number compare with the number of bits required for a full-table 64-bit block cipher?

P6. Consider the 3-bit block cipher in Table 8.1. Suppose the plaintext is 100100100. (a) Initially assume that CBC is not used. What is the resulting ciphertext? (b) Suppose Trudy sniffs the ciphertext. Assuming she knows that a 3-bit block cipher without CBC is being employed (but doesn't know the specific cipher), what can she surmise? (c) Now suppose that CBC is used with IV = 111. What is the resulting ciphertext?

P7. (a) Using RSA, choose p = 3 and q = 11, and encode the word "dog" by encrypting each letter separately. Apply the decryption algorithm to the encrypted version to recover the original plaintext message. (b) Repeat part (a) but now encrypt "dog" as one message m.

P8. Consider RSA with p = 5 and q = 11.

a. What are n and z?

b. Let e be 3. Why is this an acceptable choice for e?

c. Find d such that de = 1 (mod z) and d < 160.

d. Encrypt the message m = 8 using the key (n, e). Let c denote the corresponding ciphertext. Show all work. Hint: To simplify the calculations, use the fact: \[(a mod n) · (b mod n)\] mod n = (a · b) mod n

P9. In this problem, we explore the Diffie-Hellman (DH) public-key encryption algorithm, which allows two entities to agree on a shared key.
The DH algorithm makes use of a large prime number p and another large number g less than p. Both p and g are made public (so that an attacker would know them). In DH, Alice and Bob each independently choose secret keys, S_A and S_B, respectively. Alice then computes her public key, T_A, by raising g to S_A and then taking mod p. Bob similarly computes his own public key T_B by raising g to S_B and then taking mod p. Alice and Bob then exchange their public keys over the Internet. Alice then calculates the shared secret key S by raising T_B to S_A and then taking mod p. Similarly, Bob calculates the shared key S′ by raising T_A to S_B and then taking mod p.

a. Prove that, in general, Alice and Bob obtain the same symmetric key, that is, prove S = S′.

b. With p = 11 and g = 2, suppose Alice and Bob choose private keys S_A = 5 and S_B = 12, respectively. Calculate Alice's and Bob's public keys, T_A and T_B. Show all work.

c. Following up on part (b), now calculate S as the shared symmetric key. Show all work.

d. Provide a timing diagram that shows how Diffie-Hellman can be attacked by a man-in-the-middle. The timing diagram should have three vertical lines, one for Alice, one for Bob, and one for the attacker Trudy.

P10. Suppose Alice wants to communicate with Bob using symmetric key cryptography using a session key KS. In Section 8.2, we learned how public-key cryptography can be used to distribute the session key from Alice to Bob. In this problem, we explore how the session key can be distributed---without public key cryptography---using a key distribution center (KDC). The KDC is a server that shares a unique secret symmetric key with each registered user. For Alice and Bob, denote these keys by KA-KDC and KB-KDC. Design a scheme that uses the KDC to distribute KS to Alice and Bob. Your scheme should use three messages to distribute the session key: a message from Alice to the KDC; a message from the KDC to Alice; and finally a message from Alice to Bob. The first message is KA-KDC(A, B). Using the notation KA-KDC, KB-KDC, S, A, and B, answer the following questions.

a. What is the second message?

b. What is the third message?

P11. Compute a third message, different from the two messages in Figure 8.8, that has the same checksum as the messages in Figure 8.8.

P12. Suppose Alice and Bob share two secret keys: an authentication key S1 and a symmetric encryption key S2. Augment Figure 8.9 so that both integrity and confidentiality are provided.

P13. In the BitTorrent P2P file distribution protocol (see Chapter 2), the seed breaks the file into blocks, and the peers redistribute the blocks to each other. Without any protection, an attacker can easily wreak havoc in a torrent by masquerading as a benevolent peer and sending bogus blocks to a small subset of peers in the torrent. These unsuspecting peers then redistribute the bogus blocks to other peers, which in turn redistribute the bogus blocks to even more peers. Thus, it is critical for BitTorrent to have a mechanism that allows a peer to verify the integrity of a block, so that it doesn't redistribute bogus blocks. Assume that when a peer joins a torrent, it initially gets a .torrent file from a fully trusted source. Describe a simple scheme that allows peers to verify the integrity of blocks.

P14. The OSPF routing protocol uses a MAC rather than digital signatures to provide message integrity. Why do you think a MAC was chosen over digital signatures?
P15. Consider our authentication protocol in Figure 8.18 in which Alice authenticates herself to Bob, which we saw works well (i.e., we found no flaws in it). Now suppose that while Alice is authenticating herself to Bob, Bob must authenticate himself to Alice. Give a scenario by which Trudy, pretending to be Alice, can now authenticate herself to Bob as Alice. (Hint: Consider that the sequence of operations of the protocol, one with Trudy initiating and one with Bob initiating, can be arbitrarily interleaved. Pay particular attention to the fact that both Bob and Alice will use a nonce, and that if care is not taken, the same nonce can be used maliciously.)

P16. A natural question is whether we can use a nonce and public key cryptography to solve the end-point authentication problem in Section 8.4. Consider the following natural protocol: (1) Alice sends the message "I am Alice" to Bob. (2) Bob chooses a nonce, R, and sends it to Alice. (3) Alice uses her private key to encrypt the nonce and sends the resulting value to Bob. (4) Bob applies Alice's public key to the received message. Thus, Bob computes R and authenticates Alice.

a. Diagram this protocol, using the notation for public and private keys employed in the textbook.

b. Suppose that certificates are not used. Describe how Trudy can become a "woman-in-the-middle" by intercepting Alice's messages and then pretending to be Alice to Bob.

P17. Figure 8.19 shows the operations that Alice must perform with PGP to provide confidentiality, authentication, and integrity. Diagram the corresponding operations that Bob must perform on the package received from Alice.

P18. Suppose Alice wants to send an e-mail to Bob. Bob has a public-private key pair (K_B^+, K_B^-), and Alice has Bob's certificate. But Alice does not have a public, private key pair. Alice and Bob (and the entire world) share the same hash function H(·).

a. In this situation, is it possible to design a scheme so that Bob can verify that Alice created the message? If so, show how with a block diagram for Alice and Bob.

b. Is it possible to design a scheme that provides confidentiality for sending the message from Alice to Bob? If so, show how with a block diagram for Alice and Bob.

P19. Consider the Wireshark output below for a portion of an SSL session.

a. Is Wireshark packet 112 sent by the client or server?

b. What is the server's IP address and port number?

c. Assuming no loss and no retransmissions, what will be the sequence number of the next TCP segment sent by the client?

d. How many SSL records does Wireshark packet 112 contain?

e. Does packet 112 contain a Master Secret or an Encrypted Master Secret or neither?

f. Assuming that the handshake type field is 1 byte and each length field is 3 bytes, what are the values of the first and last bytes of the Master Secret (or Encrypted Master Secret)?

g. The client encrypted handshake message takes into account how many SSL records?

h. The server encrypted handshake message takes into account how many SSL records?

P20. In Section 8.6.1, it is shown that without sequence numbers, Trudy (a woman-in-the-middle) can wreak havoc in an SSL session by interchanging TCP segments. Can Trudy do something similar by deleting a TCP segment? What does she need to do to succeed at the deletion attack? What effect will it have?

(Wireshark screenshot reprinted by permission of the Wireshark Foundation.)
P21. Suppose Alice and Bob are communicating over an SSL session. Suppose an attacker, who does not have any of the shared keys, inserts a bogus TCP segment into a packet stream with correct TCP checksum and sequence numbers (and correct IP addresses and port numbers). Will SSL at the receiving side accept the bogus packet and pass the payload to the receiving application? Why or why not?

P22. The following true/false questions pertain to Figure 8.28.

a. When a host in 172.16.1/24 sends a datagram to an Amazon.com server, the router R1 will encrypt the datagram using IPsec.

b. When a host in 172.16.1/24 sends a datagram to a host in 172.16.2/24, the router R1 will change the source and destination address of the IP datagram.

c. Suppose a host in 172.16.1/24 initiates a TCP connection to a Web server in 172.16.2/24. As part of this connection, all datagrams sent by R1 will have protocol number 50 in the left-most IPv4 header field.

d. Consider sending a TCP segment from a host in 172.16.1/24 to a host in 172.16.2/24. Suppose the acknowledgment for this segment gets lost, so that TCP resends the segment. Because IPsec uses sequence numbers, R1 will not resend the TCP segment.

P23. Consider the example in Figure 8.28. Suppose Trudy is a woman-in-the-middle, who can insert datagrams into the stream of datagrams going from R1 to R2. As part of a replay attack, Trudy sends a duplicate copy of one of the datagrams sent from R1 to R2. Will R2 decrypt the duplicate datagram and forward it into the branch-office network? If not, describe in detail how R2 detects the duplicate datagram.

P24. Consider the following pseudo-WEP protocol. The key is 4 bits and the IV is 2 bits. The IV is appended to the end of the key when generating the keystream. Suppose that the shared secret key is 1010. The keystreams for the four possible inputs are as follows:

101000: 0010101101010101001011010100100 ...
101001: 1010011011001010110100100101101 ...
101010: 0001101000111100010100101001111 ...
101011: 1111101010000000101010100010111 ...

Suppose all messages are 8 bits long. Suppose the ICV (integrity check) is 4 bits long, and is calculated by XOR-ing the first 4 bits of data with the last 4 bits of data. Suppose the pseudo-WEP packet consists of three fields: first the IV field, then the message field, and last the ICV field, with some of these fields encrypted.

a. We want to send the message m = 10100000 using the IV = 11 and using WEP. What will be the values in the three WEP fields?

b. Show that when the receiver decrypts the WEP packet, it recovers the message and the ICV.

c. Suppose Trudy intercepts a WEP packet (not necessarily with the IV = 11) and wants to modify it before forwarding it to the receiver. Suppose Trudy flips the first ICV bit. Assuming that Trudy does not know the keystreams for any of the IVs, what other bit(s) must Trudy also flip so that the received packet passes the ICV check?

d. Justify your answer by modifying the bits in the WEP packet in part (a), decrypting the resulting packet, and verifying the integrity check.

P25. Provide a filter table and a connection table for a stateful firewall that is as restrictive as possible but accomplishes the following:

a. Allows all internal users to establish Telnet sessions with external hosts.

b. Allows external users to surf the company Web site at 222.22.0.12.

c. But otherwise blocks all inbound and outbound traffic.
The internal network is 222.22/16. In your solution, suppose that the connection table is currently caching three connections, all from inside to outside. You'll need to invent appropriate IP addresses and port numbers.

P26. Suppose Alice wants to visit the Web site activist.com using a TOR-like service. This service uses two non-colluding proxy servers, Proxy1 and Proxy2. Alice first obtains the certificates (each containing a public key) for Proxy1 and Proxy2 from some central server. Denote K1+(), K2+(), K1−(), and K2−() for encryption/decryption with the public and private RSA keys.

a. Using a timing diagram, provide a protocol (as simple as possible) that enables Alice to establish a shared session key S1 with Proxy1. Denote S1(m) for encryption/decryption of data m with the shared key S1.

b. Using a timing diagram, provide a protocol (as simple as possible) that allows Alice to establish a shared session key S2 with Proxy2 without revealing her IP address to Proxy2.

c. Assume now that shared keys S1 and S2 are now established. Using a timing diagram, provide a protocol (as simple as possible and not using public-key cryptography) that allows Alice to request an html page from activist.com without revealing her IP address to Proxy2 and without revealing to Proxy1 which site she is visiting. Your diagram should end with an HTTP request arriving at activist.com.

Wireshark Lab

In this lab (available from the book Web site), we investigate the Secure Sockets Layer (SSL) protocol. Recall from Section 8.6 that SSL is used for securing a TCP connection, and that it is extensively used in practice for secure Internet transactions. In this lab, we will focus on the SSL records sent over the TCP connection. We will attempt to delineate and classify each of the records, with a goal of understanding the why and how for each record. We investigate the various SSL record types as well as the fields in the SSL messages. We do so by analyzing a trace of the SSL records sent between your host and an e-commerce server.

IPsec Lab

In this lab (available from the book Web site), we will explore how to create IPsec SAs between Linux boxes. You can do the first part of the lab with two ordinary Linux boxes, each with one Ethernet adapter. But for the second part of the lab, you will need four Linux boxes, two of which have two Ethernet adapters. In the second half of the lab, you will create IPsec SAs using the ESP protocol in the tunnel mode. You will do this by first manually creating the SAs, and then by having IKE create the SAs.

AN INTERVIEW WITH...

Steven M. Bellovin

Steven M. Bellovin joined the faculty at Columbia University after many years at the Network Services Research Lab at AT&T Labs Research in Florham Park, New Jersey. His focus is on networks, security, and why the two are incompatible. In 1995, he was awarded the Usenix Lifetime Achievement Award for his work in the creation of Usenet, the first newsgroup exchange network that linked two or more computers and allowed users to share information and join in discussions. Steve is also an elected member of the National Academy of Engineering. He received his BA from Columbia University and his PhD from the University of North Carolina at Chapel Hill.

What led you to specialize in the networking security area?

This is going to sound odd, but the answer is simple: It was fun.
My background +was in systems programming and systems administration, which leads +fairly naturally to security. And I've always been interested in +communications, ranging back to part-time systems programming jobs when +I was in college. My work on security continues to be motivated by two +things---a desire to keep computers useful, which means that their +function can't be corrupted by attackers, and a desire to protect +privacy. What was your vision for Usenet at the time that you were +developing it? And now? We originally viewed it as a way to talk about +computer science and computer programming around the country, with a lot +of local use for administrative matters, for-sale ads, and so on. In +fact, my original prediction was one to two messages per day, from +50--100 sites at the most--- ever. But the real growth was in +people-related topics, including---but not limited to---human +interactions with computers. My favorite newsgroups, over the years, +have been things like rec.woodworking, as well as sci.crypt. To some +extent, netnews has been displaced by the Web. Were I to start designing +it today, it would look very different. But it still excels as a way to +reach a very broad audience that is interested in the topic, without +having to rely on particular Web sites. Has anyone inspired you +professionally? In what ways? Professor Fred Brooks---the founder and +original chair of the computer science department at the University of +North Carolina at Chapel Hill, the manager of the team that developed +the IBM S/360 and OS/360, and the author of The Mythical Man-Month---was +a tremendous influence on my career. More than anything else, he taught +outlook and trade-offs---how to look at problems in the context of the +real world (and how much messier the real world is than a theorist would +like), and how to balance competing interests in designing a solution. +Most computer work is engineering---the art of making the right +trade-offs to satisfy many contradictory objectives. What is your vision +for the future of networking and security? Thus far, much of the +security we have has come from isolation. A firewall, for example, works +by cutting off access to certain machines and services. But we're in an +era of increasing connectivity---it's gotten harder to isolate things. +Worse yet, our production systems require far more separate pieces, +interconnected by networks. Securing all that is one of our biggest +challenges. + +What would you say have been the greatest advances in security? How much +further do we have to go? At least scientifically, we know how to do +cryptography. That's been a big help. But most security problems are due +to buggy code, and that's a much harder problem. In fact, it's the +oldest unsolved problem in computer science, and I think it will remain +that way. The challenge is figuring out how to secure systems when we +have to build them out of insecure components. We can already do that +for reliability in the face of hardware failures; can we do the same for +security? Do you have any advice for students about the Internet and +networking security? Learning the mechanisms is the easy part. Learning +how to "think paranoid" is harder. You have to remember that probability +distributions don't apply---the attackers can and will find improbable +conditions. And the details matter---a lot. 
Chapter 9 Multimedia Networking

While lounging in bed or riding buses and subways, people in all corners of the world are currently using the Internet to watch movies and television shows on demand. Internet movie and television distribution companies such as Netflix and Amazon in North America and Youku and Kankan in China have practically become household names. But people are not only watching Internet videos; they are also using sites like YouTube to upload and distribute their own user-generated content, becoming Internet video producers as well as consumers. Moreover, network applications such as Skype, Google Talk, and WeChat (enormously popular in China) allow people to not only make "telephone calls" over the Internet, but to also enhance those calls with video and multi-person conferencing. In fact, we predict that by the end of the current decade most of the video consumption and voice conversations will take place end-to-end over the Internet, most often to wireless devices connected to the Internet via cellular and WiFi access networks. Traditional telephony and broadcast television are quickly becoming obsolete. We begin this chapter with a taxonomy of multimedia applications in Section 9.1. We'll see that a multimedia application can be classified as either streaming stored audio/video, conversational voice/video-over-IP, or streaming live audio/video. We'll see that each of these classes of applications has its own unique service requirements that differ significantly from those of traditional elastic applications such as e-mail, Web browsing, and remote login. In Section 9.2, we'll examine video streaming in some detail. We'll explore many of the underlying principles behind video streaming, including client buffering, prefetching, and adapting video quality to available bandwidth. In Section 9.3, we investigate conversational voice and video, which, unlike elastic applications, are highly sensitive to end-to-end delay but can tolerate occasional loss of data. Here we'll examine how techniques such as adaptive playout, forward error correction, and error concealment can mitigate against network-induced packet loss and delay. We'll also examine Skype as a case study. In Section 9.4, we'll study RTP and SIP, two popular protocols for real-time conversational voice and video applications. In Section 9.5, we'll investigate mechanisms within the network that can be used to distinguish one class of traffic (e.g., delay-sensitive applications such as conversational voice) from another (e.g., elastic applications such as browsing Web pages), and provide differentiated service among multiple classes of traffic.

9.1 Multimedia Networking Applications We define a multimedia network application as any network application that employs audio or video. In this section, we provide a taxonomy of multimedia applications. We'll see that each class of applications in the taxonomy has its own unique set of service requirements and design issues. But before diving into an in-depth discussion of Internet multimedia applications, it is useful to consider the intrinsic characteristics of the audio and video media themselves.

9.1.1 Properties of Video Perhaps the most salient characteristic of video is its high bit rate. Video distributed over the Internet typically ranges from 100 kbps for low-quality video conferencing to over 3 Mbps for streaming high-definition movies.
To get a sense of how video bandwidth demands compare with those of other Internet applications, let's briefly consider three different users, each using a different Internet application. Our first user, Frank, is going quickly through photos posted on his friends' Facebook pages. Let's assume that Frank is looking at a new photo every 10 seconds, and that photos are on average 200 Kbytes in size. (As usual, throughout this discussion we make the simplifying assumption that 1 Kbyte=8,000 bits.) Our second user, Martha, is streaming music from the Internet ("the cloud") to her smartphone. Let's assume Martha is using a service such as Spotify to listen to many MP3 songs, one after the other, each encoded at a rate of 128 kbps. Our third user, Victor, is watching a video that has been encoded at 2 Mbps. Finally, let's suppose that the session length for all three users is 4,000 seconds (approximately 67 minutes). Table 9.1 compares the bit rates and the total bytes transferred for these three users.

Table 9.1 Comparison of bit-rate requirements of three Internet applications

|                | Bit rate | Bytes transferred in 67 min |
|----------------|----------|-----------------------------|
| Facebook Frank | 160 kbps | 80 Mbytes                   |
| Martha Music   | 128 kbps | 64 Mbytes                   |
| Victor Video   | 2 Mbps   | 1 Gbyte                     |

We see that video streaming consumes by far the most bandwidth, with a bit rate more than ten times greater than that of the Facebook and music-streaming applications. Therefore, when designing networked video applications, the first thing we must keep in mind is the high bit-rate requirements of video. Given the popularity of video and its high bit rate, it is perhaps not surprising that Cisco predicts \[Cisco 2015\] that streaming and stored video will be approximately 80 percent of global consumer Internet traffic by 2019. Another important characteristic of video is that it can be compressed, thereby trading off video quality with bit rate. A video is a sequence of images, typically being displayed at a constant rate, for example, at 24 or 30 images per second. An uncompressed, digitally encoded image consists of an array of pixels, with each pixel encoded into a number of bits to represent luminance and color. There are two types of redundancy in video, both of which can be exploited by video compression. Spatial redundancy is the redundancy within a given image. Intuitively, an image that consists of mostly white space has a high degree of redundancy and can be efficiently compressed without significantly sacrificing image quality. Temporal redundancy reflects repetition from image to subsequent image. If, for example, an image and the subsequent image are exactly the same, there is no reason to re-encode the subsequent image; it is instead more efficient simply to indicate during encoding that the subsequent image is exactly the same. Today's off-the-shelf compression algorithms can compress a video to essentially any bit rate desired. Of course, the higher the bit rate, the better the image quality and the better the overall user viewing experience. We can also use compression to create multiple versions of the same video, each at a different quality level. For example, we can use compression to create, say, three versions of the same video, at rates of 300 kbps, 1 Mbps, and 3 Mbps. Users can then decide which version they want to watch as a function of their current available bandwidth.
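To make the idea concrete, here is a small sketch of one plausible version-selection rule (the rule, the 80 percent headroom factor, and the function name are our own illustrative inventions, not any particular player's algorithm): pick the highest-rate version that fits comfortably within the bandwidth the client currently measures.

```python
# Hypothetical version-selection rule: choose the highest-rate encoding
# that fits within the measured bandwidth, leaving headroom for fluctuations.
VERSION_RATES_KBPS = [300, 1000, 3000]  # the three versions from the example

def select_version(measured_kbps, headroom=0.8):
    affordable = [r for r in VERSION_RATES_KBPS
                  if r <= headroom * measured_kbps]
    return max(affordable) if affordable else min(VERSION_RATES_KBPS)

print(select_version(10_000))  # fast broadband link: picks the 3 Mbps version
print(select_version(500))     # slow 3G link: picks the 300 kbps version
```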
Users with high-speed Internet connections might choose the 3 Mbps version; users watching the video over 3G with a smartphone might choose the 300 kbps version. Similarly, the video in a video conference application can be compressed "on-the-fly" to provide the best video quality given the available end-to-end bandwidth between conversing users.

9.1.2 Properties of Audio Digital audio (including digitized speech and music) has significantly lower bandwidth requirements than video. Digital audio, however, has its own unique properties that must be considered when designing multimedia network applications. To understand these properties, let's first consider how analog audio (which humans and musical instruments generate) is converted to a digital signal: The analog audio signal is sampled at some fixed rate, for example, at 8,000 samples per second. The value of each sample will be some real number. Each of the samples is then rounded to one of a finite number of values. This operation is referred to as quantization. The number of such finite values---called quantization values---is typically a power of two, for example, 256 quantization values. Each of the quantization values is represented by a fixed number of bits. For example, if there are 256 quantization values, then each value---and hence each audio sample---is represented by one byte. The bit representations of all the samples are then concatenated together to form the digital representation of the signal. As an example, if an analog audio signal is sampled at 8,000 samples per second and each sample is quantized and represented by 8 bits, then the resulting digital signal will have a rate of 64,000 bits per second. For playback through audio speakers, the digital signal can then be converted back---that is, decoded---to an analog signal. However, the decoded analog signal is only an approximation of the original signal, and the sound quality may be noticeably degraded (for example, high-frequency sounds may be missing in the decoded signal). By increasing the sampling rate and the number of quantization values, the decoded signal can better approximate the original analog signal. Thus (as with video), there is a trade-off between the quality of the decoded signal and the bit-rate and storage requirements of the digital signal. The basic encoding technique that we just described is called pulse code modulation (PCM). Speech encoding often uses PCM, with a sampling rate of 8,000 samples per second and 8 bits per sample, resulting in a rate of 64 kbps. The audio compact disk (CD) also uses PCM, with a sampling rate of 44,100 samples per second and 16 bits per sample; this gives a rate of 705.6 kbps for mono and 1.411 Mbps for stereo. PCM-encoded speech and music, however, are rarely used in the Internet. Instead, as with video, compression techniques are used to reduce the bit rates of the stream. Human speech can be compressed to less than 10 kbps and still be intelligible. A popular compression technique for near CD-quality stereo music is MPEG 1 layer 3, more commonly known as MP3. MP3 encoders can compress to many different rates; 128 kbps is the most common encoding rate and produces very little sound degradation. A related standard is Advanced Audio Coding (AAC), which has been popularized by Apple. As with video, multiple versions of a prerecorded audio stream can be created, each at a different bit rate.
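The PCM arithmetic above is simple enough to capture in a few lines of Python (a sketch of our own; the function name is made up), reproducing the rates just quoted:

```python
def pcm_bit_rate(samples_per_sec, bits_per_sample, channels=1):
    # Uncompressed PCM rate: sampling rate x bits per sample x channels.
    return samples_per_sec * bits_per_sample * channels

print(pcm_bit_rate(8_000, 8))                # speech: 64,000 bps = 64 kbps
print(pcm_bit_rate(44_100, 16))              # CD mono: 705,600 bps = 705.6 kbps
print(pcm_bit_rate(44_100, 16, channels=2))  # CD stereo: 1,411,200 bps = 1.411 Mbps
```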
Although audio bit rates are generally much less than those of video, users are generally much more sensitive to audio glitches than video glitches. Consider, for example, a video conference taking place over the Internet. If, from time to time, the video signal is lost for a few seconds, the video conference can likely proceed without too much user frustration. If, however, the audio signal is frequently lost, the users may have to terminate the session.

9.1.3 Types of Multimedia Network Applications The Internet supports a large variety of useful and entertaining multimedia applications. In this subsection, we classify multimedia applications into three broad categories: (i) streaming stored audio/video, (ii) conversational voice/video-over-IP, and (iii) streaming live audio/video. As we will soon see, each of these application categories has its own set of service requirements and design issues. Streaming Stored Audio and Video To keep the discussion concrete, we focus here on streaming stored video, which typically combines video and audio components. Streaming stored audio (such as Spotify's streaming music service) is very similar to streaming stored video, although the bit rates are typically much lower. In this class of applications, the underlying medium is prerecorded video, such as a movie, a television show, a prerecorded sporting event, or a prerecorded user-generated video (such as those commonly seen on YouTube). These prerecorded videos are placed on servers, and users send requests to the servers to view the videos on demand. Many Internet companies today provide streaming video, including YouTube (Google), Netflix, Amazon, and Hulu. Streaming stored video has three key distinguishing features.

Streaming. In a streaming stored video application, the client typically begins video playout within a few seconds after it begins receiving the video from the server. This means that the client will be playing out from one location in the video while at the same time receiving later parts of the video from the server. This technique, known as streaming, avoids having to download the entire video file (and incurring a potentially long delay) before playout begins.

Interactivity. Because the media is prerecorded, the user may pause, reposition forward, reposition backward, fast-forward, and so on through the video content. The time from when the user makes such a request until the action manifests itself at the client should be less than a few seconds for acceptable responsiveness.

Continuous playout. Once playout of the video begins, it should proceed according to the original timing of the recording. Therefore, data must be received from the server in time for its playout at the client; otherwise, users experience video frame freezing (when the client waits for the delayed frames) or frame skipping (when the client skips over delayed frames).

By far, the most important performance measure for streaming video is average throughput. In order to provide continuous playout, the network must provide an average throughput to the streaming application that is at least as large as the bit rate of the video itself. As we will see in Section 9.2, by using buffering and prefetching, it is possible to provide continuous playout even when the throughput fluctuates, as long as the average throughput (averaged over 5--10 seconds) remains above the video rate \[Wang 2008\].
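To see how buffering makes this work, consider the following toy simulation (entirely our own construction, anticipating the analysis of Section 9.2; it assumes a constant server send rate x, ignores TCP dynamics and any cap on the buffer size, and steps in one-second increments): the client buffers incoming bits, begins playout once Q bits have arrived, drains the buffer at the video consumption rate r, and rebuffers whenever the buffer runs dry.

```python
def simulate_playout(x_bps, r_bps, Q_bits, duration_s):
    # Fill at x; start playing once Q bits are buffered; drain at r while
    # playing; on an empty buffer, freeze and rebuffer Q bits before resuming.
    buffered, playing, start_delay, freezes = 0.0, False, None, 0
    for t in range(duration_s):
        buffered += x_bps
        if not playing and buffered >= Q_bits:
            playing = True
            if start_delay is None:
                start_delay = t + 1  # close to the analytical value Q/x
        if playing and buffered >= r_bps:
            buffered -= r_bps        # one second of video played out
        elif playing:
            playing, freezes = False, freezes + 1  # buffer ran dry: freeze
    print(f"initial delay ~{start_delay} s, {freezes} freeze(s) in {duration_s} s")

simulate_playout(x_bps=1.5e6, r_bps=1e6, Q_bits=4e6, duration_s=60)  # x > r
simulate_playout(x_bps=0.8e6, r_bps=1e6, Q_bits=4e6, duration_s=60)  # x < r
```

When x exceeds r, playout never stalls after the initial buffering delay; when x is below r, playout alternates between playing and rebuffering, which is exactly the freezing behavior analyzed in Section 9.2.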
For many streaming video applications, prerecorded video is stored on, and streamed from, a CDN rather than from a single data center. There are also many P2P video streaming applications for which the video is stored on users' hosts (peers), with different chunks of video arriving from different peers that may be spread around the globe. Given the prominence of Internet video streaming, we will explore video streaming in some depth in Section 9.2, paying particular attention to client buffering, prefetching, adapting quality to bandwidth availability, and CDN distribution. Conversational Voice- and Video-over-IP Real-time conversational voice over the Internet is often referred to as Internet telephony, since, from the user's perspective, it is similar to the traditional circuit-switched telephone service. It is also commonly called Voice-over-IP (VoIP). Conversational video is similar, except that it includes the video of the participants as well as their voices. Most of today's voice and video conversational systems allow users to create conferences with three or more participants. Conversational voice and video are widely used in the Internet today, with the Internet companies Skype, QQ, and Google Talk boasting hundreds of millions of daily users. In our discussion of application service requirements in Chapter 2 (Figure 2.4), we identified a number of axes along which application requirements can be classified. Two of these axes---timing considerations and tolerance of data loss---are particularly important for conversational voice and video applications. Timing considerations are important because audio and video conversational applications are highly delay-sensitive. For a conversation with two or more interacting speakers, the delay from when a user speaks or moves until the action is manifested at the other end should be less than a few hundred milliseconds. For voice, delays smaller than 150 milliseconds are not perceived by a human listener, delays between 150 and 400 milliseconds can be acceptable, and delays exceeding 400 milliseconds can result in frustrating, if not completely unintelligible, voice conversations. On the other hand, conversational multimedia applications are loss-tolerant---occasional loss only causes occasional glitches in audio/video playback, and these losses can often be partially or fully concealed. These delay-sensitive but loss-tolerant characteristics are clearly different from those of elastic data applications such as Web browsing, e-mail, social networks, and remote login. For elastic applications, long delays are annoying but not particularly harmful; the completeness and integrity of the transferred data, however, are of paramount importance. We will explore conversational voice and video in more depth in Section 9.3, paying particular attention to how adaptive playout, forward error correction, and error concealment can mitigate against network-induced packet loss and delay. Streaming Live Audio and Video This third class of applications is similar to traditional broadcast radio and television, except that transmission takes place over the Internet. These applications allow a user to receive a live radio or television transmission---such as a live sporting event or an ongoing news event---transmitted from any corner of the world. Today, thousands of radio and television stations around the world are broadcasting content over the Internet.
Live, broadcast-like applications often have many users who receive the same audio/video program at the same time. In the Internet today, this is typically done with CDNs (Section 2.6). As with streaming stored multimedia, the network must provide each live multimedia flow with an average throughput that is larger than the video consumption rate. Because the event is live, delay can also be an issue, although the timing constraints are much less stringent than those for conversational voice. Delays of up to ten seconds or so from when the user chooses to view a live transmission to when playout begins can be tolerated. We will not cover streaming live media in this book because many of the techniques used for streaming live media---initial buffering delay, adaptive bandwidth use, and CDN distribution---are similar to those for streaming stored media.

9.2 Streaming Stored Video For streaming video applications, prerecorded videos are placed on servers, and users send requests to these servers to view the videos on demand. The user may watch the video from beginning to end without interruption, may stop watching the video well before it ends, or may interact with the video by pausing or repositioning to a future or past scene. Streaming video systems can be classified into three categories: UDP streaming, HTTP streaming, and adaptive HTTP streaming (see Section 2.6). Although all three types of systems are used in practice, the majority of today's systems employ HTTP streaming and adaptive HTTP streaming. A common characteristic of all three forms of video streaming is the extensive use of client-side application buffering to mitigate the effects of varying end-to-end delays and varying amounts of available bandwidth between server and client. For streaming video (both stored and live), users generally can tolerate a small several-second initial delay between when the client requests a video and when video playout begins at the client. Consequently, when the video starts to arrive at the client, the client need not immediately begin playout, but can instead build up a reserve of video in an application buffer. Once the client has built up a reserve of several seconds of buffered-but-not-yet-played video, the client can then begin video playout. There are two important advantages provided by such client buffering. First, client-side buffering can absorb variations in server-to-client delay. If a particular piece of video data is delayed, as long as it arrives before the reserve of received-but-not-yet-played video is exhausted, this long delay will not be noticed. Second, if the server-to-client bandwidth briefly drops below the video consumption rate, a user can continue to enjoy continuous playback, again as long as the client application buffer does not become completely drained. Figure 9.1 illustrates client-side buffering. In this simple example, suppose that video is encoded at a fixed bit rate, and thus each video block contains video frames that are to be played out over the same fixed amount of time, Δ. The server transmits the first video block at t0, the second block at t0+Δ, the third block at t0+2Δ, and so on. Once the client begins playout, each block should be played out Δ time units after the previous block in order to reproduce the timing of the original recorded video. Because of the variable end-to-end network delays, different video blocks experience different delays.
The first video block arrives at the client at t1 and the second block arrives at t2. The network delay for the ith block is the horizontal distance between the time the block was transmitted by the server and the time it is received at the client; note that the network delay varies from one video block to another. In this example, if the client were to begin playout as soon as the first block arrived at t1, then the second block would not have arrived in time to be played out at t1+Δ. In this case, video playout would either have to stall (waiting for block 2 to arrive) or block 2 could be skipped---both resulting in undesirable playout impairments. Instead, if the client were to delay the start of playout until t3, when blocks 1 through 6 have all arrived, periodic playout can proceed with all blocks having been received before their playout time.

Figure 9.1 Client playout delay in video streaming

9.2.1 UDP Streaming We only briefly discuss UDP streaming here, referring the reader to more in-depth discussions of the protocols behind these systems where appropriate. With UDP streaming, the server transmits video at a rate that matches the client's video consumption rate by clocking out the video chunks over UDP at a steady rate. For example, if the video consumption rate is 2 Mbps and each UDP packet carries 8,000 bits of video, then the server would transmit one UDP packet into its socket every (8,000 bits)/(2 Mbps)=4 msec. As we learned in Chapter 3, because UDP does not employ a congestion-control mechanism, the server can push packets into the network at the consumption rate of the video without the rate-control restrictions of TCP. UDP streaming typically uses a small client-side buffer, big enough to hold less than a second of video. Before passing the video chunks to UDP, the server will encapsulate the video chunks within transport packets specially designed for transporting audio and video, using the Real-Time Transport Protocol (RTP) \[RFC 3550\] or a similar (possibly proprietary) scheme. We delay our coverage of RTP until Section 9.3, where we discuss RTP in the context of conversational voice and video systems. Another distinguishing property of UDP streaming is that in addition to the server-to-client video stream, the client and server also maintain, in parallel, a separate control connection over which the client sends commands regarding session state changes (such as pause, resume, reposition, and so on). The Real-Time Streaming Protocol (RTSP) \[RFC 2326\], explained in some detail in the Web site for this textbook, is a popular open protocol for such a control connection. Although UDP streaming has been employed in many open-source systems and proprietary products, it suffers from three significant drawbacks. First, due to the unpredictable and varying amount of available bandwidth between server and client, constant-rate UDP streaming can fail to provide continuous playout. For example, consider the scenario where the video consumption rate is 1 Mbps and the server-to-client available bandwidth is usually more than 1 Mbps, but every few minutes the available bandwidth drops below 1 Mbps for several seconds. In such a scenario, a UDP streaming system that transmits video at a constant rate of 1 Mbps over RTP/UDP would likely provide a poor user experience, with freezing or skipped frames soon after the available bandwidth falls below 1 Mbps.
The second drawback of UDP streaming is that it requires a media control server, such as an RTSP server, to process client-to-server interactivity requests and to track client state (e.g., the client's playout point in the video, whether the video is being paused or played, and so on) for each ongoing client session. This increases the overall cost and complexity of deploying a large-scale video-on-demand system. The third drawback is that many firewalls are configured to block UDP traffic, preventing the users behind these firewalls from receiving UDP video.

9.2.2 HTTP Streaming In HTTP streaming, the video is simply stored in an HTTP server as an ordinary file with a specific URL. When a user wants to see the video, the client establishes a TCP connection with the server and issues an HTTP GET request for that URL. The server then sends the video file, within an HTTP response message, as quickly as possible, that is, as quickly as TCP congestion control and flow control will allow. On the client side, the bytes are collected in a client application buffer. Once the number of bytes in this buffer exceeds a predetermined threshold, the client application begins playback---specifically, it periodically grabs video frames from the client application buffer, decompresses the frames, and displays them on the user's screen. We learned in Chapter 3 that when transferring a file over TCP, the server-to-client transmission rate can vary significantly due to TCP's congestion control mechanism. In particular, it is not uncommon for the transmission rate to vary in a "saw-tooth" manner associated with TCP congestion control. Furthermore, packets can also be significantly delayed due to TCP's retransmission mechanism. Because of these characteristics of TCP, the conventional wisdom in the 1990s was that video streaming would never work well over TCP. Over time, however, designers of streaming video systems learned that TCP's congestion control and reliable-data transfer mechanisms do not necessarily preclude continuous playout when client buffering and prefetching (discussed in the next section) are used.

The use of HTTP over TCP also allows the video to traverse firewalls and NATs more easily (which are often configured to block most UDP traffic but to allow most HTTP traffic). Streaming over HTTP also obviates the need for a media control server, such as an RTSP server, reducing the cost of a large-scale deployment over the Internet. Due to all of these advantages, most video streaming applications today---including YouTube and Netflix---use HTTP streaming (over TCP) as their underlying streaming protocol. Prefetching Video As we just learned, client-side buffering can be used to mitigate the effects of varying end-to-end delays and varying available bandwidth. In our earlier example in Figure 9.1, the server transmits video at the rate at which the video is to be played out. However, for streaming stored video, the client can attempt to download the video at a rate higher than the consumption rate, thereby prefetching video frames that are to be consumed in the future. This prefetched video is naturally stored in the client application buffer. Such prefetching occurs naturally with TCP streaming, since TCP's congestion avoidance mechanism will attempt to use all of the available bandwidth between server and client. To gain some insight into prefetching, let's take a look at a simple example.
Suppose the video consumption rate is 1 Mbps but the network is capable of delivering the video from server to client at a constant rate of 1.5 Mbps. Then the client will not only be able to play out the video with a very small playout delay, but will also be able to increase the amount of buffered video data by 500 Kbits every second. In this manner, if in the future the client receives data at a rate of less than 1 Mbps for a brief period of time, the client will be able to continue to provide continuous playback due to the reserve in its buffer. \[Wang 2008\] shows that when the average TCP throughput is roughly twice the media bit rate, streaming over TCP results in minimal starvation and low buffering delays. Client Application Buffer and TCP Buffers Figure 9.2 illustrates the interaction between client and server for HTTP streaming. At the server side, the portion of the video file in white has already been sent into the server's socket, while the darkened portion is what remains to be sent. After "passing through the socket door," the bytes are placed in the TCP send buffer before being transmitted into the Internet, as described in Chapter 3. In Figure 9.2, because the TCP send buffer at the server side is shown to be full, the server is momentarily prevented from sending more bytes from the video file into the socket. On the client side, the client application (media player) reads bytes from the TCP receive buffer (through its client socket) and places the bytes into the client application buffer. At the same time, the client application periodically grabs video frames from the client application buffer, decompresses the frames, and displays them on the user's screen. Note that if the client application buffer is larger than the video file, then the whole process of moving bytes from the server's storage to the client's application buffer is equivalent to an ordinary file download over HTTP---the client simply pulls the video off the server as fast as TCP will allow!

Figure 9.2 Streaming stored video over HTTP/TCP

Consider now what happens when the user pauses the video during the streaming process. During the pause period, bits are not removed from the client application buffer, even though bits continue to enter the buffer from the server. If the client application buffer is finite, it may eventually become full, which will cause "back pressure" all the way back to the server. Specifically, once the client application buffer becomes full, bytes can no longer be removed from the client TCP receive buffer, so it too becomes full. Once the client TCP receive buffer becomes full, bytes can no longer be removed from the server TCP send buffer, so it also becomes full. Once the TCP send buffer becomes full, the server cannot send any more bytes into the socket. Thus, if the user pauses the video, the server may be forced to stop transmitting, in which case the server will be blocked until the user resumes the video. In fact, even during regular playback (that is, without pausing), if the client application buffer becomes full, back pressure will cause the TCP buffers to become full, which will force the server to reduce its rate. To determine the resulting rate, note that when the client application removes f bits, it creates room for f bits in the client application buffer, which in turn allows the server to send f additional bits. Thus, the server send rate can be no higher than the video consumption rate at the client.
Therefore, a full client application buffer indirectly imposes a limit on the rate that video can be sent from server to client when streaming over HTTP. Analysis of Video Streaming Some simple modeling will provide more insight into initial playout delay and freezing due to application buffer depletion. As shown in Figure 9.3, let B denote the size (in bits) of the client's application buffer, and let Q denote the number of bits that must be buffered before the client application begins playout. (Of course, Q\<B.) Let r denote the video consumption rate---the rate at which the client draws bits out of the client application buffer during playback. So, for example, if the video's frame rate is 30 frames/sec, and each (compressed) frame is 100,000 bits, then r=3 Mbps. To see the forest through the trees, we'll ignore TCP's send and receive buffers.

Figure 9.3 Analysis of client-side buffering for video streaming

Let's assume that the server sends bits at a constant rate x whenever the client buffer is not full. (This is a gross simplification, since TCP's send rate varies due to congestion control; we'll examine more realistic time-dependent rates x(t) in the problems at the end of this chapter.) Suppose at time t=0, the application buffer is empty and video begins arriving to the client application buffer. We now ask at what time t=tp does playout begin? And while we are at it, at what time t=tf does the client application buffer become full? First, let's determine tp, the time when Q bits have entered the application buffer and playout begins. Recall that bits arrive to the client application buffer at rate x and no bits are removed from this buffer before playout begins. Thus, the amount of time required to build up Q bits (the initial buffering delay) is tp=Q/x. Now let's determine tf, the point in time when the client application buffer becomes full. We first observe that if x\<r (that is, if the server send rate is less than the video consumption rate), then the client buffer will never become full! Indeed, starting at time tp, the buffer will be depleted at rate r and will only be filled at rate x\<r. Eventually the client buffer will empty out entirely, at which time the video will freeze on the screen while the client buffer waits another tp seconds to build up Q bits of video. Thus, when the available rate in the network is less than the video rate, playout will alternate between periods of continuous playout and periods of freezing. In a homework problem, you will be asked to determine the length of each continuous playout and freezing period as a function of Q, r, and x. Now let's determine tf for when x\>r. In this case, starting at time tp, the buffer increases from Q to B at rate x−r since bits are being depleted at rate r but are arriving at rate x, as shown in Figure 9.3. Given these hints, you will be asked in a homework problem to determine tf, the time the client buffer becomes full. Note that when the available rate in the network is more than the video rate, after the initial buffering delay, the user will enjoy continuous playout until the video ends. Early Termination and Repositioning the Video HTTP streaming systems often make use of the HTTP byte-range header in the HTTP GET request message, which specifies the specific range of bytes the client currently wants to retrieve from the desired video.
This is particularly useful when the user wants to reposition (that is, jump) to a future point in time in the video. When the user repositions to a new position, the client sends a new HTTP request, indicating with the byte-range header the byte in the file from which the server should send data. When the server receives the new HTTP request, it can forget about any earlier request and instead send bytes beginning with the byte indicated in the byte-range request. While we are on the subject of repositioning, we briefly mention that when a user repositions to a future point in the video or terminates the video early, some prefetched-but-not-yet-viewed data transmitted by the server will go unwatched---a waste of network bandwidth and server resources. For example, suppose that the client buffer is full with B bits at some time t0 into the video, and at this time the user repositions to some instant t\>t0+B/r into the video, and then watches the video to completion from that point on. In this case, all B bits in the buffer will be unwatched and the bandwidth and server resources that were used to transmit those B bits have been completely wasted. There is significant wasted bandwidth in the Internet due to early termination, which can be quite costly, particularly for wireless links \[Ihm 2011\]. For this reason, many streaming systems use only a moderate-size client application buffer, or will limit the amount of prefetched video using the byte-range header in HTTP requests \[Rao 2011\]. Repositioning and early termination are analogous to cooking a large meal, eating only a portion of it, and throwing the rest away, thereby wasting food. So the next time your parents criticize you for wasting food by not eating all your dinner, you can quickly retort by saying they are wasting bandwidth and server resources when they reposition while watching movies over the Internet! But, of course, two wrongs do not make a right---both food and bandwidth are not to be wasted! In Sections 9.2.1 and 9.2.2, we covered UDP streaming and HTTP streaming, respectively. A third type of streaming is Dynamic Adaptive Streaming over HTTP (DASH), which uses multiple versions of the video, each compressed at a different rate. DASH is discussed in detail in Section 2.6.2. CDNs are often used to distribute stored and live video. CDNs are discussed in detail in Section 2.6.3.

9.3 Voice-over-IP Real-time conversational voice over the Internet is often referred to as Internet telephony, since, from the user's perspective, it is similar to the traditional circuit-switched telephone service. It is also commonly called Voice-over-IP (VoIP). In this section we describe the principles and protocols underlying VoIP. Conversational video is similar in many respects to VoIP, except that it includes the video of the participants as well as their voices. To keep the discussion focused and concrete, we focus here only on voice in this section rather than combined voice and video.

9.3.1 Limitations of the Best-Effort IP Service The Internet's network-layer protocol, IP, provides best-effort service. That is to say, the service makes its best effort to move each datagram from source to destination as quickly as possible but makes no promises whatsoever about getting the packet to the destination within some delay bound or about a limit on the percentage of packets lost.
The lack of such +guarantees poses significant challenges to the design of real-time +conversational applications, which are acutely sensitive to packet +delay, jitter, and loss. In this section, we'll cover several ways in +which the performance of VoIP over a best-effort network can be +enhanced. Our focus will be on application-layer techniques, that is, +approaches that do not require any changes in the network core or even +in the transport layer at the end hosts. To keep the discussion +concrete, we'll discuss the limitations of best-effort IP service in the +context of a specific VoIP example. The sender generates bytes at a rate +of 8,000 bytes per second; every 20 msecs the sender gathers these bytes +into a chunk. A chunk and a special header (discussed below) are +encapsulated in a UDP segment, via a call to the socket interface. Thus, +the number of bytes in a chunk is (20 msecs)⋅(8,000 bytes/sec)=160 +bytes, and a UDP segment is sent every 20 msecs. If each packet makes it +to the receiver with a constant end-to-end delay, then packets arrive at +the receiver periodically every 20 msecs. In these ideal conditions, the +receiver can simply play back each chunk as soon as it arrives. But +unfortunately, some packets can be lost and most packets will not have +the same end-to-end delay, even in a lightly congested Internet. For +this reason, the receiver must take more care in determining (1) when to +play back a chunk, and (2) what to do with a missing chunk. Packet Loss + +Consider one of the UDP segments generated by our VoIP application. The +UDP segment is encapsulated in an IP datagram. As the datagram wanders +through the network, it passes through router buffers (that is, queues) +while waiting for transmission on outbound links. It is possible that +one or more of the buffers in the path from sender to receiver is full, +in which case the arriving IP datagram may be discarded, never to arrive +at the receiving application. Loss could be eliminated by sending the +packets over TCP (which provides for reliable data transfer) rather than +over UDP. However, retransmission mechanisms are often considered +unacceptable for conversational real-time audio applications such as +VoIP, because they increase end-to-end delay \[Bolot 1996\]. +Furthermore, due to TCP congestion control, packet loss may result in a +reduction of the TCP sender's transmission rate to a rate that is lower +than the receiver's drain rate, possibly leading to buffer starvation. +This can have a severe impact on voice intelligibility at the receiver. +For these reasons, most existing VoIP applications run over UDP by +default. \[Baset 2006\] reports that UDP is used by Skype unless a user +is behind a NAT or firewall that blocks UDP segments (in which case TCP +is used). But losing packets is not necessarily as disastrous as one +might think. Indeed, packet loss rates between 1 and 20 percent can be +tolerated, depending on how voice is encoded and transmitted, and on how +the loss is concealed at the receiver. For example, forward error +correction (FEC) can help conceal packet loss. We'll see below that with +FEC, redundant information is transmitted along with the original +information so that some of the lost original data can be recovered from +the redundant information. 
Nevertheless, if one or more of the links between sender and receiver is severely congested, and packet loss exceeds 10 to 20 percent (for example, on a wireless link), then there is really nothing that can be done to achieve acceptable audio quality. Clearly, best-effort service has its limitations. End-to-End Delay End-to-end delay is the accumulation of transmission, processing, and queuing delays in routers; propagation delays in links; and end-system processing delays. For real-time conversational applications, such as VoIP, end-to-end delays smaller than 150 msecs are not perceived by a human listener; delays between 150 and 400 msecs can be acceptable but are not ideal; and delays exceeding 400 msecs can seriously hinder the interactivity in voice conversations. The receiving side of a VoIP application will typically disregard any packets that are delayed more than a certain threshold, for example, more than 400 msecs. Thus, packets that are delayed by more than the threshold are effectively lost. Packet Jitter A crucial component of end-to-end delay is the varying queuing delays that a packet experiences in the network's routers. Because of these varying delays, the time from when a packet is generated at the source until it is received at the receiver can fluctuate from packet to packet, as shown in Figure 9.1. This phenomenon is called jitter. As an example, consider two consecutive packets in our VoIP application. The sender sends the second packet 20 msecs after sending the first packet. But at the receiver, the spacing between these packets can become greater than 20 msecs. To see this, suppose the first packet arrives at a nearly empty queue at a router, but just before the second packet arrives at the queue a large number of packets from other sources arrive at the same queue. Because the first packet experiences a small queuing delay and the second packet suffers a large queuing delay at this router, the first and second packets become spaced by more than 20 msecs. The spacing between consecutive packets can also become less than 20 msecs. To see this, again consider two consecutive packets. Suppose the first packet joins the end of a queue with a large number of packets, and the second packet arrives at the queue before this first packet is transmitted and before any packets from other sources arrive at the queue. In this case, our two packets find themselves one right after the other in the queue. If the time it takes to transmit a packet on the router's outbound link is less than 20 msecs, then the spacing between first and second packets becomes less than 20 msecs. The situation is analogous to driving cars on roads. Suppose you and your friend are each driving in your own cars from San Diego to Phoenix. Suppose you and your friend have similar driving styles, and that you both drive at 100 km/hour, traffic permitting. If your friend starts out one hour before you, depending on intervening traffic, you may arrive at Phoenix more or less than one hour after your friend. If the receiver ignores the presence of jitter and plays out chunks as soon as they arrive, then the resulting audio quality can easily become unintelligible at the receiver. Fortunately, jitter can often be removed by using sequence numbers, timestamps, and a playout delay, as discussed below.
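Pulling the pieces of our running example together, a minimal sender loop might look like the following sketch (the destination address is a placeholder, and the 12-byte header is our own simplified stand-in for the RTP-style header discussed in Section 9.4): every 20 msecs it sends one 160-byte chunk over UDP, prefixed by the sequence number and timestamp the receiver will need to detect loss and remove jitter.

```python
# Sketch of the VoIP sender from our running example: 160-byte chunks over
# UDP every 20 msecs, each prefixed by a 12-byte header carrying a sequence
# number and a timestamp (a simplified stand-in for an RTP-style header;
# the destination address below is a placeholder).
import socket
import struct
import time

DEST = ("203.0.113.10", 5004)  # hypothetical receiver
CHUNK_BYTES = 160              # 20 msecs of audio at 8,000 bytes/sec

def send_voice(next_chunk, num_chunks):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq in range(num_chunks):
        header = struct.pack("!Id", seq, time.time())  # seq number, timestamp
        sock.sendto(header + next_chunk(CHUNK_BYTES), DEST)
        time.sleep(0.020)                              # one chunk per 20 msecs

send_voice(lambda n: bytes(n), num_chunks=250)  # 5 seconds of dummy silence
```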
9.3.2 Removing Jitter at the Receiver for Audio For our VoIP application, where packets are being generated periodically, the receiver should attempt to provide periodic playout of voice chunks in the presence of random network jitter. This is typically done by combining the following two mechanisms:

Prepending each chunk with a timestamp. The sender stamps each chunk with the time at which the chunk was generated.

Delaying playout of chunks at the receiver. As we saw in our earlier discussion of Figure 9.1, the playout delay of the received audio chunks must be long enough so that most of the packets are received before their scheduled playout times. This playout delay can either be fixed throughout the duration of the audio session or vary adaptively during the audio session lifetime.

We now discuss how these two mechanisms, when combined, can alleviate or even eliminate the effects of jitter. We examine two playback strategies: fixed playout delay and adaptive playout delay.

Fixed Playout Delay With the fixed-delay strategy, the receiver attempts to play out each chunk exactly q msecs after the chunk is generated. So if a chunk is timestamped at the sender at time t, the receiver plays out the chunk at time t+q, assuming the chunk has arrived by that time. Packets that arrive after their scheduled playout times are discarded and considered lost. What is a good choice for q? VoIP can support delays up to about 400 msecs, although a more satisfying conversational experience is achieved with smaller values of q. On the other hand, if q is made much smaller than 400 msecs, then many packets may miss their scheduled playback times due to the network-induced packet jitter. Roughly speaking, if large variations in end-to-end delay are typical, it is preferable to use a large q; on the other hand, if delay is small and variations in delay are also small, it is preferable to use a small q, perhaps less than 150 msecs. The trade-off between the playback delay and packet loss is illustrated in Figure 9.4. The figure shows the times at which packets are generated and played out for a single talk spurt. Two distinct initial playout delays are considered. As shown by the leftmost staircase, the sender generates packets at regular intervals---say, every 20 msecs. The first packet in this talk spurt is received at time r.

Figure 9.4 Packet loss for different fixed playout delays

As shown in the figure, the arrivals of subsequent packets are not evenly spaced due to the network jitter. For the first playout schedule, the fixed initial playout delay is set to p−r. With this schedule, the fourth packet does not arrive by its scheduled playout time, and the receiver considers it lost. For the second playout schedule, the fixed initial playout delay is set to p′−r. For this schedule, all packets arrive before their scheduled playout times, and there is therefore no loss. Adaptive Playout Delay The previous example demonstrates an important delay-loss trade-off that arises when designing a playout strategy with fixed playout delays. By making the initial playout delay large, most packets will make their deadlines and there will therefore be negligible loss; however, for conversational services such as VoIP, long delays can become bothersome if not intolerable. Ideally, we would like the playout delay to be minimized subject to the constraint that the loss be below a few percent.
The natural way to deal with this trade-off is to estimate the network delay and the variance of the network delay, and to adjust the playout delay accordingly at the beginning of each talk spurt. This adaptive adjustment of playout delays at the beginning of the talk spurts will cause the sender's silent periods to be compressed and elongated; however, compression and elongation of silence by a small amount is not noticeable in speech. Following \[Ramjee 1994\], we now describe a generic algorithm that the receiver can use to adaptively adjust its playout delays. To this end, let

ti = the timestamp of the ith packet = the time the packet was generated by the sender

ri = the time packet i is received by the receiver

pi = the time packet i is played at the receiver

The end-to-end network delay of the ith packet is ri−ti. Due to network jitter, this delay will vary from packet to packet. Let di denote an estimate of the average network delay upon reception of the ith packet. This estimate is constructed from the timestamps as follows:

di=(1−u)di−1+u(ri−ti)

where u is a fixed constant (for example, u=0.01). Thus di is a smoothed average of the observed network delays r1−t1,...,ri−ti. The estimate places more weight on the recently observed network delays than on the observed network delays of the distant past. This form of estimate should not be completely unfamiliar; a similar idea is used to estimate round-trip times in TCP, as discussed in Chapter 3. Let vi denote an estimate of the average deviation of the delay from the estimated average delay. This estimate is also constructed from the timestamps:

vi=(1−u)vi−1+u\|ri−ti−di\|

The estimates di and vi are calculated for every packet received, although they are used only to determine the playout point for the first packet in any talk spurt. Once having calculated these estimates, the receiver employs the following algorithm for the playout of packets. If packet i is the first packet of a talk spurt, its playout time, pi, is computed as:

pi=ti+di+Kvi

where K is a positive constant (for example, K=4). The purpose of the Kvi term is to set the playout time far enough into the future so that only a small fraction of the arriving packets in the talk spurt will be lost due to late arrivals. The playout point for any subsequent packet in a talk spurt is computed as an offset from the point in time when the first packet in the talk spurt was played out. In particular, let

qi=pi−ti

be the length of time from when the first packet in the talk spurt is generated until it is played out. If packet j also belongs to this talk spurt, it is played out at time

pj=tj+qi

The algorithm just described makes perfect sense assuming that the receiver can tell whether a packet is the first packet in the talk spurt. This can be done by examining the signal energy in each received packet.

9.3.3 Recovering from Packet Loss We have discussed in some detail how a VoIP application can deal with packet jitter. We now briefly describe several schemes that attempt to preserve acceptable audio quality in the presence of packet loss. Such schemes are called loss recovery schemes. Here we define packet loss in a broad sense: A packet is lost either if it never arrives at the receiver or if it arrives after its scheduled playout time. Our VoIP example will again serve as a context for describing loss recovery schemes.
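Before turning to loss recovery, note that the adaptive playout estimator above takes only a few lines of code. The sketch below is our own rendering of the \[Ramjee 1994\] update rules, with the example values u=0.01 and K=4 from the text; packet handling and clocks are left abstract.

```python
# The two smoothed estimates of Section 9.3.2: d tracks the average network
# delay and v the average delay deviation; the first packet of a talk spurt
# is scheduled at t_i + d + K*v, and later packets in the spurt reuse that
# packet's offset q = p_i - t_i.
class AdaptivePlayout:
    def __init__(self, u=0.01, K=4):
        self.u, self.K = u, K
        self.d = 0.0  # estimate of the average network delay
        self.v = 0.0  # estimate of the average delay deviation

    def on_packet(self, t_i, r_i):
        delay = r_i - t_i  # observed end-to-end delay of packet i
        self.d = (1 - self.u) * self.d + self.u * delay
        self.v = (1 - self.u) * self.v + self.u * abs(delay - self.d)

    def spurt_start_playout(self, t_i):
        return t_i + self.d + self.K * self.v  # playout time p_i
```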
As mentioned at the beginning of this section, retransmitting lost packets may not be feasible in a real-time conversational application such as VoIP. Indeed, retransmitting a packet that has missed its playout deadline serves absolutely no purpose. And retransmitting a packet that overflowed a router queue cannot normally be accomplished quickly enough. Because of these considerations, VoIP applications often use some type of loss anticipation scheme. Two types of loss anticipation schemes are forward error correction (FEC) and interleaving.

Forward Error Correction (FEC) The basic idea of FEC is to add redundant information to the original packet stream. For the cost of marginally increasing the transmission rate, the redundant information can be used to reconstruct approximations or exact versions of some of the lost packets. Following \[Bolot 1996\] and \[Perkins 1998\], we now outline two simple FEC mechanisms. The first mechanism sends a redundant encoded chunk after every n chunks. The redundant chunk is obtained by exclusive OR-ing the n original chunks \[Shacham 1990\]. In this manner if any one packet of the group of n+1 packets is lost, the receiver can fully reconstruct the lost packet. But if two or more packets in a group are lost, the receiver cannot reconstruct the lost packets. By keeping n+1, the group size, small, a large fraction of the lost packets can be recovered when loss is not excessive. However, the smaller the group size, the greater the relative increase of the transmission rate. In particular, the transmission rate increases by a factor of 1/n, so that, if n=3, then the transmission rate increases by 33 percent. Furthermore, this simple scheme increases the playout delay, as the receiver must wait to receive the entire group of packets before it can begin playout. For more practical details about how FEC works for multimedia transport see \[RFC 5109\]. The second FEC mechanism is to send a lower-resolution audio stream as the redundant information. For example, the sender might create a nominal audio stream and a corresponding low-resolution, low-bit rate audio stream. (The nominal stream could be a PCM encoding at 64 kbps, and the lower-quality stream could be a GSM encoding at 13 kbps.) The low-bit rate stream is referred to as the redundant stream. As shown in Figure 9.5, the sender constructs the nth packet by taking the nth chunk from the nominal stream and appending to it the (n−1)st chunk from the redundant stream. In this manner, whenever there is nonconsecutive packet loss, the receiver can conceal the loss by playing out the low-bit rate encoded chunk that arrives with the subsequent packet. Of course, low-bit rate chunks give lower quality than the nominal chunks. However, a stream of mostly high-quality chunks, occasional low-quality chunks, and no missing chunks gives good overall audio quality. Note that in this scheme, the receiver only has to receive two packets before playback, so that the increased playout delay is small. Furthermore, if the low-bit rate encoding is much less than the nominal encoding, then the marginal increase in the transmission rate will be small. In order to cope with consecutive loss, we can use a simple variation. Instead of appending just the (n−1)st low-bit rate chunk to the nth nominal chunk, the sender can append the (n−1)st and (n−2)nd low-bit rate chunks, or append the (n−1)st and (n−3)rd low-bit rate chunks, and so on.
The second FEC mechanism is to send a lower-resolution audio stream as the redundant information. For example, the sender might create a nominal audio stream and a corresponding low-resolution, low-bit rate audio stream. (The nominal stream could be a PCM encoding at 64 kbps, and the lower-quality stream could be a GSM encoding at 13 kbps.) The low-bit rate stream is referred to as the redundant stream. As shown in Figure 9.5, the sender constructs the nth packet by taking the nth chunk from the nominal stream and appending to it the (n−1)st chunk from the redundant stream. In this manner, whenever there is nonconsecutive packet loss, the receiver can conceal the loss by playing out the low-bit rate encoded chunk that arrives with the subsequent packet. Of course, low-bit rate chunks give lower quality than the nominal chunks. However, a stream of mostly high-quality chunks, occasional low-quality chunks, and no missing chunks gives good overall audio quality. Note that in this scheme, the receiver only has to receive two packets before playback, so the increase in playout delay is small. Furthermore, if the low-bit rate encoding is much less than the nominal encoding, then the marginal increase in the transmission rate will be small. In order to cope with consecutive loss, we can use a simple variation. Instead of appending just the (n−1)st low-bit rate chunk to the nth nominal chunk, the sender can append the (n−1)st and (n−2)nd low-bit rate chunks, or the (n−1)st and (n−3)rd low-bit rate chunks, and so on. By appending more low-bit rate chunks to each nominal chunk, the audio quality at the receiver becomes acceptable for a wider variety of harsh best-effort environments. On the other hand, the additional chunks increase the transmission bandwidth and the playout delay.

Figure 9.5 Piggybacking lower-quality redundant information

Interleaving

As an alternative to redundant transmission, a VoIP application can send interleaved audio. As shown in Figure 9.6, the sender resequences units of audio data before transmission, so that originally adjacent units are separated by a certain distance in the transmitted stream. Interleaving can mitigate the effect of packet losses. If, for example, units are 5 msecs in length and chunks are 20 msecs (that is, four units per chunk), then the first chunk could contain units 1, 5, 9, and 13; the second chunk could contain units 2, 6, 10, and 14; and so on. Figure 9.6 shows that the loss of a single packet from an interleaved stream results in multiple small gaps in the reconstructed stream, as opposed to the single large gap that would occur in a noninterleaved stream. Interleaving can significantly improve the perceived quality of an audio stream \[Perkins 1998\]. It also has low overhead. The obvious disadvantage of interleaving is that it increases latency. This limits its use for conversational applications such as VoIP, although it can perform well for streaming stored audio. A major advantage of interleaving is that it does not increase the bandwidth requirements of a stream.

Figure 9.6 Sending interleaved audio
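The resequencing step itself is only a few lines of code. Below is a minimal sketch using the four-units-per-chunk example from the text; the block-based framing is an assumption for illustration, not something mandated by any standard.

```python
def interleave(units, n):
    """Resequence audio units so that each chunk of n units contains
    every nth unit: with n = 4, the first chunk carries units 0, 4, 8,
    and 12, the second carries 1, 5, 9, and 13, and so on (0-indexed)."""
    chunks = []
    for start in range(0, len(units), n * n):   # one block = n chunks
        block = units[start:start + n * n]
        for c in range(n):
            chunks.append(block[c::n])          # units c, c+n, c+2n, ...
    return chunks

def deinterleave(chunks, n):
    """Receiver: restore the original unit order; a lost chunk then
    shows up as n small gaps rather than one large one."""
    units = []
    for b in range(0, len(chunks), n):          # one block of n chunks
        block = chunks[b:b + n]
        for i in range(max(len(c) for c in block)):
            for c in block:
                if i < len(c):
                    units.append(c[i])
    return units
```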
Error Concealment

Error concealment schemes attempt to produce a replacement for a lost packet that is similar to the original. As discussed in \[Perkins 1998\], this is possible since audio signals, and in particular speech, exhibit large amounts of short-term self-similarity. As such, these techniques work for relatively small loss rates (less than 15 percent) and for small packets (4--40 msecs). When the loss length approaches the length of a phoneme (5--100 msecs), these techniques break down, since whole phonemes may be missed by the listener. Perhaps the simplest form of receiver-based recovery is packet repetition. Packet repetition replaces lost packets with copies of the packets that arrived immediately before the loss. It has low computational complexity and performs reasonably well. Another form of receiver-based recovery is interpolation, which uses audio before and after the loss to interpolate a suitable packet to cover the loss. Interpolation performs somewhat better than packet repetition but is significantly more computationally intensive \[Perkins 1998\].

9.3.4 Case Study: VoIP with Skype

Skype is an immensely popular VoIP application with over 50 million accounts active on a daily basis. In addition to providing host-to-host VoIP service, Skype offers host-to-phone services, phone-to-host services, and multi-party host-to-host video conferencing services. (Here, a host is again any Internet-connected IP device, including PCs, tablets, and smartphones.) Skype was acquired by Microsoft in 2011. Because the Skype protocol is proprietary, and because all of Skype's control and media packets are encrypted, it is difficult to precisely determine how Skype operates. Nevertheless, from the Skype Web site and several measurement studies, researchers have learned how Skype generally works \[Baset 2006; Guha 2006; Chen 2006; Suh 2006; Ren 2006; Zhang X 2012\].

For both voice and video, the Skype clients have at their disposal many different codecs, which are capable of encoding the media at a wide range of rates and qualities. For example, video rates for Skype have been measured to be as low as 30 kbps for a low-quality session and up to almost 1 Mbps for a high-quality session \[Zhang X 2012\]. Typically, Skype's audio quality is better than the "POTS" (Plain Old Telephone Service) quality provided by the wire-line phone system. (Skype codecs typically sample voice at 16,000 samples/sec or higher, which provides richer tones than POTS, which samples at 8,000 samples/sec.) By default, Skype sends audio and video packets over UDP. However, control packets are sent over TCP, and media packets are also sent over TCP when firewalls block UDP streams. Skype uses FEC for loss recovery for both voice and video streams sent over UDP. The Skype client also adapts the audio and video streams it sends to current network conditions, by changing video quality and FEC overhead \[Zhang X 2012\].

Skype uses P2P techniques in a number of innovative ways, nicely illustrating how P2P can be used in applications that go beyond content distribution and file sharing. As with instant messaging, host-to-host Internet telephony is inherently P2P since, at the heart of the application, pairs of users (that is, peers) communicate with each other in real time. But Skype also employs P2P techniques for two other important functions, namely, user location and NAT traversal.

Figure 9.7 Skype peers

As shown in Figure 9.7, the peers (hosts) in Skype are organized into a hierarchical overlay network, with each peer classified as a super peer or an ordinary peer. Skype maintains an index that maps Skype usernames to current IP addresses (and port numbers). This index is distributed over the super peers. When Alice wants to call Bob, her Skype client searches the distributed index to determine Bob's current IP address. Because the Skype protocol is proprietary, it is currently not known how the index mappings are organized across the super peers, although some form of DHT organization is quite possible.

P2P techniques are also used in Skype relays, which are useful for establishing calls between hosts in home networks. Many home network configurations provide access to the Internet through NATs, as discussed in Chapter 4. Recall that a NAT prevents a host from outside the home network from initiating a connection to a host within the home network. If both Skype callers have NATs, then there is a problem---neither can accept a call initiated by the other, making a call seemingly impossible. The clever use of super peers and relays nicely solves this problem. Suppose that when Alice signs in, she is assigned to a non-NATed super peer and initiates a session to that super peer. (Since Alice is initiating the session, her NAT permits this session.) This session allows Alice and her super peer to exchange control messages. The same happens for Bob when he signs in. Now, when Alice wants to call Bob, she informs her super peer, who in turn informs Bob's super peer, who in turn informs Bob of Alice's incoming call. If Bob accepts the call, the two super peers select a third non-NATed super peer---the relay peer---whose job will be to relay data between Alice and Bob. Alice's and Bob's super peers then instruct Alice and Bob respectively to initiate a session with the relay.
As shown in Figure 9.7, Alice then sends voice packets to the relay over the Alice-to-relay connection (which was initiated by Alice), and the relay then forwards these packets over the relay-to-Bob connection (which was initiated by Bob); packets from Bob to Alice flow over these same two relay connections in reverse. And voila!---Bob and Alice have an end-to-end connection even though neither can accept a session originating from outside.

Up to now, our discussion of Skype has focused on calls involving two persons. Now let's examine multi-party audio conference calls. With N\>2 participants, if each user were to send a copy of its audio stream to each of the N−1 other users, then a total of N(N−1) audio streams would need to be sent into the network to support the audio conference. To reduce this bandwidth usage, Skype employs a clever distribution technique. Specifically, each user sends its audio stream to the conference initiator. The conference initiator combines the audio streams into one stream (basically by adding all the audio signals together) and then sends a copy of the combined stream to each of the other N−1 participants. In this manner, the number of streams is reduced to 2(N−1). For example, with N=4, the all-pairs approach requires 12 streams, while the initiator-based approach requires only 6.
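The initiator's combining step ("adding all the audio signals together") can be sketched as sample-wise addition with clipping. The code below assumes, purely for illustration, that each participant's stream delivers equal-length chunks of 16-bit PCM.

```python
import array

def mix_chunks(chunks):
    """Combine one 16-bit PCM chunk from each participant into a single
    chunk by sample-wise addition, clipping to the 16-bit range."""
    mixed = array.array('h', bytes(len(chunks[0])))  # start from silence
    for chunk in chunks:
        for i, s in enumerate(array.array('h', chunk)):
            mixed[i] = max(-32768, min(32767, mixed[i] + s))
    return mixed.tobytes()

# Initiator in an N-party call: mix the N incoming chunks and send one
# copy of the result to each of the other N-1 participants, for
# 2(N-1) streams in total instead of N(N-1).
```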
For ordinary two-person video conversations, Skype routes the call peer-to-peer, unless NAT traversal is required, in which case the call is relayed through a non-NATed peer, as described earlier. For a video conference call involving N\>2 participants, due to the nature of the video medium, Skype does not combine the call into one stream at one location and then redistribute the stream to all the participants, as it does for voice calls. Instead, each participant's video stream is routed to a server cluster (located in Estonia as of 2011), which in turn relays to each participant the N−1 streams of the N−1 other participants \[Zhang X 2012\]. You may be wondering why each participant sends a copy to a server rather than directly sending a copy of its video stream to each of the other N−1 participants. Indeed, for both approaches, N(N−1) video streams are collectively received by the N participants in the conference. The reason is that upstream link bandwidths are significantly lower than downstream link bandwidths in most access links, so the upstream links may not be able to support the N−1 streams with the P2P approach.

VoIP systems such as Skype, WeChat, and Google Talk introduce new privacy concerns. Specifically, when Alice and Bob communicate over VoIP, Alice can sniff Bob's IP address and then use geo-location services \[MaxMind 2016; Quova 2016\] to determine Bob's current location and ISP (for example, his work or home ISP). In fact, with Skype it is possible for Alice to block the transmission of certain packets during call establishment so that she obtains Bob's current IP address, say every hour, without Bob knowing that he is being tracked and without being on Bob's contact list. Furthermore, the IP address discovered from Skype can be correlated with IP addresses found in BitTorrent, so that Alice can determine the files that Bob is downloading \[LeBlond 2011\]. Moreover, it is possible to partially decrypt a Skype call by doing a traffic analysis of the packet sizes in a stream \[White 2011\].

9.4 Protocols for Real-Time Conversational Applications

Real-time conversational applications, including VoIP and video conferencing, are compelling and very popular. It is therefore not surprising that standards bodies, such as the IETF and ITU, have been busy for many years (and continue to be busy!) hammering out standards for this class of applications. With the appropriate standards in place for real-time conversational applications, independent companies are creating new products that interoperate with each other. In this section we examine RTP and SIP for real-time conversational applications. Both standards enjoy widespread implementation in industry products.

9.4.1 RTP

In the previous section, we learned that the sender side of a VoIP application appends header fields to the audio chunks before passing them to the transport layer. These header fields include sequence numbers and timestamps. Since most multimedia networking applications can make use of sequence numbers and timestamps, it is convenient to have a standardized packet structure that includes fields for audio/video data, sequence number, and timestamp, as well as other potentially useful fields. RTP, defined in RFC 3550, is such a standard. RTP can be used for transporting common formats such as PCM, AAC, and MP3 for sound and MPEG and H.263 for video. It can also be used for transporting proprietary sound and video formats. Today, RTP enjoys widespread implementation in many products and research prototypes. It is also complementary to other important real-time interactive protocols, such as SIP. In this section, we provide an introduction to RTP. We also encourage you to visit Henning Schulzrinne's RTP site \[Schulzrinne-RTP 2012\], which provides a wealth of information on the subject. Also, you may want to visit the RAT site \[RAT 2012\], which documents a VoIP application that uses RTP.

RTP Basics

RTP typically runs on top of UDP. The sending side encapsulates a media chunk within an RTP packet, encapsulates the packet in a UDP segment, and then hands the segment to IP. The receiving side extracts the RTP packet from the UDP segment, extracts the media chunk from the RTP packet, and passes the chunk to the media player for decoding and rendering. As an example, consider the use of RTP to transport voice. Suppose the voice source is PCM-encoded (that is, sampled, quantized, and digitized) at 64 kbps. Further suppose that the application collects the encoded data in 20-msec chunks, that is, 160 bytes in a chunk. The sending side precedes each chunk of the audio data with an RTP header that includes the type of audio encoding, a sequence number, and a timestamp. The RTP header is normally 12 bytes. The audio chunk along with the RTP header form the RTP packet. The RTP packet is then sent into the UDP socket interface. At the receiver side, the application receives the RTP packet from its socket interface. The application extracts the audio chunk from the RTP packet and uses the header fields of the RTP packet to properly decode and play back the audio chunk. If an application incorporates RTP---instead of a proprietary scheme to provide payload type, sequence numbers, or timestamps---then the application will more easily interoperate with other networked multimedia applications. For example, if two different companies develop VoIP software and they both incorporate RTP into their product, there may be some hope that a user using one of the VoIP products will be able to communicate with a user using the other VoIP product.
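To see how little machinery RTP adds, consider the following minimal sketch. It packs the 12-byte header described above (version 2, payload type 0 for PCM μ-law, a 16-bit sequence number, a 32-bit timestamp, and an SSRC) in front of each 160-byte chunk and sends the result in a UDP segment; the field layout follows RFC 3550, but the receiver address and the `pcm_chunks` audio source are placeholders invented for this example.

```python
import socket
import struct

def make_rtp_packet(payload_type, seq, timestamp, ssrc, chunk):
    """Prepend the 12-byte RTP header of RFC 3550 to a media chunk."""
    header = struct.pack('!BBHII',
                         0x80,                    # V=2, P=0, X=0, CC=0
                         payload_type & 0x7F,     # M=0 plus 7-bit payload type
                         seq & 0xFFFF,            # 16-bit sequence number
                         timestamp & 0xFFFFFFFF,  # 32-bit timestamp
                         ssrc)                    # 32-bit source identifier
    return header + chunk

def pcm_chunks():
    """Hypothetical audio source: 20-msec, 160-byte PCM chunks."""
    for _ in range(3):
        yield bytes(160)  # silence, standing in for real samples

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
seq, ts = 0, 0
for chunk in pcm_chunks():
    pkt = make_rtp_packet(0, seq, ts, 0x1234ABCD, chunk)  # PT 0 = PCM mu-law
    sock.sendto(pkt, ('127.0.0.1', 38060))  # placeholder receiver address
    seq += 1      # increments by one per packet
    ts += 160     # 160 samples per 20-msec chunk at 8,000 samples/sec
```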
In Section 9.4.2, we'll see that RTP is often used in conjunction with SIP, an important standard for Internet telephony. It should be emphasized that RTP does not provide any mechanism to ensure timely delivery of data or provide other quality-of-service (QoS) guarantees; it does not even guarantee delivery of packets or prevent out-of-order delivery of packets. Indeed, RTP encapsulation is seen only at the end systems. Routers do not distinguish between IP datagrams that carry RTP packets and IP datagrams that don't.

RTP allows each source (for example, a camera or a microphone) to be assigned its own independent RTP stream of packets. For example, for a video conference between two participants, four RTP streams could be opened---two streams for transmitting the audio (one in each direction) and two streams for transmitting the video (again, one in each direction). However, many popular encoding techniques---including MPEG 1 and MPEG 2---bundle the audio and video into a single stream during the encoding process. When the audio and video are bundled by the encoder, then only one RTP stream is generated in each direction. RTP packets are not limited to unicast applications. They can also be sent over one-to-many and many-to-many multicast trees. For a many-to-many multicast session, all of the session's senders and sources typically use the same multicast group for sending their RTP streams. RTP multicast streams belonging together, such as audio and video streams emanating from multiple senders in a video conference application, belong to an RTP session.

Figure 9.8 RTP header fields

RTP Packet Header Fields

As shown in Figure 9.8, the four main RTP packet header fields are the payload type, sequence number, timestamp, and source identifier fields. The payload type field in the RTP packet is 7 bits long. For an audio stream, the payload type field is used to indicate the type of audio encoding (for example, PCM, adaptive delta modulation, linear predictive encoding) that is being used. If a sender decides to change the encoding in the middle of a session, the sender can inform the receiver of the change through this payload type field. The sender may want to change the encoding in order to increase the audio quality or to decrease the RTP stream bit rate. Table 9.2 lists some of the audio payload types currently supported by RTP. For a video stream, the payload type is used to indicate the type of video encoding (for example, motion JPEG, MPEG 1, MPEG 2, H.261). Again, the sender can change video encoding on the fly during a session. Table 9.3 lists some of the video payload types currently supported by RTP. The other important fields are the following:

Sequence number field. The sequence number field is 16 bits long. The sequence number increments by one for each RTP packet sent, and may be used by the receiver to detect packet loss and to restore packet sequence. For example, if the receiver side of the application receives a stream of RTP packets with a gap between sequence numbers 86 and 89, then the receiver knows that packets 87 and 88 are missing. The receiver can then attempt to conceal the lost data.

Timestamp field. The timestamp field is 32 bits long. It reflects the sampling instant of the first byte in the RTP data packet. As we saw in the preceding section, the receiver can use timestamps to remove packet jitter introduced in the network and to provide synchronous playout at the receiver.
The timestamp is derived from a sampling clock at the sender. As an example, for audio the timestamp clock increments by one for each sampling period (for example, each 125 μsec for an 8 kHz sampling clock); if the audio application generates chunks consisting of 160 encoded samples, then the timestamp increases by 160 for each RTP packet when the source is active. The timestamp clock continues to increase at a constant rate even if the source is inactive.

Synchronization source identifier (SSRC). The SSRC field is 32 bits long. It identifies the source of the RTP stream. Typically, each stream in an RTP session has a distinct SSRC. The SSRC is not the IP address of the sender, but instead is a number that the source assigns randomly when the new stream is started. The probability that two streams get assigned the same SSRC is very small. Should this happen, the two sources pick a new SSRC value.

Table 9.2 Audio payload types supported by RTP

| Payload-Type Number | Audio Format | Sampling Rate | Rate |
|---|---|---|---|
| 0 | PCM μ-law | 8 kHz | 64 kbps |
| 1 | 1016 | 8 kHz | 4.8 kbps |
| 3 | GSM | 8 kHz | 13 kbps |
| 7 | LPC | 8 kHz | 2.4 kbps |
| 9 | G.722 | 16 kHz | 48--64 kbps |
| 14 | MPEG Audio | 90 kHz | --- |
| 15 | G.728 | 8 kHz | 16 kbps |

Table 9.3 Some video payload types supported by RTP

| Payload-Type Number | Video Format |
|---|---|
| 26 | Motion JPEG |
| 31 | H.261 |
| 32 | MPEG 1 video |
| 33 | MPEG 2 video |

9.4.2 SIP

The Session Initiation Protocol (SIP), defined in \[RFC 3261; RFC 5411\], is an open and lightweight protocol that does the following:

- It provides mechanisms for establishing calls between a caller and a callee over an IP network. It allows the caller to notify the callee that it wants to start a call. It allows the participants to agree on media encodings. It also allows participants to end calls.
- It provides mechanisms for the caller to determine the current IP address of the callee. Users do not have a single, fixed IP address because they may be assigned addresses dynamically (using DHCP) and because they may have multiple IP devices, each with a different IP address.
- It provides mechanisms for call management, such as adding new media streams during the call, changing the encoding during the call, inviting new participants during the call, call transfer, and call holding.

Setting Up a Call to a Known IP Address

To understand the essence of SIP, it is best to take a look at a concrete example. In this example, Alice is at her PC and she wants to call Bob, who is also working at his PC. Alice's and Bob's PCs are both equipped with SIP-based software for making and receiving phone calls. In this initial example, we'll assume that Alice knows the IP address of Bob's PC. Figure 9.9 illustrates the SIP call-establishment process.

Figure 9.9 SIP call establishment when Alice knows Bob's IP address

In Figure 9.9, we see that an SIP session begins when Alice sends Bob an INVITE message, which resembles an HTTP request message. This INVITE message is sent over UDP to the well-known port 5060 for SIP. (SIP messages can also be sent over TCP.) The INVITE message includes an identifier for Bob (bob@193.64.210.89), an indication of Alice's current IP address, an indication that Alice desires to receive audio, which is to be encoded in format AVP 0 (PCM-encoded μ-law) and encapsulated in RTP, and an indication that she wants to receive the RTP packets on port 38060.
After receiving Alice's INVITE message, Bob sends an SIP response message, which resembles an HTTP response message. This response SIP message is also sent to the SIP port 5060. Bob's response includes a 200 OK as well as an indication of his IP address, his desired encoding and packetization for reception, and his port number to which the audio packets should be sent. Note that in this example Alice and Bob are going to use different audio-encoding mechanisms: Alice is asked to encode her audio with GSM whereas Bob is asked to encode his audio with PCM μ-law. After receiving Bob's response, Alice sends Bob an SIP acknowledgment message. After this SIP transaction, Bob and Alice can talk. (For visual convenience, Figure 9.9 shows Alice talking after Bob, but in truth they would normally talk at the same time.) Bob will encode and packetize the audio as requested and send the audio packets to port number 38060 at IP address 167.180.112.24. Alice will also encode and packetize the audio as requested and send the audio packets to port number 48753 at IP address 193.64.210.89.

From this simple example, we have learned a number of key characteristics of SIP. First, SIP is an out-of-band protocol: The SIP messages are sent and received in sockets that are different from those used for sending and receiving the media data. Second, the SIP messages themselves are ASCII-readable and resemble HTTP messages. Third, SIP requires all messages to be acknowledged, so it can run over UDP or TCP.

In this example, let's consider what would happen if Bob does not have a PCM μ-law codec for encoding audio. In this case, instead of responding with 200 OK, Bob would likely respond with a 606 Not Acceptable and list in the message all the codecs he can use. Alice would then choose one of the listed codecs and send another INVITE message, this time advertising the chosen codec. Bob could also simply reject the call by sending one of many possible rejection reply codes. (There are many such codes, including "busy," "gone," "payment required," and "forbidden.")

SIP Addresses

In the previous example, Bob's SIP address is sip:bob@193.64.210.89. However, we expect many---if not most---SIP addresses to resemble e-mail addresses. For example, Bob's address might be sip:bob@domain.com. When Alice's SIP device sends an INVITE message, the message would include this e-mail-like address; the SIP infrastructure would then route the message to the IP device that Bob is currently using (as we'll discuss below). Other possible forms for the SIP address could be Bob's legacy phone number or simply Bob's first/middle/last name (assuming it is unique). An interesting feature of SIP addresses is that they can be included in Web pages, just as people's e-mail addresses are included in Web pages with the mailto URL. For example, suppose Bob has a personal homepage, and he wants to provide a means for visitors to the homepage to call him. He could then simply include the URL sip:bob@domain.com. When the visitor clicks on the URL, the SIP application in the visitor's device is launched and an INVITE message is sent to Bob.

SIP Messages

In this short introduction to SIP, we'll not cover all SIP message types and headers. Instead, we'll take a brief look at the SIP INVITE message, along with a few common header lines.
Let us again suppose that Alice wants to initiate a VoIP call to Bob, and this time Alice knows only Bob's SIP address, bob@domain.com, and does not know the IP address of the device that Bob is currently using. Then her message might look something like this:

    INVITE sip:bob@domain.com SIP/2.0
    Via: SIP/2.0/UDP 167.180.112.24
    From: sip:alice@hereway.com
    To: sip:bob@domain.com
    Call-ID: a2e3a@pigeon.hereway.com
    Content-Type: application/sdp
    Content-Length: 885

    c=IN IP4 167.180.112.24
    m=audio 38060 RTP/AVP 0

The INVITE line includes the SIP version, as does an HTTP request message. Whenever an SIP message passes through an SIP device (including the device that originates the message), it attaches a Via header, which indicates the IP address of the device. (We'll see soon that the typical INVITE message passes through many SIP devices before reaching the callee's SIP application.) Similar to an e-mail message, the SIP message includes a From header line and a To header line. The message includes a Call-ID, which uniquely identifies the call (similar to the message-ID in e-mail). It includes a Content-Type header line, which defines the format used to describe the content contained in the SIP message. It also includes a Content-Length header line, which provides the length in bytes of the content in the message. Finally, after a carriage return and line feed, the message contains the content. In this case, the content provides information about Alice's IP address and how Alice wants to receive the audio.
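Because SIP messages are plain ASCII, a user agent assembles them as ordinary text. The sketch below builds a bare-bones INVITE along the lines of the example above; it is illustrative only (a real SIP client would include additional headers, such as CSeq and Contact, per RFC 3261), and the `make_invite` helper is our own invention.

```python
def make_invite(caller, callee, caller_ip, rtp_port):
    """Assemble a bare-bones SIP INVITE carrying an SDP body that asks
    for PCM mu-law (AVP 0) over RTP on the given port."""
    sdp = (f"c=IN IP4 {caller_ip}\r\n"
           f"m=audio {rtp_port} RTP/AVP 0\r\n")
    return (f"INVITE sip:{callee} SIP/2.0\r\n"
            f"Via: SIP/2.0/UDP {caller_ip}\r\n"
            f"From: sip:{caller}\r\n"
            f"To: sip:{callee}\r\n"
            f"Call-ID: 1a2b3c@{caller_ip}\r\n"
            "Content-Type: application/sdp\r\n"
            f"Content-Length: {len(sdp)}\r\n"
            "\r\n"              # blank line separates headers from body
            + sdp).encode('ascii')

# Sent as ordinary application data, for example over UDP to port 5060:
# sock.sendto(make_invite('alice@hereway.com', 'bob@domain.com',
#                         '167.180.112.24', 38060), (proxy_ip, 5060))
```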
Name Translation and User Location

In the example in Figure 9.9, we assumed that Alice's SIP device knew the IP address where Bob could be contacted. But this assumption is quite unrealistic, not only because IP addresses are often dynamically assigned with DHCP, but also because Bob may have multiple IP devices (for example, different devices for his home, work, and car). So now let us suppose that Alice knows only Bob's e-mail address, bob@domain.com, and that this same address is used for SIP-based calls. In this case, Alice needs to obtain the IP address of the device that the user bob@domain.com is currently using. To find this out, Alice creates an INVITE message that begins with INVITE bob@domain.com SIP/2.0 and sends this message to an SIP proxy. The proxy will respond with an SIP reply that might include the IP address of the device that bob@domain.com is currently using. Alternatively, the reply might include the IP address of Bob's voicemail box, or it might include a URL of a Web page (that says "Bob is sleeping. Leave me alone!"). Also, the result returned by the proxy might depend on the caller: If the call is from Bob's wife, he might accept the call and supply his IP address; if the call is from Bob's mother-in-law, he might respond with the URL that points to the I-am-sleeping Web page!

Now, you are probably wondering, how can the proxy server determine the current IP address for bob@domain.com? To answer this question, we need to say a few words about another SIP device, the SIP registrar. Every SIP user has an associated registrar. Whenever a user launches an SIP application on a device, the application sends an SIP register message to the registrar, informing the registrar of its current IP address. For example, when Bob launches his SIP application on his PDA, the application would send a message along the lines of:

    REGISTER sip:domain.com SIP/2.0
    Via: SIP/2.0/UDP 193.64.210.89
    From: sip:bob@domain.com
    To: sip:bob@domain.com
    Expires: 3600

Bob's registrar keeps track of Bob's current IP address. Whenever Bob switches to a new SIP device, the new device sends a new register message, indicating the new IP address. Also, if Bob remains at the same device for an extended period of time, the device will send refresh register messages, indicating that the most recently sent IP address is still valid. (In the example above, refresh messages need to be sent every 3600 seconds to maintain the address at the registrar server.) It is worth noting that the registrar is analogous to a DNS authoritative name server: The DNS server translates fixed host names to fixed IP addresses; the SIP registrar translates fixed human identifiers (for example, bob@domain.com) to dynamic IP addresses. Often SIP registrars and SIP proxies are run on the same host.

Now let's examine how Alice's SIP proxy server obtains Bob's current IP address. From the preceding discussion we see that the proxy server simply needs to forward Alice's INVITE message to Bob's registrar/proxy. The registrar/proxy could then forward the message to Bob's current SIP device. Finally, Bob, having now received Alice's INVITE message, could send an SIP response to Alice. As an example, consider Figure 9.10, in which jim@umass.edu, currently working on 217.123.56.89, wants to initiate a Voice-over-IP (VoIP) session with keith@upenn.edu, currently working on 197.87.54.21. The following steps are taken:

Figure 9.10 Session initiation, involving SIP proxies and registrars

(1) Jim sends an INVITE message to the umass SIP proxy. (2) The proxy does a DNS lookup on the SIP registrar upenn.edu (not shown in the diagram) and then forwards the message to the registrar server. (3) Because keith@upenn.edu is no longer registered at the upenn registrar, the upenn registrar sends a redirect response, indicating that it should try keith@nyu.edu. (4) The umass proxy sends an INVITE message to the NYU SIP registrar. (5) The NYU registrar knows the IP address of keith@nyu.edu and forwards the INVITE message to the host 197.87.54.21, which is running Keith's SIP client. (6--8) An SIP response is sent back through registrars/proxies to the SIP client on 217.123.56.89. (9) Media is sent directly between the two clients. (There is also an SIP acknowledgment message, which is not shown.)

Our discussion of SIP has focused on call initiation for voice calls. SIP, being a signaling protocol for initiating and ending calls in general, can be used for video conference calls as well as for text-based sessions. In fact, SIP has become a fundamental component in many instant messaging applications. Readers desiring to learn more about SIP are encouraged to visit Henning Schulzrinne's SIP Web site \[Schulzrinne-SIP 2016\]. In particular, on this site you will find open source software for SIP clients and servers \[SIP Software 2016\].

9.5 Network Support for Multimedia

In Sections 9.2 through 9.4, we learned how application-level mechanisms such as client buffering, prefetching, adapting media quality to available bandwidth, adaptive playout, and loss mitigation techniques can be used by multimedia applications to improve a multimedia application's performance.
We also learned how content distribution networks and P2P overlay networks can be used to provide a system-level approach for delivering multimedia content. These techniques and approaches are all designed to be used in today's best-effort Internet. Indeed, they are in use today precisely because the Internet provides only a single, best-effort class of service. But as designers of computer networks, we can't help but ask whether the network (rather than the applications or application-level infrastructure alone) might provide mechanisms to support multimedia content delivery. As we'll see shortly, the answer is, of course, "yes"! But we'll also see that a number of these new network-level mechanisms have yet to be widely deployed. This may be due to their complexity and to the fact that application-level techniques together with best-effort service and properly dimensioned network resources (for example, bandwidth) can indeed provide a "good-enough" (even if not-always-perfect) end-to-end multimedia delivery service. Table 9.4 summarizes three broad approaches towards providing network-level support for multimedia applications.

Making the best of best-effort service. The application-level mechanisms and infrastructure that we studied in Sections 9.2 through 9.4 can be successfully used in a well-dimensioned network where packet loss and excessive end-to-end delay rarely occur. When demand increases are forecasted, the ISPs deploy additional bandwidth and switching capacity to continue to ensure satisfactory delay and packet-loss performance \[Huang 2005\]. We'll discuss such network dimensioning further in Section 9.5.1.

Differentiated service. Since the early days of the Internet, it's been envisioned that different types of traffic (for example, as indicated in the Type-of-Service field in the IPv4 packet header) could be provided with different classes of service, rather than a single one-size-fits-all best-effort service. With differentiated service, one type of traffic might be given strict priority over another class of traffic when both types of traffic are queued at a router. For example, packets belonging to a real-time conversational application might be given priority over other packets due to their stringent delay constraints. Introducing differentiated service into the network will require new mechanisms for packet marking (indicating a packet's class of service), packet scheduling, and more. We'll cover differentiated service, and the new network mechanisms needed to implement this service, in Sections 9.5.2 and 9.5.3.
Per-connection Quality-of-Service (QoS) Guarantees. With per-connection QoS guarantees, each instance of an application explicitly reserves end-to-end bandwidth and thus has a guaranteed end-to-end performance. A hard guarantee means the application will receive its requested quality of service (QoS) with certainty. A soft guarantee means the application will receive its requested quality of service with high probability. For example, if a user wants to make a VoIP call from Host A to Host B, the user's VoIP application reserves bandwidth explicitly on each link along a route between the two hosts. But permitting applications to make reservations and requiring the network to honor the reservations requires some big changes. First, we need a protocol that, on behalf of the applications, reserves link bandwidth on the paths from the senders to their receivers. Second, we'll need new scheduling policies in the router queues so that per-connection bandwidth reservations can be honored. Finally, in order to make a reservation, the applications must give the network a description of the traffic that they intend to send into the network, and the network will need to police each application's traffic to make sure that it abides by that description. These mechanisms, when combined, require new and complex software in hosts and routers. Because per-connection QoS-guaranteed service has not seen significant deployment, we'll cover these mechanisms only briefly in Section 9.5.4.

Table 9.4 Three network-level approaches to supporting multimedia applications

| Approach | Granularity | Guarantee | Mechanisms | Complexity | Deployment to date |
|---|---|---|---|---|---|
| Making the best of best-effort service | all traffic treated equally | none, or soft | application-layer support, CDNs, overlays, network-level resource provisioning | minimal | everywhere |
| Differentiated service | different classes of traffic treated differently | none, or soft | packet marking, policing, scheduling | medium | some |
| Per-connection Quality-of-Service (QoS) Guarantees | each source-destination flow treated differently | soft or hard, once flow is admitted | packet marking, policing, scheduling; call admission and signaling | high | little |

9.5.1 Dimensioning Best-Effort Networks

Fundamentally, the difficulty in supporting multimedia applications arises from their stringent performance requirements---low end-to-end packet delay, delay jitter, and loss---and the fact that packet delay, delay jitter, and loss occur whenever the network becomes congested. A first approach to improving the quality of multimedia applications---an approach that can often be used to solve just about any problem where resources are constrained---is simply to "throw money at the problem" and thus simply avoid resource contention. In the case of networked multimedia, this means providing enough link capacity throughout the network so that network congestion, and its consequent packet delay and loss, never (or only very rarely) occurs. With enough link capacity, packets could zip through today's Internet without queuing delay or loss. From many perspectives this is an ideal situation---multimedia applications would perform perfectly, users would be happy, and this could all be achieved with no changes to the Internet's best-effort architecture. The question, of course, is how much capacity is "enough" to achieve this nirvana, and whether the costs of providing "enough" bandwidth are practical from a business standpoint to the ISPs. The question of how much capacity to provide at network links in a given topology to achieve a given level of performance is often known as bandwidth provisioning. The even more complicated problem of how to design a network topology (where to place routers, how to interconnect routers with links, and what capacity to assign to links) to achieve a given level of end-to-end performance is a network design problem often referred to as network dimensioning. Both bandwidth provisioning and network dimensioning are complex topics, well beyond the scope of this textbook.
We note here, however, that the following issues must be addressed in order to predict application-level performance between two network end points, and thus provision enough capacity to meet an application's performance requirements.

- Models of traffic demand between network end points. Models may need to be specified at both the call level (for example, users "arriving" to the network and starting up end-to-end applications) and at the packet level (for example, packets being generated by ongoing applications). Note that workload may change over time.
- Well-defined performance requirements. For example, a performance requirement for supporting delay-sensitive traffic, such as a conversational multimedia application, might be that the probability that the end-to-end delay of a packet exceeds a maximum tolerable delay be less than some small value \[Fraleigh 2003\].
- Models to predict end-to-end performance for a given workload model, and techniques to find a minimal-cost bandwidth allocation that will result in all user requirements being met. Here, researchers are busy developing performance models that can quantify performance for a given workload, and optimization techniques to find minimal-cost bandwidth allocations meeting performance requirements.

Given that today's best-effort Internet could (from a technology standpoint) support multimedia traffic at an appropriate performance level if it were dimensioned to do so, the natural question is why today's Internet doesn't do so. The answers are primarily economic and organizational. From an economic standpoint, would users be willing to pay their ISPs enough for the ISPs to install sufficient bandwidth to support multimedia applications over a best-effort Internet? The organizational issues are perhaps even more daunting. Note that an end-to-end path between two multimedia end points will pass through the networks of multiple ISPs. From an organizational standpoint, would these ISPs be willing to cooperate (perhaps with revenue sharing) to ensure that the end-to-end path is properly dimensioned to support multimedia applications? For a perspective on these economic and organizational issues, see \[Davies 2005\]. For a perspective on provisioning tier-1 backbone networks to support delay-sensitive traffic, see \[Fraleigh 2003\].

9.5.2 Providing Multiple Classes of Service

Perhaps the simplest enhancement to the one-size-fits-all best-effort service in today's Internet is to divide traffic into classes, and provide different levels of service to these different classes of traffic. For example, an ISP might well want to provide a higher class of service to delay-sensitive Voice-over-IP or teleconferencing traffic (and charge more for this service!) than to elastic traffic such as e-mail or HTTP. Alternatively, an ISP may simply want to provide a higher quality of service to customers willing to pay more for this improved service. A number of residential wired-access ISPs and cellular wireless-access ISPs have adopted such tiered levels of service---with platinum-service subscribers receiving better performance than gold- or silver-service subscribers.
We're all familiar with different classes of service from our everyday lives---first-class airline passengers get better service than business-class passengers, who in turn get better service than those of us who fly economy class; VIPs are provided immediate entry to events while everyone else waits in line; elders are revered in some countries and provided seats of honor and the finest food at a table. It's important to note that such differential service is provided among aggregates of traffic, that is, among classes of traffic, not among individual connections. For example, all first-class passengers are handled the same (with no first-class passenger receiving any better treatment than any other first-class passenger), just as all VoIP packets would receive the same treatment within the network, independent of the particular end-to-end connection to which they belong. As we will see, by dealing with a small number of traffic aggregates, rather than a large number of individual connections, the new network mechanisms required to provide better-than-best-effort service can be kept relatively simple.

The early Internet designers clearly had this notion of multiple classes of service in mind. Recall the type-of-service (ToS) field in the IPv4 header discussed in Chapter 4. IEN123 \[ISI 1979\] describes the ToS field, also present in an ancestor of the IPv4 datagram, as follows: "The Type of Service \[field\] provides an indication of the abstract parameters of the quality of service desired. These parameters are to be used to guide the selection of the actual service parameters when transmitting a datagram through a particular network. Several networks offer service precedence, which somehow treats high precedence traffic as more important than other traffic." More than four decades ago, the vision of providing different levels of service to different classes of traffic was clear! However, it's taken us an equally long period of time to realize this vision.

Motivating Scenarios

Let's begin our discussion of network mechanisms for providing multiple classes of service with a few motivating scenarios. Figure 9.11 shows a simple network scenario in which two application packet flows originate on Hosts H1 and H2 on one LAN and are destined for Hosts H3 and H4 on another LAN. The routers on the two LANs are connected by a 1.5 Mbps link. Let's assume the LAN speeds are significantly higher than 1.5 Mbps, and focus on the output queue of router R1; it is here that packet delay and packet loss will occur if the aggregate sending rate of H1 and H2 exceeds 1.5 Mbps. Let's further suppose that a 1 Mbps audio application (for example, a CD-quality audio call) shares the 1.5 Mbps link between R1 and R2 with an HTTP Web-browsing application that is downloading a Web page from H2 to H4.

Figure 9.11 Competing audio and HTTP applications

In the best-effort Internet, the audio and HTTP packets are mixed in the output queue at R1 and (typically) transmitted in a first-in-first-out (FIFO) order. In this scenario, a burst of packets from the Web server could potentially fill up the queue, causing IP audio packets to be excessively delayed or lost due to buffer overflow at R1. How should we solve this potential problem? Given that the HTTP Web-browsing application does not have time constraints, our intuition might be to give strict priority to audio packets at R1.
Under a strict priority scheduling discipline, an audio packet in the R1 output buffer would always be transmitted before any HTTP packet in the R1 output buffer. The link from R1 to R2 would look like a dedicated link of 1.5 Mbps to the audio traffic, with HTTP traffic using the R1-to-R2 link only when no audio traffic is queued. In order for R1 to distinguish between the audio and HTTP packets in its queue, each packet must be marked as belonging to one of these two classes of traffic. This was the original goal of the type-of-service (ToS) field in IPv4. As obvious as this might seem, this then is our first insight into mechanisms needed to provide multiple classes of traffic:

Insight 1: Packet marking allows a router to distinguish among packets belonging to different classes of traffic.

Note that although our example considers a competing multimedia and elastic flow, the same insight applies to the case that platinum, gold, and silver classes of service are implemented---a packet-marking mechanism is still needed to indicate the class of service to which a packet belongs.

Now suppose that the router is configured to give priority to packets marked as belonging to the 1 Mbps audio application. Since the outgoing link speed is 1.5 Mbps, even though the HTTP packets receive lower priority, they can still, on average, receive 0.5 Mbps of transmission service. But what happens if the audio application starts sending packets at a rate of 1.5 Mbps or higher (either maliciously or due to an error in the application)? In this case, the HTTP packets will starve, that is, they will not receive any service on the R1-to-R2 link. Similar problems would occur if multiple applications (for example, multiple audio calls), all with the same class of service as the audio application, were sharing the link's bandwidth; they too could collectively starve the HTTP session. Ideally, one wants a degree of isolation among classes of traffic so that one class of traffic can be protected from the other. This protection could be implemented at different places in the network---at each and every router, at first entry to the network, or at inter-domain network boundaries. This then is our second insight:

Insight 2: It is desirable to provide a degree of traffic isolation among classes so that one class is not adversely affected by another class of traffic that misbehaves.

We'll examine several specific mechanisms for providing such isolation among traffic classes. We note here that two broad approaches can be taken. First, it is possible to perform traffic policing, as shown in Figure 9.12. If a traffic class or flow must meet certain criteria (for example, that the audio flow not exceed a peak rate of 1 Mbps), then a policing mechanism can be put into place to ensure that these criteria are indeed observed. If the policed application misbehaves, the policing mechanism will take some action (for example, drop or delay packets that are in violation of the criteria) so that the traffic actually entering the network conforms to the criteria. The leaky bucket mechanism that we'll examine shortly is perhaps the most widely used policing mechanism. In Figure 9.12, the packet classification and marking mechanism (Insight 1) and the policing mechanism (Insight 2) are both implemented together at the network's edge, either in the end system or at an edge router.
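Insights 1 and 2 can be made concrete in a few lines of code. The sketch below implements the strict priority discipline described earlier for two marked classes; the class names and packet representation are assumptions for illustration, and the dequeue loop shows exactly why, without policing, a misbehaving high-priority class starves the other.

```python
from collections import deque

class StrictPriorityLink:
    """Strict priority between two marked classes: audio always wins."""
    def __init__(self):
        self.queues = {'audio': deque(), 'http': deque()}

    def enqueue(self, packet, mark):
        # 'mark' plays the role of the ToS/DS field set at the edge.
        self.queues[mark].append(packet)

    def next_to_transmit(self):
        # HTTP is served only when no audio packet is waiting; if the
        # audio class exceeds the link rate, HTTP starves (Insight 2).
        for mark in ('audio', 'http'):
            if self.queues[mark]:
                return self.queues[mark].popleft()
        return None
```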
A complementary approach for providing isolation among traffic classes is for the link-level packet-scheduling mechanism to explicitly allocate a fixed amount of link bandwidth to each class. For example, the audio class could be allocated 1 Mbps at R1, and the HTTP class could be allocated 0.5 Mbps. In this case, the audio and HTTP flows see a logical link with capacity 1.0 and 0.5 Mbps, respectively, as shown in Figure 9.13.

Figure 9.12 Policing (and marking) the audio and HTTP traffic classes

Figure 9.13 Logical isolation of audio and HTTP traffic classes

With strict enforcement of the link-level allocation of bandwidth, a class can use only the amount of bandwidth that has been allocated; in particular, it cannot utilize bandwidth that is not currently being used by others. For example, if the audio flow goes silent (for example, if the speaker pauses and generates no audio packets), the HTTP flow would still not be able to transmit more than 0.5 Mbps over the R1-to-R2 link, even though the audio flow's 1 Mbps bandwidth allocation is not being used at that moment. Since bandwidth is a "use-it-or-lose-it" resource, there is no reason to prevent HTTP traffic from using bandwidth not used by the audio traffic. We'd like to use bandwidth as efficiently as possible, never wasting it when it could be otherwise used. This gives rise to our third insight:

Insight 3: While providing isolation among classes or flows, it is desirable to use resources (for example, link bandwidth and buffers) as efficiently as possible.

Recall from our discussion in Sections 1.3 and 4.2 that packets belonging to various network flows are multiplexed and queued for transmission at the output buffers associated with a link. The manner in which queued packets are selected for transmission on the link is known as the link-scheduling discipline, and was discussed in detail in Section 4.2. Recall that in Section 4.2 three link-scheduling disciplines were discussed, namely, FIFO, priority queuing, and Weighted Fair Queuing (WFQ). We'll soon see that WFQ plays a particularly important role in isolating traffic classes.

The Leaky Bucket

One of our earlier insights was that policing, the regulation of the rate at which a class or flow (we will assume the unit of policing is a flow in our discussion below) is allowed to inject packets into the network, is an important QoS mechanism. But what aspects of a flow's packet rate should be policed? We can identify three important policing criteria, each differing from the other according to the time scale over which the packet flow is policed:

- Average rate. The network may wish to limit the long-term average rate (packets per time interval) at which a flow's packets can be sent into the network. A crucial issue here is the interval of time over which the average rate will be policed. A flow whose average rate is limited to 100 packets per second is more constrained than a source that is limited to 6,000 packets per minute, even though both have the same average rate over a long enough interval of time. For example, the latter constraint would allow a flow to send 1,000 packets in a given second-long interval of time, while the former constraint would disallow this sending behavior.
- Peak rate. While the average-rate constraint limits the amount of traffic that can be sent into the network over a relatively long period of time, a peak-rate constraint limits the maximum number of packets that can be sent over a shorter period of time. Using our example above, the network may police a flow at an average rate of 6,000 packets per minute, while limiting the flow's peak rate to 1,500 packets per second.
- Burst size. The network may also wish to limit the maximum number of packets (the "burst" of packets) that can be sent into the network over an extremely short interval of time. In the limit, as the interval length approaches zero, the burst size limits the number of packets that can be instantaneously sent into the network. Even though it is physically impossible to instantaneously send multiple packets into the network (after all, every link has a physical transmission rate that cannot be exceeded!), the abstraction of a maximum burst size is a useful one.

The leaky bucket mechanism is an abstraction that can be used to characterize these policing limits. As shown in Figure 9.14, a leaky bucket consists of a bucket that can hold up to $b$ tokens. Tokens are added to this bucket as follows. New tokens, which may potentially be added to the bucket, are always being generated at a rate of $r$ tokens per second. (We assume here for simplicity that the unit of time is a second.) If the bucket is filled with fewer than $b$ tokens when a token is generated, the newly generated token is added to the bucket; otherwise the newly generated token is ignored, and the token bucket remains full with $b$ tokens.

Figure 9.14 The leaky bucket policer

Let us now consider how the leaky bucket can be used to police a packet flow. Suppose that before a packet is transmitted into the network, it must first remove a token from the token bucket. If the token bucket is empty, the packet must wait for a token. (An alternative is for the packet to be dropped, although we will not consider that option here.) Let us now consider how this behavior polices a traffic flow. Because there can be at most $b$ tokens in the bucket, the maximum burst size for a leaky-bucket-policed flow is $b$ packets. Furthermore, because the token generation rate is $r$, the maximum number of packets that can enter the network in any interval of time of length $t$ is $rt + b$. Thus, the token-generation rate, $r$, serves to limit the long-term average rate at which packets can enter the network. It is also possible to use leaky buckets (specifically, two leaky buckets in series) to police a flow's peak rate in addition to the long-term average rate; see the homework problems at the end of this chapter.
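A token-bucket policer of the kind just described is simple to implement. Below is a minimal sketch, using wall-clock time to accrue tokens; admitting a packet costs one token, so over any interval of length $t$ at most $rt + b$ packets get through. The class and method names are our own.

```python
import time

class LeakyBucketPolicer:
    """Token bucket with rate r tokens/sec and depth b tokens."""
    def __init__(self, r, b):
        self.r, self.b = r, b
        self.tokens = b                  # the bucket starts full
        self.last = time.monotonic()

    def admit(self):
        """True if a packet may enter the network now (one token each)."""
        now = time.monotonic()
        # Tokens accrue at rate r but never beyond the bucket depth b.
        self.tokens = min(self.b, self.tokens + self.r * (now - self.last))
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                     # caller delays or drops the packet
```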
Leaky Bucket + Weighted Fair Queuing = Provable Maximum Delay in a Queue

Let's close our discussion on policing by showing how the leaky bucket and WFQ can be combined to provide a bound on the delay through a router's queue. (Readers who have forgotten about WFQ are encouraged to review WFQ, which is covered in Section 4.2.) Let's consider a router's output link that multiplexes $n$ flows, each policed by a leaky bucket with parameters $b_i$ and $r_i$, $i = 1, \ldots, n$, using WFQ scheduling. We use the term flow here loosely to refer to the set of packets that are not distinguished from each other by the scheduler. In practice, a flow might be comprised of traffic from a single end-to-end connection or a collection of many such connections; see Figure 9.15.

Figure 9.15 n multiplexed leaky bucket flows with WFQ scheduling

Recall from our discussion of WFQ that each flow, $i$, is guaranteed to receive a share of the link bandwidth equal to at least $R \cdot w_i / (\sum_j w_j)$, where $R$ is the transmission rate of the link in packets/sec. What then is the maximum delay that a packet will experience while waiting for service in the WFQ (that is, after passing through the leaky bucket)? Let us focus on flow 1. Suppose that flow 1's token bucket is initially full. A burst of $b_1$ packets then arrives to the leaky bucket policer for flow 1. These packets remove all of the tokens (without wait) from the leaky bucket and then join the WFQ waiting area for flow 1. Since these $b_1$ packets are served at a rate of at least $R \cdot w_1 / (\sum_j w_j)$ packets/sec, the last of these packets will then have a maximum delay, $d_{\max}$, until its transmission is completed, where

$$d_{\max} = \frac{b_1}{R \cdot w_1 / \sum_j w_j}$$

The rationale behind this formula is that if there are $b_1$ packets in the queue and packets are being serviced (removed) from the queue at a rate of at least $R \cdot w_1 / (\sum_j w_j)$ packets per second, then the amount of time until the last bit of the last packet is transmitted cannot be more than $b_1 / (R \cdot w_1 / (\sum_j w_j))$. As a quick numeric check, if $R = 1{,}000$ packets/sec, $b_1 = 50$ packets, and two flows have equal weights $w_1 = w_2$, then flow 1 is guaranteed a service rate of at least 500 packets/sec, and so $d_{\max} = 50/500 = 0.1$ sec. A homework problem asks you to prove that as long as $r_1 < R \cdot w_1 / (\sum_j w_j)$, then $d_{\max}$ is indeed the maximum delay that any packet in flow 1 will ever experience in the WFQ queue.

9.5.3 Diffserv

Having seen the motivation, insights, and specific mechanisms for providing multiple classes of service, let's wrap up our study with an example---the Internet Diffserv architecture \[RFC 2475; Kilkki 1999\]. Diffserv provides service differentiation---that is, the ability to handle different classes of traffic in different ways within the Internet in a scalable manner. The need for scalability arises from the fact that millions of simultaneous source-destination traffic flows may be present at a backbone router. We'll see shortly that this need is met by placing only simple functionality within the network core, with more complex control operations being implemented at the network's edge.

Let's begin with the simple network shown in Figure 9.16. We'll describe one possible use of Diffserv here; other variations are possible, as described in RFC 2475. The Diffserv architecture consists of two sets of functional elements:

- Edge functions: Packet classification and traffic conditioning. At the incoming edge of the network (that is, at either a Diffserv-capable host that generates traffic or at the first Diffserv-capable router that the traffic passes through), arriving packets are marked. More specifically, the differentiated service (DS) field in the IPv4 or IPv6 packet header is set to some value \[RFC 3260\]. The definition of the DS field is intended to supersede the earlier definitions of the IPv4 type-of-service field and the IPv6 traffic class field that we discussed in Chapter 4. For example, in Figure 9.16, packets being sent from H1 to H3 might be marked at R1, while packets being sent from H2 to H4 might be marked at R2. The mark that a packet receives identifies the class of traffic to which it belongs. Different classes of traffic will then receive different service within the core network.

Figure 9.16 A simple Diffserv network example
When a DS-marked packet arrives at a Diffserv-capable router, the packet is forwarded onto its next hop according to the so-called per-hop behavior (PHB) associated with that packet's class. The per-hop behavior influences how a router's buffers and link bandwidth are shared among the competing classes of traffic. A crucial tenet of the Diffserv architecture is that a router's per-hop behavior will be based only on packet markings, that is, the class of traffic to which a packet belongs. Thus, if packets being sent from H1 to H3 in Figure 9.16 receive the same marking as packets being sent from H2 to H4, then the network routers treat these packets as an aggregate, without distinguishing whether the packets originated at H1 or H2. For example, R3 would not distinguish between packets from H1 and H2 when forwarding these packets on to R4. Thus, the Diffserv architecture obviates the need to keep router state for individual source-destination pairs---a critical consideration in making Diffserv scalable.

An analogy might prove useful here. At many large-scale social events (for example, a large public reception, a large dance club or discothèque, a concert, or a football game), people entering the event receive a pass of one type or another: VIP passes for Very Important People; over-21 passes for people who are 21 years old or older (for example, if alcoholic drinks are to be served); backstage passes at concerts; press passes for reporters; even an ordinary pass for the Ordinary Person. These passes are typically distributed upon entry to the event, that is, at the edge of the event. It is here at the edge where computationally intensive operations, such as paying for entry, checking for the appropriate type of invitation, and matching an invitation against a piece of identification, are performed. Furthermore, there may be a limit on the number of people of a given type that are allowed into an event. If there is such a limit, people may have to wait before entering the event. Once inside the event, one's pass allows one to receive differentiated service at many locations around the event---a VIP is provided with free drinks, a better table, free food, entry to exclusive rooms, and fawning service. Conversely, an ordinary person is excluded from certain areas, pays for drinks, and receives only basic service. In both cases, the service received within the event depends solely on the type of one's pass. Moreover, all people within a class are treated alike.

Figure 9.17 provides a logical view of the classification and marking functions within the edge router. Packets arriving at the edge router are first classified. The classifier selects packets based on the values of one or more packet header fields (for example, source address, destination address, source port, destination port, and protocol ID) and steers the packet to the appropriate marking function. As noted above, a packet's marking is carried in the DS field in the packet header. In some cases, an end user may have agreed to limit its packet-sending rate to conform to a declared traffic profile. The traffic profile might contain a limit on the peak rate, as well as the burstiness of the packet flow, as we saw previously with the leaky bucket mechanism.
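The classify-then-mark pipeline of Figure 9.17 is easy to sketch in code. Everything below is our own illustration: the rule table, class names, and port number are made up, the metering function reuses the LeakyBucketPolicer sketched in the policing discussion above, and only the EF code point value (46, from RFC 3246) comes from the standards.

```python
EF, BEST_EFFORT = 46, 0    # DSCP values; 46 is EF (RFC 3246), 0 the default PHB

# First-match classifier rules over packet header fields (5-tuple subset).
rules = [
    ({"proto": "udp", "dport": 5004}, "voice"),       # illustrative rule only
]
profiles = {"voice": LeakyBucketPolicer(r=50, b=10)}  # negotiated traffic profile
marks = {"voice": EF, "default": BEST_EFFORT}

def classify(pkt):
    """Steer a packet to a traffic class based on its header field values."""
    for rule, cls in rules:
        if all(pkt.get(field) == value for field, value in rule.items()):
            return cls
    return "default"

def condition(pkt, t):
    """Classify, meter against the class's traffic profile, then mark."""
    cls = classify(pkt)
    policer = profiles.get(cls)
    if policer is None or policer.conforms(t):
        pkt["dscp"] = marks[cls]     # in profile: the class's priority marking
    else:
        pkt["dscp"] = BEST_EFFORT    # out of profile: re-mark (an operator
                                     # could instead shape or drop the packet)
    return pkt

print(condition({"proto": "udp", "dport": 5004}, t=0.0)["dscp"])   # 46 (EF)
```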
As long as the user sends packets into the network in a way that conforms to the negotiated traffic profile, the packets receive their priority marking and are forwarded along their route to the destination. On the other hand, if the traffic profile is violated, out-of-profile packets might be marked differently, might be shaped (for example, delayed so that a maximum rate constraint would be observed), or might be dropped at the network edge.

Figure 9.17 Logical view of packet classification and traffic conditioning at the edge router

The role of the metering function, shown in Figure 9.17, is to compare the incoming packet flow with the negotiated traffic profile and to determine whether a packet is within the negotiated traffic profile. The actual decision about whether to immediately re-mark, forward, delay, or drop a packet is a policy issue determined by the network administrator and is not specified in the Diffserv architecture.

So far, we have focused on the marking and policing functions in the Diffserv architecture. The second key component of the Diffserv architecture involves the per-hop behavior (PHB) performed by Diffserv-capable routers. PHB is rather cryptically, but carefully, defined as "a description of the externally observable forwarding behavior of a Diffserv node applied to a particular Diffserv behavior aggregate" \[RFC 2475\]. Digging a little deeper into this definition, we can see several important considerations embedded within:

A PHB can result in different classes of traffic receiving different performance (that is, different externally observable forwarding behaviors).

While a PHB defines differences in performance (behavior) among classes, it does not mandate any particular mechanism for achieving these behaviors. As long as the externally observable performance criteria are met, any implementation mechanism and any buffer/bandwidth allocation policy can be used. For example, a PHB would not require that a particular packet-queuing discipline (for example, a priority queue versus a WFQ queue versus a FCFS queue) be used to achieve a particular behavior. The PHB is the end, to which resource allocation and implementation mechanisms are the means.

Differences in performance must be observable and hence measurable.

Two PHBs have been defined: an expedited forwarding (EF) PHB \[RFC 3246\] and an assured forwarding (AF) PHB \[RFC 2597\]. The expedited forwarding PHB specifies that the departure rate of a class of traffic from a router must equal or exceed a configured rate. The assured forwarding PHB divides traffic into four classes, where each AF class is guaranteed to be provided with some minimum amount of bandwidth and buffering.

Let's close our discussion of Diffserv with a few observations regarding its service model. First, we have implicitly assumed that Diffserv is deployed within a single administrative domain, but typically an end-to-end service must be fashioned from multiple ISPs sitting between communicating end systems. In order to provide end-to-end Diffserv service, all the ISPs between the end systems must not only provide this service, but must also cooperate and make settlements in order to offer end customers true end-to-end service. Without this kind of cooperation, ISPs directly selling Diffserv service to customers will find themselves repeatedly saying: "Yes, we know you paid extra, but we don't have a service agreement with the ISP that dropped and delayed your traffic. I'm sorry that there were so many gaps in your VoIP call!"
Second, if Diffserv were actually in place and the network ran at only moderate load, most of the time there would be no perceived difference between a best-effort service and a Diffserv service. Indeed, end-to-end delay is usually dominated by access rates and router hops rather than by queuing delays in the routers. Imagine the unhappy Diffserv customer who has paid more for premium service but finds that the best-effort service being provided to others almost always has the same performance as premium service!

9.5.4 Per-Connection Quality-of-Service (QoS) Guarantees: Resource Reservation and Call Admission

In the previous section, we have seen that packet marking and policing, traffic isolation, and link-level scheduling can provide one class of service with better performance than another. Under certain scheduling disciplines, such as priority scheduling, the lower classes of traffic are essentially "invisible" to the highest-priority class of traffic. With proper network dimensioning, the highest class of service can indeed achieve extremely low packet loss and delay---essentially circuit-like performance. But can the network guarantee that an ongoing flow in a high-priority traffic class will continue to receive such service throughout the flow's duration using only the mechanisms that we have described so far? It cannot. In this section, we'll see why yet additional network mechanisms and protocols are required when a hard service guarantee is provided to individual connections.

Let's return to our scenario from Section 9.5.2 and consider two 1 Mbps audio applications transmitting their packets over the 1.5 Mbps link, as shown in Figure 9.18. The combined data rate of the two flows (2 Mbps) exceeds the link capacity. Even with classification and marking, isolation of flows, and sharing of unused bandwidth (of which there is none), this is clearly a losing proposition. There is simply not enough bandwidth to accommodate the needs of both applications at the same time. If the two applications equally share the bandwidth, each application would lose 25 percent of its transmitted packets. This is such an unacceptably low QoS that both audio applications are completely unusable; there's no need even to transmit any audio packets in the first place.

Figure 9.18 Two competing audio applications overloading the R1-to-R2 link

Given that the two applications in Figure 9.18 cannot both be satisfied simultaneously, what should the network do? Allowing both to proceed with an unusable QoS wastes network resources on application flows that ultimately provide no utility to the end user. The answer is hopefully clear---one of the application flows should be blocked (that is, denied access to the network), while the other should be allowed to proceed on, using the full 1 Mbps needed by the application. The telephone network is an example of a network that performs such call blocking---if the required resources (an end-to-end circuit in the case of the telephone network) cannot be allocated to the call, the call is blocked (prevented from entering the network) and a busy signal is returned to the user. In our example, there is no gain in allowing a flow into the network if it will not receive a sufficient QoS to be considered usable. Indeed, there is a cost to admitting a flow that does not receive its needed QoS, as network resources are being used to support a flow that provides no utility to the end user.
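The admission decision in this example is simple enough to make concrete. The sketch below is our own illustration with hypothetical names: a controller for the 1.5 Mbps R1-to-R2 link admits a call only if the requested rate still fits, so the first 1 Mbps audio flow is admitted and the second is blocked.

```python
class AdmissionController:
    """Per-link call admission: admit a call only if its requested rate
    fits within the capacity not yet reserved by already-admitted calls."""

    def __init__(self, capacity_mbps):
        self.capacity = capacity_mbps
        self.reserved = 0.0

    def request(self, rate_mbps):
        if self.reserved + rate_mbps <= self.capacity:
            self.reserved += rate_mbps   # reserve bandwidth for this call
            return True                  # call admitted
        return False                     # call blocked (the "busy signal")


link = AdmissionController(capacity_mbps=1.5)
print(link.request(1.0))   # True:  first 1 Mbps audio call admitted
print(link.request(1.0))   # False: second call blocked; only 0.5 Mbps remains
```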
By explicitly admitting or blocking flows based on their resource requirements, and the resource requirements of already-admitted flows, the network can guarantee that admitted flows will be able to receive their requested QoS. Implicit in the need to provide a guaranteed QoS to a flow is the need for the flow to declare its QoS requirements. This process of having a flow declare its QoS requirement, and then having the network either accept the flow (at the required QoS) or block the flow is referred to as the call admission process. This then is our fourth insight (in addition to the three earlier insights from Section 9.5.2) into the mechanisms needed to provide QoS.

Insight 4: If sufficient resources will not always be available, and QoS is to be guaranteed, a call admission process is needed in which flows declare their QoS requirements and are then either admitted to the network (at the required QoS) or blocked from the network (if the required QoS cannot be provided by the network).

Our motivating example in Figure 9.18 highlights the need for several new network mechanisms and protocols if a call (an end-to-end flow) is to be guaranteed a given quality of service once it begins:

Resource reservation. The only way to guarantee that a call will have the resources (link bandwidth, buffers) needed to meet its desired QoS is to explicitly allocate those resources to the call---a process known in networking parlance as resource reservation. Once resources are reserved, the call has on-demand access to these resources throughout its duration, regardless of the demands of all other calls. If a call reserves and receives a guarantee of x Mbps of link bandwidth, and never transmits at a rate greater than x, the call will see loss- and delay-free performance.

Call admission. If resources are to be reserved, then the network must have a mechanism for calls to request and reserve resources. Since resources are not infinite, a call making a call admission request will be denied admission, that is, be blocked, if the requested resources are not available. Such a call admission is performed by the telephone network---we request resources when we dial a number. If the circuits (TDMA slots) needed to complete the call are available, the circuits are allocated and the call is completed. If the circuits are not available, then the call is blocked, and we receive a busy signal. A blocked call can try again to gain admission to the network, but it is not allowed to send traffic into the network until it has successfully completed the call admission process. Of course, a router that allocates link bandwidth should not allocate more than is available at that link. Typically, a call may reserve only a fraction of the link's bandwidth, and so a router may allocate link bandwidth to more than one call. However, the sum of the allocated bandwidth to all calls should be less than the link capacity if hard quality of service guarantees are to be provided.

Call setup signaling. The call admission process described above requires that a call be able to reserve sufficient resources at each and every network router on its source-to-destination path to ensure that its end-to-end QoS requirement is met.
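A call must clear such an admission check at every router on its path, and coordinating those per-hop checks is the job of the signaling protocol discussed next. The sketch below, again our own illustration (RSVP-like in spirit only), reserves hop by hop using the AdmissionController above and releases any partial reservations if some router refuses.

```python
def setup_call(path, rate_mbps):
    """Try to reserve rate_mbps at every router on the path; if any hop
    refuses, tear down the reservations already made and report failure."""
    accepted = []
    for router in path:                  # per-hop admission decision
        if not router.request(rate_mbps):
            for r in accepted:           # release partial reservations
                r.reserved -= rate_mbps
            return False                 # call blocked end to end
        accepted.append(router)
    return True                          # resources reserved at every hop


path = [AdmissionController(1.5), AdmissionController(10.0)]
print(setup_call(path, 1.0))   # True:  1 Mbps reserved at both hops
print(setup_call(path, 1.0))   # False: first hop has only 0.5 Mbps unreserved
```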
Each router must determine the local resources required by the session, consider the amounts of its resources that are already committed to other ongoing sessions, and determine whether it has sufficient resources to satisfy the per-hop QoS requirement of the session at this router without violating local QoS guarantees made to an already-admitted session. A signaling protocol is needed to coordinate these various activities---the per-hop allocation of local resources, as well as the overall end-to-end decision of whether or not the call has been able to reserve sufficient resources at each and every router on the end-to-end path. This is the job of the call setup protocol, as shown in Figure 9.19. The RSVP protocol \[Zhang 1993, RFC 2210\] was proposed for this purpose within an Internet architecture for providing quality-of-service guarantees. In ATM networks, the Q.2931b protocol \[Black 1995\] carries this information among the ATM network's switches and end points.

Figure 9.19 The call setup process

Despite a tremendous amount of research and development, and even products that provide for per-connection quality of service guarantees, there has been almost no extended deployment of such services. There are many possible reasons. First and foremost, it may well be the case that the simple application-level mechanisms that we studied in Sections 9.2 through 9.4, combined with proper network dimensioning (Section 9.5.1), provide "good enough" best-effort network service for multimedia applications. In addition, the added complexity and cost of deploying and managing a network that provides per-connection quality of service guarantees may be judged by ISPs to be simply too high given predicted customer revenues for that service.

9.6 Summary

Multimedia networking is one of the most exciting developments in the Internet today. People throughout the world are spending less and less time in front of their televisions, and are instead using their smartphones and devices to receive audio and video transmissions, both live and prerecorded. Moreover, with sites like YouTube, users have become producers as well as consumers of multimedia Internet content. In addition to video distribution, the Internet is also being used to transport phone calls. In fact, over the next 10 years, the Internet, along with wireless Internet access, may make the traditional circuit-switched telephone system a thing of the past. VoIP not only provides phone service inexpensively, but also provides numerous value-added services, such as video conferencing, online directory services, voice messaging, and integration into social networks such as Facebook and WeChat.

In Section 9.1, we described the intrinsic characteristics of video and voice, and then classified multimedia applications into three categories: (i) streaming stored audio/video, (ii) conversational voice/video-over-IP, and (iii) streaming live audio/video.

In Section 9.2, we studied streaming stored video in some depth. For streaming video applications, prerecorded videos are placed on servers, and users send requests to these servers to view the videos on demand. We saw that streaming video systems can be classified into two categories: UDP streaming and HTTP streaming. We observed that the most important performance measure for streaming video is average throughput.

In Section 9.3, we examined how conversational multimedia applications, such as VoIP, can be designed to run over a best-effort network.
For conversational multimedia, timing considerations are important because conversational applications are highly delay-sensitive. On the other hand, conversational multimedia applications are loss-tolerant---occasional loss only causes occasional glitches in audio/video playback, and these losses can often be partially or fully concealed. We saw how a combination of client buffers, packet sequence numbers, and timestamps can greatly alleviate the effects of network-induced jitter. We also surveyed the technology behind Skype, one of the leading voice- and video-over-IP companies. In Section 9.4, we examined two of the most important standardized protocols for VoIP, namely, RTP and SIP. In Section 9.5, we showed how several network mechanisms (link-level scheduling disciplines and traffic policing) can be used to provide differentiated service among several classes of traffic.

Homework Problems and Questions

Chapter 9 Review Questions

SECTION 9.1

R1. Reconstruct Table 9.1 for when Victor Video is watching a 4 Mbps video, Facebook Frank is looking at a new 100 Kbyte image every 20 seconds, and Martha Music is listening to a 200 kbps audio stream.

R2. There are two types of redundancy in video. Describe them, and discuss how they can be exploited for efficient compression.

R3. Suppose an analog audio signal is sampled 16,000 times per second, and each sample is quantized into one of 1024 levels. What would be the resulting bit rate of the PCM digital audio signal?

R4. Multimedia applications can be classified into three categories. Name and describe each category.

SECTION 9.2

R5. Streaming video systems can be classified into three categories. Name and briefly describe each of these categories.

R6. List three disadvantages of UDP streaming.

R7. With HTTP streaming, are the TCP receive buffer and the client's application buffer the same thing? If not, how do they interact?

R8. Consider the simple model for HTTP streaming. Suppose the server sends bits at a constant rate of 2 Mbps and playback begins when 8 million bits have been received. What is the initial buffering delay tp?

SECTION 9.3

R9. What is the difference between end-to-end delay and packet jitter? What are the causes of packet jitter?

R10. Why is a packet that is received after its scheduled playout time considered lost?

R11. Section 9.3 describes two FEC schemes. Briefly summarize them. Both schemes increase the transmission rate of the stream by adding overhead. Does interleaving also increase the transmission rate?

SECTION 9.4

R12. How are different RTP streams in different sessions identified by a receiver? How are different streams from within the same session identified?

R13. What is the role of a SIP registrar? How is the role of a SIP registrar different from that of a home agent in Mobile IP?

Problems

P1. Consider the figure below. Similar to our discussion of Figure 9.1, suppose that video is encoded at a fixed bit rate, and thus each video block contains video frames that are to be played out over the same fixed amount of time, Δ. The server transmits the first video block at t0, the second block at t0+Δ, the third block at t0+2Δ, and so on. Once the client begins playout, each block should be played out Δ time units after the previous block.

a. Suppose that the client begins playout as soon as the first block arrives at t1.
In the figure below, how many blocks of video (including the first block) will have arrived at the client in time for their playout? Explain how you arrived at your answer.

b. Suppose that the client begins playout now at t1+Δ. How many blocks of video (including the first block) will have arrived at the client in time for their playout? Explain how you arrived at your answer.

c. In the same scenario as (b) above, what is the largest number of blocks that is ever stored in the client buffer, awaiting playout? Explain how you arrived at your answer.

d. What is the smallest playout delay at the client, such that every video block has arrived in time for its playout? Explain how you arrived at your answer.

P2. Recall the simple model for HTTP streaming shown in Figure 9.3. Recall that B denotes the size of the client's application buffer, and Q denotes the number of bits that must be buffered before the client application begins playout. Also r denotes the video consumption rate. Assume that the server sends bits at a constant rate x whenever the client buffer is not full.

a. Suppose that x\<r. As discussed in the text, in this case playout will alternate between periods of continuous playout and periods of freezing. Determine the length of each continuous playout and freezing period as a function of Q, r, and x.

b. Now suppose that x\>r. At what time t=tf does the client application buffer become full?

P3. Recall the simple model for HTTP streaming shown in Figure 9.3. Suppose the buffer size is infinite but the server sends bits at variable rate x(t). Specifically, suppose x(t) has the following saw-tooth shape. The rate is initially zero at time t=0 and linearly climbs to H at time t=T. It then repeats this pattern again and again, as shown in the figure below.

a. What is the server's average send rate?

b. Suppose that Q=0, so that the client starts playback as soon as it receives a video frame. What will happen?

c. Now suppose Q\>0 and HT/2≥Q. Determine as a function of Q, H, and T the time at which playback first begins.

d. Suppose H\>2r and Q=HT/2. Prove there will be no freezing after the initial playout delay.

e. Suppose H\>2r. Find the smallest value of Q such that there will be no freezing after the initial playback delay.

f. Now suppose that the buffer size B is finite. Suppose H\>2r. As a function of Q, B, T, and H, determine the time t=tf when the client application buffer first becomes full.

P4. Recall the simple model for HTTP streaming shown in Figure 9.3. Suppose the client application buffer is infinite, the server sends at the constant rate x, and the video consumption rate is r with r\<x. Also suppose playback begins immediately. Suppose that the user terminates the video early at time t=E. At the time of termination, the server stops sending bits (if it hasn't already sent all the bits in the video).

a. Suppose the video is infinitely long. How many bits are wasted (that is, sent but not viewed)?

b. Suppose the video is T seconds long with T\>E. How many bits are wasted (that is, sent but not viewed)?

P5. Consider a DASH system (as discussed in Section 2.6) for which there are N video versions (at N different rates and qualities) and N audio versions (at N different rates and qualities). Suppose we want to allow the player to choose at any time any of the N video versions and any of the N audio versions.
a. If we create files so that the audio is mixed in with the video, so that the server sends only one media stream at a given time, how many files will the server need to store (each with a different URL)?

b. If the server instead sends the audio and video streams separately and has the client synchronize the streams, how many files will the server need to store?

P6. In the VoIP example in Section 9.3, let h be the total number of header bytes added to each chunk, including UDP and IP header.

a. Assuming an IP datagram is emitted every 20 msecs, find the transmission rate in bits per second for the datagrams generated by one side of this application.

b. What is a typical value of h when RTP is used?

P7. Consider the procedure described in Section 9.3 for estimating average delay di. Suppose that u=0.1. Let r1−t1 be the most recent sample delay, let r2−t2 be the next most recent sample delay, and so on.

a. For a given audio application suppose four packets have arrived at the receiver with sample delays r4−t4, r3−t3, r2−t2, and r1−t1. Express the estimate of delay d in terms of the four samples.

b. Generalize your formula for n sample delays.

c. For the formula in part (b), let n approach infinity and give the resulting formula. Comment on why this averaging procedure is called an exponential moving average.

P8. Repeat parts (a) and (b) in Question P7 for the estimate of average delay deviation.

P9. For the VoIP example in Section 9.3, we introduced an online procedure (exponential moving average) for estimating delay. In this problem we will examine an alternative procedure. Let ti be the timestamp of the ith packet received; let ri be the time at which the ith packet is received. Let dn be our estimate of average delay after receiving the nth packet. After the first packet is received, we set the delay estimate equal to d1=r1−t1.

a. Suppose that we would like dn=(r1−t1+r2−t2+⋯+rn−tn)/n for all n. Give a recursive formula for dn in terms of dn−1, rn, and tn.

b. Describe why for Internet telephony, the delay estimate described in Section 9.3 is more appropriate than the delay estimate outlined in part (a).

P10. Compare the procedure described in Section 9.3 for estimating average delay with the procedure in Section 3.5 for estimating round-trip time. What do the procedures have in common? How are they different?

P11. Consider the figure below (which is similar to Figure 9.3). A sender begins sending packetized audio periodically at t=1. The first packet arrives at the receiver at t=8.

a. What are the delays (from sender to receiver, ignoring any playout delays) of packets 2 through 8? Note that each vertical and horizontal line segment in the figure has a length of 1, 2, or 3 time units.

b. If audio playout begins as soon as the first packet arrives at the receiver at t=8, which of the first eight packets sent will not arrive in time for playout?

c. If audio playout begins at t=9, which of the first eight packets sent will not arrive in time for playout?

d. What is the minimum playout delay at the receiver that results in all of the first eight packets arriving in time for their playout?

P12. Consider again the figure in P11, showing packet audio transmission and reception times.

a. Compute the estimated delay for packets 2 through 8, using the formula for di from Section 9.3.2. Use a value of u=0.1.
b. Compute the estimated deviation of the delay from the estimated average for packets 2 through 8, using the formula for vi from Section 9.3.2. Use a value of u=0.1.

P13. Recall the two FEC schemes for VoIP described in Section 9.3. Suppose the first scheme generates a redundant chunk for every four original chunks. Suppose the second scheme uses a low-bit rate encoding whose transmission rate is 25 percent of the transmission rate of the nominal stream.

a. How much additional bandwidth does each scheme require? How much playback delay does each scheme add?

b. How do the two schemes perform if the first packet is lost in every group of five packets? Which scheme will have better audio quality?

c. How do the two schemes perform if the first packet is lost in every group of two packets? Which scheme will have better audio quality?

P14.

a. Consider an audio conference call in Skype with N\>2 participants. Suppose each participant generates a constant stream of rate r bps. How many bits per second will the call initiator need to send? How many bits per second will each of the other N−1 participants need to send? What is the total send rate, aggregated over all participants?

b. Repeat part (a) for a Skype video conference call using a central server.

c. Repeat part (b), but now for the case when each peer sends a copy of its video stream to each of the N−1 other peers.

P15.

a. Suppose we send into the Internet two IP datagrams, each carrying a different UDP segment. The first datagram has source IP address A1, destination IP address B, source port P1, and destination port T. The second datagram has source IP address A2, destination IP address B, source port P2, and destination port T. Suppose that A1 is different from A2 and that P1 is different from P2. Assuming that both datagrams reach their final destination, will the two UDP datagrams be received by the same socket? Why or why not?

b. Suppose Alice, Bob, and Claire want to have an audio conference call using SIP and RTP. For Alice to send and receive RTP packets to and from Bob and Claire, is only one UDP socket sufficient (in addition to the socket needed for the SIP messages)? If yes, then how does Alice's SIP client distinguish between the RTP packets received from Bob and Claire?

P16. True or false:

a. If stored video is streamed directly from a Web server to a media player, then the application is using TCP as the underlying transport protocol.

b. When using RTP, it is possible for a sender to change encoding in the middle of a session.

c. All applications that use RTP must use port 87.

d. If an RTP session has a separate audio and video stream for each sender, then the audio and video streams use the same SSRC.

e. In differentiated services, while per-hop behavior defines differences in performance among classes, it does not mandate any particular mechanism for achieving these performances.

f. Suppose Alice wants to establish an SIP session with Bob. In her INVITE message she includes the line: m=audio 48753 RTP/AVP 3 (AVP 3 denotes GSM audio). Alice has therefore indicated in this message that she wishes to send GSM audio.

g. Referring to the preceding statement, Alice has indicated in her INVITE message that she will send audio to port 48753.

h. SIP messages are typically sent between SIP entities using a default SIP port number.
i. In order to maintain registration, SIP clients must periodically send REGISTER messages.

j. SIP mandates that all SIP clients support G.711 audio encoding.

P17. Consider the figure below, which shows a leaky bucket policer being fed by a stream of packets. The token buffer can hold at most two tokens, and is initially full at t=0. New tokens arrive at a rate of one token per slot. The output link speed is such that if two packets obtain tokens at the beginning of a time slot, they can both go to the output link in the same slot. The timing details of the system are as follows:

1. Packets (if any) arrive at the beginning of the slot. Thus in the figure, packets 1, 2, and 3 arrive in slot 0. If there are already packets in the queue, then the arriving packets join the end of the queue. Packets proceed towards the front of the queue in a FIFO manner.

2. After the arrivals have been added to the queue, if there are any queued packets, one or two of those packets (depending on the number of available tokens) will each remove a token from the token buffer and go to the output link during that slot. Thus, packets 1 and 2 each remove a token from the buffer (since there are initially two tokens) and go to the output link during slot 0.

3. A new token is added to the token buffer if it is not full, since the token generation rate is r = 1 token/slot.

4. Time then advances to the next time slot, and these steps repeat.

Answer the following questions:

a. For each time slot, identify the packets that are in the queue and the number of tokens in the bucket, immediately after the arrivals have been processed (step 1 above) but before any of the packets have passed through the queue and removed a token. Thus, for the t=0 time slot in the example above, packets 1, 2, and 3 are in the queue, and there are two tokens in the buffer.

b. For each time slot indicate which packets appear on the output after the token(s) have been removed from the queue. Thus, for the t=0 time slot in the example above, packets 1 and 2 appear on the output link from the leaky bucket during slot 0.

P18. Repeat P17 but assume that r=2. Assume again that the bucket is initially full.

P19. Consider P18 and suppose now that r=3 and that b=2 as before. Will your answer to the question above change?

P20. Consider the leaky bucket policer that polices the average rate and burst size of a packet flow. We now want to police the peak rate, p, as well. Show how the output of this leaky bucket policer can be fed into a second leaky bucket policer so that the two leaky buckets in series police the average rate, peak rate, and burst size. Be sure to give the bucket size and token generation rate for the second policer.

P21. A packet flow is said to conform to a leaky bucket specification (r, b) with burst size b and average rate r if the number of packets that arrive to the leaky bucket is less than rt+b packets in every interval of time of length t for all t. Will a packet flow that conforms to a leaky bucket specification (r, b) ever have to wait at a leaky bucket policer with parameters r and b? Justify your answer.

P22. Show that as long as r1\<R⋅w1/(∑ wj), then dmax is indeed the maximum delay that any packet in flow 1 will ever experience in the WFQ queue.

Programming Assignment

In this lab, you will implement a streaming video server and client.
The client will use the real-time streaming protocol (RTSP) to control the actions of the server. The server will use the real-time protocol (RTP) to packetize the video for transport over UDP. You will be given Python code that partially implements RTSP and RTP at the client and server. Your job will be to complete both the client and server code. When you are finished, you will have created a client-server application that does the following:

The client sends SETUP, PLAY, PAUSE, and TEARDOWN RTSP commands, and the server responds to the commands. When the server is in the playing state, it periodically grabs a stored JPEG frame, packetizes the frame with RTP, and sends the RTP packet into a UDP socket. The client receives the RTP packets, removes the JPEG frames, decompresses the frames, and renders the frames on the client's monitor.

The code you will be given implements the RTSP protocol in the server and the RTP depacketization in the client. The code also takes care of displaying the transmitted video. You will need to implement RTSP in the client and RTP in the server. This programming assignment will significantly enhance the student's understanding of RTP, RTSP, and streaming video. It is highly recommended. The assignment also suggests a number of optional exercises, including implementing the RTSP DESCRIBE command at both client and server. You can find full details of the assignment, as well as an overview of the RTSP protocol, at the Web site www.pearsonhighered.com/cs-resources.

AN INTERVIEW WITH . . . Henning Schulzrinne

Henning Schulzrinne is a professor, chair of the Department of Computer Science, and head of the Internet Real-Time Laboratory at Columbia University. He is the co-author of RTP, RTSP, SIP, and GIST---key protocols for audio and video communications over the Internet. Henning received his BS in electrical and industrial engineering at TU Darmstadt in Germany, his MS in electrical and computer engineering at the University of Cincinnati, and his PhD in electrical engineering at the University of Massachusetts, Amherst.

What made you decide to specialize in multimedia networking?

This happened almost by accident. As a PhD student, I got involved with DARTnet, an experimental network spanning the United States with T1 lines. DARTnet was used as a proving ground for multicast and Internet real-time tools. That led me to write my first audio tool, NeVoT. Through some of the DARTnet participants, I became involved in the IETF, in the then-nascent Audio Video Transport working group. This group later ended up standardizing RTP.

What was your first job in the computer industry? What did it entail?

My first job in the computer industry was soldering together an Altair computer kit when I was a high school student in Livermore, California. Back in Germany, I started a little consulting company that devised an address management program for a travel agency---storing data on cassette tapes for our TRS-80 and using an IBM Selectric typewriter with a home-brew hardware interface as a printer. My first real job was with AT&T Bell Laboratories, developing a network emulator for constructing experimental networks in a lab environment.

What are the goals of the Internet Real-Time Lab?

Our goal is to provide components and building blocks for the Internet as the single future communications infrastructure.
This includes developing new protocols, such as GIST (for network-layer signaling) and LoST (for finding resources by location), or enhancing protocols that we have worked on earlier, such as SIP, through work on rich presence, peer-to-peer systems, next-generation emergency calling, and service creation tools. Recently, we have also looked extensively at wireless systems for VoIP, as 802.11b and 802.11n networks and maybe WiMax networks are likely to become important last-mile technologies for telephony. We are also trying to greatly improve the ability of users to diagnose faults in the complicated tangle of providers and equipment, using a peer-to-peer fault diagnosis system called DYSWIS (Do You See What I See). We try to do practically relevant work, by building prototypes and open source systems, by measuring performance of real systems, and by contributing to IETF standards.

What is your vision for the future of multimedia networking?

We are now in a transition phase, just a few years shy of when IP will be the universal platform for multimedia services, from IPTV to VoIP. We expect radio, telephone, and TV to be available even during snowstorms and earthquakes, so when the Internet takes over the role of these dedicated networks, users will expect the same level of reliability. We will have to learn to design network technologies for an ecosystem of competing carriers, service and content providers, serving lots of technically untrained users and defending them against a small, but destructive, set of malicious and criminal users. Changing protocols is becoming increasingly hard. They are also becoming more complex, as they need to take into account competing business interests, security, privacy, and the lack of transparency of networks caused by firewalls and network address translators. Since multimedia networking is becoming the foundation for almost all of consumer entertainment, there will be an emphasis on managing very large networks, at low cost. Users will expect ease of use, such as finding the same content on all of their devices.

Why does SIP have a promising future?

As the current wireless network upgrade to 3G networks proceeds, there is the hope of a single multimedia signaling mechanism spanning all types of networks, from cable modems, to corporate telephone networks and public wireless networks. Together with software radios, this will make it possible in the future that a single device can be used on a home network, as a cordless Bluetooth phone, in a corporate network via 802.11 and in the wide area via 3G networks. Even before we have such a single universal wireless device, the personal mobility mechanisms make it possible to hide the differences between networks. One identifier becomes the universal means of reaching a person, rather than remembering or passing around half a dozen technology- or location-specific telephone numbers.

SIP also breaks apart the provision of voice (bit) transport from voice services. It now becomes technically possible to break apart the local telephone monopoly, where one company provides neutral bit transport, while others provide IP "dial tone" and the classical telephone services, such as gateways, call forwarding, and caller ID. Beyond multimedia signaling, SIP offers a new service that has been missing in the Internet: event notification. We have approximated such services with HTTP kludges and e-mail, but this was never very satisfactory.
Since events are a common abstraction for distributed systems, this may simplify the construction of new services.

Do you have any advice for students entering the networking field?

Networking bridges disciplines. It draws from electrical engineering, all aspects of computer science, operations research, statistics, economics, and other disciplines. Thus, networking researchers have to be familiar with subjects well beyond protocols and routing algorithms. Given that networks are becoming such an important part of everyday life, students wanting to make a difference in the field should think of the new resource constraints in networks: human time and effort, rather than just bandwidth or storage. Work in networking research can be immensely satisfying since it is about allowing people to communicate and exchange ideas, one of the essentials of being human. The Internet has become the third major global infrastructure, next to the transportation system and energy distribution. Almost no part of the economy can work without high-performance networks, so there should be plenty of opportunities for the foreseeable future.

References

A note on URLs. In the references below, we have provided URLs for Web pages, Web-only documents, and other material that has not been published in a conference or journal (when we have been able to locate a URL for such material). We have not provided URLs for conference and journal publications, as these documents can usually be located via a search engine, from the conference Web site (e.g., papers in all ACM SIGCOMM conferences and workshops can be located via http://www.acm.org/sigcomm), or via a digital library subscription. While all URLs provided below were valid (and tested) in Jan. 2016, URLs can become out of date. Please consult the online version of this book (www.pearsonhighered.com/cs-resources) for an up-to-date bibliography.

A note on Internet Request for Comments (RFCs): Copies of Internet RFCs are available at many sites. The RFC Editor of the Internet Society (the body that oversees the RFCs) maintains the site, http://www.rfc-editor.org. This site allows you to search for a specific RFC by title, number, or authors, and will show updates to any RFCs listed. Internet RFCs can be updated or obsoleted by later RFCs. Our favorite site for getting RFCs is the original source---http://www.rfc-editor.org.

\[3GPP 2016\] Third Generation Partnership Project homepage, http://www.3gpp.org/

\[Abramson 1970\] N. Abramson, "The Aloha System---Another Alternative for Computer Communications," Proc. 1970 Fall Joint Computer Conference, AFIPS Conference, p. 37, 1970.

\[Abramson 1985\] N. Abramson, "Development of the Alohanet," IEEE Transactions on Information Theory, Vol. IT-31, No. 3 (Mar. 1985), pp. 119--123.

\[Abramson 2009\] N. Abramson, "The Alohanet---Surfing for Wireless Data," IEEE Communications Magazine, Vol. 47, No. 12, pp. 21--25.

\[Adhikari 2011a\] V. K. Adhikari, S. Jain, Y. Chen, Z. L. Zhang, "Vivisecting YouTube: An Active Measurement Study," Technical Report, University of Minnesota, 2011.

\[Adhikari 2012\] V. K. Adhikari, Y. Gao, F. Hao, M. Varvello, V. Hilt, M. Steiner, Z. L. Zhang, "Unreeling Netflix: Understanding and Improving Multi-CDN Movie Delivery," Technical Report, University of Minnesota, 2012.

\[Afanasyev 2010\] A. Afanasyev, N. Tilley, P. Reiher, L. Kleinrock, "Host-to-Host Congestion Control for TCP," IEEE Communications Surveys & Tutorials, Vol. 12, No.
3, pp. 304--342.

\[Agarwal 2009\] S. Agarwal, J. Lorch, "Matchmaking for Online Games and Other Latency-sensitive P2P Systems," Proc. 2009 ACM SIGCOMM.

\[Ager 2012\] B. Ager, N. Chatzis, A. Feldmann, N. Sarrar, S. Uhlig, W. Willinger, "Anatomy of a Large European ISP," Sigcomm, 2012.

\[Ahn 1995\] J. S. Ahn, P. B. Danzig, Z. Liu, and Y. Yan, "Experience with TCP Vegas: Emulation and Experiment," Proc. 1995 ACM SIGCOMM (Boston, MA, Aug. 1995), pp. 185--195.

\[Akamai 2016\] Akamai homepage, http://www.akamai.com

\[Akella 2003\] A. Akella, S. Seshan, A. Shaikh, "An Empirical Evaluation of Wide-Area Internet Bottlenecks," Proc. 2003 ACM Internet Measurement Conference (Miami, FL, Nov. 2003).

\[Akhshabi 2011\] S. Akhshabi, A. C. Begen, C. Dovrolis, "An Experimental Evaluation of Rate-Adaptation Algorithms in Adaptive Streaming over HTTP," Proc. 2011 ACM Multimedia Systems Conf.

\[Akyildiz 2010\] I. Akyildiz, D. Gutierrex-Estevez, E. Reyes, "The Evolution to 4G Cellular Systems: LTE Advanced," Physical Communication, Elsevier, 3 (2010), pp. 217--244.

\[Albitz 1993\] P. Albitz and C. Liu, DNS and BIND, O'Reilly & Associates, Petaluma, CA, 1993.

\[Al-Fares 2008\] M. Al-Fares, A. Loukissas, A. Vahdat, "A Scalable, Commodity Data Center Network Architecture," Proc. 2008 ACM SIGCOMM.

\[Amazon 2014\] J. Hamilton, "AWS: Innovation at Scale," YouTube video, https://www.youtube.com/watch?v=JIQETrFC_SQ

\[Anderson 1995\] J. B. Andersen, T. S. Rappaport, S. Yoshida, "Propagation Measurements and Models for Wireless Communications Channels," IEEE Communications Magazine (Jan. 1995), pp. 42--49.

\[Alizadeh 2010\] M. Alizadeh, A. Greenberg, D. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, M. Sridharan, "Data Center TCP (DCTCP)," ACM SIGCOMM 2010 Conference, ACM, New York, NY, USA, pp. 63--74.

\[Allman 2011\] E. Allman, "The Robustness Principle Reconsidered: Seeking a Middle Ground," Communications of the ACM, Vol. 54, No. 8 (Aug. 2011), pp. 40--45.

\[Appenzeller 2004\] G. Appenzeller, I. Keslassy, N. McKeown, "Sizing Router Buffers," Proc. 2004 ACM SIGCOMM (Portland, OR, Aug. 2004).

\[ASO-ICANN 2016\] The Address Supporting Organization homepage, http://www.aso.icann.org

\[AT&T 2013\] "AT&T Vision Alignment Challenge Technology Survey," AT&T Domain 2.0 Vision White Paper, November 13, 2013.

\[Atheros 2016\] Atheros Communications Inc., "Atheros AR5006 WLAN Chipset Product Bulletins," http://www.atheros.com/pt/AR5006Bulletins.htm

\[Ayanoglu 1995\] E. Ayanoglu, S. Paul, T. F. La Porta, K. K. Sabnani, R. D. Gitlin, "AIRMAIL: A Link-Layer Protocol for Wireless Networks," ACM/Baltzer Wireless Networks Journal, 1: 47--60, Feb. 1995.

\[Bakre 1995\] A. Bakre, B. R. Badrinath, "I-TCP: Indirect TCP for Mobile Hosts," Proc. 1995 Int. Conf. on Distributed Computing Systems (ICDCS) (May 1995), pp. 136--143.

\[Balakrishnan 1997\] H. Balakrishnan, V. Padmanabhan, S. Seshan, R. Katz, "A Comparison of Mechanisms for Improving TCP Performance Over Wireless Links," IEEE/ACM Transactions on Networking, Vol. 5, No. 6 (Dec. 1997).

\[Balakrishnan 2003\] H. Balakrishnan, F. Kaashoek, D. Karger, R. Morris, I. Stoica, "Looking Up Data in P2P Systems," Communications of the ACM, Vol. 46, No. 2 (Feb. 2003), pp. 43--48.

\[Baldauf 2007\] M. Baldauf, S. Dustdar, F. Rosenberg, "A Survey on Context-Aware Systems," Int. J. Ad Hoc and Ubiquitous Computing, Vol. 2, No. 4 (2007), pp. 263--277.

\[Baran 1964\] P.
Baran, "On Distributed Communication Networks," IEEE +Transactions on Communication Systems, Mar. 1964. Rand Corporation +Technical report with the same title (Memorandum RM-3420-PR, 1964). +http://www.rand.org/publications/RM/RM3420/ + +\[Bardwell 2004\] J. Bardwell, "You Believe You Understand What You +Think I Said . . . The Truth About 802.11 Signal and Noise Metrics: A +Discussion Clarifying OftenMisused 802.11 WLAN Terminologies," +http://www.connect802.com/download/techpubs/2004/you_believe_D100201.pdf + +\[Barford 2009\] P. Barford, N. Duffield, A. Ron, J. Sommers, "Network +Performance Anomaly Detection and Localization," Proc. 2009 IEEE INFOCOM +(Apr. 2009). + +\[Baronti 2007\] P. Baronti, P. Pillai, V. Chook, S. Chessa, A. Gotta, +Y. Hu, "Wireless Sensor Networks: A Survey on the State of the Art and +the 802.15.4 and ZigBee Standards," Computer Communications, Vol. 30, +No. 7 (2007), pp. 1655--1695. + +\[Baset 2006\] S. A. Basset and H. Schulzrinne, "An Analysis of the +Skype Peer-to-Peer Internet Telephony Protocol," Proc. 2006 IEEE INFOCOM +(Barcelona, Spain, Apr. 2006). + +\[BBC 2001\] BBC news online "A Small Slice of Design," Apr. 2001, +http://news.bbc.co.uk/2/hi/science/nature/1264205.stm + +\[Beheshti 2008\] N. Beheshti, Y. Ganjali, M. Ghobadi, N. McKeown, G. +Salmon, "Experimental Study of Router Buffer Sizing," Proc. ACM Internet +Measurement Conference (Oct. 2008, Vouliagmeni, Greece). + +\[Bender 2000\] P. Bender, P. Black, M. Grob, R. Padovani, N. +Sindhushayana, A. Viterbi, "CDMA/HDR: A Bandwidth-Efficient High-Speed +Wireless Data Service for Nomadic Users," IEEE Commun. Mag., Vol. 38, +No. 7 (July 2000), pp. 70--77. + +\[Berners-Lee 1989\] T. Berners-Lee, CERN, "Information Management: A +Proposal," Mar. 1989, May 1990. http://www.w3.org/History/1989/proposal +.html + +\[Berners-Lee 1994\] T. Berners-Lee, R. Cailliau, A. Luotonen, H. +Frystyk Nielsen, A. Secret, "The World-Wide Web," Communications of the +ACM, Vol. 37, No. 8 (Aug. 1994), pp. 76--82. + +\[Bertsekas 1991\] D. Bertsekas, R. Gallagher, Data Networks, 2nd Ed., +Prentice Hall, Englewood Cliffs, NJ, 1991. + +\[Biersack 1992\] E. W. Biersack, "Performance Evaluation of Forward +Error Correction in ATM Networks," Proc. 1999 ACM SIGCOMM (Baltimore, +MD, Aug. 1992), pp. 248--257. + +\[BIND 2016\] Internet Software Consortium page on BIND, +http://www.isc.org/bind.html + +\[Bisdikian 2001\] C. Bisdikian, "An Overview of the Bluetooth Wireless +Technology," IEEE Communications Magazine, No. 12 (Dec. 2001), +pp. 86--94. + +\[Bishop 2003\] M. Bishop, Computer Security: Art and Science, Boston: +Addison Wesley, Boston MA, 2003. + +\[Black 1995\] U. Black, ATM Volume I: Foundation for Broadband +Networks, Prentice Hall, 1995. + +\[Black 1997\] U. Black, ATM Volume II: Signaling in Broadband Networks, +Prentice Hall, 1997. + +\[Blumenthal 2001\] M. Blumenthal, D. Clark, "Rethinking the Design of +the Internet: The End-to-end Arguments vs. the Brave New World," ACM +Transactions on Internet Technology, Vol. 1, No. 1 (Aug. 2001), +pp. 70--109. + +\[Bochman 1984\] G. V. Bochmann, C. A. Sunshine, "Formal Methods in +Communication Protocol Design," IEEE Transactions on Communications, +Vol. 28, No. 4 (Apr. 1980) pp. 624--631. + +\[Bolot 1996\] J-C. Bolot, A. Vega-Garcia, "Control Mechanisms for +Packet Audio in the Internet," Proc. 1996 IEEE INFOCOM, pp. 232--239. + +\[Bosshart 2013\] P. Bosshart, G. Gibb, H. Kim, G. Varghese, N. McKeown, +M. Izzard, F. Mujica, M. 
Horowitz, "Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN," ACM SIGCOMM Comput. Commun. Rev., 43, 4 (Aug. 2013), pp. 99--110.

\[Bosshart 2014\] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, D. Walker, "P4: Programming Protocol-Independent Packet Processors," ACM SIGCOMM Comput. Commun. Rev., 44, 3 (July 2014), pp. 87--95.

\[Brakmo 1995\] L. Brakmo, L. Peterson, "TCP Vegas: End to End Congestion Avoidance on a Global Internet," IEEE Journal of Selected Areas in Communications, Vol. 13, No. 8 (Oct. 1995), pp. 1465--1480.

\[Bryant 1988\] B. Bryant, "Designing an Authentication System: A Dialogue in Four Scenes," http://web.mit.edu/kerberos/www/dialogue.html

\[Bush 1945\] V. Bush, "As We May Think," The Atlantic Monthly, July 1945. http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm

\[Byers 1998\] J. Byers, M. Luby, M. Mitzenmacher, A. Rege, "A Digital Fountain Approach to Reliable Distribution of Bulk Data," Proc. 1998 ACM SIGCOMM (Vancouver, Canada, Aug. 1998), pp. 56--67.

\[Caesar 2005a\] M. Caesar, D. Caldwell, N. Feamster, J. Rexford, A. Shaikh, J. van der Merwe, "Design and Implementation of a Routing Control Platform," Proc. Networked Systems Design and Implementation (May 2005).

\[Caesar 2005b\] M. Caesar, J. Rexford, "BGP Routing Policies in ISP Networks," IEEE Network Magazine, Vol. 19, No. 6 (Nov. 2005).

\[Caldwell 2012\] C. Caldwell, "The Prime Pages," http://www.utm.edu/research/primes/prove

\[Cardwell 2000\] N. Cardwell, S. Savage, T. Anderson, "Modeling TCP Latency," Proc. 2000 IEEE INFOCOM (Tel-Aviv, Israel, Mar. 2000).

\[Casado 2007\] M. Casado, M. Freedman, J. Pettit, J. Luo, N. McKeown, S. Shenker, "Ethane: Taking Control of the Enterprise," Proc. ACM SIGCOMM '07, New York, pp. 1--12. See also IEEE/ACM Trans. Networking, 17, 4 (Aug. 2007), pp. 1270--1283.

\[Casado 2009\] M. Casado, M. Freedman, J. Pettit, J. Luo, N. Gude, N. McKeown, S. Shenker, "Rethinking Enterprise Network Control," IEEE/ACM Transactions on Networking (ToN), Vol. 17, No. 4 (Aug. 2009), pp. 1270--1283.

\[Casado 2014\] M. Casado, N. Foster, A. Guha, "Abstractions for Software-Defined Networks," Communications of the ACM, Vol. 57, No. 10 (Oct. 2014), pp. 86--95.

\[Cerf 1974\] V. Cerf, R. Kahn, "A Protocol for Packet Network Interconnection," IEEE Transactions on Communications Technology, Vol. COM-22, No. 5, pp. 627--641.

\[CERT 2001--09\] CERT, "Advisory 2001--09: Statistical Weaknesses in TCP/IP Initial Sequence Numbers," http://www.cert.org/advisories/CA-2001-09.html

\[CERT 2003--04\] CERT, "CERT Advisory CA-2003-04 MS-SQL Server Worm," http://www.cert.org/advisories/CA-2003-04.html

\[CERT 2016\] CERT, http://www.cert.org

\[CERT Filtering 2012\] CERT, "Packet Filtering for Firewall Systems," http://www.cert.org/tech_tips/packet_filtering.html

\[Cert SYN 1996\] CERT, "Advisory CA-96.21: TCP SYN Flooding and IP Spoofing Attacks," http://www.cert.org/advisories/CA-1998-01.html

\[Chandra 2007\] T. Chandra, R. Greisemer, J. Redstone, "Paxos Made Live: An Engineering Perspective," Proc. of 2007 ACM Symposium on Principles of Distributed Computing (PODC), pp. 398--407.

\[Chao 2001\] H. J. Chao, C. Lam, E. Oki, Broadband Packet Switching Technologies---A Practical Guide to ATM Switches and IP Routers, John Wiley & Sons, 2001.

\[Chao 2011\] C. Zhang, P. Dunghel, D. Wu, K. W.
Ross, "Unraveling the +BitTorrent Ecosystem," IEEE Transactions on Parallel and Distributed +Systems, Vol. 22, No. 7 (July 2011). + +\[Chen 2000\] G. Chen, D. Kotz, "A Survey of Context-Aware Mobile +Computing Research," Technical Report TR2000-381, Dept. of Computer +Science, Dartmouth College, Nov. 2000. +http://www.cs.dartmouth.edu/reports/TR2000-381.pdf + +\[Chen 2006\] K.-T. Chen, C.-Y. Huang, P. Huang, C.-L. Lei, "Quantifying +Skype User Satisfaction," Proc. 2006 ACM SIGCOMM (Pisa, Italy, +Sept. 2006). + +\[Chen 2011\] Y. Chen, S. Jain, V. K. Adhikari, Z. Zhang, +"Characterizing Roles of Front-End Servers in End-to-End Performance of +Dynamic Content Distribution," Proc. 2011 ACM Internet Measurement +Conference (Berlin, Germany, Nov. 2011). + +\[Cheswick 2000\] B. Cheswick, H. Burch, S. Branigan, "Mapping and +Visualizing the Internet," Proc. 2000 Usenix Conference (San Diego, CA, +June 2000). + +\[Chiu 1989\] D. Chiu, R. Jain, "Analysis of the Increase and Decrease +Algorithms for Congestion Avoidance in Computer Networks," Computer +Networks and ISDN Systems, Vol. 17, No. 1, pp. 1--14. +http://www.cs.wustl.edu/\~jain/papers/cong_av.htm + +\[Christiansen 2001\] M. Christiansen, K. Jeffay, D. Ott, F. D. Smith, +"Tuning Red for Web Traffic," IEEE/ACM Transactions on Networking, Vol. +9, No. 3 (June 2001), pp. 249--264. + +\[Chuang 2005\] S. Chuang, S. Iyer, N. McKeown, "Practical Algorithms +for Performance Guarantees in Buffered Crossbars," Proc. 2005 IEEE +INFOCOM. + +\[Cisco 802.11ac 2014\] Cisco Systems, "802.11ac: The Fifth Generation +of Wi-Fi," Technical White Paper, Mar. 2014. + +\[Cisco 7600 2016\] Cisco Systems, "Cisco 7600 Series Solution and +Design Guide," +http://www.cisco.com/en/US/products/hw/routers/ps368/prod_technical\_ +reference09186a0080092246.html + +\[Cisco 8500 2012\] Cisco Systems Inc., "Catalyst 8500 Campus Switch +Router Architecture," +http://www.cisco.com/univercd/cc/td/doc/product/l3sw/8540/rel_12_0/w5_6f/softcnfg/1cfg8500.pdf + +\[Cisco 12000 2016\] Cisco Systems Inc., "Cisco XR 12000 Series and +Cisco 12000 Series Routers," +http://www.cisco.com/en/US/products/ps6342/index.html + +\[Cisco 2012\] Cisco 2012, Data Centers, http://www.cisco.com/go/dce + +\[Cisco 2015\] Cisco Visual Networking Index: Forecast and Methodology, +2014--2019, White Paper, 2015. 

\[Cisco 6500 2016\] Cisco Systems, "Cisco Catalyst 6500 Architecture White Paper," http://www.cisco.com/c/en/us/products/collateral/switches/catalyst-6500-series-switches/prod_white_paper0900aecd80673385.html

\[Cisco NAT 2016\] Cisco Systems Inc., "How NAT Works," http://www.cisco.com/en/US/tech/tk648/tk361/technologies_tech_note09186a0080094831.shtml

\[Cisco QoS 2016\] Cisco Systems Inc., "Advanced QoS Services for the Intelligent Internet," http://www.cisco.com/warp/public/cc/pd/iosw/ioft/ioqo/tech/qos_wp.htm

\[Cisco Queue 2016\] Cisco Systems Inc., "Congestion Management Overview," http://www.cisco.com/en/US/docs/ios/12_2/qos/configuration/guide/qcfconmg.html

\[Cisco SYN 2016\] Cisco Systems Inc., "Defining Strategies to Protect Against TCP SYN Denial of Service Attacks," http://www.cisco.com/en/US/tech/tk828/technologies_tech_note09186a00800f67d5.shtml

\[Cisco TCAM 2014\] Cisco Systems Inc., "CAT 6500 and 7600 Series Routers and Switches TCAM Allocation Adjustment Procedures," http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/117712-problemsolution-cat6500-00.html

\[Cisco VNI 2015\] Cisco Systems Inc., "Visual Networking Index," http://www.cisco.com/web/solutions/sp/vni/vni_forecast_highlights/index.html

\[Clark 1988\] D. Clark, "The Design Philosophy of the DARPA Internet Protocols," Proc. 1988 ACM SIGCOMM (Stanford, CA, Aug. 1988).

\[Cohen 1977\] D. Cohen, "Issues in Transnet Packetized Voice Communication," Proc. Fifth Data Communications Symposium (Snowbird, UT, Sept. 1977), pp. 6--13.

\[Cookie Central 2016\] Cookie Central homepage, http://www.cookiecentral.com/n_cookie_faq.htm

\[Cormen 2001\] T. H. Cormen, Introduction to Algorithms, 2nd Ed., MIT Press, Cambridge, MA, 2001.

\[Crow 1997\] B. Crow, I. Widjaja, J. Kim, P. Sakai, "IEEE 802.11 Wireless Local Area Networks," IEEE Communications Magazine (Sept. 1997), pp. 116--126.

\[Cusumano 1998\] M. A. Cusumano, D. B. Yoffie, Competing on Internet Time: Lessons from Netscape and Its Battle with Microsoft, Free Press, New York, NY, 1998.

\[Czyz 2014\] J. Czyz, M. Allman, J. Zhang, S. Iekel-Johnson, E. Osterweil, M. Bailey, "Measuring IPv6 Adoption," Proc. ACM SIGCOMM 2014, ACM, New York, NY, USA, pp. 87--98.

\[Dahlman 1998\] E. Dahlman, B. Gudmundson, M. Nilsson, J. Sköld, "UMTS/IMT-2000 Based on Wideband CDMA," IEEE Communications Magazine (Sept. 1998), pp. 70--80.

\[Daigle 1991\] J. N. Daigle, Queuing Theory for Telecommunications, Addison-Wesley, Reading, MA, 1991.

\[DAM 2016\] Digital Attack Map, http://www.digitalattackmap.com

\[Davie 2000\] B. Davie and Y. Rekhter, MPLS: Technology and Applications, Morgan Kaufmann Series in Networking, 2000.

\[Davies 2005\] G. Davies, F. Kelly, "Network Dimensioning, Service Costing, and Pricing in a Packet-Switched Environment," Telecommunications Policy, Vol. 28, No. 4, pp. 391--412.

\[DEC 1990\] Digital Equipment Corporation, "In Memoriam: J. C. R. Licklider 1915--1990," SRC Research Report 61, Aug. 1990. http://www.memex.org/licklider.pdf

\[DeClercq 2002\] J. DeClercq, O. Paridaens, "Scalability Implications of Virtual Private Networks," IEEE Communications Magazine, Vol. 40, No. 5 (May 2002), pp. 151--157.

\[Demers 1990\] A. Demers, S. Keshav, S. Shenker, "Analysis and Simulation of a Fair Queuing Algorithm," Internetworking: Research and Experience, Vol. 1, No. 1 (1990), pp. 3--26.

\[dhc 2016\] IETF Dynamic Host Configuration working group homepage, http://www.ietf.org/html.charters/dhc-charter.html

\[Dhungel 2012\] P. Dhungel, K. W. Ross, M. Steiner, Y. Tian, X. Hei, "Xunlei: Peer-Assisted Download Acceleration on a Massive Scale," Passive and Active Measurement Conference (PAM), Vienna, 2012.

\[Diffie 1976\] W. Diffie, M. E. Hellman, "New Directions in Cryptography," IEEE Transactions on Information Theory, Vol. IT-22 (1976), pp. 644--654.

\[Diggavi 2004\] S. N. Diggavi, N. Al-Dhahir, A. Stamoulis, R. Calderbank, "Great Expectations: The Value of Spatial Diversity in Wireless Networks," Proceedings of the IEEE, Vol. 92, No. 2 (Feb. 2004).

\[Dilley 2002\] J. Dilley, B. Maggs, J. Parikh, H. Prokop, R. Sitaraman, B. Weihl, "Globally Distributed Content Delivery," IEEE Internet Computing (Sept.--Oct. 2002).

\[Diot 2000\] C. Diot, B. N. Levine, B. Lyles, H. Kassem, D. Balensiefen, "Deployment Issues for the IP Multicast Service and Architecture," IEEE Network, Vol. 14, No. 1 (Jan./Feb. 2000), pp. 78--88.

\[Dischinger 2007\] M. Dischinger, A. Haeberlen, K. Gummadi, S. Saroiu, "Characterizing Residential Broadband Networks," Proc. 2007 ACM Internet Measurement Conference, pp. 24--26.

\[Dmitiropoulos 2007\] X. Dmitiropoulos, D. Krioukov, M. Fomenkov, B. Huffaker, Y. Hyun, K. C. Claffy, G. Riley, "AS Relationships: Inference and Validation," ACM Computer Communication Review (Jan. 2007).

\[DOCSIS 2011\] Data-Over-Cable Service Interface Specifications, DOCSIS 3.0: MAC and Upper Layer Protocols Interface Specification, CM-SP-MULPIv3.0-I16-110623, 2011.

\[Dodge 2016\] M. Dodge, "An Atlas of Cyberspaces," http://www.cybergeography.org/atlas/isp_maps.html

\[Donahoo 2001\] M. Donahoo, K. Calvert, TCP/IP Sockets in C: Practical Guide for Programmers, Morgan Kaufmann, 2001.

\[DSL 2016\] DSL Forum homepage, http://www.dslforum.org/

\[Dhunghel 2008\] P. Dhungel, D. Wu, B. Schonhorst, K. W. Ross, "A Measurement Study of Attacks on BitTorrent Leechers," 7th International Workshop on Peer-to-Peer Systems (IPTPS 2008) (Tampa Bay, FL, Feb. 2008).

\[Droms 2002\] R. Droms, T. Lemon, The DHCP Handbook (2nd Edition), SAMS Publishing, 2002.

\[Edney 2003\] J. Edney and W. A. Arbaugh, Real 802.11 Security: Wi-Fi Protected Access and 802.11i, Addison-Wesley Professional, 2003.

\[Edwards 2011\] W. K. Edwards, R. Grinter, R. Mahajan, D. Wetherall, "Advancing the State of Home Networking," Communications of the ACM, Vol. 54, No. 6 (June 2011), pp. 62--71.

\[Ellis 1987\] H. Ellis, "The Story of Non-Secret Encryption," http://jya.com/ellisdoc.htm

\[Erickson 2013\] D. Erickson, "The Beacon OpenFlow Controller," 2nd ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking (HotSDN '13), ACM, New York, NY, USA, pp. 13--18.

\[Ericsson 2012\] Ericsson, "The Evolution of EDGE," http://www.ericsson.com/technology/whitepapers/broadband/evolution_of_EDGE.shtml

\[Facebook 2014\] A. Andreyev, "Introducing Data Center Fabric, the Next-Generation Facebook Data Center Network," https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network

\[Faloutsos 1999\] C. Faloutsos, M. Faloutsos, P. Faloutsos, "What Does the Internet Look Like? Empirical Laws of the Internet Topology," Proc. 1999 ACM SIGCOMM (Boston, MA, Aug. 1999).

\[Farrington 2010\] N. Farrington, G. Porter, S. Radhakrishnan, H. Bazzaz, V. Subramanya, Y. Fainman, G. Papen, A.
Vahdat, "Helios: A +Hybrid Electrical/Optical Switch Architecture for Modular Data Centers," +Proc. 2010 ACM SIGCOMM. + +\[Feamster 2004\] N. Feamster, H. Balakrishnan, J. Rexford, A. Shaikh, +K. van der Merwe, "The Case for Separating Routing from Routers," ACM +SIGCOMM Workshop on Future Directions in Network Architecture, +Sept. 2004. + +\[Feamster 2004\] N. Feamster, J. Winick, J. Rexford, "A Model for BGP +Routing for Network Engineering," Proc. 2004 ACM SIGMETRICS (New York, +NY, June 2004). + +\[Feamster 2005\] N. Feamster, H. Balakrishnan, "Detecting BGP +Configuration Faults with Static Analysis," NSDI (May 2005). + +\[Feamster 2013\] N. Feamster, J. Rexford, E. Zegura, "The Road to SDN," +ACM Queue, Volume 11, Issue 12, (Dec. 2013). + +\[Feldmeier 1995\] D. Feldmeier, "Fast Software Implementation of Error +Detection Codes," IEEE/ACM Transactions on Networking, Vol. 3, No. 6 +(Dec. 1995), pp. 640--652. + +\[Ferguson 2013\] A. Ferguson, A. Guha, C. Liang, R. Fonseca, S. +Krishnamurthi, "Participatory Networking: An API for Application Control +of SDNs," Proceedings ACM SIGCOMM 2013, pp. 327--338. + +\[Fielding 2000\] R. Fielding, "Architectural Styles and the Design of +Network-based Software Architectures," 2000. PhD Thesis, UC Irvine, +2000. + +\[FIPS 1995\] Federal Information Processing Standard, "Secure Hash +Standard," FIPS Publication 180-1. +http://www.itl.nist.gov/fipspubs/fip180-1.htm + +\[Floyd 1999\] S. Floyd, K. Fall, "Promoting the Use of End-to-End +Congestion Control in the Internet," IEEE/ACM Transactions on +Networking, Vol. 6, No. 5 (Oct. 1998), pp. 458--472. + +\[Floyd 2000\] S. Floyd, M. Handley, J. Padhye, J. Widmer, +"Equation-Based Congestion Control for Unicast Applications," Proc. 2000 +ACM SIGCOMM (Stockholm, Sweden, Aug. 2000). + +\[Floyd 2001\] S. Floyd, "A Report on Some Recent Developments in TCP +Congestion Control," IEEE Communications Magazine (Apr. 2001). + +\[Floyd 2016\] S. Floyd, "References on RED (Random Early Detection) +Queue Management," http://www.icir.org/floyd/red.html + +\[Floyd Synchronization 1994\] S. Floyd, V. Jacobson, "Synchronization +of Periodic Routing Messages," IEEE/ACM Transactions on Networking, Vol. +2, No. 2 (Apr. 1997) pp. 122--136. + +\[Floyd TCP 1994\] S. Floyd, "TCP and Explicit Congestion Notification," +ACM SIGCOMM Computer Communications Review, Vol. 24, No. 5 (Oct. 1994), +pp. 10--23. + +\[Fluhrer 2001\] S. Fluhrer, I. Mantin, A. Shamir, "Weaknesses in the +Key Scheduling Algorithm of RC4," Eighth Annual Workshop on Selected +Areas in Cryptography (Toronto, Canada, Aug. 2002). + +\[Fortz 2000\] B. Fortz, M. Thorup, "Internet Traffic Engineering by +Optimizing OSPF Weights," Proc. 2000 IEEE INFOCOM (Tel Aviv, Israel, +Apr. 2000). + +\[Fortz 2002\] B. Fortz, J. Rexford, M. Thorup, "Traffic Engineering +with Traditional IP Routing Protocols," IEEE Communication Magazine +(Oct. 2002). + +\[Fraleigh 2003\] C. Fraleigh, F. Tobagi, C. Diot, "Provisioning IP +Backbone Networks to Support Latency Sensitive Traffic," Proc. 2003 IEEE +INFOCOM (San Francisco, CA, Mar. 2003). + +\[Frost 1994\] J. 
Frost, "BSD Sockets: A Quick and Dirty Primer," +http://world.std .com/\~jimf/papers/sockets/sockets.html + +\[FTC 2015\] Internet of Things: Privacy and Security in a Connected +World, Federal Trade Commission, 2015, +https://www.ftc.gov/system/files/documents/reports/ +federal-trade-commission-staff-report-november-2013-workshop-entitled-internet-things-privacy/150127iotrpt.pdf + +\[FTTH 2016\] Fiber to the Home Council, http://www.ftthcouncil.org/ + +\[Gao 2001\] L. Gao, J. Rexford, "Stable Internet Routing Without Global +Coordination," IEEE/ACM Transactions on Networking, Vol. 9, No. 6 +(Dec. 2001), pp. 681--692. + +\[Gartner 2014\] Gartner report on Internet of Things, +http://www.gartner.com/ technology/research/internet-of-things + +\[Gauthier 1999\] L. Gauthier, C. Diot, and J. Kurose, "End-to-End +Transmission Control Mechanisms for Multiparty Interactive Applications +on the Internet," Proc. 1999 IEEE INFOCOM (New York, NY, Apr. 1999). + +\[Gember-Jacobson 2014\] A. Gember-Jacobson, R. Viswanathan, C. Prakash, +R. Grandl, J. Khalid, S. Das, A. Akella, "OpenNF: Enabling Innovation in +Network Function Control," Proc. ACM SIGCOMM 2014, pp. 163--174. + +\[Goodman 1997\] David J. Goodman, Wireless Personal Communications +Systems, Prentice-Hall, 1997. + +\[Google IPv6 2015\] Google Inc. "IPv6 Statistics," +https://www.google.com/intl/en/ipv6/statistics.html + +\[Google Locations 2016\] Google data centers. +http://www.google.com/corporate/datacenter/locations.html + +\[Goralski 1999\] W. Goralski, Frame Relay for High-Speed Networks, John +Wiley, New York, 1999. + +\[Greenberg 2009a\] A. Greenberg, J. Hamilton, D. Maltz, P. Patel, "The +Cost of a Cloud: Research Problems in Data Center Networks," ACM +Computer Communications Review (Jan. 2009). + +\[Greenberg 2009b\] A. Greenberg, N. Jain, S. Kandula, C. Kim, P. +Lahiri, D. Maltz, P. Patel, S. Sengupta, "VL2: A Scalable and Flexible +Data Center Network," Proc. 2009 ACM SIGCOMM. + +\[Greenberg 2011\] A. Greenberg, J. Hamilton, N. Jain, S. Kandula, C. +Kim, P. Lahiri, D. Maltz, P. Patel, S. Sengupta, "VL2: A Scalable and +Flexible Data Center Network," Communications of the ACM, Vol. 54, No. 3 +(Mar. 2011), pp. 95--104. + +\[Greenberg 2015\] A. Greenberg, "SDN for the Cloud," Sigcomm 2015 +Keynote Address, +http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/keynote.pdf + +\[Griffin 2012\] T. Griffin, "Interdomain Routing Links," +http://www.cl.cam.ac.uk/\~tgg22/interdomain/ + +\[Gude 2008\] N. Gude, T. Koponen, J. Pettit, B. Pfaff, M. Casado, N. +McKeown, and S. Shenker, "NOX: Towards an Operating System for +Networks," ACM SIGCOMM Computer Communication Review, July 2008. + +\[Guha 2006\] S. Guha, N. Daswani, R. Jain, "An Experimental Study of +the Skype Peer-to-Peer VoIP System," Proc. Fifth Int. Workshop on P2P +Systems (Santa Barbara, CA, 2006). + +\[Guo 2005\] L. Guo, S. Chen, Z. Xiao, E. Tan, X. Ding, X. Zhang, +"Measurement, Analysis, and Modeling of BitTorrent-Like Systems," Proc. +2005 ACM Internet Measurement Conference. + +\[Guo 2009\] C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. +Zhang, S. Lu, "BCube: A High Performance, Server-centric Network +Architecture for Modular Data Centers," Proc. 2009 ACM SIGCOMM. + +\[Gupta 2001\] P. Gupta, N. McKeown, "Algorithms for Packet +Classification," IEEE Network Magazine, Vol. 15, No. 2 (Mar./Apr. 2001), +pp. 24--32. + +\[Gupta 2014\] A. Gupta, L. Vanbever, M. Shahbaz, S. Donovan, B. +Schlinker, N. Feamster, J. Rexford, S. Shenker, R. Clark, E. 
Katz-Bassett, "SDX: A Software Defined Internet Exchange," Proc. ACM SIGCOMM 2014 (Aug. 2014), pp. 551--562.

\[Ha 2008\] S. Ha, I. Rhee, L. Xu, "CUBIC: A New TCP-Friendly High-Speed TCP Variant," ACM SIGOPS Operating Systems Review, 2008.

\[Halabi 2000\] S. Halabi, Internet Routing Architectures, 2nd Ed., Cisco Press, 2000.

\[Hanabali 2005\] A. A. Hanbali, E. Altman, P. Nain, "A Survey of TCP over Ad Hoc Networks," IEEE Commun. Surveys and Tutorials, Vol. 7, No. 3 (2005), pp. 22--36.

\[Hei 2007\] X. Hei, C. Liang, J. Liang, Y. Liu, K. W. Ross, "A Measurement Study of a Large-Scale P2P IPTV System," IEEE Trans. on Multimedia (Dec. 2007).

\[Heidemann 1997\] J. Heidemann, K. Obraczka, J. Touch, "Modeling the Performance of HTTP over Several Transport Protocols," IEEE/ACM Transactions on Networking, Vol. 5, No. 5 (Oct. 1997), pp. 616--630.

\[Held 2001\] G. Held, Data Over Wireless Networks: Bluetooth, WAP, and Wireless LANs, McGraw-Hill, 2001.

\[Holland 2001\] G. Holland, N. Vaidya, V. Bahl, "A Rate-Adaptive MAC Protocol for Multi-Hop Wireless Networks," Proc. 2001 ACM Int. Conference on Mobile Computing and Networking (Mobicom01) (Rome, Italy, July 2001).

\[Hollot 2002\] C. V. Hollot, V. Misra, D. Towsley, W. Gong, "Analysis and Design of Controllers for AQM Routers Supporting TCP Flows," IEEE Transactions on Automatic Control, Vol. 47, No. 6 (June 2002), pp. 945--959.

\[Hong 2013\] C. Hong, S. Kandula, R. Mahajan, M. Zhang, V. Gill, M. Nanduri, R. Wattenhofer, "Achieving High Utilization with Software-Driven WAN," ACM SIGCOMM Conference (Aug. 2013), pp. 15--26.

\[Huang 2002\] C. Huang, V. Sharma, K. Owens, V. Makam, "Building Reliable MPLS Networks Using a Path Protection Mechanism," IEEE Communications Magazine, Vol. 40, No. 3 (Mar. 2002), pp. 156--162.

\[Huang 2005\] Y. Huang, R. Guerin, "Does Over-Provisioning Become More or Less Efficient as Networks Grow Larger?," Proc. IEEE Int. Conf. Network Protocols (ICNP) (Boston, MA, Nov. 2005).

\[Huang 2008\] C. Huang, J. Li, A. Wang, K. W. Ross, "Understanding Hybrid CDN-P2P: Why Limelight Needs Its Own Red Swoosh," Proc. 2008 NOSSDAV, Braunschweig, Germany.

\[Huitema 1998\] C. Huitema, IPv6: The New Internet Protocol, 2nd Ed., Prentice Hall, Englewood Cliffs, NJ, 1998.

\[Huston 1999a\] G. Huston, "Interconnection, Peering, and Settlements---Part I," The Internet Protocol Journal, Vol. 2, No. 1 (Mar. 1999).

\[Huston 2004\] G. Huston, "NAT Anatomy: A Look Inside Network Address Translators," The Internet Protocol Journal, Vol. 7, No. 3 (Sept. 2004).

\[Huston 2008a\] G. Huston, "Confronting IPv4 Address Exhaustion," http://www.potaroo.net/ispcol/2008-10/v4depletion.html

\[Huston 2008b\] G. Huston, G. Michaelson, "IPv6 Deployment: Just Where Are We?," http://www.potaroo.net/ispcol/2008-04/ipv6.html

\[Huston 2011a\] G. Huston, "A Rough Guide to Address Exhaustion," The Internet Protocol Journal, Vol. 14, No. 1 (Mar. 2011).

\[Huston 2011b\] G. Huston, "Transitioning Protocols," The Internet Protocol Journal, Vol. 14, No. 1 (Mar. 2011).

\[IAB 2016\] Internet Architecture Board homepage, http://www.iab.org/

\[IANA Protocol Numbers 2016\] Internet Assigned Numbers Authority, Protocol Numbers, http://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml

\[IBM 1997\] IBM Corp., IBM Inside APPN---The Essential Guide to the Next-Generation SNA, SG24-3669-03, June 1997.

\[ICANN 2016\] The Internet Corporation for Assigned Names and Numbers homepage, http://www.icann.org

\[IEEE 802 2016\] IEEE 802 LAN/MAN Standards Committee homepage, http://www.ieee802.org/

\[IEEE 802.11 1999\] IEEE 802.11, "1999 Edition (ISO/IEC 8802-11: 1999) IEEE Standards for Information Technology---Telecommunications and Information Exchange Between Systems---Local and Metropolitan Area Network---Specific Requirements---Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specification," http://standards.ieee.org/getieee802/download/802.11-1999.pdf

\[IEEE 802.11ac 2013\] IEEE, "802.11ac-2013---IEEE Standard for Information technology---Telecommunications and Information Exchange Between Systems---Local and Metropolitan Area Networks---Specific Requirements---Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications---Amendment 4: Enhancements for Very High Throughput for Operation in Bands Below 6 GHz."

\[IEEE 802.11n 2012\] IEEE, "IEEE P802.11---Task Group N---Meeting Update: Status of 802.11n," http://grouper.ieee.org/groups/802/11/Reports/tgn_update.htm

\[IEEE 802.15 2012\] IEEE 802.15 Working Group for WPAN homepage, http://grouper.ieee.org/groups/802/15/

\[IEEE 802.15.4 2012\] IEEE 802.15 WPAN Task Group 4, http://www.ieee802.org/15/pub/TG4.html

\[IEEE 802.16d 2004\] IEEE, "IEEE Standard for Local and Metropolitan Area Networks, Part 16: Air Interface for Fixed Broadband Wireless Access Systems," http://standards.ieee.org/getieee802/download/802.16-2004.pdf

\[IEEE 802.16e 2005\] IEEE, "IEEE Standard for Local and Metropolitan Area Networks, Part 16: Air Interface for Fixed and Mobile Broadband Wireless Access Systems, Amendment 2: Physical and Medium Access Control Layers for Combined Fixed and Mobile Operation in Licensed Bands and Corrigendum 1," http://standards.ieee.org/getieee802/download/802.16e-2005.pdf

\[IEEE 802.1q 2005\] IEEE, "IEEE Standard for Local and Metropolitan Area Networks: Virtual Bridged Local Area Networks," http://standards.ieee.org/getieee802/download/802.1Q-2005.pdf

\[IEEE 802.1X\] IEEE Std 802.1X-2001 Port-Based Network Access Control, http://standards.ieee.org/reading/ieee/std_public/description/lanman/802.1x-2001_desc.html

\[IEEE 802.3 2012\] IEEE, "IEEE 802.3 CSMA/CD (Ethernet)," http://grouper.ieee.org/groups/802/3/

\[IEEE 802.5 2012\] IEEE, IEEE 802.5 homepage, http://www.ieee802.org/5/www8025org/

\[IETF 2016\] Internet Engineering Task Force homepage, http://www.ietf.org

\[Ihm 2011\] S. Ihm, V. S. Pai, "Towards Understanding Modern Web Traffic," Proc. 2011 ACM Internet Measurement Conference (Berlin).

\[IMAP 2012\] The IMAP Connection, http://www.imap.org/

\[Intel 2016\] Intel Corp., "Intel 710 Ethernet Adapter," http://www.intel.com/content/www/us/en/ethernet-products/converged-network-adapters/ethernet-xl710.html

\[Internet2 Multicast 2012\] Internet2 Multicast Working Group homepage, http://www.internet2.edu/multicast/

\[ISC 2016\] Internet Systems Consortium homepage, http://www.isc.org

\[ISI 1979\] Information Sciences Institute, "DoD Standard Internet Protocol," Internet Engineering Note 123 (Dec.
1979), http://www.isi.edu/in-notes/ien/ien123.txt

\[ISO 2016\] International Organization for Standardization homepage, http://www.iso.org/

\[ISO X.680 2002\] International Organization for Standardization, "X.680: ITU-T Recommendation X.680 (2002) Information Technology---Abstract Syntax Notation One (ASN.1): Specification of Basic Notation," http://www.itu.int/ITU-T/studygroups/com17/languages/X.680-0207.pdf

\[ITU 1999\] Asymmetric Digital Subscriber Line (ADSL) Transceivers, ITU-T G.992.1, 1999.

\[ITU 2003\] Asymmetric Digital Subscriber Line (ADSL) Transceivers---Extended Bandwidth ADSL2 (ADSL2Plus), ITU-T G.992.5, 2003.

\[ITU 2005a\] International Telecommunication Union, "ITU-T X.509, The Directory: Public-key and attribute certificate frameworks" (Aug. 2005).

\[ITU 2006\] ITU, "G.993.1: Very High Speed Digital Subscriber Line Transceivers (VDSL)," https://www.itu.int/rec/T-REC-G.993.1-200406-I/en, 2006.

\[ITU 2012\] The ITU homepage, http://www.itu.int/

\[ITU 2015\] "Measuring the Information Society Report," 2015, http://www.itu.int/en/ITU-D/Statistics/Pages/publications/mis2015.aspx

\[ITU-T Q.2931 1995\] International Telecommunication Union, "Recommendation Q.2931 (02/95)---Broadband Integrated Services Digital Network (B-ISDN)---Digital Subscriber Signalling System No. 2 (DSS 2)---User-Network Interface (UNI)---Layer 3 Specification for Basic Call/Connection Control."

\[IXP List 2016\] List of IXPs, Wikipedia, https://en.wikipedia.org/wiki/List_of\_Internet_exchange_points

\[Iyengar 2015\] J. Iyengar, I. Swett, "QUIC: A UDP-Based Secure and Reliable Transport for HTTP/2," Internet Draft draft-tsvwg-quic-protocol-00, June 2015.

\[Iyer 2008\] S. Iyer, R. R. Kompella, N. McKeown, "Designing Packet Buffers for Router Line Cards," IEEE/ACM Transactions on Networking, Vol. 16, No. 3 (June 2008), pp. 705--717.

\[Jacobson 1988\] V. Jacobson, "Congestion Avoidance and Control," Proc. 1988 ACM SIGCOMM (Stanford, CA, Aug. 1988), pp. 314--329.

\[Jain 1986\] R. Jain, "A Timeout-Based Congestion Control Scheme for Window Flow-Controlled Networks," IEEE Journal on Selected Areas in Communications SAC-4, 7 (Oct. 1986).

\[Jain 1989\] R. Jain, "A Delay-Based Approach for Congestion Avoidance in Interconnected Heterogeneous Computer Networks," ACM SIGCOMM Computer Communications Review, Vol. 19, No. 5 (1989), pp. 56--71.

\[Jain 1994\] R. Jain, FDDI Handbook: High-Speed Networking Using Fiber and Other Media, Addison-Wesley, Reading, MA, 1994.

\[Jain 1996\] R. Jain, S. Kalyanaraman, S. Fahmy, R. Goyal, S. Kim, "Tutorial Paper on ABR Source Behavior," ATM Forum/96-1270, Oct. 1996. http://www.cse.wustl.edu/\~jain/atmf/ftp/atm96-1270.pdf

\[Jain 2013\] S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu, J. Zolla, U. Hölzle, S. Stuart, A. Vahdat, "B4: Experience with a Globally Deployed Software Defined WAN," ACM SIGCOMM 2013, pp. 3--14.

\[Jaiswal 2003\] S. Jaiswal, G. Iannaccone, C. Diot, J. Kurose, D. Towsley, "Measurement and Classification of Out-of-Sequence Packets in a Tier-1 IP Backbone," Proc. 2003 IEEE INFOCOM.

\[Ji 2003\] P. Ji, Z. Ge, J. Kurose, D. Towsley, "A Comparison of Hard-State and Soft-State Signaling Protocols," Proc. 2003 ACM SIGCOMM (Karlsruhe, Germany, Aug. 2003).

\[Jimenez 1997\] D. Jimenez, "Outside Hackers Infiltrate MIT Network, Compromise Security," The Tech, Vol.
117, No. 49 (Oct. 1997), p. 1, http://www-tech.mit.edu/V117/N49/hackers.49n.html

\[Jin 2004\] C. Jin, D. X. Wei, S. Low, "FAST TCP: Motivation, Architecture, Algorithms, Performance," Proc. 2004 IEEE INFOCOM (Hong Kong, Mar. 2004).

\[Juniper Contrail 2016\] Juniper Networks, "Contrail," http://www.juniper.net/us/en/products-services/sdn/contrail/

\[Juniper MX2020 2015\] Juniper Networks, "MX2020 and MX2010 3D Universal Edge Routers," www.juniper.net/us/en/local/pdf/.../1000417-en.pdf

\[Kaaranen 2001\] H. Kaaranen, S. Naghian, L. Laitinen, A. Ahtiainen, V. Niemi, UMTS Networks: Architecture, Mobility and Services, New York: John Wiley & Sons, 2001.

\[Kahn 1967\] D. Kahn, The Codebreakers: The Story of Secret Writing, The Macmillan Company, 1967.

\[Kahn 1978\] R. E. Kahn, S. Gronemeyer, J. Burchfiel, R. Kunzelman, "Advances in Packet Radio Technology," Proceedings of the IEEE, Vol. 66, No. 11 (Nov. 1978).

\[Kamerman 1997\] A. Kamerman, L. Monteban, "WaveLAN-II: A High-Performance Wireless LAN for the Unlicensed Band," Bell Labs Technical Journal (Summer 1997), pp. 118--133.

\[Kar 2000\] K. Kar, M. Kodialam, T. V. Lakshman, "Minimum Interference Routing of Bandwidth Guaranteed Tunnels with MPLS Traffic Engineering Applications," IEEE J. Selected Areas in Communications (Dec. 2000).

\[Karn 1987\] P. Karn, C. Partridge, "Improving Round-Trip Time Estimates in Reliable Transport Protocols," Proc. 1987 ACM SIGCOMM.

\[Karol 1987\] M. Karol, M. Hluchyj, A. Morgan, "Input Versus Output Queuing on a Space-Division Packet Switch," IEEE Transactions on Communications, Vol. 35, No. 12 (Dec. 1987), pp. 1347--1356.

\[Kaufman 1995\] C. Kaufman, R. Perlman, M. Speciner, Network Security: Private Communication in a Public World, Prentice Hall, Englewood Cliffs, NJ, 1995.

\[Kelly 1998\] F. P. Kelly, A. Maulloo, D. Tan, "Rate Control for Communication Networks: Shadow Prices, Proportional Fairness and Stability," J. Operations Res. Soc., Vol. 49, No. 3 (Mar. 1998), pp. 237--252.

\[Kelly 2003\] T. Kelly, "Scalable TCP: Improving Performance in High Speed Wide Area Networks," ACM SIGCOMM Computer Communications Review, Vol. 33, No. 2 (Apr. 2003), pp. 83--91.

\[Kilkki 1999\] K. Kilkki, Differentiated Services for the Internet, Macmillan Technical Publishing, Indianapolis, IN, 1999.

\[Kim 2005\] H. Kim, S. Rixner, V. Pai, "Network Interface Data Caching," IEEE Transactions on Computers, Vol. 54, No. 11 (Nov. 2005), pp. 1394--1408.

\[Kim 2008\] C. Kim, M. Caesar, J. Rexford, "Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises," Proc. 2008 ACM SIGCOMM (Seattle, WA, Aug. 2008).

\[Kleinrock 1961\] L. Kleinrock, "Information Flow in Large Communication Networks," RLE Quarterly Progress Report, July 1961.

\[Kleinrock 1964\] L. Kleinrock, Communication Nets: Stochastic Message Flow and Delay, McGraw-Hill, New York, NY, 1964.

\[Kleinrock 1975\] L. Kleinrock, Queuing Systems, Vol. 1, John Wiley, New York, 1975.

\[Kleinrock 1975b\] L. Kleinrock, F. A. Tobagi, "Packet Switching in Radio Channels: Part I---Carrier Sense Multiple-Access Modes and Their Throughput-Delay Characteristics," IEEE Transactions on Communications, Vol. 23, No. 12 (Dec. 1975), pp. 1400--1416.

\[Kleinrock 1976\] L. Kleinrock, Queuing Systems, Vol. 2, John Wiley, New York, 1976.

\[Kleinrock 2004\] L. Kleinrock, "The Birth of the Internet," http://www.lk.cs.ucla.edu/LK/Inet/birth.html

\[Kohler 2006\] E. Kohler, M. Handley, S.
Floyd, "DDCP: Designing DCCP: +Congestion Control Without Reliability," Proc. 2006 ACM SIGCOMM (Pisa, +Italy, Sept. 2006). + +\[Kolding 2003\] T. Kolding, K. Pedersen, J. Wigard, F. Frederiksen, P. +Mogensen, "High Speed Downlink Packet Access: WCDMA Evolution," IEEE +Vehicular Technology Society News (Feb. 2003), pp. 4--10. + +\[Koponen 2010\] T. Koponen, M. Casado, N. Gude, J. Stribling, L. +Poutievski, M. Zhu, R. Ramanathan, Y. Iwata, H. Inoue, T. Hama, S. +Shenker, "Onix: A Distributed Control Platform for Large-Scale +Production Networks," 9th USENIX conference on Operating systems design +and implementation (OSDI'10), pp. 1--6. + +\[Koponen 2011\] T. Koponen, S. Shenker, H. Balakrishnan, N. Feamster, +I. Ganichev, A. Ghodsi, P. B. Godfrey, N. McKeown, G. Parulkar, B. +Raghavan, J. Rexford, S. Arianfar, D. Kuptsov, "Architecting for +Innovation," ACM Computer Communications Review, 2011. + +\[Korhonen 2003\] J. Korhonen, Introduction to 3G Mobile Communications, +2nd ed., Artech House, 2003. + +\[Koziol 2003\] J. Koziol, Intrusion Detection with Snort, Sams +Publishing, 2003. + +\[Kreutz 2015\] D. Kreutz, F.M.V. Ramos, P. Esteves Verissimo, C. +Rothenberg, S. Azodolmolky, S. Uhlig, "Software-Defined Networking: A +Comprehensive Survey," Proceedings of the IEEE, Vol. 103, No. 1 +(Jan. 2015), pp. 14-76. This paper is also being updated at +https://github.com/SDN-Survey/latex/wiki + +\[Krishnamurthy 2001\] B. Krishnamurthy, J. Rexford, Web Protocols and +Practice: HTTP/ 1.1, Networking Protocols, and Traffic Measurement, +Addison-Wesley, Boston, MA, 2001. + +\[Kulkarni 2005\] S. Kulkarni, C. Rosenberg, "Opportunistic Scheduling: +Generalizations to Include Multiple Constraints, Multiple Interfaces, +and Short Term Fairness," Wireless Networks, 11 (2005), 557--569. + +\[Kumar 2006\] R. Kumar, K.W. Ross, "Optimal Peer-Assisted File +Distribution: Single and Multi-Class Problems," IEEE Workshop on Hot +Topics in Web Systems and Technologies (Boston, MA, 2006). + +\[Labovitz 1997\] C. Labovitz, G. R. Malan, F. Jahanian, "Internet +Routing Instability," Proc. 1997 ACM SIGCOMM (Cannes, France, +Sept. 1997), pp. 115--126. + +\[Labovitz 2010\] C. Labovitz, S. Iekel-Johnson, D. McPherson, J. +Oberheide, F. Jahanian, "Internet Inter-Domain Traffic," Proc. 2010 ACM +SIGCOMM. + +\[Labrador 1999\] M. Labrador, S. Banerjee, "Packet Dropping Policies +for ATM and IP Networks," IEEE Communications Surveys, Vol. 2, No. 3 +(Third Quarter 1999), pp. 2--14. + +\[Lacage 2004\] M. Lacage, M.H. Manshaei, T. Turletti, "IEEE 802.11 Rate +Adaptation: A Practical Approach," ACM Int. Symposium on Modeling, +Analysis, and Simulation of Wireless and Mobile Systems (MSWiM) (Venice, +Italy, Oct. 2004). + +\[Lakhina 2004\] A. Lakhina, M. Crovella, C. Diot, "Diagnosing +Network-Wide Traffic Anomalies," Proc. 2004 ACM SIGCOMM. + +\[Lakhina 2005\] A. Lakhina, M. Crovella, C. Diot, "Mining Anomalies +Using Traffic Feature Distributions," Proc. 2005 ACM SIGCOMM. + +\[Lakshman 1997\] T. V. Lakshman, U. Madhow, "The Performance of TCP/IP +for Networks with High Bandwidth-Delay Products and Random Loss," +IEEE/ACM Transactions on Networking, Vol. 5, No. 3 (1997), pp. 336--350. + +\[Lakshman 2004\] T. V. Lakshman, T. Nandagopal, R. Ramjee, K. Sabnani, +T. Woo, "The SoftRouter Architecture," Proc. 3nd ACM Workshop on Hot +Topics in Networks (Hotnets-III), Nov. 2004. + +\[Lam 1980\] S. Lam, "A Carrier Sense Multiple Access Protocol for Local +Networks," Computer Networks, Vol. 4 (1980), pp. 21--32. + +\[Lamport 1989\] L. 
Lamport, "The Part-Time Parliament," Technical +Report 49, Systems Research Center, Digital Equipment Corp., Palo Alto, +Sept. 1989. + +\[Lampson 1983\] Lampson, Butler W. "Hints for computer system design," +ACM SIGOPS Operating Systems Review, Vol. 17, No. 5, 1983. + +\[Lampson 1996\] B. Lampson, "How to Build a Highly Available System +Using Consensus," Proc. 10th International Workshop on Distributed +Algorithms (WDAG '96), Özalp Babaoglu and Keith Marzullo (Eds.), +Springer-Verlag, pp. 1--17. + +\[Lawton 2001\] G. Lawton, "Is IPv6 Finally Gaining Ground?" IEEE +Computer Magazine (Aug. 2001), pp. 11--15. + +\[LeBlond 2011\] S. Le Blond, C. Zhang, A. Legout, K. Ross, W. Dabbous. +2011, "I know where you are and what you are sharing: exploiting P2P +communications to invade users' privacy." 2011 ACM Internet Measurement +Conference, ACM, New York, NY, USA, pp. 45--60. + +\[Leighton 2009\] T. Leighton, "Improving Performance on the Internet," +Communications of the ACM, Vol. 52, No. 2 (Feb. 2009), pp. 44--51. + +\[Leiner 1998\] B. Leiner, V. Cerf, D. Clark, R. Kahn, L. Kleinrock, D. +Lynch, J. Postel, L. Roberts, S. Woolf, "A Brief History of the +Internet," http://www.isoc.org/internet/history/brief.html + +\[Leung 2006\] K. Leung, V. O.K. Li, "TCP in Wireless Networks: Issues, +Approaches, and Challenges," IEEE Commun. Surveys and Tutorials, Vol. 8, +No. 4 (2006), pp. 64--79. + +\[Levin 2012\] D. Levin, A. Wundsam, B. Heller, N. Handigol, A. +Feldmann, "Logically Centralized?: State Distribution Trade-offs in +Software Defined Networks," Proc. First Workshop on Hot Topics in +Software Defined Networks (Aug. 2012), pp. 1--6. + +\[Li 2004\] L. Li, D. Alderson, W. Willinger, J. Doyle, "A +First-Principles Approach to Understanding the Internet's Router-Level +Topology," Proc. 2004 ACM SIGCOMM (Portland, OR, Aug. 2004). + +\[Li 2007\] J. Li, M. Guidero, Z. Wu, E. Purpus, T. Ehrenkranz, "BGP +Routing Dynamics Revisited." ACM Computer Communication Review +(Apr. 2007). + +\[Li 2015\] S.Q. Li, "Building Softcom Ecosystem Foundation," Open +Networking Summit, 2015. + +\[Lin 2001\] Y. Lin, I. Chlamtac, Wireless and Mobile Network +Architectures, John Wiley and Sons, New York, NY, 2001. + +\[Liogkas 2006\] N. Liogkas, R. Nelson, E. Kohler, L. Zhang, "Exploiting +BitTorrent for Fun (but Not Profit)," 6th International Workshop on +Peer-to-Peer Systems (IPTPS 2006). + +\[Liu 2003\] J. Liu, I. Matta, M. Crovella, "End-to-End Inference of +Loss Nature in a Hybrid Wired/Wireless Environment," Proc. WiOpt'03: +Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks. + +\[Locher 2006\] T. Locher, P. Moor, S. Schmid, R. Wattenhofer, "Free +Riding in BitTorrent is Cheap," Proc. ACM HotNets 2006 (Irvine CA, +Nov. 2006). + +\[Lui 2004\] J. Lui, V. Misra, D. Rubenstein, "On the Robustness of Soft +State Protocols," Proc. IEEE Int. Conference on Network Protocols (ICNP +'04), pp. 50--60. + +\[Mahdavi 1997\] J. Mahdavi, S. Floyd, "TCP-Friendly Unicast Rate-Based +Flow Control," unpublished note (Jan. 1997). + +\[MaxMind 2016\] http://www.maxmind.com/app/ip-location + +\[Maymounkov 2002\] P. Maymounkov, D. Mazières. "Kademlia: A +Peer-to-Peer Information System Based on the XOR Metric." Proceedings of +the 1st International Workshop on Peerto-Peer Systems (IPTPS '02) +(Mar. 2002), pp. 53--65. + +\[McKeown 1997a\] N. McKeown, M. Izzard, A. Mekkittikul, W. Ellersick, +M. Horowitz, "The Tiny Tera: A Packet Switch Core," IEEE Micro Magazine +(Jan.--Feb. 1997). + +\[McKeown 1997b\] N. 
McKeown, "A Fast Switched Backplane for a Gigabit +Switched Router," Business Communications Review, Vol. 27, No. 12. +http://tinytera.stanford.edu/\~nickm/papers/cisco_fasts_wp.pdf + +\[McKeown 2008\] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, +L. Peterson, J. Rexford, S. Shenker, J. Turner. 2008. OpenFlow: Enabling +Innovation in Campus Networks. SIGCOMM Comput. Commun. Rev. 38, 2 +(Mar. 2008), pp. 69--74. + +\[McQuillan 1980\] J. McQuillan, I. Richer, E. Rosen, "The New Routing +Algorithm for the Arpanet," IEEE Transactions on Communications, Vol. +28, No. 5 (May 1980), pp. 711--719. + +\[Metcalfe 1976\] R. M. Metcalfe, D. R. Boggs. "Ethernet: Distributed +Packet Switching for Local Computer Networks," Communications of the +Association for Computing Machinery, Vol. 19, No. 7 (July 1976), +pp. 395--404. + +\[Meyers 2004\] A. Myers, T. Ng, H. Zhang, "Rethinking the Service +Model: Scaling Ethernet to a Million Nodes," ACM Hotnets Conference, +2004. + +\[MFA Forum 2016\] IP/MPLS Forum homepage, http://www.ipmplsforum.org/ + +\[Mockapetris 1988\] P. V. Mockapetris, K. J. Dunlap, "Development of +the Domain Name System," Proc. 1988 ACM SIGCOMM (Stanford, CA, +Aug. 1988). + +\[Mockapetris 2005\] P. Mockapetris, Sigcomm Award Lecture, video +available at http://www.postel.org/sigcomm + +\[Molinero-Fernandez 2002\] P. Molinaro-Fernandez, N. McKeown, H. Zhang, +"Is IP Going to Take Over the World (of Communications)?" Proc. 2002 ACM +Hotnets. + +\[Molle 1987\] M. L. Molle, K. Sohraby, A. N. Venetsanopoulos, +"Space-Time Models of Asynchronous CSMA Protocols for Local Area +Networks," IEEE Journal on Selected Areas in Communications, Vol. 5, +No. 6 (1987), pp. 956--968. + +\[Moore 2001\] D. Moore, G. Voelker, S. Savage, "Inferring Internet +Denial of Service Activity," Proc. 2001 USENIX Security Symposium +(Washington, DC, Aug. 2001). + +\[Motorola 2007\] Motorola, "Long Term Evolution (LTE): A Technical +Overview," +http://www.motorola.com/staticfiles/Business/Solutions/Industry%20Solutions/Service%20Providers/Wireless%20Operators/LTE/\_Document/Static%20Files/6834_MotDoc_New.pdf + +\[Mouly 1992\] M. Mouly, M. Pautet, The GSM System for Mobile +Communications, Cell and Sys, Palaiseau, France, 1992. + +\[Moy 1998\] J. Moy, OSPF: Anatomy of An Internet Routing Protocol, +Addison-Wesley, Reading, MA, 1998. + +\[Mukherjee 1997\] B. Mukherjee, Optical Communication Networks, +McGraw-Hill, 1997. + +\[Mukherjee 2006\] B. Mukherjee, Optical WDM Networks, Springer, 2006. + +\[Mysore 2009\] R. N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. +Miri, S. Radhakrishnan, V. Subramanya, A. Vahdat, "PortLand: A Scalable +Fault-Tolerant Layer 2 Data Center Network Fabric," Proc. 2009 ACM +SIGCOMM. + +\[Nahum 2002\] E. Nahum, T. Barzilai, D. Kandlur, "Performance Issues in +WWW Servers," IEEE/ACM Transactions on Networking, Vol 10, No. 1 +(Feb. 2002). + +\[Netflix Open Connect 2016\] Netflix Open Connect CDN, 2016, https:// +openconnect.netflix.com/ + +\[Netflix Video 1\] Designing Netflix's Content Delivery System, D. +Fulllager, 2014, https://www.youtube.com/watch?v=LkLLpYdDINA + +\[Netflix Video 2\] Scaling the Netflix Global CDN, D. Temkin, 2015, +https://www .youtube.com/watch?v=tbqcsHg-Q_o + +\[Neumann 1997\] R. Neumann, "Internet Routing Black Hole," The Risks +Digest: Forum on Risks to the Public in Computers and Related Systems, +Vol. 19, No. 12 (May 1997). +http://catless.ncl.ac.uk/Risks/19.12.html#subj1.1 + +\[Neville-Neil 2009\] G. Neville-Neil, "Whither Sockets?" 
Communications of the ACM, Vol. 52, No. 6 (June 2009), pp. 51--55.

\[Nicholson 2006\] A. Nicholson, Y. Chawathe, M. Chen, B. Noble, D. Wetherall, "Improved Access Point Selection," Proc. 2006 ACM Mobisys Conference (Uppsala, Sweden, 2006).

\[Nielsen 1997\] H. F. Nielsen, J. Gettys, A. Baird-Smith, E. Prud'hommeaux, H. W. Lie, C. Lilley, "Network Performance Effects of HTTP/1.1, CSS1, and PNG," W3C Document, 1997 (also appears in Proc. 1997 ACM SIGCOMM (Cannes, France, Sept. 1997), pp. 155--166).

\[NIST 2001\] National Institute of Standards and Technology, "Advanced Encryption Standard (AES)," Federal Information Processing Standards 197, Nov. 2001, http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf

\[NIST IPv6 2015\] US National Institute of Standards and Technology, "Estimating IPv6 & DNSSEC Deployment SnapShots," http://fedv6-deployment.antd.nist.gov/snapall.html

\[Nmap 2012\] Nmap homepage, http://www.insecure.org/nmap

\[Nonnenmacher 1998\] J. Nonnenmacher, E. Biersack, D. Towsley, "Parity-Based Loss Recovery for Reliable Multicast Transmission," IEEE/ACM Transactions on Networking, Vol. 6, No. 4 (Aug. 1998), pp. 349--361.

\[Nygren 2010\] E. Nygren, R. K. Sitaraman, J. Sun, "The Akamai Network: A Platform for High-Performance Internet Applications," SIGOPS Oper. Syst. Rev. 44, 3 (Aug. 2010), pp. 2--19.

\[ONF 2016\] Open Networking Foundation, Technical Library, https://www.opennetworking.org/sdn-resources/technical-library

\[ONOS 2016\] Open Network Operating System (ONOS), "Architecture Guide," https://wiki.onosproject.org/display/ONOS/Architecture+Guide, 2016.

\[OpenFlow 2009\] Open Networking Foundation, "OpenFlow Switch Specification 1.0.0, TS-001," https://www.opennetworking.org/images/stories/downloads/sdnresources/onf-specifications/openflow/openflow-spec-v1.0.0.pdf

\[OpenDaylight Lithium 2016\] OpenDaylight, "Lithium," https://www.opendaylight.org/lithium

\[OSI 2012\] International Organization for Standardization homepage, http://www.iso.org/iso/en/ISOOnline.frontpage

\[Osterweil 2012\] E. Osterweil, D. McPherson, S. DiBenedetto, C. Papadopoulos, D. Massey, "Behavior of DNS Top Talkers," Passive and Active Measurement Conference, 2012.

\[Padhye 2000\] J. Padhye, V. Firoiu, D. Towsley, J. Kurose, "Modeling TCP Reno Performance: A Simple Model and Its Empirical Validation," IEEE/ACM Transactions on Networking, Vol. 8, No. 2 (Apr. 2000), pp. 133--145.

\[Padhye 2001\] J. Padhye, S. Floyd, "On Inferring TCP Behavior," Proc. 2001 ACM SIGCOMM (San Diego, CA, Aug. 2001).

\[Palat 2009\] S. Palat, P. Godin, "The LTE Network Architecture: A Comprehensive Tutorial," in LTE---The UMTS Long Term Evolution: From Theory to Practice. Also available as a standalone Alcatel white paper.

\[Panda 2013\] A. Panda, C. Scott, A. Ghodsi, T. Koponen, S. Shenker, "CAP for Networks," Proc. ACM HotSDN '13, pp. 91--96.

\[Parekh 1993\] A. Parekh, R. Gallager, "A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks: The Single-Node Case," IEEE/ACM Transactions on Networking, Vol. 1, No. 3 (June 1993), pp. 344--357.

\[Partridge 1992\] C. Partridge, S. Pink, "An Implementation of the Revised Internet Stream Protocol (ST-2)," Journal of Internetworking: Research and Experience, Vol. 3, No. 1 (Mar. 1992).

\[Partridge 1998\] C. Partridge, et al., "A Fifty Gigabit per Second IP Router," IEEE/ACM Transactions on Networking, Vol. 6, No. 3 (June 1998), pp. 237--248.

\[Pathak 2010\] A. Pathak, Y. A. Wang, C. Huang, A. Greenberg, Y. C. Hu, J. Li, K. W. Ross, "Measuring and Evaluating TCP Splitting for Cloud Services," Passive and Active Measurement (PAM) Conference (Zurich, 2010).

\[Perkins 1994\] A. Perkins, "Networking with Bob Metcalfe," The Red Herring Magazine (Nov. 1994).

\[Perkins 1998\] C. Perkins, O. Hodson, V. Hardman, "A Survey of Packet Loss Recovery Techniques for Streaming Audio," IEEE Network Magazine (Sept./Oct. 1998), pp. 40--47.

\[Perkins 1998b\] C. Perkins, Mobile IP: Design Principles and Practice, Addison-Wesley, Reading, MA, 1998.

\[Perkins 2000\] C. Perkins, Ad Hoc Networking, Addison-Wesley, Reading, MA, 2000.

\[Perlman 1999\] R. Perlman, Interconnections: Bridges, Routers, Switches, and Internetworking Protocols, 2nd ed., Addison-Wesley Professional Computing Series, Reading, MA, 1999.

\[PGPI 2016\] The International PGP homepage, http://www.pgpi.org

\[Phifer 2000\] L. Phifer, "The Trouble with NAT," The Internet Protocol Journal, Vol. 3, No. 4 (Dec. 2000), http://www.cisco.com/warp/public/759/ipj_3-4/ipj\_3-4_nat.html

\[Piatek 2007\] M. Piatek, T. Isdal, T. Anderson, A. Krishnamurthy, A. Venkataramani, "Do Incentives Build Robustness in BitTorrent?," Proc. NSDI (2007).

\[Piatek 2008\] M. Piatek, T. Isdal, A. Krishnamurthy, T. Anderson, "One Hop Reputations for Peer-to-Peer File Sharing Workloads," Proc. NSDI (2008).

\[Pickholtz 1982\] R. Pickholtz, D. Schilling, L. Milstein, "Theory of Spread Spectrum Communication---A Tutorial," IEEE Transactions on Communications, Vol. 30, No. 5 (May 1982), pp. 855--884.

\[PingPlotter 2016\] PingPlotter homepage, http://www.pingplotter.com

\[Piscatello 1993\] D. Piscatello, A. Lyman Chapin, Open Systems Networking, Addison-Wesley, Reading, MA, 1993.

\[Pomeranz 2010\] H. Pomeranz, "Practical, Visual, Three-Dimensional Pedagogy for Internet Protocol Packet Header Control Fields," https://righteousit.wordpress.com/2010/06/27/practical-visual-three-dimensional-pedagogy-for-internet-protocol-packet-header-control-fields/, June 2010.

\[Potaroo 2016\] "Growth of the BGP Table--1994 to Present," http://bgp.potaroo.net/

\[PPLive 2012\] PPLive homepage, http://www.pplive.com

\[Qazi 2013\] Z. Qazi, C. Tu, L. Chiang, R. Miao, V. Sekar, M. Yu, "SIMPLE-fying Middlebox Policy Enforcement Using SDN," ACM SIGCOMM Conference (Aug. 2013), pp. 27--38.

\[Quagga 2012\] Quagga, "Quagga Routing Suite," http://www.quagga.net/

\[Quittner 1998\] J. Quittner, M. Slatalla, Speeding the Net: The Inside Story of Netscape and How It Challenged Microsoft, Atlantic Monthly Press, 1998.

\[Quova 2016\] www.quova.com

\[Ramakrishnan 1990\] K. K. Ramakrishnan, R. Jain, "A Binary Feedback Scheme for Congestion Avoidance in Computer Networks," ACM Transactions on Computer Systems, Vol. 8, No. 2 (May 1990), pp. 158--181.

\[Raman 1999\] S. Raman, S. McCanne, "A Model, Analysis, and Protocol Framework for Soft State-Based Communication," Proc. 1999 ACM SIGCOMM (Boston, MA, Aug. 1999).

\[Raman 2007\] B. Raman, K. Chebrolu, "Experiences in Using WiFi for Rural Internet in India," IEEE Communications Magazine, Special Issue on New Directions in Networking Technologies in Emerging Economies (Jan. 2007).

\[Ramaswami 2010\] R. Ramaswami, K. Sivarajan, G. Sasaki, Optical Networks: A Practical Perspective, Morgan Kaufmann Publishers, 2010.

\[Ramjee 1994\] R. Ramjee, J. Kurose, D. Towsley, H.
Schulzrinne, "Adaptive Playout Mechanisms for Packetized Audio Applications in Wide-Area Networks," Proc. 1994 IEEE INFOCOM.

\[Rao 2011\] A. S. Rao, Y. S. Lim, C. Barakat, A. Legout, D. Towsley, W. Dabbous, "Network Characteristics of Video Streaming Traffic," Proc. 2011 ACM CoNEXT (Tokyo).

\[Ren 2006\] S. Ren, L. Guo, X. Zhang, "ASAP: An AS-Aware Peer-Relay Protocol for High Quality VoIP," Proc. 2006 IEEE ICDCS (Lisboa, Portugal, July 2006).

\[Rescorla 2001\] E. Rescorla, SSL and TLS: Designing and Building Secure Systems, Addison-Wesley, Boston, 2001.

\[RFC 001\] S. Crocker, "Host Software," RFC 001 (the very first RFC!), Apr. 1969.

\[RFC 768\] J. Postel, "User Datagram Protocol," RFC 768, Aug. 1980.

\[RFC 791\] J. Postel, "Internet Protocol: DARPA Internet Program Protocol Specification," RFC 791, Sept. 1981.

\[RFC 792\] J. Postel, "Internet Control Message Protocol," RFC 792, Sept. 1981.

\[RFC 793\] J. Postel, "Transmission Control Protocol," RFC 793, Sept. 1981.

\[RFC 801\] J. Postel, "NCP/TCP Transition Plan," RFC 801, Nov. 1981.

\[RFC 826\] D. C. Plummer, "An Ethernet Address Resolution Protocol---or---Converting Network Protocol Addresses to 48-bit Ethernet Address for Transmission on Ethernet Hardware," RFC 826, Nov. 1982.

\[RFC 829\] V. Cerf, "Packet Satellite Technology Reference Sources," RFC 829, Nov. 1982.

\[RFC 854\] J. Postel, J. Reynolds, "TELNET Protocol Specification," RFC 854, May 1983.

\[RFC 950\] J. Mogul, J. Postel, "Internet Standard Subnetting Procedure," RFC 950, Aug. 1985.

\[RFC 959\] J. Postel and J. Reynolds, "File Transfer Protocol (FTP)," RFC 959, Oct. 1985.

\[RFC 1034\] P. V. Mockapetris, "Domain Names---Concepts and Facilities," RFC 1034, Nov. 1987.

\[RFC 1035\] P. Mockapetris, "Domain Names---Implementation and Specification," RFC 1035, Nov. 1987.

\[RFC 1058\] C. Hedrick, "Routing Information Protocol," RFC 1058, June 1988.

\[RFC 1071\] R. Braden, D. Borman, and C. Partridge, "Computing the Internet Checksum," RFC 1071, Sept. 1988.

\[RFC 1122\] R. Braden, "Requirements for Internet Hosts---Communication Layers," RFC 1122, Oct. 1989.

\[RFC 1123\] R. Braden, ed., "Requirements for Internet Hosts---Application and Support," RFC 1123, Oct. 1989.

\[RFC 1142\] D. Oran, "OSI IS-IS Intra-Domain Routing Protocol," RFC 1142, Feb. 1990.

\[RFC 1190\] C. Topolcic, "Experimental Internet Stream Protocol: Version 2 (ST-II)," RFC 1190, Oct. 1990.

\[RFC 1256\] S. Deering, "ICMP Router Discovery Messages," RFC 1256, Sept. 1991.

\[RFC 1320\] R. Rivest, "The MD4 Message-Digest Algorithm," RFC 1320, Apr. 1992.

\[RFC 1321\] R. Rivest, "The MD5 Message-Digest Algorithm," RFC 1321, Apr. 1992.

\[RFC 1323\] V. Jacobson, R. Braden, D. Borman, "TCP Extensions for High Performance," RFC 1323, May 1992.

\[RFC 1422\] S. Kent, "Privacy Enhancement for Internet Electronic Mail: Part II: Certificate-Based Key Management," RFC 1422, Feb. 1993.

\[RFC 1546\] C. Partridge, T. Mendez, W. Milliken, "Host Anycasting Service," RFC 1546, Nov. 1993.

\[RFC 1584\] J. Moy, "Multicast Extensions to OSPF," RFC 1584, Mar. 1994.

\[RFC 1633\] R. Braden, D. Clark, S. Shenker, "Integrated Services in the Internet Architecture: an Overview," RFC 1633, June 1994.

\[RFC 1636\] R. Braden, D. Clark, S. Crocker, C. Huitema, "Report of IAB Workshop on Security in the Internet Architecture," RFC 1636, Nov. 1994.

\[RFC 1700\] J. Reynolds, J. Postel, "Assigned Numbers," RFC 1700, Oct. 1994.

\[RFC 1752\] S.
Bradner, A. Mankin, "The Recommendations for the IP Next Generation Protocol," RFC 1752, Jan. 1995.

\[RFC 1918\] Y. Rekhter, B. Moskowitz, D. Karrenberg, G. J. de Groot, E. Lear, "Address Allocation for Private Internets," RFC 1918, Feb. 1996.

\[RFC 1930\] J. Hawkinson, T. Bates, "Guidelines for Creation, Selection, and Registration of an Autonomous System (AS)," RFC 1930, Mar. 1996.

\[RFC 1939\] J. Myers, M. Rose, "Post Office Protocol---Version 3," RFC 1939, May 1996.

\[RFC 1945\] T. Berners-Lee, R. Fielding, H. Frystyk, "Hypertext Transfer Protocol---HTTP/1.0," RFC 1945, May 1996.

\[RFC 2003\] C. Perkins, "IP Encapsulation Within IP," RFC 2003, Oct. 1996.

\[RFC 2004\] C. Perkins, "Minimal Encapsulation Within IP," RFC 2004, Oct. 1996.

\[RFC 2018\] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP Selective Acknowledgment Options," RFC 2018, Oct. 1996.

\[RFC 2131\] R. Droms, "Dynamic Host Configuration Protocol," RFC 2131, Mar. 1997.

\[RFC 2136\] P. Vixie, S. Thomson, Y. Rekhter, J. Bound, "Dynamic Updates in the Domain Name System," RFC 2136, Apr. 1997.

\[RFC 2205\] R. Braden, Ed., L. Zhang, S. Berson, S. Herzog, S. Jamin, "Resource ReSerVation Protocol (RSVP)---Version 1 Functional Specification," RFC 2205, Sept. 1997.

\[RFC 2210\] J. Wroclawski, "The Use of RSVP with IETF Integrated Services," RFC 2210, Sept. 1997.

\[RFC 2211\] J. Wroclawski, "Specification of the Controlled-Load Network Element Service," RFC 2211, Sept. 1997.

\[RFC 2215\] S. Shenker, J. Wroclawski, "General Characterization Parameters for Integrated Service Network Elements," RFC 2215, Sept. 1997.

\[RFC 2326\] H. Schulzrinne, A. Rao, R. Lanphier, "Real Time Streaming Protocol (RTSP)," RFC 2326, Apr. 1998.

\[RFC 2328\] J. Moy, "OSPF Version 2," RFC 2328, Apr. 1998.

\[RFC 2420\] H. Kummert, "The PPP Triple-DES Encryption Protocol (3DESE)," RFC 2420, Sept. 1998.

\[RFC 2453\] G. Malkin, "RIP Version 2," RFC 2453, Nov. 1998.

\[RFC 2460\] S. Deering, R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification," RFC 2460, Dec. 1998.

\[RFC 2475\] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, W. Weiss, "An Architecture for Differentiated Services," RFC 2475, Dec. 1998.

\[RFC 2578\] K. McCloghrie, D. Perkins, J. Schoenwaelder, "Structure of Management Information Version 2 (SMIv2)," RFC 2578, Apr. 1999.

\[RFC 2579\] K. McCloghrie, D. Perkins, J. Schoenwaelder, "Textual Conventions for SMIv2," RFC 2579, Apr. 1999.

\[RFC 2580\] K. McCloghrie, D. Perkins, J. Schoenwaelder, "Conformance Statements for SMIv2," RFC 2580, Apr. 1999.

\[RFC 2597\] J. Heinanen, F. Baker, W. Weiss, J. Wroclawski, "Assured Forwarding PHB Group," RFC 2597, June 1999.

\[RFC 2616\] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, "Hypertext Transfer Protocol---HTTP/1.1," RFC 2616, June 1999.

\[RFC 2663\] P. Srisuresh, M. Holdrege, "IP Network Address Translator (NAT) Terminology and Considerations," RFC 2663, Aug. 1999.

\[RFC 2702\] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus, "Requirements for Traffic Engineering Over MPLS," RFC 2702, Sept. 1999.

\[RFC 2827\] P. Ferguson, D. Senie, "Network Ingress Filtering: Defeating Denial of Service Attacks which Employ IP Source Address Spoofing," RFC 2827, May 2000.

\[RFC 2865\] C. Rigney, S. Willens, A. Rubens, W. Simpson, "Remote Authentication Dial In User Service (RADIUS)," RFC 2865, June 2000.

\[RFC 3007\] B.
Wellington, "Secure Domain Name System (DNS) Dynamic Update," RFC 3007, Nov. 2000.

\[RFC 3022\] P. Srisuresh, K. Egevang, "Traditional IP Network Address Translator (Traditional NAT)," RFC 3022, Jan. 2001.

\[RFC 3031\] E. Rosen, A. Viswanathan, R. Callon, "Multiprotocol Label Switching Architecture," RFC 3031, Jan. 2001.

\[RFC 3032\] E. Rosen, D. Tappan, G. Fedorkow, Y. Rekhter, D. Farinacci, T. Li, A. Conta, "MPLS Label Stack Encoding," RFC 3032, Jan. 2001.

\[RFC 3168\] K. Ramakrishnan, S. Floyd, D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP," RFC 3168, Sept. 2001.

\[RFC 3209\] D. Awduche, L. Berger, D. Gan, T. Li, V. Srinivasan, G. Swallow, "RSVP-TE: Extensions to RSVP for LSP Tunnels," RFC 3209, Dec. 2001.

\[RFC 3221\] G. Huston, "Commentary on Inter-Domain Routing in the Internet," RFC 3221, Dec. 2001.

\[RFC 3232\] J. Reynolds, "Assigned Numbers: RFC 1700 Is Replaced by an On-line Database," RFC 3232, Jan. 2002.

\[RFC 3234\] B. Carpenter, S. Brim, "Middleboxes: Taxonomy and Issues," RFC 3234, Feb. 2002.

\[RFC 3246\] B. Davie, A. Charny, J. C. R. Bennett, K. Benson, J. Y. Le Boudec, W. Courtney, S. Davari, V. Firoiu, D. Stiliadis, "An Expedited Forwarding PHB (Per-Hop Behavior)," RFC 3246, Mar. 2002.

\[RFC 3260\] D. Grossman, "New Terminology and Clarifications for Diffserv," RFC 3260, Apr. 2002.

\[RFC 3261\] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, E. Schooler, "SIP: Session Initiation Protocol," RFC 3261, July 2002.

\[RFC 3272\] J. Boyle, V. Gill, A. Hannan, D. Cooper, D. Awduche, B. Christian, W. S. Lai, "Overview and Principles of Internet Traffic Engineering," RFC 3272, May 2002.

\[RFC 3286\] L. Ong, J. Yoakum, "An Introduction to the Stream Control Transmission Protocol (SCTP)," RFC 3286, May 2002.

\[RFC 3346\] J. Boyle, V. Gill, A. Hannan, D. Cooper, D. Awduche, B. Christian, W. S. Lai, "Applicability Statement for Traffic Engineering with MPLS," RFC 3346, Aug. 2002.

\[RFC 3390\] M. Allman, S. Floyd, C. Partridge, "Increasing TCP's Initial Window," RFC 3390, Oct. 2002.

\[RFC 3410\] J. Case, R. Mundy, D. Partain, "Introduction and Applicability Statements for Internet Standard Management Framework," RFC 3410, Dec. 2002.

\[RFC 3414\] U. Blumenthal and B. Wijnen, "User-based Security Model (USM) for Version 3 of the Simple Network Management Protocol (SNMPv3)," RFC 3414, Dec. 2002.

\[RFC 3416\] R. Presuhn, J. Case, K. McCloghrie, M. Rose, S. Waldbusser, "Version 2 of the Protocol Operations for the Simple Network Management Protocol (SNMP)," RFC 3416, Dec. 2002.

\[RFC 3439\] R. Bush, D. Meyer, "Some Internet Architectural Guidelines and Philosophy," RFC 3439, Dec. 2002.

\[RFC 3447\] J. Jonsson, B. Kaliski, "Public-Key Cryptography Standards (PKCS) #1: RSA Cryptography Specifications Version 2.1," RFC 3447, Feb. 2003.

\[RFC 3468\] L. Andersson, G. Swallow, "The Multiprotocol Label Switching (MPLS) Working Group Decision on MPLS Signaling Protocols," RFC 3468, Feb. 2003.

\[RFC 3469\] V. Sharma, Ed., F. Hellstrand, Ed., "Framework for Multi-Protocol Label Switching (MPLS)-based Recovery," RFC 3469, Feb. 2003. ftp://ftp.rfc-editor.org/in-notes/rfc3469.txt

\[RFC 3501\] M. Crispin, "Internet Message Access Protocol---Version 4rev1," RFC 3501, Mar. 2003.

\[RFC 3550\] H. Schulzrinne, S. Casner, R.
Frederick, V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications," RFC 3550, July 2003.

\[RFC 3588\] P. Calhoun, J. Loughney, E. Guttman, G. Zorn, J. Arkko, "Diameter Base Protocol," RFC 3588, Sept. 2003.

\[RFC 3649\] S. Floyd, "HighSpeed TCP for Large Congestion Windows," RFC 3649, Dec. 2003.

\[RFC 3746\] L. Yang, R. Dantu, T. Anderson, R. Gopal, "Forwarding and Control Element Separation (ForCES) Framework," RFC 3746, Apr. 2004.

\[RFC 3748\] B. Aboba, L. Blunk, J. Vollbrecht, J. Carlson, H. Levkowetz, Ed., "Extensible Authentication Protocol (EAP)," RFC 3748, June 2004.

\[RFC 3782\] S. Floyd, T. Henderson, A. Gurtov, "The NewReno Modification to TCP's Fast Recovery Algorithm," RFC 3782, Apr. 2004.

\[RFC 4213\] E. Nordmark, R. Gilligan, "Basic Transition Mechanisms for IPv6 Hosts and Routers," RFC 4213, Oct. 2005.

\[RFC 4271\] Y. Rekhter, T. Li, S. Hares, Ed., "A Border Gateway Protocol 4 (BGP-4)," RFC 4271, Jan. 2006.

\[RFC 4272\] S. Murphy, "BGP Security Vulnerabilities Analysis," RFC 4272, Jan. 2006.

\[RFC 4291\] R. Hinden, S. Deering, "IP Version 6 Addressing Architecture," RFC 4291, Feb. 2006.

\[RFC 4340\] E. Kohler, M. Handley, S. Floyd, "Datagram Congestion Control Protocol (DCCP)," RFC 4340, Mar. 2006.

\[RFC 4346\] T. Dierks, E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.1," RFC 4346, Apr. 2006.

\[RFC 4443\] A. Conta, S. Deering, M. Gupta, Ed., "Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification," RFC 4443, Mar. 2006.

\[RFC 4514\] K. Zeilenga, Ed., "Lightweight Directory Access Protocol (LDAP): String Representation of Distinguished Names," RFC 4514, June 2006.

\[RFC 4601\] B. Fenner, M. Handley, H. Holbrook, I. Kouvelas, "Protocol Independent Multicast---Sparse Mode (PIM-SM): Protocol Specification (Revised)," RFC 4601, Aug. 2006.

\[RFC 4632\] V. Fuller, T. Li, "Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan," RFC 4632, Aug. 2006.

\[RFC 4960\] R. Stewart, ed., "Stream Control Transmission Protocol," RFC 4960, Sept. 2007.

\[RFC 4987\] W. Eddy, "TCP SYN Flooding Attacks and Common Mitigations," RFC 4987, Aug. 2007.

\[RFC 5000\] RFC Editor, "Internet Official Protocol Standards," RFC 5000, May 2008.

\[RFC 5109\] A. Li (ed.), "RTP Payload Format for Generic Forward Error Correction," RFC 5109, Dec. 2007.

\[RFC 5216\] D. Simon, B. Aboba, R. Hurst, "The EAP-TLS Authentication Protocol," RFC 5216, Mar. 2008.

\[RFC 5218\] D. Thaler, B. Aboba, "What Makes for a Successful Protocol?," RFC 5218, July 2008.

\[RFC 5321\] J. Klensin, "Simple Mail Transfer Protocol," RFC 5321, Oct. 2008.

\[RFC 5322\] P. Resnick, Ed., "Internet Message Format," RFC 5322, Oct. 2008.

\[RFC 5348\] S. Floyd, M. Handley, J. Padhye, J. Widmer, "TCP Friendly Rate Control (TFRC): Protocol Specification," RFC 5348, Sept. 2008.

\[RFC 5389\] J. Rosenberg, R. Mahy, P. Matthews, D. Wing, "Session Traversal Utilities for NAT (STUN)," RFC 5389, Oct. 2008.

\[RFC 5411\] J. Rosenberg, "A Hitchhiker's Guide to the Session Initiation Protocol (SIP)," RFC 5411, Feb. 2009.

\[RFC 5681\] M. Allman, V. Paxson, E. Blanton, "TCP Congestion Control," RFC 5681, Sept. 2009.

\[RFC 5944\] C. Perkins, Ed., "IP Mobility Support for IPv4, Revised," RFC 5944, Nov. 2010.

\[RFC 6265\] A. Barth, "HTTP State Management Mechanism," RFC 6265, Apr. 2011.

\[RFC 6298\] V. Paxson, M. Allman, J.
+
+\[RFC 7020\] R. Housley, J. Curran, G. Huston, D. Conrad, "The Internet Numbers Registry System," RFC 7020, Aug. 2013.
+
+\[RFC 7094\] D. McPherson, D. Oran, D. Thaler, E. Osterweil, "Architectural Considerations of IP Anycast," RFC 7094, Jan. 2014.
+
+\[RFC 7323\] D. Borman, R. Braden, V. Jacobson, R. Scheffenegger, Ed., "TCP Extensions for High Performance," RFC 7323, Sept. 2014.
+
+\[RFC 7540\] M. Belshe, R. Peon, M. Thomson, Eds., "Hypertext Transfer Protocol Version 2 (HTTP/2)," RFC 7540, May 2015.
+
+\[Richter 2015\] P. Richter, M. Allman, R. Bush, V. Paxson, "A Primer on IPv4 Scarcity," ACM SIGCOMM Computer Communication Review, Vol. 45, No. 2 (Apr. 2015), pp. 21--32.
+
+\[Roberts 1967\] L. Roberts, T. Merrill, "Toward a Cooperative Network of Time-Shared Computers," AFIPS Fall Conference (Oct. 1966).
+
+\[Rodriguez 2010\] R. Rodrigues, P. Druschel, "Peer-to-Peer Systems," Communications of the ACM, Vol. 53, No. 10 (Oct. 2010), pp. 72--82.
+
+\[Rohde 2008\] Rohde & Schwarz, "UMTS Long Term Evolution (LTE) Technology Introduction," Application Note 1MA111.
+
+\[Rom 1990\] R. Rom, M. Sidi, Multiple Access Protocols: Performance and Analysis, Springer-Verlag, New York, 1990.
+
+\[Root Servers 2016\] Root Servers home page, http://www.root-servers.org/
+
+\[RSA 1978\] R. Rivest, A. Shamir, L. Adleman, "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems," Communications of the ACM, Vol. 21, No. 2 (Feb. 1978), pp. 120--126.
+
+\[RSA Fast 2012\] RSA Laboratories, "How Fast Is RSA?" http://www.rsa.com/rsalabs/node.asp?id=2215
+
+\[RSA Key 2012\] RSA Laboratories, "How Large a Key Should Be Used in the RSA Crypto System?" http://www.rsa.com/rsalabs/node.asp?id=2218
+
+\[Rubenstein 1998\] D. Rubenstein, J. Kurose, D. Towsley, "Real-Time Reliable Multicast Using Proactive Forward Error Correction," Proceedings of NOSSDAV '98 (Cambridge, UK, July 1998).
+
+\[Ruiz-Sanchez 2001\] M. Ruiz-Sánchez, E. Biersack, W. Dabbous, "Survey and Taxonomy of IP Address Lookup Algorithms," IEEE Network Magazine, Vol. 15, No. 2 (Mar./Apr. 2001), pp. 8--23.
+
+\[Saltzer 1984\] J. Saltzer, D. Reed, D. Clark, "End-to-End Arguments in System Design," ACM Transactions on Computer Systems (TOCS), Vol. 2, No. 4 (Nov. 1984), pp. 277--288.
+
+\[Sandvine 2015\] Sandvine, "Global Internet Phenomena Report, Spring 2015," http://www.sandvine.com/news/globalbroadbandtrends.asp, 2015.
+
+\[Sardar 2006\] B. Sardar, D. Saha, "A Survey of TCP Enhancements for Last-Hop Wireless Networks," IEEE Commun. Surveys and Tutorials, Vol. 8, No. 3 (2006), pp. 20--34.
+
+\[Saroiu 2002\] S. Saroiu, P. K. Gummadi, S. D. Gribble, "A Measurement Study of Peer-to-Peer File Sharing Systems," Proc. of Multimedia Computing and Networking (MMCN) (2002).
+
+\[Sauter 2014\] M. Sauter, From GSM to LTE-Advanced, John Wiley and Sons, 2014.
+
+\[Savage 2015\] D. Savage, J. Ng, S. Moore, D. Slice, P. Paluch, R. White, "Enhanced Interior Gateway Routing Protocol," Internet Draft, draft-savage-eigrp-04.txt, Aug. 2015.
+
+\[Saydam 1996\] T. Saydam, T. Magedanz, "From Networks and Network Management into Service and Service Management," Journal of Network and Systems Management, Vol. 4, No. 4 (Dec. 1996), pp. 345--348.
+
+\[Schiller 2003\] J. Schiller, Mobile Communications, 2nd edition, Addison-Wesley, 2003.
+
+\[Schneier 1995\] B. Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C, John Wiley and Sons, 1995.
+
+\[Schulzrinne-RTP 2012\] Henning Schulzrinne's RTP site, http://www.cs.columbia.edu/\~hgs/rtp
+
+\[Schulzrinne-SIP 2016\] Henning Schulzrinne's SIP site, http://www.cs.columbia.edu/\~hgs/sip
+
+\[Schwartz 1977\] M. Schwartz, Computer-Communication Network Design and Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1977.
+
+\[Schwartz 1980\] M. Schwartz, Information, Transmission, Modulation, and Noise, McGraw-Hill, New York, NY, 1980.
+
+\[Schwartz 1982\] M. Schwartz, "Performance Analysis of the SNA Virtual Route Pacing Control," IEEE Transactions on Communications, Vol. 30, No. 1 (Jan. 1982), pp. 172--184.
+
+\[Scourias 2012\] J. Scourias, "Overview of the Global System for Mobile Communications: GSM," http://www.privateline.com/PCS/GSM0.html
+
+\[SDNHub 2016\] SDNHub, "App Development Tutorials," http://sdnhub.org/tutorials/
+
+\[Segaller 1998\] S. Segaller, Nerds 2.0.1: A Brief History of the Internet, TV Books, New York, 1998.
+
+\[Sekar 2011\] V. Sekar, S. Ratnasamy, M. Reiter, N. Egi, G. Shi, "The Middlebox Manifesto: Enabling Innovation in Middlebox Deployment," Proc. 10th ACM Workshop on Hot Topics in Networks (HotNets), Article 21, 6 pages.
+
+\[Serpanos 2011\] D. Serpanos, T. Wolf, Architecture of Network Systems, Morgan Kaufmann Publishers, 2011.
+
+\[Shacham 1990\] N. Shacham, P. McKenney, "Packet Recovery in High-Speed Networks Using Coding and Buffer Management," Proc. 1990 IEEE INFOCOM (San Francisco, CA, Apr. 1990), pp. 124--131.
+
+\[Shaikh 2001\] A. Shaikh, R. Tewari, M. Agrawal, "On the Effectiveness of DNS-based Server Selection," Proc. 2001 IEEE INFOCOM.
+
+\[Singh 1999\] S. Singh, The Code Book: The Evolution of Secrecy from Mary, Queen of Scots to Quantum Cryptography, Doubleday Press, 1999.
+
+\[Singh 2015\] A. Singh, J. Ong, A. Agarwal, G. Anderson, A. Armistead, R. Bannon, S. Boving, G. Desai, B. Felderman, P. Germano, A. Kanagala, J. Provost, J. Simmons, E. Tanda, J. Wanderer, U. Hölzle, S. Stuart, A. Vahdat, "Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network," Proc. 2015 ACM SIGCOMM.
+
+\[SIP Software 2016\] H. Schulzrinne Software Package site, http://www.cs.columbia.edu/IRT/software
+
+\[Skoudis 2004\] E. Skoudis, L. Zeltser, Malware: Fighting Malicious Code, Prentice Hall, 2004.
+
+\[Skoudis 2006\] E. Skoudis, T. Liston, Counter Hack Reloaded: A Step-by-Step Guide to Computer Attacks and Effective Defenses (2nd Edition), Prentice Hall, 2006.
+
+\[Smith 2009\] J. Smith, "Fighting Physics: A Tough Battle," Communications of the ACM, Vol. 52, No. 7 (July 2009), pp. 60--65.
+
+\[Snort 2012\] Sourcefire Inc., Snort homepage, http://www.snort.org/
+
+\[Solensky 1996\] F. Solensky, "IPv4 Address Lifetime Expectations," in IPng: Internet Protocol Next Generation (S. Bradner, A. Mankin, ed.), Addison-Wesley, Reading, MA, 1996.
+
+\[Spragins 1991\] J. D. Spragins, Telecommunications Protocols and Design, Addison-Wesley, Reading, MA, 1991.
+
+\[Srikant 2004\] R. Srikant, The Mathematics of Internet Congestion Control, Birkhauser, 2004.
+
+\[Steinder 2002\] M. Steinder, A. Sethi, "Increasing Robustness of Fault Localization Through Analysis of Lost, Spurious, and Positive Symptoms," Proc. 2002 IEEE INFOCOM.
+
+\[Stevens 1990\] W. R. Stevens, Unix Network Programming, Prentice-Hall, Englewood Cliffs, NJ, 1990.
+
+\[Stevens 1994\] W. R. Stevens, TCP/IP Illustrated, Vol. 1: The Protocols, Addison-Wesley, Reading, MA, 1994.
+
+\[Stevens 1997\] W. R. Stevens, Unix Network Programming, Volume 1: Networking APIs: Sockets and XTI, 2nd edition, Prentice-Hall, Englewood Cliffs, NJ, 1997.
+
+\[Stewart 1999\] J. Stewart, BGP4: Interdomain Routing in the Internet, Addison-Wesley, 1999.
+
+\[Stone 1998\] J. Stone, M. Greenwald, C. Partridge, J. Hughes, "Performance of Checksums and CRCs Over Real Data," IEEE/ACM Transactions on Networking, Vol. 6, No. 5 (Oct. 1998), pp. 529--543.
+
+\[Stone 2000\] J. Stone, C. Partridge, "When Reality and the Checksum Disagree," Proc. 2000 ACM SIGCOMM (Stockholm, Sweden, Aug. 2000).
+
+\[Strayer 1992\] W. T. Strayer, B. Dempsey, A. Weaver, XTP: The Xpress Transfer Protocol, Addison-Wesley, Reading, MA, 1992.
+
+\[Stubblefield 2002\] A. Stubblefield, J. Ioannidis, A. Rubin, "Using the Fluhrer, Mantin, and Shamir Attack to Break WEP," Proceedings of 2002 Network and Distributed Systems Security Symposium (2002), pp. 17--22.
+
+\[Subramanian 2000\] M. Subramanian, Network Management: Principles and Practice, Addison-Wesley, Reading, MA, 2000.
+
+\[Subramanian 2002\] L. Subramanian, S. Agarwal, J. Rexford, R. Katz, "Characterizing the Internet Hierarchy from Multiple Vantage Points," Proc. 2002 IEEE INFOCOM.
+
+\[Sundaresan 2006\] K. Sundaresan, K. Papagiannaki, "The Need for Cross-layer Information in Access Point Selection," Proc. 2006 ACM Internet Measurement Conference (Rio de Janeiro, Oct. 2006).
+
+\[Suh 2006\] K. Suh, D. R. Figueiredo, J. Kurose, D. Towsley, "Characterizing and Detecting Relayed Traffic: A Case Study Using Skype," Proc. 2006 IEEE INFOCOM (Barcelona, Spain, Apr. 2006).
+
+\[Sunshine 1978\] C. Sunshine, Y. Dalal, "Connection Management in Transport Protocols," Computer Networks, North-Holland, Amsterdam, 1978.
+
+\[Tariq 2008\] M. Tariq, A. Zeitoun, V. Valancius, N. Feamster, M. Ammar, "Answering What-If Deployment and Configuration Questions with WISE," Proc. 2008 ACM SIGCOMM (Aug. 2008).
+
+\[TechOnLine 2012\] TechOnLine, "Protected Wireless Networks," online webcast tutorial, http://www.techonline.com/community/tech_topic/internet/21752
+
+\[Teixeira 2006\] R. Teixeira, J. Rexford, "Managing Routing Disruptions in Internet Service Provider Networks," IEEE Communications Magazine (Mar. 2006).
+
+\[Think 2012\] Technical History of Network Protocols, "Cyclades," http://www.cs.utexas.edu/users/chris/think/Cyclades/index.shtml
+
+\[Tian 2012\] Y. Tian, R. Dey, Y. Liu, K. W. Ross, "China's Internet: Topology Mapping and Geolocating," IEEE INFOCOM Mini-Conference 2012 (Orlando, FL, 2012).
+
+\[TLD list 2016\] TLD list maintained by Wikipedia, https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains
+
+\[Tobagi 1990\] F. Tobagi, "Fast Packet Switch Architectures for Broadband Integrated Networks," Proceedings of the IEEE, Vol. 78, No. 1 (Jan. 1990), pp. 133--167.
+
+\[TOR 2016\] Tor: Anonymity Online, http://www.torproject.org
+
+\[Torres 2011\] R. Torres, A. Finamore, J. R. Kim, M. M. Munafo, S. Rao, "Dissecting Video Server Selection Strategies in the YouTube CDN," Proc. 2011 Int. Conf. on Distributed Computing Systems.
+
+\[Tourrilhes 2014\] J. Tourrilhes, P. Sharma, S. Banerjee, J. Petit, "SDN and OpenFlow Evolution: A Standards Perspective," IEEE Computer Magazine, Nov. 2014, pp. 22--29.
+
+\[Turner 1988\] J. S. Turner, "Design of a Broadcast Packet Switching Network," IEEE Transactions on Communications, Vol. 36, No. 6 (June 1988), pp. 734--743.
Turner, "2G, 3G, 4G Wireless Tutorial," +http://blogs.nmscommunications.com/communications/2008/10/2g-3g-4g-wireless-tutorial.html + +\[UPnP Forum 2016\] UPnP Forum homepage, http://www.upnp.org/ + +\[van der Berg 2008\] R. van der Berg, "How the 'Net Works: An +Introduction to Peering and Transit," +http://arstechnica.com/guides/other/peering-and-transit.ars + +\[van der Merwe 1998\] J. van der Merwe, S. Rooney, I. Leslie, S. +Crosby, "The Tempest: A Practical Framework for Network +Programmability," IEEE Network, Vol. 12, No. 3 (May 1998), pp. 20--28. + +\[Varghese 1997\] G. Varghese, A. Lauck, "Hashed and Hierarchical Timing +Wheels: Efficient Data Structures for Implementing a Timer Facility," +IEEE/ACM Transactions on Networking, Vol. 5, No. 6 (Dec. 1997), +pp. 824--834. + +\[Vasudevan 2012\] S. Vasudevan, C. Diot, J. Kurose, D. Towsley, +"Facilitating Access Point Selection in IEEE 802.11 Wireless Networks," +Proc. 2005 ACM Internet Measurement Conference, (San Francisco CA, +Oct. 2005). + +\[Villamizar 1994\] C. Villamizar, C. Song. "High Performance tcp in +ansnet," ACM SIGCOMM Computer Communications Review, Vol. 24, No. 5 +(1994), pp. 45--60. + +\[Viterbi 1995\] A. Viterbi, CDMA: Principles of Spread Spectrum +Communication, Addison-Wesley, Reading, MA, 1995. + +\[Vixie 2009\] P. Vixie, "What DNS Is Not," Communications of the ACM, +Vol. 52, No. 12 (Dec. 2009), pp. 43--47. + +\[Wakeman 1992\] I. Wakeman, J. Crowcroft, Z. Wang, D. Sirovica, +"Layering Considered Harmful," IEEE Network (Jan. 1992), pp. 20--24. + +\[Waldrop 2007\] M. Waldrop, "Data Center in a Box," Scientific American +(July 2007). + +\[Wang 2004\] B. Wang, J. Kurose, P. Shenoy, D. Towsley, "Multimedia +Streaming via TCP: An Analytic Performance Study," Proc. 2004 ACM +Multimedia Conference (New York, NY, Oct. 2004). + +\[Wang 2008\] B. Wang, J. Kurose, P. Shenoy, D. Towsley, "Multimedia +Streaming via TCP: An Analytic Performance Study," ACM Transactions on +Multimedia Computing Communications and Applications (TOMCCAP), Vol. 4, +No. 2 (Apr. 2008), p. 16. 1--22. + +\[Wang 2010\] G. Wang, D. G. Andersen, M. Kaminsky, K. Papagiannaki, T. +S. E. Ng, M. Kozuch, M. Ryan, "c-Through: Part-time Optics in Data +Centers," Proc. 2010 ACM SIGCOMM. + +\[Wei 2006\] W. Wei, C. Zhang, H. Zang, J. Kurose, D. Towsley, +"Inference and Evaluation of Split-Connection Approaches in Cellular +Data Networks," Proc. Active and Passive Measurement Workshop (Adelaide, +Australia, Mar. 2006). + +\[Wei 2007\] D. X. Wei, C. Jin, S. H. Low, S. Hegde, "FAST TCP: +Motivation, Architecture, Algorithms, Performance," IEEE/ACM +Transactions on Networking (2007). + +\[Weiser 1991\] M. Weiser, "The Computer for the Twenty-First Century," +Scientific American (Sept. 1991): 94--10. +http://www.ubiq.com/hypertext/weiser/ SciAmDraft3.html + +\[White 2011\] A. White, K. Snow, A. Matthews, F. Monrose, "Hookt on +fon-iks: Phonotactic Reconstruction of Encrypted VoIP Conversations," +IEEE Symposium on Security and Privacy, Oakland, CA, 2011. + +\[Wigle.net 2016\] Wireless Geographic Logging Engine, +http://www.wigle.net + +\[Wiki Satellite 2016\] Satellite Internet access, +https://en.wikipedia.org/wiki/Satellite_Internet_access + +\[Wireshark 2016\] Wireshark homepage, http://www.wireshark.org + +\[Wischik 2005\] D. Wischik, N. McKeown, "Part I: Buffer Sizes for Core +Routers," ACM SIGCOMM Computer Communications Review, Vol. 35, No. 3 +(July 2005). + +\[Woo 1994\] T. Woo, R. Bindignavle, S. Su, S. 
Lam, "SNP: an interface +for secure network programming," Proc. 1994 Summer USENIX (Boston, MA, +June 1994), pp. 45--58. + +\[Wright 2015\] J. Wright, J. Wireless Security Secrets & Solutions, 3e, +"Hacking Exposed Wireless," McGraw-Hill Education, 2015. + +\[Wu 2005\] J. Wu, Z. M. Mao, J. Rexford, J. Wang, "Finding a Needle in +a Haystack: Pinpointing Significant BGP Routing Changes in an IP +Network," Proc. USENIX NSDI (2005). + +\[Xanadu 2012\] Xanadu Project homepage, http://www.xanadu.com/ + +\[Xiao 2000\] X. Xiao, A. Hannan, B. Bailey, L. Ni, "Traffic Engineering +with MPLS in the Internet," IEEE Network (Mar./Apr. 2000). + +\[Xu 2004\] L. Xu, K Harfoush, I. Rhee, "Binary Increase Congestion +Control (BIC) for Fast Long-Distance Networks," IEEE INFOCOM 2004, +pp. 2514--2524. + +\[Yavatkar 1994\] R. Yavatkar, N. Bhagwat, "Improving End-to-End +Performance of TCP over Mobile Internetworks," Proc. Mobile 94 Workshop +on Mobile Computing Systems and Applications (Dec. 1994). + +\[YouTube 2009\] YouTube 2009, Google container data center tour, 2009. + +\[YouTube 2016\] YouTube Statistics, 2016, +https://www.youtube.com/yt/press/ statistics.html + +\[Yu 2004\] Yu, Fang, H. Katz, Tirunellai V. Lakshman. "Gigabit Rate +Packet Pattern-Matching Using TCAM," Proc. 2004 Int. Conf. Network +Protocols, pp. 174--183. + +\[Yu 2011\] M. Yu, J. Rexford, X. Sun, S. Rao, N. Feamster, "A Survey of +VLAN Usage in Campus Networks," IEEE Communications Magazine, July 2011. + +\[Zegura 1997\] E. Zegura, K. Calvert, M. Donahoo, "A Quantitative +Comparison of Graph-based Models for Internet Topology," IEEE/ACM +Transactions on Networking, Vol. 5, No. 6, (Dec. 1997). See also +http://www.cc.gatech.edu/projects/gtim for a software package that +generates networks with a transit-stub structure. + +\[Zhang 1993\] L. Zhang, S. Deering, D. Estrin, S. Shenker, D. Zappala, +"RSVP: A New Resource Reservation Protocol," IEEE Network Magazine, Vol. +7, No. 9 (Sept. 1993), pp. 8--18. + +\[Zhang 2007\] L. Zhang, "A Retrospective View of NAT," The IETF +Journal, Vol. 3, Issue 2 (Oct. 2007). + +\[Zhang 2015\] G. Zhang, W. Liu, X. Hei, W. Cheng, "Unreeling Xunlei +Kankan: Understanding Hybrid CDN-P2P Video-on-Demand Streaming," IEEE +Transactions on Multimedia, Vol. 17, No. 2, Feb. 2015. + +\[Zhang X 2102\] X. Zhang, Y. Xu, Y. Liu, Z. Guo, Y. Wang, "Profiling +Skype Video Calls: Rate Control and Video Quality," IEEE INFOCOM +(Mar. 2012). + +\[Zink 2009\] M. Zink, K. Suh, Y. Gu, J. Kurose, "Characteristics of +YouTube Network Traffic at a Campus Network---Measurements, Models, and +Implications," Computer Networks, Vol. 53, No. 4, pp. 501--514, 2009. + +Index + + |
