Computer Networking: A Top-Down Approach
Seventh Edition

James F. Kurose, University of Massachusetts, Amherst
Keith W. Ross, NYU and NYU Shanghai

Boston Columbus Indianapolis New York San Francisco Hoboken Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montréal Toronto Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo

Vice President, Editorial Director, ECS: Marcia Horton
Acquisitions Editor: Matt Goldstein
Editorial Assistant: Kristy Alaura
Vice President of Marketing: Christy Lesko
Director of Field Marketing: Tim Galligan
Product Marketing Manager: Bram Van Kempen
Field Marketing Manager: Demetrius Hall
Marketing Assistant: Jon Bryant
Director of Product Management: Erin Gregg
Team Lead, Program and Project Management: Scott Disanno
Program Managers: Joanne Manning and Carole Snyder
Project Manager: Katrina Ostler, Ostler Editorial, Inc.
Senior Specialist, Program Planning and Support: Maura Zaldivar-Garcia
Cover Designer: Joyce Wells
Manager, Rights and Permissions: Ben Ferrini
Project Manager, Rights and Permissions: Jenny Hoffman, Aptara Corporation
Inventory Manager: Ann Lam
Cover Image: Marc Gutierrez/Getty Images
Media Project Manager: Steve Wright
Composition: Cenveo Publishing Services
Printer/Binder: Edwards Brothers Malloy
Cover and Insert Printer: Phoenix Color/Hagerstown

Credits and acknowledgments borrowed from other sources and reproduced, with permission, in this textbook appear on the appropriate page within the text.

Copyright © 2017, 2013, 2010 Pearson Education, Inc. All rights reserved. Manufactured in the United States of America. This publication is protected by copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, request forms, and the appropriate contacts within the Pearson Education Global Rights & Permissions Department, please visit www.pearsoned.com/permissions/.

Many of the designations by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Library of Congress Cataloging-in-Publication Data
Names: Kurose, James F. | Ross, Keith W., 1956-
Title: Computer networking: a top-down approach / James F. Kurose, University of Massachusetts, Amherst, Keith W. Ross, NYU and NYU Shanghai.
Description: Seventh edition. | Hoboken, New Jersey: Pearson, [2017] | Includes bibliographical references and index.
Identifiers: LCCN 2016004976 | ISBN 9780133594140 | ISBN 0133594149
Subjects: LCSH: Internet. | Computer networks.
Classification: LCC TK5105.875.I57 K88 2017 | DDC 004.6-dc23

LC record available at http://lccn.loc.gov/2016004976

ISBN-10: 0-13-359414-9
ISBN-13: 978-0-13-359414-0

About the Authors

Jim Kurose

Jim Kurose is a Distinguished University Professor of Computer Science at the University of Massachusetts, Amherst.
He is currently on leave from the University of Massachusetts, serving as an Assistant Director at the US National Science Foundation, where he leads the Directorate of Computer and Information Science and Engineering. Dr. Kurose has received a number of recognitions for his educational activities, including Outstanding Teacher Awards from the National Technological University (eight times), the University of Massachusetts, and the Northeast Association of Graduate Schools. He received the IEEE Taylor Booth Education Medal and was recognized for his leadership of Massachusetts' Commonwealth Information Technology Initiative. He has won several conference best paper awards and received the IEEE Infocom Achievement Award and the ACM Sigcomm Test of Time Award.

Dr. Kurose is a former Editor-in-Chief of IEEE Transactions on Communications and of IEEE/ACM Transactions on Networking. He has served as Technical Program co-Chair for IEEE Infocom, ACM SIGCOMM, ACM Internet Measurement Conference, and ACM SIGMETRICS. He is a Fellow of the IEEE and the ACM. His research interests include network protocols and architecture, network measurement, multimedia communication, and modeling and performance evaluation. He holds a PhD in Computer Science from Columbia University.

Keith Ross

Keith Ross is the Dean of Engineering and Computer Science at NYU Shanghai and the Leonard J. Shustek Chair Professor in the Computer Science and Engineering Department at NYU. Previously he was at the University of Pennsylvania (13 years), the Eurecom Institute (5 years), and Polytechnic University (10 years). He received a B.S.E.E. from Tufts University, an M.S.E.E. from Columbia University, and a Ph.D. in Computer and Control Engineering from the University of Michigan. Keith Ross is also the co-founder and original CEO of Wimba, which developed online multimedia applications for e-learning and was acquired by Blackboard in 2010.

Professor Ross's research interests are in privacy, social networks, peer-to-peer networking, Internet measurement, content distribution networks, and stochastic modeling. He is an ACM Fellow, an IEEE Fellow, recipient of the Infocom 2009 Best Paper Award, and recipient of the 2011 and 2008 Best Paper Awards for Multimedia Communications (awarded by the IEEE Communications Society). He has served on numerous journal editorial boards and conference program committees, including IEEE/ACM Transactions on Networking, ACM SIGCOMM, ACM CoNext, and ACM Internet Measurement Conference. He also has served as an advisor to the Federal Trade Commission on P2P file sharing.

To Julie and our three precious ones---Chris, Charlie, and Nina
JFK

A big THANKS to my professors, colleagues, and students all over the world.
KWR

Preface

Welcome to the seventh edition of Computer Networking: A Top-Down Approach. Since the publication of the first edition 16 years ago, our book has been adopted for use at many hundreds of colleges and universities, translated into 14 languages, and used by over one hundred thousand students and practitioners worldwide. We've heard from many of these readers and have been overwhelmed by the positive response.

What's New in the Seventh Edition?

We think one important reason for this success has been that our book continues to offer a fresh and timely approach to computer networking instruction.
We've made changes in this seventh edition, but we've also kept unchanged what we believe (and the instructors and students who have used our book have confirmed) to be the most important aspects of this book: its top-down approach, its focus on the Internet and a modern treatment of computer networking, its attention to both principles and practice, and its accessible style and approach toward learning about computer networking. Nevertheless, the seventh edition has been revised and updated substantially.

Long-time readers of our book will notice that for the first time since this text was published, we've changed the organization of the chapters themselves. The network layer, which had previously been covered in a single chapter, is now covered in Chapter 4 (which focuses on the so-called "data plane" component of the network layer) and Chapter 5 (which focuses on the network layer's "control plane"). This expanded coverage of the network layer reflects the swift rise in importance of software-defined networking (SDN), arguably the most important and exciting advance in networking in decades. Although a relatively recent innovation, SDN has been rapidly adopted in practice---so much so that it's already hard to imagine an introduction to modern computer networking that doesn't cover SDN. The topic of network management, previously covered in Chapter 9, has now been folded into the new Chapter 5. As always, we've also updated many other sections of the text to reflect recent changes in the dynamic field of networking since the sixth edition; material that has been retired from the printed text can always be found on this book's Companion Website. The most important updates are the following:

Chapter 1 has been updated to reflect the ever-growing reach and use of the Internet.

Chapter 2, which covers the application layer, has been significantly updated. We've removed the material on the FTP protocol and distributed hash tables to make room for a new section on application-level video streaming and content distribution networks, together with Netflix and YouTube case studies. The socket programming sections have been updated from Python 2 to Python 3.

Chapter 3, which covers the transport layer, has been modestly updated. The material on asynchronous transfer mode (ATM) networks has been replaced by more modern material on the Internet's explicit congestion notification (ECN), which teaches the same principles.

Chapter 4 covers the "data plane" component of the network layer---the per-router forwarding function that determines how a packet arriving on one of a router's input links is forwarded to one of that router's output links. We updated the material on traditional Internet forwarding found in all previous editions, and added material on packet scheduling. We've also added a new section on generalized forwarding, as practiced in SDN. There are also numerous updates throughout the chapter. Material on multicast and broadcast communication has been removed to make way for the new material.

In Chapter 5, we cover the control plane functions of the network layer---the network-wide logic that controls how a datagram is routed along an end-to-end path of routers from the source host to the destination host. As in previous editions, we cover routing algorithms, as well as routing protocols (with an updated treatment of BGP) used in today's Internet.
We've added a significant new section on the SDN control plane, where routing and other functions are implemented in so-called SDN controllers.

Chapter 6, which now covers the link layer, has an updated treatment of Ethernet, and of data center networking.

Chapter 7, which covers wireless and mobile networking, contains updated material on 802.11 (so-called "WiFi") networks and cellular networks, including 4G and LTE.

Chapter 8, which covers network security and was extensively updated in the sixth edition, has only modest updates in this seventh edition.

Chapter 9, on multimedia networking, is now slightly "thinner" than in the sixth edition, as material on video streaming and content distribution networks has been moved to Chapter 2, and material on packet scheduling has been incorporated into Chapter 4.

Significant new material involving end-of-chapter problems has been added. As with all previous editions, homework problems have been revised, added, and removed.

As always, our aim in creating this new edition of our book is to continue to provide a focused and modern treatment of computer networking, emphasizing both principles and practice.

Audience

This textbook is for a first course on computer networking. It can be used in both computer science and electrical engineering departments. In terms of programming languages, the book assumes only that the student has experience with C, C++, Java, or Python (and even then only in a few places). Although this book is more precise and analytical than many other introductory computer networking texts, it rarely uses any mathematical concepts that are not taught in high school. We have made a deliberate effort to avoid using any advanced calculus, probability, or stochastic process concepts (although we've included some homework problems for students with this advanced background). The book is therefore appropriate for undergraduate courses and for first-year graduate courses. It should also be useful to practitioners in the telecommunications industry.

What Is Unique About This Textbook?

The subject of computer networking is enormously complex, involving many concepts, protocols, and technologies that are woven together in an intricate manner. To cope with this scope and complexity, many computer networking texts are organized around the "layers" of a network architecture. With a layered organization, students can see through the complexity of computer networking---they learn about the distinct concepts and protocols in one part of the architecture while seeing the big picture of how all parts fit together. From a pedagogical perspective, our personal experience has been that such a layered approach indeed works well. Nevertheless, we have found that the traditional approach of teaching---bottom up, that is, from the physical layer toward the application layer---is not the best approach for a modern course on computer networking.

A Top-Down Approach

Our book broke new ground 16 years ago by treating networking in a top-down manner---that is, by beginning at the application layer and working its way down toward the physical layer. The feedback we received from teachers and students alike has confirmed that this top-down approach has many advantages and does indeed work well pedagogically. First, it places emphasis on the application layer (a "high growth area" in networking).
Indeed, many of the recent revolutions in computer networking---including the Web, peer-to-peer file sharing, and media streaming---have taken place at the application layer. An early emphasis on application-layer issues differs from the approaches taken in most other texts, which have only a small amount of material on network applications, their requirements, application-layer paradigms (e.g., client-server and peer-to-peer), and application programming interfaces. Second, our experience as instructors (and that of many instructors who have used this text) has been that teaching networking applications near the beginning of the course is a powerful motivational tool. Students are thrilled to learn about how networking applications work---applications such as e-mail and the Web, which most students use on a daily basis. Once a student understands the applications, the student can then understand the network services needed to support these applications. The student can then, in turn, examine the various ways in which such services might be provided and implemented in the lower layers. Covering applications early thus provides motivation for the remainder of the text.

Third, a top-down approach enables instructors to introduce network application development at an early stage. Students not only see how popular applications and protocols work, but also learn how easy it is to create their own network applications and application-level protocols. With the top-down approach, students get early exposure to the notions of socket programming, service models, and protocols---important concepts that resurface in all subsequent layers. By providing socket programming examples in Python, we highlight the central ideas without confusing students with complex code. Undergraduates in electrical engineering and computer science should not have difficulty following the Python code.

An Internet Focus

Although we dropped the phrase "Featuring the Internet" from the title of this book with the fourth edition, this doesn't mean that we dropped our focus on the Internet. Indeed, nothing could be further from the case! Instead, since the Internet has become so pervasive, we felt that any networking textbook must have a significant focus on the Internet, and thus this phrase was somewhat unnecessary. We continue to use the Internet's architecture and protocols as primary vehicles for studying fundamental computer networking concepts. Of course, we also include concepts and protocols from other network architectures. But the spotlight is clearly on the Internet, a fact reflected in our organizing the book around the Internet's five-layer architecture: the application, transport, network, link, and physical layers.

Another benefit of spotlighting the Internet is that most computer science and electrical engineering students are eager to learn about the Internet and its protocols. They know that the Internet has been a revolutionary and disruptive technology and can see that it is profoundly changing our world. Given the enormous relevance of the Internet, students are naturally curious about what is "under the hood." Thus, it is easy for an instructor to get students excited about basic principles when using the Internet as the guiding focus.

Teaching Networking Principles

Two of the unique features of the book---its top-down approach and its focus on the Internet---have appeared in the titles of our book.
If we could have squeezed a third phrase into the subtitle, it would have contained the word principles. The field of networking is now mature enough that a number of fundamentally important issues can be identified. For example, in the transport layer, the fundamental issues include reliable communication over an unreliable network layer, connection establishment/teardown and handshaking, congestion and flow control, and multiplexing. Three fundamentally important network-layer issues are determining "good" paths between two routers, interconnecting a large number of heterogeneous networks, and managing the complexity of a modern network. In the link layer, a fundamental problem is sharing a multiple access channel. In network security, techniques for providing confidentiality, authentication, and message integrity are all based on cryptographic fundamentals. This text identifies fundamental networking issues and studies approaches towards addressing these issues. The student learning these principles will gain knowledge with a long "shelf life"---long after today's network standards and protocols have become obsolete, the principles they embody will remain important and relevant. We believe that the combination of using the Internet to get the student's foot in the door and then emphasizing fundamental issues and solution approaches will allow the student to quickly understand just about any networking technology.

The Website

Each new copy of this textbook includes twelve months of access to a Companion Website for all book readers at http://www.pearsonhighered.com/cs-resources/, which includes:

Interactive learning material. The book's Companion Website contains VideoNotes---video presentations of important topics throughout the book done by the authors, as well as walkthroughs of solutions to problems similar to those at the end of the chapter. We've seeded the Web site with VideoNotes and online problems for Chapters 1 through 5 and will continue to actively add and update this material over time. As in earlier editions, the Web site contains the interactive Java applets that animate many key networking concepts. The site also has interactive quizzes that permit students to check their basic understanding of the subject matter. Professors can integrate these interactive features into their lectures or use them as mini labs.

Additional technical material. As we have added new material in each edition of our book, we've had to remove coverage of some existing topics to keep the book at a manageable length. For example, to make room for the new material in this edition, we've removed material on FTP, distributed hash tables, and multicasting. Material that appeared in earlier editions of the text is still of interest, and thus can be found on the book's Web site.

Programming assignments. The Web site also provides a number of detailed programming assignments, which include building a multithreaded Web server, building an e-mail client with a GUI interface, programming the sender and receiver sides of a reliable data transport protocol, programming a distributed routing algorithm, and more.

Wireshark labs. One's understanding of network protocols can be greatly deepened by seeing them in action. The Web site provides numerous Wireshark assignments that enable students to actually observe the sequence of messages exchanged between two protocol entities.
The Web site includes separate Wireshark labs on HTTP, DNS, TCP, UDP, IP, ICMP, Ethernet, ARP, WiFi, SSL, and on tracing all protocols involved in satisfying a request to fetch a Web page. We'll continue to add new labs over time.

In addition to the Companion Website, the authors maintain a public Web site, http://gaia.cs.umass.edu/kurose_ross/interactive, containing interactive exercises that create (and present solutions for) problems similar to selected end-of-chapter problems. Since students can generate (and view solutions for) an unlimited number of similar problem instances, they can work until the material is truly mastered.

Pedagogical Features

We have each been teaching computer networking for more than 30 years. Together, we bring more than 60 years of teaching experience to this text, during which time we have taught many thousands of students. We have also been active researchers in computer networking during this time. (In fact, Jim and Keith first met each other as master's students in a computer networking course taught by Mischa Schwartz in 1979 at Columbia University.) We think all this gives us a good perspective on where networking has been and where it is likely to go in the future. Nevertheless, we have resisted temptations to bias the material in this book towards our own pet research projects. We figure you can visit our personal Web sites if you are interested in our research. Thus, this book is about modern computer networking---it is about contemporary protocols and technologies as well as the underlying principles behind these protocols and technologies. We also believe that learning (and teaching!) about networking can be fun. A sense of humor, use of analogies, and real-world examples in this book will hopefully make this material more fun.

Supplements for Instructors

We provide a complete supplements package to aid instructors in teaching this course. This material can be accessed from Pearson's Instructor Resource Center (http://www.pearsonhighered.com/irc). Visit the Instructor Resource Center for information about accessing these instructor's supplements.

PowerPoint® slides. We provide PowerPoint slides for all nine chapters. The slides have been completely updated with this seventh edition. The slides cover each chapter in detail. They use graphics and animations (rather than relying only on monotonous text bullets) to make the slides interesting and visually appealing. We provide the original PowerPoint slides so you can customize them to best suit your own teaching needs. Some of these slides have been contributed by other instructors who have taught from our book.

Homework solutions. We provide a solutions manual for the homework problems in the text, programming assignments, and Wireshark labs. As noted earlier, we've introduced many new homework problems in the first six chapters of the book.

Chapter Dependencies

The first chapter of this text presents a self-contained overview of computer networking. Introducing many key concepts and terminology, this chapter sets the stage for the rest of the book. All of the other chapters directly depend on this first chapter. After completing Chapter 1, we recommend instructors cover Chapters 2 through 6 in sequence, following our top-down philosophy. Each of these five chapters leverages material from the preceding chapters. After completing the first six chapters, the instructor has quite a bit of flexibility.
There are no interdependencies among the last three chapters, so they can be taught in any order. However, each of the last three chapters depends on the material in the first six chapters. Many instructors first teach the first six chapters and then teach one of the last three chapters for "dessert."

One Final Note: We'd Love to Hear from You

We encourage students and instructors to e-mail us with any comments they might have about our book. It's been wonderful for us to hear from so many instructors and students from around the world about our first five editions. We've incorporated many of these suggestions into later editions of the book. We also encourage instructors to send us new homework problems (and solutions) that would complement the current homework problems. We'll post these on the instructor-only portion of the Web site. We also encourage instructors and students to create new Java applets that illustrate the concepts and protocols in this book. If you have an applet that you think would be appropriate for this text, please submit it to us. If the applet (including notation and terminology) is appropriate, we'll be happy to include it on the text's Web site, with an appropriate reference to the applet's authors. So, as the saying goes, "Keep those cards and letters coming!" Seriously, please do continue to send us interesting URLs, point out typos, disagree with any of our claims, and tell us what works and what doesn't work. Tell us what you think should or shouldn't be included in the next edition. Send your e-mail to kurose@cs.umass.edu and keithwross@nyu.edu.

Acknowledgments

Since we began writing this book in 1996, many people have given us invaluable help and have been influential in shaping our thoughts on how to best organize and teach a networking course. We want to say A BIG THANKS to everyone who has helped us from the earliest first drafts of this book, up to this seventh edition. We are also very thankful to the many hundreds of readers from around the world---students, faculty, practitioners---who have sent us thoughts and comments on earlier editions of the book and suggestions for future editions of the book. Special thanks go out to:

Al Aho (Columbia University)
Hisham Al-Mubaid (University of Houston-Clear Lake)
Pratima Akkunoor (Arizona State University)
Paul Amer (University of Delaware)
Shamiul Azom (Arizona State University)
Lichun Bao (University of California at Irvine)
Paul Barford (University of Wisconsin)
Bobby Bhattacharjee (University of Maryland)
Steven Bellovin (Columbia University)
Pravin Bhagwat (Wibhu)
Supratik Bhattacharyya (previously at Sprint)
Ernst Biersack (Eurécom Institute)
Shahid Bokhari (University of Engineering & Technology, Lahore)
Jean Bolot (Technicolor Research)
Daniel Brushteyn (former University of Pennsylvania student)
Ken Calvert (University of Kentucky)
Evandro Cantu (Federal University of Santa Catarina)
Jeff Case (SNMP Research International)
Jeff Chaltas (Sprint)
Vinton Cerf (Google)
Byung Kyu Choi (Michigan Technological University)
Bram Cohen (BitTorrent, Inc.)
Constantine Coutras (Pace University)
John Daigle (University of Mississippi)
Edmundo A. de Souza e Silva (Federal University of Rio de Janeiro)
Philippe Decuetos (Eurécom Institute)
Christophe Diot (Technicolor Research)
Prithula Dhunghel (Akamai)
Deborah Estrin (University of California, Los Angeles)
Michalis Faloutsos (University of California at Riverside)
Wu-chi Feng (Oregon Graduate Institute)
Sally Floyd (ICIR, University of California at Berkeley)
Paul Francis (Max Planck Institute)
David Fullager (Netflix)
Lixin Gao (University of Massachusetts)
JJ Garcia-Luna-Aceves (University of California at Santa Cruz)
Mario Gerla (University of California at Los Angeles)
David Goodman (NYU-Poly)
Yang Guo (Alcatel/Lucent Bell Labs)
Tim Griffin (Cambridge University)
Max Hailperin (Gustavus Adolphus College)
Bruce Harvey (Florida A&M University, Florida State University)
Carl Hauser (Washington State University)
Rachelle Heller (George Washington University)
Phillipp Hoschka (INRIA/W3C)
Wen Hsin (Park University)
Albert Huang (former University of Pennsylvania student)
Cheng Huang (Microsoft Research)
Esther A. Hughes (Virginia Commonwealth University)
Van Jacobson (Xerox PARC)
Pinak Jain (former NYU-Poly student)
Jobin James (University of California at Riverside)
Sugih Jamin (University of Michigan)
Shivkumar Kalyanaraman (IBM Research, India)
Jussi Kangasharju (University of Helsinki)
Sneha Kasera (University of Utah)
Parviz Kermani (formerly of IBM Research)
Hyojin Kim (former University of Pennsylvania student)
Leonard Kleinrock (University of California at Los Angeles)
David Kotz (Dartmouth College)
Beshan Kulapala (Arizona State University)
Rakesh Kumar (Bloomberg)
Miguel A. Labrador (University of South Florida)
Simon Lam (University of Texas)
Steve Lai (Ohio State University)
Tom LaPorta (Penn State University)
Tim Berners-Lee (World Wide Web Consortium)
Arnaud Legout (INRIA)
Lee Leitner (Drexel University)
Brian Levine (University of Massachusetts)
Chunchun Li (former NYU-Poly student)
Yong Liu (NYU-Poly)
William Liang (former University of Pennsylvania student)
Willis Marti (Texas A&M University)
Nick McKeown (Stanford University)
Josh McKinzie (Park University)
Deep Medhi (University of Missouri, Kansas City)
Bob Metcalfe (International Data Group)
Sue Moon (KAIST)
Jenni Moyer (Comcast)
Erich Nahum (IBM Research)
Christos Papadopoulos (Colorado State University)
Craig Partridge (BBN Technologies)
Radia Perlman (Intel)
Jitendra Padhye (Microsoft Research)
Vern Paxson (University of California at Berkeley)
Kevin Phillips (Sprint)
George Polyzos (Athens University of Economics and Business)
Sriram Rajagopalan (Arizona State University)
Ramachandran Ramjee (Microsoft Research)
Ken Reek (Rochester Institute of Technology)
Martin Reisslein (Arizona State University)
Jennifer Rexford (Princeton University)
Leon Reznik (Rochester Institute of Technology)
Pablo Rodriguez (Telefonica)
Sumit Roy (University of Washington)
Dan Rubenstein (Columbia University)
Avi Rubin (Johns Hopkins University)
Douglas Salane (John Jay College)
Despina Saparilla (Cisco Systems)
John Schanz (Comcast)
Henning Schulzrinne (Columbia University)
Mischa Schwartz (Columbia University)
Ardash Sethi (University of Delaware)
Harish Sethu (Drexel University)
K. Sam Shanmugan (University of Kansas)
Prashant Shenoy (University of Massachusetts)
Clay Shields (Georgetown University)
Subin Shrestra (University of Pennsylvania)
Bojie Shu (former NYU-Poly student)
Mihail L. Sichitiu (NC State University)
Peter Steenkiste (Carnegie Mellon University)
Tatsuya Suda (University of California at Irvine)
Kin Sun Tam (State University of New York at Albany)
Don Towsley (University of Massachusetts)
David Turner (California State University, San Bernardino)
Nitin Vaidya (University of Illinois)
Michele Weigle (Clemson University)
David Wetherall (University of Washington)
Ira Winston (University of Pennsylvania)
Di Wu (Sun Yat-sen University)
Shirley Wynn (NYU-Poly)
Raj Yavatkar (Intel)
Yechiam Yemini (Columbia University)
Dian Yu (NYU Shanghai)
Ming Yu (State University of New York at Binghamton)
Ellen Zegura (Georgia Institute of Technology)
Honggang Zhang (Suffolk University)
Hui Zhang (Carnegie Mellon University)
Lixia Zhang (University of California at Los Angeles)
Meng Zhang (former NYU-Poly student)
Shuchun Zhang (former University of Pennsylvania student)
Xiaodong Zhang (Ohio State University)
Zhi-Li Zhang (University of Minnesota)
Phil Zimmermann (independent consultant)
Mike Zink (University of Massachusetts)
Cliff C. Zou (University of Central Florida)

We also want to thank the entire Pearson team---in particular, Matt Goldstein and Joanne Manning---who have done an absolutely outstanding job on this seventh edition (and who have put up with two very finicky authors who seem congenitally unable to meet deadlines!). Thanks also to our artists, Janet Theurer and Patrice Rossi Calkin, for their work on the beautiful figures in this and earlier editions of our book, and to Katie Ostler and her team at Cenveo for their wonderful production work on this edition. Finally, a most special thanks go to our previous two editors at Addison-Wesley---Michael Hirsch and Susan Hartman. This book would not be what it is (and may well not have been at all) without their graceful management, constant encouragement, nearly infinite patience, good humor, and perseverance.

Table of Contents

Chapter 1 Computer Networks and the Internet
1.1 What Is the Internet?
1.1.1 A Nuts-and-Bolts Description
1.1.2 A Services Description
1.1.3 What Is a Protocol?
1.2 The Network Edge
1.2.1 Access Networks
1.2.2 Physical Media
1.3 The Network Core
1.3.1 Packet Switching
1.3.2 Circuit Switching
1.3.3 A Network of Networks
1.4 Delay, Loss, and Throughput in Packet-Switched Networks
1.4.1 Overview of Delay in Packet-Switched Networks
1.4.2 Queuing Delay and Packet Loss
1.4.3 End-to-End Delay
1.4.4 Throughput in Computer Networks
1.5 Protocol Layers and Their Service Models
1.5.1 Layered Architecture
1.5.2 Encapsulation
1.6 Networks Under Attack
1.7 History of Computer Networking and the Internet
1.7.1 The Development of Packet Switching: 1961--1972
1.7.2 Proprietary Networks and Internetworking: 1972--1980
1.7.3 A Proliferation of Networks: 1980--1990
1.7.4 The Internet Explosion: The 1990s
1.7.5 The New Millennium
1.8 Summary
Homework Problems and Questions
Wireshark Lab
Interview: Leonard Kleinrock

Chapter 2 Application Layer
2.1 Principles of Network Applications
2.1.1 Network Application Architectures
2.1.2 Processes Communicating
2.1.3 Transport Services Available to Applications
2.1.4 Transport Services Provided by the Internet
2.1.5 Application-Layer Protocols
2.1.6 Network Applications Covered in This Book
2.2 The Web and HTTP
2.2.1 Overview of HTTP
2.2.2 Non-Persistent and Persistent Connections
2.2.3 HTTP Message Format
2.2.4 User-Server Interaction: Cookies
2.2.5 Web Caching
2.3 Electronic Mail in the Internet
2.3.1 SMTP
2.3.2 Comparison with HTTP
2.3.3 Mail Message Formats
2.3.4 Mail Access Protocols
2.4 DNS---The Internet's Directory Service
2.4.1 Services Provided by DNS
2.4.2 Overview of How DNS Works
2.4.3 DNS Records and Messages
2.5 Peer-to-Peer Applications
2.5.1 P2P File Distribution
2.6 Video Streaming and Content Distribution Networks
2.6.1 Internet Video
2.6.2 HTTP Streaming and DASH
2.6.3 Content Distribution Networks
2.6.4 Case Studies: Netflix, YouTube, and Kankan
2.7 Socket Programming: Creating Network Applications
2.7.1 Socket Programming with UDP
2.7.2 Socket Programming with TCP
2.8 Summary
Homework Problems and Questions
Socket Programming Assignments
Wireshark Labs: HTTP, DNS
Interview: Marc Andreessen

Chapter 3 Transport Layer
3.1 Introduction and Transport-Layer Services
3.1.1 Relationship Between Transport and Network Layers
3.1.2 Overview of the Transport Layer in the Internet
3.2 Multiplexing and Demultiplexing
3.3 Connectionless Transport: UDP
3.3.1 UDP Segment Structure
3.3.2 UDP Checksum
3.4 Principles of Reliable Data Transfer
3.4.1 Building a Reliable Data Transfer Protocol
3.4.2 Pipelined Reliable Data Transfer Protocols
3.4.3 Go-Back-N (GBN)
3.4.4 Selective Repeat (SR)
3.5 Connection-Oriented Transport: TCP
3.5.1 The TCP Connection
3.5.2 TCP Segment Structure
3.5.3 Round-Trip Time Estimation and Timeout
3.5.4 Reliable Data Transfer
3.5.5 Flow Control
3.5.6 TCP Connection Management
3.6 Principles of Congestion Control
3.6.1 The Causes and the Costs of Congestion
3.6.2 Approaches to Congestion Control
3.7 TCP Congestion Control
3.7.1 Fairness
3.7.2 Explicit Congestion Notification (ECN): Network-assisted Congestion Control
3.8 Summary
Homework Problems and Questions
Programming Assignments
Wireshark Labs: Exploring TCP, UDP
Interview: Van Jacobson

Chapter 4 The Network Layer: Data Plane
4.1 Overview of Network Layer
4.1.1 Forwarding and Routing: The Network Data and Control Planes
4.1.2 Network Service Models
4.2 What's Inside a Router?
4.2.1 Input Port Processing and Destination-Based Forwarding
4.2.2 Switching
4.2.3 Output Port Processing
4.2.4 Where Does Queuing Occur?
4.2.5 Packet Scheduling
4.3 The Internet Protocol (IP): IPv4, Addressing, IPv6, and More
4.3.1 IPv4 Datagram Format
4.3.2 IPv4 Datagram Fragmentation
4.3.3 IPv4 Addressing
4.3.4 Network Address Translation (NAT)
4.3.5 IPv6
4.4 Generalized Forwarding and SDN
4.4.1 Match
4.4.2 Action
4.4.3 OpenFlow Examples of Match-plus-action in Action
4.5 Summary
Homework Problems and Questions
Wireshark Lab
Interview: Vinton G. Cerf

Chapter 5 The Network Layer: Control Plane
5.1 Introduction
5.2 Routing Algorithms
5.2.1 The Link-State (LS) Routing Algorithm
5.2.2 The Distance-Vector (DV) Routing Algorithm
5.3 Intra-AS Routing in the Internet: OSPF
5.4 Routing Among the ISPs: BGP
5.4.1 The Role of BGP
5.4.2 Advertising BGP Route Information
5.4.3 Determining the Best Routes
5.4.4 IP-Anycast
5.4.5 Routing Policy
5.4.6 Putting the Pieces Together: Obtaining Internet Presence
5.5 The SDN Control Plane
5.5.1 The SDN Control Plane: SDN Controller and SDN Control Applications
5.5.2 OpenFlow Protocol
5.5.3 Data and Control Plane Interaction: An Example
5.5.4 SDN: Past and Future
5.6 ICMP: The Internet Control Message Protocol
5.7 Network Management and SNMP
5.7.1 The Network Management Framework
5.7.2 The Simple Network Management Protocol (SNMP)
5.8 Summary
Homework Problems and Questions
Socket Programming Assignment
Programming Assignment
Wireshark Lab
Interview: Jennifer Rexford

Chapter 6 The Link Layer and LANs
6.1 Introduction to the Link Layer
6.1.1 The Services Provided by the Link Layer
6.1.2 Where Is the Link Layer Implemented?
6.2 Error-Detection and -Correction Techniques
6.2.1 Parity Checks
6.2.2 Checksumming Methods
6.2.3 Cyclic Redundancy Check (CRC)
6.3 Multiple Access Links and Protocols
6.3.1 Channel Partitioning Protocols
6.3.2 Random Access Protocols
6.3.3 Taking-Turns Protocols
6.3.4 DOCSIS: The Link-Layer Protocol for Cable Internet Access
6.4 Switched Local Area Networks
6.4.1 Link-Layer Addressing and ARP
6.4.2 Ethernet
6.4.3 Link-Layer Switches
6.4.4 Virtual Local Area Networks (VLANs)
6.5 Link Virtualization: A Network as a Link Layer
6.5.1 Multiprotocol Label Switching (MPLS)
6.6 Data Center Networking
6.7 Retrospective: A Day in the Life of a Web Page Request
6.7.1 Getting Started: DHCP, UDP, IP, and Ethernet
6.7.2 Still Getting Started: DNS and ARP
6.7.3 Still Getting Started: Intra-Domain Routing to the DNS Server
6.7.4 Web Client-Server Interaction: TCP and HTTP
6.8 Summary
Homework Problems and Questions
Wireshark Lab
Interview: Simon S. Lam

Chapter 7 Wireless and Mobile Networks
7.1 Introduction
7.2 Wireless Links and Network Characteristics
7.2.1 CDMA
7.3 WiFi: 802.11 Wireless LANs
7.3.1 The 802.11 Architecture
7.3.2 The 802.11 MAC Protocol
7.3.3 The IEEE 802.11 Frame
7.3.4 Mobility in the Same IP Subnet
7.3.5 Advanced Features in 802.11
7.3.6 Personal Area Networks: Bluetooth and Zigbee
7.4 Cellular Internet Access
7.4.1 An Overview of Cellular Network Architecture
7.4.2 3G Cellular Data Networks: Extending the Internet to Cellular Subscribers
7.4.3 On to 4G: LTE
7.5 Mobility Management: Principles
7.5.1 Addressing
7.5.2 Routing to a Mobile Node
7.6 Mobile IP
7.7 Managing Mobility in Cellular Networks
7.7.1 Routing Calls to a Mobile User
7.7.2 Handoffs in GSM
7.8 Wireless and Mobility: Impact on Higher-Layer Protocols
7.9 Summary
Homework Problems and Questions
Wireshark Lab
Interview: Deborah Estrin

Chapter 8 Security in Computer Networks
8.1 What Is Network Security?
8.2 Principles of Cryptography
8.2.1 Symmetric Key Cryptography
8.2.2 Public Key Encryption
8.3 Message Integrity and Digital Signatures
8.3.1 Cryptographic Hash Functions
8.3.2 Message Authentication Code
8.3.3 Digital Signatures
8.4 End-Point Authentication
8.4.1 Authentication Protocol ap1.0
8.4.2 Authentication Protocol ap2.0
8.4.3 Authentication Protocol ap3.0
8.4.4 Authentication Protocol ap3.1
8.4.5 Authentication Protocol ap4.0
8.5 Securing E-Mail
8.5.1 Secure E-Mail
8.5.2 PGP
8.6 Securing TCP Connections: SSL
8.6.1 The Big Picture
8.6.2 A More Complete Picture
8.7 Network-Layer Security: IPsec and Virtual Private Networks
8.7.1 IPsec and Virtual Private Networks (VPNs)
8.7.2 The AH and ESP Protocols
8.7.3 Security Associations
8.7.4 The IPsec Datagram
8.7.5 IKE: Key Management in IPsec
8.8 Securing Wireless LANs
8.8.1 Wired Equivalent Privacy (WEP)
8.8.2 IEEE 802.11i
8.9 Operational Security: Firewalls and Intrusion Detection Systems
8.9.1 Firewalls
8.9.2 Intrusion Detection Systems
8.10 Summary
Homework Problems and Questions
Wireshark Lab
IPsec Lab
Interview: Steven M. Bellovin

Chapter 9 Multimedia Networking
9.1 Multimedia Networking Applications
9.1.1 Properties of Video
9.1.2 Properties of Audio
9.1.3 Types of Multimedia Network Applications
9.2 Streaming Stored Video
9.2.1 UDP Streaming
9.2.2 HTTP Streaming
9.3 Voice-over-IP
9.3.1 Limitations of the Best-Effort IP Service
9.3.2 Removing Jitter at the Receiver for Audio
9.3.3 Recovering from Packet Loss
9.3.4 Case Study: VoIP with Skype
9.4 Protocols for Real-Time Conversational Applications
9.4.1 RTP
9.4.2 SIP
9.5 Network Support for Multimedia
9.5.1 Dimensioning Best-Effort Networks
9.5.2 Providing Multiple Classes of Service
9.5.3 Diffserv
9.5.4 Per-Connection Quality-of-Service (QoS) Guarantees: Resource Reservation and Call Admission
9.6 Summary
Homework Problems and Questions
Programming Assignment
Interview: Henning Schulzrinne

References
Index

Chapter 1 Computer Networks and the Internet

Today's Internet is arguably the largest engineered system ever created by mankind, with hundreds of millions of connected computers, communication links, and switches; with billions of users who connect via laptops, tablets, and smartphones; and with an array of new Internet-connected "things" including game consoles, surveillance systems, watches, eye glasses, thermostats, body scales, and cars. Given that the Internet is so large and has so many diverse components and uses, is there any hope of understanding how it works? Are there guiding principles and structure that can provide a foundation for understanding such an amazingly large and complex system? And if so, is it possible that it actually could be both interesting and fun to learn about computer networks? Fortunately, the answer to all of these questions is a resounding YES! Indeed, it's our aim in this book to provide you with a modern introduction to the dynamic field of computer networking, giving you the principles and practical insights you'll need to understand not only today's networks, but tomorrow's as well.

This first chapter presents a broad overview of computer networking and the Internet. Our goal here is to paint a broad picture and set the context for the rest of this book, to see the forest through the trees. We'll cover a lot of ground in this introductory chapter and discuss a lot of the pieces of a computer network, without losing sight of the big picture.

We'll structure our overview of computer networks in this chapter as follows. After introducing some basic terminology and concepts, we'll first examine the basic hardware and software components that make up a network. We'll begin at the network's edge and look at the end systems and network applications running in the network. We'll then explore the core of a computer network, examining the links and the switches that transport data, as well as the access networks and physical media that connect end systems to the network core. We'll learn that the Internet is a network of networks, and we'll learn how these networks connect with each other. After having completed this overview of the edge and core of a computer network, we'll take the broader and more abstract view in the second half of this chapter.
We'll examine delay, loss, and throughput of data in a computer network and provide simple quantitative models for end-to-end throughput and delay: models that take into account transmission, propagation, and queuing delays. We'll then introduce some of the key architectural principles in computer networking, namely, protocol layering and service models. We'll also learn that computer networks are vulnerable to many different types of attacks; we'll survey some of these attacks and consider how computer networks can be made more secure. Finally, we'll close this chapter with a brief history of computer networking.

1.1 What Is the Internet?

In this book, we'll use the public Internet, a specific computer network, as our principal vehicle for discussing computer networks and their protocols. But what is the Internet? There are a couple of ways to answer this question. First, we can describe the nuts and bolts of the Internet, that is, the basic hardware and software components that make up the Internet. Second, we can describe the Internet in terms of a networking infrastructure that provides services to distributed applications. Let's begin with the nuts-and-bolts description, using Figure 1.1 to illustrate our discussion.

1.1.1 A Nuts-and-Bolts Description

The Internet is a computer network that interconnects billions of computing devices throughout the world. Not too long ago, these computing devices were primarily traditional desktop PCs, Linux workstations, and so-called servers that store and transmit information such as Web pages and e-mail messages. Increasingly, however, nontraditional Internet "things" such as laptops, smartphones, tablets, TVs, gaming consoles, thermostats, home security systems, home appliances, watches, eye glasses, cars, traffic control systems, and more are being connected to the Internet. Indeed, the term computer network is beginning to sound a bit dated, given the many nontraditional devices that are being hooked up to the Internet. In Internet jargon, all of these devices are called hosts or end systems. By some estimates, in 2015 there were about 5 billion devices connected to the Internet, and the number will reach 25 billion by 2020 [Gartner 2014]. It is estimated that in 2015 there were over 3.2 billion Internet users worldwide, approximately 40% of the world population [ITU 2015].

Figure 1.1 Some pieces of the Internet

End systems are connected together by a network of communication links and packet switches. We'll see in Section 1.2 that there are many types of communication links, which are made up of different types of physical media, including coaxial cable, copper wire, optical fiber, and radio spectrum. Different links can transmit data at different rates, with the transmission rate of a link measured in bits/second. When one end system has data to send to another end system, the sending end system segments the data and adds header bytes to each segment. The resulting packages of information, known as packets in the jargon of computer networks, are then sent through the network to the destination end system, where they are reassembled into the original data. A packet switch takes a packet arriving on one of its incoming communication links and forwards that packet on one of its outgoing communication links. Packet switches come in many shapes and flavors, but the two most prominent types in today's Internet are routers and link-layer switches.
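To make the segmentation-and-reassembly idea just described concrete, here is a minimal Python sketch (Python is the language this book uses for its socket programming examples). Everything in it is invented for illustration: the 8-byte header format, the payload size, and the function names are not part of any real protocol. Real header formats, such as those of TCP and IP, are covered in later chapters.

```python
PAYLOAD_SIZE = 1000  # bytes of application data carried per packet (assumed)

def segment(data: bytes) -> list[bytes]:
    """Split data into packets, each prefixed with a simple header."""
    chunks = [data[i:i + PAYLOAD_SIZE] for i in range(0, len(data), PAYLOAD_SIZE)]
    packets = []
    for seq, chunk in enumerate(chunks):
        # Hypothetical header: 4-byte sequence number + 4-byte packet count.
        header = seq.to_bytes(4, "big") + len(chunks).to_bytes(4, "big")
        packets.append(header + chunk)
    return packets

def reassemble(packets: list[bytes]) -> bytes:
    """Rebuild the original data, even if packets arrive out of order."""
    payloads = {int.from_bytes(p[0:4], "big"): p[8:] for p in packets}
    return b"".join(payloads[seq] for seq in sorted(payloads))

message = b"x" * 2500            # 2,500 bytes of application data
pkts = segment(message)          # -> 3 packets (1000 + 1000 + 500 payload bytes)
assert reassemble(list(reversed(pkts))) == message

# Transmission rate ties in here: a link of rate R bits/second needs L/R
# seconds to push an L-bit packet onto the link. For the first packet above:
L_bits = len(pkts[0]) * 8        # 1,008 bytes -> 8,064 bits
R_bps = 1_000_000                # a 1 Mbps link (assumed)
print(L_bits / R_bps)            # 0.008064 seconds
```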
Both types of switches forward packets toward their ultimate destinations. Link-layer switches are typically used in access networks, while routers are typically used in the network core. The sequence of communication links and packet switches traversed by a packet from the sending end system to the receiving end system is known as a route or path through the network. Cisco predicts annual global IP traffic will pass the zettabyte (10^21 bytes) threshold by the end of 2016, and will reach 2 zettabytes per year by 2019 [Cisco VNI 2015].

Packet-switched networks (which transport packets) are in many ways similar to transportation networks of highways, roads, and intersections (which transport vehicles). Consider, for example, a factory that needs to move a large amount of cargo to some destination warehouse located thousands of kilometers away. At the factory, the cargo is segmented and loaded into a fleet of trucks. Each of the trucks then independently travels through the network of highways, roads, and intersections to the destination warehouse. At the destination warehouse, the cargo is unloaded and grouped with the rest of the cargo arriving from the same shipment. Thus, in many ways, packets are analogous to trucks, communication links are analogous to highways and roads, packet switches are analogous to intersections, and end systems are analogous to buildings. Just as a truck takes a path through the transportation network, a packet takes a path through a computer network.

End systems access the Internet through Internet Service Providers (ISPs), including residential ISPs such as local cable or telephone companies; corporate ISPs; university ISPs; ISPs that provide WiFi access in airports, hotels, coffee shops, and other public places; and cellular data ISPs, providing mobile access to our smartphones and other devices. Each ISP is in itself a network of packet switches and communication links. ISPs provide a variety of types of network access to the end systems, including residential broadband access such as cable modem or DSL, high-speed local area network access, and mobile wireless access. ISPs also provide Internet access to content providers, connecting Web sites and video servers directly to the Internet. The Internet is all about connecting end systems to each other, so the ISPs that provide access to end systems must also be interconnected. These lower-tier ISPs are interconnected through national and international upper-tier ISPs such as Level 3 Communications, AT&T, Sprint, and NTT. An upper-tier ISP consists of high-speed routers interconnected with high-speed fiber-optic links. Each ISP network, whether upper-tier or lower-tier, is managed independently, runs the IP protocol (see below), and conforms to certain naming and address conventions. We'll examine ISPs and their interconnection more closely in Section 1.3.

End systems, packet switches, and other pieces of the Internet run protocols that control the sending and receiving of information within the Internet. The Transmission Control Protocol (TCP) and the Internet Protocol (IP) are two of the most important protocols in the Internet. The IP protocol specifies the format of the packets that are sent and received among routers and end systems. The Internet's principal protocols are collectively known as TCP/IP. We'll begin looking into protocols in this introductory chapter. But that's just a start---much of this book is concerned with computer network protocols!
Given the importance of protocols to the Internet, it's important that everyone agree on what each and every protocol does, so that people can create systems and products that interoperate. This is where standards come into play. Internet standards are developed by the Internet Engineering Task Force (IETF) [IETF 2016]. The IETF standards documents are called requests for comments (RFCs). RFCs started out as general requests for comments (hence the name) to resolve network and protocol design problems that faced the precursor to the Internet [Allman 2011]. RFCs tend to be quite technical and detailed. They define protocols such as TCP, IP, HTTP (for the Web), and SMTP (for e-mail). There are currently more than 7,000 RFCs. Other bodies also specify standards for network components, most notably for network links. The IEEE 802 LAN/MAN Standards Committee [IEEE 802 2016], for example, specifies the Ethernet and wireless WiFi standards.

1.1.2 A Services Description

Our discussion above has identified many of the pieces that make up the Internet. But we can also describe the Internet from an entirely different angle---namely, as an infrastructure that provides services to applications. In addition to traditional applications such as e-mail and Web surfing, Internet applications include mobile smartphone and tablet applications, including Internet messaging, mapping with real-time road-traffic information, music streaming from the cloud, movie and television streaming, online social networks, video conferencing, multi-person games, and location-based recommendation systems. The applications are said to be distributed applications, since they involve multiple end systems that exchange data with each other. Importantly, Internet applications run on end systems---they do not run in the packet switches in the network core. Although packet switches facilitate the exchange of data among end systems, they are not concerned with the application that is the source or sink of data.

Let's explore a little more what we mean by an infrastructure that provides services to applications. To this end, suppose you have an exciting new idea for a distributed Internet application, one that may greatly benefit humanity or one that may simply make you rich and famous. How might you go about transforming this idea into an actual Internet application? Because applications run on end systems, you are going to need to write programs that run on the end systems. You might, for example, write your programs in Java, C, or Python. Now, because you are developing a distributed Internet application, the programs running on the different end systems will need to send data to each other. And here we get to a central issue---one that leads to the alternative way of describing the Internet as a platform for applications. How does one program running on one end system instruct the Internet to deliver data to another program running on another end system?

End systems attached to the Internet provide a socket interface that specifies how a program running on one end system asks the Internet infrastructure to deliver data to a specific destination program running on another end system. This Internet socket interface is a set of rules that the sending program must follow so that the Internet can deliver the data to the destination program. We'll discuss the Internet socket interface in detail in Chapter 2.
For now, +let's draw upon a simple analogy, one that we will frequently use in +this book. Suppose Alice wants to send a letter to Bob using the postal +service. Alice, of course, can't just write the letter (the data) and +drop the letter out her window. Instead, the postal service requires +that Alice put the letter in an envelope; write Bob's full name, +address, and zip code in the center of the envelope; seal the envelope; +put a stamp in the upper-right-hand corner of the envelope; and finally, +drop the envelope into an official postal service mailbox. Thus, the +postal service has its own "postal service interface," or set of rules, +that Alice must follow to have the postal service deliver her letter to +Bob. In a similar manner, the Internet has a socket interface that the +program sending data must follow to have the Internet deliver the data +to the program that will receive the data. The postal service, of +course, provides more than one service to its customers. It provides +express delivery, reception confirmation, ordinary use, and many more +services. In a similar manner, the Internet provides multiple services +to its applications. When you develop an Internet application, you too +must choose one of the Internet's services for your application. We'll +describe the Internet's services in Chapter 2. We have just given two +descriptions of the Internet; one in terms of its hardware and software +components, the other in terms of an infrastructure for providing +services to distributed applications. But perhaps you are still confused +as to what the Internet is. What are packet switching and TCP/IP? What +are routers? What kinds of communication links are present in the +Internet? What is a distributed application? How can a thermostat or +body scale be attached to the Internet? If you feel a bit overwhelmed by +all of this now, don't worry---the purpose of this book is to introduce +you to both the nuts and bolts of the Internet and the principles that +govern how and why it works. We'll explain these important terms and +questions in the following sections and chapters. + +1.1.3 What Is a Protocol? + +Now that we've got a bit of a feel for what the Internet is, let's +consider another important buzzword in computer networking: protocol. +What is a protocol? What does a protocol do? A Human Analogy It is +probably easiest to understand the notion of a computer network protocol +by first considering some human analogies, since we humans execute +protocols all of the time. Consider what you do when you want to ask +someone for the time of day. A typical exchange is shown in Figure 1.2. +Human protocol (or good manners, at least) dictates that one first offer +a greeting (the first "Hi" in Figure 1.2) to initiate communication with +someone else. The typical response to a "Hi" is a returned "Hi" message. +Implicitly, one then takes a cordial "Hi" response as an indication that +one can proceed and ask for the time of day. A different response to the +initial "Hi" (such as "Don't bother me!" or "I don't speak English," or +some unprintable reply) might + +Figure 1.2 A human protocol and a computer network protocol + +indicate an unwillingness or inability to communicate. In this case, the +human protocol would be not to ask for the time of day. Sometimes one +gets no response at all to a question, in which case one typically gives +up asking that person for the time. 
Note that in our human protocol, there are specific messages we send, and specific actions we take in response to the received reply messages or other events (such as no reply within some given amount of time). Clearly, transmitted and received messages, and actions taken when these messages are sent or received or other events occur, play a central role in a human protocol. If people run different protocols (for example, if one person has manners but the other does not, or if one understands the concept of time and the other does not) the protocols do not interoperate and no useful work can be accomplished. The same is true in networking---it takes two (or more) communicating entities running the same protocol in order to accomplish a task. Let's consider a second human analogy. Suppose you're in a college class (a computer networking class, for example!). The teacher is droning on about protocols and you're confused. The teacher stops to ask, "Are there any questions?" (a message that is transmitted to, and received by, all students who are not sleeping). You raise your hand (transmitting an implicit message to the teacher). Your teacher acknowledges you with a smile, saying "Yes . . ." (a transmitted message encouraging you to ask your question---teachers love to be asked questions), and you then ask your question (that is, transmit your message to your teacher). Your teacher hears your question (receives your question message) and answers (transmits a reply to you). Once again, we see that the transmission and receipt of messages, and a set of conventional actions taken when these messages are sent and received, are at the heart of this question-and-answer protocol. Network Protocols A network protocol is similar to a human protocol, except that the entities exchanging messages and taking actions are hardware or software components of some device (for example, computer, smartphone, tablet, router, or other network-capable device). All activity in the Internet that involves two or more communicating remote entities is governed by a protocol. For example, hardware-implemented protocols in two physically connected computers control the flow of bits on the "wire" between the two network interface cards; congestion-control protocols in end systems control the rate at which packets are transmitted between sender and receiver; protocols in routers determine a packet's path from source to destination. Protocols are running everywhere in the Internet, and consequently much of this book is about computer network protocols. As an example of a computer network protocol with which you are probably familiar, consider what happens when you make a request to a Web server, that is, when you type the URL of a Web page into your Web browser. The scenario is illustrated in the right half of Figure 1.2. First, your computer will send a connection request message to the Web server and wait for a reply. The Web server will eventually receive your connection request message and return a connection reply message. Knowing that it is now OK to request the Web document, your computer then sends the name of the Web page it wants to fetch from that Web server in a GET message. Finally, the Web server returns the Web page (file) to your computer.
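This exchange can be scripted directly. The sketch below uses Python's socket module to carry out the scenario just described: it opens a connection to a Web server, sends a GET message, and prints the start of the returned reply. The host name example.com is just an illustrative choice; any publicly reachable Web server would do.

```python
import socket

host = "example.com"  # any publicly reachable Web server

# Connection request and reply: TCP's handshake happens inside connect().
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, 80))

# The GET message names the Web page we want to fetch.
request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
sock.sendall(request.encode())

# The server returns the Web page, preceded by its response headers.
reply = b""
while chunk := sock.recv(4096):
    reply += chunk
sock.close()

print(reply[:300].decode(errors="replace"))
```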
Given the human and networking examples above, the exchange of messages and the actions taken when these messages are sent and received are the key defining elements of a protocol: A protocol defines the format and the order of messages exchanged between two or more communicating entities, as well as the actions taken on the transmission and/or receipt of a message or other event. The Internet, and computer networks in general, make extensive use of protocols. Different protocols are used to accomplish different communication tasks. As you read through this book, you will learn that some protocols are simple and straightforward, while others are complex and intellectually deep. Mastering the field of computer networking is equivalent to understanding the what, why, and how of networking protocols.

1.2 The Network Edge In the previous section we presented a high-level overview of the Internet and networking protocols. We are now going to delve a bit more deeply into the components of a computer network (and the Internet, in particular). We begin in this section at the edge of a network and look at the components with which we are most familiar---namely, the computers, smartphones and other devices that we use on a daily basis. In the next section we'll move from the network edge to the network core and examine switching and routing in computer networks. Recall from the previous section that in computer networking jargon, the computers and other devices connected to the Internet are often referred to as end systems. They are referred to as end systems because they sit at the edge of the Internet, as shown in Figure 1.3. The Internet's end systems include desktop computers (e.g., desktop PCs, Macs, and Linux boxes), servers (e.g., Web and e-mail servers), and mobile devices (e.g., laptops, smartphones, and tablets). Furthermore, an increasing number of non-traditional "things" are being attached to the Internet as end systems (see the Case History feature). End systems are also referred to as hosts because they host (that is, run) application programs such as a Web browser program, a Web server program, an e-mail client program, or an e-mail server program. Throughout this book we will use the terms hosts and end systems interchangeably; that is, host = end system.

Figure 1.3 End-system interaction

CASE HISTORY THE INTERNET OF THINGS Can you imagine a world in which just about everything is wirelessly connected to the Internet? A world in which most people, cars, bicycles, eye glasses, watches, toys, hospital equipment, home sensors, classrooms, video surveillance systems, atmospheric sensors, store-shelf products, and pets are connected? This world of the Internet of Things (IoT) may actually be just around the corner. By some estimates, as of 2015 there are already 5 billion things connected to the Internet, and the number could reach 25 billion by 2020 \[Gartner 2014\]. These things include our smartphones, which already follow us around in our homes, offices, and cars, reporting our geolocations and usage data to our ISPs and Internet applications. But in addition to our smartphones, a wide variety of non-traditional "things" are already available as products. For example, there are Internet-connected wearables, including watches (from Apple and many others) and eye glasses. Internet-connected glasses can, for example, upload everything we see to the cloud, allowing us to share our visual experiences with people around the world in real time. There are Internet-connected things already available for the smart home, including Internet-connected thermostats that can be controlled remotely from our smartphones, and Internet-connected body scales, enabling us to graphically review the progress of our diets from our smartphones. There are Internet-connected toys, including dolls that recognize and interpret a child's speech and respond appropriately. The IoT offers potentially revolutionary benefits to users. But at the same time there are also huge security and privacy risks. Attackers, via the Internet, might be able to hack into IoT devices or into the servers collecting data from IoT devices. For example, an attacker could hijack an Internet-connected doll and talk directly with a child; or an attacker could hack into a database that stores personal health and activity information collected from wearable devices. These security and privacy concerns could undermine the consumer confidence necessary for the technologies to meet their full potential and may result in less widespread adoption \[FTC 2015\].

Hosts are sometimes further divided into two categories: clients and servers. Informally, clients tend to be desktop and mobile PCs, smartphones, and so on, whereas servers tend to be more powerful machines that store and distribute Web pages, stream video, relay e-mail, and so on. Today, most of the servers from which we receive search results, e-mail, Web pages, and videos reside in large data centers. For example, Google has 50-100 data centers, including about 15 large centers, each with more than 100,000 servers.

1.2.1 Access Networks Having considered the applications and end systems at the "edge of the network," let's next consider the access network---the network that physically connects an end system to the first router (also known as the "edge router") on a path from the end system to any other distant end system. Figure 1.4 shows several types of access networks with thick, shaded lines and the settings (home, enterprise, and wide-area mobile wireless) in which they are used.

Figure 1.4 Access networks

Home Access: DSL, Cable, FTTH, Dial-Up, and Satellite In developed countries as of 2014, more than 78 percent of the households have Internet access, with Korea, Netherlands, Finland, and Sweden leading the way with more than 80 percent of households having Internet access, almost all via a high-speed broadband connection \[ITU 2015\]. Given this widespread use of home access networks, let's begin our overview of access networks by considering how homes connect to the Internet. Today, the two most prevalent types of broadband residential access are digital subscriber line (DSL) and cable. A residence typically obtains DSL Internet access from the same local telephone company (telco) that provides its wired local phone access. Thus, when DSL is used, a customer's telco is also its ISP. As shown in Figure 1.5, each customer's DSL modem uses the existing telephone line (twisted-pair copper wire, which we'll discuss in Section 1.2.2) to exchange data with a digital subscriber line access multiplexer (DSLAM) located in the telco's local central office (CO). The home's DSL modem takes digital data and translates it to high-frequency tones for transmission over telephone wires to the CO; the analog signals from many such houses are translated back into digital format at the DSLAM.
The residential telephone line carries both data and traditional telephone signals simultaneously, which are encoded at different frequencies: a high-speed downstream channel, in the 50 kHz to 1 MHz band; a medium-speed upstream channel, in the 4 kHz to 50 kHz band; and an ordinary two-way telephone channel, in the 0 to 4 kHz band. This approach makes the single DSL link appear as if there were three separate links, so that a telephone call and an Internet connection can share the DSL link at the same time. (We'll describe this technique of frequency-division multiplexing in Section 1.3.1.)

Figure 1.5 DSL Internet access

On the customer side, a splitter separates the data and telephone signals arriving at the home and forwards the data signal to the DSL modem. On the telco side, in the CO, the DSLAM separates the data and phone signals and sends the data into the Internet. Hundreds or even thousands of households connect to a single DSLAM \[Dischinger 2007\]. The DSL standards define multiple transmission rates, including 12 Mbps downstream and 1.8 Mbps upstream \[ITU 1999\], and 55 Mbps downstream and 15 Mbps upstream \[ITU 2006\]. Because the downstream and upstream rates are different, the access is said to be asymmetric. The actual downstream and upstream transmission rates achieved may be less than the rates noted above, as the DSL provider may purposefully limit a residential rate when tiered service (different rates, available at different prices) is offered. The maximum rate is also limited by the distance between the home and the CO, the gauge of the twisted-pair line, and the degree of electrical interference. Engineers have expressly designed DSL for short distances between the home and the CO; generally, if the residence is not located within 5 to 10 miles of the CO, the residence must resort to an alternative form of Internet access. While DSL makes use of the telco's existing local telephone infrastructure, cable Internet access makes use of the cable television company's existing cable television infrastructure. A residence obtains cable Internet access from the same company that provides its cable television. As illustrated in Figure 1.6, fiber optics connect the cable head end to neighborhood-level junctions, from which traditional coaxial cable is then used to reach individual houses and apartments. Each neighborhood junction typically supports 500 to 5,000 homes. Because both fiber and coaxial cable are employed in this system, it is often referred to as hybrid fiber coax (HFC).

Figure 1.6 A hybrid fiber-coaxial access network

Cable Internet access requires special modems, called cable modems. As with a DSL modem, the cable modem is typically an external device and connects to the home PC through an Ethernet port. (We will discuss Ethernet in great detail in Chapter 6.) At the cable head end, the cable modem termination system (CMTS) serves a function similar to that of the DSL network's DSLAM---turning the analog signal sent from the cable modems in many downstream homes back into digital format. Cable modems divide the HFC network into two channels, a downstream and an upstream channel. As with DSL, access is typically asymmetric, with the downstream channel typically allocated a higher transmission rate than the upstream channel. The DOCSIS 2.0 standard defines downstream rates of up to 42.8 Mbps and upstream rates of up to 30.7 Mbps.
As in the case of DSL networks, the maximum achievable rate may not be realized due to lower contracted data rates or media impairments. One important characteristic of cable Internet access is that it is a shared broadcast medium. In particular, every packet sent by the head end travels downstream on every link to every home, and every packet sent by a home travels on the upstream channel to the head end. For this reason, if several users are simultaneously downloading a video file on the downstream channel, the actual rate at which each user receives its video file will be significantly lower than the aggregate cable downstream rate. On the other hand, if there are only a few active users and they are all Web surfing, then each of the users may actually receive Web pages at the full cable downstream rate, because the users will rarely request a Web page at exactly the same time. Because the upstream channel is also shared, a distributed multiple access protocol is needed to coordinate transmissions and avoid collisions. (We'll discuss this collision issue in some detail in Chapter 6.) Although DSL and cable networks currently represent more than 85 percent of residential broadband access in the United States, an up-and-coming technology that provides even higher speeds is fiber to the home (FTTH) \[FTTH Council 2016\]. As the name suggests, the FTTH concept is simple---provide an optical fiber path from the CO directly to the home. Many countries today---including the UAE, South Korea, Hong Kong, Japan, Singapore, Taiwan, Lithuania, and Sweden---now have household penetration rates exceeding 30% \[FTTH Council 2016\]. There are several competing technologies for optical distribution from the CO to the homes. The simplest optical distribution network is called direct fiber, with one fiber leaving the CO for each home. More commonly, each fiber leaving the central office is actually shared by many homes; it is not until the fiber gets relatively close to the homes that it is split into individual customer-specific fibers. There are two competing optical-distribution network architectures that perform this splitting: active optical networks (AONs) and passive optical networks (PONs). AON is essentially switched Ethernet, which is discussed in Chapter 6. Here, we briefly discuss PON, which is used in Verizon's FIOS service. Figure 1.7 shows FTTH using the PON distribution architecture. Each home has an optical network terminator (ONT), which is connected by dedicated optical fiber to a neighborhood splitter. The splitter combines a number of homes (typically fewer than 100) onto a single, shared optical fiber, which connects to an optical line terminator (OLT) in the telco's CO.

Figure 1.7 FTTH Internet access

The OLT, providing conversion between optical and electrical signals, connects to the Internet via a telco router. In the home, users connect a home router (typically a wireless router) to the ONT and access the Internet via this home router. In the PON architecture, all packets sent from OLT to the splitter are replicated at the splitter (similar to a cable head end). FTTH can potentially provide Internet access rates in the gigabits per second range. However, most FTTH ISPs provide different rate offerings, with the higher rates naturally costing more money.
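To put these rates in perspective, the short calculation below (a back-of-the-envelope sketch, not a measurement) compares the time to download a 4-gigabyte movie at nominal rates drawn from this section; actual throughput is typically lower than these link rates.

```python
# Nominal downstream rates mentioned in this section, in bits per second.
rates = {
    "dial-up (56 kbps)": 56e3,
    "DSL (12 Mbps)": 12e6,
    "cable, DOCSIS 2.0 (42.8 Mbps)": 42.8e6,
    "FTTH (1 Gbps)": 1e9,
}

movie_bits = 4 * 8 * 1e9  # a 4-gigabyte movie, expressed in bits

for name, rate in rates.items():
    print(f"{name:30s} {movie_bits / rate:10.0f} seconds")
```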
The average downstream speed of US FTTH customers was approximately 20 Mbps in 2011 (compared with 13 Mbps for cable access networks and less than 5 Mbps for DSL) \[FTTH Council 2011b\]. Two other access network technologies are also used to provide Internet access to the home. In locations where DSL, cable, and FTTH are not available (e.g., in some rural settings), a satellite link can be used to connect a residence to the Internet at speeds of more than 1 Mbps; StarBand and HughesNet are two such satellite access providers. Dial-up access over traditional phone lines is based on the same model as DSL---a home modem connects over a phone line to a modem in the ISP. Compared with DSL and other broadband access networks, dial-up access is excruciatingly slow at 56 kbps. Access in the Enterprise (and the Home): Ethernet and WiFi On corporate and university campuses, and increasingly in home settings, a local area network (LAN) is used to connect an end system to the edge router. Although there are many types of LAN technologies, Ethernet is by far the most prevalent access technology in corporate, university, and home networks. As shown in Figure 1.8, Ethernet users use twisted-pair copper wire to connect to an Ethernet switch, a technology discussed in detail in Chapter 6. The Ethernet switch, or a network of such interconnected switches, is then in turn connected into the larger Internet.

Figure 1.8 Ethernet Internet access

With Ethernet access, users typically have 100 Mbps or 1 Gbps access to the Ethernet switch, whereas servers may have 1 Gbps or even 10 Gbps access. Increasingly, however, people are accessing the Internet wirelessly from laptops, smartphones, tablets, and other "things" (see earlier sidebar on "Internet of Things"). In a wireless LAN setting, wireless users transmit/receive packets to/from an access point that is connected into the enterprise's network (most likely using wired Ethernet), which in turn is connected to the wired Internet. A wireless LAN user must typically be within a few tens of meters of the access point. Wireless LAN access based on IEEE 802.11 technology, more colloquially known as WiFi, is now just about everywhere---universities, business offices, cafes, airports, homes, and even in airplanes. In many cities, one can stand on a street corner and be within range of ten or twenty base stations (for a browseable global map of 802.11 base stations that have been discovered and logged on a Web site by people who take great enjoyment in doing such things, see \[wigle.net 2016\]). As discussed in detail in Chapter 7, 802.11 today provides shared transmission rates of more than 100 Mbps. Even though Ethernet and WiFi access networks were initially deployed in enterprise (corporate, university) settings, they have recently become relatively common components of home networks. Many homes combine broadband residential access (that is, cable modems or DSL) with these inexpensive wireless LAN technologies to create powerful home networks \[Edwards 2011\]. Figure 1.9 shows a typical home network. This home network consists of a roaming laptop as well as a wired PC; a base station (the wireless access point), which communicates with the wireless PC and other wireless devices in the home; a cable modem, providing broadband access to the Internet; and a router, which interconnects the base station and the stationary PC with the cable modem.
This network allows household members to have broadband access to the Internet with one member roaming from the kitchen to the backyard to the bedrooms.

Figure 1.9 A typical home network

Wide-Area Wireless Access: 3G and LTE Increasingly, devices such as iPhones and Android devices are being used to message, share photos in social networks, watch movies, and stream music while on the run. These devices employ the same wireless infrastructure used for cellular telephony to send/receive packets through a base station that is operated by the cellular network provider. Unlike WiFi, a user need only be within a few tens of kilometers (as opposed to a few tens of meters) of the base station. Telecommunications companies have made enormous investments in so-called third-generation (3G) wireless, which provides packet-switched wide-area wireless Internet access at speeds in excess of 1 Mbps. But even higher-speed wide-area access technologies---a fourth-generation (4G) of wide-area wireless networks---are already being deployed. LTE (for "Long-Term Evolution"---a candidate for Bad Acronym of the Year Award) has its roots in 3G technology, and can achieve rates in excess of 10 Mbps. LTE downstream rates of many tens of Mbps have been reported in commercial deployments. We'll cover the basic principles of wireless networks and mobility, as well as WiFi, 3G, and LTE technologies (and more!) in Chapter 7.

1.2.2 Physical Media In the previous subsection, we gave an overview of some of the most important network access technologies in the Internet. As we described these technologies, we also indicated the physical media used. For example, we said that HFC uses a combination of fiber cable and coaxial cable. We said that DSL and Ethernet use copper wire. And we said that mobile access networks use the radio spectrum. In this subsection we provide a brief overview of these and other transmission media that are commonly used in the Internet.

In order to define what is meant by a physical medium, let us reflect on the brief life of a bit. Consider a bit traveling from one end system, through a series of links and routers, to another end system. This poor bit gets kicked around and transmitted many, many times! The source end system first transmits the bit, and shortly thereafter the first router in the series receives the bit; the first router then transmits the bit, and shortly thereafter the second router receives the bit; and so on. Thus our bit, when traveling from source to destination, passes through a series of transmitter-receiver pairs. For each transmitter-receiver pair, the bit is sent by propagating electromagnetic waves or optical pulses across a physical medium. The physical medium can take many shapes and forms and does not have to be of the same type for each transmitter-receiver pair along the path. Examples of physical media include twisted-pair copper wire, coaxial cable, multimode fiber-optic cable, terrestrial radio spectrum, and satellite radio spectrum. Physical media fall into two categories: guided media and unguided media. With guided media, the waves are guided along a solid medium, such as a fiber-optic cable, a twisted-pair copper wire, or a coaxial cable. With unguided media, the waves propagate in the atmosphere and in outer space, such as in a wireless LAN or a digital satellite channel. But before we get into the characteristics of the various media types, let us say a few words about their costs.
The actual cost of the physical link (copper wire, fiber-optic cable, and so on) is often relatively minor compared with other networking costs. In particular, the labor cost associated with the installation of the physical link can be orders of magnitude higher than the cost of the material. For this reason, many builders install twisted pair, optical fiber, and coaxial cable in every room in a building. Even if only one medium is initially used, there is a good chance that another medium could be used in the near future, and so money is saved by not having to lay additional wires in the future. Twisted-Pair Copper Wire The least expensive and most commonly used guided transmission medium is twisted-pair copper wire. For over a hundred years it has been used by telephone networks. In fact, more than 99 percent of the wired connections from the telephone handset to the local telephone switch use twisted-pair copper wire. Most of us have seen twisted pair in our homes (or those of our parents or grandparents!) and work environments. Twisted pair consists of two insulated copper wires, each about 1 mm thick, arranged in a regular spiral pattern. The wires are twisted together to reduce the electrical interference from similar pairs close by. Typically, a number of pairs are bundled together in a cable by wrapping the pairs in a protective shield. A wire pair constitutes a single communication link. Unshielded twisted pair (UTP) is commonly used for computer networks within a building, that is, for LANs. Data rates for LANs using twisted pair today range from 10 Mbps to 10 Gbps. The data rates that can be achieved depend on the thickness of the wire and the distance between transmitter and receiver. When fiber-optic technology emerged in the 1980s, many people disparaged twisted pair because of its relatively low bit rates. Some people even felt that fiber-optic technology would completely replace twisted pair. But twisted pair did not give up so easily. Modern twisted-pair technology, such as category 6a cable, can achieve data rates of 10 Gbps for distances up to a hundred meters. In the end, twisted pair has emerged as the dominant solution for high-speed LAN networking. As discussed earlier, twisted pair is also commonly used for residential Internet access. We saw that dial-up modem technology enables access at rates of up to 56 kbps over twisted pair. We also saw that DSL (digital subscriber line) technology has enabled residential users to access the Internet at tens of Mbps over twisted pair (when users live close to the ISP's central office). Coaxial Cable Like twisted pair, coaxial cable consists of two copper conductors, but the two conductors are concentric rather than parallel. With this construction and special insulation and shielding, coaxial cable can achieve high data transmission rates. Coaxial cable is quite common in cable television systems. As we saw earlier, cable television systems have recently been coupled with cable modems to provide residential users with Internet access at rates of tens of Mbps. In cable television and cable Internet access, the transmitter shifts the digital signal to a specific frequency band, and the resulting analog signal is sent from the transmitter to one or more receivers. Coaxial cable can be used as a guided shared medium. Specifically, a number of end systems can be connected directly to the cable, with each of the end systems receiving whatever is sent by the other end systems.
Fiber Optics An optical fiber is a thin, flexible medium that conducts pulses of light, with each pulse representing a bit. A single optical fiber can support tremendous bit rates, up to tens or even hundreds of gigabits per second. Optical fibers are immune to electromagnetic interference, have very low signal attenuation up to 100 kilometers, and are very hard to tap. These characteristics have made fiber optics the preferred long-haul guided transmission media, particularly for overseas links. Many of the long-distance telephone networks in the United States and elsewhere now use fiber optics exclusively. Fiber optics is also prevalent in the backbone of the Internet. However, the high cost of optical devices---such as transmitters, receivers, and switches---has hindered their deployment for short-haul transport, such as in a LAN or into the home in a residential access network. The Optical Carrier (OC) standard link speeds range from 51.8 Mbps to 39.8 Gbps; these specifications are often referred to as OC-n, where the link speed equals n × 51.8 Mbps. Standards in use today include OC-1, OC-3, OC-12, OC-24, OC-48, OC-96, OC-192, and OC-768. \[Mukherjee 2006, Ramaswami 2010\] provide coverage of various aspects of optical networking. Terrestrial Radio Channels Radio channels carry signals in the electromagnetic spectrum. They are an attractive medium because they require no physical wire to be installed, can penetrate walls, provide connectivity to a mobile user, and can potentially carry a signal for long distances. The characteristics of a radio channel depend significantly on the propagation environment and the distance over which a signal is to be carried. Environmental considerations determine path loss and shadow fading (which decrease the signal strength as the signal travels over a distance and around/through obstructing objects), multipath fading (due to signal reflection off of interfering objects), and interference (due to other transmissions and electromagnetic signals). Terrestrial radio channels can be broadly classified into three groups: those that operate over very short distances (e.g., within one or two meters); those that operate in local areas, typically spanning from ten to a few hundred meters; and those that operate in the wide area, spanning tens of kilometers. Personal devices such as wireless headsets, keyboards, and medical devices operate over short distances; the wireless LAN technologies described in Section 1.2.1 use local-area radio channels; the cellular access technologies use wide-area radio channels. We'll discuss radio channels in detail in Chapter 7. Satellite Radio Channels A communication satellite links two or more Earth-based microwave transmitter/receivers, known as ground stations. The satellite receives transmissions on one frequency band, regenerates the signal using a repeater (discussed below), and transmits the signal on another frequency. Two types of satellites are used in communications: geostationary satellites and low-earth orbiting (LEO) satellites \[Wiki Satellite 2016\]. Geostationary satellites permanently remain above the same spot on Earth. This stationary presence is achieved by placing the satellite in orbit at 36,000 kilometers above Earth's surface. This huge distance from ground station through satellite back to ground station introduces a substantial signal propagation delay of 280 milliseconds.
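This delay figure is easy to sanity-check with the rough calculation sketched below. The minimum path assumed here is straight up to the satellite and straight back down at the speed of light; the actual path is somewhat longer (ground stations are generally not directly beneath the satellite), which is one reason the quoted delay exceeds this lower bound.

```python
SPEED_OF_LIGHT = 3e8   # meters per second
ALTITUDE = 36_000e3    # geostationary orbit altitude, in meters

# Shortest possible path: straight up to the satellite, straight back down.
delay = 2 * ALTITUDE / SPEED_OF_LIGHT
print(f"minimum propagation delay: {delay * 1e3:.0f} ms")  # about 240 ms
```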
+Nevertheless, satellite links, which can operate at speeds of hundreds +of Mbps, are often used in areas without access to DSL or cable-based +Internet access. LEO satellites are placed much closer to Earth and do +not remain permanently above one spot on Earth. They rotate around Earth +(just as the Moon does) and may communicate with each other, as well as +with ground stations. To provide continuous coverage to an area, many +satellites need to be placed in orbit. There are currently many +low-altitude communication systems in development. LEO satellite +technology may be used for Internet access sometime in the future. + +1.3 The Network Core Having examined the Internet's edge, let us now +delve more deeply inside the network core---the mesh of packet switches +and links that interconnects the Internet's end systems. Figure 1.10 +highlights the network core with thick, shaded lines. + +Figure 1.10 The network core + +1.3.1 Packet Switching In a network application, end systems exchange +messages with each other. Messages can contain anything the application +designer wants. Messages may perform a control function (for example, +the "Hi" messages in our handshaking example in Figure 1.2) or can +contain data, such as an e-mail message, a JPEG image, or an MP3 audio +file. To send a message from a source end system to a destination end +system, the source breaks long messages into smaller chunks of data +known as packets. Between source and destination, each packet travels +through communication links and packet switches (for which there are two +predominant types, routers and link-layer switches). Packets are +transmitted over each communication link at a rate equal to the full +transmission rate of the link. So, if a source end system or a packet +switch is sending a packet of L bits over a link with transmission rate +R bits/sec, then the time to transmit the packet is L / R seconds. +Store-and-Forward Transmission Most packet switches use +store-and-forward transmission at the inputs to the links. +Store-and-forward transmission means that the packet switch must receive +the entire packet before it can begin to transmit the first bit of the +packet onto the outbound link. To explore store-and-forward transmission +in more detail, consider a simple network consisting of two end systems +connected by a single router, as shown in Figure 1.11. A router will +typically have many incident links, since its job is to switch an +incoming packet onto an outgoing link; in this simple example, the +router has the rather simple task of transferring a packet from one +(input) link to the only other attached link. In this example, the +source has three packets, each consisting of L bits, to send to the +destination. At the snapshot of time shown in Figure 1.11, the source +has transmitted some of packet 1, and the front of packet 1 has already +arrived at the router. Because the router employs store-and-forwarding, +at this instant of time, the router cannot transmit the bits it has +received; instead it must first buffer (i.e., "store") the packet's +bits. Only after the router has received all of the packet's bits can it +begin to transmit (i.e., "forward") the packet onto the outbound link. +To gain some insight into store-and-forward transmission, let's now +calculate the amount of time that elapses from when the source begins to +send the packet until the destination has received the entire packet. 
(Here we will ignore propagation delay---the time it takes for the bits to travel across the wire at near the speed of light---which will be discussed in Section 1.4.) The source begins to transmit at time 0; at time L/R seconds, the source has transmitted the entire packet, and the entire packet has been received and stored at the router (since there is no propagation delay). At time L/R seconds, since the router has just received the entire packet, it can begin to transmit the packet onto the outbound link towards the destination; at time 2L/R, the router has transmitted the entire packet, and the entire packet has been received by the destination. Thus, the total delay is 2L/R.

Figure 1.11 Store-and-forward packet switching

If the switch instead forwarded bits as soon as they arrive (without first receiving the entire packet), then the total delay would be L/R since bits are not held up at the router. But, as we will discuss in Section 1.4, routers need to receive, store, and process the entire packet before forwarding. Now let's calculate the amount of time that elapses from when the source begins to send the first packet until the destination has received all three packets. As before, at time L/R, the router begins to forward the first packet. But also at time L/R the source will begin to send the second packet, since it has just finished sending the entire first packet. Thus, at time 2L/R, the destination has received the first packet and the router has received the second packet. Similarly, at time 3L/R, the destination has received the first two packets and the router has received the third packet. Finally, at time 4L/R the destination has received all three packets! Let's now consider the general case of sending one packet from source to destination over a path consisting of N links each of rate R (thus, there are N-1 routers between source and destination). Applying the same logic as above, we see that the end-to-end delay is

d_end-to-end = N (L/R)    (1.1)

You may now want to try to determine what the delay would be for P packets sent over a series of N links.
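Equation 1.1 is also easy to evaluate numerically. The snippet below plugs in arbitrary example values for L, R, and N; none of these numbers come from the text.

```python
L = 12_000  # packet length in bits (example value)
R = 10e6    # transmission rate of every link, bits/sec (example value)
N = 3       # number of links on the source-to-destination path

# The packet must be fully received, then retransmitted, at each hop,
# so the L/R transmission delay is paid once per link.
delay = N * (L / R)
print(f"end-to-end delay: {delay * 1e3:.1f} ms")  # 3 x 1.2 ms = 3.6 ms
```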
Queuing Delays and Packet Loss Each packet switch has multiple links attached to it. For each attached link, the packet switch has an output buffer (also called an output queue), which stores packets that the router is about to send into that link. The output buffers play a key role in packet switching. If an arriving packet needs to be transmitted onto a link but finds the link busy with the transmission of another packet, the arriving packet must wait in the output buffer. Thus, in addition to the store-and-forward delays, packets suffer output buffer queuing delays. These delays are variable and depend on the level of congestion in the network. Since the amount of buffer space is finite, an arriving packet may find that the buffer is completely full with other packets waiting for transmission. In this case, packet loss will occur---either the arriving packet or one of the already-queued packets will be dropped.

Figure 1.12 Packet switching

Figure 1.12 illustrates a simple packet-switched network. As in Figure 1.11, packets are represented by three-dimensional slabs. The width of a slab represents the number of bits in the packet. In this figure, all packets have the same width and hence the same length. Suppose Hosts A and B are sending packets to Host E. Hosts A and B first send their packets along 100 Mbps Ethernet links to the first router. The router then directs these packets to the 15 Mbps link. If, during a short interval of time, the arrival rate of packets to the router (when converted to bits per second) exceeds 15 Mbps, congestion will occur at the router as packets queue in the link's output buffer before being transmitted onto the link. For example, if Hosts A and B each send a burst of five packets back-to-back at the same time, then most of these packets will spend some time waiting in the queue. The situation is, in fact, entirely analogous to many everyday situations---for example, when we wait in line for a bank teller or wait in front of a tollbooth. We'll examine this queuing delay in more detail in Section 1.4. Forwarding Tables and Routing Protocols Earlier, we said that a router takes a packet arriving on one of its attached communication links and forwards that packet onto another one of its attached communication links. But how does the router determine which link it should forward the packet onto? Packet forwarding is actually done in different ways in different types of computer networks. Here, we briefly describe how it is done in the Internet.

In the Internet, every end system has an address called an IP address. When a source end system wants to send a packet to a destination end system, the source includes the destination's IP address in the packet's header. As with postal addresses, this address has a hierarchical structure. When a packet arrives at a router in the network, the router examines a portion of the packet's destination address and forwards the packet to an adjacent router. More specifically, each router has a forwarding table that maps destination addresses (or portions of the destination addresses) to that router's outbound links. When a packet arrives at a router, the router examines the address and searches its forwarding table, using this destination address, to find the appropriate outbound link. The router then directs the packet to this outbound link. The end-to-end routing process is analogous to a car driver who does not use maps but instead prefers to ask for directions. For example, suppose Joe is driving from Philadelphia to 156 Lakeside Drive in Orlando, Florida. Joe first drives to his neighborhood gas station and asks how to get to 156 Lakeside Drive in Orlando, Florida. The gas station attendant extracts the Florida portion of the address and tells Joe that he needs to get onto the interstate highway I-95 South, which has an entrance just next to the gas station. He also tells Joe that once he enters Florida, he should ask someone else there. Joe then takes I-95 South until he gets to Jacksonville, Florida, at which point he asks another gas station attendant for directions. The attendant extracts the Orlando portion of the address and tells Joe that he should continue on I-95 to Daytona Beach and then ask someone else. In Daytona Beach, another gas station attendant also extracts the Orlando portion of the address and tells Joe that he should take I-4 directly to Orlando. Joe takes I-4 and gets off at the Orlando exit. Joe goes to another gas station attendant, and this time the attendant extracts the Lakeside Drive portion of the address and tells Joe the road he must follow to get to Lakeside Drive. Once Joe reaches Lakeside Drive, he asks a kid on a bicycle how to get to his destination. The kid extracts the 156 portion of the address and points to the house. Joe finally reaches his ultimate destination.
In the above analogy, the gas station attendants and kids on bicycles are analogous to routers. We just learned that a router uses a packet's destination address to index a forwarding table and determine the appropriate outbound link. But this statement raises yet another question: How do forwarding tables get set? Are they configured by hand in each and every router, or does the Internet use a more automated procedure? This issue will be studied in depth in Chapter 5. But to whet your appetite here, we'll note now that the Internet has a number of special routing protocols that are used to automatically set the forwarding tables. A routing protocol may, for example, determine the shortest path from each router to each destination and use the shortest path results to configure the forwarding tables in the routers. How would you actually like to see the end-to-end route that packets take in the Internet? We now invite you to get your hands dirty by interacting with the Traceroute program. Simply visit the site www.traceroute.org, choose a source in a particular country, and trace the route from that source to your computer. (For a discussion of Traceroute, see Section 1.4.)
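To make the forwarding idea concrete, here is a toy sketch. It is emphatically not how real routers are implemented (they match binary IP prefixes in specialized hardware), and the prefixes and link numbers below are invented for illustration: a destination address is matched against a table of address prefixes, and the most specific match determines the outbound link.

```python
# A toy forwarding table mapping destination-address prefixes to
# outbound link numbers (all values invented for this example).
forwarding_table = {
    "203.0.113.": 1,
    "203.0.": 2,
    "198.51.": 3,
}

def outbound_link(destination: str) -> int:
    """Return the link for the longest prefix matching the address."""
    matches = [p for p in forwarding_table if destination.startswith(p)]
    return forwarding_table[max(matches, key=len)] if matches else 0

print(outbound_link("203.0.113.7"))   # -> 1 (most specific match wins)
print(outbound_link("203.0.42.9"))    # -> 2
print(outbound_link("198.51.100.1"))  # -> 3
```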
1.3.2 Circuit Switching There are two fundamental approaches to moving data through a network of links and switches: circuit switching and packet switching. Having covered packet-switched networks in the previous subsection, we now turn our attention to circuit-switched networks. In circuit-switched networks, the resources needed along a path (buffers, link transmission rate) to provide for communication between the end systems are reserved for the duration of the communication session between the end systems. In packet-switched networks, these resources are not reserved; a session's messages use the resources on demand and, as a consequence, may have to wait (that is, queue) for access to a communication link. As a simple analogy, consider two restaurants, one that requires reservations and another that neither requires reservations nor accepts them. For the restaurant that requires reservations, we have to go through the hassle of calling before we leave home. But when we arrive at the restaurant we can, in principle, immediately be seated and order our meal. For the restaurant that does not require reservations, we don't need to bother to reserve a table. But when we arrive at the restaurant, we may have to wait for a table before we can be seated. Traditional telephone networks are examples of circuit-switched networks. Consider what happens when one person wants to send information (voice or facsimile) to another over a telephone network. Before the sender can send the information, the network must establish a connection between the sender and the receiver. This is a bona fide connection for which the switches on the path between the sender and receiver maintain connection state for that connection. In the jargon of telephony, this connection is called a circuit. When the network establishes the circuit, it also reserves a constant transmission rate in the network's links (representing a fraction of each link's transmission capacity) for the duration of the connection. Since a given transmission rate has been reserved for this sender-to-receiver connection, the sender can transfer the data to the receiver at the guaranteed constant rate. Figure 1.13 illustrates a circuit-switched network. In this network, the four circuit switches are interconnected by four links. Each of these links has four circuits, so that each link can support four simultaneous connections. The hosts (for example, PCs and workstations) are each directly connected to one of the switches. When two hosts want to communicate, the network establishes a dedicated end-to-end connection between the two hosts. Thus, in order for Host A to communicate with Host B, the network must first reserve one circuit on each of two links. In this example, the dedicated end-to-end connection uses the second circuit in the first link and the fourth circuit in the second link. Because each link has four circuits, for each link used by the end-to-end connection, the connection gets one fourth of the link's total transmission capacity for the duration of the connection. Thus, for example, if each link between adjacent switches has a transmission rate of 1 Mbps, then each end-to-end circuit-switched connection gets 250 kbps of dedicated transmission rate.

Figure 1.13 A simple circuit-switched network consisting of four switches and four links

In contrast, consider what happens when one host wants to send a packet to another host over a packet-switched network, such as the Internet. As with circuit switching, the packet is transmitted over a series of communication links. But different from circuit switching, the packet is sent into the network without reserving any link resources whatsoever. If one of the links is congested because other packets need to be transmitted over the link at the same time, then the packet will have to wait in a buffer at the sending side of the transmission link and suffer a delay. The Internet makes its best effort to deliver packets in a timely manner, but it does not make any guarantees. Multiplexing in Circuit-Switched Networks A circuit in a link is implemented with either frequency-division multiplexing (FDM) or time-division multiplexing (TDM). With FDM, the frequency spectrum of a link is divided up among the connections established across the link. Specifically, the link dedicates a frequency band to each connection for the duration of the connection. In telephone networks, this frequency band typically has a width of 4 kHz (that is, 4,000 hertz or 4,000 cycles per second). The width of the band is called, not surprisingly, the bandwidth. FM radio stations also use FDM to share the frequency spectrum between 88 MHz and 108 MHz, with each station being allocated a specific frequency band. For a TDM link, time is divided into frames of fixed duration, and each frame is divided into a fixed number of time slots. When the network establishes a connection across a link, the network dedicates one time slot in every frame to this connection. These slots are dedicated for the sole use of that connection, with one time slot available for use (in every frame) to transmit the connection's data.

Figure 1.14 With FDM, each circuit continuously gets a fraction of the bandwidth. With TDM, each circuit gets all of the bandwidth periodically during brief intervals of time (that is, during slots)

Figure 1.14 illustrates FDM and TDM for a specific network link supporting up to four circuits. For FDM, the frequency domain is segmented into four bands, each of bandwidth 4 kHz. For TDM, the time domain is segmented into frames, with four time slots in each frame; each circuit is assigned the same dedicated slot in the revolving TDM frames.
For TDM, the transmission rate of a circuit is equal to the frame rate multiplied by the number of bits in a slot. For example, if the link transmits 8,000 frames per second and each slot consists of 8 bits, then the transmission rate of each circuit is 64 kbps. Proponents of packet switching have always argued that circuit switching is wasteful because the dedicated circuits are idle during silent periods. For example, when one person in a telephone call stops talking, the idle network resources (frequency bands or time slots in the links along the connection's route) cannot be used by other ongoing connections. As another example of how these resources can be underutilized, consider a radiologist who uses a circuit-switched network to remotely access a series of x-rays. The radiologist sets up a connection, requests an image, contemplates the image, and then requests a new image. Network resources are allocated to the connection but are not used (i.e., are wasted) during the radiologist's contemplation periods. Proponents of packet switching also enjoy pointing out that establishing end-to-end circuits and reserving end-to-end transmission capacity is complicated and requires complex signaling software to coordinate the operation of the switches along the end-to-end path. Before we finish our discussion of circuit switching, let's work through a numerical example that should shed further insight on the topic. Let us consider how long it takes to send a file of 640,000 bits from Host A to Host B over a circuit-switched network. Suppose that all links in the network use TDM with 24 slots and have a bit rate of 1.536 Mbps. Also suppose that it takes 500 msec to establish an end-to-end circuit before Host A can begin to transmit the file. How long does it take to send the file? Each circuit has a transmission rate of (1.536 Mbps)/24 = 64 kbps, so it takes (640,000 bits)/(64 kbps) = 10 seconds to transmit the file. To this 10 seconds we add the circuit establishment time, giving 10.5 seconds to send the file. Note that the transmission time is independent of the number of links: The transmission time would be 10 seconds if the end-to-end circuit passed through one link or a hundred links. (The actual end-to-end delay also includes a propagation delay; see Section 1.4.)
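The arithmetic of this example is summarized in the short calculation below, using exactly the numbers given in the text.

```python
link_rate = 1.536e6     # bits per second
slots_per_frame = 24
setup_time = 0.5        # seconds to establish the end-to-end circuit
file_size = 640_000     # bits

circuit_rate = link_rate / slots_per_frame  # 64,000 bits/sec per circuit
total_time = setup_time + file_size / circuit_rate
print(f"total time to send the file: {total_time:.1f} seconds")  # 10.5
```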
Packet Switching Versus Circuit Switching Having described circuit switching and packet switching, let us compare the two. Critics of packet switching have often argued that packet switching is not suitable for real-time services (for example, telephone calls and video conference calls) because of its variable and unpredictable end-to-end delays (due primarily to variable and unpredictable queuing delays). Proponents of packet switching argue that (1) it offers better sharing of transmission capacity than circuit switching and (2) it is simpler, more efficient, and less costly to implement than circuit switching. An interesting discussion of packet switching versus circuit switching is \[Molinero-Fernandez 2002\]. Generally speaking, people who do not like to hassle with restaurant reservations prefer packet switching to circuit switching. Why is packet switching more efficient? Let's look at a simple example. Suppose users share a 1 Mbps link. Also suppose that each user alternates between periods of activity, when a user generates data at a constant rate of 100 kbps, and periods of inactivity, when a user generates no data. Suppose further that a user is active only 10 percent of the time (and is idly drinking coffee during the remaining 90 percent of the time). With circuit switching, 100 kbps must be reserved for each user at all times. For example, with circuit-switched TDM, if a one-second frame is divided into 10 time slots of 100 ms each, then each user would be allocated one time slot per frame. Thus, the circuit-switched link can support only 10 (= 1 Mbps/100 kbps) simultaneous users. With packet switching, the probability that a specific user is active is 0.1 (that is, 10 percent). If there are 35 users, the probability that there are 11 or more simultaneously active users is approximately 0.0004. (Homework Problem P8 outlines how this probability is obtained.) When there are 10 or fewer simultaneously active users (which happens with probability 0.9996), the aggregate arrival rate of data is less than or equal to 1 Mbps, the output rate of the link. Thus, when there are 10 or fewer active users, users' packets flow through the link essentially without delay, as is the case with circuit switching. When there are more than 10 simultaneously active users, then the aggregate arrival rate of packets exceeds the output capacity of the link, and the output queue will begin to grow. (It continues to grow until the aggregate input rate falls back below 1 Mbps, at which point the queue will begin to diminish in length.) Because the probability of having more than 10 simultaneously active users is minuscule in this example, packet switching provides essentially the same performance as circuit switching, but does so while allowing for more than three times the number of users.
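The 0.0004 figure quoted above can be checked directly: since each of the 35 users is active independently with probability 0.1, the number of simultaneously active users follows a binomial distribution, and summing its upper tail gives the probability in question.

```python
from math import comb

n, p = 35, 0.1  # 35 users, each independently active 10% of the time

# P(11 or more of the n users are active at the same time)
tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(11, n + 1))
print(f"P(11 or more active) = {tail:.6f}")  # approximately 0.0004
```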
Let's now consider a second simple example. Suppose there are 10 users and that one user suddenly generates one thousand 1,000-bit packets, while other users remain quiescent and do not generate packets. Under TDM circuit switching with 10 slots per frame and each slot consisting of 1,000 bits, the active user can only use its one time slot per frame to transmit data, while the remaining nine time slots in each frame remain idle. It will be 10 seconds before all of the active user's one million bits of data have been transmitted. In the case of packet switching, the active user can continuously send its packets at the full link rate of 1 Mbps, since there are no other users generating packets that need to be multiplexed with the active user's packets. In this case, all of the active user's data will be transmitted within 1 second. The above examples illustrate two ways in which the performance of packet switching can be superior to that of circuit switching. They also highlight the crucial difference between the two forms of sharing a link's transmission rate among multiple data streams. Circuit switching pre-allocates use of the transmission link regardless of demand, with allocated but unneeded link time going unused. Packet switching, on the other hand, allocates link use on demand. Link transmission capacity will be shared on a packet-by-packet basis only among those users who have packets that need to be transmitted over the link. Although packet switching and circuit switching are both prevalent in today's telecommunication networks, the trend has certainly been in the direction of packet switching. Even many of today's circuit-switched telephone networks are slowly migrating toward packet switching. In particular, telephone networks often use packet switching for the expensive overseas portion of a telephone call.

1.3.3 A Network of Networks We saw earlier that end systems (PCs, smartphones, Web servers, mail servers, and so on) connect into the Internet via an access ISP. The access ISP can provide either wired or wireless connectivity, using an array of access technologies including DSL, cable, FTTH, WiFi, and cellular. Note that the access ISP does not have to be a telco or a cable company; instead it can be, for example, a university (providing Internet access to students, staff, and faculty), or a company (providing access for its employees). But connecting end users and content providers into an access ISP is only a small piece of solving the puzzle of connecting the billions of end systems that make up the Internet. To complete this puzzle, the access ISPs themselves must be interconnected. This is done by creating a network of networks---understanding this phrase is the key to understanding the Internet. Over the years, the network of networks that forms the Internet has evolved into a very complex structure. Much of this evolution is driven by economics and national policy, rather than by performance considerations. In order to understand today's Internet network structure, let's incrementally build a series of network structures, with each new structure being a better approximation of the complex Internet that we have today. Recall that the overarching goal is to interconnect the access ISPs so that all end systems can send packets to each other. One naive approach would be to have each access ISP directly connect with every other access ISP. Such a mesh design is, of course, much too costly for the access ISPs, as it would require each access ISP to have a separate communication link to each of the hundreds of thousands of other access ISPs all over the world.
Network Structure 2, just described, is a two-tier hierarchy with global transit providers residing at the top tier and access ISPs at the bottom tier. This assumes that global transit ISPs are not only capable of getting close to each and every access ISP, but also find it economically desirable to do so. In reality, although some ISPs do have impressive global coverage and do directly connect with many access ISPs, no ISP has presence in each and every city in the world. Instead, in any given region, there may be a regional ISP to which the access ISPs in the region connect. Each regional ISP then connects to tier-1 ISPs. Tier-1 ISPs are similar to our (imaginary) global transit ISP; but tier-1 ISPs, which actually do exist, do not have a presence in every city in the world. There are approximately a dozen tier-1 ISPs, including Level 3 Communications, AT&T, Sprint, and NTT. Interestingly, no group officially sanctions tier-1 status; as the saying goes---if you have to ask if you're a member of a group, you're probably not.

Returning to this network of networks, not only are there multiple competing tier-1 ISPs, there may be multiple competing regional ISPs in a region. In such a hierarchy, each access ISP pays the regional ISP to which it connects, and each regional ISP pays the tier-1 ISP to which it connects. (An access ISP can also connect directly to a tier-1 ISP, in which case it pays the tier-1 ISP.) Thus, there is a customer-provider relationship at each level of the hierarchy. Note that the tier-1 ISPs do not pay anyone, as they are at the top of the hierarchy. To further complicate matters, in some regions, there may be a larger regional ISP (possibly spanning an entire country) to which the smaller regional ISPs in that region connect; the larger regional ISP then connects to a tier-1 ISP. For example, in China, there are access ISPs in each city, which connect to provincial ISPs, which in turn connect to national ISPs, which finally connect to tier-1 ISPs \[Tian 2012\]. We refer to this multi-tier hierarchy, which is still only a crude approximation of today's Internet, as Network Structure 3.

To build a network that more closely resembles today's Internet, we must add points of presence (PoPs), multi-homing, peering, and Internet exchange points (IXPs) to the hierarchical Network Structure 3. PoPs exist in all levels of the hierarchy, except for the bottom (access ISP) level. A PoP is simply a group of one or more routers (at the same location) in the provider's network where customer ISPs can connect into the provider ISP. For a customer network to connect to a provider's PoP, it can lease a high-speed link from a third-party telecommunications provider to directly connect one of its routers to a router at the PoP. Any ISP (except for tier-1 ISPs) may choose to multi-home, that is, to connect to two or more provider ISPs. So, for example, an access ISP may multi-home with two regional ISPs, or it may multi-home with two regional ISPs and also with a tier-1 ISP. Similarly, a regional ISP may multi-home with multiple tier-1 ISPs. When an ISP multi-homes, it can continue to send and receive packets into the Internet even if one of its providers has a failure. As we just learned, customer ISPs pay their provider ISPs to obtain global Internet interconnectivity. The amount that a customer ISP pays a provider ISP reflects the amount of traffic it exchanges with the provider.
To reduce these costs, a pair of nearby ISPs at the same level of the hierarchy can peer, that is, they can directly connect their networks together so that all the traffic between them passes over the direct connection rather than through upstream intermediaries. When two ISPs peer, it is typically settlement-free, that is, neither ISP pays the other. As noted earlier, tier-1 ISPs also peer with one another, settlement-free. For a readable discussion of peering and customer-provider relationships, see \[Van der Berg 2008\]. Along these same lines, a third-party company can create an Internet Exchange Point (IXP), which is a meeting point where multiple ISPs can peer together. An IXP is typically in a stand-alone building with its own switches \[Ager 2012\]. There are over 400 IXPs in the Internet today \[IXP List 2016\]. We refer to this ecosystem---consisting of access ISPs, regional ISPs, tier-1 ISPs, PoPs, multi-homing, peering, and IXPs---as Network Structure 4.

We now finally arrive at Network Structure 5, which describes today's Internet. Network Structure 5, illustrated in Figure 1.15, builds on top of Network Structure 4 by adding content-provider networks. Google is currently one of the leading examples of such a content-provider network. As of this writing, it is estimated that Google has 50--100 data centers distributed across North America, Europe, Asia, South America, and Australia. Some of these data centers house over one hundred thousand servers, while other data centers are smaller, housing only hundreds of servers. The Google data centers are all interconnected via Google's private TCP/IP network, which spans the entire globe but is nevertheless separate from the public Internet. Importantly, the Google private network only carries traffic to/from Google servers. As shown in Figure 1.15, the Google private network attempts to "bypass" the upper tiers of the Internet by peering (settlement-free) with lower-tier ISPs, either by directly connecting with them or by connecting with them at IXPs \[Labovitz 2010\]. However, because many access ISPs can still only be reached by transiting through tier-1 networks, the Google network also connects to tier-1 ISPs, and pays those ISPs for the traffic it exchanges with them. By creating its own network, a content provider not only reduces its payments to upper-tier ISPs, but also has greater control of how its services are ultimately delivered to end users. Google's network infrastructure is described in greater detail in Section 2.6.

In summary, today's Internet---a network of networks---is complex, consisting of a dozen or so tier-1 ISPs and hundreds of thousands of lower-tier ISPs. The ISPs are diverse in their coverage, with some spanning multiple continents and oceans, and others limited to narrow geographic regions. The lower-tier ISPs connect to the higher-tier ISPs, and the higher-tier ISPs interconnect with one another. Users and content providers are customers of lower-tier ISPs, and lower-tier ISPs are customers of higher-tier ISPs. In recent years, major content providers have also created their own networks and connect directly into lower-tier ISPs where possible.

Figure 1.15 Interconnection of ISPs

1.4 Delay, Loss, and Throughput in Packet-Switched Networks

Back in Section 1.1 we said that the Internet can be viewed as an infrastructure that provides services to distributed applications running on end systems.
Ideally, we would like Internet services to be able to move as much data as we want between any two end systems, instantaneously, without any loss of data. Alas, this is a lofty goal, one that is unachievable in reality. Instead, computer networks necessarily constrain throughput (the amount of data per second that can be transferred) between end systems, introduce delays between end systems, and can actually lose packets. On one hand, it is unfortunate that the physical laws of reality introduce delay and loss as well as constrain throughput. On the other hand, because computer networks have these problems, there are many fascinating issues surrounding how to deal with the problems---more than enough issues to fill a course on computer networking and to motivate thousands of PhD theses! In this section, we'll begin to examine and quantify delay, loss, and throughput in computer networks.

1.4.1 Overview of Delay in Packet-Switched Networks

Recall that a packet starts in a host (the source), passes through a series of routers, and ends its journey in another host (the destination). As a packet travels from one node (host or router) to the subsequent node (host or router) along this path, the packet suffers from several types of delays at each node along the path. The most important of these delays are the nodal processing delay, queuing delay, transmission delay, and propagation delay; together, these delays accumulate to give a total nodal delay. The performance of many Internet applications---such as search, Web browsing, e-mail, maps, instant messaging, and voice-over-IP---is greatly affected by network delays. In order to acquire a deep understanding of packet switching and computer networks, we must understand the nature and importance of these delays.

Types of Delay

Let's explore these delays in the context of Figure 1.16. As part of its end-to-end route between source and destination, a packet is sent from the upstream node through router A to router B. Our goal is to characterize the nodal delay at router A. Note that router A has an outbound link leading to router B. This link is preceded by a queue (also known as a buffer). When the packet arrives at router A from the upstream node, router A examines the packet's header to determine the appropriate outbound link for the packet and then directs the packet to this link. In this example, the outbound link for the packet is the one that leads to router B. A packet can be transmitted on a link only if there is no other packet currently being transmitted on the link and if there are no other packets preceding it in the queue; if the link is currently busy or if there are other packets already queued for the link, the newly arriving packet will then join the queue.

Figure 1.16 The nodal delay at router A

Processing Delay

The time required to examine the packet's header and determine where to direct the packet is part of the processing delay. The processing delay can also include other factors, such as the time needed to check for bit-level errors in the packet that occurred in transmitting the packet's bits from the upstream node to router A. Processing delays in high-speed routers are typically on the order of microseconds or less. After this nodal processing, the router directs the packet to the queue that precedes the link to router B. (In Chapter 4 we'll study the details of how a router operates.)
Queuing Delay

At the queue, the packet experiences a queuing delay as it waits to be transmitted onto the link. The length of the queuing delay of a specific packet will depend on the number of earlier-arriving packets that are queued and waiting for transmission onto the link. If the queue is empty and no other packet is currently being transmitted, then our packet's queuing delay will be zero. On the other hand, if the traffic is heavy and many other packets are also waiting to be transmitted, the queuing delay will be long. We will see shortly that the number of packets that an arriving packet might expect to find is a function of the intensity and nature of the traffic arriving at the queue. Queuing delays can be on the order of microseconds to milliseconds in practice.

Transmission Delay

Assuming that packets are transmitted in a first-come-first-served manner, as is common in packet-switched networks, our packet can be transmitted only after all the packets that have arrived before it have been transmitted. Denote the length of the packet by L bits, and denote the transmission rate of the link from router A to router B by R bits/sec. For example, for a 10 Mbps Ethernet link, the rate is R = 10 Mbps; for a 100 Mbps Ethernet link, the rate is R = 100 Mbps. The transmission delay is L/R. This is the amount of time required to push (that is, transmit) all of the packet's bits into the link. Transmission delays are typically on the order of microseconds to milliseconds in practice.

Propagation Delay

Once a bit is pushed into the link, it needs to propagate to router B. The time required to propagate from the beginning of the link to router B is the propagation delay. The bit propagates at the propagation speed of the link. The propagation speed depends on the physical medium of the link (that is, fiber optics, twisted-pair copper wire, and so on) and is in the range of 2⋅10^8 meters/sec to 3⋅10^8 meters/sec, which is equal to, or a little less than, the speed of light. The propagation delay is the distance between two routers divided by the propagation speed. That is, the propagation delay is d/s, where d is the distance between router A and router B and s is the propagation speed of the link. Once the last bit of the packet propagates to node B, it and all the preceding bits of the packet are stored in router B. The whole process then continues with router B now performing the forwarding. In wide-area networks, propagation delays are on the order of milliseconds.

Comparing Transmission and Propagation Delay

Newcomers to the field of computer networking sometimes have difficulty understanding the difference between transmission delay and propagation delay. The difference is subtle but important. The transmission delay is the amount of time required for the router to push out the packet; it is a function of the packet's length and the transmission rate of the link, but has nothing to do with the distance between the two routers. The propagation delay, on the other hand, is the time it takes a bit to propagate from one router to the next; it is a function of the distance between the two routers, but has nothing to do with the packet's length or the transmission rate of the link. An analogy might clarify the notions of transmission and propagation delay. Consider a highway that has a tollbooth every 100 kilometers, as shown in Figure 1.17.
You can think of the highway segments between tollbooths as links and the tollbooths as routers. Suppose that cars travel (that is, propagate) on the highway at a rate of 100 km/hour (that is, when a car leaves a tollbooth, it instantaneously accelerates to 100 km/hour and maintains that speed between tollbooths). Suppose next that 10 cars, traveling together as a caravan, follow each other in a fixed order. You can think of each car as a bit and the caravan as a packet. Also suppose that each tollbooth services (that is, transmits) a car at a rate of one car per 12 seconds, and that it is late at night so that the caravan's cars are the only cars on the highway. Finally, suppose that whenever the first car of the caravan arrives at a tollbooth, it waits at the entrance until the other nine cars have arrived and lined up behind it. (Thus the entire caravan must be stored at the tollbooth before it can begin to be forwarded.) The time required for the tollbooth to push the entire caravan onto the highway is (10 cars)/(5 cars/minute) = 2 minutes. This time is analogous to the transmission delay in a router. The time required for a car to travel from the exit of one tollbooth to the next tollbooth is 100 km/(100 km/hour) = 1 hour. This time is analogous to propagation delay. Therefore, the time from when the caravan is stored in front of a tollbooth until the caravan is stored in front of the next tollbooth is the sum of transmission delay and propagation delay---in this example, 62 minutes.

Figure 1.17 Caravan analogy

Let's explore this analogy a bit more. What would happen if the tollbooth service time for a caravan were greater than the time for a car to travel between tollbooths? For example, suppose now that the cars travel at the rate of 1,000 km/hour and the tollbooth services cars at the rate of one car per minute. Then the traveling delay between two tollbooths is 6 minutes and the time to serve a caravan is 10 minutes. In this case, the first few cars in the caravan will arrive at the second tollbooth before the last cars in the caravan leave the first tollbooth. This situation also arises in packet-switched networks---the first bits in a packet can arrive at a router while many of the remaining bits in the packet are still waiting to be transmitted by the preceding router.

If a picture speaks a thousand words, then an animation must speak a million words. The Web site for this textbook provides an interactive Java applet that nicely illustrates and contrasts transmission delay and propagation delay. The reader is highly encouraged to visit that applet. \[Smith 2009\] also provides a very readable discussion of propagation, queueing, and transmission delays.

If we let dproc, dqueue, dtrans, and dprop denote the processing, queuing, transmission, and propagation delays, then the total nodal delay is given by

dnodal = dproc + dqueue + dtrans + dprop    (1.1)

The contribution of these delay components can vary significantly. For example, dprop can be negligible (for example, a couple of microseconds) for a link connecting two routers on the same university campus; however, dprop is hundreds of milliseconds for two routers interconnected by a geostationary satellite link, and can be the dominant term in dnodal.
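To see how the four components of Equation 1.1 combine, here is a worked example with made-up numbers: a 1,500-byte packet crossing a 1,000-km, 100 Mbps link, assuming a 2-microsecond processing delay and an empty queue.

```python
L = 1500 * 8    # packet length in bits (1,500 bytes)
R = 100e6       # link transmission rate: 100 Mbps
d = 1000e3      # link length: 1,000 km, in meters
s = 2e8         # assumed propagation speed in the medium, in meters/sec

d_proc  = 2e-6  # assumed processing delay: 2 microseconds
d_queue = 0.0   # assume the packet finds an empty queue

d_trans = L / R              # 0.12 ms to push the bits onto the link
d_prop  = d / s              # 5 ms for a bit to reach the other end
d_nodal = d_proc + d_queue + d_trans + d_prop

print(f"d_trans = {d_trans*1e3:.3f} ms, d_prop = {d_prop*1e3:.3f} ms")
print(f"d_nodal = {d_nodal*1e3:.3f} ms")   # about 5.12 ms, dominated by d_prop
```

On this long link the propagation term dominates; on a short campus link, the transmission term would dominate instead.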
Similarly, dtrans can range from negligible to significant. Its contribution is typically negligible for transmission rates of 10 Mbps and higher (for example, for LANs); however, it can be hundreds of milliseconds for large Internet packets sent over low-speed dial-up modem links. The processing delay, dproc, is often negligible; however, it strongly influences a router's maximum throughput, which is the maximum rate at which a router can forward packets.

1.4.2 Queuing Delay and Packet Loss

The most complicated and interesting component of nodal delay is the queuing delay, dqueue. In fact, queuing delay is so important and interesting in computer networking that thousands of papers and numerous books have been written about it \[Bertsekas 1991; Daigle 1991; Kleinrock 1975; Kleinrock 1976; Ross 1995\]. We give only a high-level, intuitive discussion of queuing delay here; the more curious reader may want to browse through some of the books (or even eventually write a PhD thesis on the subject!). Unlike the other three delays (namely, dproc, dtrans, and dprop), the queuing delay can vary from packet to packet. For example, if 10 packets arrive at an empty queue at the same time, the first packet transmitted will suffer no queuing delay, while the last packet transmitted will suffer a relatively large queuing delay (while it waits for the other nine packets to be transmitted). Therefore, when characterizing queuing delay, one typically uses statistical measures, such as average queuing delay, variance of queuing delay, and the probability that the queuing delay exceeds some specified value.

When is the queuing delay large and when is it insignificant? The answer to this question depends on the rate at which traffic arrives at the queue, the transmission rate of the link, and the nature of the arriving traffic, that is, whether the traffic arrives periodically or arrives in bursts. To gain some insight here, let a denote the average rate at which packets arrive at the queue (a is in units of packets/sec). Recall that R is the transmission rate; that is, it is the rate (in bits/sec) at which bits are pushed out of the queue. Also suppose, for simplicity, that all packets consist of L bits. Then the average rate at which bits arrive at the queue is La bits/sec. Finally, assume that the queue is very big, so that it can hold essentially an infinite number of bits. The ratio La/R, called the traffic intensity, often plays an important role in estimating the extent of the queuing delay. If La/R > 1, then the average rate at which bits arrive at the queue exceeds the rate at which the bits can be transmitted from the queue. In this unfortunate situation, the queue will tend to increase without bound and the queuing delay will approach infinity! Therefore, one of the golden rules in traffic engineering is: Design your system so that the traffic intensity is no greater than 1.

Now consider the case La/R ≤ 1. Here, the nature of the arriving traffic impacts the queuing delay. For example, if packets arrive periodically---that is, one packet arrives every L/R seconds---then every packet will arrive at an empty queue and there will be no queuing delay. On the other hand, if packets arrive in bursts but periodically, there can be a significant average queuing delay. For example, suppose N packets arrive simultaneously every (L/R)N seconds.
Then the first packet transmitted has no queuing delay; the second packet transmitted has a queuing delay of L/R seconds; and more generally, the nth packet transmitted has a queuing delay of (n−1)L/R seconds. We leave it as an exercise for you to calculate the average queuing delay in this example.

The two examples of periodic arrivals described above are a bit academic. Typically, the arrival process to a queue is random; that is, the arrivals do not follow any pattern and the packets are spaced apart by random amounts of time. In this more realistic case, the quantity La/R is not usually sufficient to fully characterize the queuing delay statistics. Nonetheless, it is useful in gaining an intuitive understanding of the extent of the queuing delay. In particular, if the traffic intensity is close to zero, then packet arrivals are few and far between and it is unlikely that an arriving packet will find another packet in the queue. Hence, the average queuing delay will be close to zero. On the other hand, when the traffic intensity is close to 1, there will be intervals of time when the arrival rate exceeds the transmission capacity (due to variations in packet arrival rate), and a queue will form during these periods of time; when the arrival rate is less than the transmission capacity, the length of the queue will shrink. Nonetheless, as the traffic intensity approaches 1, the average queue length gets larger and larger. The qualitative dependence of average queuing delay on the traffic intensity is shown in Figure 1.18.

Figure 1.18 Dependence of average queuing delay on traffic intensity

One important aspect of Figure 1.18 is the fact that as the traffic intensity approaches 1, the average queuing delay increases rapidly. A small percentage increase in the intensity will result in a much larger percentage-wise increase in delay. Perhaps you have experienced this phenomenon on the highway. If you regularly drive on a road that is typically congested, the fact that the road is typically congested means that its traffic intensity is close to 1. If some event causes an even slightly larger-than-usual amount of traffic, the delays you experience can be huge.

To really get a good feel for what queuing delays are about, you are encouraged once again to visit the textbook Web site, which provides an interactive Java applet for a queue. If you set the packet arrival rate high enough so that the traffic intensity exceeds 1, you will see the queue slowly build up over time.

Packet Loss

In our discussions above, we have assumed that the queue is capable of holding an infinite number of packets. In reality a queue preceding a link has finite capacity, although the queuing capacity greatly depends on the router design and cost. Because the queue capacity is finite, packet delays do not really approach infinity as the traffic intensity approaches 1. Instead, a packet can arrive to find a full queue. With no place to store such a packet, a router will drop that packet; that is, the packet will be lost. This overflow at a queue can again be seen in the Java applet for a queue when the traffic intensity is greater than 1. From an end-system viewpoint, a packet loss will look like a packet having been transmitted into the network core but never emerging from the network at the destination. The fraction of lost packets increases as the traffic intensity increases. Therefore, performance at a node is often measured not only in terms of delay, but also in terms of the probability of packet loss. As we'll discuss in the subsequent chapters, a lost packet may be retransmitted on an end-to-end basis in order to ensure that all data are eventually transferred from source to destination.
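If you prefer code to applets, the behavior in Figure 1.18, together with packet loss at a finite buffer, can be reproduced with a small simulation. The sketch below is a simplified model of our own making: Poisson arrivals, fixed-size packets (time is measured in units of the transmission time L/R, so the arrival rate equals the traffic intensity La/R), and a 100-packet buffer.

```python
import random
from collections import deque

def simulate_queue(intensity, num_packets=200_000, buffer_pkts=100, seed=1):
    """FIFO queue on one outbound link: fixed-size packets, Poisson arrivals.

    Time unit = one packet transmission time (L/R = 1). Arrivals to a full
    buffer are dropped. Returns (mean queuing delay, fraction of packets lost).
    """
    rng = random.Random(seed)
    t = 0.0
    departures = deque()          # departure times of packets still in the system
    total_wait, sent, lost = 0.0, 0, 0
    for _ in range(num_packets):
        t += rng.expovariate(intensity)        # next Poisson arrival
        while departures and departures[0] <= t:
            departures.popleft()               # these packets have already left
        if len(departures) >= buffer_pkts:
            lost += 1                          # buffer full: the packet is dropped
            continue
        start = departures[-1] if departures else t   # when service can begin
        total_wait += start - t                # queuing delay (0 if queue is empty)
        departures.append(start + 1.0)         # one time unit to transmit
        sent += 1
    return total_wait / sent, lost / num_packets

for intensity in (0.5, 0.8, 0.95, 1.1):
    delay, loss = simulate_queue(intensity)
    print(f"La/R = {intensity:4.2f}: mean queuing delay = {delay:8.2f}, loss = {loss:.4f}")
```

Run it and you will see the average queuing delay grow sharply as La/R approaches 1, and the loss fraction become substantial once La/R exceeds 1.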
1.4.3 End-to-End Delay

Our discussion up to this point has focused on the nodal delay, that is, the delay at a single router. Let's now consider the total delay from source to destination. To get a handle on this concept, suppose there are N−1 routers between the source host and the destination host. Let's also suppose for the moment that the network is uncongested (so that queuing delays are negligible), the processing delay at each router and at the source host is dproc, the transmission rate out of each router and out of the source host is R bits/sec, and the propagation delay on each link is dprop. The nodal delays accumulate and give an end-to-end delay,

dend−end = N(dproc + dtrans + dprop)    (1.2)

where, once again, dtrans = L/R, where L is the packet size. Note that Equation 1.2 is a generalization of Equation 1.1, which did not take into account processing and propagation delays. We leave it to you to generalize Equation 1.2 to the case of heterogeneous delays at the nodes and to the presence of an average queuing delay at each node.

Traceroute

To get a hands-on feel for end-to-end delay in a computer network, we can make use of the Traceroute program. Traceroute is a simple program that can run in any Internet host. When the user specifies a destination hostname, the program in the source host sends multiple, special packets toward that destination. As these packets work their way toward the destination, they pass through a series of routers. When a router receives one of these special packets, it sends back to the source a short message that contains the name and address of the router.

More specifically, suppose there are N−1 routers between the source and the destination. Then the source will send N special packets into the network, with each packet addressed to the ultimate destination. These N special packets are marked 1 through N, with the first packet marked 1 and the last packet marked N. When the nth router receives the nth packet marked n, the router does not forward the packet toward its destination, but instead sends a message back to the source. When the destination host receives the Nth packet, it too returns a message back to the source. The source records the time that elapses between when it sends a packet and when it receives the corresponding return message; it also records the name and address of the router (or the destination host) that returns the message. In this manner, the source can reconstruct the route taken by packets flowing from source to destination, and the source can determine the round-trip delays to all the intervening routers. Traceroute actually repeats the experiment just described three times, so the source actually sends 3N packets to the destination. RFC 1393 describes Traceroute in detail.

Here is an example of the output of the Traceroute program, where the route was being traced from the source host gaia.cs.umass.edu (at the University of Massachusetts) to the host cis.poly.edu (at Polytechnic University in Brooklyn).
The output has six columns: the first column is the n value described above, that is, the number of the router along the route; the second column is the name of the router; the third column is the address of the router (of the form xxx.xxx.xxx.xxx); the last three columns are the round-trip delays for three experiments. If the source receives fewer than three messages from any given router (due to packet loss in the network), Traceroute places an asterisk just after the router number and reports fewer than three round-trip times for that router.

    1  cs-gw (128.119.240.254) 1.009 ms 0.899 ms 0.993 ms
    2  128.119.3.154 (128.119.3.154) 0.931 ms 0.441 ms 0.651 ms
    3  border4-rt-gi-1-3.gw.umass.edu (128.119.2.194) 1.032 ms 0.484 ms 0.451 ms
    4  acr1-ge-2-1-0.Boston.cw.net (208.172.51.129) 10.006 ms 8.150 ms 8.460 ms
    5  agr4-loopback.NewYork.cw.net (206.24.194.104) 12.272 ms 14.344 ms 13.267 ms
    6  acr2-loopback.NewYork.cw.net (206.24.194.62) 13.225 ms 12.292 ms 12.148 ms
    7  pos10-2.core2.NewYork1.Level3.net (209.244.160.133) 12.218 ms 11.823 ms 11.793 ms
    8  gige9-1-52.hsipaccess1.NewYork1.Level3.net (64.159.17.39) 13.081 ms 11.556 ms 13.297 ms
    9  p0-0.polyu.bbnplanet.net (4.25.109.122) 12.716 ms 13.052 ms 12.786 ms
    10 cis.poly.edu (128.238.32.126) 14.080 ms 13.035 ms 12.802 ms

In the trace above there are nine routers between the source and the destination. Most of these routers have a name, and all of them have addresses. For example, the name of Router 3 is border4-rt-gi-1-3.gw.umass.edu and its address is 128.119.2.194. Looking at the data provided for this same router, we see that in the first of the three trials the round-trip delay between the source and the router was 1.03 msec. The round-trip delays for the subsequent two trials were 0.48 and 0.45 msec. These round-trip delays include all of the delays just discussed, including transmission delays, propagation delays, router processing delays, and queuing delays. Because the queuing delay is varying with time, the round-trip delay of packet n sent to router n can sometimes be longer than the round-trip delay of packet n+1 sent to router n+1. Indeed, we observe this phenomenon in the above example: the delays to Router 6 are larger than the delays to Router 7!

Want to try out Traceroute for yourself? We highly recommend that you visit http://www.traceroute.org, which provides a Web interface to an extensive list of sources for route tracing. You choose a source and supply the hostname for any destination. The Traceroute program then does all the work. There are a number of free software programs that provide a graphical interface to Traceroute; one of our favorites is PingPlotter \[PingPlotter 2016\].

End System, Application, and Other Delays

In addition to processing, transmission, and propagation delays, there can be additional significant delays in the end systems. For example, an end system wanting to transmit a packet into a shared medium (e.g., as in a WiFi or cable modem scenario) may purposefully delay its transmission as part of its protocol for sharing the medium with other end systems; we'll consider such protocols in detail in Chapter 6. Another important delay is media packetization delay, which is present in Voice-over-IP (VoIP) applications. In VoIP, the sending side must first fill a packet with encoded digitized speech before passing the packet to the Internet. This time to fill a packet---called the packetization delay---can be significant and can impact the user-perceived quality of a VoIP call. This issue will be further explored in a homework problem at the end of this chapter.
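As a rough preview of that problem (the numbers here are assumed for illustration, not taken from the text): if the encoder generates digitized speech at 64 kbps and each packet carries 160 bytes of audio, the sender must wait 20 ms for each packet to fill.

```python
encoding_rate = 64_000   # assumed voice encoding rate: 64 kbps
payload_bits = 160 * 8   # assumed audio payload per packet: 160 bytes

packetization_delay = payload_bits / encoding_rate
print(f"packetization delay = {packetization_delay * 1e3:.1f} ms")   # 20.0 ms
```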
1.4.4 Throughput in Computer Networks

In addition to delay and packet loss, another critical performance measure in computer networks is end-to-end throughput. To define throughput, consider transferring a large file from Host A to Host B across a computer network. This transfer might be, for example, a large video clip from one peer to another in a P2P file sharing system. The instantaneous throughput at any instant of time is the rate (in bits/sec) at which Host B is receiving the file. (Many applications, including many P2P file sharing systems, display the instantaneous throughput during downloads in the user interface---perhaps you have observed this before!) If the file consists of F bits and the transfer takes T seconds for Host B to receive all F bits, then the average throughput of the file transfer is F/T bits/sec. For some applications, such as Internet telephony, it is desirable to have a low delay and an instantaneous throughput consistently above some threshold (for example, over 24 kbps for some Internet telephony applications and over 256 kbps for some real-time video applications). For other applications, including those involving file transfers, delay is not critical, but it is desirable to have the highest possible throughput.

To gain further insight into the important concept of throughput, let's consider a few examples. Figure 1.19(a) shows two end systems, a server and a client, connected by two communication links and a router. Consider the throughput for a file transfer from the server to the client. Let Rs denote the rate of the link between the server and the router; and Rc denote the rate of the link between the router and the client. Suppose that the only bits being sent in the entire network are those from the server to the client. We now ask, in this ideal scenario, what is the server-to-client throughput? To answer this question, we may think of bits as fluid and communication links as pipes. Clearly, the server cannot pump bits through its link at a rate faster than Rs bps; and the router cannot forward bits at a rate faster than Rc bps. If Rs < Rc, then the bits pumped by the server will "flow" right through the router and arrive at the client at a rate of Rs bps, giving a throughput of Rs bps. If, on the other hand, Rc < Rs, then the router will not be able to forward bits as quickly as it receives them. In this case, bits will only leave the router at rate Rc, giving an end-to-end throughput of Rc. (Note also that if bits continue to arrive at the router at rate Rs, and continue to leave the router at Rc, the backlog of bits at the router waiting for transmission to the client will grow and grow---a most undesirable situation!)

Figure 1.19 Throughput for a file transfer from server to client

Thus, for this simple two-link network, the throughput is min{Rc, Rs}, that is, it is the transmission rate of the bottleneck link. Having determined the throughput, we can now approximate the time it takes to transfer a large file of F bits from server to client as F/min{Rs, Rc}. For a specific example, suppose you are downloading an MP3 file of F = 32 million bits, the server has a transmission rate of Rs = 2 Mbps, and you have an access link of Rc = 1 Mbps.
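A one-line check of this example, using the fluid model's transfer-time approximation F/min{Rs, Rc}:

```python
F  = 32e6   # file size: 32 million bits
Rs = 2e6    # server link rate: 2 Mbps
Rc = 1e6    # client access link rate: 1 Mbps

print(f"transfer time = {F / min(Rs, Rc):.0f} seconds")   # 32 seconds
```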
The time needed to transfer the file is then 32 seconds. Of course, these expressions for throughput and transfer time are only approximations, as they do not account for store-and-forward and processing delays as well as protocol issues.

Figure 1.19(b) now shows a network with N links between the server and the client, with the transmission rates of the N links being R1, R2,..., RN. Applying the same analysis as for the two-link network, we find that the throughput for a file transfer from server to client is min{R1, R2,..., RN}, which is once again the transmission rate of the bottleneck link along the path between server and client.

Now consider another example motivated by today's Internet. Figure 1.20(a) shows two end systems, a server and a client, connected to a computer network. Consider the throughput for a file transfer from the server to the client. The server is connected to the network with an access link of rate Rs and the client is connected to the network with an access link of rate Rc. Now suppose that all the links in the core of the communication network have very high transmission rates, much higher than Rs and Rc. Indeed, today, the core of the Internet is over-provisioned with high-speed links that experience little congestion. Also suppose that the only bits being sent in the entire network are those from the server to the client. Because the core of the computer network is like a wide pipe in this example, the rate at which bits can flow from source to destination is again the minimum of Rs and Rc, that is, throughput = min{Rs, Rc}. Therefore, the constraining factor for throughput in today's Internet is typically the access network.

For a final example, consider Figure 1.20(b) in which there are 10 servers and 10 clients connected to the core of the computer network. In this example, there are 10 simultaneous downloads taking place, involving 10 client-server pairs. Suppose that these 10 downloads are the only traffic in the network at the current time. As shown in the figure, there is a link in the core that is traversed by all 10 downloads. Denote the transmission rate of this link by R. Let's suppose that all server access links have the same rate Rs, all client access links have the same rate Rc, and the transmission rates of all the links in the core---except the one common link of rate R---are much larger than Rs, Rc, and R. Now we ask, what are the throughputs of the downloads? Clearly, if the rate of the common link, R, is large---say a hundred times larger than both Rs and Rc---then the throughput for each download will once again be min{Rs, Rc}. But what if the rate of the common link is of the same order as Rs and Rc? What will the throughput be in this case? Let's take a look at a specific example. Suppose Rs = 2 Mbps, Rc = 1 Mbps, R = 5 Mbps, and the common link divides its transmission rate equally among the 10 downloads. Then the bottleneck for each download is no longer in the access network, but is now instead the shared link in the core, which only provides each download with 500 kbps of throughput. Thus the end-to-end throughput for each download is now reduced to 500 kbps.

Figure 1.20 End-to-end throughput: (a) Client downloads a file from server; (b) 10 clients downloading with 10 servers

The examples in Figure 1.19 and Figure 1.20(a) show that throughput depends on the transmission rates of the links over which the data flows. We saw that when there is no other intervening traffic, the throughput can simply be approximated as the minimum transmission rate along the path between source and destination. The example in Figure 1.20(b) shows that more generally the throughput depends not only on the transmission rates of the links along the path, but also on the intervening traffic. In particular, a link with a high transmission rate may nonetheless be the bottleneck link for a file transfer if many other data flows are also passing through that link. We will examine throughput in computer networks more closely in the homework problems and in the subsequent chapters.
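The bottleneck reasoning in this last example extends naturally to code; the sketch below assumes, as in the text, that the common link's rate R is divided equally among the 10 downloads:

```python
Rs, Rc, R = 2e6, 1e6, 5e6   # server links, client links, shared core link
downloads = 10

per_flow_share = R / downloads             # each download's share of the core link
throughput = min(Rs, Rc, per_flow_share)   # bottleneck among all three constraints
print(f"per-download throughput = {throughput / 1e3:.0f} kbps")   # 500 kbps
```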
1.5 Protocol Layers and Their Service Models

From our discussion thus far, it is apparent that the Internet is an extremely complicated system. We have seen that there are many pieces to the Internet: numerous applications and protocols, various types of end systems, packet switches, and various types of link-level media. Given this enormous complexity, is there any hope of organizing a network architecture, or at least our discussion of network architecture? Fortunately, the answer to both questions is yes.

1.5.1 Layered Architecture

Before attempting to organize our thoughts on Internet architecture, let's look for a human analogy. Actually, we deal with complex systems all the time in our everyday life. Imagine if someone asked you to describe, for example, the airline system. How would you find the structure to describe this complex system that has ticketing agents, baggage checkers, gate personnel, pilots, airplanes, air traffic control, and a worldwide system for routing airplanes? One way to describe this system might be to describe the series of actions you take (or others take for you) when you fly on an airline. You purchase your ticket, check your bags, go to the gate, and eventually get loaded onto the plane. The plane takes off and is routed to its destination. After your plane lands, you deplane at the gate and claim your bags. If the trip was bad, you complain about the flight to the ticket agent (getting nothing for your effort). This scenario is shown in Figure 1.21.

Figure 1.21 Taking an airplane trip: actions

Figure 1.22 Horizontal layering of airline functionality

Already, we can see some analogies here with computer networking: You are being shipped from source to destination by the airline; a packet is shipped from source host to destination host in the Internet. But this is not quite the analogy we are after. We are looking for some structure in Figure 1.21. Looking at Figure 1.21, we note that there is a ticketing function at each end; there is also a baggage function for already-ticketed passengers, and a gate function for already-ticketed and already-baggage-checked passengers. For passengers who have made it through the gate (that is, passengers who are already ticketed, baggage-checked, and through the gate), there is a takeoff and landing function, and while in flight, there is an airplane-routing function. This suggests that we can look at the functionality in Figure 1.21 in a horizontal manner, as shown in Figure 1.22.

Figure 1.22 has divided the airline functionality into layers, providing a framework in which we can discuss airline travel. Note that each layer, combined with the layers below it, implements some functionality, some service. At the ticketing layer and below, airline-counter-to-airline-counter transfer of a person is accomplished.
At the baggage layer and below, baggage-check-to-baggage-claim transfer of a person and bags is accomplished. Note that the baggage layer provides this service only to an already-ticketed person. At the gate layer, departure-gate-to-arrival-gate transfer of a person and bags is accomplished. At the takeoff/landing layer, runway-to-runway transfer of people and their bags is accomplished. Each layer provides its service by (1) performing certain actions within that layer (for example, at the gate layer, loading and unloading people from an airplane) and by (2) using the services of the layer directly below it (for example, in the gate layer, using the runway-to-runway passenger transfer service of the takeoff/landing layer).

A layered architecture allows us to discuss a well-defined, specific part of a large and complex system. This simplification itself is of considerable value by providing modularity, making it much easier to change the implementation of the service provided by the layer. As long as the layer provides the same service to the layer above it, and uses the same services from the layer below it, the remainder of the system remains unchanged when a layer's implementation is changed. (Note that changing the implementation of a service is very different from changing the service itself!) For example, if the gate functions were changed (for instance, to have people board and disembark by height), the remainder of the airline system would remain unchanged since the gate layer still provides the same function (loading and unloading people); it simply implements that function in a different manner after the change. For large and complex systems that are constantly being updated, the ability to change the implementation of a service without affecting other components of the system is another important advantage of layering.

Protocol Layering

But enough about airlines. Let's now turn our attention to network protocols. To provide structure to the design of network protocols, network designers organize protocols---and the network hardware and software that implement the protocols---in layers. Each protocol belongs to one of the layers, just as each function in the airline architecture in Figure 1.22 belonged to a layer. We are again interested in the services that a layer offers to the layer above---the so-called service model of a layer. Just as in the case of our airline example, each layer provides its service by (1) performing certain actions within that layer and by (2) using the services of the layer directly below it. For example, the services provided by layer n may include reliable delivery of messages from one edge of the network to the other. This might be implemented by using an unreliable edge-to-edge message delivery service of layer n−1, and adding layer n functionality to detect and retransmit lost messages.

A protocol layer can be implemented in software, in hardware, or in a combination of the two. Application-layer protocols---such as HTTP and SMTP---are almost always implemented in software in the end systems; so are transport-layer protocols. Because the physical layer and data link layers are responsible for handling communication over a specific link, they are typically implemented in a network interface card (for example, Ethernet or WiFi interface cards) associated with a given link. The network layer is often a mixed implementation of hardware and software.
Also note that just as the functions in the layered airline architecture were distributed among the various airports and flight control centers that make up the system, so too is a layer n protocol distributed among the end systems, packet switches, and other components that make up the network. That is, there's often a piece of a layer n protocol in each of these network components.

Protocol layering has conceptual and structural advantages \[RFC 3439\]. As we have seen, layering provides a structured way to discuss system components. Modularity makes it easier to update system components. We mention, however, that some researchers and networking engineers are vehemently opposed to layering \[Wakeman 1992\]. One potential drawback of layering is that one layer may duplicate lower-layer functionality. For example, many protocol stacks provide error recovery on both a per-link basis and an end-to-end basis. A second potential drawback is that functionality at one layer may need information (for example, a timestamp value) that is present only in another layer; this violates the goal of separation of layers.

Figure 1.23 The Internet protocol stack (a) and OSI reference model (b)

When taken together, the protocols of the various layers are called the protocol stack. The Internet protocol stack consists of five layers: the physical, link, network, transport, and application layers, as shown in Figure 1.23(a). If you examine the Table of Contents, you will see that we have roughly organized this book using the layers of the Internet protocol stack. We take a top-down approach, first covering the application layer and then proceeding downward.

Application Layer

The application layer is where network applications and their application-layer protocols reside. The Internet's application layer includes many protocols, such as the HTTP protocol (which provides for Web document request and transfer), SMTP (which provides for the transfer of e-mail messages), and FTP (which provides for the transfer of files between two end systems). We'll see that certain network functions, such as the translation of human-friendly names for Internet end systems like www.ietf.org to a 32-bit network address, are also done with the help of a specific application-layer protocol, namely, the domain name system (DNS). We'll see in Chapter 2 that it is very easy to create and deploy our own new application-layer protocols. An application-layer protocol is distributed over multiple end systems, with the application in one end system using the protocol to exchange packets of information with the application in another end system. We'll refer to this packet of information at the application layer as a message.

Transport Layer

The Internet's transport layer transports application-layer messages between application endpoints. In the Internet there are two transport protocols, TCP and UDP, either of which can transport application-layer messages. TCP provides a connection-oriented service to its applications. This service includes guaranteed delivery of application-layer messages to the destination and flow control (that is, sender/receiver speed matching). TCP also breaks long messages into shorter segments and provides a congestion-control mechanism, so that a source throttles its transmission rate when the network is congested. The UDP protocol provides a connectionless service to its applications. This is a no-frills service that provides no reliability, no flow control, and no congestion control. In this book, we'll refer to a transport-layer packet as a segment.
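Although socket programming is the subject of Chapter 2, a two-line preview is worthwhile here, since an application chooses between these two transport services simply by the type of socket it creates (a sketch in Python):

```python
import socket

# SOCK_STREAM requests TCP: connection-oriented, reliable byte stream.
tcp_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# SOCK_DGRAM requests UDP: connectionless, no delivery guarantees.
udp_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

tcp_socket.close()
udp_socket.close()
```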
Network Layer

The Internet's network layer is responsible for moving network-layer packets known as datagrams from one host to another. The Internet transport-layer protocol (TCP or UDP) in a source host passes a transport-layer segment and a destination address to the network layer, just as you would give the postal service a letter with a destination address. The network layer then provides the service of delivering the segment to the transport layer in the destination host. The Internet's network layer includes the celebrated IP protocol, which defines the fields in the datagram as well as how the end systems and routers act on these fields. There is only one IP protocol, and all Internet components that have a network layer must run the IP protocol. The Internet's network layer also contains routing protocols that determine the routes that datagrams take between sources and destinations. The Internet has many routing protocols. As we saw in Section 1.3, the Internet is a network of networks, and within a network, the network administrator can run any routing protocol desired. Although the network layer contains both the IP protocol and numerous routing protocols, it is often simply referred to as the IP layer, reflecting the fact that IP is the glue that binds the Internet together.

Link Layer

The Internet's network layer routes a datagram through a series of routers between the source and destination. To move a packet from one node (host or router) to the next node in the route, the network layer relies on the services of the link layer. In particular, at each node, the network layer passes the datagram down to the link layer, which delivers the datagram to the next node along the route. At this next node, the link layer passes the datagram up to the network layer. The services provided by the link layer depend on the specific link-layer protocol that is employed over the link. For example, some link-layer protocols provide reliable delivery, from transmitting node, over one link, to receiving node. Note that this reliable delivery service is different from the reliable delivery service of TCP, which provides reliable delivery from one end system to another. Examples of link-layer protocols include Ethernet, WiFi, and the cable access network's DOCSIS protocol. As datagrams typically need to traverse several links to travel from source to destination, a datagram may be handled by different link-layer protocols at different links along its route. For example, a datagram may be handled by Ethernet on one link and by PPP on the next link. The network layer will receive a different service from each of the different link-layer protocols. In this book, we'll refer to the link-layer packets as frames.

Physical Layer

While the job of the link layer is to move entire frames from one network element to an adjacent network element, the job of the physical layer is to move the individual bits within the frame from one node to the next. The protocols in this layer are again link dependent and further depend on the actual transmission medium of the link (for example, twisted-pair copper wire, single-mode fiber optics). For example, Ethernet has many physical-layer protocols: one for twisted-pair copper wire, another for coaxial cable, another for fiber, and so on.
In each case, a bit is moved across the link in a different way.

The OSI Model

Having discussed the Internet protocol stack in detail, we should mention that it is not the only protocol stack around. In particular, back in the late 1970s, the International Organization for Standardization (ISO) proposed that computer networks be organized around seven layers, called the Open Systems Interconnection (OSI) model \[ISO 2016\]. The OSI model took shape when the protocols that were to become the Internet protocols were in their infancy, and were but one of many different protocol suites under development; in fact, the inventors of the original OSI model probably did not have the Internet in mind when creating it. Nevertheless, beginning in the late 1970s, many training and university courses picked up on the ISO mandate and organized courses around the seven-layer model. Because of its early impact on networking education, the seven-layer model continues to linger on in some networking textbooks and training courses.

The seven layers of the OSI reference model, shown in Figure 1.23(b), are: application layer, presentation layer, session layer, transport layer, network layer, data link layer, and physical layer. The functionality of five of these layers is roughly the same as their similarly named Internet counterparts. Thus, let's consider the two additional layers present in the OSI reference model---the presentation layer and the session layer. The role of the presentation layer is to provide services that allow communicating applications to interpret the meaning of data exchanged. These services include data compression and data encryption (which are self-explanatory) as well as data description (which frees the applications from having to worry about the internal format in which data are represented/stored---formats that may differ from one computer to another). The session layer provides for delimiting and synchronization of data exchange, including the means to build a checkpointing and recovery scheme.

The fact that the Internet lacks two layers found in the OSI reference model poses a couple of interesting questions: Are the services provided by these layers unimportant? What if an application needs one of these services? The Internet's answer to both of these questions is the same---it's up to the application developer. It's up to the application developer to decide if a service is important, and if the service is important, it's up to the application developer to build that functionality into the application.

1.5.2 Encapsulation

Figure 1.24 shows the physical path that data takes down a sending end system's protocol stack, up and down the protocol stacks of an intervening link-layer switch and router, and then up the protocol stack at the receiving end system.

Figure 1.24 Hosts, routers, and link-layer switches; each contains a different set of layers, reflecting their differences in functionality

As we discuss later in this book, routers and link-layer switches are both packet switches. Similar to end systems, routers and link-layer switches organize their networking hardware and software into layers. But routers and link-layer switches do not implement all of the layers in the protocol stack; they typically implement only the bottom layers. As shown in Figure 1.24, link-layer switches implement layers 1 and 2; routers implement layers 1 through 3.
This means, for example, that Internet routers are capable of implementing the IP protocol (a layer 3 protocol), while link-layer switches are not. We'll see later that while link-layer switches do not recognize IP addresses, they are capable of recognizing layer 2 addresses, such as Ethernet addresses. Note that hosts implement all five layers; this is consistent with the view that the Internet architecture puts much of its complexity at the edges of the network.

Figure 1.24 also illustrates the important concept of encapsulation. At the sending host, an application-layer message (M in Figure 1.24) is passed to the transport layer. In the simplest case, the transport layer takes the message and appends additional information (so-called transport-layer header information, Ht in Figure 1.24) that will be used by the receiver-side transport layer. The application-layer message and the transport-layer header information together constitute the transport-layer segment. The transport-layer segment thus encapsulates the application-layer message. The added information might include information allowing the receiver-side transport layer to deliver the message up to the appropriate application, and error-detection bits that allow the receiver to determine whether bits in the message have been changed en route. The transport layer then passes the segment to the network layer, which adds network-layer header information (Hn in Figure 1.24) such as source and destination end system addresses, creating a network-layer datagram. The datagram is then passed to the link layer, which (of course!) will add its own link-layer header information and create a link-layer frame. Thus, we see that at each layer, a packet has two types of fields: header fields and a payload field. The payload is typically a packet from the layer above.

A useful analogy here is the sending of an interoffice memo from one corporate branch office to another via the public postal service. Suppose Alice, who is in one branch office, wants to send a memo to Bob, who is in another branch office. The memo is analogous to the application-layer message. Alice puts the memo in an interoffice envelope with Bob's name and department written on the front of the envelope. The interoffice envelope is analogous to a transport-layer segment---it contains header information (Bob's name and department number) and it encapsulates the application-layer message (the memo). When the sending branch-office mailroom receives the interoffice envelope, it puts the interoffice envelope inside yet another envelope, which is suitable for sending through the public postal service. The sending mailroom also writes the postal address of the sending and receiving branch offices on the postal envelope. Here, the postal envelope is analogous to the datagram---it encapsulates the transport-layer segment (the interoffice envelope), which encapsulates the original message (the memo). The postal service delivers the postal envelope to the receiving branch-office mailroom. There, the process of de-encapsulation begins. The mailroom extracts the interoffice memo and forwards it to Bob. Finally, Bob opens the envelope and removes the memo.

The process of encapsulation can be more complex than that described above. For example, a large message may be divided into multiple transport-layer segments (which might themselves each be divided into multiple network-layer datagrams). At the receiving end, such a segment must then be reconstructed from its constituent datagrams.
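Encapsulation is easy to mimic in a few lines of code. The toy sketch below (with placeholder header contents, not real protocol headers) nests an application-layer message inside transport-, network-, and link-layer headers, mirroring Figure 1.24:

```python
def encapsulate(payload: bytes, header: bytes) -> bytes:
    """Each layer prepends its own header to the payload handed down to it."""
    return header + payload

message  = b"GET /index.html"               # application-layer message (M)
segment  = encapsulate(message, b"[Ht]")    # transport layer adds header Ht
datagram = encapsulate(segment, b"[Hn]")    # network layer adds header Hn
frame    = encapsulate(datagram, b"[Hl]")   # link layer adds header Hl

print(frame)   # b'[Hl][Hn][Ht]GET /index.html'
# De-encapsulation at the receiver strips one header per layer, bottom up.
```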
1.6 Networks Under Attack

The Internet has become mission critical for many institutions today, including large and small companies, universities, and government agencies. Many individuals also rely on the Internet for many of their professional, social, and personal activities. Billions of "things," including wearables and home devices, are currently being connected to the Internet. But behind all this utility and excitement, there is a dark side, a side where "bad guys" attempt to wreak havoc in our daily lives by damaging our Internet-connected computers, violating our privacy, and rendering inoperable the Internet services on which we depend.

The field of network security is about how the bad guys can attack computer networks and about how we, soon-to-be experts in computer networking, can defend networks against those attacks, or better yet, design new architectures that are immune to such attacks in the first place. Given the frequency and variety of existing attacks as well as the threat of new and more destructive future attacks, network security has become a central topic in the field of computer networking. One of the features of this textbook is that it brings network security issues to the forefront. Since we don't yet have expertise in computer networking and Internet protocols, we'll begin here by surveying some of today's more prevalent security-related problems. This will whet our appetite for more substantial discussions in the upcoming chapters. So we begin here by simply asking, what can go wrong? How are computer networks vulnerable? What are some of the more prevalent types of attacks today?

The Bad Guys Can Put Malware into Your Host Via the Internet

We attach devices to the Internet because we want to receive/send data from/to the Internet. This includes all kinds of good stuff, including Instagram posts, Internet search results, streaming music, video conference calls, streaming movies, and so on. But, unfortunately, along with all that good stuff comes malicious stuff---collectively known as malware---that can also enter and infect our devices. Once malware infects our device, it can do all kinds of devious things, including deleting our files and installing spyware that collects our private information, such as social security numbers, passwords, and keystrokes, and then sends this (over the Internet, of course!) back to the bad guys. Our compromised host may also be enrolled in a network of thousands of similarly compromised devices, collectively known as a botnet, which the bad guys control and leverage for spam e-mail distribution or distributed denial-of-service attacks (soon to be discussed) against targeted hosts.

Much of the malware out there today is self-replicating: once it infects one host, from that host it seeks entry into other hosts over the Internet, and from the newly infected hosts, it seeks entry into yet more hosts. In this manner, self-replicating malware can spread exponentially fast. Malware can spread in the form of a virus or a worm. Viruses are malware that require some form of user interaction to infect the user's device. The classic example is an e-mail attachment containing malicious executable code. If a user receives and opens such an attachment, the user inadvertently runs the malware on the device.
Typically, such e-mail viruses are self-replicating: once executed, the virus may send an identical message with an identical malicious attachment to, for example, every recipient in the user's address book. Worms are malware that can enter a device without any explicit user interaction. For example, a user may be running a vulnerable network application to which an attacker can send malware. In some cases, without any user intervention, the application may accept the malware from the Internet and run it, creating a worm. The worm in the newly infected device then scans the Internet, searching for other hosts running the same vulnerable network application. When it finds other vulnerable hosts, it sends a copy of itself to those hosts. Today, malware is pervasive and costly to defend against. As you work through this textbook, we encourage you to think about the following question: What can computer network designers do to defend Internet-attached devices from malware attacks?

The Bad Guys Can Attack Servers and Network Infrastructure

Another broad class of security threats is known as denial-of-service (DoS) attacks. As the name suggests, a DoS attack renders a network, host, or other piece of infrastructure unusable by legitimate users. Web servers, e-mail servers, DNS servers (discussed in Chapter 2), and institutional networks can all be subject to DoS attacks. Internet DoS attacks are extremely common, with thousands of DoS attacks occurring every year \[Moore 2001\]. The site Digital Attack Map allows us to visualize the top daily DoS attacks worldwide \[DAM 2016\]. Most Internet DoS attacks fall into one of three categories:

- Vulnerability attack. This involves sending a few well-crafted messages to a vulnerable application or operating system running on a targeted host. If the right sequence of packets is sent to a vulnerable application or operating system, the service can stop or, worse, the host can crash.
- Bandwidth flooding. The attacker sends a deluge of packets to the targeted host---so many packets that the target's access link becomes clogged, preventing legitimate packets from reaching the server.
- Connection flooding. The attacker establishes a large number of half-open or fully open TCP connections (TCP connections are discussed in Chapter 3) at the target host. The host can become so bogged down with these bogus connections that it stops accepting legitimate connections.

Let's now explore the bandwidth-flooding attack in more detail. Recalling our delay and loss analysis discussion in Section 1.4.2, it's evident that if the server has an access rate of R bps, then the attacker will need to send traffic at a rate of approximately R bps to cause damage. If R is very large, a single attack source may not be able to generate enough traffic to harm the server. Furthermore, if all the traffic emanates from a single source, an upstream router may be able to detect the attack and block all traffic from that source before the traffic gets near the server. In a distributed DoS (DDoS) attack, illustrated in Figure 1.25, the attacker controls multiple sources and has each source blast traffic at the target. With this approach, the aggregate traffic rate across all the controlled sources needs to be approximately R to cripple the service. DDoS attacks leveraging botnets with thousands of compromised hosts are a common occurrence today \[DAM 2016\].
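As a back-of-the-envelope check on this reasoning (the numbers below are our own illustrative assumptions; none appear in the text), the number of sources needed is simply the target's access rate divided by the rate each compromised host can muster:

```python
# Rough bandwidth-flooding arithmetic with assumed, illustrative numbers:
# to clog an access link of rate R, the sources together must send ~R bps.
R = 10e9        # assumed target access-link rate: 10 Gbps
per_bot = 5e6   # assumed upload rate of one compromised host: 5 Mbps
print(f"hosts needed to saturate the link: {R / per_bot:.0f}")  # 2000
```

Under these assumed numbers, a botnet of only a couple thousand hosts suffices, which is one reason botnet-driven flooding is so common.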
DDoS attacks are much harder to detect and defend against than a DoS attack from a single host. We encourage you to consider the following question as you work your way through this book: What can computer network designers do to defend against DoS attacks? We will see that different defenses are needed for the three types of DoS attacks.

Figure 1.25 A distributed denial-of-service attack

The Bad Guys Can Sniff Packets

Many users today access the Internet via wireless devices, such as WiFi-connected laptops or handheld devices with cellular Internet connections (covered in Chapter 7). While ubiquitous Internet access is extremely convenient and enables marvelous new applications for mobile users, it also creates a major security vulnerability---by placing a passive receiver in the vicinity of the wireless transmitter, that receiver can obtain a copy of every packet that is transmitted! These packets can contain all kinds of sensitive information, including passwords, social security numbers, trade secrets, and private personal messages. A passive receiver that records a copy of every packet that flies by is called a packet sniffer.

Sniffers can be deployed in wired environments as well. In wired broadcast environments, as in many Ethernet LANs, a packet sniffer can obtain copies of broadcast packets sent over the LAN. As described in Section 1.2, cable access technologies also broadcast packets and are thus vulnerable to sniffing. Furthermore, a bad guy who gains access to an institution's access router or access link to the Internet may be able to plant a sniffer that makes a copy of every packet going to/from the organization. Sniffed packets can then be analyzed offline for sensitive information. Packet-sniffing software is freely available at various Web sites and as commercial products. Professors teaching a networking course have been known to assign lab exercises that involve writing a packet-sniffing and application-layer data reconstruction program. Indeed, the Wireshark \[Wireshark 2016\] labs associated with this text (see the introductory Wireshark lab at the end of this chapter) use exactly such a packet sniffer! Because packet sniffers are passive---that is, they do not inject packets into the channel---they are difficult to detect. So, when we send packets into a wireless channel, we must accept the possibility that some bad guy may be recording copies of our packets. As you may have guessed, some of the best defenses against packet sniffing involve cryptography. We will examine cryptography as it applies to network security in Chapter 8.

The Bad Guys Can Masquerade as Someone You Trust

It is surprisingly easy (you will have the knowledge to do so shortly as you proceed through this text!) to create a packet with an arbitrary source address, packet content, and destination address and then transmit this hand-crafted packet into the Internet, which will dutifully forward the packet to its destination. Imagine the unsuspecting receiver (say an Internet router) who receives such a packet, takes the (false) source address as being truthful, and then performs some command embedded in the packet's contents (say modifies its forwarding table). The ability to inject packets into the Internet with a false source address is known as IP spoofing, and is but one of many ways in which one user can masquerade as another user.
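To see just how easy such forgery is, consider the following sketch using the third-party Scapy library (our choice of tool for illustration; the text does not prescribe one). The addresses are drawn from reserved documentation ranges, and actually transmitting spoofed packets on a real network is illegal in many jurisdictions, so treat this purely as a thought experiment:

```python
# Sketch of IP spoofing using the third-party Scapy library (our
# illustration, not the book's method). Requires root/administrator
# privileges; the addresses are from reserved documentation ranges.
# Do not send spoofed traffic on networks you do not own.
from scapy.all import IP, ICMP, send

forged = IP(src="203.0.113.99",   # a source address the sender does not own
            dst="198.51.100.7") / ICMP()
send(forged)  # the Internet dutifully forwards it; nothing verifies src
```

Nothing in the packet itself lets the receiver tell that the source field is a lie.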
To solve this problem, we will need end-point authentication, that is, a mechanism that will allow us to determine with certainty whether a message originates from where we think it does. Once again, we encourage you to think about how this can be done for network applications and protocols as you progress through the chapters of this book. We will explore mechanisms for end-point authentication in Chapter 8.

In closing this section, it's worth considering how the Internet got to be such an insecure place in the first place. The answer, in essence, is that the Internet was originally designed to be that way, based on the model of "a group of mutually trusting users attached to a transparent network" \[Blumenthal 2001\]---a model in which (by definition) there is no need for security. Many aspects of the original Internet architecture deeply reflect this notion of mutual trust. For example, the ability for one user to send a packet to any other user is the default rather than a requested/granted capability, and user identity is taken at declared face value, rather than being authenticated by default. But today's Internet certainly does not involve "mutually trusting users." Nonetheless, today's users still need to communicate when they don't necessarily trust each other, may wish to communicate anonymously, may communicate indirectly through third parties (e.g., Web caches, which we'll study in Chapter 2, or mobility-assisting agents, which we'll study in Chapter 7), and may distrust the hardware, software, and even the air through which they communicate. We now have many security-related challenges before us as we progress through this book: We should seek defenses against sniffing, end-point masquerading, man-in-the-middle attacks, DDoS attacks, malware, and more. We should keep in mind that communication among mutually trusted users is the exception rather than the rule. Welcome to the world of modern computer networking!

1.7 History of Computer Networking and the Internet

Sections 1.1 through 1.6 presented an overview of the technology of computer networking and the Internet. You should know enough now to impress your family and friends! However, if you really want to be a big hit at the next cocktail party, you should sprinkle your discourse with tidbits about the fascinating history of the Internet \[Segaller 1998\].

1.7.1 The Development of Packet Switching: 1961--1972

The field of computer networking and today's Internet trace their beginnings back to the early 1960s, when the telephone network was the world's dominant communication network. Recall from Section 1.3 that the telephone network uses circuit switching to transmit information from a sender to a receiver---an appropriate choice given that voice is transmitted at a constant rate between sender and receiver. Given the increasing importance of computers in the early 1960s and the advent of time-shared computers, it was perhaps natural to consider how to hook computers together so that they could be shared among geographically distributed users. The traffic generated by such users was likely to be bursty---intervals of activity, such as the sending of a command to a remote computer, followed by periods of inactivity while waiting for a reply or while contemplating the received response. Three research groups around the world, each unaware of the others' work \[Leiner 1998\], began inventing packet switching as an efficient and robust alternative to circuit switching.
The first published work on packet-switching techniques was that of Leonard Kleinrock \[Kleinrock 1961; Kleinrock 1964\], then a graduate student at MIT. Using queuing theory, Kleinrock's work elegantly demonstrated the effectiveness of the packet-switching approach for bursty traffic sources. In 1964, Paul Baran \[Baran 1964\] at the Rand Institute had begun investigating the use of packet switching for secure voice over military networks, and at the National Physical Laboratory in England, Donald Davies and Roger Scantlebury were also developing their ideas on packet switching. The work at MIT, Rand, and the NPL laid the foundations for today's Internet. But the Internet also has a long history of a let's-build-it-and-demonstrate-it attitude that also dates back to the 1960s. J. C. R. Licklider \[DEC 1990\] and Lawrence Roberts, both colleagues of Kleinrock's at MIT, went on to lead the computer science program at the Advanced Research Projects Agency (ARPA) in the United States. Roberts published an overall plan for the ARPAnet \[Roberts 1967\], the first packet-switched computer network and a direct ancestor of today's public Internet. On Labor Day in 1969, the first packet switch was installed at UCLA under Kleinrock's supervision, and three additional packet switches were installed shortly thereafter at the Stanford Research Institute (SRI), UC Santa Barbara, and the University of Utah (Figure 1.26). The fledgling precursor to the Internet was four nodes large by the end of 1969. Kleinrock recalls the very first use of the network to perform a remote login from UCLA to SRI, crashing the system \[Kleinrock 2004\].

By 1972, ARPAnet had grown to approximately 15 nodes and was given its first public demonstration by Robert Kahn. The first host-to-host protocol between ARPAnet end systems, known as the network-control protocol (NCP), was completed \[RFC 001\]. With an end-to-end protocol available, applications could now be written. Ray Tomlinson wrote the first e-mail program in 1972.

1.7.2 Proprietary Networks and Internetworking: 1972--1980

The initial ARPAnet was a single, closed network. In order to communicate with an ARPAnet host, one had to be actually attached to another ARPAnet IMP. In the early to mid-1970s, additional stand-alone packet-switching networks besides ARPAnet came into being:

- ALOHAnet, a microwave network linking universities on the Hawaiian islands \[Abramson 1970\], as well as DARPA's packet-satellite \[RFC 829\] and packet-radio networks \[Kahn 1978\]
- Telenet, a BBN commercial packet-switching network based on ARPAnet technology
- Cyclades, a French packet-switching network pioneered by Louis Pouzin \[Think 2012\]
- Time-sharing networks such as Tymnet and the GE Information Services network, among others, in the late 1960s and early 1970s \[Schwartz 1977\]
- IBM's SNA (1969--1974), which paralleled the ARPAnet work \[Schwartz 1977\]

Figure 1.26 An early packet switch

The number of networks was growing. With perfect hindsight we can see that the time was ripe for developing an encompassing architecture for connecting networks together. Pioneering work on interconnecting networks (under the sponsorship of the Defense Advanced Research Projects Agency (DARPA)), in essence creating a network of networks, was done by Vinton Cerf and Robert Kahn \[Cerf 1974\]; the term internetting was coined to describe this work. These architectural principles were embodied in TCP.
The early versions of TCP, however, were quite different from today's TCP, combining reliable in-sequence delivery of data via end-system retransmission (still part of today's TCP) with forwarding functions (which today are performed by IP). Early experimentation with TCP, combined with the recognition of the importance of an unreliable, non-flow-controlled, end-to-end transport service for applications such as packetized voice, led to the separation of IP out of TCP and the development of the UDP protocol. The three key Internet protocols that we see today---TCP, UDP, and IP---were conceptually in place by the end of the 1970s.

In addition to the DARPA Internet-related research, many other important networking activities were underway. In Hawaii, Norman Abramson was developing ALOHAnet, a packet-based radio network that allowed multiple remote sites on the Hawaiian Islands to communicate with each other. The ALOHA protocol \[Abramson 1970\] was the first multiple-access protocol, allowing geographically distributed users to share a single broadcast communication medium (a radio frequency). Metcalfe and Boggs built on Abramson's multiple-access protocol work when they developed the Ethernet protocol \[Metcalfe 1976\] for wire-based shared broadcast networks. Interestingly, Metcalfe and Boggs' Ethernet protocol was motivated by the need to connect multiple PCs, printers, and shared disks \[Perkins 1994\]. Twenty-five years ago, well before the PC revolution and the explosion of networks, Metcalfe and Boggs were laying the foundation for today's PC LANs.

1.7.3 A Proliferation of Networks: 1980--1990

By the end of the 1970s, approximately two hundred hosts were connected to the ARPAnet. By the end of the 1980s the number of hosts connected to the public Internet, a confederation of networks looking much like today's Internet, would reach a hundred thousand. The 1980s would be a time of tremendous growth. Much of that growth resulted from several distinct efforts to create computer networks linking universities together. BITNET provided e-mail and file transfers among several universities in the Northeast. CSNET (computer science network) was formed to link university researchers who did not have access to ARPAnet. In 1986, NSFNET was created to provide access to NSF-sponsored supercomputing centers. Starting with an initial backbone speed of 56 kbps, NSFNET's backbone would be running at 1.5 Mbps by the end of the decade and would serve as a primary backbone linking regional networks.

In the ARPAnet community, many of the final pieces of today's Internet architecture were falling into place. January 1, 1983 saw the official deployment of TCP/IP as the new standard host protocol for ARPAnet (replacing the NCP protocol). The transition \[RFC 801\] from NCP to TCP/IP was a flag day event---all hosts were required to transfer over to TCP/IP as of that day. In the late 1980s, important extensions were made to TCP to implement host-based congestion control \[Jacobson 1988\]. The DNS, used to map between a human-readable Internet name (for example, gaia.cs.umass.edu) and its 32-bit IP address, was also developed \[RFC 1034\].

Paralleling this development of the ARPAnet (which was for the most part a US effort), in the early 1980s the French launched the Minitel project, an ambitious plan to bring data networking into everyone's home.
Sponsored by the French government, the Minitel system consisted of a public packet-switched network (based on the X.25 protocol suite), Minitel servers, and inexpensive terminals with built-in low-speed modems. The Minitel became a huge success in 1984 when the French government gave away a free Minitel terminal to each French household that wanted one. Minitel sites included free sites---such as a telephone directory site---as well as private sites, which collected a usage-based fee from each user. At its peak in the mid-1990s, it offered more than 20,000 services, ranging from home banking to specialized research databases. The Minitel was in a large proportion of French homes 10 years before most Americans had ever heard of the Internet.

1.7.4 The Internet Explosion: The 1990s

The 1990s were ushered in with a number of events that symbolized the continued evolution and the soon-to-arrive commercialization of the Internet. ARPAnet, the progenitor of the Internet, ceased to exist. In 1991, NSFNET lifted its restrictions on the use of NSFNET for commercial purposes. NSFNET itself would be decommissioned in 1995, with Internet backbone traffic being carried by commercial Internet Service Providers.

The main event of the 1990s was to be the emergence of the World Wide Web application, which brought the Internet into the homes and businesses of millions of people worldwide. The Web served as a platform for enabling and deploying hundreds of new applications that we take for granted today, including search (e.g., Google and Bing), Internet commerce (e.g., Amazon and eBay), and social networks (e.g., Facebook). The Web was invented at CERN by Tim Berners-Lee between 1989 and 1991 \[Berners-Lee 1989\], based on ideas originating in earlier work on hypertext from the 1940s by Vannevar Bush \[Bush 1945\] and since the 1960s by Ted Nelson \[Xanadu 2012\]. Berners-Lee and his associates developed initial versions of HTML, HTTP, a Web server, and a browser---the four key components of the Web. Around the end of 1993 there were about two hundred Web servers in operation, this collection of servers being just a harbinger of what was about to come. At about this time several researchers were developing Web browsers with GUI interfaces, including Marc Andreessen, who, along with Jim Clark, formed Mosaic Communications, which later became Netscape Communications Corporation \[Cusumano 1998; Quittner 1998\]. By 1995, university students were using Netscape browsers to surf the Web on a daily basis. At about this time companies---big and small---began to operate Web servers and transact commerce over the Web. In 1996, Microsoft started to make browsers, igniting the browser war between Netscape and Microsoft, which Microsoft won a few years later \[Cusumano 1998\].

The second half of the 1990s was a period of tremendous growth and innovation for the Internet, with major corporations and thousands of startups creating Internet products and services. By the end of the millennium the Internet was supporting hundreds of popular applications, including four killer applications:

- E-mail, including attachments and Web-accessible e-mail
- The Web, including Web browsing and Internet commerce
- Instant messaging, with contact lists
- Peer-to-peer file sharing of MP3s, pioneered by Napster

Interestingly, the first two killer applications came from the research community, whereas the last two were created by a few young entrepreneurs.
The period from 1995 to 2001 was a roller-coaster ride for the Internet in the financial markets. Before they were even profitable, hundreds of Internet startups made initial public offerings and started to be traded in a stock market. Many companies were valued in the billions of dollars without having any significant revenue streams. The Internet stocks collapsed in 2000--2001, and many startups shut down. Nevertheless, a number of companies emerged as big winners in the Internet space, including Microsoft, Cisco, Yahoo, eBay, Google, and Amazon.

1.7.5 The New Millennium

Innovation in computer networking continues at a rapid pace. Advances are being made on all fronts, including deployments of faster routers and higher transmission speeds in both access networks and network backbones. But the following developments merit special attention:

- Since the beginning of the millennium, we have been seeing aggressive deployment of broadband Internet access to homes---not only cable modems and DSL but also fiber to the home, as discussed in Section 1.2. This high-speed Internet access has set the stage for a wealth of video applications, including the distribution of user-generated video (for example, YouTube), on-demand streaming of movies and television shows (e.g., Netflix), and multi-person video conferencing (e.g., Skype, Facetime, and Google Hangouts).
- The increasing ubiquity of high-speed (54 Mbps and higher) public WiFi networks and medium-speed (tens of Mbps) Internet access via 4G cellular telephony networks is not only making it possible to remain constantly connected while on the move, but also enabling new location-specific applications such as Yelp, Tinder, Yik Yak, and Waze. The number of wireless devices connecting to the Internet surpassed the number of wired devices in 2011. This high-speed wireless access has set the stage for the rapid emergence of hand-held computers (iPhones, Androids, iPads, and so on), which enjoy constant and untethered access to the Internet.
- Online social networks---such as Facebook, Instagram, Twitter, and WeChat (hugely popular in China)---have created massive people networks on top of the Internet. Many of these social networks are extensively used for messaging as well as photo sharing. Many Internet users today "live" primarily within one or more social networks. Through their APIs, the online social networks create platforms for new networked applications and distributed games.
- As discussed in Section 1.3.3, online service providers, such as Google and Microsoft, have deployed their own extensive private networks, which not only connect together their globally distributed data centers, but are used to bypass the Internet as much as possible by peering directly with lower-tier ISPs. As a result, Google provides search results and e-mail access almost instantaneously, as if their data centers were running within one's own computer.
- Many Internet commerce companies are now running their applications in the "cloud"---such as in Amazon's EC2, in Google's App Engine, or in Microsoft's Azure. Many companies and universities have also migrated their Internet applications (e.g., e-mail and Web hosting) to the cloud. Cloud companies not only provide applications with scalable computing and storage environments, but also give the applications implicit access to their high-performance private networks.

1.8 Summary

In this chapter we've covered a tremendous amount of material!
We've looked at the various pieces of hardware and software that make up the Internet in particular and computer networks in general. We started at the edge of the network, looking at end systems and applications, and at the transport service provided to the applications running on the end systems. We also looked at the link-layer technologies and physical media typically found in the access network. We then dove deeper inside the network, into the network core, identifying packet switching and circuit switching as the two basic approaches for transporting data through a telecommunication network, and we examined the strengths and weaknesses of each approach. We also examined the structure of the global Internet, learning that the Internet is a network of networks. We saw that the Internet's hierarchical structure, consisting of higher- and lower-tier ISPs, has allowed it to scale to include thousands of networks.

In the second part of this introductory chapter, we examined several topics central to the field of computer networking. We first examined delay, throughput, and packet loss in a packet-switched network. We developed simple quantitative models for transmission, propagation, and queuing delays as well as for throughput; we'll make extensive use of these delay models in the homework problems throughout this book. Next we examined protocol layering and service models, key architectural principles in networking that we will also refer back to throughout this book. We also surveyed some of the more prevalent security attacks in the Internet today. We finished our introduction to networking with a brief history of computer networking. The first chapter in itself constitutes a minicourse in computer networking.

So, we have indeed covered a tremendous amount of ground in this first chapter! If you're a bit overwhelmed, don't worry. In the following chapters we'll revisit all of these ideas, covering them in much more detail (that's a promise, not a threat!). At this point, we hope you leave this chapter with a still-developing intuition for the pieces that make up a network, a still-developing command of the vocabulary of networking (don't be shy about referring back to this chapter), and an ever-growing desire to learn more about networking. That's the task ahead of us for the rest of this book.

Road-Mapping This Book

Before starting any trip, you should always glance at a road map in order to become familiar with the major roads and junctures that lie ahead. For the trip we are about to embark on, the ultimate destination is a deep understanding of the how, what, and why of computer networks. Our road map is the sequence of chapters of this book:

1. Computer Networks and the Internet
2. Application Layer
3. Transport Layer
4. Network Layer: Data Plane
5. Network Layer: Control Plane
6. The Link Layer and LANs
7. Wireless and Mobile Networks
8. Security in Computer Networks
9. Multimedia Networking

Chapters 2 through 6 are the five core chapters of this book. You should notice that these chapters are organized around the top four layers of the five-layer Internet protocol stack. Further note that our journey will begin at the top of the Internet protocol stack, namely, the application layer, and will work its way downward. The rationale behind this top-down journey is that once we understand the applications, we can understand the network services needed to support these applications.
We can then, in turn, examine the various ways in which such services might be implemented by a network architecture. Covering applications early thus provides motivation for the remainder of the text.

The second half of the book---Chapters 7 through 9---zooms in on three enormously important (and somewhat independent) topics in modern computer networking. In Chapter 7, we examine wireless and mobile networks, including wireless LANs (including WiFi and Bluetooth), cellular telephony networks (including GSM, 3G, and 4G), and mobility (in both IP and GSM networks). Chapter 8, which addresses security in computer networks, first looks at the underpinnings of encryption and network security, and then examines how the basic theory is applied in a broad range of Internet contexts. The last chapter, which addresses multimedia networking, examines audio and video applications such as Internet phone, video conferencing, and streaming of stored media. We also look at how a packet-switched network can be designed to provide consistent quality of service to audio and video applications.

Homework Problems and Questions

Chapter 1 Review Questions

SECTION 1.1

R1. What is the difference between a host and an end system? List several different types of end systems. Is a Web server an end system?

R2. The word protocol is often used to describe diplomatic relations. How does Wikipedia describe diplomatic protocol?

R3. Why are standards important for protocols?

SECTION 1.2

R4. List six access technologies. Classify each one as home access, enterprise access, or wide-area wireless access.

R5. Is HFC transmission rate dedicated or shared among users? Are collisions possible in a downstream HFC channel? Why or why not?

R6. List the available residential access technologies in your city. For each type of access, provide the advertised downstream rate, upstream rate, and monthly price.

R7. What is the transmission rate of Ethernet LANs?

R8. What are some of the physical media that Ethernet can run over?

R9. Dial-up modems, HFC, DSL, and FTTH are all used for residential access. For each of these access technologies, provide a range of transmission rates and comment on whether the transmission rate is shared or dedicated.

R10. Describe the most popular wireless Internet access technologies today. Compare and contrast them.

SECTION 1.3

R11. Suppose there is exactly one packet switch between a sending host and a receiving host. The transmission rates between the sending host and the switch and between the switch and the receiving host are R1 and R2, respectively. Assuming that the switch uses store-and-forward packet switching, what is the total end-to-end delay to send a packet of length L? (Ignore queuing, propagation delay, and processing delay.)

R12. What advantage does a circuit-switched network have over a packet-switched network? What advantages does TDM have over FDM in a circuit-switched network?

R13. Suppose users share a 2 Mbps link. Also suppose each user transmits continuously at 1 Mbps when transmitting, but each user transmits only 20 percent of the time. (See the discussion of statistical multiplexing in Section 1.3.)

a. When circuit switching is used, how many users can be supported?

b. For the remainder of this problem, suppose packet switching is used. Why will there be essentially no queuing delay before the link if two or fewer users transmit at the same time?
Why will there be a queuing delay if three users transmit at the same time?

c. Find the probability that a given user is transmitting.

d. Suppose now there are three users. Find the probability that at any given time, all three users are transmitting simultaneously. Find the fraction of time during which the queue grows.

R14. Why will two ISPs at the same level of the hierarchy often peer with each other? How does an IXP earn money?

R15. Some content providers have created their own networks. Describe Google's network. What motivates content providers to create these networks?

SECTION 1.4

R16. Consider sending a packet from a source host to a destination host over a fixed route. List the delay components in the end-to-end delay. Which of these delays are constant and which are variable?

R17. Visit the Transmission Versus Propagation Delay applet at the companion Web site. Among the rates, propagation delay, and packet sizes available, find a combination for which the sender finishes transmitting before the first bit of the packet reaches the receiver. Find another combination for which the first bit of the packet reaches the receiver before the sender finishes transmitting.

R18. How long does it take a packet of length 1,000 bytes to propagate over a link of distance 2,500 km, propagation speed 2.5⋅10^8 m/s, and transmission rate 2 Mbps? More generally, how long does it take a packet of length L to propagate over a link of distance d, propagation speed s, and transmission rate R bps? Does this delay depend on packet length? Does this delay depend on transmission rate?

R19. Suppose Host A wants to send a large file to Host B. The path from Host A to Host B has three links, of rates R1=500 kbps, R2=2 Mbps, and R3=1 Mbps.

a. Assuming no other traffic in the network, what is the throughput for the file transfer?

b. Suppose the file is 4 million bytes. Dividing the file size by the throughput, roughly how long will it take to transfer the file to Host B?

c. Repeat (a) and (b), but now with R2 reduced to 100 kbps.

R20. Suppose end system A wants to send a large file to end system B. At a very high level, describe how end system A creates packets from the file. When one of these packets arrives to a router, what information in the packet does the router use to determine the link onto which the packet is forwarded? Why is packet switching in the Internet analogous to driving from one city to another and asking directions along the way?

R21. Visit the Queuing and Loss applet at the companion Web site. What is the maximum emission rate and the minimum transmission rate? With those rates, what is the traffic intensity? Run the applet with these rates and determine how long it takes for packet loss to occur. Then repeat the experiment a second time and determine again how long it takes for packet loss to occur. Are the values different? Why or why not?

SECTION 1.5

R22. List five tasks that a layer can perform. Is it possible that one (or more) of these tasks could be performed by two (or more) layers?

R23. What are the five layers in the Internet protocol stack? What are the principal responsibilities of each of these layers?

R24. What is an application-layer message? A transport-layer segment? A network-layer datagram? A link-layer frame?

R25. Which layers in the Internet protocol stack does a router process? Which layers does a link-layer switch process? Which layers does a host process?

SECTION 1.6

R26.
What is the difference between a virus and a worm?

R27. Describe how a botnet can be created and how it can be used for a DDoS attack.

R28. Suppose Alice and Bob are sending packets to each other over a computer network. Suppose Trudy positions herself in the network so that she can capture all the packets sent by Alice and send whatever she wants to Bob; she can also capture all the packets sent by Bob and send whatever she wants to Alice. List some of the malicious things Trudy can do from this position.

Problems

P1. Design and describe an application-level protocol to be used between an automatic teller machine and a bank's centralized computer. Your protocol should allow a user's card and password to be verified, the account balance (which is maintained at the centralized computer) to be queried, and an account withdrawal to be made (that is, money disbursed to the user). Your protocol entities should be able to handle the all-too-common case in which there is not enough money in the account to cover the withdrawal. Specify your protocol by listing the messages exchanged and the action taken by the automatic teller machine or the bank's centralized computer on transmission and receipt of messages. Sketch the operation of your protocol for the case of a simple withdrawal with no errors, using a diagram similar to that in Figure 1.2. Explicitly state the assumptions made by your protocol about the underlying end-to-end transport service.

P2. Equation 1.1 gives a formula for the end-to-end delay of sending one packet of length L over N links of transmission rate R. Generalize this formula for sending P such packets back-to-back over the N links.

P3. Consider an application that transmits data at a steady rate (for example, the sender generates an N-bit unit of data every k time units, where k is small and fixed). Also, when such an application starts, it will continue running for a relatively long period of time. Answer the following questions, briefly justifying your answer:

a. Would a packet-switched network or a circuit-switched network be more appropriate for this application? Why?

b. Suppose that a packet-switched network is used and the only traffic in this network comes from such applications as described above. Furthermore, assume that the sum of the application data rates is less than the capacities of each and every link. Is some form of congestion control needed? Why?

P4. Consider the circuit-switched network in Figure 1.13. Recall that there are 4 circuits on each link. Label the four switches A, B, C, and D, going in the clockwise direction.

a. What is the maximum number of simultaneous connections that can be in progress at any one time in this network?

b. Suppose that all connections are between switches A and C. What is the maximum number of simultaneous connections that can be in progress?

c. Suppose we want to make four connections between switches A and C, and another four connections between switches B and D. Can we route these calls through the four links to accommodate all eight connections?

P5. Review the car-caravan analogy in Section 1.4. Assume a propagation speed of 100 km/hour.

a. Suppose the caravan travels 150 km, beginning in front of one tollbooth, passing through a second tollbooth, and finishing just after a third tollbooth. What is the end-to-end delay?

b. Repeat (a), now assuming that there are eight cars in the caravan instead of ten.

P6.
This elementary problem begins to explore propagation delay and transmission delay, two central concepts in data networking. Consider two hosts, A and B, connected by a single link of rate R bps. Suppose that the two hosts are separated by m meters, and suppose the propagation speed along the link is s meters/sec. Host A is to send a packet of size L bits to Host B.

Exploring propagation delay and transmission delay

a. Express the propagation delay, dprop, in terms of m and s.

b. Determine the transmission time of the packet, dtrans, in terms of L and R.

c. Ignoring processing and queuing delays, obtain an expression for the end-to-end delay.

d. Suppose Host A begins to transmit the packet at time t=0. At time t=dtrans, where is the last bit of the packet?

e. Suppose dprop is greater than dtrans. At time t=dtrans, where is the first bit of the packet?

f. Suppose dprop is less than dtrans. At time t=dtrans, where is the first bit of the packet?

g. Suppose s=2.5⋅10^8 m/s, L=120 bits, and R=56 kbps. Find the distance m so that dprop equals dtrans.

P7. In this problem, we consider sending real-time voice from Host A to Host B over a packet-switched network (VoIP). Host A converts analog voice to a digital 64 kbps bit stream on the fly. Host A then groups the bits into 56-byte packets. There is one link between Hosts A and B; its transmission rate is 2 Mbps and its propagation delay is 10 msec. As soon as Host A gathers a packet, it sends it to Host B. As soon as Host B receives an entire packet, it converts the packet's bits to an analog signal. How much time elapses from the time a bit is created (from the original analog signal at Host A) until the bit is decoded (as part of the analog signal at Host B)?

P8. Suppose users share a 3 Mbps link. Also suppose each user requires 150 kbps when transmitting, but each user transmits only 10 percent of the time. (See the discussion of packet switching versus circuit switching in Section 1.3.)

a. When circuit switching is used, how many users can be supported?

b. For the remainder of this problem, suppose packet switching is used. Find the probability that a given user is transmitting.

c. Suppose there are 120 users. Find the probability that at any given time, exactly n users are transmitting simultaneously. (Hint: Use the binomial distribution.)

d. Find the probability that there are 21 or more users transmitting simultaneously.

P9. Consider the discussion in Section 1.3 of packet switching versus circuit switching in which an example is provided with a 1 Mbps link. Users are generating data at a rate of 100 kbps when busy, but are busy generating data only with probability p=0.1. Suppose that the 1 Mbps link is replaced by a 1 Gbps link.

a. What is N, the maximum number of users that can be supported simultaneously under circuit switching?

b. Now consider packet switching and a user population of M users. Give a formula (in terms of p, M, N) for the probability that more than N users are sending data.

P10. Consider a packet of length L that begins at end system A and travels over three links to a destination end system. These three links are connected by two packet switches. Let di, si, and Ri denote the length, propagation speed, and the transmission rate of link i, for i=1,2,3. The packet switch delays each packet by dproc.
Assuming no queuing delays, in terms of di, si, Ri (i=1,2,3), and L, what is the total end-to-end delay for the packet? Suppose now the packet is 1,500 bytes, the propagation speed on all three links is 2.5⋅10^8 m/s, the transmission rates of all three links are 2 Mbps, the packet switch processing delay is 3 msec, the length of the first link is 5,000 km, the length of the second link is 4,000 km, and the length of the last link is 1,000 km. For these values, what is the end-to-end delay?

P11. In the above problem, suppose R1=R2=R3=R and dproc=0. Further suppose the packet switch does not store-and-forward packets but instead immediately transmits each bit it receives without waiting for the entire packet to arrive. What is the end-to-end delay?

P12. A packet switch receives a packet and determines the outbound link to which the packet should be forwarded. When the packet arrives, one other packet is halfway done being transmitted on this outbound link and four other packets are waiting to be transmitted. Packets are transmitted in order of arrival. Suppose all packets are 1,500 bytes and the link rate is 2 Mbps. What is the queuing delay for the packet? More generally, what is the queuing delay when all packets have length L, the transmission rate is R, x bits of the currently-being-transmitted packet have been transmitted, and n packets are already in the queue?

P13.

a. Suppose N packets arrive simultaneously to a link at which no packets are currently being transmitted or queued. Each packet is of length L and the link has transmission rate R. What is the average queuing delay for the N packets?

b. Now suppose that N such packets arrive to the link every LN/R seconds. What is the average queuing delay of a packet?

P14. Consider the queuing delay in a router buffer. Let I denote traffic intensity; that is, I=La/R. Suppose that the queuing delay takes the form IL/(R(1−I)) for I < 1.

a. Provide a formula for the total delay, that is, the queuing delay plus the transmission delay.

b. Plot the total delay as a function of L/R.

P15. Let a denote the rate of packets arriving at a link in packets/sec, and let µ denote the link's transmission rate in packets/sec. Based on the formula for the total delay (i.e., the queuing delay plus the transmission delay) derived in the previous problem, derive a formula for the total delay in terms of a and µ.

P16. Consider a router buffer preceding an outbound link. In this problem, you will use Little's formula, a famous formula from queuing theory. Let N denote the average number of packets in the buffer plus the packet being transmitted. Let a denote the rate of packets arriving at the link. Let d denote the average total delay (i.e., the queuing delay plus the transmission delay) experienced by a packet. Little's formula is N=a⋅d. Suppose that on average, the buffer contains 10 packets, and the average packet queuing delay is 10 msec. The link's transmission rate is 100 packets/sec. Using Little's formula, what is the average packet arrival rate, assuming there is no packet loss?

P17.

a. Generalize Equation 1.2 in Section 1.4.3 for heterogeneous processing rates, transmission rates, and propagation delays.

b. Repeat (a), but now also suppose that there is an average queuing delay of dqueue at each node.

P18. Perform a Traceroute between source and destination on the same continent at three different hours of the day.
Using Traceroute to discover network paths and measure network delay

a. Find the average and standard deviation of the round-trip delays at each of the three hours.

b. Find the number of routers in the path at each of the three hours. Did the paths change during any of the hours?

c. Try to identify the number of ISP networks that the Traceroute packets pass through from source to destination. Routers with similar names and/or similar IP addresses should be considered as part of the same ISP. In your experiments, do the largest delays occur at the peering interfaces between adjacent ISPs?

d. Repeat the above for a source and destination on different continents. Compare the intra-continent and inter-continent results.

P19.

a. Visit the site www.traceroute.org and perform traceroutes from two different cities in France to the same destination host in the United States. How many links are the same in the two traceroutes? Is the transatlantic link the same?

b. Repeat (a) but this time choose one city in France and another city in Germany.

c. Pick a city in the United States, and perform traceroutes to two hosts, each in a different city in China. How many links are common in the two traceroutes? Do the two traceroutes diverge before reaching China?

P20. Consider the throughput example corresponding to Figure 1.20(b). Now suppose that there are M client-server pairs rather than 10. Let Rs, Rc, and R denote the rates of the server links, client links, and network link. Assume all other links have abundant capacity and that there is no other traffic in the network besides the traffic generated by the M client-server pairs. Derive a general expression for throughput in terms of Rs, Rc, R, and M.

P21. Consider Figure 1.19(b). Now suppose that there are M paths between the server and the client. No two paths share any link. Path k (k=1, ..., M) consists of N links with transmission rates R1^k, R2^k, ..., RN^k. If the server can only use one path to send data to the client, what is the maximum throughput that the server can achieve? If the server can use all M paths to send data, what is the maximum throughput that the server can achieve?

P22. Consider Figure 1.19(b). Suppose that each link between the server and the client has a packet loss probability p, and the packet loss probabilities for these links are independent. What is the probability that a packet (sent by the server) is successfully received by the receiver? If a packet is lost in the path from the server to the client, then the server will re-transmit the packet. On average, how many times will the server re-transmit the packet in order for the client to successfully receive the packet?

P23. Consider Figure 1.19(a). Assume that we know the bottleneck link along the path from the server to the client is the first link with rate Rs bits/sec. Suppose we send a pair of packets back to back from the server to the client, and there is no other traffic on this path. Assume each packet is L bits in size, and both links have the same propagation delay dprop.

a. What is the packet inter-arrival time at the destination? That is, how much time elapses from when the last bit of the first packet arrives until the last bit of the second packet arrives?

b. Now assume that the second link is the bottleneck link (i.e., Rc < Rs). Is it possible that the second packet queues at the input queue of the second link? Explain.
Now suppose that the server sends the second packet T seconds after sending the first packet. How large must T be to ensure no queuing before the second link? Explain.

P24. Suppose you would like to urgently deliver 40 terabytes of data from Boston to Los Angeles. You have available a 100 Mbps dedicated link for data transfer. Would you prefer to transmit the data via this link or instead use FedEx overnight delivery? Explain.

P25. Suppose two hosts, A and B, are separated by 20,000 kilometers and are connected by a direct link of R=2 Mbps. Suppose the propagation speed over the link is 2.5⋅10^8 meters/sec.

a. Calculate the bandwidth-delay product, R⋅dprop.

b. Consider sending a file of 800,000 bits from Host A to Host B. Suppose the file is sent continuously as one large message. What is the maximum number of bits that will be in the link at any given time?

c. Provide an interpretation of the bandwidth-delay product.

d. What is the width (in meters) of a bit in the link? Is it longer than a football field?

e. Derive a general expression for the width of a bit in terms of the propagation speed s, the transmission rate R, and the length of the link m.

P26. Referring to problem P25, suppose we can modify R. For what value of R is the width of a bit as long as the length of the link?

P27. Consider problem P25 but now with a link of R=1 Gbps.

a. Calculate the bandwidth-delay product, R⋅dprop.

b. Consider sending a file of 800,000 bits from Host A to Host B. Suppose the file is sent continuously as one big message. What is the maximum number of bits that will be in the link at any given time?

c. What is the width (in meters) of a bit in the link?

P28. Refer again to problem P25.

a. How long does it take to send the file, assuming it is sent continuously?

b. Suppose now the file is broken up into 20 packets with each packet containing 40,000 bits. Suppose that each packet is acknowledged by the receiver and the transmission time of an acknowledgment packet is negligible. Finally, assume that the sender cannot send a packet until the preceding one is acknowledged. How long does it take to send the file?

c. Compare the results from (a) and (b).

P29. Suppose there is a 10 Mbps microwave link between a geostationary satellite and its base station on Earth. Every minute the satellite takes a digital photo and sends it to the base station. Assume a propagation speed of 2.4⋅10^8 meters/sec.

a. What is the propagation delay of the link?

b. What is the bandwidth-delay product, R⋅dprop?

c. Let x denote the size of the photo. What is the minimum value of x for the microwave link to be continuously transmitting?

P30. Consider the airline travel analogy in our discussion of layering in Section 1.5, and the addition of headers to protocol data units as they flow down the protocol stack. Is there an equivalent notion of header information that is added to passengers and baggage as they move down the airline protocol stack?

P31. In modern packet-switched networks, including the Internet, the source host segments long, application-layer messages (for example, an image or a music file) into smaller packets and sends the packets into the network. The receiver then reassembles the packets back into the original message. We refer to this process as message segmentation. Figure 1.27 illustrates the end-to-end transport of a message with and without message segmentation.
Consider a message that is 8⋅10^6 bits long that is to be sent from source to destination in Figure 1.27. Suppose each link in the figure is 2 Mbps. Ignore propagation, queuing, and processing delays.

a. Consider sending the message from source to destination without message segmentation. How long does it take to move the message from the source host to the first packet switch? Keeping in mind that each switch uses store-and-forward packet switching, what is the total time to move the message from source host to destination host?

b. Now suppose that the message is segmented into 800 packets, with each packet being 10,000 bits long. How long does it take to move the first packet from source host to the first switch? When the first packet is being sent from the first switch to the second switch, the second packet is being sent from the source host to the first switch. At what time will the second packet be fully received at the first switch?

c. How long does it take to move the file from source host to destination host when message segmentation is used? Compare this result with your answer in part (a) and comment.

d. In addition to reducing delay, what are reasons to use message segmentation?

e. Discuss the drawbacks of message segmentation.

Figure 1.27 End-to-end message transport: (a) without message segmentation; (b) with message segmentation

P32. Experiment with the Message Segmentation applet at the book's Web site. Do the delays in the applet correspond to the delays in the previous problem? How do link propagation delays affect the overall end-to-end delay for packet switching (with message segmentation) and for message switching?

P33. Consider sending a large file of F bits from Host A to Host B. There are three links (and two switches) between A and B, and the links are uncongested (that is, no queuing delays). Host A segments the file into segments of S bits each and adds 80 bits of header to each segment, forming packets of L=80+S bits. Each link has a transmission rate of R bps. Find the value of S that minimizes the delay of moving the file from Host A to Host B. Disregard propagation delay.

P34. Skype offers a service that allows you to make a phone call from a PC to an ordinary phone. This means that the voice call must pass through both the Internet and through a telephone network. Discuss how this might be done.

Wireshark Lab

"Tell me and I forget. Show me and I remember. Involve me and I understand." Chinese proverb

One's understanding of network protocols can often be greatly deepened by seeing them in action and by playing around with them---observing the sequence of messages exchanged between two protocol entities, delving into the details of protocol operation, causing protocols to perform certain actions, and observing these actions and their consequences. This can be done in simulated scenarios or in a real network environment such as the Internet. The Java applets at the textbook Web site take the first approach. In the Wireshark labs, we'll take the latter approach. You'll run network applications in various scenarios using a computer on your desk, at home, or in a lab. You'll observe the network protocols in your computer, interacting and exchanging messages with protocol entities executing elsewhere in the Internet. Thus, you and your computer will be an integral part of these live labs. You'll observe---and you'll learn---by doing.

The basic tool for observing the messages exchanged between executing protocol entities is called a packet sniffer. As the name suggests, a packet sniffer passively copies (sniffs) messages being sent from and received by your computer; it also displays the contents of the various protocol fields of these captured messages. A screenshot of the Wireshark packet sniffer is shown in Figure 1.28. Wireshark is a free packet sniffer that runs on Windows, Linux/Unix, and Mac computers.

Figure 1.28 A Wireshark screenshot (Wireshark screenshot reprinted by permission of the Wireshark Foundation.)
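For a small programmatic taste of what a sniffer does before you fire up Wireshark's GUI, here is a minimal capture sketch using the third-party Scapy library (our own illustration; the labs themselves use Wireshark). Capturing packets typically requires root or administrator privileges:

```python
# Minimal packet-capture sketch using the third-party Scapy library
# (our illustration; the Wireshark labs use Wireshark itself).
from scapy.all import sniff

# Passively copy the next five packets seen by the default interface,
# printing a one-line summary of each packet's protocol layers.
sniff(count=5, prn=lambda pkt: print(pkt.summary()))
```

Like Wireshark, this injects nothing into the channel; it merely copies what flows past.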
The basic tool for observing the messages exchanged between executing protocol entities is called a packet sniffer. As the name suggests, a packet sniffer passively copies (sniffs) messages being sent from and received by your computer; it also displays the contents of the various protocol fields of these captured messages. A screenshot of the Wireshark packet sniffer is shown in Figure 1.28. Wireshark is a free packet sniffer that runs on Windows, Linux/Unix, and Mac computers.

Figure 1.28 A Wireshark screenshot (Wireshark screenshot reprinted by permission of the Wireshark Foundation.)

Throughout the textbook, you will find Wireshark labs that allow you to explore a number of the protocols studied in the chapter. In this first Wireshark lab, you'll obtain and install a copy of Wireshark, access a Web site, and capture and examine the protocol messages being exchanged between your Web browser and the Web server. You can find full details about this first Wireshark lab (including instructions about how to obtain and install Wireshark) at the Web site http://www.pearsonhighered.com/csresources/.

AN INTERVIEW WITH... Leonard Kleinrock

Leonard Kleinrock is a professor of computer science at the University of California, Los Angeles. In 1969, his computer at UCLA became the first node of the Internet. His creation of packet-switching principles in 1961 became the technology behind the Internet. He received his B.E.E. from the City College of New York (CCNY) and his master's and PhD in electrical engineering from MIT.

What made you decide to specialize in networking/Internet technology?

As a PhD student at MIT in 1959, I looked around and found that most of my classmates were doing research in the area of information theory and coding theory. At MIT, there was the great researcher Claude Shannon, who had launched these fields and had solved most of the important problems already. The research problems that were left were hard and of lesser consequence. So I decided to launch out in a new area that no one else had yet conceived of. Remember that at MIT I was surrounded by lots of computers, and it was clear to me that soon these machines would need to communicate with each other. At the time, there was no effective way for them to do so, so I decided to develop the technology that would permit efficient and reliable data networks to be created.

What was your first job in the computer industry? What did it entail?

I went to the evening session at CCNY from 1951 to 1957 for my bachelor's degree in electrical engineering. During the day, I worked first as a technician and then as an engineer at a small, industrial electronics firm called Photobell. While there, I introduced digital technology to their product line. Essentially, we were using photoelectric devices to detect the presence of certain items (boxes, people, etc.) and the use of a circuit known then as a bistable multivibrator was just the kind of technology we needed to bring digital processing into this field of detection. These circuits happen to be the building blocks for computers, and have come to be known as flip-flops or switches in today's vernacular.

What was going through your mind when you sent the first host-to-host message (from UCLA to the Stanford Research Institute)?

Frankly, we had no idea of the importance of that event.
We had not prepared a special message of historic significance, as did so many inventors of the past (Samuel Morse with "What hath God wrought." or Alexander Graham Bell with "Watson, come here! I want you." or Neil Armstrong with "That's one small step for a man, one giant leap for mankind.") Those guys were smart! They understood media and public relations. All we wanted to do was to login to the SRI computer. So we typed the "L", which was correctly received, we typed the "o" which was received, and then we typed the "g" which caused the SRI host computer to crash! So, it turned out that our message was the shortest and perhaps the most prophetic message ever, namely "Lo!" as in "Lo and behold!"

Earlier that year, I was quoted in a UCLA press release saying that once the network was up and running, it would be possible to gain access to computer utilities from our homes and offices as easily as we gain access to electricity and telephone connectivity. So my vision at that time was that the Internet would be ubiquitous, always on, always available, anyone with any device could connect from any location, and it would be invisible. However, I never anticipated that my 99-year-old mother would use the Internet---and indeed she did!

What is your vision for the future of networking?

The easy part of the vision is to predict the infrastructure itself. I anticipate that we'll see considerable deployment of nomadic computing, mobile devices, and smart spaces. Indeed, the availability of lightweight, inexpensive, high-performance portable computing and communication devices (plus the ubiquity of the Internet) has enabled us to become nomads. Nomadic computing refers to the technology that enables end users who travel from place to place to gain access to Internet services in a transparent fashion, no matter where they travel and no matter what device they carry or gain access to. The harder part of the vision is to predict the applications and services, which have consistently surprised us in dramatic ways (e-mail, search technologies, the World Wide Web, blogs, social networks, user generation and sharing of music, photos, and videos, etc.). We are on the verge of a new class of surprising and innovative mobile applications delivered to our hand-held devices. The next step will enable us to move out from the netherworld of cyberspace to the physical world of smart spaces. Our environments (desks, walls, vehicles, watches, belts, and so on) will come alive with technology, through actuators, sensors, logic, processing, storage, cameras, microphones, speakers, displays, and communication. This embedded technology will allow our environment to provide the IP services we want. When I walk into a room, the room will know I entered. I will be able to communicate with my environment naturally, as in spoken English; my requests will generate replies that present Web pages to me from wall displays, through my eyeglasses, as speech, holograms, and so forth.

Looking a bit further out, I see a networking future that includes the following additional key components. I see intelligent software agents deployed across the network whose function it is to mine data, act on that data, observe trends, and carry out tasks dynamically and adaptively. I see considerably more network traffic generated not so much by humans, but by these embedded devices and these intelligent software agents.
I see large collections of self-organizing systems controlling this vast, fast network. I see huge amounts of information flashing across this network instantaneously with this information undergoing enormous processing and filtering. The Internet will essentially be a pervasive global nervous system. I see all these things and more as we move headlong through the twenty-first century.

What people have inspired you professionally?

By far, it was Claude Shannon from MIT, a brilliant researcher who had the ability to relate his mathematical ideas to the physical world in highly intuitive ways. He was on my PhD thesis committee.

Do you have any advice for students entering the networking/Internet field?

The Internet and all that it enables is a vast new frontier, full of amazing challenges. There is room for great innovation. Don't be constrained by today's technology. Reach out and imagine what could be and then make it happen.

Chapter 2 Application Layer

Network applications are the raisons d'être of a computer network---if we couldn't conceive of any useful applications, there wouldn't be any need for networking infrastructure and protocols to support them. Since the Internet's inception, numerous useful and entertaining applications have indeed been created. These applications have been the driving force behind the Internet's success, motivating people in homes, schools, governments, and businesses to make the Internet an integral part of their daily activities. Internet applications include the classic text-based applications that became popular in the 1970s and 1980s: text e-mail, remote access to computers, file transfers, and newsgroups. They include the killer application of the mid-1990s, the World Wide Web, encompassing Web surfing, search, and electronic commerce. They include instant messaging and P2P file sharing, the two killer applications introduced at the end of the millennium. In the new millennium, new and highly compelling applications continue to emerge, including voice over IP and video conferencing such as Skype, Facetime, and Google Hangouts; user-generated video such as YouTube and movies on demand such as Netflix; and multiplayer online games such as Second Life and World of Warcraft. During this same period, we have seen the emergence of a new generation of social networking applications---such as Facebook, Instagram, Twitter, and WeChat---which have created engaging human networks on top of the Internet's network of routers and communication links. And most recently, along with the arrival of the smartphone, there has been a profusion of location-based mobile apps, including popular check-in, dating, and road-traffic forecasting apps (such as Yelp, Tinder, Waze, and Yik Yak). Clearly, there has been no slowing down of new and exciting Internet applications. Perhaps some of the readers of this text will create the next generation of killer Internet applications!

In this chapter we study the conceptual and implementation aspects of network applications. We begin by defining key application-layer concepts, including network services required by applications, clients and servers, processes, and transport-layer interfaces. We examine several network applications in detail, including the Web, e-mail, DNS, peer-to-peer (P2P) file distribution, and video streaming. (Chapter 9 will further examine multimedia applications, including streaming video and VoIP.)
We then cover network application development over both TCP and UDP. In particular, we study the socket interface and walk through some simple client-server applications in Python. We also provide several fun and interesting socket programming assignments at the end of the chapter.

The application layer is a particularly good place to start our study of protocols. It's familiar ground. We're acquainted with many of the applications that rely on the protocols we'll study. It will give us a good feel for what protocols are all about and will introduce us to many of the same issues that we'll see again when we study transport, network, and link-layer protocols.

2.1 Principles of Network Applications

Suppose you have an idea for a new network application. Perhaps this application will be a great service to humanity, or will please your professor, or will bring you great wealth, or will simply be fun to develop. Whatever the motivation may be, let's now examine how you transform the idea into a real-world network application.

At the core of network application development is writing programs that run on different end systems and communicate with each other over the network. For example, in the Web application there are two distinct programs that communicate with each other: the browser program running in the user's host (desktop, laptop, tablet, smartphone, and so on); and the Web server program running in the Web server host. As another example, in a P2P file-sharing system there is a program in each host that participates in the file-sharing community. In this case, the programs in the various hosts may be similar or identical. Thus, when developing your new application, you need to write software that will run on multiple end systems. This software could be written, for example, in C, Java, or Python. Importantly, you do not need to write software that runs on network-core devices, such as routers or link-layer switches. Even if you wanted to write application software for these network-core devices, you wouldn't be able to do so. As we learned in Chapter 1, and as shown earlier in Figure 1.24, network-core devices do not function at the application layer but instead function at lower layers---specifically at the network layer and below. This basic design---namely, confining application software to the end systems---as shown in Figure 2.1, has facilitated the rapid development and deployment of a vast array of network applications.

Figure 2.1 Communication for a network application takes place between end systems at the application layer

2.1.1 Network Application Architectures

Before diving into software coding, you should have a broad architectural plan for your application. Keep in mind that an application's architecture is distinctly different from the network architecture (e.g., the five-layer Internet architecture discussed in Chapter 1). From the application developer's perspective, the network architecture is fixed and provides a specific set of services to applications. The application architecture, on the other hand, is designed by the application developer and dictates how the application is structured over the various end systems. In choosing the application architecture, an application developer will likely draw on one of the two predominant architectural paradigms used in modern network applications: the client-server architecture or the peer-to-peer (P2P) architecture.
In a client-server architecture, there is an always-on host, called the server, which services requests from many other hosts, called clients. A classic example is the Web application for which an always-on Web server services requests from browsers running on client hosts. When a Web server receives a request for an object from a client host, it responds by sending the requested object to the client host. Note that with the client-server architecture, clients do not directly communicate with each other; for example, in the Web application, two browsers do not directly communicate. Another characteristic of the client-server architecture is that the server has a fixed, well-known address, called an IP address (which we'll discuss soon). Because the server has a fixed, well-known address, and because the server is always on, a client can always contact the server by sending a packet to the server's IP address. Some of the better-known applications with a client-server architecture include the Web, FTP, Telnet, and e-mail. The client-server architecture is shown in Figure 2.2(a).

Often in a client-server application, a single server host is incapable of keeping up with all the requests from clients. For example, a popular social-networking site can quickly become overwhelmed if it has only one server handling all of its requests. For this reason, a data center, housing a large number of hosts, is often used to create a powerful virtual server. The most popular Internet services---such as search engines (e.g., Google, Bing, Baidu), Internet commerce (e.g., Amazon, eBay, Alibaba), Web-based e-mail (e.g., Gmail and Yahoo Mail), and social networking (e.g., Facebook, Instagram, Twitter, and WeChat)---employ one or more data centers. As discussed in Section 1.3.3, Google has 30 to 50 data centers distributed around the world, which collectively handle search, YouTube, Gmail, and other services. A data center can have hundreds of thousands of servers, which must be powered and maintained. Additionally, the service providers must pay recurring interconnection and bandwidth costs for sending data from their data centers.

In a P2P architecture, there is minimal (or no) reliance on dedicated servers in data centers. Instead the application exploits direct communication between pairs of intermittently connected hosts, called peers. The peers are not owned by the service provider, but are instead desktops and laptops controlled by users, with most of the peers residing in homes, universities, and offices. Because the peers communicate without passing through a dedicated server, the architecture is called peer-to-peer.

Figure 2.2 (a) Client-server architecture; (b) P2P architecture

Many of today's most popular and traffic-intensive applications are based on P2P architectures. These applications include file sharing (e.g., BitTorrent), peer-assisted download acceleration (e.g., Xunlei), and Internet telephony and video conferencing (e.g., Skype). The P2P architecture is illustrated in Figure 2.2(b). We mention that some applications have hybrid architectures, combining both client-server and P2P elements. For example, for many instant messaging applications, servers are used to track the IP addresses of users, but user-to-user messages are sent directly between user hosts (without passing through intermediate servers). One of the most compelling features of P2P architectures is their self-scalability.
For example, in a P2P file-sharing application, although each peer generates workload by requesting files, each peer also adds service capacity to the system by distributing files to other peers. P2P architectures are also cost effective, since they normally don't require significant server infrastructure and server bandwidth (in contrast with client-server designs with data centers). However, P2P applications face challenges of security, performance, and reliability due to their highly decentralized structure.

2.1.2 Processes Communicating

Before building your network application, you also need a basic understanding of how the programs, running in multiple end systems, communicate with each other. In the jargon of operating systems, it is not actually programs but processes that communicate. A process can be thought of as a program that is running within an end system. When processes are running on the same end system, they can communicate with each other with interprocess communication, using rules that are governed by the end system's operating system. But in this book we are not particularly interested in how processes on the same host communicate; we are interested instead in how processes running on different hosts (with potentially different operating systems) communicate.

Processes on two different end systems communicate with each other by exchanging messages across the computer network. A sending process creates and sends messages into the network; a receiving process receives these messages and possibly responds by sending messages back. Figure 2.1 illustrates that processes communicating with each other reside in the application layer of the five-layer protocol stack.

Client and Server Processes

A network application consists of pairs of processes that send messages to each other over a network. For example, in the Web application a client browser process exchanges messages with a Web server process. In a P2P file-sharing system, a file is transferred from a process in one peer to a process in another peer. For each pair of communicating processes, we typically label one of the two processes as the client and the other process as the server. With the Web, a browser is a client process and a Web server is a server process. With P2P file sharing, the peer that is downloading the file is labeled as the client, and the peer that is uploading the file is labeled as the server. You may have observed that in some applications, such as in P2P file sharing, a process can be both a client and a server. Indeed, a process in a P2P file-sharing system can both upload and download files. Nevertheless, in the context of any given communication session between a pair of processes, we can still label one process as the client and the other process as the server. We define the client and server processes as follows:

In the context of a communication session between a pair of processes, the process that initiates the communication (that is, initially contacts the other process at the beginning of the session) is labeled as the client. The process that waits to be contacted to begin the session is the server.

In the Web, a browser process initiates contact with a Web server process; hence the browser process is the client and the Web server process is the server. In P2P file sharing, when Peer A asks Peer B to send a specific file, Peer A is the client and Peer B is the server in the context of this specific communication session.
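To make these role definitions concrete, here is a bare-bones sketch in Python, whose socket API we'll study in detail in Section 2.7. The port number 12000 and the message contents are illustrative choices of ours, not part of any standard.

```python
from socket import socket, AF_INET, SOCK_STREAM

def run_server():
    # The server process waits to be contacted.
    server_socket = socket(AF_INET, SOCK_STREAM)
    server_socket.bind(('', 12000))               # illustrative port number
    server_socket.listen(1)
    connection, address = server_socket.accept()  # blocks until a client initiates a session
    message = connection.recv(1024)
    connection.send(message.upper())              # respond to the client
    connection.close()

def run_client():
    # The client process initiates the communication session.
    client_socket = socket(AF_INET, SOCK_STREAM)
    client_socket.connect(('localhost', 12000))   # initiating contact: the client role
    client_socket.send(b'hello')
    print(client_socket.recv(1024).decode())      # prints HELLO
    client_socket.close()
```

Run in two terminals (server first), the process that calls connect() plays the client role for this session, and the process waiting in accept() plays the server role.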
When there's no confusion, we'll sometimes also use the terminology "client side and server side of an application." At the end of this chapter, we'll step through simple code for both the client and server sides of network applications.

The Interface Between the Process and the Computer Network

As noted above, most applications consist of pairs of communicating processes, with the two processes in each pair sending messages to each other. Any message sent from one process to another must go through the underlying network. A process sends messages into, and receives messages from, the network through a software interface called a socket. Let's consider an analogy to help us understand processes and sockets. A process is analogous to a house and its socket is analogous to its door. When a process wants to send a message to another process on another host, it shoves the message out its door (socket). This sending process assumes that there is a transportation infrastructure on the other side of its door that will transport the message to the door of the destination process. Once the message arrives at the destination host, the message passes through the receiving process's door (socket), and the receiving process then acts on the message. Figure 2.3 illustrates socket communication between two processes that communicate over the Internet. (Figure 2.3 assumes that the underlying transport protocol used by the processes is the Internet's TCP protocol.) As shown in this figure, a socket is the interface between the application layer and the transport layer within a host. It is also referred to as the Application Programming Interface (API) between the application and the network, since the socket is the programming interface with which network applications are built. The application developer has control of everything on the application-layer side of the socket but has little control of the transport-layer side of the socket. The only control that the application developer has on the transport-layer side is (1) the choice of transport protocol and (2) perhaps the ability to fix a few transport-layer parameters such as maximum buffer and maximum segment sizes (to be covered in Chapter 3). Once the application developer chooses a transport protocol (if a choice is available), the application is built using the transport-layer services provided by that protocol. We'll explore sockets in some detail in Section 2.7.

Figure 2.3 Application processes, sockets, and underlying transport protocol

Addressing Processes

In order to send postal mail to a particular destination, the destination needs to have an address. Similarly, in order for a process running on one host to send packets to a process running on another host, the receiving process needs to have an address. To identify the receiving process, two pieces of information need to be specified: (1) the address of the host and (2) an identifier that specifies the receiving process in the destination host. In the Internet, the host is identified by its IP address. We'll discuss IP addresses in great detail in Chapter 4. For now, all we need to know is that an IP address is a 32-bit quantity that we can think of as uniquely identifying the host. In addition to knowing the address of the host to which a message is destined, the sending process must also identify the receiving process (more specifically, the receiving socket) running in the host. This information is needed because in general a host could be running many network applications. A destination port number serves this purpose. Popular applications have been assigned specific port numbers. For example, a Web server is identified by port number 80. A mail server process (using the SMTP protocol) is identified by port number 25. A list of well-known port numbers for all Internet standard protocols can be found at www.iana.org. We'll examine port numbers in detail in Chapter 3.
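In code, these two pieces of information appear together as the (IP address, port number) pair that a socket connects to. Here is a minimal sketch in Python, reusing the book's illustrative hostname www.someSchool.edu (which won't actually resolve on the real Internet):

```python
from socket import socket, AF_INET, SOCK_STREAM, gethostbyname

# Translate the hostname to an IP address; the hostname is the book's
# illustrative example, so this lookup would fail on the real Internet.
server_ip = gethostbyname('www.someSchool.edu')

# The (IP address, port number) pair identifies the receiving process:
# the IP address identifies the host, and well-known port 80 identifies
# the Web server process running on that host.
sock = socket(AF_INET, SOCK_STREAM)
sock.connect((server_ip, 80))
```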
2.1.3 Transport Services Available to Applications

Recall that a socket is the interface between the application process and the transport-layer protocol. The application at the sending side pushes messages through the socket. At the other side of the socket, the transport-layer protocol has the responsibility of getting the messages to the socket of the receiving process. Many networks, including the Internet, provide more than one transport-layer protocol. When you develop an application, you must choose one of the available transport-layer protocols. How do you make this choice? Most likely, you would study the services provided by the available transport-layer protocols, and then pick the protocol with the services that best match your application's needs. The situation is similar to choosing either train or airplane transport for travel between two cities. You have to choose one or the other, and each transportation mode offers different services. (For example, the train offers downtown pickup and drop-off, whereas the plane offers shorter travel time.)

What are the services that a transport-layer protocol can offer to applications invoking it? We can broadly classify the possible services along four dimensions: reliable data transfer, throughput, timing, and security.

Reliable Data Transfer

As discussed in Chapter 1, packets can get lost within a computer network. For example, a packet can overflow a buffer in a router, or can be discarded by a host or router after having some of its bits corrupted. For many applications---such as electronic mail, file transfer, remote host access, Web document transfers, and financial applications---data loss can have devastating consequences (in the latter case, for either the bank or the customer!). Thus, to support these applications, something has to be done to guarantee that the data sent by one end of the application is delivered correctly and completely to the other end of the application. If a protocol provides such a guaranteed data delivery service, it is said to provide reliable data transfer. One important service that a transport-layer protocol can potentially provide to an application is process-to-process reliable data transfer. When a transport protocol provides this service, the sending process can just pass its data into the socket and know with complete confidence that the data will arrive without errors at the receiving process. When a transport-layer protocol doesn't provide reliable data transfer, some of the data sent by the sending process may never arrive at the receiving process. This may be acceptable for loss-tolerant applications, most notably multimedia applications such as conversational audio/video that can tolerate some amount of data loss. In these multimedia applications, lost data might result in a small glitch in the audio/video---not a crucial impairment.
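As we'll see in Section 2.1.4, an Internet application opts for reliable data transfer (or forgoes it) simply through the transport protocol it selects when creating its socket; a two-line sketch in Python:

```python
from socket import socket, AF_INET, SOCK_STREAM, SOCK_DGRAM

tcp_socket = socket(AF_INET, SOCK_STREAM)  # TCP: reliable data transfer
udp_socket = socket(AF_INET, SOCK_DGRAM)   # UDP: no delivery guarantee
```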
Throughput

In Chapter 1 we introduced the concept of available throughput, which, in the context of a communication session between two processes along a network path, is the rate at which the sending process can deliver bits to the receiving process. Because other sessions will be sharing the bandwidth along the network path, and because these other sessions will be coming and going, the available throughput can fluctuate with time. These observations lead to another natural service that a transport-layer protocol could provide, namely, guaranteed available throughput at some specified rate. With such a service, the application could request a guaranteed throughput of r bits/sec, and the transport protocol would then ensure that the available throughput is always at least r bits/sec. Such a guaranteed throughput service would appeal to many applications. For example, if an Internet telephony application encodes voice at 32 kbps, it needs to send data into the network and have data delivered to the receiving application at this rate. If the transport protocol cannot provide this throughput, the application would need to encode at a lower rate (and receive enough throughput to sustain this lower coding rate) or may have to give up, since receiving, say, half of the needed throughput is of little or no use to this Internet telephony application. Applications that have throughput requirements are said to be bandwidth-sensitive applications. Many current multimedia applications are bandwidth sensitive, although some multimedia applications may use adaptive coding techniques to encode digitized voice or video at a rate that matches the currently available throughput.

While bandwidth-sensitive applications have specific throughput requirements, elastic applications can make use of as much, or as little, throughput as happens to be available. Electronic mail, file transfer, and Web transfers are all elastic applications. Of course, the more throughput, the better. There's an adage that says that one cannot be too rich, too thin, or have too much throughput!

Timing

A transport-layer protocol can also provide timing guarantees. As with throughput guarantees, timing guarantees can come in many shapes and forms. An example guarantee might be that every bit that the sender pumps into the socket arrives at the receiver's socket no more than 100 msec later. Such a service would be appealing to interactive real-time applications, such as Internet telephony, virtual environments, teleconferencing, and multiplayer games, all of which require tight timing constraints on data delivery in order to be effective. (See Chapter 9, \[Gauthier 1999; Ramjee 1994\].) Long delays in Internet telephony, for example, tend to result in unnatural pauses in the conversation; in a multiplayer game or virtual interactive environment, a long delay between taking an action and seeing the response from the environment (for example, from another player at the end of an end-to-end connection) makes the application feel less realistic. For non-real-time applications, lower delay is always preferable to higher delay, but no tight constraint is placed on the end-to-end delays.

Security

Finally, a transport protocol can provide an application with one or more security services.
For example, in the sending host, a transport protocol can encrypt all data transmitted by the sending process, and in the receiving host, the transport-layer protocol can decrypt the data before delivering the data to the receiving process. Such a service would provide confidentiality between the two processes, even if the data is somehow observed between sending and receiving processes. A transport protocol can also provide other security services in addition to confidentiality, including data integrity and end-point authentication, topics that we'll cover in detail in Chapter 8.

2.1.4 Transport Services Provided by the Internet

Up until this point, we have been considering transport services that a computer network could provide in general. Let's now get more specific and examine the type of transport services provided by the Internet. The Internet (and, more generally, TCP/IP networks) makes two transport protocols available to applications, UDP and TCP. When you (as an application developer) create a new network application for the Internet, one of the first decisions you have to make is whether to use UDP or TCP. Each of these protocols offers a different set of services to the invoking applications. Figure 2.4 shows the service requirements for some selected applications.

Figure 2.4 Requirements of selected network applications

TCP Services

The TCP service model includes a connection-oriented service and a reliable data transfer service. When an application invokes TCP as its transport protocol, the application receives both of these services from TCP.

Connection-oriented service. TCP has the client and server exchange transport-layer control information with each other before the application-level messages begin to flow. This so-called handshaking procedure alerts the client and server, allowing them to prepare for an onslaught of packets. After the handshaking phase, a TCP connection is said to exist between the sockets of the two processes. The connection is a full-duplex connection in that the two processes can send messages to each other over the connection at the same time. When the application finishes sending messages, it must tear down the connection. In Chapter 3 we'll discuss connection-oriented service in detail and examine how it is implemented.

Reliable data transfer service. The communicating processes can rely on TCP to deliver all data sent without error and in the proper order. When one side of the application passes a stream of bytes into a socket, it can count on TCP to deliver the same stream of bytes to the receiving socket, with no missing or duplicate bytes.

TCP also includes a congestion-control mechanism, a service for the general welfare of the Internet rather than for the direct benefit of the communicating processes. The TCP congestion-control mechanism throttles a sending process (client or server) when the network is congested between sender and receiver.

FOCUS ON SECURITY

SECURING TCP

Neither TCP nor UDP provides any encryption---the data that the sending process passes into its socket is the same data that travels over the network to the destination process. So, for example, if the sending process sends a password in cleartext (i.e., unencrypted) into its socket, the cleartext password will travel over all the links between sender and receiver, potentially getting sniffed and discovered at any of the intervening links.
Because privacy and other security issues have become critical for many applications, the Internet community has developed an enhancement for TCP, called Secure Sockets Layer (SSL). TCP-enhanced-with-SSL not only does everything that traditional TCP does but also provides critical process-to-process security services, including encryption, data integrity, and end-point authentication. We emphasize that SSL is not a third Internet transport protocol, on the same level as TCP and UDP, but instead is an enhancement of TCP, with the enhancements being implemented in the application layer. In particular, if an application wants to use the services of SSL, it needs to include SSL code (existing, highly optimized libraries and classes) in both the client and server sides of the application. SSL has its own socket API that is similar to the traditional TCP socket API. When an application uses SSL, the sending process passes cleartext data to the SSL socket; SSL in the sending host then encrypts the data and passes the encrypted data to the TCP socket. The encrypted data travels over the Internet to the TCP socket in the receiving process. The receiving socket passes the encrypted data to SSL, which decrypts the data. Finally, SSL passes the cleartext data through its SSL socket to the receiving process. We'll cover SSL in some detail in Chapter 8.

As we will see in Chapter 3, TCP congestion control also attempts to limit each TCP connection to its fair share of network bandwidth.

UDP Services

UDP is a no-frills, lightweight transport protocol, providing minimal services. UDP is connectionless, so there is no handshaking before the two processes start to communicate. UDP provides an unreliable data transfer service---that is, when a process sends a message into a UDP socket, UDP provides no guarantee that the message will ever reach the receiving process. Furthermore, messages that do arrive at the receiving process may arrive out of order. UDP does not include a congestion-control mechanism, so the sending side of UDP can pump data into the layer below (the network layer) at any rate it pleases. (Note, however, that the actual end-to-end throughput may be less than this rate due to the limited transmission capacity of intervening links or due to congestion.)

Services Not Provided by Internet Transport Protocols

We have organized transport protocol services along four dimensions: reliable data transfer, throughput, timing, and security. Which of these services are provided by TCP and UDP? We have already noted that TCP provides reliable end-to-end data transfer. And we also know that TCP can be easily enhanced at the application layer with SSL to provide security services. But in our brief description of TCP and UDP, conspicuously missing was any mention of throughput or timing guarantees---services not provided by today's Internet transport protocols. Does this mean that time-sensitive applications such as Internet telephony cannot run in today's Internet? The answer is clearly no---the Internet has been hosting time-sensitive applications for many years. These applications often work fairly well because they have been designed to cope, to the greatest extent possible, with this lack of guarantee. We'll investigate several of these design tricks in Chapter 9. Nevertheless, clever design has its limitations when delay is excessive, or the end-to-end throughput is limited.
In summary, today's Internet can often provide satisfactory service to time-sensitive applications, but it cannot provide any timing or throughput guarantees.

Figure 2.5 indicates the transport protocols used by some popular Internet applications. We see that e-mail, remote terminal access, the Web, and file transfer all use TCP. These applications have chosen TCP primarily because TCP provides reliable data transfer, guaranteeing that all data will eventually get to its destination. Because Internet telephony applications (such as Skype) can often tolerate some loss but require a minimal rate to be effective, developers of Internet telephony applications usually prefer to run their applications over UDP, thereby circumventing TCP's congestion control mechanism and packet overheads. But because many firewalls are configured to block (most types of) UDP traffic, Internet telephony applications often are designed to use TCP as a backup if UDP communication fails.

Figure 2.5 Popular Internet applications, their application-layer protocols, and their underlying transport protocols

2.1.5 Application-Layer Protocols

We have just learned that network processes communicate with each other by sending messages into sockets. But how are these messages structured? What are the meanings of the various fields in the messages? When do the processes send the messages? These questions bring us into the realm of application-layer protocols. An application-layer protocol defines how an application's processes, running on different end systems, pass messages to each other. In particular, an application-layer protocol defines:

- The types of messages exchanged, for example, request messages and response messages
- The syntax of the various message types, such as the fields in the message and how the fields are delineated
- The semantics of the fields, that is, the meaning of the information in the fields
- Rules for determining when and how a process sends messages and responds to messages

Some application-layer protocols are specified in RFCs and are therefore in the public domain. For example, the Web's application-layer protocol, HTTP (the HyperText Transfer Protocol \[RFC 2616\]), is available as an RFC. If a browser developer follows the rules of the HTTP RFC, the browser will be able to retrieve Web pages from any Web server that has also followed the rules of the HTTP RFC. Many other application-layer protocols are proprietary and intentionally not available in the public domain. For example, Skype uses proprietary application-layer protocols.

It is important to distinguish between network applications and application-layer protocols. An application-layer protocol is only one piece of a network application (albeit, a very important piece of the application from our point of view!). Let's look at a couple of examples. The Web is a client-server application that allows users to obtain documents from Web servers on demand. The Web application consists of many components, including a standard for document formats (that is, HTML), Web browsers (for example, Firefox and Microsoft Internet Explorer), Web servers (for example, Apache and Microsoft servers), and an application-layer protocol. The Web's application-layer protocol, HTTP, defines the format and sequence of messages exchanged between browser and Web server. Thus, HTTP is only one piece (albeit, an important piece) of the Web application.
As another example, an Internet e-mail application also has many components, including mail servers that house user mailboxes; mail clients (such as Microsoft Outlook) that allow users to read and create messages; a standard for defining the structure of an e-mail message; and application-layer protocols that define how messages are passed between servers, how messages are passed between servers and mail clients, and how the contents of message headers are to be interpreted. The principal application-layer protocol for electronic mail is SMTP (Simple Mail Transfer Protocol) \[RFC 5321\]. Thus, e-mail's principal application-layer protocol, SMTP, is only one piece (albeit an important piece) of the e-mail application.

2.1.6 Network Applications Covered in This Book

New public domain and proprietary Internet applications are being developed every day. Rather than covering a large number of Internet applications in an encyclopedic manner, we have chosen to focus on a small number of applications that are both pervasive and important. In this chapter we discuss five important applications: the Web, electronic mail, directory service, video streaming, and P2P applications. We first discuss the Web, not only because it is an enormously popular application, but also because its application-layer protocol, HTTP, is straightforward and easy to understand. We then discuss electronic mail, the Internet's first killer application. E-mail is more complex than the Web in the sense that it makes use of not one but several application-layer protocols. After e-mail, we cover DNS, which provides a directory service for the Internet. Most users do not interact with DNS directly; instead, users invoke DNS indirectly through other applications (including the Web, file transfer, and electronic mail). DNS illustrates nicely how a piece of core network functionality (network-name to network-address translation) can be implemented at the application layer in the Internet. We then discuss P2P file-sharing applications, and complete our application study by discussing video streaming on demand, including distributing stored video over content distribution networks. In Chapter 9, we'll cover multimedia applications in more depth, including voice over IP and video conferencing.

2.2 The Web and HTTP

Until the early 1990s the Internet was used primarily by researchers, academics, and university students to log in to remote hosts, to transfer files from local hosts to remote hosts and vice versa, to receive and send news, and to receive and send electronic mail. Although these applications were (and continue to be) extremely useful, the Internet was essentially unknown outside of the academic and research communities. Then, in the early 1990s, a major new application arrived on the scene---the World Wide Web \[Berners-Lee 1994\]. The Web was the first Internet application that caught the general public's eye. It dramatically changed, and continues to change, how people interact inside and outside their work environments. It elevated the Internet from just one of many data networks to essentially the one and only data network.

Perhaps what appeals the most to users is that the Web operates on demand. Users receive what they want, when they want it. This is unlike traditional broadcast radio and television, which force users to tune in when the content provider makes the content available.
In +addition to being available on demand, the Web has many other wonderful +features that people love and cherish. It is enormously easy for any +individual to make information available over the Web---everyone can +become a publisher at extremely low cost. Hyperlinks and search engines +help us navigate through an ocean of information. Photos and videos +stimulate our senses. Forms, JavaScript, Java applets, and many other +devices enable us to interact with pages and sites. And the Web and its +protocols serve as a platform for YouTube, Web-based e-mail (such as +Gmail), and most mobile Internet applications, including Instagram and +Google Maps. + +2.2.1 Overview of HTTP The HyperText Transfer Protocol (HTTP), the Web's +application-layer protocol, is at the heart of the Web. It is defined in +\[RFC 1945\] and \[RFC 2616\]. HTTP is implemented in two programs: a +client program and a server program. The client program and server +program, executing on different end systems, talk to each other by +exchanging HTTP messages. HTTP defines the structure of these messages +and how the client and server exchange the messages. Before explaining +HTTP in detail, we should review some Web terminology. A Web page (also +called a document) consists of objects. An object is simply a +file---such as an HTML file, a JPEG image, a Java applet, or a video +clip---that is addressable by a single URL. Most Web pages consist of a +base HTML file and several referenced objects. For example, if a Web +page + +contains HTML text and five JPEG images, then the Web page has six +objects: the base HTML file plus the five images. The base HTML file +references the other objects in the page with the objects' URLs. Each +URL has two components: the hostname of the server that houses the +object and the object's path name. For example, the URL + +http://www.someSchool.edu/someDepartment/picture.gif + +has www.someSchool.edu for a hostname and /someDepartment/picture.gif +for a path name. Because Web browsers (such as Internet Explorer and +Firefox) implement the client side of HTTP, in the context of the Web, +we will use the words browser and client interchangeably. Web servers, +which implement the server side of HTTP, house Web objects, each +addressable by a URL. Popular Web servers include Apache and Microsoft +Internet Information Server. HTTP defines how Web clients request Web +pages from Web servers and how servers transfer Web pages to clients. We +discuss the interaction between client and server in detail later, but +the general idea is illustrated in Figure 2.6. When a user requests a +Web page (for example, clicks on a hyperlink), the browser sends HTTP +request messages for the objects in the page to the server. The server +receives the requests and responds with HTTP response messages that +contain the objects. HTTP uses TCP as its underlying transport protocol +(rather than running on top of UDP). The HTTP client first initiates a +TCP connection with the server. Once the connection is established, the +browser and the server processes access TCP through their socket +interfaces. As described in Section 2.1, on the client side the socket +interface is the door between the client process and the TCP connection; +on the server side it is the door between the server process and the TCP +connection. The client sends HTTP request messages into its socket +interface and receives HTTP response messages from its socket interface. 
Similarly, the HTTP server receives request messages from its socket interface and sends response messages into its socket interface.

Figure 2.6 HTTP request-response behavior

Once the client sends a message into its socket interface, the message is out of the client's hands and is "in the hands" of TCP. Recall from Section 2.1 that TCP provides a reliable data transfer service to HTTP. This implies that each HTTP request message sent by a client process eventually arrives intact at the server; similarly, each HTTP response message sent by the server process eventually arrives intact at the client. Here we see one of the great advantages of a layered architecture---HTTP need not worry about lost data or the details of how TCP recovers from loss or reordering of data within the network. That is the job of TCP and the protocols in the lower layers of the protocol stack.

It is important to note that the server sends requested files to clients without storing any state information about the client. If a particular client asks for the same object twice in a period of a few seconds, the server does not respond by saying that it just served the object to the client; instead, the server resends the object, as it has completely forgotten what it did earlier. Because an HTTP server maintains no information about the clients, HTTP is said to be a stateless protocol. We also remark that the Web uses the client-server application architecture, as described in Section 2.1. A Web server is always on, with a fixed IP address, and it services requests from potentially millions of different browsers.

2.2.2 Non-Persistent and Persistent Connections

In many Internet applications, the client and server communicate for an extended period of time, with the client making a series of requests and the server responding to each of the requests. Depending on the application and on how the application is being used, the series of requests may be made back-to-back, periodically at regular intervals, or intermittently. When this client-server interaction is taking place over TCP, the application developer needs to make an important decision---should each request/response pair be sent over a separate TCP connection, or should all of the requests and their corresponding responses be sent over the same TCP connection? In the former approach, the application is said to use non-persistent connections; and in the latter approach, persistent connections. To gain a deep understanding of this design issue, let's examine the advantages and disadvantages of persistent connections in the context of a specific application, namely, HTTP, which can use both non-persistent connections and persistent connections. Although HTTP uses persistent connections in its default mode, HTTP clients and servers can be configured to use non-persistent connections instead.

HTTP with Non-Persistent Connections

Let's walk through the steps of transferring a Web page from server to client for the case of non-persistent connections. Let's suppose the page consists of a base HTML file and 10 JPEG images, and that all 11 of these objects reside on the same server. Further suppose the URL for the base HTML file is

http://www.someSchool.edu/someDepartment/home.index

Here is what happens:

1. The HTTP client process initiates a TCP connection to the server www.someSchool.edu on port number 80, which is the default port number for HTTP.
Associated with the TCP connection, there will be a socket at the client and a socket at the server.

2. The HTTP client sends an HTTP request message to the server via its socket. The request message includes the path name /someDepartment/home.index. (We will discuss HTTP messages in some detail below.)

3. The HTTP server process receives the request message via its socket, retrieves the object /someDepartment/home.index from its storage (RAM or disk), encapsulates the object in an HTTP response message, and sends the response message to the client via its socket.

4. The HTTP server process tells TCP to close the TCP connection. (But TCP doesn't actually terminate the connection until it knows for sure that the client has received the response message intact.)

5. The HTTP client receives the response message. The TCP connection terminates. The message indicates that the encapsulated object is an HTML file. The client extracts the file from the response message, examines the HTML file, and finds references to the 10 JPEG objects.

6. The first four steps are then repeated for each of the referenced JPEG objects.

As the browser receives the Web page, it displays the page to the user. Two different browsers may interpret (that is, display to the user) a Web page in somewhat different ways. HTTP has nothing to do with how a Web page is interpreted by a client. The HTTP specifications (\[RFC 1945\] and \[RFC 2616\]) define only the communication protocol between the client HTTP program and the server HTTP program.

The steps above illustrate the use of non-persistent connections, where each TCP connection is closed after the server sends the object---the connection does not persist for other objects. Note that each TCP connection transports exactly one request message and one response message. Thus, in this example, when a user requests the Web page, 11 TCP connections are generated.

In the steps described above, we were intentionally vague about whether the client obtains the 10 JPEGs over 10 serial TCP connections, or whether some of the JPEGs are obtained over parallel TCP connections. Indeed, users can configure modern browsers to control the degree of parallelism. In their default modes, most browsers open 5 to 10 parallel TCP connections, and each of these connections handles one request-response transaction. If the user prefers, the maximum number of parallel connections can be set to one, in which case the 10 connections are established serially. As we'll see in the next chapter, the use of parallel connections shortens the response time.
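The sketch below, our own illustration in Python rather than code from the book, mimics a single such request-response transaction over a non-persistent connection: open a TCP connection, send one request carrying the Connection: close header, and read the one response until the server closes the connection. The hostname is the book's illustrative example.

```python
from socket import socket, AF_INET, SOCK_STREAM

# One non-persistent HTTP transaction: one TCP connection,
# one request message, one response message.
sock = socket(AF_INET, SOCK_STREAM)
sock.connect(('www.someSchool.edu', 80))   # illustrative hostname

request = ('GET /someDepartment/home.index HTTP/1.1\r\n'
           'Host: www.someSchool.edu\r\n'
           'Connection: close\r\n'
           '\r\n')
sock.send(request.encode())

response = b''
while True:
    chunk = sock.recv(4096)   # the server closes the connection after the
    if not chunk:             # object is sent, which terminates this loop
        break
    response += chunk
sock.close()                  # fetching another object needs a new connection
```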
Before continuing, let's do a back-of-the-envelope calculation to estimate the amount of time that elapses from when a client requests the base HTML file until the entire file is received by the client. To this end, we define the round-trip time (RTT), which is the time it takes for a small packet to travel from client to server and then back to the client. The RTT includes packet-propagation delays, packet-queuing delays in intermediate routers and switches, and packet-processing delays. (These delays were discussed in Section 1.4.) Now consider what happens when a user clicks on a hyperlink. As shown in Figure 2.7, this causes the browser to initiate a TCP connection between the browser and the Web server; this involves a "three-way handshake"---the client sends a small TCP segment to the server, the server acknowledges and responds with a small TCP segment, and, finally, the client acknowledges back to the server. The first two parts of the three-way handshake take one RTT. After completing the first two parts of the handshake, the client sends the HTTP request message combined with the third part of the three-way handshake (the acknowledgment) into the TCP connection. Once the request message arrives at the server, the server sends the HTML file into the TCP connection. This HTTP request/response eats up another RTT. Thus, roughly, the total response time is two RTTs plus the transmission time at the server of the HTML file.

Figure 2.7 Back-of-the-envelope calculation for the time needed to request and receive an HTML file

HTTP with Persistent Connections

Non-persistent connections have some shortcomings. First, a brand-new connection must be established and maintained for each requested object. For each of these connections, TCP buffers must be allocated and TCP variables must be kept in both the client and server. This can place a significant burden on the Web server, which may be serving requests from hundreds of different clients simultaneously. Second, as we just described, each object suffers a delivery delay of two RTTs---one RTT to establish the TCP connection and one RTT to request and receive an object.

With HTTP 1.1 persistent connections, the server leaves the TCP connection open after sending a response. Subsequent requests and responses between the same client and server can be sent over the same connection. In particular, an entire Web page (in the example above, the base HTML file and the 10 images) can be sent over a single persistent TCP connection. Moreover, multiple Web pages residing on the same server can be sent from the server to the same client over a single persistent TCP connection. These requests for objects can be made back-to-back, without waiting for replies to pending requests (pipelining). Typically, the HTTP server closes a connection when it isn't used for a certain time (a configurable timeout interval). When the server receives the back-to-back requests, it sends the objects back-to-back. The default mode of HTTP uses persistent connections with pipelining. Most recently, HTTP/2 \[RFC 7540\] builds on HTTP 1.1 by allowing multiple requests and replies to be interleaved in the same connection, and by adding a mechanism for prioritizing HTTP message requests and replies within this connection. We'll quantitatively compare the performance of non-persistent and persistent connections in the homework problems of Chapters 2 and 3. You are also encouraged to see \[Heidemann 1997; Nielsen 1997; RFC 7540\].

2.2.3 HTTP Message Format

The HTTP specifications \[RFC 1945; RFC 2616; RFC 7540\] include the definitions of the HTTP message formats. There are two types of HTTP messages, request messages and response messages, both of which are discussed below.

HTTP Request Message

Below we provide a typical HTTP request message:

```
GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: fr
```

We can learn a lot by taking a close look at this simple request message.
2.2.3 HTTP Message Format

The HTTP specifications \[RFC 1945; RFC 2616; RFC 7540\] include the definitions of the HTTP message formats. There are two types of HTTP messages, request messages and response messages, both of which are discussed below.

HTTP Request Message

Below we provide a typical HTTP request message:

GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: fr

We can learn a lot by taking a close look at this simple request message. First of all, we see that the message is written in ordinary ASCII text, so that your ordinary computer-literate human being can read it. Second, we see that the message consists of five lines, each followed by a carriage return and a line feed. The last line is followed by an additional carriage return and line feed. Although this particular request message has five lines, a request message can have many more lines or as few as one line. The first line of an HTTP request message is called the request line; the subsequent lines are called the header lines. The request line has three fields: the method field, the URL field, and the HTTP version field. The method field can take on several different values, including GET, POST, HEAD, PUT, and DELETE. The great majority of HTTP request messages use the GET method. The GET method is used when the browser requests an object, with the requested object identified in the URL field. In this example, the browser is requesting the object /somedir/page.html. The version is self-explanatory; in this example, the browser implements version HTTP/1.1.

Now let's look at the header lines in the example. The header line Host: www.someschool.edu specifies the host on which the object resides. You might think that this header line is unnecessary, as there is already a TCP connection in place to the host. But, as we'll see in Section 2.2.5, the information provided by the host header line is required by Web proxy caches. By including the Connection: close header line, the browser is telling the server that it doesn't want to bother with persistent connections; it wants the server to close the connection after sending the requested object. The User-agent: header line specifies the user agent, that is, the browser type that is making the request to the server. Here the user agent is Mozilla/5.0, a Firefox browser. This header line is useful because the server can actually send different versions of the same object to different types of user agents. (Each of the versions is addressed by the same URL.) Finally, the Accept-language: header indicates that the user prefers to receive a French version of the object, if such an object exists on the server; otherwise, the server should send its default version. The Accept-language: header is just one of many content negotiation headers available in HTTP.

Having looked at an example, let's now look at the general format of a request message, as shown in Figure 2.8. We see that the general format closely follows our earlier example. You may have noticed, however, that after the header lines (and the additional carriage return and line feed) there is an "entity body." The entity body is empty with the GET method, but is used with the POST method. An HTTP client often uses the POST method when the user fills out a form---for example, when a user provides search words to a search engine. With a POST message, the user is still requesting a Web page from the server, but the specific contents of the Web page depend on what the user entered into the form fields. If the value of the method field is POST, then the entity body contains what the user entered into the form fields.

Figure 2.8 General format of an HTTP request message
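The request format above is easy to generate directly. The following Python sketch builds the example request character for character, including the carriage return and line feed that end each line and the extra blank line that ends the header section, and sends it into a TCP socket. The hostname is the fictitious server used in this example, so substitute a real Web server to actually run it:

    import socket

    # Each header line ends with carriage return + line feed; a final
    # blank line marks the end of the header section.
    request = (
        "GET /somedir/page.html HTTP/1.1\r\n"
        "Host: www.someschool.edu\r\n"
        "Connection: close\r\n"
        "User-agent: Mozilla/5.0\r\n"
        "Accept-language: fr\r\n"
        "\r\n"
    )

    sock = socket.create_connection(("www.someschool.edu", 80))
    sock.sendall(request.encode("ascii"))

    # Because of Connection: close, read until the server closes.
    response = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        response += chunk
    sock.close()
    print(response[:200])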
We would be remiss if we didn't mention that a request generated with a form does not necessarily use the POST method. Instead, HTML forms often use the GET method and include the inputted data (in the form fields) in the requested URL. For example, if a form uses the GET method, has two fields, and the inputs to the two fields are monkeys and bananas, then the URL will have the structure www.somesite.com/animalsearch?monkeys&bananas. In your day-to-day Web surfing, you have probably noticed extended URLs of this sort.

The HEAD method is similar to the GET method. When a server receives a request with the HEAD method, it responds with an HTTP message but it leaves out the requested object. Application developers often use the HEAD method for debugging. The PUT method is often used in conjunction with Web publishing tools. It allows a user to upload an object to a specific path (directory) on a specific Web server. The PUT method is also used by applications that need to upload objects to Web servers. The DELETE method allows a user, or an application, to delete an object on a Web server.

HTTP Response Message

Below we provide a typical HTTP response message. This response message could be the response to the example request message just discussed.

HTTP/1.1 200 OK
Connection: close
Date: Tue, 18 Aug 2015 15:44:04 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Tue, 18 Aug 2015 15:11:03 GMT
Content-Length: 6821
Content-Type: text/html

(data data data data data ...)

Let's take a careful look at this response message. It has three sections: an initial status line, six header lines, and then the entity body. The entity body is the meat of the message---it contains the requested object itself (represented by data data data data data ...). The status line has three fields: the protocol version field, a status code, and a corresponding status message. In this example, the status line indicates that the server is using HTTP/1.1 and that everything is OK (that is, the server has found, and is sending, the requested object).

Now let's look at the header lines. The server uses the Connection: close header line to tell the client that it is going to close the TCP connection after sending the message. The Date: header line indicates the time and date when the HTTP response was created and sent by the server. Note that this is not the time when the object was created or last modified; it is the time when the server retrieves the object from its file system, inserts the object into the response message, and sends the response message. The Server: header line indicates that the message was generated by an Apache Web server; it is analogous to the User-agent: header line in the HTTP request message. The Last-Modified: header line indicates the time and date when the object was created or last modified. The Last-Modified: header, which we will soon cover in more detail, is critical for object caching, both in the local client and in network cache servers (also known as proxy servers). The Content-Length: header line indicates the number of bytes in the object being sent. The Content-Type: header line indicates that the object in the entity body is HTML text. (The object type is officially indicated by the Content-Type: header and not by the file extension.)

Having looked at an example, let's now examine the general format of a response message, which is shown in Figure 2.9. This general format of the response message matches the previous example of a response message.

Figure 2.9 General format of an HTTP response message
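For completeness, here is a sketch of the parsing direction: splitting a raw response into the status line, the header lines, and the entity body. This is a minimal illustration under the assumption that the entire response is already in memory, not a robust HTTP parser:

    def parse_response(raw: bytes):
        """Split an HTTP response into status line, headers, and body."""
        head, _, body = raw.partition(b"\r\n\r\n")
        lines = head.decode("iso-8859-1").split("\r\n")
        version, code, phrase = lines[0].split(" ", 2)  # e.g., HTTP/1.1 200 OK
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        return version, int(code), phrase, headers, body

    raw = b"HTTP/1.1 200 OK\r\nContent-Length: 4\r\n\r\ndata"
    version, code, phrase, headers, body = parse_response(raw)
    print(code, phrase, headers["content-length"], body)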
Let's say a few additional words about status codes and their phrases. The status code and associated phrase indicate the result of the request. Some common status codes and associated phrases include:

200 OK: Request succeeded and the information is returned in the response.

301 Moved Permanently: Requested object has been permanently moved; the new URL is specified in the Location: header of the response message. The client software will automatically retrieve the new URL.

400 Bad Request: This is a generic error code indicating that the request could not be understood by the server.

404 Not Found: The requested document does not exist on this server.

505 HTTP Version Not Supported: The requested HTTP protocol version is not supported by the server.

How would you like to see a real HTTP response message? This is highly recommended and very easy to do! First, Telnet into your favorite Web server. Then type in a one-line request message for some object that is housed on the server. For example, if you have access to a command prompt, type:

telnet gaia.cs.umass.edu 80
GET /kurose_ross/interactive/index.php HTTP/1.1
Host: gaia.cs.umass.edu

(Press the carriage return twice after typing the last line.) This opens a TCP connection to port 80 of the host gaia.cs.umass.edu and then sends the HTTP request message. You should see a response message that includes the base HTML file for the interactive homework problems for this textbook. If you'd rather just see the HTTP message lines and not receive the object itself, replace GET with HEAD.

In this section we discussed a number of header lines that can be used within HTTP request and response messages. The HTTP specification defines many, many more header lines that can be inserted by browsers, Web servers, and network cache servers. We have covered only a small number of the totality of header lines. We'll cover a few more below and another small number when we discuss network Web caching in Section 2.2.5. A highly readable and comprehensive discussion of the HTTP protocol, including its headers and status codes, is given in \[Krishnamurthy 2001\].

How does a browser decide which header lines to include in a request message? How does a Web server decide which header lines to include in a response message? A browser will generate header lines as a function of the browser type and version (for example, an HTTP/1.0 browser will not generate any 1.1 header lines), the user configuration of the browser (for example, preferred language), and whether the browser currently has a cached, but possibly out-of-date, version of the object. Web servers behave similarly: There are different products, versions, and configurations, all of which influence which header lines are included in response messages.

2.2.4 User-Server Interaction: Cookies

We mentioned above that an HTTP server is stateless. This simplifies server design and has permitted engineers to develop high-performance Web servers that can handle thousands of simultaneous TCP connections. However, it is often desirable for a Web site to identify users, either because the server wishes to restrict user access or because it wants to serve content as a function of the user identity. For these purposes, HTTP uses cookies. Cookies, defined in \[RFC 6265\], allow sites to keep track of users. Most major commercial Web sites use cookies today.
As shown in Figure 2.10, cookie technology has four components: (1) a cookie header line in the HTTP response message; (2) a cookie header line in the HTTP request message; (3) a cookie file kept on the user's end system and managed by the user's browser; and (4) a back-end database at the Web site. Using Figure 2.10, let's walk through an example of how cookies work. Suppose Susan, who always accesses the Web using Internet Explorer from her home PC, contacts Amazon.com for the first time. Let us suppose that in the past she has already visited the eBay site. When the request comes into the Amazon Web server, the server creates a unique identification number and creates an entry in its back-end database that is indexed by the identification number. The Amazon Web server then responds to Susan's browser, including in the HTTP response a Set-cookie: header, which contains the identification number. For example, the header line might be:

Set-cookie: 1678

When Susan's browser receives the HTTP response message, it sees the Set-cookie: header. The browser then appends a line to the special cookie file that it manages. This line includes the hostname of the server and the identification number in the Set-cookie: header. Note that the cookie file already has an entry for eBay, since Susan has visited that site in the past. As Susan continues to browse the Amazon site, each time she requests a Web page, her browser consults her cookie file, extracts her identification number for this site, and puts a cookie header line that includes the identification number in the HTTP request. Specifically, each of her HTTP requests to the Amazon server includes the header line:

Cookie: 1678

Figure 2.10 Keeping user state with cookies

In this manner, the Amazon server is able to track Susan's activity at the Amazon site. Although the Amazon Web site does not necessarily know Susan's name, it knows exactly which pages user 1678 visited, in which order, and at what times! Amazon uses cookies to provide its shopping cart service---Amazon can maintain a list of all of Susan's intended purchases, so that she can pay for them collectively at the end of the session. If Susan returns to Amazon's site, say, one week later, her browser will continue to put the header line Cookie: 1678 in the request messages. Amazon also recommends products to Susan based on Web pages she has visited at Amazon in the past. If Susan also registers herself with Amazon---providing full name, e-mail address, postal address, and credit card information---Amazon can then include this information in its database, thereby associating Susan's name with her identification number (and all of the pages she has visited at the site in the past!). This is how Amazon and other e-commerce sites provide "one-click shopping"---when Susan chooses to purchase an item during a subsequent visit, she doesn't need to re-enter her name, credit card number, or address.

From this discussion we see that cookies can be used to identify a user. The first time a user visits a site, the user can provide a user identification (possibly his or her name). During the subsequent sessions, the browser passes a cookie header to the server, thereby identifying the user to the server. Cookies can thus be used to create a user session layer on top of stateless HTTP. For example, when a user logs in to a Web-based e-mail application (such as Hotmail), the browser sends cookie information to the server, permitting the server to identify the user throughout the user's session with the application.
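A toy sketch of the browser's side of this machinery, with a Python dictionary standing in for the cookie file; the hostname and identification number are taken from the example above:

    cookie_jar = {}  # hostname -> identification number (the "cookie file")

    def build_request(host, path):
        """Attach a Cookie: header when we already hold one for this host."""
        lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
        if host in cookie_jar:
            lines.append(f"Cookie: {cookie_jar[host]}")
        return "\r\n".join(lines) + "\r\n\r\n"

    def note_set_cookie(host, response_headers):
        """Record a Set-cookie: value, as a browser appends to its cookie file."""
        if "set-cookie" in response_headers:
            cookie_jar[host] = response_headers["set-cookie"]

    # First visit: no cookie is sent; the response carries Set-cookie: 1678.
    note_set_cookie("www.amazon.com", {"set-cookie": "1678"})
    # Every later request to this host now carries Cookie: 1678.
    print(build_request("www.amazon.com", "/"))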
Although cookies often simplify the Internet shopping experience for the user, they are controversial because they can also be considered as an invasion of privacy. As we just saw, using a combination of cookies and user-supplied account information, a Web site can learn a lot about a user and potentially sell this information to a third party. Cookie Central \[Cookie Central 2016\] includes extensive information on the cookie controversy.

2.2.5 Web Caching

A Web cache---also called a proxy server---is a network entity that satisfies HTTP requests on the behalf of an origin Web server. The Web cache has its own disk storage and keeps copies of recently requested objects in this storage. As shown in Figure 2.11, a user's browser can be configured so that all of the user's HTTP requests are first directed to the Web cache. Once a browser is configured, each browser request for an object is first directed to the Web cache. As an example, suppose a browser is requesting the object http://www.someschool.edu/campus.gif. Here is what happens:

1. The browser establishes a TCP connection to the Web cache and sends an HTTP request for the object to the Web cache.

2. The Web cache checks to see if it has a copy of the object stored locally. If it does, the Web cache returns the object within an HTTP response message to the client browser.

3. If the Web cache does not have the object, the Web cache opens a TCP connection to the origin server, that is, to www.someschool.edu. The Web cache then sends an HTTP request for the object into the cache-to-server TCP connection. After receiving this request, the origin server sends the object within an HTTP response to the Web cache.

4. When the Web cache receives the object, it stores a copy in its local storage and sends a copy, within an HTTP response message, to the client browser (over the existing TCP connection between the client browser and the Web cache).

Figure 2.11 Clients requesting objects through a Web cache

Note that a cache is both a server and a client at the same time. When it receives requests from and sends responses to a browser, it is a server. When it sends requests to and receives responses from an origin server, it is a client. Typically a Web cache is purchased and installed by an ISP. For example, a university might install a cache on its campus network and configure all of the campus browsers to point to the cache. Or a major residential ISP (such as Comcast) might install one or more caches in its network and preconfigure its shipped browsers to point to the installed caches.

Web caching has seen deployment in the Internet for two reasons. First, a Web cache can substantially reduce the response time for a client request, particularly if the bottleneck bandwidth between the client and the origin server is much less than the bottleneck bandwidth between the client and the cache. If there is a high-speed connection between the client and the cache, as there often is, and if the cache has the requested object, then the cache will be able to deliver the object rapidly to the client.
Second, as we will soon illustrate with an example, Web caches can substantially reduce traffic on an institution's access link to the Internet. By reducing traffic, the institution (for example, a company or a university) does not have to upgrade bandwidth as quickly, thereby reducing costs. Furthermore, Web caches can substantially reduce Web traffic in the Internet as a whole, thereby improving performance for all applications.

To gain a deeper understanding of the benefits of caches, let's consider an example in the context of Figure 2.12. This figure shows two networks---the institutional network and the rest of the public Internet. The institutional network is a high-speed LAN. A router in the institutional network and a router in the Internet are connected by a 15 Mbps link. The origin servers are attached to the Internet but are located all over the globe. Suppose that the average object size is 1 Mbits and that the average request rate from the institution's browsers to the origin servers is 15 requests per second. Suppose that the HTTP request messages are negligibly small and thus create no traffic in the networks or in the access link (from institutional router to Internet router). Also suppose that the amount of time it takes from when the router on the Internet side of the access link in Figure 2.12 forwards an HTTP request (within an IP datagram) until it receives the response (typically within many IP datagrams) is two seconds on average. Informally, we refer to this last delay as the "Internet delay."

Figure 2.12 Bottleneck between an institutional network and the Internet

The total response time---that is, the time from the browser's request of an object until its receipt of the object---is the sum of the LAN delay, the access delay (that is, the delay between the two routers), and the Internet delay. Let's now do a very crude calculation to estimate this delay. The traffic intensity on the LAN (see Section 1.4.2) is

(15 requests/sec)⋅(1 Mbits/request)/(100 Mbps)=0.15

whereas the traffic intensity on the access link (from the Internet router to the institutional router) is

(15 requests/sec)⋅(1 Mbits/request)/(15 Mbps)=1

A traffic intensity of 0.15 on a LAN typically results in, at most, tens of milliseconds of delay; hence, we can neglect the LAN delay. However, as discussed in Section 1.4.2, as the traffic intensity approaches 1 (as is the case of the access link in Figure 2.12), the delay on a link becomes very large and grows without bound. Thus, the average response time to satisfy requests is going to be on the order of minutes, if not more, which is unacceptable for the institution's users. Clearly something must be done.

One possible solution is to increase the access rate from 15 Mbps to, say, 100 Mbps. This will lower the traffic intensity on the access link to 0.15, which translates to negligible delays between the two routers. In this case, the total response time will roughly be two seconds, that is, the Internet delay. But this solution also means that the institution must upgrade its access link from 15 Mbps to 100 Mbps, a costly proposition. Now consider the alternative solution of not upgrading the access link but instead installing a Web cache in the institutional network. This solution is illustrated in Figure 2.13. Hit rates---the fraction of requests that are satisfied by a cache---typically range from 0.2 to 0.7 in practice.
For illustrative purposes, let's suppose that the cache provides a hit rate of 0.4 for this institution. Because the clients and the cache are connected to the same high-speed LAN, 40 percent of the requests will be satisfied almost immediately, say, within 10 milliseconds, by the cache. Nevertheless, the remaining 60 percent of the requests still need to be satisfied by the origin servers. But with only 60 percent of the requested objects passing through the access link, the traffic intensity on the access link is reduced from 1.0 to 0.6. Typically, a traffic intensity less than 0.8 corresponds to a small delay, say, tens of milliseconds, on a 15 Mbps link. This delay is negligible compared with the two-second Internet delay. Given these considerations, the average delay therefore is

0.4⋅(0.01 seconds)+0.6⋅(2.01 seconds)

which is just slightly greater than 1.2 seconds. Thus, this second solution provides an even lower response time than the first solution, and it doesn't require the institution to upgrade its link to the Internet. The institution does, of course, have to purchase and install a Web cache. But this cost is low---many caches use public-domain software that runs on inexpensive PCs.

Figure 2.13 Adding a cache to the institutional network

Through the use of Content Distribution Networks (CDNs), Web caches are increasingly playing an important role in the Internet. A CDN company installs many geographically distributed caches throughout the Internet, thereby localizing much of the traffic. There are shared CDNs (such as Akamai and Limelight) and dedicated CDNs (such as Google and Netflix). We will discuss CDNs in more detail in Section 2.6.
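The arithmetic of this example is easy to reproduce. The following sketch recomputes the access-link traffic intensity and the average delay, with the hit rate and the 0.01-second and 2.01-second delays taken from the discussion above:

    object_size = 1e6      # bits per object
    request_rate = 15      # requests per second
    access_rate = 15e6     # bits per second on the access link
    hit_rate = 0.4
    hit_delay = 0.01       # seconds for a request satisfied by the cache
    miss_delay = 2.01      # seconds: Internet delay plus the small access delay

    intensity = request_rate * object_size / access_rate   # 1.0 without a cache
    intensity_with_cache = (1 - hit_rate) * intensity      # 0.6 with the cache

    average_delay = hit_rate * hit_delay + (1 - hit_rate) * miss_delay
    print(intensity_with_cache, average_delay)   # 0.6 and about 1.21 seconds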
The Conditional GET

Although caching can reduce user-perceived response times, it introduces a new problem---the copy of an object residing in the cache may be stale. In other words, the object housed in the Web server may have been modified since the copy was cached at the client. Fortunately, HTTP has a mechanism that allows a cache to verify that its objects are up to date. This mechanism is called the conditional GET. An HTTP request message is a so-called conditional GET message if (1) the request message uses the GET method and (2) the request message includes an If-Modified-Since: header line.

To illustrate how the conditional GET operates, let's walk through an example. First, on the behalf of a requesting browser, a proxy cache sends a request message to a Web server:

GET /fruit/kiwi.gif HTTP/1.1
Host: www.exotiquecuisine.com

Second, the Web server sends a response message with the requested object to the cache:

HTTP/1.1 200 OK
Date: Sat, 3 Oct 2015 15:39:29
Server: Apache/1.3.0 (Unix)
Last-Modified: Wed, 9 Sep 2015 09:23:24
Content-Type: image/gif

(data data data data data ...)

The cache forwards the object to the requesting browser but also caches the object locally. Importantly, the cache also stores the last-modified date along with the object. Third, one week later, another browser requests the same object via the cache, and the object is still in the cache. Since this object may have been modified at the Web server in the past week, the cache performs an up-to-date check by issuing a conditional GET. Specifically, the cache sends:

GET /fruit/kiwi.gif HTTP/1.1
Host: www.exotiquecuisine.com
If-modified-since: Wed, 9 Sep 2015 09:23:24

Note that the value of the If-modified-since: header line is exactly equal to the value of the Last-Modified: header line that was sent by the server one week ago. This conditional GET is telling the server to send the object only if the object has been modified since the specified date. Suppose the object has not been modified since 9 Sep 2015 09:23:24. Then, fourth, the Web server sends a response message to the cache:

HTTP/1.1 304 Not Modified
Date: Sat, 10 Oct 2015 15:39:29
Server: Apache/1.3.0 (Unix)

(empty entity body)

We see that in response to the conditional GET, the Web server still sends a response message but does not include the requested object in the response message. Including the requested object would only waste bandwidth and increase user-perceived response time, particularly if the object is large. Note that this last response message has 304 Not Modified in the status line, which tells the cache that it can go ahead and forward its (the proxy cache's) cached copy of the object to the requesting browser.

This ends our discussion of HTTP, the first Internet protocol (an application-layer protocol) that we've studied in detail. We've seen the format of HTTP messages and the actions taken by the Web client and server as these messages are sent and received. We've also studied a bit of the Web's application infrastructure, including caches, cookies, and back-end databases, all of which are tied in some way to the HTTP protocol.
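A minimal sketch of the cache's side of this exchange, assuming we already hold the object and its Last-Modified value. The server name is the fictitious one from the example, so the final call is left commented out:

    import socket

    def conditional_get(host, path, last_modified):
        """Issue a conditional GET; report whether the cached copy is current."""
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"If-modified-since: {last_modified}\r\n"
            "Connection: close\r\n\r\n"
        )
        sock = socket.create_connection((host, 80))
        sock.sendall(request.encode("ascii"))
        reply = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
        sock.close()
        status_line = reply.split(b"\r\n", 1)[0].decode()
        if status_line.split()[1] == "304":
            return "not modified: forward the cached copy"
        return "modified: cache and forward the new copy"

    # print(conditional_get("www.exotiquecuisine.com", "/fruit/kiwi.gif",
    #                       "Wed, 9 Sep 2015 09:23:24"))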
2.3 Electronic Mail in the Internet

Electronic mail has been around since the beginning of the Internet. It was the most popular application when the Internet was in its infancy \[Segaller 1998\], and has become more elaborate and powerful over the years. It remains one of the Internet's most important and utilized applications. As with ordinary postal mail, e-mail is an asynchronous communication medium---people send and read messages when it is convenient for them, without having to coordinate with other people's schedules. In contrast with postal mail, electronic mail is fast, easy to distribute, and inexpensive. Modern e-mail has many powerful features, including messages with attachments, hyperlinks, HTML-formatted text, and embedded photos.

In this section, we examine the application-layer protocols that are at the heart of Internet e-mail. But before we jump into an in-depth discussion of these protocols, let's take a high-level view of the Internet mail system and its key components. Figure 2.14 presents a high-level view of the Internet mail system. We see from this diagram that it has three major components: user agents, mail servers, and the Simple Mail Transfer Protocol (SMTP). We now describe each of these components in the context of a sender, Alice, sending an e-mail message to a recipient, Bob. User agents allow users to read, reply to, forward, save, and compose messages. Microsoft Outlook and Apple Mail are examples of user agents for e-mail. When Alice is finished composing her message, her user agent sends the message to her mail server, where the message is placed in the mail server's outgoing message queue. When Bob wants to read a message, his user agent retrieves the message from his mailbox in his mail server.

Figure 2.14 A high-level view of the Internet e-mail system

Mail servers form the core of the e-mail infrastructure. Each recipient, such as Bob, has a mailbox located in one of the mail servers. Bob's mailbox manages and maintains the messages that have been sent to him. A typical message starts its journey in the sender's user agent, travels to the sender's mail server, and travels to the recipient's mail server, where it is deposited in the recipient's mailbox. When Bob wants to access the messages in his mailbox, the mail server containing his mailbox authenticates Bob (with usernames and passwords). Alice's mail server must also deal with failures in Bob's mail server. If Alice's server cannot deliver mail to Bob's server, Alice's server holds the message in a message queue and attempts to transfer the message later. Reattempts are often done every 30 minutes or so; if there is no success after several days, the server removes the message and notifies the sender (Alice) with an e-mail message.

SMTP is the principal application-layer protocol for Internet electronic mail. It uses the reliable data transfer service of TCP to transfer mail from the sender's mail server to the recipient's mail server. As with most application-layer protocols, SMTP has two sides: a client side, which executes on the sender's mail server, and a server side, which executes on the recipient's mail server. Both the client and server sides of SMTP run on every mail server. When a mail server sends mail to other mail servers, it acts as an SMTP client. When a mail server receives mail from other mail servers, it acts as an SMTP server.

2.3.1 SMTP

SMTP, defined in RFC 5321, is at the heart of Internet electronic mail. As mentioned above, SMTP transfers messages from senders' mail servers to the recipients' mail servers. SMTP is much older than HTTP. (The original SMTP RFC dates back to 1982, and SMTP was around long before that.) Although SMTP has numerous wonderful qualities, as evidenced by its ubiquity in the Internet, it is nevertheless a legacy technology that possesses certain archaic characteristics. For example, it restricts the body (not just the headers) of all mail messages to simple 7-bit ASCII. This restriction made sense in the early 1980s when transmission capacity was scarce and no one was e-mailing large attachments or large image, audio, or video files. But today, in the multimedia era, the 7-bit ASCII restriction is a bit of a pain---it requires binary multimedia data to be encoded to ASCII before being sent over SMTP; and it requires the corresponding ASCII message to be decoded back to binary after SMTP transport. Recall from Section 2.2 that HTTP does not require multimedia data to be ASCII encoded before transfer.

To illustrate the basic operation of SMTP, let's walk through a common scenario. Suppose Alice wants to send Bob a simple ASCII message.

1. Alice invokes her user agent for e-mail, provides Bob's e-mail address (for example, bob@someschool.edu), composes a message, and instructs the user agent to send the message.

2. Alice's user agent sends the message to her mail server, where it is placed in a message queue.

3. The client side of SMTP, running on Alice's mail server, sees the message in the message queue. It opens a TCP connection to an SMTP server, running on Bob's mail server.

4. After some initial SMTP handshaking, the SMTP client sends Alice's message into the TCP connection.

5. At Bob's mail server, the server side of SMTP receives the message. Bob's mail server then places the message in Bob's mailbox.
6. Bob invokes his user agent to read the message at his convenience.

The scenario is summarized in Figure 2.15. It is important to observe that SMTP does not normally use intermediate mail servers for sending mail, even when the two mail servers are located at opposite ends of the world. If Alice's server is in Hong Kong and Bob's server is in St. Louis, the TCP connection is a direct connection between the Hong Kong and St. Louis servers. In particular, if Bob's mail server is down, the message remains in Alice's mail server and waits for a new attempt---the message does not get placed in some intermediate mail server.

Figure 2.15 Alice sends a message to Bob

Let's now take a closer look at how SMTP transfers a message from a sending mail server to a receiving mail server. We will see that the SMTP protocol has many similarities with protocols that are used for face-to-face human interaction. First, the client SMTP (running on the sending mail server host) has TCP establish a connection to port 25 at the server SMTP (running on the receiving mail server host). If the server is down, the client tries again later. Once this connection is established, the server and client perform some application-layer handshaking---just as humans often introduce themselves before transferring information from one to another, SMTP clients and servers introduce themselves before transferring information. During this SMTP handshaking phase, the SMTP client indicates the e-mail address of the sender (the person who generated the message) and the e-mail address of the recipient. Once the SMTP client and server have introduced themselves to each other, the client sends the message. SMTP can count on the reliable data transfer service of TCP to get the message to the server without errors. The client then repeats this process over the same TCP connection if it has other messages to send to the server; otherwise, it instructs TCP to close the connection.

Let's next take a look at an example transcript of messages exchanged between an SMTP client (C) and an SMTP server (S). The hostname of the client is crepes.fr and the hostname of the server is hamburger.edu. The ASCII text lines prefaced with C: are exactly the lines the client sends into its TCP socket, and the ASCII text lines prefaced with S: are exactly the lines the server sends into its TCP socket. The following transcript begins as soon as the TCP connection is established.

S: 220 hamburger.edu
C: HELO crepes.fr
S: 250 Hello crepes.fr, pleased to meet you
C: MAIL FROM: <alice@crepes.fr>
S: 250 alice@crepes.fr ... Sender ok
C: RCPT TO: <bob@hamburger.edu>
S: 250 bob@hamburger.edu ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 hamburger.edu closing connection

In the example above, the client sends a message ("Do you like ketchup? How about pickles?") from mail server crepes.fr to mail server hamburger.edu. As part of the dialogue, the client issued five commands: HELO (an abbreviation for HELLO), MAIL FROM, RCPT TO, DATA, and QUIT. These commands are self-explanatory. The client also sends a line consisting of a single period, which indicates the end of the message to the server. (In ASCII jargon, each message ends with CRLF.CRLF, where CR and LF stand for carriage return and line feed, respectively.) The server issues replies to each command, with each reply having a reply code and some (optional) English-language explanation.
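This dialogue can also be reproduced programmatically. Below is a minimal Python sketch that plays the client's role over a raw TCP socket, sending the same five commands. The hostnames are the fictitious ones from the transcript, so running this verbatim requires substituting a real local mail server:

    import socket

    commands = [
        "HELO crepes.fr",
        "MAIL FROM: <alice@crepes.fr>",
        "RCPT TO: <bob@hamburger.edu>",
        "DATA",
        "Do you like ketchup?",   # message body line
        "How about pickles?",     # message body line
        ".",                      # lone period ends the message
        "QUIT",
    ]
    body_lines = {"Do you like ketchup?", "How about pickles?"}

    sock = socket.create_connection(("hamburger.edu", 25))
    print(sock.recv(1024).decode())           # 220 greeting
    for cmd in commands:
        sock.sendall((cmd + "\r\n").encode("ascii"))
        if cmd not in body_lines:             # body lines draw no reply
            print(sock.recv(1024).decode())   # 250 / 354 / 221 replies
    sock.close()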
We mention here that SMTP uses persistent connections: If the sending mail server has several messages to send to the same receiving mail server, it can send all of the messages over the same TCP connection. For each message, the client begins the process with a new MAIL FROM: crepes.fr, designates the end of message with an isolated period, and issues QUIT only after all messages have been sent.

It is highly recommended that you use Telnet to carry out a direct dialogue with an SMTP server. To do this, issue

telnet serverName 25

where serverName is the name of a local mail server. When you do this, you are simply establishing a TCP connection between your local host and the mail server. After typing this line, you should immediately receive the 220 reply from the server. Then issue the SMTP commands HELO, MAIL FROM, RCPT TO, DATA, CRLF.CRLF, and QUIT at the appropriate times. It is also highly recommended that you do Programming Assignment 3 at the end of this chapter. In that assignment, you'll build a simple user agent that implements the client side of SMTP. It will allow you to send an e-mail message to an arbitrary recipient via a local mail server.

2.3.2 Comparison with HTTP

Let's now briefly compare SMTP with HTTP. Both protocols are used to transfer files from one host to another: HTTP transfers files (also called objects) from a Web server to a Web client (typically a browser); SMTP transfers files (that is, e-mail messages) from one mail server to another mail server. When transferring the files, both persistent HTTP and SMTP use persistent connections. Thus, the two protocols have common characteristics. However, there are important differences. First, HTTP is mainly a pull protocol---someone loads information on a Web server and users use HTTP to pull the information from the server at their convenience. In particular, the TCP connection is initiated by the machine that wants to receive the file. On the other hand, SMTP is primarily a push protocol---the sending mail server pushes the file to the receiving mail server. In particular, the TCP connection is initiated by the machine that wants to send the file. A second difference, which we alluded to earlier, is that SMTP requires each message, including the body of each message, to be in 7-bit ASCII format. If the message contains characters that are not 7-bit ASCII (for example, French characters with accents) or contains binary data (such as an image file), then the message has to be encoded into 7-bit ASCII. HTTP data does not impose this restriction. A third important difference concerns how a document consisting of text and images (along with possibly other media types) is handled. As we learned in Section 2.2, HTTP encapsulates each object in its own HTTP response message. SMTP places all of the message's objects into one message.

2.3.3 Mail Message Formats

When Alice writes an ordinary snail-mail letter to Bob, she may include all kinds of peripheral header information at the top of the letter, such as Bob's address, her own return address, and the date. Similarly, when an e-mail message is sent from one person to another, a header containing peripheral information precedes the body of the message itself.
This peripheral information is contained in a series of header lines, which are defined in RFC 5322. The header lines and the body of the message are separated by a blank line (that is, by CRLF). RFC 5322 specifies the exact format for mail header lines as well as their semantic interpretations. As with HTTP, each header line contains readable text, consisting of a keyword followed by a colon followed by a value. Some of the keywords are required and others are optional. Every header must have a From: header line and a To: header line; a header may include a Subject: header line as well as other optional header lines. It is important to note that these header lines are different from the SMTP commands we studied in Section 2.3.1 (even though they contain some common words such as "from" and "to"). The commands in that section were part of the SMTP handshaking protocol; the header lines examined in this section are part of the mail message itself. A typical message header looks like this:

From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Searching for the meaning of life.

After the message header, a blank line follows; then the message body (in ASCII) follows. You should use Telnet to send a message to a mail server that contains some header lines, including the Subject: header line. To do this, issue telnet serverName 25, as discussed in Section 2.3.1.
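Python's standard library can construct such a message. In the sketch below, email.message.EmailMessage produces the header lines, the separating blank line, and the body; the commented-out smtplib call would hand the result to a (hypothetical) local mail server:

    from email.message import EmailMessage
    import smtplib

    msg = EmailMessage()
    msg["From"] = "alice@crepes.fr"
    msg["To"] = "bob@hamburger.edu"
    msg["Subject"] = "Searching for the meaning of life."
    msg.set_content("Do you like ketchup?\nHow about pickles?\n")

    # The blank line separating header from body is added automatically.
    print(msg.as_string())

    # Handing the message to a local mail server over SMTP:
    # with smtplib.SMTP("localhost") as server:
    #     server.send_message(msg)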
2.3.4 Mail Access Protocols

Once SMTP delivers the message from Alice's mail server to Bob's mail server, the message is placed in Bob's mailbox. Throughout this discussion we have tacitly assumed that Bob reads his mail by logging onto the server host and then executing a mail reader that runs on that host. Up until the early 1990s this was the standard way of doing things. But today, mail access uses a client-server architecture---the typical user reads e-mail with a client that executes on the user's end system, for example, on an office PC, a laptop, or a smartphone. By executing a mail client on a local PC, users enjoy a rich set of features, including the ability to view multimedia messages and attachments.

Given that Bob (the recipient) executes his user agent on his local PC, it is natural to consider placing a mail server on his local PC as well. With this approach, Alice's mail server would dialogue directly with Bob's PC. There is a problem with this approach, however. Recall that a mail server manages mailboxes and runs the client and server sides of SMTP. If Bob's mail server were to reside on his local PC, then Bob's PC would have to remain always on, and connected to the Internet, in order to receive new mail, which can arrive at any time. This is impractical for many Internet users. Instead, a typical user runs a user agent on the local PC but accesses its mailbox stored on an always-on shared mail server. This mail server is shared with other users and is typically maintained by the user's ISP (for example, university or company).

Now let's consider the path an e-mail message takes when it is sent from Alice to Bob. We just learned that at some point along the path the e-mail message needs to be deposited in Bob's mail server. This could be done simply by having Alice's user agent send the message directly to Bob's mail server. And this could be done with SMTP---indeed, SMTP has been designed for pushing e-mail from one host to another. However, typically the sender's user agent does not dialogue directly with the recipient's mail server. Instead, as shown in Figure 2.16, Alice's user agent uses SMTP to push the e-mail message into her mail server, then Alice's mail server uses SMTP (as an SMTP client) to relay the e-mail message to Bob's mail server. Why the two-step procedure? Primarily because without relaying through Alice's mail server, Alice's user agent doesn't have any recourse to an unreachable destination mail server. By having Alice first deposit the e-mail in her own mail server, Alice's mail server can repeatedly try to send the message to Bob's mail server, say every 30 minutes, until Bob's mail server becomes operational. (And if Alice's mail server is down, then she has the recourse of complaining to her system administrator!) The SMTP RFC defines how the SMTP commands can be used to relay a message across multiple SMTP servers.

Figure 2.16 E-mail protocols and their communicating entities

But there is still one missing piece to the puzzle! How does a recipient like Bob, running a user agent on his local PC, obtain his messages, which are sitting in a mail server within Bob's ISP? Note that Bob's user agent can't use SMTP to obtain the messages because obtaining the messages is a pull operation, whereas SMTP is a push protocol. The puzzle is completed by introducing a special mail access protocol that transfers messages from Bob's mail server to his local PC. There are currently a number of popular mail access protocols, including Post Office Protocol---Version 3 (POP3), Internet Mail Access Protocol (IMAP), and HTTP. Figure 2.16 provides a summary of the protocols that are used for Internet mail: SMTP is used to transfer mail from the sender's mail server to the recipient's mail server; SMTP is also used to transfer mail from the sender's user agent to the sender's mail server. A mail access protocol, such as POP3, is used to transfer mail from the recipient's mail server to the recipient's user agent.

POP3

POP3 is an extremely simple mail access protocol. It is defined in \[RFC 1939\], which is short and quite readable. Because the protocol is so simple, its functionality is rather limited. POP3 begins when the user agent (the client) opens a TCP connection to the mail server (the server) on port 110. With the TCP connection established, POP3 progresses through three phases: authorization, transaction, and update. During the first phase, authorization, the user agent sends a username and a password (in the clear) to authenticate the user. During the second phase, transaction, the user agent retrieves messages; also during this phase, the user agent can mark messages for deletion, remove deletion marks, and obtain mail statistics. The third phase, update, occurs after the client has issued the quit command, ending the POP3 session; at this time, the mail server deletes the messages that were marked for deletion. In a POP3 transaction, the user agent issues commands, and the server responds to each command with a reply. There are two possible responses: +OK (sometimes followed by server-to-client data), used by the server to indicate that the previous command was fine; and -ERR, used by the server to indicate that something was wrong with the previous command.
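Python's poplib module speaks this protocol directly. The sketch below runs a complete download-and-delete session, with the three phases marked in comments; the server name and credentials are placeholders:

    import poplib

    # Port 110 is poplib's default; name and credentials are placeholders.
    mailbox = poplib.POP3("mailServer")
    mailbox.user("bob")        # authorization phase
    mailbox.pass_("hungry")

    count, _ = mailbox.stat()  # transaction phase: message count
    for i in range(1, count + 1):
        response, lines, octets = mailbox.retr(i)   # retrieve message i
        print(b"\n".join(lines).decode("ascii", "replace"))
        mailbox.dele(i)        # mark for deletion (download-and-delete mode)

    mailbox.quit()             # update phase: marked messages are removed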
The authorization phase has two principal commands: user `<username>` and pass `<password>`. To illustrate these two commands, we suggest that you Telnet directly into a POP3 server, using port 110, and issue these commands. Suppose that mailServer is the name of your mail server. You will see something like:

telnet mailServer 110
+OK POP3 server ready
user bob
+OK
pass hungry
+OK user successfully logged on

If you misspell a command, the POP3 server will reply with an -ERR message.

Now let's take a look at the transaction phase. A user agent using POP3 can often be configured (by the user) to "download and delete" or to "download and keep." The sequence of commands issued by a POP3 user agent depends on which of these two modes the user agent is operating in. In the download-and-delete mode, the user agent will issue the list, retr, and dele commands. As an example, suppose the user has two messages in his or her mailbox. In the dialogue below, C: (standing for client) is the user agent and S: (standing for server) is the mail server. The transaction will look something like:

C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 1
C: retr 2
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 2
C: quit
S: +OK POP3 server signing off

The user agent first asks the mail server to list the size of each of the stored messages. The user agent then retrieves and deletes each message from the server. Note that after the authorization phase, the user agent employed only four commands: list, retr, dele, and quit. The syntax for these commands is defined in RFC 1939. After processing the quit command, the POP3 server enters the update phase and removes messages 1 and 2 from the mailbox.

A problem with this download-and-delete mode is that the recipient, Bob, may be nomadic and may want to access his mail messages from multiple machines, for example, his office PC, his home PC, and his portable computer. The download-and-delete mode partitions Bob's mail messages over these three machines; in particular, if Bob first reads a message on his office PC, he will not be able to reread the message from his portable at home later in the evening. In the download-and-keep mode, the user agent leaves the messages on the mail server after downloading them. In this case, Bob can reread messages from different machines; he can access a message from work and access it again later in the week from home. During a POP3 session between a user agent and the mail server, the POP3 server maintains some state information; in particular, it keeps track of which user messages have been marked deleted. However, the POP3 server does not carry state information across POP3 sessions. This lack of state information across sessions greatly simplifies the implementation of a POP3 server.

IMAP

With POP3 access, once Bob has downloaded his messages to the local machine, he can create mail folders and move the downloaded messages into the folders. Bob can then delete messages, move messages across folders, and search for messages (by sender name or subject). But this paradigm---namely, folders and messages in the local machine---poses a problem for the nomadic user, who would prefer to maintain a folder hierarchy on a remote server that can be accessed from any computer. This is not possible with POP3---the POP3 protocol does not provide any means for a user to create remote folders and assign messages to folders. To solve this and other problems, the IMAP protocol, defined in \[RFC 3501\], was invented. Like POP3, IMAP is a mail access protocol.
It has many more features than +POP3, but it is also significantly more complex. (And thus the client +and server side implementations are significantly more complex.) An IMAP +server will associate each message with a folder; when a message first +arrives at the server, it is associated with the recipient's INBOX +folder. The recipient can then move the message into a new, user-created +folder, read the message, delete the message, and so on. The IMAP +protocol provides commands to allow users to create folders and move +messages from one folder to another. IMAP also provides commands that +allow users to search remote folders for messages matching specific +criteria. Note that, unlike POP3, an IMAP server maintains user state +information across IMAP sessions---for example, the names of the folders +and which messages are associated with which folders. Another important +feature of IMAP is that it has commands that permit a user agent to +obtain components of messages. For example, a user agent can obtain just +the message header of a message or just one part of a multipart MIME +message. This feature is useful when there is a low-bandwidth connection +(for example, a slow-speed modem link) between the user agent and its +mail server. With a low-bandwidth connection, the user may not want to +download all of the messages in its mailbox, particularly avoiding long +messages that might contain, for example, an audio or video clip. +Web-Based E-Mail More and more users today are sending and accessing +their e-mail through their Web browsers. Hotmail introduced Web-based +access in the mid 1990s. Now Web-based e-mail is also provided by +Google, Yahoo!, as well as just about every major university and +corporation. With this service, the user agent is an ordinary Web +browser, and the user communicates with its remote mailbox via HTTP. +When a recipient, such as Bob, wants to access a message in his mailbox, +the e-mail message is sent from Bob's mail server to Bob's browser using +the HTTP protocol rather than the POP3 or IMAP protocol. When a sender, +such as Alice, wants to send an e-mail message, the e-mail message is +sent from her browser to her mail server over HTTP rather than over +SMTP. Alice's mail server, however, still sends messages to, and +receives messages from, other mail servers using SMTP. + +2.4 DNS---The Internet's Directory Service We human beings can be +identified in many ways. For example, we can be identified by the names +that appear on our birth certificates. We can be identified by our +social security numbers. We can be identified by our driver's license +numbers. Although each of these identifiers can be used to identify +people, within a given context one identifier may be more appropriate +than another. For example, the computers at the IRS (the infamous +tax-collecting agency in the United States) prefer to use fixed-length +social security numbers rather than birth certificate names. On the +other hand, ordinary people prefer the more mnemonic birth certificate +names rather than social security numbers. (Indeed, can you imagine +saying, "Hi. My name is 132-67-9875. Please meet my husband, +178-87-1146.") Just as humans can be identified in many ways, so too can +Internet hosts. One identifier for a host is its hostname. +Hostnames---such as www.facebook.com, www.google.com , gaia.cs.umass.edu +---are mnemonic and are therefore appreciated by humans. 
However, +hostnames provide little, if any, information about the location within +the Internet of the host. (A hostname such as www.eurecom.fr , which +ends with the country code .fr , tells us that the host is probably in +France, but doesn't say much more.) Furthermore, because hostnames can +consist of variable-length alphanumeric characters, they would be +difficult to process by routers. For these reasons, hosts are also +identified by so-called IP addresses. We discuss IP addresses in some +detail in Chapter 4, but it is useful to say a few brief words about +them now. An IP address consists of four bytes and has a rigid +hierarchical structure. An IP address looks like 121.7.106.83 , where +each period separates one of the bytes expressed in decimal notation +from 0 to 255. An IP address is hierarchical because as we scan the +address from left to right, we obtain more and more specific information +about where the host is located in the Internet (that is, within which +network, in the network of networks). Similarly, when we scan a postal +address from bottom to top, we obtain more and more specific information +about where the addressee is located. + +2.4.1 Services Provided by DNS We have just seen that there are two ways +to identify a host---by a hostname and by an IP address. People prefer +the more mnemonic hostname identifier, while routers prefer +fixed-length, hierarchically structured IP addresses. In order to +reconcile these preferences, we need a directory service that translates +hostnames to IP addresses. This is the main task of the Internet's +domain name system (DNS). The DNS is (1) a distributed database +implemented in a hierarchy of DNS servers, and (2) an + +application-layer protocol that allows hosts to query the distributed +database. The DNS servers are often UNIX machines running the Berkeley +Internet Name Domain (BIND) software \[BIND 2016\]. The DNS protocol +runs over UDP and uses port 53. DNS is commonly employed by other +application-layer protocols---including HTTP and SMTP to translate +user-supplied hostnames to IP addresses. As an example, consider what +happens when a browser (that is, an HTTP client), running on some user's +host, requests the URL www.someschool.edu/index.html . In order for the +user's host to be able to send an HTTP request message to the Web server +www.someschool.edu , the user's host must first obtain the IP address of +www.someschool.edu . This is done as follows. + +1. The same user machine runs the client side of the DNS application. + +2. The browser extracts the hostname, www.someschool.edu , from the URL + and passes the hostname to the client side of the DNS application. + +3. The DNS client sends a query containing the hostname to a DNS + server. + +4. The DNS client eventually receives a reply, which includes the IP + address for the hostname. + +5. Once the browser receives the IP address from DNS, it can initiate a + TCP connection to the HTTP server process located at port 80 at that + IP address. We see from this example that DNS adds an additional + delay---sometimes substantial---to the Internet applications that + use it. Fortunately, as we discuss below, the desired IP address is + often cached in a "nearby" DNS server, which helps to reduce DNS + network traffic as well as the average DNS delay. DNS provides a few + other important services in addition to translating hostnames to IP + addresses: Host aliasing. A host with a complicated hostname can + have one or more alias names. 
For example, a hostname such as relay1.west-coast.enterprise.com could have, say, two aliases such as enterprise.com and www.enterprise.com. In this case, the hostname relay1.west-coast.enterprise.com is said to be a canonical hostname. Alias hostnames, when present, are typically more mnemonic than canonical hostnames. DNS can be invoked by an application to obtain the canonical hostname for a supplied alias hostname as well as the IP address of the host.

Mail server aliasing. For obvious reasons, it is highly desirable that e-mail addresses be mnemonic. For example, if Bob has an account with Yahoo Mail, Bob's e-mail address might be as simple as bob@yahoo.mail. However, the hostname of the Yahoo mail server is more complicated and much less mnemonic than simply yahoo.com (for example, the canonical hostname might be something like relay1.west-coast.yahoo.com). DNS can be invoked by a mail application to obtain the canonical hostname for a supplied alias hostname as well as the IP address of the host. In fact, the MX record (see below) permits a company's mail server and Web server to have identical (aliased) hostnames; for example, a company's Web server and mail server can both be called enterprise.com.

Load distribution. DNS is also used to perform load distribution among replicated servers, such as replicated Web servers. Busy sites, such as cnn.com, are replicated over multiple servers, with each server running on a different end system and each having a different IP address. For replicated Web servers, a set of IP addresses is thus associated with one canonical hostname. The DNS database contains this set of IP addresses. When clients make a DNS query for a name mapped to a set of addresses, the server responds with the entire set of IP addresses, but rotates the ordering of the addresses within each reply. Because a client typically sends its HTTP request message to the IP address that is listed first in the set, DNS rotation distributes the traffic among the replicated servers. DNS rotation is also used for e-mail so that multiple mail servers can have the same alias name. Also, content distribution companies such as Akamai have used DNS in more sophisticated ways \[Dilley 2002\] to provide Web content distribution (see Section 2.6.3).

The DNS is specified in RFC 1034 and RFC 1035, and updated in several additional RFCs. It is a complex system, and we only touch upon key aspects of its operation here. The interested reader is referred to these RFCs and the book by Albitz and Liu \[Albitz 1993\]; see also the retrospective paper \[Mockapetris 1988\], which provides a nice description of the what and why of DNS, and \[Mockapetris 2005\].

PRINCIPLES IN PRACTICE

DNS: CRITICAL NETWORK FUNCTIONS VIA THE CLIENT-SERVER PARADIGM

Like HTTP, FTP, and SMTP, the DNS protocol is an application-layer protocol since it (1) runs between communicating end systems using the client-server paradigm and (2) relies on an underlying end-to-end transport protocol to transfer DNS messages between communicating end systems. In another sense, however, the role of the DNS is quite different from Web, file transfer, and e-mail applications. Unlike these applications, the DNS is not an application with which a user directly interacts. Instead, the DNS provides a core Internet function---namely, translating hostnames to their underlying IP addresses, for user applications and other software in the Internet. We noted in Section 1.2 that much of the complexity in the Internet architecture is located at the "edges" of the network.
The DNS, which +implements the critical name-toaddress translation process using clients +and servers located at the edge of the network, is yet another example +of that design philosophy. + +operation here. The interested reader is referred to these RFCs and the +book by Albitz and Liu \[Albitz 1993\]; see also the retrospective paper +\[Mockapetris 1988\], which provides a nice description of the what and +why of DNS, and \[Mockapetris 2005\]. + +2.4.2 Overview of How DNS Works We now present a high-level overview of +how DNS works. Our discussion will focus on the hostname-to- + +IP-address translation service. Suppose that some application (such as a +Web browser or a mail reader) running in a user's host needs to +translate a hostname to an IP address. The application will invoke the +client side of DNS, specifying the hostname that needs to be translated. +(On many UNIX-based machines, gethostbyname() is the function call that +an application calls in order to perform the translation.) DNS in the +user's host then takes over, sending a query message into the network. +All DNS query and reply messages are sent within UDP datagrams to port +53. After a delay, ranging from milliseconds to seconds, DNS in the +user's host receives a DNS reply message that provides the desired +mapping. This mapping is then passed to the invoking application. Thus, +from the perspective of the invoking application in the user's host, DNS +is a black box providing a simple, straightforward translation service. +But in fact, the black box that implements the service is complex, +consisting of a large number of DNS servers distributed around the +globe, as well as an application-layer protocol that specifies how the +DNS servers and querying hosts communicate. A simple design for DNS +would have one DNS server that contains all the mappings. In this +centralized design, clients simply direct all queries to the single DNS +server, and the DNS server responds directly to the querying clients. +Although the simplicity of this design is attractive, it is +inappropriate for today's Internet, with its vast (and growing) number +of hosts. The problems with a centralized design include: A single point +of failure. If the DNS server crashes, so does the entire Internet! +Traffic volume. A single DNS server would have to handle all DNS queries +(for all the HTTP requests and e-mail messages generated from hundreds +of millions of hosts). Distant centralized database. A single DNS server +cannot be "close to" all the querying clients. If we put the single DNS +server in New York City, then all queries from Australia must travel to +the other side of the globe, perhaps over slow and congested links. This +can lead to significant delays. Maintenance. The single DNS server would +have to keep records for all Internet hosts. Not only would this +centralized database be huge, but it would have to be updated frequently +to account for every new host. In summary, a centralized database in a +single DNS server simply doesn't scale. Consequently, the DNS is +distributed by design. In fact, the DNS is a wonderful example of how a +distributed database can be implemented in the Internet. A Distributed, +Hierarchical Database In order to deal with the issue of scale, the DNS +uses a large number of servers, organized in a hierarchical fashion and +distributed around the world. No single DNS server has all of the +mappings for all of the hosts in the Internet. Instead, the mappings are +distributed across the DNS servers. 
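Since DNS appears to the application as a black box behind a single library call, it is worth seeing that call once. Below is a minimal sketch using Python's standard socket module; the hostname is the illustrative one from the example above and may not actually resolve.

```python
import socket

# Illustrative hostname from the example above; it may not actually resolve.
hostname = "www.someschool.edu"

try:
    # Analogous to the UNIX gethostbyname() mentioned above: the resolver
    # sends a DNS query into the network and blocks until the reply arrives.
    print(socket.gethostbyname(hostname))

    # getaddrinfo() is the modern equivalent; note that a single hostname
    # may map to several IP addresses (replicated servers, Section 2.4.1).
    for *_, sockaddr in socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP):
        print(sockaddr[0])
except socket.gaierror as err:
    print(f"DNS lookup failed: {err}")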
To a first approximation, there are three classes of DNS servers---root DNS servers, top-level domain (TLD) DNS servers, and authoritative DNS servers---organized in a hierarchy as shown in Figure 2.17. To understand how these three classes of servers interact, suppose a DNS client wants to determine the IP address for the hostname www.amazon.com.

Figure 2.17 Portion of the hierarchy of DNS servers

To a first approximation, the following events will take place. The client first contacts one of the root servers, which returns IP addresses for TLD servers for the top-level domain com. The client then contacts one of these TLD servers, which returns the IP address of an authoritative server for amazon.com. Finally, the client contacts one of the authoritative servers for amazon.com, which returns the IP address for the hostname www.amazon.com. We'll soon examine this DNS lookup process in more detail. But let's first take a closer look at these three classes of DNS servers:

Root DNS servers. There are over 400 root name servers scattered all over the world. Figure 2.18 shows the countries that have root name servers, with countries having more than ten darkly shaded. These root name servers are managed by 13 different organizations. The full list of root name servers, along with the organizations that manage them and their IP addresses, can be found at [Root Servers 2016]. Root name servers provide the IP addresses of the TLD servers.

Top-level domain (TLD) servers. For each of the top-level domains---such as com, org, net, edu, and gov---and all of the country top-level domains---such as uk, fr, ca, and jp---there is a TLD server (or server cluster). The company Verisign Global Registry Services maintains the TLD servers for the com top-level domain, and the company Educause maintains the TLD servers for the edu top-level domain. The network infrastructure supporting a TLD can be large and complex; see [Osterweil 2012] for a nice overview of the Verisign network. See [TLD list 2016] for a list of all top-level domains. TLD servers provide the IP addresses for authoritative DNS servers.

Figure 2.18 DNS root servers in 2016

Authoritative DNS servers. Every organization with publicly accessible hosts (such as Web servers and mail servers) on the Internet must provide publicly accessible DNS records that map the names of those hosts to IP addresses. An organization's authoritative DNS server houses these DNS records. An organization can choose to implement its own authoritative DNS server to hold these records; alternatively, the organization can pay to have these records stored in an authoritative DNS server of some service provider. Most universities and large companies implement and maintain their own primary and secondary (backup) authoritative DNS servers.

The root, TLD, and authoritative DNS servers all belong to the hierarchy of DNS servers, as shown in Figure 2.17. There is another important type of DNS server called the local DNS server. A local DNS server does not strictly belong to the hierarchy of servers but is nevertheless central to the DNS architecture. Each ISP---such as a residential ISP or an institutional ISP---has a local DNS server (also called a default name server). When a host connects to an ISP, the ISP provides the host with the IP addresses of one or more of its local DNS servers (typically through DHCP, which is discussed in Chapter 4).
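To make the hierarchy concrete, the following sketch walks the root-to-TLD-to-authoritative chain by hand, issuing the iterative queries that would otherwise be issued on the host's behalf. It is a hedged illustration rather than resolver code: it assumes the third-party dnspython package (installed with pip install dnspython), starts from the well-known address of the a.root-servers.net root server, and gives up on referrals that arrive without glue records.

```python
import dns.message    # third-party package: pip install dnspython
import dns.query
import dns.rdatatype

def iterate(hostname, server="198.41.0.4"):   # a.root-servers.net
    """Follow referrals from a root server down to a server that answers."""
    while True:
        query = dns.message.make_query(hostname, dns.rdatatype.A)
        reply = dns.query.udp(query, server, timeout=5)  # DNS over UDP, port 53
        if reply.answer:                                 # final A (or CNAME) records
            return [rrset.to_text() for rrset in reply.answer]
        # No answer yet: this reply is a referral. The additional section
        # carries "glue" Type A records with the addresses of the next
        # (TLD or authoritative) servers named in the authority section.
        glue = [rdata.to_text() for rrset in reply.additional
                if rrset.rdtype == dns.rdatatype.A
                for rdata in rrset]
        if not glue:
            raise RuntimeError("referral without glue; a full resolver would "
                               "now resolve the NS names itself")
        server = glue[0]                                 # descend one level
        print("referred to", server)

print(iterate("www.amazon.com"))
```

In practice, of course, a host delegates all of this work to its local DNS server, described next.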
You can easily determine the IP address of your local DNS server by accessing network status windows in Windows or UNIX. A host's local DNS server is typically "close to" the host. For an institutional ISP, the local DNS server may be on the same LAN as the host; for a residential ISP, it is typically separated from the host by no more than a few routers. When a host makes a DNS query, the query is sent to the local DNS server, which acts as a proxy, forwarding the query into the DNS server hierarchy, as we'll discuss in more detail below.

Let's take a look at a simple example. Suppose the host cse.nyu.edu desires the IP address of gaia.cs.umass.edu. Also suppose that NYU's local DNS server for cse.nyu.edu is called dns.nyu.edu and that an authoritative DNS server for gaia.cs.umass.edu is called dns.umass.edu. As shown in Figure 2.19, the host cse.nyu.edu first sends a DNS query message to its local DNS server, dns.nyu.edu. The query message contains the hostname to be translated, namely, gaia.cs.umass.edu. The local DNS server forwards the query message to a root DNS server. The root DNS server takes note of the edu suffix and returns to the local DNS server a list of IP addresses for TLD servers responsible for edu. The local DNS server then resends the query message to one of these TLD servers. The TLD server takes note of the umass.edu suffix and responds with the IP address of the authoritative DNS server for the University of Massachusetts, namely, dns.umass.edu. Finally, the local DNS server resends the query message directly to dns.umass.edu, which responds with the IP address of gaia.cs.umass.edu. Note that in this example, in order to obtain the mapping for one hostname, eight DNS messages were sent: four query messages and four reply messages! We'll soon see how DNS caching reduces this query traffic.

Figure 2.19 Interaction of the various DNS servers

Our previous example assumed that the TLD server knows the authoritative DNS server for the hostname. In general, this is not always true. Instead, the TLD server may know only of an intermediate DNS server, which in turn knows the authoritative DNS server for the hostname. For example, suppose again that the University of Massachusetts has a DNS server for the university, called dns.umass.edu. Also suppose that each of the departments at the University of Massachusetts has its own DNS server, and that each departmental DNS server is authoritative for all hosts in the department. In this case, when the intermediate DNS server, dns.umass.edu, receives a query for a host with a hostname ending with cs.umass.edu, it returns to dns.nyu.edu the IP address of dns.cs.umass.edu, which is authoritative for all hostnames ending with cs.umass.edu. The local DNS server dns.nyu.edu then sends the query to the authoritative DNS server, which returns the desired mapping to the local DNS server, which in turn returns the mapping to the requesting host. In this case, a total of 10 DNS messages are sent!

The example shown in Figure 2.19 makes use of both recursive queries and iterative queries. The query sent from cse.nyu.edu to dns.nyu.edu is a recursive query, since the query asks dns.nyu.edu to obtain the mapping on its behalf. But the subsequent three queries are iterative since all of the replies are directly returned to dns.nyu.edu. In theory, any DNS query can be iterative or recursive. For example, Figure 2.20 shows a DNS query chain for which all of the queries are recursive.
In practice, the queries typically follow the pattern in Figure 2.19: The query from the requesting host to the local DNS server is recursive, and the remaining queries are iterative.

Figure 2.20 Recursive queries in DNS

DNS Caching

Our discussion thus far has ignored DNS caching, a critically important feature of the DNS system. In truth, DNS extensively exploits caching in order to improve the delay performance and to reduce the number of DNS messages ricocheting around the Internet. The idea behind DNS caching is very simple. In a query chain, when a DNS server receives a DNS reply (containing, for example, a mapping from a hostname to an IP address), it can cache the mapping in its local memory. For example, in Figure 2.19, each time the local DNS server dns.nyu.edu receives a reply from some DNS server, it can cache any of the information contained in the reply. If a hostname/IP address pair is cached in a DNS server and another query arrives at the DNS server for the same hostname, the DNS server can provide the desired IP address, even if it is not authoritative for the hostname. Because hosts and mappings between hostnames and IP addresses are by no means permanent, DNS servers discard cached information after a period of time (often set to two days).

As an example, suppose that a host apricot.nyu.edu queries dns.nyu.edu for the IP address of the hostname cnn.com. Furthermore, suppose that a few hours later, another NYU host, say, kiwi.nyu.edu, also queries dns.nyu.edu with the same hostname. Because of caching, the local DNS server will be able to immediately return the IP address of cnn.com to this second requesting host without having to query any other DNS servers. A local DNS server can also cache the IP addresses of TLD servers, thereby allowing the local DNS server to bypass the root DNS servers in a query chain. In fact, because of caching, root servers are bypassed for all but a very small fraction of DNS queries.

2.4.3 DNS Records and Messages

The DNS servers that together implement the DNS distributed database store resource records (RRs), including RRs that provide hostname-to-IP address mappings. Each DNS reply message carries one or more resource records. In this and the following subsection, we provide a brief overview of DNS resource records and messages; more details can be found in [Albitz 1993] or in the DNS RFCs [RFC 1034; RFC 1035]. A resource record is a four-tuple that contains the following fields:

(Name, Value, Type, TTL)

TTL is the time to live of the resource record; it determines when a resource should be removed from a cache. In the example records given below, we ignore the TTL field. The meanings of Name and Value depend on Type:

If Type=A, then Name is a hostname and Value is the IP address for the hostname. Thus, a Type A record provides the standard hostname-to-IP address mapping. As an example, (relay1.bar.foo.com, 145.37.93.126, A) is a Type A record.

If Type=NS, then Name is a domain (such as foo.com) and Value is the hostname of an authoritative DNS server that knows how to obtain the IP addresses for hosts in the domain. This record is used to route DNS queries further along in the query chain. As an example, (foo.com, dns.foo.com, NS) is a Type NS record.

If Type=CNAME, then Value is a canonical hostname for the alias hostname Name. This record can provide querying hosts the canonical name for a hostname.
As an example, (foo.com, relay1.bar.foo.com, CNAME) is a CNAME record.

If Type=MX, then Value is the canonical name of a mail server that has an alias hostname Name. As an example, (foo.com, mail.bar.foo.com, MX) is an MX record. MX records allow the hostnames of mail servers to have simple aliases. Note that by using the MX record, a company can have the same aliased name for its mail server and for one of its other servers (such as its Web server). To obtain the canonical name for the mail server, a DNS client would query for an MX record; to obtain the canonical name for the other server, the DNS client would query for the CNAME record.

If a DNS server is authoritative for a particular hostname, then the DNS server will contain a Type A record for the hostname. (Even if the DNS server is not authoritative, it may contain a Type A record in its cache.) If a server is not authoritative for a hostname, then the server will contain a Type NS record for the domain that includes the hostname; it will also contain a Type A record that provides the IP address of the DNS server in the Value field of the NS record. As an example, suppose an edu TLD server is not authoritative for the host gaia.cs.umass.edu. Then this server will contain a record for a domain that includes the host gaia.cs.umass.edu, for example, (umass.edu, dns.umass.edu, NS). The edu TLD server would also contain a Type A record, which maps the DNS server dns.umass.edu to an IP address, for example, (dns.umass.edu, 128.119.40.111, A).

DNS Messages

Earlier in this section, we referred to DNS query and reply messages. These are the only two kinds of DNS messages. Furthermore, both query and reply messages have the same format, as shown in Figure 2.21.

Figure 2.21 DNS message format

The semantics of the various fields in a DNS message are as follows:

The first 12 bytes is the header section, which has a number of fields. The first field is a 16-bit number that identifies the query. This identifier is copied into the reply message to a query, allowing the client to match received replies with sent queries. There are a number of flags in the flag field. A 1-bit query/reply flag indicates whether the message is a query (0) or a reply (1). A 1-bit authoritative flag is set in a reply message when a DNS server is an authoritative server for a queried name. A 1-bit recursion-desired flag is set when a client (host or DNS server) desires that the DNS server perform recursion when it doesn't have the record. A 1-bit recursion-available field is set in a reply if the DNS server supports recursion. In the header, there are also four "number-of" fields. These fields indicate the number of occurrences of the four types of data sections that follow the header.

The question section contains information about the query that is being made. This section includes (1) a name field that contains the name that is being queried, and (2) a type field that indicates the type of question being asked about the name---for example, a host address associated with a name (Type A) or the mail server for a name (Type MX).

In a reply from a DNS server, the answer section contains the resource records for the name that was originally queried. Recall that in each resource record there is the Type (for example, A, NS, CNAME, and MX), the Value, and the TTL.
A reply can return multiple RRs in the answer, since a hostname can have multiple IP addresses (for example, for replicated Web servers, as discussed earlier in this section).

The authority section contains records of other authoritative servers.

The additional section contains other helpful records. For example, the answer field in a reply to an MX query contains a resource record providing the canonical hostname of a mail server. The additional section contains a Type A record providing the IP address for the canonical hostname of the mail server.

How would you like to send a DNS query message directly from the host you're working on to some DNS server? This can easily be done with the nslookup program, which is available on most Windows and UNIX platforms. For example, from a Windows host, open the Command Prompt and invoke the nslookup program by simply typing "nslookup." After invoking nslookup, you can send a DNS query to any DNS server (root, TLD, or authoritative). After receiving the reply message from the DNS server, nslookup will display the records included in the reply (in a human-readable format). As an alternative to running nslookup from your own host, you can visit one of many Web sites that allow you to remotely employ nslookup. (Just type "nslookup" into a search engine and you'll be brought to one of these sites.) The DNS Wireshark lab at the end of this chapter will allow you to explore the DNS in much more detail.

Inserting Records into the DNS Database

The discussion above focused on how records are retrieved from the DNS database. You might be wondering how records get into the database in the first place. Let's look at how this is done in the context of a specific example. Suppose you have just created an exciting new startup company called Network Utopia. The first thing you'll surely want to do is register the domain name networkutopia.com at a registrar. A registrar is a commercial entity that verifies the uniqueness of the domain name, enters the domain name into the DNS database (as discussed below), and collects a small fee from you for its services. Prior to 1999, a single registrar, Network Solutions, had a monopoly on domain name registration for com, net, and org domains. But now there are many registrars competing for customers, and the Internet Corporation for Assigned Names and Numbers (ICANN) accredits the various registrars. A complete list of accredited registrars is available at http://www.internic.net.

When you register the domain name networkutopia.com with some registrar, you also need to provide the registrar with the names and IP addresses of your primary and secondary authoritative DNS servers. Suppose the names and IP addresses are dns1.networkutopia.com, dns2.networkutopia.com, 212.212.212.1, and 212.212.212.2. For each of these two authoritative DNS servers, the registrar would then make sure that a Type NS and a Type A record are entered into the TLD com servers.
Specifically, for the primary authoritative server for networkutopia.com, the registrar would insert the following two resource records into the DNS system:

(networkutopia.com, dns1.networkutopia.com, NS)
(dns1.networkutopia.com, 212.212.212.1, A)

FOCUS ON SECURITY

DNS VULNERABILITIES

We have seen that DNS is a critical component of the Internet infrastructure, with many important services---including the Web and e-mail---simply incapable of functioning without it. We therefore naturally ask, how can DNS be attacked? Is DNS a sitting duck, waiting to be knocked out of service, while taking most Internet applications down with it?

The first type of attack that comes to mind is a DDoS bandwidth-flooding attack (see Section 1.6) against DNS servers. For example, an attacker could attempt to send to each DNS root server a deluge of packets, so many that the majority of legitimate DNS queries never get answered. Such a large-scale DDoS attack against DNS root servers actually took place on October 21, 2002. In this attack, the attackers leveraged a botnet to send truckloads of ICMP ping messages to each of the 13 DNS root IP addresses. (ICMP messages are discussed in Section 5.6. For now, it suffices to know that ICMP packets are special types of IP datagrams.) Fortunately, this large-scale attack caused minimal damage, having little or no impact on users' Internet experience. The attackers did succeed at directing a deluge of packets at the root servers. But many of the DNS root servers were protected by packet filters, configured to always block all ICMP ping messages directed at the root servers. These protected servers were thus spared and functioned as normal. Furthermore, most local DNS servers cache the IP addresses of top-level-domain servers, allowing the query process to often bypass the DNS root servers.

A potentially more effective DDoS attack against DNS would be to send a deluge of DNS queries to top-level-domain servers, for example, to all the top-level-domain servers that handle the .com domain. It would be harder to filter DNS queries directed to DNS servers; and top-level-domain servers are not as easily bypassed as are root servers. But the severity of such an attack would be partially mitigated by caching in local DNS servers.

DNS could potentially be attacked in other ways. In a man-in-the-middle attack, the attacker intercepts queries from hosts and returns bogus replies. In the DNS poisoning attack, the attacker sends bogus replies to a DNS server, tricking the server into accepting bogus records into its cache. Either of these attacks could be used, for example, to redirect an unsuspecting Web user to the attacker's Web site. These attacks, however, are difficult to implement, as they require intercepting packets or throttling servers [Skoudis 2006].

In summary, DNS has demonstrated itself to be surprisingly robust against attacks. To date, there hasn't been an attack that has successfully impeded the DNS service.

You'll also have to make sure that the Type A resource record for your Web server www.networkutopia.com and the Type MX resource record for your mail server mail.networkutopia.com are entered into your authoritative DNS servers. (Until recently, the contents of each DNS server were configured statically, for example, from a configuration file created by a system manager.
More recently, an UPDATE option has been added to the DNS protocol to allow data to be dynamically added or deleted from the database via DNS messages. [RFC 2136] and [RFC 3007] specify DNS dynamic updates.)

Once all of these steps are completed, people will be able to visit your Web site and send e-mail to the employees at your company. Let's conclude our discussion of DNS by verifying that this statement is true. This verification also helps to solidify what we have learned about DNS. Suppose Alice in Australia wants to view the Web page www.networkutopia.com. As discussed earlier, her host will first send a DNS query to her local DNS server. The local DNS server will then contact a TLD com server. (The local DNS server will also have to contact a root DNS server if the address of a TLD com server is not cached.) This TLD server contains the Type NS and Type A resource records listed above, because the registrar had these resource records inserted into all of the TLD com servers. The TLD com server sends a reply to Alice's local DNS server, with the reply containing the two resource records. The local DNS server then sends a DNS query to 212.212.212.1, asking for the Type A record corresponding to www.networkutopia.com. This record provides the IP address of the desired Web server, say, 212.212.71.4, which the local DNS server passes back to Alice's host. Alice's browser can now initiate a TCP connection to the host 212.212.71.4 and send an HTTP request over the connection. Whew! There's a lot more going on than what meets the eye when one surfs the Web!

2.5 Peer-to-Peer File Distribution

The applications described in this chapter thus far---including the Web, e-mail, and DNS---all employ client-server architectures with significant reliance on always-on infrastructure servers. Recall from Section 2.1.1 that with a P2P architecture, there is minimal (or no) reliance on always-on infrastructure servers. Instead, pairs of intermittently connected hosts, called peers, communicate directly with each other. The peers are not owned by a service provider, but are instead desktops and laptops controlled by users.

In this section we consider a very natural P2P application, namely, distributing a large file from a single server to a large number of hosts (called peers). The file might be a new version of the Linux operating system, a software patch for an existing operating system or application, an MP3 music file, or an MPEG video file. In client-server file distribution, the server must send a copy of the file to each of the peers---placing an enormous burden on the server and consuming a large amount of server bandwidth. In P2P file distribution, each peer can redistribute any portion of the file it has received to any other peers, thereby assisting the server in the distribution process. As of 2016, the most popular P2P file distribution protocol is BitTorrent. The protocol was originally developed by Bram Cohen; there are now many independent BitTorrent clients conforming to the BitTorrent protocol, just as there are a number of Web browser clients that conform to the HTTP protocol. In this subsection, we first examine the self-scalability of P2P architectures in the context of file distribution. We then describe BitTorrent in some detail, highlighting its most important characteristics and features.
Scalability of P2P Architectures

To compare client-server architectures with peer-to-peer architectures, and illustrate the inherent self-scalability of P2P, we now consider a simple quantitative model for distributing a file to a fixed set of peers for both architecture types. As shown in Figure 2.22, the server and the peers are connected to the Internet with access links. Denote the upload rate of the server's access link by $u_s$, the upload rate of the $i$th peer's access link by $u_i$, and the download rate of the $i$th peer's access link by $d_i$. Also denote the size of the file to be distributed (in bits) by $F$ and the number of peers that want to obtain a copy of the file by $N$. The distribution time is the time it takes to get a copy of the file to all $N$ peers.

Figure 2.22 An illustrative file distribution problem

In our analysis of the distribution time below, for both client-server and P2P architectures, we make the simplifying (and generally accurate [Akella 2003]) assumption that the Internet core has abundant bandwidth, implying that all of the bottlenecks are in access networks. We also suppose that the server and clients are not participating in any other network applications, so that all of their upload and download access bandwidth can be fully devoted to distributing this file.

Let's first determine the distribution time for the client-server architecture, which we denote by $D_{cs}$. In the client-server architecture, none of the peers aids in distributing the file. We make the following observations:

The server must transmit one copy of the file to each of the $N$ peers. Thus the server must transmit $NF$ bits. Since the server's upload rate is $u_s$, the time to distribute the file must be at least $NF/u_s$.

Let $d_{\min}$ denote the download rate of the peer with the lowest download rate, that is, $d_{\min} = \min\{d_1, d_2, \ldots, d_N\}$. The peer with the lowest download rate cannot obtain all $F$ bits of the file in less than $F/d_{\min}$ seconds. Thus the minimum distribution time is at least $F/d_{\min}$.

Putting these two observations together, we obtain

$$D_{cs} \geq \max\left\{\frac{NF}{u_s}, \frac{F}{d_{\min}}\right\}$$

This provides a lower bound on the minimum distribution time for the client-server architecture. In the homework problems you will be asked to show that the server can schedule its transmissions so that the lower bound is actually achieved. So let's take this lower bound provided above as the actual distribution time, that is,

$$D_{cs} = \max\left\{\frac{NF}{u_s}, \frac{F}{d_{\min}}\right\} \qquad (2.1)$$

We see from Equation 2.1 that for $N$ large enough, the client-server distribution time is given by $NF/u_s$. Thus, the distribution time increases linearly with the number of peers $N$. So, for example, if the number of peers from one week to the next increases a thousand-fold from a thousand to a million, the time required to distribute the file to all peers increases by a factor of 1,000.

Let's now go through a similar analysis for the P2P architecture, where each peer can assist the server in distributing the file. In particular, when a peer receives some file data, it can use its own upload capacity to redistribute the data to other peers. Calculating the distribution time for the P2P architecture is somewhat more complicated than for the client-server architecture, since the distribution time depends on how each peer distributes portions of the file to the other peers. Nevertheless, a simple expression for the minimal distribution time can be obtained [Kumar 2006].
To this end, we first make the following observations:

At the beginning of the distribution, only the server has the file. To get this file into the community of peers, the server must send each bit of the file at least once into its access link. Thus, the minimum distribution time is at least $F/u_s$. (Unlike the client-server scheme, a bit sent once by the server may not have to be sent by the server again, as the peers may redistribute the bit among themselves.)

As with the client-server architecture, the peer with the lowest download rate cannot obtain all $F$ bits of the file in less than $F/d_{\min}$ seconds. Thus the minimum distribution time is at least $F/d_{\min}$.

Finally, observe that the total upload capacity of the system as a whole is equal to the upload rate of the server plus the upload rates of each of the individual peers, that is, $u_{total} = u_s + u_1 + \cdots + u_N$. The system must deliver (upload) $F$ bits to each of the $N$ peers, thus delivering a total of $NF$ bits. This cannot be done at a rate faster than $u_{total}$. Thus, the minimum distribution time is also at least $NF/(u_s + u_1 + \cdots + u_N)$.

Putting these three observations together, we obtain the minimum distribution time for P2P, denoted by $D_{P2P}$:

$$D_{P2P} \geq \max\left\{\frac{F}{u_s}, \frac{F}{d_{\min}}, \frac{NF}{u_s + \sum_{i=1}^{N} u_i}\right\} \qquad (2.2)$$

Equation 2.2 provides a lower bound for the minimum distribution time for the P2P architecture. It turns out that if we imagine that each peer can redistribute a bit as soon as it receives the bit, then there is a redistribution scheme that actually achieves this lower bound [Kumar 2006]. (We will prove a special case of this result in the homework.) In reality, where chunks of the file are redistributed rather than individual bits, Equation 2.2 serves as a good approximation of the actual minimum distribution time. Thus, let's take the lower bound provided by Equation 2.2 as the actual minimum distribution time, that is,

$$D_{P2P} = \max\left\{\frac{F}{u_s}, \frac{F}{d_{\min}}, \frac{NF}{u_s + \sum_{i=1}^{N} u_i}\right\} \qquad (2.3)$$

Figure 2.23 compares the minimum distribution time for the client-server and P2P architectures assuming that all peers have the same upload rate $u$. In Figure 2.23, we have set $F/u = 1$ hour, $u_s = 10u$, and $d_{\min} \geq u_s$. Thus, a peer can transmit the entire file in one hour, the server transmission rate is 10 times the peer upload rate, and (for simplicity) the peer download rates are set large enough so as not to have an effect.

Figure 2.23 Distribution time for P2P and client-server architectures

We see from Figure 2.23 that for the client-server architecture, the distribution time increases linearly and without bound as the number of peers increases. However, for the P2P architecture, the minimal distribution time is not only always less than the distribution time of the client-server architecture; it is also less than one hour for any number of peers $N$. Thus, applications with the P2P architecture can be self-scaling. This scalability is a direct consequence of peers being redistributors as well as consumers of bits.

BitTorrent

BitTorrent is a popular P2P protocol for file distribution [Chao 2011]. In BitTorrent lingo, the collection of all peers participating in the distribution of a particular file is called a torrent. Peers in a torrent download equal-size chunks of the file from one another, with a typical chunk size of 256 KBytes. When a peer first joins a torrent, it has no chunks. Over time it accumulates more and more chunks. While it downloads chunks it also uploads chunks to other peers.
Once a peer has acquired the entire file, it may +(selfishly) leave the torrent, or (altruistically) remain in the torrent +and continue to upload chunks to other peers. Also, any peer may leave +the torrent at any time with only a subset of chunks, and later rejoin +the torrent. Let's now take a closer look at how BitTorrent operates. +Since BitTorrent is a rather complicated protocol and system, we'll only +describe its most important mechanisms, sweeping some of the details +under the rug; this will allow us to see the forest through the trees. +Each torrent has an infrastructure node called a tracker. + +Figure 2.24 File distribution with BitTorrent + +When a peer joins a torrent, it registers itself with the tracker and +periodically informs the tracker that it is still in the torrent. In +this manner, the tracker keeps track of the peers that are participating +in the torrent. A given torrent may have fewer than ten or more than a +thousand peers participating at any instant of time. + +As shown in Figure 2.24, when a new peer, Alice, joins the torrent, the +tracker randomly selects a subset of peers (for concreteness, say 50) +from the set of participating peers, and sends the IP addresses of these +50 peers to Alice. Possessing this list of peers, Alice attempts to +establish concurrent TCP connections with all the peers on this list. +Let's call all the peers with which Alice succeeds in establishing a TCP +connection "neighboring peers." (In Figure 2.24, Alice is shown to have +only three neighboring peers. Normally, she would have many more.) As +time evolves, some of these peers may leave and other peers (outside the +initial 50) may attempt to establish TCP connections with Alice. So a +peer's neighboring peers will fluctuate over time. At any given time, +each peer will have a subset of chunks from the file, with different +peers having different subsets. Periodically, Alice will ask each of her +neighboring peers (over the TCP connections) for the list of the chunks +they have. If Alice has L different neighbors, she will obtain L lists +of chunks. With this knowledge, Alice will issue requests (again over +the TCP connections) for chunks she currently does not have. So at any +given instant of time, Alice will have a subset of chunks and will know +which chunks her neighbors have. With this information, Alice will have +two important decisions to make. First, which chunks should she request +first from her neighbors? And second, to which of her neighbors should +she send requested chunks? In deciding which chunks to request, Alice +uses a technique called rarest first. The idea is to determine, from +among the chunks she does not have, the chunks that are the rarest among +her neighbors (that is, the chunks that have the fewest repeated copies +among her neighbors) and then request those rarest chunks first. In this +manner, the rarest chunks get more quickly redistributed, aiming to +(roughly) equalize the numbers of copies of each chunk in the torrent. +To determine which requests she responds to, BitTorrent uses a clever +trading algorithm. The basic idea is that Alice gives priority to the +neighbors that are currently supplying her data at the highest rate. +Specifically, for each of her neighbors, Alice continually measures the +rate at which she receives bits and determines the four peers that are +feeding her bits at the highest rate. She then reciprocates by sending +chunks to these same four peers. 
Every 10 seconds, she recalculates the rates and possibly modifies the set of four peers. In BitTorrent lingo, these four peers are said to be unchoked. Importantly, every 30 seconds, she also picks one additional neighbor at random and sends it chunks. Let's call the randomly chosen peer Bob. In BitTorrent lingo, Bob is said to be optimistically unchoked. Because Alice is sending data to Bob, she may become one of Bob's top four uploaders, in which case Bob would start to send data to Alice. If the rate at which Bob sends data to Alice is high enough, Bob could then, in turn, become one of Alice's top four uploaders. In other words, every 30 seconds, Alice will randomly choose a new trading partner and initiate trading with that partner. If the two peers are satisfied with the trading, they will put each other in their top four lists and continue trading with each other until one of the peers finds a better partner. The effect is that peers capable of uploading at compatible rates tend to find each other. The random neighbor selection also allows new peers to get chunks, so that they can have something to trade. All other neighboring peers besides these five peers (four "top" peers and one probing peer) are "choked," that is, they do not receive any chunks from Alice.

BitTorrent has a number of interesting mechanisms that are not discussed here, including pieces (mini-chunks), pipelining, random first selection, endgame mode, and anti-snubbing [Cohen 2003]. The incentive mechanism for trading just described is often referred to as tit-for-tat [Cohen 2003]. It has been shown that this incentive scheme can be circumvented [Liogkas 2006; Locher 2006; Piatek 2007]. Nevertheless, the BitTorrent ecosystem is wildly successful, with millions of simultaneous peers actively sharing files in hundreds of thousands of torrents. If BitTorrent had been designed without tit-for-tat (or a variant), but otherwise exactly the same, BitTorrent would likely not even exist now, as the majority of the users would have been freeriders [Saroiu 2002].

We close our discussion on P2P by briefly mentioning another application of P2P, namely, Distributed Hash Tables (DHTs). A distributed hash table is a simple database, with the database records being distributed over the peers in a P2P system. DHTs have been widely implemented (e.g., in BitTorrent) and have been the subject of extensive research. An overview is provided in a Video Note in the companion website.

Walking through distributed hash tables
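Before turning to video streaming, it's worth plugging numbers into the analysis above. The following short sketch evaluates Equations 2.1 and 2.3 under the same assumptions as Figure 2.23 (all N peers upload at rate u, F/u = 1 hour, the server uploads at 10u, and download rates are too large to matter); the particular values of N are illustrative.

```python
def d_cs(N, F, u_s, d_min):
    """Client-server distribution time, Equation 2.1."""
    return max(N * F / u_s, F / d_min)

def d_p2p(N, F, u_s, d_min, peer_uploads):
    """Minimum P2P distribution time, Equation 2.3."""
    return max(F / u_s, F / d_min, N * F / (u_s + sum(peer_uploads)))

# Setup of Figure 2.23: measure rates in units of u, so F/u = 1 hour means F = 1.
u, F = 1.0, 1.0
u_s, d_min = 10 * u, float("inf")     # server uploads at 10u; downloads don't bind

for N in (1, 5, 10, 20, 30):
    print(f"N={N:2d}  client-server {d_cs(N, F, u_s, d_min):.2f} h   "
          f"P2P {d_p2p(N, F, u_s, d_min, [u] * N):.2f} h")
```

As in Figure 2.23, the client-server time grows linearly with N (3 hours at N = 30), while the P2P time stays below one hour for any N.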
These prerecorded videos are placed on servers, and users send requests to the servers to view the videos on demand. Many Internet companies today provide streaming video, including Netflix, YouTube (Google), Amazon, and Youku. But before launching into a discussion of video streaming, we should first get a quick feel for the video medium itself. A video is a sequence of images, typically being displayed at a constant rate, for example, at 24 or 30 images per second. An uncompressed, digitally encoded image consists of an array of pixels, with each pixel encoded into a number of bits to represent luminance and color. An important characteristic of video is that it can be compressed, thereby trading off video quality with bit rate. Today's off-the-shelf compression algorithms can compress a video to essentially any bit rate desired. Of course, the higher the bit rate, the better the image quality and the better the overall user viewing experience.

From a networking perspective, perhaps the most salient characteristic of video is its high bit rate. Compressed Internet video typically ranges from 100 kbps for low-quality video to over 3 Mbps for streaming high-definition movies; 4K streaming envisions a bit rate of more than 10 Mbps. This can translate to a huge amount of traffic and storage, particularly for high-end video. For example, a single 2 Mbps video with a duration of 67 minutes will consume 1 gigabyte of storage and traffic. By far, the most important performance measure for streaming video is average end-to-end throughput. In order to provide continuous playout, the network must provide an average throughput to the streaming application that is at least as large as the bit rate of the compressed video.

We can also use compression to create multiple versions of the same video, each at a different quality level. For example, we can use compression to create, say, three versions of the same video, at rates of 300 kbps, 1 Mbps, and 3 Mbps. Users can then decide which version they want to watch as a function of their current available bandwidth. Users with high-speed Internet connections might choose the 3 Mbps version; users watching the video over 3G with a smartphone might choose the 300 kbps version.

2.6.2 HTTP Streaming and DASH

In HTTP streaming, the video is simply stored at an HTTP server as an ordinary file with a specific URL. When a user wants to see the video, the client establishes a TCP connection with the server and issues an HTTP GET request for that URL. The server then sends the video file, within an HTTP response message, as quickly as the underlying network protocols and traffic conditions will allow. On the client side, the bytes are collected in a client application buffer. Once the number of bytes in this buffer exceeds a predetermined threshold, the client application begins playback---specifically, the streaming video application periodically grabs video frames from the client application buffer, decompresses the frames, and displays them on the user's screen. Thus, the video streaming application is displaying video as it is receiving and buffering frames corresponding to latter parts of the video.
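The interplay between the arrival rate and the playout rate around this buffer threshold is easy to see in a toy simulation. The numbers below (a constant 2.5 Mbps delivery rate, a 2 Mbps video, and a 1 MB playback threshold) are illustrative assumptions, not values from any real player.

```python
network_rate = 2_500_000 / 8   # bytes/s delivered by the TCP connection (2.5 Mbps)
video_rate   = 2_000_000 / 8   # bytes/s consumed once playback starts (2 Mbps video)
threshold    = 1_000_000       # playback begins after 1 MB has been buffered

buffered, playing = 0.0, False
for t in range(1, 11):                       # one-second time steps
    buffered += network_rate                 # bytes keep arriving from the server
    if not playing and buffered >= threshold:
        playing = True
        print(f"t={t}s: threshold reached, playback starts")
    if playing:
        buffered -= video_rate               # the player drains the buffer
    print(f"t={t}s: {buffered / 1e6:.2f} MB buffered")
```

With delivery faster than the consumption rate, the buffer keeps growing after playback begins; if the network rate dropped below the video rate, the buffer would drain and playback would eventually freeze, which is the motivation for the adaptive scheme described next.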
Although HTTP streaming, as just described, has been extensively deployed in practice (for example, by YouTube since its inception), it has a major shortcoming: All clients receive the same encoding of the video, despite the large variations in the amount of bandwidth available to a client, both across different clients and also over time for the same client. This has led to the development of a new type of HTTP-based streaming, often referred to as Dynamic Adaptive Streaming over HTTP (DASH). In DASH, the video is encoded into several different versions, with each version having a different bit rate and, correspondingly, a different quality level. The client dynamically requests chunks of video segments of a few seconds in length. When the amount of available bandwidth is high, the client naturally selects chunks from a high-rate version; and when the available bandwidth is low, it naturally selects from a low-rate version. The client selects different chunks one at a time with HTTP GET request messages [Akhshabi 2011].

DASH allows clients with different Internet access rates to stream video at different encoding rates. Clients with low-speed 3G connections can receive a low bit-rate (and low-quality) version, and clients with fiber connections can receive a high-quality version. DASH also allows a client to adapt to the available bandwidth if the available end-to-end bandwidth changes during the session. This feature is particularly important for mobile users, who typically see their bandwidth availability fluctuate as they move with respect to the base stations.

With DASH, each video version is stored in the HTTP server, each with a different URL. The HTTP server also has a manifest file, which provides a URL for each version along with its bit rate. The client first requests the manifest file and learns about the various versions. The client then selects one chunk at a time by specifying a URL and a byte range in an HTTP GET request message for each chunk. While downloading chunks, the client also measures the received bandwidth and runs a rate determination algorithm to select the chunk to request next. Naturally, if the client has a lot of video buffered and if the measured receive bandwidth is high, it will choose a chunk from a high-bitrate version. And naturally if the client has little video buffered and the measured received bandwidth is low, it will choose a chunk from a low-bitrate version. DASH therefore allows the client to freely switch among different quality levels.

2.6.3 Content Distribution Networks

Today, many Internet video companies are distributing on-demand multi-Mbps streams to millions of users on a daily basis. YouTube, for example, with a library of hundreds of millions of videos, distributes hundreds of millions of video streams to users around the world every day. Streaming all this traffic to locations all over the world while providing continuous playout and high interactivity is clearly a challenging task. For an Internet video company, perhaps the most straightforward approach to providing streaming video service is to build a single massive data center, store all of its videos in the data center, and stream the videos directly from the data center to clients worldwide. But there are three major problems with this approach.
First, if the client is far from the data center, server-to-client packets will cross many communication links and likely pass through many ISPs, with some of the ISPs possibly located on different continents. If one of these links provides a throughput that is less than the video consumption rate, the end-to-end throughput will also be below the consumption rate, resulting in annoying freezing delays for the user. (Recall from Chapter 1 that the end-to-end throughput of a stream is governed by the throughput at the bottleneck link.) The likelihood of this happening increases as the number of links in the end-to-end path increases. A second drawback is that a popular video will likely be sent many times over the same communication links. Not only does this waste network bandwidth, but the Internet video company itself will be paying its provider ISP (connected to the data center) for sending the same bytes into the Internet over and over again. A third problem with this solution is that a single data center represents a single point of failure---if the data center or its links to the Internet go down, it would not be able to distribute any video streams.

In order to meet the challenge of distributing massive amounts of video data to users distributed around the world, almost all major video-streaming companies make use of Content Distribution Networks (CDNs). A CDN manages servers in multiple geographically distributed locations, stores copies of the videos (and other types of Web content, including documents, images, and audio) in its servers, and attempts to direct each user request to a CDN location that will provide the best user experience. The CDN may be a private CDN, that is, owned by the content provider itself; for example, Google's CDN distributes YouTube videos and other types of content. The CDN may alternatively be a third-party CDN that distributes content on behalf of multiple content providers; Akamai, Limelight, and Level-3 all operate third-party CDNs. A very readable overview of modern CDNs is [Leighton 2009; Nygren 2010].

CDNs typically adopt one of two different server placement philosophies [Huang 2008]:

Enter Deep. One philosophy, pioneered by Akamai, is to enter deep into the access networks of Internet Service Providers, by deploying server clusters in access ISPs all over the world. (Access networks are described in Section 1.3.) Akamai takes this approach with clusters in approximately 1,700 locations. The goal is to get close to end users, thereby improving user-perceived delay and throughput by decreasing the number of links and routers between the end user and the CDN server from which it receives content. Because of this highly distributed design, the task of maintaining and managing the clusters becomes challenging.

Bring Home. A second design philosophy, taken by Limelight and many other CDN companies, is to bring the ISPs home by building large clusters at a smaller number (for example, tens) of sites. Instead of getting inside the access ISPs, these CDNs typically place their clusters in Internet Exchange Points (IXPs) (see Section 1.3). Compared with the enter-deep design philosophy, the bring-home design typically results in lower maintenance and management overhead, possibly at the expense of higher delay and lower throughput to end users.
Once its clusters are in place, the CDN replicates content across its clusters. The CDN may not want to place a copy of every video in each cluster, since some videos are rarely viewed or are only popular in some countries. In fact, many CDNs do not push videos to their clusters but instead use a simple pull strategy: If a client requests a video from a cluster that is not storing the video, then the cluster retrieves the video (from a central repository or from another cluster) and stores a copy locally while streaming the video to the client at the same time. Similar to Web caching (see Section 2.2.5), when a cluster's storage becomes full, it removes videos that are not frequently requested.

CASE STUDY

GOOGLE'S NETWORK INFRASTRUCTURE

To support its vast array of cloud services---including search, Gmail, calendar, YouTube video, maps, documents, and social networks---Google has deployed an extensive private network and CDN infrastructure. Google's CDN infrastructure has three tiers of server clusters:

Fourteen "mega data centers," with eight in North America, four in Europe, and two in Asia [Google Locations 2016], with each data center having on the order of 100,000 servers. These mega data centers are responsible for serving dynamic (and often personalized) content, including search results and Gmail messages.

An estimated 50 clusters in IXPs scattered throughout the world, with each cluster consisting of on the order of 100-500 servers [Adhikari 2011a]. These clusters are responsible for serving static content, including YouTube videos [Adhikari 2011a].

Many hundreds of "enter-deep" clusters located within an access ISP. Here a cluster typically consists of tens of servers within a single rack. These enter-deep servers perform TCP splitting (see Section 3.7) and serve static content [Chen 2011], including the static portions of Web pages that embody search results.

All of these data centers and cluster locations are networked together with Google's own private network. When a user makes a search query, often the query is first sent over the local ISP to a nearby enter-deep cache, from where the static content is retrieved; while providing the static content to the client, the nearby cache also forwards the query over Google's private network to one of the mega data centers, from where the personalized search results are retrieved. For a YouTube video, the video itself may come from one of the bring-home caches, whereas portions of the Web page surrounding the video may come from the nearby enter-deep cache, and the advertisements surrounding the video come from the data centers. In summary, except for the local ISPs, the Google cloud services are largely provided by a network infrastructure that is independent of the public Internet.

CDN Operation

Having identified the two major approaches toward deploying a CDN, let's now dive down into the nuts and bolts of how a CDN operates. When a browser in a user's host is instructed to retrieve a specific video (identified by a URL), the CDN must intercept the request so that it can (1) determine a suitable CDN server cluster for that client at that time, and (2) redirect the client's request to a server in that cluster. We'll shortly discuss how a CDN can determine a suitable cluster. But first let's examine the mechanics behind intercepting and redirecting a request. Most CDNs take advantage of DNS to intercept and redirect requests; an interesting discussion of such a use of the DNS is [Vixie 2009].
Let's consider a simple example to illustrate how the DNS is typically involved. Suppose a content provider, NetCinema, employs the third-party CDN company, KingCDN, to distribute its videos to its customers. On the NetCinema Web pages, each of its videos is assigned a URL that includes the string "video" and a unique identifier for the video itself; for example, Transformers 7 might be assigned http://video.netcinema.com/6Y7B23V. Six steps then occur, as shown in Figure 2.25:

1. The user visits the Web page at NetCinema.

2. When the user clicks on the link http://video.netcinema.com/6Y7B23V, the user's host sends a DNS query for video.netcinema.com.

3. The user's Local DNS Server (LDNS) relays the DNS query to an authoritative DNS server for NetCinema, which observes the string "video" in the hostname video.netcinema.com. To "hand over" the DNS query to KingCDN, instead of returning an IP address, the NetCinema authoritative DNS server returns to the LDNS a hostname in the KingCDN's domain, for example, a1105.kingcdn.com.

4. From this point on, the DNS query enters into KingCDN's private DNS infrastructure. The user's LDNS then sends a second query, now for a1105.kingcdn.com, and KingCDN's DNS system eventually returns the IP address of a KingCDN content server to the LDNS. It is thus here, within the KingCDN's DNS system, that the CDN server from which the client will receive its content is specified.

Figure 2.25 DNS redirects a user's request to a CDN server

5. The LDNS forwards the IP address of the content-serving CDN node to the user's host.

6. Once the client receives the IP address for a KingCDN content server, it establishes a direct TCP connection with the server at that IP address and issues an HTTP GET request for the video. If DASH is used, the server will first send to the client a manifest file with a list of URLs, one for each version of the video, and the client will dynamically select chunks from the different versions.

Cluster Selection Strategies

At the core of any CDN deployment is a cluster selection strategy, that is, a mechanism for dynamically directing clients to a server cluster or a data center within the CDN. As we just saw, the CDN learns the IP address of the client's LDNS server via the client's DNS lookup. After learning this IP address, the CDN needs to select an appropriate cluster based on this IP address. CDNs generally employ proprietary cluster selection strategies. We now briefly survey a few approaches, each of which has its own advantages and disadvantages.

One simple strategy is to assign the client to the cluster that is geographically closest. Using commercial geo-location databases (such as Quova [Quova 2016] and MaxMind [MaxMind 2016]), each LDNS IP address is mapped to a geographic location. When a DNS request is received from a particular LDNS, the CDN chooses the geographically closest cluster, that is, the cluster that is the fewest kilometers from the LDNS "as the bird flies." Such a solution can work reasonably well for a large fraction of the clients [Agarwal 2009]. However, for some clients, the solution may perform poorly, since the geographically closest cluster may not be the closest cluster in terms of the length or number of hops of the network path.
A further problem, inherent with all DNS-based approaches, is that some end users are configured to use remotely located LDNSs [Shaikh 2001; Mao 2002], in which case the LDNS location may be far from the client's location. Moreover, this simple strategy ignores the variation in delay and available bandwidth over time of Internet paths, always assigning the same cluster to a particular client. In order to determine the best cluster for a client based on the current traffic conditions, CDNs can instead perform periodic real-time measurements of delay and loss performance between their clusters and clients. For instance, a CDN can have each of its clusters periodically send probes (for example, ping messages or DNS queries) to all of the LDNSs around the world. One drawback of this approach is that many LDNSs are configured to not respond to such probes.

2.6.4 Case Studies: Netflix, YouTube, and Kankan

We conclude our discussion of streaming stored video by taking a look at three highly successful large-scale deployments: Netflix, YouTube, and Kankan. We'll see that each of these systems takes a very different approach, yet employs many of the underlying principles discussed in this section.

Netflix

Generating 37% of the downstream traffic in residential ISPs in North America in 2015, Netflix has become the leading service provider for online movies and TV series in the United States [Sandvine 2015]. As we discuss below, Netflix video distribution has two major components: the Amazon cloud and its own private CDN infrastructure. Netflix has a Web site that handles numerous functions, including user registration and login, billing, a movie catalogue for browsing and searching, and a movie recommendation system. As shown in Figure 2.26, this Web site (and its associated backend databases) runs entirely on Amazon servers in the Amazon cloud. Additionally, the Amazon cloud handles the following critical functions:

- Content ingestion. Before Netflix can distribute a movie to its customers, it must first ingest and process the movie. Netflix receives studio master versions of movies and uploads them to hosts in the Amazon cloud.
- Content processing. The machines in the Amazon cloud create many different formats for each movie, suitable for a diverse array of client video players running on desktop computers, smartphones, and game consoles connected to televisions. A different version is created for each of these formats and at multiple bit rates, allowing for adaptive streaming over HTTP using DASH.
- Uploading versions to its CDN. Once all of the versions of a movie have been created, the hosts in the Amazon cloud upload the versions to its CDN.

Figure 2.26 Netflix video streaming platform

When Netflix first rolled out its video streaming service in 2007, it employed three third-party CDN companies to distribute its video content. Netflix has since created its own private CDN, from which it now streams all of its videos. (Netflix still uses Akamai to distribute its Web pages, however.) To create its own CDN, Netflix has installed server racks both in IXPs and within residential ISPs themselves. Netflix currently has server racks in over 50 IXP locations; see [Netflix Open Connect 2016] for a current list of IXPs housing Netflix racks.
There are also hundreds of ISP locations housing Netflix racks; also see [Netflix Open Connect 2016], where Netflix provides to potential ISP partners instructions about installing a (free) Netflix rack for their networks. Each server in the rack has several 10 Gbps Ethernet ports and over 100 terabytes of storage. The number of servers in a rack varies: IXP installations often have tens of servers and contain the entire Netflix streaming video library, including multiple versions of the videos to support DASH; local ISPs may have only one server and contain only the most popular videos. Netflix does not use pull-caching (Section 2.2.5) to populate its CDN servers in the IXPs and ISPs. Instead, Netflix distributes by pushing the videos to its CDN servers during off-peak hours. For those locations that cannot hold the entire library, Netflix pushes only the most popular videos, which are determined on a day-to-day basis. The Netflix CDN design is described in some detail in the YouTube videos [Netflix Video 1] and [Netflix Video 2].

Having described the components of the Netflix architecture, let's take a closer look at the interaction between the client and the various servers that are involved in movie delivery. As indicated earlier, the Web pages for browsing the Netflix video library are served from servers in the Amazon cloud. When a user selects a movie to play, the Netflix software, running in the Amazon cloud, first determines which of its CDN servers have copies of the movie. Among the servers that have the movie, the software then determines the "best" server for that client request. If the client is using a residential ISP that has a Netflix CDN server rack installed in that ISP, and this rack has a copy of the requested movie, then a server in this rack is typically selected. If not, a server at a nearby IXP is typically selected.

Once Netflix determines the CDN server that is to deliver the content, it sends the client the IP address of the specific server as well as a manifest file, which has the URLs for the different versions of the requested movie. The client and that CDN server then directly interact using a proprietary version of DASH. Specifically, as described in Section 2.6.2, the client uses the byte-range header in HTTP GET request messages to request chunks from the different versions of the movie. Netflix uses chunks that are approximately four seconds long [Adhikari 2012]. While the chunks are being downloaded, the client measures the received throughput and runs a rate-determination algorithm to determine the quality of the next chunk to request.

Netflix embodies many of the key principles discussed earlier in this section, including adaptive streaming and CDN distribution. However, because Netflix uses its own private CDN, which distributes only video (and not Web pages), Netflix has been able to simplify and tailor its CDN design. In particular, Netflix does not need to employ DNS redirect, as discussed in Section 2.6.3, to connect a particular client to a CDN server; instead, the Netflix software (running in the Amazon cloud) directly tells the client to use a particular CDN server. Furthermore, the Netflix CDN uses push caching rather than pull caching (Section 2.2.5): content is pushed into the servers at scheduled times at off-peak hours, rather than dynamically during cache misses.
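The rate-determination algorithm itself is proprietary, but the mechanics just described (measure the throughput of each byte-range download, then request the highest-quality version the connection appears able to sustain) can be sketched in a few lines. Everything concrete below is an assumption for illustration: the version table a manifest might yield, the 0.8 safety factor, and the helper names are ours, not Netflix's.

```python
# Illustrative DASH-style adaptation over HTTP byte-range requests.
# The version table, safety factor, and server names are assumptions,
# not Netflix's actual protocol.
import time
from http.client import HTTPConnection

VERSIONS = {  # hypothetical manifest: bit rate (bps) -> URL path
    500_000: "/movie_500k.mp4",
    1_500_000: "/movie_1500k.mp4",
    3_000_000: "/movie_3000k.mp4",
}

def fetch_chunk(conn, path, first_byte, last_byte):
    """Fetch one chunk with a byte-range header; return (data, throughput in bps)."""
    t0 = time.time()
    conn.request("GET", path,
                 headers={"Range": "bytes=%d-%d" % (first_byte, last_byte)})
    body = conn.getresponse().read()
    elapsed = max(time.time() - t0, 1e-6)
    return body, 8 * len(body) / elapsed

def next_version(measured_bps, safety=0.8):
    """Pick the highest bit rate the measured throughput can sustain."""
    affordable = [rate for rate in VERSIONS if rate <= safety * measured_bps]
    return VERSIONS[max(affordable)] if affordable else VERSIONS[min(VERSIONS)]

# Example use (hypothetical server):
#   conn = HTTPConnection("cdn.example.net")
#   data, bps = fetch_chunk(conn, VERSIONS[500_000], 0, 999_999)
#   path = next_version(bps)  # quality for the next four-second chunk
```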
YouTube

With 300 hours of video uploaded to YouTube every minute and several billion video views per day [YouTube 2016], YouTube is indisputably the world's largest video-sharing site. YouTube began its service in April 2005 and was acquired by Google in November 2006. Although the Google/YouTube design and protocols are proprietary, through several independent measurement efforts we can gain a basic understanding of how YouTube operates [Zink 2009; Torres 2011; Adhikari 2011a]. As with Netflix, YouTube makes extensive use of CDN technology to distribute its videos [Torres 2011]. Similar to Netflix, Google uses its own private CDN to distribute YouTube videos, and has installed server clusters in many hundreds of different IXP and ISP locations. From these locations and directly from its huge data centers, Google distributes YouTube videos [Adhikari 2011a]. Unlike Netflix, however, Google uses pull caching, as described in Section 2.2.5, and DNS redirect, as described in Section 2.6.3. Most of the time, Google's cluster-selection strategy directs the client to the cluster for which the RTT between client and cluster is the lowest; however, in order to balance the load across clusters, sometimes the client is directed (via DNS) to a more distant cluster [Torres 2011].

YouTube employs HTTP streaming, often making a small number of different versions available for a video, each with a different bit rate and corresponding quality level. YouTube does not employ adaptive streaming (such as DASH), but instead requires the user to manually select a version. In order to save bandwidth and server resources that would be wasted by repositioning or early termination, YouTube uses the HTTP byte-range request to limit the flow of transmitted data after a target amount of video is prefetched.

Several million videos are uploaded to YouTube every day. Not only are YouTube videos streamed from server to client over HTTP, but YouTube uploaders also upload their videos from client to server over HTTP. YouTube processes each video it receives, converting it to a YouTube video format and creating multiple versions at different bit rates. This processing takes place entirely within Google data centers. (See the case study on Google's network infrastructure in Section 2.6.3.)

Kankan

We just saw that dedicated servers, operated by private CDNs, stream Netflix and YouTube videos to clients. Netflix and YouTube have to pay not only for the server hardware but also for the bandwidth the servers use to distribute the videos. Given the scale of these services and the amount of bandwidth they are consuming, such a CDN deployment can be costly. We conclude this section by describing an entirely different approach for providing video on demand over the Internet at a large scale---one that allows the service provider to significantly reduce its infrastructure and bandwidth costs. As you might suspect, this approach uses P2P delivery instead of (or along with) client-server delivery. Since 2011, Kankan (owned and operated by Xunlei) has been deploying P2P video delivery with great success, with tens of millions of users every month [Zhang 2015].

At a high level, P2P video streaming is very similar to BitTorrent file downloading. When a peer wants to see a video, it contacts a tracker to discover other peers in the system that have a copy of that video.
This requesting peer then requests chunks of the video in parallel from the other peers that have the video. Different from downloading with BitTorrent, however, requests are preferentially made for chunks that are to be played back in the near future, in order to ensure continuous playback [Dhungel 2012].

Recently, Kankan has migrated to a hybrid CDN-P2P streaming system [Zhang 2015]. Specifically, Kankan now deploys a few hundred servers within China and pushes video content to these servers. This Kankan CDN plays a major role in the start-up stage of video streaming. In most cases, the client requests the beginning of the content from CDN servers, and in parallel requests content from peers. When the total P2P traffic is sufficient for video playback, the client will cease streaming from the CDN and only stream from peers. But if the P2P streaming traffic becomes insufficient, the client will restart CDN connections and return to the mode of hybrid CDN-P2P streaming. In this manner, Kankan can ensure short initial start-up delays while minimally relying on costly infrastructure servers and bandwidth.

2.7 Socket Programming: Creating Network Applications

Now that we've looked at a number of important network applications, let's explore how network application programs are actually created. Recall from Section 2.1 that a typical network application consists of a pair of programs---a client program and a server program---residing in two different end systems. When these two programs are executed, a client process and a server process are created, and these processes communicate with each other by reading from, and writing to, sockets. When creating a network application, the developer's main task is therefore to write the code for both the client and server programs.

There are two types of network applications. One type is an implementation whose operation is specified in a protocol standard, such as an RFC or some other standards document; such an application is sometimes referred to as "open," since the rules specifying its operation are known to all. For such an implementation, the client and server programs must conform to the rules dictated by the RFC. For example, the client program could be an implementation of the client side of the HTTP protocol, described in Section 2.2 and precisely defined in RFC 2616; similarly, the server program could be an implementation of the HTTP server protocol, also precisely defined in RFC 2616. If one developer writes code for the client program and another developer writes code for the server program, and both developers carefully follow the rules of the RFC, then the two programs will be able to interoperate. Indeed, many of today's network applications involve communication between client and server programs that have been created by independent developers---for example, a Google Chrome browser communicating with an Apache Web server, or a BitTorrent client communicating with a BitTorrent tracker.

The other type of network application is a proprietary network application. In this case the client and server programs employ an application-layer protocol that has not been openly published in an RFC or elsewhere. A single developer (or development team) creates both the client and server programs, and the developer has complete control over what goes in the code.
But because the code does not implement an open protocol, other independent developers will not be able to develop code that interoperates with the application. In this section, we'll examine the key issues in developing a client-server application, and we'll "get our hands dirty" by looking at code that implements a very simple client-server application.

During the development phase, one of the first decisions the developer must make is whether the application is to run over TCP or over UDP. Recall that TCP is connection oriented and provides a reliable byte-stream channel through which data flows between two end systems. UDP is connectionless and sends independent packets of data from one end system to the other, without any guarantees about delivery. Recall also that when a client or server program implements a protocol defined by an RFC, it should use the well-known port number associated with the protocol; conversely, when developing a proprietary application, the developer must be careful to avoid using such well-known port numbers. (Port numbers were briefly discussed in Section 2.1. They are covered in more detail in Chapter 3.)

We introduce UDP and TCP socket programming by way of a simple UDP application and a simple TCP application. We present the simple UDP and TCP applications in Python 3. We could have written the code in Java, C, or C++, but we chose Python mostly because Python clearly exposes the key socket concepts. With Python there are fewer lines of code, and each line can be explained to the novice programmer without difficulty. But there's no need to be frightened if you are not familiar with Python. You should be able to easily follow the code if you have experience programming in Java, C, or C++. If you are interested in client-server programming with Java, you are encouraged to see the Companion Website for this textbook; in fact, you can find there all the examples in this section (and associated labs) in Java. For readers who are interested in client-server programming in C, there are several good references available [Donahoo 2001; Stevens 1997; Frost 1994; Kurose 1996]; our Python examples below have a similar look and feel to C.

2.7.1 Socket Programming with UDP

In this subsection, we'll write simple client-server programs that use UDP; in the following section, we'll write similar programs that use TCP. Recall from Section 2.1 that processes running on different machines communicate with each other by sending messages into sockets. We said that each process is analogous to a house and the process's socket is analogous to a door. The application resides on one side of the door in the house; the transport-layer protocol resides on the other side of the door in the outside world. The application developer has control of everything on the application-layer side of the socket but has little control of the transport-layer side.

Now let's take a closer look at the interaction between two communicating processes that use UDP sockets. Before the sending process can push a packet of data out the socket door, when using UDP, it must first attach a destination address to the packet. After the packet passes through the sender's socket, the Internet will use this destination address to route the packet through the Internet to the socket in the receiving process.
When the packet arrives at the receiving socket, the receiving process will retrieve the packet through the socket, and then inspect the packet's contents and take appropriate action. So you may now be wondering, what goes into the destination address that is attached to the packet?

As you might expect, the destination host's IP address is part of the destination address. By including the destination IP address in the packet, the routers in the Internet will be able to route the packet through the Internet to the destination host. But because a host may be running many network application processes, each with one or more sockets, it is also necessary to identify the particular socket in the destination host. When a socket is created, an identifier, called a port number, is assigned to it. So, as you might expect, the packet's destination address also includes the socket's port number. In summary, the sending process attaches to the packet a destination address, which consists of the destination host's IP address and the destination socket's port number. Moreover, as we shall soon see, the sender's source address---consisting of the IP address of the source host and the port number of the source socket---is also attached to the packet. However, attaching the source address to the packet is typically not done by the UDP application code; instead it is automatically done by the underlying operating system.

We'll use the following simple client-server application to demonstrate socket programming for both UDP and TCP:

1. The client reads a line of characters (data) from its keyboard and sends the data to the server.
2. The server receives the data and converts the characters to uppercase.
3. The server sends the modified data to the client.
4. The client receives the modified data and displays the line on its screen.

Figure 2.27 highlights the main socket-related activity of the client and server that communicate over the UDP transport service.

Figure 2.27 The client-server application using UDP

Now let's get our hands dirty and take a look at the client-server program pair for a UDP implementation of this simple application. We also provide a detailed, line-by-line analysis after each program. We'll begin with the UDP client, which will send a simple application-level message to the server. In order for the server to be able to receive and reply to the client's message, it must be ready and running---that is, it must be running as a process before the client sends its message.

The client program is called UDPClient.py, and the server program is called UDPServer.py. In order to emphasize the key issues, we intentionally provide code that is minimal. "Good code" would certainly have a few more auxiliary lines, in particular for handling error cases. For this application, we have arbitrarily chosen 12000 for the server port number.

UDPClient.py

Here is the code for the client side of the application:

```python
from socket import *
serverName = 'hostname'
serverPort = 12000
clientSocket = socket(AF_INET, SOCK_DGRAM)
message = input('Input lowercase sentence:')
clientSocket.sendto(message.encode(), (serverName, serverPort))
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
print(modifiedMessage.decode())
clientSocket.close()
```

Now let's take a look at the various lines of code in UDPClient.py.

```python
from socket import *
```

The socket module forms the basis of all network communications in Python.
By including this line, we will be able to create sockets within our program.

```python
serverName = 'hostname'
serverPort = 12000
```

The first line sets the variable serverName to the string 'hostname'. Here, we provide a string containing either the IP address of the server (e.g., "128.138.32.126") or the hostname of the server (e.g., "cis.poly.edu"). If we use the hostname, then a DNS lookup will automatically be performed to get the IP address. The second line sets the integer variable serverPort to 12000.

```python
clientSocket = socket(AF_INET, SOCK_DGRAM)
```

This line creates the client's socket, called clientSocket. The first parameter indicates the address family; in particular, AF_INET indicates that the underlying network is using IPv4. (Do not worry about this now---we will discuss IPv4 in Chapter 4.) The second parameter indicates that the socket is of type SOCK_DGRAM, which means it is a UDP socket (rather than a TCP socket). Note that we are not specifying the port number of the client socket when we create it; we are instead letting the operating system do this for us. Now that the client process's door has been created, we will want to create a message to send through the door.

```python
message = input('Input lowercase sentence:')
```

input() is a built-in function in Python 3. When this command is executed, the user at the client is prompted with the words "Input lowercase sentence:" The user then uses her keyboard to input a line, which is put into the variable message. Now that we have a socket and a message, we will want to send the message through the socket to the destination host.

```python
clientSocket.sendto(message.encode(), (serverName, serverPort))
```

In the above line, we first convert the message from string type to byte type, as we need to send bytes into a socket; this is done with the encode() method. The method sendto() attaches the destination address (serverName, serverPort) to the message and sends the resulting packet into the process's socket, clientSocket. (As mentioned earlier, the source address is also attached to the packet, although this is done automatically rather than explicitly by the code.) Sending a client-to-server message via a UDP socket is that simple! After sending the packet, the client waits to receive data from the server.

```python
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
```

With the above line, when a packet arrives from the Internet at the client's socket, the packet's data is put into the variable modifiedMessage and the packet's source address is put into the variable serverAddress. The variable serverAddress contains both the server's IP address and the server's port number. The program UDPClient doesn't actually need this server address information, since it already knows the server address from the outset; but this line of Python provides the server address nevertheless. The method recvfrom also takes the buffer size 2048 as input. (This buffer size works for most purposes.)

```python
print(modifiedMessage.decode())
```

This line prints out modifiedMessage on the user's display, after converting the message from bytes to string. It should be the original line that the user typed, but now capitalized.

```python
clientSocket.close()
```

This line closes the socket. The process then terminates.
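As noted earlier, this code is intentionally minimal, and "good code" would handle error cases. Since UDP makes no delivery guarantee, one natural refinement (our own sketch, not part of the book's example) is to set a timeout on the socket and retransmit if no reply arrives:

```python
# Sketch of UDPClient.py with a timeout and retries; the one-second
# timeout and three attempts are arbitrary illustrative choices.
from socket import socket, AF_INET, SOCK_DGRAM, timeout

serverName = 'hostname'  # as before, the server's hostname or IP address
serverPort = 12000
message = input('Input lowercase sentence:')

clientSocket = socket(AF_INET, SOCK_DGRAM)
clientSocket.settimeout(1.0)  # wait at most one second for the reply
for attempt in range(3):
    clientSocket.sendto(message.encode(), (serverName, serverPort))
    try:
        modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
        print(modifiedMessage.decode())
        break
    except timeout:  # no reply in time; the request or reply may be lost
        print('No reply on attempt %d' % (attempt + 1))
clientSocket.close()
```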
UDPServer.py

Let's now take a look at the server side of the application:

```python
from socket import *
serverPort = 12000
serverSocket = socket(AF_INET, SOCK_DGRAM)
serverSocket.bind(('', serverPort))
print("The server is ready to receive")
while True:
    message, clientAddress = serverSocket.recvfrom(2048)
    modifiedMessage = message.decode().upper()
    serverSocket.sendto(modifiedMessage.encode(), clientAddress)
```

Note that the beginning of UDPServer is similar to UDPClient. It also imports the socket module, also sets the integer variable serverPort to 12000, and also creates a socket of type SOCK_DGRAM (a UDP socket). The first line of code that is significantly different from UDPClient is:

```python
serverSocket.bind(('', serverPort))
```

The above line binds (that is, assigns) the port number 12000 to the server's socket. Thus in UDPServer, the code (written by the application developer) is explicitly assigning a port number to the socket. In this manner, when anyone sends a packet to port 12000 at the IP address of the server, that packet will be directed to this socket. UDPServer then enters a while loop; the while loop will allow UDPServer to receive and process packets from clients indefinitely. In the while loop, UDPServer waits for a packet to arrive.

```python
message, clientAddress = serverSocket.recvfrom(2048)
```

This line of code is similar to what we saw in UDPClient. When a packet arrives at the server's socket, the packet's data is put into the variable message and the packet's source address is put into the variable clientAddress. The variable clientAddress contains both the client's IP address and the client's port number. Here, UDPServer will make use of this address information, as it provides a return address, similar to the return address with ordinary postal mail. With this source address information, the server now knows to where it should direct its reply.

```python
modifiedMessage = message.decode().upper()
```

This line is the heart of our simple application. It takes the line sent by the client and, after converting the message to a string, uses the method upper() to capitalize it.

```python
serverSocket.sendto(modifiedMessage.encode(), clientAddress)
```

This last line attaches the client's address (IP address and port number) to the capitalized message (after converting the string to bytes), and sends the resulting packet into the server's socket. (As mentioned earlier, the server address is also attached to the packet, although this is done automatically rather than explicitly by the code.) The Internet will then deliver the packet to this client address. After the server sends the packet, it remains in the while loop, waiting for another UDP packet to arrive (from any client running on any host).

To test the pair of programs, you place UDPClient.py on one host and UDPServer.py on another host. Be sure to include the proper hostname or IP address of the server in UDPClient.py. Next, you run UDPServer.py in the server host. This creates a process in the server that idles until it is contacted by some client. Then you run UDPClient.py in the client host. This creates a process in the client. Finally, to use the application at the client, you type a sentence followed by a carriage return.

To develop your own UDP client-server application, you can begin by slightly modifying the client or server programs. For example, instead of converting all the letters to uppercase, the server could count the number of times the letter s appears and return this number, as in the sketch below.
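Here is one way that counting version might look (a sketch of the modification just suggested, not code from the book); only the body of the while loop changes:

```python
# Variant of UDPServer.py: reply with the number of times 's' appears.
from socket import socket, AF_INET, SOCK_DGRAM

serverPort = 12000
serverSocket = socket(AF_INET, SOCK_DGRAM)
serverSocket.bind(('', serverPort))
print('The counting server is ready to receive')
while True:
    message, clientAddress = serverSocket.recvfrom(2048)
    count = message.decode().count('s')  # count occurrences of the letter s
    serverSocket.sendto(str(count).encode(), clientAddress)
```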
Or you can modify the client so that, after receiving a capitalized sentence, the user can continue to send more sentences to the server.

2.7.2 Socket Programming with TCP

Unlike UDP, TCP is a connection-oriented protocol. This means that before the client and server can start to send data to each other, they first need to handshake and establish a TCP connection. One end of the TCP connection is attached to the client socket and the other end is attached to a server socket. When creating the TCP connection, we associate with it the client socket address (IP address and port number) and the server socket address (IP address and port number). With the TCP connection established, when one side wants to send data to the other side, it just drops the data into the TCP connection via its socket. This is different from UDP, for which the sender must attach a destination address to the packet before dropping it into the socket.

Now let's take a closer look at the interaction of client and server programs in TCP. The client has the job of initiating contact with the server. In order for the server to be able to react to the client's initial contact, the server has to be ready. This implies two things. First, as in the case of UDP, the TCP server must be running as a process before the client attempts to initiate contact. Second, the server program must have a special door---more precisely, a special socket---that welcomes some initial contact from a client process running on an arbitrary host. Using our house/door analogy for a process/socket, we will sometimes refer to the client's initial contact as "knocking on the welcoming door."

With the server process running, the client process can initiate a TCP connection to the server. This is done in the client program by creating a TCP socket. When the client creates its TCP socket, it specifies the address of the welcoming socket in the server, namely, the IP address of the server host and the port number of the socket. After creating its socket, the client initiates a three-way handshake and establishes a TCP connection with the server. The three-way handshake, which takes place within the transport layer, is completely invisible to the client and server programs.

During the three-way handshake, the client process knocks on the welcoming door of the server process. When the server "hears" the knocking, it creates a new door---more precisely, a new socket that is dedicated to that particular client. In our example below, the welcoming door is a TCP socket object that we call serverSocket; the newly created socket dedicated to the client making the connection is called connectionSocket. Students who are encountering TCP sockets for the first time sometimes confuse the welcoming socket (which is the initial point of contact for all clients wanting to communicate with the server) and each newly created server-side connection socket that is subsequently created for communicating with each client.

From the application's perspective, the client's socket and the server's connection socket are directly connected by a pipe. As shown in Figure 2.28, the client process can send arbitrary bytes into its socket, and TCP guarantees that the server process will receive (through the connection socket) each byte in the order sent.
TCP thus provides a reliable service between the client and server processes. Furthermore, just as people can go in and out the same door, the client process not only sends bytes into but also receives bytes from its socket; similarly, the server process not only receives bytes from but also sends bytes into its connection socket.

We use the same simple client-server application to demonstrate socket programming with TCP: The client sends one line of data to the server, the server capitalizes the line and sends it back to the client. Figure 2.29 highlights the main socket-related activity of the client and server that communicate over the TCP transport service.

Figure 2.28 The TCPServer process has two sockets

Figure 2.29 The client-server application using TCP

TCPClient.py

Here is the code for the client side of the application:

```python
from socket import *
serverName = 'servername'
serverPort = 12000
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName, serverPort))
sentence = input('Input lowercase sentence:')
clientSocket.send(sentence.encode())
modifiedSentence = clientSocket.recv(1024)
print('From Server: ', modifiedSentence.decode())
clientSocket.close()
```

Let's now take a look at the various lines in the code that differ significantly from the UDP implementation. The first such line is the creation of the client socket.

```python
clientSocket = socket(AF_INET, SOCK_STREAM)
```

This line creates the client's socket, called clientSocket. The first parameter again indicates that the underlying network is using IPv4. The second parameter indicates that the socket is of type SOCK_STREAM, which means it is a TCP socket (rather than a UDP socket). Note that we are again not specifying the port number of the client socket when we create it; we are instead letting the operating system do this for us. Now the next line of code is very different from what we saw in UDPClient:

```python
clientSocket.connect((serverName, serverPort))
```

Recall that before the client can send data to the server (or vice versa) using a TCP socket, a TCP connection must first be established between the client and server. The above line initiates the TCP connection between the client and server. The parameter of the connect() method is the address of the server side of the connection. After this line of code is executed, the three-way handshake is performed and a TCP connection is established between the client and server.

```python
sentence = input('Input lowercase sentence:')
```

As with UDPClient, the above obtains a sentence from the user. The string sentence continues to gather characters until the user ends the line by typing a carriage return. The next line of code is also very different from UDPClient:

```python
clientSocket.send(sentence.encode())
```

The above line sends the sentence through the client's socket and into the TCP connection. Note that the program does not explicitly create a packet and attach the destination address to the packet, as was the case with UDP sockets. Instead the client program simply drops the bytes in the string sentence into the TCP connection. The client then waits to receive bytes from the server.

```python
modifiedSentence = clientSocket.recv(1024)
```

When characters arrive from the server, they get placed into the string modifiedSentence. Characters continue to accumulate in modifiedSentence until the line ends with a carriage return character.
After printing the capitalized sentence, we close the client's socket:

```python
clientSocket.close()
```

This last line closes the socket and, hence, closes the TCP connection between the client and the server. It causes TCP in the client to send a TCP message to TCP in the server (see Section 3.5).

TCPServer.py

Now let's take a look at the server program.

```python
from socket import *
serverPort = 12000
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', serverPort))
serverSocket.listen(1)
print('The server is ready to receive')
while True:
    connectionSocket, addr = serverSocket.accept()
    sentence = connectionSocket.recv(1024).decode()
    capitalizedSentence = sentence.upper()
    connectionSocket.send(capitalizedSentence.encode())
    connectionSocket.close()
```

Let's now take a look at the lines that differ significantly from UDPServer and TCPClient. As with TCPClient, the server creates a TCP socket with:

```python
serverSocket = socket(AF_INET, SOCK_STREAM)
```

Similar to UDPServer, we associate the server port number, serverPort, with this socket:

```python
serverSocket.bind(('', serverPort))
```

But with TCP, serverSocket will be our welcoming socket. After establishing this welcoming door, we will wait and listen for some client to knock on the door:

```python
serverSocket.listen(1)
```

This line has the server listen for TCP connection requests from the client. The parameter specifies the maximum number of queued connections (at least 1).

```python
connectionSocket, addr = serverSocket.accept()
```

When a client knocks on this door, the program invokes the accept() method for serverSocket, which creates a new socket in the server, called connectionSocket, dedicated to this particular client. The client and server then complete the handshaking, creating a TCP connection between the client's clientSocket and the server's connectionSocket. With the TCP connection established, the client and server can now send bytes to each other over the connection. With TCP, all bytes sent from one side are not only guaranteed to arrive at the other side but also guaranteed to arrive in order.

```python
connectionSocket.close()
```

In this program, after sending the modified sentence to the client, we close the connection socket. But since serverSocket remains open, another client can now knock on the door and send the server a sentence to modify.

This completes our discussion of socket programming in TCP. You are encouraged to run the two programs in two separate hosts, and also to modify them to achieve slightly different goals. You should compare the UDP program pair with the TCP program pair and see how they differ. You should also do many of the socket programming assignments described at the ends of Chapters 2, 4, and 9. Finally, we hope someday, after mastering these and more advanced socket programs, you will write your own popular network application, become very rich and famous, and remember the authors of this textbook!

2.8 Summary

In this chapter, we've studied the conceptual and the implementation aspects of network applications. We've learned about the ubiquitous client-server architecture adopted by many Internet applications and seen its use in the HTTP, SMTP, POP3, and DNS protocols. We've studied these important application-level protocols, and their associated applications (the Web, file transfer, e-mail, and DNS), in some detail. We've learned about the P2P architecture and how it is used in many applications.
We've also learned about streaming video, and how modern video distribution systems leverage CDNs. We've examined how the socket API can be used to build network applications. We've walked through the use of sockets for connection-oriented (TCP) and connectionless (UDP) end-to-end transport services. The first step in our journey down the layered network architecture is now complete!

At the very beginning of this book, in Section 1.1, we gave a rather vague, bare-bones definition of a protocol: "the format and the order of messages exchanged between two or more communicating entities, as well as the actions taken on the transmission and/or receipt of a message or other event." The material in this chapter, and in particular our detailed study of the HTTP, SMTP, POP3, and DNS protocols, has now added considerable substance to this definition. Protocols are a key concept in networking; our study of application protocols has now given us the opportunity to develop a more intuitive feel for what protocols are all about.

In Section 2.1, we described the service models that TCP and UDP offer to applications that invoke them. We took an even closer look at these service models when we developed simple applications that run over TCP and UDP in Section 2.7. However, we have said little about how TCP and UDP provide these service models. For example, we know that TCP provides a reliable data service, but we haven't said yet how it does so. In the next chapter we'll take a careful look at not only the what, but also the how and why of transport protocols. Equipped with knowledge about Internet application structure and application-level protocols, we're now ready to head further down the protocol stack and examine the transport layer in Chapter 3.

Homework Problems and Questions

Chapter 2 Review Questions

SECTION 2.1

R1. List five nonproprietary Internet applications and the application-layer protocols that they use.

R2. What is the difference between network architecture and application architecture?

R3. For a communication session between a pair of processes, which process is the client and which is the server?

R4. For a P2P file-sharing application, do you agree with the statement, "There is no notion of client and server sides of a communication session"? Why or why not?

R5. What information is used by a process running on one host to identify a process running on another host?

R6. Suppose you wanted to do a transaction from a remote client to a server as fast as possible. Would you use UDP or TCP? Why?

R7. Referring to Figure 2.4, we see that none of the applications listed in Figure 2.4 requires both no data loss and timing. Can you conceive of an application that requires no data loss and that is also highly time-sensitive?

R8. List the four broad classes of services that a transport protocol can provide. For each of the service classes, indicate if either UDP or TCP (or both) provides such a service.

R9. Recall that TCP can be enhanced with SSL to provide process-to-process security services, including encryption. Does SSL operate at the transport layer or the application layer? If the application developer wants TCP to be enhanced with SSL, what does the developer have to do?

SECTIONS 2.2–2.4

R10. What is meant by a handshaking protocol?

R11. Why do HTTP, SMTP, and POP3 run on top of TCP rather than on UDP?

R12. Consider an e-commerce site that wants to keep a purchase record for each of its customers. Describe how this can be done with cookies.
R13. Describe how Web caching can reduce the delay in receiving a requested object. Will Web caching reduce the delay for all objects requested by a user or for only some of the objects? Why?

R14. Telnet into a Web server and send a multiline request message. Include in the request message the If-modified-since: header line to force a response message with the 304 Not Modified status code.

R15. List several popular messaging apps. Do they use the same protocols as SMS?

R16. Suppose Alice, with a Web-based e-mail account (such as Hotmail or Gmail), sends a message to Bob, who accesses his mail from his mail server using POP3. Discuss how the message gets from Alice's host to Bob's host. Be sure to list the series of application-layer protocols that are used to move the message between the two hosts.

R17. Print out the header of an e-mail message you have recently received. How many Received: header lines are there? Analyze each of the header lines in the message.

R18. From a user's perspective, what is the difference between the download-and-delete mode and the download-and-keep mode in POP3?

R19. Is it possible for an organization's Web server and mail server to have exactly the same alias for a hostname (for example, foo.com)? What would be the type for the RR that contains the hostname of the mail server?

R20. Look over your received e-mails, and examine the header of a message sent from a user with a .edu e-mail address. Is it possible to determine from the header the IP address of the host from which the message was sent? Do the same for a message sent from a Gmail account.

SECTION 2.5

R21. In BitTorrent, suppose Alice provides chunks to Bob throughout a 30-second interval. Will Bob necessarily return the favor and provide chunks to Alice in this same interval? Why or why not?

R22. Consider a new peer Alice that joins BitTorrent without possessing any chunks. Without any chunks, she cannot become a top-four uploader for any of the other peers, since she has nothing to upload. How then will Alice get her first chunk?

R23. What is an overlay network? Does it include routers? What are the edges in the overlay network?

SECTION 2.6

R24. CDNs typically adopt one of two different server placement philosophies. Name and briefly describe them.

R25. Besides network-related considerations such as delay, loss, and bandwidth performance, there are other important factors that go into designing a CDN server selection strategy. What are they?

SECTION 2.7

R26. In Section 2.7, the UDP server described needed only one socket, whereas the TCP server needed two sockets. Why? If the TCP server were to support n simultaneous connections, each from a different client host, how many sockets would the TCP server need?

R27. For the client-server application over TCP described in Section 2.7, why must the server program be executed before the client program? For the client-server application over UDP, why may the client program be executed before the server program?

Problems

P1. True or false?

a. A user requests a Web page that consists of some text and three images. For this page, the client will send one request message and receive four response messages.

b. Two distinct Web pages (for example, www.mit.edu/research.html and www.mit.edu/students.html) can be sent over the same persistent connection.
c. With nonpersistent connections between browser and origin server, it is possible for a single TCP segment to carry two distinct HTTP request messages.

d. The Date: header in the HTTP response message indicates when the object in the response was last modified.

e. HTTP response messages never have an empty message body.

P2. SMS, iMessage, and WhatsApp are all smartphone real-time messaging systems. After doing some research on the Internet, for each of these systems write one paragraph about the protocols they use. Then write a paragraph explaining how they differ.

P3. Consider an HTTP client that wants to retrieve a Web document at a given URL. The IP address of the HTTP server is initially unknown. What transport and application-layer protocols besides HTTP are needed in this scenario?

P4. Consider the following string of ASCII characters that were captured by Wireshark when the browser sent an HTTP GET message (i.e., this is the actual content of an HTTP GET message). The characters <cr><lf> are carriage return and line-feed characters (that is, the italicized character string <cr> in the text below represents the single carriage-return character that was contained at that point in the HTTP header). Answer the following questions, indicating where in the HTTP GET message below you find the answer.

```
GET /cs453/index.html HTTP/1.1<cr><lf>Host: gaia.cs.umass.edu<cr><lf>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2)
Gecko/20040804 Netscape/7.2 (ax)<cr><lf>Accept: text/xml,
application/xml, application/xhtml+xml, text/html;q=0.9,
text/plain;q=0.8, image/png, */*;q=0.5<cr><lf>Accept-Language: en-us,
en;q=0.5<cr><lf>Accept-Encoding: zip, deflate<cr><lf>Accept-Charset:
ISO-8859-1, utf-8;q=0.7, *;q=0.7<cr><lf>Keep-Alive: 300<cr><lf>
Connection: keep-alive<cr><lf><cr><lf>
```

a. What is the URL of the document requested by the browser?

b. What version of HTTP is the browser running?

c. Does the browser request a non-persistent or a persistent connection?

d. What is the IP address of the host on which the browser is running?

e. What type of browser initiates this message? Why is the browser type needed in an HTTP request message?

P5. The text below shows the reply sent from the server in response to the HTTP GET message in the question above. Answer the following questions, indicating where in the message below you find the answer.
```
HTTP/1.1 200 OK<cr><lf>Date: Tue, 07 Mar 2008 12:39:45 GMT<cr><lf>
Server: Apache/2.0.52 (Fedora)<cr><lf>Last-Modified: Sat, 10 Dec 2005
18:27:46 GMT<cr><lf>ETag: "526c3-f22-a88a4c80"<cr><lf>Accept-Ranges:
bytes<cr><lf>Content-Length: 3874<cr><lf>Keep-Alive:
timeout=max=100<cr><lf>Connection: Keep-Alive<cr><lf>Content-Type:
text/html; charset=ISO-8859-1<cr><lf><cr><lf><!doctype html public
"-//w3c//dtd html 4.0 transitional//en"><lf><html><lf><head><lf>
<meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1"><lf><meta name="GENERATOR" content="Mozilla/4.79
[en] (Windows NT 5.0; U) Netscape]"><lf><title>CMPSCI 453 / 591 /
NTU-ST550A Spring 2005 homepage</title><lf></head><lf>
<much more document text following here (not shown)>
```

a. Was the server able to successfully find the document or not? What time was the document reply provided?

b. When was the document last modified?

c. How many bytes are there in the document being returned?

d. What are the first 5 bytes of the document being returned? Did the server agree to a persistent connection?

P6. Obtain the HTTP/1.1 specification (RFC 2616). Answer the following questions:

a. Explain the mechanism used for signaling between the client and server to indicate that a persistent connection is being closed. Can the client, the server, or both signal the close of a connection?

b. What encryption services are provided by HTTP?

c. Can a client open three or more simultaneous connections with a given server?

d. Either a server or a client may close a transport connection between them if either one detects the connection has been idle for some time. Is it possible that one side starts closing a connection while the other side is transmitting data via this connection? Explain.

P7. Suppose within your Web browser you click on a link to obtain a Web page. The IP address for the associated URL is not cached in your local host, so a DNS lookup is necessary to obtain the IP address. Suppose that n DNS servers are visited before your host receives the IP address from DNS; the successive visits incur an RTT of RTT1, . . ., RTTn. Further suppose that the Web page associated with the link contains exactly one object, consisting of a small amount of HTML text. Let RTT0 denote the RTT between the local host and the server containing the object. Assuming zero transmission time of the object, how much time elapses from when the client clicks on the link until the client receives the object?

P8. Referring to Problem P7, suppose the HTML file references eight very small objects on the same server. Neglecting transmission times, how much time elapses with

a. Non-persistent HTTP with no parallel TCP connections?

b. Non-persistent HTTP with the browser configured for 5 parallel connections?

c. Persistent HTTP?

P9. Consider Figure 2.12, for which there is an institutional network connected to the Internet.
Suppose that the average object size is 850,000 bits and that the average request rate from the institution's browsers to the origin servers is 16 requests per second. Also suppose that the amount of time it takes from when the router on the Internet side of the access link forwards an HTTP request until it receives the response is three seconds on average (see Section 2.2.5). Model the total average response time as the sum of the average access delay (that is, the delay from Internet router to institution router) and the average Internet delay. For the average access delay, use Δ/(1−Δβ), where Δ is the average time required to send an object over the access link and β is the arrival rate of objects to the access link.

a. Find the total average response time.

b. Now suppose a cache is installed in the institutional LAN. Suppose the miss rate is 0.4. Find the total response time.

P10. Consider a short, 10-meter link, over which a sender can transmit at a rate of 150 bits/sec in both directions. Suppose that packets containing data are 100,000 bits long, and packets containing only control (e.g., ACK or handshaking) are 200 bits long. Assume that N parallel connections each get 1/N of the link bandwidth. Now consider the HTTP protocol, and suppose that each downloaded object is 100 Kbits long, and that the initial downloaded object contains 10 referenced objects from the same sender. Would parallel downloads via parallel instances of non-persistent HTTP make sense in this case? Now consider persistent HTTP. Do you expect significant gains over the non-persistent case? Justify and explain your answer.

P11. Consider the scenario introduced in the previous problem. Now suppose that the link is shared by Bob with four other users. Bob uses parallel instances of non-persistent HTTP, and the other four users use non-persistent HTTP without parallel downloads.

a. Do Bob's parallel connections help him get Web pages more quickly? Why or why not?

b. If all five users open five parallel instances of non-persistent HTTP, then would Bob's parallel connections still be beneficial? Why or why not?

P12. Write a simple TCP program for a server that accepts lines of input from a client and prints the lines onto the server's standard output. (You can do this by modifying the TCPServer.py program in the text.) Compile and execute your program. On any other machine that contains a Web browser, set the proxy server in the browser to the host that is running your server program; also configure the port number appropriately. Your browser should now send its GET request messages to your server, and your server should display the messages on its standard output. Use this platform to determine whether your browser generates conditional GET messages for objects that are locally cached.

P13. What is the difference between MAIL FROM: in SMTP and From: in the mail message itself?

P14. How does SMTP mark the end of a message body? How about HTTP? Can HTTP use the same method as SMTP to mark the end of a message body? Explain.

P15. Read RFC 5321 for SMTP. What does MTA stand for? Consider the following received spam e-mail (modified from a real spam e-mail). Assuming only the originator of this spam e-mail is malicious and all other hosts are honest, identify the malicious host that has generated this spam e-mail.
```
From - Fri Nov 07 13:41:30 2008
Return-Path: <tennis5@pp33head.com>
Received: from barmail.cs.umass.edu (barmail.cs.umass.edu
  [128.119.240.3]) by cs.umass.edu (8.13.1/8.12.6) for
  <hg@cs.umass.edu>; Fri, 7 Nov 2008 13:27:10 -0500
Received: from asusus-4b96 (localhost [127.0.0.1]) by
  barmail.cs.umass.edu (Spam Firewall) for <hg@cs.umass.edu>;
  Fri, 7 Nov 2008 13:27:07 -0500 (EST)
Received: from asusus-4b96 ([58.88.21.177]) by barmail.cs.umass.edu
  for <hg@cs.umass.edu>; Fri, 07 Nov 2008 13:27:07 -0500 (EST)
Received: from [58.88.21.177] by inbnd55.exchangeddd.com;
  Sat, 8 Nov 2008 01:27:07 +0700
From: "Jonny" <tennis5@pp33head.com>
To: <hg@cs.umass.edu>
Subject: How to secure your savings
```

P16. Read the POP3 RFC, RFC 1939. What is the purpose of the UIDL POP3 command?

P17. Consider accessing your e-mail with POP3.

a. Suppose you have configured your POP mail client to operate in the download-and-delete mode. Complete the following transaction:

```
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: blah blah ...
S: ..........blah
S: .
?
?
```

b. Suppose you have configured your POP mail client to operate in the download-and-keep mode. Complete the following transaction:

```
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: blah blah ...
S: ..........blah
S: .
?
?
```

c. Suppose you have configured your POP mail client to operate in the download-and-keep mode. Using your transcript in part (b), suppose you retrieve messages 1 and 2, exit POP, and then five minutes later you again access POP to retrieve new e-mail. Suppose that in the five-minute interval no new messages have been sent to you. Provide a transcript of this second POP session.

P18.

a. What is a whois database?

b. Use various whois databases on the Internet to obtain the names of two DNS servers. Indicate which whois databases you used.

c. Use nslookup on your local host to send DNS queries to three DNS servers: your local DNS server and the two DNS servers you found in part (b). Try querying for Type A, NS, and MX reports. Summarize your findings.

d. Use nslookup to find a Web server that has multiple IP addresses. Does the Web server of your institution (school or company) have multiple IP addresses?

e. Use the ARIN whois database to determine the IP address range used by your university.

f. Describe how an attacker can use whois databases and the nslookup tool to perform reconnaissance on an institution before launching an attack.

g. Discuss why whois databases should be publicly available.

P19. In this problem, we use the useful dig tool available on Unix and Linux hosts to explore the hierarchy of DNS servers. Recall that in Figure 2.19, a DNS server in the DNS hierarchy delegates a DNS query to a DNS server lower in the hierarchy, by sending back to the DNS client the name of that lower-level DNS server. First read the man page for dig, and then answer the following questions.

a. Starting with a root DNS server (from one of the root servers [a-m].root-servers.net), initiate a sequence of queries for the IP address for your department's Web server by using dig. Show the list of the names of DNS servers in the delegation chain in answering your query.

b. Repeat part (a) for several popular Web sites, such as google.com, yahoo.com, or amazon.com.

P20. Suppose you can access the caches in the local DNS servers of your department.
Can you propose a way to roughly determine the Web servers (outside your department) that are most popular among the users in your department? Explain.

P21. Suppose that your department has a local DNS server for all computers in the department. You are an ordinary user (i.e., not a network/system administrator). Can you determine if an external Web site was likely accessed from a computer in your department a couple of seconds ago? Explain.

P22. Consider distributing a file of F=15 Gbits to N peers. The server has an upload rate of us=30 Mbps, and each peer has a download rate of di=2 Mbps and an upload rate of u. For N=10, 100, and 1,000 and u=300 Kbps, 700 Kbps, and 2 Mbps, prepare a chart giving the minimum distribution time for each of the combinations of N and u for both client-server distribution and P2P distribution. (A short computational sketch of these distribution-time formulas appears after P27 below.)

P23. Consider distributing a file of F bits to N peers using a client-server architecture. Assume a fluid model where the server can simultaneously transmit to multiple peers, transmitting to each peer at different rates, as long as the combined rate does not exceed us.

a. Suppose that us/N≤dmin. Specify a distribution scheme that has a distribution time of NF/us.

b. Suppose that us/N≥dmin. Specify a distribution scheme that has a distribution time of F/dmin.

c. Conclude that the minimum distribution time is in general given by max{NF/us, F/dmin}.

P24. Consider distributing a file of F bits to N peers using a P2P architecture. Assume a fluid model. For simplicity assume that dmin is very large, so that peer download bandwidth is never a bottleneck.

a. Suppose that us≤(us+u1+...+uN)/N. Specify a distribution scheme that has a distribution time of F/us.

b. Suppose that us≥(us+u1+...+uN)/N. Specify a distribution scheme that has a distribution time of NF/(us+u1+...+uN).

c. Conclude that the minimum distribution time is in general given by max{F/us, NF/(us+u1+...+uN)}.

P25. Consider an overlay network with N active peers, with each pair of peers having an active TCP connection. Additionally, suppose that the TCP connections pass through a total of M routers. How many nodes and edges are there in the corresponding overlay network?

P26. Suppose Bob joins a BitTorrent torrent, but he does not want to upload any data to any other peers (so-called free-riding).

a. Bob claims that he can receive a complete copy of the file that is shared by the swarm. Is Bob's claim possible? Why or why not?

b. Bob further claims that he can make his "free-riding" more efficient by using a collection of multiple computers (with distinct IP addresses) in the computer lab in his department. How can he do that?

P27. Consider a DASH system for which there are N video versions (at N different rates and qualities) and N audio versions (at N different rates and qualities). Suppose we want to allow the player to choose at any time any of the N video versions and any of the N audio versions.

a. If we create files so that the audio is mixed in with the video, so that the server sends only one media stream at a given time, how many files will the server need to store (each a different URL)?

b. If the server instead sends the audio and video streams separately and has the client synchronize the streams, how many files will the server need to store?
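
The following short Python sketch (our own illustration, not part of the original problem set; the variable names are ours) evaluates the minimum distribution times for the parameter combinations in P22, using the client-server bound max{NF/us, F/dmin} from P23 and the P2P bound from P24. Because peer download rates are finite in P22, the P2P time here also includes the F/dmin term.

    # Minimum distribution times for P22, using the bounds from P23 and P24.
    # All rates are in bits per second; the file size is in bits.
    F = 15e9     # 15 Gbits
    u_s = 30e6   # server upload rate, 30 Mbps
    d = 2e6      # per-peer download rate, 2 Mbps

    for N in (10, 100, 1000):
        for u in (300e3, 700e3, 2e6):   # per-peer upload rate
            t_cs = max(N * F / u_s, F / d)                        # client-server (P23)
            t_p2p = max(F / u_s, F / d, N * F / (u_s + N * u))    # P2P (P24)
            print(f"N={N:5d} u={u/1e3:6.0f} Kbps  "
                  f"client-server: {t_cs:8.0f} s  P2P: {t_p2p:8.0f} s")

For example, for N=10 and u=300 Kbps the sketch reports 7,500 seconds for both architectures, since the peer download rate of 2 Mbps is the bottleneck in both cases.
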
P28. Install and compile the Python programs TCPClient and UDPClient on one host and TCPServer and UDPServer on another host.

a. Suppose you run TCPClient before you run TCPServer. What happens? Why?

b. Suppose you run UDPClient before you run UDPServer. What happens? Why?

c. What happens if you use different port numbers for the client and server sides?

P29. Suppose that in UDPClient.py, after we create the socket, we add the line:

    clientSocket.bind(('', 5432))

Will it become necessary to change UDPServer.py? What are the port numbers for the sockets in UDPClient and UDPServer? What were they before making this change?

P30. Can you configure your browser to open multiple simultaneous connections to a Web site? What are the advantages and disadvantages of having a large number of simultaneous TCP connections?

P31. We have seen that Internet TCP sockets treat the data being sent as a byte stream but UDP sockets recognize message boundaries. What are one advantage and one disadvantage of a byte-oriented API versus having the API explicitly recognize and preserve application-defined message boundaries?

P32. What is the Apache Web server? How much does it cost? What functionality does it currently have? You may want to look at Wikipedia to answer this question.

Socket Programming Assignments

The Companion Website includes six socket programming assignments. The first four assignments are summarized below. The fifth assignment makes use of the ICMP protocol and is summarized at the end of Chapter 5. The sixth assignment employs multimedia protocols and is summarized at the end of Chapter 9. It is highly recommended that students complete several, if not all, of these assignments. Students can find full details of these assignments, as well as important snippets of the Python code, at the Web site www.pearsonhighered.com/cs-resources.

Assignment 1: Web Server

In this assignment, you will develop a simple Web server in Python that is capable of processing only one request. Specifically, your Web server will (i) create a connection socket when contacted by a client (browser); (ii) receive the HTTP request from this connection; (iii) parse the request to determine the specific file being requested; (iv) get the requested file from the server's file system; (v) create an HTTP response message consisting of the requested file preceded by header lines; and (vi) send the response over the TCP connection to the requesting browser. If a browser requests a file that is not present in your server, your server should return a "404 Not Found" error message. In the Companion Website, we provide the skeleton code for your server. Your job is to complete the code, run your server, and then test your server by sending requests from browsers running on different hosts. If you run your server on a host that already has a Web server running on it, then you should use a different port than port 80 for your Web server.

Assignment 2: UDP Pinger

In this programming assignment, you will write a client ping program in Python. Your client will send a simple ping message to a server, receive a corresponding pong message back from the server, and determine the delay between when the client sent the ping message and received the pong message. This delay is called the Round Trip Time (RTT). The functionality provided by the client and server is similar to the functionality provided by the standard ping program available in modern operating systems. However, standard ping programs use the Internet Control Message Protocol (ICMP) (which we will study in Chapter 5).
Here we will create a nonstandard (but simple!) UDP-based ping program. Your ping program is to send 10 ping messages to the target server over UDP. For each message, your client is to determine and print the RTT when the corresponding pong message is returned. Because UDP is an unreliable protocol, a packet sent by the client or server may be lost. For this reason, the client cannot wait indefinitely for a reply to a ping message. You should have the client wait up to one second for a reply from the server; if no reply is received, the client should assume that the packet was lost and print a message accordingly. In this assignment, you will be given the complete code for the server (available in the Companion Website). Your job is to write the client code, which will be very similar to the server code. It is recommended that you first study carefully the server code. You can then write your client code, liberally cutting and pasting lines from the server code.

Assignment 3: Mail Client

The goal of this programming assignment is to create a simple mail client that sends e-mail to any recipient. Your client will need to establish a TCP connection with a mail server (e.g., a Google mail server), dialogue with the mail server using the SMTP protocol, send an e-mail message to a recipient (e.g., your friend) via the mail server, and finally close the TCP connection with the mail server. For this assignment, the Companion Website provides the skeleton code for your client. Your job is to complete the code and test your client by sending e-mail to different user accounts. You may also try sending through different servers (for example, through a Google mail server and through your university mail server).

Assignment 4: Multi-Threaded Web Proxy

In this assignment, you will develop a Web proxy. When your proxy receives an HTTP request for an object from a browser, it generates a new HTTP request for the same object and sends it to the origin server. When the proxy receives the corresponding HTTP response with the object from the origin server, it creates a new HTTP response, including the object, and sends it to the client. This proxy will be multi-threaded, so that it will be able to handle multiple requests at the same time. For this assignment, the Companion Website provides the skeleton code for the proxy server. Your job is to complete the code, and then test it by having different browsers request Web objects via your proxy.

Wireshark Lab: HTTP

Having gotten our feet wet with the Wireshark packet sniffer in Lab 1, we're now ready to use Wireshark to investigate protocols in operation. In this lab, we'll explore several aspects of the HTTP protocol: the basic GET/reply interaction, HTTP message formats, retrieving large HTML files, retrieving HTML files with embedded URLs, persistent and non-persistent connections, and HTTP authentication and security. As is the case with all Wireshark labs, the full description of this lab is available at this book's Web site, www.pearsonhighered.com/cs-resources.

Wireshark Lab: DNS

In this lab, we take a closer look at the client side of the DNS, the protocol that translates Internet hostnames to IP addresses. Recall from Section 2.5 that the client's role in the DNS is relatively simple---a client sends a query to its local DNS server and receives a response back.
Much can go on under the covers, invisible to the DNS clients, as the hierarchical DNS servers communicate with each other to either recursively or iteratively resolve the client's DNS query. From the DNS client's standpoint, however, the protocol is quite simple---a query is formulated to the local DNS server and a response is received from that server. We observe DNS in action in this lab.

As is the case with all Wireshark labs, the full description of this lab is available at this book's Web site, www.pearsonhighered.com/cs-resources.

An Interview With... Marc Andreessen

Marc Andreessen is the co-creator of Mosaic, the Web browser that popularized the World Wide Web in 1993. Mosaic had a clean, easily understood interface and was the first browser to display images in-line with text. In 1994, Marc Andreessen and Jim Clark founded Netscape, whose browser was by far the most popular browser through the mid-1990s. Netscape also developed the Secure Sockets Layer (SSL) protocol and many Internet server products, including mail servers and SSL-based Web servers. He is now a co-founder and general partner of venture capital firm Andreessen Horowitz, overseeing portfolio development with holdings that include Facebook, Foursquare, Groupon, Jawbone, Twitter, and Zynga. He serves on numerous boards, including Bump, eBay, Glam Media, Facebook, and Hewlett-Packard. He holds a BS in Computer Science from the University of Illinois at Urbana-Champaign.

How did you become interested in computing? Did you always know that you wanted to work in information technology?

The video game and personal computing revolutions hit right when I was growing up---personal computing was the new technology frontier in the late '70s and early '80s. And it wasn't just Apple and the IBM PC, but hundreds of new companies like Commodore and Atari as well. I taught myself to program out of a book called "Instant Freeze-Dried BASIC" at age 10, and got my first computer (a TRS-80 Color Computer---look it up!) at age 12.

Please describe one or two of the most exciting projects you have worked on during your career. What were the biggest challenges?

Undoubtedly the most exciting project was the original Mosaic web browser in '92--'93---and the biggest challenge was getting anyone to take it seriously back then. At the time, everyone thought the interactive future would be delivered as "interactive television" by huge companies, not as the Internet by startups.

What excites you about the future of networking and the Internet? What are your biggest concerns?

The most exciting thing is the huge unexplored frontier of applications and services that programmers and entrepreneurs are able to explore---the Internet has unleashed creativity at a level that I don't think we've ever seen before. My biggest concern is the principle of unintended consequences---we don't always know the implications of what we do, such as the Internet being used by governments to run a new level of surveillance on citizens.

Is there anything in particular students should be aware of as Web technology advances?

The rate of change---the most important thing to learn is how to learn---how to flexibly adapt to changes in the specific technologies, and how to keep an open mind on the new opportunities and possibilities as you move through your career.

What people inspired you professionally?

Vannevar Bush, Ted Nelson, Doug Engelbart, Nolan Bushnell, Bill Hewlett and Dave Packard, Ken Olsen, Steve Jobs, Steve Wozniak, Andy Grove, Grace Hopper, Hedy Lamarr, Alan Turing, Richard Stallman.

What are your recommendations for students who want to pursue careers in computing and information technology?

Go as deep as you possibly can on understanding how technology is created, and then complement with learning how business works.

Can technology solve the world's problems?

No, but we advance the standard of living of people through economic growth, and most economic growth throughout history has come from technology---so that's as good as it gets.

Chapter 3 Transport Layer

Residing between the application and network layers, the transport layer is a central piece of the layered network architecture. It has the critical role of providing communication services directly to the application processes running on different hosts. The pedagogic approach we take in this chapter is to alternate between discussions of transport-layer principles and discussions of how these principles are implemented in existing protocols; as usual, particular emphasis will be given to Internet protocols, notably the TCP and UDP transport-layer protocols.

We'll begin by discussing the relationship between the transport and network layers. This sets the stage for examining the first critical function of the transport layer---extending the network layer's delivery service between two end systems to a delivery service between two application-layer processes running on the end systems. We'll illustrate this function in our coverage of the Internet's connectionless transport protocol, UDP. We'll then return to principles and confront one of the most fundamental problems in computer networking---how two entities can communicate reliably over a medium that may lose and corrupt data. Through a series of increasingly complicated (and realistic!) scenarios, we'll build up an array of techniques that transport protocols use to solve this problem. We'll then show how these principles are embodied in TCP, the Internet's connection-oriented transport protocol. We'll next move on to a second fundamentally important problem in networking---controlling the transmission rate of transport-layer entities in order to avoid, or recover from, congestion within the network. We'll consider the causes and consequences of congestion, as well as commonly used congestion-control techniques. After obtaining a solid understanding of the issues behind congestion control, we'll study TCP's approach to congestion control.

3.1 Introduction and Transport-Layer Services

In the previous two chapters we touched on the role of the transport layer and the services that it provides. Let's quickly review what we have already learned about the transport layer. A transport-layer protocol provides for logical communication between application processes running on different hosts. By logical communication, we mean that from an application's perspective, it is as if the hosts running the processes were directly connected; in reality, the hosts may be on opposite sides of the planet, connected via numerous routers and a wide range of link types. Application processes use the logical communication provided by the transport layer to send messages to each other, free from the worry of the details of the physical infrastructure used to carry these messages.
Figure 3.1 illustrates the notion of logical communication.

As shown in Figure 3.1, transport-layer protocols are implemented in the end systems but not in network routers. On the sending side, the transport layer converts the application-layer messages it receives from a sending application process into transport-layer packets, known as transport-layer segments in Internet terminology. This is done by (possibly) breaking the application messages into smaller chunks and adding a transport-layer header to each chunk to create the transport-layer segment. The transport layer then passes the segment to the network layer at the sending end system, where the segment is encapsulated within a network-layer packet (a datagram) and sent to the destination. It's important to note that network routers act only on the network-layer fields of the datagram; that is, they do not examine the fields of the transport-layer segment encapsulated within the datagram. On the receiving side, the network layer extracts the transport-layer segment from the datagram and passes the segment up to the transport layer. The transport layer then processes the received segment, making the data in the segment available to the receiving application. More than one transport-layer protocol may be available to network applications. For example, the Internet has two protocols---TCP and UDP. Each of these protocols provides a different set of transport-layer services to the invoking application.

3.1.1 Relationship Between Transport and Network Layers

Recall that the transport layer lies just above the network layer in the protocol stack. Whereas a transport-layer protocol provides logical communication between processes running on different hosts, a network-layer protocol provides logical communication between hosts. This distinction is subtle but important. Let's examine this distinction with the aid of a household analogy.

Figure 3.1 The transport layer provides logical rather than physical communication between application processes

Consider two houses, one on the East Coast and the other on the West Coast, with each house being home to a dozen kids. The kids in the East Coast household are cousins of the kids in the West Coast household. The kids in the two households love to write to each other---each kid writes each cousin every week, with each letter delivered by the traditional postal service in a separate envelope. Thus, each household sends 144 letters to the other household every week. (These kids would save a lot of money if they had e-mail!) In each of the households there is one kid---Ann in the West Coast house and Bill in the East Coast house---responsible for mail collection and mail distribution. Each week Ann visits all her brothers and sisters, collects the mail, and gives the mail to a postal-service mail carrier, who makes daily visits to the house. When letters arrive at the West Coast house, Ann also has the job of distributing the mail to her brothers and sisters. Bill has a similar job on the East Coast. In this example, the postal service provides logical communication between the two houses---the postal service moves mail from house to house, not from person to person. On the other hand, Ann and Bill provide logical communication among the cousins---Ann and Bill pick up mail from, and deliver mail to, their brothers and sisters.
Note that from the cousins' perspective, Ann and Bill are the mail service, even though Ann and Bill are only a part (the end-system part) of the end-to-end delivery process. This household example serves as a nice analogy for explaining how the transport layer relates to the network layer:

application messages = letters in envelopes
processes = cousins
hosts (also called end systems) = houses
transport-layer protocol = Ann and Bill
network-layer protocol = postal service (including mail carriers)

Continuing with this analogy, note that Ann and Bill do all their work within their respective homes; they are not involved, for example, in sorting mail in any intermediate mail center or in moving mail from one mail center to another. Similarly, transport-layer protocols live in the end systems. Within an end system, a transport protocol moves messages from application processes to the network edge (that is, the network layer) and vice versa, but it doesn't have any say about how the messages are moved within the network core. In fact, as illustrated in Figure 3.1, intermediate routers neither act on, nor recognize, any information that the transport layer may have added to the application messages.

Continuing with our family saga, suppose now that when Ann and Bill go on vacation, another cousin pair---say, Susan and Harvey---substitute for them and provide the household-internal collection and delivery of mail. Unfortunately for the two families, Susan and Harvey do not do the collection and delivery in exactly the same way as Ann and Bill. Being younger kids, Susan and Harvey pick up and drop off the mail less frequently and occasionally lose letters (which are sometimes chewed up by the family dog). Thus, the cousin-pair Susan and Harvey do not provide the same set of services (that is, the same service model) as Ann and Bill. In an analogous manner, a computer network may make available multiple transport protocols, with each protocol offering a different service model to applications.

The possible services that Ann and Bill can provide are clearly constrained by the possible services that the postal service provides. For example, if the postal service doesn't provide a maximum bound on how long it can take to deliver mail between the two houses (for example, three days), then there is no way that Ann and Bill can guarantee a maximum delay for mail delivery between any of the cousin pairs. In a similar manner, the services that a transport protocol can provide are often constrained by the service model of the underlying network-layer protocol. If the network-layer protocol cannot provide delay or bandwidth guarantees for transport-layer segments sent between hosts, then the transport-layer protocol cannot provide delay or bandwidth guarantees for application messages sent between processes. Nevertheless, certain services can be offered by a transport protocol even when the underlying network protocol doesn't offer the corresponding service at the network layer. For example, as we'll see in this chapter, a transport protocol can offer reliable data transfer service to an application even when the underlying network protocol is unreliable, that is, even when the network protocol loses, garbles, or duplicates packets.
As another example (which we'll explore in Chapter 8 when we discuss network security), a transport protocol can use encryption to guarantee that application messages are not read by intruders, even when the network layer cannot guarantee the confidentiality of transport-layer segments.

3.1.2 Overview of the Transport Layer in the Internet

Recall that the Internet makes two distinct transport-layer protocols available to the application layer. One of these protocols is UDP (User Datagram Protocol), which provides an unreliable, connectionless service to the invoking application. The second of these protocols is TCP (Transmission Control Protocol), which provides a reliable, connection-oriented service to the invoking application. When designing a network application, the application developer must specify one of these two transport protocols. As we saw in Section 2.7, the application developer selects between UDP and TCP when creating sockets.

To simplify terminology, we refer to the transport-layer packet as a segment. We mention, however, that the Internet literature (for example, the RFCs) also refers to the transport-layer packet for TCP as a segment but often refers to the packet for UDP as a datagram. But this same Internet literature also uses the term datagram for the network-layer packet! For an introductory book on computer networking such as this, we believe that it is less confusing to refer to both TCP and UDP packets as segments, and reserve the term datagram for the network-layer packet.

Before proceeding with our brief introduction of UDP and TCP, it will be useful to say a few words about the Internet's network layer. (We'll learn about the network layer in detail in Chapters 4 and 5.) The Internet's network-layer protocol has a name---IP, for Internet Protocol. IP provides logical communication between hosts. The IP service model is a best-effort delivery service. This means that IP makes its "best effort" to deliver segments between communicating hosts, but it makes no guarantees. In particular, it does not guarantee segment delivery, it does not guarantee orderly delivery of segments, and it does not guarantee the integrity of the data in the segments. For these reasons, IP is said to be an unreliable service. We also mention here that every host has at least one network-layer address, a so-called IP address. We'll examine IP addressing in detail in Chapter 4; for this chapter we need only keep in mind that each host has an IP address.

Having taken a glimpse at the IP service model, let's now summarize the service models provided by UDP and TCP. The most fundamental responsibility of UDP and TCP is to extend IP's delivery service between two end systems to a delivery service between two processes running on the end systems. Extending host-to-host delivery to process-to-process delivery is called transport-layer multiplexing and demultiplexing. We'll discuss transport-layer multiplexing and demultiplexing in the next section. UDP and TCP also provide integrity checking by including error-detection fields in their segments' headers. These two minimal transport-layer services---process-to-process data delivery and error checking---are the only two services that UDP provides! In particular, like IP, UDP is an unreliable service---it does not guarantee that data sent by one process will arrive intact (or at all!) to the destination process. UDP is discussed in detail in Section 3.3.
TCP, on the other hand, offers several additional services to applications. First and foremost, it provides reliable data transfer. Using flow control, sequence numbers, acknowledgments, and timers (techniques we'll explore in detail in this chapter), TCP ensures that data is delivered from sending process to receiving process, correctly and in order. TCP thus converts IP's unreliable service between end systems into a reliable data transport service between processes. TCP also provides congestion control. Congestion control is not so much a service provided to the invoking application as it is a service for the Internet as a whole, a service for the general good. Loosely speaking, TCP congestion control prevents any one TCP connection from swamping the links and routers between communicating hosts with an excessive amount of traffic. TCP strives to give each connection traversing a congested link an equal share of the link bandwidth. This is done by regulating the rate at which the sending sides of TCP connections can send traffic into the network. UDP traffic, on the other hand, is unregulated. An application using UDP transport can send at any rate it pleases, for as long as it pleases.

A protocol that provides reliable data transfer and congestion control is necessarily complex. We'll need several sections to cover the principles of reliable data transfer and congestion control, and additional sections to cover the TCP protocol itself. These topics are investigated in Sections 3.4 through 3.8. The approach taken in this chapter is to alternate between basic principles and the TCP protocol. For example, we'll first discuss reliable data transfer in a general setting and then discuss how TCP specifically provides reliable data transfer. Similarly, we'll first discuss congestion control in a general setting and then discuss how TCP performs congestion control. But before getting into all this good stuff, let's first look at transport-layer multiplexing and demultiplexing.

3.2 Multiplexing and Demultiplexing

In this section, we discuss transport-layer multiplexing and demultiplexing, that is, extending the host-to-host delivery service provided by the network layer to a process-to-process delivery service for applications running on the hosts. In order to keep the discussion concrete, we'll discuss this basic transport-layer service in the context of the Internet. We emphasize, however, that a multiplexing/demultiplexing service is needed for all computer networks.

At the destination host, the transport layer receives segments from the network layer just below. The transport layer has the responsibility of delivering the data in these segments to the appropriate application process running in the host. Let's take a look at an example. Suppose you are sitting in front of your computer, and you are downloading Web pages while running one FTP session and two Telnet sessions. You therefore have four network application processes running---two Telnet processes, one FTP process, and one HTTP process. When the transport layer in your computer receives data from the network layer below, it needs to direct the received data to one of these four processes. Let's now examine how this is done.

First recall from Section 2.7 that a process (as part of a network application) can have one or more sockets, doors through which data passes from the network to the process and through which data passes from the process to the network.
Thus, as shown in Figure 3.2, the transport layer in the receiving host does not actually deliver data directly to a process, but instead to an intermediary socket. Because at any given time there can be more than one socket in the receiving host, each socket has a unique identifier. The format of the identifier depends on whether the socket is a UDP or a TCP socket, as we'll discuss shortly.

Now let's consider how a receiving host directs an incoming transport-layer segment to the appropriate socket. Each transport-layer segment has a set of fields in the segment for this purpose. At the receiving end, the transport layer examines these fields to identify the receiving socket and then directs the segment to that socket. This job of delivering the data in a transport-layer segment to the correct socket is called demultiplexing. The job of gathering data chunks at the source host from different sockets, encapsulating each data chunk with header information (that will later be used in demultiplexing) to create segments, and passing the segments to the network layer is called multiplexing.

Figure 3.2 Transport-layer multiplexing and demultiplexing

Note that the transport layer in the middle host in Figure 3.2 must demultiplex segments arriving from the network layer below to either process P1 or P2 above; this is done by directing the arriving segment's data to the corresponding process's socket. The transport layer in the middle host must also gather outgoing data from these sockets, form transport-layer segments, and pass these segments down to the network layer. Although we have introduced multiplexing and demultiplexing in the context of the Internet transport protocols, it's important to realize that they are concerns whenever a single protocol at one layer (at the transport layer or elsewhere) is used by multiple protocols at the next higher layer.

To illustrate the demultiplexing job, recall the household analogy in the previous section. Each of the kids is identified by his or her name. When Bill receives a batch of mail from the mail carrier, he performs a demultiplexing operation by observing to whom the letters are addressed and then hand delivering the mail to his brothers and sisters. Ann performs a multiplexing operation when she collects letters from her brothers and sisters and gives the collected mail to the mail person.

Now that we understand the roles of transport-layer multiplexing and demultiplexing, let us examine how it is actually done in a host. From the discussion above, we know that transport-layer multiplexing requires (1) that sockets have unique identifiers, and (2) that each segment have special fields that indicate the socket to which the segment is to be delivered. These special fields, illustrated in Figure 3.3, are the source port number field and the destination port number field. (The UDP and TCP segments have other fields as well, as discussed in the subsequent sections of this chapter.) Each port number is a 16-bit number, ranging from 0 to 65535. The port numbers ranging from 0 to 1023 are called well-known port numbers and are restricted, which means that they are reserved for use by well-known application protocols such as HTTP (which uses port number 80) and FTP (which uses port number 21). The list of well-known port numbers is given in RFC 1700 and is updated at http://www.iana.org \[RFC 3232\].

Figure 3.3 Source and destination port-number fields in a transport-layer segment
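
To make these two 16-bit fields concrete, the following minimal Python sketch (our own illustration, not code from the book) packs and unpacks a source and a destination port number in network byte order, just as they appear at the front of a UDP or TCP header.

    import struct

    # Pack source port 19157 and destination port 80 as two 16-bit
    # fields in network (big-endian) byte order, as in Figure 3.3.
    header_start = struct.pack('!HH', 19157, 80)
    print(header_start.hex())      # '4ad50050'

    # A receiving host unpacks the same bytes in order to demultiplex:
    src_port, dst_port = struct.unpack('!HH', header_start)
    print(src_port, dst_port)      # 19157 80
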
When we develop a new application (such as the simple application developed in Section 2.7), we must assign the application a port number. It should now be clear how the transport layer could implement the demultiplexing service: Each socket in the host could be assigned a port number, and when a segment arrives at the host, the transport layer examines the destination port number in the segment and directs the segment to the corresponding socket. The segment's data then passes through the socket into the attached process. As we'll see, this is basically how UDP does it. However, we'll also see that multiplexing/demultiplexing in TCP is yet more subtle.

Connectionless Multiplexing and Demultiplexing

Recall from Section 2.7.1 that the Python program running in a host can create a UDP socket with the line

    clientSocket = socket(AF_INET, SOCK_DGRAM)

When a UDP socket is created in this manner, the transport layer automatically assigns a port number to the socket. In particular, the transport layer assigns a port number in the range 1024 to 65535 that is currently not being used by any other UDP port in the host. Alternatively, we can add a line into our Python program after we create the socket to associate a specific port number (say, 19157) to this UDP socket via the socket bind() method:

    clientSocket.bind(('', 19157))

If the application developer writing the code were implementing the server side of a "well-known protocol," then the developer would have to assign the corresponding well-known port number. Typically, the client side of the application lets the transport layer automatically (and transparently) assign the port number, whereas the server side of the application assigns a specific port number.

With port numbers assigned to UDP sockets, we can now precisely describe UDP multiplexing/demultiplexing. Suppose a process in Host A, with UDP port 19157, wants to send a chunk of application data to a process with UDP port 46428 in Host B. The transport layer in Host A creates a transport-layer segment that includes the application data, the source port number (19157), the destination port number (46428), and two other values (which will be discussed later, but are unimportant for the current discussion). The transport layer then passes the resulting segment to the network layer. The network layer encapsulates the segment in an IP datagram and makes a best-effort attempt to deliver the segment to the receiving host. If the segment arrives at the receiving Host B, the transport layer at the receiving host examines the destination port number in the segment (46428) and delivers the segment to its socket identified by port 46428. Note that Host B could be running multiple processes, each with its own UDP socket and associated port number. As UDP segments arrive from the network, Host B directs (demultiplexes) each segment to the appropriate socket by examining the segment's destination port number.

It is important to note that a UDP socket is fully identified by a two-tuple consisting of a destination IP address and a destination port number. As a consequence, if two UDP segments have different source IP addresses and/or source port numbers, but have the same destination IP address and destination port number, then the two segments will be directed to the same destination process via the same destination socket. You may be wondering now, what is the purpose of the source port number?
As shown in Figure 3.4, in the A-to-B segment the source port number serves as part of a "return address"---when B wants to send a segment back to A, the destination port in the B-to-A segment will take its value from the source port value of the A-to-B segment. (The complete return address is A's IP address and the source port number.) As an example, recall the UDP server program studied in Section 2.7. In UDPServer.py, the server uses the recvfrom() method to extract the client-side (source) port number from the segment it receives from the client; it then sends a new segment to the client, with the extracted source port number serving as the destination port number in this new segment.

Connection-Oriented Multiplexing and Demultiplexing

In order to understand TCP demultiplexing, we have to take a close look at TCP sockets and TCP connection establishment. One subtle difference between a TCP socket and a UDP socket is that a TCP socket is identified by a four-tuple: (source IP address, source port number, destination IP address, destination port number). Thus, when a TCP segment arrives from the network to a host, the host uses all four values to direct (demultiplex) the segment to the appropriate socket.

Figure 3.4 The inversion of source and destination port numbers

In particular, and in contrast with UDP, two arriving TCP segments with different source IP addresses or source port numbers will (with the exception of a TCP segment carrying the original connection-establishment request) be directed to two different sockets. To gain further insight, let's reconsider the TCP client-server programming example in Section 2.7.2: The TCP server application has a "welcoming socket" that waits for connection-establishment requests from TCP clients (see Figure 2.29) on port number 12000. The TCP client creates a socket and sends a connection-establishment request segment with the lines:

    clientSocket = socket(AF_INET, SOCK_STREAM)
    clientSocket.connect((serverName, 12000))

A connection-establishment request is nothing more than a TCP segment with destination port number 12000 and a special connection-establishment bit set in the TCP header (discussed in Section 3.5). The segment also includes a source port number that was chosen by the client. When the host operating system of the computer running the server process receives the incoming connection-request segment with destination port 12000, it locates the server process that is waiting to accept a connection on port number 12000. The server process then creates a new socket:

    connectionSocket, addr = serverSocket.accept()

Also, the transport layer at the server notes the following four values in the connection-request segment: (1) the source port number in the segment, (2) the IP address of the source host, (3) the destination port number in the segment, and (4) its own IP address. The newly created connection socket is identified by these four values; all subsequently arriving segments whose source port, source IP address, destination port, and destination IP address match these four values will be demultiplexed to this socket. With the TCP connection now in place, the client and server can now send data to each other.

The server host may support many simultaneous TCP connection sockets, with each socket attached to a process, and with each socket identified by its own four-tuple.
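
The following short sketch (our own illustration, built on the Python socket API used in Section 2.7.2; the port number follows the book's example) makes this concrete: each call to accept() returns a new connection socket, and the getsockname() and getpeername() methods expose the four-tuple that identifies it.

    from socket import socket, AF_INET, SOCK_STREAM

    serverSocket = socket(AF_INET, SOCK_STREAM)
    serverSocket.bind(('', 12000))    # all connection requests arrive on port 12000
    serverSocket.listen(5)

    while True:
        # Each accepted connection gets its own socket; the OS demultiplexes
        # arriving segments among these sockets using the full four-tuple.
        connectionSocket, addr = serverSocket.accept()
        print('new connection socket, four-tuple:',
              connectionSocket.getsockname(),   # (destination IP, destination port)
              connectionSocket.getpeername())   # (source IP, source port)
        connectionSocket.close()
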
When a TCP segment arrives at the host, all four fields (source IP address, source port, destination IP address, destination port) are used to direct (demultiplex) the segment to the appropriate socket.

FOCUS ON SECURITY

Port Scanning

We've seen that a server process waits patiently on an open port for contact by a remote client. Some ports are reserved for well-known applications (e.g., Web, FTP, DNS, and SMTP servers); other ports are used by convention by popular applications (e.g., Microsoft SQL Server 2000 listens for requests on UDP port 1434). Thus, if we determine that a port is open on a host, we may be able to map that port to a specific application running on the host. This is very useful for system administrators, who are often interested in knowing which network applications are running on the hosts in their networks. But attackers, in order to "case the joint," also want to know which ports are open on target hosts. If a host is found to be running an application with a known security flaw (e.g., a SQL server listening on port 1434 was subject to a buffer overflow, allowing a remote user to execute arbitrary code on the vulnerable host, a flaw exploited by the Slammer worm \[CERT 2003--04\]), then that host is ripe for attack. Determining which applications are listening on which ports is a relatively easy task. Indeed there are a number of public domain programs, called port scanners, that do just that. Perhaps the most widely used of these is nmap, freely available at http://nmap.org and included in most Linux distributions. For TCP, nmap sequentially scans ports, looking for ports that are accepting TCP connections. For UDP, nmap again sequentially scans ports, looking for UDP ports that respond to transmitted UDP segments. In both cases, nmap returns a list of open, closed, or unreachable ports. A host running nmap can attempt to scan any target host anywhere in the Internet. We'll revisit nmap in Section 3.5.6, when we discuss TCP connection management.

Figure 3.5 Two clients, using the same destination port number (80) to communicate with the same Web server application

The situation is illustrated in Figure 3.5, in which Host C initiates two HTTP sessions to server B, and Host A initiates one HTTP session to B. Hosts A and C and server B each have their own unique IP address---A, C, and B, respectively. Host C assigns two different source port numbers (26145 and 7532) to its two HTTP connections. Because Host A is choosing source port numbers independently of C, it might also assign a source port of 26145 to its HTTP connection. But this is not a problem---server B will still be able to correctly demultiplex the two connections having the same source port number, since the two connections have different source IP addresses.

Web Servers and TCP

Before closing this discussion, it's instructive to say a few additional words about Web servers and how they use port numbers. Consider a host running a Web server, such as an Apache Web server, on port 80. When clients (for example, browsers) send segments to the server, all segments will have destination port 80. In particular, both the initial connection-establishment segments and the segments carrying HTTP request messages will have destination port 80. As we have just described, the server distinguishes the segments from the different clients using source IP addresses and source port numbers.
Figure 3.5 shows a Web server that spawns a new process for +each connection. As shown in Figure 3.5, each of these processes has its +own connection socket through which HTTP requests arrive and HTTP +responses are sent. We mention, however, that there is not always a +one-to-one correspondence between connection sockets and processes. In +fact, today's high-performing Web servers often use only one process, +and create a new thread with a new connection socket for each new client +connection. (A thread can be viewed as a lightweight subprocess.) If you +did the first programming assignment in Chapter 2, you built a Web +server that does just this. For such a server, at any given time there +may be many connection sockets (with different identifiers) attached to +the same process. If the client and server are using persistent HTTP, +then throughout the duration of the persistent connection the client and +server exchange HTTP messages via the same server socket. However, if +the client and server use non-persistent HTTP, then a new TCP connection +is created and closed for every request/response, and hence a new socket +is created and later closed for every request/response. This frequent +creating and closing of sockets can severely impact the performance of a +busy Web server (although a number of operating system tricks can be +used to mitigate the problem). Readers interested in the operating +system issues surrounding persistent and non-persistent HTTP are +encouraged to see \[Nielsen 1997; Nahum 2002\]. Now that we've discussed +transport-layer multiplexing and demultiplexing, let's move on and +discuss one of the Internet's transport protocols, UDP. In the next +section we'll see that UDP adds little more to the network-layer +protocol than a multiplexing/demultiplexing service. + +3.3 Connectionless Transport: UDP In this section, we'll take a close +look at UDP, how it works, and what it does. We encourage you to refer +back to Section 2.1, which includes an overview of the UDP service +model, and to Section 2.7.1, which discusses socket programming using +UDP. To motivate our discussion about UDP, suppose you were interested +in designing a no-frills, bare-bones transport protocol. How might you +go about doing this? You might first consider using a vacuous transport +protocol. In particular, on the sending side, you might consider taking +the messages from the application process and passing them directly to +the network layer; and on the receiving side, you might consider taking +the messages arriving from the network layer and passing them directly +to the application process. But as we learned in the previous section, +we have to do a little more than nothing! At the very least, the +transport layer has to provide a multiplexing/demultiplexing service in +order to pass data between the network layer and the correct +application-level process. UDP, defined in \[RFC 768\], does just about +as little as a transport protocol can do. Aside from the +multiplexing/demultiplexing function and some light error checking, it +adds nothing to IP. In fact, if the application developer chooses UDP +instead of TCP, then the application is almost directly talking with IP. +UDP takes messages from the application process, attaches source and +destination port number fields for the multiplexing/demultiplexing +service, adds two other small fields, and passes the resulting segment +to the network layer. 
The network layer encapsulates the transport-layer segment into an IP datagram and then makes a best-effort attempt to deliver the segment to the receiving host. If the segment arrives at the receiving host, UDP uses the destination port number to deliver the segment's data to the correct application process. Note that with UDP there is no handshaking between sending and receiving transport-layer entities before sending a segment. For this reason, UDP is said to be connectionless.

DNS is an example of an application-layer protocol that typically uses UDP. When the DNS application in a host wants to make a query, it constructs a DNS query message and passes the message to UDP. Without performing any handshaking with the UDP entity running on the destination end system, the host-side UDP adds header fields to the message and passes the resulting segment to the network layer. The network layer encapsulates the UDP segment into a datagram and sends the datagram to a name server. The DNS application at the querying host then waits for a reply to its query. If it doesn't receive a reply (possibly because the underlying network lost the query or the reply), it might try resending the query, try sending the query to another name server, or inform the invoking application that it can't get a reply.

Now you might be wondering why an application developer would ever choose to build an application over UDP rather than over TCP. Isn't TCP always preferable, since TCP provides a reliable data transfer service, while UDP does not? The answer is no, as some applications are better suited for UDP for the following reasons:

Finer application-level control over what data is sent, and when. Under UDP, as soon as an application process passes data to UDP, UDP will package the data inside a UDP segment and immediately pass the segment to the network layer. TCP, on the other hand, has a congestion-control mechanism that throttles the transport-layer TCP sender when one or more links between the source and destination hosts become excessively congested. TCP will also continue to resend a segment until the receipt of the segment has been acknowledged by the destination, regardless of how long reliable delivery takes. Since real-time applications often require a minimum sending rate, do not want to overly delay segment transmission, and can tolerate some data loss, TCP's service model is not particularly well matched to these applications' needs. As discussed below, these applications can use UDP and implement, as part of the application, any additional functionality that is needed beyond UDP's no-frills segment-delivery service.

No connection establishment. As we'll discuss later, TCP uses a three-way handshake before it starts to transfer data. UDP just blasts away without any formal preliminaries. Thus UDP does not introduce any delay to establish a connection. This is probably the principal reason why DNS runs over UDP rather than TCP---DNS would be much slower if it ran over TCP. HTTP uses TCP rather than UDP, since reliability is critical for Web pages with text. But, as we briefly discussed in Section 2.2, the TCP connection-establishment delay in HTTP is an important contributor to the delays associated with downloading Web documents.
Indeed, the QUIC protocol (Quick UDP Internet Connection, \[Iyengar 2015\]), used in Google's Chrome browser, uses UDP as its underlying transport protocol and implements reliability in an application-layer protocol on top of UDP.

No connection state. TCP maintains connection state in the end systems. This connection state includes receive and send buffers, congestion-control parameters, and sequence and acknowledgment number parameters. We will see in Section 3.5 that this state information is needed to implement TCP's reliable data transfer service and to provide congestion control. UDP, on the other hand, does not maintain connection state and does not track any of these parameters. For this reason, a server devoted to a particular application can typically support many more active clients when the application runs over UDP rather than TCP.

Small packet header overhead. The TCP segment has 20 bytes of header overhead in every segment, whereas UDP has only 8 bytes of overhead.

Figure 3.6 lists popular Internet applications and the transport protocols that they use. As we expect, e-mail, remote terminal access, the Web, and file transfer run over TCP---all these applications need the reliable data transfer service of TCP. Nevertheless, many important applications run over UDP rather than TCP. For example, UDP is used to carry network management (SNMP; see Section 5.7) data. UDP is preferred to TCP in this case, since network management applications must often run when the network is in a stressed state---precisely when reliable, congestion-controlled data transfer is difficult to achieve. Also, as we mentioned earlier, DNS runs over UDP, thereby avoiding TCP's connection-establishment delays.

Figure 3.6 Popular Internet applications and their underlying transport protocols

As shown in Figure 3.6, both UDP and TCP are sometimes used today with multimedia applications, such as Internet phone, real-time video conferencing, and streaming of stored audio and video. We'll take a close look at these applications in Chapter 9. We just mention now that all of these applications can tolerate a small amount of packet loss, so that reliable data transfer is not absolutely critical for the application's success. Furthermore, real-time applications, like Internet phone and video conferencing, react very poorly to TCP's congestion control. For these reasons, developers of multimedia applications may choose to run their applications over UDP instead of TCP. When packet loss rates are low, and with some organizations blocking UDP traffic for security reasons (see Chapter 8), TCP becomes an increasingly attractive protocol for streaming media transport.

Although commonly done today, running multimedia applications over UDP is controversial. As we mentioned above, UDP has no congestion control. But congestion control is needed to prevent the network from entering a congested state in which very little useful work is done. If everyone were to start streaming high-bit-rate video without using any congestion control, there would be so much packet overflow at routers that very few UDP packets would successfully traverse the source-to-destination path. Moreover, the high loss rates induced by the uncontrolled UDP senders would cause the TCP senders (which, as we'll see, do decrease their sending rates in the face of congestion) to dramatically decrease their rates.
Thus, the lack of congestion control in UDP can result in high loss rates between a UDP sender and receiver, and the crowding out of TCP sessions---a potentially serious problem \[Floyd 1999\]. Many researchers have proposed new mechanisms to force all sources, including UDP sources, to perform adaptive congestion control \[Mahdavi 1997; Floyd 2000; Kohler 2006; RFC 4340\].

Before discussing the UDP segment structure, we mention that it is possible for an application to have reliable data transfer when using UDP. This can be done if reliability is built into the application itself (for example, by adding acknowledgment and retransmission mechanisms, such as those we'll study in the next section). We mentioned earlier that the QUIC protocol \[Iyengar 2015\] used in Google's Chrome browser implements reliability in an application-layer protocol on top of UDP. But this is a nontrivial task that would keep an application developer busy debugging for a long time. Nevertheless, building reliability directly into the application allows the application to "have its cake and eat it too." That is, application processes can communicate reliably without being subjected to the transmission-rate constraints imposed by TCP's congestion-control mechanism.

3.3.1 UDP Segment Structure

The UDP segment structure, shown in Figure 3.7, is defined in RFC 768. The application data occupies the data field of the UDP segment. For example, for DNS, the data field contains either a query message or a response message. For a streaming audio application, audio samples fill the data field. The UDP header has only four fields, each consisting of two bytes. As discussed in the previous section, the port numbers allow the destination host to pass the application data to the correct process running on the destination end system (that is, to perform the demultiplexing function). The length field specifies the number of bytes in the UDP segment (header plus data). An explicit length value is needed since the size of the data field may differ from one UDP segment to the next. The checksum is used by the receiving host to check whether errors have been introduced into the segment. In truth, the checksum is also calculated over a few of the fields in the IP header in addition to the UDP segment. But we ignore this detail in order to see the forest through the trees. We'll discuss the checksum calculation below. Basic principles of error detection are described in Section 6.2.

3.3.2 UDP Checksum

The UDP checksum provides for error detection. That is, the checksum is used to determine whether bits within the UDP segment have been altered (for example, by noise in the links or while stored in a router) as it moved from source to destination.

Figure 3.7 UDP segment structure

UDP at the sender side performs the 1s complement of the sum of all the 16-bit words in the segment, with any overflow encountered during the sum being wrapped around. This result is put in the checksum field of the UDP segment. Here we give a simple example of the checksum calculation. You can find details about efficient implementation of the calculation in RFC 1071 and performance over real data in \[Stone 1998; Stone 2000\].
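
The calculation is also easy to express in code. The short Python sketch below (our own illustration; the function name is ours, and it is not the optimized algorithm of RFC 1071) computes the 16-bit 1s-complement sum with overflow wrapped around, and can be used to verify the by-hand arithmetic that follows.

    def ones_complement_checksum(words):
        # 16-bit 1s-complement sum over a list of 16-bit words, with any
        # overflow wrapped around; returns the complemented result.
        total = 0
        for w in words:
            total += w
            total = (total & 0xFFFF) + (total >> 16)   # wrap overflow around
        return ~total & 0xFFFF                         # take the 1s complement

    words = [0b0110011001100000, 0b0101010101010101, 0b1000111100001100]
    print(format(ones_complement_checksum(words), '016b'))
    # prints 1011010100111101, matching the checksum computed by hand below
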
As an example, suppose that we have the following three 16-bit words:

    0110011001100000
    0101010101010101
    1000111100001100

The sum of the first two of these 16-bit words is

    0110011001100000
    0101010101010101
    1011101110110101

Adding the third word to the above sum gives

    1011101110110101
    1000111100001100
    0100101011000010

Note that this last addition had overflow, which was wrapped around. The 1s complement is obtained by converting all the 0s to 1s and converting all the 1s to 0s. Thus the 1s complement of the sum 0100101011000010 is 1011010100111101, which becomes the checksum. At the receiver, all four 16-bit words are added, including the checksum. If no errors are introduced into the packet, then clearly the sum at the receiver will be 1111111111111111. If one of the bits is a 0, then we know that errors have been introduced into the packet.

You may wonder why UDP provides a checksum in the first place, as many link-layer protocols (including the popular Ethernet protocol) also provide error checking. The reason is that there is no guarantee that all the links between source and destination provide error checking; that is, one of the links may use a link-layer protocol that does not provide error checking. Furthermore, even if segments are correctly transferred across a link, it's possible that bit errors could be introduced when a segment is stored in a router's memory. Given that neither link-by-link reliability nor in-memory error detection is guaranteed, UDP must provide error detection at the transport layer, on an end-end basis, if the end-end data transfer service is to provide error detection. This is an example of the celebrated end-end principle in system design \[Saltzer 1984\], which states that since certain functionality (error detection, in this case) must be implemented on an end-end basis: "functions placed at the lower levels may be redundant or of little value when compared to the cost of providing them at the higher level." Because IP is supposed to run over just about any layer-2 protocol, it is useful for the transport layer to provide error checking as a safety measure. Although UDP provides error checking, it does not do anything to recover from an error. Some implementations of UDP simply discard the damaged segment; others pass the damaged segment to the application with a warning.

That wraps up our discussion of UDP. We will soon see that TCP offers reliable data transfer to its applications as well as other services that UDP doesn't offer. Naturally, TCP is also more complex than UDP. Before discussing TCP, however, it will be useful to step back and first discuss the underlying principles of reliable data transfer.

3.4 Principles of Reliable Data Transfer

In this section, we consider the problem of reliable data transfer in a general context. This is appropriate since the problem of implementing reliable data transfer occurs not only at the transport layer, but also at the link layer and the application layer as well. The general problem is thus of central importance to networking. Indeed, if one had to identify a "top-ten" list of fundamentally important problems in all of networking, this would be a candidate to lead the list. In the next section we'll examine TCP and show, in particular, that TCP exploits many of the principles that we are about to describe. Figure 3.8 illustrates the framework for our study of reliable data transfer.
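The wraparound-and-complement arithmetic just illustrated is straightforward to express in code. The sketch below is a simple illustrative version, not an efficient one (RFC 1071 discusses far faster implementations); running it on the three example words reproduces the checksum computed above.

```python
def internet_checksum(data: bytes) -> int:
    if len(data) % 2:            # pad odd-length data with a zero byte
        data += b'\x00'
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add the next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # wrap any overflow around
    return ~total & 0xFFFF                         # take the 1s complement

# The three 16-bit words from the example above, written in hex:
words = bytes.fromhex('66605555 8f0c')
print(format(internet_checksum(words), '016b'))    # prints 1011010100111101
```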
Figure 3.8 Reliable data transfer: Service model and service implementation

The service abstraction provided to the upper-layer entities is that of a reliable channel through which data can be transferred. With a reliable channel, no transferred data bits are corrupted (flipped from 0 to 1, or vice versa) or lost, and all are delivered in the order in which they were sent. This is precisely the service model offered by TCP to the Internet applications that invoke it. It is the responsibility of a reliable data transfer protocol to implement this service abstraction. This task is made difficult by the fact that the layer below the reliable data transfer protocol may be unreliable. For example, TCP is a reliable data transfer protocol that is implemented on top of an unreliable (IP) end-to-end network layer. More generally, the layer beneath the two reliably communicating end points might consist of a single physical link (as in the case of a link-level data transfer protocol) or a global internetwork (as in the case of a transport-level protocol). For our purposes, however, we can view this lower layer simply as an unreliable point-to-point channel.

In this section, we will incrementally develop the sender and receiver sides of a reliable data transfer protocol, considering increasingly complex models of the underlying channel. For example, we'll consider what protocol mechanisms are needed when the underlying channel can corrupt bits or lose entire packets. One assumption we'll adopt throughout our discussion here is that packets will be delivered in the order in which they were sent, with some packets possibly being lost; that is, the underlying channel will not reorder packets. Figure 3.8(b) illustrates the interfaces for our data transfer protocol. The sending side of the data transfer protocol will be invoked from above by a call to rdt_send() . It will pass the data to be delivered to the upper layer at the receiving side. (Here rdt stands for reliable data transfer protocol and \_send indicates that the sending side of rdt is being called. The first step in developing any protocol is to choose a good name!) On the receiving side, rdt_rcv() will be called when a packet arrives from the receiving side of the channel. When the rdt protocol wants to deliver data to the upper layer, it will do so by calling deliver_data() . In the following we use the terminology "packet" rather than transport-layer "segment." Because the theory developed in this section applies to computer networks in general and not just to the Internet transport layer, the generic term "packet" is perhaps more appropriate here.

In this section we consider only the case of unidirectional data transfer, that is, data transfer from the sending to the receiving side. The case of reliable bidirectional (that is, full-duplex) data transfer is conceptually no more difficult but considerably more tedious to explain. Although we consider only unidirectional data transfer, it is important to note that the sending and receiving sides of our protocol will nonetheless need to transmit packets in both directions, as indicated in Figure 3.8. We will see shortly that, in addition to exchanging packets containing the data to be transferred, the sending and receiving sides of rdt will also need to exchange control packets back and forth. Both the send and receive sides of rdt send packets to the other side by a call to udt_send() (where udt stands for unreliable data transfer).
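To keep the upcoming protocol discussions concrete, we will accompany several of them with Python sketches built on a small scaffold of our own invention (not from the text's FSM notation), whose methods mirror the four interfaces of Figure 3.8(b); the channel and upper_layer objects are assumed stand-ins for the layers below and above.

```python
class RdtEndpoint:
    """Scaffolding whose methods mirror rdt_send(), rdt_rcv(),
    deliver_data(), and udt_send() from Figure 3.8(b)."""

    def __init__(self, channel, upper_layer):
        self.channel = channel          # the unreliable channel below us
        self.upper_layer = upper_layer  # the application above us

    def rdt_send(self, data):
        """Called from above on the sending side; accepts data for delivery."""
        raise NotImplementedError

    def rdt_rcv(self, packet):
        """Called from below when a packet arrives from the channel."""
        raise NotImplementedError

    def deliver_data(self, data):
        """Pass correctly received data up to the upper layer."""
        self.upper_layer.receive(data)

    def udt_send(self, packet):
        """Send a packet over the unreliable channel (udt = unreliable)."""
        self.channel.send(packet)
```

The protocol sketches that follow simply subclass this scaffold and fill in rdt_send() and rdt_rcv() for each protocol variant.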
3.4.1 Building a Reliable Data Transfer Protocol

We now step through a series of protocols, each one becoming more complex, arriving at a flawless, reliable data transfer protocol.

Reliable Data Transfer over a Perfectly Reliable Channel: rdt1.0

We first consider the simplest case, in which the underlying channel is completely reliable. The protocol itself, which we'll call rdt1.0 , is trivial. The finite-state machine (FSM) definitions for the rdt1.0 sender and receiver are shown in Figure 3.9. The FSM in Figure 3.9(a) defines the operation of the sender, while the FSM in Figure 3.9(b) defines the operation of the receiver. It is important to note that there are separate FSMs for the sender and for the receiver. The sender and receiver FSMs in Figure 3.9 each have just one state. The arrows in the FSM description indicate the transition of the protocol from one state to another. (Since each FSM in Figure 3.9 has just one state, a transition is necessarily from the one state back to itself; we'll see more complicated state diagrams shortly.) The event causing the transition is shown above the horizontal line labeling the transition, and the actions taken when the event occurs are shown below the horizontal line. When no action is taken on an event, or no event occurs and an action is taken, we'll use the symbol Λ below or above the horizontal, respectively, to explicitly denote the lack of an action or event. The initial state of the FSM is indicated by the dashed arrow. Although the FSMs in Figure 3.9 have but one state, the FSMs we will see shortly have multiple states, so it will be important to identify the initial state of each FSM.

The sending side of rdt simply accepts data from the upper layer via the rdt_send(data) event, creates a packet containing the data (via the action make_pkt(data) ) and sends the packet into the channel. In practice, the rdt_send(data) event would result from a procedure call (for example, to rdt_send() ) by the upper-layer application.

Figure 3.9 rdt1.0 -- A protocol for a completely reliable channel

On the receiving side, rdt receives a packet from the underlying channel via the rdt_rcv(packet) event, removes the data from the packet (via the action extract(packet, data) ) and passes the data up to the upper layer (via the action deliver_data(data) ). In practice, the rdt_rcv(packet) event would result from a procedure call (for example, to rdt_rcv() ) from the lower-layer protocol. In this simple protocol, there is no difference between a unit of data and a packet. Also, all packet flow is from the sender to receiver; with a perfectly reliable channel there is no need for the receiver side to provide any feedback to the sender since nothing can go wrong! Note that we have also assumed that the receiver is able to receive data as fast as the sender happens to send data. Thus, there is no need for the receiver to ask the sender to slow down!

Reliable Data Transfer over a Channel with Bit Errors: rdt2.0

A more realistic model of the underlying channel is one in which bits in a packet may be corrupted. Such bit errors typically occur in the physical components of a network as a packet is transmitted, propagates, or is buffered. We'll continue to assume for the moment that all transmitted packets are received (although their bits may be corrupted) in the order in which they were sent.
Before developing a protocol for reliably communicating over such a channel, first consider how people might deal with such a situation. Consider how you yourself might dictate a long message over the phone. In a typical scenario, the message taker might say "OK" after each sentence has been heard, understood, and recorded. If the message taker hears a garbled sentence, you're asked to repeat the garbled sentence. This message-dictation protocol uses both positive acknowledgments ("OK") and negative acknowledgments ("Please repeat that."). These control messages allow the receiver to let the sender know what has been received correctly, and what has been received in error and thus requires repeating. In a computer network setting, reliable data transfer protocols based on such retransmission are known as ARQ (Automatic Repeat reQuest) protocols. Fundamentally, three additional protocol capabilities are required in ARQ protocols to handle the presence of bit errors:

Error detection. First, a mechanism is needed to allow the receiver to detect when bit errors have occurred. Recall from the previous section that UDP uses the Internet checksum field for exactly this purpose. In Chapter 6 we'll examine error-detection and -correction techniques in greater detail; these techniques allow the receiver to detect and possibly correct packet bit errors. For now, we need only know that these techniques require that extra bits (beyond the bits of original data to be transferred) be sent from the sender to the receiver; these bits will be gathered into the packet checksum field of the rdt2.0 data packet.

Receiver feedback. Since the sender and receiver are typically executing on different end systems, possibly separated by thousands of miles, the only way for the sender to learn of the receiver's view of the world (in this case, whether or not a packet was received correctly) is for the receiver to provide explicit feedback to the sender. The positive (ACK) and negative (NAK) acknowledgment replies in the message-dictation scenario are examples of such feedback. Our rdt2.0 protocol will similarly send ACK and NAK packets back from the receiver to the sender. In principle, these packets need only be one bit long; for example, a 0 value could indicate a NAK and a value of 1 could indicate an ACK.

Retransmission. A packet that is received in error at the receiver will be retransmitted by the sender.

Figure 3.10 shows the FSM representation of rdt2.0 , a data transfer protocol employing error detection, positive acknowledgments, and negative acknowledgments.

Figure 3.10 rdt2.0 -- A protocol for a channel with bit errors

The send side of rdt2.0 has two states. In the leftmost state, the send-side protocol is waiting for data to be passed down from the upper layer. When the rdt_send(data) event occurs, the sender will create a packet ( sndpkt ) containing the data to be sent, along with a packet checksum (for example, as discussed in Section 3.3.2 for the case of a UDP segment), and then send the packet via the udt_send(sndpkt) operation. In the rightmost state, the sender protocol is waiting for an ACK or a NAK packet from the receiver. If an ACK packet is received (the notation rdt_rcv(rcvpkt) && isACK(rcvpkt) in Figure 3.10 corresponds to this event), the sender knows that the most recently transmitted packet has been received correctly and thus the protocol returns to the state of waiting for data from the upper layer.
If a NAK is received, the protocol retransmits the last packet and waits for an ACK or NAK to be returned by the receiver in response to the retransmitted data packet. It is important to note that when the sender is in the wait-for-ACK-or-NAK state, it cannot get more data from the upper layer; that is, the rdt_send() event cannot occur; that will happen only after the sender receives an ACK and leaves this state. Thus, the sender will not send a new piece of data until it is sure that the receiver has correctly received the current packet. Because of this behavior, protocols such as rdt2.0 are known as stop-and-wait protocols.

The receiver-side FSM for rdt2.0 still has a single state. On packet arrival, the receiver replies with either an ACK or a NAK, depending on whether or not the received packet is corrupted. In Figure 3.10, the notation rdt_rcv(rcvpkt) && corrupt(rcvpkt) corresponds to the event in which a packet is received and is found to be in error.

Protocol rdt2.0 may look as if it works but, unfortunately, it has a fatal flaw. In particular, we haven't accounted for the possibility that the ACK or NAK packet could be corrupted! (Before proceeding on, you should think about how this problem may be fixed.) Unfortunately, our slight oversight is not as innocuous as it may seem. Minimally, we will need to add checksum bits to ACK/NAK packets in order to detect such errors. The more difficult question is how the protocol should recover from errors in ACK or NAK packets. The difficulty here is that if an ACK or NAK is corrupted, the sender has no way of knowing whether or not the receiver has correctly received the last piece of transmitted data. Consider three possibilities for handling corrupted ACKs or NAKs:

For the first possibility, consider what a human might do in the message-dictation scenario. If the speaker didn't understand the "OK" or "Please repeat that" reply from the receiver, the speaker would probably ask, "What did you say?" (thus introducing a new type of sender-to-receiver packet to our protocol). The receiver would then repeat the reply. But what if the speaker's "What did you say?" is corrupted? The receiver, having no idea whether the garbled sentence was part of the dictation or a request to repeat the last reply, would probably then respond with "What did you say?" And then, of course, that response might be garbled. Clearly, we're heading down a difficult path.

A second alternative is to add enough checksum bits to allow the sender not only to detect, but also to recover from, bit errors. This solves the immediate problem for a channel that can corrupt packets but not lose them.

A third approach is for the sender simply to resend the current data packet when it receives a garbled ACK or NAK packet. This approach, however, introduces duplicate packets into the sender-to-receiver channel. The fundamental difficulty with duplicate packets is that the receiver doesn't know whether the ACK or NAK it last sent was received correctly at the sender. Thus, it cannot know a priori whether an arriving packet contains new data or is a retransmission!

A simple solution to this new problem (and one adopted in almost all existing data transfer protocols, including TCP) is to add a new field to the data packet and have the sender number its data packets by putting a sequence number into this field. The receiver then need only check this sequence number to determine whether or not the received packet is a retransmission.
For this simple case of a stop-and-wait protocol, a 1-bit sequence number will suffice, since it will allow the receiver to know whether the sender is resending the previously transmitted packet (the received packet has the same sequence number as the most recently received packet) or a new packet (the sequence number changes, moving "forward" in modulo-2 arithmetic). Since we are currently assuming a channel that does not lose packets, ACK and NAK packets do not themselves need to indicate the sequence number of the packet they are acknowledging. The sender knows that a received ACK or NAK packet (whether garbled or not) was generated in response to its most recently transmitted data packet.

Figures 3.11 and 3.12 show the FSM description for rdt2.1 , our fixed version of rdt2.0 . The rdt2.1 sender and receiver FSMs each now have twice as many states as before. This is because the protocol state must now reflect whether the packet currently being sent (by the sender) or expected (at the receiver) should have a sequence number of 0 or 1. Note that the actions in those states where a 0-numbered packet is being sent or expected are mirror images of those where a 1-numbered packet is being sent or expected; the only differences have to do with the handling of the sequence number.

Figure 3.11 rdt2.1 sender

Figure 3.12 rdt2.1 receiver

Protocol rdt2.1 uses both positive and negative acknowledgments from the receiver to the sender. When an out-of-order packet is received, the receiver sends a positive acknowledgment for the packet it has received. When a corrupted packet is received, the receiver sends a negative acknowledgment. We can accomplish the same effect as a NAK if, instead of sending a NAK, we send an ACK for the last correctly received packet. A sender that receives two ACKs for the same packet (that is, receives duplicate ACKs) knows that the receiver did not correctly receive the packet following the packet that is being ACKed twice. Our NAK-free reliable data transfer protocol for a channel with bit errors is rdt2.2 , shown in Figures 3.13 and 3.14. One subtle change between rdt2.1 and rdt2.2 is that the receiver must now include the sequence number of the packet being acknowledged by an ACK message (this is done by including the ACK , 0 or ACK , 1 argument in make_pkt() in the receiver FSM), and the sender must now check the sequence number of the packet being acknowledged by a received ACK message (this is done by including the 0 or 1 argument in isACK() in the sender FSM).

Figure 3.13 rdt2.2 sender

Reliable Data Transfer over a Lossy Channel with Bit Errors: rdt3.0

Suppose now that in addition to corrupting bits, the underlying channel can lose packets as well, a not-uncommon event in today's computer networks (including the Internet). Two additional concerns must now be addressed by the protocol: how to detect packet loss and what to do when packet loss occurs. The use of checksumming, sequence numbers, ACK packets, and retransmissions---the techniques already developed in rdt2.2 ---will allow us to answer the latter concern. Handling the first concern will require adding a new protocol mechanism. There are many possible approaches toward dealing with packet loss (several more of which are explored in the exercises at the end of the chapter). Here, we'll put the burden of detecting and recovering from lost packets on the sender.
Suppose that the sender transmits a data packet and either that packet, or the receiver's ACK of that packet, gets lost. In either case, no reply is forthcoming at the sender from the receiver. If the sender is willing to wait long enough so that it is certain that a packet has been lost, it can simply retransmit the data packet. You should convince yourself that this protocol does indeed work.

But how long must the sender wait to be certain that something has been lost? The sender must clearly wait at least as long as a round-trip delay between the sender and receiver (which may include buffering at intermediate routers) plus whatever amount of time is needed to process a packet at the receiver. In many networks, this worst-case maximum delay is very difficult even to estimate, much less know with certainty. Moreover, the protocol should ideally recover from packet loss as soon as possible; waiting for a worst-case delay could mean a long wait until error recovery is initiated. The approach thus adopted in practice is for the sender to judiciously choose a time value such that packet loss is likely, although not guaranteed, to have happened. If an ACK is not received within this time, the packet is retransmitted. Note that if a packet experiences a particularly large delay, the sender may retransmit the packet even though neither the data packet nor its ACK has been lost. This introduces the possibility of duplicate data packets in the sender-to-receiver channel. Happily, protocol rdt2.2 already has enough functionality (that is, sequence numbers) to handle the case of duplicate packets.

Figure 3.14 rdt2.2 receiver

From the sender's viewpoint, retransmission is a panacea. The sender does not know whether a data packet was lost, an ACK was lost, or if the packet or ACK was simply overly delayed. In all cases, the action is the same: retransmit. Implementing a time-based retransmission mechanism requires a countdown timer that can interrupt the sender after a given amount of time has expired. The sender will thus need to be able to (1) start the timer each time a packet (either a first-time packet or a retransmission) is sent, (2) respond to a timer interrupt (taking appropriate actions), and (3) stop the timer. Figure 3.15 shows the sender FSM for rdt3.0 , a protocol that reliably transfers data over a channel that can corrupt or lose packets; in the homework problems, you'll be asked to provide the receiver FSM for rdt3.0 .

Figure 3.15 rdt3.0 sender

Figure 3.16 shows how the protocol operates with no lost or delayed packets and how it handles lost data packets. In Figure 3.16, time moves forward from the top of the diagram toward the bottom of the diagram; note that a receive time for a packet is necessarily later than the send time for a packet as a result of transmission and propagation delays. In Figures 3.16(b)--(d), the send-side brackets indicate the times at which a timer is set and later times out. Several of the more subtle aspects of this protocol are explored in the exercises at the end of this chapter. Because packet sequence numbers alternate between 0 and 1, protocol rdt3.0 is sometimes known as the alternating-bit protocol.

We have now assembled the key elements of a data transfer protocol. Checksums, sequence numbers, timers, and positive and negative acknowledgment packets each play a crucial and necessary role in the operation of the protocol. We now have a working reliable data transfer protocol!
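Before moving on, here is a sketch of how the rdt3.0 sender FSM of Figure 3.15 might look in code, using the RdtEndpoint scaffold from earlier. The make_pkt, corrupt, and is_ack helpers are assumed to attach and check a checksum and the alternating sequence number, and the timer is an assumed countdown-timer object that calls timeout() when it expires; this is an illustrative rendering, not the text's own implementation.

```python
class Rdt30Sender(RdtEndpoint):
    """Alternating-bit sender: wait-for-call and wait-for-ACK states, seq 0 and 1."""

    def __init__(self, channel, upper_layer, timer):
        super().__init__(channel, upper_layer)
        self.timer = timer            # countdown timer; invokes self.timeout() on expiry
        self.seq = 0                  # sequence number alternates between 0 and 1
        self.waiting_for_ack = False
        self.sndpkt = None            # copy of the outstanding packet, for retransmission

    def rdt_send(self, data):
        if self.waiting_for_ack:
            return False              # stop-and-wait: refuse new data while waiting
        self.sndpkt = make_pkt(data, seq=self.seq)
        self.udt_send(self.sndpkt)
        self.timer.start()
        self.waiting_for_ack = True
        return True

    def rdt_rcv(self, packet):
        # Only an uncorrupted ACK for the current sequence number advances the FSM;
        # corrupted or duplicate ACKs are ignored, and the timer covers loss.
        if self.waiting_for_ack and not corrupt(packet) and is_ack(packet, self.seq):
            self.timer.stop()
            self.seq = 1 - self.seq
            self.waiting_for_ack = False

    def timeout(self):
        self.udt_send(self.sndpkt)    # retransmit the outstanding packet
        self.timer.start()            # and restart the timer
```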
3.4.2 Pipelined Reliable Data Transfer Protocols

Protocol rdt3.0 is a functionally correct protocol, but it is unlikely that anyone would be happy with its performance, particularly in today's high-speed networks. At the heart of rdt3.0 's performance problem is the fact that it is a stop-and-wait protocol.

Figure 3.16 Operation of rdt3.0 , the alternating-bit protocol

Figure 3.17 Stop-and-wait versus pipelined protocol

To appreciate the performance impact of this stop-and-wait behavior, consider an idealized case of two hosts, one located on the West Coast of the United States and the other located on the East Coast, as shown in Figure 3.17. The speed-of-light round-trip propagation delay between these two end systems, RTT, is approximately 30 milliseconds. Suppose that they are connected by a channel with a transmission rate, R, of 1 Gbps (10^9 bits per second). With a packet size, L, of 1,000 bytes (8,000 bits) per packet, including both header fields and data, the time needed to actually transmit the packet into the 1 Gbps link is

d_trans = L/R = (8,000 bits/packet)/(10^9 bits/sec) = 8 microseconds

Figure 3.18(a) shows that with our stop-and-wait protocol, if the sender begins sending the packet at t = 0, then at t = L/R = 8 microseconds, the last bit enters the channel at the sender side. The packet then makes its 15-msec cross-country journey, with the last bit of the packet emerging at the receiver at t = RTT/2 + L/R = 15.008 msec. Assuming for simplicity that ACK packets are extremely small (so that we can ignore their transmission time) and that the receiver can send an ACK as soon as the last bit of a data packet is received, the ACK emerges back at the sender at t = RTT + L/R = 30.008 msec. At this point, the sender can now transmit the next message. Thus, in 30.008 msec, the sender was sending for only 0.008 msec. If we define the utilization of the sender (or the channel) as the fraction of time the sender is actually busy sending bits into the channel, the analysis in Figure 3.18(a) shows that the stop-and-wait protocol has a rather dismal sender utilization, U_sender, of

U_sender = (L/R)/(RTT + L/R) = 0.008/30.008 = 0.00027

Figure 3.18 Stop-and-wait and pipelined sending

That is, the sender was busy only 2.7 hundredths of one percent of the time! Viewed another way, the sender was able to send only 1,000 bytes in 30.008 milliseconds, an effective throughput of only 267 kbps---even though a 1 Gbps link was available! Imagine the unhappy network manager who just paid a fortune for a gigabit capacity link but manages to get a throughput of only 267 kilobits per second! This is a graphic example of how network protocols can limit the capabilities provided by the underlying network hardware. Also, we have neglected lower-layer protocol-processing times at the sender and receiver, as well as the processing and queuing delays that would occur at any intermediate routers between the sender and receiver. Including these effects would serve only to further increase the delay and further accentuate the poor performance.

The solution to this particular performance problem is simple: Rather than operate in a stop-and-wait manner, the sender is allowed to send multiple packets without waiting for acknowledgments, as illustrated in Figure 3.17(b).
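These numbers are easy to check with a quick back-of-the-envelope script (plain Python arithmetic, not protocol code); it also computes the utilization for the three-packet pipelining case discussed next.

```python
L = 8000        # packet size in bits
R = 1e9         # transmission rate in bits/sec
RTT = 0.030     # round-trip time in seconds

d_trans = L / R                                  # 8e-06 sec = 8 microseconds
U_stop_and_wait = (L / R) / (RTT + L / R)        # ~0.00027
U_pipelined_3 = 3 * (L / R) / (RTT + L / R)      # ~0.0008: three packets per RTT
throughput = L / (RTT + L / R)                   # ~266,600 bits/sec, i.e., ~267 kbps

print(d_trans, U_stop_and_wait, U_pipelined_3, throughput)
```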
Figure 3.18(b) shows that if the sender is allowed to transmit three packets before having to wait for acknowledgments, the utilization of the sender is essentially tripled. Since the many in-transit sender-to-receiver packets can be visualized as filling a pipeline, this technique is known as pipelining. Pipelining has the following consequences for reliable data transfer protocols:

The range of sequence numbers must be increased, since each in-transit packet (not counting retransmissions) must have a unique sequence number and there may be multiple, in-transit, unacknowledged packets.

The sender and receiver sides of the protocols may have to buffer more than one packet. Minimally, the sender will have to buffer packets that have been transmitted but not yet acknowledged. Buffering of correctly received packets may also be needed at the receiver, as discussed below.

The range of sequence numbers needed and the buffering requirements will depend on the manner in which a data transfer protocol responds to lost, corrupted, and overly delayed packets. Two basic approaches toward pipelined error recovery can be identified: Go-Back-N and selective repeat.

3.4.3 Go-Back-N (GBN)

In a Go-Back-N (GBN) protocol, the sender is allowed to transmit multiple packets (when available) without waiting for an acknowledgment, but is constrained to have no more than some maximum allowable number, N, of unacknowledged packets in the pipeline. We describe the GBN protocol in some detail in this section. But before reading on, you are encouraged to play with the GBN applet (an awesome applet!) at the companion Web site. Figure 3.19 shows the sender's view of the range of sequence numbers in a GBN protocol.

Figure 3.19 Sender's view of sequence numbers in Go-Back-N

If we define base to be the sequence number of the oldest unacknowledged packet and nextseqnum to be the smallest unused sequence number (that is, the sequence number of the next packet to be sent), then four intervals in the range of sequence numbers can be identified. Sequence numbers in the interval \[0, base-1\] correspond to packets that have already been transmitted and acknowledged. The interval \[base, nextseqnum-1\] corresponds to packets that have been sent but not yet acknowledged. Sequence numbers in the interval \[nextseqnum, base+N-1\] can be used for packets that can be sent immediately, should data arrive from the upper layer. Finally, sequence numbers greater than or equal to base+N cannot be used until an unacknowledged packet currently in the pipeline (specifically, the packet with sequence number base ) has been acknowledged.

As suggested by Figure 3.19, the range of permissible sequence numbers for transmitted but not yet acknowledged packets can be viewed as a window of size N over the range of sequence numbers. As the protocol operates, this window slides forward over the sequence number space. For this reason, N is often referred to as the window size and the GBN protocol itself as a sliding-window protocol. You might be wondering why we would even limit the number of outstanding, unacknowledged packets to a value of N in the first place. Why not allow an unlimited number of such packets? We'll see in Section 3.5 that flow control is one reason to impose a limit on the sender. We'll examine another reason to do so in Section 3.7, when we study TCP congestion control. In practice, a packet's sequence number is carried in a fixed-length field in the packet header.
If k is the number of bits in the packet sequence number field, the range of sequence numbers is thus \[0, 2^k − 1\]. With a finite range of sequence numbers, all arithmetic involving sequence numbers must then be done using modulo-2^k arithmetic. (That is, the sequence number space can be thought of as a ring of size 2^k, where sequence number 2^k − 1 is immediately followed by sequence number 0.) Recall that rdt3.0 had a 1-bit sequence number and a range of sequence numbers of \[0,1\]. Several of the problems at the end of this chapter explore the consequences of a finite range of sequence numbers. We will see in Section 3.5 that TCP has a 32-bit sequence number field, where TCP sequence numbers count bytes in the byte stream rather than packets.

Figures 3.20 and 3.21 give an extended FSM description of the sender and receiver sides of an ACK-based, NAK-free, GBN protocol. We refer to this FSM description as an extended FSM because we have added variables (similar to programming-language variables) for base and nextseqnum , and added operations on these variables and conditional actions involving these variables. Note that the extended FSM specification is now beginning to look somewhat like a programming-language specification. \[Bochman 1984\] provides an excellent survey of additional extensions to FSM techniques as well as other programming-language-based techniques for specifying protocols.

Figure 3.20 Extended FSM description of the GBN sender

Figure 3.21 Extended FSM description of the GBN receiver

The GBN sender must respond to three types of events:

Invocation from above. When rdt_send() is called from above, the sender first checks to see if the window is full, that is, whether there are N outstanding, unacknowledged packets. If the window is not full, a packet is created and sent, and variables are appropriately updated. If the window is full, the sender simply returns the data back to the upper layer, an implicit indication that the window is full. The upper layer would presumably then have to try again later. In a real implementation, the sender would more likely have either buffered (but not immediately sent) this data, or would have a synchronization mechanism (for example, a semaphore or a flag) that would allow the upper layer to call rdt_send() only when the window is not full.

Receipt of an ACK. In our GBN protocol, an acknowledgment for a packet with sequence number n will be taken to be a cumulative acknowledgment, indicating that all packets with a sequence number up to and including n have been correctly received at the receiver. We'll come back to this issue shortly when we examine the receiver side of GBN.

A timeout event. The protocol's name, "Go-Back-N," is derived from the sender's behavior in the presence of lost or overly delayed packets. As in the stop-and-wait protocol, a timer will again be used to recover from lost data or acknowledgment packets. If a timeout occurs, the sender resends all packets that have been previously sent but that have not yet been acknowledged. Our sender in Figure 3.20 uses only a single timer, which can be thought of as a timer for the oldest transmitted but not yet acknowledged packet. If an ACK is received but there are still additional transmitted but not yet acknowledged packets, the timer is restarted. If there are no outstanding, unacknowledged packets, the timer is stopped.

The receiver's actions in GBN are also simple.
If a packet with sequence number n is received correctly and is in order (that is, the data last delivered to the upper layer came from a packet with sequence number n−1), the receiver sends an ACK for packet n and delivers the data portion of the packet to the upper layer. In all other cases, the receiver discards the packet and resends an ACK for the most recently received in-order packet. Note that since packets are delivered one at a time to the upper layer, if packet k has been received and delivered, then all packets with a sequence number lower than k have also been delivered. Thus, the use of cumulative acknowledgments is a natural choice for GBN.

In our GBN protocol, the receiver discards out-of-order packets. Although it may seem silly and wasteful to discard a correctly received (but out-of-order) packet, there is some justification for doing so. Recall that the receiver must deliver data in order to the upper layer. Suppose now that packet n is expected, but packet n+1 arrives. Because data must be delivered in order, the receiver could buffer (save) packet n+1 and then deliver this packet to the upper layer after it had later received and delivered packet n. However, if packet n is lost, both it and packet n+1 will eventually be retransmitted as a result of the GBN retransmission rule at the sender. Thus, the receiver can simply discard packet n+1. The advantage of this approach is the simplicity of receiver buffering---the receiver need not buffer any out-of-order packets. Thus, while the sender must maintain the upper and lower bounds of its window and the position of nextseqnum within this window, the only piece of information the receiver need maintain is the sequence number of the next in-order packet. This value is held in the variable expectedseqnum , shown in the receiver FSM in Figure 3.21. Of course, the disadvantage of throwing away a correctly received packet is that the subsequent retransmission of that packet might be lost or garbled and thus even more retransmissions would be required.

Figure 3.22 shows the operation of the GBN protocol for the case of a window size of four packets. Because of this window size limitation, the sender sends packets 0 through 3 but then must wait for one or more of these packets to be acknowledged before proceeding. As each successive ACK (for example, ACK0 and ACK1 ) is received, the window slides forward and the sender can transmit one new packet (pkt4 and pkt5, respectively). On the receiver side, packet 2 is lost and thus packets 3, 4, and 5 are found to be out of order and are discarded.

Before closing our discussion of GBN, it is worth noting that an implementation of this protocol in a protocol stack would likely have a structure similar to that of the extended FSM in Figure 3.20. The implementation would also likely be in the form of various procedures that implement the actions to be taken in response to the various events that can occur. In such event-based programming, the various procedures are called (invoked) either by other procedures in the protocol stack, or as the result of an interrupt. In the sender, these events would be (1) a call from the upper-layer entity to invoke rdt_send() , (2) a timer interrupt, and (3) a call from the lower layer to invoke rdt_rcv() when a packet arrives. The programming exercises at the end of this chapter will give you a chance to actually implement these routines in a simulated, but realistic, network setting.
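The two FSMs of Figures 3.20 and 3.21 translate almost line for line into code. The sketch below reuses the same scaffold and assumed helpers as the earlier sketches (make_pkt, make_ack, corrupt, seq_num, ack_num, extract), and keeps sequence numbers as unbounded Python integers rather than a modulo-2^k header field; it is illustrative, not production-ready.

```python
class GbnSender(RdtEndpoint):
    def __init__(self, channel, upper_layer, timer, N=4):
        super().__init__(channel, upper_layer)
        self.timer, self.N = timer, N
        self.base = 0                      # oldest unacknowledged sequence number
        self.nextseqnum = 0                # next sequence number to use
        self.sndpkt = {}                   # copies of sent-but-unACKed packets

    def rdt_send(self, data):
        if self.nextseqnum >= self.base + self.N:
            return False                   # window full: refuse the data
        self.sndpkt[self.nextseqnum] = make_pkt(data, seq=self.nextseqnum)
        self.udt_send(self.sndpkt[self.nextseqnum])
        if self.base == self.nextseqnum:
            self.timer.start()             # timer runs for the oldest unACKed packet
        self.nextseqnum += 1
        return True

    def rdt_rcv(self, packet):
        if corrupt(packet):
            return
        self.base = ack_num(packet) + 1    # cumulative ACK: all packets up to n are in
        for n in list(self.sndpkt):
            if n < self.base:
                del self.sndpkt[n]         # drop buffered copies that are now ACKed
        if self.base == self.nextseqnum:
            self.timer.stop()              # nothing outstanding
        else:
            self.timer.start()             # restart for the remaining unACKed packets

    def timeout(self):
        self.timer.start()
        for n in range(self.base, self.nextseqnum):
            self.udt_send(self.sndpkt[n])  # go back N: resend everything unACKed


class GbnReceiver(RdtEndpoint):
    def __init__(self, channel, upper_layer):
        super().__init__(channel, upper_layer)
        self.expectedseqnum = 0

    def rdt_rcv(self, packet):
        if not corrupt(packet) and seq_num(packet) == self.expectedseqnum:
            self.deliver_data(extract(packet))
            self.expectedseqnum += 1
        # In order or not, (re)send an ACK for the last in-order packet delivered
        # (before anything has arrived, this is the FSM's initial default ACK).
        self.udt_send(make_ack(self.expectedseqnum - 1))
```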
We note here that the GBN protocol incorporates almost all of the techniques that we will encounter when we study the reliable data transfer components of TCP in Section 3.5. These techniques include the use of sequence numbers, cumulative acknowledgments, checksums, and a timeout/retransmit operation.

Figure 3.22 Go-Back-N in operation

3.4.4 Selective Repeat (SR)

The GBN protocol allows the sender to potentially "fill the pipeline" in Figure 3.17 with packets, thus avoiding the channel utilization problems we noted with stop-and-wait protocols. There are, however, scenarios in which GBN itself suffers from performance problems. In particular, when the window size and bandwidth-delay product are both large, many packets can be in the pipeline. A single packet error can thus cause GBN to retransmit a large number of packets, many unnecessarily. As the probability of channel errors increases, the pipeline can become filled with these unnecessary retransmissions. Imagine, in our message-dictation scenario, that if every time a word was garbled, the surrounding 1,000 words (for example, a window size of 1,000 words) had to be repeated. The dictation would be slowed by all of the reiterated words.

As the name suggests, selective-repeat protocols avoid unnecessary retransmissions by having the sender retransmit only those packets that it suspects were received in error (that is, were lost or corrupted) at the receiver. This individual, as-needed, retransmission will require that the receiver individually acknowledge correctly received packets. A window size of N will again be used to limit the number of outstanding, unacknowledged packets in the pipeline. However, unlike GBN, the sender will have already received ACKs for some of the packets in the window. Figure 3.23 shows the SR sender's view of the sequence number space. Figure 3.24 details the various actions taken by the SR sender.

Figure 3.23 Selective-repeat (SR) sender and receiver views of sequence-number space

Figure 3.24 SR sender events and actions

The SR receiver will acknowledge a correctly received packet whether or not it is in order. Out-of-order packets are buffered until any missing packets (that is, packets with lower sequence numbers) are received, at which point a batch of packets can be delivered in order to the upper layer. Figure 3.25 itemizes the various actions taken by the SR receiver. Figure 3.26 shows an example of SR operation in the presence of lost packets. Note that in Figure 3.26, the receiver initially buffers packets 3, 4, and 5, and delivers them together with packet 2 to the upper layer when packet 2 is finally received.

Figure 3.25 SR receiver events and actions

Figure 3.26 SR operation

It is important to note that in Step 2 in Figure 3.25, the receiver reacknowledges (rather than ignores) already received packets with certain sequence numbers below the current window base. You should convince yourself that this reacknowledgment is indeed needed. Given the sender and receiver sequence number spaces in Figure 3.23, for example, if there is no ACK for packet send_base propagating from the receiver to the sender, the sender will eventually retransmit packet send_base , even though it is clear (to us, not the sender!) that the receiver has already received that packet. If the receiver were not to acknowledge this packet, the sender's window would never move forward!
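The receiver is where SR departs most sharply from GBN, so here is a sketch of just the SR receiver's actions from Figure 3.25, with the same scaffold, assumed helpers, and plain-integer sequence numbers as the earlier sketches.

```python
class SrReceiver(RdtEndpoint):
    def __init__(self, channel, upper_layer, N=4):
        super().__init__(channel, upper_layer)
        self.N = N
        self.rcv_base = 0    # sequence number of the oldest not-yet-delivered packet
        self.buffered = {}   # correctly received but out-of-order packets

    def rdt_rcv(self, packet):
        if corrupt(packet):
            return
        n = seq_num(packet)
        if self.rcv_base - self.N <= n < self.rcv_base:
            self.udt_send(make_ack(n))   # Step 2: re-ACK packets below the window
        elif self.rcv_base <= n < self.rcv_base + self.N:
            self.udt_send(make_ack(n))   # ACK the packet whether or not it is in order
            self.buffered[n] = packet
            while self.rcv_base in self.buffered:        # a gap just filled: deliver
                pkt = self.buffered.pop(self.rcv_base)   # the in-order batch and
                self.deliver_data(extract(pkt))          # slide the window forward
                self.rcv_base += 1
```

Note how packets in the interval below the window are re-ACKed even though they were already delivered; this is precisely the reacknowledgment that the send_base example above argued is necessary.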
This example illustrates an important aspect of SR protocols (and many other protocols as well). The sender and receiver will not always have an identical view of what has been received correctly and what has not. For SR protocols, this means that the sender and receiver windows will not always coincide. The lack of synchronization between sender and receiver windows has important consequences when we are faced with the reality of a finite range of sequence numbers. Consider what could happen, for example, with a finite range of four packet sequence numbers, 0, 1, 2, 3, and a window size of three.

Suppose packets 0 through 2 are transmitted and correctly received and acknowledged at the receiver. At this point, the receiver's window is over the fourth, fifth, and sixth packets, which have sequence numbers 3, 0, and 1, respectively. Now consider two scenarios. In the first scenario, shown in Figure 3.27(a), the ACKs for the first three packets are lost and the sender retransmits these packets. The receiver thus next receives a packet with sequence number 0---a copy of the first packet sent. In the second scenario, shown in Figure 3.27(b), the ACKs for the first three packets are all delivered correctly. The sender thus moves its window forward and sends the fourth, fifth, and sixth packets, with sequence numbers 3, 0, and 1, respectively. The packet with sequence number 3 is lost, but the packet with sequence number 0 arrives---a packet containing new data.

Now consider the receiver's viewpoint in Figure 3.27, which has a figurative curtain between the sender and the receiver, since the receiver cannot "see" the actions taken by the sender. All the receiver observes is the sequence of messages it receives from the channel and sends into the channel. As far as it is concerned, the two scenarios in Figure 3.27 are identical. There is no way of distinguishing the retransmission of the first packet from an original transmission of the fifth packet. Clearly, a window size that is 1 less than the size of the sequence number space won't work. But how small must the window size be? A problem at the end of the chapter asks you to show that the window size must be less than or equal to half the size of the sequence number space for SR protocols. At the companion Web site, you will find an applet that animates the operation of the SR protocol. Try performing the same experiments that you did with the GBN applet. Do the results agree with what you expect?

This completes our discussion of reliable data transfer protocols. We've covered a lot of ground and introduced numerous mechanisms that together provide for reliable data transfer. Table 3.1 summarizes these mechanisms. Now that we have seen all of these mechanisms in operation and can see the "big picture," we encourage you to review this section again to see how these mechanisms were incrementally added to cover increasingly complex (and realistic) models of the channel connecting the sender and receiver, or to improve the performance of the protocols. Let's conclude our discussion of reliable data transfer protocols by considering one remaining assumption in our underlying channel model. Recall that we have assumed that packets cannot be reordered within the channel between the sender and receiver. This is generally a reasonable assumption when the sender and receiver are connected by a single physical wire.
However, when the "channel" connecting the two is a network, packet reordering can occur. One manifestation of packet reordering is that old copies of a packet with a sequence or acknowledgment number of x can appear, even though neither the sender's nor the receiver's window contains x. With packet reordering, the channel can be thought of as essentially buffering packets and spontaneously emitting these packets at any point in the future. Because sequence numbers may be reused, some care must be taken to guard against such duplicate packets. The approach taken in practice is to ensure that a sequence number is not reused until the sender is "sure" that any previously sent packets with sequence number x are no longer in the network. This is done by assuming that a packet cannot "live" in the network for longer than some fixed maximum amount of time. A maximum packet lifetime of approximately three minutes is assumed in the TCP extensions for high-speed networks \[RFC 1323\]. \[Sunshine 1978\] describes a method for using sequence numbers such that reordering problems can be completely avoided.

Figure 3.27 SR receiver dilemma with too-large windows: A new packet or a retransmission?

Table 3.1 Summary of reliable data transfer mechanisms and their use

| Mechanism | Use, Comments |
| --- | --- |
| Checksum | Used to detect bit errors in a transmitted packet. |
| Timer | Used to timeout/retransmit a packet, possibly because the packet (or its ACK) was lost within the channel. Because timeouts can occur when a packet is delayed but not lost (premature timeout), or when a packet has been received by the receiver but the receiver-to-sender ACK has been lost, duplicate copies of a packet may be received by a receiver. |
| Sequence number | Used for sequential numbering of packets of data flowing from sender to receiver. Gaps in the sequence numbers of received packets allow the receiver to detect a lost packet. Packets with duplicate sequence numbers allow the receiver to detect duplicate copies of a packet. |
| Acknowledgment | Used by the receiver to tell the sender that a packet or set of packets has been received correctly. Acknowledgments will typically carry the sequence number of the packet or packets being acknowledged. Acknowledgments may be individual or cumulative, depending on the protocol. |
| Negative acknowledgment | Used by the receiver to tell the sender that a packet has not been received correctly. Negative acknowledgments will typically carry the sequence number of the packet that was not received correctly. |
| Window, pipelining | The sender may be restricted to sending only packets with sequence numbers that fall within a given range. By allowing multiple packets to be transmitted but not yet acknowledged, sender utilization can be increased over a stop-and-wait mode of operation. We'll see shortly that the window size may be set on the basis of the receiver's ability to receive and buffer messages, or the level of congestion in the network, or both. |

3.5 Connection-Oriented Transport: TCP

Now that we have covered the underlying principles of reliable data transfer, let's turn to TCP---the Internet's transport-layer, connection-oriented, reliable transport protocol.
In this section, we'll see that in order to provide reliable data transfer, TCP relies on many of the underlying principles discussed in the previous section, including error detection, retransmissions, cumulative acknowledgments, timers, and header fields for sequence and acknowledgment numbers. TCP is defined in RFC 793, RFC 1122, RFC 1323, RFC 2018, and RFC 2581.

3.5.1 The TCP Connection

TCP is said to be connection-oriented because before one application process can begin to send data to another, the two processes must first "handshake" with each other---that is, they must send some preliminary segments to each other to establish the parameters of the ensuing data transfer. As part of TCP connection establishment, both sides of the connection will initialize many TCP state variables (many of which will be discussed in this section and in Section 3.7) associated with the TCP connection.

The TCP "connection" is not an end-to-end TDM or FDM circuit as in a circuit-switched network. Instead, the "connection" is a logical one, with common state residing only in the TCPs in the two communicating end systems. Recall that because the TCP protocol runs only in the end systems and not in the intermediate network elements (routers and link-layer switches), the intermediate network elements do not maintain TCP connection state. In fact, the intermediate routers are completely oblivious to TCP connections; they see datagrams, not connections.

A TCP connection provides a full-duplex service: If there is a TCP connection between Process A on one host and Process B on another host, then application-layer data can flow from Process A to Process B at the same time as application-layer data flows from Process B to Process A. A TCP connection is also always point-to-point, that is, between a single sender and a single receiver. So-called "multicasting" (see the online supplementary materials for this text)---the transfer of data from one sender to many receivers in a single send operation---is not possible with TCP. With TCP, two hosts are company and three are a crowd!

Let's now take a look at how a TCP connection is established. Suppose a process running in one host wants to initiate a connection with another process in another host. Recall that the process that is initiating the connection is called the client process, while the other process is called the server process. The client application process first informs the client transport layer that it wants to establish a connection to a process in the server. Recall from Section 2.7.2 that a Python client program does this by issuing the command

```python
clientSocket.connect((serverName, serverPort))
```

where serverName is the name of the server and serverPort identifies the process on the server. TCP in the client then proceeds to establish a TCP connection with TCP in the server. At the end of this section we discuss in some detail the connection-establishment procedure. For now it suffices to know that the client first sends a special TCP segment; the server responds with a second special TCP segment; and finally the client responds again with a third special segment. The first two segments carry no payload, that is, no application-layer data; the third of these segments may carry a payload. Because three segments are sent between the two hosts, this connection-establishment procedure is often referred to as a three-way handshake.

CASE HISTORY

Vinton Cerf, Robert Kahn, and TCP/IP

In the early 1970s, packet-switched networks began to proliferate, with the ARPAnet---the precursor of the Internet---being just one of many networks. Each of these networks had its own protocol. Two researchers, Vinton Cerf and Robert Kahn, recognized the importance of interconnecting these networks and invented a cross-network protocol called TCP/IP, which stands for Transmission Control Protocol/Internet Protocol. Although Cerf and Kahn began by seeing the protocol as a single entity, it was later split into its two parts, TCP and IP, which operated separately. Cerf and Kahn published a paper on TCP/IP in May 1974 in IEEE Transactions on Communications Technology \[Cerf 1974\]. The TCP/IP protocol, which is the bread and butter of today's Internet, was devised before PCs, workstations, smartphones, and tablets, before the proliferation of Ethernet, cable, and DSL, WiFi, and other access network technologies, and before the Web, social media, and streaming video. Cerf and Kahn saw the need for a networking protocol that, on the one hand, provides broad support for yet-to-be-defined applications and, on the other hand, allows arbitrary hosts and link-layer protocols to interoperate. In 2004, Cerf and Kahn received the ACM's Turing Award, considered the "Nobel Prize of Computing," for "pioneering work on internetworking, including the design and implementation of the Internet's basic communications protocols, TCP/IP, and for inspired leadership in networking."

Once a TCP connection is established, the two application processes can send data to each other. Let's consider the sending of data from the client process to the server process. The client process passes a stream of data through the socket (the door of the process), as described in Section 2.7. Once the data passes through the door, the data is in the hands of TCP running in the client. As shown in Figure 3.28, TCP directs this data to the connection's send buffer, which is one of the buffers that is set aside during the initial three-way handshake. From time to time, TCP will grab chunks of data from the send buffer and pass the data to the network layer. Interestingly, the TCP specification \[RFC 793\] is very laid back about specifying when TCP should actually send buffered data, stating that TCP should "send that data in segments at its own convenience." The maximum amount of data that can be grabbed and placed in a segment is limited by the maximum segment size (MSS). The MSS is typically set by first determining the length of the largest link-layer frame that can be sent by the local sending host (the so-called maximum transmission unit, MTU), and then setting the MSS to ensure that a TCP segment (when encapsulated in an IP datagram) plus the TCP/IP header length (typically 40 bytes) will fit into a single link-layer frame. Both Ethernet and PPP link-layer protocols have an MTU of 1,500 bytes. Thus a typical value of MSS is 1460 bytes. Approaches have also been proposed for discovering the path MTU---the largest link-layer frame that can be sent on all links from source to destination \[RFC 1191\]---and setting the MSS based on the path MTU value.
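The MSS arithmetic is simple enough to state in a few lines; the figures below assume the typical case of no TCP or IP options.

```python
MTU = 1500                  # Ethernet/PPP link-layer payload limit, in bytes
TCP_IP_HEADERS = 20 + 20    # 20-byte TCP header + 20-byte IP header, no options
MSS = MTU - TCP_IP_HEADERS  # = 1460 bytes of application data per segment
print(MSS)
```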
Note that the MSS is the maximum amount of application-layer data in the segment, not the maximum size of the TCP segment including headers. (This terminology is confusing, but we have to live with it, as it is well entrenched.) TCP pairs each chunk of client data with a TCP header, thereby forming TCP segments. The segments are passed down to the network layer, where they are separately encapsulated within network-layer IP datagrams. The IP datagrams are then sent into the network. When TCP receives a segment at the other end, the segment's data is placed in the TCP connection's receive buffer, as shown in Figure 3.28. The application reads the stream of data from this buffer. Each side of the connection has its own send buffer and its own receive buffer. (You can see the online flow-control applet at http://www.awl.com/kurose-ross, which provides an animation of the send and receive buffers.)

Figure 3.28 TCP send and receive buffers

We see from this discussion that a TCP connection consists of buffers, variables, and a socket connection to a process in one host, and another set of buffers, variables, and a socket connection to a process in another host. As mentioned earlier, no buffers or variables are allocated to the connection in the network elements (routers, switches, and repeaters) between the hosts.

3.5.2 TCP Segment Structure

Having taken a brief look at the TCP connection, let's examine the TCP segment structure. The TCP segment consists of header fields and a data field. The data field contains a chunk of application data. As mentioned above, the MSS limits the maximum size of a segment's data field. When TCP sends a large file, such as an image as part of a Web page, it typically breaks the file into chunks of size MSS (except for the last chunk, which will often be less than the MSS). Interactive applications, however, often transmit data chunks that are smaller than the MSS; for example, with remote login applications like Telnet, the data field in the TCP segment is often only one byte. Because the TCP header is typically 20 bytes (12 bytes more than the UDP header), segments sent by Telnet may be only 21 bytes in length. Figure 3.29 shows the structure of the TCP segment. As with UDP, the header includes source and destination port numbers, which are used for multiplexing/demultiplexing data from/to upper-layer applications. Also, as with UDP, the header includes a checksum field. A TCP segment header also contains the following fields:

The 32-bit sequence number field and the 32-bit acknowledgment number field are used by the TCP sender and receiver in implementing a reliable data transfer service, as discussed below.

The 16-bit receive window field is used for flow control. We will see shortly that it is used to indicate the number of bytes that a receiver is willing to accept.

The 4-bit header length field specifies the length of the TCP header in 32-bit words. The TCP header can be of variable length due to the TCP options field. (Typically, the options field is empty, so that the length of the typical TCP header is 20 bytes.)

The optional and variable-length options field is used when a sender and receiver negotiate the maximum segment size (MSS) or as a window scaling factor for use in high-speed networks. A timestamping option is also defined. See RFC 854 and RFC 1323 for additional details.

The flag field contains 6 bits.
Figure 3.29 TCP segment structure

The ACK bit is used to indicate that the value carried in the acknowledgment field is valid; that is, the segment contains an acknowledgment for a segment that has been successfully received. The RST, SYN, and FIN bits are used for connection setup and teardown, as we will discuss at the end of this section. The CWR and ECE bits are used in explicit congestion notification, as discussed in Section 3.7.2. Setting the PSH bit indicates that the receiver should pass the data to the upper layer immediately. Finally, the URG bit is used to indicate that there is data in this segment that the sending-side upper-layer entity has marked as "urgent." The location of the last byte of this urgent data is indicated by the 16-bit urgent data pointer field. TCP must inform the receiving-side upper-layer entity when urgent data exists and pass it a pointer to the end of the urgent data. (In practice, the PSH, URG, and the urgent data pointer are not used. However, we mention these fields for completeness.) Our experience as teachers is that our students sometimes find discussion of packet formats rather dry and perhaps a bit boring. For a fun and fanciful look at TCP header fields, particularly if you love Legos™ as we do, see \[Pomeranz 2010\].

Sequence Numbers and Acknowledgment Numbers

Two of the most important fields in the TCP segment header are the sequence number field and the acknowledgment number field. These fields are a critical part of TCP's reliable data transfer service. But before discussing how these fields are used to provide reliable data transfer, let us first explain what exactly TCP puts in these fields.

Figure 3.30 Dividing file data into TCP segments

TCP views data as an unstructured, but ordered, stream of bytes. TCP's use of sequence numbers reflects this view in that sequence numbers are over the stream of transmitted bytes and not over the series of transmitted segments. The sequence number for a segment is therefore the byte-stream number of the first byte in the segment. Let's look at an example. Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection. The TCP in Host A will implicitly number each byte in the data stream. Suppose that the data stream consists of a file consisting of 500,000 bytes, that the MSS is 1,000 bytes, and that the first byte of the data stream is numbered 0. As shown in Figure 3.30, TCP constructs 500 segments out of the data stream. The first segment gets assigned sequence number 0, the second segment gets assigned sequence number 1,000, the third segment gets assigned sequence number 2,000, and so on. Each sequence number is inserted in the sequence number field in the header of the appropriate TCP segment. Now let's consider acknowledgment numbers. These are a little trickier than sequence numbers. Recall that TCP is full-duplex, so that Host A may be receiving data from Host B while it sends data to Host B (as part of the same TCP connection). Each of the segments that arrive from Host B has a sequence number for the data flowing from B to A. The acknowledgment number that Host A puts in its segment is the sequence number of the next byte Host A is expecting from Host B. It is good to look at a few examples to understand what is going on here. Suppose that Host A has received all bytes numbered 0 through 535 from B and suppose that it is about to send a segment to Host B.
Host A is waiting for byte 536 and all the subsequent bytes in Host B's data stream. So Host A puts 536 in the acknowledgment number field of the segment it sends to B. As another example, suppose that Host A has received one segment from Host B containing bytes 0 through 535 and another segment containing bytes 900 through 1,000. For some reason Host A has not yet received bytes 536 through 899. In this example, Host A is still waiting for byte 536 (and beyond) in order to re-create B's data stream. Thus, A's next segment to B will contain 536 in the acknowledgment number field. Because TCP only acknowledges bytes up to the first missing byte in the stream, TCP is said to provide cumulative acknowledgments. This last example also brings up an important but subtle issue. Host A received the third segment (bytes 900 through 1,000) before receiving the second segment (bytes 536 through 899). Thus, the third segment arrived out of order. The subtle issue is: What does a host do when it receives out-of-order segments in a TCP connection? Interestingly, the TCP RFCs do not impose any rules here and leave the decision up to the programmers implementing TCP. There are basically two choices: either (1) the receiver immediately discards out-of-order segments (which, as we discussed earlier, can simplify receiver design), or (2) the receiver keeps the out-of-order bytes and waits for the missing bytes to fill in the gaps. Clearly, the latter choice is more efficient in terms of network bandwidth, and is the approach taken in practice. In Figure 3.30, we assumed that the initial sequence number was zero. In truth, both sides of a TCP connection randomly choose an initial sequence number. This is done to minimize the possibility that a segment that is still present in the network from an earlier, already-terminated connection between two hosts is mistaken for a valid segment in a later connection between these same two hosts (which also happen to be using the same port numbers as the old connection) \[Sunshine 1978\].

Telnet: A Case Study for Sequence and Acknowledgment Numbers

Telnet, defined in RFC 854, is a popular application-layer protocol used for remote login. It runs over TCP and is designed to work between any pair of hosts. Unlike the bulk data transfer applications discussed in Chapter 2, Telnet is an interactive application. We discuss a Telnet example here, as it nicely illustrates TCP sequence and acknowledgment numbers. We note that many users now prefer to use the SSH protocol rather than Telnet, since data sent in a Telnet connection (including passwords!) are not encrypted, making Telnet vulnerable to eavesdropping attacks (as discussed in Section 8.7). Suppose Host A initiates a Telnet session with Host B. Because Host A initiates the session, it is labeled the client, and Host B is labeled the server. Each character typed by the user (at the client) will be sent to the remote host; the remote host will send back a copy of each character, which will be displayed on the Telnet user's screen. This "echo back" is used to ensure that characters seen by the Telnet user have already been received and processed at the remote site. Each character thus traverses the network twice between the time the user hits the key and the time the character is displayed on the user's monitor. Now suppose the user types a single letter, 'C,' and then grabs a coffee.
Let's examine the TCP segments that are sent between the client and server. As shown in Figure 3.31, we suppose the starting sequence numbers are 42 and 79 for the client and server, respectively. Recall that the sequence number of a segment is the sequence number of the first byte in the data field. Thus, the first segment sent from the client will have sequence number 42; the first segment sent from the server will have sequence number 79. Recall that the acknowledgment number is the sequence number of the next byte of data that the host is waiting for. After the TCP connection is established but before any data is sent, the client is waiting for byte 79 and the server is waiting for byte 42.

Figure 3.31 Sequence and acknowledgment numbers for a simple Telnet application over TCP

As shown in Figure 3.31, three segments are sent. The first segment is sent from the client to the server, containing the 1-byte ASCII representation of the letter 'C' in its data field. This first segment also has 42 in its sequence number field, as we just described. Also, because the client has not yet received any data from the server, this first segment will have 79 in its acknowledgment number field. The second segment is sent from the server to the client. It serves a dual purpose. First it provides an acknowledgment of the data the server has received. By putting 43 in the acknowledgment field, the server is telling the client that it has successfully received everything up through byte 42 and is now waiting for bytes 43 onward. The second purpose of this segment is to echo back the letter 'C.' Thus, the second segment has the ASCII representation of 'C' in its data field. This second segment has the sequence number 79, the initial sequence number of the server-to-client data flow of this TCP connection, as this is the very first byte of data that the server is sending. Note that the acknowledgment for client-to-server data is carried in a segment carrying server-to-client data; this acknowledgment is said to be piggybacked on the server-to-client data segment. The third segment is sent from the client to the server. Its sole purpose is to acknowledge the data it has received from the server. (Recall that the second segment contained data---the letter 'C'---from the server to the client.) This segment has an empty data field (that is, the acknowledgment is not being piggybacked with any client-to-server data). The segment has 80 in the acknowledgment number field because the client has received the stream of bytes up through byte sequence number 79 and it is now waiting for bytes 80 onward. You might think it odd that this segment also has a sequence number since the segment contains no data. But because TCP has a sequence number field, the segment needs to have some sequence number.

3.5.3 Round-Trip Time Estimation and Timeout

TCP, like our rdt protocol in Section 3.4, uses a timeout/retransmit mechanism to recover from lost segments. Although this is conceptually simple, many subtle issues arise when we implement a timeout/retransmit mechanism in an actual protocol such as TCP. Perhaps the most obvious question is the length of the timeout intervals. Clearly, the timeout should be larger than the connection's round-trip time (RTT), that is, the time from when a segment is sent until it is acknowledged. Otherwise, unnecessary retransmissions would be sent. But how much larger? How should the RTT be estimated in the first place?
Should a timer be associated with each and every unacknowledged segment? So many questions! Our discussion in this section is based on the TCP work in \[Jacobson 1988\] and the current IETF recommendations for managing TCP timers \[RFC 6298\].

Estimating the Round-Trip Time

Let's begin our study of TCP timer management by considering how TCP estimates the round-trip time between sender and receiver. This is accomplished as follows. The sample RTT, denoted SampleRTT, for a segment is the amount of time between when the segment is sent (that is, passed to IP) and when an acknowledgment for the segment is received. Instead of measuring a SampleRTT for every transmitted segment, most TCP implementations take only one SampleRTT measurement at a time. That is, at any point in time, the SampleRTT is being estimated for only one of the transmitted but currently unacknowledged segments, leading to a new value of SampleRTT approximately once every RTT. Also, TCP never computes a SampleRTT for a segment that has been retransmitted; it only measures SampleRTT for segments that have been transmitted once \[Karn 1987\]. (A problem at the end of the chapter asks you to consider why.) Obviously, the SampleRTT values will fluctuate from segment to segment due to congestion in the routers and to the varying load on the end systems. Because of this fluctuation, any given SampleRTT value may be atypical. In order to estimate a typical RTT, it is therefore natural to take some sort of average of the SampleRTT values. TCP maintains an average, called EstimatedRTT, of the SampleRTT values. Upon obtaining a new SampleRTT, TCP updates EstimatedRTT according to the following formula:

EstimatedRTT=(1−α)⋅EstimatedRTT+α⋅SampleRTT

The formula above is written in the form of a programming-language statement---the new value of EstimatedRTT is a weighted combination of the previous value of EstimatedRTT and the new value for SampleRTT. The recommended value of α is α = 0.125 (that is, 1/8) \[RFC 6298\], in which case the formula above becomes:

EstimatedRTT=0.875⋅EstimatedRTT+0.125⋅SampleRTT

Note that EstimatedRTT is a weighted average of the SampleRTT values. As discussed in a homework problem at the end of this chapter, this weighted average puts more weight on recent samples than on old samples. This is natural, as the more recent samples better reflect the current congestion in the network. In statistics, such an average is called an exponential weighted moving average (EWMA). The word "exponential" appears in EWMA because the weight of a given SampleRTT decays exponentially fast as the updates proceed. In the homework problems you will be asked to derive the exponential term in EstimatedRTT. Figure 3.32 shows the SampleRTT values and EstimatedRTT for a value of α = 1/8 for a TCP connection between gaia.cs.umass.edu (in Amherst, Massachusetts) and fantasia.eurecom.fr (in the south of France). Clearly, the variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT. In addition to having an estimate of the RTT, it is also valuable to have a measure of the variability of the RTT. \[RFC 6298\] defines the RTT variation, DevRTT, as an estimate of how much SampleRTT typically deviates from EstimatedRTT:

DevRTT=(1−β)⋅DevRTT+β⋅\|SampleRTT−EstimatedRTT\|

Note that DevRTT is an EWMA of the difference between SampleRTT and EstimatedRTT.
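Both estimators are one-line updates and are easy to express in code. The following sketch (ours, using the RFC 6298 gains of α = 1/8 and β = 1/4; the RFC recommends updating DevRTT before EstimatedRTT, as done here) tracks the two quantities as SampleRTT measurements arrive:

ALPHA = 0.125   # recommended weight for EstimatedRTT (RFC 6298)
BETA = 0.25     # recommended weight for DevRTT (RFC 6298)

def update_estimates(estimated_rtt, dev_rtt, sample_rtt):
    # EWMA of the deviation, computed against the current estimate
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    # EWMA of the samples themselves
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    return estimated_rtt, dev_rtt

# Example: EstimatedRTT = 0.30 sec, DevRTT = 0.06 sec, new SampleRTT = 0.50 sec
est, dev = update_estimates(0.30, 0.06, 0.50)
print(est, dev)   # 0.325 and 0.095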
If the SampleRTT values have little fluctuation, then DevRTT will be small; on the other hand, if there is a lot of fluctuation, DevRTT will be large. The recommended value of β is 0.25.

Setting and Managing the Retransmission Timeout Interval

Given values of EstimatedRTT and DevRTT, what value should be used for TCP's timeout interval? Clearly, the interval should be greater than or equal to EstimatedRTT, or unnecessary retransmissions would be sent. But the timeout interval should not be too much larger than EstimatedRTT; otherwise, when a segment is lost, TCP would not quickly retransmit the segment, leading to large data transfer delays. It is therefore desirable to set the timeout equal to the EstimatedRTT plus some margin. The margin should be large when there is a lot of fluctuation in the SampleRTT values; it should be small when there is little fluctuation. The value of DevRTT should thus come into play here. All of these considerations are taken into account in TCP's method for determining the retransmission timeout interval:

TimeoutInterval=EstimatedRTT+4⋅DevRTT

An initial TimeoutInterval value of 1 second is recommended \[RFC 6298\]. Also, when a timeout occurs, the value of TimeoutInterval is doubled to avoid a premature timeout occurring for a subsequent segment that will soon be acknowledged. However, as soon as a segment is received and EstimatedRTT is updated, the TimeoutInterval is again computed using the formula above.

PRINCIPLES IN PRACTICE

TCP provides reliable data transfer by using positive acknowledgments and timers in much the same way that we studied in Section 3.4. TCP acknowledges data that has been received correctly, and it then retransmits segments when segments or their corresponding acknowledgments are thought to be lost or corrupted. Certain versions of TCP also have an implicit NAK mechanism---with TCP's fast retransmit mechanism, the receipt of three duplicate ACKs for a given segment serves as an implicit NAK for the following segment, triggering retransmission of that segment before timeout. TCP uses sequence numbers to allow the receiver to identify lost or duplicate segments. Just as in the case of our reliable data transfer protocol, rdt3.0, TCP cannot itself tell for certain if a segment, or its ACK, is lost, corrupted, or overly delayed. At the sender, TCP's response will be the same: retransmit the segment in question. TCP also uses pipelining, allowing the sender to have multiple transmitted but yet-to-be-acknowledged segments outstanding at any given time. We saw earlier that pipelining can greatly improve a session's throughput when the ratio of the segment size to round-trip delay is small. The specific number of outstanding, unacknowledged segments that a sender can have is determined by TCP's flow-control and congestion-control mechanisms. TCP flow control is discussed at the end of this section; TCP congestion control is discussed in Section 3.7. For the time being, we must simply be aware that the TCP sender uses pipelining.

Figure 3.32 RTT samples and RTT estimates

3.5.4 Reliable Data Transfer

Recall that the Internet's network-layer service (IP service) is unreliable. IP does not guarantee datagram delivery, does not guarantee in-order delivery of datagrams, and does not guarantee the integrity of the data in the datagrams.
With IP service, datagrams can overflow router buffers and never reach their destination, datagrams can arrive out of order, and bits in the datagram can get corrupted (flipped from 0 to 1 and vice versa). Because transport-layer segments are carried across the network by IP datagrams, transport-layer segments can suffer from these problems as well. TCP creates a reliable data transfer service on top of IP's unreliable best-effort service. TCP's reliable data transfer service ensures that the data stream that a process reads out of its TCP receive buffer is uncorrupted, without gaps, without duplication, and in sequence; that is, the byte stream is exactly the same byte stream that was sent by the end system on the other side of the connection. How TCP provides a reliable data transfer involves many of the principles that we studied in Section 3.4. In our earlier development of reliable data transfer techniques, it was conceptually easiest to assume that an individual timer is associated with each transmitted but not yet acknowledged segment. While this is great in theory, timer management can require considerable overhead. Thus, the recommended TCP timer management procedures \[RFC 6298\] use only a single retransmission timer, even if there are multiple transmitted but not yet acknowledged segments. The TCP protocol described in this section follows this single-timer recommendation. We will discuss how TCP provides reliable data transfer in two incremental steps. We first present a highly simplified description of a TCP sender that uses only timeouts to recover from lost segments; we then present a more complete description that uses duplicate acknowledgments in addition to timeouts. In the ensuing discussion, we suppose that data is being sent in only one direction, from Host A to Host B, and that Host A is sending a large file. Figure 3.33 presents a highly simplified description of a TCP sender. We see that there are three major events related to data transmission and retransmission in the TCP sender: data received from application above; timer timeout; and ACK receipt.

Figure 3.33 Simplified TCP sender

Upon the occurrence of the first major event, TCP receives data from the application, encapsulates the data in a segment, and passes the segment to IP. Note that each segment includes a sequence number that is the byte-stream number of the first data byte in the segment, as described in Section 3.5.2. Also note that if the timer is not already running for some other segment, TCP starts the timer when the segment is passed to IP. (It is helpful to think of the timer as being associated with the oldest unacknowledged segment.) The expiration interval for this timer is the TimeoutInterval, which is calculated from EstimatedRTT and DevRTT, as described in Section 3.5.3. The second major event is the timeout. TCP responds to the timeout event by retransmitting the segment that caused the timeout. TCP then restarts the timer. The third major event that must be handled by the TCP sender is the arrival of an acknowledgment segment (ACK) from the receiver (more specifically, a segment containing a valid ACK field value). On the occurrence of this event, TCP compares the ACK value y with its variable SendBase. The TCP state variable SendBase is the sequence number of the oldest unacknowledged byte. (Thus SendBase--1 is the sequence number of the last byte that is known to have been received correctly and in order at the receiver.) As indicated earlier, TCP uses cumulative acknowledgments, so that y acknowledges the receipt of all bytes before byte number y. If y \> SendBase, then the ACK is acknowledging one or more previously unacknowledged segments. Thus the sender updates its SendBase variable; it also restarts the timer if there currently are any not-yet-acknowledged segments.
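The event-driven description of Figure 3.33 translates naturally into code. Here is a rough, self-contained rendition (ours; the send_to_ip and timer arguments are stand-ins for the real interfaces to IP and to the timer machinery):

class SimplifiedTCPSender:
    # The three-event sender of Figure 3.33: no flow control, no congestion
    # control, and one retransmission timer for the oldest unACKed segment.

    def __init__(self, initial_seq, send_to_ip, timer):
        self.send_base = initial_seq   # oldest unacknowledged byte
        self.next_seq = initial_seq    # sequence number of next new byte
        self.unacked = {}              # seq -> data, kept for retransmission
        self.send_to_ip = send_to_ip   # callable taking (seq, data)
        self.timer = timer             # object with start(), stop(), running()

    def on_app_data(self, data):       # event 1: data from application above
        self.send_to_ip(self.next_seq, data)
        self.unacked[self.next_seq] = data
        if not self.timer.running():
            self.timer.start()
        self.next_seq += len(data)

    def on_timeout(self):              # event 2: retransmit oldest segment
        self.send_to_ip(self.send_base, self.unacked[self.send_base])
        self.timer.start()

    def on_ack(self, y):               # event 3: cumulative ACK with value y
        if y > self.send_base:
            self.unacked = {s: d for s, d in self.unacked.items() if s >= y}
            self.send_base = y
            self.timer.stop()
            if self.unacked:           # restart only if data is still in flight
                self.timer.start()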
A Few Interesting Scenarios

We have just described a highly simplified version of how TCP provides reliable data transfer. But even this highly simplified version has many subtleties. To get a good feeling for how this protocol works, let's now walk through a few simple scenarios. Figure 3.34 depicts the first scenario, in which Host A sends one segment to Host B. Suppose that this segment has sequence number 92 and contains 8 bytes of data. After sending this segment, Host A waits for a segment from B with acknowledgment number 100. Although the segment from A is received at B, the acknowledgment from B to A gets lost. In this case, the timeout event occurs, and Host A retransmits the same segment. Of course, when Host B receives the retransmission, it observes from the sequence number that the segment contains data that has already been received. Thus, TCP in Host B will discard the bytes in the retransmitted segment.

Figure 3.34 Retransmission due to a lost acknowledgment

In a second scenario, shown in Figure 3.35, Host A sends two segments back to back. The first segment has sequence number 92 and 8 bytes of data, and the second segment has sequence number 100 and 20 bytes of data. Suppose that both segments arrive intact at B, and B sends two separate acknowledgments for each of these segments. The first of these acknowledgments has acknowledgment number 100; the second has acknowledgment number 120. Suppose now that neither of the acknowledgments arrives at Host A before the timeout. When the timeout event occurs, Host A resends the first segment with sequence number 92 and restarts the timer. As long as the ACK for the second segment arrives before the new timeout, the second segment will not be retransmitted. In a third and final scenario, suppose Host A sends the two segments, exactly as in the second example. The acknowledgment of the first segment is lost in the network, but just before the timeout event, Host A receives an acknowledgment with acknowledgment number 120. Host A therefore knows that Host B has received everything up through byte 119; so Host A does not resend either of the two segments. This scenario is illustrated in Figure 3.36.

Doubling the Timeout Interval

We now discuss a few modifications that most TCP implementations employ. The first concerns the length of the timeout interval after a timer expiration. In this modification, whenever the timeout event occurs, TCP retransmits the not-yet-acknowledged segment with the smallest sequence number, as described above. But each time TCP retransmits, it sets the next timeout interval to twice the previous value, rather than deriving it from the last EstimatedRTT and DevRTT (as described in Section 3.5.3).

Figure 3.35 Segment 100 not retransmitted

For example, suppose the TimeoutInterval associated with the oldest not yet acknowledged segment is 0.75 sec when the timer first expires. TCP will then retransmit this segment and set the new expiration time to 1.5 sec. If the timer expires again 1.5 sec later, TCP will again retransmit this segment, now setting the expiration time to 3.0 sec.
Thus the intervals grow exponentially after each retransmission. However, whenever the timer is started after either of the two other events (that is, data received from application above, and ACK received), the TimeoutInterval is derived from the most recent values of EstimatedRTT and DevRTT. This modification provides a limited form of congestion control. (More comprehensive forms of TCP congestion control will be studied in Section 3.7.) The timer expiration is most likely caused by congestion in the network, that is, too many packets arriving at one (or more) router queues in the path between the source and destination, causing packets to be dropped and/or long queuing delays. In times of congestion, if the sources continue to retransmit packets persistently, the congestion may get worse. Instead, TCP acts more politely, with each sender retransmitting after longer and longer intervals. We will see that a similar idea is used by Ethernet when we study CSMA/CD in Chapter 6.

Figure 3.36 A cumulative acknowledgment avoids retransmission of the first segment

Fast Retransmit

One of the problems with timeout-triggered retransmissions is that the timeout period can be relatively long. When a segment is lost, this long timeout period forces the sender to delay resending the lost packet, thereby increasing the end-to-end delay. Fortunately, the sender can often detect packet loss well before the timeout event occurs by noting so-called duplicate ACKs. A duplicate ACK is an ACK that reacknowledges a segment for which the sender has already received an earlier acknowledgment. To understand the sender's response to a duplicate ACK, we must look at why the receiver sends a duplicate ACK in the first place. Table 3.2 summarizes the TCP receiver's ACK generation policy \[RFC 5681\]. When a TCP receiver receives a segment with a sequence number that is larger than the next, expected, in-order sequence number, it detects a gap in the data stream---that is, a missing segment. This gap could be the result of lost or reordered segments within the network. Since TCP does not use negative acknowledgments, the receiver cannot send an explicit negative acknowledgment back to the sender. Instead, it simply reacknowledges (that is, generates a duplicate ACK for) the last in-order byte of data it has received. (Note that Table 3.2 allows for the case that the receiver does not discard out-of-order segments.)

Table 3.2 TCP ACK Generation Recommendation \[RFC 5681\]

| Event | TCP Receiver Action |
| --- | --- |
| Arrival of in-order segment with expected sequence number. All data up to expected sequence number already acknowledged. | Delayed ACK. Wait up to 500 msec for arrival of another in-order segment. If next in-order segment does not arrive in this interval, send an ACK. |
| Arrival of in-order segment with expected sequence number. One other in-order segment waiting for ACK transmission. | Immediately send single cumulative ACK, ACKing both in-order segments. |
| Arrival of out-of-order segment with higher-than-expected sequence number. Gap detected. | Immediately send duplicate ACK, indicating sequence number of next expected byte (which is the lower end of the gap). |
| Arrival of segment that partially or completely fills in gap in received data. | Immediately send ACK, provided that segment starts at the lower end of gap. |
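The first three cases of Table 3.2 can be captured by a small decision function. The sketch below (ours, and deliberately simplified: it omits the gap-filling case, which requires tracking the buffered out-of-order data) returns the updated receiver state together with the ACK to send, if any:

def ack_policy(seq, length, next_expected, delayed_ack_pending):
    # Returns (next_expected, delayed_ack_pending, ack_to_send or None).
    if seq == next_expected:
        next_expected += length
        if delayed_ack_pending:
            # Table 3.2, case 2: a second in-order segment is waiting;
            # send one immediate cumulative ACK covering both.
            return next_expected, False, next_expected
        # Case 1: in-order, nothing pending; wait up to 500 msec for
        # another in-order segment before ACKing (delayed ACK).
        return next_expected, True, None
    if seq > next_expected:
        # Case 3: gap detected; immediately send a duplicate ACK carrying
        # the sequence number of the next expected byte.
        return next_expected, delayed_ack_pending, next_expected
    # Old or duplicate data: simply re-ACK the next expected byte.
    return next_expected, delayed_ack_pending, next_expected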
Because a sender often sends a large number of segments back to back, if one segment is lost, there will likely be many back-to-back duplicate ACKs. If the TCP sender receives three duplicate ACKs for the same data, it takes this as an indication that the segment following the segment that has been ACKed three times has been lost. (In the homework problems, we consider the question of why the sender waits for three duplicate ACKs, rather than just a single duplicate ACK.) In the case that three duplicate ACKs are received, the TCP sender performs a fast retransmit \[RFC 5681\], retransmitting the missing segment before that segment's timer expires. This is shown in Figure 3.37, where the second segment is lost, then retransmitted before its timer expires. For TCP with fast retransmit, the following code snippet replaces the ACK received event in Figure 3.33:

event: ACK received, with ACK field value of y
    if (y \> SendBase) {
        SendBase=y
        if (there are currently any not yet acknowledged segments)
            start timer
    }
    else { /\* a duplicate ACK for already ACKed segment \*/
        increment number of duplicate ACKs received for y
        if (number of duplicate ACKs received for y==3)
            /\* TCP fast retransmit \*/
            resend segment with sequence number y
    }
    break;

Figure 3.37 Fast retransmit: retransmitting the missing segment before the segment's timer expires

We noted earlier that many subtle issues arise when a timeout/retransmit mechanism is implemented in an actual protocol such as TCP. The procedures above, which have evolved as a result of more than 20 years of experience with TCP timers, should convince you that this is indeed the case!

Go-Back-N or Selective Repeat?

Let us close our study of TCP's error-recovery mechanism by considering the following question: Is TCP a GBN or an SR protocol? Recall that TCP acknowledgments are cumulative and correctly received but out-of-order segments are not individually ACKed by the receiver. Consequently, as shown in Figure 3.33 (see also Figure 3.19), the TCP sender need only maintain the smallest sequence number of a transmitted but unacknowledged byte (SendBase) and the sequence number of the next byte to be sent (NextSeqNum). In this sense, TCP looks a lot like a GBN-style protocol. But there are some striking differences between TCP and Go-Back-N. Many TCP implementations will buffer correctly received but out-of-order segments \[Stevens 1994\]. Consider also what happens when the sender sends a sequence of segments 1, 2, ..., N, and all of the segments arrive in order without error at the receiver. Further suppose that the acknowledgment for packet n\<N gets lost, but the remaining N−1 acknowledgments arrive at the sender before their respective timeouts. In this example, GBN would retransmit not only packet n, but also all of the subsequent packets n+1, n+2, ..., N. TCP, on the other hand, would retransmit at most one segment, namely, segment n. Moreover, TCP would not even retransmit segment n if the acknowledgment for segment n+1 arrived before the timeout for segment n. A proposed modification to TCP, the so-called selective acknowledgment \[RFC 2018\], allows a TCP receiver to acknowledge out-of-order segments selectively rather than just cumulatively acknowledging the last correctly received, in-order segment.
When combined with selective retransmission---skipping the retransmission of segments that have already been selectively acknowledged by the receiver---TCP looks a lot like our generic SR protocol. Thus, TCP's error-recovery mechanism is probably best categorized as a hybrid of GBN and SR protocols.

3.5.5 Flow Control

Recall that the hosts on each side of a TCP connection set aside a receive buffer for the connection. When the TCP connection receives bytes that are correct and in sequence, it places the data in the receive buffer. The associated application process will read data from this buffer, but not necessarily at the instant the data arrives. Indeed, the receiving application may be busy with some other task and may not even attempt to read the data until long after it has arrived. If the application is relatively slow at reading the data, the sender can very easily overflow the connection's receive buffer by sending too much data too quickly. TCP provides a flow-control service to its applications to eliminate the possibility of the sender overflowing the receiver's buffer. Flow control is thus a speed-matching service---matching the rate at which the sender is sending against the rate at which the receiving application is reading. As noted earlier, a TCP sender can also be throttled due to congestion within the IP network; this form of sender control is referred to as congestion control, a topic we will explore in detail in Sections 3.6 and 3.7. Even though the actions taken by flow and congestion control are similar (the throttling of the sender), they are obviously taken for very different reasons. Unfortunately, many authors use the terms interchangeably, and the savvy reader would be wise to distinguish between them. Let's now discuss how TCP provides its flow-control service. In order to see the forest for the trees, we suppose throughout this section that the TCP implementation is such that the TCP receiver discards out-of-order segments. TCP provides flow control by having the sender maintain a variable called the receive window. Informally, the receive window is used to give the sender an idea of how much free buffer space is available at the receiver. Because TCP is full-duplex, the sender at each side of the connection maintains a distinct receive window. Let's investigate the receive window in the context of a file transfer. Suppose that Host A is sending a large file to Host B over a TCP connection. Host B allocates a receive buffer to this connection; denote its size by RcvBuffer. From time to time, the application process in Host B reads from the buffer. Define the following variables:

LastByteRead: the number of the last byte in the data stream read from the buffer by the application process in B

LastByteRcvd: the number of the last byte in the data stream that has arrived from the network and has been placed in the receive buffer at B

Because TCP is not permitted to overflow the allocated buffer, we must have

LastByteRcvd−LastByteRead≤RcvBuffer

The receive window, denoted rwnd, is set to the amount of spare room in the buffer:

rwnd=RcvBuffer−\[LastByteRcvd−LastByteRead\]

Because the spare room changes with time, rwnd is dynamic. The variable rwnd is illustrated in Figure 3.38.
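The receiver-side arithmetic is exactly the formula above; a tiny function (ours, using the text's variable names) makes it concrete:

def receive_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    # Spare room: rwnd = RcvBuffer - (LastByteRcvd - LastByteRead).
    # TCP never lets LastByteRcvd - LastByteRead exceed RcvBuffer.
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

# For example, a 65,536-byte buffer with 51,200 bytes arrived and 32,768
# bytes already read by the application leaves rwnd = 47,104 bytes:
print(receive_window(65536, 51200, 32768))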
How does the connection use the variable rwnd to provide the flow-control service? Host B tells Host A how much spare room it has in the connection buffer by placing its current value of rwnd in the receive window field of every segment it sends to A. Initially, Host B sets rwnd = RcvBuffer. Note that to pull this off, Host B must keep track of several connection-specific variables. Host A in turn keeps track of two variables, LastByteSent and LastByteAcked, which have obvious meanings. Note that the difference between these two variables, LastByteSent − LastByteAcked, is the amount of unacknowledged data that A has sent into the connection. By keeping the amount of unacknowledged data less than the value of rwnd, Host A is assured that it is not overflowing the receive buffer at Host B.

Figure 3.38 The receive window (rwnd) and the receive buffer (RcvBuffer)

Thus, Host A makes sure throughout the connection's life that

LastByteSent−LastByteAcked≤rwnd

There is one minor technical problem with this scheme. To see this, suppose Host B's receive buffer becomes full so that rwnd = 0. After advertising rwnd = 0 to Host A, also suppose that B has nothing to send to A. Now consider what happens. As the application process at B empties the buffer, TCP does not send new segments with new rwnd values to Host A; indeed, TCP sends a segment to Host A only if it has data to send or if it has an acknowledgment to send. Therefore, Host A is never informed that some space has opened up in Host B's receive buffer---Host A is blocked and can transmit no more data! To solve this problem, the TCP specification requires Host A to continue to send segments with one data byte when B's receive window is zero. These segments will be acknowledged by the receiver. Eventually the buffer will begin to empty and the acknowledgments will contain a nonzero rwnd value. The online site at http://www.awl.com/kurose-ross for this book provides an interactive Java applet that illustrates the operation of the TCP receive window. Having described TCP's flow-control service, we briefly mention here that UDP does not provide flow control and consequently, segments may be lost at the receiver due to buffer overflow. For example, consider sending a series of UDP segments from a process on Host A to a process on Host B. For a typical UDP implementation, UDP will append the segments to a finite-sized buffer that "precedes" the corresponding socket (that is, the door to the process). The process reads one entire segment at a time from the buffer. If the process does not read the segments fast enough from the buffer, the buffer will overflow and segments will get dropped.

3.5.6 TCP Connection Management

In this subsection we take a closer look at how a TCP connection is established and torn down. Although this topic may not seem particularly thrilling, it is important because TCP connection establishment can significantly add to perceived delays (for example, when surfing the Web). Furthermore, many of the most common network attacks---including the incredibly popular SYN flood attack---exploit vulnerabilities in TCP connection management. Let's first take a look at how a TCP connection is established. Suppose a process running in one host (client) wants to initiate a connection with another process in another host (server). The client application process first informs the client TCP that it wants to establish a connection to a process in the server.
The TCP in the client then proceeds to establish a TCP connection with the TCP in the server in the following manner:

Step 1. The client-side TCP first sends a special TCP segment to the server-side TCP. This special segment contains no application-layer data. But one of the flag bits in the segment's header (see Figure 3.29), the SYN bit, is set to 1. For this reason, this special segment is referred to as a SYN segment. In addition, the client randomly chooses an initial sequence number (client_isn) and puts this number in the sequence number field of the initial TCP SYN segment. This segment is encapsulated within an IP datagram and sent to the server. There has been considerable interest in properly randomizing the choice of the client_isn in order to avoid certain security attacks \[CERT 2001--09\].

Step 2. Once the IP datagram containing the TCP SYN segment arrives at the server host (assuming it does arrive!), the server extracts the TCP SYN segment from the datagram, allocates the TCP buffers and variables to the connection, and sends a connection-granted segment to the client TCP. (We'll see in Chapter 8 that the allocation of these buffers and variables before completing the third step of the three-way handshake makes TCP vulnerable to a denial-of-service attack known as SYN flooding.) This connection-granted segment also contains no application-layer data. However, it does contain three important pieces of information in the segment header. First, the SYN bit is set to 1. Second, the acknowledgment field of the TCP segment header is set to client_isn+1. Finally, the server chooses its own initial sequence number (server_isn) and puts this value in the sequence number field of the TCP segment header. This connection-granted segment is saying, in effect, "I received your SYN packet to start a connection with your initial sequence number, client_isn. I agree to establish this connection. My own initial sequence number is server_isn." The connection-granted segment is referred to as a SYNACK segment.

Step 3. Upon receiving the SYNACK segment, the client also allocates buffers and variables to the connection. The client host then sends the server yet another segment; this last segment acknowledges the server's connection-granted segment (the client does so by putting the value server_isn+1 in the acknowledgment field of the TCP segment header). The SYN bit is set to zero, since the connection is established. This third stage of the three-way handshake may carry client-to-server data in the segment payload.

Once these three steps have been completed, the client and server hosts can send segments containing data to each other. In each of these future segments, the SYN bit will be set to zero. Note that in order to establish the connection, three packets are sent between the two hosts, as illustrated in Figure 3.39. For this reason, this connection-establishment procedure is often referred to as a three-way handshake. Several aspects of the TCP three-way handshake are explored in the homework problems (Why are initial sequence numbers needed? Why is a three-way handshake, as opposed to a two-way handshake, needed?). It's interesting to note that a rock climber and a belayer (who is stationed below the rock climber and whose job it is to handle the climber's safety rope) use a three-way-handshake communication protocol that is identical to TCP's to ensure that both sides are ready before the climber begins the ascent.
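From the application's perspective, the entire handshake is hidden inside the socket calls of Chapter 2: it is triggered by connect() at the client and has already completed by the time accept() returns at the server. A minimal sketch (ours, in the style of the Chapter 2 programs; the port number is an arbitrary choice):

from socket import socket, AF_INET, SOCK_STREAM

# Server side: bind, listen, and accept; accept() returns only after the
# three-way handshake for an incoming connection has completed.
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', 12000))
serverSocket.listen(1)
connectionSocket, addr = serverSocket.accept()

# Client side (run in a separate process): connect() sends the SYN, waits
# for the SYNACK, and sends the final ACK before returning.
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect(('serverName', 12000))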
All good things must come to an end, and the same is true with a TCP connection. Either of the two processes participating in a TCP connection can end the connection. When a connection ends, the "resources" (that is, the buffers and variables) in the hosts are deallocated.

Figure 3.39 TCP three-way handshake: segment exchange

Figure 3.40 Closing a TCP connection

As an example, suppose the client decides to close the connection, as shown in Figure 3.40. The client application process issues a close command. This causes the client TCP to send a special TCP segment to the server process. This special segment has a flag bit in the segment's header, the FIN bit (see Figure 3.29), set to 1. When the server receives this segment, it sends the client an acknowledgment segment in return. The server then sends its own shutdown segment, which has the FIN bit set to 1. Finally, the client acknowledges the server's shutdown segment. At this point, all the resources in the two hosts are now deallocated. During the life of a TCP connection, the TCP protocol running in each host makes transitions through various TCP states. Figure 3.41 illustrates a typical sequence of TCP states that are visited by the client TCP. The client TCP begins in the CLOSED state. The application on the client side initiates a new TCP connection (by creating a socket object, as in the Python examples from Chapter 2). This causes TCP in the client to send a SYN segment to TCP in the server. After having sent the SYN segment, the client TCP enters the SYN_SENT state. While in the SYN_SENT state, the client TCP waits for a segment from the server TCP that includes an acknowledgment for the client's previous segment and has the SYN bit set to 1. Having received such a segment, the client TCP enters the ESTABLISHED state.

Figure 3.41 A typical sequence of TCP states visited by a client TCP

While in the ESTABLISHED state, the TCP client can send and receive TCP segments containing payload (that is, application-generated) data. Suppose that the client application decides it wants to close the connection. (Note that the server could also choose to close the connection.) This causes the client TCP to send a TCP segment with the FIN bit set to 1 and to enter the FIN_WAIT_1 state. While in the FIN_WAIT_1 state, the client TCP waits for a TCP segment from the server with an acknowledgment. When it receives this segment, the client TCP enters the FIN_WAIT_2 state. While in the FIN_WAIT_2 state, the client waits for another segment from the server with the FIN bit set to 1; after receiving this segment, the client TCP acknowledges the server's segment and enters the TIME_WAIT state. The TIME_WAIT state lets the TCP client resend the final acknowledgment in case the ACK is lost. The time spent in the TIME_WAIT state is implementation-dependent, but typical values are 30 seconds, 1 minute, and 2 minutes. After the wait, the connection formally closes and all resources on the client side (including port numbers) are released. Figure 3.42 illustrates the series of states typically visited by the server-side TCP, assuming the client begins connection teardown. The transitions are self-explanatory. In these two state-transition diagrams, we have only shown how a TCP connection is normally established and shut down.
We have not described what happens in certain pathological scenarios, for example, when both sides of a connection want to initiate or shut down at the same time. If you are interested in learning about this and other advanced issues concerning TCP, you are encouraged to see Stevens' comprehensive book \[Stevens 1994\].

Figure 3.42 A typical sequence of TCP states visited by a server-side TCP

Our discussion above has assumed that both the client and server are prepared to communicate, i.e., that the server is listening on the port to which the client sends its SYN segment. Let's consider what happens when a host receives a TCP segment whose port numbers or source IP address do not match with any of the ongoing sockets in the host. For example, suppose a host receives a TCP SYN packet with destination port 80, but the host is not accepting connections on port 80 (that is, it is not running a Web server on port 80). Then the host will send a special reset segment to the source. This TCP segment has the RST flag bit (see Section 3.5.2) set to 1. Thus, when a host sends a reset segment, it is telling the source "I don't have a socket for that segment. Please do not resend the segment." When a host receives a UDP packet whose destination port number doesn't match with an ongoing UDP socket, the host sends a special ICMP datagram, as discussed in Chapter 5. Now that we have a good understanding of TCP connection management, let's revisit the nmap port-scanning tool and examine more closely how it works. To explore a specific TCP port, say port 6789, on a target host, nmap will send a TCP SYN segment with destination port 6789 to that host. There are three possible outcomes: The source host receives a TCP SYNACK segment from the target host. Since this means that an application is running with TCP port 6789 on the target host, nmap returns "open."

FOCUS ON SECURITY: The SYN Flood Attack

We've seen in our discussion of TCP's three-way handshake that a server allocates and initializes connection variables and buffers in response to a received SYN. The server then sends a SYNACK in response, and awaits an ACK segment from the client. If the client does not send an ACK to complete the third step of this 3-way handshake, eventually (often after a minute or more) the server will terminate the half-open connection and reclaim the allocated resources. This TCP connection management protocol sets the stage for a classic Denial of Service (DoS) attack known as the SYN flood attack. In this attack, the attacker(s) send a large number of TCP SYN segments, without completing the third handshake step. With this deluge of SYN segments, the server's connection resources become exhausted as they are allocated (but never used!) for half-open connections; legitimate clients are then denied service. Such SYN flooding attacks were among the first documented DoS attacks \[CERT SYN 1996\]. Fortunately, an effective defense known as SYN cookies \[RFC 4987\] is now deployed in most major operating systems. SYN cookies work as follows: When the server receives a SYN segment, it does not know if the segment is coming from a legitimate user or is part of a SYN flood attack. So, instead of creating a half-open TCP connection for this SYN, the server creates an initial TCP sequence number that is a complicated function (hash function) of the source and destination IP addresses and port numbers of the SYN segment, as well as a secret number known only to the server.
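In outline, the computation might look like the following sketch (ours, not an actual kernel implementation; real SYN cookies also fold in a coarse timestamp and an encoding of the MSS):

import hashlib

SERVER_SECRET = b"known-only-to-this-server"   # illustrative secret

def syn_cookie(src_ip, src_port, dst_ip, dst_port):
    # Hash the connection 4-tuple together with the server's secret and
    # use the low-order 32 bits as the initial sequence number.
    msg = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}".encode() + SERVER_SECRET
    return int.from_bytes(hashlib.sha256(msg).digest()[:4], "big")

# On a SYN: reply with a SYNACK whose sequence number is syn_cookie(...),
# keeping no per-connection state. On a returning ACK: the connection is
# legitimate only if the acknowledgment field equals syn_cookie(...) + 1.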
This carefully crafted initial sequence number is the so-called "cookie." The server then sends the client a SYNACK packet with this special initial sequence number. Importantly, the server does not remember the cookie or any other state information corresponding to the SYN. A legitimate client will return an ACK segment. When the server receives this ACK, it must verify that the ACK corresponds to some SYN sent earlier. But how is this done if the server maintains no memory about SYN segments? As you may have guessed, it is done with the cookie. Recall that for a legitimate ACK, the value in the acknowledgment field is equal to the initial sequence number in the SYNACK (the cookie value in this case) plus one (see Figure 3.39). The server can then run the same hash function using the source and destination IP address and port numbers in the arriving ACK (which are the same as in the original SYN) and the secret number. If the result of the function plus one is the same as the acknowledgment value in the client's ACK, the server concludes that the ACK corresponds to an earlier SYN segment and is hence valid. The server then creates a fully open connection along with a socket. On the other hand, if the client does not return an ACK segment, then the original SYN has done no harm at the server, since the server hasn't yet allocated any resources in response to the original bogus SYN.

Returning to the nmap port scan, the second and third possible outcomes are as follows. The source host receives a TCP RST segment from the target host. This means that the SYN segment reached the target host, but the target host is not running an application with TCP port 6789. But the attacker at least knows that the segments destined to the host at port 6789 are not blocked by any firewall on the path between source and target hosts. (Firewalls are discussed in Chapter 8.) The source receives nothing. This likely means that the SYN segment was blocked by an intervening firewall and never reached the target host. Nmap is a powerful tool that can "case the joint" not only for open TCP ports, but also for open UDP ports, for firewalls and their configurations, and even for the versions of applications and operating systems. Most of this is done by manipulating TCP connection-management segments \[Skoudis 2006\]. You can download nmap from www.nmap.org. This completes our introduction to error control and flow control in TCP. In Section 3.7 we'll return to TCP and look at TCP congestion control in some depth. Before doing so, however, we first step back and examine congestion-control issues in a broader context.

3.6 Principles of Congestion Control

In the previous sections, we examined both the general principles and specific TCP mechanisms used to provide for a reliable data transfer service in the face of packet loss. We mentioned earlier that, in practice, such loss typically results from the overflowing of router buffers as the network becomes congested. Packet retransmission thus treats a symptom of network congestion (the loss of a specific transport-layer segment) but does not treat the cause of network congestion---too many sources attempting to send data at too high a rate. To treat the cause of network congestion, mechanisms are needed to throttle senders in the face of network congestion.
In this section, we consider the problem of congestion control in a general context, seeking to understand why congestion is a bad thing, how network congestion is manifested in the performance received by upper-layer applications, and various approaches that can be taken to avoid, or react to, network congestion. This more general study of congestion control is appropriate since, as with reliable data transfer, it is high on our "top-ten" list of fundamentally important problems in networking. The following section contains a detailed study of TCP's congestion-control algorithm.

3.6.1 The Causes and the Costs of Congestion

Let's begin our general study of congestion control by examining three increasingly complex scenarios in which congestion occurs. In each case, we'll look at why congestion occurs in the first place and at the cost of congestion (in terms of resources not fully utilized and poor performance received by the end systems). We'll not (yet) focus on how to react to, or avoid, congestion but rather focus on the simpler issue of understanding what happens as hosts increase their transmission rate and the network becomes congested.

Scenario 1: Two Senders, a Router with Infinite Buffers

We begin by considering perhaps the simplest congestion scenario possible: Two hosts (A and B) each have a connection that shares a single hop between source and destination, as shown in Figure 3.43. Let's assume that the application in Host A is sending data into the connection (for example, passing data to the transport-level protocol via a socket) at an average rate of λin bytes/sec. These data are original in the sense that each unit of data is sent into the socket only once. The underlying transport-level protocol is a simple one. Data is encapsulated and sent; no error recovery (for example, retransmission), flow control, or congestion control is performed. Ignoring the additional overhead due to adding transport- and lower-layer header information, the rate at which Host A offers traffic to the router in this first scenario is thus λin bytes/sec. Host B operates in a similar manner, and we assume for simplicity that it too is sending at a rate of λin bytes/sec. Packets from Hosts A and B pass through a router and over a shared outgoing link of capacity R. The router has buffers that allow it to store incoming packets when the packet-arrival rate exceeds the outgoing link's capacity. In this first scenario, we assume that the router has an infinite amount of buffer space.

Figure 3.43 Congestion scenario 1: Two connections sharing a single hop with infinite buffers

Figure 3.44 Congestion scenario 1: Throughput and delay as a function of host sending rate

Figure 3.44 plots the performance of Host A's connection under this first scenario. The left graph plots the per-connection throughput (number of bytes per second at the receiver) as a function of the connection-sending rate. For a sending rate between 0 and R/2, the throughput at the receiver equals the sender's sending rate---everything sent by the sender is received at the receiver with a finite delay. When the sending rate is above R/2, however, the throughput is only R/2. This upper limit on throughput is a consequence of the sharing of link capacity between two connections. The link simply cannot deliver packets to a receiver at a steady-state rate that exceeds R/2.
No matter how high Hosts A and B set their sending rates, they will each never see a throughput higher than R/2. Achieving a per-connection throughput of R/2 might actually appear to be a good thing, because the link is fully utilized in delivering packets to their destinations. The right-hand graph in Figure 3.44, however, shows the consequence of operating near link capacity. As the sending rate approaches R/2 (from the left), the average delay becomes larger and larger. When the sending rate exceeds R/2, the average number of queued packets in the router is unbounded, and the average delay between source and destination becomes infinite (assuming that the connections operate at these sending rates for an infinite period of time and there is an infinite amount of buffering available). Thus, while operating at an aggregate throughput of near R may be ideal from a throughput standpoint, it is far from ideal from a delay standpoint. Even in this (extremely) idealized scenario, we've already found one cost of a congested network---large queuing delays are experienced as the packet-arrival rate nears the link capacity.

Scenario 2: Two Senders and a Router with Finite Buffers

Let's now slightly modify scenario 1 in the following two ways (see Figure 3.45). First, the amount of router buffering is assumed to be finite. A consequence of this real-world assumption is that packets will be dropped when arriving to an already-full buffer. Second, we assume that each connection is reliable. If a packet containing a transport-level segment is dropped at the router, the sender will eventually retransmit it.

Figure 3.45 Scenario 2: Two hosts (with retransmissions) and a router with finite buffers

Because packets can be retransmitted, we must now be more careful with our use of the term sending rate. Specifically, let us again denote the rate at which the application sends original data into the socket by λin bytes/sec. The rate at which the transport layer sends segments (containing original data and retransmitted data) into the network will be denoted λ′in bytes/sec. λ′in is sometimes referred to as the offered load to the network. The performance realized under scenario 2 will now depend strongly on how retransmission is performed. First, consider the unrealistic case that Host A is able to somehow (magically!) determine whether or not a buffer is free in the router and thus sends a packet only when a buffer is free. In this case, no loss would occur, λin would be equal to λ′in, and the throughput of the connection would be equal to λin. This case is shown in Figure 3.46(a). From a throughput standpoint, performance is ideal---everything that is sent is received. Note that the average host sending rate cannot exceed R/2 under this scenario, since packet loss is assumed never to occur. Consider next the slightly more realistic case that the sender retransmits only when a packet is known for certain to be lost. (Again, this assumption is a bit of a stretch. However, it is possible that the sending host might set its timeout large enough to be virtually assured that a packet that has not been acknowledged has been lost.) In this case, the performance might look something like that shown in Figure 3.46(b). To appreciate what is happening here, consider the case that the offered load, λ′in (the rate of original data transmission plus retransmissions), equals R/2.
According to Figure 3.46(b), at this value of the offered load, the rate at which data are delivered to the receiver application is R/3. Thus, out of the 0.5R units of data transmitted, 0.333R bytes/sec (on average) are original data and 0.166R bytes/sec (on average) are retransmitted data. We see here another cost of a congested network---the sender must perform retransmissions in order to compensate for dropped (lost) packets due to buffer overflow.

Figure 3.46 Scenario 2 performance with finite buffers

Finally, let us consider the case that the sender may time out prematurely and retransmit a packet that has been delayed in the queue but not yet lost. In this case, both the original data packet and the retransmission may reach the receiver. Of course, the receiver needs but one copy of this packet and will discard the retransmission. In this case, the work done by the router in forwarding the retransmitted copy of the original packet was wasted, as the receiver will have already received the original copy of this packet. The router would have better used the link transmission capacity to send a different packet instead. Here then is yet another cost of a congested network---unneeded retransmissions by the sender in the face of large delays may cause a router to use its link bandwidth to forward unneeded copies of a packet. Figure 3.46(c) shows the throughput versus offered load when each packet is assumed to be forwarded (on average) twice by the router. Since each packet is forwarded twice, the throughput will have an asymptotic value of R/4 as the offered load approaches R/2.

Scenario 3: Four Senders, Routers with Finite Buffers, and Multihop Paths

In our final congestion scenario, four hosts transmit packets, each over overlapping two-hop paths, as shown in Figure 3.47. We again assume that each host uses a timeout/retransmission mechanism to implement a reliable data transfer service, that all hosts have the same value of λin, and that all router links have capacity R bytes/sec.

Figure 3.47 Four senders, routers with finite buffers, and multihop paths

Let's consider the connection from Host A to Host C, passing through routers R1 and R2. The A--C connection shares router R1 with the D--B connection and shares router R2 with the B--D connection. For extremely small values of λin, buffer overflows are rare (as in congestion scenarios 1 and 2), and the throughput approximately equals the offered load. For slightly larger values of λin, the corresponding throughput is also larger, since more original data is being transmitted into the network and delivered to the destination, and overflows are still rare. Thus, for small values of λin, an increase in λin results in an increase in λout.

Having considered the case of extremely low traffic, let's next examine the case that λin (and hence λ′in) is extremely large. Consider router R2. The A--C traffic arriving to router R2 (which arrives at R2 after being forwarded from R1) can have an arrival rate at R2 that is at most R, the capacity of the link from R1 to R2, regardless of the value of λin. If λ′in is extremely large for all connections (including the B--D connection), then the arrival rate of B--D traffic at R2 can be much larger than that of the A--C traffic.
Because the A--C and B--D traffic must compete at router R2 for the limited amount of buffer space, the amount of A--C traffic that successfully gets through R2 (that is, is not lost due to buffer overflow) becomes smaller and smaller as the offered load from B--D gets larger and larger. In the limit, as the offered load approaches infinity, an empty buffer at R2 is immediately filled by a B--D packet, and the throughput of the A--C connection at R2 goes to zero. This, in turn, implies that the A--C end-to-end throughput goes to zero in the limit of heavy traffic. These considerations give rise to the offered load versus throughput tradeoff shown in Figure 3.48.

Figure 3.48 Scenario 3 performance with finite buffers and multihop paths

The reason for the eventual decrease in throughput with increasing offered load is evident when one considers the amount of wasted work done by the network. In the high-traffic scenario outlined above, whenever a packet is dropped at a second-hop router, the work done by the first-hop router in forwarding a packet to the second-hop router ends up being "wasted." The network would have been equally well off (more accurately, equally bad off) if the first router had simply discarded that packet and remained idle. More to the point, the transmission capacity used at the first router to forward the packet to the second router could have been much more profitably used to transmit a different packet. (For example, when selecting a packet for transmission, it might be better for a router to give priority to packets that have already traversed some number of upstream routers.) So here we see yet another cost of dropping a packet due to congestion---when a packet is dropped along a path, the transmission capacity that was used at each of the upstream links to forward that packet to the point at which it is dropped ends up having been wasted.

3.6.2 Approaches to Congestion Control

In Section 3.7, we'll examine TCP's specific approach to congestion control in great detail. Here, we identify the two broad approaches to congestion control that are taken in practice and discuss specific network architectures and congestion-control protocols embodying these approaches. At the highest level, we can distinguish among congestion-control approaches by whether the network layer provides explicit assistance to the transport layer for congestion-control purposes:

End-to-end congestion control. In an end-to-end approach to congestion control, the network layer provides no explicit support to the transport layer for congestion-control purposes. Even the presence of network congestion must be inferred by the end systems based only on observed network behavior (for example, packet loss and delay). We'll see shortly in Section 3.7.1 that TCP takes this end-to-end approach toward congestion control, since the IP layer is not required to provide feedback to hosts regarding network congestion. TCP segment loss (as indicated by a timeout or the receipt of three duplicate acknowledgments) is taken as an indication of network congestion, and TCP decreases its window size accordingly. We'll also see a more recent proposal for TCP congestion control that uses increasing round-trip segment delay as an indicator of increased network congestion.

Network-assisted congestion control. With network-assisted congestion control, routers provide explicit feedback to the sender and/or receiver regarding the congestion state of the network.
This feedback may be as simple as a single bit indicating congestion at a link---an approach taken in the early IBM SNA \[Schwartz 1982\], DEC DECnet \[Jain 1989; Ramakrishnan 1990\], and ATM \[Black 1995\] network architectures. More sophisticated feedback is also possible. For example, in ATM Available Bit Rate (ABR) congestion control, a router informs the sender of the maximum host sending rate it (the router) can support on an outgoing link. As noted above, the Internet-default versions of IP and TCP adopt an end-to-end approach towards congestion control. We'll see, however, in Section 3.7.2 that, more recently, IP and TCP may also optionally implement network-assisted congestion control.

For network-assisted congestion control, congestion information is typically fed back from the network to the sender in one of two ways, as shown in Figure 3.49. Direct feedback may be sent from a network router to the sender. This form of notification typically takes the form of a choke packet (essentially saying, "I'm congested!"). The second and more common form of notification occurs when a router marks/updates a field in a packet flowing from sender to receiver to indicate congestion. Upon receipt of a marked packet, the receiver then notifies the sender of the congestion indication. This latter form of notification takes a full round-trip time.

Figure 3.49 Two feedback pathways for network-indicated congestion information

3.7 TCP Congestion Control

In this section we return to our study of TCP. As we learned in Section 3.5, TCP provides a reliable transport service between two processes running on different hosts. Another key component of TCP is its congestion-control mechanism. As indicated in the previous section, TCP must use end-to-end congestion control rather than network-assisted congestion control, since the IP layer provides no explicit feedback to the end systems regarding network congestion.

The approach taken by TCP is to have each sender limit the rate at which it sends traffic into its connection as a function of perceived network congestion. If a TCP sender perceives that there is little congestion on the path between itself and the destination, then the TCP sender increases its send rate; if the sender perceives that there is congestion along the path, then the sender reduces its send rate. But this approach raises three questions. First, how does a TCP sender limit the rate at which it sends traffic into its connection? Second, how does a TCP sender perceive that there is congestion on the path between itself and the destination? And third, what algorithm should the sender use to change its send rate as a function of perceived end-to-end congestion?

Let's first examine how a TCP sender limits the rate at which it sends traffic into its connection. In Section 3.5 we saw that each side of a TCP connection consists of a receive buffer, a send buffer, and several variables ( LastByteRead , rwnd , and so on). The TCP congestion-control mechanism operating at the sender keeps track of an additional variable, the congestion window. The congestion window, denoted cwnd , imposes a constraint on the rate at which a TCP sender can send traffic into the network. Specifically, the amount of unacknowledged data at a sender may not exceed the minimum of cwnd and rwnd , that is:

LastByteSent − LastByteAcked ≤ min{cwnd, rwnd}
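In code, this constraint might be maintained as in the minimal Python sketch below. This is our own illustration, not real TCP: the variable names mirror the text, and the byte counts in the example are invented.

```python
# A minimal sketch of the sender-side window constraint just stated.
def usable_window(last_byte_sent, last_byte_acked, cwnd, rwnd):
    """Bytes the sender may still transmit without violating
    LastByteSent - LastByteAcked <= min(cwnd, rwnd)."""
    in_flight = last_byte_sent - last_byte_acked   # unacknowledged data
    return max(0, min(cwnd, rwnd) - in_flight)

# Example: 6,000 bytes in flight, cwnd = 10,000 bytes, rwnd = 8,000 bytes.
print(usable_window(26_000, 20_000, cwnd=10_000, rwnd=8_000))  # -> 2000
```

Note that when rwnd is smaller than cwnd, flow control rather than congestion control is the binding constraint, which motivates the simplifying assumption made next.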
In order to focus on congestion control (as opposed to flow control), let us henceforth assume that the TCP receive buffer is so large that the receive-window constraint can be ignored; thus, the amount of unacknowledged data at the sender is solely limited by cwnd . We will also assume that the sender always has data to send, i.e., that all segments in the congestion window are sent. The constraint above limits the amount of unacknowledged data at the sender and therefore indirectly limits the sender's send rate. To see this, consider a connection for which loss and packet transmission delays are negligible. Then, roughly, at the beginning of every RTT, the constraint permits the sender to send cwnd bytes of data into the connection; at the end of the RTT the sender receives acknowledgments for the data. Thus the sender's send rate is roughly cwnd/RTT bytes/sec. By adjusting the value of cwnd , the sender can therefore adjust the rate at which it sends data into its connection.

Let's next consider how a TCP sender perceives that there is congestion on the path between itself and the destination. Let us define a "loss event" at a TCP sender as the occurrence of either a timeout or the receipt of three duplicate ACKs from the receiver. (Recall our discussion in Section 3.5.4 of the timeout event in Figure 3.33 and the subsequent modification to include fast retransmit on receipt of three duplicate ACKs.) When there is excessive congestion, then one (or more) router buffers along the path overflows, causing a datagram (containing a TCP segment) to be dropped. The dropped datagram, in turn, results in a loss event at the sender---either a timeout or the receipt of three duplicate ACKs---which is taken by the sender to be an indication of congestion on the sender-to-receiver path.

Having considered how congestion is detected, let's next consider the more optimistic case when the network is congestion-free, that is, when a loss event doesn't occur. In this case, acknowledgments for previously unacknowledged segments will be received at the TCP sender. As we'll see, TCP will take the arrival of these acknowledgments as an indication that all is well---that segments being transmitted into the network are being successfully delivered to the destination---and will use acknowledgments to increase its congestion window size (and hence its transmission rate). Note that if acknowledgments arrive at a relatively slow rate (e.g., if the end-end path has high delay or contains a low-bandwidth link), then the congestion window will be increased at a relatively slow rate. On the other hand, if acknowledgments arrive at a high rate, then the congestion window will be increased more quickly. Because TCP uses acknowledgments to trigger (or clock) its increase in congestion window size, TCP is said to be self-clocking.

Given the mechanism of adjusting the value of cwnd to control the sending rate, the critical question remains: How should a TCP sender determine the rate at which it should send? If TCP senders collectively send too fast, they can congest the network, leading to the type of congestion collapse that we saw in Figure 3.48.
Indeed, the version of TCP that we'll study shortly was developed in response to observed Internet congestion collapse \[Jacobson 1988\] under earlier versions of TCP. However, if TCP senders are too cautious and send too slowly, they could underutilize the bandwidth in the network; that is, the TCP senders could send at a higher rate without congesting the network. How then do the TCP senders determine their sending rates such that they don't congest the network but at the same time make use of all the available bandwidth? Are TCP senders explicitly coordinated, or is there a distributed approach in which the TCP senders can set their sending rates based only on local information? TCP answers these questions using the following guiding principles:

A lost segment implies congestion, and hence, the TCP sender's rate should be decreased when a segment is lost. Recall from our discussion in Section 3.5.4 that a timeout event or the receipt of four acknowledgments for a given segment (one original ACK and then three duplicate ACKs) is interpreted as an implicit "loss event" indication of the segment following the quadruply ACKed segment, triggering a retransmission of the lost segment. From a congestion-control standpoint, the question is how the TCP sender should decrease its congestion window size, and hence its sending rate, in response to this inferred loss event.

An acknowledged segment indicates that the network is delivering the sender's segments to the receiver, and hence, the sender's rate can be increased when an ACK arrives for a previously unacknowledged segment. The arrival of acknowledgments is taken as an implicit indication that all is well---segments are being successfully delivered from sender to receiver, and the network is thus not congested. The congestion window size can thus be increased.

Bandwidth probing. Given ACKs indicating a congestion-free source-to-destination path and loss events indicating a congested path, TCP's strategy for adjusting its transmission rate is to increase its rate in response to arriving ACKs until a loss event occurs, at which point, the transmission rate is decreased. The TCP sender thus increases its transmission rate to probe for the rate at which congestion onset begins, backs off from that rate, and then begins probing again to see if the congestion onset rate has changed. The TCP sender's behavior is perhaps analogous to the child who requests (and gets) more and more goodies until he/she is finally told "No!", backs off a bit, but then begins making requests again shortly afterwards. Note that there is no explicit signaling of congestion state by the network---ACKs and loss events serve as implicit signals---and that each TCP sender acts on local information asynchronously from other TCP senders.

Given this overview of TCP congestion control, we're now in a position to consider the details of the celebrated TCP congestion-control algorithm, which was first described in \[Jacobson 1988\] and is standardized in \[RFC 5681\]. The algorithm has three major components: (1) slow start, (2) congestion avoidance, and (3) fast recovery. Slow start and congestion avoidance are mandatory components of TCP, differing in how they increase the size of cwnd in response to received ACKs. We'll see shortly that slow start increases the size of cwnd more rapidly (despite its name!) than congestion avoidance. Fast recovery is recommended, but not required, for TCP senders.
Slow Start

When a TCP connection begins, the value of cwnd is typically initialized to a small value of 1 MSS \[RFC 3390\], resulting in an initial sending rate of roughly MSS/RTT. For example, if MSS = 500 bytes and RTT = 200 msec, the resulting initial sending rate is only about 20 kbps. Since the available bandwidth to the TCP sender may be much larger than MSS/RTT, the TCP sender would like to find the amount of available bandwidth quickly. Thus, in the slow-start state, the value of cwnd begins at 1 MSS and increases by 1 MSS every time a transmitted segment is first acknowledged. In the example of Figure 3.50, TCP sends the first segment into the network and waits for an acknowledgment. When this acknowledgment arrives, the TCP sender increases the congestion window by one MSS and sends out two maximum-sized segments. These segments are then acknowledged, with the sender increasing the congestion window by 1 MSS for each of the acknowledged segments, giving a congestion window of 4 MSS, and so on. This process results in a doubling of the sending rate every RTT. Thus, the TCP send rate starts slow but grows exponentially during the slow start phase.

Figure 3.50 TCP slow start

But when should this exponential growth end? Slow start provides several answers to this question. First, if there is a loss event (i.e., congestion) indicated by a timeout, the TCP sender sets the value of cwnd to 1 MSS and begins the slow start process anew. It also sets the value of a second state variable, ssthresh (shorthand for "slow start threshold") to cwnd/2 ---half of the value of the congestion window when congestion was detected. The second way in which slow start may end is directly tied to the value of ssthresh . Since ssthresh is half the value of cwnd when congestion was last detected, it might be a bit reckless to keep doubling cwnd when it reaches or surpasses the value of ssthresh . Thus, when the value of cwnd equals ssthresh , slow start ends and TCP transitions into congestion avoidance mode. As we'll see, TCP increases cwnd more cautiously when in congestion-avoidance mode. The final way in which slow start can end is if three duplicate ACKs are detected, in which case TCP performs a fast retransmit (see Section 3.5.4) and enters the fast recovery state, as discussed below. TCP's behavior in slow start is summarized in the FSM description of TCP congestion control in Figure 3.51. The slow-start algorithm traces its roots to \[Jacobson 1988\]; an approach similar to slow start was also proposed independently in \[Jain 1986\].

Congestion Avoidance

On entry to the congestion-avoidance state, the value of cwnd is approximately half its value when congestion was last encountered---congestion could be just around the corner! Thus, rather than doubling the value of cwnd every RTT, TCP adopts a more conservative approach and increases the value of cwnd by just a single MSS every RTT \[RFC 5681\]. This can be accomplished in several ways. A common approach is for the TCP sender to increase cwnd by MSS ⋅ (MSS/cwnd) bytes whenever a new acknowledgment arrives. For example, if MSS is 1,460 bytes and cwnd is 14,600 bytes, then 10 segments are being sent within an RTT. Each arriving ACK (assuming one ACK per segment) increases the congestion window size by 1/10 MSS, and thus, the value of the congestion window will have increased by one MSS after ACKs from all 10 segments have been received.
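The two window-growth rules can be summarized in a few lines of code. The following is a minimal Python sketch of our own under the assumptions above (one ACK per full-sized segment, no loss), not a complete TCP implementation.

```python
# Per-ACK cwnd growth: slow start versus congestion avoidance (sketch).
MSS = 1_460  # bytes

def on_new_ack(cwnd, ssthresh):
    """Return the new cwnd after one new (non-duplicate) ACK arrives."""
    if cwnd < ssthresh:
        return cwnd + MSS                # slow start: +1 MSS per ACK (doubles per RTT)
    return cwnd + MSS * MSS / cwnd       # congestion avoidance: ~ +1 MSS per RTT

cwnd, ssthresh = MSS, 8 * MSS
for rtt in range(1, 7):
    for _ in range(int(cwnd // MSS)):    # one ACK per in-flight segment
        cwnd = on_new_ack(cwnd, ssthresh)
    print(f"after RTT {rtt}: cwnd = {cwnd / MSS:.1f} MSS")
```

The printed trajectory (2.0, 4.0, 8.0, then roughly 9.0, 9.9, 10.8 MSS) shows exponential doubling up to ssthresh followed by approximately linear growth.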
But when should congestion avoidance's linear increase (of 1 MSS per RTT) end? TCP's congestion-avoidance algorithm behaves the same as slow start when a timeout occurs: The value of cwnd is set to 1 MSS, and the value of ssthresh is updated to half the value of cwnd when the loss event occurred. Recall, however, that a loss event also can be triggered by a triple duplicate ACK event.

Figure 3.51 FSM description of TCP congestion control

In this case, the network is continuing to deliver segments from sender to receiver (as indicated by the receipt of duplicate ACKs). So TCP's behavior to this type of loss event should be less drastic than with a timeout-indicated loss: TCP halves the value of cwnd (adding in 3 MSS for good measure to account for the triple duplicate ACKs received) and records the value of ssthresh to be half the value of cwnd when the triple duplicate ACKs were received. The fast-recovery state is then entered.

Fast Recovery

In fast recovery, the value of cwnd is increased by 1 MSS for every duplicate ACK received for the missing segment that caused TCP to enter the fast-recovery state. Eventually, when an ACK arrives for the missing segment, TCP enters the congestion-avoidance state after deflating cwnd . If a timeout event occurs, fast recovery transitions to the slow-start state after performing the same actions as in slow start and congestion avoidance: The value of cwnd is set to 1 MSS, and the value of ssthresh is set to half the value of cwnd when the loss event occurred.

PRINCIPLES IN PRACTICE

TCP SPLITTING: OPTIMIZING THE PERFORMANCE OF CLOUD SERVICES

For cloud services such as search, e-mail, and social networks, it is desirable to provide a high level of responsiveness, ideally giving users the illusion that the services are running within their own end systems (including their smartphones). This can be a major challenge, as users are often located far away from the data centers responsible for serving the dynamic content associated with the cloud services. Indeed, if the end system is far from a data center, then the RTT will be large, potentially leading to poor response time performance due to TCP slow start. As a case study, consider the delay in receiving a response for a search query. Typically, the server requires three TCP windows during slow start to deliver the response \[Pathak 2010\]. Thus the time from when an end system initiates a TCP connection until the time when it receives the last packet of the response is roughly 4 ⋅ RTT (one RTT to set up the TCP connection plus three RTTs for the three windows of data) plus the processing time in the data center. These RTT delays can lead to a noticeable delay in returning search results for a significant fraction of queries. Moreover, there can be significant packet loss in access networks, leading to TCP retransmissions and even larger delays.

One way to mitigate this problem and improve user-perceived performance is to (1) deploy front-end servers closer to the users, and (2) utilize TCP splitting by breaking the TCP connection at the front-end server. With TCP splitting, the client establishes a TCP connection to the nearby front-end, and the front-end maintains a persistent TCP connection to the data center with a very large TCP congestion window \[Tariq 2008, Pathak 2010, Chen 2011\]. With this approach, the response time roughly becomes 4 ⋅ RTT_FE + RTT_BE + processing time, where RTT_FE is the round-trip time between client and front-end server, and RTT_BE is the round-trip time between the front-end server and the data center (back-end server). If the front-end server is close to the client, then this response time approximately becomes RTT plus processing time, since RTT_FE is negligibly small and RTT_BE is approximately RTT.
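To get a feel for these two expressions, here is a quick back-of-the-envelope computation in Python. The RTT values are invented for illustration; only the two response-time formulas come from the discussion above.

```python
# Hypothetical delays illustrating the TCP-splitting formulas (sketch).
RTT_FE = 0.010            # client <-> front-end server, seconds (assumed)
RTT_BE = 0.090            # front-end <-> data center, seconds (assumed)
RTT = RTT_FE + RTT_BE     # client <-> data center without splitting

without_splitting = 4 * RTT               # roughly 4 RTT
with_splitting = 4 * RTT_FE + RTT_BE      # roughly RTT when RTT_FE is small
print(f"without splitting: {without_splitting * 1000:.0f} ms")  # 400 ms
print(f"with splitting:    {with_splitting * 1000:.0f} ms")     # 130 ms
```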
In summary, TCP splitting can reduce the networking delay roughly from 4 ⋅ RTT to RTT, significantly improving user-perceived performance, particularly for users who are far from the nearest data center. TCP splitting also helps reduce TCP retransmission delays caused by losses in access networks. Google and Akamai have made extensive use of their CDN servers in access networks (recall our discussion in Section 2.6) to perform TCP splitting for the cloud services they support \[Chen 2011\].

Fast recovery is a recommended, but not required, component of TCP \[RFC 5681\]. It is interesting that an early version of TCP, known as TCP Tahoe, unconditionally cut its congestion window to 1 MSS and entered the slow-start phase after either a timeout-indicated or triple-duplicate-ACK-indicated loss event. The newer version of TCP, TCP Reno, incorporated fast recovery. Figure 3.52 illustrates the evolution of TCP's congestion window for both Reno and Tahoe. In this figure, the threshold is initially equal to 8 MSS. For the first eight transmission rounds, Tahoe and Reno take identical actions. The congestion window climbs exponentially fast during slow start and hits the threshold at the fourth round of transmission. The congestion window then climbs linearly until a triple-duplicate-ACK event occurs, just after transmission round 8. Note that the congestion window is 12 ⋅ MSS when this loss event occurs. The value of ssthresh is then set to 0.5 ⋅ cwnd = 6 ⋅ MSS. Under TCP Reno, the congestion window is set to cwnd = 9 ⋅ MSS and then grows linearly. Under TCP Tahoe, the congestion window is set to 1 MSS and grows exponentially until it reaches the value of ssthresh , at which point it grows linearly.

Figure 3.52 Evolution of TCP's congestion window (Tahoe and Reno)
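The different reactions of Tahoe and Reno to a loss event can be captured in a small Python sketch. This is our own simplified rendering of the rules described above (window values in units of MSS), not the actual FSM of Figure 3.51.

```python
# Loss-event reactions for Tahoe and Reno (sketch; cwnd in units of MSS).
def on_loss(flavor, cwnd, kind):
    """Return (new_cwnd, new_ssthresh) after a loss event."""
    ssthresh = cwnd // 2                      # both flavors halve ssthresh
    if flavor == "tahoe" or kind == "timeout":
        return 1, ssthresh                    # back to slow start at 1 MSS
    return ssthresh + 3, ssthresh             # Reno fast recovery: half plus 3 MSS

# The triple-duplicate-ACK event of Figure 3.52, with cwnd = 12 MSS:
print(on_loss("reno", 12, "triple_dup_ack"))   # -> (9, 6)
print(on_loss("tahoe", 12, "triple_dup_ack"))  # -> (1, 6)
```

The two printed pairs match the figure: both flavors set ssthresh to 6 MSS, but Reno resumes from 9 MSS while Tahoe restarts from 1 MSS.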
AIMD congestion control gives rise to +the "saw tooth" behavior shown in Figure 3.53, which also nicely +illustrates our earlier intuition of TCP "probing" for bandwidth---TCP +linearly increases its congestion window size (and hence its +transmission rate) until a triple duplicate-ACK event occurs. It then +decreases its congestion window size by a factor of two but then again +begins increasing it linearly, probing to see if there is additional +available bandwidth. + +As noted previously, many TCP implementations use the Reno algorithm +\[Padhye 2001\]. Many variations of the Reno algorithm have been +proposed \[RFC 3782; RFC 2018\]. The TCP Vegas algorithm \[Brakmo 1995; +Ahn 1995\] attempts to avoid congestion while maintaining good +throughput. The basic idea of Vegas is to (1) detect congestion in the +routers between source and destination before packet loss occurs, and +(2) lower the rate linearly when this imminent packet loss is detected. +Imminent packet loss is predicted by observing the RTT. The longer the +RTT of the packets, the greater the congestion in the routers. As of +late 2015, the Ubuntu Linux implementation of TCP provided slowstart, +congestion avoidance, fast recovery, fast retransmit, and SACK, by +default; alternative congestion control algorithms, such as TCP Vegas +and BIC \[Xu 2004\], are also provided. For a survey of the many flavors +of TCP, see \[Afanasyev 2010\]. TCP's AIMD algorithm was developed based +on a tremendous amount of engineering insight and experimentation with +congestion control in operational networks. Ten years after TCP's +development, theoretical analyses showed that TCP's congestion-control +algorithm serves as a distributed asynchronous-optimization algorithm +that results in several important aspects of user and network +performance being simultaneously optimized \[Kelly 1998\]. A rich theory +of congestion control has since been developed \[Srikant 2004\]. +Macroscopic Description of TCP Throughput Given the saw-toothed behavior +of TCP, it's natural to consider what the average throughput (that is, +the average rate) of a long-lived TCP connection might be. In this +analysis we'll ignore the slow-start phases that occur after timeout +events. (These phases are typically very short, since the sender grows +out of the phase exponentially fast.) During a particular round-trip +interval, the rate at which TCP sends data is a function of the +congestion window and the current RTT. When the window size is w bytes +and the current round-trip time is RTT seconds, then TCP's transmission +rate is roughly w/RTT. TCP then probes for additional bandwidth by +increasing w by 1 MSS each RTT until a loss event occurs. Denote by W +the value of w when a loss event occurs. Assuming that RTT and W are +approximately constant over the duration of the connection, the TCP +transmission rate ranges from W/(2 · RTT) to W/RTT. These assumptions +lead to a highly simplified macroscopic model for the steady-state +behavior of TCP. The network drops a packet from the connection when the +rate increases to W/RTT; the rate is then cut in half and then increases +by MSS/RTT every RTT until it again reaches W/RTT. This process repeats +itself over and over again. 
Using this highly idealized model for the steady-state dynamics of TCP, we can also derive an interesting expression that relates a connection's loss rate to its available bandwidth \[Mahdavi 1997\]. This derivation is outlined in the homework problems. A more sophisticated model that has been found empirically to agree with measured data is \[Padhye 2000\].

TCP Over High-Bandwidth Paths

It is important to realize that TCP congestion control has evolved over the years and indeed continues to evolve. For a summary of current TCP variants and discussion of TCP evolution, see \[Floyd 2001, RFC 5681, Afanasyev 2010\]. What was good for the Internet when the bulk of the TCP connections carried SMTP, FTP, and Telnet traffic is not necessarily good for today's HTTP-dominated Internet or for a future Internet with services that are still undreamed of. The need for continued evolution of TCP can be illustrated by considering the high-speed TCP connections that are needed for grid- and cloud-computing applications. For example, consider a TCP connection with 1,500-byte segments and a 100 ms RTT, and suppose we want to send data through this connection at 10 Gbps. Following \[RFC 3649\], we note that using the TCP throughput formula above, in order to achieve a 10 Gbps throughput, the average congestion window size would need to be 83,333 segments. That's a lot of segments, leading us to be rather concerned that one of these 83,333 in-flight segments might be lost. What would happen in the case of a loss? Or, put another way, what fraction of the transmitted segments could be lost while still allowing the TCP congestion-control algorithm specified in Figure 3.51 to achieve the desired 10 Gbps rate? In the homework questions for this chapter, you are led through the derivation of a formula relating the throughput of a TCP connection as a function of the loss rate (L), the round-trip time (RTT), and the maximum segment size (MSS):

average throughput of a connection = (1.22 ⋅ MSS)/(RTT ⋅ √L)

Using this formula, we can see that in order to achieve a throughput of 10 Gbps, today's TCP congestion-control algorithm can only tolerate a segment loss probability of 2 ⋅ 10⁻¹⁰ (or equivalently, one loss event for every 5,000,000,000 segments)---a very low rate. This observation has led a number of researchers to investigate new versions of TCP that are specifically designed for such high-speed environments; see \[Jin 2004; Kelly 2003; Ha 2008; RFC 7323\] for discussions of these efforts.
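The numbers quoted above are easy to reproduce. The short Python computation below solves the throughput formula for the loss rate L; the only inputs are the segment size, RTT, and target rate from the example.

```python
# Reproducing the RFC 3649-style calculation for a 10 Gbps TCP connection.
MSS = 1_500                  # bytes
RTT = 0.1                    # seconds
target = 10e9 / 8            # 10 Gbps expressed in bytes/sec

W = target * RTT / MSS       # window needed to sustain the target rate
# average throughput = 1.22 * MSS / (RTT * sqrt(L))  =>  solve for L:
L = (1.22 * MSS / (RTT * target)) ** 2

print(f"required average window: {W:,.0f} segments")   # ~ 83,333
print(f"tolerable loss rate:     {L:.1e}")             # ~ 2.1e-10
print(f"segments per loss:       {1 / L:.2e}")         # ~ 4.7e9
```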
3.7.1 Fairness

Consider K TCP connections, each with a different end-to-end path, but all passing through a bottleneck link with transmission rate R bps. (By bottleneck link, we mean that for each connection, all the other links along the connection's path are not congested and have abundant transmission capacity as compared with the transmission capacity of the bottleneck link.) Suppose each connection is transferring a large file and there is no UDP traffic passing through the bottleneck link. A congestion-control mechanism is said to be fair if the average transmission rate of each connection is approximately R/K; that is, each connection gets an equal share of the link bandwidth.

Is TCP's AIMD algorithm fair, particularly given that different TCP connections may start at different times and thus may have different window sizes at a given point in time? \[Chiu 1989\] provides an elegant and intuitive explanation of why TCP congestion control converges to provide an equal share of a bottleneck link's bandwidth among competing TCP connections.

Let's consider the simple case of two TCP connections sharing a single link with transmission rate R, as shown in Figure 3.54.

Figure 3.54 Two TCP connections sharing a single bottleneck link

Assume that the two connections have the same MSS and RTT (so that if they have the same congestion window size, then they have the same throughput), that they have a large amount of data to send, and that no other TCP connections or UDP datagrams traverse this shared link. Also, ignore the slow-start phase of TCP and assume the TCP connections are operating in CA mode (AIMD) at all times. Figure 3.55 plots the throughput realized by the two TCP connections. If TCP is to share the link bandwidth equally between the two connections, then the realized throughput should fall along the 45-degree arrow (equal bandwidth share) emanating from the origin. Ideally, the sum of the two throughputs should equal R. (Certainly, each connection receiving an equal, but zero, share of the link capacity is not a desirable situation!) So the goal should be to have the achieved throughputs fall somewhere near the intersection of the equal bandwidth share line and the full bandwidth utilization line in Figure 3.55.

Suppose that the TCP window sizes are such that at a given point in time, connections 1 and 2 realize throughputs indicated by point A in Figure 3.55. Because the amount of link bandwidth jointly consumed by the two connections is less than R, no loss will occur, and both connections will increase their window by 1 MSS per RTT as a result of TCP's congestion-avoidance algorithm. Thus, the joint throughput of the two connections proceeds along a 45-degree line (equal increase for both connections) starting from point A. Eventually, the link bandwidth jointly consumed by the two connections will be greater than R, and eventually packet loss will occur. Suppose that connections 1 and 2 experience packet loss when they realize throughputs indicated by point B. Connections 1 and 2 then decrease their windows by a factor of two. The resulting throughputs realized are thus at point C, halfway along a vector starting at B and ending at the origin. Because the joint bandwidth use is less than R at point C, the two connections again increase their throughputs along a 45-degree line starting from C. Eventually, loss will again occur, for example, at point D, and the two connections again decrease their window sizes by a factor of two, and so on.

Figure 3.55 Throughput realized by TCP connections 1 and 2

You should convince yourself that the bandwidth realized by the two connections eventually fluctuates along the equal bandwidth share line. You should also convince yourself that the two connections will converge to this behavior regardless of where they are in the two-dimensional space! Although a number of idealized assumptions lie behind this scenario, it still provides an intuitive feel for why TCP results in an equal sharing of bandwidth among connections.
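The convergence argument can also be seen in a toy simulation. The following Python sketch is ours, not from the text: it abstracts each connection to a single throughput number, uses arbitrary illustrative parameters, and applies the AIMD rules synchronously (equal RTTs, as assumed above).

```python
# Toy simulation of the Figure 3.55 argument: two AIMD flows converge
# toward an equal share of a bottleneck link (illustrative parameters).
R = 100.0                      # link capacity (arbitrary units)
x1, x2 = 10.0, 60.0            # unequal starting throughputs (a point like "A")

for _ in range(500):
    if x1 + x2 <= R:           # no loss: each flow adds 1 unit per RTT
        x1 += 1.0
        x2 += 1.0
    else:                      # loss: both flows halve their rates
        x1 /= 2.0
        x2 /= 2.0

print(f"x1 = {x1:.1f}, x2 = {x2:.1f}")  # x1 ~ x2: on the equal-share line
```

Each additive phase preserves the gap between the two flows while each multiplicative decrease halves it, so the gap shrinks geometrically and the pair ends up fluctuating along the equal bandwidth share line.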
In our idealized scenario, we assumed that only TCP connections traverse the bottleneck link, that the connections have the same RTT value, and that only a single TCP connection is associated with a host-destination pair. In practice, these conditions are typically not met, and client-server applications can thus obtain very unequal portions of link bandwidth. In particular, it has been shown that when multiple connections share a common bottleneck, those sessions with a smaller RTT are able to grab the available bandwidth at that link more quickly as it becomes free (that is, open their congestion windows faster) and thus will enjoy higher throughput than those connections with larger RTTs \[Lakshman 1997\].

Fairness and UDP

We have just seen how TCP congestion control regulates an application's transmission rate via the congestion window mechanism. Many multimedia applications, such as Internet phone and video conferencing, often do not run over TCP for this very reason---they do not want their transmission rate throttled, even if the network is very congested. Instead, these applications prefer to run over UDP, which does not have built-in congestion control. When running over UDP, applications can pump their audio and video into the network at a constant rate and occasionally lose packets, rather than reduce their rates to "fair" levels at times of congestion and not lose any packets. From the perspective of TCP, the multimedia applications running over UDP are not being fair---they do not cooperate with the other connections nor adjust their transmission rates appropriately. Because TCP congestion control will decrease its transmission rate in the face of increasing congestion (loss), while UDP sources need not, it is possible for UDP sources to crowd out TCP traffic. An area of research today is thus the development of congestion-control mechanisms for the Internet that prevent UDP traffic from bringing the Internet's throughput to a grinding halt \[Floyd 1999; Floyd 2000; Kohler 2006; RFC 4340\].

Fairness and Parallel TCP Connections

But even if we could force UDP traffic to behave fairly, the fairness problem would still not be completely solved. This is because there is nothing to stop a TCP-based application from using multiple parallel connections. For example, Web browsers often use multiple parallel TCP connections to transfer the multiple objects within a Web page. (The exact number of multiple connections is configurable in most browsers.) When an application uses multiple parallel connections, it gets a larger fraction of the bandwidth in a congested link. As an example, consider a link of rate R supporting nine ongoing client-server applications, with each of the applications using one TCP connection. If a new application comes along and also uses one TCP connection, then each application gets approximately the same transmission rate of R/10. But if this new application instead uses 11 parallel TCP connections, then the new application gets an unfair allocation of more than R/2 (11 of the 20 connections, or 11R/20). Because Web traffic is so pervasive in the Internet, multiple parallel connections are not uncommon.

3.7.2 Explicit Congestion Notification (ECN): Network-assisted Congestion Control

Since the initial standardization of slow start and congestion avoidance in the late 1980s \[RFC 1122\], TCP has implemented the form of end-to-end congestion control that we studied in Section 3.7: a TCP sender receives no explicit congestion indications from the network layer, and instead infers congestion through observed packet loss. More recently, extensions to both IP and TCP \[RFC 3168\] have been proposed, implemented, and deployed that allow the network to explicitly signal congestion to a TCP sender and receiver. This form of network-assisted congestion control is known as Explicit Congestion Notification. As shown in Figure 3.56, the TCP and IP protocols are involved.

Figure 3.56 Explicit Congestion Notification: network-assisted congestion control

At the network layer, two bits (with four possible values, overall) in the Type of Service field of the IP datagram header (which we'll discuss in Section 4.3) are used for ECN. One setting of the ECN bits is used by a router to indicate that it (the router) is experiencing congestion. This congestion indication is then carried in the marked IP datagram to the destination host, which then informs the sending host, as shown in Figure 3.56. RFC 3168 does not provide a definition of when a router is congested; that decision is a configuration choice made possible by the router vendor, and decided by the network operator. However, RFC 3168 does recommend that an ECN congestion indication be set only in the face of persistent congestion. A second setting of the ECN bits is used by the sending host to inform routers that the sender and receiver are ECN-capable, and thus capable of taking action in response to ECN-indicated network congestion.

As shown in Figure 3.56, when the TCP in the receiving host receives an ECN congestion indication via a received datagram, the TCP in the receiving host informs the TCP in the sending host of the congestion indication by setting the ECE (Explicit Congestion Notification Echo) bit (see Figure 3.29) in a receiver-to-sender TCP ACK segment. The TCP sender, in turn, reacts to an ACK with an ECE congestion indication by halving the congestion window, as it would react to a lost segment using fast retransmit, and sets the CWR (Congestion Window Reduced) bit in the header of the next transmitted TCP sender-to-receiver segment.

Other transport-layer protocols besides TCP may also make use of network-layer-signaled ECN. The Datagram Congestion Control Protocol (DCCP) \[RFC 4340\] provides a low-overhead, congestion-controlled UDP-like unreliable service that utilizes ECN. DCTCP (Data Center TCP) \[Alizadeh 2010\], a version of TCP designed specifically for data center networks, also makes use of ECN.
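As a rough illustration of the sender-side ECN reaction described above, consider the following Python sketch. It is not a real TCP stack, and the 1-MSS lower bound on the window is our own simplifying assumption.

```python
# Sender-side reaction to an ACK carrying the ECE bit (sketch).
def on_ack(cwnd, mss, ece_set):
    """Return (new_cwnd, set_cwr_on_next_segment)."""
    if ece_set:
        # Halve cwnd, as for a fast-retransmit loss; floor of 1 MSS assumed.
        return max(cwnd // 2, mss), True   # mark CWR on the next segment
    return cwnd, False

print(on_ack(20_000, 1_460, ece_set=True))   # -> (10000, True)
```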
3.8 Summary

We began this chapter by studying the services that a transport-layer protocol can provide to network applications. At one extreme, the transport-layer protocol can be very simple and offer a no-frills service to applications, providing only a multiplexing/demultiplexing function for communicating processes. The Internet's UDP protocol is an example of such a no-frills transport-layer protocol. At the other extreme, a transport-layer protocol can provide a variety of guarantees to applications, such as reliable delivery of data, delay guarantees, and bandwidth guarantees. Nevertheless, the services that a transport protocol can provide are often constrained by the service model of the underlying network-layer protocol. If the network-layer protocol cannot provide delay or bandwidth guarantees to transport-layer segments, then the transport-layer protocol cannot provide delay or bandwidth guarantees for the messages sent between processes.

We learned in Section 3.4 that a transport-layer protocol can provide reliable data transfer even if the underlying network layer is unreliable. We saw that providing reliable data transfer has many subtle points, but that the task can be accomplished by carefully combining acknowledgments, timers, retransmissions, and sequence numbers. Although we covered reliable data transfer in this chapter, we should keep in mind that reliable data transfer can be provided by link-, network-, transport-, or application-layer protocols. Any of the upper four layers of the protocol stack can implement acknowledgments, timers, retransmissions, and sequence numbers and provide reliable data transfer to the layer above. In fact, over the years, engineers and computer scientists have independently designed and implemented link-, network-, transport-, and application-layer protocols that provide reliable data transfer (although many of these protocols have quietly disappeared).

In Section 3.5, we took a close look at TCP, the Internet's connection-oriented and reliable transport-layer protocol. We learned that TCP is complex, involving connection management, flow control, and round-trip time estimation, as well as reliable data transfer. In fact, TCP is actually more complex than our description---we intentionally did not discuss a variety of TCP patches, fixes, and improvements that are widely implemented in various versions of TCP. All of this complexity, however, is hidden from the network application. If a client on one host wants to send data reliably to a server on another host, it simply opens a TCP socket to the server and pumps data into that socket. The client-server application is blissfully unaware of TCP's complexity.

In Section 3.6, we examined congestion control from a broad perspective, and in Section 3.7, we showed how TCP implements congestion control. We learned that congestion control is imperative for the well-being of the network. Without congestion control, a network can easily become gridlocked, with little or no data being transported end-to-end. In Section 3.7 we learned that TCP implements an end-to-end congestion-control mechanism that additively increases its transmission rate when the TCP connection's path is judged to be congestion-free, and multiplicatively decreases its transmission rate when loss occurs. This mechanism also strives to give each TCP connection passing through a congested link an equal share of the link bandwidth. We also examined in some depth the impact of TCP connection establishment and slow start on latency. We observed that in many important scenarios, connection establishment and slow start significantly contribute to end-to-end delay. We emphasize once more that while TCP congestion control has evolved over the years, it remains an area of intensive research and will likely continue to evolve in the upcoming years.

Our discussion of specific Internet transport protocols in this chapter has focused on UDP and TCP---the two "work horses" of the Internet transport layer.
However, two decades of experience with these two protocols has identified circumstances in which neither is ideally suited. Researchers have thus been busy developing additional transport-layer protocols, several of which are now IETF proposed standards.

The Datagram Congestion Control Protocol (DCCP) \[RFC 4340\] provides a low-overhead, message-oriented, UDP-like unreliable service, but with an application-selected form of congestion control that is compatible with TCP. If reliable or semi-reliable data transfer is needed by an application, then this would be performed within the application itself, perhaps using the mechanisms we have studied in Section 3.4. DCCP is envisioned for use in applications such as streaming media (see Chapter 9) that can exploit the tradeoff between timeliness and reliability of data delivery, but that want to be responsive to network congestion.

Google's QUIC (Quick UDP Internet Connections) protocol \[Iyengar 2016\], implemented in Google's Chromium browser, provides reliability via retransmission as well as error correction, fast connection setup, and a rate-based congestion control algorithm that aims to be TCP friendly---all implemented as an application-level protocol on top of UDP. In early 2015, Google reported that roughly half of all requests from Chrome to Google servers are served over QUIC.

DCTCP (Data Center TCP) \[Alizadeh 2010\] is a version of TCP designed specifically for data center networks, and uses ECN to better support the mix of short- and long-lived flows that characterize data center workloads.

The Stream Control Transmission Protocol (SCTP) \[RFC 4960, RFC 3286\] is a reliable, message-oriented protocol that allows several different application-level "streams" to be multiplexed through a single SCTP connection (an approach known as "multi-streaming"). From a reliability standpoint, the different streams within the connection are handled separately, so that packet loss in one stream does not affect the delivery of data in other streams. QUIC provides similar multi-stream semantics. SCTP also allows data to be transferred over two outgoing paths when a host is connected to two or more networks, optional delivery of out-of-order data, and a number of other features. SCTP's flow- and congestion-control algorithms are essentially the same as in TCP.

The TCP-Friendly Rate Control (TFRC) protocol \[RFC 5348\] is a congestion-control protocol rather than a full-fledged transport-layer protocol. It specifies a congestion-control mechanism that could be used in another transport protocol such as DCCP (indeed one of the two application-selectable protocols available in DCCP is TFRC). The goal of TFRC is to smooth out the "saw tooth" behavior (see Figure 3.53) in TCP congestion control, while maintaining a long-term sending rate that is "reasonably" close to that of TCP. With a smoother sending rate than TCP, TFRC is well-suited for multimedia applications such as IP telephony or streaming media where such a smooth rate is important. TFRC is an "equation-based" protocol that uses the measured packet loss rate as input to an equation \[Padhye 2000\] that estimates what TCP's throughput would be if a TCP session experiences that loss rate. This rate is then taken as TFRC's target sending rate. Only the future will tell whether DCCP, SCTP, QUIC, or TFRC will see widespread deployment.
While these protocols clearly provide enhanced capabilities over TCP and UDP, TCP and UDP have proven themselves "good enough" over the years. Whether "better" wins out over "good enough" will depend on a complex mix of technical, social, and business considerations.

In Chapter 1, we said that a computer network can be partitioned into the "network edge" and the "network core." The network edge covers everything that happens in the end systems. Having now covered the application layer and the transport layer, our discussion of the network edge is complete. It is time to explore the network core! This journey begins in the next two chapters, where we'll study the network layer, and continues into Chapter 6, where we'll study the link layer.

Homework Problems and Questions

Chapter 3 Review Questions

SECTIONS 3.1--3.3

R1. Suppose the network layer provides the following service. The network layer in the source host accepts a segment of maximum size 1,200 bytes and a destination host address from the transport layer. The network layer then guarantees to deliver the segment to the transport layer at the destination host. Suppose many network application processes can be running at the destination host.

a. Design the simplest possible transport-layer protocol that will get application data to the desired process at the destination host. Assume the operating system in the destination host has assigned a 4-byte port number to each running application process.

b. Modify this protocol so that it provides a "return address" to the destination process.

c. In your protocols, does the transport layer "have to do anything" in the core of the computer network?

R2. Consider a planet where everyone belongs to a family of six, every family lives in its own house, each house has a unique address, and each person in a given house has a unique name. Suppose this planet has a mail service that delivers letters from source house to destination house. The mail service requires that (1) the letter be in an envelope, and that (2) the address of the destination house (and nothing more) be clearly written on the envelope. Suppose each family has a delegate family member who collects and distributes letters for the other family members. The letters do not necessarily provide any indication of the recipients of the letters.

a. Using the solution to Problem R1 above as inspiration, describe a protocol that the delegates can use to deliver letters from a sending family member to a receiving family member.

b. In your protocol, does the mail service ever have to open the envelope and examine the letter in order to provide its service?

R3. Consider a TCP connection between Host A and Host B. Suppose that the TCP segments traveling from Host A to Host B have source port number x and destination port number y. What are the source and destination port numbers for the segments traveling from Host B to Host A?

R4. Describe why an application developer might choose to run an application over UDP rather than TCP.

R5. Why is it that voice and video traffic is often sent over TCP rather than UDP in today's Internet? (Hint: The answer we are looking for has nothing to do with TCP's congestion-control mechanism.)

R6. Is it possible for an application to enjoy reliable data transfer even when the application runs over UDP? If so, how?

R7. Suppose a process in Host C has a UDP socket with port number 6789.
Suppose both Host A and Host B each send a UDP segment to Host C with destination port number 6789. Will both of these segments be directed to the same socket at Host C? If so, how will the process at Host C know that these two segments originated from two different hosts?

R8. Suppose that a Web server runs in Host C on port 80. Suppose this Web server uses persistent connections, and is currently receiving requests from two different Hosts, A and B. Are all of the requests being sent through the same socket at Host C? If they are being passed through different sockets, do both of the sockets have port 80? Discuss and explain.

SECTION 3.4

R9. In our rdt protocols, why did we need to introduce sequence numbers?

R10. In our rdt protocols, why did we need to introduce timers?

R11. Suppose that the roundtrip delay between sender and receiver is constant and known to the sender. Would a timer still be necessary in protocol rdt3.0 , assuming that packets can be lost? Explain.

R12. Visit the Go-Back-N Java applet at the companion Web site.

a. Have the source send five packets, and then pause the animation before any of the five packets reach the destination. Then kill the first packet and resume the animation. Describe what happens.

b. Repeat the experiment, but now let the first packet reach the destination and kill the first acknowledgment. Describe again what happens.

c. Finally, try sending six packets. What happens?

R13. Repeat R12, but now with the Selective Repeat Java applet. How are Selective Repeat and Go-Back-N different?

SECTION 3.5

R14. True or false?

a. Host A is sending Host B a large file over a TCP connection. Assume Host B has no data to send Host A. Host B will not send acknowledgments to Host A because Host B cannot piggyback the acknowledgments on data.

b. The size of the TCP rwnd never changes throughout the duration of the connection.

c. Suppose Host A is sending Host B a large file over a TCP connection. The number of unacknowledged bytes that A sends cannot exceed the size of the receive buffer.

d. Suppose Host A is sending a large file to Host B over a TCP connection. If the sequence number for a segment of this connection is m, then the sequence number for the subsequent segment will necessarily be m+1.

e. The TCP segment has a field in its header for rwnd .

f. Suppose that the last SampleRTT in a TCP connection is equal to 1 sec. The current value of TimeoutInterval for the connection will necessarily be ≥1 sec.

g. Suppose Host A sends one segment with sequence number 38 and 4 bytes of data over a TCP connection to Host B. In this same segment the acknowledgment number is necessarily 42.

R15. Suppose Host A sends two TCP segments back to back to Host B over a TCP connection. The first segment has sequence number 90; the second has sequence number 110.

a. How much data is in the first segment?

b. Suppose that the first segment is lost but the second segment arrives at B. In the acknowledgment that Host B sends to Host A, what will be the acknowledgment number?

R16. Consider the Telnet example discussed in Section 3.5 . A few seconds after the user types the letter 'C,' the user types the letter 'R.' After typing the letter 'R,' how many segments are sent, and what is put in the sequence number and acknowledgment fields of the segments?

SECTION 3.7

R17. Suppose two TCP connections are present over some bottleneck link of rate R bps.
Both connections have a huge file to send (in the same direction over the bottleneck link). The transmissions of the files start at the same time. What transmission rate would TCP like to give to each of the connections?

R18. True or false? Consider congestion control in TCP. When the timer expires at the sender, the value of ssthresh is set to one half of its previous value.

R19. In the discussion of TCP splitting in the sidebar in Section 3.7, it was claimed that the response time with TCP splitting is approximately 4 ⋅ RTT_FE + RTT_BE + processing time. Justify this claim.

Problems

P1. Suppose Client A initiates a Telnet session with Server S. At about the same time, Client B also initiates a Telnet session with Server S. Provide possible source and destination port numbers for

a. The segments sent from A to S.

b. The segments sent from B to S.

c. The segments sent from S to A.

d. The segments sent from S to B.

e. If A and B are different hosts, is it possible that the source port number in the segments from A to S is the same as that from B to S?

f. How about if they are the same host?

P2. Consider Figure 3.5. What are the source and destination port values in the segments flowing from the server back to the clients' processes? What are the IP addresses in the network-layer datagrams carrying the transport-layer segments?

P3. UDP and TCP use 1s complement for their checksums. Suppose you have the following three 8-bit bytes: 01010011, 01100110, 01110100. What is the 1s complement of the sum of these 8-bit bytes? (Note that although UDP and TCP use 16-bit words in computing the checksum, for this problem you are being asked to consider 8-bit sums.) Show all work. Why is it that UDP takes the 1s complement of the sum; that is, why not just use the sum? With the 1s complement scheme, how does the receiver detect errors? Is it possible that a 1-bit error will go undetected? How about a 2-bit error?

P4.

a. Suppose you have the following 2 bytes: 01011100 and 01100101. What is the 1s complement of the sum of these 2 bytes?

b. Suppose you have the following 2 bytes: 11011010 and 01100101. What is the 1s complement of the sum of these 2 bytes?

c. For the bytes in part (a), give an example where one bit is flipped in each of the 2 bytes and yet the 1s complement doesn't change.

P5. Suppose that the UDP receiver computes the Internet checksum for the received UDP segment and finds that it matches the value carried in the checksum field. Can the receiver be absolutely certain that no bit errors have occurred? Explain.

P6. Consider our motivation for correcting protocol rdt2.1. Show that the receiver, shown in Figure 3.57, when operating with the sender shown in Figure 3.11, can lead the sender and receiver to enter into a deadlock state, where each is waiting for an event that will never occur.

P7. In protocol rdt3.0, the ACK packets flowing from the receiver to the sender do not have sequence numbers (although they do have an ACK field that contains the sequence number of the packet they are acknowledging). Why is it that our ACK packets do not require sequence numbers?

Figure 3.57 An incorrect receiver for protocol rdt 2.1

P8. Draw the FSM for the receiver side of protocol rdt3.0.

P9. Give a trace of the operation of protocol rdt3.0 when data packets and acknowledgment packets are garbled. Your trace should be similar to that used in Figure 3.16.
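Problems P3–P5 above ask you to work through 1s-complement arithmetic by hand. If you want to check your hand computation, here is a minimal sketch of 8-bit 1s-complement addition; note that it uses the 8-bit words the problems specify, not the 16-bit words UDP and TCP actually use.

```python
# 8-bit 1s-complement helper for checking answers to P3-P5.

def ones_complement_sum8(words):
    """Sum 8-bit words, wrapping any carry out of bit 7 back into bit 0."""
    total = 0
    for w in words:
        total += w
        if total > 0xFF:                  # carry out of the high-order bit...
            total = (total & 0xFF) + 1    # ...wraps around and is added back
    return total

def checksum8(words):
    """The checksum is the 1s complement (bitwise NOT) of the sum."""
    return ~ones_complement_sum8(words) & 0xFF

if __name__ == '__main__':
    p3 = [0b01010011, 0b01100110, 0b01110100]
    print(format(checksum8(p3), '08b'))
    # At the receiver: summing the words *and* the checksum yields
    # 11111111 whenever no error is detected.
    print(format(ones_complement_sum8(p3 + [checksum8(p3)]), '08b'))
```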
P10. Consider a channel that can lose packets but has a maximum delay that is known. Modify protocol rdt2.1 to include sender timeout and retransmit. Informally argue why your protocol can communicate correctly over this channel.

P11. Consider the rdt2.2 receiver in Figure 3.14, and the creation of a new packet in the self-transition (i.e., the transition from the state back to itself) in the Wait-for-0-from-below and the Wait-for-1-from-below states: sndpkt=make_pkt(ACK, 1, checksum) and sndpkt=make_pkt(ACK, 0, checksum). Would the protocol work correctly if this action were removed from the self-transition in the Wait-for-1-from-below state? Justify your answer. What if this event were removed from the self-transition in the Wait-for-0-from-below state? \[Hint: In this latter case, consider what would happen if the first sender-to-receiver packet were corrupted.\]

P12. The sender side of rdt3.0 simply ignores (that is, takes no action on) all received packets that are either in error or have the wrong value in the acknum field of an acknowledgment packet. Suppose that in such circumstances, rdt3.0 were simply to retransmit the current data packet. Would the protocol still work? (Hint: Consider what would happen if there were only bit errors; there are no packet losses but premature timeouts can occur. Consider how many times the nth packet is sent, in the limit as n approaches infinity.)

P13. Consider the rdt 3.0 protocol. Draw a diagram showing that if the network connection between the sender and receiver can reorder messages (that is, that two messages propagating in the medium between the sender and receiver can be reordered), then the alternating-bit protocol will not work correctly (make sure you clearly identify the sense in which it will not work correctly). Your diagram should have the sender on the left and the receiver on the right, with the time axis running down the page, showing data (D) and acknowledgment (A) message exchange. Make sure you indicate the sequence number associated with any data or acknowledgment segment.

P14. Consider a reliable data transfer protocol that uses only negative acknowledgments. Suppose the sender sends data only infrequently. Would a NAK-only protocol be preferable to a protocol that uses ACKs? Why? Now suppose the sender has a lot of data to send and the end-to-end connection experiences few losses. In this second case, would a NAK-only protocol be preferable to a protocol that uses ACKs? Why?

P15. Consider the cross-country example shown in Figure 3.17. How big would the window size have to be for the channel utilization to be greater than 98 percent? Suppose that the size of a packet is 1,500 bytes, including both header fields and data.

P16. Suppose an application uses rdt 3.0 as its transport layer protocol. As the stop-and-wait protocol has very low channel utilization (shown in the cross-country example), the designers of this application let the receiver keep sending back a number (more than two) of alternating ACK 0 and ACK 1 even if the corresponding data have not arrived at the receiver. Would this application design increase the channel utilization? Why? Are there any potential problems with this approach? Explain.

P17. Consider two network entities, A and B, which are connected by a perfect bi-directional channel (i.e., any message sent will be received correctly; the channel will not corrupt, lose, or re-order packets).
A and B are to deliver data messages to each other in an alternating manner: First, A must deliver a message to B, then B must deliver a message to A, then A must deliver a message to B and so on. If an entity is in a state where it should not attempt to deliver a message to the other side, and there is an event like rdt_send(data) call from above that attempts to pass data down for transmission to the other side, this call from above can simply be ignored with a call to rdt_unable_to_send(data), which informs the higher layer that it is currently not able to send data. \[Note: This simplifying assumption is made so you don't have to worry about buffering data.\] Draw an FSM specification for this protocol (one FSM for A, and one FSM for B!). Note that you do not have to worry about a reliability mechanism here; the main point of this question is to create an FSM specification that reflects the synchronized behavior of the two entities. You should use the following events and actions that have the same meaning as protocol rdt1.0 in Figure 3.9: rdt_send(data), packet = make_pkt(data), udt_send(packet), rdt_rcv(packet), extract(packet, data), deliver_data(data). Make sure your protocol reflects the strict alternation of sending between A and B. Also, make sure to indicate the initial states for A and B in your FSM descriptions.

P18. In the generic SR protocol that we studied in Section 3.4.4, the sender transmits a message as soon as it is available (if it is in the window) without waiting for an acknowledgment. Suppose now that we want an SR protocol that sends messages two at a time. That is, the sender will send a pair of messages and will send the next pair of messages only when it knows that both messages in the first pair have been received correctly. Suppose that the channel may lose messages but will not corrupt or reorder messages. Design an error-control protocol for the unidirectional reliable transfer of messages. Give an FSM description of the sender and receiver. Describe the format of the packets sent between sender and receiver, and vice versa. If you use any procedure calls other than those in Section 3.4 (for example, udt_send(), start_timer(), rdt_rcv(), and so on), clearly state their actions. Give an example (a timeline trace of sender and receiver) showing how your protocol recovers from a lost packet.

P19. Consider a scenario in which Host A wants to simultaneously send packets to Hosts B and C. A is connected to B and C via a broadcast channel---a packet sent by A is carried by the channel to both B and C. Suppose that the broadcast channel connecting A, B, and C can independently lose and corrupt packets (and so, for example, a packet sent from A might be correctly received by B, but not by C). Design a stop-and-wait-like error-control protocol for reliably transferring packets from A to B and C, such that A will not get new data from the upper layer until it knows that both B and C have correctly received the current packet. Give FSM descriptions of A and C. (Hint: The FSM for B should be essentially the same as for C.) Also, give a description of the packet format(s) used.
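Problems P17–P21 in this stretch all ask for event-driven FSM designs. If it helps to think of an FSM concretely before drawing one, here is a minimal sketch (an illustration only, not the required answer format) of how P17's strictly alternating entities might be expressed in code; the two state names and the rdt_unable_to_send behavior mirror the problem statement, and the in-memory list standing in for the channel is an assumption of the sketch.

```python
# A sketch of P17's strictly alternating entities as code. Each entity
# has two states; the event names follow rdt1.0 in Figure 3.9.

class AlternatingEntity:
    def __init__(self, name, my_turn_first):
        self.name = name
        # States: 'WAIT_FROM_ABOVE' (may send) or 'WAIT_FROM_BELOW' (must receive)
        self.state = 'WAIT_FROM_ABOVE' if my_turn_first else 'WAIT_FROM_BELOW'

    def rdt_send(self, data, channel):
        if self.state != 'WAIT_FROM_ABOVE':
            return self.rdt_unable_to_send(data)
        packet = ('DATA', data)             # make_pkt(data)
        channel.append(packet)              # udt_send(packet)
        self.state = 'WAIT_FROM_BELOW'      # now it is the other side's turn

    def rdt_rcv(self, packet):
        assert self.state == 'WAIT_FROM_BELOW'
        kind, data = packet                 # extract(packet, data)
        print(self.name, 'delivers', data)  # deliver_data(data)
        self.state = 'WAIT_FROM_ABOVE'      # now it is our turn to send

    def rdt_unable_to_send(self, data):
        print(self.name, 'cannot send now; dropping', data)

# A starts (initial state WAIT_FROM_ABOVE); B starts waiting.
channel = []
a, b = AlternatingEntity('A', True), AlternatingEntity('B', False)
a.rdt_send('m1', channel); b.rdt_rcv(channel.pop())
b.rdt_send('m2', channel); a.rdt_rcv(channel.pop())
a.rdt_send('m3', channel)   # allowed again: strict alternation
```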
P20. Consider a scenario in which Host A and Host B want to send messages to Host C. Hosts A and C are connected by a channel that can lose and corrupt (but not reorder) messages. Hosts B and C are connected by another channel (independent of the channel connecting A and C) with the same properties. The transport layer at Host C should alternate in delivering messages from A and B to the layer above (that is, it should first deliver the data from a packet from A, then the data from a packet from B, and so on). Design a stop-and-wait-like error-control protocol for reliably transferring packets from A and B to C, with alternating delivery at C as described above. Give FSM descriptions of A and C. (Hint: The FSM for B should be essentially the same as for A.) Also, give a description of the packet format(s) used.

P21. Suppose we have two network entities, A and B. B has a supply of data messages that will be sent to A according to the following conventions. When A gets a request from the layer above to get the next data (D) message from B, A must send a request (R) message to B on the A-to-B channel. Only when B receives an R message can it send a data (D) message back to A on the B-to-A channel. A should deliver exactly one copy of each D message to the layer above. R messages can be lost (but not corrupted) in the A-to-B channel; D messages, once sent, are always delivered correctly. The delay along both channels is unknown and variable. Design (give an FSM description of) a protocol that incorporates the appropriate mechanisms to compensate for the loss-prone A-to-B channel and implements message passing to the layer above at entity A, as discussed above. Use only those mechanisms that are absolutely necessary.

P22. Consider the GBN protocol with a sender window size of 4 and a sequence number range of 1,024. Suppose that at time t, the next in-order packet that the receiver is expecting has a sequence number of k. Assume that the medium does not reorder messages. Answer the following questions:

a. What are the possible sets of sequence numbers inside the sender's window at time t? Justify your answer.

b. What are all possible values of the ACK field in all possible messages currently propagating back to the sender at time t? Justify your answer.

P23. Consider the GBN and SR protocols. Suppose the sequence number space is of size k. What is the largest allowable sender window that will avoid the occurrence of problems such as that in Figure 3.27 for each of these protocols?

P24. Answer true or false to the following questions and briefly justify your answer:

a. With the SR protocol, it is possible for the sender to receive an ACK for a packet that falls outside of its current window.

b. With GBN, it is possible for the sender to receive an ACK for a packet that falls outside of its current window.

c. The alternating-bit protocol is the same as the SR protocol with a sender and receiver window size of 1.

d. The alternating-bit protocol is the same as the GBN protocol with a sender and receiver window size of 1.

P25. We have said that an application may choose UDP for a transport protocol because UDP offers finer application control (than TCP) of what data is sent in a segment and when.

a. Why does an application have more control of what data is sent in a segment?

b. Why does an application have more control on when the segment is sent?

P26. Consider transferring an enormous file of L bytes from Host A to Host B. Assume an MSS of 536 bytes.

a. What is the maximum value of L such that TCP sequence numbers are not exhausted? Recall that the TCP sequence number field has 4 bytes.

b. For the L you obtain in (a), find how long it takes to transmit the file.
Assume that a total of 66 bytes of transport, network, and data-link header are added to each segment before the resulting packet is sent out over a 155 Mbps link. Ignore flow control and congestion control so A can pump out the segments back to back and continuously.

P27. Host A and B are communicating over a TCP connection, and Host B has already received from A all bytes up through byte 126. Suppose Host A then sends two segments to Host B back-to-back. The first and second segments contain 80 and 40 bytes of data, respectively. In the first segment, the sequence number is 127, the source port number is 302, and the destination port number is 80. Host B sends an acknowledgment whenever it receives a segment from Host A.

a. In the second segment sent from Host A to B, what are the sequence number, source port number, and destination port number?

b. If the first segment arrives before the second segment, in the acknowledgment of the first arriving segment, what is the acknowledgment number, the source port number, and the destination port number?

c. If the second segment arrives before the first segment, in the acknowledgment of the first arriving segment, what is the acknowledgment number?

d. Suppose the two segments sent by A arrive in order at B. The first acknowledgment is lost and the second acknowledgment arrives after the first timeout interval. Draw a timing diagram, showing these segments and all other segments and acknowledgments sent. (Assume there is no additional packet loss.) For each segment in your figure, provide the sequence number and the number of bytes of data; for each acknowledgment that you add, provide the acknowledgment number.

P28. Host A and B are directly connected with a 100 Mbps link. There is one TCP connection between the two hosts, and Host A is sending to Host B an enormous file over this connection. Host A can send its application data into its TCP socket at a rate as high as 120 Mbps but Host B can read out of its TCP receive buffer at a maximum rate of 50 Mbps. Describe the effect of TCP flow control.

P29. SYN cookies were discussed in Section 3.5.6.

a. Why is it necessary for the server to use a special initial sequence number in the SYNACK?

b. Suppose an attacker knows that a target host uses SYN cookies. Can the attacker create half-open or fully open connections by simply sending an ACK packet to the target? Why or why not?

c. Suppose an attacker collects a large amount of initial sequence numbers sent by the server. Can the attacker cause the server to create many fully open connections by sending ACKs with those initial sequence numbers? Why?

P30. Consider the network shown in Scenario 2 in Section 3.6.1. Suppose both sending hosts A and B have some fixed timeout values.

a. Argue that increasing the size of the finite buffer of the router might possibly decrease the throughput (λout).

b. Now suppose both hosts dynamically adjust their timeout values (like what TCP does) based on the buffering delay at the router. Would increasing the buffer size help to increase the throughput? Why?

P31. Suppose that the five measured SampleRTT values (see Section 3.5.3) are 106 ms, 120 ms, 140 ms, 90 ms, and 115 ms. Compute the EstimatedRTT after each of these SampleRTT values is obtained, using a value of α=0.125 and assuming that the value of EstimatedRTT was 100 ms just before the first of these five samples was obtained. Compute also the DevRTT after each sample is obtained, assuming a value of β=0.25 and assuming the value of DevRTT was 5 ms just before the first of these five samples was obtained. Last, compute the TCP TimeoutInterval after each of these samples is obtained.
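P31 and P32 exercise the RTT-estimation equations of Section 3.5.3. A short sketch for checking your hand arithmetic, using the standard formulas EstimatedRTT = (1−α)·EstimatedRTT + α·SampleRTT, DevRTT = (1−β)·DevRTT + β·|SampleRTT − EstimatedRTT|, and TimeoutInterval = EstimatedRTT + 4·DevRTT:

```python
# Check of P31: EWMA RTT estimation per Section 3.5.3.
alpha, beta = 0.125, 0.25
estimated_rtt, dev_rtt = 100.0, 5.0   # values just before the first sample (ms)

for sample_rtt in [106, 120, 140, 90, 115]:
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    timeout_interval = estimated_rtt + 4 * dev_rtt
    print('Sample %3d ms -> EstimatedRTT %6.2f ms, DevRTT %5.2f ms, '
          'TimeoutInterval %6.2f ms'
          % (sample_rtt, estimated_rtt, dev_rtt, timeout_interval))
```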
P32. Consider the TCP procedure for estimating RTT. Suppose that α=0.1. Let SampleRTT1 be the most recent sample RTT, let SampleRTT2 be the next most recent sample RTT, and so on.

a. For a given TCP connection, suppose four acknowledgments have been returned with corresponding sample RTTs: SampleRTT4, SampleRTT3, SampleRTT2, and SampleRTT1. Express EstimatedRTT in terms of the four sample RTTs.

b. Generalize your formula for n sample RTTs.

c. For the formula in part (b) let n approach infinity. Comment on why this averaging procedure is called an exponential moving average.

P33. In Section 3.5.3, we discussed TCP's estimation of RTT. Why do you think TCP avoids measuring the SampleRTT for retransmitted segments?

P34. What is the relationship between the variable SendBase in Section 3.5.4 and the variable LastByteRcvd in Section 3.5.5?

P35. What is the relationship between the variable LastByteRcvd in Section 3.5.5 and the variable y in Section 3.5.4?

P36. In Section 3.5.4, we saw that TCP waits until it has received three duplicate ACKs before performing a fast retransmit. Why do you think the TCP designers chose not to perform a fast retransmit after the first duplicate ACK for a segment is received?

P37. Compare GBN, SR, and TCP (no delayed ACK). Assume that the timeout values for all three protocols are sufficiently long such that 5 consecutive data segments and their corresponding ACKs can be received (if not lost in the channel) by the receiving host (Host B) and the sending host (Host A) respectively. Suppose Host A sends 5 data segments to Host B, and the 2nd segment (sent from A) is lost. In the end, all 5 data segments have been correctly received by Host B.

a. How many segments has Host A sent in total and how many ACKs has Host B sent in total? What are their sequence numbers? Answer this question for all three protocols.

b. If the timeout values for all three protocols are much longer than 5 RTT, then which protocol successfully delivers all five data segments in the shortest time interval?

P38. In our description of TCP in Figure 3.53, the value of the threshold, ssthresh, is set as ssthresh=cwnd/2 in several places, and the ssthresh value is referred to as being set to half the window size when a loss event occurred. Must the rate at which the sender is sending when the loss event occurred be approximately equal to cwnd segments per RTT? Explain your answer. If your answer is no, can you suggest a different manner in which ssthresh should be set?

P39. Consider Figure 3.46(b). If λ′in increases beyond R/2, can λout increase beyond R/3? Explain. Now consider Figure 3.46(c). If λ′in increases beyond R/2, can λout increase beyond R/4 under the assumption that a packet will be forwarded twice on average from the router to the receiver? Explain.

P40. Consider Figure 3.58. Assuming TCP Reno is the protocol experiencing the behavior shown above, answer the following questions. In all cases, you should provide a short discussion justifying your answer.

a. Identify the intervals of time when TCP slow start is operating.
b. Identify the intervals of time when TCP congestion avoidance is operating.

c. After the 16th transmission round, is segment loss detected by a triple duplicate ACK or by a timeout?

d. After the 22nd transmission round, is segment loss detected by a triple duplicate ACK or by a timeout?

Figure 3.58 TCP window size as a function of time

e. What is the initial value of ssthresh at the first transmission round?

f. What is the value of ssthresh at the 18th transmission round?

g. What is the value of ssthresh at the 24th transmission round?

h. During what transmission round is the 70th segment sent?

i. Assuming a packet loss is detected after the 26th round by the receipt of a triple duplicate ACK, what will be the values of the congestion window size and of ssthresh?

j. Suppose TCP Tahoe is used (instead of TCP Reno), and assume that triple duplicate ACKs are received at the 16th round. What are the ssthresh and the congestion window size at the 19th round?

k. Again suppose TCP Tahoe is used, and there is a timeout event at the 22nd round. How many packets have been sent out from the 17th round till the 22nd round, inclusive?

P41. Refer to Figure 3.55, which illustrates the convergence of TCP's AIMD algorithm. Suppose that instead of a multiplicative decrease, TCP decreased the window size by a constant amount. Would the resulting AIAD algorithm converge to an equal share algorithm? Justify your answer using a diagram similar to Figure 3.55.

P42. In Section 3.5.4, we discussed the doubling of the timeout interval after a timeout event. This mechanism is a form of congestion control. Why does TCP need a window-based congestion-control mechanism (as studied in Section 3.7) in addition to this doubling-timeout-interval mechanism?

P43. Host A is sending an enormous file to Host B over a TCP connection. Over this connection there is never any packet loss and the timers never expire. Denote the transmission rate of the link connecting Host A to the Internet by R bps. Suppose that the process in Host A is capable of sending data into its TCP socket at a rate S bps, where S=10⋅R. Further suppose that the TCP receive buffer is large enough to hold the entire file, and the send buffer can hold only one percent of the file. What would prevent the process in Host A from continuously passing data to its TCP socket at rate S bps? TCP flow control? TCP congestion control? Or something else? Elaborate.

P44. Consider sending a large file from a host to another over a TCP connection that has no loss.

a. Suppose TCP uses AIMD for its congestion control without slow start. Assuming cwnd increases by 1 MSS every time a batch of ACKs is received and assuming approximately constant round-trip times, how long does it take for cwnd to increase from 6 MSS to 12 MSS (assuming no loss events)?

b. What is the average throughput (in terms of MSS and RTT) for this connection up through time = 6 RTT?

P45. Recall the macroscopic description of TCP throughput. In the period of time from when the connection's rate varies from W/(2 · RTT) to W/RTT, only one packet is lost (at the very end of the period).

a. Show that the loss rate (fraction of packets lost) is equal to

L = loss rate = 1 / ((3/8)⋅W² + (3/4)⋅W)

b. Use the result above to show that if a connection has loss rate L, then its average rate is approximately given by

average rate ≈ (1.22 ⋅ MSS) / (RTT ⋅ √L)
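A quick numeric sanity check of P45's two formulas (a sketch only; the closed forms are those in the problem statement, and the 1,500-byte segment size and 100 ms RTT are assumed values chosen for illustration): for a given maximum window W, compute the loss rate L from part (a), then confirm that part (b)'s approximation tracks the exact average of W/2 and W segments per RTT.

```python
# Numeric check of the P45 formulas relating loss rate and throughput.
from math import sqrt

MSS = 1500 * 8      # bits per segment (assumed segment size: 1,500 bytes)
RTT = 0.1           # seconds (an assumed round-trip time)

for W in [10, 100, 1000]:                          # max window, in segments
    L = 1.0 / ((3.0 / 8) * W**2 + (3.0 / 4) * W)   # part (a): loss rate
    approx_rate = 1.22 * MSS / (RTT * sqrt(L))     # part (b): approximation
    exact_avg = 0.75 * W * MSS / RTT               # average of W/2 and W per RTT
    print('W=%5d  L=%.2e  approx=%12.0f bps  exact avg=%12.0f bps'
          % (W, L, approx_rate, exact_avg))
```

As W grows, the two rates agree closely, which is exactly the point of the approximation.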
P46. Consider that only a single TCP (Reno) connection uses one 10 Mbps link which does not buffer any data. Suppose that this link is the only congested link between the sending and receiving hosts. Assume that the TCP sender has a huge file to send to the receiver, and the receiver's receive buffer is much larger than the congestion window. We also make the following assumptions: each TCP segment size is 1,500 bytes; the two-way propagation delay of this connection is 150 msec; and this TCP connection is always in congestion avoidance phase, that is, ignore slow start.

a. What is the maximum window size (in segments) that this TCP connection can achieve?

b. What is the average window size (in segments) and average throughput (in bps) of this TCP connection?

c. How long would it take for this TCP connection to reach its maximum window again after recovering from a packet loss?

P47. Consider the scenario described in the previous problem. Suppose that the 10 Mbps link can buffer a finite number of segments. Argue that in order for the link to always be busy sending data, we would like to choose a buffer size that is at least the product of the link speed C and the two-way propagation delay between the sender and the receiver.

P48. Repeat Problem 46, but replacing the 10 Mbps link with a 10 Gbps link. Note that in your answer to part c, you will realize that it takes a very long time for the congestion window size to reach its maximum window size after recovering from a packet loss. Sketch a solution to solve this problem.

P49. Let T (measured by RTT) denote the time interval that a TCP connection takes to increase its congestion window size from W/2 to W, where W is the maximum congestion window size. Argue that T is a function of TCP's average throughput.

P50. Consider a simplified TCP's AIMD algorithm where the congestion window size is measured in number of segments, not in bytes. In additive increase, the congestion window size increases by one segment in each RTT. In multiplicative decrease, the congestion window size decreases by half (if the result is not an integer, round down to the nearest integer). Suppose that two TCP connections, C1 and C2, share a single congested link of speed 30 segments per second. Assume that both C1 and C2 are in the congestion avoidance phase. Connection C1's RTT is 50 msec and connection C2's RTT is 100 msec. Assume that when the data rate in the link exceeds the link's speed, all TCP connections experience data segment loss.

a. If both C1 and C2 at time t0 have a congestion window of 10 segments, what are their congestion window sizes after 1000 msec?

b. In the long run, will these two connections get the same share of the bandwidth of the congested link? Explain.
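P50 and P51 can be checked with a small discrete-time simulation. The sketch below steps the two connections in 50 msec increments under the problem's rules (window +1 per connection RTT; halve, rounding down, whenever the combined rate exceeds 30 segments per second). Two modeling choices here are assumptions, not part of the problem: loss is tested once per 50 msec step, and a window is never allowed to drop below 1 segment.

```python
# Discrete-time check for P50: two AIMD flows sharing a 30 segment/s link.
LINK_SPEED = 30.0          # segments per second
RTT1, RTT2 = 0.050, 0.100  # seconds
w1, w2 = 10, 10            # congestion windows at t0, in segments

for step in range(1, 21):                    # 20 steps of 50 ms = 1000 ms
    w1 += 1                                  # C1 completes an RTT every 50 ms
    if step % 2 == 0:
        w2 += 1                              # C2 completes an RTT every 100 ms
    if w1 / RTT1 + w2 / RTT2 > LINK_SPEED:   # combined rate exceeds the link
        w1, w2 = max(1, w1 // 2), max(1, w2 // 2)   # multiplicative decrease
    print('t=%4d ms  cwnd1=%2d  cwnd2=%2d' % (step * 50, w1, w2))
```

Because C1 completes twice as many RTTs per second, it grows its window twice as fast, which is the asymmetry part (b) asks you to reason about.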
P51. Consider the network described in the previous problem. Now suppose that the two TCP connections, C1 and C2, have the same RTT of 100 msec. Suppose that at time t0, C1's congestion window size is 15 segments but C2's congestion window size is 10 segments.

a. What are their congestion window sizes after 2200 msec?

b. In the long run, will these two connections get about the same share of the bandwidth of the congested link?

c. We say that two connections are synchronized, if both connections reach their maximum window sizes at the same time and reach their minimum window sizes at the same time. In the long run, will these two connections get synchronized eventually? If so, what are their maximum window sizes?

d. Will this synchronization help to improve the utilization of the shared link? Why? Sketch some idea to break this synchronization.

P52. Consider a modification to TCP's congestion control algorithm. Instead of additive increase, we can use multiplicative increase. A TCP sender increases its window size by a small positive constant a (0 \< a \< 1) whenever it receives a valid ACK. Find the functional relationship between loss rate L and maximum congestion window W. Argue that for this modified TCP, regardless of TCP's average throughput, a TCP connection always spends the same amount of time to increase its congestion window size from W/2 to W.

P53. In our discussion of TCP futures in Section 3.7, we noted that to achieve a throughput of 10 Gbps, TCP could only tolerate a segment loss probability of 2 ⋅ 10^−10 (or equivalently, one loss event for every 5,000,000,000 segments). Show the derivation for the values of 2 ⋅ 10^−10 (1 out of 5,000,000,000) for the RTT and MSS values given in Section 3.7. If TCP needed to support a 100 Gbps connection, what would the tolerable loss be?

P54. In our discussion of TCP congestion control in Section 3.7, we implicitly assumed that the TCP sender always had data to send. Consider now the case that the TCP sender sends a large amount of data and then goes idle (since it has no more data to send) at t1. TCP remains idle for a relatively long period of time and then wants to send more data at t2. What are the advantages and disadvantages of having TCP use the cwnd and ssthresh values from t1 when starting to send data at t2? What alternative would you recommend? Why?

P55. In this problem we investigate whether either UDP or TCP provides a degree of end-point authentication.

a. Consider a server that receives a request within a UDP packet and responds to that request within a UDP packet (for example, as done by a DNS server). If a client with IP address X spoofs its address with address Y, where will the server send its response?

b. Suppose a server receives a SYN with IP source address Y, and after responding with a SYNACK, receives an ACK with IP source address Y with the correct acknowledgment number. Assuming the server chooses a random initial sequence number and there is no "man-in-the-middle," can the server be certain that the client is indeed at Y (and not at some other address X that is spoofing Y)?

P56. In this problem, we consider the delay introduced by the TCP slow-start phase. Consider a client and a Web server directly connected by one link of rate R. Suppose the client wants to retrieve an object whose size is exactly equal to 15 S, where S is the maximum segment size (MSS). Denote the round-trip time between client and server as RTT (assumed to be constant). Ignoring protocol headers, determine the time to retrieve the object (including TCP connection establishment) when

a. 4 S/R \> S/R + RTT \> 2 S/R

b. S/R + RTT \> 4 S/R

c. S/R \> RTT.

Programming Assignments

Implementing a Reliable Transport Protocol

In this laboratory programming assignment, you will be writing the sending and receiving transport-level code for implementing a simple reliable data transfer protocol. There are two versions of this lab, the alternating-bit-protocol version and the GBN version. This lab should be fun---your implementation will differ very little from what would be required in a real-world situation.
Since you probably don't have +standalone machines (with an OS that you can modify), your code will +have to execute in a simulated hardware/software environment. However, +the programming interface provided to your routines---the code that +would call your entities from above and from below---is very close to +what is done in an actual UNIX environment. (Indeed, the software +interfaces described in this programming assignment are much more +realistic than the infinite loop senders and receivers that many texts +describe.) Stopping and starting timers are also simulated, and timer +interrupts will cause your timer handling routine to be activated. The +full lab assignment, as well as code you will need to compile with your +own code, are available at this book's Web site: +www.pearsonhighered.com/cs-resources. + +Wireshark Lab: Exploring TCP + +In this lab, you'll use your Web browser to access a file from a Web +server. As in earlier Wireshark labs, you'll use Wireshark to capture +the packets arriving at your computer. Unlike earlier labs, you'll also +be able to download a Wireshark-readable packet trace from the Web +server from which you downloaded the file. In this server trace, you'll +find the packets that were generated by your own access of the Web +server. You'll analyze the client- and server-side traces to explore +aspects of TCP. In particular, you'll evaluate the performance of the +TCP connection between your computer and the Web server. You'll trace +TCP's window behavior, and infer packet loss, retransmission, flow +control and congestion control behavior, and estimated roundtrip time. +As is the case with all Wireshark labs, the full description of this lab +is available at this book's Web site, +www.pearsonhighered.com/cs-resources. + +Wireshark Lab: Exploring UDP In this short lab, you'll do a packet +capture and analysis of your favorite application that uses UDP (for +example, DNS or a multimedia application such as Skype). As we learned +in Section 3.3, UDP is a simple, no-frills transport protocol. In this +lab, you'll investigate the header fields in the UDP segment as well as +the checksum calculation. As is the case with all Wireshark labs, the +full description of this lab is available at this book's Web site, +www.pearsonhighered.com/cs-resources. AN INTERVIEW WITH... Van Jacobson +Van Jacobson works at Google and was previously a Research Fellow at +PARC. Prior to that, he was co-founder and Chief Scientist of Packet +Design. Before that, he was Chief Scientist at Cisco. Before joining +Cisco, he was head of the Network Research Group at Lawrence Berkeley +National Laboratory and taught at UC Berkeley and Stanford. Van received +the ACM SIGCOMM Award in 2001 for outstanding lifetime contribution to +the field of communication networks and the IEEE Kobayashi Award in 2002 +for "contributing to the understanding of network congestion and +developing congestion control mechanisms that enabled the successful +scaling of the Internet". He was elected to the U.S. National Academy of +Engineering in 2004. + +Please describe one or two of the most exciting projects you have worked +on during your career. What were the biggest challenges? School teaches +us lots of ways to find answers. In every interesting problem I've +worked on, the challenge has been finding the right question. When Mike +Karels and I started looking at TCP congestion, we spent months staring +at protocol and packet traces asking "Why is it failing?". 
One day in Mike's office, one of us said "The reason I can't figure out why it fails is because I don't understand how it ever worked to begin with." That turned out to be the right question and it forced us to figure out the "ack clocking" that makes TCP work. After that, the rest was easy.

More generally, where do you see the future of networking and the Internet?

For most people, the Web is the Internet. Networking geeks smile politely since we know the Web is an application running over the Internet but what if they're right? The Internet is about enabling conversations between pairs of hosts. The Web is about distributed information production and consumption. "Information propagation" is a very general view of communication of which "pairwise conversation" is a tiny subset. We need to move into the larger tent. Networking today deals with broadcast media (radios, PONs, etc.) by pretending it's a point-to-point wire. That's massively inefficient. Terabits-per-second of data are being exchanged all over the world via thumb drives or smart phones but we don't know how to treat that as "networking". ISPs are busily setting up caches and CDNs to scalably distribute video and audio. Caching is a necessary part of the solution but there's no part of today's networking---from Information, Queuing or Traffic Theory down to the Internet protocol specs---that tells us how to engineer and deploy it. I think and hope that over the next few years, networking will evolve to embrace the much larger vision of communication that underlies the Web.

What people inspired you professionally?

When I was in grad school, Richard Feynman visited and gave a colloquium. He talked about a piece of Quantum theory that I'd been struggling with all semester and his explanation was so simple and lucid that what had been incomprehensible gibberish to me became obvious and inevitable. That ability to see and convey the simplicity that underlies our complex world seems to me a rare and wonderful gift.

What are your recommendations for students who want careers in computer science and networking?

It's a wonderful field---computers and networking have probably had more impact on society than any invention since the book. Networking is fundamentally about connecting stuff, and studying it helps you make intellectual connections: Ant foraging & Bee dances demonstrate protocol design better than RFCs, traffic jams or people leaving a packed stadium are the essence of congestion, and students finding flights back to school in a post-Thanksgiving blizzard are the core of dynamic routing. If you're interested in lots of stuff and want to have an impact, it's hard to imagine a better field.

Chapter 4 The Network Layer: Data Plane

We learned in the previous chapter that the transport layer provides various forms of process-to-process communication by relying on the network layer's host-to-host communication service. We also learned that the transport layer does so without any knowledge about how the network layer actually implements this service. So perhaps you're now wondering, what's under the hood of the host-to-host communication service, what makes it tick? In this chapter and the next, we'll learn exactly how the network layer can provide its host-to-host communication service. We'll see that unlike the transport and application layers, there is a piece of the network layer in each and every host and router in the network.
Because of this, network-layer protocols are among the most challenging (and therefore among the most interesting!) in the protocol stack. Since the network layer is arguably the most complex layer in the protocol stack, we'll have a lot of ground to cover here. Indeed, there is so much to cover that we cover the network layer in two chapters. We'll see that the network layer can be decomposed into two interacting parts, the data plane and the control plane. In Chapter 4, we'll first cover the data plane functions of the network layer---the per-router functions in the network layer that determine how a datagram (that is, a network-layer packet) arriving on one of a router's input links is forwarded to one of that router's output links. We'll cover both traditional IP forwarding (where forwarding is based on a datagram's destination address) and generalized forwarding (where forwarding and other functions may be performed using values in several different fields in the datagram's header). We'll study the IPv4 and IPv6 protocols and addressing in detail. In Chapter 5, we'll cover the control plane functions of the network layer---the network-wide logic that controls how a datagram is routed among routers along an end-to-end path from the source host to the destination host. We'll cover routing algorithms, as well as routing protocols, such as OSPF and BGP, that are in widespread use in today's Internet. Traditionally, these control-plane routing protocols and data-plane forwarding functions have been implemented together, monolithically, within a router. Software-defined networking (SDN) explicitly separates the data plane and control plane by implementing these control plane functions as a separate service, typically in a remote "controller." We'll also cover SDN controllers in Chapter 5. This distinction between data-plane and control-plane functions in the network layer is an important concept to keep in mind as you learn about the network layer---it will help structure your thinking about the network layer and reflects a modern view of the network layer's role in computer networking.

4.1 Overview of Network Layer

Figure 4.1 shows a simple network with two hosts, H1 and H2, and several routers on the path between H1 and H2. Let's suppose that H1 is sending information to H2, and consider the role of the network layer in these hosts and in the intervening routers. The network layer in H1 takes segments from the transport layer in H1, encapsulates each segment into a datagram, and then sends the datagrams to its nearby router, R1. At the receiving host, H2, the network layer receives the datagrams from its nearby router R2, extracts the transport-layer segments, and delivers the segments up to the transport layer at H2. The primary data-plane role of each router is to forward datagrams from its input links to its output links; the primary role of the network control plane is to coordinate these local, per-router forwarding actions so that datagrams are ultimately transferred end-to-end, along paths of routers between source and destination hosts. Note that the routers in Figure 4.1 are shown with a truncated protocol stack, that is, with no upper layers above the network layer, because routers do not run application- and transport-layer protocols such as those we examined in Chapters 2 and 3.
4.1.1 Forwarding and Routing: The Data and Control Planes

The primary role of the network layer is deceptively simple---to move packets from a sending host to a receiving host. To do so, two important network-layer functions can be identified:

Forwarding. When a packet arrives at a router's input link, the router must move the packet to the appropriate output link. For example, a packet arriving from Host H1 to Router R1 in Figure 4.1 must be forwarded to the next router on a path to H2. As we will see, forwarding is but one function (albeit the most common and important one!) implemented in the data plane. In the more general case, which we'll cover in Section 4.4, a packet might also be blocked from exiting a router (e.g., if the packet originated at a known malicious sending host, or if the packet were destined to a forbidden destination host), or might be duplicated and sent over multiple outgoing links.

Figure 4.1 The network layer

Routing. The network layer must determine the route or path taken by packets as they flow from a sender to a receiver. The algorithms that calculate these paths are referred to as routing algorithms. A routing algorithm would determine, for example, the path along which packets flow from H1 to H2 in Figure 4.1. Routing is implemented in the control plane of the network layer.

The terms forwarding and routing are often used interchangeably by authors discussing the network layer. We'll use these terms much more precisely in this book. Forwarding refers to the router-local action of transferring a packet from an input link interface to the appropriate output link interface. Forwarding takes place at very short timescales (typically a few nanoseconds), and thus is typically implemented in hardware. Routing refers to the network-wide process that determines the end-to-end paths that packets take from source to destination. Routing takes place on much longer timescales (typically seconds), and as we will see is often implemented in software. Using our driving analogy, consider the trip from Pennsylvania to Florida undertaken by our traveler back in Section 1.3.1. During this trip, our driver passes through many interchanges en route to Florida. We can think of forwarding as the process of getting through a single interchange: A car enters the interchange from one road and determines which road it should take to leave the interchange. We can think of routing as the process of planning the trip from Pennsylvania to Florida: Before embarking on the trip, the driver has consulted a map and chosen one of many paths possible, with each path consisting of a series of road segments connected at interchanges.

A key element in every network router is its forwarding table. A router forwards a packet by examining the value of one or more fields in the arriving packet's header, and then using these header values to index into its forwarding table. The value stored in the forwarding table entry for those values indicates the outgoing link interface at that router to which that packet is to be forwarded. For example, in Figure 4.2, a packet with header field value of 0110 arrives to a router. The router indexes into its forwarding table and determines that the output link interface for this packet is interface 2. The router then internally forwards the packet to interface 2. In Section 4.2, we'll look inside a router and examine the forwarding function in much greater detail.
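To make the forwarding-table idea concrete, here is a toy sketch. Only the 0110-to-interface-2 mapping comes from the Figure 4.2 example just described; the other entries are invented for illustration, and representing the table as a Python dictionary is purely a teaching device, not how router hardware stores it.

```python
# Toy illustration of forwarding-table lookup (see Figure 4.2).
forwarding_table = {
    '0100': 3,   # invented entry
    '0101': 2,   # invented entry
    '0110': 2,   # from the example in the text
    '0111': 1,   # invented entry
}

def forward(header_bits):
    """Return the output link interface for an arriving packet."""
    return forwarding_table[header_bits]

print(forward('0110'))   # prints 2: the packet is switched to interface 2
```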
Forwarding is the key function performed by the data-plane functionality of the network layer.

Control Plane: The Traditional Approach

But now you are undoubtedly wondering how a router's forwarding tables are configured in the first place. This is a crucial issue, one that exposes the important interplay between forwarding (in the data plane) and routing (in the control plane). As shown

Figure 4.2 Routing algorithms determine values in forwarding tables

in Figure 4.2, the routing algorithm determines the contents of the routers' forwarding tables. In this example, a routing algorithm runs in each and every router and both forwarding and routing functions are contained within a router. As we'll see in Sections 5.3 and 5.4, the routing algorithm function in one router communicates with the routing algorithm function in other routers to compute the values for its forwarding table. How is this communication performed? By exchanging routing messages containing routing information according to a routing protocol! We'll cover routing algorithms and protocols in Sections 5.2 through 5.4. The distinct and different purposes of the forwarding and routing functions can be further illustrated by considering the hypothetical (and unrealistic, but technically feasible) case of a network in which all forwarding tables are configured directly by human network operators physically present at the routers. In this case, no routing protocols would be required! Of course, the human operators would need to interact with each other to ensure that the forwarding tables were configured in such a way that packets reached their intended destinations. It's also likely that human configuration would be more error-prone and much slower to respond to changes in the network topology than a routing protocol. We're thus fortunate that all networks have both a forwarding and a routing function!

Control Plane: The SDN Approach

The approach to implementing routing functionality shown in Figure 4.2---with each router having a routing component that communicates with the routing component of other routers---has been the traditional approach adopted by routing vendors in their products, at least until recently. Our observation that humans could manually configure forwarding tables does suggest, however, that there may be other ways for control-plane functionality to determine the contents of the data-plane forwarding tables. Figure 4.3 shows an alternate approach in which a physically separate (from the routers), remote controller computes and distributes the forwarding tables to be used by each and every router. Note that the data plane components of Figures 4.2 and 4.3 are identical. In Figure 4.3, however, control-plane routing functionality is separated

Figure 4.3 A remote controller determines and distributes values in forwarding tables

from the physical router---the routing device performs forwarding only, while the remote controller computes and distributes forwarding tables. The remote controller might be implemented in a remote data center with high reliability and redundancy, and might be managed by the ISP or some third party. How might the routers and the remote controller communicate? By exchanging messages containing forwarding tables and other pieces of routing information.
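As a thought experiment, the split in Figure 4.3 can be sketched in a few lines of code. Every name here (Controller, install_table, and the tiny made-up tables) is invented for illustration; this is not any real SDN API, just the shape of the idea.

```python
# Hypothetical sketch of Figure 4.3's split: routers only forward;
# a remote controller computes and installs their forwarding tables.

class Router:
    def __init__(self):
        self.forwarding_table = {}     # data-plane state only; no routing code

    def install_table(self, table):
        """Called by the (remote) controller, not by a local routing process."""
        self.forwarding_table = table

    def forward(self, header_value):
        return self.forwarding_table[header_value]

class Controller:
    """Logically centralized control plane: computes every router's table."""
    def distribute(self, routers, topology):
        for name, router in routers.items():
            router.install_table(self.compute_table(name, topology))

    def compute_table(self, name, topology):
        # Placeholder: a real controller would run a path-computation
        # algorithm over its network-wide view (see Section 5.5).
        return topology[name]

routers = {'R1': Router(), 'R2': Router()}
topology = {'R1': {'0110': 2}, 'R2': {'0110': 1}}   # made-up tables
Controller().distribute(routers, topology)
print(routers['R1'].forward('0110'))                # -> 2
```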
The control-plane approach shown in Figure 4.3 is at the heart of software-defined networking (SDN), where the network is "software-defined" because the controller that computes forwarding tables and interacts with routers is implemented in software. Increasingly, these software implementations are also open, i.e., similar to Linux OS code, the code is publicly available, allowing ISPs (and networking researchers and students!) to innovate and propose changes to the software that controls network-layer functionality. We will cover the SDN control plane in Section 5.5.

4.1.2 Network Service Model

Before delving into the network layer's data plane, let's wrap up our introduction by taking the broader view and considering the different types of service that might be offered by the network layer. When the transport layer at a sending host transmits a packet into the network (that is, passes it down to the network layer at the sending host), can the transport layer rely on the network layer to deliver the packet to the destination? When multiple packets are sent, will they be delivered to the transport layer in the receiving host in the order in which they were sent? Will the amount of time between the sending of two sequential packet transmissions be the same as the amount of time between their reception? Will the network provide any feedback about congestion in the network? The answers to these questions and others are determined by the service model provided by the network layer. The network service model defines the characteristics of end-to-end delivery of packets between sending and receiving hosts.

Let's now consider some possible services that the network layer could provide. These services could include:

Guaranteed delivery. This service guarantees that a packet sent by a source host will eventually arrive at the destination host.

Guaranteed delivery with bounded delay. This service not only guarantees delivery of the packet, but delivery within a specified host-to-host delay bound (for example, within 100 msec).

In-order packet delivery. This service guarantees that packets arrive at the destination in the order that they were sent.

Guaranteed minimal bandwidth. This network-layer service emulates the behavior of a transmission link of a specified bit rate (for example, 1 Mbps) between sending and receiving hosts. As long as the sending host transmits bits (as part of packets) at a rate below the specified bit rate, then all packets are eventually delivered to the destination host.

Security. The network layer could encrypt all datagrams at the source and decrypt them at the destination, thereby providing confidentiality to all transport-layer segments.

This is only a partial list of services that a network layer could provide---there are countless variations possible. The Internet's network layer provides a single service, known as best-effort service. With best-effort service, packets are neither guaranteed to be received in the order in which they were sent, nor is their eventual delivery even guaranteed. There is no guarantee on the end-to-end delay nor is there a
For +example, the ATM network architecture \[MFA Forum 2016, Black 1995\] +provides for guaranteed in-order delay, bounded delay, and guaranteed +minimal bandwidth. There have also been proposed service model +extensions to the Internet architecture; for example, the Intserv +architecture \[RFC 1633\] aims to provide end-end delay guarantees and +congestion-free communication. Interestingly, in spite of these +well-developed alternatives, the Internet's basic best-effort service +model combined with adequate bandwidth provisioning have arguably proven +to be more than "good enough" to enable an amazing range of +applications, including streaming video services such as Netflix and +voice-and-video-over-IP, real-time conferencing applications such as +Skype and Facetime. + +An Overview of Chapter 4 Having now provided an overview of the network +layer, we'll cover the data-plane component of the network layer in the +following sections in this chapter. In Section 4.2, we'll dive down into +the internal hardware operations of a router, including input and output +packet processing, the router's internal switching mechanism, and packet +queueing and scheduling. In Section 4.3, we'll take a look at +traditional IP forwarding, in which packets are forwarded to output +ports based on their destination IP addresses. We'll encounter IP +addressing, the celebrated IPv4 and IPv6 protocols and more. In Section +4.4, we'll cover more generalized forwarding, where packets may be +forwarded to output ports based on a large number of header values +(i.e., not only based on destination IP address). Packets may be blocked +or duplicated at the router, or may have certain header field values +rewritten---all under software control. This more generalized form of +packet forwarding is a key component of a modern network data plane, +including the data plane in software-defined networks (SDN). We mention +here in passing that the terms forwarding and switching are often used +interchangeably by computer-networking researchers and practitioners; +we'll use both terms interchangeably in this textbook as well. While +we're on the topic of terminology, it's also worth mentioning two other +terms that are often used interchangeably, but that we will use more +carefully. We'll reserve the term packet switch to mean a general +packet-switching device that transfers a packet from input link +interface to output link interface, according to values in a packet's +header fields. Some packet switches, called link-layer switches +(examined in Chapter 6), base their forwarding decision on values in the +fields of the linklayer frame; switches are thus referred to as +link-layer (layer 2) devices. Other packet switches, called routers, +base their forwarding decision on header field values in the +network-layer datagram. Routers are thus network-layer (layer 3) +devices. (To fully appreciate this important distinction, you might want +to review Section 1.5.2, where we discuss network-layer datagrams and +link-layer frames and their relationship.) Since our focus in this +chapter is on the network layer, we'll mostly use the term router in +place of packet switch. + +4.2 What's Inside a Router? 
Now that we've overviewed the data and +control planes within the network layer, the important distinction +between forwarding and routing, and the services and functions of the +network layer, let's turn our attention to its forwarding function---the +actual transfer of packets from a router's incoming links to the +appropriate outgoing links at that router. A high-level view of a +generic router architecture is shown in Figure 4.4. Four router +components can be identified: + +Figure 4.4 Router architecture + +Input ports. An input port performs several key functions. It performs +the physical layer function of terminating an incoming physical link at +a router; this is shown in the leftmost box of an input port and the +rightmost box of an output port in Figure 4.4. An input port also +performs link-layer functions needed to interoperate with the link layer +at the other side of the incoming link; this is represented by the +middle boxes in the input and output ports. Perhaps most crucially, a +lookup function is also performed at the input port; this will occur in +the rightmost box of the input port. It is here that the forwarding +table is consulted to determine the router output port to which an +arriving packet will be forwarded via the switching fabric. Control +packets (for example, packets carrying routing protocol information) are +forwarded from an input port to the routing processor. Note that the +term "port" here ---referring to the physical input and output router +interfaces---is distinctly different from the software + +ports associated with network applications and sockets discussed in +Chapters 2 and 3. In practice, the number of ports supported by a router +can range from a relatively small number in enterprise routers, to +hundreds of 10 Gbps ports in a router at an ISP's edge, where the number +of incoming lines tends to be the greatest. The Juniper MX2020, edge +router, for example, supports up to 960 10 Gbps Ethernet ports, with an +overall router system capacity of 80 Tbps \[Juniper MX 2020 2016\]. +Switching fabric. The switching fabric connects the router's input ports +to its output ports. This switching fabric is completely contained +within the router---a network inside of a network router! Output ports. +An output port stores packets received from the switching fabric and +transmits these packets on the outgoing link by performing the necessary +link-layer and physical-layer functions. When a link is bidirectional +(that is, carries traffic in both directions), an output port will +typically be paired with the input port for that link on the same line +card. Routing processor. The routing processor performs control-plane +functions. In traditional routers, it executes the routing protocols +(which we'll study in Sections 5.3 and 5.4), maintains routing tables +and attached link state information, and computes the forwarding table +for the router. In SDN routers, the routing processor is responsible for +communicating with the remote controller in order to (among other +activities) receive forwarding table entries computed by the remote +controller, and install these entries in the router's input ports. The +routing processor also performs the network management functions that +we'll study in Section 5.7. A router's input ports, output ports, and +switching fabric are almost always implemented in hardware, as shown in +Figure 4.4. 
To appreciate why a hardware implementation is needed, consider that with a 10 Gbps input link and a 64-byte IP datagram, the input port has only 51.2 ns to process the datagram before another datagram may arrive (a 64-byte datagram is 512 bits, and 512 bits / 10 Gbps = 51.2 ns). If N ports are combined on a line card (as is often done in practice), the datagram-processing pipeline must operate N times faster---far too fast for software implementation. Forwarding hardware can be implemented either using a router vendor's own hardware designs or using purchased merchant-silicon chips (e.g., as sold by companies such as Intel and Broadcom). While the data plane operates at the nanosecond time scale, a router's control functions---executing the routing protocols, responding to attached links that go up or down, communicating with the remote controller (in the SDN case), and performing management functions---operate at the millisecond or second timescale. These control-plane functions are thus usually implemented in software and execute on the routing processor (typically a traditional CPU).

Before delving into the details of router internals, let's return to our analogy from the beginning of this chapter, where packet forwarding was compared to cars entering and leaving an interchange. Let's suppose that the interchange is a roundabout, and that as a car enters the roundabout, a bit of processing is required. Let's consider what information is required for this processing:

Destination-based forwarding. Suppose the car stops at an entry station and indicates its final destination (not at the local roundabout, but the ultimate destination of its journey). An attendant at the entry station looks up the final destination, determines the roundabout exit that leads to that final destination, and tells the driver which roundabout exit to take.

Generalized forwarding. The attendant could also determine the car's exit ramp on the basis of many other factors besides the destination. For example, the selected exit ramp might depend on the car's origin, for example, the state that issued the car's license plate. Cars from a certain set of states might be directed to use one exit ramp (that leads to the destination via a slow road), while cars from other states might be directed to use a different exit ramp (that leads to the destination via a superhighway). The same decision might be made based on the model, make, and year of the car. Or a car not deemed roadworthy might be blocked and not be allowed to pass through the roundabout.

In the case of generalized forwarding, any number of factors may contribute to the attendant's choice of the exit ramp for a given car. Once the car enters the roundabout (which may be filled with other cars entering from other input roads and heading to other roundabout exits), it eventually leaves at the prescribed roundabout exit ramp, where it may encounter other cars leaving the roundabout at that exit. We can easily recognize the principal router components in Figure 4.4 in this analogy---the entry road and entry station correspond to the input port (with a lookup function to determine the local outgoing port); the roundabout corresponds to the switch fabric; and the roundabout exit road corresponds to the output port. With this analogy, it's instructive to consider where bottlenecks might occur. What happens if cars arrive blazingly fast (for example, the roundabout is in Germany or Italy!) but the station attendant is slow?
How fast must the attendant work to ensure there's no backup on an entry road? Even with a blazingly fast attendant, what happens if cars traverse the roundabout slowly---can backups still occur? And what happens if most of the cars entering at all of the roundabout's entrance ramps all want to leave the roundabout at the same exit ramp---can backups occur at the exit ramp or elsewhere? How should the roundabout operate if we want to assign priorities to different cars, or block certain cars from entering the roundabout in the first place? These are all analogous to critical questions faced by router and switch designers. In the following subsections, we'll look at router functions in more detail. \[Iyer 2008; Chao 2001; Chuang 2005; Turner 1988; McKeown 1997a; Partridge 1998; Serpanos 2011\] provide a discussion of specific router architectures. For concreteness and simplicity, we'll initially assume in this section that forwarding decisions are based only on the packet's destination address, rather than on a generalized set of packet header fields. We will cover the case of more generalized packet forwarding in Section 4.4.

4.2.1 Input Port Processing and Destination-Based Forwarding

A more detailed view of input processing is shown in Figure 4.5. As just discussed, the input port's line-termination function and link-layer processing implement the physical and link layers for that individual input link. The lookup performed in the input port is central to the router's operation---it is here that the router uses the forwarding table to look up the output port to which an arriving packet will be forwarded via the switching fabric. The forwarding table is either computed and updated by the routing processor (using a routing protocol to interact with the routing processors in other network routers) or is received from a remote SDN controller. The forwarding table is copied from the routing processor to the line cards over a separate bus (e.g., a PCI bus) indicated by the dashed line from the routing processor to the input line cards in Figure 4.4. With such a shadow copy at each line card, forwarding decisions can be made locally, at each input port, without invoking the centralized routing processor on a per-packet basis and thus avoiding a centralized processing bottleneck.

Let's now consider the "simplest" case, in which the output port to which an incoming packet is to be switched is based on the packet's destination address. In the case of 32-bit IP addresses, a brute-force implementation of the forwarding table would have one entry for every possible destination address. Since there are more than 4 billion possible addresses, this option is totally out of the question.

Figure 4.5 Input port processing

As an example of how this issue of scale can be handled, let's suppose that our router has four links, numbered 0 through 3, and that packets are to be forwarded to the link interfaces as follows:

| Destination Address Range | Link Interface |
| --- | --- |
| 11001000 00010111 00010000 00000000 through 11001000 00010111 00010111 11111111 | 0 |
| 11001000 00010111 00011000 00000000 through 11001000 00010111 00011000 11111111 | 1 |
| 11001000 00010111 00011001 00000000 through 11001000 00010111 00011111 11111111 | 2 |
| Otherwise | 3 |

Clearly, for this example, it is not necessary to have 4 billion entries in the router's forwarding table.
We could, for example, have the following forwarding table with just four entries:

| Prefix | Link Interface |
| --- | --- |
| 11001000 00010111 00010 | 0 |
| 11001000 00010111 00011000 | 1 |
| 11001000 00010111 00011 | 2 |
| Otherwise | 3 |

With this style of forwarding table, the router matches a prefix of the packet's destination address with the entries in the table; if there's a match, the router forwards the packet to a link associated with the match. For example, suppose the packet's destination address is 11001000 00010111 00010110 10100001; because the 21-bit prefix of this address matches the first entry in the table, the router forwards the packet to link interface 0. If a prefix doesn't match any of the first three entries, then the router forwards the packet to the default interface 3. Although this sounds simple enough, there's a very important subtlety here. You may have noticed that it is possible for a destination address to match more than one entry. For example, the first 24 bits of the address 11001000 00010111 00011000 10101010 match the second entry in the table, and the first 21 bits of the address match the third entry in the table. When there are multiple matches, the router uses the longest prefix matching rule; that is, it finds the longest matching entry in the table and forwards the packet to the link interface associated with the longest prefix match. We'll see exactly why this longest prefix matching rule is used when we study Internet addressing in more detail in Section 4.3.

Given the existence of a forwarding table, lookup is conceptually simple---hardware logic just searches through the forwarding table looking for the longest prefix match. But at gigabit transmission rates, this lookup must be performed in nanoseconds (recall our earlier example of a 10 Gbps link and a 64-byte IP datagram). Thus, not only must lookup be performed in hardware, but techniques beyond a simple linear search through a large table are needed; surveys of fast lookup algorithms can be found in \[Gupta 2001, Ruiz-Sanchez 2001\]. Special attention must also be paid to memory access times, resulting in designs with embedded on-chip DRAM and faster SRAM (used as a DRAM cache) memories. In practice, Ternary Content Addressable Memories (TCAMs) are also often used for lookup \[Yu 2004\]. With a TCAM, a 32-bit IP address is presented to the memory, which returns the content of the forwarding table entry for that address in essentially constant time. The Cisco Catalyst 6500 and 7600 Series routers and switches can hold upwards of a million TCAM forwarding table entries \[Cisco TCAM 2014\]. Once a packet's output port has been determined via the lookup, the packet can be sent into the switching fabric. In some designs, a packet may be temporarily blocked from entering the switching fabric if packets from other input ports are currently using the fabric. A blocked packet will be queued at the input port and then scheduled to cross the fabric at a later point in time. We'll take a closer look at the blocking, queuing, and scheduling of packets (at both input ports and output ports) shortly.
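Before moving on, let's make the longest-prefix-match rule concrete. The following minimal Python sketch (our own illustration, not from the book) encodes the four-entry table above and scans it linearly; the table layout and helper names are hypothetical, and a real router would use a TCAM or an optimized trie structure rather than a linear scan:

```python
# Longest-prefix-match sketch over the four-entry forwarding table above.
FORWARDING_TABLE = [
    ("110010000001011100010", 0),      # 11001000 00010111 00010
    ("110010000001011100011000", 1),   # 11001000 00010111 00011000
    ("110010000001011100011", 2),      # 11001000 00010111 00011
]
DEFAULT_INTERFACE = 3                  # the "Otherwise" entry

def to_bits(dotted: str) -> str:
    """Convert a dotted-decimal IPv4 address to a 32-bit string."""
    return "".join(f"{int(byte):08b}" for byte in dotted.split("."))

def lookup(dest: str) -> int:
    """Return the output link interface for dest via longest prefix match."""
    bits = to_bits(dest)
    best_len, best_iface = -1, DEFAULT_INTERFACE
    for prefix, iface in FORWARDING_TABLE:
        if bits.startswith(prefix) and len(prefix) > best_len:
            best_len, best_iface = len(prefix), iface
    return best_iface

# 11001000 00010111 00011000 10101010 (i.e., 200.23.24.170) matches both
# the 24-bit entry (interface 1) and the 21-bit entry (interface 2);
# the longest match wins:
print(lookup("200.23.24.170"))   # -> 1
```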
Although "lookup" is arguably the most important action in input port processing, many other actions must be taken: (1) physical- and link-layer processing must occur, as discussed previously; (2) the packet's version number, checksum, and time-to-live field---all of which we'll study in Section 4.3---must be checked and the latter two fields rewritten; and (3) counters used for network management (such as the number of IP datagrams received) must be updated.

Let's close our discussion of input port processing by noting that the input port steps of looking up a destination IP address ("match") and then sending the packet into the switching fabric to the specified output port ("action") are a specific case of a more general "match plus action" abstraction that is performed in many networked devices, not just routers. In link-layer switches (covered in Chapter 6), link-layer destination addresses are looked up and several actions may be taken in addition to sending the frame into the switching fabric towards the output port. In firewalls (covered in Chapter 8)---devices that filter out selected incoming packets---an incoming packet whose header matches given criteria (e.g., a combination of source/destination IP addresses and transport-layer port numbers) may be dropped (action). In a network address translator (NAT, covered in Section 4.3), an incoming packet whose transport-layer port number matches a given value will have its port number rewritten before forwarding (action). Indeed, the "match plus action" abstraction is both powerful and prevalent in network devices today, and is central to the notion of generalized forwarding that we'll study in Section 4.4.

4.2.2 Switching

The switching fabric is at the very heart of a router, as it is through this fabric that the packets are actually switched (that is, forwarded) from an input port to an output port. Switching can be accomplished in a number of ways, as shown in Figure 4.6:

Figure 4.6 Three switching techniques

Switching via memory. The simplest, earliest routers were traditional computers, with switching between input and output ports being done under direct control of the CPU (routing processor). Input and output ports functioned as traditional I/O devices in a traditional operating system. An input port with an arriving packet first signaled the routing processor via an interrupt. The packet was then copied from the input port into processor memory. The routing processor then extracted the destination address from the header, looked up the appropriate output port in the forwarding table, and copied the packet to the output port's buffers. In this scenario, if the memory bandwidth is such that a maximum of B packets per second can be written into, or read from, memory, then the overall forwarding throughput (the total rate at which packets are transferred from input ports to output ports) must be less than B/2, since each forwarded packet requires two memory operations: one write into memory and one read out of it. Note also that two packets cannot be forwarded at the same time, even if they have different destination ports, since only one memory read/write can be done at a time over the shared system bus. Some modern routers switch via memory. A major difference from early routers, however, is that the lookup of the destination address and the storing of the packet into the appropriate memory location are performed by processing on the input line cards.
In some ways, routers that switch via memory look very much like shared-memory multiprocessors, with the processing on a line card switching (writing) packets into the memory of the appropriate output port. Cisco's Catalyst 8500 series switches \[Cisco 8500 2016\] internally switch packets via a shared memory.

Switching via a bus. In this approach, an input port transfers a packet directly to the output port over a shared bus, without intervention by the routing processor. This is typically done by having the input port pre-pend a switch-internal label (header) to the packet indicating the local output port to which this packet is being transferred and transmitting the packet onto the bus. All output ports receive the packet, but only the port that matches the label will keep the packet. The label is then removed at the output port, as this label is only used within the switch to cross the bus. If multiple packets arrive at the router at the same time, each at a different input port, all but one must wait since only one packet can cross the bus at a time. Because every packet must cross the single bus, the switching speed of the router is limited to the bus speed; in our roundabout analogy, this is as if the roundabout could only contain one car at a time. Nonetheless, switching via a bus is often sufficient for routers that operate in small local area and enterprise networks. The Cisco 6500 router \[Cisco 6500 2016\] internally switches packets over a 32-Gbps backplane bus.

Switching via an interconnection network. One way to overcome the bandwidth limitation of a single, shared bus is to use a more sophisticated interconnection network, such as those that have been used in the past to interconnect processors in a multiprocessor computer architecture. A crossbar switch is an interconnection network consisting of 2N buses that connect N input ports to N output ports, as shown in Figure 4.6. Each vertical bus intersects each horizontal bus at a crosspoint, which can be opened or closed at any time by the switch fabric controller (whose logic is part of the switching fabric itself). When a packet arrives from port A and needs to be forwarded to port Y, the switch controller closes the crosspoint at the intersection of buses A and Y, and port A then sends the packet onto its bus, which is picked up (only) by bus Y. Note that a packet from port B can be forwarded to port X at the same time, since the A-to-Y and B-to-X packets use different input and output buses. Thus, unlike the previous two switching approaches, crossbar switches are capable of forwarding multiple packets in parallel. A crossbar switch is non-blocking---a packet being forwarded to an output port will not be blocked from reaching that output port as long as no other packet is currently being forwarded to that output port. However, if two packets from two different input ports are destined to that same output port, then one will have to wait at the input, since only one packet can be sent over any given bus at a time. Cisco 12000 series switches \[Cisco 12000 2016\] use a crossbar switching network; the Cisco 7600 series can be configured to use either a bus or a crossbar switch \[Cisco 7600 2016\]. More sophisticated interconnection networks use multiple stages of switching elements to allow packets from different input ports to proceed towards the same output port at the same time through the multi-stage switching fabric.
See \[Tobagi 1990\] for a survey of switch architectures. The Cisco CRS employs a three-stage non-blocking switching strategy. A router's switching capacity can also be scaled by running multiple switching fabrics in parallel. In this approach, input ports and output ports are connected to N switching fabrics that operate in parallel. An input port breaks a packet into K smaller chunks, and sends ("sprays") the chunks through K of these N switching fabrics to the selected output port, which reassembles the K chunks back into the original packet.

4.2.3 Output Port Processing

Output port processing, shown in Figure 4.7, takes packets that have been stored in the output port's memory and transmits them over the output link. This includes selecting and de-queueing packets for transmission, and performing the needed link-layer and physical-layer transmission functions.

4.2.4 Where Does Queuing Occur?

If we consider input and output port functionality and the configurations shown in Figure 4.6, it's clear that packet queues may form at both the input ports and the output ports, just as we identified cases where cars may wait at the inputs and outputs of the traffic intersection in our roundabout analogy. The location and extent of queueing (either at the input port queues or the output port queues) will depend on the traffic load, the relative speed of the switching fabric, and the line speed. Let's now consider these queues in a bit more detail, since as these queues grow large, the router's memory can eventually be exhausted and packet loss will occur when no memory is available to store arriving packets. Recall that in our earlier discussions, we said that packets were "lost within the network" or "dropped at a router." It is here, at these queues within a router, where such packets are actually dropped and lost.

Figure 4.7 Output port processing

Suppose that all of the input and output links have an identical transmission rate of Rline packets per second, and that there are N input ports and N output ports. To further simplify the discussion, let's assume that all packets have the same fixed length, and that packets arrive at input ports in a synchronous manner. That is, the time to send a packet on any link is equal to the time to receive a packet on any link, and during such an interval of time, either zero or one packet can arrive on an input link. Define the switching fabric transfer rate Rswitch as the rate at which packets can be moved from input port to output port. If Rswitch is N times faster than Rline, then only negligible queuing will occur at the input ports. This is because even in the worst case, where all N input lines are receiving packets, and all packets are to be forwarded to the same output port, each batch of N packets (one packet per input port) can be cleared through the switch fabric before the next batch arrives.

Input Queueing

But what happens if the switch fabric is not fast enough (relative to the input line speeds) to transfer all arriving packets through the fabric without delay? In this case, packet queuing can also occur at the input ports, as packets must join input port queues to wait their turn to be transferred through the switching fabric to the output port.
To illustrate an important consequence of this queuing, consider a crossbar switching fabric and suppose that (1) all link speeds are identical, (2) one packet can be transferred from any one input port to a given output port in the same amount of time it takes for a packet to be received on an input link, and (3) packets are moved from a given input queue to their desired output queue in an FCFS manner. Multiple packets can be transferred in parallel, as long as their output ports are different. However, if two packets at the front of two input queues are destined for the same output queue, then one of the packets will be blocked and must wait at the input queue---the switching fabric can transfer only one packet to a given output port at a time. Figure 4.8 shows an example in which two packets (darkly shaded) at the front of their input queues are destined for the same upper-right output port. Suppose that the switch fabric chooses to transfer the packet from the front of the upper-left queue. In this case, the darkly shaded packet in the lower-left queue must wait. But not only must this darkly shaded packet wait, so too must the lightly shaded packet that is queued behind that packet in the lower-left queue, even though there is no contention for the middle-right output port (the destination for the lightly shaded packet). This phenomenon is known as head-of-the-line (HOL) blocking in an input-queued switch---a queued packet in an input queue must wait for transfer through the fabric (even though its output port is free) because it is blocked by another packet at the head of the line. \[Karol 1987\] shows that due to HOL blocking, the input queue will grow to unbounded length (informally, this is equivalent to saying that significant packet loss will occur) under certain assumptions as soon as the packet arrival rate on the input links reaches only 58 percent of their capacity. A number of solutions to HOL blocking are discussed in \[McKeown 1997\].

Figure 4.8 HOL blocking at an input-queued switch
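The 58 percent figure can be checked with a short simulation. The Python sketch below (our own, under assumptions in the spirit of the analysis in \[Karol 1987\]: fixed-length packets, saturated FIFO input queues, and each packet equally likely to be destined to any output port) transfers at most one head-of-line packet per output port per time slot and measures the resulting throughput; for a moderately large switch the estimate lands close to the 2 − √2 ≈ 0.586 limit:

```python
import random

def hol_throughput(n_ports: int = 32, n_slots: int = 20000, seed: int = 1) -> float:
    """Estimate the saturation throughput of an input-queued crossbar
    with FIFO input queues and uniformly random output ports."""
    rng = random.Random(seed)
    # Every input queue is saturated; hol[i] is the output port of the
    # packet currently at the head of input queue i.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        contenders = {}                       # output port -> contending inputs
        for i, out in enumerate(hol):
            contenders.setdefault(out, []).append(i)
        for out, inputs in contenders.items():
            winner = rng.choice(inputs)       # fabric serves one packet per output
            hol[winner] = rng.randrange(n_ports)  # next queued packet moves up
            delivered += 1                    # losers stay blocked at the HOL
    return delivered / (n_ports * n_slots)

print(round(hol_throughput(), 3))   # roughly 0.59 for a 32-port switch
```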
Output Queueing

Let's next consider whether queueing can occur at a switch's output ports. Suppose that Rswitch is again N times faster than Rline and that packets arriving at each of the N input ports are destined to the same output port. In this case, in the time it takes to send a single packet onto the outgoing link, N new packets will arrive at this output port (one from each of the N input ports). Since the output port can transmit only a single packet in a unit of time (the packet transmission time), the N arriving packets will have to queue (wait) for transmission over the outgoing link. Then N more packets can possibly arrive in the time it takes to transmit just one of the N packets that had just previously been queued. And so on. Thus, packet queues can form at the output ports even when the switching fabric is N times faster than the port line speeds. Eventually, the number of queued packets can grow large enough to exhaust available memory at the output port.

Figure 4.9 Output port queueing

When there is not enough memory to buffer an incoming packet, a decision must be made to either drop the arriving packet (a policy known as drop-tail) or remove one or more already-queued packets to make room for the newly arrived packet. In some cases, it may be advantageous to drop (or mark the header of) a packet before the buffer is full in order to provide a congestion signal to the sender. A number of proactive packet-dropping and -marking policies (which collectively have become known as active queue management (AQM) algorithms) have been proposed and analyzed \[Labrador 1999, Hollot 2002\]. One of the most widely studied and implemented AQM algorithms is the Random Early Detection (RED) algorithm \[Christiansen 2001; Floyd 2016\].

Output port queuing is illustrated in Figure 4.9. At time t, a packet has arrived at each of the incoming input ports, each destined for the uppermost outgoing port. Assuming identical line speeds and a switch operating at three times the line speed, one time unit later (that is, in the time needed to receive or send a packet), all three original packets have been transferred to the outgoing port and are queued awaiting transmission. In the next time unit, one of these three packets will have been transmitted over the outgoing link. In our example, two new packets have arrived at the incoming side of the switch; one of these packets is destined for this uppermost output port. A consequence of such queuing is that a packet scheduler at the output port must choose one packet, among those queued, for transmission---a topic we'll cover in the following section.

Given that router buffers are needed to absorb the fluctuations in traffic load, a natural question to ask is how much buffering is required. For many years, the rule of thumb \[RFC 3439\] for buffer sizing was that the amount of buffering (B) should be equal to an average round-trip time (RTT, say 250 msec) times the link capacity (C). This result is based on an analysis of the queueing dynamics of a relatively small number of TCP flows \[Villamizar 1994\]. Thus, a 10 Gbps link with an RTT of 250 msec would need an amount of buffering equal to B = RTT · C = 2.5 Gbits of buffers. More recent theoretical and experimental efforts \[Appenzeller 2004\], however, suggest that when there are a large number of TCP flows (N) passing through a link, the amount of buffering needed is B = RTT · C/√N. With a large number of flows typically passing through large backbone router links (see, e.g., \[Fraleigh 2003\]), the value of N can be large, with the decrease in needed buffer size becoming quite significant. \[Appenzeller 2004; Wischik 2005; Beheshti 2008\] provide very readable discussions of the buffer-sizing problem from a theoretical, implementation, and operational standpoint.
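The two sizing rules are easy to compare numerically. The following sketch (our own illustration) reproduces the 2.5 Gbit figure from the text and shows how sharply the required buffering drops once many flows share the link; the flow count of 10,000 is a hypothetical value for a busy backbone link:

```python
from math import sqrt

def buffer_rtt_c(rtt_s: float, capacity_bps: float) -> float:
    """Classic rule of thumb [RFC 3439; Villamizar 1994]: B = RTT * C, in bits."""
    return rtt_s * capacity_bps

def buffer_sqrt_n(rtt_s: float, capacity_bps: float, n_flows: int) -> float:
    """Sizing for N long-lived TCP flows [Appenzeller 2004]: B = RTT * C / sqrt(N)."""
    return rtt_s * capacity_bps / sqrt(n_flows)

rtt, link = 0.250, 10e9                        # 250 msec RTT, 10 Gbps link
print(buffer_rtt_c(rtt, link) / 1e9)           # 2.5 Gbits, as in the text
print(buffer_sqrt_n(rtt, link, 10000) / 1e9)   # 0.025 Gbits with N = 10,000 flows
```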
4.2.5 Packet Scheduling

Let's now return to the question of determining the order in which queued packets are transmitted over an outgoing link. Since you yourself have undoubtedly had to wait in long lines on many occasions and observed how waiting customers are served, you're no doubt familiar with many of the queueing disciplines commonly used in routers. There is first-come-first-served (FCFS, also known as first-in-first-out, FIFO). The British are famous for patient and orderly FCFS queueing at bus stops and in the marketplace ("Oh, are you queueing?"). Other countries operate on a priority basis, with one class of waiting customers given priority service over other waiting customers. There is also round-robin queueing, where customers are again divided into classes (as in priority queueing) but each class of customer is given service in turn.

First-in-First-Out (FIFO)

Figure 4.10 shows the queuing model abstraction for the FIFO link-scheduling discipline. Packets arriving at the link output queue wait for transmission if the link is currently busy transmitting another packet. If there is not sufficient buffering space to hold the arriving packet, the queue's packet-discarding policy then determines whether the packet will be dropped (lost) or whether other packets will be removed from the queue to make space for the arriving packet, as discussed above. In our discussion below, we'll ignore packet discard. When a packet is completely transmitted over the outgoing link (that is, receives service) it is removed from the queue. The FIFO (also known as first-come-first-served, or FCFS) scheduling discipline selects packets for link transmission in the same order in which they arrived at the output link queue. We're all familiar with FIFO queuing from service centers, where arriving customers join the back of the single waiting line, remain in order, and are then served when they reach the front of the line.

Figure 4.10 FIFO queueing abstraction

Figure 4.11 shows the FIFO queue in operation. Packet arrivals are indicated by numbered arrows above the upper timeline, with the number indicating the order in which the packet arrived. Individual packet departures are shown below the lower timeline. The time that a packet spends in service (being transmitted) is indicated by the shaded rectangle between the two timelines. In our examples here, let's assume that each packet takes three units of time to be transmitted. Under the FIFO discipline, packets leave in the same order in which they arrived. Note that after the departure of packet 4, the link remains idle (since packets 1 through 4 have been transmitted and removed from the queue) until the arrival of packet 5.

Figure 4.11 The FIFO queue in operation

Priority Queuing

Under priority queuing, packets arriving at the output link are classified into priority classes upon arrival at the queue, as shown in Figure 4.12. In practice, a network operator may configure a queue so that packets carrying network management information (e.g., as indicated by the source or destination TCP/UDP port number) receive priority over user traffic; additionally, real-time voice-over-IP packets might receive priority over non-real-time traffic such as SMTP or IMAP e-mail packets.

Figure 4.12 The priority queueing model

Each priority class typically has its own queue. When choosing a packet to transmit, the priority queuing discipline will transmit a packet from the highest priority class that has a nonempty queue (that is, has packets waiting for transmission). The choice among packets in the same priority class is typically done in a FIFO manner. Figure 4.13 illustrates the operation of a priority queue with two priority classes. Packets 1, 3, and 4 belong to the high-priority class, and packets 2 and 5 belong to the low-priority class. Packet 1 arrives and, finding the link idle, begins transmission. During the transmission of packet 1, packets 2 and 3 arrive and are queued in the low- and high-priority queues, respectively. After the transmission of packet 1, packet 3 (a high-priority packet) is selected for transmission over packet 2 (which, even though it arrived earlier, is a low-priority packet). At the end of the transmission of packet 3, packet 2 then begins transmission. Packet 4 (a high-priority packet) arrives during the transmission of packet 2 (a low-priority packet).
Under a non-preemptive priority queuing discipline, the transmission of a packet is not interrupted once it has begun. In this case, packet 4 queues for transmission and begins being transmitted after the transmission of packet 2 is completed.

Figure 4.13 The priority queue in operation

Round Robin and Weighted Fair Queuing (WFQ)

Under the round robin queuing discipline, packets are sorted into classes as with priority queuing. However, rather than there being a strict service priority among classes, a round robin scheduler alternates service among the classes. In the simplest form of round robin scheduling, a class 1 packet is transmitted, followed by a class 2 packet, followed by a class 1 packet, followed by a class 2 packet, and so on. A so-called work-conserving queuing discipline will never allow the link to remain idle whenever there are packets (of any class) queued for transmission. A work-conserving round robin discipline that looks for a packet of a given class but finds none will immediately check the next class in the round robin sequence. Figure 4.14 illustrates the operation of a two-class round robin queue. In this example, packets 1, 2, and 4 belong to class 1, and packets 3 and 5 belong to the second class. Packet 1 begins transmission immediately upon arrival at the output queue. Packets 2 and 3 arrive during the transmission of packet 1 and thus queue for transmission. After the transmission of packet 1, the link scheduler looks for a class 2 packet and thus transmits packet 3. After the transmission of packet 3, the scheduler looks for a class 1 packet and thus transmits packet 2. After the transmission of packet 2, packet 4 is the only queued packet; it is thus transmitted immediately after packet 2.

Figure 4.14 The two-class round robin queue in operation

A generalized form of round robin queuing that has been widely implemented in routers is the so-called weighted fair queuing (WFQ) discipline \[Demers 1990; Parekh 1993; Cisco QoS 2016\]. WFQ is illustrated in Figure 4.15. Here, arriving packets are classified and queued in the appropriate per-class waiting area. As in round robin scheduling, a WFQ scheduler will serve classes in a circular manner---first serving class 1, then serving class 2, then serving class 3, and then (assuming there are three classes) repeating the service pattern. WFQ is also a work-conserving queuing discipline and thus will immediately move on to the next class in the service sequence when it finds an empty class queue.

Figure 4.15 Weighted fair queueing

WFQ differs from round robin in that each class may receive a differential amount of service in any interval of time. Specifically, each class, i, is assigned a weight, wi. Under WFQ, during any interval of time during which there are class i packets to send, class i will then be guaranteed to receive a fraction of service equal to wi/(∑wj), where the sum in the denominator is taken over all classes that also have packets queued for transmission. In the worst case, even if all classes have queued packets, class i will still be guaranteed to receive a fraction wi/(∑wj) of the bandwidth, where in this worst case the sum in the denominator is over all classes. Thus, for a link with transmission rate R, class i will always achieve a throughput of at least R⋅wi/(∑wj).
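The guaranteed rates follow directly from this formula. Here is a small sketch (our own, with hypothetical class names and weights) that computes the worst-case per-class throughput for whichever classes are currently backlogged:

```python
def wfq_guaranteed_rates(link_rate_bps, weights, backlogged):
    """Each backlogged class i is guaranteed at least
    link_rate * w_i / sum(w_j), summed over the backlogged classes j."""
    total = sum(weights[c] for c in backlogged)
    return {c: link_rate_bps * weights[c] / total for c in backlogged}

weights = {"voice": 3, "video": 2, "data": 1}      # hypothetical weights

# All three classes backlogged on a 1 Gbps link:
print(wfq_guaranteed_rates(1e9, weights, {"voice", "video", "data"}))
# voice 500 Mbps, video ~333 Mbps, data ~167 Mbps

# If only voice and data are backlogged, they split the link 3:1 instead:
print(wfq_guaranteed_rates(1e9, weights, {"voice", "data"}))
# voice 750 Mbps, data 250 Mbps
```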
Our description of WFQ has been idealized, as we have not considered the fact that packets are discrete and a packet's transmission will not be interrupted to begin transmission of another packet; \[Demers 1990; Parekh 1993\] discuss this packetization issue.

4.3 The Internet Protocol (IP): IPv4, Addressing, IPv6, and More

Our study of the network layer thus far in Chapter 4---the notion of the data and control plane components of the network layer, our distinction between forwarding and routing, the identification of various network service models, and our look inside a router---has often been without reference to any specific computer network architecture or protocol. In this section we'll focus on key aspects of the network layer on today's Internet and the celebrated Internet Protocol (IP). There are two versions of IP in use today. We'll first examine the widely deployed IP protocol version 4, which is usually referred to simply as IPv4 \[RFC 791\], in Section 4.3.1. We'll examine IP version 6 \[RFC 2460; RFC 4291\], which has been proposed to replace IPv4, in Section 4.3.5. In between, we'll primarily cover Internet addressing---a topic that might seem rather dry and detail-oriented but we'll see is crucial to understanding how the Internet's network layer works. To master IP addressing is to master the Internet's network layer itself!

4.3.1 IPv4 Datagram Format

Recall that the Internet's network-layer packet is referred to as a datagram. We begin our study of IP with an overview of the syntax and semantics of the IPv4 datagram. You might be thinking that nothing could be drier than the syntax and semantics of a packet's bits. Nevertheless, the datagram plays a central role in the Internet---every networking student and professional needs to see it, absorb it, and master it. (And just to see that protocol headers can indeed be fun to study, check out \[Pomeranz 2010\].) The IPv4 datagram format is shown in Figure 4.16.

Figure 4.16 IPv4 datagram format

The key fields in the IPv4 datagram are the following:

Version number. These 4 bits specify the IP protocol version of the datagram. By looking at the version number, the router can determine how to interpret the remainder of the IP datagram. Different versions of IP use different datagram formats. The datagram format for IPv4 is shown in Figure 4.16. The datagram format for the new version of IP (IPv6) is discussed in Section 4.3.5.

Header length. Because an IPv4 datagram can contain a variable number of options (which are included in the IPv4 datagram header), these 4 bits are needed to determine where in the IP datagram the payload (e.g., the transport-layer segment being encapsulated in this datagram) actually begins. Most IP datagrams do not contain options, so the typical IP datagram has a 20-byte header.

Type of service. The type of service (TOS) bits were included in the IPv4 header to allow different types of IP datagrams to be distinguished from each other. For example, it might be useful to distinguish real-time datagrams (such as those used by an IP telephony application) from non-real-time traffic (for example, FTP). The specific level of service to be provided is a policy issue determined and configured by the network administrator for that router. We also learned in Section 3.7.2 that two of the TOS bits are used for Explicit Congestion Notification.

Datagram length. This is the total length of the IP datagram (header plus data), measured in bytes.
Since this field is 16 bits long, the theoretical maximum size of the IP datagram is 65,535 bytes. However, datagrams are rarely larger than 1,500 bytes, which allows an IP datagram to fit in the payload field of a maximally sized Ethernet frame.

Identifier, flags, fragmentation offset. These three fields have to do with so-called IP fragmentation, a topic we will consider shortly. Interestingly, the new version of IP, IPv6, does not allow for fragmentation.

Time-to-live. The time-to-live (TTL) field is included to ensure that datagrams do not circulate forever (due to, for example, a long-lived routing loop) in the network. This field is decremented by one each time the datagram is processed by a router. If the TTL field reaches 0, a router must drop that datagram.

Protocol. This field is typically used only when an IP datagram reaches its final destination. The value of this field indicates the specific transport-layer protocol to which the data portion of this IP datagram should be passed. For example, a value of 6 indicates that the data portion is passed to TCP, while a value of 17 indicates that the data is passed to UDP. For a list of all possible values, see \[IANA Protocol Numbers 2016\]. Note that the protocol number in the IP datagram has a role that is analogous to the role of the port number field in the transport-layer segment. The protocol number is the glue that binds the network and transport layers together, whereas the port number is the glue that binds the transport and application layers together. We'll see in Chapter 6 that the link-layer frame also has a special field that binds the link layer to the network layer.

Header checksum. The header checksum aids a router in detecting bit errors in a received IP datagram. The header checksum is computed by treating each 2 bytes in the header as a number and summing these numbers using 1s complement arithmetic. As discussed in Section 3.3, the 1s complement of this sum, known as the Internet checksum, is stored in the checksum field. A router computes the header checksum for each received IP datagram and detects an error condition if the checksum carried in the datagram header does not equal the computed checksum. Routers typically discard datagrams for which an error has been detected. Note that the checksum must be recomputed and stored again at each router, since the TTL field, and possibly the options field as well, will change. An interesting discussion of fast algorithms for computing the Internet checksum is \[RFC 1071\]. (A short code sketch of this computation appears just after this field list.) A question often asked at this point is, why does TCP/IP perform error checking at both the transport and network layers? There are several reasons for this repetition. First, note that only the IP header is checksummed at the IP layer, while the TCP/UDP checksum is computed over the entire TCP/UDP segment. Second, TCP/UDP and IP do not necessarily both have to belong to the same protocol stack. TCP can, in principle, run over a different network-layer protocol (for example, ATM \[Black 1995\]) and IP can carry data that will not be passed to TCP/UDP.

Source and destination IP addresses. When a source creates a datagram, it inserts its IP address into the source IP address field and inserts the address of the ultimate destination into the destination IP address field. Often the source host determines the destination address via a DNS lookup, as discussed in Chapter 2. We'll discuss IP addressing in detail in Section 4.3.3.
Options. The options fields allow an IP header to be extended. Header options were meant to be used rarely---hence the decision to save overhead by not including the information in options fields in every datagram header. However, the mere existence of options does complicate matters---since datagram headers can be of variable length, one cannot determine a priori where the data field will start. Also, since some datagrams may require options processing and others may not, the amount of time needed to process an IP datagram at a router can vary greatly. These considerations become particularly important for IP processing in high-performance routers and hosts. For these reasons and others, IP options were not included in the IPv6 header, as discussed in Section 4.3.5.

Data (payload). Finally, we come to the last and most important field---the raison d'être for the datagram in the first place! In most circumstances, the data field of the IP datagram contains the transport-layer segment (TCP or UDP) to be delivered to the destination. However, the data field can carry other types of data, such as ICMP messages (discussed in Section 5.6).

Note that an IP datagram has a total of 20 bytes of header (assuming no options). If the datagram carries a TCP segment, then each (non-fragmented) datagram carries a total of 40 bytes of header (20 bytes of IP header plus 20 bytes of TCP header) along with the application-layer message.
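To make the header-checksum computation concrete, here is a minimal sketch (our own illustration, following the algorithm described above and the approach of \[RFC 1071\]) that sums a 20-byte header as 16-bit words in 1s complement arithmetic and returns the complement of the sum. The header bytes shown are hypothetical, with the checksum field zeroed before computing:

```python
def ipv4_header_checksum(header: bytes) -> int:
    """1s complement sum of the header's 16-bit words, then complemented."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

# A hypothetical 20-byte header: version/IHL 0x45, length 40, TTL 64,
# protocol 6 (TCP), source 193.32.216.9, destination 223.1.1.1, with
# the checksum field (bytes 10-11) set to zero for the computation.
hdr = bytes([0x45, 0x00, 0x00, 0x28, 0x1C, 0x46, 0x40, 0x00, 0x40, 0x06,
             0x00, 0x00, 193, 32, 216, 9, 223, 1, 1, 1])
print(hex(ipv4_header_checksum(hdr)))
```

A receiver can repeat the same sum over the header as received, this time with the checksum field included; a complemented result of 0 indicates that no error was detected.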
4.3.2 IPv4 Datagram Fragmentation

We'll see in Chapter 6 that not all link-layer protocols can carry network-layer packets of the same size. Some protocols can carry big datagrams, whereas other protocols can carry only little datagrams. For example, Ethernet frames can carry up to 1,500 bytes of data, whereas frames for some wide-area links can carry no more than 576 bytes. The maximum amount of data that a link-layer frame can carry is called the maximum transmission unit (MTU). Because each IP datagram is encapsulated within the link-layer frame for transport from one router to the next router, the MTU of the link-layer protocol places a hard limit on the length of an IP datagram. Having a hard limit on the size of an IP datagram is not much of a problem. What is a problem is that each of the links along the route between sender and destination can use different link-layer protocols, and each of these protocols can have different MTUs. To understand the forwarding issue better, imagine that you are a router that interconnects several links, each running different link-layer protocols with different MTUs. Suppose you receive an IP datagram from one link. You check your forwarding table to determine the outgoing link, and this outgoing link has an MTU that is smaller than the length of the IP datagram. Time to panic---how are you going to squeeze this oversized IP datagram into the payload field of the link-layer frame? The solution is to fragment the payload in the IP datagram into two or more smaller IP datagrams, encapsulate each of these smaller IP datagrams in a separate link-layer frame, and send these frames over the outgoing link. Each of these smaller datagrams is referred to as a fragment. Fragments need to be reassembled before they reach the transport layer at the destination. Indeed, both TCP and UDP are expecting to receive complete, unfragmented segments from the network layer. The designers of IPv4 felt that reassembling datagrams in the routers would introduce significant complication into the protocol and put a damper on router performance. (If you were a router, would you want to be reassembling fragments on top of everything else you had to do?) Sticking to the principle of keeping the network core simple, the designers of IPv4 decided to put the job of datagram reassembly in the end systems rather than in network routers.

When a destination host receives a series of datagrams from the same source, it needs to determine whether any of these datagrams are fragments of some original, larger datagram. If some datagrams are fragments, it must further determine when it has received the last fragment and how the fragments it has received should be pieced back together to form the original datagram. To allow the destination host to perform these reassembly tasks, the designers of IP (version 4) put identification, flag, and fragmentation offset fields in the IP datagram header. When a datagram is created, the sending host stamps the datagram with an identification number as well as source and destination addresses. Typically, the sending host increments the identification number for each datagram it sends. When a router needs to fragment a datagram, each resulting datagram (that is, fragment) is stamped with the source address, destination address, and identification number of the original datagram. When the destination receives a series of datagrams from the same sending host, it can examine the identification numbers of the datagrams to determine which of the datagrams are actually fragments of the same larger datagram. Because IP is an unreliable service, one or more of the fragments may never arrive at the destination. For this reason, in order for the destination host to be absolutely sure it has received the last fragment of the original datagram, the last fragment has a flag bit set to 0, whereas all the other fragments have this flag bit set to 1. Also, in order for the destination host to determine whether a fragment is missing (and also to be able to reassemble the fragments in their proper order), the offset field is used to specify where the fragment fits within the original IP datagram.

Figure 4.17 IP fragmentation and reassembly

Figure 4.17 illustrates an example. A datagram of 4,000 bytes (20 bytes of IP header plus 3,980 bytes of IP payload) arrives at a router and must be forwarded to a link with an MTU of 1,500 bytes. This implies that the 3,980 data bytes in the original datagram must be allocated to three separate fragments (each of which is also an IP datagram). The online material for this book and the problems at the end of this chapter will allow you to explore fragmentation in more detail. Also, on this book's Web site, we provide a Java applet that generates fragments. You provide the incoming datagram size, the MTU, and the incoming datagram identification. The applet automatically generates the fragments for you. See http://www.pearsonhighered.com/csresources/. (The short code sketch below mimics what that applet does.)
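In the same spirit as that applet, the sketch below (our own, not the book's applet) computes the fragments for a given datagram size, MTU, and identification number. Fragment data sizes (other than the last) must be multiples of 8 bytes, since the offset field counts in 8-byte units:

```python
def fragment(datagram_len, mtu, ident, header_len=20):
    """List the (ident, flag, offset, length) values for each fragment of an
    IPv4 datagram of datagram_len bytes sent over a link with the given MTU."""
    data_len = datagram_len - header_len
    max_data = (mtu - header_len) // 8 * 8    # per-fragment data, multiple of 8
    fragments, offset = [], 0
    while data_len > 0:
        chunk = min(max_data, data_len)
        data_len -= chunk
        fragments.append({"ident": ident,
                          "flag": 1 if data_len > 0 else 0,  # more fragments?
                          "offset": offset // 8,             # in 8-byte units
                          "length": chunk + header_len})
        offset += chunk
    return fragments

# The example above: a 4,000-byte datagram and an MTU of 1,500 bytes
# (the identification value 777 is arbitrary):
for frag in fragment(4000, 1500, ident=777):
    print(frag)
# -> lengths 1500, 1500, 1040; offsets 0, 185, 370; flags 1, 1, 0
```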
4.3.3 IPv4 Addressing

We now turn our attention to IPv4 addressing. Although you may be thinking that addressing must be a straightforward topic, hopefully by the end of this section you'll be convinced that Internet addressing is not only a juicy, subtle, and interesting topic but also one that is of central importance to the Internet. An excellent treatment of IPv4 addressing can be found in the first chapter in \[Stewart 1999\].

Before discussing IP addressing, however, we'll need to say a few words about how hosts and routers are connected into the Internet. A host typically has only a single link into the network; when IP in the host wants to send a datagram, it does so over this link. The boundary between the host and the physical link is called an interface. Now consider a router and its interfaces. Because a router's job is to receive a datagram on one link and forward the datagram on some other link, a router necessarily has two or more links to which it is connected. The boundary between the router and any one of its links is also called an interface. A router thus has multiple interfaces, one for each of its links. Because every host and router is capable of sending and receiving IP datagrams, IP requires each host and router interface to have its own IP address. Thus, an IP address is technically associated with an interface, rather than with the host or router containing that interface.

Each IP address is 32 bits long (equivalently, 4 bytes), and there are thus a total of 2^32 (or approximately 4 billion) possible IP addresses. These addresses are typically written in so-called dotted-decimal notation, in which each byte of the address is written in its decimal form and is separated by a period (dot) from other bytes in the address. For example, consider the IP address 193.32.216.9. The 193 is the decimal equivalent of the first 8 bits of the address; the 32 is the decimal equivalent of the second 8 bits of the address, and so on. Thus, the address 193.32.216.9 in binary notation is 11000001 00100000 11011000 00001001. Each interface on every host and router in the global Internet must have an IP address that is globally unique (except for interfaces behind NATs, as discussed in Section 4.3.4). These addresses cannot be chosen in a willy-nilly manner, however. A portion of an interface's IP address will be determined by the subnet to which it is connected.

Figure 4.18 Interface addresses and subnets

Figure 4.18 provides an example of IP addressing and interfaces. In this figure, one router (with three interfaces) is used to interconnect seven hosts. Take a close look at the IP addresses assigned to the host and router interfaces, as there are several things to notice. The three hosts in the upper-left portion of Figure 4.18, and the router interface to which they are connected, all have an IP address of the form 223.1.1.xxx. That is, they all have the same leftmost 24 bits in their IP address. These four interfaces are also interconnected to each other by a network that contains no routers. This network could be interconnected by an Ethernet LAN, in which case the interfaces would be interconnected by an Ethernet switch (as we'll discuss in Chapter 6), or by a wireless access point (as we'll discuss in Chapter 7). We'll represent this routerless network connecting these hosts as a cloud for now, and dive into the internals of such networks in Chapters 6 and 7. In IP terms, this network interconnecting three host interfaces and one router interface forms a subnet \[RFC 950\]. (A subnet is also called an IP network or simply a network in the Internet literature.)
IP addressing assigns an address to this subnet: 223.1.1.0/24, where the /24 ("slash-24") notation, sometimes known as a subnet mask, indicates that the leftmost 24 bits of the 32-bit quantity define the subnet address. The 223.1.1.0/24 subnet thus consists of the three host interfaces (223.1.1.1, 223.1.1.2, and 223.1.1.3) and one router interface (223.1.1.4). Any additional hosts attached to the 223.1.1.0/24 subnet would be required to have an address of the form 223.1.1.xxx. There are two additional subnets shown in Figure 4.18: the 223.1.2.0/24 and 223.1.3.0/24 subnets. Figure 4.19 illustrates the three IP subnets present in Figure 4.18.

Figure 4.19 Subnet addresses

The IP definition of a subnet is not restricted to Ethernet segments that connect multiple hosts to a router interface. To get some insight here, consider Figure 4.20, which shows three routers that are interconnected with each other by point-to-point links. Each router has three interfaces, one for each point-to-point link and one for the broadcast link that directly connects the router to a pair of hosts. What subnets are present here? Three subnets, 223.1.1.0/24, 223.1.2.0/24, and 223.1.3.0/24, are similar to the subnets we encountered in Figure 4.18. But note that there are three additional subnets in this example as well: one subnet, 223.1.9.0/24, for the interfaces that connect routers R1 and R2; another subnet, 223.1.8.0/24, for the interfaces that connect routers R2 and R3; and a third subnet, 223.1.7.0/24, for the interfaces that connect routers R3 and R1.

Figure 4.20 Three routers interconnecting six subnets

For a general interconnected system of routers and hosts, we can use the following recipe to define the subnets in the system:

To determine the subnets, detach each interface from its host or router, creating islands of isolated networks, with interfaces terminating the end points of the isolated networks. Each of these isolated networks is called a subnet.

If we apply this procedure to the interconnected system in Figure 4.20, we get six islands or subnets. From the discussion above, it's clear that an organization (such as a company or academic institution) with multiple Ethernet segments and point-to-point links will have multiple subnets, with all of the devices on a given subnet having the same subnet address. In principle, the different subnets could have quite different subnet addresses. In practice, however, their subnet addresses often have much in common. To understand why, let's next turn our attention to how addressing is handled in the global Internet. The Internet's address assignment strategy is known as Classless Interdomain Routing (CIDR---pronounced "cider") \[RFC 4632\]. CIDR generalizes the notion of subnet addressing. As with subnet addressing, the 32-bit IP address is divided into two parts and again has the dotted-decimal form a.b.c.d/x, where x indicates the number of bits in the first part of the address. The x most significant bits of an address of the form a.b.c.d/x constitute the network portion of the IP address, and are often referred to as the prefix (or network prefix) of the address. An organization is typically assigned a block of contiguous addresses, that is, a range of addresses with a common prefix (see the Principles in Practice feature). In this case, the IP addresses of devices within the organization will share the common prefix.
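To make the a.b.c.d/x notation concrete, the following sketch (our own illustration) converts a dotted-decimal address to its 32-bit value and tests whether an address falls within a given prefix:

```python
def to_int(dotted: str) -> int:
    """Dotted-decimal IPv4 address -> 32-bit integer."""
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def in_prefix(addr: str, prefix: str) -> bool:
    """True if addr lies inside the block written as a.b.c.d/x."""
    net, x = prefix.split("/")
    x = int(x)
    mask = 0xFFFFFFFF ^ ((1 << (32 - x)) - 1)   # the leading x bits
    return (to_int(addr) & mask) == (to_int(net) & mask)

print(f"{to_int('193.32.216.9'):032b}")          # 11000001001000001101100000001001
print(in_prefix("223.1.1.2", "223.1.1.0/24"))    # True: same leftmost 24 bits
print(in_prefix("223.1.2.5", "223.1.1.0/24"))    # False: a different subnet
```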
When we cover the Internet's BGP routing protocol in Section 5.4, we'll see that only these x leading prefix bits are considered by routers outside the organization's network. That is, when a router outside the organization forwards a datagram whose destination address is inside the organization, only the leading x bits of the address need be considered. This considerably reduces the size of the forwarding table in these routers, since a single entry of the form a.b.c.d/x will be sufficient to forward packets to any destination within the organization. The remaining 32 − x bits of an address can be thought of as distinguishing among the devices within the organization, all of which have the same network prefix. These are the bits that will be considered when forwarding packets at routers within the organization. These lower-order bits may (or may not) have an additional subnetting structure, such as that discussed above. For example, suppose the first 21 bits of the CIDRized address a.b.c.d/21 specify the organization's network prefix and are common to the IP addresses of all devices in that organization. The remaining 11 bits then identify the specific hosts in the organization. The organization's internal structure might be such that these 11 rightmost bits are used for subnetting within the organization, as discussed above. For example, a.b.c.d/24 might refer to a specific subnet within the organization.

Before CIDR was adopted, the network portions of an IP address were constrained to be 8, 16, or 24 bits in length, an addressing scheme known as classful addressing, since subnets with 8-, 16-, and 24-bit subnet addresses were known as class A, B, and C networks, respectively. The requirement that the subnet portion of an IP address be exactly 1, 2, or 3 bytes long turned out to be problematic for supporting the rapidly growing number of organizations with small and medium-sized subnets. A class C (/24) subnet could accommodate only up to 2^8 − 2 = 254 hosts (two of the 2^8 = 256 addresses are reserved for special use)---too small for many organizations. However, a class B (/16) subnet, which supports up to 65,534 hosts, was too large. Under classful addressing, an organization with, say, 2,000 hosts was typically allocated a class B (/16) subnet address. This led to a rapid depletion of the class B address space and poor utilization of the assigned address space. For example, the organization that used a class B address for its 2,000 hosts was allocated enough of the address space for up to 65,534 interfaces---leaving more than 63,000 addresses that could not be used by other organizations.

PRINCIPLES IN PRACTICE

This example of an ISP that connects eight organizations to the Internet nicely illustrates how carefully allocated CIDRized addresses facilitate routing. Suppose, as shown in Figure 4.21, that the ISP (which we'll call Fly-By-Night-ISP) advertises to the outside world that it should be sent any datagrams whose first 20 address bits match 200.23.16.0/20. The rest of the world need not know that within the address block 200.23.16.0/20 there are in fact eight other organizations, each with its own subnets. This ability to use a single prefix to advertise multiple networks is often referred to as address aggregation (also route aggregation or route summarization).
PRINCIPLES IN PRACTICE

This example of an ISP that connects eight organizations to the Internet nicely illustrates how carefully allocated CIDRized addresses facilitate routing. Suppose, as shown in Figure 4.21, that the ISP (which we'll call Fly-By-Night-ISP) advertises to the outside world that it should be sent any datagrams whose first 20 address bits match 200.23.16.0/20. The rest of the world need not know that within the address block 200.23.16.0/20 there are in fact eight other organizations, each with its own subnets. This ability to use a single prefix to advertise multiple networks is often referred to as address aggregation (also route aggregation or route summarization).

Address aggregation works extremely well when addresses are allocated in blocks to ISPs and then from ISPs to client organizations. But what happens when addresses are not allocated in such a hierarchical manner? What would happen, for example, if Fly-By-Night-ISP acquires ISPs-R-Us and then has Organization 1 connect to the Internet through its subsidiary ISPs-R-Us? As shown in Figure 4.21, the subsidiary ISPs-R-Us owns the address block 199.31.0.0/16, but Organization 1's IP addresses are unfortunately outside of this address block. What should be done here? Certainly, Organization 1 could renumber all of its routers and hosts to have addresses within the ISPs-R-Us address block. But this is a costly solution, and Organization 1 might well be reassigned to another subsidiary in the future. The solution typically adopted is for Organization 1 to keep its IP addresses in 200.23.18.0/23. In this case, as shown in Figure 4.22, Fly-By-Night-ISP continues to advertise the address block 200.23.16.0/20 and ISPs-R-Us continues to advertise 199.31.0.0/16. However, ISPs-R-Us now also advertises the block of addresses for Organization 1, 200.23.18.0/23. When other routers in the larger Internet see the address blocks 200.23.16.0/20 (from Fly-By-Night-ISP) and 200.23.18.0/23 (from ISPs-R-Us) and want to route to an address in the block 200.23.18.0/23, they will use longest prefix matching (see Section 4.2.1), and route toward ISPs-R-Us, as it advertises the longest (i.e., most-specific) address prefix that matches the destination address.

Figure 4.21 Hierarchical addressing and route aggregation

Figure 4.22 ISPs-R-Us has a more specific route to Organization 1

We would be remiss if we did not mention yet another type of IP address, the IP broadcast address 255.255.255.255. When a host sends a datagram with destination address 255.255.255.255, the message is delivered to all hosts on the same subnet. Routers optionally forward the message into neighboring subnets as well (although they usually don't).

Having now studied IP addressing in detail, we need to know how hosts and subnets get their addresses in the first place. Let's begin by looking at how an organization gets a block of addresses for its devices, and then look at how a device (such as a host) is assigned an address from within the organization's block of addresses.

Obtaining a Block of Addresses

In order to obtain a block of IP addresses for use within an organization's subnet, a network administrator might first contact its ISP, which would provide addresses from a larger block of addresses that had already been allocated to the ISP. For example, the ISP may itself have been allocated the address block 200.23.16.0/20. The ISP, in turn, could divide its address block into eight equal-sized contiguous address blocks and give one of these address blocks out to each of up to eight organizations that are supported by this ISP, as shown below. (The subnet part of each address is its leftmost 20 bits for the ISP's block and its leftmost 23 bits for each organization's block.)

| Block | Prefix | Binary representation |
|---|---|---|
| ISP's block | 200.23.16.0/20 | 11001000 00010111 00010000 00000000 |
| Organization 0 | 200.23.16.0/23 | 11001000 00010111 00010000 00000000 |
| Organization 1 | 200.23.18.0/23 | 11001000 00010111 00010010 00000000 |
| Organization 2 | 200.23.20.0/23 | 11001000 00010111 00010100 00000000 |
| ... | ... | ... |
| Organization 7 | 200.23.30.0/23 | 11001000 00010111 00011110 00000000 |
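The eight /23 blocks in this table can be derived mechanically from the ISP's /20 block: each extra prefix bit halves the block, so going from /20 to /23 yields 2^3 = 8 sub-blocks. The following sketch uses Python's standard ipaddress module to reproduce the allocation (the loop and labels are ours).

```python
import ipaddress

isp_block = ipaddress.ip_network("200.23.16.0/20")
# subnets(new_prefix=23) enumerates the eight contiguous /23 blocks.
for i, org_block in enumerate(isp_block.subnets(new_prefix=23)):
    print(f"Organization {i}: {org_block}")
# Organization 0: 200.23.16.0/23 ... Organization 7: 200.23.30.0/23
```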
While obtaining a set of addresses from an ISP is one way to get a block of addresses, it is not the only way. Clearly, there must also be a way for the ISP itself to get a block of addresses. Is there a global authority that has ultimate responsibility for managing the IP address space and allocating address blocks to ISPs and other organizations? Indeed there is! IP addresses are managed under the authority of the Internet Corporation for Assigned Names and Numbers (ICANN) \[ICANN 2016\], based on guidelines set forth in \[RFC 7020\]. The role of the nonprofit ICANN organization \[NTIA 1998\] is not only to allocate IP addresses, but also to manage the DNS root servers. It also has the very contentious job of assigning domain names and resolving domain name disputes. ICANN allocates addresses to regional Internet registries (for example, ARIN, RIPE, APNIC, and LACNIC), which together form the Address Supporting Organization of ICANN \[ASO-ICANN 2016\] and which handle the allocation/management of addresses within their regions.

Obtaining a Host Address: The Dynamic Host Configuration Protocol

Once an organization has obtained a block of addresses, it can assign individual IP addresses to the host and router interfaces in its organization. A system administrator will typically manually configure the IP addresses into the router (often remotely, with a network management tool). Host addresses can also be configured manually, but typically this is done using the Dynamic Host Configuration Protocol (DHCP) \[RFC 2131\]. DHCP allows a host to obtain (be allocated) an IP address automatically. A network administrator can configure DHCP so that a given host receives the same IP address each time it connects to the network, or a host may be assigned a temporary IP address that will be different each time the host connects to the network. In addition to host IP address assignment, DHCP also allows a host to learn additional information, such as its subnet mask, the address of its first-hop router (often called the default gateway), and the address of its local DNS server.

Because of DHCP's ability to automate the network-related aspects of connecting a host into a network, it is often referred to as a plug-and-play or zeroconf (zero-configuration) protocol. This capability makes it very attractive to the network administrator who would otherwise have to perform these tasks manually! DHCP is also enjoying widespread use in residential Internet access networks, enterprise networks, and in wireless LANs, where hosts join and leave the network frequently. Consider, for example, the student who carries a laptop from a dormitory room to a library to a classroom. It is likely that in each location, the student will be connecting into a new subnet and hence will need a new IP address at each location. DHCP is ideally suited to this situation, as there are many users coming and going, and addresses are needed for only a limited amount of time. The value of DHCP's plug-and-play capability is clear, since it's unimaginable that a system administrator would be able to reconfigure laptops at each location, and few students (except those taking a computer networking class!) would have the expertise to configure their laptops manually.

DHCP is a client-server protocol. A client is typically a newly arriving host wanting to obtain network configuration information, including an IP address for itself.
In the simplest case, each subnet (in the addressing sense of Figure 4.20) will have a DHCP server. If no server is present on the subnet, a DHCP relay agent (typically a router) that knows the address of a DHCP server for that network is needed. Figure 4.23 shows a DHCP server attached to subnet 223.1.2/24, with the router serving as the relay agent for arriving clients attached to subnets 223.1.1/24 and 223.1.3/24. In our discussion below, we'll assume that a DHCP server is available on the subnet.

Figure 4.23 DHCP client and server

For a newly arriving host, the DHCP protocol is a four-step process, as shown in Figure 4.24 for the network setting shown in Figure 4.23. In this figure, yiaddr (as in "your Internet address") indicates the address being allocated to the newly arriving client. The four steps are:

- DHCP server discovery. The first task of a newly arriving host is to find a DHCP server with which to interact. This is done using a DHCP discover message, which a client sends within a UDP packet to port 67. The UDP packet is encapsulated in an IP datagram. But to whom should this datagram be sent? The host doesn't even know the IP address of the network to which it is attaching, much less the address of a DHCP server for this network. Given this, the DHCP client creates an IP datagram containing its DHCP discover message along with the broadcast destination IP address of 255.255.255.255 and a "this host" source IP address of 0.0.0.0. The DHCP client passes the IP datagram to the link layer, which then broadcasts this frame to all nodes attached to the subnet (we will cover the details of link-layer broadcasting in Section 6.4).
- DHCP server offer(s). A DHCP server receiving a DHCP discover message responds to the client with a DHCP offer message that is broadcast to all nodes on the subnet, again using the IP broadcast address of 255.255.255.255. (You might want to think about why this server reply must also be broadcast.) Since several DHCP servers can be present on the subnet, the client may find itself in the enviable position of being able to choose from among several offers. Each server offer message contains the transaction ID of the received discover message, the proposed IP address for the client, the network mask, and an IP address lease time---the amount of time for which the IP address will be valid. It is common for the server to set the lease time to several hours or days \[Droms 2002\].
- DHCP request. The newly arriving client will choose from among one or more server offers and respond to its selected offer with a DHCP request message, echoing back the configuration parameters.
- DHCP ACK. The server responds to the DHCP request message with a DHCP ACK message, confirming the requested parameters. Once the client receives the DHCP ACK, the interaction is complete and the client can use the DHCP-allocated IP address for the lease duration. Since a client may want to use its address beyond the lease's expiration, DHCP also provides a mechanism that allows a client to renew its lease on an IP address.

Figure 4.24 DHCP client-server interaction
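To make the discover step concrete, here is a sketch of a minimal DHCP discover message, built according to the RFC 2131 layout described above. It is illustrative only: real clients include more options (such as a parameter request list), the MAC address shown is made up, and binding the DHCP client port 68 typically requires administrator privileges.

```python
import socket
import struct
import random

def build_dhcp_discover(mac: bytes) -> bytes:
    """Build a minimal DHCP discover message (a sketch of RFC 2131 framing)."""
    xid = random.getrandbits(32)        # transaction ID, echoed back in offers
    msg = struct.pack("!BBBBIHH4s4s4s4s",
                      1, 1, 6, 0,       # op=BOOTREQUEST, htype=Ethernet, hlen=6, hops=0
                      xid,
                      0, 0x8000,        # secs=0; broadcast flag: reply to 255.255.255.255
                      b"\x00" * 4,      # ciaddr: client has no address yet ("0.0.0.0")
                      b"\x00" * 4,      # yiaddr: to be filled in by the server's offer
                      b"\x00" * 4,      # siaddr
                      b"\x00" * 4)      # giaddr: used by a DHCP relay agent
    msg += mac.ljust(16, b"\x00")       # chaddr: client hardware address
    msg += b"\x00" * 192                # sname and file fields, unused here
    msg += b"\x63\x82\x53\x63"          # DHCP magic cookie
    msg += bytes([53, 1, 1])            # option 53, length 1, type 1 = DHCPDISCOVER
    msg += bytes([255])                 # end-of-options marker
    return msg

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
sock.bind(("0.0.0.0", 68))              # client port; usually needs privileges
sock.sendto(build_dhcp_discover(b"\xaa\xbb\xcc\xdd\xee\xff"),
            ("255.255.255.255", 67))    # broadcast to the DHCP server port
```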
From a mobility aspect, DHCP does have one very significant shortcoming. Since a new IP address is obtained from DHCP each time a node connects to a new subnet, a TCP connection to a remote application cannot be maintained as a mobile node moves between subnets. In Chapter 7, we will examine mobile IP---an extension to the IP infrastructure that allows a mobile node to use a single permanent address as it moves between subnets. Additional details about DHCP can be found in \[Droms 2002\] and \[dhc 2016\]. An open source reference implementation of DHCP is available from the Internet Systems Consortium \[ISC 2016\].

4.3.4 Network Address Translation (NAT)

Given our discussion about Internet addresses and the IPv4 datagram format, we're now well aware that every IP-capable device needs an IP address. With the proliferation of small office, home office (SOHO) subnets, this would seem to imply that whenever a SOHO wants to install a LAN to connect multiple machines, a range of addresses would need to be allocated by the ISP to cover all of the SOHO's IP devices (including phones, tablets, gaming devices, IP TVs, printers, and more). If the subnet grew bigger, a larger block of addresses would have to be allocated. But what if the ISP had already allocated the contiguous portions of the SOHO network's current address range? And what typical homeowner wants (or should need) to know how to manage IP addresses in the first place? Fortunately, there is a simpler approach to address allocation that has found increasingly widespread use in such scenarios: network address translation (NAT) \[RFC 2663; RFC 3022; Huston 2004; Zhang 2007; Cisco NAT 2016\].

Figure 4.25 shows the operation of a NAT-enabled router. The NAT-enabled router, residing in the home, has an interface that is part of the home network on the right of Figure 4.25. Addressing within the home network is exactly as we have seen above---all four interfaces in the home network have the same subnet address of 10.0.0/24. The address space 10.0.0.0/8 is one of three portions of the IP address space that is reserved in \[RFC 1918\] for a private network or a realm with private addresses, such as the home network in Figure 4.25. A realm with private addresses refers to a network whose addresses only have meaning to devices within that network. To see why this is important, consider the fact that there are hundreds of thousands of home networks, many using the same address space, 10.0.0.0/24. Devices within a given home network can send packets to each other using 10.0.0.0/24 addressing. However, packets forwarded beyond the home network into the larger global Internet clearly cannot use these addresses (as either a source or a destination address) because there are hundreds of thousands of networks using this block of addresses. That is, the 10.0.0.0/24 addresses can only have meaning within the given home network. But if private addresses only have meaning within a given network, how is addressing handled when packets are sent to or received from the global Internet, where addresses are necessarily unique? The answer lies in understanding NAT.

Figure 4.25 Network address translation

The NAT-enabled router does not look like a router to the outside world. Instead the NAT router behaves to the outside world as a single device with a single IP address. In Figure 4.25, all traffic leaving the home router for the larger Internet has a source IP address of 138.76.29.7, and all traffic entering the home router must have a destination address of 138.76.29.7. In essence, the NAT-enabled router is hiding the details of the home network from the outside world.
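Python's standard ipaddress module makes it easy to check whether an address falls in one of the three RFC 1918 private blocks (a small sketch of our own; the sample addresses come from Figure 4.25):

```python
import ipaddress

# The three RFC 1918 private blocks. An address in any of these ranges is
# meaningful only within its own realm, as discussed above.
private_blocks = [ipaddress.ip_network(n)
                  for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

for addr in ("10.0.0.1", "192.168.1.5", "138.76.29.7"):
    ip = ipaddress.ip_address(addr)
    in_private = any(ip in block for block in private_blocks)
    print(addr, "private" if in_private else "globally routable")
```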
(As an aside, you might wonder where the home network computers get their addresses and where the router gets its single IP address. Often, the answer is the same---DHCP! The router gets its address from the ISP's DHCP server, and the router runs a DHCP server to provide addresses to computers within the NAT-DHCP-router-controlled home network's address space.)

If all datagrams arriving at the NAT router from the WAN have the same destination IP address (specifically, that of the WAN-side interface of the NAT router), then how does the router know the internal host to which it should forward a given datagram? The trick is to use a NAT translation table at the NAT router, and to include port numbers as well as IP addresses in the table entries.

Consider the example in Figure 4.25. Suppose a user sitting in a home network behind host 10.0.0.1 requests a Web page on some Web server (port 80) with IP address 128.119.40.186. The host 10.0.0.1 assigns the (arbitrary) source port number 3345 and sends the datagram into the LAN. The NAT router receives the datagram, generates a new source port number 5001 for the datagram, replaces the source IP address with its WAN-side IP address 138.76.29.7, and replaces the original source port number 3345 with the new source port number 5001. When generating a new source port number, the NAT router can select any source port number that is not currently in the NAT translation table. (Note that because a port number field is 16 bits long, the NAT protocol can support over 60,000 simultaneous connections with a single WAN-side IP address for the router!) NAT in the router also adds an entry to its NAT translation table. The Web server, blissfully unaware that the arriving datagram containing the HTTP request has been manipulated by the NAT router, responds with a datagram whose destination address is the IP address of the NAT router, and whose destination port number is 5001. When this datagram arrives at the NAT router, the router indexes the NAT translation table using the destination IP address and destination port number to obtain the appropriate IP address (10.0.0.1) and destination port number (3345) for the browser in the home network. The router then rewrites the datagram's destination address and destination port number, and forwards the datagram into the home network.

NAT has enjoyed widespread deployment in recent years. But NAT is not without detractors. First, one might argue that port numbers are meant to be used for addressing processes, not for addressing hosts. This violation can indeed cause problems for servers running on the home network, since, as we have seen in Chapter 2, server processes wait for incoming requests at well-known port numbers and peers in a P2P protocol need to accept incoming connections when acting as servers. Technical solutions to these problems include NAT traversal tools \[RFC 5389\] and Universal Plug and Play (UPnP), a protocol that allows a host to discover and configure a nearby NAT \[UPnP Forum 2016\]. More "philosophical" arguments have also been raised against NAT by architectural purists. Here, the concern is that routers are meant to be layer 3 (i.e., network-layer) devices, and should process packets only up to the network layer. NAT violates the principle that hosts should be talking directly with each other, without interfering nodes modifying IP addresses, much less port numbers.
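The translation mechanics just described amount to a small table plus two lookups. The sketch below is a toy model of ours, using the addresses from Figure 4.25; it ignores the transport protocol and the remote endpoint, both of which real NATs also record in their table entries.

```python
import itertools

WAN_ADDR = "138.76.29.7"   # the router's WAN-side address in Figure 4.25

class NatRouter:
    def __init__(self):
        self.table = {}                       # wan_port -> (lan_addr, lan_port)
        self.next_port = itertools.count(5001)

    def translate_outbound(self, src_addr, src_port):
        """Rewrite a LAN (source address, port) to (WAN address, fresh port)."""
        wan_port = next(self.next_port)       # any port not already in the table
        self.table[wan_port] = (src_addr, src_port)
        return WAN_ADDR, wan_port

    def translate_inbound(self, dst_port):
        """Look up the internal host for a datagram arriving from the WAN."""
        return self.table[dst_port]           # a KeyError means no mapping: drop

nat = NatRouter()
print(nat.translate_outbound("10.0.0.1", 3345))  # -> ('138.76.29.7', 5001)
print(nat.translate_inbound(5001))               # -> ('10.0.0.1', 3345)
```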
But like it or not, NAT has become an important component of the Internet, as have other so-called middleboxes \[Sekar 2011\] that operate at the network layer but have functions that are quite different from routers. Middleboxes do not perform traditional datagram forwarding, but instead perform functions such as NAT, load balancing of traffic flows, traffic firewalling (see accompanying sidebar), and more. The generalized forwarding paradigm that we'll study shortly in Section 4.4 allows a number of these middlebox functions, as well as traditional router forwarding, to be accomplished in a common, integrated manner.

FOCUS ON SECURITY

INSPECTING DATAGRAMS: FIREWALLS AND INTRUSION DETECTION SYSTEMS

Suppose you are assigned the task of administering a home, departmental, university, or corporate network. Attackers, knowing the IP address range of your network, can easily send IP datagrams to addresses in your range. These datagrams can do all kinds of devious things, including mapping your network with ping sweeps and port scans, crashing vulnerable hosts with malformed packets, scanning for open TCP/UDP ports on servers in your network, and infecting hosts by including malware in the packets. As the network administrator, what are you going to do about all those bad guys out there, each capable of sending malicious packets into your network? Two popular defense mechanisms to malicious packet attacks are firewalls and intrusion detection systems (IDSs).

As a network administrator, you may first try installing a firewall between your network and the Internet. (Most access routers today have firewall capability.) Firewalls inspect the datagram and segment header fields, denying suspicious datagrams entry into the internal network. For example, a firewall may be configured to block all ICMP echo request packets (see Section 5.6), thereby preventing an attacker from doing a traditional ping sweep across your IP address range. Firewalls can also block packets based on source and destination IP addresses and port numbers. Additionally, firewalls can be configured to track TCP connections, granting entry only to datagrams that belong to approved connections.

Additional protection can be provided with an IDS. An IDS, typically situated at the network boundary, performs "deep packet inspection," examining not only header fields but also the payloads in the datagram (including application-layer data). An IDS has a database of packet signatures that are known to be part of attacks. This database is automatically updated as new attacks are discovered. As packets pass through the IDS, the IDS attempts to match header fields and payloads to the signatures in its signature database. If such a match is found, an alert is created. An intrusion prevention system (IPS) is similar to an IDS, except that it actually blocks packets in addition to creating alerts. In Chapter 8, we'll explore firewalls and IDSs in more detail.

Can firewalls and IDSs fully shield your network from all attacks? The answer is clearly no, as attackers continually find new attacks for which signatures are not yet available. But firewalls and traditional signature-based IDSs are useful in protecting your network from known attacks.
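The header-field filtering described in the sidebar can be sketched as an ordered rule list with a default-deny policy; the rules and field names below are illustrative assumptions of ours, not a real firewall configuration.

```python
RULES = [
    # (action, protocol, destination port); None acts as a wildcard
    ("deny",  "icmp", None),  # drop echo requests: no easy ping sweeps
    ("allow", "tcp",  80),    # permit inbound Web traffic
    ("allow", "tcp",  443),
]

def filter_packet(protocol, dst_port):
    """Return the action of the first rule that matches, else deny."""
    for action, proto, port in RULES:
        if proto == protocol and port in (None, dst_port):
            return action
    return "deny"             # default: deny anything not explicitly allowed

print(filter_packet("icmp", None))  # deny
print(filter_packet("tcp", 80))     # allow
print(filter_packet("udp", 53))     # deny (no matching rule)
```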
4.3.5 IPv6

In the early 1990s, the Internet Engineering Task Force began an effort to develop a successor to the IPv4 protocol. A prime motivation for this effort was the realization that the 32-bit IPv4 address space was beginning to be used up, with new subnets and IP nodes being attached to the Internet (and being allocated unique IP addresses) at a breathtaking rate. To respond to this need for a large IP address space, a new IP protocol, IPv6, was developed. The designers of IPv6 also took this opportunity to tweak and augment other aspects of IPv4, based on the accumulated operational experience with IPv4.

The point in time when IPv4 addresses would be completely allocated (and hence no new networks could attach to the Internet) was the subject of considerable debate. The estimates of the two leaders of the IETF's Address Lifetime Expectations working group were that addresses would become exhausted in 2008 and 2018, respectively \[Solensky 1996\]. In February 2011, IANA allocated out the last remaining pool of unassigned IPv4 addresses to a regional registry. While these registries still have available IPv4 addresses within their pool, once these addresses are exhausted, there are no more available address blocks that can be allocated from a central pool \[Huston 2011a\]. A recent survey of IPv4 address-space exhaustion, and the steps taken to prolong the life of the address space, is \[Richter 2015\].

Although the mid-1990s estimates of IPv4 address depletion suggested that a considerable amount of time might be left until the IPv4 address space was exhausted, it was realized that considerable time would be needed to deploy a new technology on such an extensive scale, and so the process to develop IP version 6 (IPv6) \[RFC 2460\] was begun \[RFC 1752\]. (An often-asked question is what happened to IPv5? It was initially envisioned that the ST-2 protocol would become IPv5, but ST-2 was later dropped.) An excellent source of information about IPv6 is \[Huitema 1998\].

IPv6 Datagram Format

The format of the IPv6 datagram is shown in Figure 4.26.

Figure 4.26 IPv6 datagram format

The most important changes introduced in IPv6 are evident in the datagram format:

- Expanded addressing capabilities. IPv6 increases the size of the IP address from 32 to 128 bits. This ensures that the world won't run out of IP addresses. Now, every grain of sand on the planet can be IP-addressable. In addition to unicast and multicast addresses, IPv6 has introduced a new type of address, called an anycast address, that allows a datagram to be delivered to any one of a group of hosts. (This feature could be used, for example, to send an HTTP GET to the nearest of a number of mirror sites that contain a given document.)
- A streamlined 40-byte header. As discussed below, a number of IPv4 fields have been dropped or made optional. The resulting 40-byte fixed-length header allows for faster processing of the IP datagram by a router. A new encoding of options allows for more flexible options processing.
- Flow labeling. IPv6 has an elusive definition of a flow. RFC 2460 states that this allows "labeling of packets belonging to particular flows for which the sender requests special handling, such as a non-default quality of service or real-time service." For example, audio and video transmission might likely be treated as a flow. On the other hand, the more traditional applications, such as file transfer and e-mail, might not be treated as flows.
  It is possible that the traffic carried by a high-priority user (for example, someone paying for better service for their traffic) might also be treated as a flow. What is clear, however, is that the designers of IPv6 foresaw the eventual need to be able to differentiate among the flows, even if the exact meaning of a flow had yet to be determined.

As noted above, a comparison of Figure 4.26 with Figure 4.16 reveals the simpler, more streamlined structure of the IPv6 datagram. The following fields are defined in IPv6:

- Version. This 4-bit field identifies the IP version number. Not surprisingly, IPv6 carries a value of 6 in this field. Note that putting a 4 in this field does not create a valid IPv4 datagram. (If it did, life would be a lot simpler---see the discussion below regarding the transition from IPv4 to IPv6.)
- Traffic class. The 8-bit traffic class field, like the TOS field in IPv4, can be used to give priority to certain datagrams within a flow, or it can be used to give priority to datagrams from certain applications (for example, voice-over-IP) over datagrams from other applications (for example, SMTP e-mail).
- Flow label. As discussed above, this 20-bit field is used to identify a flow of datagrams.
- Payload length. This 16-bit value is treated as an unsigned integer giving the number of bytes in the IPv6 datagram following the fixed-length, 40-byte datagram header.
- Next header. This field identifies the protocol to which the contents (data field) of this datagram will be delivered (for example, to TCP or UDP). The field uses the same values as the protocol field in the IPv4 header.
- Hop limit. The contents of this field are decremented by one by each router that forwards the datagram. If the hop limit count reaches zero, the datagram is discarded.
- Source and destination addresses. The various formats of the IPv6 128-bit address are described in RFC 4291.
- Data. This is the payload portion of the IPv6 datagram. When the datagram reaches its destination, the payload will be removed from the IP datagram and passed on to the protocol specified in the next header field.

The discussion above identified the purpose of the fields that are included in the IPv6 datagram. Comparing the IPv6 datagram format in Figure 4.26 with the IPv4 datagram format that we saw in Figure 4.16, we notice that several fields appearing in the IPv4 datagram are no longer present in the IPv6 datagram:

- Fragmentation/reassembly. IPv6 does not allow for fragmentation and reassembly at intermediate routers; these operations can be performed only by the source and destination. If an IPv6 datagram received by a router is too large to be forwarded over the outgoing link, the router simply drops the datagram and sends a "Packet Too Big" ICMP error message (see Section 5.6) back to the sender. The sender can then resend the data, using a smaller IP datagram size. Fragmentation and reassembly is a time-consuming operation; removing this functionality from the routers and placing it squarely in the end systems considerably speeds up IP forwarding within the network.
- Header checksum. Because the transport-layer (for example, TCP and UDP) and link-layer (for example, Ethernet) protocols in the Internet perform checksumming, the designers of IP probably felt that this functionality was sufficiently redundant in the network layer that it could be removed. Once again, fast processing of IP packets was a central concern. Recall from our discussion of IPv4 in Section 4.3.1 that since the IPv4 header contains a TTL field (similar to the hop limit field in IPv6), the IPv4 header checksum needed to be recomputed at every router. As with fragmentation and reassembly, this too was a costly operation in IPv4.
- Options. An options field is no longer a part of the standard IP header. However, it has not gone away. Instead, the options field is one of the possible next headers pointed to from within the IPv6 header. That is, just as TCP or UDP protocol headers can be the next header within an IP packet, so too can an options field. The removal of the options field results in a fixed-length, 40-byte IP header.
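Since the IPv6 header is a fixed 40 bytes, parsing it is a single unpacking step. The following sketch (ours; the function name is arbitrary) extracts the fields just listed:

```python
import struct

def parse_ipv6_header(datagram: bytes) -> dict:
    """Parse the fixed 40-byte IPv6 header described above (a sketch)."""
    if len(datagram) < 40:
        raise ValueError("an IPv6 header is always 40 bytes")
    # First 32-bit word: version (4 bits), traffic class (8), flow label (20).
    vtf, payload_len, next_header, hop_limit = struct.unpack("!IHBB", datagram[:8])
    return {
        "version":        vtf >> 28,           # always 6
        "traffic_class":  (vtf >> 20) & 0xFF,
        "flow_label":     vtf & 0xFFFFF,
        "payload_length": payload_len,          # bytes following the 40-byte header
        "next_header":    next_header,          # e.g., 6 = TCP, 17 = UDP
        "hop_limit":      hop_limit,
        "src":            datagram[8:24],       # 128-bit source address
        "dst":            datagram[24:40],      # 128-bit destination address
    }
```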
Transitioning from IPv4 to IPv6

Now that we have seen the technical details of IPv6, let us consider a very practical matter: How will the public Internet, which is based on IPv4, be transitioned to IPv6? The problem is that while new IPv6-capable systems can be made backward-compatible, that is, can send, route, and receive IPv4 datagrams, already deployed IPv4-capable systems are not capable of handling IPv6 datagrams. Several options are possible \[Huston 2011b, RFC 4213\].

One option would be to declare a flag day---a given time and date when all Internet machines would be turned off and upgraded from IPv4 to IPv6. The last major technology transition (from using NCP to using TCP for reliable transport service) occurred almost 35 years ago. Even back then \[RFC 801\], when the Internet was tiny and still being administered by a small number of "wizards," it was realized that such a flag day was not possible. A flag day involving billions of devices is even more unthinkable today.

The approach to IPv4-to-IPv6 transition that has been most widely adopted in practice involves tunneling \[RFC 4213\]. The basic idea behind tunneling---a key concept with applications in many other scenarios beyond IPv4-to-IPv6 transition, including wide use in the all-IP cellular networks that we'll cover in Chapter 7---is the following. Suppose two IPv6 nodes (in this example, B and E in Figure 4.27) want to interoperate using IPv6 datagrams but are connected to each other by intervening IPv4 routers. We refer to the intervening set of IPv4 routers between two IPv6 routers as a tunnel, as illustrated in Figure 4.27. With tunneling, the IPv6 node on the sending side of the tunnel (in this example, B) takes the entire IPv6 datagram and puts it in the data (payload) field of an IPv4 datagram. This IPv4 datagram is then addressed to the IPv6 node on the receiving side of the tunnel (in this example, E) and sent to the first node in the tunnel (in this example, C). The intervening IPv4 routers in the tunnel route this IPv4 datagram among themselves, just as they would any other datagram, blissfully unaware that the IPv4 datagram itself contains a complete IPv6 datagram. The IPv6 node on the receiving side of the tunnel eventually receives the IPv4 datagram (it is the destination of the IPv4 datagram!), determines that the IPv4 datagram contains an IPv6 datagram (by observing that the protocol number field in the IPv4 datagram is 41 \[RFC 4213\], indicating that the IPv4 payload is an IPv6 datagram), extracts the IPv6 datagram, and then routes the IPv6 datagram exactly as it would if it had received the IPv6 datagram from a directly connected IPv6 neighbor.
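The encapsulation step performed at the tunnel entrance is easy to sketch: prepend a minimal IPv4 header whose protocol field is 41 and whose payload is the complete IPv6 datagram. This is our own simplified illustration (fixed identification field, no options, no fragmentation handling), not a full implementation; src and dst would be the IPv4 addresses of the tunnel endpoints (B and E in Figure 4.27).

```python
import socket
import struct

def ipv4_checksum(header: bytes) -> int:
    """Standard 16-bit one's-complement Internet checksum."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def encapsulate(ipv6_datagram: bytes, src: str, dst: str) -> bytes:
    """Wrap a complete IPv6 datagram in a minimal IPv4 header (a sketch)."""
    fields = struct.pack("!BBHHHBBH4s4s",
                         0x45,                     # version 4, 5-word (20-byte) header
                         0,                        # type of service
                         20 + len(ipv6_datagram),  # total length
                         0, 0,                     # identification, flags/fragment offset
                         64,                       # TTL
                         41,                       # protocol 41 = IPv6-in-IPv4
                         0,                        # checksum placeholder
                         socket.inet_aton(src),
                         socket.inet_aton(dst))
    csum = ipv4_checksum(fields)                   # computed with checksum field zeroed
    return fields[:10] + struct.pack("!H", csum) + fields[12:] + ipv6_datagram
```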
We end this section by noting that while the adoption of IPv6 was initially slow to take off \[Lawton 2001; Huston 2008b\], momentum has been building. NIST \[NIST IPv6 2015\] reports that more than a third of US government second-level domains are IPv6-enabled. On the client side, Google reports that only about 8 percent of the clients accessing Google services do so via IPv6 \[Google IPv6 2015\]. But other recent measurements \[Czyz 2014\] indicate that IPv6 adoption is accelerating. The proliferation of devices such as IP-enabled phones and other portable devices provides an additional push for more widespread deployment of IPv6. Europe's Third Generation Partnership Program \[3GPP 2016\] has specified IPv6 as the standard addressing scheme for mobile multimedia.

Figure 4.27 Tunneling

One important lesson that we can learn from the IPv6 experience is that it is enormously difficult to change network-layer protocols. Since the early 1990s, numerous new network-layer protocols have been trumpeted as the next major revolution for the Internet, but most of these protocols have had limited penetration to date. These protocols include IPv6, multicast protocols, and resource reservation protocols; a discussion of these latter two protocols can be found in the online supplement to this text. Indeed, introducing new protocols into the network layer is like replacing the foundation of a house---it is difficult to do without tearing the whole house down or at least temporarily relocating the house's residents. On the other hand, the Internet has witnessed rapid deployment of new protocols at the application layer. The classic examples, of course, are the Web, instant messaging, streaming media, distributed games, and various forms of social media. Introducing new application-layer protocols is like adding a new layer of paint to a house---it is relatively easy to do, and if you choose an attractive color, others in the neighborhood will copy you. In summary, in the future we can certainly expect to see changes in the Internet's network layer, but these changes will likely occur on a time scale that is much slower than the changes that will occur at the application layer.

4.4 Generalized Forwarding and SDN

In Section 4.2.1, we noted that an Internet router's forwarding decision has traditionally been based solely on a packet's destination address. In the previous section, however, we've also seen that there has been a proliferation of middleboxes that perform many layer-3 functions. NAT boxes rewrite header IP addresses and port numbers; firewalls block traffic based on header-field values or redirect packets for additional processing, such as deep packet inspection (DPI). Load-balancers forward packets requesting a given service (e.g., an HTTP request) to one of a set of servers that provide that service. \[RFC 3234\] lists a number of common middlebox functions. This proliferation of middleboxes, layer-2 switches, and layer-3 routers \[Qazi 2013\]---each with its own specialized hardware, software, and management interfaces---has undoubtedly resulted in costly headaches for many network operators. However, recent advances in software-defined networking have promised, and are now delivering, a unified approach towards providing many of these network-layer functions, and certain link-layer functions as well, in a modern, elegant, and integrated manner.
Recall that Section 4.2.1 characterized destination-based forwarding as the two steps of looking up a destination IP address ("match"), then sending the packet into the switching fabric to the specified output port ("action"). Let's now consider a significantly more general "match-plus-action" paradigm, where the "match" can be made over multiple header fields associated with different protocols at different layers in the protocol stack. The "action" can include forwarding the packet to one or more output ports (as in destination-based forwarding), load balancing packets across multiple outgoing interfaces that lead to a service (as in load balancing), rewriting header values (as in NAT), purposefully blocking/dropping a packet (as in a firewall), sending a packet to a special server for further processing and action (as in DPI), and more.

In generalized forwarding, a match-plus-action table generalizes the notion of the destination-based forwarding table that we encountered in Section 4.2.1. Because forwarding decisions may be made using network-layer and/or link-layer source and destination addresses, the forwarding devices shown in Figure 4.28 are more accurately described as "packet switches" rather than layer 3 "routers" or layer 2 "switches." Thus, in the remainder of this section, and in Section 5.5, we'll refer to these devices as packet switches, adopting the terminology that is gaining widespread adoption in the SDN literature.

Figure 4.28 Generalized forwarding: Each packet switch contains a match-plus-action table that is computed and distributed by a remote controller

Figure 4.28 shows a match-plus-action table in each packet switch, with the table being computed, installed, and updated by a remote controller. We note that while it is possible for the control components at the individual packet switch to interact with each other (e.g., in a manner similar to that in Figure 4.2), in practice generalized match-plus-action capabilities are implemented via a remote controller that computes, installs, and updates these tables. You might take a minute to compare Figures 4.2, 4.3 and 4.28---what similarities and differences do you notice between destination-based forwarding shown in Figure 4.2 and 4.3, and generalized forwarding shown in Figure 4.28?

Our following discussion of generalized forwarding will be based on OpenFlow \[McKeown 2008, OpenFlow 2009, Casado 2014, Tourrilhes 2014\]---a highly visible and successful standard that has pioneered the notion of the match-plus-action forwarding abstraction and controllers, as well as the SDN revolution more generally \[Feamster 2013\]. We'll primarily consider OpenFlow 1.0, which introduced key SDN abstractions and functionality in a particularly clear and concise manner. Later versions of OpenFlow introduced additional capabilities as a result of experience gained through implementation and use; current and earlier versions of the OpenFlow standard can be found at \[ONF 2016\].

Each entry in the match-plus-action forwarding table, known as a flow table in OpenFlow, includes:

- A set of header field values to which an incoming packet will be matched. As in the case of destination-based forwarding, hardware-based matching is most rapidly performed in TCAM memory, with more than a million destination address entries being possible \[Bosshart 2013\]. A packet that matches no flow table entry can be dropped or sent to the remote controller for more processing.
  In practice, a flow table may be implemented by multiple flow tables for performance or cost reasons \[Bosshart 2013\], but we'll focus here on the abstraction of a single flow table.
- A set of counters that are updated as packets are matched to flow table entries. These counters might include the number of packets that have been matched by that table entry, and the time since the table entry was last updated.
- A set of actions to be taken when a packet matches a flow table entry. These actions might be to forward the packet to a given output port, to drop the packet, to make copies of the packet and send them to multiple output ports, and/or to rewrite selected header fields.

We'll explore matching and actions in more detail in Sections 4.4.1 and 4.4.2, respectively. We'll then study how the network-wide collection of per-packet switch matching rules can be used to implement a wide range of functions including routing, layer-2 switching, firewalling, load-balancing, virtual networks, and more in Section 4.4.3. In closing, we note that the flow table is essentially an API, the abstraction through which an individual packet switch's behavior can be programmed; we'll see in Section 4.4.3 that network-wide behaviors can similarly be programmed by appropriately programming/configuring these tables in a collection of network packet switches \[Casado 2014\].

4.4.1 Match

Figure 4.29 shows the eleven packet-header fields and the incoming port ID that can be matched in an OpenFlow 1.0 match-plus-action rule. Recall from Section 1.5.2 that a link-layer (layer 2) frame arriving to a packet switch will contain a network-layer (layer 3) datagram as its payload, which in turn will typically contain a transport-layer (layer 4) segment. The first observation we make is that OpenFlow's match abstraction allows for a match to be made on selected fields from three layers of protocol headers (thus rather brazenly defying the layering principle we studied in Section 1.5). Since we've not yet covered the link layer, suffice it to say that the source and destination MAC addresses shown in Figure 4.29 are the link-layer addresses associated with the frame's sending and receiving interfaces; by forwarding on the basis of Ethernet addresses rather than IP addresses, we can see that an OpenFlow-enabled device can equally perform as a router (layer-3 device) forwarding datagrams as well as a switch (layer-2 device) forwarding frames. The Ethernet type field corresponds to the upper layer protocol (e.g., IP) to which the frame's payload will be demultiplexed, and the VLAN fields are concerned with so-called virtual local area networks that we'll study in Chapter 6. The set of twelve values that can be matched in the OpenFlow 1.0 specification has grown to 41 values in more recent OpenFlow specifications \[Bosshart 2014\].

Figure 4.29 Packet matching fields, OpenFlow 1.0 flow table

The ingress port refers to the input port at the packet switch on which a packet is received. The packet's IP source address, IP destination address, IP protocol field, and IP type of service fields were discussed earlier in Section 4.3.1. The transport-layer source and destination port number fields can also be matched.

Flow table entries may also have wildcards. For example, an IP address of 128.119.*.* in a flow table will match the corresponding address field of any datagram that has 128.119 as the first 16 bits of its address. Each flow table entry also has an associated priority. If a packet matches multiple flow table entries, the selected match and corresponding action will be that of the highest priority entry with which the packet matches.
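Wildcard matching with priorities can be sketched in a few lines; the field names, table entries, and actions below are illustrative assumptions of ours in the spirit of OpenFlow 1.0, not its actual message format.

```python
flow_table = [
    {"priority": 10, "match": {"ip_dst_prefix": "128.119."}, "action": "forward(2)"},
    {"priority": 5,  "match": {"ip_proto": 17},              "action": "drop"},  # all UDP
    {"priority": 0,  "match": {},                            "action": "send_to_controller"},
]

def lookup(packet):
    """Return the action of the highest-priority matching entry."""
    def matches(entry):
        m = entry["match"]                 # missing fields act as wildcards
        if "ip_proto" in m and packet["ip_proto"] != m["ip_proto"]:
            return False
        if "ip_dst_prefix" in m and not packet["ip_dst"].startswith(m["ip_dst_prefix"]):
            return False
        return True
    candidates = [e for e in flow_table if matches(e)]
    return max(candidates, key=lambda e: e["priority"])["action"]

print(lookup({"ip_dst": "128.119.40.186", "ip_proto": 6}))   # forward(2)
print(lookup({"ip_dst": "10.0.0.1",       "ip_proto": 17}))  # drop
print(lookup({"ip_dst": "10.0.0.1",       "ip_proto": 6}))   # send_to_controller
```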
Lastly, we observe that not all fields in an IP header can be matched. For example, OpenFlow does not allow matching on the basis of the TTL field or datagram length field. Why are some fields allowed for matching, while others are not? Undoubtedly, the answer has to do with the tradeoff between functionality and complexity. The "art" in choosing an abstraction is to provide for enough functionality to accomplish a task (in this case to implement, configure, and manage a wide range of network-layer functions that had previously been implemented through an assortment of network-layer devices), without over-burdening the abstraction with so much detail and generality that it becomes bloated and unusable. Butler Lampson has famously noted \[Lampson 1983\]:

> Do one thing at a time, and do it well. An interface should capture the minimum essentials of an abstraction. Don't generalize; generalizations are generally wrong.

Given OpenFlow's success, one can surmise that its designers indeed chose their abstraction well. Additional details of OpenFlow matching can be found in \[OpenFlow 2009, ONF 2016\].

4.4.2 Action

As shown in Figure 4.28, each flow table entry has a list of zero or more actions that determine the processing that is to be applied to a packet that matches a flow table entry. If there are multiple actions, they are performed in the order specified in the list. Among the most important possible actions are:

- Forwarding. An incoming packet may be forwarded to a particular physical output port, broadcast over all ports (except the port on which it arrived) or multicast over a selected set of ports. The packet may be encapsulated and sent to the remote controller for this device. That controller then may (or may not) take some action on that packet, including installing new flow table entries, and may return the packet to the device for forwarding under the updated set of flow table rules.
- Dropping. A flow table entry with no action indicates that a matched packet should be dropped.
- Modify-field. The values in ten packet header fields (all layer 2, 3, and 4 fields shown in Figure 4.29 except the IP Protocol field) may be re-written before the packet is forwarded to the chosen output port.

4.4.3 OpenFlow Examples of Match-plus-action in Action

Having now considered both the match and action components of generalized forwarding, let's put these ideas together in the context of the sample network shown in Figure 4.30. The network has 6 hosts (h1, h2, h3, h4, h5 and h6) and three packet switches (s1, s2 and s3), each with four local interfaces (numbered 1 through 4). We'll consider a number of network-wide behaviors that we'd like to implement, and the flow table entries in s1, s2 and s3 needed to implement this behavior.

Figure 4.30 OpenFlow match-plus-action network with three packet switches, 6 hosts, and an OpenFlow controller

A First Example: Simple Forwarding

As a very simple example, suppose that the desired forwarding behavior is that packets from h5 or h6 destined to h3 or h4 are to be forwarded from s3 to s1, and then from s1 to s2 (thus completely avoiding the use of the link between s3 and s2). The flow table entry in s1 would be:

s1 Flow Table (Example 1)

| Match | Action |
|---|---|
| Ingress Port = 1; IP Src = 10.3.*.*; IP Dst = 10.2.*.* | Forward(4) |
| ... | ... |
Of course, we'll also need a flow table entry in s3 so that datagrams sent from h5 or h6 are forwarded to s1 over outgoing interface 3:

s3 Flow Table (Example 1)

| Match | Action |
|---|---|
| IP Src = 10.3.*.*; IP Dst = 10.2.*.* | Forward(3) |
| ... | ... |

Lastly, we'll also need a flow table entry in s2 to complete this first example, so that datagrams arriving from s1 are forwarded to their destination, either host h3 or h4:

s2 Flow Table (Example 1)

| Match | Action |
|---|---|
| Ingress port = 2; IP Dst = 10.2.0.3 | Forward(3) |
| Ingress port = 2; IP Dst = 10.2.0.4 | Forward(4) |
| ... | ... |

A Second Example: Load Balancing

As a second example, let's consider a load-balancing scenario, where datagrams from h3 destined to 10.1.*.* are to be forwarded over the direct link between s2 and s1, while datagrams from h4 destined to 10.1.*.* are to be forwarded over the link between s2 and s3 (and then from s3 to s1). Note that this behavior couldn't be achieved with IP's destination-based forwarding. In this case, the flow table in s2 would be:

s2 Flow Table (Example 2)

| Match | Action |
|---|---|
| Ingress port = 3; IP Dst = 10.1.*.* | Forward(2) |
| Ingress port = 4; IP Dst = 10.1.*.* | Forward(1) |
| ... | ... |

Flow table entries are also needed at s1 to forward the datagrams received from s2 to either h1 or h2; and flow table entries are needed at s3 to forward datagrams received on interface 4 from s2 over interface 3 towards s1. See if you can figure out these flow table entries at s1 and s3.

A Third Example: Firewalling

As a third example, let's consider a firewall scenario in which s2 wants only to receive (on any of its interfaces) traffic sent from hosts attached to s3.

s2 Flow Table (Example 3)

| Match | Action |
|---|---|
| IP Src = 10.3.*.*; IP Dst = 10.2.0.3 | Forward(3) |
| IP Src = 10.3.*.*; IP Dst = 10.2.0.4 | Forward(4) |
| ... | ... |

If there were no other entries in s2's flow table, then only traffic from 10.3.*.* would be forwarded to the hosts attached to s2.

Although we've only considered a few basic scenarios here, the versatility and advantages of generalized forwarding are hopefully apparent. In homework problems, we'll explore how flow tables can be used to create many different logical behaviors, including virtual networks---two or more logically separate networks (each with their own independent and distinct forwarding behavior)---that use the same physical set of packet switches and links. In Section 5.5, we'll return to flow tables when we study the SDN controllers that compute and distribute the flow tables, and the protocol used for communicating between a packet switch and its controller.

4.5 Summary

In this chapter we've covered the data plane functions of the network layer---the per-router functions that determine how packets arriving on one of a router's input links are forwarded to one of that router's output links. We began by taking a detailed look at the internal operations of a router, studying input and output port functionality and destination-based forwarding, a router's internal switching mechanism, packet queue management and more. We covered both traditional IP forwarding (where forwarding is based on a datagram's destination address) and generalized forwarding (where forwarding and other functions may be performed using values in several different fields in the datagram's header) and have seen the versatility of the latter approach.
We also studied the IPv4 and IPv6 protocols in detail, and Internet addressing, which we found to be much deeper, subtler, and more interesting than we might have expected. With our newfound understanding of the network-layer's data plane, we're now ready to dive into the network layer's control plane in Chapter 5!

Homework Problems and Questions

Chapter 4 Review Questions

SECTION 4.1

R1. Let's review some of the terminology used in this textbook. Recall that the name of a transport-layer packet is segment and that the name of a link-layer packet is frame. What is the name of a network-layer packet? Recall that both routers and link-layer switches are called packet switches. What is the fundamental difference between a router and link-layer switch?

R2. We noted that network layer functionality can be broadly divided into data plane functionality and control plane functionality. What are the main functions of the data plane? Of the control plane?

R3. We made a distinction between the forwarding function and the routing function performed in the network layer. What are the key differences between routing and forwarding?

R4. What is the role of the forwarding table within a router?

R5. We said that a network layer's service model "defines the characteristics of end-to-end transport of packets between sending and receiving hosts." What is the service model of the Internet's network layer? What guarantees are made by the Internet's service model regarding the host-to-host delivery of datagrams?

SECTION 4.2

R6. In Section 4.2, we saw that a router typically consists of input ports, output ports, a switching fabric and a routing processor. Which of these are implemented in hardware and which are implemented in software? Why? Returning to the notion of the network layer's data plane and control plane, which are implemented in hardware and which are implemented in software? Why?

R7. Discuss why each input port in a high-speed router stores a shadow copy of the forwarding table.

R8. What is meant by destination-based forwarding? How does this differ from generalized forwarding (assuming you've read Section 4.4, which of the two approaches is adopted by Software-Defined Networking)?

R9. Suppose that an arriving packet matches two or more entries in a router's forwarding table. With traditional destination-based forwarding, what rule does a router apply to determine which of these rules should be applied to determine the output port to which the arriving packet should be switched?

R10. Three types of switching fabrics are discussed in Section 4.2. List and briefly describe each type. Which, if any, can send multiple packets across the fabric in parallel?

R11. Describe how packet loss can occur at input ports. Describe how packet loss at input ports can be eliminated (without using infinite buffers).

R12. Describe how packet loss can occur at output ports. Can this loss be prevented by increasing the switch fabric speed?

R13. What is HOL blocking? Does it occur in input ports or output ports?

R14. In Section 4.2, we studied FIFO, Priority, Round Robin (RR), and Weighted Fair Queueing (WFQ) packet scheduling disciplines. Which of these queueing disciplines ensure that all packets depart in the order in which they arrived?

R15. Give an example showing why a network operator might want one class of packets to be given priority over another class of packets.

R16. What is an essential difference between RR and WFQ packet scheduling?
Is there a case (Hint: Consider the WFQ weights) where RR and WFQ will behave exactly the same?

SECTION 4.3

R17. Suppose Host A sends Host B a TCP segment encapsulated in an IP datagram. When Host B receives the datagram, how does the network layer in Host B know it should pass the segment (that is, the payload of the datagram) to TCP rather than to UDP or to some other upper-layer protocol?

R18. What field in the IP header can be used to ensure that a packet is forwarded through no more than N routers?

R19. Recall that we saw the Internet checksum being used in both transport-layer segments (in UDP and TCP headers, Figures 3.7 and 3.29 respectively) and in network-layer datagrams (IP header, Figure 4.16). Now consider a transport layer segment encapsulated in an IP datagram. Are the checksums in the segment header and datagram header computed over any common bytes in the IP datagram? Explain your answer.

R20. When a large datagram is fragmented into multiple smaller datagrams, where are these smaller datagrams reassembled into a single larger datagram?

R21. Do routers have IP addresses? If so, how many?

R22. What is the 32-bit binary equivalent of the IP address 223.1.3.27?

R23. Visit a host that uses DHCP to obtain its IP address, network mask, default router, and IP address of its local DNS server. List these values.

R24. Suppose there are three routers between a source host and a destination host. Ignoring fragmentation, an IP datagram sent from the source host to the destination host will travel over how many interfaces? How many forwarding tables will be indexed to move the datagram from the source to the destination?

R25. Suppose an application generates chunks of 40 bytes of data every 20 msec, and each chunk gets encapsulated in a TCP segment and then an IP datagram. What percentage of each datagram will be overhead, and what percentage will be application data?

R26. Suppose you purchase a wireless router and connect it to your cable modem. Also suppose that your ISP dynamically assigns your connected device (that is, your wireless router) one IP address. Also suppose that you have five PCs at home that use 802.11 to wirelessly connect to your wireless router. How are IP addresses assigned to the five PCs? Does the wireless router use NAT? Why or why not?

R27. What is meant by the term "route aggregation"? Why is it useful for a router to perform route aggregation?

R28. What is meant by a "plug-and-play" or "zeroconf" protocol?

R29. What is a private network address? Should a datagram with a private network address ever be present in the larger public Internet? Explain.

R30. Compare and contrast the IPv4 and the IPv6 header fields. Do they have any fields in common?

R31. It has been said that when IPv6 tunnels through IPv4 routers, IPv6 treats the IPv4 tunnels as link-layer protocols. Do you agree with this statement? Why or why not?

SECTION 4.4

R32. How does generalized forwarding differ from destination-based forwarding?

R33. What is the difference between a forwarding table that we encountered in destination-based forwarding in Section 4.1 and OpenFlow's flow table that we encountered in Section 4.4?

R34. What is meant by the "match plus action" operation of a router or switch? In the case of a destination-based forwarding packet switch, what is matched and what is the action taken? In the case of an SDN, name three fields that can be matched, and three actions that can be taken.
R35. Name three header fields in an IP datagram that can be "matched" in OpenFlow 1.0 generalized forwarding. What are three IP datagram header fields that cannot be "matched" in OpenFlow?

Problems

P1. Consider the network below.

a. Show the forwarding table in router A, such that all traffic destined to host H3 is forwarded through interface 3.

b. Can you write down a forwarding table in router A, such that all traffic from H1 destined to host H3 is forwarded through interface 3, while all traffic from H2 destined to host H3 is forwarded through interface 4? (Hint: This is a trick question.)

P2. Suppose two packets arrive to two different input ports of a router at exactly the same time. Also suppose there are no other packets anywhere in the router.

a. Suppose the two packets are to be forwarded to two different output ports. Is it possible to forward the two packets through the switch fabric at the same time when the fabric uses a shared bus?

b. Suppose the two packets are to be forwarded to two different output ports. Is it possible to forward the two packets through the switch fabric at the same time when the fabric uses switching via memory?

c. Suppose the two packets are to be forwarded to the same output port. Is it possible to forward the two packets through the switch fabric at the same time when the fabric uses a crossbar?

P3. In Section 4.2, we noted that the maximum queuing delay is (n--1)D if the switching fabric is n times faster than the input line rates. Suppose that all packets are of the same length, n packets arrive at the same time to the n input ports, and all n packets want to be forwarded to different output ports. What is the maximum delay for a packet for the (a) memory, (b) bus, and (c) crossbar switching fabrics?

P4. Consider the switch shown below. Suppose that all datagrams have the same fixed length, that the switch operates in a slotted, synchronous manner, and that in one time slot a datagram can be transferred from an input port to an output port. The switch fabric is a crossbar so that at most one datagram can be transferred to a given output port in a time slot, but different output ports can receive datagrams from different input ports in a single time slot. What is the minimal number of time slots needed to transfer the packets shown from input ports to their output ports, assuming any input queue scheduling order you want (i.e., it need not have HOL blocking)? What is the largest number of slots needed, assuming the worst-case scheduling order you can devise, assuming that a non-empty input queue is never idle?

P5. Consider a datagram network using 32-bit host addresses. Suppose a router has four links, numbered 0 through 3, and packets are to be forwarded to the link interfaces as follows:

| Destination Address Range | Link Interface |
|---|---|
| 11100000 00000000 00000000 00000000 through 11100000 00111111 11111111 11111111 | 0 |
| 11100000 01000000 00000000 00000000 through 11100000 01000000 11111111 11111111 | 1 |
| 11100000 01000001 00000000 00000000 through 11100001 01111111 11111111 11111111 | 2 |
| otherwise | 3 |

a. Provide a forwarding table that has five entries, uses longest prefix matching, and forwards packets to the correct link interfaces.
b. Describe how your forwarding table determines the appropriate link interface for datagrams with these destination addresses:

11001000 10010001 01010001 01010101
11100001 01000000 11000011 00111100
11100001 10000000 00010001 01110111

P6. Consider a datagram network using 8-bit host addresses. Suppose a router uses longest prefix matching and has the following forwarding table:

| Prefix Match | Interface |
|---|---|
| 00 | 0 |
| 010 | 1 |
| 011 | 2 |
| 10 | 2 |
| 11 | 3 |

For each of the four interfaces, give the associated range of destination host addresses and the number of addresses in the range.

P7. Consider a datagram network using 8-bit host addresses. Suppose a router uses longest prefix matching and has the following forwarding table:

| Prefix Match | Interface |
|---|---|
| 1 | 0 |
| 10 | 1 |
| 111 | 2 |
| otherwise | 3 |

For each of the four interfaces, give the associated range of destination host addresses and the number of addresses in the range.

P8. Consider a router that interconnects three subnets: Subnet 1, Subnet 2, and Subnet 3. Suppose all of the interfaces in each of these three subnets are required to have the prefix 223.1.17/24. Also suppose that Subnet 1 is required to support at least 60 interfaces, Subnet 2 is to support at least 90 interfaces, and Subnet 3 is to support at least 12 interfaces. Provide three network addresses (of the form a.b.c.d/x) that satisfy these constraints.

P9. In Section 4.2.2 an example forwarding table (using longest prefix matching) is given. Rewrite this forwarding table using the a.b.c.d/x notation instead of the binary string notation.

P10. In Problem P5 you are asked to provide a forwarding table (using longest prefix matching). Rewrite this forwarding table using the a.b.c.d/x notation instead of the binary string notation.

P11. Consider a subnet with prefix 128.119.40.128/26. Give an example of one IP address (of the form xxx.xxx.xxx.xxx) that can be assigned to this network. Suppose an ISP owns the block of addresses of the form 128.119.40.64/26. Suppose it wants to create four subnets from this block, with each block having the same number of IP addresses. What are the prefixes (of the form a.b.c.d/x) for the four subnets?

P12. Consider the topology shown in Figure 4.20. Denote the three subnets with hosts (starting clockwise at 12:00) as Networks A, B, and C. Denote the subnets without hosts as Networks D, E, and F.

a. Assign network addresses to each of these six subnets, with the following constraints: All addresses must be allocated from 214.97.254/23; Subnet A should have enough addresses to support 250 interfaces; Subnet B should have enough addresses to support 120 interfaces; and Subnet C should have enough addresses to support 120 interfaces. Of course, subnets D, E and F should each be able to support two interfaces. For each subnet, the assignment should take the form a.b.c.d/x or a.b.c.d/x -- e.f.g.h/y.

b. Using your answer to part (a), provide the forwarding tables (using longest prefix matching) for each of the three routers.

P13. Use the whois service at the American Registry for Internet Numbers (http://www.arin.net/whois) to determine the IP address blocks for three universities. Can the whois services be used to determine with certainty the geographical location of a specific IP address? Use www.maxmind.com to determine the locations of the Web servers at each of these universities.

P14. Consider sending a 2400-byte datagram into a link that has an MTU of 700 bytes.
Suppose the original datagram is stamped with the identification number 422. How many fragments are generated? What are the values in the various fields in the IP datagram(s) generated related to fragmentation?

P15. Suppose datagrams are limited to 1,500 bytes (including header) between source Host A and destination Host B. Assuming a 20-byte IP header, how many datagrams would be required to send an MP3 consisting of 5 million bytes? Explain how you computed your answer.

P16. Consider the network setup in Figure 4.25. Suppose that the ISP instead assigns the router the address 24.34.112.235 and that the network address of the home network is 192.168.1/24.

a. Assign addresses to all interfaces in the home network.

b. Suppose each host has two ongoing TCP connections, all to port 80 at host 128.119.40.86. Provide the six corresponding entries in the NAT translation table.

P17. Suppose you are interested in detecting the number of hosts behind a NAT. You observe that the IP layer stamps an identification number sequentially on each IP packet. The identification number of the first IP packet generated by a host is a random number, and the identification numbers of the subsequent IP packets are sequentially assigned. Assume all IP packets generated by hosts behind the NAT are sent to the outside world.

a. Based on this observation, and assuming you can sniff all packets sent by the NAT to the outside, can you outline a simple technique that detects the number of unique hosts behind a NAT? Justify your answer.

b. If the identification numbers are not sequentially assigned but randomly assigned, would your technique work? Justify your answer.

P18. In this problem we'll explore the impact of NATs on P2P applications. Suppose a peer with username Arnold discovers through querying that a peer with username Bernard has a file it wants to download. Also suppose that Bernard and Arnold are both behind a NAT. Try to devise a technique that will allow Arnold to establish a TCP connection with Bernard without application-specific NAT configuration. If you have difficulty devising such a technique, discuss why.

P19. Consider the SDN OpenFlow network shown in Figure 4.30. Suppose that the desired forwarding behavior for datagrams arriving at s2 is as follows: any datagrams arriving on input port 1 from hosts h5 or h6 that are destined to hosts h1 or h2 should be forwarded over output port 2; any datagrams arriving on input port 2 from hosts h1 or h2 that are destined to hosts h5 or h6 should be forwarded over output port 1; any arriving datagrams on input ports 1 or 2 and destined to hosts h3 or h4 should be delivered to the host specified; hosts h3 and h4 should be able to send datagrams to each other. Specify the flow table entries in s2 that implement this forwarding behavior.

P20. Consider again the SDN OpenFlow network shown in Figure 4.30. Suppose that the desired forwarding behavior for datagrams arriving from hosts h3 or h4 at s2 is as follows: any datagrams arriving from host h3 and destined for h1, h2, h5 or h6 should be forwarded in a clockwise direction in the network; any datagrams arriving from host h4 and destined for h1, h2, h5 or h6 should be forwarded in a counter-clockwise direction in the network. Specify the flow table entries in s2 that implement this forwarding behavior.

P21. Consider again the scenario from P19 above.
Give the flow table entries at packet switches s1 and s3, such that any arriving datagrams with a source address of h3 or h4 are routed to the destination hosts specified in the destination address field in the IP datagram. (Hint: Your forwarding table rules should include the cases that an arriving datagram is destined for a directly attached host or should be forwarded to a neighboring router for eventual host delivery there.)

P22. Consider again the SDN OpenFlow network shown in Figure 4.30. Suppose we want switch s2 to function as a firewall. Specify the flow table in s2 that implements the following firewall behaviors (specify a different flow table for each of the four firewalling behaviors below) for delivery of datagrams destined to h3 and h4. You do not need to specify the forwarding behavior in s2 that forwards traffic to other routers.

- Only traffic arriving from hosts h1 and h6 should be delivered to hosts h3 or h4 (i.e., arriving traffic from hosts h2 and h5 is blocked).
- Only TCP traffic is allowed to be delivered to hosts h3 or h4 (i.e., UDP traffic is blocked).
- Only traffic destined to h3 is to be delivered (i.e., all traffic to h4 is blocked).
- Only UDP traffic from h1 and destined to h3 is to be delivered. All other traffic is blocked.

Wireshark Lab

In the Web site for this textbook, www.pearsonhighered.com/cs-resources, you'll find a Wireshark lab assignment that examines the operation of the IP protocol, and the IP datagram format in particular.

AN INTERVIEW WITH... Vinton G. Cerf

Vinton G. Cerf is Vice President and Chief Internet Evangelist for Google. He served for over 16 years at MCI in various positions, ending his tenure there as Senior Vice President for Technology Strategy. He is widely known as the co-designer of the TCP/IP protocols and the architecture of the Internet. During his time from 1976 to 1982 at the US Department of Defense Advanced Research Projects Agency (DARPA), he played a key role leading the development of Internet and Internet-related data packet and security techniques. He received the US Presidential Medal of Freedom in 2005 and the US National Medal of Technology in 1997. He holds a BS in Mathematics from Stanford University and an MS and PhD in computer science from UCLA.

What brought you to specialize in networking?

I was working as a programmer at UCLA in the late 1960s. My job was supported by the US Defense Advanced Research Projects Agency (called ARPA then, called DARPA now). I was working in the laboratory of Professor Leonard Kleinrock on the Network Measurement Center of the newly created ARPAnet. The first node of the ARPAnet was installed at UCLA on September 1, 1969. I was responsible for programming a computer that was used to capture performance information about the ARPAnet and to report this information back for comparison with mathematical models and predictions of the performance of the network. Several of the other graduate students and I were made responsible for working on the so-called host-level protocols of the ARPAnet---the procedures and formats that would allow many different kinds of computers on the network to interact with each other. It was a fascinating exploration into a new world (for me) of distributed computing and communication.

Did you imagine that IP would become as pervasive as it is today when you first designed the protocol?
When Bob Kahn and I first worked on this in 1973, I think we +were mostly very focused on the central question: How can we make +heterogeneous packet networks interoperate with one another, assuming we +cannot actually change the networks themselves? We hoped that we could +find a way to permit an arbitrary collection of packet-switched networks +to be interconnected in a transparent fashion, so that host computers +could communicate end-to-end without having to do any translations in +between. I think we knew that we were dealing with powerful and +expandable technology, but I doubt we had a clear image of what the +world would be like with hundreds of millions of computers all +interlinked on the Internet. What do you now envision for the future of +networking and the Internet? What major challenges/obstacles do you +think lie ahead in their development? I believe the Internet itself and +networks in general will continue to proliferate. Already there is +convincing evidence that there will be billions of Internet-enabled +devices on the Internet, including appliances like cell phones, +refrigerators, personal digital assistants, home servers, televisions, +as well as the usual array of laptops, servers, and so on. Big +challenges include support for mobility, battery life, capacity of the +access links to the network, and ability to scale the optical core of +the network up in an unlimited fashion. Designing an interplanetary +extension of the Internet is a project in which I am deeply engaged at +the Jet Propulsion Laboratory. We will need to cut over from IPv4 +\[32-bit addresses\] to IPv6 \[128 bits\]. The list is long! Who has +inspired you professionally? My colleague Bob Kahn; my thesis advisor, +Gerald Estrin; my best friend, Steve Crocker (we met in high school and +he introduced me to computers in 1960!); and the thousands of engineers +who continue to evolve the Internet today. Do you have any advice for +students entering the networking/Internet field? Think outside the +limitations of existing systems---imagine what might be possible; but +then do the hard work of figuring out how to get there from the current +state of affairs. Dare to dream: A half dozen colleagues and I at the +Jet Propulsion Laboratory have been working on the design of an +interplanetary extension of the terrestrial Internet. It may take +decades to implement this, + +mission by mission, but to paraphrase: "A man's reach should exceed his +grasp, or what are the heavens for?" + +Chapter 5 The Network Layer: Control Plane + +In this chapter, we'll complete our journey through the network layer by +covering the control-plane component of the network layer---the +network-wide logic that controls not only how a datagram is forwarded +among routers along an end-to-end path from the source host to the +destination host, but also how network-layer components and services are +configured and managed. In Section 5.2, we'll cover traditional routing +algorithms for computing least cost paths in a graph; these algorithms +are the basis for two widely deployed Internet routing protocols: OSPF +and BGP, that we'll cover in Sections 5.3 and 5.4, respectively. As +we'll see, OSPF is a routing protocol that operates within a single +ISP's network. BGP is a routing protocol that serves to interconnect all +of the networks in the Internet; BGP is thus often referred to as the +"glue" that holds the Internet together. 
Traditionally, control-plane +routing protocols have been implemented together with data-plane +forwarding functions, monolithically, within a router. As we learned in +the introduction to Chapter 4, software-defined networking (SDN) makes a +clear separation between the data and control planes, implementing +control-plane functions in a separate "controller" service that is +distinct, and remote, from the forwarding components of the routers it +controls. We'll cover SDN controllers in Section 5.5. In Sections 5.6 +and 5.7 we'll cover some of the nuts and bolts of managing an IP +network: ICMP (the Internet Control Message Protocol) and SNMP (the +Simple Network Management Protocol). + +5.1 Introduction Let's quickly set the context for our study of the +network control plane by recalling Figures 4.2 and 4.3. There, we saw +that the forwarding table (in the case of destination-based forwarding) +and the flow table (in the case of generalized forwarding) were the +principal elements that linked the network layer's data and control +planes. We learned that these tables specify the local data-plane +forwarding behavior of a router. We saw that in the case of generalized +forwarding, the actions taken (Section 4.4.2) could include not only +forwarding a packet to a router's output port, but also dropping a +packet, replicating a packet, and/or rewriting layer 2, 3 or 4 +packet-header fields. In this chapter, we'll study how those forwarding +and flow tables are computed, maintained and installed. In our +introduction to the network layer in Section 4.1, we learned that there +are two possible approaches for doing so. Per-router control. Figure 5.1 +illustrates the case where a routing algorithm runs in each and every +router; both a forwarding and a routing function are contained + +Figure 5.1 Per-router control: Individual routing algorithm components +interact in the control plane + +within each router. Each router has a routing component that +communicates with the routing components in other routers to compute the +values for its forwarding table. This per-router control approach has +been used in the Internet for decades. The OSPF and BGP protocols that +we'll study in Sections 5.3 and 5.4 are based on this per-router +approach to control. Logically centralized control. Figure 5.2 +illustrates the case in which a logically centralized controller +computes and distributes the forwarding tables to be used by each and +every router. As we saw in Section 4.4, the generalized +match-plus-action abstraction allows the router to perform traditional +IP forwarding as well as a rich set of other functions (load sharing, +firewalling, and NAT) that had been previously implemented in separate +middleboxes. + +Figure 5.2 Logically centralized control: A distinct, typically remote, +controller interacts with local control agents (CAs) + +The controller interacts with a control agent (CA) in each of the +routers via a well-defined protocol to configure and manage that +router's flow table. Typically, the CA has minimum functionality; its +job is to communicate with the controller, and to do as the controller +commands. Unlike the routing algorithms in Figure 5.1, the CAs do not +directly interact with each other nor do they actively take part in +computing + +the forwarding table. This is a key distinction between per-router +control and logically centralized control. 
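To make this division of labor concrete, here is a minimal sketch in Python. The class and method names (Controller, ControlAgent, push_tables, install) are invented for illustration and do not correspond to any real SDN controller API; the point is simply that the controller computes every router's table, while each CA only installs what it is told and never talks to other CAs.

```python
# A minimal, invented sketch of logically centralized control: the
# controller computes all forwarding state; control agents (CAs) on the
# routers merely install it and do not interact with one another.

class ControlAgent:
    """Runs on one router; installs whatever the controller sends."""
    def __init__(self, router_id):
        self.router_id = router_id
        self.flow_table = {}               # match -> action

    def install(self, flow_table):         # called only by the controller
        self.flow_table = dict(flow_table)

class Controller:
    """Logically centralized: sees the whole topology, computes all tables."""
    def __init__(self, agents):
        self.agents = {a.router_id: a for a in agents}

    def push_tables(self, tables):         # tables: router_id -> flow table
        for router_id, table in tables.items():
            self.agents[router_id].install(table)

# The controller, not the routers, decides how traffic to h3 is forwarded.
ctrl = Controller([ControlAgent("s1"), ControlAgent("s2")])
ctrl.push_tables({"s1": {("dst", "h3"): "forward(3)"},
                  "s2": {("dst", "h3"): "forward(1)"}})
```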
By "logically centralized" +control \[Levin 2012\] we mean that the routing control service is +accessed as if it were a single central service point, even though the +service is likely to be implemented via multiple servers for +fault-tolerance, and performance scalability reasons. As we will see in +Section 5.5, SDN adopts this notion of a logically centralized +controller---an approach that is finding increased use in production +deployments. Google uses SDN to control the routers in its internal B4 +global wide-area network that interconnects its data centers \[Jain +2013\]. SWAN \[Hong 2013\], from Microsoft Research, uses a logically +centralized controller to manage routing and forwarding between a wide +area network and a data center network. China Telecom and China Unicom +are using SDN both within data centers and between data centers \[Li +2015\]. AT&T has noted \[AT&T 2013\] that it "supports many SDN +capabilities and independently defined, proprietary mechanisms that fall +under the SDN architectural framework." + +5.2 Routing Algorithms In this section we'll study routing algorithms, +whose goal is to determine good paths (equivalently, routes), from +senders to receivers, through the network of routers. Typically, a +"good" path is one that has the least cost. We'll see that in practice, +however, real-world concerns such as policy issues (for example, a rule +such as "router x, belonging to organization Y, should not forward any +packets originating from the network owned by organization Z") also come +into play. We note that whether the network control plane adopts a +per-router control approach or a logically centralized approach, there +must always be a welldefined sequence of routers that a packet will +cross in traveling from sending to receiving host. Thus, the routing +algorithms that compute these paths are of fundamental importance, and +another candidate for our top-10 list of fundamentally important +networking concepts. A graph is used to formulate routing problems. +Recall that a graph G=(N, E) is a set N of nodes and a collection E of +edges, where each edge is a pair of nodes from N. In the context of +network-layer routing, the nodes in the graph represent + +Figure 5.3 Abstract graph model of a computer network + +routers---the points at which packet-forwarding decisions are made---and +the edges connecting these nodes represent the physical links between +these routers. Such a graph abstraction of a computer network is shown +in Figure 5.3. To view some graphs representing real network maps, see +\[Dodge 2016, Cheswick 2000\]; for a discussion of how well different +graph-based models model the Internet, see \[Zegura 1997, Faloutsos +1999, Li 2004\]. As shown in Figure 5.3, an edge also has a value +representing its cost. Typically, an edge's cost may reflect the +physical length of the corresponding link (for example, a transoceanic +link might have a higher + +cost than a short-haul terrestrial link), the link speed, or the +monetary cost associated with a link. For our purposes, we'll simply +take the edge costs as a given and won't worry about how they are +determined. For any edge (x, y) in E, we denote c(x, y) as the cost of +the edge between nodes x and y. If the pair (x, y) does not belong to E, +we set c(x, y)=∞. 
Also, we'll only consider undirected graphs (i.e., graphs whose edges do not have a direction) in our discussion here, so that edge (x, y) is the same as edge (y, x) and c(x, y)=c(y, x); however, the algorithms we'll study can be easily extended to the case of directed links with a different cost in each direction. Also, a node y is said to be a neighbor of node x if (x, y) belongs to E.

Given that costs are assigned to the various edges in the graph abstraction, a natural goal of a routing algorithm is to identify the least costly paths between sources and destinations. To make this problem more precise, recall that a path in a graph G=(N, E) is a sequence of nodes $(x_1, x_2, \ldots, x_p)$ such that each of the pairs $(x_1, x_2), (x_2, x_3), \ldots, (x_{p-1}, x_p)$ are edges in E. The cost of a path $(x_1, x_2, \ldots, x_p)$ is simply the sum of all the edge costs along the path, that is, $c(x_1, x_2) + c(x_2, x_3) + \cdots + c(x_{p-1}, x_p)$. Given any two nodes x and y, there are typically many paths between the two nodes, with each path having a cost. One or more of these paths is a least-cost path. The least-cost problem is therefore clear: Find a path between the source and destination that has least cost. In Figure 5.3, for example, the least-cost path between source node u and destination node w is (u, x, y, w) with a path cost of 3. Note that if all edges in the graph have the same cost, the least-cost path is also the shortest path (that is, the path with the smallest number of links between the source and the destination).

As a simple exercise, try finding the least-cost path from node u to z in Figure 5.3 and reflect for a moment on how you calculated that path. If you are like most people, you found the path from u to z by examining Figure 5.3, tracing a few routes from u to z, and somehow convincing yourself that the path you had chosen had the least cost among all possible paths. (Did you check all of the 17 possible paths between u and z? Probably not!) Such a calculation is an example of a centralized routing algorithm---the routing algorithm was run in one location, your brain, with complete information about the network.

Broadly, one way in which we can classify routing algorithms is according to whether they are centralized or decentralized. A centralized routing algorithm computes the least-cost path between a source and destination using complete, global knowledge about the network. That is, the algorithm takes the connectivity between all nodes and all link costs as inputs. This then requires that the algorithm somehow obtain this information before actually performing the calculation. The calculation itself can be run at one site (e.g., a logically centralized controller as in Figure 5.2) or could be replicated in the routing component of each and every router (e.g., as in Figure 5.1). The key distinguishing feature here, however, is that the algorithm has complete information about connectivity and link costs. Algorithms with global state information are often referred to as link-state (LS) algorithms, since the algorithm must be aware of the cost of each link in the network. We'll study LS algorithms in Section 5.2.1.

In a decentralized routing algorithm, the calculation of the least-cost path is carried out in an iterative, distributed manner by the routers. No node has complete information about the costs of all network links. Instead, each node begins with only the knowledge of the costs of its own directly attached links.
Then, through an iterative process of calculation and exchange of information with its neighboring nodes, a node gradually calculates the least-cost path to a destination or set of destinations. The decentralized routing algorithm we'll study below in Section 5.2.2 is called a distance-vector (DV) algorithm, because each node maintains a vector of estimates of the costs (distances) to all other nodes in the network. Such decentralized algorithms, with interactive message exchange between neighboring routers, are perhaps more naturally suited to control planes where the routers interact directly with each other, as in Figure 5.1.

A second broad way to classify routing algorithms is according to whether they are static or dynamic. In static routing algorithms, routes change very slowly over time, often as a result of human intervention (for example, a human manually editing link costs). Dynamic routing algorithms change the routing paths as the network traffic loads or topology change. A dynamic algorithm can be run either periodically or in direct response to topology or link cost changes. While dynamic algorithms are more responsive to network changes, they are also more susceptible to problems such as routing loops and route oscillation.

A third way to classify routing algorithms is according to whether they are load-sensitive or load-insensitive. In a load-sensitive algorithm, link costs vary dynamically to reflect the current level of congestion in the underlying link. If a high cost is associated with a link that is currently congested, a routing algorithm will tend to choose routes around such a congested link. While early ARPAnet routing algorithms were load-sensitive \[McQuillan 1980\], a number of difficulties were encountered \[Huitema 1998\]. Today's Internet routing algorithms (such as RIP, OSPF, and BGP) are load-insensitive, as a link's cost does not explicitly reflect its current (or recent past) level of congestion.

5.2.1 The Link-State (LS) Routing Algorithm

Recall that in a link-state algorithm, the network topology and all link costs are known, that is, available as input to the LS algorithm. In practice this is accomplished by having each node broadcast link-state packets to all other nodes in the network, with each link-state packet containing the identities and costs of its attached links. In practice (for example, with the Internet's OSPF routing protocol, discussed in Section 5.3) this is often accomplished by a link-state broadcast algorithm \[Perlman 1999\]. The result of the nodes' broadcast is that all nodes have an identical and complete view of the network. Each node can then run the LS algorithm and compute the same set of least-cost paths as every other node.

The link-state routing algorithm we present below is known as Dijkstra's algorithm, named after its inventor. A closely related algorithm is Prim's algorithm; see \[Cormen 2001\] for a general discussion of graph algorithms. Dijkstra's algorithm computes the least-cost path from one node (the source, which we will refer to as u) to all other nodes in the network. Dijkstra's algorithm is iterative and has the property that after the kth iteration of the algorithm, the least-cost paths are known to k destination nodes, and among the least-cost paths to all destination nodes, these k paths will have the k smallest costs.
Let us define the following notation:

D(v): cost of the least-cost path from the source node to destination v as of this iteration of the algorithm.

p(v): previous node (neighbor of v) along the current least-cost path from the source to v.

N′: subset of nodes; v is in N′ if the least-cost path from the source to v is definitively known.

The centralized routing algorithm consists of an initialization step followed by a loop. The number of times the loop is executed is equal to the number of nodes in the network. Upon termination, the algorithm will have calculated the shortest paths from the source node u to every other node in the network.

Link-State (LS) Algorithm for Source Node u

```
1   Initialization:
2     N' = {u}
3     for all nodes v
4       if v is a neighbor of u
5         then D(v) = c(u, v)
6         else D(v) = ∞
7
8   Loop
9     find w not in N' such that D(w) is a minimum
10    add w to N'
11    update D(v) for each neighbor v of w and not in N':
12      D(v) = min( D(v), D(w) + c(w, v) )
13    /* new cost to v is either old cost to v or known
14       least path cost to w plus cost from w to v */
15  until N' = N
```

As an example, let's consider the network in Figure 5.3 and compute the least-cost paths from u to all possible destinations. A tabular summary of the algorithm's computation is shown in Table 5.1, where each line in the table gives the values of the algorithm's variables at the end of the iteration. Let's consider the first few steps in detail.

Table 5.1 Running the link-state algorithm on the network in Figure 5.3

| Step | N′ | D(v), p(v) | D(w), p(w) | D(x), p(x) | D(y), p(y) | D(z), p(z) |
|---|---|---|---|---|---|---|
| 0 | u | 2, u | 5, u | 1, u | ∞ | ∞ |
| 1 | ux | 2, u | 4, x | | 2, x | ∞ |
| 2 | uxy | 2, u | 3, y | | | 4, y |
| 3 | uxyv | | 3, y | | | 4, y |
| 4 | uxyvw | | | | | 4, y |
| 5 | uxyvwz | | | | | |

In the initialization step, the currently known least-cost paths from u to its directly attached neighbors, v, x, and w, are initialized to 2, 1, and 5, respectively. Note in particular that the cost to w is set to 5 (even though we will soon see that a lesser-cost path does indeed exist) since this is the cost of the direct (one hop) link from u to w. The costs to y and z are set to infinity because they are not directly connected to u.

In the first iteration, we look among those nodes not yet added to the set N′ and find that node with the least cost as of the end of the previous iteration. That node is x, with a cost of 1, and thus x is added to the set N′. Line 12 of the LS algorithm is then performed to update D(v) for all nodes v, yielding the results shown in the second line (Step 1) in Table 5.1. The cost of the path to v is unchanged. The cost of the path to w (which was 5 at the end of the initialization) through node x is found to have a cost of 4. Hence this lower-cost path is selected and w's predecessor along the shortest path from u is set to x. Similarly, the cost to y (through x) is computed to be 2, and the table is updated accordingly.

In the second iteration, nodes v and y are found to have the least-cost paths (2), and we break the tie arbitrarily and add y to the set N′ so that N′ now contains u, x, and y. The cost to the remaining nodes not yet in N′, that is, nodes v, w, and z, are updated via line 12 of the LS algorithm, yielding the results shown in the third row in Table 5.1. And so on . . .
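As a concrete companion to the pseudocode and Table 5.1, here is a short, runnable Python sketch of the algorithm. It is our own illustration, not code from any deployed protocol; the edge costs are those used in the worked example above, and ties in line 9 are broken arbitrarily, just as in the second iteration of Table 5.1.

```python
# A runnable sketch of the LS algorithm above, applied to the network of
# Figure 5.3. The comments refer to the numbered lines of the pseudocode.
import math

# Edge costs c(x, y), as used in the Table 5.1 example
cost = {
    "u": {"v": 2, "w": 5, "x": 1},
    "v": {"u": 2, "w": 3, "x": 2},
    "w": {"u": 5, "v": 3, "x": 3, "y": 1, "z": 5},
    "x": {"u": 1, "v": 2, "w": 3, "y": 1},
    "y": {"w": 1, "x": 1, "z": 2},
    "z": {"w": 5, "y": 2},
}

def link_state(u):
    nodes = set(cost)
    N_prime = {u}                                        # line 2
    D = {v: cost[u].get(v, math.inf) for v in nodes}     # lines 3-6
    D[u] = 0
    p = {v: u for v in cost[u]}                          # predecessors p(v)
    while N_prime != nodes:                              # lines 8-15
        w = min(nodes - N_prime, key=lambda v: D[v])     # line 9, ties arbitrary
        N_prime.add(w)                                   # line 10
        for v, c_wv in cost[w].items():                  # lines 11-12
            if v not in N_prime and D[w] + c_wv < D[v]:
                D[v], p[v] = D[w] + c_wv, w
    return D, p

D, p = link_state("u")
print(D)  # {'u': 0, 'v': 2, 'w': 3, 'x': 1, 'y': 2, 'z': 4}, matching Table 5.1
```

Walking the predecessor entries p(v) back toward the source recovers each least-cost path, which is exactly how the forwarding table discussed next is obtained.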
When the LS algorithm terminates, we have, for each node, its predecessor along the least-cost path from the source node. For each predecessor, we also have its predecessor, and so in this manner we can construct the entire path from the source to all destinations. The forwarding table in a node, say node u, can then be constructed from this information by storing, for each destination, the next-hop node on the least-cost path from u to the destination. Figure 5.4 shows the resulting least-cost paths and forwarding table in u for the network in Figure 5.3.

Figure 5.4 Least cost path and forwarding table for node u

What is the computational complexity of this algorithm? That is, given n nodes (not counting the source), how much computation must be done in the worst case to find the least-cost paths from the source to all destinations? In the first iteration, we need to search through all n nodes to determine the node, w, not in N′ that has the minimum cost. In the second iteration, we need to check n−1 nodes to determine the minimum cost; in the third iteration n−2 nodes, and so on. Overall, the total number of nodes we need to search through over all the iterations is n(n+1)/2, and thus we say that the preceding implementation of the LS algorithm has worst-case complexity of order n squared: $O(n^2)$. (A more sophisticated implementation of this algorithm, using a data structure known as a heap, can find the minimum in line 9 in logarithmic rather than linear time, thus reducing the complexity.)

Before completing our discussion of the LS algorithm, let us consider a pathology that can arise. Figure 5.5 shows a simple network topology where link costs are equal to the load carried on the link, for example, reflecting the delay that would be experienced. In this example, link costs are not symmetric; that is, c(u, v) equals c(v, u) only if the load carried on both directions on the link (u, v) is the same. In this example, node z originates a unit of traffic destined for w, node x also originates a unit of traffic destined for w, and node y injects an amount of traffic equal to e, also destined for w. The initial routing is shown in Figure 5.5(a) with the link costs corresponding to the amount of traffic carried.

Figure 5.5 Oscillations with congestion-sensitive routing

When the LS algorithm is next run, node y determines (based on the link costs shown in Figure 5.5(a)) that the clockwise path to w has a cost of 1, while the counterclockwise path to w (which it had been using) has a cost of 1+e. Hence y's least-cost path to w is now clockwise. Similarly, x determines that its new least-cost path to w is also clockwise, resulting in costs shown in Figure 5.5(b). When the LS algorithm is run next, nodes x, y, and z all detect a zero-cost path to w in the counterclockwise direction, and all route their traffic to the counterclockwise routes. The next time the LS algorithm is run, x, y, and z all then route their traffic to the clockwise routes.

What can be done to prevent such oscillations (which can occur in any algorithm, not just an LS algorithm, that uses a congestion or delay-based link metric)? One solution would be to mandate that link costs not depend on the amount of traffic carried---an unacceptable solution since one goal of routing is to avoid highly congested (for example, high-delay) links. Another solution is to ensure that not all routers run the LS algorithm at the same time.
This seems a more reasonable solution, since we would hope that even if routers ran the LS algorithm with the same periodicity, the execution instance of the algorithm would not be the same at each node. Interestingly, researchers have found that routers in the Internet can self-synchronize among themselves \[Floyd Synchronization 1994\]. That is, even though they initially execute the algorithm with the same period but at different instants of time, the algorithm execution instance can eventually become, and remain, synchronized at the routers. One way to avoid such self-synchronization is for each router to randomize the time it sends out a link advertisement.

Having studied the LS algorithm, let's consider the other major routing algorithm that is used in practice today---the distance-vector routing algorithm.

5.2.2 The Distance-Vector (DV) Routing Algorithm

Whereas the LS algorithm is an algorithm using global information, the distance-vector (DV) algorithm is iterative, asynchronous, and distributed. It is distributed in that each node receives some information from one or more of its directly attached neighbors, performs a calculation, and then distributes the results of its calculation back to its neighbors. It is iterative in that this process continues on until no more information is exchanged between neighbors. (Interestingly, the algorithm is also self-terminating---there is no signal that the computation should stop; it just stops.) The algorithm is asynchronous in that it does not require all of the nodes to operate in lockstep with each other. We'll see that an asynchronous, iterative, self-terminating, distributed algorithm is much more interesting and fun than a centralized algorithm!

Before we present the DV algorithm, it will prove beneficial to discuss an important relationship that exists among the costs of the least-cost paths. Let dx(y) be the cost of the least-cost path from node x to node y. Then the least costs are related by the celebrated Bellman-Ford equation, namely,

$$d_x(y) = \min_v\{c(x, v) + d_v(y)\} \qquad (5.1)$$

where the $\min_v$ in the equation is taken over all of x's neighbors. The Bellman-Ford equation is rather intuitive. Indeed, after traveling from x to v, if we then take the least-cost path from v to y, the path cost will be c(x, v)+dv(y). Since we must begin by traveling to some neighbor v, the least cost from x to y is the minimum of c(x, v)+dv(y) taken over all neighbors v.

But for those who might be skeptical about the validity of the equation, let's check it for source node u and destination node z in Figure 5.3. The source node u has three neighbors: nodes v, x, and w. By walking along various paths in the graph, it is easy to see that dv(z)=5, dx(z)=3, and dw(z)=3. Plugging these values into Equation 5.1, along with the costs c(u, v)=2, c(u, x)=1, and c(u, w)=5, gives du(z)=min{2+5, 5+3, 1+3}=4, which is obviously true and which is exactly what the Dijkstra algorithm gave us for the same network. This quick verification should help relieve any skepticism you may have.

The Bellman-Ford equation is not just an intellectual curiosity. It actually has significant practical importance: the solution to the Bellman-Ford equation provides the entries in node x's forwarding table. To see this, let v\* be any neighboring node that achieves the minimum in Equation 5.1. Then, if node x wants to send a packet to node y along a least-cost path, it should first forward the packet to node v\*.
Thus, node x's forwarding table would specify node v\* as the next-hop router for the ultimate destination y. Another important practical contribution of the Bellman-Ford equation is that it suggests the form of the neighbor-to-neighbor communication that will take place in the DV algorithm.

The basic idea is as follows. Each node x begins with Dx(y), an estimate of the cost of the least-cost path from itself to node y, for all nodes, y, in N. Let Dx=\[Dx(y): y in N\] be node x's distance vector, which is the vector of cost estimates from x to all other nodes, y, in N. With the DV algorithm, each node x maintains the following routing information:

- For each neighbor v, the cost c(x, v) from x to directly attached neighbor, v
- Node x's distance vector, that is, Dx=\[Dx(y): y in N\], containing x's estimate of its cost to all destinations, y, in N
- The distance vectors of each of its neighbors, that is, Dv=\[Dv(y): y in N\] for each neighbor v of x

In the distributed, asynchronous algorithm, from time to time, each node sends a copy of its distance vector to each of its neighbors. When a node x receives a new distance vector from any of its neighbors w, it saves w's distance vector, and then uses the Bellman-Ford equation to update its own distance vector as follows:

$$D_x(y) = \min_v\{c(x, v) + D_v(y)\} \qquad \text{for each node } y \text{ in } N$$

If node x's distance vector has changed as a result of this update step, node x will then send its updated distance vector to each of its neighbors, which can in turn update their own distance vectors. Miraculously enough, as long as all the nodes continue to exchange their distance vectors in an asynchronous fashion, each cost estimate Dx(y) converges to dx(y), the actual cost of the least-cost path from node x to node y \[Bertsekas 1991\]!

Distance-Vector (DV) Algorithm

At each node, x:

```
1   Initialization:
2     for all destinations y in N:
3       Dx(y) = c(x, y)  /* if y is not a neighbor then c(x, y) = ∞ */
4     for each neighbor w
5       Dw(y) = ? for all destinations y in N
6     for each neighbor w
7       send distance vector Dx = [Dx(y): y in N] to w
8
9   loop
10    wait (until I see a link cost change to some neighbor w or
11          until I receive a distance vector from some neighbor w)
12
13    for each y in N:
14      Dx(y) = min_v{c(x, v) + Dv(y)}
15
16    if Dx(y) changed for any destination y
17      send distance vector Dx = [Dx(y): y in N] to all neighbors
18
19  forever
```

In the DV algorithm, a node x updates its distance-vector estimate when it either sees a cost change in one of its directly attached links or receives a distance-vector update from some neighbor. But to update its own forwarding table for a given destination y, what node x really needs to know is not the shortest-path distance to y but instead the neighboring node v\*(y) that is the next-hop router along the shortest path to y. As you might expect, the next-hop router v\*(y) is the neighbor v that achieves the minimum in Line 14 of the DV algorithm. (If there are multiple neighbors v that achieve the minimum, then v\*(y) can be any of the minimizing neighbors.) Thus, in Lines 13--14, for each destination y, node x also determines v\*(y) and updates its forwarding table for destination y.

Recall that the LS algorithm is a centralized algorithm in the sense that it requires each node to first obtain a complete map of the network before running the Dijkstra algorithm. The DV algorithm is decentralized and does not use such global information.
Indeed, the only information a node will have is the costs of the links to its directly attached neighbors and information it receives from these neighbors. Each node waits for an update from any neighbor (Lines 10--11), calculates its new distance vector when receiving an update (Line 14), and distributes its new distance vector to its neighbors (Lines 16--17). DV-like algorithms are used in many routing protocols in practice, including the Internet's RIP and BGP, ISO IDRP, Novell IPX, and the original ARPAnet.

Figure 5.6 illustrates the operation of the DV algorithm for the simple three-node network shown at the top of the figure. The operation of the algorithm is illustrated in a synchronous manner, where all nodes simultaneously receive distance vectors from their neighbors, compute their new distance vectors, and inform their neighbors if their distance vectors have changed. After studying this example, you should convince yourself that the algorithm operates correctly in an asynchronous manner as well, with node computations and update generation/reception occurring at any time.

Figure 5.6 Distance-vector (DV) algorithm in operation

The leftmost column of the figure displays three initial routing tables for each of the three nodes. For example, the table in the upper-left corner is node x's initial routing table. Within a specific routing table, each row is a distance vector---specifically, each node's routing table includes its own distance vector and that of each of its neighbors. Thus, the first row in node x's initial routing table is Dx=\[Dx(x), Dx(y), Dx(z)\]=\[0, 2, 7\]. The second and third rows in this table are the most recently received distance vectors from nodes y and z, respectively. Because at initialization node x has not received anything from node y or z, the entries in the second and third rows are initialized to infinity.

After initialization, each node sends its distance vector to each of its two neighbors. This is illustrated in Figure 5.6 by the arrows from the first column of tables to the second column of tables. For example, node x sends its distance vector Dx = \[0, 2, 7\] to both nodes y and z. After receiving the updates, each node recomputes its own distance vector. For example, node x computes

$$\begin{aligned}
D_x(x) &= 0\\
D_x(y) &= \min\{c(x,y) + D_y(y),\; c(x,z) + D_z(y)\} = \min\{2+0,\, 7+1\} = 2\\
D_x(z) &= \min\{c(x,y) + D_y(z),\; c(x,z) + D_z(z)\} = \min\{2+1,\, 7+0\} = 3
\end{aligned}$$

The second column therefore displays, for each node, the node's new distance vector along with distance vectors just received from its neighbors. Note, for example, that node x's estimate for the least cost to node z, Dx(z), has changed from 7 to 3. Also note that for node x, neighboring node y achieves the minimum in line 14 of the DV algorithm; thus at this stage of the algorithm, we have at node x that v\*(y)=y and v\*(z)=y.

After the nodes recompute their distance vectors, they again send their updated distance vectors to their neighbors (if there has been a change). This is illustrated in Figure 5.6 by the arrows from the second column of tables to the third column of tables. Note that only nodes x and z send updates: node y's distance vector didn't change so node y doesn't send an update. After receiving the updates, the nodes then recompute their distance vectors and update their routing tables, which are shown in the third column.
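The synchronous rounds of Figure 5.6 are easy to reproduce in a few lines of code. The sketch below is our own illustration (not RIP or BGP code) of the update in line 14, run on the three-node network of Figure 5.6 with link costs c(x, y)=2, c(x, z)=7, and c(y, z)=1; it iterates until no distance vector changes, that is, until the algorithm is quiescent.

```python
# Synchronous DV iteration for the three-node network of Figure 5.6.
import math

# Link costs of the network at the top of Figure 5.6
c = {"x": {"y": 2, "z": 7},
     "y": {"x": 2, "z": 1},
     "z": {"x": 7, "y": 1}}
nodes = sorted(c)

# Initialization: Dx(y) = c(x, y); 0 to itself (∞ would apply to non-neighbors)
D = {x: {y: 0 if y == x else c[x].get(y, math.inf) for y in nodes}
     for x in nodes}

while True:
    # One synchronous round: every node applies line 14 of the DV algorithm,
    # Dx(y) = min over neighbors v of { c(x, v) + Dv(y) }
    new = {x: {y: 0 if y == x else min(c[x][v] + D[v][y] for v in c[x])
               for y in nodes}
           for x in nodes}
    if new == D:     # quiescent: nothing changed, so no updates are sent
        break
    D = new

print(D["x"])  # {'x': 0, 'y': 2, 'z': 3} -- Dx(z) has fallen from 7 to 3
```

Replacing the synchronous loop with per-node events (a link-cost change or an arriving vector) gives the asynchronous behavior described in the text; the converged vectors are the same.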
The process of receiving updated distance vectors from neighbors, recomputing routing table entries, and informing neighbors of changed costs of the least-cost path to a destination continues until no update messages are sent. At this point, since no update messages are sent, no further routing table calculations will occur and the algorithm will enter a quiescent state; that is, all nodes will be performing the wait in Lines 10--11 of the DV algorithm. The algorithm remains in the quiescent state until a link cost changes, as discussed next.

Distance-Vector Algorithm: Link-Cost Changes and Link Failure

When a node running the DV algorithm detects a change in the link cost from itself to a neighbor (Lines 10--11), it updates its distance vector (Lines 13--14) and, if there's a change in the cost of the least-cost path, informs its neighbors (Lines 16--17) of its new distance vector. Figure 5.7(a) illustrates a scenario where the link cost from y to x changes from 4 to 1. We focus here only on y's and z's distance table entries to destination x. The DV algorithm causes the following sequence of events to occur:

- At time t0, y detects the link-cost change (the cost has changed from 4 to 1), updates its distance vector, and informs its neighbors of this change since its distance vector has changed.
- At time t1, z receives the update from y and updates its table. It computes a new least cost to x (it has decreased from a cost of 5 to a cost of 2) and sends its new distance vector to its neighbors.
- At time t2, y receives z's update and updates its distance table. y's least costs do not change and hence y does not send any message to z. The algorithm comes to a quiescent state.

Thus, only two iterations are required for the DV algorithm to reach a quiescent state. The good news about the decreased cost between x and y has propagated quickly through the network.

Figure 5.7 Changes in link cost

Let's now consider what can happen when a link cost increases. Suppose that the link cost between x and y increases from 4 to 60, as shown in Figure 5.7(b).

1. Before the link cost changes, Dy(x)=4, Dy(z)=1, Dz(y)=1, and Dz(x)=5. At time t0, y detects the link-cost change (the cost has changed from 4 to 60). y computes its new minimum-cost path to x to have a cost of Dy(x)=min{c(y,x)+Dx(x), c(y,z)+Dz(x)}=min{60+0, 1+5}=6. Of course, with our global view of the network, we can see that this new cost via z is wrong. But the only information node y has is that its direct cost to x is 60 and that z has last told y that z could get to x with a cost of 5. So in order to get to x, y would now route through z, fully expecting that z will be able to get to x with a cost of 5. As of t1 we have a routing loop---in order to get to x, y routes through z, and z routes through y. A routing loop is like a black hole---a packet destined for x arriving at y or z as of t1 will bounce back and forth between these two nodes forever (or until the forwarding tables are changed).

2. Since node y has computed a new minimum cost to x, it informs z of its new distance vector at time t1.

3. Sometime after t1, z receives y's new distance vector, which indicates that y's minimum cost to x is 6. z knows it can get to y with a cost of 1 and hence computes a new least cost to x of Dz(x)=min{50+0, 1+6}=7. Since z's least cost to x has increased, it then informs y of its new distance vector at t2.
4. In a similar manner, after receiving z's new distance vector, y determines Dy(x)=8 and sends z its distance vector. z then determines Dz(x)=9 and sends y its distance vector, and so on.

How long will the process continue? You should convince yourself that the loop will persist for 44 iterations (message exchanges between y and z)---until z eventually computes the cost of its path via y to be greater than 50. At this point, z will (finally!) determine that its least-cost path to x is via its direct connection to x. y will then route to x via z. The result of the bad news about the increase in link cost has indeed traveled slowly! What would have happened if the link cost c(y, x) had changed from 4 to 10,000 and the cost c(z, x) had been 9,999? Because of such scenarios, the problem we have seen is sometimes referred to as the count-to-infinity problem.

Distance-Vector Algorithm: Adding Poisoned Reverse

The specific looping scenario just described can be avoided using a technique known as poisoned reverse. The idea is simple---if z routes through y to get to destination x, then z will advertise to y that its distance to x is infinity, that is, z will advertise to y that Dz(x)=∞ (even though z knows Dz(x)=5 in truth). z will continue telling this little white lie to y as long as it routes to x via y. Since y believes that z has no path to x, y will never attempt to route to x via z, as long as z continues to route to x via y (and lies about doing so).

Let's now see how poisoned reverse solves the particular looping problem we encountered before in Figure 5.7(b). As a result of the poisoned reverse, y's distance table indicates Dz(x)=∞. When the cost of the (x, y) link changes from 4 to 60 at time t0, y updates its table and continues to route directly to x, albeit at a higher cost of 60, and informs z of its new cost to x, that is, Dy(x)=60. After receiving the update at t1, z immediately shifts its route to x to be via the direct (z, x) link at a cost of 50. Since this is a new least-cost path to x, and since the path no longer passes through y, z now informs y that Dz(x)=50 at t2. After receiving the update from z, y updates its distance table with Dy(x)=51. Also, since z is now on y's least-cost path to x, y poisons the reverse path from z to x by informing z at time t3 that Dy(x)=∞ (even though y knows that Dy(x)=51 in truth).

Does poisoned reverse solve the general count-to-infinity problem? It does not. You should convince yourself that loops involving three or more nodes (rather than simply two immediately neighboring nodes) will not be detected by the poisoned reverse technique.

A Comparison of LS and DV Routing Algorithms

The DV and LS algorithms take complementary approaches toward computing routing. In the DV algorithm, each node talks to only its directly connected neighbors, but it provides its neighbors with least-cost estimates from itself to all the nodes (that it knows about) in the network. The LS algorithm requires global information. Consequently, when implemented in each and every router, e.g., as in Figures 4.2 and 5.1, each node would need to communicate with all other nodes (via broadcast), but it tells them only the costs of its directly connected links. Let's conclude our study of LS and DV algorithms with a quick comparison of some of their attributes. Recall that N is the set of nodes (routers) and E is the set of edges (links).

Message complexity.
We have seen that LS requires each node to know the cost of each link in the network. This requires $O(|N|\,|E|)$ messages to be sent. Also, whenever a link cost changes, the new link cost must be sent to all nodes. The DV algorithm requires message exchanges between directly connected neighbors at each iteration. We have seen that the time needed for the algorithm to converge can depend on many factors. When link costs change, the DV algorithm will propagate the results of the changed link cost only if the new link cost results in a changed least-cost path for one of the nodes attached to that link.

Speed of convergence. We have seen that our implementation of LS is an $O(|N|^2)$ algorithm requiring $O(|N|\,|E|)$ messages. The DV algorithm can converge slowly and can have routing loops while the algorithm is converging. DV also suffers from the count-to-infinity problem.

Robustness. What can happen if a router fails, misbehaves, or is sabotaged? Under LS, a router could broadcast an incorrect cost for one of its attached links (but no others). A node could also corrupt or drop any packets it received as part of an LS broadcast. But an LS node is computing only its own forwarding tables; other nodes are performing similar calculations for themselves. This means route calculations are somewhat separated under LS, providing a degree of robustness. Under DV, a node can advertise incorrect least-cost paths to any or all destinations. (Indeed, in 1997, a malfunctioning router in a small ISP provided national backbone routers with erroneous routing information. This caused other routers to flood the malfunctioning router with traffic and caused large portions of the Internet to become disconnected for up to several hours \[Neumann 1997\].) More generally, we note that, at each iteration, a node's calculation in DV is passed on to its neighbor and then indirectly to its neighbor's neighbor on the next iteration. In this sense, an incorrect node calculation can be diffused through the entire network under DV.

In the end, neither algorithm is an obvious winner over the other; indeed, both algorithms are used in the Internet.

5.3 Intra-AS Routing in the Internet: OSPF

In our study of routing algorithms so far, we've viewed the network simply as a collection of interconnected routers. One router was indistinguishable from another in the sense that all routers executed the same routing algorithm to compute routing paths through the entire network. In practice, this model and its view of a homogeneous set of routers all executing the same routing algorithm is simplistic for two important reasons:

Scale. As the number of routers becomes large, the overhead involved in communicating, computing, and storing routing information becomes prohibitive. Today's Internet consists of hundreds of millions of routers. Storing routing information for possible destinations at each of these routers would clearly require enormous amounts of memory. The overhead required to broadcast connectivity and link cost updates among all of the routers would be huge! A distance-vector algorithm that iterated among such a large number of routers would surely never converge. Clearly, something must be done to reduce the complexity of route computation in a network as large as the Internet.

Administrative autonomy. As described in Section 1.3, the Internet is a network of ISPs, with each ISP consisting of its own network of routers.
An ISP generally desires to operate its network as it pleases (for example, to run whatever routing algorithm it chooses within its network) or to hide aspects of its network's internal organization from the outside. Ideally, an organization should be able to operate and administer its network as it wishes, while still being able to connect its network to other outside networks.

Both of these problems can be solved by organizing routers into autonomous systems (ASs), with each AS consisting of a group of routers that are under the same administrative control. Often the routers in an ISP, and the links that interconnect them, constitute a single AS. Some ISPs, however, partition their network into multiple ASs. In particular, some tier-1 ISPs use one gigantic AS for their entire network, whereas others break up their ISP into tens of interconnected ASs. An autonomous system is identified by its globally unique autonomous system number (ASN) \[RFC 1930\]. AS numbers, like IP addresses, are assigned by ICANN regional registries \[ICANN 2016\]. Routers within the same AS all run the same routing algorithm and have information about each other. The routing algorithm running within an autonomous system is called an intra-autonomous system routing protocol.

Open Shortest Path First (OSPF)

OSPF routing and its closely related cousin, IS-IS, are widely used for intra-AS routing in the Internet. The Open in OSPF indicates that the routing protocol specification is publicly available (for example, as opposed to Cisco's EIGRP protocol, which only recently became open \[Savage 2015\], after roughly 20 years as a Cisco-proprietary protocol). The most recent version of OSPF, version 2, is defined in \[RFC 2328\], a public document. OSPF is a link-state protocol that uses flooding of link-state information and Dijkstra's least-cost path algorithm. With OSPF, each router constructs a complete topological map (that is, a graph) of the entire autonomous system. Each router then locally runs Dijkstra's shortest-path algorithm to determine a shortest-path tree to all subnets, with itself as the root node. Individual link costs are configured by the network administrator (see sidebar, Principles and Practice: Setting OSPF Weights). The administrator might choose to set all link costs to 1,

PRINCIPLES IN PRACTICE

SETTING OSPF LINK WEIGHTS

Our discussion of link-state routing has implicitly assumed that link weights are set, a routing algorithm such as OSPF is run, and traffic flows according to the routing tables computed by the LS algorithm. In terms of cause and effect, the link weights are given (i.e., they come first) and result (via Dijkstra's algorithm) in routing paths that minimize overall cost. In this viewpoint, link weights reflect the cost of using a link (e.g., if link weights are inversely proportional to capacity, then the use of high-capacity links would have smaller weight and thus be more attractive from a routing standpoint) and Dijkstra's algorithm serves to minimize overall cost.

In practice, the cause and effect relationship between link weights and routing paths may be reversed, with network operators configuring link weights in order to obtain routing paths that achieve certain traffic engineering goals \[Fortz 2000, Fortz 2002\]. For example, suppose a network operator has an estimate of traffic flow entering the network at each ingress point and destined for each egress point.
PRINCIPLES IN PRACTICE

SETTING OSPF LINK WEIGHTS

Our discussion of link-state routing has implicitly assumed that link weights are set, a routing algorithm such as OSPF is run, and traffic flows according to the routing tables computed by the LS algorithm. In terms of cause and effect, the link weights are given (i.e., they come first) and result (via Dijkstra's algorithm) in routing paths that minimize overall cost. In this viewpoint, link weights reflect the cost of using a link (e.g., if link weights are inversely proportional to capacity, then the use of high-capacity links would have smaller weight and thus be more attractive from a routing standpoint) and Dijkstra's algorithm serves to minimize overall cost. In practice, the cause and effect relationship between link weights and routing paths may be reversed, with network operators configuring link weights in order to obtain routing paths that achieve certain traffic engineering goals \[Fortz 2000, Fortz 2002\]. For example, suppose a network operator has an estimate of traffic flow entering the network at each ingress point and destined for each egress point. The operator may then want to put in place a specific routing of ingress-to-egress flows that minimizes the maximum utilization over all of the network's links. But with a routing algorithm such as OSPF, the operator's main "knobs" for tuning the routing of flows through the network are the link weights. Thus, in order to achieve the goal of minimizing the maximum link utilization, the operator must find the set of link weights that achieves this goal. This is a reversal of the cause and effect relationship---the desired routing of flows is known, and the OSPF link weights must be found such that the OSPF routing algorithm results in this desired routing of flows.

With OSPF, a router broadcasts routing information to all other routers in the autonomous system, not just to its neighboring routers. A router broadcasts link-state information whenever there is a change in a link's state (for example, a change in cost or a change in up/down status). It also broadcasts a link's state periodically (at least once every 30 minutes), even if the link's state has not changed. RFC 2328 notes that "this periodic updating of link state advertisements adds robustness to the link state algorithm." OSPF advertisements are contained in OSPF messages that are carried directly by IP, with an upper-layer protocol number of 89 for OSPF. Thus, the OSPF protocol must itself implement functionality such as reliable message transfer and link-state broadcast. The OSPF protocol also checks that links are operational (via a HELLO message that is sent to an attached neighbor) and allows an OSPF router to obtain a neighboring router's database of network-wide link state. Some of the advances embodied in OSPF include the following:

Security. Exchanges between OSPF routers (for example, link-state updates) can be authenticated. With authentication, only trusted routers can participate in the OSPF protocol within an AS, thus preventing malicious intruders (or networking students taking their newfound knowledge out for a joyride) from injecting incorrect information into router tables. By default, OSPF packets between routers are not authenticated and could be forged. Two types of authentication can be configured---simple and MD5 (see Chapter 8 for a discussion on MD5 and authentication in general). With simple authentication, the same password is configured on each router. When a router sends an OSPF packet, it includes the password in plaintext. Clearly, simple authentication is not very secure. MD5 authentication is based on shared secret keys that are configured in all the routers. For each OSPF packet that it sends, the router computes the MD5 hash of the content of the OSPF packet appended with the secret key. (See the discussion of message authentication codes in Chapter 8.) Then the router includes the resulting hash value in the OSPF packet. The receiving router, using the preconfigured secret key, will compute an MD5 hash of the packet and compare it with the hash value that the packet carries, thus verifying the packet's authenticity.
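A minimal sketch of this keyed-hash computation, assuming the OSPF packet has already been serialized to bytes; the real OSPF cryptographic authentication scheme also carries a key ID and a cryptographic sequence number, which are omitted here.

```python
import hashlib

def ospf_md5_digest(ospf_packet: bytes, secret_key: bytes) -> bytes:
    # MD5 over the packet contents with the shared secret appended,
    # as described above (simplified; key ID and sequence fields omitted).
    return hashlib.md5(ospf_packet + secret_key).digest()

def authentic(ospf_packet: bytes, carried_digest: bytes, secret_key: bytes) -> bool:
    # The receiver recomputes the hash with its preconfigured key and
    # compares it against the digest carried in the packet.
    return ospf_md5_digest(ospf_packet, secret_key) == carried_digest
```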
Sequence numbers are also used with MD5 authentication to protect against replay attacks.

Multiple same-cost paths. When multiple paths to a destination have the same cost, OSPF allows multiple paths to be used (that is, a single path need not be chosen for carrying all traffic when multiple equal-cost paths exist).

Integrated support for unicast and multicast routing. Multicast OSPF (MOSPF) \[RFC 1584\] provides simple extensions to OSPF to provide for multicast routing. MOSPF uses the existing OSPF link database and adds a new type of link-state advertisement to the existing OSPF link-state broadcast mechanism.

Support for hierarchy within a single AS. An OSPF autonomous system can be configured hierarchically into areas. Each area runs its own OSPF link-state routing algorithm, with each router in an area broadcasting its link state to all other routers in that area. Within each area, one or more area border routers are responsible for routing packets outside the area. Lastly, exactly one OSPF area in the AS is configured to be the backbone area. The primary role of the backbone area is to route traffic between the other areas in the AS. The backbone always contains all area border routers in the AS and may contain non-border routers as well. Inter-area routing within the AS requires that the packet be first routed to an area border router (intra-area routing), then routed through the backbone to the area border router that is in the destination area, and then routed to the final destination.

OSPF is a relatively complex protocol, and our coverage here has been necessarily brief; \[Huitema 1998; Moy 1998; RFC 2328\] provide additional details.

5.4 Routing Among the ISPs: BGP

We just learned that OSPF is an example of an intra-AS routing protocol. When routing a packet between a source and destination within the same AS, the route the packet follows is entirely determined by the intra-AS routing protocol. However, to route a packet across multiple ASs, say from a smartphone in Timbuktu to a server in a datacenter in Silicon Valley, we need an inter-autonomous system routing protocol. Since an inter-AS routing protocol involves coordination among multiple ASs, communicating ASs must run the same inter-AS routing protocol. In fact, in the Internet, all ASs run the same inter-AS routing protocol, called the Border Gateway Protocol, more commonly known as BGP \[RFC 4271; Stewart 1999\]. BGP is arguably the most important of all the Internet protocols (the only other contender would be the IP protocol that we studied in Section 4.3), as it is the protocol that glues the thousands of ISPs in the Internet together. As we will soon see, BGP is a decentralized and asynchronous protocol in the vein of distance-vector routing described in Section 5.2.2. Although BGP is a complex and challenging protocol, to understand the Internet on a deep level, we need to become familiar with its underpinnings and operation. The time we devote to learning BGP will be well worth the effort.

5.4.1 The Role of BGP

To understand the responsibilities of BGP, consider an AS and an arbitrary router in that AS. Recall that every router has a forwarding table, which plays the central role in the process of forwarding arriving packets to outbound router links. As we have learned, for destinations that are within the same AS, the entries in the router's forwarding table are determined by the AS's intra-AS routing protocol.
But what about destinations that are outside of the AS? This is precisely where BGP comes to the rescue. In BGP, packets are not routed to a specific destination address, but instead to CIDRized prefixes, with each prefix representing a subnet or a collection of subnets. In the world of BGP, a destination may take the form 138.16.68/22, which for this example includes 1,024 IP addresses. Thus, a router's forwarding table will have entries of the form (x, I), where x is a prefix (such as 138.16.68/22) and I is an interface number for one of the router's interfaces. As an inter-AS routing protocol, BGP provides each router a means to:

1. Obtain prefix reachability information from neighboring ASs. In particular, BGP allows each subnet to advertise its existence to the rest of the Internet. A subnet screams, "I exist and I am here," and BGP makes sure that all the routers in the Internet know about this subnet. If it weren't for BGP, each subnet would be an isolated island---alone, unknown and unreachable by the rest of the Internet.

2. Determine the "best" routes to the prefixes. A router may learn about two or more different routes to a specific prefix. To determine the best route, the router will locally run a BGP route-selection procedure (using the prefix reachability information it obtained via neighboring routers). The best route will be determined based on policy as well as the reachability information.

Let us now delve into how BGP carries out these two tasks.

5.4.2 Advertising BGP Route Information

Consider the network shown in Figure 5.8. As we can see, this simple network has three autonomous systems: AS1, AS2, and AS3. As shown, AS3 includes a subnet with prefix x. For each AS, each router is either a gateway router or an internal router. A gateway router is a router on the edge of an AS that directly connects to one or more routers in other ASs. An internal router connects only to hosts and routers within its own AS. In AS1, for example, router 1c is a gateway router; routers 1a, 1b, and 1d are internal routers.

Figure 5.8 Network with three autonomous systems. AS3 includes a subnet with prefix x

Let's consider the task of advertising reachability information for prefix x to all of the routers shown in Figure 5.8. At a high level, this is straightforward. First, AS3 sends a BGP message to AS2, saying that x exists and is in AS3; let's denote this message as "AS3 x". Then AS2 sends a BGP message to AS1, saying that x exists and that you can get to x by first passing through AS2 and then going to AS3; let's denote that message as "AS2 AS3 x". In this manner, each of the autonomous systems will not only learn about the existence of x, but also learn about a path of autonomous systems that leads to x. Although the discussion in the above paragraph about advertising BGP reachability information should get the general idea across, it is not precise in the sense that autonomous systems do not actually send messages to each other, but instead routers do. To understand this, let's now re-examine the example in Figure 5.8. In BGP, pairs of routers exchange routing information over semi-permanent TCP connections using port 179. Each such TCP connection, along with all the BGP messages sent over the connection, is called a BGP connection.
Furthermore, a BGP connection that spans two ASs is called an external BGP (eBGP) connection, and a BGP connection between routers in the same AS is called an internal BGP (iBGP) connection. Examples of BGP connections for the network in Figure 5.8 are shown in Figure 5.9. There is typically one eBGP connection for each link that directly connects gateway routers in different ASs; thus, in Figure 5.9, there is an eBGP connection between gateway routers 1c and 2a and an eBGP connection between gateway routers 2c and 3a. There are also iBGP connections between routers within each of the ASs. In particular, Figure 5.9 displays a common configuration of one BGP connection for each pair of routers internal to an AS, creating a mesh of TCP connections within each AS. In Figure 5.9, the eBGP connections are shown with the long dashes; the iBGP connections are shown with the short dashes. Note that iBGP connections do not always correspond to physical links.

Figure 5.9 eBGP and iBGP connections

In order to propagate the reachability information, both iBGP and eBGP sessions are used. Consider again advertising the reachability information for prefix x to all routers in AS1 and AS2. In this process, gateway router 3a first sends an eBGP message "AS3 x" to gateway router 2c. Gateway router 2c then sends the iBGP message "AS3 x" to all of the other routers in AS2, including to gateway router 2a. Gateway router 2a then sends the eBGP message "AS2 AS3 x" to gateway router 1c. Finally, gateway router 1c uses iBGP to send the message "AS2 AS3 x" to all the routers in AS1. After this process is complete, each router in AS1 and AS2 is aware of the existence of x and is also aware of an AS path that leads to x. Of course, in a real network, from a given router there may be many different paths to a given destination, each through a different sequence of ASs. For example, consider the network in Figure 5.10, which is the original network in Figure 5.8, with an additional physical link from router 1d to router 3d. In this case, there are two paths from AS1 to x: the path "AS2 AS3 x" via router 1c; and the new path "AS3 x" via the router 1d.

Figure 5.10 Network augmented with peering link between AS1 and AS3

5.4.3 Determining the Best Routes

As we have just learned, there may be many paths from a given router to a destination subnet. In fact, in the Internet, routers often receive reachability information about dozens of different possible paths. How does a router choose among these paths (and then configure its forwarding table accordingly)? Before addressing this critical question, we need to introduce a little more BGP terminology. When a router advertises a prefix across a BGP connection, it includes with the prefix several BGP attributes. In BGP jargon, a prefix along with its attributes is called a route. Two of the more important attributes are AS-PATH and NEXT-HOP. The AS-PATH attribute contains the list of ASs through which the advertisement has passed, as we've seen in our examples above. To generate the AS-PATH value, when a prefix is passed to an AS, the AS adds its ASN to the existing list in the AS-PATH. For example, in Figure 5.10, there are two routes from AS1 to subnet x: one which uses the AS-PATH "AS2 AS3"; and another that uses the AS-PATH "AS3".
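The AS-PATH mechanics can be sketched in a few lines; the dict-based route representation is our own simplification. The accept check implements the loop prevention described in the next paragraph.

```python
def advertise(route, my_asn):
    # Prepend our ASN before passing the route on, as in "AS2 AS3 x".
    return {"prefix": route["prefix"], "as_path": [my_asn] + route["as_path"]}

def accept(route, my_asn):
    # Reject any advertisement whose AS-PATH already contains our own AS.
    return my_asn not in route["as_path"]

adv = advertise({"prefix": "x", "as_path": ["AS3"]}, "AS2")
assert adv["as_path"] == ["AS2", "AS3"]
assert accept(adv, "AS1") and not accept(adv, "AS3")
```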
BGP routers also use the AS-PATH attribute to detect and prevent looping advertisements; specifically, if a router sees that its own AS is contained in the path list, it will reject the advertisement. Providing the critical link between the inter-AS and intra-AS routing protocols, the NEXT-HOP attribute has a subtle but important use. The NEXT-HOP is the IP address of the router interface that begins the AS-PATH. To gain insight into this attribute, let's again refer to Figure 5.10. As indicated in Figure 5.10, the NEXT-HOP attribute for the route "AS2 AS3 x" from AS1 to x that passes through AS2 is the IP address of the left interface on router 2a. The NEXT-HOP attribute for the route "AS3 x" from AS1 to x that bypasses AS2 is the IP address of the leftmost interface of router 3d. In summary, in this toy example, each router in AS1 becomes aware of two BGP routes to prefix x:

IP address of leftmost interface for router 2a; AS2 AS3; x

IP address of leftmost interface of router 3d; AS3; x

Here, each BGP route is written as a list with three components: NEXT-HOP; AS-PATH; destination prefix. In practice, a BGP route includes additional attributes, which we will ignore for the time being. Note that the NEXT-HOP attribute is an IP address of a router that does not belong to AS1; however, the subnet that contains this IP address directly attaches to AS1.

Hot Potato Routing

We are now finally in position to talk about BGP routing algorithms in a precise manner. We will begin with one of the simplest routing algorithms, namely, hot potato routing. Consider router 1b in the network in Figure 5.10. As just described, this router will learn about two possible BGP routes to prefix x. In hot potato routing, the route chosen (from among all possible routes) is that route with the least cost to the NEXT-HOP router beginning that route. In this example, router 1b will consult its intra-AS routing information to find the least-cost intra-AS path to NEXT-HOP router 2a and the least-cost intra-AS path to NEXT-HOP router 3d, and then select the route with the smaller of these least-cost paths. For example, suppose that cost is defined as the number of links traversed. Then the least cost from router 1b to router 2a is 2, the least cost from router 1b to router 3d is 3, and router 2a would therefore be selected. Router 1b would then consult its forwarding table (configured by its intra-AS algorithm) and find the interface I that is on the least-cost path to router 2a. It then adds (x, I) to its forwarding table. The steps for adding an outside-AS prefix in a router's forwarding table for hot potato routing are summarized in Figure 5.11. It is important to note that when adding an outside-AS prefix into a forwarding table, both the inter-AS routing protocol (BGP) and the intra-AS routing protocol (e.g., OSPF) are used. The idea behind hot-potato routing is for router 1b to get packets out of its AS as quickly as possible (more specifically, with the least cost possible) without worrying about the cost of the remaining portions of the path outside of its AS to the destination. In the name "hot potato routing," a packet is analogous to a hot potato that is burning in your hands. Because it is burning hot, you want to pass it off to another person (another AS) as quickly as possible.
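A minimal sketch of this selection, using the numbers from the example above; routes are simplified to dicts of (NEXT-HOP, AS-PATH, prefix), and the intra-AS least costs are taken as given rather than computed.

```python
def hot_potato_choice(bgp_routes, intra_as_cost):
    # Pick the route whose NEXT-HOP router is cheapest to reach
    # within our own AS, ignoring everything beyond the AS boundary.
    return min(bgp_routes, key=lambda r: intra_as_cost[r["next_hop"]])

routes = [
    {"next_hop": "2a", "as_path": ["AS2", "AS3"], "prefix": "x"},
    {"next_hop": "3d", "as_path": ["AS3"], "prefix": "x"},
]
cost_from_1b = {"2a": 2, "3d": 3}  # least-cost intra-AS paths from router 1b
print(hot_potato_choice(routes, cost_from_1b))  # chooses the route via 2a
```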
Hot potato routing is thus a selfish algorithm---it tries to reduce the cost in its own AS while ignoring the other components of the end-to-end costs outside its AS. Note that with hot potato routing, two routers in the same AS may choose two different AS paths to the same prefix. For example, we just saw that router 1b would send packets through AS2 to reach x. However, router 1d would bypass AS2 and send packets directly to AS3 to reach x.

Figure 5.11 Steps in adding outside-AS destination in a router's forwarding table

Route-Selection Algorithm

In practice, BGP uses an algorithm that is more complicated than hot potato routing, but nevertheless incorporates hot potato routing. For any given destination prefix, the input into BGP's route-selection algorithm is the set of all routes to that prefix that have been learned and accepted by the router. If there is only one such route, then BGP obviously selects that route. If there are two or more routes to the same prefix, then BGP sequentially invokes the following elimination rules until one route remains (a code sketch of these rules appears at the end of this subsection):

1. A route is assigned a local preference value as one of its attributes (in addition to the AS-PATH and NEXT-HOP attributes). The local preference of a route could have been set by the router or could have been learned from another router in the same AS. The value of the local preference attribute is a policy decision that is left entirely up to the AS's network administrator. (We will shortly discuss BGP policy issues in some detail.) The routes with the highest local preference values are selected.

2. From the remaining routes (all with the same highest local preference value), the route with the shortest AS-PATH is selected. If this rule were the only rule for route selection, then BGP would be using a DV algorithm for path determination, where the distance metric uses the number of AS hops rather than the number of router hops.

3. From the remaining routes (all with the same highest local preference value and the same AS-PATH length), hot potato routing is used, that is, the route with the closest NEXT-HOP router is selected.

4. If more than one route still remains, the router uses BGP identifiers to select the route; see \[Stewart 1999\].

As an example, let's again consider router 1b in Figure 5.10. Recall that there are exactly two BGP routes to prefix x, one that passes through AS2 and one that bypasses AS2. Also recall that if hot potato routing on its own were used, then BGP would route packets through AS2 to prefix x. But in the above route-selection algorithm, rule 2 is applied before rule 3, causing BGP to select the route that bypasses AS2, since that route has a shorter AS-PATH. So we see that with the above route-selection algorithm, BGP is no longer a selfish algorithm---it first looks for routes with short AS paths (thereby likely reducing end-to-end delay).

As noted above, BGP is the de facto standard for inter-AS routing for the Internet. To see the contents of various BGP routing tables (large!) extracted from routers in tier-1 ISPs, see http://www.routeviews.org. BGP routing tables often contain over half a million routes (that is, prefixes and corresponding attributes). Statistics about the size and characteristics of BGP routing tables are presented in \[Potaroo 2016\].
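Here is the promised sketch of the elimination rules, under simplifying assumptions: routes are plain dicts, rule 4's tie-break is reduced to a numeric router identifier, and the many other real BGP attributes are ignored.

```python
def bgp_select(routes, intra_as_cost):
    best_pref = max(r["local_pref"] for r in routes)
    routes = [r for r in routes if r["local_pref"] == best_pref]      # rule 1
    shortest = min(len(r["as_path"]) for r in routes)
    routes = [r for r in routes if len(r["as_path"]) == shortest]     # rule 2
    closest = min(intra_as_cost[r["next_hop"]] for r in routes)
    routes = [r for r in routes
              if intra_as_cost[r["next_hop"]] == closest]             # rule 3 (hot potato)
    return min(routes, key=lambda r: r["router_id"])                  # rule 4 (simplified)

# Router 1b in Figure 5.10: rule 2 eliminates the AS2 route before
# hot potato routing (rule 3) is ever consulted.
routes = [
    {"local_pref": 100, "as_path": ["AS2", "AS3"], "next_hop": "2a", "router_id": 1},
    {"local_pref": 100, "as_path": ["AS3"], "next_hop": "3d", "router_id": 2},
]
assert bgp_select(routes, {"2a": 2, "3d": 3})["as_path"] == ["AS3"]
```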
5.4.4 IP-Anycast

In addition to being the Internet's inter-AS routing protocol, BGP is often used to implement the IP-anycast service \[RFC 1546, RFC 7094\], which is commonly used in DNS. To motivate IP-anycast, consider that in many applications, we are interested in (1) replicating the same content on different servers in many different dispersed geographical locations, and (2) having each user access the content from the server that is closest. For example, a CDN may replicate videos and other objects on servers in different countries. Similarly, the DNS system can replicate DNS records on DNS servers throughout the world. When a user wants to access this replicated content, it is desirable to point the user to the "nearest" server with the replicated content. BGP's route-selection algorithm provides an easy and natural mechanism for doing so.

To make our discussion concrete, let's describe how a CDN might use IP-anycast. As shown in Figure 5.12, during the IP-anycast configuration stage, the CDN company assigns the same IP address to each of its servers, and uses standard BGP to advertise this IP address from each of the servers. When a BGP router receives multiple route advertisements for this IP address, it treats these advertisements as providing different paths to the same physical location (when, in fact, the advertisements are for different paths to different physical locations). When configuring its routing table, each router will locally use the BGP route-selection algorithm to pick the "best" (for example, closest, as determined by AS-hop counts) route to that IP address. For example, if one BGP route (corresponding to one location) is only one AS hop away from the router, and all other BGP routes (corresponding to other locations) are two or more AS hops away, then the BGP router would choose to route packets to the location that is one hop away. After this initial BGP address-advertisement phase, the CDN can do its main job of distributing content. When a client requests the video, the CDN returns to the client the common IP address used by the geographically dispersed servers, no matter where the client is located. When the client sends a request to that IP address, Internet routers then forward the request packet to the "closest" server, as defined by the BGP route-selection algorithm.

Figure 5.12 Using IP-anycast to bring users to the closest CDN server

Although the above CDN example nicely illustrates how IP-anycast can be used, in practice CDNs generally choose not to use IP-anycast because BGP routing changes can result in different packets of the same TCP connection arriving at different instances of the Web server. But IP-anycast is extensively used by the DNS system to direct DNS queries to the closest root DNS server. Recall from Section 2.4 that there are currently 13 IP addresses for root DNS servers. But corresponding to each of these addresses, there are multiple DNS root servers, with some of these addresses having over 100 DNS root servers scattered over all corners of the world. When a DNS query is sent to one of these 13 IP addresses, IP anycast is used to route the query to the nearest of the DNS root servers that is responsible for that address.

5.4.5 Routing Policy

When a router selects a route to a destination, the AS routing policy can trump all other considerations, such as shortest AS path or hot potato routing.
Indeed, in the route-selection algorithm, routes are first selected according to the local-preference attribute, whose value is fixed by the policy of the local AS. Let's illustrate some of the basic concepts of BGP routing policy with a simple example. Figure 5.13 shows six interconnected autonomous systems: A, B, C, W, X, and Y. It is important to note that A, B, C, W, X, and Y are ASs, not routers.

Figure 5.13 A simple BGP policy scenario

Let's assume that autonomous systems W, X, and Y are access ISPs and that A, B, and C are backbone provider networks. We'll also assume that A, B, and C directly send traffic to each other, and provide full BGP information to their customer networks. All traffic entering an ISP access network must be destined for that network, and all traffic leaving an ISP access network must have originated in that network. W and Y are clearly access ISPs. X is a multi-homed access ISP, since it is connected to the rest of the network via two different providers (a scenario that is becoming increasingly common in practice). However, like W and Y, X itself must be the source/destination of all traffic leaving/entering X. But how will this stub network behavior be implemented and enforced? How will X be prevented from forwarding traffic between B and C? This can easily be accomplished by controlling the manner in which BGP routes are advertised. In particular, X will function as an access ISP network if it advertises (to its neighbors B and C) that it has no paths to any other destinations except itself. That is, even though X may know of a path, say XCY, that reaches network Y, it will not advertise this path to B. Since B is unaware that X has a path to Y, B would never forward traffic destined to Y (or C) via X. This simple example illustrates how a selective route advertisement policy can be used to implement customer/provider routing relationships.
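A sketch of such a selective-advertisement (export) policy from access ISP X's point of view; the relationship labels and all helper names are our own illustration, not part of BGP itself.

```python
from dataclasses import dataclass

@dataclass
class Route:
    prefix: str
    as_path: list
    learned_from: str  # neighbor AS the route was learned from, or "self"

# X's view in Figure 5.13: both B and C are providers; X has no customers.
RELATIONSHIPS = {"B": "provider", "C": "provider"}

def should_export(route: Route, to_neighbor: str) -> bool:
    # Export our own prefixes (and any customer routes) to everyone;
    # re-advertise provider- or peer-learned routes to customers only.
    origin = "self" if route.learned_from == "self" else RELATIONSHIPS[route.learned_from]
    if origin in ("self", "customer"):
        return True
    return RELATIONSHIPS.get(to_neighbor) == "customer"

# X knows the path XCY to Y (learned from provider C) but will not
# advertise it to B, so B never sends traffic destined to Y via X.
xcy = Route(prefix="y", as_path=["C", "Y"], learned_from="C")
assert should_export(xcy, "B") is False
```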
Let's next focus on a provider network, say AS B. Suppose that B has learned (from A) that A has a path AW to W. B can thus install the route AW into its routing information base. Clearly, B also wants to advertise the path BAW to its customer, X, so that X knows that it can route to W via B. But should B advertise the path BAW to C? If it does so, then C could route traffic to W via BAW. If A, B, and C are all backbone providers, then B might rightly feel that it should not have to shoulder the burden (and cost!) of carrying transit traffic between A and C. B might rightly feel that it is A's and C's job (and cost!) to make sure that C can route to/from A's customers via a direct connection between A and C. There are currently no official standards that govern how backbone ISPs route among themselves. However, a rule of thumb followed by commercial ISPs is that any traffic flowing across an ISP's backbone network must have either a source or a destination (or both) in a network that is a customer of that ISP; otherwise the traffic would be getting a free ride on the ISP's network. Individual peering agreements (that would govern questions such as those raised above) are typically negotiated between pairs of ISPs and are often confidential; \[Huston 1999a\] provides an interesting discussion of peering agreements. For a detailed description of how routing policy reflects commercial relationships among ISPs, see \[Gao 2001; Dmitiropoulos 2007\]. For a discussion of BGP routing policies from an ISP standpoint, see \[Caesar 2005b\].

This completes our brief introduction to BGP. Understanding BGP is important because it plays a central role in the Internet. We encourage you to see the references \[Griffin 2012; Stewart 1999; Labovitz 1997; Halabi 2000; Huitema 1998; Gao 2001; Feamster 2004; Caesar 2005b; Li 2007\] to learn more about BGP.

PRINCIPLES IN PRACTICE

WHY ARE THERE DIFFERENT INTER-AS AND INTRA-AS ROUTING PROTOCOLS?

Having now studied the details of specific inter-AS and intra-AS routing protocols deployed in today's Internet, let's conclude by considering perhaps the most fundamental question we could ask about these protocols in the first place (hopefully, you have been wondering this all along, and have not lost the forest for the trees!): Why are different inter-AS and intra-AS routing protocols used? The answer to this question gets at the heart of the differences between the goals of routing within an AS and among ASs:

Policy. Among ASs, policy issues dominate. It may well be important that traffic originating in a given AS not be able to pass through another specific AS. Similarly, a given AS may well want to control what transit traffic it carries between other ASs. We have seen that BGP carries path attributes and provides for controlled distribution of routing information so that such policy-based routing decisions can be made. Within an AS, everything is nominally under the same administrative control, and thus policy issues play a much less important role in choosing routes within the AS.

Scale. The ability of a routing algorithm and its data structures to scale to handle routing to/among large numbers of networks is a critical issue in inter-AS routing. Within an AS, scalability is less of a concern. For one thing, if a single ISP becomes too large, it is always possible to divide it into two ASs and perform inter-AS routing between the two new ASs. (Recall that OSPF allows such a hierarchy to be built by splitting an AS into areas.)

Performance. Because inter-AS routing is so policy oriented, the quality (for example, performance) of the routes used is often of secondary concern (that is, a longer or more costly route that satisfies certain policy criteria may well be taken over a route that is shorter but does not meet those criteria). Indeed, we saw that among ASs, there is not even the notion of cost (other than AS hop count) associated with routes. Within a single AS, however, such policy concerns are of less importance, allowing routing to focus more on the level of performance realized on a route.

5.4.6 Putting the Pieces Together: Obtaining Internet Presence

Although this subsection is not about BGP per se, it brings together many of the protocols and concepts we've seen thus far, including IP addressing, DNS, and BGP. Suppose you have just created a small company that has a number of servers, including a public Web server that describes your company's products and services, a mail server from which your employees obtain their e-mail messages, and a DNS server. Naturally, you would like the entire world to be able to visit your Web site in order to learn about your exciting products and services.
Moreover, you would like your employees to be able to send and receive e-mail to potential customers throughout the world. To meet these goals, you first need to obtain Internet connectivity, which is done by contracting with, and connecting to, a local ISP. Your company will have a gateway router, which will be connected to a router in your local ISP. This connection might be a DSL connection through the existing telephone infrastructure, a leased line to the ISP's router, or one of the many other access solutions described in Chapter 1. Your local ISP will also provide you with an IP address range, e.g., a /24 address range consisting of 256 addresses. Once you have your physical connectivity and your IP address range, you will assign one of the IP addresses (in your address range) to your Web server, one to your mail server, one to your DNS server, one to your gateway router, and other IP addresses to other servers and networking devices in your company's network.

In addition to contracting with an ISP, you will also need to contract with an Internet registrar to obtain a domain name for your company, as described in Chapter 2. For example, if your company's name is, say, Xanadu Inc., you will naturally try to obtain the domain name xanadu.com. Your company must also obtain presence in the DNS system. Specifically, because outsiders will want to contact your DNS server to obtain the IP addresses of your servers, you will also need to provide your registrar with the IP address of your DNS server. Your registrar will then put an entry for your DNS server (domain name and corresponding IP address) in the .com top-level-domain servers, as described in Chapter 2. After this step is completed, any user who knows your domain name (e.g., xanadu.com) will be able to obtain the IP address of your DNS server via the DNS system. So that people can discover the IP addresses of your Web server, in your DNS server you will need to include entries that map the host name of your Web server (e.g., www.xanadu.com) to its IP address. You will want to have similar entries for other publicly available servers in your company, including your mail server. In this manner, if Alice wants to browse your Web server, the DNS system will contact your DNS server, find the IP address of your Web server, and give it to Alice. Alice can then establish a TCP connection directly with your Web server.

However, there still remains one other necessary and crucial step to allow outsiders from around the world to access your Web server. Consider what happens when Alice, who knows the IP address of your Web server, sends an IP datagram (e.g., a TCP SYN segment) to that IP address. This datagram will be routed through the Internet, visiting a series of routers in many different ASs, and eventually reach your Web server. When any one of the routers receives the datagram, it is going to look for an entry in its forwarding table to determine on which outgoing port it should forward the datagram. Therefore, each of the routers needs to know about the existence of your company's /24 prefix (or some aggregate entry). How does a router become aware of your company's prefix? As we have just seen, it becomes aware of it from BGP! Specifically, when your company contracts with a local ISP and gets assigned a prefix (i.e., an address range), your local ISP will use BGP to advertise your prefix to the ISPs to which it connects.
Those ISPs will then, in turn, use BGP to propagate the advertisement. Eventually, all Internet routers will know about your prefix (or about some aggregate that includes your prefix) and thus be able to appropriately forward datagrams destined to your Web and mail servers.

5.5 The SDN Control Plane

In this section, we'll dive into the SDN control plane---the network-wide logic that controls packet forwarding among a network's SDN-enabled devices, as well as the configuration and management of these devices and their services. Our study here builds on our earlier discussion of generalized SDN forwarding in Section 4.4, so you might want to first review that section, as well as Section 5.1 of this chapter, before continuing on. As in Section 4.4, we'll again adopt the terminology used in the SDN literature and refer to the network's forwarding devices as "packet switches" (or just switches, with "packet" being understood), since forwarding decisions can be made on the basis of network-layer source/destination addresses, link-layer source/destination addresses, as well as many other values in transport-, network-, and link-layer packet-header fields. Four key characteristics of an SDN architecture can be identified \[Kreutz 2015\]:

Flow-based forwarding. Packet forwarding by SDN-controlled switches can be based on any number of header field values in the transport-layer, network-layer, or link-layer header. We saw in Section 4.4 that the OpenFlow 1.0 abstraction allows forwarding based on eleven different header field values. This contrasts sharply with the traditional approach to router-based forwarding that we studied in Sections 5.2--5.4, where forwarding of IP datagrams was based solely on a datagram's destination IP address. Recall from Figure 5.2 that packet forwarding rules are specified in a switch's flow table; it is the job of the SDN control plane to compute, manage and install flow table entries in all of the network's switches.

Separation of data plane and control plane. This separation is shown clearly in Figures 5.2 and 5.14. The data plane consists of the network's switches---relatively simple (but fast) devices that execute the "match plus action" rules in their flow tables. The control plane consists of servers and software that determine and manage the switches' flow tables.

Network control functions: external to data-plane switches. Given that the "S" in SDN is for "software," it's perhaps not surprising that the SDN control plane is implemented in software. Unlike traditional routers, however, this software executes on servers that are both distinct and remote from the network's switches. As shown in Figure 5.14, the control plane itself consists of two components---an SDN controller (or network operating system \[Gude 2008\]) and a set of network-control applications. The controller maintains accurate network state information (e.g., the state of remote links, switches, and hosts); provides this information to the network-control applications running in the control plane; and provides the means through which these applications can monitor, program, and control the underlying network devices. Although the controller in Figure 5.14 is shown as a single central server, in practice the controller is only logically centralized; it is typically implemented on several servers that provide coordinated, scalable performance and high availability.

A programmable network.
The network is programmable through the network-control applications running in the control plane. These applications represent the "brains" of the SDN control plane, using the APIs provided by the SDN controller to specify and control the data plane in the network devices. For example, a routing network-control application might determine the end-end paths between sources and destinations (e.g., by executing Dijkstra's algorithm using the node-state and link-state information maintained by the SDN controller). Another network application might perform access control, i.e., determine which packets are to be blocked at a switch, as in our third example in Section 4.4.3. Yet another application might forward packets in a manner that performs server load balancing (the second example we considered in Section 4.4.3).

From this discussion, we can see that SDN represents a significant "unbundling" of network functionality---data plane switches, SDN controllers, and network-control applications are separate entities that may each be provided by different vendors and organizations. This contrasts with the pre-SDN model in which a switch/router (together with its embedded control plane software and protocol implementations) was monolithic, vertically integrated, and sold by a single vendor. This unbundling of network functionality in SDN has been likened to the earlier evolution from mainframe computers (where hardware, system software, and applications were provided by a single vendor) to personal computers (with their separate hardware, operating systems, and applications). The unbundling of computing hardware, system software, and applications has arguably led to a rich, open ecosystem driven by innovation in all three of these areas; one hope for SDN is that it too will lead to such rich innovation.

Given our understanding of the SDN architecture of Figure 5.14, many questions naturally arise. How and where are the flow tables actually computed? How are these tables updated in response to events at SDN-controlled devices (e.g., an attached link going up/down)? And how are the flow table entries at multiple switches coordinated in such a way as to result in orchestrated and consistent network-wide functionality (e.g., end-to-end paths for forwarding packets from sources to destinations, or coordinated distributed firewalls)? It is the role of the SDN control plane to provide these, and many other, capabilities.

Figure 5.14 Components of the SDN architecture: SDN-controlled switches, the SDN controller, network-control applications

5.5.1 The SDN Control Plane: SDN Controller and SDN Network-Control Applications

Let's begin our discussion of the SDN control plane in the abstract, by considering the generic capabilities that the control plane must provide. As we'll see, this abstract, "first principles" approach will lead us to an overall architecture that reflects how SDN control planes have been implemented in practice. As noted above, the SDN control plane divides broadly into two components---the SDN controller and the SDN network-control applications. Let's explore the controller first. Many SDN controllers have been developed since the earliest SDN controller \[Gude 2008\]; see \[Kreutz 2015\] for an extremely thorough and up-to-date survey. Figure 5.15 provides a more detailed view of a generic SDN controller. A controller's functionality can be broadly organized into three layers.
Let's consider these layers in an uncharacteristically bottom-up fashion:

A communication layer: communicating between the SDN controller and controlled network devices. Clearly, if an SDN controller is going to control the operation of a remote SDN-enabled switch, host, or other device, a protocol is needed to transfer information between the controller and that device. In addition, a device must be able to communicate locally observed events to the controller (e.g., a message indicating that an attached link has gone up or down, that a device has just joined the network, or a heartbeat indicating that a device is up and operational). These events provide the SDN controller with an up-to-date view of the network's state. This protocol constitutes the lowest layer of the controller architecture, as shown in Figure 5.15. The communication between the controller and the controlled devices crosses what has come to be known as the controller's "southbound" interface. In Section 5.5.2, we'll study OpenFlow---a specific protocol that provides this communication functionality. OpenFlow is implemented in most, if not all, SDN controllers.

A network-wide state-management layer. The ultimate control decisions made by the SDN control plane---e.g., configuring flow tables in all switches to achieve the desired end-end forwarding, to implement load balancing, or to implement a particular firewalling capability---will require that the controller have up-to-date information about the state of the network's hosts, links, switches, and other SDN-controlled devices. A switch's flow table contains counters whose values might also be profitably used by network-control applications; these values should thus be available to the applications. Since the ultimate aim of the control plane is to determine flow tables for the various controlled devices, a controller might also maintain a copy of these tables. These pieces of information all constitute examples of the network-wide "state" maintained by the SDN controller.

The interface to the network-control application layer. The controller interacts with network-control applications through its "northbound" interface. This API allows network-control applications to read/write network state and flow tables within the state-management layer. Applications can register to be notified when state-change events occur, so that they can take actions in response to network event notifications sent from SDN-controlled devices. Different types of APIs may be provided; we'll see that two popular SDN controllers communicate with their applications using a REST \[Fielding 2000\] request-response interface.

Figure 5.15 Components of an SDN controller

We have noted several times that an SDN controller can be considered to be "logically centralized," i.e., that the controller may be viewed externally (e.g., from the point of view of SDN-controlled devices and external network-control applications) as a single, monolithic service. However, these services and the databases used to hold state information are implemented in practice by a distributed set of servers for fault tolerance, high availability, or for performance reasons. With controller functions being implemented by a set of servers, the semantics of the controller's internal operations (e.g., maintaining logical time ordering of events, consistency, consensus, and more) must be considered \[Panda 2013\].
Such concerns are common across many different distributed systems; see \[Lamport 1989, Lampson 1996\] for elegant solutions to these challenges. Modern controllers such as OpenDaylight \[OpenDaylight Lithium 2016\] and ONOS \[ONOS 2016\] (see sidebar) have placed considerable emphasis on architecting a logically centralized but physically distributed controller platform that provides scalable services and high availability to the controlled devices and network-control applications alike. The architecture depicted in Figure 5.15 closely resembles the architecture of the originally proposed NOX controller in 2008 \[Gude 2008\], as well as that of today's OpenDaylight \[OpenDaylight Lithium 2016\] and ONOS \[ONOS 2016\] SDN controllers (see sidebar). We'll cover an example of controller operation in Section 5.5.3. First, however, let's examine the OpenFlow protocol, which lies in the controller's communication layer.

5.5.2 OpenFlow Protocol

The OpenFlow protocol \[OpenFlow 2009, ONF 2016\] operates between an SDN controller and an SDN-controlled switch or other device implementing the OpenFlow API that we studied earlier in Section 4.4. The OpenFlow protocol operates over TCP, with a default port number of 6653. Among the important messages flowing from the controller to the controlled switch are the following:

Configuration. This message allows the controller to query and set a switch's configuration parameters.

Modify-State. This message is used by a controller to add/delete or modify entries in the switch's flow table, and to set switch port properties.

Read-State. This message is used by a controller to collect statistics and counter values from the switch's flow table and ports.

Send-Packet. This message is used by the controller to send a specific packet out of a specified port at the controlled switch. The message itself contains the packet to be sent in its payload.

Among the messages flowing from the SDN-controlled switch to the controller are the following:

Flow-Removed. This message informs the controller that a flow table entry has been removed, for example by a timeout or as the result of a received modify-state message.

Port-status. This message is used by a switch to inform the controller of a change in port status.

Packet-in. Recall from Section 4.4 that a packet arriving at a switch port and not matching any flow table entry is sent to the controller for additional processing. Matched packets may also be sent to the controller, as an action to be taken on a match. The packet-in message is used to send such packets to the controller.

Additional OpenFlow messages are defined in \[OpenFlow 2009, ONF 2016\].
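To suggest how these messages fit together, here is a toy dispatch loop on the controller side; the enum names mirror the message list above, but the handler logic and all function names are purely illustrative (a real controller parses the OpenFlow wire format over TCP port 6653).

```python
from enum import Enum, auto

class OFMsg(Enum):
    # Controller-to-switch messages
    CONFIGURATION = auto()
    MODIFY_STATE = auto()
    READ_STATE = auto()
    SEND_PACKET = auto()
    # Switch-to-controller messages
    FLOW_REMOVED = auto()
    PORT_STATUS = auto()
    PACKET_IN = auto()

def handle_switch_message(msg, payload, send_to_switch):
    if msg is OFMsg.PORT_STATUS:
        # A port (and so perhaps a link) changed state: recompute
        # affected routes, then push new rules via Modify-State.
        send_to_switch(OFMsg.MODIFY_STATE, recompute_rules(payload))
    elif msg is OFMsg.PACKET_IN:
        # Unmatched packet delivered to the controller: choose an
        # action, install a matching rule, and reinject the packet.
        send_to_switch(OFMsg.MODIFY_STATE, rule_for(payload))
        send_to_switch(OFMsg.SEND_PACKET, payload)

def recompute_rules(event):   # placeholder for a routing computation
    return {"event": event}

def rule_for(packet):         # placeholder for a policy decision
    return {"match": packet}
```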
PRINCIPLES IN PRACTICE

GOOGLE'S SOFTWARE-DEFINED GLOBAL NETWORK

Recall from the case study in Section 2.6 that Google deploys a dedicated wide-area network (WAN) that interconnects its data centers and server clusters (in IXPs and ISPs). This network, called B4, has a Google-designed SDN control plane built on OpenFlow. Google's network is able to drive WAN links at near 70% utilization over the long run (a two to three fold increase over typical link utilizations) and split application flows among multiple paths based on application priority and existing flow demands \[Jain 2013\]. The Google B4 network is particularly well-suited for SDN: (i) Google controls all devices from the edge servers in IXPs and ISPs to routers in their network core; (ii) the most bandwidth-intensive applications are large-scale data copies between sites that can defer to higher-priority interactive applications during times of resource congestion; (iii) with only a few dozen data centers being connected, centralized control is feasible. Google's B4 network uses custom-built switches, each implementing a slightly extended version of OpenFlow, with a local OpenFlow Agent (OFA) that is similar in spirit to the control agent we encountered in Figure 5.2. Each OFA in turn connects to an OpenFlow Controller (OFC) in the network control server (NCS), using a separate "out of band" network, distinct from the network that carries data-center traffic between data centers. The OFC thus provides the services used by the NCS to communicate with its controlled switches, similar in spirit to the lowest layer in the SDN architecture shown in Figure 5.15. In B4, the OFC also performs state management functions, keeping node and link status in a Network Information Base (NIB). Google's implementation of the OFC is based on the ONIX SDN controller \[Koponen 2010\]. Two routing protocols, BGP (for routing between the data centers) and IS-IS (a close relative of OSPF, for routing within a data center), are implemented. Paxos \[Chandra 2007\] is used to execute hot replicas of NCS components to protect against failure. A traffic engineering network-control application, sitting logically above the set of network control servers, interacts with these servers to provide global, network-wide bandwidth provisioning for groups of application flows. With B4, SDN made an important leap forward into the operational networks of a global network provider. See \[Jain 2013\] for a detailed description of B4.

5.5.3 Data and Control Plane Interaction: An Example

In order to solidify our understanding of the interaction between SDN-controlled switches and the SDN controller, let's consider the example shown in Figure 5.16, in which Dijkstra's algorithm (which we studied in Section 5.2) is used to determine shortest path routes. The SDN scenario in Figure 5.16 has two important differences from the earlier per-router-control scenario of Sections 5.2.1 and 5.3, where Dijkstra's algorithm was implemented in each and every router and link-state updates were flooded among all network routers: Dijkstra's algorithm is executed as a separate application, outside of the packet switches, and packet switches send link updates to the SDN controller and not to each other.

Figure 5.16 SDN controller scenario: Link-state change

In this example, let's assume that the link between switch s1 and s2 goes down; that shortest path routing is implemented, and that, consequently, incoming and outgoing flow forwarding rules at s1, s3, and s4 are affected, but that s2's operation is unchanged. Let's also assume that OpenFlow is used as the communication layer protocol, and that the control plane performs no other function other than link-state routing.

1. Switch s1, experiencing a link failure between itself and s2, notifies the SDN controller of the link-state change using the OpenFlow port-status message.

2. The SDN controller receives the OpenFlow message indicating the link-state change, and notifies the link-state manager, which updates a link-state database.
3. The network-control application that implements Dijkstra's link-state routing has previously registered to be notified when link state changes. That application receives the notification of the link-state change.

4. The link-state routing application interacts with the link-state manager to get updated link state; it might also consult other components in the state-management layer. It then computes the new least-cost paths.

5. The link-state routing application then interacts with the flow table manager, which determines the flow tables to be updated.

6. The flow table manager then uses the OpenFlow protocol to update flow table entries at affected switches---s1 (which will now route packets destined to s2 via s4), s2 (which will now begin receiving packets from s1 via intermediate switch s4), and s4 (which must now forward packets from s1 destined to s2).

This example is simple but illustrates how the SDN control plane provides control-plane services (in this case network-layer routing) that had been previously implemented with per-router control exercised in each and every network router; a code sketch of this event flow appears below, after Section 5.5.4. One can now easily appreciate how an SDN-enabled ISP could easily switch from least-cost path routing to a more hand-tailored approach to routing. Indeed, since the controller can tailor the flow tables as it pleases, it can implement any form of forwarding that it pleases---simply by changing its application-control software. This ease of change should be contrasted to the case of a traditional per-router control plane, where software in all routers (which might be provided to the ISP by multiple independent vendors) must be changed.

5.5.4 SDN: Past and Future

Although the intense interest in SDN is a relatively recent phenomenon, the technical roots of SDN, and the separation of the data and control planes in particular, go back considerably further. In 2004, \[Feamster 2004, Lakshman 2004, RFC 3746\] all argued for the separation of the network's data and control planes. \[van der Merwe 1998\] describes a control framework for ATM networks \[Black 1995\] with multiple controllers, each controlling a number of ATM switches. The Ethane project \[Casado 2007\] pioneered the notion of a network of simple flow-based Ethernet switches with match-plus-action flow tables, a centralized controller that managed flow admission and routing, and the forwarding of unmatched packets from the switch to the controller. A network of more than 300 Ethane switches was operational in 2007. Ethane quickly evolved into the OpenFlow project, and the rest (as the saying goes) is history!

Numerous research efforts are aimed at developing future SDN architectures and capabilities. As we have seen, the SDN revolution is leading to the disruptive replacement of dedicated monolithic switches and routers (with both data and control planes) by simple commodity switching hardware and a sophisticated software control plane. A generalization of SDN known as network functions virtualization (NFV) similarly aims at disruptive replacement of sophisticated middleboxes (such as middleboxes with dedicated hardware and proprietary software for media caching/service) with simple commodity servers, switching, and storage \[Gember-Jacobson 2014\]. A second area of important research seeks to extend SDN concepts from the intra-AS setting to the inter-AS setting \[Gupta 2014\].
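Returning to the six-step link-state-change example of Section 5.5.3, the division of labor between the controller's layers and a routing application might be sketched as follows; every class and method name here is our own invention, not a real controller API.

```python
class LinkStateRoutingApp:
    def __init__(self, controller):
        self.controller = controller
        controller.listeners.append(self)   # step 3: register for link events

    def on_link_change(self):
        links = self.controller.link_state                 # step 4: read updated state
        paths = self.compute_least_cost_paths(links)
        self.controller.flow_table_manager.push(paths)     # steps 5-6: update switches

    def compute_least_cost_paths(self, links):
        return {}   # placeholder for Dijkstra over the link-state database

class Controller:
    def __init__(self, flow_table_manager):
        self.link_state = {}
        self.flow_table_manager = flow_table_manager
        self.listeners = []

    def on_port_status(self, link, is_up):     # steps 1-2: OpenFlow port-status arrives
        self.link_state[link] = is_up          # update the link-state database
        for app in self.listeners:             # step 3: notify registered applications
            app.on_link_change()
```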
PRINCIPLES IN PRACTICE

SDN CONTROLLER CASE STUDIES: THE OPENDAYLIGHT AND ONOS CONTROLLERS

In the earliest days of SDN, there was a single SDN protocol (OpenFlow \[McKeown 2008; OpenFlow 2009\]) and a single SDN controller (NOX \[Gude 2008\]). Since then, the number of SDN controllers in particular has grown significantly \[Kreutz 2015\]. Some SDN controllers are company-specific and proprietary, e.g., ONIX \[Koponen 2010\], Juniper Networks Contrail \[Juniper Contrail 2016\], and Google's controller \[Jain 2013\] for its B4 wide-area network. But many more controllers are open-source and implemented in a variety of programming languages \[Erickson 2013\]. Most recently, the OpenDaylight controller \[OpenDaylight Lithium 2016\] and the ONOS controller \[ONOS 2016\] have found considerable industry support. They are both open-source and are being developed in partnership with the Linux Foundation.

The OpenDaylight Controller

Figure 5.17 presents a simplified view of the OpenDaylight Lithium SDN controller platform \[OpenDaylight Lithium 2016\]. ODL's main set of controller components corresponds closely to those we developed in Figure 5.15. Network-Service Applications are the applications that determine how data-plane forwarding and other services, such as firewalling and load balancing, are accomplished in the controlled switches. Unlike the canonical controller in Figure 5.15, the ODL controller has two interfaces through which applications may communicate with native controller services and each other: external applications communicate with controller modules using a REST request-response API running over HTTP. Internal applications communicate with each other via the Service Abstraction Layer (SAL). The choice as to whether a controller application is implemented externally or internally is up to the application designer; the particular configuration of applications shown in Figure 5.17 is only meant as an example.

Figure 5.17 The OpenDaylight controller

ODL's Basic Network-Service Functions are at the heart of the controller, and they correspond closely to the network-wide state management capabilities that we encountered in Figure 5.15. The SAL is the controller's nerve center, allowing controller components and applications to invoke each other's services and to subscribe to events they generate. It also provides a uniform abstract interface to the specific underlying communications protocols in the communication layer, including OpenFlow and SNMP (the Simple Network Management Protocol---a network management protocol that we will cover in Section 5.7). OVSDB is a protocol used to manage data center switching, an important application area for SDN technology. We'll introduce data center networking in Chapter 6.

The ONOS Controller

Figure 5.18 presents a simplified view of the ONOS controller \[ONOS 2016\]. Similar to the canonical controller in Figure 5.15, three layers can be identified in the ONOS controller:

Figure 5.18 ONOS controller architecture

Northbound abstractions and protocols. A unique feature of ONOS is its intent framework, which allows an application to request a high-level service (e.g., to set up a connection between Host A and Host B, or conversely to not allow Host A and Host B to communicate) without having to know the details of how this service is performed.
Figure 5.18 ONOS controller architecture

The ONOS Controller

Figure 5.18 presents a simplified view of the ONOS controller \[ONOS 2016\]. Similar to the canonical controller in Figure 5.15, three layers can be identified in the ONOS controller:

Northbound abstractions and protocols. A unique feature of ONOS is its intent framework, which allows an application to request a high-level service (e.g., to set up a connection between Host A and Host B, or conversely to not allow Host A and Host B to communicate) without having to know the details of how this service is performed. State information is provided to network-control applications across the northbound API either synchronously (via query) or asynchronously (via listener callbacks, e.g., when network state changes).

Distributed core. The state of the network's links, hosts, and devices is maintained in ONOS's distributed core. ONOS is deployed as a service on a set of interconnected servers, with each server running an identical copy of the ONOS software; an increased number of servers offers an increased service capacity. The ONOS core provides the mechanisms for service replication and coordination among instances, providing the applications above and the network devices below with the abstraction of logically centralized core services.

Southbound abstractions and protocols. The southbound abstractions mask the heterogeneity of the underlying hosts, links, switches, and protocols, allowing the distributed core to be both device and protocol agnostic. Because of this abstraction, the southbound interface below the distributed core is logically higher than in our canonical controller in Figure 5.15 or the ODL controller in Figure 5.17.

5.6 ICMP: The Internet Control Message Protocol

The Internet Control Message Protocol (ICMP), specified in \[RFC 792\], is used by hosts and routers to communicate network-layer information to each other. The most typical use of ICMP is for error reporting. For example, when running an HTTP session, you may have encountered an error message such as "Destination network unreachable." This message had its origins in ICMP. At some point, an IP router was unable to find a path to the host specified in your HTTP request. That router created and sent an ICMP message to your host indicating the error.

ICMP is often considered part of IP, but architecturally it lies just above IP, as ICMP messages are carried inside IP datagrams. That is, ICMP messages are carried as IP payload, just as TCP or UDP segments are carried as IP payload. Similarly, when a host receives an IP datagram with ICMP specified as the upper-layer protocol (an upper-layer protocol number of 1), it demultiplexes the datagram's contents to ICMP, just as it would demultiplex a datagram's content to TCP or UDP. ICMP messages have a type and a code field, and contain the header and the first 8 bytes of the IP datagram that caused the ICMP message to be generated in the first place (so that the sender can determine the datagram that caused the error). Selected ICMP message types are shown in Figure 5.19.

Figure 5.19 ICMP message types

Note that ICMP messages are used not only for signaling error conditions. The well-known ping program sends an ICMP type 8 code 0 message to the specified host. The destination host, seeing the echo request, sends back a type 0 code 0 ICMP echo reply. Most TCP/IP implementations support the ping server directly in the operating system; that is, the server is not a process. Chapter 11 of \[Stevens 1990\] provides the source code for the ping client program. Note that the client program needs to be able to instruct the operating system to generate an ICMP message of type 8 code 0.
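To make the type and code fields concrete, here is a minimal sketch of how such a client might build and send an echo request in Python; it anticipates the ICMP ping programming assignment at the end of this chapter. The identifier, payload, and destination address are arbitrary choices for illustration, and raw sockets typically require administrator privileges.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words (in the style of RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                       # fold carry bits back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def echo_request(ident: int, seq: int, payload: bytes = b"ping") -> bytes:
    # Type 8 (echo request), code 0; the checksum is computed over the
    # whole ICMP message with the checksum field initially zero.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

# Sending requires a raw socket (and usually administrator privileges);
# 192.0.2.1 is a documentation address standing in for a real host.
sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
sock.sendto(echo_request(ident=0x1234, seq=1), ("192.0.2.1", 0))
```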
Another interesting ICMP message is the source quench message. This message is seldom used in practice. Its original purpose was to perform congestion control---to allow a congested router to send an ICMP source quench message to a host to force that host to reduce its transmission rate. We have seen in Chapter 3 that TCP has its own congestion-control mechanism that operates at the transport layer, without the use of network-layer feedback such as the ICMP source quench message.

In Chapter 1 we introduced the Traceroute program, which allows us to trace a route from a host to any other host in the world. Interestingly, Traceroute is implemented with ICMP messages. To determine the names and addresses of the routers between source and destination, Traceroute in the source sends a series of ordinary IP datagrams to the destination. Each of these datagrams carries a UDP segment with an unlikely UDP port number. The first of these datagrams has a TTL of 1, the second of 2, the third of 3, and so on. The source also starts timers for each of the datagrams. When the nth datagram arrives at the nth router, the nth router observes that the TTL of the datagram has just expired. According to the rules of the IP protocol, the router discards the datagram and sends an ICMP warning message to the source (type 11 code 0). This warning message includes the name of the router and its IP address. When this ICMP message arrives back at the source, the source obtains the round-trip time from the timer and the name and IP address of the nth router from the ICMP message.

How does a Traceroute source know when to stop sending UDP segments? Recall that the source increments the TTL field for each datagram it sends. Thus, one of the datagrams will eventually make it all the way to the destination host. Because this datagram contains a UDP segment with an unlikely port number, the destination host sends a port unreachable ICMP message (type 3 code 3) back to the source. When the source host receives this particular ICMP message, it knows it does not need to send additional probe packets. (The standard Traceroute program actually sends sets of three packets with the same TTL; thus the Traceroute output provides three results for each TTL.) In this manner, the source host learns the number and the identities of routers that lie between it and the destination host and the round-trip time between the two hosts. Note that the Traceroute client program must be able to instruct the operating system to generate UDP datagrams with specific TTL values and must also be able to be notified by its operating system when ICMP messages arrive. Now that you understand how Traceroute works, you may want to go back and play with it some more.
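In that spirit, the TTL-stepping loop at the heart of Traceroute can be sketched in a few lines of Python. This is a bare-bones illustration under several simplifying assumptions: one probe per TTL, no round-trip timing, a fixed 20-byte IP header in the reply, no matching of replies to probes, and a placeholder destination address; the raw ICMP socket again typically requires administrator privileges.

```python
import socket
import struct

DEST = "192.0.2.1"            # placeholder destination address
UNLIKELY_PORT = 33434          # traditional traceroute-style UDP port

recv = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
recv.settimeout(3.0)
send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

for ttl in range(1, 31):
    send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    send.sendto(b"probe", (DEST, UNLIKELY_PORT))
    try:
        packet, (router, _) = recv.recvfrom(512)
    except socket.timeout:
        print(ttl, "*")
        continue
    # The ICMP header follows the (assumed 20-byte) IP header of the reply.
    icmp_type, icmp_code = struct.unpack("!BB", packet[20:22])
    print(ttl, router, "ICMP type", icmp_type, "code", icmp_code)
    if icmp_type == 3 and icmp_code == 3:   # port unreachable: we're done
        break
```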
A new version of ICMP has been defined for IPv6 in RFC 4443. In addition to reorganizing the existing ICMP type and code definitions, ICMPv6 also added new types and codes required by the new IPv6 functionality. These include the "Packet Too Big" type and an "unrecognized IPv6 options" error code.

5.7 Network Management and SNMP

Having now made our way to the end of our study of the network layer, with only the link layer before us, we're well aware that a network consists of many complex, interacting pieces of hardware and software---from the links, switches, routers, hosts, and other devices that comprise the physical components of the network to the many protocols that control and coordinate these devices. When hundreds or thousands of such components are brought together by an organization to form a network, the job of the network administrator to keep the network "up and running" is surely a challenge. We saw in Section 5.5 that the logically centralized controller can help with this process in an SDN context. But the challenge of network management has been around since long before SDN, and a rich set of network management tools and approaches has evolved to help the network administrator monitor, manage, and control the network. We'll study these tools and techniques in this section.

An often-asked question is "What is network management?" A well-conceived, single-sentence (albeit a rather long run-on sentence) definition of network management from \[Saydam 1996\] is:

Network management includes the deployment, integration, and coordination of the hardware, software, and human elements to monitor, test, poll, configure, analyze, evaluate, and control the network and element resources to meet the real-time, operational performance, and Quality of Service requirements at a reasonable cost.

Given this broad definition, we'll cover only the rudiments of network management in this section---the architecture, protocols, and information base used by a network administrator in performing this task. We'll not cover the administrator's decision-making processes, where topics such as fault identification \[Labovitz 1997; Steinder 2002; Feamster 2005; Wu 2005; Teixeira 2006\], anomaly detection \[Lakhina 2005; Barford 2009\], network design/engineering to meet contracted Service Level Agreements (SLAs) \[Huston 1999a\], and more come into consideration. Our focus is thus purposefully narrow; the interested reader should consult these references, the excellent network-management text by Subramanian \[Subramanian 2000\], and the more detailed treatment of network management available on the Web site for this text.

5.7.1 The Network Management Framework

Figure 5.20 shows the key components of network management:

The managing server is an application, typically with a human in the loop, running in a centralized network management station in the network operations center (NOC). The managing server is the locus of activity for network management; it controls the collection, processing, analysis, and/or display of network management information. It is here that actions are initiated to control network behavior and here that the human network administrator interacts with the network's devices.

A managed device is a piece of network equipment (including its software) that resides on a managed network. A managed device might be a host, router, switch, middlebox, modem, thermometer, or other network-connected device. There may be several so-called managed objects within a managed device. These managed objects are the actual pieces of hardware within the managed device (for example, a network interface card is but one component of a host or router) and the configuration parameters for these hardware and software components (for example, an intra-AS routing protocol such as OSPF). Each managed object within a managed device has associated information that is collected into a Management Information Base (MIB); we'll see that the values of these pieces of information are available to (and in many cases able to be set by) the managing server. A MIB object might be a counter, such as the number of IP datagrams discarded at a router due to errors in an IP datagram header, or the number of UDP segments received at a host; descriptive information such as the version of the software running on a DNS server; status information such as whether a particular device is functioning correctly; or protocol-specific information such as a routing path to a destination.
MIB objects are specified in a data description language known as SMI (Structure of Management Information) \[RFC 2578; RFC 2579; RFC 2580\]. A formal definition language is used to ensure that the syntax and semantics of the network management data are well defined and unambiguous. Related MIB objects are gathered into MIB modules. As of mid-2015, there were nearly 400 MIB modules defined by RFCs, and a much larger number of vendor-specific (private) MIB modules.

Also resident in each managed device is a network management agent, a process running in the managed device that communicates with the managing server, taking local actions at the managed device under the command and control of the managing server. The network management agent is similar to the routing agent that we saw in Figure 5.2.

Figure 5.20 Elements of network management: Managing server, managed devices, MIB data, remote agents, SNMP

The final component of a network management framework is the network management protocol. The protocol runs between the managing server and the managed devices, allowing the managing server to query the status of managed devices and indirectly take actions at these devices via its agents. Agents can use the network management protocol to inform the managing server of exceptional events (for example, component failures or violation of performance thresholds). It's important to note that the network management protocol does not itself manage the network. Instead, it provides capabilities that a network administrator can use to manage ("monitor, test, poll, configure, analyze, evaluate, and control") the network. This is a subtle, but important, distinction. In the following section, we'll cover the Internet's SNMP (Simple Network Management Protocol) protocol.

5.7.2 The Simple Network Management Protocol (SNMP)

The Simple Network Management Protocol version 2 (SNMPv2) \[RFC 3416\] is an application-layer protocol used to convey network-management control and information messages between a managing server and an agent executing on behalf of that managing server. The most common usage of SNMP is in a request-response mode in which an SNMP managing server sends a request to an SNMP agent, who receives the request, performs some action, and sends a reply to the request. Typically, a request will be used to query (retrieve) or modify (set) MIB object values associated with a managed device. A second common usage of SNMP is for an agent to send an unsolicited message, known as a trap message, to a managing server. Trap messages are used to notify a managing server of an exceptional situation (e.g., a link interface going up or down) that has resulted in changes to MIB object values. SNMPv2 defines seven types of messages, known generically as protocol data units---PDUs---as shown in Table 5.2 and described below. The format of the PDU is shown in Figure 5.21. The GetRequest, GetNextRequest, and GetBulkRequest PDUs are all sent from a managing server to an agent to request the value of one or more MIB objects at the agent's managed device.
The MIB objects whose values are being requested are specified in the variable binding portion of the PDU.

Table 5.2 SNMPv2 PDU types

| SNMPv2 PDU Type | Sender-receiver | Description |
| --- | --- | --- |
| GetRequest | manager-to-agent | get value of one or more MIB object instances |
| GetNextRequest | manager-to-agent | get value of next MIB object instance in list or table |
| GetBulkRequest | manager-to-agent | get values in large block of data, for example, values in a large table |
| InformRequest | manager-to-manager | inform remote managing entity of MIB values remote to its access |
| SetRequest | manager-to-agent | set value of one or more MIB object instances |
| Response | agent-to-manager or manager-to-manager | generated in response to GetRequest, GetNextRequest, GetBulkRequest, SetRequest PDU, or InformRequest |
| SNMPv2-Trap | agent-to-manager | inform manager of an exceptional event |

Figure 5.21 SNMP PDU format

GetRequest, GetNextRequest, and GetBulkRequest differ in the granularity of their data requests. GetRequest can request an arbitrary set of MIB values; multiple GetNextRequest PDUs can be used to sequence through a list or table of MIB objects; GetBulkRequest allows a large block of data to be returned, avoiding the overhead incurred if multiple GetRequest or GetNextRequest messages were to be sent. In all three cases, the agent responds with a Response PDU containing the object identifiers and their associated values.

The SetRequest PDU is used by a managing server to set the value of one or more MIB objects in a managed device. An agent replies with a Response PDU with the "noError" error status to confirm that the value has indeed been set. The InformRequest PDU is used by a managing server to notify another managing server of MIB information that is remote to the receiving server. The Response PDU is typically sent from a managed device to the managing server in response to a request message from that server, returning the requested information.

The final type of SNMPv2 PDU is the trap message. Trap messages are generated asynchronously; that is, they are not generated in response to a received request but rather in response to an event for which the managing server requires notification. RFC 3418 defines well-known trap types that include a cold or warm start by a device, a link going up or down, the loss of a neighbor, or an authentication failure event. A received trap message has no required response from a managing server.

Given the request-response nature of SNMP, it is worth noting here that although SNMP PDUs can be carried via many different transport protocols, the SNMP PDU is typically carried in the payload of a UDP datagram. Indeed, RFC 3417 states that UDP is "the preferred transport mapping." However, since UDP is an unreliable transport protocol, there is no guarantee that a request, or its response, will be received at the intended destination. The request ID field of the PDU (see Figure 5.21) is used by the managing server to number its requests to an agent; the agent's response takes its request ID from that of the received request. Thus, the request ID field can be used by the managing server to detect lost requests or replies. It is up to the managing server to decide whether to retransmit a request if no corresponding response is received after a given amount of time.
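The request-ID bookkeeping just described might look like the following on the manager side. This is a sketch under loud assumptions: the agent address, timeout, and retry count are placeholders, and a bare 4-byte request ID stands in for the BER-encoded PDU a real SNMP manager would send.

```python
import socket

AGENT = ("192.0.2.1", 161)   # placeholder agent address; 161 is SNMP's UDP port
MAX_RETRIES = 3               # the standard leaves both choices to the manager
TIMEOUT = 2.0                 # seconds

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(TIMEOUT)

def send_request(request_id: int, pdu: bytes) -> bytes:
    """Send a request over UDP, match the reply by request ID, and
    retransmit on timeout, since UDP gives no delivery guarantee."""
    for _ in range(MAX_RETRIES):
        sock.sendto(request_id.to_bytes(4, "big") + pdu, AGENT)
        try:
            while True:
                reply, _addr = sock.recvfrom(4096)
                if int.from_bytes(reply[:4], "big") == request_id:
                    return reply          # the response to *this* request
                # otherwise: a stale reply to an earlier request; keep waiting
        except socket.timeout:
            continue                      # no reply in time; retransmit
    raise TimeoutError("agent did not respond")
```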
In particular, the SNMP standard does not mandate any particular procedure for retransmission, or even if retransmission is to be done in the first place. It only requires that the managing server "needs to act responsibly in respect to the frequency and duration of retransmissions." This, of course, leads one to wonder how a "responsible" protocol should act!

SNMP has evolved through three versions. The designers of SNMPv3 have said that "SNMPv3 can be thought of as SNMPv2 with additional security and administration capabilities" \[RFC 3410\]. Certainly, there are changes in SNMPv3 over SNMPv2, but nowhere are those changes more evident than in the area of administration and security. The central role of security in SNMPv3 was particularly important, since the lack of adequate security resulted in SNMP being used primarily for monitoring rather than control (for example, SetRequest is rarely used in SNMPv1). Once again, we see that security---a topic we'll cover in detail in Chapter 8---is of critical concern, but once again a concern whose importance had been realized perhaps a bit late and only then "added on."

5.8 Summary

We have now completed our two-chapter journey into the network core---a journey that began with our study of the network layer's data plane in Chapter 4 and finished here with our study of the network layer's control plane. We learned that the control plane is the network-wide logic that controls not only how a datagram is forwarded among routers along an end-to-end path from the source host to the destination host, but also how network-layer components and services are configured and managed.

We learned that there are two broad approaches towards building a control plane: traditional per-router control (where a routing algorithm runs in each and every router and the routing component in the router communicates with the routing components in other routers) and software-defined networking (SDN) control (where a logically centralized controller computes and distributes the forwarding tables to be used by each and every router). We studied two fundamental routing algorithms for computing least-cost paths in a graph---link-state routing and distance-vector routing---in Section 5.2; these algorithms find application in both per-router control and in SDN control. These algorithms are the basis for two widely deployed Internet routing protocols, OSPF and BGP, that we covered in Sections 5.3 and 5.4. We covered the SDN approach to the network-layer control plane in Section 5.5, investigating SDN network-control applications, the SDN controller, and the OpenFlow protocol for communicating between the controller and SDN-controlled devices. In Sections 5.6 and 5.7, we covered some of the nuts and bolts of managing an IP network: ICMP (the Internet Control Message Protocol) and SNMP (the Simple Network Management Protocol).

Having completed our study of the network layer, our journey now takes us one step further down the protocol stack, namely, to the link layer. Like the network layer, the link layer is part of each and every network-connected device. But we will see in the next chapter that the link layer has the much more localized task of moving packets between nodes on the same link or LAN.
Although this task may appear on the surface to be rather simple compared with that of the network layer's tasks, we will see that the link layer involves a number of important and fascinating issues that can keep us busy for a long time.

Homework Problems and Questions

Chapter 5 Review Questions

SECTION 5.1

R1. What is meant by a control plane that is based on per-router control? In such cases, when we say the network control and data planes are implemented "monolithically," what do we mean?

R2. What is meant by a control plane that is based on logically centralized control? In such cases, are the data plane and the control plane implemented within the same device or in separate devices? Explain.

SECTION 5.2

R3. Compare and contrast the properties of a centralized and a distributed routing algorithm. Give an example of a routing protocol that takes a centralized and a decentralized approach.

R4. Compare and contrast link-state and distance-vector routing algorithms.

R5. What is the "count to infinity" problem in distance-vector routing?

R6. Is it necessary that every autonomous system use the same intra-AS routing algorithm? Why or why not?

SECTIONS 5.3--5.4

R7. Why are different inter-AS and intra-AS protocols used in the Internet?

R8. True or false: When an OSPF router sends its link-state information, it is sent only to its directly attached neighbors. Explain.

R9. What is meant by an area in an OSPF autonomous system? Why was the concept of an area introduced?

R10. Define and contrast the following terms: subnet, prefix, and BGP route.

R11. How does BGP use the NEXT-HOP attribute? How does it use the AS-PATH attribute?

R12. Describe how a network administrator of an upper-tier ISP can implement policy when configuring BGP.

R13. True or false: When a BGP router receives an advertised path from its neighbor, it must add its own identity to the received path and then send that new path on to all of its neighbors. Explain.

SECTION 5.5

R14. Describe the main role of the communication layer, the network-wide state-management layer, and the network-control application layer in an SDN controller.

R15. Suppose you wanted to implement a new routing protocol in the SDN control plane. At which layer would you implement that protocol? Explain.

R16. What types of messages flow across an SDN controller's northbound and southbound APIs? Who is the recipient of these messages sent from the controller across the southbound interface, and who sends messages to the controller across the northbound interface?

R17. Describe the purpose of two types of OpenFlow messages (of your choosing) that are sent from a controlled device to the controller. Describe the purpose of two types of OpenFlow messages (of your choosing) that are sent from the controller to a controlled device.

R18. What is the purpose of the service abstraction layer in the OpenDaylight SDN controller?

SECTIONS 5.6--5.7

R19. Name four different types of ICMP messages.

R20. What two types of ICMP messages are received at the sending host executing the Traceroute program?

R21. Define the following terms in the context of SNMP: managing server, managed device, network management agent, and MIB.

R22. What are the purposes of the SNMP GetRequest and SetRequest messages?

R23. What is the purpose of the SNMP trap message?

Problems

P1. Looking at Figure 5.3, enumerate the paths from y to u that do not contain any loops.

P2.
Repeat Problem P1 for paths from x to z, z to u, and z to w.

P3. Consider the following network. With the indicated link costs, use Dijkstra's shortest-path algorithm to compute the shortest path from x to all network nodes. Show how the algorithm works by computing a table similar to Table 5.1.

Dijkstra's algorithm: discussion and example

P4. Consider the network shown in Problem P3. Using Dijkstra's algorithm, and showing your work using a table similar to Table 5.1, do the following:

a. Compute the shortest path from t to all network nodes.
b. Compute the shortest path from u to all network nodes.
c. Compute the shortest path from v to all network nodes.
d. Compute the shortest path from w to all network nodes.
e. Compute the shortest path from y to all network nodes.
f. Compute the shortest path from z to all network nodes.

P5. Consider the network shown below, and assume that each node initially knows the costs to each of its neighbors. Consider the distance-vector algorithm and show the distance table entries at node z.

P6. Consider a general topology (that is, not the specific network shown above) and a synchronous version of the distance-vector algorithm. Suppose that at each iteration, a node exchanges its distance vectors with its neighbors and receives their distance vectors. Assuming that the algorithm begins with each node knowing only the costs to its immediate neighbors, what is the maximum number of iterations required before the distributed algorithm converges? Justify your answer.

P7. Consider the network fragment shown below. x has only two attached neighbors, w and y. w has a minimum-cost path to destination u (not shown) of 5, and y has a minimum-cost path to u of 6. The complete paths from w and y to u (and between w and y) are not shown. All link costs in the network have strictly positive integer values.

a. Give x's distance vector for destinations w, y, and u.

b. Give a link-cost change for either c(x, w) or c(x, y) such that x will inform its neighbors of a new minimum-cost path to u as a result of executing the distance-vector algorithm.

c. Give a link-cost change for either c(x, w) or c(x, y) such that x will not inform its neighbors of a new minimum-cost path to u as a result of executing the distance-vector algorithm.

P8. Consider the three-node topology shown in Figure 5.6. Rather than having the link costs shown in Figure 5.6, the link costs are c(x,y)=3, c(y,z)=6, c(z,x)=4. Compute the distance tables after the initialization step and after each iteration of a synchronous version of the distance-vector algorithm (as we did in our earlier discussion of Figure 5.6).

P9. Consider the count-to-infinity problem in distance-vector routing. Will the count-to-infinity problem occur if we decrease the cost of a link? Why? How about if we connect two nodes which do not have a link?

P10. Argue that for the distance-vector algorithm in Figure 5.6, each value in the distance vector D(x) is non-increasing and will eventually stabilize in a finite number of steps.

P11. Consider Figure 5.7. Suppose there is another router w, connected to router y and z. The costs of all links are given as follows: c(x,y)=4, c(x,z)=50, c(y,w)=1, c(z,w)=1, c(y,z)=3. Suppose that poisoned reverse is used in the distance-vector routing algorithm.

a. When the distance-vector routing is stabilized, routers w, y, and z inform their distances to x to each other.
What distance values do they tell each other?

b. Now suppose that the link cost between x and y increases to 60. Will there be a count-to-infinity problem even if poisoned reverse is used? Why or why not? If there is a count-to-infinity problem, then how many iterations are needed for the distance-vector routing to reach a stable state again? Justify your answer.

c. How do you modify c(y, z) such that there is no count-to-infinity problem at all if c(y,x) changes from 4 to 60?

P12. Describe how loops in paths can be detected in BGP.

P13. Will a BGP router always choose the loop-free route with the shortest AS-path length? Justify your answer.

P14. Consider the network shown below. Suppose AS3 and AS2 are running OSPF for their intra-AS routing protocol. Suppose AS1 and AS4 are running RIP for their intra-AS routing protocol. Suppose eBGP and iBGP are used for the inter-AS routing protocol. Initially suppose there is no physical link between AS2 and AS4.

a. Router 3c learns about prefix x from which routing protocol: OSPF, RIP, eBGP, or iBGP?

b. Router 3a learns about x from which routing protocol?

c. Router 1c learns about x from which routing protocol?

d. Router 1d learns about x from which routing protocol?

P15. Referring to the previous problem, once router 1d learns about x it will put an entry (x, I) in its forwarding table.

a. Will I be equal to I1 or I2 for this entry? Explain why in one sentence.

b. Now suppose that there is a physical link between AS2 and AS4, shown by the dotted line. Suppose router 1d learns that x is accessible via AS2 as well as via AS3. Will I be set to I1 or I2? Explain why in one sentence.

c. Now suppose there is another AS, called AS5, which lies on the path between AS2 and AS4 (not shown in diagram). Suppose router 1d learns that x is accessible via AS2 AS5 AS4 as well as via AS3 AS4. Will I be set to I1 or I2? Explain why in one sentence.

P16. Consider the following network. ISP B provides national backbone service to regional ISP A. ISP C provides national backbone service to regional ISP D. Each ISP consists of one AS. B and C peer with each other in two places using BGP. Consider traffic going from A to D. B would prefer to hand that traffic over to C on the West Coast (so that C would have to absorb the cost of carrying the traffic cross-country), while C would prefer to get the traffic via its East Coast peering point with B (so that B would have carried the traffic across the country). What BGP mechanism might C use, so that B would hand over A-to-D traffic at its East Coast peering point? To answer this question, you will need to dig into the BGP specification.

P17. In Figure 5.13, consider the path information that reaches stub networks W, X, and Y. Based on the information available at W and X, what are their respective views of the network topology? Justify your answer. The topology view at Y is shown below.

P18. Consider Figure 5.13. B would never forward traffic destined to Y via X based on BGP routing. But there are some very popular applications for which data packets go to X first and then flow to Y. Identify one such application, and describe how data packets follow a path not given by BGP routing.

P19. In Figure 5.13, suppose that there is another stub network V that is a customer of ISP A. Suppose that B and C have a peering relationship, and A is a customer of both B and C.
Suppose that A would like to have the traffic destined to W come from B only, and the traffic destined to V from either B or C. How should A advertise its routes to B and C? What AS routes does C receive?

P20. Suppose ASs X and Z are not directly connected but instead are connected by AS Y. Further suppose that X has a peering agreement with Y, and that Y has a peering agreement with Z. Finally, suppose that Z wants to transit all of Y's traffic but does not want to transit X's traffic. Does BGP allow Z to implement this policy?

P21. Consider the two ways in which communication occurs between a managing entity and a managed device: request-response mode and trapping. What are the pros and cons of these two approaches, in terms of (1) overhead, (2) notification time when exceptional events occur, and (3) robustness with respect to lost messages between the managing entity and the device?

P22. In Section 5.7 we saw that it was preferable to transport SNMP messages in unreliable UDP datagrams. Why do you think the designers of SNMP chose UDP rather than TCP as the transport protocol of choice for SNMP?

Socket Programming Assignment

At the end of Chapter 2, there are four socket programming assignments. Below, you will find a fifth assignment which employs ICMP, a protocol discussed in this chapter.

Assignment 5: ICMP Ping

Ping is a popular networking application used to test from a remote location whether a particular host is up and reachable. It is also often used to measure latency between the client host and the target host. It works by sending ICMP "echo request" packets (i.e., ping packets) to the target host and listening for ICMP "echo response" replies (i.e., pong packets). Ping measures the RTT, records packet loss, and calculates a statistical summary of multiple ping-pong exchanges (the minimum, mean, maximum, and standard deviation of the round-trip times). In this lab, you will write your own Ping application in Python. Your application will use ICMP. But in order to keep your program simple, you will not exactly follow the official specification in RFC 1739. Note that you will only need to write the client side of the program, as the functionality needed on the server side is built into almost all operating systems. You can find full details of this assignment, as well as important snippets of the Python code, at the Web site http://www.pearsonhighered.com/cs-resources.

Programming Assignment

In this programming assignment, you will be writing a "distributed" set of procedures that implements a distributed asynchronous distance-vector routing for the network shown below.

You are to write the following routines that will "execute" asynchronously within the emulated environment provided for this assignment. For node 0, you will write the routines:

rtinit0(). This routine will be called once at the beginning of the emulation. rtinit0() has no arguments. It should initialize your distance table in node 0 to reflect the direct costs of 1, 3, and 7 to nodes 1, 2, and 3, respectively. In the figure above, all links are bidirectional and the costs in both directions are identical. After initializing the distance table and any other data structures needed by your node 0 routines, it should then send its directly connected neighbors (in this case, 1, 2, and 3) the cost of its minimum-cost paths to all other network nodes.
This minimum-cost information is sent to neighboring nodes in a routing update packet by calling the routine tolayer2(), as described in the full assignment. The format of the routing update packet is also described in the full assignment.

rtupdate0(struct rtpkt *rcvdpkt). This routine will be called when node 0 receives a routing packet that was sent to it by one of its directly connected neighbors. The parameter *rcvdpkt is a pointer to the packet that was received. rtupdate0() is the "heart" of the distance-vector algorithm. The values it receives in a routing update packet from some other node i contain i's current shortest-path costs to all other network nodes. rtupdate0() uses these received values to update its own distance table (as specified by the distance-vector algorithm). If its own minimum cost to another node changes as a result of the update, node 0 informs its directly connected neighbors of this change in minimum cost by sending them a routing packet. Recall that in the distance-vector algorithm, only directly connected nodes will exchange routing packets. Thus, nodes 1 and 2 will communicate with each other, but nodes 1 and 3 will not communicate with each other.

Similar routines are defined for nodes 1, 2, and 3. Thus, you will write eight procedures in all: rtinit0(), rtinit1(), rtinit2(), rtinit3(), rtupdate0(), rtupdate1(), rtupdate2(), and rtupdate3(). These routines will together implement a distributed, asynchronous computation of the distance tables for the topology and costs shown in the figure on the preceding page. You can find the full details of the programming assignment, as well as C code that you will need to create the simulated hardware/software environment, at http://www.pearsonhighered.com/cs-resources. A Java version of the assignment is also available.

Wireshark Lab

In the Web site for this textbook, www.pearsonhighered.com/cs-resources, you'll find a Wireshark lab assignment that examines the use of the ICMP protocol in the ping and traceroute commands.

An Interview With... Jennifer Rexford

Jennifer Rexford is a Professor in the Computer Science department at Princeton University. Her research has the broad goal of making computer networks easier to design and manage, with particular emphasis on routing protocols. From 1996--2004, she was a member of the Network Management and Performance department at AT&T Labs--Research. While at AT&T, she designed techniques and tools for network measurement, traffic engineering, and router configuration that were deployed in AT&T's backbone network. Jennifer is co-author of the book "Web Protocols and Practice: Networking Protocols, Caching, and Traffic Measurement," published by Addison-Wesley in May 2001. She served as the chair of ACM SIGCOMM from 2003 to 2007. She received her BSE degree in electrical engineering from Princeton University in 1991, and her PhD degree in electrical engineering and computer science from the University of Michigan in 1996. In 2004, Jennifer was the winner of ACM's Grace Murray Hopper Award for outstanding young computer professional and appeared on the MIT TR-100 list of top innovators under the age of 35.

Please describe one or two of the most exciting projects you have worked on during your career. What were the biggest challenges?

When I was a researcher at AT&T, a group of us designed a new way to manage routing in Internet Service Provider backbone networks.
Traditionally, network operators configure each router individually, and these routers run distributed protocols to compute paths through the network. We believed that network management would be simpler and more flexible if network operators could exercise direct control over how routers forward traffic based on a network-wide view of the topology and traffic. The Routing Control Platform (RCP) we designed and built could compute the routes for all of AT&T's backbone on a single commodity computer, and could control legacy routers without modification. To me, this project was exciting because we had a provocative idea, a working system, and ultimately a real deployment in an operational network. Fast forward a few years, and software-defined networking (SDN) has become a mainstream technology, and standard protocols (like OpenFlow) have made it much easier to tell the underlying switches what to do.

How do you think software-defined networking should evolve in the future?

In a major break from the past, control-plane software can be created by many different programmers, not just at companies selling network equipment. Yet, unlike the applications running on a server or a smart phone, controller apps must work together to handle the same traffic. Network operators do not want to perform load balancing on some traffic and routing on other traffic; instead, they want to perform load balancing and routing, together, on the same traffic. Future SDN controller platforms should offer good programming abstractions for composing multiple independently written controller applications together. More broadly, good programming abstractions can make it easier to create controller applications, without having to worry about low-level details like flow table entries, traffic counters, bit patterns in packet headers, and so on. Also, while an SDN controller is logically centralized, the network still consists of a distributed collection of devices. Future controllers should offer good abstractions for updating the flow tables across the network, so apps can reason about what happens to packets in flight while the devices are updated. Programming abstractions for control-plane software is an exciting area for interdisciplinary research between computer networking, distributed systems, and programming languages, with a real chance for practical impact in the years ahead.

Where do you see the future of networking and the Internet?

Networking is an exciting field because the applications and the underlying technologies change all the time. We are always reinventing ourselves! Who would have predicted even ten years ago the dominance of smart phones, allowing mobile users to access existing applications as well as new location-based services? The emergence of cloud computing is fundamentally changing the relationship between users and the applications they run, and networked sensors and actuators (the "Internet of Things") are enabling a wealth of new applications (and security vulnerabilities!). The pace of innovation is truly inspiring. The underlying network is a crucial component in all of these innovations. Yet, the network is notoriously "in the way"---limiting performance, compromising reliability, constraining applications, and complicating the deployment and management of services. We should strive to make the network of the future as invisible as the air we breathe, so it never stands in the way of new ideas and valuable services.
To do this, we need to raise the level of abstraction above individual network devices and protocols (and their attendant acronyms!), so we can reason about the network and the user's high-level goals as a whole.

What people inspired you professionally?

I've long been inspired by Sally Floyd at the International Computer Science Institute. Her research is always purposeful, focusing on the important challenges facing the Internet. She digs deeply into hard questions until she understands the problem and the space of solutions completely, and she devotes serious energy into "making things happen," such as pushing her ideas into protocol standards and network equipment. Also, she gives back to the community, through professional service in numerous standards and research organizations and by creating tools (such as the widely used ns-2 and ns-3 simulators) that enable other researchers to succeed. She retired in 2009 but her influence on the field will be felt for years to come.

What are your recommendations for students who want careers in computer science and networking?

Networking is an inherently interdisciplinary field. Breakthroughs in networking come from applying techniques from such diverse areas as queuing theory, game theory, control theory, distributed systems, network optimization, programming languages, machine learning, algorithms, data structures, and so on. I think that becoming conversant in a related field, or collaborating closely with experts in those fields, is a wonderful way to put networking on a stronger foundation, so we can learn how to build networks that are worthy of society's trust. Beyond the theoretical disciplines, networking is exciting because we create real artifacts that real people use. Mastering how to design and build systems---by gaining experience in operating systems, computer architecture, and so on---is another fantastic way to amplify your knowledge of networking to help make the world a better place.

Chapter 6 The Link Layer and LANs

In the previous two chapters we learned that the network layer provides a communication service between any two network hosts. Between the two hosts, datagrams travel over a series of communication links, some wired and some wireless, starting at the source host, passing through a series of packet switches (switches and routers) and ending at the destination host. As we continue down the protocol stack, from the network layer to the link layer, we naturally wonder how packets are sent across the individual links that make up the end-to-end communication path. How are the network-layer datagrams encapsulated in the link-layer frames for transmission over a single link? Are different link-layer protocols used in the different links along the communication path? How are transmission conflicts in broadcast links resolved? Is there addressing at the link layer and, if so, how does the link-layer addressing operate with the network-layer addressing we learned about in Chapter 4? And what exactly is the difference between a switch and a router? We'll answer these and other important questions in this chapter.

In discussing the link layer, we'll see that there are two fundamentally different types of link-layer channels. The first type consists of broadcast channels, which connect multiple hosts in wireless LANs, satellite networks, and hybrid fiber-coaxial cable (HFC) access networks.
Since many hosts are connected to the same broadcast communication channel, a so-called medium access protocol is needed to coordinate frame transmission. In some cases, a central controller may be used to coordinate transmissions; in other cases, the hosts themselves coordinate transmissions. The second type of link-layer channel is the point-to-point communication link, such as that often found between two routers connected by a long-distance link, or between a user's office computer and the nearby Ethernet switch to which it is connected. Coordinating access to a point-to-point link is simpler; the reference material on this book's Web site has a detailed discussion of the Point-to-Point Protocol (PPP), which is used in settings ranging from dial-up service over a telephone line to high-speed point-to-point frame transport over fiber-optic links.

We'll explore several important link-layer concepts and technologies in this chapter. We'll dive deeper into error detection and correction, a topic we touched on briefly in Chapter 3. We'll consider multiple access networks and switched LANs, including Ethernet---by far the most prevalent wired LAN technology. We'll also look at virtual LANs, and data center networks. Although WiFi, and more generally wireless LANs, are link-layer topics, we'll postpone our study of these important topics until Chapter 7.

6.1 Introduction to the Link Layer

Let's begin with some important terminology. We'll find it convenient in this chapter to refer to any device that runs a link-layer (i.e., layer 2) protocol as a node. Nodes include hosts, routers, switches, and WiFi access points (discussed in Chapter 7). We will also refer to the communication channels that connect adjacent nodes along the communication path as links. In order for a datagram to be transferred from source host to destination host, it must be moved over each of the individual links in the end-to-end path. As an example, in the company network shown at the bottom of Figure 6.1, consider sending a datagram from one of the wireless hosts to one of the servers. This datagram will actually pass through six links: a WiFi link between the sending host and a WiFi access point, an Ethernet link between the access point and a link-layer switch, a link between the link-layer switch and a router, a link between the two routers, an Ethernet link between the router and a link-layer switch, and finally an Ethernet link between the switch and the server. Over a given link, a transmitting node encapsulates the datagram in a link-layer frame and transmits the frame into the link.

In order to gain further insight into the link layer and how it relates to the network layer, let's consider a transportation analogy. Consider a travel agent who is planning a trip for a tourist traveling from Princeton, New Jersey, to Lausanne, Switzerland. The travel agent decides that it is most convenient for the tourist to take a limousine from Princeton to JFK airport, then a plane from JFK airport to Geneva's airport, and finally a train from Geneva's airport to Lausanne's train station.
Once the travel agent makes the three reservations, it is the responsibility of the Princeton limousine company to get the tourist from Princeton to JFK; it is the responsibility of the airline company to get the tourist from JFK to Geneva; and it is the responsibility of the Swiss train service to get the tourist from Geneva to Lausanne.

Figure 6.1 Six link-layer hops between wireless host and server

Each of the three segments of the trip is "direct" between two "adjacent" locations. Note that the three transportation segments are managed by different companies and use entirely different transportation modes (limousine, plane, and train). Although the transportation modes are different, they each provide the basic service of moving passengers from one location to an adjacent location. In this transportation analogy, the tourist is a datagram, each transportation segment is a link, the transportation mode is a link-layer protocol, and the travel agent is a routing protocol.

6.1.1 The Services Provided by the Link Layer

Although the basic service of any link layer is to move a datagram from one node to an adjacent node over a single communication link, the details of the provided service can vary from one link-layer protocol to the next. Possible services that can be offered by a link-layer protocol include:

Framing. Almost all link-layer protocols encapsulate each network-layer datagram within a link-layer frame before transmission over the link. A frame consists of a data field, in which the network-layer datagram is inserted, and a number of header fields. The structure of the frame is specified by the link-layer protocol. We'll see several different frame formats when we examine specific link-layer protocols in the second half of this chapter.

Link access. A medium access control (MAC) protocol specifies the rules by which a frame is transmitted onto the link. For point-to-point links that have a single sender at one end of the link and a single receiver at the other end of the link, the MAC protocol is simple (or nonexistent)---the sender can send a frame whenever the link is idle. The more interesting case is when multiple nodes share a single broadcast link---the so-called multiple access problem. Here, the MAC protocol serves to coordinate the frame transmissions of the many nodes.

Reliable delivery. When a link-layer protocol provides reliable delivery service, it guarantees to move each network-layer datagram across the link without error. Recall that certain transport-layer protocols (such as TCP) also provide a reliable delivery service. Similar to a transport-layer reliable delivery service, a link-layer reliable delivery service can be achieved with acknowledgments and retransmissions (see Section 3.4). A link-layer reliable delivery service is often used for links that are prone to high error rates, such as a wireless link, with the goal of correcting an error locally---on the link where the error occurs---rather than forcing an end-to-end retransmission of the data by a transport- or application-layer protocol. However, link-layer reliable delivery can be considered an unnecessary overhead for low bit-error links, including fiber, coax, and many twisted-pair copper links. For this reason, many wired link-layer protocols do not provide a reliable delivery service.

Error detection and correction.
The link-layer hardware in a receiving node can incorrectly decide that a bit in a frame is zero when it was transmitted as a one, and vice versa. Such bit errors are introduced by signal attenuation and electromagnetic noise. Because there is no need to forward a datagram that has an error, many link-layer protocols provide a mechanism to detect such bit errors. This is done by having the transmitting node include error-detection bits in the frame, and having the receiving node perform an error check. Recall from Chapters 3 and 4 that the Internet's transport layer and network layer also provide a limited form of error detection---the Internet checksum. Error detection in the link layer is usually more sophisticated and is implemented in hardware. Error correction is similar to error detection, except that a receiver not only detects when bit errors have occurred in the frame but also determines exactly where in the frame the errors have occurred (and then corrects these errors).

6.1.2 Where Is the Link Layer Implemented?

Before diving into our detailed study of the link layer, let's conclude this introduction by considering the question of where the link layer is implemented. We'll focus here on an end system, since we learned in Chapter 4 that the link layer is implemented in a router's line card. Is a host's link layer implemented in hardware or software? Is it implemented on a separate card or chip, and how does it interface with the rest of a host's hardware and operating system components?

Figure 6.2 shows a typical host architecture. For the most part, the link layer is implemented in a network adapter, also sometimes known as a network interface card (NIC). At the heart of the network adapter is the link-layer controller, usually a single, special-purpose chip that implements many of the link-layer services (framing, link access, error detection, and so on). Thus, much of a link-layer controller's functionality is implemented in hardware. For example, Intel's 710 adapter \[Intel 2016\] implements the Ethernet protocols we'll study in Section 6.5; the Atheros AR5006 \[Atheros 2016\] controller implements the 802.11 WiFi protocols we'll study in Chapter 7. Until the late 1990s, most network adapters were physically separate cards (such as a PCMCIA card or a plug-in card fitting into a PC's PCI card slot) but increasingly, network adapters are being integrated onto the host's motherboard---a so-called LAN-on-motherboard configuration.

On the sending side, the controller takes a datagram that has been created and stored in host memory by the higher layers of the protocol stack, encapsulates the datagram in a link-layer frame (filling in the frame's various fields), and then transmits the frame into the communication link, following the link-access protocol. On the receiving side, a controller receives the entire frame, and extracts the network-layer datagram. If the link layer performs error detection, then it is the sending controller that sets the error-detection bits in the frame header and it is the receiving controller that performs error detection. Figure 6.2 shows a network adapter attaching to a host's bus (e.g., a PCI or PCI-X bus), where it looks much like any other I/O device to the other host components.

Figure 6.2 Network adapter: Its relationship to other host components and to protocol stack functionality
Figure 6.2 also shows that while most of the link layer is implemented in hardware, part of the link layer is implemented in software that runs on the host's CPU. The software components of the link layer implement higher-level link-layer functionality such as assembling link-layer addressing information and activating the controller hardware. On the receiving side, link-layer software responds to controller interrupts (e.g., due to the receipt of one or more frames), handling error conditions and passing a datagram up to the network layer. Thus, the link layer is a combination of hardware and software---the place in the protocol stack where software meets hardware. \[Intel 2016\] provides a readable overview (as well as a detailed description) of the XL710 controller from a software-programming point of view.

6.2 Error-Detection and -Correction Techniques

In the previous section, we noted that bit-level error detection and correction---detecting and correcting the corruption of bits in a link-layer frame sent from one node to another physically connected neighboring node---are two services often provided by the link layer. We saw in Chapter 3 that error-detection and -correction services are also often offered at the transport layer. In this section, we'll examine a few of the simplest techniques that can be used to detect and, in some cases, correct such bit errors. A full treatment of the theory and implementation of this topic is itself the topic of many textbooks (for example, \[Schwartz 1980\] or \[Bertsekas 1991\]), and our treatment here is necessarily brief. Our goal here is to develop an intuitive feel for the capabilities that error-detection and -correction techniques provide and to see how a few simple techniques work and are used in practice in the link layer.

Figure 6.3 illustrates the setting for our study. At the sending node, data, D, to be protected against bit errors is augmented with error-detection and -correction bits (EDC). Typically, the data to be protected includes not only the datagram passed down from the network layer for transmission across the link, but also link-level addressing information, sequence numbers, and other fields in the link frame header. Both D and EDC are sent to the receiving node in a link-level frame. At the receiving node, a sequence of bits, D′ and EDC′, is received. Note that D′ and EDC′ may differ from the original D and EDC as a result of in-transit bit flips.

Figure 6.3 Error-detection and -correction scenario

The receiver's challenge is to determine whether or not D′ is the same as the original D, given that it has only received D′ and EDC′. The exact wording of the receiver's decision in Figure 6.3 (we ask whether an error is detected, not whether an error has occurred!) is important. Error-detection and -correction techniques allow the receiver to sometimes, but not always, detect that bit errors have occurred. Even with the use of error-detection bits there still may be undetected bit errors; that is, the receiver may be unaware that the received information contains bit errors. As a consequence, the receiver might deliver a corrupted datagram to the network layer, or be unaware that the contents of a field in the frame's header has been corrupted. We thus want to choose an error-detection scheme that keeps the probability of such occurrences small.
Generally, more sophisticated error-detection and -correction techniques (that is, those that have a smaller probability of allowing undetected bit errors) incur a larger overhead---more computation is needed to compute and transmit a larger number of error-detection and -correction bits. Let's now examine three techniques for detecting errors in the transmitted data---parity checks (to illustrate the basic ideas behind error detection and correction), checksumming methods (which are more typically used in the transport layer), and cyclic redundancy checks (which are more typically used in the link layer in an adapter).

6.2.1 Parity Checks

Perhaps the simplest form of error detection is the use of a single parity bit. Suppose that the information to be sent, D in Figure 6.4, has d bits. In an even parity scheme, the sender simply includes one additional bit and chooses its value such that the total number of 1s in the d+1 bits (the original information plus a parity bit) is even. For odd parity schemes, the parity bit value is chosen such that there is an odd number of 1s. Figure 6.4 illustrates an even parity scheme, with the single parity bit being stored in a separate field.

Figure 6.4 One-bit even parity

Receiver operation is also simple with a single parity bit. The receiver need only count the number of 1s in the received d+1 bits. If an odd number of 1-valued bits are found with an even parity scheme, the receiver knows that at least one bit error has occurred. More precisely, it knows that some odd number of bit errors have occurred. But what happens if an even number of bit errors occur? You should convince yourself that this would result in an undetected error. If the probability of bit errors is small and errors can be assumed to occur independently from one bit to the next, the probability of multiple bit errors in a packet would be extremely small. In this case, a single parity bit might suffice. However, measurements have shown that, rather than occurring independently, errors are often clustered together in "bursts." Under burst error conditions, the probability of undetected errors in a frame protected by single-bit parity can approach 50 percent \[Spragins 1991\]. Clearly, a more robust error-detection scheme is needed (and, fortunately, is used in practice!). But before examining error-detection schemes that are used in practice, let's consider a simple generalization of one-bit parity that will provide us with insight into error-correction techniques.

Figure 6.5 shows a two-dimensional generalization of the single-bit parity scheme. Here, the d bits in D are divided into i rows and j columns. A parity value is computed for each row and for each column. The resulting i+j+1 parity bits comprise the link-layer frame's error-detection bits. Suppose now that a single bit error occurs in the original d bits of information. With this two-dimensional parity scheme, the parity of both the column and the row containing the flipped bit will be in error. The receiver can thus not only detect the fact that a single bit error has occurred, but can use the column and row indices of the column and row with parity errors to actually identify the bit that was corrupted and correct that error! Figure 6.5 shows an example in which the 1-valued bit in position (2,2) is corrupted and switched to a 0---an error that is both detectable and correctable at the receiver.
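To make the two-dimensional scheme concrete, here is a minimal sketch in Python (our own illustration, not code from a real adapter; the function names and the row-major layout of the block are our choices). It computes even parity over the rows and columns of an i-by-j block and, at the receiver, uses the one mismatched row and column to locate a single flipped bit, just as in Figure 6.5:

```python
def parities(bits, i, j):
    """Even-parity bit for each row and each column of an i-by-j block (row-major)."""
    rows = [sum(bits[r * j:(r + 1) * j]) % 2 for r in range(i)]
    cols = [sum(bits[c::j]) % 2 for c in range(j)]
    return rows, cols

def locate_single_error(received, i, j, sent_rows, sent_cols):
    """Return (row, col) of a flipped bit, or None; assumes at most one bit error."""
    rows, cols = parities(received, i, j)
    bad_rows = [r for r in range(i) if rows[r] != sent_rows[r]]
    bad_cols = [c for c in range(j) if cols[c] != sent_cols[c]]
    if not bad_rows and not bad_cols:
        return None                      # parities match: no error detected
    return bad_rows[0], bad_cols[0]      # one bad row, one bad column pinpoint the bit

# Sender computes parities over a 2-by-3 block of data bits
data = [1, 0, 1,
        0, 1, 1]
sent_rows, sent_cols = parities(data, 2, 3)

# The channel flips the bit at row 1, column 1; the receiver locates it
received = data[:]
received[1 * 3 + 1] ^= 1
print(locate_single_error(received, 2, 3, sent_rows, sent_cols))   # (1, 1)
```

Flipping `received[r * j + c]` back corrects the error, which is exactly the forward error correction discussed next.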
Although our discussion has focused on the original d bits of information, a single error in the parity bits themselves is also detectable and correctable. Two-dimensional parity can also detect (but not correct!) any combination of two errors in a packet. Other properties of the two-dimensional parity scheme are explored in the problems at the end of the chapter.

Figure 6.5 Two-dimensional even parity

The ability of the receiver to both detect and correct errors is known as forward error correction (FEC). These techniques are commonly used in audio storage and playback devices such as audio CDs. In a network setting, FEC techniques can be used by themselves, or in conjunction with link-layer ARQ techniques similar to those we examined in Chapter 3. FEC techniques are valuable because they can decrease the number of sender retransmissions required. Perhaps more important, they allow for immediate correction of errors at the receiver. This avoids having to wait for the round-trip propagation delay needed for the sender to receive a NAK packet and for the retransmitted packet to propagate back to the receiver---a potentially important advantage for real-time network applications \[Rubenstein 1998\] or links (such as deep-space links) with long propagation delays. Research examining the use of FEC in error-control protocols includes \[Biersack 1992; Nonnenmacher 1998; Byers 1998; Shacham 1990\].

6.2.2 Checksumming Methods

In checksumming techniques, the d bits of data in Figure 6.4 are treated as a sequence of k-bit integers. One simple checksumming method is to simply sum these k-bit integers and use the resulting sum as the error-detection bits. The Internet checksum is based on this approach---bytes of data are treated as 16-bit integers and summed. The 1s complement of this sum then forms the Internet checksum that is carried in the segment header. As discussed in Section 3.3, the receiver checks the checksum by taking the 1s complement of the sum of the received data (including the checksum) and checking whether the result is all 1 bits. If any of the bits are 0, an error is indicated. RFC 1071 discusses the Internet checksum algorithm and its implementation in detail. In the TCP and UDP protocols, the Internet checksum is computed over all fields (header and data fields included). In IP, the checksum is computed over the IP header (since the UDP or TCP segment has its own checksum). In other protocols, for example, XTP \[Strayer 1992\], one checksum is computed over the header and another checksum is computed over the entire packet.

Checksumming methods require relatively little packet overhead. For example, the checksums in TCP and UDP use only 16 bits. However, they provide relatively weak protection against errors as compared with cyclic redundancy check, which is discussed below and which is often used in the link layer. A natural question at this point is, Why is checksumming used at the transport layer and cyclic redundancy check used at the link layer? Recall that the transport layer is typically implemented in software in a host as part of the host's operating system. Because transport-layer error detection is implemented in software, it is important to have a simple and fast error-detection scheme such as checksumming. On the other hand, error detection at the link layer is implemented in dedicated hardware in adapters, which can rapidly perform the more complex CRC operations.
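To see the 16-bit 1s-complement arithmetic in action, here is a simplified sketch of the Internet checksum (a toy version for clarity; production implementations use the optimizations surveyed in RFC 1071 and in the Feldmeier paper cited below):

```python
def internet_checksum(data: bytes) -> int:
    """1s complement of the 1s-complement sum of the data, taken 16 bits at a time."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # next 16-bit integer
        total = (total & 0xFFFF) + (total >> 16)  # wrap any carry back into the sum
    return ~total & 0xFFFF

segment = b"\x12\x34\x56\x78"
checksum = internet_checksum(segment)

# Receiver: summing the data plus the checksum must give all 1s, so the
# complement of that sum is zero whenever no error is detected
residue = internet_checksum(segment + checksum.to_bytes(2, "big"))
print(hex(checksum), residue == 0)                # 0x9753 True
```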
Feldmeier \[Feldmeier 1995\] presents fast software implementation techniques for not only weighted checksum codes, but CRC (see below) and other codes as well.

6.2.3 Cyclic Redundancy Check (CRC)

An error-detection technique used widely in today's computer networks is based on cyclic redundancy check (CRC) codes. CRC codes are also known as polynomial codes, since it is possible to view the bit string to be sent as a polynomial whose coefficients are the 0 and 1 values in the bit string, with operations on the bit string interpreted as polynomial arithmetic.

CRC codes operate as follows. Consider the d-bit piece of data, D, that the sending node wants to send to the receiving node. The sender and receiver must first agree on an r+1 bit pattern, known as a generator, which we will denote as G. We will require that the most significant (leftmost) bit of G be a 1. The key idea behind CRC codes is shown in Figure 6.6. For a given piece of data, D, the sender will choose r additional bits, R, and append them to D such that the resulting d+r bit pattern (interpreted as a binary number) is exactly divisible by G (i.e., has no remainder) using modulo-2 arithmetic. The process of error checking with CRCs is thus simple: The receiver divides the d+r received bits by G. If the remainder is nonzero, the receiver knows that an error has occurred; otherwise the data is accepted as being correct.

All CRC calculations are done in modulo-2 arithmetic without carries in addition or borrows in subtraction. This means that addition and subtraction are identical, and both are equivalent to the bitwise exclusive-or (XOR) of the operands. Thus, for example,

1011 XOR 0101 = 1110
1001 XOR 1101 = 0100

Also, we similarly have

1011 − 0101 = 1110
1001 − 1101 = 0100

Multiplication and division are the same as in base-2 arithmetic, except that any required addition or subtraction is done without carries or borrows. As in regular binary arithmetic, multiplication by 2^k left shifts a bit pattern by k places. Thus, given D and R, the quantity D · 2^r XOR R yields the d+r bit pattern shown in Figure 6.6. We'll use this algebraic characterization of the d+r bit pattern from Figure 6.6 in our discussion below.

Figure 6.6 CRC

Let us now turn to the crucial question of how the sender computes R. Recall that we want to find R such that there is an n such that

D · 2^r XOR R = nG

That is, we want to choose R such that G divides into D · 2^r XOR R without remainder. If we XOR (that is, add modulo-2, without carry) R to both sides of the above equation, we get

D · 2^r = nG XOR R

This equation tells us that if we divide D · 2^r by G, the value of the remainder is precisely R. In other words, we can calculate R as

R = remainder(D · 2^r / G)

Figure 6.7 illustrates this calculation for the case of D = 101110, d = 6, G = 1001, and r = 3. The 9 bits transmitted in this case are 101110 011. You should check these calculations for yourself and also check that indeed D · 2^r = 101011 · G XOR R.

Figure 6.7 A sample CRC calculation

International standards have been defined for 8-, 12-, 16-, and 32-bit generators, G. The CRC-32 32-bit standard, which has been adopted in a number of link-level IEEE protocols, uses a generator of

G_CRC-32 = 100000100110000010001110110110111

Each of the CRC standards can detect burst errors of fewer than r+1 bits. (This means that all consecutive bit errors of r bits or fewer will be detected.)
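The remainder computation itself is just long division with XOR, and is easy to sketch in software (a toy bit-string version of our own for clarity; real adapters use hardware shift registers or the table-driven methods in \[Williams 1993\]). It reproduces the Figure 6.7 example:

```python
def mod2_remainder(bits: str, G: str) -> str:
    """Remainder of bits divided by G, using modulo-2 (XOR) long division."""
    b = [int(x) for x in bits]
    g = [int(x) for x in G]
    for i in range(len(b) - len(g) + 1):
        if b[i] == 1:                    # subtract (XOR) G at this position
            for j in range(len(g)):
                b[i + j] ^= g[j]
    return "".join(map(str, b[-(len(g) - 1):]))   # last r bits are the remainder

D, G = "101110", "1001"
r = len(G) - 1
R = mod2_remainder(D + "0" * r, G)   # R = remainder(D * 2^r / G)
print(R)                             # '011', so the sender transmits 101110 011

# Receiver: dividing the received d+r bits by G must leave a zero remainder
assert mod2_remainder(D + R, G) == "0" * r
```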
Furthermore, under appropriate assumptions, a burst of length greater than r+1 bits is detected with probability 1 − 0.5^r. Also, each of the CRC standards can detect any odd number of bit errors. See \[Williams 1993\] for a discussion of implementing CRC checks. The theory behind CRC codes and even more powerful codes is beyond the scope of this text. The text \[Schwartz 1980\] provides an excellent introduction to this topic.

6.3 Multiple Access Links and Protocols

In the introduction to this chapter, we noted that there are two types of network links: point-to-point links and broadcast links. A point-to-point link consists of a single sender at one end of the link and a single receiver at the other end of the link. Many link-layer protocols have been designed for point-to-point links; the point-to-point protocol (PPP) and high-level data link control (HDLC) are two such protocols. The second type of link, a broadcast link, can have multiple sending and receiving nodes all connected to the same, single, shared broadcast channel. The term broadcast is used here because when any one node transmits a frame, the channel broadcasts the frame and each of the other nodes receives a copy. Ethernet and wireless LANs are examples of broadcast link-layer technologies. In this section we'll take a step back from specific link-layer protocols and first examine a problem of central importance to the link layer: how to coordinate the access of multiple sending and receiving nodes to a shared broadcast channel---the multiple access problem. Broadcast channels are often used in LANs, networks that are geographically concentrated in a single building (or on a corporate or university campus). Thus, we'll look at how multiple access channels are used in LANs at the end of this section.

We are all familiar with the notion of broadcasting---television has been using it since its invention. But traditional television is a one-way broadcast (that is, one fixed node transmitting to many receiving nodes), while nodes on a computer network broadcast channel can both send and receive. Perhaps a more apt human analogy for a broadcast channel is a cocktail party, where many people gather in a large room (the air providing the broadcast medium) to talk and listen. A second good analogy is something many readers will be familiar with---a classroom---where teacher(s) and student(s) similarly share the same, single, broadcast medium. A central problem in both scenarios is that of determining who gets to talk (that is, transmit into the channel) and when. As humans, we've evolved an elaborate set of protocols for sharing the broadcast channel: "Give everyone a chance to speak." "Don't speak until you are spoken to." "Don't monopolize the conversation." "Raise your hand if you have a question." "Don't interrupt when someone is speaking." "Don't fall asleep when someone is talking."

Computer networks similarly have protocols---so-called multiple access protocols---by which nodes regulate their transmission into the shared broadcast channel. As shown in Figure 6.8, multiple access protocols are needed in a wide variety of network settings, including both wired and wireless access networks, and satellite networks. Although technically each node accesses the broadcast channel through its adapter, in this section we will refer to the node as the sending and receiving device.

Figure 6.8 Various multiple access channels
In practice, hundreds or even thousands of nodes can directly communicate over a broadcast channel. Because all nodes are capable of transmitting frames, more than two nodes can transmit frames at the same time. When this happens, all of the nodes receive multiple frames at the same time; that is, the transmitted frames collide at all of the receivers. Typically, when there is a collision, none of the receiving nodes can make any sense of any of the frames that were transmitted; in a sense, the signals of the colliding frames become inextricably tangled together. Thus, all the frames involved in the collision are lost, and the broadcast channel is wasted during the collision interval. Clearly, if many nodes want to transmit frames frequently, many transmissions will result in collisions, and much of the bandwidth of the broadcast channel will be wasted.

In order to ensure that the broadcast channel performs useful work when multiple nodes are active, it is necessary to somehow coordinate the transmissions of the active nodes. This coordination job is the responsibility of the multiple access protocol. Over the past 40 years, thousands of papers and hundreds of PhD dissertations have been written on multiple access protocols; a comprehensive survey of the first 20 years of this body of work is \[Rom 1990\]. Furthermore, active research in multiple access protocols continues due to the continued emergence of new types of links, particularly new wireless links. Over the years, dozens of multiple access protocols have been implemented in a variety of link-layer technologies. Nevertheless, we can classify just about any multiple access protocol as belonging to one of three categories: channel partitioning protocols, random access protocols, and taking-turns protocols. We'll cover these categories of multiple access protocols in the following three subsections.

Let's conclude this overview by noting that, ideally, a multiple access protocol for a broadcast channel of rate R bits per second should have the following desirable characteristics:

1. When only one node has data to send, that node has a throughput of R bps.

2. When M nodes have data to send, each of these nodes has a throughput of R/M bps. This need not necessarily imply that each of the M nodes always has an instantaneous rate of R/M, but rather that each node should have an average transmission rate of R/M over some suitably defined interval of time.

3. The protocol is decentralized; that is, there is no master node that represents a single point of failure for the network.

4. The protocol is simple, so that it is inexpensive to implement.

6.3.1 Channel Partitioning Protocols

Recall from our early discussion back in Section 1.3 that time-division multiplexing (TDM) and frequency-division multiplexing (FDM) are two techniques that can be used to partition a broadcast channel's bandwidth among all nodes sharing that channel.

Figure 6.9 A four-node TDM and FDM example

As an example, suppose the channel supports N nodes and that the transmission rate of the channel is R bps. TDM divides time into time frames and further divides each time frame into N time slots. (The TDM time frame should not be confused with the link-layer unit of data exchanged between sending and receiving adapters, which is also called a frame. In order to reduce confusion, in this subsection we'll refer to the link-layer unit of data exchanged as a packet.)
Each time slot is then assigned to one of the N nodes. Whenever a node has a packet to send, it transmits the packet's bits during its assigned time slot in the revolving TDM frame. Typically, slot sizes are chosen so that a single packet can be transmitted during a slot time. Figure 6.9 shows a simple four-node TDM example. Returning to our cocktail party analogy, a TDM-regulated cocktail party would allow one partygoer to speak for a fixed period of time, then allow another partygoer to speak for the same amount of time, and so on. Once everyone had had a chance to talk, the pattern would repeat. TDM is appealing because it eliminates collisions and is perfectly fair: Each node gets a dedicated transmission rate of R/N bps during each frame time. However, it has two major drawbacks. First, a node is limited to an average rate of R/N bps even when it is the only node with packets to send. A second drawback is that a node must always wait for its turn in the transmission sequence---again, even when it is the only node with a frame to send. Imagine the partygoer who is the only one with anything to say (and imagine that this is the even rarer circumstance where everyone wants to hear what that one person has to say). Clearly, TDM would be a poor choice for a multiple access protocol for this particular party.

While TDM shares the broadcast channel in time, FDM divides the R bps channel into different frequencies (each with a bandwidth of R/N) and assigns each frequency to one of the N nodes. FDM thus creates N smaller channels of R/N bps out of the single, larger R bps channel. FDM shares both the advantages and drawbacks of TDM. It avoids collisions and divides the bandwidth fairly among the N nodes. However, FDM also shares a principal disadvantage with TDM---a node is limited to a bandwidth of R/N, even when it is the only node with packets to send.

A third channel partitioning protocol is code division multiple access (CDMA). While TDM and FDM assign time slots and frequencies, respectively, to the nodes, CDMA assigns a different code to each node. Each node then uses its unique code to encode the data bits it sends. If the codes are chosen carefully, CDMA networks have the wonderful property that different nodes can transmit simultaneously and yet have their respective receivers correctly receive a sender's encoded data bits (assuming the receiver knows the sender's code) in spite of interfering transmissions by other nodes. CDMA has been used in military systems for some time (due to its anti-jamming properties) and now has widespread civilian use, particularly in cellular telephony. Because CDMA's use is so tightly tied to wireless channels, we'll save our discussion of the technical details of CDMA until Chapter 7. For now, it will suffice to know that CDMA codes, like time slots in TDM and frequencies in FDM, can be allocated to the multiple access channel users.

6.3.2 Random Access Protocols

The second broad class of multiple access protocols is random access protocols. In a random access protocol, a transmitting node always transmits at the full rate of the channel, namely, R bps. When there is a collision, each node involved in the collision repeatedly retransmits its frame (that is, packet) until its frame gets through without a collision. But when a node experiences a collision, it doesn't necessarily retransmit the frame right away. Instead, it waits a random delay before retransmitting the frame.
Each node involved in a collision chooses independent random delays. Because the random delays are independently chosen, it is possible that one of the nodes will pick a delay that is sufficiently less than the delays of the other colliding nodes and will therefore be able to sneak its frame into the channel without a collision.

There are dozens if not hundreds of random access protocols described in the literature \[Rom 1990; Bertsekas 1991\]. In this section we'll describe a few of the most commonly used random access protocols---the ALOHA protocols \[Abramson 1970; Abramson 1985; Abramson 2009\] and the carrier sense multiple access (CSMA) protocols \[Kleinrock 1975b\]. Ethernet \[Metcalfe 1976\] is a popular and widely deployed CSMA protocol.

Slotted ALOHA

Let's begin our study of random access protocols with one of the simplest random access protocols, the slotted ALOHA protocol. In our description of slotted ALOHA, we assume the following: All frames consist of exactly L bits. Time is divided into slots of size L/R seconds (that is, a slot equals the time to transmit one frame). Nodes start to transmit frames only at the beginnings of slots. The nodes are synchronized so that each node knows when the slots begin. If two or more frames collide in a slot, then all the nodes detect the collision event before the slot ends.

Let p be a probability, that is, a number between 0 and 1. The operation of slotted ALOHA in each node is simple: When the node has a fresh frame to send, it waits until the beginning of the next slot and transmits the entire frame in the slot. If there isn't a collision, the node has successfully transmitted its frame and thus need not consider retransmitting the frame. (The node can prepare a new frame for transmission, if it has one.) If there is a collision, the node detects the collision before the end of the slot. The node retransmits its frame in each subsequent slot with probability p until the frame is transmitted without a collision.

By retransmitting with probability p, we mean that the node effectively tosses a biased coin; the event heads corresponds to "retransmit," which occurs with probability p. The event tails corresponds to "skip the slot and toss the coin again in the next slot"; this occurs with probability (1−p). All nodes involved in the collision toss their coins independently.

Slotted ALOHA would appear to have many advantages. Unlike channel partitioning, slotted ALOHA allows a node to transmit continuously at the full rate, R, when that node is the only active node. (A node is said to be active if it has frames to send.) Slotted ALOHA is also highly decentralized, because each node detects collisions and independently decides when to retransmit. (Slotted ALOHA does, however, require the slots to be synchronized in the nodes; shortly we'll discuss an unslotted version of the ALOHA protocol, as well as CSMA protocols, none of which require such synchronization.) Slotted ALOHA is also an extremely simple protocol.

Slotted ALOHA works well when there is only one active node, but how efficient is it when there are multiple active nodes? There are two possible efficiency concerns here.

Figure 6.10 Nodes 1, 2, and 3 collide in the first slot. Node 2 finally succeeds in the fourth slot, node 1 in the eighth slot, and node 3 in the ninth slot
First, as shown in Figure 6.10, when there are multiple +active nodes, a certain fraction of the slots will have collisions and +will therefore be "wasted." The second concern is that another fraction +of the slots will be empty because all active nodes refrain from +transmitting as a result of the probabilistic transmission policy. The +only "unwasted" slots will be those in which exactly one node transmits. +A slot in which exactly one node transmits is said to be a successful +slot. The efficiency of a slotted multiple access protocol is defined to +be the long-run fraction of successful slots in the case when there are +a large number of active nodes, each always having a large number of +frames to send. Note that if no form of access control were used, and +each node were to immediately retransmit after each collision, the +efficiency would be zero. Slotted ALOHA clearly increases the efficiency +beyond zero, but by how much? We now proceed to outline the derivation +of the maximum efficiency of slotted ALOHA. To keep this derivation +simple, let's modify the protocol a little and assume that each node +attempts to transmit a frame in each slot with probability p. (That is, +we assume that each node always has a frame to send and that the node +transmits with probability p for a fresh frame as well as for a frame +that has already suffered a collision.) Suppose there are N nodes. Then +the probability that a given slot is a successful slot is the +probability that one of the nodes transmits and that the remaining N−1 +nodes do not transmit. The probability that a given node transmits is p; +the probability that the remaining nodes do not transmit is (1−p)N−1. +Therefore the probability a given node has a success is p(1−p)N−1. +Because there are N nodes, the probability that any one of the N nodes +has a success is Np(1−p)N−1. Thus, when there are N active nodes, the +efficiency of slotted ALOHA is Np(1−p)N−1. To obtain the maximum +efficiency for N active nodes, we have to find the p\* that maximizes +this expression. (See the + +homework problems for a general outline of this derivation.) And to +obtain the maximum efficiency for a large number of active nodes, we +take the limit of Np*(1−p*)N−1 as N approaches infinity. (Again, see the +homework problems.) After performing these calculations, we'll find that +the maximum efficiency of the protocol is given by 1/e=0.37. That is, +when a large number of nodes have many frames to transmit, then (at +best) only 37 percent of the slots do useful work. Thus the effective +transmission rate of the channel is not R bps but only 0.37 R bps! A +similar analysis also shows that 37 percent of the slots go empty and 26 +percent of slots have collisions. Imagine the poor network administrator +who has purchased a 100-Mbps slotted ALOHA system, expecting to be able +to use the network to transmit data among a large number of users at an +aggregate rate of, say, 80 Mbps! Although the channel is capable of +transmitting a given frame at the full channel rate of 100 Mbps, in the +long run, the successful throughput of this channel will be less than 37 +Mbps. ALOHA The slotted ALOHA protocol required that all nodes +synchronize their transmissions to start at the beginning of a slot. The +first ALOHA protocol \[Abramson 1970\] was actually an unslotted, fully +decentralized protocol. 
In pure ALOHA, when a frame first arrives (that is, a network-layer datagram is passed down from the network layer at the sending node), the node immediately transmits the frame in its entirety into the broadcast channel. If a transmitted frame experiences a collision with one or more other transmissions, the node will then immediately (after completely transmitting its collided frame) retransmit the frame with probability p. Otherwise, the node waits for a frame transmission time. After this wait, it then transmits the frame with probability p, or waits (remaining idle) for another frame time with probability 1−p.

To determine the maximum efficiency of pure ALOHA, we focus on an individual node. We'll make the same assumptions as in our slotted ALOHA analysis and take the frame transmission time to be the unit of time. At any given time, the probability that a node is transmitting a frame is p. Suppose this frame begins transmission at time t0. As shown in Figure 6.11, in order for this frame to be successfully transmitted, no other nodes can begin their transmission in the interval of time [t0−1, t0]. Such a transmission would overlap with the beginning of the transmission of node i's frame. The probability that all other nodes do not begin a transmission in this interval is (1−p)^(N−1). Similarly, no other node can begin a transmission while node i is transmitting, as such a transmission would overlap with the latter part of node i's transmission. The probability that all other nodes do not begin a transmission in this interval is also (1−p)^(N−1). Thus, the probability that a given node has a successful transmission is p(1−p)^(2(N−1)). By taking limits as in the slotted ALOHA case, we find that the maximum efficiency of the pure ALOHA protocol is only 1/(2e)---exactly half that of slotted ALOHA. This then is the price to be paid for a fully decentralized ALOHA protocol.

Figure 6.11 Interfering transmissions in pure ALOHA

Carrier Sense Multiple Access (CSMA)

In both slotted and pure ALOHA, a node's decision to transmit is made independently of the activity of the other nodes attached to the broadcast channel. In particular, a node neither pays attention to whether another node happens to be transmitting when it begins to transmit, nor stops transmitting if another node begins to interfere with its transmission. In our cocktail party analogy, ALOHA protocols are quite like a boorish partygoer who continues to chatter away regardless of whether other people are talking. As humans, we have human protocols that allow us not only to behave with more civility, but also to decrease the amount of time spent "colliding" with each other in conversation and, consequently, to increase the amount of data we exchange in our conversations. Specifically, there are two important rules for polite human conversation:

Listen before speaking. If someone else is speaking, wait until they are finished. In the networking world, this is called carrier sensing---a node listens to the channel before transmitting. If a frame from another node is currently being transmitted into the channel, a node then waits until it detects no transmissions for a short amount of time and then begins transmission.

If someone else begins talking at the same time, stop talking. In the networking world, this is called collision detection---a transmitting node listens to the channel while it is transmitting.
If it detects that another node is transmitting an interfering frame, it stops transmitting and waits a random amount of time before repeating the sense-and-transmit-when-idle cycle.

These two rules are embodied in the family of carrier sense multiple access (CSMA) and CSMA with collision detection (CSMA/CD) protocols \[Kleinrock 1975b; Metcalfe 1976; Lam 1980; Rom 1990\]. Many variations on CSMA and CSMA/CD have been proposed. Here, we'll consider a few of the most important, and fundamental, characteristics of CSMA and CSMA/CD.

CASE HISTORY

NORM ABRAMSON AND ALOHANET

Norm Abramson, a PhD engineer, had a passion for surfing and an interest in packet switching. This combination of interests brought him to the University of Hawaii in 1969. Hawaii consists of many mountainous islands, making it difficult to install and operate land-based networks. When not surfing, Abramson thought about how to design a network that does packet switching over radio. The network he designed had one central host and several secondary nodes scattered over the Hawaiian Islands. The network had two channels, each using a different frequency band. The downlink channel broadcasted packets from the central host to the secondary hosts; and the upstream channel sent packets from the secondary hosts to the central host. In addition to sending informational packets, the central host also sent on the downstream channel an acknowledgment for each packet successfully received from the secondary hosts. Because the secondary hosts transmitted packets in a decentralized fashion, collisions on the upstream channel inevitably occurred. This observation led Abramson to devise the pure ALOHA protocol, as described in this chapter. In 1970, with continued funding from ARPA, Abramson connected his ALOHAnet to the ARPAnet. Abramson's work is important not only because it was the first example of a radio packet network, but also because it inspired Bob Metcalfe. A few years later, Metcalfe modified the ALOHA protocol to create the CSMA/CD protocol and the Ethernet LAN.

The first question that you might ask about CSMA is why, if all nodes perform carrier sensing, do collisions occur in the first place? After all, a node will refrain from transmitting whenever it senses that another node is transmitting. The answer to the question can best be illustrated using space-time diagrams \[Molle 1987\]. Figure 6.12 shows a space-time diagram of four nodes (A, B, C, D) attached to a linear broadcast bus. The horizontal axis shows the position of each node in space; the vertical axis represents time.

Figure 6.12 Space-time diagram of two CSMA nodes with colliding transmissions

At time t0, node B senses the channel is idle, as no other nodes are currently transmitting. Node B thus begins transmitting, with its bits propagating in both directions along the broadcast medium. The downward propagation of B's bits in Figure 6.12 with increasing time indicates that a nonzero amount of time is needed for B's bits actually to propagate (albeit at near the speed of light) along the broadcast medium. At time t1 (t1 > t0), node D has a frame to send. Although node B is currently transmitting at time t1, the bits being transmitted by B have yet to reach D, and thus D senses the channel idle at t1. In accordance with the CSMA protocol, D thus begins transmitting its frame. A short time later, B's transmission begins to interfere with D's transmission at D.
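The timing in Figure 6.12 is easy to reproduce with a little arithmetic. The sketch below uses made-up numbers of our own (a 500 m bus and a propagation speed of 2 × 10^8 m/s, a common rule of thumb for copper) to show why D can sense an idle channel even though B is already transmitting:

```python
PROP_SPEED = 2e8    # assumed propagation speed in the medium, meters per second

def senses_busy(t_other_started: float, t_now: float, distance_m: float) -> bool:
    """Has the other node's signal reached this node by time t_now (seconds)?"""
    return t_now >= t_other_started + distance_m / PROP_SPEED

t0, distance = 0.0, 500.0                 # B starts at t0; B and D are 500 m apart
print(distance / PROP_SPEED)              # 2.5e-06: B's bits take 2.5 microseconds to reach D

print(senses_busy(t0, 1e-6, distance))    # False: at t1 = 1 us, D senses idle and transmits
print(senses_busy(t0, 3e-6, distance))    # True: by 3 us, D would have sensed B's carrier
```

Any frame that D begins before t0 + 2.5 microseconds collides with B's frame, which is exactly the role the propagation delay plays in the discussion that follows.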
From Figure 6.12, it is evident that the end-to-end channel propagation delay of a broadcast channel---the time it takes for a signal to propagate from one of the nodes to another---will play a crucial role in determining its performance. The longer this propagation delay, the larger the chance that a carrier-sensing node is not yet able to sense a transmission that has already begun at another node in the network.

Carrier Sense Multiple Access with Collision Detection (CSMA/CD)

In Figure 6.12, nodes do not perform collision detection; both B and D continue to transmit their frames in their entirety even though a collision has occurred. When a node performs collision detection, it ceases transmission as soon as it detects a collision. Figure 6.13 shows the same scenario as in Figure 6.12, except that the two nodes each abort their transmission a short time after detecting a collision. Clearly, adding collision detection to a multiple access protocol will help protocol performance by not transmitting a useless, damaged (by interference with a frame from another node) frame in its entirety.

Figure 6.13 CSMA with collision detection

Before analyzing the CSMA/CD protocol, let us now summarize its operation from the perspective of an adapter (in a node) attached to a broadcast channel:

1. The adapter obtains a datagram from the network layer, prepares a link-layer frame, and puts the frame in an adapter buffer.

2. If the adapter senses that the channel is idle (that is, there is no signal energy entering the adapter from the channel), it starts to transmit the frame. If, on the other hand, the adapter senses that the channel is busy, it waits until it senses no signal energy and then starts to transmit the frame.

3. While transmitting, the adapter monitors for the presence of signal energy coming from other adapters using the broadcast channel.

4. If the adapter transmits the entire frame without detecting signal energy from other adapters, the adapter is finished with the frame. If, on the other hand, the adapter detects signal energy from other adapters while transmitting, it aborts the transmission (that is, it stops transmitting its frame).

5. After aborting, the adapter waits a random amount of time and then returns to step 2.

The need to wait a random (rather than fixed) amount of time is hopefully clear---if two nodes transmitted frames at the same time and then both waited the same fixed amount of time, they'd continue colliding forever. But what is a good interval of time from which to choose the random backoff time? If the interval is large and the number of colliding nodes is small, nodes are likely to wait a large amount of time (with the channel remaining idle) before repeating the sense-and-transmit-when-idle step. On the other hand, if the interval is small and the number of colliding nodes is large, it's likely that the chosen random values will be nearly the same, and transmitting nodes will again collide. What we'd like is an interval that is short when the number of colliding nodes is small, and long when the number of colliding nodes is large.

The binary exponential backoff algorithm, used in Ethernet as well as in DOCSIS cable network multiple access protocols \[DOCSIS 2011\], elegantly solves this problem. Specifically, when transmitting a frame that has already experienced n collisions, a node chooses the value of K at random from {0, 1, 2, ..., 2^n − 1}.
Thus, the more collisions experienced by a frame, the larger the interval from which K is chosen. For Ethernet, the actual amount of time a node waits is K · 512 bit times (i.e., K times the amount of time needed to send 512 bits into the Ethernet) and the maximum value that n can take is capped at 10.

Let's look at an example. Suppose that a node attempts to transmit a frame for the first time and while transmitting it detects a collision. The node then chooses K=0 with probability 0.5 or chooses K=1 with probability 0.5. If the node chooses K=0, then it immediately begins sensing the channel. If the node chooses K=1, it waits 512 bit times (e.g., 5.12 microseconds for a 100 Mbps Ethernet) before beginning the sense-and-transmit-when-idle cycle. After a second collision, K is chosen with equal probability from {0, 1, 2, 3}. After three collisions, K is chosen with equal probability from {0, 1, 2, 3, 4, 5, 6, 7}. After 10 or more collisions, K is chosen with equal probability from {0, 1, 2, ..., 1023}. Thus, the size of the sets from which K is chosen grows exponentially with the number of collisions; for this reason this algorithm is referred to as binary exponential backoff.

We also note here that each time a node prepares a new frame for transmission, it runs the CSMA/CD algorithm, not taking into account any collisions that may have occurred in the recent past. So it is possible that a node with a new frame will immediately be able to sneak in a successful transmission while several other nodes are in the exponential backoff state.

CSMA/CD Efficiency

When only one node has a frame to send, the node can transmit at the full channel rate (e.g., for Ethernet typical rates are 10 Mbps, 100 Mbps, or 1 Gbps). However, if many nodes have frames to transmit, the effective transmission rate of the channel can be much less. We define the efficiency of CSMA/CD to be the long-run fraction of time during which frames are being transmitted on the channel without collisions when there is a large number of active nodes, with each node having a large number of frames to send. In order to present a closed-form approximation of the efficiency of Ethernet, let d_prop denote the maximum time it takes signal energy to propagate between any two adapters. Let d_trans be the time to transmit a maximum-size frame (approximately 1.2 msecs for a 10 Mbps Ethernet). A derivation of the efficiency of CSMA/CD is beyond the scope of this book (see \[Lam 1980\] and \[Bertsekas 1991\]). Here we simply state the following approximation:

Efficiency = 1 / (1 + 5 d_prop / d_trans)

We see from this formula that as d_prop approaches 0, the efficiency approaches 1. This matches our intuition that if the propagation delay is zero, colliding nodes will abort immediately without wasting the channel. Also, as d_trans becomes very large, efficiency approaches 1. This is also intuitive because when a frame grabs the channel, it will hold on to the channel for a very long time; thus, the channel will be doing productive work most of the time.

6.3.3 Taking-Turns Protocols

Recall that two desirable properties of a multiple access protocol are (1) when only one node is active, the active node has a throughput of R bps, and (2) when M nodes are active, then each active node has a throughput of nearly R/M bps. The ALOHA and CSMA protocols have this first property but not the second. This has motivated researchers to create another class of protocols---the taking-turns protocols.
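Before turning to those protocols, the Ethernet backoff rule just described can be sketched in a few lines (our own illustration; the helper name and the 10 Mbps rate used to convert bit times into seconds are assumptions for the example):

```python
import random

def backoff_bit_times(n_collisions: int) -> int:
    """Pick K uniformly from {0, 1, ..., 2^min(n,10) - 1}; wait K * 512 bit times."""
    m = min(n_collisions, 10)             # the exponent n is capped at 10
    K = random.randrange(2 ** m)
    return K * 512

random.seed(1)                            # for a reproducible illustration
for n in (1, 2, 3, 10):
    wait = backoff_bit_times(n)
    # On 10 Mbps Ethernet one bit time is 0.1 microseconds
    print(n, wait, "bit times =", wait * 0.1, "microseconds")
```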
As with random access protocols, there are dozens of taking-turns protocols, and each one of these protocols has many variations. We'll discuss two of the more important protocols here. The first one is the polling protocol. The polling protocol requires one of the nodes to be designated as a master node. The master node polls each of the nodes in a round-robin fashion. In particular, the master node first sends a message to node 1, saying that it (node 1) can transmit up to some maximum number of frames. After node 1 transmits some frames, the master node tells node 2 it (node 2) can transmit up to the maximum number of frames. (The master node can determine when a node has finished sending its frames by observing the lack of a signal on the channel.) The procedure continues in this manner, with the master node polling each of the nodes in a cyclic manner.

The polling protocol eliminates the collisions and empty slots that plague random access protocols. This allows polling to achieve a much higher efficiency. But it also has a few drawbacks. The first drawback is that the protocol introduces a polling delay---the amount of time required to notify a node that it can transmit. If, for example, only one node is active, then the node will transmit at a rate less than R bps, as the master node must poll each of the inactive nodes in turn each time the active node has sent its maximum number of frames. The second drawback, which is potentially more serious, is that if the master node fails, the entire channel becomes inoperative. The 802.15 protocol and the Bluetooth protocol we will study in Section 7.3 are examples of polling protocols.

The second taking-turns protocol is the token-passing protocol. In this protocol there is no master node. A small, special-purpose frame known as a token is exchanged among the nodes in some fixed order. For example, node 1 might always send the token to node 2, node 2 might always send the token to node 3, and node N might always send the token to node 1. When a node receives a token, it holds onto the token only if it has some frames to transmit; otherwise, it immediately forwards the token to the next node. If a node does have frames to transmit when it receives the token, it sends up to a maximum number of frames and then forwards the token to the next node. Token passing is decentralized and highly efficient. But it has its problems as well. For example, the failure of one node can crash the entire channel. Or if a node accidentally neglects to release the token, then some recovery procedure must be invoked to get the token back in circulation. Over the years many token-passing protocols have been developed, including the fiber distributed data interface (FDDI) protocol \[Jain 1994\] and the IEEE 802.5 token ring protocol \[IEEE 802.5 2012\], and each one had to address these as well as other sticky issues.

6.3.4 DOCSIS: The Link-Layer Protocol for Cable Internet Access

In the previous three subsections, we've learned about three broad classes of multiple access protocols: channel partitioning protocols, random access protocols, and taking-turns protocols. A cable access network will make for an excellent case study here, as we'll find aspects of each of these three classes of multiple access protocols within the cable access network!
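Before diving into DOCSIS, here is a toy round-robin simulation of the token passing just described (entirely our own sketch; it ignores token loss, node failures, and transmission timing, the very issues noted above):

```python
from collections import deque

def token_circulation(queues, max_frames=2):
    """One full trip of the token: each node sends at most max_frames queued frames."""
    sent = []
    for node, q in enumerate(queues):          # token visits nodes in a fixed order
        for _ in range(min(max_frames, len(q))):
            sent.append((node, q.popleft()))   # node transmits while holding the token
        # the token is then forwarded to the next node
    return sent

# Three nodes; nodes 0 and 2 have frames queued, node 1 has none
queues = [deque(["f0a", "f0b", "f0c"]), deque(), deque(["f2a"])]
print(token_circulation(queues))   # [(0, 'f0a'), (0, 'f0b'), (2, 'f2a')]
print(token_circulation(queues))   # [(0, 'f0c')]
```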
Recall from Section 1.2.1 that a cable access network typically connects several thousand residential cable modems to a cable modem termination system (CMTS) at the cable network headend. The Data-Over-Cable Service Interface Specifications (DOCSIS) \[DOCSIS 2011\] specifies the cable data network architecture and its protocols. DOCSIS uses FDM to divide the downstream (CMTS to modem) and upstream (modem to CMTS) network segments into multiple frequency channels. Each downstream channel is 6 MHz wide, with a maximum throughput of approximately 40 Mbps per channel (although this data rate is seldom seen at a cable modem in practice); each upstream channel has a maximum channel width of 6.4 MHz, and a maximum upstream throughput of approximately 30 Mbps. Each upstream and downstream channel is a broadcast channel.

Figure 6.14 Upstream and downstream channels between CMTS and cable modems

Frames transmitted on the downstream channel by the CMTS are received by all cable modems receiving that channel; since there is just a single CMTS transmitting into the downstream channel, however, there is no multiple access problem. The upstream direction, however, is more interesting and technically challenging, since multiple cable modems share the same upstream channel (frequency) to the CMTS, and thus collisions can potentially occur.

As illustrated in Figure 6.14, each upstream channel is divided into intervals of time (TDM-like), each containing a sequence of mini-slots during which cable modems can transmit to the CMTS. The CMTS explicitly grants permission to individual cable modems to transmit during specific mini-slots. The CMTS accomplishes this by sending a control message known as a MAP message on a downstream channel to specify which cable modem (with data to send) can transmit during which mini-slot for the interval of time specified in the control message. Since mini-slots are explicitly allocated to cable modems, the CMTS can ensure there are no colliding transmissions during a mini-slot. But how does the CMTS know which cable modems have data to send in the first place? This is accomplished by having cable modems send mini-slot-request frames to the CMTS during a special set of interval mini-slots that are dedicated for this purpose, as shown in Figure 6.14. These mini-slot-request frames are transmitted in a random access manner and so may collide with each other. A cable modem can neither sense whether the upstream channel is busy nor detect collisions. Instead, the cable modem infers that its mini-slot-request frame experienced a collision if it does not receive a response to the requested allocation in the next downstream control message. When a collision is inferred, a cable modem uses binary exponential backoff to defer the retransmission of its mini-slot-request frame to a future time slot. When there is little traffic on the upstream channel, a cable modem may actually transmit data frames during slots nominally assigned for mini-slot-request frames (and thus avoid having to wait for a mini-slot assignment).

A cable access network thus serves as a terrific example of multiple access protocols in action---FDM, TDM, random access, and centrally allocated time slots all within one network!

6.4 Switched Local Area Networks

Having covered broadcast networks and multiple access protocols in the previous section, let's turn our attention next to switched local networks.
Figure 6.15 shows a switched local network connecting three departments, two servers, and a router with four switches. Because these switches operate at the link layer, they switch link-layer frames (rather than network-layer datagrams), don't recognize network-layer addresses, and don't use routing algorithms like RIP or OSPF to determine paths through the network of layer-2 switches. Instead of using IP addresses, we will soon see that they use link-layer addresses to forward link-layer frames through the network of switches.

Figure 6.15 An institutional network connected together by four switches

We'll begin our study of switched LANs by first covering link-layer addressing (Section 6.4.1). We then examine the celebrated Ethernet protocol (Section 6.4.2). After examining link-layer addressing and Ethernet, we'll look at how link-layer switches operate (Section 6.4.3), and then see (Section 6.4.4) how these switches are often used to build large-scale LANs.

6.4.1 Link-Layer Addressing and ARP

Hosts and routers have link-layer addresses. Now you might find this surprising, recalling from Chapter 4 that hosts and routers have network-layer addresses as well. You might be asking, why in the world do we need to have addresses at both the network and link layers? In addition to describing the syntax and function of the link-layer addresses, in this section we hope to shed some light on why the two layers of addresses are useful and, in fact, indispensable. We'll also cover the Address Resolution Protocol (ARP), which provides a mechanism to translate IP addresses to link-layer addresses.

MAC Addresses

In truth, it is not hosts and routers that have link-layer addresses but rather their adapters (that is, network interfaces) that have link-layer addresses. A host or router with multiple network interfaces will thus have multiple link-layer addresses associated with it, just as it would also have multiple IP addresses associated with it. It's important to note, however, that link-layer switches do not have link-layer addresses associated with their interfaces that connect to hosts and routers. This is because the job of the link-layer switch is to carry datagrams between hosts and routers; a switch does this job transparently, that is, without the host or router having to explicitly address the frame to the intervening switch. This is illustrated in Figure 6.16.

A link-layer address is variously called a LAN address, a physical address, or a MAC address. Because MAC address seems to be the most popular term, we'll henceforth refer to link-layer addresses as MAC addresses. For most LANs (including Ethernet and 802.11 wireless LANs), the MAC address is 6 bytes long, giving 2^48 possible MAC addresses. As shown in Figure 6.16, these 6-byte addresses are typically expressed in hexadecimal notation, with each byte of the address expressed as a pair of hexadecimal numbers. Although MAC addresses were designed to be permanent, it is now possible to change an adapter's MAC address via software. For the rest of this section, however, we'll assume that an adapter's MAC address is fixed. One interesting property of MAC addresses is that no two adapters have the same address. This might seem surprising given that adapters are manufactured in many countries by many companies.
How does a company manufacturing adapters in Taiwan make sure that it is using different addresses from a company manufacturing adapters in Belgium? The answer is that the IEEE manages the MAC address space. In particular, when a company wants to manufacture adapters, it purchases a chunk of the address space consisting of 2^24 addresses for a nominal fee. IEEE allocates the chunk of 2^24 addresses by fixing the first 24 bits of a MAC address and letting the company create unique combinations of the last 24 bits for each adapter.

Figure 6.16 Each interface connected to a LAN has a unique MAC address

An adapter's MAC address has a flat structure (as opposed to a hierarchical structure) and doesn't change no matter where the adapter goes. A laptop with an Ethernet interface always has the same MAC address, no matter where the computer goes. A smartphone with an 802.11 interface always has the same MAC address, no matter where the smartphone goes. Recall that, in contrast, IP addresses have a hierarchical structure (that is, a network part and a host part), and a host's IP address needs to be changed when the host moves, i.e., changes the network to which it is attached. An adapter's MAC address is analogous to a person's social security number, which also has a flat addressing structure and which doesn't change no matter where the person goes. An IP address is analogous to a person's postal address, which is hierarchical and which must be changed whenever a person moves. Just as a person may find it useful to have both a postal address and a social security number, it is useful for host and router interfaces to have both a network-layer address and a MAC address.

When an adapter wants to send a frame to some destination adapter, the sending adapter inserts the destination adapter's MAC address into the frame and then sends the frame into the LAN. As we will soon see, a switch occasionally broadcasts an incoming frame onto all of its interfaces. We'll see in Chapter 7 that 802.11 also broadcasts frames. Thus, an adapter may receive a frame that isn't addressed to it. Thus, when an adapter receives a frame, it will check to see whether the destination MAC address in the frame matches its own MAC address. If there is a match, the adapter extracts the enclosed datagram and passes the datagram up the protocol stack. If there isn't a match, the adapter discards the frame, without passing the network-layer datagram up. Thus, only the destination will be interrupted when the frame is received.

However, sometimes a sending adapter does want all the other adapters on the LAN to receive and process the frame it is about to send. In this case, the sending adapter inserts a special MAC broadcast address into the destination address field of the frame. For LANs that use 6-byte addresses (such as Ethernet and 802.11), the broadcast address is a string of 48 consecutive 1s (that is, FF-FF-FF-FF-FF-FF in hexadecimal notation).

Address Resolution Protocol (ARP)

Because there are both network-layer addresses (for example, Internet IP addresses) and link-layer addresses (that is, MAC addresses), there is a need to translate between them. For the Internet, this is the job of the Address Resolution Protocol (ARP) \[RFC 826\]. To understand the need for a protocol such as ARP, consider the network shown in Figure 6.17. In this simple example, each host and router has a single IP address and single MAC address.
As usual, IP addresses are shown in dotted-decimal notation and MAC addresses are shown in hexadecimal notation. For the purposes of this discussion, we will assume in this section that the switch broadcasts all frames; that is, whenever a switch receives a frame on one interface, it forwards the frame on all of its other interfaces. In the next section, we will provide a more accurate explanation of how switches operate.

PRINCIPLES IN PRACTICE

KEEPING THE LAYERS INDEPENDENT

There are several reasons why hosts and router interfaces have MAC addresses in addition to network-layer addresses. First, LANs are designed for arbitrary network-layer protocols, not just for IP and the Internet. If adapters were assigned IP addresses rather than "neutral" MAC addresses, then adapters would not easily be able to support other network-layer protocols (for example, IPX or DECnet). Second, if adapters were to use network-layer addresses instead of MAC addresses, the network-layer address would have to be stored in the adapter RAM and reconfigured every time the adapter was moved (or powered up). Another option is to not use any addresses in the adapters and have each adapter pass the data (typically, an IP datagram) of each frame it receives up the protocol stack. The network layer could then check for a matching network-layer address. One problem with this option is that the host would be interrupted by every frame sent on the LAN, including by frames that were destined for other hosts on the same broadcast LAN. In summary, in order for the layers to be largely independent building blocks in a network architecture, different layers need to have their own addressing scheme. We have now seen three types of addresses: host names for the application layer, IP addresses for the network layer, and MAC addresses for the link layer.

Figure 6.17 Each interface on a LAN has an IP address and a MAC address

Now suppose that the host with IP address 222.222.222.220 wants to send an IP datagram to host 222.222.222.222. In this example, both the source and destination are in the same subnet, in the addressing sense of Section 4.3.3. To send a datagram, the source must give its adapter not only the IP datagram but also the MAC address for destination 222.222.222.222. The sending adapter will then construct a link-layer frame containing the destination's MAC address and send the frame into the LAN.

The important question addressed in this section is, How does the sending host determine the MAC address for the destination host with IP address 222.222.222.222? As you might have guessed, it uses ARP. An ARP module in the sending host takes any IP address on the same LAN as input, and returns the corresponding MAC address. In the example at hand, sending host 222.222.222.220 provides its ARP module the IP address 222.222.222.222, and the ARP module returns the corresponding MAC address 49-BD-D2-C7-56-2A.

So we see that ARP resolves an IP address to a MAC address. In many ways it is analogous to DNS (studied in Section 2.5), which resolves host names to IP addresses. However, one important difference between the two resolvers is that DNS resolves host names for hosts anywhere in the Internet, whereas ARP resolves IP addresses only for hosts and router interfaces on the same subnet. If a node in California were to try to use ARP to resolve the IP address for a node in Mississippi, ARP would return with an error.
Figure 6.18 A possible ARP table in 222.222.222.220

Now that we have explained what ARP does, let's look at how it works. Each host and router has an ARP table in its memory, which contains mappings of IP addresses to MAC addresses. Figure 6.18 shows what an ARP table in host 222.222.222.220 might look like. The ARP table also contains a time-to-live (TTL) value, which indicates when each mapping will be deleted from the table. Note that a table does not necessarily contain an entry for every host and router on the subnet; some may have never been entered into the table, and others may have expired. A typical expiration time for an entry is 20 minutes from when an entry is placed in an ARP table. Now suppose that host 222.222.222.220 wants to send a datagram that is IP-addressed to another host or router on that subnet. The sending host needs to obtain the MAC address of the destination given the IP address. This task is easy if the sender's ARP table has an entry for the destination node. But what if the ARP table doesn't currently have an entry for the destination? In particular, suppose 222.222.222.220 wants to send a datagram to 222.222.222.222. In this case, the sender uses the ARP protocol to resolve the address. First, the sender constructs a special packet called an ARP packet. An ARP packet has several fields, including the sending and receiving IP and MAC addresses. Both ARP query and response packets have the same format. The purpose of the ARP query packet is to query all the other hosts and routers on the subnet to determine the MAC address corresponding to the IP address that is being resolved. Returning to our example, 222.222.222.220 passes an ARP query packet to the adapter along with an indication that the adapter should send the packet to the MAC broadcast address, namely, FF-FF-FF-FF-FF-FF. The adapter encapsulates the ARP packet in a link-layer frame, uses the broadcast address for the frame's destination address, and transmits the frame into the subnet. Recalling our social security number/postal address analogy, an ARP query is equivalent to a person shouting out in a crowded room of cubicles in some company (say, AnyCorp): "What is the social security number of the person whose postal address is Cubicle 13, Room 112, AnyCorp, Palo Alto, California?" The frame containing the ARP query is received by all the other adapters on the subnet, and (because of the broadcast address) each adapter passes the ARP packet within the frame up to its ARP module. Each of these ARP modules checks to see if its IP address matches the destination IP address in the ARP packet. The one with a match sends back to the querying host a response ARP packet with the desired mapping. The querying host 222.222.222.220 can then update its ARP table and send its IP datagram, encapsulated in a link-layer frame whose destination MAC address is that of the host or router responding to the earlier ARP query.

There are a couple of interesting things to note about the ARP protocol. First, the query ARP message is sent within a broadcast frame, whereas the response ARP message is sent within a standard frame. Before reading on you should think about why this is so. Second, ARP is plug-and-play; that is, an ARP table gets built automatically---it doesn't have to be configured by a system administrator. And if a host becomes disconnected from the subnet, its entry is eventually deleted from the other ARP tables in the subnet.
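
To make the ARP table's behavior concrete, here is a minimal sketch of a cache with the 20-minute expiration described above. It is an illustration only: the class and method names are our own, and a real host would broadcast an ARP query on a miss rather than simply reporting one.

```python
import time

class ArpCache:
    """A minimal ARP table: IP address -> (MAC address, expiry time)."""

    def __init__(self, ttl_seconds=20 * 60):   # 20-minute expiration, as in the text
        self.ttl = ttl_seconds
        self.table = {}

    def add(self, ip, mac):
        # (Re)learn a mapping; the entry ages out after self.ttl seconds.
        self.table[ip] = (mac, time.time() + self.ttl)

    def lookup(self, ip):
        entry = self.table.get(ip)
        if entry is None:
            return None                        # miss: caller would broadcast an ARP query
        mac, expires = entry
        if time.time() > expires:
            del self.table[ip]                 # expired: purge and treat as a miss
            return None
        return mac

cache = ArpCache()
cache.add("222.222.222.222", "49-BD-D2-C7-56-2A")
print(cache.lookup("222.222.222.222"))         # 49-BD-D2-C7-56-2A
print(cache.lookup("222.222.222.221"))         # None -> would trigger an ARP query
```
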
Students often wonder if ARP is a link-layer protocol or a network-layer protocol. As we've seen, an ARP packet is encapsulated within a link-layer frame and thus lies architecturally above the link layer. However, an ARP packet has fields containing link-layer addresses and thus is arguably a link-layer protocol, but it also contains network-layer addresses and thus is also arguably a network-layer protocol. In the end, ARP is probably best considered a protocol that straddles the boundary between the link and network layers---not fitting neatly into the simple layered protocol stack we studied in Chapter 1. Such are the complexities of real-world protocols!

Sending a Datagram off the Subnet

It should now be clear how ARP operates when a host wants to send a datagram to another host on the same subnet. But now let's look at the more complicated situation when a host on a subnet wants to send a network-layer datagram to a host off the subnet (that is, across a router onto another subnet). Let's discuss this issue in the context of Figure 6.19, which shows a simple network consisting of two subnets interconnected by a router. There are several interesting things to note about Figure 6.19. Each host has exactly one IP address and one adapter. But, as discussed in Chapter 4, a router has an IP address for each of its interfaces. For each router interface there is also an ARP module (in the router) and an adapter. Because the router in Figure 6.19 has two interfaces, it has two IP addresses, two ARP modules, and two adapters. Of course, each adapter in the network has its own MAC address.

Figure 6.19 Two subnets interconnected by a router

Also note that Subnet 1 has the network address 111.111.111/24 and that Subnet 2 has the network address 222.222.222/24. Thus all of the interfaces connected to Subnet 1 have addresses of the form 111.111.111.xxx and all of the interfaces connected to Subnet 2 have addresses of the form 222.222.222.xxx. Now let's examine how a host on Subnet 1 would send a datagram to a host on Subnet 2. Specifically, suppose that host 111.111.111.111 wants to send an IP datagram to host 222.222.222.222. The sending host passes the datagram to its adapter, as usual. But the sending host must also indicate to its adapter an appropriate destination MAC address. What MAC address should the adapter use? One might be tempted to guess that the appropriate MAC address is that of the adapter for host 222.222.222.222, namely, 49-BD-D2-C7-56-2A. This guess, however, would be wrong! If the sending adapter were to use that MAC address, then none of the adapters on Subnet 1 would bother to pass the IP datagram up to its network layer, since the frame's destination address would not match the MAC address of any adapter on Subnet 1. The datagram would just die and go to datagram heaven. If we look carefully at Figure 6.19, we see that in order for a datagram to go from 111.111.111.111 to a host on Subnet 2, the datagram must first be sent to the router interface 111.111.111.110, which is the IP address of the first-hop router on the path to the final destination. Thus, the appropriate MAC address for the frame is the address of the adapter for router interface 111.111.111.110, namely, E6-E9-00-17-BB-4B. How does the sending host acquire the MAC address for 111.111.111.110? By using ARP, of course!
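
The sender's first decision, then, is which IP address to hand to its ARP module: the destination itself if it lies on the sender's subnet, or the first-hop router otherwise. Here is a minimal sketch of that decision using the addresses of Figure 6.19 and Python's standard ipaddress module; the function name and default arguments are our own.

```python
import ipaddress

def next_hop_ip(dest_ip, my_interface="111.111.111.111/24",
                default_gateway="111.111.111.110"):
    """Return the IP address whose MAC address the sender must resolve
    with ARP: the destination itself if it is on the sender's subnet,
    otherwise the first-hop router (default gateway)."""
    subnet = ipaddress.ip_interface(my_interface).network
    if ipaddress.ip_address(dest_ip) in subnet:
        return dest_ip
    return default_gateway

print(next_hop_ip("111.111.111.112"))   # on-subnet: ARP for the destination itself
print(next_hop_ip("222.222.222.222"))   # off-subnet: ARP for 111.111.111.110
```
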
Once the sending adapter has this MAC address, it creates a frame (containing the datagram addressed to 222.222.222.222) and sends the frame into Subnet 1. The router adapter on Subnet 1 sees that the link-layer frame is addressed to it, and therefore passes the frame to the network layer of the router. Hooray---the IP datagram has successfully been moved from source host to the router! But we are not finished. We still have to move the datagram from the router to the destination. The router now has to determine the correct interface on which the datagram is to be forwarded. As discussed in Chapter 4, this is done by consulting a forwarding table in the router. The forwarding table tells the router that the datagram is to be forwarded via router interface 222.222.222.220. This interface then passes the datagram to its adapter, which encapsulates the datagram in a new frame and sends the frame into Subnet 2. This time, the destination MAC address of the frame is indeed the MAC address of the ultimate destination. And how does the router obtain this destination MAC address? From ARP, of course! ARP for Ethernet is defined in RFC 826. A nice introduction to ARP is given in the TCP/IP tutorial, RFC 1180. We'll explore ARP in more detail in the homework problems.

6.4.2 Ethernet

Ethernet has pretty much taken over the wired LAN market. In the 1980s and the early 1990s, Ethernet faced many challenges from other LAN technologies, including token ring, FDDI, and ATM. Some of these other technologies succeeded in capturing a part of the LAN market for a few years. But since its invention in the mid-1970s, Ethernet has continued to evolve and grow and has held on to its dominant position. Today, Ethernet is by far the most prevalent wired LAN technology, and it is likely to remain so for the foreseeable future. One might say that Ethernet has been to local area networking what the Internet has been to global networking. There are many reasons for Ethernet's success. First, Ethernet was the first widely deployed high-speed LAN. Because it was deployed early, network administrators became intimately familiar with Ethernet---its wonders and its quirks---and were reluctant to switch over to other LAN technologies when they came on the scene. Second, token ring, FDDI, and ATM were more complex and expensive than Ethernet, which further discouraged network administrators from switching over. Third, the most compelling reason to switch to another LAN technology (such as FDDI or ATM) was usually the higher data rate of the new technology; however, Ethernet always fought back, producing versions that operated at equal data rates or higher. Switched Ethernet was also introduced in the early 1990s, which further increased its effective data rates. Finally, because Ethernet has been so popular, Ethernet hardware (in particular, adapters and switches) has become a commodity and is remarkably cheap. The original Ethernet LAN was invented in the mid-1970s by Bob Metcalfe and David Boggs; it used a coaxial bus to interconnect the nodes. Bus topologies for Ethernet actually persisted throughout the 1980s and into the mid-1990s. Ethernet with a bus topology is a broadcast LAN---all transmitted frames travel to and are processed by all adapters connected to the bus. Recall that we covered Ethernet's CSMA/CD multiple access protocol with binary exponential backoff in Section 6.3.2.
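
As a quick refresher on that protocol, here is a minimal sketch of binary exponential backoff as presented in Section 6.3.2: after the nth consecutive collision, an adapter waits K * 512 bit times, with K drawn uniformly from {0, 1, ..., 2^min(n,10) - 1}. The constant and function names below are our own.

```python
import random

BIT_TIME_10MBPS = 0.1e-6          # one bit time at 10 Mbps, in seconds

def backoff_delay(n_collisions, bit_time=BIT_TIME_10MBPS):
    """Binary exponential backoff: after the nth consecutive collision,
    pick K uniformly from {0, 1, ..., 2^min(n,10) - 1} and wait
    K * 512 bit times before re-sensing the channel."""
    k = random.randrange(2 ** min(n_collisions, 10))
    return k * 512 * bit_time

# After a 3rd straight collision, K is drawn from {0, ..., 7}:
print(backoff_delay(3))
```
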
By the late 1990s, most companies and universities had replaced their LANs with Ethernet installations using a hub-based star topology. In such an installation the hosts (and routers) are directly connected to a hub with twisted-pair copper wire. A hub is a physical-layer device that acts on individual bits rather than frames. When a bit, representing a zero or a one, arrives from one interface, the hub simply recreates the bit, boosts its energy strength, and transmits the bit onto all the other interfaces. Thus, Ethernet with a hub-based star topology is also a broadcast LAN---whenever a hub receives a bit from one of its interfaces, it sends a copy out on all of its other interfaces. In particular, if a hub receives frames from two different interfaces at the same time, a collision occurs and the nodes that created the frames must retransmit. In the early 2000s Ethernet experienced yet another major evolutionary change. Ethernet installations continued to use a star topology, but the hub at the center was replaced with a switch. We'll be examining switched Ethernet in depth later in this chapter. For now, we only mention that a switch is not only "collision-less" but is also a bona fide store-and-forward packet switch; but unlike routers, which operate up through layer 3, a switch operates only up through layer 2.

Figure 6.20 Ethernet frame structure

Ethernet Frame Structure

We can learn a lot about Ethernet by examining the Ethernet frame, which is shown in Figure 6.20. To give this discussion about Ethernet frames a tangible context, let's consider sending an IP datagram from one host to another host, with both hosts on the same Ethernet LAN (for example, the Ethernet LAN in Figure 6.17). (Although the payload of our Ethernet frame is an IP datagram, we note that an Ethernet frame can carry other network-layer packets as well.) Let the sending adapter, adapter A, have the MAC address AA-AA-AA-AA-AA-AA and the receiving adapter, adapter B, have the MAC address BB-BB-BB-BB-BB-BB. The sending adapter encapsulates the IP datagram within an Ethernet frame and passes the frame to the physical layer. The receiving adapter receives the frame from the physical layer, extracts the IP datagram, and passes the IP datagram to the network layer. In this context, let's now examine the six fields of the Ethernet frame, as shown in Figure 6.20.

- Data field (46 to 1,500 bytes). This field carries the IP datagram. The maximum transmission unit (MTU) of Ethernet is 1,500 bytes. This means that if the IP datagram exceeds 1,500 bytes, then the host has to fragment the datagram, as discussed in Section 4.3.2. The minimum size of the data field is 46 bytes. This means that if the IP datagram is less than 46 bytes, the data field has to be "stuffed" to fill it out to 46 bytes. When stuffing is used, the data passed to the network layer contains the stuffing as well as an IP datagram. The network layer uses the length field in the IP datagram header to remove the stuffing.
- Destination address (6 bytes). This field contains the MAC address of the destination adapter, BB-BB-BB-BB-BB-BB. When adapter B receives an Ethernet frame whose destination address is either BB-BB-BB-BB-BB-BB or the MAC broadcast address, it passes the contents of the frame's data field to the network layer; if it receives a frame with any other MAC address, it discards the frame.
- Source address (6 bytes).
This field contains the MAC address of the adapter that transmits the frame onto the LAN, in this example, AA-AA-AA-AA-AA-AA.
- Type field (2 bytes). The type field permits Ethernet to multiplex network-layer protocols. To understand this, we need to keep in mind that hosts can use other network-layer protocols besides IP. In fact, a given host may support multiple network-layer protocols using different protocols for different applications. For this reason, when the Ethernet frame arrives at adapter B, adapter B needs to know to which network-layer protocol it should pass (that is, demultiplex) the contents of the data field. IP and other network-layer protocols (for example, Novell IPX or AppleTalk) each have their own, standardized type number. Furthermore, the ARP protocol (discussed in the previous section) has its own type number, and if the arriving frame contains an ARP packet (i.e., has a type field of 0806 hexadecimal), the ARP packet will be demultiplexed up to the ARP protocol. Note that the type field is analogous to the protocol field in the network-layer datagram and the port-number fields in the transport-layer segment; all of these fields serve to glue a protocol at one layer to a protocol at the layer above.
- Cyclic redundancy check (CRC) (4 bytes). As discussed in Section 6.2.3, the purpose of the CRC field is to allow the receiving adapter, adapter B, to detect bit errors in the frame.
- Preamble (8 bytes). The Ethernet frame begins with an 8-byte preamble field. Each of the first 7 bytes of the preamble has a value of 10101010; the last byte is 10101011. The first 7 bytes of the preamble serve to "wake up" the receiving adapters and to synchronize their clocks to that of the sender's clock. Why should the clocks be out of synchronization? Keep in mind that adapter A aims to transmit the frame at 10 Mbps, 100 Mbps, or 1 Gbps, depending on the type of Ethernet LAN. However, because nothing is absolutely perfect, adapter A will not transmit the frame at exactly the target rate; there will always be some drift from the target rate, a drift which is not known a priori by the other adapters on the LAN. A receiving adapter can lock onto adapter A's clock simply by locking onto the bits in the first 7 bytes of the preamble. The last 2 bits of the eighth byte of the preamble (the first two consecutive 1s) alert adapter B that the "important stuff" is about to come.

All of the Ethernet technologies provide connectionless service to the network layer. That is, when adapter A wants to send a datagram to adapter B, adapter A encapsulates the datagram in an Ethernet frame and sends the frame into the LAN, without first handshaking with adapter B. This layer-2 connectionless service is analogous to IP's layer-3 datagram service and UDP's layer-4 connectionless service. Ethernet technologies provide an unreliable service to the network layer. Specifically, when adapter B receives a frame from adapter A, it runs the frame through a CRC check, but neither sends an acknowledgment when a frame passes the CRC check nor sends a negative acknowledgment when a frame fails the CRC check. When a frame fails the CRC check, adapter B simply discards the frame. Thus, adapter A has no idea whether its transmitted frame reached adapter B and passed the CRC check. This lack of reliable transport (at the link layer) helps to make Ethernet simple and cheap. But it also means that the stream of datagrams passed to the network layer can have gaps.
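
To make adapter B's accept-or-discard decision and the type-field demultiplexing concrete, here is a minimal sketch that splits a received frame into the header fields discussed above. It assumes the preamble and CRC have already been handled by the hardware, and the function name is our own.

```python
import struct

BROADCAST = bytes.fromhex("ffffffffffff")

def parse_ethernet(frame, my_mac):
    """Split a received frame (preamble already stripped by the adapter)
    into its header fields and decide whether to pass the payload up."""
    dst, src = frame[:6], frame[6:12]
    eth_type = struct.unpack("!H", frame[12:14])[0]
    payload = frame[14:]                     # CRC trailer omitted in this sketch
    if dst not in (my_mac, BROADCAST):
        return None                          # not for us: discard silently
    # Demultiplex on the type field, as the text describes.
    proto = {0x0800: "IPv4", 0x0806: "ARP"}.get(eth_type, hex(eth_type))
    return proto, src, payload

my_mac = bytes.fromhex("bbbbbbbbbbbb")
frame = bytes.fromhex("bbbbbbbbbbbb" "aaaaaaaaaaaa" "0800") + b"...IP datagram..."
print(parse_ethernet(frame, my_mac))        # ('IPv4', b'\xaa...', b'...IP datagram...')
```
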
CASE HISTORY

BOB METCALFE AND ETHERNET

As a PhD student at Harvard University in the early 1970s, Bob Metcalfe worked on the ARPAnet at MIT. During his studies, he also became exposed to Abramson's work on ALOHA and random access protocols. After completing his PhD and just before beginning a job at Xerox Palo Alto Research Center (Xerox PARC), he visited Abramson and his University of Hawaii colleagues for three months, getting a firsthand look at ALOHAnet. At Xerox PARC, Metcalfe became exposed to Alto computers, which in many ways were the forerunners of the personal computers of the 1980s. Metcalfe saw the need to network these computers in an inexpensive manner. So armed with his knowledge about ARPAnet, ALOHAnet, and random access protocols, Metcalfe---along with colleague David Boggs---invented Ethernet. Metcalfe and Boggs's original Ethernet ran at 2.94 Mbps and linked up to 256 hosts separated by up to one mile. Metcalfe and Boggs succeeded at getting most of the researchers at Xerox PARC to communicate through their Alto computers. Metcalfe then forged an alliance between Xerox, Digital, and Intel to establish Ethernet as a 10 Mbps Ethernet standard, ratified by the IEEE. Xerox did not show much interest in commercializing Ethernet. In 1979, Metcalfe formed his own company, 3Com, which developed and commercialized networking technology, including Ethernet technology. In particular, 3Com developed and marketed Ethernet cards in the early 1980s for the immensely popular IBM PCs.

If there are gaps due to discarded Ethernet frames, does the application at Host B see gaps as well? As we learned in Chapter 3, this depends on whether the application is using UDP or TCP. If the application is using UDP, then the application in Host B will indeed see gaps in the data. On the other hand, if the application is using TCP, then TCP in Host B will not acknowledge the data contained in discarded frames, causing TCP in Host A to retransmit. Note that when TCP retransmits data, the data will eventually return to the Ethernet adapter at which it was discarded. Thus, in this sense, Ethernet does retransmit data, although Ethernet is unaware of whether it is transmitting a brand-new datagram with brand-new data, or a datagram that contains data that has already been transmitted at least once.

Ethernet Technologies

In our discussion above, we've referred to Ethernet as if it were a single protocol standard. But in fact, Ethernet comes in many different flavors, with somewhat bewildering acronyms such as 10BASE-T, 10BASE-2, 100BASE-T, 1000BASE-LX, 10GBASE-T, and 40GBASE-T. These and many other Ethernet technologies have been standardized over the years by the IEEE 802.3 CSMA/CD (Ethernet) working group \[IEEE 802.3 2012\]. While these acronyms may appear bewildering, there is actually considerable order here. The first part of the acronym refers to the speed of the standard: 10, 100, 1000, 10G, or 40G, for 10 Megabit (per second), 100 Megabit, Gigabit, 10 Gigabit, and 40 Gigabit Ethernet, respectively. "BASE" refers to baseband Ethernet, meaning that the physical media only carries Ethernet traffic; almost all of the 802.3 standards are for baseband Ethernet. The final part of the acronym refers to the physical media itself; Ethernet is both a link-layer and a physical-layer specification and is carried over a variety of physical media including coaxial cable, copper wire, and fiber. Generally, a "T" refers to twisted-pair copper wires.
Historically, an Ethernet was initially conceived of as a segment of coaxial cable. The early 10BASE-2 and 10BASE-5 standards specify 10 Mbps Ethernet over two types of coaxial cable, each limited in length to 500 meters. Longer runs could be obtained by using a repeater---a physical-layer device that receives a signal on the input side, and regenerates the signal on the output side. A coaxial cable corresponds nicely to our view of Ethernet as a broadcast medium---all frames transmitted by one interface are received at other interfaces, and Ethernet's CSMA/CD protocol nicely solves the multiple access problem. Nodes simply attach to the cable, and voila, we have a local area network! Ethernet has passed through a series of evolutionary steps over the years, and today's Ethernet is very different from the original bus-topology designs using coaxial cable. In most installations today, nodes are connected to a switch via point-to-point segments made of twisted-pair copper wires or fiber-optic cables, as shown in Figures 6.15--6.17. In the mid-1990s, Ethernet was standardized at 100 Mbps, 10 times faster than 10 Mbps Ethernet. The original Ethernet MAC protocol and frame format were preserved, but higher-speed physical layers were defined for copper wire (100BASE-T) and fiber (100BASE-FX, 100BASE-SX, 100BASE-BX). Figure 6.21 shows these different standards and the common Ethernet MAC protocol and frame format. 100 Mbps Ethernet is limited to a 100-meter distance over twisted pair, and to several kilometers over fiber, allowing Ethernet switches in different buildings to be connected.

Figure 6.21 100 Mbps Ethernet standards: A common link layer, different physical layers

Gigabit Ethernet is an extension to the highly successful 10 Mbps and 100 Mbps Ethernet standards. Offering a raw data rate of 40,000 Mbps, 40 Gigabit Ethernet maintains full compatibility with the huge installed base of Ethernet equipment. The standard for Gigabit Ethernet, referred to as IEEE 802.3z, does the following:

- Uses the standard Ethernet frame format (Figure 6.20) and is backward compatible with 10BASE-T and 100BASE-T technologies. This allows for easy integration of Gigabit Ethernet with the existing installed base of Ethernet equipment.
- Allows for point-to-point links as well as shared broadcast channels. Point-to-point links use switches while broadcast channels use hubs, as described earlier. In Gigabit Ethernet jargon, hubs are called buffered distributors.
- Uses CSMA/CD for shared broadcast channels. In order to have acceptable efficiency, the maximum distance between nodes must be severely restricted.
- Allows for full-duplex operation at 40 Gbps in both directions for point-to-point channels.

Initially operating over optical fiber, Gigabit Ethernet is now able to run over category 5 UTP cabling. Let's conclude our discussion of Ethernet technology by posing a question that may have begun troubling you. In the days of bus topologies and hub-based star topologies, Ethernet was clearly a broadcast link (as defined in Section 6.3) in which frame collisions occurred when nodes transmitted at the same time. To deal with these collisions, the Ethernet standard included the CSMA/CD protocol, which is particularly effective for a wired broadcast LAN spanning a small geographical region. But if the prevalent use of Ethernet today is a switch-based star topology, using store-and-forward packet switching, is there really a need anymore for an Ethernet MAC protocol?
As we'll see shortly, a switch coordinates its transmissions and never forwards more than one frame onto the same interface at any time. Furthermore, modern switches are full-duplex, so that a switch and a node can each send frames to each other at the same time without interference. In other words, in a switch-based Ethernet LAN there are no collisions and, therefore, there is no need for a MAC protocol! As we've seen, today's Ethernets are very different from the original Ethernet conceived by Metcalfe and Boggs more than 30 years ago---speeds have increased by three orders of magnitude, Ethernet frames are carried over a variety of media, switched-Ethernets have become dominant, and now even the MAC protocol is often unnecessary! Is all of this really still Ethernet? The answer, of course, is "yes, by definition." It is interesting to note, however, that through all of these changes, there has indeed been one enduring constant that has remained unchanged over 30 years---Ethernet's frame format. Perhaps this then is the one true and timeless centerpiece of the Ethernet standard.

6.4.3 Link-Layer Switches

Up until this point, we have been purposefully vague about what a switch actually does and how it works. The role of the switch is to receive incoming link-layer frames and forward them onto outgoing links; we'll study this forwarding function in detail in this subsection. We'll see that the switch itself is transparent to the hosts and routers in the subnet; that is, a host/router addresses a frame to another host/router (rather than addressing the frame to the switch) and happily sends the frame into the LAN, unaware that a switch will be receiving the frame and forwarding it. The rate at which frames arrive to any one of the switch's output interfaces may temporarily exceed the link capacity of that interface. To accommodate this problem, switch output interfaces have buffers, in much the same way that router output interfaces have buffers for datagrams. Let's now take a closer look at how switches operate.

Forwarding and Filtering

Filtering is the switch function that determines whether a frame should be forwarded to some interface or should just be dropped. Forwarding is the switch function that determines the interfaces to which a frame should be directed, and then moves the frame to those interfaces. Switch filtering and forwarding are done with a switch table. The switch table contains entries for some, but not necessarily all, of the hosts and routers on a LAN. An entry in the switch table contains (1) a MAC address, (2) the switch interface that leads toward that MAC address, and (3) the time at which the entry was placed in the table. An example switch table for the uppermost switch in Figure 6.15 is shown in Figure 6.22. This description of frame forwarding may sound similar to our discussion of datagram forwarding in Chapter 4.

Figure 6.22 Portion of a switch table for the uppermost switch in Figure 6.15

Indeed, in our discussion of generalized forwarding in Section 4.4, we learned that many modern packet switches can be configured to forward on the basis of layer-2 destination MAC addresses (i.e., function as a layer-2 switch) or layer-3 IP destination addresses (i.e., function as a layer-3 router). Nonetheless, we'll make the important distinction that switches forward packets based on MAC addresses rather than on IP addresses.
We will also see that a traditional (i.e., in a non-SDN context) switch table is constructed in a very different manner from a router's forwarding table. To understand how switch filtering and forwarding work, suppose a frame with destination address DD-DD-DD-DD-DD-DD arrives at the switch on interface x. The switch indexes its table with the MAC address DD-DD-DD-DD-DD-DD. There are three possible cases:

- There is no entry in the table for DD-DD-DD-DD-DD-DD. In this case, the switch forwards copies of the frame to the output buffers preceding all interfaces except for interface x. In other words, if there is no entry for the destination address, the switch broadcasts the frame.
- There is an entry in the table, associating DD-DD-DD-DD-DD-DD with interface x. In this case, the frame is coming from a LAN segment that contains adapter DD-DD-DD-DD-DD-DD. There being no need to forward the frame to any of the other interfaces, the switch performs the filtering function by discarding the frame.
- There is an entry in the table, associating DD-DD-DD-DD-DD-DD with interface y≠x. In this case, the frame needs to be forwarded to the LAN segment attached to interface y. The switch performs its forwarding function by putting the frame in an output buffer that precedes interface y.

Let's walk through these rules for the uppermost switch in Figure 6.15 and its switch table in Figure 6.22. Suppose that a frame with destination address 62-FE-F7-11-89-A3 arrives at the switch from interface 1. The switch examines its table and sees that the destination is on the LAN segment connected to interface 1 (that is, Electrical Engineering). This means that the frame has already been broadcast on the LAN segment that contains the destination. The switch therefore filters (that is, discards) the frame. Now suppose a frame with the same destination address arrives from interface 2. The switch again examines its table and sees that the destination is in the direction of interface 1; it therefore forwards the frame to the output buffer preceding interface 1. It should be clear from this example that as long as the switch table is complete and accurate, the switch forwards frames toward destinations without any broadcasting. In this sense, a switch is "smarter" than a hub. But how does this switch table get configured in the first place? Are there link-layer equivalents to network-layer routing protocols? Or must an overworked manager manually configure the switch table?

Self-Learning

A switch has the wonderful property (particularly for the already-overworked network administrator) that its table is built automatically, dynamically, and autonomously---without any intervention from a network administrator or from a configuration protocol. In other words, switches are self-learning. This capability is accomplished as follows:

1. The switch table is initially empty.
2. For each incoming frame received on an interface, the switch stores in its table (1) the MAC address in the frame's source address field, (2) the interface from which the frame arrived, and (3) the current time. In this manner the switch records in its table the LAN segment on which the sender resides. If every host in the LAN eventually sends a frame, then every host will eventually get recorded in the table.
3. The switch deletes an address in the table if no frames are received with that address as the source address after some period of time (the aging time). In this manner, if a PC is replaced by another PC (with a different adapter), the MAC address of the original PC will eventually be purged from the switch table.
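
Putting the three forwarding cases and the self-learning steps together, here is a minimal sketch of a learning switch. The class and method names are our own, and a real switch implements this logic in hardware rather than with a dictionary.

```python
import time

class LearningSwitch:
    """A minimal self-learning switch table: MAC -> (interface, time learned)."""

    def __init__(self, interfaces, aging_time=60 * 60):   # 60-minute aging, as in the example
        self.interfaces = set(interfaces)
        self.aging_time = aging_time
        self.table = {}

    def handle_frame(self, src_mac, dst_mac, in_iface):
        now = time.time()
        # Self-learning: record where the sender lives (step 2 above).
        self.table[src_mac] = (in_iface, now)
        # Purge entries older than the aging time (step 3 above).
        self.table = {m: (i, t) for m, (i, t) in self.table.items()
                      if now - t <= self.aging_time}
        entry = self.table.get(dst_mac)
        if entry is None:
            return self.interfaces - {in_iface}     # unknown destination: broadcast
        out_iface, _ = entry
        if out_iface == in_iface:
            return set()                            # same segment: filter (drop)
        return {out_iface}                          # known destination: forward

sw = LearningSwitch(interfaces=[1, 2, 3])
print(sw.handle_frame("62-FE-F7-11-89-A3", "01-12-23-34-45-56", 1))  # flood: {2, 3}
print(sw.handle_frame("01-12-23-34-45-56", "62-FE-F7-11-89-A3", 2))  # forward: {1}
```
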
Let's walk through the self-learning property for the uppermost switch in Figure 6.15 and its corresponding switch table in Figure 6.22. Suppose at time 9:39 a frame with source address 01-12-23-34-45-56 arrives from interface 2. Suppose that this address is not in the switch table. Then the switch adds a new entry to the table, as shown in Figure 6.23. Continuing with this same example, suppose that the aging time for this switch is 60 minutes, and no frames with source address 62-FE-F7-11-89-A3 arrive to the switch between 9:32 and 10:32. Then at time 10:32, the switch removes this address from its table.

Figure 6.23 Switch learns about the location of an adapter with address 01-12-23-34-45-56

Switches are plug-and-play devices because they require no intervention from a network administrator or user. A network administrator wanting to install a switch need do nothing more than connect the LAN segments to the switch interfaces. The administrator need not configure the switch tables at the time of installation or when a host is removed from one of the LAN segments. Switches are also full-duplex, meaning any switch interface can send and receive at the same time.

Properties of Link-Layer Switching

Having described the basic operation of a link-layer switch, let's now consider their features and properties. We can identify several advantages of using switches, rather than broadcast links such as buses or hub-based star topologies:

- Elimination of collisions. In a LAN built from switches (and without hubs), there is no wasted bandwidth due to collisions! The switches buffer frames and never transmit more than one frame on a segment at any one time. As with a router, the maximum aggregate throughput of a switch is the sum of all the switch interface rates. Thus, switches provide a significant performance improvement over LANs with broadcast links.
- Heterogeneous links. Because a switch isolates one link from another, the different links in the LAN can operate at different speeds and can run over different media. For example, the uppermost switch in Figure 6.15 might have three 1 Gbps 1000BASE-T copper links, two 100 Mbps 100BASE-FX fiber links, and one 100BASE-T copper link. Thus, a switch is ideal for mixing legacy equipment with new equipment.
- Management. In addition to providing enhanced security (see sidebar on Focus on Security), a switch also eases network management. For example, if an adapter malfunctions and continually sends Ethernet frames (called a jabbering adapter), a switch can detect the problem and internally disconnect the malfunctioning adapter. With this feature, the network administrator need not get out of bed and drive back to work in order to correct the problem. Similarly, a cable cut disconnects only the host that was using the cut cable to connect to the switch. In the days of coaxial cable, many a network manager spent hours "walking the line" (or more accurately, "crawling the floor") to find the cable break that brought down the entire network.
Switches also gather statistics on bandwidth usage, collision rates, and traffic types, and make this information available to the network manager. This information can be used to debug and correct problems, and to plan how the LAN should evolve in the future. Researchers are exploring adding yet more management functionality into Ethernet LANs in prototype deployments \[Casado 2007; Koponen 2011\].

FOCUS ON SECURITY

SNIFFING A SWITCHED LAN: SWITCH POISONING

When a host is connected to a switch, it typically only receives frames that are intended for it. For example, consider the switched LAN in Figure 6.17. When host A sends a frame to host B, and there is an entry for host B in the switch table, then the switch will forward the frame only to host B. If host C happens to be running a sniffer, host C will not be able to sniff this A-to-B frame. Thus, in a switched-LAN environment (in contrast to a broadcast link environment such as 802.11 LANs or hub-based Ethernet LANs), it is more difficult for an attacker to sniff frames. However, because the switch broadcasts frames that have destination addresses that are not in the switch table, the sniffer at C can still sniff some frames that are not intended for C. Furthermore, a sniffer will be able to sniff all Ethernet broadcast frames with broadcast destination address FF-FF-FF-FF-FF-FF. A well-known attack against a switch, called switch poisoning, is to send tons of packets to the switch with many different bogus source MAC addresses, thereby filling the switch table with bogus entries and leaving no room for the MAC addresses of the legitimate hosts. This causes the switch to broadcast most frames, which can then be picked up by the sniffer \[Skoudis 2006\]. As this attack is rather involved even for a sophisticated attacker, switches are significantly less vulnerable to sniffing than are hubs and wireless LANs.

Switches Versus Routers

As we learned in Chapter 4, routers are store-and-forward packet switches that forward packets using network-layer addresses. Although a switch is also a store-and-forward packet switch, it is fundamentally different from a router in that it forwards packets using MAC addresses. Whereas a router is a layer-3 packet switch, a switch is a layer-2 packet switch. Recall, however, that we learned in Section 4.4 that modern switches using the "match plus action" operation can be used to forward a layer-2 frame based on the frame's destination MAC address, as well as a layer-3 datagram using the datagram's destination IP address. Indeed, we saw that switches using the OpenFlow standard can perform generalized packet forwarding based on any of eleven different frame, datagram, and transport-layer header fields.

Even though switches and routers are fundamentally different, network administrators must often choose between them when installing an interconnection device. For example, for the network in Figure 6.15, the network administrator could just as easily have used a router instead of a switch to connect the department LANs, servers, and internet gateway router. Indeed, a router would permit interdepartmental communication without creating collisions. Given that both switches and routers are candidates for interconnection devices, what are the pros and cons of the two approaches?

Figure 6.24 Packet processing in switches, routers, and hosts

First consider the pros and cons of switches.
As mentioned above, switches are plug-and-play, a property that is cherished by all the overworked network administrators of the world. Switches can also have relatively high filtering and forwarding rates---as shown in Figure 6.24, switches have to process frames only up through layer 2, whereas routers have to process datagrams up through layer 3. On the other hand, to prevent the cycling of broadcast frames, the active topology of a switched network is restricted to a spanning tree. Also, a large switched network would require large ARP tables in the hosts and routers and would generate substantial ARP traffic and processing. Furthermore, switches are susceptible to broadcast storms---if one host goes haywire and transmits an endless stream of Ethernet broadcast frames, the switches will forward all of these frames, causing the entire network to collapse. Now consider the pros and cons of routers. Because network addressing is often hierarchical (and not flat, as is MAC addressing), packets do not normally cycle through routers even when the network has redundant paths. (However, packets can cycle when router tables are misconfigured; but as we learned in Chapter 4, IP uses a special datagram header field to limit the cycling.) Thus, packets are not restricted to a spanning tree and can use the best path between source and destination. Because routers do not have the spanning tree restriction, they have allowed the Internet to be built with a rich topology that includes, for example, multiple active links between Europe and North America. Another feature of routers is that they provide firewall protection against layer-2 broadcast storms. Perhaps the most significant drawback of routers, though, is that they are not plug-and-play---they and the hosts that connect to them need their IP addresses to be configured. Also, routers often have a larger per-packet processing time than switches, because they have to process up through the layer-3 fields. Finally, there are two different ways to pronounce the word router, either as "rootor" or as "rowter," and people waste a lot of time arguing over the proper pronunciation \[Perlman 1999\].

Given that both switches and routers have their pros and cons (as summarized in Table 6.1), when should an institutional network (for example, a university campus network or a corporate campus network) use switches, and when should it use routers?

Table 6.1 Comparison of the typical features of popular interconnection devices

|                   | Hubs | Routers | Switches |
|-------------------|------|---------|----------|
| Traffic isolation | No   | Yes     | Yes      |
| Plug and play     | Yes  | No      | Yes      |
| Optimal routing   | No   | Yes     | No       |

Typically, small networks consisting of a few hundred hosts have a few LAN segments. Switches suffice for these small networks, as they localize traffic and increase aggregate throughput without requiring any configuration of IP addresses. But larger networks consisting of thousands of hosts typically include routers within the network (in addition to switches). The routers provide a more robust isolation of traffic, control broadcast storms, and use more "intelligent" routes among the hosts in the network. For more discussion of the pros and cons of switched versus routed networks, as well as a discussion of how switched LAN technology can be extended to accommodate two orders of magnitude more hosts than today's Ethernets, see \[Meyers 2004; Kim 2008\].

6.4.4 Virtual Local Area Networks (VLANs)

In our earlier discussion of Figure 6.15, we noted that modern institutional LANs are often configured hierarchically, with each workgroup (department) having its own switched LAN connected to the switched LANs of other groups via a switch hierarchy. While such a configuration works well in an ideal world, the real world is often far from ideal. Three drawbacks can be identified in the configuration in Figure 6.15:

- Lack of traffic isolation. Although the hierarchy localizes group traffic to within a single switch, broadcast traffic (e.g., frames carrying ARP and DHCP messages or frames whose destination has not yet been learned by a self-learning switch) must still traverse the entire institutional network. Limiting the scope of such broadcast traffic would improve LAN performance. Perhaps more importantly, it also may be desirable to limit LAN broadcast traffic for security/privacy reasons. For example, if one group contains the company's executive management team and another group contains disgruntled employees running Wireshark packet sniffers, the network manager may well prefer that the executives' traffic never even reaches employee hosts. This type of isolation could be provided by replacing the center switch in Figure 6.15 with a router. We'll see shortly that this isolation also can be achieved via a switched (layer 2) solution.
- Inefficient use of switches. If instead of three groups, the institution had 10 groups, then 10 first-level switches would be required. If each group were small, say less than 10 people, then a single 96-port switch would likely be large enough to accommodate everyone, but this single switch would not provide traffic isolation.
- Managing users. If an employee moves between groups, the physical cabling must be changed to connect the employee to a different switch in Figure 6.15. Employees belonging to two groups make the problem even harder.

Fortunately, each of these difficulties can be handled by a switch that supports virtual local area networks (VLANs). As the name suggests, a switch that supports VLANs allows multiple virtual local area networks to be defined over a single physical local area network infrastructure. Hosts within a VLAN communicate with each other as if they (and no other hosts) were connected to the switch. In a port-based VLAN, the switch's ports (interfaces) are divided into groups by the network manager. Each group constitutes a VLAN, with the ports in each VLAN forming a broadcast domain (i.e., broadcast traffic from one port can only reach other ports in the group). Figure 6.25 shows a single switch with 16 ports. Ports 2 to 8 belong to the EE VLAN, while ports 9 to 15 belong to the CS VLAN (ports 1 and 16 are unassigned). This VLAN solves all of the difficulties noted above---EE and CS VLAN frames are isolated from each other, the two switches in Figure 6.15 have been replaced by a single switch, and if the user at switch port 8 joins the CS Department, the network operator simply reconfigures the VLAN software so that port 8 is now associated with the CS VLAN.
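
Here is a minimal sketch of the port-to-VLAN bookkeeping that makes this work, using the port assignments of Figure 6.25; the variable and function names are our own.

```python
# Port-to-VLAN assignments for the 16-port switch of Figure 6.25
# (ports 1 and 16 left unassigned).
port_vlan = {**{p: "EE" for p in range(2, 9)},
             **{p: "CS" for p in range(9, 16)}}

def same_broadcast_domain(in_port, out_port):
    """A port-based VLAN switch delivers a frame (including broadcasts)
    only between ports configured into the same VLAN."""
    return (in_port in port_vlan and out_port in port_vlan
            and port_vlan[in_port] == port_vlan[out_port])

print(same_broadcast_domain(2, 8))    # True: both ports are in the EE VLAN
print(same_broadcast_domain(2, 9))    # False: an EE frame never reaches a CS port
```
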
One can easily imagine how the VLAN switch is configured and operates---the network manager declares a port to belong to a given VLAN (with undeclared ports belonging to a default VLAN) using switch management software, a table of port-to-VLAN mappings is maintained within the switch, and switch hardware only delivers frames between ports belonging to the same VLAN.

Figure 6.25 A single switch with two configured VLANs

But by completely isolating the two VLANs, we have introduced a new difficulty! How can traffic from the EE Department be sent to the CS Department? One way to handle this would be to connect a VLAN switch port (e.g., port 1 in Figure 6.25) to an external router and configure that port to belong to both the EE and CS VLANs. In this case, even though the EE and CS departments share the same physical switch, the logical configuration would look as if the EE and CS departments had separate switches connected via a router. An IP datagram going from the EE to the CS department would first cross the EE VLAN to reach the router and then be forwarded by the router back over the CS VLAN to the CS host. Fortunately, switch vendors make such configurations easy for the network manager by building a single device that contains both a VLAN switch and a router, so a separate external router is not needed. A homework problem at the end of the chapter explores this scenario in more detail. Returning again to Figure 6.15, let's now suppose that rather than having a separate Computer Engineering department, some EE and CS faculty are housed in a separate building, where (of course!) they need network access, and (of course!) they'd like to be part of their department's VLAN. Figure 6.26 shows a second 8-port switch, where the switch ports have been defined as belonging to the EE or the CS VLAN, as needed. But how should these two switches be interconnected? One easy solution would be to define a port belonging to the CS VLAN on each switch (similarly for the EE VLAN) and to connect these ports to each other, as shown in Figure 6.26(a). This solution doesn't scale, however, since N VLANs would require N ports on each switch simply to interconnect the two switches. A more scalable approach to interconnecting VLAN switches is known as VLAN trunking. In the VLAN trunking approach shown in Figure 6.26(b), a special port on each switch (port 16 on the left switch and port 1 on the right switch) is configured as a trunk port to interconnect the two VLAN switches. The trunk port belongs to all VLANs, and frames sent to any VLAN are forwarded over the trunk link to the other switch. But this raises yet another question: How does a switch know that a frame arriving on a trunk port belongs to a particular VLAN? The IEEE has defined an extended Ethernet frame format, 802.1Q, for frames crossing a VLAN trunk. As shown in Figure 6.27, the 802.1Q frame consists of the standard Ethernet frame with a four-byte VLAN tag added into the header that carries the identity of the VLAN to which the frame belongs. The VLAN tag is added into a frame by the switch at the sending side of a VLAN trunk, and parsed and removed by the switch at the receiving side of the trunk. The VLAN tag itself consists of a 2-byte Tag Protocol Identifier (TPID) field (with a fixed hexadecimal value of 81-00), a 2-byte Tag Control Information field that contains a 12-bit VLAN identifier field, and a 3-bit priority field that is similar in intent to the IP datagram TOS field.
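
Here is a minimal sketch of tagging and untagging at the two ends of a trunk, under the simplifying assumptions that the CRC is recomputed elsewhere and that the priority bits occupy the top three bits of the Tag Control Information field; the function names are our own.

```python
import struct

TPID = 0x8100    # fixed Tag Protocol Identifier, 81-00 hexadecimal

def add_vlan_tag(frame, vlan_id, priority=0):
    """Insert a 4-byte 802.1Q tag between the source-address and type
    fields of an untagged Ethernet frame (CRC handling omitted)."""
    tci = (priority << 13) | (vlan_id & 0x0FFF)       # 3-bit priority, 12-bit VLAN ID
    return frame[:12] + struct.pack("!HH", TPID, tci) + frame[12:]

def strip_vlan_tag(frame):
    """Parse and remove the tag at the receiving side of the trunk."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    assert tpid == TPID, "not an 802.1Q-tagged frame"
    return tci & 0x0FFF, frame[:12] + frame[16:]      # (VLAN ID, untagged frame)

tagged = add_vlan_tag(bytes(12) + b"\x08\x00payload", vlan_id=42)
print(strip_vlan_tag(tagged))                         # (42, original frame)
```
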
Figure 6.26 Connecting two VLAN switches with two VLANs: (a) two cables (b) trunked

Figure 6.27 Original Ethernet frame (top), 802.1Q-tagged Ethernet VLAN frame (below)

In this discussion, we've only briefly touched on VLANs and have focused on port-based VLANs. We should also mention that VLANs can be defined in several other ways. In MAC-based VLANs, the network manager specifies the set of MAC addresses that belong to each VLAN; whenever a device attaches to a port, the port is connected into the appropriate VLAN based on the MAC address of the device. VLANs can also be defined based on network-layer protocols (e.g., IPv4, IPv6, or AppleTalk) and other criteria. It is also possible for VLANs to be extended across IP routers, allowing islands of LANs to be connected together to form a single VLAN that could span the globe \[Yu 2011\]. See the 802.1Q standard \[IEEE 802.1q 2005\] for more details.

6.5 Link Virtualization: A Network as a Link Layer

Because this chapter concerns link-layer protocols, and given that we're now nearing the chapter's end, let's reflect on how our understanding of the term link has evolved. We began this chapter by viewing the link as a physical wire connecting two communicating hosts. In studying multiple access protocols, we saw that multiple hosts could be connected by a shared wire and that the "wire" connecting the hosts could be radio spectra or other media. This led us to consider the link a bit more abstractly as a channel, rather than as a wire. In our study of Ethernet LANs (Figure 6.15) we saw that the interconnecting media could actually be a rather complex switched infrastructure. Throughout this evolution, however, the hosts themselves maintained the view that the interconnecting medium was simply a link-layer channel connecting two or more hosts. We saw, for example, that an Ethernet host can be blissfully unaware of whether it is connected to other LAN hosts by a single short LAN segment (Figure 6.17) or by a geographically dispersed switched LAN (Figure 6.15) or by a VLAN (Figure 6.26). In the case of a dialup modem connection between two hosts, the link connecting the two hosts is actually the telephone network---a logically separate, global telecommunications network with its own switches, links, and protocol stacks for data transfer and signaling. From the Internet link-layer point of view, however, the dial-up connection through the telephone network is viewed as a simple "wire." In this sense, the Internet virtualizes the telephone network, viewing the telephone network as a link-layer technology providing link-layer connectivity between two Internet hosts. You may recall from our discussion of overlay networks in Chapter 2 that an overlay network similarly views the Internet as a means for providing connectivity between overlay nodes, seeking to overlay the Internet in the same way that the Internet overlays the telephone network. In this section, we'll consider Multiprotocol Label Switching (MPLS) networks. Unlike the circuit-switched telephone network, MPLS is a packet-switched, virtual-circuit network in its own right. It has its own packet formats and forwarding behaviors. Thus, from a pedagogical viewpoint, a discussion of MPLS fits well into a study of either the network layer or the link layer. From an Internet viewpoint, however, we can consider MPLS, like the telephone network and switched-Ethernets, as a link-layer technology that serves to interconnect IP devices.
Thus, we'll consider MPLS in our discussion of the link layer. Frame-relay and ATM networks can also be used to interconnect IP devices, though they represent a slightly older (but still deployed) technology and will not be covered here; see the very readable book \[Goralski 1999\] for details. Our treatment of MPLS will be necessarily brief, as entire books could be (and have been) written on these networks. We recommend \[Davie 2000\] for details on MPLS. We'll focus here primarily on how MPLS serves to interconnect IP devices, although we'll dive a bit deeper into the underlying technologies as well.

6.5.1 Multiprotocol Label Switching (MPLS)

Multiprotocol Label Switching (MPLS) evolved from a number of industry efforts in the mid-to-late 1990s to improve the forwarding speed of IP routers by adopting a key concept from the world of virtual-circuit networks: a fixed-length label. The goal was not to abandon the destination-based IP datagram-forwarding infrastructure for one based on fixed-length labels and virtual circuits, but to augment it by selectively labeling datagrams and allowing routers to forward datagrams based on fixed-length labels (rather than destination IP addresses) when possible. Importantly, these techniques work hand-in-hand with IP, using IP addressing and routing. The IETF unified these efforts in the MPLS protocol \[RFC 3031, RFC 3032\], effectively blending VC techniques into a routed datagram network. Let's begin our study of MPLS by considering the format of a link-layer frame that is handled by an MPLS-capable router. Figure 6.28 shows that a link-layer frame transmitted between MPLS-capable devices has a small MPLS header added between the layer-2 (e.g., Ethernet) header and layer-3 (i.e., IP) header. RFC 3032 defines the format of the MPLS header for such links; headers are defined for ATM and frame-relay networks as well in other RFCs. Among the fields in the MPLS header are the label, 3 bits reserved for experimental use, a single S bit, which is used to indicate the end of a series of "stacked" MPLS headers (an advanced topic that we'll not cover here), and a time-to-live field.

Figure 6.28 MPLS header: Located between link- and network-layer headers

It's immediately evident from Figure 6.28 that an MPLS-enhanced frame can only be sent between routers that are both MPLS capable (since a non-MPLS-capable router would be quite confused when it found an MPLS header where it had expected to find the IP header!). An MPLS-capable router is often referred to as a label-switched router, since it forwards an MPLS frame by looking up the MPLS label in its forwarding table and then immediately passing the datagram to the appropriate output interface. Thus, the MPLS-capable router need not extract the destination IP address and perform a lookup of the longest prefix match in the forwarding table. But how does a router know if its neighbor is indeed MPLS capable, and how does a router know what label to associate with the given IP destination? To answer these questions, we'll need to take a look at the interaction among a group of MPLS-capable routers.
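
Setting those questions aside for a moment, the forwarding operation itself is simple to sketch. The table entries and label values below are hypothetical; the point is that forwarding is a single exact-match lookup on the incoming label, with no longest-prefix match on a destination IP address.

```python
# A label-switched router's forwarding table maps an incoming label to an
# (outgoing interface, outgoing label) pair. All values here are hypothetical.
mpls_table = {
    6: ("interface0", 10),   # one labeled path toward some destination A
    9: ("interface1", 8),    # an alternate labeled path toward the same A
}

def forward_mpls(in_label):
    """One exact-match lookup on the label: pick the output interface
    and swap in the outgoing label. The IP header is never examined,
    so no longest-prefix match is needed."""
    out_iface, out_label = mpls_table[in_label]
    return out_iface, out_label

print(forward_mpls(6))   # ('interface0', 10)
```
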
Router R3 has advertised to +router R4 that it can route to destinations A and D, and that incoming +frames with MPLS labels 10 and 12, respectively, will be switched toward +those destinations. Router R2 has also advertised to router R4 that it +(R2) can reach destination A, and that a received frame with MPLS label +8 will be switched toward A. Note that router R4 is now in the +interesting position of having + +Figure 6.29 MPLS-enhanced forwarding + +two MPLS paths to reach A: via interface 0 with outbound MPLS label 10, +and via interface 1 with an MPLS label of 8. The broad picture painted +in Figure 6.29 is that IP devices R5, R6, A, and D are connected +together via an MPLS infrastructure (MPLS-capable routers R1, R2, R3, +and R4) in much the same way that a switched LAN or an ATM network can +connect together IP devices. And like a switched LAN or ATM network, the +MPLS-capable routers R1 through R4 do so without ever touching the IP +header of a packet. In our discussion above, we've not specified the +specific protocol used to distribute labels among the MPLS-capable +routers, as the details of this signaling are well beyond the scope of +this book. We note, however, that the IETF working group on MPLS has +specified in \[RFC 3468\] that an extension of the RSVP protocol, known +as RSVP-TE \[RFC 3209\], will be the focus of its efforts for MPLS +signaling. We've also not discussed how MPLS actually computes the paths +for packets among MPLS capable routers, nor how it gathers link-state +information (e.g., amount of link bandwidth unreserved by MPLS) to + +use in these path computations. Existing link-state routing algorithms +(e.g., OSPF) have been extended to flood this information to +MPLS-capable routers. Interestingly, the actual path computation +algorithms are not standardized, and are currently vendor-specific. Thus +far, the emphasis of our discussion of MPLS has been on the fact that +MPLS performs switching based on labels, without needing to consider the +IP address of a packet. The true advantages of MPLS and the reason for +current interest in MPLS, however, lie not in the potential increases in +switching speeds, but rather in the new traffic management capabilities +that MPLS enables. As noted above, R4 has two MPLS paths to A. If +forwarding were performed up at the IP layer on the basis of IP address, +the IP routing protocols we studied in Chapter 5 would specify only a +single, least-cost path to A. Thus, MPLS provides the ability to forward +packets along routes that would not be possible using standard IP +routing protocols. This is one simple form of traffic engineering using +MPLS \[RFC 3346; RFC 3272; RFC 2702; Xiao 2000\], in which a network +operator can override normal IP routing and force some of the traffic +headed toward a given destination along one path, and other traffic +destined toward the same destination along another path (whether for +policy, performance, or some other reason). It is also possible to use +MPLS for many other purposes as well. It can be used to perform fast +restoration of MPLS forwarding paths, e.g., to reroute traffic over a +precomputed failover path in response to link failure \[Kar 2000; Huang +2002; RFC 3469\]. Finally, we note that MPLS can, and has, been used to +implement so-called virtual private networks (VPNs). In implementing a +VPN for a customer, an ISP uses its MPLS-enabled network to connect +together the customer's various networks. 
MPLS can be used to isolate both the resources and the addressing used by the customer's VPN from those of other users crossing the ISP's network; see \[DeClercq 2002\] for details. Our discussion of MPLS has been brief, and we encourage you to consult the references we've mentioned. We note that with so many possible uses for MPLS, it appears that it is rapidly becoming the Swiss Army knife of Internet traffic engineering!

6.6 Data Center Networking

In recent years, Internet companies such as Google, Microsoft, Facebook, and Amazon (as well as their counterparts in Asia and Europe) have built massive data centers, each housing tens to hundreds of thousands of hosts, and concurrently supporting many distinct cloud applications (e.g., search, e-mail, social networking, and e-commerce). Each data center has its own data center network that interconnects its hosts with each other and interconnects the data center with the Internet. In this section, we provide a brief introduction to data center networking for cloud applications.

The cost of a large data center is huge, exceeding \$12 million per month for a 100,000-host data center \[Greenberg 2009a\]. Of these costs, about 45 percent can be attributed to the hosts themselves (which need to be replaced every 3--4 years); 25 percent to infrastructure, including transformers, uninterruptible power supply (UPS) systems, generators for long-term outages, and cooling systems; 15 percent for electric utility costs for the power draw; and 15 percent for networking, including network gear (switches, routers, and load balancers), external links, and transit traffic costs. (In these percentages, costs for equipment are amortized so that a common cost metric is applied for one-time purchases and ongoing expenses such as power.) While networking is not the largest cost, networking innovation is the key to reducing overall cost and maximizing performance \[Greenberg 2009a\].

The worker bees in a data center are the hosts: They serve content (e.g., Web pages and videos), store e-mails and documents, and collectively perform massively distributed computations (e.g., distributed index computations for search engines). The hosts in data centers, called blades and resembling pizza boxes, are generally commodity hosts that include CPU, memory, and disk storage. The hosts are stacked in racks, with each rack typically having 20 to 40 blades. At the top of each rack there is a switch, aptly named the Top of Rack (TOR) switch, that interconnects the hosts in the rack with each other and with other switches in the data center. Specifically, each host in the rack has a network interface card that connects to its TOR switch, and each TOR switch has additional ports that can be connected to other switches. Today hosts typically have 40 Gbps Ethernet connections to their TOR switches \[Greenberg 2015\]. Each host is also assigned its own data-center-internal IP address.

The data center network supports two types of traffic: traffic flowing between external clients and internal hosts, and traffic flowing between internal hosts. To handle flows between external clients and internal hosts, the data center network includes one or more border routers, connecting the data center network to the public Internet. The data center network therefore interconnects the racks with each other and connects the racks to the border routers. Figure 6.30 shows an example of a data center network.
Data center network design, the art of designing the interconnection network and protocols that connect the racks with each other and with the border routers, has become an important branch of computer networking research in recent years \[Al-Fares 2008; Greenberg 2009a; Greenberg 2009b; Mysore 2009; Guo 2009; Wang 2010\].

Figure 6.30 A data center network with a hierarchical topology

Load Balancing

A cloud data center, such as a Google or Microsoft data center, provides many applications concurrently, such as search, e-mail, and video applications. To support requests from external clients, each application is associated with a publicly visible IP address to which clients send their requests and from which they receive responses. Inside the data center, the external requests are first directed to a load balancer whose job it is to distribute requests to the hosts, balancing the load across the hosts as a function of their current load. A large data center will often have several load balancers, each one devoted to a set of specific cloud applications. Such a load balancer is sometimes referred to as a "layer-4 switch" since it makes decisions based on the destination port number (layer 4) as well as the destination IP address in the packet. Upon receiving a request for a particular application, the load balancer forwards it to one of the hosts that handles the application. (A host may then invoke the services of other hosts to help process the request.) When the host finishes processing the request, it sends its response back to the load balancer, which in turn relays the response back to the external client. The load balancer not only balances the work load across hosts, but also provides a NAT-like function, translating the public external IP address to the internal IP address of the appropriate host, and then translating back for packets traveling in the reverse direction back to the clients. This prevents clients from contacting hosts directly, which has the security benefit of hiding the internal network structure.
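The NAT-like relaying just described can be sketched in a few lines. This is a minimal sketch under simplifying assumptions: the addresses are hypothetical, and a real layer-4 switch would typically choose a host based on its measured load, rather than on the simple client hash used here for brevity.

```python
import hashlib

PUBLIC_ADDRESS = "198.51.100.7"                      # hypothetical public IP
APP_HOSTS = ["10.0.1.11", "10.0.1.12", "10.0.1.13"]  # hypothetical internal hosts

flows = {}  # (client IP, client port) -> internal host

def forward_request(client_ip, client_port):
    # Pick a host for a new flow (hashing keeps all of a flow's packets
    # on one host) and remember the mapping; the datagram's destination
    # is then rewritten from PUBLIC_ADDRESS to the chosen internal host.
    key = (client_ip, client_port)
    if key not in flows:
        digest = hashlib.sha256(f"{client_ip}:{client_port}".encode()).digest()
        flows[key] = APP_HOSTS[digest[0] % len(APP_HOSTS)]
    return flows[key]

def forward_response(client_ip, client_port):
    # Reverse direction: the internal host's source address is rewritten
    # back to the application's public address before the packet leaves.
    assert (client_ip, client_port) in flows
    return PUBLIC_ADDRESS
```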
Hierarchical Architecture

For a small data center housing only a few thousand hosts, a simple network consisting of a border router, a load balancer, and a few tens of racks all interconnected by a single Ethernet switch could possibly suffice. But to scale to tens to hundreds of thousands of hosts, a data center often employs a hierarchy of routers and switches, such as the topology shown in Figure 6.30. At the top of the hierarchy, the border router connects to access routers (only two are shown in Figure 6.30, but there can be many more). Below each access router there are three tiers of switches. Each access router connects to a top-tier switch, and each top-tier switch connects to multiple second-tier switches and a load balancer. Each second-tier switch in turn connects to multiple racks via the racks' TOR switches (third-tier switches). All links typically use Ethernet for their link-layer and physical-layer protocols, with a mix of copper and fiber cabling. With such a hierarchical design, it is possible to scale a data center to hundreds of thousands of hosts.

Because it is critical for a cloud application provider to continually provide applications with high availability, data centers also include redundant network equipment and redundant links in their designs (not shown in Figure 6.30). For example, each TOR switch can connect to two tier-2 switches, and each access router, tier-1 switch, and tier-2 switch can be duplicated and integrated into the design \[Cisco 2012; Greenberg 2009b\]. In the hierarchical design in Figure 6.30, observe that the hosts below each access router form a single subnet. In order to localize ARP broadcast traffic, each of these subnets is further partitioned into smaller VLAN subnets, each comprising a few hundred hosts \[Greenberg 2009a\].

Although the conventional hierarchical architecture just described solves the problem of scale, it suffers from limited host-to-host capacity \[Greenberg 2009b\]. To understand this limitation, consider again Figure 6.30, and suppose each host connects to its TOR switch with a 1 Gbps link, whereas the links between switches are 10 Gbps Ethernet links. Two hosts in the same rack can always communicate at a full 1 Gbps, limited only by the rate of the hosts' network interface cards. However, if there are many simultaneous flows in the data center network, the maximum rate between two hosts in different racks can be much less. To gain insight into this issue, consider a traffic pattern consisting of 40 simultaneous flows between 40 pairs of hosts in different racks. Specifically, suppose each of 10 hosts in rack 1 in Figure 6.30 sends a flow to a corresponding host in rack 5. Similarly, there are 10 simultaneous flows between pairs of hosts in racks 2 and 6, 10 simultaneous flows between racks 3 and 7, and 10 simultaneous flows between racks 4 and 8. If each flow evenly shares a link's capacity with other flows traversing that link, then the 40 flows crossing the 10 Gbps A-to-B link (as well as the 10 Gbps B-to-C link) will each only receive 10 Gbps/40 = 250 Mbps, which is significantly less than the 1 Gbps network interface card rate. The problem becomes even more acute for flows that need to travel higher up the hierarchy. One possible solution to this limitation is to deploy higher-rate switches and routers. But this would significantly increase the cost of the data center, because switches and routers with high port speeds are very expensive.

Supporting high-bandwidth host-to-host communication is important because a key requirement in data centers is flexibility in placement of computation and services \[Greenberg 2009b; Farrington 2010\]. For example, a large-scale Internet search engine may run on thousands of hosts spread across multiple racks with significant bandwidth requirements between all pairs of hosts. Similarly, a cloud computing service such as EC2 may wish to place the multiple virtual machines comprising a customer's service on the physical hosts with the most capacity irrespective of their location in the data center. If these physical hosts are spread across multiple racks, network bottlenecks as described above may result in poor performance.

Trends in Data Center Networking

In order to reduce the cost of data centers, and at the same time improve their delay and throughput performance, Internet cloud giants such as Google, Facebook, Amazon, and Microsoft are continually deploying new data center network designs. Although these designs are proprietary, many important trends can nevertheless be identified. One such trend is to deploy new interconnection architectures and network protocols that overcome the drawbacks of the traditional hierarchical designs.
One such approach is to replace the +hierarchy of switches and routers with a fully connected topology +\[Facebook 2014; Al-Fares 2008; Greenberg 2009b; Guo 2009\], such as the +topology shown in Figure 6.31. In this design, each tier-1 switch +connects to all of the tier-2 switches so that (1) host-to-host traffic +never has to rise above the switch tiers, and (2) with n tier-1 +switches, between any two tier-2 switches there are n disjoint paths. +Such a design can significantly improve the host-to-host capacity. To +see this, consider again our example of 40 flows. The topology in Figure +6.31 can handle such a flow pattern since there are four distinct paths +between the first tier-2 switch and the second tier-2 switch, together +providing an aggregate capacity of 40 Gbps between the first two tier-2 +switches. Such a design not only alleviates the host-to-host capacity +limitation, but also creates a more flexible computation and service +environment in which communication between any two racks not connected +to the same switch is logically equivalent, irrespective of their +locations in the data center. Another major trend is to employ shipping +container--based modular data centers (MDCs) \[YouTube 2009; Waldrop +2007\]. In an MDC, a factory builds, within a + +Figure 6.31 Highly interconnected data network topology + +standard 12-meter shipping container, a "mini data center" and ships the +container to the data center location. Each container has up to a few +thousand hosts, stacked in tens of racks, which are packed closely +together. At the data center location, multiple containers are +interconnected with each other and also with the Internet. Once a +prefabricated container is deployed at a data center, it is often +difficult to service. Thus, each container is designed for graceful +performance degradation: as components (servers and switches) fail over +time, the container continues to operate but with degraded performance. +When many components have failed and performance has dropped below a +threshold, the entire container is removed and replaced with a fresh +one. Building a data center out of containers creates new networking +challenges. With an MDC, there are two types of networks: the +container-internal networks within each of the containers and the core +network connecting each container \[Guo 2009; Farrington 2010\]. Within +each container, at the scale of up to a few thousand hosts, it is +possible to build a fully connected network (as described above) using +inexpensive commodity Gigabit Ethernet switches. However, the design of +the core network, interconnecting hundreds to thousands of containers +while providing high host-to-host bandwidth across containers for +typical workloads, remains a challenging problem. A hybrid +electrical/optical switch architecture for interconnecting the +containers is proposed in \[Farrington 2010\]. When using highly +interconnected topologies, one of the major issues is designing routing +algorithms among the switches. One possibility \[Greenberg 2009b\] is to +use a form of random routing. Another possibility \[Guo 2009\] is to +deploy multiple network interface cards in each host, connect each host +to multiple low-cost commodity switches, and allow the hosts themselves +to intelligently route traffic among the switches. Variations and +extensions of these approaches are currently being deployed in +contemporary data centers. 
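The capacity figures in the two examples above can be checked with a few lines of arithmetic, using the rates assumed throughout this section (1 Gbps host links, 10 Gbps switch-to-switch links, 40 inter-rack flows, and four tier-1 switches in the fully connected design):

```python
NIC_GBPS = 1.0        # host-to-TOR link rate
TRUNK_GBPS = 10.0     # switch-to-switch link rate
FLOWS = 40            # simultaneous inter-rack flows
TIER1_SWITCHES = 4    # disjoint paths in the fully connected design

# Hierarchical topology (Figure 6.30): all 40 flows share one trunk.
per_flow = min(NIC_GBPS, TRUNK_GBPS / FLOWS)
print(f"hierarchical: {per_flow * 1000:.0f} Mbps per flow")      # 250 Mbps

# Fully connected topology (Figure 6.31): 4 x 10 Gbps = 40 Gbps of
# aggregate capacity between any two tier-2 switches, so each flow is
# limited only by its 1 Gbps network interface card.
per_flow = min(NIC_GBPS, TIER1_SWITCHES * TRUNK_GBPS / FLOWS)
print(f"fully connected: {per_flow * 1000:.0f} Mbps per flow")   # 1000 Mbps
```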
Another important trend is that large cloud providers are increasingly building or customizing just about everything that is in their data centers, including network adapters, switches, routers, TORs, software, and networking protocols \[Greenberg 2015, Singh 2015\]. Another trend, pioneered by Amazon, is to improve reliability with "availability zones," which essentially replicate distinct data centers in different nearby buildings. By having the buildings nearby (a few kilometers apart), transactional data can be synchronized across the data centers in the same availability zone while providing fault tolerance \[Amazon 2014\]. Many more innovations in data center design are likely to continue to come; interested readers are encouraged to see the recent papers and videos on data center network design.

6.7 Retrospective: A Day in the Life of a Web Page Request

Now that we've covered the link layer in this chapter, and the network, transport, and application layers in earlier chapters, our journey down the protocol stack is complete! In the very beginning of this book (Section 1.1), we wrote "much of this book is concerned with computer network protocols," and in the first six chapters, we've certainly seen that this is indeed the case! Before heading into the topical chapters in the second part of this book, we'd like to wrap up our journey down the protocol stack by taking an integrated, holistic view of the protocols we've learned about so far. One way to take this "big picture" view is to identify the many (many!) protocols that are involved in satisfying even the simplest request: downloading a Web page. Figure 6.32 illustrates our setting: a student, Bob, connects a laptop to his school's Ethernet switch and downloads a Web page (say the home page of www.google.com). As we now know, there's a lot going on "under the hood" to satisfy this seemingly simple request. A Wireshark lab at the end of this chapter examines trace files containing a number of the packets involved in similar scenarios in more detail.

6.7.1 Getting Started: DHCP, UDP, IP, and Ethernet

Let's suppose that Bob boots up his laptop and then connects it to an Ethernet cable connected to the school's Ethernet switch, which in turn is connected to the school's router, as shown in Figure 6.32. The school's router is connected to an ISP, in this example, comcast.net. In this example, comcast.net is providing the DNS service for the school; thus, the DNS server resides in the Comcast network rather than the school network. We'll assume that the DHCP server is running within the router, as is often the case. When Bob first connects his laptop to the network, he can't do anything (e.g., download a Web page) without an IP address. Thus, the first network-related

Figure 6.32 A day in the life of a Web page request: Network setting and actions

action taken by Bob's laptop is to run the DHCP protocol to obtain an IP address, as well as other information, from the local DHCP server:

1. The operating system on Bob's laptop creates a DHCP request message (Section 4.3.3) and puts this message within a UDP segment (Section 3.3) with destination port 67 (DHCP server) and source port 68 (DHCP client). The UDP segment is then placed within an IP datagram (Section 4.3.1) with a broadcast IP destination address (255.255.255.255) and a source IP address of 0.0.0.0, since Bob's laptop doesn't yet have an IP address.
2. The IP datagram containing the DHCP request message is then placed within an Ethernet frame (Section 6.4.2). The Ethernet frame has a destination MAC address of FF:FF:FF:FF:FF:FF so that the frame will be broadcast to all devices connected to the switch (hopefully including a DHCP server); the frame's source MAC address is that of Bob's laptop, 00:16:D3:23:68:8A.

3. The broadcast Ethernet frame containing the DHCP request is the first frame sent by Bob's laptop to the Ethernet switch. The switch broadcasts the incoming frame on all outgoing ports, including the port connected to the router.

4. The router receives the broadcast Ethernet frame containing the DHCP request on its interface with MAC address 00:22:6B:45:1F:1B, and the IP datagram is extracted from the Ethernet frame. The datagram's broadcast IP destination address indicates that this IP datagram should be processed by upper layer protocols at this node, so the datagram's payload (a UDP segment) is thus demultiplexed (Section 3.2) up to UDP, and the DHCP request message is extracted from the UDP segment. The DHCP server now has the DHCP request message.

5. Let's suppose that the DHCP server running within the router can allocate IP addresses in the CIDR (Section 4.3.3) block 68.85.2.0/24. In this example, all IP addresses used within the school are thus within Comcast's address block. Let's suppose the DHCP server allocates address 68.85.2.101 to Bob's laptop. The DHCP server creates a DHCP ACK message (Section 4.3.3) containing this IP address, as well as the IP address of the DNS server (68.87.71.226), the IP address for the default gateway router (68.85.2.1), and the subnet block (68.85.2.0/24) (equivalently, the "network mask"). The DHCP message is put inside a UDP segment, which is put inside an IP datagram, which is put inside an Ethernet frame. The Ethernet frame has a source MAC address of the router's interface to the school network (00:22:6B:45:1F:1B) and a destination MAC address of Bob's laptop (00:16:D3:23:68:8A).

6. The Ethernet frame containing the DHCP ACK is sent (unicast) by the router to the switch. Because the switch is self-learning (Section 6.4.3) and previously received an Ethernet frame (containing the DHCP request) from Bob's laptop, the switch knows to forward a frame addressed to 00:16:D3:23:68:8A only to the output port leading to Bob's laptop.

7. Bob's laptop receives the Ethernet frame containing the DHCP ACK, extracts the IP datagram from the Ethernet frame, extracts the UDP segment from the IP datagram, and extracts the DHCP ACK message from the UDP segment. Bob's DHCP client then records its IP address and the IP address of its DNS server. It also installs the address of the default gateway into its IP forwarding table (Section 4.1). Bob's laptop will send all datagrams with destination address outside of its subnet 68.85.2.0/24 to the default gateway. At this point, Bob's laptop has initialized its networking components and is ready to begin processing the Web page fetch. (Note that only the last two DHCP steps of the four presented in Chapter 4 are actually necessary.)
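The nested encapsulation of steps 1 and 2 can be pictured schematically as follows. This is only a sketch: the DHCP payload is abbreviated to its message type, and the ports and addresses are exactly the ones given in the steps above.

```python
# Step 1: the DHCP request rides in a UDP segment inside an IP datagram.
dhcp_request = {"message-type": "DHCP request"}    # payload, abbreviated

udp_segment = {"src-port": 68,                     # DHCP client
               "dst-port": 67,                     # DHCP server
               "payload": dhcp_request}

ip_datagram = {"src-addr": "0.0.0.0",              # laptop has no address yet
               "dst-addr": "255.255.255.255",      # IP broadcast
               "payload": udp_segment}

# Step 2: the datagram is framed for the link, with a broadcast
# destination MAC address so that the frame reaches the DHCP server.
ethernet_frame = {"src-mac": "00:16:D3:23:68:8A",  # Bob's laptop
                  "dst-mac": "FF:FF:FF:FF:FF:FF",  # link-layer broadcast
                  "payload": ip_datagram}
```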
6.7.2 Still Getting Started: DNS and ARP

When Bob types the URL for www.google.com into his Web browser, he begins the long chain of events that will eventually result in Google's home page being displayed by his Web browser. Bob's Web browser begins the process by creating a TCP socket (Section 2.7) that will be used to send the HTTP request (Section 2.2) to www.google.com. In order to create the socket, Bob's laptop will need to know the IP address of www.google.com. We learned in Section 2.5 that the DNS protocol is used to provide this name-to-IP-address translation service.

8. The operating system on Bob's laptop thus creates a DNS query message (Section 2.5.3), putting the string "www.google.com" in the question section of the DNS message. This DNS message is then placed within a UDP segment with a destination port of 53 (DNS server). The UDP segment is then placed within an IP datagram with an IP destination address of 68.87.71.226 (the address of the DNS server returned in the DHCP ACK in step 5) and a source IP address of 68.85.2.101.

9. Bob's laptop then places the datagram containing the DNS query message in an Ethernet frame. This frame will be sent (addressed, at the link layer) to the gateway router in Bob's school's network. However, even though Bob's laptop knows the IP address of the school's gateway router (68.85.2.1) via the DHCP ACK message in step 5 above, it doesn't know the gateway router's MAC address. In order to obtain the MAC address of the gateway router, Bob's laptop will need to use the ARP protocol (Section 6.4.1).

10. Bob's laptop creates an ARP query message with a target IP address of 68.85.2.1 (the default gateway), places the ARP message within an Ethernet frame with a broadcast destination address (FF:FF:FF:FF:FF:FF), and sends the Ethernet frame to the switch, which delivers the frame to all connected devices, including the gateway router.

11. The gateway router receives the frame containing the ARP query message on the interface to the school network, and finds that the target IP address of 68.85.2.1 in the ARP message matches the IP address of its interface. The gateway router thus prepares an ARP reply, indicating that its MAC address of 00:22:6B:45:1F:1B corresponds to IP address 68.85.2.1. It places the ARP reply message in an Ethernet frame, with a destination address of 00:16:D3:23:68:8A (Bob's laptop), and sends the frame to the switch, which delivers the frame to Bob's laptop.

12. Bob's laptop receives the frame containing the ARP reply message and extracts the MAC address of the gateway router (00:22:6B:45:1F:1B) from the ARP reply message.

13. Bob's laptop can now (finally!) address the Ethernet frame containing the DNS query to the gateway router's MAC address. Note that the IP datagram in this frame has an IP destination address of 68.87.71.226 (the DNS server), while the frame has a destination address of 00:22:6B:45:1F:1B (the gateway router). Bob's laptop sends this frame to the switch, which delivers the frame to the gateway router.
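The laptop's logic in steps 9 through 13 amounts to an ARP-cache lookup before the frame can be addressed. The sketch below uses the addresses from the walkthrough above; the helper standing in for the broadcast query and reply of steps 10 through 12 is, of course, a hypothetical placeholder.

```python
GATEWAY_IP = "68.85.2.1"   # from the DHCP ACK in step 5
arp_cache = {}             # IP address -> MAC address, initially empty

def broadcast_arp_query(target_ip):
    # Placeholder for steps 10-12: an ARP query inside a broadcast frame
    # (dst FF:FF:FF:FF:FF:FF); the gateway replies, unicast, with its MAC.
    assert target_ip == GATEWAY_IP
    return "00:22:6B:45:1F:1B"

def next_hop_mac(next_hop_ip):
    # Return the MAC address for the outgoing frame, issuing an ARP
    # query only on a cache miss.
    if next_hop_ip not in arp_cache:
        arp_cache[next_hop_ip] = broadcast_arp_query(next_hop_ip)
    return arp_cache[next_hop_ip]

# Step 13: the frame carrying the DNS query is addressed at the link
# layer to the gateway, although its IP destination is the DNS server.
frame = {"dst-mac": next_hop_mac(GATEWAY_IP),   # 00:22:6B:45:1F:1B
         "ip-dst": "68.87.71.226"}              # the DNS server
```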
6.7.3 Still Getting Started: Intra-Domain Routing to the DNS Server

14. The gateway router receives the frame and extracts the IP datagram containing the DNS query. The router looks up the destination address of this datagram (68.87.71.226) and determines from its forwarding table that the datagram should be sent to the leftmost router in the Comcast network in Figure 6.32. The IP datagram is placed inside a link-layer frame appropriate for the link connecting the school's router to the leftmost Comcast router, and the frame is sent over this link.

15. The leftmost router in the Comcast network receives the frame, extracts the IP datagram, examines the datagram's destination address (68.87.71.226), and determines from its forwarding table the outgoing interface on which to forward the datagram toward the DNS server. The forwarding table has been filled in by Comcast's intra-domain protocol (such as RIP, OSPF, or IS-IS; Section 5.3) as well as by the Internet's inter-domain protocol, BGP (Section 5.4).

16. Eventually the IP datagram containing the DNS query arrives at the DNS server. The DNS server extracts the DNS query message, looks up the name www.google.com in its DNS database (Section 2.5), and finds the DNS resource record that contains the IP address (64.233.169.105) for www.google.com (assuming that it is currently cached in the DNS server). Recall that this cached data originated in the authoritative DNS server (Section 2.5.2) for google.com. The DNS server forms a DNS reply message containing this hostname-to-IP-address mapping, places the DNS reply message in a UDP segment, and places the segment within an IP datagram addressed to Bob's laptop (68.85.2.101). This datagram will be forwarded back through the Comcast network to the school's router and from there, via the Ethernet switch, to Bob's laptop.

17. Bob's laptop extracts the IP address of the server www.google.com from the DNS message. Finally, after a lot of work, Bob's laptop is now ready to contact the www.google.com server!

6.7.4 Web Client-Server Interaction: TCP and HTTP

18. Now that Bob's laptop has the IP address of www.google.com, it can create the TCP socket (Section 2.7) that will be used to send the HTTP GET message (Section 2.2.3) to www.google.com. When Bob creates the TCP socket, the TCP in Bob's laptop must first perform a three-way handshake (Section 3.5.6) with the TCP in www.google.com. Bob's laptop thus first creates a TCP SYN segment with destination port 80 (for HTTP), places the TCP segment inside an IP datagram with a destination IP address of 64.233.169.105 (www.google.com), places the datagram inside a frame with a destination MAC address of 00:22:6B:45:1F:1B (the gateway router), and sends the frame to the switch.

19. The routers in the school network, Comcast's network, and Google's network forward the datagram containing the TCP SYN toward www.google.com, using the forwarding table in each router, as in steps 14--16 above. Recall that the router forwarding table entries governing forwarding of packets over the inter-domain link between the Comcast and Google networks are determined by the BGP protocol (Chapter 5).

20. Eventually, the datagram containing the TCP SYN arrives at www.google.com. The TCP SYN message is extracted from the datagram and demultiplexed to the welcome socket associated with port 80. A connection socket (Section 2.7) is created for the TCP connection between the Google HTTP server and Bob's laptop. A TCP SYNACK (Section 3.5.6) segment is generated, placed inside a datagram addressed to Bob's laptop, and finally placed inside a link-layer frame appropriate for the link connecting www.google.com to its first-hop router.

21. The datagram containing the TCP SYNACK segment is forwarded through the Google, Comcast, and school networks, eventually arriving at the Ethernet card in Bob's laptop. The datagram is demultiplexed within the operating system to the TCP socket created in step 18, which enters the connected state.
22. With the socket on Bob's laptop now (finally!) ready to send bytes to www.google.com, Bob's browser creates the HTTP GET message (Section 2.2.3) containing the URL to be fetched. The HTTP GET message is then written into the socket, with the GET message becoming the payload of a TCP segment. The TCP segment is placed in a datagram and sent and delivered to www.google.com as in steps 18--20 above.

23. The HTTP server at www.google.com reads the HTTP GET message from the TCP socket, creates an HTTP response message (Section 2.2), places the requested Web page content in the body of the HTTP response message, and sends the message into the TCP socket.

24. The datagram containing the HTTP reply message is forwarded through the Google, Comcast, and school networks, and arrives at Bob's laptop. Bob's Web browser program reads the HTTP response from the socket, extracts the HTML for the Web page from the body of the HTTP response, and finally (finally!) displays the Web page!

Our scenario above has covered a lot of networking ground! If you've understood most or all of the above example, then you've also covered a lot of ground since you first read Section 1.1, where we wrote "much of this book is concerned with computer network protocols" and you may have wondered what a protocol actually was! As detailed as the above example might seem, we've omitted a number of possible additional protocols (e.g., NAT running in the school's gateway router, wireless access to the school's network, security protocols for accessing the school network or encrypting segments or datagrams, network management protocols), and considerations (Web caching, the DNS hierarchy) that one would encounter in the public Internet. We'll cover a number of these topics and more in the second part of this book. Lastly, we note that our example above was an integrated and holistic, but also very "nuts and bolts," view of many of the protocols that we've studied in the first part of this book. The example focused more on the "how" than the "why." For a broader, more reflective view on the design of network protocols in general, see \[Clark 1988, RFC 5218\].

6.8 Summary

In this chapter, we've examined the link layer---its services, the principles underlying its operation, and a number of important specific protocols that use these principles in implementing link-layer services. We saw that the basic service of the link layer is to move a network-layer datagram from one node (host, switch, router, WiFi access point) to an adjacent node. We saw that all link-layer protocols operate by encapsulating a network-layer datagram within a link-layer frame before transmitting the frame over the link to the adjacent node. Beyond this common framing function, however, we learned that different link-layer protocols provide very different link access, delivery, and transmission services. These differences are due in part to the wide variety of link types over which link-layer protocols must operate. A simple point-to-point link has a single sender and receiver communicating over a single "wire." A multiple access link is shared among many senders and receivers; consequently, the link-layer protocol for a multiple access channel has a protocol (its multiple access protocol) for coordinating link access.
In the case of MPLS, the "link" connecting two adjacent nodes (for example, two IP routers that are adjacent in an IP sense---in that they are next-hop IP routers toward some destination) may actually be a network in and of itself. In one sense, the idea of a network being considered as a link should not seem odd. A telephone link connecting a home modem/computer to a remote modem/router, for example, is actually a path through a sophisticated and complex telephone network.

Among the principles underlying link-layer communication, we examined error-detection and -correction techniques, multiple access protocols, link-layer addressing, virtualization (VLANs), and the construction of extended switched LANs and data center networks. Much of the focus today at the link layer is on these switched networks. In the case of error detection/correction, we examined how it is possible to add additional bits to a frame's header in order to detect, and in some cases correct, bit-flip errors that might occur when the frame is transmitted over the link. We covered simple parity and checksumming schemes, as well as the more robust cyclic redundancy check. We then moved on to the topic of multiple access protocols. We identified and studied three broad approaches for coordinating access to a broadcast channel: channel partitioning approaches (TDM, FDM), random access approaches (the ALOHA protocols and CSMA protocols), and taking-turns approaches (polling and token passing). We studied the cable access network and found that it uses many of these multiple access methods. We saw that a consequence of having multiple nodes share a single broadcast channel was the need to provide node addresses at the link layer. We learned that link-layer addresses were quite different from network-layer addresses and that, in the case of the Internet, a special protocol (ARP---the Address Resolution Protocol) is used to translate between these two forms of addressing, and we studied the hugely successful Ethernet protocol in detail. We then examined how nodes sharing a broadcast channel form a LAN and how multiple LANs can be connected together to form larger LANs---all without the intervention of network-layer routing to interconnect these local nodes. We also learned how multiple virtual LANs can be created on a single physical LAN infrastructure.

We ended our study of the link layer by focusing on how MPLS networks provide link-layer services when they interconnect IP routers, and with an overview of the network designs for today's massive data centers. We wrapped up this chapter (and indeed the first six chapters) by identifying the many protocols that are needed to fetch a simple Web page. Having covered the link layer, our journey down the protocol stack is now over! Certainly, the physical layer lies below the link layer, but the details of the physical layer are probably best left for another course (for example, in communication theory, rather than computer networking). We have, however, touched upon several aspects of the physical layer in this chapter and in Chapter 1 (our discussion of physical media in Section 1.2). We'll consider the physical layer again when we study wireless link characteristics in the next chapter.

Although our journey down the protocol stack is over, our study of computer networking is not yet at an end. In the following three chapters we cover wireless networking, network security, and multimedia networking.
These three topics do not fit conveniently into any one layer; indeed, each topic crosscuts many layers. Understanding these topics (billed as advanced topics in some networking texts) thus requires a firm foundation in all layers of the protocol stack---a foundation that our study of the link layer has now completed!

Homework Problems and Questions

Chapter 6 Review Questions

SECTIONS 6.1--6.2

R1. Consider the transportation analogy in Section 6.1.1. If the passenger is analogous to a datagram, what is analogous to the link-layer frame?

R2. If all the links in the Internet were to provide reliable delivery service, would the TCP reliable delivery service be redundant? Why or why not?

R3. What are some of the possible services that a link-layer protocol can offer to the network layer? Which of these link-layer services have corresponding services in IP? In TCP?

SECTION 6.3

R4. Suppose two nodes start to transmit at the same time a packet of length L over a broadcast channel of rate R. Denote the propagation delay between the two nodes as d_prop. Will there be a collision if d_prop < L/R? Why or why not?

R5. In Section 6.3, we listed four desirable characteristics of a broadcast channel. Which of these characteristics does slotted ALOHA have? Which of these characteristics does token passing have?

R6. In CSMA/CD, after the fifth collision, what is the probability that a node chooses K=4? The result K=4 corresponds to a delay of how many seconds on a 10 Mbps Ethernet?

R7. Describe polling and token-passing protocols using the analogy of cocktail party interactions.

R8. Why would the token-ring protocol be inefficient if a LAN had a very large perimeter?

SECTION 6.4

R9. How big is the MAC address space? The IPv4 address space? The IPv6 address space?

R10. Suppose nodes A, B, and C each attach to the same broadcast LAN (through their adapters). If A sends thousands of IP datagrams to B with each encapsulating frame addressed to the MAC address of B, will C's adapter process these frames? If so, will C's adapter pass the IP datagrams in these frames to the network layer at C? How would your answers change if A sends frames with the MAC broadcast address?

R11. Why is an ARP query sent within a broadcast frame? Why is an ARP response sent within a frame with a specific destination MAC address?

R12. For the network in Figure 6.19, the router has two ARP modules, each with its own ARP table. Is it possible that the same MAC address appears in both tables?

R13. Compare the frame structures for 10BASE-T, 100BASE-T, and Gigabit Ethernet. How do they differ?

R14. Consider Figure 6.15. How many subnetworks are there, in the addressing sense of Section 4.3?

R15. What is the maximum number of VLANs that can be configured on a switch supporting the 802.1Q protocol? Why?

R16. Suppose that N switches supporting K VLAN groups are to be connected via a trunking protocol. How many ports are needed to connect the switches? Justify your answer.

Problems

P1. Suppose the information content of a packet is the bit pattern 1110 0110 1001 1101 and an even parity scheme is being used. What would the value of the field containing the parity bits be for the case of a two-dimensional parity scheme? Your answer should be such that a minimum-length checksum field is used.

P2. Show (give an example other than the one in Figure 6.5) that two-dimensional parity checks can detect and correct a single bit error.
Show (give an example of) a double-bit error that can be detected but not corrected.

P3. Suppose the information portion of a packet (D in Figure 6.3) contains 10 bytes consisting of the 8-bit unsigned binary ASCII representation of the string "Networking." Compute the Internet checksum for this data.

P4. Consider the previous problem, but instead suppose these 10 bytes contain

a. the binary representation of the numbers 1 through 10.

b. the ASCII representation of the letters B through K (uppercase).

c. the ASCII representation of the letters b through k (lowercase).

Compute the Internet checksum for this data.

P5. Consider the 5-bit generator, G=10011, and suppose that D has the value 1010101010. What is the value of R?

P6. Consider the previous problem, but suppose that D has the value

a. 1001010101.

b. 101101010.

c. 1010100000.

P7. In this problem, we explore some of the properties of the CRC. For the generator G (=1001) given in Section 6.2.3, answer the following questions.

a. Why can it detect any single bit error in data D?

b. Can the above G detect any odd number of bit errors? Why?

P8. In Section 6.3, we provided an outline of the derivation of the efficiency of slotted ALOHA. In this problem we'll complete the derivation.

a. Recall that when there are N active nodes, the efficiency of slotted ALOHA is Np(1−p)^(N−1). Find the value of p that maximizes this expression.

b. Using the value of p found in (a), find the efficiency of slotted ALOHA by letting N approach infinity. Hint: (1−1/N)^N approaches 1/e as N approaches infinity.

P9. Show that the maximum efficiency of pure ALOHA is 1/(2e). Note: This problem is easy if you have completed the problem above!

P10. Consider two nodes, A and B, that use the slotted ALOHA protocol to contend for a channel. Suppose node A has more data to transmit than node B, and node A's retransmission probability p_A is greater than node B's retransmission probability, p_B.

a. Provide a formula for node A's average throughput. What is the total efficiency of the protocol with these two nodes?

b. If p_A = 2p_B, is node A's average throughput twice as large as that of node B? Why or why not? If not, how can you choose p_A and p_B to make that happen?

c. In general, suppose there are N nodes, among which node A has retransmission probability 2p and all other nodes have retransmission probability p. Provide expressions to compute the average throughputs of node A and of any other node.

P11. Suppose four active nodes---nodes A, B, C and D---are competing for access to a channel using slotted ALOHA. Assume each node has an infinite number of packets to send. Each node attempts to transmit in each slot with probability p. The first slot is numbered slot 1, the second slot is numbered slot 2, and so on.

a. What is the probability that node A succeeds for the first time in slot 5?

b. What is the probability that some node (either A, B, C or D) succeeds in slot 4?

c. What is the probability that the first success occurs in slot 3?

d. What is the efficiency of this four-node system?

P12. Graph the efficiency of slotted ALOHA and pure ALOHA as a function of p for the following values of N:

a. N=15.

b. N=25.

c. N=35.

P13. Consider a broadcast channel with N nodes and a transmission rate of R bps. Suppose the broadcast channel uses polling (with an additional polling node) for multiple access.
Suppose the amount of time from when a node completes transmission until the subsequent node is permitted to transmit (that is, the polling delay) is d_poll. Suppose that within a polling round, a given node is allowed to transmit at most Q bits. What is the maximum throughput of the broadcast channel?

P14. Consider three LANs interconnected by two routers, as shown in Figure 6.33.

a. Assign IP addresses to all of the interfaces. For Subnet 1 use addresses of the form 192.168.1.xxx; for Subnet 2 use addresses of the form 192.168.2.xxx; and for Subnet 3 use addresses of the form 192.168.3.xxx.

b. Assign MAC addresses to all of the adapters.

c. Consider sending an IP datagram from Host E to Host B. Suppose all of the ARP tables are up to date. Enumerate all the steps, as done for the single-router example in Section 6.4.1.

d. Repeat (c), now assuming that the ARP table in the sending host is empty (and the other tables are up to date).

P15. Consider Figure 6.33. Now we replace the router between subnets 1 and 2 with a switch S1, and label the router between subnets 2 and 3 as R1.

Figure 6.33 Three subnets, interconnected by routers

a. Consider sending an IP datagram from Host E to Host F. Will Host E ask router R1 to help forward the datagram? Why? In the Ethernet frame containing the IP datagram, what are the source and destination IP and MAC addresses?

b. Suppose E would like to send an IP datagram to B, and assume that E's ARP cache does not contain B's MAC address. Will E perform an ARP query to find B's MAC address? Why? In the Ethernet frame (containing the IP datagram destined to B) that is delivered to router R1, what are the source and destination IP and MAC addresses?

c. Suppose Host A would like to send an IP datagram to Host B, and neither A's ARP cache contains B's MAC address nor does B's ARP cache contain A's MAC address. Further suppose that the switch S1's forwarding table contains entries for Host B and router R1 only. Thus, A will broadcast an ARP request message. What actions will switch S1 perform once it receives the ARP request message? Will router R1 also receive this ARP request message? If so, will R1 forward the message to Subnet 3? Once Host B receives this ARP request message, it will send back to Host A an ARP response message. But will it send an ARP query message to ask for A's MAC address? Why? What will switch S1 do once it receives an ARP response message from Host B?

P16. Consider the previous problem, but suppose now that the router between subnets 2 and 3 is replaced by a switch. Answer questions (a)--(c) in the previous problem in this new context.

P17. Recall that with the CSMA/CD protocol, the adapter waits K⋅512 bit times after a collision, where K is drawn randomly. For K=100, how long does the adapter wait until returning to Step 2 for a 10 Mbps broadcast channel? For a 100 Mbps broadcast channel?

P18. Suppose nodes A and B are on the same 10 Mbps broadcast channel, and the propagation delay between the two nodes is 325 bit times. Suppose CSMA/CD and Ethernet packets are used for this broadcast channel. Suppose node A begins transmitting a frame and, before it finishes, node B begins transmitting a frame. Can A finish transmitting before it detects that B has transmitted? Why or why not? If the answer is yes, then A incorrectly believes that its frame was successfully transmitted without a collision.
Hint: Suppose at time t=0 bit times, A begins transmitting a frame. In the worst case, A transmits a minimum-sized frame of 512+64 bit times. So A would finish transmitting the frame at t=512+64 bit times. Thus, the answer is no, if B's signal reaches A before bit time t=512+64 bits. In the worst case, when does B's signal reach A?

P19. Suppose nodes A and B are on the same 10 Mbps broadcast channel, and the propagation delay between the two nodes is 245 bit times. Suppose A and B send Ethernet frames at the same time, the frames collide, and then A and B choose different values of K in the CSMA/CD algorithm. Assuming no other nodes are active, can the retransmissions from A and B collide? For our purposes, it suffices to work out the following example. Suppose A and B begin transmission at t=0 bit times. They both detect collisions at t=245 bit times. Suppose K_A=0 and K_B=1. At what time does B schedule its retransmission? At what time does A begin transmission? (Note: The nodes must wait for an idle channel after returning to Step 2---see protocol.) At what time does A's signal reach B? Does B refrain from transmitting at its scheduled time?

P20. In this problem, you will derive the efficiency of a CSMA/CD-like multiple access protocol. In this protocol, time is slotted and all adapters are synchronized to the slots. Unlike slotted ALOHA, however, the length of a slot (in seconds) is much less than a frame time (the time to transmit a frame). Let S be the length of a slot. Suppose all frames are of constant length L=kRS, where R is the transmission rate of the channel and k is a large integer. Suppose there are N nodes, each with an infinite number of frames to send. We also assume that d_prop < S, so that all nodes can detect a collision before the end of a slot time. The protocol is as follows: If, for a given slot, no node has possession of the channel, all nodes contend for the channel; in particular, each node transmits in the slot with probability p. If exactly one node transmits in the slot, that node takes possession of the channel for the subsequent k−1 slots and transmits its entire frame. If some node has possession of the channel, all other nodes refrain from transmitting until the node that possesses the channel has finished transmitting its frame. Once this node has transmitted its frame, all nodes contend for the channel. Note that the channel alternates between two states: the productive state, which lasts exactly k slots, and the nonproductive state, which lasts for a random number of slots. Clearly, the channel efficiency is the ratio k/(k+x), where x is the expected number of consecutive unproductive slots.

a. For fixed N and p, determine the efficiency of this protocol.

b. For fixed N, determine the p that maximizes the efficiency.

c. Using the p (which is a function of N) found in (b), determine the efficiency as N approaches infinity.

d. Show that this efficiency approaches 1 as the frame length becomes large.

P21. Consider Figure 6.33 in problem P14. Provide MAC addresses and IP addresses for the interfaces at Host A, both routers, and Host F. Suppose Host A sends a datagram to Host F. Give the source and destination MAC addresses in the frame encapsulating this IP datagram as the frame is transmitted (i) from A to the left router, (ii) from the left router to the right router, (iii) from the right router to F.
Also give the source and destination IP addresses in the IP datagram encapsulated within the frame at each of these points in time.

P22. Suppose now that the leftmost router in Figure 6.33 is replaced by a switch. Hosts A, B, C, and D and the right router are all star-connected into this switch. Give the source and destination MAC addresses in the frame encapsulating this IP datagram as the frame is transmitted (i) from A to the switch, (ii) from the switch to the right router, (iii) from the right router to F. Also give the source and destination IP addresses in the IP datagram encapsulated within the frame at each of these points in time.

P23. Consider Figure 6.15. Suppose that all links are 100 Mbps. What is the maximum total aggregate throughput that can be achieved among the 9 hosts and 2 servers in this network? You can assume that any host or server can send to any other host or server. Why?

P24. Suppose the three departmental switches in Figure 6.15 are replaced by hubs. All links are 100 Mbps. Now answer the questions posed in problem P23.

P25. Suppose that all the switches in Figure 6.15 are replaced by hubs. All links are 100 Mbps. Now answer the questions posed in problem P23.

P26. Let's consider the operation of a learning switch in the context of a network in which 6 nodes labeled A through F are star-connected into an Ethernet switch. Suppose that (i) B sends a frame to E, (ii) E replies with a frame to B, (iii) A sends a frame to B, (iv) B replies with a frame to A. The switch table is initially empty. Show the state of the switch table before and after each of these events. For each of these events, identify the link(s) on which the transmitted frame will be forwarded, and briefly justify your answers.

P27. In this problem, we explore the use of small packets for Voice-over-IP applications. One of the drawbacks of a small packet size is that a large fraction of link bandwidth is consumed by overhead bytes. To this end, suppose that the packet consists of L bytes and 5 bytes of header.

a. Consider sending a digitally encoded voice source directly. Suppose the source is encoded at a constant rate of 128 kbps. Assume each packet is entirely filled before the source sends the packet into the network. The time required to fill a packet is the packetization delay. In terms of L, determine the packetization delay in milliseconds.

b. Packetization delays greater than 20 msec can cause a noticeable and unpleasant echo. Determine the packetization delay for L=1,500 bytes (roughly corresponding to a maximum-sized Ethernet packet) and for L=50 (corresponding to an ATM packet).

c. Calculate the store-and-forward delay at a single switch for a link rate of R=622 Mbps for L=1,500 bytes, and for L=50 bytes.

d. Comment on the advantages of using a small packet size.

P28. Consider the single switch VLAN in Figure 6.25, and assume an external router is connected to switch port 1. Assign IP addresses to the EE and CS hosts and router interface. Trace the steps taken at both the network layer and the link layer to transfer an IP datagram from an EE host to a CS host (Hint: Reread the discussion of Figure 6.19 in the text).

P29. Consider the MPLS network shown in Figure 6.29, and suppose that routers R5 and R6 are now MPLS enabled.
Suppose that we want to perform traffic engineering so that packets from R6 destined for A are switched to A via R6-R4-R3-R1, and packets from R5 destined for A are switched via R5-R4-R2-R1. Show the MPLS tables in R5 and R6, as well as the modified table in R4, that would make this possible.

P30. Consider again the same scenario as in the previous problem, but suppose that packets from R6 destined for D are switched via R6-R4-R3, while packets from R5 destined to D are switched via R4-R2-R1-R3. Show the MPLS tables in all routers that would make this possible.

P31. In this problem, you will put together much of what you have learned about Internet protocols. Suppose you walk into a room, connect to Ethernet, and want to download a Web page. What are all the protocol steps that take place, starting from powering on your PC to getting the Web page? Assume there is nothing in your DNS or browser caches when you power on your PC. (Hint: The steps include the use of Ethernet, DHCP, ARP, DNS, TCP, and HTTP protocols.) Explicitly indicate in your steps how you obtain the IP and MAC addresses of a gateway router.

P32. Consider the data center network with hierarchical topology in Figure 6.30. Suppose now there are 80 pairs of flows, with ten flows between the first and ninth rack, ten flows between the second and tenth rack, and so on. Further suppose that all links in the network are 10 Gbps, except for the links between hosts and TOR switches, which are 1 Gbps.

a. Each flow has the same data rate; determine the maximum rate of a flow.

b. For the same traffic pattern, determine the maximum rate of a flow for the highly interconnected topology in Figure 6.31.

c. Now suppose there is a similar traffic pattern, but involving 20 hosts on each rack and 160 pairs of flows. Determine the maximum flow rates for the two topologies.

P33. Consider the hierarchical network in Figure 6.30 and suppose that the data center needs to support e-mail and video distribution among other applications. Suppose four racks of servers are reserved for e-mail and four racks are reserved for video. For each of the applications, all four racks must lie below a single tier-2 switch since the tier-2 to tier-1 links do not have sufficient bandwidth to support the intra-application traffic. For the e-mail application, suppose that for 99.9 percent of the time only three racks are used, and that the video application has identical usage patterns.

a. For what fraction of time does the e-mail application need to use a fourth rack? How about for the video application?

b. Assuming e-mail usage and video usage are independent, for what fraction of time do (equivalently, what is the probability that) both applications need their fourth rack?

c. Suppose that it is acceptable for an application to have a shortage of servers for 0.001 percent of time or less (causing rare periods of performance degradation for users). Discuss how the topology in Figure 6.31 can be used so that only seven racks are collectively assigned to the two applications (assuming that the topology can support all the traffic).

Wireshark Labs

At the Companion website for this textbook, http://www.pearsonhighered.com/cs-resources/, you'll find a Wireshark lab that examines the operation of the IEEE 802.3 protocol and the Wireshark frame format. A second Wireshark lab examines packet traces taken in a home network scenario.
AN INTERVIEW WITH... Simon S. Lam

Simon S. Lam is Professor and Regents Chair in Computer Sciences at the University of Texas at Austin. From 1971 to 1974, he was with the ARPA Network Measurement Center at UCLA, where he worked on satellite and radio packet switching. He led a research group that invented secure sockets and prototyped, in 1993, the first secure sockets layer, named Secure Network Programming, which won the 2004 ACM Software System Award. His research interests are in design and analysis of network protocols and security services. He received his BSEE from Washington State University and his MS and PhD from UCLA. He was elected to the National Academy of Engineering in 2007.

Why did you decide to specialize in networking?

When I arrived at UCLA as a new graduate student in Fall 1969, my intention was to study control theory. Then I took the queuing theory classes of Leonard Kleinrock and was very impressed by him. For a while, I was working on adaptive control of queuing systems as a possible thesis topic. In early 1972, Larry Roberts initiated the ARPAnet Satellite System project (later called Packet Satellite). Professor Kleinrock asked me to join the project. The first thing we did was to introduce a simple, yet realistic, backoff algorithm to the slotted ALOHA protocol. Shortly thereafter, I found many interesting research problems, such as ALOHA's instability problem and need for adaptive backoff, which would form the core of my thesis.

You were active in the early days of the Internet in the 1970s, beginning with your student days at UCLA. What was it like then? Did people have any inkling of what the Internet would become?

The atmosphere was really no different from other system-building projects I have seen in industry and academia. The initially stated goal of the ARPAnet was fairly modest, that is, to provide access to expensive computers from remote locations so that many more scientists could use them. However, with the startup of the Packet Satellite project in 1972 and the Packet Radio project in 1973, ARPA's goal had expanded substantially. By 1973, ARPA was building three different packet networks at the same time, and it became necessary for Vint Cerf and Bob Kahn to develop an interconnection strategy.

Back then, all of these progressive developments in networking were viewed (I believe) as logical rather than magical. No one could have envisioned the scale of the Internet and power of personal computers today. It was a decade before the appearance of the first PCs. To put things in perspective, most students submitted their computer programs as decks of punched cards for batch processing. Only some students had direct access to computers, which were typically housed in a restricted area. Modems were slow and still a rarity. As a graduate student, I had only a phone on my desk, and I used pencil and paper to do most of my work.

Where do you see the field of networking and the Internet heading in the future?

In the past, the simplicity of the Internet's IP protocol was its greatest strength in vanquishing competition and becoming the de facto standard for internetworking. Unlike competitors, such as X.25 in the 1980s and ATM in the 1990s, IP can run on top of any link-layer networking technology, because it offers only a best-effort datagram service. Thus, any packet network can connect to the Internet. Today, IP's greatest strength is actually a shortcoming.
IP is like a straitjacket that confines the Internet's development to specific directions. In recent years, many researchers have redirected their efforts to the application layer only. There is also a great deal of research on wireless ad hoc networks, sensor networks, and satellite networks. These networks can be viewed either as stand-alone systems or link-layer systems, which can flourish because they are outside of the IP straitjacket. Many people are excited about the possibility of P2P systems as a platform for novel Internet applications. However, P2P systems are highly inefficient in their use of Internet resources. A concern of mine is whether the transmission and switching capacity of the Internet core will continue to increase faster than the traffic demand on the Internet as it grows to interconnect all kinds of devices and support future P2P-enabled applications. Without substantial overprovisioning of capacity, ensuring network stability in the presence of malicious attacks and congestion will continue to be a significant challenge.

The Internet's phenomenal growth also requires the allocation of new IP addresses at a rapid rate to network operators and enterprises worldwide. At the current rate, the pool of unallocated IPv4 addresses would be depleted in a few years. When that happens, large contiguous blocks of address space can only be allocated from the IPv6 address space. Since adoption of IPv6 is off to a slow start, due to lack of incentives for early adopters, IPv4 and IPv6 will most likely coexist on the Internet for many years to come. Successful migration from an IPv4-dominant Internet to an IPv6-dominant Internet will require a substantial global effort.

What is the most challenging part of your job?

The most challenging part of my job as a professor is teaching and motivating every student in my class, and every doctoral student under my supervision, rather than just the high achievers. The very bright and motivated may require a little guidance but not much else. I often learn more from these students than they learn from me. Educating and motivating the underachievers present a major challenge.

What impacts do you foresee technology having on learning in the future?

Eventually, almost all human knowledge will be accessible through the Internet, which will be the most powerful tool for learning. This vast knowledge base will have the potential of leveling the playing field for students all over the world. For example, motivated students in any country will be able to access the best Web sites, multimedia lectures, and teaching materials. Already, it has been said that the IEEE and ACM digital libraries have accelerated the development of computer science researchers in China. In time, the Internet will transcend all geographic barriers to learning.

Chapter 7 Wireless and Mobile Networks

In the telephony world, the past 20 years have arguably been the golden years of cellular telephony. The number of worldwide mobile cellular subscribers increased from 34 million in 1993 to nearly 7.0 billion subscribers by 2014, with the number of cellular subscribers now surpassing the number of wired telephone lines. There are now more mobile phone subscriptions than there are people on our planet. The many advantages of cell phones are evident to all---anywhere, anytime, untethered access to the global telephone network via a highly portable lightweight device.
More recently, laptops, smartphones, and tablets are wirelessly connected to the Internet via a cellular or WiFi network. And increasingly, devices such as gaming consoles, thermostats, home security systems, home appliances, watches, eyeglasses, cars, traffic control systems, and more are being wirelessly connected to the Internet.

From a networking standpoint, the challenges posed by networking these wireless and mobile devices, particularly at the link layer and the network layer, are so different from traditional wired computer networks that an individual chapter devoted to the study of wireless and mobile networks (i.e., this chapter) is appropriate. We'll begin this chapter with a discussion of mobile users, wireless links, and networks, and their relationship to the larger (typically wired) networks to which they connect. We'll draw a distinction between the challenges posed by the wireless nature of the communication links in such networks, and by the mobility that these wireless links enable. Making this important distinction---between wireless and mobility---will allow us to better isolate, identify, and master the key concepts in each area. Note that there are indeed many networked environments in which the network nodes are wireless but not mobile (e.g., wireless home or office networks with stationary workstations and large displays), and that there are limited forms of mobility that do not require wireless links (e.g., a worker who uses a wired laptop at home, shuts down the laptop, drives to work, and attaches the laptop to the company's wired network). Of course, many of the most exciting networked environments are those in which users are both wireless and mobile---for example, a scenario in which a mobile user (say, in the back seat of a car) maintains a Voice-over-IP call and multiple ongoing TCP connections while racing down the autobahn at 160 kilometers per hour, soon in an autonomous vehicle. It is here, at the intersection of wireless and mobility, that we'll find the most interesting technical challenges!

We'll begin by illustrating the setting in which we'll consider wireless communication and mobility---a network in which wireless (and possibly mobile) users are connected into the larger network infrastructure by a wireless link at the network's edge. We'll then consider the characteristics of this wireless link in Section 7.2. We include a brief introduction to code division multiple access (CDMA), a shared-medium access protocol that is often used in wireless networks, in Section 7.2. In Section 7.3, we'll examine the link-level aspects of the IEEE 802.11 (WiFi) wireless LAN standard in some depth; we'll also say a few words about Bluetooth and other wireless personal area networks. In Section 7.4, we'll provide an overview of cellular Internet access, including 3G and emerging 4G cellular technologies that provide both voice and high-speed Internet access. In Section 7.5, we'll turn our attention to mobility, focusing on the problems of locating a mobile user, routing to the mobile user, and "handing off" the mobile user who dynamically moves from one point of attachment to the network to another. We'll examine how these mobility services are implemented in the mobile IP standard, in enterprise 802.11 networks, and in LTE cellular networks in Sections 7.6 and 7.7.
Finally, we'll consider the impact of wireless links and mobility on transport-layer protocols and networked applications in Section 7.8.

7.1 Introduction

Figure 7.1 shows the setting in which we'll consider the topics of wireless data communication and mobility. We'll begin by keeping our discussion general enough to cover a wide range of networks, including both wireless LANs such as IEEE 802.11 and cellular networks such as a 4G network; we'll drill down into a more detailed discussion of specific wireless architectures in later sections. We can identify the following elements in a wireless network:

Wireless hosts. As in the case of wired networks, hosts are the end-system devices that run applications. A wireless host might be a laptop, tablet, smartphone, or desktop computer. The hosts themselves may or may not be mobile.

Figure 7.1 Elements of a wireless network

Wireless links. A host connects to a base station (defined below) or to another wireless host through a wireless communication link. Different wireless link technologies have different transmission rates and can transmit over different distances. Figure 7.2 shows two key characteristics (coverage area and link rate) of the more popular wireless network standards. (The figure is only meant to provide a rough idea of these characteristics. For example, some of these types of networks are only now being deployed, and some link rates can increase or decrease beyond the values shown depending on distance, channel conditions, and the number of users in the wireless network.) We'll cover these standards later in the first half of this chapter; we'll also consider other wireless link characteristics (such as their bit error rates and the causes of bit errors) in Section 7.2.

Figure 7.2 Link characteristics of selected wireless network standards

In Figure 7.1, wireless links connect wireless hosts located at the edge of the network into the larger network infrastructure. We hasten to add that wireless links are also sometimes used within a network to connect routers, switches, and other network equipment. However, our focus in this chapter will be on the use of wireless communication at the network edge, as it is here that many of the most exciting technical challenges, and most of the growth, are occurring.

Base station. The base station is a key part of the wireless network infrastructure. Unlike the wireless host and wireless link, a base station has no obvious counterpart in a wired network. A base station is responsible for sending and receiving data (e.g., packets) to and from a wireless host that is associated with that base station. A base station will often be responsible for coordinating the transmission of multiple wireless hosts with which it is associated. When we say a wireless host is "associated" with a base station, we mean that (1) the host is within the wireless communication distance of the base station, and (2) the host uses that base station to relay data between it (the host) and the larger network. Cell towers in cellular networks and access points in 802.11 wireless LANs are examples of base stations.

In Figure 7.1, the base station is connected to the larger network (e.g., the Internet, corporate or home network, or telephone network), thus functioning as a link-layer relay between the wireless host and the rest of the world with which the host communicates.
Hosts associated with a base station are often referred to as operating in infrastructure mode, since all traditional network services (e.g., address assignment and routing) are provided by the network to which a host is connected via the base station. In ad hoc networks, wireless hosts have no such infrastructure with which to connect. In the absence of such infrastructure, the hosts themselves must provide for services such as routing, address assignment, DNS-like name translation, and more.

CASE HISTORY

PUBLIC WIFI ACCESS: COMING SOON TO A LAMP POST NEAR YOU?

WiFi hotspots---public locations where users can find 802.11 wireless access---are becoming increasingly common in hotels, airports, and cafés around the world. Most college campuses offer ubiquitous wireless access, and it's hard to find a hotel that doesn't offer wireless Internet access. Over the past decade a number of cities have designed, deployed, and operated municipal WiFi networks. The vision of providing ubiquitous WiFi access to the community as a public service (much like streetlights)---helping to bridge the digital divide by providing Internet access to all citizens and to promote economic development---is compelling. Many cities around the world, including Philadelphia, Toronto, Hong Kong, Minneapolis, London, and Auckland, have plans to provide ubiquitous wireless within the city, or have already done so to varying degrees. The goal in Philadelphia was to "turn Philadelphia into the nation's largest WiFi hotspot and help to improve education, bridge the digital divide, enhance neighborhood development, and reduce the costs of government." The ambitious program---an agreement between the city, Wireless Philadelphia (a nonprofit entity), and the Internet Service Provider Earthlink---built an operational network of 802.11b hotspots on streetlamp pole arms and traffic control devices that covered 80 percent of the city. But financial and operational concerns caused the network to be sold to a group of private investors in 2008, who later sold the network back to the city in 2010. Other cities, such as Minneapolis, Toronto, Hong Kong, and Auckland, have had success with smaller-scale efforts. The fact that 802.11 networks operate in the unlicensed spectrum (and hence can be deployed without purchasing expensive spectrum use rights) would seem to make them financially attractive. However, 802.11 access points (see Section 7.3) have much shorter ranges than 4G cellular base stations (see Section 7.4), requiring a larger number of deployed endpoints to cover the same geographic region. Cellular data networks providing Internet access, on the other hand, operate in the licensed spectrum. Cellular providers pay billions of dollars for spectrum access rights for their networks, making cellular data networks a business rather than a municipal undertaking.

When a mobile host moves beyond the range of one base station and into the range of another, it will change its point of attachment into the larger network (i.e., change the base station with which it is associated)---a process referred to as handoff. Such mobility raises many challenging questions. If a host can move, how does one find the mobile host's current location in the network so that data can be forwarded to that mobile host? How is addressing performed, given that a host can be in one of many possible locations?
If the host moves during a TCP connection or phone call, how is data routed so that the connection continues uninterrupted? These and many (many!) other questions make wireless and mobile networking an area of exciting networking research.

Network infrastructure. This is the larger network with which a wireless host may wish to communicate.

Having discussed the "pieces" of a wireless network, we note that these pieces can be combined in many different ways to form different types of wireless networks. You may find a taxonomy of these types of wireless networks useful as you read on in this chapter, or read/learn more about wireless networks beyond this book. At the highest level we can classify wireless networks according to two criteria: (i) whether a packet in the wireless network crosses exactly one wireless hop or multiple wireless hops, and (ii) whether there is infrastructure such as a base station in the network:

Single-hop, infrastructure-based. These networks have a base station that is connected to a larger wired network (e.g., the Internet). Furthermore, all communication is between this base station and a wireless host over a single wireless hop. The 802.11 networks you use in the classroom, café, or library, and the 4G LTE data networks that we will learn about shortly, all fall in this category. The vast majority of our daily interactions are with single-hop, infrastructure-based wireless networks.

Single-hop, infrastructure-less. In these networks, there is no base station that is connected to a larger network. However, as we will see, one of the nodes in this single-hop network may coordinate the transmissions of the other nodes. Bluetooth networks (that connect small wireless devices such as keyboards, speakers, and headsets, and which we will study in Section 7.3.6) and 802.11 networks in ad hoc mode are single-hop, infrastructure-less networks.

Multi-hop, infrastructure-based. In these networks, a base station is present that is wired to the larger network. However, some wireless nodes may have to relay their communication through other wireless nodes in order to communicate via the base station. Some wireless sensor networks and so-called wireless mesh networks fall in this category.

Multi-hop, infrastructure-less. There is no base station in these networks, and nodes may have to relay messages among several other nodes in order to reach a destination. Nodes may also be mobile, with connectivity changing among nodes---a class of networks known as mobile ad hoc networks (MANETs). If the mobile nodes are vehicles, the network is a vehicular ad hoc network (VANET). As you might imagine, the development of protocols for such networks is challenging and is the subject of much ongoing research.

In this chapter, we'll mostly confine ourselves to single-hop networks, and then mostly to infrastructure-based networks. Let's now dig deeper into the technical challenges that arise in wireless and mobile networks. We'll begin by first considering the individual wireless link, deferring our discussion of mobility until later in this chapter.

7.2 Wireless Links and Network Characteristics

Let's begin by considering a simple wired network, say a home network, with a wired Ethernet switch (see Section 6.4) interconnecting the hosts.
If we replace the wired Ethernet with a wireless 802.11 network, a wireless network interface would replace the host's wired Ethernet interface, and an access point would replace the Ethernet switch, but virtually no changes would be needed at the network layer or above. This suggests that we focus our attention on the link layer when looking for important differences between wired and wireless networks. Indeed, we can find a number of important differences between a wired link and a wireless link:

Decreasing signal strength. Electromagnetic radiation attenuates as it passes through matter (e.g., a radio signal passing through a wall). Even in free space, the signal will disperse, resulting in decreased signal strength (sometimes referred to as path loss) as the distance between sender and receiver increases.

Interference from other sources. Radio sources transmitting in the same frequency band will interfere with each other. For example, 2.4 GHz wireless phones and 802.11b wireless LANs transmit in the same frequency band. Thus, the 802.11b wireless LAN user talking on a 2.4 GHz wireless phone can expect that neither the network nor the phone will perform particularly well. In addition to interference from transmitting sources, electromagnetic noise within the environment (e.g., a nearby motor, a microwave) can result in interference.

Multipath propagation. Multipath propagation occurs when portions of the electromagnetic wave reflect off objects and the ground, taking paths of different lengths between a sender and receiver. This results in the blurring of the received signal at the receiver. Moving objects between the sender and receiver can cause multipath propagation to change over time.

For a detailed discussion of wireless channel characteristics, models, and measurements, see \[Anderson 1995\].

The discussion above suggests that bit errors will be more common in wireless links than in wired links. For this reason, it is perhaps not surprising that wireless link protocols (such as the 802.11 protocol we'll examine in the following section) employ not only powerful CRC error detection codes, but also link-level reliable-data-transfer protocols that retransmit corrupted frames.

Having considered the impairments that can occur on a wireless channel, let's next turn our attention to the host receiving the wireless signal. This host receives an electromagnetic signal that is a combination of a degraded form of the original signal transmitted by the sender (degraded due to the attenuation and multipath propagation effects that we discussed above, among others) and background noise in the environment. The signal-to-noise ratio (SNR) is a relative measure of the strength of the received signal (i.e., the information being transmitted) and this noise. The SNR is typically measured in units of decibels (dB), a unit of measure that some think is used by electrical engineers primarily to confuse computer scientists. The SNR, measured in dB, is twenty times the base-10 logarithm of the ratio of the amplitude of the received signal to the amplitude of the noise; that is, $\mathrm{SNR}_{\mathrm{dB}} = 20 \log_{10}(A_{\mathrm{signal}}/A_{\mathrm{noise}})$. For our purposes here, we need only know that a larger SNR makes it easier for the receiver to extract the transmitted signal from the background noise.
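To make the decibel arithmetic concrete, here is a two-line sketch of the definition just given; the amplitude values in the example are made up purely for illustration.

```python
import math

def snr_db(signal_amplitude: float, noise_amplitude: float) -> float:
    """SNR in decibels: twenty times the base-10 log of the amplitude ratio."""
    return 20 * math.log10(signal_amplitude / noise_amplitude)

# Each factor of 10 in relative amplitude adds 20 dB:
print(snr_db(10.0, 1.0))   # 20.0
print(snr_db(100.0, 1.0))  # 40.0
```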
Figure 7.3 (adapted from \[Holland 2001\]) shows the bit error rate (BER)---roughly speaking, the probability that a transmitted bit is received in error at the receiver---versus the SNR for three different modulation techniques for encoding information for transmission on an idealized wireless channel. The theory of modulation and coding, as well as signal extraction and BER, is well beyond the scope of this text (see \[Schwartz 1980\] for a discussion of these topics).

Figure 7.3 Bit error rate, transmission rate, and SNR

Figure 7.4 Hidden terminal problem caused by obstacle (a) and fading (b)

Nonetheless, Figure 7.3 illustrates several physical-layer characteristics that are important in understanding higher-layer wireless communication protocols:

For a given modulation scheme, the higher the SNR, the lower the BER. Since a sender can increase the SNR by increasing its transmission power, a sender can decrease the probability that a frame is received in error by increasing its transmission power. Note, however, that there is arguably little practical gain in increasing the power beyond a certain threshold, say to decrease the BER from $10^{-12}$ to $10^{-13}$. There are also disadvantages associated with increasing the transmission power: More energy must be expended by the sender (an important concern for battery-powered mobile users), and the sender's transmissions are more likely to interfere with the transmissions of another sender (see Figure 7.4(b)).

For a given SNR, a modulation technique with a higher bit transmission rate (whether in error or not) will have a higher BER. For example, in Figure 7.3, with an SNR of 10 dB, BPSK modulation with a transmission rate of 1 Mbps has a BER of less than $10^{-7}$, while with QAM16 modulation with a transmission rate of 4 Mbps, the BER is $10^{-1}$, far too high to be practically useful. However, with an SNR of 20 dB, QAM16 modulation has a transmission rate of 4 Mbps and a BER of $10^{-7}$, while BPSK modulation has a transmission rate of only 1 Mbps and a BER that is so low as to be (literally) "off the charts." If one can tolerate a BER of $10^{-7}$, the higher transmission rate offered by QAM16 would make it the preferred modulation technique in this situation. These considerations give rise to the final characteristic, described next.

Dynamic selection of the physical-layer modulation technique can be used to adapt the modulation technique to channel conditions. The SNR (and hence the BER) may change as a result of mobility or due to changes in the environment. Adaptive modulation and coding are used in cellular data systems and in the 802.11 WiFi and 4G cellular data networks that we'll study in Sections 7.3 and 7.4. This allows, for example, the selection of a modulation technique that provides the highest transmission rate possible subject to a constraint on the BER, for given channel characteristics.
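This selection rule can be sketched in a few lines of code. The operating points below are simply the two values quoted from Figure 7.3 in the paragraph above (with a stand-in BER for BPSK at 20 dB, which the figure shows only as "off the charts"); everything else is illustrative.

```python
# Pick the highest-rate modulation scheme whose BER at the current SNR
# meets the target. The (SNR, BER) points are read from the discussion of
# Figure 7.3, not from a real channel model.
BER_TABLE = {
    # scheme: (rate in Mbps, {snr_db: ber})
    "BPSK":  (1, {10: 1e-7, 20: 1e-12}),   # 1e-12 is a stand-in "off the charts" value
    "QAM16": (4, {10: 1e-1, 20: 1e-7}),
}

def pick_modulation(snr_db: int, ber_target: float) -> str:
    candidates = [
        (rate, name)
        for name, (rate, ber_at) in BER_TABLE.items()
        if ber_at.get(snr_db, 1.0) <= ber_target
    ]
    if not candidates:
        return "no scheme meets the BER target"
    return max(candidates)[1]          # highest rate wins

print(pick_modulation(10, 1e-7))  # BPSK: QAM16's BER at 10 dB is far too high
print(pick_modulation(20, 1e-7))  # QAM16: same BER target at four times the rate
```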
A higher and time-varying bit error rate is not the only difference between a wired and wireless link. Recall that in the case of wired broadcast links, all nodes receive the transmissions from all other nodes. In the case of wireless links, the situation is not as simple, as shown in Figure 7.4. Suppose that Station A is transmitting to Station B. Suppose also that Station C is transmitting to Station B. With the so-called hidden terminal problem, physical obstructions in the environment (for example, a mountain or a building) may prevent A and C from hearing each other's transmissions, even though A's and C's transmissions are indeed interfering at the destination, B. This is shown in Figure 7.4(a). A second scenario that results in undetectable collisions at the receiver results from the fading of a signal's strength as it propagates through the wireless medium. Figure 7.4(b) illustrates the case where A and C are placed such that their signals are not strong enough to detect each other's transmissions, yet their signals are strong enough to interfere with each other at station B. As we'll see in Section 7.3, the hidden terminal problem and fading make multiple access in a wireless network considerably more complex than in a wired network.

7.2.1 CDMA

Recall from Chapter 6 that when hosts communicate over a shared medium, a protocol is needed so that the signals sent by multiple senders do not interfere at the receivers. In Chapter 6 we described three classes of medium access protocols: channel partitioning, random access, and taking turns. Code division multiple access (CDMA) belongs to the family of channel partitioning protocols. It is prevalent in wireless LAN and cellular technologies. Because CDMA is so important in the wireless world, we'll take a quick look at CDMA now, before getting into specific wireless access technologies in the subsequent sections.

In a CDMA protocol, each bit being sent is encoded by multiplying the bit by a signal (the code) that changes at a much faster rate (known as the chipping rate) than the original sequence of data bits. Figure 7.5 shows a simple, idealized CDMA encoding/decoding scenario. Suppose that the rate at which original data bits reach the CDMA encoder defines the unit of time; that is, each original data bit to be transmitted requires a one-bit slot time. Let $d_i$ be the value of the data bit for the $i$th bit slot. For mathematical convenience, we represent a data bit with a 0 value as −1. Each bit slot is further subdivided into $M$ mini-slots; in Figure 7.5, $M = 8$, although in practice $M$ is much larger.

Figure 7.5 A simple CDMA example: Sender encoding, receiver decoding

The CDMA code used by the sender consists of a sequence of $M$ values, $c_m$, $m = 1, \ldots, M$, each taking a +1 or −1 value. In the example in Figure 7.5, the $M$-bit CDMA code being used by the sender is (1, 1, 1, −1, 1, −1, −1, −1). To illustrate how CDMA works, let us focus on the $i$th data bit, $d_i$. For the $m$th mini-slot of the bit-transmission time of $d_i$, the output of the CDMA encoder, $Z_{i,m}$, is the value of $d_i$ multiplied by the $m$th bit in the assigned CDMA code, $c_m$:

$$Z_{i,m} = d_i \cdot c_m \qquad (7.1)$$

In a simple world, with no interfering senders, the receiver would receive the encoded bits, $Z_{i,m}$, and recover the original data bit, $d_i$, by computing:

$$d_i = \frac{1}{M} \sum_{m=1}^{M} Z_{i,m} \cdot c_m \qquad (7.2)$$

The reader might want to work through the details of the example in Figure 7.5 to see that the original data bits are indeed correctly recovered at the receiver using Equation 7.2.
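Equations 7.1 and 7.2 are easy to check in a few lines of code; the sketch below does so, and also previews the two-sender case (Equation 7.3 and Figure 7.6) discussed next. The two 8-chip codes are the ones used by the senders in Figures 7.5 and 7.6; the data bits are arbitrary examples.

```python
# A compact sketch of CDMA encoding/decoding. Data bits use the +1/-1
# convention from the text.
CODE1 = [1, 1, 1, -1, 1, -1, -1, -1]   # code of the sender in Figure 7.5
CODE2 = [1, -1, 1, 1, 1, -1, 1, 1]     # code of the second sender in Figure 7.6

def encode(bits, code):
    """Equation (7.1): each data bit is multiplied by every chip of the code."""
    return [d * c for d in bits for c in code]

def decode(channel, code):
    """Equations (7.2)/(7.3): correlate each bit slot with the code and average."""
    M = len(code)
    slots = [channel[i:i + M] for i in range(0, len(channel), M)]
    return [round(sum(z * c for z, c in zip(slot, code)) / M) for slot in slots]

bits1, bits2 = [1, -1], [-1, -1]       # two senders, two bit slots each
# The channel simply adds the two transmitted signals (the additivity
# assumption discussed next):
channel = [a + b for a, b in zip(encode(bits1, CODE1), encode(bits2, CODE2))]

print(decode(channel, CODE1))  # [1, -1]  -> sender 1's bits recovered
print(decode(channel, CODE2))  # [-1, -1] -> sender 2's bits recovered
```

The recovery works exactly here because the two codes are orthogonal (their chip-by-chip products sum to zero), which is the "carefully chosen codes" condition mentioned below.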
The world is far from ideal, however, and as noted above, CDMA must work in the presence of interfering senders that are encoding and transmitting their data using a different assigned code. But how can a CDMA receiver recover a sender's original data bits when those data bits are being tangled with bits being transmitted by other senders? CDMA works under the assumption that the interfering transmitted bit signals are additive. This means, for example, that if three senders send a 1 value, and a fourth sender sends a −1 value during the same mini-slot, then the received signal at all receivers during that mini-slot is a 2 (since 1 + 1 + 1 − 1 = 2). In the presence of multiple senders, sender $s$ computes its encoded transmissions, $Z_{i,m}^s$, in exactly the same manner as in Equation 7.1. The value received at a receiver during the $m$th mini-slot of the $i$th bit slot, however, is now the sum of the transmitted bits from all $N$ senders during that mini-slot:

$$Z_{i,m}^* = \sum_{s=1}^{N} Z_{i,m}^s$$

Amazingly, if the senders' codes are chosen carefully, each receiver can recover the data sent by a given sender out of the aggregate signal simply by using the sender's code in exactly the same manner as in Equation 7.2:

$$d_i = \frac{1}{M} \sum_{m=1}^{M} Z_{i,m}^* \cdot c_m \qquad (7.3)$$

as shown in Figure 7.6, for a two-sender CDMA example. The $M$-bit CDMA code being used by the upper sender is (1, 1, 1, −1, 1, −1, −1, −1), while the CDMA code being used by the lower sender is (1, −1, 1, 1, 1, −1, 1, 1). Figure 7.6 illustrates a receiver recovering the original data bits from the upper sender. Note that the receiver is able to extract the data from sender 1 in spite of the interfering transmission from sender 2.

Figure 7.6 A two-sender CDMA example

Recall our cocktail party analogy from Chapter 6. A CDMA protocol is similar to having partygoers speaking in multiple languages; in such circumstances humans are actually quite good at locking into the conversation in the language they understand, while filtering out the remaining conversations. We see here that CDMA is a partitioning protocol in that it partitions the codespace (as opposed to time or frequency) and assigns each node a dedicated piece of the codespace. Our discussion here of CDMA is necessarily brief; in practice a number of difficult issues must be addressed. First, in order for the CDMA receivers to be able to extract a particular sender's signal, the CDMA codes must be carefully chosen. Second, our discussion has assumed that the received signal strengths from various senders are the same; in reality this can be difficult to achieve. There is a considerable body of literature addressing these and other issues related to CDMA; see \[Pickholtz 1982; Viterbi 1995\] for details.

7.3 WiFi: 802.11 Wireless LANs

Pervasive in the workplace, the home, educational institutions, cafés, airports, and street corners, wireless LANs are now one of the most important access network technologies in the Internet today. Although many technologies and standards for wireless LANs were developed in the 1990s, one particular class of standards has clearly emerged as the winner: the IEEE 802.11 wireless LAN, also known as WiFi. In this section, we'll take a close look at 802.11 wireless LANs, examining its frame structure, its medium access protocol, and the internetworking of 802.11 LANs with wired Ethernet LANs. There are several 802.11 standards for wireless LAN technology in the IEEE 802.11 ("WiFi") family, as summarized in Table 7.1. The different 802.11 standards all share some common characteristics. They all use the same medium access protocol, CSMA/CA, which we'll discuss shortly. They all use the same frame structure for their link-layer frames as well. And they all have the ability to reduce their transmission rate in order to reach out over greater distances.
And, importantly, 802.11 products are also all backwards compatible, meaning, for example, that a mobile device capable only of 802.11g may still interact with a newer 802.11ac base station.

However, as shown in Table 7.1, the standards have some major differences at the physical layer. 802.11 devices operate in two different frequency ranges: 2.4--2.485 GHz (referred to as the 2.4 GHz range) and 5.1--5.8 GHz (referred to as the 5 GHz range). The 2.4 GHz range is an unlicensed frequency band, where 802.11 devices may compete for frequency spectrum with 2.4 GHz phones and microwave ovens. At 5 GHz, 802.11 LANs have a shorter transmission distance for a given power level and suffer more from multipath propagation. The two most recent standards, 802.11n \[IEEE 802.11n 2012\] and 802.11ac \[IEEE 802.11ac 2013; Cisco 802.11ac 2015\], use multiple-input multiple-output (MIMO) antennas; i.e., two or more antennas on the sending side and two or more antennas on the receiving side that are transmitting/receiving different signals \[Diggavi 2004\].

Table 7.1 Summary of IEEE 802.11 standards

| Standard | Frequency Range | Data Rate |
| --- | --- | --- |
| 802.11b | 2.4 GHz | up to 11 Mbps |
| 802.11a | 5 GHz | up to 54 Mbps |
| 802.11g | 2.4 GHz | up to 54 Mbps |
| 802.11n | 2.4 GHz and 5 GHz | up to 450 Mbps |
| 802.11ac | 5 GHz | up to 1300 Mbps |

802.11ac base stations may transmit to multiple stations simultaneously, and use "smart" antennas to adaptively beamform to target transmissions in the direction of a receiver. This decreases interference and increases the distance reached at a given data rate. The data rates shown in Table 7.1 are for an idealized environment, e.g., a receiver placed 1 meter away from the base station, with no interference---a scenario that we're unlikely to experience in practice! So as the saying goes, YMMV: Your Mileage (or in this case your wireless data rate) May Vary.

7.3.1 The 802.11 Architecture

Figure 7.7 illustrates the principal components of the 802.11 wireless LAN architecture. The fundamental building block of the 802.11 architecture is the basic service set (BSS). A BSS contains one or more wireless stations and a central base station, known as an access point (AP) in 802.11 parlance. Figure 7.7 shows the AP in each of two BSSs connecting to an interconnection device (such as a switch or router), which in turn leads to the Internet. In a typical home network, there is one AP and one router (typically integrated together as one unit) that connects the BSS to the Internet. As with Ethernet devices, each 802.11 wireless station has a 6-byte MAC address that is stored in the firmware of the station's adapter (that is, 802.11 network interface card). Each AP also has a MAC address for its wireless interface. As with Ethernet, these MAC addresses are administered by IEEE and are (in theory) globally unique.

Figure 7.7 IEEE 802.11 LAN architecture

Figure 7.8 An IEEE 802.11 ad hoc network

As noted in Section 7.1, wireless LANs that deploy APs are often referred to as infrastructure wireless LANs, with the "infrastructure" being the APs along with the wired Ethernet infrastructure that interconnects the APs and a router. Figure 7.8 shows that IEEE 802.11 stations can also group themselves together to form an ad hoc network---a network with no central control and with no connections to the "outside world."
Here, the network is formed "on the fly," by mobile devices that have found themselves in proximity to each other, that have a need to communicate, and that find no preexisting network infrastructure in their location. An ad hoc network might be formed when people with laptops get together (for example, in a conference room, a train, or a car) and want to exchange data in the absence of a centralized AP. There has been tremendous interest in ad hoc networking, as communicating portable devices continue to proliferate. In this section, though, we'll focus our attention on infrastructure wireless LANs.

Channels and Association

In 802.11, each wireless station needs to associate with an AP before it can send or receive network-layer data. Although all of the 802.11 standards use association, we'll discuss this topic specifically in the context of IEEE 802.11b/g. When a network administrator installs an AP, the administrator assigns a one- or two-word Service Set Identifier (SSID) to the access point. (When you choose Wi-Fi under Settings on your iPhone, for example, a list is displayed showing the SSID of each AP in range.) The administrator must also assign a channel number to the AP. To understand channel numbers, recall that 802.11 operates in the frequency range of 2.4 GHz to 2.4835 GHz. Within this 83.5 MHz band, 802.11 defines 11 partially overlapping channels. Any two channels are non-overlapping if and only if they are separated by four or more channels. In particular, the set of channels 1, 6, and 11 is the only set of three non-overlapping channels. This means that an administrator could create a wireless LAN with an aggregate maximum transmission rate of 33 Mbps by installing three 802.11b APs at the same physical location, assigning channels 1, 6, and 11 to the APs, and interconnecting each of the APs with a switch.
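As a small sanity check, the channel arithmetic can be written out in code. The helper function below is ours; it encodes the separated-by-four-or-more rule stated above (channel numbers must differ by at least five for no channels to be shared between them):

```python
# Two 802.11b channels do not overlap only when four or more channels lie
# between them, i.e., when their channel numbers differ by five or more.
def non_overlapping(ch1: int, ch2: int) -> bool:
    return abs(ch1 - ch2) >= 5

triple = (1, 6, 11)
print(all(non_overlapping(a, b) for a in triple for b in triple if a != b))  # True

# Three 802.11b APs on channels 1, 6, and 11 can therefore operate side by
# side, for an aggregate maximum rate of:
print(3 * 11, "Mbps")  # 33 Mbps
```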
Your wireless device, knowing that APs are +sending out beacon frames, scans the 11 channels, seeking beacon frames +from any APs that may be out there (some of which may be transmitting on +the same channel---it's a jungle out there!). Having learned about +available APs from the beacon frames, you (or your wireless device) +select one of the APs for association. The 802.11 standard does not +specify an algorithm for selecting which of the available APs to +associate with; that algorithm is left up to the designers of the 802.11 +firmware and software in your wireless device. Typically, the device +chooses the AP whose beacon frame is received with the highest signal +strength. While a high signal strength is good (see, e.g., Figure 7.3), +signal strength is not the only AP characteristic that will determine +the performance a device receives. In particular, it's possible that the +selected AP may have a strong signal, but may be overloaded with other +affiliated devices (that will need to share the wireless bandwidth at +that AP), while an unloaded AP is not selected due to a slightly weaker +signal. A number of alternative ways of choosing APs have thus recently +been proposed \[Vasudevan 2005; Nicholson 2006; Sundaresan 2006\]. For +an interesting and down-to-earth discussion of how signal strength is +measured, see \[Bardwell 2004\]. + +Figure 7.9 Active and passive scanning for access points + +The process of scanning channels and listening for beacon frames is +known as passive scanning (see Figure 7.9a). A wireless device can also +perform active scanning, by broadcasting a probe frame that will be +received by all APs within the wireless device's range, as shown in +Figure 7.9b. APs respond to the probe request frame with a probe +response frame. The wireless device can then choose the AP with which to +associate from among the responding APs. + +After selecting the AP with which to associate, the wireless device +sends an association request frame to the AP, and the AP responds with +an association response frame. Note that this second request/response +handshake is needed with active scanning, since an AP responding to the +initial probe request frame doesn't know which of the (possibly many) +responding APs the device will choose to associate with, in much the +same way that a DHCP client can choose from among multiple DHCP servers +(see Figure 4.21). Once associated with an AP, the device will want to +join the subnet (in the IP addressing sense of Section 4.3.3) to which +the AP belongs. Thus, the device will typically send a DHCP discovery +message (see Figure 4.21) into the subnet via the AP in order to obtain +an IP address on the subnet. Once the address is obtained, the rest of +the world then views that device simply as another host with an IP +address in that subnet. In order to create an association with a +particular AP, the wireless device may be required to authenticate +itself to the AP. 802.11 wireless LANs provide a number of alternatives +for authentication and access. One approach, used by many companies, is +to permit access to a wireless network based on a device's MAC address. +A second approach, used by many Internet cafés, employs usernames and +passwords. In both cases, the AP typically communicates with an +authentication server, relaying information between the wireless device +and the authentication server using a protocol such as RADIUS \[RFC +2865\] or DIAMETER \[RFC 3588\]. 
Separating the authentication server from the AP allows one authentication server to serve many APs, centralizing the (often sensitive) decisions of authentication and access within the single server, and keeping AP costs and complexity low. We'll see in Chapter 8 that the new IEEE 802.11i protocol defining security aspects of the 802.11 protocol family takes precisely this approach.

7.3.2 The 802.11 MAC Protocol

Once a wireless device is associated with an AP, it can start sending and receiving data frames to and from the access point. But because multiple wireless devices, or the AP itself, may want to transmit data frames at the same time over the same channel, a multiple access protocol is needed to coordinate the transmissions. In the following, we'll refer to the devices or the AP as wireless "stations" that share the multiple access channel. As discussed in Chapter 6 and Section 7.2.1, broadly speaking there are three classes of multiple access protocols: channel partitioning (including CDMA), random access, and taking turns. Inspired by the huge success of Ethernet and its random access protocol, the designers of 802.11 chose a random access protocol for 802.11 wireless LANs. This random access protocol is referred to as CSMA with collision avoidance, or more succinctly as CSMA/CA. As with Ethernet's CSMA/CD, the "CSMA" in CSMA/CA stands for "carrier sense multiple access," meaning that each station senses the channel before transmitting, and refrains from transmitting when the channel is sensed busy. Although both Ethernet and 802.11 use carrier-sensing random access, the two MAC protocols have important differences. First, instead of using collision detection, 802.11 uses collision-avoidance techniques. Second, because of the relatively high bit error rates of wireless channels, 802.11 (unlike Ethernet) uses a link-layer acknowledgment/retransmission (ARQ) scheme. We'll describe 802.11's collision-avoidance and link-layer acknowledgment schemes below.

Recall from Sections 6.3.2 and 6.4.2 that with Ethernet's collision-detection algorithm, an Ethernet station listens to the channel as it transmits. If, while transmitting, it detects that another station is also transmitting, it aborts its transmission and tries to transmit again after waiting a small, random amount of time. Unlike the 802.3 Ethernet protocol, the 802.11 MAC protocol does not implement collision detection. There are two important reasons for this:

The ability to detect collisions requires the ability to send (the station's own signal) and receive (to determine whether another station is also transmitting) at the same time. Because the strength of the received signal is typically very small compared to the strength of the transmitted signal at the 802.11 adapter, it is costly to build hardware that can detect a collision.

More importantly, even if the adapter could transmit and listen at the same time (and presumably abort transmission when it senses a busy channel), the adapter would still not be able to detect all collisions, due to the hidden terminal problem and fading, as discussed in Section 7.2.

Because 802.11 wireless LANs do not use collision detection, once a station begins to transmit a frame, it transmits the frame in its entirety; that is, once a station gets started, there is no turning back.
As one might expect, transmitting entire frames (particularly long frames) when collisions are prevalent can significantly degrade a multiple access protocol's performance. In order to reduce the likelihood of collisions, 802.11 employs several collision-avoidance techniques, which we'll shortly discuss.

Before considering collision avoidance, however, we'll first need to examine 802.11's link-layer acknowledgment scheme. Recall from Section 7.2 that when a station in a wireless LAN sends a frame, the frame may not reach the destination station intact for a variety of reasons. To deal with this non-negligible chance of failure, the 802.11 MAC protocol uses link-layer acknowledgments. As shown in Figure 7.10, when the destination station receives a frame that passes the CRC, it waits a short period of time known as the Short Inter-frame Spacing (SIFS) and then sends back an acknowledgment frame. If the transmitting station does not receive an acknowledgment within a given amount of time, it assumes that an error has occurred and retransmits the frame, using the CSMA/CA protocol to access the channel. If an acknowledgment is not received after some fixed number of retransmissions, the transmitting station gives up and discards the frame.

Figure 7.10 802.11 uses link-layer acknowledgments

Having discussed how 802.11 uses link-layer acknowledgments, we're now in a position to describe the 802.11 CSMA/CA protocol. Suppose that a station (wireless device or an AP) has a frame to transmit.

1. If initially the station senses the channel idle, it transmits its frame after a short period of time known as the Distributed Inter-frame Space (DIFS); see Figure 7.10.

2. Otherwise, the station chooses a random backoff value using binary exponential backoff (as we encountered in Section 6.3.2) and counts down this value after DIFS when the channel is sensed idle. While the channel is sensed busy, the counter value remains frozen.

3. When the counter reaches zero (note that this can only occur while the channel is sensed idle), the station transmits the entire frame and then waits for an acknowledgment.

4. If an acknowledgment is received, the transmitting station knows that its frame has been correctly received at the destination station. If the station has another frame to send, it begins the CSMA/CA protocol at step 2. If the acknowledgment isn't received, the transmitting station reenters the backoff phase in step 2, with the random value chosen from a larger interval.
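The four steps above can be condensed into a sketch of the sender loop. It is purely illustrative: the contention-window values are stand-ins rather than real 802.11 parameters, and `channel` is a hypothetical object providing the carrier-sensing and acknowledgment primitives named below.

```python
import random

class CsmaCaSender:
    """Illustrative CSMA/CA sender loop following steps 1-4 above."""

    def send(self, frame, channel, max_retries=7):
        cw = 16                                    # contention window (stand-in value)
        # Step 1: if the channel is initially idle, transmit after a DIFS;
        # otherwise, back off first (step 2).
        if not channel.sensed_idle_for_difs():
            self.backoff(channel, cw)
        for _ in range(max_retries):
            channel.transmit(frame)                # step 3: whole frame, no turning back
            if channel.ack_received():             # step 4: success
                return True
            cw *= 2                                # no ACK: retry from a larger interval
            self.backoff(channel, cw)
        return False                               # give up and discard the frame

    def backoff(self, channel, cw):
        counter = random.randrange(cw)
        while counter > 0:
            if channel.sensed_idle():              # counter decreases only while idle;
                counter -= 1                       # it stays frozen while busy
```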
Because 802.11 does not detect a collision and abort +transmission, a frame suffering a collision will be transmitted in its +entirety. The goal in 802.11 is thus to avoid collisions whenever +possible. In 802.11, if the two stations sense the channel busy, they +both immediately enter random backoff, hopefully choosing different +backoff values. If these values are indeed different, once the channel +becomes idle, one of the two stations will begin transmitting before the +other, and (if the two stations are not hidden from each other) the +"losing station" will hear the "winning station's" signal, freeze its +counter, and refrain from transmitting until the winning station has +completed its transmission. In this manner, a costly collision is +avoided. Of course, collisions can still occur with 802.11 in this +scenario: The two stations could be hidden from each other, or the two +stations could choose random backoff values that are close enough that +the transmission from the station starting first have yet to reach the +second station. Recall that we encountered this problem earlier in our +discussion of random access algorithms in the context of Figure 6.12. +Dealing with Hidden Terminals: RTS and CTS The 802.11 MAC protocol also +includes a nifty (but optional) reservation scheme that helps avoid +collisions even in the presence of hidden terminals. Let's investigate +this scheme in the context of Figure 7.11, which shows two wireless +stations and one access point. Both of the wireless stations are within +range of the AP (whose coverage is shown as a shaded circle) and both +have associated with the AP. However, due to fading, the signal ranges +of wireless stations are limited to the interiors of the shaded circles +shown in Figure 7.11. Thus, each of the wireless stations is hidden from +the other, although neither is hidden from the AP. Let's now consider +why hidden terminals can be problematic. Suppose Station H1 is +transmitting a frame and halfway through H1's transmission, Station H2 +wants to send a frame to the AP. H2, not hearing the transmission from +H1, will first wait a DIFS interval and then transmit the frame, +resulting in + +a collision. The channel will therefore be wasted during the entire +period of H1's transmission as well as during H2's transmission. In +order to avoid this problem, the IEEE 802.11 protocol allows a station +to use a short Request to Send (RTS) control frame and a short Clear to +Send (CTS) control frame to reserve access to the channel. When a sender +wants to send a DATA + +Figure 7.11 Hidden terminal example: H1 is hidden from H2, and vice +versa + +frame, it can first send an RTS frame to the AP, indicating the total +time required to transmit the DATA frame and the acknowledgment (ACK) +frame. When the AP receives the RTS frame, it responds by broadcasting a +CTS frame. This CTS frame serves two purposes: It gives the sender +explicit permission to send and also instructs the other stations not to +send for the reserved duration. Thus, in Figure 7.12, before +transmitting a DATA frame, H1 first broadcasts an RTS frame, which is +heard by all stations in its circle, including the AP. The AP then +responds + +Figure 7.12 Collision avoidance using the RTS and CTS frames + +with a CTS frame, which is heard by all stations within its range, +including H1 and H2. Station H2, having heard the CTS, refrains from +transmitting for the time specified in the CTS frame. The RTS, CTS, +DATA, and ACK frames are shown in Figure 7.12. 
The use of the RTS and CTS frames can improve performance in two important ways: The hidden station problem is mitigated, since a long DATA frame is transmitted only after the channel has been reserved. And because the RTS and CTS frames are short, a collision involving an RTS or CTS frame will last only for the duration of the short RTS or CTS frame. Once the RTS and CTS frames are correctly transmitted, the following DATA and ACK frames should be transmitted without collisions.

You are encouraged to check out the 802.11 applet at the textbook's Web site. This interactive applet illustrates the CSMA/CA protocol, including the RTS/CTS exchange sequence. Although the RTS/CTS exchange can help reduce collisions, it also introduces delay and consumes channel resources. For this reason, the RTS/CTS exchange is only used (if at all) to reserve the channel for the transmission of a long DATA frame. In practice, each wireless station can set an RTS threshold such that the RTS/CTS sequence is used only when the frame is longer than the threshold. For many wireless stations, the default RTS threshold value is larger than the maximum frame length, so the RTS/CTS sequence is skipped for all DATA frames sent.

Using 802.11 as a Point-to-Point Link

Our discussion so far has focused on the use of 802.11 in a multiple access setting. We should mention that if two nodes each have a directional antenna, they can point their directional antennas at each other and run the 802.11 protocol over what is essentially a point-to-point link. Given the low cost of commodity 802.11 hardware, the use of directional antennas and an increased transmission power allow 802.11 to be used as an inexpensive means of providing wireless point-to-point connections over distances of tens of kilometers. \[Raman 2007\] describes one of the first such multi-hop wireless networks, operating in the rural Ganges plains in India using point-to-point 802.11 links.

7.3.3 The IEEE 802.11 Frame

Although the 802.11 frame shares many similarities with an Ethernet frame, it also contains a number of fields that are specific to its use for wireless links. The 802.11 frame is shown in Figure 7.13. The numbers above each of the fields in the frame represent the lengths of the fields in bytes; the numbers above each of the subfields in the frame control field represent the lengths of the subfields in bits. Let's now examine the fields in the frame as well as some of the more important subfields in the frame's control field.

Figure 7.13 The 802.11 frame
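Before examining the fields one at a time, it may help to write out the layout of Figure 7.13 in code form. The byte counts below are the field lengths from the figure (the bit-level subfields of the frame control field are omitted); the field-name strings are our own shorthand.

```python
# The data-frame layout of Figure 7.13, with field widths in bytes.
FRAME_FIELDS = [
    ("frame control", 2),
    ("duration",      2),
    ("address 1",     6),     # receiving station's MAC address (discussed below)
    ("address 2",     6),     # transmitting station's MAC address
    ("address 3",     6),     # router-interface MAC address (discussed below)
    ("seq control",   2),
    ("address 4",     6),     # used only when APs forward frames in ad hoc mode
    ("payload",       2312),  # maximum size; typically fewer than 1,500 bytes
    ("CRC",           4),     # 32-bit cyclic redundancy check
]

overhead = sum(size for name, size in FRAME_FIELDS if name != "payload")
print(f"header + trailer overhead: {overhead} bytes")  # 34 bytes
```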
Payload and CRC Fields

At the heart of the frame is the payload, which typically consists of an IP datagram or an ARP packet. Although the field is permitted to be as long as 2,312 bytes, it is typically fewer than 1,500 bytes, holding an IP datagram or an ARP packet. As with an Ethernet frame, an 802.11 frame includes a 32-bit cyclic redundancy check (CRC) so that the receiver can detect bit errors in the received frame. As we've seen, bit errors are much more common in wireless LANs than in wired LANs, so the CRC is even more useful here.

Address Fields

Perhaps the most striking difference in the 802.11 frame is that it has four address fields, each of which can hold a 6-byte MAC address. But why four address fields? Doesn't a source MAC field and a destination MAC field suffice, as they do for Ethernet? It turns out that three address fields are needed for internetworking purposes---specifically, for moving the network-layer datagram from a wireless station through an AP to a router interface. The fourth address field is used when APs forward frames to each other in ad hoc mode. Since we are only considering infrastructure networks here, let's focus our attention on the first three address fields. The 802.11 standard defines these fields as follows:

Address 2 is the MAC address of the station that transmits the frame. Thus, if a wireless station transmits the frame, that station's MAC address is inserted in the address 2 field. Similarly, if an AP transmits the frame, the AP's MAC address is inserted in the address 2 field.

Address 1 is the MAC address of the wireless station that is to receive the frame. Thus, if a mobile wireless station transmits the frame, address 1 contains the MAC address of the destination AP. Similarly, if an AP transmits the frame, address 1 contains the MAC address of the destination wireless station.

Figure 7.14 The use of address fields in 802.11 frames: Sending frames between H1 and R1

To understand address 3, recall that the BSS (consisting of the AP and wireless stations) is part of a subnet, and that this subnet connects to other subnets via some router interface. Address 3 contains the MAC address of this router interface.

To gain further insight into the purpose of address 3, let's walk through an internetworking example in the context of Figure 7.14. In this figure, there are two APs, each of which is responsible for a number of wireless stations. Each of the APs has a direct connection to a router, which in turn connects to the global Internet. We should keep in mind that an AP is a link-layer device, and thus neither "speaks" IP nor understands IP addresses. Consider now moving a datagram from the router interface R1 to the wireless Station H1. The router is not aware that there is an AP between it and H1; from the router's perspective, H1 is just a host in one of the subnets to which it (the router) is connected. The router, which knows the IP address of H1 (from the destination address of the datagram), uses ARP to determine the MAC address of H1, just as in an ordinary Ethernet LAN. After obtaining H1's MAC address, router interface R1 encapsulates the datagram within an Ethernet frame. The source address field of this frame contains R1's MAC address, and the destination address field contains H1's MAC address. When the Ethernet frame arrives at the AP, the AP converts the 802.3 Ethernet frame to an 802.11 frame before transmitting the frame into the wireless channel. The AP fills in address 1 and address 2 with H1's MAC address and its own MAC address, respectively, as described above. For address 3, the AP inserts the MAC address of R1. In this manner, H1 can determine (from address 3) the MAC address of the router interface that sent the datagram into the subnet.

Now consider what happens when the wireless station H1 responds by moving a datagram from H1 to R1. H1 creates an 802.11 frame, filling the fields for address 1 and address 2 with the AP's MAC address and H1's MAC address, respectively, as described above. For address 3, H1 inserts R1's MAC address. When the AP receives the 802.11 frame, it converts the frame to an Ethernet frame. The source address field for this frame is H1's MAC address, and the destination address field is R1's MAC address.
Thus, address 3 allows the AP to determine the appropriate destination MAC address when constructing the Ethernet frame. In summary, address 3 plays a crucial role for internetworking the BSS with a wired LAN.

Sequence Number, Duration, and Frame Control Fields Recall that in 802.11, whenever a station correctly receives a frame from another station, it sends back an acknowledgment. Because acknowledgments can get lost, the sending station may send multiple copies of a given frame. As we saw in our discussion of the rdt2.1 protocol (Section 3.4.1), the use of sequence numbers allows the receiver to distinguish between a newly transmitted frame and the retransmission of a previous frame. The sequence number field in the 802.11 frame thus serves exactly the same purpose here at the link layer as it did in the transport layer in Chapter 3. Recall that the 802.11 protocol allows a transmitting station to reserve the channel for a period of time that includes the time to transmit its data frame and the time to transmit an acknowledgment. This duration value is included in the frame's duration field (both for data frames and for the RTS and CTS frames). As shown in Figure 7.13, the frame control field includes many subfields. We'll say just a few words about some of the more important subfields; for a more complete discussion, you are encouraged to consult the 802.11 specification [Held 2001; Crow 1997; IEEE 802.11 1999]. The type and subtype fields are used to distinguish the association, RTS, CTS, ACK, and data frames. The to and from fields are used to define the meanings of the different address fields. (These meanings change depending on whether ad hoc or infrastructure modes are used and, in the case of infrastructure mode, whether a wireless station or an AP is sending the frame.) Finally, the WEP field indicates whether encryption is being used or not (WEP is discussed in Chapter 8).

7.3.4 Mobility in the Same IP Subnet

In order to increase the physical range of a wireless LAN, companies and universities will often deploy multiple BSSs within the same IP subnet. This naturally raises the issue of mobility among the BSSs---how do wireless stations seamlessly move from one BSS to another while maintaining ongoing TCP sessions? As we'll see in this subsection, mobility can be handled in a relatively straightforward manner when the BSSs are part of the same subnet. When stations move between subnets, more sophisticated mobility management protocols will be needed, such as those we'll study in Sections 7.5 and 7.6. Let's now look at a specific example of mobility between BSSs in the same subnet. Figure 7.15 shows two interconnected BSSs with a host, H1, moving from BSS1 to BSS2. Because in this example the interconnection device that connects the two BSSs is not a router, all of the stations in the two BSSs, including the APs, belong to the same IP subnet. Thus, when H1 moves from BSS1 to BSS2, it may keep its IP address and all of its ongoing TCP connections. If the interconnection device were a router, then H1 would have to obtain a new IP address in the subnet into which it was moving. This address change would disrupt (and eventually terminate) any ongoing TCP connections at H1. In Section 7.6, we'll see how a network-layer mobility protocol, such as mobile IP, can be used to avoid this problem. But what specifically happens when H1 moves from BSS1 to BSS2?
As H1 wanders away from AP1, H1 detects a weakening signal from AP1 and starts to scan for a stronger signal. H1 receives beacon frames from AP2 (which in many corporate and university settings will have the same SSID as AP1). H1 then disassociates from AP1 and associates with AP2, while keeping its IP address and maintaining its ongoing TCP sessions. This addresses the handoff problem from the host and AP viewpoint. But what about the switch in Figure 7.15? How does it know that the host has moved from one AP to another? As you may recall from Chapter 6, switches are "self-learning" and automatically build their forwarding tables. This self-learning feature nicely handles

Figure 7.15 Mobility in the same subnet

occasional moves (for example, when an employee gets transferred from one department to another); however, switches were not designed to support highly mobile users who want to maintain TCP connections while moving between BSSs. To appreciate the problem here, recall that before the move, the switch has an entry in its forwarding table that pairs H1's MAC address with the outgoing switch interface through which H1 can be reached. If H1 is initially in BSS1, then a datagram destined to H1 will be directed to H1 via AP1. Once H1 associates with BSS2, however, its frames should be directed to AP2. One solution (a bit of a hack, really) is for AP2 to send a broadcast Ethernet frame with H1's source address to the switch just after the new association; the sketch at the end of this subsection illustrates the idea. When the switch receives the frame, it updates its forwarding table, allowing H1 to be reached via AP2. The 802.11f standards group is developing an inter-AP protocol to handle these and related issues. Our discussion above has focused on mobility within the same LAN subnet. Recall that VLANs, which we studied in Section 6.4.4, can be used to connect together islands of LANs into a large virtual LAN that can span a large geographical region. Mobility among base stations within such a VLAN can be handled in exactly the same manner as above [Yu 2011].
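
To make the broadcast-frame hack concrete, here is a minimal sketch in Python of a self-learning switch table being updated by it. The class and method names are our own, and the switch is reduced to a single dictionary; real switches also age out entries, which we omit.

```python
# A self-learning switch reduced to its forwarding table: it remembers,
# for each source MAC address it sees, the interface the frame came in on.

class LearningSwitch:
    def __init__(self):
        self.table = {}                  # MAC address -> switch interface

    def frame_arrived(self, src_mac, in_interface):
        self.table[src_mac] = in_interface   # (re)learn where src_mac lives

    def interface_for(self, dest_mac):
        return self.table.get(dest_mac)      # None would mean flood

switch = LearningSwitch()
switch.frame_arrived("H1-MAC", in_interface="port-to-AP1")   # H1 in BSS1
assert switch.interface_for("H1-MAC") == "port-to-AP1"

# After H1 associates with AP2, AP2 sends a broadcast frame with H1's
# source address; the switch relearns H1's location in one step.
switch.frame_arrived("H1-MAC", in_interface="port-to-AP2")
assert switch.interface_for("H1-MAC") == "port-to-AP2"
```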
7.3.5 Advanced Features in 802.11 We'll wrap up our coverage of 802.11 with a short discussion of two advanced capabilities found in 802.11 networks. As we'll see, these capabilities are not completely specified in the 802.11 standard, but rather are made possible by mechanisms specified in the standard. This allows different vendors to implement these capabilities using their own (proprietary) approaches, presumably giving them an edge over the competition.

802.11 Rate Adaptation We saw earlier in Figure 7.3 that different modulation techniques (with the different transmission rates that they provide) are appropriate for different SNR scenarios. Consider for example a mobile 802.11 user who is initially 20 meters away from the base station, with a high signal-to-noise ratio. Given the high SNR, the user can communicate with the base station using a physical-layer modulation technique that provides high transmission rates while maintaining a low BER. This is one happy user! Suppose now that the user becomes mobile, walking away from the base station, with the SNR falling as the distance from the base station increases. In this case, if the modulation technique used in the 802.11 protocol operating between the base station and the user does not change, the BER will become unacceptably high as the SNR decreases, and eventually no transmitted frames will be received correctly. For this reason, some 802.11 implementations have a rate adaptation capability that adaptively selects the underlying physical-layer modulation technique to use based on current or recent channel characteristics. If a node sends two frames in a row without receiving an acknowledgment (an implicit indication of bit errors on the channel), the transmission rate falls back to the next lower rate. If 10 frames in a row are acknowledged, or if a timer that tracks the time since the last fallback expires, the transmission rate increases to the next higher rate. This rate adaptation mechanism shares the same "probing" philosophy as TCP's congestion-control mechanism---when conditions are good (reflected by ACK receipts), the transmission rate is increased until something "bad" happens (the lack of ACK receipts); when something "bad" happens, the transmission rate is reduced. 802.11 rate adaptation and TCP congestion control are thus similar to the young child who is constantly pushing his/her parents for more and more (say candy for a young child, later curfew hours for the teenager) until the parents finally say "Enough!" and the child backs off (only to try again later after conditions have hopefully improved!). A number of other schemes have also been proposed to improve on this basic automatic rate-adjustment scheme [Kamerman 1997; Holland 2001; Lacage 2004].
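
The fallback/probe rule just described is easy to state as a small state machine. The following Python sketch implements exactly that rule (two consecutive losses step the rate down, ten consecutive ACKs step it up); the rate table and class names are illustrative, and real vendor algorithms such as those cited above differ in their details.

```python
# A sketch of the simple rate-adaptation rule described in the text:
# fall back after 2 consecutive unacknowledged frames, move up after
# 10 consecutive acknowledged frames. (Timer-based probing is omitted.)

RATES_MBPS = [6, 12, 24, 36, 48, 54]      # illustrative 802.11 rate set

class RateAdapter:
    def __init__(self):
        self.index = len(RATES_MBPS) - 1  # start optimistically at 54 Mbps
        self.acked_run = 0
        self.lost_run = 0

    @property
    def rate(self):
        return RATES_MBPS[self.index]

    def frame_acked(self):
        self.lost_run = 0
        self.acked_run += 1
        if self.acked_run >= 10 and self.index < len(RATES_MBPS) - 1:
            self.index += 1               # probe the next higher rate
            self.acked_run = 0

    def frame_lost(self):
        self.acked_run = 0
        self.lost_run += 1
        if self.lost_run >= 2 and self.index > 0:
            self.index -= 1               # implicit sign of bit errors: back off
            self.lost_run = 0

adapter = RateAdapter()
adapter.frame_lost(); adapter.frame_lost()
assert adapter.rate == 48                 # two losses in a row: one step down
for _ in range(10):
    adapter.frame_acked()
assert adapter.rate == 54                 # ten ACKs in a row: one step up
```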
Two other wireless protocols in the IEEE 802 +family are Bluetooth and Zigbee (defined in the IEEE 802.15.1 and IEEE +802.15.4 standards \[IEEE 802.15 2012\]). Bluetooth An IEEE 802.15.1 +network operates over a short range, at low power, and at low cost. It +is essentially a low-power, short-range, low-rate "cable replacement" +technology for interconnecting a computer with its wireless keyboard, +mouse or other peripheral device; cellular phones, speakers, headphones, +and many other devices, whereas 802.11 is a higher-power, medium-range, +higher-rate "access" technology. For this reason, 802.15.1 networks are +sometimes referred to as wireless personal area networks (WPANs). The +link and physical layers of 802.15.1 are based on the earlier Bluetooth +specification for personal area networks \[Held 2001, Bisdikian 2001\]. +802.15.1 networks operate in the 2.4 GHz unlicensed radio band in a TDM +manner, with time slots of 625 microseconds. During each time slot, a +sender transmits on one of 79 channels, with the channel changing in a +known but pseudo-random manner from slot to slot. This form of channel +hopping, known as frequency-hopping spread spectrum (FHSS), spreads +transmissions in time over the frequency spectrum. 802.15.1 can provide +data rates up to 4 Mbps. 802.15.1 networks are ad hoc networks: No +network infrastructure (e.g., an access point) is needed to interconnect +802.15.1 devices. Thus, 802.15.1 devices must organize themselves. +802.15.1 devices are first organized into a piconet of up to eight +active devices, as shown in Figure 7.16. One of these devices is +designated as the master, with the remaining devices acting as slaves. +The master node truly rules the piconet---its clock determines time in +the piconet, it can transmit in each odd-numbered slot, and a + +Figure 7.16 A Bluetooth piconet + +slave can transmit only after the master has communicated with it in the +previous slot and even then the slave can only transmit to the master. +In addition to the slave devices, there can also be up to 255 parked +devices in the network. These devices cannot communicate until their +status has been changed from parked to active by the master node. For +more information about WPANs, the interested reader should consult the +Bluetooth references \[Held 2001, Bisdikian 2001\] or the official IEEE +802.15 Web site \[IEEE 802.15 2012\]. Zigbee A second personal area +network standardized by the IEEE is the 802.15.4 standard \[IEEE 802.15 +2012\] known as Zigbee. While Bluetooth networks provide a "cable +replacement" data rate of over a Megabit per second, Zigbee is targeted +at lower-powered, lower-data-rate, lower-duty-cycle applications than +Bluetooth. While we may tend to think that "bigger and faster is +better," not all network applications need high bandwidth and the +consequent higher costs (both economic and power costs). For example, +home temperature and light sensors, security devices, and wall-mounted +switches are all very simple, lowpower, low-duty-cycle, low-cost +devices. Zigbee is thus well-suited for these devices. Zigbee defines +channel rates of 20, 40, 100, and 250 Kbps, depending on the channel +frequency. Nodes in a Zigbee network come in two flavors. So-called +"reduced-function devices" operate as slave devices under the control of +a single "full-function device," much as Bluetooth slave devices. 
7.3.6 Personal Area Networks: Bluetooth and Zigbee As illustrated in Figure 7.2, the IEEE 802.11 WiFi standard is aimed at communication among devices separated by up to 100 meters (except when 802.11 is used in a point-to-point configuration with a directional antenna). Two other wireless protocols in the IEEE 802 family are Bluetooth and Zigbee (defined in the IEEE 802.15.1 and IEEE 802.15.4 standards [IEEE 802.15 2012]).

Bluetooth An IEEE 802.15.1 network operates over a short range, at low power, and at low cost. It is essentially a low-power, short-range, low-rate "cable replacement" technology for interconnecting a computer with its wireless keyboard, mouse, or other peripherals, as well as with cellular phones, speakers, headphones, and many other devices; 802.11, by contrast, is a higher-power, medium-range, higher-rate "access" technology. For this reason, 802.15.1 networks are sometimes referred to as wireless personal area networks (WPANs). The link and physical layers of 802.15.1 are based on the earlier Bluetooth specification for personal area networks [Held 2001, Bisdikian 2001]. 802.15.1 networks operate in the 2.4 GHz unlicensed radio band in a TDM manner, with time slots of 625 microseconds. During each time slot, a sender transmits on one of 79 channels, with the channel changing in a known but pseudo-random manner from slot to slot. This form of channel hopping, known as frequency-hopping spread spectrum (FHSS), spreads transmissions in time over the frequency spectrum. 802.15.1 can provide data rates up to 4 Mbps. 802.15.1 networks are ad hoc networks: No network infrastructure (e.g., an access point) is needed to interconnect 802.15.1 devices. Thus, 802.15.1 devices must organize themselves. 802.15.1 devices are first organized into a piconet of up to eight active devices, as shown in Figure 7.16. One of these devices is designated as the master, with the remaining devices acting as slaves. The master node truly rules the piconet---its clock determines time in the piconet, it can transmit in each odd-numbered slot, and a slave can transmit only after the master has communicated with it in the previous slot, and even then, the slave can transmit only to the master.

Figure 7.16 A Bluetooth piconet

In addition to the slave devices, there can also be up to 255 parked devices in the network. These devices cannot communicate until their status has been changed from parked to active by the master node. For more information about WPANs, the interested reader should consult the Bluetooth references [Held 2001, Bisdikian 2001] or the official IEEE 802.15 Web site [IEEE 802.15 2012].

Zigbee A second personal area network standardized by the IEEE is the 802.15.4 standard [IEEE 802.15 2012] known as Zigbee. While Bluetooth networks provide a "cable replacement" data rate of over a megabit per second, Zigbee is targeted at lower-powered, lower-data-rate, lower-duty-cycle applications than Bluetooth. While we may tend to think that "bigger and faster is better," not all network applications need high bandwidth and the consequent higher costs (both economic and power costs). For example, home temperature and light sensors, security devices, and wall-mounted switches are all very simple, low-power, low-duty-cycle, low-cost devices. Zigbee is thus well-suited for these devices. Zigbee defines channel rates of 20, 40, 100, and 250 Kbps, depending on the channel frequency. Nodes in a Zigbee network come in two flavors. So-called "reduced-function devices" operate as slave devices under the control of a single "full-function device," much as Bluetooth slave devices do. A full-function device can operate as a master device as in Bluetooth by controlling multiple slave devices, and multiple full-function devices can additionally be configured into a mesh network in which full-function devices route frames amongst themselves. Zigbee shares many protocol mechanisms that we've already encountered in other link-layer protocols: beacon frames and link-layer acknowledgments (similar to 802.11), carrier-sense random access protocols with binary exponential backoff (similar to 802.11 and Ethernet), and fixed, guaranteed allocation of time slots (similar to DOCSIS). Zigbee networks can be configured in many different ways. Let's consider the simple case of a single full-function device controlling multiple reduced-function devices in a time-slotted manner using beacon frames. Figure 7.17 shows the case where the Zigbee network divides time into recurring superframes, each of which begins with a beacon frame.

Figure 7.17 Zigbee 802.15.4 super-frame structure

Each beacon frame divides the superframe into an active period (during which devices may transmit) and an inactive period (during which all devices, including the controller, can sleep and thus conserve power). The active period consists of 16 time slots, some of which are used by devices in a CSMA/CA random access manner, and some of which are allocated by the controller to specific devices, thus providing guaranteed channel access for those devices. More details about Zigbee networks can be found at [Baronti 2007, IEEE 802.15.4 2012].
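
The superframe layout just described can be made concrete with a small data model. The sketch below is illustrative Python with invented names; the 16-slot active period follows the text, but the split between contention slots and guaranteed slots is an arbitrary example of our own.

```python
# A toy model of the Zigbee superframe described above: a beacon, an
# active period of 16 slots (some contention-based, some guaranteed),
# and an inactive period in which every device may sleep.

from dataclasses import dataclass

@dataclass
class Superframe:
    active_slots: int = 16
    guaranteed: dict = None      # slot number -> device holding that slot

    def slot_owner(self, slot):
        """Who may transmit in a given active slot?"""
        if self.guaranteed and slot in self.guaranteed:
            return self.guaranteed[slot]          # guaranteed channel access
        return "CSMA/CA contention"               # random access slot

# Example: the controller grants the last two slots to a door sensor.
frame = Superframe(guaranteed={14: "door-sensor", 15: "door-sensor"})
for slot in range(frame.active_slots):
    print(slot, frame.slot_owner(slot))
```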
7.4 Cellular Internet Access In the previous section we examined how an Internet host can access the Internet when inside a WiFi hotspot---that is, when it is within the vicinity of an 802.11 access point. But most WiFi hotspots have a small coverage area of between 10 and 100 meters in diameter. What do we do then when we have a desperate need for wireless Internet access and we cannot access a WiFi hotspot? Given that cellular telephony is now ubiquitous in many areas throughout the world, a natural strategy is to extend cellular networks so that they support not only voice telephony but wireless Internet access as well. Ideally, this Internet access would be at a reasonably high speed and would provide for seamless mobility, allowing users to maintain their TCP sessions while traveling, for example, on a bus or a train. With sufficiently high upstream and downstream bit rates, the user could even maintain videoconferencing sessions while roaming about. This scenario is not that far-fetched. Data rates of several megabits per second are becoming available as broadband data services such as those we will cover here become more widely deployed. In this section, we provide a brief overview of current and emerging cellular Internet access technologies. Our focus here will be on both the wireless first hop as well as the network that connects the wireless first hop into the larger telephone network and/or the Internet; in Section 7.7 we'll consider how calls are routed to a user moving between base stations. Our brief discussion will necessarily provide only a simplified and high-level description of cellular technologies. Modern cellular communications, of course, has great breadth and depth, with many universities offering several courses on the topic. Readers seeking a deeper understanding are encouraged to see [Goodman 1997; Kaaranen 2001; Lin 2001; Korhonen 2003; Schiller 2003; Palat 2009; Scourias 2012; Turner 2012; Akyildiz 2010], as well as the particularly excellent and exhaustive references [Mouly 1992; Sauter 2014].

7.4.1 An Overview of Cellular Network Architecture In our description of cellular network architecture in this section, we'll adopt the terminology of the Global System for Mobile Communications (GSM) standards. (For history buffs, the GSM acronym was originally derived from Groupe Spécial Mobile, until the more anglicized name was adopted, preserving the original acronym letters.) In the 1980s, Europeans recognized the need for a pan-European digital cellular telephony system that would replace the numerous incompatible analog cellular telephony systems, leading to the GSM standard [Mouly 1992]. Europeans deployed GSM technology with great success in the early 1990s, and since then GSM has grown to be the 800-pound gorilla of the cellular telephone world, with more than 80% of all cellular subscribers worldwide using GSM.

CASE HISTORY 4G Cellular Mobile Versus Wireless LANs Many cellular mobile phone operators are deploying 4G cellular mobile systems. In some countries (e.g., Korea and Japan), 4G LTE coverage is higher than 90%---nearly ubiquitous. In 2015, average download rates over deployed LTE systems ranged from 10 Mbps in the US and India to close to 40 Mbps in New Zealand. These 4G systems are being deployed in licensed radio-frequency bands, with some operators paying considerable sums to governments for spectrum-use licenses. 4G systems allow users to access the Internet from remote outdoor locations while on the move, in a manner similar to today's cellular phone-only access. In many cases, a user may have simultaneous access to both wireless LANs and 4G. Because 4G capacity is both more constrained and more expensive, many mobile devices default to the use of WiFi rather than 4G when both are available. Whether wireless edge network access will be primarily over wireless LANs or over cellular systems remains an open question: The emerging wireless LAN infrastructure may become nearly ubiquitous. IEEE 802.11 wireless LANs, operating at 54 Mbps and higher, are enjoying widespread deployment. Essentially all laptops, tablets, and smartphones are factory-equipped with 802.11 LAN capabilities. Furthermore, emerging Internet appliances---such as wireless cameras and picture frames---also have low-powered wireless LAN capabilities. Wireless LAN base stations can also handle mobile phone appliances. Many phones are already capable of connecting to the cellular phone network or to an IP network either natively or using a Skype-like Voice-over-IP service, thus bypassing the operator's cellular voice and 4G data services. Of course, many other experts believe that 4G not only will be a major success, but will also dramatically revolutionize the way we work and live. Most likely, WiFi and 4G will both become prevalent wireless technologies, with roaming wireless devices automatically selecting the access technology that provides the best service at their current physical location.

When people talk about cellular technology, they often classify the technology as belonging to one of several "generations." The earliest generations were designed primarily for voice traffic.
First generation (1G) systems were analog FDMA systems designed exclusively for voice communication. These 1G systems are almost extinct now, having been replaced by digital 2G systems. The original 2G systems were also designed for voice, but later extended (2.5G) to support data (i.e., Internet) as well as voice service. 3G systems also support voice and data, but with an emphasis on data capabilities and higher-speed radio access links. The 4G systems being deployed today are based on LTE technology, feature an all-IP core network, and provide integrated voice and data at multi-megabit speeds.

Cellular Network Architecture, 2G: Voice Connections to the Telephone Network The term cellular refers to the fact that the region covered by a cellular network is partitioned into a number of geographic coverage areas, known as cells, shown as hexagons on the left side of Figure 7.18. As with the 802.11 WiFi standard we studied in Section 7.3.1, GSM has its own particular nomenclature. Each cell contains a base transceiver station (BTS) that transmits signals to and receives signals from the mobile stations in its cell.

Figure 7.18 Components of the GSM 2G cellular network architecture

The coverage area of a cell depends on many factors, including the transmitting power of the BTS, the transmitting power of the user devices, obstructing buildings in the cell, and the height of base station antennas. Although Figure 7.18 shows each cell containing one base transceiver station residing in the middle of the cell, many systems today place the BTS at corners where three cells intersect, so that a single BTS with directional antennas can service three cells. The GSM standard for 2G cellular systems uses combined FDM/TDM (radio) for the air interface. Recall from Chapter 1 that, with pure FDM, the channel is partitioned into a number of frequency bands with each band devoted to a call. Also recall from Chapter 1 that, with pure TDM, time is partitioned into frames with each frame further partitioned into slots and each call being assigned the use of a particular slot in the revolving frame. In combined FDM/TDM systems, the channel is partitioned into a number of frequency sub-bands; within each sub-band, time is partitioned into frames and slots. Thus, for a combined FDM/TDM system, if the channel is partitioned into F sub-bands and time is partitioned into T slots, then the channel will be able to support F·T simultaneous calls. Recall that we saw in Section 6.3.4 that cable access networks also use a combined FDM/TDM approach. GSM systems use 200-kHz frequency bands, with each band supporting eight TDM calls. GSM encodes speech at 13 kbps and 12.2 kbps.
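
The F·T capacity formula is worth a quick worked check. In the sketch below, the per-band numbers (200 kHz, eight slots) come from the GSM description above, while the total allocation of 5 MHz is our own illustrative assumption.

```python
# Capacity of a combined FDM/TDM system: F sub-bands x T slots per band.
allocation_hz  = 5_000_000       # assumed spectrum allocation: 5 MHz
band_hz        = 200_000         # GSM sub-band width: 200 kHz
slots_per_band = 8               # GSM: eight TDM calls per sub-band

F = allocation_hz // band_hz     # 25 frequency sub-bands
T = slots_per_band
print(F * T, "simultaneous calls")   # 25 * 8 = 200 simultaneous calls
```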
A GSM network's base station controller (BSC) will typically service several tens of base transceiver stations. The role of the BSC is to allocate BTS radio channels to mobile subscribers, perform paging (finding the cell in which a mobile user is resident), and perform handoff of mobile users---a topic we'll cover shortly in Section 7.7.2. The base station controller and its controlled base transceiver stations collectively constitute a GSM base station subsystem (BSS). As we'll see in Section 7.7, the mobile switching center (MSC) plays the central role in user authorization and accounting (e.g., determining whether a mobile device is allowed to connect to the cellular network), call establishment and teardown, and handoff. A single MSC will typically serve up to five BSCs, resulting in approximately 200K subscribers per MSC. A cellular provider's network will have a number of MSCs, with special MSCs known as gateway MSCs connecting the provider's cellular network to the larger public telephone network.

7.4.2 3G Cellular Data Networks: Extending the Internet to Cellular Subscribers Our discussion in Section 7.4.1 focused on connecting cellular voice users to the public telephone network. But, of course, when we're on the go, we'd also like to read e-mail, access the Web, get location-dependent services (e.g., maps and restaurant recommendations), and perhaps even watch streaming video. To do this, our smartphone will need to run a full TCP/IP protocol stack (including the physical, link, network, transport, and application layers) and connect into the Internet via the cellular data network. The topic of cellular data networks involves a rather bewildering collection of competing and ever-evolving standards, as one generation (and half-generation) succeeds the former and introduces new technologies and services with new acronyms. To make matters worse, there's no single official body that sets requirements for 2.5G, 3G, 3.5G, or 4G technologies, making it hard to sort out the differences among competing standards. In our discussion below, we'll focus on the UMTS (Universal Mobile Telecommunications Service) 3G and 4G standards developed by the 3rd Generation Partnership Project (3GPP) [3GPP 2016]. Let's first take a top-down look at the 3G cellular data network architecture shown in Figure 7.19.

Figure 7.19 3G system architecture

3G Core Network The 3G core cellular data network connects radio access networks to the public Internet. The core network interoperates with components of the existing cellular voice network (in particular, the MSC) that we previously encountered in Figure 7.18. Given the considerable amount of existing infrastructure (and profitable services!) in the existing cellular voice network, the approach taken by the designers of 3G data services is clear: leave the existing core GSM cellular voice network untouched, adding additional cellular data functionality in parallel to the existing cellular voice network. The alternative---integrating new data services directly into the core of the existing cellular voice network---would have raised the same challenges encountered in Section 4.3, where we discussed integrating new (IPv6) and legacy (IPv4) technologies in the Internet.

There are two types of nodes in the 3G core network: Serving GPRS Support Nodes (SGSNs) and Gateway GPRS Support Nodes (GGSNs). (GPRS stands for General Packet Radio Service, an early cellular data service in 2G networks; here we discuss the evolved version of GPRS in 3G networks.) An SGSN is responsible for delivering datagrams to/from the mobile nodes in the radio access network to which the SGSN is attached. The SGSN interacts with the cellular voice network's MSC for that area, providing user authorization and handoff, maintaining location (cell) information about active mobile nodes, and performing datagram forwarding between mobile nodes in the radio access network and a GGSN. The GGSN acts as a gateway, connecting multiple SGSNs into the larger Internet. A GGSN is thus the last piece of 3G infrastructure that a datagram originating at a mobile node encounters before entering the larger Internet.
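
The division of labor between the SGSN and the GGSN can be sketched in a few lines of code. The model below is deliberately schematic Python under our own naming (it is not the GTP tunneling protocol that real 3G cores use): the GGSN keeps track of which SGSN currently serves each mobile, so that mobility stays hidden from the Internet side.

```python
# A schematic model of 3G core data forwarding: the GGSN forwards each
# mobile's traffic to whichever SGSN currently serves it. Class and
# method names are illustrative, not part of any 3GPP specification.

class SGSN:
    def __init__(self, name):
        self.name = name

    def deliver(self, datagram, mobile_ip):
        # A real SGSN would hand the datagram to the radio access network
        # serving the cell where the mobile currently resides.
        print(f"{self.name}: delivering {datagram!r} to {mobile_ip} over the RAN")

class GGSN:
    """Gateway between the cellular data network and the public Internet."""
    def __init__(self):
        self.serving_sgsn = {}            # mobile IP -> SGSN serving it

    def update_location(self, mobile_ip, sgsn):
        self.serving_sgsn[mobile_ip] = sgsn   # updated as the mobile moves

    def from_internet(self, datagram, dest_ip):
        # Externally the GGSN looks like an ordinary gateway router;
        # internally it relays to the SGSN attached to the mobile's RAN.
        self.serving_sgsn[dest_ip].deliver(datagram, dest_ip)

ggsn = GGSN()
ggsn.update_location("10.0.7.42", SGSN("SGSN-east"))
ggsn.from_internet("hello", "10.0.7.42")
ggsn.update_location("10.0.7.42", SGSN("SGSN-west"))  # mobile moved
ggsn.from_internet("hello again", "10.0.7.42")        # Internet side unchanged
```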
To the outside world, the GGSN looks like any other gateway router; the mobility of the 3G nodes within the GGSN's network is hidden from the outside world behind the GGSN.

3G Radio Access Network: The Wireless Edge The 3G radio access network is the wireless first-hop network that we see as a 3G user. The Radio Network Controller (RNC) typically controls several cell base transceiver stations similar to the base stations that we encountered in 2G systems (but officially known in 3G UMTS parlance as "Node Bs"---a rather non-descriptive name!). Each cell's wireless link operates between the mobile nodes and a base transceiver station, just as in 2G networks. The RNC connects to both the circuit-switched cellular voice network via an MSC, and to the packet-switched Internet via an SGSN. Thus, while 3G cellular voice and cellular data services use different core networks, they share a common first/last-hop radio access network. A significant change in 3G UMTS over 2G networks is that rather than using GSM's FDMA/TDMA scheme, UMTS uses a CDMA technique known as Direct Sequence Wideband CDMA (DS-WCDMA) [Dahlman 1998] within TDMA slots; TDMA slots, in turn, are available on multiple frequencies---an interesting use of all three dedicated channel-sharing approaches that we earlier identified in Chapter 6 and similar to the approach taken in wired cable access networks (see Section 6.3.4). This change requires a new 3G cellular wireless-access network operating in parallel with the 2G BSS radio network shown in Figure 7.19. The data service associated with the WCDMA specification is known as HSPA (High Speed Packet Access) and promises downlink data rates of up to 14 Mbps. Details regarding 3G networks can be found at the 3rd Generation Partnership Project (3GPP) Web site [3GPP 2016].

7.4.3 On to 4G: LTE Fourth generation (4G) cellular systems are becoming widely deployed. In 2015, more than 50 countries had 4G coverage exceeding 50%. The 4G Long-Term Evolution (LTE) standard [Sauter 2014] put forward by the 3GPP has two important innovations over 3G systems: an all-IP core network and an enhanced radio access network, as discussed below.

4G System Architecture: An All-IP Core Network Figure 7.20 shows the overall 4G network architecture, which (unfortunately) introduces yet another (rather impenetrable) new vocabulary and set of acronyms for network components.

Figure 7.20 4G network architecture

But let's not get lost in these acronyms! There are several important high-level observations about the 4G architecture: A unified, all-IP network architecture. Unlike the 3G network shown in Figure 7.19, which has separate network components and paths for voice and data traffic, the 4G architecture shown in Figure 7.20 is "all-IP"---both voice and data are carried in IP datagrams between the wireless device (the User Equipment, or UE, in 4G parlance) and the packet gateway (P-GW), which connects the 4G edge network to the rest of the network. With 4G, the last vestiges of cellular networks' roots in telephony have disappeared, giving way to universal IP service! A clear separation of the 4G data plane and 4G control plane. Mirroring our distinction between the data and control planes for IP's network layer in Chapters 4 and 5, respectively, the 4G network architecture also clearly separates the data and control planes. We'll discuss their functionality below.
A clear separation between the radio access network and the all-IP core network. IP datagrams carrying user data are forwarded between the user (UE) and the gateway (P-GW in Figure 7.20) over a 4G-internal IP network to the external Internet. Control packets are exchanged over this same internal network among the 4G's control services components, whose roles are described below. The principal components of the 4G architecture are as follows. The eNodeB is the logical descendant of the 2G base station and the 3G Radio Network Controller (a.k.a. Node B) and again plays a central role here. Its data-plane role is to forward datagrams between the UE (over the LTE radio access network) and the P-GW. UE datagrams are encapsulated at the eNodeB and tunneled to the P-GW through the 4G network's all-IP enhanced packet core (EPC). This tunneling between the eNodeB and P-GW is similar to the tunneling we saw in Section 4.3 of IPv6 datagrams between two IPv6 endpoints through a network of IPv4 routers. These tunnels may have associated quality of service (QoS) guarantees. For example, a 4G network may guarantee that voice traffic experiences no more than a 100 msec delay between UE and P-GW, and has a packet loss rate of less than 1%; TCP traffic might have a guarantee of 300 msec and a packet loss rate of less than 0.0001% [Palat 2009]. We'll cover QoS in Chapter 9. In the control plane, the eNodeB handles registration and mobility signaling traffic on behalf of the UE. The Packet Data Network Gateway (P-GW) allocates IP addresses to the UEs and performs QoS enforcement. As a tunnel endpoint it also performs datagram encapsulation/decapsulation when forwarding a datagram to/from a UE. The Serving Gateway (S-GW) is the data-plane mobility anchor point---all UE traffic will pass through the S-GW. The S-GW also performs charging/billing functions and lawful traffic interception. The Mobility Management Entity (MME) performs connection and mobility management on behalf of the UEs resident in the cell it controls. It receives UE subscription information from the HSS. We cover mobility in cellular networks in detail in Section 7.7. The Home Subscriber Server (HSS) contains UE information including roaming access capabilities, quality of service profiles, and authentication information. As we'll see in Section 7.7, the HSS obtains this information from the UE's home cellular provider. Very readable introductions to 4G network architecture and its EPC are [Motorola 2007; Palat 2009; Sauter 2014].

LTE Radio Access Network LTE uses a combination of frequency division multiplexing and time division multiplexing on the downstream channel, known as orthogonal frequency division multiplexing (OFDM) [Rohde 2008; Ericsson 2011]. (The term "orthogonal" comes from the fact that the signals being sent on different frequency channels are created so that they interfere very little with each other, even when channel frequencies are tightly spaced.) In LTE, each active mobile node is allocated one or more 0.5 ms time slots in one or more of the channel frequencies. Figure 7.21 shows an allocation of eight time slots over four frequencies. By being allocated increasingly more time slots (whether on the same frequency or on different frequencies), a mobile node is able to achieve increasingly higher transmission rates.

Figure 7.21 Twenty 0.5 ms slots organized into 10 ms frames at each frequency. An eight-slot allocation is shown shaded.

Slot (re)allocation among mobile nodes can be performed as often as once every millisecond. Different modulation schemes can also be used to change the transmission rate; see our earlier discussion of Figure 7.3 and dynamic selection of modulation schemes in WiFi networks. The particular allocation of time slots to mobile nodes is not mandated by the LTE standard. Instead, the decision of which mobile nodes will be allowed to transmit in a given time slot on a given frequency is determined by the scheduling algorithms provided by the LTE equipment vendor and/or the network operator. With opportunistic scheduling [Bender 2000; Kolding 2003; Kulkarni 2005], matching the physical-layer protocol to the channel conditions between the sender and receiver and choosing the receivers to which packets will be sent based on channel conditions allow the radio network controller to make best use of the wireless medium. In addition, user priorities and contracted levels of service (e.g., silver, gold, or platinum) can be used in scheduling downstream packet transmissions. In addition to the LTE capabilities described above, LTE-Advanced allows for downstream bandwidths of hundreds of Mbps by allocating aggregated channels to a mobile node [Akyildiz 2010].
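
The slot-allocation idea can be pictured as a small grid of (frequency, slot) pairs. The sketch below is a toy Python scheduler over such a grid; the greedy policy and all names are our own illustration, since, as noted above, the LTE standard leaves the actual scheduling algorithm to vendors and operators.

```python
# A toy LTE-style downstream grid: 4 channel frequencies x 20 slots of
# 0.5 ms per 10 ms frame. A scheduler assigns (frequency, slot) pairs
# to mobile nodes; more pairs mean a higher transmission rate.

FREQUENCIES, SLOTS = 4, 20

def schedule(demands):
    """Greedily hand out grid cells: demands maps node -> requested slots."""
    grid = {}                                   # (freq, slot) -> node
    free = ((f, s) for f in range(FREQUENCIES) for s in range(SLOTS))
    for node, wanted in demands.items():
        for _ in range(wanted):
            cell = next(free, None)
            if cell is None:
                return grid                     # grid exhausted
            grid[cell] = node
    return grid

grid = schedule({"UE-1": 8, "UE-2": 4})         # UE-1 gets 8 slots, as in Figure 7.21
assert sum(1 for n in grid.values() if n == "UE-1") == 8
```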
An additional 4G wireless technology---WiMAX (Worldwide Interoperability for Microwave Access)---is a family of IEEE 802.16 standards that differ significantly from LTE. WiMAX has not yet been able to enjoy the widespread deployment of LTE. A detailed discussion of WiMAX can be found on this book's Web site.

7.5 Mobility Management: Principles Having covered the wireless nature of the communication links in a wireless network, it's now time to turn our attention to the mobility that these wireless links enable. In the broadest sense, a mobile node is one that changes its point of attachment into the network over time. Because the term mobility has taken on many meanings in both the computer and telephony worlds, it will serve us well first to consider several dimensions of mobility in some detail. From the network layer's standpoint, how mobile is a user? A physically mobile user will present a very different set of challenges to the network layer, depending on how he or she moves between points of attachment to the network. At one end of the spectrum in Figure 7.22, a user may carry a laptop with a wireless network interface card around in a building. As we saw in Section 7.3.4, this user is not mobile from a network-layer perspective. Moreover, if the user associates with the same access point regardless of location, the user is not even mobile from the perspective of the link layer. At the other end of the spectrum, consider the user zooming along the autobahn in a BMW or Tesla at 150 kilometers per hour, passing through multiple wireless access networks and wanting to maintain an uninterrupted TCP connection to a remote application throughout the trip. This user is definitely mobile!

Figure 7.22 Various degrees of mobility, from the network layer's point of view

In between these extremes is a user who takes a laptop from one location (e.g., office or dormitory) into another (e.g., coffeeshop, classroom) and wants to connect into the network in the new location. This user is also mobile (although less so than the BMW driver!) but does not need to maintain an ongoing connection while moving between points of attachment to the network. Figure 7.22 illustrates this spectrum of user mobility from the network layer's perspective. How important is it for the mobile node's address to always remain the same? With mobile telephony, your phone number---essentially the network-layer address of your phone---remains the same as you travel from one provider's mobile phone network to another. Must a laptop similarly maintain the same IP address while moving between IP networks? The answer to this question will depend strongly on the applications being run. For the BMW or Tesla driver who wants to maintain an uninterrupted TCP connection to a remote application while zipping along the autobahn, it would be convenient to maintain the same IP address. Recall from Chapter 3 that an Internet application needs to know the IP address and port number of the remote entity with which it is communicating. If a mobile entity is able to maintain its IP address as it moves, mobility becomes invisible from the application standpoint. There is great value to this transparency---an application need not be concerned with a potentially changing IP address, and the same application code serves mobile and nonmobile connections alike. We'll see in the following section that mobile IP provides this transparency, allowing a mobile node to maintain its permanent IP address while moving among networks. On the other hand, a less glamorous mobile user might simply want to turn off an office laptop, bring that laptop home, power up, and work from home. If the laptop functions primarily as a client in client-server applications (e.g., send/read e-mail, browse the Web, Telnet to a remote host) from home, the particular IP address used by the laptop is not that important. In particular, one could get by fine with an address that is temporarily allocated to the laptop by the ISP serving the home. We saw in Section 4.3 that DHCP already provides this functionality. What supporting wired infrastructure is available? In all of our scenarios above, we've implicitly assumed that there is a fixed infrastructure to which the mobile user can connect---for example, the home's ISP network, the wireless access network in the office, or the wireless access networks lining the autobahn. What if no such infrastructure exists? If two users are within communication proximity of each other, can they establish a network connection in the absence of any other network-layer infrastructure? Ad hoc networking provides precisely these capabilities. This rapidly developing area is at the cutting edge of mobile networking research and is beyond the scope of this book. [Perkins 2000] and the IETF Mobile Ad Hoc Network (manet) working group Web pages [manet 2016] provide thorough treatments of the subject. In order to illustrate the issues involved in allowing a mobile user to maintain ongoing connections while moving between networks, let's consider a human analogy. A twenty-something adult moving out of the family home becomes mobile, living in a series of dormitories and/or apartments, and often changing addresses. If an old friend wants to get in touch, how can that friend find the address of her mobile friend? One common way is to contact the family, since a mobile adult will often register his or her current address with the family (if for no other reason than so that the parents can send money to help pay the rent!).
The family home, with its permanent address, becomes that one place that others can go as a first step in communicating with the mobile adult. Later communication from the friend may be either indirect (for example, with mail being sent first to the parents' home and then forwarded to the mobile adult) or direct (for example, with the friend using the address obtained from the parents to send mail directly to her mobile friend).

In a network setting, the permanent home of a mobile node (such as a laptop or smartphone) is known as the home network, and the entity within the home network that performs the mobility management functions discussed below on behalf of the mobile node is known as the home agent. The network in which the mobile node is currently residing is known as the foreign (or visited) network, and the entity within the foreign network that helps the mobile node with the mobility management functions discussed below is known as a foreign agent. For mobile professionals, their home network might well be their company network, while the visited network might be the network of a colleague they are visiting. A correspondent is the entity wishing to communicate with the mobile node. Figure 7.23 illustrates these concepts, as well as addressing concepts considered below. In Figure 7.23, note that agents are shown as being collocated with routers (e.g., as processes running on routers), but alternatively they could be executing on other hosts or servers in the network.

7.5.1 Addressing We noted above that in order for user mobility to be transparent to network applications, it is desirable for a mobile node to keep its address as it moves from one network to another.

Figure 7.23 Initial elements of a mobile network architecture

When a mobile node is resident in a foreign network, all traffic addressed to the node's permanent address now needs to be routed to the foreign network. How can this be done? One option is for the foreign network to advertise to all other networks that the mobile node is resident in its network. This could be via the usual exchange of intradomain and interdomain routing information and would require few changes to the existing routing infrastructure. The foreign network could simply advertise to its neighbors that it has a highly specific route to the mobile node's permanent address (that is, essentially inform other networks that it has the correct path for routing datagrams to the mobile node's permanent address; see Section 4.3). These neighbors would then propagate this routing information throughout the network as part of the normal procedure of updating routing information and forwarding tables. When the mobile node leaves one foreign network and joins another, the new foreign network would advertise a new, highly specific route to the mobile node, and the old foreign network would withdraw its routing information regarding the mobile node. This solves two problems at once, and it does so without making significant changes to the network-layer infrastructure. Other networks know the location of the mobile node, and it is easy to route datagrams to the mobile node, since the forwarding tables will direct datagrams to the foreign network. A significant drawback, however, is that of scalability.
If mobility management were to be the responsibility of network routers, the routers would have to maintain forwarding table entries for potentially millions of mobile nodes, and update these entries as nodes move. Some additional drawbacks are explored in the problems at the end of this chapter. An alternative approach (and one that has been adopted in practice) is to push mobility functionality from the network core to the network edge---a recurring theme in our study of Internet architecture. A natural way to do this is via the mobile node's home network. In much the same way that parents of the mobile twenty-something track their child's location, the home agent in the mobile node's home network can track the foreign network in which the mobile node resides. A protocol between the mobile node (or a foreign agent representing the mobile node) and the home agent will certainly be needed to update the mobile node's location. Let's now consider the foreign agent in more detail. The conceptually simplest approach, shown in Figure 7.23, is to locate foreign agents at the edge routers in the foreign network. One role of the foreign agent is to create a so-called care-of address (COA) for the mobile node, with the network portion of the COA matching that of the foreign network. There are thus two addresses associated with a mobile node, its permanent address (analogous to our mobile youth's family's home address) and its COA, sometimes known as a foreign address (analogous to the address of the house in which our mobile youth is currently residing). In the example in Figure 7.23, the permanent address of the mobile node is 128.119.40.186. When visiting network 79.129.13/24, the mobile node has a COA of 79.129.13.2. A second role of the foreign agent is to inform the home agent that the mobile node is resident in its (the foreign agent's) network and has the given COA. We'll see shortly that the COA will be used to "reroute" datagrams to the mobile node via its foreign agent. Although we have separated the functionality of the mobile node and the foreign agent, it is worth noting that the mobile node can also assume the responsibilities of the foreign agent. For example, the mobile node could obtain a COA in the foreign network (for example, using a protocol such as DHCP) and itself inform the home agent of its COA.
Such datagrams are +first routed, as usual, to the mobile node's home network. This is +illustrated in step 1 in Figure 7.24. Let's now turn our attention to +the home agent. In addition to being responsible for interacting with a +foreign agent to track the mobile node's COA, the home agent has another +very important function. Its second job is to be on the lookout for +arriving datagrams addressed to nodes whose home network is that of the +home agent but that are currently resident in a foreign network. The +home agent intercepts these datagrams and then forwards them to a mobile +node in a two-step process. The datagram is first forwarded to the +foreign agent, using the mobile node's COA (step 2 in Figure 7.24), and +then forwarded from the foreign agent to the mobile node (step 3 in +Figure 7.24). + +Figure 7.24 Indirect routing to a mobile node + +It is instructive to consider this rerouting in more detail. The home +agent will need to address the datagram using the mobile node's COA, so +that the network layer will route the datagram to the foreign network. +On the other hand, it is desirable to leave the correspondent's datagram +intact, since the application receiving the datagram should be unaware +that the datagram was forwarded via the home agent. Both goals can be +satisfied by having the home agent encapsulate the correspondent's +original complete datagram within a new (larger) datagram. This larger +datagram is addressed and delivered to the mobile node's COA. The +foreign agent, who "owns" the COA, will receive and decapsulate the +datagram---that is, remove the correspondent's original datagram from +within the larger encapsulating datagram and forward (step 3 in Figure +7.24) the original datagram to the mobile node. Figure 7.25 shows a +correspondent's original datagram being sent to the home network, an +encapsulated datagram being sent to the foreign agent, and the original +datagram being delivered to the mobile node. The sharp reader will note +that the encapsulation/decapsulation described here is identical to the +notion of tunneling, discussed in Section 4.3 in the context of IP +multicast and IPv6. Let's next consider how a mobile node sends +datagrams to a correspondent. This is quite simple, as the mobile node +can address its datagram directly to the correspondent (using its own +permanent address as the source address, and the + +Figure 7.25 Encapsulation and decapsulation + +correspondent's address as the destination address). Since the mobile +node knows the correspondent's address, there is no need to route the +datagram back through the home agent. This is shown as step 4 in Figure +7.24. Let's summarize our discussion of indirect routing by listing the +new network-layer functionality required to support mobility. A +mobile-node--to--foreign-agent protocol. The mobile node will register +with the foreign agent when attaching to the foreign network. Similarly, +a mobile node will deregister with the foreign agent when it leaves the +foreign network. A foreign-agent--to--home-agent registration protocol. +The foreign agent will register the mobile node's COA with the home +agent. A foreign agent need not explicitly deregister a COA when a +mobile node leaves its network, because the subsequent registration of a +new COA, when the mobile node moves to a new network, will take care of +this. A home-agent datagram encapsulation protocol. Encapsulation and +forwarding of the correspondent's original datagram within a datagram +addressed to the COA. 
Let's next consider how a mobile node sends datagrams to a correspondent. This is quite simple, as the mobile node can address its datagram directly to the correspondent (using its own permanent address as the source address, and the correspondent's address as the destination address). Since the mobile node knows the correspondent's address, there is no need to route the datagram back through the home agent. This is shown as step 4 in Figure 7.24. Let's summarize our discussion of indirect routing by listing the new network-layer functionality required to support mobility. A mobile-node-to-foreign-agent protocol. The mobile node will register with the foreign agent when attaching to the foreign network. Similarly, a mobile node will deregister with the foreign agent when it leaves the foreign network. A foreign-agent-to-home-agent registration protocol. The foreign agent will register the mobile node's COA with the home agent. A foreign agent need not explicitly deregister a COA when a mobile node leaves its network, because the subsequent registration of a new COA, when the mobile node moves to a new network, will take care of this. A home-agent datagram encapsulation protocol. Encapsulation and forwarding of the correspondent's original datagram within a datagram addressed to the COA. A foreign-agent decapsulation protocol. Extraction of the correspondent's original datagram from the encapsulating datagram, and the forwarding of the original datagram to the mobile node. The previous discussion provides all the pieces---foreign agents, the home agent, and indirect forwarding---needed for a mobile node to maintain an ongoing connection while moving among networks. As an example of how these pieces fit together, assume the mobile node is attached to foreign network A, has registered a COA in network A with its home agent, and is receiving datagrams that are being indirectly routed through its home agent. The mobile node now moves to foreign network B and registers with the foreign agent in network B, which informs the home agent of the mobile node's new COA. From this point on, the home agent will reroute datagrams to foreign network B. As far as a correspondent is concerned, mobility is transparent---datagrams are routed via the same home agent both before and after the move. As far as the home agent is concerned, there is no disruption in the flow of datagrams---arriving datagrams are first forwarded to foreign network A; after the change in COA, datagrams are forwarded to foreign network B. But will the mobile node see an interrupted flow of datagrams as it moves between networks? As long as the time between the mobile node's disconnection from network A (at which point it can no longer receive datagrams via A) and its attachment to network B (at which point it will register a new COA with its home agent) is small, few datagrams will be lost. Recall from Chapter 3 that end-to-end connections can suffer datagram loss due to network congestion. Hence, occasional datagram loss within a connection when a node moves between networks is by no means a catastrophic problem. If loss-free communication is required, upper-layer mechanisms will recover from datagram loss, whether such loss results from network congestion or from user mobility. An indirect routing approach is used in the mobile IP standard [RFC 5944], as discussed in Section 7.6.

Direct Routing to a Mobile Node The indirect routing approach illustrated in Figure 7.24 suffers from an inefficiency known as the triangle routing problem---datagrams addressed to the mobile node must be routed first to the home agent and then to the foreign network, even when a much more efficient route exists between the correspondent and the mobile node. In the worst case, imagine a mobile user who is visiting the foreign network of a colleague. The two are sitting side by side and exchanging data over the network. Datagrams from the correspondent (in this case the colleague of the visitor) are routed to the mobile user's home agent and then back again to the foreign network! Direct routing overcomes the inefficiency of triangle routing, but does so at the cost of additional complexity. In the direct routing approach, a correspondent agent in the correspondent's network first learns the COA of the mobile node. This can be done by having the correspondent agent query the home agent, assuming that (as in the case of indirect routing) the mobile node has an up-to-date value for its COA registered with its home agent. It is also possible for the correspondent itself to perform the function of the correspondent agent, just as a mobile node could perform the function of the foreign agent. This is shown as steps 1 and 2 in Figure 7.26.
The correspondent agent then tunnels datagrams directly to the mobile node's COA, in a manner analogous to the tunneling performed by the home agent (steps 3 and 4 in Figure 7.26).

While direct routing overcomes the triangle routing problem, it introduces two important additional challenges: A mobile-user location protocol is needed for the correspondent agent to query the home agent to obtain the mobile node's COA (steps 1 and 2 in Figure 7.26). When the mobile node moves from one foreign network to another, how will data now be forwarded to the new foreign network? In the case of indirect routing, this problem was easily solved by updating the COA maintained by the home agent. However, with direct routing, the home agent is queried for the COA by the correspondent agent only once, at the beginning of the session. Thus, updating the COA at the home agent, while necessary, will not be enough to solve the problem of routing data to the mobile node's new foreign network. One solution would be to create a new protocol to notify the correspondent of the changing COA. An alternative solution, and one that we'll see adopted in practice in GSM networks, works as follows.

Figure 7.26 Direct routing to a mobile user

Suppose data is currently being forwarded to the mobile node in the foreign network where the mobile node was located when the session first started (step 1 in Figure 7.27). We'll identify the foreign agent in that foreign network where the mobile node was first found as the anchor foreign agent. When the mobile node moves to a new foreign network (step 2 in Figure 7.27), the mobile node registers with the new foreign agent (step 3), and the new foreign agent provides the anchor foreign agent with the mobile node's new COA (step 4). When the anchor foreign agent receives an encapsulated datagram for a departed mobile node, it can then re-encapsulate the datagram and forward it to the mobile node (step 5) using the new COA. If the mobile node later moves yet again to a new foreign network, the foreign agent in that new visited network would then contact the anchor foreign agent in order to set up forwarding to this new foreign network.

Figure 7.27 Mobile transfer between networks with direct routing
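
The anchor-foreign-agent scheme just described amounts to keeping one forwarding pointer up to date. Here is a minimal illustrative sketch (invented names, Python dictionaries in place of real registration messages) of the anchor re-encapsulating datagrams toward the mobile node's latest COA.

```python
# The anchor foreign agent remembers the mobile node's most recent COA
# and re-encapsulates arriving datagrams toward it (step 5 in Figure 7.27).

class AnchorForeignAgent:
    def __init__(self):
        self.current_coa = {}        # permanent address -> latest COA

    def update_coa(self, permanent_addr, new_coa):
        # Invoked when a new foreign agent reports the move (step 4).
        self.current_coa[permanent_addr] = new_coa

    def datagram_arrived(self, encapsulated):
        inner = encapsulated["payload"]
        dest = inner["dest"]
        if dest in self.current_coa:           # mobile has departed:
            return {"dest": self.current_coa[dest], "payload": inner}
        return inner                           # still local: deliver directly

anchor = AnchorForeignAgent()
anchor.update_coa("128.119.40.186", "79.130.14.7")   # hypothetical new COA
out = anchor.datagram_arrived(
    {"dest": "79.129.13.2", "payload": {"dest": "128.119.40.186", "data": "x"}})
assert out["dest"] == "79.130.14.7"                  # re-tunneled to the new COA
```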
7.6 Mobile IP The Internet architecture and protocols for supporting mobility, collectively known as mobile IP, are defined primarily in RFC 5944 for IPv4. Mobile IP is a flexible standard, supporting many different modes of operation (for example, operation with or without a foreign agent), multiple ways for agents and mobile nodes to discover each other, use of single or multiple COAs, and multiple forms of encapsulation. As such, mobile IP is a complex standard, and would require an entire book to describe in detail; indeed one such book is \[Perkins 1998b\]. Our modest goal here is to provide an overview of the most important aspects of mobile IP and to illustrate its use in a few common-case scenarios. The mobile IP architecture contains many of the elements we have considered above, including the concepts of home agents, foreign agents, care-of addresses, and encapsulation/decapsulation. The current standard \[RFC 5944\] specifies the use of indirect routing to the mobile node. The mobile IP standard consists of three main pieces: Agent discovery. Mobile IP defines the protocols used by a home or foreign agent to advertise its services to mobile nodes, and protocols for mobile nodes to solicit the services of a foreign or home agent. Registration with the home agent. Mobile IP defines the protocols used by the mobile node and/or foreign agent to register and deregister COAs with a mobile node's home agent. Indirect routing of datagrams. The standard also defines the manner in which datagrams are forwarded to mobile nodes by a home agent, including rules for forwarding datagrams, rules for handling error conditions, and several forms of encapsulation \[RFC 2003, RFC 2004\]. Security considerations are prominent throughout the mobile IP standard. For example, authentication of a mobile node is clearly needed to ensure that a malicious user does not register a bogus care-of address with a home agent, which could cause all datagrams addressed to an IP address to be redirected to the malicious user. Mobile IP achieves security using many of the mechanisms that we will examine in Chapter 8, so we will not address security considerations in our discussion below. Agent Discovery A mobile IP node arriving at a new network, whether attaching to a foreign network or returning to its home network, must learn the identity of the corresponding foreign or home agent. Indeed it is the discovery of a new foreign agent, with a new network address, that allows the network layer in a mobile node to learn that it has moved into a new foreign network. This process is known as agent discovery. Agent discovery can be accomplished in one of two ways: via agent advertisement or via agent solicitation. With agent advertisement, a foreign or home agent advertises its services using an extension to the existing router discovery protocol \[RFC 1256\]. The agent periodically broadcasts an ICMP message with a type field of 9 (router discovery) on all links to which it is connected. The router discovery message contains the IP address of the router (that is, the agent), thus allowing a mobile node to learn the agent's IP address. The router discovery message also contains a mobility agent advertisement extension that contains additional information needed by the mobile node. Among the more important fields in the extension are the following: Home agent bit (H). Indicates that the agent is a home agent for the network in which it resides. Foreign agent bit (F). Indicates that the agent is a foreign agent for the network in which it resides. Registration required bit (R). Indicates that a mobile user in this network must register with a foreign agent. In particular, a mobile user cannot obtain a care-of address in the foreign network (for example, using DHCP) and assume the functionality of the foreign agent for itself, without registering with the foreign agent. M, G encapsulation bits. Indicate whether a form of encapsulation other than IP-in-IP encapsulation will be used. Care-of address (COA) fields. A list of one or more care-of addresses provided by the foreign agent. In our example below, the COA will be associated with the foreign agent, who will receive datagrams sent to the COA and then forward them to the appropriate mobile node. The mobile user will select one of these addresses as its COA when registering with its home agent. Figure 7.28 illustrates some of the key fields in the agent advertisement message.

Figure 7.28 ICMP router discovery message with mobility agent advertisement extension
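A short Python sketch shows how a mobile node might pull the H, F, R, M, and G bits and the COA list out of a received extension. The offsets and bit positions below follow our reading of RFC 5944's mobility agent advertisement extension (type 16; sequence number; registration lifetime; a flags byte; then zero or more COAs); treat this as a sketch and consult the RFC and Figure 7.28 for the authoritative layout.

```python
# Parsing a mobility agent advertisement extension (illustrative sketch;
# bit positions follow RFC 5944: flags byte is R|B|H|F|M|G|r|T).
import struct

def parse_mobility_extension(data: bytes) -> dict:
    ext_type, length, seq, lifetime, flags, _rsvd = \
        struct.unpack_from("!BBHHBB", data)
    assert ext_type == 16, "not a mobility agent advertisement extension"
    n_coas = (length - 6) // 4        # length = 6 + 4 * (number of COAs)
    coas = [".".join(map(str, struct.unpack_from("!BBBB", data, 8 + 4 * i)))
            for i in range(n_coas)]
    return {
        "registration_required": bool(flags & 0x80),  # R bit
        "home_agent":            bool(flags & 0x20),  # H bit
        "foreign_agent":         bool(flags & 0x10),  # F bit
        "min_encap":             bool(flags & 0x08),  # M bit
        "gre_encap":             bool(flags & 0x04),  # G bit
        "sequence": seq, "lifetime": lifetime, "care_of_addresses": coas,
    }

# Example advertisement: foreign agent, registration required, one COA.
sample = struct.pack("!BBHHBB4B", 16, 10, 1, 3600, 0x90, 0, 79, 129, 13, 2)
print(parse_mobility_extension(sample))
```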
+With agent solicitation, a mobile node wanting to learn about agents +without waiting to receive an agent advertisement can broadcast an agent +solicitation message, which is simply an ICMP message with type value +10. An agent receiving the solicitation will unicast an agent +advertisement directly to the mobile node, which can then proceed as if +it had received an unsolicited advertisement. Registration with the Home +Agent Once a mobile IP node has received a COA, that address must be +registered with the home agent. This can be done either via the foreign +agent (who then registers the COA with the home agent) or directly by +the mobile IP node itself. We consider the former case below. Four steps +are involved. + +1. Following the receipt of a foreign agent advertisement, a mobile + node sends a mobile IP registration message to the foreign agent. + The registration message is carried within a UDP datagram and sent + to port 434. The registration message carries a COA advertised by + the foreign agent, the address of the home agent (HA), the permanent + address of the mobile node (MA), the requested lifetime of the + registration, and a 64-bit registration identification. The + requested registration lifetime is the number of seconds that the + registration is to be valid. If the registration is not renewed at + the home agent within the specified lifetime, the registration will + become invalid. The registration identifier acts like a sequence + number and serves to match a received registration reply with a + registration request, as discussed below. + +2. The foreign agent receives the registration message and records the + mobile node's permanent IP address. The foreign agent now knows that + it should be looking for datagrams containing an encapsulated + datagram whose destination address matches the permanent address of + the mobile node. The foreign agent then sends a mobile IP + registration message (again, within a UDP datagram) to port 434 of + the home agent. The message contains the COA, HA, MA, encapsulation + format requested, requested registration lifetime, and registration + identification. + +3. The home agent receives the registration request and checks for + authenticity and correctness. The home agent binds the mobile node's + permanent IP address with the COA; in the future, datagrams arriving + at the home agent and addressed to the mobile node will now be + encapsulated and tunneled to the COA. The home agent sends a mobile + IP registration reply containing the HA, MA, actual registration + lifetime, and the registration identification of the request that is + being satisfied with this reply. + +4. The foreign agent receives the registration reply and then forwards + it to the mobile node. + +At this point, registration is complete, and the mobile node can receive +datagrams sent to its permanent address. Figure 7.29 illustrates these +steps. Note that the home agent specifies a lifetime that is smaller +than the lifetime requested by the mobile node. A foreign agent need not +explicitly deregister a COA when a mobile node leaves its network. This +will occur automatically, when the mobile node moves to a new network +(whether another foreign network or its home network) and registers a +new COA. The mobile IP standard allows many additional scenarios and +capabilities in addition to those described previously. The interested +reader should consult \[Perkins 1998b; RFC 5944\]. 
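The following Python sketch packs a registration request carrying exactly the fields named in step 1: the COA, the home agent address (HA), the mobile's permanent address (MA), a requested lifetime, and a 64-bit identification, sent in a UDP datagram to port 434. The field layout loosely follows RFC 5944's registration request; the addresses are examples, and the authentication extension that a real registration must carry is omitted here, consistent with our decision to defer security to Chapter 8.

```python
# A sketch of step 1: the mobile node sends a registration request to
# the foreign agent on UDP port 434. Addresses are illustrative.
import os
import socket
import struct

def build_registration_request(home_addr, home_agent, coa, lifetime_s):
    ident = os.urandom(8)   # 64-bit identification; matched in the reply
    # type = 1 (registration request); flags = 0 (default IP-in-IP
    # encapsulation); lifetime in seconds (16 bits).
    fixed = struct.pack("!BBH", 1, 0, lifetime_s)
    addrs = b"".join(socket.inet_aton(a)
                     for a in (home_addr, home_agent, coa))
    return fixed + addrs + ident          # 24 bytes before extensions

msg = build_registration_request(
    home_addr="128.119.40.186",   # MA: permanent address
    home_agent="128.119.40.7",    # HA
    coa="79.129.13.2",            # COA advertised by the foreign agent
    lifetime_s=9999)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(msg, ("79.129.13.1", 434))    # foreign agent (example IP)
```

In step 2 the foreign agent relays an analogous message (again to UDP port 434) to the home agent, which is why the identification field is needed to match replies to requests.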
Figure 7.29 Agent advertisement and mobile IP registration

7.7 Managing Mobility in Cellular Networks Having examined how mobility is managed in IP networks, let's now turn our attention to networks with an even longer history of supporting mobility---cellular telephony networks. Whereas we focused on the first-hop wireless link in cellular networks in Section 7.4, we'll focus here on mobility, using the GSM cellular network \[Goodman 1997; Mouly 1992; Scourias 2012; Kaaranen 2001; Korhonen 2003; Turner 2012\] as our case study, since it is a mature and widely deployed technology. Mobility in 3G and 4G networks is similar in principle to that used in GSM. As in the case of mobile IP, we'll see that a number of the fundamental principles we identified in Section 7.5 are embodied in GSM's network architecture. Like mobile IP, GSM adopts an indirect routing approach (see Section 7.5.2), first routing the correspondent's call to the mobile user's home network and from there to the visited network. In GSM terminology, the mobile user's home network is referred to as the mobile user's home public land mobile network (home PLMN). Since the PLMN acronym is a bit of a mouthful, and mindful of our quest to avoid an alphabet soup of acronyms, we'll refer to the GSM home PLMN simply as the home network. The home network is the cellular provider with which the mobile user has a subscription (i.e., the provider that bills the user for monthly cellular service). The visited PLMN, which we'll refer to simply as the visited network, is the network in which the mobile user is currently residing. As in the case of mobile IP, the responsibilities of the home and visited networks are quite different. The home network maintains a database known as the home location register (HLR), which contains the permanent cell phone number and subscriber profile information for each of its subscribers. Importantly, the HLR also contains information about the current locations of these subscribers. That is, if a mobile user is currently roaming in another provider's cellular network, the HLR contains enough information to obtain (via a process we'll describe shortly) an address in the visited network to which a call to the mobile user should be routed. As we'll see, a special switch in the home network, known as the Gateway Mobile services Switching Center (GMSC), is contacted by a correspondent when a call is placed to a mobile user. Again, in our quest to avoid an alphabet soup of acronyms, we'll refer to the GMSC here by a more descriptive term, home MSC. The visited network maintains a database known as the visitor location register (VLR). The VLR contains an entry for each mobile user that is currently in the portion of the network served by the VLR. VLR entries thus come and go as mobile users enter and leave the network. A VLR is usually co-located with the mobile switching center (MSC) that coordinates the setup of a call to and from the visited network.

In practice, a provider's cellular network will serve as a home network for its subscribers and as a visited network for mobile users whose subscription is with a different cellular provider.

Figure 7.30 Placing a call to a mobile user: Indirect routing

7.7.1 Routing Calls to a Mobile User We're now in a position to describe how a call is placed to a mobile GSM user in a visited network. We'll consider a simple example below; more complex scenarios are described in \[Mouly 1992\].
The steps, as illustrated in Figure 7.30, are as follows:

1. The correspondent dials the mobile user's phone number. This number
   itself does not refer to a particular telephone line or location
   (after all, the phone number is fixed and the user is mobile!). The
   leading digits in the number are sufficient to globally identify the
   mobile's home network. The call is routed from the correspondent
   through the PSTN to the home MSC in the mobile's home network. This
   is the first leg of the call.

2. The home MSC receives the call and interrogates the HLR to determine
   the location of the mobile user. In the simplest case, the HLR
   returns the mobile station roaming number (MSRN), which we will
   refer to as the roaming number. Note that this number is different
   from the mobile's permanent phone number, which is associated with
   the mobile's home network. The roaming number is ephemeral: It is
   temporarily assigned to a mobile when it enters a visited network.
   The roaming number serves a role similar to that of the care-of
   address in mobile IP and, like the COA, is invisible to the
   correspondent and the mobile. If the HLR does not have the roaming
   number, it returns the address of the VLR in the visited network. In
   this case (not shown in Figure 7.30), the home MSC will need to
   query the VLR to obtain the roaming number of the mobile node. But
   how does the HLR get the roaming number or the VLR address in the
   first place? What happens to these values when the mobile user moves
   to another visited network? We'll consider these important questions
   shortly.

3. Given the roaming number, the home MSC sets up the second leg of the
   call through the network to the MSC in the visited network. The call
   is completed, being routed from the correspondent to the home MSC,
   and from there to the visited MSC, and from there to the base
   station serving the mobile user.

An unresolved question in step 2 is how the HLR obtains information about the location of the mobile user. When a mobile telephone is switched on or enters a part of a visited network that is covered by a new VLR, the mobile must register with the visited network. This is done through the exchange of signaling messages between the mobile and the VLR. The visited VLR, in turn, sends a location update request message to the mobile's HLR. This message informs the HLR of either the roaming number at which the mobile can be contacted, or the address of the VLR (which can then later be queried to obtain the mobile number). As part of this exchange, the VLR also obtains subscriber information from the HLR about the mobile and determines what services (if any) should be accorded the mobile user by the visited network.
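The HLR lookup in step 2, including the indirect case where only a VLR address is stored, can be sketched as follows. The function and variable names here are our own; in a real GSM network these steps are signaling exchanges (and a location update, not a Python call) between the VLR, HLR, and home MSC.

```python
# A sketch of step 2: the home MSC interrogates the HLR, which returns
# either the roaming number (MSRN) or the visited VLR's address, which
# must then itself be queried. Illustrative names and numbers only.

HLR = {}  # permanent number -> {"msrn": ..., "vlr": ...}

def location_update(permanent_number, msrn=None, vlr=None):
    """Sent by the visited VLR when the mobile registers."""
    HLR[permanent_number] = {"msrn": msrn, "vlr": vlr}

def query_vlr(vlr_addr, permanent_number):
    # Placeholder for the signaling exchange with the visited VLR.
    return f"msrn-assigned-by-{vlr_addr}"

def route_call(permanent_number):
    """Home MSC resolves a dialed number to a routable roaming number."""
    entry = HLR[permanent_number]
    if entry["msrn"]:                                  # simplest case
        return entry["msrn"]
    return query_vlr(entry["vlr"], permanent_number)   # indirect case

location_update("+1-413-555-0199", msrn="+39-06-555-7171")
assert route_call("+1-413-555-0199") == "+39-06-555-7171"
```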
7.7.2 Handoffs in GSM A handoff occurs when a mobile station changes its association from one base station to another during a call. As shown in Figure 7.31, a mobile's call is initially (before handoff) routed to the mobile through one base station (which we'll refer to as the old base station), and after handoff is routed to the mobile through another base station (which we'll refer to as the new base station).

Figure 7.31 Handoff scenario between base stations with a common MSC

Note that a handoff between base stations results not only in the mobile transmitting/receiving to/from a new base station, but also in the rerouting of the ongoing call from a switching point within the network to the new base station. Let's initially assume that the old and new base stations share the same MSC, and that the rerouting occurs at this MSC. There may be several reasons for handoff to occur, including (1) the signal between the current base station and the mobile may have deteriorated to such an extent that the call is in danger of being dropped, and (2) a cell may have become overloaded, handling a large number of calls. This congestion may be alleviated by handing off mobiles to less congested nearby cells. While it is associated with a base station, a mobile periodically measures the strength of a beacon signal from its current base station as well as beacon signals from nearby base stations that it can "hear." These measurements are reported once or twice a second to the mobile's current base station. Handoff in GSM is initiated by the old base station based on these measurements, the current loads of mobiles in nearby cells, and other factors \[Mouly 1992\]. The GSM standard does not specify the specific algorithm to be used by a base station to determine whether or not to perform handoff. Figure 7.32 illustrates the steps involved when a base station does decide to hand off a mobile user:

1. The old base station (BS) informs the visited MSC that a handoff is
   to be performed and the BS (or possible set of BSs) to which the
   mobile is to be handed off.

2. The visited MSC initiates path setup to the new BS, allocating the
   resources needed to carry the rerouted call, and signaling the new
   BS that a handoff is about to occur.

3. The new BS allocates and activates a radio channel for use by the
   mobile.

4. The new BS signals back to the visited MSC and the old BS that the
   visited-MSC-to-new-BS path has been established and that the mobile
   should be informed of the impending handoff. The new BS provides all
   of the information that the mobile will need to associate with the
   new BS.

5. The mobile is informed that it should perform a handoff. Note that
   up until this point, the mobile has been blissfully unaware that the
   network has been laying the groundwork (e.g., allocating a channel
   in the new BS and allocating a path from the visited MSC to the new
   BS) for a handoff.

6. The mobile and the new BS exchange one or more messages to fully
   activate the new channel in the new BS.

7. The mobile sends a handoff complete message to the new BS, which is
   forwarded up to the visited MSC. The visited MSC then reroutes the
   ongoing call to the mobile via the new BS.

8. The resources allocated along the path to the old BS are then
   released.

Figure 7.32 Steps in accomplishing a handoff between base stations with a common MSC

Let's conclude our discussion of handoff by considering what happens when the mobile moves to a BS that is associated with a different MSC than the old BS, and what happens when this inter-MSC handoff occurs more than once. As shown in Figure 7.33, GSM defines the notion of an anchor MSC. The anchor MSC is the MSC visited by the mobile when a call first begins; the anchor MSC thus remains unchanged during the call. Throughout the call's duration and regardless of the number of inter-MSC transfers performed by the mobile, the call is routed from the home MSC to the anchor MSC, and then from the anchor MSC to the visited MSC where the mobile is currently located. When a mobile moves from the coverage area of one MSC to another, the ongoing call is rerouted from the anchor MSC to the new visited MSC containing the new base station. Thus, at all times there are at most three MSCs (the home MSC, the anchor MSC, and the visited MSC) between the correspondent and the mobile. Figure 7.33 illustrates the routing of a call among the MSCs visited by a mobile user.

Figure 7.33 Rerouting via the anchor MSC

Rather than maintaining a single MSC hop from the anchor MSC to the current MSC, an alternative approach would have been to simply chain the MSCs visited by the mobile, having an old MSC forward the ongoing call to the new MSC each time the mobile moves to a new MSC. Such MSC chaining can in fact occur in IS-41 cellular networks, with an optional path minimization step to remove MSCs between the anchor MSC and the current visited MSC \[Lin 2001\].
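The contrast between the anchor-MSC approach and MSC chaining is easy to see in a few lines of Python. This is a toy model with made-up MSC names: with an anchor MSC the call path is always home MSC, anchor MSC, current visited MSC (at most three switches), while chaining appends a switch on every inter-MSC move.

```python
# A toy comparison of call paths: anchor MSC versus IS-41-style chaining.
# MSC names are illustrative.

def path_with_anchor(home, anchor, visited_sequence):
    current = visited_sequence[-1]
    # At most three MSCs: home -> anchor -> current visited MSC.
    return [home, anchor] if current == anchor else [home, anchor, current]

def path_with_chaining(home, visited_sequence):
    # Every MSC the mobile has visited stays on the forwarding path.
    return [home] + visited_sequence

visits = ["MSC-A", "MSC-B", "MSC-C"]     # MSC-A is where the call began
print(path_with_anchor("home-MSC", "MSC-A", visits))
# ['home-MSC', 'MSC-A', 'MSC-C']
print(path_with_chaining("home-MSC", visits))
# ['home-MSC', 'MSC-A', 'MSC-B', 'MSC-C']  -- grows with each handoff
```

The optional path minimization step mentioned above can be thought of as collapsing the chained path back toward the three-MSC anchor path.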
Let's wrap up our discussion of GSM mobility management with a comparison of mobility management in GSM and Mobile IP. The comparison in Table 7.2 indicates that although IP and cellular networks are fundamentally different in many ways, they share a surprising number of common functional elements and overall approaches in handling mobility.

Table 7.2 Commonalities between mobile IP and GSM mobility

| GSM element | Comment on GSM element | Mobile IP element |
| --- | --- | --- |
| Home system | Network to which the mobile user's permanent phone number belongs. | Home network |
| Gateway mobile switching center or simply home MSC, Home location register (HLR) | Home MSC: point of contact to obtain routable address of mobile user. HLR: database in home system containing permanent phone number, profile information, current location of mobile user, subscription information. | Home agent |
| Visited system | Network other than home system where mobile user is currently residing. | Visited network |
| Visited mobile services switching center, Visitor location register (VLR) | Visited MSC: responsible for setting up calls to/from mobile nodes in cells associated with MSC. VLR: temporary database entry in visited system, containing subscription information for each visiting mobile user. | Foreign agent |
| Mobile station roaming number (MSRN) or simply roaming number | Routable address for telephone call segment between home MSC and visited MSC, visible to neither the mobile nor the correspondent. | Care-of address |

7.8 Wireless and Mobility: Impact on Higher-Layer Protocols In this chapter, we've seen that wireless networks differ significantly from their wired counterparts at both the link layer (as a result of wireless channel characteristics such as fading, multipath, and hidden terminals) and at the network layer (as a result of mobile users who change their points of attachment to the network). But are there important differences at the transport and application layers? It's tempting to think that these differences will be minor, since the network layer provides the same best-effort delivery service model to upper layers in both wired and wireless networks. Similarly, if protocols such as TCP or UDP are used to provide transport-layer services to applications in both wired and wireless networks, then the application layer should remain unchanged as well. In one sense our intuition is right---TCP and UDP can (and do) operate in networks with wireless links.
On the other hand, transport protocols in general, and TCP in particular, can sometimes have very different performance in wired and wireless networks, and it is here, in terms of performance, that differences are manifested. Let's see why. Recall that TCP retransmits a segment that is either lost or corrupted on the path between sender and receiver. In the case of mobile users, loss can result from either network congestion (router buffer overflow) or from handoff (e.g., from delays in rerouting segments to a mobile's new point of attachment to the network). In all cases, TCP's receiver-to-sender feedback (duplicate ACKs, or the absence of an ACK leading to a timeout) indicates only that a segment was not received intact; the sender is unaware of whether the segment was lost due to congestion, during handoff, or due to detected bit errors. In all cases, the sender's response is the same---to retransmit the segment. TCP's congestion-control response is also the same in all cases---TCP decreases its congestion window, as discussed in Section 3.7. By unconditionally decreasing its congestion window, TCP implicitly assumes that segment loss results from congestion rather than corruption or handoff. We saw in Section 7.2 that bit errors are much more common in wireless networks than in wired networks. When such bit errors occur or when handoff loss occurs, there's really no reason for the TCP sender to decrease its congestion window (and thus decrease its sending rate). Indeed, it may well be the case that router buffers are empty and packets are flowing along the end-to-end path unimpeded by congestion. Researchers realized in the early to mid 1990s that given high bit error rates on wireless links and the possibility of handoff loss, TCP's congestion-control response could be problematic in a wireless setting. Three broad classes of approaches are possible for dealing with this problem: Local recovery. Local recovery protocols recover from bit errors when and where (e.g., at the wireless link) they occur, e.g., the 802.11 ARQ protocol we studied in Section 7.3, or more sophisticated approaches that use both ARQ and FEC \[Ayanoglu 1995\].

TCP sender awareness of wireless links. In the local recovery approaches, the TCP sender is blissfully unaware that its segments are traversing a wireless link. An alternative approach is for the TCP sender and receiver to be aware of the existence of a wireless link, to distinguish between congestive losses occurring in the wired network and corruption/loss occurring at the wireless link, and to invoke congestion control only in response to congestive wired-network losses. \[Balakrishnan 1997\] investigates various types of TCP, assuming that end systems can make this distinction. \[Liu 2003\] investigates techniques for distinguishing between losses on the wired and wireless segments of an end-to-end path. Split-connection approaches. In a split-connection approach \[Bakre 1995\], the end-to-end connection between the mobile user and the other end point is broken into two transport-layer connections: one from the mobile host to the wireless access point, and one from the wireless access point to the other communication end point (which we'll assume here is a wired host). The end-to-end connection is thus formed by the concatenation of a wireless part and a wired part. The transport layer over the wireless segment can be a standard TCP connection \[Bakre 1995\], or a specially tailored error recovery protocol on top of UDP.
\[Yavatkar 1994\] investigates +the use of a transport-layer selective repeat protocol over the wireless +connection. Measurements reported in \[Wei 2006\] indicate that split +TCP connections are widely used in cellular data networks, and that +significant improvements can indeed be made through the use of split TCP +connections. Our treatment of TCP over wireless links has been +necessarily brief here. In-depth surveys of TCP challenges and solutions +in wireless networks can be found in \[Hanabali 2005; Leung 2006\]. We +encourage you to consult the references for details of this ongoing area +of research. Having considered transport-layer protocols, let us next +consider the effect of wireless and mobility on application-layer +protocols. Here, an important consideration is that wireless links often +have relatively low bandwidths, as we saw in Figure 7.2. As a result, +applications that operate over wireless links, particularly over +cellular wireless links, must treat bandwidth as a scarce commodity. For +example, a Web server serving content to a Web browser executing on a 4G +phone will likely not be able to provide the same image-rich content +that it gives to a browser operating over a wired connection. Although +wireless links do provide challenges at the application layer, the +mobility they enable also makes possible a rich set of location-aware +and context-aware applications \[Chen 2000; Baldauf 2007\]. More +generally, wireless and mobile networks will play a key role in +realizing the ubiquitous computing environments of the future \[Weiser +1991\]. It's fair to say that we've only seen the tip of the iceberg +when it comes to the impact of wireless and mobile networks on networked +applications and their protocols! + +7.9 Summary Wireless and mobile networks have revolutionized telephony +and are having an increasingly profound impact in the world of computer +networks as well. With their anytime, anywhere, untethered access into +the global network infrastructure, they are not only making network +access more ubiquitous, they are also enabling an exciting new set of +location-dependent services. Given the growing importance of wireless +and mobile networks, this chapter has focused on the principles, common +link technologies, and network architectures for supporting wireless and +mobile communication. We began this chapter with an introduction to +wireless and mobile networks, drawing an important distinction between +the challenges posed by the wireless nature of the communication links +in such networks, and by the mobility that these wireless links enable. +This allowed us to better isolate, identify, and master the key concepts +in each area. We focused first on wireless communication, considering +the characteristics of a wireless link in Section 7.2. In Sections 7.3 +and 7.4, we examined the link-level aspects of the IEEE 802.11 (WiFi) +wireless LAN standard, two IEEE 802.15 personal area networks (Bluetooth +and Zigbee), and 3G and 4G cellular Internet access. We then turned our +attention to the issue of mobility. In Section 7.5, we identified +several forms of mobility, with points along this spectrum posing +different challenges and admitting different solutions. We considered +the problems of locating and routing to a mobile user, as well as +approaches for handing off the mobile user who dynamically moves from +one point of attachment to the network to another. 
We examined how these issues were addressed in the mobile IP standard and in GSM, in Sections 7.6 and 7.7, respectively. Finally, we considered the impact of wireless links and mobility on transport-layer protocols and networked applications in Section 7.8. Although we have devoted an entire chapter to the study of wireless and mobile networks, an entire book (or more) would be required to fully explore this exciting and rapidly expanding field. We encourage you to delve more deeply into this field by consulting the many references provided in this chapter.

Homework Problems and Questions

Chapter 7 Review Questions

Section 7.1 R1. What does it mean for a wireless network to be operating in "infrastructure mode"? If the network is not in infrastructure mode, what mode of operation is it in, and what is the difference between that mode of operation and infrastructure mode? R2. What are the four types of wireless networks identified in our taxonomy in Section 7.1? Which of these types of wireless networks have you used?

Section 7.2 R3. What are the differences between the following types of wireless channel impairments: path loss, multipath propagation, interference from other sources? R4. As a mobile node gets farther and farther away from a base station, what are two actions that a base station could take to ensure that the loss probability of a transmitted frame does not increase?

Sections 7.3 and 7.4 R5. Describe the role of the beacon frames in 802.11. R6. True or false: Before an 802.11 station transmits a data frame, it must first send an RTS frame and receive a corresponding CTS frame. R7. Why are acknowledgments used in 802.11 but not in wired Ethernet? R8. True or false: Ethernet and 802.11 use the same frame structure. R9. Describe how the RTS threshold works. R10. Suppose the IEEE 802.11 RTS and CTS frames were as long as the standard DATA and ACK frames. Would there be any advantage to using the CTS and RTS frames? Why or why not? R11. Section 7.3.4 discusses 802.11 mobility, in which a wireless station moves from one BSS to another within the same subnet. When the APs are interconnected with a switch, an AP may need to send a frame with a spoofed MAC address to get the switch to forward the frame properly. Why?

R12. What are the differences between a master device in a Bluetooth network and a base station in an 802.11 network? R13. What is meant by a super frame in the 802.15.4 Zigbee standard? R14. What is the role of the "core network" in the 3G cellular data architecture? R15. What is the role of the RNC in the 3G cellular data network architecture? What role does the RNC play in the cellular voice network? R16. What is the role of the eNodeB, MME, P-GW, and S-GW in 4G architecture? R17. What are three important differences between the 3G and 4G cellular architectures?

Sections 7.5 and 7.6 R18. If a node has a wireless connection to the Internet, does that node have to be mobile? Explain. Suppose that a user with a laptop walks around her house with her laptop, and always accesses the Internet through the same access point. Is this user mobile from a network standpoint? Explain. R19. What is the difference between a permanent address and a care-of address? Who assigns a care-of address? R20. Consider a TCP connection going over Mobile IP.
True or false: The TCP connection phase between the correspondent and the mobile host goes through the mobile's home network, but the data transfer phase is directly between the correspondent and the mobile host, bypassing the home network.

Section 7.7 R21. What are the purposes of the HLR and VLR in GSM networks? What elements of mobile IP are similar to the HLR and VLR? R22. What is the role of the anchor MSC in GSM networks?

Section 7.8 R23. What are three approaches that can be taken to avoid having a single wireless link degrade the performance of an end-to-end transport-layer TCP connection?

Problems P1. Consider the single-sender CDMA example in Figure 7.5. What would be the sender's output (for the 2 data bits shown) if the sender's CDMA code were (1, −1, 1, −1, 1, −1, 1, −1)? P2. Consider sender 2 in Figure 7.6. What is the sender's output to the channel (before it is added to the signal from sender 1), Z^2_{i,m}? P3. Suppose that the receiver in Figure 7.6 wanted to receive the data being sent by sender 2. Show (by calculation) that the receiver is indeed able to recover sender 2's data from the aggregate channel signal by using sender 2's code. P4. For the two-sender, two-receiver example, give an example of two CDMA codes containing 1 and −1 values that do not allow the two receivers to extract the original transmitted bits from the two CDMA senders. P5. Suppose there are two ISPs providing WiFi access in a particular café, with each ISP operating its own AP and having its own IP address block.

a. Further suppose that by accident, each ISP has configured its AP to
   operate over channel 11. Will the 802.11 protocol completely break
   down in this situation? Discuss what happens when two stations, each
   associated with a different ISP, attempt to transmit at the same
   time.

b. Now suppose that one AP operates over channel 1 and the other over
   channel 11. How do your answers change?

P6. In step 4 of the CSMA/CA protocol, a station that successfully transmits a frame begins the CSMA/CA protocol for a second frame at step 2, rather than at step 1. What rationale might the designers of CSMA/CA have had in mind by having such a station not transmit the second frame immediately (if the channel is sensed idle)? P7. Suppose an 802.11b station is configured to always reserve the channel with the RTS/CTS sequence. Suppose this station suddenly wants to transmit 1,000 bytes of data, and all other stations are idle at this time. As a function of SIFS and DIFS, and ignoring propagation delay and assuming no bit errors, calculate the time required to transmit the frame and receive the acknowledgment. P8. Consider the scenario shown in Figure 7.34, in which there are four wireless nodes, A, B, C, and D. The radio coverage of the four nodes is shown via the shaded ovals; all nodes share the same frequency. When A transmits, it can only be heard/received by B; when B transmits, both A and C can hear/receive from B; when C transmits, both B and D can hear/receive from C; when D transmits, only C can hear/receive from D.

Figure 7.34 Scenario for problem P8

Suppose now that each node has an infinite supply of messages that it wants to send to each of the other nodes. If a message's destination is not an immediate neighbor, then the message must be relayed. For example, if A wants to send to D, a message from A must first be sent to B, which then sends the message to C, which then sends the message to D.
Time is slotted, with a message transmission time taking exactly one time slot, e.g., as in slotted Aloha. During a slot, a node can do one of the following: (i) send a message, (ii) receive a message (if exactly one message is being sent to it), (iii) remain silent. As always, if a node hears two or more simultaneous transmissions, a collision occurs and none of the transmitted messages are received successfully. You can assume here that there are no bit-level errors, and thus if exactly one message is sent, it will be received correctly by those within the transmission radius of the sender.

a. Suppose now that an omniscient controller (i.e., a controller that
   knows the state of every node in the network) can command each node
   to do whatever it (the omniscient controller) wishes, i.e., to send
   a message, to receive a message, or to remain silent. Given this
   omniscient controller, what is the maximum rate at which a data
   message can be transferred from C to A, given that there are no
   other messages between any other source/destination pairs?

b. Suppose now that A sends messages to B, and D sends messages to C.
   What is the combined maximum rate at which data messages can flow
   from A to B and from D to C?

c. Suppose now that A sends messages to B, and C sends messages to D.
   What is the combined maximum rate at which data messages can flow
   from A to B and from C to D?

d. Suppose now that the wireless links are replaced by wired links.
   Repeat questions (a) through (c) again in this wired scenario.

e. Now suppose we are again in the wireless scenario, and that for
   every data message sent from source to destination, the destination
   will send an ACK message back to the source (e.g., as in TCP). Also
   suppose that each ACK message takes up one slot. Repeat questions
   (a)--(c) above for this scenario.

P9. Describe the format of the 802.15.1 Bluetooth frame. You will have to do some reading outside of the text to find this information. Is there anything in the frame format that inherently limits the number of active nodes in an 802.15.1 network to eight active nodes? Explain. P10. Consider the following idealized LTE scenario. The downstream channel (see Figure 7.21) is slotted in time, across F frequencies. There are four nodes, A, B, C, and D, reachable from the base station at rates of 10 Mbps, 5 Mbps, 2.5 Mbps, and 1 Mbps, respectively, on the downstream channel. These rates assume that the base station utilizes all time slots available on all F frequencies to send to just one station. The base station has an infinite amount of data to send to each of the nodes, and can send to any one of these four nodes using any of the F frequencies during any time slot in the downstream sub-frame.

a. What is the maximum rate at which the base station can send to the
   nodes, assuming it can send to any node it chooses during each time
   slot? Is your solution fair? Explain and define what you mean by
   "fair."

b. If there is a fairness requirement that each node must receive an
   equal amount of data during each one second interval, what is the
   average transmission rate by the base station (to all nodes) during
   the downstream sub-frame? Explain how you arrived at your answer.

c. Suppose that the fairness criterion is that any node can receive at
   most twice as much data as any other node during the sub-frame. What
   is the average transmission rate by the base station (to all nodes)
   during the sub-frame?
Explain how you arrived at your answer.

P11. In Section 7.5, one proposed solution that allowed mobile users to maintain their IP addresses as they moved among foreign networks was to have a foreign network advertise a highly specific route to the mobile user and use the existing routing infrastructure to propagate this information throughout the network. We identified scalability as one concern. Suppose that when a mobile user moves from one network to another, the new foreign network advertises a specific route to the mobile user, and the old foreign network withdraws its route. Consider how routing information propagates in a distance-vector algorithm (particularly for the case of interdomain routing among networks that span the globe).

a. Will other routers be able to route datagrams immediately to the new
   foreign network as soon as the foreign network begins advertising
   its route?

b. Is it possible for different routers to believe that different
   foreign networks contain the mobile user?

c. Discuss the timescale over which other routers in the network will
   eventually learn the path to the mobile user.

P12. Suppose the correspondent in Figure 7.23 were mobile. Sketch the additional network-layer infrastructure that would be needed to route the datagram from the original mobile user to the (now mobile) correspondent. Show the structure of the datagram(s) between the original mobile user and the (now mobile) correspondent, as in Figure 7.24. P13. In mobile IP, what effect will mobility have on end-to-end delays of datagrams between the source and destination? P14. Consider the chaining example discussed at the end of Section 7.7.2. Suppose a mobile user visits foreign networks A, B, and C, and that a correspondent begins a connection to the mobile user when it is resident in foreign network A. List the sequence of messages between foreign agents, and between foreign agents and the home agent as the mobile user moves from network A to network B to network C. Next, suppose chaining is not performed, and the correspondent (as well as the home agent) must be explicitly notified of the changes in the mobile user's care-of address. List the sequence of messages that would need to be exchanged in this second scenario.

P15. Consider two mobile nodes in a foreign network having a foreign agent. Is it possible for the two mobile nodes to use the same care-of address in mobile IP? Explain your answer. P16. In our discussion of how the VLR updated the HLR with information about the mobile's current location, what are the advantages and disadvantages of providing the MSRN as opposed to the address of the VLR to the HLR?

Wireshark Lab At the Web site for this textbook, www.pearsonhighered.com/cs-resources, you'll find a Wireshark lab for this chapter that captures and studies the 802.11 frames exchanged between a wireless laptop and an access point.

AN INTERVIEW WITH... Deborah Estrin Deborah Estrin is a Professor of Computer Science at Cornell Tech in New York City and a Professor of Public Health at Weill Cornell Medical College. She is founder of the Health Tech Hub at Cornell Tech and co-founder of the non-profit startup Open mHealth. She received her Ph.D. (1985) in Computer Science from M.I.T. and her B.S. (1980) from UC Berkeley. Estrin's early research focused on the design of network protocols, including multicast and inter-domain routing.
In 2002 Estrin founded the NSF-funded Science and Technology Center at UCLA, the Center for Embedded Networked Sensing (CENS, http://cens.ucla.edu). CENS launched new areas of multi-disciplinary computer systems research from sensor networks for environmental monitoring, to participatory sensing for citizen science. Her current focus is on mobile health and small data, leveraging the pervasiveness of mobile devices and digital interactions for health and life management, as described in her 2013 TEDMED talk. Professor Estrin is an elected member of the American Academy of Arts and Sciences (2007) and the National Academy of Engineering (2009). She is a fellow of the IEEE, ACM, and AAAS. She was selected as the first ACM-W Athena Lecturer (2006), awarded the Anita Borg Institute's Women of Vision Award for Innovation (2007), inducted into the WITI hall of fame (2008) and awarded Doctor Honoris Causa from EPFL (2008) and Uppsala University (2011).

Please describe a few of the most exciting projects you have worked on during your career. What were the biggest challenges? In the mid-90s at USC and ISI, I had the great fortune to work with the likes of Steve Deering, Mark Handley, and Van Jacobson on the design of multicast routing protocols (in particular, PIM). I tried to carry many of the architectural design lessons from multicast into the design of ecological monitoring arrays, where for the first time I really began to take applications and multidisciplinary research seriously. That interest in jointly innovating in the social and technological space is what interests me so much about my latest area of research, mobile health. The challenges in these projects were as diverse as the problem domains, but what they all had in common was the need to keep our eyes open to whether we had the problem definition right as we iterated between design and deployment, prototype and pilot. None of them were problems that could be solved analytically, with simulation or even in constructed laboratory experiments. They all challenged our ability to retain clean architectures in the presence of messy problems and contexts, and they all called for extensive collaboration. What changes and innovations do you see happening in wireless networks and mobility in the future? In a prior edition of this interview I said that I have never put much faith into predicting the future, but I did go on to speculate that we might see the end of feature phones (i.e., those that are not programmable and are used only for voice and text messaging) as smart phones become more and more powerful and the primary point of Internet access for many---and now not so many years later that is clearly the case. I also predicted that we would see the continued proliferation of embedded SIMs by which all sorts of devices have the ability to communicate via the cellular network at low data rates. While that has occurred, we see many devices and "Internet of Things" that use embedded WiFi and other lower power, shorter range, forms of connectivity to local hubs. I did not anticipate at that time the emergence of a large consumer wearables market. By the time the next edition is published I expect broad proliferation of personal applications that leverage data from IoT and other digital traces. Where do you see the future of networking and the Internet? Again I think it's useful to look both back and forward.
Previously I observed that the +efforts in named data and software-defined networking would emerge to +create a more manageable, evolvable, and richer infrastructure and more +generally represent moving the role of architecture higher up in the +stack. In the beginnings of the Internet, architecture was layer 4 and +below, with + +applications being more siloed/monolithic, sitting on top. Now data and +analytics dominate transport. The adoption of SDN (which I'm really +happy to see is featured in this 7th edition of this book) has been well +beyond what I ever anticipated. However, looking up the stack, our +dominant applications increasingly live in walled gardens, whether +mobile apps or large consumer platforms such as Facebook. As Data +Science and Big Data techniques develop, they might help to lure these +applications out of their silos because of the value in connecting with +other apps and platforms. What people inspired you professionally? There +are three people who come to mind. First, Dave Clark, the secret sauce +and under-sung hero of the Internet community. I was lucky to be around +in the early days to see him act as the "organizing principle" of the +IAB and Internet governance; the priest of rough consensus and running +code. Second, Scott Shenker, for his intellectual brilliance, integrity, +and persistence. I strive for, but rarely attain, his clarity in +defining problems and solutions. He is always the first person I e-mail +for advice on matters large and small. Third, my sister Judy Estrin, who +had the creativity and courage to spend her career bringing ideas and +concepts to market. Without the Judys of the world the Internet +technologies would never have transformed our lives. What are your +recommendations for students who want careers in computer science and +networking? First, build a strong foundation in your academic work, +balanced with any and every real-world work experience you can get. As +you look for a working environment, seek opportunities in problem areas +you really care about and with smart teams that you can learn from. + +Chapter 8 Security in Computer Networks + +Way back in Section 1.6 we described some of the more prevalent and +damaging classes of Internet attacks, including malware attacks, denial +of service, sniffing, source masquerading, and message modification and +deletion. Although we have since learned a tremendous amount about +computer networks, we still haven't examined how to secure networks from +those attacks. Equipped with our newly acquired expertise in computer +networking and Internet protocols, we'll now study in-depth secure +communication and, in particular, how computer networks can be defended +from those nasty bad guys. Let us introduce Alice and Bob, two people +who want to communicate and wish to do so "securely." This being a +networking text, we should remark that Alice and Bob could be two +routers that want to exchange routing tables securely, a client and +server that want to establish a secure transport connection, or two +e-mail applications that want to exchange secure e-mail---all case +studies that we will consider later in this chapter. Alice and Bob are +well-known fixtures in the security community, perhaps because their +names are more fun than a generic entity named "A" that wants to +communicate securely with a generic entity named "B." 
Love affairs, wartime communication, and business transactions are the commonly cited human needs for secure communications; preferring the first to the latter two, we're happy to use Alice and Bob as our sender and receiver, and imagine them in this first scenario. We said that Alice and Bob want to communicate and wish to do so "securely," but what precisely does this mean? As we will see, security (like love) is a many-splendored thing; that is, there are many facets to security. Certainly, Alice and Bob would like for the contents of their communication to remain secret from an eavesdropper. They probably would also like to make sure that when they are communicating, they are indeed communicating with each other, and that if their communication is tampered with by an eavesdropper, this tampering is detected. In the first part of this chapter, we'll cover the fundamental cryptography techniques that allow for encrypting communication, authenticating the party with whom one is communicating, and ensuring message integrity. In the second part of this chapter, we'll examine how the fundamental cryptography principles can be used to create secure networking protocols. Once again taking a top-down approach, we'll examine secure protocols in each of the (top four) layers, beginning with the application layer. We'll examine how to secure e-mail, how to secure a TCP connection, how to provide blanket security at the network layer, and how to secure a wireless LAN. In the third part of this chapter we'll consider operational security,
Extensions to the checksumming techniques that we encountered in reliable transport and data link protocols can be used to provide such message integrity. We will study message integrity in Section 8.3. End-point authentication. Both the sender and receiver should be able to confirm the identity of the other party involved in the communication---to confirm that the other party is indeed who or what they claim to be. Face-to-face human communication solves this problem easily by visual recognition. When communicating entities exchange messages over a medium where they cannot see the other party, authentication is not so simple. When a user wants to access an inbox, how does the mail server verify that the user is the person he or she claims to be? We study end-point authentication in Section 8.4. Operational security. Almost all organizations (companies, universities, and so on) today have networks that are attached to the public Internet. These networks therefore can potentially be compromised. Attackers can attempt to deposit worms into the hosts in the network, obtain corporate secrets, map the internal network configurations, and launch DoS attacks. We'll see in Section 8.9 that operational devices such as firewalls and intrusion detection systems are used to counter attacks against an organization's network. A firewall sits between the organization's network and the public network, controlling packet access to and from the network. An intrusion detection system performs "deep packet inspection," alerting the network administrators about suspicious activity. Having established what we mean by network security, let's next consider exactly what information an intruder may have access to, and what actions can be taken by the intruder. Figure 8.1 illustrates the scenario. Alice, the sender, wants to send data to Bob, the receiver. In order to exchange data securely, while meeting the requirements of confidentiality, end-point authentication, and message integrity, Alice and Bob will exchange control messages and data messages (in much the same way that TCP senders and receivers exchange control segments and data segments).

Figure 8.1 Sender, receiver, and intruder (Alice, Bob, and Trudy)

All or some of these messages will typically be encrypted. As discussed in Section 1.6, an intruder can potentially perform eavesdropping---sniffing and recording control and data messages on the channel---as well as modification, insertion, or deletion of messages or message content. As we'll see, unless appropriate countermeasures are taken, these capabilities allow an intruder to mount a wide variety of security attacks: snooping on communication (possibly stealing passwords and data), impersonating another entity, hijacking an ongoing session, denying service to legitimate network users by overloading system resources, and so on. A summary of reported attacks is maintained at the CERT Coordination Center \[CERT 2016\]. Having established that there are indeed real threats loose in the Internet, what are the Internet equivalents of Alice and Bob, our friends who need to communicate securely? Certainly, Bob and Alice might be human users at two end systems, for example, a real Alice and a real Bob who really do want to exchange secure e-mail. They might also be participants in an electronic commerce transaction. For example, a real Bob might want to transfer his credit card number securely to a Web server to purchase an item online.
Similarly, a real Alice might want to interact with her bank online. The parties needing secure communication might themselves also be part of the network infrastructure. Recall that the domain name system (DNS, see Section 2.4) or routing daemons that exchange routing information (see Chapter 5) require secure communication between two parties. The same is true for network management applications, a topic we examined in Chapter 5. An intruder that could actively interfere with DNS lookups (as discussed in Section 2.4), routing computations \[RFC 4272\], or network management functions \[RFC 3414\] could wreak havoc in the Internet. Having now established the framework, a few of the most important definitions, and the need for network security, let us next delve into cryptography. While the use of cryptography in providing confidentiality is self-evident, we'll see shortly that it is also central to providing end-point authentication and message integrity---making cryptography a cornerstone of network security.

8.2 Principles of Cryptography Although cryptography has a long history dating back at least as far as Julius Caesar, modern cryptographic techniques, including many of those used in the Internet, are based on advances made in the past 30 years. Kahn's book, The Codebreakers \[Kahn 1967\], and Singh's book, The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography \[Singh 1999\], provide a fascinating look at the long history of cryptography. A complete discussion of cryptography itself requires a complete book \[Kaufman 1995; Schneier 1995\] and so we only touch on the essential aspects of cryptography, particularly as they are practiced on the Internet. We also note that while our focus in this section will be on the use of cryptography for confidentiality, we'll see shortly that cryptographic techniques are inextricably woven into authentication, message integrity, nonrepudiation, and more. Cryptographic techniques allow a sender to disguise data so that an intruder can gain no information from the intercepted data. The receiver, of course, must be able to recover the original data from the disguised data. Figure 8.2 illustrates some of the important terminology.

Figure 8.2 Cryptographic components

Suppose now that Alice wants to send a message to Bob. Alice's message in its original form (for example, " Bob, I love you. Alice ") is known as plaintext, or cleartext. Alice encrypts her plaintext message using an encryption algorithm so that the encrypted message, known as ciphertext, looks unintelligible to any intruder. Interestingly, in many modern cryptographic systems, including those used in the Internet, the encryption technique itself is known---published, standardized, and available to everyone (for example, \[RFC 1321; RFC 3447; RFC 2420; NIST 2001\]), even a potential intruder! Clearly, if everyone knows the method for encoding data, then there must be some secret information that prevents an intruder from decrypting the transmitted data. This is where keys come in. In Figure 8.2, Alice provides a key, K_A, a string of numbers or characters, as input to the encryption algorithm. The encryption algorithm takes the key and the plaintext message, m, as input and produces ciphertext as output. The notation K_A(m) refers to the ciphertext form (encrypted using the key K_A) of the plaintext message, m. The actual encryption algorithm that uses key K_A will be evident from the context.
Similarly, Bob will provide a key, KB, to the decryption algorithm that takes the ciphertext and Bob's key as input and produces the original plaintext as output. That is, if Bob receives an encrypted message KA(m), he decrypts it by computing KB(KA(m))=m. In symmetric key systems, Alice's and Bob's keys are identical and are secret. In public key systems, a pair of keys is used. One of the keys is known to both Bob and Alice (indeed, it is known to the whole world). The other key is known only by either Bob or Alice (but not both). In the following two subsections, we consider symmetric key and public key systems in more detail.

8.2.1 Symmetric Key Cryptography

All cryptographic algorithms involve substituting one thing for another, for example, taking a piece of plaintext and then computing and substituting the appropriate ciphertext to create the encrypted message. Before studying a modern key-based cryptographic system, let us first get our feet wet by studying a very old, very simple symmetric key algorithm attributed to Julius Caesar, known as the Caesar cipher (a cipher is a method for encrypting data). For English text, the Caesar cipher would work by taking each letter in the plaintext message and substituting the letter that is k letters later (allowing wraparound; that is, having the letter z followed by the letter a) in the alphabet. For example, if k=3, then the letter a in plaintext becomes d in ciphertext; b in plaintext becomes e in ciphertext, and so on. Here, the value of k serves as the key. As an example, the plaintext message "bob, i love you. Alice" becomes "ere, l oryh brx. dolfh" in ciphertext. While the ciphertext does indeed look like gibberish, it wouldn't take long to break the code if you knew that the Caesar cipher was being used, as there are only 25 possible key values.

An improvement on the Caesar cipher is the monoalphabetic cipher, which also substitutes one letter of the alphabet with another letter of the alphabet. However, rather than substituting according to a regular pattern (for example, substitution with an offset of k for all letters), any letter can be substituted for any other letter, as long as each letter has a unique substitute letter, and vice versa. The substitution rule in Figure 8.3 shows one possible rule for encoding plaintext.

Figure 8.3 A monoalphabetic cipher

The plaintext message "bob, i love you. Alice" becomes "nkn, s gktc wky. Mgsbc." Thus, as in the case of the Caesar cipher, this looks like gibberish. A monoalphabetic cipher would also appear to be better than the Caesar cipher in that there are 26! (on the order of 10^26) possible pairings of letters rather than 25 possible pairings. A brute-force approach of trying all 10^26 possible pairings would require far too much work to be a feasible way of breaking the encryption algorithm and decoding the message. However, by statistical analysis of the plaintext language, for example, knowing that the letters e and t are the most frequently occurring letters in typical English text (accounting for 13 percent and 9 percent of letter occurrences), and knowing that particular two- and three-letter occurrences of letters appear quite often together (for example, "in," "it," "the," "ion," "ing," and so forth), it is relatively easy to break this code.
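To see just how little protection the Caesar cipher offers, consider the following minimal Python sketch (our illustration, not part of the text): it implements the cipher and then breaks it by simply trying all 25 possible keys.

```python
# A minimal sketch (not from the text) of the Caesar cipher for
# lowercase English text, plus the 25-key brute-force attack that
# makes the cipher so weak.

import string

ALPHABET = string.ascii_lowercase

def caesar(text, k):
    """Shift each letter k positions, wrapping z around to a."""
    shifted = {c: ALPHABET[(i + k) % 26] for i, c in enumerate(ALPHABET)}
    return "".join(shifted.get(c, c) for c in text)  # non-letters pass through

ciphertext = caesar("bob, i love you. alice", 3)
print(ciphertext)  # ere, l oryh brx. dolfh

# Breaking the cipher: simply try every possible key.
for k in range(1, 26):
    print(k, caesar(ciphertext, -k))  # one of the 25 lines is readable
```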
If the intruder has some knowledge about the possible contents of the message, then it is even easier to break the code. For example, if Trudy the intruder is Bob's wife and suspects Bob of having an affair with Alice, then she might suspect that the names "bob" and "alice" appear in the text. If Trudy knew for certain that those two names appeared in the ciphertext and had a copy of the example ciphertext message above, then she could immediately determine seven of the 26 letter pairings, requiring 10^9 fewer possibilities to be checked by a brute-force method. Indeed, if Trudy suspected Bob of having an affair, she might well expect to find some other choice words in the message as well.

When considering how easy it might be for Trudy to break Bob and Alice's encryption scheme, one can distinguish three different scenarios, depending on what information the intruder has.

- Ciphertext-only attack. In some cases, the intruder may have access only to the intercepted ciphertext, with no certain information about the contents of the plaintext message. We have seen how statistical analysis can help in a ciphertext-only attack on an encryption scheme.
- Known-plaintext attack. We saw above that if Trudy somehow knew for sure that "bob" and "alice" appeared in the ciphertext message, then she could have determined the (plaintext, ciphertext) pairings for the letters a, l, i, c, e, b, and o. Trudy might also have been fortunate enough to have recorded all of the ciphertext transmissions and then found Bob's own decrypted version of one of the transmissions scribbled on a piece of paper. When an intruder knows some of the (plaintext, ciphertext) pairings, we refer to this as a known-plaintext attack on the encryption scheme.
- Chosen-plaintext attack. In a chosen-plaintext attack, the intruder is able to choose the plaintext message and obtain its corresponding ciphertext form. For the simple encryption algorithms we've seen so far, if Trudy could get Alice to send the message "The quick brown fox jumps over the lazy dog," she could completely break the encryption scheme. We'll see shortly that for more sophisticated encryption techniques, a chosen-plaintext attack does not necessarily mean that the encryption technique can be broken.

Five hundred years ago, techniques improving on monoalphabetic encryption, known as polyalphabetic encryption, were invented. The idea behind polyalphabetic encryption is to use multiple monoalphabetic ciphers, with a specific monoalphabetic cipher to encode a letter in a specific position in the plaintext message. Thus, the same letter, appearing in different positions in the plaintext message, might be encoded differently. An example of a polyalphabetic encryption scheme is shown in Figure 8.4. It has two Caesar ciphers (with k=5 and k=19), shown as rows. We might choose to use these two Caesar ciphers, C1 and C2, in the repeating pattern C1, C2, C2, C1, C2. That is, the first letter of plaintext is to be encoded using C1, the second and third using C2, the fourth using C1, and the fifth using C2. The pattern then repeats, with the sixth letter being encoded using C1, the seventh with C2, and so on. The plaintext message "bob, i love you." is thus encrypted "ghu, n etox dhz." Note that the first b in the plaintext message is encrypted using C1, while the second b is encrypted using C2. In this example, the encryption and decryption "key" is the knowledge of the two Caesar keys (k=5, k=19) and the pattern C1, C2, C2, C1, C2.

Figure 8.4 A polyalphabetic cipher using two Caesar ciphers
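A similar sketch (again ours, with the text's example plaintext) implements the two-Caesar polyalphabetic scheme just described.

```python
# A minimal sketch (ours, not the book's) of the two-Caesar
# polyalphabetic cipher: letters are shifted by k=5 or k=19
# according to the repeating pattern C1, C2, C2, C1, C2.

import string
from itertools import cycle

ALPHABET = string.ascii_lowercase
PATTERN = [5, 19, 19, 5, 19]  # C1 = Caesar(k=5), C2 = Caesar(k=19)

def poly_encrypt(text):
    shifts = cycle(PATTERN)            # the pattern repeats forever
    out = []
    for c in text:
        if c in ALPHABET:              # only letters consume a shift
            k = next(shifts)
            out.append(ALPHABET[(ALPHABET.index(c) + k) % 26])
        else:
            out.append(c)              # punctuation and spaces pass through
    return "".join(out)

print(poly_encrypt("bob, i love you."))  # ghu, n etox dhz.
```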
Block Ciphers

Let us now move forward to modern times and examine how symmetric key encryption is done today. There are two broad classes of symmetric encryption techniques: stream ciphers and block ciphers. We'll briefly examine stream ciphers in Section 8.7 when we investigate security for wireless LANs. In this section, we focus on block ciphers, which are used in many secure Internet protocols, including PGP (for secure e-mail), SSL (for securing TCP connections), and IPsec (for securing the network-layer transport).

In a block cipher, the message to be encrypted is processed in blocks of k bits. For example, if k=64, then the message is broken into 64-bit blocks, and each block is encrypted independently. To encode a block, the cipher uses a one-to-one mapping to map the k-bit block of cleartext to a k-bit block of ciphertext. Let's look at an example. Suppose that k=3, so that the block cipher maps 3-bit inputs (cleartext) to 3-bit outputs (ciphertext). One possible mapping is given in Table 8.1. Notice that this is a one-to-one mapping; that is, there is a different output for each input. This block cipher breaks the message up into 3-bit blocks and encrypts each block according to the above mapping. You should verify that the message 010110001111 gets encrypted into 101000111001.

Table 8.1 A specific 3-bit block cipher

| input | output |
| ----- | ------ |
| 000   | 110    |
| 001   | 111    |
| 010   | 101    |
| 011   | 100    |
| 100   | 011    |
| 101   | 010    |
| 110   | 000    |
| 111   | 001    |

Continuing with this 3-bit block example, note that the mapping in Table 8.1 is just one mapping of many possible mappings. How many possible mappings are there? To answer this question, observe that a mapping is nothing more than a permutation of all the possible inputs. There are 2^3 (= 8) possible inputs (listed under the input column). These eight inputs can be permuted in 8! = 40,320 different ways. Since each of these permutations specifies a mapping, there are 40,320 possible mappings. We can view each of these mappings as a key---if Alice and Bob both know the mapping (the key), they can encrypt and decrypt the messages sent between them. The brute-force attack for this cipher is to try to decrypt ciphertext by using all mappings. With only 40,320 mappings (when k=3), this can quickly be accomplished on a desktop PC. To thwart brute-force attacks, block ciphers typically use much larger blocks, consisting of k=64 bits or even larger. Note that the number of possible mappings for a general k-bit block cipher is (2^k)!, which is astronomical for even moderate values of k (such as k=64).

Although full-table block ciphers, as just described, with moderate values of k can produce robust symmetric key encryption schemes, they are unfortunately difficult to implement. For k=64 and for a given mapping, Alice and Bob would need to maintain a table with 2^64 input values, which is an infeasible task. Moreover, if Alice and Bob were to change keys, they would have to each regenerate the table. Thus, a full-table block cipher, providing predetermined mappings between all inputs and outputs (as in the example above), is simply out of the question.

Instead, block ciphers typically use functions that simulate randomly permuted tables. An example (adapted from \[Kaufman 1995\]) of such a function for k=64 bits is shown in Figure 8.5. The function first breaks a 64-bit block into 8 chunks, with each chunk consisting of 8 bits. Each 8-bit chunk is processed by an 8-bit to 8-bit table, which is of manageable size. For example, the first chunk is processed by the table denoted by T1. Next, the 8 output chunks are reassembled into a 64-bit block. The positions of the 64 bits in the block are then scrambled (permuted) to produce a 64-bit output. This output is fed back to the 64-bit input, where another cycle begins. After n such cycles, the function provides a 64-bit block of ciphertext. The purpose of the rounds is to make each input bit affect most (if not all) of the final output bits. (If only one round were used, a given input bit would affect only 8 of the 64 output bits.) The key for this block cipher algorithm would be the eight permutation tables (assuming the scramble function is publicly known).

Figure 8.5 An example of a block cipher

Today there are a number of popular block ciphers, including DES (standing for Data Encryption Standard), 3DES, and AES (standing for Advanced Encryption Standard). Each of these standards uses functions, rather than predetermined tables, along the lines of Figure 8.5 (albeit more complicated and specific to each cipher). Each of these algorithms also uses a string of bits for a key. For example, DES uses 64-bit blocks with a 56-bit key. AES uses 128-bit blocks and can operate with keys that are 128, 192, and 256 bits long. An algorithm's key determines the specific "mini-table" mappings and permutations within the algorithm's internals. The brute-force attack for each of these ciphers is to cycle through all the keys, applying the decryption algorithm with each key. Observe that with a key length of n, there are 2^n possible keys. NIST \[NIST 2001\] estimates that a machine that could crack 56-bit DES in one second (that is, try all 2^56 keys in one second) would take approximately 149 trillion years to crack a 128-bit AES key.

Cipher-Block Chaining

In computer networking applications, we typically need to encrypt long messages (or long streams of data). If we apply a block cipher as described by simply chopping up the message into k-bit blocks and independently encrypting each block, a subtle but important problem occurs. To see this, observe that two or more of the cleartext blocks can be identical. For example, the cleartext in two or more blocks could be "HTTP/1.1". For these identical blocks, a block cipher would, of course, produce the same ciphertext. An attacker could potentially guess the cleartext when it sees identical ciphertext blocks and may even be able to decrypt the entire message by identifying identical ciphertext blocks and using knowledge about the underlying protocol structure \[Kaufman 1995\].

To address this problem, we can mix some randomness into the ciphertext so that identical plaintext blocks produce different ciphertext blocks. To explain this idea, let m(i) denote the ith plaintext block, c(i) denote the ith ciphertext block, and a⊕b denote the exclusive-or (XOR) of two bit strings, a and b. (Recall that 0⊕0=1⊕1=0 and 0⊕1=1⊕0=1, and the XOR of two bit strings is done on a bit-by-bit basis. So, for example, 10101010⊕11110000=01011010.) Also, denote the block-cipher encryption algorithm with key S as KS. The basic idea is as follows. The sender creates a random k-bit number r(i) for the ith block and calculates c(i)=KS(m(i)⊕r(i)). Note that a new k-bit random number is chosen for each block. The sender then sends c(1), r(1), c(2), r(2), c(3), r(3), and so on.
Since the receiver receives c(i) and r(i), it can recover each block of the plaintext by computing m(i)=KS(c(i))⊕r(i). It is important to note that, although r(i) is sent in the clear and thus can be sniffed by Trudy, she cannot obtain the plaintext m(i), since she does not know the key KS. Also note that if two plaintext blocks m(i) and m(j) are the same, the corresponding ciphertext blocks c(i) and c(j) will be different (as long as the random numbers r(i) and r(j) are different, which occurs with very high probability).

As an example, consider the 3-bit block cipher in Table 8.1. Suppose the plaintext is 010010010. If Alice encrypts this directly, without including the randomness, the resulting ciphertext becomes 101101101. If Trudy sniffs this ciphertext, because each of the three cipher blocks is the same, she can correctly surmise that each of the three plaintext blocks are the same. Now suppose instead Alice generates the random blocks r(1)=001, r(2)=111, and r(3)=100 and uses the above technique to generate the ciphertext c(1)=100, c(2)=010, and c(3)=000. Note that the three ciphertext blocks are different even though the plaintext blocks are the same. Alice then sends c(1), r(1), c(2), r(2), c(3), and r(3). You should verify that Bob can obtain the original plaintext using the shared key KS.

The astute reader will note that introducing randomness solves one problem but creates another: namely, Alice must transmit twice as many bits as before. Indeed, for each cipher bit, she must now also send a random bit, doubling the required bandwidth. In order to have our cake and eat it too, block ciphers typically use a technique called Cipher Block Chaining (CBC). The basic idea is to send only one random value along with the very first message, and then have the sender and receiver use the computed coded blocks in place of the subsequent random number. Specifically, CBC operates as follows:

1. Before encrypting the message (or the stream of data), the sender generates a random k-bit string, called the Initialization Vector (IV). Denote this initialization vector by c(0). The sender sends the IV to the receiver in cleartext.

2. For the first block, the sender calculates m(1)⊕c(0), that is, calculates the exclusive-or of the first block of cleartext with the IV. It then runs the result through the block-cipher algorithm to get the corresponding ciphertext block; that is, c(1)=KS(m(1)⊕c(0)). The sender sends the encrypted block c(1) to the receiver.

3. For the ith block, the sender generates the ith ciphertext block from c(i)=KS(m(i)⊕c(i−1)).

Let's now examine some of the consequences of this approach. First, the receiver will still be able to recover the original message. Indeed, when the receiver receives c(i), it decrypts it with KS to obtain s(i)=m(i)⊕c(i−1); since the receiver also knows c(i−1), it then obtains the cleartext block from m(i)=s(i)⊕c(i−1). Second, even if two cleartext blocks are identical, the corresponding ciphertexts (almost always) will be different. Third, although the sender sends the IV in the clear, an intruder will still not be able to decrypt the ciphertext blocks, since the intruder does not know the secret key, S. Finally, the sender only sends one overhead block (the IV), thereby negligibly increasing the bandwidth usage for long messages (consisting of hundreds of blocks). A short sketch of this procedure, using the toy cipher of Table 8.1, appears below.
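Before working through the numbers by hand, here is a minimal Python sketch (our illustration, not the book's) of CBC over the toy 3-bit block cipher of Table 8.1; it reproduces the worked example that follows.

```python
# CBC over the toy 3-bit block cipher of Table 8.1 (a sketch, not a
# real cipher). The "key" KS is the permutation table itself.

KS = {"000": "110", "001": "111", "010": "101", "011": "100",
      "100": "011", "101": "010", "110": "000", "111": "001"}
KS_INV = {v: k for k, v in KS.items()}           # decryption table

def xor(a, b):
    return format(int(a, 2) ^ int(b, 2), "03b")  # bitwise XOR of 3-bit strings

def cbc_encrypt(blocks, iv):
    prev, out = iv, []
    for m in blocks:                 # c(i) = KS(m(i) XOR c(i-1))
        prev = KS[xor(m, prev)]
        out.append(prev)
    return out

def cbc_decrypt(blocks, iv):
    prev, out = iv, []
    for c in blocks:                 # m(i) = KS^-1(c(i)) XOR c(i-1)
        out.append(xor(KS_INV[c], prev))
        prev = c
    return out

cipher = cbc_encrypt(["010", "010", "010"], iv="001")
print(cipher)                        # ['100', '000', '101']
print(cbc_decrypt(cipher, "001"))    # ['010', '010', '010']
```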
As an example, let's now determine the ciphertext for the 3-bit block cipher in Table 8.1 with plaintext 010010010 and IV=c(0)=001. The sender first uses the IV to calculate c(1)=KS(m(1)⊕c(0))=100. The sender then calculates c(2)=KS(m(2)⊕c(1))=KS(010⊕100)=000, and c(3)=KS(m(3)⊕c(2))=KS(010⊕000)=101. The reader should verify that the receiver, knowing the IV and KS, can recover the original plaintext. CBC has an important consequence when designing secure network protocols: we'll need to provide a mechanism within the protocol to distribute the IV from sender to receiver. We'll see how this is done for several protocols later in this chapter.

8.2.2 Public Key Encryption

For more than 2,000 years (since the time of the Caesar cipher and up to the 1970s), encrypted communication required that the two communicating parties share a common secret---the symmetric key used for encryption and decryption. One difficulty with this approach is that the two parties must somehow agree on the shared key; but to do so requires (presumably secure) communication! Perhaps the parties could first meet and agree on the key in person (for example, two of Caesar's centurions might meet at the Roman baths) and thereafter communicate with encryption. In a networked world, however, communicating parties may never meet and may never converse except over the network. Is it possible for two parties to communicate with encryption without having a shared secret key that is known in advance? In 1976, Diffie and Hellman \[Diffie 1976\] demonstrated an algorithm (known now as Diffie-Hellman Key Exchange) to do just that---a radically different and marvelously elegant approach toward secure communication that has led to the development of today's public key cryptography systems. We'll see shortly that public key cryptography systems also have several wonderful properties that make them useful not only for encryption, but for authentication and digital signatures as well. Interestingly, it has recently come to light that ideas similar to those in \[Diffie 1976\] and \[RSA 1978\] had been independently developed in the early 1970s in a series of secret reports by researchers at the Communications-Electronics Security Group in the United Kingdom \[Ellis 1987\]. As is often the case, great ideas can spring up independently in many places; fortunately, public key advances took place not only in private, but also in the public view, as well.

The use of public key cryptography is conceptually quite simple. Suppose Alice wants to communicate with Bob. As shown in Figure 8.6, rather than Bob and Alice sharing a single secret key (as in the case of symmetric key systems), Bob (the recipient of Alice's messages) instead has two keys---a public key that is available to everyone in the world (including Trudy the intruder) and a private key that is known only to Bob. We will use the notation KB+ and KB− to refer to Bob's public and private keys, respectively.

Figure 8.6 Public key cryptography

In order to communicate with Bob, Alice first fetches Bob's public key. Alice then encrypts her message, m, to Bob using Bob's public key and a known (for example, standardized) encryption algorithm; that is, Alice computes KB+(m). Bob receives Alice's encrypted message and uses his private key and a known (for example, standardized) decryption algorithm to decrypt Alice's encrypted message. That is, Bob computes KB−(KB+(m)). We will see below that there are encryption/decryption algorithms and techniques for choosing public and private keys such that KB−(KB+(m))=m; that is, applying Bob's public key, KB+, to a message, m (to get KB+(m)), and then applying Bob's private key, KB−, to the encrypted version of m (that is, computing KB−(KB+(m))) gives back m. This is a remarkable result! In this manner, Alice can use Bob's publicly available key to send a secret message to Bob without either of them having to distribute any secret keys! We will see shortly that we can interchange the public key and private key encryption and get the same remarkable result; that is, KB−(KB+(m))=KB+(KB−(m))=m.

The use of public key cryptography is thus conceptually simple. But two immediate worries may spring to mind. A first concern is that although an intruder intercepting Alice's encrypted message will see only gibberish, the intruder knows both the key (Bob's public key, which is available for all the world to see) and the algorithm that Alice used for encryption. Trudy can thus mount a chosen-plaintext attack, using the known standardized encryption algorithm and Bob's publicly available encryption key to encode any message she chooses! Trudy might well try, for example, to encode messages, or parts of messages, that she suspects that Alice might send. Clearly, if public key cryptography is to work, key selection and encryption/decryption must be done in such a way that it is impossible (or at least so hard as to be nearly impossible) for an intruder to either determine Bob's private key or somehow otherwise decrypt or guess Alice's message to Bob. A second concern is that since Bob's encryption key is public, anyone can send an encrypted message to Bob, including Alice or someone claiming to be Alice. In the case of a single shared secret key, the fact that the sender knows the secret key implicitly identifies the sender to the receiver. In the case of public key cryptography, however, this is no longer the case since anyone can send an encrypted message to Bob using Bob's publicly available key. A digital signature, a topic we will study in Section 8.3, is needed to bind a sender to a message.

RSA

While there may be many algorithms that address these concerns, the RSA algorithm (named after its founders, Ron Rivest, Adi Shamir, and Leonard Adleman) has become almost synonymous with public key cryptography. Let's first see how RSA works and then examine why it works. RSA makes extensive use of arithmetic operations using modulo-n arithmetic. So let's briefly review modular arithmetic. Recall that x mod n simply means the remainder of x when divided by n; so, for example, 19 mod 5 = 4. In modular arithmetic, one performs the usual operations of addition, multiplication, and exponentiation. However, the result of each operation is replaced by the integer remainder that is left when the result is divided by n. Adding and multiplying with modular arithmetic is facilitated with the following handy facts:

\[(a mod n) + (b mod n)\] mod n = (a + b) mod n
\[(a mod n) − (b mod n)\] mod n = (a − b) mod n
\[(a mod n) ⋅ (b mod n)\] mod n = (a ⋅ b) mod n

It follows from the third fact that (a mod n)^d mod n = a^d mod n, which is an identity that we will soon find very useful.

Now suppose that Alice wants to send to Bob an RSA-encrypted message, as shown in Figure 8.6.
In our discussion of RSA, let's always keep in mind that a message is nothing but a bit pattern, and every bit pattern can be uniquely represented by an integer number (along with the length of the bit pattern). For example, suppose a message is the bit pattern 1001; this message can be represented by the decimal integer 9. Thus, when encrypting a message with RSA, it is equivalent to encrypting the unique integer number that represents the message.

There are two interrelated components of RSA:

- The choice of the public key and the private key
- The encryption and decryption algorithm

To generate the public and private RSA keys, Bob performs the following steps:

1. Choose two large prime numbers, p and q. How large should p and q be? The larger the values, the more difficult it is to break RSA, but the longer it takes to perform the encoding and decoding. RSA Laboratories recommends that the product of p and q be on the order of 1,024 bits. For a discussion of how to find large prime numbers, see \[Caldwell 2012\].

2. Compute n = pq and z = (p − 1)(q − 1).

3. Choose a number, e, less than n, that has no common factors (other than 1) with z. (In this case, e and z are said to be relatively prime.) The letter e is used since this value will be used in encryption.

4. Find a number, d, such that ed − 1 is exactly divisible (that is, with no remainder) by z. The letter d is used because this value will be used in decryption. Put another way, given e, we choose d such that ed mod z = 1.

5. The public key that Bob makes available to the world, KB+, is the pair of numbers (n, e); his private key, KB−, is the pair of numbers (n, d).

The encryption by Alice and the decryption by Bob are done as follows: Suppose Alice wants to send Bob a bit pattern represented by the integer number m (with m < n). To encode, Alice performs the exponentiation m^e, and then computes the integer remainder when m^e is divided by n. In other words, the encrypted value, c, of Alice's plaintext message, m, is

c = m^e mod n

The bit pattern corresponding to this ciphertext c is sent to Bob. To decrypt the received ciphertext message, c, Bob computes

m = c^d mod n

which requires the use of his private key (n, d).

As a simple example of RSA, suppose Bob chooses p=5 and q=7. (Admittedly, these values are far too small to be secure.) Then n=35 and z=24. Bob chooses e=5, since 5 and 24 have no common factors. Finally, Bob chooses d=29, since 5⋅29−1 (that is, ed−1) is exactly divisible by 24. Bob makes the two values, n=35 and e=5, public and keeps the value d=29 secret. Observing these two public values, suppose Alice now wants to send the letters l, o, v, and e to Bob. Interpreting each letter as a number between 1 and 26 (with a being 1, and z being 26), Alice and Bob perform the encryption and decryption shown in Tables 8.2 and 8.3, respectively. Note that in this example, we consider each of the four letters as a distinct message. A more realistic example would be to convert the four letters into their 8-bit ASCII representations and then encrypt the integer corresponding to the resulting 32-bit bit pattern. (Such a realistic example generates numbers that are much too long to print in a textbook!)

Table 8.2 Alice's RSA encryption, e=5, n=35

| Plaintext letter | m: numeric representation | m^e     | Ciphertext c = m^e mod n |
| ---------------- | ------------------------- | ------- | ------------------------ |
| l                | 12                        | 248832  | 17                       |
| o                | 15                        | 759375  | 15                       |
| v                | 22                        | 5153632 | 22                       |
| e                | 5                         | 3125    | 10                       |
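The toy example lends itself to a few lines of code. The sketch below (our illustration; the parameters are the text's and are far too small to be secure) reproduces the encryption of Table 8.2 and the decryption of Table 8.3 using Python's built-in modular exponentiation.

```python
# A toy RSA sketch (our illustration; the parameters come from the
# text's example and are far too small to be secure).

p, q = 5, 7
n, z = p * q, (p - 1) * (q - 1)       # n = 35, z = 24
e, d = 5, 29                          # e relatively prime to z; ed mod z == 1
assert (e * d) % z == 1

def encrypt(m):
    return pow(m, e, n)               # c = m^e mod n

def decrypt(c):
    return pow(c, d, n)               # m = c^d mod n

for letter in "love":
    m = ord(letter) - ord("a") + 1    # a=1, ..., z=26
    c = encrypt(m)
    print(letter, m, c, decrypt(c))   # l 12 17 12, o 15 15 15, ...
```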
Given that the "toy" example in Tables 8.2 and 8.3 +has already produced some extremely large numbers, and given that we saw +earlier that p and q should each be several hundred bits long, several +practical issues regarding RSA come to mind. How does one choose large +prime numbers? How does one then choose e and d? How does one perform +exponentiation with large numbers? A discussion of these important +issues is beyond the scope of this book; see \[Kaufman 1995\] and the +references therein for details. Table 8.3 Bob's RSA decryption, d=29, +n=35 Ciphertext c + +cd + +m = cd mod n + +Plaintext Letter + +17 + +4819685721067509150915091411825223071697 + +12 + +l + +15 + +127834039403948858939111232757568359375 + +15 + +o + +22 + +851643319086537701956194499721106030592 + +22 + +v + +10 + +1000000000000000000000000000000 + +5 + +e + +Session Keys We note here that the exponentiation required by RSA is a +rather time-consuming process. By contrast, DES is at least 100 times +faster in software and between 1,000 and 10,000 times faster in hardware +\[RSA Fast 2012\]. As a result, RSA is often used in practice in +combination with symmetric key cryptography. For example, if Alice wants +to send Bob a large amount of encrypted data, she could do the +following. First Alice chooses a key that will be used to encode the +data itself; this key is referred to as a session key, and is denoted by +KS. Alice must inform Bob of the session key, since this is the shared +symmetric key they will use with a symmetric key cipher (e.g., with DES +or AES). Alice encrypts the session key using Bob's public key, that is, +computes c=(KS)e mod n. Bob receives the RSA-encrypted session key, c, +and decrypts it to obtain the session key, KS. Bob now knows the session +key that Alice will use for her encrypted data transfer. Why Does RSA +Work? RSA encryption/decryption appears rather magical. Why should it be +that by applying the encryption algorithm and then the decryption +algorithm, one recovers the original message? In order to understand why +RSA works, again denote n=pq, where p and q are the large prime numbers +used in the RSA algorithm. Recall that, under RSA encryption, a message +(uniquely represented by an integer), m, is exponentiated to the power e +using modulo-n arithmetic, that is, c=memod n Decryption is performed by +raising this value to the power d, again using modulo-n arithmetic. The +result of an encryption step followed by a decryption step is thus (me +mod n)d mod n. Let's now see what we can say about this quantity. As +mentioned earlier, one important property of modulo arithmetic is (a mod +n)d mod n=ad mod n for any values a, n, and d. Thus, using a=me in this +property, we have (memod n)dmod n=medmod n + +It therefore remains to show that medmod n=m. Although we're trying to +remove some of the magic about why RSA works, to establish this, we'll +need to use a rather magical result from number theory here. +Specifically, we'll need the result that says if p and q are prime, +n=pq, and z=(p−1)(q−1), then xy mod n is the same as x(y mod z) mod n +\[Kaufman 1995\]. Applying this result with x=m and y=ed we have medmod +n=m(edmod z)mod n But remember that we have chosen e and d such that +edmod z=1. This gives us medmod n=m1mod n=m which is exactly the result +we are looking for! By first exponentiating to the power of e (that is, +encrypting) and then exponentiating to the power of d (that is, +decrypting), we obtain the original value, m. 
Even more wonderful is the +fact that if we first exponentiate to the power of d and then +exponentiate to the power of e---that is, we reverse the order of +encryption and decryption, performing the decryption operation first and +then applying the encryption operation---we also obtain the original +value, m. This wonderful result follows immediately from the modular +arithmetic: (mdmod n)emod n=mdemod n=medmod n=(memod n)dmod n The +security of RSA relies on the fact that there are no known algorithms +for quickly factoring a number, in this case the public value n, into +the primes p and q. If one knew p and q, then given the public value e, +one could easily compute the secret key, d. On the other hand, it is not +known whether or not there exist fast algorithms for factoring a number, +and in this sense, the security of RSA is not guaranteed. Another +popular public-key encryption algorithm is the Diffie-Hellman algorithm, +which we will briefly explore in the homework problems. Diffie-Hellman +is not as versatile as RSA in that it cannot be used to encrypt messages +of arbitrary length; it can be used, however, to establish a symmetric +session key, which is in turn used to encrypt messages. + +8.3 Message Integrity and Digital Signatures In the previous section we +saw how encryption can be used to provide confidentiality to two +communicating entities. In this section we turn to the equally important +cryptography topic of providing message integrity (also known as message +authentication). Along with message integrity, we will discuss two +related topics in this section: digital signatures and end-point +authentication. We define the message integrity problem using, once +again, Alice and Bob. Suppose Bob receives a message (which may be +encrypted or may be in plaintext) and he believes this message was sent +by Alice. To authenticate this message, Bob needs to verify: + +1. The message indeed originated from Alice. +2. The message was not tampered with on its way to Bob. We'll see in + Sections 8.4 through 8.7 that this problem of message integrity is a + critical concern in just about all secure networking protocols. As a + specific example, consider a computer network using a link-state + routing algorithm (such as OSPF) for determining routes between each + pair of routers in the network (see Chapter 5). In a link-state + algorithm, each router needs to broadcast a link-state message to + all other routers in the network. A router's link-state message + includes a list of its directly connected neighbors and the direct + costs to these neighbors. Once a router receives link-state messages + from all of the other routers, it can create a complete map of the + network, run its least-cost routing algorithm, and configure its + forwarding table. One relatively easy attack on the routing + algorithm is for Trudy to distribute bogus link-state messages with + incorrect link-state information. Thus the need for message + integrity---when router B receives a linkstate message from router + A, router B should verify that router A actually created the message + and, further, that no one tampered with the message in transit. In + this section, we describe a popular message integrity technique that + is used by many secure networking protocols. But before doing so, we + need to cover another important topic in cryptography--- + cryptographic hash functions. 
+ +8.3.1 Cryptographic Hash Functions As shown in Figure 8.7, a hash +function takes an input, m, and computes a fixed-size string H(m) + +known as a hash. The Internet checksum (Chapter 3) and CRCs (Chapter 6) +meet this definition. A cryptographic hash function is required to have +the following additional property: It is computationally infeasible to +find any two different messages x and y such that H(x)=H(y). Informally, +this property means that it is computationally infeasible for an +intruder to substitute one message for another message that is protected +by the hash + +Figure 8.7 Hash functions + +Figure 8.8 Initial message and fraudulent message have the same +checksum! + +function. That is, if (m, H(m)) are the message and the hash of the +message created by the sender, then + +an intruder cannot forge the contents of another message, y, that has +the same hash value as the original message. Let's convince ourselves +that a simple checksum, such as the Internet checksum, would make a poor +cryptographic hash function. Rather than performing 1s complement +arithmetic (as in the Internet checksum), let us compute a checksum by +treating each character as a byte and adding the bytes together using +4-byte chunks at a time. Suppose Bob owes Alice \$100.99 and sends an +IOU to Alice consisting of the text string " IOU100.99BOB. " The ASCII +representation (in hexadecimal notation) for these letters is 49 , 4F , +55 , 31 , 30 , 30 , 2E , 39 , 39 , 42 , 4F , 42 . Figure 8.8 (top) shows +that the 4-byte checksum for this message is B2 C1 D2 AC. A slightly +different message (and a much more costly one for Bob) is shown in the +bottom half of Figure 8.8. The messages " IOU100.99BOB " and " +IOU900.19BOB " have the same checksum. Thus, this simple checksum +algorithm violates the requirement above. Given the original data, it is +simple to find another set of data with the same checksum. Clearly, for +security purposes, we are going to need a more powerful hash function +than a checksum. The MD5 hash algorithm of Ron Rivest \[RFC 1321\] is in +wide use today. It computes a 128-bit hash in a four-step process +consisting of a padding step (adding a one followed by enough zeros so +that the length of the message satisfies certain conditions), an append +step (appending a 64-bit representation of the message length before +padding), an initialization of an accumulator, and a final looping step +in which the message's 16-word blocks are processed (mangled) in four +rounds. For a description of MD5 (including a C source code +implementation) see \[RFC 1321\]. The second major hash algorithm in use +today is the Secure Hash Algorithm (SHA-1) \[FIPS 1995\]. This algorithm +is based on principles similar to those used in the design of MD4 \[RFC +1320\], the predecessor to MD5. SHA-1, a US federal standard, is +required for use whenever a cryptographic hash algorithm is needed for +federal applications. It produces a 160-bit message digest. The longer +output length makes SHA-1 more secure. + +8.3.2 Message Authentication Code Let's now return to the problem of +message integrity. Now that we understand hash functions, let's take a +first stab at how we might perform message integrity: + +1. Alice creates message m and calculates the hash H(m) (for example + with SHA-1). +2. Alice then appends H(m) to the message m, creating an extended + message (m, H(m)), and sends the extended message to Bob. + +3. Bob receives an extended message (m, h) and calculates H(m). 
If H(m) = h, Bob concludes that everything is fine.

This approach is obviously flawed. Trudy can create a bogus message m´ in which she says she is Alice, calculate H(m´), and send Bob (m´, H(m´)). When Bob receives the message, everything checks out in step 3, so Bob doesn't suspect any funny business.

To perform message integrity, in addition to using cryptographic hash functions, Alice and Bob will need a shared secret s. This shared secret, which is nothing more than a string of bits, is called the authentication key. Using this shared secret, message integrity can be performed as follows:

1. Alice creates message m, concatenates s with m to create m+s, and calculates the hash H(m+s) (for example with SHA-1). H(m+s) is called the message authentication code (MAC).

2. Alice then appends the MAC to the message m, creating an extended message (m, H(m+s)), and sends the extended message to Bob.

3. Bob receives an extended message (m, h) and, knowing s, calculates the MAC H(m+s). If H(m+s) = h, Bob concludes that everything is fine.

A summary of the procedure is shown in Figure 8.9. Readers should note that the MAC here (standing for "message authentication code") is not the same MAC used in link-layer protocols (standing for "medium access control")!

Figure 8.9 Message authentication code (MAC)

One nice feature of a MAC is that it does not require an encryption algorithm. Indeed, in many applications, including the link-state routing algorithm described earlier, communicating entities are only concerned with message integrity and are not concerned with message confidentiality. Using a MAC, the entities can authenticate the messages they send to each other without having to integrate complex encryption algorithms into the integrity process.

As you might expect, a number of different standards for MACs have been proposed over the years. The most popular standard today is HMAC, which can be used either with MD5 or SHA-1. HMAC actually runs data and the authentication key through the hash function twice \[Kaufman 1995; RFC 2104\]. There still remains an important issue. How do we distribute the shared authentication key to the communicating entities? For example, in the link-state routing algorithm, we would somehow need to distribute the secret authentication key to each of the routers in the autonomous system. (Note that the routers can all use the same authentication key.) A network administrator could actually accomplish this by physically visiting each of the routers. Or, if the network administrator is a lazy guy, and if each router has its own public key, the network administrator could distribute the authentication key to any one of the routers by encrypting it with the router's public key and then sending the encrypted key over the network to the router.
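To make the MAC computation concrete, here is a minimal sketch using Python's standard hashlib and hmac modules; the shared secret and message are illustrative values of our own choosing. The hmac module implements the HMAC construction just mentioned.

```python
# A sketch of MAC computation and verification. The shared secret s
# and the message are illustrative values, not from the text.

import hashlib
import hmac

s = b"shared-authentication-key"
m = b"router A: neighbors=B,C costs=1,4"

# Naive MAC from the text: hash the message concatenated with the secret.
naive_mac = hashlib.sha1(m + s).hexdigest()

# HMAC, the standard construction, runs the key and data through the
# hash function twice internally.
mac = hmac.new(s, m, hashlib.sha1).hexdigest()

# Receiver side: recompute and compare using a constant-time check.
received_m, received_mac = m, mac
ok = hmac.compare_digest(
    hmac.new(s, received_m, hashlib.sha1).hexdigest(), received_mac)
print(naive_mac, mac, ok, sep="\n")
```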
8.3.3 Digital Signatures

Think of the number of times you've signed your name to a piece of paper during the last week. You sign checks, credit card receipts, legal documents, and letters. Your signature attests to the fact that you (as opposed to someone else) have acknowledged and/or agreed with the document's contents. In a digital world, one often wants to indicate the owner or creator of a document, or to signify one's agreement with a document's content. A digital signature is a cryptographic technique for achieving these goals in a digital world.

Just as with handwritten signatures, digital signing should be done in a way that is verifiable and nonforgeable. That is, it must be possible to prove that a document signed by an individual was indeed signed by that individual (the signature must be verifiable) and that only that individual could have signed the document (the signature cannot be forged).

Let's now consider how we might design a digital signature scheme. Observe that when Bob signs a message, Bob must put something on the message that is unique to him. Bob could consider attaching a MAC for the signature, where the MAC is created by appending his key (unique to him) to the message, and then taking the hash. But for Alice to verify the signature, she must also have a copy of the key, in which case the key would not be unique to Bob. Thus, MACs are not going to get the job done here.

Recall that with public-key cryptography, Bob has both a public and private key, with both of these keys being unique to Bob. Thus, public-key cryptography is an excellent candidate for providing digital signatures. Let us now examine how it is done. Suppose that Bob wants to digitally sign a document, m. We can think of the document as a file or a message that Bob is going to sign and send. As shown in Figure 8.10, to sign this document, Bob simply uses his private key, KB−, to compute KB−(m). At first, it might seem odd that Bob is using his private key (which, as we saw in Section 8.2, was used to decrypt a message that had been encrypted with his public key) to sign a document. But recall that encryption and decryption are nothing more than mathematical operations (exponentiation to the power of e or d in RSA; see Section 8.2) and recall that Bob's goal is not to scramble or obscure the contents of the document, but rather to sign the document in a manner that is verifiable and nonforgeable. Bob's digital signature of the document is KB−(m).

Figure 8.10 Creating a digital signature for a document

Does the digital signature KB−(m) meet our requirements of being verifiable and nonforgeable? Suppose Alice has m and KB−(m). She wants to prove in court (being litigious) that Bob had indeed signed the document and was the only person who could have possibly signed the document. Alice takes Bob's public key, KB+, and applies it to the digital signature, KB−(m), associated with the document, m. That is, she computes KB+(KB−(m)), and voilà, with a dramatic flurry, she produces m, which exactly matches the original document! Alice then argues that only Bob could have signed the document, for the following reasons: Whoever signed the message must have used the private key, KB−, in computing the signature KB−(m), such that KB+(KB−(m))=m. The only person who could have known the private key, KB−, is Bob. Recall from our discussion of RSA in Section 8.2 that knowing the public key, KB+, is of no help in learning the private key, KB−. Therefore, the only person who could know KB− is the person who generated the pair of keys, (KB+, KB−), in the first place, Bob. (Note that this assumes, though, that Bob has not given KB− to anyone, nor has anyone stolen KB− from Bob.)

It is also important to note that if the original document, m, is ever modified to some alternate form, m´, the signature that Bob created for m will not be valid for m´, since KB+(KB−(m)) does not equal m´. Thus we see that digital signatures also provide message integrity, allowing the receiver to verify that the message was unaltered as well as the source of the message.

One concern with signing data by encryption is that encryption and decryption are computationally expensive. Given the overheads of encryption and decryption, signing data via complete encryption/decryption can be overkill. A more efficient approach is to introduce hash functions into the digital signature. Recall from Section 8.3.2 that a hash algorithm takes a message, m, of arbitrary length and computes a fixed-length "fingerprint" of the message, denoted by H(m). Using a hash function, Bob signs the hash of a message rather than the message itself, that is, Bob calculates KB−(H(m)). Since H(m) is generally much smaller than the original message m, the computational effort required to create the digital signature is substantially reduced.

In the context of Bob sending a message to Alice, Figure 8.11 provides a summary of the operational procedure of creating a digital signature. Bob puts his original long message through a hash function. He then digitally signs the resulting hash with his private key. The original message (in cleartext) along with the digitally signed message digest (henceforth referred to as the digital signature) is then sent to Alice.

Figure 8.11 Sending a digitally signed message

Figure 8.12 provides a summary of the operational procedure of verifying the signature. Alice applies the sender's public key to the signed message digest to obtain a hash result. Alice also applies the hash function to the cleartext message to obtain a second hash result. If the two hashes match, then Alice can be sure about the integrity and author of the message.

Figure 8.12 Verifying a signed message

Before moving on, let's briefly compare digital signatures with MACs, since they have parallels, but also have important subtle differences. Both digital signatures and MACs start with a message (or a document). To create a MAC out of the message, we append an authentication key to the message, and then take the hash of the result. Note that neither public key nor symmetric key encryption is involved in creating the MAC. To create a digital signature, we first take the hash of the message and then encrypt the hash with our private key (using public key cryptography). Thus, a digital signature is a "heavier" technique, since it requires an underlying Public Key Infrastructure (PKI) with certification authorities as described below. We'll see in Section 8.5 that PGP---a popular secure e-mail system---uses digital signatures for message integrity. We've seen already that OSPF uses MACs for message integrity. We'll see in Sections 8.5 and 8.6 that MACs are also used for popular transport-layer and network-layer security protocols.
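The sign-the-hash idea can be sketched with toy RSA machinery. The parameters below (p=61, q=53, e=17, d=2753) are our own illustrative choices, still far too small to be secure, and reducing the digest modulo n is done purely so the toy numbers fit; real systems sign the full digest under a 2048-bit or larger modulus.

```python
# A toy sign-the-hash sketch (ours, not the book's). Real systems use
# RSA moduli of 2048+ bits; here n is tiny, so we shrink the SHA-1
# digest mod n purely for illustration.

import hashlib

p, q = 61, 53
n, z = p * q, (p - 1) * (q - 1)     # n = 3233, z = 3120
e, d = 17, 2753                     # public exponent e, private exponent d

def digest(message):
    h = hashlib.sha1(message).digest()
    return int.from_bytes(h, "big") % n   # toy-sized "fingerprint"

def sign(message):
    return pow(digest(message), d, n)     # KB-(H(m)): sign with private key

def verify(message, signature):
    return pow(signature, e, n) == digest(message)  # apply KB+ and compare

m = b"Bob agrees to pay Alice $100.99"
sig = sign(m)
print(verify(m, sig))                             # True
print(verify(b"Bob agrees to pay $900.19", sig))  # False: signature invalid
```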
In this +message, Bob also includes a digital signature (that is, a signed hash +of the original plaintext message) to prove to Alice that he is the true +source of the message. To verify the signature, Alice obtains Bob's +public key (perhaps from a public key server or from the e-mail message) +and checks the digital signature. In this manner she makes sure that +Bob, rather than some adolescent prankster, placed the order. This all +sounds fine until clever Trudy comes along. As shown in Figure 8.13, +Trudy is indulging in a prank. She sends a message to Alice in which she +says she is Bob, gives Bob's home address, and orders a pizza. In this +message she also includes her (Trudy's) public key, although Alice +naturally assumes it is Bob's public key. Trudy also attaches a digital +signature, which was created with her own (Trudy's) private key. After +receiving the message, Alice applies Trudy's public key (thinking that +it is Bob's) to the digital signature and concludes that the plaintext +message was + +Figure 8.13 Trudy masquerades as Bob using public key cryptography + +indeed created by Bob. Bob will be very surprised when the delivery +person brings a pizza with pepperoni and anchovies to his home! We see +from this example that for public key cryptography to be useful, you +need to be able to verify that you have the actual public key of the +entity (person, router, browser, and so on) with whom you want to +communicate. For example, when Alice wants to communicate with Bob using +public key cryptography, she needs to verify that the public key that is +supposed to be Bob's is indeed Bob's. Binding a public key to a +particular entity is typically done by a Certification Authority (CA), +whose job is to validate identities and issue certificates. A CA has the +following roles: + +1. A CA verifies that an entity (a person, a router, and so on) is who + it says it is. There are no mandated procedures for how + certification is done. When dealing with a CA, one must trust the CA + to have performed a suitably rigorous identity verification. For + example, if Trudy were able to walk into the Fly-by-Night + +Figure 8.14 Bob has his public key certified by the CA + +CA and simply announce "I am Alice" and receive certificates associated +with the identity of Alice, then one shouldn't put much faith in public +keys certified by the Fly-by-Night CA. On the other hand, one might (or +might not!) be more willing to trust a CA that is part of a federal or +state program. You can trust the identity associated with a public key +only to the extent to which you can trust a CA and its identity +verification techniques. What a tangled web of trust we spin! + +2. Once the CA verifies the identity of the entity, the CA creates a + certificate that binds the public key of the entity to the identity. + The certificate contains the public key and globally unique + identifying information about the owner of the public key (for + example, a human name or an IP address). The certificate is + digitally signed by the CA. These steps are shown in Figure 8.14. + Let us now see how certificates can be used to combat pizza-ordering + pranksters, like Trudy, and other undesirables. When Bob places his + order he also sends his CA-signed certificate. Alice uses the CA's + public key to check the validity of Bob's certificate and extract + Bob's public key. Both the International Telecommunication Union + (ITU) and the IETF have developed standards for CAs. 
ITU X.509 \[ITU + 2005a\] specifies an authentication service as well as a specific + syntax for certificates. \[RFC 1422\] describes CA-based key + management for use with secure Internet e-mail. It is compatible + with X.509 but goes beyond X.509 by establishing procedures and + conventions for a key management architecture. Table 8.4 describes + some of the important fields in a certificate. Table 8.4 Selected + fields in an X.509 and RFC 1422 public key + +Field Name + +Description + +Version + +Version number of X.509 specification + +Serial + +CA-issued unique identifier for a certificate + +number Signature + +Specifies the algorithm used by CA to sign this certificate + +Issuer + +Identity of CA issuing this certificate, in distinguished name (DN) +\[RFC 4514\] format + +name Validity + +Start and end of period of validity for certificate + +period Subject + +Identity of entity whose public key is associated with this certificate, +in DN format + +name Subject + +The subject's public key as well indication of the public key algorithm +(and algorithm + +public key + +parameters) to be used with this key + +8.4 End-Point Authentication End-point authentication is the process of +one entity proving its identity to another entity over a computer +network, for example, a user proving its identity to an e-mail server. +As humans, we authenticate each other in many ways: We recognize each +other's faces when we meet, we recognize each other's voices on the +telephone, we are authenticated by the customs official who checks us +against the picture on our passport. In this section, we consider how +one party can authenticate another party when the two are communicating +over a network. We focus here on authenticating a "live" party, at the +point in time when communication is actually occurring. A concrete +example is a user authenticating him or herself to an email server. This +is a subtly different problem from proving that a message received at +some point in the past did indeed come from that claimed sender, as +studied in Section 8.3. When performing authentication over the network, +the communicating parties cannot rely on biometric information, such as +a visual appearance or a voiceprint. Indeed, we will see in our later +case studies that it is often network elements such as routers and +client/server processes that must authenticate each other. Here, +authentication must be done solely on the basis of messages and data +exchanged as part of an authentication protocol. Typically, an +authentication protocol would run before the two communicating parties +run some other protocol (for example, a reliable data transfer protocol, +a routing information exchange protocol, or an e-mail protocol). The +authentication protocol first establishes the identities of the parties +to each other's satisfaction; only after authentication do the parties +get down to the work at hand. As in the case of our development of a +reliable data transfer (rdt) protocol in Chapter 3, we will find it +instructive here to develop various versions of an authentication +protocol, which we will call ap (authentication protocol), and poke +holes in each version + +Figure 8.15 Protocol ap1.0 and a failure scenario + +as we proceed. (If you enjoy this stepwise evolution of a design, you +might also enjoy \[Bryant 1988\], which recounts a fictitious narrative +between designers of an open-network authentication system, and their +discovery of the many subtle issues involved.) 
Let's assume that Alice +needs to authenticate herself to Bob. + +8.4.1 Authentication Protocol ap1.0 Perhaps the simplest authentication +protocol we can imagine is one where Alice simply sends a message to Bob +saying she is Alice. This protocol is shown in Figure 8.15. The flaw +here is obvious--- there is no way for Bob actually to know that the +person sending the message "I am Alice" is indeed Alice. For example, +Trudy (the intruder) could just as well send such a message. + +8.4.2 Authentication Protocol ap2.0 If Alice has a well-known network +address (e.g., an IP address) from which she always communicates, Bob +could attempt to authenticate Alice by verifying that the source address +on the IP datagram carrying the authentication message matches Alice's +well-known address. In this case, Alice would be authenticated. This +might stop a very network-naive intruder from impersonating Alice, but +it wouldn't stop the determined student studying this book, or many +others! From our study of the network and data link layers, we know that +it is not that hard (for example, if one had access to the operating +system code and could build one's own operating system kernel, as is the + +case with Linux and several other freely available operating systems) to +create an IP datagram, put whatever IP source address we want (for +example, Alice's well-known IP address) into the IP datagram, and send +the datagram over the link-layer protocol to the first-hop router. From +then + +Figure 8.16 Protocol ap2.0 and a failure scenario + +on, the incorrectly source-addressed datagram would be dutifully +forwarded to Bob. This approach, shown in Figure 8.16, is a form of IP +spoofing. IP spoofing can be avoided if Trudy's first-hop router is +configured to forward only datagrams containing Trudy's IP source +address \[RFC 2827\]. However, this capability is not universally +deployed or enforced. Bob would thus be foolish to assume that Trudy's +network manager (who might be Trudy herself) had configured Trudy's +first-hop router to forward only appropriately addressed datagrams. + +8.4.3 Authentication Protocol ap3.0 One classic approach to +authentication is to use a secret password. The password is a shared +secret between the authenticator and the person being authenticated. +Gmail, Facebook, telnet, FTP, and many other services use password +authentication. In protocol ap3.0, Alice thus sends her secret password +to Bob, as shown in Figure 8.17. Since passwords are so widely used, we +might suspect that protocol ap3.0 is fairly secure. If so, we'd be +wrong! The security flaw here is clear. If Trudy eavesdrops on Alice's +communication, then she can learn Alice's password. Lest you think this +is unlikely, consider the fact that when you Telnet to another machine +and log in, the login password is sent unencrypted to the Telnet server. +Someone connected to the Telnet client or server's LAN can possibly +sniff (read and store) all packets transmitted on the LAN and thus steal +the login password. In fact, this is a well-known approach for stealing +passwords (see, for example, \[Jimenez 1997\]). Such a threat is +obviously very real, so ap3.0 clearly won't do. + +8.4.4 Authentication Protocol ap3.1 Our next idea for fixing ap3.0 is +naturally to encrypt the password. By encrypting the password, we can +prevent Trudy from learning Alice's password. 
Figure 8.17 Protocol ap3.0 and a failure scenario

If we assume that Alice and Bob share a symmetric secret key, $K_{A-B}$, then Alice can encrypt the password and send her identification message, "I am Alice," and her encrypted password to Bob. Bob then decrypts the password and, assuming the password is correct, authenticates Alice. Bob feels comfortable in authenticating Alice since Alice not only knows the password, but also knows the shared secret key value needed to encrypt the password. Let's call this protocol ap3.1. While it is true that ap3.1 prevents Trudy from learning Alice's password, the use of cryptography here does not solve the authentication problem. Bob is subject to a playback attack: Trudy need only eavesdrop on Alice's communication, record the encrypted version of the password, and play back the encrypted version of the password to Bob to pretend that she is Alice. The use of an encrypted password in ap3.1 doesn't make the situation manifestly different from that of protocol ap3.0 in Figure 8.17.

8.4.5 Authentication Protocol ap4.0

The failure scenario in Figure 8.17 resulted from the fact that Bob could not distinguish between the original authentication of Alice and the later playback of Alice's original authentication. That is, Bob could not tell if Alice was live (that is, was currently really on the other end of the connection) or whether the messages he was receiving were a recorded playback of a previous authentication of Alice. The very (very) observant reader will recall that the three-way TCP handshake protocol needed to address the same problem---the server side of a TCP connection did not want to accept a connection if the received SYN segment was an old copy (retransmission) of a SYN segment from an earlier connection. How did the TCP server side solve the problem of determining whether the client was really live? It chose an initial sequence number that had not been used in a very long time, sent that number to the client, and then waited for the client to respond with an ACK segment containing that number. We can adopt the same idea here for authentication purposes. A nonce is a number that a protocol will use only once in a lifetime. That is, once a protocol uses a nonce, it will never use that number again. Our ap4.0 protocol uses a nonce as follows:

1. Alice sends the message "I am Alice" to Bob.

2. Bob chooses a nonce, R, and sends it to Alice.

3. Alice encrypts the nonce using Alice and Bob's symmetric secret key, $K_{A-B}$, and sends the encrypted nonce, $K_{A-B}(R)$, back to Bob. As in protocol ap3.1, it is the fact that Alice knows $K_{A-B}$ and uses it to encrypt a value that lets Bob know that the message he receives was generated by Alice. The nonce is used to ensure that Alice is live.

4. Bob decrypts the received message. If the decrypted nonce equals the nonce he sent Alice, then Alice is authenticated.

Protocol ap4.0 is illustrated in Figure 8.18. By using the once-in-a-lifetime value, R, and then checking the returned value, $K_{A-B}(R)$, Bob can be sure that Alice is both who she says she is (since she knows the secret key value needed to encrypt R) and live (since she has encrypted the nonce, R, that Bob just created). The use of a nonce and symmetric key cryptography forms the basis of ap4.0. A natural question is whether we can use a nonce and public key cryptography (rather than symmetric key cryptography) to solve the authentication problem. This issue is explored in the problems at the end of the chapter.

Figure 8.18 Protocol ap4.0 and a failure scenario
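Before moving on, it may help to see ap4.0's challenge-response exchange in running code. The sketch below is a minimal, single-process illustration using only Python's standard hmac and secrets modules; it proves possession of the shared key by computing a MAC over the nonce rather than literally encrypting it, a common variant of the same nonce idea, and the key value is of course made up.

```python
import hashlib
import hmac
import secrets

SHARED_KEY = b"k_ab-known-only-to-alice-and-bob"  # plays the role of K_A-B

def bob_choose_nonce() -> bytes:
    return secrets.token_bytes(16)  # R: fresh, never to be reused

def alice_respond(nonce: bytes) -> bytes:
    # Alice's reply plays the role of K_A-B(R): only a key holder can compute it
    return hmac.new(SHARED_KEY, nonce, hashlib.sha256).digest()

def bob_verify(nonce: bytes, reply: bytes) -> bool:
    expected = hmac.new(SHARED_KEY, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, reply)  # constant-time comparison

nonce = bob_choose_nonce()
assert bob_verify(nonce, alice_respond(nonce))  # Alice is authenticated

# Trudy replays Alice's old reply, but Bob has chosen a new nonce by then,
# so the playback attack that broke ap3.1 fails here:
assert not bob_verify(bob_choose_nonce(), alice_respond(nonce))
```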
8.5 Securing E-Mail

In previous sections, we examined fundamental issues in network security, including symmetric key and public key cryptography, end-point authentication, key distribution, message integrity, and digital signatures. We are now going to examine how these tools are being used to provide security in the Internet. Interestingly, it is possible to provide security services in any of the top four layers of the Internet protocol stack. When security is provided for a specific application-layer protocol, the application using the protocol will enjoy one or more security services, such as confidentiality, authentication, or integrity. When security is provided for a transport-layer protocol, all applications that use that protocol enjoy the security services of the transport protocol. When security is provided at the network layer on a host-to-host basis, all transport-layer segments (and hence all application-layer data) enjoy the security services of the network layer. When security is provided on a link basis, then the data in all frames traveling over the link receive the security services of the link. In Sections 8.5 through 8.8, we examine how security tools are being used in the application, transport, network, and link layers. Being consistent with the general structure of this book, we begin at the top of the protocol stack and discuss security at the application layer. Our approach is to use a specific application, e-mail, as a case study for application-layer security. We then move down the protocol stack. We'll examine the SSL protocol (which provides security at the transport layer), IPsec (which provides security at the network layer), and the security of the IEEE 802.11 wireless LAN protocol. You might be wondering why security functionality is being provided at more than one layer in the Internet. Wouldn't it suffice simply to provide the security functionality at the network layer and be done with it? There are two answers to this question. First, although security at the network layer can offer "blanket coverage" by encrypting all the data in the datagrams (that is, all the transport-layer segments) and by authenticating all the source IP addresses, it can't provide user-level security. For example, a commerce site cannot rely on IP-layer security to authenticate a customer who is purchasing goods at the commerce site. Thus, there is a need for security functionality at higher layers as well as blanket coverage at lower layers. Second, it is generally easier to deploy new Internet services, including security services, at the higher layers of the protocol stack. While waiting for security to be broadly deployed at the network layer, which is probably still many years in the future, many application developers "just do it" and introduce security functionality into their favorite applications. A classic example is Pretty Good Privacy (PGP), which provides secure e-mail (discussed later in this section). Requiring only client and server application code, PGP was one of the first security technologies to be broadly used in the Internet.

8.5.1 Secure E-Mail

We now use the cryptographic principles of Sections 8.2 through 8.3 to create a secure e-mail system. We create this high-level design in an incremental manner, at each step introducing new security services.
When designing a secure e-mail system, let us keep in mind the racy example introduced in Section 8.1---the love affair between Alice and Bob. Imagine that Alice wants to send an e-mail message to Bob, and Trudy wants to intrude. Before plowing ahead and designing a secure e-mail system for Alice and Bob, we should consider which security features would be most desirable for them. First and foremost is confidentiality. As discussed in Section 8.1, neither Alice nor Bob wants Trudy to read Alice's e-mail message. The second feature that Alice and Bob would most likely want to see in the secure e-mail system is sender authentication. In particular, when Bob receives the message "I don't love you anymore. I never want to see you again. Formerly yours, Alice," he would naturally want to be sure that the message came from Alice and not from Trudy. Another feature that the two lovers would appreciate is message integrity, that is, assurance that the message Alice sends is not modified while en route to Bob. Finally, the e-mail system should provide receiver authentication; that is, Alice wants to make sure that she is indeed sending the letter to Bob and not to someone else (for example, Trudy) who is impersonating Bob.

So let's begin by addressing the foremost concern, confidentiality. The most straightforward way to provide confidentiality is for Alice to encrypt the message with symmetric key technology (such as DES or AES) and for Bob to decrypt the message on receipt. As discussed in Section 8.2, if the symmetric key is long enough, and if only Alice and Bob have the key, then it is extremely difficult for anyone else (including Trudy) to read the message. Although this approach is straightforward, it has the fundamental difficulty that we discussed in Section 8.2---distributing a symmetric key so that only Alice and Bob have copies of it. So we naturally consider an alternative approach---public key cryptography (using, for example, RSA). In the public key approach, Bob makes his public key publicly available (e.g., in a public key server or on his personal Web page), Alice encrypts her message with Bob's public key, and she sends the encrypted message to Bob's e-mail address. When Bob receives the message, he simply decrypts it with his private key. Assuming that Alice knows for sure that the public key is Bob's public key, this approach is an excellent means to provide the desired confidentiality. One problem, however, is that public key encryption is relatively inefficient, particularly for long messages. To overcome the efficiency problem, let's make use of a session key (discussed in Section 8.2.2). In particular, Alice (1) selects a random symmetric session key, $K_S$, (2) encrypts her message, m, with the symmetric key, (3) encrypts the symmetric key with Bob's public key, $K_B^+$, (4) concatenates the encrypted message and the encrypted symmetric key to form a "package," and (5) sends the package to Bob's e-mail address.

Figure 8.19 Alice used a symmetric session key, $K_S$, to send a secret e-mail to Bob

The steps are illustrated in Figure 8.19. (In this and the subsequent figures, the circled "+" represents concatenation and the circled "−" represents deconcatenation.) When Bob receives the package, he (1) uses his private key, $K_B^-$, to obtain the symmetric key, $K_S$, and (2) uses the symmetric key $K_S$ to decrypt the message m.
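The five sender steps and two receiver steps just described can be expressed directly in code. Here is a minimal sketch, assuming the third-party Python cryptography package; Fernet stands in for the symmetric cipher and RSA-OAEP for the public key encryption, both illustrative choices rather than anything required by the design.

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

bob_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)  # K_B-
bob_public = bob_private.public_key()                                         # K_B+

# Alice's side, steps (1)-(5) of Figure 8.19
k_s = Fernet.generate_key()                         # (1) random session key K_S
encrypted_msg = Fernet(k_s).encrypt(b"my message")  # (2) K_S(m)
encrypted_key = bob_public.encrypt(k_s, oaep)       # (3) K_B+(K_S)
package = (encrypted_key, encrypted_msg)            # (4) concatenate; (5) send

# Bob's side
recovered_key = bob_private.decrypt(package[0], oaep)  # (1) recover K_S with K_B-
m = Fernet(recovered_key).decrypt(package[1])          # (2) recover m with K_S
assert m == b"my message"
```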
Having designed a secure e-mail system that provides confidentiality, let's now design another system that provides both sender authentication and message integrity. We'll suppose, for the moment, that Alice and Bob are no longer concerned with confidentiality (they want to share their feelings with everyone!), and are concerned only about sender authentication and message integrity. To accomplish this task, we use digital signatures and message digests, as described in Section 8.3. Specifically, Alice (1) applies a hash function, H (for example, MD5), to her message, m, to obtain a message digest, (2) signs the result of the hash function with her private key, $K_A^-$, to create a digital signature, (3) concatenates the original (unencrypted) message with the signature to create a package, and (4) sends the package to Bob's e-mail address. When Bob receives the package, he (1) applies Alice's public key, $K_A^+$, to the signed message digest and (2) compares the result of this operation with his own hash, H, of the message. The steps are illustrated in Figure 8.20. As discussed in Section 8.3, if the two results are the same, Bob can be pretty confident that the message came from Alice and is unaltered.

Figure 8.20 Using hash functions and digital signatures to provide sender authentication and message integrity

Now let's consider designing an e-mail system that provides confidentiality, sender authentication, and message integrity. This can be done by combining the procedures in Figures 8.19 and 8.20. Alice first creates a preliminary package, exactly as in Figure 8.20, that consists of her original message along with a digitally signed hash of the message. She then treats this preliminary package as a message in itself and sends this new message through the sender steps in Figure 8.19, creating a new package that is sent to Bob. The steps applied by Alice are shown in Figure 8.21. When Bob receives the package, he first applies his side of Figure 8.19 and then his side of Figure 8.20. It should be clear that this design achieves the goal of providing confidentiality, sender authentication, and message integrity. Note that, in this scheme, Alice uses public key cryptography twice: once with her own private key and once with Bob's public key. Similarly, Bob also uses public key cryptography twice---once with his private key and once with Alice's public key.

Figure 8.21 Alice uses symmetric key cryptography, public key cryptography, a hash function, and a digital signature to provide secrecy, sender authentication, and message integrity

The secure e-mail design outlined in Figure 8.21 probably provides satisfactory security for most e-mail users for most occasions. But there is still one important issue that remains to be addressed. The design in Figure 8.21 requires Alice to obtain Bob's public key, and requires Bob to obtain Alice's public key. The distribution of these public keys is a nontrivial problem. For example, Trudy might masquerade as Bob and give Alice her own public key while saying that it is Bob's public key, enabling her to receive the message meant for Bob. As we learned in Section 8.3, a popular approach for securely distributing public keys is to certify the public keys using a CA.
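The signature half of the design (Figure 8.20) can be sketched in the same style. The fragment below again assumes the cryptography package, and substitutes RSA-PSS with SHA-256 for the MD5 example in the text, since MD5 is no longer considered safe for signatures; note that the sign call hashes the message internally, so it performs steps (1) and (2) together.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)

alice_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)  # K_A-
alice_public = alice_private.public_key()                                       # K_A+

m = b"I don't love you anymore. Formerly yours, Alice"
signature = alice_private.sign(m, pss, hashes.SHA256())  # K_A-(H(m))
package = (m, signature)  # (3) message travels unencrypted, next to its signature

# Bob's side: hash the received message and compare with the signed digest
try:
    alice_public.verify(package[1], package[0], pss, hashes.SHA256())
    print("Message came from Alice and is unaltered")
except InvalidSignature:
    print("Reject: tampered, or not signed with Alice's key")
```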
8.5.2 PGP

Written by Phil Zimmermann in 1991, Pretty Good Privacy (PGP) is a nice example of an e-mail encryption scheme \[PGPI 2016\]. Versions of PGP are available in the public domain; for example, you can find the PGP software for your favorite platform as well as lots of interesting reading at the International PGP Home Page \[PGPI 2016\]. The PGP design is, in essence, the same as the design shown in Figure 8.21. Depending on the version, the PGP software uses MD5 or SHA for calculating the message digest; CAST, triple-DES, or IDEA for symmetric key encryption; and RSA for the public key encryption. When PGP is installed, the software creates a public key pair for the user. The public key can be posted on the user's Web site or placed in a public key server. The private key is protected by the use of a password. The password has to be entered every time the user accesses the private key. PGP gives the user the option of digitally signing the message, encrypting the message, or both digitally signing and encrypting. Figure 8.22 shows a PGP signed message. This message appears after the MIME header. The encoded data in the message is $K_A^-(H(m))$, that is, the digitally signed message digest. As we discussed above, in order for Bob to verify the integrity of the message, he needs to have access to Alice's public key.

Figure 8.22 A PGP signed message

Figure 8.23 shows a secret PGP message. This message also appears after the MIME header. Of course, the plaintext message is not included within the secret e-mail message. When a sender (such as Alice) wants both confidentiality and integrity, PGP contains a message like that of Figure 8.23 within the message of Figure 8.22.

Figure 8.23 A secret PGP message

PGP also provides a mechanism for public key certification, but the mechanism is quite different from the more conventional CA. PGP public keys are certified by a web of trust. Alice herself can certify any key/username pair when she believes the pair really belong together. In addition, PGP permits Alice to say that she trusts another user to vouch for the authenticity of more keys. Some PGP users sign each other's keys by holding key-signing parties. Users physically gather, exchange public keys, and certify each other's keys by signing them with their private keys.

8.6 Securing TCP Connections: SSL

In the previous section, we saw how cryptographic techniques can provide confidentiality, data integrity, and end-point authentication to a specific application, namely, e-mail. In this section, we'll drop down a layer in the protocol stack and examine how cryptography can enhance TCP with security services, including confidentiality, data integrity, and end-point authentication. This enhanced version of TCP is commonly known as Secure Sockets Layer (SSL). A slightly modified version of SSL version 3, called Transport Layer Security (TLS), has been standardized by the IETF \[RFC 4346\]. The SSL protocol was originally designed by Netscape, but the basic ideas behind securing TCP had predated Netscape's work (for example, see Woo \[Woo 1994\]). Since its inception, SSL has enjoyed broad deployment. SSL is supported by all popular Web browsers and Web servers, and it is used by Gmail and essentially all Internet commerce sites (including Amazon, eBay, and TaoBao). Hundreds of billions of dollars are spent over SSL every year. In fact, if you have ever purchased anything over the Internet with your credit card, the communication between your browser and the server for this purchase almost certainly went over SSL.
(You can identify that SSL is being used by your browser when the URL begins with https: rather than http.) To understand the need for SSL, let's walk through a typical Internet commerce scenario. Bob is surfing the Web and arrives at the Alice Incorporated site, which is selling perfume. The Alice Incorporated site displays a form in which Bob is supposed to enter the type of perfume and quantity desired, his address, and his payment card number. Bob enters this information, clicks on Submit, and expects to receive (via ordinary postal mail) the purchased perfumes; he also expects to receive a charge for his order in his next payment card statement. This all sounds good, but if no security measures are taken, Bob could be in for a few surprises. If no confidentiality (encryption) is used, an intruder could intercept Bob's order and obtain his payment card information. The intruder could then make purchases at Bob's expense. If no data integrity is used, an intruder could modify Bob's order, having him purchase ten times more bottles of perfume than desired. Finally, if no server authentication is used, a server could display Alice Incorporated's famous logo when in actuality the site is maintained by Trudy, who is masquerading as Alice Incorporated. After receiving Bob's order, Trudy could take Bob's money and run. Or Trudy could carry out an identity theft by collecting Bob's name, address, and credit card number. SSL addresses these issues by enhancing TCP with confidentiality, data integrity, server authentication, and client authentication.

SSL is often used to provide security to transactions that take place over HTTP. However, because SSL secures TCP, it can be employed by any application that runs over TCP. SSL provides a simple Application Programmer Interface (API) with sockets, which is similar to TCP's API. When an application wants to employ SSL, the application includes SSL classes/libraries. As shown in Figure 8.24, although SSL technically resides in the application layer, from the developer's perspective it is a transport protocol that provides TCP's services enhanced with security services.

Figure 8.24 Although SSL technically resides in the application layer, from the developer's perspective it is a transport-layer protocol
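This socket-like flavor of the API is easy to see in practice. The snippet below is a minimal sketch using Python's standard ssl and socket modules (which today speak TLS, SSL's standardized successor): an ordinary TCP socket is created first, then wrapped with the security services. The host www.example.com is just a placeholder.

```python
import socket
import ssl

ctx = ssl.create_default_context()  # loads CA certificates for server authentication

# First a plain TCP connection, then the SSL/TLS handshake on top of it
with socket.create_connection(("www.example.com", 443)) as tcp_sock:
    with ctx.wrap_socket(tcp_sock, server_hostname="www.example.com") as s:
        # From here on, the application uses the socket as usual;
        # encryption and integrity checking happen transparently.
        s.sendall(b"GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n")
        print(s.version())  # negotiated protocol version, e.g., TLSv1.3
        print(s.recv(200))  # beginning of the server's response
```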
8.6.1 The Big Picture

We begin by describing a simplified version of SSL, one that will allow us to get a big-picture understanding of the why and how of SSL. We will refer to this simplified version of SSL as "almost-SSL." After describing almost-SSL, in the next subsection we'll then describe the real SSL, filling in the details. Almost-SSL (and SSL) has three phases: handshake, key derivation, and data transfer. We now describe these three phases for a communication session between a client (Bob) and a server (Alice), with Alice having a private/public key pair and a certificate that binds her identity to her public key.

Handshake

During the handshake phase, Bob needs to (a) establish a TCP connection with Alice, (b) verify that Alice is really Alice, and (c) send Alice a master secret key, which will be used by both Alice and Bob to generate all the symmetric keys they need for the SSL session. These three steps are shown in Figure 8.25. Note that once the TCP connection is established, Bob sends Alice a hello message. Alice then responds with her certificate, which contains her public key. As discussed in Section 8.3, because the certificate has been certified by a CA, Bob knows for sure that the public key in the certificate belongs to Alice. Bob then generates a Master Secret (MS) (which will only be used for this SSL session), encrypts the MS with Alice's public key to create the Encrypted Master Secret (EMS), and sends the EMS to Alice. Alice decrypts the EMS with her private key to get the MS. After this phase, both Bob and Alice (and no one else) know the master secret for this SSL session.

Figure 8.25 The almost-SSL handshake, beginning with a TCP connection

Key Derivation

In principle, the MS, now shared by Bob and Alice, could be used as the symmetric session key for all subsequent encryption and data integrity checking. It is, however, generally considered safer for Alice and Bob to each use different cryptographic keys, and also to use different keys for encryption and integrity checking. Thus, both Alice and Bob use the MS to generate four keys:

- $E_B$ = session encryption key for data sent from Bob to Alice
- $M_B$ = session MAC key for data sent from Bob to Alice
- $E_A$ = session encryption key for data sent from Alice to Bob
- $M_A$ = session MAC key for data sent from Alice to Bob

Alice and Bob each generate the four keys from the MS. This could be done by simply slicing the MS into four keys. (But in real SSL it is a little more complicated, as we'll see.) At the end of the key derivation phase, both Alice and Bob have all four keys. The two encryption keys will be used to encrypt data; the two MAC keys will be used to verify the integrity of the data.

Data Transfer

Now that Alice and Bob share the same four session keys ($E_B$, $M_B$, $E_A$, and $M_A$), they can start to send secured data to each other over the TCP connection. Since TCP is a byte-stream protocol, a natural approach would be for SSL to encrypt application data on the fly and then pass the encrypted data on the fly to TCP. But if we were to do this, where would we put the MAC for the integrity check? We certainly do not want to wait until the end of the TCP session to verify the integrity of all of Bob's data that was sent over the entire session! To address this issue, SSL breaks the data stream into records, appends a MAC to each record for integrity checking, and then encrypts the record+MAC. To create the MAC, Bob inputs the record data along with the key $M_B$ into a hash function, as discussed in Section 8.3. To encrypt the package record+MAC, Bob uses his session encryption key $E_B$. This encrypted package is then passed to TCP for transport over the Internet. Although this approach goes a long way, it still isn't bullet-proof when it comes to providing data integrity for the entire message stream. In particular, suppose Trudy is a woman-in-the-middle and has the ability to insert, delete, and replace segments in the stream of TCP segments sent between Alice and Bob. Trudy, for example, could capture two segments sent by Bob, reverse the order of the segments, adjust the TCP sequence numbers (which are not encrypted), and then send the two reverse-ordered segments to Alice. Assuming that each TCP segment encapsulates exactly one record, let's now take a look at how Alice would process these segments.

1. TCP running in Alice would think everything is fine and pass the two records to the SSL sublayer.

2. SSL in Alice would decrypt the two records.

3. SSL in Alice would use the MAC in each record to verify the data integrity of the two records.
4. SSL would then pass the decrypted byte streams of the two records to the application layer; but the complete byte stream received by Alice would not be in the correct order due to reversal of the records!

You are encouraged to walk through similar scenarios for when Trudy removes segments or when Trudy replays segments. The solution to this problem, as you probably guessed, is to use sequence numbers. SSL does this as follows. Bob maintains a sequence number counter, which begins at zero and is incremented for each SSL record he sends. Bob doesn't actually include a sequence number in the record itself, but when he calculates the MAC, he includes the sequence number in the MAC calculation. Thus, the MAC is now a hash of the data plus the MAC key $M_B$ plus the current sequence number. Alice tracks Bob's sequence numbers, allowing her to verify the data integrity of a record by including the appropriate sequence number in the MAC calculation. This use of SSL sequence numbers prevents Trudy from carrying out a woman-in-the-middle attack, such as reordering or replaying segments. (Why?)
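A few lines of code make the defense concrete. This is a minimal sketch, assuming HMAC-SHA256 as the MAC algorithm and the key name $M_B$ from the text; real SSL/TLS specifies its own MAC construction, so treat this only as an illustration of the idea.

```python
import hashlib
import hmac

M_B = b"bob-to-alice-mac-key"  # session MAC key from key derivation (made up)

def record_mac(record_data: bytes, seq: int) -> bytes:
    # The sequence number is fed into the MAC but never sent on the wire
    return hmac.new(M_B, seq.to_bytes(8, "big") + record_data,
                    hashlib.sha256).digest()

mac0 = record_mac(b"first record", 0)
mac1 = record_mac(b"second record", 1)

# If Trudy swaps the two records, Alice MACs b"second record" against her own
# counter value 0, and the result does not match the MAC Bob attached:
assert record_mac(b"second record", 0) != mac1
assert record_mac(b"first record", 1) != mac0
```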
SSL Record

The SSL record (as well as the almost-SSL record) is shown in Figure 8.26. The record consists of a type field, version field, length field, data field, and MAC field. Note that the first three fields are not encrypted. The type field indicates whether the record is a handshake message or a message that contains application data. It is also used to close the SSL connection, as discussed below. SSL at the receiving end uses the length field to extract the SSL records out of the incoming TCP byte stream. The version field is self-explanatory.

Figure 8.26 Record format for SSL

8.6.2 A More Complete Picture

The previous subsection covered the almost-SSL protocol; it served to give us a basic understanding of the why and how of SSL. Now that we have a basic understanding of SSL, we can dig a little deeper and examine the essentials of the actual SSL protocol. In parallel to reading this description of the SSL protocol, you are encouraged to complete the Wireshark SSL lab, available at the textbook's Web site.

SSL Handshake

SSL does not mandate that Alice and Bob use a specific symmetric key algorithm, a specific public-key algorithm, or a specific MAC. Instead, SSL allows Alice and Bob to agree on the cryptographic algorithms at the beginning of the SSL session, during the handshake phase. Additionally, during the handshake phase, Alice and Bob send nonces to each other, which are used in the creation of the session keys ($E_B$, $M_B$, $E_A$, and $M_A$). The steps of the real SSL handshake are as follows:

1. The client sends a list of cryptographic algorithms it supports, along with a client nonce.

2. From the list, the server chooses a symmetric algorithm (for example, AES), a public key algorithm (for example, RSA with a specific key length), and a MAC algorithm. It sends back to the client its choices, as well as a certificate and a server nonce.

3. The client verifies the certificate, extracts the server's public key, generates a Pre-Master Secret (PMS), encrypts the PMS with the server's public key, and sends the encrypted PMS to the server.

4. Using the same key derivation function (as specified by the SSL standard), the client and server independently compute the Master Secret (MS) from the PMS and nonces. The MS is then sliced up to generate the two encryption and two MAC keys. Furthermore, when the chosen symmetric cipher employs CBC (such as 3DES or AES), then two Initialization Vectors (IVs)---one for each side of the connection---are also obtained from the MS. Henceforth, all messages sent between client and server are encrypted and authenticated (with the MAC).

5. The client sends a MAC of all the handshake messages.

6. The server sends a MAC of all the handshake messages.

The last two steps protect the handshake from tampering. To see this, observe that in step 1, the client typically offers a list of algorithms---some strong, some weak. This list of algorithms is sent in cleartext, since the encryption algorithms and keys have not yet been agreed upon. Trudy, as a woman-in-the-middle, could delete the stronger algorithms from the list, forcing the client to select a weak algorithm. To prevent such a tampering attack, in step 5 the client sends a MAC of the concatenation of all the handshake messages it sent and received. The server can compare this MAC with the MAC of the handshake messages it received and sent. If there is an inconsistency, the server can terminate the connection. Similarly, the server sends a MAC of the handshake messages it has seen, allowing the client to check for inconsistencies.

You may be wondering why there are nonces in steps 1 and 2. Don't sequence numbers suffice for preventing the segment replay attack? The answer is yes, but they don't alone prevent the "connection replay attack." Consider the following connection replay attack. Suppose Trudy sniffs all messages between Alice and Bob. The next day, Trudy masquerades as Bob and sends to Alice exactly the same sequence of messages that Bob sent to Alice on the previous day. If Alice doesn't use nonces, she will respond with exactly the same sequence of messages she sent the previous day. Alice will not suspect any funny business, as each message she receives will pass the integrity check. If Alice is an e-commerce server, she will think that Bob is placing a second order (for exactly the same thing). On the other hand, by including a nonce in the protocol, Alice will send different nonces for each TCP session, causing the encryption keys to be different on the two days. Therefore, when Alice receives played-back SSL records from Trudy, the records will fail the integrity checks, and the bogus e-commerce transaction will not succeed. In summary, in SSL, nonces are used to defend against the "connection replay attack" and sequence numbers are used to defend against replaying individual packets during an ongoing session.

Connection Closure

At some point, either Bob or Alice will want to end the SSL session. One approach would be to let Bob end the SSL session by simply terminating the underlying TCP connection---that is, by having Bob send a TCP FIN segment to Alice. But such a naive design sets the stage for the truncation attack whereby Trudy once again gets in the middle of an ongoing SSL session and ends the session early with a TCP FIN. If Trudy were to do this, Alice would think she received all of Bob's data when in actuality she received only a portion of it. The solution to this problem is to indicate in the type field whether the record serves to terminate the SSL session. (Although the SSL type is sent in the clear, it is authenticated at the receiver using the record's MAC.)
By including such a field, if Alice were to receive a TCP FIN before receiving a closure SSL record, she would know that something funny was going on. This completes our introduction to SSL. We've seen that it uses many of the cryptography principles discussed in Sections 8.2 and 8.3. Readers who want to explore SSL on yet a deeper level can read Rescorla's highly readable book on SSL \[Rescorla 2001\].

8.7 Network-Layer Security: IPsec and Virtual Private Networks

The IP security protocol, more commonly known as IPsec, provides security at the network layer. IPsec secures IP datagrams between any two network-layer entities, including hosts and routers. As we will soon describe, many institutions (corporations, government branches, non-profit organizations, and so on) use IPsec to create virtual private networks (VPNs) that run over the public Internet. Before getting into the specifics of IPsec, let's step back and consider what it means to provide confidentiality at the network layer. With network-layer confidentiality between a pair of network entities (for example, between two routers, between two hosts, or between a router and a host), the sending entity encrypts the payloads of all the datagrams it sends to the receiving entity. The encrypted payload could be a TCP segment, a UDP segment, an ICMP message, and so on. If such a network-layer service were in place, all data sent from one entity to the other---including e-mail, Web pages, TCP handshake messages, and management messages (such as ICMP and SNMP)---would be hidden from any third party that might be sniffing the network. For this reason, network-layer security is said to provide "blanket coverage."

In addition to confidentiality, a network-layer security protocol could potentially provide other security services. For example, it could provide source authentication, so that the receiving entity can verify the source of the secured datagram. A network-layer security protocol could provide data integrity, so that the receiving entity can check for any tampering of the datagram that may have occurred while the datagram was in transit. A network-layer security service could also provide replay-attack prevention, meaning that Bob could detect any duplicate datagrams that an attacker might insert. We will soon see that IPsec indeed provides mechanisms for all these security services, that is, for confidentiality, source authentication, data integrity, and replay-attack prevention.

8.7.1 IPsec and Virtual Private Networks (VPNs)

An institution that extends over multiple geographical regions often desires its own IP network, so that its hosts and servers can send data to each other in a secure and confidential manner. To achieve this goal, the institution could actually deploy a stand-alone physical network---including routers, links, and a DNS infrastructure---that is completely separate from the public Internet. Such a disjoint network, dedicated to a particular institution, is called a private network. Not surprisingly, a private network can be very costly, as the institution needs to purchase, install, and maintain its own physical network infrastructure.

Instead of deploying and maintaining a private network, many institutions today create VPNs over the existing public Internet. With a VPN, the institution's inter-office traffic is sent over the public Internet rather than over a physically independent network.
But to provide confidentiality, the inter-office traffic is encrypted before it enters the public Internet. A simple example of a VPN is shown in Figure 8.27. Here the institution consists of a headquarters, a branch office, and traveling salespersons that typically access the Internet from their hotel rooms. (There is only one salesperson shown in the figure.) In this VPN, whenever two hosts within headquarters send IP datagrams to each other or whenever two hosts within the branch office want to communicate, they use good-old vanilla IPv4 (that is, without IPsec services). However, when two of the institution's hosts communicate over a path that traverses the public Internet, the traffic is encrypted before it enters the Internet.

Figure 8.27 Virtual private network (VPN)

To get a feel for how a VPN works, let's walk through a simple example in the context of Figure 8.27. When a host in headquarters sends an IP datagram to a salesperson in a hotel, the gateway router in headquarters converts the vanilla IPv4 datagram into an IPsec datagram and then forwards this IPsec datagram into the Internet. This IPsec datagram actually has a traditional IPv4 header, so that the routers in the public Internet process the datagram as if it were an ordinary IPv4 datagram---to them, the datagram is a perfectly ordinary datagram. But, as shown in Figure 8.27, the payload of the IPsec datagram includes an IPsec header, which is used for IPsec processing; furthermore, the payload of the IPsec datagram is encrypted. When the IPsec datagram arrives at the salesperson's laptop, the OS in the laptop decrypts the payload (and provides other security services, such as verifying data integrity) and passes the unencrypted payload to the upper-layer protocol (for example, to TCP or UDP). We have just given a high-level overview of how an institution can employ IPsec to create a VPN. To see the forest through the trees, we have brushed aside many important details. Let's now take a closer look.

8.7.2 The AH and ESP Protocols

IPsec is a rather complex animal---it is defined in more than a dozen RFCs. Two important RFCs are RFC 4301, which describes the overall IP security architecture, and RFC 6071, which provides an overview of the IPsec protocol suite. Our goal in this textbook, as usual, is not simply to re-hash the dry and arcane RFCs, but instead to take a more operational and pedagogic approach to describing the protocols. In the IPsec protocol suite, there are two principal protocols: the Authentication Header (AH) protocol and the Encapsulation Security Payload (ESP) protocol. When a source IPsec entity (typically a host or a router) sends secure datagrams to a destination entity (also a host or a router), it does so with either the AH protocol or the ESP protocol. The AH protocol provides source authentication and data integrity but does not provide confidentiality. The ESP protocol provides source authentication, data integrity, and confidentiality. Because confidentiality is often critical for VPNs and other IPsec applications, the ESP protocol is much more widely used than the AH protocol. In order to de-mystify IPsec and avoid much of its complication, we will henceforth focus exclusively on the ESP protocol. Readers wanting to learn also about the AH protocol are encouraged to explore the RFCs and other online resources.
8.7.3 Security Associations

IPsec datagrams are sent between pairs of network entities, such as between two hosts, between two routers, or between a host and router. Before sending IPsec datagrams from source entity to destination entity, the source and destination entities create a network-layer logical connection. This logical connection is called a security association (SA). An SA is a simplex logical connection; that is, it is unidirectional from source to destination. If both entities want to send secure datagrams to each other, then two SAs (that is, two logical connections) need to be established, one in each direction. For example, consider once again the institutional VPN in Figure 8.27. This institution consists of a headquarters office, a branch office and, say, n traveling salespersons. For the sake of example, let's suppose that there is bi-directional IPsec traffic between headquarters and the branch office and bi-directional IPsec traffic between headquarters and the salespersons. In this VPN, how many SAs are there? To answer this question, note that there are two SAs between the headquarters gateway router and the branch-office gateway router (one in each direction); for each salesperson's laptop, there are two SAs between the headquarters gateway router and the laptop (again, one in each direction). So, in total, there are (2+2n) SAs. Keep in mind, however, that not all traffic sent into the Internet by the gateway routers or by the laptops will be IPsec secured. For example, a host in headquarters may want to access a Web server (such as Amazon or Google) in the public Internet. Thus, the gateway router (and the laptops) will emit into the Internet both vanilla IPv4 datagrams and secured IPsec datagrams.

Figure 8.28 Security association (SA) from R1 to R2

Let's now take a look "inside" an SA. To make the discussion tangible and concrete, let's do this in the context of an SA from router R1 to router R2 in Figure 8.28. (You can think of Router R1 as the headquarters gateway router and Router R2 as the branch office gateway router from Figure 8.27.) Router R1 will maintain state information about this SA, which will include:

- A 32-bit identifier for the SA, called the Security Parameter Index (SPI)
- The origin interface of the SA (in this case 200.168.1.100) and the destination interface of the SA (in this case 193.68.2.23)
- The type of encryption to be used (for example, 3DES with CBC)
- The encryption key
- The type of integrity check (for example, HMAC with MD5)
- The authentication key

Whenever router R1 needs to construct an IPsec datagram for forwarding over this SA, it accesses this state information to determine how it should authenticate and encrypt the datagram. Similarly, router R2 will maintain the same state information for this SA and will use this information to authenticate and decrypt any IPsec datagram that arrives from the SA. An IPsec entity (router or host) often maintains state information for many SAs. For example, in the VPN example in Figure 8.27 with n salespersons, the headquarters gateway router maintains state information for (2+2n) SAs. An IPsec entity stores the state information for all of its SAs in its Security Association Database (SAD), which is a data structure in the entity's OS kernel.
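The per-SA state just listed maps naturally onto a small data structure. Below is a minimal sketch in Python (the class and field names are our own, not from any real IPsec implementation), populating one SA with the example values from the text and storing it in a dictionary that plays the role of the SAD.

```python
from dataclasses import dataclass

@dataclass
class SecurityAssociation:
    spi: int         # 32-bit Security Parameter Index
    src: str         # origin interface of the SA
    dst: str         # destination interface of the SA
    enc_alg: str     # type of encryption, e.g., "3DES-CBC"
    enc_key: bytes   # encryption key
    auth_alg: str    # type of integrity check, e.g., "HMAC-MD5"
    auth_key: bytes  # authentication key

# The SAD: one entry per SA, indexed by SPI, so the receiving entity can
# look up the right algorithms and keys for each arriving IPsec datagram.
sad: dict[int, SecurityAssociation] = {}

sa = SecurityAssociation(
    spi=0x00001234, src="200.168.1.100", dst="193.68.2.23",
    enc_alg="3DES-CBC", enc_key=b"<24-byte key>",
    auth_alg="HMAC-MD5", auth_key=b"<16-byte key>",
)
sad[sa.spi] = sa
```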
8.7.4 The IPsec Datagram

Having described SAs, we can now describe the actual IPsec datagram. IPsec has two different packet forms, one for the so-called tunnel mode and the other for the so-called transport mode. The tunnel mode, being more appropriate for VPNs, is more widely deployed than the transport mode. In order to further de-mystify IPsec and avoid much of its complication, we henceforth focus exclusively on the tunnel mode. Once you have a solid grip on the tunnel mode, you should be able to easily learn about the transport mode on your own.

Figure 8.29 IPsec datagram format

The packet format of the IPsec datagram is shown in Figure 8.29. You might think that packet formats are boring and insipid, but we will soon see that the IPsec datagram actually looks and tastes like a popular Tex-Mex delicacy! Let's examine the IPsec fields in the context of Figure 8.28. Suppose router R1 receives an ordinary IPv4 datagram from host 172.16.1.17 (in the headquarters network) which is destined to host 172.16.2.48 (in the branch-office network). Router R1 uses the following recipe to convert this "original IPv4 datagram" into an IPsec datagram:

- Appends to the back of the original IPv4 datagram (which includes the original header fields!) an "ESP trailer" field
- Encrypts the result using the algorithm and key specified by the SA
- Appends to the front of this encrypted quantity a field called "ESP header"; the resulting package is called the "enchilada"
- Creates an authentication MAC over the whole enchilada using the algorithm and key specified in the SA
- Appends the MAC to the back of the enchilada, forming the payload
- Finally, creates a brand-new IP header with all the classic IPv4 header fields (together normally 20 bytes long), which it appends before the payload

Note that the resulting IPsec datagram is a bona fide IPv4 datagram, with the traditional IPv4 header fields followed by a payload. But in this case, the payload contains an ESP header, the original IP datagram, an ESP trailer, and an ESP authentication field (with the original datagram and ESP trailer encrypted). The original IP datagram has 172.16.1.17 for the source IP address and 172.16.2.48 for the destination IP address. Because the IPsec datagram includes the original IP datagram, these addresses are included (and encrypted) as part of the payload of the IPsec packet. But what about the source and destination IP addresses that are in the new IP header, that is, in the left-most header of the IPsec datagram? As you might expect, they are set to the source and destination router interfaces at the two ends of the tunnels, namely, 200.168.1.100 and 193.68.2.23. Also, the protocol number in this new IPv4 header field is not set to that of TCP, UDP, or SMTP, but instead to 50, designating that this is an IPsec datagram using the ESP protocol. After R1 sends the IPsec datagram into the public Internet, it will pass through many routers before reaching R2. Each of these routers will process the datagram as if it were an ordinary datagram---they are completely oblivious to the fact that the datagram is carrying IPsec-encrypted data. For these public Internet routers, because the destination IP address in the outer header is R2, the ultimate destination of the datagram is R2.

Having walked through an example of how an IPsec datagram is constructed, let's now take a closer look at the ingredients in the enchilada. We see in Figure 8.29 that the ESP trailer consists of three fields: padding; pad length; and next header.
Recall that block ciphers require the message to be encrypted to be an integer multiple of the block length. Padding (consisting of meaningless bytes) is used so that when added to the original datagram (along with the pad length and next header fields), the resulting "message" is an integer number of blocks. The pad-length field indicates to the receiving entity how much padding was inserted (and thus needs to be removed). The next header identifies the type (e.g., UDP) of data contained in the payload-data field. The payload data (typically the original IP datagram) and the ESP trailer are concatenated and then encrypted. Appended to the front of this encrypted unit is the ESP header, which is sent in the clear and consists of two fields: the SPI and the sequence number field. The SPI indicates to the receiving entity the SA to which the datagram belongs; the receiving entity can then index its SAD with the SPI to determine the appropriate authentication/decryption algorithms and keys. The sequence number field is used to defend against replay attacks. The sending entity also appends an authentication MAC. As stated earlier, the sending entity calculates a MAC over the whole enchilada (consisting of the ESP header, the original IP datagram, and the ESP trailer---with the datagram and trailer being encrypted). Recall that to calculate a MAC, the sender appends a secret MAC key to the enchilada and then calculates a fixed-length hash of the result.

When R2 receives the IPsec datagram, R2 observes that the destination IP address of the datagram is R2 itself. R2 therefore processes the datagram. Because the protocol field (in the left-most IP header) is 50, R2 sees that it should apply IPsec ESP processing to the datagram. First, peering into the enchilada, R2 uses the SPI to determine to which SA the datagram belongs. Second, it calculates the MAC of the enchilada and verifies that the MAC is consistent with the value in the ESP MAC field. If it is, it knows that the enchilada comes from R1 and has not been tampered with. Third, it checks the sequence-number field to verify that the datagram is fresh (and not a replayed datagram). Fourth, it decrypts the encrypted unit using the decryption algorithm and key associated with the SA. Fifth, it removes padding and extracts the original, vanilla IP datagram. And finally, sixth, it forwards the original datagram into the branch office network toward its ultimate destination. Whew, what a complicated recipe, huh? Well no one ever said that preparing and unraveling an enchilada was easy!

There is actually another important subtlety that needs to be addressed. It centers on the following question: When R1 receives an (unsecured) datagram from a host in the headquarters network, and that datagram is destined to some destination IP address outside of headquarters, how does R1 know whether it should be converted to an IPsec datagram? And if it is to be processed by IPsec, how does R1 know which SA (of many SAs in its SAD) should be used to construct the IPsec datagram? The problem is solved as follows. Along with a SAD, the IPsec entity also maintains another data structure called the Security Policy Database (SPD). The SPD indicates what types of datagrams (as a function of source IP address, destination IP address, and protocol type) are to be IPsec processed; and for those that are to be IPsec processed, which SA should be used. In a sense, the information in an SPD indicates "what" to do with an arriving datagram; the information in the SAD indicates "how" to do it.
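To make the recipe's ordering visible, here is a schematic (deliberately not wire-accurate) sketch of tunnel-mode encapsulation in Python. It reuses the SecurityAssociation object sa sketched above, substitutes HMAC-SHA1 and a toy XOR "cipher" for the SA's real algorithms, and uses a stubbed build_ipv4_header helper of our own invention; the real formats are specified in RFC 4303.

```python
import hashlib
import hmac

BLOCK = 8  # block size of the (stand-in) cipher

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    # Toy XOR cipher standing in for, say, 3DES-CBC; never use for real traffic
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def build_ipv4_header(src: str, dst: str, proto: int) -> bytes:
    # Stub: a real implementation would emit the 20-byte IPv4 header here
    return f"IPv4[{src}->{dst} proto={proto}]".encode()

def esp_encapsulate(original_dgram: bytes, sa, seq: int) -> bytes:
    # ESP trailer: padding + pad-length byte + next-header byte (4 = IP-in-IP)
    pad_len = -(len(original_dgram) + 2) % BLOCK
    trailer = bytes(pad_len) + bytes([pad_len, 4])
    # Encrypt the whole original datagram plus trailer, per the SA
    encrypted = toy_encrypt(original_dgram + trailer, sa.enc_key)
    # ESP header (SPI + sequence number) is prepended, sent in the clear
    esp_header = sa.spi.to_bytes(4, "big") + seq.to_bytes(4, "big")
    enchilada = esp_header + encrypted
    mac = hmac.new(sa.auth_key, enchilada, hashlib.sha1).digest()  # over the enchilada
    outer = build_ipv4_header(src=sa.src, dst=sa.dst, proto=50)    # brand-new header
    return outer + enchilada + mac

ipsec_dgram = esp_encapsulate(b"<original IPv4 datagram bytes>", sa, seq=1)
```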
Summary of IPsec Services

So what services does IPsec provide, exactly? Let us examine these services from the perspective of an attacker, say Trudy, who is a woman-in-the-middle, sitting somewhere on the path between R1 and R2 in Figure 8.28. Assume throughout this discussion that Trudy does not know the authentication and encryption keys used by the SA. What can and cannot Trudy do? First, Trudy cannot see the original datagram. In fact, not only is the data in the original datagram hidden from Trudy, but so is the protocol number, the source IP address, and the destination IP address. For datagrams sent over the SA, Trudy only knows that the datagram originated from some host in 172.16.1.0/24 and is destined to some host in 172.16.2.0/24. She does not know if it is carrying TCP, UDP, or ICMP data; she does not know if it is carrying HTTP, SMTP, or some other type of application data. This confidentiality thus goes a lot farther than SSL. Second, suppose Trudy tries to tamper with a datagram in the SA by flipping some of its bits. When this tampered datagram arrives at R2, it will fail the integrity check (using the MAC), thwarting Trudy's vicious attempts once again. Third, suppose Trudy tries to masquerade as R1, creating an IPsec datagram with source 200.168.1.100 and destination 193.68.2.23. Trudy's attack will be futile, as this datagram will again fail the integrity check at R2. Finally, because IPsec includes sequence numbers, Trudy will not be able to create a successful replay attack. In summary, as claimed at the beginning of this section, IPsec provides---between any pair of devices that process packets through the network layer---confidentiality, source authentication, data integrity, and replay-attack prevention.

8.7.5 IKE: Key Management in IPsec

When a VPN has a small number of end points (for example, just two routers as in Figure 8.28), the network administrator can manually enter the SA information (encryption/authentication algorithms and keys, and the SPIs) into the SADs of the endpoints. Such "manual keying" is clearly impractical for a large VPN, which may consist of hundreds or even thousands of IPsec routers and hosts. Large, geographically distributed deployments require an automated mechanism for creating the SAs. IPsec does this with the Internet Key Exchange (IKE) protocol, specified in RFC 5996.

IKE has some similarities with the handshake in SSL (see Section 8.6). Each IPsec entity has a certificate, which includes the entity's public key. As with SSL, the IKE protocol has the two entities exchange certificates, negotiate authentication and encryption algorithms, and securely exchange key material for creating session keys in the IPsec SAs. Unlike SSL, IKE employs two phases to carry out these tasks. Let's investigate these two phases in the context of two routers, R1 and R2, in Figure 8.28. The first phase consists of two exchanges of message pairs between R1 and R2: During the first exchange of messages, the two sides use Diffie-Hellman (see Homework Problems) to create a bi-directional IKE SA between the routers. To keep us all confused, this bi-directional IKE SA is entirely different from the IPsec SAs discussed in Sections 8.7.3 and 8.7.4. The IKE SA provides an authenticated and encrypted channel between the two routers.
During this first message-pair exchange, keys are established for encryption and authentication for the IKE SA. Also established is a master secret that will be used to compute IPsec SA keys later in phase 2. Observe that during this first step, RSA public and private keys are not used. In particular, neither R1 nor R2 reveals its identity by signing a message with its private key. During the second exchange of messages, both sides reveal their identity to each other by signing their messages. However, the identities are not revealed to a passive sniffer, since the messages are sent over the secured IKE SA channel. Also during this phase, the two sides negotiate the IPsec encryption and authentication algorithms to be employed by the IPsec SAs. In phase 2 of IKE, the two sides create an SA in each direction. At the end of phase 2, the encryption and authentication session keys are established on both sides for the two SAs. The two sides can then use the SAs to send secured datagrams, as described in Sections 8.7.3 and 8.7.4. The primary motivation for having two phases in IKE is computational cost---since the second phase doesn't involve any public-key cryptography, IKE can generate a large number of SAs between the two IPsec entities with relatively little computational cost.

8.8 Securing Wireless LANs

Security is a particularly important concern in wireless networks, where radio waves carrying frames can propagate far beyond the building containing the wireless base station and hosts. In this section we present a brief introduction to wireless security. For a more in-depth treatment, see the highly readable book by Edney and Arbaugh \[Edney 2003\]. The issue of security in 802.11 has attracted considerable attention in both technical circles and in the media. While there has been considerable discussion, there has been little debate---there seems to be universal agreement that the original 802.11 specification contains a number of serious security flaws. Indeed, public domain software can now be downloaded that exploits these holes, making those who use the vanilla 802.11 security mechanisms as open to security attacks as users who use no security features at all. In the following section, we discuss the security mechanisms initially standardized in the 802.11 specification, known collectively as Wired Equivalent Privacy (WEP). As the name suggests, WEP is meant to provide a level of security similar to that found in wired networks. We'll then discuss a few of the security holes in WEP and discuss the 802.11i standard, a fundamentally more secure version of 802.11 adopted in 2004.

8.8.1 Wired Equivalent Privacy (WEP)

The IEEE 802.11 WEP protocol was designed in 1999 to provide authentication and data encryption between a host and a wireless access point (that is, base station) using a symmetric shared key approach. WEP does not specify a key management algorithm, so it is assumed that the host and wireless access point have somehow agreed on the key via an out-of-band method. Authentication is carried out as follows:

1. A wireless host requests authentication by an access point.

2. The access point responds to the authentication request with a 128-byte nonce value.

3. The wireless host encrypts the nonce using the symmetric key that it shares with the access point.

4. The access point decrypts the host-encrypted nonce.
If the decrypted nonce matches the nonce value originally sent to the host, then the host is authenticated by the access point.

The WEP data encryption algorithm is illustrated in Figure 8.30. A secret 40-bit symmetric key, $K_S$, is assumed to be known by both a host and the access point. In addition, a 24-bit Initialization Vector (IV) is appended to the 40-bit key to create a 64-bit key that will be used to encrypt a single frame. The IV will change from one frame to another, and hence each frame will be encrypted with a different 64-bit key.

Figure 8.30 802.11 WEP protocol

Encryption is performed as follows. First a 4-byte CRC value (see Section 6.2) is computed for the data payload. The payload and the four CRC bytes are then encrypted using the RC4 stream cipher. We will not cover the details of RC4 here (see \[Schneier 1995\] and \[Edney 2003\] for details). For our purposes, it is enough to know that when presented with a key value (in this case, the 64-bit $(K_S, IV)$ key), the RC4 algorithm produces a stream of key values, $k_1^{IV}, k_2^{IV}, k_3^{IV}, \ldots$ that are used to encrypt the data and CRC value in a frame. For practical purposes, we can think of these operations being performed a byte at a time. Encryption is performed by XOR-ing the ith byte of data, $d_i$, with the ith key, $k_i^{IV}$, in the stream of key values generated by the $(K_S, IV)$ pair to produce the ith byte of ciphertext, $c_i$:

$$c_i = d_i \oplus k_i^{IV}$$

The IV value changes from one frame to the next and is included in plaintext in the header of each WEP-encrypted 802.11 frame, as shown in Figure 8.30. The receiver takes the secret 40-bit symmetric key that it shares with the sender, appends the IV, and uses the resulting 64-bit key (which is identical to the key used by the sender to perform encryption) to decrypt the frame:

$$d_i = c_i \oplus k_i^{IV}$$

Proper use of the RC4 algorithm requires that the same 64-bit key value never be used more than once. Recall that the WEP key changes on a frame-by-frame basis. For a given $K_S$ (which changes rarely, if ever), this means that there are only $2^{24}$ unique keys. If these keys are chosen randomly, we can show \[Edney 2003\] that the probability of having chosen the same IV value (and hence used the same 64-bit key) is more than 99 percent after only 12,000 frames. With 1 Kbyte frame sizes and a data transmission rate of 11 Mbps, only a few seconds are needed before 12,000 frames are transmitted. Furthermore, since the IV is transmitted in plaintext in the frame, an eavesdropper will know whenever a duplicate IV value is used.

To see one of the several problems that occur when a duplicate key is used, consider the following chosen-plaintext attack taken by Trudy against Alice. Suppose that Trudy (possibly using IP spoofing) sends a request (for example, an HTTP or FTP request) to Alice to transmit a file with known content, $d_1, d_2, d_3, d_4, \ldots$. Trudy also observes the encrypted data $c_1, c_2, c_3, c_4, \ldots$. Since $c_i = d_i \oplus k_i^{IV}$, if we XOR $c_i$ with each side of this equality we have

$$d_i \oplus c_i = k_i^{IV}$$

With this relationship, Trudy can use the known values of $d_i$ and $c_i$ to compute $k_i^{IV}$. The next time Trudy sees the same value of IV being used, she will know the key sequence $k_1^{IV}, k_2^{IV}, k_3^{IV}, \ldots$ and will thus be able to decrypt the encrypted message.
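The keystream-recovery algebra is easy to verify in a few lines. The sketch below is a toy demonstration, assuming a random byte string in place of genuine RC4 output (the XOR relationship $c_i = d_i \oplus k_i^{IV}$ is all that matters) and invented plaintexts.

```python
import secrets

keystream = secrets.token_bytes(32)  # k^IV: the per-(K_S, IV) RC4 output

def wep_like_encrypt(data: bytes) -> bytes:
    # Reusing the same IV means reusing exactly this keystream
    return bytes(d ^ k for d, k in zip(data, keystream))

# Trudy asked Alice to transmit known content, then sniffed its ciphertext:
known_plain = b"GET /index.html "
c1 = wep_like_encrypt(known_plain)
recovered = bytes(d ^ c for d, c in zip(known_plain, c1))  # k_i = d_i XOR c_i

# A later frame that reuses the same IV is now an open book to Trudy:
c2 = wep_like_encrypt(b"secret password")
print(bytes(c ^ k for c, k in zip(c2, recovered)))  # -> b'secret password'
```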
Another concern with WEP involves the CRC bits shown in Figure 8.30 and transmitted in the 802.11 frame to detect altered bits in the payload. However, an attacker who changes the encrypted content (e.g., substituting gibberish for the original encrypted data), computes a CRC over the substituted gibberish, and places the CRC into a WEP frame can produce an 802.11 frame that will be accepted by the receiver. What is needed here are message integrity techniques such as those we studied in Section 8.3 to detect content tampering or substitution. For more details of WEP security, see \[Edney 2003; Wright 2015\] and the references therein.

8.8.2 IEEE 802.11i

Soon after the 1999 release of IEEE 802.11, work began on developing a new and improved version of 802.11 with stronger security mechanisms. The new standard, known as 802.11i, underwent final ratification in 2004. As we'll see, while WEP provided relatively weak encryption, only a single way to perform authentication, and no key distribution mechanisms, IEEE 802.11i provides for much stronger forms of encryption, an extensible set of authentication mechanisms, and a key distribution mechanism. In the following, we present an overview of 802.11i; an excellent (streaming audio) technical overview of 802.11i is \[TechOnline 2012\]. Figure 8.31 overviews the 802.11i framework. In addition to the wireless client and access point, 802.11i defines an authentication server with which the AP can communicate. Separating the authentication server from the AP allows one authentication server to serve many APs, centralizing the (often sensitive) decisions regarding authentication and access within the single server, and keeping AP costs and complexity low.

Figure 8.31 802.11i: Four phases of operation

802.11i operates in four phases:

1. Discovery. In the discovery phase, the AP advertises its presence and the forms of authentication and encryption that can be provided to the wireless client node. The client then requests the specific forms of authentication and encryption that it desires. Although the client and AP are already exchanging messages, the client has not yet been authenticated nor does it have an encryption key, and so several more steps will be required before the client can communicate with an arbitrary remote host over the wireless channel.

2. Mutual authentication and Master Key (MK) generation. Authentication takes place between the wireless client and the authentication server. In this phase, the access point acts essentially as a relay, forwarding messages between the client and the authentication server. The Extensible Authentication Protocol (EAP) \[RFC 3748\] defines the end-to-end message formats used in a simple request/response mode of interaction between the client and authentication server. As shown in Figure 8.32, EAP messages are encapsulated using EAPoL (EAP over LAN, \[IEEE 802.1X\]) and sent over the 802.11 wireless link. These EAP messages are then decapsulated at the access point, and then re-encapsulated using the RADIUS protocol for transmission over UDP/IP to the authentication server.

Figure 8.32 EAP is an end-to-end protocol. EAP messages are encapsulated using EAPoL over the wireless link between the client and the access point, and using RADIUS over UDP/IP between the access point and the authentication server
While the RADIUS server and protocol \[RFC 2865\] are not required by the 802.11i protocol, they are de facto standard components for 802.11i. The recently standardized DIAMETER protocol \[RFC 3588\] is likely to replace RADIUS in the near future. With EAP, the authentication server can choose one of a number of ways to perform authentication. While 802.11i does not mandate a particular authentication method, the EAP-TLS authentication scheme \[RFC 5216\] is often used. EAP-TLS uses public key techniques (including nonce encryption and message digests) similar to those we studied in Section 8.3 to allow the client and the authentication server to mutually authenticate each other, and to derive a Master Key (MK) that is known to both parties.

3. Pairwise Master Key (PMK) generation. The MK is a shared secret known only to the client and the authentication server, which they each use to generate a second key, the Pairwise Master Key (PMK). The authentication server then sends the PMK to the AP. This is where we wanted to be! The client and AP now have a shared key (recall that in WEP, the problem of key distribution was not addressed at all) and have mutually authenticated each other. They're just about ready to get down to business.

4. Temporal Key (TK) generation. With the PMK, the wireless client and AP can now generate additional keys that will be used for communication. Of particular interest is the Temporal Key (TK), which will be used to perform the link-level encryption of data sent over the wireless link and to an arbitrary remote host. 802.11i provides several forms of encryption, including an AES-based encryption scheme and a strengthened version of WEP encryption.

8.9 Operational Security: Firewalls and Intrusion Detection Systems

We've seen throughout this chapter that the Internet is not a very safe place---bad guys are out there, wreaking all sorts of havoc. Given the hostile nature of the Internet, let's now consider an organization's network and the network administrator who administers it. From a network administrator's point of view, the world divides quite neatly into two camps---the good guys (who belong to the organization's network, and who should be able to access resources inside the organization's network in a relatively unconstrained manner) and the bad guys (everyone else, whose access to network resources must be carefully scrutinized). In many organizations, ranging from medieval castles to modern corporate office buildings, there is a single point of entry/exit where both good guys and bad guys entering and leaving the organization are security-checked. In a castle, this was done at a gate at one end of the drawbridge; in a corporate building, this is done at the security desk. In a computer network, when traffic entering/leaving a network is security-checked, logged, dropped, or forwarded, it is done by operational devices known as firewalls, intrusion detection systems (IDSs), and intrusion prevention systems (IPSs).

8.9.1 Firewalls

A firewall is a combination of hardware and software that isolates an organization's internal network from the Internet at large, allowing some packets to pass and blocking others.
A firewall allows a network administrator to control access between the outside world and resources within the administered network by managing the traffic flow to and from these resources. A firewall has three goals:

- All traffic from outside to inside, and vice versa, passes through the firewall. Figure 8.33 shows a firewall, sitting squarely at the boundary between the administered network and the rest of the Internet. While large organizations may use multiple levels of firewalls or distributed firewalls \[Skoudis 2006\], locating a firewall at a single access point to the network, as shown in Figure 8.33, makes it easier to manage and enforce a security-access policy.

- Only authorized traffic, as defined by the local security policy, will be allowed to pass. With all traffic entering and leaving the institutional network passing through the firewall, the firewall can restrict access to authorized traffic.

- The firewall itself is immune to penetration. The firewall itself is a device connected to the network. If not designed or installed properly, it can be compromised, in which case it provides only a false sense of security (which is worse than no firewall at all!).

Figure 8.33 Firewall placement between the administered network and the outside world

Cisco and Check Point are two of the leading firewall vendors today. You can also easily create a firewall (packet filter) from a Linux box using iptables (public-domain software that is normally shipped with Linux). Furthermore, as discussed in Chapters 4 and 5, firewalls are now frequently implemented in routers and controlled remotely using SDNs. Firewalls can be classified in three categories: traditional packet filters, stateful filters, and application gateways. We'll cover each of these in turn in the following subsections.

Traditional Packet Filters

As shown in Figure 8.33, an organization typically has a gateway router connecting its internal network to its ISP (and hence to the larger public Internet). All traffic leaving and entering the internal network passes through this router, and it is at this router where packet filtering occurs. A packet filter examines each datagram in isolation, determining whether the datagram should be allowed to pass or should be dropped based on administrator-specific rules. Filtering decisions are typically based on:

- IP source or destination address
- Protocol type in IP datagram field: TCP, UDP, ICMP, OSPF, and so on
- TCP or UDP source and destination port
- TCP flag bits: SYN, ACK, and so on
- ICMP message type
- Different rules for datagrams leaving and entering the network
- Different rules for the different router interfaces

Table 8.5 Policies and corresponding filtering rules for an organization's network 130.207/16 with Web server at 130.207.244.203

| Policy | Firewall Setting |
| --- | --- |
| No outside Web access. | Drop all outgoing packets to any IP address, port 80. |
| No incoming TCP connections, except those for organization's public Web server only. | Drop all incoming TCP SYN packets to any IP except 130.207.244.203, port 80. |
| Prevent Web-radios from eating up the available bandwidth. | Drop all incoming UDP packets---except DNS packets. |
| Prevent your network from being used for a smurf DoS attack. | Drop all ICMP ping packets going to a "broadcast" address (e.g., 130.207.255.255). |
| Prevent your network from being tracerouted. | Drop all outgoing ICMP TTL expired traffic. |
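Since the text mentions iptables, here is roughly how a few of the Table 8.5 policies might be expressed on a Linux gateway. This is a hedged sketch, not a complete or hardened configuration: it assumes the external interface is named eth0 and that filtering happens on the FORWARD chain, and it ignores many practical details (existing rules, connection tracking, logging).

```
# No outside Web access: drop outgoing packets to any IP address, port 80
iptables -A FORWARD -o eth0 -p tcp --dport 80 -j DROP

# Incoming TCP SYNs allowed only to the public Web server (first match wins)
iptables -A FORWARD -i eth0 -p tcp --syn -d 130.207.244.203 --dport 80 -j ACCEPT
iptables -A FORWARD -i eth0 -p tcp --syn -j DROP

# Drop incoming UDP except DNS replies (Web radio often runs over UDP)
iptables -A FORWARD -i eth0 -p udp ! --sport 53 -j DROP

# Smurf protection: drop pings addressed to the broadcast address
iptables -A FORWARD -p icmp --icmp-type echo-request -d 130.207.255.255 -j DROP

# Prevent outsiders from tracerouting the network
iptables -A FORWARD -o eth0 -p icmp --icmp-type time-exceeded -j DROP
```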
A network administrator configures the firewall based on the policy of the organization. The policy may take user productivity and bandwidth usage into account as well as the security concerns of an organization. Table 8.5 lists a number of possible policies an organization may have, and how they would be addressed with a packet filter. For example, if the organization doesn't want any incoming TCP connections except those for its public Web server, it can block all incoming TCP SYN segments except TCP SYN segments with destination port 80 and the destination IP address corresponding to the Web server. If the organization doesn't want its users to monopolize access bandwidth with Internet radio applications, it can block all not-critical UDP traffic (since Internet radio is often sent over UDP). If the organization doesn't want its internal network to be mapped (tracerouted) by an outsider, it can block all ICMP TTL expired messages leaving the organization's network. A filtering policy can be based on a combination of addresses and port numbers. For example, a filtering router could forward all Telnet datagrams (those with a port number of 23) except those going to and coming from a list of specific IP addresses. This policy permits Telnet connections to and from hosts on the allowed list. Unfortunately, basing the policy on external addresses provides no protection against datagrams that have had their source addresses spoofed. Filtering can also be based on whether or not the TCP ACK bit is set. This trick is quite useful if an organization wants to let its internal clients connect to external servers but wants to prevent external clients from connecting to internal servers.

Table 8.6 An access control list for a router interface

| action | source address | dest address | protocol | source port | dest port | flag bit |
| --- | --- | --- | --- | --- | --- | --- |
| allow | 222.22/16 | outside of 222.22/16 | TCP | > 1023 | 80 | any |
| allow | outside of 222.22/16 | 222.22/16 | TCP | 80 | > 1023 | ACK |
| allow | 222.22/16 | outside of 222.22/16 | UDP | > 1023 | 53 | --- |
| allow | outside of 222.22/16 | 222.22/16 | UDP | 53 | > 1023 | --- |
| deny | all | all | all | all | all | all |

Recall from Section 3.5 that the first segment in every TCP connection has the ACK bit set to 0, whereas all the other segments in the connection have the ACK bit set to 1. Thus, if an organization wants to prevent external clients from initiating connections to internal servers, it simply filters all incoming segments with the ACK bit set to 0. This policy kills all TCP connections originating from the outside, but permits connections originating internally. Firewall rules are implemented in routers with access control lists, with each router interface having its own list. An example of an access control list for an organization 222.22/16 is shown in Table 8.6. This access control list is for an interface that connects the router to the organization's external ISPs. Rules are applied to each datagram that passes through the interface from top to bottom.
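To make the top-to-bottom, first-match semantics concrete, the following sketch evaluates a packet against a Table 8.6-style list. The rule encoding and helper names here are our own, invented for illustration.

```python
from ipaddress import ip_address, ip_network

INTERNAL = ip_network("222.22.0.0/16")

def side(addr):
    """Classify an address as inside or outside the organization."""
    return "in" if ip_address(addr) in INTERNAL else "out"

# Table 8.6 as (action, src side, dst side, proto, sport test, dport test, ACK?)
RULES = [
    ("allow", "in",  "out", "TCP", lambda p: p > 1023, lambda p: p == 80,  None),
    ("allow", "out", "in",  "TCP", lambda p: p == 80,  lambda p: p > 1023, True),
    ("allow", "in",  "out", "UDP", lambda p: p > 1023, lambda p: p == 53,  None),
    ("allow", "out", "in",  "UDP", lambda p: p == 53,  lambda p: p > 1023, None),
]

def decide(src, dst, proto, sport, dport, ack=False):
    for action, s, d, p, sp, dp, need_ack in RULES:
        if (side(src) == s and side(dst) == d and proto == p
                and sp(sport) and dp(dport)
                and (need_ack is None or ack == need_ack)):
            return action            # first matching rule wins
    return "deny"                    # the final catch-all rule

# An external SYN (ACK bit 0) aimed at an internal server matches no rule:
print(decide("150.23.23.155", "222.22.1.7", "TCP", 12543, 80))   # deny
```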
The first two rules together allow internal users to surf the Web: The first rule allows any TCP packet with destination port 80 to leave the organization's network; the second rule allows any TCP packet with source port 80 and the ACK bit set to enter the organization's network. Note that if an external source attempts to establish a TCP connection with an internal host, the connection will be blocked, even if the source or destination port is 80. The second two rules together allow DNS packets to enter and leave the organization's network. In summary, this rather restrictive access control list blocks all traffic except Web traffic initiated from within the organization and DNS traffic. \[CERT Filtering 2012\] provides a list of recommended port/protocol packet filterings to avoid a number of well-known security holes in existing network applications.

Stateful Packet Filters

In a traditional packet filter, filtering decisions are made on each packet in isolation. Stateful filters actually track TCP connections, and use this knowledge to make filtering decisions.

Table 8.7 Connection table for stateful filter

| source address | dest address | source port | dest port |
| --- | --- | --- | --- |
| 222.22.1.7 | 37.96.87.123 | 12699 | 80 |
| 222.22.93.2 | 199.1.205.23 | 37654 | 80 |
| 222.22.65.143 | 203.77.240.43 | 48712 | 80 |

To understand stateful filters, let's reexamine the access control list in Table 8.6. Although rather restrictive, the access control list in Table 8.6 nevertheless allows any packet arriving from the outside with ACK = 1 and source port 80 to get through the filter. Such packets could be used by attackers in attempts to crash internal systems with malformed packets, carry out denial-of-service attacks, or map the internal network. The naive solution is to block TCP ACK packets as well, but such an approach would prevent the organization's internal users from surfing the Web. Stateful filters solve this problem by tracking all ongoing TCP connections in a connection table. This is possible because the firewall can observe the beginning of a new connection by observing a three-way handshake (SYN, SYNACK, and ACK); and it can observe the end of a connection when it sees a FIN packet for the connection. The firewall can also (conservatively) assume that the connection is over when it hasn't seen any activity over the connection for, say, 60 seconds. An example connection table for a firewall is shown in Table 8.7. This connection table indicates that there are currently three ongoing TCP connections, all of which have been initiated from within the organization. Additionally, the stateful filter includes a new column, "check connection," in its access control list, as shown in Table 8.8. Note that Table 8.8 is identical to the access control list in Table 8.6, except now it indicates that the connection should be checked for two of the rules. Let's walk through some examples to see how the connection table and the extended access control list work hand-in-hand.
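In code, the stateful check amounts to a set lookup against the connection table. Here is a sketch using Table 8.7's entries; the key layout and function name are our own, and the examples that follow in the text exercise exactly this check.

```python
# The connection table from Table 8.7, keyed as
# (internal address, internal port, external address, external port).
connections = {
    ("222.22.1.7",    12699, "37.96.87.123",  80),
    ("222.22.93.2",   37654, "199.1.205.23",  80),
    ("222.22.65.143", 48712, "203.77.240.43", 80),
}

def inbound_tcp_ok(src_ip, src_port, dst_ip, dst_port, ack):
    """Admit an inbound TCP packet only if its ACL rule says 'check
    connection' and the packet belongs to a connection initiated from
    inside (i.e., one recorded in the connection table)."""
    if not ack:
        return False   # inbound SYNs are never admitted by this ACL
    return (dst_ip, dst_port, src_ip, src_port) in connections

print(inbound_tcp_ok("150.23.23.155", 80, "222.22.1.7", 12543, ack=True))  # False
print(inbound_tcp_ok("37.96.87.123", 80, "222.22.1.7", 12699, ack=True))   # True
```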
Suppose an attacker attempts to send a malformed packet into the organization's network by sending a datagram with TCP source port 80 and with the ACK flag set. Further suppose that this packet has destination port number 12543 and source IP address 150.23.23.155. When this packet reaches the firewall, the firewall checks the access control list in Table 8.8, which indicates that the connection table must also be checked before permitting this packet to enter the organization's network. The firewall duly checks the connection table, sees that this packet is not part of an ongoing TCP connection, and rejects the packet. As a second example, suppose that an internal user wants to surf an external Web site. Because this user first sends a TCP SYN segment, the user's TCP connection gets recorded in the connection table. When the Web server sends back packets (with the ACK bit necessarily set), the firewall checks the table and sees that a corresponding connection is in progress. The firewall will thus let these packets pass, thereby not interfering with the internal user's Web surfing activity.

Table 8.8 Access control list for stateful filter

| action | source address | dest address | protocol | source port | dest port | flag bit | check connection |
| --- | --- | --- | --- | --- | --- | --- | --- |
| allow | 222.22/16 | outside of 222.22/16 | TCP | > 1023 | 80 | any | |
| allow | outside of 222.22/16 | 222.22/16 | TCP | 80 | > 1023 | ACK | X |
| allow | 222.22/16 | outside of 222.22/16 | UDP | > 1023 | 53 | --- | |
| allow | outside of 222.22/16 | 222.22/16 | UDP | 53 | > 1023 | --- | X |
| deny | all | all | all | all | all | all | |

Application Gateway

In the examples above, we have seen that packet-level filtering allows an organization to perform coarse-grain filtering on the basis of the contents of IP and TCP/UDP headers, including IP addresses, port numbers, and acknowledgment bits. But what if an organization wants to provide a Telnet service to a restricted set of internal users (as opposed to IP addresses)? And what if the organization wants such privileged users to authenticate themselves first before being allowed to create Telnet sessions to the outside world? Such tasks are beyond the capabilities of traditional and stateful filters. Indeed, information about the identity of the internal users is application-layer data and is not included in the IP/TCP/UDP headers. To have finer-level security, firewalls must combine packet filters with application gateways. Application gateways look beyond the IP/TCP/UDP headers and make policy decisions based on application data. An application gateway is an application-specific server through which all application data (inbound and outbound) must pass. Multiple application gateways can run on the same host, but each gateway is a separate server with its own processes. To get some insight into application gateways, let's design a firewall that allows only a restricted set of internal users to Telnet outside and prevents all external clients from Telneting inside. Such a policy can be accomplished by implementing a combination of a packet filter (in a router) and a Telnet application gateway, as shown in Figure 8.34. The router's filter is configured to block all Telnet connections except those that originate from the IP address of the application gateway. Such a filter configuration forces all outbound Telnet connections to pass through the application gateway.

Figure 8.34 Firewall consisting of an application gateway and a filter

Consider now an internal user who wants to Telnet to the outside world. The user must first set up a Telnet session with the application gateway. An application running in the gateway, which listens for incoming Telnet sessions, prompts the user for a user ID and password. When the user supplies this information, the application gateway checks to see if the user has permission to Telnet to the outside world. If not, the Telnet connection from the internal user to the gateway is terminated by the gateway. If the user has permission, then the gateway (1) prompts the user for the host name of the external host to which the user wants to connect, (2) sets up a Telnet session between the gateway and the external host, and (3) relays to the external host all data arriving from the user, and relays to the user all data arriving from the external host. Thus, the Telnet application gateway not only performs user authorization but also acts as a Telnet server and a Telnet client, relaying information between the user and the remote Telnet server. Note that the filter will permit step 2 because the gateway initiates the Telnet connection to the outside world.
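The relaying in step 3 is just bidirectional byte copying between two sockets. Below is a toy Python sketch of that step only; everything in it is invented for illustration (the listening port 2323, the way the client names the destination host), and a real Telnet gateway would of course perform the user ID/password check of the preceding paragraph before opening the outbound connection.

```python
import socket
import threading

def pump(src, dst):
    """Relay bytes from one socket to the other until EOF."""
    while (data := src.recv(4096)):
        dst.sendall(data)
    dst.close()

def handle(client):
    # A real gateway would authenticate the user and check policy here;
    # we skip straight to step (1): the user names the external host.
    host = client.recv(256).decode().strip()
    remote = socket.create_connection((host, 23))   # step (2)
    # Step (3): relay data in both directions between user and host.
    threading.Thread(target=pump, args=(client, remote), daemon=True).start()
    pump(remote, client)

server = socket.socket()
server.bind(("", 2323))      # an assumed gateway port
server.listen()
while True:
    conn, _ = server.accept()
    threading.Thread(target=handle, args=(conn,), daemon=True).start()
```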
CASE HISTORY

ANONYMITY AND PRIVACY

Suppose you want to visit a controversial Web site (for example, a political activist site) and you (1) don't want to reveal your IP address to the Web site, (2) don't want your local ISP (which may be your home or office ISP) to know that you are visiting the site, and (3) don't want your local ISP to see the data you are exchanging with the site. If you use the traditional approach of connecting directly to the Web site without any encryption, you fail on all three counts. Even if you use SSL, you fail on the first two counts: Your source IP address is presented to the Web site in every datagram you send; and the destination address of every packet you send can easily be sniffed by your local ISP. To obtain privacy and anonymity, you can instead use a combination of a trusted proxy server and SSL, as shown in Figure 8.35. With this approach, you first make an SSL connection to the trusted proxy. You then send, into this SSL connection, an HTTP request for a page at the desired site. When the proxy receives the SSL-encrypted HTTP request, it decrypts the request and forwards the cleartext HTTP request to the Web site. The Web site then responds to the proxy, which in turn forwards the response to you over SSL. Because the Web site only sees the IP address of the proxy, and not your client's address, you are indeed obtaining anonymous access to the Web site. And because all traffic between you and the proxy is encrypted, your local ISP cannot invade your privacy by logging the site you visited or recording the data you are exchanging. Many companies today (such as proxify.com) make available such proxy services. Of course, in this solution, your proxy knows everything: It knows your IP address and the IP address of the site you're surfing; and it can see all the traffic in cleartext exchanged between you and the Web site. Such a solution, therefore, is only as good as the trustworthiness of the proxy. A more robust approach, taken by the TOR anonymizing and privacy service, is to route your traffic through a series of non-colluding proxy servers \[TOR 2016\]. In particular, TOR allows independent individuals to contribute proxies to its proxy pool. When a user connects to a server using TOR, TOR randomly chooses (from its proxy pool) a chain of three proxies and routes all traffic between client and server over the chain. In this manner, assuming the proxies do not collude, no one knows that communication took place between your IP address and the target Web site. Furthermore, although cleartext is sent between the last proxy and the server, the last proxy doesn't know what IP address is sending and receiving the cleartext.

Figure 8.35 Providing anonymity and privacy with a proxy
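One way to see why no single proxy learns both endpoints is layered encryption: the client wraps its request once per hop, and each proxy can strip only its own layer (Problem P26 at the end of this chapter explores such a design). The sketch below is our own illustration, assuming per-hop session keys S1, S2, S3 have already been established and using a toy XOR stand-in for a real symmetric cipher.

```python
def E(key, data):
    """Stand-in symmetric cipher: repeating-key XOR (toy only)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def wrap(request, hop_keys):
    """Encrypt for the last hop first, then add a layer per earlier hop."""
    msg = request
    for k in reversed(hop_keys):
        msg = E(k, msg)
    return msg

S1, S2, S3 = b"k1", b"k2", b"k3"          # per-hop session keys (assumed)
onion = wrap(b"GET /page", [S1, S2, S3])

# Each proxy strips exactly one layer; only the last sees the cleartext,
# and only the first knows the client's address.
after_proxy1 = E(S1, onion)
after_proxy2 = E(S2, after_proxy1)
after_proxy3 = E(S3, after_proxy2)
assert after_proxy3 == b"GET /page"
```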
Internal networks often have multiple application gateways, for example, gateways for Telnet, HTTP, FTP, and e-mail. In fact, an organization's mail server (see Section 2.3) and Web cache are application gateways. Application gateways do not come without their disadvantages. First, a different application gateway is needed for each application. Second, there is a performance penalty to be paid, since all data will be relayed via the gateway. This becomes a concern particularly when multiple users or applications are using the same gateway machine. Finally, the client software must know how to contact the gateway when the user makes a request, and must know how to tell the application gateway what external server to connect to.

8.9.2 Intrusion Detection Systems

We've just seen that a packet filter (traditional and stateful) inspects IP, TCP, UDP, and ICMP header fields when deciding which packets to let pass through the firewall. However, to detect many attack types, we need to perform deep packet inspection, that is, look beyond the header fields and into the actual application data that the packets carry. As we saw in Section 8.9.1, application gateways often do deep packet inspection. But an application gateway only does this for a specific application. Clearly, there is a niche for yet another device---a device that not only examines the headers of all packets passing through it (like a packet filter), but also performs deep packet inspection (unlike a packet filter). When such a device observes a suspicious packet, or a suspicious series of packets, it could prevent those packets from entering the organizational network. Or, because the activity is only deemed suspicious, the device could let the packets pass, but send alerts to a network administrator, who can then take a closer look at the traffic and take appropriate actions. A device that generates alerts when it observes potentially malicious traffic is called an intrusion detection system (IDS). A device that filters out suspicious traffic is called an intrusion prevention system (IPS). In this section we study both systems---IDS and IPS---together, since the most interesting technical aspect of these systems is how they detect suspicious traffic (and not whether they send alerts or drop packets). We will henceforth collectively refer to IDS systems and IPS systems as IDS systems. An IDS can be used to detect a wide range of attacks, including network mapping (emanating, for example, from nmap), port scans, TCP stack scans, DoS bandwidth-flooding attacks, worms and viruses, OS vulnerability attacks, and application vulnerability attacks. (See Section 1.6 for a survey of network attacks.) Today, thousands of organizations employ IDS systems. Many of these deployed systems are proprietary, marketed by Cisco, Check Point, and other security equipment vendors. But many of the deployed IDS systems are public-domain systems, such as the immensely popular Snort IDS system (which we'll discuss shortly). An organization may deploy one or more IDS sensors in its organizational network.
Figure 8.36 shows an organization that has three IDS sensors. When multiple sensors are deployed, they typically work in concert, sending information about suspicious traffic activity to a central IDS processor, which collects and integrates the information and sends alarms to network administrators when deemed appropriate.

Figure 8.36 An organization deploying a filter, an application gateway, and IDS sensors

In Figure 8.36, the organization has partitioned its network into two regions: a high-security region, protected by a packet filter and an application gateway and monitored by IDS sensors; and a lower-security region---referred to as the demilitarized zone (DMZ)---which is protected only by the packet filter, but also monitored by IDS sensors. Note that the DMZ includes the organization's servers that need to communicate with the outside world, such as its public Web server and its authoritative DNS server. You may be wondering at this stage, why multiple IDS sensors? Why not just place one IDS sensor just behind the packet filter (or even integrated with the packet filter) in Figure 8.36? We will soon see that an IDS not only needs to do deep packet inspection, but must also compare each passing packet with tens of thousands of "signatures"; this can be a significant amount of processing, particularly if the organization receives gigabits/sec of traffic from the Internet. By placing the IDS sensors further downstream, each sensor sees only a fraction of the organization's traffic, and can more easily keep up. Nevertheless, high-performance IDS and IPS systems are available today, and many organizations can actually get by with just one sensor located near its access router. IDS systems are broadly classified as either signature-based systems or anomaly-based systems. A signature-based IDS maintains an extensive database of attack signatures. Each signature is a set of rules pertaining to an intrusion activity. A signature may simply be a list of characteristics about a single packet (e.g., source and destination port numbers, protocol type, and a specific string of bits in the packet payload), or may relate to a series of packets. The signatures are normally created by skilled network security engineers who research known attacks. An organization's network administrator can customize the signatures or add its own to the database. Operationally, a signature-based IDS sniffs every packet passing by it, comparing each sniffed packet with the signatures in its database. If a packet (or series of packets) matches a signature in the database, the IDS generates an alert. The alert could be sent to the network administrator in an e-mail message, could be sent to the network management system, or could simply be logged for future inspection. Signature-based IDS systems, although widely deployed, have a number of limitations. Most importantly, they require previous knowledge of the attack to generate an accurate signature. In other words, a signature-based IDS is completely blind to new attacks that have yet to be recorded. Another disadvantage is that even if a signature is matched, it may not be the result of an attack, so that a false alarm is generated. Finally, because every packet must be compared with an extensive collection of signatures, the IDS can become overwhelmed with processing and actually fail to detect many malicious packets.
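A toy sketch of the matching loop at the heart of a signature-based sensor follows. The packet representation and field names are our own invention; real systems compile thousands of signatures into far more efficient matching structures. The single signature shown anticipates the Snort rule discussed shortly.

```python
# Each signature lists per-packet conditions; all must hold to match.
SIGNATURES = [
    {"proto": "icmp", "itype": 8, "dsize": 0, "msg": "ICMP PING NMAP"},
]

def inspect(pkt):
    """Return the alert messages of every signature this packet matches."""
    alerts = []
    for sig in SIGNATURES:
        if (pkt.get("proto") == sig["proto"]
                and pkt.get("itype") == sig["itype"]
                and len(pkt.get("payload", b"")) == sig["dsize"]):
            alerts.append(sig["msg"])
    return alerts

# An empty ICMP echo request, as nmap emits during its ping sweeps:
print(inspect({"proto": "icmp", "itype": 8, "payload": b""}))
```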
An anomaly-based IDS creates a traffic profile as it observes traffic in normal operation. It then looks for packet streams that are statistically unusual, for example, an inordinate percentage of ICMP packets or a sudden exponential growth in port scans and ping sweeps. The great thing about anomaly-based IDS systems is that they don't rely on previous knowledge about existing attacks---that is, they can potentially detect new, undocumented attacks. On the other hand, it is an extremely challenging problem to distinguish between normal traffic and statistically unusual traffic. To date, most IDS deployments are primarily signature-based, although some include some anomaly-based features.

Snort

Snort is a public-domain, open source IDS with hundreds of thousands of existing deployments \[Snort 2012; Koziol 2003\]. It can run on Linux, UNIX, and Windows platforms. It uses the generic sniffing interface libpcap, which is also used by Wireshark and many other packet sniffers. It can easily handle 100 Mbps of traffic; for installations with gigabit/sec traffic rates, multiple Snort sensors may be needed. To gain some insight into Snort, let's take a look at an example of a Snort signature:

```
alert icmp $EXTERNAL_NET any -> $HOME_NET any
    (msg:"ICMP PING NMAP"; dsize: 0; itype: 8;)
```

This signature is matched by any ICMP packet that enters the organization's network ($HOME_NET) from the outside ($EXTERNAL_NET), is of type 8 (ICMP ping), and has an empty payload (dsize = 0). Since nmap (see Section 1.6) generates ping packets with these specific characteristics, this signature is designed to detect nmap ping sweeps. When a packet matches this signature, Snort generates an alert that includes the message "ICMP PING NMAP". Perhaps what is most impressive about Snort is the vast community of users and security experts that maintain its signature database. Typically within a few hours of a new attack, the Snort community writes and releases an attack signature, which is then downloaded by the hundreds of thousands of Snort deployments distributed around the world. Moreover, using the Snort signature syntax, network administrators can tailor the signatures to their own organization's needs by either modifying existing signatures or creating entirely new ones.

8.10 Summary

In this chapter, we've examined the various mechanisms that our secret lovers, Bob and Alice, can use to communicate securely. We've seen that Bob and Alice are interested in confidentiality (so they alone are able to understand the contents of a transmitted message), end-point authentication (so they are sure that they are talking with each other), and message integrity (so they are sure that their messages are not altered in transit). Of course, the need for secure communication is not confined to secret lovers. Indeed, we saw in Sections 8.5 through 8.8 that security can be used in various layers in a network architecture to protect against bad guys who have a large arsenal of possible attacks at hand. The first part of this chapter presented various principles underlying secure communication. In Section 8.2, we covered cryptographic techniques for encrypting and decrypting data, including symmetric key cryptography and public key cryptography. DES and RSA were examined as specific case studies of these two major classes of cryptographic techniques in use in today's networks. In Section 8.3, we examined two approaches for providing message integrity: message authentication codes (MACs) and digital signatures. The two approaches have a number of parallels.
Both use cryptographic hash functions and both techniques enable us to verify the source of the message as well as the integrity of the message itself. One important difference is that MACs do not rely on encryption whereas digital signatures require a public key infrastructure. Both techniques are extensively used in practice, as we saw in Sections 8.5 through 8.8. Furthermore, digital signatures are used to create digital certificates, which are important for verifying the validity of public keys. In Section 8.4, we examined endpoint authentication and introduced nonces to defend against the replay attack. In Sections 8.5 through 8.8 we examined several security networking protocols that enjoy extensive use in practice. We saw that symmetric key cryptography is at the core of PGP, SSL, IPsec, and wireless security. We saw that public key cryptography is crucial for both PGP and SSL. We saw that PGP uses digital signatures for message integrity, whereas SSL and IPsec use MACs. Now that you have an understanding of the basic principles of cryptography, and have studied how these principles are actually used, you are in a position to design your own secure network protocols! Armed with the techniques covered in Sections 8.2 through 8.8, Bob and Alice can communicate securely. (One can only hope that they are networking students who have learned this material and can thus avoid having their tryst uncovered by Trudy!) But confidentiality is only a small part of the network security picture. As we learned in Section 8.9, increasingly, the focus in network security has been on securing the network infrastructure against a potential onslaught by the bad guys. In the latter part of this chapter, we thus covered firewalls and IDS systems which inspect packets entering and leaving an organization's network. This chapter has covered a lot of ground, while focusing on the most important topics in modern network security. Readers who desire to dig deeper are encouraged to investigate the references cited in this chapter. In particular, we recommend \[Skoudis 2006\] for attacks and operational security, \[Kaufman 1995\] for cryptography and how it applies to network security, \[Rescorla 2001\] for an in-depth but readable treatment of SSL, and \[Edney 2003\] for a thorough discussion of 802.11 security, including an insightful investigation into WEP and its flaws.

Homework Problems and Questions

Chapter 8 Review Problems

SECTION 8.1

R1. What are the differences between message confidentiality and message integrity? Can you have confidentiality without integrity? Can you have integrity without confidentiality? Justify your answer.

R2. Internet entities (routers, switches, DNS servers, Web servers, user end systems, and so on) often need to communicate securely. Give three specific example pairs of Internet entities that may want secure communication.

SECTION 8.2

R3. From a service perspective, what is an important difference between a symmetric-key system and a public-key system?

R4. Suppose that an intruder has an encrypted message as well as the decrypted version of that message. Can the intruder mount a ciphertext-only attack, a known-plaintext attack, or a chosen-plaintext attack?

R5. Consider an 8-block cipher. How many possible input blocks does this cipher have? How many possible mappings are there? If we view each mapping as a key, then how many possible keys does this cipher have?
R6. Suppose N people want to communicate with each of N−1 other people using symmetric key encryption. All communication between any two people, i and j, is visible to all other people in this group of N, and no other person in this group should be able to decode their communication. How many keys are required in the system as a whole? Now suppose that public key encryption is used. How many keys are required in this case?

R7. Suppose n = 10,000, a = 10,023, and b = 10,004. Use an identity of modular arithmetic to calculate in your head (a · b) mod n.

R8. Suppose you want to encrypt the message 10101111 by encrypting the decimal number that corresponds to the message. What is the decimal number?

SECTIONS 8.3--8.4

R9. In what way does a hash provide a better message integrity check than a checksum (such as the Internet checksum)?

R10. Can you "decrypt" a hash of a message to get the original message? Explain your answer.

R11. Consider a variation of the MAC algorithm (Figure 8.9) where the sender sends (m, H(m) + s), where H(m) + s is the concatenation of H(m) and s. Is this variation flawed? Why or why not?

R12. What does it mean for a signed document to be verifiable and nonforgeable?

R13. In what way does the public-key encrypted message hash provide a better digital signature than the public-key encrypted message?

R14. Suppose certifier.com creates a certificate for foo.com. Typically, the entire certificate would be encrypted with certifier.com's public key. True or false?

R15. Suppose Alice has a message that she is ready to send to anyone who asks. Thousands of people want to obtain Alice's message, but each wants to be sure of the integrity of the message. In this context, do you think a MAC-based or a digital-signature-based integrity scheme is more suitable? Why?

R16. What is the purpose of a nonce in an end-point authentication protocol?

R17. What does it mean to say that a nonce is a once-in-a-lifetime value? In whose lifetime?

R18. Is the message integrity scheme based on HMAC susceptible to playback attacks? If so, how can a nonce be incorporated into the scheme to remove this susceptibility?

SECTIONS 8.5--8.8

R19. Suppose that Bob receives a PGP message from Alice. How does Bob know for sure that Alice created the message (rather than, say, Trudy)? Does PGP use a MAC for message integrity?

R20. In the SSL record, there is a field for SSL sequence numbers. True or false?

R21. What is the purpose of the random nonces in the SSL handshake?

R22. Suppose an SSL session employs a block cipher with CBC. True or false: The server sends to the client the IV in the clear.

R23. Suppose Bob initiates a TCP connection to Trudy who is pretending to be Alice. During the handshake, Trudy sends Bob Alice's certificate. In what step of the SSL handshake algorithm will Bob discover that he is not communicating with Alice?

R24. Consider sending a stream of packets from Host A to Host B using IPsec. Typically, a new SA will be established for each packet sent in the stream. True or false?

R25. Suppose that TCP is being run over IPsec between headquarters and the branch office in Figure 8.28. If TCP retransmits the same packet, then the two corresponding packets sent by R1 will have the same sequence number in the ESP header. True or false?

R26. An IKE SA and an IPsec SA are the same thing. True or false?

R27. Consider WEP for 802.11. Suppose that the data is 10101100 and the keystream is 11110000. What is the resulting ciphertext?
R28. In WEP, an IV is sent in the clear in every frame. True or false?

SECTION 8.9

R29. Stateful packet filters maintain two data structures. Name them and briefly describe what they do.

R30. Consider a traditional (stateless) packet filter. This packet filter may filter packets based on TCP flag bits as well as other header fields. True or false?

R31. In a traditional packet filter, each interface can have its own access control list. True or false?

R32. Why must an application gateway work in conjunction with a router filter to be effective?

R33. Signature-based IDSs and IPSs inspect into the payloads of TCP and UDP segments. True or false?

Problems

P1. Using the monoalphabetic cipher in Figure 8.3, encode the message "This is an easy problem." Decode the message "rmij'u uamu xyj."

P2. Show that Trudy's known-plaintext attack, in which she knows the (ciphertext, plaintext) translation pairs for seven letters, reduces the number of possible substitutions to be checked in the example in Section 8.2.1 by approximately 10^9.

P3. Consider the polyalphabetic system shown in Figure 8.4. Will a chosen-plaintext attack that is able to get the plaintext encoding of the message "The quick brown fox jumps over the lazy dog." be sufficient to decode all messages? Why or why not?

P4. Consider the block cipher in Figure 8.5. Suppose that each block cipher Ti simply reverses the order of the eight input bits (so that, for example, 11110000 becomes 00001111). Further suppose that the 64-bit scrambler does not modify any bits (so that the output value of the mth bit is equal to the input value of the mth bit). (a) With n = 3 and the original 64-bit input equal to 10100000 repeated eight times, what is the value of the output? (b) Repeat part (a) but now change the last bit of the original 64-bit input from a 0 to a 1. (c) Repeat parts (a) and (b) but now suppose that the 64-bit scrambler inverses the order of the 64 bits.

P5. Consider the block cipher in Figure 8.5. For a given "key" Alice and Bob would need to keep eight tables, each 8 bits by 8 bits. For Alice (or Bob) to store all eight tables, how many bits of storage are necessary? How does this number compare with the number of bits required for a full-table 64-bit block cipher?

P6. Consider the 3-bit block cipher in Table 8.1. Suppose the plaintext is 100100100. (a) Initially assume that CBC is not used. What is the resulting ciphertext? (b) Suppose Trudy sniffs the ciphertext. Assuming she knows that a 3-bit block cipher without CBC is being employed (but doesn't know the specific cipher), what can she surmise? (c) Now suppose that CBC is used with IV = 111. What is the resulting ciphertext?

P7. (a) Using RSA, choose p = 3 and q = 11, and encode the word "dog" by encrypting each letter separately. Apply the decryption algorithm to the encrypted version to recover the original plaintext message. (b) Repeat part (a) but now encrypt "dog" as one message m.

P8. Consider RSA with p = 5 and q = 11.

a. What are n and z?

b. Let e be 3. Why is this an acceptable choice for e?

c. Find d such that de = 1 (mod z) and d < 160.

d. Encrypt the message m = 8 using the key (n, e). Let c denote the corresponding ciphertext. Show all work. Hint: To simplify the calculations, use the fact: \[(a mod n) · (b mod n)\] mod n = (a · b) mod n

P9. In this problem, we explore the Diffie-Hellman (DH) public-key encryption algorithm, which allows two entities to agree on a shared key.
The DH algorithm makes use of a large prime number p and another large number g less than p. Both p and g are made public (so that an attacker would know them). In DH, Alice and Bob each independently choose secret keys, S_A and S_B, respectively. Alice then computes her public key, T_A, by raising g to S_A and then taking mod p. Bob similarly computes his own public key T_B by raising g to S_B and then taking mod p. Alice and Bob then exchange their public keys over the Internet. Alice then calculates the shared secret key S by raising T_B to S_A and then taking mod p. Similarly, Bob calculates the shared key S′ by raising T_A to S_B and then taking mod p.

a. Prove that, in general, Alice and Bob obtain the same symmetric key, that is, prove S = S′.

b. With p = 11 and g = 2, suppose Alice and Bob choose private keys S_A = 5 and S_B = 12, respectively. Calculate Alice's and Bob's public keys, T_A and T_B. Show all work.

c. Following up on part (b), now calculate S as the shared symmetric key. Show all work.

d. Provide a timing diagram that shows how Diffie-Hellman can be attacked by a man-in-the-middle. The timing diagram should have three vertical lines, one for Alice, one for Bob, and one for the attacker Trudy.

P10. Suppose Alice wants to communicate with Bob using symmetric key cryptography using a session key KS. In Section 8.2, we learned how public-key cryptography can be used to distribute the session key from Alice to Bob. In this problem, we explore how the session key can be distributed---without public key cryptography---using a key distribution center (KDC). The KDC is a server that shares a unique secret symmetric key with each registered user. For Alice and Bob, denote these keys by KA-KDC and KB-KDC. Design a scheme that uses the KDC to distribute KS to Alice and Bob. Your scheme should use three messages to distribute the session key: a message from Alice to the KDC; a message from the KDC to Alice; and finally a message from Alice to Bob. The first message is KA-KDC(A, B). Using the notation KA-KDC, KB-KDC, S, A, and B, answer the following questions.

a. What is the second message?

b. What is the third message?

P11. Compute a third message, different from the two messages in Figure 8.8, that has the same checksum as the messages in Figure 8.8.

P12. Suppose Alice and Bob share two secret keys: an authentication key S1 and a symmetric encryption key S2. Augment Figure 8.9 so that both integrity and confidentiality are provided.

P13. In the BitTorrent P2P file distribution protocol (see Chapter 2), the seed breaks the file into blocks, and the peers redistribute the blocks to each other. Without any protection, an attacker can easily wreak havoc in a torrent by masquerading as a benevolent peer and sending bogus blocks to a small subset of peers in the torrent. These unsuspecting peers then redistribute the bogus blocks to other peers, which in turn redistribute the bogus blocks to even more peers. Thus, it is critical for BitTorrent to have a mechanism that allows a peer to verify the integrity of a block, so that it doesn't redistribute bogus blocks. Assume that when a peer joins a torrent, it initially gets a .torrent file from a fully trusted source. Describe a simple scheme that allows peers to verify the integrity of blocks.

P14. The OSPF routing protocol uses a MAC rather than digital signatures to provide message integrity. Why do you think a MAC was chosen over digital signatures?
P15. Consider our authentication protocol in Figure 8.18 in which Alice authenticates herself to Bob, which we saw works well (i.e., we found no flaws in it). Now suppose that while Alice is authenticating herself to Bob, Bob must authenticate himself to Alice. Give a scenario by which Trudy, pretending to be Alice, can now authenticate herself to Bob as Alice. (Hint: Consider that the sequence of operations of the protocol, one with Trudy initiating and one with Bob initiating, can be arbitrarily interleaved. Pay particular attention to the fact that both Bob and Alice will use a nonce, and that if care is not taken, the same nonce can be used maliciously.)

P16. A natural question is whether we can use a nonce and public key cryptography to solve the end-point authentication problem in Section 8.4. Consider the following natural protocol: (1) Alice sends the message "I am Alice" to Bob. (2) Bob chooses a nonce, R, and sends it to Alice. (3) Alice uses her private key to encrypt the nonce and sends the resulting value to Bob. (4) Bob applies Alice's public key to the received message. Thus, Bob computes R and authenticates Alice.

a. Diagram this protocol, using the notation for public and private keys employed in the textbook.

b. Suppose that certificates are not used. Describe how Trudy can become a "woman-in-the-middle" by intercepting Alice's messages and then pretending to be Alice to Bob.

P17. Figure 8.19 shows the operations that Alice must perform with PGP to provide confidentiality, authentication, and integrity. Diagram the corresponding operations that Bob must perform on the package received from Alice.

P18. Suppose Alice wants to send an e-mail to Bob. Bob has a public-private key pair (K_B^+, K_B^-), and Alice has Bob's certificate. But Alice does not have a public, private key pair. Alice and Bob (and the entire world) share the same hash function H(·).

a. In this situation, is it possible to design a scheme so that Bob can verify that Alice created the message? If so, show how with a block diagram for Alice and Bob.

b. Is it possible to design a scheme that provides confidentiality for sending the message from Alice to Bob? If so, show how with a block diagram for Alice and Bob.

P19. Consider the Wireshark output below for a portion of an SSL session.

a. Is Wireshark packet 112 sent by the client or server?

b. What is the server's IP address and port number?

c. Assuming no loss and no retransmissions, what will be the sequence number of the next TCP segment sent by the client?

d. How many SSL records does Wireshark packet 112 contain?

e. Does packet 112 contain a Master Secret or an Encrypted Master Secret or neither?

f. Assuming that the handshake type field is 1 byte and each length field is 3 bytes, what are the values of the first and last bytes of the Master Secret (or Encrypted Master Secret)?

g. The client encrypted handshake message takes into account how many SSL records?

h. The server encrypted handshake message takes into account how many SSL records?

P20. In Section 8.6.1, it is shown that without sequence numbers, Trudy (a woman-in-the-middle) can wreak havoc in an SSL session by interchanging TCP segments. Can Trudy do something similar by deleting a TCP segment? What does she need to do to succeed at the deletion attack? What effect will it have?

(Wireshark screenshot reprinted by permission of the Wireshark Foundation.)
P21. Suppose Alice and Bob are communicating over an SSL session. Suppose an attacker, who does not have any of the shared keys, inserts a bogus TCP segment into a packet stream with correct TCP checksum and sequence numbers (and correct IP addresses and port numbers). Will SSL at the receiving side accept the bogus packet and pass the payload to the receiving application? Why or why not?

P22. The following true/false questions pertain to Figure 8.28.

a. When a host in 172.16.1/24 sends a datagram to an Amazon.com server, the router R1 will encrypt the datagram using IPsec.

b. When a host in 172.16.1/24 sends a datagram to a host in 172.16.2/24, the router R1 will change the source and destination address of the IP datagram.

c. Suppose a host in 172.16.1/24 initiates a TCP connection to a Web server in 172.16.2/24. As part of this connection, all datagrams sent by R1 will have protocol number 50 in the left-most IPv4 header field.

d. Consider sending a TCP segment from a host in 172.16.1/24 to a host in 172.16.2/24. Suppose the acknowledgment for this segment gets lost, so that TCP resends the segment. Because IPsec uses sequence numbers, R1 will not resend the TCP segment.

P23. Consider the example in Figure 8.28. Suppose Trudy is a woman-in-the-middle, who can insert datagrams into the stream of datagrams going from R1 to R2. As part of a replay attack, Trudy sends a duplicate copy of one of the datagrams sent from R1 to R2. Will R2 decrypt the duplicate datagram and forward it into the branch-office network? If not, describe in detail how R2 detects the duplicate datagram.

P24. Consider the following pseudo-WEP protocol. The key is 4 bits and the IV is 2 bits. The IV is appended to the end of the key when generating the keystream. Suppose that the shared secret key is 1010. The keystreams for the four possible inputs are as follows:

101000: 0010101101010101001011010100100 ...
101001: 1010011011001010110100100101101 ...
101010: 0001101000111100010100101001111 ...
101011: 1111101010000000101010100010111 ...

Suppose all messages are 8 bits long. Suppose the ICV (integrity check) is 4 bits long, and is calculated by XOR-ing the first 4 bits of data with the last 4 bits of data. Suppose the pseudo-WEP packet consists of three fields: first the IV field, then the message field, and last the ICV field, with some of these fields encrypted.

a. We want to send the message m = 10100000 using the IV = 11 and using WEP. What will be the values in the three WEP fields?

b. Show that when the receiver decrypts the WEP packet, it recovers the message and the ICV.

c. Suppose Trudy intercepts a WEP packet (not necessarily with the IV = 11) and wants to modify it before forwarding it to the receiver. Suppose Trudy flips the first ICV bit. Assuming that Trudy does not know the keystreams for any of the IVs, what other bit(s) must Trudy also flip so that the received packet passes the ICV check?

d. Justify your answer by modifying the bits in the WEP packet in part (a), decrypting the resulting packet, and verifying the integrity check.

P25. Provide a filter table and a connection table for a stateful firewall that is as restrictive as possible but accomplishes the following:

a. Allows all internal users to establish Telnet sessions with external hosts.

b. Allows external users to surf the company Web site at 222.22.0.12.

c. But otherwise blocks all inbound and outbound traffic.
The internal network is 222.22/16. In your solution, suppose that the connection table is currently caching three connections, all from inside to outside. You'll need to invent appropriate IP addresses and port numbers.

P26. Suppose Alice wants to visit the Web site activist.com using a TOR-like service. This service uses two non-colluding proxy servers, Proxy1 and Proxy2. Alice first obtains the certificates (each containing a public key) for Proxy1 and Proxy2 from some central server. Denote K1+(), K2+(), K1−(), and K2−() for encryption/decryption with the public and private RSA keys.

a. Using a timing diagram, provide a protocol (as simple as possible) that enables Alice to establish a shared session key S1 with Proxy1. Denote S1(m) for encryption/decryption of data m with the shared key S1.

b. Using a timing diagram, provide a protocol (as simple as possible) that allows Alice to establish a shared session key S2 with Proxy2 without revealing her IP address to Proxy2.

c. Assume now that shared keys S1 and S2 are now established. Using a timing diagram, provide a protocol (as simple as possible and not using public-key cryptography) that allows Alice to request an html page from activist.com without revealing her IP address to Proxy2 and without revealing to Proxy1 which site she is visiting. Your diagram should end with an HTTP request arriving at activist.com.

Wireshark Lab

In this lab (available from the book Web site), we investigate the Secure Sockets Layer (SSL) protocol. Recall from Section 8.6 that SSL is used for securing a TCP connection, and that it is extensively used in practice for secure Internet transactions. In this lab, we will focus on the SSL records sent over the TCP connection. We will attempt to delineate and classify each of the records, with a goal of understanding the why and how for each record. We investigate the various SSL record types as well as the fields in the SSL messages. We do so by analyzing a trace of the SSL records sent between your host and an e-commerce server.

IPsec Lab

In this lab (available from the book Web site), we will explore how to create IPsec SAs between Linux boxes. You can do the first part of the lab with two ordinary Linux boxes, each with one Ethernet adapter. But for the second part of the lab, you will need four Linux boxes, two of which have two Ethernet adapters. In the second half of the lab, you will create IPsec SAs using the ESP protocol in the tunnel mode. You will do this by first manually creating the SAs, and then by having IKE create the SAs.

AN INTERVIEW WITH...

Steven M. Bellovin

Steven M. Bellovin joined the faculty at Columbia University after many years at the Network Services Research Lab at AT&T Labs Research in Florham Park, New Jersey. His focus is on networks, security, and why the two are incompatible. In 1995, he was awarded the Usenix Lifetime Achievement Award for his work in the creation of Usenet, the first newsgroup exchange network that linked two or more computers and allowed users to share information and join in discussions. Steve is also an elected member of the National Academy of Engineering. He received his BA from Columbia University and his PhD from the University of North Carolina at Chapel Hill.

What led you to specialize in the networking security area?

This is going to sound odd, but the answer is simple: It was fun.
My background +was in systems programming and systems administration, which leads +fairly naturally to security. And I've always been interested in +communications, ranging back to part-time systems programming jobs when +I was in college. My work on security continues to be motivated by two +things---a desire to keep computers useful, which means that their +function can't be corrupted by attackers, and a desire to protect +privacy. What was your vision for Usenet at the time that you were +developing it? And now? We originally viewed it as a way to talk about +computer science and computer programming around the country, with a lot +of local use for administrative matters, for-sale ads, and so on. In +fact, my original prediction was one to two messages per day, from +50--100 sites at the most--- ever. But the real growth was in +people-related topics, including---but not limited to---human +interactions with computers. My favorite newsgroups, over the years, +have been things like rec.woodworking, as well as sci.crypt. To some +extent, netnews has been displaced by the Web. Were I to start designing +it today, it would look very different. But it still excels as a way to +reach a very broad audience that is interested in the topic, without +having to rely on particular Web sites. Has anyone inspired you +professionally? In what ways? Professor Fred Brooks---the founder and +original chair of the computer science department at the University of +North Carolina at Chapel Hill, the manager of the team that developed +the IBM S/360 and OS/360, and the author of The Mythical Man-Month---was +a tremendous influence on my career. More than anything else, he taught +outlook and trade-offs---how to look at problems in the context of the +real world (and how much messier the real world is than a theorist would +like), and how to balance competing interests in designing a solution. +Most computer work is engineering---the art of making the right +trade-offs to satisfy many contradictory objectives. What is your vision +for the future of networking and security? Thus far, much of the +security we have has come from isolation. A firewall, for example, works +by cutting off access to certain machines and services. But we're in an +era of increasing connectivity---it's gotten harder to isolate things. +Worse yet, our production systems require far more separate pieces, +interconnected by networks. Securing all that is one of our biggest +challenges. + +What would you say have been the greatest advances in security? How much +further do we have to go? At least scientifically, we know how to do +cryptography. That's been a big help. But most security problems are due +to buggy code, and that's a much harder problem. In fact, it's the +oldest unsolved problem in computer science, and I think it will remain +that way. The challenge is figuring out how to secure systems when we +have to build them out of insecure components. We can already do that +for reliability in the face of hardware failures; can we do the same for +security? Do you have any advice for students about the Internet and +networking security? Learning the mechanisms is the easy part. Learning +how to "think paranoid" is harder. You have to remember that probability +distributions don't apply---the attackers can and will find improbable +conditions. And the details matter---a lot. 
Chapter 9 Multimedia Networking

While lounging in bed or riding buses and subways, people in all corners of the world are currently using the Internet to watch movies and television shows on demand. Internet movie and television distribution companies such as Netflix and Amazon in North America and Youku and Kankan in China have practically become household names. But people are not only watching Internet videos; they are also using sites like YouTube to upload and distribute their own user-generated content, becoming Internet video producers as well as consumers. Moreover, network applications such as Skype, Google Talk, and WeChat (enormously popular in China) allow people to not only make "telephone calls" over the Internet, but to also enhance those calls with video and multi-person conferencing. In fact, we predict that by the end of the current decade most of the video consumption and voice conversations will take place end-to-end over the Internet, most often to wireless devices connected to the Internet via cellular and WiFi access networks. Traditional telephony and broadcast television are quickly becoming obsolete. We begin this chapter with a taxonomy of multimedia applications in Section 9.1. We'll see that a multimedia application can be classified as either streaming stored audio/video, conversational voice/video-over-IP, or streaming live audio/video. We'll see that each of these classes of applications has its own unique service requirements that differ significantly from those of traditional elastic applications such as e-mail, Web browsing, and remote login. In Section 9.2, we'll examine video streaming in some detail. We'll explore many of the underlying principles behind video streaming, including client buffering, prefetching, and adapting video quality to available bandwidth. In Section 9.3, we investigate conversational voice and video, which, unlike elastic applications, are highly sensitive to end-to-end delay but can tolerate occasional loss of data. Here we'll examine how techniques such as adaptive playout, forward error correction, and error concealment can mitigate against network-induced packet loss and delay. We'll also examine Skype as a case study. In Section 9.4, we'll study RTP and SIP, two popular protocols for real-time conversational voice and video applications. In Section 9.5, we'll investigate mechanisms within the network that can be used to distinguish one class of traffic (e.g., delay-sensitive applications such as conversational voice) from another (e.g., elastic applications such as browsing Web pages), and provide differentiated service among multiple classes of traffic.

9.1 Multimedia Networking Applications We define a multimedia network application as any network application that employs audio or video. In this section, we provide a taxonomy of multimedia applications. We'll see that each class of applications in the taxonomy has its own unique set of service requirements and design issues. But before diving into an in-depth discussion of Internet multimedia applications, it is useful to consider the intrinsic characteristics of the audio and video media themselves.

9.1.1 Properties of Video Perhaps the most salient characteristic of video is its high bit rate. Video distributed over the Internet typically ranges from 100 kbps for low-quality video conferencing to over 3 Mbps for streaming high-definition movies.
To get a sense of how video bandwidth demands compare with those of other Internet applications, let's briefly consider three different users, each using a different Internet application. Our first user, Frank, is going quickly through photos posted on his friends' Facebook pages. Let's assume that Frank is looking at a new photo every 10 seconds, and that photos are on average 200 Kbytes in size. (As usual, throughout this discussion we make the simplifying assumption that 1 Kbyte=8,000 bits.) Our second user, Martha, is streaming music from the Internet ("the cloud") to her smartphone. Let's assume Martha is using a service such as Spotify to listen to many MP3 songs, one after the other, each encoded at a rate of 128 kbps. Our third user, Victor, is watching a video that has been encoded at 2 Mbps. Finally, let's suppose that the session length for all three users is 4,000 seconds (approximately 67 minutes). Table 9.1 compares the bit rates and the total bytes transferred for these three users.

Table 9.1 Comparison of bit-rate requirements of three Internet applications

|                | Bit rate | Bytes transferred in 67 min |
|----------------|----------|-----------------------------|
| Facebook Frank | 160 kbps | 80 Mbytes                   |
| Martha Music   | 128 kbps | 64 Mbytes                   |
| Victor Video   | 2 Mbps   | 1 Gbyte                     |

We see that video streaming consumes by far the most bandwidth, with a bit rate more than ten times greater than that of the Facebook and music-streaming applications. Therefore, when designing networked video applications, the first thing we must keep in mind is the high bit-rate requirements of video. Given the popularity of video and its high bit rate, it is perhaps not surprising that Cisco predicts \[Cisco 2015\] that streaming and stored video will be approximately 80 percent of global consumer Internet traffic by 2019. Another important characteristic of video is that it can be compressed, thereby trading off video quality with bit rate. A video is a sequence of images, typically being displayed at a constant rate, for example, at 24 or 30 images per second. An uncompressed, digitally encoded image consists of an array of pixels, with each pixel encoded into a number of bits to represent luminance and color. There are two types of redundancy in video, both of which can be exploited by video compression. Spatial redundancy is the redundancy within a given image. Intuitively, an image that consists of mostly white space has a high degree of redundancy and can be efficiently compressed without significantly sacrificing image quality. Temporal redundancy reflects repetition from image to subsequent image. If, for example, an image and the subsequent image are exactly the same, there is no reason to re-encode the subsequent image; it is instead more efficient simply to indicate during encoding that the subsequent image is exactly the same. Today's off-the-shelf compression algorithms can compress a video to essentially any bit rate desired. Of course, the higher the bit rate, the better the image quality and the better the overall user viewing experience. We can also use compression to create multiple versions of the same video, each at a different quality level. For example, we can use compression to create, say, three versions of the same video, at rates of 300 kbps, 1 Mbps, and 3 Mbps. Users can then decide which version they want to watch as a function of their current available bandwidth.
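To make the idea concrete, here is a small sketch of one plausible version-selection rule (the rule, the 80 percent headroom factor, and the function name are our own illustrative inventions, not any particular player's algorithm): pick the highest-rate version that fits comfortably within the bandwidth the client currently measures.

```python
# Hypothetical version-selection rule: choose the highest-rate encoding
# that fits within the measured bandwidth, leaving headroom for fluctuations.
VERSION_RATES_KBPS = [300, 1000, 3000]  # the three versions from the example

def select_version(measured_kbps, headroom=0.8):
    affordable = [r for r in VERSION_RATES_KBPS
                  if r <= headroom * measured_kbps]
    return max(affordable) if affordable else min(VERSION_RATES_KBPS)

print(select_version(10_000))  # fast broadband link: picks the 3 Mbps version
print(select_version(500))     # slow 3G link: picks the 300 kbps version
```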
Users with high-speed Internet connections might choose the 3 Mbps version; users watching the video over 3G with a smartphone might choose the 300 kbps version. Similarly, the video in a video conference application can be compressed "on-the-fly" to provide the best video quality given the available end-to-end bandwidth between conversing users.

9.1.2 Properties of Audio Digital audio (including digitized speech and music) has significantly lower bandwidth requirements than video. Digital audio, however, has its own unique properties that must be considered when designing multimedia network applications. To understand these properties, let's first consider how analog audio (which humans and musical instruments generate) is converted to a digital signal: The analog audio signal is sampled at some fixed rate, for example, at 8,000 samples per second. The value of each sample will be some real number. Each of the samples is then rounded to one of a finite number of values. This operation is referred to as quantization. The number of such finite values---called quantization values---is typically a power of two, for example, 256 quantization values. Each of the quantization values is represented by a fixed number of bits. For example, if there are 256 quantization values, then each value---and hence each audio sample---is represented by one byte. The bit representations of all the samples are then concatenated together to form the digital representation of the signal. As an example, if an analog audio signal is sampled at 8,000 samples per second and each sample is quantized and represented by 8 bits, then the resulting digital signal will have a rate of 64,000 bits per second. For playback through audio speakers, the digital signal can then be converted back---that is, decoded---to an analog signal. However, the decoded analog signal is only an approximation of the original signal, and the sound quality may be noticeably degraded (for example, high-frequency sounds may be missing in the decoded signal). By increasing the sampling rate and the number of quantization values, the decoded signal can better approximate the original analog signal. Thus (as with video), there is a trade-off between the quality of the decoded signal and the bit-rate and storage requirements of the digital signal. The basic encoding technique that we just described is called pulse code modulation (PCM). Speech encoding often uses PCM, with a sampling rate of 8,000 samples per second and 8 bits per sample, resulting in a rate of 64 kbps. The audio compact disk (CD) also uses PCM, with a sampling rate of 44,100 samples per second and 16 bits per sample; this gives a rate of 705.6 kbps for mono and 1.411 Mbps for stereo. PCM-encoded speech and music, however, are rarely used in the Internet. Instead, as with video, compression techniques are used to reduce the bit rates of the stream. Human speech can be compressed to less than 10 kbps and still be intelligible. A popular compression technique for near CD-quality stereo music is MPEG 1 layer 3, more commonly known as MP3. MP3 encoders can compress to many different rates; 128 kbps is the most common encoding rate and produces very little sound degradation. A related standard is Advanced Audio Coding (AAC), which has been popularized by Apple. As with video, multiple versions of a prerecorded audio stream can be created, each at a different bit rate.
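The PCM arithmetic above is simple enough to capture in a few lines of Python (a sketch of our own; the function name is made up), reproducing the rates just quoted:

```python
def pcm_bit_rate(samples_per_sec, bits_per_sample, channels=1):
    # Uncompressed PCM rate: sampling rate x bits per sample x channels.
    return samples_per_sec * bits_per_sample * channels

print(pcm_bit_rate(8_000, 8))                # speech: 64,000 bps = 64 kbps
print(pcm_bit_rate(44_100, 16))              # CD mono: 705,600 bps = 705.6 kbps
print(pcm_bit_rate(44_100, 16, channels=2))  # CD stereo: 1,411,200 bps = 1.411 Mbps
```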
Although audio bit rates are generally much less than those of video, users are generally much more sensitive to audio glitches than video glitches. Consider, for example, a video conference taking place over the Internet. If, from time to time, the video signal is lost for a few seconds, the video conference can likely proceed without too much user frustration. If, however, the audio signal is frequently lost, the users may have to terminate the session.

9.1.3 Types of Multimedia Network Applications The Internet supports a large variety of useful and entertaining multimedia applications. In this subsection, we classify multimedia applications into three broad categories: (i) streaming stored audio/video, (ii) conversational voice/video-over-IP, and (iii) streaming live audio/video. As we will soon see, each of these application categories has its own set of service requirements and design issues. Streaming Stored Audio and Video To keep the discussion concrete, we focus here on streaming stored video, which typically combines video and audio components. Streaming stored audio (such as Spotify's streaming music service) is very similar to streaming stored video, although the bit rates are typically much lower. In this class of applications, the underlying medium is prerecorded video, such as a movie, a television show, a prerecorded sporting event, or a prerecorded user-generated video (such as those commonly seen on YouTube). These prerecorded videos are placed on servers, and users send requests to the servers to view the videos on demand. Many Internet companies today provide streaming video, including YouTube (Google), Netflix, Amazon, and Hulu. Streaming stored video has three key distinguishing features.

Streaming. In a streaming stored video application, the client typically begins video playout within a few seconds after it begins receiving the video from the server. This means that the client will be playing out from one location in the video while at the same time receiving later parts of the video from the server. This technique, known as streaming, avoids having to download the entire video file (and incurring a potentially long delay) before playout begins.

Interactivity. Because the media is prerecorded, the user may pause, reposition forward, reposition backward, fast-forward, and so on through the video content. The time from when the user makes such a request until the action manifests itself at the client should be less than a few seconds for acceptable responsiveness.

Continuous playout. Once playout of the video begins, it should proceed according to the original timing of the recording. Therefore, data must be received from the server in time for its playout at the client; otherwise, users experience video frame freezing (when the client waits for the delayed frames) or frame skipping (when the client skips over delayed frames).

By far, the most important performance measure for streaming video is average throughput. In order to provide continuous playout, the network must provide an average throughput to the streaming application that is at least as large as the bit rate of the video itself. As we will see in Section 9.2, by using buffering and prefetching, it is possible to provide continuous playout even when the throughput fluctuates, as long as the average throughput (averaged over 5--10 seconds) remains above the video rate \[Wang 2008\].
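To see how buffering makes this work, consider the following toy simulation (entirely our own construction, anticipating the analysis of Section 9.2; it assumes a constant server send rate x, ignores TCP dynamics and any cap on the buffer size, and steps in one-second increments): the client buffers incoming bits, begins playout once Q bits have arrived, drains the buffer at the video consumption rate r, and rebuffers whenever the buffer runs dry.

```python
def simulate_playout(x_bps, r_bps, Q_bits, duration_s):
    # Fill at x; start playing once Q bits are buffered; drain at r while
    # playing; on an empty buffer, freeze and rebuffer Q bits before resuming.
    buffered, playing, start_delay, freezes = 0.0, False, None, 0
    for t in range(duration_s):
        buffered += x_bps
        if not playing and buffered >= Q_bits:
            playing = True
            if start_delay is None:
                start_delay = t + 1  # close to the analytical value Q/x
        if playing and buffered >= r_bps:
            buffered -= r_bps        # one second of video played out
        elif playing:
            playing, freezes = False, freezes + 1  # buffer ran dry: freeze
    print(f"initial delay ~{start_delay} s, {freezes} freeze(s) in {duration_s} s")

simulate_playout(x_bps=1.5e6, r_bps=1e6, Q_bits=4e6, duration_s=60)  # x > r
simulate_playout(x_bps=0.8e6, r_bps=1e6, Q_bits=4e6, duration_s=60)  # x < r
```

When x exceeds r, playout never stalls after the initial buffering delay; when x is below r, playout alternates between playing and rebuffering, which is exactly the freezing behavior analyzed in Section 9.2.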
For many streaming video applications, prerecorded video is stored on, and streamed from, a CDN rather than from a single data center. There are also many P2P video streaming applications for which the video is stored on users' hosts (peers), with different chunks of video arriving from different peers that may be spread around the globe. Given the prominence of Internet video streaming, we will explore video streaming in some depth in Section 9.2, paying particular attention to client buffering, prefetching, adapting quality to bandwidth availability, and CDN distribution. Conversational Voice- and Video-over-IP Real-time conversational voice over the Internet is often referred to as Internet telephony, since, from the user's perspective, it is similar to the traditional circuit-switched telephone service. It is also commonly called Voice-over-IP (VoIP). Conversational video is similar, except that it includes the video of the participants as well as their voices. Most of today's voice and video conversational systems allow users to create conferences with three or more participants. Conversational voice and video are widely used in the Internet today, with the Internet companies Skype, QQ, and Google Talk boasting hundreds of millions of daily users. In our discussion of application service requirements in Chapter 2 (Figure 2.4), we identified a number of axes along which application requirements can be classified. Two of these axes---timing considerations and tolerance of data loss---are particularly important for conversational voice and video applications. Timing considerations are important because audio and video conversational applications are highly delay-sensitive. For a conversation with two or more interacting speakers, the delay from when a user speaks or moves until the action is manifested at the other end should be less than a few hundred milliseconds. For voice, delays smaller than 150 milliseconds are not perceived by a human listener, delays between 150 and 400 milliseconds can be acceptable, and delays exceeding 400 milliseconds can result in frustrating, if not completely unintelligible, voice conversations. On the other hand, conversational multimedia applications are loss-tolerant---occasional loss only causes occasional glitches in audio/video playback, and these losses can often be partially or fully concealed. These delay-sensitive but loss-tolerant characteristics are clearly different from those of elastic data applications such as Web browsing, e-mail, social networks, and remote login. For elastic applications, long delays are annoying but not particularly harmful; the completeness and integrity of the transferred data, however, are of paramount importance. We will explore conversational voice and video in more depth in Section 9.3, paying particular attention to how adaptive playout, forward error correction, and error concealment can mitigate against network-induced packet loss and delay. Streaming Live Audio and Video This third class of applications is similar to traditional broadcast radio and television, except that transmission takes place over the Internet. These applications allow a user to receive a live radio or television transmission---such as a live sporting event or an ongoing news event---transmitted from any corner of the world. Today, thousands of radio and television stations around the world are broadcasting content over the Internet.
Live, broadcast-like applications often have many users who receive the same audio/video program at the same time. In the Internet today, this is typically done with CDNs (Section 2.6). As with streaming stored multimedia, the network must provide each live multimedia flow with an average throughput that is larger than the video consumption rate. Because the event is live, delay can also be an issue, although the timing constraints are much less stringent than those for conversational voice. Delays of up to ten seconds or so from when the user chooses to view a live transmission to when playout begins can be tolerated. We will not cover streaming live media in this book because many of the techniques used for streaming live media---initial buffering delay, adaptive bandwidth use, and CDN distribution---are similar to those for streaming stored media.

9.2 Streaming Stored Video For streaming video applications, prerecorded videos are placed on servers, and users send requests to these servers to view the videos on demand. The user may watch the video from beginning to end without interruption, may stop watching the video well before it ends, or may interact with the video by pausing or repositioning to a future or past scene. Streaming video systems can be classified into three categories: UDP streaming, HTTP streaming, and adaptive HTTP streaming (see Section 2.6). Although all three types of systems are used in practice, the majority of today's systems employ HTTP streaming and adaptive HTTP streaming. A common characteristic of all three forms of video streaming is the extensive use of client-side application buffering to mitigate the effects of varying end-to-end delays and varying amounts of available bandwidth between server and client. For streaming video (both stored and live), users generally can tolerate a small several-second initial delay between when the client requests a video and when video playout begins at the client. Consequently, when the video starts to arrive at the client, the client need not immediately begin playout, but can instead build up a reserve of video in an application buffer. Once the client has built up a reserve of several seconds of buffered-but-not-yet-played video, the client can then begin video playout. There are two important advantages provided by such client buffering. First, client-side buffering can absorb variations in server-to-client delay. If a particular piece of video data is delayed, as long as it arrives before the reserve of received-but-not-yet-played video is exhausted, this long delay will not be noticed. Second, if the server-to-client bandwidth briefly drops below the video consumption rate, a user can continue to enjoy continuous playback, again as long as the client application buffer does not become completely drained. Figure 9.1 illustrates client-side buffering. In this simple example, suppose that video is encoded at a fixed bit rate, and thus each video block contains video frames that are to be played out over the same fixed amount of time, Δ. The server transmits the first video block at t0, the second block at t0+Δ, the third block at t0+2Δ, and so on. Once the client begins playout, each block should be played out Δ time units after the previous block in order to reproduce the timing of the original recorded video. Because of the variable end-to-end network delays, different video blocks experience different delays.
The first video block arrives at the client at t1 and the second block arrives at t2. The network delay for the ith block is the horizontal distance between the time the block was transmitted by the server and the time it is received at the client; note that the network delay varies from one video block to another. In this example, if the client were to begin playout as soon as the first block arrived at t1, then the second block would not have arrived in time to be played out at t1+Δ. In this case, video playout would either have to stall (waiting for block 2 to arrive) or block 2 could be skipped---both resulting in undesirable playout impairments. Instead, if the client were to delay the start of playout until t3, when blocks 1 through 6 have all arrived, periodic playout can proceed with all blocks having been received before their playout time.

Figure 9.1 Client playout delay in video streaming

9.2.1 UDP Streaming We only briefly discuss UDP streaming here, referring the reader to more in-depth discussions of the protocols behind these systems where appropriate. With UDP streaming, the server transmits video at a rate that matches the client's video consumption rate by clocking out the video chunks over UDP at a steady rate. For example, if the video consumption rate is 2 Mbps and each UDP packet carries 8,000 bits of video, then the server would transmit one UDP packet into its socket every (8,000 bits)/(2 Mbps)=4 msec. As we learned in Chapter 3, because UDP does not employ a congestion-control mechanism, the server can push packets into the network at the consumption rate of the video without the rate-control restrictions of TCP. UDP streaming typically uses a small client-side buffer, big enough to hold less than a second of video. Before passing the video chunks to UDP, the server will encapsulate the video chunks within transport packets specially designed for transporting audio and video, using the Real-Time Transport Protocol (RTP) \[RFC 3550\] or a similar (possibly proprietary) scheme. We delay our coverage of RTP until Section 9.3, where we discuss RTP in the context of conversational voice and video systems. Another distinguishing property of UDP streaming is that in addition to the server-to-client video stream, the client and server also maintain, in parallel, a separate control connection over which the client sends commands regarding session state changes (such as pause, resume, reposition, and so on). The Real-Time Streaming Protocol (RTSP) \[RFC 2326\], explained in some detail in the Web site for this textbook, is a popular open protocol for such a control connection. Although UDP streaming has been employed in many open-source systems and proprietary products, it suffers from three significant drawbacks. First, due to the unpredictable and varying amount of available bandwidth between server and client, constant-rate UDP streaming can fail to provide continuous playout. For example, consider the scenario where the video consumption rate is 1 Mbps and the server-to-client available bandwidth is usually more than 1 Mbps, but every few minutes the available bandwidth drops below 1 Mbps for several seconds. In such a scenario, a UDP streaming system that transmits video at a constant rate of 1 Mbps over RTP/UDP would likely provide a poor user experience, with freezing or skipped frames soon after the available bandwidth falls below 1 Mbps.
The second drawback of UDP streaming is that it requires a media control server, such as an RTSP server, to process client-to-server interactivity requests and to track client state (e.g., the client's playout point in the video, whether the video is being paused or played, and so on) for each ongoing client session. This increases the overall cost and complexity of deploying a large-scale video-on-demand system. The third drawback is that many firewalls are configured to block UDP traffic, preventing the users behind these firewalls from receiving UDP video.

9.2.2 HTTP Streaming In HTTP streaming, the video is simply stored in an HTTP server as an ordinary file with a specific URL. When a user wants to see the video, the client establishes a TCP connection with the server and issues an HTTP GET request for that URL. The server then sends the video file, within an HTTP response message, as quickly as possible, that is, as quickly as TCP congestion control and flow control will allow. On the client side, the bytes are collected in a client application buffer. Once the number of bytes in this buffer exceeds a predetermined threshold, the client application begins playback---specifically, it periodically grabs video frames from the client application buffer, decompresses the frames, and displays them on the user's screen. We learned in Chapter 3 that when transferring a file over TCP, the server-to-client transmission rate can vary significantly due to TCP's congestion control mechanism. In particular, it is not uncommon for the transmission rate to vary in a "saw-tooth" manner associated with TCP congestion control. Furthermore, packets can also be significantly delayed due to TCP's retransmission mechanism. Because of these characteristics of TCP, the conventional wisdom in the 1990s was that video streaming would never work well over TCP. Over time, however, designers of streaming video systems learned that TCP's congestion control and reliable-data transfer mechanisms do not necessarily preclude continuous playout when client buffering and prefetching (discussed in the next section) are used.

The use of HTTP over TCP also allows the video to traverse firewalls and NATs more easily (which are often configured to block most UDP traffic but to allow most HTTP traffic). Streaming over HTTP also obviates the need for a media control server, such as an RTSP server, reducing the cost of a large-scale deployment over the Internet. Due to all of these advantages, most video streaming applications today---including YouTube and Netflix---use HTTP streaming (over TCP) as their underlying streaming protocol. Prefetching Video As we just learned, client-side buffering can be used to mitigate the effects of varying end-to-end delays and varying available bandwidth. In our earlier example in Figure 9.1, the server transmits video at the rate at which the video is to be played out. However, for streaming stored video, the client can attempt to download the video at a rate higher than the consumption rate, thereby prefetching video frames that are to be consumed in the future. This prefetched video is naturally stored in the client application buffer. Such prefetching occurs naturally with TCP streaming, since TCP's congestion avoidance mechanism will attempt to use all of the available bandwidth between server and client. To gain some insight into prefetching, let's take a look at a simple example.
Suppose the video consumption rate is 1 Mbps but the network is capable of delivering the video from server to client at a constant rate of 1.5 Mbps. Then the client will not only be able to play out the video with a very small playout delay, but will also be able to increase the amount of buffered video data by 500 Kbits every second. In this manner, if in the future the client receives data at a rate of less than 1 Mbps for a brief period of time, the client will be able to continue to provide continuous playback due to the reserve in its buffer. \[Wang 2008\] shows that when the average TCP throughput is roughly twice the media bit rate, streaming over TCP results in minimal starvation and low buffering delays. Client Application Buffer and TCP Buffers Figure 9.2 illustrates the interaction between client and server for HTTP streaming. At the server side, the portion of the video file in white has already been sent into the server's socket, while the darkened portion is what remains to be sent. After "passing through the socket door," the bytes are placed in the TCP send buffer before being transmitted into the Internet, as described in Chapter 3. In Figure 9.2, because the TCP send buffer at the server side is shown to be full, the server is momentarily prevented from sending more bytes from the video file into the socket. On the client side, the client application (media player) reads bytes from the TCP receive buffer (through its client socket) and places the bytes into the client application buffer. At the same time, the client application periodically grabs video frames from the client application buffer, decompresses the frames, and displays them on the user's screen. Note that if the client application buffer is larger than the video file, then the whole process of moving bytes from the server's storage to the client's application buffer is equivalent to an ordinary file download over HTTP---the client simply pulls the video off the server as fast as TCP will allow!

Figure 9.2 Streaming stored video over HTTP/TCP

Consider now what happens when the user pauses the video during the streaming process. During the pause period, bits are not removed from the client application buffer, even though bits continue to enter the buffer from the server. If the client application buffer is finite, it may eventually become full, which will cause "back pressure" all the way back to the server. Specifically, once the client application buffer becomes full, bytes can no longer be removed from the client TCP receive buffer, so it too becomes full. Once the client TCP receive buffer becomes full, bytes can no longer be removed from the server TCP send buffer, so it also becomes full. Once the TCP send buffer becomes full, the server cannot send any more bytes into the socket. Thus, if the user pauses the video, the server may be forced to stop transmitting, in which case the server will be blocked until the user resumes the video. In fact, even during regular playback (that is, without pausing), if the client application buffer becomes full, back pressure will cause the TCP buffers to become full, which will force the server to reduce its rate. To determine the resulting rate, note that when the client application removes f bits, it creates room for f bits in the client application buffer, which in turn allows the server to send f additional bits. Thus, the server send rate can be no higher than the video consumption rate at the client.
Therefore, a full client application buffer indirectly imposes a limit on the rate that video can be sent from server to client when streaming over HTTP. Analysis of Video Streaming Some simple modeling will provide more insight into initial playout delay and freezing due to application buffer depletion. As shown in Figure 9.3, let B denote the size (in bits) of the client's application buffer, and let Q denote the number of bits that must be buffered before the client application begins playout. (Of course, Q\<B.) Let r denote the video consumption rate---the rate at which the client draws bits out of the client application buffer during playback. So, for example, if the video's frame rate is 30 frames/sec, and each (compressed) frame is 100,000 bits, then r=3 Mbps. To see the forest through the trees, we'll ignore TCP's send and receive buffers.

Figure 9.3 Analysis of client-side buffering for video streaming

Let's assume that the server sends bits at a constant rate x whenever the client buffer is not full. (This is a gross simplification, since TCP's send rate varies due to congestion control; we'll examine more realistic time-dependent rates x(t) in the problems at the end of this chapter.) Suppose at time t=0, the application buffer is empty and video begins arriving to the client application buffer. We now ask at what time t=tp does playout begin? And while we are at it, at what time t=tf does the client application buffer become full? First, let's determine tp, the time when Q bits have entered the application buffer and playout begins. Recall that bits arrive to the client application buffer at rate x and no bits are removed from this buffer before playout begins. Thus, the amount of time required to build up Q bits (the initial buffering delay) is tp=Q/x. Now let's determine tf, the point in time when the client application buffer becomes full. We first observe that if x\<r (that is, if the server send rate is less than the video consumption rate), then the client buffer will never become full! Indeed, starting at time tp, the buffer will be depleted at rate r and will only be filled at rate x\<r. Eventually the client buffer will empty out entirely, at which time the video will freeze on the screen while the client buffer waits another tp seconds to build up Q bits of video. Thus, when the available rate in the network is less than the video rate, playout will alternate between periods of continuous playout and periods of freezing. In a homework problem, you will be asked to determine the length of each continuous playout and freezing period as a function of Q, r, and x. Now let's determine tf for when x\>r. In this case, starting at time tp, the buffer increases from Q to B at rate x−r since bits are being depleted at rate r but are arriving at rate x, as shown in Figure 9.3. Given these hints, you will be asked in a homework problem to determine tf, the time the client buffer becomes full. Note that when the available rate in the network is more than the video rate, after the initial buffering delay, the user will enjoy continuous playout until the video ends. Early Termination and Repositioning the Video HTTP streaming systems often make use of the HTTP byte-range header in the HTTP GET request message, which specifies the specific range of bytes the client currently wants to retrieve from the desired video.
This is particularly useful when the user wants to reposition (that is, jump) to a future point in time in the video. When the user repositions to a new position, the client sends a new HTTP request, indicating with the byte-range header the byte in the file from which the server should send data. When the server receives the new HTTP request, it can forget about any earlier request and instead send bytes beginning with the byte indicated in the byte-range request. While we are on the subject of repositioning, we briefly mention that when a user repositions to a future point in the video or terminates the video early, some prefetched-but-not-yet-viewed data transmitted by the server will go unwatched---a waste of network bandwidth and server resources. For example, suppose that the client buffer is full with B bits at some time t0 into the video, and at this time the user repositions to some instant t\>t0+B/r into the video, and then watches the video to completion from that point on. In this case, all B bits in the buffer will be unwatched and the bandwidth and server resources that were used to transmit those B bits have been completely wasted. There is significant wasted bandwidth in the Internet due to early termination, which can be quite costly, particularly for wireless links \[Ihm 2011\]. For this reason, many streaming systems use only a moderate-size client application buffer, or will limit the amount of prefetched video using the byte-range header in HTTP requests \[Rao 2011\]. Repositioning and early termination are analogous to cooking a large meal, eating only a portion of it, and throwing the rest away, thereby wasting food. So the next time your parents criticize you for wasting food by not eating all your dinner, you can quickly retort by saying they are wasting bandwidth and server resources when they reposition while watching movies over the Internet! But, of course, two wrongs do not make a right---both food and bandwidth are not to be wasted! In Sections 9.2.1 and 9.2.2, we covered UDP streaming and HTTP streaming, respectively. A third type of streaming is Dynamic Adaptive Streaming over HTTP (DASH), which uses multiple versions of the video, each compressed at a different rate. DASH is discussed in detail in Section 2.6.2. CDNs are often used to distribute stored and live video. CDNs are discussed in detail in Section 2.6.3.

9.3 Voice-over-IP Real-time conversational voice over the Internet is often referred to as Internet telephony, since, from the user's perspective, it is similar to the traditional circuit-switched telephone service. It is also commonly called Voice-over-IP (VoIP). In this section we describe the principles and protocols underlying VoIP. Conversational video is similar in many respects to VoIP, except that it includes the video of the participants as well as their voices. To keep the discussion focused and concrete, we focus here only on voice in this section rather than combined voice and video.

9.3.1 Limitations of the Best-Effort IP Service The Internet's network-layer protocol, IP, provides best-effort service. That is to say, the service makes its best effort to move each datagram from source to destination as quickly as possible but makes no promises whatsoever about getting the packet to the destination within some delay bound or about a limit on the percentage of packets lost.
The lack of such +guarantees poses significant challenges to the design of real-time +conversational applications, which are acutely sensitive to packet +delay, jitter, and loss. In this section, we'll cover several ways in +which the performance of VoIP over a best-effort network can be +enhanced. Our focus will be on application-layer techniques, that is, +approaches that do not require any changes in the network core or even +in the transport layer at the end hosts. To keep the discussion +concrete, we'll discuss the limitations of best-effort IP service in the +context of a specific VoIP example. The sender generates bytes at a rate +of 8,000 bytes per second; every 20 msecs the sender gathers these bytes +into a chunk. A chunk and a special header (discussed below) are +encapsulated in a UDP segment, via a call to the socket interface. Thus, +the number of bytes in a chunk is (20 msecs)⋅(8,000 bytes/sec)=160 +bytes, and a UDP segment is sent every 20 msecs. If each packet makes it +to the receiver with a constant end-to-end delay, then packets arrive at +the receiver periodically every 20 msecs. In these ideal conditions, the +receiver can simply play back each chunk as soon as it arrives. But +unfortunately, some packets can be lost and most packets will not have +the same end-to-end delay, even in a lightly congested Internet. For +this reason, the receiver must take more care in determining (1) when to +play back a chunk, and (2) what to do with a missing chunk. Packet Loss + +Consider one of the UDP segments generated by our VoIP application. The +UDP segment is encapsulated in an IP datagram. As the datagram wanders +through the network, it passes through router buffers (that is, queues) +while waiting for transmission on outbound links. It is possible that +one or more of the buffers in the path from sender to receiver is full, +in which case the arriving IP datagram may be discarded, never to arrive +at the receiving application. Loss could be eliminated by sending the +packets over TCP (which provides for reliable data transfer) rather than +over UDP. However, retransmission mechanisms are often considered +unacceptable for conversational real-time audio applications such as +VoIP, because they increase end-to-end delay \[Bolot 1996\]. +Furthermore, due to TCP congestion control, packet loss may result in a +reduction of the TCP sender's transmission rate to a rate that is lower +than the receiver's drain rate, possibly leading to buffer starvation. +This can have a severe impact on voice intelligibility at the receiver. +For these reasons, most existing VoIP applications run over UDP by +default. \[Baset 2006\] reports that UDP is used by Skype unless a user +is behind a NAT or firewall that blocks UDP segments (in which case TCP +is used). But losing packets is not necessarily as disastrous as one +might think. Indeed, packet loss rates between 1 and 20 percent can be +tolerated, depending on how voice is encoded and transmitted, and on how +the loss is concealed at the receiver. For example, forward error +correction (FEC) can help conceal packet loss. We'll see below that with +FEC, redundant information is transmitted along with the original +information so that some of the lost original data can be recovered from +the redundant information. 
Nevertheless, if one or more of the links between sender and receiver is severely congested, and packet loss exceeds 10 to 20 percent (for example, on a wireless link), then there is really nothing that can be done to achieve acceptable audio quality. Clearly, best-effort service has its limitations. End-to-End Delay End-to-end delay is the accumulation of transmission, processing, and queuing delays in routers; propagation delays in links; and end-system processing delays. For real-time conversational applications, such as VoIP, end-to-end delays smaller than 150 msecs are not perceived by a human listener; delays between 150 and 400 msecs can be acceptable but are not ideal; and delays exceeding 400 msecs can seriously hinder the interactivity in voice conversations. The receiving side of a VoIP application will typically disregard any packets that are delayed more than a certain threshold, for example, more than 400 msecs. Thus, packets that are delayed by more than the threshold are effectively lost. Packet Jitter A crucial component of end-to-end delay is the varying queuing delays that a packet experiences in the network's routers. Because of these varying delays, the time from when a packet is generated at the source until it is received at the receiver can fluctuate from packet to packet, as shown in Figure 9.1. This phenomenon is called jitter. As an example, consider two consecutive packets in our VoIP application. The sender sends the second packet 20 msecs after sending the first packet. But at the receiver, the spacing between these packets can become greater than 20 msecs. To see this, suppose the first packet arrives at a nearly empty queue at a router, but just before the second packet arrives at the queue a large number of packets from other sources arrive at the same queue. Because the first packet experiences a small queuing delay and the second packet suffers a large queuing delay at this router, the first and second packets become spaced by more than 20 msecs. The spacing between consecutive packets can also become less than 20 msecs. To see this, again consider two consecutive packets. Suppose the first packet joins the end of a queue with a large number of packets, and the second packet arrives at the queue before this first packet is transmitted and before any packets from other sources arrive at the queue. In this case, our two packets find themselves one right after the other in the queue. If the time it takes to transmit a packet on the router's outbound link is less than 20 msecs, then the spacing between first and second packets becomes less than 20 msecs. The situation is analogous to driving cars on roads. Suppose you and your friend are each driving in your own cars from San Diego to Phoenix. Suppose you and your friend have similar driving styles, and that you both drive at 100 km/hour, traffic permitting. If your friend starts out one hour before you, depending on intervening traffic, you may arrive at Phoenix more or less than one hour after your friend. If the receiver ignores the presence of jitter and plays out chunks as soon as they arrive, then the resulting audio quality can easily become unintelligible at the receiver. Fortunately, jitter can often be removed by using sequence numbers, timestamps, and a playout delay, as discussed below.
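Pulling the pieces of our running example together, a minimal sender loop might look like the following sketch (the destination address is a placeholder, and the 12-byte header is our own simplified stand-in for the RTP-style header discussed in Section 9.4): every 20 msecs it sends one 160-byte chunk over UDP, prefixed by the sequence number and timestamp the receiver will need to detect loss and remove jitter.

```python
# Sketch of the VoIP sender from our running example: 160-byte chunks over
# UDP every 20 msecs, each prefixed by a 12-byte header carrying a sequence
# number and a timestamp (a simplified stand-in for an RTP-style header;
# the destination address below is a placeholder).
import socket
import struct
import time

DEST = ("203.0.113.10", 5004)  # hypothetical receiver
CHUNK_BYTES = 160              # 20 msecs of audio at 8,000 bytes/sec

def send_voice(next_chunk, num_chunks):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq in range(num_chunks):
        header = struct.pack("!Id", seq, time.time())  # seq number, timestamp
        sock.sendto(header + next_chunk(CHUNK_BYTES), DEST)
        time.sleep(0.020)                              # one chunk per 20 msecs

send_voice(lambda n: bytes(n), num_chunks=250)  # 5 seconds of dummy silence
```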
9.3.2 Removing Jitter at the Receiver for Audio For our VoIP application, where packets are being generated periodically, the receiver should attempt to provide periodic playout of voice chunks in the presence of random network jitter. This is typically done by combining the following two mechanisms:

Prepending each chunk with a timestamp. The sender stamps each chunk with the time at which the chunk was generated.

Delaying playout of chunks at the receiver. As we saw in our earlier discussion of Figure 9.1, the playout delay of the received audio chunks must be long enough so that most of the packets are received before their scheduled playout times. This playout delay can either be fixed throughout the duration of the audio session or vary adaptively during the audio session lifetime.

We now discuss how these two mechanisms, when combined, can alleviate or even eliminate the effects of jitter. We examine two playback strategies: fixed playout delay and adaptive playout delay.

Fixed Playout Delay With the fixed-delay strategy, the receiver attempts to play out each chunk exactly q msecs after the chunk is generated. So if a chunk is timestamped at the sender at time t, the receiver plays out the chunk at time t+q, assuming the chunk has arrived by that time. Packets that arrive after their scheduled playout times are discarded and considered lost. What is a good choice for q? VoIP can support delays up to about 400 msecs, although a more satisfying conversational experience is achieved with smaller values of q. On the other hand, if q is made much smaller than 400 msecs, then many packets may miss their scheduled playback times due to the network-induced packet jitter. Roughly speaking, if large variations in end-to-end delay are typical, it is preferable to use a large q; on the other hand, if delay is small and variations in delay are also small, it is preferable to use a small q, perhaps less than 150 msecs. The trade-off between the playback delay and packet loss is illustrated in Figure 9.4. The figure shows the times at which packets are generated and played out for a single talk spurt. Two distinct initial playout delays are considered. As shown by the leftmost staircase, the sender generates packets at regular intervals---say, every 20 msecs. The first packet in this talk spurt is received at time r.

Figure 9.4 Packet loss for different fixed playout delays

As shown in the figure, the arrivals of subsequent packets are not evenly spaced due to the network jitter. For the first playout schedule, the fixed initial playout delay is set to p−r. With this schedule, the fourth packet does not arrive by its scheduled playout time, and the receiver considers it lost. For the second playout schedule, the fixed initial playout delay is set to p′−r. For this schedule, all packets arrive before their scheduled playout times, and there is therefore no loss. Adaptive Playout Delay The previous example demonstrates an important delay-loss trade-off that arises when designing a playout strategy with fixed playout delays. By making the initial playout delay large, most packets will make their deadlines and there will therefore be negligible loss; however, for conversational services such as VoIP, long delays can become bothersome if not intolerable. Ideally, we would like the playout delay to be minimized subject to the constraint that the loss be below a few percent.
The natural way to deal with this trade-off is to estimate the network delay and the variance of the network delay, and to adjust the playout delay accordingly at the beginning of each talk spurt. This adaptive adjustment of playout delays at the beginning of the talk spurts will cause the sender's silent periods to be compressed and elongated; however, compression and elongation of silence by a small amount is not noticeable in speech. Following \[Ramjee 1994\], we now describe a generic algorithm that the receiver can use to adaptively adjust its playout delays. To this end, let

ti = the timestamp of the ith packet = the time the packet was generated by the sender

ri = the time packet i is received by the receiver

pi = the time packet i is played at the receiver

The end-to-end network delay of the ith packet is ri−ti. Due to network jitter, this delay will vary from packet to packet. Let di denote an estimate of the average network delay upon reception of the ith packet. This estimate is constructed from the timestamps as follows:

di=(1−u)di−1+u(ri−ti)

where u is a fixed constant (for example, u=0.01). Thus di is a smoothed average of the observed network delays r1−t1,...,ri−ti. The estimate places more weight on the recently observed network delays than on the observed network delays of the distant past. This form of estimate should not be completely unfamiliar; a similar idea is used to estimate round-trip times in TCP, as discussed in Chapter 3. Let vi denote an estimate of the average deviation of the delay from the estimated average delay. This estimate is also constructed from the timestamps:

vi=(1−u)vi−1+u\|ri−ti−di\|

The estimates di and vi are calculated for every packet received, although they are used only to determine the playout point for the first packet in any talk spurt. Once having calculated these estimates, the receiver employs the following algorithm for the playout of packets. If packet i is the first packet of a talk spurt, its playout time, pi, is computed as:

pi=ti+di+Kvi

where K is a positive constant (for example, K=4). The purpose of the Kvi term is to set the playout time far enough into the future so that only a small fraction of the arriving packets in the talk spurt will be lost due to late arrivals. The playout point for any subsequent packet in a talk spurt is computed as an offset from the point in time when the first packet in the talk spurt was played out. In particular, let

qi=pi−ti

be the length of time from when the first packet in the talk spurt is generated until it is played out. If packet j also belongs to this talk spurt, it is played out at time

pj=tj+qi

The algorithm just described makes perfect sense assuming that the receiver can tell whether a packet is the first packet in the talk spurt. This can be done by examining the signal energy in each received packet.

9.3.3 Recovering from Packet Loss We have discussed in some detail how a VoIP application can deal with packet jitter. We now briefly describe several schemes that attempt to preserve acceptable audio quality in the presence of packet loss. Such schemes are called loss recovery schemes. Here we define packet loss in a broad sense: A packet is lost either if it never arrives at the receiver or if it arrives after its scheduled playout time. Our VoIP example will again serve as a context for describing loss recovery schemes.
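Before turning to loss recovery, note that the adaptive playout estimator above takes only a few lines of code. The sketch below is our own rendering of the \[Ramjee 1994\] update rules, with the example values u=0.01 and K=4 from the text; packet handling and clocks are left abstract.

```python
# The two smoothed estimates of Section 9.3.2: d tracks the average network
# delay and v the average delay deviation; the first packet of a talk spurt
# is scheduled at t_i + d + K*v, and later packets in the spurt reuse that
# packet's offset q = p_i - t_i.
class AdaptivePlayout:
    def __init__(self, u=0.01, K=4):
        self.u, self.K = u, K
        self.d = 0.0  # estimate of the average network delay
        self.v = 0.0  # estimate of the average delay deviation

    def on_packet(self, t_i, r_i):
        delay = r_i - t_i  # observed end-to-end delay of packet i
        self.d = (1 - self.u) * self.d + self.u * delay
        self.v = (1 - self.u) * self.v + self.u * abs(delay - self.d)

    def spurt_start_playout(self, t_i):
        return t_i + self.d + self.K * self.v  # playout time p_i
```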
As mentioned at the beginning of this section, retransmitting lost packets may not be feasible in a real-time conversational application such as VoIP. Indeed, retransmitting a packet that has missed its playout deadline serves absolutely no purpose. And retransmitting a packet that overflowed a router queue cannot normally be accomplished quickly enough. Because of these considerations, VoIP applications often use some type of loss anticipation scheme. Two types of loss anticipation schemes are forward error correction (FEC) and interleaving.

Forward Error Correction (FEC) The basic idea of FEC is to add redundant information to the original packet stream. For the cost of marginally increasing the transmission rate, the redundant information can be used to reconstruct approximations or exact versions of some of the lost packets. Following \[Bolot 1996\] and \[Perkins 1998\], we now outline two simple FEC mechanisms. The first mechanism sends a redundant encoded chunk after every n chunks. The redundant chunk is obtained by exclusive OR-ing the n original chunks \[Shacham 1990\]. In this manner if any one packet of the group of n+1 packets is lost, the receiver can fully reconstruct the lost packet. But if two or more packets in a group are lost, the receiver cannot reconstruct the lost packets. By keeping n+1, the group size, small, a large fraction of the lost packets can be recovered when loss is not excessive. However, the smaller the group size, the greater the relative increase of the transmission rate. In particular, the transmission rate increases by a factor of 1/n, so that, if n=3, then the transmission rate increases by 33 percent. Furthermore, this simple scheme increases the playout delay, as the receiver must wait to receive the entire group of packets before it can begin playout. For more practical details about how FEC works for multimedia transport see \[RFC 5109\]. The second FEC mechanism is to send a lower-resolution audio stream as the redundant information. For example, the sender might create a nominal audio stream and a corresponding low-resolution, low-bit rate audio stream. (The nominal stream could be a PCM encoding at 64 kbps, and the lower-quality stream could be a GSM encoding at 13 kbps.) The low-bit rate stream is referred to as the redundant stream. As shown in Figure 9.5, the sender constructs the nth packet by taking the nth chunk from the nominal stream and appending to it the (n−1)st chunk from the redundant stream. In this manner, whenever there is nonconsecutive packet loss, the receiver can conceal the loss by playing out the low-bit rate encoded chunk that arrives with the subsequent packet. Of course, low-bit rate chunks give lower quality than the nominal chunks. However, a stream of mostly high-quality chunks, occasional low-quality chunks, and no missing chunks gives good overall audio quality. Note that in this scheme, the receiver only has to receive two packets before playback, so that the increased playout delay is small. Furthermore, if the low-bit rate encoding is much less than the nominal encoding, then the marginal increase in the transmission rate will be small. In order to cope with consecutive loss, we can use a simple variation. Instead of appending just the (n−1)st low-bit rate chunk to the nth nominal chunk, the sender can append the (n−1)st and (n−2)nd low-bit rate chunks, or append the (n−1)st and (n−3)rd low-bit rate chunks, and so on.
The second FEC mechanism is to send a lower-resolution audio stream as the redundant information. For example, the sender might create a nominal audio stream and a corresponding low-resolution, low-bit rate audio stream. (The nominal stream could be a PCM encoding at 64 kbps, and the lower-quality stream could be a GSM encoding at 13 kbps.) The low-bit rate stream is referred to as the redundant stream. As shown in Figure 9.5, the sender constructs the nth packet by taking the nth chunk from the nominal stream and appending to it the (n−1)st chunk from the redundant stream. In this manner, whenever there is nonconsecutive packet loss, the receiver can conceal the loss by playing out the low-bit rate encoded chunk that arrives with the subsequent packet. Of course, low-bit rate chunks give lower quality than the nominal chunks. However, a stream of mostly high-quality chunks, occasional low-quality chunks, and no missing chunks gives good overall audio quality. Note that in this scheme, the receiver only has to receive two packets before playback, so the increase in playout delay is small. Furthermore, if the low-bit rate encoding is much less than the nominal encoding, then the marginal increase in the transmission rate will be small. In order to cope with consecutive loss, we can use a simple variation. Instead of appending just the (n−1)st low-bit rate chunk to the nth nominal chunk, the sender can append the (n−1)st and (n−2)nd low-bit rate chunks, or the (n−1)st and (n−3)rd low-bit rate chunks, and so on. By appending more low-bit rate chunks to each nominal chunk, the audio quality at the receiver becomes acceptable for a wider variety of harsh best-effort environments. On the other hand, the additional chunks increase the transmission bandwidth and the playout delay.

Figure 9.5 Piggybacking lower-quality redundant information

Interleaving

As an alternative to redundant transmission, a VoIP application can send interleaved audio. As shown in Figure 9.6, the sender resequences units of audio data before transmission, so that originally adjacent units are separated by a certain distance in the transmitted stream. Interleaving can mitigate the effect of packet losses. If, for example, units are 5 msecs in length and chunks are 20 msecs (that is, four units per chunk), then the first chunk could contain units 1, 5, 9, and 13; the second chunk could contain units 2, 6, 10, and 14; and so on. Figure 9.6 shows that the loss of a single packet from an interleaved stream results in multiple small gaps in the reconstructed stream, as opposed to the single large gap that would occur in a noninterleaved stream. Interleaving can significantly improve the perceived quality of an audio stream \[Perkins 1998\]. It also has low overhead. The obvious disadvantage of interleaving is that it increases latency. This limits its use for conversational applications such as VoIP, although it can perform well for streaming stored audio. A major advantage of interleaving is that it does not increase the bandwidth requirements of a stream.

Figure 9.6 Sending interleaved audio
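The resequencing step itself is only a few lines of code. Below is a minimal sketch using the four-units-per-chunk example from the text; the block-based framing is an assumption for illustration, not something mandated by any standard.

```python
def interleave(units, n):
    """Resequence audio units so that each chunk of n units contains
    every nth unit: with n = 4, the first chunk carries units 0, 4, 8,
    and 12, the second carries 1, 5, 9, and 13, and so on (0-indexed)."""
    chunks = []
    for start in range(0, len(units), n * n):   # one block = n chunks
        block = units[start:start + n * n]
        for c in range(n):
            chunks.append(block[c::n])          # units c, c+n, c+2n, ...
    return chunks

def deinterleave(chunks, n):
    """Receiver: restore the original unit order; a lost chunk then
    shows up as n small gaps rather than one large one."""
    units = []
    for b in range(0, len(chunks), n):          # one block of n chunks
        block = chunks[b:b + n]
        for i in range(max(len(c) for c in block)):
            for c in block:
                if i < len(c):
                    units.append(c[i])
    return units
```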
Error Concealment

Error concealment schemes attempt to produce a replacement for a lost packet that is similar to the original. As discussed in \[Perkins 1998\], this is possible since audio signals, and in particular speech, exhibit large amounts of short-term self-similarity. As such, these techniques work for relatively small loss rates (less than 15 percent) and for small packets (4--40 msecs). When the loss length approaches the length of a phoneme (5--100 msecs), these techniques break down, since whole phonemes may be missed by the listener. Perhaps the simplest form of receiver-based recovery is packet repetition. Packet repetition replaces lost packets with copies of the packets that arrived immediately before the loss. It has low computational complexity and performs reasonably well. Another form of receiver-based recovery is interpolation, which uses audio before and after the loss to interpolate a suitable packet to cover the loss. Interpolation performs somewhat better than packet repetition but is significantly more computationally intensive \[Perkins 1998\].

9.3.4 Case Study: VoIP with Skype

Skype is an immensely popular VoIP application with over 50 million accounts active on a daily basis. In addition to providing host-to-host VoIP service, Skype offers host-to-phone services, phone-to-host services, and multi-party host-to-host video conferencing services. (Here, a host is again any Internet-connected IP device, including PCs, tablets, and smartphones.) Skype was acquired by Microsoft in 2011. Because the Skype protocol is proprietary, and because all of Skype's control and media packets are encrypted, it is difficult to precisely determine how Skype operates. Nevertheless, from the Skype Web site and several measurement studies, researchers have learned how Skype generally works \[Baset 2006; Guha 2006; Chen 2006; Suh 2006; Ren 2006; Zhang X 2012\].

For both voice and video, the Skype clients have at their disposal many different codecs, which are capable of encoding the media at a wide range of rates and qualities. For example, video rates for Skype have been measured to be as low as 30 kbps for a low-quality session and up to almost 1 Mbps for a high-quality session \[Zhang X 2012\]. Typically, Skype's audio quality is better than the "POTS" (Plain Old Telephone Service) quality provided by the wire-line phone system. (Skype codecs typically sample voice at 16,000 samples/sec or higher, which provides richer tones than POTS, which samples at 8,000 samples/sec.) By default, Skype sends audio and video packets over UDP. However, control packets are sent over TCP, and media packets are also sent over TCP when firewalls block UDP streams. Skype uses FEC for loss recovery for both voice and video streams sent over UDP. The Skype client also adapts the audio and video streams it sends to current network conditions, by changing video quality and FEC overhead \[Zhang X 2012\].

Skype uses P2P techniques in a number of innovative ways, nicely illustrating how P2P can be used in applications that go beyond content distribution and file sharing. As with instant messaging, host-to-host Internet telephony is inherently P2P since, at the heart of the application, pairs of users (that is, peers) communicate with each other in real time. But Skype also employs P2P techniques for two other important functions, namely, user location and NAT traversal.

Figure 9.7 Skype peers

As shown in Figure 9.7, the peers (hosts) in Skype are organized into a hierarchical overlay network, with each peer classified as a super peer or an ordinary peer. Skype maintains an index that maps Skype usernames to current IP addresses (and port numbers). This index is distributed over the super peers. When Alice wants to call Bob, her Skype client searches the distributed index to determine Bob's current IP address. Because the Skype protocol is proprietary, it is currently not known how the index mappings are organized across the super peers, although some form of DHT organization is quite possible.

P2P techniques are also used in Skype relays, which are useful for establishing calls between hosts in home networks. Many home network configurations provide access to the Internet through NATs, as discussed in Chapter 4. Recall that a NAT prevents a host from outside the home network from initiating a connection to a host within the home network. If both Skype callers have NATs, then there is a problem---neither can accept a call initiated by the other, making a call seemingly impossible. The clever use of super peers and relays nicely solves this problem. Suppose that when Alice signs in, she is assigned to a non-NATed super peer and initiates a session to that super peer. (Since Alice is initiating the session, her NAT permits this session.) This session allows Alice and her super peer to exchange control messages. The same happens for Bob when he signs in. Now, when Alice wants to call Bob, she informs her super peer, who in turn informs Bob's super peer, who in turn informs Bob of Alice's incoming call. If Bob accepts the call, the two super peers select a third non-NATed super peer---the relay peer---whose job will be to relay data between Alice and Bob. Alice's and Bob's super peers then instruct Alice and Bob respectively to initiate a session with the relay.
As shown in Figure 9.7, Alice then sends voice packets to the relay over the Alice-to-relay connection (which was initiated by Alice), and the relay then forwards these packets over the relay-to-Bob connection (which was initiated by Bob); packets from Bob to Alice flow over these same two relay connections in reverse. And voila!---Bob and Alice have an end-to-end connection even though neither can accept a session originating from outside.

Up to now, our discussion of Skype has focused on calls involving two persons. Now let's examine multi-party audio conference calls. With N\>2 participants, if each user were to send a copy of its audio stream to each of the N−1 other users, then a total of N(N−1) audio streams would need to be sent into the network to support the audio conference. To reduce this bandwidth usage, Skype employs a clever distribution technique. Specifically, each user sends its audio stream to the conference initiator. The conference initiator combines the audio streams into one stream (basically by adding all the audio signals together) and then sends a copy of the combined stream to each of the other N−1 participants. In this manner, the number of streams is reduced to 2(N−1). For example, with N=4, the all-pairs approach requires 12 streams, while the initiator-based approach requires only 6.
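The initiator's combining step ("adding all the audio signals together") can be sketched as sample-wise addition with clipping. The code below assumes, purely for illustration, that each participant's stream delivers equal-length chunks of 16-bit PCM.

```python
import array

def mix_chunks(chunks):
    """Combine one 16-bit PCM chunk from each participant into a single
    chunk by sample-wise addition, clipping to the 16-bit range."""
    mixed = array.array('h', bytes(len(chunks[0])))  # start from silence
    for chunk in chunks:
        for i, s in enumerate(array.array('h', chunk)):
            mixed[i] = max(-32768, min(32767, mixed[i] + s))
    return mixed.tobytes()

# Initiator in an N-party call: mix the N incoming chunks and send one
# copy of the result to each of the other N-1 participants, for
# 2(N-1) streams in total instead of N(N-1).
```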
For ordinary two-person video conversations, Skype routes the call peer-to-peer, unless NAT traversal is required, in which case the call is relayed through a non-NATed peer, as described earlier. For a video conference call involving N\>2 participants, due to the nature of the video medium, Skype does not combine the call into one stream at one location and then redistribute the stream to all the participants, as it does for voice calls. Instead, each participant's video stream is routed to a server cluster (located in Estonia as of 2011), which in turn relays to each participant the N−1 streams of the N−1 other participants \[Zhang X 2012\]. You may be wondering why each participant sends a copy to a server rather than directly sending a copy of its video stream to each of the other N−1 participants. Indeed, for both approaches, N(N−1) video streams are collectively received by the N participants in the conference. The reason is that upstream link bandwidths are significantly lower than downstream link bandwidths in most access links, so the upstream links may not be able to support the N−1 streams with the P2P approach.

VoIP systems such as Skype, WeChat, and Google Talk introduce new privacy concerns. Specifically, when Alice and Bob communicate over VoIP, Alice can sniff Bob's IP address and then use geo-location services \[MaxMind 2016; Quova 2016\] to determine Bob's current location and ISP (for example, his work or home ISP). In fact, with Skype it is possible for Alice to block the transmission of certain packets during call establishment so that she obtains Bob's current IP address, say every hour, without Bob knowing that he is being tracked and without being on Bob's contact list. Furthermore, the IP address discovered from Skype can be correlated with IP addresses found in BitTorrent, so that Alice can determine the files that Bob is downloading \[LeBlond 2011\]. Moreover, it is possible to partially decrypt a Skype call by doing a traffic analysis of the packet sizes in a stream \[White 2011\].

9.4 Protocols for Real-Time Conversational Applications

Real-time conversational applications, including VoIP and video conferencing, are compelling and very popular. It is therefore not surprising that standards bodies, such as the IETF and ITU, have been busy for many years (and continue to be busy!) hammering out standards for this class of applications. With the appropriate standards in place for real-time conversational applications, independent companies are creating new products that interoperate with each other. In this section we examine RTP and SIP for real-time conversational applications. Both standards enjoy widespread implementation in industry products.

9.4.1 RTP

In the previous section, we learned that the sender side of a VoIP application appends header fields to the audio chunks before passing them to the transport layer. These header fields include sequence numbers and timestamps. Since most multimedia networking applications can make use of sequence numbers and timestamps, it is convenient to have a standardized packet structure that includes fields for audio/video data, sequence number, and timestamp, as well as other potentially useful fields. RTP, defined in RFC 3550, is such a standard. RTP can be used for transporting common formats such as PCM, AAC, and MP3 for sound and MPEG and H.263 for video. It can also be used for transporting proprietary sound and video formats. Today, RTP enjoys widespread implementation in many products and research prototypes. It is also complementary to other important real-time interactive protocols, such as SIP. In this section, we provide an introduction to RTP. We also encourage you to visit Henning Schulzrinne's RTP site \[Schulzrinne-RTP 2012\], which provides a wealth of information on the subject. Also, you may want to visit the RAT site \[RAT 2012\], which documents a VoIP application that uses RTP.

RTP Basics

RTP typically runs on top of UDP. The sending side encapsulates a media chunk within an RTP packet, encapsulates the packet in a UDP segment, and then hands the segment to IP. The receiving side extracts the RTP packet from the UDP segment, extracts the media chunk from the RTP packet, and passes the chunk to the media player for decoding and rendering. As an example, consider the use of RTP to transport voice. Suppose the voice source is PCM-encoded (that is, sampled, quantized, and digitized) at 64 kbps. Further suppose that the application collects the encoded data in 20-msec chunks, that is, 160 bytes in a chunk. The sending side precedes each chunk of the audio data with an RTP header that includes the type of audio encoding, a sequence number, and a timestamp. The RTP header is normally 12 bytes. The audio chunk along with the RTP header form the RTP packet. The RTP packet is then sent into the UDP socket interface. At the receiver side, the application receives the RTP packet from its socket interface. The application extracts the audio chunk from the RTP packet and uses the header fields of the RTP packet to properly decode and play back the audio chunk. If an application incorporates RTP---instead of a proprietary scheme to provide payload type, sequence numbers, or timestamps---then the application will more easily interoperate with other networked multimedia applications. For example, if two different companies develop VoIP software and they both incorporate RTP into their product, there may be some hope that a user using one of the VoIP products will be able to communicate with a user using the other VoIP product.
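To see how little machinery RTP adds, consider the following minimal sketch. It packs the 12-byte header described above (version 2, payload type 0 for PCM μ-law, a 16-bit sequence number, a 32-bit timestamp, and an SSRC) in front of each 160-byte chunk and sends the result in a UDP segment; the field layout follows RFC 3550, but the receiver address and the `pcm_chunks` audio source are placeholders invented for this example.

```python
import socket
import struct

def make_rtp_packet(payload_type, seq, timestamp, ssrc, chunk):
    """Prepend the 12-byte RTP header of RFC 3550 to a media chunk."""
    header = struct.pack('!BBHII',
                         0x80,                    # V=2, P=0, X=0, CC=0
                         payload_type & 0x7F,     # M=0 plus 7-bit payload type
                         seq & 0xFFFF,            # 16-bit sequence number
                         timestamp & 0xFFFFFFFF,  # 32-bit timestamp
                         ssrc)                    # 32-bit source identifier
    return header + chunk

def pcm_chunks():
    """Hypothetical audio source: 20-msec, 160-byte PCM chunks."""
    for _ in range(3):
        yield bytes(160)  # silence, standing in for real samples

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
seq, ts = 0, 0
for chunk in pcm_chunks():
    pkt = make_rtp_packet(0, seq, ts, 0x1234ABCD, chunk)  # PT 0 = PCM mu-law
    sock.sendto(pkt, ('127.0.0.1', 38060))  # placeholder receiver address
    seq += 1      # increments by one per packet
    ts += 160     # 160 samples per 20-msec chunk at 8,000 samples/sec
```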
In Section 9.4.2, we'll see that RTP is often used in conjunction with SIP, an important standard for Internet telephony. It should be emphasized that RTP does not provide any mechanism to ensure timely delivery of data or provide other quality-of-service (QoS) guarantees; it does not even guarantee delivery of packets or prevent out-of-order delivery of packets. Indeed, RTP encapsulation is seen only at the end systems. Routers do not distinguish between IP datagrams that carry RTP packets and IP datagrams that don't.

RTP allows each source (for example, a camera or a microphone) to be assigned its own independent RTP stream of packets. For example, for a video conference between two participants, four RTP streams could be opened---two streams for transmitting the audio (one in each direction) and two streams for transmitting the video (again, one in each direction). However, many popular encoding techniques---including MPEG 1 and MPEG 2---bundle the audio and video into a single stream during the encoding process. When the audio and video are bundled by the encoder, then only one RTP stream is generated in each direction. RTP packets are not limited to unicast applications. They can also be sent over one-to-many and many-to-many multicast trees. For a many-to-many multicast session, all of the session's senders and sources typically use the same multicast group for sending their RTP streams. RTP multicast streams belonging together, such as audio and video streams emanating from multiple senders in a video conference application, belong to an RTP session.

Figure 9.8 RTP header fields

RTP Packet Header Fields

As shown in Figure 9.8, the four main RTP packet header fields are the payload type, sequence number, timestamp, and source identifier fields. The payload type field in the RTP packet is 7 bits long. For an audio stream, the payload type field is used to indicate the type of audio encoding (for example, PCM, adaptive delta modulation, linear predictive encoding) that is being used. If a sender decides to change the encoding in the middle of a session, the sender can inform the receiver of the change through this payload type field. The sender may want to change the encoding in order to increase the audio quality or to decrease the RTP stream bit rate. Table 9.2 lists some of the audio payload types currently supported by RTP. For a video stream, the payload type is used to indicate the type of video encoding (for example, motion JPEG, MPEG 1, MPEG 2, H.261). Again, the sender can change video encoding on the fly during a session. Table 9.3 lists some of the video payload types currently supported by RTP. The other important fields are the following:

Sequence number field. The sequence number field is 16 bits long. The sequence number increments by one for each RTP packet sent, and may be used by the receiver to detect packet loss and to restore packet sequence. For example, if the receiver side of the application receives a stream of RTP packets with a gap between sequence numbers 86 and 89, then the receiver knows that packets 87 and 88 are missing. The receiver can then attempt to conceal the lost data.

Timestamp field. The timestamp field is 32 bits long. It reflects the sampling instant of the first byte in the RTP data packet. As we saw in the preceding section, the receiver can use timestamps to remove packet jitter introduced in the network and to provide synchronous playout at the receiver.
The timestamp is derived from a sampling clock at the sender. As an example, for audio the timestamp clock increments by one for each sampling period (for example, each 125 μsec for an 8 kHz sampling clock); if the audio application generates chunks consisting of 160 encoded samples, then the timestamp increases by 160 for each RTP packet when the source is active. The timestamp clock continues to increase at a constant rate even if the source is inactive.

Synchronization source identifier (SSRC). The SSRC field is 32 bits long. It identifies the source of the RTP stream. Typically, each stream in an RTP session has a distinct SSRC. The SSRC is not the IP address of the sender, but instead is a number that the source assigns randomly when the new stream is started. The probability that two streams get assigned the same SSRC is very small. Should this happen, the two sources pick a new SSRC value.

Table 9.2 Audio payload types supported by RTP

| Payload-Type Number | Audio Format | Sampling Rate | Rate |
|---|---|---|---|
| 0 | PCM μ-law | 8 kHz | 64 kbps |
| 1 | 1016 | 8 kHz | 4.8 kbps |
| 3 | GSM | 8 kHz | 13 kbps |
| 7 | LPC | 8 kHz | 2.4 kbps |
| 9 | G.722 | 16 kHz | 48--64 kbps |
| 14 | MPEG Audio | 90 kHz | --- |
| 15 | G.728 | 8 kHz | 16 kbps |

Table 9.3 Some video payload types supported by RTP

| Payload-Type Number | Video Format |
|---|---|
| 26 | Motion JPEG |
| 31 | H.261 |
| 32 | MPEG 1 video |
| 33 | MPEG 2 video |

9.4.2 SIP

The Session Initiation Protocol (SIP), defined in \[RFC 3261; RFC 5411\], is an open and lightweight protocol that does the following:

- It provides mechanisms for establishing calls between a caller and a callee over an IP network. It allows the caller to notify the callee that it wants to start a call. It allows the participants to agree on media encodings. It also allows participants to end calls.
- It provides mechanisms for the caller to determine the current IP address of the callee. Users do not have a single, fixed IP address because they may be assigned addresses dynamically (using DHCP) and because they may have multiple IP devices, each with a different IP address.
- It provides mechanisms for call management, such as adding new media streams during the call, changing the encoding during the call, inviting new participants during the call, call transfer, and call holding.

Setting Up a Call to a Known IP Address

To understand the essence of SIP, it is best to take a look at a concrete example. In this example, Alice is at her PC and she wants to call Bob, who is also working at his PC. Alice's and Bob's PCs are both equipped with SIP-based software for making and receiving phone calls. In this initial example, we'll assume that Alice knows the IP address of Bob's PC. Figure 9.9 illustrates the SIP call-establishment process.

Figure 9.9 SIP call establishment when Alice knows Bob's IP address

In Figure 9.9, we see that an SIP session begins when Alice sends Bob an INVITE message, which resembles an HTTP request message. This INVITE message is sent over UDP to the well-known port 5060 for SIP. (SIP messages can also be sent over TCP.) The INVITE message includes an identifier for Bob (bob@193.64.210.89), an indication of Alice's current IP address, an indication that Alice desires to receive audio, which is to be encoded in format AVP 0 (PCM-encoded μ-law) and encapsulated in RTP, and an indication that she wants to receive the RTP packets on port 38060.
After receiving Alice's INVITE message, Bob sends an SIP response message, which resembles an HTTP response message. This response SIP message is also sent to the SIP port 5060. Bob's response includes a 200 OK as well as an indication of his IP address, his desired encoding and packetization for reception, and his port number to which the audio packets should be sent. Note that in this example Alice and Bob are going to use different audio-encoding mechanisms: Alice is asked to encode her audio with GSM whereas Bob is asked to encode his audio with PCM μ-law. After receiving Bob's response, Alice sends Bob an SIP acknowledgment message. After this SIP transaction, Bob and Alice can talk. (For visual convenience, Figure 9.9 shows Alice talking after Bob, but in truth they would normally talk at the same time.) Bob will encode and packetize the audio as requested and send the audio packets to port number 38060 at IP address 167.180.112.24. Alice will also encode and packetize the audio as requested and send the audio packets to port number 48753 at IP address 193.64.210.89.

From this simple example, we have learned a number of key characteristics of SIP. First, SIP is an out-of-band protocol: The SIP messages are sent and received in sockets that are different from those used for sending and receiving the media data. Second, the SIP messages themselves are ASCII-readable and resemble HTTP messages. Third, SIP requires all messages to be acknowledged, so it can run over UDP or TCP.

In this example, let's consider what would happen if Bob does not have a PCM μ-law codec for encoding audio. In this case, instead of responding with 200 OK, Bob would likely respond with a 606 Not Acceptable and list in the message all the codecs he can use. Alice would then choose one of the listed codecs and send another INVITE message, this time advertising the chosen codec. Bob could also simply reject the call by sending one of many possible rejection reply codes. (There are many such codes, including "busy," "gone," "payment required," and "forbidden.")

SIP Addresses

In the previous example, Bob's SIP address is sip:bob@193.64.210.89. However, we expect many---if not most---SIP addresses to resemble e-mail addresses. For example, Bob's address might be sip:bob@domain.com. When Alice's SIP device sends an INVITE message, the message would include this e-mail-like address; the SIP infrastructure would then route the message to the IP device that Bob is currently using (as we'll discuss below). Other possible forms for the SIP address could be Bob's legacy phone number or simply Bob's first/middle/last name (assuming it is unique). An interesting feature of SIP addresses is that they can be included in Web pages, just as people's e-mail addresses are included in Web pages with the mailto URL. For example, suppose Bob has a personal homepage, and he wants to provide a means for visitors to the homepage to call him. He could then simply include the URL sip:bob@domain.com. When the visitor clicks on the URL, the SIP application in the visitor's device is launched and an INVITE message is sent to Bob.

SIP Messages

In this short introduction to SIP, we'll not cover all SIP message types and headers. Instead, we'll take a brief look at the SIP INVITE message, along with a few common header lines.
Let us again suppose that Alice wants to initiate a VoIP call to Bob, and this time Alice knows only Bob's SIP address, bob@domain.com, and does not know the IP address of the device that Bob is currently using. Then her message might look something like this:

    INVITE sip:bob@domain.com SIP/2.0
    Via: SIP/2.0/UDP 167.180.112.24
    From: sip:alice@hereway.com
    To: sip:bob@domain.com
    Call-ID: a2e3a@pigeon.hereway.com
    Content-Type: application/sdp
    Content-Length: 885

    c=IN IP4 167.180.112.24
    m=audio 38060 RTP/AVP 0

The INVITE line includes the SIP version, as does an HTTP request message. Whenever an SIP message passes through an SIP device (including the device that originates the message), it attaches a Via header, which indicates the IP address of the device. (We'll see soon that the typical INVITE message passes through many SIP devices before reaching the callee's SIP application.) Similar to an e-mail message, the SIP message includes a From header line and a To header line. The message includes a Call-ID, which uniquely identifies the call (similar to the message-ID in e-mail). It includes a Content-Type header line, which defines the format used to describe the content contained in the SIP message. It also includes a Content-Length header line, which provides the length in bytes of the content in the message. Finally, after a carriage return and line feed, the message contains the content. In this case, the content provides information about Alice's IP address and how Alice wants to receive the audio.
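Because SIP messages are plain ASCII, a user agent assembles them as ordinary text. The sketch below builds a bare-bones INVITE along the lines of the example above; it is illustrative only (a real SIP client would include additional headers, such as CSeq and Contact, per RFC 3261), and the `make_invite` helper is our own invention.

```python
def make_invite(caller, callee, caller_ip, rtp_port):
    """Assemble a bare-bones SIP INVITE carrying an SDP body that asks
    for PCM mu-law (AVP 0) over RTP on the given port."""
    sdp = (f"c=IN IP4 {caller_ip}\r\n"
           f"m=audio {rtp_port} RTP/AVP 0\r\n")
    return (f"INVITE sip:{callee} SIP/2.0\r\n"
            f"Via: SIP/2.0/UDP {caller_ip}\r\n"
            f"From: sip:{caller}\r\n"
            f"To: sip:{callee}\r\n"
            f"Call-ID: 1a2b3c@{caller_ip}\r\n"
            "Content-Type: application/sdp\r\n"
            f"Content-Length: {len(sdp)}\r\n"
            "\r\n"              # blank line separates headers from body
            + sdp).encode('ascii')

# Sent as ordinary application data, for example over UDP to port 5060:
# sock.sendto(make_invite('alice@hereway.com', 'bob@domain.com',
#                         '167.180.112.24', 38060), (proxy_ip, 5060))
```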
Name Translation and User Location

In the example in Figure 9.9, we assumed that Alice's SIP device knew the IP address where Bob could be contacted. But this assumption is quite unrealistic, not only because IP addresses are often dynamically assigned with DHCP, but also because Bob may have multiple IP devices (for example, different devices for his home, work, and car). So now let us suppose that Alice knows only Bob's e-mail address, bob@domain.com, and that this same address is used for SIP-based calls. In this case, Alice needs to obtain the IP address of the device that the user bob@domain.com is currently using. To find this out, Alice creates an INVITE message that begins with INVITE bob@domain.com SIP/2.0 and sends this message to an SIP proxy. The proxy will respond with an SIP reply that might include the IP address of the device that bob@domain.com is currently using. Alternatively, the reply might include the IP address of Bob's voicemail box, or it might include a URL of a Web page (that says "Bob is sleeping. Leave me alone!"). Also, the result returned by the proxy might depend on the caller: If the call is from Bob's wife, he might accept the call and supply his IP address; if the call is from Bob's mother-in-law, he might respond with the URL that points to the I-am-sleeping Web page!

Now, you are probably wondering, how can the proxy server determine the current IP address for bob@domain.com? To answer this question, we need to say a few words about another SIP device, the SIP registrar. Every SIP user has an associated registrar. Whenever a user launches an SIP application on a device, the application sends an SIP register message to the registrar, informing the registrar of its current IP address. For example, when Bob launches his SIP application on his PDA, the application would send a message along the lines of:

    REGISTER sip:domain.com SIP/2.0
    Via: SIP/2.0/UDP 193.64.210.89
    From: sip:bob@domain.com
    To: sip:bob@domain.com
    Expires: 3600

Bob's registrar keeps track of Bob's current IP address. Whenever Bob switches to a new SIP device, the new device sends a new register message, indicating the new IP address. Also, if Bob remains at the same device for an extended period of time, the device will send refresh register messages, indicating that the most recently sent IP address is still valid. (In the example above, refresh messages need to be sent every 3600 seconds to maintain the address at the registrar server.) It is worth noting that the registrar is analogous to a DNS authoritative name server: The DNS server translates fixed host names to fixed IP addresses; the SIP registrar translates fixed human identifiers (for example, bob@domain.com) to dynamic IP addresses. Often SIP registrars and SIP proxies are run on the same host.

Now let's examine how Alice's SIP proxy server obtains Bob's current IP address. From the preceding discussion we see that the proxy server simply needs to forward Alice's INVITE message to Bob's registrar/proxy. The registrar/proxy could then forward the message to Bob's current SIP device. Finally, Bob, having now received Alice's INVITE message, could send an SIP response to Alice. As an example, consider Figure 9.10, in which jim@umass.edu, currently working on 217.123.56.89, wants to initiate a Voice-over-IP (VoIP) session with keith@upenn.edu, currently working on 197.87.54.21. The following steps are taken:

Figure 9.10 Session initiation, involving SIP proxies and registrars

(1) Jim sends an INVITE message to the umass SIP proxy. (2) The proxy does a DNS lookup on the SIP registrar upenn.edu (not shown in the diagram) and then forwards the message to the registrar server. (3) Because keith@upenn.edu is no longer registered at the upenn registrar, the upenn registrar sends a redirect response, indicating that it should try keith@nyu.edu. (4) The umass proxy sends an INVITE message to the NYU SIP registrar. (5) The NYU registrar knows the IP address of keith@nyu.edu and forwards the INVITE message to the host 197.87.54.21, which is running Keith's SIP client. (6--8) An SIP response is sent back through registrars/proxies to the SIP client on 217.123.56.89. (9) Media is sent directly between the two clients. (There is also an SIP acknowledgment message, which is not shown.)

Our discussion of SIP has focused on call initiation for voice calls. SIP, being a signaling protocol for initiating and ending calls in general, can be used for video conference calls as well as for text-based sessions. In fact, SIP has become a fundamental component in many instant messaging applications. Readers desiring to learn more about SIP are encouraged to visit Henning Schulzrinne's SIP Web site \[Schulzrinne-SIP 2016\]. In particular, on this site you will find open source software for SIP clients and servers \[SIP Software 2016\].

9.5 Network Support for Multimedia

In Sections 9.2 through 9.4, we learned how application-level mechanisms such as client buffering, prefetching, adapting media quality to available bandwidth, adaptive playout, and loss mitigation techniques can be used by multimedia applications to improve a multimedia application's performance.
We also learned how content distribution networks and P2P overlay networks can be used to provide a system-level approach for delivering multimedia content. These techniques and approaches are all designed to be used in today's best-effort Internet. Indeed, they are in use today precisely because the Internet provides only a single, best-effort class of service. But as designers of computer networks, we can't help but ask whether the network (rather than the applications or application-level infrastructure alone) might provide mechanisms to support multimedia content delivery. As we'll see shortly, the answer is, of course, "yes"! But we'll also see that a number of these new network-level mechanisms have yet to be widely deployed. This may be due to their complexity and to the fact that application-level techniques together with best-effort service and properly dimensioned network resources (for example, bandwidth) can indeed provide a "good-enough" (even if not-always-perfect) end-to-end multimedia delivery service. Table 9.4 summarizes three broad approaches towards providing network-level support for multimedia applications.

Making the best of best-effort service. The application-level mechanisms and infrastructure that we studied in Sections 9.2 through 9.4 can be successfully used in a well-dimensioned network where packet loss and excessive end-to-end delay rarely occur. When demand increases are forecasted, the ISPs deploy additional bandwidth and switching capacity to continue to ensure satisfactory delay and packet-loss performance \[Huang 2005\]. We'll discuss such network dimensioning further in Section 9.5.1.

Differentiated service. Since the early days of the Internet, it's been envisioned that different types of traffic (for example, as indicated in the Type-of-Service field in the IPv4 packet header) could be provided with different classes of service, rather than a single one-size-fits-all best-effort service. With differentiated service, one type of traffic might be given strict priority over another class of traffic when both types of traffic are queued at a router. For example, packets belonging to a real-time conversational application might be given priority over other packets due to their stringent delay constraints. Introducing differentiated service into the network will require new mechanisms for packet marking (indicating a packet's class of service), packet scheduling, and more. We'll cover differentiated service, and the new network mechanisms needed to implement this service, in Sections 9.5.2 and 9.5.3.
Per-connection Quality-of-Service (QoS) Guarantees. With per-connection QoS guarantees, each instance of an application explicitly reserves end-to-end bandwidth and thus has a guaranteed end-to-end performance. A hard guarantee means the application will receive its requested quality of service (QoS) with certainty. A soft guarantee means the application will receive its requested quality of service with high probability. For example, if a user wants to make a VoIP call from Host A to Host B, the user's VoIP application reserves bandwidth explicitly on each link along a route between the two hosts. But permitting applications to make reservations and requiring the network to honor the reservations requires some big changes. First, we need a protocol that, on behalf of the applications, reserves link bandwidth on the paths from the senders to their receivers. Second, we'll need new scheduling policies in the router queues so that per-connection bandwidth reservations can be honored. Finally, in order to make a reservation, the applications must give the network a description of the traffic that they intend to send into the network, and the network will need to police each application's traffic to make sure that it abides by that description. These mechanisms, when combined, require new and complex software in hosts and routers. Because per-connection QoS-guaranteed service has not seen significant deployment, we'll cover these mechanisms only briefly in Section 9.5.4.

Table 9.4 Three network-level approaches to supporting multimedia applications

| Approach | Granularity | Guarantee | Mechanisms | Complexity | Deployment to date |
|---|---|---|---|---|---|
| Making the best of best-effort service | all traffic treated equally | none, or soft | application-layer support, CDNs, overlays, network-level resource provisioning | minimal | everywhere |
| Differentiated service | different classes of traffic treated differently | none, or soft | packet marking, policing, scheduling | medium | some |
| Per-connection Quality-of-Service (QoS) Guarantees | each source-destination flow treated differently | soft or hard, once flow is admitted | packet marking, policing, scheduling; call admission and signaling | high | little |

9.5.1 Dimensioning Best-Effort Networks

Fundamentally, the difficulty in supporting multimedia applications arises from their stringent performance requirements---low end-to-end packet delay, delay jitter, and loss---and the fact that packet delay, delay jitter, and loss occur whenever the network becomes congested. A first approach to improving the quality of multimedia applications---an approach that can often be used to solve just about any problem where resources are constrained---is simply to "throw money at the problem" and thus simply avoid resource contention. In the case of networked multimedia, this means providing enough link capacity throughout the network so that network congestion, and its consequent packet delay and loss, never (or only very rarely) occurs. With enough link capacity, packets could zip through today's Internet without queuing delay or loss. From many perspectives this is an ideal situation---multimedia applications would perform perfectly, users would be happy, and this could all be achieved with no changes to the Internet's best-effort architecture. The question, of course, is how much capacity is "enough" to achieve this nirvana, and whether the costs of providing "enough" bandwidth are practical from a business standpoint to the ISPs. The question of how much capacity to provide at network links in a given topology to achieve a given level of performance is often known as bandwidth provisioning. The even more complicated problem of how to design a network topology (where to place routers, how to interconnect routers with links, and what capacity to assign to links) to achieve a given level of end-to-end performance is a network design problem often referred to as network dimensioning. Both bandwidth provisioning and network dimensioning are complex topics, well beyond the scope of this textbook.
We note here, however, that the following issues must be addressed in order to predict application-level performance between two network end points, and thus provision enough capacity to meet an application's performance requirements.

- Models of traffic demand between network end points. Models may need to be specified at both the call level (for example, users "arriving" to the network and starting up end-to-end applications) and at the packet level (for example, packets being generated by ongoing applications). Note that workload may change over time.
- Well-defined performance requirements. For example, a performance requirement for supporting delay-sensitive traffic, such as a conversational multimedia application, might be that the probability that the end-to-end delay of a packet exceeds a maximum tolerable delay be less than some small value \[Fraleigh 2003\].
- Models to predict end-to-end performance for a given workload model, and techniques to find a minimal-cost bandwidth allocation that will result in all user requirements being met. Here, researchers are busy developing performance models that can quantify performance for a given workload, and optimization techniques to find minimal-cost bandwidth allocations meeting performance requirements.

Given that today's best-effort Internet could (from a technology standpoint) support multimedia traffic at an appropriate performance level if it were dimensioned to do so, the natural question is why today's Internet doesn't do so. The answers are primarily economic and organizational. From an economic standpoint, would users be willing to pay their ISPs enough for the ISPs to install sufficient bandwidth to support multimedia applications over a best-effort Internet? The organizational issues are perhaps even more daunting. Note that an end-to-end path between two multimedia end points will pass through the networks of multiple ISPs. From an organizational standpoint, would these ISPs be willing to cooperate (perhaps with revenue sharing) to ensure that the end-to-end path is properly dimensioned to support multimedia applications? For a perspective on these economic and organizational issues, see \[Davies 2005\]. For a perspective on provisioning tier-1 backbone networks to support delay-sensitive traffic, see \[Fraleigh 2003\].

9.5.2 Providing Multiple Classes of Service

Perhaps the simplest enhancement to the one-size-fits-all best-effort service in today's Internet is to divide traffic into classes, and provide different levels of service to these different classes of traffic. For example, an ISP might well want to provide a higher class of service to delay-sensitive Voice-over-IP or teleconferencing traffic (and charge more for this service!) than to elastic traffic such as e-mail or HTTP. Alternatively, an ISP may simply want to provide a higher quality of service to customers willing to pay more for this improved service. A number of residential wired-access ISPs and cellular wireless-access ISPs have adopted such tiered levels of service---with platinum-service subscribers receiving better performance than gold- or silver-service subscribers.
We're all familiar with different classes of service from our everyday lives---first-class airline passengers get better service than business-class passengers, who in turn get better service than those of us who fly economy class; VIPs are provided immediate entry to events while everyone else waits in line; elders are revered in some countries and provided seats of honor and the finest food at a table. It's important to note that such differential service is provided among aggregates of traffic, that is, among classes of traffic, not among individual connections. For example, all first-class passengers are handled the same (with no first-class passenger receiving any better treatment than any other first-class passenger), just as all VoIP packets would receive the same treatment within the network, independent of the particular end-to-end connection to which they belong. As we will see, by dealing with a small number of traffic aggregates, rather than a large number of individual connections, the new network mechanisms required to provide better-than-best-effort service can be kept relatively simple.

The early Internet designers clearly had this notion of multiple classes of service in mind. Recall the type-of-service (ToS) field in the IPv4 header discussed in Chapter 4. IEN123 \[ISI 1979\] describes the ToS field, also present in an ancestor of the IPv4 datagram, as follows: "The Type of Service \[field\] provides an indication of the abstract parameters of the quality of service desired. These parameters are to be used to guide the selection of the actual service parameters when transmitting a datagram through a particular network. Several networks offer service precedence, which somehow treats high precedence traffic as more important than other traffic." More than four decades ago, the vision of providing different levels of service to different classes of traffic was clear! However, it's taken us an equally long period of time to realize this vision.

Motivating Scenarios

Let's begin our discussion of network mechanisms for providing multiple classes of service with a few motivating scenarios. Figure 9.11 shows a simple network scenario in which two application packet flows originate on Hosts H1 and H2 on one LAN and are destined for Hosts H3 and H4 on another LAN. The routers on the two LANs are connected by a 1.5 Mbps link. Let's assume the LAN speeds are significantly higher than 1.5 Mbps, and focus on the output queue of router R1; it is here that packet delay and packet loss will occur if the aggregate sending rate of H1 and H2 exceeds 1.5 Mbps. Let's further suppose that a 1 Mbps audio application (for example, a CD-quality audio call) shares the 1.5 Mbps link between R1 and R2 with an HTTP Web-browsing application that is downloading a Web page from H2 to H4.

Figure 9.11 Competing audio and HTTP applications

In the best-effort Internet, the audio and HTTP packets are mixed in the output queue at R1 and (typically) transmitted in a first-in-first-out (FIFO) order. In this scenario, a burst of packets from the Web server could potentially fill up the queue, causing IP audio packets to be excessively delayed or lost due to buffer overflow at R1. How should we solve this potential problem? Given that the HTTP Web-browsing application does not have time constraints, our intuition might be to give strict priority to audio packets at R1.
Under a strict priority scheduling discipline, an audio packet in the R1 output buffer would always be transmitted before any HTTP packet in the R1 output buffer. The link from R1 to R2 would look like a dedicated link of 1.5 Mbps to the audio traffic, with HTTP traffic using the R1-to-R2 link only when no audio traffic is queued. In order for R1 to distinguish between the audio and HTTP packets in its queue, each packet must be marked as belonging to one of these two classes of traffic. This was the original goal of the type-of-service (ToS) field in IPv4. As obvious as this might seem, this then is our first insight into mechanisms needed to provide multiple classes of traffic:

Insight 1: Packet marking allows a router to distinguish among packets belonging to different classes of traffic.

Note that although our example considers a competing multimedia and elastic flow, the same insight applies to the case that platinum, gold, and silver classes of service are implemented---a packet-marking mechanism is still needed to indicate the class of service to which a packet belongs.

Now suppose that the router is configured to give priority to packets marked as belonging to the 1 Mbps audio application. Since the outgoing link speed is 1.5 Mbps, even though the HTTP packets receive lower priority, they can still, on average, receive 0.5 Mbps of transmission service. But what happens if the audio application starts sending packets at a rate of 1.5 Mbps or higher (either maliciously or due to an error in the application)? In this case, the HTTP packets will starve, that is, they will not receive any service on the R1-to-R2 link. Similar problems would occur if multiple applications (for example, multiple audio calls), all with the same class of service as the audio application, were sharing the link's bandwidth; they too could collectively starve the HTTP session. Ideally, one wants a degree of isolation among classes of traffic so that one class of traffic can be protected from the other. This protection could be implemented at different places in the network---at each and every router, at first entry to the network, or at inter-domain network boundaries. This then is our second insight:

Insight 2: It is desirable to provide a degree of traffic isolation among classes so that one class is not adversely affected by another class of traffic that misbehaves.

We'll examine several specific mechanisms for providing such isolation among traffic classes. We note here that two broad approaches can be taken. First, it is possible to perform traffic policing, as shown in Figure 9.12. If a traffic class or flow must meet certain criteria (for example, that the audio flow not exceed a peak rate of 1 Mbps), then a policing mechanism can be put into place to ensure that these criteria are indeed observed. If the policed application misbehaves, the policing mechanism will take some action (for example, drop or delay packets that are in violation of the criteria) so that the traffic actually entering the network conforms to the criteria. The leaky bucket mechanism that we'll examine shortly is perhaps the most widely used policing mechanism. In Figure 9.12, the packet classification and marking mechanism (Insight 1) and the policing mechanism (Insight 2) are both implemented together at the network's edge, either in the end system or at an edge router.
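Insights 1 and 2 can be made concrete in a few lines of code. The sketch below implements the strict priority discipline described earlier for two marked classes; the class names and packet representation are assumptions for illustration, and the dequeue loop shows exactly why, without policing, a misbehaving high-priority class starves the other.

```python
from collections import deque

class StrictPriorityLink:
    """Strict priority between two marked classes: audio always wins."""
    def __init__(self):
        self.queues = {'audio': deque(), 'http': deque()}

    def enqueue(self, packet, mark):
        # 'mark' plays the role of the ToS/DS field set at the edge.
        self.queues[mark].append(packet)

    def next_to_transmit(self):
        # HTTP is served only when no audio packet is waiting; if the
        # audio class exceeds the link rate, HTTP starves (Insight 2).
        for mark in ('audio', 'http'):
            if self.queues[mark]:
                return self.queues[mark].popleft()
        return None
```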
A complementary approach for providing isolation among traffic classes is for the link-level packet-scheduling mechanism to explicitly allocate a fixed amount of link bandwidth to each class. For example, the audio class could be allocated 1 Mbps at R1, and the HTTP class could be allocated 0.5 Mbps. In this case, the audio and HTTP flows see a logical link with capacity 1.0 and 0.5 Mbps, respectively, as shown in Figure 9.13.

Figure 9.12 Policing (and marking) the audio and HTTP traffic classes

Figure 9.13 Logical isolation of audio and HTTP traffic classes

With strict enforcement of the link-level allocation of bandwidth, a class can use only the amount of bandwidth that has been allocated; in particular, it cannot utilize bandwidth that is not currently being used by others. For example, if the audio flow goes silent (for example, if the speaker pauses and generates no audio packets), the HTTP flow would still not be able to transmit more than 0.5 Mbps over the R1-to-R2 link, even though the audio flow's 1 Mbps bandwidth allocation is not being used at that moment. Since bandwidth is a "use-it-or-lose-it" resource, there is no reason to prevent HTTP traffic from using bandwidth not used by the audio traffic. We'd like to use bandwidth as efficiently as possible, never wasting it when it could be otherwise used. This gives rise to our third insight:

Insight 3: While providing isolation among classes or flows, it is desirable to use resources (for example, link bandwidth and buffers) as efficiently as possible.

Recall from our discussion in Sections 1.3 and 4.2 that packets belonging to various network flows are multiplexed and queued for transmission at the output buffers associated with a link. The manner in which queued packets are selected for transmission on the link is known as the link-scheduling discipline, and was discussed in detail in Section 4.2. Recall that in Section 4.2 three link-scheduling disciplines were discussed, namely, FIFO, priority queuing, and Weighted Fair Queuing (WFQ). We'll soon see that WFQ plays a particularly important role in isolating traffic classes.

The Leaky Bucket

One of our earlier insights was that policing, the regulation of the rate at which a class or flow (we will assume the unit of policing is a flow in our discussion below) is allowed to inject packets into the network, is an important QoS mechanism. But what aspects of a flow's packet rate should be policed? We can identify three important policing criteria, each differing from the other according to the time scale over which the packet flow is policed:

- Average rate. The network may wish to limit the long-term average rate (packets per time interval) at which a flow's packets can be sent into the network. A crucial issue here is the interval of time over which the average rate will be policed. A flow whose average rate is limited to 100 packets per second is more constrained than a source that is limited to 6,000 packets per minute, even though both have the same average rate over a long enough interval of time. For example, the latter constraint would allow a flow to send 1,000 packets in a given second-long interval of time, while the former constraint would disallow this sending behavior.
- Peak rate. While the average-rate constraint limits the amount of traffic that can be sent into the network over a relatively long period of time, a peak-rate constraint limits the maximum number of packets that can be sent over a shorter period of time. Using our example above, the network may police a flow at an average rate of 6,000 packets per minute, while limiting the flow's peak rate to 1,500 packets per second.
- Burst size. The network may also wish to limit the maximum number of packets (the "burst" of packets) that can be sent into the network over an extremely short interval of time. In the limit, as the interval length approaches zero, the burst size limits the number of packets that can be instantaneously sent into the network. Even though it is physically impossible to instantaneously send multiple packets into the network (after all, every link has a physical transmission rate that cannot be exceeded!), the abstraction of a maximum burst size is a useful one.

The leaky bucket mechanism is an abstraction that can be used to characterize these policing limits. As shown in Figure 9.14, a leaky bucket consists of a bucket that can hold up to $b$ tokens. Tokens are added to this bucket as follows. New tokens, which may potentially be added to the bucket, are always being generated at a rate of $r$ tokens per second. (We assume here for simplicity that the unit of time is a second.) If the bucket is filled with fewer than $b$ tokens when a token is generated, the newly generated token is added to the bucket; otherwise the newly generated token is ignored, and the token bucket remains full with $b$ tokens.

Figure 9.14 The leaky bucket policer

Let us now consider how the leaky bucket can be used to police a packet flow. Suppose that before a packet is transmitted into the network, it must first remove a token from the token bucket. If the token bucket is empty, the packet must wait for a token. (An alternative is for the packet to be dropped, although we will not consider that option here.) Let us now consider how this behavior polices a traffic flow. Because there can be at most $b$ tokens in the bucket, the maximum burst size for a leaky-bucket-policed flow is $b$ packets. Furthermore, because the token generation rate is $r$, the maximum number of packets that can enter the network in any interval of time of length $t$ is $rt + b$. Thus, the token-generation rate, $r$, serves to limit the long-term average rate at which packets can enter the network. It is also possible to use leaky buckets (specifically, two leaky buckets in series) to police a flow's peak rate in addition to the long-term average rate; see the homework problems at the end of this chapter.
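A token-bucket policer of the kind just described is simple to implement. Below is a minimal sketch, using wall-clock time to accrue tokens; admitting a packet costs one token, so over any interval of length $t$ at most $rt + b$ packets get through. The class and method names are our own.

```python
import time

class LeakyBucketPolicer:
    """Token bucket with rate r tokens/sec and depth b tokens."""
    def __init__(self, r, b):
        self.r, self.b = r, b
        self.tokens = b                  # the bucket starts full
        self.last = time.monotonic()

    def admit(self):
        """True if a packet may enter the network now (one token each)."""
        now = time.monotonic()
        # Tokens accrue at rate r but never beyond the bucket depth b.
        self.tokens = min(self.b, self.tokens + self.r * (now - self.last))
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                     # caller delays or drops the packet
```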
Leaky Bucket + Weighted Fair Queuing = Provable Maximum Delay in a Queue

Let's close our discussion on policing by showing how the leaky bucket and WFQ can be combined to provide a bound on the delay through a router's queue. (Readers who have forgotten about WFQ are encouraged to review WFQ, which is covered in Section 4.2.) Let's consider a router's output link that multiplexes $n$ flows, each policed by a leaky bucket with parameters $b_i$ and $r_i$, $i = 1, \ldots, n$, using WFQ scheduling. We use the term flow here loosely to refer to the set of packets that are not distinguished from each other by the scheduler. In practice, a flow might be comprised of traffic from a single end-to-end connection or a collection of many such connections; see Figure 9.15.

Figure 9.15 n multiplexed leaky bucket flows with WFQ scheduling

Recall from our discussion of WFQ that each flow, $i$, is guaranteed to receive a share of the link bandwidth equal to at least $R \cdot w_i / (\sum_j w_j)$, where $R$ is the transmission rate of the link in packets/sec. What then is the maximum delay that a packet will experience while waiting for service in the WFQ (that is, after passing through the leaky bucket)? Let us focus on flow 1. Suppose that flow 1's token bucket is initially full. A burst of $b_1$ packets then arrives to the leaky bucket policer for flow 1. These packets remove all of the tokens (without wait) from the leaky bucket and then join the WFQ waiting area for flow 1. Since these $b_1$ packets are served at a rate of at least $R \cdot w_1 / (\sum_j w_j)$ packets/sec, the last of these packets will then have a maximum delay, $d_{\max}$, until its transmission is completed, where

$$d_{\max} = \frac{b_1}{R \cdot w_1 / \sum_j w_j}$$

The rationale behind this formula is that if there are $b_1$ packets in the queue and packets are being serviced (removed) from the queue at a rate of at least $R \cdot w_1 / (\sum_j w_j)$ packets per second, then the amount of time until the last bit of the last packet is transmitted cannot be more than $b_1 / (R \cdot w_1 / (\sum_j w_j))$. As a quick numeric check, if $R = 1{,}000$ packets/sec, $b_1 = 50$ packets, and two flows have equal weights $w_1 = w_2$, then flow 1 is guaranteed a service rate of at least 500 packets/sec, and so $d_{\max} = 50/500 = 0.1$ sec. A homework problem asks you to prove that as long as $r_1 < R \cdot w_1 / (\sum_j w_j)$, then $d_{\max}$ is indeed the maximum delay that any packet in flow 1 will ever experience in the WFQ queue.

9.5.3 Diffserv

Having seen the motivation, insights, and specific mechanisms for providing multiple classes of service, let's wrap up our study with an example---the Internet Diffserv architecture \[RFC 2475; Kilkki 1999\]. Diffserv provides service differentiation---that is, the ability to handle different classes of traffic in different ways within the Internet in a scalable manner. The need for scalability arises from the fact that millions of simultaneous source-destination traffic flows may be present at a backbone router. We'll see shortly that this need is met by placing only simple functionality within the network core, with more complex control operations being implemented at the network's edge.

Let's begin with the simple network shown in Figure 9.16. We'll describe one possible use of Diffserv here; other variations are possible, as described in RFC 2475. The Diffserv architecture consists of two sets of functional elements:

- Edge functions: Packet classification and traffic conditioning. At the incoming edge of the network (that is, at either a Diffserv-capable host that generates traffic or at the first Diffserv-capable router that the traffic passes through), arriving packets are marked. More specifically, the differentiated service (DS) field in the IPv4 or IPv6 packet header is set to some value \[RFC 3260\]. The definition of the DS field is intended to supersede the earlier definitions of the IPv4 type-of-service field and the IPv6 traffic class field that we discussed in Chapter 4. For example, in Figure 9.16, packets being sent from H1 to H3 might be marked at R1, while packets being sent from H2 to H4 might be marked at R2. The mark that a packet receives identifies the class of traffic to which it belongs. Different classes of traffic will then receive different service within the core network.

Figure 9.16 A simple Diffserv network example
When a DS-marked packet arrives at a Diffserv-capable router, the packet is forwarded onto its next hop according to the so-called per-hop behavior (PHB) associated with that packet's class. The per-hop behavior influences how a router's buffers and link bandwidth are shared among the competing classes of traffic. A crucial tenet of the Diffserv architecture is that a router's per-hop behavior will be based only on packet markings, that is, the class of traffic to which a packet belongs. Thus, if packets being sent from H1 to H3 in Figure 9.16 receive the same marking as packets being sent from H2 to H4, then the network routers treat these packets as an aggregate, without distinguishing whether the packets originated at H1 or H2. For example, R3 would not distinguish between packets from H1 and H2 when forwarding these packets on to R4. Thus, the Diffserv architecture obviates the need to keep router state for individual source-destination pairs---a critical consideration in making Diffserv scalable.

An analogy might prove useful here. At many large-scale social events (for example, a large public reception, a large dance club or discothèque, a concert, or a football game), people entering the event receive a pass of one type or another: VIP passes for Very Important People; over-21 passes for people who are 21 years old or older (for example, if alcoholic drinks are to be served); backstage passes at concerts; press passes for reporters; even an ordinary pass for the Ordinary Person. These passes are typically distributed upon entry to the event, that is, at the edge of the event. It is here at the edge where computationally intensive operations, such as paying for entry, checking for the appropriate type of invitation, and matching an invitation against a piece of identification, are performed. Furthermore, there may be a limit on the number of people of a given type that are allowed into an event. If there is such a limit, people may have to wait before entering the event. Once inside the event, one's pass allows one to receive differentiated service at many locations around the event---a VIP is provided with free drinks, a better table, free food, entry to exclusive rooms, and fawning service. Conversely, an ordinary person is excluded from certain areas, pays for drinks, and receives only basic service. In both cases, the service received within the event depends solely on the type of one's pass. Moreover, all people within a class are treated alike.

Figure 9.17 provides a logical view of the classification and marking functions within the edge router. Packets arriving at the edge router are first classified. The classifier selects packets based on the values of one or more packet header fields (for example, source address, destination address, source port, destination port, and protocol ID) and steers the packet to the appropriate marking function. As noted above, a packet's marking is carried in the DS field in the packet header. In some cases, an end user may have agreed to limit its packet-sending rate to conform to a declared traffic profile. The traffic profile might contain a limit on the peak rate, as well as the burstiness of the packet flow, as we saw previously with the leaky bucket mechanism.
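The classify-then-mark pipeline of Figure 9.17 is easy to sketch in code. Everything below is our own illustration: the rule table, class names, and port number are made up, the metering function reuses the LeakyBucketPolicer sketched in the policing discussion above, and only the EF code point value (46, from RFC 3246) comes from the standards.

```python
EF, BEST_EFFORT = 46, 0    # DSCP values; 46 is EF (RFC 3246), 0 the default PHB

# First-match classifier rules over packet header fields (5-tuple subset).
rules = [
    ({"proto": "udp", "dport": 5004}, "voice"),       # illustrative rule only
]
profiles = {"voice": LeakyBucketPolicer(r=50, b=10)}  # negotiated traffic profile
marks = {"voice": EF, "default": BEST_EFFORT}

def classify(pkt):
    """Steer a packet to a traffic class based on its header field values."""
    for rule, cls in rules:
        if all(pkt.get(field) == value for field, value in rule.items()):
            return cls
    return "default"

def condition(pkt, t):
    """Classify, meter against the class's traffic profile, then mark."""
    cls = classify(pkt)
    policer = profiles.get(cls)
    if policer is None or policer.conforms(t):
        pkt["dscp"] = marks[cls]     # in profile: the class's priority marking
    else:
        pkt["dscp"] = BEST_EFFORT    # out of profile: re-mark (an operator
                                     # could instead shape or drop the packet)
    return pkt

print(condition({"proto": "udp", "dport": 5004}, t=0.0)["dscp"])   # 46 (EF)
```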
As long as the user sends packets into the network in a way that conforms to the negotiated traffic profile, the packets receive their priority marking and are forwarded along their route to the destination. On the other hand, if the traffic profile is violated, out-of-profile packets might be marked differently, might be shaped (for example, delayed so that a maximum rate constraint would be observed), or might be dropped at the network edge.

Figure 9.17 Logical view of packet classification and traffic conditioning at the edge router

The role of the metering function, shown in Figure 9.17, is to compare the incoming packet flow with the negotiated traffic profile and to determine whether a packet is within the negotiated traffic profile. The actual decision about whether to immediately re-mark, forward, delay, or drop a packet is a policy issue determined by the network administrator and is not specified in the Diffserv architecture.

So far, we have focused on the marking and policing functions in the Diffserv architecture. The second key component of the Diffserv architecture involves the per-hop behavior (PHB) performed by Diffserv-capable routers. PHB is rather cryptically, but carefully, defined as "a description of the externally observable forwarding behavior of a Diffserv node applied to a particular Diffserv behavior aggregate" \[RFC 2475\]. Digging a little deeper into this definition, we can see several important considerations embedded within:

A PHB can result in different classes of traffic receiving different performance (that is, different externally observable forwarding behaviors).

While a PHB defines differences in performance (behavior) among classes, it does not mandate any particular mechanism for achieving these behaviors. As long as the externally observable performance criteria are met, any implementation mechanism and any buffer/bandwidth allocation policy can be used. For example, a PHB would not require that a particular packet-queuing discipline (for example, a priority queue versus a WFQ queue versus a FCFS queue) be used to achieve a particular behavior. The PHB is the end, to which resource allocation and implementation mechanisms are the means.

Differences in performance must be observable and hence measurable.

Two PHBs have been defined: an expedited forwarding (EF) PHB \[RFC 3246\] and an assured forwarding (AF) PHB \[RFC 2597\]. The expedited forwarding PHB specifies that the departure rate of a class of traffic from a router must equal or exceed a configured rate. The assured forwarding PHB divides traffic into four classes, where each AF class is guaranteed to be provided with some minimum amount of bandwidth and buffering.

Let's close our discussion of Diffserv with a few observations regarding its service model. First, we have implicitly assumed that Diffserv is deployed within a single administrative domain, but typically an end-to-end service must be fashioned from multiple ISPs sitting between communicating end systems. In order to provide end-to-end Diffserv service, all the ISPs between the end systems must not only provide this service, but must also cooperate and make settlements in order to offer end customers true end-to-end service. Without this kind of cooperation, ISPs directly selling Diffserv service to customers will find themselves repeatedly saying: "Yes, we know you paid extra, but we don't have a service agreement with the ISP that dropped and delayed your traffic. I'm sorry that there were so many gaps in your VoIP call!"
Second, if Diffserv were actually in place and the network ran at only moderate load, most of the time there would be no perceived difference between a best-effort service and a Diffserv service. Indeed, end-to-end delay is usually dominated by access rates and router hops rather than by queuing delays in the routers. Imagine the unhappy Diffserv customer who has paid more for premium service but finds that the best-effort service being provided to others almost always has the same performance as premium service!

9.5.4 Per-Connection Quality-of-Service (QoS) Guarantees: Resource Reservation and Call Admission

In the previous section, we have seen that packet marking and policing, traffic isolation, and link-level scheduling can provide one class of service with better performance than another. Under certain scheduling disciplines, such as priority scheduling, the lower classes of traffic are essentially "invisible" to the highest-priority class of traffic. With proper network dimensioning, the highest class of service can indeed achieve extremely low packet loss and delay---essentially circuit-like performance. But can the network guarantee that an ongoing flow in a high-priority traffic class will continue to receive such service throughout the flow's duration using only the mechanisms that we have described so far? It cannot. In this section, we'll see why yet additional network mechanisms and protocols are required when a hard service guarantee is provided to individual connections.

Let's return to our scenario from Section 9.5.2 and consider two 1 Mbps audio applications transmitting their packets over the 1.5 Mbps link, as shown in Figure 9.18. The combined data rate of the two flows (2 Mbps) exceeds the link capacity. Even with classification and marking, isolation of flows, and sharing of unused bandwidth (of which there is none), this is clearly a losing proposition. There is simply not enough bandwidth to accommodate the needs of both applications at the same time. If the two applications equally share the bandwidth, each application would lose 25 percent of its transmitted packets. This is such an unacceptably low QoS that both audio applications are completely unusable; there's no need even to transmit any audio packets in the first place.

Figure 9.18 Two competing audio applications overloading the R1-to-R2 link

Given that the two applications in Figure 9.18 cannot both be satisfied simultaneously, what should the network do? Allowing both to proceed with an unusable QoS wastes network resources on application flows that ultimately provide no utility to the end user. The answer is hopefully clear---one of the application flows should be blocked (that is, denied access to the network), while the other should be allowed to proceed on, using the full 1 Mbps needed by the application. The telephone network is an example of a network that performs such call blocking---if the required resources (an end-to-end circuit in the case of the telephone network) cannot be allocated to the call, the call is blocked (prevented from entering the network) and a busy signal is returned to the user. In our example, there is no gain in allowing a flow into the network if it will not receive a sufficient QoS to be considered usable. Indeed, there is a cost to admitting a flow that does not receive its needed QoS, as network resources are being used to support a flow that provides no utility to the end user.
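The admission decision in this example is simple enough to make concrete. The sketch below is our own illustration with hypothetical names: a controller for the 1.5 Mbps R1-to-R2 link admits a call only if the requested rate still fits, so the first 1 Mbps audio flow is admitted and the second is blocked.

```python
class AdmissionController:
    """Per-link call admission: admit a call only if its requested rate
    fits within the capacity not yet reserved by already-admitted calls."""

    def __init__(self, capacity_mbps):
        self.capacity = capacity_mbps
        self.reserved = 0.0

    def request(self, rate_mbps):
        if self.reserved + rate_mbps <= self.capacity:
            self.reserved += rate_mbps   # reserve bandwidth for this call
            return True                  # call admitted
        return False                     # call blocked (the "busy signal")


link = AdmissionController(capacity_mbps=1.5)
print(link.request(1.0))   # True:  first 1 Mbps audio call admitted
print(link.request(1.0))   # False: second call blocked; only 0.5 Mbps remains
```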
By explicitly admitting or blocking flows based on their resource requirements, and the resource requirements of already-admitted flows, the network can guarantee that admitted flows will be able to receive their requested QoS. Implicit in the need to provide a guaranteed QoS to a flow is the need for the flow to declare its QoS requirements. This process of having a flow declare its QoS requirement, and then having the network either accept the flow (at the required QoS) or block the flow is referred to as the call admission process. This then is our fourth insight (in addition to the three earlier insights from Section 9.5.2) into the mechanisms needed to provide QoS.

Insight 4: If sufficient resources will not always be available, and QoS is to be guaranteed, a call admission process is needed in which flows declare their QoS requirements and are then either admitted to the network (at the required QoS) or blocked from the network (if the required QoS cannot be provided by the network).

Our motivating example in Figure 9.18 highlights the need for several new network mechanisms and protocols if a call (an end-to-end flow) is to be guaranteed a given quality of service once it begins:

Resource reservation. The only way to guarantee that a call will have the resources (link bandwidth, buffers) needed to meet its desired QoS is to explicitly allocate those resources to the call---a process known in networking parlance as resource reservation. Once resources are reserved, the call has on-demand access to these resources throughout its duration, regardless of the demands of all other calls. If a call reserves and receives a guarantee of x Mbps of link bandwidth, and never transmits at a rate greater than x, the call will see loss- and delay-free performance.

Call admission. If resources are to be reserved, then the network must have a mechanism for calls to request and reserve resources. Since resources are not infinite, a call making a call admission request will be denied admission, that is, be blocked, if the requested resources are not available. Such a call admission is performed by the telephone network---we request resources when we dial a number. If the circuits (TDMA slots) needed to complete the call are available, the circuits are allocated and the call is completed. If the circuits are not available, then the call is blocked, and we receive a busy signal. A blocked call can try again to gain admission to the network, but it is not allowed to send traffic into the network until it has successfully completed the call admission process. Of course, a router that allocates link bandwidth should not allocate more than is available at that link. Typically, a call may reserve only a fraction of the link's bandwidth, and so a router may allocate link bandwidth to more than one call. However, the sum of the allocated bandwidth to all calls should be less than the link capacity if hard quality of service guarantees are to be provided.

Call setup signaling. The call admission process described above requires that a call be able to reserve sufficient resources at each and every network router on its source-to-destination path to ensure that its end-to-end QoS requirement is met.
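A call must clear such an admission check at every router on its path, and coordinating those per-hop checks is the job of the signaling protocol discussed next. The sketch below, again our own illustration (RSVP-like in spirit only), reserves hop by hop using the AdmissionController above and releases any partial reservations if some router refuses.

```python
def setup_call(path, rate_mbps):
    """Try to reserve rate_mbps at every router on the path; if any hop
    refuses, tear down the reservations already made and report failure."""
    accepted = []
    for router in path:                  # per-hop admission decision
        if not router.request(rate_mbps):
            for r in accepted:           # release partial reservations
                r.reserved -= rate_mbps
            return False                 # call blocked end to end
        accepted.append(router)
    return True                          # resources reserved at every hop


path = [AdmissionController(1.5), AdmissionController(10.0)]
print(setup_call(path, 1.0))   # True:  1 Mbps reserved at both hops
print(setup_call(path, 1.0))   # False: first hop has only 0.5 Mbps unreserved
```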
Each router must determine the local resources required by the session, consider the amounts of its resources that are already committed to other ongoing sessions, and determine whether it has sufficient resources to satisfy the per-hop QoS requirement of the session at this router without violating local QoS guarantees made to an already-admitted session. A signaling protocol is needed to coordinate these various activities---the per-hop allocation of local resources, as well as the overall end-to-end decision of whether or not the call has been able to reserve sufficient resources at each and every router on the end-to-end path. This is the job of the call setup protocol, as shown in Figure 9.19. The RSVP protocol \[Zhang 1993, RFC 2210\] was proposed for this purpose within an Internet architecture for providing quality-of-service guarantees. In ATM networks, the Q.2931b protocol \[Black 1995\] carries this information among the ATM network's switches and end points.

Figure 9.19 The call setup process

Despite a tremendous amount of research and development, and even products that provide for per-connection quality of service guarantees, there has been almost no extended deployment of such services. There are many possible reasons. First and foremost, it may well be the case that the simple application-level mechanisms that we studied in Sections 9.2 through 9.4, combined with proper network dimensioning (Section 9.5.1), provide "good enough" best-effort network service for multimedia applications. In addition, the added complexity and cost of deploying and managing a network that provides per-connection quality of service guarantees may be judged by ISPs to be simply too high given predicted customer revenues for that service.

9.6 Summary

Multimedia networking is one of the most exciting developments in the Internet today. People throughout the world are spending less and less time in front of their televisions, and are instead using their smartphones and devices to receive audio and video transmissions, both live and prerecorded. Moreover, with sites like YouTube, users have become producers as well as consumers of multimedia Internet content. In addition to video distribution, the Internet is also being used to transport phone calls. In fact, over the next 10 years, the Internet, along with wireless Internet access, may make the traditional circuit-switched telephone system a thing of the past. VoIP not only provides phone service inexpensively, but also provides numerous value-added services, such as video conferencing, online directory services, voice messaging, and integration into social networks such as Facebook and WeChat.

In Section 9.1, we described the intrinsic characteristics of video and voice, and then classified multimedia applications into three categories: (i) streaming stored audio/video, (ii) conversational voice/video-over-IP, and (iii) streaming live audio/video.

In Section 9.2, we studied streaming stored video in some depth. For streaming video applications, prerecorded videos are placed on servers, and users send requests to these servers to view the videos on demand. We saw that streaming video systems can be classified into two categories: UDP streaming and HTTP streaming. We observed that the most important performance measure for streaming video is average throughput.

In Section 9.3, we examined how conversational multimedia applications, such as VoIP, can be designed to run over a best-effort network.
For conversational multimedia, timing considerations are important because conversational applications are highly delay-sensitive. On the other hand, conversational multimedia applications are loss-tolerant---occasional loss only causes occasional glitches in audio/video playback, and these losses can often be partially or fully concealed. We saw how a combination of client buffers, packet sequence numbers, and timestamps can greatly alleviate the effects of network-induced jitter. We also surveyed the technology behind Skype, one of the leading voice- and video-over-IP companies. In Section 9.4, we examined two of the most important standardized protocols for VoIP, namely, RTP and SIP. In Section 9.5, we showed how several network mechanisms (link-level scheduling disciplines and traffic policing) can be used to provide differentiated service among several classes of traffic.

Homework Problems and Questions

Chapter 9 Review Questions

SECTION 9.1

R1. Reconstruct Table 9.1 for when Victor Video is watching a 4 Mbps video, Facebook Frank is looking at a new 100 Kbyte image every 20 seconds, and Martha Music is listening to a 200 kbps audio stream.

R2. There are two types of redundancy in video. Describe them, and discuss how they can be exploited for efficient compression.

R3. Suppose an analog audio signal is sampled 16,000 times per second, and each sample is quantized into one of 1024 levels. What would be the resulting bit rate of the PCM digital audio signal?

R4. Multimedia applications can be classified into three categories. Name and describe each category.

SECTION 9.2

R5. Streaming video systems can be classified into three categories. Name and briefly describe each of these categories.

R6. List three disadvantages of UDP streaming.

R7. With HTTP streaming, are the TCP receive buffer and the client's application buffer the same thing? If not, how do they interact?

R8. Consider the simple model for HTTP streaming. Suppose the server sends bits at a constant rate of 2 Mbps and playback begins when 8 million bits have been received. What is the initial buffering delay tp?

SECTION 9.3

R9. What is the difference between end-to-end delay and packet jitter? What are the causes of packet jitter?

R10. Why is a packet that is received after its scheduled playout time considered lost?

R11. Section 9.3 describes two FEC schemes. Briefly summarize them. Both schemes increase the transmission rate of the stream by adding overhead. Does interleaving also increase the transmission rate?

SECTION 9.4

R12. How are different RTP streams in different sessions identified by a receiver? How are different streams from within the same session identified?

R13. What is the role of a SIP registrar? How is the role of a SIP registrar different from that of a home agent in Mobile IP?

Problems

P1. Consider the figure below. Similar to our discussion of Figure 9.1, suppose that video is encoded at a fixed bit rate, and thus each video block contains video frames that are to be played out over the same fixed amount of time, Δ. The server transmits the first video block at t0, the second block at t0+Δ, the third block at t0+2Δ, and so on. Once the client begins playout, each block should be played out Δ time units after the previous block.

a. Suppose that the client begins playout as soon as the first block arrives at t1.
In the figure below, how many blocks of video (including the first block) will have arrived at the client in time for their playout? Explain how you arrived at your answer.

b. Suppose that the client begins playout now at t1+Δ. How many blocks of video (including the first block) will have arrived at the client in time for their playout? Explain how you arrived at your answer.

c. In the same scenario as (b) above, what is the largest number of blocks that is ever stored in the client buffer, awaiting playout? Explain how you arrived at your answer.

d. What is the smallest playout delay at the client, such that every video block has arrived in time for its playout? Explain how you arrived at your answer.

P2. Recall the simple model for HTTP streaming shown in Figure 9.3. Recall that B denotes the size of the client's application buffer, and Q denotes the number of bits that must be buffered before the client application begins playout. Also r denotes the video consumption rate. Assume that the server sends bits at a constant rate x whenever the client buffer is not full.

a. Suppose that x\<r. As discussed in the text, in this case playout will alternate between periods of continuous playout and periods of freezing. Determine the length of each continuous playout and freezing period as a function of Q, r, and x.

b. Now suppose that x\>r. At what time t=tf does the client application buffer become full?

P3. Recall the simple model for HTTP streaming shown in Figure 9.3. Suppose the buffer size is infinite but the server sends bits at variable rate x(t). Specifically, suppose x(t) has the following saw-tooth shape. The rate is initially zero at time t=0 and linearly climbs to H at time t=T. It then repeats this pattern again and again, as shown in the figure below.

a. What is the server's average send rate?

b. Suppose that Q=0, so that the client starts playback as soon as it receives a video frame. What will happen?

c. Now suppose Q\>0 and HT/2≥Q. Determine as a function of Q, H, and T the time at which playback first begins.

d. Suppose H\>2r and Q=HT/2. Prove there will be no freezing after the initial playout delay.

e. Suppose H\>2r. Find the smallest value of Q such that there will be no freezing after the initial playback delay.

f. Now suppose that the buffer size B is finite. Suppose H\>2r. As a function of Q, B, T, and H, determine the time t=tf when the client application buffer first becomes full.

P4. Recall the simple model for HTTP streaming shown in Figure 9.3. Suppose the client application buffer is infinite, the server sends at the constant rate x, and the video consumption rate is r with r\<x. Also suppose playback begins immediately. Suppose that the user terminates the video early at time t=E. At the time of termination, the server stops sending bits (if it hasn't already sent all the bits in the video).

a. Suppose the video is infinitely long. How many bits are wasted (that is, sent but not viewed)?

b. Suppose the video is T seconds long with T\>E. How many bits are wasted (that is, sent but not viewed)?

P5. Consider a DASH system (as discussed in Section 2.6) for which there are N video versions (at N different rates and qualities) and N audio versions (at N different rates and qualities). Suppose we want to allow the player to choose at any time any of the N video versions and any of the N audio versions.
a. If we create files so that the audio is mixed in with the video, so that the server sends only one media stream at a given time, how many files will the server need to store (each with a different URL)?

b. If the server instead sends the audio and video streams separately and has the client synchronize the streams, how many files will the server need to store?

P6. In the VoIP example in Section 9.3, let h be the total number of header bytes added to each chunk, including UDP and IP header.

a. Assuming an IP datagram is emitted every 20 msecs, find the transmission rate in bits per second for the datagrams generated by one side of this application.

b. What is a typical value of h when RTP is used?

P7. Consider the procedure described in Section 9.3 for estimating average delay di. Suppose that u=0.1. Let r1−t1 be the most recent sample delay, let r2−t2 be the next most recent sample delay, and so on.

a. For a given audio application suppose four packets have arrived at the receiver with sample delays r4−t4, r3−t3, r2−t2, and r1−t1. Express the estimate of delay d in terms of the four samples.

b. Generalize your formula for n sample delays.

c. For the formula in part (b), let n approach infinity and give the resulting formula. Comment on why this averaging procedure is called an exponential moving average.

P8. Repeat parts (a) and (b) in Question P7 for the estimate of average delay deviation.

P9. For the VoIP example in Section 9.3, we introduced an online procedure (exponential moving average) for estimating delay. In this problem we will examine an alternative procedure. Let ti be the timestamp of the ith packet received; let ri be the time at which the ith packet is received. Let dn be our estimate of average delay after receiving the nth packet. After the first packet is received, we set the delay estimate equal to d1=r1−t1.

a. Suppose that we would like dn=(r1−t1+r2−t2+⋯+rn−tn)/n for all n. Give a recursive formula for dn in terms of dn−1, rn, and tn.

b. Describe why for Internet telephony, the delay estimate described in Section 9.3 is more appropriate than the delay estimate outlined in part (a).

P10. Compare the procedure described in Section 9.3 for estimating average delay with the procedure in Section 3.5 for estimating round-trip time. What do the procedures have in common? How are they different?

P11. Consider the figure below (which is similar to Figure 9.3). A sender begins sending packetized audio periodically at t=1. The first packet arrives at the receiver at t=8.

a. What are the delays (from sender to receiver, ignoring any playout delays) of packets 2 through 8? Note that each vertical and horizontal line segment in the figure has a length of 1, 2, or 3 time units.

b. If audio playout begins as soon as the first packet arrives at the receiver at t=8, which of the first eight packets sent will not arrive in time for playout?

c. If audio playout begins at t=9, which of the first eight packets sent will not arrive in time for playout?

d. What is the minimum playout delay at the receiver that results in all of the first eight packets arriving in time for their playout?

P12. Consider again the figure in P11, showing packet audio transmission and reception times.

a. Compute the estimated delay for packets 2 through 8, using the formula for di from Section 9.3.2. Use a value of u=0.1.
b. Compute the estimated deviation of the delay from the estimated average for packets 2 through 8, using the formula for vi from Section 9.3.2. Use a value of u=0.1.

P13. Recall the two FEC schemes for VoIP described in Section 9.3. Suppose the first scheme generates a redundant chunk for every four original chunks. Suppose the second scheme uses a low-bit rate encoding whose transmission rate is 25 percent of the transmission rate of the nominal stream.

a. How much additional bandwidth does each scheme require? How much playback delay does each scheme add?

b. How do the two schemes perform if the first packet is lost in every group of five packets? Which scheme will have better audio quality?

c. How do the two schemes perform if the first packet is lost in every group of two packets? Which scheme will have better audio quality?

P14.

a. Consider an audio conference call in Skype with N\>2 participants. Suppose each participant generates a constant stream of rate r bps. How many bits per second will the call initiator need to send? How many bits per second will each of the other N−1 participants need to send? What is the total send rate, aggregated over all participants?

b. Repeat part (a) for a Skype video conference call using a central server.

c. Repeat part (b), but now for the case when each peer sends a copy of its video stream to each of the N−1 other peers.

P15.

a. Suppose we send into the Internet two IP datagrams, each carrying a different UDP segment. The first datagram has source IP address A1, destination IP address B, source port P1, and destination port T. The second datagram has source IP address A2, destination IP address B, source port P2, and destination port T. Suppose that A1 is different from A2 and that P1 is different from P2. Assuming that both datagrams reach their final destination, will the two UDP datagrams be received by the same socket? Why or why not?

b. Suppose Alice, Bob, and Claire want to have an audio conference call using SIP and RTP. For Alice to send and receive RTP packets to and from Bob and Claire, is only one UDP socket sufficient (in addition to the socket needed for the SIP messages)? If yes, then how does Alice's SIP client distinguish between the RTP packets received from Bob and Claire?

P16. True or false:

a. If stored video is streamed directly from a Web server to a media player, then the application is using TCP as the underlying transport protocol.

b. When using RTP, it is possible for a sender to change encoding in the middle of a session.

c. All applications that use RTP must use port 87.

d. If an RTP session has a separate audio and video stream for each sender, then the audio and video streams use the same SSRC.

e. In differentiated services, while per-hop behavior defines differences in performance among classes, it does not mandate any particular mechanism for achieving these performances.

f. Suppose Alice wants to establish an SIP session with Bob. In her INVITE message she includes the line: m=audio 48753 RTP/AVP 3 (AVP 3 denotes GSM audio). Alice has therefore indicated in this message that she wishes to send GSM audio.

g. Referring to the preceding statement, Alice has indicated in her INVITE message that she will send audio to port 48753.

h. SIP messages are typically sent between SIP entities using a default SIP port number.
i. In order to maintain registration, SIP clients must periodically send REGISTER messages.

j. SIP mandates that all SIP clients support G.711 audio encoding.

P17. Consider the figure below, which shows a leaky bucket policer being fed by a stream of packets. The token buffer can hold at most two tokens, and is initially full at t=0. New tokens arrive at a rate of one token per slot. The output link speed is such that if two packets obtain tokens at the beginning of a time slot, they can both go to the output link in the same slot. The timing details of the system are as follows:

1. Packets (if any) arrive at the beginning of the slot. Thus in the figure, packets 1, 2, and 3 arrive in slot 0. If there are already packets in the queue, then the arriving packets join the end of the queue. Packets proceed towards the front of the queue in a FIFO manner.

2. After the arrivals have been added to the queue, if there are any queued packets, one or two of those packets (depending on the number of available tokens) will each remove a token from the token buffer and go to the output link during that slot. Thus, packets 1 and 2 each remove a token from the buffer (since there are initially two tokens) and go to the output link during slot 0.

3. A new token is added to the token buffer if it is not full, since the token generation rate is r = 1 token/slot.

4. Time then advances to the next time slot, and these steps repeat.

Answer the following questions:

a. For each time slot, identify the packets that are in the queue and the number of tokens in the bucket, immediately after the arrivals have been processed (step 1 above) but before any of the packets have passed through the queue and removed a token. Thus, for the t=0 time slot in the example above, packets 1, 2, and 3 are in the queue, and there are two tokens in the buffer.

b. For each time slot indicate which packets appear on the output after the token(s) have been removed from the queue. Thus, for the t=0 time slot in the example above, packets 1 and 2 appear on the output link from the leaky bucket during slot 0.

P18. Repeat P17 but assume that r=2. Assume again that the bucket is initially full.

P19. Consider P18 and suppose now that r=3 and that b=2 as before. Will your answer to the question above change?

P20. Consider the leaky bucket policer that polices the average rate and burst size of a packet flow. We now want to police the peak rate, p, as well. Show how the output of this leaky bucket policer can be fed into a second leaky bucket policer so that the two leaky buckets in series police the average rate, peak rate, and burst size. Be sure to give the bucket size and token generation rate for the second policer.

P21. A packet flow is said to conform to a leaky bucket specification (r, b) with burst size b and average rate r if the number of packets that arrive to the leaky bucket is less than rt+b packets in every interval of time of length t for all t. Will a packet flow that conforms to a leaky bucket specification (r, b) ever have to wait at a leaky bucket policer with parameters r and b? Justify your answer.

P22. Show that as long as r1\<R⋅w1/(∑ wj), then dmax is indeed the maximum delay that any packet in flow 1 will ever experience in the WFQ queue.

Programming Assignment

In this lab, you will implement a streaming video server and client.
The client will use the real-time streaming protocol (RTSP) to control the actions of the server. The server will use the real-time protocol (RTP) to packetize the video for transport over UDP. You will be given Python code that partially implements RTSP and RTP at the client and server. Your job will be to complete both the client and server code. When you are finished, you will have created a client-server application that does the following:

The client sends SETUP, PLAY, PAUSE, and TEARDOWN RTSP commands, and the server responds to the commands. When the server is in the playing state, it periodically grabs a stored JPEG frame, packetizes the frame with RTP, and sends the RTP packet into a UDP socket. The client receives the RTP packets, removes the JPEG frames, decompresses the frames, and renders the frames on the client's monitor.

The code you will be given implements the RTSP protocol in the server and the RTP depacketization in the client. The code also takes care of displaying the transmitted video. You will need to implement RTSP in the client and RTP in the server. This programming assignment will significantly enhance the student's understanding of RTP, RTSP, and streaming video. It is highly recommended. The assignment also suggests a number of optional exercises, including implementing the RTSP DESCRIBE command at both client and server. You can find full details of the assignment, as well as an overview of the RTSP protocol, at the Web site www.pearsonhighered.com/cs-resources.

AN INTERVIEW WITH . . . Henning Schulzrinne

Henning Schulzrinne is a professor, chair of the Department of Computer Science, and head of the Internet Real-Time Laboratory at Columbia University. He is the co-author of RTP, RTSP, SIP, and GIST---key protocols for audio and video communications over the Internet. Henning received his BS in electrical and industrial engineering at TU Darmstadt in Germany, his MS in electrical and computer engineering at the University of Cincinnati, and his PhD in electrical engineering at the University of Massachusetts, Amherst.

What made you decide to specialize in multimedia networking?

This happened almost by accident. As a PhD student, I got involved with DARTnet, an experimental network spanning the United States with T1 lines. DARTnet was used as a proving ground for multicast and Internet real-time tools. That led me to write my first audio tool, NeVoT. Through some of the DARTnet participants, I became involved in the IETF, in the then-nascent Audio Video Transport working group. This group later ended up standardizing RTP.

What was your first job in the computer industry? What did it entail?

My first job in the computer industry was soldering together an Altair computer kit when I was a high school student in Livermore, California. Back in Germany, I started a little consulting company that devised an address management program for a travel agency---storing data on cassette tapes for our TRS-80 and using an IBM Selectric typewriter with a home-brew hardware interface as a printer. My first real job was with AT&T Bell Laboratories, developing a network emulator for constructing experimental networks in a lab environment.

What are the goals of the Internet Real-Time Lab?

Our goal is to provide components and building blocks for the Internet as the single future communications infrastructure.
This includes developing new protocols, such as GIST (for network-layer signaling) and LoST (for finding resources by location), or enhancing protocols that we have worked on earlier, such as SIP, through work on rich presence, peer-to-peer systems, next-generation emergency calling, and service creation tools. Recently, we have also looked extensively at wireless systems for VoIP, as 802.11b and 802.11n networks and maybe WiMax networks are likely to become important last-mile technologies for telephony. We are also trying to greatly improve the ability of users to diagnose faults in the complicated tangle of providers and equipment, using a peer-to-peer fault diagnosis system called DYSWIS (Do You See What I See). We try to do practically relevant work, by building prototypes and open source systems, by measuring performance of real systems, and by contributing to IETF standards.

What is your vision for the future of multimedia networking?

We are now in a transition phase, just a few years shy of when IP will be the universal platform for multimedia services, from IPTV to VoIP. We expect radio, telephone, and TV to be available even during snowstorms and earthquakes, so when the Internet takes over the role of these dedicated networks, users will expect the same level of reliability. We will have to learn to design network technologies for an ecosystem of competing carriers, service and content providers, serving lots of technically untrained users and defending them against a small, but destructive, set of malicious and criminal users. Changing protocols is becoming increasingly hard. They are also becoming more complex, as they need to take into account competing business interests, security, privacy, and the lack of transparency of networks caused by firewalls and network address translators. Since multimedia networking is becoming the foundation for almost all of consumer entertainment, there will be an emphasis on managing very large networks, at low cost. Users will expect ease of use, such as finding the same content on all of their devices.

Why does SIP have a promising future?

As the current wireless network upgrade to 3G networks proceeds, there is the hope of a single multimedia signaling mechanism spanning all types of networks, from cable modems, to corporate telephone networks and public wireless networks. Together with software radios, this will make it possible in the future that a single device can be used on a home network, as a cordless Bluetooth phone, in a corporate network via 802.11 and in the wide area via 3G networks. Even before we have such a single universal wireless device, the personal mobility mechanisms make it possible to hide the differences between networks. One identifier becomes the universal means of reaching a person, rather than remembering or passing around half a dozen technology- or location-specific telephone numbers.

SIP also breaks apart the provision of voice (bit) transport from voice services. It now becomes technically possible to break apart the local telephone monopoly, where one company provides neutral bit transport, while others provide IP "dial tone" and the classical telephone services, such as gateways, call forwarding, and caller ID. Beyond multimedia signaling, SIP offers a new service that has been missing in the Internet: event notification. We have approximated such services with HTTP kludges and e-mail, but this was never very satisfactory.
Since events are a common abstraction for distributed systems, this may simplify the construction of new services.

Do you have any advice for students entering the networking field?

Networking bridges disciplines. It draws from electrical engineering, all aspects of computer science, operations research, statistics, economics, and other disciplines. Thus, networking researchers have to be familiar with subjects well beyond protocols and routing algorithms. Given that networks are becoming such an important part of everyday life, students wanting to make a difference in the field should think of the new resource constraints in networks: human time and effort, rather than just bandwidth or storage. Work in networking research can be immensely satisfying since it is about allowing people to communicate and exchange ideas, one of the essentials of being human. The Internet has become the third major global infrastructure, next to the transportation system and energy distribution. Almost no part of the economy can work without high-performance networks, so there should be plenty of opportunities for the foreseeable future.

References

A note on URLs. In the references below, we have provided URLs for Web pages, Web-only documents, and other material that has not been published in a conference or journal (when we have been able to locate a URL for such material). We have not provided URLs for conference and journal publications, as these documents can usually be located via a search engine, from the conference Web site (e.g., papers in all ACM SIGCOMM conferences and workshops can be located via http://www.acm.org/sigcomm), or via a digital library subscription. While all URLs provided below were valid (and tested) in Jan. 2016, URLs can become out of date. Please consult the online version of this book (www.pearsonhighered.com/cs-resources) for an up-to-date bibliography.

A note on Internet Request for Comments (RFCs): Copies of Internet RFCs are available at many sites. The RFC Editor of the Internet Society (the body that oversees the RFCs) maintains the site, http://www.rfc-editor.org. This site allows you to search for a specific RFC by title, number, or authors, and will show updates to any RFCs listed. Internet RFCs can be updated or obsoleted by later RFCs. Our favorite site for getting RFCs is the original source---http://www.rfc-editor.org.

\[3GPP 2016\] Third Generation Partnership Project homepage, http://www.3gpp.org/

\[Abramson 1970\] N. Abramson, "The Aloha System---Another Alternative for Computer Communications," Proc. 1970 Fall Joint Computer Conference, AFIPS Conference, p. 37, 1970.

\[Abramson 1985\] N. Abramson, "Development of the Alohanet," IEEE Transactions on Information Theory, Vol. IT-31, No. 3 (Mar. 1985), pp. 119--123.

\[Abramson 2009\] N. Abramson, "The Alohanet---Surfing for Wireless Data," IEEE Communications Magazine, Vol. 47, No. 12, pp. 21--25.

\[Adhikari 2011a\] V. K. Adhikari, S. Jain, Y. Chen, Z. L. Zhang, "Vivisecting YouTube: An Active Measurement Study," Technical Report, University of Minnesota, 2011.

\[Adhikari 2012\] V. K. Adhikari, Y. Gao, F. Hao, M. Varvello, V. Hilt, M. Steiner, Z. L. Zhang, "Unreeling Netflix: Understanding and Improving Multi-CDN Movie Delivery," Technical Report, University of Minnesota, 2012.

\[Afanasyev 2010\] A. Afanasyev, N. Tilley, P. Reiher, L. Kleinrock, "Host-to-Host Congestion Control for TCP," IEEE Communications Surveys & Tutorials, Vol. 12, No.
3, pp. 304--342.

\[Agarwal 2009\] S. Agarwal, J. Lorch, "Matchmaking for Online Games and Other Latency-sensitive P2P Systems," Proc. 2009 ACM SIGCOMM.

\[Ager 2012\] B. Ager, N. Chatzis, A. Feldmann, N. Sarrar, S. Uhlig, W. Willinger, "Anatomy of a Large European ISP," Sigcomm, 2012.

\[Ahn 1995\] J. S. Ahn, P. B. Danzig, Z. Liu, and Y. Yan, "Experience with TCP Vegas: Emulation and Experiment," Proc. 1995 ACM SIGCOMM (Boston, MA, Aug. 1995), pp. 185--195.

\[Akamai 2016\] Akamai homepage, http://www.akamai.com

\[Akella 2003\] A. Akella, S. Seshan, A. Shaikh, "An Empirical Evaluation of Wide-Area Internet Bottlenecks," Proc. 2003 ACM Internet Measurement Conference (Miami, FL, Nov. 2003).

\[Akhshabi 2011\] S. Akhshabi, A. C. Begen, C. Dovrolis, "An Experimental Evaluation of Rate-Adaptation Algorithms in Adaptive Streaming over HTTP," Proc. 2011 ACM Multimedia Systems Conf.

\[Akyildiz 2010\] I. Akyildiz, D. Gutierrex-Estevez, E. Reyes, "The Evolution to 4G Cellular Systems: LTE Advanced," Physical Communication, Elsevier, 3 (2010), pp. 217--244.

\[Albitz 1993\] P. Albitz and C. Liu, DNS and BIND, O'Reilly & Associates, Petaluma, CA, 1993.

\[Al-Fares 2008\] M. Al-Fares, A. Loukissas, A. Vahdat, "A Scalable, Commodity Data Center Network Architecture," Proc. 2008 ACM SIGCOMM.

\[Amazon 2014\] J. Hamilton, "AWS: Innovation at Scale," YouTube video, https://www.youtube.com/watch?v=JIQETrFC_SQ

\[Anderson 1995\] J. B. Andersen, T. S. Rappaport, S. Yoshida, "Propagation Measurements and Models for Wireless Communications Channels," IEEE Communications Magazine (Jan. 1995), pp. 42--49.

\[Alizadeh 2010\] M. Alizadeh, A. Greenberg, D. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, M. Sridharan, "Data Center TCP (DCTCP)," ACM SIGCOMM 2010 Conference, ACM, New York, NY, USA, pp. 63--74.

\[Allman 2011\] E. Allman, "The Robustness Principle Reconsidered: Seeking a Middle Ground," Communications of the ACM, Vol. 54, No. 8 (Aug. 2011), pp. 40--45.

\[Appenzeller 2004\] G. Appenzeller, I. Keslassy, N. McKeown, "Sizing Router Buffers," Proc. 2004 ACM SIGCOMM (Portland, OR, Aug. 2004).

\[ASO-ICANN 2016\] The Address Supporting Organization homepage, http://www.aso.icann.org

\[AT&T 2013\] "AT&T Vision Alignment Challenge Technology Survey," AT&T Domain 2.0 Vision White Paper, November 13, 2013.

\[Atheros 2016\] Atheros Communications Inc., "Atheros AR5006 WLAN Chipset Product Bulletins," http://www.atheros.com/pt/AR5006Bulletins.htm

\[Ayanoglu 1995\] E. Ayanoglu, S. Paul, T. F. La Porta, K. K. Sabnani, R. D. Gitlin, "AIRMAIL: A Link-Layer Protocol for Wireless Networks," ACM/Baltzer Wireless Networks Journal, 1: 47--60, Feb. 1995.

\[Bakre 1995\] A. Bakre, B. R. Badrinath, "I-TCP: Indirect TCP for Mobile Hosts," Proc. 1995 Int. Conf. on Distributed Computing Systems (ICDCS) (May 1995), pp. 136--143.

\[Balakrishnan 1997\] H. Balakrishnan, V. Padmanabhan, S. Seshan, R. Katz, "A Comparison of Mechanisms for Improving TCP Performance Over Wireless Links," IEEE/ACM Transactions on Networking, Vol. 5, No. 6 (Dec. 1997).

\[Balakrishnan 2003\] H. Balakrishnan, F. Kaashoek, D. Karger, R. Morris, I. Stoica, "Looking Up Data in P2P Systems," Communications of the ACM, Vol. 46, No. 2 (Feb. 2003), pp. 43--48.

\[Baldauf 2007\] M. Baldauf, S. Dustdar, F. Rosenberg, "A Survey on Context-Aware Systems," Int. J. Ad Hoc and Ubiquitous Computing, Vol. 2, No. 4 (2007), pp. 263--277.

\[Baran 1964\] P.
Baran, "On Distributed Communication Networks," IEEE +Transactions on Communication Systems, Mar. 1964. Rand Corporation +Technical report with the same title (Memorandum RM-3420-PR, 1964). +http://www.rand.org/publications/RM/RM3420/ + +\[Bardwell 2004\] J. Bardwell, "You Believe You Understand What You +Think I Said . . . The Truth About 802.11 Signal and Noise Metrics: A +Discussion Clarifying OftenMisused 802.11 WLAN Terminologies," +http://www.connect802.com/download/techpubs/2004/you_believe_D100201.pdf + +\[Barford 2009\] P. Barford, N. Duffield, A. Ron, J. Sommers, "Network +Performance Anomaly Detection and Localization," Proc. 2009 IEEE INFOCOM +(Apr. 2009). + +\[Baronti 2007\] P. Baronti, P. Pillai, V. Chook, S. Chessa, A. Gotta, +Y. Hu, "Wireless Sensor Networks: A Survey on the State of the Art and +the 802.15.4 and ZigBee Standards," Computer Communications, Vol. 30, +No. 7 (2007), pp. 1655--1695. + +\[Baset 2006\] S. A. Basset and H. Schulzrinne, "An Analysis of the +Skype Peer-to-Peer Internet Telephony Protocol," Proc. 2006 IEEE INFOCOM +(Barcelona, Spain, Apr. 2006). + +\[BBC 2001\] BBC news online "A Small Slice of Design," Apr. 2001, +http://news.bbc.co.uk/2/hi/science/nature/1264205.stm + +\[Beheshti 2008\] N. Beheshti, Y. Ganjali, M. Ghobadi, N. McKeown, G. +Salmon, "Experimental Study of Router Buffer Sizing," Proc. ACM Internet +Measurement Conference (Oct. 2008, Vouliagmeni, Greece). + +\[Bender 2000\] P. Bender, P. Black, M. Grob, R. Padovani, N. +Sindhushayana, A. Viterbi, "CDMA/HDR: A Bandwidth-Efficient High-Speed +Wireless Data Service for Nomadic Users," IEEE Commun. Mag., Vol. 38, +No. 7 (July 2000), pp. 70--77. + +\[Berners-Lee 1989\] T. Berners-Lee, CERN, "Information Management: A +Proposal," Mar. 1989, May 1990. http://www.w3.org/History/1989/proposal +.html + +\[Berners-Lee 1994\] T. Berners-Lee, R. Cailliau, A. Luotonen, H. +Frystyk Nielsen, A. Secret, "The World-Wide Web," Communications of the +ACM, Vol. 37, No. 8 (Aug. 1994), pp. 76--82. + +\[Bertsekas 1991\] D. Bertsekas, R. Gallagher, Data Networks, 2nd Ed., +Prentice Hall, Englewood Cliffs, NJ, 1991. + +\[Biersack 1992\] E. W. Biersack, "Performance Evaluation of Forward +Error Correction in ATM Networks," Proc. 1999 ACM SIGCOMM (Baltimore, +MD, Aug. 1992), pp. 248--257. + +\[BIND 2016\] Internet Software Consortium page on BIND, +http://www.isc.org/bind.html + +\[Bisdikian 2001\] C. Bisdikian, "An Overview of the Bluetooth Wireless +Technology," IEEE Communications Magazine, No. 12 (Dec. 2001), +pp. 86--94. + +\[Bishop 2003\] M. Bishop, Computer Security: Art and Science, Boston: +Addison Wesley, Boston MA, 2003. + +\[Black 1995\] U. Black, ATM Volume I: Foundation for Broadband +Networks, Prentice Hall, 1995. + +\[Black 1997\] U. Black, ATM Volume II: Signaling in Broadband Networks, +Prentice Hall, 1997. + +\[Blumenthal 2001\] M. Blumenthal, D. Clark, "Rethinking the Design of +the Internet: The End-to-end Arguments vs. the Brave New World," ACM +Transactions on Internet Technology, Vol. 1, No. 1 (Aug. 2001), +pp. 70--109. + +\[Bochman 1984\] G. V. Bochmann, C. A. Sunshine, "Formal Methods in +Communication Protocol Design," IEEE Transactions on Communications, +Vol. 28, No. 4 (Apr. 1980) pp. 624--631. + +\[Bolot 1996\] J-C. Bolot, A. Vega-Garcia, "Control Mechanisms for +Packet Audio in the Internet," Proc. 1996 IEEE INFOCOM, pp. 232--239. + +\[Bosshart 2013\] P. Bosshart, G. Gibb, H. Kim, G. Varghese, N. McKeown, +M. Izzard, F. Mujica, M. 
Horowitz, "Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN," ACM SIGCOMM Comput. Commun. Rev., 43, 4 (Aug. 2013), pp. 99--110.

\[Bosshart 2014\] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, D. Walker, "P4: Programming Protocol-Independent Packet Processors," ACM SIGCOMM Comput. Commun. Rev., 44, 3 (July 2014), pp. 87--95.

\[Brakmo 1995\] L. Brakmo, L. Peterson, "TCP Vegas: End to End Congestion Avoidance on a Global Internet," IEEE Journal of Selected Areas in Communications, Vol. 13, No. 8 (Oct. 1995), pp. 1465--1480.

\[Bryant 1988\] B. Bryant, "Designing an Authentication System: A Dialogue in Four Scenes," http://web.mit.edu/kerberos/www/dialogue.html

\[Bush 1945\] V. Bush, "As We May Think," The Atlantic Monthly, July 1945. http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm

\[Byers 1998\] J. Byers, M. Luby, M. Mitzenmacher, A. Rege, "A Digital Fountain Approach to Reliable Distribution of Bulk Data," Proc. 1998 ACM SIGCOMM (Vancouver, Canada, Aug. 1998), pp. 56--67.

\[Caesar 2005a\] M. Caesar, D. Caldwell, N. Feamster, J. Rexford, A. Shaikh, J. van der Merwe, "Design and Implementation of a Routing Control Platform," Proc. Networked Systems Design and Implementation (May 2005).

\[Caesar 2005b\] M. Caesar, J. Rexford, "BGP Routing Policies in ISP Networks," IEEE Network Magazine, Vol. 19, No. 6 (Nov. 2005).

\[Caldwell 2012\] C. Caldwell, "The Prime Pages," http://www.utm.edu/research/primes/prove

\[Cardwell 2000\] N. Cardwell, S. Savage, T. Anderson, "Modeling TCP Latency," Proc. 2000 IEEE INFOCOM (Tel-Aviv, Israel, Mar. 2000).

\[Casado 2007\] M. Casado, M. Freedman, J. Pettit, J. Luo, N. McKeown, S. Shenker, "Ethane: Taking Control of the Enterprise," Proc. ACM SIGCOMM '07, New York, pp. 1--12. See also IEEE/ACM Trans. Networking, 17, 4 (Aug. 2007), pp. 1270--1283.

\[Casado 2009\] M. Casado, M. Freedman, J. Pettit, J. Luo, N. Gude, N. McKeown, S. Shenker, "Rethinking Enterprise Network Control," IEEE/ACM Transactions on Networking (ToN), Vol. 17, No. 4 (Aug. 2009), pp. 1270--1283.

\[Casado 2014\] M. Casado, N. Foster, A. Guha, "Abstractions for Software-Defined Networks," Communications of the ACM, Vol. 57, No. 10 (Oct. 2014), pp. 86--95.

\[Cerf 1974\] V. Cerf, R. Kahn, "A Protocol for Packet Network Interconnection," IEEE Transactions on Communications Technology, Vol. COM-22, No. 5, pp. 627--641.

\[CERT 2001--09\] CERT, "Advisory 2001--09: Statistical Weaknesses in TCP/IP Initial Sequence Numbers," http://www.cert.org/advisories/CA-2001-09.html

\[CERT 2003--04\] CERT, "CERT Advisory CA-2003-04 MS-SQL Server Worm," http://www.cert.org/advisories/CA-2003-04.html

\[CERT 2016\] CERT, http://www.cert.org

\[CERT Filtering 2012\] CERT, "Packet Filtering for Firewall Systems," http://www.cert.org/tech_tips/packet_filtering.html

\[Cert SYN 1996\] CERT, "Advisory CA-96.21: TCP SYN Flooding and IP Spoofing Attacks," http://www.cert.org/advisories/CA-1998-01.html

\[Chandra 2007\] T. Chandra, R. Greisemer, J. Redstone, "Paxos Made Live: An Engineering Perspective," Proc. of 2007 ACM Symposium on Principles of Distributed Computing (PODC), pp. 398--407.

\[Chao 2001\] H. J. Chao, C. Lam, E. Oki, Broadband Packet Switching Technologies---A Practical Guide to ATM Switches and IP Routers, John Wiley & Sons, 2001.

\[Chao 2011\] C. Zhang, P. Dunghel, D. Wu, K. W.
Ross, "Unraveling the +BitTorrent Ecosystem," IEEE Transactions on Parallel and Distributed +Systems, Vol. 22, No. 7 (July 2011). + +\[Chen 2000\] G. Chen, D. Kotz, "A Survey of Context-Aware Mobile +Computing Research," Technical Report TR2000-381, Dept. of Computer +Science, Dartmouth College, Nov. 2000. +http://www.cs.dartmouth.edu/reports/TR2000-381.pdf + +\[Chen 2006\] K.-T. Chen, C.-Y. Huang, P. Huang, C.-L. Lei, "Quantifying +Skype User Satisfaction," Proc. 2006 ACM SIGCOMM (Pisa, Italy, +Sept. 2006). + +\[Chen 2011\] Y. Chen, S. Jain, V. K. Adhikari, Z. Zhang, +"Characterizing Roles of Front-End Servers in End-to-End Performance of +Dynamic Content Distribution," Proc. 2011 ACM Internet Measurement +Conference (Berlin, Germany, Nov. 2011). + +\[Cheswick 2000\] B. Cheswick, H. Burch, S. Branigan, "Mapping and +Visualizing the Internet," Proc. 2000 Usenix Conference (San Diego, CA, +June 2000). + +\[Chiu 1989\] D. Chiu, R. Jain, "Analysis of the Increase and Decrease +Algorithms for Congestion Avoidance in Computer Networks," Computer +Networks and ISDN Systems, Vol. 17, No. 1, pp. 1--14. +http://www.cs.wustl.edu/\~jain/papers/cong_av.htm + +\[Christiansen 2001\] M. Christiansen, K. Jeffay, D. Ott, F. D. Smith, +"Tuning Red for Web Traffic," IEEE/ACM Transactions on Networking, Vol. +9, No. 3 (June 2001), pp. 249--264. + +\[Chuang 2005\] S. Chuang, S. Iyer, N. McKeown, "Practical Algorithms +for Performance Guarantees in Buffered Crossbars," Proc. 2005 IEEE +INFOCOM. + +\[Cisco 802.11ac 2014\] Cisco Systems, "802.11ac: The Fifth Generation +of Wi-Fi," Technical White Paper, Mar. 2014. + +\[Cisco 7600 2016\] Cisco Systems, "Cisco 7600 Series Solution and +Design Guide," +http://www.cisco.com/en/US/products/hw/routers/ps368/prod_technical\_ +reference09186a0080092246.html + +\[Cisco 8500 2012\] Cisco Systems Inc., "Catalyst 8500 Campus Switch +Router Architecture," +http://www.cisco.com/univercd/cc/td/doc/product/l3sw/8540/rel_12_0/w5_6f/softcnfg/1cfg8500.pdf + +\[Cisco 12000 2016\] Cisco Systems Inc., "Cisco XR 12000 Series and +Cisco 12000 Series Routers," +http://www.cisco.com/en/US/products/ps6342/index.html + +\[Cisco 2012\] Cisco 2012, Data Centers, http://www.cisco.com/go/dce + +\[Cisco 2015\] Cisco Visual Networking Index: Forecast and Methodology, +2014--2019, White Paper, 2015. 

\[Cisco 6500 2016\] Cisco Systems, "Cisco Catalyst 6500 Architecture White Paper," http://www.cisco.com/c/en/us/products/collateral/switches/catalyst-6500-series-switches/prod_white_paper0900aecd80673385.html

\[Cisco NAT 2016\] Cisco Systems Inc., "How NAT Works," http://www.cisco.com/en/US/tech/tk648/tk361/technologies_tech_note09186a0080094831.shtml

\[Cisco QoS 2016\] Cisco Systems Inc., "Advanced QoS Services for the Intelligent Internet," http://www.cisco.com/warp/public/cc/pd/iosw/ioft/ioqo/tech/qos_wp.htm

\[Cisco Queue 2016\] Cisco Systems Inc., "Congestion Management Overview," http://www.cisco.com/en/US/docs/ios/12_2/qos/configuration/guide/qcfconmg.html

\[Cisco SYN 2016\] Cisco Systems Inc., "Defining Strategies to Protect Against TCP SYN Denial of Service Attacks," http://www.cisco.com/en/US/tech/tk828/technologies_tech_note09186a00800f67d5.shtml

\[Cisco TCAM 2014\] Cisco Systems Inc., "CAT 6500 and 7600 Series Routers and Switches TCAM Allocation Adjustment Procedures," http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/117712-problemsolution-cat6500-00.html

\[Cisco VNI 2015\] Cisco Systems Inc., "Visual Networking Index," http://www.cisco.com/web/solutions/sp/vni/vni_forecast_highlights/index.html

\[Clark 1988\] D. Clark, "The Design Philosophy of the DARPA Internet Protocols," Proc. 1988 ACM SIGCOMM (Stanford, CA, Aug. 1988).

\[Cohen 1977\] D. Cohen, "Issues in Transnet Packetized Voice Communication," Proc. Fifth Data Communications Symposium (Snowbird, UT, Sept. 1977), pp. 6--13.

\[Cookie Central 2016\] Cookie Central homepage, http://www.cookiecentral.com/n_cookie_faq.htm

\[Cormen 2001\] T. H. Cormen, Introduction to Algorithms, 2nd Ed., MIT Press, Cambridge, MA, 2001.

\[Crow 1997\] B. Crow, I. Widjaja, J. Kim, P. Sakai, "IEEE 802.11 Wireless Local Area Networks," IEEE Communications Magazine (Sept. 1997), pp. 116--126.

\[Cusumano 1998\] M. A. Cusumano, D. B. Yoffie, Competing on Internet Time: Lessons from Netscape and Its Battle with Microsoft, Free Press, New York, NY, 1998.

\[Czyz 2014\] J. Czyz, M. Allman, J. Zhang, S. Iekel-Johnson, E. Osterweil, M. Bailey, "Measuring IPv6 Adoption," Proc. ACM SIGCOMM 2014, ACM, New York, NY, USA, pp. 87--98.

\[Dahlman 1998\] E. Dahlman, B. Gudmundson, M. Nilsson, J. Sköld, "UMTS/IMT-2000 Based on Wideband CDMA," IEEE Communications Magazine (Sept. 1998), pp. 70--80.

\[Daigle 1991\] J. N. Daigle, Queuing Theory for Telecommunications, Addison-Wesley, Reading, MA, 1991.

\[DAM 2016\] Digital Attack Map, http://www.digitalattackmap.com

\[Davie 2000\] B. Davie and Y. Rekhter, MPLS: Technology and Applications, Morgan Kaufmann Series in Networking, 2000.

\[Davies 2005\] G. Davies, F. Kelly, "Network Dimensioning, Service Costing, and Pricing in a Packet-Switched Environment," Telecommunications Policy, Vol. 28, No. 4, pp. 391--412.

\[DEC 1990\] Digital Equipment Corporation, "In Memoriam: J. C. R. Licklider 1915--1990," SRC Research Report 61, Aug. 1990. http://www.memex.org/licklider.pdf

\[DeClercq 2002\] J. DeClercq, O. Paridaens, "Scalability Implications of Virtual Private Networks," IEEE Communications Magazine, Vol. 40, No. 5 (May 2002), pp. 151--157.

\[Demers 1990\] A. Demers, S. Keshav, S. Shenker, "Analysis and Simulation of a Fair Queuing Algorithm," Internetworking: Research and Experience, Vol. 1, No. 1 (1990), pp. 3--26.

\[dhc 2016\] IETF Dynamic Host Configuration working group homepage, http://www.ietf.org/html.charters/dhc-charter.html

\[Dhungel 2012\] P. Dhungel, K. W. Ross, M. Steiner, Y. Tian, X. Hei, "Xunlei: Peer-Assisted Download Acceleration on a Massive Scale," Passive and Active Measurement Conference (PAM), Vienna, 2012.

\[Diffie 1976\] W. Diffie, M. E. Hellman, "New Directions in Cryptography," IEEE Transactions on Information Theory, Vol. IT-22 (1976), pp. 644--654.

\[Diggavi 2004\] S. N. Diggavi, N. Al-Dhahir, A. Stamoulis, R. Calderbank, "Great Expectations: The Value of Spatial Diversity in Wireless Networks," Proceedings of the IEEE, Vol. 92, No. 2 (Feb. 2004).

\[Dilley 2002\] J. Dilley, B. Maggs, J. Parikh, H. Prokop, R. Sitaraman, B. Weihl, "Globally Distributed Content Delivery," IEEE Internet Computing (Sept.--Oct. 2002).

\[Diot 2000\] C. Diot, B. N. Levine, B. Lyles, H. Kassem, D. Balensiefen, "Deployment Issues for the IP Multicast Service and Architecture," IEEE Network, Vol. 14, No. 1 (Jan./Feb. 2000), pp. 78--88.

\[Dischinger 2007\] M. Dischinger, A. Haeberlen, K. Gummadi, S. Saroiu, "Characterizing Residential Broadband Networks," Proc. 2007 ACM Internet Measurement Conference, pp. 24--26.

\[Dmitiropoulos 2007\] X. Dmitiropoulos, D. Krioukov, M. Fomenkov, B. Huffaker, Y. Hyun, K. C. Claffy, G. Riley, "AS Relationships: Inference and Validation," ACM Computer Communication Review (Jan. 2007).

\[DOCSIS 2011\] Data-Over-Cable Service Interface Specifications, DOCSIS 3.0: MAC and Upper Layer Protocols Interface Specification, CM-SP-MULPIv3.0-I16-110623, 2011.

\[Dodge 2016\] M. Dodge, "An Atlas of Cyberspaces," http://www.cybergeography.org/atlas/isp_maps.html

\[Donahoo 2001\] M. Donahoo, K. Calvert, TCP/IP Sockets in C: Practical Guide for Programmers, Morgan Kaufmann, 2001.

\[DSL 2016\] DSL Forum homepage, http://www.dslforum.org/

\[Dhunghel 2008\] P. Dhungel, D. Wu, B. Schonhorst, K. W. Ross, "A Measurement Study of Attacks on BitTorrent Leechers," 7th International Workshop on Peer-to-Peer Systems (IPTPS 2008) (Tampa Bay, FL, Feb. 2008).

\[Droms 2002\] R. Droms, T. Lemon, The DHCP Handbook (2nd Edition), SAMS Publishing, 2002.

\[Edney 2003\] J. Edney and W. A. Arbaugh, Real 802.11 Security: Wi-Fi Protected Access and 802.11i, Addison-Wesley Professional, 2003.

\[Edwards 2011\] W. K. Edwards, R. Grinter, R. Mahajan, D. Wetherall, "Advancing the State of Home Networking," Communications of the ACM, Vol. 54, No. 6 (June 2011), pp. 62--71.

\[Ellis 1987\] H. Ellis, "The Story of Non-Secret Encryption," http://jya.com/ellisdoc.htm

\[Erickson 2013\] D. Erickson, "The Beacon OpenFlow Controller," 2nd ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking (HotSDN '13), ACM, New York, NY, USA, pp. 13--18.

\[Ericsson 2012\] Ericsson, "The Evolution of EDGE," http://www.ericsson.com/technology/whitepapers/broadband/evolution_of_EDGE.shtml

\[Facebook 2014\] A. Andreyev, "Introducing Data Center Fabric, the Next-Generation Facebook Data Center Network," https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network

\[Faloutsos 1999\] C. Faloutsos, M. Faloutsos, P. Faloutsos, "What Does the Internet Look Like? Empirical Laws of the Internet Topology," Proc. 1999 ACM SIGCOMM (Boston, MA, Aug. 1999).

\[Farrington 2010\] N. Farrington, G. Porter, S. Radhakrishnan, H. Bazzaz, V. Subramanya, Y. Fainman, G. Papen, A.
Vahdat, "Helios: A +Hybrid Electrical/Optical Switch Architecture for Modular Data Centers," +Proc. 2010 ACM SIGCOMM. + +\[Feamster 2004\] N. Feamster, H. Balakrishnan, J. Rexford, A. Shaikh, +K. van der Merwe, "The Case for Separating Routing from Routers," ACM +SIGCOMM Workshop on Future Directions in Network Architecture, +Sept. 2004. + +\[Feamster 2004\] N. Feamster, J. Winick, J. Rexford, "A Model for BGP +Routing for Network Engineering," Proc. 2004 ACM SIGMETRICS (New York, +NY, June 2004). + +\[Feamster 2005\] N. Feamster, H. Balakrishnan, "Detecting BGP +Configuration Faults with Static Analysis," NSDI (May 2005). + +\[Feamster 2013\] N. Feamster, J. Rexford, E. Zegura, "The Road to SDN," +ACM Queue, Volume 11, Issue 12, (Dec. 2013). + +\[Feldmeier 1995\] D. Feldmeier, "Fast Software Implementation of Error +Detection Codes," IEEE/ACM Transactions on Networking, Vol. 3, No. 6 +(Dec. 1995), pp. 640--652. + +\[Ferguson 2013\] A. Ferguson, A. Guha, C. Liang, R. Fonseca, S. +Krishnamurthi, "Participatory Networking: An API for Application Control +of SDNs," Proceedings ACM SIGCOMM 2013, pp. 327--338. + +\[Fielding 2000\] R. Fielding, "Architectural Styles and the Design of +Network-based Software Architectures," 2000. PhD Thesis, UC Irvine, +2000. + +\[FIPS 1995\] Federal Information Processing Standard, "Secure Hash +Standard," FIPS Publication 180-1. +http://www.itl.nist.gov/fipspubs/fip180-1.htm + +\[Floyd 1999\] S. Floyd, K. Fall, "Promoting the Use of End-to-End +Congestion Control in the Internet," IEEE/ACM Transactions on +Networking, Vol. 6, No. 5 (Oct. 1998), pp. 458--472. + +\[Floyd 2000\] S. Floyd, M. Handley, J. Padhye, J. Widmer, +"Equation-Based Congestion Control for Unicast Applications," Proc. 2000 +ACM SIGCOMM (Stockholm, Sweden, Aug. 2000). + +\[Floyd 2001\] S. Floyd, "A Report on Some Recent Developments in TCP +Congestion Control," IEEE Communications Magazine (Apr. 2001). + +\[Floyd 2016\] S. Floyd, "References on RED (Random Early Detection) +Queue Management," http://www.icir.org/floyd/red.html + +\[Floyd Synchronization 1994\] S. Floyd, V. Jacobson, "Synchronization +of Periodic Routing Messages," IEEE/ACM Transactions on Networking, Vol. +2, No. 2 (Apr. 1997) pp. 122--136. + +\[Floyd TCP 1994\] S. Floyd, "TCP and Explicit Congestion Notification," +ACM SIGCOMM Computer Communications Review, Vol. 24, No. 5 (Oct. 1994), +pp. 10--23. + +\[Fluhrer 2001\] S. Fluhrer, I. Mantin, A. Shamir, "Weaknesses in the +Key Scheduling Algorithm of RC4," Eighth Annual Workshop on Selected +Areas in Cryptography (Toronto, Canada, Aug. 2002). + +\[Fortz 2000\] B. Fortz, M. Thorup, "Internet Traffic Engineering by +Optimizing OSPF Weights," Proc. 2000 IEEE INFOCOM (Tel Aviv, Israel, +Apr. 2000). + +\[Fortz 2002\] B. Fortz, J. Rexford, M. Thorup, "Traffic Engineering +with Traditional IP Routing Protocols," IEEE Communication Magazine +(Oct. 2002). + +\[Fraleigh 2003\] C. Fraleigh, F. Tobagi, C. Diot, "Provisioning IP +Backbone Networks to Support Latency Sensitive Traffic," Proc. 2003 IEEE +INFOCOM (San Francisco, CA, Mar. 2003). + +\[Frost 1994\] J. 
Frost, "BSD Sockets: A Quick and Dirty Primer," +http://world.std .com/\~jimf/papers/sockets/sockets.html + +\[FTC 2015\] Internet of Things: Privacy and Security in a Connected +World, Federal Trade Commission, 2015, +https://www.ftc.gov/system/files/documents/reports/ +federal-trade-commission-staff-report-november-2013-workshop-entitled-internet-things-privacy/150127iotrpt.pdf + +\[FTTH 2016\] Fiber to the Home Council, http://www.ftthcouncil.org/ + +\[Gao 2001\] L. Gao, J. Rexford, "Stable Internet Routing Without Global +Coordination," IEEE/ACM Transactions on Networking, Vol. 9, No. 6 +(Dec. 2001), pp. 681--692. + +\[Gartner 2014\] Gartner report on Internet of Things, +http://www.gartner.com/ technology/research/internet-of-things + +\[Gauthier 1999\] L. Gauthier, C. Diot, and J. Kurose, "End-to-End +Transmission Control Mechanisms for Multiparty Interactive Applications +on the Internet," Proc. 1999 IEEE INFOCOM (New York, NY, Apr. 1999). + +\[Gember-Jacobson 2014\] A. Gember-Jacobson, R. Viswanathan, C. Prakash, +R. Grandl, J. Khalid, S. Das, A. Akella, "OpenNF: Enabling Innovation in +Network Function Control," Proc. ACM SIGCOMM 2014, pp. 163--174. + +\[Goodman 1997\] David J. Goodman, Wireless Personal Communications +Systems, Prentice-Hall, 1997. + +\[Google IPv6 2015\] Google Inc. "IPv6 Statistics," +https://www.google.com/intl/en/ipv6/statistics.html + +\[Google Locations 2016\] Google data centers. +http://www.google.com/corporate/datacenter/locations.html + +\[Goralski 1999\] W. Goralski, Frame Relay for High-Speed Networks, John +Wiley, New York, 1999. + +\[Greenberg 2009a\] A. Greenberg, J. Hamilton, D. Maltz, P. Patel, "The +Cost of a Cloud: Research Problems in Data Center Networks," ACM +Computer Communications Review (Jan. 2009). + +\[Greenberg 2009b\] A. Greenberg, N. Jain, S. Kandula, C. Kim, P. +Lahiri, D. Maltz, P. Patel, S. Sengupta, "VL2: A Scalable and Flexible +Data Center Network," Proc. 2009 ACM SIGCOMM. + +\[Greenberg 2011\] A. Greenberg, J. Hamilton, N. Jain, S. Kandula, C. +Kim, P. Lahiri, D. Maltz, P. Patel, S. Sengupta, "VL2: A Scalable and +Flexible Data Center Network," Communications of the ACM, Vol. 54, No. 3 +(Mar. 2011), pp. 95--104. + +\[Greenberg 2015\] A. Greenberg, "SDN for the Cloud," Sigcomm 2015 +Keynote Address, +http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/keynote.pdf + +\[Griffin 2012\] T. Griffin, "Interdomain Routing Links," +http://www.cl.cam.ac.uk/\~tgg22/interdomain/ + +\[Gude 2008\] N. Gude, T. Koponen, J. Pettit, B. Pfaff, M. Casado, N. +McKeown, and S. Shenker, "NOX: Towards an Operating System for +Networks," ACM SIGCOMM Computer Communication Review, July 2008. + +\[Guha 2006\] S. Guha, N. Daswani, R. Jain, "An Experimental Study of +the Skype Peer-to-Peer VoIP System," Proc. Fifth Int. Workshop on P2P +Systems (Santa Barbara, CA, 2006). + +\[Guo 2005\] L. Guo, S. Chen, Z. Xiao, E. Tan, X. Ding, X. Zhang, +"Measurement, Analysis, and Modeling of BitTorrent-Like Systems," Proc. +2005 ACM Internet Measurement Conference. + +\[Guo 2009\] C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. +Zhang, S. Lu, "BCube: A High Performance, Server-centric Network +Architecture for Modular Data Centers," Proc. 2009 ACM SIGCOMM. + +\[Gupta 2001\] P. Gupta, N. McKeown, "Algorithms for Packet +Classification," IEEE Network Magazine, Vol. 15, No. 2 (Mar./Apr. 2001), +pp. 24--32. + +\[Gupta 2014\] A. Gupta, L. Vanbever, M. Shahbaz, S. Donovan, B. +Schlinker, N. Feamster, J. Rexford, S. Shenker, R. Clark, E. 
Katz-Bassett, "SDX: A Software Defined Internet Exchange," Proc. ACM SIGCOMM 2014 (Aug. 2014), pp. 551--562.

\[Ha 2008\] S. Ha, I. Rhee, L. Xu, "CUBIC: A New TCP-Friendly High-Speed TCP Variant," ACM SIGOPS Operating Systems Review, 2008.

\[Halabi 2000\] S. Halabi, Internet Routing Architectures, 2nd Ed., Cisco Press, 2000.

\[Hanabali 2005\] A. A. Hanbali, E. Altman, P. Nain, "A Survey of TCP over Ad Hoc Networks," IEEE Commun. Surveys and Tutorials, Vol. 7, No. 3 (2005), pp. 22--36.

\[Hei 2007\] X. Hei, C. Liang, J. Liang, Y. Liu, K. W. Ross, "A Measurement Study of a Large-Scale P2P IPTV System," IEEE Trans. on Multimedia (Dec. 2007).

\[Heidemann 1997\] J. Heidemann, K. Obraczka, J. Touch, "Modeling the Performance of HTTP over Several Transport Protocols," IEEE/ACM Transactions on Networking, Vol. 5, No. 5 (Oct. 1997), pp. 616--630.

\[Held 2001\] G. Held, Data Over Wireless Networks: Bluetooth, WAP, and Wireless LANs, McGraw-Hill, 2001.

\[Holland 2001\] G. Holland, N. Vaidya, V. Bahl, "A Rate-Adaptive MAC Protocol for Multi-Hop Wireless Networks," Proc. 2001 ACM Int. Conference on Mobile Computing and Networking (Mobicom01) (Rome, Italy, July 2001).

\[Hollot 2002\] C. V. Hollot, V. Misra, D. Towsley, W. Gong, "Analysis and Design of Controllers for AQM Routers Supporting TCP Flows," IEEE Transactions on Automatic Control, Vol. 47, No. 6 (June 2002), pp. 945--959.

\[Hong 2013\] C. Hong, S. Kandula, R. Mahajan, M. Zhang, V. Gill, M. Nanduri, R. Wattenhofer, "Achieving High Utilization with Software-Driven WAN," ACM SIGCOMM Conference (Aug. 2013), pp. 15--26.

\[Huang 2002\] C. Huang, V. Sharma, K. Owens, V. Makam, "Building Reliable MPLS Networks Using a Path Protection Mechanism," IEEE Communications Magazine, Vol. 40, No. 3 (Mar. 2002), pp. 156--162.

\[Huang 2005\] Y. Huang, R. Guerin, "Does Over-Provisioning Become More or Less Efficient as Networks Grow Larger?," Proc. IEEE Int. Conf. Network Protocols (ICNP) (Boston, MA, Nov. 2005).

\[Huang 2008\] C. Huang, J. Li, A. Wang, K. W. Ross, "Understanding Hybrid CDN-P2P: Why Limelight Needs Its Own Red Swoosh," Proc. 2008 NOSSDAV, Braunschweig, Germany.

\[Huitema 1998\] C. Huitema, IPv6: The New Internet Protocol, 2nd Ed., Prentice Hall, Englewood Cliffs, NJ, 1998.

\[Huston 1999a\] G. Huston, "Interconnection, Peering, and Settlements---Part I," The Internet Protocol Journal, Vol. 2, No. 1 (Mar. 1999).

\[Huston 2004\] G. Huston, "NAT Anatomy: A Look Inside Network Address Translators," The Internet Protocol Journal, Vol. 7, No. 3 (Sept. 2004).

\[Huston 2008a\] G. Huston, "Confronting IPv4 Address Exhaustion," http://www.potaroo.net/ispcol/2008-10/v4depletion.html

\[Huston 2008b\] G. Huston, G. Michaelson, "IPv6 Deployment: Just Where Are We?," http://www.potaroo.net/ispcol/2008-04/ipv6.html

\[Huston 2011a\] G. Huston, "A Rough Guide to Address Exhaustion," The Internet Protocol Journal, Vol. 14, No. 1 (Mar. 2011).

\[Huston 2011b\] G. Huston, "Transitioning Protocols," The Internet Protocol Journal, Vol. 14, No. 1 (Mar. 2011).

\[IAB 2016\] Internet Architecture Board homepage, http://www.iab.org/

\[IANA Protocol Numbers 2016\] Internet Assigned Numbers Authority, Protocol Numbers, http://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml

\[IBM 1997\] IBM Corp., IBM Inside APPN---The Essential Guide to the Next-Generation SNA, SG24-3669-03, June 1997.

\[ICANN 2016\] The Internet Corporation for Assigned Names and Numbers homepage, http://www.icann.org

\[IEEE 802 2016\] IEEE 802 LAN/MAN Standards Committee homepage, http://www.ieee802.org/

\[IEEE 802.11 1999\] IEEE 802.11, "1999 Edition (ISO/IEC 8802-11: 1999) IEEE Standards for Information Technology---Telecommunications and Information Exchange Between Systems---Local and Metropolitan Area Network---Specific Requirements---Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specification," http://standards.ieee.org/getieee802/download/802.11-1999.pdf

\[IEEE 802.11ac 2013\] IEEE, "802.11ac-2013---IEEE Standard for Information technology---Telecommunications and Information Exchange Between Systems---Local and Metropolitan Area Networks---Specific Requirements---Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications---Amendment 4: Enhancements for Very High Throughput for Operation in Bands Below 6 GHz."

\[IEEE 802.11n 2012\] IEEE, "IEEE P802.11---Task Group N---Meeting Update: Status of 802.11n," http://grouper.ieee.org/groups/802/11/Reports/tgn_update.htm

\[IEEE 802.15 2012\] IEEE 802.15 Working Group for WPAN homepage, http://grouper.ieee.org/groups/802/15/

\[IEEE 802.15.4 2012\] IEEE 802.15 WPAN Task Group 4, http://www.ieee802.org/15/pub/TG4.html

\[IEEE 802.16d 2004\] IEEE, "IEEE Standard for Local and Metropolitan Area Networks, Part 16: Air Interface for Fixed Broadband Wireless Access Systems," http://standards.ieee.org/getieee802/download/802.16-2004.pdf

\[IEEE 802.16e 2005\] IEEE, "IEEE Standard for Local and Metropolitan Area Networks, Part 16: Air Interface for Fixed and Mobile Broadband Wireless Access Systems, Amendment 2: Physical and Medium Access Control Layers for Combined Fixed and Mobile Operation in Licensed Bands and Corrigendum 1," http://standards.ieee.org/getieee802/download/802.16e-2005.pdf

\[IEEE 802.1q 2005\] IEEE, "IEEE Standard for Local and Metropolitan Area Networks: Virtual Bridged Local Area Networks," http://standards.ieee.org/getieee802/download/802.1Q-2005.pdf

\[IEEE 802.1X\] IEEE Std 802.1X-2001 Port-Based Network Access Control, http://standards.ieee.org/reading/ieee/std_public/description/lanman/802.1x-2001_desc.html

\[IEEE 802.3 2012\] IEEE, "IEEE 802.3 CSMA/CD (Ethernet)," http://grouper.ieee.org/groups/802/3/

\[IEEE 802.5 2012\] IEEE, IEEE 802.5 homepage, http://www.ieee802.org/5/www8025org/

\[IETF 2016\] Internet Engineering Task Force homepage, http://www.ietf.org

\[Ihm 2011\] S. Ihm, V. S. Pai, "Towards Understanding Modern Web Traffic," Proc. 2011 ACM Internet Measurement Conference (Berlin).

\[IMAP 2012\] The IMAP Connection, http://www.imap.org/

\[Intel 2016\] Intel Corp., "Intel 710 Ethernet Adapter," http://www.intel.com/content/www/us/en/ethernet-products/converged-network-adapters/ethernet-xl710.html

\[Internet2 Multicast 2012\] Internet2 Multicast Working Group homepage, http://www.internet2.edu/multicast/

\[ISC 2016\] Internet Systems Consortium homepage, http://www.isc.org

\[ISI 1979\] Information Sciences Institute, "DoD Standard Internet Protocol," Internet Engineering Note 123 (Dec.
1979), http://www.isi.edu/in-notes/ien/ien123.txt

\[ISO 2016\] International Organization for Standardization homepage, http://www.iso.org/

\[ISO X.680 2002\] International Organization for Standardization, "X.680: ITU-T Recommendation X.680 (2002) Information Technology---Abstract Syntax Notation One (ASN.1): Specification of Basic Notation," http://www.itu.int/ITU-T/studygroups/com17/languages/X.680-0207.pdf

\[ITU 1999\] Asymmetric Digital Subscriber Line (ADSL) Transceivers, ITU-T G.992.1, 1999.

\[ITU 2003\] Asymmetric Digital Subscriber Line (ADSL) Transceivers---Extended Bandwidth ADSL2 (ADSL2Plus), ITU-T G.992.5, 2003.

\[ITU 2005a\] International Telecommunication Union, "ITU-T X.509, The Directory: Public-key and attribute certificate frameworks" (Aug. 2005).

\[ITU 2006\] ITU, "G.993.1: Very High Speed Digital Subscriber Line Transceivers (VDSL)," https://www.itu.int/rec/T-REC-G.993.1-200406-I/en, 2006.

\[ITU 2012\] The ITU homepage, http://www.itu.int/

\[ITU 2015\] "Measuring the Information Society Report," 2015, http://www.itu.int/en/ITU-D/Statistics/Pages/publications/mis2015.aspx

\[ITU-T Q.2931 1995\] International Telecommunication Union, "Recommendation Q.2931 (02/95)---Broadband Integrated Services Digital Network (B-ISDN)---Digital Subscriber Signalling System No. 2 (DSS 2)---User-Network Interface (UNI)---Layer 3 Specification for Basic Call/Connection Control."

\[IXP List 2016\] List of IXPs, Wikipedia, https://en.wikipedia.org/wiki/List_of\_Internet_exchange_points

\[Iyengar 2015\] J. Iyengar, I. Swett, "QUIC: A UDP-Based Secure and Reliable Transport for HTTP/2," Internet Draft draft-tsvwg-quic-protocol-00, June 2015.

\[Iyer 2008\] S. Iyer, R. R. Kompella, N. McKeown, "Designing Packet Buffers for Router Line Cards," IEEE/ACM Transactions on Networking, Vol. 16, No. 3 (June 2008), pp. 705--717.

\[Jacobson 1988\] V. Jacobson, "Congestion Avoidance and Control," Proc. 1988 ACM SIGCOMM (Stanford, CA, Aug. 1988), pp. 314--329.

\[Jain 1986\] R. Jain, "A Timeout-Based Congestion Control Scheme for Window Flow-Controlled Networks," IEEE Journal on Selected Areas in Communications SAC-4, 7 (Oct. 1986).

\[Jain 1989\] R. Jain, "A Delay-Based Approach for Congestion Avoidance in Interconnected Heterogeneous Computer Networks," ACM SIGCOMM Computer Communications Review, Vol. 19, No. 5 (1989), pp. 56--71.

\[Jain 1994\] R. Jain, FDDI Handbook: High-Speed Networking Using Fiber and Other Media, Addison-Wesley, Reading, MA, 1994.

\[Jain 1996\] R. Jain, S. Kalyanaraman, S. Fahmy, R. Goyal, S. Kim, "Tutorial Paper on ABR Source Behavior," ATM Forum/96-1270, Oct. 1996. http://www.cse.wustl.edu/\~jain/atmf/ftp/atm96-1270.pdf

\[Jain 2013\] S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu, J. Zolla, U. Hölzle, S. Stuart, A. Vahdat, "B4: Experience with a Globally Deployed Software Defined WAN," ACM SIGCOMM 2013, pp. 3--14.

\[Jaiswal 2003\] S. Jaiswal, G. Iannaccone, C. Diot, J. Kurose, D. Towsley, "Measurement and Classification of Out-of-Sequence Packets in a Tier-1 IP Backbone," Proc. 2003 IEEE INFOCOM.

\[Ji 2003\] P. Ji, Z. Ge, J. Kurose, D. Towsley, "A Comparison of Hard-State and Soft-State Signaling Protocols," Proc. 2003 ACM SIGCOMM (Karlsruhe, Germany, Aug. 2003).

\[Jimenez 1997\] D. Jimenez, "Outside Hackers Infiltrate MIT Network, Compromise Security," The Tech, Vol.
117, No. 49 (Oct. 1997), p. 1, http://www-tech.mit.edu/V117/N49/hackers.49n.html

\[Jin 2004\] C. Jin, D. X. Wei, S. Low, "FAST TCP: Motivation, Architecture, Algorithms, Performance," Proc. 2004 IEEE INFOCOM (Hong Kong, Mar. 2004).

\[Juniper Contrail 2016\] Juniper Networks, "Contrail," http://www.juniper.net/us/en/products-services/sdn/contrail/

\[Juniper MX2020 2015\] Juniper Networks, "MX2020 and MX2010 3D Universal Edge Routers," www.juniper.net/us/en/local/pdf/.../1000417-en.pdf

\[Kaaranen 2001\] H. Kaaranen, S. Naghian, L. Laitinen, A. Ahtiainen, V. Niemi, UMTS Networks: Architecture, Mobility and Services, New York: John Wiley & Sons, 2001.

\[Kahn 1967\] D. Kahn, The Codebreakers: The Story of Secret Writing, The Macmillan Company, 1967.

\[Kahn 1978\] R. E. Kahn, S. Gronemeyer, J. Burchfiel, R. Kunzelman, "Advances in Packet Radio Technology," Proceedings of the IEEE, Vol. 66, No. 11 (Nov. 1978).

\[Kamerman 1997\] A. Kamerman, L. Monteban, "WaveLAN-II: A High-Performance Wireless LAN for the Unlicensed Band," Bell Labs Technical Journal (Summer 1997), pp. 118--133.

\[Kar 2000\] K. Kar, M. Kodialam, T. V. Lakshman, "Minimum Interference Routing of Bandwidth Guaranteed Tunnels with MPLS Traffic Engineering Applications," IEEE J. Selected Areas in Communications (Dec. 2000).

\[Karn 1987\] P. Karn, C. Partridge, "Improving Round-Trip Time Estimates in Reliable Transport Protocols," Proc. 1987 ACM SIGCOMM.

\[Karol 1987\] M. Karol, M. Hluchyj, A. Morgan, "Input Versus Output Queuing on a Space-Division Packet Switch," IEEE Transactions on Communications, Vol. 35, No. 12 (Dec. 1987), pp. 1347--1356.

\[Kaufman 1995\] C. Kaufman, R. Perlman, M. Speciner, Network Security: Private Communication in a Public World, Prentice Hall, Englewood Cliffs, NJ, 1995.

\[Kelly 1998\] F. P. Kelly, A. Maulloo, D. Tan, "Rate Control for Communication Networks: Shadow Prices, Proportional Fairness and Stability," J. Operations Res. Soc., Vol. 49, No. 3 (Mar. 1998), pp. 237--252.

\[Kelly 2003\] T. Kelly, "Scalable TCP: Improving Performance in High Speed Wide Area Networks," ACM SIGCOMM Computer Communications Review, Vol. 33, No. 2 (Apr. 2003), pp. 83--91.

\[Kilkki 1999\] K. Kilkki, Differentiated Services for the Internet, Macmillan Technical Publishing, Indianapolis, IN, 1999.

\[Kim 2005\] H. Kim, S. Rixner, V. Pai, "Network Interface Data Caching," IEEE Transactions on Computers, Vol. 54, No. 11 (Nov. 2005), pp. 1394--1408.

\[Kim 2008\] C. Kim, M. Caesar, J. Rexford, "Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises," Proc. 2008 ACM SIGCOMM (Seattle, WA, Aug. 2008).

\[Kleinrock 1961\] L. Kleinrock, "Information Flow in Large Communication Networks," RLE Quarterly Progress Report, July 1961.

\[Kleinrock 1964\] L. Kleinrock, Communication Nets: Stochastic Message Flow and Delay, McGraw-Hill, New York, NY, 1964.

\[Kleinrock 1975\] L. Kleinrock, Queuing Systems, Vol. 1, John Wiley, New York, 1975.

\[Kleinrock 1975b\] L. Kleinrock, F. A. Tobagi, "Packet Switching in Radio Channels: Part I---Carrier Sense Multiple-Access Modes and Their Throughput-Delay Characteristics," IEEE Transactions on Communications, Vol. 23, No. 12 (Dec. 1975), pp. 1400--1416.

\[Kleinrock 1976\] L. Kleinrock, Queuing Systems, Vol. 2, John Wiley, New York, 1976.

\[Kleinrock 2004\] L. Kleinrock, "The Birth of the Internet," http://www.lk.cs.ucla.edu/LK/Inet/birth.html

\[Kohler 2006\] E. Kohler, M. Handley, S.
Floyd, "DDCP: Designing DCCP: +Congestion Control Without Reliability," Proc. 2006 ACM SIGCOMM (Pisa, +Italy, Sept. 2006). + +\[Kolding 2003\] T. Kolding, K. Pedersen, J. Wigard, F. Frederiksen, P. +Mogensen, "High Speed Downlink Packet Access: WCDMA Evolution," IEEE +Vehicular Technology Society News (Feb. 2003), pp. 4--10. + +\[Koponen 2010\] T. Koponen, M. Casado, N. Gude, J. Stribling, L. +Poutievski, M. Zhu, R. Ramanathan, Y. Iwata, H. Inoue, T. Hama, S. +Shenker, "Onix: A Distributed Control Platform for Large-Scale +Production Networks," 9th USENIX conference on Operating systems design +and implementation (OSDI'10), pp. 1--6. + +\[Koponen 2011\] T. Koponen, S. Shenker, H. Balakrishnan, N. Feamster, +I. Ganichev, A. Ghodsi, P. B. Godfrey, N. McKeown, G. Parulkar, B. +Raghavan, J. Rexford, S. Arianfar, D. Kuptsov, "Architecting for +Innovation," ACM Computer Communications Review, 2011. + +\[Korhonen 2003\] J. Korhonen, Introduction to 3G Mobile Communications, +2nd ed., Artech House, 2003. + +\[Koziol 2003\] J. Koziol, Intrusion Detection with Snort, Sams +Publishing, 2003. + +\[Kreutz 2015\] D. Kreutz, F.M.V. Ramos, P. Esteves Verissimo, C. +Rothenberg, S. Azodolmolky, S. Uhlig, "Software-Defined Networking: A +Comprehensive Survey," Proceedings of the IEEE, Vol. 103, No. 1 +(Jan. 2015), pp. 14-76. This paper is also being updated at +https://github.com/SDN-Survey/latex/wiki + +\[Krishnamurthy 2001\] B. Krishnamurthy, J. Rexford, Web Protocols and +Practice: HTTP/ 1.1, Networking Protocols, and Traffic Measurement, +Addison-Wesley, Boston, MA, 2001. + +\[Kulkarni 2005\] S. Kulkarni, C. Rosenberg, "Opportunistic Scheduling: +Generalizations to Include Multiple Constraints, Multiple Interfaces, +and Short Term Fairness," Wireless Networks, 11 (2005), 557--569. + +\[Kumar 2006\] R. Kumar, K.W. Ross, "Optimal Peer-Assisted File +Distribution: Single and Multi-Class Problems," IEEE Workshop on Hot +Topics in Web Systems and Technologies (Boston, MA, 2006). + +\[Labovitz 1997\] C. Labovitz, G. R. Malan, F. Jahanian, "Internet +Routing Instability," Proc. 1997 ACM SIGCOMM (Cannes, France, +Sept. 1997), pp. 115--126. + +\[Labovitz 2010\] C. Labovitz, S. Iekel-Johnson, D. McPherson, J. +Oberheide, F. Jahanian, "Internet Inter-Domain Traffic," Proc. 2010 ACM +SIGCOMM. + +\[Labrador 1999\] M. Labrador, S. Banerjee, "Packet Dropping Policies +for ATM and IP Networks," IEEE Communications Surveys, Vol. 2, No. 3 +(Third Quarter 1999), pp. 2--14. + +\[Lacage 2004\] M. Lacage, M.H. Manshaei, T. Turletti, "IEEE 802.11 Rate +Adaptation: A Practical Approach," ACM Int. Symposium on Modeling, +Analysis, and Simulation of Wireless and Mobile Systems (MSWiM) (Venice, +Italy, Oct. 2004). + +\[Lakhina 2004\] A. Lakhina, M. Crovella, C. Diot, "Diagnosing +Network-Wide Traffic Anomalies," Proc. 2004 ACM SIGCOMM. + +\[Lakhina 2005\] A. Lakhina, M. Crovella, C. Diot, "Mining Anomalies +Using Traffic Feature Distributions," Proc. 2005 ACM SIGCOMM. + +\[Lakshman 1997\] T. V. Lakshman, U. Madhow, "The Performance of TCP/IP +for Networks with High Bandwidth-Delay Products and Random Loss," +IEEE/ACM Transactions on Networking, Vol. 5, No. 3 (1997), pp. 336--350. + +\[Lakshman 2004\] T. V. Lakshman, T. Nandagopal, R. Ramjee, K. Sabnani, +T. Woo, "The SoftRouter Architecture," Proc. 3nd ACM Workshop on Hot +Topics in Networks (Hotnets-III), Nov. 2004. + +\[Lam 1980\] S. Lam, "A Carrier Sense Multiple Access Protocol for Local +Networks," Computer Networks, Vol. 4 (1980), pp. 21--32. + +\[Lamport 1989\] L. 
Lamport, "The Part-Time Parliament," Technical +Report 49, Systems Research Center, Digital Equipment Corp., Palo Alto, +Sept. 1989. + +\[Lampson 1983\] Lampson, Butler W. "Hints for computer system design," +ACM SIGOPS Operating Systems Review, Vol. 17, No. 5, 1983. + +\[Lampson 1996\] B. Lampson, "How to Build a Highly Available System +Using Consensus," Proc. 10th International Workshop on Distributed +Algorithms (WDAG '96), Özalp Babaoglu and Keith Marzullo (Eds.), +Springer-Verlag, pp. 1--17. + +\[Lawton 2001\] G. Lawton, "Is IPv6 Finally Gaining Ground?" IEEE +Computer Magazine (Aug. 2001), pp. 11--15. + +\[LeBlond 2011\] S. Le Blond, C. Zhang, A. Legout, K. Ross, W. Dabbous. +2011, "I know where you are and what you are sharing: exploiting P2P +communications to invade users' privacy." 2011 ACM Internet Measurement +Conference, ACM, New York, NY, USA, pp. 45--60. + +\[Leighton 2009\] T. Leighton, "Improving Performance on the Internet," +Communications of the ACM, Vol. 52, No. 2 (Feb. 2009), pp. 44--51. + +\[Leiner 1998\] B. Leiner, V. Cerf, D. Clark, R. Kahn, L. Kleinrock, D. +Lynch, J. Postel, L. Roberts, S. Woolf, "A Brief History of the +Internet," http://www.isoc.org/internet/history/brief.html + +\[Leung 2006\] K. Leung, V. O.K. Li, "TCP in Wireless Networks: Issues, +Approaches, and Challenges," IEEE Commun. Surveys and Tutorials, Vol. 8, +No. 4 (2006), pp. 64--79. + +\[Levin 2012\] D. Levin, A. Wundsam, B. Heller, N. Handigol, A. +Feldmann, "Logically Centralized?: State Distribution Trade-offs in +Software Defined Networks," Proc. First Workshop on Hot Topics in +Software Defined Networks (Aug. 2012), pp. 1--6. + +\[Li 2004\] L. Li, D. Alderson, W. Willinger, J. Doyle, "A +First-Principles Approach to Understanding the Internet's Router-Level +Topology," Proc. 2004 ACM SIGCOMM (Portland, OR, Aug. 2004). + +\[Li 2007\] J. Li, M. Guidero, Z. Wu, E. Purpus, T. Ehrenkranz, "BGP +Routing Dynamics Revisited." ACM Computer Communication Review +(Apr. 2007). + +\[Li 2015\] S.Q. Li, "Building Softcom Ecosystem Foundation," Open +Networking Summit, 2015. + +\[Lin 2001\] Y. Lin, I. Chlamtac, Wireless and Mobile Network +Architectures, John Wiley and Sons, New York, NY, 2001. + +\[Liogkas 2006\] N. Liogkas, R. Nelson, E. Kohler, L. Zhang, "Exploiting +BitTorrent for Fun (but Not Profit)," 6th International Workshop on +Peer-to-Peer Systems (IPTPS 2006). + +\[Liu 2003\] J. Liu, I. Matta, M. Crovella, "End-to-End Inference of +Loss Nature in a Hybrid Wired/Wireless Environment," Proc. WiOpt'03: +Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks. + +\[Locher 2006\] T. Locher, P. Moor, S. Schmid, R. Wattenhofer, "Free +Riding in BitTorrent is Cheap," Proc. ACM HotNets 2006 (Irvine CA, +Nov. 2006). + +\[Lui 2004\] J. Lui, V. Misra, D. Rubenstein, "On the Robustness of Soft +State Protocols," Proc. IEEE Int. Conference on Network Protocols (ICNP +'04), pp. 50--60. + +\[Mahdavi 1997\] J. Mahdavi, S. Floyd, "TCP-Friendly Unicast Rate-Based +Flow Control," unpublished note (Jan. 1997). + +\[MaxMind 2016\] http://www.maxmind.com/app/ip-location + +\[Maymounkov 2002\] P. Maymounkov, D. Mazières. "Kademlia: A +Peer-to-Peer Information System Based on the XOR Metric." Proceedings of +the 1st International Workshop on Peerto-Peer Systems (IPTPS '02) +(Mar. 2002), pp. 53--65. + +\[McKeown 1997a\] N. McKeown, M. Izzard, A. Mekkittikul, W. Ellersick, +M. Horowitz, "The Tiny Tera: A Packet Switch Core," IEEE Micro Magazine +(Jan.--Feb. 1997). + +\[McKeown 1997b\] N. 
McKeown, "A Fast Switched Backplane for a Gigabit +Switched Router," Business Communications Review, Vol. 27, No. 12. +http://tinytera.stanford.edu/\~nickm/papers/cisco_fasts_wp.pdf + +\[McKeown 2008\] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, +L. Peterson, J. Rexford, S. Shenker, J. Turner. 2008. OpenFlow: Enabling +Innovation in Campus Networks. SIGCOMM Comput. Commun. Rev. 38, 2 +(Mar. 2008), pp. 69--74. + +\[McQuillan 1980\] J. McQuillan, I. Richer, E. Rosen, "The New Routing +Algorithm for the Arpanet," IEEE Transactions on Communications, Vol. +28, No. 5 (May 1980), pp. 711--719. + +\[Metcalfe 1976\] R. M. Metcalfe, D. R. Boggs. "Ethernet: Distributed +Packet Switching for Local Computer Networks," Communications of the +Association for Computing Machinery, Vol. 19, No. 7 (July 1976), +pp. 395--404. + +\[Meyers 2004\] A. Myers, T. Ng, H. Zhang, "Rethinking the Service +Model: Scaling Ethernet to a Million Nodes," ACM Hotnets Conference, +2004. + +\[MFA Forum 2016\] IP/MPLS Forum homepage, http://www.ipmplsforum.org/ + +\[Mockapetris 1988\] P. V. Mockapetris, K. J. Dunlap, "Development of +the Domain Name System," Proc. 1988 ACM SIGCOMM (Stanford, CA, +Aug. 1988). + +\[Mockapetris 2005\] P. Mockapetris, Sigcomm Award Lecture, video +available at http://www.postel.org/sigcomm + +\[Molinero-Fernandez 2002\] P. Molinaro-Fernandez, N. McKeown, H. Zhang, +"Is IP Going to Take Over the World (of Communications)?" Proc. 2002 ACM +Hotnets. + +\[Molle 1987\] M. L. Molle, K. Sohraby, A. N. Venetsanopoulos, +"Space-Time Models of Asynchronous CSMA Protocols for Local Area +Networks," IEEE Journal on Selected Areas in Communications, Vol. 5, +No. 6 (1987), pp. 956--968. + +\[Moore 2001\] D. Moore, G. Voelker, S. Savage, "Inferring Internet +Denial of Service Activity," Proc. 2001 USENIX Security Symposium +(Washington, DC, Aug. 2001). + +\[Motorola 2007\] Motorola, "Long Term Evolution (LTE): A Technical +Overview," +http://www.motorola.com/staticfiles/Business/Solutions/Industry%20Solutions/Service%20Providers/Wireless%20Operators/LTE/\_Document/Static%20Files/6834_MotDoc_New.pdf + +\[Mouly 1992\] M. Mouly, M. Pautet, The GSM System for Mobile +Communications, Cell and Sys, Palaiseau, France, 1992. + +\[Moy 1998\] J. Moy, OSPF: Anatomy of An Internet Routing Protocol, +Addison-Wesley, Reading, MA, 1998. + +\[Mukherjee 1997\] B. Mukherjee, Optical Communication Networks, +McGraw-Hill, 1997. + +\[Mukherjee 2006\] B. Mukherjee, Optical WDM Networks, Springer, 2006. + +\[Mysore 2009\] R. N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. +Miri, S. Radhakrishnan, V. Subramanya, A. Vahdat, "PortLand: A Scalable +Fault-Tolerant Layer 2 Data Center Network Fabric," Proc. 2009 ACM +SIGCOMM. + +\[Nahum 2002\] E. Nahum, T. Barzilai, D. Kandlur, "Performance Issues in +WWW Servers," IEEE/ACM Transactions on Networking, Vol 10, No. 1 +(Feb. 2002). + +\[Netflix Open Connect 2016\] Netflix Open Connect CDN, 2016, https:// +openconnect.netflix.com/ + +\[Netflix Video 1\] Designing Netflix's Content Delivery System, D. +Fulllager, 2014, https://www.youtube.com/watch?v=LkLLpYdDINA + +\[Netflix Video 2\] Scaling the Netflix Global CDN, D. Temkin, 2015, +https://www .youtube.com/watch?v=tbqcsHg-Q_o + +\[Neumann 1997\] R. Neumann, "Internet Routing Black Hole," The Risks +Digest: Forum on Risks to the Public in Computers and Related Systems, +Vol. 19, No. 12 (May 1997). +http://catless.ncl.ac.uk/Risks/19.12.html#subj1.1 + +\[Neville-Neil 2009\] G. Neville-Neil, "Whither Sockets?" 
Communications of the ACM, Vol. 52, No. 6 (June 2009), pp. 51--55.

\[Nicholson 2006\] A. Nicholson, Y. Chawathe, M. Chen, B. Noble, D. Wetherall, "Improved Access Point Selection," Proc. 2006 ACM Mobisys Conference (Uppsala, Sweden, 2006).

\[Nielsen 1997\] H. F. Nielsen, J. Gettys, A. Baird-Smith, E. Prud'hommeaux, H. W. Lie, C. Lilley, "Network Performance Effects of HTTP/1.1, CSS1, and PNG," W3C Document, 1997 (also appears in Proc. 1997 ACM SIGCOMM (Cannes, France, Sept. 1997), pp. 155--166).

\[NIST 2001\] National Institute of Standards and Technology, "Advanced Encryption Standard (AES)," Federal Information Processing Standards 197, Nov. 2001, http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf

\[NIST IPv6 2015\] US National Institute of Standards and Technology, "Estimating IPv6 & DNSSEC Deployment SnapShots," http://fedv6-deployment.antd.nist.gov/snapall.html

\[Nmap 2012\] Nmap homepage, http://www.insecure.org/nmap

\[Nonnenmacher 1998\] J. Nonnenmacher, E. Biersack, D. Towsley, "Parity-Based Loss Recovery for Reliable Multicast Transmission," IEEE/ACM Transactions on Networking, Vol. 6, No. 4 (Aug. 1998), pp. 349--361.

\[Nygren 2010\] E. Nygren, R. K. Sitaraman, J. Sun, "The Akamai Network: A Platform for High-Performance Internet Applications," SIGOPS Oper. Syst. Rev. 44, 3 (Aug. 2010), pp. 2--19.

\[ONF 2016\] Open Networking Foundation, Technical Library, https://www.opennetworking.org/sdn-resources/technical-library

\[ONOS 2016\] Open Network Operating System (ONOS), "Architecture Guide," https://wiki.onosproject.org/display/ONOS/Architecture+Guide, 2016.

\[OpenFlow 2009\] Open Networking Foundation, "OpenFlow Switch Specification 1.0.0, TS-001," https://www.opennetworking.org/images/stories/downloads/sdnresources/onf-specifications/openflow/openflow-spec-v1.0.0.pdf

\[OpenDaylight Lithium 2016\] OpenDaylight, "Lithium," https://www.opendaylight.org/lithium

\[OSI 2012\] International Organization for Standardization homepage, http://www.iso.org/iso/en/ISOOnline.frontpage

\[Osterweil 2012\] E. Osterweil, D. McPherson, S. DiBenedetto, C. Papadopoulos, D. Massey, "Behavior of DNS Top Talkers," Passive and Active Measurement Conference, 2012.

\[Padhye 2000\] J. Padhye, V. Firoiu, D. Towsley, J. Kurose, "Modeling TCP Reno Performance: A Simple Model and Its Empirical Validation," IEEE/ACM Transactions on Networking, Vol. 8, No. 2 (Apr. 2000), pp. 133--145.

\[Padhye 2001\] J. Padhye, S. Floyd, "On Inferring TCP Behavior," Proc. 2001 ACM SIGCOMM (San Diego, CA, Aug. 2001).

\[Palat 2009\] S. Palat, P. Godin, "The LTE Network Architecture: A Comprehensive Tutorial," in LTE---The UMTS Long Term Evolution: From Theory to Practice. Also available as a standalone Alcatel white paper.

\[Panda 2013\] A. Panda, C. Scott, A. Ghodsi, T. Koponen, S. Shenker, "CAP for Networks," Proc. ACM HotSDN '13, pp. 91--96.

\[Parekh 1993\] A. Parekh, R. Gallager, "A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks: The Single-Node Case," IEEE/ACM Transactions on Networking, Vol. 1, No. 3 (June 1993), pp. 344--357.

\[Partridge 1992\] C. Partridge, S. Pink, "An Implementation of the Revised Internet Stream Protocol (ST-2)," Journal of Internetworking: Research and Experience, Vol. 3, No. 1 (Mar. 1992).

\[Partridge 1998\] C. Partridge, et al., "A Fifty Gigabit per Second IP Router," IEEE/ACM Transactions on Networking, Vol. 6, No. 3 (June 1998), pp. 237--248.

\[Pathak 2010\] A. Pathak, Y. A. Wang, C. Huang, A. Greenberg, Y. C. Hu, J. Li, K. W. Ross, "Measuring and Evaluating TCP Splitting for Cloud Services," Passive and Active Measurement (PAM) Conference (Zurich, 2010).

\[Perkins 1994\] A. Perkins, "Networking with Bob Metcalfe," The Red Herring Magazine (Nov. 1994).

\[Perkins 1998\] C. Perkins, O. Hodson, V. Hardman, "A Survey of Packet Loss Recovery Techniques for Streaming Audio," IEEE Network Magazine (Sept./Oct. 1998), pp. 40--47.

\[Perkins 1998b\] C. Perkins, Mobile IP: Design Principles and Practice, Addison-Wesley, Reading, MA, 1998.

\[Perkins 2000\] C. Perkins, Ad Hoc Networking, Addison-Wesley, Reading, MA, 2000.

\[Perlman 1999\] R. Perlman, Interconnections: Bridges, Routers, Switches, and Internetworking Protocols, 2nd ed., Addison-Wesley Professional Computing Series, Reading, MA, 1999.

\[PGPI 2016\] The International PGP homepage, http://www.pgpi.org

\[Phifer 2000\] L. Phifer, "The Trouble with NAT," The Internet Protocol Journal, Vol. 3, No. 4 (Dec. 2000), http://www.cisco.com/warp/public/759/ipj_3-4/ipj\_3-4_nat.html

\[Piatek 2007\] M. Piatek, T. Isdal, T. Anderson, A. Krishnamurthy, A. Venkataramani, "Do Incentives Build Robustness in BitTorrent?," Proc. NSDI (2007).

\[Piatek 2008\] M. Piatek, T. Isdal, A. Krishnamurthy, T. Anderson, "One Hop Reputations for Peer-to-Peer File Sharing Workloads," Proc. NSDI (2008).

\[Pickholtz 1982\] R. Pickholtz, D. Schilling, L. Milstein, "Theory of Spread Spectrum Communication---A Tutorial," IEEE Transactions on Communications, Vol. 30, No. 5 (May 1982), pp. 855--884.

\[PingPlotter 2016\] PingPlotter homepage, http://www.pingplotter.com

\[Piscatello 1993\] D. Piscatello, A. Lyman Chapin, Open Systems Networking, Addison-Wesley, Reading, MA, 1993.

\[Pomeranz 2010\] H. Pomeranz, "Practical, Visual, Three-Dimensional Pedagogy for Internet Protocol Packet Header Control Fields," https://righteousit.wordpress.com/2010/06/27/practical-visual-three-dimensional-pedagogy-for-internet-protocol-packet-header-control-fields/, June 2010.

\[Potaroo 2016\] "Growth of the BGP Table--1994 to Present," http://bgp.potaroo.net/

\[PPLive 2012\] PPLive homepage, http://www.pplive.com

\[Qazi 2013\] Z. Qazi, C. Tu, L. Chiang, R. Miao, V. Sekar, M. Yu, "SIMPLE-fying Middlebox Policy Enforcement Using SDN," ACM SIGCOMM Conference (Aug. 2013), pp. 27--38.

\[Quagga 2012\] Quagga, "Quagga Routing Suite," http://www.quagga.net/

\[Quittner 1998\] J. Quittner, M. Slatalla, Speeding the Net: The Inside Story of Netscape and How It Challenged Microsoft, Atlantic Monthly Press, 1998.

\[Quova 2016\] www.quova.com

\[Ramakrishnan 1990\] K. K. Ramakrishnan, R. Jain, "A Binary Feedback Scheme for Congestion Avoidance in Computer Networks," ACM Transactions on Computer Systems, Vol. 8, No. 2 (May 1990), pp. 158--181.

\[Raman 1999\] S. Raman, S. McCanne, "A Model, Analysis, and Protocol Framework for Soft State-Based Communication," Proc. 1999 ACM SIGCOMM (Boston, MA, Aug. 1999).

\[Raman 2007\] B. Raman, K. Chebrolu, "Experiences in Using WiFi for Rural Internet in India," IEEE Communications Magazine, Special Issue on New Directions in Networking Technologies in Emerging Economies (Jan. 2007).

\[Ramaswami 2010\] R. Ramaswami, K. Sivarajan, G. Sasaki, Optical Networks: A Practical Perspective, Morgan Kaufmann Publishers, 2010.

\[Ramjee 1994\] R. Ramjee, J. Kurose, D. Towsley, H.
Schulzrinne, "Adaptive Playout Mechanisms for Packetized Audio Applications in Wide-Area Networks," Proc. 1994 IEEE INFOCOM.

\[Rao 2011\] A. S. Rao, Y. S. Lim, C. Barakat, A. Legout, D. Towsley, W. Dabbous, "Network Characteristics of Video Streaming Traffic," Proc. 2011 ACM CoNEXT (Tokyo).

\[Ren 2006\] S. Ren, L. Guo, X. Zhang, "ASAP: An AS-Aware Peer-Relay Protocol for High Quality VoIP," Proc. 2006 IEEE ICDCS (Lisboa, Portugal, July 2006).

\[Rescorla 2001\] E. Rescorla, SSL and TLS: Designing and Building Secure Systems, Addison-Wesley, Boston, 2001.

\[RFC 001\] S. Crocker, "Host Software," RFC 001 (the very first RFC!), Apr. 1969.

\[RFC 768\] J. Postel, "User Datagram Protocol," RFC 768, Aug. 1980.

\[RFC 791\] J. Postel, "Internet Protocol: DARPA Internet Program Protocol Specification," RFC 791, Sept. 1981.

\[RFC 792\] J. Postel, "Internet Control Message Protocol," RFC 792, Sept. 1981.

\[RFC 793\] J. Postel, "Transmission Control Protocol," RFC 793, Sept. 1981.

\[RFC 801\] J. Postel, "NCP/TCP Transition Plan," RFC 801, Nov. 1981.

\[RFC 826\] D. C. Plummer, "An Ethernet Address Resolution Protocol---or---Converting Network Protocol Addresses to 48-bit Ethernet Address for Transmission on Ethernet Hardware," RFC 826, Nov. 1982.

\[RFC 829\] V. Cerf, "Packet Satellite Technology Reference Sources," RFC 829, Nov. 1982.

\[RFC 854\] J. Postel, J. Reynolds, "TELNET Protocol Specification," RFC 854, May 1983.

\[RFC 950\] J. Mogul, J. Postel, "Internet Standard Subnetting Procedure," RFC 950, Aug. 1985.

\[RFC 959\] J. Postel and J. Reynolds, "File Transfer Protocol (FTP)," RFC 959, Oct. 1985.

\[RFC 1034\] P. V. Mockapetris, "Domain Names---Concepts and Facilities," RFC 1034, Nov. 1987.

\[RFC 1035\] P. Mockapetris, "Domain Names---Implementation and Specification," RFC 1035, Nov. 1987.

\[RFC 1058\] C. Hedrick, "Routing Information Protocol," RFC 1058, June 1988.

\[RFC 1071\] R. Braden, D. Borman, and C. Partridge, "Computing the Internet Checksum," RFC 1071, Sept. 1988.

\[RFC 1122\] R. Braden, "Requirements for Internet Hosts---Communication Layers," RFC 1122, Oct. 1989.

\[RFC 1123\] R. Braden, ed., "Requirements for Internet Hosts---Application and Support," RFC 1123, Oct. 1989.

\[RFC 1142\] D. Oran, "OSI IS-IS Intra-Domain Routing Protocol," RFC 1142, Feb. 1990.

\[RFC 1190\] C. Topolcic, "Experimental Internet Stream Protocol: Version 2 (ST-II)," RFC 1190, Oct. 1990.

\[RFC 1256\] S. Deering, "ICMP Router Discovery Messages," RFC 1256, Sept. 1991.

\[RFC 1320\] R. Rivest, "The MD4 Message-Digest Algorithm," RFC 1320, Apr. 1992.

\[RFC 1321\] R. Rivest, "The MD5 Message-Digest Algorithm," RFC 1321, Apr. 1992.

\[RFC 1323\] V. Jacobson, R. Braden, D. Borman, "TCP Extensions for High Performance," RFC 1323, May 1992.

\[RFC 1422\] S. Kent, "Privacy Enhancement for Internet Electronic Mail: Part II: Certificate-Based Key Management," RFC 1422, Feb. 1993.

\[RFC 1546\] C. Partridge, T. Mendez, W. Milliken, "Host Anycasting Service," RFC 1546, Nov. 1993.

\[RFC 1584\] J. Moy, "Multicast Extensions to OSPF," RFC 1584, Mar. 1994.

\[RFC 1633\] R. Braden, D. Clark, S. Shenker, "Integrated Services in the Internet Architecture: an Overview," RFC 1633, June 1994.

\[RFC 1636\] R. Braden, D. Clark, S. Crocker, C. Huitema, "Report of IAB Workshop on Security in the Internet Architecture," RFC 1636, Nov. 1994.

\[RFC 1700\] J. Reynolds, J. Postel, "Assigned Numbers," RFC 1700, Oct. 1994.

\[RFC 1752\] S.
Bradner, A. Mankin, "The Recommendations for the IP Next Generation Protocol," RFC 1752, Jan. 1995.

\[RFC 1918\] Y. Rekhter, B. Moskowitz, D. Karrenberg, G. J. de Groot, E. Lear, "Address Allocation for Private Internets," RFC 1918, Feb. 1996.

\[RFC 1930\] J. Hawkinson, T. Bates, "Guidelines for Creation, Selection, and Registration of an Autonomous System (AS)," RFC 1930, Mar. 1996.

\[RFC 1939\] J. Myers, M. Rose, "Post Office Protocol---Version 3," RFC 1939, May 1996.

\[RFC 1945\] T. Berners-Lee, R. Fielding, H. Frystyk, "Hypertext Transfer Protocol---HTTP/1.0," RFC 1945, May 1996.

\[RFC 2003\] C. Perkins, "IP Encapsulation Within IP," RFC 2003, Oct. 1996.

\[RFC 2004\] C. Perkins, "Minimal Encapsulation Within IP," RFC 2004, Oct. 1996.

\[RFC 2018\] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP Selective Acknowledgment Options," RFC 2018, Oct. 1996.

\[RFC 2131\] R. Droms, "Dynamic Host Configuration Protocol," RFC 2131, Mar. 1997.

\[RFC 2136\] P. Vixie, S. Thomson, Y. Rekhter, J. Bound, "Dynamic Updates in the Domain Name System," RFC 2136, Apr. 1997.

\[RFC 2205\] R. Braden, Ed., L. Zhang, S. Berson, S. Herzog, S. Jamin, "Resource ReSerVation Protocol (RSVP)---Version 1 Functional Specification," RFC 2205, Sept. 1997.

\[RFC 2210\] J. Wroclawski, "The Use of RSVP with IETF Integrated Services," RFC 2210, Sept. 1997.

\[RFC 2211\] J. Wroclawski, "Specification of the Controlled-Load Network Element Service," RFC 2211, Sept. 1997.

\[RFC 2215\] S. Shenker, J. Wroclawski, "General Characterization Parameters for Integrated Service Network Elements," RFC 2215, Sept. 1997.

\[RFC 2326\] H. Schulzrinne, A. Rao, R. Lanphier, "Real Time Streaming Protocol (RTSP)," RFC 2326, Apr. 1998.

\[RFC 2328\] J. Moy, "OSPF Version 2," RFC 2328, Apr. 1998.

\[RFC 2420\] H. Kummert, "The PPP Triple-DES Encryption Protocol (3DESE)," RFC 2420, Sept. 1998.

\[RFC 2453\] G. Malkin, "RIP Version 2," RFC 2453, Nov. 1998.

\[RFC 2460\] S. Deering, R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification," RFC 2460, Dec. 1998.

\[RFC 2475\] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, W. Weiss, "An Architecture for Differentiated Services," RFC 2475, Dec. 1998.

\[RFC 2578\] K. McCloghrie, D. Perkins, J. Schoenwaelder, "Structure of Management Information Version 2 (SMIv2)," RFC 2578, Apr. 1999.

\[RFC 2579\] K. McCloghrie, D. Perkins, J. Schoenwaelder, "Textual Conventions for SMIv2," RFC 2579, Apr. 1999.

\[RFC 2580\] K. McCloghrie, D. Perkins, J. Schoenwaelder, "Conformance Statements for SMIv2," RFC 2580, Apr. 1999.

\[RFC 2597\] J. Heinanen, F. Baker, W. Weiss, J. Wroclawski, "Assured Forwarding PHB Group," RFC 2597, June 1999.

\[RFC 2616\] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, "Hypertext Transfer Protocol---HTTP/1.1," RFC 2616, June 1999.

\[RFC 2663\] P. Srisuresh, M. Holdrege, "IP Network Address Translator (NAT) Terminology and Considerations," RFC 2663, Aug. 1999.

\[RFC 2702\] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus, "Requirements for Traffic Engineering Over MPLS," RFC 2702, Sept. 1999.

\[RFC 2827\] P. Ferguson, D. Senie, "Network Ingress Filtering: Defeating Denial of Service Attacks which Employ IP Source Address Spoofing," RFC 2827, May 2000.

\[RFC 2865\] C. Rigney, S. Willens, A. Rubens, W. Simpson, "Remote Authentication Dial In User Service (RADIUS)," RFC 2865, June 2000.

\[RFC 3007\] B.
Wellington, "Secure Domain Name System (DNS) Dynamic Update," RFC 3007, Nov. 2000.

\[RFC 3022\] P. Srisuresh, K. Egevang, "Traditional IP Network Address Translator (Traditional NAT)," RFC 3022, Jan. 2001.

\[RFC 3031\] E. Rosen, A. Viswanathan, R. Callon, "Multiprotocol Label Switching Architecture," RFC 3031, Jan. 2001.

\[RFC 3032\] E. Rosen, D. Tappan, G. Fedorkow, Y. Rekhter, D. Farinacci, T. Li, A. Conta, "MPLS Label Stack Encoding," RFC 3032, Jan. 2001.

\[RFC 3168\] K. Ramakrishnan, S. Floyd, D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP," RFC 3168, Sept. 2001.

\[RFC 3209\] D. Awduche, L. Berger, D. Gan, T. Li, V. Srinivasan, G. Swallow, "RSVP-TE: Extensions to RSVP for LSP Tunnels," RFC 3209, Dec. 2001.

\[RFC 3221\] G. Huston, "Commentary on Inter-Domain Routing in the Internet," RFC 3221, Dec. 2001.

\[RFC 3232\] J. Reynolds, "Assigned Numbers: RFC 1700 Is Replaced by an On-line Database," RFC 3232, Jan. 2002.

\[RFC 3234\] B. Carpenter, S. Brim, "Middleboxes: Taxonomy and Issues," RFC 3234, Feb. 2002.

\[RFC 3246\] B. Davie, A. Charny, J. C. R. Bennett, K. Benson, J. Y. Le Boudec, W. Courtney, S. Davari, V. Firoiu, D. Stiliadis, "An Expedited Forwarding PHB (Per-Hop Behavior)," RFC 3246, Mar. 2002.

\[RFC 3260\] D. Grossman, "New Terminology and Clarifications for Diffserv," RFC 3260, Apr. 2002.

\[RFC 3261\] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, E. Schooler, "SIP: Session Initiation Protocol," RFC 3261, July 2002.

\[RFC 3272\] J. Boyle, V. Gill, A. Hannan, D. Cooper, D. Awduche, B. Christian, W. S. Lai, "Overview and Principles of Internet Traffic Engineering," RFC 3272, May 2002.

\[RFC 3286\] L. Ong, J. Yoakum, "An Introduction to the Stream Control Transmission Protocol (SCTP)," RFC 3286, May 2002.

\[RFC 3346\] J. Boyle, V. Gill, A. Hannan, D. Cooper, D. Awduche, B. Christian, W. S. Lai, "Applicability Statement for Traffic Engineering with MPLS," RFC 3346, Aug. 2002.

\[RFC 3390\] M. Allman, S. Floyd, C. Partridge, "Increasing TCP's Initial Window," RFC 3390, Oct. 2002.

\[RFC 3410\] J. Case, R. Mundy, D. Partain, "Introduction and Applicability Statements for Internet Standard Management Framework," RFC 3410, Dec. 2002.

\[RFC 3414\] U. Blumenthal and B. Wijnen, "User-based Security Model (USM) for Version 3 of the Simple Network Management Protocol (SNMPv3)," RFC 3414, Dec. 2002.

\[RFC 3416\] R. Presuhn, J. Case, K. McCloghrie, M. Rose, S. Waldbusser, "Version 2 of the Protocol Operations for the Simple Network Management Protocol (SNMP)," RFC 3416, Dec. 2002.

\[RFC 3439\] R. Bush, D. Meyer, "Some Internet Architectural Guidelines and Philosophy," RFC 3439, Dec. 2002.

\[RFC 3447\] J. Jonsson, B. Kaliski, "Public-Key Cryptography Standards (PKCS) #1: RSA Cryptography Specifications Version 2.1," RFC 3447, Feb. 2003.

\[RFC 3468\] L. Andersson, G. Swallow, "The Multiprotocol Label Switching (MPLS) Working Group Decision on MPLS Signaling Protocols," RFC 3468, Feb. 2003.

\[RFC 3469\] V. Sharma, Ed., F. Hellstrand, Ed., "Framework for Multi-Protocol Label Switching (MPLS)-based Recovery," RFC 3469, Feb. 2003. ftp://ftp.rfc-editor.org/in-notes/rfc3469.txt

\[RFC 3501\] M. Crispin, "Internet Message Access Protocol---Version 4rev1," RFC 3501, Mar. 2003.

\[RFC 3550\] H. Schulzrinne, S. Casner, R.
Frederick, V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications," RFC 3550, July 2003.

\[RFC 3588\] P. Calhoun, J. Loughney, E. Guttman, G. Zorn, J. Arkko, "Diameter Base Protocol," RFC 3588, Sept. 2003.

\[RFC 3649\] S. Floyd, "HighSpeed TCP for Large Congestion Windows," RFC 3649, Dec. 2003.

\[RFC 3746\] L. Yang, R. Dantu, T. Anderson, R. Gopal, "Forwarding and Control Element Separation (ForCES) Framework," RFC 3746, Apr. 2004.

\[RFC 3748\] B. Aboba, L. Blunk, J. Vollbrecht, J. Carlson, H. Levkowetz, Ed., "Extensible Authentication Protocol (EAP)," RFC 3748, June 2004.

\[RFC 3782\] S. Floyd, T. Henderson, A. Gurtov, "The NewReno Modification to TCP's Fast Recovery Algorithm," RFC 3782, Apr. 2004.

\[RFC 4213\] E. Nordmark, R. Gilligan, "Basic Transition Mechanisms for IPv6 Hosts and Routers," RFC 4213, Oct. 2005.

\[RFC 4271\] Y. Rekhter, T. Li, S. Hares, Ed., "A Border Gateway Protocol 4 (BGP-4)," RFC 4271, Jan. 2006.

\[RFC 4272\] S. Murphy, "BGP Security Vulnerabilities Analysis," RFC 4272, Jan. 2006.

\[RFC 4291\] R. Hinden, S. Deering, "IP Version 6 Addressing Architecture," RFC 4291, Feb. 2006.

\[RFC 4340\] E. Kohler, M. Handley, S. Floyd, "Datagram Congestion Control Protocol (DCCP)," RFC 4340, Mar. 2006.

\[RFC 4346\] T. Dierks, E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.1," RFC 4346, Apr. 2006.

\[RFC 4443\] A. Conta, S. Deering, M. Gupta, Ed., "Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification," RFC 4443, Mar. 2006.

\[RFC 4514\] K. Zeilenga, Ed., "Lightweight Directory Access Protocol (LDAP): String Representation of Distinguished Names," RFC 4514, June 2006.

\[RFC 4601\] B. Fenner, M. Handley, H. Holbrook, I. Kouvelas, "Protocol Independent Multicast---Sparse Mode (PIM-SM): Protocol Specification (Revised)," RFC 4601, Aug. 2006.

\[RFC 4632\] V. Fuller, T. Li, "Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan," RFC 4632, Aug. 2006.

\[RFC 4960\] R. Stewart, ed., "Stream Control Transmission Protocol," RFC 4960, Sept. 2007.

\[RFC 4987\] W. Eddy, "TCP SYN Flooding Attacks and Common Mitigations," RFC 4987, Aug. 2007.

\[RFC 5000\] RFC Editor, "Internet Official Protocol Standards," RFC 5000, May 2008.

\[RFC 5109\] A. Li (ed.), "RTP Payload Format for Generic Forward Error Correction," RFC 5109, Dec. 2007.

\[RFC 5216\] D. Simon, B. Aboba, R. Hurst, "The EAP-TLS Authentication Protocol," RFC 5216, Mar. 2008.

\[RFC 5218\] D. Thaler, B. Aboba, "What Makes for a Successful Protocol?," RFC 5218, July 2008.

\[RFC 5321\] J. Klensin, "Simple Mail Transfer Protocol," RFC 5321, Oct. 2008.

\[RFC 5322\] P. Resnick, Ed., "Internet Message Format," RFC 5322, Oct. 2008.

\[RFC 5348\] S. Floyd, M. Handley, J. Padhye, J. Widmer, "TCP Friendly Rate Control (TFRC): Protocol Specification," RFC 5348, Sept. 2008.

\[RFC 5389\] J. Rosenberg, R. Mahy, P. Matthews, D. Wing, "Session Traversal Utilities for NAT (STUN)," RFC 5389, Oct. 2008.

\[RFC 5411\] J. Rosenberg, "A Hitchhiker's Guide to the Session Initiation Protocol (SIP)," RFC 5411, Feb. 2009.

\[RFC 5681\] M. Allman, V. Paxson, E. Blanton, "TCP Congestion Control," RFC 5681, Sept. 2009.

\[RFC 5944\] C. Perkins, Ed., "IP Mobility Support for IPv4, Revised," RFC 5944, Nov. 2010.

\[RFC 6265\] A. Barth, "HTTP State Management Mechanism," RFC 6265, Apr. 2011.

\[RFC 6298\] V. Paxson, M. Allman, J.
+
+\[RFC 7020\] R. Housley, J. Curran, G. Huston, D. Conrad, "The Internet Numbers Registry System," RFC 7020, Aug. 2013.
+
+\[RFC 7094\] D. McPherson, D. Oran, D. Thaler, E. Osterweil, "Architectural Considerations of IP Anycast," RFC 7094, Jan. 2014.
+
+\[RFC 7323\] D. Borman, R. Braden, V. Jacobson, R. Scheffenegger, Ed., "TCP Extensions for High Performance," RFC 7323, Sept. 2014.
+
+\[RFC 7540\] M. Belshe, R. Peon, M. Thomson, Eds., "Hypertext Transfer Protocol Version 2 (HTTP/2)," RFC 7540, May 2015.
+
+\[Richter 2015\] P. Richter, M. Allman, R. Bush, V. Paxson, "A Primer on IPv4 Scarcity," ACM SIGCOMM Computer Communication Review, Vol. 45, No. 2 (Apr. 2015), pp. 21--32.
+
+\[Roberts 1967\] L. Roberts, T. Merrill, "Toward a Cooperative Network of Time-Shared Computers," AFIPS Fall Conference (Oct. 1966).
+
+\[Rodriguez 2010\] R. Rodrigues, P. Druschel, "Peer-to-Peer Systems," Communications of the ACM, Vol. 53, No. 10 (Oct. 2010), pp. 72--82.
+
+\[Rohde 2008\] Rohde & Schwarz, "UMTS Long Term Evolution (LTE) Technology Introduction," Application Note 1MA111.
+
+\[Rom 1990\] R. Rom, M. Sidi, Multiple Access Protocols: Performance and Analysis, Springer-Verlag, New York, 1990.
+
+\[Root Servers 2016\] Root Servers home page, http://www.root-servers.org/
+
+\[RSA 1978\] R. Rivest, A. Shamir, L. Adleman, "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems," Communications of the ACM, Vol. 21, No. 2 (Feb. 1978), pp. 120--126.
+
+\[RSA Fast 2012\] RSA Laboratories, "How Fast Is RSA?" http://www.rsa.com/rsalabs/node.asp?id=2215
+
+\[RSA Key 2012\] RSA Laboratories, "How Large a Key Should Be Used in the RSA Crypto System?" http://www.rsa.com/rsalabs/node.asp?id=2218
+
+\[Rubenstein 1998\] D. Rubenstein, J. Kurose, D. Towsley, "Real-Time Reliable Multicast Using Proactive Forward Error Correction," Proceedings of NOSSDAV '98 (Cambridge, UK, July 1998).
+
+\[Ruiz-Sanchez 2001\] M. Ruiz-Sánchez, E. Biersack, W. Dabbous, "Survey and Taxonomy of IP Address Lookup Algorithms," IEEE Network Magazine, Vol. 15, No. 2 (Mar./Apr. 2001), pp. 8--23.
+
+\[Saltzer 1984\] J. Saltzer, D. Reed, D. Clark, "End-to-End Arguments in System Design," ACM Transactions on Computer Systems (TOCS), Vol. 2, No. 4 (Nov. 1984), pp. 277--288.
+
+\[Sandvine 2015\] Sandvine, "Global Internet Phenomena Report, Spring 2015," http://www.sandvine.com/news/globalbroadbandtrends.asp, 2015.
+
+\[Sardar 2006\] B. Sardar, D. Saha, "A Survey of TCP Enhancements for Last-Hop Wireless Networks," IEEE Commun. Surveys and Tutorials, Vol. 8, No. 3 (2006), pp. 20--34.
+
+\[Saroiu 2002\] S. Saroiu, P. K. Gummadi, S. D. Gribble, "A Measurement Study of Peer-to-Peer File Sharing Systems," Proc. of Multimedia Computing and Networking (MMCN) (2002).
+
+\[Sauter 2014\] M. Sauter, From GSM to LTE-Advanced, John Wiley and Sons, 2014.
+
+\[Savage 2015\] D. Savage, J. Ng, S. Moore, D. Slice, P. Paluch, R. White, "Enhanced Interior Gateway Routing Protocol," Internet Draft, draft-savage-eigrp-04.txt, Aug. 2015.
+
+\[Saydam 1996\] T. Saydam, T. Magedanz, "From Networks and Network Management into Service and Service Management," Journal of Network and Systems Management, Vol. 4, No. 4 (Dec. 1996), pp. 345--348.
+
+\[Schiller 2003\] J. Schiller, Mobile Communications, 2nd edition, Addison-Wesley, 2003.
+
+\[Schneier 1995\] B. Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C, John Wiley and Sons, 1995.
+
+\[Schulzrinne-RTP 2012\] Henning Schulzrinne's RTP site, http://www.cs.columbia.edu/\~hgs/rtp
+
+\[Schulzrinne-SIP 2016\] Henning Schulzrinne's SIP site, http://www.cs.columbia.edu/\~hgs/sip
+
+\[Schwartz 1977\] M. Schwartz, Computer-Communication Network Design and Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1977.
+
+\[Schwartz 1980\] M. Schwartz, Information, Transmission, Modulation, and Noise, McGraw-Hill, New York, NY, 1980.
+
+\[Schwartz 1982\] M. Schwartz, "Performance Analysis of the SNA Virtual Route Pacing Control," IEEE Transactions on Communications, Vol. 30, No. 1 (Jan. 1982), pp. 172--184.
+
+\[Scourias 2012\] J. Scourias, "Overview of the Global System for Mobile Communications: GSM," http://www.privateline.com/PCS/GSM0.html
+
+\[SDNHub 2016\] SDNHub, "App Development Tutorials," http://sdnhub.org/tutorials/
+
+\[Segaller 1998\] S. Segaller, Nerds 2.0.1: A Brief History of the Internet, TV Books, New York, 1998.
+
+\[Sekar 2011\] V. Sekar, S. Ratnasamy, M. Reiter, N. Egi, G. Shi, "The Middlebox Manifesto: Enabling Innovation in Middlebox Deployment," Proc. 10th ACM Workshop on Hot Topics in Networks (HotNets), Article 21, 6 pages.
+
+\[Serpanos 2011\] D. Serpanos, T. Wolf, Architecture of Network Systems, Morgan Kaufmann Publishers, 2011.
+
+\[Shacham 1990\] N. Shacham, P. McKenney, "Packet Recovery in High-Speed Networks Using Coding and Buffer Management," Proc. 1990 IEEE INFOCOM (San Francisco, CA, Apr. 1990), pp. 124--131.
+
+\[Shaikh 2001\] A. Shaikh, R. Tewari, M. Agrawal, "On the Effectiveness of DNS-based Server Selection," Proc. 2001 IEEE INFOCOM.
+
+\[Singh 1999\] S. Singh, The Code Book: The Evolution of Secrecy from Mary, Queen of Scots to Quantum Cryptography, Doubleday Press, 1999.
+
+\[Singh 2015\] A. Singh, J. Ong, A. Agarwal, G. Anderson, A. Armistead, R. Bannon, S. Boving, G. Desai, B. Felderman, P. Germano, A. Kanagala, J. Provost, J. Simmons, E. Tanda, J. Wanderer, U. Hölzle, S. Stuart, A. Vahdat, "Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network," Proc. 2015 ACM SIGCOMM.
+
+\[SIP Software 2016\] H. Schulzrinne Software Package site, http://www.cs.columbia.edu/IRT/software
+
+\[Skoudis 2004\] E. Skoudis, L. Zeltser, Malware: Fighting Malicious Code, Prentice Hall, 2004.
+
+\[Skoudis 2006\] E. Skoudis, T. Liston, Counter Hack Reloaded: A Step-by-Step Guide to Computer Attacks and Effective Defenses (2nd Edition), Prentice Hall, 2006.
+
+\[Smith 2009\] J. Smith, "Fighting Physics: A Tough Battle," Communications of the ACM, Vol. 52, No. 7 (July 2009), pp. 60--65.
+
+\[Snort 2012\] Sourcefire Inc., Snort homepage, http://www.snort.org/
+
+\[Solensky 1996\] F. Solensky, "IPv4 Address Lifetime Expectations," in IPng: Internet Protocol Next Generation (S. Bradner, A. Mankin, ed.), Addison-Wesley, Reading, MA, 1996.
+
+\[Spragins 1991\] J. D. Spragins, Telecommunications Protocols and Design, Addison-Wesley, Reading, MA, 1991.
+
+\[Srikant 2004\] R. Srikant, The Mathematics of Internet Congestion Control, Birkhauser, 2004.
+
+\[Steinder 2002\] M. Steinder, A. Sethi, "Increasing Robustness of Fault Localization Through Analysis of Lost, Spurious, and Positive Symptoms," Proc. 2002 IEEE INFOCOM.
+
+\[Stevens 1990\] W. R. Stevens, Unix Network Programming, Prentice-Hall, Englewood Cliffs, NJ, 1990.
+
+\[Stevens 1994\] W. R. Stevens, TCP/IP Illustrated, Vol. 1: The Protocols, Addison-Wesley, Reading, MA, 1994.
+
+\[Stevens 1997\] W. R. Stevens, Unix Network Programming, Volume 1: Networking APIs: Sockets and XTI, 2nd edition, Prentice-Hall, Englewood Cliffs, NJ, 1997.
+
+\[Stewart 1999\] J. Stewart, BGP4: Interdomain Routing in the Internet, Addison-Wesley, 1999.
+
+\[Stone 1998\] J. Stone, M. Greenwald, C. Partridge, J. Hughes, "Performance of Checksums and CRCs Over Real Data," IEEE/ACM Transactions on Networking, Vol. 6, No. 5 (Oct. 1998), pp. 529--543.
+
+\[Stone 2000\] J. Stone, C. Partridge, "When Reality and the Checksum Disagree," Proc. 2000 ACM SIGCOMM (Stockholm, Sweden, Aug. 2000).
+
+\[Strayer 1992\] W. T. Strayer, B. Dempsey, A. Weaver, XTP: The Xpress Transfer Protocol, Addison-Wesley, Reading, MA, 1992.
+
+\[Stubblefield 2002\] A. Stubblefield, J. Ioannidis, A. Rubin, "Using the Fluhrer, Mantin, and Shamir Attack to Break WEP," Proceedings of 2002 Network and Distributed Systems Security Symposium (2002), pp. 17--22.
+
+\[Subramanian 2000\] M. Subramanian, Network Management: Principles and Practice, Addison-Wesley, Reading, MA, 2000.
+
+\[Subramanian 2002\] L. Subramanian, S. Agarwal, J. Rexford, R. Katz, "Characterizing the Internet Hierarchy from Multiple Vantage Points," Proc. 2002 IEEE INFOCOM.
+
+\[Sundaresan 2006\] K. Sundaresan, K. Papagiannaki, "The Need for Cross-layer Information in Access Point Selection," Proc. 2006 ACM Internet Measurement Conference (Rio de Janeiro, Oct. 2006).
+
+\[Suh 2006\] K. Suh, D. R. Figueiredo, J. Kurose, D. Towsley, "Characterizing and Detecting Relayed Traffic: A Case Study Using Skype," Proc. 2006 IEEE INFOCOM (Barcelona, Spain, Apr. 2006).
+
+\[Sunshine 1978\] C. Sunshine, Y. Dalal, "Connection Management in Transport Protocols," Computer Networks, North-Holland, Amsterdam, 1978.
+
+\[Tariq 2008\] M. Tariq, A. Zeitoun, V. Valancius, N. Feamster, M. Ammar, "Answering What-If Deployment and Configuration Questions with WISE," Proc. 2008 ACM SIGCOMM (Aug. 2008).
+
+\[TechOnLine 2012\] TechOnLine, "Protected Wireless Networks," online webcast tutorial, http://www.techonline.com/community/tech_topic/internet/21752
+
+\[Teixeira 2006\] R. Teixeira, J. Rexford, "Managing Routing Disruptions in Internet Service Provider Networks," IEEE Communications Magazine (Mar. 2006).
+
+\[Think 2012\] Technical History of Network Protocols, "Cyclades," http://www.cs.utexas.edu/users/chris/think/Cyclades/index.shtml
+
+\[Tian 2012\] Y. Tian, R. Dey, Y. Liu, K. W. Ross, "China's Internet: Topology Mapping and Geolocating," IEEE INFOCOM Mini-Conference 2012 (Orlando, FL, 2012).
+
+\[TLD list 2016\] TLD list maintained by Wikipedia, https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains
+
+\[Tobagi 1990\] F. Tobagi, "Fast Packet Switch Architectures for Broadband Integrated Networks," Proceedings of the IEEE, Vol. 78, No. 1 (Jan. 1990), pp. 133--167.
+
+\[TOR 2016\] Tor: Anonymity Online, http://www.torproject.org
+
+\[Torres 2011\] R. Torres, A. Finamore, J. R. Kim, M. M. Munafo, S. Rao, "Dissecting Video Server Selection Strategies in the YouTube CDN," Proc. 2011 Int. Conf. on Distributed Computing Systems.
+
+\[Tourrilhes 2014\] J. Tourrilhes, P. Sharma, S. Banerjee, J. Petit, "SDN and OpenFlow Evolution: A Standards Perspective," IEEE Computer Magazine, Nov. 2014, pp. 22--29.
+
+\[Turner 1988\] J. S. Turner, "Design of a Broadcast Packet Switching Network," IEEE Transactions on Communications, Vol. 36, No. 6 (June 1988), pp. 734--743.
Turner, "2G, 3G, 4G Wireless Tutorial," +http://blogs.nmscommunications.com/communications/2008/10/2g-3g-4g-wireless-tutorial.html + +\[UPnP Forum 2016\] UPnP Forum homepage, http://www.upnp.org/ + +\[van der Berg 2008\] R. van der Berg, "How the 'Net Works: An +Introduction to Peering and Transit," +http://arstechnica.com/guides/other/peering-and-transit.ars + +\[van der Merwe 1998\] J. van der Merwe, S. Rooney, I. Leslie, S. +Crosby, "The Tempest: A Practical Framework for Network +Programmability," IEEE Network, Vol. 12, No. 3 (May 1998), pp. 20--28. + +\[Varghese 1997\] G. Varghese, A. Lauck, "Hashed and Hierarchical Timing +Wheels: Efficient Data Structures for Implementing a Timer Facility," +IEEE/ACM Transactions on Networking, Vol. 5, No. 6 (Dec. 1997), +pp. 824--834. + +\[Vasudevan 2012\] S. Vasudevan, C. Diot, J. Kurose, D. Towsley, +"Facilitating Access Point Selection in IEEE 802.11 Wireless Networks," +Proc. 2005 ACM Internet Measurement Conference, (San Francisco CA, +Oct. 2005). + +\[Villamizar 1994\] C. Villamizar, C. Song. "High Performance tcp in +ansnet," ACM SIGCOMM Computer Communications Review, Vol. 24, No. 5 +(1994), pp. 45--60. + +\[Viterbi 1995\] A. Viterbi, CDMA: Principles of Spread Spectrum +Communication, Addison-Wesley, Reading, MA, 1995. + +\[Vixie 2009\] P. Vixie, "What DNS Is Not," Communications of the ACM, +Vol. 52, No. 12 (Dec. 2009), pp. 43--47. + +\[Wakeman 1992\] I. Wakeman, J. Crowcroft, Z. Wang, D. Sirovica, +"Layering Considered Harmful," IEEE Network (Jan. 1992), pp. 20--24. + +\[Waldrop 2007\] M. Waldrop, "Data Center in a Box," Scientific American +(July 2007). + +\[Wang 2004\] B. Wang, J. Kurose, P. Shenoy, D. Towsley, "Multimedia +Streaming via TCP: An Analytic Performance Study," Proc. 2004 ACM +Multimedia Conference (New York, NY, Oct. 2004). + +\[Wang 2008\] B. Wang, J. Kurose, P. Shenoy, D. Towsley, "Multimedia +Streaming via TCP: An Analytic Performance Study," ACM Transactions on +Multimedia Computing Communications and Applications (TOMCCAP), Vol. 4, +No. 2 (Apr. 2008), p. 16. 1--22. + +\[Wang 2010\] G. Wang, D. G. Andersen, M. Kaminsky, K. Papagiannaki, T. +S. E. Ng, M. Kozuch, M. Ryan, "c-Through: Part-time Optics in Data +Centers," Proc. 2010 ACM SIGCOMM. + +\[Wei 2006\] W. Wei, C. Zhang, H. Zang, J. Kurose, D. Towsley, +"Inference and Evaluation of Split-Connection Approaches in Cellular +Data Networks," Proc. Active and Passive Measurement Workshop (Adelaide, +Australia, Mar. 2006). + +\[Wei 2007\] D. X. Wei, C. Jin, S. H. Low, S. Hegde, "FAST TCP: +Motivation, Architecture, Algorithms, Performance," IEEE/ACM +Transactions on Networking (2007). + +\[Weiser 1991\] M. Weiser, "The Computer for the Twenty-First Century," +Scientific American (Sept. 1991): 94--10. +http://www.ubiq.com/hypertext/weiser/ SciAmDraft3.html + +\[White 2011\] A. White, K. Snow, A. Matthews, F. Monrose, "Hookt on +fon-iks: Phonotactic Reconstruction of Encrypted VoIP Conversations," +IEEE Symposium on Security and Privacy, Oakland, CA, 2011. + +\[Wigle.net 2016\] Wireless Geographic Logging Engine, +http://www.wigle.net + +\[Wiki Satellite 2016\] Satellite Internet access, +https://en.wikipedia.org/wiki/Satellite_Internet_access + +\[Wireshark 2016\] Wireshark homepage, http://www.wireshark.org + +\[Wischik 2005\] D. Wischik, N. McKeown, "Part I: Buffer Sizes for Core +Routers," ACM SIGCOMM Computer Communications Review, Vol. 35, No. 3 +(July 2005). + +\[Woo 1994\] T. Woo, R. Bindignavle, S. Su, S. 
Lam, "SNP: an interface +for secure network programming," Proc. 1994 Summer USENIX (Boston, MA, +June 1994), pp. 45--58. + +\[Wright 2015\] J. Wright, J. Wireless Security Secrets & Solutions, 3e, +"Hacking Exposed Wireless," McGraw-Hill Education, 2015. + +\[Wu 2005\] J. Wu, Z. M. Mao, J. Rexford, J. Wang, "Finding a Needle in +a Haystack: Pinpointing Significant BGP Routing Changes in an IP +Network," Proc. USENIX NSDI (2005). + +\[Xanadu 2012\] Xanadu Project homepage, http://www.xanadu.com/ + +\[Xiao 2000\] X. Xiao, A. Hannan, B. Bailey, L. Ni, "Traffic Engineering +with MPLS in the Internet," IEEE Network (Mar./Apr. 2000). + +\[Xu 2004\] L. Xu, K Harfoush, I. Rhee, "Binary Increase Congestion +Control (BIC) for Fast Long-Distance Networks," IEEE INFOCOM 2004, +pp. 2514--2524. + +\[Yavatkar 1994\] R. Yavatkar, N. Bhagwat, "Improving End-to-End +Performance of TCP over Mobile Internetworks," Proc. Mobile 94 Workshop +on Mobile Computing Systems and Applications (Dec. 1994). + +\[YouTube 2009\] YouTube 2009, Google container data center tour, 2009. + +\[YouTube 2016\] YouTube Statistics, 2016, +https://www.youtube.com/yt/press/ statistics.html + +\[Yu 2004\] Yu, Fang, H. Katz, Tirunellai V. Lakshman. "Gigabit Rate +Packet Pattern-Matching Using TCAM," Proc. 2004 Int. Conf. Network +Protocols, pp. 174--183. + +\[Yu 2011\] M. Yu, J. Rexford, X. Sun, S. Rao, N. Feamster, "A Survey of +VLAN Usage in Campus Networks," IEEE Communications Magazine, July 2011. + +\[Zegura 1997\] E. Zegura, K. Calvert, M. Donahoo, "A Quantitative +Comparison of Graph-based Models for Internet Topology," IEEE/ACM +Transactions on Networking, Vol. 5, No. 6, (Dec. 1997). See also +http://www.cc.gatech.edu/projects/gtim for a software package that +generates networks with a transit-stub structure. + +\[Zhang 1993\] L. Zhang, S. Deering, D. Estrin, S. Shenker, D. Zappala, +"RSVP: A New Resource Reservation Protocol," IEEE Network Magazine, Vol. +7, No. 9 (Sept. 1993), pp. 8--18. + +\[Zhang 2007\] L. Zhang, "A Retrospective View of NAT," The IETF +Journal, Vol. 3, Issue 2 (Oct. 2007). + +\[Zhang 2015\] G. Zhang, W. Liu, X. Hei, W. Cheng, "Unreeling Xunlei +Kankan: Understanding Hybrid CDN-P2P Video-on-Demand Streaming," IEEE +Transactions on Multimedia, Vol. 17, No. 2, Feb. 2015. + +\[Zhang X 2102\] X. Zhang, Y. Xu, Y. Liu, Z. Guo, Y. Wang, "Profiling +Skype Video Calls: Rate Control and Video Quality," IEEE INFOCOM +(Mar. 2012). + +\[Zink 2009\] M. Zink, K. Suh, Y. Gu, J. Kurose, "Characteristics of +YouTube Network Traffic at a Campus Network---Measurements, Models, and +Implications," Computer Networks, Vol. 53, No. 4, pp. 501--514, 2009. + +Index + + |
