diff options
| author | mo khan <mo@mokhan.ca> | 2025-09-27 15:07:57 -0600 |
|---|---|---|
| committer | mo khan <mo@mokhan.ca> | 2025-09-27 15:07:57 -0600 |
| commit | a72342bd83304fcb5325c0167ff2c83b8525cf87 (patch) | |
| tree | 42935c55abbaf4ac19f5169b2522f901f4bdc29b /generated/textbook.md | |
| parent | 6f75be4a039d3f9225685b42c2537fa0156a0add (diff) | |
remove outdated material
Diffstat (limited to 'generated/textbook.md')
| -rw-r--r-- | generated/textbook.md | 31112 |
1 file changed, 0 insertions, 31112 deletions
diff --git a/generated/textbook.md b/generated/textbook.md
deleted file mode 100644
index cfb43cd..0000000
--- a/generated/textbook.md
+++ /dev/null
@@ -1,31112 +0,0 @@

Computer Networking: A Top-Down Approach, Seventh Edition
James F. Kurose, University of Massachusetts, Amherst
Keith W. Ross, NYU and NYU Shanghai

Boston Columbus Indianapolis New York San Francisco Hoboken Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montréal Toronto Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo

Vice President, Editorial Director, ECS: Marcia Horton
Acquisitions Editor: Matt Goldstein
Editorial Assistant: Kristy Alaura
Vice President of Marketing: Christy Lesko
Director of Field Marketing: Tim Galligan
Product Marketing Manager: Bram Van Kempen
Field Marketing Manager: Demetrius Hall
Marketing Assistant: Jon Bryant
Director of Product Management: Erin Gregg
Team Lead, Program and Project Management: Scott Disanno
Program Managers: Joanne Manning and Carole Snyder
Project Manager: Katrina Ostler, Ostler Editorial, Inc.
Senior Specialist, Program Planning and Support: Maura Zaldivar-Garcia
Cover Designer: Joyce Wells
Manager, Rights and Permissions: Ben Ferrini
Project Manager, Rights and Permissions: Jenny Hoffman, Aptara Corporation
Inventory Manager: Ann Lam
Cover Image: Marc Gutierrez/Getty Images
Media Project Manager: Steve Wright
Composition: Cenveo Publishing Services
Printer/Binder: Edwards Brothers Malloy
Cover and Insert Printer: Phoenix Color/Hagerstown

Credits and acknowledgments borrowed from other sources and reproduced, with permission, in this textbook appear on the appropriate page within the text. Copyright © 2017, 2013, 2010 Pearson Education, Inc. All rights reserved. Manufactured in the United States of America.
This publication is protected by Copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, request forms, and the appropriate contacts within the Pearson Education Global Rights & Permissions Department, please visit www.pearsoned.com/permissions/.

Many of the designations by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Library of Congress Cataloging-in-Publication Data
Names: Kurose, James F. \| Ross, Keith W., 1956-
Title: Computer networking: a top-down approach / James F. Kurose, University of Massachusetts, Amherst, Keith W. Ross, NYU and NYU Shanghai.
Description: Seventh edition. \| Hoboken, New Jersey: Pearson, \[2017\] \| Includes bibliographical references and index.
Identifiers: LCCN 2016004976 \| ISBN 9780133594140 \| ISBN 0133594149
Subjects: LCSH: Internet. \| Computer networks.
Classification: LCC TK5105.875.I57 K88 2017 \| DDC 004.6-dc23

LC record available at http://lccn.loc.gov/2016004976

ISBN-10: 0-13-359414-9
ISBN-13: 978-0-13-359414-0

About the Authors

Jim Kurose

Jim Kurose is a Distinguished University Professor of Computer Science at the University of Massachusetts, Amherst. He is currently on leave from the University of Massachusetts, serving as an Assistant Director at the US National Science Foundation, where he leads the Directorate of Computer and Information Science and Engineering. Dr.
Kurose has received a number of recognitions for his educational activities, including Outstanding Teacher Awards from the National Technological University (eight times), the University of Massachusetts, and the Northeast Association of Graduate Schools. He received the IEEE Taylor Booth Education Medal and was recognized for his leadership of Massachusetts' Commonwealth Information Technology Initiative. He has won several conference best paper awards and received the IEEE Infocom Achievement Award and the ACM Sigcomm Test of Time Award.

Dr. Kurose is a former Editor-in-Chief of IEEE Transactions on Communications and of IEEE/ACM Transactions on Networking. He has served as Technical Program co-Chair for IEEE Infocom, ACM SIGCOMM, the ACM Internet Measurement Conference, and ACM SIGMETRICS. He is a Fellow of the IEEE and the ACM. His research interests include network protocols and architecture, network measurement, multimedia communication, and modeling and performance evaluation. He holds a PhD in Computer Science from Columbia University.

Keith Ross

Keith Ross is the Dean of Engineering and Computer Science at NYU Shanghai and the Leonard J. Shustek Chair Professor in the Computer Science and Engineering Department at NYU. Previously he was at the University of Pennsylvania (13 years), the Eurecom Institute (5 years), and Polytechnic University (10 years). He received a B.S.E.E. from Tufts University, an M.S.E.E. from Columbia University, and a Ph.D. in Computer and Control Engineering from the University of Michigan. Keith Ross is also the co-founder and original CEO of Wimba, which develops online multimedia applications for e-learning and was acquired by Blackboard in 2010.

Professor Ross's research interests are in privacy, social networks, peer-to-peer networking, Internet measurement, content distribution networks, and stochastic modeling.
He is an ACM Fellow, an IEEE Fellow, a recipient of the Infocom 2009 Best Paper Award, and a recipient of the 2011 and 2008 Best Paper Awards for Multimedia Communications (awarded by the IEEE Communications Society). He has served on numerous journal editorial boards and conference program committees, including IEEE/ACM Transactions on Networking, ACM SIGCOMM, ACM CoNext, and the ACM Internet Measurement Conference. He has also served as an advisor to the Federal Trade Commission on P2P file sharing.

To Julie and our three precious ones---Chris, Charlie, and Nina
JFK

A big THANKS to my professors, colleagues, and students all over the world.
KWR

Preface

Welcome to the seventh edition of Computer Networking: A Top-Down Approach. Since the publication of the first edition 16 years ago, our book has been adopted for use at many hundreds of colleges and universities, translated into 14 languages, and used by over one hundred thousand students and practitioners worldwide. We've heard from many of these readers and have been overwhelmed by the positive response.

What's New in the Seventh Edition?

We think one important reason for this success has been that our book continues to offer a fresh and timely approach to computer networking instruction. We've made changes in this seventh edition, but we've also kept unchanged what we believe (and the instructors and students who have used our book have confirmed) to be the most important aspects of this book: its top-down approach, its focus on the Internet and a modern treatment of computer networking, its attention to both principles and practice, and its accessible style and approach toward learning about computer networking. Nevertheless, the seventh edition has been revised and updated substantially. Long-time readers of our book will notice that for the first time since this text was published, we've changed the organization of the chapters themselves.
The network layer, which had previously been covered in a single chapter, is now covered in Chapter 4 (which focuses on the so-called "data plane" component of the network layer) and Chapter 5 (which focuses on the network layer's "control plane"). This expanded coverage of the network layer reflects the swift rise in importance of software-defined networking (SDN), arguably the most important and exciting advance in networking in decades. Although a relatively recent innovation, SDN has been rapidly adopted in practice---so much so that it's already hard to imagine an introduction to modern computer networking that doesn't cover SDN. The topic of network management, previously covered in Chapter 9, has now been folded into the new Chapter 5. As always, we've also updated many other sections of the text to reflect recent changes in the dynamic field of networking since the sixth edition. Material that has been retired from the printed text can always be found on this book's Companion Website. The most important updates are the following:

- Chapter 1 has been updated to reflect the ever-growing reach and use of the Internet.
- Chapter 2, which covers the application layer, has been significantly updated. We've removed the material on the FTP protocol and distributed hash tables to make room for a new section on application-level video streaming and content distribution networks, together with Netflix and YouTube case studies. The socket programming sections have been updated from Python 2 to Python 3.
- Chapter 3, which covers the transport layer, has been modestly updated. The material on asynchronous transfer mode (ATM) networks has been replaced by more modern material on the Internet's explicit congestion notification (ECN), which teaches the same principles.
- Chapter 4 covers the "data plane" component of the network layer---the per-router forwarding function that determines how a packet arriving on one of a router's input links is forwarded to one of that router's output links. We updated the material on traditional Internet forwarding found in all previous editions, and added material on packet scheduling. We've also added a new section on generalized forwarding, as practiced in SDN. There are also numerous updates throughout the chapter. Material on multicast and broadcast communication has been removed to make way for the new material.
- In Chapter 5, we cover the control-plane functions of the network layer---the network-wide logic that controls how a datagram is routed along an end-to-end path of routers from the source host to the destination host. As in previous editions, we cover routing algorithms, as well as routing protocols (with an updated treatment of BGP) used in today's Internet. We've added a significant new section on the SDN control plane, where routing and other functions are implemented in so-called SDN controllers.
- Chapter 6, which now covers the link layer, has an updated treatment of Ethernet and of data center networking.
- Chapter 7, which covers wireless and mobile networking, contains updated material on 802.11 (so-called "WiFi") networks and cellular networks, including 4G and LTE.
- Chapter 8, which covers network security and was extensively updated in the sixth edition, has only modest updates in this seventh edition.
- Chapter 9, on multimedia networking, is now slightly "thinner" than in the sixth edition, as material on video streaming and content distribution networks has been moved to Chapter 2, and material on packet scheduling has been incorporated into Chapter 4.
- Significant new material involving end-of-chapter problems has been added. As with all previous editions, homework problems have been revised, added, and removed.
As always, our aim in creating this new edition of our book is to continue to provide a focused and modern treatment of computer networking, emphasizing both principles and practice.

Audience

This textbook is for a first course on computer networking. It can be used in both computer science and electrical engineering departments. In terms of programming languages, the book assumes only that the student has experience with C, C++, Java, or Python (and even then only in a few places). Although this book is more precise and analytical than many other introductory computer networking texts, it rarely uses any mathematical concepts that are not taught in high school. We have made a deliberate effort to avoid using any advanced calculus, probability, or stochastic process concepts (although we've included some homework problems for students with this advanced background). The book is therefore appropriate for undergraduate courses and for first-year graduate courses. It should also be useful to practitioners in the telecommunications industry.

What Is Unique About This Textbook?

The subject of computer networking is enormously complex, involving many concepts, protocols, and technologies that are woven together in an intricate manner. To cope with this scope and complexity, many computer networking texts are organized around the "layers" of a network architecture. With a layered organization, students can see through the complexity of computer networking---they learn about the distinct concepts and protocols in one part of the architecture while seeing the big picture of how all parts fit together. From a pedagogical perspective, our personal experience has been that such a layered approach indeed works well. Nevertheless, we have found that the traditional approach of teaching---bottom up; that is, from the physical layer towards the application layer---is not the best approach for a modern course on computer networking.
A Top-Down Approach

Our book broke new ground 16 years ago by treating networking in a top-down manner---that is, by beginning at the application layer and working its way down toward the physical layer. The feedback we received from teachers and students alike has confirmed that this top-down approach has many advantages and does indeed work well pedagogically. First, it places emphasis on the application layer (a "high growth area" in networking). Indeed, many of the recent revolutions in computer networking---including the Web, peer-to-peer file sharing, and media streaming---have taken place at the application layer. An early emphasis on application-layer issues differs from the approaches taken in most other texts, which have only a small amount of material on network applications, their requirements, application-layer paradigms (e.g., client-server and peer-to-peer), and application programming interfaces. Second, our experience as instructors (and that of many instructors who have used this text) has been that teaching networking applications near the beginning of the course is a powerful motivational tool. Students are thrilled to learn about how networking applications work---applications such as e-mail and the Web, which most students use on a daily basis. Once a student understands the applications, the student can then understand the network services needed to support these applications. The student can then, in turn, examine the various ways in which such services might be provided and implemented in the lower layers. Covering applications early thus provides motivation for the remainder of the text. Third, a top-down approach enables instructors to introduce network application development at an early stage. Students not only see how popular applications and protocols work, but also learn how easy it is to create their own network applications and application-level protocols.
With the top-down approach, students get early exposure to the notions of socket programming, service models, and protocols---important concepts that resurface in all subsequent layers. By providing socket programming examples in Python, we highlight the central ideas without confusing students with complex code. Undergraduates in electrical engineering and computer science should not have difficulty following the Python code.

An Internet Focus

Although we dropped the phrase "Featuring the Internet" from the title of this book with the fourth edition, this doesn't mean that we dropped our focus on the Internet. Indeed, nothing could be further from the case! Instead, since the Internet has become so pervasive, we felt that any networking textbook must have a significant focus on the Internet, and thus this phrase was somewhat unnecessary. We continue to use the Internet's architecture and protocols as primary vehicles for studying fundamental computer networking concepts. Of course, we also include concepts and protocols from other network architectures. But the spotlight is clearly on the Internet, a fact reflected in our organizing the book around the Internet's five-layer architecture: the application, transport, network, link, and physical layers. Another benefit of spotlighting the Internet is that most computer science and electrical engineering students are eager to learn about the Internet and its protocols. They know that the Internet has been a revolutionary and disruptive technology and can see that it is profoundly changing our world. Given the enormous relevance of the Internet, students are naturally curious about what is "under the hood." Thus, it is easy for an instructor to get students excited about basic principles when using the Internet as the guiding focus.
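As a small taste of the kind of Python 3 socket programming the preface refers to, here is a minimal sketch (not taken from the book; the loopback addresses, message, and single-process structure are illustrative) of a UDP exchange between a "server" and a "client" socket:

```python
import socket

# Illustrative sketch: a UDP server and client exchanged within one
# process, so the whole round trip can be seen in a few lines.

# Server side: bind a UDP socket; port 0 lets the OS pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server_port = server.getsockname()[1]

# Client side: send a datagram to the server's address.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"hello, network", ("127.0.0.1", server_port))

# Server receives the datagram, uppercases it, and replies to the
# client's address (learned from recvfrom).
message, client_addr = server.recvfrom(2048)
server.sendto(message.upper(), client_addr)

reply, _ = client.recvfrom(2048)
print(reply.decode())  # HELLO, NETWORK

client.close()
server.close()
```

Because UDP is connectionless, there is no handshake: each `sendto` simply launches a datagram at an address, which is what makes it a gentle first socket example before TCP's `connect`/`accept` machinery.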
Teaching Networking Principles

Two of the unique features of the book---its top-down approach and its focus on the Internet---have appeared in the titles of our book. If we could have squeezed a third phrase into the subtitle, it would have contained the word principles. The field of networking is now mature enough that a number of fundamentally important issues can be identified. For example, in the transport layer, the fundamental issues include reliable communication over an unreliable network layer, connection establishment/teardown and handshaking, congestion and flow control, and multiplexing. Three fundamentally important network-layer issues are determining "good" paths between two routers, interconnecting a large number of heterogeneous networks, and managing the complexity of a modern network. In the link layer, a fundamental problem is sharing a multiple access channel. In network security, techniques for providing confidentiality, authentication, and message integrity are all based on cryptographic fundamentals. This text identifies fundamental networking issues and studies approaches towards addressing these issues. The student learning these principles will gain knowledge with a long "shelf life"---long after today's network standards and protocols have become obsolete, the principles they embody will remain important and relevant. We believe that the combination of using the Internet to get the student's foot in the door and then emphasizing fundamental issues and solution approaches will allow the student to quickly understand just about any networking technology.

The Website

Each new copy of this textbook includes twelve months of access to a Companion Website for all book readers at http://www.pearsonhighered.com/cs-resources/, which includes:

Interactive learning material.
The book's Companion Website contains VideoNotes---video presentations of important topics throughout the book done by the authors, as well as walkthroughs of solutions to problems similar to those at the end of the chapter. We've seeded the Web site with VideoNotes and online problems for Chapters 1 through 5 and will continue to actively add and update this material over time. As in earlier editions, the Web site contains the interactive Java applets that animate many key networking concepts. The site also has interactive quizzes that permit students to check their basic understanding of the subject matter. Professors can integrate these interactive features into their lectures or use them as mini labs.

Additional technical material. As we have added new material in each edition of our book, we've had to remove coverage of some existing topics to keep the book at a manageable length. For example, to make room for the new material in this edition, we've removed material on FTP, distributed hash tables, and multicasting. Material that appeared in earlier editions of the text is still of interest, and thus can be found on the book's Web site.

Programming assignments. The Web site also provides a number of detailed programming assignments, which include building a multithreaded Web server, building an e-mail client with a GUI interface, programming the sender and receiver sides of a reliable data transport protocol, programming a distributed routing algorithm, and more.

Wireshark labs. One's understanding of network protocols can be greatly deepened by seeing them in action. The Web site provides numerous Wireshark assignments that enable students to actually observe the sequence of messages exchanged between two protocol entities. The Web site includes separate Wireshark labs on HTTP, DNS, TCP, UDP, IP, ICMP, Ethernet, ARP, WiFi, SSL, and on tracing all protocols involved in satisfying a request to fetch a Web page.
We'll continue to add new labs over time.

In addition to the Companion Website, the authors maintain a public Web site, http://gaia.cs.umass.edu/kurose_ross/interactive, containing interactive exercises that create (and present solutions for) problems similar to selected end-of-chapter problems. Since students can generate (and view solutions for) an unlimited number of similar problem instances, they can work until the material is truly mastered.

Pedagogical Features

We have each been teaching computer networking for more than 30 years. Together, we bring more than 60 years of teaching experience to this text, during which time we have taught many thousands of students. We have also been active researchers in computer networking during this time. (In fact, Jim and Keith first met each other as master's students in a computer networking course taught by Mischa Schwartz in 1979 at Columbia University.) We think all this gives us a good perspective on where networking has been and where it is likely to go in the future. Nevertheless, we have resisted temptations to bias the material in this book towards our own pet research projects. We figure you can visit our personal Web sites if you are interested in our research. Thus, this book is about modern computer networking---it is about contemporary protocols and technologies as well as the underlying principles behind these protocols and technologies. We also believe that learning (and teaching!) about networking can be fun. A sense of humor, use of analogies, and real-world examples in this book will hopefully make this material more fun.

Supplements for Instructors

We provide a complete supplements package to aid instructors in teaching this course. This material can be accessed from Pearson's Instructor Resource Center (http://www.pearsonhighered.com/irc). Visit the Instructor Resource Center for information about accessing these instructor's supplements.

PowerPoint® slides.
We provide PowerPoint slides for all nine chapters. The slides have been completely updated with this seventh edition. The slides cover each chapter in detail. They use graphics and animations (rather than relying only on monotonous text bullets) to make the slides interesting and visually appealing. We provide the original PowerPoint slides so you can customize them to best suit your own teaching needs. Some of these slides have been contributed by other instructors who have taught from our book.

Homework solutions. We provide a solutions manual for the homework problems in the text, programming assignments, and Wireshark labs. As noted earlier, we've introduced many new homework problems in the first six chapters of the book.

Chapter Dependencies

The first chapter of this text presents a self-contained overview of computer networking. Introducing many key concepts and terminology, this chapter sets the stage for the rest of the book. All of the other chapters directly depend on this first chapter. After completing Chapter 1, we recommend instructors cover Chapters 2 through 6 in sequence, following our top-down philosophy. Each of these five chapters leverages material from the preceding chapters. After completing the first six chapters, the instructor has quite a bit of flexibility. There are no interdependencies among the last three chapters, so they can be taught in any order. However, each of the last three chapters depends on the material in the first six chapters. Many instructors first teach the first six chapters and then teach one of the last three chapters for "dessert."

One Final Note: We'd Love to Hear from You

We encourage students and instructors to e-mail us with any comments they might have about our book. It's been wonderful for us to hear from so many instructors and students from around the world about our first five editions. We've incorporated many of these suggestions into later editions of the book.
We also encourage instructors to send us new homework problems (and solutions) that would complement the current homework problems. We'll post these on the instructor-only portion of the Web site. We also encourage instructors and students to create new Java applets that illustrate the concepts and protocols in this book. If you have an applet that you think would be appropriate for this text, please submit it to us. If the applet (including notation and terminology) is appropriate, we'll be happy to include it on the text's Web site, with an appropriate reference to the applet's authors. So, as the saying goes, "Keep those cards and letters coming!" Seriously, please do continue to send us interesting URLs, point out typos, disagree with any of our claims, and tell us what works and what doesn't work. Tell us what you think should or shouldn't be included in the next edition. Send your e-mail to kurose@cs.umass.edu and keithwross@nyu.edu.

Acknowledgments

Since we began writing this book in 1996, many people have given us invaluable help and have been influential in shaping our thoughts on how to best organize and teach a networking course. We want to say A BIG THANKS to everyone who has helped us from the earliest first drafts of this book, up to this seventh edition. We are also very thankful to the many hundreds of readers from around the world---students, faculty, practitioners---who have sent us thoughts and comments on earlier editions of the book and suggestions for future editions of the book.
Special thanks go out to:

Al Aho (Columbia University)
Hisham Al-Mubaid (University of Houston-Clear Lake)
Pratima Akkunoor (Arizona State University)
Paul Amer (University of Delaware)
Shamiul Azom (Arizona State University)
Lichun Bao (University of California at Irvine)
Paul Barford (University of Wisconsin)
Bobby Bhattacharjee (University of Maryland)
Steven Bellovin (Columbia University)
Pravin Bhagwat (Wibhu)
Supratik Bhattacharyya (previously at Sprint)
Ernst Biersack (Eurécom Institute)
Shahid Bokhari (University of Engineering & Technology, Lahore)
Jean Bolot (Technicolor Research)
Daniel Brushteyn (former University of Pennsylvania student)
Ken Calvert (University of Kentucky)
Evandro Cantu (Federal University of Santa Catarina)
Jeff Case (SNMP Research International)
Jeff Chaltas (Sprint)
Vinton Cerf (Google)
Byung Kyu Choi (Michigan Technological University)
Bram Cohen (BitTorrent, Inc.)
Constantine Coutras (Pace University)
John Daigle (University of Mississippi)
Edmundo A.
de Souza e Silva (Federal University of Rio de Janeiro)
Philippe Decuetos (Eurécom Institute)
Christophe Diot (Technicolor Research)
Prithula Dhunghel (Akamai)
Deborah Estrin (University of California, Los Angeles)
Michalis Faloutsos (University of California at Riverside)
Wu-chi Feng (Oregon Graduate Institute)
Sally Floyd (ICIR, University of California at Berkeley)
Paul Francis (Max Planck Institute)
David Fullager (Netflix)
Lixin Gao (University of Massachusetts)
JJ Garcia-Luna-Aceves (University of California at Santa Cruz)
Mario Gerla (University of California at Los Angeles)
David Goodman (NYU-Poly)
Yang Guo (Alcatel/Lucent Bell Labs)
Tim Griffin (Cambridge University)
Max Hailperin (Gustavus Adolphus College)
Bruce Harvey (Florida A&M University, Florida State University)
Carl Hauser (Washington State University)
Rachelle Heller (George Washington University)
Phillipp Hoschka (INRIA/W3C)
Wen Hsin (Park University)
Albert Huang (former University of Pennsylvania student)
Cheng Huang (Microsoft Research)
Esther A. Hughes (Virginia Commonwealth University)
Van Jacobson (Xerox PARC)
Pinak Jain (former NYU-Poly student)
Jobin James (University of California at Riverside)
Sugih Jamin (University of Michigan)
Shivkumar Kalyanaraman (IBM Research, India)
Jussi Kangasharju (University of Helsinki)
Sneha Kasera (University of Utah)
Parviz Kermani (formerly of IBM Research)
Hyojin Kim (former University of Pennsylvania student)
Leonard Kleinrock (University of California at Los Angeles)
David Kotz (Dartmouth College)
Beshan Kulapala (Arizona State University)
Rakesh Kumar (Bloomberg)
Miguel A.
Labrador (University of South Florida)
Simon Lam (University of Texas)
Steve Lai (Ohio State University)
Tom LaPorta (Penn State University)
Tim Berners-Lee (World Wide Web Consortium)
Arnaud Legout (INRIA)
Lee Leitner (Drexel University)
Brian Levine (University of Massachusetts)
Chunchun Li (former NYU-Poly student)
Yong Liu (NYU-Poly)
William Liang (former University of Pennsylvania student)
Willis Marti (Texas A&M University)
Nick McKeown (Stanford University)
Josh McKinzie (Park University)
Deep Medhi (University of Missouri, Kansas City)
Bob Metcalfe (International Data Group)
Sue Moon (KAIST)
Jenni Moyer (Comcast)
Erich Nahum (IBM Research)
Christos Papadopoulos (Colorado State University)
Craig Partridge (BBN Technologies)
Radia Perlman (Intel)
Jitendra Padhye (Microsoft Research)
Vern Paxson (University of California at Berkeley)
Kevin Phillips (Sprint)
George Polyzos (Athens University of Economics and Business)
Sriram Rajagopalan (Arizona State University)
Ramachandran Ramjee (Microsoft Research)
Ken Reek (Rochester Institute of Technology)
Martin Reisslein (Arizona State University)
Jennifer Rexford (Princeton University)
Leon Reznik (Rochester Institute of Technology)
Pablo Rodriguez (Telefonica)
Sumit Roy (University of Washington)
Dan Rubenstein (Columbia University)
Avi Rubin (Johns Hopkins University)
Douglas Salane (John Jay College)
Despina Saparilla (Cisco Systems)
John Schanz (Comcast)
Henning Schulzrinne (Columbia University)
Mischa Schwartz (Columbia University)
Ardash Sethi (University of Delaware)
Harish Sethu (Drexel University)
K. Sam Shanmugan (University of Kansas)
Prashant Shenoy (University of Massachusetts)
Clay Shields (Georgetown University)
Subin Shrestra (University of Pennsylvania)
Bojie Shu (former NYU-Poly student)
Mihail L.
Sichitiu (NC State University)
Peter Steenkiste (Carnegie Mellon University)
Tatsuya Suda (University of California at Irvine)
Kin Sun Tam (State University of New York at Albany)
Don Towsley (University of Massachusetts)
David Turner (California State University, San Bernardino)
Nitin Vaidya (University of Illinois)
Michele Weigle (Clemson University)
David Wetherall (University of Washington)
Ira Winston (University of Pennsylvania)
Di Wu (Sun Yat-sen University)
Shirley Wynn (NYU-Poly)
Raj Yavatkar (Intel)
Yechiam Yemini (Columbia University)
Dian Yu (NYU Shanghai)
Ming Yu (State University of New York at Binghamton)
Ellen Zegura (Georgia Institute of Technology)
Honggang Zhang (Suffolk University)
Hui Zhang (Carnegie Mellon University)
Lixia Zhang (University of California at Los Angeles)
Meng Zhang (former NYU-Poly student)
Shuchun Zhang (former University of Pennsylvania student)
Xiaodong Zhang (Ohio State University)
ZhiLi Zhang (University of Minnesota)
Phil Zimmermann (independent consultant)
Mike Zink (University of Massachusetts)
Cliff C. Zou (University of Central Florida)

We also want to thank the entire Pearson team---in particular, Matt Goldstein and Joanne Manning---who have done an absolutely outstanding job on this seventh edition (and who have put up with two very finicky authors who seem congenitally unable to meet deadlines!). Thanks also to our artists, Janet Theurer and Patrice Rossi Calkin, for their work on the beautiful figures in this and earlier editions of our book, and to Katie Ostler and her team at Cenveo for their wonderful production work on this edition. Finally, a most special thanks goes to our previous two editors at Addison-Wesley---Michael Hirsch and Susan Hartman. This book would not be what it is (and may well not have been at all) without their graceful management, constant encouragement, nearly infinite patience, good humor, and perseverance.
Table of Contents

Chapter 1 Computer Networks and the Internet
  1.1 What Is the Internet?
    1.1.1 A Nuts-and-Bolts Description
    1.1.2 A Services Description
    1.1.3 What Is a Protocol?
  1.2 The Network Edge
    1.2.1 Access Networks
    1.2.2 Physical Media
  1.3 The Network Core
    1.3.1 Packet Switching
    1.3.2 Circuit Switching
    1.3.3 A Network of Networks
  1.4 Delay, Loss, and Throughput in Packet-Switched Networks
    1.4.1 Overview of Delay in Packet-Switched Networks
    1.4.2 Queuing Delay and Packet Loss
    1.4.3 End-to-End Delay
    1.4.4 Throughput in Computer Networks
  1.5 Protocol Layers and Their Service Models
    1.5.1 Layered Architecture
    1.5.2 Encapsulation
  1.6 Networks Under Attack
  1.7 History of Computer Networking and the Internet
    1.7.1 The Development of Packet Switching: 1961--1972
    1.7.2 Proprietary Networks and Internetworking: 1972--1980
    1.7.3 A Proliferation of Networks: 1980--1990
    1.7.4 The Internet Explosion: The 1990s
    1.7.5 The New Millennium
  1.8 Summary
  Homework Problems and Questions
  Wireshark Lab
  Interview: Leonard Kleinrock

Chapter 2 Application Layer
  2.1 Principles of Network Applications
    2.1.1 Network Application Architectures
    2.1.2 Processes Communicating
    2.1.3 Transport Services Available to Applications
    2.1.4 Transport Services Provided by the Internet
    2.1.5 Application-Layer Protocols
    2.1.6 Network Applications Covered in This Book
  2.2 The Web and HTTP
    2.2.1 Overview of HTTP
    2.2.2 Non-Persistent and Persistent Connections
    2.2.3 HTTP Message Format
    2.2.4 User-Server Interaction: Cookies
    2.2.5 Web Caching
  2.3 Electronic Mail in the Internet
    2.3.1 SMTP
    2.3.2 Comparison with HTTP
    2.3.3 Mail Message Formats
    2.3.4 Mail Access Protocols
  2.4 DNS---The Internet's Directory Service
    2.4.1 Services Provided by DNS
    2.4.2 Overview of How DNS Works
    2.4.3 DNS Records and Messages
  2.5 Peer-to-Peer Applications
    2.5.1 P2P File Distribution
  2.6 Video Streaming and Content Distribution Networks
    2.6.1 Internet Video
    2.6.2 HTTP Streaming and DASH
    2.6.3 Content Distribution Networks
    2.6.4 Case Studies: Netflix, YouTube, and Kankan
  2.7 Socket Programming: Creating Network Applications
    2.7.1 Socket Programming with UDP
    2.7.2 Socket Programming with TCP
  2.8 Summary
  Homework Problems and Questions
  Socket Programming Assignments
  Wireshark Labs: HTTP, DNS
  Interview: Marc Andreessen

Chapter 3 Transport Layer
  3.1 Introduction and Transport-Layer Services
    3.1.1 Relationship Between Transport and Network Layers
    3.1.2 Overview of the Transport Layer in the Internet
  3.2 Multiplexing and Demultiplexing
  3.3 Connectionless Transport: UDP
    3.3.1 UDP Segment Structure
    3.3.2 UDP Checksum
  3.4 Principles of Reliable Data Transfer
    3.4.1 Building a Reliable Data Transfer Protocol
    3.4.2 Pipelined Reliable Data Transfer Protocols
    3.4.3 Go-Back-N (GBN)
    3.4.4 Selective Repeat (SR)
  3.5 Connection-Oriented Transport: TCP
    3.5.1 The TCP Connection
    3.5.2 TCP Segment Structure
    3.5.3 Round-Trip Time Estimation and Timeout
    3.5.4 Reliable Data Transfer
    3.5.5 Flow Control
    3.5.6 TCP Connection Management
  3.6 Principles of Congestion Control
    3.6.1 The Causes and the Costs of Congestion
    3.6.2 Approaches to Congestion Control
  3.7 TCP Congestion Control
    3.7.1 Fairness
    3.7.2 Explicit Congestion Notification (ECN): Network-assisted Congestion Control
  3.8 Summary
  Homework Problems and Questions
  Programming Assignments
  Wireshark Labs: Exploring TCP, UDP
  Interview: Van Jacobson

Chapter 4 The Network Layer: Data Plane
  4.1 Overview of Network Layer
    4.1.1 Forwarding and Routing: The Network Data and Control Planes
    4.1.2 Network Service Models
  4.2 What's Inside a Router?
    4.2.1 Input Port Processing and Destination-Based Forwarding
    4.2.2 Switching
    4.2.3 Output Port Processing
    4.2.4 Where Does Queuing Occur?
    4.2.5 Packet Scheduling
  4.3 The Internet Protocol (IP): IPv4, Addressing, IPv6, and More
    4.3.1 IPv4 Datagram Format
    4.3.2 IPv4 Datagram Fragmentation
    4.3.3 IPv4 Addressing
    4.3.4 Network Address Translation (NAT)
    4.3.5 IPv6
  4.4 Generalized Forwarding and SDN
    4.4.1 Match
    4.4.2 Action
    4.4.3 OpenFlow Examples of Match-plus-action in Action
  4.5 Summary
  Homework Problems and Questions
  Wireshark Lab
  Interview: Vinton G. Cerf

Chapter 5 The Network Layer: Control Plane
  5.1 Introduction
  5.2 Routing Algorithms
    5.2.1 The Link-State (LS) Routing Algorithm
    5.2.2 The Distance-Vector (DV) Routing Algorithm
  5.3 Intra-AS Routing in the Internet: OSPF
  5.4 Routing Among the ISPs: BGP
    5.4.1 The Role of BGP
    5.4.2 Advertising BGP Route Information
    5.4.3 Determining the Best Routes
    5.4.4 IP-Anycast
    5.4.5 Routing Policy
    5.4.6 Putting the Pieces Together: Obtaining Internet Presence
  5.5 The SDN Control Plane
    5.5.1 The SDN Control Plane: SDN Controller and SDN Control Applications
    5.5.2 OpenFlow Protocol
    5.5.3 Data and Control Plane Interaction: An Example
    5.5.4 SDN: Past and Future
  5.6 ICMP: The Internet Control Message Protocol
  5.7 Network Management and SNMP
    5.7.1 The Network Management Framework
    5.7.2 The Simple Network Management Protocol (SNMP)
  5.8 Summary
  Homework Problems and Questions
  Socket Programming Assignment
  Programming Assignment
  Wireshark Lab
  Interview: Jennifer Rexford

Chapter 6 The Link Layer and LANs
  6.1 Introduction to the Link Layer
    6.1.1 The Services Provided by the Link Layer
    6.1.2 Where Is the Link Layer Implemented?
  6.2 Error-Detection and -Correction Techniques
    6.2.1 Parity Checks
    6.2.2 Checksumming Methods
    6.2.3 Cyclic Redundancy Check (CRC)
  6.3 Multiple Access Links and Protocols
    6.3.1 Channel Partitioning Protocols
    6.3.2 Random Access Protocols
    6.3.3 Taking-Turns Protocols
    6.3.4 DOCSIS: The Link-Layer Protocol for Cable Internet Access
  6.4 Switched Local Area Networks
    6.4.1 Link-Layer Addressing and ARP
    6.4.2 Ethernet
    6.4.3 Link-Layer Switches
    6.4.4 Virtual Local Area Networks (VLANs)
  6.5 Link Virtualization: A Network as a Link Layer
    6.5.1 Multiprotocol Label Switching (MPLS)
  6.6 Data Center Networking
  6.7 Retrospective: A Day in the Life of a Web Page Request
    6.7.1 Getting Started: DHCP, UDP, IP, and Ethernet
    6.7.2 Still Getting Started: DNS and ARP
    6.7.3 Still Getting Started: Intra-Domain Routing to the DNS Server
    6.7.4 Web Client-Server Interaction: TCP and HTTP
  6.8 Summary
  Homework Problems and Questions
  Wireshark Lab
  Interview: Simon S. Lam

Chapter 7 Wireless and Mobile Networks
  7.1 Introduction
  7.2 Wireless Links and Network Characteristics
    7.2.1 CDMA
  7.3 WiFi: 802.11 Wireless LANs
    7.3.1 The 802.11 Architecture
    7.3.2 The 802.11 MAC Protocol
    7.3.3 The IEEE 802.11 Frame
    7.3.4 Mobility in the Same IP Subnet
    7.3.5 Advanced Features in 802.11
    7.3.6 Personal Area Networks: Bluetooth and Zigbee
  7.4 Cellular Internet Access
    7.4.1 An Overview of Cellular Network Architecture
    7.4.2 3G Cellular Data Networks: Extending the Internet to Cellular Subscribers
    7.4.3 On to 4G: LTE
  7.5 Mobility Management: Principles
    7.5.1 Addressing
    7.5.2 Routing to a Mobile Node
  7.6 Mobile IP
  7.7 Managing Mobility in Cellular Networks
    7.7.1 Routing Calls to a Mobile User
    7.7.2 Handoffs in GSM
  7.8 Wireless and Mobility: Impact on Higher-Layer Protocols
  7.9 Summary
  Homework Problems and Questions
  Wireshark Lab
  Interview: Deborah Estrin

Chapter 8 Security in Computer Networks
  8.1 What Is Network Security?
  8.2 Principles of Cryptography
    8.2.1 Symmetric Key Cryptography
    8.2.2 Public Key Encryption
  8.3 Message Integrity and Digital Signatures
    8.3.1 Cryptographic Hash Functions
    8.3.2 Message Authentication Code
    8.3.3 Digital Signatures
  8.4 End-Point Authentication
    8.4.1 Authentication Protocol ap1.0
    8.4.2 Authentication Protocol ap2.0
    8.4.3 Authentication Protocol ap3.0
    8.4.4 Authentication Protocol ap3.1
    8.4.5 Authentication Protocol ap4.0
  8.5 Securing E-Mail
    8.5.1 Secure E-Mail
    8.5.2 PGP
  8.6 Securing TCP Connections: SSL
    8.6.1 The Big Picture
    8.6.2 A More Complete Picture
  8.7 Network-Layer Security: IPsec and Virtual Private Networks
    8.7.1 IPsec and Virtual Private Networks (VPNs)
    8.7.2 The AH and ESP Protocols
    8.7.3 Security Associations
    8.7.4 The IPsec Datagram
    8.7.5 IKE: Key Management in IPsec
  8.8 Securing Wireless LANs
    8.8.1 Wired Equivalent Privacy (WEP)
    8.8.2 IEEE 802.11i
  8.9 Operational Security: Firewalls and Intrusion Detection Systems
    8.9.1 Firewalls
    8.9.2 Intrusion Detection Systems
  8.10 Summary
  Homework Problems and Questions
  Wireshark Lab
  IPsec Lab
  Interview: Steven M. Bellovin

Chapter 9 Multimedia Networking
  9.1 Multimedia Networking Applications
    9.1.1 Properties of Video
    9.1.2 Properties of Audio
    9.1.3 Types of Multimedia Network Applications
  9.2 Streaming Stored Video
    9.2.1 UDP Streaming
    9.2.2 HTTP Streaming
  9.3 Voice-over-IP
    9.3.1 Limitations of the Best-Effort IP Service
    9.3.2 Removing Jitter at the Receiver for Audio
    9.3.3 Recovering from Packet Loss
    9.3.4 Case Study: VoIP with Skype
  9.4 Protocols for Real-Time Conversational Applications
    9.4.1 RTP
    9.4.2 SIP
  9.5 Network Support for Multimedia
    9.5.1 Dimensioning Best-Effort Networks
    9.5.2 Providing Multiple Classes of Service
    9.5.3 Diffserv
    9.5.4 Per-Connection Quality-of-Service (QoS) Guarantees: Resource Reservation and Call Admission
  9.6 Summary
  Homework Problems and Questions
  Programming Assignment
  Interview: Henning Schulzrinne

References
Index

Chapter 1 Computer Networks and the Internet

Today's Internet is arguably the largest engineered system ever created
by mankind, with hundreds of millions of connected computers,
communication links, and switches; with billions of users who connect
via laptops, tablets, and smartphones; and with an array of new
Internet-connected "things" including game consoles, surveillance
systems, watches, eye glasses, thermostats, body scales, and cars. Given
that the Internet is so large and has so many diverse components and
uses, is there any hope of understanding how it works? Are there guiding
principles and structure that can provide a foundation for understanding
such an amazingly large and complex system? And if so, is it possible
that it actually could be both interesting and fun to learn about
computer networks? Fortunately, the answer to all of these questions is
a resounding YES!
Indeed, it's our aim in this book to provide you with -a modern introduction to the dynamic field of computer networking, -giving you the principles and practical insights you'll need to -understand not only today's networks, but tomorrow's as well. This first -chapter presents a broad overview of computer networking and the -Internet. Our goal here is to paint a broad picture and set the context -for the rest of this book, to see the forest through the trees. We'll -cover a lot of ground in this introductory chapter and discuss a lot of -the pieces of a computer network, without losing sight of the big -picture. We'll structure our overview of computer networks in this -chapter as follows. After introducing some basic terminology and -concepts, we'll first examine the basic hardware and software components -that make up a network. We'll begin at the network's edge and look at -the end systems and network applications running in the network. We'll -then explore the core of a computer network, examining the links and the -switches that transport data, as well as the access networks and -physical media that connect end systems to the network core. We'll learn -that the Internet is a network of networks, and we'll learn how these -networks connect with each other. After having completed this overview -of the edge and core of a computer network, we'll take the broader and -more abstract view in the second half of this chapter. We'll examine -delay, loss, and throughput of data in a computer network and provide -simple quantitative models for end-to-end throughput and delay: models -that take into account transmission, propagation, and queuing delays. -We'll then introduce some of the key architectural principles in -computer networking, namely, protocol layering and service models. 
We'll also learn that computer networks are vulnerable to many different
types of attacks; we'll survey some of these attacks and consider how
computer networks can be made more secure. Finally, we'll close this
chapter with a brief history of computer networking.

1.1 What Is the Internet? In this book, we'll use the public Internet, a
specific computer network, as our principal vehicle for discussing
computer networks and their protocols. But what is the Internet? There
are a couple of ways to answer this question. First, we can describe the
nuts and bolts of the Internet, that is, the basic hardware and software
components that make up the Internet. Second, we can describe the
Internet in terms of a networking infrastructure that provides services
to distributed applications. Let's begin with the nuts-and-bolts
description, using Figure 1.1 to illustrate our discussion.

1.1.1 A Nuts-and-Bolts Description The Internet is a computer network
that interconnects billions of computing devices throughout the world.
Not too long ago, these computing devices were primarily traditional
desktop PCs, Linux workstations, and so-called servers that store and
transmit information such as Web pages and e-mail messages.
Increasingly, however, nontraditional Internet "things" such as laptops,
smartphones, tablets, TVs, gaming consoles, thermostats, home security
systems, home appliances, watches, eye glasses, cars, traffic control
systems and more are being connected to the Internet. Indeed, the term
computer network is beginning to sound a bit dated, given the many
nontraditional devices that are being hooked up to the Internet. In
Internet jargon, all of these devices are called hosts or end systems.
By some estimates, in 2015 there were about 5 billion devices connected
to the Internet, and the number will reach 25 billion by 2020 \[Gartner
2014\].
It is estimated that in 2015 there were over 3.2 billion
Internet users worldwide, approximately 40% of the world population
\[ITU 2015\].

Figure 1.1 Some pieces of the Internet

End systems are connected together by a network of communication links
and packet switches. We'll see in Section 1.2 that there are many types
of communication links, which are made up of different types of physical
media, including coaxial cable, copper wire, optical fiber, and radio
spectrum. Different links can transmit data at different rates, with the
transmission rate of a link measured in bits/second. When one end system
has data to send to another end system, the sending end system segments
the data and adds header bytes to each segment. The resulting packages
of information, known as packets in the jargon of computer networks, are
then sent through the network to the destination end system, where they
are reassembled into the original data. A packet switch takes a packet
arriving on one of its incoming communication links and forwards that
packet on one of its outgoing communication links. Packet switches come
in many shapes and flavors, but the two most prominent types in today's
Internet are routers and link-layer switches. Both types of switches
forward packets toward their ultimate destinations. Link-layer switches
are typically used in access networks, while routers are typically used
in the network core. The sequence of communication links and packet
switches traversed by a packet from the sending end system to the
receiving end system is known as a route or path through the network.
Cisco predicts annual global IP traffic will pass the zettabyte (10²¹
bytes) threshold by the end of 2016, and will reach 2 zettabytes per
year by 2019 \[Cisco VNI 2015\]. Packet-switched networks (which
transport packets) are in many ways similar to transportation networks
of highways, roads, and intersections (which transport vehicles).
Consider, for example, a factory that needs -to move a large amount of cargo to some destination warehouse located -thousands of kilometers away. At the factory, the cargo is segmented and -loaded into a fleet of trucks. Each of the trucks then independently -travels through the network of highways, roads, and intersections to the -destination warehouse. At the destination warehouse, the cargo is -unloaded and grouped with the rest of the cargo arriving from the same -shipment. Thus, in many ways, packets are analogous to trucks, -communication links are analogous to highways and roads, packet switches -are analogous to intersections, and end systems are analogous to -buildings. Just as a truck takes a path through the transportation -network, a packet takes a path through a computer network. End systems -access the Internet through Internet Service Providers (ISPs), including -residential ISPs such as local cable or telephone companies; corporate -ISPs; university ISPs; ISPs that provide WiFi access in airports, -hotels, coffee shops, and other public places; and cellular data ISPs, -providing mobile access to our smartphones and other devices. Each ISP -is in itself a network of packet switches and communication links. ISPs -provide a variety of types of network access to the end systems, -including residential broadband access such as cable modem or DSL, -high-speed local area network access, and mobile wireless access. ISPs -also provide Internet access to content providers, connecting Web sites -and video servers directly to the Internet. The Internet is all about -connecting end systems to each other, so the ISPs that provide access to -end systems must also be interconnected. These lower-tier ISPs are -interconnected through national and international upper-tier ISPs such -as Level 3 Communications, AT&T, Sprint, and NTT. An upper-tier ISP -consists of high-speed routers interconnected with high-speed -fiber-optic links. 
Each ISP network, whether upper-tier or lower-tier, is managed
independently, runs the IP protocol (see below), and conforms to certain
naming and address conventions. We'll examine ISPs and their
interconnection more closely in Section 1.3. End systems, packet
switches, and other pieces of the Internet run protocols that control
the sending and receiving of information within the Internet. The
Transmission Control Protocol (TCP) and the Internet Protocol (IP) are
two of the most important protocols in the Internet. The IP protocol
specifies the format of the packets that are sent and received among
routers and end systems. The Internet's principal protocols are
collectively known as TCP/IP. We'll begin looking into protocols in this
introductory chapter. But that's just a start---much of this book is
concerned with computer network protocols! Given the importance of
protocols to the Internet, it's important that everyone agree on what
each and every protocol does, so that people can create systems and
products that interoperate. This is where standards come into play.
Internet standards are developed by the Internet Engineering Task Force
(IETF) \[IETF 2016\]. The IETF standards documents are called requests
for comments (RFCs). RFCs started out as general requests for comments
(hence the name) to resolve network and protocol design problems that
faced the precursor to the Internet \[Allman 2011\]. RFCs tend to be
quite technical and detailed. They define protocols such as TCP, IP,
HTTP (for the Web), and SMTP (for e-mail). There are currently more than
7,000 RFCs. Other bodies also specify standards for network components,
most notably for network links. The IEEE 802 LAN/MAN Standards Committee
\[IEEE 802 2016\], for example, specifies the Ethernet and wireless WiFi
standards.

1.1.2 A Services Description Our discussion above has identified many of
the pieces that make up the Internet.
But we can also describe the Internet from an entirely different
angle---namely, as an infrastructure that provides services to
applications. In addition to traditional applications such as e-mail and
Web surfing, Internet applications include mobile smartphone and tablet
applications, including Internet messaging, mapping with real-time
road-traffic information, music streaming from the cloud, movie and
television streaming, online social networks, video conferencing,
multi-person games, and location-based recommendation systems. The
applications are said to be distributed applications, since they involve
multiple end systems that exchange data with each other. Importantly,
Internet applications run on end systems---they do not run in the packet
switches in the network core. Although packet switches facilitate the
exchange of data among end systems, they are not concerned with the
application that is the source or sink of data. Let's explore a little
more what we mean by an infrastructure that provides services to
applications. To this end, suppose you have an exciting new idea for a
distributed Internet application, one that may greatly benefit humanity
or one that may simply make you rich and famous. How might you go about
transforming this idea into an actual Internet application? Because
applications run on end systems, you are going to need to write programs
that run on the end systems. You might, for example, write your programs
in Java, C, or Python. Now, because you are developing a distributed
Internet application, the programs running on the different end systems
will need to send data to each other. And here we get to a central
issue---one that leads to the alternative way of describing the Internet
as a platform for applications. How does one program running on one end
system instruct the Internet to deliver data to another program running
on another end system?
End systems attached to the Internet provide a -socket interface that specifies how a program running on one end system -asks the Internet infrastructure to deliver data to a specific -destination program running on another end system. This Internet socket -interface is a set of rules that the sending program must follow so that -the Internet can deliver the data to the destination program. We'll -discuss the Internet socket interface in detail in Chapter 2. For now, -let's draw upon a simple analogy, one that we will frequently use in -this book. Suppose Alice wants to send a letter to Bob using the postal -service. Alice, of course, can't just write the letter (the data) and -drop the letter out her window. Instead, the postal service requires -that Alice put the letter in an envelope; write Bob's full name, -address, and zip code in the center of the envelope; seal the envelope; -put a stamp in the upper-right-hand corner of the envelope; and finally, -drop the envelope into an official postal service mailbox. Thus, the -postal service has its own "postal service interface," or set of rules, -that Alice must follow to have the postal service deliver her letter to -Bob. In a similar manner, the Internet has a socket interface that the -program sending data must follow to have the Internet deliver the data -to the program that will receive the data. The postal service, of -course, provides more than one service to its customers. It provides -express delivery, reception confirmation, ordinary use, and many more -services. In a similar manner, the Internet provides multiple services -to its applications. When you develop an Internet application, you too -must choose one of the Internet's services for your application. We'll -describe the Internet's services in Chapter 2. 
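The socket interface described above can be glimpsed in a few lines of Python, the language this book uses for socket programming in Chapter 2. The sketch below is our own illustration, not an example from the book: both the sending program and the destination program run on the same host, with the loopback address standing in for the wider Internet.

```python
# A sketch of the socket interface: one program hands the network a
# message addressed to a second program. Both run on one host here.
import socket

# The destination program creates a socket and binds it to an address;
# port 0 asks the operating system to pick any free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
addr = receiver.getsockname()

# The sending program follows the interface's rules: create a socket,
# then name the destination when handing over the data.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"Hello, Bob", addr)

# The destination program receives the delivered data.
data, _ = receiver.recvfrom(2048)
print(data.decode())
sender.close()
receiver.close()
```

Chapter 2 develops this idea into complete client and server programs that run on different end systems.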
We have just given two descriptions of the Internet; one in terms of its
hardware and software components, the other in terms of an
infrastructure for providing services to distributed applications. But
perhaps you are still confused as to what the Internet is. What are
packet switching and TCP/IP? What are routers? What kinds of
communication links are present in the Internet? What is a distributed
application? How can a thermostat or body scale be attached to the
Internet? If you feel a bit overwhelmed by all of this now, don't
worry---the purpose of this book is to introduce you to both the nuts
and bolts of the Internet and the principles that govern how and why it
works. We'll explain these important terms and questions in the
following sections and chapters.

1.1.3 What Is a Protocol?

Now that we've got a bit of a feel for what the Internet is, let's
consider another important buzzword in computer networking: protocol.
What is a protocol? What does a protocol do? A Human Analogy It is
probably easiest to understand the notion of a computer network protocol
by first considering some human analogies, since we humans execute
protocols all of the time. Consider what you do when you want to ask
someone for the time of day. A typical exchange is shown in Figure 1.2.

Figure 1.2 A human protocol and a computer network protocol

Human protocol (or good manners, at least) dictates that one first offer
a greeting (the first "Hi" in Figure 1.2) to initiate communication with
someone else. The typical response to a "Hi" is a returned "Hi" message.
Implicitly, one then takes a cordial "Hi" response as an indication that
one can proceed and ask for the time of day. A different response to the
initial "Hi" (such as "Don't bother me!" or "I don't speak English," or
some unprintable reply) might indicate an unwillingness or inability to
communicate. In this case, the human protocol would be not to ask for
the time of day.
Sometimes one gets no response at all to a question, in which case one
typically gives up asking that person for the time. Note that in our
human protocol, there are specific messages we send, and specific
actions we take in response to the received reply messages or other
events (such as no reply within some given amount of time). Clearly,
transmitted and received messages, and actions taken when these messages
are sent or received or other events occur, play a central role in a
human protocol. If people run different protocols (for example, if one
person has manners but the other does not, or if one understands the
concept of time and the other does not), the protocols do not
interoperate and no useful work can be accomplished. The same is true in
networking---it takes two (or more) communicating entities running the
same protocol in order to accomplish a task. Let's consider a second
human analogy. Suppose you're in a college class (a computer networking
class, for example!). The teacher is droning on about protocols and
you're confused. The teacher stops to ask, "Are there any questions?" (a
message that is transmitted to, and received by, all students who are
not sleeping). You raise your hand (transmitting an implicit message to
the teacher). Your teacher acknowledges you with a smile, saying "Yes
. . ." (a transmitted message encouraging you to ask your
question---teachers love to be asked questions), and you then ask your
question (that is, transmit your message to your teacher). Your teacher
hears your question (receives your question message) and answers
(transmits a reply to you). Once again, we see that the transmission and
receipt of messages, and a set of conventional actions taken when these
messages are sent and received, are at the heart of this
question-and-answer protocol.
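The time-of-day exchange of Figure 1.2 can be restated as explicit message-and-action rules. The short Python sketch below is our own toy illustration, not part of the book: each received message triggers a defined reply, and each reply determines the asker's next action.

```python
# The time-of-day protocol of Figure 1.2 as explicit rules.
# A toy illustration of "messages plus actions," not a real protocol.

def respond(message):
    """One side of the protocol: the reply each received message triggers."""
    if message == "Hi":
        return "Hi"                # a cordial greeting: OK to proceed
    if message == "Got the time?":
        return "2:00"
    return "Don't bother me!"      # any other message ends the exchange

def ask_for_time(send):
    """The other side: transmit messages and act on each reply."""
    if send("Hi") != "Hi":         # an uncordial reply: don't ask
        return None
    return send("Got the time?")

print(ask_for_time(respond))  # prints 2:00
```

If the two sides ran different rules (say, the responder never returned a cordial "Hi"), the exchange would stop at the first message, just as mismatched human protocols accomplish no useful work.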
Network Protocols A network protocol is -similar to a human protocol, except that the entities exchanging -messages and taking actions are hardware or software components of some -device (for example, computer, smartphone, tablet, router, or other -network-capable device). All activity in the Internet that involves two -or more communicating remote entities is governed by a protocol. For -example, hardware-implemented protocols in two physically connected -computers control the flow of bits on the "wire" between the two network -interface cards; congestion-control protocols in end systems control the -rate at which packets are transmitted between sender and receiver; -protocols in routers determine a packet's path from source to -destination. Protocols are running everywhere in the Internet, and -consequently much of this book is about computer network protocols. As -an example of a computer network protocol with which you are probably -familiar, consider what happens when you make a request to a Web server, -that is, when you type the URL of a Web page into your Web browser. The -scenario is illustrated in the right half of Figure 1.2. First, your -computer will send a connection request message to the Web server and -wait for a reply. The Web server will eventually receive your connection -request message and return a connection reply message. Knowing that it -is now OK to request the Web document, your computer then sends the name -of the Web page it wants to fetch from that Web server in a GET message. -Finally, the Web server returns the Web page (file) to your computer. - -Given the human and networking examples above, the exchange of messages -and the actions taken when these messages are sent and received are the -key defining elements of a protocol: A protocol defines the format and -the order of messages exchanged between two or more communicating -entities, as well as the actions taken on the transmission and/or -receipt of a message or other event. 
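The connection request, connection reply, GET message, and returned page just described can be replayed concretely. In the Python sketch below (our illustration, not the book's), a tiny Web server started on the local machine stands in for the remote Web server, so the example is self-contained.

```python
# Replaying the Web-request exchange: connection request/reply, a GET
# message naming the page, and the returned page. A local stand-in
# server plays the role of the remote Web server.
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html>Hello from the Web server</html>"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example's output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Connection request and connection reply: the TCP handshake happens
# inside create_connection().
sock = socket.create_connection(("127.0.0.1", server.server_port))

# The GET message names the page we want to fetch.
sock.sendall(b"GET / HTTP/1.0\r\nHost: localhost\r\n\r\n")

# The server returns the Web page (file); read until the server closes.
reply = b""
while True:
    chunk = sock.recv(4096)
    if not chunk:
        break
    reply += chunk
sock.close()
server.shutdown()

status_line = reply.split(b"\r\n")[0].decode()
print(status_line)  # prints HTTP/1.0 200 OK
```

The first line of the reply is the server's status line; the page itself follows the header lines. Chapter 2 examines the real HTTP message format in detail.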
The Internet, and computer networks -in general, make extensive use of protocols. Different protocols are -used to accomplish different communication tasks. As you read through -this book, you will learn that some protocols are simple and -straightforward, while others are complex and intellectually deep. -Mastering the field of computer networking is equivalent to -understanding the what, why, and how of networking protocols. - -1.2 The Network Edge In the previous section we presented a high-level -overview of the Internet and networking protocols. We are now going to -delve a bit more deeply into the components of a computer network (and -the Internet, in particular). We begin in this section at the edge of a -network and look at the components with which we are most -familiar---namely, the computers, smartphones and other devices that we -use on a daily basis. In the next section we'll move from the network -edge to the network core and examine switching and routing in computer -networks. Recall from the previous section that in computer networking -jargon, the computers and other devices connected to the Internet are -often referred to as end systems. They are referred to as end systems -because they sit at the edge of the Internet, as shown in Figure 1.3. -The Internet's end systems include desktop computers (e.g., desktop PCs, -Macs, and Linux boxes), servers (e.g., Web and e-mail servers), and -mobile devices (e.g., laptops, smartphones, and tablets). Furthermore, -an increasing number of non-traditional "things" are being attached to -the Internet as end systems (see the Case History feature). End systems -are also referred to as hosts because they host (that is, run) -application programs such as a Web browser program, a Web server -program, an e-mail client program, or an e-mail server program. 
Throughout this book we will use the terms hosts and end systems interchangeably; that is, host = end system. Hosts are sometimes further divided into two categories: clients and servers. Informally, clients tend to be desktop and mobile PCs, smartphones, and so on, whereas servers tend to be more powerful machines that store and distribute Web pages, stream video, relay e-mail, and so on. Today, most of the servers from which we receive search results, e-mail, Web pages, and videos reside in large data centers. For example, Google has 50-100 data centers, including about 15 large centers, each with more than 100,000 servers.

Figure 1.3 End-system interaction

CASE HISTORY THE INTERNET OF THINGS Can you imagine a world in which just about everything is wirelessly connected to the Internet? A world in which most people, cars, bicycles, eye glasses, watches, toys, hospital equipment, home sensors, classrooms, video surveillance systems, atmospheric sensors, store-shelf products, and pets are connected? This world of the Internet of Things (IoT) may actually be just around the corner. By some estimates, as of 2015 there are already 5 billion things connected to the Internet, and the number could reach 25 billion by 2020 [Gartner 2014]. These things include our smartphones, which already follow us around in our homes, offices, and cars, reporting our geolocations and usage data to our ISPs and Internet applications. But in addition to our smartphones, a wide variety of non-traditional "things" are already available as products. For example, there are Internet-connected wearables, including watches (from Apple and many others) and eye glasses. Internet-connected glasses can, for example, upload everything we see to the cloud, allowing us to share our visual experiences with people around the world in real time. There are Internet-connected things already available for the smart home, including Internet-connected thermostats that can be controlled remotely from our smartphones, and Internet-connected body scales, enabling us to graphically review the progress of our diets from our smartphones. There are Internet-connected toys, including dolls that recognize and interpret a child's speech and respond appropriately. The IoT offers potentially revolutionary benefits to users. But at the same time there are also huge security and privacy risks. For example, attackers, via the Internet, might be able to hack into IoT devices or into the servers collecting data from IoT devices. An attacker could hijack an Internet-connected doll and talk directly with a child; or an attacker could hack into a database that stores personal health and activity information collected from wearable devices. These security and privacy concerns could undermine the consumer confidence necessary for the technologies to meet their full potential and may result in less widespread adoption [FTC 2015].

1.2.1 Access Networks Having considered the applications and end systems at the "edge of the network," let's next consider the access network---the network that physically connects an end system to the first router (also known as the "edge router") on a path from the end system to any other distant end system. Figure 1.4 shows several types of access networks with thick, shaded lines and the settings (home, enterprise, and wide-area mobile wireless) in which they are used.

Figure 1.4 Access networks

Home Access: DSL, Cable, FTTH, Dial-Up, and Satellite

In developed countries as of 2014, more than 78 percent of households have Internet access, with Korea, Netherlands, Finland, and Sweden leading the way with more than 80 percent of households having Internet access, almost all via a high-speed broadband connection [ITU 2015].
Given this widespread use of home access networks, let's begin our overview of access networks by considering how homes connect to the Internet. Today, the two most prevalent types of broadband residential access are digital subscriber line (DSL) and cable. A residence typically obtains DSL Internet access from the same local telephone company (telco) that provides its wired local phone access. Thus, when DSL is used, a customer's telco is also its ISP. As shown in Figure 1.5, each customer's DSL modem uses the existing telephone line (twisted-pair copper wire, which we'll discuss in Section 1.2.2) to exchange data with a digital subscriber line access multiplexer (DSLAM) located in the telco's local central office (CO). The home's DSL modem takes digital data and translates it to high-frequency tones for transmission over telephone wires to the CO; the analog signals from many such houses are translated back into digital format at the DSLAM. The residential telephone line carries both data and traditional telephone signals simultaneously, which are encoded at different frequencies:

- A high-speed downstream channel, in the 50 kHz to 1 MHz band
- A medium-speed upstream channel, in the 4 kHz to 50 kHz band
- An ordinary two-way telephone channel, in the 0 to 4 kHz band

This approach makes the single DSL link appear as if there were three separate links, so that a telephone call and an Internet connection can share the DSL link at the same time.

Figure 1.5 DSL Internet access

(We'll describe this technique of frequency-division multiplexing in Section 1.3.1.) On the customer side, a splitter separates the data and telephone signals arriving to the home and forwards the data signal to the DSL modem. On the telco side, in the CO, the DSLAM separates the data and phone signals and sends the data into the Internet. Hundreds or even thousands of households connect to a single DSLAM [Dischinger 2007].
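To make the frequency-division sharing of the DSL line concrete, the short Python sketch below classifies a frequency on the line into one of the three channels, using the band edges just given. The function name and structure are purely illustrative, not part of any DSL standard.

```python
# Illustrative sketch: map a frequency on a residential DSL line to the
# channel that occupies that band, using the band edges given in the text.

def dsl_channel(freq_hz):
    """Return which DSL channel a given frequency (in Hz) falls into."""
    if freq_hz < 4_000:              # 0 to 4 kHz: ordinary two-way telephone
        return "telephone"
    elif freq_hz < 50_000:           # 4 kHz to 50 kHz: medium-speed upstream
        return "upstream"
    elif freq_hz <= 1_000_000:       # 50 kHz to 1 MHz: high-speed downstream
        return "downstream"
    return "unused"

# A voice call and an Internet connection occupy disjoint bands,
# which is why they can share the same line simultaneously.
print(dsl_channel(1_000))      # telephone
print(dsl_channel(20_000))     # upstream
print(dsl_channel(500_000))    # downstream
```

Because the three bands are disjoint, the splitter at the home and the DSLAM at the CO can separate the signals without them interfering with one another.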
The DSL standards define multiple transmission rates, including 12 Mbps downstream and 1.8 Mbps upstream [ITU 1999], and 55 Mbps downstream and 15 Mbps upstream [ITU 2006]. Because the downstream and upstream rates are different, the access is said to be asymmetric. The actual downstream and upstream transmission rates achieved may be less than the rates noted above, as the DSL provider may purposefully limit a residential rate when tiered service (different rates, available at different prices) is offered. The maximum rate is also limited by the distance between the home and the CO, the gauge of the twisted-pair line, and the degree of electrical interference. Engineers have expressly designed DSL for short distances between the home and the CO; generally, if the residence is not located within 5 to 10 miles of the CO, the residence must resort to an alternative form of Internet access. While DSL makes use of the telco's existing local telephone infrastructure, cable Internet access makes use of the cable television company's existing cable television infrastructure. A residence obtains cable Internet access from the same company that provides its cable television. As illustrated in Figure 1.6, fiber optics connect the cable head end to neighborhood-level junctions, from which traditional coaxial cable is then used to reach individual houses and apartments. Each neighborhood junction typically supports 500 to 5,000 homes. Because both fiber and coaxial cable are employed in this system, it is often referred to as hybrid fiber coax (HFC).

Figure 1.6 A hybrid fiber-coaxial access network

Cable Internet access requires special modems, called cable modems. As with a DSL modem, the cable modem is typically an external device and connects to the home PC through an Ethernet port. (We will discuss Ethernet in great detail in Chapter 6.)
At the cable head end, the cable modem termination system (CMTS) serves a similar function as the DSL network's DSLAM---turning the analog signal sent from the cable modems in many downstream homes back into digital format. Cable modems divide the HFC network into two channels, a downstream and an upstream channel. As with DSL, access is typically asymmetric, with the downstream channel typically allocated a higher transmission rate than the upstream channel. The DOCSIS 2.0 standard defines downstream rates up to 42.8 Mbps and upstream rates of up to 30.7 Mbps. As in the case of DSL networks, the maximum achievable rate may not be realized due to lower contracted data rates or media impairments. One important characteristic of cable Internet access is that it is a shared broadcast medium. In particular, every packet sent by the head end travels downstream on every link to every home and every packet sent by a home travels on the upstream channel to the head end. For this reason, if several users are simultaneously downloading a video file on the downstream channel, the actual rate at which each user receives its video file will be significantly lower than the aggregate cable downstream rate. On the other hand, if there are only a few active users and they are all Web surfing, then each of the users may actually receive Web pages at the full cable downstream rate, because the users will rarely request a Web page at exactly the same time. Because the upstream channel is also shared, a distributed multiple access protocol is needed to coordinate transmissions and avoid collisions. (We'll discuss this collision issue in some detail in Chapter 6.) Although DSL and cable networks currently represent more than 85 percent of residential broadband access in the United States, an up-and-coming technology that provides even higher speeds is fiber to the home (FTTH) [FTTH Council 2016].
As the name suggests, the FTTH concept is simple---provide an optical fiber path from the CO directly to the home. Many countries today---including the UAE, South Korea, Hong Kong, Japan, Singapore, Taiwan, Lithuania, and Sweden---now have household penetration rates exceeding 30% [FTTH Council 2016]. There are several competing technologies for optical distribution from the CO to the homes. The simplest optical distribution network is called direct fiber, with one fiber leaving the CO for each home. More commonly, each fiber leaving the central office is actually shared by many homes; it is not until the fiber gets relatively close to the homes that it is split into individual customer-specific fibers. There are two competing optical-distribution network architectures that perform this splitting: active optical networks (AONs) and passive optical networks (PONs). AON is essentially switched Ethernet, which is discussed in Chapter 6. Here, we briefly discuss PON, which is used in Verizon's FIOS service. Figure 1.7 shows FTTH using the PON distribution architecture. Each home has an optical network terminator (ONT), which is connected by dedicated optical fiber to a neighborhood splitter. The splitter combines a number of homes (typically less than 100) onto a single, shared optical fiber, which connects to an optical line terminator (OLT) in the telco's CO.

Figure 1.7 FTTH Internet access

The OLT, providing conversion between optical and electrical signals, connects to the Internet via a telco router. In the home, users connect a home router (typically a wireless router) to the ONT and access the Internet via this home router. In the PON architecture, all packets sent from OLT to the splitter are replicated at the splitter (similar to a cable head end). FTTH can potentially provide Internet access rates in the gigabits per second range.
However, most FTTH ISPs provide different rate offerings, with the higher rates naturally costing more money. The average downstream speed of US FTTH customers was approximately 20 Mbps in 2011 (compared with 13 Mbps for cable access networks and less than 5 Mbps for DSL) [FTTH Council 2011b]. Two other access network technologies are also used to provide Internet access to the home. In locations where DSL, cable, and FTTH are not available (e.g., in some rural settings), a satellite link can be used to connect a residence to the Internet at speeds of more than 1 Mbps; StarBand and HughesNet are two such satellite access providers. Dial-up access over traditional phone lines is based on the same model as DSL---a home modem connects over a phone line to a modem in the ISP. Compared with DSL and other broadband access networks, dial-up access is excruciatingly slow at 56 kbps. Access in the Enterprise (and the Home): Ethernet and WiFi On corporate and university campuses, and increasingly in home settings, a local area network (LAN) is used to connect an end system to the edge router. Although there are many types of LAN technologies, Ethernet is by far the most prevalent access technology in corporate, university, and home networks. As shown in Figure 1.8, Ethernet users use twisted-pair copper wire to connect to an Ethernet switch, a technology discussed in detail in Chapter 6. The Ethernet switch, or a network of such interconnected switches, is then in turn connected into the larger Internet.

Figure 1.8 Ethernet Internet access

With Ethernet access, users typically have 100 Mbps or 1 Gbps access to the Ethernet switch, whereas servers may have 1 Gbps or even 10 Gbps access. Increasingly, however, people are accessing the Internet wirelessly from laptops, smartphones, tablets, and other "things" (see earlier sidebar on "Internet of Things").
In a wireless LAN setting, wireless users transmit/receive packets to/from an access point that is connected into the enterprise's network (most likely using wired Ethernet), which in turn is connected to the wired Internet. A wireless LAN user must typically be within a few tens of meters of the access point. Wireless LAN access based on IEEE 802.11 technology, more colloquially known as WiFi, is now just about everywhere---universities, business offices, cafes, airports, homes, and even in airplanes. In many cities, one can stand on a street corner and be within range of ten or twenty base stations (for a browseable global map of 802.11 base stations that have been discovered and logged on a Web site by people who take great enjoyment in doing such things, see [wigle.net 2016]). As discussed in detail in Chapter 7, 802.11 today provides a shared transmission rate of up to more than 100 Mbps. Even though Ethernet and WiFi access networks were initially deployed in enterprise (corporate, university) settings, they have recently become relatively common components of home networks. Many homes combine broadband residential access (that is, cable modems or DSL) with these inexpensive wireless LAN technologies to create powerful home networks [Edwards 2011]. Figure 1.9 shows a typical home network. This home network consists of a roaming laptop as well as a wired PC; a base station (the wireless access point), which communicates with the wireless PC and other wireless devices in the home; a cable modem, providing broadband access to the Internet; and a router, which interconnects the base station and the stationary PC with the cable modem. This network allows household members to have broadband access to the Internet with one member roaming from the kitchen to the backyard to the bedrooms.
Figure 1.9 A typical home network

Wide-Area Wireless Access: 3G and LTE Increasingly, devices such as iPhones and Android devices are being used to message, share photos in social networks, watch movies, and stream music while on the run. These devices employ the same wireless infrastructure used for cellular telephony to send/receive packets through a base station that is operated by the cellular network provider. Unlike WiFi, a user need only be within a few tens of kilometers (as opposed to a few tens of meters) of the base station. Telecommunications companies have made enormous investments in so-called third-generation (3G) wireless, which provides packet-switched wide-area wireless Internet access at speeds in excess of 1 Mbps. But even higher-speed wide-area access technologies---a fourth-generation (4G) of wide-area wireless networks---are already being deployed. LTE (for "Long-Term Evolution"---a candidate for Bad Acronym of the Year Award) has its roots in 3G technology, and can achieve rates in excess of 10 Mbps. LTE downstream rates of many tens of Mbps have been reported in commercial deployments. We'll cover the basic principles of wireless networks and mobility, as well as WiFi, 3G, and LTE technologies (and more!) in Chapter 7.

1.2.2 Physical Media In the previous subsection, we gave an overview of some of the most important network access technologies in the Internet. As we described these technologies, we also indicated the physical media used. For example, we said that HFC uses a combination of fiber cable and coaxial cable. We said that DSL and Ethernet use copper wire. And we said that mobile access networks use the radio spectrum. In this subsection we provide a brief overview of these and other transmission media that are commonly used in the Internet.

In order to define what is meant by a physical medium, let us reflect on the brief life of a bit.
Consider a bit traveling from one end system, through a series of links and routers, to another end system. This poor bit gets kicked around and transmitted many, many times! The source end system first transmits the bit, and shortly thereafter the first router in the series receives the bit; the first router then transmits the bit, and shortly thereafter the second router receives the bit; and so on. Thus our bit, when traveling from source to destination, passes through a series of transmitter-receiver pairs. For each transmitter-receiver pair, the bit is sent by propagating electromagnetic waves or optical pulses across a physical medium. The physical medium can take many shapes and forms and does not have to be of the same type for each transmitter-receiver pair along the path. Examples of physical media include twisted-pair copper wire, coaxial cable, multimode fiber-optic cable, terrestrial radio spectrum, and satellite radio spectrum. Physical media fall into two categories: guided media and unguided media. With guided media, the waves are guided along a solid medium, such as a fiber-optic cable, a twisted-pair copper wire, or a coaxial cable. With unguided media, the waves propagate in the atmosphere and in outer space, such as in a wireless LAN or a digital satellite channel. But before we get into the characteristics of the various media types, let us say a few words about their costs. The actual cost of the physical link (copper wire, fiber-optic cable, and so on) is often relatively minor compared with other networking costs. In particular, the labor cost associated with the installation of the physical link can be orders of magnitude higher than the cost of the material. For this reason, many builders install twisted pair, optical fiber, and coaxial cable in every room in a building.
Even if only one medium is initially used, there is a good chance that another medium could be used in the near future, and so money is saved by not having to lay additional wires in the future. Twisted-Pair Copper Wire The least expensive and most commonly used guided transmission medium is twisted-pair copper wire. For over a hundred years it has been used by telephone networks. In fact, more than 99 percent of the wired connections from the telephone handset to the local telephone switch use twisted-pair copper wire. Most of us have seen twisted pair in our homes (or those of our parents or grandparents!) and work environments. Twisted pair consists of two insulated copper wires, each about 1 mm thick, arranged in a regular spiral pattern. The wires are twisted together to reduce the electrical interference from similar pairs close by. Typically, a number of pairs are bundled together in a cable by wrapping the pairs in a protective shield. A wire pair constitutes a single communication link. Unshielded twisted pair (UTP) is commonly used for computer networks within a building, that is, for LANs. Data rates for LANs using twisted pair today range from 10 Mbps to 10 Gbps. The data rates that can be achieved depend on the thickness of the wire and the distance between transmitter and receiver. When fiber-optic technology emerged in the 1980s, many people disparaged twisted pair because of its relatively low bit rates. Some people even felt that fiber-optic technology would completely replace twisted pair. But twisted pair did not give up so easily. Modern twisted-pair technology, such as category 6a cable, can achieve data rates of 10 Gbps for distances up to a hundred meters. In the end, twisted pair has emerged as the dominant solution for high-speed LAN networking. As discussed earlier, twisted pair is also commonly used for residential Internet access.
We saw that dial-up modem technology enables access at rates of up to 56 kbps over twisted pair. We also saw that DSL (digital subscriber line) technology has enabled residential users to access the Internet at tens of Mbps over twisted pair (when users live close to the ISP's central office). Coaxial Cable Like twisted pair, coaxial cable consists of two copper conductors, but the two conductors are concentric rather than parallel. With this construction and special insulation and shielding, coaxial cable can achieve high data transmission rates. Coaxial cable is quite common in cable television systems. As we saw earlier, cable television systems have recently been coupled with cable modems to provide residential users with Internet access at rates of tens of Mbps. In cable television and cable Internet access, the transmitter shifts the digital signal to a specific frequency band, and the resulting analog signal is sent from the transmitter to one or more receivers. Coaxial cable can be used as a guided shared medium. Specifically, a number of end systems can be connected directly to the cable, with each of the end systems receiving whatever is sent by the other end systems. Fiber Optics An optical fiber is a thin, flexible medium that conducts pulses of light, with each pulse representing a bit. A single optical fiber can support tremendous bit rates, up to tens or even hundreds of gigabits per second. Optical fibers are immune to electromagnetic interference, have very low signal attenuation up to 100 kilometers, and are very hard to tap. These characteristics have made fiber optics the preferred long-haul guided transmission media, particularly for overseas links. Many of the long-distance telephone networks in the United States and elsewhere now use fiber optics exclusively. Fiber optics is also prevalent in the backbone of the Internet.
However, the high cost of optical devices---such as transmitters, receivers, and switches---has hindered their deployment for short-haul transport, such as in a LAN or into the home in a residential access network. The Optical Carrier (OC) standard link speeds range from 51.8 Mbps to 39.8 Gbps; these specifications are often referred to as OC-n, where the link speed equals n × 51.8 Mbps. Standards in use today include OC-1, OC-3, OC-12, OC-24, OC-48, OC-96, OC-192, OC-768. [Mukherjee 2006, Ramaswami 2010] provide coverage of various aspects of optical networking. Terrestrial Radio Channels Radio channels carry signals in the electromagnetic spectrum. They are an attractive medium because they require no physical wire to be installed, can penetrate walls, provide connectivity to a mobile user, and can potentially carry a signal for long distances. The characteristics of a radio channel depend significantly on the propagation environment and the distance over which a signal is to be carried. Environmental considerations determine path loss and shadow fading (which decrease the signal strength as the signal travels over a distance and around/through obstructing objects), multipath fading (due to signal reflection off of interfering objects), and interference (due to other transmissions and electromagnetic signals). Terrestrial radio channels can be broadly classified into three groups: those that operate over very short distances (e.g., within one or two meters); those that operate in local areas, typically spanning from ten to a few hundred meters; and those that operate in the wide area, spanning tens of kilometers. Personal devices such as wireless headsets, keyboards, and medical devices operate over short distances; the wireless LAN technologies described in Section 1.2.1 use local-area radio channels; the cellular access technologies use wide-area radio channels. We'll discuss radio channels in detail in Chapter 7.
Satellite Radio Channels A communication satellite links two or more Earth-based microwave transmitter/receivers, known as ground stations. The satellite receives transmissions on one frequency band, regenerates the signal using a repeater (discussed below), and transmits the signal on another frequency. Two types of satellites are used in communications: geostationary satellites and low-earth orbiting (LEO) satellites [Wiki Satellite 2016]. Geostationary satellites permanently remain above the same spot on Earth. This stationary presence is achieved by placing the satellite in orbit at 36,000 kilometers above Earth's surface. This huge distance from ground station through satellite back to ground station introduces a substantial signal propagation delay of 280 milliseconds. Nevertheless, satellite links, which can operate at speeds of hundreds of Mbps, are often used in areas without access to DSL or cable-based Internet access. LEO satellites are placed much closer to Earth and do not remain permanently above one spot on Earth. They rotate around Earth (just as the Moon does) and may communicate with each other, as well as with ground stations. To provide continuous coverage to an area, many satellites need to be placed in orbit. There are currently many low-altitude communication systems in development. LEO satellite technology may be used for Internet access sometime in the future.

1.3 The Network Core Having examined the Internet's edge, let us now delve more deeply inside the network core---the mesh of packet switches and links that interconnects the Internet's end systems. Figure 1.10 highlights the network core with thick, shaded lines.

Figure 1.10 The network core

1.3.1 Packet Switching In a network application, end systems exchange messages with each other. Messages can contain anything the application designer wants.
Messages may perform a control function (for example, the "Hi" messages in our handshaking example in Figure 1.2) or can contain data, such as an e-mail message, a JPEG image, or an MP3 audio file. To send a message from a source end system to a destination end system, the source breaks long messages into smaller chunks of data known as packets. Between source and destination, each packet travels through communication links and packet switches (for which there are two predominant types, routers and link-layer switches). Packets are transmitted over each communication link at a rate equal to the full transmission rate of the link. So, if a source end system or a packet switch is sending a packet of L bits over a link with transmission rate R bits/sec, then the time to transmit the packet is L/R seconds. Store-and-Forward Transmission Most packet switches use store-and-forward transmission at the inputs to the links. Store-and-forward transmission means that the packet switch must receive the entire packet before it can begin to transmit the first bit of the packet onto the outbound link. To explore store-and-forward transmission in more detail, consider a simple network consisting of two end systems connected by a single router, as shown in Figure 1.11. A router will typically have many incident links, since its job is to switch an incoming packet onto an outgoing link; in this simple example, the router has the rather simple task of transferring a packet from one (input) link to the only other attached link. In this example, the source has three packets, each consisting of L bits, to send to the destination. At the snapshot of time shown in Figure 1.11, the source has transmitted some of packet 1, and the front of packet 1 has already arrived at the router.
Because the router employs store-and-forward transmission, at this instant of time, the router cannot transmit the bits it has received; instead it must first buffer (i.e., "store") the packet's bits. Only after the router has received all of the packet's bits can it begin to transmit (i.e., "forward") the packet onto the outbound link. To gain some insight into store-and-forward transmission, let's now calculate the amount of time that elapses from when the source begins to send the packet until the destination has received the entire packet. (Here we will ignore propagation delay---the time it takes for the bits to travel across the wire at near the speed of light---which will be discussed in Section 1.4.) The source begins to transmit at time 0; at time L/R seconds, the source has transmitted the entire packet, and the entire packet has been received and stored at the router (since there is no propagation delay). At time L/R seconds, since the router has just received the entire packet, it can begin to transmit the packet onto the outbound link towards the destination; at time 2L/R, the router has transmitted the entire packet, and the entire packet has been received by the destination. Thus, the total delay is 2L/R.

Figure 1.11 Store-and-forward packet switching

If the switch instead forwarded bits as soon as they arrive (without first receiving the entire packet), then the total delay would be L/R since bits are not held up at the router. But, as we will discuss in Section 1.4, routers need to receive, store, and process the entire packet before forwarding. Now let's calculate the amount of time that elapses from when the source begins to send the first packet until the destination has received all three packets. As before, at time L/R, the router begins to forward the first packet. But also at time L/R the source will begin to send the second packet, since it has just finished sending the entire first packet.
Thus, at time 2L/R, the destination has received the first packet and the router has received the second packet. Similarly, at time 3L/R, the destination has received the first two packets and the router has received the third packet. Finally, at time 4L/R the destination has received all three packets! Let's now consider the general case of sending one packet from source to destination over a path consisting of N links each of rate R (thus, there are N-1 routers between source and destination). Applying the same logic as above, we see that the end-to-end delay is:

d_end-to-end = N L/R    (1.1)

You may now want to try to determine what the delay would be for P packets sent over a series of N links. Queuing Delays and Packet Loss Each packet switch has multiple links attached to it. For each attached link, the packet switch has an output buffer (also called an output queue), which stores packets that the router is about to send into that link. The output buffers play a key role in packet switching. If an arriving packet needs to be transmitted onto a link but finds the link busy with the transmission of another packet, the arriving packet must wait in the output buffer. Thus, in addition to the store-and-forward delays, packets suffer output buffer queuing delays. These delays are variable and depend on the level of congestion in the network. Since the amount of buffer space is finite, an arriving packet may find that the buffer is completely full with other packets waiting for transmission. In this case, packet loss will occur---either the arriving packet or one of the already-queued packets will be dropped.

Figure 1.12 Packet switching

Figure 1.12 illustrates a simple packet-switched network. As in Figure 1.11, packets are represented by three-dimensional slabs. The width of a slab represents the number of bits in the packet. In this figure, all packets have the same width and hence the same length.
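The store-and-forward delays worked out above are easy to check numerically. The short Python sketch below computes the single-packet delay of Equation 1.1, plus the delay for P back-to-back packets: because transmissions pipeline, the last packet arrives (N + P - 1) transmission times after the first bit is sent, which generalizes the 4L/R result for three packets over two links. The packet size and link rate are illustrative values, not taken from the text.

```python
# Numerical check of the store-and-forward delays derived above
# (propagation delay is ignored, as in the text).

def single_packet_delay(N, L, R):
    """End-to-end delay, in seconds, for one L-bit packet sent over
    N links of rate R bits/sec (Equation 1.1): N * L/R."""
    return N * L / R

def multi_packet_delay(N, L, R, P):
    """Delay for P back-to-back L-bit packets over N links of rate R.
    Transmissions overlap in time, so the last packet finishes
    (N + P - 1) transmission times after the first bit is sent."""
    return (N + P - 1) * L / R

L = 1_000 * 8       # a 1,000-byte packet, in bits (illustrative)
R = 1_000_000       # 1 Mbps links (illustrative)

# Two links (one router): 2 L/R for one packet, 4 L/R for three packets,
# matching the Figure 1.11 walkthrough.
print(single_packet_delay(2, L, R))     # 2 * L/R
print(multi_packet_delay(2, L, R, 3))   # 4 * L/R
```

Note that with P = 1 the second formula reduces to Equation 1.1, as it should.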
Suppose Hosts A and B are sending packets to Host E. Hosts A and B first send their packets along 100 Mbps Ethernet links to the first router. The router then directs these packets to the 15 Mbps link. If, during a short interval of time, the arrival rate of packets to the router (when converted to bits per second) exceeds 15 Mbps, congestion will occur at the router as packets queue in the link's output buffer before being transmitted onto the link. For example, if Hosts A and B each send a burst of five packets back-to-back at the same time, then most of these packets will spend some time waiting in the queue. The situation is, in fact, entirely analogous to many everyday situations---for example, when we wait in line for a bank teller or wait in front of a tollbooth. We'll examine this queuing delay in more detail in Section 1.4.

Forwarding Tables and Routing Protocols

Earlier, we said that a router takes a packet arriving on one of its attached communication links and forwards that packet onto another one of its attached communication links. But how does the router determine which link it should forward the packet onto? Packet forwarding is actually done in different ways in different types of computer networks. Here, we briefly describe how it is done in the Internet.

In the Internet, every end system has an address called an IP address. When a source end system wants to send a packet to a destination end system, the source includes the destination's IP address in the packet's header. As with postal addresses, this address has a hierarchical structure. When a packet arrives at a router in the network, the router examines a portion of the packet's destination address and forwards the packet to an adjacent router. More specifically, each router has a forwarding table that maps destination addresses (or portions of the destination addresses) to that router's outbound links.
When a packet arrives at a router, the router examines the address and searches its forwarding table, using this destination address, to find the appropriate outbound link. The router then directs the packet to this outbound link.

The end-to-end routing process is analogous to a car driver who does not use maps but instead prefers to ask for directions. For example, suppose Joe is driving from Philadelphia to 156 Lakeside Drive in Orlando, Florida. Joe first drives to his neighborhood gas station and asks how to get to 156 Lakeside Drive in Orlando, Florida. The gas station attendant extracts the Florida portion of the address and tells Joe that he needs to get onto the interstate highway I-95 South, which has an entrance just next to the gas station. He also tells Joe that once he enters Florida, he should ask someone else there. Joe then takes I-95 South until he gets to Jacksonville, Florida, at which point he asks another gas station attendant for directions. The attendant extracts the Orlando portion of the address and tells Joe that he should continue on I-95 to Daytona Beach and then ask someone else. In Daytona Beach, another gas station attendant also extracts the Orlando portion of the address and tells Joe that he should take I-4 directly to Orlando. Joe takes I-4 and gets off at the Orlando exit. Joe goes to another gas station attendant, and this time the attendant extracts the Lakeside Drive portion of the address and tells Joe the road he must follow to get to Lakeside Drive. Once Joe reaches Lakeside Drive, he asks a kid on a bicycle how to get to his destination. The kid extracts the 156 portion of the address and points to the house. Joe finally reaches his ultimate destination. In the above analogy, the gas station attendants and kids on bicycles are analogous to routers. We just learned that a router uses a packet's destination address to index a forwarding table and determine the appropriate outbound link.
But this statement raises yet another question: How do forwarding tables get set? Are they configured by hand in each and every router, or does the Internet use a more automated procedure? This issue will be studied in depth in Chapter 5. But to whet your appetite here, we'll note now that the Internet has a number of special routing protocols that are used to automatically set the forwarding tables. A routing protocol may, for example, determine the shortest path from each router to each destination and use the shortest path results to configure the forwarding tables in the routers. How would you actually like to see the end-to-end route that packets take in the Internet? We now invite you to get your hands dirty by interacting with the Traceroute program. Simply visit the site www.traceroute.org, choose a source in a particular country, and trace the route from that source to your computer. (For a discussion of Traceroute, see Section 1.4.)

1.3.2 Circuit Switching

There are two fundamental approaches to moving data through a network of links and switches: circuit switching and packet switching. Having covered packet-switched networks in the previous subsection, we now turn our attention to circuit-switched networks. In circuit-switched networks, the resources needed along a path (buffers, link transmission rate) to provide for communication between the end systems are reserved for the duration of the communication session between the end systems. In packet-switched networks, these resources are not reserved; a session's messages use the resources on demand and, as a consequence, may have to wait (that is, queue) for access to a communication link. As a simple analogy, consider two restaurants, one that requires reservations and another that neither requires reservations nor accepts them. For the restaurant that requires reservations, we have to go through the hassle of calling before we leave home.
But when we arrive at the restaurant we can, in principle, immediately be seated and order our meal. For the restaurant that does not require reservations, we don't need to bother to reserve a table. But when we arrive at the restaurant, we may have to wait for a table before we can be seated.

Traditional telephone networks are examples of circuit-switched networks. Consider what happens when one person wants to send information (voice or facsimile) to another over a telephone network. Before the sender can send the information, the network must establish a connection between the sender and the receiver. This is a bona fide connection for which the switches on the path between the sender and receiver maintain connection state for that connection. In the jargon of telephony, this connection is called a circuit. When the network establishes the circuit, it also reserves a constant transmission rate in the network's links (representing a fraction of each link's transmission capacity) for the duration of the connection. Since a given transmission rate has been reserved for this sender-to-receiver connection, the sender can transfer the data to the receiver at the guaranteed constant rate. Figure 1.13 illustrates a circuit-switched network. In this network, the four circuit switches are interconnected by four links. Each of these links has four circuits, so that each link can support four simultaneous connections. The hosts (for example, PCs and workstations) are each directly connected to one of the switches. When two hosts want to communicate, the network establishes a dedicated end-to-end connection between the two hosts. Thus, in order for Host A to communicate with Host B, the network must first reserve one circuit on each of two links. In this example, the dedicated end-to-end connection uses the second circuit in the first link and the fourth circuit in the second link.
Because each link has four circuits, for each link used by the end-to-end connection, the connection gets one fourth of the link's total transmission capacity for the duration of the connection. Thus, for example, if each link between adjacent switches has a transmission rate of 1 Mbps, then each end-to-end circuit-switched connection gets 250 kbps of dedicated transmission rate.

Figure 1.13 A simple circuit-switched network consisting of four switches and four links

In contrast, consider what happens when one host wants to send a packet to another host over a packet-switched network, such as the Internet. As with circuit switching, the packet is transmitted over a series of communication links. But different from circuit switching, the packet is sent into the network without reserving any link resources whatsoever. If one of the links is congested because other packets need to be transmitted over the link at the same time, then the packet will have to wait in a buffer at the sending side of the transmission link and suffer a delay. The Internet makes its best effort to deliver packets in a timely manner, but it does not make any guarantees.

Multiplexing in Circuit-Switched Networks

A circuit in a link is implemented with either frequency-division multiplexing (FDM) or time-division multiplexing (TDM). With FDM, the frequency spectrum of a link is divided up among the connections established across the link. Specifically, the link dedicates a frequency band to each connection for the duration of the connection. In telephone networks, this frequency band typically has a width of 4 kHz (that is, 4,000 hertz or 4,000 cycles per second). The width of the band is called, not surprisingly, the bandwidth. FM radio stations also use FDM to share the frequency spectrum between 88 MHz and 108 MHz, with each station being allocated a specific frequency band.
For a TDM link, time is divided into frames of fixed duration, and each frame is divided into a fixed number of time slots. When the network establishes a connection across a link, the network dedicates one time slot in every frame to this connection. These slots are dedicated for the sole use of that connection, with one time slot available for use (in every frame) to transmit the connection's data.

Figure 1.14 With FDM, each circuit continuously gets a fraction of the bandwidth. With TDM, each circuit gets all of the bandwidth periodically during brief intervals of time (that is, during slots)

Figure 1.14 illustrates FDM and TDM for a specific network link supporting up to four circuits. For FDM, the frequency domain is segmented into four bands, each of bandwidth 4 kHz. For TDM, the time domain is segmented into frames, with four time slots in each frame; each circuit is assigned the same dedicated slot in the revolving TDM frames. For TDM, the transmission rate of a circuit is equal to the frame rate multiplied by the number of bits in a slot. For example, if the link transmits 8,000 frames per second and each slot consists of 8 bits, then the transmission rate of each circuit is 64 kbps. Proponents of packet switching have always argued that circuit switching is wasteful because the dedicated circuits are idle during silent periods. For example, when one person in a telephone call stops talking, the idle network resources (frequency bands or time slots in the links along the connection's route) cannot be used by other ongoing connections. As another example of how these resources can be underutilized, consider a radiologist who uses a circuit-switched network to remotely access a series of x-rays. The radiologist sets up a connection, requests an image, contemplates the image, and then requests a new image.
Network resources are allocated to the connection but are not used (i.e., are wasted) during the radiologist's contemplation periods. Proponents of packet switching also enjoy pointing out that establishing end-to-end circuits and reserving end-to-end transmission capacity is complicated and requires complex signaling software to coordinate the operation of the switches along the end-to-end path.

Before we finish our discussion of circuit switching, let's work through a numerical example that should shed further light on the topic. Let us consider how long it takes to send a file of 640,000 bits from Host A to Host B over a circuit-switched network. Suppose that all links in the network use TDM with 24 slots and have a bit rate of 1.536 Mbps. Also suppose that it takes 500 msec to establish an end-to-end circuit before Host A can begin to transmit the file. How long does it take to send the file? Each circuit has a transmission rate of (1.536 Mbps)/24 = 64 kbps, so it takes (640,000 bits)/(64 kbps) = 10 seconds to transmit the file. To this 10 seconds we add the circuit establishment time, giving 10.5 seconds to send the file. Note that the transmission time is independent of the number of links: The transmission time would be 10 seconds if the end-to-end circuit passed through one link or a hundred links. (The actual end-to-end delay also includes a propagation delay; see Section 1.4.)

Packet Switching Versus Circuit Switching

Having described circuit switching and packet switching, let us compare the two. Critics of packet switching have often argued that packet switching is not suitable for real-time services (for example, telephone calls and video conference calls) because of its variable and unpredictable end-to-end delays (due primarily to variable and unpredictable queuing delays).
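The circuit-switched file-transfer calculation above takes only a few lines to script. A sketch using the numbers from the text (the function name is ours):

```python
def circuit_transfer_time(file_bits, link_rate_bps, num_slots, setup_sec):
    """Time to send a file over a circuit-switched TDM path: circuit
    establishment plus transmission at the per-circuit rate, where the
    per-circuit rate is the link rate divided by the slots per frame."""
    circuit_rate = link_rate_bps / num_slots
    return setup_sec + file_bits / circuit_rate

# 640,000-bit file, 1.536 Mbps links, 24-slot TDM, 500 msec circuit setup:
print(circuit_transfer_time(640_000, 1.536e6, 24, 0.5))  # 10.5 (seconds)
```

Note that, as the text observes, the number of links never enters the formula: once the circuit is established, the file flows end to end at the circuit rate.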
Proponents of packet switching argue that (1) it offers better sharing of transmission capacity than circuit switching and (2) it is simpler, more efficient, and less costly to implement than circuit switching. An interesting discussion of packet switching versus circuit switching is [Molinero-Fernandez 2002]. Generally speaking, people who do not like to hassle with restaurant reservations prefer packet switching to circuit switching. Why is packet switching more efficient? Let's look at a simple example. Suppose users share a 1 Mbps link. Also suppose that each user alternates between periods of activity, when a user generates data at a constant rate of 100 kbps, and periods of inactivity, when a user generates no data. Suppose further that a user is active only 10 percent of the time (and is idly drinking coffee during the remaining 90 percent of the time). With circuit switching, 100 kbps must be reserved for each user at all times. For example, with circuit-switched TDM, if a one-second frame is divided into 10 time slots of 100 ms each, then each user would be allocated one time slot per frame. Thus, the circuit-switched link can support only 10 (= 1 Mbps/100 kbps) simultaneous users. With packet switching, the probability that a specific user is active is 0.1 (that is, 10 percent). If there are 35 users, the probability that there are 11 or more simultaneously active users is approximately 0.0004. (Homework Problem P8 outlines how this probability is obtained.) When there are 10 or fewer simultaneously active users (which happens with probability 0.9996), the aggregate arrival rate of data is less than or equal to 1 Mbps, the output rate of the link. Thus, when there are 10 or fewer active users, users' packets flow through the link essentially without delay, as is the case with circuit switching.
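The 0.0004 figure quoted above is a binomial tail probability: each of the 35 users is independently active with probability 0.1. A quick check (the helper name is ours):

```python
from math import comb

def prob_active_exceeds(n_users, p_active, threshold):
    """P(more than `threshold` of n_users independent users are active),
    each active with probability p_active -- a binomial tail sum."""
    return sum(comb(n_users, k) * p_active**k * (1 - p_active)**(n_users - k)
               for k in range(threshold + 1, n_users + 1))

print(prob_active_exceeds(35, 0.1, 10))  # ≈ 0.0004
```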
When there are more than 10 simultaneously active users, then the aggregate arrival rate of packets exceeds the output capacity of the link, and the output queue will begin to grow. (It continues to grow until the aggregate input rate falls back below 1 Mbps, at which point the queue will begin to diminish in length.) Because the probability of having more than 10 simultaneously active users is minuscule in this example, packet switching provides essentially the same performance as circuit switching, but does so while allowing for more than three times the number of users.

Let's now consider a second simple example. Suppose there are 10 users and that one user suddenly generates one thousand 1,000-bit packets, while other users remain quiescent and do not generate packets. Under TDM circuit switching with 10 slots per frame and each slot consisting of 1,000 bits, the active user can only use its one time slot per frame to transmit data, while the remaining nine time slots in each frame remain idle. It will be 10 seconds before all of the active user's one million bits of data have been transmitted. In the case of packet switching, the active user can continuously send its packets at the full link rate of 1 Mbps, since there are no other users generating packets that need to be multiplexed with the active user's packets. In this case, all of the active user's data will be transmitted within 1 second. The above examples illustrate two ways in which the performance of packet switching can be superior to that of circuit switching. They also highlight the crucial difference between the two forms of sharing a link's transmission rate among multiple data streams. Circuit switching pre-allocates use of the transmission link regardless of demand, with allocated but unneeded link time going unused. Packet switching on the other hand allocates link use on demand.
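The second example reduces to two divisions: under 10-slot TDM the bursting user effectively sees one tenth of the link, while under packet switching (with no competing traffic) it sees the full link rate:

```python
link_rate = 1e6               # 1 Mbps shared link
burst_bits = 1_000 * 1_000    # one thousand 1,000-bit packets

# TDM circuit switching: the active user owns only 1 of the 10 slots per frame.
tdm_seconds = burst_bits / (link_rate / 10)
# Packet switching: no other users are sending, so the full link rate is used.
pkt_seconds = burst_bits / link_rate

print(tdm_seconds, pkt_seconds)  # 10.0 1.0
```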
Link transmission capacity will be shared on a packet-by-packet basis only among those users who have packets that need to be transmitted over the link. Although packet switching and circuit switching are both prevalent in today's telecommunication networks, the trend has certainly been in the direction of packet switching. Even many of today's circuit-switched telephone networks are slowly migrating toward packet switching. In particular, telephone networks often use packet switching for the expensive overseas portion of a telephone call.

1.3.3 A Network of Networks

We saw earlier that end systems (PCs, smartphones, Web servers, mail servers, and so on) connect into the Internet via an access ISP. The access ISP can provide either wired or wireless connectivity, using an array of access technologies including DSL, cable, FTTH, Wi-Fi, and cellular. Note that the access ISP does not have to be a telco or a cable company; instead it can be, for example, a university (providing Internet access to students, staff, and faculty), or a company (providing access for its employees). But connecting end users and content providers into an access ISP is only a small piece of solving the puzzle of connecting the billions of end systems that make up the Internet. To complete this puzzle, the access ISPs themselves must be interconnected. This is done by creating a network of networks---understanding this phrase is the key to understanding the Internet. Over the years, the network of networks that forms the Internet has evolved into a very complex structure. Much of this evolution is driven by economics and national policy, rather than by performance considerations. In order to understand today's Internet network structure, let's incrementally build a series of network structures, with each new structure being a better approximation of the complex Internet that we have today.
Recall that the overarching goal is to interconnect the access ISPs so that all end systems can send packets to each other. One naive approach would be to have each access ISP directly connect with every other access ISP. Such a mesh design is, of course, much too costly for the access ISPs, as it would require each access ISP to have a separate communication link to each of the hundreds of thousands of other access ISPs all over the world.

Our first network structure, Network Structure 1, interconnects all of the access ISPs with a single global transit ISP. Our (imaginary) global transit ISP is a network of routers and communication links that not only spans the globe, but also has at least one router near each of the hundreds of thousands of access ISPs. Of course, it would be very costly for the global ISP to build such an extensive network. To be profitable, it would naturally charge each of the access ISPs for connectivity, with the pricing reflecting (but not necessarily directly proportional to) the amount of traffic an access ISP exchanges with the global ISP. Since the access ISP pays the global transit ISP, the access ISP is said to be a customer and the global transit ISP is said to be a provider. Now if some company builds and operates a global transit ISP that is profitable, then it is natural for other companies to build their own global transit ISPs and compete with the original global transit ISP. This leads to Network Structure 2, which consists of the hundreds of thousands of access ISPs and multiple global transit ISPs. The access ISPs certainly prefer Network Structure 2 over Network Structure 1 since they can now choose among the competing global transit providers as a function of their pricing and services.
Note, however, that the global transit ISPs themselves must interconnect: Otherwise access ISPs connected to one of the global transit providers would not be able to communicate with access ISPs connected to the other global transit providers. Network Structure 2, just described, is a two-tier hierarchy with global transit providers residing at the top tier and access ISPs at the bottom tier. This assumes that global transit ISPs are not only capable of getting close to each and every access ISP, but also find it economically desirable to do so. In reality, although some ISPs do have impressive global coverage and do directly connect with many access ISPs, no ISP has presence in each and every city in the world. Instead, in any given region, there may be a regional ISP to which the access ISPs in the region connect. Each regional ISP then connects to tier-1 ISPs. Tier-1 ISPs are similar to our (imaginary) global transit ISP; but tier-1 ISPs, which actually do exist, do not have a presence in every city in the world. There are approximately a dozen tier-1 ISPs, including Level 3 Communications, AT&T, Sprint, and NTT. Interestingly, no group officially sanctions tier-1 status; as the saying goes---if you have to ask if you're a member of a group, you're probably not. Returning to this network of networks, not only are there multiple competing tier-1 ISPs, there may be multiple competing regional ISPs in a region. In such a hierarchy, each access ISP pays the regional ISP to which it connects, and each regional ISP pays the tier-1 ISP to which it connects. (An access ISP can also connect directly to a tier-1 ISP, in which case it pays the tier-1 ISP.) Thus, there is a customer-provider relationship at each level of the hierarchy. Note that the tier-1 ISPs do not pay anyone as they are at the top of the hierarchy.
To further complicate matters, in some regions, there may be a larger regional ISP (possibly spanning an entire country) to which the smaller regional ISPs in that region connect; the larger regional ISP then connects to a tier-1 ISP. For example, in China, there are access ISPs in each city, which connect to provincial ISPs, which in turn connect to national ISPs, which finally connect to tier-1 ISPs [Tian 2012]. We refer to this multi-tier hierarchy, which is still only a crude approximation of today's Internet, as Network Structure 3.

To build a network that more closely resembles today's Internet, we must add points of presence (PoPs), multi-homing, peering, and Internet exchange points (IXPs) to the hierarchical Network Structure 3. PoPs exist in all levels of the hierarchy, except for the bottom (access ISP) level. A PoP is simply a group of one or more routers (at the same location) in the provider's network where customer ISPs can connect into the provider ISP. For a customer network to connect to a provider's PoP, it can lease a high-speed link from a third-party telecommunications provider to directly connect one of its routers to a router at the PoP. Any ISP (except for tier-1 ISPs) may choose to multi-home, that is, to connect to two or more provider ISPs. So, for example, an access ISP may multi-home with two regional ISPs, or it may multi-home with two regional ISPs and also with a tier-1 ISP. Similarly, a regional ISP may multi-home with multiple tier-1 ISPs. When an ISP multi-homes, it can continue to send and receive packets into the Internet even if one of its providers has a failure. As we just learned, customer ISPs pay their provider ISPs to obtain global Internet interconnectivity. The amount that a customer ISP pays a provider ISP reflects the amount of traffic it exchanges with the provider.
To reduce these costs, a pair of nearby ISPs at the same level of the hierarchy can peer, that is, they can directly connect their networks together so that all the traffic between them passes over the direct connection rather than through upstream intermediaries. When two ISPs peer, it is typically settlement-free, that is, neither ISP pays the other. As noted earlier, tier-1 ISPs also peer with one another, settlement-free. For a readable discussion of peering and customer-provider relationships, see [Van der Berg 2008]. Along these same lines, a third-party company can create an Internet Exchange Point (IXP), which is a meeting point where multiple ISPs can peer together. An IXP is typically in a stand-alone building with its own switches [Ager 2012]. There are over 400 IXPs in the Internet today [IXP List 2016]. We refer to this ecosystem---consisting of access ISPs, regional ISPs, tier-1 ISPs, PoPs, multi-homing, peering, and IXPs---as Network Structure 4.

We now finally arrive at Network Structure 5, which describes today's Internet. Network Structure 5, illustrated in Figure 1.15, builds on top of Network Structure 4 by adding content-provider networks. Google is currently one of the leading examples of such a content-provider network. As of this writing, it is estimated that Google has 50--100 data centers distributed across North America, Europe, Asia, South America, and Australia. Some of these data centers house over one hundred thousand servers, while other data centers are smaller, housing only hundreds of servers. The Google data centers are all interconnected via Google's private TCP/IP network, which spans the entire globe but is nevertheless separate from the public Internet. Importantly, the Google private network only carries traffic to/from Google servers.
As shown in Figure 1.15, the Google private network attempts to "bypass" the upper tiers of the Internet by peering (settlement free) with lower-tier ISPs, either by directly connecting with them or by connecting with them at IXPs [Labovitz 2010]. However, because many access ISPs can still only be reached by transiting through tier-1 networks, the Google network also connects to tier-1 ISPs, and pays those ISPs for the traffic it exchanges with them. By creating its own network, a content provider not only reduces its payments to upper-tier ISPs, but also has greater control of how its services are ultimately delivered to end users. Google's network infrastructure is described in greater detail in Section 2.6.

In summary, today's Internet---a network of networks---is complex, consisting of a dozen or so tier-1 ISPs and hundreds of thousands of lower-tier ISPs. The ISPs are diverse in their coverage, with some spanning multiple continents and oceans, and others limited to narrow geographic regions. The lower-tier ISPs connect to the higher-tier ISPs, and the higher-tier ISPs interconnect with one another. Users and content providers are customers of lower-tier ISPs, and lower-tier ISPs are customers of higher-tier ISPs. In recent years, major content providers have also created their own networks and connect directly into lower-tier ISPs where possible.

Figure 1.15 Interconnection of ISPs

1.4 Delay, Loss, and Throughput in Packet-Switched Networks

Back in Section 1.1 we said that the Internet can be viewed as an infrastructure that provides services to distributed applications running on end systems. Ideally, we would like Internet services to be able to move as much data as we want between any two end systems, instantaneously, without any loss of data. Alas, this is a lofty goal, one that is unachievable in reality.
Instead, computer networks necessarily constrain throughput (the amount of data per second that can be transferred) between end systems, introduce delays between end systems, and can actually lose packets. On one hand, it is unfortunate that the physical laws of reality introduce delay and loss as well as constrain throughput. On the other hand, because computer networks have these problems, there are many fascinating issues surrounding how to deal with the problems---more than enough issues to fill a course on computer networking and to motivate thousands of PhD theses! In this section, we'll begin to examine and quantify delay, loss, and throughput in computer networks.

1.4.1 Overview of Delay in Packet-Switched Networks

Recall that a packet starts in a host (the source), passes through a series of routers, and ends its journey in another host (the destination). As a packet travels from one node (host or router) to the subsequent node (host or router) along this path, the packet suffers from several types of delays at each node along the path. The most important of these delays are the nodal processing delay, queuing delay, transmission delay, and propagation delay; together, these delays accumulate to give a total nodal delay. The performance of many Internet applications---such as search, Web browsing, e-mail, maps, instant messaging, and voice-over-IP---is greatly affected by network delays. In order to acquire a deep understanding of packet switching and computer networks, we must understand the nature and importance of these delays.

Types of Delay

Let's explore these delays in the context of Figure 1.16. As part of its end-to-end route between source and destination, a packet is sent from the upstream node through router A to router B. Our goal is to characterize the nodal delay at router A. Note that router A has an outbound link leading to router B. This link is preceded by a queue (also known as a buffer).
When the packet arrives at router A from the upstream node, router A examines the packet's header to determine the appropriate outbound link for the packet and then directs the packet to this link. In this example, the outbound link for the packet is the one that leads to router B. A packet can be transmitted on a link only if there is no other packet currently being transmitted on the link and if there are no other packets preceding it in the queue; if the link is currently busy or if there are other packets already queued for the link, the newly arriving packet will then join the queue.

Figure 1.16 The nodal delay at router A

Processing Delay

The time required to examine the packet's header and determine where to direct the packet is part of the processing delay. The processing delay can also include other factors, such as the time needed to check for bit-level errors in the packet that occurred in transmitting the packet's bits from the upstream node to router A. Processing delays in high-speed routers are typically on the order of microseconds or less. After this nodal processing, the router directs the packet to the queue that precedes the link to router B. (In Chapter 4 we'll study the details of how a router operates.)

Queuing Delay

At the queue, the packet experiences a queuing delay as it waits to be transmitted onto the link. The length of the queuing delay of a specific packet will depend on the number of earlier-arriving packets that are queued and waiting for transmission onto the link. If the queue is empty and no other packet is currently being transmitted, then our packet's queuing delay will be zero. On the other hand, if the traffic is heavy and many other packets are also waiting to be transmitted, the queuing delay will be long. We will see shortly that the number of packets that an arriving packet might expect to find is a function of the intensity and nature of the traffic arriving at the queue.
Queuing delays can be on the order of microseconds to milliseconds in practice.

Transmission Delay

Assuming that packets are transmitted in a first-come-first-served manner, as is common in packet-switched networks, our packet can be transmitted only after all the packets that have arrived before it have been transmitted. Denote the length of the packet by L bits, and denote the transmission rate of the link from router A to router B by R bits/sec. For example, for a 10 Mbps Ethernet link, the rate is R = 10 Mbps; for a 100 Mbps Ethernet link, the rate is R = 100 Mbps. The transmission delay is L/R. This is the amount of time required to push (that is, transmit) all of the packet's bits into the link. Transmission delays are typically on the order of microseconds to milliseconds in practice.

Propagation Delay

Once a bit is pushed into the link, it needs to propagate to router B. The time required to propagate from the beginning of the link to router B is the propagation delay. The bit propagates at the propagation speed of the link. The propagation speed depends on the physical medium of the link (that is, fiber optics, twisted-pair copper wire, and so on) and is in the range of 2⋅10^8 meters/sec to 3⋅10^8 meters/sec, which is equal to, or a little less than, the speed of light. The propagation delay is the distance between two routers divided by the propagation speed. That is, the propagation delay is d/s, where d is the distance between router A and router B and s is the propagation speed of the link. Once the last bit of the packet propagates to node B, it and all the preceding bits of the packet are stored in router B. The whole process then continues with router B now performing the forwarding. In wide-area networks, propagation delays are on the order of milliseconds.
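Both formulas are simple enough to check numerically. The values below (packet size, link rate, link length, propagation speed) are hypothetical, chosen only to illustrate the typical magnitudes just described:

```python
# Hypothetical example values, not from the text: a 1,500-byte packet
# sent over a 10 Mbps link that spans 1,000 km of fiber.
L = 1500 * 8        # packet length in bits
R = 10e6            # transmission rate R in bits/sec
d = 1_000_000       # link length d in meters
s = 2e8             # propagation speed s in meters/sec

d_trans = L / R     # transmission delay L/R: time to push all bits onto the link
d_prop = d / s      # propagation delay d/s: time for one bit to cross the link

print(d_trans * 1e3, "ms")   # 1.2 ms
print(d_prop * 1e3, "ms")    # 5.0 ms
```

With these particular numbers the propagation delay dominates; on a short campus link the balance would reverse.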
Comparing Transmission and Propagation Delay

Exploring propagation delay and transmission delay

Newcomers to the field of computer networking sometimes have difficulty understanding the difference between transmission delay and propagation delay. The difference is subtle but important. The transmission delay is the amount of time required for the router to push out the packet; it is a function of the packet's length and the transmission rate of the link, but has nothing to do with the distance between the two routers. The propagation delay, on the other hand, is the time it takes a bit to propagate from one router to the next; it is a function of the distance between the two routers, but has nothing to do with the packet's length or the transmission rate of the link.

An analogy might clarify the notions of transmission and propagation delay. Consider a highway that has a tollbooth every 100 kilometers, as shown in Figure 1.17. You can think of the highway segments between tollbooths as links and the tollbooths as routers. Suppose that cars travel (that is, propagate) on the highway at a rate of 100 km/hour (that is, when a car leaves a tollbooth, it instantaneously accelerates to 100 km/hour and maintains that speed between tollbooths). Suppose next that 10 cars, traveling together as a caravan, follow each other in a fixed order. You can think of each car as a bit and the caravan as a packet.

Figure 1.17 Caravan analogy

Also suppose that each tollbooth services (that is, transmits) a car at a rate of one car per 12 seconds, and that it is late at night so that the caravan's cars are the only cars on the highway. Finally, suppose that whenever the first car of the caravan arrives at a tollbooth, it waits at the entrance until the other nine cars have arrived and lined up behind it. (Thus the entire caravan must be stored at the tollbooth before it can begin to be forwarded.)
The time required for the tollbooth to push the entire caravan onto the highway is (10 cars)/(5 cars/minute) = 2 minutes. This time is analogous to the transmission delay in a router. The time required for a car to travel from the exit of one tollbooth to the next tollbooth is 100 km/(100 km/hour) = 1 hour. This time is analogous to propagation delay. Therefore, the time from when the caravan is stored in front of a tollbooth until the caravan is stored in front of the next tollbooth is the sum of transmission delay and propagation delay---in this example, 62 minutes.

Let's explore this analogy a bit more. What would happen if the tollbooth service time for a caravan were greater than the time for a car to travel between tollbooths? For example, suppose now that the cars travel at the rate of 1,000 km/hour and the tollbooth services cars at the rate of one car per minute. Then the traveling delay between two tollbooths is 6 minutes and the time to serve a caravan is 10 minutes. In this case, the first few cars in the caravan will arrive at the second tollbooth before the last cars in the caravan leave the first tollbooth. This situation also arises in packet-switched networks---the first bits in a packet can arrive at a router while many of the remaining bits in the packet are still waiting to be transmitted by the preceding router.

If a picture speaks a thousand words, then an animation must speak a million words. The Web site for this textbook provides an interactive Java applet that nicely illustrates and contrasts transmission delay and propagation delay. The reader is highly encouraged to visit that applet. [Smith 2009] also provides a very readable discussion of propagation, queueing, and transmission delays.
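The caravan arithmetic above can also be checked mechanically; the short sketch below uses only the numbers given in the analogy:

```python
# All numbers come from the caravan analogy above.
cars = 10                 # one car per bit in a 10-bit "packet"
booth_rate = 5            # tollbooth service rate: 5 cars per minute
distance_km = 100         # distance between tollbooths
speed_kmh = 100           # car (bit) propagation speed

transmission_min = cars / booth_rate              # 2 minutes to push the caravan out
propagation_min = distance_km / speed_kmh * 60    # 60 minutes to reach the next booth
total_min = transmission_min + propagation_min    # 62 minutes booth-to-booth

# Second scenario: cars at 1,000 km/hour, booth serves 1 car per minute.
travel2_min = distance_km / 1000 * 60             # 6 minutes between booths
service2_min = cars / 1                           # 10 minutes to serve the caravan
# The lead cars reach the next booth while the caravan is still leaving:
assert travel2_min < service2_min
```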
If we let dproc, dqueue, dtrans, and dprop denote the processing, queuing, transmission, and propagation delays, then the total nodal delay is given by

dnodal = dproc + dqueue + dtrans + dprop

The contribution of these delay components can vary significantly. For example, dprop can be negligible (for example, a couple of microseconds) for a link connecting two routers on the same university campus; however, dprop is hundreds of milliseconds for two routers interconnected by a geostationary satellite link, and can be the dominant term in dnodal. Similarly, dtrans can range from negligible to significant. Its contribution is typically negligible for transmission rates of 10 Mbps and higher (for example, for LANs); however, it can be hundreds of milliseconds for large Internet packets sent over low-speed dial-up modem links. The processing delay, dproc, is often negligible; however, it strongly influences a router's maximum throughput, which is the maximum rate at which a router can forward packets.

1.4.2 Queuing Delay and Packet Loss

The most complicated and interesting component of nodal delay is the queuing delay, dqueue. In fact, queuing delay is so important and interesting in computer networking that thousands of papers and numerous books have been written about it [Bertsekas 1991; Daigle 1991; Kleinrock 1975; Kleinrock 1976; Ross 1995]. We give only a high-level, intuitive discussion of queuing delay here; the more curious reader may want to browse through some of the books (or even eventually write a PhD thesis on the subject!). Unlike the other three delays (namely, dproc, dtrans, and dprop), the queuing delay can vary from packet to packet. For example, if 10 packets arrive at an empty queue at the same time, the first packet transmitted will suffer no queuing delay, while the last packet transmitted will suffer a relatively large queuing delay (while it waits for the other nine packets to be transmitted).
Therefore, when characterizing queuing delay, one typically uses statistical measures, such as average queuing delay, variance of queuing delay, and the probability that the queuing delay exceeds some specified value.

When is the queuing delay large and when is it insignificant? The answer to this question depends on the rate at which traffic arrives at the queue, the transmission rate of the link, and the nature of the arriving traffic, that is, whether the traffic arrives periodically or arrives in bursts. To gain some insight here, let a denote the average rate at which packets arrive at the queue (a is in units of packets/sec). Recall that R is the transmission rate; that is, it is the rate (in bits/sec) at which bits are pushed out of the queue. Also suppose, for simplicity, that all packets consist of L bits. Then the average rate at which bits arrive at the queue is La bits/sec. Finally, assume that the queue is very big, so that it can hold essentially an infinite number of bits. The ratio La/R, called the traffic intensity, often plays an important role in estimating the extent of the queuing delay. If La/R > 1, then the average rate at which bits arrive at the queue exceeds the rate at which the bits can be transmitted from the queue. In this unfortunate situation, the queue will tend to increase without bound and the queuing delay will approach infinity! Therefore, one of the golden rules in traffic engineering is: Design your system so that the traffic intensity is no greater than 1.

Now consider the case La/R ≤ 1. Here, the nature of the arriving traffic impacts the queuing delay. For example, if packets arrive periodically---that is, one packet arrives every L/R seconds---then every packet will arrive at an empty queue and there will be no queuing delay. On the other hand, if packets arrive in bursts but periodically, there can be a significant average queuing delay.
For example, suppose N packets arrive simultaneously every (L/R)N seconds. Then the first packet transmitted has no queuing delay; the second packet transmitted has a queuing delay of L/R seconds; and more generally, the nth packet transmitted has a queuing delay of (n−1)L/R seconds. We leave it as an exercise for you to calculate the average queuing delay in this example.

The two examples of periodic arrivals described above are a bit academic. Typically, the arrival process to a queue is random; that is, the arrivals do not follow any pattern and the packets are spaced apart by random amounts of time. In this more realistic case, the quantity La/R is not usually sufficient to fully characterize the queuing delay statistics. Nonetheless, it is useful in gaining an intuitive understanding of the extent of the queuing delay. In particular, if the traffic intensity is close to zero, then packet arrivals are few and far between and it is unlikely that an arriving packet will find another packet in the queue. Hence, the average queuing delay will be close to zero. On the other hand, when the traffic intensity is close to 1, there will be intervals of time when the arrival rate exceeds the transmission capacity (due to variations in packet arrival rate), and a queue will form during these periods of time; when the arrival rate is less than the transmission capacity, the length of the queue will shrink. Nonetheless, as the traffic intensity approaches 1, the average queue length gets larger and larger. The qualitative dependence of average queuing delay on the traffic intensity is shown in Figure 1.18. One important aspect of Figure 1.18 is the fact that as the traffic intensity approaches 1, the average queuing delay increases rapidly. A small percentage increase in the intensity will result in a much larger percentage-wise increase in delay. Perhaps you have experienced this phenomenon on the highway.
If you regularly drive on a road that is typically congested, the fact that the road is typically congested means that its traffic intensity is close to 1. If some event causes an even slightly larger-than-usual amount of traffic, the delays you experience can be huge.

Figure 1.18 Dependence of average queuing delay on traffic intensity

To really get a good feel for what queuing delays are about, you are encouraged once again to visit the textbook Web site, which provides an interactive Java applet for a queue. If you set the packet arrival rate high enough so that the traffic intensity exceeds 1, you will see the queue slowly build up over time.

Packet Loss

In our discussions above, we have assumed that the queue is capable of holding an infinite number of packets. In reality a queue preceding a link has finite capacity, although the queuing capacity greatly depends on the router design and cost. Because the queue capacity is finite, packet delays do not really approach infinity as the traffic intensity approaches 1. Instead, a packet can arrive to find a full queue. With no place to store such a packet, a router will drop that packet; that is, the packet will be lost. This overflow at a queue can again be seen in the Java applet for a queue when the traffic intensity is greater than 1. From an end-system viewpoint, a packet loss will look like a packet having been transmitted into the network core but never emerging from the network at the destination. The fraction of lost packets increases as the traffic intensity increases. Therefore, performance at a node is often measured not only in terms of delay, but also in terms of the probability of packet loss. As we'll discuss in the subsequent chapters, a lost packet may be retransmitted on an end-to-end basis in order to ensure that all data are eventually transferred from source to destination.
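For readers without access to the applet, the same behavior can be reproduced with a few lines of code. The sketch below (packet size, link rate, and arrival rates are all hypothetical) simulates a first-come-first-served queue fed by periodic arrivals: with traffic intensity La/R = 0.8, every packet finds the link free, while with La/R = 1.25 the queuing delay of successive packets grows without bound:

```python
def queuing_delays(L, R, a, n):
    """Queuing delay of each of n packets of L bits, arriving every
    1/a seconds at a first-come-first-served link of rate R bits/sec."""
    trans = L / R               # transmission delay per packet
    link_free_at = 0.0          # time at which the link next frees up
    delays = []
    for i in range(n):
        arrival = i / a
        start = max(arrival, link_free_at)   # wait behind earlier packets
        delays.append(start - arrival)       # queuing delay of packet i
        link_free_at = start + trans
    return delays

# La/R = 1000 * 800 / 1e6 = 0.8: every periodic arrival finds the link idle.
light = queuing_delays(L=1000, R=1e6, a=800, n=50)

# La/R = 1000 * 1250 / 1e6 = 1.25: arrivals outpace the link, delays keep growing.
heavy = queuing_delays(L=1000, R=1e6, a=1250, n=50)
print(max(light), heavy[-1])   # 0.0, and roughly 0.0098 seconds and still climbing
```

Each additional packet in the overloaded case waits 0.2 ms longer than its predecessor, which is exactly the unbounded growth described above.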
1.4.3 End-to-End Delay

Our discussion up to this point has focused on the nodal delay, that is, the delay at a single router. Let's now consider the total delay from source to destination. To get a handle on this concept, suppose there are N−1 routers between the source host and the destination host. Let's also suppose for the moment that the network is uncongested (so that queuing delays are negligible), the processing delay at each router and at the source host is dproc, the transmission rate out of each router and out of the source host is R bits/sec, and the propagation on each link is dprop. The nodal delays accumulate and give an end-to-end delay,

dend-end = N(dproc + dtrans + dprop)    (1.2)

where, once again, dtrans = L/R, where L is the packet size. Note that Equation 1.2 is a generalization of Equation 1.1, which did not take into account processing and propagation delays. We leave it to you to generalize Equation 1.2 to the case of heterogeneous delays at the nodes and to the presence of an average queuing delay at each node.

Traceroute

Using Traceroute to discover network paths and measure network delay

To get a hands-on feel for end-to-end delay in a computer network, we can make use of the Traceroute program. Traceroute is a simple program that can run in any Internet host. When the user specifies a destination hostname, the program in the source host sends multiple, special packets toward that destination. As these packets work their way toward the destination, they pass through a series of routers. When a router receives one of these special packets, it sends back to the source a short message that contains the name and address of the router. More specifically, suppose there are N−1 routers between the source and the destination. Then the source will send N special packets into the network, with each packet addressed to the ultimate destination.
These N special packets are marked 1 through N, with the first packet marked 1 and the last packet marked N. When the nth router receives the nth packet marked n, the router does not forward the packet toward its destination, but instead sends a message back to the source. When the destination host receives the Nth packet, it too returns a message back to the source. The source records the time that elapses between when it sends a packet and when it receives the corresponding return message; it also records the name and address of the router (or the destination host) that returns the message. In this manner, the source can reconstruct the route taken by packets flowing from source to destination, and the source can determine the round-trip delays to all the intervening routers. Traceroute actually repeats the experiment just described three times, so the source actually sends 3N packets to the destination. RFC 1393 describes Traceroute in detail.

Here is an example of the output of the Traceroute program, where the route was being traced from the source host gaia.cs.umass.edu (at the University of Massachusetts) to the host cis.poly.edu (at Polytechnic University in Brooklyn). The output has six columns: the first column is the n value described above, that is, the number of the router along the route; the second column is the name of the router; the third column is the address of the router (of the form xxx.xxx.xxx.xxx); the last three columns are the round-trip delays for three experiments. If the source receives fewer than three messages from any given router (due to packet loss in the network), Traceroute places an asterisk just after the router number and reports fewer than three round-trip times for that router.
    1  cs-gw (128.119.240.254) 1.009 ms 0.899 ms 0.993 ms
    2  128.119.3.154 (128.119.3.154) 0.931 ms 0.441 ms 0.651 ms
    3  border4-rt-gi-1-3.gw.umass.edu (128.119.2.194) 1.032 ms 0.484 ms 0.451 ms
    4  acr1-ge-2-1-0.Boston.cw.net (208.172.51.129) 10.006 ms 8.150 ms 8.460 ms
    5  agr4-loopback.NewYork.cw.net (206.24.194.104) 12.272 ms 14.344 ms 13.267 ms
    6  acr2-loopback.NewYork.cw.net (206.24.194.62) 13.225 ms 12.292 ms 12.148 ms
    7  pos10-2.core2.NewYork1.Level3.net (209.244.160.133) 12.218 ms 11.823 ms 11.793 ms
    8  gige9-1-52.hsipaccess1.NewYork1.Level3.net (64.159.17.39) 13.081 ms 11.556 ms 13.297 ms
    9  p0-0.polyu.bbnplanet.net (4.25.109.122) 12.716 ms 13.052 ms 12.786 ms
    10 cis.poly.edu (128.238.32.126) 14.080 ms 13.035 ms 12.802 ms

In the trace above there are nine routers between the source and the destination. Most of these routers have a name, and all of them have addresses. For example, the name of Router 3 is border4-rt-gi-1-3.gw.umass.edu and its address is 128.119.2.194. Looking at the data provided for this same router, we see that in the first of the three trials the round-trip delay between the source and the router was 1.03 msec. The round-trip delays for the subsequent two trials were 0.48 and 0.45 msec. These round-trip delays include all of the delays just discussed, including transmission delays, propagation delays, router processing delays, and queuing delays. Because the queuing delay is varying with time, the round-trip delay of packet n sent to router n can sometimes be longer than the round-trip delay of packet n+1 sent to router n+1. Indeed, we observe this phenomenon in the above example: the delays to Router 6 are larger than the delays to Router 7!

Want to try out Traceroute for yourself? We highly recommend that you visit http://www.traceroute.org, which provides a Web interface to an extensive list of sources for route tracing.
You choose a source and supply the hostname for any destination. The Traceroute program then does all the work. There are a number of free software programs that provide a graphical interface to Traceroute; one of our favorites is PingPlotter [PingPlotter 2016].

End System, Application, and Other Delays

In addition to processing, transmission, and propagation delays, there can be additional significant delays in the end systems. For example, an end system wanting to transmit a packet into a shared medium (e.g., as in a WiFi or cable modem scenario) may purposefully delay its transmission as part of its protocol for sharing the medium with other end systems; we'll consider such protocols in detail in Chapter 6. Another important delay is media packetization delay, which is present in Voice-over-IP (VoIP) applications. In VoIP, the sending side must first fill a packet with encoded digitized speech before passing the packet to the Internet. This time to fill a packet---called the packetization delay---can be significant and can impact the user-perceived quality of a VoIP call. This issue will be further explored in a homework problem at the end of this chapter.

1.4.4 Throughput in Computer Networks

In addition to delay and packet loss, another critical performance measure in computer networks is end-to-end throughput. To define throughput, consider transferring a large file from Host A to Host B across a computer network. This transfer might be, for example, a large video clip from one peer to another in a P2P file sharing system. The instantaneous throughput at any instant of time is the rate (in bits/sec) at which Host B is receiving the file. (Many applications, including many P2P file sharing systems, display the instantaneous throughput during downloads in the user interface---perhaps you have observed this before!)
If the file consists of F bits and the transfer takes T seconds for Host B to receive all F bits, then the average throughput of the file transfer is F/T bits/sec. For some applications, such as Internet telephony, it is desirable to have a low delay and an instantaneous throughput consistently above some threshold (for example, over 24 kbps for some Internet telephony applications and over 256 kbps for some real-time video applications). For other applications, including those involving file transfers, delay is not critical, but it is desirable to have the highest possible throughput.

To gain further insight into the important concept of throughput, let's consider a few examples. Figure 1.19(a) shows two end systems, a server and a client, connected by two communication links and a router. Consider the throughput for a file transfer from the server to the client. Let Rs denote the rate of the link between the server and the router; and Rc denote the rate of the link between the router and the client. Suppose that the only bits being sent in the entire network are those from the server to the client. We now ask, in this ideal scenario, what is the server-to-client throughput? To answer this question, we may think of bits as fluid and communication links as pipes. Clearly, the server cannot pump bits through its link at a rate faster than Rs bps; and the router cannot forward bits at a rate faster than Rc bps. If Rs < Rc, then the bits pumped by the server will "flow" right through the router and arrive at the client at a rate of Rs bps, giving a throughput of Rs bps. If, on the other hand, Rc < Rs, then the router will not be able to forward bits as quickly as it receives them. In this case, bits will only leave the router at rate Rc, giving an end-to-end throughput of Rc.
(Note also that if bits continue to arrive at the router at rate Rs, and continue to leave the router at Rc, the backlog of bits at the router waiting for transmission to the client will grow and grow---a most undesirable situation!) Thus, for this simple two-link network, the throughput is min{Rc, Rs}, that is, it is the transmission rate of the bottleneck link. Having determined the throughput, we can now approximate the time it takes to transfer a large file of F bits from server to client as F/min{Rs, Rc}. For a specific example, suppose you are downloading an MP3 file of F = 32 million bits, the server has a transmission rate of Rs = 2 Mbps, and you have an access link of Rc = 1 Mbps. The time needed to transfer the file is then 32 seconds. Of course, these expressions for throughput and transfer time are only approximations, as they do not account for store-and-forward and processing delays as well as protocol issues.

Figure 1.19 Throughput for a file transfer from server to client

Figure 1.19(b) now shows a network with N links between the server and the client, with the transmission rates of the N links being R1, R2, ..., RN. Applying the same analysis as for the two-link network, we find that the throughput for a file transfer from server to client is min{R1, R2, ..., RN}, which is once again the transmission rate of the bottleneck link along the path between server and client.

Now consider another example motivated by today's Internet. Figure 1.20(a) shows two end systems, a server and a client, connected to a computer network. Consider the throughput for a file transfer from the server to the client. The server is connected to the network with an access link of rate Rs and the client is connected to the network with an access link of rate Rc. Now suppose that all the links in the core of the communication network have very high transmission rates, much higher than Rs and Rc.
Indeed, today, the core of the Internet is over-provisioned with high speed links that experience little congestion. Also suppose that the only bits being sent in the entire network are those from the server to the client. Because the core of the computer network is like a wide pipe in this example, the rate at which bits can flow from source to destination is again the minimum of Rs and Rc, that is, throughput = min{Rs, Rc}. Therefore, the constraining factor for throughput in today's Internet is typically the access network.

For a final example, consider Figure 1.20(b) in which there are 10 servers and 10 clients connected to the core of the computer network. In this example, there are 10 simultaneous downloads taking place, involving 10 client-server pairs. Suppose that these 10 downloads are the only traffic in the network at the current time. As shown in the figure, there is a link in the core that is traversed by all 10 downloads. Denote by R the transmission rate of this common link. Let's suppose that all server access links have the same rate Rs, all client access links have the same rate Rc, and the transmission rates of all the links in the core---except the one common link of rate R---are much larger than Rs, Rc, and R. Now we ask, what are the throughputs of the downloads? Clearly, if the rate of the common link, R, is large---say a hundred times larger than both Rs and Rc---then the throughput for each download will once again be min{Rs, Rc}. But what if the rate of the common link is of the same order as Rs and Rc? What will the throughput be in this case? Let's take a look at a specific example. Suppose Rs = 2 Mbps, Rc = 1 Mbps, R = 5 Mbps, and the common link divides its transmission rate equally among the 10 downloads.

Figure 1.20 End-to-end throughput: (a) Client downloads a file from server; (b) 10 clients downloading with 10 servers
Then the bottleneck for each download is no longer in the access network, but is now instead the shared link in the core, which only provides each download with 500 kbps of throughput. Thus the end-to-end throughput for each download is now reduced to 500 kbps.

The examples in Figure 1.19 and Figure 1.20(a) show that throughput depends on the transmission rates of the links over which the data flows. We saw that when there is no other intervening traffic, the throughput can simply be approximated as the minimum transmission rate along the path between source and destination. The example in Figure 1.20(b) shows that more generally the throughput depends not only on the transmission rates of the links along the path, but also on the intervening traffic. In particular, a link with a high transmission rate may nonetheless be the bottleneck link for a file transfer if many other data flows are also passing through that link. We will examine throughput in computer networks more closely in the homework problems and in the subsequent chapters.

1.5 Protocol Layers and Their Service Models

From our discussion thus far, it is apparent that the Internet is an extremely complicated system. We have seen that there are many pieces to the Internet: numerous applications and protocols, various types of end systems, packet switches, and various types of link-level media. Given this enormous complexity, is there any hope of organizing a network architecture, or at least our discussion of network architecture? Fortunately, the answer to both questions is yes.

1.5.1 Layered Architecture

Before attempting to organize our thoughts on Internet architecture, let's look for a human analogy. Actually, we deal with complex systems all the time in our everyday life. Imagine if someone asked you to describe, for example, the airline system.
How would you find the structure to describe this complex system that has ticketing agents, baggage checkers, gate personnel, pilots, airplanes, air traffic control, and a worldwide system for routing airplanes? One way to describe this system might be to describe the series of actions you take (or others take for you) when you fly on an airline. You purchase your ticket, check your bags, go to the gate, and eventually get loaded onto the plane. The plane takes off and is routed to its destination. After your plane lands, you deplane at the gate and claim your bags. If the trip was bad, you complain about the flight to the ticket agent (getting nothing for your effort). This scenario is shown in Figure 1.21.

Figure 1.21 Taking an airplane trip: actions

Figure 1.22 Horizontal layering of airline functionality

Already, we can see some analogies here with computer networking: You are being shipped from source to destination by the airline; a packet is shipped from source host to destination host in the Internet. But this is not quite the analogy we are after. We are looking for some structure in Figure 1.21. Looking at Figure 1.21, we note that there is a ticketing function at each end; there is also a baggage function for already-ticketed passengers, and a gate function for already-ticketed and already-baggage-checked passengers. For passengers who have made it through the gate (that is, passengers who are already ticketed, baggage-checked, and through the gate), there is a takeoff and landing function, and while in flight, there is an airplane-routing function. This suggests that we can look at the functionality in Figure 1.21 in a horizontal manner, as shown in Figure 1.22. Figure 1.22 has divided the airline functionality into layers, providing a framework in which we can discuss airline travel. Note that each layer, combined with the layers below it, implements some functionality, some service.
At the ticketing layer and below, airline-counter-to-airline-counter transfer of a person is accomplished. At the baggage layer and below, baggage-check-to-baggage-claim transfer of a person and bags is accomplished. Note that the baggage layer provides this service only to an already-ticketed person. At the gate layer, departure-gate-to-arrival-gate transfer of a person and bags is accomplished. At the takeoff/landing layer, runway-to-runway transfer of people and their bags is accomplished. Each layer provides its service by (1) performing certain actions within that layer (for example, at the gate layer, loading and unloading people from an airplane) and by (2) using the services of the layer directly below it (for example, in the gate layer, using the runway-to-runway passenger transfer service of the takeoff/landing layer).

A layered architecture allows us to discuss a well-defined, specific part of a large and complex system. This simplification itself is of considerable value by providing modularity, making it much easier to change the implementation of the service provided by the layer. As long as the layer provides the same service to the layer above it, and uses the same services from the layer below it, the remainder of the system remains unchanged when a layer's implementation is changed. (Note that changing the implementation of a service is very different from changing the service itself!) For example, if the gate functions were changed (for instance, to have people board and disembark by height), the remainder of the airline system would remain unchanged since the gate layer still provides the same function (loading and unloading people); it simply implements that function in a different manner after the change. For large and complex systems that are constantly being updated, the ability to change the implementation of a service without affecting other components of the system is another important advantage of layering.
Protocol Layering

But enough about airlines. Let's now turn our attention to network protocols. To provide structure to the design of network protocols, network designers organize protocols---and the network hardware and software that implement the protocols---in layers. Each protocol belongs to one of the layers, just as each function in the airline architecture in Figure 1.22 belonged to a layer. We are again interested in the services that a layer offers to the layer above---the so-called service model of a layer. Just as in the case of our airline example, each layer provides its service by (1) performing certain actions within that layer and by (2) using the services of the layer directly below it. For example, the services provided by layer n may include reliable delivery of messages from one edge of the network to the other. This might be implemented by using an unreliable edge-to-edge message delivery service of layer n−1, and adding layer n functionality to detect and retransmit lost messages.

A protocol layer can be implemented in software, in hardware, or in a combination of the two. Application-layer protocols---such as HTTP and SMTP---are almost always implemented in software in the end systems; so are transport-layer protocols. Because the physical layer and data link layers are responsible for handling communication over a specific link, they are typically implemented in a network interface card (for example, Ethernet or WiFi interface cards) associated with a given link. The network layer is often a mixed implementation of hardware and software. Also note that just as the functions in the layered airline architecture were distributed among the various airports and flight control centers that make up the system, so too is a layer n protocol distributed among the end systems, packet switches, and other components that make up the network.
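The layer-n-over-layer-(n−1) idea can be sketched in a few lines of Python. Everything here is a toy model under stated assumptions: the "channel" is a list, loss is simulated with a random draw, and the lower layer's return value stands in for the acknowledgment a real protocol would receive from the far end.

```python
import random

random.seed(7)  # make the sketch reproducible

def layer_n_minus_1_send(msg, channel):
    """Toy unreliable delivery service: loses roughly 30% of messages."""
    if random.random() < 0.7:
        channel.append(msg)
        return True   # stands in for an acknowledgment from the receiver
    return False      # stands in for a timeout (message lost)

def layer_n_send(msg, channel):
    """Reliable delivery built on the unreliable layer below:
    detect each loss and retransmit until the message gets through."""
    retransmissions = 0
    while not layer_n_minus_1_send(msg, channel):
        retransmissions += 1
    return retransmissions
```

The point of the sketch is the division of labor: layer n adds only detect-and-retransmit logic, and relies entirely on layer n−1 to actually move the message.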
That is, there's often a piece of a layer n protocol in each of these network components.

Protocol layering has conceptual and structural advantages \[RFC 3439\]. As we have seen, layering provides a structured way to discuss system components. Modularity makes it easier to update system components. We mention, however, that some researchers and networking engineers are vehemently opposed to layering \[Wakeman 1992\]. One potential drawback of layering is that one layer may duplicate lower-layer functionality. For example, many protocol stacks provide error recovery on both a per-link basis and an end-to-end basis. A second potential drawback is that functionality at one layer may need information (for example, a timestamp value) that is present only in another layer; this violates the goal of separation of layers.

When taken together, the protocols of the various layers are called the protocol stack. The Internet protocol stack consists of five layers: the physical, link, network, transport, and application layers, as shown in Figure 1.23(a).

Figure 1.23 The Internet protocol stack (a) and OSI reference model (b)

If you examine the Table of Contents, you will see that we have roughly organized this book using the layers of the Internet protocol stack. We take a top-down approach, first covering the application layer and then proceeding downward.

Application Layer

The application layer is where network applications and their application-layer protocols reside. The Internet's application layer includes many protocols, such as the HTTP protocol (which provides for Web document request and transfer), SMTP (which provides for the transfer of e-mail messages), and FTP (which provides for the transfer of files between two end systems).
We'll see that certain network functions, such as the translation of human-friendly names for Internet end systems like www.ietf.org to a 32-bit network address, are also done with the help of a specific application-layer protocol, namely, the domain name system (DNS). We'll see in Chapter 2 that it is very easy to create and deploy our own new application-layer protocols. An application-layer protocol is distributed over multiple end systems, with the application in one end system using the protocol to exchange packets of information with the application in another end system. We'll refer to this packet of information at the application layer as a message.

Transport Layer

The Internet's transport layer transports application-layer messages between application endpoints. In the Internet there are two transport protocols, TCP and UDP, either of which can transport application-layer messages. TCP provides a connection-oriented service to its applications. This service includes guaranteed delivery of application-layer messages to the destination and flow control (that is, sender/receiver speed matching). TCP also breaks long messages into shorter segments and provides a congestion-control mechanism, so that a source throttles its transmission rate when the network is congested. The UDP protocol provides a connectionless service to its applications. This is a no-frills service that provides no reliability, no flow control, and no congestion control. In this book, we'll refer to a transport-layer packet as a segment.

Network Layer

The Internet's network layer is responsible for moving network-layer packets known as datagrams from one host to another. The Internet transport-layer protocol (TCP or UDP) in a source host passes a transport-layer segment and a destination address to the network layer, just as you would give the postal service a letter with a destination address.
The network layer then provides the service of delivering the segment to the transport layer in the destination host. The Internet's network layer includes the celebrated IP protocol, which defines the fields in the datagram as well as how the end systems and routers act on these fields. There is only one IP protocol, and all Internet components that have a network layer must run the IP protocol. The Internet's network layer also contains routing protocols that determine the routes that datagrams take between sources and destinations. The Internet has many routing protocols. As we saw in Section 1.3, the Internet is a network of networks, and within a network, the network administrator can run any routing protocol desired. Although the network layer contains both the IP protocol and numerous routing protocols, it is often simply referred to as the IP layer, reflecting the fact that IP is the glue that binds the Internet together.

Link Layer

The Internet's network layer routes a datagram through a series of routers between the source and destination. To move a packet from one node (host or router) to the next node in the route, the network layer relies on the services of the link layer. In particular, at each node, the network layer passes the datagram down to the link layer, which delivers the datagram to the next node along the route. At this next node, the link layer passes the datagram up to the network layer. The services provided by the link layer depend on the specific link-layer protocol that is employed over the link. For example, some link-layer protocols provide reliable delivery, from transmitting node, over one link, to receiving node. Note that this reliable delivery service is different from the reliable delivery service of TCP, which provides reliable delivery from one end system to another. Examples of link-layer protocols include Ethernet, WiFi, and the cable access network's DOCSIS protocol.
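Stepping back up the stack for a moment, the contrast between TCP's connection-oriented service and UDP's connectionless one is visible directly in the sockets API. A minimal sketch, with both endpoints on the local host and the port chosen by the operating system:

```python
import socket

# UDP (SOCK_DGRAM): connectionless, "no-frills" -- no handshake, no
# reliability, no flow or congestion control.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))          # let the OS pick a free port
receiver.settimeout(5.0)
port = receiver.getsockname()[1]

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"application-layer message", ("127.0.0.1", port))

data, _ = receiver.recvfrom(2048)        # the message arrives as one datagram

# TCP (SOCK_STREAM) would instead require connect()/accept() -- a handshake
# that sets up a reliable, flow- and congestion-controlled byte stream
# before any application data moves.
```

Note that the UDP sender simply fires the datagram off; nothing in the API promises it will arrive, which is exactly the no-frills service described above.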
As datagrams typically need to traverse several links to travel from source to destination, a datagram may be handled by different link-layer protocols at different links along its route. For example, a datagram may be handled by Ethernet on one link and by PPP on the next link. The network layer will receive a different service from each of the different link-layer protocols. In this book, we'll refer to the link-layer packets as frames.

Physical Layer

While the job of the link layer is to move entire frames from one network element to an adjacent network element, the job of the physical layer is to move the individual bits within the frame from one node to the next. The protocols in this layer are again link dependent and further depend on the actual transmission medium of the link (for example, twisted-pair copper wire, single-mode fiber optics). For example, Ethernet has many physical-layer protocols: one for twisted-pair copper wire, another for coaxial cable, another for fiber, and so on. In each case, a bit is moved across the link in a different way.

The OSI Model

Having discussed the Internet protocol stack in detail, we should mention that it is not the only protocol stack around. In particular, back in the late 1970s, the International Organization for Standardization (ISO) proposed that computer networks be organized around seven layers, called the Open Systems Interconnection (OSI) model \[ISO 2016\]. The OSI model took shape when the protocols that were to become the Internet protocols were in their infancy, and were but one of many different protocol suites under development; in fact, the inventors of the original OSI model probably did not have the Internet in mind when creating it. Nevertheless, beginning in the late 1970s, many training and university courses picked up on the ISO mandate and organized courses around the seven-layer model.
Because of its early impact on networking education, the seven-layer model continues to linger on in some networking textbooks and training courses. The seven layers of the OSI reference model, shown in Figure 1.23(b), are: application layer, presentation layer, session layer, transport layer, network layer, data link layer, and physical layer. The functionality of five of these layers is roughly the same as their similarly named Internet counterparts. Thus, let's consider the two additional layers present in the OSI reference model---the presentation layer and the session layer. The role of the presentation layer is to provide services that allow communicating applications to interpret the meaning of data exchanged. These services include data compression and data encryption (which are self-explanatory) as well as data description (which frees the applications from having to worry about the internal format in which data are represented/stored---formats that may differ from one computer to another). The session layer provides for delimiting and synchronization of data exchange, including the means to build a checkpointing and recovery scheme.

The fact that the Internet lacks two layers found in the OSI reference model poses a couple of interesting questions: Are the services provided by these layers unimportant? What if an application needs one of these services? The Internet's answer to both of these questions is the same---it's up to the application developer. It's up to the application developer to decide if a service is important, and if the service is important, it's up to the application developer to build that functionality into the application.
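To make the "it's up to the application developer" point concrete: an application that wants the presentation layer's services simply builds them in itself. A sketch in Python, using JSON for data description and zlib for compression (encryption is omitted, and the record contents are of course made up):

```python
import json
import zlib

record = {"user": "alice", "id": 42}    # data description: a JSON-encoded record

# Compression, supplied by the application itself rather than by the stack.
wire_bytes = zlib.compress(json.dumps(record).encode("utf-8"))

# ...wire_bytes would be handed to the transport layer here...

# The receiving application reverses both steps and recovers the structure.
recovered = json.loads(zlib.decompress(wire_bytes).decode("utf-8"))
```

Because both ends agree on the representation (JSON) in the application-layer protocol itself, neither needs a separate presentation layer beneath it.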
1.5.2 Encapsulation

Figure 1.24 shows the physical path that data takes down a sending end system's protocol stack, up and down the protocol stacks of an intervening link-layer switch and router, and then up the protocol stack at the receiving end system.

Figure 1.24 Hosts, routers, and link-layer switches; each contains a different set of layers, reflecting their differences in functionality

As we discuss later in this book, routers and link-layer switches are both packet switches. Similar to end systems, routers and link-layer switches organize their networking hardware and software into layers. But routers and link-layer switches do not implement all of the layers in the protocol stack; they typically implement only the bottom layers. As shown in Figure 1.24, link-layer switches implement layers 1 and 2; routers implement layers 1 through 3. This means, for example, that Internet routers are capable of implementing the IP protocol (a layer 3 protocol), while link-layer switches are not. We'll see later that while link-layer switches do not recognize IP addresses, they are capable of recognizing layer 2 addresses, such as Ethernet addresses. Note that hosts implement all five layers; this is consistent with the view that the Internet architecture puts much of its complexity at the edges of the network.

Figure 1.24 also illustrates the important concept of encapsulation. At the sending host, an application-layer message (M in Figure 1.24) is passed to the transport layer. In the simplest case, the transport layer takes the message and appends additional information (so-called transport-layer header information, Ht in Figure 1.24) that will be used by the receiver-side transport layer. The application-layer message and the transport-layer header information together constitute the transport-layer segment. The transport-layer segment thus encapsulates the application-layer message.
The added information might include information allowing the receiver-side transport layer to deliver the message up to the appropriate application, and error-detection bits that allow the receiver to determine whether bits in the message have been changed en route. The transport layer then passes the segment to the network layer, which adds network-layer header information (Hn in Figure 1.24) such as source and destination end system addresses, creating a network-layer datagram. The datagram is then passed to the link layer, which (of course!) will add its own link-layer header information and create a link-layer frame. Thus, we see that at each layer, a packet has two types of fields: header fields and a payload field. The payload is typically a packet from the layer above.

A useful analogy here is the sending of an interoffice memo from one corporate branch office to another via the public postal service. Suppose Alice, who is in one branch office, wants to send a memo to Bob, who is in another branch office. The memo is analogous to the application-layer message. Alice puts the memo in an interoffice envelope with Bob's name and department written on the front of the envelope. The interoffice envelope is analogous to a transport-layer segment---it contains header information (Bob's name and department number) and it encapsulates the application-layer message (the memo). When the sending branch-office mailroom receives the interoffice envelope, it puts the interoffice envelope inside yet another envelope, which is suitable for sending through the public postal service. The sending mailroom also writes the postal address of the sending and receiving branch offices on the postal envelope. Here, the postal envelope is analogous to the datagram---it encapsulates the transport-layer segment (the interoffice envelope), which encapsulates the original message (the memo).
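The header-plus-payload nesting described above can be sketched with Python's struct module. The header formats below are toy stand-ins (two 16-bit "port" fields for Ht, two 32-bit "address" fields for Hn), not the real TCP and IP header layouts:

```python
import struct

message = b"GET /index.html"            # application-layer message, M

# Transport layer: prepend a toy header Ht (source and destination "ports").
segment = struct.pack("!HH", 5000, 80) + message

# Network layer: prepend a toy header Hn (source and destination "addresses").
datagram = struct.pack("!II", 0x0A000001, 0x0A000002) + segment

# De-encapsulation at the receiver peels one header off per layer.
src, dst = struct.unpack("!II", datagram[:8])        # network layer reads Hn
sport, dport = struct.unpack("!HH", datagram[8:12])  # transport layer reads Ht
recovered = datagram[12:]                            # the original message M
```

At each step the entire packet from the layer above becomes the payload of the layer below, exactly as in Figure 1.24; a real stack would wrap the datagram in a link-layer frame header as well.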
The postal service delivers the postal envelope to the receiving branch-office mailroom. There, the process of de-encapsulation is begun. The mailroom extracts the interoffice memo and forwards it to Bob. Finally, Bob opens the envelope and removes the memo. The process of encapsulation can be more complex than that described above. For example, a large message may be divided into multiple transport-layer segments (which might themselves each be divided into multiple network-layer datagrams). At the receiving end, such a segment must then be reconstructed from its constituent datagrams.

1.6 Networks Under Attack

The Internet has become mission critical for many institutions today, including large and small companies, universities, and government agencies. Many individuals also rely on the Internet for many of their professional, social, and personal activities. Billions of "things," including wearables and home devices, are currently being connected to the Internet. But behind all this utility and excitement, there is a dark side, a side where "bad guys" attempt to wreak havoc in our daily lives by damaging our Internet-connected computers, violating our privacy, and rendering inoperable the Internet services on which we depend. The field of network security is about how the bad guys can attack computer networks and about how we, soon-to-be experts in computer networking, can defend networks against those attacks, or better yet, design new architectures that are immune to such attacks in the first place. Given the frequency and variety of existing attacks as well as the threat of new and more destructive future attacks, network security has become a central topic in the field of computer networking. One of the features of this textbook is that it brings network security issues to the forefront.
Since we don't yet have expertise in computer networking and Internet protocols, we'll begin here by surveying some of today's more prevalent security-related problems. This will whet our appetite for more substantial discussions in the upcoming chapters. So we begin here by simply asking, what can go wrong? How are computer networks vulnerable? What are some of the more prevalent types of attacks today?

The Bad Guys Can Put Malware into Your Host Via the Internet

We attach devices to the Internet because we want to receive/send data from/to the Internet. This includes all kinds of good stuff, including Instagram posts, Internet search results, streaming music, video conference calls, streaming movies, and so on. But, unfortunately, along with all that good stuff comes malicious stuff---collectively known as malware---that can also enter and infect our devices. Once malware infects our device it can do all kinds of devious things, including deleting our files and installing spyware that collects our private information, such as social security numbers, passwords, and keystrokes, and then sends this (over the Internet, of course!) back to the bad guys. Our compromised host may also be enrolled in a network of thousands of similarly compromised devices, collectively known as a botnet, which the bad guys control and leverage for spam e-mail distribution or distributed denial-of-service attacks (soon to be discussed) against targeted hosts.

Much of the malware out there today is self-replicating: once it infects one host, from that host it seeks entry into other hosts over the Internet, and from the newly infected hosts, it seeks entry into yet more hosts. In this manner, self-replicating malware can spread exponentially fast. Malware can spread in the form of a virus or a worm. Viruses are malware that require some form of user interaction to infect the user's device.
The classic example is an e-mail attachment containing malicious executable code. If a user receives and opens such an attachment, the user inadvertently runs the malware on the device. Typically, such e-mail viruses are self-replicating: once executed, the virus may send an identical message with an identical malicious attachment to, for example, every recipient in the user's address book. Worms are malware that can enter a device without any explicit user interaction. For example, a user may be running a vulnerable network application to which an attacker can send malware. In some cases, without any user intervention, the application may accept the malware from the Internet and run it, creating a worm. The worm in the newly infected device then scans the Internet, searching for other hosts running the same vulnerable network application. When it finds other vulnerable hosts, it sends a copy of itself to those hosts. Today, malware is pervasive and costly to defend against. As you work through this textbook, we encourage you to think about the following question: What can computer network designers do to defend Internet-attached devices from malware attacks?

The Bad Guys Can Attack Servers and Network Infrastructure

Another broad class of security threats is known as denial-of-service (DoS) attacks. As the name suggests, a DoS attack renders a network, host, or other piece of infrastructure unusable by legitimate users. Web servers, e-mail servers, DNS servers (discussed in Chapter 2), and institutional networks can all be subject to DoS attacks. Internet DoS attacks are extremely common, with thousands of DoS attacks occurring every year \[Moore 2001\]. The site Digital Attack Map allows us to visualize the top daily DoS attacks worldwide \[DAM 2016\]. Most Internet DoS attacks fall into one of three categories: Vulnerability attack.
This involves sending a few well-crafted messages to a vulnerable application or operating system running on a targeted host. If the right sequence of packets is sent to a vulnerable application or operating system, the service can stop or, worse, the host can crash. Bandwidth flooding. The attacker sends a deluge of packets to the targeted host---so many packets that the target's access link becomes clogged, preventing legitimate packets from reaching the server. Connection flooding. The attacker establishes a large number of half-open or fully open TCP connections (TCP connections are discussed in Chapter 3) at the target host. The host can become so bogged down with these bogus connections that it stops accepting legitimate connections.

Let's now explore the bandwidth-flooding attack in more detail. Recalling our delay and loss analysis discussion in Section 1.4.2, it's evident that if the server has an access rate of R bps, then the attacker will need to send traffic at a rate of approximately R bps to cause damage. If R is very large, a single attack source may not be able to generate enough traffic to harm the server. Furthermore, if all the traffic emanates from a single source, an upstream router may be able to detect the attack and block all traffic from that source before the traffic gets near the server. In a distributed DoS (DDoS) attack, illustrated in Figure 1.25, the attacker controls multiple sources and has each source blast traffic at the target. With this approach, the aggregate traffic rate across all the controlled sources needs to be approximately R to cripple the service. DDoS attacks leveraging botnets with thousands of compromised hosts are a common occurrence today \[DAM 2016\]. DDoS attacks are much harder to detect and defend against than a DoS attack from a single host.
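The back-of-envelope arithmetic behind bandwidth flooding is worth making explicit. All of the numbers below are illustrative assumptions, not measurements:

```python
R = 1_000_000_000           # assumed server access-link rate: 1 Gbps
single_source = 50_000_000  # assumed upload rate of one attack source: 50 Mbps

# A single source cannot clog the link on its own...
assert single_source < R

# ...but in a DDoS attack only the *aggregate* rate needs to reach about R.
bot_uplink = 2_000_000      # assumed upload rate per compromised host: 2 Mbps
bots_needed = R // bot_uplink
print(bots_needed)          # 500 hosts -- a tiny botnet by today's standards
```

This is why distribution matters: each individual source sends at an unremarkable rate, making the attack traffic hard to distinguish from legitimate load.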
We encourage you to consider the following question as you work your way through this book: What can computer network designers do to defend against DoS attacks? We will see that different defenses are needed for the three types of DoS attacks.

Figure 1.25 A distributed denial-of-service attack

The Bad Guys Can Sniff Packets

Many users today access the Internet via wireless devices, such as WiFi-connected laptops or handheld devices with cellular Internet connections (covered in Chapter 7). While ubiquitous Internet access is extremely convenient and enables marvelous new applications for mobile users, it also creates a major security vulnerability---by placing a passive receiver in the vicinity of the wireless transmitter, that receiver can obtain a copy of every packet that is transmitted! These packets can contain all kinds of sensitive information, including passwords, social security numbers, trade secrets, and private personal messages. A passive receiver that records a copy of every packet that flies by is called a packet sniffer.

Sniffers can be deployed in wired environments as well. In wired broadcast environments, as in many Ethernet LANs, a packet sniffer can obtain copies of broadcast packets sent over the LAN. As described in Section 1.2, cable access technologies also broadcast packets and are thus vulnerable to sniffing. Furthermore, a bad guy who gains access to an institution's access router or access link to the Internet may be able to plant a sniffer that makes a copy of every packet going to/from the organization. Sniffed packets can then be analyzed offline for sensitive information. Packet-sniffing software is freely available at various Web sites and as commercial products. Professors teaching a networking course have been known to assign lab exercises that involve writing a packet-sniffing and application-layer data reconstruction program.
Indeed, the Wireshark \[Wireshark 2016\] labs associated with this text (see the introductory Wireshark lab at the end of this chapter) use exactly such a packet sniffer! Because packet sniffers are passive---that is, they do not inject packets into the channel---they are difficult to detect. So, when we send packets into a wireless channel, we must accept the possibility that some bad guy may be recording copies of our packets. As you may have guessed, some of the best defenses against packet sniffing involve cryptography. We will examine cryptography as it applies to network security in Chapter 8.

The Bad Guys Can Masquerade as Someone You Trust

It is surprisingly easy (you will have the knowledge to do so shortly as you proceed through this text!) to create a packet with an arbitrary source address, packet content, and destination address and then transmit this hand-crafted packet into the Internet, which will dutifully forward the packet to its destination. Imagine the unsuspecting receiver (say an Internet router) who receives such a packet, takes the (false) source address as being truthful, and then performs some command embedded in the packet's contents (say modifies its forwarding table). The ability to inject packets into the Internet with a false source address is known as IP spoofing, and is but one of many ways in which one user can masquerade as another user. To solve this problem, we will need end-point authentication, that is, a mechanism that will allow us to determine with certainty if a message originates from where we think it does. Once again, we encourage you to think about how this can be done for network applications and protocols as you progress through the chapters of this book. We will explore mechanisms for end-point authentication in Chapter 8. In closing this section, it's worth considering how the Internet got to be such an insecure place in the first place.
The answer, in essence, is that the Internet was originally designed to be that way, based on the model of "a group of mutually trusting users attached to a transparent network" \[Blumenthal 2001\]---a model in which (by definition) there is no need for security. Many aspects of the original Internet architecture deeply reflect this notion of mutual trust. For example, the ability for one user to send a packet to any other user is the default rather than a requested/granted capability, and user identity is taken at declared face value, rather than being authenticated by default. But today's Internet certainly does not involve "mutually trusting users." Nonetheless, today's users still need to communicate when they don't necessarily trust each other, may wish to communicate anonymously, may communicate indirectly through third parties (e.g., Web caches, which we'll study in Chapter 2, or mobility-assisting agents, which we'll study in Chapter 7), and may distrust the hardware, software, and even the air through which they communicate. We now have many security-related challenges before us as we progress through this book: We should seek defenses against sniffing, endpoint masquerading, man-in-the-middle attacks, DDoS attacks, malware, and more. We should keep in mind that communication among mutually trusted users is the exception rather than the rule. Welcome to the world of modern computer networking!

1.7 History of Computer Networking and the Internet

Sections 1.1 through 1.6 presented an overview of the technology of computer networking and the Internet. You should know enough now to impress your family and friends! However, if you really want to be a big hit at the next cocktail party, you should sprinkle your discourse with tidbits about the fascinating history of the Internet \[Segaller 1998\].
1.7.1 The Development of Packet Switching: 1961--1972

The field of computer networking and today's Internet trace their beginnings back to the early 1960s, when the telephone network was the world's dominant communication network. Recall from Section 1.3 that the telephone network uses circuit switching to transmit information from a sender to a receiver---an appropriate choice given that voice is transmitted at a constant rate between sender and receiver. Given the increasing importance of computers in the early 1960s and the advent of time-shared computers, it was perhaps natural to consider how to hook computers together so that they could be shared among geographically distributed users. The traffic generated by such users was likely to be bursty---intervals of activity, such as the sending of a command to a remote computer, followed by periods of inactivity while waiting for a reply or while contemplating the received response.

Three research groups around the world, each unaware of the others' work \[Leiner 1998\], began inventing packet switching as an efficient and robust alternative to circuit switching. The first published work on packet-switching techniques was that of Leonard Kleinrock \[Kleinrock 1961; Kleinrock 1964\], then a graduate student at MIT. Using queuing theory, Kleinrock's work elegantly demonstrated the effectiveness of the packet-switching approach for bursty traffic sources. In 1964, Paul Baran \[Baran 1964\] at the Rand Institute had begun investigating the use of packet switching for secure voice over military networks, and at the National Physical Laboratory in England, Donald Davies and Roger Scantlebury were also developing their ideas on packet switching. The work at MIT, Rand, and the NPL laid the foundations for today's Internet. But the Internet also has a long history of a let's-build-it-and-demonstrate-it attitude that also dates back to the 1960s. J. C. R.
Licklider \[DEC 1990\] and Lawrence Roberts, both colleagues of Kleinrock's at MIT, went on to lead the computer science program at the Advanced Research Projects Agency (ARPA) in the United States. Roberts published an overall plan for the ARPAnet \[Roberts 1967\], the first packet-switched computer network and a direct ancestor of today's public Internet. On Labor Day in 1969, the first packet switch was installed at UCLA under Kleinrock's supervision, and three additional packet switches were installed shortly thereafter at the Stanford Research Institute (SRI), UC Santa Barbara, and the University of Utah (Figure 1.26). The fledgling precursor to the Internet was four nodes large by the end of 1969. Kleinrock recalls the very first use of the network to perform a remote login from UCLA to SRI, crashing the system \[Kleinrock 2004\]. By 1972, ARPAnet had grown to approximately 15 nodes and was given its first public demonstration by Robert Kahn. The first host-to-host protocol between ARPAnet end systems, known as the network-control protocol (NCP), was completed \[RFC 001\]. With an end-to-end protocol available, applications could now be written. Ray Tomlinson wrote the first e-mail program in 1972.

1.7.2 Proprietary Networks and Internetworking: 1972--1980

The initial ARPAnet was a single, closed network. In order to communicate with an ARPAnet host, one had to be actually attached to another ARPAnet IMP.
In the early to mid-1970s, additional stand-alone packet-switching networks besides ARPAnet came into being: ALOHAnet, a microwave network linking universities on the Hawaiian Islands \[Abramson 1970\], as well as DARPA's packet-satellite \[RFC 829\] and packet-radio networks \[Kahn 1978\]; Telenet, a BBN commercial packet-switching network based on ARPAnet technology; Cyclades, a French packet-switching network pioneered by Louis Pouzin \[Think 2012\]; time-sharing networks such as Tymnet and the GE Information Services network, among others, in the late 1960s and early 1970s \[Schwartz 1977\]; and IBM's SNA (1969--1974), which paralleled the ARPAnet work \[Schwartz 1977\].

Figure 1.26 An early packet switch

The number of networks was growing. With perfect hindsight we can see that the time was ripe for developing an encompassing architecture for connecting networks together. Pioneering work on interconnecting networks (under the sponsorship of the Defense Advanced Research Projects Agency (DARPA)), in essence creating a network of networks, was done by Vinton Cerf and Robert Kahn \[Cerf 1974\]; the term internetting was coined to describe this work. These architectural principles were embodied in TCP. The early versions of TCP, however, were quite different from today's TCP. The early versions of TCP combined a reliable in-sequence delivery of data via end-system retransmission (still part of today's TCP) with forwarding functions (which today are performed by IP). Early experimentation with TCP, combined with the recognition of the importance of an unreliable, non-flow-controlled, end-to-end transport service for applications such as packetized voice, led to the separation of IP out of TCP and the development of the UDP protocol. The three key Internet protocols that we see today---TCP, UDP, and IP---were conceptually in place by the end of the 1970s.
In addition to the DARPA Internet-related research, many other important
networking activities were underway. In Hawaii, Norman Abramson was
developing ALOHAnet, a packet-based radio network that allowed multiple
remote sites on the Hawaiian Islands to communicate with each other. The
ALOHA protocol \[Abramson 1970\] was the first multiple-access protocol,
allowing geographically distributed users to share a single broadcast
communication medium (a radio frequency). Metcalfe and Boggs built on
Abramson's multiple-access protocol work when they developed the
Ethernet protocol \[Metcalfe 1976\] for wire-based shared broadcast
networks. Interestingly, Metcalfe and Boggs' Ethernet protocol was
motivated by the need to connect multiple PCs, printers, and shared
disks \[Perkins 1994\]. Twenty-five years ago, well before the PC
revolution and the explosion of networks, Metcalfe and Boggs were laying
the foundation for today's PC LANs.

1.7.3 A Proliferation of Networks: 1980--1990

By the end of the 1970s, approximately two hundred hosts were connected
to the ARPAnet. By the end of the 1980s the number of hosts connected to
the public Internet, a confederation of networks looking much like
today's Internet, would reach a hundred thousand. The 1980s would be a
time of tremendous growth. Much of that growth resulted from several
distinct efforts to create computer networks linking universities
together. BITNET provided e-mail and file transfers among several
universities in the Northeast. CSNET (computer science network) was
formed to link university researchers who did not have access to
ARPAnet. In 1986, NSFNET was created to provide access to NSF-sponsored
supercomputing centers. Starting with an initial backbone speed of 56
kbps, NSFNET's backbone would be running at 1.5 Mbps by the end of the
decade and would serve as a primary backbone linking regional networks.
In the ARPAnet community, many of the final pieces of today's Internet
architecture were falling into place. January 1, 1983 saw the official
deployment of TCP/IP as the new standard host protocol for ARPAnet
(replacing the NCP protocol). The transition \[RFC 801\] from NCP to
TCP/IP was a flag day event---all hosts were required to transfer over
to TCP/IP as of that day. In the late 1980s, important extensions were
made to TCP to implement host-based congestion control \[Jacobson
1988\]. The DNS, used to map between a human-readable Internet name (for
example, gaia.cs.umass.edu) and its 32-bit IP address, was also
developed \[RFC 1034\].

Paralleling this development of the ARPAnet (which was for the most part
a US effort), in the early 1980s the French launched the Minitel
project, an ambitious plan to bring data networking into everyone's
home. Sponsored by the French government, the Minitel system consisted
of a public packet-switched network (based on the X.25 protocol suite),
Minitel servers, and inexpensive terminals with built-in low-speed
modems. The Minitel became a huge success in 1984 when the French
government gave away a free Minitel terminal to each French household
that wanted one. Minitel sites included free sites---such as a telephone
directory site---as well as private sites, which collected a usage-based
fee from each user. At its peak in the mid-1990s, it offered more than
20,000 services, ranging from home banking to specialized research
databases. The Minitel was in a large proportion of French homes 10
years before most Americans had ever heard of the Internet.

1.7.4 The Internet Explosion: The 1990s

The 1990s were ushered in with a number of events that symbolized the
continued evolution and the soon-to-arrive commercialization of the
Internet. ARPAnet, the progenitor of the Internet, ceased to exist. In
1991, NSFNET lifted its restrictions on the use of NSFNET for commercial
purposes.
NSFNET itself would be decommissioned in 1995, with Internet backbone
traffic being carried by commercial Internet Service Providers. The main
event of the 1990s was to be the emergence of the World Wide Web
application, which brought the Internet into the homes and businesses of
millions of people worldwide. The Web served as a platform for enabling
and deploying hundreds of new applications that we take for granted
today, including search (e.g., Google and Bing), Internet commerce
(e.g., Amazon and eBay), and social networks (e.g., Facebook). The Web
was invented at CERN by Tim Berners-Lee between 1989 and 1991
\[Berners-Lee 1989\], based on ideas originating in earlier work on
hypertext from the 1940s by Vannevar Bush \[Bush 1945\] and since the
1960s by Ted Nelson \[Xanadu 2012\]. Berners-Lee and his associates
developed initial versions of HTML, HTTP, a Web server, and a
browser---the four key components of the Web. Around the end of 1993
there were about two hundred Web servers in operation, this collection
of servers being just a harbinger of what was about to come. At about
this time several researchers were developing Web browsers with GUI
interfaces, including Marc Andreessen, who along with Jim Clark, formed
Mosaic Communications, which later became Netscape Communications
Corporation \[Cusumano 1998; Quittner 1998\]. By 1995, university
students were using Netscape browsers to surf the Web on a daily basis.
At about this time companies---big and small---began to operate Web
servers and transact commerce over the Web. In 1996, Microsoft started
to make browsers, which started the browser war between Netscape and
Microsoft, which Microsoft won a few years later \[Cusumano 1998\]. The
second half of the 1990s was a period of tremendous growth and
innovation for the Internet, with major corporations and thousands of
startups creating Internet products and services.
By the end of the millennium the Internet was supporting hundreds of
popular applications, including four killer applications:

- E-mail, including attachments and Web-accessible e-mail
- The Web, including Web browsing and Internet commerce
- Instant messaging, with contact lists
- Peer-to-peer file sharing of MP3s, pioneered by Napster

Interestingly, the first two killer applications came from the research
community, whereas the last two were created by a few young
entrepreneurs. The period from 1995 to 2001 was a roller-coaster ride
for the Internet in the financial markets. Before they were even
profitable, hundreds of Internet startups made initial public offerings
and started to be traded in a stock market. Many companies were valued
in the billions of dollars without having any significant revenue
streams. The Internet stocks collapsed in 2000--2001, and many startups
shut down. Nevertheless, a number of companies emerged as big winners in
the Internet space, including Microsoft, Cisco, Yahoo, eBay, Google, and
Amazon.

1.7.5 The New Millennium

Innovation in computer networking continues at a rapid pace. Advances
are being made on all fronts, including deployments of faster routers
and higher transmission speeds in both access networks and in network
backbones. But the following developments merit special attention:

Since the beginning of the millennium, we have been seeing aggressive
deployment of broadband Internet access to homes---not only cable modems
and DSL but also fiber to the home, as discussed in Section 1.2. This
high-speed Internet access has set the stage for a wealth of video
applications, including the distribution of user-generated video (for
example, YouTube), on-demand streaming of movies and television shows
(e.g., Netflix), and multi-person video conference (e.g., Skype,
FaceTime, and Google Hangouts).
The increasing ubiquity of high-speed (54 Mbps and higher) public WiFi
networks and medium-speed (tens of Mbps) Internet access via 4G cellular
telephony networks is not only making it possible to remain constantly
connected while on the move, but also enabling new location-specific
applications such as Yelp, Tinder, Yik Yak, and Waze. The number of
wireless devices connecting to the Internet surpassed the number of
wired devices in 2011. This high-speed wireless access has set the stage
for the rapid emergence of hand-held computers (iPhones, Androids,
iPads, and so on), which enjoy constant and untethered access to the
Internet.

Online social networks---such as Facebook, Instagram, Twitter, and
WeChat (hugely popular in China)---have created massive people networks
on top of the Internet. Many of these social networks are extensively
used for messaging as well as photo sharing. Many Internet users today
"live" primarily within one or more social networks. Through their APIs,
the online social networks create platforms for new networked
applications and distributed games.

As discussed in Section 1.3.3, online service providers, such as Google
and Microsoft, have deployed their own extensive private networks, which
not only connect together their globally distributed data centers, but
are used to bypass the Internet as much as possible by peering directly
with lower-tier ISPs. As a result, Google provides search results and
e-mail access almost instantaneously, as if their data centers were
running within one's own computer.

Many Internet commerce companies are now running their applications in
the "cloud"---such as in Amazon's EC2, in Google's App Engine, or in
Microsoft's Azure. Many companies and universities have also migrated
their Internet applications (e.g., e-mail and Web hosting) to the cloud.
Cloud companies not only provide applications with scalable computing
and storage environments, but also provide the applications with
implicit access to their high-performance private networks.

1.8 Summary

In this chapter we've covered a tremendous amount of material! We've
looked at the various pieces of hardware and software that make up the
Internet in particular and computer networks in general. We started at
the edge of the network, looking at end systems and applications, and at
the transport service provided to the applications running on the end
systems. We also looked at the link-layer technologies and physical
media typically found in the access network. We then dove deeper inside
the network, into the network core, identifying packet switching and
circuit switching as the two basic approaches for transporting data
through a telecommunication network, and we examined the strengths and
weaknesses of each approach. We also examined the structure of the
global Internet, learning that the Internet is a network of networks. We
saw that the Internet's hierarchical structure, consisting of higher-
and lower-tier ISPs, has allowed it to scale to include thousands of
networks.

In the second part of this introductory chapter, we examined several
topics central to the field of computer networking. We first examined
the causes of delay, throughput, and packet loss in a packet-switched
network. We developed simple quantitative models for transmission,
propagation, and queuing delays as well as for throughput; we'll make
extensive use of these delay models in the homework problems throughout
this book. Next we examined protocol layering and service models, key
architectural principles in networking that we will also refer back to
throughout this book. We also surveyed some of the more prevalent
security attacks in the Internet today. We finished our introduction to
networking with a brief history of computer networking.
The first chapter in itself constitutes a minicourse in computer
networking. So, we have indeed covered a tremendous amount of ground in
this first chapter! If you're a bit overwhelmed, don't worry. In the
following chapters we'll revisit all of these ideas, covering them in
much more detail (that's a promise, not a threat!). At this point, we
hope you leave this chapter with a still-developing intuition for the
pieces that make up a network, a still-developing command of the
vocabulary of networking (don't be shy about referring back to this
chapter), and an ever-growing desire to learn more about networking.
That's the task ahead of us for the rest of this book.

Road-Mapping This Book

Before starting any trip, you should always glance at a road map in
order to become familiar with the major roads and junctures that lie
ahead. For the trip we are about to embark on, the ultimate destination
is a deep understanding of the how, what, and why of computer networks.
Our road map is the sequence of chapters of this book:

1. Computer Networks and the Internet
2. Application Layer
3. Transport Layer
4. Network Layer: Data Plane
5. Network Layer: Control Plane
6. The Link Layer and LANs
7. Wireless and Mobile Networks
8. Security in Computer Networks
9. Multimedia Networking

Chapters 2 through 6 are the five core chapters of this book. You should
notice that these chapters are organized around the top four layers of
the five-layer Internet protocol stack. Further note that our journey
will begin at the top of the Internet protocol stack, namely, the
application layer, and will work its way downward. The rationale behind
this top-down journey is that once we understand the applications, we
can understand the network services needed to support these
applications. We can then, in turn, examine the various ways in which
such services might be implemented by a network architecture.
Covering applications early thus provides motivation for the remainder
of the text. The second half of the book---Chapters 7 through 9---zooms
in on three enormously important (and somewhat independent) topics in
modern computer networking. In Chapter 7, we examine wireless and mobile
networks, including wireless LANs (including WiFi and Bluetooth),
cellular telephony networks (including GSM, 3G, and 4G), and mobility
(in both IP and GSM networks). Chapter 8, which addresses security in
computer networks, first looks at the underpinnings of encryption and
network security, and then we examine how the basic theory is being
applied in a broad range of Internet contexts. The last chapter, which
addresses multimedia networking, examines audio and video applications
such as Internet phone, video conferencing, and streaming of stored
media. We also look at how a packet-switched network can be designed to
provide consistent quality of service to audio and video applications.

Homework Problems and Questions

Chapter 1 Review Questions

SECTION 1.1

R1. What is the difference between a host and an end system? List
several different types of end systems. Is a Web server an end system?

R2. The word protocol is often used to describe diplomatic relations.
How does Wikipedia describe diplomatic protocol?

R3. Why are standards important for protocols?

SECTION 1.2

R4. List six access technologies. Classify each one as home access,
enterprise access, or wide-area wireless access.

R5. Is HFC transmission rate dedicated or shared among users? Are
collisions possible in a downstream HFC channel? Why or why not?

R6. List the available residential access technologies in your city. For
each type of access, provide the advertised downstream rate, upstream
rate, and monthly price.

R7. What is the transmission rate of Ethernet LANs?

R8. What are some of the physical media that Ethernet can run over?

R9.
Dial-up modems, HFC, DSL, and FTTH are all used for residential access.
For each of these access technologies, provide a range of transmission
rates and comment on whether the transmission rate is shared or
dedicated.

R10. Describe the most popular wireless Internet access technologies
today. Compare and contrast them.

SECTION 1.3

R11. Suppose there is exactly one packet switch between a sending host
and a receiving host. The transmission rates between the sending host
and the switch and between the switch and the receiving host are R1 and
R2, respectively. Assuming that the switch uses store-and-forward packet
switching, what is the total end-to-end delay to send a packet of length
L? (Ignore queuing, propagation delay, and processing delay.)

R12. What advantage does a circuit-switched network have over a
packet-switched network? What advantages does TDM have over FDM in a
circuit-switched network?

R13. Suppose users share a 2 Mbps link. Also suppose each user transmits
continuously at 1 Mbps when transmitting, but each user transmits only
20 percent of the time. (See the discussion of statistical multiplexing
in Section 1.3.)

a. When circuit switching is used, how many users can be supported?

b. For the remainder of this problem, suppose packet switching is used.
   Why will there be essentially no queuing delay before the link if two
   or fewer users transmit at the same time? Why will there be a queuing
   delay if three users transmit at the same time?

c. Find the probability that a given user is transmitting.

d. Suppose now there are three users. Find the probability that at any
   given time, all three users are transmitting simultaneously. Find the
   fraction of time during which the queue grows.

R14. Why will two ISPs at the same level of the hierarchy often peer
with each other? How does an IXP earn money?

R15. Some content providers have created their own networks. Describe
Google's network.
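(Aside: the store-and-forward reasoning that R11 asks about can be checked numerically. The sketch below uses hypothetical rates and packet length, chosen only for illustration; it is not the answer to any specific problem instance.)

```python
def store_and_forward_delay(L_bits, rates_bps):
    # One packet of L bits crosses each link in turn; a store-and-forward
    # switch must receive the entire packet before sending it onward.
    # Queuing, propagation, and processing delays are ignored here.
    return sum(L_bits / R for R in rates_bps)

# Hypothetical values: an 8,000-bit packet, R1 = 1 Mbps, R2 = 2 Mbps.
# L/R1 + L/R2 = 8 ms + 4 ms = 12 ms.
delay = store_and_forward_delay(8_000, [1_000_000, 2_000_000])
```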
What motivates content providers to create these networks?

SECTION 1.4

R16. Consider sending a packet from a source host to a destination host
over a fixed route. List the delay components in the end-to-end delay.
Which of these delays are constant and which are variable?

R17. Visit the Transmission Versus Propagation Delay applet at the
companion Web site. Among the rates, propagation delay, and packet sizes
available, find a combination for which the sender finishes transmitting
before the first bit of the packet reaches the receiver. Find another
combination for which the first bit of the packet reaches the receiver
before the sender finishes transmitting.

R18. How long does it take a packet of length 1,000 bytes to propagate
over a link of distance 2,500 km, propagation speed 2.5⋅10^8 m/s, and
transmission rate 2 Mbps? More generally, how long does it take a packet
of length L to propagate over a link of distance d, propagation speed s,
and transmission rate R bps? Does this delay depend on packet length?
Does this delay depend on transmission rate?

R19. Suppose Host A wants to send a large file to Host B. The path from
Host A to Host B has three links, of rates R1=500 kbps, R2=2 Mbps, and
R3=1 Mbps.

a. Assuming no other traffic in the network, what is the throughput for
   the file transfer?

b. Suppose the file is 4 million bytes. Dividing the file size by the
   throughput, roughly how long will it take to transfer the file to
   Host B?

c. Repeat (a) and (b), but now with R2 reduced to 100 kbps.

R20. Suppose end system A wants to send a large file to end system B. At
a very high level, describe how end system A creates packets from the
file. When one of these packets arrives to a router, what information in
the packet does the router use to determine the link onto which the
packet is forwarded? Why is packet switching in the Internet analogous
to driving from one city to another and asking directions along the way?

R21.
Visit the Queuing and Loss applet at the companion Web site. What is the
maximum emission rate and the minimum transmission rate? With those
rates, what is the traffic intensity? Run the applet with these rates
and determine how long it takes for packet loss to occur. Then repeat
the experiment a second time and determine again how long it takes for
packet loss to occur. Are the values different? Why or why not?

SECTION 1.5

R22. List five tasks that a layer can perform. Is it possible that one
(or more) of these tasks could be performed by two (or more) layers?

R23. What are the five layers in the Internet protocol stack? What are
the principal responsibilities of each of these layers?

R24. What is an application-layer message? A transport-layer segment? A
network-layer datagram? A link-layer frame?

R25. Which layers in the Internet protocol stack does a router process?
Which layers does a link-layer switch process? Which layers does a host
process?

SECTION 1.6

R26. What is the difference between a virus and a worm?

R27. Describe how a botnet can be created and how it can be used for a
DDoS attack.

R28. Suppose Alice and Bob are sending packets to each other over a
computer network. Suppose Trudy positions herself in the network so that
she can capture all the packets sent by Alice and send whatever she
wants to Bob; she can also capture all the packets sent by Bob and send
whatever she wants to Alice. List some of the malicious things Trudy can
do from this position.

Problems

P1. Design and describe an application-level protocol to be used between
an automatic teller machine and a bank's centralized computer. Your
protocol should allow a user's card and password to be verified, the
account balance (which is maintained at the centralized computer) to be
queried, and an account withdrawal to be made (that is, money disbursed
to the user).
Your protocol entities should be able to handle the all-too-common case
in which there is not enough money in the account to cover the
withdrawal. Specify your protocol by listing the messages exchanged and
the action taken by the automatic teller machine or the bank's
centralized computer on transmission and receipt of messages. Sketch the
operation of your protocol for the case of a simple withdrawal with no
errors, using a diagram similar to that in Figure 1.2. Explicitly state
the assumptions made by your protocol about the underlying end-to-end
transport service.

P2. Equation 1.1 gives a formula for the end-to-end delay of sending one
packet of length L over N links of transmission rate R. Generalize this
formula for sending P such packets back-to-back over the N links.

P3. Consider an application that transmits data at a steady rate (for
example, the sender generates an N-bit unit of data every k time units,
where k is small and fixed). Also, when such an application starts, it
will continue running for a relatively long period of time. Answer the
following questions, briefly justifying your answer:

a. Would a packet-switched network or a circuit-switched network be more
   appropriate for this application? Why?

b. Suppose that a packet-switched network is used and the only traffic
   in this network comes from such applications as described above.
   Furthermore, assume that the sum of the application data rates is
   less than the capacities of each and every link. Is some form of
   congestion control needed? Why?

P4. Consider the circuit-switched network in Figure 1.13. Recall that
there are 4 circuits on each link. Label the four switches A, B, C, and
D, going in the clockwise direction.

a. What is the maximum number of simultaneous connections that can be in
   progress at any one time in this network?

b. Suppose that all connections are between switches A and C.
What is the maximum number of simultaneous connections that can be in
progress?

c. Suppose we want to make four connections between switches A and C,
   and another four connections between switches B and D. Can we route
   these calls through the four links to accommodate all eight
   connections?

P5. Review the car-caravan analogy in Section 1.4. Assume a propagation
speed of 100 km/hour.

a. Suppose the caravan travels 150 km, beginning in front of one
   tollbooth, passing through a second tollbooth, and finishing just
   after a third tollbooth. What is the end-to-end delay?

b. Repeat (a), now assuming that there are eight cars in the caravan
   instead of ten.

P6. This elementary problem begins to explore propagation delay and
transmission delay, two central concepts in data networking. Consider
two hosts, A and B, connected by a single link of rate R bps. Suppose
that the two hosts are separated by m meters, and suppose the
propagation speed along the link is s meters/sec. Host A is to send a
packet of size L bits to Host B.

Exploring propagation delay and transmission delay

a. Express the propagation delay, dprop, in terms of m and s.

b. Determine the transmission time of the packet, dtrans, in terms of L
   and R.

c. Ignoring processing and queuing delays, obtain an expression for the
   end-to-end delay.

d. Suppose Host A begins to transmit the packet at time t=0. At time
   t=dtrans, where is the last bit of the packet?

e. Suppose dprop is greater than dtrans. At time t=dtrans, where is the
   first bit of the packet?

f. Suppose dprop is less than dtrans. At time t=dtrans, where is the
   first bit of the packet?

g. Suppose s=2.5⋅10^8, L=120 bits, and R=56 kbps. Find the distance m so
   that dprop equals dtrans.

P7. In this problem, we consider sending real-time voice from Host A to
Host B over a packet-switched network (VoIP). Host A converts analog
voice to a digital 64 kbps bit stream on the fly.
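(Aside: the two quantities that P6 works with follow the standard Section 1.4 definitions: propagation delay is distance over propagation speed, and transmission delay is packet length over link rate. A small sketch with hypothetical numbers:)

```python
def d_prop(m_meters, s_mps):
    # Propagation delay: time for a bit to travel the length of the link.
    return m_meters / s_mps

def d_trans(L_bits, R_bps):
    # Transmission delay: time to push all L bits onto the link at rate R.
    return L_bits / R_bps

# Hypothetical link: 1,000 km long, s = 2.5e8 m/s, carrying a
# 10,000-bit packet at R = 1 Mbps.
p = d_prop(1_000_000, 2.5e8)    # m/s: 4 ms
t = d_trans(10_000, 1_000_000)  # L/R: 10 ms
```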
Host A then groups the bits into 56-byte packets. There is one link
between Hosts A and B; its transmission rate is 2 Mbps and its
propagation delay is 10 msec. As soon as Host A gathers a packet, it
sends it to Host B. As soon as Host B receives an entire packet, it
converts the packet's bits to an analog signal. How much time elapses
from the time a bit is created (from the original analog signal at Host
A) until the bit is decoded (as part of the analog signal at Host B)?

P8. Suppose users share a 3 Mbps link. Also suppose each user requires
150 kbps when transmitting, but each user transmits only 10 percent of
the time. (See the discussion of packet switching versus circuit
switching in Section 1.3.)

a. When circuit switching is used, how many users can be supported?

b. For the remainder of this problem, suppose packet switching is used.
   Find the probability that a given user is transmitting.

c. Suppose there are 120 users. Find the probability that at any given
   time, exactly n users are transmitting simultaneously. (Hint: Use the
   binomial distribution.)

d. Find the probability that there are 21 or more users transmitting
   simultaneously.

P9. Consider the discussion in Section 1.3 of packet switching versus
circuit switching in which an example is provided with a 1 Mbps link.
Users are generating data at a rate of 100 kbps when busy, but are busy
generating data only with probability p=0.1. Suppose that the 1 Mbps
link is replaced by a 1 Gbps link.

a. What is N, the maximum number of users that can be supported
   simultaneously under circuit switching?

b. Now consider packet switching and a user population of M users. Give
   a formula (in terms of p, M, N) for the probability that more than N
   users are sending data.

P10. Consider a packet of length L that begins at end system A and
travels over three links to a destination end system. These three links
are connected by two packet switches.
Let di, si, and Ri denote the length, propagation speed, and the
transmission rate of link i, for i=1,2,3. The packet switch delays each
packet by dproc. Assuming no queuing delays, in terms of di, si, Ri,
(i=1,2,3), and L, what is the total end-to-end delay for the packet?
Suppose now the packet is 1,500 bytes, the propagation speed on all
three links is 2.5⋅10^8 m/s, the transmission rates of all three links
are 2 Mbps, the packet switch processing delay is 3 msec, the length of
the first link is 5,000 km, the length of the second link is 4,000 km,
and the length of the last link is 1,000 km. For these values, what is
the end-to-end delay?

P11. In the above problem, suppose R1=R2=R3=R and dproc=0. Further
suppose the packet switch does not store-and-forward packets but instead
immediately transmits each bit it receives before waiting for the entire
packet to arrive. What is the end-to-end delay?

P12. A packet switch receives a packet and determines the outbound link
to which the packet should be forwarded. When the packet arrives, one
other packet is halfway done being transmitted on this outbound link and
four other packets are waiting to be transmitted. Packets are
transmitted in order of arrival. Suppose all packets are 1,500 bytes and
the link rate is 2 Mbps. What is the queuing delay for the packet? More
generally, what is the queuing delay when all packets have length L, the
transmission rate is R, x bits of the currently-being-transmitted packet
have been transmitted, and n packets are already in the queue?

P13.

a. Suppose N packets arrive simultaneously to a link at which no packets
   are currently being transmitted or queued. Each packet is of length L
   and the link has transmission rate R. What is the average queuing
   delay for the N packets?

b. Now suppose that N such packets arrive to the link every LN/R
   seconds. What is the average queuing delay of a packet?

P14.
Consider the queuing delay in a router buffer. Let I denote traffic
intensity; that is, I=La/R. Suppose that the queuing delay takes the
form IL/(R(1−I)) for I\<1.

a. Provide a formula for the total delay, that is, the queuing delay
   plus the transmission delay.

b. Plot the total delay as a function of L/R.

P15. Let a denote the rate of packets arriving at a link in packets/sec,
and let µ denote the link's transmission rate in packets/sec. Based on
the formula for the total delay (i.e., the queuing delay plus the
transmission delay) derived in the previous problem, derive a formula
for the total delay in terms of a and µ.

P16. Consider a router buffer preceding an outbound link. In this
problem, you will use Little's formula, a famous formula from queuing
theory. Let N denote the average number of packets in the buffer plus
the packet being transmitted. Let a denote the rate of packets arriving
at the link. Let d denote the average total delay (i.e., the queuing
delay plus the transmission delay) experienced by a packet. Little's
formula is N=a⋅d. Suppose that on average, the buffer contains 10
packets, and the average packet queuing delay is 10 msec. The link's
transmission rate is 100 packets/sec. Using Little's formula, what is
the average packet arrival rate, assuming there is no packet loss?

P17.

a. Generalize Equation 1.2 in Section 1.4.3 for heterogeneous processing
   rates, transmission rates, and propagation delays.

b. Repeat (a), but now also suppose that there is an average queuing
   delay of dqueue at each node.

P18. Perform a Traceroute between source and destination on the same
continent at three different hours of the day.

Using Traceroute to discover network paths and measure network delay

a. Find the average and standard deviation of the round-trip delays at
   each of the three hours.

b. Find the number of routers in the path at each of the three hours.
Did the paths change during any of the hours?

c. Try to identify the number of ISP networks that the Traceroute
   packets pass through from source to destination. Routers with similar
   names and/or similar IP addresses should be considered as part of the
   same ISP. In your experiments, do the largest delays occur at the
   peering interfaces between adjacent ISPs?

d. Repeat the above for a source and destination on different
   continents. Compare the intra-continent and inter-continent results.

P19.

a. Visit the site www.traceroute.org and perform traceroutes from two
   different cities in France to the same destination host in the United
   States. How many links are the same in the two traceroutes? Is the
   transatlantic link the same?

b. Repeat (a) but this time choose one city in France and another city
   in Germany.

c. Pick a city in the United States, and perform traceroutes to two
   hosts, each in a different city in China. How many links are common
   in the two traceroutes? Do the two traceroutes diverge before
   reaching China?

P20. Consider the throughput example corresponding to Figure 1.20(b).
Now suppose that there are M client-server pairs rather than 10. Denote
Rs, Rc, and R for the rates of the server links, client links, and
network link. Assume all other links have abundant capacity and that
there is no other traffic in the network besides the traffic generated
by the M client-server pairs. Derive a general expression for throughput
in terms of Rs, Rc, R, and M.

P21. Consider Figure 1.19(b). Now suppose that there are M paths between
the server and the client. No two paths share any link. Path
k(k=1,...,M) consists of N links with transmission rates
R1k,R2k,...,RNk. If the server can only use one path to send data to the
client, what is the maximum throughput that the server can achieve? If
the server can use all M paths to send data, what is the maximum
throughput that the server can achieve?

P22.
Consider Figure 1.19(b). Suppose that each link between the server and the client has a packet loss probability p, and the packet loss probabilities for these links are independent. What is the probability that a packet (sent by the server) is successfully received by the receiver? If a packet is lost in the path from the server to the client, then the server will re-transmit the packet. On average, how many times will the server re-transmit the packet in order for the client to successfully receive the packet?

P23. Consider Figure 1.19(a). Assume that we know the bottleneck link along the path from the server to the client is the first link with rate Rs bits/sec. Suppose we send a pair of packets back to back from the server to the client, and there is no other traffic on this path. Assume that each packet is L bits, and both links have the same propagation delay dprop.

a. What is the packet inter-arrival time at the destination? That is, how much time elapses from when the last bit of the first packet arrives until the last bit of the second packet arrives?

b. Now assume that the second link is the bottleneck link (i.e., Rc<Rs). Is it possible that the second packet queues at the input queue of the second link? Explain. Now suppose that the server sends the second packet T seconds after sending the first packet. How large must T be to ensure no queuing before the second link? Explain.

P24. Suppose you would like to urgently deliver 40 terabytes of data from Boston to Los Angeles. You have available a 100 Mbps dedicated link for data transfer. Would you prefer to transmit the data via this link or instead use FedEx overnight delivery? Explain.

P25. Suppose two hosts, A and B, are separated by 20,000 kilometers and are connected by a direct link of R=2 Mbps. Suppose the propagation speed over the link is 2.5⋅10^8 meters/sec.

a. Calculate the bandwidth-delay product, R⋅dprop.

b.
Consider sending a file of 800,000 bits from Host A to Host B. Suppose the file is sent continuously as one large message. What is the maximum number of bits that will be in the link at any given time?

c. Provide an interpretation of the bandwidth-delay product.

d. What is the width (in meters) of a bit in the link? Is it longer than a football field?

e. Derive a general expression for the width of a bit in terms of the propagation speed s, the transmission rate R, and the length of the link m.

P26. Referring to problem P25, suppose we can modify R. For what value of R is the width of a bit as long as the length of the link?

P27. Consider problem P25 but now with a link of R=1 Gbps.

a. Calculate the bandwidth-delay product, R⋅dprop.

b. Consider sending a file of 800,000 bits from Host A to Host B. Suppose the file is sent continuously as one big message. What is the maximum number of bits that will be in the link at any given time?

c. What is the width (in meters) of a bit in the link?

P28. Refer again to problem P25.

a. How long does it take to send the file, assuming it is sent continuously?

b. Suppose now the file is broken up into 20 packets with each packet containing 40,000 bits. Suppose that each packet is acknowledged by the receiver and the transmission time of an acknowledgment packet is negligible. Finally, assume that the sender cannot send a packet until the preceding one is acknowledged. How long does it take to send the file?

c. Compare the results from (a) and (b).

P29. Suppose there is a 10 Mbps microwave link between a geostationary satellite and its base station on Earth. Every minute the satellite takes a digital photo and sends it to the base station. Assume a propagation speed of 2.4⋅10^8 meters/sec.

a. What is the propagation delay of the link?

b. What is the bandwidth-delay product, R⋅dprop?

c. Let x denote the size of the photo.
What is the minimum value of x for the microwave link to be continuously transmitting?

P30. Consider the airline travel analogy in our discussion of layering in Section 1.5, and the addition of headers to protocol data units as they flow down the protocol stack. Is there an equivalent notion of header information that is added to passengers and baggage as they move down the airline protocol stack?

P31. In modern packet-switched networks, including the Internet, the source host segments long, application-layer messages (for example, an image or a music file) into smaller packets and sends the packets into the network. The receiver then reassembles the packets back into the original message. We refer to this process as message segmentation. Figure 1.27 illustrates the end-to-end transport of a message with and without message segmentation. Consider a message that is 8⋅10^6 bits long that is to be sent from source to destination in Figure 1.27. Suppose each link in the figure is 2 Mbps. Ignore propagation, queuing, and processing delays.

a. Consider sending the message from source to destination without message segmentation. How long does it take to move the message from the source host to the first packet switch? Keeping in mind that each switch uses store-and-forward packet switching, what is the total time to move the message from source host to destination host?

b. Now suppose that the message is segmented into 800 packets, with each packet being 10,000 bits long. How long does it take to move the first packet from source host to the first switch? When the first packet is being sent from the first switch to the second switch, the second packet is being sent from the source host to the first switch. At what time will the second packet be fully received at the first switch?

c. How long does it take to move the file from source host to destination host when message segmentation is used?
Compare this result with your answer in part (a) and comment.

Figure 1.27 End-to-end message transport: (a) without message segmentation; (b) with message segmentation

d. In addition to reducing delay, what are reasons to use message segmentation?

e. Discuss the drawbacks of message segmentation.

P32. Experiment with the Message Segmentation applet at the book's Web site. Do the delays in the applet correspond to the delays in the previous problem? How do link propagation delays affect the overall end-to-end delay for packet switching (with message segmentation) and for message switching?

P33. Consider sending a large file of F bits from Host A to Host B. There are three links (and two switches) between A and B, and the links are uncongested (that is, no queuing delays). Host A segments the file into segments of S bits each and adds 80 bits of header to each segment, forming packets of L=80+S bits. Each link has a transmission rate of R bps. Find the value of S that minimizes the delay of moving the file from Host A to Host B. Disregard propagation delay.

P34. Skype offers a service that allows you to make a phone call from a PC to an ordinary phone. This means that the voice call must pass through both the Internet and through a telephone network. Discuss how this might be done.

Wireshark Lab

"Tell me and I forget. Show me and I remember. Involve me and I understand." Chinese proverb

One's understanding of network protocols can often be greatly deepened by seeing them in action and by playing around with them---observing the sequence of messages exchanged between two protocol entities, delving into the details of protocol operation, causing protocols to perform certain actions, and observing these actions and their consequences. This can be done in simulated scenarios or in a real network environment such as the Internet. The Java applets at the textbook Web site take the first approach.
In the Wireshark labs, we'll take the latter approach. You'll run network applications in various scenarios using a computer on your desk, at home, or in a lab. You'll observe the network protocols in your computer, interacting and exchanging messages with protocol entities executing elsewhere in the Internet. Thus, you and your computer will be an integral part of these live labs. You'll observe---and you'll learn---by doing. The basic tool for observing the messages exchanged between executing protocol entities is called a packet sniffer. As the name suggests, a packet sniffer passively copies (sniffs) messages being sent from and received by your computer; it also displays the contents of the various protocol fields of these captured messages. A screenshot of the Wireshark packet sniffer is shown in Figure 1.28. Wireshark is a free packet sniffer that runs on Windows, Linux/Unix, and Mac computers.

Figure 1.28 A Wireshark screenshot (Wireshark screenshot reprinted by permission of the Wireshark Foundation.)

Throughout the textbook, you will find Wireshark labs that allow you to explore a number of the protocols studied in the chapter. In this first Wireshark lab, you'll obtain and install a copy of Wireshark, access a Web site, and capture and examine the protocol messages being exchanged between your Web browser and the Web server. You can find full details about this first Wireshark lab (including instructions about how to obtain and install Wireshark) at the Web site http://www.pearsonhighered.com/csresources/.

AN INTERVIEW WITH... Leonard Kleinrock

Leonard Kleinrock is a professor of computer science at the University of California, Los Angeles. In 1969, his computer at UCLA became the first node of the Internet. His creation of packet-switching principles in 1961 became the technology behind the Internet. He received his B.E.E.
from the City College of New York (CCNY) and his master's and PhD in electrical engineering from MIT.

What made you decide to specialize in networking/Internet technology? As a PhD student at MIT in 1959, I looked around and found that most of my classmates were doing research in the area of information theory and coding theory. At MIT, there was the great researcher, Claude Shannon, who had launched these fields and had solved most of the important problems already. The research problems that were left were hard and of lesser consequence. So I decided to launch out in a new area that no one else had yet conceived of. Remember that at MIT I was surrounded by lots of computers, and it was clear to me that soon these machines would need to communicate with each other. At the time, there was no effective way for them to do so, so I decided to develop the technology that would permit efficient and reliable data networks to be created.

What was your first job in the computer industry? What did it entail? I went to the evening session at CCNY from 1951 to 1957 for my bachelor's degree in electrical engineering. During the day, I worked first as a technician and then as an engineer at a small, industrial electronics firm called Photobell. While there, I introduced digital technology to their product line. Essentially, we were using photoelectric devices to detect the presence of certain items (boxes, people, etc.), and the use of a circuit known then as a bistable multivibrator was just the kind of technology we needed to bring digital processing into this field of detection. These circuits happen to be the building blocks for computers, and have come to be known as flip-flops or switches in today's vernacular.

What was going through your mind when you sent the first host-to-host message (from UCLA to the Stanford Research Institute)? Frankly, we had no idea of the importance of that event.
We had not prepared a special message of historic significance, as did so many inventors of the past (Samuel Morse with "What hath God wrought." or Alexander Graham Bell with "Watson, come here! I want you." or Neil Armstrong with "That's one small step for a man, one giant leap for mankind.") Those guys were smart! They understood media and public relations. All we wanted to do was to login to the SRI computer. So we typed the "L", which was correctly received, we typed the "o" which was received, and then we typed the "g" which caused the SRI host computer to crash! So, it turned out that our message was the shortest and perhaps the most prophetic message ever, namely "Lo!" as in "Lo and behold!" Earlier that year, I was quoted in a UCLA press release saying that once the network was up and running, it would be possible to gain access to computer utilities from our homes and offices as easily as we gain access to electricity and telephone connectivity. So my vision at that time was that the Internet would be ubiquitous, always on, always available, anyone with any device could connect from any location, and it would be invisible. However, I never anticipated that my 99-year-old mother would use the Internet---and indeed she did!

What is your vision for the future of networking? The easy part of the vision is to predict the infrastructure itself. I anticipate that we'll see considerable deployment of nomadic computing, mobile devices, and smart spaces. Indeed, the availability of lightweight, inexpensive, high-performance, portable computing, and communication devices (plus the ubiquity of the Internet) has enabled us to become nomads. Nomadic computing refers to the technology that enables end users who travel from place to place to gain access to Internet services in a transparent fashion, no matter where they travel and no matter what device they carry or gain access to.
The harder part of the vision is to predict the applications and services, which have consistently surprised us in dramatic ways (e-mail, search technologies, the World Wide Web, blogs, social networks, user generation, and sharing of music, photos, and videos, etc.). We are on the verge of a new class of surprising and innovative mobile applications delivered to our hand-held devices. The next step will enable us to move out from the netherworld of cyberspace to the physical world of smart spaces. Our environments (desks, walls, vehicles, watches, belts, and so on) will come alive with technology, through actuators, sensors, logic, processing, storage, cameras, microphones, speakers, displays, and communication. This embedded technology will allow our environment to provide the IP services we want. When I walk into a room, the room will know I entered. I will be able to communicate with my environment naturally, as in spoken English; my requests will generate replies that present Web pages to me from wall displays, through my eyeglasses, as speech, holograms, and so forth. Looking a bit further out, I see a networking future that includes the following additional key components. I see intelligent software agents deployed across the network whose function it is to mine data, act on that data, observe trends, and carry out tasks dynamically and adaptively. I see considerably more network traffic generated not so much by humans, but by these embedded devices and these intelligent software agents. I see large collections of self-organizing systems controlling this vast, fast network. I see huge amounts of information flashing across this network instantaneously with this information undergoing enormous processing and filtering. The Internet will essentially be a pervasive global nervous system. I see all these things and more as we move headlong through the twenty-first century.

What people have inspired you professionally?
By far, it was Claude Shannon from MIT, a brilliant researcher who had the ability to relate his mathematical ideas to the physical world in highly intuitive ways. He was on my PhD thesis committee.

Do you have any advice for students entering the networking/Internet field? The Internet and all that it enables is a vast new frontier, full of amazing challenges. There is room for great innovation. Don't be constrained by today's technology. Reach out and imagine what could be and then make it happen.

Chapter 2 Application Layer

Network applications are the raisons d'être of a computer network---if we couldn't conceive of any useful applications, there wouldn't be any need for networking infrastructure and protocols to support them. Since the Internet's inception, numerous useful and entertaining applications have indeed been created. These applications have been the driving force behind the Internet's success, motivating people in homes, schools, governments, and businesses to make the Internet an integral part of their daily activities. Internet applications include the classic text-based applications that became popular in the 1970s and 1980s: text e-mail, remote access to computers, file transfers, and newsgroups. They include the killer application of the mid-1990s, the World Wide Web, encompassing Web surfing, search, and electronic commerce. They include instant messaging and P2P file sharing, the two killer applications introduced at the end of the millennium. In the new millennium, new and highly compelling applications continue to emerge, including voice over IP and video conferencing such as Skype, Facetime, and Google Hangouts; user-generated video such as YouTube and movies on demand such as Netflix; and multiplayer online games such as Second Life and World of Warcraft.
During this same period, we have seen the emergence of a new generation of social networking applications---such as Facebook, Instagram, Twitter, and WeChat---which have created engaging human networks on top of the Internet's network of routers and communication links. And most recently, along with the arrival of the smartphone, there has been a profusion of location-based mobile apps, including popular check-in, dating, and road-traffic forecasting apps (such as Yelp, Tinder, Waze, and Yik Yak). Clearly, there has been no slowing down of new and exciting Internet applications. Perhaps some of the readers of this text will create the next generation of killer Internet applications! In this chapter we study the conceptual and implementation aspects of network applications. We begin by defining key application-layer concepts, including network services required by applications, clients and servers, processes, and transport-layer interfaces. We examine several network applications in detail, including the Web, e-mail, DNS, peer-to-peer (P2P) file distribution, and video streaming. (Chapter 9 will further examine multimedia applications, including streaming video and VoIP.) We then cover network application development, over both TCP and UDP. In particular, we study the socket interface and walk through some simple client-server applications in Python. We also provide several fun and interesting socket programming assignments at the end of the chapter.

The application layer is a particularly good place to start our study of protocols. It's familiar ground. We're acquainted with many of the applications that rely on the protocols we'll study. It will give us a good feel for what protocols are all about and will introduce us to many of the same issues that we'll see again when we study transport, network, and link layer protocols.

2.1 Principles of Network Applications

Suppose you have an idea for a new network application.
Perhaps this application will be a great service to humanity, or will please your professor, or will bring you great wealth, or will simply be fun to develop. Whatever the motivation may be, let's now examine how you transform the idea into a real-world network application. At the core of network application development is writing programs that run on different end systems and communicate with each other over the network. For example, in the Web application there are two distinct programs that communicate with each other: the browser program running in the user's host (desktop, laptop, tablet, smartphone, and so on); and the Web server program running in the Web server host. As another example, in a P2P file-sharing system there is a program in each host that participates in the file-sharing community. In this case, the programs in the various hosts may be similar or identical. Thus, when developing your new application, you need to write software that will run on multiple end systems. This software could be written, for example, in C, Java, or Python. Importantly, you do not need to write software that runs on network-core devices, such as routers or link-layer switches. Even if you wanted to write application software for these network-core devices, you wouldn't be able to do so. As we learned in Chapter 1, and as shown earlier in Figure 1.24, network-core devices do not function at the application layer but instead function at lower layers---specifically at the network layer and below. This basic design---namely, confining application software to the end systems---as shown in Figure 2.1, has facilitated the rapid development and deployment of a vast array of network applications.

Figure 2.1 Communication for a network application takes place between end systems at the application layer

2.1.1 Network Application Architectures

Before diving into software coding, you should have a broad architectural plan for your application.
Keep in mind that an application's architecture is distinctly different from the network architecture (e.g., the five-layer Internet architecture discussed in Chapter 1). From the application developer's perspective, the network architecture is fixed and provides a specific set of services to applications. The application architecture, on the other hand, is designed by the application developer and dictates how the application is structured over the various end systems. In choosing the application architecture, an application developer will likely draw on one of the two predominant architectural paradigms used in modern network applications: the client-server architecture or the peer-to-peer (P2P) architecture. In a client-server architecture, there is an always-on host, called the server, which services requests from many other hosts, called clients. A classic example is the Web application for which an always-on Web server services requests from browsers running on client hosts. When a Web server receives a request for an object from a client host, it responds by sending the requested object to the client host. Note that with the client-server architecture, clients do not directly communicate with each other; for example, in the Web application, two browsers do not directly communicate. Another characteristic of the client-server architecture is that the server has a fixed, well-known address, called an IP address (which we'll discuss soon). Because the server has a fixed, well-known address, and because the server is always on, a client can always contact the server by sending a packet to the server's IP address. Some of the better-known applications with a client-server architecture include the Web, FTP, Telnet, and e-mail. The client-server architecture is shown in Figure 2.2(a). Often in a client-server application, a single-server host is incapable of keeping up with all the requests from clients.
For example, a popular social-networking site can quickly become overwhelmed if it has only one server handling all of its requests. For this reason, a data center, housing a large number of hosts, is often used to create a powerful virtual server. The most popular Internet services---such as search engines (e.g., Google, Bing, Baidu), Internet commerce (e.g., Amazon, eBay, Alibaba), Web-based e-mail (e.g., Gmail and Yahoo Mail), and social networking (e.g., Facebook, Instagram, Twitter, and WeChat)---employ one or more data centers. As discussed in Section 1.3.3, Google has 30 to 50 data centers distributed around the world, which collectively handle search, YouTube, Gmail, and other services. A data center can have hundreds of thousands of servers, which must be powered and maintained. Additionally, the service providers must pay recurring interconnection and bandwidth costs for sending data from their data centers. In a P2P architecture, there is minimal (or no) reliance on dedicated servers in data centers. Instead the application exploits direct communication between pairs of intermittently connected hosts, called peers. The peers are not owned by the service provider, but are instead desktops and laptops controlled by users, with most of the

Figure 2.2 (a) Client-server architecture; (b) P2P architecture

peers residing in homes, universities, and offices. Because the peers communicate without passing through a dedicated server, the architecture is called peer-to-peer. Many of today's most popular and traffic-intensive applications are based on P2P architectures. These applications include file sharing (e.g., BitTorrent), peer-assisted download acceleration (e.g., Xunlei), and Internet telephony and video conference (e.g., Skype). The P2P architecture is illustrated in Figure 2.2(b). We mention that some applications have hybrid architectures, combining both client-server and P2P elements.
For example, for many instant messaging applications, servers are used to track the IP addresses of users, but user-to-user messages are sent directly between user hosts (without passing through intermediate servers). One of the most compelling features of P2P architectures is their self-scalability. For example, in a P2P file-sharing application, although each peer generates workload by requesting files, each peer also adds service capacity to the system by distributing files to other peers. P2P architectures are also cost effective, since they normally don't require significant server infrastructure and server bandwidth (in contrast with client-server designs with data centers). However, P2P applications face challenges of security, performance, and reliability due to their highly decentralized structure.

2.1.2 Processes Communicating

Before building your network application, you also need a basic understanding of how the programs, running in multiple end systems, communicate with each other. In the jargon of operating systems, it is not actually programs but processes that communicate. A process can be thought of as a program that is running within an end system. When processes are running on the same end system, they can communicate with each other with interprocess communication, using rules that are governed by the end system's operating system. But in this book we are not particularly interested in how processes in the same host communicate, but instead in how processes running on different hosts (with potentially different operating systems) communicate. Processes on two different end systems communicate with each other by exchanging messages across the computer network. A sending process creates and sends messages into the network; a receiving process receives these messages and possibly responds by sending messages back.
Figure 2.1 illustrates that processes communicating with each other reside in the application layer of the five-layer protocol stack.

Client and Server Processes

A network application consists of pairs of processes that send messages to each other over a network. For example, in the Web application a client browser process exchanges messages with a Web server process. In a P2P file-sharing system, a file is transferred from a process in one peer to a process in another peer. For each pair of communicating processes, we typically label one of the two processes as the client and the other process as the server. With the Web, a browser is a client process and a Web server is a server process. With P2P file sharing, the peer that is downloading the file is labeled as the client, and the peer that is uploading the file is labeled as the server. You may have observed that in some applications, such as in P2P file sharing, a process can be both a client and a server. Indeed, a process in a P2P file-sharing system can both upload and download files. Nevertheless, in the context of any given communication session between a pair of processes, we can still label one process as the client and the other process as the server. We define the client and server processes as follows: In the context of a communication session between a pair of processes, the process that initiates the communication (that is, initially contacts the other process at the beginning of the session) is labeled as the client. The process that waits to be contacted to begin the session is the server. In the Web, a browser process initiates contact with a Web server process; hence the browser process is the client and the Web server process is the server. In P2P file sharing, when Peer A asks Peer B to send a specific file, Peer A is the client and Peer B is the server in the context of this specific communication session.
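The dual role of a P2P process can be made concrete in a few lines of Python (the language this chapter later uses for its socket examples). The sketch below is not from the book; the message format and file name are illustrative assumptions. A single process listens for incoming connections (its server role) while also being able to initiate a connection to a peer (its client role):

```python
# A minimal sketch (illustrative, not the book's example) of one process
# playing both roles: it accepts an incoming connection (server role) and
# initiates an outgoing one (client role).
import socket
import threading

def serve_once(listener: socket.socket) -> None:
    """Server role: wait to be contacted, then answer the peer's request."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024)            # e.g., the name of a wanted file
        conn.sendall(b"SENDING " + request)

def fetch(host: str, port: int, filename: bytes) -> bytes:
    """Client role: initiate contact with a peer and request a file."""
    with socket.create_connection((host, port)) as s:
        s.sendall(filename)
        return s.recv(1024)

# Demonstration on the local host: the same program acts as client and server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))              # port 0: let the OS pick a free port
listener.listen()
port = listener.getsockname()[1]

t = threading.Thread(target=serve_once, args=(listener,))
t.start()
reply = fetch("127.0.0.1", port, b"song.mp3")
t.join()
listener.close()
print(reply.decode())
```

Here both roles run inside one process on the local host for simplicity; in a real P2P application the two endpoints would be different hosts, and the client/server labels would apply only for the duration of that one session.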
When there's no confusion, we'll sometimes also use the terminology "client side and server side of an application." At the end of this chapter, we'll step through simple code for both the client and server sides of network applications.

The Interface Between the Process and the Computer Network

As noted above, most applications consist of pairs of communicating processes, with the two processes in each pair sending messages to each other. Any message sent from one process to another must go through the underlying network. A process sends messages into, and receives messages from, the network through a software interface called a socket. Let's consider an analogy to help us understand processes and sockets. A process is analogous to a house and its socket is analogous to its door. When a process wants to send a message to another process on another host, it shoves the message out its door (socket). This sending process assumes that there is a transportation infrastructure on the other side of its door that will transport the message to the door of the destination process. Once the message arrives at the destination host, the message passes through the receiving process's door (socket), and the receiving process then acts on the message. Figure 2.3 illustrates socket communication between two processes that communicate over the Internet. (Figure 2.3 assumes that the underlying transport protocol used by the processes is the Internet's TCP protocol.) As shown in this figure, a socket is the interface between the application layer and the transport layer within a host. It is also referred to as the Application Programming Interface (API) between the application and the network, since the socket is the programming interface with which network applications are built. The application developer has control of everything on the application-layer side of the socket but has little control of the transport-layer side of the socket.
The only control that the application developer has on the transport-layer side is (1) the choice of transport protocol and (2) perhaps the ability to fix a few transport-layer parameters such as maximum buffer and maximum segment sizes (to be covered in Chapter 3). Once the application developer chooses a transport protocol (if a choice is available), the application is built using the transport-layer services provided by that protocol. We'll explore sockets in some detail in Section 2.7.

Addressing Processes

In order to send postal mail to a particular destination, the destination needs to have an address. Similarly, in order for a process running on one host to send packets to a process running on another host, the receiving process needs to have an address.

Figure 2.3 Application processes, sockets, and underlying transport protocol

To identify the receiving process, two pieces of information need to be specified: (1) the address of the host and (2) an identifier that specifies the receiving process in the destination host. In the Internet, the host is identified by its IP address. We'll discuss IP addresses in great detail in Chapter 4. For now, all we need to know is that an IP address is a 32-bit quantity that we can think of as uniquely identifying the host. In addition to knowing the address of the host to which a message is destined, the sending process must also identify the receiving process (more specifically, the receiving socket) running in the host. This information is needed because in general a host could be running many network applications. A destination port number serves this purpose. Popular applications have been assigned specific port numbers. For example, a Web server is identified by port number 80. A mail server process (using the SMTP protocol) is identified by port number 25. A list of well-known port numbers for all Internet standard protocols can be found at www.iana.org.
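The two knobs just described, plus process addressing by (IP address, port number), can be previewed with a small Python sketch (sockets are covered properly in Section 2.7; this is not the book's example). The socket constants and options used are standard, but the specific buffer size and destination are illustrative assumptions:

```python
# A minimal sketch (illustrative assumptions) of the developer's control on the
# transport-layer side of a socket: (1) choosing the transport protocol and
# (2) fixing a parameter such as the receive-buffer size.
import socket

# (1) Choice of transport protocol: SOCK_STREAM selects TCP, SOCK_DGRAM selects UDP.
tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# (2) Fixing a transport-layer parameter: request a receive-buffer size.
# The OS may adjust the requested value, so we read back what was granted.
tcp_sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 65536)
granted = tcp_sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

# A destination process is identified by (host address, port number); the
# well-known port for a Web server is 80 (25 for an SMTP mail server).
web_server = ("example.com", 80)     # illustrative destination; no data is sent

tcp_sock.close()
udp_sock.close()
```

No connection is made here; the point is only that everything else about how the data is carried, once it passes through the socket, is in the hands of the transport-layer protocol rather than the application developer.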
We'll examine port numbers in detail in Chapter 3.

2.1.3 Transport Services Available to Applications

Recall that a socket is the interface between the application process and the transport-layer protocol. The application at the sending side pushes messages through the socket. At the other side of the socket, the transport-layer protocol has the responsibility of getting the messages to the socket of the receiving process. Many networks, including the Internet, provide more than one transport-layer protocol. When you develop an application, you must choose one of the available transport-layer protocols. How do you make this choice? Most likely, you would study the services provided by the available transport-layer protocols, and then pick the protocol with the services that best match your application's needs. The situation is similar to choosing either train or airplane transport for travel between two cities. You have to choose one or the other, and each transportation mode offers different services. (For example, the train offers downtown pickup and drop-off, whereas the plane offers shorter travel time.)

What are the services that a transport-layer protocol can offer to applications invoking it? We can broadly classify the possible services along four dimensions: reliable data transfer, throughput, timing, and security.

Reliable Data Transfer

As discussed in Chapter 1, packets can get lost within a computer network. For example, a packet can overflow a buffer in a router, or can be discarded by a host or router after having some of its bits corrupted. For many applications---such as electronic mail, file transfer, remote host access, Web document transfers, and financial applications---data loss can have devastating consequences (in the latter case, for either the bank or the customer!).
Thus, to support these applications, something has to be done to guarantee that the data sent by one end of the application is delivered correctly and completely to the other end of the application. If a protocol provides such a guaranteed data delivery service, it is said to provide reliable data transfer. One important service that a transport-layer protocol can potentially provide to an application is process-to-process reliable data transfer. When a transport protocol provides this service, the sending process can just pass its data into the socket and know with complete confidence that the data will arrive without errors at the receiving process. When a transport-layer protocol doesn't provide reliable data transfer, some of the data sent by the sending process may never arrive at the receiving process. This may be acceptable for loss-tolerant applications, most notably multimedia applications such as conversational audio/video that can tolerate some amount of data loss. In these multimedia applications, lost data might result in a small glitch in the audio/video---not a crucial impairment.

Throughput

In Chapter 1 we introduced the concept of available throughput, which, in the context of a communication session between two processes along a network path, is the rate at which the sending process can deliver bits to the receiving process. Because other sessions will be sharing the bandwidth along the network path, and because these other sessions will be coming and going, the available throughput can fluctuate with time. These observations lead to another natural service that a transport-layer protocol could provide, namely, guaranteed available throughput at some specified rate. With such a service, the application could request a guaranteed throughput of r bits/sec, and the transport protocol would then ensure that the available throughput is always at least r bits/sec.
Such a guaranteed throughput service would appeal to many applications. For example, if an Internet telephony application encodes voice at 32 kbps, it needs to send data into the network and have data delivered to the receiving application at this rate. If the transport protocol cannot provide this throughput, the application would need to encode at a lower rate (and receive enough throughput to sustain this lower coding rate) or may have to give up, since receiving, say, half of the needed throughput is of little or no use to this Internet telephony application. Applications that have throughput requirements are said to be bandwidth-sensitive applications. Many current multimedia applications are bandwidth sensitive, although some multimedia applications may use adaptive coding techniques to encode digitized voice or video at a rate that matches the currently available throughput. While bandwidth-sensitive applications have specific throughput requirements, elastic applications can make use of as much, or as little, throughput as happens to be available. Electronic mail, file transfer, and Web transfers are all elastic applications. Of course, the more throughput, the better. There's an adage that says that one cannot be too rich, too thin, or have too much throughput!

Timing

A transport-layer protocol can also provide timing guarantees. As with throughput guarantees, timing guarantees can come in many shapes and forms. An example guarantee might be that every bit that the sender pumps into the socket arrives at the receiver's socket no more than 100 msec later. Such a service would be appealing to interactive real-time applications, such as Internet telephony, virtual environments, teleconferencing, and multiplayer games, all of which require tight timing constraints on data delivery in order to be effective. (See Chapter 9, \[Gauthier 1999; Ramjee 1994\].)
Long delays in Internet telephony, for example, tend to result in unnatural pauses in the conversation; in a multiplayer game or virtual interactive environment, a long delay between taking an action and seeing the response from the environment (for example, from another player at the end of an end-to-end connection) makes the application feel less realistic. For non-real-time applications, lower delay is always preferable to higher delay, but no tight constraint is placed on the end-to-end delays.

Security

Finally, a transport protocol can provide an application with one or more security services. For example, in the sending host, a transport protocol can encrypt all data transmitted by the sending process, and in the receiving host, the transport-layer protocol can decrypt the data before delivering the data to the receiving process. Such a service would provide confidentiality between the two processes, even if the data is somehow observed between sending and receiving processes. A transport protocol can also provide other security services in addition to confidentiality, including data integrity and end-point authentication, topics that we'll cover in detail in Chapter 8.

2.1.4 Transport Services Provided by the Internet

Up until this point, we have been considering transport services that a computer network could provide in general. Let's now get more specific and examine the type of transport services provided by the Internet. The Internet (and, more generally, TCP/IP networks) makes two transport protocols available to applications, UDP and TCP. When you (as an application developer) create a new network application for the Internet, one of the first decisions you have to make is whether to use UDP or TCP. Each of these protocols offers a different set of services to the invoking applications. Figure 2.4 shows the service requirements for some selected applications.
TCP Services

The TCP service model includes a connection-oriented service and a reliable data transfer service. When an application invokes TCP as its transport protocol, the application receives both of these services from TCP.

Connection-oriented service. TCP has the client and server exchange transport-layer control information with each other before the application-level messages begin to flow. This so-called handshaking procedure alerts the client and server, allowing them to prepare for an onslaught of packets. After the handshaking phase, a TCP connection is said to exist between the sockets of the two processes. The connection is a full-duplex connection in that the two processes can send messages to each other over the connection at the same time. When the application finishes sending messages, it must tear down the connection. In Chapter 3 we'll discuss connection-oriented service in detail and examine how it is implemented.

Figure 2.4 Requirements of selected network applications

Reliable data transfer service. The communicating processes can rely on TCP to deliver all data sent without error and in the proper order. When one side of the application passes a stream of bytes into a socket, it can count on TCP to deliver the same stream of bytes to the receiving socket, with no missing or duplicate bytes.

TCP also includes a congestion-control mechanism, a service for the general welfare of the Internet rather than for the direct benefit of the communicating processes. The TCP congestion-control mechanism throttles a sending process (client or server) when the network is congested between sender and receiver. As we will see in Chapter 3, TCP congestion control also attempts to limit each TCP connection to its fair share of network bandwidth.

FOCUS ON SECURITY

SECURING TCP

Neither TCP nor UDP provides any encryption---the data that the sending process passes into its socket is the same data that travels over the network to the destination process. So, for example, if the sending process sends a password in cleartext (i.e., unencrypted) into its socket, the cleartext password will travel over all the links between sender and receiver, potentially getting sniffed and discovered at any of the intervening links. Because privacy and other security issues have become critical for many applications, the Internet community has developed an enhancement for TCP, called Secure Sockets Layer (SSL). TCP-enhanced-with-SSL not only does everything that traditional TCP does but also provides critical process-to-process security services, including encryption, data integrity, and end-point authentication. We emphasize that SSL is not a third Internet transport protocol, on the same level as TCP and UDP, but instead is an enhancement of TCP, with the enhancements being implemented in the application layer. In particular, if an application wants to use the services of SSL, it needs to include SSL code (existing, highly optimized libraries and classes) in both the client and server sides of the application. SSL has its own socket API that is similar to the traditional TCP socket API. When an application uses SSL, the sending process passes cleartext data to the SSL socket; SSL in the sending host then encrypts the data and passes the encrypted data to the TCP socket. The encrypted data travels over the Internet to the TCP socket in the receiving process. The receiving socket passes the encrypted data to SSL, which decrypts the data. Finally, SSL passes the cleartext data through its SSL socket to the receiving process. We'll cover SSL in some detail in Chapter 8.

UDP Services

UDP is a no-frills, lightweight transport protocol, providing minimal services. UDP is connectionless, so there is no handshaking before the two processes start to communicate.
UDP provides an unreliable data transfer service---that is, when a process sends a message into a UDP socket, UDP provides no guarantee that the message will ever reach the receiving process. Furthermore, messages that do arrive at the receiving process may arrive out of order. UDP does not include a congestion-control mechanism, so the sending side of UDP can pump data into the layer below (the network layer) at any rate it pleases. (Note, however, that the actual end-to-end throughput may be less than this rate due to the limited transmission capacity of intervening links or due to congestion.)

Services Not Provided by Internet Transport Protocols

We have organized transport protocol services along four dimensions: reliable data transfer, throughput, timing, and security. Which of these services are provided by TCP and UDP? We have already noted that TCP provides reliable end-to-end data transfer. And we also know that TCP can be easily enhanced at the application layer with SSL to provide security services. But in our brief description of TCP and UDP, conspicuously missing was any mention of throughput or timing guarantees---services not provided by today's Internet transport protocols. Does this mean that time-sensitive applications such as Internet telephony cannot run in today's Internet? The answer is clearly no---the Internet has been hosting time-sensitive applications for many years. These applications often work fairly well because they have been designed to cope, to the greatest extent possible, with this lack of guarantee. We'll investigate several of these design tricks in Chapter 9. Nevertheless, clever design has its limitations when delay is excessive, or the end-to-end throughput is limited. In summary, today's Internet can often provide satisfactory service to time-sensitive applications, but it cannot provide any timing or throughput guarantees.
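The connectionless UDP service model described above can be sketched locally with two UDP sockets: no handshake, with each datagram individually addressed. Port 12001 is an arbitrary choice for this illustration (over the loopback interface delivery happens to succeed; over a real network UDP makes no such guarantee):

```python
import socket

# The receiver just binds a port; there is no accept() and no connection.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 12001))

# The sender addresses each datagram explicitly -- no handshake first.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", ("127.0.0.1", 12001))

data, addr = receiver.recvfrom(2048)
print(data)    # b'hello'
sender.close()
receiver.close()
```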
Figure 2.5 indicates the transport protocols used by some popular Internet applications. We see that email, remote terminal access, the Web, and file transfer all use TCP. These applications have chosen TCP primarily because TCP provides reliable data transfer, guaranteeing that all data will eventually get to its destination. Because Internet telephony applications (such as Skype) can often tolerate some loss but require a minimal rate to be effective, developers of Internet telephony applications usually prefer to run their applications over UDP, thereby circumventing TCP's congestion control mechanism and packet overheads. But because many firewalls are configured to block (most types of) UDP traffic, Internet telephony applications often are designed to use TCP as a backup if UDP communication fails.

Figure 2.5 Popular Internet applications, their application-layer protocols, and their underlying transport protocols

2.1.5 Application-Layer Protocols

We have just learned that network processes communicate with each other by sending messages into sockets. But how are these messages structured? What are the meanings of the various fields in the messages? When do the processes send the messages? These questions bring us into the realm of application-layer protocols. An application-layer protocol defines how an application's processes, running on different end systems, pass messages to each other. In particular, an application-layer protocol defines:

- The types of messages exchanged, for example, request messages and response messages
- The syntax of the various message types, such as the fields in the message and how the fields are delineated
- The semantics of the fields, that is, the meaning of the information in the fields
- Rules for determining when and how a process sends messages and responds to messages

Some application-layer protocols are specified in RFCs and are therefore in the public domain.
For example, the Web's application-layer protocol, HTTP (the HyperText Transfer Protocol \[RFC 2616\]), is available as an RFC. If a browser developer follows the rules of the HTTP RFC, the browser will be able to retrieve Web pages from any Web server that has also followed the rules of the HTTP RFC. Many other application-layer protocols are proprietary and intentionally not available in the public domain. For example, Skype uses proprietary application-layer protocols.

It is important to distinguish between network applications and application-layer protocols. An application-layer protocol is only one piece of a network application (albeit, a very important piece of the application from our point of view!). Let's look at a couple of examples. The Web is a client-server application that allows users to obtain documents from Web servers on demand. The Web application consists of many components, including a standard for document formats (that is, HTML), Web browsers (for example, Firefox and Microsoft Internet Explorer), Web servers (for example, Apache and Microsoft servers), and an application-layer protocol. The Web's application-layer protocol, HTTP, defines the format and sequence of messages exchanged between browser and Web server. Thus, HTTP is only one piece (albeit, an important piece) of the Web application. As another example, an Internet e-mail application also has many components, including mail servers that house user mailboxes; mail clients (such as Microsoft Outlook) that allow users to read and create messages; a standard for defining the structure of an e-mail message; and application-layer protocols that define how messages are passed between servers, how messages are passed between servers and mail clients, and how the contents of message headers are to be interpreted. The principal application-layer protocol for electronic mail is SMTP (Simple Mail Transfer Protocol) \[RFC 5321\].
Thus, e-mail's principal application-layer protocol, SMTP, is only one piece (albeit an important piece) of the e-mail application.

2.1.6 Network Applications Covered in This Book

New public domain and proprietary Internet applications are being developed every day. Rather than covering a large number of Internet applications in an encyclopedic manner, we have chosen to focus on a small number of applications that are both pervasive and important. In this chapter we discuss five important applications: the Web, electronic mail, directory service, video streaming, and P2P applications. We first discuss the Web, not only because it is an enormously popular application, but also because its application-layer protocol, HTTP, is straightforward and easy to understand. We then discuss electronic mail, the Internet's first killer application. E-mail is more complex than the Web in the sense that it makes use of not one but several application-layer protocols. After e-mail, we cover DNS, which provides a directory service for the Internet. Most users do not interact with DNS directly; instead, users invoke DNS indirectly through other applications (including the Web, file transfer, and electronic mail). DNS illustrates nicely how a piece of core network functionality (network-name to network-address translation) can be implemented at the application layer in the Internet. We then discuss P2P file sharing applications, and complete our application study by discussing video streaming on demand, including distributing stored video over content distribution networks. In Chapter 9, we'll cover multimedia applications in more depth, including voice over IP and video conferencing.
2.2 The Web and HTTP

Until the early 1990s the Internet was used primarily by researchers, academics, and university students to log in to remote hosts, to transfer files from local hosts to remote hosts and vice versa, to receive and send news, and to receive and send electronic mail. Although these applications were (and continue to be) extremely useful, the Internet was essentially unknown outside of the academic and research communities. Then, in the early 1990s, a major new application arrived on the scene---the World Wide Web \[Berners-Lee 1994\]. The Web was the first Internet application that caught the general public's eye. It dramatically changed, and continues to change, how people interact inside and outside their work environments. It elevated the Internet from just one of many data networks to essentially the one and only data network.

Perhaps what appeals the most to users is that the Web operates on demand. Users receive what they want, when they want it. This is unlike traditional broadcast radio and television, which force users to tune in when the content provider makes the content available. In addition to being available on demand, the Web has many other wonderful features that people love and cherish. It is enormously easy for any individual to make information available over the Web---everyone can become a publisher at extremely low cost. Hyperlinks and search engines help us navigate through an ocean of information. Photos and videos stimulate our senses. Forms, JavaScript, Java applets, and many other devices enable us to interact with pages and sites. And the Web and its protocols serve as a platform for YouTube, Web-based e-mail (such as Gmail), and most mobile Internet applications, including Instagram and Google Maps.

2.2.1 Overview of HTTP

The HyperText Transfer Protocol (HTTP), the Web's application-layer protocol, is at the heart of the Web. It is defined in \[RFC 1945\] and \[RFC 2616\].
HTTP is implemented in two programs: a client program and a server program. The client program and server program, executing on different end systems, talk to each other by exchanging HTTP messages. HTTP defines the structure of these messages and how the client and server exchange the messages. Before explaining HTTP in detail, we should review some Web terminology.

A Web page (also called a document) consists of objects. An object is simply a file---such as an HTML file, a JPEG image, a Java applet, or a video clip---that is addressable by a single URL. Most Web pages consist of a base HTML file and several referenced objects. For example, if a Web page contains HTML text and five JPEG images, then the Web page has six objects: the base HTML file plus the five images. The base HTML file references the other objects in the page with the objects' URLs. Each URL has two components: the hostname of the server that houses the object and the object's path name. For example, the URL

http://www.someSchool.edu/someDepartment/picture.gif

has www.someSchool.edu for a hostname and /someDepartment/picture.gif for a path name. Because Web browsers (such as Internet Explorer and Firefox) implement the client side of HTTP, in the context of the Web, we will use the words browser and client interchangeably. Web servers, which implement the server side of HTTP, house Web objects, each addressable by a URL. Popular Web servers include Apache and Microsoft Internet Information Server.

HTTP defines how Web clients request Web pages from Web servers and how servers transfer Web pages to clients. We discuss the interaction between client and server in detail later, but the general idea is illustrated in Figure 2.6. When a user requests a Web page (for example, clicks on a hyperlink), the browser sends HTTP request messages for the objects in the page to the server.
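The two URL components can be pulled apart with Python's standard library; the URL here is the text's own example:

```python
from urllib.parse import urlparse

url = "http://www.someSchool.edu/someDepartment/picture.gif"
parts = urlparse(url)
print(parts.netloc)   # www.someSchool.edu (the server's hostname)
print(parts.path)     # /someDepartment/picture.gif (the object's path name)
```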
The server receives the requests and responds with HTTP response messages that contain the objects. HTTP uses TCP as its underlying transport protocol (rather than running on top of UDP). The HTTP client first initiates a TCP connection with the server. Once the connection is established, the browser and the server processes access TCP through their socket interfaces. As described in Section 2.1, on the client side the socket interface is the door between the client process and the TCP connection; on the server side it is the door between the server process and the TCP connection. The client sends HTTP request messages into its socket interface and receives HTTP response messages from its socket interface. Similarly, the HTTP server receives request messages from its socket interface and sends response messages into its socket interface.

Figure 2.6 HTTP request-response behavior

Once the client sends a message into its socket interface, the message is out of the client's hands and is "in the hands" of TCP. Recall from Section 2.1 that TCP provides a reliable data transfer service to HTTP. This implies that each HTTP request message sent by a client process eventually arrives intact at the server; similarly, each HTTP response message sent by the server process eventually arrives intact at the client. Here we see one of the great advantages of a layered architecture---HTTP need not worry about lost data or the details of how TCP recovers from loss or reordering of data within the network. That is the job of TCP and the protocols in the lower layers of the protocol stack.

It is important to note that the server sends requested files to clients without storing any state information about the client.
If a particular client asks for the same object twice in a period of a few seconds, the server does not respond by saying that it just served the object to the client; instead, the server resends the object, as it has completely forgotten what it did earlier. Because an HTTP server maintains no information about the clients, HTTP is said to be a stateless protocol. We also remark that the Web uses the client-server application architecture, as described in Section 2.1. A Web server is always on, with a fixed IP address, and it services requests from potentially millions of different browsers.

2.2.2 Non-Persistent and Persistent Connections

In many Internet applications, the client and server communicate for an extended period of time, with the client making a series of requests and the server responding to each of the requests. Depending on the application and on how the application is being used, the series of requests may be made back-to-back, periodically at regular intervals, or intermittently. When this client-server interaction is taking place over TCP, the application developer needs to make an important decision---should each request/response pair be sent over a separate TCP connection, or should all of the requests and their corresponding responses be sent over the same TCP connection? In the former approach, the application is said to use non-persistent connections; and in the latter approach, persistent connections. To gain a deep understanding of this design issue, let's examine the advantages and disadvantages of persistent connections in the context of a specific application, namely, HTTP, which can use both non-persistent connections and persistent connections. Although HTTP uses persistent connections in its default mode, HTTP clients and servers can be configured to use non-persistent connections instead.
HTTP with Non-Persistent Connections

Let's walk through the steps of transferring a Web page from server to client for the case of non-persistent connections. Let's suppose the page consists of a base HTML file and 10 JPEG images, and that all 11 of these objects reside on the same server. Further suppose the URL for the base HTML file is

http://www.someSchool.edu/someDepartment/home.index

Here is what happens:

1. The HTTP client process initiates a TCP connection to the server www.someSchool.edu on port number 80, which is the default port number for HTTP. Associated with the TCP connection, there will be a socket at the client and a socket at the server.

2. The HTTP client sends an HTTP request message to the server via its socket. The request message includes the path name /someDepartment/home.index . (We will discuss HTTP messages in some detail below.)

3. The HTTP server process receives the request message via its socket, retrieves the object /someDepartment/home.index from its storage (RAM or disk), encapsulates the object in an HTTP response message, and sends the response message to the client via its socket.

4. The HTTP server process tells TCP to close the TCP connection. (But TCP doesn't actually terminate the connection until it knows for sure that the client has received the response message intact.)

5. The HTTP client receives the response message. The TCP connection terminates. The message indicates that the encapsulated object is an HTML file. The client extracts the file from the response message, examines the HTML file, and finds references to the 10 JPEG objects.

6. The first four steps are then repeated for each of the referenced JPEG objects.

As the browser receives the Web page, it displays the page to the user. Two different browsers may interpret (that is, display to the user) a Web page in somewhat different ways.
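The six steps above can be sketched with a raw TCP socket: one non-persistent connection per object. The host and path are the text's hypothetical example, so the call at the bottom is commented out rather than run against a real server:

```python
import socket

def fetch_once(host: str, path: str) -> bytes:
    """Fetch one object over a non-persistent connection."""
    s = socket.create_connection((host, 80))        # step 1: TCP connection
    request = (f"GET {path} HTTP/1.1\r\n"
               f"Host: {host}\r\n"
               "Connection: close\r\n\r\n")
    s.sendall(request.encode())                     # step 2: HTTP request
    chunks = []
    while True:                                     # steps 3-5: read the
        chunk = s.recv(4096)                        # response until the
        if not chunk:                               # server closes
            break
        chunks.append(chunk)
    s.close()                                       # connection terminates
    return b"".join(chunks)

# Step 6 would repeat fetch_once for each referenced JPEG, e.g.:
# fetch_once("www.someSchool.edu", "/someDepartment/home.index")
```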
HTTP has nothing to do with how a Web page is interpreted by a client. The HTTP specifications (\[RFC 1945\] and \[RFC 2616\]) define only the communication protocol between the client HTTP program and the server HTTP program.

The steps above illustrate the use of non-persistent connections, where each TCP connection is closed after the server sends the object---the connection does not persist for other objects. Note that each TCP connection transports exactly one request message and one response message. Thus, in this example, when a user requests the Web page, 11 TCP connections are generated. In the steps described above, we were intentionally vague about whether the client obtains the 10 JPEGs over 10 serial TCP connections, or whether some of the JPEGs are obtained over parallel TCP connections. Indeed, users can configure modern browsers to control the degree of parallelism. In their default modes, most browsers open 5 to 10 parallel TCP connections, and each of these connections handles one request-response transaction. If the user prefers, the maximum number of parallel connections can be set to one, in which case the 10 connections are established serially. As we'll see in the next chapter, the use of parallel connections shortens the response time.

Before continuing, let's do a back-of-the-envelope calculation to estimate the amount of time that elapses from when a client requests the base HTML file until the entire file is received by the client. To this end, we define the round-trip time (RTT), which is the time it takes for a small packet to travel from client to server and then back to the client. The RTT includes packet-propagation delays, packet-queuing delays in intermediate routers and switches, and packet-processing delays. (These delays were discussed in Section 1.4.) Now consider what happens when a user clicks on a hyperlink.
As shown in Figure 2.7, this causes the browser to initiate a TCP connection between the browser and the Web server; this involves a "three-way handshake"---the client sends a small TCP segment to the server, the server acknowledges and responds with a small TCP segment, and, finally, the client acknowledges back to the server. The first two parts of the three-way handshake take one RTT. After completing the first two parts of the handshake, the client sends the HTTP request message combined with the third part of the three-way handshake (the acknowledgment) into the TCP connection. Once the request message arrives at the server, the server sends the HTML file into the TCP connection. This HTTP request/response eats up another RTT. Thus, roughly, the total response time is two RTTs plus the transmission time at the server of the HTML file.

Figure 2.7 Back-of-the-envelope calculation for the time needed to request and receive an HTML file

HTTP with Persistent Connections

Non-persistent connections have some shortcomings. First, a brand-new connection must be established and maintained for each requested object. For each of these connections, TCP buffers must be allocated and TCP variables must be kept in both the client and server. This can place a significant burden on the Web server, which may be serving requests from hundreds of different clients simultaneously. Second, as we just described, each object suffers a delivery delay of two RTTs---one RTT to establish the TCP connection and one RTT to request and receive an object. With HTTP 1.1 persistent connections, the server leaves the TCP connection open after sending a response. Subsequent requests and responses between the same client and server can be sent over the same connection. In particular, an entire Web page (in the example above, the base HTML file and the 10 images) can be sent over a single persistent TCP connection.
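To make the two-RTT cost concrete, here is a rough comparison for the 11-object page under assumed numbers (100 ms RTT, 10 ms transmission time per object); the formulas, not the particular numbers, are the point:

```python
# Assumed example numbers, not from the text.
rtt, tx, objects = 0.100, 0.010, 11

# Non-persistent, serial: each object pays one RTT for the TCP
# handshake plus one RTT for request/response, plus transmission time.
non_persistent = objects * (2 * rtt + tx)

# Persistent without pipelining: one handshake RTT up front, then one
# RTT plus transmission time per object over the same connection.
persistent = rtt + objects * (rtt + tx)

print(round(non_persistent, 3))   # 2.31 seconds
print(round(persistent, 3))       # 1.31 seconds
```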
Moreover, multiple Web pages residing on the same server can be sent from the server to the same client over a single persistent TCP connection. These requests for objects can be made back-to-back, without waiting for replies to pending requests (pipelining). Typically, the HTTP server closes a connection when it isn't used for a certain time (a configurable timeout interval). When the server receives the back-to-back requests, it sends the objects back-to-back. The default mode of HTTP uses persistent connections with pipelining. Most recently, HTTP/2 \[RFC 7540\] builds on HTTP 1.1 by allowing multiple requests and replies to be interleaved in the same connection, and by adding a mechanism for prioritizing HTTP message requests and replies within this connection. We'll quantitatively compare the performance of non-persistent and persistent connections in the homework problems of Chapters 2 and 3. You are also encouraged to see \[Heidemann 1997; Nielsen 1997; RFC 7540\].

2.2.3 HTTP Message Format

The HTTP specifications \[RFC 1945; RFC 2616; RFC 7540\] include the definitions of the HTTP message formats. There are two types of HTTP messages, request messages and response messages, both of which are discussed below.

HTTP Request Message

Below we provide a typical HTTP request message:

GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: fr

We can learn a lot by taking a close look at this simple request message. First of all, we see that the message is written in ordinary ASCII text, so that your ordinary computer-literate human being can read it. Second, we see that the message consists of five lines, each followed by a carriage return and a line feed. The last line is followed by an additional carriage return and line feed. Although this particular request message has five lines, a request message can have many more lines or as few as one line.
The first line of an HTTP request message is called the request line; the subsequent lines are called the header lines. The request line has three fields: the method field, the URL field, and the HTTP version field. The method field can take on several different values, including GET, POST, HEAD, PUT, and DELETE. The great majority of HTTP request messages use the GET method. The GET method is used when the browser requests an object, with the requested object identified in the URL field. In this example, the browser is requesting the object /somedir/page.html . The version is self-explanatory; in this example, the browser implements version HTTP/1.1.

Now let's look at the header lines in the example. The header line Host: www.someschool.edu specifies the host on which the object resides. You might think that this header line is unnecessary, as there is already a TCP connection in place to the host. But, as we'll see in Section 2.2.5, the information provided by the host header line is required by Web proxy caches. By including the Connection: close header line, the browser is telling the server that it doesn't want to bother with persistent connections; it wants the server to close the connection after sending the requested object. The User-agent: header line specifies the user agent, that is, the browser type that is making the request to the server. Here the user agent is Mozilla/5.0, a Firefox browser. This header line is useful because the server can actually send different versions of the same object to different types of user agents. (Each of the versions is addressed by the same URL.) Finally, the Accept-language: header indicates that the user prefers to receive a French version of the object, if such an object exists on the server; otherwise, the server should send its default version. The Accept-language: header is just one of many content negotiation headers available in HTTP.
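The request message just dissected can be assembled in a few lines of code. This is a sketch, not a real HTTP library: the build_request helper is invented here, but the framing it produces (each line ended by a carriage return and line feed, plus an extra carriage return and line feed after the last header) follows the description above.

```python
CRLF = "\r\n"  # carriage return + line feed, per the text

def build_request(method, url, host, headers):
    # Request line first, then the Host: header, then the remaining
    # header lines in order.
    lines = [f"{method} {url} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers]
    # Every line ends in CRLF; a final blank line ends the header section.
    return CRLF.join(lines) + CRLF + CRLF

msg = build_request("GET", "/somedir/page.html", "www.someschool.edu",
                    [("Connection", "close"),
                     ("User-agent", "Mozilla/5.0"),
                     ("Accept-language", "fr")])
print(msg)
```

Printing msg reproduces the five-line example message, with the blank line at the end marking where an entity body would begin.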
Having looked at -an example, let's now look at the general format of a request message, -as shown in Figure 2.8. We see that the general format closely follows -our earlier example. You may have noticed, - -however, that after the header lines (and the additional carriage return -and line feed) there is an "entity body." The entity body is empty with -the GET method, but is used with the POST method. An HTTP client often -uses the POST method when the user fills out a form---for example, when -a user provides search words to a search engine. With a POST message, -the user is still requesting a Web page from the server, but the -specific contents of the Web page - -Figure 2.8 General format of an HTTP request message - -depend on what the user entered into the form fields. If the value of -the method field is POST , then the entity body contains what the user -entered into the form fields. We would be remiss if we didn't mention -that a request generated with a form does not necessarily use the POST -method. Instead, HTML forms often use the GET method and include the -inputted data (in the form fields) in the requested URL. For example, if -a form uses the GET method, has two fields, and the inputs to the two -fields are monkeys and bananas , then the URL will have the structure -www.somesite.com/animalsearch?monkeys&bananas . In your day-to-day Web -surfing, you have probably noticed extended URLs of this sort. The HEAD -method is similar to the GET method. When a server receives a request -with the HEAD method, it responds with an HTTP message but it leaves out -the requested object. Application developers often use the HEAD method -for debugging. The PUT method is often used in conjunction with Web -publishing tools. It allows a user to upload an object to a specific -path (directory) on a specific Web server. The PUT method is also used -by applications that need to upload objects to Web servers. 
The DELETE method allows a user, or an application, to delete an object on a Web server.

HTTP Response Message

Below we provide a typical HTTP response message. This response message could be the response to the example request message just discussed.

HTTP/1.1 200 OK
Connection: close
Date: Tue, 18 Aug 2015 15:44:04 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Tue, 18 Aug 2015 15:11:03 GMT
Content-Length: 6821
Content-Type: text/html

(data data data data data ...)

Let's take a careful look at this response message. It has three sections: an initial status line, six header lines, and then the entity body. The entity body is the meat of the message---it contains the requested object itself (represented by data data data data data ... ). The status line has three fields: the protocol version field, a status code, and a corresponding status message. In this example, the status line indicates that the server is using HTTP/1.1 and that everything is OK (that is, the server has found, and is sending, the requested object).

Now let's look at the header lines. The server uses the Connection: close header line to tell the client that it is going to close the TCP connection after sending the message. The Date: header line indicates the time and date when the HTTP response was created and sent by the server. Note that this is not the time when the object was created or last modified; it is the time when the server retrieves the object from its file system, inserts the object into the response message, and sends the response message. The Server: header line indicates that the message was generated by an Apache Web server; it is analogous to the User-agent: header line in the HTTP request message. The Last-Modified: header line indicates the time and date when the object was created or last modified.
The Last-Modified: header, which we will soon cover in more detail, is critical for object caching, both in the local client and in network cache servers (also known as proxy servers). The Content-Length: header line indicates the number of bytes in the object being sent. The Content-Type: header line indicates that the object in the entity body is HTML text. (The object type is officially indicated by the Content-Type: header and not by the file extension.)

Having looked at an example, let's now examine the general format of a response message, which is shown in Figure 2.9. This general format of the response message matches the previous example of a response message. Let's say a few additional words about status codes and their phrases. The status code and associated phrase indicate the result of the request. Some common status codes and associated phrases include:

200 OK: Request succeeded and the information is returned in the response.

301 Moved Permanently: Requested object has been permanently moved; the new URL is specified in the Location: header of the response message. The client software will automatically retrieve the new URL.

400 Bad Request: This is a generic error code indicating that the request could not be understood by the server.

404 Not Found: The requested document does not exist on this server.

505 HTTP Version Not Supported: The requested HTTP protocol version is not supported by the server.

Figure 2.9 General format of an HTTP response message

How would you like to see a real HTTP response message? This is highly recommended and very easy to do! First Telnet into your favorite Web server. Then type in a one-line request message for some object that is housed on the server.
For example, if you have access to a command prompt, type:

telnet gaia.cs.umass.edu 80
GET /kurose_ross/interactive/index.php HTTP/1.1
Host: gaia.cs.umass.edu

(Press the carriage return twice after typing the last line.) This opens a TCP connection to port 80 of the host gaia.cs.umass.edu and then sends the HTTP request message. You should see a response message that includes the base HTML file for the interactive homework problems for this textbook. If you'd rather just see the HTTP message lines and not receive the object itself, replace GET with HEAD. (You can also use Wireshark to investigate the HTTP protocol.)

In this section we discussed a number of header lines that can be used within HTTP request and response messages. The HTTP specification defines many, many more header lines that can be inserted by browsers, Web servers, and network cache servers. We have covered only a small number of the totality of header lines. We'll cover a few more below and another small number when we discuss network Web caching in Section 2.2.5. A highly readable and comprehensive discussion of the HTTP protocol, including its headers and status codes, is given in \[Krishnamurthy 2001\].

How does a browser decide which header lines to include in a request message? How does a Web server decide which header lines to include in a response message? A browser will generate header lines as a function of the browser type and version (for example, an HTTP/1.0 browser will not generate any 1.1 header lines), the user configuration of the browser (for example, preferred language), and whether the browser currently has a cached, but possibly out-of-date, version of the object. Web servers behave similarly: There are different products, versions, and configurations, all of which influence which header lines are included in response messages.

2.2.4 User-Server Interaction: Cookies

We mentioned above that an HTTP server is stateless.
This simplifies server design and has permitted engineers to develop high-performance Web servers that can handle thousands of simultaneous TCP connections. However, it is often desirable for a Web site to identify users, either because the server wishes to restrict user access or because it wants to serve content as a function of the user identity. For these purposes, HTTP uses cookies. Cookies, defined in \[RFC 6265\], allow sites to keep track of users. Most major commercial Web sites use cookies today.

As shown in Figure 2.10, cookie technology has four components: (1) a cookie header line in the HTTP response message; (2) a cookie header line in the HTTP request message; (3) a cookie file kept on the user's end system and managed by the user's browser; and (4) a back-end database at the Web site. Using Figure 2.10, let's walk through an example of how cookies work. Suppose Susan, who always accesses the Web using Internet Explorer from her home PC, contacts Amazon.com for the first time. Let us suppose that in the past she has already visited the eBay site. When the request comes into the Amazon Web server, the server creates a unique identification number and creates an entry in its back-end database that is indexed by the identification number. The Amazon Web server then responds to Susan's browser, including in the HTTP response a Set-cookie: header, which contains the identification number. For example, the header line might be:

Set-cookie: 1678

When Susan's browser receives the HTTP response message, it sees the Set-cookie: header. The browser then appends a line to the special cookie file that it manages. This line includes the hostname of the server and the identification number in the Set-cookie: header. Note that the cookie file already has an entry for eBay, since Susan has visited that site in the past.
As Susan continues to browse the Amazon site, each time she requests a Web page, her browser consults her cookie file, extracts her identification number for this site, and puts a cookie header line that includes the identification number in the HTTP request. Specifically, each of her HTTP requests to the Amazon server includes the header line:

Cookie: 1678

Figure 2.10 Keeping user state with cookies

In this manner, the Amazon server is able to track Susan's activity at the Amazon site. Although the Amazon Web site does not necessarily know Susan's name, it knows exactly which pages user 1678 visited, in which order, and at what times! Amazon uses cookies to provide its shopping cart service---Amazon can maintain a list of all of Susan's intended purchases, so that she can pay for them collectively at the end of the session. If Susan returns to Amazon's site, say, one week later, her browser will continue to put the header line Cookie: 1678 in the request messages. Amazon also recommends products to Susan based on Web pages she has visited at Amazon in the past. If Susan also registers herself with Amazon---providing full name, e-mail address, postal address, and credit card information---Amazon can then include this information in its database, thereby associating Susan's name with her identification number (and all of the pages she has visited at the site in the past!). This is how Amazon and other e-commerce sites provide "one-click shopping"---when Susan chooses to purchase an item during a subsequent visit, she doesn't need to re-enter her name, credit card number, or address.

From this discussion we see that cookies can be used to identify a user. The first time a user visits a site, the user can provide a user identification (possibly his or her name). During the subsequent sessions, the browser passes a cookie header to the server, thereby identifying the user to the server.
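The cookie walkthrough above can be mimicked with a toy model. Everything here is a deliberate simplification: the Server and Browser classes, the dictionary "database," and the dictionary "cookie file" are invented stand-ins for the four real components, with the identification number 1678 taken from the example.

```python
import itertools

class Server:
    """Stand-in Web server with a back-end database keyed by cookie ID."""
    def __init__(self):
        self.db = {}                       # back-end database
        self._ids = itertools.count(1678)  # example ID from the text
    def handle(self, request):
        if "Cookie" not in request:        # first visit: assign an ID
            uid = str(next(self._ids))
            self.db[uid] = []
            return {"Set-cookie": uid}
        self.db[request["Cookie"]].append(request["path"])
        return {}

class Browser:
    """Stand-in browser that manages a cookie file (hostname -> ID)."""
    def __init__(self):
        self.cookie_file = {}
    def get(self, server, host, path):
        request = {"path": path}
        if host in self.cookie_file:       # echo the stored ID back
            request["Cookie"] = self.cookie_file[host]
        response = server.handle(request)
        if "Set-cookie" in response:       # append a line to the cookie file
            self.cookie_file[host] = response["Set-cookie"]

amazon, susan = Server(), Browser()
susan.get(amazon, "amazon.com", "/")      # first visit: Set-cookie: 1678
susan.get(amazon, "amazon.com", "/cart")  # later requests carry Cookie: 1678
print(susan.cookie_file, amazon.db)
```

After the two requests, the browser's cookie file maps amazon.com to "1678" and the server's database has recorded the second page visit under that ID, which is exactly the tracking behavior the text describes.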
Cookies can thus be used to create a user session layer on top of stateless HTTP. For example, when a user logs in to a Web-based e-mail application (such as Hotmail), the browser sends cookie information to the server, permitting the server to identify the user throughout the user's session with the application. Although cookies often simplify the Internet shopping experience for the user, they are controversial because they can also be considered as an invasion of privacy. As we just saw, using a combination of cookies and user-supplied account information, a Web site can learn a lot about a user and potentially sell this information to a third party. Cookie Central \[Cookie Central 2016\] includes extensive information on the cookie controversy.

2.2.5 Web Caching

A Web cache---also called a proxy server---is a network entity that satisfies HTTP requests on the behalf of an origin Web server. The Web cache has its own disk storage and keeps copies of recently requested objects in this storage. As shown in Figure 2.11, a user's browser can be configured so that all of the user's HTTP requests are first directed to the Web cache. Once a browser is configured, each browser request for an object is first directed to the Web cache. As an example, suppose a browser is requesting the object http://www.someschool.edu/campus.gif . Here is what happens:

1. The browser establishes a TCP connection to the Web cache and sends an HTTP request for the object to the Web cache.

2. The Web cache checks to see if it has a copy of the object stored locally. If it does, the Web cache returns the object within an HTTP response message to the client browser.

Figure 2.11 Clients requesting objects through a Web cache

3. If the Web cache does not have the object, the Web cache opens a TCP connection to the origin server, that is, to www.someschool.edu.
The Web cache then sends an HTTP request for the object into the cache-to-server TCP connection. After receiving this request, the origin server sends the object within an HTTP response to the Web cache.

4. When the Web cache receives the object, it stores a copy in its local storage and sends a copy, within an HTTP response message, to the client browser (over the existing TCP connection between the client browser and the Web cache).

Note that a cache is both a server and a client at the same time. When it receives requests from and sends responses to a browser, it is a server. When it sends requests to and receives responses from an origin server, it is a client. Typically a Web cache is purchased and installed by an ISP. For example, a university might install a cache on its campus network and configure all of the campus browsers to point to the cache. Or a major residential ISP (such as Comcast) might install one or more caches in its network and preconfigure its shipped browsers to point to the installed caches.

Web caching has seen deployment in the Internet for two reasons. First, a Web cache can substantially reduce the response time for a client request, particularly if the bottleneck bandwidth between the client and the origin server is much less than the bottleneck bandwidth between the client and the cache. If there is a high-speed connection between the client and the cache, as there often is, and if the cache has the requested object, then the cache will be able to deliver the object rapidly to the client. Second, as we will soon illustrate with an example, Web caches can substantially reduce traffic on an institution's access link to the Internet. By reducing traffic, the institution (for example, a company or a university) does not have to upgrade bandwidth as quickly, thereby reducing costs. Furthermore, Web caches can substantially reduce Web traffic in the Internet as a whole, thereby improving performance for all applications.

To gain a deeper understanding of the benefits of caches, let's consider an example in the context of Figure 2.12. This figure shows two networks---the institutional network and the rest of the public Internet. The institutional network is a high-speed LAN. A router in the institutional network and a router in the Internet are connected by a 15 Mbps link. The origin servers are attached to the Internet but are located all over the globe. Suppose that the average object size is 1 Mbits and that the average request rate from the institution's browsers to the origin servers is 15 requests per second. Suppose that the HTTP request messages are negligibly small and thus create no traffic in the networks or in the access link (from institutional router to Internet router). Also suppose that the amount of time it takes from when the router on the Internet side of the access link in Figure 2.12 forwards an HTTP request (within an IP datagram) until it receives the response (typically within many IP datagrams) is two seconds on average. Informally, we refer to this last delay as the "Internet delay."

Figure 2.12 Bottleneck between an institutional network and the Internet

The total response time---that is, the time from the browser's request of an object until its receipt of the object---is the sum of the LAN delay, the access delay (that is, the delay between the two routers), and the Internet delay. Let's now do a very crude calculation to estimate this delay.
The traffic intensity on the LAN (see Section 1.4.2) is

(15 requests/sec)⋅(1 Mbits/request)/(100 Mbps) = 0.15

whereas the traffic intensity on the access link (from the Internet router to institution router) is

(15 requests/sec)⋅(1 Mbits/request)/(15 Mbps) = 1

A traffic intensity of 0.15 on a LAN typically results in, at most, tens of milliseconds of delay; hence, we can neglect the LAN delay. However, as discussed in Section 1.4.2, as the traffic intensity approaches 1 (as is the case of the access link in Figure 2.12), the delay on a link becomes very large and grows without bound. Thus, the average response time to satisfy requests is going to be on the order of minutes, if not more, which is unacceptable for the institution's users. Clearly something must be done.

One possible solution is to increase the access rate from 15 Mbps to, say, 100 Mbps. This will lower the traffic intensity on the access link to 0.15, which translates to negligible delays between the two routers. In this case, the total response time will roughly be two seconds, that is, the Internet delay. But this solution also means that the institution must upgrade its access link from 15 Mbps to 100 Mbps, a costly proposition.

Now consider the alternative solution of not upgrading the access link but instead installing a Web cache in the institutional network. This solution is illustrated in Figure 2.13. Hit rates---the fraction of requests that are satisfied by a cache---typically range from 0.2 to 0.7 in practice. For illustrative purposes, let's suppose that the cache provides a hit rate of 0.4 for this institution. Because the clients and the cache are connected to the same high-speed LAN, 40 percent of the requests will be satisfied almost immediately, say, within 10 milliseconds, by the cache. Nevertheless, the remaining 60 percent of the requests still need to be satisfied by the origin servers.
But with only 60 percent of the requested objects passing through the access link, the traffic intensity on the access link is reduced from 1.0 to 0.6. Typically, a traffic intensity less than 0.8 corresponds to a small delay, say, tens of milliseconds, on a 15 Mbps link. This delay is negligible compared with the two-second Internet delay. Given these considerations, average delay therefore is

0.4⋅(0.01 seconds) + 0.6⋅(2.01 seconds)

which is just slightly greater than 1.2 seconds. Thus, this second solution provides an even lower response time than the first solution, and it doesn't require the institution to upgrade its link to the Internet. The institution does, of course, have to purchase and install a Web cache. But this cost is low---many caches use public-domain software that runs on inexpensive PCs.

Figure 2.13 Adding a cache to the institutional network

Through the use of Content Distribution Networks (CDNs), Web caches are increasingly playing an important role in the Internet. A CDN company installs many geographically distributed caches throughout the Internet, thereby localizing much of the traffic. There are shared CDNs (such as Akamai and Limelight) and dedicated CDNs (such as Google and Netflix). We will discuss CDNs in more detail in Section 2.6.

The Conditional GET

Although caching can reduce user-perceived response times, it introduces a new problem---the copy of an object residing in the cache may be stale. In other words, the object housed in the Web server may have been modified since the copy was cached at the client. Fortunately, HTTP has a mechanism that allows a cache to verify that its objects are up to date. This mechanism is called the conditional GET. An HTTP request message is a so-called conditional GET message if (1) the request message uses the GET method and (2) the request message includes an If-Modified-Since: header line.
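Returning to the caching example, its arithmetic can be checked with a few lines. All of the numbers are from the example itself: 15 requests per second, 1-Mbit objects, a 100 Mbps LAN, a 15 Mbps access link, a 0.4 hit rate, 10 ms for a cache hit, and 2.01 seconds (access plus Internet delay) for a miss.

```python
rate = 15          # requests per second
size = 1e6         # bits per object
lan_bps = 100e6    # LAN speed
access_bps = 15e6  # access-link speed

# Traffic intensity = arrival rate of bits / link capacity.
lan_intensity = rate * size / lan_bps        # expect 0.15
access_intensity = rate * size / access_bps  # expect 1.0 (saturated)

# With a cache: hits are served in ~10 ms, misses take ~2.01 s.
hit_rate = 0.4
avg_delay = hit_rate * 0.01 + (1 - hit_rate) * 2.01

print(lan_intensity, access_intensity, round(avg_delay, 3))
```

The computed average delay is 1.21 seconds, matching the text's "just slightly greater than 1.2 seconds."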
To illustrate how the conditional GET operates, let's walk through an example. First, on the behalf of a requesting browser, a proxy cache sends a request message to a Web server:

GET /fruit/kiwi.gif HTTP/1.1
Host: www.exotiquecuisine.com

Second, the Web server sends a response message with the requested object to the cache:

HTTP/1.1 200 OK
Date: Sat, 3 Oct 2015 15:39:29
Server: Apache/1.3.0 (Unix)
Last-Modified: Wed, 9 Sep 2015 09:23:24
Content-Type: image/gif

(data data data data data ...)

The cache forwards the object to the requesting browser but also caches the object locally. Importantly, the cache also stores the last-modified date along with the object. Third, one week later, another browser requests the same object via the cache, and the object is still in the cache. Since this object may have been modified at the Web server in the past week, the cache performs an up-to-date check by issuing a conditional GET. Specifically, the cache sends:

GET /fruit/kiwi.gif HTTP/1.1
Host: www.exotiquecuisine.com
If-modified-since: Wed, 9 Sep 2015 09:23:24

Note that the value of the If-modified-since: header line is exactly equal to the value of the Last-Modified: header line that was sent by the server one week ago. This conditional GET is telling the server to send the object only if the object has been modified since the specified date. Suppose the object has not been modified since 9 Sep 2015 09:23:24. Then, fourth, the Web server sends a response message to the cache:

HTTP/1.1 304 Not Modified
Date: Sat, 10 Oct 2015 15:39:29
Server: Apache/1.3.0 (Unix)

(empty entity body)

We see that in response to the conditional GET, the Web server still sends a response message but does not include the requested object in the response message. Including the requested object would only waste bandwidth and increase user-perceived response time, particularly if the object is large.
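The up-to-date check in this exchange can be simulated in miniature. This sketch is not a real cache: the conditional_get function is invented here, and it compares the date strings for equality rather than parsing them, which is enough to reproduce the 304 behavior just shown.

```python
# The cache stores the Last-Modified value alongside the cached object
# (dates are the example's).
stored_last_modified = "Wed, 9 Sep 2015 09:23:24"

def conditional_get(server_last_modified, if_modified_since):
    """Server-side decision for a GET carrying If-modified-since."""
    if server_last_modified == if_modified_since:  # object unchanged
        return "304 Not Modified", None            # empty entity body
    return "200 OK", b"(data data data ...)"       # send the fresh object

status, body = conditional_get("Wed, 9 Sep 2015 09:23:24",
                               stored_last_modified)
print(status)  # 304 Not Modified
```

Because the object is unmodified, the simulated server answers with the 304 status and no entity body, exactly as in the fourth message above.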
Note that this last response message has 304 Not Modified in the status line, which tells the cache that it can go ahead and forward its (the proxy cache's) cached copy of the object to the requesting browser. This ends our discussion of HTTP, the first Internet protocol (an application-layer protocol) that we've studied in detail. We've seen the format of HTTP messages and the actions taken by the Web client and server as these messages are sent and received. We've also studied a bit of the Web's application infrastructure, including caches, cookies, and back-end databases, all of which are tied in some way to the HTTP protocol.

2.3 Electronic Mail in the Internet

Electronic mail has been around since the beginning of the Internet. It was the most popular application when the Internet was in its infancy \[Segaller 1998\], and has become more elaborate and powerful over the years. It remains one of the Internet's most important and utilized applications. As with ordinary postal mail, e-mail is an asynchronous communication medium---people send and read messages when it is convenient for them, without having to coordinate with other people's schedules. In contrast with postal mail, electronic mail is fast, easy to distribute, and inexpensive. Modern e-mail has many powerful features, including messages with attachments, hyperlinks, HTML-formatted text, and embedded photos.

In this section, we examine the application-layer protocols that are at the heart of Internet e-mail. But before we jump into an in-depth discussion of these protocols, let's take a high-level view of the Internet mail system and its key components. Figure 2.14 presents a high-level view of the Internet mail system. We see from this diagram that it has three major components: user agents, mail servers, and the Simple Mail Transfer Protocol (SMTP). We now describe each of these components in the context of a sender, Alice, sending an e-mail message to a recipient, Bob.
User -agents allow users to read, reply to, forward, save, and compose -messages. Microsoft Outlook and Apple Mail are examples of user agents -for e-mail. When Alice is finished composing her message, her user agent -sends the message to her mail server, where the message is placed in the -mail server's outgoing message queue. When Bob wants to read a message, -his user agent retrieves the message from his mailbox in his mail -server. Mail servers form the core of the e-mail infrastructure. Each -recipient, such as Bob, has a mailbox located in one of the mail -servers. Bob's mailbox manages and - -Figure 2.14 A high-level view of the Internet e-mail system - -maintains the messages that have been sent to him. A typical message -starts its journey in the sender's user agent, travels to the sender's -mail server, and travels to the recipient's mail server, where it is -deposited in the recipient's mailbox. When Bob wants to access the -messages in his mailbox, the mail server containing his mailbox -authenticates Bob (with usernames and passwords). Alice's mail server -must also deal with failures in Bob's mail server. If Alice's server -cannot deliver mail to Bob's server, Alice's server holds the message in -a message queue and attempts to transfer the message later. Reattempts -are often done every 30 minutes or so; if there is no success after -several days, the server removes the message and notifies the sender -(Alice) with an e-mail message. SMTP is the principal application-layer -protocol for Internet electronic mail. It uses the reliable data -transfer service of TCP to transfer mail from the sender's mail server -to the recipient's mail server. As with most application-layer -protocols, SMTP has two sides: a client side, which executes on the -sender's mail server, and a server side, which executes on the -recipient's mail server. Both the client and server sides of SMTP run on -every mail server. 
When a mail server sends mail to other mail servers, it acts as an SMTP client. When a mail server receives mail from other mail servers, it acts as an SMTP server.

2.3.1 SMTP

SMTP, defined in RFC 5321, is at the heart of Internet electronic mail. As mentioned above, SMTP transfers messages from senders' mail servers to the recipients' mail servers. SMTP is much older than HTTP. (The original SMTP RFC dates back to 1982, and SMTP was around long before that.) Although SMTP has numerous wonderful qualities, as evidenced by its ubiquity in the Internet, it is nevertheless a legacy technology that possesses certain archaic characteristics. For example, it restricts the body (not just the headers) of all mail messages to simple 7-bit ASCII. This restriction made sense in the early 1980s when transmission capacity was scarce and no one was e-mailing large attachments or large image, audio, or video files. But today, in the multimedia era, the 7-bit ASCII restriction is a bit of a pain---it requires binary multimedia data to be encoded to ASCII before being sent over SMTP; and it requires the corresponding ASCII message to be decoded back to binary after SMTP transport. Recall from Section 2.2 that HTTP does not require multimedia data to be ASCII encoded before transfer.

To illustrate the basic operation of SMTP, let's walk through a common scenario. Suppose Alice wants to send Bob a simple ASCII message.

1. Alice invokes her user agent for e-mail, provides Bob's e-mail address (for example, bob@someschool.edu ), composes a message, and instructs the user agent to send the message.

2. Alice's user agent sends the message to her mail server, where it is placed in a message queue.

3. The client side of SMTP, running on Alice's mail server, sees the message in the message queue. It opens a TCP connection to an SMTP server, running on Bob's mail server.

4.
After some initial SMTP handshaking, the SMTP client sends Alice's
   message into the TCP connection.

5. At Bob's mail server, the server side of SMTP receives the message.
   Bob's mail server then places the message in Bob's mailbox.

6. Bob invokes his user agent to read the message at his convenience.

The scenario is summarized in Figure 2.15. It is important to observe that SMTP does not normally use intermediate mail servers for sending mail, even when the two mail servers are located at opposite ends of the world. If Alice's server is in Hong Kong and Bob's server is in St. Louis, the TCP connection is a direct connection between the Hong Kong and St. Louis servers. In particular, if Bob's mail server is down, the message remains in Alice's mail server and waits for a new attempt---the message does not get placed in some intermediate mail server.

Figure 2.15 Alice sends a message to Bob

Let's now take a closer look at how SMTP transfers a message from a sending mail server to a receiving mail server. We will see that the SMTP protocol has many similarities with protocols that are used for face-to-face human interaction. First, the client SMTP (running on the sending mail server host) has TCP establish a connection to port 25 at the server SMTP (running on the receiving mail server host). If the server is down, the client tries again later. Once this connection is established, the server and client perform some application-layer handshaking---just as humans often introduce themselves before transferring information from one to another, SMTP clients and servers introduce themselves before transferring information. During this SMTP handshaking phase, the SMTP client indicates the e-mail address of the sender (the person who generated the message) and the e-mail address of the recipient. Once the SMTP client and server have introduced themselves to each other, the client sends the message.
SMTP can count on the reliable data transfer service of TCP to get the message to the server without errors. The client then repeats this process over the same TCP connection if it has other messages to send to the server; otherwise, it instructs TCP to close the connection. Let's next take a look at an example transcript of messages exchanged between an SMTP client (C) and an SMTP server (S). The hostname of the client is crepes.fr and the hostname of the server is hamburger.edu. The ASCII text lines prefaced with C: are exactly the lines the client sends into its TCP socket, and the ASCII text lines prefaced with S: are exactly the lines the server sends into its TCP socket. The following transcript begins as soon as the TCP connection is established.

    S: 220 hamburger.edu
    C: HELO crepes.fr
    S: 250 Hello crepes.fr, pleased to meet you
    C: MAIL FROM: <alice@crepes.fr>
    S: 250 alice@crepes.fr ... Sender ok
    C: RCPT TO: <bob@hamburger.edu>
    S: 250 bob@hamburger.edu ... Recipient ok
    C: DATA
    S: 354 Enter mail, end with "." on a line by itself
    C: Do you like ketchup?
    C: How about pickles?
    C: .
    S: 250 Message accepted for delivery
    C: QUIT
    S: 221 hamburger.edu closing connection

In the example above, the client sends a message ("Do you like ketchup? How about pickles?") from mail server crepes.fr to mail server hamburger.edu. As part of the dialogue, the client issued five commands: HELO (an abbreviation for HELLO), MAIL FROM, RCPT TO, DATA, and QUIT. These commands are self-explanatory. The client also sends a line consisting of a single period, which indicates the end of the message to the server. (In ASCII jargon, each message ends with CRLF.CRLF, where CR and LF stand for carriage return and line feed, respectively.) The server issues replies to each command, with each reply having a reply code and some (optional) English-language explanation.
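The client side of this dialogue can be sketched in a few lines of Python; the helpers below reproduce the command sequence from the transcript and pull the reply code out of a server line. This is a sketch only: a real client must read and check every server reply, and handle multiline responses.

```python
def client_lines(sender, recipient, body_lines, hostname):
    """Return the lines an SMTP client (the C: lines above) would send
    for one message. A sketch: reply checking and error handling omitted."""
    lines = ["HELO " + hostname,
             "MAIL FROM: <%s>" % sender,
             "RCPT TO: <%s>" % recipient,
             "DATA"]
    lines += list(body_lines)
    lines.append(".")    # lone period ends the message (CRLF.CRLF on the wire)
    lines.append("QUIT")
    return lines

def reply_code(server_line):
    """Extract the numeric code from a server reply such as
    '250 Message accepted for delivery'."""
    return int(server_line.split(" ", 1)[0])
```

Feeding in the transcript's sender, recipient, and two-line message yields exactly the C: lines shown above.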
We mention here that SMTP uses persistent connections: If the sending mail server has several messages to send to the same receiving mail server, it can send all of the messages over the same TCP connection. For each message, the client begins the process with a new MAIL FROM: crepes.fr, designates the end of message with an isolated period, and issues QUIT only after all messages have been sent. It is highly recommended that you use Telnet to carry out a direct dialogue with an SMTP server. To do this, issue

    telnet serverName 25

where serverName is the name of a local mail server. When you do this, you are simply establishing a TCP connection between your local host and the mail server. After typing this line, you should immediately receive the 220 reply from the server. Then issue the SMTP commands HELO, MAIL FROM, RCPT TO, DATA, CRLF.CRLF, and QUIT at the appropriate times. It is also highly recommended that you do Programming Assignment 3 at the end of this chapter. In that assignment, you'll build a simple user agent that implements the client side of SMTP. It will allow you to send an e-mail message to an arbitrary recipient via a local mail server.

2.3.2 Comparison with HTTP

Let's now briefly compare SMTP with HTTP. Both protocols are used to transfer files from one host to another: HTTP transfers files (also called objects) from a Web server to a Web client (typically a browser); SMTP transfers files (that is, e-mail messages) from one mail server to another mail server. When transferring the files, both persistent HTTP and SMTP use persistent connections. Thus, the two protocols have common characteristics. However, there are important differences. First, HTTP is mainly a pull protocol---someone loads information on a Web server and users use HTTP to pull the information from the server at their convenience. In particular, the TCP connection is initiated by the machine that wants to receive the file.
On the other hand, SMTP is primarily a push protocol---the sending mail server pushes the file to the receiving mail server. In particular, the TCP connection is initiated by the machine that wants to send the file. A second difference, which we alluded to earlier, is that SMTP requires each message, including the body of each message, to be in 7-bit ASCII format. If the message contains characters that are not 7-bit ASCII (for example, French characters with accents) or contains binary data (such as an image file), then the message has to be encoded into 7-bit ASCII. HTTP data does not impose this restriction. A third important difference concerns how a document consisting of text and images (along with possibly other media types) is handled. As we learned in Section 2.2, HTTP encapsulates each object in its own HTTP response message. SMTP places all of the message's objects into one message.

2.3.3 Mail Message Formats

When Alice writes an ordinary snail-mail letter to Bob, she may include all kinds of peripheral header information at the top of the letter, such as Bob's address, her own return address, and the date. Similarly, when an e-mail message is sent from one person to another, a header containing peripheral information precedes the body of the message itself. This peripheral information is contained in a series of header lines, which are defined in RFC 5322. The header lines and the body of the message are separated by a blank line (that is, by CRLF). RFC 5322 specifies the exact format for mail header lines as well as their semantic interpretations. As with HTTP, each header line contains readable text, consisting of a keyword followed by a colon followed by a value. Some of the keywords are required and others are optional. Every header must have a From: header line and a To: header line; a header may include a Subject: header line as well as other optional header lines.
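The header-lines/blank-line/body framing can be seen concretely with Python's standard email library. This is a sketch using the chapter's example addresses; note that the library adds MIME-related header lines of its own beyond From:, To:, and Subject:.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@crepes.fr"
msg["To"] = "bob@hamburger.edu"
msg["Subject"] = "Searching for the meaning of life."
msg.set_content("Do you like ketchup?\nHow about pickles?\n")

text = msg.as_string()
# A blank line separates the header lines from the message body.
header, _, body = text.partition("\n\n")
```

Printing `text` shows each header line as a keyword, a colon, and a value, then the blank line, then the ASCII body.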
It is important to note that these header lines are different from the SMTP commands we studied in Section 2.3.1 (even though they contain some common words such as "from" and "to"). The commands in that section were part of the SMTP handshaking protocol; the header lines examined in this section are part of the mail message itself. A typical message header looks like this:

    From: alice@crepes.fr
    To: bob@hamburger.edu
    Subject: Searching for the meaning of life.

After the message header, a blank line follows; then the message body (in ASCII) follows. You should use Telnet to send a message to a mail server that contains some header lines, including the Subject: header line. To do this, issue telnet serverName 25, as discussed in Section 2.3.1.

2.3.4 Mail Access Protocols

Once SMTP delivers the message from Alice's mail server to Bob's mail server, the message is placed in Bob's mailbox. Throughout this discussion we have tacitly assumed that Bob reads his mail by logging onto the server host and then executing a mail reader that runs on that host. Up until the early 1990s this was the standard way of doing things. But today, mail access uses a client-server architecture---the typical user reads e-mail with a client that executes on the user's end system, for example, on an office PC, a laptop, or a smartphone. By executing a mail client on a local PC, users enjoy a rich set of features, including the ability to view multimedia messages and attachments. Given that Bob (the recipient) executes his user agent on his local PC, it is natural to consider placing a mail server on his local PC as well. With this approach, Alice's mail server would dialogue directly with Bob's PC. There is a problem with this approach, however. Recall that a mail server manages mailboxes and runs the client and server sides of SMTP.
If Bob's mail server were to reside on his local PC, then Bob's PC would have to remain always on, and connected to the Internet, in order to receive new mail, which can arrive at any time. This is impractical for many Internet users. Instead, a typical user runs a user agent on the local PC but accesses its mailbox stored on an always-on shared mail server. This mail server is shared with other users and is typically maintained by the user's ISP (for example, university or company). Now let's consider the path an e-mail message takes when it is sent from Alice to Bob. We just learned that at some point along the path the e-mail message needs to be deposited in Bob's mail server. This could be done simply by having Alice's user agent send the message directly to Bob's mail server. And this could be done with SMTP---indeed, SMTP has been designed for pushing e-mail from one host to another. However, typically the sender's user agent does not dialogue directly with the recipient's mail server. Instead, as shown in Figure 2.16, Alice's user agent uses SMTP to push the e-mail message into her mail server, then Alice's mail server uses SMTP (as an SMTP client) to relay the e-mail message to Bob's mail server. Why the two-step procedure? Primarily because without relaying through Alice's mail server, Alice's user agent doesn't have any recourse to an unreachable destination mail server. By having Alice first deposit the e-mail in her own mail server, Alice's mail server can repeatedly try to send the message to Bob's mail server, say every 30 minutes, until Bob's mail server becomes operational. (And if Alice's mail server is down, then she has the recourse of complaining to her system administrator!) The SMTP RFC defines how the SMTP commands can be used to relay a message across multiple SMTP servers. But there is still one missing piece to the puzzle!

Figure 2.16 E-mail protocols and their communicating entities
How does a recipient like Bob, running a user agent on his local PC, obtain his messages, which are sitting in a mail server within Bob's ISP? Note that Bob's user agent can't use SMTP to obtain the messages because obtaining the messages is a pull operation, whereas SMTP is a push protocol. The puzzle is completed by introducing a special mail access protocol that transfers messages from Bob's mail server to his local PC. There are currently a number of popular mail access protocols, including Post Office Protocol---Version 3 (POP3), Internet Mail Access Protocol (IMAP), and HTTP. Figure 2.16 provides a summary of the protocols that are used for Internet mail: SMTP is used to transfer mail from the sender's mail server to the recipient's mail server; SMTP is also used to transfer mail from the sender's user agent to the sender's mail server. A mail access protocol, such as POP3, is used to transfer mail from the recipient's mail server to the recipient's user agent.

POP3

POP3 is an extremely simple mail access protocol. It is defined in \[RFC 1939\], which is short and quite readable. Because the protocol is so simple, its functionality is rather limited. POP3 begins when the user agent (the client) opens a TCP connection to the mail server (the server) on port 110. With the TCP connection established, POP3 progresses through three phases: authorization, transaction, and update. During the first phase, authorization, the user agent sends a username and a password (in the clear) to authenticate the user. During the second phase, transaction, the user agent retrieves messages; also during this phase, the user agent can mark messages for deletion, remove deletion marks, and obtain mail statistics. The third phase, update, occurs after the client has issued the quit command, ending the POP3 session; at this time, the mail server deletes the messages that were marked for deletion.
In a POP3 transaction, the user agent issues commands, and the server responds to each command with a reply. There are two possible responses: +OK (sometimes followed by server-to-client data), used by the server to indicate that the previous command was fine; and -ERR, used by the server to indicate that something was wrong with the previous command. The authorization phase has two principal commands: user <username> and pass <password>. To illustrate these two commands, we suggest that you Telnet directly into a POP3 server, using port 110, and issue these commands. Suppose that mailServer is the name of your mail server. You will see something like:

    telnet mailServer 110
    +OK POP3 server ready
    user bob
    +OK
    pass hungry
    +OK user successfully logged on

If you misspell a command, the POP3 server will reply with an -ERR message. Now let's take a look at the transaction phase. A user agent using POP3 can often be configured (by the user) to "download and delete" or to "download and keep." The sequence of commands issued by a POP3 user agent depends on which of these two modes the user agent is operating in. In the download-and-delete mode, the user agent will issue the list, retr, and dele commands. As an example, suppose the user has two messages in his or her mailbox. In the dialogue below, C: (standing for client) is the user agent and S: (standing for server) is the mail server. The transaction will look something like:

    C: list
    S: 1 498
    S: 2 912
    S: .
    C: retr 1
    S: (blah blah ...
    S: .................
    S: ..........blah)
    S: .
    C: dele 1
    C: retr 2
    S: (blah blah ...
    S: .................
    S: ..........blah)
    S: .
    C: dele 2
    C: quit
    S: +OK POP3 server signing off

The user agent first asks the mail server to list the size of each of the stored messages. The user agent then retrieves and deletes each message from the server.
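The download-and-delete command sequence in the dialogue above can be generated mechanically. This is a sketch only; a real user agent would also read every +OK/-ERR reply as well as the message data between each retr and the lone period.

```python
def download_and_delete(listing):
    """Given the (number, size) pairs returned by the 'list' command,
    produce the retr/dele command sequence that a download-and-delete
    user agent would issue in the transaction phase."""
    commands = ["list"]
    for number, _size in listing:
        commands.append("retr %d" % number)
        commands.append("dele %d" % number)
    commands.append("quit")
    return commands
```

With the two messages of sizes 498 and 912 bytes from the dialogue, the helper emits the same list, retr, dele, and quit commands shown above.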
Note that after the authorization phase, the user agent employed only four commands: list, retr, dele, and quit. The syntax for these commands is defined in RFC 1939. After processing the quit command, the POP3 server enters the update phase and removes messages 1 and 2 from the mailbox. A problem with this download-and-delete mode is that the recipient, Bob, may be nomadic and may want to access his mail messages from multiple machines, for example, his office PC, his home PC, and his portable computer. The download-and-delete mode partitions Bob's mail messages over these three machines; in particular, if Bob first reads a message on his office PC, he will not be able to reread the message from his portable at home later in the evening. In the download-and-keep mode, the user agent leaves the messages on the mail server after downloading them. In this case, Bob can reread messages from different machines; he can access a message from work and access it again later in the week from home. During a POP3 session between a user agent and the mail server, the POP3 server maintains some state information; in particular, it keeps track of which user messages have been marked deleted. However, the POP3 server does not carry state information across POP3 sessions. This lack of state information across sessions greatly simplifies the implementation of a POP3 server.

IMAP

With POP3 access, once Bob has downloaded his messages to the local machine, he can create mail folders and move the downloaded messages into the folders. Bob can then delete messages, move messages across folders, and search for messages (by sender name or subject). But this paradigm---namely, folders and messages in the local machine---poses a problem for the nomadic user, who would prefer to maintain a folder hierarchy on a remote server that can be accessed from any computer.
This is not possible with POP3---the POP3 protocol does not provide any means for a user to create remote folders and assign messages to folders. To solve this and other problems, the IMAP protocol, defined in \[RFC 3501\], was invented. Like POP3, IMAP is a mail access protocol. It has many more features than POP3, but it is also significantly more complex. (And thus the client and server side implementations are significantly more complex.) An IMAP server will associate each message with a folder; when a message first arrives at the server, it is associated with the recipient's INBOX folder. The recipient can then move the message into a new, user-created folder, read the message, delete the message, and so on. The IMAP protocol provides commands to allow users to create folders and move messages from one folder to another. IMAP also provides commands that allow users to search remote folders for messages matching specific criteria. Note that, unlike POP3, an IMAP server maintains user state information across IMAP sessions---for example, the names of the folders and which messages are associated with which folders. Another important feature of IMAP is that it has commands that permit a user agent to obtain components of messages. For example, a user agent can obtain just the message header of a message or just one part of a multipart MIME message. This feature is useful when there is a low-bandwidth connection (for example, a slow-speed modem link) between the user agent and its mail server. With a low-bandwidth connection, the user may not want to download all of the messages in its mailbox, particularly avoiding long messages that might contain, for example, an audio or video clip.

Web-Based E-Mail

More and more users today are sending and accessing their e-mail through their Web browsers. Hotmail introduced Web-based access in the mid 1990s.
Now Web-based e-mail is also provided by Google, Yahoo!, as well as just about every major university and corporation. With this service, the user agent is an ordinary Web browser, and the user communicates with its remote mailbox via HTTP. When a recipient, such as Bob, wants to access a message in his mailbox, the e-mail message is sent from Bob's mail server to Bob's browser using the HTTP protocol rather than the POP3 or IMAP protocol. When a sender, such as Alice, wants to send an e-mail message, the e-mail message is sent from her browser to her mail server over HTTP rather than over SMTP. Alice's mail server, however, still sends messages to, and receives messages from, other mail servers using SMTP.

2.4 DNS---The Internet's Directory Service

We human beings can be identified in many ways. For example, we can be identified by the names that appear on our birth certificates. We can be identified by our social security numbers. We can be identified by our driver's license numbers. Although each of these identifiers can be used to identify people, within a given context one identifier may be more appropriate than another. For example, the computers at the IRS (the infamous tax-collecting agency in the United States) prefer to use fixed-length social security numbers rather than birth certificate names. On the other hand, ordinary people prefer the more mnemonic birth certificate names rather than social security numbers. (Indeed, can you imagine saying, "Hi. My name is 132-67-9875. Please meet my husband, 178-87-1146.") Just as humans can be identified in many ways, so too can Internet hosts. One identifier for a host is its hostname. Hostnames---such as www.facebook.com, www.google.com, gaia.cs.umass.edu---are mnemonic and are therefore appreciated by humans. However, hostnames provide little, if any, information about the location within the Internet of the host.
(A hostname such as www.eurecom.fr, which ends with the country code .fr, tells us that the host is probably in France, but doesn't say much more.) Furthermore, because hostnames can consist of variable-length alphanumeric characters, they would be difficult to process by routers. For these reasons, hosts are also identified by so-called IP addresses. We discuss IP addresses in some detail in Chapter 4, but it is useful to say a few brief words about them now. An IP address consists of four bytes and has a rigid hierarchical structure. An IP address looks like 121.7.106.83, where each period separates one of the bytes expressed in decimal notation from 0 to 255. An IP address is hierarchical because as we scan the address from left to right, we obtain more and more specific information about where the host is located in the Internet (that is, within which network, in the network of networks). Similarly, when we scan a postal address from bottom to top, we obtain more and more specific information about where the addressee is located.

2.4.1 Services Provided by DNS

We have just seen that there are two ways to identify a host---by a hostname and by an IP address. People prefer the more mnemonic hostname identifier, while routers prefer fixed-length, hierarchically structured IP addresses. In order to reconcile these preferences, we need a directory service that translates hostnames to IP addresses. This is the main task of the Internet's domain name system (DNS). The DNS is (1) a distributed database implemented in a hierarchy of DNS servers, and (2) an application-layer protocol that allows hosts to query the distributed database. The DNS servers are often UNIX machines running the Berkeley Internet Name Domain (BIND) software \[BIND 2016\]. The DNS protocol runs over UDP and uses port 53. DNS is commonly employed by other application-layer protocols---including HTTP and SMTP---to translate user-supplied hostnames to IP addresses.
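The four-byte dotted-decimal form described above can be checked with a few lines of Python (a sketch that handles only IPv4 dotted-decimal strings, not hostnames or other address notations):

```python
def is_dotted_decimal(address):
    """True if address has exactly four fields separated by periods,
    each field a decimal byte from 0 to 255."""
    fields = address.split(".")
    return len(fields) == 4 and all(
        field.isdigit() and 0 <= int(field) <= 255 for field in fields)
```

The check accepts 121.7.106.83 but rejects a mnemonic hostname such as www.eurecom.fr, reflecting the fixed-length, numeric structure that routers prefer.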
As an example, consider what happens when a browser (that is, an HTTP client), running on some user's host, requests the URL www.someschool.edu/index.html. In order for the user's host to be able to send an HTTP request message to the Web server www.someschool.edu, the user's host must first obtain the IP address of www.someschool.edu. This is done as follows.

1. The same user machine runs the client side of the DNS application.

2. The browser extracts the hostname, www.someschool.edu, from the URL
   and passes the hostname to the client side of the DNS application.

3. The DNS client sends a query containing the hostname to a DNS
   server.

4. The DNS client eventually receives a reply, which includes the IP
   address for the hostname.

5. Once the browser receives the IP address from DNS, it can initiate a
   TCP connection to the HTTP server process located at port 80 at that
   IP address.

We see from this example that DNS adds an additional delay---sometimes substantial---to the Internet applications that use it. Fortunately, as we discuss below, the desired IP address is often cached in a "nearby" DNS server, which helps to reduce DNS network traffic as well as the average DNS delay. DNS provides a few other important services in addition to translating hostnames to IP addresses:

Host aliasing. A host with a complicated hostname can have one or more alias names. For example, a hostname such as relay1.west-coast.enterprise.com could have, say, two aliases such as enterprise.com and www.enterprise.com. In this case, the hostname relay1.west-coast.enterprise.com is said to be a canonical hostname. Alias hostnames, when present, are typically more mnemonic than canonical hostnames. DNS can be invoked by an application to obtain the canonical hostname for a supplied alias hostname as well as the IP address of the host.

Mail server aliasing.
For obvious reasons, it is highly desirable that e-mail addresses be mnemonic. For example, if Bob has an account with Yahoo Mail, Bob's e-mail address might be as simple as bob@yahoo.mail. However, the hostname of the Yahoo mail server is more complicated and much less mnemonic than simply yahoo.com (for example, the canonical hostname might be something like relay1.west-coast.yahoo.com). DNS can be invoked by a mail application to obtain the canonical hostname for a supplied alias hostname as well as the IP address of the host. In fact, the MX record (see below) permits a company's mail server and Web server to have identical (aliased) hostnames; for example, a company's Web server and mail server can both be called enterprise.com.

Load distribution. DNS is also used to perform load distribution among replicated servers, such as replicated Web servers. Busy sites, such as cnn.com, are replicated over multiple servers, with each server running on a different end system and each having a different IP address. For replicated Web servers, a set of IP addresses is thus associated with one canonical hostname. The DNS database contains this set of IP addresses. When clients make a DNS query for a name mapped to a set of addresses, the server responds with the entire set of IP addresses, but rotates the ordering of the addresses within each reply. Because a client typically sends its HTTP request message to the IP address that is listed first in the set, DNS rotation distributes the traffic among the replicated servers. DNS rotation is also used for e-mail so that multiple mail servers can have the same alias name. Also, content distribution companies such as Akamai have used DNS in more sophisticated ways \[Dilley 2002\] to provide Web content distribution (see Section 2.6.3). The DNS is specified in RFC 1034 and RFC 1035, and updated in several additional RFCs.
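The rotation of the address set across successive replies can be sketched as follows. The addresses below are invented for illustration (drawn from the 203.0.113.0/24 documentation range); a real DNS server implements this rotation internally.

```python
def rotated_reply(addresses, query_number):
    """Return the address set for the given query, with the ordering
    rotated one position per query, so that clients picking the first
    address are spread across the replicated servers."""
    shift = query_number % len(addresses)
    return addresses[shift:] + addresses[:shift]

# Three replicated servers behind one canonical hostname (invented addresses).
replicas = ["203.0.113.1", "203.0.113.2", "203.0.113.3"]
```

Successive queries see 203.0.113.1, then 203.0.113.2, then 203.0.113.3 listed first, and the cycle repeats.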
It is a complex system, and we only touch upon key aspects of its operation here. The interested reader is referred to these RFCs and the book by Albitz and Liu \[Albitz 1993\]; see also the retrospective paper \[Mockapetris 1988\], which provides a nice description of the what and why of DNS, and \[Mockapetris 2005\].

PRINCIPLES IN PRACTICE

DNS: CRITICAL NETWORK FUNCTIONS VIA THE CLIENT-SERVER PARADIGM

Like HTTP, FTP, and SMTP, the DNS protocol is an application-layer protocol since it (1) runs between communicating end systems using the client-server paradigm and (2) relies on an underlying end-to-end transport protocol to transfer DNS messages between communicating end systems. In another sense, however, the role of the DNS is quite different from Web, file transfer, and e-mail applications. Unlike these applications, the DNS is not an application with which a user directly interacts. Instead, the DNS provides a core Internet function---namely, translating hostnames to their underlying IP addresses, for user applications and other software in the Internet. We noted in Section 1.2 that much of the complexity in the Internet architecture is located at the "edges" of the network. The DNS, which implements the critical name-to-address translation process using clients and servers located at the edge of the network, is yet another example of that design philosophy.

2.4.2 Overview of How DNS Works

We now present a high-level overview of how DNS works. Our discussion will focus on the hostname-to-IP-address translation service. Suppose that some application (such as a Web browser or a mail reader) running in a user's host needs to translate a hostname to an IP address. The application will invoke the client side of DNS, specifying the hostname that needs to be translated. (On many UNIX-based machines, gethostbyname() is the function call that an application calls in order to perform the translation.)
DNS in the user's host then takes over, sending a query message into the network. All DNS query and reply messages are sent within UDP datagrams to port 53. After a delay, ranging from milliseconds to seconds, DNS in the user's host receives a DNS reply message that provides the desired mapping. This mapping is then passed to the invoking application. Thus, from the perspective of the invoking application in the user's host, DNS is a black box providing a simple, straightforward translation service. But in fact, the black box that implements the service is complex, consisting of a large number of DNS servers distributed around the globe, as well as an application-layer protocol that specifies how the DNS servers and querying hosts communicate. A simple design for DNS would have one DNS server that contains all the mappings. In this centralized design, clients simply direct all queries to the single DNS server, and the DNS server responds directly to the querying clients. Although the simplicity of this design is attractive, it is inappropriate for today's Internet, with its vast (and growing) number of hosts. The problems with a centralized design include:

A single point of failure. If the DNS server crashes, so does the entire Internet!

Traffic volume. A single DNS server would have to handle all DNS queries (for all the HTTP requests and e-mail messages generated from hundreds of millions of hosts).

Distant centralized database. A single DNS server cannot be "close to" all the querying clients. If we put the single DNS server in New York City, then all queries from Australia must travel to the other side of the globe, perhaps over slow and congested links. This can lead to significant delays.

Maintenance. The single DNS server would have to keep records for all Internet hosts. Not only would this centralized database be huge, but it would have to be updated frequently to account for every new host.
In summary, a centralized database in a single DNS server simply doesn't scale. Consequently, the DNS is distributed by design. In fact, the DNS is a wonderful example of how a distributed database can be implemented in the Internet.

A Distributed, Hierarchical Database

In order to deal with the issue of scale, the DNS uses a large number of servers, organized in a hierarchical fashion and distributed around the world. No single DNS server has all of the mappings for all of the hosts in the Internet. Instead, the mappings are distributed across the DNS servers. To a first approximation, there are three classes of DNS servers---root DNS servers, top-level domain (TLD) DNS servers, and authoritative DNS servers---organized in a hierarchy as shown in Figure 2.17. To understand how these three classes of servers interact, suppose a DNS client wants to determine the IP address for the hostname www.amazon.com. To a first approximation, the following events will take place. The client first contacts one of the root servers, which returns IP addresses for TLD servers for the top-level domain com. The client then contacts one of these TLD servers, which returns the IP address of an authoritative server for amazon.com. Finally, the client contacts one of the authoritative servers for amazon.com, which returns the IP address for the hostname www.amazon.com. We'll soon examine this DNS lookup process in more detail. But let's first take a closer look at these three classes of DNS servers:

Figure 2.17 Portion of the hierarchy of DNS servers

Root DNS servers. There are over 400 root name servers scattered all over the world. Figure 2.18 shows the countries that have root name servers, with countries having more than ten darkly shaded. These root name servers are managed by 13 different organizations.
The full list of root name servers, along with the organizations that manage them and their IP addresses, can be found at \[Root Servers 2016\]. Root name servers provide the IP addresses of the TLD servers.

Top-level domain (TLD) servers. For each of the top-level domains---top-level domains such as com, org, net, edu, and gov, and all of the country top-level domains such as uk, fr, ca, and jp---there is a TLD server (or server cluster). The company Verisign Global Registry Services maintains the TLD servers for the com top-level domain, and the company Educause maintains the TLD servers for the edu top-level domain. The network infrastructure supporting a TLD can be large and complex; see \[Osterweil 2012\] for a nice overview of the Verisign network. See \[TLD list 2016\] for a list of all top-level domains. TLD servers provide the IP addresses for authoritative DNS servers.

Figure 2.18 DNS root servers in 2016

Authoritative DNS servers. Every organization with publicly accessible hosts (such as Web servers and mail servers) on the Internet must provide publicly accessible DNS records that map the names of those hosts to IP addresses. An organization's authoritative DNS server houses these DNS records. An organization can choose to implement its own authoritative DNS server to hold these records; alternatively, the organization can pay to have these records stored in an authoritative DNS server of some service provider. Most universities and large companies implement and maintain their own primary and secondary (backup) authoritative DNS servers.

The root, TLD, and authoritative DNS servers all belong to the hierarchy of DNS servers, as shown in Figure 2.17. There is another important type of DNS server called the local DNS server. A local DNS server does not strictly belong to the hierarchy of servers but is nevertheless central to the DNS architecture.
Each ISP---such as a residential ISP or an institutional ISP---has a local DNS server (also called a default name server). When a host connects to an ISP, the ISP provides the host with the IP addresses of one or more of its local DNS servers (typically through DHCP, which is discussed in Chapter 4). You can easily determine the IP address of your local DNS server by accessing network status windows in Windows or UNIX. A host's local DNS server is typically "close to" the host. For an institutional ISP, the local DNS server may be on the same LAN as the host; for a residential ISP, it is typically separated from the host by no more than a few routers. When a host makes a DNS query, the query is sent to the local DNS server, which acts as a proxy, forwarding the query into the DNS server hierarchy, as we'll discuss in more detail below.

Let's take a look at a simple example. Suppose the host cse.nyu.edu desires the IP address of gaia.cs.umass.edu. Also suppose that NYU's local DNS server for cse.nyu.edu is called dns.nyu.edu and that an authoritative DNS server for gaia.cs.umass.edu is called dns.umass.edu. As shown in Figure 2.19, the host cse.nyu.edu first sends a DNS query message to its local DNS server, dns.nyu.edu. The query message contains the hostname to be translated, namely, gaia.cs.umass.edu. The local DNS server forwards the query message to a root DNS server. The root DNS server takes note of the edu suffix and returns to the local DNS server a list of IP addresses for TLD servers responsible for edu. The local DNS server then resends the query message to one of these TLD servers. The TLD server takes note of the umass.edu suffix and responds with the IP address of the authoritative DNS server for the University of Massachusetts, namely, dns.umass.edu. Finally, the local DNS server resends the query message directly to dns.umass.edu, which responds with the IP address of gaia.cs.umass.edu.
Note that in this example, in order to obtain the mapping for one hostname, eight DNS messages were sent: four query messages and four reply messages! We'll soon see how DNS caching reduces this query traffic.

Figure 2.19 Interaction of the various DNS servers

Our previous example assumed that the TLD server knows the authoritative DNS server for the hostname. In general, this is not always true. Instead, the TLD server may know only of an intermediate DNS server, which in turn knows the authoritative DNS server for the hostname. For example, suppose again that the University of Massachusetts has a DNS server for the university, called dns.umass.edu. Also suppose that each of the departments at the University of Massachusetts has its own DNS server, and that each departmental DNS server is authoritative for all hosts in the department. In this case, when the intermediate DNS server, dns.umass.edu, receives a query for a host with a hostname ending with cs.umass.edu, it returns to dns.nyu.edu the IP address of dns.cs.umass.edu, which is authoritative for all hostnames ending with cs.umass.edu. The local DNS server dns.nyu.edu then sends the query to the authoritative DNS server, which returns the desired mapping to the local DNS server, which in turn returns the mapping to the requesting host. In this case, a total of 10 DNS messages are sent!

The example shown in Figure 2.19 makes use of both recursive queries and iterative queries. The query sent from cse.nyu.edu to dns.nyu.edu is a recursive query, since the query asks dns.nyu.edu to obtain the mapping on its behalf. But the subsequent three queries are iterative since all of the replies are directly returned to dns.nyu.edu. In theory, any DNS query can be iterative or recursive. For example, Figure 2.20 shows a DNS query chain for which all of the queries are recursive.
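The iterative pattern of Figure 2.19 can be sketched in a few lines of Python. This is a toy model, not a real resolver: the server contents and the IP address returned for gaia.cs.umass.edu are made-up placeholders, and real servers exchange resource records rather than bare strings.

```python
# Toy model of iterative DNS resolution (hypothetical data throughout).
# Each simulated server maps a name suffix to the next server to ask,
# or a hostname to its final (made-up) IP address.
ROOT = {"edu": "tld-edu.example"}
TLD = {"tld-edu.example": {"umass.edu": "dns.umass.edu"}}
AUTH = {"dns.umass.edu": {"gaia.cs.umass.edu": "128.119.245.12"}}

def resolve_iteratively(hostname):
    """Mimic a local DNS server: ask root, then TLD, then authoritative."""
    messages = 0
    tld_label = hostname.rsplit(".", 1)[-1]       # e.g. "edu"
    tld_server = ROOT[tld_label]; messages += 2   # query + reply to root
    domain = ".".join(hostname.split(".")[-2:])   # e.g. "umass.edu"
    auth_server = TLD[tld_server][domain]; messages += 2
    ip = AUTH[auth_server][hostname]; messages += 2
    # Plus the recursive query/reply between the host and its local server:
    return ip, messages + 2

ip, msgs = resolve_iteratively("gaia.cs.umass.edu")
print(ip, msgs)   # eight messages in total, as in the example above
```

Counting the query/reply pairs confirms the eight-message total of the example.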
In practice, the queries typically follow the pattern in Figure 2.19: The query from the requesting host to the local DNS server is recursive, and the remaining queries are iterative.

DNS Caching

Our discussion thus far has ignored DNS caching, a critically important feature of the DNS system. In truth, DNS extensively exploits DNS caching in order to improve the delay performance and to reduce the number of DNS messages ricocheting around the Internet.

Figure 2.20 Recursive queries in DNS

The idea behind DNS caching is very simple. In a query chain, when a DNS server receives a DNS reply (containing, for example, a mapping from a hostname to an IP address), it can cache the mapping in its local memory. For example, in Figure 2.19, each time the local DNS server dns.nyu.edu receives a reply from some DNS server, it can cache any of the information contained in the reply. If a hostname/IP address pair is cached in a DNS server and another query arrives to the DNS server for the same hostname, the DNS server can provide the desired IP address, even if it is not authoritative for the hostname. Because hosts and mappings between hostnames and IP addresses are by no means permanent, DNS servers discard cached information after a period of time (often set to two days).

As an example, suppose that a host apricot.nyu.edu queries dns.nyu.edu for the IP address for the hostname cnn.com. Furthermore, suppose that a few hours later, another NYU host, say, kiwi.nyu.edu, also queries dns.nyu.edu with the same hostname. Because of caching, the local DNS server will be able to immediately return the IP address of cnn.com to this second requesting host without having to query any other DNS servers. A local DNS server can also cache the IP addresses of TLD servers, thereby allowing the local DNS server to bypass the root DNS servers in a query chain.
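The caching behavior just described can be sketched as a small table with per-entry expiry. This is a minimal sketch, not a real server's cache: the IP address for cnn.com is illustrative, and the two-day default mentioned above corresponds to ttl=172800 seconds.

```python
import time

# A minimal DNS cache sketch: hostname -> (ip, expiry time).
class DnsCache:
    def __init__(self):
        self._store = {}

    def put(self, hostname, ip, ttl):
        self._store[hostname] = (ip, time.time() + ttl)

    def get(self, hostname):
        entry = self._store.get(hostname)
        if entry is None:
            return None                  # miss: must query the hierarchy
        ip, expires = entry
        if time.time() >= expires:
            del self._store[hostname]    # stale: discard and re-query
            return None
        return ip

cache = DnsCache()
cache.put("cnn.com", "151.101.1.67", ttl=172800)   # illustrative address
print(cache.get("cnn.com"))
```

A second query for cnn.com within the TTL is answered from the table without contacting any other DNS server, exactly as in the apricot/kiwi example above.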
In fact, because of caching, root servers are bypassed for all but a very small fraction of DNS queries.

2.4.3 DNS Records and Messages

The DNS servers that together implement the DNS distributed database store resource records (RRs), including RRs that provide hostname-to-IP address mappings. Each DNS reply message carries one or more resource records. In this and the following subsection, we provide a brief overview of DNS resource records and messages; more details can be found in \[Albitz 1993\] or in the DNS RFCs \[RFC 1034; RFC 1035\].

A resource record is a four-tuple that contains the following fields:

(Name, Value, Type, TTL)

TTL is the time to live of the resource record; it determines when a resource should be removed from a cache. In the example records given below, we ignore the TTL field. The meaning of Name and Value depends on Type:

- If Type=A, then Name is a hostname and Value is the IP address for the hostname. Thus, a Type A record provides the standard hostname-to-IP address mapping. As an example, (relay1.bar.foo.com, 145.37.93.126, A) is a Type A record.
- If Type=NS, then Name is a domain (such as foo.com) and Value is the hostname of an authoritative DNS server that knows how to obtain the IP addresses for hosts in the domain. This record is used to route DNS queries further along in the query chain. As an example, (foo.com, dns.foo.com, NS) is a Type NS record.
- If Type=CNAME, then Value is a canonical hostname for the alias hostname Name. This record can provide querying hosts the canonical name for a hostname. As an example, (foo.com, relay1.bar.foo.com, CNAME) is a CNAME record.
- If Type=MX, then Value is the canonical name of a mail server that has an alias hostname Name. As an example, (foo.com, mail.bar.foo.com, MX) is an MX record. MX records allow the hostnames of mail servers to have simple aliases.
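The four record types above can be modeled directly as (Name, Value, Type) tuples (TTL omitted, as in the examples). A minimal sketch, using only the section's own example data:

```python
# Resource records as (Name, Value, Type) tuples, from the examples above.
RECORDS = [
    ("relay1.bar.foo.com", "145.37.93.126", "A"),
    ("foo.com", "dns.foo.com", "NS"),
    ("foo.com", "relay1.bar.foo.com", "CNAME"),
    ("foo.com", "mail.bar.foo.com", "MX"),
]

def lookup(name, rtype):
    """Return the Value of every record matching (name, rtype)."""
    return [value for (n, value, t) in RECORDS if n == name and t == rtype]

print(lookup("foo.com", "MX"))      # canonical name of the mail server
print(lookup("foo.com", "CNAME"))   # canonical name behind the alias foo.com
```

Note how the same Name (foo.com) yields different Values depending on the Type queried, which is the point of the MX/CNAME distinction discussed next.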
Note that by using the MX record, a company can have the same aliased name for its mail server and for one of its other servers (such as its Web server). To obtain the canonical name for the mail server, a DNS client would query for an MX record; to obtain the canonical name for the other server, the DNS client would query for the CNAME record.

If a DNS server is authoritative for a particular hostname, then the DNS server will contain a Type A record for the hostname. (Even if the DNS server is not authoritative, it may contain a Type A record in its cache.) If a server is not authoritative for a hostname, then the server will contain a Type NS record for the domain that includes the hostname; it will also contain a Type A record that provides the IP address of the DNS server in the Value field of the NS record. As an example, suppose an edu TLD server is not authoritative for the host gaia.cs.umass.edu. Then this server will contain a record for a domain that includes the host gaia.cs.umass.edu, for example, (umass.edu, dns.umass.edu, NS). The edu TLD server would also contain a Type A record, which maps the DNS server dns.umass.edu to an IP address, for example, (dns.umass.edu, 128.119.40.111, A).

DNS Messages

Earlier in this section, we referred to DNS query and reply messages. These are the only two kinds of DNS messages. Furthermore, both query and reply messages have the same format, as shown in Figure 2.21. The semantics of the various fields in a DNS message are as follows:

The first 12 bytes is the header section, which has a number of fields. The first field is a 16-bit number that identifies the query. This identifier is copied into the reply message to a query, allowing the client to match received replies with sent queries. There are a number of flags in the flag field. A 1-bit query/reply flag indicates whether the message is a query (0) or a reply (1).
A 1-bit authoritative flag is set in a reply message when a DNS server is an authoritative server for a queried name. A 1-bit recursion-desired flag is set when a client (host or DNS server) desires that the DNS server perform recursion when it doesn't have the record. A 1-bit recursion-available field is set in a reply if the DNS server supports recursion. In the header, there are also four number-of fields. These fields indicate the number of occurrences of the four types of data sections that follow the header.

Figure 2.21 DNS message format

The question section contains information about the query that is being made. This section includes (1) a name field that contains the name that is being queried, and (2) a type field that indicates the type of question being asked about the name---for example, a host address associated with a name (Type A) or the mail server for a name (Type MX).

In a reply from a DNS server, the answer section contains the resource records for the name that was originally queried. Recall that in each resource record there is the Type (for example, A, NS, CNAME, and MX), the Value, and the TTL. A reply can return multiple RRs in the answer, since a hostname can have multiple IP addresses (for example, for replicated Web servers, as discussed earlier in this section).

The authority section contains records of other authoritative servers.

The additional section contains other helpful records. For example, the answer field in a reply to an MX query contains a resource record providing the canonical hostname of a mail server. The additional section contains a Type A record providing the IP address for the canonical hostname of the mail server.

How would you like to send a DNS query message directly from the host you're working on to some DNS server? This can easily be done with the nslookup program, which is available from most Windows and UNIX platforms.
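Before turning to nslookup, the 12-byte header just described can be assembled by hand. The sketch below packs the 16-bit identifier, the flag field, and the four number-of fields in network byte order; the bit positions for the query/reply (QR) and recursion-desired (RD) flags follow RFC 1035, and the identifier value is arbitrary.

```python
import struct

# Pack the 12-byte DNS header: identifier, flags, then the four
# number-of fields (questions, answers, authority, additional).
def dns_header(ident, is_reply=False, recursion_desired=True,
               qdcount=1, ancount=0, nscount=0, arcount=0):
    flags = 0
    if is_reply:
        flags |= 0x8000          # QR bit: 0 = query, 1 = reply
    if recursion_desired:
        flags |= 0x0100          # RD bit
    return struct.pack(">HHHHHH", ident, flags,
                       qdcount, ancount, nscount, arcount)

header = dns_header(0x1234)
print(len(header))               # 12 bytes, as described above
```

A full query message would append a question section (the encoded name plus type and class fields) after these 12 bytes; tools such as nslookup do exactly this on your behalf.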
For example, from a Windows host, open the Command Prompt and invoke the nslookup program by simply typing "nslookup." After invoking nslookup, you can send a DNS query to any DNS server (root, TLD, or authoritative). After receiving the reply message from the DNS server, nslookup will display the records included in the reply (in a human-readable format). As an alternative to running nslookup from your own host, you can visit one of many Web sites that allow you to remotely employ nslookup. (Just type "nslookup" into a search engine and you'll be brought to one of these sites.) The DNS Wireshark lab at the end of this chapter will allow you to explore the DNS in much more detail.

Inserting Records into the DNS Database

The discussion above focused on how records are retrieved from the DNS database. You might be wondering how records get into the database in the first place. Let's look at how this is done in the context of a specific example. Suppose you have just created an exciting new startup company called Network Utopia. The first thing you'll surely want to do is register the domain name networkutopia.com at a registrar. A registrar is a commercial entity that verifies the uniqueness of the domain name, enters the domain name into the DNS database (as discussed below), and collects a small fee from you for its services. Prior to 1999, a single registrar, Network Solutions, had a monopoly on domain name registration for com, net, and org domains. But now there are many registrars competing for customers, and the Internet Corporation for Assigned Names and Numbers (ICANN) accredits the various registrars. A complete list of accredited registrars is available at http://www.internic.net. When you register the domain name networkutopia.com with some registrar, you also need to provide the registrar with the names and IP addresses of your primary and secondary authoritative DNS servers.
Suppose the names and IP addresses are dns1.networkutopia.com, dns2.networkutopia.com, 212.212.212.1, and 212.212.212.2. For each of these two authoritative DNS servers, the registrar would then make sure that a Type NS and a Type A record are entered into the TLD com servers. Specifically, for the primary authoritative server for networkutopia.com, the registrar would insert the following two resource records into the DNS system:

(networkutopia.com, dns1.networkutopia.com, NS) (dns1.networkutopia.com, 212.212.212.1, A)

You'll also have to make sure that the Type A resource record for your Web server www.networkutopia.com and the Type MX resource record for your mail server mail.networkutopia.com are entered into your authoritative DNS servers. (Until recently, the contents of each DNS server were configured statically, for example, from a configuration file created by a system manager. More recently, an UPDATE option has been added to the DNS protocol to allow data to be dynamically added or deleted from the database via DNS messages. \[RFC 2136\] and \[RFC 3007\] specify DNS dynamic updates.)

FOCUS ON SECURITY

DNS VULNERABILITIES

We have seen that DNS is a critical component of the Internet infrastructure, with many important services---including the Web and e-mail---simply incapable of functioning without it. We therefore naturally ask, how can DNS be attacked? Is DNS a sitting duck, waiting to be knocked out of service, while taking most Internet applications down with it?

The first type of attack that comes to mind is a DDoS bandwidth-flooding attack (see Section 1.6) against DNS servers. For example, an attacker could attempt to send to each DNS root server a deluge of packets, so many that the majority of legitimate DNS queries never get answered. Such a large-scale DDoS attack against DNS root servers actually took place on October 21, 2002. In this attack, the attackers leveraged a botnet to send truckloads of ICMP ping messages to each of the 13 DNS root IP addresses. (ICMP messages are discussed in Section 5.6. For now, it suffices to know that ICMP packets are special types of IP datagrams.) Fortunately, this large-scale attack caused minimal damage, having little or no impact on users' Internet experience. The attackers did succeed at directing a deluge of packets at the root servers. But many of the DNS root servers were protected by packet filters, configured to always block all ICMP ping messages directed at the root servers. These protected servers were thus spared and functioned as normal. Furthermore, most local DNS servers cache the IP addresses of top-level-domain servers, allowing the query process to often bypass the DNS root servers.

A potentially more effective DDoS attack against DNS would be to send a deluge of DNS queries to top-level-domain servers, for example, to all the top-level-domain servers that handle the .com domain. It would be harder to filter DNS queries directed to DNS servers; and top-level-domain servers are not as easily bypassed as are root servers. But the severity of such an attack would be partially mitigated by caching in local DNS servers.

DNS could potentially be attacked in other ways. In a man-in-the-middle attack, the attacker intercepts queries from hosts and returns bogus replies. In the DNS poisoning attack, the attacker sends bogus replies to a DNS server, tricking the server into accepting bogus records into its cache. Either of these attacks could be used, for example, to redirect an unsuspecting Web user to the attacker's Web site. These attacks, however, are difficult to implement, as they require intercepting packets or throttling servers \[Skoudis 2006\].

In summary, DNS has demonstrated itself to be surprisingly robust against attacks. To date, there hasn't been an attack that has successfully impeded the DNS service.
Once all of these steps are completed, people will be able to visit your Web site and send e-mail to the employees at your company. Let's conclude our discussion of DNS by verifying that this statement is true. This verification also helps to solidify what we have learned about DNS. Suppose Alice in Australia wants to view the Web page www.networkutopia.com. As discussed earlier, her host will first send a DNS query to her local DNS server. The local DNS server will then contact a TLD com server. (The local DNS server will also have to contact a root DNS server if the address of a TLD com server is not cached.) This TLD server contains the Type NS and Type A resource records listed above, because the registrar had these resource records inserted into all of the TLD com servers. The TLD com server sends a reply to Alice's local DNS server, with the reply containing the two resource records. The local DNS server then sends a DNS query to 212.212.212.1, asking for the Type A record corresponding to www.networkutopia.com. This record provides the IP address of the desired Web server, say, 212.212.71.4, which the local DNS server passes back to Alice's host. Alice's browser can now initiate a TCP connection to the host 212.212.71.4 and send an HTTP request over the connection. Whew! There's a lot more going on than what meets the eye when one surfs the Web!

2.5 Peer-to-Peer File Distribution

The applications described in this chapter thus far---including the Web, e-mail, and DNS---all employ client-server architectures with significant reliance on always-on infrastructure servers. Recall from Section 2.1.1 that with a P2P architecture, there is minimal (or no) reliance on always-on infrastructure servers. Instead, pairs of intermittently connected hosts, called peers, communicate directly with each other. The peers are not owned by a service provider, but are instead desktops and laptops controlled by users.
In this section we consider a very natural P2P application, namely, distributing a large file from a single server to a large number of hosts (called peers). The file might be a new version of the Linux operating system, a software patch for an existing operating system or application, an MP3 music file, or an MPEG video file. In client-server file distribution, the server must send a copy of the file to each of the peers---placing an enormous burden on the server and consuming a large amount of server bandwidth. In P2P file distribution, each peer can redistribute any portion of the file it has received to any other peers, thereby assisting the server in the distribution process. As of 2016, the most popular P2P file distribution protocol is BitTorrent. Originally developed by Bram Cohen, there are now many different independent BitTorrent clients conforming to the BitTorrent protocol, just as there are a number of Web browser clients that conform to the HTTP protocol. In this subsection, we first examine the self-scalability of P2P architectures in the context of file distribution. We then describe BitTorrent in some detail, highlighting its most important characteristics and features.

Scalability of P2P Architectures

To compare client-server architectures with peer-to-peer architectures, and illustrate the inherent self-scalability of P2P, we now consider a simple quantitative model for distributing a file to a fixed set of peers for both architecture types. As shown in Figure 2.22, the server and the peers are connected to the Internet with access links. Denote the upload rate of the server's access link by us, the upload rate of the ith peer's access link by ui, and the download rate of the ith peer's access link by di. Also denote the size of the file to be distributed (in bits) by F and the number of peers that want to obtain a copy of the file by N.
The distribution time is the time it takes to get a copy of the file to all N peers.

Figure 2.22 An illustrative file distribution problem

In our analysis of the distribution time below, for both client-server and P2P architectures, we make the simplifying (and generally accurate \[Akella 2003\]) assumption that the Internet core has abundant bandwidth, implying that all of the bottlenecks are in access networks. We also suppose that the server and clients are not participating in any other network applications, so that all of their upload and download access bandwidth can be fully devoted to distributing this file.

Let's first determine the distribution time for the client-server architecture, which we denote by Dcs. In the client-server architecture, none of the peers aids in distributing the file. We make the following observations:

- The server must transmit one copy of the file to each of the N peers. Thus the server must transmit NF bits. Since the server's upload rate is us, the time to distribute the file must be at least NF/us.
- Let dmin denote the download rate of the peer with the lowest download rate, that is, dmin = min{d1, d2, . . ., dN}. The peer with the lowest download rate cannot obtain all F bits of the file in less than F/dmin seconds. Thus the minimum distribution time is at least F/dmin.

Putting these two observations together, we obtain

Dcs ≥ max{NF/us, F/dmin}.

This provides a lower bound on the minimum distribution time for the client-server architecture. In the homework problems you will be asked to show that the server can schedule its transmissions so that the lower bound is actually achieved. So let's take this lower bound provided above as the actual distribution time, that is,

Dcs = max{NF/us, F/dmin}

(2.1)

We see from Equation 2.1 that for N large enough, the client-server distribution time is given by NF/us. Thus, the distribution time increases linearly with the number of peers N.
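Equation 2.1 can be checked numerically. The sketch below uses the same parameter choices as Figure 2.23 (F/u = 1 hour, us = 10u, downloads fast enough not to matter); units are consistent (bits and bits/second), so the result is in seconds.

```python
# Equation 2.1: client-server distribution time.
def d_cs(F, u_s, d_min, N):
    """Dcs = max(N*F/u_s, F/d_min)."""
    return max(N * F / u_s, F / d_min)

u = 1.0          # common peer upload rate (bits/s)
F = 3600 * u     # file size chosen so F/u = 1 hour
u_s = 10 * u     # server uploads 10x faster than a peer
d_min = u_s      # downloads fast enough not to matter

for N in (10, 100, 1000):
    print(N, d_cs(F, u_s, d_min, N) / 3600, "hours")   # grows linearly: 1, 10, 100
```

The output grows linearly in N, matching the client-server curve in Figure 2.23.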
So, for example, if the number of peers from one week to the next increases a thousand-fold, from a thousand to a million, the time required to distribute the file to all peers increases by a factor of 1,000.

Let's now go through a similar analysis for the P2P architecture, where each peer can assist the server in distributing the file. In particular, when a peer receives some file data, it can use its own upload capacity to redistribute the data to other peers. Calculating the distribution time for the P2P architecture is somewhat more complicated than for the client-server architecture, since the distribution time depends on how each peer distributes portions of the file to the other peers. Nevertheless, a simple expression for the minimal distribution time can be obtained \[Kumar 2006\]. To this end, we first make the following observations:

- At the beginning of the distribution, only the server has the file. To get this file into the community of peers, the server must send each bit of the file at least once into its access link. Thus, the minimum distribution time is at least F/us. (Unlike the client-server scheme, a bit sent once by the server may not have to be sent by the server again, as the peers may redistribute the bit among themselves.)
- As with the client-server architecture, the peer with the lowest download rate cannot obtain all F bits of the file in less than F/dmin seconds. Thus the minimum distribution time is at least F/dmin.
- Finally, observe that the total upload capacity of the system as a whole is equal to the upload rate of the server plus the upload rates of each of the individual peers, that is, utotal = us + u1 + ⋯ + uN. The system must deliver (upload) F bits to each of the N peers, thus delivering a total of NF bits. This cannot be done at a rate faster than utotal. Thus, the minimum distribution time is also at least NF/(us + u1 + ⋯ + uN).
Putting these three observations together, we obtain the minimum distribution time for P2P, denoted by DP2P:

DP2P ≥ max{F/us, F/dmin, NF/(us + u1 + ⋯ + uN)}

(2.2)

Equation 2.2 provides a lower bound for the minimum distribution time for the P2P architecture. It turns out that if we imagine that each peer can redistribute a bit as soon as it receives the bit, then there is a redistribution scheme that actually achieves this lower bound \[Kumar 2006\]. (We will prove a special case of this result in the homework.) In reality, where chunks of the file are redistributed rather than individual bits, Equation 2.2 serves as a good approximation of the actual minimum distribution time. Thus, let's take the lower bound provided by Equation 2.2 as the actual minimum distribution time, that is,

DP2P = max{F/us, F/dmin, NF/(us + u1 + ⋯ + uN)}

(2.3)

Figure 2.23 compares the minimum distribution time for the client-server and P2P architectures assuming that all peers have the same upload rate u. In Figure 2.23, we have set F/u = 1 hour, us = 10u, and dmin ≥ us. Thus, a peer can transmit the entire file in one hour, the server transmission rate is 10 times the peer upload rate, and (for simplicity) the peer download rates are set large enough so as not to have an effect.

Figure 2.23 Distribution time for P2P and client-server architectures

We see from Figure 2.23 that for the client-server architecture, the distribution time increases linearly and without bound as the number of peers increases. However, for the P2P architecture, the minimal distribution time is not only always less than the distribution time of the client-server architecture; it is also less than one hour for any number of peers N. Thus, applications with the P2P architecture can be self-scaling. This scalability is a direct consequence of peers being redistributors as well as consumers of bits.

BitTorrent

BitTorrent is a popular P2P protocol for file distribution \[Chao 2011\].
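Before looking at BitTorrent's mechanics, the Figure 2.23 comparison of Equations 2.1 and 2.3 can be reproduced numerically. This sketch restates both formulas (so it stands alone) under the same assumptions: all peers upload at rate u, us = 10u, F/u = 1 hour, and downloads don't constrain.

```python
# Equation 2.3 (P2P) versus Equation 2.1 (client-server).
def d_p2p(F, u_s, d_min, peer_uploads):
    N = len(peer_uploads)
    return max(F / u_s, F / d_min, N * F / (u_s + sum(peer_uploads)))

def d_cs(F, u_s, d_min, N):
    return max(N * F / u_s, F / d_min)

u, F, u_s = 1.0, 3600.0, 10.0     # F/u = 1 hour, us = 10u
for N in (10, 100, 1000):
    p2p = d_p2p(F, u_s, u_s, [u] * N) / 3600
    cs = d_cs(F, u_s, u_s, N) / 3600
    print(N, round(p2p, 3), round(cs, 3))   # P2P stays under one hour
```

As N grows, the P2P time approaches but never exceeds one hour (the NF/utotal term tends to F/u), while the client-server time grows without bound, which is exactly the self-scaling behavior shown in Figure 2.23.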
In BitTorrent lingo, the collection of all peers participating in the distribution of a particular file is called a torrent. Peers in a torrent download equal-size chunks of the file from one another, with a typical chunk size of 256 KBytes. When a peer first joins a torrent, it has no chunks. Over time it accumulates more and more chunks. While it downloads chunks it also uploads chunks to other peers. Once a peer has acquired the entire file, it may (selfishly) leave the torrent, or (altruistically) remain in the torrent and continue to upload chunks to other peers. Also, any peer may leave the torrent at any time with only a subset of chunks, and later rejoin the torrent.

Let's now take a closer look at how BitTorrent operates. Since BitTorrent is a rather complicated protocol and system, we'll only describe its most important mechanisms, sweeping some of the details under the rug; this will allow us to see the forest through the trees. Each torrent has an infrastructure node called a tracker. When a peer joins a torrent, it registers itself with the tracker and periodically informs the tracker that it is still in the torrent. In this manner, the tracker keeps track of the peers that are participating in the torrent. A given torrent may have fewer than ten or more than a thousand peers participating at any instant of time.

Figure 2.24 File distribution with BitTorrent

As shown in Figure 2.24, when a new peer, Alice, joins the torrent, the tracker randomly selects a subset of peers (for concreteness, say 50) from the set of participating peers, and sends the IP addresses of these 50 peers to Alice. Possessing this list of peers, Alice attempts to establish concurrent TCP connections with all the peers on this list. Let's call all the peers with which Alice succeeds in establishing a TCP connection "neighboring peers." (In Figure 2.24, Alice is shown to have only three neighboring peers. Normally, she would have many more.)
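The tracker's bookkeeping role just described can be sketched in a few lines. This is a toy model with made-up peer addresses, not the real tracker protocol (which runs over HTTP and exchanges more state).

```python
import random

# Toy tracker: remember registered peers, hand a newcomer a random subset.
class Tracker:
    def __init__(self):
        self.peers = set()

    def register(self, addr):
        self.peers.add(addr)

    def peer_list_for(self, newcomer, k=50):
        others = list(self.peers - {newcomer})
        return random.sample(others, min(k, len(others)))

tracker = Tracker()
for i in range(200):
    tracker.register(f"10.0.0.{i}")      # hypothetical peer addresses
alice = "192.168.1.5"
tracker.register(alice)
neighbors = tracker.peer_list_for(alice)
print(len(neighbors))   # 50 candidate neighbors, as in the text
```

Alice would then attempt TCP connections to each of these 50 addresses; those that succeed become her neighboring peers.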
As time evolves, some of these peers may leave and other peers (outside the initial 50) may attempt to establish TCP connections with Alice. So a peer's neighboring peers will fluctuate over time. At any given time, each peer will have a subset of chunks from the file, with different peers having different subsets. Periodically, Alice will ask each of her neighboring peers (over the TCP connections) for the list of the chunks they have. If Alice has L different neighbors, she will obtain L lists of chunks. With this knowledge, Alice will issue requests (again over the TCP connections) for chunks she currently does not have.

So at any given instant of time, Alice will have a subset of chunks and will know which chunks her neighbors have. With this information, Alice will have two important decisions to make. First, which chunks should she request first from her neighbors? And second, to which of her neighbors should she send requested chunks? In deciding which chunks to request, Alice uses a technique called rarest first. The idea is to determine, from among the chunks she does not have, the chunks that are the rarest among her neighbors (that is, the chunks that have the fewest repeated copies among her neighbors) and then request those rarest chunks first. In this manner, the rarest chunks get more quickly redistributed, aiming to (roughly) equalize the numbers of copies of each chunk in the torrent.

To determine which requests she responds to, BitTorrent uses a clever trading algorithm. The basic idea is that Alice gives priority to the neighbors that are currently supplying her data at the highest rate. Specifically, for each of her neighbors, Alice continually measures the rate at which she receives bits and determines the four peers that are feeding her bits at the highest rate. She then reciprocates by sending chunks to these same four peers. Every 10 seconds, she recalculates the rates and possibly modifies the set of four peers.
In BitTorrent lingo, these four peers are said to be unchoked. Importantly, every 30 seconds, she also picks one additional neighbor at random and sends it chunks. Let's call the randomly chosen peer Bob. In BitTorrent lingo, Bob is said to be optimistically unchoked. Because Alice is sending data to Bob, she may become one of Bob's top four uploaders, in which case Bob would start to send data to Alice. If the rate at which Bob sends data to Alice is high enough, Bob could then, in turn, become one of Alice's top four uploaders. In other words, every 30 seconds, Alice will randomly choose a new trading partner and initiate trading with that partner. If the two peers are satisfied with the trading, they will put each other in their top four lists and continue trading with each other until one of the peers finds a better partner. The effect is that peers capable of uploading at compatible rates tend to find each other. The random neighbor selection also allows new peers to get chunks, so that they can have something to trade. All other neighboring peers besides these five peers (four "top" peers and one probing peer) are "choked," that is, they do not receive any chunks from Alice.

BitTorrent has a number of interesting mechanisms that are not discussed here, including pieces (minichunks), pipelining, random first selection, endgame mode, and anti-snubbing \[Cohen 2003\]. The incentive mechanism for trading just described is often referred to as tit-for-tat \[Cohen 2003\]. It has been shown that this incentive scheme can be circumvented \[Liogkas 2006; Locher 2006; Piatek 2007\]. Nevertheless, the BitTorrent ecosystem is wildly successful, with millions of simultaneous peers actively sharing files in hundreds of thousands of torrents.
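The unchoking logic just described can be sketched as follows. This is a simplified illustration (the function and peer names are invented); real clients interleave the 10-second rate recalculation and the 30-second optimistic unchoke on separate timers:

```python
import random

def select_unchoked(download_rates, rng=random):
    """Pick the peers to unchoke: the four neighbors currently feeding
    us bits at the highest rate, plus one randomly chosen neighbor that
    is 'optimistically unchoked'.

    download_rates: dict mapping neighbor ID -> measured rate (bits/s)
    at which that neighbor is currently sending us data.
    """
    # Top four by the rate they supply us (recomputed every 10 seconds).
    top_four = sorted(download_rates, key=download_rates.get,
                      reverse=True)[:4]
    # Every 30 seconds, one of the remaining neighbors is probed at
    # random; everyone outside these five stays choked.
    others = [p for p in download_rates if p not in top_four]
    optimistic = rng.choice(others) if others else None
    return top_four, optimistic

rates = {"bob": 80_000, "carol": 120_000, "dan": 50_000,
         "erin": 200_000, "frank": 10_000, "grace": 95_000}
top, probe = select_unchoked(rates)
print(top)  # ['erin', 'carol', 'grace', 'bob']
```

Here `probe` is either `dan` or `frank`; if the probed peer reciprocates fast enough, its measured rate rises and it displaces one of the top four at the next recalculation.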
If BitTorrent had been designed without tit-for-tat (or a variant), but otherwise exactly the same, BitTorrent would likely not even exist now, as the majority of the users would have been freeriders \[Saroiu 2002\]. We close our discussion on P2P by briefly mentioning another application of P2P, namely, the Distributed Hash Table (DHT). A distributed hash table is a simple database, with the database records being distributed over the peers in a P2P system. DHTs have been widely implemented (e.g., in BitTorrent) and have been the subject of extensive research. An overview is provided in a Video Note on the companion website.

Walking through distributed hash tables

2.6 Video Streaming and Content Distribution Networks

Streaming prerecorded video now accounts for the majority of the traffic in residential ISPs in North America. In particular, the Netflix and YouTube services alone consumed a whopping 37% and 16%, respectively, of residential ISP traffic in 2015 \[Sandvine 2015\]. In this section we will provide an overview of how popular video streaming services are implemented in today's Internet. We will see that they are implemented using application-level protocols and servers that function in some ways like a cache. In Chapter 9, devoted to multimedia networking, we will further examine Internet video as well as other Internet multimedia services.

2.6.1 Internet Video

In streaming stored video applications, the underlying medium is prerecorded video, such as a movie, a television show, a prerecorded sporting event, or a prerecorded user-generated video (such as those commonly seen on YouTube). These prerecorded videos are placed on servers, and users send requests to the servers to view the videos on demand. Many Internet companies today provide streaming video, including Netflix, YouTube (Google), Amazon, and Youku. But before launching into a discussion of video streaming, we should first get a quick feel for the video medium itself.
A video is a sequence of images, typically displayed at a constant rate, for example, at 24 or 30 images per second. An uncompressed, digitally encoded image consists of an array of pixels, with each pixel encoded into a number of bits to represent luminance and color. An important characteristic of video is that it can be compressed, thereby trading off video quality with bit rate. Today's off-the-shelf compression algorithms can compress a video to essentially any bit rate desired. Of course, the higher the bit rate, the better the image quality and the better the overall user viewing experience. From a networking perspective, perhaps the most salient characteristic of video is its high bit rate. Compressed Internet video typically ranges from 100 kbps for low-quality video to over 3 Mbps for streaming high-definition movies; 4K streaming envisions a bit rate of more than 10 Mbps. This can translate to a huge amount of traffic and storage, particularly for high-end video. For example, a single 2 Mbps video with a duration of 67 minutes will consume 1 gigabyte of storage and traffic. By far, the most important performance measure for streaming video is average end-to-end throughput. In order to provide continuous playout, the network must provide an average throughput to the streaming application that is at least as large as the bit rate of the compressed video.

We can also use compression to create multiple versions of the same video, each at a different quality level. For example, we can use compression to create, say, three versions of the same video, at rates of 300 kbps, 1 Mbps, and 3 Mbps. Users can then decide which version they want to watch as a function of their current available bandwidth. Users with high-speed Internet connections might choose the 3 Mbps version; users watching the video over 3G with a smartphone might choose the 300 kbps version.
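The 1-gigabyte figure above can be checked with a one-line calculation (a constant bit rate is assumed):

```python
def video_size_bytes(bit_rate_bps, duration_s):
    """Storage/traffic consumed by a constant-bit-rate video."""
    return bit_rate_bps * duration_s / 8  # 8 bits per byte

# The 2 Mbps, 67-minute example from the text:
size = video_size_bytes(2_000_000, 67 * 60)
print(f"{size / 1e9:.3f} GB")  # 1.005 GB
```

The same arithmetic read the other way gives the throughput requirement: to play this video continuously, the network must deliver an average of at least 2 Mbps to the client.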
2.6.2 HTTP Streaming and DASH

In HTTP streaming, the video is simply stored at an HTTP server as an ordinary file with a specific URL. When a user wants to see the video, the client establishes a TCP connection with the server and issues an HTTP GET request for that URL. The server then sends the video file, within an HTTP response message, as quickly as the underlying network protocols and traffic conditions will allow. On the client side, the bytes are collected in a client application buffer. Once the number of bytes in this buffer exceeds a predetermined threshold, the client application begins playback---specifically, the streaming video application periodically grabs video frames from the client application buffer, decompresses the frames, and displays them on the user's screen. Thus, the video streaming application is displaying video as it is receiving and buffering frames corresponding to latter parts of the video.

Although HTTP streaming, as described in the previous paragraph, has been extensively deployed in practice (for example, by YouTube since its inception), it has a major shortcoming: All clients receive the same encoding of the video, despite the large variations in the amount of bandwidth available to a client, both across different clients and also over time for the same client. This has led to the development of a new type of HTTP-based streaming, often referred to as Dynamic Adaptive Streaming over HTTP (DASH). In DASH, the video is encoded into several different versions, with each version having a different bit rate and, correspondingly, a different quality level. The client dynamically requests chunks of video segments of a few seconds in length. When the amount of available bandwidth is high, the client naturally selects chunks from a high-rate version; and when the available bandwidth is low, it naturally selects from a low-rate version.
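The adaptive selection just described can be sketched as a simple rate-determination step. This is a minimal illustration, not any real player's algorithm; the 0.8 safety factor and the three encoding rates (matching the 300 kbps / 1 Mbps / 3 Mbps example earlier) are assumptions:

```python
def choose_version(available_versions_bps, measured_bandwidth_bps,
                   safety_factor=0.8):
    """Pick the highest-rate encoding that fits within the measured
    bandwidth, leaving a safety margin; fall back to the lowest rate
    when even that does not fit.
    """
    budget = measured_bandwidth_bps * safety_factor
    feasible = [r for r in available_versions_bps if r <= budget]
    return max(feasible) if feasible else min(available_versions_bps)

versions = [300_000, 1_000_000, 3_000_000]
print(choose_version(versions, 5_000_000))  # 3000000: plenty of bandwidth
print(choose_version(versions, 1_100_000))  # 300000: budget is only 880 kbps
```

A production player would also weigh how much video is already buffered, as the text notes next, requesting higher-rate chunks when the buffer is comfortably full.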
The client selects different chunks one at a time with HTTP GET request messages \[Akhshabi 2011\]. DASH allows clients with different Internet access rates to stream video at different encoding rates. Clients with low-speed 3G connections can receive a low bit-rate (and low-quality) version, and clients with fiber connections can receive a high-quality version. DASH also allows a client to adapt to the available bandwidth if the available end-to-end bandwidth changes during the session. This feature is particularly important for mobile users, who typically see their bandwidth availability fluctuate as they move with respect to the base stations. With DASH, each video version is stored in the HTTP server, each with a different URL. The HTTP server also has a manifest file, which provides a URL for each version along with its bit rate. The client first requests the manifest file and learns about the various versions. The client then selects one chunk at a time by specifying a URL and a byte range in an HTTP GET request message for each chunk. While downloading chunks, the client also measures the received bandwidth and runs a rate determination algorithm to select the chunk to request next. Naturally, if the client has a lot of video buffered and if the measured receive bandwidth is high, it will choose a chunk from a high-bit-rate version. And naturally, if the client has little video buffered and the measured receive bandwidth is low, it will choose a chunk from a low-bit-rate version. DASH therefore allows the client to freely switch among different quality levels.

2.6.3 Content Distribution Networks

Today, many Internet video companies are distributing on-demand multi-Mbps streams to millions of users on a daily basis. YouTube, for example, with a library of hundreds of millions of videos, distributes hundreds of millions of video streams to users around the world every day.
Streaming all this traffic to locations all over the world while providing continuous playout and high interactivity is clearly a challenging task. For an Internet video company, perhaps the most straightforward approach to providing streaming video service is to build a single massive data center, store all of its videos in the data center, and stream the videos directly from the data center to clients worldwide. But there are three major problems with this approach. First, if the client is far from the data center, server-to-client packets will cross many communication links and likely pass through many ISPs, with some of the ISPs possibly located on different continents. If one of these links provides a throughput that is less than the video consumption rate, the end-to-end throughput will also be below the consumption rate, resulting in annoying freezing delays for the user. (Recall from Chapter 1 that the end-to-end throughput of a stream is governed by the throughput at the bottleneck link.) The likelihood of this happening increases as the number of links in the end-to-end path increases. A second drawback is that a popular video will likely be sent many times over the same communication links. Not only does this waste network bandwidth, but the Internet video company itself will be paying its provider ISP (connected to the data center) for sending the same bytes into the Internet over and over again. A third problem with this solution is that a single data center represents a single point of failure---if the data center or its links to the Internet go down, the company would not be able to distribute any video streams.

In order to meet the challenge of distributing massive amounts of video data to users distributed around the world, almost all major video-streaming companies make use of Content Distribution Networks (CDNs).
A CDN manages servers in multiple geographically distributed locations, stores copies of the videos (and other types of Web content, including documents, images, and audio) in its servers, and attempts to direct each user request to a CDN location that will provide the best user experience. The CDN may be a private CDN, that is, owned by the content provider itself; for example, Google's CDN distributes YouTube videos and other types of content. The CDN may alternatively be a third-party CDN that distributes content on behalf of multiple content providers; Akamai, Limelight, and Level-3 all operate third-party CDNs. A very readable overview of modern CDNs is \[Leighton 2009; Nygren 2010\].

CDNs typically adopt one of two different server placement philosophies \[Huang 2008\]:

Enter Deep. One philosophy, pioneered by Akamai, is to enter deep into the access networks of Internet Service Providers, by deploying server clusters in access ISPs all over the world. (Access networks are described in Section 1.3.) Akamai takes this approach with clusters in approximately 1,700 locations. The goal is to get close to end users, thereby improving user-perceived delay and throughput by decreasing the number of links and routers between the end user and the CDN server from which it receives content. Because of this highly distributed design, the task of maintaining and managing the clusters becomes challenging.

Bring Home. A second design philosophy, taken by Limelight and many other CDN companies, is to bring the ISPs home by building large clusters at a smaller number (for example, tens) of sites. Instead of getting inside the access ISPs, these CDNs typically place their clusters in Internet Exchange Points (IXPs) (see Section 1.3). Compared with the enter-deep design philosophy, the bring-home design typically results in lower maintenance and management overhead, possibly at the expense of higher delay and lower throughput to end users.
Once its clusters are in place, the CDN replicates content across its clusters. The CDN may not want to place a copy of every video in each cluster, since some videos are rarely viewed or are only popular in some countries. In fact, many CDNs do not push videos to their clusters but instead use a simple pull strategy: If a client requests a video from a cluster that is not storing the video, then the cluster retrieves the video (from a central repository or from another cluster) and stores a copy locally while streaming the video to the client at the same time. Similar to Web caching (see Section 2.2.5), when a cluster's storage becomes full, it removes videos that are not frequently requested.

CASE STUDY: GOOGLE'S NETWORK INFRASTRUCTURE

To support its vast array of cloud services---including search, Gmail, calendar, YouTube video, maps, documents, and social networks---Google has deployed an extensive private network and CDN infrastructure. Google's CDN infrastructure has three tiers of server clusters:

Fourteen "mega data centers," with eight in North America, four in Europe, and two in Asia \[Google Locations 2016\], with each data center having on the order of 100,000 servers. These mega data centers are responsible for serving dynamic (and often personalized) content, including search results and Gmail messages.

An estimated 50 clusters in IXPs scattered throughout the world, with each cluster consisting of on the order of 100--500 servers \[Adhikari 2011a\]. These clusters are responsible for serving static content, including YouTube videos \[Adhikari 2011a\].

Many hundreds of "enter-deep" clusters located within an access ISP. Here a cluster typically consists of tens of servers within a single rack. These enter-deep servers perform TCP splitting (see Section 3.7) and serve static content \[Chen 2011\], including the static portions of Web pages that embody search results.

All of these data centers and cluster locations are networked together with Google's own private network. When a user makes a search query, often the query is first sent over the local ISP to a nearby enter-deep cache, from where the static content is retrieved; while providing the static content to the client, the nearby cache also forwards the query over Google's private network to one of the mega data centers, from where the personalized search results are retrieved. For a YouTube video, the video itself may come from one of the bring-home caches, whereas portions of the Web page surrounding the video may come from the nearby enter-deep cache, and the advertisements surrounding the video come from the data centers. In summary, except for the local ISPs, the Google cloud services are largely provided by a network infrastructure that is independent of the public Internet.

CDN Operation

Having identified the two major approaches toward deploying a CDN, let's now dive down into the nuts and bolts of how a CDN operates. When a browser in a user's host is instructed to retrieve a specific video (identified by a URL), the CDN must intercept the request so that it can (1) determine a suitable CDN server cluster for that client at that time, and (2) redirect the client's request to a server in that cluster. We'll shortly discuss how a CDN can determine a suitable cluster. But first let's examine the mechanics behind intercepting and redirecting a request.

Most CDNs take advantage of DNS to intercept and redirect requests; an interesting discussion of such a use of the DNS is \[Vixie 2009\]. Let's consider a simple example to illustrate how the DNS is typically involved. Suppose a content provider, NetCinema, employs the third-party CDN company, KingCDN, to distribute its videos to its customers.
On the NetCinema Web pages, each of its videos is assigned a URL that includes the string "video" and a unique identifier for the video itself; for example, Transformers 7 might be assigned http://video.netcinema.com/6Y7B23V. Six steps then occur, as shown in Figure 2.25:

1. The user visits the Web page at NetCinema.
2. When the user clicks on the link http://video.netcinema.com/6Y7B23V, the user's host sends a DNS query for video.netcinema.com.
3. The user's Local DNS Server (LDNS) relays the DNS query to an authoritative DNS server for NetCinema, which observes the string "video" in the hostname video.netcinema.com. To "hand over" the DNS query to KingCDN, instead of returning an IP address, the NetCinema authoritative DNS server returns to the LDNS a hostname in the KingCDN's domain, for example, a1105.kingcdn.com.
4. From this point on, the DNS query enters into KingCDN's private DNS infrastructure. The user's LDNS then sends a second query, now for a1105.kingcdn.com, and KingCDN's DNS system eventually returns the IP addresses of a KingCDN content server to the LDNS. It is thus here, within the KingCDN's DNS system, that the CDN server from which the client will receive its content is specified.
5. The LDNS forwards the IP address of the content-serving CDN node to the user's host.
6. Once the client receives the IP address for a KingCDN content server, it establishes a direct TCP connection with the server at that IP address and issues an HTTP GET request for the video. If DASH is used, the server will first send to the client a manifest file with a list of URLs, one for each version of the video, and the client will dynamically select chunks from the different versions.

Figure 2.25 DNS redirects a user's request to a CDN server
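The DNS hand-off in steps 3 and 4 can be traced with a toy resolver. The hostnames below come from the example in the text; the record table and the final IP address are invented for illustration, and real resolution of course involves actual DNS servers rather than a dictionary:

```python
# Toy DNS tables mirroring the NetCinema/KingCDN example.
RECORDS = {
    # NetCinema's authoritative server hands the query over to KingCDN
    # by returning a hostname in KingCDN's domain, not an IP address.
    "video.netcinema.com": "a1105.kingcdn.com",
    # KingCDN's own DNS system maps that hostname to a content server
    # (the address here is a documentation-range placeholder).
    "a1105.kingcdn.com": "203.0.113.45",
}

def resolve(hostname):
    """Follow redirections, as the user's LDNS would, until an IP
    address appears (crudely detected by a leading digit)."""
    chain = [hostname]
    while not chain[-1][0].isdigit():
        chain.append(RECORDS[chain[-1]])
    return chain

print(resolve("video.netcinema.com"))
# ['video.netcinema.com', 'a1105.kingcdn.com', '203.0.113.45']
```

The key point the sketch captures is that the content provider's DNS server never returns an IP address for the video; it delegates that decision to the CDN's DNS system, which is where cluster selection happens.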
Cluster Selection Strategies

At the core of any CDN deployment is a cluster selection strategy, that is, a mechanism for dynamically directing clients to a server cluster or a data center within the CDN. As we just saw, the CDN learns the IP address of the client's LDNS server via the client's DNS lookup. After learning this IP address, the CDN needs to select an appropriate cluster based on this IP address. CDNs generally employ proprietary cluster selection strategies. We now briefly survey a few approaches, each of which has its own advantages and disadvantages.

One simple strategy is to assign the client to the cluster that is geographically closest. Using commercial geo-location databases (such as Quova \[Quova 2016\] and MaxMind \[MaxMind 2016\]), each LDNS IP address is mapped to a geographic location. When a DNS request is received from a particular LDNS, the CDN chooses the geographically closest cluster, that is, the cluster that is the fewest kilometers from the LDNS "as the bird flies." Such a solution can work reasonably well for a large fraction of the clients \[Agarwal 2009\]. However, for some clients, the solution may perform poorly, since the geographically closest cluster may not be the closest cluster in terms of the length or number of hops of the network path. Furthermore, a problem inherent with all DNS-based approaches is that some end users are configured to use remotely located LDNSs \[Shaikh 2001; Mao 2002\], in which case the LDNS location may be far from the client's location. Moreover, this simple strategy ignores the variation in delay and available bandwidth over time of Internet paths, always assigning the same cluster to a particular client.

In order to determine the best cluster for a client based on the current traffic conditions, CDNs can instead perform periodic real-time measurements of delay and loss performance between their clusters and clients.
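The geographically-closest strategy amounts to a nearest-neighbor lookup on great-circle ("as the bird flies") distance. A minimal sketch follows; the cluster names and coordinates are invented, approximate city locations used only for illustration:

```python
from math import radians, sin, cos, asin, sqrt

def great_circle_km(a, b):
    """Great-circle distance between two (lat, lon) points, in km
    (haversine formula, Earth radius ~6371 km)."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(h))

def closest_cluster(ldns_location, clusters):
    """Pick the cluster geographically closest to the LDNS location
    (as a geo-location database would report it)."""
    return min(clusters,
               key=lambda c: great_circle_km(ldns_location, clusters[c]))

clusters = {"new-york": (40.7, -74.0),
            "frankfurt": (50.1, 8.7),
            "tokyo": (35.7, 139.7)}
# An LDNS geolocated to Boston maps to the New York cluster.
print(closest_cluster((42.4, -71.1), clusters))  # new-york
```

Note that this mapping is entirely static: it says nothing about path length, hop count, or current congestion, which is why CDNs may prefer to measure network conditions directly.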
For instance, a CDN can have each of its clusters periodically send probes (for example, ping messages or DNS queries) to all of the LDNSs around the world. One drawback of this approach is that many LDNSs are configured to not respond to such probes.

2.6.4 Case Studies: Netflix, YouTube, and Kankan

We conclude our discussion of streaming stored video by taking a look at three highly successful large-scale deployments: Netflix, YouTube, and Kankan. We'll see that each of these systems takes a very different approach, yet employs many of the underlying principles discussed in this section.

Netflix

Generating 37% of the downstream traffic in residential ISPs in North America in 2015, Netflix has become the leading service provider for online movies and TV series in the United States \[Sandvine 2015\]. As we discuss below, Netflix video distribution has two major components: the Amazon cloud and its own private CDN infrastructure. Netflix has a Web site that handles numerous functions, including user registration and login, billing, movie catalogue for browsing and searching, and a movie recommendation system. As shown in Figure 2.26, this Web site (and its associated backend databases) runs entirely on Amazon servers in the Amazon cloud. Additionally, the Amazon cloud handles the following critical functions:

Content ingestion. Before Netflix can distribute a movie to its customers, it must first ingest and process the movie. Netflix receives studio master versions of movies and uploads them to hosts in the Amazon cloud.

Content processing. The machines in the Amazon cloud create many different formats for each movie, suitable for a diverse array of client video players running on desktop computers, smartphones, and game consoles connected to televisions. A different version is created for each of these formats and at multiple bit rates, allowing for adaptive streaming over HTTP using DASH.

Uploading versions to its CDN.
Once all of the versions of a movie have been created, the hosts in the Amazon cloud upload the versions to its CDN.

Figure 2.26 Netflix video streaming platform

When Netflix first rolled out its video streaming service in 2007, it employed three third-party CDN companies to distribute its video content. Netflix has since created its own private CDN, from which it now streams all of its videos. (Netflix still uses Akamai to distribute its Web pages, however.) To create its own CDN, Netflix has installed server racks both in IXPs and within residential ISPs themselves. Netflix currently has server racks in over 50 IXP locations; see \[Netflix Open Connect 2016\] for a current list of IXPs housing Netflix racks. There are also hundreds of ISP locations housing Netflix racks; also see \[Netflix Open Connect 2016\], where Netflix provides to potential ISP partners instructions about installing a (free) Netflix rack for their networks. Each server in the rack has several 10 Gbps Ethernet ports and over 100 terabytes of storage. The number of servers in a rack varies: IXP installations often have tens of servers and contain the entire Netflix streaming video library, including multiple versions of the videos to support DASH; racks in local ISPs may have only one server and contain only the most popular videos. Netflix does not use pull-caching (Section 2.2.5) to populate its CDN servers in the IXPs and ISPs. Instead, Netflix distributes by pushing the videos to its CDN servers during off-peak hours. For those locations that cannot hold the entire library, Netflix pushes only the most popular videos, which are determined on a day-to-day basis. The Netflix CDN design is described in some detail in the YouTube videos \[Netflix Video 1\] and \[Netflix Video 2\].

Having described the components of the Netflix architecture, let's take a closer look at the interaction between the client and the various servers that are involved in movie delivery.
As indicated earlier, the Web pages for browsing the Netflix video library are served from servers in the Amazon cloud. When a user selects a movie to play, the Netflix software, running in the Amazon cloud, first determines which of its CDN servers have copies of the movie. Among the servers that have the movie, the software then determines the "best" server for that client request. If the client is using a residential ISP that has a Netflix CDN server rack installed in that ISP, and this rack has a copy of the requested movie, then a server in this rack is typically selected. If not, a server at a nearby IXP is typically selected. Once Netflix determines the CDN server that is to deliver the content, it sends the client the IP address of the specific server as well as a manifest file, which has the URLs for the different versions of the requested movie. The client and that CDN server then directly interact using a proprietary version of DASH. Specifically, as described in Section 2.6.2, the client uses the byte-range header in HTTP GET request messages to request chunks from the different versions of the movie. Netflix uses chunks that are approximately four seconds long \[Adhikari 2012\]. While the chunks are being downloaded, the client measures the received throughput and runs a rate-determination algorithm to determine the quality of the next chunk to request.

Netflix embodies many of the key principles discussed earlier in this section, including adaptive streaming and CDN distribution. However, because Netflix uses its own private CDN, which distributes only video (and not Web pages), Netflix has been able to simplify and tailor its CDN design. In particular, Netflix does not need to employ DNS redirect, as discussed in Section 2.6.3, to connect a particular client to a CDN server; instead, the Netflix software (running in the Amazon cloud) directly tells the client to use a particular CDN server.
Furthermore, the Netflix CDN uses push caching rather than pull caching (Section 2.2.5): content is pushed into the servers at scheduled times at off-peak hours, rather than dynamically during cache misses.

YouTube

With 300 hours of video uploaded to YouTube every minute and several billion video views per day \[YouTube 2016\], YouTube is indisputably the world's largest video-sharing site. YouTube began its service in April 2005 and was acquired by Google in November 2006. Although the Google/YouTube design and protocols are proprietary, through several independent measurement efforts we can gain a basic understanding about how YouTube operates \[Zink 2009; Torres 2011; Adhikari 2011a\]. As with Netflix, YouTube makes extensive use of CDN technology to distribute its videos \[Torres 2011\]. Similar to Netflix, Google uses its own private CDN to distribute YouTube videos, and has installed server clusters in many hundreds of different IXP and ISP locations. From these locations and directly from its huge data centers, Google distributes YouTube videos \[Adhikari 2011a\]. Unlike Netflix, however, Google uses pull caching, as described in Section 2.2.5, and DNS redirect, as described in Section 2.6.3. Most of the time, Google's cluster-selection strategy directs the client to the cluster for which the RTT between client and cluster is the lowest; however, in order to balance the load across clusters, sometimes the client is directed (via DNS) to a more distant cluster \[Torres 2011\]. YouTube employs HTTP streaming, often making a small number of different versions available for a video, each with a different bit rate and corresponding quality level. YouTube does not employ adaptive streaming (such as DASH), but instead requires the user to manually select a version.
In order to save bandwidth and server resources that would be wasted by repositioning or early termination, YouTube uses the HTTP byte-range request to limit the flow of transmitted data after a target amount of video is prefetched. Several million videos are uploaded to YouTube every day. Not only are YouTube videos streamed from server to client over HTTP, but YouTube uploaders also upload their videos from client to server over HTTP. YouTube processes each video it receives, converting it to a YouTube video format and creating multiple versions at different bit rates. This processing takes place entirely within Google data centers. (See the case study on Google's network infrastructure in Section 2.6.3.)

Kankan

We just saw that dedicated servers, operated by private CDNs, stream Netflix and YouTube videos to clients. Netflix and YouTube have to pay not only for the server hardware but also for the bandwidth the servers use to distribute the videos. Given the scale of these services and the amount of bandwidth they are consuming, such a CDN deployment can be costly. We conclude this section by describing an entirely different approach for providing video on demand over the Internet at a large scale---one that allows the service provider to significantly reduce its infrastructure and bandwidth costs. As you might suspect, this approach uses P2P delivery instead of (or along with) client-server delivery. Since 2011, Kankan (owned and operated by Xunlei) has been deploying P2P video delivery with great success, with tens of millions of users every month \[Zhang 2015\].

At a high level, P2P video streaming is very similar to BitTorrent file downloading. When a peer wants to see a video, it contacts a tracker to discover other peers in the system that have a copy of that video. This requesting peer then requests chunks of the video in parallel from the other peers that have the video.
Different from downloading with BitTorrent, however, requests are preferentially made for chunks that are to be played back in the near future in order to ensure continuous playback \[Dhungel 2012\]. Recently, Kankan has migrated to a hybrid CDN-P2P streaming system \[Zhang 2015\]. Specifically, Kankan now deploys a few hundred servers within China and pushes video content to these servers. This Kankan CDN plays a major role in the start-up stage of video streaming. In most cases, the client requests the beginning of the content from CDN servers, and in parallel requests content from peers. When the total P2P traffic is sufficient for video playback, the client will cease streaming from the CDN and only stream from peers. But if the P2P streaming traffic becomes insufficient, the client will restart CDN connections and return to the mode of hybrid CDN-P2P streaming. In this manner, Kankan can ensure short initial start-up delays while minimally relying on costly infrastructure servers and bandwidth.

2.7 Socket Programming: Creating Network Applications

Now that we've looked at a number of important network applications, let's explore how network application programs are actually created. Recall from Section 2.1 that a typical network application consists of a pair of programs---a client program and a server program---residing in two different end systems. When these two programs are executed, a client process and a server process are created, and these processes communicate with each other by reading from, and writing to, sockets. When creating a network application, the developer's main task is therefore to write the code for both the client and server programs. There are two types of network applications.
One type is an implementation whose operation is specified in a protocol standard, such as an RFC or some other standards document; such an application is sometimes referred to as "open," since the rules specifying its operation are known to all. For such an implementation, the client and server programs must conform to the rules dictated by the RFC. For example, the client program could be an implementation of the client side of the HTTP protocol, described in Section 2.2 and precisely defined in RFC 2616; similarly, the server program could be an implementation of the HTTP server protocol, also precisely defined in RFC 2616. If one developer writes code for the client program and another developer writes code for the server program, and both developers carefully follow the rules of the RFC, then the two programs will be able to interoperate. Indeed, many of today's network applications involve communication between client and server programs that have been created by independent developers---for example, a Google Chrome browser communicating with an Apache Web server, or a BitTorrent client communicating with a BitTorrent tracker. The other type of network application is a proprietary network application. In this case the client and server programs employ an application-layer protocol that has not been openly published in an RFC or elsewhere. A single developer (or development team) creates both the client and server programs, and the developer has complete control over what goes in the code. But because the code does not implement an open protocol, other independent developers will not be able to develop code that interoperates with the application. In this section, we'll examine the key issues in developing a client-server application, and we'll "get our hands dirty" by looking at code that implements a very simple client-server application.
During the development phase, one of the first decisions the developer must make is whether the application is to run over TCP or over UDP. Recall that TCP is connection oriented and provides a reliable byte-stream channel through which data flows between two end systems. UDP is connectionless and sends independent packets of data from one end system to the other, without any guarantees about delivery.

Recall also that when a client or server program implements a protocol defined by an RFC, it should use the well-known port number associated with the protocol; conversely, when developing a proprietary application, the developer must be careful to avoid using such well-known port numbers. (Port numbers were briefly discussed in Section 2.1. They are covered in more detail in Chapter 3.) We introduce UDP and TCP socket programming by way of a simple UDP application and a simple TCP application. We present the simple UDP and TCP applications in Python 3. We could have written the code in Java, C, or C++, but we chose Python mostly because Python clearly exposes the key socket concepts. With Python there are fewer lines of code, and each line can be explained to the novice programmer without difficulty. But there's no need to be frightened if you are not familiar with Python. You should be able to easily follow the code if you have experience programming in Java, C, or C++. If you are interested in client-server programming with Java, you are encouraged to see the Companion Website for this textbook; in fact, you can find there all the examples in this section (and associated labs) in Java. For readers who are interested in client-server programming in C, there are several good references available \[Donahoo 2001; Stevens 1997; Frost 1994; Kurose 1996\]; our Python examples below have a similar look and feel to C.
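The well-known port numbers mentioned above are recorded in the operating system's services database, which Python's socket module can query. A short illustration (it assumes a POSIX-like system with a standard services database, e.g. /etc/services):

```python
from socket import getservbyname, getservbyport

# Look up the well-known port registered for a service name.
# These mappings come from the operating system's services database.
print(getservbyname('http', 'tcp'))    # 80
print(getservbyname('smtp', 'tcp'))    # 25

# The reverse mapping also exists: from a port number to a service name.
print(getservbyport(80, 'tcp'))
```

A proprietary application would simply pick a port, such as the 12000 used below, that does not appear in this database.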
2.7.1 Socket Programming with UDP

In this subsection, we'll write simple client-server programs that use UDP; in the following section, we'll write similar programs that use TCP. Recall from Section 2.1 that processes running on different machines communicate with each other by sending messages into sockets. We said that each process is analogous to a house and the process's socket is analogous to a door. The application resides on one side of the door in the house; the transport-layer protocol resides on the other side of the door in the outside world. The application developer has control of everything on the application-layer side of the socket; however, it has little control of the transport-layer side. Now let's take a closer look at the interaction between two communicating processes that use UDP sockets. Before the sending process can push a packet of data out the socket door, when using UDP, it must first attach a destination address to the packet. After the packet passes through the sender's socket, the Internet will use this destination address to route the packet through the Internet to the socket in the receiving process. When the packet arrives at the receiving socket, the receiving process will retrieve the packet through the socket, and then inspect the packet's contents and take appropriate action. So you may now be wondering, what goes into the destination address that is attached to the packet?

As you might expect, the destination host's IP address is part of the destination address. By including the destination IP address in the packet, the routers in the Internet will be able to route the packet through the Internet to the destination host. But because a host may be running many network application processes, each with one or more sockets, it is also necessary to identify the particular socket in the destination host. When a socket is created, an identifier, called a port number, is assigned to it.
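This operating-system assignment of a port number can be observed directly. In the sketch below (the addresses are illustrative, and nothing needs to be listening at the destination, since a UDP sendto() succeeds regardless), the socket has no port until the first sendto() triggers an implicit bind:

```python
from socket import socket, AF_INET, SOCK_DGRAM

s = socket(AF_INET, SOCK_DGRAM)
before = s.getsockname()
print(before)                # ('0.0.0.0', 0): no port assigned yet

# The first sendto() makes the OS bind the socket to an ephemeral port.
s.sendto(b'hello', ('127.0.0.1', 12000))
addr, port = s.getsockname()
print(port)                  # an OS-chosen nonzero ephemeral port
s.close()
```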
So, as you might expect, the packet's destination address also includes the socket's port number. In summary, the sending process attaches to the packet a destination address, which consists of the destination host's IP address and the destination socket's port number. Moreover, as we shall soon see, the sender's source address---consisting of the IP address of the source host and the port number of the source socket---is also attached to the packet. However, attaching the source address to the packet is typically not done by the UDP application code; instead it is automatically done by the underlying operating system. We'll use the following simple client-server application to demonstrate socket programming for both UDP and TCP:

1. The client reads a line of characters (data) from its keyboard and
   sends the data to the server.
2. The server receives the data and converts the characters to
   uppercase.
3. The server sends the modified data to the client.
4. The client receives the modified data and displays the line on its
   screen.

Figure 2.27 highlights the main socket-related activity of the client and server that communicate over the UDP transport service. Now let's get our hands dirty and take a look at the client-server program pair for a UDP implementation of this simple application. We also provide a detailed, line-by-line analysis after each program. We'll begin with the UDP client, which will send a simple application-level message to the server. In order for the server to be able to receive and reply to the client's message, it must be ready and running---that is, it must be running as a process before the client sends its message.

Figure 2.27 The client-server application using UDP

The client program is called UDPClient.py, and the server program is called UDPServer.py. In order to emphasize the key issues, we intentionally provide code that is minimal.
"Good code" would certainly have a few more auxiliary lines, in particular for handling error cases. For this application, we have arbitrarily chosen 12000 for the server port number.

UDPClient.py

Here is the code for the client side of the application:

```python
from socket import *

serverName = 'hostname'
serverPort = 12000
clientSocket = socket(AF_INET, SOCK_DGRAM)
message = input('Input lowercase sentence:')
clientSocket.sendto(message.encode(), (serverName, serverPort))
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
print(modifiedMessage.decode())
clientSocket.close()
```

Now let's take a look at the various lines of code in UDPClient.py.

```python
from socket import *
```

The socket module forms the basis of all network communications in Python. By including this line, we will be able to create sockets within our program.

```python
serverName = 'hostname'
serverPort = 12000
```

The first line sets the variable serverName to the string 'hostname'. Here, we provide a string containing either the IP address of the server (e.g., "128.138.32.126") or the hostname of the server (e.g., "cis.poly.edu"). If we use the hostname, then a DNS lookup will automatically be performed to get the IP address. The second line sets the integer variable serverPort to 12000.

```python
clientSocket = socket(AF_INET, SOCK_DGRAM)
```

This line creates the client's socket, called clientSocket. The first parameter indicates the address family; in particular, AF_INET indicates that the underlying network is using IPv4. (Do not worry about this now---we will discuss IPv4 in Chapter 4.) The second parameter indicates that the socket is of type SOCK_DGRAM, which means it is a UDP socket (rather than a TCP socket). Note that we are not specifying the port number of the client socket when we create it; we are instead letting the operating system do this for us. Now that the client process's door has been created, we will want to create a message to send through the door.

```python
message = input('Input lowercase sentence:')
```

input() is a built-in function in Python 3. When this command is executed, the user at the client is prompted with the words "Input lowercase sentence:" The user then uses her keyboard to input a line, which is put into the variable message. Now that we have a socket and a message, we will want to send the message through the socket to the destination host.

```python
clientSocket.sendto(message.encode(), (serverName, serverPort))
```

In the above line, we first convert the message from string type to byte type, as we need to send bytes into a socket; this is done with the encode() method. The method sendto() attaches the destination address (serverName, serverPort) to the message and sends the resulting packet into the process's socket, clientSocket. (As mentioned earlier, the source address is also attached to the packet, although this is done automatically rather than explicitly by the code.) Sending a client-to-server message via a UDP socket is that simple! After sending the packet, the client waits to receive data from the server.

```python
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
```

With the above line, when a packet arrives from the Internet at the client's socket, the packet's data is put into the variable modifiedMessage and the packet's source address is put into the variable serverAddress. The variable serverAddress contains both the server's IP address and the server's port number. The program UDPClient doesn't actually need this server address information, since it already knows the server address from the outset; but this line of Python provides the server address nevertheless. The method recvfrom also takes the buffer size 2048 as input. (This buffer size works for most purposes.)

```python
print(modifiedMessage.decode())
```

This line prints out modifiedMessage on the user's display, after converting the message from bytes to string.
It should be the original line that the user typed, but now capitalized.

```python
clientSocket.close()
```

This line closes the socket. The process then terminates.

UDPServer.py

Let's now take a look at the server side of the application:

```python
from socket import *

serverPort = 12000
serverSocket = socket(AF_INET, SOCK_DGRAM)
serverSocket.bind(('', serverPort))
print("The server is ready to receive")
while True:
    message, clientAddress = serverSocket.recvfrom(2048)
    modifiedMessage = message.decode().upper()
    serverSocket.sendto(modifiedMessage.encode(), clientAddress)
```

Note that the beginning of UDPServer is similar to UDPClient. It also imports the socket module, also sets the integer variable serverPort to 12000, and also creates a socket of type SOCK_DGRAM (a UDP socket). The first line of code that is significantly different from UDPClient is:

```python
serverSocket.bind(('', serverPort))
```

The above line binds (that is, assigns) the port number 12000 to the server's socket. Thus in UDPServer, the code (written by the application developer) is explicitly assigning a port number to the socket. In this manner, when anyone sends a packet to port 12000 at the IP address of the server, that packet will be directed to this socket. UDPServer then enters a while loop; the while loop will allow UDPServer to receive and process packets from clients indefinitely. In the while loop, UDPServer waits for a packet to arrive.

```python
message, clientAddress = serverSocket.recvfrom(2048)
```

This line of code is similar to what we saw in UDPClient. When a packet arrives at the server's socket, the packet's data is put into the variable message and the packet's source address is put into the variable clientAddress. The variable clientAddress contains both the client's IP address and the client's port number. Here, UDPServer will make use of this address information, as it provides a return address, similar to the return address with ordinary postal mail.
With this source address information, the server now knows to where it should direct its reply.

```python
modifiedMessage = message.decode().upper()
```

This line is the heart of our simple application. It takes the line sent by the client and, after converting the message to a string, uses the method upper() to capitalize it.

```python
serverSocket.sendto(modifiedMessage.encode(), clientAddress)
```

This last line attaches the client's address (IP address and port number) to the capitalized message (after converting the string to bytes), and sends the resulting packet into the server's socket. (As mentioned earlier, the server address is also attached to the packet, although this is done automatically rather than explicitly by the code.) The Internet will then deliver the packet to this client address. After the server sends the packet, it remains in the while loop, waiting for another UDP packet to arrive (from any client running on any host). To test the pair of programs, you run UDPClient.py on one host and UDPServer.py on another host. Be sure to include the proper hostname or IP address of the server in UDPClient.py. Next, you execute UDPServer.py in the server host. This creates a process in the server that idles until it is contacted by some client. Then you execute UDPClient.py in the client. This creates a process in the client. Finally, to use the application at the client, you type a sentence followed by a carriage return. To develop your own UDP client-server application, you can begin by slightly modifying the client or server programs. For example, instead of converting all the letters to uppercase, the server could count the number of times the letter s appears and return this number. Or you can modify the client so that after receiving a capitalized sentence, the user can continue to send more sentences to the server.
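The UDP exchange above can also be exercised without two hosts by placing both sockets in a single process on the loopback interface. This is a test sketch, not how the programs would normally be deployed; binding to port 0 asks the OS for any free port, so the example cannot collide with a real server on 12000:

```python
from socket import socket, AF_INET, SOCK_DGRAM

# "Server" socket: bind to an OS-chosen free port on loopback.
serverSocket = socket(AF_INET, SOCK_DGRAM)
serverSocket.bind(('127.0.0.1', 0))
serverAddr = serverSocket.getsockname()

# "Client" socket: send the lowercase sentence.
clientSocket = socket(AF_INET, SOCK_DGRAM)
clientSocket.sendto('hello udp'.encode(), serverAddr)

# The datagram is already queued at the server socket, so recvfrom()
# returns immediately even though both sockets live in one process.
message, clientAddress = serverSocket.recvfrom(2048)
serverSocket.sendto(message.decode().upper().encode(), clientAddress)

modifiedMessage, _ = clientSocket.recvfrom(2048)
print(modifiedMessage.decode())   # HELLO UDP

clientSocket.close()
serverSocket.close()
```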
2.7.2 Socket Programming with TCP

Unlike UDP, TCP is a connection-oriented protocol. This means that before the client and server can start to send data to each other, they first need to handshake and establish a TCP connection. One end of the TCP connection is attached to the client socket and the other end is attached to a server socket. When creating the TCP connection, we associate with it the client socket address (IP address and port number) and the server socket address (IP address and port number). With the TCP connection established, when one side wants to send data to the other side, it just drops the data into the TCP connection via its socket. This is different from UDP, for which the server must attach a destination address to the packet before dropping it into the socket. Now let's take a closer look at the interaction of client and server programs in TCP. The client has the job of initiating contact with the server. In order for the server to be able to react to the client's initial contact, the server has to be ready. This implies two things. First, as in the case of UDP, the TCP server must be running as a process before the client attempts to initiate contact. Second, the server program must have a special door---more precisely, a special socket---that welcomes some initial contact from a client process running on an arbitrary host. Using our house/door analogy for a process/socket, we will sometimes refer to the client's initial contact as "knocking on the welcoming door." With the server process running, the client process can initiate a TCP connection to the server. This is done in the client program by creating a TCP socket. When the client creates its TCP socket, it specifies the address of the welcoming socket in the server, namely, the IP address of the server host and the port number of the socket.
After creating its socket, the client initiates a three-way handshake and establishes a TCP connection with the server. The three-way handshake, which takes place within the transport layer, is completely invisible to the client and server programs. During the three-way handshake, the client process knocks on the welcoming door of the server process. When the server "hears" the knocking, it creates a new door---more precisely, a new socket that is dedicated to that particular client. In our example below, the welcoming door is a TCP socket object that we call serverSocket; the newly created socket dedicated to the client making the connection is called connectionSocket. Students who are encountering TCP sockets for the first time sometimes confuse the welcoming socket (which is the initial point of contact for all clients wanting to communicate with the server) with the server-side connection socket that is subsequently created for communicating with each client. From the application's perspective, the client's socket and the server's connection socket are directly connected by a pipe. As shown in Figure 2.28, the client process can send arbitrary bytes into its socket, and TCP guarantees that the server process will receive (through the connection socket) each byte in the order sent. TCP thus provides a reliable service between the client and server processes. Furthermore, just as people can go in and out the same door, the client process not only sends bytes into but also receives bytes from its socket; similarly, the server process not only receives bytes from but also sends bytes into its connection socket. We use the same simple client-server application to demonstrate socket programming with TCP: The client sends one line of data to the server, the server capitalizes the line and sends it back to the client.
Figure 2.29 highlights the main socket-related activity of the client and server that communicate over the TCP transport service.

Figure 2.28 The TCPServer process has two sockets

Figure 2.29 The client-server application using TCP

TCPClient.py

Here is the code for the client side of the application:

```python
from socket import *

serverName = 'servername'
serverPort = 12000
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName, serverPort))
sentence = input('Input lowercase sentence:')
clientSocket.send(sentence.encode())
modifiedSentence = clientSocket.recv(1024)
print('From Server: ', modifiedSentence.decode())
clientSocket.close()
```

Let's now take a look at the various lines in the code that differ significantly from the UDP implementation. The first such line is the creation of the client socket.

```python
clientSocket = socket(AF_INET, SOCK_STREAM)
```

This line creates the client's socket, called clientSocket. The first parameter again indicates that the underlying network is using IPv4. The second parameter indicates that the socket is of type SOCK_STREAM, which means it is a TCP socket (rather than a UDP socket). Note that we are again not specifying the port number of the client socket when we create it; we are instead letting the operating system do this for us. Now the next line of code is very different from what we saw in UDPClient:

```python
clientSocket.connect((serverName, serverPort))
```

Recall that before the client can send data to the server (or vice versa) using a TCP socket, a TCP connection must first be established between the client and server. The above line initiates the TCP connection between the client and server. The parameter of the connect() method is the address of the server side of the connection. After this line of code is executed, the three-way handshake is performed and a TCP connection is established between the client and server.

```python
sentence = input('Input lowercase sentence:')
```

As with UDPClient, the above obtains a sentence from the user. The string sentence continues to gather characters until the user ends the line by typing a carriage return. The next line of code is also very different from UDPClient:

```python
clientSocket.send(sentence.encode())
```

The above line sends the sentence through the client's socket and into the TCP connection. Note that the program does not explicitly create a packet and attach the destination address to the packet, as was the case with UDP sockets. Instead the client program simply drops the bytes in the string sentence into the TCP connection. The client then waits to receive bytes from the server.

```python
modifiedSentence = clientSocket.recv(1024)
```

When characters arrive from the server, they get placed into the string modifiedSentence. Characters continue to accumulate in modifiedSentence until the line ends with a carriage return character. After printing the capitalized sentence, we close the client's socket:

```python
clientSocket.close()
```

This last line closes the socket and, hence, closes the TCP connection between the client and the server. It causes TCP in the client to send a TCP message to TCP in the server (see Section 3.5).

TCPServer.py

Now let's take a look at the server program.

```python
from socket import *

serverPort = 12000
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', serverPort))
serverSocket.listen(1)
print('The server is ready to receive')
while True:
    connectionSocket, addr = serverSocket.accept()
    sentence = connectionSocket.recv(1024).decode()
    capitalizedSentence = sentence.upper()
    connectionSocket.send(capitalizedSentence.encode())
    connectionSocket.close()
```

Let's now take a look at the lines that differ significantly from UDPServer and TCPClient.
As with TCPClient, the server creates a TCP socket with:

```python
serverSocket = socket(AF_INET, SOCK_STREAM)
```

Similar to UDPServer, we associate the server port number, serverPort, with this socket:

```python
serverSocket.bind(('', serverPort))
```

But with TCP, serverSocket will be our welcoming socket. After establishing this welcoming door, we will wait and listen for some client to knock on the door:

```python
serverSocket.listen(1)
```

This line has the server listen for TCP connection requests from the client. The parameter specifies the maximum number of queued connections (at least 1).

```python
connectionSocket, addr = serverSocket.accept()
```

When a client knocks on this door, the program invokes the accept() method for serverSocket, which creates a new socket in the server, called connectionSocket, dedicated to this particular client. The client and server then complete the handshaking, creating a TCP connection between the client's clientSocket and the server's connectionSocket. With the TCP connection established, the client and server can now send bytes to each other over the connection. With TCP, all bytes sent from one side are not only guaranteed to arrive at the other side but also guaranteed to arrive in order.

```python
connectionSocket.close()
```

In this program, after sending the modified sentence to the client, we close the connection socket. But since serverSocket remains open, another client can now knock on the door and send the server a sentence to modify. This completes our discussion of socket programming in TCP. You are encouraged to run the two programs in two separate hosts, and also to modify them to achieve slightly different goals. You should compare the UDP program pair with the TCP program pair and see how they differ. You should also do many of the socket programming assignments described at the ends of Chapters 2, 4, and 9.
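Like the UDP pair, the TCP pair can be exercised in a single process on the loopback interface, because the kernel completes the three-way handshake as soon as connect() reaches a listening socket and queues the connection until accept() is called. A test sketch (loopback only, OS-chosen port; not a deployment pattern):

```python
from socket import socket, AF_INET, SOCK_STREAM

# Welcoming socket on an OS-chosen loopback port.
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('127.0.0.1', 0))
serverSocket.listen(1)

# connect() completes here: the kernel finishes the handshake and
# places the new connection on serverSocket's accept queue.
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect(serverSocket.getsockname())

# accept() therefore returns immediately with the connection socket.
connectionSocket, addr = serverSocket.accept()

clientSocket.send('tcp test'.encode())
sentence = connectionSocket.recv(1024).decode()
connectionSocket.send(sentence.upper().encode())
reply = clientSocket.recv(1024).decode()
print(reply)   # TCP TEST

connectionSocket.close()
clientSocket.close()
serverSocket.close()
```

Note that two sockets exist on the server side throughout: the welcoming serverSocket and the per-client connectionSocket.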
Finally, we hope someday, after mastering these and more advanced socket programs, you will write your own popular network application, become very rich and famous, and remember the authors of this textbook!

2.8 Summary

In this chapter, we've studied the conceptual and the implementation aspects of network applications. We've learned about the ubiquitous client-server architecture adopted by many Internet applications and seen its use in the HTTP, SMTP, POP3, and DNS protocols. We've studied these important application-level protocols, and their corresponding associated applications (the Web, file transfer, e-mail, and DNS) in some detail. We've learned about the P2P architecture and how it is used in many applications. We've also learned about streaming video, and how modern video distribution systems leverage CDNs. We've examined how the socket API can be used to build network applications. We've walked through the use of sockets for connection-oriented (TCP) and connectionless (UDP) end-to-end transport services. The first step in our journey down the layered network architecture is now complete! At the very beginning of this book, in Section 1.1, we gave a rather vague, bare-bones definition of a protocol: "the format and the order of messages exchanged between two or more communicating entities, as well as the actions taken on the transmission and/or receipt of a message or other event." The material in this chapter, and in particular our detailed study of the HTTP, SMTP, POP3, and DNS protocols, has now added considerable substance to this definition. Protocols are a key concept in networking; our study of application protocols has now given us the opportunity to develop a more intuitive feel for what protocols are all about. In Section 2.1, we described the service models that TCP and UDP offer to applications that invoke them.
We took an even closer look at these service models when we developed simple applications that run over TCP and UDP in Section 2.7. However, we have said little about how TCP and UDP provide these service models. For example, we know that TCP provides a reliable data service, but we haven't said yet how it does so. In the next chapter we'll take a careful look at not only the what, but also the how and why of transport protocols. Equipped with knowledge about Internet application structure and application-level protocols, we're now ready to head further down the protocol stack and examine the transport layer in Chapter 3.

Homework Problems and Questions

Chapter 2 Review Questions

SECTION 2.1

R1. List five nonproprietary Internet applications and the application-layer protocols that they use.

R2. What is the difference between network architecture and application architecture?

R3. For a communication session between a pair of processes, which process is the client and which is the server?

R4. For a P2P file-sharing application, do you agree with the statement, "There is no notion of client and server sides of a communication session"? Why or why not?

R5. What information is used by a process running on one host to identify a process running on another host?

R6. Suppose you wanted to do a transaction from a remote client to a server as fast as possible. Would you use UDP or TCP? Why?

R7. Referring to Figure 2.4, we see that none of the applications listed in Figure 2.4 requires both no data loss and timing. Can you conceive of an application that requires no data loss and that is also highly time-sensitive?

R8. List the four broad classes of services that a transport protocol can provide. For each of the service classes, indicate if either UDP or TCP (or both) provides such a service.

R9. Recall that TCP can be enhanced with SSL to provide process-to-process security services, including encryption.
Does SSL operate at the transport layer or the application layer? If the application developer wants TCP to be enhanced with SSL, what does the developer have to do?

SECTION 2.2--2.5

R10. What is meant by a handshaking protocol?

R11. Why do HTTP, SMTP, and POP3 run on top of TCP rather than on UDP?

R12. Consider an e-commerce site that wants to keep a purchase record for each of its customers. Describe how this can be done with cookies.

R13. Describe how Web caching can reduce the delay in receiving a requested object. Will Web caching reduce the delay for all objects requested by a user or for only some of the objects? Why?

R14. Telnet into a Web server and send a multiline request message. Include in the request message the If-modified-since: header line to force a response message with the 304 Not Modified status code.

R15. List several popular messaging apps. Do they use the same protocols as SMS?

R16. Suppose Alice, with a Web-based e-mail account (such as Hotmail or Gmail), sends a message to Bob, who accesses his mail from his mail server using POP3. Discuss how the message gets from Alice's host to Bob's host. Be sure to list the series of application-layer protocols that are used to move the message between the two hosts.

R17. Print out the header of an e-mail message you have recently received. How many Received: header lines are there? Analyze each of the header lines in the message.

R18. From a user's perspective, what is the difference between the download-and-delete mode and the download-and-keep mode in POP3?

R19. Is it possible for an organization's Web server and mail server to have exactly the same alias for a hostname (for example, foo.com)? What would be the type for the RR that contains the hostname of the mail server?

R20. Look over your received e-mails, and examine the header of a message sent from a user with a .edu e-mail address.
Is it possible to determine from the header the IP address of the host from which the message was sent? Do the same for a message sent from a Gmail account.

SECTION 2.5

R21. In BitTorrent, suppose Alice provides chunks to Bob throughout a 30-second interval. Will Bob necessarily return the favor and provide chunks to Alice in this same interval? Why or why not?

R22. Consider a new peer Alice that joins BitTorrent without possessing any chunks. Without any chunks, she cannot become a top-four uploader for any of the other peers, since she has nothing to upload. How then will Alice get her first chunk?

R23. What is an overlay network? Does it include routers? What are the edges in the overlay network?

SECTION 2.6

R24. CDNs typically adopt one of two different server placement philosophies. Name and briefly describe them.

R25. Besides network-related considerations such as delay, loss, and bandwidth performance, there are other important factors that go into designing a CDN server selection strategy. What are they?

SECTION 2.7

R26. In Section 2.7, the UDP server described needed only one socket, whereas the TCP server needed two sockets. Why? If the TCP server were to support n simultaneous connections, each from a different client host, how many sockets would the TCP server need?

R27. For the client-server application over TCP described in Section 2.7, why must the server program be executed before the client program? For the client-server application over UDP, why may the client program be executed before the server program?

Problems

P1. True or false?

a. A user requests a Web page that consists of some text and three
   images. For this page, the client will send one request message and
   receive four response messages.

b. Two distinct Web pages (for example, www.mit.edu/research.html and
   www.mit.edu/students.html) can be sent over the same persistent
   connection.
With nonpersistent connections between browser and origin server, it is possible for a single TCP segment to carry two distinct HTTP request messages.

d. The Date: header in the HTTP response message indicates when the object in the response was last modified.

e. HTTP response messages never have an empty message body.

P2. SMS, iMessage, and WhatsApp are all smartphone real-time messaging systems. After doing some research on the Internet, for each of these systems write one paragraph about the protocols they use. Then write a paragraph explaining how they differ.

P3. Consider an HTTP client that wants to retrieve a Web document at a given URL. The IP address of the HTTP server is initially unknown. What transport and application-layer protocols besides HTTP are needed in this scenario?

P4. Consider the following string of ASCII characters that were captured by Wireshark when the browser sent an HTTP GET message (i.e., this is the actual content of an HTTP GET message). The characters <cr><lf> are carriage return and line-feed characters (that is, the italicized character string <cr> in the text below represents the single carriage-return character that was contained at that point in the HTTP header). Answer the following questions, indicating where in the HTTP GET message below you find the answer.
```
GET /cs453/index.html HTTP/1.1<cr><lf>Host: gaia.cs.umass.edu<cr><lf>User-Agent: Mozilla/5.0 (Windows;U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax)<cr><lf>Accept: text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png,*/*;q=0.5<cr><lf>Accept-Language: en-us, en;q=0.5<cr><lf>Accept-Encoding: zip, deflate<cr><lf>Accept-Charset: ISO-8859-1, utf-8;q=0.7,*;q=0.7<cr><lf>Keep-Alive: 300<cr><lf>Connection: keep-alive<cr><lf><cr><lf>
```

a. What is the URL of the document requested by the browser?

b. What version of HTTP is the browser running?

c. Does the browser request a non-persistent or a persistent connection?

d. What is the IP address of the host on which the browser is running?

e. What type of browser initiates this message? Why is the browser type needed in an HTTP request message?

P5. The text below shows the reply sent from the server in response to the HTTP GET message in the question above. Answer the following questions, indicating where in the message below you find the answer.
```
HTTP/1.1 200 OK<cr><lf>Date: Tue, 07 Mar 2008 12:39:45 GMT<cr><lf>Server: Apache/2.0.52 (Fedora)<cr><lf>Last-Modified: Sat, 10 Dec 2005 18:27:46 GMT<cr><lf>ETag: "526c3-f22-a88a4c80"<cr><lf>Accept-Ranges: bytes<cr><lf>Content-Length: 3874<cr><lf>Keep-Alive: timeout=max=100<cr><lf>Connection: Keep-Alive<cr><lf>Content-Type: text/html; charset=ISO-8859-1<cr><lf><cr><lf><!doctype html public "-//w3c//dtd html 4.0 transitional//en"><lf><html><lf><head><lf><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"><lf><meta name="GENERATOR" content="Mozilla/4.79 [en] (Windows NT 5.0; U) Netscape]"><lf><title>CMPSCI 453 / 591 / NTU-ST550A Spring 2005 homepage</title><lf></head><lf><much more document text following here (not shown)>
```

a. Was the server able to successfully find the document or not? What time was the document reply provided?

b. When was the document last modified?

c. How many bytes are there in the document being returned?

d. What are the first 5 bytes of the document being returned? Did the server agree to a persistent connection?

P6. Obtain the HTTP/1.1 specification (RFC 2616). Answer the following questions:

a. Explain the mechanism used for signaling between the client and server to indicate that a persistent connection is being closed. Can the client, the server, or both signal the close of a connection?

b. What encryption services are provided by HTTP?

c. Can a client open three or more simultaneous connections with a given server?

d.
Either a server or a client may close a transport connection between them if either one detects the connection has been idle for some time. Is it possible that one side starts closing a connection while the other side is transmitting data via this connection? Explain.

P7. Suppose within your Web browser you click on a link to obtain a Web page. The IP address for the associated URL is not cached in your local host, so a DNS lookup is necessary to obtain the IP address. Suppose that n DNS servers are visited before your host receives the IP address from DNS; the successive visits incur an RTT of RTT1, ..., RTTn. Further suppose that the Web page associated with the link contains exactly one object, consisting of a small amount of HTML text. Let RTT0 denote the RTT between the local host and the server containing the object. Assuming zero transmission time of the object, how much time elapses from when the client clicks on the link until the client receives the object?

P8. Referring to Problem P7, suppose the HTML file references eight very small objects on the same server. Neglecting transmission times, how much time elapses with

a. Non-persistent HTTP with no parallel TCP connections?

b. Non-persistent HTTP with the browser configured for 5 parallel connections?

c. Persistent HTTP?

P9. Consider Figure 2.12, for which there is an institutional network connected to the Internet. Suppose that the average object size is 850,000 bits and that the average request rate from the institution's browsers to the origin servers is 16 requests per second. Also suppose that the amount of time it takes from when the router on the Internet side of the access link forwards an HTTP request until it receives the response is three seconds on average (see Section 2.2.5).
Model the total average response time as the sum of the average access delay (that is, the delay from Internet router to institution router) and the average Internet delay. For the average access delay, use Δ/(1−Δβ), where Δ is the average time required to send an object over the access link and β is the arrival rate of objects to the access link.

a. Find the total average response time.

b. Now suppose a cache is installed in the institutional LAN. Suppose the miss rate is 0.4. Find the total response time.

P10. Consider a short, 10-meter link, over which a sender can transmit at a rate of 150 bits/sec in both directions. Suppose that packets containing data are 100,000 bits long, and packets containing only control (e.g., ACK or handshaking) are 200 bits long. Assume that N parallel connections each get 1/N of the link bandwidth. Now consider the HTTP protocol, and suppose that each downloaded object is 100 Kbits long, and that the initial downloaded object contains 10 referenced objects from the same sender. Would parallel downloads via parallel instances of non-persistent HTTP make sense in this case? Now consider persistent HTTP. Do you expect significant gains over the non-persistent case? Justify and explain your answer.

P11. Consider the scenario introduced in the previous problem. Now suppose that the link is shared by Bob with four other users. Bob uses parallel instances of non-persistent HTTP, and the other four users use non-persistent HTTP without parallel downloads.

a. Do Bob's parallel connections help him get Web pages more quickly? Why or why not?

b. If all five users open five parallel instances of non-persistent HTTP, then would Bob's parallel connections still be beneficial? Why or why not?

P12. Write a simple TCP program for a server that accepts lines of input from a client and prints the lines onto the server's standard output.
(You can do this by modifying the TCPServer.py program in the text.) Compile and execute your program. On any other machine that contains a Web browser, set the proxy server in the browser to the host that is running your server program; also configure the port number appropriately. Your browser should now send its GET request messages to your server, and your server should display the messages on its standard output. Use this platform to determine whether your browser generates conditional GET messages for objects that are locally cached.

P13. What is the difference between MAIL FROM: in SMTP and From: in the mail message itself?

P14. How does SMTP mark the end of a message body? How about HTTP? Can HTTP use the same method as SMTP to mark the end of a message body? Explain.

P15. Read RFC 5321 for SMTP. What does MTA stand for? Consider the following received spam e-mail (modified from a real spam e-mail). Assuming only the originator of this spam e-mail is malicious and all other hosts are honest, identify the malicious host that has generated this spam e-mail.

```
From - Fri Nov 07 13:41:30 2008
Return-Path: <tennis5@pp33head.com>
Received: from barmail.cs.umass.edu (barmail.cs.umass.edu [128.119.240.3]) by cs.umass.edu (8.13.1/8.12.6) for <hg@cs.umass.edu>; Fri, 7 Nov 2008 13:27:10 -0500
Received: from asusus-4b96 (localhost [127.0.0.1]) by barmail.cs.umass.edu (Spam Firewall) for <hg@cs.umass.edu>; Fri, 7 Nov 2008 13:27:07 -0500 (EST)
Received: from asusus-4b96 ([58.88.21.177]) by barmail.cs.umass.edu for <hg@cs.umass.edu>; Fri, 07 Nov 2008 13:27:07 -0500 (EST)
Received: from [58.88.21.177] by inbnd55.exchangeddd.com; Sat, 8 Nov 2008 01:27:07 +0700
From: "Jonny" <tennis5@pp33head.com>
To: <hg@cs.umass.edu>
Subject: How to secure your savings
```

P16. Read the POP3 RFC, RFC 1939. What is the purpose of the UIDL POP3 command?

P17. Consider accessing your e-mail with POP3.

a.
Suppose you have configured your POP mail client to operate in the download-and-delete mode. Complete the following transaction:

```
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: blah blah ...
S: ..........blah
S: .
?
?
```

b. Suppose you have configured your POP mail client to operate in the download-and-keep mode. Complete the following transaction:

```
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: blah blah ...
S: ..........blah
S: .
?
?
```

c. Suppose you have configured your POP mail client to operate in the download-and-keep mode. Using your transcript in part (b), suppose you retrieve messages 1 and 2, exit POP, and then five minutes later you again access POP to retrieve new e-mail. Suppose that in the five-minute interval no new messages have been sent to you. Provide a transcript of this second POP session.

P18.

a. What is a whois database?

b. Use various whois databases on the Internet to obtain the names of two DNS servers. Indicate which whois databases you used.

c. Use nslookup on your local host to send DNS queries to three DNS servers: your local DNS server and the two DNS servers you found in part (b). Try querying for Type A, NS, and MX records. Summarize your findings.

d. Use nslookup to find a Web server that has multiple IP addresses. Does the Web server of your institution (school or company) have multiple IP addresses?

e. Use the ARIN whois database to determine the IP address range used by your university.

f. Describe how an attacker can use whois databases and the nslookup tool to perform reconnaissance on an institution before launching an attack.

g. Discuss why whois databases should be publicly available.

P19. In this problem, we use the useful dig tool available on Unix and Linux hosts to explore the hierarchy of DNS servers.
Recall that in Figure 2.19, a DNS server in the DNS hierarchy delegates a DNS query to a DNS server lower in the hierarchy, by sending back to the DNS client the name of that lower-level DNS server. First read the man page for dig, and then answer the following questions.

a. Starting with a root DNS server (from one of the root servers [a-m].root-servers.net), initiate a sequence of queries for the IP address for your department's Web server by using dig. Show the list of the names of DNS servers in the delegation chain in answering your query.

b. Repeat part (a) for several popular Web sites, such as google.com, yahoo.com, or amazon.com.

P20. Suppose you can access the caches in the local DNS servers of your department. Can you propose a way to roughly determine the Web servers (outside your department) that are most popular among the users in your department? Explain.

P21. Suppose that your department has a local DNS server for all computers in the department. You are an ordinary user (i.e., not a network/system administrator). Can you determine if an external Web site was likely accessed from a computer in your department a couple of seconds ago? Explain.

P22. Consider distributing a file of F=15 Gbits to N peers. The server has an upload rate of us=30 Mbps, and each peer has a download rate of di=2 Mbps and an upload rate of u. For N=10, 100, and 1,000 and u=300 Kbps, 700 Kbps, and 2 Mbps, prepare a chart giving the minimum distribution time for each of the combinations of N and u for both client-server distribution and P2P distribution.

P23. Consider distributing a file of F bits to N peers using a client-server architecture. Assume a fluid model where the server can simultaneously transmit to multiple peers, transmitting to each peer at different rates, as long as the combined rate does not exceed us.

a. Suppose that us/N≤dmin. Specify a distribution scheme that has a distribution time of NF/us.

b.
Suppose that us/N≥dmin. Specify a distribution scheme that has a distribution time of F/dmin.

c. Conclude that the minimum distribution time is in general given by max{NF/us, F/dmin}.

P24. Consider distributing a file of F bits to N peers using a P2P architecture. Assume a fluid model. For simplicity assume that dmin is very large, so that peer download bandwidth is never a bottleneck.

a. Suppose that us≤(us+u1+...+uN)/N. Specify a distribution scheme that has a distribution time of F/us.

b. Suppose that us≥(us+u1+...+uN)/N. Specify a distribution scheme that has a distribution time of NF/(us+u1+...+uN).

c. Conclude that the minimum distribution time is in general given by max{F/us, NF/(us+u1+...+uN)}.

P25. Consider an overlay network with N active peers, with each pair of peers having an active TCP connection. Additionally, suppose that the TCP connections pass through a total of M routers. How many nodes and edges are there in the corresponding overlay network?

P26. Suppose Bob joins a BitTorrent torrent, but he does not want to upload any data to any other peers (so-called free-riding).

a. Bob claims that he can receive a complete copy of the file that is shared by the swarm. Is Bob's claim possible? Why or why not?

b. Bob further claims that he can further make his "free-riding" more efficient by using a collection of multiple computers (with distinct IP addresses) in the computer lab in his department. How can he do that?

P27. Consider a DASH system for which there are N video versions (at N different rates and qualities) and N audio versions (at N different rates and qualities). Suppose we want to allow the player to choose at any time any of the N video versions and any of the N audio versions.

a. If we create files so that the audio is mixed in with the video, so that the server sends only one media stream at a given time, how many files will the server need to store (each a different URL)?

b.
If the server instead sends the audio and video streams separately and has the client synchronize the streams, how many files will the server need to store?

P28. Install and compile the Python programs TCPClient and UDPClient on one host and TCPServer and UDPServer on another host.

a. Suppose you run TCPClient before you run TCPServer. What happens? Why?

b. Suppose you run UDPClient before you run UDPServer. What happens? Why?

c. What happens if you use different port numbers for the client and server sides?

P29. Suppose that in UDPClient.py, after we create the socket, we add the line:

clientSocket.bind(('', 5432))

Will it become necessary to change UDPServer.py? What are the port numbers for the sockets in UDPClient and UDPServer? What were they before making this change?

P30. Can you configure your browser to open multiple simultaneous connections to a Web site? What are the advantages and disadvantages of having a large number of simultaneous TCP connections?

P31. We have seen that Internet TCP sockets treat the data being sent as a byte stream but UDP sockets recognize message boundaries. What are one advantage and one disadvantage of the byte-oriented API versus having the API explicitly recognize and preserve application-defined message boundaries?

P32. What is the Apache Web server? How much does it cost? What functionality does it currently have? You may want to look at Wikipedia to answer this question.

Socket Programming Assignments

The Companion Website includes six socket programming assignments. The first four assignments are summarized below. The fifth assignment makes use of the ICMP protocol and is summarized at the end of Chapter 5. The sixth assignment employs multimedia protocols and is summarized at the end of Chapter 9. It is highly recommended that students complete several, if not all, of these assignments.
Students can find full details of these assignments, as well as important snippets of the Python code, at the Web site www.pearsonhighered.com/cs-resources.

Assignment 1: Web Server

In this assignment, you will develop a simple Web server in Python that is capable of processing only one request. Specifically, your Web server will (i) create a connection socket when contacted by a client (browser); (ii) receive the HTTP request from this connection; (iii) parse the request to determine the specific file being requested; (iv) get the requested file from the server's file system; (v) create an HTTP response message consisting of the requested file preceded by header lines; and (vi) send the response over the TCP connection to the requesting browser. If a browser requests a file that is not present in your server, your server should return a "404 Not Found" error message. In the Companion Website, we provide the skeleton code for your server. Your job is to complete the code, run your server, and then test your server by sending requests from browsers running on different hosts. If you run your server on a host that already has a Web server running on it, then you should use a different port than port 80 for your Web server.

Assignment 2: UDP Pinger

In this programming assignment, you will write a client ping program in Python. Your client will send a simple ping message to a server, receive a corresponding pong message back from the server, and determine the delay between when the client sent the ping message and received the pong message. This delay is called the Round Trip Time (RTT). The functionality provided by the client and server is similar to the functionality provided by the standard ping program available in modern operating systems. However, standard ping programs use the Internet Control Message Protocol (ICMP) (which we will study in Chapter 5). Here we will create a nonstandard (but simple!) UDP-based ping program.
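The ping/pong exchange just described can be sketched in a few lines of Python. The sketch below is not the Companion Website's skeleton code; the port number, message format, and in-process stand-in server are illustrative assumptions made so the example is self-contained and runnable on one machine:

```python
import socket
import threading
import time

SERVER_ADDR = ('localhost', 12000)  # hypothetical port; your assignment's server may differ

def echo_server(sock):
    # Minimal stand-in for the assignment's server: echo each ping back as a pong.
    for _ in range(10):
        message, client_addr = sock.recvfrom(1024)
        sock.sendto(message.upper(), client_addr)

# Start the stand-in server in a background thread.
server_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server_sock.bind(SERVER_ADDR)
threading.Thread(target=echo_server, args=(server_sock,), daemon=True).start()

client_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client_sock.settimeout(1.0)  # wait at most one second for a pong

rtts = []
for seq in range(1, 11):
    send_time = time.time()
    client_sock.sendto(f'Ping {seq} {send_time}'.encode(), SERVER_ADDR)
    try:
        pong, _ = client_sock.recvfrom(1024)
        rtt = time.time() - send_time
        rtts.append(rtt)
        print(f'{pong.decode()}  RTT = {rtt:.6f} s')
    except socket.timeout:
        # UDP gives no delivery guarantee, so a lost packet shows up as a timeout.
        print(f'Ping {seq}: Request timed out')

print(f'{len(rtts)}/10 pongs received')
```

Because both ends run on localhost here, no packets are actually lost; the timeout branch is what handles loss when the client talks to a real remote server.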
Your ping program is to send 10 ping messages to the target server over UDP. For each message, your client is to determine and print the RTT when the corresponding pong message is returned. Because UDP is an unreliable protocol, a packet sent by the client or server may be lost. For this reason, the client cannot wait indefinitely for a reply to a ping message. You should have the client wait up to one second for a reply from the server; if no reply is received, the client should assume that the packet was lost and print a message accordingly. In this assignment, you will be given the complete code for the server (available in the Companion Website). Your job is to write the client code, which will be very similar to the server code. It is recommended that you first study carefully the server code. You can then write your client code, liberally cutting and pasting lines from the server code.

Assignment 3: Mail Client

The goal of this programming assignment is to create a simple mail client that sends e-mail to any recipient. Your client will need to establish a TCP connection with a mail server (e.g., a Google mail server), dialogue with the mail server using the SMTP protocol, send an e-mail message to a recipient (e.g., your friend) via the mail server, and finally close the TCP connection with the mail server. For this assignment, the Companion Website provides the skeleton code for your client. Your job is to complete the code and test your client by sending e-mail to different user accounts. You may also try sending through different servers (for example, through a Google mail server and through your university mail server).

Assignment 4: Multi-Threaded Web Proxy

In this assignment, you will develop a Web proxy. When your proxy receives an HTTP request for an object from a browser, it generates a new HTTP request for the same object and sends it to the origin server.
When the proxy receives the corresponding HTTP response with the object from the origin server, it creates a new HTTP response, including the object, and sends it to the client. This proxy will be multi-threaded, so that it will be able to handle multiple requests at the same time. For this assignment, the Companion Website provides the skeleton code for the proxy server. Your job is to complete the code, and then test it by having different browsers request Web objects via your proxy.

Wireshark Lab: HTTP

Having gotten our feet wet with the Wireshark packet sniffer in Lab 1, we're now ready to use Wireshark to investigate protocols in operation. In this lab, we'll explore several aspects of the HTTP protocol: the basic GET/reply interaction, HTTP message formats, retrieving large HTML files, retrieving HTML files with embedded URLs, persistent and non-persistent connections, and HTTP authentication and security. As is the case with all Wireshark labs, the full description of this lab is available at this book's Web site, www.pearsonhighered.com/cs-resources.

Wireshark Lab: DNS

In this lab, we take a closer look at the client side of the DNS, the protocol that translates Internet hostnames to IP addresses. Recall from Section 2.5 that the client's role in the DNS is relatively simple---a client sends a query to its local DNS server and receives a response back. Much can go on under the covers, invisible to the DNS clients, as the hierarchical DNS servers communicate with each other to either recursively or iteratively resolve the client's DNS query. From the DNS client's standpoint, however, the protocol is quite simple---a query is formulated to the local DNS server and a response is received from that server. We observe DNS in action in this lab. As is the case with all Wireshark labs, the full description of this lab is available at this book's Web site, www.pearsonhighered.com/cs-resources.

An Interview With...
Marc Andreessen

Marc Andreessen is the co-creator of Mosaic, the Web browser that popularized the World Wide Web in 1993. Mosaic had a clean, easily understood interface and was the first browser to display images in-line with text. In 1994, Marc Andreessen and Jim Clark founded Netscape, whose browser was by far the most popular browser through the mid-1990s. Netscape also developed the Secure Sockets Layer (SSL) protocol and many Internet server products, including mail servers and SSL-based Web servers. He is now a co-founder and general partner of venture capital firm Andreessen Horowitz, overseeing portfolio development with holdings that include Facebook, Foursquare, Groupon, Jawbone, Twitter, and Zynga. He serves on numerous boards, including Bump, eBay, Glam Media, Facebook, and Hewlett-Packard. He holds a BS in Computer Science from the University of Illinois at Urbana-Champaign.

How did you become interested in computing? Did you always know that you wanted to work in information technology?

The video game and personal computing revolutions hit right when I was growing up---personal computing was the new technology frontier in the late '70s and early '80s. And it wasn't just Apple and the IBM PC, but hundreds of new companies like Commodore and Atari as well. I taught myself to program out of a book called "Instant Freeze-Dried BASIC" at age 10, and got my first computer (a TRS-80 Color Computer---look it up!) at age 12.

Please describe one or two of the most exciting projects you have worked on during your career. What were the biggest challenges?

Undoubtedly the most exciting project was the original Mosaic web browser in '92--'93---and the biggest challenge was getting anyone to take it seriously back then. At the time, everyone thought the interactive future would be delivered as "interactive television" by huge companies, not as the Internet by startups.
What excites you about the future of networking and the Internet? What are your biggest concerns?

The most exciting thing is the huge unexplored frontier of applications and services that programmers and entrepreneurs are able to explore---the Internet has unleashed creativity at a level that I don't think we've ever seen before. My biggest concern is the principle of unintended consequences---we don't always know the implications of what we do, such as the Internet being used by governments to run a new level of surveillance on citizens.

Is there anything in particular students should be aware of as Web technology advances?

The rate of change---the most important thing to learn is how to learn---how to flexibly adapt to changes in the specific technologies, and how to keep an open mind on the new opportunities and possibilities as you move through your career.

What people inspired you professionally?

Vannevar Bush, Ted Nelson, Doug Engelbart, Nolan Bushnell, Bill Hewlett and Dave Packard, Ken Olsen, Steve Jobs, Steve Wozniak, Andy Grove, Grace Hopper, Hedy Lamarr, Alan Turing, Richard Stallman.

What are your recommendations for students who want to pursue careers in computing and information technology?

Go as deep as you possibly can on understanding how technology is created, and then complement with learning how business works.

Can technology solve the world's problems?

No, but we advance the standard of living of people through economic growth, and most economic growth throughout history has come from technology---so that's as good as it gets.

Chapter 3 Transport Layer

Residing between the application and network layers, the transport layer is a central piece of the layered network architecture. It has the critical role of providing communication services directly to the application processes running on different hosts.
The pedagogic approach we take in this chapter is to alternate between discussions of transport-layer principles and discussions of how these principles are implemented in existing protocols; as usual, particular emphasis will be given to Internet protocols, in particular the TCP and UDP transport-layer protocols.

We'll begin by discussing the relationship between the transport and network layers. This sets the stage for examining the first critical function of the transport layer---extending the network layer's delivery service between two end systems to a delivery service between two application-layer processes running on the end systems. We'll illustrate this function in our coverage of the Internet's connectionless transport protocol, UDP. We'll then return to principles and confront one of the most fundamental problems in computer networking---how two entities can communicate reliably over a medium that may lose and corrupt data. Through a series of increasingly complicated (and realistic!) scenarios, we'll build up an array of techniques that transport protocols use to solve this problem. We'll then show how these principles are embodied in TCP, the Internet's connection-oriented transport protocol. We'll next move on to a second fundamentally important problem in networking---controlling the transmission rate of transport-layer entities in order to avoid, or recover from, congestion within the network. We'll consider the causes and consequences of congestion, as well as commonly used congestion-control techniques. After obtaining a solid understanding of the issues behind congestion control, we'll study TCP's approach to congestion control.

3.1 Introduction and Transport-Layer Services

In the previous two chapters we touched on the role of the transport layer and the services that it provides. Let's quickly review what we have already learned about the transport layer.
A transport-layer protocol provides for logical communication between application processes running on different hosts. By logical communication, we mean that from an application's perspective, it is as if the hosts running the processes were directly connected; in reality, the hosts may be on opposite sides of the planet, connected via numerous routers and a wide range of link types. Application processes use the logical communication provided by the transport layer to send messages to each other, free from the worry of the details of the physical infrastructure used to carry these messages. Figure 3.1 illustrates the notion of logical communication.

As shown in Figure 3.1, transport-layer protocols are implemented in the end systems but not in network routers. On the sending side, the transport layer converts the application-layer messages it receives from a sending application process into transport-layer packets, known as transport-layer segments in Internet terminology. This is done by (possibly) breaking the application messages into smaller chunks and adding a transport-layer header to each chunk to create the transport-layer segment. The transport layer then passes the segment to the network layer at the sending end system, where the segment is encapsulated within a network-layer packet (a datagram) and sent to the destination. It's important to note that network routers act only on the network-layer fields of the datagram; that is, they do not examine the fields of the transport-layer segment encapsulated with the datagram. On the receiving side, the network layer extracts the transport-layer segment from the datagram and passes the segment up to the transport layer. The transport layer then processes the received segment, making the data in the segment available to the receiving application. More than one transport-layer protocol may be available to network applications.
For example, the Internet has two protocols---TCP and UDP. Each of these protocols provides a different set of transport-layer services to the invoking application.

3.1.1 Relationship Between Transport and Network Layers

Recall that the transport layer lies just above the network layer in the protocol stack. Whereas a transport-layer protocol provides logical communication between

Figure 3.1 The transport layer provides logical rather than physical communication between application processes

processes running on different hosts, a network-layer protocol provides logical communication between hosts. This distinction is subtle but important. Let's examine this distinction with the aid of a household analogy.

Consider two houses, one on the East Coast and the other on the West Coast, with each house being home to a dozen kids. The kids in the East Coast household are cousins of the kids in the West Coast household. The kids in the two households love to write to each other---each kid writes each cousin every week, with each letter delivered by the traditional postal service in a separate envelope. Thus, each household sends 144 letters to the other household every week. (These kids would save a lot of money if they had e-mail!) In each of the households there is one kid---Ann in the West Coast house and Bill in the East Coast house---responsible for mail collection and mail distribution. Each week Ann visits all her brothers and sisters, collects the mail, and gives the mail to a postal-service mail carrier, who makes daily visits to the house. When letters arrive at the West Coast house, Ann also has the job of distributing the mail to her brothers and sisters. Bill has a similar job on the East Coast. In this example, the postal service provides logical communication between the two houses---the postal service moves mail from house to house, not from person to person.
On the other hand, Ann and Bill provide logical communication among the cousins---Ann and Bill pick up mail from, and deliver mail to, their brothers and sisters. Note that from the cousins' perspective, Ann and Bill are the mail service, even though Ann and Bill are only a part (the end-system part) of the end-to-end delivery process. This household example serves as a nice analogy for explaining how the transport layer relates to the network layer:

application messages = letters in envelopes
processes = cousins
hosts (also called end systems) = houses
transport-layer protocol = Ann and Bill
network-layer protocol = postal service (including mail carriers)

Continuing with this analogy, note that Ann and Bill do all their work within their respective homes; they are not involved, for example, in sorting mail in any intermediate mail center or in moving mail from one mail center to another. Similarly, transport-layer protocols live in the end systems. Within an end system, a transport protocol moves messages from application processes to the network edge (that is, the network layer) and vice versa, but it doesn't have any say about how the messages are moved within the network core. In fact, as illustrated in Figure 3.1, intermediate routers neither act on, nor recognize, any information that the transport layer may have added to the application messages.

Continuing with our family saga, suppose now that when Ann and Bill go on vacation, another cousin pair---say, Susan and Harvey---substitute for them and provide the household-internal collection and delivery of mail. Unfortunately for the two families, Susan and Harvey do not do the collection and delivery in exactly the same way as Ann and Bill. Being younger kids, Susan and Harvey pick up and drop off the mail less frequently and occasionally lose letters (which are sometimes chewed up by the family dog).
Thus, the cousin-pair Susan and Harvey do not provide the same set of services (that is, the same service model) as Ann and Bill. In an analogous manner, a computer network may make available multiple transport protocols, with each protocol offering a different service model to applications.

The possible services that Ann and Bill can provide are clearly constrained by the possible services that the postal service provides. For example, if the postal service doesn't provide a maximum bound on how long it can take to deliver mail between the two houses (for example, three days), then there is no way that Ann and Bill can guarantee a maximum delay for mail delivery between any of the cousin pairs. In a similar manner, the services that a transport protocol can provide are often constrained by the service model of the underlying network-layer protocol. If the network-layer protocol cannot provide delay or bandwidth guarantees for transport-layer segments sent between hosts, then the transport-layer protocol cannot provide delay or bandwidth guarantees for application messages sent between processes.

Nevertheless, certain services can be offered by a transport protocol even when the underlying network protocol doesn't offer the corresponding service at the network layer. For example, as we'll see in this chapter, a transport protocol can offer reliable data transfer service to an application even when the underlying network protocol is unreliable, that is, even when the network protocol loses, garbles, or duplicates packets. As another example (which we'll explore in Chapter 8 when we discuss network security), a transport protocol can use encryption to guarantee that application messages are not read by intruders, even when the network layer cannot guarantee the confidentiality of transport-layer segments.
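To make this layering point concrete, here is a toy sketch of obtaining reliable delivery by retransmitting over a lossy channel. It is purely illustrative: the channel model, loss rate, and all function names are invented, and the real reliable-transfer protocols covered later in the chapter are considerably more involved.

```python
import random

# Toy sketch: reliable delivery layered over an unreliable channel.
# The channel, loss model, and all names here are invented for illustration.

random.seed(7)
received = []  # what the "receiver" ends up with

def unreliable_send(packet, deliver):
    """A channel that silently drops packets 30% of the time."""
    if random.random() < 0.7:
        deliver(packet)

def reliable_send(packet, max_tries=50):
    """Retransmit until the receiver acknowledges the packet (ACKs are never lost here)."""
    for _ in range(max_tries):
        acked = []
        def deliver(p):          # receiver side: take the data and ACK it
            received.append(p)
            acked.append(True)
        unreliable_send(packet, deliver)
        if acked:                # an ACK came back; stop retransmitting
            return True
    return False

for chunk in [b'seg0', b'seg1', b'seg2']:
    assert reliable_send(chunk)
print(received)                  # [b'seg0', b'seg1', b'seg2']
```

Even though individual transmissions are lost, the sender's retransmission loop delivers every chunk exactly once and in order, which is the essence of offering a service the layer below lacks.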
3.1.2 Overview of the Transport Layer in the Internet

Recall that the Internet makes two distinct transport-layer protocols available to the application layer. One of these protocols is UDP (User Datagram Protocol), which provides an unreliable, connectionless service to the invoking application. The second of these protocols is TCP (Transmission Control Protocol), which provides a reliable, connection-oriented service to the invoking application. When designing a network application, the application developer must specify one of these two transport protocols. As we saw in Section 2.7, the application developer selects between UDP and TCP when creating sockets.

To simplify terminology, we refer to the transport-layer packet as a segment. We mention, however, that the Internet literature (for example, the RFCs) also refers to the transport-layer packet for TCP as a segment but often refers to the packet for UDP as a datagram. But this same Internet literature also uses the term datagram for the network-layer packet! For an introductory book on computer networking such as this, we believe that it is less confusing to refer to both TCP and UDP packets as segments, and reserve the term datagram for the network-layer packet.

Before proceeding with our brief introduction of UDP and TCP, it will be useful to say a few words about the Internet's network layer. (We'll learn about the network layer in detail in Chapters 4 and 5.) The Internet's network-layer protocol has a name---IP, for Internet Protocol. IP provides logical communication between hosts. The IP service model is a best-effort delivery service. This means that IP makes its "best effort" to deliver segments between communicating hosts, but it makes no guarantees. In particular, it does not guarantee segment delivery, it does not guarantee orderly delivery of segments, and it does not guarantee the integrity of the data in the segments.
For these reasons, IP is said to be an unreliable service. We also mention here that every host has at least one network-layer address, a so-called IP address. We'll examine IP addressing in detail in Chapter 4; for this chapter we need only keep in mind that each host has an IP address.

Having taken a glimpse at the IP service model, let's now summarize the service models provided by UDP and TCP. The most fundamental responsibility of UDP and TCP is to extend IP's delivery service between two end systems to a delivery service between two processes running on the end systems. Extending host-to-host delivery to process-to-process delivery is called transport-layer multiplexing and demultiplexing. We'll discuss transport-layer multiplexing and demultiplexing in the next section. UDP and TCP also provide integrity checking by including error-detection fields in their segments' headers. These two minimal transport-layer services---process-to-process data delivery and error checking---are the only two services that UDP provides! In particular, like IP, UDP is an unreliable service---it does not guarantee that data sent by one process will arrive intact (or at all!) to the destination process. UDP is discussed in detail in Section 3.3.

TCP, on the other hand, offers several additional services to applications. First and foremost, it provides reliable data transfer. Using flow control, sequence numbers, acknowledgments, and timers (techniques we'll explore in detail in this chapter), TCP ensures that data is delivered from sending process to receiving process, correctly and in order. TCP thus converts IP's unreliable service between end systems into a reliable data transport service between processes. TCP also provides congestion control. Congestion control is not so much a service provided to the invoking application as it is a service for the Internet as a whole, a service for the general good.
Loosely speaking, TCP congestion control prevents any one TCP connection from swamping the links and routers between communicating hosts with an excessive amount of traffic. TCP strives to give each connection traversing a congested link an equal share of the link bandwidth. This is done by regulating the rate at which the sending sides of TCP connections can send traffic into the network. UDP traffic, on the other hand, is unregulated. An application using UDP transport can send at any rate it pleases, for as long as it pleases.

A protocol that provides reliable data transfer and congestion control is necessarily complex. We'll need several sections to cover the principles of reliable data transfer and congestion control, and additional sections to cover the TCP protocol itself. These topics are investigated in Sections 3.4 through 3.8. The approach taken in this chapter is to alternate between basic principles and the TCP protocol. For example, we'll first discuss reliable data transfer in a general setting and then discuss how TCP specifically provides reliable data transfer. Similarly, we'll first discuss congestion control in a general setting and then discuss how TCP performs congestion control. But before getting into all this good stuff, let's first look at transport-layer multiplexing and demultiplexing.

3.2 Multiplexing and Demultiplexing

In this section, we discuss transport-layer multiplexing and demultiplexing, that is, extending the host-to-host delivery service provided by the network layer to a process-to-process delivery service for applications running on the hosts. In order to keep the discussion concrete, we'll discuss this basic transport-layer service in the context of the Internet. We emphasize, however, that a multiplexing/demultiplexing service is needed for all computer networks.

At the destination host, the transport layer receives segments from the network layer just below.
The transport layer has the responsibility of delivering the data in these segments to the appropriate application process running in the host. Let's take a look at an example. Suppose you are sitting in front of your computer, and you are downloading Web pages while running one FTP session and two Telnet sessions. You therefore have four network application processes running---two Telnet processes, one FTP process, and one HTTP process. When the transport layer in your computer receives data from the network layer below, it needs to direct the received data to one of these four processes. Let's now examine how this is done.

First recall from Section 2.7 that a process (as part of a network application) can have one or more sockets, doors through which data passes from the network to the process and through which data passes from the process to the network. Thus, as shown in Figure 3.2, the transport layer in the receiving host does not actually deliver data directly to a process, but instead to an intermediary socket. Because at any given time there can be more than one socket in the receiving host, each socket has a unique identifier. The format of the identifier depends on whether the socket is a UDP or a TCP socket, as we'll discuss shortly.

Now let's consider how a receiving host directs an incoming transport-layer segment to the appropriate socket. Each transport-layer segment has a set of fields in the segment for this purpose. At the receiving end, the transport layer examines these fields to identify the receiving socket and then directs the segment to that socket. This job of delivering the data in a transport-layer segment to the correct socket is called demultiplexing.
The job of gathering data chunks at the source host from different sockets, encapsulating each data chunk with header information (that will later be used in demultiplexing) to create segments, and passing the segments to the network layer is called multiplexing.

Figure 3.2 Transport-layer multiplexing and demultiplexing

Note that the transport layer in the middle host in Figure 3.2 must demultiplex segments arriving from the network layer below to either process P1 or P2 above; this is done by directing the arriving segment's data to the corresponding process's socket. The transport layer in the middle host must also gather outgoing data from these sockets, form transport-layer segments, and pass these segments down to the network layer. Although we have introduced multiplexing and demultiplexing in the context of the Internet transport protocols, it's important to realize that they are concerns whenever a single protocol at one layer (at the transport layer or elsewhere) is used by multiple protocols at the next higher layer.

To illustrate the demultiplexing job, recall the household analogy in the previous section. Each of the kids is identified by his or her name. When Bill receives a batch of mail from the mail carrier, he performs a demultiplexing operation by observing to whom the letters are addressed and then hand delivering the mail to his brothers and sisters. Ann performs a multiplexing operation when she collects letters from her brothers and sisters and gives the collected mail to the mail person.

Now that we understand the roles of transport-layer multiplexing and demultiplexing, let us examine how it is actually done in a host. From the discussion above, we know that transport-layer multiplexing requires (1) that sockets have unique identifiers, and (2) that each segment have special fields that indicate the socket to which the segment is to be delivered.
These special fields, illustrated in Figure 3.3, are the source port number field and the destination port number field. (The UDP and TCP segments have other fields as well, as discussed in the subsequent sections of this chapter.) Each port number is a 16-bit number, ranging from 0 to 65535. The port numbers ranging from 0 to 1023 are called well-known port numbers and are restricted, which means that they are reserved for use by well-known application protocols such as HTTP (which uses port number 80) and FTP (which uses port number 21). The list of well-known port numbers is given in RFC 1700 and is updated at http://www.iana.org \[RFC 3232\]. When we develop a new application (such as the simple application developed in Section 2.7), we must assign the application a port number.

Figure 3.3 Source and destination port-number fields in a transport-layer segment

It should now be clear how the transport layer could implement the demultiplexing service: Each socket in the host could be assigned a port number, and when a segment arrives at the host, the transport layer examines the destination port number in the segment and directs the segment to the corresponding socket. The segment's data then passes through the socket into the attached process. As we'll see, this is basically how UDP does it. However, we'll also see that multiplexing/demultiplexing in TCP is yet more subtle.

Connectionless Multiplexing and Demultiplexing

Recall from Section 2.7.1 that the Python program running in a host can create a UDP socket with the line

clientSocket = socket(AF_INET, SOCK_DGRAM)

When a UDP socket is created in this manner, the transport layer automatically assigns a port number to the socket. In particular, the transport layer assigns a port number in the range 1024 to 65535 that is currently not being used by any other UDP port in the host.
Alternatively, we can add a line into our Python program after we create the socket to associate a specific port number (say, 19157) to this UDP socket via the socket bind() method:

clientSocket.bind(('', 19157))

If the application developer writing the code were implementing the server side of a "well-known protocol," then the developer would have to assign the corresponding well-known port number. Typically, the client side of the application lets the transport layer automatically (and transparently) assign the port number, whereas the server side of the application assigns a specific port number.

With port numbers assigned to UDP sockets, we can now precisely describe UDP multiplexing/demultiplexing. Suppose a process in Host A, with UDP port 19157, wants to send a chunk of application data to a process with UDP port 46428 in Host B. The transport layer in Host A creates a transport-layer segment that includes the application data, the source port number (19157), the destination port number (46428), and two other values (which will be discussed later, but are unimportant for the current discussion). The transport layer then passes the resulting segment to the network layer. The network layer encapsulates the segment in an IP datagram and makes a best-effort attempt to deliver the segment to the receiving host. If the segment arrives at the receiving Host B, the transport layer at the receiving host examines the destination port number in the segment (46428) and delivers the segment to its socket identified by port 46428. Note that Host B could be running multiple processes, each with its own UDP socket and associated port number. As UDP segments arrive from the network, Host B directs (demultiplexes) each segment to the appropriate socket by examining the segment's destination port number.
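The exchange just described can be imitated on a single machine's loopback interface. The sketch below is illustrative only: Hosts A and B are collapsed into one machine, the receiver's port is chosen by the OS rather than being 46428, and the sender falls back to an ephemeral port if 19157 happens to be in use.

```python
from socket import socket, AF_INET, SOCK_DGRAM

# Receiver ("Host B"): let the OS choose a free port to stand in for 46428.
receiver = socket(AF_INET, SOCK_DGRAM)
receiver.bind(('127.0.0.1', 0))
dest = receiver.getsockname()

# Sender ("Host A"): bind the source port explicitly, as in the example above.
sender = socket(AF_INET, SOCK_DGRAM)
try:
    sender.bind(('127.0.0.1', 19157))
except OSError:                        # 19157 already taken on this machine
    sender.bind(('127.0.0.1', 0))
source_port = sender.getsockname()[1]

sender.sendto(b'a chunk of application data', dest)
data, (src_ip, src_port) = receiver.recvfrom(2048)
print(src_port == source_port)         # True: the source port travels in the segment header
sender.close(); receiver.close()
```

The receiver sees exactly the source port the sender bound, because that port number is carried in the header of every segment the sender emits.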
It is important to note that a UDP socket is fully identified by a two-tuple consisting of a destination IP address and a destination port number. As a consequence, if two UDP segments have different source IP addresses and/or source port numbers, but have the same destination IP address and destination port number, then the two segments will be directed to the same destination process via the same destination socket. You may be wondering now, what is the purpose of the source port number? As shown in Figure 3.4, in the A-to-B segment the source port number serves as part of a "return address"---when B wants to send a segment back to A, the destination port in the B-to-A segment will take its value from the source port value of the A-to-B segment. (The complete return address is A's IP address and the source port number.) As an example, recall the UDP server program studied in Section 2.7. In UDPServer.py, the server uses the recvfrom() method to extract the client-side (source) port number from the segment it receives from the client; it then sends a new segment to the client, with the extracted source port number serving as the destination port number in this new segment.

Connection-Oriented Multiplexing and Demultiplexing

In order to understand TCP demultiplexing, we have to take a close look at TCP sockets and TCP connection establishment. One subtle difference between a TCP socket and a UDP socket is that a TCP socket is identified by a four-tuple: (source IP address, source port number, destination IP address, destination port number). Thus, when a TCP segment arrives from the network to a host, the host uses all four values to direct (demultiplex) the segment to the appropriate socket.
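The two demultiplexing keys can be sketched with hypothetical lookup tables. The dictionaries, socket names, and addresses below are invented for illustration (a real kernel uses very different data structures), but the keys match the two-tuple and four-tuple just described:

```python
# Illustrative only: UDP demultiplexes on a two-tuple, TCP on a four-tuple.
udp_sockets = {('10.0.0.2', 46428): 'udp socket S'}
tcp_sockets = {
    ('10.0.0.1', 26145, '10.0.0.2', 80): 'tcp socket for client 1',
    ('10.0.0.3', 26145, '10.0.0.2', 80): 'tcp socket for client 2',
}

def demux_udp(seg):
    # Source IP/port are ignored: different senders land on the same socket.
    return udp_sockets.get((seg['dst_ip'], seg['dst_port']))

def demux_tcp(seg):
    # All four values matter: same destination, different sources -> different sockets.
    return tcp_sockets.get((seg['src_ip'], seg['src_port'],
                            seg['dst_ip'], seg['dst_port']))

seg1 = {'src_ip': '10.0.0.1', 'src_port': 26145, 'dst_ip': '10.0.0.2', 'dst_port': 80}
seg2 = {'src_ip': '10.0.0.3', 'src_port': 26145, 'dst_ip': '10.0.0.2', 'dst_port': 80}
print(demux_tcp(seg1), demux_tcp(seg2))   # two different sockets
```

Note that the two TCP segments share a destination and even a source port number, yet they reach different sockets because their source IP addresses differ; the analogous UDP segments would all reach the one socket keyed by the destination two-tuple.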
Figure 3.4 The inversion of source and destination port numbers

In particular, and in contrast with UDP, two arriving TCP segments with different source IP addresses or source port numbers will (with the exception of a TCP segment carrying the original connection-establishment request) be directed to two different sockets. To gain further insight, let's reconsider the TCP client-server programming example in Section 2.7.2: The TCP server application has a "welcoming socket" that waits for connection-establishment requests from TCP clients (see Figure 2.29) on port number 12000. The TCP client creates a socket and sends a connection-establishment request segment with the lines:

clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName,12000))

A connection-establishment request is nothing more than a TCP segment with destination port number 12000 and a special connection-establishment bit set in the TCP header (discussed in Section 3.5). The segment also includes a source port number that was chosen by the client. When the host operating system of the computer running the server process receives the incoming connection-request segment with destination port 12000, it locates the server process that is waiting to accept a connection on port number 12000. The server process then creates a new socket:

connectionSocket, addr = serverSocket.accept()

Also, the transport layer at the server notes the following four values in the connection-request segment: (1) the source port number in the segment, (2) the IP address of the source host, (3) the destination port number in the segment, and (4) its own IP address. The newly created connection socket is identified by these four values; all subsequently arriving segments whose source port, source IP address, destination port, and destination IP address match these four values will be demultiplexed to this socket.
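On the loopback interface, the connection setup just described can be sketched end to end. This is illustrative only: an OS-chosen port stands in for 12000 so the sketch runs anywhere.

```python
from socket import socket, AF_INET, SOCK_STREAM

# Welcoming socket, standing in for the server listening on port 12000.
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('127.0.0.1', 0))
serverSocket.listen(1)
serverName, serverPort = serverSocket.getsockname()

clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName, serverPort))  # sends the connection-establishment request

# accept() creates a new connection socket; addr holds the client's
# source IP and source port, two of the four identifying values.
connectionSocket, addr = serverSocket.accept()
client_ip, client_port = clientSocket.getsockname()

print(addr == (client_ip, client_port))         # True
clientSocket.close(); connectionSocket.close(); serverSocket.close()
```

The welcoming socket keeps listening after accept() returns, so a second client connecting to the same server port would produce a second connection socket with a different four-tuple.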
With the TCP connection now in place, the client and server can now send data to each other. The server host may support many simultaneous TCP connection sockets, with each socket attached to a process, and with each socket identified by its own four-tuple. When a TCP segment arrives at the host, all four fields (source IP address, source port, destination IP address, destination port) are used to direct (demultiplex) the segment to the appropriate socket.

FOCUS ON SECURITY

Port Scanning

We've seen that a server process waits patiently on an open port for contact by a remote client. Some ports are reserved for well-known applications (e.g., Web, FTP, DNS, and SMTP servers); other ports are used by convention by popular applications (e.g., the Microsoft 2000 SQL server listens for requests on UDP port 1434). Thus, if we determine that a port is open on a host, we may be able to map that port to a specific application running on the host. This is very useful for system administrators, who are often interested in knowing which network applications are running on the hosts in their networks. But attackers, in order to "case the joint," also want to know which ports are open on target hosts. If a host is found to be running an application with a known security flaw (e.g., a SQL server listening on port 1434 was subject to a buffer overflow, allowing a remote user to execute arbitrary code on the vulnerable host, a flaw exploited by the Slammer worm \[CERT 2003--04\]), then that host is ripe for attack. Determining which applications are listening on which ports is a relatively easy task. Indeed there are a number of public domain programs, called port scanners, that do just that. Perhaps the most widely used of these is nmap, freely available at http://nmap.org and included in most Linux distributions. For TCP, nmap sequentially scans ports, looking for ports that are accepting TCP connections.
For UDP, nmap again sequentially scans ports, looking for UDP ports that respond to transmitted UDP segments. In both cases, nmap returns a list of open, closed, or unreachable ports. A host running nmap can attempt to scan any target host anywhere in the Internet. We'll revisit nmap in Section 3.5.6, when we discuss TCP connection management.

Figure 3.5 Two clients, using the same destination port number (80) to communicate with the same Web server application

The situation is illustrated in Figure 3.5, in which Host C initiates two HTTP sessions to server B, and Host A initiates one HTTP session to B. Hosts A and C and server B each have their own unique IP address---A, C, and B, respectively. Host C assigns two different source port numbers (26145 and 7532) to its two HTTP connections. Because Host A is choosing source port numbers independently of C, it might also assign a source port of 26145 to its HTTP connection. But this is not a problem---server B will still be able to correctly demultiplex the two connections having the same source port number, since the two connections have different source IP addresses.

Web Servers and TCP

Before closing this discussion, it's instructive to say a few additional words about Web servers and how they use port numbers. Consider a host running a Web server, such as an Apache Web server, on port 80. When clients (for example, browsers) send segments to the server, all segments will have destination port 80. In particular, both the initial connection-establishment segments and the segments carrying HTTP request messages will have destination port 80. As we have just described, the server distinguishes the segments from the different clients using source IP addresses and source port numbers. Figure 3.5 shows a Web server that spawns a new process for each connection.
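A minimal loopback sketch of this one-connection-socket-per-client structure follows; it uses threads rather than the spawned processes of Figure 3.5 for brevity, and all names are invented for illustration.

```python
import threading
from socket import socket, AF_INET, SOCK_STREAM

def handle(connectionSocket):
    """Serve one client over its own connection socket, then close it."""
    request = connectionSocket.recv(1024)
    connectionSocket.sendall(b'echo: ' + request)
    connectionSocket.close()

serverSocket = socket(AF_INET, SOCK_STREAM)  # the welcoming socket
serverSocket.bind(('127.0.0.1', 0))
serverSocket.listen(5)

def serve():
    while True:
        try:
            conn, addr = serverSocket.accept()   # one new socket per client
        except OSError:                          # welcoming socket was closed
            return
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

threading.Thread(target=serve, daemon=True).start()

# Two clients: each is served through a distinct connection socket.
replies = []
for payload in (b'one', b'two'):
    c = socket(AF_INET, SOCK_STREAM)
    c.connect(serverSocket.getsockname())
    c.sendall(payload)
    replies.append(c.recv(1024))
    c.close()

serverSocket.close()
print(replies)   # [b'echo: one', b'echo: two']
```

Each accept() yields a fresh connection socket with its own four-tuple, so the two clients' segments are demultiplexed to different sockets even though both connect to the same server port.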
As shown in Figure 3.5, each of these processes has its own connection socket through which HTTP requests arrive and HTTP responses are sent. We mention, however, that there is not always a one-to-one correspondence between connection sockets and processes. In fact, today's high-performing Web servers often use only one process, and create a new thread with a new connection socket for each new client connection. (A thread can be viewed as a lightweight subprocess.) If you did the first programming assignment in Chapter 2, you built a Web server that does just this. For such a server, at any given time there may be many connection sockets (with different identifiers) attached to the same process.

If the client and server are using persistent HTTP, then throughout the duration of the persistent connection the client and server exchange HTTP messages via the same server socket. However, if the client and server use non-persistent HTTP, then a new TCP connection is created and closed for every request/response, and hence a new socket is created and later closed for every request/response. This frequent creating and closing of sockets can severely impact the performance of a busy Web server (although a number of operating system tricks can be used to mitigate the problem). Readers interested in the operating system issues surrounding persistent and non-persistent HTTP are encouraged to see \[Nielsen 1997; Nahum 2002\].

Now that we've discussed transport-layer multiplexing and demultiplexing, let's move on and discuss one of the Internet's transport protocols, UDP. In the next section we'll see that UDP adds little more to the network-layer protocol than a multiplexing/demultiplexing service.

3.3 Connectionless Transport: UDP

In this section, we'll take a close look at UDP, how it works, and what it does.
We encourage you to refer back to Section 2.1, which includes an overview of the UDP service model, and to Section 2.7.1, which discusses socket programming using UDP. To motivate our discussion about UDP, suppose you were interested in designing a no-frills, bare-bones transport protocol. How might you go about doing this? You might first consider using a vacuous transport protocol. In particular, on the sending side, you might consider taking the messages from the application process and passing them directly to the network layer; and on the receiving side, you might consider taking the messages arriving from the network layer and passing them directly to the application process. But as we learned in the previous section, we have to do a little more than nothing! At the very least, the transport layer has to provide a multiplexing/demultiplexing service in order to pass data between the network layer and the correct application-level process.

UDP, defined in \[RFC 768\], does just about as little as a transport protocol can do. Aside from the multiplexing/demultiplexing function and some light error checking, it adds nothing to IP. In fact, if the application developer chooses UDP instead of TCP, then the application is almost directly talking with IP. UDP takes messages from the application process, attaches source and destination port number fields for the multiplexing/demultiplexing service, adds two other small fields, and passes the resulting segment to the network layer. The network layer encapsulates the transport-layer segment into an IP datagram and then makes a best-effort attempt to deliver the segment to the receiving host. If the segment arrives at the receiving host, UDP uses the destination port number to deliver the segment's data to the correct application process. Note that with UDP there is no handshaking between sending and receiving transport-layer entities before sending a segment.
For this reason, UDP is said to be connectionless. DNS is an example of an application-layer protocol that typically uses UDP. When the DNS application in a host wants to make a query, it constructs a DNS query message and passes the message to UDP. Without performing any handshaking with the UDP entity running on the destination end system, the host-side UDP adds header fields to the message and passes the resulting segment to the network layer. The network layer encapsulates the UDP segment into a datagram and sends the datagram to a name server. The DNS application at the querying host then waits for a reply to its query. If it doesn't receive a reply (possibly because the underlying network lost the query or the reply), it might try resending the query, try sending the query to another name server, or inform the invoking application that it can't get a reply.

Now you might be wondering why an application developer would ever choose to build an application over UDP rather than over TCP. Isn't TCP always preferable, since TCP provides a reliable data transfer service, while UDP does not? The answer is no, as some applications are better suited for UDP for the following reasons:

Finer application-level control over what data is sent, and when. Under UDP, as soon as an application process passes data to UDP, UDP will package the data inside a UDP segment and immediately pass the segment to the network layer. TCP, on the other hand, has a congestion-control mechanism that throttles the transport-layer TCP sender when one or more links between the source and destination hosts become excessively congested. TCP will also continue to resend a segment until the receipt of the segment has been acknowledged by the destination, regardless of how long reliable delivery takes.
Since real-time applications often require a minimum sending rate, do not want to overly delay segment transmission, and can tolerate some data loss, TCP's service model is not particularly well matched to these applications' needs. As discussed below, these applications can use UDP and implement, as part of the application, any additional functionality that is needed beyond UDP's no-frills segment-delivery service.

No connection establishment. As we'll discuss later, TCP uses a three-way handshake before it starts to transfer data. UDP just blasts away without any formal preliminaries. Thus UDP does not introduce any delay to establish a connection. This is probably the principal reason why DNS runs over UDP rather than TCP---DNS would be much slower if it ran over TCP. HTTP uses TCP rather than UDP, since reliability is critical for Web pages with text. But, as we briefly discussed in Section 2.2, the TCP connection-establishment delay in HTTP is an important contributor to the delays associated with downloading Web documents. Indeed, the QUIC protocol (Quick UDP Internet Connection, \[Iyengar 2015\]), used in Google's Chrome browser, uses UDP as its underlying transport protocol and implements reliability in an application-layer protocol on top of UDP.

No connection state. TCP maintains connection state in the end systems. This connection state includes receive and send buffers, congestion-control parameters, and sequence and acknowledgment number parameters. We will see in Section 3.5 that this state information is needed to implement TCP's reliable data transfer service and to provide congestion control. UDP, on the other hand, does not maintain connection state and does not track any of these parameters. For this reason, a server devoted to a particular application can typically support many more active clients when the application runs over UDP rather than TCP.

Small packet header overhead.
-The TCP segment has 20 bytes of header overhead in every segment, -whereas UDP has only 8 bytes of overhead. Figure 3.6 lists popular -Internet applications and the transport protocols that they use. As we -expect, email, remote terminal access, the Web, and file transfer run -over TCP---all these applications need the reliable data transfer -service of TCP. Nevertheless, many important applications run over UDP -rather than TCP. For example, UDP is used to carry network management -(SNMP; see Section 5.7) data. UDP is preferred to TCP in this case, -since network management applications must often run when the - -network is in a stressed state---precisely when reliable, -congestion-controlled data transfer is difficult to achieve. Also, as we -mentioned earlier, DNS runs over UDP, thereby avoiding TCP's -connectionestablishment delays. As shown in Figure 3.6, both UDP and TCP -are somtimes used today with multimedia applications, such as Internet -phone, real-time video conferencing, and streaming of stored audio and -video. We'll take a close look at these applications in Chapter 9. We -just mention now that all of these applications can tolerate a small -amount of packet loss, so that reliable data transfer is not absolutely -critical for the application's success. Furthermore, real-time -applications, like Internet phone and video conferencing, react very -poorly to TCP's congestion control. For these reasons, developers of -multimedia applications may choose to run their applications over UDP -instead of TCP. When packet loss rates are low, and with some -organizations blocking UDP traffic for security reasons (see Chapter 8), -TCP becomes an increasingly attractive protocol for streaming media -transport. - -Figure 3.6 Popular Internet applications and their underlying transport -protocols - -Although commonly done today, running multimedia applications over UDP -is controversial. As we mentioned above, UDP has no congestion control. 
But congestion control is needed to prevent the network from entering a congested state in which very little useful work is done. If everyone were to start streaming high-bit-rate video without using any congestion control, there would be so much packet overflow at routers that very few UDP packets would successfully traverse the source-to-destination path. Moreover, the high loss rates induced by the uncontrolled UDP senders would cause the TCP senders (which, as we'll see, do decrease their sending rates in the face of congestion) to dramatically decrease their rates. Thus, the lack of congestion control in UDP can result in high loss rates between a UDP sender and receiver, and the crowding out of TCP sessions---a potentially serious problem \[Floyd 1999\]. Many researchers have proposed new mechanisms to force all sources, including UDP sources, to perform adaptive congestion control \[Mahdavi 1997; Floyd 2000; Kohler 2006; RFC 4340\].

Before discussing the UDP segment structure, we mention that it is possible for an application to have reliable data transfer when using UDP. This can be done if reliability is built into the application itself (for example, by adding acknowledgment and retransmission mechanisms, such as those we'll study in the next section). We mentioned earlier that the QUIC protocol \[Iyengar 2015\] used in Google's Chrome browser implements reliability in an application-layer protocol on top of UDP. But this is a nontrivial task that would keep an application developer busy debugging for a long time. Nevertheless, building reliability directly into the application allows the application to "have its cake and eat it too." That is, application processes can communicate reliably without being subjected to the transmission-rate constraints imposed by TCP's congestion-control mechanism.

3.3.1 UDP Segment Structure The UDP segment structure, shown in Figure 3.7, is defined in RFC 768.
The application data occupies the data field of the UDP segment. For example, for DNS, the data field contains either a query message or a response message. For a streaming audio application, audio samples fill the data field. The UDP header has only four fields, each consisting of two bytes. As discussed in the previous section, the port numbers allow the destination host to pass the application data to the correct process running on the destination end system (that is, to perform the demultiplexing function). The length field specifies the number of bytes in the UDP segment (header plus data). An explicit length value is needed since the size of the data field may differ from one UDP segment to the next. The checksum is used by the receiving host to check whether errors have been introduced into the segment. In truth, the checksum is also calculated over a few of the fields in the IP header in addition to the UDP segment. But we ignore this detail in order to see the forest through the trees. We'll discuss the checksum calculation below. Basic principles of error detection are described in Section 6.2. The length field specifies the length of the UDP segment, including the header, in bytes.

3.3.2 UDP Checksum The UDP checksum provides for error detection. That is, the checksum is used to determine whether bits within the UDP segment have been altered (for example, by noise in the links or while stored in a router) as it moved from source to destination.

Figure 3.7 UDP segment structure

UDP at the sender side performs the 1s complement of the sum of all the 16-bit words in the segment, with any overflow encountered during the sum being wrapped around. This result is put in the checksum field of the UDP segment. Here we give a simple example of the checksum calculation. You can find details about efficient implementation of the calculation in RFC 1071 and performance over real data in \[Stone 1998; Stone 2000\].
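The 1s-complement checksum computation just described can be sketched in a few lines of Python. This is an illustrative sketch, not the optimized implementation discussed in RFC 1071; it uses the same three 16-bit words as the worked example that follows.

```python
def udp_checksum(words):
    """1s complement of the 1s-complement sum of 16-bit words,
    with any overflow wrapped around into the low-order bits."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # wrap overflow around
    return ~total & 0xFFFF  # flip all 16 bits (1s complement)

# The three 16-bit words from the worked example that follows
words = [0b0110011001100000, 0b0101010101010101, 0b1000111100001100]
checksum = udp_checksum(words)
print(format(checksum, "016b"))  # -> 1011010100111101

# Receiver check: summing all four words (data plus checksum) gives all 1s
# when no errors were introduced, so complementing the sum gives 0
assert udp_checksum(words + [checksum]) == 0
```

Note that the wraparound is applied after each addition; applying it once at the end gives the same result for 16-bit words, since the carries accumulate in the high-order bits either way.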
As an example, suppose that we have the following three 16-bit words:

    0110011001100000
    0101010101010101
    1000111100001100

The sum of the first two of these 16-bit words is

    0110011001100000
    0101010101010101
    ----------------
    1011101110110101

Adding the third word to the above sum gives

    1011101110110101
    1000111100001100
    ----------------
    0100101011000010

Note that this last addition had overflow, which was wrapped around. The 1s complement is obtained by converting all the 0s to 1s and converting all the 1s to 0s. Thus the 1s complement of the sum 0100101011000010 is 1011010100111101, which becomes the checksum. At the receiver, all four 16-bit words are added, including the checksum. If no errors are introduced into the packet, then clearly the sum at the receiver will be 1111111111111111. If one of the bits is a 0, then we know that errors have been introduced into the packet.

You may wonder why UDP provides a checksum in the first place, as many link-layer protocols (including the popular Ethernet protocol) also provide error checking. The reason is that there is no guarantee that all the links between source and destination provide error checking; that is, one of the links may use a link-layer protocol that does not provide error checking. Furthermore, even if segments are correctly transferred across a link, it's possible that bit errors could be introduced when a segment is stored in a router's memory. Given that neither link-by-link reliability nor in-memory error detection is guaranteed, UDP must provide error detection at the transport layer, on an end-end basis, if the end-end data transfer service is to provide error detection.
This is an example of the celebrated end-end principle in system design \[Saltzer 1984\], which states that since certain functionality (error detection, in this case) must be implemented on an end-end basis: "functions placed at the lower levels may be redundant or of little value when compared to the cost of providing them at the higher level." Because IP is supposed to run over just about any layer-2 protocol, it is useful for the transport layer to provide error checking as a safety measure. Although UDP provides error checking, it does not do anything to recover from an error. Some implementations of UDP simply discard the damaged segment; others pass the damaged segment to the application with a warning. That wraps up our discussion of UDP. We will soon see that TCP offers reliable data transfer to its applications as well as other services that UDP doesn't offer. Naturally, TCP is also more complex than UDP. Before discussing TCP, however, it will be useful to step back and first discuss the underlying principles of reliable data transfer.

3.4 Principles of Reliable Data Transfer In this section, we consider the problem of reliable data transfer in a general context. This is appropriate since the problem of implementing reliable data transfer occurs not only at the transport layer, but also at the link layer and the application layer as well. The general problem is thus of central importance to networking. Indeed, if one had to identify a "top-ten" list of fundamentally important problems in all of networking, this would be a candidate to lead the list. In the next section we'll examine TCP and show, in particular, that TCP exploits many of the principles that we are about to describe. Figure 3.8 illustrates the framework for our study of reliable data transfer. The service abstraction provided to the upper-layer entities is that of a reliable channel through which data can be transferred.
With a reliable channel, no transferred data bits are corrupted (flipped from 0 to 1, or vice versa) or lost, and all are delivered in the order in which they were sent. This is precisely the service model offered by TCP to the Internet applications that invoke it. It is the responsibility of a reliable data transfer protocol to implement this service abstraction. This task is made difficult by the fact that the layer below the reliable data transfer protocol may be unreliable. For example, TCP is a reliable data transfer protocol that is implemented on top of an unreliable (IP) end-to-end network layer. More generally, the layer beneath the two reliably communicating end points might consist of a single physical link (as in the case of a link-level data transfer protocol) or a global internetwork (as in the case of a transport-level protocol). For our purposes, however, we can view this lower layer simply as an unreliable point-to-point channel. In this section, we will incrementally develop the sender and receiver sides of a reliable data transfer protocol, considering increasingly complex models of the underlying channel. For example, we'll consider what protocol mechanisms are needed when the underlying channel can corrupt bits or lose entire packets. One assumption we'll adopt throughout our discussion here is that packets will be delivered in the order in which they were sent, with some packets possibly being lost; that is, the underlying channel will not reorder packets.

Figure 3.8 Reliable data transfer: Service model and service implementation

Figure 3.8(b) illustrates the interfaces for our data transfer protocol. The sending side of the data transfer protocol will be invoked from above by a call to rdt_send() . It will pass the data to be delivered to the upper layer at the receiving side. (Here rdt stands for reliable data transfer protocol and \_send indicates that the sending side of rdt is being called.
The first step in developing any protocol is to choose a good name!) On the receiving side, rdt_rcv() will be called when a packet arrives from the receiving side of the channel. When the rdt protocol wants to deliver data to the upper layer, it will do so by calling deliver_data() . In the following we use the terminology "packet" rather than transport-layer "segment." Because the theory developed in this section applies to computer networks in general and not just to the Internet transport layer, the generic term "packet" is perhaps more appropriate here. In this section we consider only the case of unidirectional data transfer, that is, data transfer from the sending to the receiving side. The case of reliable bidirectional (that is, full-duplex) data transfer is conceptually no more difficult but considerably more tedious to explain. Although we consider only unidirectional data transfer, it is important to note that the sending and receiving sides of our protocol will nonetheless need to transmit packets in both directions, as indicated in Figure 3.8. We will see shortly that, in addition to exchanging packets containing the data to be transferred, the sending and receiving sides of rdt will also need to exchange control packets back and forth. Both the send and receive sides of rdt send packets to the other side by a call to udt_send() (where udt stands for unreliable data transfer).

3.4.1 Building a Reliable Data Transfer Protocol We now step through a series of protocols, each one becoming more complex, arriving at a flawless, reliable data transfer protocol. Reliable Data Transfer over a Perfectly Reliable Channel: rdt1.0 We first consider the simplest case, in which the underlying channel is completely reliable. The protocol itself, which we'll call rdt1.0 , is trivial. The finite-state machine (FSM) definitions for the rdt1.0 sender and receiver are shown in Figure 3.9.
The FSM in Figure 3.9(a) defines the operation of the sender, while the FSM in Figure 3.9(b) defines the operation of the receiver. It is important to note that there are separate FSMs for the sender and for the receiver. The sender and receiver FSMs in Figure 3.9 each have just one state. The arrows in the FSM description indicate the transition of the protocol from one state to another. (Since each FSM in Figure 3.9 has just one state, a transition is necessarily from the one state back to itself; we'll see more complicated state diagrams shortly.) The event causing the transition is shown above the horizontal line labeling the transition, and the actions taken when the event occurs are shown below the horizontal line. When no action is taken on an event, or no event occurs and an action is taken, we'll use the symbol Λ below or above the horizontal, respectively, to explicitly denote the lack of an action or event. The initial state of the FSM is indicated by the dashed arrow. Although the FSMs in Figure 3.9 have but one state, the FSMs we will see shortly have multiple states, so it will be important to identify the initial state of each FSM. The sending side of rdt simply accepts data from the upper layer via the rdt_send(data) event, creates a packet containing the data (via the action make_pkt(data) ) and sends the packet into the channel. In practice, the rdt_send(data) event would result from a procedure call (for example, to rdt_send() ) by the upper-layer application.

Figure 3.9 rdt1.0 -- A protocol for a completely reliable channel

On the receiving side, rdt receives a packet from the underlying channel via the rdt_rcv(packet) event, removes the data from the packet (via the action extract (packet, data) ) and passes the data up to the upper layer (via the action deliver_data(data) ). In practice, the rdt_rcv(packet) event would result from a procedure call (for example, to rdt_rcv() ) from the lower-layer protocol.
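Because each FSM has a single state and a single transition, rdt1.0 transcribes almost directly into code. The following Python sketch is illustrative only: the perfectly reliable channel is modeled as a simple list, and make_pkt, extract, and deliver_data mirror the actions named in the FSMs.

```python
# A perfectly reliable channel, modeled as a simple FIFO queue
channel = []

def make_pkt(data):           # package data into a packet
    return {"data": data}

def extract(packet):          # remove the data from a packet
    return packet["data"]

def rdt_send(data):
    """Sender FSM: the single transition, on the rdt_send(data) event."""
    sndpkt = make_pkt(data)
    channel.append(sndpkt)    # udt_send(sndpkt)

def rdt_rcv(deliver_data):
    """Receiver FSM: the single transition, on the rdt_rcv(packet) event."""
    packet = channel.pop(0)
    deliver_data(extract(packet))

received = []
rdt_send("hello")
rdt_rcv(received.append)      # deliver_data is just 'append' here
print(received)               # -> ['hello']
```

There are no acknowledgments, sequence numbers, or timers anywhere in the sketch; with a perfectly reliable channel, none are needed.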
In this simple protocol, there is no difference between a unit of data and a packet. Also, all packet flow is from the sender to receiver; with a perfectly reliable channel there is no need for the receiver side to provide any feedback to the sender since nothing can go wrong! Note that we have also assumed that the receiver is able to receive data as fast as the sender happens to send data. Thus, there is no need for the receiver to ask the sender to slow down!

Reliable Data Transfer over a Channel with Bit Errors: rdt2.0 A more realistic model of the underlying channel is one in which bits in a packet may be corrupted. Such bit errors typically occur in the physical components of a network as a packet is transmitted, propagates, or is buffered. We'll continue to assume for the moment that all transmitted packets are received (although their bits may be corrupted) in the order in which they were sent. Before developing a protocol for reliably communicating over such a channel, first consider how people might deal with such a situation. Consider how you yourself might dictate a long message over the phone. In a typical scenario, the message taker might say "OK" after each sentence has been heard, understood, and recorded. If the message taker hears a garbled sentence, you're asked to repeat the garbled sentence. This message-dictation protocol uses both positive acknowledgments ("OK") and negative acknowledgments ("Please repeat that."). These control messages allow the receiver to let the sender know what has been received correctly, and what has been received in error and thus requires repeating. In a computer network setting, reliable data transfer protocols based on such retransmission are known as ARQ (Automatic Repeat reQuest) protocols. Fundamentally, three additional protocol capabilities are required in ARQ protocols to handle the presence of bit errors:

- Error detection. First, a mechanism is needed to allow the receiver to detect when bit errors have occurred. Recall from the previous section that UDP uses the Internet checksum field for exactly this purpose. In Chapter 6 we'll examine error-detection and -correction techniques in greater detail; these techniques allow the receiver to detect and possibly correct packet bit errors. For now, we need only know that these techniques require that extra bits (beyond the bits of original data to be transferred) be sent from the sender to the receiver; these bits will be gathered into the packet checksum field of the rdt2.0 data packet.
- Receiver feedback. Since the sender and receiver are typically executing on different end systems, possibly separated by thousands of miles, the only way for the sender to learn of the receiver's view of the world (in this case, whether or not a packet was received correctly) is for the receiver to provide explicit feedback to the sender. The positive (ACK) and negative (NAK) acknowledgment replies in the message-dictation scenario are examples of such feedback. Our rdt2.0 protocol will similarly send ACK and NAK packets back from the receiver to the sender. In principle, these packets need only be one bit long; for example, a 0 value could indicate a NAK and a value of 1 could indicate an ACK.
- Retransmission. A packet that is received in error at the receiver will be retransmitted by the sender.

Figure 3.10 shows the FSM representation of rdt2.0 , a data transfer protocol employing error detection, positive acknowledgments, and negative acknowledgments. The send side of rdt2.0 has two states. In the leftmost state, the send-side protocol is waiting for data to be passed down from the upper layer.
When the rdt_send(data) event occurs, the sender will create a packet ( sndpkt ) containing the data to be sent, along with a packet checksum (for example, as discussed in Section 3.3.2 for the case of a UDP segment), and then send the packet via the udt_send(sndpkt) operation. In the rightmost state, the sender protocol is waiting for an ACK or a NAK packet from the receiver. If an ACK packet is received (the notation rdt_rcv(rcvpkt) && isACK (rcvpkt) in Figure 3.10 corresponds to this event), the sender knows that the most recently transmitted packet has been received correctly and thus the protocol returns to the state of waiting for data from the upper layer. If a NAK is received, the protocol retransmits the last packet and waits for an ACK or NAK to be returned by the receiver in response to the retransmitted data packet.

Figure 3.10 rdt2.0 -- A protocol for a channel with bit errors

It is important to note that when the sender is in the wait-for-ACK-or-NAK state, it cannot get more data from the upper layer; that is, the rdt_send() event cannot occur; that will happen only after the sender receives an ACK and leaves this state. Thus, the sender will not send a new piece of data until it is sure that the receiver has correctly received the current packet. Because of this behavior, protocols such as rdt2.0 are known as stop-and-wait protocols. The receiver-side FSM for rdt2.0 still has a single state. On packet arrival, the receiver replies with either an ACK or a NAK, depending on whether or not the received packet is corrupted. In Figure 3.10, the notation rdt_rcv(rcvpkt) && corrupt(rcvpkt) corresponds to the event in which a packet is received and is found to be in error. Protocol rdt2.0 may look as if it works but, unfortunately, it has a fatal flaw. In particular, we haven't accounted for the possibility that the ACK or NAK packet could be corrupted!
(Before proceeding on, you should think about how this problem may be fixed.) Unfortunately, our slight oversight is not as innocuous as it may seem. Minimally, we will need to add checksum bits to ACK/NAK packets in order to detect such errors. The more difficult question is how the protocol should recover from errors in ACK or NAK packets. The difficulty here is that if an ACK or NAK is corrupted, the sender has no way of knowing whether or not the receiver has correctly received the last piece of transmitted data. Consider three possibilities for handling corrupted ACKs or NAKs:

- For the first possibility, consider what a human might do in the message-dictation scenario. If the speaker didn't understand the "OK" or "Please repeat that" reply from the receiver, the speaker would probably ask, "What did you say?" (thus introducing a new type of sender-to-receiver packet to our protocol). The receiver would then repeat the reply. But what if the speaker's "What did you say?" is corrupted? The receiver, having no idea whether the garbled sentence was part of the dictation or a request to repeat the last reply, would probably then respond with "What did you say?" And then, of course, that response might be garbled. Clearly, we're heading down a difficult path.
- A second alternative is to add enough checksum bits to allow the sender not only to detect, but also to recover from, bit errors. This solves the immediate problem for a channel that can corrupt packets but not lose them.
- A third approach is for the sender simply to resend the current data packet when it receives a garbled ACK or NAK packet. This approach, however, introduces duplicate packets into the sender-to-receiver channel. The fundamental difficulty with duplicate packets is that the receiver doesn't know whether the ACK or NAK it last sent was received correctly at the sender. Thus, it cannot know a priori whether an arriving packet contains new data or is a retransmission!
A simple solution to this new problem (and one adopted in almost all existing data transfer protocols, including TCP) is to add a new field to the data packet and have the sender number its data packets by putting a sequence number into this field. The receiver then need only check this sequence number to determine whether or not the received packet is a retransmission. For this simple case of a stop-and-wait protocol, a 1-bit sequence number will suffice, since it will allow the receiver to know whether the sender is resending the previously transmitted packet (the received packet has the same sequence number as the most recently received packet) or a new packet (the sequence number changes, moving "forward" in modulo-2 arithmetic). Since we are currently assuming a channel that does not lose packets, ACK and NAK packets do not themselves need to indicate the sequence number of the packet they are acknowledging. The sender knows that a received ACK or NAK packet (whether garbled or not) was generated in response to its most recently transmitted data packet. Figures 3.11 and 3.12 show the FSM description for rdt2.1 , our fixed version of rdt2.0 . The rdt2.1 sender and receiver FSMs each now have twice as many states as before. This is because the protocol state must now reflect whether the packet currently being sent (by the sender) or expected (at the receiver) should have a sequence number of 0 or 1. Note that the actions in those states where a 0-numbered packet is being sent or expected are mirror images of those where a 1-numbered packet is being sent or expected; the only differences have to do with the handling of the sequence number. Protocol rdt2.1 uses both positive and negative acknowledgments from the receiver to the sender. When an out-of-order packet is received, the receiver sends a positive acknowledgment for the packet it has received.
Figure 3.11 rdt2.1 sender

Figure 3.12 rdt2.1 receiver

When a corrupted packet is received, the receiver sends a negative acknowledgment. We can accomplish the same effect as a NAK if, instead of sending a NAK, we send an ACK for the last correctly received packet. A sender that receives two ACKs for the same packet (that is, receives duplicate ACKs) knows that the receiver did not correctly receive the packet following the packet that is being ACKed twice. Our NAK-free reliable data transfer protocol for a channel with bit errors is rdt2.2 , shown in Figures 3.13 and 3.14. One subtle change between rdt2.1 and rdt2.2 is that the receiver must now include the sequence number of the packet being acknowledged by an ACK message (this is done by including the ACK, 0 or ACK, 1 argument in make_pkt() in the receiver FSM), and the sender must now check the sequence number of the packet being acknowledged by a received ACK message (this is done by including the 0 or 1 argument in isACK() in the sender FSM). Reliable Data Transfer over a Lossy Channel with Bit Errors: rdt3.0 Suppose now that in addition to corrupting bits, the underlying channel can lose packets as well, a not-uncommon event in today's computer networks (including the Internet). Two additional concerns must now be addressed by the protocol: how to detect packet loss and what to do when packet loss occurs. The use of checksumming, sequence numbers, ACK packets, and retransmissions---the techniques already developed in rdt2.2 ---will allow us to answer the latter concern. Handling the first concern will require adding a new protocol mechanism.

Figure 3.13 rdt2.2 sender

There are many possible approaches toward dealing with packet loss (several more of which are explored in the exercises at the end of the chapter). Here, we'll put the burden of detecting and recovering from lost packets on the sender.
Suppose that the sender transmits a data packet and either that packet, or the receiver's ACK of that packet, gets lost. In either case, no reply is forthcoming at the sender from the receiver. If the sender is willing to wait long enough so that it is certain that a packet has been lost, it can simply retransmit the data packet. You should convince yourself that this protocol does indeed work. But how long must the sender wait to be certain that something has been lost? The sender must clearly wait at least as long as a round-trip delay between the sender and receiver (which may include buffering at intermediate routers) plus whatever amount of time is needed to process a packet at the receiver. In many networks, this worst-case maximum delay is very difficult even to estimate, much less know with certainty. Moreover, the protocol should ideally recover from packet loss as soon as possible; waiting for a worst-case delay could mean a long wait until error recovery is initiated.

Figure 3.14 rdt2.2 receiver

The approach thus adopted in practice is for the sender to judiciously choose a time value such that packet loss is likely, although not guaranteed, to have happened. If an ACK is not received within this time, the packet is retransmitted. Note that if a packet experiences a particularly large delay, the sender may retransmit the packet even though neither the data packet nor its ACK have been lost. This introduces the possibility of duplicate data packets in the sender-to-receiver channel. Happily, protocol rdt2.2 already has enough functionality (that is, sequence numbers) to handle the case of duplicate packets. From the sender's viewpoint, retransmission is a panacea. The sender does not know whether a data packet was lost, an ACK was lost, or if the packet or ACK was simply overly delayed. In all cases, the action is the same: retransmit.
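The sender's uniform rule, send, start a timer, and retransmit on timeout, can be sketched as follows. This is an illustrative sketch only, not TCP's actual mechanism; the toy channel below simply pretends that the first ACK was lost, and the TIMEOUT value and helper names are assumptions for the example.

```python
import time

TIMEOUT = 0.5  # a judiciously chosen value; loss is likely, not guaranteed

def send_with_retransmit(udt_send, wait_for_ack, data, seq):
    """Stop-and-wait sender: transmit, start a countdown timer, and
    retransmit on timeout; the same action is taken whether the data
    packet, its ACK, or neither was actually lost."""
    while True:
        udt_send({"seq": seq, "data": data})
        deadline = time.monotonic() + TIMEOUT   # (1) start the timer
        ack = wait_for_ack(deadline)            # returns None on timeout
        if ack is not None and ack.get("seq") == seq:
            return                              # (3) stop the timer; done
        # (2) timer interrupt (or garbled ACK): loop around and retransmit

# A toy channel that loses the first ACK, for illustration
sent = []
def udt_send(pkt):
    sent.append(pkt)

def wait_for_ack(deadline):  # ignores the deadline; simulates one lost ACK
    return None if len(sent) < 2 else {"seq": sent[-1]["seq"]}

send_with_retransmit(udt_send, wait_for_ack, "data0", seq=0)
print(len(sent))  # -> 2: the original transmission plus one retransmission
```

Note that the retransmission is a duplicate of the original packet; it is the 1-bit sequence number, not the timer, that lets the receiver recognize it as such.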
Implementing a time-based retransmission mechanism requires a countdown timer that can interrupt the sender after a given amount of time has expired. The sender will thus need to be able to (1) start the timer each time a packet (either a first-time packet or a retransmission) is sent, (2) respond to a timer interrupt (taking appropriate actions), and (3) stop the timer. Figure 3.15 shows the sender FSM for rdt3.0 , a protocol that reliably transfers data over a channel that can corrupt or lose packets; in the homework problems, you'll be asked to provide the receiver FSM for rdt3.0 .

Figure 3.15 rdt3.0 sender

Figure 3.16 shows how the protocol operates with no lost or delayed packets and how it handles lost data packets. In Figure 3.16, time moves forward from the top of the diagram toward the bottom of the diagram; note that a receive time for a packet is necessarily later than the send time for a packet as a result of transmission and propagation delays. In Figures 3.16(b)--(d), the send-side brackets indicate the times at which a timer is set and later times out. Several of the more subtle aspects of this protocol are explored in the exercises at the end of this chapter. Because packet sequence numbers alternate between 0 and 1, protocol rdt3.0 is sometimes known as the alternating-bit protocol. We have now assembled the key elements of a data transfer protocol. Checksums, sequence numbers, timers, and positive and negative acknowledgment packets each play a crucial and necessary role in the operation of the protocol. We now have a working reliable data transfer protocol!

3.4.2 Pipelined Reliable Data Transfer Protocols Protocol rdt3.0 is a functionally correct protocol, but it is unlikely that anyone would be happy with its performance, particularly in today's high-speed networks.
At the heart of rdt3.0 's performance problem is the fact that it is a stop-and-wait protocol.

Figure 3.16 Operation of rdt3.0 , the alternating-bit protocol

Figure 3.17 Stop-and-wait versus pipelined protocol

To appreciate the performance impact of this stop-and-wait behavior, consider an idealized case of two hosts, one located on the West Coast of the United States and the other located on the East Coast, as shown in Figure 3.17. The speed-of-light round-trip propagation delay between these two end systems, RTT, is approximately 30 milliseconds. Suppose that they are connected by a channel with a transmission rate, R, of 1 Gbps (10^9 bits per second). With a packet size, L, of 1,000 bytes (8,000 bits) per packet, including both header fields and data, the time needed to actually transmit the packet into the 1 Gbps link is

    dtrans = L/R = (8,000 bits/packet) / (10^9 bits/sec) = 8 microseconds

Figure 3.18(a) shows that with our stop-and-wait protocol, if the sender begins sending the packet at t = 0, then at t = L/R = 8 microseconds, the last bit enters the channel at the sender side. The packet then makes its 15-msec cross-country journey, with the last bit of the packet emerging at the receiver at t = RTT/2 + L/R = 15.008 msec. Assuming for simplicity that ACK packets are extremely small (so that we can ignore their transmission time) and that the receiver can send an ACK as soon as the last bit of a data packet is received, the ACK emerges back at the sender at t = RTT + L/R = 30.008 msec. At this point, the sender can now transmit the next message. Thus, in 30.008 msec, the sender was sending for only 0.008 msec.
If we define the utilization of the sender (or the channel) as the fraction of time the sender is actually busy sending bits into the channel, the analysis in Figure 3.18(a) shows that the stop-and-wait protocol has a rather dismal sender utilization, U_sender, of

U_sender = (L/R) / (RTT + L/R) = 0.008/30.008 = 0.00027

Figure 3.18 Stop-and-wait and pipelined sending

That is, the sender was busy only 2.7 hundredths of one percent of the time! Viewed another way, the sender was able to send only 1,000 bytes in 30.008 milliseconds, an effective throughput of only 267 kbps---even though a 1 Gbps link was available! Imagine the unhappy network manager who just paid a fortune for a gigabit capacity link but manages to get a throughput of only 267 kilobits per second! This is a graphic example of how network protocols can limit the capabilities provided by the underlying network hardware. Also, we have neglected lower-layer protocol-processing times at the sender and receiver, as well as the processing and queuing delays that would occur at any intermediate routers between the sender and receiver. Including these effects would serve only to further increase the delay and further accentuate the poor performance. The solution to this particular performance problem is simple: Rather than operate in a stop-and-wait manner, the sender is allowed to send multiple packets without waiting for acknowledgments, as illustrated in Figure 3.17(b). Figure 3.18(b) shows that if the sender is allowed to transmit three packets before having to wait for acknowledgments, the utilization of the sender is essentially tripled. Since the many in-transit sender-to-receiver packets can be visualized as filling a pipeline, this technique is known as pipelining.
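The arithmetic above can be reproduced directly. All of the values in this sketch (L, R, RTT) come from the text's example; nothing new is assumed.

```python
# Reproduces the stop-and-wait analysis of Figure 3.18(a):
# L = 8,000 bits, R = 10**9 bits/sec, RTT = 30 msec.
L = 8000            # packet size, bits
R = 10**9           # link transmission rate, bits/sec
RTT = 0.030         # round-trip propagation delay, seconds

d_trans = L / R                                # transmission delay: 8 microseconds
u_stop_and_wait = d_trans / (RTT + d_trans)    # fraction of time the sender is busy
throughput_bps = L / (RTT + d_trans)           # effective throughput, bits/sec

# Allowing three packets in flight (Figure 3.18(b)) triples the utilization.
u_three_pipelined = 3 * d_trans / (RTT + d_trans)
```

Evaluating these expressions gives the numbers quoted in the text: a utilization of about 0.00027 and an effective throughput of roughly 267 kbps on the 1 Gbps link.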
Pipelining has the following consequences for reliable data transfer protocols: The range of sequence numbers must be increased, since each in-transit packet (not counting retransmissions) must have a unique sequence number and there may be multiple, in-transit, unacknowledged packets. The sender and receiver sides of the protocols may have to buffer more than one packet. Minimally, the sender will have to buffer packets that have been transmitted but not yet acknowledged. Buffering of correctly received packets may also be needed at the receiver, as discussed below. The range of sequence numbers needed and the buffering requirements will depend on the manner in which a data transfer protocol responds to lost, corrupted, and overly delayed packets. Two basic approaches toward pipelined error recovery can be identified: Go-Back-N and selective repeat.

3.4.3 Go-Back-N (GBN) In a Go-Back-N (GBN) protocol, the sender is allowed to transmit multiple packets (when available) without waiting for an acknowledgment, but is constrained to have no more than some maximum allowable number, N, of unacknowledged packets in the pipeline. We describe the GBN protocol in some detail in this section. But before reading on, you are encouraged to play with the GBN applet (an awesome applet!) at the companion Web site. Figure 3.19 shows the sender's view of the range of sequence numbers in a GBN protocol.

Figure 3.19 Sender's view of sequence numbers in Go-Back-N

If we define base to be the sequence number of the oldest unacknowledged packet and nextseqnum to be the smallest unused sequence number (that is, the sequence number of the next packet to be sent), then four intervals in the range of sequence numbers can be identified. Sequence numbers in the interval [0, base-1] correspond to packets that have already been transmitted and acknowledged.
The interval [base, nextseqnum-1] corresponds to packets that have been sent but not yet acknowledged. Sequence numbers in the interval [nextseqnum, base+N-1] can be used for packets that can be sent immediately, should data arrive from the upper layer. Finally, sequence numbers greater than or equal to base+N cannot be used until an unacknowledged packet currently in the pipeline (specifically, the packet with sequence number base) has been acknowledged. As suggested by Figure 3.19, the range of permissible sequence numbers for transmitted but not yet acknowledged packets can be viewed as a window of size N over the range of sequence numbers. As the protocol operates, this window slides forward over the sequence number space. For this reason, N is often referred to as the window size and the GBN protocol itself as a sliding-window protocol. You might be wondering why we would even limit the number of outstanding, unacknowledged packets to a value of N in the first place. Why not allow an unlimited number of such packets? We'll see in Section 3.5 that flow control is one reason to impose a limit on the sender. We'll examine another reason to do so in Section 3.7, when we study TCP congestion control. In practice, a packet's sequence number is carried in a fixed-length field in the packet header. If k is the number of bits in the packet sequence number field, the range of sequence numbers is thus [0, 2^k − 1]. With a finite range of sequence numbers, all arithmetic involving sequence numbers must then be done using modulo 2^k arithmetic. (That is, the sequence number space can be thought of as a ring of size 2^k, where sequence number 2^k − 1 is immediately followed by sequence number 0.) Recall that rdt3.0 had a 1-bit sequence number and a range of sequence numbers of [0, 1]. Several of the problems at the end of this chapter explore the consequences of a finite range of sequence numbers.
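The four intervals of Figure 3.19 can be expressed as a small classifier. This is only a sketch: `classify` and `header_seqnum` are hypothetical helper names, and the comparisons are written without wraparound for clarity; a real implementation would compare on the modulo-2^k ring described above.

```python
# Classify a sequence number into the four GBN intervals of Figure 3.19.
# Hypothetical helper; comparisons ignore wraparound for readability.
def classify(seq, base, nextseqnum, N):
    if seq < base:
        return "already acknowledged"        # [0, base-1]
    if seq < nextseqnum:
        return "sent, not yet acknowledged"  # [base, nextseqnum-1]
    if seq < base + N:
        return "usable, not yet sent"        # [nextseqnum, base+N-1]
    return "not usable"                      # >= base+N

# With a k-bit sequence number field, the number actually carried in the
# header is taken on the ring of size 2**k.
def header_seqnum(seq, k):
    return seq % (2 ** k)
```

For example, with base = 4, nextseqnum = 7, and N = 6, sequence number 2 is already acknowledged, 5 is outstanding, 8 is usable, and 10 falls outside the window.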
We will see in Section 3.5 that TCP has a 32-bit sequence number field, where TCP sequence numbers count bytes in the byte stream rather than packets. Figures 3.20 and 3.21 give an extended FSM description of the sender and receiver sides of an ACK-based, NAK-free, GBN protocol.

Figure 3.20 Extended FSM description of the GBN sender

Figure 3.21 Extended FSM description of the GBN receiver

We refer to this FSM description as an extended FSM because we have added variables (similar to programming-language variables) for base and nextseqnum, and added operations on these variables and conditional actions involving these variables. Note that the extended FSM specification is now beginning to look somewhat like a programming-language specification. [Bochman 1984] provides an excellent survey of additional extensions to FSM techniques as well as other programming-language-based techniques for specifying protocols. The GBN sender must respond to three types of events: Invocation from above. When rdt_send() is called from above, the sender first checks to see if the window is full, that is, whether there are N outstanding, unacknowledged packets. If the window is not full, a packet is created and sent, and variables are appropriately updated. If the window is full, the sender simply returns the data back to the upper layer, an implicit indication that the window is full. The upper layer would presumably then have to try again later. In a real implementation, the sender would more likely have either buffered (but not immediately sent) this data, or would have a synchronization mechanism (for example, a semaphore or a flag) that would allow the upper layer to call rdt_send() only when the window is not full. Receipt of an ACK.
In our GBN protocol, an acknowledgment for a packet with sequence number n will be taken to be a cumulative acknowledgment, indicating that all packets with a sequence number up to and including n have been correctly received at the receiver. We'll come back to this issue shortly when we examine the receiver side of GBN. A timeout event. The protocol's name, "Go-Back-N," is derived from the sender's behavior in the presence of lost or overly delayed packets. As in the stop-and-wait protocol, a timer will again be used to recover from lost data or acknowledgment packets. If a timeout occurs, the sender resends all packets that have been previously sent but that have not yet been acknowledged. Our sender in Figure 3.20 uses only a single timer, which can be thought of as a timer for the oldest transmitted but not yet acknowledged packet. If an ACK is received but there are still additional transmitted but not yet acknowledged packets, the timer is restarted. If there are no outstanding, unacknowledged packets, the timer is stopped. The receiver's actions in GBN are also simple. If a packet with sequence number n is received correctly and is in order (that is, the data last delivered to the upper layer came from a packet with sequence number n−1), the receiver sends an ACK for packet n and delivers the data portion of the packet to the upper layer. In all other cases, the receiver discards the packet and resends an ACK for the most recently received in-order packet. Note that since packets are delivered one at a time to the upper layer, if packet k has been received and delivered, then all packets with a sequence number lower than k have also been delivered. Thus, the use of cumulative acknowledgments is a natural choice for GBN. In our GBN protocol, the receiver discards out-of-order packets. Although it may seem silly and wasteful to discard a correctly received (but out-of-order) packet, there is some justification for doing so.
Recall that the receiver must deliver data in order to the upper layer. Suppose now that packet n is expected, but packet n+1 arrives. Because data must be delivered in order, the receiver could buffer (save) packet n+1 and then deliver this packet to the upper layer after it had later received and delivered packet n. However, if packet n is lost, both it and packet n+1 will eventually be retransmitted as a result of the GBN retransmission rule at the sender. Thus, the receiver can simply discard packet n+1. The advantage of this approach is the simplicity of receiver buffering---the receiver need not buffer any out-of-order packets. Thus, while the sender must maintain the upper and lower bounds of its window and the position of nextseqnum within this window, the only piece of information the receiver need maintain is the sequence number of the next in-order packet. This value is held in the variable expectedseqnum, shown in the receiver FSM in Figure 3.21. Of course, the disadvantage of throwing away a correctly received packet is that the subsequent retransmission of that packet might be lost or garbled and thus even more retransmissions would be required. Figure 3.22 shows the operation of the GBN protocol for the case of a window size of four packets. Because of this window size limitation, the sender sends packets 0 through 3 but then must wait for one or more of these packets to be acknowledged before proceeding. As each successive ACK (for example, ACK0 and ACK1) is received, the window slides forward and the sender can transmit one new packet (pkt4 and pkt5, respectively). On the receiver side, packet 2 is lost and thus packets 3, 4, and 5 are found to be out of order and are discarded. Before closing our discussion of GBN, it is worth noting that an implementation of this protocol in a protocol stack would likely have a structure similar to that of the extended FSM in Figure 3.20.
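One way such an event-driven structure might look is sketched below. The class names, the list-based channel, and the boolean timer flag are illustrative stand-ins for the FSM notation of Figures 3.20 and 3.21, not the text's actual implementation.

```python
# Event-driven sketch of the GBN sender (three events) and receiver.
class GbnSender:
    def __init__(self, N):
        self.N = N
        self.base = 0               # oldest unacknowledged sequence number
        self.nextseqnum = 0         # smallest unused sequence number
        self.unacked = {}           # seq -> packet, awaiting cumulative ACK
        self.channel = []           # stand-in for the unreliable channel
        self.timer_running = False

    def rdt_send(self, data):       # event 1: invocation from above
        if self.nextseqnum >= self.base + self.N:
            return False            # window full: refuse the data
        self.unacked[self.nextseqnum] = (self.nextseqnum, data)
        self.channel.append(self.unacked[self.nextseqnum])
        if self.base == self.nextseqnum:
            self.timer_running = True   # timer runs for the oldest unacked packet
        self.nextseqnum += 1
        return True

    def on_ack(self, n):            # event 2: cumulative ACK for all seq <= n
        for seq in [s for s in self.unacked if s <= n]:
            del self.unacked[seq]
        self.base = n + 1
        self.timer_running = self.base != self.nextseqnum

    def on_timeout(self):           # event 3: go back N -- resend every unacked packet
        self.timer_running = True
        for seq in sorted(self.unacked):
            self.channel.append(self.unacked[seq])

class GbnReceiver:
    def __init__(self):
        self.expectedseqnum = 0
        self.delivered = []

    def on_packet(self, seq, data):
        if seq == self.expectedseqnum:
            self.delivered.append(data)   # deliver in order
            self.expectedseqnum += 1
        # Discard anything else; always re-ACK the last in-order packet.
        return self.expectedseqnum - 1
```

Running the sender with N = 4 shows the window-full refusal after four sends, and a timeout after one cumulative ACK retransmits exactly the packets still outstanding.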
The implementation would also likely be in the form of various procedures that implement the actions to be taken in response to the various events that can occur. In such event-based programming, the various procedures are called (invoked) either by other procedures in the protocol stack, or as the result of an interrupt. In the sender, these events would be (1) a call from the upper-layer entity to invoke rdt_send(), (2) a timer interrupt, and (3) a call from the lower layer to invoke rdt_rcv() when a packet arrives. The programming exercises at the end of this chapter will give you a chance to actually implement these routines in a simulated, but realistic, network setting. We note here that the GBN protocol incorporates almost all of the techniques that we will encounter when we study the reliable data transfer components of TCP in Section 3.5. These techniques include the use of sequence numbers, cumulative acknowledgments, checksums, and a timeout/retransmit operation.

Figure 3.22 Go-Back-N in operation

3.4.4 Selective Repeat (SR) The GBN protocol allows the sender to potentially "fill the pipeline" in Figure 3.17 with packets, thus avoiding the channel utilization problems we noted with stop-and-wait protocols. There are, however, scenarios in which GBN itself suffers from performance problems. In particular, when the window size and bandwidth-delay product are both large, many packets can be in the pipeline. A single packet error can thus cause GBN to retransmit a large number of packets, many unnecessarily. As the probability of channel errors increases, the pipeline can become filled with these unnecessary retransmissions. Imagine, in our message-dictation scenario, that if every time a word was garbled, the surrounding 1,000 words (for example, a window size of 1,000 words) had to be repeated. The dictation would be slowed by all of the reiterated words.
As the name suggests, selective-repeat protocols avoid unnecessary retransmissions by having the sender retransmit only those packets that it suspects were received in error (that is, were lost or corrupted) at the receiver. This individual, as-needed, retransmission will require that the receiver individually acknowledge correctly received packets. A window size of N will again be used to limit the number of outstanding, unacknowledged packets in the pipeline. However, unlike GBN, the sender will have already received ACKs for some of the packets in the window. Figure 3.23 shows the SR sender's view of the sequence number space. Figure 3.24 details the various actions taken by the SR sender. The SR receiver will acknowledge a correctly received packet whether or not it is in order. Out-of-order packets are buffered until any missing packets (that is, packets with lower sequence numbers) are received, at which point a batch of packets can be delivered in order to the upper layer. Figure 3.25 itemizes the various actions taken by the SR receiver. Figure 3.26 shows an example of SR operation in the presence of lost packets. Note that in Figure 3.26, the receiver initially buffers packets 3, 4, and 5, and delivers them together with packet 2 to the upper layer when packet 2 is finally received.

Figure 3.23 Selective-repeat (SR) sender and receiver views of sequence-number space

Figure 3.24 SR sender events and actions

Figure 3.25 SR receiver events and actions

It is important to note that in Step 2 in Figure 3.25, the receiver reacknowledges (rather than ignores) already received packets with certain sequence numbers below the current window base. You should convince yourself that this reacknowledgment is indeed needed.
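The buffer-and-batch-deliver behavior just described can be sketched as a small class. `SrReceiver`, its dict-based buffer, and the integer ACK return value are illustrative choices, not the notation of Figure 3.25.

```python
# SR receiver sketch: individually ACK each correctly received packet,
# buffer out-of-order arrivals, and deliver a batch in order once the
# packet at the window base fills the gap.
class SrReceiver:
    def __init__(self, N):
        self.N = N
        self.rcv_base = 0           # smallest not-yet-delivered sequence number
        self.buffer = {}            # out-of-order packets, seq -> data
        self.delivered = []

    def on_packet(self, seq, data):
        if self.rcv_base <= seq < self.rcv_base + self.N:
            self.buffer[seq] = data
            while self.rcv_base in self.buffer:    # gap filled: deliver a batch
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base += 1
        # ACK this sequence number even when it falls below the window
        # (the reacknowledgment of Step 2 in Figure 3.25).
        return seq
```

Feeding it packets 1 and 2 followed by packet 0 reproduces the pattern of Figure 3.26: nothing is delivered while the gap at the window base remains, and the whole batch then goes up at once.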
Given the sender and receiver sequence number spaces in Figure 3.23, for example, if there is no ACK for packet send_base propagating from the receiver to the sender, the sender will eventually retransmit packet send_base, even though it is clear (to us, not the sender!) that the receiver has already received that packet. If the receiver were not to acknowledge this packet, the sender's window would never move forward!

Figure 3.26 SR operation

This example illustrates an important aspect of SR protocols (and many other protocols as well). The sender and receiver will not always have an identical view of what has been received correctly and what has not. For SR protocols, this means that the sender and receiver windows will not always coincide. The lack of synchronization between sender and receiver windows has important consequences when we are faced with the reality of a finite range of sequence numbers. Consider what could happen, for example, with a finite range of four packet sequence numbers, 0, 1, 2, 3, and a window size of three. Suppose packets 0 through 2 are transmitted and correctly received and acknowledged at the receiver. At this point, the receiver's window is over the fourth, fifth, and sixth packets, which have sequence numbers 3, 0, and 1, respectively. Now consider two scenarios. In the first scenario, shown in Figure 3.27(a), the ACKs for the first three packets are lost and the sender retransmits these packets. The receiver thus next receives a packet with sequence number 0---a copy of the first packet sent. In the second scenario, shown in Figure 3.27(b), the ACKs for the first three packets are all delivered correctly. The sender thus moves its window forward and sends the fourth, fifth, and sixth packets, with sequence numbers 3, 0, and 1, respectively. The packet with sequence number 3 is lost, but the packet with sequence number 0 arrives---a packet containing new data.
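The two scenarios can also be checked mechanically: trouble arises exactly when the receiver's old window (sequence numbers the sender might still retransmit) and its new window (sequence numbers that may carry new data) collide on the ring of size 2^k. `windows_overlap` is a hypothetical helper written for this check.

```python
# After the window advances by N, do the old window (possible
# retransmissions) and the new window (new data) share any sequence
# numbers in a 2**k sequence space?
def windows_overlap(N, k):
    space = 2 ** k
    old = {s % space for s in range(N)}          # may still be retransmitted
    new = {s % space for s in range(N, 2 * N)}   # may carry new data
    return bool(old & new)
```

With k = 2 and N = 3 (the scenario above), the old window {0, 1, 2} and the new window {3, 0, 1} collide, so the receiver cannot tell a retransmission from new data; N = 2, half the sequence space, is the largest size for which they stay disjoint.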
Now consider the receiver's viewpoint in Figure 3.27, which has a figurative curtain between the sender and the receiver, since the receiver cannot "see" the actions taken by the sender. All the receiver observes is the sequence of messages it receives from the channel and sends into the channel. As far as it is concerned, the two scenarios in Figure 3.27 are identical. There is no way of distinguishing the retransmission of the first packet from an original transmission of the fifth packet. Clearly, a window size that is 1 less than the size of the sequence number space won't work. But how small must the window size be? A problem at the end of the chapter asks you to show that the window size must be less than or equal to half the size of the sequence number space for SR protocols. At the companion Web site, you will find an applet that animates the operation of the SR protocol. Try performing the same experiments that you did with the GBN applet. Do the results agree with what you expect? This completes our discussion of reliable data transfer protocols. We've covered a lot of ground and introduced numerous mechanisms that together provide for reliable data transfer. Table 3.1 summarizes these mechanisms. Now that we have seen all of these mechanisms in operation and can see the "big picture," we encourage you to review this section again to see how these mechanisms were incrementally added to cover increasingly complex (and realistic) models of the channel connecting the sender and receiver, or to improve the performance of the protocols. Let's conclude our discussion of reliable data transfer protocols by considering one remaining assumption in our underlying channel model. Recall that we have assumed that packets cannot be reordered within the channel between the sender and receiver. This is generally a reasonable assumption when the sender and receiver are connected by a single physical wire.
However, when the "channel" connecting the two is a network, packet reordering can occur. One manifestation of packet reordering is that old copies of a packet with a sequence or acknowledgment number of x can appear, even though neither the sender's nor the receiver's window contains x. With packet reordering, the channel can be thought of as essentially buffering packets and spontaneously emitting these packets at any point in the future. Because sequence numbers may be reused, some care must be taken to guard against such duplicate packets. The approach taken in practice is to ensure that a sequence number is not reused until the sender is "sure" that any previously sent packets with sequence number x are no longer in the network. This is done by assuming that a packet cannot "live" in the network for longer than some fixed maximum amount of time. A maximum packet lifetime of approximately three minutes is assumed in the TCP extensions for high-speed networks [RFC 1323]. [Sunshine 1978] describes a method for using sequence numbers such that reordering problems can be completely avoided.

Figure 3.27 SR receiver dilemma with too-large windows: A new packet or a retransmission?

Table 3.1 Summary of reliable data transfer mechanisms and their use

| Mechanism | Use, Comments |
|---|---|
| Checksum | Used to detect bit errors in a transmitted packet. |
| Timer | Used to timeout/retransmit a packet, possibly because the packet (or its ACK) was lost within the channel. Because timeouts can occur when a packet is delayed but not lost (premature timeout), or when a packet has been received by the receiver but the receiver-to-sender ACK has been lost, duplicate copies of a packet may be received by a receiver. |
| Sequence number | Used for sequential numbering of packets of data flowing from sender to receiver. Gaps in the sequence numbers of received packets allow the receiver to detect a lost packet. Packets with duplicate sequence numbers allow the receiver to detect duplicate copies of a packet. |
| Acknowledgment | Used by the receiver to tell the sender that a packet or set of packets has been received correctly. Acknowledgments will typically carry the sequence number of the packet or packets being acknowledged. Acknowledgments may be individual or cumulative, depending on the protocol. |
| Negative acknowledgment | Used by the receiver to tell the sender that a packet has not been received correctly. Negative acknowledgments will typically carry the sequence number of the packet that was not received correctly. |
| Window, pipelining | The sender may be restricted to sending only packets with sequence numbers that fall within a given range. By allowing multiple packets to be transmitted but not yet acknowledged, sender utilization can be increased over a stop-and-wait mode of operation. We'll see shortly that the window size may be set on the basis of the receiver's ability to receive and buffer messages, or the level of congestion in the network, or both. |

3.5 Connection-Oriented Transport: TCP Now that we have covered the underlying principles of reliable data transfer, let's turn to TCP---the Internet's transport-layer, connection-oriented, reliable transport protocol. In this section, we'll see that in order to provide reliable data transfer, TCP relies on many of the underlying principles discussed in the previous section, including error detection, retransmissions, cumulative acknowledgments, timers, and header fields for sequence and acknowledgment numbers. TCP is defined in RFC 793, RFC 1122, RFC 1323, RFC 2018, and RFC 2581.
3.5.1 The TCP Connection TCP is said to be connection-oriented because before one application process can begin to send data to another, the two processes must first "handshake" with each other---that is, they must send some preliminary segments to each other to establish the parameters of the ensuing data transfer. As part of TCP connection establishment, both sides of the connection will initialize many TCP state variables (many of which will be discussed in this section and in Section 3.7) associated with the TCP connection. The TCP "connection" is not an end-to-end TDM or FDM circuit as in a circuit-switched network. Instead, the "connection" is a logical one, with common state residing only in the TCPs in the two communicating end systems. Recall that because the TCP protocol runs only in the end systems and not in the intermediate network elements (routers and link-layer switches), the intermediate network elements do not maintain TCP connection state. In fact, the intermediate routers are completely oblivious to TCP connections; they see datagrams, not connections. A TCP connection provides a full-duplex service: If there is a TCP connection between Process A on one host and Process B on another host, then application-layer data can flow from Process A to Process B at the same time as application-layer data flows from Process B to Process A. A TCP connection is also always point-to-point, that is, between a single sender and a single receiver. So-called "multicasting" (see the online supplementary materials for this text)---the transfer of data from one sender to many receivers in a single send operation---is not possible with TCP. With TCP, two hosts are company and three are a crowd! Let's now take a look at how a TCP connection is established. Suppose a process running in one host wants to initiate a connection with another process in another host.
Recall that the process that is initiating the connection is called the client process, while the other process is called the server process. The client application process first informs the client transport layer that it wants to establish a connection to a process in the server.

CASE HISTORY Vinton Cerf, Robert Kahn, and TCP/IP In the early 1970s, packet-switched networks began to proliferate, with the ARPAnet---the precursor of the Internet---being just one of many networks. Each of these networks had its own protocol. Two researchers, Vinton Cerf and Robert Kahn, recognized the importance of interconnecting these networks and invented a cross-network protocol called TCP/IP, which stands for Transmission Control Protocol/Internet Protocol. Although Cerf and Kahn began by seeing the protocol as a single entity, it was later split into its two parts, TCP and IP, which operated separately. Cerf and Kahn published a paper on TCP/IP in May 1974 in IEEE Transactions on Communications Technology [Cerf 1974]. The TCP/IP protocol, which is the bread and butter of today's Internet, was devised before PCs, workstations, smartphones, and tablets, before the proliferation of Ethernet, cable, and DSL, WiFi, and other access network technologies, and before the Web, social media, and streaming video. Cerf and Kahn saw the need for a networking protocol that, on the one hand, provides broad support for yet-to-be-defined applications and, on the other hand, allows arbitrary hosts and link-layer protocols to interoperate. In 2004, Cerf and Kahn received the ACM's Turing Award, considered the "Nobel Prize of Computing" for "pioneering work on internetworking, including the design and implementation of the Internet's basic communications protocols, TCP/IP, and for inspired leadership in networking."
Recall from Section 2.7.2 that a Python client program does this by issuing the command

clientSocket.connect((serverName, serverPort))

where serverName is the name of the server and serverPort identifies the process on the server. TCP in the client then proceeds to establish a TCP connection with TCP in the server. At the end of this section we discuss in some detail the connection-establishment procedure. For now it suffices to know that the client first sends a special TCP segment; the server responds with a second special TCP segment; and finally the client responds again with a third special segment. The first two segments carry no payload, that is, no application-layer data; the third of these segments may carry a payload. Because three segments are sent between the two hosts, this connection-establishment procedure is often referred to as a three-way handshake. Once a TCP connection is established, the two application processes can send data to each other. Let's consider the sending of data from the client process to the server process. The client process passes a stream of data through the socket (the door of the process), as described in Section 2.7. Once the data passes through the door, the data is in the hands of TCP running in the client. As shown in Figure 3.28, TCP directs this data to the connection's send buffer, which is one of the buffers that is set aside during the initial three-way handshake. From time to time, TCP will grab chunks of data from the send buffer and pass the data to the network layer. Interestingly, the TCP specification [RFC 793] is very laid back about specifying when TCP should actually send buffered data, stating that TCP should "send that data in segments at its own convenience." The maximum amount of data that can be grabbed and placed in a segment is limited by the maximum segment size (MSS).
The MSS is typically set by first determining the length of the largest link-layer frame that can be sent by the local sending host (the so-called maximum transmission unit, MTU), and then setting the MSS to ensure that a TCP segment (when encapsulated in an IP datagram) plus the TCP/IP header length (typically 40 bytes) will fit into a single link-layer frame. Both Ethernet and PPP link-layer protocols have an MTU of 1,500 bytes. Thus a typical value of MSS is 1,460 bytes. Approaches have also been proposed for discovering the path MTU---the largest link-layer frame that can be sent on all links from source to destination [RFC 1191]---and setting the MSS based on the path MTU value. Note that the MSS is the maximum amount of application-layer data in the segment, not the maximum size of the TCP segment including headers. (This terminology is confusing, but we have to live with it, as it is well entrenched.) TCP pairs each chunk of client data with a TCP header, thereby forming TCP segments. The segments are passed down to the network layer, where they are separately encapsulated within network-layer IP datagrams. The IP datagrams are then sent into the network. When TCP receives a segment at the other end, the segment's data is placed in the TCP connection's receive buffer, as shown in Figure 3.28. The application reads the stream of data from this buffer.

Figure 3.28 TCP send and receive buffers

Each side of the connection has its own send buffer and its own receive buffer. (You can see the online flow-control applet at http://www.awl.com/kurose-ross, which provides an animation of the send and receive buffers.) We see from this discussion that a TCP connection consists of buffers, variables, and a socket connection to a process in one host, and another set of buffers, variables, and a socket connection to a process in another host.
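All of this connection state sits behind the socket API. A minimal localhost sketch shows the application's view: connect() triggers the three-way handshake, accept() returns once it completes, and the data moves through the send and receive buffers invisibly. The port is chosen by the OS and the message is arbitrary.

```python
# Minimal localhost TCP exchange: the handshake and buffers are hidden
# behind connect()/accept()/sendall()/recv().
import socket
import threading

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))           # port 0: let the OS choose a free port
server.listen(1)
received = []

def serve():
    conn, _ = server.accept()           # returns after the three-way handshake
    received.append(conn.recv(1024))    # data read from the receive buffer
    conn.close()

t = threading.Thread(target=serve)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())    # three-way handshake happens here
client.sendall(b"hello")                # data enters the connection's send buffer
t.join()
client.close()
server.close()
```

Neither endpoint ever sees the SYN segments themselves; the special segments of the handshake are exchanged entirely by the two TCPs.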
As mentioned earlier, no buffers or variables are allocated to the connection in the network elements (routers, switches, and repeaters) between the hosts.

3.5.2 TCP Segment Structure Having taken a brief look at the TCP connection, let's examine the TCP segment structure. The TCP segment consists of header fields and a data field. The data field contains a chunk of application data. As mentioned above, the MSS limits the maximum size of a segment's data field. When TCP sends a large file, such as an image as part of a Web page, it typically breaks the file into chunks of size MSS (except for the last chunk, which will often be less than the MSS). Interactive applications, however, often transmit data chunks that are smaller than the MSS; for example, with remote login applications like Telnet, the data field in the TCP segment is often only one byte. Because the TCP header is typically 20 bytes (12 bytes more than the UDP header), segments sent by Telnet may be only 21 bytes in length. Figure 3.29 shows the structure of the TCP segment. As with UDP, the header includes source and destination port numbers, which are used for multiplexing/demultiplexing data from/to upper-layer applications. Also, as with UDP, the header includes a checksum field. A TCP segment header also contains the following fields: The 32-bit sequence number field and the 32-bit acknowledgment number field are used by the TCP sender and receiver in implementing a reliable data transfer service, as discussed below. The 16-bit receive window field is used for flow control. We will see shortly that it is used to indicate the number of bytes that a receiver is willing to accept. The 4-bit header length field specifies the length of the TCP header in 32-bit words. The TCP header can be of variable length due to the TCP options field. (Typically, the options field is empty, so that the length of the typical TCP header is 20 bytes.)
The optional and variable-length -options field is used when a sender and receiver negotiate the maximum -segment size (MSS) or as a window scaling factor for use in high-speed -networks. A timestamping option is also defined. See RFC 854 and RFC -1323 for additional details. The flag field contains 6 bits. The ACK bit -is used to indicate that the value carried in the acknowledgment field -is valid; that is, the segment contains an acknowledgment for a segment -that has been successfully received. The RST, - -Figure 3.29 TCP segment structure - -SYN, and FIN bits are used for connection setup and teardown, as we will -discuss at the end of this section. The CWR and ECE bits are used in -explicit congestion notification, as discussed in Section 3.7.2. Setting -the PSH bit indicates that the receiver should pass the data to the -upper layer immediately. Finally, the URG bit is used to indicate that -there is data in this segment that the sending-side upper-layer entity -has marked as "urgent." The location of the last byte of this urgent -data is indicated by the 16-bit urgent data pointer field. TCP must -inform the receiving-side upperlayer entity when urgent data exists and -pass it a pointer to the end of the urgent data. (In practice, the PSH, -URG, and the urgent data pointer are not used. However, we mention these -fields for completeness.) Our experience as teachers is that our -students sometimes find discussion of packet formats rather dry and -perhaps a bit boring. For a fun and fanciful look at TCP header fields, -particularly if you love Legos™ as we do, see \[Pomeranz 2010\]. -Sequence Numbers and Acknowledgment Numbers Two of the most important -fields in the TCP segment header are the sequence number field and the -acknowledgment number field. These fields are a critical part of TCP's -reliable data transfer service. 
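As a concrete illustration of the layout just described, the 20-byte option-less header can be packed with Python's `struct` module. This is a sketch of our own (the helper name and flag constant are ours), with the field order following Figure 3.29.

```python
# Pack a 20-byte TCP header (no options) into network byte order.
import struct

def pack_tcp_header(src_port, dst_port, seq, ack, flags, rcv_window,
                    checksum=0, urg_ptr=0):
    # With an empty options field the header is 5 32-bit words (20 bytes);
    # that length occupies the top 4 bits of the 16-bit word that also
    # carries the 6 flag bits (URG, ACK, PSH, RST, SYN, FIN).
    offset_and_flags = (5 << 12) | flags
    return struct.pack("!HHIIHHHH", src_port, dst_port, seq, ack,
                       offset_and_flags, rcv_window, checksum, urg_ptr)

ACK = 0x10  # bit position of the ACK flag in the low 6 bits
hdr = pack_tcp_header(12345, 80, seq=42, ack=79, flags=ACK, rcv_window=65535)
print(len(hdr))  # 20
```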
But before discussing how these fields -are used to provide reliable data transfer, let us first explain what -exactly TCP puts in these fields. - -Figure 3.30 Dividing file data into TCP segments - -TCP views data as an unstructured, but ordered, stream of bytes. TCP's -use of sequence numbers reflects this view in that sequence numbers are -over the stream of transmitted bytes and not over the series of -transmitted segments. The sequence number for a segment is therefore the -byte-stream number of the first byte in the segment. Let's look at an -example. Suppose that a process in Host A wants to send a stream of data -to a process in Host B over a TCP connection. The TCP in Host A will -implicitly number each byte in the data stream. Suppose that the data -stream consists of a file consisting of 500,000 bytes, that the MSS is -1,000 bytes, and that the first byte of the data stream is numbered 0. -As shown in Figure 3.30, TCP constructs 500 segments out of the data -stream. The first segment gets assigned sequence number 0, the second -segment gets assigned sequence number 1,000, the third segment gets -assigned sequence number 2,000, and so on. Each sequence number is -inserted in the sequence number field in the header of the appropriate -TCP segment. Now let's consider acknowledgment numbers. These are a -little trickier than sequence numbers. Recall that TCP is full-duplex, -so that Host A may be receiving data from Host B while it sends data to -Host B (as part of the same TCP connection). Each of the segments that -arrive from Host B has a sequence number for the data flowing from B to -A. The acknowledgment number that Host A puts in its segment is the -sequence number of the next byte Host A is expecting from Host B. It is -good to look at a few examples to understand what is going on here. -Suppose that Host A has received all bytes numbered 0 through 535 from B -and suppose that it is about to send a segment to Host B. 
Host A is -waiting for byte 536 and all the subsequent bytes in Host B's data -stream. So Host A puts 536 in the acknowledgment number field of the -segment it sends to B. As another example, suppose that Host A has -received one segment from Host B containing bytes 0 through 535 and -another segment containing bytes 900 through 1,000. For some reason Host -A has not yet received bytes 536 through 899. In this example, Host A is -still waiting for byte 536 (and beyond) in order to re-create B's data -stream. Thus, A's next segment to B will contain 536 in the -acknowledgment number field. Because TCP only acknowledges bytes up to -the first missing byte in the stream, TCP is said to provide cumulative -acknowledgments. - -This last example also brings up an important but subtle issue. Host A -received the third segment (bytes 900 through 1,000) before receiving -the second segment (bytes 536 through 899). Thus, the third segment -arrived out of order. The subtle issue is: What does a host do when it -receives out-of-order segments in a TCP connection? Interestingly, the -TCP RFCs do not impose any rules here and leave the decision up to the -programmers implementing a TCP implementation. There are basically two -choices: either (1) the receiver immediately discards out-of-order -segments (which, as we discussed earlier, can simplify receiver design), -or (2) the receiver keeps the out-of-order bytes and waits for the -missing bytes to fill in the gaps. Clearly, the latter choice is more -efficient in terms of network bandwidth, and is the approach taken in -practice. In Figure 3.30, we assumed that the initial sequence number -was zero. In truth, both sides of a TCP connection randomly choose an -initial sequence number. 
This is done to minimize the possibility that a segment that is still present in the network from an earlier, already-terminated connection between two hosts is mistaken for a valid segment in a later connection between these same two hosts (which also happen to be using the same port numbers as the old connection) \[Sunshine 1978\].

Telnet: A Case Study for Sequence and Acknowledgment Numbers

Telnet, defined in RFC 854, is a popular application-layer protocol used for remote login. It runs over TCP and is designed to work between any pair of hosts. Unlike the bulk data transfer applications discussed in Chapter 2, Telnet is an interactive application. We discuss a Telnet example here, as it nicely illustrates TCP sequence and acknowledgment numbers. We note that many users now prefer to use the SSH protocol rather than Telnet, since data sent in a Telnet connection (including passwords!) are not encrypted, making Telnet vulnerable to eavesdropping attacks (as discussed in Section 8.7). Suppose Host A initiates a Telnet session with Host B. Because Host A initiates the session, it is labeled the client, and Host B is labeled the server. Each character typed by the user (at the client) will be sent to the remote host; the remote host will send back a copy of each character, which will be displayed on the Telnet user's screen. This "echo back" is used to ensure that characters seen by the Telnet user have already been received and processed at the remote site. Each character thus traverses the network twice between the time the user hits the key and the time the character is displayed on the user's monitor. Now suppose the user types a single letter, 'C,' and then grabs a coffee. Let's examine the TCP segments that are sent between the client and server. As shown in Figure 3.31, we suppose the starting sequence numbers are 42 and 79 for the client and server, respectively.
Recall that the sequence number of a segment is the sequence number of the first byte in the data field. Thus, the first segment sent from the client will have sequence number 42; the first segment sent from the server will have sequence number 79. Recall that the acknowledgment number is the sequence number of the next byte of data that the host is waiting for. After the TCP connection is established but before any data is sent, the client is waiting for byte 79 and the server is waiting for byte 42.

Figure 3.31 Sequence and acknowledgment numbers for a simple Telnet application over TCP

As shown in Figure 3.31, three segments are sent. The first segment is sent from the client to the server, containing the 1-byte ASCII representation of the letter 'C' in its data field. This first segment also has 42 in its sequence number field, as we just described. Also, because the client has not yet received any data from the server, this first segment will have 79 in its acknowledgment number field. The second segment is sent from the server to the client. It serves a dual purpose. First it provides an acknowledgment of the data the server has received. By putting 43 in the acknowledgment field, the server is telling the client that it has successfully received everything up through byte 42 and is now waiting for bytes 43 onward. The second purpose of this segment is to echo back the letter 'C.' Thus, the second segment has the ASCII representation of 'C' in its data field. This second segment has the sequence number 79, the initial sequence number of the server-to-client data flow of this TCP connection, as this is the very first byte of data that the server is sending. Note that the acknowledgment for client-to-server data is carried in a segment carrying server-to-client data; this acknowledgment is said to be piggybacked on the server-to-client data segment.

The third segment is sent from the client to the server.
Its sole purpose is to acknowledge the data it has received from the server. (Recall that the second segment contained data---the letter 'C'---from the server to the client.) This segment has an empty data field (that is, the acknowledgment is not being piggybacked with any client-to-server data). The segment has 80 in the acknowledgment number field because the client has received the stream of bytes up through byte sequence number 79 and it is now waiting for bytes 80 onward. You might think it odd that this segment also has a sequence number since the segment contains no data. But because TCP has a sequence number field, the segment needs to have some sequence number.

3.5.3 Round-Trip Time Estimation and Timeout

TCP, like our rdt protocol in Section 3.4, uses a timeout/retransmit mechanism to recover from lost segments. Although this is conceptually simple, many subtle issues arise when we implement a timeout/retransmit mechanism in an actual protocol such as TCP. Perhaps the most obvious question is the length of the timeout intervals. Clearly, the timeout should be larger than the connection's round-trip time (RTT), that is, the time from when a segment is sent until it is acknowledged. Otherwise, unnecessary retransmissions would be sent. But how much larger? How should the RTT be estimated in the first place? Should a timer be associated with each and every unacknowledged segment? So many questions! Our discussion in this section is based on the TCP work in \[Jacobson 1988\] and the current IETF recommendations for managing TCP timers \[RFC 6298\].

Estimating the Round-Trip Time

Let's begin our study of TCP timer management by considering how TCP estimates the round-trip time between sender and receiver. This is accomplished as follows. The sample RTT, denoted SampleRTT, for a segment is the amount of time between when the segment is sent (that is, passed to IP) and when an acknowledgment for the segment is received.
Instead of measuring a SampleRTT for every transmitted segment, most TCP implementations take only one SampleRTT measurement at a time. That is, at any point in time, the SampleRTT is being estimated for only one of the transmitted but currently unacknowledged segments, leading to a new value of SampleRTT approximately once every RTT. Also, TCP never computes a SampleRTT for a segment that has been retransmitted; it only measures SampleRTT for segments that have been transmitted once \[Karn 1987\]. (A problem at the end of the chapter asks you to consider why.) Obviously, the SampleRTT values will fluctuate from segment to segment due to congestion in the routers and to the varying load on the end systems. Because of this fluctuation, any given SampleRTT value may be atypical. In order to estimate a typical RTT, it is therefore natural to take some sort of average of the SampleRTT values. TCP maintains an average, called EstimatedRTT, of the SampleRTT values. Upon obtaining a new SampleRTT, TCP updates EstimatedRTT according to the following formula:

EstimatedRTT = (1 − α) ⋅ EstimatedRTT + α ⋅ SampleRTT

The formula above is written in the form of a programming-language statement---the new value of EstimatedRTT is a weighted combination of the previous value of EstimatedRTT and the new value for SampleRTT. The recommended value of α is α = 0.125 (that is, 1/8) \[RFC 6298\], in which case the formula above becomes:

EstimatedRTT = 0.875 ⋅ EstimatedRTT + 0.125 ⋅ SampleRTT

Note that EstimatedRTT is a weighted average of the SampleRTT values. As discussed in a homework problem at the end of this chapter, this weighted average puts more weight on recent samples than on old samples. This is natural, as the more recent samples better reflect the current congestion in the network. In statistics, such an average is called an exponential weighted moving average (EWMA).
The word "exponential" appears in EWMA because the weight of a given SampleRTT decays exponentially fast as the updates proceed. In the homework problems you will be asked to derive the exponential term in EstimatedRTT. Figure 3.32 shows the SampleRTT values and EstimatedRTT for a value of α = 1/8 for a TCP connection between gaia.cs.umass.edu (in Amherst, Massachusetts) and fantasia.eurecom.fr (in the south of France). Clearly, the variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT. In addition to having an estimate of the RTT, it is also valuable to have a measure of the variability of the RTT. \[RFC 6298\] defines the RTT variation, DevRTT, as an estimate of how much SampleRTT typically deviates from EstimatedRTT:

DevRTT = (1 − β) ⋅ DevRTT + β ⋅ \|SampleRTT − EstimatedRTT\|

Note that DevRTT is an EWMA of the difference between SampleRTT and EstimatedRTT. If the SampleRTT values have little fluctuation, then DevRTT will be small; on the other hand, if there is a lot of fluctuation, DevRTT will be large. The recommended value of β is 0.25.

PRINCIPLES IN PRACTICE

TCP provides reliable data transfer by using positive acknowledgments and timers in much the same way that we studied in Section 3.4. TCP acknowledges data that has been received correctly, and it then retransmits segments when segments or their corresponding acknowledgments are thought to be lost or corrupted. Certain versions of TCP also have an implicit NAK mechanism---with TCP's fast retransmit mechanism, the receipt of three duplicate ACKs for a given segment serves as an implicit NAK for the following segment, triggering retransmission of that segment before timeout. TCP uses sequence numbers to allow the receiver to identify lost or duplicate segments. Just as in the case of our reliable data transfer protocol, rdt3.0, TCP cannot itself tell for certain if a segment, or its ACK, is lost, corrupted, or overly delayed. At the sender, TCP's response will be the same: retransmit the segment in question. TCP also uses pipelining, allowing the sender to have multiple transmitted but yet-to-be-acknowledged segments outstanding at any given time. We saw earlier that pipelining can greatly improve a session's throughput when the ratio of the segment size to round-trip delay is small. The specific number of outstanding, unacknowledged segments that a sender can have is determined by TCP's flow-control and congestion-control mechanisms. TCP flow control is discussed at the end of this section; TCP congestion control is discussed in Section 3.7. For the time being, we must simply be aware that the TCP sender uses pipelining.

Setting and Managing the Retransmission Timeout Interval

Given values of EstimatedRTT and DevRTT, what value should be used for TCP's timeout interval? Clearly, the interval should be greater than or equal to EstimatedRTT, or unnecessary retransmissions would be sent. But the timeout interval should not be too much larger than EstimatedRTT; otherwise, when a segment is lost, TCP would not quickly retransmit the segment, leading to large data transfer delays. It is therefore desirable to set the timeout equal to the EstimatedRTT plus some margin. The margin should be large when there is a lot of fluctuation in the SampleRTT values; it should be small when there is little fluctuation. The value of DevRTT should thus come into play here. All of these considerations are taken into account in TCP's method for determining the retransmission timeout interval:

TimeoutInterval = EstimatedRTT + 4 ⋅ DevRTT

An initial TimeoutInterval value of 1 second is recommended \[RFC 6298\]. Also, when a timeout occurs, the value of TimeoutInterval is doubled to avoid a premature timeout occurring for a subsequent segment that will soon be acknowledged.
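The timer arithmetic just described (the two EWMAs, the TimeoutInterval formula, and doubling on expiration) can be sketched as follows. This is an illustrative class of our own, using the RFC 6298 recommended constants; the caller is assumed to feed in SampleRTT values only from never-retransmitted segments, per Karn's rule.

```python
# Sketch of RFC 6298 retransmission-timer bookkeeping (alpha = 1/8, beta = 1/4).
class RttEstimator:
    def __init__(self):
        self.estimated_rtt = None    # no measurement yet
        self.dev_rtt = 0.0
        self.timeout_interval = 1.0  # recommended initial value, in seconds

    def on_sample(self, sample_rtt):
        """Fold in one SampleRTT measurement."""
        if self.estimated_rtt is None:
            # First measurement, per RFC 6298: SRTT = R, RTTVAR = R/2.
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # Update DevRTT first, using the old EstimatedRTT.
            self.dev_rtt = (0.75 * self.dev_rtt
                            + 0.25 * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = (0.875 * self.estimated_rtt
                                  + 0.125 * sample_rtt)
        self.timeout_interval = self.estimated_rtt + 4 * self.dev_rtt

    def on_timeout(self):
        """Back off: double the interval after a timer expiration."""
        self.timeout_interval *= 2

est = RttEstimator()
est.on_sample(0.100)                   # first SampleRTT of 100 ms
print(round(est.timeout_interval, 3))  # 0.3
```

After the first 100-ms sample the timeout is 0.1 + 4 ⋅ 0.05 = 0.3 seconds; each subsequent sample reshapes the interval, while each expiration doubles it.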
However, as soon as a segment is received and EstimatedRTT is updated, the TimeoutInterval is again computed using the formula above.

Figure 3.32 RTT samples and RTT estimates

3.5.4 Reliable Data Transfer

Recall that the Internet's network-layer service (IP service) is unreliable. IP does not guarantee datagram delivery, does not guarantee in-order delivery of datagrams, and does not guarantee the integrity of the data in the datagrams. With IP service, datagrams can overflow router buffers and never reach their destination, datagrams can arrive out of order, and bits in the datagram can get corrupted (flipped from 0 to 1 and vice versa). Because transport-layer segments are carried across the network by IP datagrams, transport-layer segments can suffer from these problems as well. TCP creates a reliable data transfer service on top of IP's unreliable best-effort service. TCP's reliable data transfer service ensures that the data stream that a process reads out of its TCP receive buffer is uncorrupted, without gaps, without duplication, and in sequence; that is, the byte stream is exactly the same byte stream that was sent by the end system on the other side of the connection. How TCP provides a reliable data transfer involves many of the principles that we studied in Section 3.4. In our earlier development of reliable data transfer techniques, it was conceptually easiest to assume that an individual timer is associated with each transmitted but not yet acknowledged segment. While this is great in theory, timer management can require considerable overhead. Thus, the recommended TCP timer management procedures \[RFC 6298\] use only a single retransmission timer, even if there are multiple transmitted but not yet acknowledged segments. The TCP protocol described in this section follows this single-timer recommendation. We will discuss how TCP provides reliable data transfer in two incremental steps.
We first present a highly simplified description of a TCP sender that uses only timeouts to recover from lost segments; we then present a more complete description that uses duplicate acknowledgments in addition to timeouts. In the ensuing discussion, we suppose that data is being sent in only one direction, from Host A to Host B, and that Host A is sending a large file. Figure 3.33 presents a highly simplified description of a TCP sender. We see that there are three major events related to data transmission and retransmission in the TCP sender: data received from application above; timer timeout; and ACK receipt.

Figure 3.33 Simplified TCP sender

Upon the occurrence of the first major event, TCP receives data from the application, encapsulates the data in a segment, and passes the segment to IP. Note that each segment includes a sequence number that is the byte-stream number of the first data byte in the segment, as described in Section 3.5.2. Also note that if the timer is not already running for some other segment, TCP starts the timer when the segment is passed to IP. (It is helpful to think of the timer as being associated with the oldest unacknowledged segment.) The expiration interval for this timer is the TimeoutInterval, which is calculated from EstimatedRTT and DevRTT, as described in Section 3.5.3. The second major event is the timeout. TCP responds to the timeout event by retransmitting the segment that caused the timeout. TCP then restarts the timer. The third major event that must be handled by the TCP sender is the arrival of an acknowledgment segment (ACK) from the receiver (more specifically, a segment containing a valid ACK field value). On the occurrence of this event, TCP compares the ACK value y with its variable SendBase. The TCP state variable SendBase is the sequence number of the oldest unacknowledged byte.
(Thus SendBase − 1 is the sequence number of the last byte that is known to have been received correctly and in order at the receiver.) As indicated earlier, TCP uses cumulative acknowledgments, so that y acknowledges the receipt of all bytes before byte number y. If y \> SendBase, then the ACK is acknowledging one or more previously unacknowledged segments. Thus the sender updates its SendBase variable; it also restarts the timer if there currently are any not-yet-acknowledged segments.

A Few Interesting Scenarios

We have just described a highly simplified version of how TCP provides reliable data transfer. But even this highly simplified version has many subtleties. To get a good feeling for how this protocol works, let's now walk through a few simple scenarios. Figure 3.34 depicts the first scenario, in which Host A sends one segment to Host B. Suppose that this segment has sequence number 92 and contains 8 bytes of data. After sending this segment, Host A waits for a segment from B with acknowledgment number 100. Although the segment from A is received at B, the acknowledgment from B to A gets lost. In this case, the timeout event occurs, and Host A retransmits the same segment. Of course, when Host B receives the retransmission, it observes from the sequence number that the segment contains data that has already been received. Thus, TCP in Host B will discard the bytes in the retransmitted segment. In a second scenario, shown in Figure 3.35, Host A sends two segments back to back. The first segment has sequence number 92 and 8 bytes of data, and the second segment has sequence number 100 and 20 bytes of data. Suppose that both segments arrive intact at B, and B sends two separate acknowledgments for each of these segments. The first of these acknowledgments has acknowledgment number 100; the second has acknowledgment number 120. Suppose now that neither of the acknowledgments arrives at Host A before the timeout.
Figure 3.34 Retransmission due to a lost acknowledgment

When the timeout event occurs, Host A resends the first segment with sequence number 92 and restarts the timer. As long as the ACK for the second segment arrives before the new timeout, the second segment will not be retransmitted.

Figure 3.35 Segment 100 not retransmitted

In a third and final scenario, suppose Host A sends the two segments, exactly as in the second example. The acknowledgment of the first segment is lost in the network, but just before the timeout event, Host A receives an acknowledgment with acknowledgment number 120. Host A therefore knows that Host B has received everything up through byte 119; so Host A does not resend either of the two segments. This scenario is illustrated in Figure 3.36.

Doubling the Timeout Interval

We now discuss a few modifications that most TCP implementations employ. The first concerns the length of the timeout interval after a timer expiration. In this modification, whenever the timeout event occurs, TCP retransmits the not-yet-acknowledged segment with the smallest sequence number, as described above. But each time TCP retransmits, it sets the next timeout interval to twice the previous value, rather than deriving it from the last EstimatedRTT and DevRTT (as described in Section 3.5.3). For example, suppose the TimeoutInterval associated with the oldest not-yet-acknowledged segment is 0.75 sec when the timer first expires. TCP will then retransmit this segment and set the new expiration time to 1.5 sec. If the timer expires again 1.5 sec later, TCP will again retransmit this segment, now setting the expiration time to 3.0 sec. Thus the intervals grow exponentially after each retransmission. However, whenever the timer is started after either of the two other events (that is, data received from application above, and ACK received), the TimeoutInterval is derived from the most recent values of EstimatedRTT and DevRTT.
This modification provides a limited form of congestion control. (More comprehensive forms of TCP congestion control will be studied in Section 3.7.) The timer expiration is most likely caused by congestion in the network, that is, too many packets arriving at one (or more) router queues in the path between the source and destination, causing packets to be dropped and/or long queuing delays. In times of congestion, if the sources continue to retransmit packets persistently, the congestion may get worse. Instead, TCP acts more politely, with each sender retransmitting after longer and longer intervals. We will see that a similar idea is used by Ethernet when we study CSMA/CD in Chapter 6.

Figure 3.36 A cumulative acknowledgment avoids retransmission of the first segment

Fast Retransmit

One of the problems with timeout-triggered retransmissions is that the timeout period can be relatively long. When a segment is lost, this long timeout period forces the sender to delay resending the lost packet, thereby increasing the end-to-end delay. Fortunately, the sender can often detect packet loss well before the timeout event occurs by noting so-called duplicate ACKs. A duplicate ACK is an ACK that reacknowledges a segment for which the sender has already received an earlier acknowledgment. To understand the sender's response to a duplicate ACK, we must look at why the receiver sends a duplicate ACK in the first place. Table 3.2 summarizes the TCP receiver's ACK generation policy \[RFC 5681\].

Table 3.2 TCP ACK Generation Recommendation \[RFC 5681\]

| Event | TCP Receiver Action |
|---|---|
| Arrival of in-order segment with expected sequence number. All data up to expected sequence number already acknowledged. | Delayed ACK. Wait up to 500 msec for arrival of another in-order segment. If next in-order segment does not arrive in this interval, send an ACK. |
| Arrival of in-order segment with expected sequence number. One other in-order segment waiting for ACK transmission. | Immediately send single cumulative ACK, ACKing both in-order segments. |
| Arrival of out-of-order segment with higher-than-expected sequence number. Gap detected. | Immediately send duplicate ACK, indicating sequence number of next expected byte (which is the lower end of the gap). |
| Arrival of segment that partially or completely fills in gap in received data. | Immediately send ACK, provided that segment starts at the lower end of gap. |

When a TCP receiver receives a segment with a sequence number that is larger than the next, expected, in-order sequence number, it detects a gap in the data stream---that is, a missing segment. This gap could be the result of lost or reordered segments within the network. Since TCP does not use negative acknowledgments, the receiver cannot send an explicit negative acknowledgment back to the sender. Instead, it simply reacknowledges (that is, generates a duplicate ACK for) the last in-order byte of data it has received. (Note that Table 3.2 allows for the case that the receiver does not discard out-of-order segments.) Because a sender often sends a large number of segments back to back, if one segment is lost, there will likely be many back-to-back duplicate ACKs. If the TCP sender receives three duplicate ACKs for the same data, it takes this as an indication that the segment following the segment that has been ACKed three times has been lost. (In the homework problems, we consider the question of why the sender waits for three duplicate ACKs, rather than just a single duplicate ACK.) In the case that three duplicate ACKs are received, the TCP sender performs a fast retransmit \[RFC 5681\], retransmitting the missing segment before that segment's timer expires. This is shown in Figure 3.37, where the second segment is lost, then retransmitted before its timer expires.
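The receiver-side policy of Table 3.2 can be sketched with a toy model of our own. This is a simplification in which the receiver buffers out-of-order data and delayed-ACK timing is omitted; the function and variable names are ours.

```python
# Toy model of the receiver's ACK generation: cumulative ACKs for
# in-order arrivals, duplicate ACKs when a gap is detected.
def ack_for_arrival(next_expected, buffered, seg_start, seg_len):
    """Process bytes [seg_start, seg_start + seg_len); return the updated
    next expected byte number and the ACK number to send."""
    if seg_start == next_expected:
        next_expected += seg_len
        # Slide over any previously buffered out-of-order data.
        while next_expected in buffered:
            next_expected += buffered.pop(next_expected)
        return next_expected, next_expected   # cumulative ACK
    if seg_start > next_expected:
        buffered[seg_start] = seg_len
        return next_expected, next_expected   # duplicate ACK: gap detected
    return next_expected, next_expected       # old data; re-ACK

# The earlier example: bytes 0-535 arrive, then 900-1,000, then 536-899.
buffered = {}
exp = 0
exp, ack1 = ack_for_arrival(exp, buffered, 0, 536)    # in order
exp, ack2 = ack_for_arrival(exp, buffered, 900, 101)  # gap: duplicate ACK
exp, ack3 = ack_for_arrival(exp, buffered, 536, 364)  # fills the gap
```

Running the three arrivals yields ACK numbers 536, 536 (the duplicate), and 1001, matching the cumulative-acknowledgment example discussed earlier.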
For TCP with fast retransmit, the following code snippet replaces the ACK received event in Figure 3.33:

```
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently any not-yet-acknowledged segments)
            start timer
    }
    else { /* a duplicate ACK for already ACKed segment */
        increment number of duplicate ACKs received for y
        if (number of duplicate ACKs received for y == 3)
            /* TCP fast retransmit */
            resend segment with sequence number y
    }
    break;
```

Figure 3.37 Fast retransmit: retransmitting the missing segment before the segment's timer expires

We noted earlier that many subtle issues arise when a timeout/retransmit mechanism is implemented in an actual protocol such as TCP. The procedures above, which have evolved as a result of more than 20 years of experience with TCP timers, should convince you that this is indeed the case!

Go-Back-N or Selective Repeat?

Let us close our study of TCP's error-recovery mechanism by considering the following question: Is TCP a GBN or an SR protocol? Recall that TCP acknowledgments are cumulative and correctly received but out-of-order segments are not individually ACKed by the receiver. Consequently, as shown in Figure 3.33 (see also Figure 3.19), the TCP sender need only maintain the smallest sequence number of a transmitted but unacknowledged byte (SendBase) and the sequence number of the next byte to be sent (NextSeqNum). In this sense, TCP looks a lot like a GBN-style protocol. But there are some striking differences between TCP and Go-Back-N. Many TCP implementations will buffer correctly received but out-of-order segments \[Stevens 1994\]. Consider also what happens when the sender sends a sequence of segments 1, 2, . . ., N, and all of the segments arrive in order without error at the receiver.
Further suppose that the acknowledgment for packet n\<N gets lost, but the remaining N−1 acknowledgments arrive at the sender before their respective timeouts. In this example, GBN would retransmit not only packet n, but also all of the subsequent packets n+1, n+2, . . ., N. TCP, on the other hand, would retransmit at most one segment, namely, segment n. Moreover, TCP would not even retransmit segment n if the acknowledgment for segment n+1 arrived before the timeout for segment n. A proposed modification to TCP, the so-called selective acknowledgment \[RFC 2018\], allows a TCP receiver to acknowledge out-of-order segments selectively rather than just cumulatively acknowledging the last correctly received, in-order segment. When combined with selective retransmission---skipping the retransmission of segments that have already been selectively acknowledged by the receiver---TCP looks a lot like our generic SR protocol. Thus, TCP's error-recovery mechanism is probably best categorized as a hybrid of GBN and SR protocols.

3.5.5 Flow Control

Recall that the hosts on each side of a TCP connection set aside a receive buffer for the connection. When the TCP connection receives bytes that are correct and in sequence, it places the data in the receive buffer. The associated application process will read data from this buffer, but not necessarily at the instant the data arrives. Indeed, the receiving application may be busy with some other task and may not even attempt to read the data until long after it has arrived. If the application is relatively slow at reading the data, the sender can very easily overflow the connection's receive buffer by sending too much data too quickly.

TCP provides a flow-control service to its applications to eliminate the possibility of the sender overflowing the receiver's buffer.
Flow -control is thus a speed-matching service---matching the rate at which -the sender is sending against the rate at which the receiving -application is reading. As noted earlier, a TCP sender can also be -throttled due to congestion within the IP network; this form of sender -control is referred to as congestion control, a topic we will explore in -detail in Sections 3.6 and 3.7. Even though the actions taken by flow -and congestion control are similar (the throttling of the sender), they -are obviously taken for very different reasons. Unfortunately, many -authors use the terms interchangeably, and the savvy reader would be -wise to distinguish between them. Let's now discuss how TCP provides its -flow-control service. In order to see the forest for the trees, we -suppose throughout this section that the TCP implementation is such that -the TCP receiver discards out-of-order segments. TCP provides flow -control by having the sender maintain a variable called the receive -window. Informally, the receive window is used to give the sender an -idea of how much free buffer space is available at the receiver. Because -TCP is full-duplex, the sender at each side of the connection maintains -a distinct receive window. Let's investigate the receive window in the -context of a file transfer. Suppose that Host A is sending a large file -to Host B over a TCP connection. Host B allocates a receive buffer to -this connection; denote its size by RcvBuffer . From time to time, the -application process in Host B reads from the buffer. 
Define the following variables:

LastByteRead: the number of the last byte in the data stream read from the buffer by the application process in B

LastByteRcvd: the number of the last byte in the data stream that has arrived from the network and has been placed in the receive buffer at B

Because TCP is not permitted to overflow the allocated buffer, we must have

LastByteRcvd−LastByteRead≤RcvBuffer

The receive window, denoted rwnd, is set to the amount of spare room in the buffer:

rwnd=RcvBuffer−[LastByteRcvd−LastByteRead]

Because the spare room changes with time, rwnd is dynamic. The variable rwnd is illustrated in Figure 3.38.

Figure 3.38 The receive window (rwnd) and the receive buffer (RcvBuffer)

How does the connection use the variable rwnd to provide the flow-control service? Host B tells Host A how much spare room it has in the connection buffer by placing its current value of rwnd in the receive window field of every segment it sends to A. Initially, Host B sets rwnd = RcvBuffer. Note that to pull this off, Host B must keep track of several connection-specific variables. Host A in turn keeps track of two variables, LastByteSent and LastByteAcked, which have obvious meanings. Note that the difference between these two variables, LastByteSent − LastByteAcked, is the amount of unacknowledged data that A has sent into the connection. By keeping the amount of unacknowledged data less than the value of rwnd, Host A is assured that it is not overflowing the receive buffer at Host B. Thus, Host A makes sure throughout the connection's life that

LastByteSent−LastByteAcked≤rwnd

There is one minor technical problem with this scheme. To see this, suppose Host B's receive buffer becomes full so that rwnd = 0. After advertising rwnd = 0 to Host A, also suppose that B has nothing to send to A. Now consider what happens.
As the application process at B empties the buffer, TCP does not send new segments with new rwnd values to Host A; indeed, TCP sends a segment to Host A only if it has data to send or if it has an acknowledgment to send. Therefore, Host A is never informed that some space has opened up in Host B's receive buffer---Host A is blocked and can transmit no more data! To solve this problem, the TCP specification requires Host A to continue to send segments with one data byte when B's receive window is zero. These segments will be acknowledged by the receiver. Eventually the buffer will begin to empty and the acknowledgments will contain a nonzero rwnd value.

The online site at http://www.awl.com/kurose-ross for this book provides an interactive Java applet that illustrates the operation of the TCP receive window. Having described TCP's flow-control service, we briefly mention here that UDP does not provide flow control and, consequently, segments may be lost at the receiver due to buffer overflow. For example, consider sending a series of UDP segments from a process on Host A to a process on Host B. For a typical UDP implementation, UDP will append the segments in a finite-sized buffer that "precedes" the corresponding socket (that is, the door to the process). The process reads one entire segment at a time from the buffer. If the process does not read the segments fast enough from the buffer, the buffer will overflow and segments will get dropped.

3.5.6 TCP Connection Management

In this subsection we take a closer look at how a TCP connection is established and torn down. Although this topic may not seem particularly thrilling, it is important because TCP connection establishment can significantly add to perceived delays (for example, when surfing the Web). Furthermore, many of the most common network attacks---including the incredibly popular SYN flood attack---exploit vulnerabilities in TCP connection management.
Let's -first take a look at how a TCP connection is established. Suppose a -process running in one host (client) wants to initiate a connection with -another process in another host (server). The client application process -first informs the client TCP that it wants to establish a connection to -a process in the server. The TCP in the client then proceeds to -establish a TCP connection with the TCP in the server in the following -manner: Step 1. The client-side TCP first sends a special TCP segment to -the server-side TCP. This special segment contains no application-layer -data. But one of the flag bits in the segment's header (see Figure -3.29), the SYN bit, is set to 1. For this reason, this special segment -is referred to as a SYN segment. In addition, the client randomly -chooses an initial sequence number ( client_isn ) and puts this number -in the sequence number field of the initial TCP SYN segment. This -segment is encapsulated within an IP datagram and sent to the server. -There has been considerable interest in properly randomizing the choice -of the client_isn in order to avoid certain security attacks \[CERT -2001--09\]. Step 2. Once the IP datagram containing the TCP SYN segment -arrives at the server host (assuming it does arrive!), the server -extracts the TCP SYN segment from the datagram, allocates the TCP -buffers and variables to the connection, and sends a connection-granted -segment to the client TCP. (We'll see in Chapter 8 that the allocation -of these buffers and variables before completing the third step of the -three-way handshake makes TCP vulnerable to a denial-of-service attack -known as SYN flooding.) This connection-granted segment also contains no -application-layer data. However, it does contain three important pieces -of information in the segment header. First, the SYN bit is set to 1. -Second, the acknowledgment field of the TCP segment header is set to - -client_isn+1 . 
Finally, the server chooses its own initial sequence number (server_isn) and puts this value in the sequence number field of the TCP segment header. This connection-granted segment is saying, in effect, "I received your SYN packet to start a connection with your initial sequence number, client_isn. I agree to establish this connection. My own initial sequence number is server_isn." The connection-granted segment is referred to as a SYNACK segment. Step 3. Upon receiving the SYNACK segment, the client also allocates buffers and variables to the connection. The client host then sends the server yet another segment; this last segment acknowledges the server's connection-granted segment (the client does so by putting the value server_isn+1 in the acknowledgment field of the TCP segment header). The SYN bit is set to zero, since the connection is established. This third stage of the three-way handshake may carry client-to-server data in the segment payload. Once these three steps have been completed, the client and server hosts can send segments containing data to each other. In each of these future segments, the SYN bit will be set to zero. Note that in order to establish the connection, three packets are sent between the two hosts, as illustrated in Figure 3.39. For this reason, this connection-establishment procedure is often referred to as a three-way handshake. Several aspects of the TCP three-way handshake are explored in the homework problems (Why are initial sequence numbers needed? Why is a three-way handshake, as opposed to a two-way handshake, needed?). It's interesting to note that a rock climber and a belayer (who is stationed below the rock climber and whose job it is to handle the climber's safety rope) use a three-way-handshake communication protocol that is identical to TCP's to ensure that both sides are ready before the climber begins the ascent. All good things must come to an end, and the same is true with a TCP connection.
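The three steps just described can be sketched as a toy exchange. The dictionaries below stand in for TCP headers and carry only the fields discussed here (SYN flag, sequence number, acknowledgment number); this is an illustration, not a packet library:

```python
import random

def three_way_handshake():
    """Toy model of the segment exchange of Figure 3.39."""
    client_isn = random.randrange(2**32)     # Step 1: client picks an ISN, sends SYN
    syn = {"SYN": 1, "seq": client_isn}

    server_isn = random.randrange(2**32)     # Step 2: server grants the connection
    synack = {"SYN": 1, "seq": server_isn, "ack": syn["seq"] + 1}

    # Step 3: client acknowledges the SYNACK; the SYN bit is now 0.
    ack = {"SYN": 0, "seq": client_isn + 1, "ack": synack["seq"] + 1}
    return syn, synack, ack
```

However the random initial sequence numbers come out, the SYNACK always acknowledges `client_isn + 1`, the final ACK always acknowledges `server_isn + 1`, and only the first two segments carry a set SYN bit.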
Either of the two processes participating in a TCP connection can end the connection. When a connection ends, the "resources" (that is, the buffers and variables) in the hosts are deallocated. As an example, suppose the client decides to close the connection, as shown in Figure 3.40. The client application process issues a close command. This causes the client TCP to send a special TCP segment to the server process. This special segment has a flag bit in the segment's header, the FIN bit (see Figure 3.29), set to 1. When the server receives this segment, it sends the client an acknowledgment segment in return. The server then sends its own shutdown segment, which has the FIN bit set to 1. Finally, the client acknowledges the server's shutdown segment. At this point, all the resources in the two hosts are now deallocated.

Figure 3.39 TCP three-way handshake: segment exchange

Figure 3.40 Closing a TCP connection

During the life of a TCP connection, the TCP protocol running in each host makes transitions through various TCP states. Figure 3.41 illustrates a typical sequence of TCP states that are visited by the client TCP. The client TCP begins in the CLOSED state. The application on the client side initiates a new TCP connection (by creating a socket, as in the Python examples from Chapter 2). This causes TCP in the client to send a SYN segment to TCP in the server. After having sent the SYN segment, the client TCP enters the SYN_SENT state. While in the SYN_SENT state, the client TCP waits for a segment from the server TCP that includes an acknowledgment for the client's previous segment and has the SYN bit set to 1. Having received such a segment, the client TCP enters the ESTABLISHED state.

Figure 3.41 A typical sequence of TCP states visited by a client TCP

While in the ESTABLISHED state, the TCP client can send and receive TCP segments containing payload (that is, application-generated) data.
Suppose that the client application decides it wants to close the connection. (Note that the server could also choose to close the connection.) This causes the client TCP to send a TCP segment with the FIN bit set to 1 and to enter the FIN_WAIT_1 state. While in the FIN_WAIT_1 state, the client TCP waits for a TCP segment from the server with an acknowledgment. When it receives this segment, the client TCP enters the FIN_WAIT_2 state. While in the FIN_WAIT_2 state, the client waits for another segment from the server with the FIN bit set to 1; after receiving this segment, the client TCP acknowledges the server's segment and enters the TIME_WAIT state. The TIME_WAIT state lets the TCP client resend the final acknowledgment in case the ACK is lost. The time spent in the TIME_WAIT state is implementation-dependent, but typical values are 30 seconds, 1 minute, and 2 minutes. After the wait, the connection formally closes and all resources on the client side (including port numbers) are released. Figure 3.42 illustrates the series of states typically visited by the server-side TCP, assuming the client begins connection teardown. The transitions are self-explanatory.

Figure 3.42 A typical sequence of TCP states visited by a server-side TCP

In these two state-transition diagrams, we have only shown how a TCP connection is normally established and shut down. We have not described what happens in certain pathological scenarios, for example, when both sides of a connection want to initiate or shut down at the same time. If you are interested in learning about this and other advanced issues concerning TCP, you are encouraged to see Stevens' comprehensive book [Stevens 1994]. Our discussion above has assumed that both the client and server are prepared to communicate, i.e., that the server is listening on the port to which the client sends its SYN segment.
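The client-side state sequence just described (Figure 3.41) can be captured compactly as a transition table. The event labels below are informal summaries of ours, not protocol messages:

```python
# Client-side states of Figure 3.41 as a transition table.
CLIENT_TRANSITIONS = {
    ("CLOSED", "open: send SYN"): "SYN_SENT",
    ("SYN_SENT", "receive SYNACK, send ACK"): "ESTABLISHED",
    ("ESTABLISHED", "close: send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "receive ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "receive FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "timeout (e.g., 30 seconds)"): "CLOSED",
}

def run_client(events, state="CLOSED"):
    """Apply a sequence of events; an illegal transition raises KeyError."""
    for event in events:
        state = CLIENT_TRANSITIONS[(state, event)]
    return state
```

Driving the table with the full six-event sequence walks the client from CLOSED through ESTABLISHED and the two FIN_WAIT states back to CLOSED; any out-of-order event is rejected, which is the point of modeling the protocol as a state machine.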
Let's consider what happens when a host receives a TCP segment whose port numbers or source IP address do not match with any of the ongoing sockets in the host. For example, suppose a host receives a TCP SYN packet with destination port 80, but the host is not accepting connections on port 80 (that is, it is not running a Web server on port 80). Then the host will send a special reset segment to the source. This TCP segment has the RST flag bit (see Section 3.5.2) set to 1. Thus, when a host sends a reset segment, it is telling the source "I don't have a socket for that segment. Please do not resend the segment." When a host receives a UDP packet whose destination port number doesn't match with an ongoing UDP socket, the host sends a special ICMP datagram, as discussed in Chapter 5. Now that we have a good understanding of TCP connection management, let's revisit the nmap port-scanning tool and examine more closely how it works. To explore a specific TCP port, say port 6789, on a target host, nmap will send a TCP SYN segment with destination port 6789 to that host. There are three possible outcomes: The source host receives a TCP SYNACK segment from the target host. Since this means that an application is running with TCP port 6789 on the target host, nmap returns "open."

FOCUS ON SECURITY: The SYN Flood Attack

We've seen in our discussion of TCP's three-way handshake that a server allocates and initializes connection variables and buffers in response to a received SYN. The server then sends a SYNACK in response, and awaits an ACK segment from the client. If the client does not send an ACK to complete the third step of this three-way handshake, eventually (often after a minute or more) the server will terminate the half-open connection and reclaim the allocated resources. This TCP connection management protocol sets the stage for a classic Denial of Service (DoS) attack known as the SYN flood attack.
In this attack, the attacker(s) send a large number of TCP SYN segments, without completing the third handshake step. With this deluge of SYN segments, the server's connection resources become exhausted as they are allocated (but never used!) for half-open connections; legitimate clients are then denied service. Such SYN flooding attacks were among the first documented DoS attacks [CERT SYN 1996]. Fortunately, an effective defense known as SYN cookies [RFC 4987] is now deployed in most major operating systems. SYN cookies work as follows: When the server receives a SYN segment, it does not know if the segment is coming from a legitimate user or is part of a SYN flood attack. So, instead of creating a half-open TCP connection for this SYN, the server creates an initial TCP sequence number that is a complicated function (hash function) of the source and destination IP addresses and port numbers of the SYN segment, as well as a secret number known only to the server. This carefully crafted initial sequence number is the so-called "cookie." The server then sends the client a SYNACK packet with this special initial sequence number. Importantly, the server does not remember the cookie or any other state information corresponding to the SYN. A legitimate client will return an ACK segment. When the server receives this ACK, it must verify that the ACK corresponds to some SYN sent earlier. But how is this done if the server maintains no memory about SYN segments? As you may have guessed, it is done with the cookie. Recall that for a legitimate ACK, the value in the acknowledgment field is equal to the initial sequence number in the SYNACK (the cookie value in this case) plus one (see Figure 3.39). The server can then run the same hash function using the source and destination IP addresses and port numbers in the arriving ACK (which are the same as in the original SYN) and the secret number.
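A sketch of the cookie scheme, under assumptions of our own choosing (HMAC-SHA-256 as the hash, a string encoding of the address/port 4-tuple, and an invented secret); real operating systems use their own constructions:

```python
import hashlib
import hmac

SECRET = b"known-only-to-the-server"   # assumed secret value

def syn_cookie(src_ip, src_port, dst_ip, dst_port):
    """Derive a 32-bit initial sequence number from the 4-tuple and the secret."""
    msg = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def ack_is_valid(src_ip, src_port, dst_ip, dst_port, ack_field):
    """A legitimate ACK acknowledges cookie + 1; no per-SYN state is stored."""
    expected = (syn_cookie(src_ip, src_port, dst_ip, dst_port) + 1) % 2**32
    return ack_field == expected
```

Note that `syn_cookie` is deterministic given the 4-tuple and the secret, so the server can recompute it on the arriving ACK without remembering anything about the SYN; that statelessness is the whole defense.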
If the result of the function plus one is the same as the acknowledgment (cookie) value in the client's ACK, the server concludes that the ACK corresponds to an earlier SYN segment and is hence valid. The server then creates a fully open connection along with a socket. On the other hand, if the client does not return an ACK segment, then the original SYN has done no harm at the server, since the server hasn't yet allocated any resources in response to the original bogus SYN.

Returning to the remaining two nmap outcomes: The source host receives a TCP RST segment from the target host. This means that the SYN segment reached the target host, but the target host is not running an application with TCP port 6789. But the attacker at least knows that the segments destined to the host at port 6789 are not blocked by any firewall on the path between source and target hosts. (Firewalls are discussed in Chapter 8.) The source receives nothing. This likely means that the SYN segment was blocked by an intervening firewall and never reached the target host. Nmap is a powerful tool that can "case the joint" not only for open TCP ports, but also for open UDP ports, for firewalls and their configurations, and even for the versions of applications and operating systems. Most of this is done by manipulating TCP connection-management segments [Skoudis 2006]. You can download nmap from www.nmap.org. This completes our introduction to error control and flow control in TCP. In Section 3.7 we'll return to TCP and look at TCP congestion control in some depth. Before doing so, however, we first step back and examine congestion-control issues in a broader context.

3.6 Principles of Congestion Control

In the previous sections, we examined both the general principles and specific TCP mechanisms used to provide for a reliable data transfer service in the face of packet loss.
We mentioned earlier that, in practice, such loss typically results from the overflowing of router buffers as the network becomes congested. Packet retransmission thus treats a symptom of network congestion (the loss of a specific transport-layer segment) but does not treat the cause of network congestion---too many sources attempting to send data at too high a rate. To treat the cause of network congestion, mechanisms are needed to throttle senders in the face of network congestion. In this section, we consider the problem of congestion control in a general context, seeking to understand why congestion is a bad thing, how network congestion is manifested in the performance received by upper-layer applications, and various approaches that can be taken to avoid, or react to, network congestion. This more general study of congestion control is appropriate since, as with reliable data transfer, it is high on our "top-ten" list of fundamentally important problems in networking. The following section contains a detailed study of TCP's congestion-control algorithm.

3.6.1 The Causes and the Costs of Congestion

Let's begin our general study of congestion control by examining three increasingly complex scenarios in which congestion occurs. In each case, we'll look at why congestion occurs in the first place and at the cost of congestion (in terms of resources not fully utilized and poor performance received by the end systems). We'll not (yet) focus on how to react to, or avoid, congestion but rather focus on the simpler issue of understanding what happens as hosts increase their transmission rate and the network becomes congested.

Scenario 1: Two Senders, a Router with Infinite Buffers

We begin by considering perhaps the simplest congestion scenario possible: Two hosts (A and B) each have a connection that shares a single hop between source and destination, as shown in Figure 3.43.
Let's assume that the application in Host A is sending data into the connection (for example, passing data to the transport-level protocol via a socket) at an average rate of λin bytes/sec. These data are original in the sense that each unit of data is sent into the socket only once. The underlying transport-level protocol is a simple one. Data is encapsulated and sent; no error recovery (for example, retransmission), flow control, or congestion control is performed. Ignoring the additional overhead due to adding transport- and lower-layer header information, the rate at which Host A offers traffic to the router in this first scenario is thus λin bytes/sec. Host B operates in a similar manner, and we assume for simplicity that it too is sending at a rate of λin bytes/sec. Packets from Hosts A and B pass through a router and over a shared outgoing link of capacity R. The router has buffers that allow it to store incoming packets when the packet-arrival rate exceeds the outgoing link's capacity. In this first scenario, we assume that the router has an infinite amount of buffer space.

Figure 3.43 Congestion scenario 1: Two connections sharing a single hop with infinite buffers

Figure 3.44 Congestion scenario 1: Throughput and delay as a function of host sending rate

Figure 3.44 plots the performance of Host A's connection under this first scenario. The left graph plots the per-connection throughput (number of bytes per second at the receiver) as a function of the connection-sending rate. For a sending rate between 0 and R/2, the throughput at the receiver equals the sender's sending rate---everything sent by the sender is received at the receiver with a finite delay. When the sending rate is above R/2, however, the throughput is only R/2. This upper limit on throughput is a consequence of the sharing of link capacity between two connections.
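In code, the delivered rate just described is simply the sending rate capped at half the link capacity (a direct restatement of the left graph of Figure 3.44, with function and parameter names of our own):

```python
def per_connection_throughput(sending_rate, link_capacity):
    """Delivered rate for one of two identical connections sharing a
    link with infinite buffering: the sending rate, capped at capacity/2."""
    return min(sending_rate, link_capacity / 2)
```

For example, with R normalized to 1.0, a connection sending at 0.3 gets 0.3 through, while a connection sending at 0.8 still gets only 0.5.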
The link simply cannot deliver packets to a receiver at a steady-state rate that exceeds R/2. No matter how high Hosts A and B set their sending rates, they will each never see a throughput higher than R/2. Achieving a per-connection throughput of R/2 might actually appear to be a good thing, because the link is fully utilized in delivering packets to their destinations. The right-hand graph in Figure 3.44, however, shows the consequence of operating near link capacity. As the sending rate approaches R/2 (from the left), the average delay becomes larger and larger. When the sending rate exceeds R/2, the average number of queued packets in the router is unbounded, and the average delay between source and destination becomes infinite (assuming that the connections operate at these sending rates for an infinite period of time and there is an infinite amount of buffering available). Thus, while operating at an aggregate throughput of near R may be ideal from a throughput standpoint, it is far from ideal from a delay standpoint. Even in this (extremely) idealized scenario, we've already found one cost of a congested network---large queuing delays are experienced as the packet-arrival rate nears the link capacity.

Scenario 2: Two Senders and a Router with Finite Buffers

Let's now slightly modify scenario 1 in the following two ways (see Figure 3.45). First, the amount of router buffering is assumed to be finite. A consequence of this real-world assumption is that packets will be dropped when arriving to an already-full buffer. Second, we assume that each connection is reliable. If a packet containing a transport-level segment is dropped at the router, the sender will eventually retransmit it.

Figure 3.45 Scenario 2: Two hosts (with retransmissions) and a router with finite buffers

Because packets can be retransmitted, we must now be more careful with our use of the term sending rate.
Specifically, let us again denote the rate at which the application sends original data into the socket by λin bytes/sec. The rate at which the transport layer sends segments (containing original data and retransmitted data) into the network will be denoted λ′in bytes/sec. λ′in is sometimes referred to as the offered load to the network. The performance realized under scenario 2 will now depend strongly on how retransmission is performed. First, consider the unrealistic case that Host A is able to somehow (magically!) determine whether or not a buffer is free in the router and thus sends a packet only when a buffer is free. In this case, no loss would occur, λin would be equal to λ′in, and the throughput of the connection would be equal to λin. This case is shown in Figure 3.46(a). From a throughput standpoint, performance is ideal---everything that is sent is received. Note that the average host sending rate cannot exceed R/2 under this scenario, since packet loss is assumed never to occur. Consider next the slightly more realistic case that the sender retransmits only when a packet is known for certain to be lost. (Again, this assumption is a bit of a stretch. However, it is possible that the sending host might set its timeout large enough to be virtually assured that a packet that has not been acknowledged has been lost.) In this case, the performance might look something like that shown in Figure 3.46(b).

Figure 3.46 Scenario 2 performance with finite buffers

To appreciate what is happening here, consider the case that the offered load, λ′in (the rate of original data transmission plus retransmissions), equals R/2. According to Figure 3.46(b), at this value of the offered load, the rate at which data are delivered to the receiver application is R/3. Thus, out of the 0.5R units of data transmitted, 0.333R bytes/sec (on average) are original data and 0.166R bytes/sec (on average) are retransmitted data.
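The arithmetic can be checked directly: with offered load λ′in = R/2 and a delivery rate of R/3, the retransmitted share is simply the difference:

```python
R = 1.0                        # link capacity, normalized
offered_load = R / 2           # lambda'_in: original data plus retransmissions
delivered = R / 3              # rate reaching the receiver application, Figure 3.46(b)

original = delivered                       # each delivered byte is original data
retransmitted = offered_load - delivered   # = R/6, the text's 0.166R (rounded)
```

So of the 0.5R offered, R/3 ≈ 0.333R is original data and R/6 ≈ 0.167R is retransmission overhead, matching the figures quoted above.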
We see here another cost of a congested network---the sender must perform retransmissions in order to compensate for dropped (lost) packets due to buffer overflow. Finally, let us consider the case that the sender may time out prematurely and retransmit a packet that has been delayed in the queue but not yet lost. In this case, both the original data packet and the retransmission may reach the receiver. Of course, the receiver needs but one copy of this packet and will discard the retransmission. In this case, the work done by the router in forwarding the retransmitted copy of the original packet was wasted, as the receiver will have already received the original copy of this packet. The router would have better used the link transmission capacity to send a different packet instead. Here then is yet another cost of a congested network---unneeded retransmissions by the sender in the face of large delays may cause a router to use its link bandwidth to forward unneeded copies of a packet. Figure 3.46(c) shows the throughput versus offered load when each packet is assumed to be forwarded (on average) twice by the router. Since each packet is forwarded twice, the throughput will have an asymptotic value of R/4 as the offered load approaches R/2.

Scenario 3: Four Senders, Routers with Finite Buffers, and Multihop Paths

In our final congestion scenario, four hosts transmit packets, each over overlapping two-hop paths, as shown in Figure 3.47. We again assume that each host uses a timeout/retransmission mechanism to implement a reliable data transfer service, that all hosts have the same value of λin, and that all router links have capacity R bytes/sec.

Figure 3.47 Four senders, routers with finite buffers, and multihop paths

Let's consider the connection from Host A to Host C, passing through routers R1 and R2. The A--C connection shares router R1 with the D--B connection and shares router R2 with the B--D connection.
For extremely small values of λin, buffer overflows are rare (as in congestion scenarios 1 and 2), and the throughput approximately equals the offered load. For slightly larger values of λin, the corresponding throughput is also larger, since more original data is being transmitted into the network and delivered to the destination, and overflows are still rare. Thus, for small values of λin, an increase in λin results in an increase in λout. Having considered the case of extremely low traffic, let's next examine the case that λin (and hence λ′in) is extremely large. Consider router R2. The A--C traffic arriving to router R2 (which arrives at R2 after being forwarded from R1) can have an arrival rate at R2 that is at most R, the capacity of the link from R1 to R2, regardless of the value of λin. If λ′in is extremely large for all connections (including the B--D connection), then the arrival rate of B--D traffic at R2 can be much larger than that of the A--C traffic. Because the A--C and B--D traffic must compete at router R2 for the limited amount of buffer space, the amount of A--C traffic that successfully gets through R2 (that is, is not lost due to buffer overflow) becomes smaller and smaller as the offered load from B--D gets larger and larger. In the limit, as the offered load approaches infinity, an empty buffer at R2 is immediately filled by a B--D packet, and the throughput of the A--C connection at R2 goes to zero. This, in turn, implies that the A--C end-to-end throughput goes to zero in the limit of heavy traffic. These considerations give rise to the offered load versus throughput tradeoff shown in Figure 3.48.

Figure 3.48 Scenario 3 performance with finite buffers and multihop paths

The reason for the eventual decrease in throughput with increasing offered load is evident when one considers the amount of wasted work done by the network.
In the high-traffic scenario outlined above, whenever a packet is dropped at a second-hop router, the work done by the first-hop router in forwarding a packet to the second-hop router ends up being "wasted." The network would have been equally well off (more accurately, equally bad off) if the first router had simply discarded that packet and remained idle. More to the point, the transmission capacity used at the first router to forward the packet to the second router could have been much more profitably used to transmit a different packet. (For example, when selecting a packet for transmission, it might be better for a router to give priority to packets that have already traversed some number of upstream routers.) So here we see yet another cost of dropping a packet due to congestion---when a packet is dropped along a path, the transmission capacity that was used at each of the upstream links to forward that packet to the point at which it is dropped ends up having been wasted.

3.6.2 Approaches to Congestion Control

In Section 3.7, we'll examine TCP's specific approach to congestion control in great detail. Here, we identify the two broad approaches to congestion control that are taken in practice and discuss specific network architectures and congestion-control protocols embodying these approaches. At the highest level, we can distinguish among congestion-control approaches by whether the network layer provides explicit assistance to the transport layer for congestion-control purposes: End-to-end congestion control. In an end-to-end approach to congestion control, the network layer provides no explicit support to the transport layer for congestion-control purposes. Even the presence of network congestion must be inferred by the end systems based only on observed network behavior (for example, packet loss and delay).
We'll see shortly in Section 3.7.1 that TCP takes this end-to-end approach toward congestion control, since the IP layer is not required to provide feedback to hosts regarding network congestion. TCP segment loss (as indicated by a timeout or the receipt of three duplicate acknowledgments) is taken as an indication of network congestion, and TCP decreases its window size accordingly. We'll also see a more recent proposal for TCP congestion control that uses increasing round-trip segment delay as an indicator of increased network congestion.

Network-assisted congestion control. With network-assisted congestion control, routers provide explicit feedback to the sender and/or receiver regarding the congestion state of the network. This feedback may be as simple as a single bit indicating congestion at a link, an approach taken in the early IBM SNA \[Schwartz 1982\], DEC DECnet \[Jain 1989; Ramakrishnan 1990\], and ATM \[Black 1995\] network architectures. More sophisticated feedback is also possible. For example, in ATM Available Bit Rate (ABR) congestion control, a router informs the sender of the maximum host sending rate it (the router) can support on an outgoing link. As noted above, the Internet-default versions of IP and TCP adopt an end-to-end approach towards congestion control. We'll see, however, in Section 3.7.2 that, more recently, IP and TCP may also optionally implement network-assisted congestion control.

For network-assisted congestion control, congestion information is typically fed back from the network to the sender in one of two ways, as shown in Figure 3.49. Direct feedback may be sent from a network router to the sender. This form of notification typically takes the form of a choke packet (essentially saying, "I'm congested!"). The second and more common form of notification occurs when a router marks/updates a field in a packet flowing from sender to receiver to indicate congestion.
Upon receipt of a marked packet, the receiver then notifies the sender of the congestion indication. This latter form of notification takes a full round-trip time.

Figure 3.49 Two feedback pathways for network-indicated congestion information

3.7 TCP Congestion Control

In this section we return to our study of TCP. As we learned in Section 3.5, TCP provides a reliable transport service between two processes running on different hosts. Another key component of TCP is its congestion-control mechanism. As indicated in the previous section, TCP must use end-to-end congestion control rather than network-assisted congestion control, since the IP layer provides no explicit feedback to the end systems regarding network congestion. The approach taken by TCP is to have each sender limit the rate at which it sends traffic into its connection as a function of perceived network congestion. If a TCP sender perceives that there is little congestion on the path between itself and the destination, then the TCP sender increases its send rate; if the sender perceives that there is congestion along the path, then the sender reduces its send rate. But this approach raises three questions. First, how does a TCP sender limit the rate at which it sends traffic into its connection? Second, how does a TCP sender perceive that there is congestion on the path between itself and the destination? And third, what algorithm should the sender use to change its send rate as a function of perceived end-to-end congestion? Let's first examine how a TCP sender limits the rate at which it sends traffic into its connection. In Section 3.5 we saw that each side of a TCP connection consists of a receive buffer, a send buffer, and several variables (LastByteRead, rwnd, and so on). The TCP congestion-control mechanism operating at the sender keeps track of an additional variable, the congestion window.
The congestion window, denoted cwnd, imposes a constraint on the rate at which a TCP sender can send traffic into the network. Specifically, the amount of unacknowledged data at a sender may not exceed the minimum of cwnd and rwnd, that is:

LastByteSent − LastByteAcked ≤ min{cwnd, rwnd}

In order to focus on congestion control (as opposed to flow control), let us henceforth assume that the TCP receive buffer is so large that the receive-window constraint can be ignored; thus, the amount of unacknowledged data at the sender is solely limited by cwnd. We will also assume that the sender always has data to send, i.e., that all segments in the congestion window are sent. The constraint above limits the amount of unacknowledged data at the sender and therefore indirectly limits the sender's send rate. To see this, consider a connection for which loss and packet transmission delays are negligible. Then, roughly, at the beginning of every RTT, the constraint permits the sender to send cwnd bytes of data into the connection; at the end of the RTT the sender receives acknowledgments for the data. Thus the sender's send rate is roughly cwnd/RTT bytes/sec. By adjusting the value of cwnd, the sender can therefore adjust the rate at which it sends data into its connection. Let's next consider how a TCP sender perceives that there is congestion on the path between itself and the destination. Let us define a "loss event" at a TCP sender as the occurrence of either a timeout or the receipt of three duplicate ACKs from the receiver. (Recall our discussion in Section 3.5.4 of the timeout event in Figure 3.33 and the subsequent modification to include fast retransmit on receipt of three duplicate ACKs.) When there is excessive congestion, then one (or more) router buffers along the path overflows, causing a datagram (containing a TCP segment) to be dropped.
The dropped datagram, in turn, results in a loss event at the sender (either a timeout or the receipt of three duplicate ACKs) which is taken by the sender to be an indication of congestion on the sender-to-receiver path. Having considered how congestion is detected, let's next consider the more optimistic case when the network is congestion-free, that is, when a loss event doesn't occur. In this case, acknowledgments for previously unacknowledged segments will be received at the TCP sender. As we'll see, TCP will take the arrival of these acknowledgments as an indication that all is well (that segments being transmitted into the network are being successfully delivered to the destination) and will use acknowledgments to increase its congestion window size (and hence its transmission rate). Note that if acknowledgments arrive at a relatively slow rate (e.g., if the end-end path has high delay or contains a low-bandwidth link), then the congestion window will be increased at a relatively slow rate. On the other hand, if acknowledgments arrive at a high rate, then the congestion window will be increased more quickly. Because TCP uses acknowledgments to trigger (or clock) its increase in congestion window size, TCP is said to be self-clocking. Given the mechanism of adjusting the value of cwnd to control the sending rate, the critical question remains: How should a TCP sender determine the rate at which it should send? If TCP senders collectively send too fast, they can congest the network, leading to the type of congestion collapse that we saw in Figure 3.48. Indeed, the version of TCP that we'll study shortly was developed in response to observed Internet congestion collapse \[Jacobson 1988\] under earlier versions of TCP. However, if TCP senders are too cautious and send too slowly, they could underutilize the bandwidth in the network; that is, the TCP senders could send at a higher rate without congesting the network.
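The window constraint and rate estimate described above can be sketched numerically. This is a minimal illustration in which the function names mirror the text's variables, not any real TCP implementation:

```python
def allowed_unacked(cwnd: int, rwnd: int) -> int:
    # The sender may keep at most min(cwnd, rwnd) bytes unacknowledged:
    # LastByteSent - LastByteAcked <= min(cwnd, rwnd)
    return min(cwnd, rwnd)

def approx_send_rate(cwnd_bytes: int, rtt_sec: float) -> float:
    # Roughly one congestion window of data is sent per round-trip time.
    return cwnd_bytes / rtt_sec

# Example: cwnd = 14,600 bytes (10 segments of 1,460 bytes), RTT = 100 ms
cap = allowed_unacked(14_600, 65_535)      # window-limited to 14,600 bytes
rate = approx_send_rate(14_600, 0.100)     # 146,000 bytes/sec, about 1.17 Mbps
```

With a large receive buffer (as the text assumes), rwnd drops out and cwnd alone governs the rate.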
How then do the TCP senders determine their sending rates such that they don't congest the network but at the same time make use of all the available bandwidth? Are TCP senders explicitly coordinated, or is there a distributed approach in which the TCP senders can set their sending rates based only on local information? TCP answers these questions using the following guiding principles:

A lost segment implies congestion, and hence, the TCP sender's rate should be decreased when a segment is lost. Recall from our discussion in Section 3.5.4 that a timeout event or the receipt of four acknowledgments for a given segment (one original ACK and then three duplicate ACKs) is interpreted as an implicit "loss event" indication of the segment following the quadruply ACKed segment, triggering a retransmission of the lost segment. From a congestion-control standpoint, the question is how the TCP sender should decrease its congestion window size, and hence its sending rate, in response to this inferred loss event.

An acknowledged segment indicates that the network is delivering the sender's segments to the receiver, and hence, the sender's rate can be increased when an ACK arrives for a previously unacknowledged segment. The arrival of acknowledgments is taken as an implicit indication that all is well: segments are being successfully delivered from sender to receiver, and the network is thus not congested. The congestion window size can thus be increased.

Bandwidth probing. Given ACKs indicating a congestion-free source-to-destination path and loss events indicating a congested path, TCP's strategy for adjusting its transmission rate is to increase its rate in response to arriving ACKs until a loss event occurs, at which point, the transmission rate is decreased.
The TCP sender thus increases its transmission rate to probe for the rate at which congestion onset begins, backs off from that rate, and then begins probing again to see if the congestion onset rate has changed. The TCP sender's behavior is perhaps analogous to the child who requests (and gets) more and more goodies until he/she is finally told "No!", backs off a bit, but then begins making requests again shortly afterwards. Note that there is no explicit signaling of congestion state by the network (ACKs and loss events serve as implicit signals) and that each TCP sender acts on local information asynchronously from other TCP senders. Given this overview of TCP congestion control, we're now in a position to consider the details of the celebrated TCP congestion-control algorithm, which was first described in \[Jacobson 1988\] and is standardized in \[RFC 5681\]. The algorithm has three major components: (1) slow start, (2) congestion avoidance, and (3) fast recovery. Slow start and congestion avoidance are mandatory components of TCP, differing in how they increase the size of cwnd in response to received ACKs. We'll see shortly that slow start increases the size of cwnd more rapidly (despite its name!) than congestion avoidance. Fast recovery is recommended, but not required, for TCP senders.

Slow Start

When a TCP connection begins, the value of cwnd is typically initialized to a small value of 1 MSS \[RFC 3390\], resulting in an initial sending rate of roughly MSS/RTT. For example, if MSS = 500 bytes and RTT = 200 msec, the resulting initial sending rate is only about 20 kbps. Since the available bandwidth to the TCP sender may be much larger than MSS/RTT, the TCP sender would like to find the amount of available bandwidth quickly. Thus, in the slow-start state, the value of cwnd begins at 1 MSS and increases by 1 MSS every time a transmitted segment is first acknowledged.
In the example of Figure 3.50, TCP sends the first segment into the network and waits for an acknowledgment.

Figure 3.50 TCP slow start

When this acknowledgment arrives, the TCP sender increases the congestion window by one MSS and sends out two maximum-sized segments. These segments are then acknowledged, with the sender increasing the congestion window by 1 MSS for each of the acknowledged segments, giving a congestion window of 4 MSS, and so on. This process results in a doubling of the sending rate every RTT. Thus, the TCP send rate starts slow but grows exponentially during the slow start phase. But when should this exponential growth end? Slow start provides several answers to this question. First, if there is a loss event (i.e., congestion) indicated by a timeout, the TCP sender sets the value of cwnd to 1 MSS and begins the slow start process anew. It also sets the value of a second state variable, ssthresh (shorthand for "slow start threshold") to cwnd/2, half of the value of the congestion window when congestion was detected. The second way in which slow start may end is directly tied to the value of ssthresh. Since ssthresh is half the value of cwnd when congestion was last detected, it might be a bit reckless to keep doubling cwnd when it reaches or surpasses the value of ssthresh. Thus, when the value of cwnd equals ssthresh, slow start ends and TCP transitions into congestion avoidance mode. As we'll see, TCP increases cwnd more cautiously when in congestion-avoidance mode. The final way in which slow start can end is if three duplicate ACKs are detected, in which case TCP performs a fast retransmit (see Section 3.5.4) and enters the fast recovery state, as discussed below. TCP's behavior in slow start is summarized in the FSM description of TCP congestion control in Figure 3.51.
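The doubling-per-RTT growth of slow start can be sketched as a simple round-by-round simulation (segment counts in MSS units only; loss handling is omitted):

```python
def slow_start_rounds(ssthresh_mss: int):
    """Yield cwnd (in MSS) at the start of each round until cwnd reaches ssthresh."""
    cwnd = 1
    while cwnd < ssthresh_mss:
        yield cwnd
        # Each of the cwnd segments sent this round is ACKed, and each ACK
        # adds 1 MSS to cwnd: the window doubles every RTT.
        cwnd *= 2
    yield cwnd  # slow start ends here; congestion avoidance takes over

print(list(slow_start_rounds(8)))  # [1, 2, 4, 8]
```

With ssthresh = 8 MSS, the window climbs 1, 2, 4, 8 over four rounds, matching the exponential climb discussed for Figure 3.50.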
The slow-start algorithm traces its roots to \[Jacobson 1988\]; an approach similar to slow start was also proposed independently in \[Jain 1986\].

Congestion Avoidance

On entry to the congestion-avoidance state, the value of cwnd is approximately half its value when congestion was last encountered---congestion could be just around the corner! Thus, rather than doubling the value of cwnd every RTT, TCP adopts a more conservative approach and increases the value of cwnd by just a single MSS every RTT \[RFC 5681\]. This can be accomplished in several ways. A common approach is for the TCP sender to increase cwnd by MSS · (MSS/cwnd) bytes whenever a new acknowledgment arrives. For example, if MSS is 1,460 bytes and cwnd is 14,600 bytes, then 10 segments are being sent within an RTT. Each arriving ACK (assuming one ACK per segment) increases the congestion window size by 1/10 MSS, and thus, the value of the congestion window will have increased by one MSS after ACKs for all 10 segments have been received. But when should congestion avoidance's linear increase (of 1 MSS per RTT) end? TCP's congestion-avoidance algorithm behaves the same as slow start when a timeout occurs: The value of cwnd is set to 1 MSS, and the value of ssthresh is updated to half the value of cwnd when the loss event occurred. Recall, however, that a loss event also can be triggered by a triple duplicate ACK event.

Figure 3.51 FSM description of TCP congestion control

In this case, the network is continuing to deliver segments from sender to receiver (as indicated by the receipt of duplicate ACKs). So TCP's behavior to this type of loss event should be less drastic than with a timeout-indicated loss: TCP halves the value of cwnd (adding in 3 MSS for good measure to account for the triple duplicate ACKs received) and records the value of ssthresh to be half the value of cwnd when the triple duplicate ACKs were received.
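The congestion-avoidance updates just described can be sketched as follows (byte units, one ACK per segment assumed; a simplified sketch, not the full FSM of Figure 3.51):

```python
MSS = 1_460  # bytes

def ca_on_new_ack(cwnd: int) -> int:
    # Additive increase: MSS * (MSS / cwnd) bytes per ACK, so after a full
    # window of ACKs, cwnd has grown by roughly one MSS.
    return cwnd + MSS * MSS // cwnd

def ca_on_triple_dup_ack(cwnd: int) -> tuple:
    # Multiplicative decrease: ssthresh is set to half of cwnd, and cwnd is
    # halved (plus 3 MSS for the three duplicate ACKs) on entering fast recovery.
    ssthresh = cwnd // 2
    return ssthresh + 3 * MSS, ssthresh

cwnd = 14_600                  # 10 segments in flight
for _ in range(10):            # ACKs for one full window arrive
    cwnd = ca_on_new_ack(cwnd)
# cwnd has now grown by roughly one MSS (slightly less, since cwnd grows as ACKs arrive)
```

Running `ca_on_triple_dup_ack(12 * MSS)` reproduces the Figure 3.52 numbers discussed below: ssthresh becomes 6 MSS and cwnd restarts at 9 MSS.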
The fast-recovery state is then entered.

PRINCIPLES IN PRACTICE

TCP SPLITTING: OPTIMIZING THE PERFORMANCE OF CLOUD SERVICES

For cloud services such as search, e-mail, and social networks, it is desirable to provide a high level of responsiveness, ideally giving users the illusion that the services are running within their own end systems (including their smartphones). This can be a major challenge, as users are often located far away from the data centers responsible for serving the dynamic content associated with the cloud services. Indeed, if the end system is far from a data center, then the RTT will be large, potentially leading to poor response time performance due to TCP slow start. As a case study, consider the delay in receiving a response for a search query. Typically, the server requires three TCP windows during slow start to deliver the response \[Pathak 2010\]. Thus the time from when an end system initiates a TCP connection until the time when it receives the last packet of the response is roughly 4 · RTT (one RTT to set up the TCP connection plus three RTTs for the three windows of data) plus the processing time in the data center. These RTT delays can lead to a noticeable delay in returning search results for a significant fraction of queries. Moreover, there can be significant packet loss in access networks, leading to TCP retransmissions and even larger delays. One way to mitigate this problem and improve user-perceived performance is to (1) deploy front-end servers closer to the users, and (2) utilize TCP splitting by breaking the TCP connection at the front-end server. With TCP splitting, the client establishes a TCP connection to the nearby front-end, and the front-end maintains a persistent TCP connection to the data center with a very large TCP congestion window \[Tariq 2008, Pathak 2010, Chen 2011\]. With this approach, the response time roughly becomes 4 · RTT_FE + RTT_BE + processing time, where RTT_FE is the round-trip time between client and front-end server, and RTT_BE is the round-trip time between the front-end server and the data center (back-end server). If the front-end server is close to the client, then this response time approximately becomes RTT plus processing time, since RTT_FE is negligibly small and RTT_BE is approximately RTT. In summary, TCP splitting can reduce the networking delay roughly from 4 · RTT to RTT, significantly improving user-perceived performance, particularly for users who are far from the nearest data center. TCP splitting also helps reduce TCP retransmission delays caused by losses in access networks. Google and Akamai have made extensive use of their CDN servers in access networks (recall our discussion in Section 2.6) to perform TCP splitting for the cloud services they support \[Chen 2011\].

Fast Recovery

In fast recovery, the value of cwnd is increased by 1 MSS for every duplicate ACK received for the missing segment that caused TCP to enter the fast-recovery state. Eventually, when an ACK arrives for the missing segment, TCP enters the congestion-avoidance state after deflating cwnd. If a timeout event occurs, fast recovery transitions to the slow-start state after performing the same actions as in slow start and congestion avoidance: The value of cwnd is set to 1 MSS, and the value of ssthresh is set to half the value of cwnd when the loss event occurred. Fast recovery is a recommended, but not required, component of TCP \[RFC 5681\]. It is interesting that an early version of TCP, known as TCP Tahoe, unconditionally cut its congestion window to 1 MSS and entered the slow-start phase after either a timeout-indicated or triple-duplicate-ACK-indicated loss event. The newer version of TCP, TCP Reno, incorporated fast recovery. Figure 3.52 illustrates the evolution of TCP's congestion window for both Reno and Tahoe.
In this figure, the threshold is initially equal to 8 MSS. For the first eight transmission rounds, Tahoe and Reno take identical actions. The congestion window climbs exponentially fast during slow start and hits the threshold at the fourth round of transmission. The congestion window then climbs linearly until a triple duplicate-ACK event occurs, just after transmission round 8. Note that the congestion window is 12 · MSS when this loss event occurs. The value of ssthresh is then set to 0.5 · cwnd = 6 · MSS. Under TCP Reno, the congestion window is set to cwnd = 9 · MSS and then grows linearly. Under TCP Tahoe, the congestion window is set to 1 MSS and grows exponentially until it reaches the value of ssthresh, at which point it grows linearly. Figure 3.51 presents the complete FSM description of TCP's congestion-control algorithms: slow start, congestion avoidance, and fast recovery. The figure also indicates where transmission of new segments or retransmitted segments can occur. Although it is important to distinguish between TCP error control/retransmission and TCP congestion control, it's also important to appreciate how these two aspects of TCP are inextricably linked.

TCP Congestion Control: Retrospective

Having delved into the details of slow start, congestion avoidance, and fast recovery, it's worthwhile to now step back and view the forest from the trees.

Figure 3.52 Evolution of TCP's congestion window (Tahoe and Reno)

Figure 3.53 Additive-increase, multiplicative-decrease congestion control

Ignoring the initial slow-start period when a connection begins and assuming that losses are indicated by triple duplicate ACKs rather than timeouts, TCP's congestion control consists of linear (additive) increase in cwnd of 1 MSS per RTT and then a halving (multiplicative decrease) of cwnd on a triple duplicate-ACK event. For this reason, TCP congestion control is often referred to as an additive-increase, multiplicative-decrease (AIMD) form of congestion control. AIMD congestion control gives rise to the "saw tooth" behavior shown in Figure 3.53, which also nicely illustrates our earlier intuition of TCP "probing" for bandwidth: TCP linearly increases its congestion window size (and hence its transmission rate) until a triple duplicate-ACK event occurs. It then decreases its congestion window size by a factor of two but then again begins increasing it linearly, probing to see if there is additional available bandwidth.

As noted previously, many TCP implementations use the Reno algorithm \[Padhye 2001\]. Many variations of the Reno algorithm have been proposed \[RFC 3782; RFC 2018\]. The TCP Vegas algorithm \[Brakmo 1995; Ahn 1995\] attempts to avoid congestion while maintaining good throughput. The basic idea of Vegas is to (1) detect congestion in the routers between source and destination before packet loss occurs, and (2) lower the rate linearly when this imminent packet loss is detected. Imminent packet loss is predicted by observing the RTT. The longer the RTT of the packets, the greater the congestion in the routers. As of late 2015, the Ubuntu Linux implementation of TCP provided slow start, congestion avoidance, fast recovery, fast retransmit, and SACK by default; alternative congestion-control algorithms, such as TCP Vegas and BIC \[Xu 2004\], are also provided. For a survey of the many flavors of TCP, see \[Afanasyev 2010\]. TCP's AIMD algorithm was developed based on a tremendous amount of engineering insight and experimentation with congestion control in operational networks.
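The AIMD sawtooth described above can be simulated round by round. This is a sketch in MSS units with a hypothetical fixed loss threshold W (real networks, of course, do not present a fixed threshold):

```python
def aimd(W: int, rounds: int, start: int = 1):
    """Additive increase of 1 MSS per RTT; multiplicative decrease (halving) at W."""
    cwnd, trace = start, []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= W:
            cwnd = max(1, cwnd // 2)   # triple-dup-ACK loss event: halve cwnd
        else:
            cwnd += 1                  # congestion avoidance: +1 MSS per RTT
    return trace

print(aimd(W=8, rounds=12, start=4))   # [4, 5, 6, 7, 8, 4, 5, 6, 7, 8, 4, 5]
```

The trace climbs linearly to W, halves, and climbs again, producing the sawtooth of Figure 3.53; the average window sits about three-quarters of the way between W/2 and W, which is where the 0.75 · W throughput figure in the next subsection comes from.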
Ten years after TCP's development, theoretical analyses showed that TCP's congestion-control algorithm serves as a distributed asynchronous-optimization algorithm that results in several important aspects of user and network performance being simultaneously optimized \[Kelly 1998\]. A rich theory of congestion control has since been developed \[Srikant 2004\].

Macroscopic Description of TCP Throughput

Given the saw-toothed behavior of TCP, it's natural to consider what the average throughput (that is, the average rate) of a long-lived TCP connection might be. In this analysis we'll ignore the slow-start phases that occur after timeout events. (These phases are typically very short, since the sender grows out of the phase exponentially fast.) During a particular round-trip interval, the rate at which TCP sends data is a function of the congestion window and the current RTT. When the window size is w bytes and the current round-trip time is RTT seconds, then TCP's transmission rate is roughly w/RTT. TCP then probes for additional bandwidth by increasing w by 1 MSS each RTT until a loss event occurs. Denote by W the value of w when a loss event occurs. Assuming that RTT and W are approximately constant over the duration of the connection, the TCP transmission rate ranges from W/(2 · RTT) to W/RTT. These assumptions lead to a highly simplified macroscopic model for the steady-state behavior of TCP. The network drops a packet from the connection when the rate increases to W/RTT; the rate is then cut in half and then increases by MSS/RTT every RTT until it again reaches W/RTT. This process repeats itself over and over again.
Because TCP's throughput (that is, rate) increases linearly between the two extreme values, we have:

average throughput of a connection = (0.75 · W)/RTT

Using this highly idealized model for the steady-state dynamics of TCP, we can also derive an interesting expression that relates a connection's loss rate to its available bandwidth \[Mahdavi 1997\]. This derivation is outlined in the homework problems. A more sophisticated model that has been found empirically to agree with measured data is \[Padhye 2000\].

TCP Over High-Bandwidth Paths

It is important to realize that TCP congestion control has evolved over the years and indeed continues to evolve. For a summary of current TCP variants and discussion of TCP evolution, see \[Floyd 2001, RFC 5681, Afanasyev 2010\]. What was good for the Internet when the bulk of the TCP connections carried SMTP, FTP, and Telnet traffic is not necessarily good for today's HTTP-dominated Internet or for a future Internet with services that are still undreamed of. The need for continued evolution of TCP can be illustrated by considering the high-speed TCP connections that are needed for grid- and cloud-computing applications. For example, consider a TCP connection with 1,500-byte segments and a 100 ms RTT, and suppose we want to send data through this connection at 10 Gbps. Following \[RFC 3649\], we note that using the TCP throughput formula above, in order to achieve a 10 Gbps throughput, the average congestion window size would need to be 83,333 segments. That's a lot of segments, leading us to be rather concerned that one of these 83,333 in-flight segments might be lost. What would happen in the case of a loss? Or, put another way, what fraction of the transmitted segments could be lost that would allow the TCP congestion-control algorithm specified in Figure 3.51 still to achieve the desired 10 Gbps rate?
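This question can be explored numerically with the loss-based TCP throughput relation, average throughput ≈ 1.22 · MSS/(RTT · √L), whose derivation is outlined in the homework problems. A sketch, with units converted to bits per second:

```python
from math import sqrt

def tcp_throughput_bps(mss_bytes: int, rtt_sec: float, loss_rate: float) -> float:
    # throughput ~ 1.22 * MSS / (RTT * sqrt(L)), converted to bits/sec
    return 1.22 * mss_bytes * 8 / (rtt_sec * sqrt(loss_rate))

def loss_rate_for(mss_bytes: int, rtt_sec: float, target_bps: float) -> float:
    # Invert the formula to find the tolerable segment loss probability L.
    return (1.22 * mss_bytes * 8 / (rtt_sec * target_bps)) ** 2

# 1,500-byte segments, 100 ms RTT, 10 Gbps target, as in the text:
L = loss_rate_for(1_500, 0.100, 10e9)   # about 2e-10
```

Plugging in the text's numbers yields a tolerable loss probability of roughly 2 · 10⁻¹⁰, confirming how demanding high-bandwidth paths are for loss-based congestion control.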
In the homework questions for this chapter, you are led through the derivation of a formula relating the throughput of a TCP connection as a function of the loss rate (L), the round-trip time (RTT), and the maximum segment size (MSS):

average throughput of a connection = (1.22 · MSS)/(RTT · √L)

Using this formula, we can see that in order to achieve a throughput of 10 Gbps, today's TCP congestion-control algorithm can only tolerate a segment loss probability of 2 · 10⁻¹⁰ (or equivalently, one loss event for every 5,000,000,000 segments), a very low rate. This observation has led a number of researchers to investigate new versions of TCP that are specifically designed for such high-speed environments; see \[Jin 2004; Kelly 2003; Ha 2008; RFC 7323\] for discussions of these efforts.

3.7.1 Fairness

Consider K TCP connections, each with a different end-to-end path, but all passing through a bottleneck link with transmission rate R bps. (By bottleneck link, we mean that for each connection, all the other links along the connection's path are not congested and have abundant transmission capacity as compared with the transmission capacity of the bottleneck link.) Suppose each connection is transferring a large file and there is no UDP traffic passing through the bottleneck link. A congestion-control mechanism is said to be fair if the average transmission rate of each connection is approximately R/K; that is, each connection gets an equal share of the link bandwidth. Is TCP's AIMD algorithm fair, particularly given that different TCP connections may start at different times and thus may have different window sizes at a given point in time? \[Chiu 1989\] provides an elegant and intuitive explanation of why TCP congestion control converges to provide an equal share of a bottleneck link's bandwidth among competing TCP connections.
Let's consider the simple case of two TCP connections sharing a single link with transmission rate R, as shown in Figure 3.54.

Figure 3.54 Two TCP connections sharing a single bottleneck link

Assume that the two connections have the same MSS and RTT (so that if they have the same congestion window size, then they have the same throughput), that they have a large amount of data to send, and that no other TCP connections or UDP datagrams traverse this shared link. Also, ignore the slow-start phase of TCP and assume the TCP connections are operating in CA mode (AIMD) at all times. Figure 3.55 plots the throughput realized by the two TCP connections. If TCP is to share the link bandwidth equally between the two connections, then the realized throughput should fall along the 45-degree arrow (equal bandwidth share) emanating from the origin. Ideally, the sum of the two throughputs should equal R. (Certainly, each connection receiving an equal, but zero, share of the link capacity is not a desirable situation!) So the goal should be to have the achieved throughputs fall somewhere near the intersection of the equal bandwidth share line and the full bandwidth utilization line in Figure 3.55. Suppose that the TCP window sizes are such that at a given point in time, connections 1 and 2 realize throughputs indicated by point A in Figure 3.55. Because the amount of link bandwidth jointly consumed by the two connections is less than R, no loss will occur, and both connections will increase their window by 1 MSS per RTT as a result of TCP's congestion-avoidance algorithm. Thus, the joint throughput of the two connections proceeds along a 45-degree line (equal increase for both connections) starting from point A. Eventually, the link bandwidth jointly consumed by the two connections will be greater than R, and eventually packet loss will occur.
Suppose that connections 1 and 2 -experience packet loss when they realize throughputs indicated by point -B. Connections 1 and 2 then decrease their windows by a factor of two. -The resulting throughputs realized are thus at point C, halfway along a -vector starting at B and ending at the origin. Because the joint -bandwidth use is less than R at point C, the two connections again -increase their throughputs along a 45-degree line starting from C. -Eventually, loss will again occur, for example, at point D, and the two -connections again decrease their window sizes by a factor of two, and so -on. You should convince yourself that the bandwidth realized by the two -connections eventually fluctuates along the equal bandwidth share line. -You should also convince - -Figure 3.55 Throughput realized by TCP connections 1 and 2 - -yourself that the two connections will converge to this behavior -regardless of where they are in the twodimensional space! Although a -number of idealized assumptions lie behind this scenario, it still -provides an intuitive feel for why TCP results in an equal sharing of -bandwidth among connections. In our idealized scenario, we assumed that -only TCP connections traverse the bottleneck link, that the connections -have the same RTT value, and that only a single TCP connection is -associated with a hostdestination pair. In practice, these conditions -are typically not met, and client-server applications can thus obtain -very unequal portions of link bandwidth. In particular, it has been -shown that when multiple connections share a common bottleneck, those -sessions with a smaller RTT are able to grab the available bandwidth at -that link more quickly as it becomes free (that is, open their -congestion windows faster) and thus will enjoy higher throughput than -those connections with larger RTTs \[Lakshman - -1997\]. 
Fairness and UDP

We have just seen how TCP congestion control regulates an application's transmission rate via the congestion window mechanism. Many multimedia applications, such as Internet phone and video conferencing, often do not run over TCP for this very reason---they do not want their transmission rate throttled, even if the network is very congested. Instead, these applications prefer to run over UDP, which does not have built-in congestion control. When running over UDP, applications can pump their audio and video into the network at a constant rate and occasionally lose packets, rather than reduce their rates to "fair" levels at times of congestion and not lose any packets. From the perspective of TCP, the multimedia applications running over UDP are not being fair---they do not cooperate with the other connections nor adjust their transmission rates appropriately. Because TCP congestion control will decrease its transmission rate in the face of increasing congestion (loss), while UDP sources need not, it is possible for UDP sources to crowd out TCP traffic. An area of research today is thus the development of congestion-control mechanisms for the Internet that prevent UDP traffic from bringing the Internet's throughput to a grinding halt \[Floyd 1999; Floyd 2000; Kohler 2006; RFC 4340\].

Fairness and Parallel TCP Connections

But even if we could force UDP traffic to behave fairly, the fairness problem would still not be completely solved. This is because there is nothing to stop a TCP-based application from using multiple parallel connections. For example, Web browsers often use multiple parallel TCP connections to transfer the multiple objects within a Web page. (The exact number of multiple connections is configurable in most browsers.) When an application uses multiple parallel connections, it gets a larger fraction of the bandwidth in a congested link.
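Under the idealized assumption above that a congested link's bandwidth is split equally per connection, the fraction an application receives is just its share of connections (a hypothetical helper, not from the text):

```python
# Idealized per-connection fair sharing: a congested link of rate R is
# split equally among all TCP connections, so an application holding
# k of the n connections receives a fraction k/n of R.

def app_share(R, k, n_total):
    """Bandwidth received by an app with k of n_total connections."""
    return R * k / n_total

R = 1.0  # normalized link rate
# One connection among ten total:
print(app_share(R, 1, 10))    # → 0.1
# Opening 11 parallel connections alongside 9 single-connection apps:
print(app_share(R, 11, 20))   # → 0.55
```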
As an example, consider a link of rate R supporting nine ongoing client-server applications, with each of the applications using one TCP connection. If a new application comes along and also uses one TCP connection, then each application gets approximately the same transmission rate of R/10. But if this new application instead uses 11 parallel TCP connections, then the new application gets an unfair allocation of more than R/2. Because Web traffic is so pervasive in the Internet, multiple parallel connections are not uncommon.

3.7.2 Explicit Congestion Notification (ECN): Network-assisted Congestion Control

Since the initial standardization of slow start and congestion avoidance in the late 1980s \[RFC 1122\], TCP has implemented the form of end-end congestion control that we studied in Section 3.7.1: a TCP sender receives no explicit congestion indications from the network layer, and instead infers congestion through observed packet loss. More recently, extensions to both IP and TCP \[RFC 3168\] have been proposed, implemented, and deployed that allow the network to explicitly signal congestion to a TCP sender and receiver. This form of network-assisted congestion control is known as Explicit Congestion Notification. As shown in Figure 3.56, the TCP and IP protocols are involved. At the network layer, two bits (with four possible values, overall) in the Type of Service field of the IP datagram header (which we'll discuss in Section 4.3) are used for ECN. One setting of the ECN bits is used by a router to indicate that it (the router) is experiencing congestion. This congestion indication is then carried in the marked IP datagram to the destination host, which then informs the sending host, as shown in Figure 3.56.

Figure 3.56 Explicit Congestion Notification: network-assisted congestion control
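The marking-and-echo chain of Figure 3.56 can be sketched as a toy simulation. Only the ECT/CE codepoints and the ECE/CWR flags come from RFC 3168; the dictionary fields, queue-length test, and threshold below are illustrative assumptions:

```python
# Toy walk-through of ECN signaling: sender marks datagrams ECN-capable,
# a congested router re-marks them, the receiver echoes the mark in its
# ACK, and the sender halves its congestion window.

ECT = "ECT"   # ECN-Capable Transport: set by the sending host
CE = "CE"     # Congestion Experienced: set by a congested router

def router_forward(datagram, queue_len, threshold=10):
    # A congested router re-marks an ECN-capable datagram rather than
    # dropping it. RFC 3168 leaves the definition of "congested" to the
    # operator; a queue-length threshold is just one plausible choice.
    if datagram["ecn"] == ECT and queue_len > threshold:
        datagram["ecn"] = CE
    return datagram

def receiver_ack(datagram):
    # The receiving TCP echoes congestion back to the sender by setting
    # the ECE bit in a receiver-to-sender ACK segment.
    return {"ECE": datagram["ecn"] == CE}

def sender_react(ack, cwnd):
    # The sender halves cwnd, as it would for a fast-retransmit loss,
    # and sets CWR in its next segment to confirm the echo was heard.
    if ack["ECE"]:
        return cwnd / 2, {"CWR": True}
    return cwnd, {"CWR": False}

d = router_forward({"ecn": ECT}, queue_len=25)   # router is congested
ack = receiver_ack(d)
cwnd, seg = sender_react(ack, cwnd=16.0)
print(cwnd, seg["CWR"])   # → 8.0 True
```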
RFC 3168 does not provide a definition of when a router is congested; that decision is a configuration choice made possible by the router vendor, and decided by the network operator. However, RFC 3168 does recommend that an ECN congestion indication be set only in the face of persistent congestion. A second setting of the ECN bits is used by the sending host to inform routers that the sender and receiver are ECN-capable, and thus capable of taking action in response to ECN-indicated network congestion. As shown in Figure 3.56, when the TCP in the receiving host receives an ECN congestion indication via a received datagram, the TCP in the receiving host informs the TCP in the sending host of the congestion indication by setting the ECE (Explicit Congestion Notification Echo) bit (see Figure 3.29) in a receiver-to-sender TCP ACK segment. The TCP sender, in turn, reacts to an ACK with an ECE congestion indication by halving the congestion window, as it would react to a lost segment using fast retransmit, and sets the CWR (Congestion Window Reduced) bit in the header of the next transmitted TCP sender-to-receiver segment.

Other transport-layer protocols besides TCP may also make use of network-layer-signaled ECN. The Datagram Congestion Control Protocol (DCCP) \[RFC 4340\] provides a low-overhead, congestion-controlled UDP-like unreliable service that utilizes ECN. DCTCP (Data Center TCP) \[Alizadeh 2010\], a version of TCP designed specifically for data center networks, also makes use of ECN.

3.8 Summary

We began this chapter by studying the services that a transport-layer protocol can provide to network applications. At one extreme, the transport-layer protocol can be very simple and offer a no-frills service to applications, providing only a multiplexing/demultiplexing function for communicating processes. The Internet's UDP protocol is an example of such a no-frills transport-layer protocol.
At the other extreme, a transport-layer protocol can provide a variety of guarantees to applications, such as reliable delivery of data, delay guarantees, and bandwidth guarantees. Nevertheless, the services that a transport protocol can provide are often constrained by the service model of the underlying network-layer protocol. If the network-layer protocol cannot provide delay or bandwidth guarantees to transport-layer segments, then the transport-layer protocol cannot provide delay or bandwidth guarantees for the messages sent between processes.

We learned in Section 3.4 that a transport-layer protocol can provide reliable data transfer even if the underlying network layer is unreliable. We saw that providing reliable data transfer has many subtle points, but that the task can be accomplished by carefully combining acknowledgments, timers, retransmissions, and sequence numbers. Although we covered reliable data transfer in this chapter, we should keep in mind that reliable data transfer can be provided by link-, network-, transport-, or application-layer protocols. Any of the upper four layers of the protocol stack can implement acknowledgments, timers, retransmissions, and sequence numbers and provide reliable data transfer to the layer above. In fact, over the years, engineers and computer scientists have independently designed and implemented link-, network-, transport-, and application-layer protocols that provide reliable data transfer (although many of these protocols have quietly disappeared).

In Section 3.5, we took a close look at TCP, the Internet's connection-oriented and reliable transport-layer protocol. We learned that TCP is complex, involving connection management, flow control, and round-trip time estimation, as well as reliable data transfer.
In fact, TCP is actually more complex than our description---we intentionally did not discuss a variety of TCP patches, fixes, and improvements that are widely implemented in various versions of TCP. All of this complexity, however, is hidden from the network application. If a client on one host wants to send data reliably to a server on another host, it simply opens a TCP socket to the server and pumps data into that socket. The client-server application is blissfully unaware of TCP's complexity.

In Section 3.6, we examined congestion control from a broad perspective, and in Section 3.7, we showed how TCP implements congestion control. We learned that congestion control is imperative for the well-being of the network. Without congestion control, a network can easily become gridlocked, with little or no data being transported end-to-end. In Section 3.7 we learned that TCP implements an end-to-end congestion-control mechanism that additively increases its transmission rate when the TCP connection's path is judged to be congestion-free, and multiplicatively decreases its transmission rate when loss occurs. This mechanism also strives to give each TCP connection passing through a congested link an equal share of the link bandwidth. We also examined in some depth the impact of TCP connection establishment and slow start on latency. We observed that in many important scenarios, connection establishment and slow start significantly contribute to end-to-end delay. We emphasize once more that while TCP congestion control has evolved over the years, it remains an area of intensive research and will likely continue to evolve in the upcoming years.

Our discussion of specific Internet transport protocols in this chapter has focused on UDP and TCP---the two "work horses" of the Internet transport layer. However, two decades of experience with these two protocols has identified circumstances in which neither is ideally suited.
Researchers have thus been busy developing additional transport-layer protocols, several of which are now IETF proposed standards.

The Datagram Congestion Control Protocol (DCCP) \[RFC 4340\] provides a low-overhead, message-oriented, UDP-like unreliable service, but with an application-selected form of congestion control that is compatible with TCP. If reliable or semi-reliable data transfer is needed by an application, then this would be performed within the application itself, perhaps using the mechanisms we have studied in Section 3.4. DCCP is envisioned for use in applications such as streaming media (see Chapter 9) that can exploit the tradeoff between timeliness and reliability of data delivery, but that want to be responsive to network congestion.

Google's QUIC (Quick UDP Internet Connections) protocol \[Iyengar 2016\], implemented in Google's Chromium browser, provides reliability via retransmission as well as error correction, fast-connection setup, and a rate-based congestion control algorithm that aims to be TCP friendly---all implemented as an application-level protocol on top of UDP. In early 2015, Google reported that roughly half of all requests from Chrome to Google servers are served over QUIC.

DCTCP (Data Center TCP) \[Alizadeh 2010\] is a version of TCP designed specifically for data center networks, and uses ECN to better support the mix of short- and long-lived flows that characterize data center workloads.

The Stream Control Transmission Protocol (SCTP) \[RFC 4960, RFC 3286\] is a reliable, message-oriented protocol that allows several different application-level "streams" to be multiplexed through a single SCTP connection (an approach known as "multi-streaming"). From a reliability standpoint, the different streams within the connection are handled separately, so that packet loss in one stream does not affect the delivery of data in other streams. QUIC provides similar multi-stream semantics.
SCTP also allows data to be transferred over two outgoing paths when a host is connected to two or more networks, optional delivery of out-of-order data, and a number of other features. SCTP's flow- and congestion-control algorithms are essentially the same as in TCP.

The TCP-Friendly Rate Control (TFRC) protocol \[RFC 5348\] is a congestion-control protocol rather than a full-fledged transport-layer protocol. It specifies a congestion-control mechanism that could be used in another transport protocol such as DCCP (indeed one of the two application-selectable protocols available in DCCP is TFRC). The goal of TFRC is to smooth out the "sawtooth" behavior (see Figure 3.53) in TCP congestion control, while maintaining a long-term sending rate that is "reasonably" close to that of TCP. With a smoother sending rate than TCP, TFRC is well-suited for multimedia applications such as IP telephony or streaming media where such a smooth rate is important. TFRC is an "equation-based" protocol that uses the measured packet loss rate as input to an equation \[Padhye 2000\] that estimates what TCP's throughput would be if a TCP session experiences that loss rate. This rate is then taken as TFRC's target sending rate.

Only the future will tell whether DCCP, SCTP, QUIC, or TFRC will see widespread deployment. While these protocols clearly provide enhanced capabilities over TCP and UDP, TCP and UDP have proven themselves "good enough" over the years. Whether "better" wins out over "good enough" will depend on a complex mix of technical, social, and business considerations.

In Chapter 1, we said that a computer network can be partitioned into the "network edge" and the "network core." The network edge covers everything that happens in the end systems. Having now covered the application layer and the transport layer, our discussion of the network edge is complete. It is time to explore the network core!
This journey begins in the next two chapters, where we'll study the network layer, and continues into Chapter 6, where we'll study the link layer.

Homework Problems and Questions

Chapter 3 Review Questions

SECTIONS 3.1--3.3

R1. Suppose the network layer provides the following service. The network layer in the source host accepts a segment of maximum size 1,200 bytes and a destination host address from the transport layer. The network layer then guarantees to deliver the segment to the transport layer at the destination host. Suppose many network application processes can be running at the destination host.

a. Design the simplest possible transport-layer protocol that will get
   application data to the desired process at the destination host.
   Assume the operating system in the destination host has assigned a
   4-byte port number to each running application process.

b. Modify this protocol so that it provides a "return address" to the
   destination process.

c. In your protocols, does the transport layer "have to do anything" in
   the core of the computer network?

R2. Consider a planet where everyone belongs to a family of six, every family lives in its own house, each house has a unique address, and each person in a given house has a unique name. Suppose this planet has a mail service that delivers letters from source house to destination house. The mail service requires that (1) the letter be in an envelope, and that (2) the address of the destination house (and nothing more) be clearly written on the envelope. Suppose each family has a delegate family member who collects and distributes letters for the other family members. The letters do not necessarily provide any indication of the recipients of the letters.

a. Using the solution to Problem R1 above as inspiration, describe a
   protocol that the delegates can use to deliver letters from a
   sending family member to a receiving family member.

b.
In your protocol, does the mail service ever have to open the envelope and examine the letter in order to provide its service?

R3. Consider a TCP connection between Host A and Host B. Suppose that the TCP segments traveling from Host A to Host B have source port number x and destination port number y. What are the source and destination port numbers for the segments traveling from Host B to Host A?

R4. Describe why an application developer might choose to run an application over UDP rather than TCP.

R5. Why is it that voice and video traffic is often sent over TCP rather than UDP in today's Internet? (Hint: The answer we are looking for has nothing to do with TCP's congestion-control mechanism.)

R6. Is it possible for an application to enjoy reliable data transfer even when the application runs over UDP? If so, how?

R7. Suppose a process in Host C has a UDP socket with port number 6789. Suppose both Host A and Host B each send a UDP segment to Host C with destination port number 6789. Will both of these segments be directed to the same socket at Host C? If so, how will the process at Host C know that these two segments originated from two different hosts?

R8. Suppose that a Web server runs in Host C on port 80. Suppose this Web server uses persistent connections, and is currently receiving requests from two different Hosts, A and B. Are all of the requests being sent through the same socket at Host C? If they are being passed through different sockets, do both of the sockets have port 80? Discuss and explain.

SECTION 3.4

R9. In our rdt protocols, why did we need to introduce sequence numbers?

R10. In our rdt protocols, why did we need to introduce timers?

R11. Suppose that the round-trip delay between sender and receiver is constant and known to the sender. Would a timer still be necessary in protocol rdt3.0, assuming that packets can be lost? Explain.

R12. Visit the Go-Back-N Java applet at the companion Web site.

a.
Have the source send five packets, and then pause the animation before any of the five packets reach the destination. Then kill the first packet and resume the animation. Describe what happens.

b. Repeat the experiment, but now let the first packet reach the
   destination and kill the first acknowledgment. Describe again what
   happens.

c. Finally, try sending six packets. What happens?

R13. Repeat R12, but now with the Selective Repeat Java applet. How are Selective Repeat and Go-Back-N different?

SECTION 3.5

R14. True or false?

a. Host A is sending Host B a large file over a TCP connection. Assume
   Host B has no data to send Host A. Host B will not send
   acknowledgments to Host A because Host B cannot piggyback the
   acknowledgments on data.

b. The size of the TCP rwnd never changes throughout the duration of
   the connection.

c. Suppose Host A is sending Host B a large file over a TCP connection.
   The number of unacknowledged bytes that A sends cannot exceed the
   size of the receive buffer.

d. Suppose Host A is sending a large file to Host B over a TCP
   connection. If the sequence number for a segment of this connection
   is m, then the sequence number for the subsequent segment will
   necessarily be m+1.

e. The TCP segment has a field in its header for rwnd.

f. Suppose that the last SampleRTT in a TCP connection is equal to 1
   sec. The current value of TimeoutInterval for the connection will
   necessarily be ≥1 sec.

g. Suppose Host A sends one segment with sequence number 38 and 4 bytes
   of data over a TCP connection to Host B. In this same segment the
   acknowledgment number is necessarily 42.

R15. Suppose Host A sends two TCP segments back to back to Host B over a TCP connection. The first segment has sequence number 90; the second has sequence number 110.

a. How much data is in the first segment?

b. Suppose that the first segment is lost but the second segment
   arrives at B.
In the acknowledgment that Host B sends to Host A, what will be the acknowledgment number?

R16. Consider the Telnet example discussed in Section 3.5. A few seconds after the user types the letter 'C,' the user types the letter 'R.' After typing the letter 'R,' how many segments are sent, and what is put in the sequence number and acknowledgment fields of the segments?

SECTION 3.7

R17. Suppose two TCP connections are present over some bottleneck link of rate R bps. Both connections have a huge file to send (in the same direction over the bottleneck link). The transmissions of the files start at the same time. What transmission rate would TCP like to give to each of the connections?

R18. True or false? Consider congestion control in TCP. When the timer expires at the sender, the value of ssthresh is set to one half of its previous value.

R19. In the discussion of TCP splitting in the sidebar in Section 3.7, it was claimed that the response time with TCP splitting is approximately 4 ⋅ RTT_FE + RTT_BE + processing time. Justify this claim.

Problems

P1. Suppose Client A initiates a Telnet session with Server S. At about the same time, Client B also initiates a Telnet session with Server S. Provide possible source and destination port numbers for

a. The segments sent from A to S.

b. The segments sent from B to S.

c. The segments sent from S to A.

d. The segments sent from S to B.

e. If A and B are different hosts, is it possible that the source port
   number in the segments from A to S is the same as that from B to S?

f. How about if they are the same host?

P2. Consider Figure 3.5. What are the source and destination port values in the segments flowing from the server back to the clients' processes? What are the IP addresses in the network-layer datagrams carrying the transport-layer segments?

P3. UDP and TCP use 1s complement for their checksums.
Suppose you have the following three 8-bit bytes: 01010011, 01100110, 01110100. What is the 1s complement of the sum of these 8-bit bytes? (Note that although UDP and TCP use 16-bit words in computing the checksum, for this problem you are being asked to consider 8-bit sums.) Show all work. Why is it that UDP takes the 1s complement of the sum; that is, why not just use the sum? With the 1s complement scheme, how does the receiver detect errors? Is it possible that a 1-bit error will go undetected? How about a 2-bit error?

P4.

a. Suppose you have the following 2 bytes: 01011100 and 01100101. What
   is the 1s complement of the sum of these 2 bytes?

b. Suppose you have the following 2 bytes: 11011010 and 01100101. What
   is the 1s complement of the sum of these 2 bytes?

c. For the bytes in part (a), give an example where one bit is flipped
   in each of the 2 bytes and yet the 1s complement doesn't change.

P5. Suppose that the UDP receiver computes the Internet checksum for the received UDP segment and finds that it matches the value carried in the checksum field. Can the receiver be absolutely certain that no bit errors have occurred? Explain.

P6. Consider our motivation for correcting protocol rdt2.1. Show that the receiver, shown in Figure 3.57, when operating with the sender shown in Figure 3.11, can lead the sender and receiver to enter into a deadlock state, where each is waiting for an event that will never occur.

P7. In protocol rdt3.0, the ACK packets flowing from the receiver to the sender do not have sequence numbers (although they do have an ACK field that contains the sequence number of the packet they are acknowledging). Why is it that our ACK packets do not require sequence numbers?

Figure 3.57 An incorrect receiver for protocol rdt 2.1

P8. Draw the FSM for the receiver side of protocol rdt3.0. P9.
Give a trace of the operation of protocol rdt3.0 when data packets and acknowledgment packets are garbled. Your trace should be similar to that used in Figure 3.16.

P10. Consider a channel that can lose packets but has a maximum delay that is known. Modify protocol rdt2.1 to include sender timeout and retransmit. Informally argue why your protocol can communicate correctly over this channel.

P11. Consider the rdt2.2 receiver in Figure 3.14, and the creation of a new packet in the self-transition (i.e., the transition from the state back to itself) in the Wait-for-0-from-below and the Wait-for-1-from-below states: sndpkt=make_pkt(ACK, 1, checksum) and sndpkt=make_pkt(ACK, 0, checksum). Would the protocol work correctly if this action were removed from the self-transition in the Wait-for-1-from-below state? Justify your answer. What if this event were removed from the self-transition in the Wait-for-0-from-below state? \[Hint: In this latter case, consider what would happen if the first sender-to-receiver packet were corrupted.\]

P12. The sender side of rdt3.0 simply ignores (that is, takes no action on) all received packets that are either in error or have the wrong value in the acknum field of an acknowledgment packet. Suppose that in such circumstances, rdt3.0 were simply to retransmit the current data packet. Would the protocol still work? (Hint: Consider what would happen if there were only bit errors; there are no packet losses but premature timeouts can occur. Consider how many times the nth packet is sent, in the limit as n approaches infinity.)

P13. Consider the rdt3.0 protocol. Draw a diagram showing that if the network connection between the sender and receiver can reorder messages (that is, that two messages propagating in the medium between the sender and receiver can be reordered), then the alternating-bit protocol will not work correctly (make sure you clearly identify the sense in which it will not work correctly).
Your diagram should have the sender on the left and the receiver on the right, with the time axis running down the page, showing data (D) and acknowledgment (A) message exchange. Make sure you indicate the sequence number associated with any data or acknowledgment segment.

P14. Consider a reliable data transfer protocol that uses only negative acknowledgments. Suppose the sender sends data only infrequently. Would a NAK-only protocol be preferable to a protocol that uses ACKs? Why? Now suppose the sender has a lot of data to send and the end-to-end connection experiences few losses. In this second case, would a NAK-only protocol be preferable to a protocol that uses ACKs? Why?

P15. Consider the cross-country example shown in Figure 3.17. How big would the window size have to be for the channel utilization to be greater than 98 percent? Suppose that the size of a packet is 1,500 bytes, including both header fields and data.

P16. Suppose an application uses rdt3.0 as its transport layer protocol. As the stop-and-wait protocol has very low channel utilization (shown in the cross-country example), the designers of this application let the receiver keep sending back a number (more than two) of alternating ACK 0 and ACK 1 even if the corresponding data have not arrived at the receiver. Would this application design increase the channel utilization? Why? Are there any potential problems with this approach? Explain.

P17. Consider two network entities, A and B, which are connected by a perfect bi-directional channel (i.e., any message sent will be received correctly; the channel will not corrupt, lose, or re-order packets). A and B are to deliver data messages to each other in an alternating manner: First, A must deliver a message to B, then B must deliver a message to A, then A must deliver a message to B and so on.
If -an entity is in a state where it should not attempt to deliver a message -to the other side, and there is an event like rdt_send(data) call from -above that attempts to pass data down for transmission to the other -side, this call from above can simply be ignored with a call to -rdt_unable_to_send(data) , which informs the higher layer that it is -currently not able to send data. \[Note: This simplifying assumption is -made so you don't have to worry about buffering data.\] Draw a FSM -specification for this protocol (one FSM for A, and one FSM for B!). -Note that you do not have to worry about a reliability mechanism here; -the main point of this question is to create a FSM specification that -reflects the synchronized behavior of the two entities. You should use -the following events and actions that have the same meaning as protocol -rdt1.0 in Figure 3.9 : rdt_send(data), packet = make_pkt(data) , -udt_send(packet), rdt_rcv(packet) , extract (packet, data), -deliver_data(data) . Make sure your protocol reflects the strict -alternation of sending between A and B. Also, make sure to indicate the -initial states for A and B in your FSM descriptions. - -P18. In the generic SR protocol that we studied in Section 3.4.4 , the -sender transmits a message as soon as it is available (if it is in the -window) without waiting for an acknowledgment. Suppose now that we want -an SR protocol that sends messages two at a time. That is, the sender -will send a pair of messages and will send the next pair of messages -only when it knows that both messages in the first pair have been -received correctly. Suppose that the channel may lose messages but will -not corrupt or reorder messages. Design an error-control protocol for -the unidirectional reliable transfer of messages. Give an FSM -description of the sender and receiver. Describe the format of the -packets sent between sender and receiver, and vice versa. 
If you use any procedure calls other than those in Section 3.4 (for example, udt_send(), start_timer(), rdt_rcv(), and so on), clearly state their actions. Give an example (a timeline trace of sender and receiver) showing how your protocol recovers from a lost packet.

P19. Consider a scenario in which Host A wants to simultaneously send packets to Hosts B and C. A is connected to B and C via a broadcast channel---a packet sent by A is carried by the channel to both B and C. Suppose that the broadcast channel connecting A, B, and C can independently lose and corrupt packets (and so, for example, a packet sent from A might be correctly received by B, but not by C). Design a stop-and-wait-like error-control protocol for reliably transferring packets from A to B and C, such that A will not get new data from the upper layer until it knows that both B and C have correctly received the current packet. Give FSM descriptions of A and C. (Hint: The FSM for B should be essentially the same as for C.) Also, give a description of the packet format(s) used.

P20. Consider a scenario in which Host A and Host B want to send messages to Host C. Hosts A and C are connected by a channel that can lose and corrupt (but not reorder) messages. Hosts B and C are connected by another channel (independent of the channel connecting A and C) with the same properties. The transport layer at Host C should alternate in delivering messages from A and B to the layer above (that is, it should first deliver the data from a packet from A, then the data from a packet from B, and so on). Design a stop-and-wait-like error-control protocol for reliably transferring packets from A and B to C, with alternating delivery at C as described above. Give FSM descriptions of A and C. (Hint: The FSM for B should be essentially the same as for A.) Also, give a description of the packet format(s) used.

P21. Suppose we have two network entities, A and B.
B has a supply of data messages that will be sent to A according to the following conventions. When A gets a request from the layer above to get the next data (D) message from B, A must send a request (R) message to B on the A-to-B channel. Only when B receives an R message can it send a data (D) message back to A on the B-to-A channel. A should deliver exactly one copy of each D message to the layer above. R messages can be lost (but not corrupted) in the A-to-B channel; D messages, once sent, are always delivered correctly. The delay along both channels is unknown and variable. Design (give an FSM description of) a protocol that incorporates the appropriate mechanisms to compensate for the loss-prone A-to-B channel and implements message passing to the layer above at entity A, as discussed above. Use only those mechanisms that are absolutely necessary.

P22. Consider the GBN protocol with a sender window size of 4 and a sequence number range of 1,024. Suppose that at time t, the next in-order packet that the receiver is expecting has a sequence number of k. Assume that the medium does not reorder messages. Answer the following questions:

a. What are the possible sets of sequence numbers inside the sender's
   window at time t? Justify your answer.

b. What are all possible values of the ACK field in all possible
   messages currently propagating back to the sender at time t? Justify
   your answer.

P23. Consider the GBN and SR protocols. Suppose the sequence number space is of size k. What is the largest allowable sender window that will avoid the occurrence of problems such as that in Figure 3.27 for each of these protocols?

P24. Answer true or false to the following questions and briefly justify your answer:

a. With the SR protocol, it is possible for the sender to receive an
   ACK for a packet that falls outside of its current window.

b. With GBN, it is possible for the sender to receive an ACK for a
   packet that falls outside of its current window.

c. The alternating-bit protocol is the same as the SR protocol with a
   sender and receiver window size of 1.

d. The alternating-bit protocol is the same as the GBN protocol with a
   sender and receiver window size of 1.

P25. We have said that an application may choose UDP for a transport protocol because UDP offers finer application control (than TCP) of what data is sent in a segment and when.

a. Why does an application have more control of what data is sent in a
   segment?

b. Why does an application have more control on when the segment is
   sent?

P26. Consider transferring an enormous file of L bytes from Host A to Host B. Assume an MSS of 536 bytes.

a. What is the maximum value of L such that TCP sequence numbers are
   not exhausted? Recall that the TCP sequence number field has 4
   bytes.

b. For the L you obtain in (a), find how long it takes to transmit the
   file. Assume that a total of 66 bytes of transport, network, and
   data-link header are added to each segment before the resulting
   packet is sent out over a 155 Mbps link. Ignore flow control and
   congestion control so A can pump out the segments back to back and
   continuously.

P27. Host A and B are communicating over a TCP connection, and Host B has already received from A all bytes up through byte 126. Suppose Host A then sends two segments to Host B back-to-back. The first and second segments contain 80 and 40 bytes of data, respectively. In the first segment, the sequence number is 127, the source port number is 302, and the destination port number is 80. Host B sends an acknowledgment whenever it receives a segment from Host A.

a. In the second segment sent from Host A to B, what are the sequence
   number, source port number, and destination port number?

b.
If the first segment arrives before the second segment, in the - acknowledgment of the first arriving segment, what is the - acknowledgment number, the source port number, and the destination - port number? - -c. If the second segment arrives before the first segment, in the - acknowledgment of the first arriving segment, what is the - acknowledgment number? - -d. Suppose the two segments sent by A arrive in order at B. The first - acknowledgment is lost and the second acknowledgment arrives after - the first timeout interval. Draw a timing diagram, showing these - segments and all other segments and acknowledgments sent. (Assume - there is no additional packet loss.) For each segment in your - figure, provide the sequence number and the number of bytes of data; - for each acknowledgment that you add, provide the acknowledgment - number. P28. Host A and B are directly connected with a 100 Mbps - link. There is one TCP connection between the two hosts, and Host A - is sending to Host B an enormous file over this connection. Host A - can send its application data into its TCP socket at a rate as high - as 120 Mbps but Host B can read out of its TCP receive buffer at a - maximum rate of 50 Mbps. Describe the effect of TCP flow control. - P29. SYN cookies were discussed in Section 3.5.6 . - -e. Why is it necessary for the server to use a special initial sequence - number in the SYNACK? - -f. Suppose an attacker knows that a target host uses SYN cookies. Can - the attacker create half-open or fully open connections by simply - sending an ACK packet to the target? Why or why not? - -g. Suppose an attacker collects a large amount of initial sequence - numbers sent by the server. Can the attacker cause the server to - create many fully open connections by sending ACKs with those - initial sequence numbers? Why? P30. Consider the network shown in - Scenario 2 in Section 3.6.1 . Suppose both sending hosts A and B - have some fixed timeout values. - -h. 
Argue that increasing the size of the finite buffer of the router - might possibly decrease the throughput (λout). - -i. Now suppose both hosts dynamically adjust their timeout values (like - what TCP does) based on the buffering delay at the router. Would - increasing the buffer size help to increase the throughput? Why? - P31. Suppose that the five measured SampleRTT values (see Section - 3.5.3 ) are 106 ms, 120 - -ms, 140 ms, 90 ms, and 115 ms. Compute the EstimatedRTT after each of -these SampleRTT values is obtained, using a value of α=0.125 and -assuming that the value of EstimatedRTT was 100 ms just before the first -of these five samples were obtained. Compute also the DevRTT after each -sample is obtained, assuming a value of β=0.25 and assuming the value of -DevRTT was 5 ms just before the first of these five samples was -obtained. Last, compute the TCP TimeoutInterval after each of these -samples is obtained. P32. Consider the TCP procedure for estimating RTT. -Suppose that α=0.1. Let SampleRTT 1 be the most recent sample RTT, let -SampleRTT 2 be the next most recent sample RTT, and so on. - -a. For a given TCP connection, suppose four acknowledgments have been - returned with corresponding sample RTTs: SampleRTT 4, SampleRTT 3, - SampleRTT 2, and SampleRTT 1. Express EstimatedRTT in terms of the - four sample RTTs. - -b. Generalize your formula for n sample RTTs. - -c. For the formula in part (b) let n approach infinity. Comment on why - this averaging procedure is called an exponential moving average. - P33. In Section 3.5.3 , we discussed TCP's estimation of RTT. Why do - you think TCP avoids measuring the SampleRTT for retransmitted - segments? P34. What is the relationship between the variable - SendBase in Section 3.5.4 and the variable LastByteRcvd in Section - 3.5.5 ? P35. What is the relationship between the variable - LastByteRcvd in Section 3.5.5 and the variable y in Section 3.5.4? - P36. 
In Section 3.5.4, we saw that TCP waits until it has received three duplicate ACKs before performing a fast retransmit. Why do you think the TCP designers chose not to perform a fast retransmit after the first duplicate ACK for a segment is received?

P37. Compare GBN, SR, and TCP (no delayed ACK). Assume that the timeout values for all three protocols are sufficiently long such that 5 consecutive data segments and their corresponding ACKs can be received (if not lost in the channel) by the receiving host (Host B) and the sending host (Host A) respectively. Suppose Host A sends 5 data segments to Host B, and the 2nd segment (sent from A) is lost. In the end, all 5 data segments have been correctly received by Host B.

a. How many segments has Host A sent in total and how many ACKs has Host B sent in total? What are their sequence numbers? Answer this question for all three protocols.

b. If the timeout values for all three protocols are much longer than 5 RTT, then which protocol successfully delivers all five data segments in the shortest time interval?

P38. In our description of TCP in Figure 3.53, the value of the threshold, ssthresh, is set as ssthresh=cwnd/2 in several places and the ssthresh value is referred to as being set to half the window size when a loss event occurred. Must the rate at which the sender is sending when the loss event occurred be approximately equal to cwnd segments per RTT? Explain your answer. If your answer is no, can you suggest a different manner in which ssthresh should be set?

P39. Consider Figure 3.46(b). If λ′in increases beyond R/2, can λout increase beyond R/3? Explain. Now consider Figure 3.46(c). If λ′in increases beyond R/2, can λout increase beyond R/4 under the assumption that a packet will be forwarded twice on average from the router to the receiver? Explain.

P40. Consider Figure 3.58.
Assuming TCP Reno is the protocol experiencing the behavior shown above, answer the following questions. In all cases, you should provide a short discussion justifying your answer.

a. Identify the intervals of time when TCP slow start is operating.

b. Identify the intervals of time when TCP congestion avoidance is operating.

c. After the 16th transmission round, is segment loss detected by a triple duplicate ACK or by a timeout?

d. After the 22nd transmission round, is segment loss detected by a triple duplicate ACK or by a timeout?

Figure 3.58 TCP window size as a function of time

e. What is the initial value of ssthresh at the first transmission round?

f. What is the value of ssthresh at the 18th transmission round?

g. What is the value of ssthresh at the 24th transmission round?

h. During what transmission round is the 70th segment sent?

i. Assuming a packet loss is detected after the 26th round by the receipt of a triple duplicate ACK, what will be the values of the congestion window size and of ssthresh?

j. Suppose TCP Tahoe is used (instead of TCP Reno), and assume that triple duplicate ACKs are received at the 16th round. What are the ssthresh and the congestion window size at the 19th round?

k. Again suppose TCP Tahoe is used, and there is a timeout event at the 22nd round. How many packets have been sent out from the 17th round through the 22nd round, inclusive?

P41. Refer to Figure 3.55, which illustrates the convergence of TCP's AIMD algorithm. Suppose that instead of a multiplicative decrease, TCP decreased the window size by a constant amount. Would the resulting AIAD algorithm converge to an equal share algorithm? Justify your answer using a diagram similar to Figure 3.55.

P42. In Section 3.5.4, we discussed the doubling of the timeout interval after a timeout event. This mechanism is a form of congestion control.
Why does TCP need a window-based congestion-control mechanism (as studied in Section 3.7) in addition to this doubling-timeout-interval mechanism?

P43. Host A is sending an enormous file to Host B over a TCP connection. Over this connection there is never any packet loss and the timers never expire. Denote the transmission rate of the link connecting Host A to the Internet by R bps. Suppose that the process in Host A is capable of sending data into its TCP socket at a rate S bps, where S=10⋅R. Further suppose that the TCP receive buffer is large enough to hold the entire file, and the send buffer can hold only one percent of the file. What would prevent the process in Host A from continuously passing data to its TCP socket at rate S bps? TCP flow control? TCP congestion control? Or something else? Elaborate.

P44. Consider sending a large file from a host to another over a TCP connection that has no loss.

a. Suppose TCP uses AIMD for its congestion control without slow start. Assuming cwnd increases by 1 MSS every time a batch of ACKs is received and assuming approximately constant round-trip times, how long does it take for cwnd to increase from 6 MSS to 12 MSS (assuming no loss events)?

b. What is the average throughput (in terms of MSS and RTT) for this connection up through time=6 RTT?

P45. Recall the macroscopic description of TCP throughput. In the period of time when the connection's rate varies from W/(2 · RTT) to W/RTT, only one packet is lost (at the very end of the period).

a. Show that the loss rate (fraction of packets lost) is equal to

L = loss rate = 1 / ((3/8)W² + (3/4)W)

b. Use the result above to show that if a connection has loss rate L, then its average rate is approximately given by

≈ (1.22 ⋅ MSS) / (RTT ⋅ √L)

P46. Consider that only a single TCP (Reno) connection uses one 10 Mbps link which does not buffer any data. Suppose that this link is the only congested link between the sending and receiving hosts.
Assume that the TCP sender has a huge file to send to the receiver, and the receiver's receive buffer is much larger than the congestion window. We also make the following assumptions: each TCP segment size is 1,500 bytes; the two-way propagation delay of this connection is 150 msec; and this TCP connection is always in congestion avoidance phase, that is, ignore slow start.

a. What is the maximum window size (in segments) that this TCP connection can achieve?

b. What is the average window size (in segments) and average throughput (in bps) of this TCP connection?

c. How long would it take for this TCP connection to reach its maximum window again after recovering from a packet loss?

P47. Consider the scenario described in the previous problem. Suppose that the 10 Mbps link can buffer a finite number of segments. Argue that in order for the link to always be busy sending data, we would like to choose a buffer size that is at least the product of the link speed C and the two-way propagation delay between the sender and the receiver.

P48. Repeat Problem 46, but replacing the 10 Mbps link with a 10 Gbps link. Note that in your answer to part c, you will realize that it takes a very long time for the congestion window size to reach its maximum window size after recovering from a packet loss. Sketch a solution to solve this problem.

P49. Let T (measured by RTT) denote the time interval that a TCP connection takes to increase its congestion window size from W/2 to W, where W is the maximum congestion window size. Argue that T is a function of TCP's average throughput.

P50. Consider a simplified TCP's AIMD algorithm where the congestion window size is measured in number of segments, not in bytes. In additive increase, the congestion window size increases by one segment in each RTT.
In multiplicative decrease, the congestion window size decreases by half (if the result is not an integer, round down to the nearest integer). Suppose that two TCP connections, C1 and C2, share a single congested link of speed 30 segments per second. Assume that both C1 and C2 are in the congestion avoidance phase. Connection C1's RTT is 50 msec and connection C2's RTT is 100 msec. Assume that when the data rate in the link exceeds the link's speed, all TCP connections experience data segment loss.

a. If both C1 and C2 at time t0 have a congestion window of 10 segments, what are their congestion window sizes after 1000 msec?

b. In the long run, will these two connections get the same share of the bandwidth of the congested link? Explain.

P51. Consider the network described in the previous problem. Now suppose that the two TCP connections, C1 and C2, have the same RTT of 100 msec. Suppose that at time t0, C1's congestion window size is 15 segments but C2's congestion window size is 10 segments.

a. What are their congestion window sizes after 2200 msec?

b. In the long run, will these two connections get about the same share of the bandwidth of the congested link?

c. We say that two connections are synchronized if both connections reach their maximum window sizes at the same time and reach their minimum window sizes at the same time. In the long run, will these two connections get synchronized eventually? If so, what are their maximum window sizes?

d. Will this synchronization help to improve the utilization of the shared link? Why? Sketch some idea to break this synchronization.

P52. Consider a modification to TCP's congestion control algorithm. Instead of additive increase, we can use multiplicative increase. A TCP sender increases its window size by a small positive constant a (0 < a < 1) whenever it receives a valid ACK.
Find the functional relationship between loss rate L and maximum congestion window W. Argue that for this modified TCP, regardless of TCP's average throughput, a TCP connection always spends the same amount of time to increase its congestion window size from W/2 to W.

P53. In our discussion of TCP futures in Section 3.7, we noted that to achieve a throughput of 10 Gbps, TCP could only tolerate a segment loss probability of 2⋅10^−10 (or equivalently, one loss event for every 5,000,000,000 segments). Show the derivation for the values of 2⋅10^−10 (1 out of 5,000,000,000) for the RTT and MSS values given in Section 3.7. If TCP needed to support a 100 Gbps connection, what would the tolerable loss be?

P54. In our discussion of TCP congestion control in Section 3.7, we implicitly assumed that the TCP sender always had data to send. Consider now the case that the TCP sender sends a large amount of data and then goes idle (since it has no more data to send) at t1. TCP remains idle for a relatively long period of time and then wants to send more data at t2. What are the advantages and disadvantages of having TCP use the cwnd and ssthresh values from t1 when starting to send data at t2? What alternative would you recommend? Why?

P55. In this problem we investigate whether either UDP or TCP provides a degree of end-point authentication.

a. Consider a server that receives a request within a UDP packet and responds to that request within a UDP packet (for example, as done by a DNS server). If a client with IP address X spoofs its address with address Y, where will the server send its response?

b. Suppose a server receives a SYN with IP source address Y, and after responding with a SYNACK, receives an ACK with IP source address Y with the correct acknowledgment number.
Assuming the server chooses a random initial sequence number and there is no "man-in-the-middle," can the server be certain that the client is indeed at Y (and not at some other address X that is spoofing Y)?

P56. In this problem, we consider the delay introduced by the TCP slow-start phase. Consider a client and a Web server directly connected by one link of rate R. Suppose the client wants to retrieve an object whose size is exactly equal to 15 S, where S is the maximum segment size (MSS). Denote the round-trip time between client and server as RTT (assumed to be constant). Ignoring protocol headers, determine the time to retrieve the object (including TCP connection establishment) when

a. 4 S/R > S/R + RTT > 2 S/R

b. S/R + RTT > 4 S/R

c. S/R > RTT.

Programming Assignments

Implementing a Reliable Transport Protocol

In this laboratory programming assignment, you will be writing the sending and receiving transport-level code for implementing a simple reliable data transfer protocol. There are two versions of this lab, the alternating-bit-protocol version and the GBN version. This lab should be fun---your implementation will differ very little from what would be required in a real-world situation. Since you probably don't have standalone machines (with an OS that you can modify), your code will have to execute in a simulated hardware/software environment. However, the programming interface provided to your routines---the code that would call your entities from above and from below---is very close to what is done in an actual UNIX environment. (Indeed, the software interfaces described in this programming assignment are much more realistic than the infinite loop senders and receivers that many texts describe.) Stopping and starting timers are also simulated, and timer interrupts will cause your timer handling routine to be activated.
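As a warm-up for the lab, the core alternating-bit logic can be sketched as a self-contained simulation. Everything below (the function names, the dictionary packet format, the loss model, the loop structure) is illustrative only and is not the lab's actual event-driven API; for brevity the simulated channel here only loses packets, it never corrupts them.

```python
# A minimal alternating-bit (stop-and-wait) sketch over a lossy channel.
# Hypothetical names throughout; the lab's real interface is event-driven.
import random

def checksum(seq, data):
    return (seq + sum(data.encode())) % 256

def make_pkt(seq, data):
    return {"seq": seq, "data": data, "csum": checksum(seq, data)}

def corrupt(pkt):
    return pkt["csum"] != checksum(pkt["seq"], pkt["data"])

def unreliable_send(pkt, loss_prob, rng):
    """Simulated channel: returns None on loss, else the packet."""
    return None if rng.random() < loss_prob else pkt

def transfer(messages, loss_prob=0.3, seed=42):
    rng = random.Random(seed)
    delivered, seq, expected = [], 0, 0
    for msg in messages:
        while True:                      # retransmit until ACKed
            pkt = unreliable_send(make_pkt(seq, msg), loss_prob, rng)
            if pkt is None or corrupt(pkt):
                continue                 # "timeout": resend the packet
            if pkt["seq"] == expected:   # receiver: deliver new data once
                delivered.append(pkt["data"])
                expected = 1 - expected  # duplicates are ACKed, not delivered
            ack = unreliable_send({"ack": pkt["seq"]}, loss_prob, rng)
            if ack is not None and ack["ack"] == seq:
                seq = 1 - seq            # ACK for current seq: advance
                break
    return delivered

print(transfer(["a", "b", "c"]))  # -> ['a', 'b', 'c']
```

Even with a lossy ACK path, the sequence-number toggle guarantees each message is delivered upward exactly once, which is the invariant your FSM-based lab solution must preserve.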
The full lab assignment, as well as code you will need to compile with your own code, are available at this book's Web site: www.pearsonhighered.com/cs-resources.

Wireshark Lab: Exploring TCP

In this lab, you'll use your Web browser to access a file from a Web server. As in earlier Wireshark labs, you'll use Wireshark to capture the packets arriving at your computer. Unlike earlier labs, you'll also be able to download a Wireshark-readable packet trace from the Web server from which you downloaded the file. In this server trace, you'll find the packets that were generated by your own access of the Web server. You'll analyze the client- and server-side traces to explore aspects of TCP. In particular, you'll evaluate the performance of the TCP connection between your computer and the Web server. You'll trace TCP's window behavior, and infer packet loss, retransmission, flow control and congestion control behavior, and estimated round-trip time. As is the case with all Wireshark labs, the full description of this lab is available at this book's Web site, www.pearsonhighered.com/cs-resources.

Wireshark Lab: Exploring UDP

In this short lab, you'll do a packet capture and analysis of your favorite application that uses UDP (for example, DNS or a multimedia application such as Skype). As we learned in Section 3.3, UDP is a simple, no-frills transport protocol. In this lab, you'll investigate the header fields in the UDP segment as well as the checksum calculation. As is the case with all Wireshark labs, the full description of this lab is available at this book's Web site, www.pearsonhighered.com/cs-resources.

AN INTERVIEW WITH... Van Jacobson

Van Jacobson works at Google and was previously a Research Fellow at PARC. Prior to that, he was co-founder and Chief Scientist of Packet Design. Before that, he was Chief Scientist at Cisco.
Before joining Cisco, he was head of the Network Research Group at Lawrence Berkeley National Laboratory and taught at UC Berkeley and Stanford. Van received the ACM SIGCOMM Award in 2001 for outstanding lifetime contribution to the field of communication networks and the IEEE Kobayashi Award in 2002 for "contributing to the understanding of network congestion and developing congestion control mechanisms that enabled the successful scaling of the Internet". He was elected to the U.S. National Academy of Engineering in 2004.

Please describe one or two of the most exciting projects you have worked on during your career. What were the biggest challenges?

School teaches us lots of ways to find answers. In every interesting problem I've worked on, the challenge has been finding the right question. When Mike Karels and I started looking at TCP congestion, we spent months staring at protocol and packet traces asking "Why is it failing?". One day in Mike's office, one of us said "The reason I can't figure out why it fails is because I don't understand how it ever worked to begin with." That turned out to be the right question and it forced us to figure out the "ack clocking" that makes TCP work. After that, the rest was easy.

More generally, where do you see the future of networking and the Internet?

For most people, the Web is the Internet. Networking geeks smile politely since we know the Web is an application running over the Internet but what if they're right? The Internet is about enabling conversations between pairs of hosts. The Web is about distributed information production and consumption. "Information propagation" is a very general view of communication of which "pairwise conversation" is a tiny subset. We need to move into the larger tent. Networking today deals with broadcast media (radios, PONs, etc.) by pretending it's a point-to-point wire. That's massively inefficient.
Terabits-per-second of data are being exchanged all over the world via thumb drives or smart phones but we don't know how to treat that as "networking". ISPs are busily setting up caches and CDNs to scalably distribute video and audio. Caching is a necessary part of the solution but there's no part of today's networking---from Information, Queuing or Traffic Theory down to the Internet protocol specs---that tells us how to engineer and deploy it. I think and hope that over the next few years, networking will evolve to embrace the much larger vision of communication that underlies the Web.

What people inspired you professionally?

When I was in grad school, Richard Feynman visited and gave a colloquium. He talked about a piece of Quantum theory that I'd been struggling with all semester and his explanation was so simple and lucid that what had been incomprehensible gibberish to me became obvious and inevitable. That ability to see and convey the simplicity that underlies our complex world seems to me a rare and wonderful gift.

What are your recommendations for students who want careers in computer science and networking?

It's a wonderful field---computers and networking have probably had more impact on society than any invention since the book. Networking is fundamentally about connecting stuff, and studying it helps you make intellectual connections: Ant foraging & Bee dances demonstrate protocol design better than RFCs, traffic jams or people leaving a packed stadium are the essence of congestion, and students finding flights back to school in a post-Thanksgiving blizzard are the core of dynamic routing. If you're interested in lots of stuff and want to have an impact, it's hard to imagine a better field.

Chapter 4 The Network Layer: Data Plane

We learned in the previous chapter that the transport layer provides various forms of process-to-process communication by relying on the network layer's host-to-host communication service.
We also learned that the transport layer does so without any knowledge about how the network layer actually implements this service. So perhaps you're now wondering, what's under the hood of the host-to-host communication service, what makes it tick?

In this chapter and the next, we'll learn exactly how the network layer can provide its host-to-host communication service. We'll see that unlike the transport and application layers, there is a piece of the network layer in each and every host and router in the network. Because of this, network-layer protocols are among the most challenging (and therefore among the most interesting!) in the protocol stack. Since the network layer is arguably the most complex layer in the protocol stack, we'll have a lot of ground to cover here. Indeed, there is so much to cover that we cover the network layer in two chapters. We'll see that the network layer can be decomposed into two interacting parts, the data plane and the control plane. In Chapter 4, we'll first cover the data plane functions of the network layer---the per-router functions in the network layer that determine how a datagram (that is, a network-layer packet) arriving on one of a router's input links is forwarded to one of that router's output links. We'll cover both traditional IP forwarding (where forwarding is based on a datagram's destination address) and generalized forwarding (where forwarding and other functions may be performed using values in several different fields in the datagram's header). We'll study the IPv4 and IPv6 protocols and addressing in detail. In Chapter 5, we'll cover the control plane functions of the network layer---the network-wide logic that controls how a datagram is routed among routers along an end-to-end path from the source host to the destination host. We'll cover routing algorithms, as well as routing protocols, such as OSPF and BGP, that are in widespread use in today's Internet.
Traditionally, these control-plane routing protocols and data-plane forwarding functions have been implemented together, monolithically, within a router. Software-defined networking (SDN) explicitly separates the data plane and control plane by implementing these control plane functions as a separate service, typically in a remote "controller." We'll also cover SDN controllers in Chapter 5. This distinction between data-plane and control-plane functions in the network layer is an important concept to keep in mind as you learn about the network layer---it will help structure your thinking about the network layer and reflects a modern view of the network layer's role in computer networking.

4.1 Overview of Network Layer

Figure 4.1 shows a simple network with two hosts, H1 and H2, and several routers on the path between H1 and H2. Let's suppose that H1 is sending information to H2, and consider the role of the network layer in these hosts and in the intervening routers. The network layer in H1 takes segments from the transport layer in H1, encapsulates each segment into a datagram, and then sends the datagrams to its nearby router, R1. At the receiving host, H2, the network layer receives the datagrams from its nearby router R2, extracts the transport-layer segments, and delivers the segments up to the transport layer at H2. The primary data-plane role of each router is to forward datagrams from its input links to its output links; the primary role of the network control plane is to coordinate these local, per-router forwarding actions so that datagrams are ultimately transferred end-to-end, along paths of routers between source and destination hosts. Note that the routers in Figure 4.1 are shown with a truncated protocol stack, that is, with no upper layers above the network layer, because routers do not run application- and transport-layer protocols such as those we examined in Chapters 2 and 3.
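The encapsulation and decapsulation steps just described can be illustrated with a toy sketch. The Datagram class and its field names are invented for illustration (a real IP datagram carries many more header fields, as we'll see in Section 4.3); the point is only that the network layer wraps the transport segment unchanged on the way down and unwraps it on the way up.

```python
# Toy illustration of network-layer encapsulation/decapsulation.
# Not real IP: field names and types are hypothetical.
from dataclasses import dataclass

@dataclass
class Datagram:
    src: str        # sending host's address (e.g., H1)
    dst: str        # destination host's address (e.g., H2)
    payload: bytes  # the transport-layer segment, carried unchanged

def encapsulate(segment: bytes, src: str, dst: str) -> Datagram:
    """At the sending host: wrap a transport segment in a datagram."""
    return Datagram(src=src, dst=dst, payload=segment)

def decapsulate(dgram: Datagram) -> bytes:
    """At the receiving host: extract the segment for the transport layer."""
    return dgram.payload

segment = b"transport segment from H1"
d = encapsulate(segment, src="H1", dst="H2")
assert decapsulate(d) == segment  # the segment arrives at H2 intact
```

The routers along the path examine only the datagram's header fields (here src/dst); the payload is opaque to them, which is why routers need no transport or application layers.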
4.1.1 Forwarding and Routing: The Data and Control Planes

The primary role of the network layer is deceptively simple---to move packets from a sending host to a receiving host. To do so, two important network-layer functions can be identified:

Forwarding. When a packet arrives at a router's input link, the router must move the packet to the appropriate output link. For example, a packet arriving from Host H1 to Router R1 in Figure 4.1 must be forwarded to the next router on a path to H2. As we will see, forwarding is but one function (albeit the most common and important one!) implemented in the data plane. In the more general case, which we'll cover in Section 4.4, a packet might also be blocked from exiting a router (e.g., if the packet originated at a known malicious sending host, or if the packet were destined to a forbidden destination host), or might be duplicated and sent over multiple outgoing links.

Figure 4.1 The network layer

Routing. The network layer must determine the route or path taken by packets as they flow from a sender to a receiver. The algorithms that calculate these paths are referred to as routing algorithms. A routing algorithm would determine, for example, the path along which packets flow from H1 to H2 in Figure 4.1. Routing is implemented in the control plane of the network layer.

The terms forwarding and routing are often used interchangeably by authors discussing the network layer. We'll use these terms much more precisely in this book. Forwarding refers to the router-local action of transferring a packet from an input link interface to the appropriate output link interface. Forwarding takes place at very short timescales (typically a few nanoseconds), and thus is typically implemented in hardware. Routing refers to the network-wide process that determines the end-to-end paths that packets take from source to destination.
Routing takes place on much longer timescales (typically seconds), and as we will see is often implemented in software. Using our driving analogy, consider the trip from Pennsylvania to Florida undertaken by our traveler back in Section 1.3.1. During this trip, our driver passes through many interchanges en route to Florida. We can think of forwarding as the process of getting through a single interchange: A car enters the interchange from one road and determines which road it should take to leave the interchange. We can think of routing as the process of planning the trip from Pennsylvania to Florida: Before embarking on the trip, the driver has consulted a map and chosen one of many paths possible, with each path consisting of a series of road segments connected at interchanges.

A key element in every network router is its forwarding table. A router forwards a packet by examining the value of one or more fields in the arriving packet's header, and then using these header values to index into its forwarding table. The value stored in the forwarding table entry for those values indicates the outgoing link interface at that router to which that packet is to be forwarded. For example, in Figure 4.2, a packet with header field value of 0110 arrives to a router. The router indexes into its forwarding table and determines that the output link interface for this packet is interface 2. The router then internally forwards the packet to interface 2. In Section 4.2, we'll look inside a router and examine the forwarding function in much greater detail. Forwarding is the key function performed by the data-plane functionality of the network layer.

Control Plane: The Traditional Approach

But now you are undoubtedly wondering how a router's forwarding tables are configured in the first place. This is a crucial issue, one that exposes the important interplay between forwarding (in data plane) and routing (in control plane).
As shown in Figure 4.2, the routing algorithm determines the contents
-of the routers' forwarding tables.
-
-Figure 4.2 Routing algorithms determine values in forwarding tables
-
-In this example, a routing algorithm runs in each and every router and
-both forwarding and routing functions are contained within a router. As
-we'll see in Sections 5.3 and 5.4, the routing algorithm function in one
-router communicates with the routing algorithm function in other routers
-to compute the values for its forwarding table. How is this
-communication performed? By exchanging routing messages containing
-routing information according to a routing protocol! We'll cover routing
-algorithms and protocols in Sections 5.2 through 5.4. The distinct and
-different purposes of the forwarding and routing functions can be
-further illustrated by considering the hypothetical (and unrealistic,
-but technically feasible) case of a network in which all forwarding
-tables are configured directly by human network operators physically
-present at the routers. In this case, no routing protocols would be
-required! Of course, the human operators would need to interact with
-each other to ensure that the forwarding tables were configured in such
-a way that packets reached their intended destinations. It's also likely
-that human configuration would be more error-prone and much slower to
-respond to changes in the network topology than a routing protocol.
-We're thus fortunate that all networks have both a forwarding and a
-routing function! Control Plane: The SDN Approach The approach to
-implementing routing functionality shown in Figure 4.2---with each
-router having a routing component that communicates with the routing
-component of other routers---has been the traditional approach adopted
-by router vendors in their products, at least until recently. 
Our observation that humans could manually configure forwarding tables
-does suggest, however, that there may be other ways for control-plane
-functionality to determine the contents of the data-plane forwarding
-tables. Figure 4.3 shows an alternate approach in which a physically
-separate (from the routers), remote controller computes and distributes
-the forwarding tables to be used by each and every router. Note that the
-data plane components of Figures 4.2 and 4.3 are identical. In Figure
-4.3, however, control-plane routing functionality is separated from the
-physical router---the routing device performs forwarding only, while the
-remote controller computes and distributes forwarding tables.
-
-Figure 4.3 A remote controller determines and distributes values in
-forwarding tables
-
-The remote controller might be implemented in a remote data center with
-high reliability and redundancy, and might be managed by the ISP or some
-third party. How might the routers and the remote controller
-communicate? By exchanging messages containing forwarding tables and
-other pieces of routing information. The control-plane approach shown in
-Figure 4.3 is at the heart of software-defined networking (SDN), where
-the network is "software-defined" because the controller that computes
-forwarding tables and interacts with routers is implemented in software.
-Increasingly, these software implementations are also open, i.e.,
-similar to Linux OS code, the code is publicly available, allowing ISPs
-(and networking researchers and students!) to innovate and propose
-changes to the software that controls network-layer functionality. We
-will cover the SDN control plane in Section 5.5.
-
-4.1.2 Network Service Model Before delving into the network layer's
-data plane, let's wrap up our introduction by taking the broader view
-and considering the different types of service that might be offered by
-the network layer. 
When the transport layer at a sending host transmits a -packet into the network (that is, passes it down to the network layer at -the sending host), can the transport layer rely on the network layer to -deliver the packet to the destination? When multiple packets are sent, -will they be delivered to the transport layer in the receiving host in -the order in which they were sent? Will the amount of time between the -sending of two sequential packet transmissions be the same as the amount -of time between their reception? Will the network provide any feedback -about congestion in the network? The answers to these questions and -others are determined by the service model provided by the network -layer. The network service model defines the characteristics of -end-to-end delivery of packets between sending and receiving hosts. -Let's now consider some possible services that the network layer could -provide. These services could include: Guaranteed delivery. This service -guarantees that a packet sent by a source host will eventually arrive at -the destination host. Guaranteed delivery with bounded delay. This -service not only guarantees delivery of the packet, but delivery within -a specified host-to-host delay bound (for example, within 100 msec). -In-order packet delivery. This service guarantees that packets arrive at -the destination in the order that they were sent. Guaranteed minimal -bandwidth. This network-layer service emulates the behavior of a -transmission link of a specified bit rate (for example, 1 Mbps) between -sending and receiving hosts. As long as the sending host transmits bits -(as part of packets) at a rate below the specified bit rate, then all -packets are eventually delivered to the destination host. Security. The -network layer could encrypt all datagrams at the source and decrypt them -at the destination, thereby providing confidentiality to all -transport-layer segments. 
This is only a partial list of services that a network layer could
-provide---there are countless variations possible. The Internet's
-network layer provides a single service, known as best-effort service.
-With best-effort service, packets are neither guaranteed to be received
-in the order in which they were sent, nor is their eventual delivery
-even guaranteed. There is no guarantee on the end-to-end delay nor is
-there a minimal bandwidth guarantee. It might appear that best-effort
-service is a euphemism for no service at all---a network that delivered
-no packets to the destination would satisfy the definition of
-best-effort delivery service! Other network architectures have defined
-and implemented service models that go beyond the Internet's best-effort
-service. For example, the ATM network architecture \[MFA Forum 2016,
-Black 1995\] provides for guaranteed in-order delivery, bounded delay,
-and guaranteed minimal bandwidth. There have also been proposed service
-model extensions to the Internet architecture; for example, the Intserv
-architecture \[RFC 1633\] aims to provide end-end delay guarantees and
-congestion-free communication. Interestingly, in spite of these
-well-developed alternatives, the Internet's basic best-effort service
-model combined with adequate bandwidth provisioning has arguably proven
-to be more than "good enough" to enable an amazing range of
-applications, including streaming video services such as Netflix, and
-voice-and-video-over-IP real-time conferencing applications such as
-Skype and FaceTime.
-
-An Overview of Chapter 4 Having now provided an overview of the network
-layer, we'll cover the data-plane component of the network layer in the
-following sections in this chapter. In Section 4.2, we'll dive down into
-the internal hardware operations of a router, including input and output
-packet processing, the router's internal switching mechanism, and packet
-queueing and scheduling. 
In Section 4.3, we'll take a look at traditional IP forwarding, in which
-packets are forwarded to output ports based on their destination IP
-addresses. We'll encounter IP addressing, the celebrated IPv4 and IPv6
-protocols and more. In Section 4.4, we'll cover more generalized
-forwarding, where packets may be forwarded to output ports based on a
-large number of header values (i.e., not only based on destination IP
-address). Packets may be blocked or duplicated at the router, or may
-have certain header field values rewritten---all under software control.
-This more generalized form of packet forwarding is a key component of a
-modern network data plane, including the data plane in software-defined
-networks (SDN). We mention here in passing that the terms forwarding and
-switching are often used interchangeably by computer-networking
-researchers and practitioners; we'll use both terms interchangeably in
-this textbook as well. While we're on the topic of terminology, it's
-also worth mentioning two other terms that are often used
-interchangeably, but that we will use more carefully. We'll reserve the
-term packet switch to mean a general packet-switching device that
-transfers a packet from input link interface to output link interface,
-according to values in a packet's header fields. Some packet switches,
-called link-layer switches (examined in Chapter 6), base their
-forwarding decision on values in the fields of the link-layer frame;
-such switches are thus referred to as link-layer (layer 2) devices.
-Other packet switches, called routers, base their forwarding decision on
-header field values in the network-layer datagram. Routers are thus
-network-layer (layer 3) devices. (To fully appreciate this important
-distinction, you might want to review Section 1.5.2, where we discuss
-network-layer datagrams and link-layer frames and their relationship.) 
Since our focus in this chapter is on the network layer, we'll mostly
-use the term router in place of packet switch.
-
-4.2 What's Inside a Router? Now that we've overviewed the data and
-control planes within the network layer, the important distinction
-between forwarding and routing, and the services and functions of the
-network layer, let's turn our attention to its forwarding function---the
-actual transfer of packets from a router's incoming links to the
-appropriate outgoing links at that router. A high-level view of a
-generic router architecture is shown in Figure 4.4. Four router
-components can be identified:
-
-Figure 4.4 Router architecture
-
-Input ports. An input port performs several key functions. It performs
-the physical layer function of terminating an incoming physical link at
-a router; this is shown in the leftmost box of an input port and the
-rightmost box of an output port in Figure 4.4. An input port also
-performs link-layer functions needed to interoperate with the link layer
-at the other side of the incoming link; this is represented by the
-middle boxes in the input and output ports. Perhaps most crucially, a
-lookup function is also performed at the input port; this will occur in
-the rightmost box of the input port. It is here that the forwarding
-table is consulted to determine the router output port to which an
-arriving packet will be forwarded via the switching fabric. Control
-packets (for example, packets carrying routing protocol information) are
-forwarded from an input port to the routing processor. Note that the
-term "port" here---referring to the physical input and output router
-interfaces---is distinctly different from the software ports associated
-with network applications and sockets discussed in Chapters 2 and 3. 
In practice, the number of ports supported by a router can range from a
-relatively small number in enterprise routers, to hundreds of 10 Gbps
-ports in a router at an ISP's edge, where the number of incoming lines
-tends to be the greatest. The Juniper MX2020 edge router, for example,
-supports up to 960 10 Gbps Ethernet ports, with an overall router system
-capacity of 80 Tbps \[Juniper MX 2020 2016\]. Switching fabric. The
-switching fabric connects the router's input ports to its output ports.
-This switching fabric is completely contained within the router---a
-network inside of a network router! Output ports. An output port stores
-packets received from the switching fabric and transmits these packets
-on the outgoing link by performing the necessary link-layer and
-physical-layer functions. When a link is bidirectional (that is, carries
-traffic in both directions), an output port will typically be paired
-with the input port for that link on the same line card. Routing
-processor. The routing processor performs control-plane functions. In
-traditional routers, it executes the routing protocols (which we'll
-study in Sections 5.3 and 5.4), maintains routing tables and attached
-link state information, and computes the forwarding table for the
-router. In SDN routers, the routing processor is responsible for
-communicating with the remote controller in order to (among other
-activities) receive forwarding table entries computed by the remote
-controller, and install these entries in the router's input ports. The
-routing processor also performs the network management functions that
-we'll study in Section 5.7. A router's input ports, output ports, and
-switching fabric are almost always implemented in hardware, as shown in
-Figure 4.4. To appreciate why a hardware implementation is needed,
-consider that with a 10 Gbps input link and a 64-byte IP datagram, the
-input port has only 51.2 ns to process the datagram before another
-datagram may arrive. 
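The 51.2 ns budget quoted above follows directly from the datagram size and the link rate; a quick sanity check:

```python
# Per-datagram processing budget on a 10 Gbps link carrying 64-byte
# IP datagrams: a new datagram can begin arriving as soon as the
# previous one has finished.
DATAGRAM_BITS = 64 * 8        # 64-byte datagram
LINK_RATE_BPS = 10 * 10**9    # 10 Gbps

budget_ns = DATAGRAM_BITS / LINK_RATE_BPS * 10**9
print(round(budget_ns, 3))  # -> 51.2
```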
If N ports are combined on a line card (as is often done in practice),
-the datagram-processing pipeline must operate N times faster---far too
-fast for software implementation. Forwarding hardware can be implemented
-either using a router vendor's own hardware designs, or constructed
-using purchased merchant-silicon chips (e.g., as sold by companies such
-as Intel and Broadcom). While the data plane operates at the nanosecond
-time scale, a router's control functions---executing the routing
-protocols, responding to attached links that go up or down,
-communicating with the remote controller (in the SDN case) and
-performing management functions---operate at the millisecond or second
-timescale. These control plane functions are thus usually implemented in
-software and execute on the routing processor (typically a traditional
-CPU). Before delving into the details of router internals, let's return
-to our analogy from the beginning of this chapter, where packet
-forwarding was compared to cars entering and leaving an interchange.
-Let's suppose that the interchange is a roundabout, and that as a car
-enters the roundabout, a bit of processing is required. Let's consider
-what information is required for this processing: Destination-based
-forwarding. Suppose the car stops at an entry station and indicates its
-final destination (not at the local roundabout, but the ultimate
-destination of its journey). An attendant at the entry station looks up
-the final destination, determines the roundabout exit that leads to that
-final destination, and tells the driver which roundabout exit to take.
-Generalized forwarding. The attendant could also determine the car's
-exit ramp on the basis of many other factors besides the destination.
-For example, the selected exit ramp might depend on the car's origin,
-such as the state that issued the car's license plate. 
Cars from a certain set of states might be directed to use one exit ramp
-(that leads to the destination via a slow road), while cars from other
-states might be directed to use a different exit ramp (that leads to the
-destination via a superhighway). The same decision might be made based
-on the model, make and year of the car. Or a car not deemed roadworthy
-might be blocked and not be allowed to pass through the roundabout. In
-the case of generalized forwarding, any number of factors may contribute
-to the attendant's choice of the exit ramp for a given car. Once the car
-enters the roundabout (which may be filled with other cars entering from
-other input roads and heading to other roundabout exits), it eventually
-leaves at the prescribed roundabout exit ramp, where it may encounter
-other cars leaving the roundabout at that exit. We can easily recognize
-the principal router components in Figure 4.4 in this analogy---the
-entry road and entry station correspond to the input port (with a lookup
-function to determine the local outgoing port); the roundabout
-corresponds to the switch fabric; and the roundabout exit road
-corresponds to the output port. With this analogy, it's instructive to
-consider where bottlenecks might occur. What happens if cars arrive
-blazingly fast (for example, the roundabout is in Germany or Italy!) but
-the station attendant is slow? How fast must the attendant work to
-ensure there's no backup on an entry road? Even with a blazingly fast
-attendant, what happens if cars traverse the roundabout slowly---can
-backups still occur? And what happens if most of the cars entering at
-all of the roundabout's entrance ramps all want to leave the roundabout
-at the same exit ramp---can backups occur at the exit ramp or elsewhere?
-How should the roundabout operate if we want to assign priorities to
-different cars, or block certain cars from entering the roundabout in
-the first place? 
These are all analogous to critical questions faced by router and switch
-designers. In the following subsections, we'll look at router functions
-in more detail. \[Iyer 2008; Chao 2001; Chuang 2005; Turner 1988;
-McKeown 1997a; Partridge 1998; Serpanos 2011\] provide a discussion of
-specific router architectures. For concreteness and simplicity, we'll
-initially assume in this section that forwarding decisions are based
-only on the packet's destination address, rather than on a generalized
-set of packet header fields. We will cover the case of more generalized
-packet forwarding in Section 4.4.
-
-4.2.1 Input Port Processing and Destination-Based Forwarding
-
-A more detailed view of input processing is shown in Figure 4.5. As just
-discussed, the input port's line-termination function and link-layer
-processing implement the physical and link layers for that individual
-input link. The lookup performed in the input port is central to the
-router's operation---it is here that the router uses the forwarding
-table to look up the output port to which an arriving packet will be
-forwarded via the switching fabric. The forwarding table is either
-computed and updated by the routing processor (using a routing protocol
-to interact with the routing processors in other network routers) or is
-received from a remote SDN controller. The forwarding table is copied
-from the routing processor to the line cards over a separate bus (e.g.,
-a PCI bus) indicated by the dashed line from the routing processor to
-the input line cards in Figure 4.4. With such a shadow copy at each line
-card, forwarding decisions can be made locally, at each input port,
-without invoking the centralized routing processor on a per-packet basis
-and thus avoiding a centralized processing bottleneck. Let's now
-consider the "simplest" case that the output port to which an incoming
-packet is to be switched is based on the packet's destination address. 
-In the case of 32-bit IP addresses, a brute-force implementation of the
-forwarding table would have one entry for every possible destination
-address. Since there are more than 4 billion possible addresses, this
-option is totally out of the question.
-
-Figure 4.5 Input port processing
-
-As an example of how this issue of scale can be handled, let's suppose
-that our router has four links, numbered 0 through 3, and that packets
-are to be forwarded to the link interfaces as follows:
-
-| Destination Address Range | Link Interface |
-|---|---|
-| 11001000 00010111 00010000 00000000 through 11001000 00010111 00010111 11111111 | 0 |
-| 11001000 00010111 00011000 00000000 through 11001000 00010111 00011000 11111111 | 1 |
-| 11001000 00010111 00011001 00000000 through 11001000 00010111 00011111 11111111 | 2 |
-| Otherwise | 3 |
-
-Clearly, for this example, it is not necessary to have 4 billion entries
-in the router's forwarding table. We could, for example, have the
-following forwarding table with just four entries:
-
-| Prefix | Link Interface |
-|---|---|
-| 11001000 00010111 00010 | 0 |
-| 11001000 00010111 00011000 | 1 |
-| 11001000 00010111 00011 | 2 |
-| Otherwise | 3 |
-
-With this style of forwarding table, the router matches a prefix of the
-packet's destination address with the entries in the table; if there's a
-match, the router forwards the packet to a link associated with the
-match. For example, suppose the packet's destination address is
-11001000 00010111 00010110 10100001; because the 21-bit prefix of this
-address matches the first entry in the table, the router forwards the
-packet to link interface 0. If a prefix doesn't match any of the first
-three entries, then the router forwards the packet to the default
-interface 3. Although this sounds simple enough, there's a very
-important subtlety here. You may have noticed that it is possible for a
-destination address to match more than one entry. 
For example, the first 24 bits of the -address 11001000 00010111 00011000 10101010 match the second entry in -the table, and the first 21 bits of the address match the third entry in -the table. When there are multiple matches, the router uses the longest -prefix matching rule; that is, it finds the longest matching entry in -the table and forwards the packet to the link interface associated with -the longest prefix match. We'll see exactly why this longest -prefix-matching rule is used when we study Internet addressing in more -detail in Section 4.3. - -Given the existence of a forwarding table, lookup is conceptually -simple---hardware logic just searches through the forwarding table -looking for the longest prefix match. But at Gigabit transmission rates, -this lookup must be performed in nanoseconds (recall our earlier example -of a 10 Gbps link and a 64-byte IP datagram). Thus, not only must lookup -be performed in hardware, but techniques beyond a simple linear search -through a large table are needed; surveys of fast lookup algorithms can -be found in \[Gupta 2001, Ruiz-Sanchez 2001\]. Special attention must -also be paid to memory access times, resulting in designs with embedded -on-chip DRAM and faster SRAM (used as a DRAM cache) memories. In -practice, Ternary Content Addressable Memories (TCAMs) are also often -used for lookup \[Yu 2004\]. With a TCAM, a 32-bit IP address is -presented to the memory, which returns the content of the forwarding -table entry for that address in essentially constant time. The Cisco -Catalyst 6500 and 7600 Series routers and switches can hold upwards of a -million TCAM forwarding table entries \[Cisco TCAM 2014\]. Once a -packet's output port has been determined via the lookup, the packet can -be sent into the switching fabric. In some designs, a packet may be -temporarily blocked from entering the switching fabric if packets from -other input ports are currently using the fabric. 
A blocked packet will be queued at the input port and then scheduled to
-cross the fabric at a later point in time. We'll take a closer look at
-the blocking, queuing, and scheduling of packets (at both input ports
-and output ports) shortly. Although "lookup" is arguably the most
-important action in input port processing, many other actions must be
-taken: (1) physical- and link-layer processing must occur, as discussed
-previously; (2) the packet's version number, checksum and time-to-live
-field---all of which we'll study in Section 4.3---must be checked and
-the latter two fields rewritten; and (3) counters used for network
-management (such as the number of IP datagrams received) must be
-updated. Let's close our discussion of input port processing by noting
-that the input port steps of looking up a destination IP address
-("match") and then sending the packet into the switching fabric to the
-specified output port ("action") is a specific case of a more general
-"match plus action" abstraction that is performed in many networked
-devices, not just routers. In link-layer switches (covered in Chapter
-6), link-layer destination addresses are looked up and several actions
-may be taken in addition to sending the frame into the switching fabric
-towards the output port. In firewalls (covered in Chapter 8)---devices
-that filter out selected incoming packets---an incoming packet whose
-header matches a given criterion (e.g., a combination of
-source/destination IP addresses and transport-layer port numbers) may be
-dropped (action). In a network address translator (NAT, covered in
-Section 4.3), an incoming packet whose transport-layer port number
-matches a given value will have its port number rewritten before
-forwarding (action). Indeed, the "match plus action" abstraction is both
-powerful and prevalent in network devices today, and is central to the
-notion of generalized forwarding that we'll study in Section 4.4. 
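The "match plus action" pattern, with longest-prefix match as the match step, can be sketched in a few lines. This reuses the four-entry prefix table from the example above; the linear scan is purely illustrative (real routers use TCAMs or trie-based hardware lookups):

```python
# Longest-prefix "match" followed by a forwarding "action": return the
# output link interface for a destination address, using the four-entry
# example table. Bit strings are used for readability; the linear scan
# below is illustrative only, not how router hardware performs lookups.
PREFIX_TABLE = [
    ("110010000001011100010", 0),     # 11001000 00010111 00010*
    ("110010000001011100011000", 1),  # 11001000 00010111 00011000*
    ("110010000001011100011", 2),     # 11001000 00010111 00011*
]
DEFAULT_INTERFACE = 3  # the "otherwise" entry

def lookup(dest_addr: str) -> int:
    """Return the link interface of the longest matching prefix."""
    matches = [(len(prefix), iface) for prefix, iface in PREFIX_TABLE
               if dest_addr.startswith(prefix)]
    return max(matches)[1] if matches else DEFAULT_INTERFACE

# 11001000 00010111 00010110 10100001: only the 21-bit entry matches.
print(lookup("11001000000101110001011010100001"))  # -> 0
# 11001000 00010111 00011000 10101010: the 24-bit match beats the 21-bit.
print(lookup("11001000000101110001100010101010"))  # -> 1
```

Swapping in a different action (drop, rewrite, duplicate) turns this same match-plus-action skeleton into the firewall and NAT examples mentioned above.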
-
-4.2.2 Switching The switching fabric is at the very heart of a router,
-as it is through this fabric that the packets are actually switched
-(that is, forwarded) from an input port to an output port. Switching can
-be accomplished in a number of ways, as shown in Figure 4.6:
-
-Figure 4.6 Three switching techniques
-
-Switching via memory. The simplest, earliest routers were traditional
-computers, with switching between input and output ports being done
-under direct control of the CPU (routing processor). Input and output
-ports functioned as traditional I/O devices in a traditional operating
-system. An input port with an arriving packet first signaled the routing
-processor via an interrupt. The packet was then copied from the input
-port into processor memory. The routing processor then extracted the
-destination address from the header, looked up the appropriate output
-port in the forwarding table, and copied the packet to the output port's
-buffers. In this scenario, if the memory bandwidth is such that a
-maximum of B packets per second can be written into, or read from,
-memory, then the overall forwarding throughput (the total rate at which
-packets are transferred from input ports to output ports) must be less
-than B/2. Note also that two packets cannot be forwarded at the same
-time, even if they have different destination ports, since only one
-memory read/write can be done at a time over the shared system bus. Some
-modern routers switch via memory. A major difference from early routers,
-however, is that the lookup of the destination address and the storing
-of the packet into the appropriate memory location are performed by
-processing on the input line cards. In some ways, routers that switch
-via memory look very much like shared-memory multiprocessors, with the
-processing on a line card switching (writing) packets into the memory of
-the appropriate output port. 
Cisco's Catalyst 8500 series switches \[Cisco 8500 2016\] internally
-switch packets via a shared memory. Switching via a bus. In this
-approach, an input port transfers a packet directly to the output port
-over a shared bus, without intervention by the routing processor. This
-is typically done by having the input port pre-pend a switch-internal
-label (header) to the packet indicating the local output port to which
-this packet is being transferred and transmitting the packet onto the
-bus. All output ports receive the packet, but only the port that matches
-the label will keep the packet. The label is then removed at the output
-port, as this label is only used within the switch to cross the bus. If
-multiple packets arrive to the router at the same time, each at a
-different input port, all but one must wait since only one packet can
-cross the bus at a time. Because every packet must cross the single bus,
-the switching speed of the router is limited to the bus speed; in our
-roundabout analogy, this is as if the roundabout could only contain one
-car at a time. Nonetheless, switching via a bus is often sufficient for
-routers that operate in small local area and enterprise networks. The
-Cisco 6500 router \[Cisco 6500 2016\] internally switches packets over a
-32-Gbps backplane bus. Switching via an interconnection network. One way
-to overcome the bandwidth limitation of a single, shared bus is to use a
-more sophisticated interconnection network, such as those that have been
-used in the past to interconnect processors in a multiprocessor computer
-architecture. A crossbar switch is an interconnection network consisting
-of 2N buses that connect N input ports to N output ports, as shown in
-Figure 4.6. Each vertical bus intersects each horizontal bus at a
-crosspoint, which can be opened or closed at any time by the switch
-fabric controller (whose logic is 
When a packet arrives from port A -and needs to be forwarded to port Y, the switch controller closes the -crosspoint at the intersection of busses A and Y, and port A then sends -the packet onto its bus, which is picked up (only) by bus Y. Note that a -packet from port B can be forwarded to port X at the same time, since -the A-to-Y and B-to-X packets use different input and output busses. -Thus, unlike the previous two switching approaches, crossbar switches -are capable of forwarding multiple packets in parallel. A crossbar -switch is non-blocking---a packet being forwarded to an output port will -not be blocked from reaching that output port as long as no other packet -is currently being forwarded to that output port. However, if two -packets from two different input ports are destined to that same output -port, then one will have to wait at the input, since only one packet can -be sent over any given bus at a time. Cisco 12000 series switches -\[Cisco 12000 2016\] use a crossbar switching network; the Cisco 7600 -series can be configured to use either a bus or crossbar switch \[Cisco -7600 2016\]. More sophisticated interconnection networks use multiple -stages of switching elements to allow packets from different input ports -to proceed towards the same output port at the same time through the -multi-stage switching fabric. See \[Tobagi 1990\] for a survey of switch -architectures. The Cisco CRS employs a three-stage non-blocking -switching strategy. A router's switching capacity can also be scaled by -running multiple switching fabrics in parallel. In this approach, input -ports and output ports are connected to N switching fabrics that operate -in parallel. An input port breaks a packet into K smaller chunks, and -sends ("sprays") the chunks through K of these N switching fabrics to -the selected output port, which reassembles the K chunks back into the -original packet. 
-
-4.2.3 Output Port Processing Output port processing, shown in Figure
-4.7, takes packets that have been stored in the output port's memory and
-transmits them over the output link. This includes selecting and
-de-queueing packets for transmission, and performing the needed
-link-layer and physical-layer transmission functions.
-
-4.2.4 Where Does Queuing Occur? If we consider input and output port
-functionality and the configurations shown in Figure 4.6, it's clear
-that packet queues may form at both the input ports and the output
-ports, just as we identified cases where cars may wait at the inputs and
-outputs of the traffic intersection in our roundabout analogy. The
-location and extent of queueing (either at the input port queues or the
-output port queues) will depend on the traffic load, the relative speed
-of the switching fabric, and the line speed. Let's now consider these
-queues in a bit more detail, since as these queues grow large, the
-router's memory can eventually be exhausted and packet loss will occur
-when no memory is available to store arriving packets. Recall that in
-our earlier discussions, we said that packets were "lost within the
-network" or "dropped at a router." It is here, at these queues within a
-router, where such packets are actually dropped and lost.
-
-Figure 4.7 Output port processing
-
-Suppose that the input and output line speeds (transmission rates) all
-have an identical transmission rate of R_line packets per second, and
-that there are N input ports and N output ports. To further simplify the
-discussion, let's assume that all packets have the same fixed length,
-and that packets arrive to input ports in a synchronous manner. That is,
-the time to send a packet on any link is equal to the time to receive a
-packet on any link, and during such an interval of time, either zero or
-one packet can arrive on an input link. 
Define the switching fabric transfer rate Rswitch as the rate at which packets can be moved from input port to output port. If Rswitch is N times faster than Rline, then only negligible queuing will occur at the input ports. This is because even in the worst case, where all N input lines are receiving packets, and all packets are to be forwarded to the same output port, each batch of N packets (one packet per input port) can be cleared through the switch fabric before the next batch arrives.

Input Queueing

But what happens if the switch fabric is not fast enough (relative to the input line speeds) to transfer all arriving packets through the fabric without delay? In this case, packet queuing can also occur at the input ports, as packets must join input port queues to wait their turn to be transferred through the switching fabric to the output port. To illustrate an important consequence of this queuing, consider a crossbar switching fabric and suppose that (1) all link speeds are identical, (2) that one packet can be transferred from any one input port to a given output port in the same amount of time it takes for a packet to be received on an input link, and (3) packets are moved from a given input queue to their desired output queue in an FCFS manner. Multiple packets can be transferred in parallel, as long as their output ports are different. However, if two packets at the front of two input queues are destined for the same output queue, then one of the packets will be blocked and must wait at the input queue---the switching fabric can transfer only one packet to a given output port at a time. Figure 4.8 shows an example in which two packets (darkly shaded) at the front of their input queues are destined for the same upper-right output port. Suppose that the switch fabric chooses to transfer the packet from the front of the upper-left queue. In this case, the darkly shaded packet in the lower-left queue must wait.
But not only must this darkly shaded packet wait, so too must the lightly shaded packet that is queued behind that packet in the lower-left queue, even though there is no contention for the middle-right output port (the destination for the lightly shaded packet). This phenomenon is known as head-of-the-line (HOL) blocking in an input-queued switch---a queued packet in an input queue must wait for transfer through the fabric (even though its output port is free) because it is blocked by another packet at the head of the line. \[Karol 1987\] shows that due to HOL blocking, the input queue will grow to unbounded length (informally, this is equivalent to saying that significant packet loss will occur) under certain assumptions as soon as the packet arrival rate on the input links reaches only 58 percent of their capacity. A number of solutions to HOL blocking are discussed in \[McKeown 1997\].

Figure 4.8 HOL blocking at an input-queued switch

Output Queueing

Let's next consider whether queueing can occur at a switch's output ports. Suppose that Rswitch is again N times faster than Rline and that packets arriving at each of the N input ports are destined to the same output port. In this case, in the time it takes to send a single packet onto the outgoing link, N new packets will arrive at this output port (one from each of the N input ports). Since the output port can transmit only a single packet in a unit of time (the packet transmission time), the N arriving packets will have to queue (wait) for transmission over the outgoing link. Then N more packets can possibly arrive in the time it takes to transmit just one of the N packets that had just previously been queued. And so on. Thus, packet queues can form at the output ports even when the switching fabric is N times faster than the port line speeds. Eventually, the number of queued packets can grow large enough to exhaust available memory at the output port.
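Karol's 58 percent figure for HOL blocking can be checked with a small Monte Carlo simulation (a sketch under the standard assumptions: saturated inputs, destinations chosen uniformly at random, and a random winner among contending head-of-line packets; the function name and parameters are our own):

```python
import random

def hol_throughput(n_ports=32, slots=20000, seed=1):
    """Estimate the saturation throughput of an N x N input-queued
    crossbar with FIFO input queues: every input always has packets,
    each destined to a uniformly random output port."""
    random.seed(seed)
    # Destination of the packet at the head of each input queue.
    hol = [random.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(slots):
        # Group the head-of-line packets by the output they contend for.
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each output accepts exactly one contending HOL packet per slot.
        for out, inps in contenders.items():
            winner = random.choice(inps)
            delivered += 1
            # The winner's next packet gets a fresh random destination;
            # losing inputs stay blocked behind the same HOL packet.
            hol[winner] = random.randrange(n_ports)
    return delivered / (slots * n_ports)
```

For large N the estimate approaches the 2 − √2 ≈ 0.586 limit derived in \[Karol 1987\]; small switches do somewhat better.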
Figure 4.9 Output port queueing

When there is not enough memory to buffer an incoming packet, a decision must be made to either drop the arriving packet (a policy known as drop-tail) or remove one or more already-queued packets to make room for the newly arrived packet. In some cases, it may be advantageous to drop (or mark the header of) a packet before the buffer is full in order to provide a congestion signal to the sender. A number of proactive packet-dropping and -marking policies (which collectively have become known as active queue management (AQM) algorithms) have been proposed and analyzed \[Labrador 1999, Hollot 2002\]. One of the most widely studied and implemented AQM algorithms is the Random Early Detection (RED) algorithm \[Christiansen 2001; Floyd 2016\]. Output port queuing is illustrated in Figure 4.9. At time t, a packet has arrived at each of the incoming input ports, each destined for the uppermost outgoing port. Assuming identical line speeds and a switch operating at three times the line speed, one time unit later (that is, in the time needed to receive or send a packet), all three original packets have been transferred to the outgoing port and are queued awaiting transmission. In the next time unit, one of these three packets will have been transmitted over the outgoing link. In our example, two new packets have arrived at the incoming side of the switch; one of these packets is destined for this uppermost output port. A consequence of such queuing is that a packet scheduler at the output port must choose one packet, among those queued, for transmission---a topic we'll cover in the following section. Given that router buffers are needed to absorb the fluctuations in traffic load, a natural question to ask is how much buffering is required.
For many years, the rule of thumb \[RFC 3439\] for buffer sizing was that the amount of buffering (B) should be equal to an average round-trip time (RTT, say 250 msec) times the link capacity (C). This result is based on an analysis of the queueing dynamics of a relatively small number of TCP flows \[Villamizar 1994\]. Thus, a 10 Gbps link with an RTT of 250 msec would need an amount of buffering equal to B = RTT · C = 2.5 Gbits of buffers. More recent theoretical and experimental efforts \[Appenzeller 2004\], however, suggest that when there are a large number of TCP flows (N) passing through a link, the amount of buffering needed is B = RTT · C/√N. With a large number of flows typically passing through large backbone router links (see, e.g., \[Fraleigh 2003\]), the value of N can be large, with the decrease in needed buffer size becoming quite significant. \[Appenzeller 2004; Wischik 2005; Beheshti 2008\] provide very readable discussions of the buffer-sizing problem from a theoretical, implementation, and operational standpoint.

4.2.5 Packet Scheduling

Let's now return to the question of determining the order in which queued packets are transmitted over an outgoing link. Since you yourself have undoubtedly had to wait in long lines on many occasions and observed how waiting customers are served, you're no doubt familiar with many of the queueing disciplines commonly used in routers. There is first-come-first-served (FCFS, also known as first-in-first-out, FIFO). The British are famous for patient and orderly FCFS queueing at bus stops and in the marketplace ("Oh, are you queueing?"). Other countries operate on a priority basis, with one class of waiting customers given priority service over other waiting customers. There is also round-robin queueing, where customers are again divided into classes (as in priority queueing) but each class of customer is given service in turn.
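Returning to buffer sizing for a moment, the two rules above differ only by the √N factor, as a quick calculation shows (the function name is ours; the 10 Gbps / 250 msec figures are the example from the text):

```python
import math

def buffer_size_bits(rtt_sec: float, capacity_bps: float, n_flows: int = 1) -> float:
    """With n_flows = 1 this is the classic B = RTT * C rule of thumb;
    with many flows it is B = RTT * C / sqrt(N) [Appenzeller 2004]."""
    return rtt_sec * capacity_bps / math.sqrt(n_flows)

# The text's example: a 10 Gbps link with a 250 msec RTT.
classic = buffer_size_bits(0.250, 10e9)                      # 2.5 Gbits
many_flows = buffer_size_bits(0.250, 10e9, n_flows=10_000)   # 25 Mbits
```

With 10,000 flows the required buffer shrinks by a factor of 100, which is why the √N result matters so much on backbone links.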
First-in-First-Out (FIFO)

Figure 4.10 shows the queuing model abstraction for the FIFO link-scheduling discipline. Packets arriving at the link output queue wait for transmission if the link is currently busy transmitting another packet. If there is not sufficient buffering space to hold the arriving packet, the queue's packet-discarding policy then determines whether the packet will be dropped (lost) or whether other packets will be removed from the queue to make space for the arriving packet, as discussed above. In our discussion below, we'll ignore packet discard. When a packet is completely transmitted over the outgoing link (that is, receives service) it is removed from the queue. The FIFO (also known as first-come-first-served, or FCFS) scheduling discipline selects packets for link transmission in the same order in which they arrived at the output link queue. We're all familiar with FIFO queuing from service centers, where arriving customers join the back of the single waiting line, remain in order, and are then served when they reach the front of the line.

Figure 4.10 FIFO queueing abstraction

Figure 4.11 shows the FIFO queue in operation. Packet arrivals are indicated by numbered arrows above the upper timeline, with the number indicating the order in which the packet arrived. Individual packet departures are shown below the lower timeline. The time that a packet spends in service (being transmitted) is indicated by the shaded rectangle between the two timelines. In our examples here, let's assume that each packet takes three units of time to be transmitted. Under the FIFO discipline, packets leave in the same order in which they arrived. Note that after the departure of packet 4, the link remains idle (since packets 1 through 4 have been transmitted and removed from the queue) until the arrival of packet 5.
Figure 4.11 The FIFO queue in operation

Priority Queuing

Under priority queuing, packets arriving at the output link are classified into priority classes upon arrival at the queue, as shown in Figure 4.12. In practice, a network operator may configure a queue so that packets carrying network management information (e.g., as indicated by the source or destination TCP/UDP port number) receive priority over user traffic; additionally, real-time voice-over-IP packets might receive priority over non-real-time traffic such as SMTP or IMAP e-mail packets. Each priority class typically has its own queue. When choosing a packet to transmit, the priority queuing discipline will transmit a packet from the highest priority class that has a nonempty queue (that is, has packets waiting for transmission). The choice among packets in the same priority class is typically done in a FIFO manner.

Figure 4.12 The priority queueing model

Figure 4.13 illustrates the operation of a priority queue with two priority classes. Packets 1, 3, and 4 belong to the high-priority class, and packets 2 and 5 belong to the low-priority class. Packet 1 arrives and, finding the link idle, begins transmission. During the transmission of packet 1, packets 2 and 3 arrive and are queued in the low- and high-priority queues, respectively. After the transmission of packet 1, packet 3 (a high-priority packet) is selected for transmission over packet 2 (which, even though it arrived earlier, is a low-priority packet). At the end of the transmission of packet 3, packet 2 then begins transmission. Packet 4 (a high-priority packet) arrives during the transmission of packet 2 (a low-priority packet).

Figure 4.13 The priority queue in operation

Figure 4.14 The two-class round robin queue in operation

Under a non-preemptive priority queuing discipline, the transmission of a packet is not interrupted once it has
In this case, packet 4 queues for transmission and begins being -transmitted after the transmission of packet 2 is completed. Round Robin -and Weighted Fair Queuing (WFQ) Under the round robin queuing -discipline, packets are sorted into classes as with priority queuing. -However, rather than there being a strict service priority among -classes, a round robin scheduler alternates service among the classes. -In the simplest form of round robin scheduling, a class 1 packet is -transmitted, followed by a class 2 packet, followed by a class 1 packet, -followed by a class 2 packet, and so on. A so-called work-conserving -queuing discipline will never allow the link to remain idle whenever -there are packets (of any class) queued for transmission. A -work-conserving round robin discipline that looks for a packet of a -given class but finds none will immediately check the next class in the -round robin sequence. Figure 4.14 illustrates the operation of a -two-class round robin queue. In this example, packets 1, 2, and - -4 belong to class 1, and packets 3 and 5 belong to the second class. -Packet 1 begins transmission immediately upon arrival at the output -queue. Packets 2 and 3 arrive during the transmission of packet 1 and -thus queue for transmission. After the transmission of packet 1, the -link scheduler looks for a class 2 packet and thus transmits packet 3. -After the transmission of packet 3, the scheduler looks for a class 1 -packet and thus transmits packet 2. After the transmission of packet 2, -packet 4 is the only queued packet; it is thus transmitted immediately -after packet 2. A generalized form of round robin queuing that has been -widely implemented in routers is the so-called weighted fair queuing -(WFQ) discipline \[Demers 1990; Parekh 1993; Cisco QoS 2016\]. WFQ is -illustrated in Figure 4.15. Here, arriving packets are classified and -queued in the appropriate per-class waiting area. 
As in round robin scheduling, a WFQ scheduler will serve classes in a circular manner---first serving class 1, then serving class 2, then serving class 3, and then (assuming there are three classes) repeating the service pattern. WFQ is also a work-conserving queuing discipline and thus will immediately move on to the next class in the service sequence when it finds an empty class queue.

Figure 4.15 Weighted fair queueing

WFQ differs from round robin in that each class may receive a differential amount of service in any interval of time. Specifically, each class, i, is assigned a weight, wi. Under WFQ, during any interval of time during which there are class i packets to send, class i will then be guaranteed to receive a fraction of service equal to wi/(∑wj), where the sum in the denominator is taken over all classes that also have packets queued for transmission. In the worst case, even if all classes have queued packets, class i will still be guaranteed to receive a fraction wi/(∑wj) of the bandwidth, where in this worst case the sum in the denominator is over all classes. Thus, for a link with transmission rate R, class i will always achieve a throughput of at least R⋅wi/(∑wj). Our description of WFQ has been idealized, as we have not considered the fact that packets are discrete and a packet's transmission will not be interrupted to begin transmission of another packet; \[Demers 1990; Parekh 1993\] discuss this packetization issue.

4.3 The Internet Protocol (IP): IPv4, Addressing, IPv6, and More

Our study of the network layer thus far in Chapter 4---the notion of the data and control plane components of the network layer, our distinction between forwarding and routing, the identification of various network service models, and our look inside a router---has often been without reference to any specific computer network architecture or protocol.
In this section we'll focus on key aspects of the network layer on today's Internet and the celebrated Internet Protocol (IP). There are two versions of IP in use today. We'll first examine the widely deployed IP protocol version 4, which is usually referred to simply as IPv4 \[RFC 791\], in Section 4.3.1. We'll examine IP version 6 \[RFC 2460; RFC 4291\], which has been proposed to replace IPv4, in Section 4.3.5. In between, we'll primarily cover Internet addressing---a topic that might seem rather dry and detail-oriented but we'll see is crucial to understanding how the Internet's network layer works. To master IP addressing is to master the Internet's network layer itself!

Figure 4.16 IPv4 datagram format

4.3.1 IPv4 Datagram Format

Recall that the Internet's network-layer packet is referred to as a datagram. We begin our study of IP with an overview of the syntax and semantics of the IPv4 datagram. You might be thinking that nothing could be drier than the syntax and semantics of a packet's bits. Nevertheless, the datagram plays a central role in the Internet---every networking student and professional needs to see it, absorb it, and master it. (And just to see that protocol headers can indeed be fun to study, check out \[Pomeranz 2010\]). The IPv4 datagram format is shown in Figure 4.16. The key fields in the IPv4 datagram are the following:

Version number. These 4 bits specify the IP protocol version of the datagram. By looking at the version number, the router can determine how to interpret the remainder of the IP datagram. Different versions of IP use different datagram formats. The datagram format for IPv4 is shown in Figure 4.16. The datagram format for the new version of IP (IPv6) is discussed in Section 4.3.5.

Header length.
Because an IPv4 datagram can contain a variable number of options (which are included in the IPv4 datagram header), these 4 bits are needed to determine where in the IP datagram the payload (e.g., the transport-layer segment being encapsulated in this datagram) actually begins. Most IP datagrams do not contain options, so the typical IP datagram has a 20-byte header.

Type of service. The type of service (TOS) bits were included in the IPv4 header to allow different types of IP datagrams to be distinguished from each other. For example, it might be useful to distinguish real-time datagrams (such as those used by an IP telephony application) from non-real-time traffic (for example, FTP). The specific level of service to be provided is a policy issue determined and configured by the network administrator for that router. We also learned in Section 3.7.2 that two of the TOS bits are used for Explicit Congestion Notification.

Datagram length. This is the total length of the IP datagram (header plus data), measured in bytes. Since this field is 16 bits long, the theoretical maximum size of the IP datagram is 65,535 bytes. However, datagrams are rarely larger than 1,500 bytes, which allows an IP datagram to fit in the payload field of a maximally sized Ethernet frame.

Identifier, flags, fragmentation offset. These three fields have to do with so-called IP fragmentation, a topic we will consider shortly. Interestingly, the new version of IP, IPv6, does not allow for fragmentation.

Time-to-live. The time-to-live (TTL) field is included to ensure that datagrams do not circulate forever (due to, for example, a long-lived routing loop) in the network. This field is decremented by one each time the datagram is processed by a router. If the TTL field reaches 0, a router must drop that datagram.

Protocol. This field is typically used only when an IP datagram reaches its final destination.
The value of this field indicates the specific transport-layer protocol to which the data portion of this IP datagram should be passed. For example, a value of 6 indicates that the data portion is passed to TCP, while a value of 17 indicates that the data is passed to UDP. For a list of all possible values, see \[IANA Protocol Numbers 2016\]. Note that the protocol number in the IP datagram has a role that is analogous to the role of the port number field in the transport-layer segment. The protocol number is the glue that binds the network and transport layers together, whereas the port number is the glue that binds the transport and application layers together. We'll see in Chapter 6 that the link-layer frame also has a special field that binds the link layer to the network layer.

Header checksum. The header checksum aids a router in detecting bit errors in a received IP datagram. The header checksum is computed by treating each 2 bytes in the header as a number and summing these numbers using 1s complement arithmetic. As discussed in Section 3.3, the 1s complement of this sum, known as the Internet checksum, is stored in the checksum field. A router computes the header checksum for each received IP datagram and detects an error condition if the checksum carried in the datagram header does not equal the computed checksum. Routers typically discard datagrams for which an error has been detected. Note that the checksum must be recomputed and stored again at each router, since the TTL field, and possibly the options field as well, will change. An interesting discussion of fast algorithms for computing the Internet checksum is \[RFC 1071\]. A question often asked at this point is, why does TCP/IP perform error checking at both the transport and network layers? There are several reasons for this repetition.
First, note that only the IP header is checksummed at the IP layer, while the TCP/UDP checksum is computed over the entire TCP/UDP segment. Second, TCP/UDP and IP do not necessarily both have to belong to the same protocol stack. TCP can, in principle, run over a different network-layer protocol (for example, ATM \[Black 1995\]) and IP can carry data that will not be passed to TCP/UDP.

Source and destination IP addresses. When a source creates a datagram, it inserts its IP address into the source IP address field and inserts the address of the ultimate destination into the destination IP address field. Often the source host determines the destination address via a DNS lookup, as discussed in Chapter 2. We'll discuss IP addressing in detail in Section 4.3.3.

Options. The options fields allow an IP header to be extended. Header options were meant to be used rarely---hence the decision to save overhead by not including the information in options fields in every datagram header. However, the mere existence of options does complicate matters---since datagram headers can be of variable length, one cannot determine a priori where the data field will start. Also, since some datagrams may require options processing and others may not, the amount of time needed to process an IP datagram at a router can vary greatly. These considerations become particularly important for IP processing in high-performance routers and hosts. For these reasons and others, IP options were not included in the IPv6 header, as discussed in Section 4.3.5.

Data (payload). Finally, we come to the last and most important field---the raison d'être for the datagram in the first place! In most circumstances, the data field of the IP datagram contains the transport-layer segment (TCP or UDP) to be delivered to the destination. However, the data field can carry other types of data, such as ICMP messages (discussed in Section 5.6).
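The header checksum computation described in the field list above can be sketched as follows (a minimal implementation in the spirit of \[RFC 1071\]; real IP stacks use heavily optimized versions):

```python
import struct

def internet_checksum(header: bytes) -> int:
    """Internet checksum: sum the header as 16-bit words in 1s
    complement arithmetic (folding carries back in), then take the
    1s complement of the result."""
    if len(header) % 2:
        header += b"\x00"  # pad to an even number of bytes
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:     # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A sender computes the checksum with the checksum field zeroed and stores the result there; a receiver that checksums the header including that field obtains 0 when no bit errors are detected.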
Note that an IP datagram has a total of 20 bytes of header (assuming no options). If the datagram carries a TCP segment, then each (non-fragmented) datagram carries a total of 40 bytes of header (20 bytes of IP header plus 20 bytes of TCP header) along with the application-layer message.

4.3.2 IPv4 Datagram Fragmentation

We'll see in Chapter 6 that not all link-layer protocols can carry network-layer packets of the same size. Some protocols can carry big datagrams, whereas other protocols can carry only little datagrams. For example, Ethernet frames can carry up to 1,500 bytes of data, whereas frames for some wide-area links can carry no more than 576 bytes. The maximum amount of data that a link-layer frame can carry is called the maximum transmission unit (MTU). Because each IP datagram is encapsulated within the link-layer frame for transport from one router to the next router, the MTU of the link-layer protocol places a hard limit on the length of an IP datagram. Having a hard limit on the size of an IP datagram is not much of a problem. What is a problem is that each of the links along the route between sender and destination can use different link-layer protocols, and each of these protocols can have different MTUs. To understand the forwarding issue better, imagine that you are a router that interconnects several links, each running different link-layer protocols with different MTUs. Suppose you receive an IP datagram from one link. You check your forwarding table to determine the outgoing link, and this outgoing link has an MTU that is smaller than the length of the IP datagram. Time to panic---how are you going to squeeze this oversized IP datagram into the payload field of the link-layer frame? The solution is to fragment the payload in the IP datagram into two or more smaller IP datagrams, encapsulate each of these smaller IP datagrams in a separate link-layer frame, and send these frames over the outgoing link.
Each of these smaller datagrams is referred to as a fragment. Fragments need to be reassembled before they reach the transport layer at the destination. Indeed, both TCP and UDP are expecting to receive complete, unfragmented segments from the network layer. The designers of IPv4 felt that reassembling datagrams in the routers would introduce significant complication into the protocol and put a damper on router performance. (If you were a router, would you want to be reassembling fragments on top of everything else you had to do?) Sticking to the principle of keeping the network core simple, the designers of IPv4 decided to put the job of datagram reassembly in the end systems rather than in network routers. When a destination host receives a series of datagrams from the same source, it needs to determine whether any of these datagrams are fragments of some original, larger datagram. If some datagrams are fragments, it must further determine when it has received the last fragment and how the fragments it has received should be pieced back together to form the original datagram. To allow the destination host to perform these reassembly tasks, the designers of IP (version 4) put identification, flag, and fragmentation offset fields in the IP datagram header. When a datagram is created, the sending host stamps the datagram with an identification number as well as source and destination addresses. Typically, the sending host increments the identification number for each datagram it sends. When a router needs to fragment a datagram, each resulting datagram (that is, fragment) is stamped with the
Because IP is an unreliable service, one or -more of the fragments may never arrive at the destination. For this -reason, in order for the destination host to be absolutely sure it has -received the last fragment of - -Figure 4.17 IP fragmentation and reassembly - -the original datagram, the last fragment has a flag bit set to 0, -whereas all the other fragments have this flag bit set to 1. Also, in -order for the destination host to determine whether a fragment is -missing (and also to be able to reassemble the fragments in their proper -order), the offset field is used to specify where the fragment fits -within the original IP datagram. Figure 4.17 illustrates an example. A -datagram of 4,000 bytes (20 bytes of IP header plus 3,980 bytes of IP -payload) arrives at a router and must be forwarded to a link with an MTU -of 1,500 bytes. This implies that the 3,980 data bytes in the original -datagram must be allocated to three separate fragments (each of which is -also an IP datagram). The online material for this book, and the -problems at the end of this chapter will allow you to explore -fragmentation in more detail. Also, on this book's Web site, we provide -a Java applet that generates fragments. You provide the incoming -datagram size, the MTU, and the incoming datagram identification. - -The applet automatically generates the fragments for you. See -http://www.pearsonhighered.com/csresources/. - -4.3.3 IPv4 Addressing We now turn our attention to IPv4 addressing. -Although you may be thinking that addressing must be a straightforward -topic, hopefully by the end of this section you'll be convinced that -Internet addressing is not only a juicy, subtle, and interesting topic -but also one that is of central importance to the Internet. An excellent -treatment of IPv4 addressing can be found in the first chapter in -\[Stewart 1999\]. 
Before discussing IP addressing, however, we'll need to say a few words about how hosts and routers are connected into the Internet. A host typically has only a single link into the network; when IP in the host wants to send a datagram, it does so over this link. The boundary between the host and the physical link is called an interface. Now consider a router and its interfaces. Because a router's job is to receive a datagram on one link and forward the datagram on some other link, a router necessarily has two or more links to which it is connected. The boundary between the router and any one of its links is also called an interface. A router thus has multiple interfaces, one for each of its links. Because every host and router is capable of sending and receiving IP datagrams, IP requires each host and router interface to have its own IP address. Thus, an IP address is technically associated with an interface, rather than with the host or router containing that interface. Each IP address is 32 bits long (equivalently, 4 bytes), and there are thus a total of 2^32 (or approximately 4 billion) possible IP addresses. These addresses are typically written in so-called dotted-decimal notation, in which each byte of the address is written in its decimal form and is separated by a period (dot) from other bytes in the address. For example, consider the IP address 193.32.216.9. The 193 is the decimal equivalent of the first 8 bits of the address; the 32 is the decimal equivalent of the second 8 bits of the address, and so on. Thus, the address 193.32.216.9 in binary notation is

11000001 00100000 11011000 00001001

Each interface on every host and router in the global Internet must have an IP address that is globally unique (except for interfaces behind NATs, as discussed in Section 4.3.4). These addresses cannot be chosen in a willy-nilly manner, however.
A portion of an interface's IP address will be determined by the subnet to which it is connected. Figure 4.18 provides an example of IP addressing and interfaces. In this figure, one router (with three interfaces) is used to interconnect seven hosts. Take a close look at the IP addresses assigned to the host and router interfaces, as there are several things to notice. The three hosts in the upper-left portion of Figure 4.18, and the router interface to which they are connected, all have an IP address of the form 223.1.1.xxx. That is, they all have the same leftmost 24 bits in their IP address. These four interfaces are also interconnected to each other by a network that contains no routers. This network could be interconnected by an Ethernet LAN, in which case the interfaces would be interconnected by an Ethernet switch (as we'll discuss in Chapter 6), or by a wireless access point (as we'll discuss in Chapter 7). We'll represent this routerless network connecting these hosts as a cloud for now, and dive into the internals of such networks in Chapters 6 and 7. In IP terms, this network interconnecting three host interfaces and one router interface forms a subnet \[RFC 950\]. (A subnet is also called an IP network or simply a network in the Internet literature.)

Figure 4.18 Interface addresses and subnets

IP addressing assigns an address to this subnet: 223.1.1.0/24, where the /24 ("slash-24") notation, sometimes known as a subnet mask, indicates that the leftmost 24 bits of the 32-bit quantity define the subnet address. The 223.1.1.0/24 subnet thus consists of the three host interfaces (223.1.1.1, 223.1.1.2, and 223.1.1.3) and one router interface (223.1.1.4). Any additional hosts attached to the 223.1.1.0/24 subnet would be required to have an address of the form 223.1.1.xxx. There are two additional subnets shown in Figure 4.18: the 223.1.2.0/24 network and the 223.1.3.0/24 subnet.
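The dotted-decimal and /24 prefix examples above can be checked mechanically (a minimal sketch; the function names are ours):

```python
def ip_to_int(dotted: str) -> int:
    """Convert dotted-decimal notation to the underlying 32-bit value."""
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def in_subnet(addr: str, subnet: str) -> bool:
    """True if `addr` belongs to `subnet`, written a.b.c.d/x: compare
    the leftmost x bits of the address with the subnet prefix."""
    net, x = subnet.split("/")
    mask = (0xFFFFFFFF << (32 - int(x))) & 0xFFFFFFFF
    return ip_to_int(addr) & mask == ip_to_int(net) & mask
```

For example, `ip_to_int("193.32.216.9")` reproduces the binary expansion shown above, and `in_subnet("223.1.1.2", "223.1.1.0/24")` confirms that host's membership in the first subnet of Figure 4.18.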
Figure 4.19 illustrates the three IP subnets present in Figure 4.18.

Figure 4.19 Subnet addresses

The IP definition of a subnet is not restricted to Ethernet segments that connect multiple hosts to a router interface. To get some insight here, consider Figure 4.20, which shows three routers that are interconnected with each other by point-to-point links. Each router has three interfaces, one for each point-to-point link and one for the broadcast link that directly connects the router to a pair of hosts. What subnets are present here? Three subnets, 223.1.1.0/24, 223.1.2.0/24, and 223.1.3.0/24, are similar to the subnets we encountered in Figure 4.18. But note that there are three additional subnets in this example as well: one subnet, 223.1.9.0/24, for the interfaces that connect routers R1 and R2; another subnet, 223.1.8.0/24, for the interfaces that connect routers R2 and R3; and a third subnet, 223.1.7.0/24, for the interfaces that connect routers R3 and R1. For a general interconnected system of routers and hosts, we can use the following recipe to define the subnets in the system: To determine the subnets, detach each interface from its host or router, creating islands of isolated networks, with interfaces terminating the end points of the isolated networks. Each of these isolated networks is called a subnet. If we apply this procedure to the interconnected system in Figure 4.20, we get six islands or subnets. From the discussion above, it's clear that an organization (such as a company or academic institution) with multiple Ethernet segments and point-to-point links will have multiple subnets, with all of the devices on a given subnet having the same subnet address. In principle, the different subnets could have quite different subnet addresses. In practice, however, their subnet addresses often have much in common. To understand why, let's next turn our attention to how addressing is handled in the global Internet.
The Internet's address assignment strategy is known as Classless Interdomain Routing (CIDR---pronounced "cider") \[RFC 4632\]. CIDR generalizes the notion of subnet addressing. As with subnet addressing, the 32-bit IP address is divided into two parts and again has the dotted-decimal form a.b.c.d/x, where x indicates the number of bits in the first part of the address. The x most significant bits of an address of the form a.b.c.d/x constitute the network portion of the IP address, and are often referred to as the prefix (or network prefix) of the address. An organization is typically assigned a block of contiguous addresses, that is, a range of addresses with a common prefix (see the Principles in Practice feature). In this case, the IP addresses of devices within the organization will share the common prefix.

Figure 4.20 Three routers interconnecting six subnets

When we cover the Internet's BGP routing protocol in Section 5.4, we'll see that only these x leading prefix bits are considered by routers outside the organization's network. That is, when a router outside the organization forwards a datagram whose destination address is inside the organization, only the leading x bits of the address need be considered. This considerably reduces the size of the forwarding table in these routers, since a single entry of the form a.b.c.d/x will be sufficient to forward packets to any destination within the organization. The remaining 32 − x bits of an address can be thought of as distinguishing among the devices within the organization, all of which have the same network prefix. These are the bits that will be considered when forwarding packets at routers within the organization. These lower-order bits may (or may not) have an additional subnetting structure, such as that discussed above.
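Extracting the network portion of an a.b.c.d/x address is a masking operation; a small sketch with the `ipaddress` module (the address 200.23.18.77 is a made-up example):

```python
import ipaddress

# Keeping only the x = 20 leading bits of the address yields the prefix.
iface = ipaddress.ip_interface("200.23.18.77/20")
print(iface.network)                           # 200.23.16.0/20

# Equivalently, mask the 32-bit address by hand:
addr = int(ipaddress.ip_address("200.23.18.77"))
mask = (~0 << (32 - 20)) & 0xFFFFFFFF          # 20 one-bits, then 12 zeros
print(ipaddress.ip_address(addr & mask))       # 200.23.16.0
```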
For example, suppose the first 21 bits of the CIDRized address a.b.c.d/21 specify the organization's network prefix and are common to the IP addresses of all devices in that organization. The remaining 11 bits then identify the specific hosts in the organization. The organization's internal structure might be such that these 11 rightmost bits are used for subnetting within the organization, as discussed above. For example, a.b.c.d/24 might refer to a specific subnet within the organization. Before CIDR was adopted, the network portions of an IP address were constrained to be 8, 16, or 24 bits in length, an addressing scheme known as classful addressing, since subnets with 8-, 16-, and 24-bit subnet addresses were known as class A, B, and C networks, respectively. The requirement that the subnet portion of an IP address be exactly 1, 2, or 3 bytes long turned out to be problematic for supporting the rapidly growing number of organizations with small and medium-sized subnets. A class C (/24) subnet could accommodate only up to 2^8 − 2 = 254 hosts (two of the 2^8 = 256 addresses are reserved for special use)---too small for many organizations. However, a class B (/16) subnet, which supports up to 65,534 hosts, was too large. Under classful addressing, an organization with, say, 2,000 hosts was typically allocated a class B (/16) subnet address. This led to a rapid depletion of the class B address space and poor utilization of the assigned address space. For example, the organization that used a class B address for its 2,000 hosts was allocated enough of the address space for up to 65,534 interfaces---leaving more than 63,000 addresses that could not be used by other organizations.

PRINCIPLES IN PRACTICE

This example of an ISP that connects eight organizations to the Internet nicely illustrates how carefully allocated CIDRized addresses facilitate routing.
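The host-count arithmetic above generalizes to any prefix length; a quick sketch:

```python
# Usable host addresses in a subnet whose prefix is x bits long:
# 2**(32 - x) - 2, since the all-zeros and all-ones host parts are reserved.
def usable_hosts(x: int) -> int:
    return 2 ** (32 - x) - 2

print(usable_hosts(24))   # class C: 254
print(usable_hosts(16))   # class B: 65534
print(usable_hosts(21))   # a /21 block: 2046
```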
Suppose, as shown in Figure 4.21, that the ISP (which we'll call Fly-By-Night-ISP) advertises to the outside world that it should be sent any datagrams whose first 20 address bits match 200.23.16.0/20. The rest of the world need not know that within the address block 200.23.16.0/20 there are in fact eight other organizations, each with its own subnets. This ability to use a single prefix to advertise multiple networks is often referred to as address aggregation (also route aggregation or route summarization). Address aggregation works extremely well when addresses are allocated in blocks to ISPs and then from ISPs to client organizations. But what happens when addresses are not allocated in such a hierarchical manner? What would happen, for example, if Fly-By-Night-ISP acquires ISPs-R-Us and then has Organization 1 connect to the Internet through its subsidiary ISPs-R-Us? As shown in Figure 4.21, the subsidiary ISPs-R-Us owns the address block 199.31.0.0/16, but Organization 1's IP addresses are unfortunately outside of this address block. What should be done here? Certainly, Organization 1 could renumber all of its routers and hosts to have addresses within the ISPs-R-Us address block. But this is a costly solution, and Organization 1 might well be reassigned to another subsidiary in the future. The solution typically adopted is for Organization 1 to keep its IP addresses in 200.23.18.0/23. In this case, as shown in Figure 4.22, Fly-By-Night-ISP continues to advertise the address block 200.23.16.0/20 and ISPs-R-Us continues to advertise 199.31.0.0/16. However, ISPs-R-Us now also advertises the block of addresses for Organization 1, 200.23.18.0/23.
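Routers resolve these overlapping advertisements by longest prefix matching (Section 4.2.1): forward toward the most-specific prefix that matches the destination. A Python sketch using the blocks from this example (the next-hop labels are ours, for illustration):

```python
import ipaddress

# Advertised blocks from the example; values are hypothetical next-hop labels.
routes = {
    ipaddress.ip_network("200.23.16.0/20"): "Fly-By-Night-ISP",
    ipaddress.ip_network("199.31.0.0/16"): "ISPs-R-Us",
    ipaddress.ip_network("200.23.18.0/23"): "ISPs-R-Us (Organization 1)",
}

def longest_prefix_match(dst: str) -> str:
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    # Prefer the most-specific (longest) matching prefix.
    return routes[max(matches, key=lambda net: net.prefixlen)]

# A destination inside Organization 1 matches both /20 and /23; the /23 wins.
print(longest_prefix_match("200.23.18.77"))   # ISPs-R-Us (Organization 1)
print(longest_prefix_match("200.23.16.5"))    # Fly-By-Night-ISP
```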
When other routers in the larger Internet see the address blocks 200.23.16.0/20 (from Fly-By-Night-ISP) and 200.23.18.0/23 (from ISPs-R-Us) and want to route to an address in the block 200.23.18.0/23, they will use longest prefix matching (see Section 4.2.1), and route toward ISPs-R-Us, as it advertises the longest (i.e., most-specific) address prefix that matches the destination address.

Figure 4.21 Hierarchical addressing and route aggregation

Figure 4.22 ISPs-R-Us has a more specific route to Organization 1

We would be remiss if we did not mention yet another type of IP address, the IP broadcast address 255.255.255.255. When a host sends a datagram with destination address 255.255.255.255, the message is delivered to all hosts on the same subnet. Routers optionally forward the message into neighboring subnets as well (although they usually don't). Having now studied IP addressing in detail, we need to know how hosts and subnets get their addresses in the first place. Let's begin by looking at how an organization gets a block of addresses for its devices, and then look at how a device (such as a host) is assigned an address from within the organization's block of addresses.

Obtaining a Block of Addresses

In order to obtain a block of IP addresses for use within an organization's subnet, a network administrator might first contact its ISP, which would provide addresses from a larger block of addresses that had already been allocated to the ISP. For example, the ISP may itself have been allocated the address block 200.23.16.0/20. The ISP, in turn, could divide its address block into eight equal-sized contiguous address blocks and give one of these address blocks out to each of up to eight organizations that are supported by this ISP, as shown below. (Note the common subnet part of these addresses.)
| | Address block | Binary |
|---|---|---|
| ISP's block | 200.23.16.0/20 | **11001000 00010111 0001**0000 00000000 |
| Organization 0 | 200.23.16.0/23 | **11001000 00010111 0001000**0 00000000 |
| Organization 1 | 200.23.18.0/23 | **11001000 00010111 0001001**0 00000000 |
| Organization 2 | 200.23.20.0/23 | **11001000 00010111 0001010**0 00000000 |
| ... | ... | ... |
| Organization 7 | 200.23.30.0/23 | **11001000 00010111 0001111**0 00000000 |

While obtaining a set of addresses from an ISP is one way to get a block of addresses, it is not the only way. Clearly, there must also be a way for the ISP itself to get a block of addresses. Is there a global authority that has ultimate responsibility for managing the IP address space and allocating address blocks to ISPs and other organizations? Indeed there is! IP addresses are managed under the authority of the Internet Corporation for Assigned Names and Numbers (ICANN) \[ICANN 2016\], based on guidelines set forth in \[RFC 7020\]. The role of the nonprofit ICANN organization \[NTIA 1998\] is not only to allocate IP addresses, but also to manage the DNS root servers. It also has the very contentious job of assigning domain names and resolving domain name disputes. ICANN allocates addresses to regional Internet registries (for example, ARIN, RIPE, APNIC, and LACNIC), which together form the Address Supporting Organization of ICANN \[ASO-ICANN 2016\] and handle the allocation/management of addresses within their regions.

Obtaining a Host Address: The Dynamic Host Configuration Protocol

Once an organization has obtained a block of addresses, it can assign individual IP addresses to the host and router interfaces in its organization. A system administrator will typically manually configure the IP addresses into the router (often remotely, with a network management tool). Host addresses can also be configured manually, but typically this is done using the Dynamic Host Configuration Protocol (DHCP) \[RFC 2131\].
DHCP allows a host to obtain (be allocated) an IP address automatically. A network administrator can configure DHCP so that a given host receives the same IP address each time it connects to the network, or a host may be assigned a temporary IP address that will be different each time the host connects to the network. In addition to host IP address assignment, DHCP also allows a host to learn additional information, such as its subnet mask, the address of its first-hop router (often called the default gateway), and the address of its local DNS server. Because of DHCP's ability to automate the network-related aspects of connecting a host into a network, it is often referred to as a plug-and-play or zeroconf (zero-configuration) protocol. This capability makes it very attractive to the network administrator who would otherwise have to perform these tasks manually! DHCP is also enjoying widespread use in residential Internet access networks, enterprise networks, and in wireless LANs, where hosts join and leave the network frequently. Consider, for example, the student who carries a laptop from a dormitory room to a library to a classroom. It is likely that in each location, the student will be connecting into a new subnet and hence will need a new IP address at each location. DHCP is ideally suited to this situation, as there are many users coming and going, and addresses are needed for only a limited amount of time. The value of DHCP's plug-and-play capability is clear, since it's unimaginable that a system administrator would be able to reconfigure laptops at each location, and few students (except those taking a computer networking class!) would have the expertise to configure their laptops manually. DHCP is a client-server protocol. A client is typically a newly arriving host wanting to obtain network configuration information, including an IP address for itself.
In the simplest case, each subnet (in the addressing sense of Figure 4.20) will have a DHCP server. If no server is present on the subnet, a DHCP relay agent (typically a router) that knows the address of a DHCP server for that network is needed. Figure 4.23 shows a DHCP server attached to subnet 223.1.2/24, with the router serving as the relay agent for arriving clients attached to subnets 223.1.1/24 and 223.1.3/24.

Figure 4.23 DHCP client and server

In our discussion below, we'll assume that a DHCP server is available on the subnet. For a newly arriving host, the DHCP protocol is a four-step process, as shown in Figure 4.24 for the network setting shown in Figure 4.23. In this figure, yiaddr (as in "your Internet address") indicates the address being allocated to the newly arriving client. The four steps are: DHCP server discovery. The first task of a newly arriving host is to find a DHCP server with which to interact. This is done using a DHCP discover message, which a client sends within a UDP packet to port 67. The UDP packet is encapsulated in an IP datagram. But to whom should this datagram be sent? The host doesn't even know the IP address of the network to which it is attaching, much less the address of a DHCP server for this network. Given this, the DHCP client creates an IP datagram containing its DHCP discover message along with the broadcast destination IP address of 255.255.255.255 and a "this host" source IP address of 0.0.0.0. The DHCP client passes the IP datagram to the link layer, which then broadcasts this frame to all nodes attached to the subnet (we will cover the details of link-layer broadcasting in Section 6.4). DHCP server offer(s). A DHCP server receiving a DHCP discover message responds to the client with a DHCP offer message that is broadcast to all nodes on the subnet, again using the IP broadcast address of 255.255.255.255. (You might want to think about why this server reply must also be broadcast).
Figure 4.24 DHCP client-server interaction

Since several DHCP servers can be present on the subnet, the client may find itself in the enviable position of being able to choose from among several offers. Each server offer message contains the transaction ID of the received discover message, the proposed IP address for the client, the network mask, and an IP address lease time---the amount of time for which the IP address will be valid. It is common for the server to set the lease time to several hours or days \[Droms 2002\]. DHCP request. The newly arriving client will choose from among one or more server offers and respond to its selected offer with a DHCP request message, echoing back the configuration parameters. DHCP ACK. The server responds to the DHCP request message with a DHCP ACK message, confirming the requested parameters. Once the client receives the DHCP ACK, the interaction is complete and the client can use the DHCP-allocated IP address for the lease duration. Since a client may want to use its address beyond the lease's expiration, DHCP also provides a mechanism that allows a client to renew its lease on an IP address. From a mobility aspect, DHCP does have one very significant shortcoming. Since a new IP address is obtained from DHCP each time a node connects to a new subnet, a TCP connection to a remote application cannot be maintained as a mobile node moves between subnets. In Chapter 7, we will examine mobile IP---an extension to the IP infrastructure that allows a mobile node to use a single permanent address as it moves between subnets. Additional details about DHCP can be found in \[Droms 2002\] and \[dhc 2016\]. An open source reference implementation of DHCP is available from the Internet Systems Consortium \[ISC 2016\].

4.3.4 Network Address Translation (NAT)

Given our discussion about Internet addresses and the IPv4 datagram format, we're now well aware that every IP-capable device needs an IP address.
With the proliferation of small office, home office (SOHO) subnets, this would seem to imply that whenever a SOHO wants to install a LAN to connect multiple machines, a range of addresses would need to be allocated by the ISP to cover all of the SOHO's IP devices (including phones, tablets, gaming devices, IP TVs, printers and more). If the subnet grew bigger, a larger block of addresses would have to be allocated. But what if the ISP had already allocated the contiguous portions of the SOHO network's current address range? And what typical homeowner wants (or should need) to know how to manage IP addresses in the first place? Fortunately, there is a simpler approach to address allocation that has found increasingly widespread use in such scenarios: network address translation (NAT) \[RFC 2663; RFC 3022; Huston 2004, Zhang 2007; Cisco NAT 2016\]. Figure 4.25 shows the operation of a NAT-enabled router. The NAT-enabled router, residing in the home, has an interface that is part of the home network on the right of Figure 4.25. Addressing within the home network is exactly as we have seen above---all four interfaces in the home network have the same subnet address of 10.0.0/24. The address space 10.0.0.0/8 is one of three portions of the IP address space that is reserved in \[RFC 1918\] for a private network or a realm with private addresses, such as the home network in Figure 4.25. A realm with private addresses refers to a network whose addresses only have meaning to devices within that network. To see why this is important, consider the fact that there are hundreds of thousands of home networks, many using the same address space, 10.0.0.0/24. Devices within a given home network can send packets to each other using 10.0.0.0/24 addressing.
However, packets forwarded beyond the home network into the larger global Internet clearly cannot use these addresses (as either a source or a destination address) because there are hundreds of thousands of networks using this block of addresses. That is, the 10.0.0.0/24 addresses can only have meaning within the given home network. But if private addresses only have meaning within a given network, how is addressing handled when packets are sent to or received from the global Internet, where addresses are necessarily unique? The answer lies in understanding NAT.

Figure 4.25 Network address translation

The NAT-enabled router does not look like a router to the outside world. Instead the NAT router behaves to the outside world as a single device with a single IP address. In Figure 4.25, all traffic leaving the home router for the larger Internet has a source IP address of 138.76.29.7, and all traffic entering the home router must have a destination address of 138.76.29.7. In essence, the NAT-enabled router is hiding the details of the home network from the outside world. (As an aside, you might wonder where the home network computers get their addresses and where the router gets its single IP address. Often, the answer is the same---DHCP! The router gets its address from the ISP's DHCP server, and the router runs a DHCP server to provide addresses to computers within the NAT-DHCP-router-controlled home network's address space.) If all datagrams arriving at the NAT router from the WAN have the same destination IP address (specifically, that of the WAN-side interface of the NAT router), then how does the router know the internal host to which it should forward a given datagram? The trick is to use a NAT translation table at the NAT router, and to include port numbers as well as IP addresses in the table entries. Consider the example in Figure 4.25.
Suppose a user sitting in a home network behind host 10.0.0.1 requests a Web page on some Web server (port 80) with IP address 128.119.40.186. The host 10.0.0.1 assigns the (arbitrary) source port number 3345 and sends the datagram into the LAN. The NAT router receives the datagram, generates a new source port number 5001 for the datagram, replaces the source IP address with its WAN-side IP address 138.76.29.7, and replaces the original source port number 3345 with the new source port number 5001. When generating a new source port number, the NAT router can select any source port number that is not currently in the NAT translation table. (Note that because a port number field is 16 bits long, the NAT protocol can support over 60,000 simultaneous connections with a single WAN-side IP address for the router!) NAT in the router also adds an entry to its NAT translation table. The Web server, blissfully unaware that the arriving datagram containing the HTTP request has been manipulated by the NAT router, responds with a datagram whose destination address is the IP address of the NAT router, and whose destination port number is 5001. When this datagram arrives at the NAT router, the router indexes the NAT translation table using the destination IP address and destination port number to obtain the appropriate IP address (10.0.0.1) and destination port number (3345) for the browser in the home network. The router then rewrites the datagram's destination address and destination port number, and forwards the datagram into the home network. NAT has enjoyed widespread deployment in recent years. But NAT is not without detractors. First, one might argue that port numbers are meant to be used for addressing processes, not for addressing hosts.
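The per-connection translation just walked through can be sketched as a small Python program; the table layout and function names here are ours, chosen for illustration, not a real NAT implementation:

```python
# Toy NAT translation table: WAN-side port -> (LAN address, LAN port).
nat_table = {}
next_wan_port = 5001
WAN_IP = "138.76.29.7"

def translate_outgoing(src_ip, src_port):
    """Rewrite an outbound datagram's source address/port and record it."""
    global next_wan_port
    wan_port = next_wan_port
    next_wan_port += 1
    nat_table[wan_port] = (src_ip, src_port)
    return WAN_IP, wan_port

def translate_incoming(dst_port):
    """Look up an inbound datagram's real (LAN) destination."""
    return nat_table[dst_port]

# Host 10.0.0.1:3345 sends a request; the server's reply targets port 5001.
print(translate_outgoing("10.0.0.1", 3345))   # ('138.76.29.7', 5001)
print(translate_incoming(5001))               # ('10.0.0.1', 3345)
```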
This violation can indeed cause problems for servers running on the home network, since, as we have seen in Chapter 2, server processes wait for incoming requests at well-known port numbers and peers in a P2P protocol need to accept incoming connections when acting as servers. Technical solutions to these problems include NAT traversal tools \[RFC 5389\] and Universal Plug and Play (UPnP), a protocol that allows a host to discover and configure a nearby NAT \[UPnP Forum 2016\]. More "philosophical" arguments have also been raised against NAT by architectural purists. Here, the concern is that routers are meant to be layer 3 (i.e., network-layer) devices, and should process packets only up to the network layer. NAT violates the principle that hosts should be talking directly with each other, without interfering nodes modifying IP addresses, much less port numbers. But like it or not, NAT has become an important component of the Internet, as have other so-called middleboxes \[Sekar 2011\] that operate at the network layer but have functions that are quite different from routers. Middleboxes do not perform traditional datagram forwarding, but instead perform functions such as NAT, load balancing of traffic flows, traffic firewalling (see accompanying sidebar), and more. The generalized forwarding paradigm that we'll study shortly in Section 4.4 allows a number of these middlebox functions, as well as traditional router forwarding, to be accomplished in a common, integrated manner.

FOCUS ON SECURITY

INSPECTING DATAGRAMS: FIREWALLS AND INTRUSION DETECTION SYSTEMS

Suppose you are assigned the task of administering a home, departmental, university, or corporate network. Attackers, knowing the IP address range of your network, can easily send IP datagrams to addresses in your range.
These datagrams can do all kinds of devious things, including mapping your network with ping sweeps and port scans, crashing vulnerable hosts with malformed packets, scanning for open TCP/UDP ports on servers in your network, and infecting hosts by including malware in the packets. As the network administrator, what are you going to do about all those bad guys out there, each capable of sending malicious packets into your network? Two popular defense mechanisms against malicious packet attacks are firewalls and intrusion detection systems (IDSs). As a network administrator, you may first try installing a firewall between your network and the Internet. (Most access routers today have firewall capability.) Firewalls inspect the datagram and segment header fields, denying suspicious datagrams entry into the internal network. For example, a firewall may be configured to block all ICMP echo request packets (see Section 5.6), thereby preventing an attacker from doing a traditional ping sweep across your IP address range. Firewalls can also block packets based on source and destination IP addresses and port numbers. Additionally, firewalls can be configured to track TCP connections, granting entry only to datagrams that belong to approved connections. Additional protection can be provided with an IDS. An IDS, typically situated at the network boundary, performs "deep packet inspection," examining not only header fields but also the payloads in the datagram (including application-layer data). An IDS has a database of packet signatures that are known to be part of attacks. This database is automatically updated as new attacks are discovered. As packets pass through the IDS, the IDS attempts to match header fields and payloads to the signatures in its signature database. If such a match is found, an alert is created. An intrusion prevention system (IPS) is similar to an IDS, except that it actually blocks packets in addition to creating alerts.
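Header-based filtering of the kind described above can be sketched as a first-match rule list; the rule format and names here are invented for illustration, not any real firewall's configuration syntax:

```python
# Toy first-match packet filter: each rule matches on protocol and
# (optionally) destination port, ending with a default-deny rule.
RULES = [
    {"proto": "icmp", "action": "deny"},                 # block echo requests (ping sweeps)
    {"proto": "tcp", "dst_port": 80, "action": "allow"}, # admit Web traffic
    {"proto": "any", "action": "deny"},                  # default deny
]

def filter_packet(proto, dst_port=None):
    for rule in RULES:
        if rule["proto"] in (proto, "any") and rule.get("dst_port", dst_port) == dst_port:
            return rule["action"]
    return "deny"

print(filter_packet("icmp"))       # deny
print(filter_packet("tcp", 80))    # allow
print(filter_packet("udp", 53))    # deny
```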
In Chapter 8, we'll explore firewalls and IDSs in more detail. Can firewalls and IDSs fully shield your network from all attacks? The answer is clearly no, as attackers continually find new attacks for which signatures are not yet available. But firewalls and traditional signature-based IDSs are useful in protecting your network from known attacks.

4.3.5 IPv6

In the early 1990s, the Internet Engineering Task Force began an effort to develop a successor to the IPv4 protocol. A prime motivation for this effort was the realization that the 32-bit IPv4 address space was beginning to be used up, with new subnets and IP nodes being attached to the Internet (and being allocated unique IP addresses) at a breathtaking rate. To respond to this need for a large IP address space, a new IP protocol, IPv6, was developed. The designers of IPv6 also took this opportunity to tweak and augment other aspects of IPv4, based on the accumulated operational experience with IPv4. The point in time when IPv4 addresses would be completely allocated (and hence no new networks could attach to the Internet) was the subject of considerable debate. The estimates of the two leaders of the IETF's Address Lifetime Expectations working group were that addresses would become exhausted in 2008 and 2018, respectively \[Solensky 1996\]. In February 2011, IANA allocated the last remaining pool of unassigned IPv4 addresses to a regional registry. While these registries still have available IPv4 addresses within their pool, once these addresses are exhausted, there are no more available address blocks that can be allocated from a central pool \[Huston 2011a\]. A recent survey of IPv4 address-space exhaustion, and the steps taken to prolong the life of the address space, is \[Richter 2015\].
Although the mid-1990s estimates of IPv4 address depletion suggested that a considerable amount of time might be left until the IPv4 address space was exhausted, it was realized that considerable time would be needed to deploy a new technology on such an extensive scale, and so the process to develop IP version 6 (IPv6) \[RFC 2460\] was begun \[RFC 1752\]. (An often-asked question is what happened to IPv5? It was initially envisioned that the ST-2 protocol would become IPv5, but ST-2 was later dropped.) An excellent source of information about IPv6 is \[Huitema 1998\].

IPv6 Datagram Format

The format of the IPv6 datagram is shown in Figure 4.26.

Figure 4.26 IPv6 datagram format

The most important changes introduced in IPv6 are evident in the datagram format: Expanded addressing capabilities. IPv6 increases the size of the IP address from 32 to 128 bits. This ensures that the world won't run out of IP addresses. Now, every grain of sand on the planet can be IP-addressable. In addition to unicast and multicast addresses, IPv6 has introduced a new type of address, called an anycast address, that allows a datagram to be delivered to any one of a group of hosts. (This feature could be used, for example, to send an HTTP GET to the nearest of a number of mirror sites that contain a given document.) A streamlined 40-byte header. As discussed below, a number of IPv4 fields have been dropped or made optional. The resulting 40-byte fixed-length header allows for faster processing of the IP datagram by a router. A new encoding of options allows for more flexible options processing. Flow labeling. IPv6 has an elusive definition of a flow. RFC 2460 states that this allows "labeling of packets belonging to particular flows for which the sender requests special handling, such as a non-default quality of service or real-time service." For example, audio and video transmission might likely be treated as a flow.
On the other hand, the more traditional applications, such as file transfer and e-mail, might not be treated as flows. It is possible that the traffic carried by a high-priority user (for example, someone paying for better service for their traffic) might also be treated as a flow. What is clear, however, is that the designers of IPv6 foresaw the eventual need to be able to differentiate among the flows, even if the exact meaning of a flow had yet to be determined. As noted above, a comparison of Figure 4.26 with Figure 4.16 reveals the simpler, more streamlined structure of the IPv6 datagram. The following fields are defined in IPv6: Version. This 4-bit field identifies the IP version number. Not surprisingly, IPv6 carries a value of 6 in this field. Note that putting a 4 in this field does not create a valid IPv4 datagram. (If it did, life would be a lot simpler---see the discussion below regarding the transition from IPv4 to IPv6.) Traffic class. The 8-bit traffic class field, like the TOS field in IPv4, can be used to give priority to certain datagrams within a flow, or it can be used to give priority to datagrams from certain applications (for example, voice-over-IP) over datagrams from other applications (for example, SMTP e-mail). Flow label. As discussed above, this 20-bit field is used to identify a flow of datagrams. Payload length. This 16-bit value is treated as an unsigned integer giving the number of bytes in the IPv6 datagram following the fixed-length, 40-byte datagram header. Next header. This field identifies the protocol to which the contents (data field) of this datagram will be delivered (for example, to TCP or UDP). The field uses the same values as the protocol field in the IPv4 header. Hop limit. The contents of this field are decremented by one by each router that forwards the datagram. If the hop limit count reaches zero, the datagram is discarded. Source and destination addresses.
The various formats of the IPv6 128-bit address are described in RFC 4291.

Data. This is the payload portion of the IPv6 datagram. When the datagram reaches its destination, the payload will be removed from the IP datagram and passed on to the protocol specified in the next header field.

The discussion above identified the purpose of the fields that are included in the IPv6 datagram. Comparing the IPv6 datagram format in Figure 4.26 with the IPv4 datagram format that we saw in Figure 4.16, we notice that several fields appearing in the IPv4 datagram are no longer present in the IPv6 datagram:

Fragmentation/reassembly. IPv6 does not allow for fragmentation and reassembly at intermediate routers; these operations can be performed only by the source and destination. If an IPv6 datagram received by a router is too large to be forwarded over the outgoing link, the router simply drops the datagram and sends a "Packet Too Big" ICMP error message (see Section 5.6) back to the sender. The sender can then resend the data, using a smaller IP datagram size. Fragmentation and reassembly is a time-consuming operation; removing this functionality from the routers and placing it squarely in the end systems considerably speeds up IP forwarding within the network.

Header checksum. Because the transport-layer (for example, TCP and UDP) and link-layer (for example, Ethernet) protocols in the Internet perform checksumming, the designers of IP probably felt that this functionality was sufficiently redundant in the network layer that it could be removed. Once again, fast processing of IP packets was a central concern. Recall from our discussion of IPv4 in Section 4.3.1 that since the IPv4 header contains a TTL field (similar to the hop limit field in IPv6), the IPv4 header checksum needed to be recomputed at every router. As with fragmentation and reassembly, this too was a costly operation in IPv4.

Options.
An options field is no longer a part of the standard IP header. However, it has not gone away. Instead, the options field is one of the possible next headers pointed to from within the IPv6 header. That is, just as TCP or UDP protocol headers can be the next header within an IP packet, so too can an options field. The removal of the options field results in a fixed-length, 40-byte IP header.

Transitioning from IPv4 to IPv6

Now that we have seen the technical details of IPv6, let us consider a very practical matter: How will the public Internet, which is based on IPv4, be transitioned to IPv6? The problem is that while new IPv6-capable systems can be made backward-compatible, that is, can send, route, and receive IPv4 datagrams, already deployed IPv4-capable systems are not capable of handling IPv6 datagrams. Several options are possible \[Huston 2011b, RFC 4213\].

One option would be to declare a flag day---a given time and date when all Internet machines would be turned off and upgraded from IPv4 to IPv6. The last major technology transition (from using NCP to using TCP for reliable transport service) occurred almost 35 years ago. Even back then \[RFC 801\], when the Internet was tiny and still being administered by a small number of "wizards," it was realized that such a flag day was not possible. A flag day involving billions of devices is even more unthinkable today.

The approach to IPv4-to-IPv6 transition that has been most widely adopted in practice involves tunneling \[RFC 4213\]. The basic idea behind tunneling---a key concept with applications in many other scenarios beyond IPv4-to-IPv6 transition, including wide use in the all-IP cellular networks that we'll cover in Chapter 7---is the following. Suppose two IPv6 nodes (in this example, B and E in Figure 4.27) want to interoperate using IPv6 datagrams but are connected to each other by intervening IPv4 routers.
We refer to the intervening set of IPv4 routers between two IPv6 routers as a tunnel, as illustrated in Figure 4.27.

Figure 4.27 Tunneling

With tunneling, the IPv6 node on the sending side of the tunnel (in this example, B) takes the entire IPv6 datagram and puts it in the data (payload) field of an IPv4 datagram. This IPv4 datagram is then addressed to the IPv6 node on the receiving side of the tunnel (in this example, E) and sent to the first node in the tunnel (in this example, C). The intervening IPv4 routers in the tunnel route this IPv4 datagram among themselves, just as they would any other datagram, blissfully unaware that the IPv4 datagram itself contains a complete IPv6 datagram. The IPv6 node on the receiving side of the tunnel eventually receives the IPv4 datagram (it is the destination of the IPv4 datagram!), determines that the IPv4 datagram contains an IPv6 datagram (by observing that the protocol number field in the IPv4 datagram is 41 \[RFC 4213\], indicating that the IPv4 payload is an IPv6 datagram), extracts the IPv6 datagram, and then routes the IPv6 datagram exactly as it would if it had received the IPv6 datagram from a directly connected IPv6 neighbor.

We end this section by noting that while the adoption of IPv6 was initially slow to take off \[Lawton 2001; Huston 2008b\], momentum has been building. NIST \[NIST IPv6 2015\] reports that more than a third of US government second-level domains are IPv6-enabled. On the client side, Google reports that only about 8 percent of the clients accessing Google services do so via IPv6 \[Google IPv6 2015\]. But other recent measurements \[Czyz 2014\] indicate that IPv6 adoption is accelerating. The proliferation of devices such as IP-enabled phones and other portable devices provides an additional push for more widespread deployment of IPv6.
Europe's Third Generation Partnership Program \[3GPP 2016\] has specified IPv6 as the standard addressing scheme for mobile multimedia.

One important lesson that we can learn from the IPv6 experience is that it is enormously difficult to change network-layer protocols. Since the early 1990s, numerous new network-layer protocols have been trumpeted as the next major revolution for the Internet, but most of these protocols have had limited penetration to date. These protocols include IPv6, multicast protocols, and resource reservation protocols; a discussion of these latter two protocols can be found in the online supplement to this text. Indeed, introducing new protocols into the network layer is like replacing the foundation of a house---it is difficult to do without tearing the whole house down or at least temporarily relocating the house's residents. On the other hand, the Internet has witnessed rapid deployment of new protocols at the application layer. The classic examples, of course, are the Web, instant messaging, streaming media, distributed games, and various forms of social media. Introducing new application-layer protocols is like adding a new layer of paint to a house---it is relatively easy to do, and if you choose an attractive color, others in the neighborhood will copy you. In summary, in the future we can certainly expect to see changes in the Internet's network layer, but these changes will likely occur on a time scale that is much slower than the changes that will occur at the application layer.

4.4 Generalized Forwarding and SDN

In Section 4.2.1, we noted that an Internet router's forwarding decision has traditionally been based solely on a packet's destination address. In the previous section, however, we've also seen that there has been a proliferation of middleboxes that perform many layer-3 functions.
NAT boxes rewrite header IP addresses and port numbers; firewalls block traffic based on header-field values or redirect packets for additional processing, such as deep packet inspection (DPI). Load-balancers forward packets requesting a given service (e.g., an HTTP request) to one of a set of servers that provide that service. \[RFC 3234\] lists a number of common middlebox functions. This proliferation of middleboxes, layer-2 switches, and layer-3 routers \[Qazi 2013\]---each with its own specialized hardware, software and management interfaces---has undoubtedly resulted in costly headaches for many network operators. However, recent advances in software-defined networking have promised, and are now delivering, a unified approach towards providing many of these network-layer functions, and certain link-layer functions as well, in a modern, elegant, and integrated manner.

Recall that Section 4.2.1 characterized destination-based forwarding as the two steps of looking up a destination IP address ("match"), then sending the packet into the switching fabric to the specified output port ("action"). Let's now consider a significantly more general "match-plus-action" paradigm, where the "match" can be made over multiple header fields associated with different protocols at different layers in the protocol stack. The "action" can include forwarding the packet to one or more output ports (as in destination-based forwarding), load balancing packets across multiple outgoing interfaces that lead to a service (as in load balancing), rewriting header values (as in NAT), purposefully blocking/dropping a packet (as in a firewall), sending a packet to a special server for further processing and action (as in DPI), and more. In generalized forwarding, a match-plus-action table generalizes the notion of the destination-based forwarding table that we encountered in Section 4.2.1.
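To see how firewalling, NAT-style rewriting, and plain forwarding all fit this one mold, here is a minimal Python sketch. The rule format, addresses, and port numbers are invented for illustration; this is not OpenFlow syntax.

```python
# Hypothetical rule format (NOT OpenFlow syntax): each rule pairs a
# "match" (header-field values) with an "action". Addresses and ports
# below are invented for illustration.

def matches(rule_match, packet):
    """A packet matches a rule if every field in the rule equals the packet's value."""
    return all(packet.get(field) == value for field, value in rule_match.items())

rules = [
    # Firewall-style rule: drop all traffic destined to TCP port 23 (Telnet).
    {"match": {"tcp_dst": 23}, "action": ("drop",)},
    # NAT-style rule: rewrite a private source address, then forward out port 1.
    {"match": {"ip_src": "10.0.0.5"},
     "action": ("rewrite", {"ip_src": "138.76.29.7"}, 1)},
    # Plain destination-based forwarding.
    {"match": {"ip_dst": "10.2.0.3"}, "action": ("forward", 3)},
]

def process(packet):
    """Apply the first matching rule; return (packet, output_port) or None if dropped."""
    for rule in rules:
        if matches(rule["match"], packet):
            action = rule["action"]
            if action[0] == "drop":
                return None
            if action[0] == "rewrite":
                packet = {**packet, **action[1]}   # NAT-style header rewrite
                return packet, action[2]
            return packet, action[1]               # plain forward
    return None   # no match: drop (a real switch might instead punt to the controller)

pkt = {"ip_src": "10.0.0.5", "ip_dst": "192.0.2.1", "tcp_dst": 80}
print(process(pkt))   # source address rewritten to 138.76.29.7, forwarded on port 1
```

The point is only that "drop," "rewrite," and "forward" are just different actions attached to different matches in a single table.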
Because forwarding decisions may be made using network-layer and/or link-layer source and destination addresses, the forwarding devices shown in Figure 4.28 are more accurately described as "packet switches" rather than layer 3 "routers" or layer 2 "switches." Thus, in the remainder of this section, and in Section 5.5, we'll refer to these devices as packet switches, adopting the terminology that is gaining widespread adoption in SDN literature.

Figure 4.28 Generalized forwarding: Each packet switch contains a match-plus-action table that is computed and distributed by a remote controller

Figure 4.28 shows a match-plus-action table in each packet switch, with the table being computed, installed, and updated by a remote controller. We note that while it is possible for the control components at the individual packet switch to interact with each other (e.g., in a manner similar to that in Figure 4.2), in practice generalized match-plus-action capabilities are implemented via a remote controller that computes, installs, and updates these tables. You might take a minute to compare Figures 4.2, 4.3 and 4.28---what similarities and differences do you notice between destination-based forwarding shown in Figures 4.2 and 4.3, and generalized forwarding shown in Figure 4.28?

Our following discussion of generalized forwarding will be based on OpenFlow \[McKeown 2008, OpenFlow 2009, Casado 2014, Tourrilhes 2014\]---a highly visible and successful standard that has pioneered the notion of the match-plus-action forwarding abstraction and controllers, as well as the SDN revolution more generally \[Feamster 2013\]. We'll primarily consider OpenFlow 1.0, which introduced key SDN abstractions and functionality in a particularly clear and concise manner.
Later versions of OpenFlow introduced additional capabilities as a result of experience gained through implementation and use; current and earlier versions of the OpenFlow standard can be found at \[ONF 2016\]. Each entry in the match-plus-action forwarding table, known as a flow table in OpenFlow, includes:

A set of header field values to which an incoming packet will be matched. As in the case of destination-based forwarding, hardware-based matching is most rapidly performed in TCAM memory, with more than a million destination address entries being possible \[Bosshart 2013\]. A packet that matches no flow table entry can be dropped or sent to the remote controller for more processing. In practice, a flow table may be implemented by multiple flow tables for performance or cost reasons \[Bosshart 2013\], but we'll focus here on the abstraction of a single flow table.

A set of counters that are updated as packets are matched to flow table entries. These counters might include the number of packets that have been matched by that table entry, and the time since the table entry was last updated.

A set of actions to be taken when a packet matches a flow table entry. These actions might be to forward the packet to a given output port, to drop the packet, to make copies of the packet and send them to multiple output ports, and/or to rewrite selected header fields.

We'll explore matching and actions in more detail in Sections 4.4.1 and 4.4.2, respectively. We'll then study how the network-wide collection of per-packet switch matching rules can be used to implement a wide range of functions including routing, layer-2 switching, firewalling, load-balancing, virtual networks, and more in Section 4.4.3.
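The three components of a flow table entry listed above can be sketched as a small data structure. This is a simplification for illustration, with our own field names, not OpenFlow 1.0's actual format:

```python
import time
from dataclasses import dataclass, field

@dataclass
class FlowEntry:
    """Sketch of an OpenFlow-style flow table entry: match, counters, actions.

    Field names are our own simplification, not OpenFlow 1.0's actual format.
    """
    match: dict        # header-field values to match, e.g. {"ip_dst": "10.2.0.3"}
    actions: list      # e.g. [("forward", 3)]; an empty list means drop
    packet_count: int = 0                                   # counter: packets matched
    last_updated: float = field(default_factory=time.time)  # counter: time of last update

    def hit(self):
        """Update the entry's counters when a packet matches it."""
        self.packet_count += 1
        self.last_updated = time.time()

entry = FlowEntry(match={"ip_dst": "10.2.0.3"}, actions=[("forward", 3)])
entry.hit()
print(entry.packet_count)   # 1
```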
In closing, we note that the flow table is essentially an API, the abstraction through which an individual packet switch's behavior can be programmed; we'll see in Section 4.4.3 that network-wide behaviors can similarly be programmed by appropriately programming/configuring these tables in a collection of network packet switches \[Casado 2014\].

4.4.1 Match

Figure 4.29 shows the eleven packet-header fields and the incoming port ID that can be matched in an OpenFlow 1.0 match-plus-action rule.

Figure 4.29 Packet matching fields, OpenFlow 1.0 flow table

Recall from Section 1.5.2 that a link-layer (layer 2) frame arriving to a packet switch will contain a network-layer (layer 3) datagram as its payload, which in turn will typically contain a transport-layer (layer 4) segment. The first observation we make is that OpenFlow's match abstraction allows for a match to be made on selected fields from three layers of protocol headers (thus rather brazenly defying the layering principle we studied in Section 1.5). Since we've not yet covered the link layer, suffice it to say that the source and destination MAC addresses shown in Figure 4.29 are the link-layer addresses associated with the frame's sending and receiving interfaces; by forwarding on the basis of Ethernet addresses rather than IP addresses, we can see that an OpenFlow-enabled device can equally perform as a router (layer-3 device) forwarding datagrams as well as a switch (layer-2 device) forwarding frames. The Ethernet type field corresponds to the upper layer protocol (e.g., IP) to which the frame's payload will be demultiplexed, and the VLAN fields are concerned with so-called virtual local area networks that we'll study in Chapter 6. The set of twelve values that can be matched in the OpenFlow 1.0 specification has grown to 41 values in more recent OpenFlow specifications \[Bosshart 2014\].
The ingress port refers to the input port at the packet switch on which a packet is received. The packet's IP source address, IP destination address, IP protocol field, and IP type of service fields were discussed earlier in Section 4.3.1. The transport-layer source and destination port number fields can also be matched.

Flow table entries may also have wildcards. For example, an IP address of 128.119.*.* in a flow table will match the corresponding address field of any datagram that has 128.119 as the first 16 bits of its address. Each flow table entry also has an associated priority. If a packet matches multiple flow table entries, the selected match and corresponding action will be that of the highest priority entry with which the packet matches.

Lastly, we observe that not all fields in an IP header can be matched. For example, OpenFlow does not allow matching on the basis of the TTL field or datagram length field. Why are some fields allowed for matching, while others are not? Undoubtedly, the answer has to do with the tradeoff between functionality and complexity. The "art" in choosing an abstraction is to provide for enough functionality to accomplish a task (in this case to implement, configure, and manage a wide range of network-layer functions that had previously been implemented through an assortment of network-layer devices), without over-burdening the abstraction with so much detail and generality that it becomes bloated and unusable. Butler Lampson has famously noted \[Lampson 1983\]: Do one thing at a time, and do it well. An interface should capture the minimum essentials of an abstraction. Don't generalize; generalizations are generally wrong. Given OpenFlow's success, one can surmise that its designers indeed chose their abstraction well. Additional details of OpenFlow matching can be found in \[OpenFlow 2009, ONF 2016\].
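Wildcard matching and priorities, as just described, can be mimicked in a few lines of Python. The table contents below are hypothetical; the sketch treats an address wildcard like 128.119.*.* as the prefix 128.119.0.0/16, and the highest-priority matching entry wins:

```python
import ipaddress

# Hypothetical flow table: a wildcard address like 128.119.*.* is represented
# as the prefix 128.119.0.0/16; each entry carries a priority.
flow_table = [
    {"priority": 10, "ip_dst": "128.119.40.0/24", "action": ("forward", 2)},
    {"priority": 5,  "ip_dst": "128.119.0.0/16",  "action": ("forward", 1)},
    {"priority": 1,  "ip_dst": "0.0.0.0/0",       "action": ("drop",)},
]

def lookup(ip_dst):
    """Return the action of the highest-priority entry matching ip_dst."""
    addr = ipaddress.ip_address(ip_dst)
    candidates = [e for e in flow_table
                  if addr in ipaddress.ip_network(e["ip_dst"])]
    if not candidates:
        return None   # no match: drop, or send to the controller
    return max(candidates, key=lambda e: e["priority"])["action"]

print(lookup("128.119.40.7"))   # ('forward', 2): /24 and /16 both match; priority 10 wins
print(lookup("128.119.50.1"))   # ('forward', 1): only the /16 and the default match
```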
4.4.2 Action

As shown in Figure 4.28, each flow table entry has a list of zero or more actions that determine the processing that is to be applied to a packet that matches a flow table entry. If there are multiple actions, they are performed in the order specified in the list. Among the most important possible actions are:

Forwarding. An incoming packet may be forwarded to a particular physical output port, broadcast over all ports (except the port on which it arrived) or multicast over a selected set of ports. The packet may be encapsulated and sent to the remote controller for this device. That controller then may (or may not) take some action on that packet, including installing new flow table entries, and may return the packet to the device for forwarding under the updated set of flow table rules.

Dropping. A flow table entry with no action indicates that a matched packet should be dropped.

Modify-field. The values in ten packet header fields (all layer 2, 3, and 4 fields shown in Figure 4.29 except the IP Protocol field) may be re-written before the packet is forwarded to the chosen output port.

4.4.3 OpenFlow Examples of Match-plus-action in Action

Having now considered both the match and action components of generalized forwarding, let's put these ideas together in the context of the sample network shown in Figure 4.30. The network has 6 hosts (h1, h2, h3, h4, h5 and h6) and three packet switches (s1, s2 and s3), each with four local interfaces (numbered 1 through 4). We'll consider a number of network-wide behaviors that we'd like to implement, and the flow table entries in s1, s2 and s3 needed to implement this behavior.
Figure 4.30 OpenFlow match-plus-action network with three packet switches, 6 hosts, and an OpenFlow controller

A First Example: Simple Forwarding

As a very simple example, suppose that the desired forwarding behavior is that packets from h5 or h6 destined to h3 or h4 are to be forwarded from s3 to s1, and then from s1 to s2 (thus completely avoiding the use of the link between s3 and s2). The flow table entry in s1 would be:

s1 Flow Table (Example 1)

| Match | Action |
| --- | --- |
| Ingress Port = 1; IP Src = 10.3.*.*; IP Dst = 10.2.*.* | Forward(4) |
| ... | ... |

Of course, we'll also need a flow table entry in s3 so that datagrams sent from h5 or h6 are forwarded to s1 over outgoing interface 3:

s3 Flow Table (Example 1)

| Match | Action |
| --- | --- |
| IP Src = 10.3.*.*; IP Dst = 10.2.*.* | Forward(3) |
| ... | ... |

Lastly, we'll also need a flow table entry in s2 to complete this first example, so that datagrams arriving from s1 are forwarded to their destination, either host h3 or h4:

s2 Flow Table (Example 1)

| Match | Action |
| --- | --- |
| Ingress port = 2; IP Dst = 10.2.0.3 | Forward(3) |
| Ingress port = 2; IP Dst = 10.2.0.4 | Forward(4) |
| ... | ... |

A Second Example: Load Balancing

As a second example, let's consider a load-balancing scenario, where datagrams from h3 destined to 10.1.*.* are to be forwarded over the direct link between s2 and s1, while datagrams from h4 destined to 10.1.*.* are to be forwarded over the link between s2 and s3 (and then from s3 to s1). Note that this behavior couldn't be achieved with IP's destination-based forwarding. In this case, the flow table in s2 would be:

s2 Flow Table (Example 2)

| Match | Action |
| --- | --- |
| Ingress port = 3; IP Dst = 10.1.*.* | Forward(2) |
| Ingress port = 4; IP Dst = 10.1.*.* | Forward(1) |
| ... | ... |
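To see concretely why this second example is beyond destination-based forwarding, here is a small simulation of s2's flow table (a sketch in our own rule representation, not OpenFlow syntax): two packets with the identical destination leave s2 on different ports because the match includes the ingress port.

```python
# Sketch of s2's flow table for the load-balancing example (our own rule
# representation, not OpenFlow syntax). The match combines the ingress
# port with a destination-address prefix.
s2_flow_table = [
    {"ingress_port": 3, "ip_dst_prefix": "10.1.", "action": ("forward", 2)},
    {"ingress_port": 4, "ip_dst_prefix": "10.1.", "action": ("forward", 1)},
]

def s2_forward(ingress_port, ip_dst):
    """Return the action for a packet, or None if no entry matches."""
    for entry in s2_flow_table:
        if (entry["ingress_port"] == ingress_port
                and ip_dst.startswith(entry["ip_dst_prefix"])):
            return entry["action"]
    return None   # no match: drop, or send to the controller

# h3 (port 3) and h4 (port 4) send to the SAME destination, yet their
# datagrams leave s2 on different links -- impossible with purely
# destination-based forwarding.
print(s2_forward(3, "10.1.0.1"))   # ('forward', 2)
print(s2_forward(4, "10.1.0.1"))   # ('forward', 1)
```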
Flow table entries are also needed at s1 to forward the datagrams received from s2 to either h1 or h2; and flow table entries are needed at s3 to forward datagrams received on interface 4 from s2 over interface 3 towards s1. See if you can figure out these flow table entries at s1 and s3.

A Third Example: Firewalling

As a third example, let's consider a firewall scenario in which s2 wants only to receive (on any of its interfaces) traffic sent from hosts attached to s3.

s2 Flow Table (Example 3)

| Match | Action |
| --- | --- |
| IP Src = 10.3.*.*; IP Dst = 10.2.0.3 | Forward(3) |
| IP Src = 10.3.*.*; IP Dst = 10.2.0.4 | Forward(4) |
| ... | ... |

If there were no other entries in s2's flow table, then only traffic from 10.3.*.* would be forwarded to the hosts attached to s2.

Although we've only considered a few basic scenarios here, the versatility and advantages of generalized forwarding are hopefully apparent. In homework problems, we'll explore how flow tables can be used to create many different logical behaviors, including virtual networks---two or more logically separate networks (each with their own independent and distinct forwarding behavior)---that use the same physical set of packet switches and links. In Section 5.5, we'll return to flow tables when we study the SDN controllers that compute and distribute the flow tables, and the protocol used for communicating between a packet switch and its controller.

4.5 Summary

In this chapter we've covered the data plane functions of the network layer---the per-router functions that determine how packets arriving on one of a router's input links are forwarded to one of that router's output links. We began by taking a detailed look at the internal operations of a router, studying input and output port functionality and destination-based forwarding, a router's internal switching mechanism, packet queue management and more.
We covered both traditional IP forwarding (where forwarding is based on a datagram's destination address) and generalized forwarding (where forwarding and other functions may be performed using values in several different fields in the datagram's header) and have seen the versatility of the latter approach. We also studied the IPv4 and IPv6 protocols in detail, and Internet addressing, which we found to be much deeper, subtler, and more interesting than we might have expected. With our newfound understanding of the network layer's data plane, we're now ready to dive into the network layer's control plane in Chapter 5!

Homework Problems and Questions

Chapter 4 Review Questions

SECTION 4.1

R1. Let's review some of the terminology used in this textbook. Recall that the name of a transport-layer packet is segment and that the name of a link-layer packet is frame. What is the name of a network-layer packet? Recall that both routers and link-layer switches are called packet switches. What is the fundamental difference between a router and a link-layer switch?

R2. We noted that network-layer functionality can be broadly divided into data plane functionality and control plane functionality. What are the main functions of the data plane? Of the control plane?

R3. We made a distinction between the forwarding function and the routing function performed in the network layer. What are the key differences between routing and forwarding?

R4. What is the role of the forwarding table within a router?

R5. We said that a network layer's service model "defines the characteristics of end-to-end transport of packets between sending and receiving hosts." What is the service model of the Internet's network layer? What guarantees are made by the Internet's service model regarding the host-to-host delivery of datagrams?

SECTION 4.2

R6.
In Section 4.2, we saw that a router typically consists of input ports, output ports, a switching fabric and a routing processor. Which of these are implemented in hardware and which are implemented in software? Why? Returning to the notion of the network layer's data plane and control plane, which are implemented in hardware and which are implemented in software? Why?

R7. Discuss why each input port in a high-speed router stores a shadow copy of the forwarding table.

R8. What is meant by destination-based forwarding? How does this differ from generalized forwarding (assuming you've read Section 4.4, which of the two approaches is adopted by Software-Defined Networking)?

R9. Suppose that an arriving packet matches two or more entries in a router's forwarding table. With traditional destination-based forwarding, what rule does a router apply to determine which of these entries should be used to determine the output port to which the arriving packet should be switched?

R10. Three types of switching fabrics are discussed in Section 4.2. List and briefly describe each type. Which, if any, can send multiple packets across the fabric in parallel?

R11. Describe how packet loss can occur at input ports. Describe how packet loss at input ports can be eliminated (without using infinite buffers).

R12. Describe how packet loss can occur at output ports. Can this loss be prevented by increasing the switch fabric speed?

R13. What is HOL blocking? Does it occur in input ports or output ports?

R14. In Section 4.2, we studied FIFO, Priority, Round Robin (RR), and Weighted Fair Queueing (WFQ) packet scheduling disciplines. Which of these queueing disciplines ensure that all packets depart in the order in which they arrived?

R15. Give an example showing why a network operator might want one class of packets to be given priority over another class of packets.

R16. What is an essential difference between RR and WFQ packet scheduling?
Is there a case (Hint: Consider the WFQ weights) where RR and WFQ will behave exactly the same?

SECTION 4.3

R17. Suppose Host A sends Host B a TCP segment encapsulated in an IP datagram. When Host B receives the datagram, how does the network layer in Host B know it should pass the segment (that is, the payload of the datagram) to TCP rather than to UDP or to some other upper-layer protocol?

R18. What field in the IP header can be used to ensure that a packet is forwarded through no more than N routers?

R19. Recall that we saw the Internet checksum being used in both transport-layer segments (in UDP and TCP headers, Figures 3.7 and 3.29 respectively) and in network-layer datagrams (IP header, Figure 4.16). Now consider a transport-layer segment encapsulated in an IP datagram. Are the checksums in the segment header and datagram header computed over any common bytes in the IP datagram? Explain your answer.

R20. When a large datagram is fragmented into multiple smaller datagrams, where are these smaller datagrams reassembled into a single larger datagram?

R21. Do routers have IP addresses? If so, how many?

R22. What is the 32-bit binary equivalent of the IP address 223.1.3.27?

R23. Visit a host that uses DHCP to obtain its IP address, network mask, default router, and IP address of its local DNS server. List these values.

R24. Suppose there are three routers between a source host and a destination host. Ignoring fragmentation, an IP datagram sent from the source host to the destination host will travel over how many interfaces? How many forwarding tables will be indexed to move the datagram from the source to the destination?

R25. Suppose an application generates chunks of 40 bytes of data every 20 msec, and each chunk gets encapsulated in a TCP segment and then an IP datagram. What percentage of each datagram will be overhead, and what percentage will be application data?

R26.
Suppose you purchase a wireless router and connect it to your cable modem. Also suppose that your ISP dynamically assigns your connected device (that is, your wireless router) one IP address. Also suppose that you have five PCs at home that use 802.11 to wirelessly connect to your wireless router. How are IP addresses assigned to the five PCs? Does the wireless router use NAT? Why or why not?

R27. What is meant by the term "route aggregation"? Why is it useful for a router to perform route aggregation?

R28. What is meant by a "plug-and-play" or "zeroconf" protocol?

R29. What is a private network address? Should a datagram with a private network address ever be present in the larger public Internet? Explain.

R30. Compare and contrast the IPv4 and the IPv6 header fields. Do they have any fields in common?

R31. It has been said that when IPv6 tunnels through IPv4 routers, IPv6 treats the IPv4 tunnels as link-layer protocols. Do you agree with this statement? Why or why not?

SECTION 4.4

R32. How does generalized forwarding differ from destination-based forwarding?

R33. What is the difference between a forwarding table that we encountered in destination-based forwarding in Section 4.1 and OpenFlow's flow table that we encountered in Section 4.4?

R34. What is meant by the "match plus action" operation of a router or switch? In the case of a destination-based forwarding packet switch, what is matched and what is the action taken? In the case of an SDN, name three fields that can be matched, and three actions that can be taken.

R35. Name three header fields in an IP datagram that can be "matched" in OpenFlow 1.0 generalized forwarding. What are three IP datagram header fields that cannot be "matched" in OpenFlow?

Problems

P1. Consider the network below.

a. Show the forwarding table in router A, such that all traffic destined to host H3 is forwarded through interface 3.

b.
Can you write down a forwarding table in router A, such that all traffic from H1 destined to host H3 is forwarded through interface 3, while all traffic from H2 destined to host H3 is forwarded through interface 4? (Hint: This is a trick question.)

P2. Suppose two packets arrive to two different input ports of a router at exactly the same time. Also suppose there are no other packets anywhere in the router.

a. Suppose the two packets are to be forwarded to two different output ports. Is it possible to forward the two packets through the switch fabric at the same time when the fabric uses a shared bus?

b. Suppose the two packets are to be forwarded to two different output ports. Is it possible to forward the two packets through the switch fabric at the same time when the fabric uses switching via memory?

c. Suppose the two packets are to be forwarded to the same output port. Is it possible to forward the two packets through the switch fabric at the same time when the fabric uses a crossbar?

P3. In Section 4.2, we noted that the maximum queuing delay is (n--1)D if the switching fabric is n times faster than the input line rates. Suppose that all packets are of the same length, n packets arrive at the same time to the n input ports, and all n packets want to be forwarded to different output ports. What is the maximum delay for a packet for the (a) memory, (b) bus, and (c) crossbar switching fabrics?

P4. Consider the switch shown below. Suppose that all datagrams have the same fixed length, that the switch operates in a slotted, synchronous manner, and that in one time slot a datagram can be transferred from an input port to an output port. The switch fabric is a crossbar so that at most one datagram can be transferred to a given output port in a time slot, but different output ports can receive datagrams from different input ports in a single time slot.
What is the minimal number of time slots needed to transfer the packets shown from input ports to their output ports, assuming any input queue scheduling order you want (i.e., it need not have HOL blocking)? What is the largest number of slots needed, assuming the worst-case scheduling order you can devise, assuming that a non-empty input queue is never idle?

P5. Consider a datagram network using 32-bit host addresses. Suppose a router has four links, numbered 0 through 3, and packets are to be forwarded to the link interfaces as follows:

| Destination Address Range | Link Interface |
| --- | --- |
| 11100000 00000000 00000000 00000000 through 11100000 00111111 11111111 11111111 | 0 |
| 11100000 01000000 00000000 00000000 through 11100000 01000000 11111111 11111111 | 1 |
| 11100000 01000001 00000000 00000000 through 11100001 01111111 11111111 11111111 | 2 |
| otherwise | 3 |

a. Provide a forwarding table that has five entries, uses longest
    prefix matching, and forwards packets to the correct link
    interfaces.

b. Describe how your forwarding table determines the appropriate link
    interface for datagrams with destination addresses:
    11001000 10010001 01010001 01010101,
    11100001 01000000 11000011 00111100, and
    11100001 10000000 00010001 01110111.

P6. Consider a datagram network using 8-bit host addresses. Suppose a router uses longest prefix matching and has the following forwarding table:

| Prefix Match | Interface |
| --- | --- |
| 00 | 0 |
| 010 | 1 |
| 011 | 2 |
| 10 | 2 |
| 11 | 3 |

For each of the four interfaces, give the associated range of destination host addresses and the number of addresses in the range.

P7. Consider a datagram network using 8-bit host addresses. Suppose a router uses longest prefix matching and has the following forwarding table:

| Prefix Match | Interface |
| --- | --- |
| 1 | 0 |
| 10 | 1 |
| 111 | 2 |
| otherwise | 3 |

For each of the four interfaces, give the associated range of destination host addresses and the number of addresses in the range.

P8. 
Consider a router that interconnects three subnets: Subnet 1, Subnet 2, and Subnet 3. Suppose all of the interfaces in each of these three subnets are required to have the prefix 223.1.17/24. Also suppose that Subnet 1 is required to support at least 60 interfaces, Subnet 2 is to support at least 90 interfaces, and Subnet 3 is to support at least 12 interfaces. Provide three network addresses (of the form a.b.c.d/x) that satisfy these constraints.

P9. In Section 4.2.2 an example forwarding table (using longest prefix matching) is given. Rewrite this forwarding table using the a.b.c.d/x notation instead of the binary string notation.

P10. In Problem P5 you are asked to provide a forwarding table (using longest prefix matching). Rewrite this forwarding table using the a.b.c.d/x notation instead of the binary string notation.

P11. Consider a subnet with prefix 128.119.40.128/26. Give an example of one IP address (of the form xxx.xxx.xxx.xxx) that can be assigned to this network. Suppose an ISP owns the block of addresses of the form 128.119.40.64/26. Suppose it wants to create four subnets from this block, with each subnet having the same number of IP addresses. What are the prefixes (of the form a.b.c.d/x) for the four subnets?

P12. Consider the topology shown in Figure 4.20. Denote the three subnets with hosts (starting clockwise at 12:00) as Networks A, B, and C. Denote the subnets without hosts as Networks D, E, and F.

a. Assign network addresses to each of these six subnets, with the
    following constraints: All addresses must be allocated from
    214.97.254/23; Subnet A should have enough addresses to support 250
    interfaces; Subnet B should have enough addresses to support 120
    interfaces; and Subnet C should have enough addresses to support 120
    interfaces. Of course, subnets D, E, and F should each be able to
    support two interfaces. For each subnet, the assignment should take
    the form a.b.c.d/x or a.b.c.d/x -- e.f.g.h/y.

b. 
Using your answer to part (a), provide the forwarding tables (using
    longest prefix matching) for each of the three routers.

P13. Use the whois service at the American Registry for Internet Numbers (http://www.arin.net/whois) to determine the IP address blocks for three universities. Can the whois service be used to determine with certainty the geographical location of a specific IP address? Use www.maxmind.com to determine the locations of the Web servers at each of these universities.

P14. Consider sending a 2400-byte datagram into a link that has an MTU of 700 bytes. Suppose the original datagram is stamped with the identification number 422. How many fragments are generated? What are the values in the various fields in the IP datagram(s) generated related to fragmentation?

P15. Suppose datagrams are limited to 1,500 bytes (including header) between source Host A and destination Host B. Assuming a 20-byte IP header, how many datagrams would be required to send an MP3 consisting of 5 million bytes? Explain how you computed your answer.

P16. Consider the network setup in Figure 4.25. Suppose that the ISP instead assigns the router the address 24.34.112.235 and that the network address of the home network is 192.168.1/24.

a. Assign addresses to all interfaces in the home network.

b. Suppose each host has two ongoing TCP connections, all to port 80 at
    host 128.119.40.86. Provide the six corresponding entries in the NAT
    translation table.

P17. Suppose you are interested in detecting the number of hosts behind a NAT. You observe that the IP layer stamps an identification number sequentially on each IP packet. The identification number of the first IP packet generated by a host is a random number, and the identification numbers of the subsequent IP packets are sequentially assigned. Assume all IP packets generated by hosts behind the NAT are sent to the outside world.

a. 
Based on this observation, and assuming you can sniff all packets
    sent by the NAT to the outside, can you outline a simple technique
    that detects the number of unique hosts behind a NAT? Justify your
    answer.

b. If the identification numbers are not sequentially assigned but
    randomly assigned, would your technique work? Justify your answer.

P18. In this problem we'll explore the impact of NATs on P2P applications. Suppose a peer with username Arnold discovers through querying that a peer with username Bernard has a file it wants to download. Also suppose that Bernard and Arnold are both behind a NAT. Try to devise a technique that will allow Arnold to establish a TCP connection with Bernard without application-specific NAT configuration. If you have difficulty devising such a technique, discuss why.

P19. Consider the SDN OpenFlow network shown in Figure 4.30. Suppose that the desired forwarding behavior for datagrams arriving at s2 is as follows:

- any datagrams arriving on input port 1 from hosts h5 or h6 that are destined to hosts h1 or h2 should be forwarded over output port 2;
- any datagrams arriving on input port 2 from hosts h1 or h2 that are destined to hosts h5 or h6 should be forwarded over output port 1;
- any arriving datagrams on input ports 1 or 2 and destined to hosts h3 or h4 should be delivered to the host specified;
- hosts h3 and h4 should be able to send datagrams to each other.

Specify the flow table entries in s2 that implement this forwarding behavior.

P20. Consider again the SDN OpenFlow network shown in Figure 4.30. Suppose that the desired forwarding behavior for datagrams arriving from hosts h3 or h4 at s2 is as follows:

- any datagrams arriving from host h3 and destined for h1, h2, h5 or h6 should be forwarded in a clockwise direction in the network;
- any datagrams arriving from host h4 and destined for h1, h2, h5 or h6 should be forwarded in a counter-clockwise direction in the network.
Specify the flow table entries in s2 that implement this forwarding behavior.

P21. Consider again the scenario from P19 above. Give the flow table entries at packet switches s1 and s3, such that any arriving datagrams with a source address of h3 or h4 are routed to the destination hosts specified in the destination address field in the IP datagram. (Hint: Your forwarding table rules should include the cases that an arriving datagram is destined for a directly attached host or should be forwarded to a neighboring router for eventual host delivery there.)

P22. Consider again the SDN OpenFlow network shown in Figure 4.30. Suppose we want switch s2 to function as a firewall. Specify the flow table in s2 that implements the following firewall behaviors (specify a different flow table for each of the four firewalling behaviors below) for delivery of datagrams destined to h3 and h4. You do not need to specify the forwarding behavior in s2 that forwards traffic to other routers.

- Only traffic arriving from hosts h1 and h6 should be delivered to hosts h3 or h4 (i.e., arriving traffic from hosts h2 and h5 is blocked).
- Only TCP traffic is allowed to be delivered to hosts h3 or h4 (i.e., UDP traffic is blocked).
- Only traffic destined to h3 is to be delivered (i.e., all traffic to h4 is blocked).
- Only UDP traffic from h1 and destined to h3 is to be delivered. All other traffic is blocked.

Wireshark Lab

In the Web site for this textbook, www.pearsonhighered.com/cs-resources, you'll find a Wireshark lab assignment that examines the operation of the IP protocol, and the IP datagram format in particular.

AN INTERVIEW WITH... Vinton G. Cerf

Vinton G. Cerf is Vice President and Chief Internet Evangelist for Google. He served for over 16 years at MCI in various positions, ending his tenure there as Senior Vice President for Technology Strategy.
He is widely known as the co-designer of the TCP/IP protocols and the architecture of the Internet. During his time from 1976 to 1982 at the US Department of Defense Advanced Research Projects Agency (DARPA), he played a key role leading the development of the Internet and Internet-related data packet and security techniques. He received the US Presidential Medal of Freedom in 2005 and the US National Medal of Technology in 1997. He holds a BS in Mathematics from Stanford University and an MS and PhD in computer science from UCLA.

What brought you to specialize in networking?

I was working as a programmer at UCLA in the late 1960s. My job was supported by the US Defense Advanced Research Projects Agency (called ARPA then, called DARPA now). I was working in the laboratory of Professor Leonard Kleinrock on the Network Measurement Center of the newly created ARPAnet. The first node of the ARPAnet was installed at UCLA on September 1, 1969. I was responsible for programming a computer that was used to capture performance information about the ARPAnet and to report this information back for comparison with mathematical models and predictions of the performance of the network. Several of the other graduate students and I were made responsible for working on the so-called host-level protocols of the ARPAnet---the procedures and formats that would allow many different kinds of computers on the network to interact with each other. It was a fascinating exploration into a new world (for me) of distributed computing and communication.

Did you imagine that IP would become as pervasive as it is today when you first designed the protocol?

When Bob Kahn and I first worked on this in 1973, I think we were mostly very focused on the central question: How can we make heterogeneous packet networks interoperate with one another, assuming we cannot actually change the networks themselves?
We hoped that we could find a way to permit an arbitrary collection of packet-switched networks to be interconnected in a transparent fashion, so that host computers could communicate end-to-end without having to do any translations in between. I think we knew that we were dealing with powerful and expandable technology, but I doubt we had a clear image of what the world would be like with hundreds of millions of computers all interlinked on the Internet.

What do you now envision for the future of networking and the Internet? What major challenges/obstacles do you think lie ahead in their development?

I believe the Internet itself and networks in general will continue to proliferate. Already there is convincing evidence that there will be billions of Internet-enabled devices on the Internet, including appliances like cell phones, refrigerators, personal digital assistants, home servers, televisions, as well as the usual array of laptops, servers, and so on. Big challenges include support for mobility, battery life, capacity of the access links to the network, and ability to scale the optical core of the network up in an unlimited fashion. Designing an interplanetary extension of the Internet is a project in which I am deeply engaged at the Jet Propulsion Laboratory. We will need to cut over from IPv4 \[32-bit addresses\] to IPv6 \[128 bits\]. The list is long!

Who has inspired you professionally?

My colleague Bob Kahn; my thesis advisor, Gerald Estrin; my best friend, Steve Crocker (we met in high school and he introduced me to computers in 1960!); and the thousands of engineers who continue to evolve the Internet today.

Do you have any advice for students entering the networking/Internet field?

Think outside the limitations of existing systems---imagine what might be possible; but then do the hard work of figuring out how to get there from the current state of affairs.
Dare to dream: A half dozen colleagues and I at the Jet Propulsion Laboratory have been working on the design of an interplanetary extension of the terrestrial Internet. It may take decades to implement this, mission by mission, but to paraphrase: "A man's reach should exceed his grasp, or what are the heavens for?"

Chapter 5 The Network Layer: Control Plane

In this chapter, we'll complete our journey through the network layer by covering the control-plane component of the network layer---the network-wide logic that controls not only how a datagram is forwarded among routers along an end-to-end path from the source host to the destination host, but also how network-layer components and services are configured and managed. In Section 5.2, we'll cover traditional routing algorithms for computing least-cost paths in a graph; these algorithms are the basis for two widely deployed Internet routing protocols, OSPF and BGP, which we'll cover in Sections 5.3 and 5.4, respectively. As we'll see, OSPF is a routing protocol that operates within a single ISP's network. BGP is a routing protocol that serves to interconnect all of the networks in the Internet; BGP is thus often referred to as the "glue" that holds the Internet together. Traditionally, control-plane routing protocols have been implemented together with data-plane forwarding functions, monolithically, within a router. As we learned in the introduction to Chapter 4, software-defined networking (SDN) makes a clear separation between the data and control planes, implementing control-plane functions in a separate "controller" service that is distinct, and remote, from the forwarding components of the routers it controls. We'll cover SDN controllers in Section 5.5. In Sections 5.6 and 5.7 we'll cover some of the nuts and bolts of managing an IP network: ICMP (the Internet Control Message Protocol) and SNMP (the Simple Network Management Protocol).
5.1 Introduction

Let's quickly set the context for our study of the network control plane by recalling Figures 4.2 and 4.3. There, we saw that the forwarding table (in the case of destination-based forwarding) and the flow table (in the case of generalized forwarding) were the principal elements that linked the network layer's data and control planes. We learned that these tables specify the local data-plane forwarding behavior of a router. We saw that in the case of generalized forwarding, the actions taken (Section 4.4.2) could include not only forwarding a packet to a router's output port, but also dropping a packet, replicating a packet, and/or rewriting layer 2, 3 or 4 packet-header fields. In this chapter, we'll study how those forwarding and flow tables are computed, maintained and installed. In our introduction to the network layer in Section 4.1, we learned that there are two possible approaches for doing so.

Per-router control. Figure 5.1 illustrates the case where a routing algorithm runs in each and every router; both a forwarding and a routing function are contained within each router. Each router has a routing component that communicates with the routing components in other routers to compute the values for its forwarding table. This per-router control approach has been used in the Internet for decades. The OSPF and BGP protocols that we'll study in Sections 5.3 and 5.4 are based on this per-router approach to control.

Figure 5.1 Per-router control: Individual routing algorithm components interact in the control plane

Logically centralized control. Figure 5.2 illustrates the case in which a logically centralized controller computes and distributes the forwarding tables to be used by each and every router.
As we saw in Section 4.4, the generalized match-plus-action abstraction allows the router to perform traditional IP forwarding as well as a rich set of other functions (load sharing, firewalling, and NAT) that had been previously implemented in separate middleboxes.

Figure 5.2 Logically centralized control: A distinct, typically remote, controller interacts with local control agents (CAs)

The controller interacts with a control agent (CA) in each of the routers via a well-defined protocol to configure and manage that router's flow table. Typically, the CA has minimum functionality; its job is to communicate with the controller, and to do as the controller commands. Unlike the routing algorithms in Figure 5.1, the CAs do not directly interact with each other nor do they actively take part in computing the forwarding table. This is a key distinction between per-router control and logically centralized control. By "logically centralized" control \[Levin 2012\] we mean that the routing control service is accessed as if it were a single central service point, even though the service is likely to be implemented via multiple servers for fault-tolerance and performance scalability reasons. As we will see in Section 5.5, SDN adopts this notion of a logically centralized controller---an approach that is finding increased use in production deployments. Google uses SDN to control the routers in its internal B4 global wide-area network that interconnects its data centers \[Jain 2013\]. SWAN \[Hong 2013\], from Microsoft Research, uses a logically centralized controller to manage routing and forwarding between a wide area network and a data center network. China Telecom and China Unicom are using SDN both within data centers and between data centers \[Li 2015\]. AT&T has noted \[AT&T 2013\] that it "supports many SDN capabilities and independently defined, proprietary mechanisms that fall under the SDN architectural framework."
5.2 Routing Algorithms

In this section we'll study routing algorithms, whose goal is to determine good paths (equivalently, routes) from senders to receivers, through the network of routers. Typically, a "good" path is one that has the least cost. We'll see that in practice, however, real-world concerns such as policy issues (for example, a rule such as "router x, belonging to organization Y, should not forward any packets originating from the network owned by organization Z") also come into play. We note that whether the network control plane adopts a per-router control approach or a logically centralized approach, there must always be a well-defined sequence of routers that a packet will cross in traveling from sending to receiving host. Thus, the routing algorithms that compute these paths are of fundamental importance, and another candidate for our top-10 list of fundamentally important networking concepts.

A graph is used to formulate routing problems. Recall that a graph G=(N, E) is a set N of nodes and a collection E of edges, where each edge is a pair of nodes from N. In the context of network-layer routing, the nodes in the graph represent routers---the points at which packet-forwarding decisions are made---and the edges connecting these nodes represent the physical links between these routers. Such a graph abstraction of a computer network is shown in Figure 5.3. To view some graphs representing real network maps, see \[Dodge 2016, Cheswick 2000\]; for a discussion of how well different graph-based models model the Internet, see \[Zegura 1997, Faloutsos 1999, Li 2004\].

Figure 5.3 Abstract graph model of a computer network

As shown in Figure 5.3, an edge also has a value representing its cost.
Typically, an edge's cost may reflect the physical length of the corresponding link (for example, a transoceanic link might have a higher cost than a short-haul terrestrial link), the link speed, or the monetary cost associated with a link. For our purposes, we'll simply take the edge costs as a given and won't worry about how they are determined. For any edge (x, y) in E, we denote c(x, y) as the cost of the edge between nodes x and y. If the pair (x, y) does not belong to E, we set c(x, y)=∞. Also, we'll only consider undirected graphs (i.e., graphs whose edges do not have a direction) in our discussion here, so that edge (x, y) is the same as edge (y, x) and c(x, y)=c(y, x); however, the algorithms we'll study can be easily extended to the case of directed links with a different cost in each direction. Also, a node y is said to be a neighbor of node x if (x, y) belongs to E. Given that costs are assigned to the various edges in the graph abstraction, a natural goal of a routing algorithm is to identify the least-cost paths between sources and destinations. To make this problem more precise, recall that a path in a graph G=(N, E) is a sequence of nodes (x1, x2, ⋯, xp) such that each of the pairs (x1, x2), (x2, x3), ⋯, (xp−1, xp) are edges in E. The cost of a path (x1, x2, ⋯, xp) is simply the sum of all the edge costs along the path, that is, c(x1, x2)+c(x2, x3)+⋯+c(xp−1, xp). Given any two nodes x and y, there are typically many paths between the two nodes, with each path having a cost. One or more of these paths is a least-cost path. The least-cost problem is therefore clear: Find a path between the source and destination that has least cost. In Figure 5.3, for example, the least-cost path between source node u and destination node w is (u, x, y, w) with a path cost of 3.
Note that if all edges in -the graph have the same cost, the least-cost path is also the shortest -path (that is, the path with the smallest number of links between the -source and the destination). As a simple exercise, try finding the -least-cost path from node u to z in Figure 5.3 and reflect for a moment -on how you calculated that path. If you are like most people, you found -the path from u to z by examining Figure 5.3, tracing a few routes from -u to z, and somehow convincing yourself that the path you had chosen had -the least cost among all possible paths. (Did you check all of the 17 -possible paths between u and z? Probably not!) Such a calculation is an -example of a centralized routing algorithm---the routing algorithm was -run in one location, your brain, with complete information about the -network. Broadly, one way in which we can classify routing algorithms is -according to whether they are centralized or decentralized. A -centralized routing algorithm computes the least-cost path between a -source and destination using complete, global knowledge about the -network. That is, the algorithm takes the connectivity between all nodes -and all link costs as inputs. This then requires that the algorithm -somehow obtain this information before actually performing the -calculation. The calculation itself can be run at one site (e.g., a -logically centralized controller as in Figure 5.2) or could be -replicated in the routing component of each and every router (e.g., as -in Figure 5.1). The key distinguishing feature here, however, is that -the algorithm has complete information about connectivity and link -costs. Algorithms with global state information are often referred to as -link-state (LS) algorithms, since the algorithm must be aware of the -cost of each link in the network. We'll study LS algorithms in Section -5.2.1. 
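The graph abstraction and path-cost definition above can be made concrete with a short sketch. This is an illustrative Python fragment of ours, not from the text; the edge costs are those used in this section's Figure 5.3 example, and the costs for the v-x, v-w, and w-z edges are read from that figure, so treat them as assumptions to check against it.

```python
# Illustrative sketch (not from the text): the graph G = (N, E) with
# edge costs c(x, y), using the Figure 5.3 costs. The v-x, v-w, and
# w-z costs are assumptions read from the figure.
INF = float("inf")

# c[x][y] is c(x, y); the graph is undirected, so c(x, y) = c(y, x).
c = {
    "u": {"v": 2, "x": 1, "w": 5},
    "v": {"u": 2, "x": 2, "w": 3},
    "x": {"u": 1, "v": 2, "w": 3, "y": 1},
    "w": {"u": 5, "v": 3, "x": 3, "y": 1, "z": 5},
    "y": {"x": 1, "w": 1, "z": 2},
    "z": {"w": 5, "y": 2},
}

def cost(x, y):
    """c(x, y); infinity if (x, y) is not an edge of the graph."""
    return c[x].get(y, INF)

def path_cost(path):
    """Cost of a path (x1, ..., xp): the sum of its edge costs."""
    return sum(cost(a, b) for a, b in zip(path, path[1:]))

# The least-cost path from u to w is (u, x, y, w), with cost 3,
# beating the direct one-hop path (u, w) of cost 5.
print(path_cost(["u", "x", "y", "w"]))  # 3
print(path_cost(["u", "w"]))            # 5
```

Enumerating and costing candidate paths this way is exactly the by-hand calculation described above; the algorithms that follow avoid checking all paths explicitly.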
In a decentralized routing algorithm, the calculation of the least-cost path is carried out in an iterative, distributed manner by the routers. No node has complete information about the costs of all network links. Instead, each node begins with only the knowledge of the costs of its own directly attached links. Then, through an iterative process of calculation and exchange of information with its neighboring nodes, a node gradually calculates the least-cost path to a destination or set of destinations. The decentralized routing algorithm we'll study below in Section 5.2.2 is called a distance-vector (DV) algorithm, because each node maintains a vector of estimates of the costs (distances) to all other nodes in the network. Such decentralized algorithms, with their interactive message exchange between neighboring routers, are perhaps more naturally suited to control planes where the routers interact directly with each other, as in Figure 5.1.

A second broad way to classify routing algorithms is according to whether they are static or dynamic. In static routing algorithms, routes change very slowly over time, often as a result of human intervention (for example, a human manually editing the link costs). Dynamic routing algorithms change the routing paths as the network traffic loads or topology change. A dynamic algorithm can be run either periodically or in direct response to topology or link cost changes. While dynamic algorithms are more responsive to network changes, they are also more susceptible to problems such as routing loops and route oscillation.

A third way to classify routing algorithms is according to whether they are load-sensitive or load-insensitive. In a load-sensitive algorithm, link costs vary dynamically to reflect the current level of congestion in the underlying link. If a high cost is associated with a link that is currently congested, a routing algorithm will tend to choose routes around such a congested link.
While early ARPAnet routing algorithms were load-sensitive \[McQuillan 1980\], a number of difficulties were encountered \[Huitema 1998\]. Today's Internet routing algorithms (such as RIP, OSPF, and BGP) are load-insensitive, as a link's cost does not explicitly reflect its current (or recent past) level of congestion.

5.2.1 The Link-State (LS) Routing Algorithm

Recall that in a link-state algorithm, the network topology and all link costs are known, that is, available as input to the LS algorithm. In practice this is accomplished by having each node broadcast link-state packets to all other nodes in the network, with each link-state packet containing the identities and costs of its attached links. In practice (for example, with the Internet's OSPF routing protocol, discussed in Section 5.3) this is often accomplished by a link-state broadcast algorithm \[Perlman 1999\]. The result of the nodes' broadcast is that all nodes have an identical and complete view of the network. Each node can then run the LS algorithm and compute the same set of least-cost paths as every other node. The link-state routing algorithm we present below is known as Dijkstra's algorithm, named after its inventor. A closely related algorithm is Prim's algorithm; see \[Cormen 2001\] for a general discussion of graph algorithms. Dijkstra's algorithm computes the least-cost path from one node (the source, which we will refer to as u) to all other nodes in the network. Dijkstra's algorithm is iterative and has the property that after the kth iteration of the algorithm, the least-cost paths are known to k destination nodes, and among the least-cost paths to all destination nodes, these k paths will have the k smallest costs. Let us define the following notation:

D(v): cost of the least-cost path from the source node to destination v as of this iteration of the algorithm.

p(v): previous node (neighbor of v) along the current least-cost path from the source to v.
N′: subset of nodes; v is in N′ if the least-cost path from the source to v is definitively known.

The centralized routing algorithm consists of an initialization step followed by a loop. The number of times the loop is executed is equal to the number of nodes in the network. Upon termination, the algorithm will have calculated the shortest paths from the source node u to every other node in the network.

Link-State (LS) Algorithm for Source Node u

```
1   Initialization:
2     N' = {u}
3     for all nodes v
4       if v is a neighbor of u
5         then D(v) = c(u, v)
6         else D(v) = ∞
7
8   Loop
9     find w not in N' such that D(w) is a minimum
10    add w to N'
11    update D(v) for each neighbor v of w and not in N':
12      D(v) = min( D(v), D(w) + c(w, v) )
13    /* new cost to v is either old cost to v or known
14       least path cost to w plus cost from w to v */
15  until N' = N
```

As an example, let's consider the network in Figure 5.3 and compute the least-cost paths from u to all possible destinations. A tabular summary of the algorithm's computation is shown in Table 5.1, where each line in the table gives the values of the algorithm's variables at the end of the iteration. Let's consider the first few steps in detail. In the initialization step, the currently known least-cost paths from u to its directly attached neighbors, v, x, and w, are initialized to 2, 1, and 5, respectively.
Table 5.1 Running the link-state algorithm on the network in Figure 5.3

| Step | N′ | D(v), p(v) | D(w), p(w) | D(x), p(x) | D(y), p(y) | D(z), p(z) |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | u | 2, u | 5, u | 1, u | ∞ | ∞ |
| 1 | ux | 2, u | 4, x | | 2, x | ∞ |
| 2 | uxy | 2, u | 3, y | | | 4, y |
| 3 | uxyv | | 3, y | | | 4, y |
| 4 | uxyvw | | | | | 4, y |
| 5 | uxyvwz | | | | | |

Note in particular that the cost to w is set to 5 (even though we will soon see that a lesser-cost path does indeed exist) since this is the cost of the direct (one hop) link from u to w. The costs to y and z are set to infinity because they are not directly connected to u. In the first iteration, we look among those nodes not yet added to the set N′ and find that node with the least cost as of the end of the previous iteration. That node is x, with a cost of 1, and thus x is added to the set N′. Line 12 of the LS algorithm is then performed to update D(v) for all nodes v, yielding the results shown in the second line (Step 1) in Table 5.1. The cost of the path to v is unchanged. The cost of the path to w (which was 5 at the end of the initialization) through node x is found to have a cost of 4. Hence this lower-cost path is selected and w's predecessor along the shortest path from u is set to x. Similarly, the cost to y (through x) is computed to be 2, and the table is updated accordingly. In the second iteration, nodes v and y are found to have the least-cost paths (2), and we break the tie arbitrarily and add y to the set N′ so that N′ now contains u, x, and y. The costs to the remaining nodes not yet in N′, that is, nodes v, w, and z, are updated via line 12 of the LS algorithm, yielding the results shown in the third row in Table 5.1. And so on. When the LS algorithm terminates, we have, for each node, its predecessor along the least-cost path from the source node.
For each predecessor, we also have its predecessor, and so in this manner we can construct the entire path from the source to all destinations. The forwarding table in a node, say node u, can then be constructed from this information by storing, for each destination, the next-hop node on the least-cost path from u to the destination. Figure 5.4 shows the resulting least-cost paths and forwarding table in u for the network in Figure 5.3.

Figure 5.4 Least-cost path and forwarding table for node u

What is the computational complexity of this algorithm? That is, given n nodes (not counting the source), how much computation must be done in the worst case to find the least-cost paths from the source to all destinations? In the first iteration, we need to search through all n nodes to determine the node, w, not in N′ that has the minimum cost. In the second iteration, we need to check n−1 nodes to determine the minimum cost; in the third iteration n−2 nodes, and so on. Overall, the total number of nodes we need to search through over all the iterations is n(n+1)/2, and thus we say that the preceding implementation of the LS algorithm has worst-case complexity of order n squared: O(n²). (A more sophisticated implementation of this algorithm, using a data structure known as a heap, can find the minimum in line 9 in logarithmic rather than linear time, thus reducing the complexity.) Before completing our discussion of the LS algorithm, let us consider a pathology that can arise. Figure 5.5 shows a simple network topology where link costs are equal to the load carried on the link, for example, reflecting the delay that would be experienced. In this example, link costs are not symmetric; that is, c(u, v) equals c(v, u) only if the load carried on both directions on the link (u, v) is the same.
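As an aside, the LS pseudocode above, together with the heap refinement just mentioned, can be sketched in Python. This is an illustrative implementation, not from the text; the `graph` dictionary encodes link costs consistent with the worked example (c(u, v) = 2, c(u, x) = 1, c(u, w) = 5, and the remaining costs implied by Table 5.1):

```python
import heapq

# Link costs consistent with Figure 5.3 as used in Table 5.1
graph = {
    'u': {'v': 2, 'w': 5, 'x': 1},
    'v': {'u': 2, 'w': 3, 'x': 2},
    'w': {'u': 5, 'v': 3, 'x': 3, 'y': 1, 'z': 5},
    'x': {'u': 1, 'v': 2, 'w': 3, 'y': 1},
    'y': {'w': 1, 'x': 1, 'z': 2},
    'z': {'w': 5, 'y': 2},
}

def link_state(graph, source):
    """Dijkstra's algorithm; returns least costs D and predecessors p."""
    D = {v: float('inf') for v in graph}   # D(v) = ∞ for all v (line 6)
    D[source] = 0
    p = {}                                 # predecessor on the least-cost path
    in_N = set()                           # the set N' of finished nodes
    heap = [(0, source)]
    while heap:
        d, w = heapq.heappop(heap)         # line 9: minimum D(w) with w not in N'
        if w in in_N:
            continue                       # skip stale heap entries
        in_N.add(w)                        # line 10: add w to N'
        for v, c in graph[w].items():      # lines 11-12: relax w's neighbors
            if d + c < D[v]:
                D[v] = d + c
                p[v] = w
                heapq.heappush(heap, (D[v], v))
    return D, p

D, p = link_state(graph, 'u')
print(D)   # {'u': 0, 'v': 2, 'w': 3, 'x': 1, 'y': 2, 'z': 4}
```

The resulting costs match the final values in Table 5.1, and the predecessors (for instance p['w'] = 'y' and p['z'] = 'y') reproduce the p(v) entries, from which u's forwarding table can be read off as described above.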
In this example, node z originates a unit of traffic destined for w, node x also originates a unit of traffic destined for w, and node y injects an amount of traffic equal to e, also destined for w. The initial routing is shown in Figure 5.5(a) with the link costs corresponding to the amount of traffic carried. When the LS algorithm is next run, node y determines (based on the link costs shown in Figure 5.5(a)) that the clockwise path to w has a cost of 1, while the counterclockwise path to w (which it had been using) has a cost of 1 + e. Hence y's least-cost path to w is now clockwise. Similarly, x determines that its new least-cost path to w is also clockwise, resulting in the costs shown in Figure 5.5(b). When the LS algorithm is run next, nodes x, y, and z all detect a zero-cost path to w in the counterclockwise direction, and all route their traffic to the counterclockwise routes. The next time the LS algorithm is run, x, y, and z all then route their traffic to the clockwise routes.

Figure 5.5 Oscillations with congestion-sensitive routing

What can be done to prevent such oscillations (which can occur in any algorithm, not just an LS algorithm, that uses a congestion or delay-based link metric)? One solution would be to mandate that link costs not depend on the amount of traffic carried---an unacceptable solution since one goal of routing is to avoid highly congested (for example, high-delay) links. Another solution is to ensure that not all routers run the LS algorithm at the same time. This seems a more reasonable solution, since we would hope that even if routers ran the LS algorithm with the same periodicity, the execution instance of the algorithm would not be the same at each node. Interestingly, researchers have found that routers in the Internet can self-synchronize among themselves \[Floyd Synchronization 1994\].
That is, even though they initially execute the algorithm with the same period but at different instants of time, the algorithm execution instance can eventually become, and remain, synchronized at the routers. One way to avoid such self-synchronization is for each router to randomize the time it sends out a link advertisement. Having studied the LS algorithm, let's consider the other major routing algorithm that is used in practice today---the distance-vector routing algorithm.

5.2.2 The Distance-Vector (DV) Routing Algorithm

Whereas the LS algorithm is an algorithm using global information, the distance-vector (DV) algorithm is iterative, asynchronous, and distributed. It is distributed in that each node receives some information from one or more of its directly attached neighbors, performs a calculation, and then distributes the results of its calculation back to its neighbors. It is iterative in that this process continues on until no more information is exchanged between neighbors. (Interestingly, the algorithm is also self-terminating---there is no signal that the computation should stop; it just stops.) The algorithm is asynchronous in that it does not require all of the nodes to operate in lockstep with each other. We'll see that an asynchronous, iterative, self-terminating, distributed algorithm is much more interesting and fun than a centralized algorithm! Before we present the DV algorithm, it will prove beneficial to discuss an important relationship that exists among the costs of the least-cost paths. Let dx(y) be the cost of the least-cost path from node x to node y. Then the least costs are related by the celebrated Bellman-Ford equation, namely,

dx(y) = minv{c(x, v) + dv(y)},     (5.1)

where the minv in the equation is taken over all of x's neighbors. The Bellman-Ford equation is rather intuitive. Indeed, after traveling from x to v, if we then take the least-cost path from v to y, the path cost will be c(x, v) + dv(y).
Since we must begin by traveling to some neighbor v, the least cost from x to y is the minimum of c(x, v) + dv(y) taken over all neighbors v. But for those who might be skeptical about the validity of the equation, let's check it for source node u and destination node z in Figure 5.3. The source node u has three neighbors: nodes v, x, and w. By walking along various paths in the graph, it is easy to see that dv(z)=5, dx(z)=3, and dw(z)=3. Plugging these values into Equation 5.1, along with the costs c(u,v)=2, c(u,x)=1, and c(u,w)=5, gives du(z)=min{2+5, 5+3, 1+3}=4, which is obviously true and which is exactly what the Dijkstra algorithm gave us for the same network. This quick verification should help relieve any skepticism you may have. The Bellman-Ford equation is not just an intellectual curiosity. It actually has significant practical importance: the solution to the Bellman-Ford equation provides the entries in node x's forwarding table. To see this, let v\* be any neighboring node that achieves the minimum in Equation 5.1. Then, if node x wants to send a packet to node y along a least-cost path, it should first forward the packet to node v\*. Thus, node x's forwarding table would specify node v\* as the next-hop router for the ultimate destination y. Another important practical contribution of the Bellman-Ford equation is that it suggests the form of the neighbor-to-neighbor communication that will take place in the DV algorithm. The basic idea is as follows. Each node x begins with Dx(y), an estimate of the cost of the least-cost path from itself to node y, for all nodes, y, in N. Let Dx=\[Dx(y): y in N\] be node x's distance vector, which is the vector of cost estimates from x to all other nodes, y, in N.
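The numeric check of Equation 5.1 for du(z) above can be reproduced in a few lines (an illustrative sketch; the costs are those given in the text):

```python
# Check of the Bellman-Ford equation for d_u(z) in Figure 5.3.
# From the text: c(u,v)=2, c(u,x)=1, c(u,w)=5; d_v(z)=5, d_x(z)=3, d_w(z)=3.
c = {'v': 2, 'w': 5, 'x': 1}     # u's cost to each neighbor
d_z = {'v': 5, 'w': 3, 'x': 3}   # each neighbor's least cost to z

du_z = min(c[v] + d_z[v] for v in c)           # Equation 5.1
v_star = min(c, key=lambda v: c[v] + d_z[v])   # neighbor achieving the minimum
print(du_z, v_star)   # 4 x
```

The minimizing neighbor x is exactly the next hop that u's forwarding table would record for destination z.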
With the DV algorithm, each node x maintains the following routing information:

- For each neighbor v, the cost c(x, v) from x to directly attached neighbor, v
- Node x's distance vector, that is, Dx=\[Dx(y): y in N\], containing x's estimate of its cost to all destinations, y, in N
- The distance vectors of each of its neighbors, that is, Dv=\[Dv(y): y in N\] for each neighbor v of x

In the distributed, asynchronous algorithm, from time to time, each node sends a copy of its distance vector to each of its neighbors. When a node x receives a new distance vector from any of its neighbors w, it saves w's distance vector, and then uses the Bellman-Ford equation to update its own distance vector as follows:

Dx(y) = minv{c(x, v) + Dv(y)}   for each node y in N

If node x's distance vector has changed as a result of this update step, node x will then send its updated distance vector to each of its neighbors, which can in turn update their own distance vectors. Miraculously enough, as long as all the nodes continue to exchange their distance vectors in an asynchronous fashion, each cost estimate Dx(y) converges to dx(y), the actual cost of the least-cost path from node x to node y \[Bertsekas 1991\]!

Distance-Vector (DV) Algorithm

At each node, x:

```
1   Initialization:
2     for all destinations y in N:
3       Dx(y) = c(x, y)  /* if y is not a neighbor then c(x, y) = ∞ */
4     for each neighbor w
5       Dw(y) = ? for all destinations y in N
6     for each neighbor w
7       send distance vector Dx = [Dx(y): y in N] to w
8
9   loop
10    wait (until I see a link cost change to some neighbor w or
11          until I receive a distance vector from some neighbor w)
12
13    for each y in N:
14      Dx(y) = minv{c(x, v) + Dv(y)}
15
16    if Dx(y) changed for any destination y
17      send distance vector Dx = [Dx(y): y in N] to all neighbors
18
19  forever
```

In the DV algorithm, a node x updates its distance-vector estimate when it either sees a cost change in one of its directly attached links or receives a distance-vector update from some neighbor. But to update its own forwarding table for a given destination y, what node x really needs to know is not the shortest-path distance to y but instead the neighboring node v\*(y) that is the next-hop router along the shortest path to y. As you might expect, the next-hop router v\*(y) is the neighbor v that achieves the minimum in Line 14 of the DV algorithm. (If there are multiple neighbors v that achieve the minimum, then v\*(y) can be any of the minimizing neighbors.) Thus, in Lines 13--14, for each destination y, node x also determines v\*(y) and updates its forwarding table for destination y.

Recall that the LS algorithm is a centralized algorithm in the sense that it requires each node to first obtain a complete map of the network before running the Dijkstra algorithm. The DV algorithm is decentralized and does not use such global information. Indeed, the only information a node will have is the costs of the links to its directly attached neighbors and information it receives from these neighbors. Each node waits for an update from any neighbor (Lines 10--11), calculates its new distance vector when receiving an update (Line 14), and distributes its new distance vector to its neighbors (Lines 16--17).
DV-like algorithms are used in many routing protocols in practice, including the Internet's RIP and BGP, ISO IDRP, Novell IPX, and the original ARPAnet. Figure 5.6 illustrates the operation of the DV algorithm for the simple three-node network shown at the top of the figure. The operation of the algorithm is illustrated in a synchronous manner, where all nodes simultaneously receive distance vectors from their neighbors, compute their new distance vectors, and inform their neighbors if their distance vectors have changed. After studying this example, you should convince yourself that the algorithm operates correctly in an asynchronous manner as well, with node computations and update generation/reception occurring at any time. The leftmost column of the figure displays three initial routing tables for each of the three nodes. For example, the table in the upper-left corner is node x's initial routing table. Within a specific routing table, each row is a distance vector---specifically, each node's routing table includes its own distance vector and that of each of its neighbors. Thus, the first row in node x's initial routing table is Dx=\[Dx(x), Dx(y), Dx(z)\]=\[0, 2, 7\]. The second and third rows in this table are the most recently received distance vectors from nodes y and z, respectively. Because at initialization node x has not received anything from node y or z, the entries in the second and third rows are initialized to infinity. After initialization, each node sends its distance vector to each of its two neighbors. This is illustrated in Figure 5.6 by the arrows from the first column of tables to the second column of tables. For example, node x sends its distance vector Dx = \[0, 2, 7\] to both nodes y and z. After receiving the updates, each node recomputes its own distance vector.
For example, node x computes

Dx(x) = 0
Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)} = min{2 + 0, 7 + 1} = 2
Dx(z) = min{c(x,y) + Dy(z), c(x,z) + Dz(z)} = min{2 + 1, 7 + 0} = 3

The second column therefore displays, for each node, the node's new distance vector along with distance vectors just received from its neighbors.

Figure 5.6 Distance-vector (DV) algorithm in operation

Note, for example, that node x's estimate for the least cost to node z, Dx(z), has changed from 7 to 3. Also note that for node x, neighboring node y achieves the minimum in line 14 of the DV algorithm; thus at this stage of the algorithm, we have at node x that v\*(y)=y and v\*(z)=y. After the nodes recompute their distance vectors, they again send their updated distance vectors to their neighbors (if there has been a change). This is illustrated in Figure 5.6 by the arrows from the second column of tables to the third column of tables. Note that only nodes x and z send updates: node y's distance vector didn't change so node y doesn't send an update. After receiving the updates, the nodes then recompute their distance vectors and update their routing tables, which are shown in the third column.

The process of receiving updated distance vectors from neighbors, recomputing routing table entries, and informing neighbors of changed costs of the least-cost path to a destination continues until no update messages are sent. At this point, since no update messages are sent, no further routing table calculations will occur and the algorithm will enter a quiescent state; that is, all nodes will be performing the wait in Lines 10--11 of the DV algorithm. The algorithm remains in the quiescent state until a link cost changes, as discussed next.
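The synchronous rounds of Figure 5.6 can be reproduced with a short simulation (an illustrative sketch, not from the text; the link costs c(x,y)=2, c(x,z)=7, c(y,z)=1 are those of the three-node network):

```python
import copy

INF = float('inf')
links = {('x', 'y'): 2, ('x', 'z'): 7, ('y', 'z'): 1}  # Figure 5.6 costs
nodes = ['x', 'y', 'z']

def c(a, b):
    """Direct link cost between a and b (0 to itself, ∞ if no link)."""
    if a == b:
        return 0
    return links.get((a, b), links.get((b, a), INF))

# Initialization: each node's vector holds only its direct link costs
D = {n: {m: c(n, m) for m in nodes} for n in nodes}

changed = True
while changed:                     # synchronous rounds until quiescent
    changed = False
    received = copy.deepcopy(D)    # the vectors exchanged this round
    for x in nodes:
        for y in nodes:
            if x == y:
                continue
            # Line 14: Dx(y) = min over neighbors v of c(x,v) + Dv(y)
            new = min(c(x, v) + received[v][y] for v in nodes if v != x)
            if new != D[x][y]:
                D[x][y], changed = new, True

print(D['x'])   # {'x': 0, 'y': 2, 'z': 3}
```

After one round Dx(z) drops from 7 to 3 (via y), a second round produces no changes, and the loop exits in the quiescent state, just as in the figure.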
Distance-Vector Algorithm: Link-Cost Changes and Link Failure

When a node running the DV algorithm detects a change in the link cost from itself to a neighbor (Lines 10--11), it updates its distance vector (Lines 13--14) and, if there's a change in the cost of the least-cost path, informs its neighbors (Lines 16--17) of its new distance vector. Figure 5.7(a) illustrates a scenario where the link cost from y to x changes from 4 to 1. We focus here only on y's and z's distance table entries to destination x. The DV algorithm causes the following sequence of events to occur:

- At time t0, y detects the link-cost change (the cost has changed from 4 to 1), updates its distance vector, and informs its neighbors of this change since its distance vector has changed.
- At time t1, z receives the update from y and updates its table. It computes a new least cost to x (it has decreased from a cost of 5 to a cost of 2) and sends its new distance vector to its neighbors.
- At time t2, y receives z's update and updates its distance table. y's least costs do not change and hence y does not send any message to z. The algorithm comes to a quiescent state.

Thus, only two iterations are required for the DV algorithm to reach a quiescent state. The good news about the decreased cost between x and y has propagated quickly through the network.

Figure 5.7 Changes in link cost

Let's now consider what can happen when a link cost increases. Suppose that the link cost between x and y increases from 4 to 60, as shown in Figure 5.7(b).

1. Before the link cost changes, Dy(x)=4, Dy(z)=1, Dz(y)=1, and Dz(x)=5. At time t0, y detects the link-cost change (the cost has changed from 4 to 60). y computes its new minimum-cost path to x to have a cost of Dy(x)=min{c(y,x)+Dx(x), c(y,z)+Dz(x)}=min{60+0, 1+5}=6. Of course, with our global view of the network, we can see that this new cost via z is wrong.
   But the only information node y has is that its direct cost to x is 60 and that z has last told y that z could get to x with a cost of 5. So in order to get to x, y would now route through z, fully expecting that z will be able to get to x with a cost of 5. As of t1 we have a routing loop---in order to get to x, y routes through z, and z routes through y. A routing loop is like a black hole---a packet destined for x arriving at y or z as of t1 will bounce back and forth between these two nodes forever (or until the forwarding tables are changed).

2. Since node y has computed a new minimum cost to x, it informs z of its new distance vector at time t1.

3. Sometime after t1, z receives y's new distance vector, which indicates that y's minimum cost to x is 6. z knows it can get to y with a cost of 1 and hence computes a new least cost to x of Dz(x)=min{50+0, 1+6}=7. Since z's least cost to x has increased, it then informs y of its new distance vector at t2.

4. In a similar manner, after receiving z's new distance vector, y determines Dy(x)=8 and sends z its distance vector. z then determines Dz(x)=9 and sends y its distance vector, and so on.

How long will the process continue? You should convince yourself that the loop will persist for 44 iterations (message exchanges between y and z)---until z eventually computes the cost of its path via y to be greater than 50. At this point, z will (finally!) determine that its least-cost path to x is via its direct connection to x. y will then route to x via z. The result of the bad news about the increase in link cost has indeed traveled slowly! What would have happened if the link cost c(y, x) had changed from 4 to 10,000 and the cost c(z, x) had been 9,999? Because of such scenarios, the problem we have seen is sometimes referred to as the count-to-infinity problem.
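The y/z exchange above can be simulated directly (an illustrative sketch, not from the text; the exact message count depends on how updates interleave, but the endpoint matches the narrative: z finally settles on its direct 50-cost link and y then routes via z at cost 51):

```python
# After the change in Figure 5.7(b): c(y,x)=60, c(z,x)=50, c(y,z)=1.
# y and z start from their stale pre-change estimates of the cost to x.
c_yx, c_zx, c_yz = 60, 50, 1
Dy_x, Dz_x = 4, 5
msgs = 0

changed = True
while changed:
    changed = False
    new_y = min(c_yx, c_yz + Dz_x)      # y applies Bellman-Ford
    if new_y != Dy_x:
        Dy_x, changed = new_y, True
        msgs += 1                       # y advertises its change to z
    new_z = min(c_zx, c_yz + Dy_x)      # z applies Bellman-Ford
    if new_z != Dz_x:
        Dz_x, changed = new_z, True
        msgs += 1                       # z advertises its change to y

print(Dy_x, Dz_x)   # 51 50
```

Running this, the estimates creep upward one unit per message (6, 7, 8, 9, ...) through dozens of exchanges before z's path via y finally exceeds 50, illustrating how slowly the bad news travels.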
Distance-Vector Algorithm: Adding Poisoned Reverse

The specific looping scenario just described can be avoided using a technique known as poisoned reverse. The idea is simple---if z routes through y to get to destination x, then z will advertise to y that its distance to x is infinity, that is, z will advertise to y that Dz(x)=∞ (even though z knows Dz(x)=5 in truth). z will continue telling this little white lie to y as long as it routes to x via y. Since y believes that z has no path to x, y will never attempt to route to x via z, as long as z continues to route to x via y (and lies about doing so). Let's now see how poisoned reverse solves the particular looping problem we encountered before in Figure 5.7(b). As a result of the poisoned reverse, y's distance table indicates Dz(x)=∞. When the cost of the (x, y) link changes from 4 to 60 at time t0, y updates its table and continues to route directly to x, albeit at a higher cost of 60, and informs z of its new cost to x, that is, Dy(x)=60. After receiving the update at t1, z immediately shifts its route to x to be via the direct (z, x) link at a cost of 50. Since this is a new least-cost path to x, and since the path no longer passes through y, z now informs y that Dz(x)=50 at t2. After receiving the update from z, y updates its distance table with Dy(x)=51. Also, since z is now on y's least-cost path to x, y poisons the reverse path from z to x by informing z at time t3 that Dy(x)=∞ (even though y knows that Dy(x)=51 in truth). Does poisoned reverse solve the general count-to-infinity problem? It does not. You should convince yourself that loops involving three or more nodes (rather than simply two immediately neighboring nodes) will not be detected by the poisoned reverse technique.

A Comparison of LS and DV Routing Algorithms

The DV and LS algorithms take complementary approaches toward computing routing.
In the DV algorithm, each node talks to only its directly connected neighbors, but it provides its neighbors with least-cost estimates from itself to all the nodes (that it knows about) in the network. The LS algorithm requires global information. Consequently, when implemented in each and every router, e.g., as in Figures 4.2 and 5.1, each node would need to communicate with all other nodes (via broadcast), but it tells them only the costs of its directly connected links. Let's conclude our study of LS and DV algorithms with a quick comparison of some of their attributes. Recall that N is the set of nodes (routers) and E is the set of edges (links).

Message complexity. We have seen that LS requires each node to know the cost of each link in the network. This requires O(\|N\| \|E\|) messages to be sent. Also, whenever a link cost changes, the new link cost must be sent to all nodes. The DV algorithm requires message exchanges between directly connected neighbors at each iteration. We have seen that the time needed for the algorithm to converge can depend on many factors. When link costs change, the DV algorithm will propagate the results of the changed link cost only if the new link cost results in a changed least-cost path for one of the nodes attached to that link.

Speed of convergence. We have seen that our implementation of LS is an O(\|N\|²) algorithm requiring O(\|N\| \|E\|) messages. The DV algorithm can converge slowly and can have routing loops while the algorithm is converging. DV also suffers from the count-to-infinity problem.

Robustness. What can happen if a router fails, misbehaves, or is sabotaged? Under LS, a router could broadcast an incorrect cost for one of its attached links (but no others). A node could also corrupt or drop any packets it received as part of an LS broadcast. But an LS node is computing only its own forwarding tables; other nodes are performing similar calculations for themselves.
This means route calculations are somewhat separated under LS, providing a degree of robustness. Under DV, a node can advertise incorrect least-cost paths to any or all destinations. (Indeed, in 1997, a malfunctioning router in a small ISP provided national backbone routers with erroneous routing information. This caused other routers to flood the malfunctioning router with traffic and caused large portions of the Internet to become disconnected for up to several hours \[Neumann 1997\].) More generally, we note that, at each iteration, a node's calculation in DV is passed on to its neighbor and then indirectly to its neighbor's neighbor on the next iteration. In this sense, an incorrect node calculation can be diffused through the entire network under DV. In the end, neither algorithm is an obvious winner over the other; indeed, both algorithms are used in the Internet.

5.3 Intra-AS Routing in the Internet: OSPF

In our study of routing algorithms so far, we've viewed the network simply as a collection of interconnected routers. One router was indistinguishable from another in the sense that all routers executed the same routing algorithm to compute routing paths through the entire network. In practice, this model and its view of a homogenous set of routers all executing the same routing algorithm is simplistic for two important reasons:

Scale. As the number of routers becomes large, the overhead involved in communicating, computing, and storing routing information becomes prohibitive. Today's Internet consists of hundreds of millions of routers. Storing routing information for possible destinations at each of these routers would clearly require enormous amounts of memory. The overhead required to broadcast connectivity and link cost updates among all of the routers would be huge! A distance-vector algorithm that iterated among such a large number of routers would surely never converge.
Clearly, something must be done to reduce the complexity of route computation in a network as large as the Internet.

Administrative autonomy. As described in Section 1.3, the Internet is a network of ISPs, with each ISP consisting of its own network of routers. An ISP generally desires to operate its network as it pleases (for example, to run whatever routing algorithm it chooses within its network) or to hide aspects of its network's internal organization from the outside. Ideally, an organization should be able to operate and administer its network as it wishes, while still being able to connect its network to other outside networks.

Both of these problems can be solved by organizing routers into autonomous systems (ASs), with each AS consisting of a group of routers that are under the same administrative control. Often the routers in an ISP, and the links that interconnect them, constitute a single AS. Some ISPs, however, partition their network into multiple ASs. In particular, some tier-1 ISPs use one gigantic AS for their entire network, whereas others break up their ISP into tens of interconnected ASs. An autonomous system is identified by its globally unique autonomous system number (ASN) \[RFC 1930\]. AS numbers, like IP addresses, are assigned by ICANN regional registries \[ICANN 2016\]. Routers within the same AS all run the same routing algorithm and have information about each other. The routing algorithm running within an autonomous system is called an intra-autonomous system routing protocol.

Open Shortest Path First (OSPF)

OSPF routing and its closely related cousin, IS-IS, are widely used for intra-AS routing in the Internet. The Open in OSPF indicates that the routing protocol specification is publicly available (for example, as opposed to Cisco's EIGRP protocol, which only recently became open \[Savage 2015\], after roughly 20 years as a Cisco-proprietary protocol).
The most recent version of OSPF, version 2, is defined in \[RFC 2328\], a public document. OSPF is a link-state protocol that uses flooding of link-state information and Dijkstra's least-cost path algorithm. With OSPF, each router constructs a complete topological map (that is, a graph) of the entire autonomous system. Each router then locally runs Dijkstra's shortest-path algorithm to determine a shortest-path tree to all subnets, with itself as the root node. Individual link costs are configured by the network administrator (see sidebar, Principles and Practice: Setting OSPF Weights). The administrator might choose to set all link costs to 1,

PRINCIPLES IN PRACTICE

SETTING OSPF LINK WEIGHTS

Our discussion of link-state routing has implicitly assumed that link weights are set, a routing algorithm such as OSPF is run, and traffic flows according to the routing tables computed by the LS algorithm. In terms of cause and effect, the link weights are given (i.e., they come first) and result (via Dijkstra's algorithm) in routing paths that minimize overall cost. In this viewpoint, link weights reflect the cost of using a link (e.g., if link weights are inversely proportional to capacity, then the use of high-capacity links would have smaller weight and thus be more attractive from a routing standpoint) and Dijkstra's algorithm serves to minimize overall cost. In practice, the cause and effect relationship between link weights and routing paths may be reversed, with network operators configuring link weights in order to obtain routing paths that achieve certain traffic engineering goals \[Fortz 2000, Fortz 2002\]. For example, suppose a network operator has an estimate of traffic flow entering the network at each ingress point and destined for each egress point. The operator may then want to put in place a specific routing of ingress-to-egress flows that minimizes the maximum utilization over all of the network's links.
But with a routing algorithm such as OSPF, the operator's main "knobs" for tuning the routing of flows through the network are the link weights. Thus, in order to achieve the goal of minimizing the maximum link utilization, the operator must find the set of link weights that achieves this goal. This is a reversal of the cause and effect relationship---the desired routing of flows is known, and the OSPF link weights must be found such that the OSPF routing algorithm results in this desired routing of flows.

thus achieving minimum-hop routing, or might choose to set the link weights to be inversely proportional to link capacity in order to discourage traffic from using low-bandwidth links. OSPF does not mandate a policy for how link weights are set (that is the job of the network administrator), but instead provides the mechanisms (protocol) for determining least-cost path routing for the given set of link weights. With OSPF, a router broadcasts routing information to all other routers in the autonomous system, not just to its neighboring routers. A router broadcasts link-state information whenever there is a change in a link's state (for example, a change in cost or a change in up/down status). It also broadcasts a link's state periodically (at least once every 30 minutes), even if the link's state has not changed. RFC 2328 notes that "this periodic updating of link state advertisements adds robustness to the link state algorithm." OSPF advertisements are contained in OSPF messages that are carried directly by IP, with an upper-layer protocol number of 89 for OSPF. Thus, the OSPF protocol must itself implement functionality such as reliable message transfer and link-state broadcast. The OSPF protocol also checks that links are operational (via a HELLO message that is sent to an attached neighbor) and allows an OSPF router to obtain a neighboring router's database of network-wide link state.
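The inverse-capacity weighting mentioned above is commonly implemented as a reference bandwidth divided by the link bandwidth (for example, Cisco routers default to a 100 Mbps reference for OSPF cost; the sketch below is illustrative, not part of the OSPF specification itself):

```python
REF_BW = 100_000_000   # 100 Mbps reference bandwidth (a common default)

def ospf_cost(link_bw_bps):
    """Inverse-capacity link weight; OSPF costs are at least 1."""
    return max(1, REF_BW // link_bw_bps)

print(ospf_cost(10_000_000))     # 10 Mbps link  -> cost 10
print(ospf_cost(100_000_000))    # 100 Mbps link -> cost 1
print(ospf_cost(1_000_000_000))  # 1 Gbps link   -> cost 1 (floor at 1)
```

Note the pathology in the last line: with a fixed reference bandwidth, every link at or above the reference collapses to the same minimum cost, which is one reason operators end up tuning weights by hand for traffic engineering, as the sidebar describes.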
Some of the advances embodied in OSPF include the following:

Security. Exchanges between OSPF routers (for example, link-state updates) can be authenticated. With authentication, only trusted routers can participate in the OSPF protocol within an AS, thus preventing malicious intruders (or networking students taking their newfound knowledge out for a joyride) from injecting incorrect information into router tables. By default, OSPF packets between routers are not authenticated and could be forged. Two types of authentication can be configured---simple and MD5 (see Chapter 8 for a discussion on MD5 and authentication in general). With simple authentication, the same password is configured on each router. When a router sends an OSPF packet, it includes the password in plaintext. Clearly, simple authentication is not very secure. MD5 authentication is based on shared secret keys that are configured in all the routers. For each OSPF packet that it sends, the router computes the MD5 hash of the content of the OSPF packet appended with the secret key. (See the discussion of message authentication codes in Chapter 8.) Then the router includes the resulting hash value in the OSPF packet. The receiving router, using the preconfigured secret key, will compute an MD5 hash of the packet and compare it with the hash value that the packet carries, thus verifying the packet's authenticity. Sequence numbers are also used with MD5 authentication to protect against replay attacks.

Multiple same-cost paths. When multiple paths to a destination have the same cost, OSPF allows multiple paths to be used (that is, a single path need not be chosen for carrying all traffic when multiple equal-cost paths exist).

Integrated support for unicast and multicast routing. Multicast OSPF (MOSPF) \[RFC 1584\] provides simple extensions to OSPF to provide for multicast routing.
MOSPF uses the existing OSPF link database and adds a new type of link-state advertisement to the existing OSPF link-state broadcast mechanism.

Support for hierarchy within a single AS. An OSPF autonomous system can be configured hierarchically into areas. Each area runs its own OSPF link-state routing algorithm, with each router in an area broadcasting its link state to all other routers in that area. Within each area, one or more area border routers are responsible for routing packets outside the area. Lastly, exactly one OSPF area in the AS is configured to be the backbone area. The primary role of the backbone area is to route traffic between the other areas in the AS. The backbone always contains all area border routers in the AS and may contain non-border routers as well. Inter-area routing within the AS requires that the packet be first routed to an area border router (intra-area routing), then routed through the backbone to the area border router that is in the destination area, and then routed to the final destination.

OSPF is a relatively complex protocol, and our coverage here has been necessarily brief; \[Huitema 1998; Moy 1998; RFC 2328\] provide additional details.

5.4 Routing Among the ISPs: BGP

We just learned that OSPF is an example of an intra-AS routing protocol. When routing a packet between a source and destination within the same AS, the route the packet follows is entirely determined by the intra-AS routing protocol. However, to route a packet across multiple ASs, say from a smartphone in Timbuktu to a server in a datacenter in Silicon Valley, we need an inter-autonomous system routing protocol. Since an inter-AS routing protocol involves coordination among multiple ASs, communicating ASs must run the same inter-AS routing protocol. In fact, in the Internet, all ASs run the same inter-AS routing protocol, called the Border Gateway Protocol, more commonly known as BGP \[RFC 4271; Stewart 1999\].
BGP is arguably the most important of all the Internet protocols (the only other contender would be the IP protocol that we studied in Section 4.3), as it is the protocol that glues the thousands of ISPs in the Internet together. As we will soon see, BGP is a decentralized and asynchronous protocol in the vein of distance-vector routing described in Section 5.2.2. Although BGP is a complex and challenging protocol, to understand the Internet on a deep level, we need to become familiar with its underpinnings and operation. The time we devote to learning BGP will be well worth the effort.

5.4.1 The Role of BGP

To understand the responsibilities of BGP, consider an AS and an arbitrary router in that AS. Recall that every router has a forwarding table, which plays the central role in the process of forwarding arriving packets to outbound router links. As we have learned, for destinations that are within the same AS, the entries in the router's forwarding table are determined by the AS's intra-AS routing protocol. But what about destinations that are outside of the AS? This is precisely where BGP comes to the rescue. In BGP, packets are not routed to a specific destination address, but instead to CIDRized prefixes, with each prefix representing a subnet or a collection of subnets. In the world of BGP, a destination may take the form 138.16.68/22, which for this example includes 1,024 IP addresses. Thus, a router's forwarding table will have entries of the form (x, I), where x is a prefix (such as 138.16.68/22) and I is an interface number for one of the router's interfaces.

As an inter-AS routing protocol, BGP provides each router a means to:

1. Obtain prefix reachability information from neighboring ASs. In particular, BGP allows each subnet to advertise its existence to the rest of the Internet. A subnet screams, "I exist and I am here," and BGP makes sure that all the routers in the Internet know about this subnet.
If it weren't for BGP, each subnet would be an isolated island: alone, unknown and unreachable by the rest of the Internet.

2. Determine the "best" routes to the prefixes. A router may learn about two or more different routes to a specific prefix. To determine the best route, the router will locally run a BGP route-selection procedure (using the prefix reachability information it obtained via neighboring routers). The best route will be determined based on policy as well as the reachability information.

Let us now delve into how BGP carries out these two tasks.

5.4.2 Advertising BGP Route Information

Consider the network shown in Figure 5.8. As we can see, this simple network has three autonomous systems: AS1, AS2, and AS3. As shown, AS3 includes a subnet with prefix x. For each AS, each router is either a gateway router or an internal router. A gateway router is a router on the edge of an AS that directly connects to one or more routers in other ASs. An internal router connects only to hosts and routers within its own AS. In AS1, for example, router 1c is a gateway router; routers 1a, 1b, and 1d are internal routers.

Let's consider the task of advertising reachability information for prefix x to all of the routers shown in Figure 5.8. At a high level, this is straightforward. First, AS3 sends a BGP message to AS2, saying that x exists and is in AS3; let's denote this message as "AS3 x". Then AS2 sends a BGP message to AS1, saying that x exists and that you can get to x by first passing through AS2 and then going to AS3; let's denote that message as "AS2 AS3 x". In this manner, each of the autonomous systems will not only learn about the existence of x, but also learn about a path of autonomous systems that leads to x.
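Returning for a moment to the forwarding-table entries of the form (x, I) from Section 5.4.1, a minimal sketch using Python's ipaddress module makes the prefix arithmetic concrete (the table contents and interface numbers are invented for illustration; 138.16.68/22 is written here in its full dotted form):

```python
import ipaddress

prefix = ipaddress.ip_network("138.16.68.0/22")
# A /22 leaves 32 - 22 = 10 host bits, i.e., 2**10 = 1,024 addresses.
assert prefix.num_addresses == 1024

# Hypothetical forwarding table: entries of the form (x, I).
forwarding_table = [(prefix, 2)]  # interface number 2 is made up

def lookup(destination: str):
    # Return the interface number for the first prefix containing the address.
    addr = ipaddress.ip_address(destination)
    for x, interface in forwarding_table:
        if addr in x:
            return interface
    return None  # no matching entry
```

Here 138.16.68.0/22 spans addresses 138.16.68.0 through 138.16.71.255, so a packet destined to 138.16.71.200 matches the entry while 138.16.72.1 does not.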
Although the discussion in the above paragraph about advertising BGP reachability information should get the general idea across, it is not precise in the sense that autonomous systems do not actually send messages to each other, but instead routers do. To understand this, let's now re-examine the example in Figure 5.8.

Figure 5.8 Network with three autonomous systems. AS3 includes a subnet with prefix x

In BGP, pairs of routers exchange routing information over semi-permanent TCP connections using port 179. Each such TCP connection, along with all the BGP messages sent over the connection, is called a BGP connection. Furthermore, a BGP connection that spans two ASs is called an external BGP (eBGP) connection, and a BGP session between routers in the same AS is called an internal BGP (iBGP) connection. Examples of BGP connections for the network in Figure 5.8 are shown in Figure 5.9. There is typically one eBGP connection for each link that directly connects gateway routers in different ASs; thus, in Figure 5.9, there is an eBGP connection between gateway routers 1c and 2a and an eBGP connection between gateway routers 2c and 3a. There are also iBGP connections between routers within each of the ASs. In particular, Figure 5.9 displays a common configuration of one BGP connection for each pair of routers internal to an AS, creating a mesh of TCP connections within each AS. In Figure 5.9, the eBGP connections are shown with the long dashes; the iBGP connections are shown with the short dashes. Note that iBGP connections do not always correspond to physical links.

In order to propagate the reachability information, both iBGP and eBGP sessions are used. Consider again advertising the reachability information for prefix x to all routers in AS1 and AS2. In this process, gateway router 3a first sends an eBGP message "AS3 x" to gateway router 2c.
Gateway router 2c then sends the iBGP message "AS3 x" to all of the other routers in AS2, including to gateway router 2a. Gateway router 2a then sends the eBGP message "AS2 AS3 x" to gateway router 1c.

Figure 5.9 eBGP and iBGP connections

Finally, gateway router 1c uses iBGP to send the message "AS2 AS3 x" to all the routers in AS1. After this process is complete, each router in AS1 and AS2 is aware of the existence of x and is also aware of an AS path that leads to x. Of course, in a real network, from a given router there may be many different paths to a given destination, each through a different sequence of ASs. For example, consider the network in Figure 5.10, which is the original network in Figure 5.8, with an additional physical link from router 1d to router 3d. In this case, there are two paths from AS1 to x: the path "AS2 AS3 x" via router 1c; and the new path "AS3 x" via the router 1d.

5.4.3 Determining the Best Routes

As we have just learned, there may be many paths from a given router to a destination subnet. In fact, in the Internet, routers often receive reachability information about dozens of different possible paths. How does a router choose among these paths (and then configure its forwarding table accordingly)? Before addressing this critical question, we need to introduce a little more BGP terminology. When a router advertises a prefix across a BGP connection, it includes with the prefix several BGP attributes. In BGP jargon, a prefix along with its attributes is called a route. Two of the more important attributes are AS-PATH and NEXT-HOP. The AS-PATH attribute contains the list of ASs through which the advertisement has passed, as we've seen in our examples above. To generate the AS-PATH value, when a prefix is passed to an AS, the AS adds its ASN to the existing list in the AS-PATH.

Figure 5.10 Network augmented with peering link between AS1 and AS3
For example, in Figure 5.10, there are two routes from AS1 to subnet x: one which uses the AS-PATH "AS2 AS3"; and another that uses the AS-PATH "AS3". BGP routers also use the AS-PATH attribute to detect and prevent looping advertisements; specifically, if a router sees that its own AS is contained in the path list, it will reject the advertisement.

Providing the critical link between the inter-AS and intra-AS routing protocols, the NEXT-HOP attribute has a subtle but important use. The NEXT-HOP is the IP address of the router interface that begins the AS-PATH. To gain insight into this attribute, let's again refer to Figure 5.10. As indicated in Figure 5.10, the NEXT-HOP attribute for the route "AS2 AS3 x" from AS1 to x that passes through AS2 is the IP address of the left interface on router 2a. The NEXT-HOP attribute for the route "AS3 x" from AS1 to x that bypasses AS2 is the IP address of the leftmost interface of router 3d. In summary, in this toy example, each router in AS1 becomes aware of two BGP routes to prefix x:

IP address of leftmost interface for router 2a; AS2 AS3; x
IP address of leftmost interface of router 3d; AS3; x

Here, each BGP route is written as a list with three components: NEXT-HOP; AS-PATH; destination prefix. In practice, a BGP route includes additional attributes, which we will ignore for the time being. Note that the NEXT-HOP attribute is an IP address of a router that does not belong to AS1; however, the subnet that contains this IP address directly attaches to AS1.

Hot Potato Routing

We are now finally in position to talk about BGP routing algorithms in a precise manner. We will begin with one of the simplest routing algorithms, namely, hot potato routing. Consider router 1b in the network in Figure 5.10. As just described, this router will learn about two possible BGP routes to prefix x.
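The AS-PATH mechanics just described, prepending on advertisement and loop rejection on receipt, can be sketched as follows (a toy model; real BGP UPDATE messages carry many more fields):

```python
def advertise(as_path, sending_as):
    # An AS adds its own number to the AS-PATH as it passes a route on.
    return [sending_as] + as_path

def accept(as_path, my_as):
    # Reject the advertisement if our own AS already appears in the
    # path list, since that would indicate a looping advertisement.
    return my_as not in as_path

path = advertise([], "AS3")    # AS3 originates prefix x: "AS3 x"
path = advertise(path, "AS2")  # AS2 passes it on: "AS2 AS3 x"
```

AS1 would accept this path, while AS2, seeing itself already listed, would reject it if the advertisement ever came back around.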
In hot potato routing, the route chosen (from among all possible routes) is that route with the least cost to the NEXT-HOP router beginning that route. In this example, router 1b will consult its intra-AS routing information to find the least-cost intra-AS path to NEXT-HOP router 2a and the least-cost intra-AS path to NEXT-HOP router 3d, and then select the route with the smallest of these least-cost paths. For example, suppose that cost is defined as the number of links traversed. Then the least cost from router 1b to router 2a is 2, the least cost from router 1b to router 3d is 3, and router 2a would therefore be selected. Router 1b would then consult its forwarding table (configured by its intra-AS algorithm) and find the interface I that is on the least-cost path to router 2a. It then adds (x, I) to its forwarding table.

The steps for adding an outside-AS prefix in a router's forwarding table for hot potato routing are summarized in Figure 5.11. It is important to note that when adding an outside-AS prefix into a forwarding table, both the inter-AS routing protocol (BGP) and the intra-AS routing protocol (e.g., OSPF) are used.

Figure 5.11 Steps in adding outside-AS destination in a router's forwarding table

The idea behind hot-potato routing is for router 1b to get packets out of its AS as quickly as possible (more specifically, with the least cost possible) without worrying about the cost of the remaining portions of the path outside of its AS to the destination. In the name "hot potato routing," a packet is analogous to a hot potato that is burning in your hands. Because it is burning hot, you want to pass it off to another person (another AS) as quickly as possible. Hot potato routing is thus a selfish algorithm: it tries to reduce the cost in its own AS while ignoring the other components of the end-to-end costs outside its AS.
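Router 1b's hot potato choice can be sketched as follows. The intra-AS costs are the link counts from the example above; the outgoing interface numbers are invented:

```python
# Intra-AS least costs from router 1b to each candidate NEXT-HOP router,
# taken from the example (cost = number of links traversed).
intra_as_cost = {"2a": 2, "3d": 3}

# Candidate routes to prefix x: NEXT-HOP router -> outgoing interface I
# on the least-cost intra-AS path to that router (interfaces made up).
candidates = {"2a": 1, "3d": 4}

def hot_potato(candidates, costs):
    # Pick the route whose NEXT-HOP router is cheapest to reach
    # inside the AS, ignoring all costs beyond the AS boundary.
    best = min(candidates, key=lambda hop: costs[hop])
    return best, candidates[best]

next_hop, interface = hot_potato(candidates, intra_as_cost)
# Router 1b would then add (x, interface) to its forwarding table.
```

With costs 2 versus 3, router 2a wins, matching the example in the text.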
Note that with hot potato routing, two routers in the same AS may choose two different AS paths to the same prefix. For example, we just saw that router 1b would send packets through AS2 to reach x. However, router 1d would bypass AS2 and send packets directly to AS3 to reach x.

Route-Selection Algorithm

In practice, BGP uses an algorithm that is more complicated than hot potato routing, but nevertheless incorporates hot potato routing. For any given destination prefix, the input into BGP's route-selection algorithm is the set of all routes to that prefix that have been learned and accepted by the router. If there is only one such route, then BGP obviously selects that route. If there are two or more routes to the same prefix, then BGP sequentially invokes the following elimination rules until one route remains:

1. A route is assigned a local preference value as one of its attributes (in addition to the AS-PATH and NEXT-HOP attributes). The local preference of a route could have been set by the router or could have been learned from another router in the same AS. The value of the local preference attribute is a policy decision that is left entirely up to the AS's network administrator. (We will shortly discuss BGP policy issues in some detail.) The routes with the highest local preference values are selected.

2. From the remaining routes (all with the same highest local preference value), the route with the shortest AS-PATH is selected. If this rule were the only rule for route selection, then BGP would be using a DV algorithm for path determination, where the distance metric uses the number of AS hops rather than the number of router hops.

3. From the remaining routes (all with the same highest local preference value and the same AS-PATH length), hot potato routing is used, that is, the route with the closest NEXT-HOP router is selected.

4.
If more than one route still remains, the router uses BGP identifiers to select the route; see [Stewart 1999].

As an example, let's again consider router 1b in Figure 5.10. Recall that there are exactly two BGP routes to prefix x, one that passes through AS2 and one that bypasses AS2. Also recall that if hot potato routing on its own were used, then BGP would route packets through AS2 to prefix x. But in the above route-selection algorithm, rule 2 is applied before rule 3, causing BGP to select the route that bypasses AS2, since that route has a shorter AS-PATH. So we see that with the above route-selection algorithm, BGP is no longer a selfish algorithm: it first looks for routes with short AS paths (thereby likely reducing end-to-end delay).

As noted above, BGP is the de facto standard for inter-AS routing for the Internet. To see the contents of various BGP routing tables (large!) extracted from routers in tier-1 ISPs, see http://www.routeviews.org. BGP routing tables often contain over half a million routes (that is, prefixes and corresponding attributes). Statistics about the size and characteristics of BGP routing tables are presented in [Potaroo 2016].

5.4.4 IP-Anycast

In addition to being the Internet's inter-AS routing protocol, BGP is often used to implement the IP-anycast service [RFC 1546, RFC 7094], which is commonly used in DNS. To motivate IP-anycast, consider that in many applications, we are interested in (1) replicating the same content on different servers in many different dispersed geographical locations, and (2) having each user access the content from the server that is closest. For example, a CDN may replicate videos and other objects on servers in different countries. Similarly, the DNS system can replicate DNS records on DNS servers throughout the world.
When a user wants to access this replicated content, it is desirable to point the user to the "nearest" server with the replicated content. BGP's route-selection algorithm provides an easy and natural mechanism for doing so.

To make our discussion concrete, let's describe how a CDN might use IP-anycast. As shown in Figure 5.12, during the IP-anycast configuration stage, the CDN company assigns the same IP address to each of its servers, and uses standard BGP to advertise this IP address from each of the servers. When a BGP router receives multiple route advertisements for this IP address, it treats these advertisements as providing different paths to the same physical location (when, in fact, the advertisements are for different paths to different physical locations). When configuring its routing table, each router will locally use the BGP route-selection algorithm to pick the "best" (for example, closest, as determined by AS-hop counts) route to that IP address. For example, if one BGP route (corresponding to one location) is only one AS hop away from the router, and all other BGP routes (corresponding to other locations) are two or more AS hops away, then the BGP router would choose to route packets to the location that is one hop away. After this initial BGP address-advertisement phase, the CDN can do its main job of distributing content. When a client requests the video, the CDN returns to the client the common IP address used by the geographically dispersed servers, no matter where the client is located. When the client sends a request to that IP address, Internet routers then forward the request packet to the "closest" server, as defined by the BGP route-selection algorithm.
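The route-selection procedure that each router runs here, the same four elimination rules from Section 5.4.3, can be sketched as a sequence of filters. The attribute values, intra-AS costs, and router IDs below are invented, and real BGP applies additional tie-breakers:

```python
def select_route(routes, intra_as_cost):
    # routes: dicts with local_pref, as_path, next_hop, router_id keys.
    # 1. Keep routes with the highest local preference value.
    best_pref = max(r["local_pref"] for r in routes)
    routes = [r for r in routes if r["local_pref"] == best_pref]
    # 2. Keep routes with the shortest AS-PATH.
    shortest = min(len(r["as_path"]) for r in routes)
    routes = [r for r in routes if len(r["as_path"]) == shortest]
    # 3. Hot potato: keep routes with the closest NEXT-HOP router.
    least = min(intra_as_cost[r["next_hop"]] for r in routes)
    routes = [r for r in routes if intra_as_cost[r["next_hop"]] == least]
    # 4. Break any remaining tie using BGP identifiers.
    return min(routes, key=lambda r: r["router_id"])

# Router 1b's two routes to prefix x from the earlier example
# (local preferences and router IDs are illustrative):
routes = [
    {"local_pref": 100, "as_path": ["AS2", "AS3"], "next_hop": "2a", "router_id": 1},
    {"local_pref": 100, "as_path": ["AS3"], "next_hop": "3d", "router_id": 2},
]
chosen = select_route(routes, {"2a": 2, "3d": 3})
```

With equal local preferences, rule 2 fires before rule 3, so the shorter AS-PATH through router 3d wins even though 2a is closer, matching the discussion of router 1b above.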
Although the above CDN example nicely illustrates how IP-anycast can be used, in practice CDNs generally choose not to use IP-anycast because BGP routing changes can result in different packets of the same TCP connection arriving at different instances of the Web server. But IP-anycast is extensively used by the DNS system to direct DNS queries to the closest root DNS server. Recall from Section 2.4, there are currently 13 IP addresses for root DNS servers. But corresponding to each of these addresses, there are multiple DNS root servers, with some of these addresses having over 100 DNS root servers scattered over all corners of the world. When a DNS query is sent to one of these 13 IP addresses, IP anycast is used to route the query to the nearest of the DNS root servers that is responsible for that address.

Figure 5.12 Using IP-anycast to bring users to the closest CDN server

5.4.5 Routing Policy

When a router selects a route to a destination, the AS routing policy can trump all other considerations, such as shortest AS path or hot potato routing. Indeed, in the route-selection algorithm, routes are first selected according to the local-preference attribute, whose value is fixed by the policy of the local AS. Let's illustrate some of the basic concepts of BGP routing policy with a simple example. Figure 5.13 shows six interconnected autonomous systems: A, B, C, W, X, and Y. It is important to note that A, B, C, W, X, and Y are ASs, not routers.

Figure 5.13 A simple BGP policy scenario

Let's assume that autonomous systems W, X, and Y are access ISPs and that A, B, and C are backbone provider networks. We'll also assume that A, B, and C directly send traffic to each other, and provide full BGP information to their customer networks. All traffic entering an ISP access network must be destined for that network, and all traffic leaving an ISP access network must have originated in that network. W and Y are clearly access ISPs.
X is a multi-homed access ISP, since it is connected to the rest of the network via two different providers (a scenario that is becoming increasingly common in practice). However, like W and Y, X itself must be the source/destination of all traffic leaving/entering X. But how will this stub network behavior be implemented and enforced? How will X be prevented from forwarding traffic between B and C? This can easily be accomplished by controlling the manner in which BGP routes are advertised. In particular, X will function as an access ISP network if it advertises (to its neighbors B and C) that it has no paths to any other destinations except itself. That is, even though X may know of a path, say XCY, that reaches network Y, it will not advertise this path to B. Since B is unaware that X has a path to Y, B would never forward traffic destined to Y (or C) via X. This simple example illustrates how a selective route advertisement policy can be used to implement customer/provider routing relationships.

Let's next focus on a provider network, say AS B. Suppose that B has learned (from A) that A has a path AW to W. B can thus install the route AW into its routing information base. Clearly, B also wants to advertise the path BAW to its customer, X, so that X knows that it can route to W via B. But should B advertise the path BAW to C? If it does so, then C could route traffic to W via BAW. If A, B, and C are all backbone providers, then B might rightly feel that it should not have to shoulder the burden (and cost!) of carrying transit traffic between A and C. B might rightly feel that it is A's and C's job (and cost!) to make sure that C can route to/from A's customers via a direct connection between A and C. There are currently no official standards that govern how backbone ISPs route among themselves.
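X's selective advertisement earlier in this example can be sketched as an export filter: X announces only destinations it originates itself, so B never learns a path to Y through X. The AS names follow Figure 5.13; the encoding of the rule is ours:

```python
# Routes X knows: destination -> AS-PATH from X (empty = X originates it).
known_routes = {
    "X": [],          # X's own prefixes
    "Y": ["C", "Y"],  # X could reach Y via C, i.e., the path XCY
}

def exported_routes(routes):
    # A stub access ISP advertises to its neighbors only the destinations
    # it originates; any route traversing another AS is withheld.
    return {dest: path for dest, path in routes.items() if not path}
```

Applied to known_routes, the filter exports only X itself, so B and C cannot use X as a transit network.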
However, a rule of thumb followed by commercial ISPs is that any traffic flowing across an ISP's backbone network must have either a source or a destination (or both) in a network that is a customer of that ISP; otherwise the traffic would be getting a free ride on the ISP's network. Individual peering agreements (that would govern questions such as

PRINCIPLES IN PRACTICE

WHY ARE THERE DIFFERENT INTER-AS AND INTRA-AS ROUTING PROTOCOLS?

Having now studied the details of specific inter-AS and intra-AS routing protocols deployed in today's Internet, let's conclude by considering perhaps the most fundamental question we could ask about these protocols in the first place (hopefully, you have been wondering this all along, and have not lost the forest for the trees!): Why are different inter-AS and intra-AS routing protocols used? The answer to this question gets at the heart of the differences between the goals of routing within an AS and among ASs:

Policy. Among ASs, policy issues dominate. It may well be important that traffic originating in a given AS not be able to pass through another specific AS. Similarly, a given AS may well want to control what transit traffic it carries between other ASs. We have seen that BGP carries path attributes and provides for controlled distribution of routing information so that such policy-based routing decisions can be made. Within an AS, everything is nominally under the same administrative control, and thus policy issues play a much less important role in choosing routes within the AS.

Scale. The ability of a routing algorithm and its data structures to scale to handle routing to/among large numbers of networks is a critical issue in inter-AS routing. Within an AS, scalability is less of a concern. For one thing, if a single ISP becomes too large, it is always possible to divide it into two ASs and perform inter-AS routing between the two new ASs.
(Recall that OSPF allows such a hierarchy to be built by splitting an AS into areas.)

Performance. Because inter-AS routing is so policy oriented, the quality (for example, performance) of the routes used is often of secondary concern (that is, a longer or more costly route that satisfies certain policy criteria may well be taken over a route that is shorter but does not meet those criteria). Indeed, we saw that among ASs, there is not even the notion of cost (other than AS hop count) associated with routes. Within a single AS, however, such policy concerns are of less importance, allowing routing to focus more on the level of performance realized on a route.

those raised above) are typically negotiated between pairs of ISPs and are often confidential; [Huston 1999a] provides an interesting discussion of peering agreements. For a detailed description of how routing policy reflects commercial relationships among ISPs, see [Gao 2001; Dimitropoulos 2007]. For a discussion of BGP routing policies from an ISP standpoint, see [Caesar 2005b]. This completes our brief introduction to BGP. Understanding BGP is important because it plays a central role in the Internet. We encourage you to see the references [Griffin 2012; Stewart 1999; Labovitz 1997; Halabi 2000; Huitema 1998; Gao 2001; Feamster 2004; Caesar 2005b; Li 2007] to learn more about BGP.

5.4.6 Putting the Pieces Together: Obtaining Internet Presence

Although this subsection is not about BGP per se, it brings together many of the protocols and concepts we've seen thus far, including IP addressing, DNS, and BGP. Suppose you have just created a small company that has a number of servers, including a public Web server that describes your company's products and services, a mail server from which your employees obtain their e-mail messages, and a DNS server.
Naturally, you would like the entire world to be able to visit your Web site in order to learn about your exciting products and services. Moreover, you would like your employees to be able to send and receive e-mail to potential customers throughout the world.

To meet these goals, you first need to obtain Internet connectivity, which is done by contracting with, and connecting to, a local ISP. Your company will have a gateway router, which will be connected to a router in your local ISP. This connection might be a DSL connection through the existing telephone infrastructure, a leased line to the ISP's router, or one of the many other access solutions described in Chapter 1. Your local ISP will also provide you with an IP address range, e.g., a /24 address range consisting of 256 addresses. Once you have your physical connectivity and your IP address range, you will assign one of the IP addresses (in your address range) to your Web server, one to your mail server, one to your DNS server, one to your gateway router, and other IP addresses to other servers and networking devices in your company's network.

In addition to contracting with an ISP, you will also need to contract with an Internet registrar to obtain a domain name for your company, as described in Chapter 2. For example, if your company's name is, say, Xanadu Inc., you will naturally try to obtain the domain name xanadu.com. Your company must also obtain presence in the DNS system. Specifically, because outsiders will want to contact your DNS server to obtain the IP addresses of your servers, you will also need to provide your registrar with the IP address of your DNS server. Your registrar will then put an entry for your DNS server (domain name and corresponding IP address) in the .com top-level-domain servers, as described in Chapter 2.
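The address assignment step can be sketched with Python's ipaddress module. The /24 block below (203.0.113.0/24, a documentation prefix) and the particular assignments are invented; your ISP would assign the actual range:

```python
import ipaddress

# Hypothetical /24 block assigned by the local ISP.
block = ipaddress.ip_network("203.0.113.0/24")
assert block.num_addresses == 256  # 2 ** (32 - 24) addresses

# hosts() excludes the network and broadcast addresses.
hosts = list(block.hosts())

# Assign addresses from the range to the company's machines
# (which address goes to which machine is an arbitrary choice).
assignments = {
    "gateway-router": str(hosts[0]),
    "web-server": str(hosts[1]),
    "mail-server": str(hosts[2]),
    "dns-server": str(hosts[3]),
}
```

The remaining addresses in the range are free for other servers and networking devices in the company's network.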
After this step is completed, any user who knows your domain name (e.g., xanadu.com) will be able to obtain the IP address of your DNS server via the DNS system. So that people can discover the IP addresses of your Web server, in your DNS server you will need to include entries that map the host name of your Web server (e.g., www.xanadu.com) to its IP address. You will want to have similar entries for other publicly available servers in your company, including your mail server. In this manner, if Alice wants to browse your Web server, the DNS system will contact your DNS server, find the IP address of your Web server, and give it to Alice. Alice can then establish a TCP connection directly with your Web server.

However, there still remains one other necessary and crucial step to allow outsiders from around the world to access your Web server. Consider what happens when Alice, who knows the IP address of your Web server, sends an IP datagram (e.g., a TCP SYN segment) to that IP address. This datagram will be routed through the Internet, visiting a series of routers in many different ASs, and eventually reach your Web server. When any one of the routers receives the datagram, it is going to look for an entry in its forwarding table to determine on which outgoing port it should forward the datagram. Therefore, each of the routers needs to know about the existence of your company's /24 prefix (or some aggregate entry). How does a router become aware of your company's prefix? As we have just seen, it becomes aware of it from BGP! Specifically, when your company contracts with a local ISP and gets assigned a prefix (i.e., an address range), your local ISP will use BGP to advertise your prefix to the ISPs to which it connects. Those ISPs will then, in turn, use BGP to propagate the advertisement.
Eventually, all Internet routers will know about your prefix (or about some aggregate that includes your prefix) and thus be able to appropriately forward datagrams destined to your Web and mail servers.

5.5 The SDN Control Plane

In this section, we'll dive into the SDN control plane: the network-wide logic that controls packet forwarding among a network's SDN-enabled devices, as well as the configuration and management of these devices and their services. Our study here builds on our earlier discussion of generalized SDN forwarding in Section 4.4, so you might want to first review that section, as well as Section 5.1 of this chapter, before continuing on. As in Section 4.4, we'll again adopt the terminology used in the SDN literature and refer to the network's forwarding devices as "packet switches" (or just switches, with "packet" being understood), since forwarding decisions can be made on the basis of network-layer source/destination addresses, link-layer source/destination addresses, as well as many other values in transport-, network-, and link-layer packet-header fields. Four key characteristics of an SDN architecture can be identified [Kreutz 2015]:

Flow-based forwarding. Packet forwarding by SDN-controlled switches can be based on any number of header field values in the transport-layer, network-layer, or link-layer header. We saw in Section 4.4 that the OpenFlow 1.0 abstraction allows forwarding based on eleven different header field values. This contrasts sharply with the traditional approach to router-based forwarding that we studied in Sections 5.2-5.4, where forwarding of IP datagrams was based solely on a datagram's destination IP address. Recall from Figure 5.2 that packet forwarding rules are specified in a switch's flow table; it is the job of the SDN control plane to compute, manage and install flow table entries in all of the network's switches.

Separation of data plane and control plane.
This separation is shown clearly in Figures 5.2 and 5.14. The data plane consists of the network's switches: relatively simple (but fast) devices that execute the "match plus action" rules in their flow tables. The control plane consists of servers and software that determine and manage the switches' flow tables.

Network control functions: external to data-plane switches. Given that the "S" in SDN is for "software," it's perhaps not surprising that the SDN control plane is implemented in software. Unlike traditional routers, however, this software executes on servers that are both distinct and remote from the network's switches. As shown in Figure 5.14, the control plane itself consists of two components: an SDN controller (or network operating system [Gude 2008]) and a set of network-control applications. The controller maintains accurate network state information (e.g., the state of remote links, switches, and hosts); provides this information to the network-control applications running in the control plane; and provides the means through which these applications can monitor, program, and control the underlying network devices. Although the controller in Figure 5.14 is shown as a single central server, in practice the controller is only logically centralized; it is typically implemented on several servers that provide coordinated, scalable performance and high availability.

A programmable network. The network is programmable through the network-control applications running in the control plane. These applications represent the "brains" of the SDN control plane, using the APIs provided by the SDN controller to specify and control the data plane in the network devices. For example, a routing network-control application might determine the end-end paths between sources and destinations (e.g., by executing Dijkstra's algorithm using the node-state and link-state information maintained by the SDN controller).
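The "match plus action" behavior of a data-plane switch can be sketched as follows. The field names and actions are a simplified, hypothetical subset of what OpenFlow-style rules can express:

```python
# A flow table: each entry pairs a match on header fields with an action.
flow_table = [
    ({"ip_dst": "10.0.1.2", "tcp_dport": 80}, ("forward", 3)),
    ({"ip_dst": "10.0.1.2"}, ("forward", 2)),
    ({}, ("drop", None)),  # table-miss entry: matches any packet
]

def process(packet: dict):
    # The switch applies the first entry all of whose match fields
    # agree with the corresponding packet header fields.
    for match, action in flow_table:
        if all(packet.get(field) == value for field, value in match.items()):
            return action
    return ("drop", None)
```

With this table, Web traffic to 10.0.1.2 leaves on port 3, other traffic to that host on port 2, and everything else is dropped; it is the control plane's job to compute and install such entries.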
Another network application might perform access control, i.e., determine which packets are to be blocked at a switch, as in our third example in Section 4.4.3. Yet another application might forward packets in a manner that performs server load balancing (the second example we considered in Section 4.4.3).

From this discussion, we can see that SDN represents a significant "unbundling" of network functionality---data plane switches, SDN controllers, and network-control applications are separate entities that may each be provided by different vendors and organizations. This contrasts with the pre-SDN model in which a switch/router (together with its embedded control plane software and protocol implementations) was monolithic, vertically integrated, and sold by a single vendor. This unbundling of network functionality in SDN has been likened to the earlier evolution from mainframe computers (where hardware, system software, and applications were provided by a single vendor) to personal computers (with their separate hardware, operating systems, and applications). The unbundling of computing hardware, system software, and applications has arguably led to a rich, open ecosystem driven by innovation in all three of these areas; one hope for SDN is that it, too, will lead to similarly rich innovation.

Given our understanding of the SDN architecture of Figure 5.14, many questions naturally arise. How and where are the flow tables actually computed? How are these tables updated in response to events at SDN-controlled devices (e.g., an attached link going up/down)? And how are the flow table entries at multiple switches coordinated in such a way as to result in orchestrated and consistent network-wide functionality (e.g., end-to-end paths for forwarding packets from sources to destinations, or coordinated distributed firewalls)? It is the role of the SDN control plane to provide these, and many other, capabilities.
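The flow-based, match-plus-action forwarding at the heart of this architecture can be illustrated with a small sketch. This is a toy model only: the field names and action strings below are simplified stand-ins, far simpler than the OpenFlow data model, but they show how a switch selects the highest-priority matching flow table entry and falls back to sending the packet to the controller when nothing matches.

```python
# Toy sketch of match-plus-action forwarding in an SDN switch.
# Field names and actions are simplified stand-ins, not the OpenFlow model.

def lookup(flow_table, packet):
    """Return the action of the highest-priority matching entry,
    or 'send-to-controller' if no entry matches."""
    best = None
    for entry in flow_table:
        # An entry matches if every field it specifies equals the packet's value.
        if all(packet.get(f) == v for f, v in entry["match"].items()):
            if best is None or entry["priority"] > best["priority"]:
                best = entry
    return best["action"] if best else "send-to-controller"

flow_table = [
    {"match": {"ip_dst": "10.0.0.2"},                 "priority": 10, "action": "forward(port 2)"},
    {"match": {"ip_dst": "10.0.0.2", "tcp_dst": 80},  "priority": 20, "action": "drop"},
]

print(lookup(flow_table, {"ip_dst": "10.0.0.2", "tcp_dst": 22}))  # forward(port 2)
print(lookup(flow_table, {"ip_dst": "10.0.0.2", "tcp_dst": 80}))  # drop
print(lookup(flow_table, {"ip_dst": "10.0.0.9"}))                 # send-to-controller
```

Note that the controller, not the switch, is responsible for computing and installing the entries in `flow_table`; the switch only performs the match.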
Figure 5.14 Components of the SDN architecture: SDN-controlled switches, the SDN controller, network-control applications

5.5.1 The SDN Control Plane: SDN Controller and SDN Network-control Applications

Let's begin our discussion of the SDN control plane in the abstract, by considering the generic capabilities that the control plane must provide. As we'll see, this abstract, "first principles" approach will lead us to an overall architecture that reflects how SDN control planes have been implemented in practice.

As noted above, the SDN control plane divides broadly into two components---the SDN controller and the SDN network-control applications. Let's explore the controller first. Many SDN controllers have been developed since the earliest SDN controller \[Gude 2008\]; see \[Kreutz 2015\] for an extremely thorough and up-to-date survey. Figure 5.15 provides a more detailed view of a generic SDN controller. A controller's functionality can be broadly organized into three layers. Let's consider these layers in an uncharacteristically bottom-up fashion:

A communication layer: communicating between the SDN controller and controlled network devices. Clearly, if an SDN controller is going to control the operation of a remote SDN-enabled switch, host, or other device, a protocol is needed to transfer information between the controller and that device. In addition, a device must be able to communicate locally-observed events to the controller (e.g., a message indicating that an attached link has gone up or down, that a device has just joined the network, or a heartbeat indicating that a device is up and operational). These events provide the SDN controller with an up-to-date view of the network's state. This protocol constitutes the lowest layer of the controller architecture, as shown in Figure 5.15.
The communication between the controller and the controlled devices crosses what has come to be known as the controller's "southbound" interface. In Section 5.5.2, we'll study OpenFlow---a specific protocol that provides this communication functionality. OpenFlow is implemented in most, if not all, SDN controllers.

A network-wide state-management layer. The ultimate control decisions made by the SDN control plane---e.g., configuring flow tables in all switches to achieve the desired end-end forwarding, to implement load balancing, or to implement a particular firewalling capability---will require that the controller have up-to-date information about the state of the network's hosts, links, switches, and other SDN-controlled devices. A switch's flow table contains counters whose values might also be profitably used by network-control applications; these values should thus be available to the applications. Since the ultimate aim of the control plane is to determine flow tables for the various controlled devices, a controller might also maintain a copy of these tables. These pieces of information all constitute examples of the network-wide "state" maintained by the SDN controller.

The interface to the network-control application layer. The controller interacts with network-control applications through its "northbound" interface. This API allows network-control applications to read/write network state and flow tables within the state-management layer. Applications can register to be notified when state-change events occur, so that they can take actions in response to network event notifications sent from SDN-controlled devices. Different types of APIs may be provided; we'll see that two popular SDN controllers communicate with their applications using a REST \[Fielding 2000\] request-response interface.

Figure 5.15 Components of an SDN controller
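The register-for-notification pattern just described can be sketched in a few lines. This is a toy illustration of the idea only; the class and method names below are hypothetical, not those of any real controller's northbound API.

```python
# Toy sketch of a northbound notification interface: network-control
# applications register callbacks with the controller's state-management
# layer, which invokes them when state-change events arrive from devices.
# All names here are illustrative, not a real controller API.

class StateManager:
    def __init__(self):
        self.links = {}          # (node, node) -> "up" / "down"
        self.listeners = []      # registered application callbacks

    def register(self, callback):
        self.listeners.append(callback)

    def link_event(self, u, v, status):
        # Update the network-wide state, then notify every registered app.
        self.links[(u, v)] = status
        for cb in self.listeners:
            cb(u, v, status)

events_seen = []
sm = StateManager()
sm.register(lambda u, v, s: events_seen.append((u, v, s)))  # a "routing app"
sm.link_event("s1", "s2", "down")

print(sm.links[("s1", "s2")])  # down
print(events_seen)             # [('s1', 's2', 'down')]
```

A real controller would deliver such events asynchronously and across process boundaries (e.g., via REST callbacks), but the registration/notification structure is the same.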
We have noted several times that an SDN controller can be considered to be "logically centralized," i.e., that the controller may be viewed externally (e.g., from the point of view of SDN-controlled devices and external network-control applications) as a single, monolithic service. However, these services and the databases used to hold state information are implemented in practice by a distributed set of servers for fault tolerance, high availability, or performance reasons. With controller functions being implemented by a set of servers, the semantics of the controller's internal operations (e.g., maintaining logical time ordering of events, consistency, consensus, and more) must be considered \[Panda 2013\]. Such concerns are common across many different distributed systems; see \[Lamport 1989, Lampson 1996\] for elegant solutions to these challenges. Modern controllers such as OpenDaylight \[OpenDaylight Lithium 2016\] and ONOS \[ONOS 2016\] (see sidebar) have placed considerable emphasis on architecting a logically centralized but physically distributed controller platform that provides scalable services and high availability to the controlled devices and network-control applications alike.

The architecture depicted in Figure 5.15 closely resembles the architecture of the originally proposed NOX controller in 2008 \[Gude 2008\], as well as that of today's OpenDaylight \[OpenDaylight Lithium 2016\] and ONOS \[ONOS 2016\] SDN controllers (see sidebar). We'll cover an example of controller operation in Section 5.5.3. First, however, let's examine the OpenFlow protocol, which lies in the controller's communication layer.

5.5.2 OpenFlow Protocol

The OpenFlow protocol \[OpenFlow 2009, ONF 2016\] operates between an SDN controller and an SDN-controlled switch or other device implementing the OpenFlow API that we studied earlier in Section 4.4. The OpenFlow protocol operates over TCP, with a default port number of 6653.
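Every OpenFlow message begins with a common 8-byte header carrying a protocol version, a message type, the total message length, and a transaction identifier (xid), all in network (big-endian) byte order. A minimal sketch of packing and parsing that header, using the constants from the OpenFlow 1.0 specification:

```python
import struct

# The 8-byte header that begins every OpenFlow message:
# version (1 byte), type (1 byte), length (2 bytes), xid (4 bytes),
# all in network (big-endian) byte order, per the OpenFlow spec.
OFP_HEADER = "!BBHL"
OFP_VERSION_1_0 = 0x01
OFPT_HELLO = 0  # message type 0 in OpenFlow 1.0

def make_header(msg_type, payload_len, xid):
    """Build a header whose length field covers header plus payload."""
    total_len = struct.calcsize(OFP_HEADER) + payload_len
    return struct.pack(OFP_HEADER, OFP_VERSION_1_0, msg_type, total_len, xid)

def parse_header(data):
    version, msg_type, length, xid = struct.unpack(OFP_HEADER, data[:8])
    return {"version": version, "type": msg_type, "length": length, "xid": xid}

hello = make_header(OFPT_HELLO, 0, xid=42)
print(len(hello))           # 8
print(parse_header(hello))  # {'version': 1, 'type': 0, 'length': 8, 'xid': 42}
```

The xid plays the same pairing role we'll see again with SNMP's request ID: a switch's reply to a controller request carries the xid of that request.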
Among the important messages flowing from the controller to the controlled switch are the following:

Configuration. This message allows the controller to query and set a switch's configuration parameters.

Modify-State. This message is used by a controller to add/delete or modify entries in the switch's flow table, and to set switch port properties.

Read-State. This message is used by a controller to collect statistics and counter values from the switch's flow table and ports.

Send-Packet. This message is used by the controller to send a specific packet out of a specified port at the controlled switch. The message itself contains the packet to be sent in its payload.

Among the messages flowing from the SDN-controlled switch to the controller are the following:

Flow-Removed. This message informs the controller that a flow table entry has been removed, for example by a timeout or as the result of a received modify-state message.

Port-status. This message is used by a switch to inform the controller of a change in port status.

Packet-in. Recall from Section 4.4 that a packet arriving at a switch port and not matching any flow table entry is sent to the controller for additional processing. Matched packets may also be sent to the controller, as an action to be taken on a match. The packet-in message is used to send such packets to the controller.

Additional OpenFlow messages are defined in \[OpenFlow 2009, ONF 2016\].

Principles in Practice

Google's Software-Defined Global Network

Recall from the case study in Section 2.6 that Google deploys a dedicated wide-area network (WAN) that interconnects its data centers and server clusters (in IXPs and ISPs). This network, called B4, has a Google-designed SDN control plane built on OpenFlow.
Google's network is able to drive WAN links at near 70% utilization over the long run (a two- to three-fold increase over typical link utilizations) and split application flows among multiple paths based on application priority and existing flow demands \[Jain 2013\]. The Google B4 network is particularly well-suited for SDN: (i) Google controls all devices from the edge servers in IXPs and ISPs to routers in their network core; (ii) the most bandwidth-intensive applications are large-scale data copies between sites that can defer to higher-priority interactive applications during times of resource congestion; (iii) with only a few dozen data centers being connected, centralized control is feasible.

Google's B4 network uses custom-built switches, each implementing a slightly extended version of OpenFlow, with a local OpenFlow Agent (OFA) that is similar in spirit to the control agent we encountered in Figure 5.2. Each OFA in turn connects to an OpenFlow Controller (OFC) in the network control server (NCS), using a separate "out of band" network, distinct from the network that carries data-center traffic between data centers. The OFC thus provides the services used by the NCS to communicate with its controlled switches, similar in spirit to the lowest layer in the SDN architecture shown in Figure 5.15. In B4, the OFC also performs state management functions, keeping node and link status in a Network Information Base (NIB). Google's implementation of the OFC is based on the ONIX SDN controller \[Koponen 2010\]. Two routing protocols, BGP (for routing between the data centers) and IS-IS (a close relative of OSPF, for routing within a data center), are implemented. Paxos \[Chandra 2007\] is used to execute hot replicas of NCS components to protect against failure.
A traffic engineering network-control application, sitting logically above the set of network control servers, interacts with these servers to provide global, network-wide bandwidth provisioning for groups of application flows. With B4, SDN made an important leap forward into the operational networks of a global network provider. See \[Jain 2013\] for a detailed description of B4.

5.5.3 Data and Control Plane Interaction: An Example

In order to solidify our understanding of the interaction between SDN-controlled switches and the SDN controller, let's consider the example shown in Figure 5.16, in which Dijkstra's algorithm (which we studied in Section 5.2) is used to determine shortest path routes. The SDN scenario in Figure 5.16 has two important differences from the earlier per-router-control scenario of Sections 5.2.1 and 5.3, where Dijkstra's algorithm was implemented in each and every router and link-state updates were flooded among all network routers: Dijkstra's algorithm is executed as a separate application, outside of the packet switches; and packet switches send link updates to the SDN controller and not to each other.

Figure 5.16 SDN controller scenario: Link-state change

In this example, let's assume that the link between switch s1 and s2 goes down; that shortest path routing is implemented; and that, consequently, incoming and outgoing flow forwarding rules at s1, s3, and s4 are affected, but that s2's operation is unchanged. Let's also assume that OpenFlow is used as the communication layer protocol, and that the control plane performs no function other than link-state routing.

1. Switch s1, experiencing a link failure between itself and s2, notifies the SDN controller of the link-state change using the OpenFlow port-status message.

2. The SDN controller receives the OpenFlow message indicating the link-state change, and notifies the link-state manager, which updates a link-state database.

3.
The network-control application that implements Dijkstra's link-state routing has previously registered to be notified when link state changes. That application receives the notification of the link-state change.

4. The link-state routing application interacts with the link-state manager to get updated link state; it might also consult other components in the state-management layer. It then computes the new least-cost paths.

5. The link-state routing application then interacts with the flow table manager, which determines the flow tables to be updated.

6. The flow table manager then uses the OpenFlow protocol to update flow table entries at affected switches---s1 (which will now route packets destined to s2 via s4), s2 (which will now begin receiving packets from s1 via intermediate switch s4), and s4 (which must now forward packets from s1 destined to s2).

This example is simple but illustrates how the SDN control plane provides control-plane services (in this case network-layer routing) that had previously been implemented with per-router control exercised in each and every network router. One can now easily appreciate how an SDN-enabled ISP could easily switch from least-cost path routing to a more hand-tailored approach to routing. Indeed, since the controller can tailor the flow tables as it pleases, it can implement any form of forwarding that it pleases---simply by changing its application-control software. This ease of change should be contrasted to the case of a traditional per-router control plane, where software in all routers (which might be provided to the ISP by multiple independent vendors) must be changed.

5.5.4 SDN: Past and Future

Although the intense interest in SDN is a relatively recent phenomenon, the technical roots of SDN, and the separation of the data and control planes in particular, go back considerably further.
In 2004, \[Feamster 2004, Lakshman 2004, RFC 3746\] all argued for the separation of the network's data and control planes. \[van der Merwe 1998\] describes a control framework for ATM networks \[Black 1995\] with multiple controllers, each controlling a number of ATM switches. The Ethane project \[Casado 2007\] pioneered the notion of a network of simple flow-based Ethernet switches with match-plus-action flow tables, a centralized controller that managed flow admission and routing, and the forwarding of unmatched packets from the switch to the controller. A network of more than 300 Ethane switches was operational in 2007. Ethane quickly evolved into the OpenFlow project, and the rest (as the saying goes) is history!

Numerous research efforts are aimed at developing future SDN architectures and capabilities. As we have seen, the SDN revolution is leading to the disruptive replacement of dedicated monolithic switches and routers (with both data and control planes) by simple commodity switching hardware and a sophisticated software control plane. A generalization of SDN known as network functions virtualization (NFV) similarly aims at disruptive replacement of sophisticated middleboxes (such as middleboxes with dedicated hardware and proprietary software for media caching/service) with simple commodity servers, switching, and storage \[Gember-Jacobson 2014\]. A second area of important research seeks to extend SDN concepts from the intra-AS setting to the inter-AS setting \[Gupta 2014\].

PRINCIPLES IN PRACTICE

SDN Controller Case Studies: The OpenDaylight and ONOS Controllers

In the earliest days of SDN, there was a single SDN protocol (OpenFlow \[McKeown 2008; OpenFlow 2009\]) and a single SDN controller (NOX \[Gude 2008\]). Since then, the number of SDN controllers in particular has grown significantly \[Kreutz 2015\].
Some SDN controllers are company-specific and proprietary, e.g., ONIX \[Koponen 2010\], Juniper Networks Contrail \[Juniper Contrail 2016\], and Google's controller \[Jain 2013\] for its B4 wide-area network. But many more controllers are open-source and implemented in a variety of programming languages \[Erickson 2013\]. Most recently, the OpenDaylight controller \[OpenDaylight Lithium 2016\] and the ONOS controller \[ONOS 2016\] have found considerable industry support. They are both open-source and are being developed in partnership with the Linux Foundation.

The OpenDaylight Controller

Figure 5.17 presents a simplified view of the OpenDaylight Lithium SDN controller platform \[OpenDaylight Lithium 2016\]. ODL's main set of controller components corresponds closely to those we developed in Figure 5.15. Network-Service Applications are the applications that determine how data-plane forwarding and other services, such as firewalling and load balancing, are accomplished in the controlled switches. Unlike the canonical controller in Figure 5.15, the ODL controller has two interfaces through which applications may communicate with native controller services and each other: external applications communicate with controller modules using a REST request-response API running over HTTP, while internal applications communicate with each other via the Service Abstraction Layer (SAL). The choice as to whether a controller application is implemented externally or internally is up to the application designer; the particular configuration of applications shown in Figure 5.17 is only meant as an example.

Figure 5.17 The OpenDaylight controller

ODL's Basic Network-Service Functions are at the heart of the controller, and they correspond closely to the network-wide state management capabilities that we encountered in Figure 5.15.
The SAL is the controller's nerve center, allowing controller components and applications to invoke each other's services and to subscribe to events they generate. It also provides a uniform abstract interface to the specific underlying communications protocols in the communication layer, including OpenFlow and SNMP (the Simple Network Management Protocol---a network management protocol that we will cover in Section 5.7). OVSDB is a protocol used to manage data center switching, an important application area for SDN technology. We'll introduce data center networking in Chapter 6.

The ONOS Controller

Figure 5.18 presents a simplified view of the ONOS controller \[ONOS 2016\]. Similar to the canonical controller in Figure 5.15, three layers can be identified in the ONOS controller:

Figure 5.18 ONOS controller architecture

Northbound abstractions and protocols. A unique feature of ONOS is its intent framework, which allows an application to request a high-level service (e.g., to set up a connection between Host A and Host B, or conversely to not allow Host A and Host B to communicate) without having to know the details of how this service is performed. State information is provided to network-control applications across the northbound API either synchronously (via query) or asynchronously (via listener callbacks, e.g., when network state changes).

Distributed core. The state of the network's links, hosts, and devices is maintained in ONOS's distributed core. ONOS is deployed as a service on a set of interconnected servers, with each server running an identical copy of the ONOS software; an increased number of servers offers an increased service capacity. The ONOS core provides the mechanisms for service replication and coordination among instances, providing the applications above and the network devices below with the abstraction of logically centralized core services.

Southbound abstractions and protocols.
The southbound abstractions mask the heterogeneity of the underlying hosts, links, switches, and protocols, allowing the distributed core to be both device and protocol agnostic. Because of this abstraction, the southbound interface below the distributed core is logically higher than in our canonical controller in Figure 5.14 or the ODL controller in Figure 5.17.

5.6 ICMP: The Internet Control Message Protocol

The Internet Control Message Protocol (ICMP), specified in \[RFC 792\], is used by hosts and routers to communicate network-layer information to each other. The most typical use of ICMP is for error reporting. For example, when running an HTTP session, you may have encountered an error message such as "Destination network unreachable." This message had its origins in ICMP. At some point, an IP router was unable to find a path to the host specified in your HTTP request. That router created and sent an ICMP message to your host indicating the error.

ICMP is often considered part of IP, but architecturally it lies just above IP, as ICMP messages are carried inside IP datagrams. That is, ICMP messages are carried as IP payload, just as TCP or UDP segments are carried as IP payload. Similarly, when a host receives an IP datagram with ICMP specified as the upper-layer protocol (an upper-layer protocol number of 1), it demultiplexes the datagram's contents to ICMP, just as it would demultiplex a datagram's content to TCP or UDP. ICMP messages have a type and a code field, and contain the header and the first 8 bytes of the IP datagram that caused the ICMP message to be generated in the first place (so that the sender can determine the datagram that caused the error). Selected ICMP message types are shown in Figure 5.19. Note that ICMP messages are used not only for signaling error conditions. The well-known ping program sends an ICMP type 8 code 0 message to the specified host.
The destination host, seeing the echo request, sends back a type 0 code 0 ICMP echo reply. Most TCP/IP implementations support the ping server directly in the operating system; that is, the server is not a process. Chapter 11 of \[Stevens 1990\] provides the source code for the ping client program. Note that the client program needs to be able to instruct the operating system to generate an ICMP message of type 8 code 0.

Figure 5.19 ICMP message types

Another interesting ICMP message is the source quench message. This message is seldom used in practice. Its original purpose was to perform congestion control---to allow a congested router to send an ICMP source quench message to a host to force that host to reduce its transmission rate. We have seen in Chapter 3 that TCP has its own congestion-control mechanism that operates at the transport layer, without the use of network-layer feedback such as the ICMP source quench message.

In Chapter 1 we introduced the Traceroute program, which allows us to trace a route from a host to any other host in the world. Interestingly, Traceroute is implemented with ICMP messages. To determine the names and addresses of the routers between source and destination, Traceroute in the source sends a series of ordinary IP datagrams to the destination. Each of these datagrams carries a UDP segment with an unlikely UDP port number. The first of these datagrams has a TTL of 1, the second of 2, the third of 3, and so on. The source also starts timers for each of the datagrams. When the nth datagram arrives at the nth router, the nth router observes that the TTL of the datagram has just expired. According to the rules of the IP protocol, the router discards the datagram and sends an ICMP warning message to the source (type 11 code 0). This warning message includes the name of the router and its IP address.
When this ICMP message arrives back at the source, the source obtains the round-trip time from the timer and the name and IP address of the nth router from the ICMP message.

How does a Traceroute source know when to stop sending UDP segments? Recall that the source increments the TTL field for each datagram it sends. Thus, one of the datagrams will eventually make it all the way to the destination host. Because this datagram contains a UDP segment with an unlikely port number, the destination host sends a port unreachable ICMP message (type 3 code 3) back to the source. When the source host receives this particular ICMP message, it knows it does not need to send additional probe packets. (The standard Traceroute program actually sends sets of three packets with the same TTL; thus the Traceroute output provides three results for each TTL.) In this manner, the source host learns the number and the identities of routers that lie between it and the destination host and the round-trip time between the two hosts. Note that the Traceroute client program must be able to instruct the operating system to generate UDP datagrams with specific TTL values and must also be able to be notified by its operating system when ICMP messages arrive. Now that you understand how Traceroute works, you may want to go back and play with it some more.

A new version of ICMP has been defined for IPv6 in RFC 4443. In addition to reorganizing the existing ICMP type and code definitions, ICMPv6 also added new types and codes required by the new IPv6 functionality. These include the "Packet Too Big" type and an "unrecognized IPv6 options" error code.
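The type and code fields, and the Internet checksum that covers every ICMP message (\[RFC 792\], computed as in RFC 1071), can be made concrete with a short sketch that builds a ping-style echo request (type 8, code 0) in raw bytes. Actually transmitting it would require a raw socket and OS privileges, so this sketch only constructs the message and verifies its checksum.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement of the one's-complement sum
    of the data taken as 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"                    # pad odd-length data with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total > 0xFFFF:                  # fold carry bits back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def make_echo_request(ident, seq, payload=b"ping!"):
    """ICMP echo request: type 8, code 0, checksum, identifier, sequence."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum field = 0 first
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

msg = make_echo_request(ident=0x1234, seq=1)
print(msg[0], msg[1])          # 8 0  (type, code)
print(internet_checksum(msg))  # 0   (a correct message verifies to 0)
```

Verification works because the one's-complement sum of a message that already contains its correct checksum is all ones, so checksumming it again yields 0.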
5.7 Network Management and SNMP

Having now made our way to the end of our study of the network layer, with only the link layer before us, we're well aware that a network consists of many complex, interacting pieces of hardware and software---from the links, switches, routers, hosts, and other devices that comprise the physical components of the network to the many protocols that control and coordinate these devices. When hundreds or thousands of such components are brought together by an organization to form a network, the job of the network administrator to keep the network "up and running" is surely a challenge. We saw in Section 5.5 that the logically centralized controller can help with this process in an SDN context. But the challenge of network management has been around long before SDN, with a rich set of network management tools and approaches that help the network administrator monitor, manage, and control the network. We'll study these tools and techniques in this section.

An often-asked question is "What is network management?" A well-conceived, single-sentence (albeit a rather long run-on sentence) definition of network management from \[Saydam 1996\] is:

Network management includes the deployment, integration, and coordination of the hardware, software, and human elements to monitor, test, poll, configure, analyze, evaluate, and control the network and element resources to meet the real-time, operational performance, and Quality of Service requirements at a reasonable cost.

Given this broad definition, we'll cover only the rudiments of network management in this section---the architecture, protocols, and information base used by a network administrator in performing their task.
We'll not cover the administrator's decision-making processes, where topics such as fault identification \[Labovitz 1997; Steinder 2002; Feamster 2005; Wu 2005; Teixeira 2006\], anomaly detection \[Lakhina 2005; Barford 2009\], network design/engineering to meet contracted Service Level Agreements (SLAs) \[Huston 1999a\], and more come into consideration. Our focus is thus purposefully narrow; the interested reader should consult these references, the excellent network-management text by Subramanian \[Subramanian 2000\], and the more detailed treatment of network management available on the Web site for this text.

5.7.1 The Network Management Framework

Figure 5.20 shows the key components of network management:

The managing server is an application, typically with a human in the loop, running in a centralized network management station in the network operations center (NOC). The managing server is the locus of activity for network management; it controls the collection, processing, analysis, and/or display of network management information. It is here that actions are initiated to control network behavior and here that the human network administrator interacts with the network's devices.

A managed device is a piece of network equipment (including its software) that resides on a managed network. A managed device might be a host, router, switch, middlebox, modem, thermometer, or other network-connected device. There may be several so-called managed objects within a managed device. These managed objects are the actual pieces of hardware within the managed device (for example, a network interface card is but one component of a host or router), and configuration parameters for these hardware and software components (for example, an intra-AS routing protocol such as OSPF).
Each managed object within a managed device has associated information that is collected into a Management Information Base (MIB); we'll see that the values of these pieces of information are available to (and in many cases able to be set by) the managing server. A MIB object might be a counter, such as the number of IP datagrams discarded at a router due to errors in an IP datagram header, or the number of UDP segments received at a host; descriptive information such as the version of the software running on a DNS server; status information such as whether a particular device is functioning correctly; or protocol-specific information such as a routing path to a destination. MIB objects are specified in a data description language known as SMI (Structure of Management Information) \[RFC 2578; RFC 2579; RFC 2580\]. A formal definition language is used to ensure that the syntax and semantics of the network management data are well defined and unambiguous. Related MIB objects are gathered into MIB modules. As of mid-2015, there were nearly 400 MIB modules defined by RFCs, and a much larger number of vendor-specific (private) MIB modules.

Also resident in each managed device is a network management agent, a process running in the managed device that communicates with the managing server, taking local actions at the managed device under the command and control of the managing server. The network management agent is similar to the routing agent that we saw in Figure 5.2.

Figure 5.20 Elements of network management: Managing server, managed devices, MIB data, remote agents, SNMP

The final component of a network management framework is the network management protocol. The protocol runs between the managing server and the managed devices, allowing the managing server to query the status of managed devices and indirectly take actions at these devices via its agents.
Agents can use the network management protocol to inform the managing server of exceptional events (for example, component failures or violation of performance thresholds). It's important to note that the network management protocol does not itself manage the network. Instead, it provides capabilities that a network administrator can use to manage ("monitor, test, poll, configure, analyze, evaluate, and control") the network. This is a subtle, but important, distinction. In the following section, we'll cover the Internet's SNMP (Simple Network Management Protocol) protocol.

5.7.2 The Simple Network Management Protocol (SNMP)

The Simple Network Management Protocol version 2 (SNMPv2) \[RFC 3416\] is an application-layer protocol used to convey network-management control and information messages between a managing server and an agent executing on behalf of that managing server. The most common usage of SNMP is in a request-response mode in which an SNMP managing server sends a request to an SNMP agent, which receives the request, performs some action, and sends a reply to the request. Typically, a request will be used to query (retrieve) or modify (set) MIB object values associated with a managed device. A second common usage of SNMP is for an agent to send an unsolicited message, known as a trap message, to a managing server. Trap messages are used to notify a managing server of an exceptional situation (e.g., a link interface going up or down) that has resulted in changes to MIB object values.

SNMPv2 defines seven types of messages, known generically as protocol data units---PDUs---as shown in Table 5.2 and described below. The format of the PDU is shown in Figure 5.21. The GetRequest, GetNextRequest, and GetBulkRequest PDUs are all sent from a managing server to an agent to request the value of one or more MIB objects at the agent's managed device.
Table 5.2 SNMPv2 PDU types

| SNMPv2 PDU Type | Sender-receiver | Description |
|---|---|---|
| GetRequest | manager-to-agent | get value of one or more MIB object instances |
| GetNextRequest | manager-to-agent | get value of next MIB object instance in list or table |
| GetBulkRequest | manager-to-agent | get values in large block of data, for example, values in a large table |
| InformRequest | manager-to-manager | inform remote managing entity of MIB values remote to its access |
| SetRequest | manager-to-agent | set value of one or more MIB object instances |
| Response | agent-to-manager or manager-to-manager | generated in response to GetRequest, GetNextRequest, GetBulkRequest, SetRequest, or InformRequest |
| SNMPv2-Trap | agent-to-manager | inform manager of an exceptional event |

Figure 5.21 SNMP PDU format

The MIB objects whose values are being requested are specified in the variable binding portion of the PDU. GetRequest, GetNextRequest, and GetBulkRequest differ in the granularity of their data requests. GetRequest can request an arbitrary set of MIB values; multiple GetNextRequests can be used to sequence through a list or table of MIB objects; GetBulkRequest allows a large block of data to be returned, avoiding the overhead incurred if multiple GetRequest or GetNextRequest messages were to be sent. In all three cases, the agent responds with a Response PDU containing the object identifiers and their associated values. The SetRequest PDU is used by a managing server to set the value of one or more MIB objects in a managed device. An agent replies with a Response PDU with the "noError" error status to confirm that the value has indeed been set. The InformRequest PDU is used by a managing server to notify another managing server of MIB information that is remote to the receiving server.
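The differing granularity of these three request types can be illustrated over a toy MIB held as a lexicographically ordered map. The OIDs, values, and helper names here are made-up assumptions, not a real agent implementation:

```python
from bisect import bisect_right

# Toy MIB: lexicographically ordered OID -> value, a conceptual stand-in
# for an agent's MIB view. OIDs and values are invented for illustration.
mib = {
    "1.3.6.1.2.1.7.1.0": 42,
    "1.3.6.1.2.1.7.2.0": 3,
    "1.3.6.1.2.1.7.3.0": 7,
}
oids = sorted(mib)

def get(oid):
    # GetRequest: fetch one named object instance.
    return mib[oid]

def get_next(oid):
    # GetNextRequest: return the next OID in lexicographic order,
    # which lets a manager walk a table one entry per round trip.
    i = bisect_right(oids, oid)
    return oids[i] if i < len(oids) else None

def get_bulk(start_oid, max_repetitions):
    # GetBulkRequest: return up to max_repetitions following entries
    # in one exchange, avoiding a round trip per object.
    i = bisect_right(oids, start_oid)
    return [(o, mib[o]) for o in oids[i:i + max_repetitions]]

print(get("1.3.6.1.2.1.7.1.0"))        # one object
print(get_next("1.3.6.1.2.1.7.1.0"))   # the next OID in the table
print(get_bulk("1.3.6.1.2.1.7.0", 2))  # two entries in one reply
```

Walking a large table with repeated get_next calls costs one exchange per row; get_bulk amortizes that cost, which is exactly the overhead argument made above.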
The Response PDU is typically sent from a managed device to the managing server in response to a request message from that server, returning the requested information. The final type of SNMPv2 PDU is the trap message. Trap messages are generated asynchronously; that is, they are not generated in response to a received request but rather in response to an event for which the managing server requires notification. RFC 3418 defines well-known trap types that include a cold or warm start by a device, a link going up or down, the loss of a neighbor, or an authentication failure event. A received trap message does not require a response from the managing server. Given the request-response nature of SNMP, it is worth noting here that although SNMP PDUs can be carried via many different transport protocols, the SNMP PDU is typically carried in the payload of a UDP datagram. Indeed, RFC 3417 states that UDP is "the preferred transport mapping." However, since UDP is an unreliable transport protocol, there is no guarantee that a request, or its response, will be received at the intended destination. The request ID field of the PDU (see Figure 5.21) is used by the managing server to number its requests to an agent; the agent's response takes its request ID from that of the received request. Thus, the request ID field can be used by the managing server to detect lost requests or replies. It is up to the managing server to decide whether to retransmit a request if no corresponding response is received after a given amount of time. In particular, the SNMP standard does not mandate any particular procedure for retransmission, or even if retransmission is to be done in the first place. It only requires that the managing server "needs to act responsibly in respect to the frequency and duration of retransmissions." This, of course, leads one to wonder how a "responsible" protocol should act! SNMP has evolved through three versions.
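A sketch of this request ID bookkeeping, with one possible retransmission policy layered on top, looks like the following. The retry limit, the simulated lossy channel, and all function names are assumptions; as noted above, the standard deliberately leaves this policy to the managing server:

```python
import itertools

# Sketch of request ID matching over an unreliable transport: the manager
# numbers each request, matches responses by ID, and retransmits a bounded
# number of times. The policy knobs (retries, timeout handling) are
# illustrative choices, not anything mandated by SNMP.
_request_ids = itertools.count(1)

def poll(send, wait_for_response, max_retries=3):
    """send(pdu) transmits a request; wait_for_response(request_id) returns
    a matching Response PDU or None on timeout (both supplied by caller)."""
    request_id = next(_request_ids)
    pdu = {"pdu": "GetRequest", "request-id": request_id}
    for _ in range(1 + max_retries):
        send(pdu)
        response = wait_for_response(request_id)
        if response is not None and response["request-id"] == request_id:
            return response
    return None  # give up; what to do next is again the manager's policy

# Simulated lossy agent: drops the first transmission, answers the second.
attempts = []
def send(pdu): attempts.append(pdu)
def wait_for_response(rid):
    return None if len(attempts) < 2 else {"pdu": "Response", "request-id": rid}

result = poll(send, wait_for_response)
print(result)          # succeeds on the retransmission
print(len(attempts))   # 2 transmissions
```

Matching on the request ID is what lets the manager discard a late response to an earlier, already-retransmitted request instead of mistaking it for the current one.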
The designers of SNMPv3 have said that "SNMPv3 can be thought of as SNMPv2 with additional security and administration capabilities" \[RFC 3410\]. Certainly, there are changes in SNMPv3 over SNMPv2, but nowhere are those changes more evident than in the area of administration and security. The central role of security in SNMPv3 was particularly important, since the lack of adequate security resulted in SNMP being used primarily for monitoring rather than control (for example, SetRequest is rarely used in SNMPv1). Once again, we see that security---a topic we'll cover in detail in Chapter 8---is of critical concern, but once again a concern whose importance had been realized perhaps a bit late and only then "added on."

5.8 Summary

We have now completed our two-chapter journey into the network core---a journey that began with our study of the network layer's data plane in Chapter 4 and finished here with our study of the network layer's control plane. We learned that the control plane is the network-wide logic that controls not only how a datagram is forwarded among routers along an end-to-end path from the source host to the destination host, but also how network-layer components and services are configured and managed. We learned that there are two broad approaches towards building a control plane: traditional per-router control (where a routing algorithm runs in each and every router and the routing component in the router communicates with the routing components in other routers) and software-defined networking (SDN) control (where a logically centralized controller computes and distributes the forwarding tables to be used by each and every router). We studied two fundamental routing algorithms for computing least cost paths in a graph---link-state routing and distance-vector routing---in Section 5.2; these algorithms find application in both per-router control and in SDN control.
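As a reminder of what the first of these looks like in code, here is a minimal sketch of Dijkstra's least-cost-path computation, the core of link-state routing, over an illustrative three-node graph (node names and link costs are made up):

```python
import heapq

# Minimal link-state route computation (Dijkstra's algorithm): given the
# full graph, compute least-cost distances from a source to every node.
def dijkstra(graph, source):
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

graph = {
    "u": {"v": 2, "w": 5},
    "v": {"u": 2, "w": 1},
    "w": {"u": 5, "v": 1},
}
print(dijkstra(graph, "u"))  # {'u': 0, 'v': 2, 'w': 3}
```

Distance-vector routing, by contrast, never sees the whole graph: each node iterates the Bellman-Ford relaxation using only the distance vectors reported by its neighbors.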
These algorithms are the basis for two widely deployed Internet routing protocols, OSPF and BGP, that we covered in Sections 5.3 and 5.4. We covered the SDN approach to the network-layer control plane in Section 5.5, investigating SDN network-control applications, the SDN controller, and the OpenFlow protocol for communicating between the controller and SDN-controlled devices. In Sections 5.6 and 5.7, we covered some of the nuts and bolts of managing an IP network: ICMP (the Internet Control Message Protocol) and SNMP (the Simple Network Management Protocol). Having completed our study of the network layer, our journey now takes us one step further down the protocol stack, namely, to the link layer. Like the network layer, the link layer is part of each and every network-connected device. But we will see in the next chapter that the link layer has the much more localized task of moving packets between nodes on the same link or LAN. Although this task may appear on the surface to be rather simple compared with that of the network layer's tasks, we will see that the link layer involves a number of important and fascinating issues that can keep us busy for a long time.

Homework Problems and Questions

Chapter 5 Review Questions

SECTION 5.1

R1. What is meant by a control plane that is based on per-router control? In such cases, when we say the network control and data planes are implemented "monolithically," what do we mean?

R2. What is meant by a control plane that is based on logically centralized control? In such cases, are the data plane and the control plane implemented within the same device or in separate devices? Explain.

SECTION 5.2

R3. Compare and contrast the properties of a centralized and a distributed routing algorithm. Give an example of a routing protocol that takes a centralized and a decentralized approach.

R4. Compare and contrast link-state and distance-vector routing algorithms.

R5.
What is the "count to infinity" problem in distance vector routing?

R6. Is it necessary that every autonomous system use the same intra-AS routing algorithm? Why or why not?

SECTIONS 5.3--5.4

R7. Why are different inter-AS and intra-AS protocols used in the Internet?

R8. True or false: When an OSPF router sends its link-state information, it is sent only to its directly attached neighbors. Explain.

R9. What is meant by an area in an OSPF autonomous system? Why was the concept of an area introduced?

R10. Define and contrast the following terms: subnet, prefix, and BGP route.

R11. How does BGP use the NEXT-HOP attribute? How does it use the AS-PATH attribute?

R12. Describe how a network administrator of an upper-tier ISP can implement policy when configuring BGP.

R13. True or false: When a BGP router receives an advertised path from its neighbor, it must add its own identity to the received path and then send that new path on to all of its neighbors. Explain.

SECTION 5.5

R14. Describe the main role of the communication layer, the network-wide state-management layer, and the network-control application layer in an SDN controller.

R15. Suppose you wanted to implement a new routing protocol in the SDN control plane. At which layer would you implement that protocol? Explain.

R16. What types of messages flow across an SDN controller's northbound and southbound APIs? Who is the recipient of these messages sent from the controller across the southbound interface, and who sends messages to the controller across the northbound interface?

R17. Describe the purpose of two types of OpenFlow messages (of your choosing) that are sent from a controlled device to the controller. Describe the purpose of two types of OpenFlow messages (of your choosing) that are sent from the controller to a controlled device.

R18. What is the purpose of the service abstraction layer in the OpenDaylight SDN controller?

SECTIONS 5.6--5.7

R19.
Name four different types of ICMP messages.

R20. What two types of ICMP messages are received at the sending host executing the Traceroute program?

R21. Define the following terms in the context of SNMP: managing server, managed device, network management agent, and MIB.

R22. What are the purposes of the SNMP GetRequest and SetRequest messages?

R23. What is the purpose of the SNMP trap message?

Problems

P1. Looking at Figure 5.3, enumerate the paths from y to u that do not contain any loops.

P2. Repeat Problem P1 for paths from x to z, z to u, and z to w.

P3. Consider the following network. With the indicated link costs, use Dijkstra's shortest-path algorithm to compute the shortest path from x to all network nodes. Show how the algorithm works by computing a table similar to Table 5.1.

Dijkstra's algorithm: discussion and example

P4. Consider the network shown in Problem P3. Using Dijkstra's algorithm, and showing your work using a table similar to Table 5.1, do the following:

a. Compute the shortest path from t to all network nodes.
b. Compute the shortest path from u to all network nodes.
c. Compute the shortest path from v to all network nodes.
d. Compute the shortest path from w to all network nodes.
e. Compute the shortest path from y to all network nodes.
f. Compute the shortest path from z to all network nodes.

P5. Consider the network shown below, and assume that each node initially knows the costs to each of its neighbors. Consider the distance-vector algorithm and show the distance table entries at node z.

P6. Consider a general topology (that is, not the specific network shown above) and a synchronous version of the distance-vector algorithm. Suppose that at each iteration, a node exchanges its distance vectors with its neighbors and receives their distance vectors.
Assuming that the algorithm begins with each node knowing only the costs to its immediate neighbors, what is the maximum number of iterations required before the distributed algorithm converges? Justify your answer.

P7. Consider the network fragment shown below. x has only two attached neighbors, w and y. w has a minimum-cost path to destination u (not shown) of 5, and y has a minimum-cost path to u of 6. The complete paths from w and y to u (and between w and y) are not shown. All link costs in the network have strictly positive integer values.

a. Give x's distance vector for destinations w, y, and u.
b. Give a link-cost change for either c(x, w) or c(x, y) such that x will inform its neighbors of a new minimum-cost path to u as a result of executing the distance-vector algorithm.
c. Give a link-cost change for either c(x, w) or c(x, y) such that x will not inform its neighbors of a new minimum-cost path to u as a result of executing the distance-vector algorithm.

P8. Consider the three-node topology shown in Figure 5.6. Rather than having the link costs shown in Figure 5.6, the link costs are c(x,y)=3, c(y,z)=6, c(z,x)=4. Compute the distance tables after the initialization step and after each iteration of a synchronous version of the distance-vector algorithm (as we did in our earlier discussion of Figure 5.6).

P9. Consider the count-to-infinity problem in distance-vector routing. Will the count-to-infinity problem occur if we decrease the cost of a link? Why? How about if we connect two nodes which do not have a link?

P10. Argue that for the distance-vector algorithm in Figure 5.6, each value in the distance vector D(x) is non-increasing and will eventually stabilize in a finite number of steps.

P11. Consider Figure 5.7. Suppose there is another router w, connected to router y and z. The costs of all links are given as follows: c(x,y)=4, c(x,z)=50, c(y,w)=1, c(z,w)=1, c(y,z)=3.
Suppose that poisoned reverse is used in the distance-vector routing algorithm.

a. When the distance vector routing is stabilized, routers w, y, and z inform their distances to x to each other. What distance values do they tell each other?
b. Now suppose that the link cost between x and y increases to 60. Will there be a count-to-infinity problem even if poisoned reverse is used? Why or why not? If there is a count-to-infinity problem, then how many iterations are needed for the distance-vector routing to reach a stable state again? Justify your answer.
c. How do you modify c(y, z) such that there is no count-to-infinity problem at all if c(y,x) changes from 4 to 60?

P12. Describe how loops in paths can be detected in BGP.

P13. Will a BGP router always choose the loop-free route with the shortest AS-path length? Justify your answer.

P14. Consider the network shown below. Suppose AS3 and AS2 are running OSPF for their intra-AS routing protocol. Suppose AS1 and AS4 are running RIP for their intra-AS routing protocol. Suppose eBGP and iBGP are used for the inter-AS routing protocol. Initially suppose there is no physical link between AS2 and AS4.

a. Router 3c learns about prefix x from which routing protocol: OSPF, RIP, eBGP, or iBGP?
b. Router 3a learns about x from which routing protocol?
c. Router 1c learns about x from which routing protocol?
d. Router 1d learns about x from which routing protocol?

P15. Referring to the previous problem, once router 1d learns about x it will put an entry (x, I) in its forwarding table.

a. Will I be equal to I1 or I2 for this entry? Explain why in one sentence.
b. Now suppose that there is a physical link between AS2 and AS4, shown by the dotted line. Suppose router 1d learns that x is accessible via AS2 as well as via AS3. Will I be set to I1 or I2? Explain why in one sentence.
c.
Now suppose there is another AS, called AS5, which lies on the path between AS2 and AS4 (not shown in diagram). Suppose router 1d learns that x is accessible via AS2 AS5 AS4 as well as via AS3 AS4. Will I be set to I1 or I2? Explain why in one sentence.

P16. Consider the following network. ISP B provides national backbone service to regional ISP A. ISP C provides national backbone service to regional ISP D. Each ISP consists of one AS. B and C peer with each other in two places using BGP. Consider traffic going from A to D. B would prefer to hand that traffic over to C on the West Coast (so that C would have to absorb the cost of carrying the traffic cross-country), while C would prefer to get the traffic via its East Coast peering point with B (so that B would have carried the traffic across the country). What BGP mechanism might C use, so that B would hand over A-to-D traffic at its East Coast peering point? To answer this question, you will need to dig into the BGP specification.

P17. In Figure 5.13, consider the path information that reaches stub networks W, X, and Y. Based on the information available at W and X, what are their respective views of the network topology? Justify your answer. The topology view at Y is shown below.

P18. Consider Figure 5.13. B would never forward traffic destined to Y via X based on BGP routing. But there are some very popular applications for which data packets go to X first and then flow to Y. Identify one such application, and describe how data packets follow a path not given by BGP routing.

P19. In Figure 5.13, suppose that there is another stub network V that is a customer of ISP A. Suppose that B and C have a peering relationship, and A is a customer of both B and C. Suppose that A would like to have the traffic destined to W to come from B only, and the traffic destined to V from either B or C. How should A advertise its routes to B and C? What AS routes does C receive?

P20.
Suppose ASs X and Z are not directly connected but instead are connected by AS Y. Further suppose that X has a peering agreement with Y, and that Y has a peering agreement with Z. Finally, suppose that Z wants to transit all of Y's traffic but does not want to transit X's traffic. Does BGP allow Z to implement this policy?

P21. Consider the two ways in which communication occurs between a managing entity and a managed device: request-response mode and trapping. What are the pros and cons of these two approaches, in terms of (1) overhead, (2) notification time when exceptional events occur, and (3) robustness with respect to lost messages between the managing entity and the device?

P22. In Section 5.7 we saw that it was preferable to transport SNMP messages in unreliable UDP datagrams. Why do you think the designers of SNMP chose UDP rather than TCP as the transport protocol of choice for SNMP?

Socket Programming Assignment

At the end of Chapter 2, there are four socket programming assignments. Below, you will find a fifth assignment which employs ICMP, a protocol discussed in this chapter.

Assignment 5: ICMP Ping

Ping is a popular networking application used to test from a remote location whether a particular host is up and reachable. It is also often used to measure latency between the client host and the target host. It works by sending ICMP "echo request" packets (i.e., ping packets) to the target host and listening for ICMP "echo response" replies (i.e., pong packets). Ping measures the RTT, records packet loss, and calculates a statistical summary of multiple ping-pong exchanges (the minimum, mean, max, and standard deviation of the round-trip times). In this lab, you will write your own Ping application in Python. Your application will use ICMP. But in order to keep your program simple, you will not exactly follow the official specification in RFC 1739.
Note that you will only need to write the client side of the program, as the functionality needed on the server side is built into almost all operating systems. You can find full details of this assignment, as well as important snippets of the Python code, at the Web site http://www.pearsonhighered.com/csresources.

Programming Assignment

In this programming assignment, you will be writing a "distributed" set of procedures that implements a distributed asynchronous distance-vector routing algorithm for the network shown below. You are to write the following routines that will "execute" asynchronously within the emulated environment provided for this assignment. For node 0, you will write the routines:

rtinit0(). This routine will be called once at the beginning of the emulation. rtinit0() has no arguments. It should initialize your distance table in node 0 to reflect the direct costs of 1, 3, and 7 to nodes 1, 2, and 3, respectively. In the figure above, all links are bidirectional and the costs in both directions are identical. After initializing the distance table and any other data structures needed by your node 0 routines, it should then send its directly connected neighbors (in this case, 1, 2, and 3) the cost of its minimum-cost paths to all other network nodes. This minimum-cost information is sent to neighboring nodes in a routing update packet by calling the routine tolayer2(), as described in the full assignment. The format of the routing update packet is also described in the full assignment.

rtupdate0(struct rtpkt *rcvdpkt). This routine will be called when node 0 receives a routing packet that was sent to it by one of its directly connected neighbors. The parameter *rcvdpkt is a pointer to the packet that was received. rtupdate0() is the "heart" of the distance-vector algorithm. The values it receives in a routing update packet from some other node i contain i's current shortest-path costs to all other network nodes.
rtupdate0() uses these received values to update its own distance table (as specified by the distance-vector algorithm). If its own minimum cost to another node changes as a result of the update, node 0 informs its directly connected neighbors of this change in minimum cost by sending them a routing packet. Recall that in the distance-vector algorithm, only directly connected nodes will exchange routing packets. Thus, nodes 1 and 2 will communicate with each other, but nodes 1 and 3 will not communicate with each other. Similar routines are defined for nodes 1, 2, and 3. Thus, you will write eight procedures in all: rtinit0(), rtinit1(), rtinit2(), rtinit3(), rtupdate0(), rtupdate1(), rtupdate2(), and rtupdate3(). These routines will together implement a distributed, asynchronous computation of the distance tables for the topology and costs shown in the figure on the preceding page. You can find the full details of the programming assignment, as well as C code that you will need to create the simulated hardware/software environment, at http://www.pearsonhighered.com/cs-resource. A Java version of the assignment is also available.

Wireshark Lab

In the Web site for this textbook, www.pearsonhighered.com/cs-resources, you'll find a Wireshark lab assignment that examines the use of the ICMP protocol in the ping and traceroute commands.

An Interview With... Jennifer Rexford

Jennifer Rexford is a Professor in the Computer Science department at Princeton University. Her research has the broad goal of making computer networks easier to design and manage, with particular emphasis on routing protocols. From 1996--2004, she was a member of the Network Management and Performance department at AT&T Labs--Research. While at AT&T, she designed techniques and tools for network measurement, traffic engineering, and router configuration that were deployed in AT&T's backbone network.
Jennifer is co-author of the book "Web Protocols and Practice: Networking Protocols, Caching, and Traffic Measurement," published by Addison-Wesley in May 2001. She served as the chair of ACM SIGCOMM from 2003 to 2007. She received her BSE degree in electrical engineering from Princeton University in 1991, and her PhD degree in electrical engineering and computer science from the University of Michigan in 1996. In 2004, Jennifer was the winner of ACM's Grace Murray Hopper Award for outstanding young computer professional and appeared on the MIT TR-100 list of top innovators under the age of 35.

Please describe one or two of the most exciting projects you have worked on during your career. What were the biggest challenges?

When I was a researcher at AT&T, a group of us designed a new way to manage routing in Internet Service Provider backbone networks. Traditionally, network operators configure each router individually, and these routers run distributed protocols to compute paths through the network. We believed that network management would be simpler and more flexible if network operators could exercise direct control over how routers forward traffic based on a network-wide view of the topology and traffic. The Routing Control Platform (RCP) we designed and built could compute the routes for all of AT&T's backbone on a single commodity computer, and could control legacy routers without modification. To me, this project was exciting because we had a provocative idea, a working system, and ultimately a real deployment in an operational network. Fast forward a few years, and software-defined networking (SDN) has become a mainstream technology, and standard protocols (like OpenFlow) have made it much easier to tell the underlying switches what to do.

How do you think software-defined networking should evolve in the future?
In a major break from the past, control-plane software can be created by many different programmers, not just at companies selling network equipment. Yet, unlike the applications running on a server or a smart phone, controller apps must work together to handle the same traffic. Network operators do not want to perform load balancing on some traffic and routing on other traffic; instead, they want to perform load balancing and routing, together, on the same traffic. Future SDN controller platforms should offer good programming abstractions for composing multiple independently written controller applications together. More broadly, good programming abstractions can make it easier to create controller applications, without having to worry about low-level details like flow table entries, traffic counters, bit patterns in packet headers, and so on. Also, while an SDN controller is logically centralized, the network still consists of a distributed collection of devices. Future controllers should offer good abstractions for updating the flow tables across the network, so apps can reason about what happens to packets in flight while the devices are updated. Programming abstractions for control-plane software is an exciting area for interdisciplinary research between computer networking, distributed systems, and programming languages, with a real chance for practical impact in the years ahead.

Where do you see the future of networking and the Internet?

Networking is an exciting field because the applications and the underlying technologies change all the time. We are always reinventing ourselves! Who would have predicted even ten years ago the dominance of smart phones, allowing mobile users to access existing applications as well as new location-based services?
The emergence of cloud computing is fundamentally changing the relationship between users and the applications they run, and networked sensors and actuators (the "Internet of Things") are enabling a wealth of new applications (and security vulnerabilities!). The pace of innovation is truly inspiring. The underlying network is a crucial component in all of these innovations. Yet, the network is notoriously "in the way"---limiting performance, compromising reliability, constraining applications, and complicating the deployment and management of services. We should strive to make the network of the future as invisible as the air we breathe, so it never stands in the way of new ideas and valuable services. To do this, we need to raise the level of abstraction above individual network devices and protocols (and their attendant acronyms!), so we can reason about the network and the user's high-level goals as a whole.

What people inspired you professionally?

I've long been inspired by Sally Floyd at the International Computer Science Institute. Her research is always purposeful, focusing on the important challenges facing the Internet. She digs deeply into hard questions until she understands the problem and the space of solutions completely, and she devotes serious energy into "making things happen," such as pushing her ideas into protocol standards and network equipment. Also, she gives back to the community, through professional service in numerous standards and research organizations and by creating tools (such as the widely used ns-2 and ns-3 simulators) that enable other researchers to succeed. She retired in 2009 but her influence on the field will be felt for years to come.

What are your recommendations for students who want careers in computer science and networking?

Networking is an inherently interdisciplinary field.
Breakthroughs in networking come from applying techniques from such diverse areas as queuing theory, game theory, control theory, distributed systems, network optimization, programming languages, machine learning, algorithms, data structures, and so on. I think that becoming conversant in a related field, or collaborating closely with experts in those fields, is a wonderful way to put networking on a stronger foundation, so we can learn how to build networks that are worthy of society's trust. Beyond the theoretical disciplines, networking is exciting because we create real artifacts that real people use. Mastering how to design and build systems---by gaining experience in operating systems, computer architecture, and so on---is another fantastic way to amplify your knowledge of networking to help make the world a better place.

Chapter 6 The Link Layer and LANs

In the previous two chapters we learned that the network layer provides a communication service between any two network hosts. Between the two hosts, datagrams travel over a series of communication links, some wired and some wireless, starting at the source host, passing through a series of packet switches (switches and routers) and ending at the destination host. As we continue down the protocol stack, from the network layer to the link layer, we naturally wonder how packets are sent across the individual links that make up the end-to-end communication path. How are the network-layer datagrams encapsulated in the link-layer frames for transmission over a single link? Are different link-layer protocols used in the different links along the communication path? How are transmission conflicts in broadcast links resolved? Is there addressing at the link layer and, if so, how does the link-layer addressing operate with the network-layer addressing we learned about in Chapter 4? And what exactly is the difference between a switch and a router?
We'll answer these and other important questions in this chapter. In discussing the link layer, we'll see that there are two fundamentally different types of link-layer channels. The first type are broadcast channels, which connect multiple hosts in wireless LANs, satellite networks, and hybrid fiber-coaxial cable (HFC) access networks. Since many hosts are connected to the same broadcast communication channel, a so-called medium access protocol is needed to coordinate frame transmission. In some cases, a central controller may be used to coordinate transmissions; in other cases, the hosts themselves coordinate transmissions. The second type of link-layer channel is the point-to-point communication link, such as that often found between two routers connected by a long-distance link, or between a user's office computer and the nearby Ethernet switch to which it is connected. Coordinating access to a point-to-point link is simpler; the reference material on this book's Web site has a detailed discussion of the Point-to-Point Protocol (PPP), which is used in settings ranging from dial-up service over a telephone line to high-speed point-to-point frame transport over fiber-optic links. We'll explore several important link-layer concepts and technologies in this chapter. We'll dive deeper into error detection and correction, a topic we touched on briefly in Chapter 3. We'll consider multiple access networks and switched LANs, including Ethernet---by far the most prevalent wired LAN technology. We'll also look at virtual LANs, and data center networks. Although WiFi, and more generally wireless LANs, are link-layer topics, we'll postpone our study of these important topics until Chapter 7.

6.1 Introduction to the Link Layer

Let's begin with some important terminology. We'll find it convenient in this chapter to refer to any device that runs a link-layer (i.e., layer 2) protocol as a node.
Nodes include hosts, routers, switches, and WiFi access points (discussed in Chapter 7). We will also refer to the communication channels that connect adjacent nodes along the communication path as links. In order for a datagram to be transferred from source host to destination host, it must be moved over each of the individual links in the end-to-end path. As an example, in the company network shown at the bottom of Figure 6.1, consider sending a datagram from one of the wireless hosts to one of the servers. This datagram will actually pass through six links: a WiFi link between the sending host and a WiFi access point; an Ethernet link between the access point and a link-layer switch; a link between the link-layer switch and the router; a link between the two routers; an Ethernet link between the router and a link-layer switch; and finally an Ethernet link between the switch and the server.

Figure 6.1 Six link-layer hops between wireless host and server

Over a given link, a transmitting node encapsulates the datagram in a link-layer frame and transmits the frame into the link. In order to gain further insight into the link layer and how it relates to the network layer, let's consider a transportation analogy. Consider a travel agent who is planning a trip for a tourist traveling from Princeton, New Jersey, to Lausanne, Switzerland. The travel agent decides that it is most convenient for the tourist to take a limousine from Princeton to JFK airport, then a plane from JFK airport to Geneva's airport, and finally a train from Geneva's airport to Lausanne's train station. Once the travel agent makes the three reservations, it is the responsibility of the Princeton limousine company to get the tourist from Princeton to JFK; it is the responsibility of the airline company to get the tourist from JFK to Geneva; and it is the responsibility of the Swiss train service to get the tourist from Geneva to Lausanne.
Each of the three segments of the trip is "direct" between two "adjacent" locations. Note that the three transportation segments are managed by different companies and use entirely different transportation modes (limousine, plane, and train). Although the transportation modes are different, they each provide the basic service of moving passengers from one location to an adjacent location. In this transportation analogy, the tourist is a datagram, each transportation segment is a link, the transportation mode is a link-layer protocol, and the travel agent is a routing protocol.

6.1.1 The Services Provided by the Link Layer

Although the basic service of any link layer is to move a datagram from one node to an adjacent node over a single communication link, the details of the provided service can vary from one link-layer protocol to the next. Possible services that can be offered by a link-layer protocol include: Framing. Almost all link-layer protocols encapsulate each network-layer datagram within a link-layer frame before transmission over the link. A frame consists of a data field, in which the network-layer datagram is inserted, and a number of header fields. The structure of the frame is specified by the link-layer protocol. We'll see several different frame formats when we examine specific link-layer protocols in the second half of this chapter. Link access. A medium access control (MAC) protocol specifies the rules by which a frame is transmitted onto the link. For point-to-point links that have a single sender at one end of the link and a single receiver at the other end of the link, the MAC protocol is simple (or nonexistent)---the sender can send a frame whenever the link is idle. The more interesting case is when multiple nodes share a single broadcast link---the so-called multiple access problem. Here, the MAC protocol serves to coordinate the frame transmissions of the many nodes. Reliable delivery.
When a link-layer protocol provides reliable delivery service, it guarantees to move each network-layer datagram across the link without error. Recall that certain transport-layer protocols (such as TCP) also provide a reliable delivery service. Similar to a transport-layer reliable delivery service, a link-layer reliable delivery service can be achieved with acknowledgments and retransmissions (see Section 3.4). A link-layer reliable delivery service is often used for links that are prone to high error rates, such as a wireless link, with the goal of correcting an error locally---on the link where the error occurs---rather than forcing an end-to-end retransmission of the data by a transport- or application-layer protocol. However, link-layer reliable delivery can be considered an unnecessary overhead for low bit-error links, including fiber, coax, and many twisted-pair copper links. For this reason, many wired link-layer protocols do not provide a reliable delivery service. Error detection and correction. The link-layer hardware in a receiving node can incorrectly decide that a bit in a frame is zero when it was transmitted as a one, and vice versa. Such bit errors are introduced by signal attenuation and electromagnetic noise. Because there is no need to forward a datagram that has an error, many link-layer protocols provide a mechanism to detect such bit errors. This is done by having the transmitting node include error-detection bits in the frame, and having the receiving node perform an error check. Recall from Chapters 3 and 4 that the Internet's transport layer and network layer also provide a limited form of error detection---the Internet checksum. Error detection in the link layer is usually more sophisticated and is implemented in hardware.
Error correction is similar to error detection, except that a receiver not only detects when bit errors have occurred in the frame but also determines exactly where in the frame the errors have occurred (and then corrects these errors).

6.1.2 Where Is the Link Layer Implemented?

Before diving into our detailed study of the link layer, let's conclude this introduction by considering the question of where the link layer is implemented. We'll focus here on an end system, since we learned in Chapter 4 that the link layer is implemented in a router's line card. Is a host's link layer implemented in hardware or software? Is it implemented on a separate card or chip, and how does it interface with the rest of a host's hardware and operating system components? Figure 6.2 shows a typical host architecture. For the most part, the link layer is implemented in a network adapter, also sometimes known as a network interface card (NIC). At the heart of the network adapter is the link-layer controller, usually a single, special-purpose chip that implements many of the link-layer services (framing, link access, error detection, and so on). Thus, much of a link-layer controller's functionality is implemented in hardware. For example, Intel's 710 adapter \[Intel 2016\] implements the Ethernet protocols we'll study in Section 6.5; the Atheros AR5006 \[Atheros 2016\] controller implements the 802.11 WiFi protocols we'll study in Chapter 7. Until the late 1990s, most network adapters were physically separate cards (such as a PCMCIA card or a plug-in card fitting into a PC's PCI card slot), but increasingly, network adapters are being integrated onto the host's motherboard---a so-called LAN-on-motherboard configuration.
On the sending side, the controller takes a datagram that has been created and stored in host memory by the higher layers of the protocol stack, encapsulates the datagram in a link-layer frame (filling in the frame's various fields), and then transmits the frame into the communication link, following the link-access protocol. On the receiving side, a controller receives the entire frame, and extracts the network-layer datagram. If the link layer performs error detection, then it is the sending controller that sets the error-detection bits in the frame header and it is the receiving controller that performs error detection. Figure 6.2 shows a network adapter attaching to a host's bus (e.g., a PCI or PCI-X bus), where it looks much like any other I/O device to the other host components.

Figure 6.2 Network adapter: Its relationship to other host components and to protocol stack functionality

Figure 6.2 also shows that while most of the link layer is implemented in hardware, part of the link layer is implemented in software that runs on the host's CPU. The software components of the link layer implement higher-level link-layer functionality such as assembling link-layer addressing information and activating the controller hardware. On the receiving side, link-layer software responds to controller interrupts (e.g., due to the receipt of one or more frames), handling error conditions and passing a datagram up to the network layer. Thus, the link layer is a combination of hardware and software---the place in the protocol stack where software meets hardware. \[Intel 2016\] provides a readable overview (as well as a detailed description) of the XL710 controller from a software-programming point of view.
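As a toy illustration of these sending- and receiving-side steps (this is not any real adapter's frame format; the destination/source/length layout merely imitates Ethernet-style header fields, and the function names are our own):

```python
import struct

HEADER_FMT = "!6s6sH"   # 6-byte dest addr, 6-byte source addr, 2-byte payload length
HEADER_LEN = struct.calcsize(HEADER_FMT)

def encapsulate(datagram, dst_addr, src_addr):
    """Sending side: wrap a network-layer datagram in a toy link-layer frame."""
    header = struct.pack(HEADER_FMT, dst_addr, src_addr, len(datagram))
    return header + datagram

def decapsulate(frame):
    """Receiving side: strip the header and hand the datagram up to the network layer."""
    dst_addr, src_addr, length = struct.unpack(HEADER_FMT, frame[:HEADER_LEN])
    return frame[HEADER_LEN:HEADER_LEN + length]

frame = encapsulate(b"IP datagram bytes",
                    b"\xaa\xbb\xcc\xdd\xee\xff",   # destination address
                    b"\x11\x22\x33\x44\x55\x66")   # source address
assert decapsulate(frame) == b"IP datagram bytes"
```

In a real adapter the framing, address filtering, and error-detection fields are filled in by the controller hardware; the sketch only mirrors the data movement described above.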
6.2 Error-Detection and -Correction Techniques

In the previous section, we noted that bit-level error detection and correction---detecting and correcting the corruption of bits in a link-layer frame sent from one node to another physically connected neighboring node---are two services often provided by the link layer. We saw in Chapter 3 that error-detection and -correction services are also often offered at the transport layer as well. In this section, we'll examine a few of the simplest techniques that can be used to detect and, in some cases, correct such bit errors. A full treatment of the theory and implementation of this topic is itself the topic of many textbooks (for example, \[Schwartz 1980\] or \[Bertsekas 1991\]), and our treatment here is necessarily brief. Our goal here is to develop an intuitive feel for the capabilities that error-detection and -correction techniques provide and to see how a few simple techniques work and are used in practice in the link layer. Figure 6.3 illustrates the setting for our study. At the sending node, data, D, to be protected against bit errors is augmented with error-detection and -correction bits (EDC). Typically, the data to be protected includes not only the datagram passed down from the network layer for transmission across the link, but also link-level addressing information, sequence numbers, and other fields in the link frame header. Both D and EDC are sent to the receiving node in a link-level frame. At the receiving node, a sequence of bits, D′ and EDC′, is received. Note that D′ and EDC′ may differ from the original D and EDC as a result of in-transit bit flips. The receiver's challenge is to determine whether or not D′ is the same as the original D, given that it has only received D′ and EDC′. The exact wording of the receiver's decision in Figure 6.3 (we ask whether an error is detected, not whether an error has occurred!) is important.
Error-detection and -correction techniques allow the receiver to sometimes, but not always, detect that bit errors have occurred. Even with the use of error-detection bits there still may be undetected bit errors; that is, the receiver may be unaware that the received information contains bit errors.

Figure 6.3 Error-detection and -correction scenario

As a consequence, the receiver might deliver a corrupted datagram to the network layer, or be unaware that the contents of a field in the frame's header has been corrupted. We thus want to choose an error-detection scheme that keeps the probability of such occurrences small. Generally, more sophisticated error-detection and -correction techniques (that is, those that have a smaller probability of allowing undetected bit errors) incur a larger overhead---more computation is needed to compute and transmit a larger number of error-detection and -correction bits. Let's now examine three techniques for detecting errors in the transmitted data---parity checks (to illustrate the basic ideas behind error detection and correction), checksumming methods (which are more typically used in the transport layer), and cyclic redundancy checks (which are more typically used in the link layer in an adapter).

6.2.1 Parity Checks

Perhaps the simplest form of error detection is the use of a single parity bit. Suppose that the information to be sent, D in Figure 6.4, has d bits. In an even parity scheme, the sender simply includes one additional bit and chooses its value such that the total number of 1s in the d+1 bits (the original information plus a parity bit) is even. For odd parity schemes, the parity bit value is chosen such that there is an odd number of 1s. Figure 6.4 illustrates an even parity scheme, with the single parity bit being stored in a separate field.

Receiver operation is also simple with a single parity bit. The receiver need only count the number of 1s in the received d+1 bits.
If an odd number of 1-valued bits are found with an even parity scheme, the receiver knows that at least one bit error has occurred. More precisely, it knows that some odd number of bit errors have occurred. But what happens if an even number of bit errors occur? You should convince yourself that this would result in an undetected error. If the probability of bit errors is small and errors can be assumed to occur independently from one bit to the next, the probability of multiple bit errors in a packet would be extremely small. In this case, a single parity bit might suffice. However, measurements have shown that, rather than occurring independently, errors are often clustered together in "bursts." Under burst error conditions, the probability of undetected errors in a frame protected by single-bit parity can approach 50 percent \[Spragins 1991\]. Clearly, a more robust error-detection scheme is needed (and, fortunately, is used in practice!). But before examining error-detection schemes that are used in practice, let's consider a simple generalization of one-bit parity that will provide us with insight into error-correction techniques.

Figure 6.4 One-bit even parity

Figure 6.5 shows a two-dimensional generalization of the single-bit parity scheme. Here, the d bits in D are divided into i rows and j columns. A parity value is computed for each row and for each column. The resulting i+j+1 parity bits comprise the link-layer frame's error-detection bits. Suppose now that a single bit error occurs in the original d bits of information. With this two-dimensional parity scheme, the parity of both the column and the row containing the flipped bit will be in error. The receiver can thus not only detect the fact that a single bit error has occurred, but can use the column and row indices of the column and row with parity errors to actually identify the bit that was corrupted and correct that error!
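The row-and-column correction idea just described can be sketched in a few lines (an illustrative sketch only; the function names are our own, not the text's):

```python
def parities(bits, i, j):
    """Row and column even-parity bits for data laid out as i rows of j columns."""
    rows = [bits[r * j:(r + 1) * j] for r in range(i)]
    row_par = [sum(row) % 2 for row in rows]
    col_par = [sum(rows[r][c] for r in range(i)) % 2 for c in range(j)]
    return row_par, col_par

def locate_single_error(received, i, j, row_par, col_par):
    """Return (row, col) of a single flipped data bit, or None if all parities check."""
    rp, cp = parities(received, i, j)
    bad_rows = [r for r in range(i) if rp[r] != row_par[r]]
    bad_cols = [c for c in range(j) if cp[c] != col_par[c]]
    if bad_rows and bad_cols:
        return bad_rows[0], bad_cols[0]   # the flipped bit sits at their intersection
    return None

data = [1, 0, 1,
        1, 1, 1,
        0, 1, 0]                          # i = 3 rows, j = 3 columns
row_par, col_par = parities(data, 3, 3)
received = data.copy()
received[4] ^= 1                          # flip the bit at row 1, column 1
assert locate_single_error(received, 3, 3, row_par, col_par) == (1, 1)
```

Flipping the located bit back restores the original data, which is exactly the forward-error-correction behavior discussed next.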
Figure 6.5 shows an example in which the 1-valued bit in position (2,2) is corrupted and switched to a 0---an error that is both detectable and correctable at the receiver. Although our discussion has focused on the original d bits of information, a single error in the parity bits themselves is also detectable and correctable. Two-dimensional parity can also detect (but not correct!) any combination of two errors in a packet. Other properties of the two-dimensional parity scheme are explored in the problems at the end of the chapter.

Figure 6.5 Two-dimensional even parity

The ability of the receiver to both detect and correct errors is known as forward error correction (FEC). These techniques are commonly used in audio storage and playback devices such as audio CDs. In a network setting, FEC techniques can be used by themselves, or in conjunction with link-layer ARQ techniques similar to those we examined in Chapter 3. FEC techniques are valuable because they can decrease the number of sender retransmissions required. Perhaps more important, they allow for immediate correction of errors at the receiver. This avoids having to wait for the round-trip propagation delay needed for the sender to receive a NAK packet and for the retransmitted packet to propagate back to the receiver---a potentially important advantage for real-time network applications \[Rubenstein 1998\] or links (such as deep-space links) with long propagation delays. Research examining the use of FEC in error-control protocols includes \[Biersack 1992; Nonnenmacher 1998; Byers 1998; Shacham 1990\].

6.2.2 Checksumming Methods

In checksumming techniques, the d bits of data in Figure 6.4 are treated as a sequence of k-bit integers. One simple checksumming method is to simply sum these k-bit integers and use the resulting sum as the error-detection bits. The Internet checksum is based on this approach---bytes of data are treated as 16-bit integers and summed.
The 1s complement of this sum then forms the Internet checksum that is carried in the segment header. As discussed in Section 3.3, the receiver checks the checksum by taking the 1s complement of the sum of the received data (including the checksum) and checking whether the result is all 1 bits. If any of the bits are 0, an error is indicated. RFC 1071 discusses the Internet checksum algorithm and its implementation in detail. In the TCP and UDP protocols, the Internet checksum is computed over all fields (header and data fields included). In IP the checksum is computed over the IP header (since the UDP or TCP segment has its own checksum). In other protocols, for example, XTP \[Strayer 1992\], one checksum is computed over the header and another checksum is computed over the entire packet. Checksumming methods require relatively little packet overhead. For example, the checksums in TCP and UDP use only 16 bits. However, they provide relatively weak protection against errors as compared with cyclic redundancy check, which is discussed below and which is often used in the link layer. A natural question at this point is, Why is checksumming used at the transport layer and cyclic redundancy check used at the link layer? Recall that the transport layer is typically implemented in software in a host as part of the host's operating system. Because transport-layer error detection is implemented in software, it is important to have a simple and fast error-detection scheme such as checksumming. On the other hand, error detection at the link layer is implemented in dedicated hardware in adapters, which can rapidly perform the more complex CRC operations. Feldmeier \[Feldmeier 1995\] presents fast software implementation techniques for not only weighted checksum codes, but CRC (see below) and other codes as well.
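Sketched in code, the sender's sum-and-complement and the receiver's all-1s check look like this (a minimal illustration in the spirit of RFC 1071; the helper names are our own):

```python
def internet_checksum(data):
    """16-bit 1s-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"                              # pad odd-length data with a zero byte
    total = 0
    for k in range(0, len(data), 2):
        total += (data[k] << 8) | data[k + 1]
        total = (total & 0xFFFF) + (total >> 16)     # fold the carry back in (end-around carry)
    return ~total & 0xFFFF

def checksum_ok(data_with_checksum):
    """Receiver check: summing data plus checksum must give all 1 bits,
    so the complemented sum must be zero."""
    return internet_checksum(data_with_checksum) == 0

segment = b"\x00\x01\xf2\x03"
csum = internet_checksum(segment)                    # 0x0DFB for these bytes
assert checksum_ok(segment + csum.to_bytes(2, "big"))
```

Note how cheap this is in software: one addition and one carry fold per 16-bit word, which is exactly why checksumming suits transport-layer implementations in the operating system.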
6.2.3 Cyclic Redundancy Check (CRC)

An error-detection technique used widely in today's computer networks is based on cyclic redundancy check (CRC) codes. CRC codes are also known as polynomial codes, since it is possible to view the bit string to be sent as a polynomial whose coefficients are the 0 and 1 values in the bit string, with operations on the bit string interpreted as polynomial arithmetic. CRC codes operate as follows. Consider the d-bit piece of data, D, that the sending node wants to send to the receiving node. The sender and receiver must first agree on an r+1 bit pattern, known as a generator, which we will denote as G. We will require that the most significant (leftmost) bit of G be a 1. The key idea behind CRC codes is shown in Figure 6.6. For a given piece of data, D, the sender will choose r additional bits, R, and append them to D such that the resulting d+r bit pattern (interpreted as a binary number) is exactly divisible by G (i.e., has no remainder) using modulo-2 arithmetic. The process of error checking with CRCs is thus simple: The receiver divides the d+r received bits by G. If the remainder is nonzero, the receiver knows that an error has occurred; otherwise the data is accepted as being correct.

All CRC calculations are done in modulo-2 arithmetic without carries in addition or borrows in subtraction. This means that addition and subtraction are identical, and both are equivalent to the bitwise exclusive-or (XOR) of the operands. Thus, for example,

1011 XOR 0101 = 1110
1001 XOR 1101 = 0100

Also, we similarly have

1011 - 0101 = 1110
1001 - 1101 = 0100

Multiplication and division are the same as in base-2 arithmetic, except that any required addition or subtraction is done without carries or borrows.

Figure 6.6 CRC

As in regular binary arithmetic, multiplication by 2^k left shifts a bit pattern by k places.
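Under these rules, the long division that produces the CRC bits amounts to XORing the generator into the running remainder wherever the leading bit is 1. A minimal sketch (the function names are ours; the example values are those of Figure 6.7):

```python
def mod2_remainder(bits, generator):
    """Remainder of the modulo-2 division of a bit string by the generator G.

    Each "subtraction" step is a carry-free XOR of G, applied wherever the
    leading bit of the running remainder is 1."""
    bits = [int(b) for b in bits]
    g = [int(b) for b in generator]
    r = len(generator) - 1
    for i in range(len(bits) - r):
        if bits[i]:
            for j, gbit in enumerate(g):
                bits[i + j] ^= gbit
    return "".join(map(str, bits[-r:]))

def crc_bits(data, generator):
    """Sender side: R is the remainder of (D * 2^r) / G, i.e., D with r zeros appended."""
    return mod2_remainder(data + "0" * (len(generator) - 1), generator)

# Figure 6.7's example: D = 101110 and G = 1001 give R = 011 ...
assert crc_bits("101110", "1001") == "011"
# ... and the receiver accepts 101110011 because dividing it by G leaves remainder zero
assert mod2_remainder("101110011", "1001") == "000"
```

Hardware CRC circuits implement the same recurrence with a shift register and a handful of XOR gates, one bit per clock tick.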
Thus, given D and R, the quantity D ⋅ 2^r XOR R yields the d+r bit pattern shown in Figure 6.6. We'll use this algebraic characterization of the d+r bit pattern from Figure 6.6 in our discussion below. Let us now turn to the crucial question of how the sender computes R. Recall that we want to find R such that there is an n such that

D ⋅ 2^r XOR R = nG

That is, we want to choose R such that G divides into D ⋅ 2^r XOR R without remainder. If we XOR (that is, add modulo-2, without carry) R to both sides of the above equation, we get

D ⋅ 2^r = nG XOR R

This equation tells us that if we divide D ⋅ 2^r by G, the value of the remainder is precisely R. In other words, we can calculate R as

R = remainder of (D ⋅ 2^r) / G

Figure 6.7 illustrates this calculation for the case of D=101110, d=6, G=1001, and r=3. The 9 bits transmitted in this case are 101 110 011. You should check these calculations for yourself and also check that indeed D ⋅ 2^r = 101011 ⋅ G XOR R.

Figure 6.7 A sample CRC calculation

International standards have been defined for 8-, 12-, 16-, and 32-bit generators, G. The CRC-32 32-bit standard, which has been adopted in a number of link-level IEEE protocols, uses a generator of

G_CRC-32 = 100000100110000010001110110110111

Each of the CRC standards can detect burst errors of fewer than r+1 bits. (This means that all consecutive bit errors of r bits or fewer will be detected.) Furthermore, under appropriate assumptions, a burst of length greater than r+1 bits is detected with probability 1 − 0.5^r. Also, each of the CRC standards can detect any odd number of bit errors. See \[Williams 1993\] for a discussion of implementing CRC checks. The theory behind CRC codes and even more powerful codes is beyond the scope of this text. The text \[Schwartz 1980\] provides an excellent introduction to this topic.

6.3 Multiple Access Links and Protocols

In the introduction to this chapter, we noted that there are two types of network links: point-to-point links and broadcast links.
A point-to-point link consists of a single sender at one end of the link and a single receiver at the other end of the link. Many link-layer protocols have been designed for point-to-point links; the point-to-point protocol (PPP) and high-level data link control (HDLC) are two such protocols. The second type of link, a broadcast link, can have multiple sending and receiving nodes all connected to the same, single, shared broadcast channel. The term broadcast is used here because when any one node transmits a frame, the channel broadcasts the frame and each of the other nodes receives a copy. Ethernet and wireless LANs are examples of broadcast link-layer technologies. In this section we'll take a step back from specific link-layer protocols and first examine a problem of central importance to the link layer: how to coordinate the access of multiple sending and receiving nodes to a shared broadcast channel---the multiple access problem. Broadcast channels are often used in LANs, networks that are geographically concentrated in a single building (or on a corporate or university campus). Thus, we'll look at how multiple access channels are used in LANs at the end of this section. We are all familiar with the notion of broadcasting---television has been using it since its invention. But traditional television is a one-way broadcast (that is, one fixed node transmitting to many receiving nodes), while nodes on a computer network broadcast channel can both send and receive. Perhaps a more apt human analogy for a broadcast channel is a cocktail party, where many people gather in a large room (the air providing the broadcast medium) to talk and listen. A second good analogy is something many readers will be familiar with---a classroom---where teacher(s) and student(s) similarly share the same, single, broadcast medium. A central problem in both scenarios is that of determining who gets to talk (that is, transmit into the channel) and when.
As humans, we've evolved an elaborate set of protocols for sharing the broadcast channel: "Give everyone a chance to speak." "Don't speak until you are spoken to." "Don't monopolize the conversation." "Raise your hand if you have a question." "Don't interrupt when someone is speaking." "Don't fall asleep when someone is talking." Computer networks similarly have protocols---so-called multiple access protocols---by which nodes regulate their transmission into the shared broadcast channel. As shown in Figure 6.8, multiple access protocols are needed in a wide variety of network settings, including both wired and wireless access networks, and satellite networks.

Figure 6.8 Various multiple access channels

Although technically each node accesses the broadcast channel through its adapter, in this section we will refer to the node as the sending and receiving device. In practice, hundreds or even thousands of nodes can directly communicate over a broadcast channel. Because all nodes are capable of transmitting frames, more than two nodes can transmit frames at the same time. When this happens, all of the nodes receive multiple frames at the same time; that is, the transmitted frames collide at all of the receivers. Typically, when there is a collision, none of the receiving nodes can make any sense of any of the frames that were transmitted; in a sense, the signals of the colliding frames become inextricably tangled together. Thus, all the frames involved in the collision are lost, and the broadcast channel is wasted during the collision interval. Clearly, if many nodes want to transmit frames frequently, many transmissions will result in collisions, and much of the bandwidth of the broadcast channel will be wasted. In order to ensure that the broadcast channel performs useful work when multiple nodes are active, it is necessary to somehow coordinate the transmissions of the active nodes.
This coordination job is the responsibility of the multiple access protocol. Over the past 40 years, thousands of papers and hundreds of PhD dissertations have been written on multiple access protocols; a comprehensive survey of the first 20 years of this body of work is \[Rom 1990\]. Furthermore, active research in multiple access protocols continues due to the continued emergence of new types of links, particularly new wireless links. Over the years, dozens of multiple access protocols have been implemented in a variety of link-layer technologies. Nevertheless, we can classify just about any multiple access protocol as belonging to one of three categories: channel partitioning protocols, random access protocols, and taking-turns protocols. We'll cover these categories of multiple access protocols in the following three subsections. Let's conclude this overview by noting that, ideally, a multiple access protocol for a broadcast channel of rate R bits per second should have the following desirable characteristics:

1. When only one node has data to send, that node has a throughput of R bps.

2. When M nodes have data to send, each of these nodes has a throughput of R/M bps. This need not necessarily imply that each of the M nodes always has an instantaneous rate of R/M, but rather that each node should have an average transmission rate of R/M over some suitably defined interval of time.

3. The protocol is decentralized; that is, there is no master node that represents a single point of failure for the network.

4. The protocol is simple, so that it is inexpensive to implement.

6.3.1 Channel Partitioning Protocols

Recall from our early discussion back in Section 1.3 that time-division multiplexing (TDM) and frequency-division multiplexing (FDM) are two techniques that can be used to partition a broadcast channel's bandwidth among all nodes sharing that channel.

Figure 6.9 A four-node TDM and FDM example
As an example, suppose the channel supports N nodes and that the transmission rate of the channel is R bps. TDM divides time into time frames and further divides each time frame into N time slots. (The TDM time frame should not be confused with the link-layer unit of data exchanged between sending and receiving adapters, which is also called a frame. In order to reduce confusion, in this subsection we'll refer to the link-layer unit of data exchanged as a packet.) Each time slot is then assigned to one of the N nodes. Whenever a node has a packet to send, it transmits the packet's bits during its assigned time slot in the revolving TDM frame. Typically, slot sizes are chosen so that a single packet can be transmitted during a slot time. Figure 6.9 shows a simple four-node TDM example. Returning to our cocktail party analogy, a TDM-regulated cocktail party would allow one partygoer to speak for a fixed period of time, then allow another partygoer to speak for the same amount of time, and so on. Once everyone had had a chance to talk, the pattern would repeat. TDM is appealing because it eliminates collisions and is perfectly fair: Each node gets a dedicated transmission rate of R/N bps during each frame time. However, it has two major drawbacks. First, a node is limited to an average rate of R/N bps even when it is the only node with packets to send. A second drawback is that a node must always wait for its turn in the transmission sequence---again, even when it is the only node with a frame to send. Imagine the partygoer who is the only one with anything to say (and imagine that this is the even rarer circumstance where everyone wants to hear what that one person has to say). Clearly, TDM would be a poor choice for a multiple access protocol for this particular party.
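The revolving slot assignment, and the first drawback above (a lone active node still gets only one slot in N), can be sketched as follows (a toy model; the function and its backlog representation are our own invention):

```python
def tdm_schedule(num_slots, num_nodes, backlog):
    """Revolving TDM: slot k belongs to node k % N. A slot goes idle (None)
    if its owner has no packet, even when other nodes are backlogged."""
    transmissions = []
    for slot in range(num_slots):
        owner = slot % num_nodes
        if backlog.get(owner, 0) > 0:
            backlog[owner] -= 1
            transmissions.append(owner)
        else:
            transmissions.append(None)    # wasted slot: the owner has nothing to send
    return transmissions

# Node 0 is the only node with packets, yet it transmits in only 1 of every 4 slots:
print(tdm_schedule(8, 4, {0: 8}))
# [0, None, None, None, 0, None, None, None]
```

With all four nodes backlogged the schedule is perfectly fair and collision-free, which is exactly the trade-off the text describes.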
While TDM shares the broadcast channel in time, FDM divides the R bps channel into different frequencies (each with a bandwidth of R/N) and assigns each frequency to one of the N nodes. FDM thus creates N smaller channels of R/N bps out of the single, larger R bps channel. FDM shares both the advantages and drawbacks of TDM. It avoids collisions and divides the bandwidth fairly among the N nodes. However, FDM also shares a principal disadvantage with TDM---a node is limited to a bandwidth of R/N, even when it is the only node with packets to send. A third channel partitioning protocol is code division multiple access (CDMA). While TDM and FDM assign time slots and frequencies, respectively, to the nodes, CDMA assigns a different code to each node. Each node then uses its unique code to encode the data bits it sends. If the codes are chosen carefully, CDMA networks have the wonderful property that different nodes can transmit simultaneously and yet have their respective receivers correctly receive a sender's encoded data bits (assuming the receiver knows the sender's code) in spite of interfering transmissions by other nodes. CDMA has been used in military systems for some time (due to its anti-jamming properties) and now has widespread civilian use, particularly in cellular telephony. Because CDMA's use is so tightly tied to wireless channels, we'll save our discussion of the technical details of CDMA until Chapter 7. For now, it will suffice to know that CDMA codes, like time slots in TDM and frequencies in FDM, can be allocated to the multiple access channel users.

6.3.2 Random Access Protocols

The second broad class of multiple access protocols is random access protocols. In a random access protocol, a transmitting node always transmits at the full rate of the channel, namely, R bps.
When there is a collision, each node involved in the collision repeatedly retransmits its frame (that is, packet) until its frame gets through without a collision. But when a node experiences a collision, it doesn't necessarily retransmit the frame right away. Instead it waits a random delay before retransmitting the frame. Each node involved in a collision chooses independent random delays. Because the random delays are independently chosen, it is possible that one of the nodes will pick a delay that is sufficiently less than the delays of the other colliding nodes and will therefore be able to sneak its frame into the channel without a collision.

There are dozens if not hundreds of random access protocols described in the literature \[Rom 1990; Bertsekas 1991\]. In this section we'll describe a few of the most commonly used random access protocols---the ALOHA protocols \[Abramson 1970; Abramson 1985; Abramson 2009\] and the carrier sense multiple access (CSMA) protocols \[Kleinrock 1975b\]. Ethernet \[Metcalfe 1976\] is a popular and widely deployed CSMA protocol.

Slotted ALOHA

Let's begin our study of random access protocols with one of the simplest random access protocols, the slotted ALOHA protocol. In our description of slotted ALOHA, we assume the following:

- All frames consist of exactly L bits.
- Time is divided into slots of size L/R seconds (that is, a slot equals the time to transmit one frame).
- Nodes start to transmit frames only at the beginnings of slots.
- The nodes are synchronized so that each node knows when the slots begin.
- If two or more frames collide in a slot, then all the nodes detect the collision event before the slot ends.

Let p be a probability, that is, a number between 0 and 1. The operation of slotted ALOHA in each node is simple: When the node has a fresh frame to send, it waits until the beginning of the next slot and transmits the entire frame in the slot.
If there isn't a collision, the node has successfully transmitted its frame and thus need not consider retransmitting the frame. (The node can prepare a new frame for transmission, if it has one.) If there is a collision, the node detects the collision before the end of the slot. The node retransmits its frame in each subsequent slot with probability p until the frame is transmitted without a collision. By retransmitting with probability p, we mean that the node effectively tosses a biased coin; the event heads corresponds to "retransmit," which occurs with probability p. The event tails corresponds to "skip the slot and toss the coin again in the next slot"; this occurs with probability (1 − p). All nodes involved in the collision toss their coins independently.

Slotted ALOHA would appear to have many advantages. Unlike channel partitioning, slotted ALOHA allows a node to transmit continuously at the full rate, R, when that node is the only active node. (A node is said to be active if it has frames to send.) Slotted ALOHA is also highly decentralized, because each node detects collisions and independently decides when to retransmit. (Slotted ALOHA does, however, require the slots to be synchronized in the nodes; shortly we'll discuss an unslotted version of the ALOHA protocol, as well as CSMA protocols, none of which require such synchronization.) Slotted ALOHA is also an extremely simple protocol.

Slotted ALOHA works well when there is only one active node, but how efficient is it when there are multiple active nodes? There are two possible efficiency concerns here. First, as shown in Figure 6.10, when there are multiple active nodes, a certain fraction of the slots will have collisions and will therefore be "wasted."

Figure 6.10 Nodes 1, 2, and 3 collide in the first slot. Node 2 finally succeeds in the fourth slot, node 1 in the eighth slot, and node 3 in the ninth slot
The second concern is that another fraction of the slots will be empty because all active nodes refrain from transmitting as a result of the probabilistic transmission policy. The only "unwasted" slots will be those in which exactly one node transmits. A slot in which exactly one node transmits is said to be a successful slot. The efficiency of a slotted multiple access protocol is defined to be the long-run fraction of successful slots in the case when there are a large number of active nodes, each always having a large number of frames to send. Note that if no form of access control were used, and each node were to immediately retransmit after each collision, the efficiency would be zero. Slotted ALOHA clearly increases the efficiency beyond zero, but by how much?

We now proceed to outline the derivation of the maximum efficiency of slotted ALOHA. To keep this derivation simple, let's modify the protocol a little and assume that each node attempts to transmit a frame in each slot with probability p. (That is, we assume that each node always has a frame to send and that the node transmits with probability p for a fresh frame as well as for a frame that has already suffered a collision.) Suppose there are N nodes. Then the probability that a given slot is a successful slot is the probability that one of the nodes transmits and that the remaining N − 1 nodes do not transmit. The probability that a given node transmits is p; the probability that the remaining nodes do not transmit is (1 − p)^(N−1). Therefore the probability a given node has a success is p(1 − p)^(N−1). Because there are N nodes, the probability that any one of the N nodes has a success is Np(1 − p)^(N−1). Thus, when there are N active nodes, the efficiency of slotted ALOHA is Np(1 − p)^(N−1). To obtain the maximum efficiency for N active nodes, we have to find the p* that maximizes this expression. (See the homework problems for a general outline of this derivation.)
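That maximization can be checked numerically. The sketch below does a grid search over p for a few values of N (it is a numeric check, not the closed-form derivation the homework problems outline); the maximizing p turns out to be 1/N:

```python
# Efficiency of slotted ALOHA with N always-active nodes, each
# transmitting in a slot with probability p.
def slotted_aloha_efficiency(N, p):
    return N * p * (1 - p) ** (N - 1)

for N in (2, 10, 50):
    # Sweep p over a fine grid and keep the value that maximizes efficiency.
    best_p = max((i / 10_000 for i in range(1, 10_000)),
                 key=lambda p: slotted_aloha_efficiency(N, p))
    print(N, round(best_p, 3), round(slotted_aloha_efficiency(N, best_p), 3))
```

For each N the grid search lands on p* = 1/N, matching the result of setting the derivative of Np(1 − p)^(N−1) to zero.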
And to obtain the maximum efficiency for a large number of active nodes, we take the limit of Np*(1 − p*)^(N−1) as N approaches infinity. (Again, see the homework problems.) After performing these calculations, we'll find that the maximum efficiency of the protocol is given by 1/e ≈ 0.37. That is, when a large number of nodes have many frames to transmit, then (at best) only 37 percent of the slots do useful work. Thus the effective transmission rate of the channel is not R bps but only 0.37 R bps! A similar analysis also shows that 37 percent of the slots go empty and 26 percent of slots have collisions. Imagine the poor network administrator who has purchased a 100-Mbps slotted ALOHA system, expecting to be able to use the network to transmit data among a large number of users at an aggregate rate of, say, 80 Mbps! Although the channel is capable of transmitting a given frame at the full channel rate of 100 Mbps, in the long run, the successful throughput of this channel will be less than 37 Mbps.

ALOHA

The slotted ALOHA protocol required that all nodes synchronize their transmissions to start at the beginning of a slot. The first ALOHA protocol \[Abramson 1970\] was actually an unslotted, fully decentralized protocol. In pure ALOHA, when a frame first arrives (that is, a network-layer datagram is passed down from the network layer at the sending node), the node immediately transmits the frame in its entirety into the broadcast channel. If a transmitted frame experiences a collision with one or more other transmissions, the node will then immediately (after completely transmitting its collided frame) retransmit the frame with probability p. Otherwise, the node waits for a frame transmission time. After this wait, it then transmits the frame with probability p, or waits (remaining idle) for another frame time with probability 1 − p.

To determine the maximum efficiency of pure ALOHA, we focus on an individual node.
We'll make the same assumptions as in our slotted ALOHA analysis and take the frame transmission time to be the unit of time. At any given time, the probability that a node is transmitting a frame is p. Suppose this frame, from a given node i, begins transmission at time t0. As shown in Figure 6.11, in order for this frame to be successfully transmitted, no other nodes can begin their transmission in the interval of time \[t0 − 1, t0\]. Such a transmission would overlap with the beginning of the transmission of node i's frame. The probability that all other nodes do not begin a transmission in this interval is (1 − p)^(N−1). Similarly, no other node can begin a transmission while node i is transmitting, as such a transmission would overlap with the latter part of node i's transmission. The probability that all other nodes do not begin a transmission in this interval is also (1 − p)^(N−1). Thus, the probability that a given node has a successful transmission is p(1 − p)^(2(N−1)). By taking limits as in the slotted ALOHA case, we find that the maximum efficiency of the pure ALOHA protocol is only 1/(2e)---exactly half that of slotted ALOHA. This then is the price to be paid for a fully decentralized ALOHA protocol.

Figure 6.11 Interfering transmissions in pure ALOHA

Carrier Sense Multiple Access (CSMA)

In both slotted and pure ALOHA, a node's decision to transmit is made independently of the activity of the other nodes attached to the broadcast channel. In particular, a node neither pays attention to whether another node happens to be transmitting when it begins to transmit, nor stops transmitting if another node begins to interfere with its transmission. In our cocktail party analogy, ALOHA protocols are quite like a boorish partygoer who continues to chatter away regardless of whether other people are talking.
As humans, we have human protocols that allow us not only to behave with more civility, but also to decrease the amount of time spent "colliding" with each other in conversation and, consequently, to increase the amount of data we exchange in our conversations. Specifically, there are two important rules for polite human conversation:

Listen before speaking. If someone else is speaking, wait until they are finished. In the networking world, this is called carrier sensing---a node listens to the channel before transmitting. If a frame from another node is currently being transmitted into the channel, a node then waits until it detects no transmissions for a short amount of time and then begins transmission.

If someone else begins talking at the same time, stop talking. In the networking world, this is called collision detection---a transmitting node listens to the channel while it is transmitting. If it detects that another node is transmitting an interfering frame, it stops transmitting and waits a random amount of time before repeating the sense-and-transmit-when-idle cycle.

These two rules are embodied in the family of carrier sense multiple access (CSMA) and CSMA with collision detection (CSMA/CD) protocols \[Kleinrock 1975b; Metcalfe 1976; Lam 1980; Rom 1990\]. Many variations on CSMA and CSMA/CD have been proposed. Here, we'll consider a few of the most important, and fundamental, characteristics of CSMA and CSMA/CD.

CASE HISTORY

NORM ABRAMSON AND ALOHANET

Norm Abramson, a PhD engineer, had a passion for surfing and an interest in packet switching. This combination of interests brought him to the University of Hawaii in 1969. Hawaii consists of many mountainous islands, making it difficult to install and operate land-based networks. When not surfing, Abramson thought about how to design a network that does packet switching over radio. The network he designed had one central host and several secondary nodes scattered over the Hawaiian Islands. The network had two channels, each using a different frequency band. The downlink channel broadcasted packets from the central host to the secondary hosts; and the upstream channel sent packets from the secondary hosts to the central host. In addition to sending informational packets, the central host also sent on the downstream channel an acknowledgment for each packet successfully received from the secondary hosts. Because the secondary hosts transmitted packets in a decentralized fashion, collisions on the upstream channel inevitably occurred. This observation led Abramson to devise the pure ALOHA protocol, as described in this chapter. In 1970, with continued funding from ARPA, Abramson connected his ALOHAnet to the ARPAnet. Abramson's work is important not only because it was the first example of a radio packet network, but also because it inspired Bob Metcalfe. A few years later, Metcalfe modified the ALOHA protocol to create the CSMA/CD protocol and the Ethernet LAN.

The first question that you might ask about CSMA is why, if all nodes perform carrier sensing, do collisions occur in the first place? After all, a node will refrain from transmitting whenever it senses that another node is transmitting. The answer to the question can best be illustrated using space-time diagrams \[Molle 1987\]. Figure 6.12 shows a space-time diagram of four nodes (A, B, C, D) attached to a linear broadcast bus. The horizontal axis shows the position of each node in space; the vertical axis represents time. At time t0, node B senses the channel is idle, as no other nodes are currently transmitting. Node B thus begins transmitting, with its bits propagating in both directions along the broadcast medium.
The downward propagation of B's bits in Figure 6.12 with increasing time indicates that a nonzero amount of time is needed for B's bits actually to propagate (albeit at near the speed of light) along the broadcast medium. At time t1 (t1 > t0), node D has a frame to send. Although node B is currently transmitting at time t1, the bits being transmitted by B have yet to reach D, and thus D senses the channel idle at t1. In accordance with the CSMA protocol, D thus begins transmitting its frame. A short time later, B's transmission begins to interfere with D's transmission at D.

Figure 6.12 Space-time diagram of two CSMA nodes with colliding transmissions

From Figure 6.12, it is evident that the end-to-end channel propagation delay of a broadcast channel---the time it takes for a signal to propagate from one of the nodes to another---will play a crucial role in determining its performance. The longer this propagation delay, the larger the chance that a carrier-sensing node is not yet able to sense a transmission that has already begun at another node in the network.

Carrier Sense Multiple Access with Collision Detection (CSMA/CD)

In Figure 6.12, nodes do not perform collision detection; both B and D continue to transmit their frames in their entirety even though a collision has occurred. When a node performs collision detection, it ceases transmission as soon as it detects a collision. Figure 6.13 shows the same scenario as in Figure 6.12, except that the two nodes each abort their transmission a short time after detecting a collision.

Figure 6.13 CSMA with collision detection

Clearly, adding collision detection to a multiple access protocol will help protocol performance by not transmitting a useless, damaged (by interference with a frame from another node) frame in its entirety.
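To make the role of propagation delay concrete, here is a small numeric sketch; the bus length, propagation speed, and link rate are hypothetical values chosen for illustration:

```python
# Why carrier sensing alone can't prevent collisions: compute how long
# B's first bit takes to reach D. Any node that senses the channel
# before that instant still sees it as idle (all values hypothetical).
bus_length_m = 500     # distance from B to D
prop_speed = 2e8       # propagation speed in copper, roughly 2/3 the speed of light
R = 100e6              # 100 Mbps link rate

d_prop = bus_length_m / prop_speed   # time for B's signal to reach D
vulnerable_bits = R * d_prop         # bits B sends before D can sense it

print(f"propagation delay B -> D: {d_prop * 1e6:.1f} us")
print(f"bits B transmits before D can sense the channel busy: {vulnerable_bits:.0f}")
```

During that window D can legitimately sense an idle channel and begin its own transmission, which is exactly the collision shown in Figure 6.12.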
Before analyzing the CSMA/CD protocol, let us now summarize its operation from the perspective of an adapter (in a node) attached to a broadcast channel:

1. The adapter obtains a datagram from the network layer, prepares a link-layer frame, and puts the frame in an adapter buffer.

2. If the adapter senses that the channel is idle (that is, there is no signal energy entering the adapter from the channel), it starts to transmit the frame. If, on the other hand, the adapter senses that the channel is busy, it waits until it senses no signal energy and then starts to transmit the frame.

3. While transmitting, the adapter monitors for the presence of signal energy coming from other adapters using the broadcast channel.

4. If the adapter transmits the entire frame without detecting signal energy from other adapters, the adapter is finished with the frame. If, on the other hand, the adapter detects signal energy from other adapters while transmitting, it aborts the transmission (that is, it stops transmitting its frame).

5. After aborting, the adapter waits a random amount of time and then returns to step 2.

The need to wait a random (rather than fixed) amount of time is hopefully clear---if two nodes transmitted frames at the same time and then both waited the same fixed amount of time, they'd continue colliding forever. But what is a good interval of time from which to choose the random backoff time? If the interval is large and the number of colliding nodes is small, nodes are likely to wait a large amount of time (with the channel remaining idle) before repeating the sense-and-transmit-when-idle step. On the other hand, if the interval is small and the number of colliding nodes is large, it's likely that the chosen random values will be nearly the same, and transmitting nodes will again collide.
What we'd like is an interval that is short when the number of colliding nodes is small, and long when the number of colliding nodes is large. The binary exponential backoff algorithm, used in Ethernet as well as in DOCSIS cable network multiple access protocols \[DOCSIS 2011\], elegantly solves this problem. Specifically, when transmitting a frame that has already experienced n collisions, a node chooses the value of K at random from {0, 1, 2, ..., 2^n − 1}. Thus, the more collisions experienced by a frame, the larger the interval from which K is chosen. For Ethernet, the actual amount of time a node waits is K · 512 bit times (i.e., K times the amount of time needed to send 512 bits into the Ethernet) and the maximum value that n can take is capped at 10.

Let's look at an example. Suppose that a node attempts to transmit a frame for the first time and while transmitting it detects a collision. The node then chooses K=0 with probability 0.5 or chooses K=1 with probability 0.5. If the node chooses K=0, then it immediately begins sensing the channel. If the node chooses K=1, it waits 512 bit times (e.g., 5.12 microseconds for a 100 Mbps Ethernet) before beginning the sense-and-transmit-when-idle cycle. After a second collision, K is chosen with equal probability from {0, 1, 2, 3}. After three collisions, K is chosen with equal probability from {0, 1, 2, 3, 4, 5, 6, 7}. After 10 or more collisions, K is chosen with equal probability from {0, 1, 2, ..., 1023}. Thus, the size of the sets from which K is chosen grows exponentially with the number of collisions; for this reason this algorithm is referred to as binary exponential backoff.

We also note here that each time a node prepares a new frame for transmission, it runs the CSMA/CD algorithm, not taking into account any collisions that may have occurred in the recent past.
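The K-selection rule just described can be sketched in a few lines. This is a sketch of the rule, not Ethernet's actual implementation; the 100 Mbps bit time is an assumption made for the example:

```python
import random

# Binary exponential backoff: after n collisions, draw K uniformly from
# {0, 1, ..., 2^min(n, 10) - 1} and wait K * 512 bit times.
BIT_TIME = 1 / 100e6   # seconds per bit on a 100 Mbps Ethernet (assumed)

def backoff_delay(n_collisions):
    exponent = min(n_collisions, 10)      # n is capped at 10
    K = random.randrange(2 ** exponent)   # uniform over the set
    return K * 512 * BIT_TIME

# After one collision K is 0 or 1; after ten or more, 0..1023.
for n in (1, 2, 3, 10):
    print(n, f"{backoff_delay(n) * 1e6:.2f} us")
```

Note how the range of possible delays doubles with each collision, so the expected wait adapts to the (unknown) number of colliding nodes.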
So it is possible that a node with a new frame will immediately be able to sneak in a successful transmission while several other nodes are in the exponential backoff state.

CSMA/CD Efficiency

When only one node has a frame to send, the node can transmit at the full channel rate (e.g., for Ethernet typical rates are 10 Mbps, 100 Mbps, or 1 Gbps). However, if many nodes have frames to transmit, the effective transmission rate of the channel can be much less. We define the efficiency of CSMA/CD to be the long-run fraction of time during which frames are being transmitted on the channel without collisions when there is a large number of active nodes, with each node having a large number of frames to send. In order to present a closed-form approximation of the efficiency of Ethernet, let dprop denote the maximum time it takes signal energy to propagate between any two adapters. Let dtrans be the time to transmit a maximum-size frame (approximately 1.2 msecs for a 10 Mbps Ethernet). A derivation of the efficiency of CSMA/CD is beyond the scope of this book (see \[Lam 1980\] and \[Bertsekas 1991\]). Here we simply state the following approximation:

Efficiency = 1/(1 + 5dprop/dtrans)

We see from this formula that as dprop approaches 0, the efficiency approaches 1. This matches our intuition that if the propagation delay is zero, colliding nodes will abort immediately without wasting the channel. Also, as dtrans becomes very large, efficiency approaches 1. This is also intuitive because when a frame grabs the channel, it will hold on to the channel for a very long time; thus, the channel will be doing productive work most of the time.

6.3.3 Taking-Turns Protocols

Recall that two desirable properties of a multiple access protocol are (1) when only one node is active, the active node has a throughput of R bps, and (2) when M nodes are active, then each active node has a throughput of nearly R/M bps.
The ALOHA and CSMA protocols have this first property but not the second. This has motivated researchers to create another class of protocols---the taking-turns protocols. As with random access protocols, there are dozens of taking-turns protocols, and each one of these protocols has many variations. We'll discuss two of the more important protocols here. The first one is the polling protocol. The polling protocol requires one of the nodes to be designated as a master node. The master node polls each of the nodes in a round-robin fashion. In particular, the master node first sends a message to node 1, saying that it (node 1) can transmit up to some maximum number of frames. After node 1 transmits some frames, the master node tells node 2 it (node 2) can transmit up to the maximum number of frames. (The master node can determine when a node has finished sending its frames by observing the lack of a signal on the channel.) The procedure continues in this manner, with the master node polling each of the nodes in a cyclic manner.

The polling protocol eliminates the collisions and empty slots that plague random access protocols. This allows polling to achieve a much higher efficiency. But it also has a few drawbacks. The first drawback is that the protocol introduces a polling delay---the amount of time required to notify a node that it can transmit. If, for example, only one node is active, then the node will transmit at a rate less than R bps, as the master node must poll each of the inactive nodes in turn each time the active node has sent its maximum number of frames. The second drawback, which is potentially more serious, is that if the master node fails, the entire channel becomes inoperative. The 802.15 protocol and the Bluetooth protocol we will study in Chapter 7 are examples of polling protocols.

The second taking-turns protocol is the token-passing protocol. In this protocol there is no master node.
A small, special-purpose frame known as a token is exchanged among the nodes in some fixed order. For example, node 1 might always send the token to node 2, node 2 might always send the token to node 3, and node N might always send the token to node 1. When a node receives a token, it holds onto the token only if it has some frames to transmit; otherwise, it immediately forwards the token to the next node. If a node does have frames to transmit when it receives the token, it sends up to a maximum number of frames and then forwards the token to the next node. Token passing is decentralized and highly efficient. But it has its problems as well. For example, the failure of one node can crash the entire channel. Or if a node accidentally neglects to release the token, then some recovery procedure must be invoked to get the token back in circulation. Over the years many token-passing protocols have been developed, including the fiber distributed data interface (FDDI) protocol \[Jain 1994\] and the IEEE 802.5 token ring protocol \[IEEE 802.5 2012\], and each one had to address these as well as other sticky issues.

6.3.4 DOCSIS: The Link-Layer Protocol for Cable Internet Access

In the previous three subsections, we've learned about three broad classes of multiple access protocols: channel partitioning protocols, random access protocols, and taking-turns protocols. A cable access network will make for an excellent case study here, as we'll find aspects of each of these three classes of multiple access protocols within the cable access network! Recall from Section 1.2.1 that a cable access network typically connects several thousand residential cable modems to a cable modem termination system (CMTS) at the cable network headend. The Data-Over-Cable Service Interface Specifications (DOCSIS) \[DOCSIS 2011\] specifies the cable data network architecture and its protocols.
DOCSIS uses FDM to divide the downstream (CMTS to modem) and upstream (modem to CMTS) network segments into multiple frequency channels. Each downstream channel is 6 MHz wide, with a maximum throughput of approximately 40 Mbps per channel (although this data rate is seldom seen at a cable modem in practice); each upstream channel has a maximum channel width of 6.4 MHz, and a maximum upstream throughput of approximately 30 Mbps. Each upstream and downstream channel is a broadcast channel. Frames transmitted on the downstream channel by the CMTS are received by all cable modems receiving that channel; since there is just a single CMTS transmitting into the downstream channel, however, there is no multiple access problem.

Figure 6.14 Upstream and downstream channels between CMTS and cable modems

The upstream direction, however, is more interesting and technically challenging, since multiple cable modems share the same upstream channel (frequency) to the CMTS, and thus collisions can potentially occur. As illustrated in Figure 6.14, each upstream channel is divided into intervals of time (TDM-like), each containing a sequence of mini-slots during which cable modems can transmit to the CMTS. The CMTS explicitly grants permission to individual cable modems to transmit during specific mini-slots. The CMTS accomplishes this by sending a control message known as a MAP message on a downstream channel to specify which cable modem (with data to send) can transmit during which mini-slot for the interval of time specified in the control message. Since mini-slots are explicitly allocated to cable modems, the CMTS can ensure there are no colliding transmissions during a mini-slot. But how does the CMTS know which cable modems have data to send in the first place?
This is accomplished by having cable modems send mini-slot-request frames to the CMTS during a special set of interval mini-slots that are dedicated for this purpose, as shown in Figure 6.14. These mini-slot-request frames are transmitted in a random access manner and so may collide with each other. A cable modem can neither sense whether the upstream channel is busy nor detect collisions. Instead, the cable modem infers that its mini-slot-request frame experienced a collision if it does not receive a response to the requested allocation in the next downstream control message. When a collision is inferred, a cable modem uses binary exponential backoff to defer the retransmission of its mini-slot-request frame to a future time slot. When there is little traffic on the upstream channel, a cable modem may actually transmit data frames during slots nominally assigned for mini-slot-request frames (and thus avoid having to wait for a mini-slot assignment).

A cable access network thus serves as a terrific example of multiple access protocols in action---FDM, TDM, random access, and centrally allocated time slots all within one network!

6.4 Switched Local Area Networks

Having covered broadcast networks and multiple access protocols in the previous section, let's turn our attention next to switched local networks. Figure 6.15 shows a switched local network connecting three departments, two servers and a router with four switches. Because these switches operate at the link layer, they switch link-layer frames (rather than network-layer datagrams), don't recognize network-layer addresses, and don't use routing algorithms like RIP or OSPF to determine paths through the network of layer-2 switches. Instead of using IP addresses, we will soon see that they use link-layer addresses to forward link-layer frames through the network of switches.

Figure 6.15 An institutional network connected together by four switches
We'll begin our study of switched LANs by first covering link-layer addressing (Section 6.4.1). We then examine the celebrated Ethernet protocol (Section 6.4.2). After examining link-layer addressing and Ethernet, we'll look at how link-layer switches operate (Section 6.4.3), and then see (Section 6.4.4) how these switches are often used to build large-scale LANs.

6.4.1 Link-Layer Addressing and ARP

Hosts and routers have link-layer addresses. Now you might find this surprising, recalling from Chapter 4 that hosts and routers have network-layer addresses as well. You might be asking, why in the world do we need to have addresses at both the network and link layers? In addition to describing the syntax and function of the link-layer addresses, in this section we hope to shed some light on why the two layers of addresses are useful and, in fact, indispensable. We'll also cover the Address Resolution Protocol (ARP), which provides a mechanism to translate IP addresses to link-layer addresses.

MAC Addresses

In truth, it is not hosts and routers that have link-layer addresses but rather their adapters (that is, network interfaces) that have link-layer addresses. A host or router with multiple network interfaces will thus have multiple link-layer addresses associated with it, just as it would also have multiple IP addresses associated with it. It's important to note, however, that link-layer switches do not have link-layer addresses associated with their interfaces that connect to hosts and routers. This is because the job of the link-layer switch is to carry datagrams between hosts and routers; a switch does this job transparently, that is, without the host or router having to explicitly address the frame to the intervening switch. This is illustrated in Figure 6.16. A link-layer address is variously called a LAN address, a physical address, or a MAC address.
Because MAC address seems to be the most popular term, we'll henceforth refer to link-layer addresses as MAC addresses. For most LANs (including Ethernet and 802.11 wireless LANs), the MAC address is 6 bytes long, giving 2^48 possible MAC addresses. As shown in Figure 6.16, these 6-byte addresses are typically expressed in hexadecimal notation, with each byte of the address expressed as a pair of hexadecimal numbers. Although MAC addresses were designed to be permanent, it is now possible to change an adapter's MAC address via software. For the rest of this section, however, we'll assume that an adapter's MAC address is fixed.

Figure 6.16 Each interface connected to a LAN has a unique MAC address

One interesting property of MAC addresses is that no two adapters have the same address. This might seem surprising given that adapters are manufactured in many countries by many companies. How does a company manufacturing adapters in Taiwan make sure that it is using different addresses from a company manufacturing adapters in Belgium? The answer is that the IEEE manages the MAC address space. In particular, when a company wants to manufacture adapters, it purchases a chunk of the address space consisting of 2^24 addresses for a nominal fee. IEEE allocates the chunk of 2^24 addresses by fixing the first 24 bits of a MAC address and letting the company create unique combinations of the last 24 bits for each adapter.

An adapter's MAC address has a flat structure (as opposed to a hierarchical structure) and doesn't change no matter where the adapter goes. A laptop with an Ethernet interface always has the same MAC address, no matter where the computer goes. A smartphone with an 802.11 interface always has the same MAC address, no matter where the smartphone goes.
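The 24/24-bit split just described can be illustrated with a short sketch; the address used below is made up for illustration:

```python
# Split a 6-byte MAC address into the IEEE-assigned 24-bit company
# prefix (the OUI) and the 24-bit adapter-specific part.
def parse_mac(mac):
    """Return (oui, device_id) for a hex MAC address like 'AA-BB-CC-DD-EE-FF'."""
    octets = [int(b, 16) for b in mac.split("-")]
    assert len(octets) == 6, "a MAC address has exactly 6 bytes"
    value = int.from_bytes(bytes(octets), "big")   # 48-bit integer
    return value >> 24, value & 0xFFFFFF           # top 24 bits, bottom 24 bits

oui, dev = parse_mac("1A-23-F9-CD-06-9B")   # example address (made up)
print(f"OUI: {oui:06X}, device: {dev:06X}")
print(f"total address space: 2^48 = {2**48}; addresses per OUI block: 2^24 = {2**24}")
```

Every adapter from one manufacturer shares the same `oui`; the manufacturer's only job is to keep the `dev` values within its purchased block unique.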
Recall that, in contrast, IP addresses have a hierarchical structure (that is, a network part and a host part), and a host's IP address needs to be changed when the host moves, i.e., changes the network to which it is attached. An adapter's MAC address is analogous to a person's social security number, which also has a flat addressing structure and which doesn't change no matter where the person goes. An IP address is analogous to a person's postal address, which is hierarchical and which must be changed whenever a person moves. Just as a person may find it useful to have both a postal address and a social security number, it is useful for host and router interfaces to have both a network-layer address and a MAC address. When an adapter wants to send a frame to some destination adapter, the sending adapter inserts the destination adapter's MAC address into the frame and then sends the frame into the LAN. As we will soon see, a switch occasionally broadcasts an incoming frame onto all of its interfaces. We'll see in Chapter 7 that 802.11 also broadcasts frames. Thus, an adapter may receive a frame that isn't addressed to it. When an adapter receives a frame, it will therefore check to see whether the destination MAC address in the frame matches its own MAC address. If there is a match, the adapter extracts the enclosed datagram and passes the datagram up the protocol stack. If there isn't a match, the adapter discards the frame, without passing the network-layer datagram up. Thus, only the destination will be interrupted when the frame is received. However, sometimes a sending adapter does want all the other adapters on the LAN to receive and process the frame it is about to send. In this case, the sending adapter inserts a special MAC broadcast address into the destination address field of the frame.
For LANs that use 6-byte addresses (such as Ethernet and 802.11), the broadcast address is a string of 48 consecutive 1s (that is, FF-FF-FF-FF-FF-FF in hexadecimal notation).

Address Resolution Protocol (ARP)

Because there are both network-layer addresses (for example, Internet IP addresses) and link-layer addresses (that is, MAC addresses), there is a need to translate between them. For the Internet, this is the job of the Address Resolution Protocol (ARP) [RFC 826]. To understand the need for a protocol such as ARP, consider the network shown in Figure 6.17. In this simple example, each host and router has a single IP address and a single MAC address. As usual, IP addresses are shown in dotted-decimal

PRINCIPLES IN PRACTICE
KEEPING THE LAYERS INDEPENDENT

There are several reasons why hosts and router interfaces have MAC addresses in addition to network-layer addresses. First, LANs are designed for arbitrary network-layer protocols, not just for IP and the Internet. If adapters were assigned IP addresses rather than "neutral" MAC addresses, then adapters would not easily be able to support other network-layer protocols (for example, IPX or DECnet). Second, if adapters were to use network-layer addresses instead of MAC addresses, the network-layer address would have to be stored in the adapter RAM and reconfigured every time the adapter was moved (or powered up). Another option is to not use any addresses in the adapters and have each adapter pass the data (typically, an IP datagram) of each frame it receives up the protocol stack. The network layer could then check for a matching network-layer address. One problem with this option is that the host would be interrupted by every frame sent on the LAN, including by frames that were destined for other hosts on the same broadcast LAN.
In summary, in order for the layers to be largely independent building blocks in a network architecture, different layers need to have their own addressing scheme. We have now seen three types of addresses: host names for the application layer, IP addresses for the network layer, and MAC addresses for the link layer.

Figure 6.17 Each interface on a LAN has an IP address and a MAC address

notation and MAC addresses are shown in hexadecimal notation. For the purposes of this discussion, we will assume in this section that the switch broadcasts all frames; that is, whenever a switch receives a frame on one interface, it forwards the frame on all of its other interfaces. In the next section, we will provide a more accurate explanation of how switches operate. Now suppose that the host with IP address 222.222.222.220 wants to send an IP datagram to host 222.222.222.222. In this example, both the source and destination are in the same subnet, in the addressing sense of Section 4.3.3. To send a datagram, the source must give its adapter not only the IP datagram but also the MAC address for destination 222.222.222.222. The sending adapter will then construct a link-layer frame containing the destination's MAC address and send the frame into the LAN. The important question addressed in this section is, How does the sending host determine the MAC address for the destination host with IP address 222.222.222.222? As you might have guessed, it uses ARP. An ARP module in the sending host takes any IP address on the same LAN as input, and returns the corresponding MAC address. In the example at hand, sending host 222.222.222.220 provides its ARP module the IP address 222.222.222.222, and the ARP module returns the corresponding MAC address 49-BD-D2-C7-56-2A. So we see that ARP resolves an IP address to a MAC address. In many ways it is analogous to DNS (studied in Section 2.5), which resolves host names to IP addresses.
However, one important difference between the two resolvers is that DNS resolves host names for hosts anywhere in the Internet, whereas ARP resolves IP addresses only for hosts and router interfaces on the same subnet. If a node in California were to try to use ARP to resolve the IP address for a node in Mississippi, ARP would return with an error.

Figure 6.18 A possible ARP table in 222.222.222.220

Now that we have explained what ARP does, let's look at how it works. Each host and router has an ARP table in its memory, which contains mappings of IP addresses to MAC addresses. Figure 6.18 shows what an ARP table in host 222.222.222.220 might look like. The ARP table also contains a time-to-live (TTL) value, which indicates when each mapping will be deleted from the table. Note that a table does not necessarily contain an entry for every host and router on the subnet; some may have never been entered into the table, and others may have expired. A typical expiration time for an entry is 20 minutes from when an entry is placed in an ARP table. Now suppose that host 222.222.222.220 wants to send a datagram that is IP-addressed to another host or router on that subnet. The sending host needs to obtain the MAC address of the destination given the IP address. This task is easy if the sender's ARP table has an entry for the destination node. But what if the ARP table doesn't currently have an entry for the destination? In particular, suppose 222.222.222.220 wants to send a datagram to 222.222.222.222. In this case, the sender uses the ARP protocol to resolve the address. First, the sender constructs a special packet called an ARP packet. An ARP packet has several fields, including the sending and receiving IP and MAC addresses. Both ARP query and response packets have the same format.
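The cache-with-expiry behavior described above can be sketched as a toy model (this is our own illustrative class, not an actual operating-system implementation; the 20-minute TTL follows the typical value mentioned in the text):

```python
import time

class ArpTable:
    """Toy ARP table: maps an IP address to a (MAC address, expiry time) pair."""

    def __init__(self, ttl_seconds: float = 20 * 60):  # ~20-minute expiry, as in the text
        self.ttl = ttl_seconds
        self.entries = {}

    def add(self, ip: str, mac: str) -> None:
        self.entries[ip] = (mac, time.time() + self.ttl)

    def lookup(self, ip: str):
        """Return the cached MAC, or None if absent or expired (forcing an ARP query)."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, expires = entry
        if time.time() > expires:
            del self.entries[ip]   # stale mapping: drop it
            return None
        return mac

table = ArpTable()
table.add("222.222.222.222", "49-BD-D2-C7-56-2A")
assert table.lookup("222.222.222.222") == "49-BD-D2-C7-56-2A"
assert table.lookup("222.222.222.221") is None   # miss: the sender must broadcast an ARP query
```

A lookup miss is exactly the situation in which the sender falls back to constructing and broadcasting the ARP query packet described next.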
The purpose of the ARP query packet is to query all the other hosts and routers on the subnet to determine the MAC address corresponding to the IP address that is being resolved. Returning to our example, 222.222.222.220 passes an ARP query packet to the adapter along with an indication that the adapter should send the packet to the MAC broadcast address, namely, FF-FF-FF-FF-FF-FF. The adapter encapsulates the ARP packet in a link-layer frame, uses the broadcast address for the frame's destination address, and transmits the frame into the subnet. Recalling our social security number/postal address analogy, an ARP query is equivalent to a person shouting out in a crowded room of cubicles in some company (say, AnyCorp): "What is the social security number of the person whose postal address is Cubicle 13, Room 112, AnyCorp, Palo Alto, California?" The frame containing the ARP query is received by all the other adapters on the subnet, and (because of the broadcast address) each adapter passes the ARP packet within the frame up to its ARP module. Each of these ARP modules checks to see if its IP address matches the destination IP address in the ARP packet. The one with a match sends back to the querying host a response ARP packet with the desired mapping. The querying host 222.222.222.220 can then update its ARP table and send its IP datagram, encapsulated in a link-layer frame whose destination MAC is that of the host or router responding to the earlier ARP query.

There are a couple of interesting things to note about the ARP protocol. First, the query ARP message is sent within a broadcast frame, whereas the response ARP message is sent within a standard frame. Before reading on you should think about why this is so. Second, ARP is plug-and-play; that is, an ARP table gets built automatically---it doesn't have to be configured by a system administrator.
And if a host becomes disconnected from the subnet, its entry is eventually deleted from the other ARP tables in the subnet. Students often wonder if ARP is a link-layer protocol or a network-layer protocol. As we've seen, an ARP packet is encapsulated within a link-layer frame and thus lies architecturally above the link layer. However, an ARP packet has fields containing link-layer addresses and thus is arguably a link-layer protocol, but it also contains network-layer addresses and thus is also arguably a network-layer protocol. In the end, ARP is probably best considered a protocol that straddles the boundary between the link and network layers---not fitting neatly into the simple layered protocol stack we studied in Chapter 1. Such are the complexities of real-world protocols!

Sending a Datagram off the Subnet

It should now be clear how ARP operates when a host wants to send a datagram to another host on the same subnet. But now let's look at the more complicated situation when a host on a subnet wants to send a network-layer datagram to a host off the subnet (that is, across a router onto another subnet). Let's discuss this issue in the context of Figure 6.19, which shows a simple network consisting of two subnets interconnected by a router. There are several interesting things to note about Figure 6.19. Each host has exactly one IP address and one adapter. But, as discussed in Chapter 4, a router has an IP address for each of its interfaces. For each router interface there is also an ARP module (in the router) and an adapter. Because the router in Figure 6.19 has two interfaces, it has two IP addresses, two ARP modules, and two adapters. Of course, each adapter in the network has its own MAC address.

Figure 6.19 Two subnets interconnected by a router

Also note that Subnet 1 has the network address 111.111.111/24 and that Subnet 2 has the network address 222.222.222/24.
Thus all of the interfaces connected to Subnet 1 have addresses of the form 111.111.111.xxx and all of the interfaces connected to Subnet 2 have addresses of the form 222.222.222.xxx. Now let's examine how a host on Subnet 1 would send a datagram to a host on Subnet 2. Specifically, suppose that host 111.111.111.111 wants to send an IP datagram to host 222.222.222.222. The sending host passes the datagram to its adapter, as usual. But the sending host must also indicate to its adapter an appropriate destination MAC address. What MAC address should the adapter use? One might be tempted to guess that the appropriate MAC address is that of the adapter for host 222.222.222.222, namely, 49-BD-D2-C7-56-2A. This guess, however, would be wrong! If the sending adapter were to use that MAC address, then none of the adapters on Subnet 1 would bother to pass the IP datagram up to its network layer, since the frame's destination address would not match the MAC address of any adapter on Subnet 1. The datagram would just die and go to datagram heaven. If we look carefully at Figure 6.19, we see that in order for a datagram to go from 111.111.111.111 to a host on Subnet 2, the datagram must first be sent to the router interface 111.111.111.110, which is the IP address of the first-hop router on the path to the final destination. Thus, the appropriate MAC address for the frame is the address of the adapter for router interface 111.111.111.110, namely, E6-E9-00-17-BB-4B. How does the sending host acquire the MAC address for 111.111.111.110? By using ARP, of course! Once the sending adapter has this MAC address, it creates a frame (containing the datagram addressed to 222.222.222.222) and sends the frame into Subnet 1. The router adapter on Subnet 1 sees that the link-layer frame is addressed to it, and therefore passes the frame to the network layer of the router. Hooray---the IP datagram has successfully been moved from source host to the router!
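The sender's decision, destination on my own subnet versus first-hop router, can be sketched with Python's standard ipaddress module (the addresses echo Figure 6.19; the function itself is illustrative):

```python
import ipaddress

def next_hop_ip(src_subnet: str, dst_ip: str, default_router: str) -> str:
    """Pick the IP whose MAC the frame should carry: the destination itself when it
    is on the sender's subnet, otherwise the first-hop router. The chosen IP is
    then resolved to a MAC address with ARP."""
    if ipaddress.ip_address(dst_ip) in ipaddress.ip_network(src_subnet):
        return dst_ip
    return default_router

# Host 111.111.111.111 sending to 222.222.222.222 must frame toward the router:
assert next_hop_ip("111.111.111.0/24", "222.222.222.222", "111.111.111.110") == "111.111.111.110"
# A destination on the same subnet is ARPed for directly:
assert next_hop_ip("111.111.111.0/24", "111.111.111.112", "111.111.111.110") == "111.111.111.112"
```

Framing toward the wrong IP's MAC is exactly the "datagram heaven" mistake described above: the frame would never be accepted by any adapter on the sender's subnet.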
But we are not finished. We still have to move the datagram from the router to the destination. The router now has to determine the correct interface on which the datagram is to be forwarded. As discussed in Chapter 4, this is done by consulting a forwarding table in the router. The forwarding table tells the router that the datagram is to be forwarded via router interface 222.222.222.220. This interface then passes the datagram to its adapter, which encapsulates the datagram in a new frame and sends the frame into Subnet 2. This time, the destination MAC address of the frame is indeed the MAC address of the ultimate destination. And how does the router obtain this destination MAC address? From ARP, of course! ARP for Ethernet is defined in RFC 826. A nice introduction to ARP is given in the TCP/IP tutorial, RFC 1180. We'll explore ARP in more detail in the homework problems.

6.4.2 Ethernet

Ethernet has pretty much taken over the wired LAN market. In the 1980s and the early 1990s, Ethernet faced many challenges from other LAN technologies, including token ring, FDDI, and ATM. Some of these other technologies succeeded in capturing a part of the LAN market for a few years. But since its invention in the mid-1970s, Ethernet has continued to evolve and grow and has held on to its dominant position. Today, Ethernet is by far the most prevalent wired LAN technology, and it is likely to remain so for the foreseeable future. One might say that Ethernet has been to local area networking what the Internet has been to global networking. There are many reasons for Ethernet's success. First, Ethernet was the first widely deployed high-speed LAN. Because it was deployed early, network administrators became intimately familiar with Ethernet---its wonders and its quirks---and were reluctant to switch over to other LAN technologies when they came on the scene.
Second, token ring, FDDI, and ATM were more complex and expensive than Ethernet, which further discouraged network administrators from switching over. Third, the most compelling reason to switch to another LAN technology (such as FDDI or ATM) was usually the higher data rate of the new technology; however, Ethernet always fought back, producing versions that operated at equal data rates or higher. Switched Ethernet was also introduced in the early 1990s, which further increased its effective data rates. Finally, because Ethernet has been so popular, Ethernet hardware (in particular, adapters and switches) has become a commodity and is remarkably cheap. The original Ethernet LAN was invented in the mid-1970s by Bob Metcalfe and David Boggs. The original Ethernet LAN used a coaxial bus to interconnect the nodes. Bus topologies for Ethernet actually persisted throughout the 1980s and into the mid-1990s. Ethernet with a bus topology is a broadcast LAN---all transmitted frames travel to and are processed by all adapters connected to the bus. Recall that we covered Ethernet's CSMA/CD multiple access protocol with binary exponential backoff in Section 6.3.2. By the late 1990s, most companies and universities had replaced their LANs with Ethernet installations using a hub-based star topology. In such an installation the hosts (and routers) are directly connected to a hub with twisted-pair copper wire. A hub is a physical-layer device that acts on individual bits rather than frames. When a bit, representing a zero or a one, arrives from one interface, the hub simply recreates the bit, boosts its energy strength, and transmits the bit onto all the other interfaces. Thus, Ethernet with a hub-based star topology is also a broadcast LAN---whenever a hub receives a bit from one of its interfaces, it sends a copy out on all of its other interfaces.
In particular, if a hub receives frames from two different interfaces at the same time, a collision occurs and the nodes that created the frames must retransmit. In the early 2000s Ethernet experienced yet another major evolutionary change. Ethernet installations continued to use a star topology, but the hub at the center was replaced with a switch. We'll be examining switched Ethernet in depth later in this chapter. For now, we only mention that a switch is not only "collision-less" but is also a bona-fide store-and-forward packet switch; but unlike routers, which operate up through layer 3, a switch operates only up through layer 2.

Figure 6.20 Ethernet frame structure

Ethernet Frame Structure

We can learn a lot about Ethernet by examining the Ethernet frame, which is shown in Figure 6.20. To give this discussion about Ethernet frames a tangible context, let's consider sending an IP datagram from one host to another host, with both hosts on the same Ethernet LAN (for example, the Ethernet LAN in Figure 6.17). (Although the payload of our Ethernet frame is an IP datagram, we note that an Ethernet frame can carry other network-layer packets as well.) Let the sending adapter, adapter A, have the MAC address AA-AA-AA-AA-AA-AA and the receiving adapter, adapter B, have the MAC address BB-BB-BB-BB-BB-BB. The sending adapter encapsulates the IP datagram within an Ethernet frame and passes the frame to the physical layer. The receiving adapter receives the frame from the physical layer, extracts the IP datagram, and passes the IP datagram to the network layer. In this context, let's now examine the six fields of the Ethernet frame, as shown in Figure 6.20. Data field (46 to 1,500 bytes). This field carries the IP datagram. The maximum transmission unit (MTU) of Ethernet is 1,500 bytes. This means that if the IP datagram exceeds 1,500 bytes, then the host has to fragment the datagram, as discussed in Section 4.3.2.
The minimum size of the data field is 46 bytes. This means that if the IP datagram is less than 46 bytes, the data field has to be "stuffed" to fill it out to 46 bytes. When stuffing is used, the data passed to the network layer contains the stuffing as well as an IP datagram. The network layer uses the length field in the IP datagram header to remove the stuffing. Destination address (6 bytes). This field contains the MAC address of the destination adapter, BB-BB-BB-BB-BB-BB. When adapter B receives an Ethernet frame whose destination address is either BB-BB-BB-BB-BB-BB or the MAC broadcast address, it passes the contents of the frame's data field to the network layer; if it receives a frame with any other MAC address, it discards the frame. Source address (6 bytes). This field contains the MAC address of the adapter that transmits the frame onto the LAN, in this example, AA-AA-AA-AA-AA-AA. Type field (2 bytes). The type field permits Ethernet to multiplex network-layer protocols. To understand this, we need to keep in mind that hosts can use other network-layer protocols besides IP. In fact, a given host may support multiple network-layer protocols, using different protocols for different applications. For this reason, when the Ethernet frame arrives at adapter B, adapter B needs to know to which network-layer protocol it should pass (that is, demultiplex) the contents of the data field. IP and other network-layer protocols (for example, Novell IPX or AppleTalk) each have their own, standardized type number. Furthermore, the ARP protocol (discussed in the previous section) has its own type number, and if the arriving frame contains an ARP packet (i.e., has a type field of 0806 hexadecimal), the ARP packet will be demultiplexed up to the ARP protocol.
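The padding and type-field rules just described can be sketched in a few lines (a simplified model: the preamble is added by the physical layer and is omitted here, and real Ethernet computes its CRC-32 with its own bit ordering, so zlib.crc32 is only a stand-in):

```python
import struct
import zlib

ETH_TYPE_IPV4 = 0x0800   # standardized type numbers
ETH_TYPE_ARP = 0x0806

def build_frame(dst_mac: bytes, src_mac: bytes, eth_type: int, payload: bytes) -> bytes:
    """Sketch of an Ethernet frame body: destination, source, type, padded data, CRC."""
    if len(payload) < 46:
        payload = payload + b"\x00" * (46 - len(payload))   # "stuff" the data field to 46 bytes
    header = dst_mac + src_mac + struct.pack("!H", eth_type)
    body = header + payload
    return body + struct.pack("!I", zlib.crc32(body))        # 4-byte CRC over the frame

frame = build_frame(b"\xbb" * 6, b"\xaa" * 6, ETH_TYPE_IPV4, b"tiny datagram")
assert len(frame) == 6 + 6 + 2 + 46 + 4   # minimum frame (excluding preamble): 64 bytes
dst, typ = frame[:6], struct.unpack("!H", frame[12:14])[0]
assert dst == b"\xbb" * 6 and typ == ETH_TYPE_IPV4
```

The receiver uses the type field to demultiplex: 0x0800 goes up to IP, 0x0806 up to ARP; the IP length field, not Ethernet, tells the receiver where the stuffing begins.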
Note that the type field is analogous to the protocol field in the network-layer datagram and the port-number fields in the transport-layer segment; all of these fields serve to glue a protocol at one layer to a protocol at the layer above. Cyclic redundancy check (CRC) (4 bytes). As discussed in Section 6.2.3, the purpose of the CRC field is to allow the receiving adapter, adapter B, to detect bit errors in the frame. Preamble (8 bytes). The Ethernet frame begins with an 8-byte preamble field. Each of the first 7 bytes of the preamble has a value of 10101010; the last byte is 10101011. The first 7 bytes of the preamble serve to "wake up" the receiving adapters and to synchronize their clocks to that of the sender's clock. Why should the clocks be out of synchronization? Keep in mind that adapter A aims to transmit the frame at 10 Mbps, 100 Mbps, or 1 Gbps, depending on the type of Ethernet LAN. However, because nothing is absolutely perfect, adapter A will not transmit the frame at exactly the target rate; there will always be some drift from the target rate, a drift which is not known a priori by the other adapters on the LAN. A receiving adapter can lock onto adapter A's clock simply by locking onto the bits in the first 7 bytes of the preamble. The last 2 bits of the eighth byte of the preamble (the first two consecutive 1s) alert adapter B that the "important stuff" is about to come. All of the Ethernet technologies provide connectionless service to the network layer. That is, when adapter A wants to send a datagram to adapter B, adapter A encapsulates the datagram in an Ethernet frame and sends the frame into the LAN, without first handshaking with adapter B. This layer-2 connectionless service is analogous to IP's layer-3 datagram service and UDP's layer-4 connectionless service. Ethernet technologies provide an unreliable service to the network layer.
Specifically, when adapter B receives a frame from adapter A, it runs the frame through a CRC check, but neither sends an acknowledgment when a frame passes the CRC check nor sends a negative acknowledgment when a frame fails the CRC check. When a frame fails the CRC check, adapter B simply discards the frame. Thus, adapter A has no idea whether its transmitted frame reached adapter B and passed the CRC check. This lack of reliable transport (at the link layer) helps to make Ethernet simple and cheap. But it also means that the stream of datagrams passed to the network layer can have gaps.

CASE HISTORY
BOB METCALFE AND ETHERNET

As a PhD student at Harvard University in the early 1970s, Bob Metcalfe worked on the ARPAnet at MIT. During his studies, he also became exposed to Abramson's work on ALOHA and random access protocols. After completing his PhD and just before beginning a job at Xerox Palo Alto Research Center (Xerox PARC), he visited Abramson and his University of Hawaii colleagues for three months, getting a firsthand look at ALOHAnet. At Xerox PARC, Metcalfe became exposed to Alto computers, which in many ways were the forerunners of the personal computers of the 1980s. Metcalfe saw the need to network these computers in an inexpensive manner. So armed with his knowledge about ARPAnet, ALOHAnet, and random access protocols, Metcalfe---along with colleague David Boggs---invented Ethernet. Metcalfe and Boggs's original Ethernet ran at 2.94 Mbps and linked up to 256 hosts separated by up to one mile. Metcalfe and Boggs succeeded at getting most of the researchers at Xerox PARC to communicate through their Alto computers. Metcalfe then forged an alliance between Xerox, Digital, and Intel to establish Ethernet as a 10 Mbps Ethernet standard, ratified by the IEEE. Xerox did not show much interest in commercializing Ethernet.
In 1979, Metcalfe formed his own company, 3Com, which developed and commercialized networking technology, including Ethernet technology. In particular, 3Com developed and marketed Ethernet cards in the early 1980s for the immensely popular IBM PCs.

If there are gaps due to discarded Ethernet frames, does the application at Host B see gaps as well? As we learned in Chapter 3, this depends on whether the application is using UDP or TCP. If the application is using UDP, then the application in Host B will indeed see gaps in the data. On the other hand, if the application is using TCP, then TCP in Host B will not acknowledge the data contained in discarded frames, causing TCP in Host A to retransmit. Note that when TCP retransmits data, the data will eventually return to the Ethernet adapter at which it was discarded. Thus, in this sense, Ethernet does retransmit data, although Ethernet is unaware of whether it is transmitting a brand-new datagram with brand-new data, or a datagram that contains data that has already been transmitted at least once.

Ethernet Technologies

In our discussion above, we've referred to Ethernet as if it were a single protocol standard. But in fact, Ethernet comes in many different flavors, with somewhat bewildering acronyms such as 10BASE-T, 10BASE-2, 100BASE-T, 1000BASE-LX, 10GBASE-T, and 40GBASE-T. These and many other Ethernet technologies have been standardized over the years by the IEEE 802.3 CSMA/CD (Ethernet) working group [IEEE 802.3 2012]. While these acronyms may appear bewildering, there is actually considerable order here. The first part of the acronym refers to the speed of the standard: 10, 100, 1000, 10G, or 40G, for 10 Megabit (per second), 100 Megabit, Gigabit, 10 Gigabit, and 40 Gigabit Ethernet, respectively. "BASE" refers to baseband Ethernet, meaning that the physical media only carries Ethernet traffic; almost all of the 802.3 standards are for baseband Ethernet.
The final part of the acronym refers to the physical media itself; Ethernet is both a link-layer and a physical-layer specification and is carried over a variety of physical media including coaxial cable, copper wire, and fiber. Generally, a "T" refers to twisted-pair copper wires. Historically, an Ethernet was initially conceived of as a segment of coaxial cable. The early 10BASE-2 and 10BASE-5 standards specify 10 Mbps Ethernet over two types of coaxial cable, each limited in length to 500 meters. Longer runs could be obtained by using a repeater---a physical-layer device that receives a signal on the input side, and regenerates the signal on the output side. A coaxial cable corresponds nicely to our view of Ethernet as a broadcast medium---all frames transmitted by one interface are received at other interfaces, and Ethernet's CSMA/CD protocol nicely solves the multiple access problem. Nodes simply attach to the cable, and voila, we have a local area network! Ethernet has passed through a series of evolutionary steps over the years, and today's Ethernet is very different from the original bus-topology designs using coaxial cable. In most installations today, nodes are connected to a switch via point-to-point segments made of twisted-pair copper wires or fiber-optic cables, as shown in Figures 6.15--6.17. In the mid-1990s, Ethernet was standardized at 100 Mbps, 10 times faster than 10 Mbps Ethernet. The original Ethernet MAC protocol and frame format were preserved, but higher-speed physical layers were defined for copper wire (100BASE-T) and fiber (100BASE-FX, 100BASE-SX, 100BASE-BX). Figure 6.21 shows these different standards and the common Ethernet MAC protocol and frame format.
Figure 6.21 100 Mbps Ethernet standards: A common link layer, different physical layers

100 Mbps Ethernet is limited to a 100-meter distance over twisted pair, and to several kilometers over fiber, allowing Ethernet switches in different buildings to be connected. Gigabit Ethernet is an extension to the highly successful 10 Mbps and 100 Mbps Ethernet standards. Offering a raw data rate of 40,000 Mbps, 40 Gigabit Ethernet maintains full compatibility with the huge installed base of Ethernet equipment. The standard for Gigabit Ethernet, referred to as IEEE 802.3z, does the following:

- Uses the standard Ethernet frame format (Figure 6.20) and is backward compatible with 10BASE-T and 100BASE-T technologies. This allows for easy integration of Gigabit Ethernet with the existing installed base of Ethernet equipment.
- Allows for point-to-point links as well as shared broadcast channels. Point-to-point links use switches while broadcast channels use hubs, as described earlier. In Gigabit Ethernet jargon, hubs are called buffered distributors.
- Uses CSMA/CD for shared broadcast channels. In order to have acceptable efficiency, the maximum distance between nodes must be severely restricted.
- Allows for full-duplex operation at 40 Gbps in both directions for point-to-point channels.

Initially operating over optical fiber, Gigabit Ethernet is now able to run over category 5 UTP cabling. Let's conclude our discussion of Ethernet technology by posing a question that may have begun troubling you. In the days of bus topologies and hub-based star topologies, Ethernet was clearly a broadcast link (as defined in Section 6.3) in which frame collisions occurred when nodes transmitted at the same time. To deal with these collisions, the Ethernet standard included the CSMA/CD protocol, which is particularly effective for a wired broadcast LAN spanning a small geographical region.
But if the prevalent use of Ethernet today is a switch-based star topology, using store-and-forward packet switching, is there really a need anymore for an Ethernet MAC protocol? As we'll see shortly, a switch coordinates its transmissions and never forwards more than one frame onto the same interface at any time. Furthermore, modern switches are full-duplex, so that a switch and a node can each send frames to each other at the same time without interference. In other words, in a switch-based Ethernet LAN there are no collisions and, therefore, there is no need for a MAC protocol! As we've seen, today's Ethernets are very different from the original Ethernet conceived by Metcalfe and Boggs more than 30 years ago---speeds have increased by three orders of magnitude, Ethernet frames are carried over a variety of media, switched Ethernets have become dominant, and now even the MAC protocol is often unnecessary! Is all of this really still Ethernet? The answer, of course, is "yes, by definition." It is interesting to note, however, that through all of these changes, there has indeed been one enduring constant that has remained unchanged over 30 years---Ethernet's frame format. Perhaps this then is the one true and timeless centerpiece of the Ethernet standard.

6.4.3 Link-Layer Switches

Up until this point, we have been purposefully vague about what a switch actually does and how it works. The role of the switch is to receive incoming link-layer frames and forward them onto outgoing links; we'll study this forwarding function in detail in this subsection. We'll see that the switch itself is transparent to the hosts and routers in the subnet; that is, a host/router addresses a frame to another host/router (rather than addressing the frame to the switch) and happily sends the frame into the LAN, unaware that a switch will be receiving the frame and forwarding it.
The rate at which frames arrive to any one of the switch's output interfaces may temporarily exceed the link capacity of that interface. To accommodate this problem, switch output interfaces have buffers, in much the same way that router output interfaces have buffers for datagrams. Let's now take a closer look at how switches operate.

Forwarding and Filtering

Filtering is the switch function that determines whether a frame should be forwarded to some interface or should just be dropped. Forwarding is the switch function that determines the interfaces to which a frame should be directed, and then moves the frame to those interfaces. Switch filtering and forwarding are done with a switch table. The switch table contains entries for some, but not necessarily all, of the hosts and routers on a LAN. An entry in the switch table contains (1) a MAC address, (2) the switch interface that leads toward that MAC address, and (3) the time at which the entry was placed in the table. An example switch table for the uppermost switch in Figure 6.15 is shown in Figure 6.22.

Figure 6.22 Portion of a switch table for the uppermost switch in Figure 6.15

This description of frame forwarding may sound similar to our discussion of datagram forwarding in Chapter 4. Indeed, in our discussion of generalized forwarding in Section 4.4, we learned that many modern packet switches can be configured to forward on the basis of layer-2 destination MAC addresses (i.e., function as a layer-2 switch) or layer-3 IP destination addresses (i.e., function as a layer-3 router). Nonetheless, we'll make the important distinction that switches forward packets based on MAC addresses rather than on IP addresses. We will also see that a traditional (i.e., in a non-SDN context) switch table is constructed in a very different manner from a router's forwarding table.
To understand how switch filtering and forwarding work, suppose a frame with destination address DD-DD-DD-DD-DD-DD arrives at the switch on interface x. The switch indexes its table with the MAC address DD-DD-DD-DD-DD-DD. There are three possible cases:

- There is no entry in the table for DD-DD-DD-DD-DD-DD. In this case, the switch forwards copies of the frame to the output buffers preceding all interfaces except for interface x. In other words, if there is no entry for the destination address, the switch broadcasts the frame.
- There is an entry in the table, associating DD-DD-DD-DD-DD-DD with interface x. In this case, the frame is coming from a LAN segment that contains adapter DD-DD-DD-DD-DD-DD. There being no need to forward the frame to any of the other interfaces, the switch performs the filtering function by discarding the frame.
- There is an entry in the table, associating DD-DD-DD-DD-DD-DD with interface y≠x. In this case, the frame needs to be forwarded to the LAN segment attached to interface y. The switch performs its forwarding function by putting the frame in an output buffer that precedes interface y.

Let's walk through these rules for the uppermost switch in Figure 6.15 and its switch table in Figure 6.22. Suppose that a frame with destination address 62-FE-F7-11-89-A3 arrives at the switch from interface 1. The switch examines its table and sees that the destination is on the LAN segment connected to interface 1 (that is, Electrical Engineering). This means that the frame has already been broadcast on the LAN segment that contains the destination. The switch therefore filters (that is, discards) the frame. Now suppose a frame with the same destination address arrives from interface 2. The switch again examines its table and sees that the destination is in the direction of interface 1; it therefore forwards the frame to the output buffer preceding interface 1.
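The three cases above can be sketched in a few lines (an illustrative sketch, not a real switch's data structures; the table maps a MAC address to an (interface, time) pair as described above):

```python
def handle_frame(switch_table, dest_mac, arrival_interface, all_interfaces):
    """Return the list of interfaces the frame is sent out on."""
    entry = switch_table.get(dest_mac)
    if entry is None:
        # Case 1: unknown destination -- broadcast on every interface but the
        # arriving one.
        return [i for i in all_interfaces if i != arrival_interface]
    table_interface, _when = entry
    if table_interface == arrival_interface:
        # Case 2: destination lies on the arriving segment -- filter (drop).
        return []
    # Case 3: forward toward the segment recorded for this destination.
    return [table_interface]
```

Applied to the walkthrough above: a frame for 62-FE-F7-11-89-A3 arriving on interface 1 yields an empty list (filtered), while the same frame arriving on interface 2 yields interface 1 only.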
It should be clear from this example that as long as the switch table is complete and accurate, the switch forwards frames toward destinations without any broadcasting. In this sense, a switch is "smarter" than a hub. But how does this switch table get configured in the first place? Are there link-layer equivalents to network-layer routing protocols? Or must an overworked manager manually configure the switch table?

Self-Learning

A switch has the wonderful property (particularly for the already-overworked network administrator) that its table is built automatically, dynamically, and autonomously---without any intervention from a network administrator or from a configuration protocol. In other words, switches are self-learning. This capability is accomplished as follows:

1. The switch table is initially empty.
2. For each incoming frame received on an interface, the switch stores in its table (1) the MAC address in the frame's source address field, (2) the interface from which the frame arrived, and (3) the current time. In this manner the switch records in its table the LAN segment on which the sender resides. If every host in the LAN eventually sends a frame, then every host will eventually get recorded in the table.
3. The switch deletes an address in the table if no frames are received with that address as the source address after some period of time (the aging time). In this manner, if a PC is replaced by another PC (with a different adapter), the MAC address of the original PC will eventually be purged from the switch table.

Let's walk through the self-learning property for the uppermost switch in Figure 6.15 and its corresponding switch table in Figure 6.22. Suppose at time 9:39 a frame with source address 01-12-23-34-45-56 arrives from interface 2. Suppose that this address is not in the switch table. Then the switch adds a new entry to the table, as shown in Figure 6.23. Continuing with this same example, suppose that the aging time for this switch is 60 minutes, and no frames with source address 62-FE-F7-11-89-A3 arrive to the switch between 9:32 and 10:32. Then at time 10:32, the switch removes this address from its table.

Figure 6.23 Switch learns about the location of an adapter with address 01-12-23-34-45-56

Switches are plug-and-play devices because they require no intervention from a network administrator or user. A network administrator wanting to install a switch need do nothing more than connect the LAN segments to the switch interfaces. The administrator need not configure the switch tables at the time of installation or when a host is removed from one of the LAN segments. Switches are also full-duplex, meaning any switch interface can send and receive at the same time.

Properties of Link-Layer Switching

Having described the basic operation of a link-layer switch, let's now consider its features and properties. We can identify several advantages of using switches, rather than broadcast links such as buses or hub-based star topologies:

Elimination of collisions. In a LAN built from switches (and without hubs), there is no wasted bandwidth due to collisions! The switches buffer frames and never transmit more than one frame on a segment at any one time. As with a router, the maximum aggregate throughput of a switch is the sum of all the switch interface rates. Thus, switches provide a significant performance improvement over LANs with broadcast links.

Heterogeneous links. Because a switch isolates one link from another, the different links in the LAN can operate at different speeds and can run over different media. For example, the uppermost switch in Figure 6.15 might have three 1 Gbps 1000BASE-T copper links, two 100 Mbps 100BASE-FX fiber links, and one 100BASE-T copper link.
Thus, a switch is ideal for mixing legacy equipment with new equipment.

Management. In addition to providing enhanced security (see sidebar on Focus on Security), a switch also eases network management. For example, if an adapter malfunctions and continually sends Ethernet frames (called a jabbering adapter), a switch can detect the problem and internally disconnect the malfunctioning adapter. With this feature, the network administrator need not get out of bed and drive back to work in order to correct the problem. Similarly, a cable cut disconnects only the host that was using the cut cable to connect to the switch. In the days of coaxial cable, many a network manager spent hours "walking the line" (or more accurately, "crawling the floor") to find the cable break that brought down the entire network. Switches also gather statistics on bandwidth usage, collision rates, and traffic types, and make this information available to the network manager. This information can be used to debug and correct problems, and to plan how the LAN should evolve in the future. Researchers are exploring adding yet more management functionality into Ethernet LANs in prototype deployments \[Casado 2007; Koponen 2011\].

FOCUS ON SECURITY

SNIFFING A SWITCHED LAN: SWITCH POISONING

When a host is connected to a switch, it typically only receives frames that are intended for it. For example, consider the switched LAN in Figure 6.17. When host A sends a frame to host B, and there is an entry for host B in the switch table, then the switch will forward the frame only to host B. If host C happens to be running a sniffer, host C will not be able to sniff this A-to-B frame. Thus, in a switched-LAN environment (in contrast to a broadcast link environment such as 802.11 LANs or hub-based Ethernet LANs), it is more difficult for an attacker to sniff frames.
However, because the switch broadcasts frames that have destination addresses that are not in the switch table, the sniffer at C can still sniff some frames that are not intended for C. Furthermore, a sniffer will be able to sniff all Ethernet broadcast frames with broadcast destination address FF-FF-FF-FF-FF-FF. A well-known attack against a switch, called switch poisoning, is to send tons of packets to the switch with many different bogus source MAC addresses, thereby filling the switch table with bogus entries and leaving no room for the MAC addresses of the legitimate hosts. This causes the switch to broadcast most frames, which can then be picked up by the sniffer \[Skoudis 2006\]. As this attack is rather involved even for a sophisticated attacker, switches are significantly less vulnerable to sniffing than are hubs and wireless LANs.

Switches Versus Routers

As we learned in Chapter 4, routers are store-and-forward packet switches that forward packets using network-layer addresses. Although a switch is also a store-and-forward packet switch, it is fundamentally different from a router in that it forwards packets using MAC addresses. Whereas a router is a layer-3 packet switch, a switch is a layer-2 packet switch. Recall, however, that we learned in Section 4.4 that modern switches using the "match plus action" operation can be used to forward a layer-2 frame based on the frame's destination MAC address, as well as a layer-3 datagram using the datagram's destination IP address. Indeed, we saw that switches using the OpenFlow standard can perform generalized packet forwarding based on any of eleven different frame, datagram, and transport-layer header fields.

Even though switches and routers are fundamentally different, network administrators must often choose between them when installing an interconnection device.
For example, for the network in Figure 6.15, the network administrator could just as easily have used a router instead of a switch to connect the department LANs, servers, and Internet gateway router. Indeed, a router would permit interdepartmental communication without creating collisions. Given that both switches and routers are candidates for interconnection devices, what are the pros and cons of the two approaches?

Figure 6.24 Packet processing in switches, routers, and hosts

First consider the pros and cons of switches. As mentioned above, switches are plug-and-play, a property that is cherished by all the overworked network administrators of the world. Switches can also have relatively high filtering and forwarding rates---as shown in Figure 6.24, switches have to process frames only up through layer 2, whereas routers have to process datagrams up through layer 3. On the other hand, to prevent the cycling of broadcast frames, the active topology of a switched network is restricted to a spanning tree. Also, a large switched network would require large ARP tables in the hosts and routers and would generate substantial ARP traffic and processing. Furthermore, switches are susceptible to broadcast storms---if one host goes haywire and transmits an endless stream of Ethernet broadcast frames, the switches will forward all of these frames, causing the entire network to collapse. Now consider the pros and cons of routers. Because network addressing is often hierarchical (and not flat, as is MAC addressing), packets do not normally cycle through routers even when the network has redundant paths. (However, packets can cycle when router tables are misconfigured; but as we learned in Chapter 4, IP uses a special datagram header field to limit the cycling.) Thus, packets are not restricted to a spanning tree and can use the best path between source and destination.
Because routers do not have the spanning tree restriction, they have allowed the Internet to be built with a rich topology that includes, for example, multiple active links between Europe and North America. Another feature of routers is that they provide firewall protection against layer-2 broadcast storms. Perhaps the most significant drawback of routers, though, is that they are not plug-and-play---they and the hosts that connect to them need their IP addresses to be configured. Also, routers often have a larger per-packet processing time than switches, because they have to process up through the layer-3 fields. Finally, there are two different ways to pronounce the word router, either as "rooter" or as "rowter," and people waste a lot of time arguing over the proper pronunciation \[Perlman 1999\]. Given that both switches and routers have their pros and cons (as summarized in Table 6.1), when should an institutional network (for example, a university campus network or a corporate campus network) use switches, and when should it use routers?

Table 6.1 Comparison of the typical features of popular interconnection devices

|                   | Hubs | Routers | Switches |
|-------------------|------|---------|----------|
| Traffic isolation | No   | Yes     | Yes      |
| Plug and play     | Yes  | No      | Yes      |
| Optimal routing   | No   | Yes     | No       |

Typically, small networks consisting of a few hundred hosts have a few LAN segments. Switches suffice for these small networks, as they localize traffic and increase aggregate throughput without requiring any configuration of IP addresses. But larger networks consisting of thousands of hosts typically include routers within the network (in addition to switches). The routers provide a more robust isolation of traffic, control broadcast storms, and use more "intelligent" routes among the hosts in the network.
For more discussion of the pros and cons of switched versus routed networks, as well as a discussion of how switched LAN technology can be extended to accommodate two orders of magnitude more hosts than today's Ethernets, see \[Meyers 2004; Kim 2008\].

6.4.4 Virtual Local Area Networks (VLANs)

In our earlier discussion of Figure 6.15, we noted that modern institutional LANs are often configured hierarchically, with each workgroup (department) having its own switched LAN connected to the switched LANs of other groups via a switch hierarchy. While such a configuration works well in an ideal world, the real world is often far from ideal. Three drawbacks can be identified in the configuration in Figure 6.15:

Lack of traffic isolation. Although the hierarchy localizes group traffic to within a single switch, broadcast traffic (e.g., frames carrying ARP and DHCP messages or frames whose destination has not yet been learned by a self-learning switch) must still traverse the entire institutional network. Limiting the scope of such broadcast traffic would improve LAN performance. Perhaps more importantly, it also may be desirable to limit LAN broadcast traffic for security/privacy reasons. For example, if one group contains the company's executive management team and another group contains disgruntled employees running Wireshark packet sniffers, the network manager may well prefer that the executives' traffic never even reaches employee hosts. This type of isolation could be provided by replacing the center switch in Figure 6.15 with a router. We'll see shortly that this isolation also can be achieved via a switched (layer 2) solution.

Inefficient use of switches. If instead of three groups, the institution had 10 groups, then 10 first-level switches would be required.
If each group were small, say less than 10 people, then a single 96-port switch would likely be large enough to accommodate everyone, but this single switch would not provide traffic isolation.

Managing users. If an employee moves between groups, the physical cabling must be changed to connect the employee to a different switch in Figure 6.15. Employees belonging to two groups make the problem even harder.

Fortunately, each of these difficulties can be handled by a switch that supports virtual local area networks (VLANs). As the name suggests, a switch that supports VLANs allows multiple virtual local area networks to be defined over a single physical local area network infrastructure. Hosts within a VLAN communicate with each other as if they (and no other hosts) were connected to the switch. In a port-based VLAN, the switch's ports (interfaces) are divided into groups by the network manager. Each group constitutes a VLAN, with the ports in each VLAN forming a broadcast domain (i.e., broadcast traffic from one port can only reach other ports in the group). Figure 6.25 shows a single switch with 16 ports. Ports 2 to 8 belong to the EE VLAN, while ports 9 to 15 belong to the CS VLAN (ports 1 and 16 are unassigned). This VLAN solves all of the difficulties noted above---EE and CS VLAN frames are isolated from each other, the two switches in Figure 6.15 have been replaced by a single switch, and if the user at switch port 8 joins the CS Department, the network operator simply reconfigures the VLAN software so that port 8 is now associated with the CS VLAN.
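The port-based VLAN of Figure 6.25 can be sketched as a port-to-VLAN table that confines broadcasts to the sender's group (an illustrative sketch of the mapping, not switch hardware):

```python
# Port-to-VLAN assignments from Figure 6.25; ports 1 and 16 are unassigned.
port_to_vlan = {**{p: "EE" for p in range(2, 9)},    # ports 2-8: EE VLAN
                **{p: "CS" for p in range(9, 16)}}   # ports 9-15: CS VLAN

def broadcast_ports(ingress_port):
    """Ports to which a broadcast frame arriving on ingress_port is delivered."""
    vlan = port_to_vlan.get(ingress_port)
    return sorted(p for p, v in port_to_vlan.items()
                  if v == vlan and p != ingress_port)
```

Moving the port-8 user to the CS Department is then a one-entry change to `port_to_vlan`, with no recabling.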
One can easily imagine how the VLAN switch is configured and operates---the network manager declares a port to belong to a given VLAN (with undeclared ports belonging to a default VLAN) using switch management software, a table of port-to-VLAN mappings is maintained within the switch, and switch hardware only delivers frames between ports belonging to the same VLAN.

Figure 6.25 A single switch with two configured VLANs

But by completely isolating the two VLANs, we have introduced a new difficulty! How can traffic from the EE Department be sent to the CS Department? One way to handle this would be to connect a VLAN switch port (e.g., port 1 in Figure 6.25) to an external router and configure that port to belong to both the EE and CS VLANs. In this case, even though the EE and CS departments share the same physical switch, the logical configuration would look as if the EE and CS departments had separate switches connected via a router. An IP datagram going from the EE to the CS department would first cross the EE VLAN to reach the router and then be forwarded by the router back over the CS VLAN to the CS host. Fortunately, switch vendors make such configurations easy for the network manager by building a single device that contains both a VLAN switch and a router, so a separate external router is not needed. A homework problem at the end of the chapter explores this scenario in more detail. Returning again to Figure 6.15, let's now suppose that rather than having a separate Computer Engineering department, some EE and CS faculty are housed in a separate building, where (of course!) they need network access, and (of course!) they'd like to be part of their department's VLAN. Figure 6.26 shows a second 8-port switch, where the switch ports have been defined as belonging to the EE or the CS VLAN, as needed. But how should these two switches be interconnected?
One easy solution would be to define a port belonging to the CS VLAN on each switch (similarly for the EE VLAN) and to connect these ports to each other, as shown in Figure 6.26(a). This solution doesn't scale, however, since N VLANs would require N ports on each switch simply to interconnect the two switches. A more scalable approach to interconnecting VLAN switches is known as VLAN trunking. In the VLAN trunking approach shown in Figure 6.26(b), a special port on each switch (port 16 on the left switch and port 1 on the right switch) is configured as a trunk port to interconnect the two VLAN switches. The trunk port belongs to all VLANs, and frames sent to any VLAN are forwarded over the trunk link to the other switch. But this raises yet another question: How does a switch know that a frame arriving on a trunk port belongs to a particular VLAN? The IEEE has defined an extended Ethernet frame format, 802.1Q, for frames crossing a VLAN trunk. As shown in Figure 6.27, the 802.1Q frame consists of the standard Ethernet frame with a four-byte VLAN tag added into the header that carries the identity of the VLAN to which the frame belongs. The VLAN tag is added into a frame by the switch at the sending side of a VLAN trunk, and parsed and removed by the switch at the receiving side of the trunk. The VLAN tag itself consists of a 2-byte Tag Protocol Identifier (TPID) field (with a fixed hexadecimal value of 81-00) and a 2-byte Tag Control Information field that contains a 12-bit VLAN identifier field and a 3-bit priority field that is similar in intent to the IP datagram TOS field.

Figure 6.26 Connecting two VLAN switches with two VLANs: (a) two cables (b) trunked

Figure 6.27 Original Ethernet frame (top), 802.1Q-tagged Ethernet VLAN frame (below)

In this discussion, we've only briefly touched on VLANs and have focused on port-based VLANs. We should also mention that VLANs can be defined in several other ways.
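Returning to the trunking discussion above, the four-byte 802.1Q tag can be sketched with network-byte-order packing (an illustrative sketch; the one Tag Control Information bit not named above, the drop-eligible bit, is simply left zero here):

```python
import struct

TPID = 0x8100  # the fixed 81-00 Tag Protocol Identifier

def make_vlan_tag(vlan_id, priority=0):
    """Build the 4-byte tag: 2-byte TPID, then 2-byte Tag Control Information."""
    assert 0 <= vlan_id < 4096 and 0 <= priority < 8
    tci = (priority << 13) | vlan_id   # 3-bit priority, 12-bit VLAN identifier
    return struct.pack("!HH", TPID, tci)

def parse_vlan_tag(tag):
    """Recover (priority, vlan_id) from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    assert tpid == TPID
    return tci >> 13, tci & 0x0FFF
```

The sending-side trunk switch would build and insert such a tag; the receiving-side switch would parse it, deliver the frame to ports of that VLAN, and strip the tag.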
In MAC-based VLANs, the network manager specifies the set of MAC addresses that belong to each VLAN; whenever a device attaches to a port, the port is connected into the appropriate VLAN based on the MAC address of the device. VLANs can also be defined based on network-layer protocols (e.g., IPv4, IPv6, or AppleTalk) and other criteria. It is also possible for VLANs to be extended across IP routers, allowing islands of LANs to be connected together to form a single VLAN that could span the globe \[Yu 2011\]. See the 802.1Q standard \[IEEE 802.1q 2005\] for more details.

6.5 Link Virtualization: A Network as a Link Layer

Because this chapter concerns link-layer protocols, and given that we're now nearing the chapter's end, let's reflect on how our understanding of the term link has evolved. We began this chapter by viewing the link as a physical wire connecting two communicating hosts. In studying multiple access protocols, we saw that multiple hosts could be connected by a shared wire and that the "wire" connecting the hosts could be radio spectra or other media. This led us to consider the link a bit more abstractly as a channel, rather than as a wire. In our study of Ethernet LANs (Figure 6.15) we saw that the interconnecting media could actually be a rather complex switched infrastructure. Throughout this evolution, however, the hosts themselves maintained the view that the interconnecting medium was simply a link-layer channel connecting two or more hosts. We saw, for example, that an Ethernet host can be blissfully unaware of whether it is connected to other LAN hosts by a single short LAN segment (Figure 6.17) or by a geographically dispersed switched LAN (Figure 6.15) or by a VLAN (Figure 6.26).
In the case of a dial-up modem connection between two hosts, the link connecting the two hosts is actually the telephone network---a logically separate, global telecommunications network with its own switches, links, and protocol stacks for data transfer and signaling. From the Internet link-layer point of view, however, the dial-up connection through the telephone network is viewed as a simple "wire." In this sense, the Internet virtualizes the telephone network, viewing the telephone network as a link-layer technology providing link-layer connectivity between two Internet hosts. You may recall from our discussion of overlay networks in Chapter 2 that an overlay network similarly views the Internet as a means for providing connectivity between overlay nodes, seeking to overlay the Internet in the same way that the Internet overlays the telephone network. In this section, we'll consider Multiprotocol Label Switching (MPLS) networks. Unlike the circuit-switched telephone network, MPLS is a packet-switched, virtual-circuit network in its own right. It has its own packet formats and forwarding behaviors. Thus, from a pedagogical viewpoint, a discussion of MPLS fits well into a study of either the network layer or the link layer. From an Internet viewpoint, however, we can consider MPLS, like the telephone network and switched-Ethernets, as a link-layer technology that serves to interconnect IP devices. Thus, we'll consider MPLS in our discussion of the link layer. Frame-relay and ATM networks can also be used to interconnect IP devices, though they represent a slightly older (but still deployed) technology and will not be covered here; see the very readable book \[Goralski 1999\] for details. Our treatment of MPLS will be necessarily brief, as entire books could be (and have been) written on these networks. We recommend \[Davie 2000\] for details on MPLS.
We'll focus here primarily on how MPLS serves to interconnect IP devices, although we'll dive a bit deeper into the underlying technologies as well.

6.5.1 Multiprotocol Label Switching (MPLS)

Multiprotocol Label Switching (MPLS) evolved from a number of industry efforts in the mid-to-late 1990s to improve the forwarding speed of IP routers by adopting a key concept from the world of virtual-circuit networks: a fixed-length label. The goal was not to abandon the destination-based IP datagram-forwarding infrastructure for one based on fixed-length labels and virtual circuits, but to augment it by selectively labeling datagrams and allowing routers to forward datagrams based on fixed-length labels (rather than destination IP addresses) when possible. Importantly, these techniques work hand-in-hand with IP, using IP addressing and routing. The IETF unified these efforts in the MPLS protocol \[RFC 3031, RFC 3032\], effectively blending VC techniques into a routed datagram network. Let's begin our study of MPLS by considering the format of a link-layer frame that is handled by an MPLS-capable router. Figure 6.28 shows that a link-layer frame transmitted between MPLS-capable devices has a small MPLS header added between the layer-2 (e.g., Ethernet) header and layer-3 (i.e., IP) header. RFC 3032 defines the format of the MPLS header for such links; headers are defined for ATM and frame-relay networks as well in other RFCs.

Figure 6.28 MPLS header: Located between link- and network-layer headers

Among the fields in the MPLS header are the label, 3 bits reserved for experimental use, a single S bit, which is used to indicate the end of a series of "stacked" MPLS headers (an advanced topic that we'll not cover here), and a time-to-live field.
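As a sketch, the 32-bit MPLS header can be packed from the fields just listed, using the RFC 3032 field widths (a 20-bit label, 3 experimental bits, 1 S bit, and an 8-bit TTL):

```python
import struct

def make_mpls_header(label, exp=0, s=1, ttl=64):
    """Pack the 32-bit MPLS header: label | exp | S | TTL (RFC 3032 layout)."""
    assert 0 <= label < 1 << 20 and 0 <= exp < 8 and s in (0, 1) and 0 <= ttl < 256
    word = (label << 12) | (exp << 9) | (s << 8) | ttl
    return struct.pack("!I", word)
```

For example, a header carrying label 6 (the label R1 advertises for destination A in the example that follows) with S set and a TTL of 64 packs to the bytes 00 00 61 40.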
It's immediately evident from Figure 6.28 that an MPLS-enhanced frame can only be sent between routers that are both MPLS capable (since a non-MPLS-capable router would be quite confused when it found an MPLS header where it had expected to find the IP header!). An MPLS-capable router is often referred to as a label-switched router, since it forwards an MPLS frame by looking up the MPLS label in its forwarding table and then immediately passing the datagram to the appropriate output interface. Thus, the MPLS-capable router need not extract the destination IP address and perform a lookup of the longest prefix match in the forwarding table. But how does a router know if its neighbor is indeed MPLS capable, and how does a router know what label to associate with the given IP destination? To answer these questions, we'll need to take a look at the interaction among a group of MPLS-capable routers.

In the example in Figure 6.29, routers R1 through R4 are MPLS capable. R5 and R6 are standard IP routers. R1 has advertised to R2 and R3 that it (R1) can route to destination A, and that a received frame with MPLS label 6 will be forwarded to destination A. Router R3 has advertised to router R4 that it can route to destinations A and D, and that incoming frames with MPLS labels 10 and 12, respectively, will be switched toward those destinations. Router R2 has also advertised to router R4 that it (R2) can reach destination A, and that a received frame with MPLS label 8 will be switched toward A.

Figure 6.29 MPLS-enhanced forwarding

Note that router R4 is now in the interesting position of having two MPLS paths to reach A: via interface 0 with outbound MPLS label 10, and via interface 1 with an MPLS label of 8.
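R4's situation can be sketched as a pure label-table lookup, with no longest-prefix IP match (an illustrative sketch; the incoming label 17 is an assumption, since the text does not say which label R4 itself advertises upstream):

```python
def lsr_forward(label_table, in_label):
    """Return (output interface, outgoing label) for an arriving MPLS frame."""
    out_interface, out_label = label_table[in_label]
    return out_interface, out_label

# R4's two candidate entries toward destination A from Figure 6.29:
r4_choices = {
    "via R3": {17: (0, 10)},   # interface 0, outbound label 10
    "via R2": {17: (1, 8)},    # interface 1, outbound label 8
}
```

Which of the two entries R4 installs is a traffic-management decision, a point the discussion returns to below.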
The broad picture painted in Figure 6.29 is that IP devices R5, R6, A, and D are connected together via an MPLS infrastructure (MPLS-capable routers R1, R2, R3, and R4) in much the same way that a switched LAN or an ATM network can connect together IP devices. And like a switched LAN or ATM network, the MPLS-capable routers R1 through R4 do so without ever touching the IP header of a packet. In our discussion above, we've not specified the specific protocol used to distribute labels among the MPLS-capable routers, as the details of this signaling are well beyond the scope of this book. We note, however, that the IETF working group on MPLS has specified in \[RFC 3468\] that an extension of the RSVP protocol, known as RSVP-TE \[RFC 3209\], will be the focus of its efforts for MPLS signaling. We've also not discussed how MPLS actually computes the paths for packets among MPLS-capable routers, nor how it gathers link-state information (e.g., amount of link bandwidth unreserved by MPLS) to use in these path computations. Existing link-state routing algorithms (e.g., OSPF) have been extended to flood this information to MPLS-capable routers. Interestingly, the actual path computation algorithms are not standardized, and are currently vendor-specific. Thus far, the emphasis of our discussion of MPLS has been on the fact that MPLS performs switching based on labels, without needing to consider the IP address of a packet. The true advantages of MPLS and the reason for current interest in MPLS, however, lie not in the potential increases in switching speeds, but rather in the new traffic management capabilities that MPLS enables. As noted above, R4 has two MPLS paths to A. If forwarding were performed up at the IP layer on the basis of IP address, the IP routing protocols we studied in Chapter 5 would specify only a single, least-cost path to A.
Thus, MPLS provides the ability to forward packets along routes that would not be possible using standard IP routing protocols. This is one simple form of traffic engineering using MPLS \[RFC 3346; RFC 3272; RFC 2702; Xiao 2000\], in which a network operator can override normal IP routing and force some of the traffic headed toward a given destination along one path, and other traffic destined toward the same destination along another path (whether for policy, performance, or some other reason). MPLS can be used for many other purposes as well. It can be used to perform fast restoration of MPLS forwarding paths, e.g., to reroute traffic over a precomputed failover path in response to link failure \[Kar 2000; Huang 2002; RFC 3469\]. Finally, we note that MPLS can be, and has been, used to implement so-called virtual private networks (VPNs). In implementing a VPN for a customer, an ISP uses its MPLS-enabled network to connect together the customer's various networks. MPLS can be used to isolate both the resources and addressing used by the customer's VPN from that of other users crossing the ISP's network; see \[DeClercq 2002\] for details. Our discussion of MPLS has been brief, and we encourage you to consult the references we've mentioned. We note that with so many possible uses for MPLS, it appears that it is rapidly becoming the Swiss Army knife of Internet traffic engineering!

6.6 Data Center Networking

In recent years, Internet companies such as Google, Microsoft, Facebook, and Amazon (as well as their counterparts in Asia and Europe) have built massive data centers, each housing tens to hundreds of thousands of hosts, and concurrently supporting many distinct cloud applications (e.g., search, e-mail, social networking, and e-commerce). Each data center has its own data center network that interconnects its hosts with each other and interconnects the data center with the Internet.
In this section, we provide a brief introduction to data center networking for cloud applications.

The cost of a large data center is huge, exceeding \$12 million per month for a 100,000-host data center \[Greenberg 2009a\]. Of these costs, about 45 percent can be attributed to the hosts themselves (which need to be replaced every 3--4 years); 25 percent to infrastructure, including transformers, uninterruptible power supply (UPS) systems, generators for long-term outages, and cooling systems; 15 percent for electric utility costs for the power draw; and 15 percent for networking, including network gear (switches, routers, and load balancers), external links, and transit traffic costs. (In these percentages, costs for equipment are amortized so that a common cost metric is applied for one-time purchases and ongoing expenses such as power.) While networking is not the largest cost, networking innovation is the key to reducing overall cost and maximizing performance \[Greenberg 2009a\].

The worker bees in a data center are the hosts: They serve content (e.g., Web pages and videos), store e-mails and documents, and collectively perform massively distributed computations (e.g., distributed index computations for search engines). The hosts in data centers, called blades and resembling pizza boxes, are generally commodity hosts that include CPU, memory, and disk storage. The hosts are stacked in racks, with each rack typically having 20 to 40 blades. At the top of each rack there is a switch, aptly named the Top of Rack (TOR) switch, that interconnects the hosts in the rack with each other and with other switches in the data center. Specifically, each host in the rack has a network interface card that connects to its TOR switch, and each TOR switch has additional ports that can be connected to other switches. Today hosts typically have 40 Gbps Ethernet connections to their TOR switches \[Greenberg 2015\].
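The cost breakdown cited above can be turned into per-category dollar figures with a quick calculation; the dollar amounts below are simple arithmetic from the \$12 million/month total and the stated percentages, not separately sourced numbers.

```python
# Rough monthly cost breakdown for a ~100,000-host data center, using the
# ~$12 million/month figure and percentage shares cited in the text.
monthly_total = 12_000_000  # dollars per month
shares = {
    "hosts (replaced every 3-4 years)": 0.45,
    "infrastructure (UPS, generators, cooling)": 0.25,
    "electric utility (power draw)": 0.15,
    "networking (gear, external links, transit)": 0.15,
}

# Dollar amount per category; shares sum to 100 percent of the total.
breakdown = {category: monthly_total * share for category, share in shares.items()}
for category, dollars in breakdown.items():
    print(f"{category}: ${dollars:,.0f} per month")
```

Run as written, this shows hosts dominating at about \$5.4 million per month, with networking at roughly \$1.8 million, which is why the text stresses that networking innovation matters for overall cost even though networking is not the largest line item.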
Each host is also assigned its own data-center-internal IP address. The data center network supports two types of traffic: traffic flowing between external clients and internal hosts and traffic flowing between internal hosts. To handle flows between external clients and internal hosts, the data center network includes one or more border routers, connecting the data center network to the public Internet. The data center network therefore interconnects the racks with each other and connects the racks to the border routers. Figure 6.30 shows an example of a data center network. Data center network design, the art of designing the interconnection network and protocols that connect the racks with each other and with the border routers, has become an important branch of computer networking research in recent years \[Al-Fares 2008; Greenberg 2009a; Greenberg 2009b; Mysore 2009; Guo 2009; Wang 2010\].

Figure 6.30 A data center network with a hierarchical topology

Load Balancing

A cloud data center, such as a Google or Microsoft data center, provides many applications concurrently, such as search, e-mail, and video applications. To support requests from external clients, each application is associated with a publicly visible IP address to which clients send their requests and from which they receive responses. Inside the data center, the external requests are first directed to a load balancer whose job it is to distribute requests to the hosts, balancing the load across the hosts as a function of their current load. A large data center will often have several load balancers, each one devoted to a set of specific cloud applications. Such a load balancer is sometimes referred to as a "layer-4 switch" since it makes decisions based on the destination port number (layer 4) as well as destination IP address in the packet.
Upon receiving a request for a particular application, the load balancer forwards it to one of the hosts that handles the application. (A host may then invoke the services of other hosts to help process the request.) When the host finishes processing the request, it sends its response back to the load balancer, which in turn relays the response back to the external client. The load balancer not only balances the work load across hosts, but also provides a NAT-like function, translating the public external IP address to the internal IP address of the appropriate host, and then translating back for packets traveling in the reverse direction back to the clients. This prevents clients from contacting hosts directly, which has the security benefit of hiding the internal network structure from the clients.

Hierarchical Architecture

For a small data center housing only a few thousand hosts, a simple network consisting of a border router, a load balancer, and a few tens of racks all interconnected by a single Ethernet switch could possibly suffice. But to scale to tens to hundreds of thousands of hosts, a data center often employs a hierarchy of routers and switches, such as the topology shown in Figure 6.30. At the top of the hierarchy, the border router connects to access routers (only two are shown in Figure 6.30, but there can be many more). Below each access router there are three tiers of switches. Each access router connects to a top-tier switch, and each top-tier switch connects to multiple second-tier switches and a load balancer. Each second-tier switch in turn connects to multiple racks via the racks' TOR switches (third-tier switches). All links typically use Ethernet for their link-layer and physical-layer protocols, with a mix of copper and fiber cabling. With such a hierarchical design, it is possible to scale a data center to hundreds of thousands of hosts.
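The layer-4 load balancing and NAT-like translation described above can be sketched as follows. The IP addresses, the single application pool, and the least-loaded selection policy are all illustrative assumptions (real load balancers use a variety of policies and track far more state).

```python
# Sketch of a data center "layer-4 switch": requests arriving at a public
# (IP, port) pair are assigned to the least-loaded internal host for that
# application, and the mapping is remembered so replies can be rewritten
# on the way back out. Addresses and the load metric are hypothetical.

# Internal host pools per application, keyed by (public IP, destination port).
pools = {
    ("198.51.100.1", 80): {"10.0.1.4": 0, "10.0.1.5": 0, "10.0.2.9": 0},
}
translations = {}  # (client_ip, client_port) -> internal host serving that flow

def balance(client, public_dst):
    """Pick the least-loaded host for the application; record NAT-like state."""
    hosts = pools[public_dst]
    host = min(hosts, key=hosts.get)   # least-loaded host wins
    hosts[host] += 1
    translations[client] = host        # remembered for the reverse path
    return host

def reply_to(client):
    """On the reverse path, find which internal host served this client."""
    return translations[client]

h1 = balance(("203.0.113.7", 4321), ("198.51.100.1", 80))
h2 = balance(("203.0.113.8", 5555), ("198.51.100.1", 80))
assert h1 != h2                                   # spread across hosts
assert reply_to(("203.0.113.7", 4321)) == h1      # reply goes back correctly
```

Because clients only ever see the public address, the internal addressing and topology stay hidden, which is exactly the security benefit noted above.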
Because it is critical for a -cloud application provider to continually provide applications with high -availability, data centers also include redundant network equipment and -redundant links in their designs (not shown in Figure 6.30). For -example, each TOR switch can connect to two tier-2 switches, and each -access router, tier-1 switch, and tier-2 switch can be duplicated and -integrated into the design \[Cisco 2012; Greenberg 2009b\]. In the -hierarchical design in Figure 6.30, observe that the hosts below each -access router form a single subnet. In order to localize ARP broadcast -traffic, each of these subnets is further partitioned into smaller VLAN -subnets, each comprising a few hundred hosts \[Greenberg 2009a\]. -Although the conventional hierarchical architecture just described -solves the problem of scale, it suffers from limited host-to-host -capacity \[Greenberg 2009b\]. To understand this limitation, consider -again Figure 6.30, and suppose each host connects to its TOR switch with -a 1 Gbps link, whereas the links between switches are 10 Gbps Ethernet -links. Two hosts in the same rack can always communicate at a full 1 -Gbps, limited only by the rate of the hosts' network interface cards. -However, if there are many simultaneous flows in the data center -network, the maximum rate between two hosts in different racks can be -much less. To gain insight into this issue, consider a traffic pattern -consisting of 40 simultaneous flows between 40 pairs of hosts in -different racks. Specifically, suppose each of 10 hosts in rack 1 in -Figure 6.30 sends a flow to a corresponding host in rack 5. Similarly, -there are ten simultaneous flows between pairs of hosts in racks 2 and -6, ten simultaneous flows between racks 3 and 7, and ten simultaneous -flows between racks 4 and 8. 
If each flow evenly shares a link's capacity with other flows traversing that link, then the 40 flows crossing the 10 Gbps A-to-B link (as well as the 10 Gbps B-to-C link) will each only receive 10 Gbps/40 = 250 Mbps, which is significantly less than the 1 Gbps network interface card rate. The problem becomes even more acute for flows between hosts that need to travel higher up the hierarchy. One possible solution to this limitation is to deploy higher-rate switches and routers. But this would significantly increase the cost of the data center, because switches and routers with high port speeds are very expensive. Supporting high-bandwidth host-to-host communication is important because a key requirement in data centers is flexibility in placement of computation and services \[Greenberg 2009b; Farrington 2010\]. For example, a large-scale Internet search engine may run on thousands of hosts spread across multiple racks with significant bandwidth requirements between all pairs of hosts. Similarly, a cloud computing service such as EC2 may wish to place the multiple virtual machines comprising a customer's service on the physical hosts with the most capacity irrespective of their location in the data center. If these physical hosts are spread across multiple racks, network bottlenecks as described above may result in poor performance.

Trends in Data Center Networking

In order to reduce the cost of data centers, and at the same time improve their delay and throughput performance, Internet cloud giants such as Google, Facebook, Amazon, and Microsoft are continually deploying new data center network designs. Although these designs are proprietary, many important trends can nevertheless be identified. One such trend is to deploy new interconnection architectures and network protocols that overcome the drawbacks of the traditional hierarchical designs.
One such approach is to replace the hierarchy of switches and routers with a fully connected topology \[Facebook 2014; Al-Fares 2008; Greenberg 2009b; Guo 2009\], such as the topology shown in Figure 6.31. In this design, each tier-1 switch connects to all of the tier-2 switches so that (1) host-to-host traffic never has to rise above the switch tiers, and (2) with n tier-1 switches, between any two tier-2 switches there are n disjoint paths. Such a design can significantly improve the host-to-host capacity. To see this, consider again our example of 40 flows. The topology in Figure 6.31 can handle such a flow pattern since there are four distinct paths between the first tier-2 switch and the second tier-2 switch, together providing an aggregate capacity of 40 Gbps between the first two tier-2 switches. Such a design not only alleviates the host-to-host capacity limitation, but also creates a more flexible computation and service environment in which communication between any two racks not connected to the same switch is logically equivalent, irrespective of their locations in the data center.

Figure 6.31 Highly interconnected data network topology

Another major trend is to employ shipping container--based modular data centers (MDCs) \[YouTube 2009; Waldrop 2007\]. In an MDC, a factory builds, within a standard 12-meter shipping container, a "mini data center" and ships the container to the data center location. Each container has up to a few thousand hosts, stacked in tens of racks, which are packed closely together. At the data center location, multiple containers are interconnected with each other and also with the Internet. Once a prefabricated container is deployed at a data center, it is often difficult to service. Thus, each container is designed for graceful performance degradation: as components (servers and switches) fail over time, the container continues to operate but with degraded performance.
When many components have failed and performance has dropped below a threshold, the entire container is removed and replaced with a fresh one. Building a data center out of containers creates new networking challenges. With an MDC, there are two types of networks: the container-internal networks within each of the containers and the core network connecting each container \[Guo 2009; Farrington 2010\]. Within each container, at the scale of up to a few thousand hosts, it is possible to build a fully connected network (as described above) using inexpensive commodity Gigabit Ethernet switches. However, the design of the core network, interconnecting hundreds to thousands of containers while providing high host-to-host bandwidth across containers for typical workloads, remains a challenging problem. A hybrid electrical/optical switch architecture for interconnecting the containers is proposed in \[Farrington 2010\]. When using highly interconnected topologies, one of the major issues is designing routing algorithms among the switches. One possibility \[Greenberg 2009b\] is to use a form of random routing. Another possibility \[Guo 2009\] is to deploy multiple network interface cards in each host, connect each host to multiple low-cost commodity switches, and allow the hosts themselves to intelligently route traffic among the switches. Variations and extensions of these approaches are currently being deployed in contemporary data centers.

Another important trend is that large cloud providers are increasingly building or customizing just about everything in their data centers, including network adapters, switches, routers, TORs, software, and networking protocols \[Greenberg 2015; Singh 2015\]. Another trend, pioneered by Amazon, is to improve reliability with "availability zones," which essentially replicate distinct data centers in different nearby buildings.
By having the buildings nearby (a few kilometers apart), transactional data can be synchronized across the data centers in the same availability zone while providing fault tolerance \[Amazon 2014\]. Many more innovations in data center design are likely to continue to come; interested readers are encouraged to see the recent papers and videos on data center network design.

6.7 Retrospective: A Day in the Life of a Web Page Request

Now that we've covered the link layer in this chapter, and the network, transport, and application layers in earlier chapters, our journey down the protocol stack is complete! In the very beginning of this book (Section 1.1), we wrote "much of this book is concerned with computer network protocols," and in the first five chapters, we've certainly seen that this is indeed the case! Before heading into the topical chapters in the second part of this book, we'd like to wrap up our journey down the protocol stack by taking an integrated, holistic view of the protocols we've learned about so far. One way to take this "big picture" view is to identify the many (many!) protocols that are involved in satisfying even the simplest request: downloading a Web page. Figure 6.32 illustrates our setting: a student, Bob, connects a laptop to his school's Ethernet switch and downloads a Web page (say the home page of www.google.com). As we now know, there's a lot going on "under the hood" to satisfy this seemingly simple request. A Wireshark lab at the end of this chapter examines trace files containing a number of the packets involved in similar scenarios in more detail.

6.7.1 Getting Started: DHCP, UDP, IP, and Ethernet

Let's suppose that Bob boots up his laptop and then connects it to an Ethernet cable connected to the school's Ethernet switch, which in turn is connected to the school's router, as shown in Figure 6.32. The school's router is connected to an ISP, in this example, comcast.net.
In this example, comcast.net is providing the DNS service for the school; thus, the DNS server resides in the Comcast network rather than the school network. We'll assume that the DHCP server is running within the router, as is often the case.

Figure 6.32 A day in the life of a Web page request: Network setting and actions

When Bob first connects his laptop to the network, he can't do anything (e.g., download a Web page) without an IP address. Thus, the first network-related action taken by Bob's laptop is to run the DHCP protocol to obtain an IP address, as well as other information, from the local DHCP server:

1. The operating system on Bob's laptop creates a DHCP request message (Section 4.3.3) and puts this message within a UDP segment (Section 3.3) with destination port 67 (DHCP server) and source port 68 (DHCP client). The UDP segment is then placed within an IP datagram (Section 4.3.1) with a broadcast IP destination address (255.255.255.255) and a source IP address of 0.0.0.0, since Bob's laptop doesn't yet have an IP address.

2. The IP datagram containing the DHCP request message is then placed within an Ethernet frame (Section 6.4.2). The Ethernet frame has a destination MAC address of FF:FF:FF:FF:FF:FF so that the frame will be broadcast to all devices connected to the switch (hopefully including a DHCP server); the frame's source MAC address is that of Bob's laptop, 00:16:D3:23:68:8A.

3. The broadcast Ethernet frame containing the DHCP request is the first frame sent by Bob's laptop to the Ethernet switch. The switch broadcasts the incoming frame on all outgoing ports, including the port connected to the router.

4. The router receives the broadcast Ethernet frame containing the DHCP request on its interface with MAC address 00:22:6B:45:1F:1B and the IP datagram is extracted from the Ethernet frame.
The datagram's broadcast IP destination address indicates that this IP datagram should be processed by upper layer protocols at this node, so the datagram's payload (a UDP segment) is thus demultiplexed (Section 3.2) up to UDP, and the DHCP request message is extracted from the UDP segment. The DHCP server now has the DHCP request message.

5. Let's suppose that the DHCP server running within the router can allocate IP addresses in the CIDR (Section 4.3.3) block 68.85.2.0/24. In this example, all IP addresses used within the school are thus within Comcast's address block. Let's suppose the DHCP server allocates address 68.85.2.101 to Bob's laptop. The DHCP server creates a DHCP ACK message (Section 4.3.3) containing this IP address, as well as the IP address of the DNS server (68.87.71.226), the IP address for the default gateway router (68.85.2.1), and the subnet block (68.85.2.0/24) (equivalently, the "network mask"). The DHCP message is put inside a UDP segment, which is put inside an IP datagram, which is put inside an Ethernet frame. The Ethernet frame has a source MAC address of the router's interface to the home network (00:22:6B:45:1F:1B) and a destination MAC address of Bob's laptop (00:16:D3:23:68:8A).

6. The Ethernet frame containing the DHCP ACK is sent (unicast) by the router to the switch. Because the switch is self-learning (Section 6.4.3) and previously received an Ethernet frame (containing the DHCP request) from Bob's laptop, the switch knows to forward a frame addressed to 00:16:D3:23:68:8A only to the output port leading to Bob's laptop.

7. Bob's laptop receives the Ethernet frame containing the DHCP ACK, extracts the IP datagram from the Ethernet frame, extracts the UDP segment from the IP datagram, and extracts the DHCP ACK message from the UDP segment. Bob's DHCP client then records its IP address and the IP address of its DNS server.
It also installs the address of the default gateway into its IP forwarding table (Section 4.1). Bob's laptop will send all datagrams with destination address outside of its subnet 68.85.2.0/24 to the default gateway. At this point, Bob's laptop has initialized its networking components and is ready to begin processing the Web page fetch. (Note that only the last two DHCP steps of the four presented in Chapter 4 are actually necessary.)

6.7.2 Still Getting Started: DNS and ARP

When Bob types the URL for www.google.com into his Web browser, he begins the long chain of events that will eventually result in Google's home page being displayed by his Web browser. Bob's Web browser begins the process by creating a TCP socket (Section 2.7) that will be used to send the HTTP request (Section 2.2) to www.google.com. In order to create the socket, Bob's laptop will need to know the IP address of www.google.com. We learned in Section 2.5 that the DNS protocol is used to provide this name-to-IP-address translation service.

8. The operating system on Bob's laptop thus creates a DNS query message (Section 2.5.3), putting the string "www.google.com" in the question section of the DNS message. This DNS message is then placed within a UDP segment with a destination port of 53 (DNS server). The UDP segment is then placed within an IP datagram with an IP destination address of 68.87.71.226 (the address of the DNS server returned in the DHCP ACK in step 5) and a source IP address of 68.85.2.101.

9. Bob's laptop then places the datagram containing the DNS query message in an Ethernet frame. This frame will be sent (addressed, at the link layer) to the gateway router in Bob's school's network. However, even though Bob's laptop knows the IP address of the school's gateway router (68.85.2.1) via the DHCP ACK message in step 5 above, it doesn't know the gateway router's MAC address.
In order to obtain the MAC address of the gateway router, Bob's laptop will need to use the ARP protocol (Section 6.4.1).

10. Bob's laptop creates an ARP query message with a target IP address of 68.85.2.1 (the default gateway), places the ARP message within an Ethernet frame with a broadcast destination address (FF:FF:FF:FF:FF:FF) and sends the Ethernet frame to the switch, which delivers the frame to all connected devices, including the gateway router.

11. The gateway router receives the frame containing the ARP query message on the interface to the school network, and finds that the target IP address of 68.85.2.1 in the ARP message matches the IP address of its interface. The gateway router thus prepares an ARP reply, indicating that its MAC address of 00:22:6B:45:1F:1B corresponds to IP address 68.85.2.1. It places the ARP reply message in an Ethernet frame, with a destination address of 00:16:D3:23:68:8A (Bob's laptop) and sends the frame to the switch, which delivers the frame to Bob's laptop.

12. Bob's laptop receives the frame containing the ARP reply message and extracts the MAC address of the gateway router (00:22:6B:45:1F:1B) from the ARP reply message.

13. Bob's laptop can now (finally!) address the Ethernet frame containing the DNS query to the gateway router's MAC address. Note that the IP datagram in this frame has an IP destination address of 68.87.71.226 (the DNS server), while the frame has a destination address of 00:22:6B:45:1F:1B (the gateway router). Bob's laptop sends this frame to the switch, which delivers the frame to the gateway router.

6.7.3 Still Getting Started: Intra-Domain Routing to the DNS Server

14. The gateway router receives the frame and extracts the IP datagram containing the DNS query.
The router looks up the destination address of this datagram (68.87.71.226) and determines from its forwarding table that the datagram should be sent to the leftmost router in the Comcast network in Figure 6.32. The IP datagram is placed inside a link-layer frame appropriate for the link connecting the school's router to the leftmost Comcast router and the frame is sent over this link.

15. The leftmost router in the Comcast network receives the frame, extracts the IP datagram, examines the datagram's destination address (68.87.71.226) and determines the outgoing interface on which to forward the datagram toward the DNS server from its forwarding table, which has been filled in by Comcast's intra-domain protocol (such as RIP, OSPF, or IS-IS, Section 5.3) as well as the Internet's inter-domain protocol, BGP (Section 5.4).

16. Eventually the IP datagram containing the DNS query arrives at the DNS server. The DNS server extracts the DNS query message, looks up the name www.google.com in its DNS database (Section 2.5), and finds the DNS resource record that contains the IP address (64.233.169.105) for www.google.com (assuming that it is currently cached in the DNS server). Recall that this cached data originated in the authoritative DNS server (Section 2.5.2) for google.com. The DNS server forms a DNS reply message containing this hostname-to-IP-address mapping, and places the DNS reply message in a UDP segment, and the segment within an IP datagram addressed to Bob's laptop (68.85.2.101). This datagram will be forwarded back through the Comcast network to the school's router and from there, via the Ethernet switch, to Bob's laptop.

17. Bob's laptop extracts the IP address of the server www.google.com from the DNS message. Finally, after a lot of work, Bob's laptop is now ready to contact the www.google.com server!

6.7.4 Web Client-Server Interaction: TCP and HTTP

18.
Now that Bob's -laptop has the IP address of www.google.com, it can create the TCP -socket (Section 2.7) that will be used to send the HTTP GET message -(Section 2.2.3) to www.google.com. When Bob creates the TCP socket, the -TCP in Bob's laptop must first perform a three-way handshake (Section -3.5.6) with the TCP in www.google.com. Bob's laptop thus first creates a -TCP SYN segment with destination port 80 (for HTTP), places the TCP -segment inside an IP datagram with a destination IP address of -64.233.169.105 (www.google.com), places the datagram inside a frame with -a destination MAC address of 00:22:6B:45:1F:1B (the gateway router) and -sends the frame to the switch. - -19. The routers in the school network, Comcast's network, and Google's - network forward the datagram containing the TCP SYN toward - www.google.com, using the forwarding table in each router, as in - steps 14--16 above. Recall that the router forwarding table entries - governing forwarding of packets over the inter-domain link between - the Comcast and Google networks are determined by the BGP protocol - (Chapter 5). - -20. Eventually, the datagram containing the TCP SYN arrives at - www.google.com. The TCP SYN message is extracted from the datagram - and demultiplexed to the welcome socket associated with port 80. A - connection socket (Section 2.7) is created for the TCP connection - between the Google HTTP server and Bob's laptop. A TCP SYNACK - (Section 3.5.6) segment is generated, placed inside a datagram - addressed to Bob's laptop, and finally placed inside a link-layer - frame appropriate for the link connecting www.google.com to its - first-hop router. - -21. The datagram containing the TCP SYNACK segment is forwarded through - the Google, Comcast, and school networks, eventually arriving at the - Ethernet card in Bob's laptop. The datagram is demultiplexed within - the operating system to the TCP socket created in step 18, which - enters the connected state. - -22. 
With the socket on Bob's laptop now (finally!) ready to send bytes to www.google.com, Bob's browser creates the HTTP GET message (Section 2.2.3) containing the URL to be fetched. The HTTP GET message is then written into the socket, with the GET message becoming the payload of a TCP segment. The TCP segment is placed in a datagram and sent and delivered to www.google.com as in steps 18--20 above.

23. The HTTP server at www.google.com reads the HTTP GET message from the TCP socket, creates an HTTP response message (Section 2.2), places the requested Web page content in the body of the HTTP response message, and sends the message into the TCP socket.

24. The datagram containing the HTTP reply message is forwarded through the Google, Comcast, and school networks, and arrives at Bob's laptop. Bob's Web browser program reads the HTTP response from the socket, extracts the HTML for the Web page from the body of the HTTP response, and finally (finally!) displays the Web page!

Our scenario above has covered a lot of networking ground! If you've understood most or all of the above example, then you've also covered a lot of ground since you first read Section 1.1, where we wrote "much of this book is concerned with computer network protocols" and you may have wondered what a protocol actually was! As detailed as the above example might seem, we've omitted a number of possible additional protocols (e.g., NAT running in the school's gateway router, wireless access to the school's network, security protocols for accessing the school network or encrypting segments or datagrams, network management protocols), and considerations (Web caching, the DNS hierarchy) that one would encounter in the public Internet. We'll cover a number of these topics and more in the second part of this book.
Lastly, we note that our example above was an integrated and holistic, but also very "nuts and bolts," view of many of the protocols that we've studied in the first part of this book. The example focused more on the "how" than the "why." For a broader, more reflective view on the design of network protocols in general, see \[Clark 1988, RFC 5218\].

6.8 Summary

In this chapter, we've examined the link layer---its services, the principles underlying its operation, and a number of important specific protocols that use these principles in implementing link-layer services. We saw that the basic service of the link layer is to move a network-layer datagram from one node (host, switch, router, WiFi access point) to an adjacent node. We saw that all link-layer protocols operate by encapsulating a network-layer datagram within a link-layer frame before transmitting the frame over the link to the adjacent node. Beyond this common framing function, however, we learned that different link-layer protocols provide very different link access, delivery, and transmission services. These differences are due in part to the wide variety of link types over which link-layer protocols must operate. A simple point-to-point link has a single sender and receiver communicating over a single "wire." A multiple access link is shared among many senders and receivers; consequently, the link-layer protocol for a multiple access channel has a protocol (its multiple access protocol) for coordinating link access. In the case of MPLS, the "link" connecting two adjacent nodes (for example, two IP routers that are adjacent in an IP sense---that they are next-hop IP routers toward some destination) may actually be a network in and of itself. In one sense, the idea of a network being considered as a link should not seem odd.
A telephone link connecting a home modem/computer to a remote modem/router, for example, is actually a path through a sophisticated and complex telephone network. Among the principles underlying link-layer communication, we examined error-detection and -correction techniques, multiple access protocols, link-layer addressing, virtualization (VLANs), and the construction of extended switched LANs and data center networks. Much of the focus today at the link layer is on these switched networks. In the case of error detection/correction, we examined how it is possible to add additional bits to a frame's header in order to detect, and in some cases correct, bit-flip errors that might occur when the frame is transmitted over the link. We covered simple parity and checksumming schemes, as well as the more robust cyclic redundancy check. We then moved on to the topic of multiple access protocols. We identified and studied three broad approaches for coordinating access to a broadcast channel: channel partitioning approaches (TDM, FDM), random access approaches (the ALOHA protocols and CSMA protocols), and taking-turns approaches (polling and token passing). We studied the cable access network and found that it uses many of these multiple access methods. We saw that a consequence of having multiple nodes share a single broadcast channel was the need to provide node addresses at the link layer. We learned that link-layer addresses were quite different from network-layer addresses and that, in the case of the Internet, a special protocol (ARP---the Address Resolution Protocol) is used to translate between these two forms of addressing. We also studied the hugely successful Ethernet protocol in detail. We then examined how nodes sharing a broadcast channel form a LAN and how multiple LANs can be connected together to form larger LANs---all without the intervention of network-layer routing to interconnect these local nodes.
We also learned how multiple virtual LANs can be created on a single physical LAN infrastructure. We ended our study of the link layer by focusing on how MPLS networks provide link-layer services when they interconnect IP routers, and with an overview of the network designs for today's massive data centers. We wrapped up this chapter (and indeed the first six chapters) by identifying the many protocols that are needed to fetch a simple Web page. Having covered the link layer, our journey down the protocol stack is now over! Certainly, the physical layer lies below the link layer, but the details of the physical layer are probably best left for another course (for example, in communication theory, rather than computer networking). We have, however, touched upon several aspects of the physical layer in this chapter and in Chapter 1 (our discussion of physical media in Section 1.2). We'll consider the physical layer again when we study wireless link characteristics in the next chapter.

Although our journey down the protocol stack is over, our study of computer networking is not yet at an end. In the following three chapters we cover wireless networking, network security, and multimedia networking. These three topics do not fit conveniently into any one layer; indeed, each topic crosscuts many layers. Understanding these topics (billed as advanced topics in some networking texts) thus requires a firm foundation in all layers of the protocol stack---a foundation that our study of the link layer has now completed!

Homework Problems and Questions

Chapter 6 Review Questions

SECTIONS 6.1--6.2

R1. Consider the transportation analogy in Section 6.1.1. If the passenger is analogous to a datagram, what is analogous to the link-layer frame?

R2. If all the links in the Internet were to provide reliable delivery service, would the TCP reliable delivery service be redundant? Why or why not?

R3.
What are some of the possible services that a link-layer protocol can offer to the network layer? Which of these link-layer services have corresponding services in IP? In TCP?

SECTION 6.3

R4. Suppose two nodes start to transmit at the same time a packet of length L over a broadcast channel of rate R. Denote the propagation delay between the two nodes as dprop. Will there be a collision if dprop < L/R? Why or why not?

R5. In Section 6.3, we listed four desirable characteristics of a broadcast channel. Which of these characteristics does slotted ALOHA have? Which of these characteristics does token passing have?

R6. In CSMA/CD, after the fifth collision, what is the probability that a node chooses K=4? The result K=4 corresponds to a delay of how many seconds on a 10 Mbps Ethernet?

R7. Describe polling and token-passing protocols using the analogy of cocktail party interactions.

R8. Why would the token-ring protocol be inefficient if a LAN had a very large perimeter?

SECTION 6.4

R9. How big is the MAC address space? The IPv4 address space? The IPv6 address space?

R10. Suppose nodes A, B, and C each attach to the same broadcast LAN (through their adapters). If A sends thousands of IP datagrams to B with each encapsulating frame addressed to the MAC address of B, will C's adapter process these frames? If so, will C's adapter pass the IP datagrams in these frames to the network layer at C? How would your answers change if A sends frames with the MAC broadcast address?

R11. Why is an ARP query sent within a broadcast frame? Why is an ARP response sent within a frame with a specific destination MAC address?

R12. For the network in Figure 6.19, the router has two ARP modules, each with its own ARP table. Is it possible that the same MAC address appears in both tables?

R13. Compare the frame structures for 10BASE-T, 100BASE-T, and Gigabit Ethernet. How do they differ?

R14. Consider Figure 6.15.
How many subnetworks are there, in the addressing sense of Section 4.3?

R15. What is the maximum number of VLANs that can be configured on a switch supporting the 802.1Q protocol? Why?

R16. Suppose that N switches supporting K VLAN groups are to be connected via a trunking protocol. How many ports are needed to connect the switches? Justify your answer.

Problems

P1. Suppose the information content of a packet is the bit pattern 1110 0110 1001 1101 and an even parity scheme is being used. What would the value of the field containing the parity bits be for the case of a two-dimensional parity scheme? Your answer should be such that a minimum-length checksum field is used.

P2. Show (give an example other than the one in Figure 6.5) that two-dimensional parity checks can correct and detect a single bit error. Show (give an example of) a double-bit error that can be detected but not corrected.

P3. Suppose the information portion of a packet (D in Figure 6.3) contains 10 bytes consisting of the 8-bit unsigned binary ASCII representation of the string "Networking." Compute the Internet checksum for this data.

P4. Consider the previous problem, but instead suppose these 10 bytes contain

a. the binary representation of the numbers 1 through 10.

b. the ASCII representation of the letters B through K (uppercase).

c. the ASCII representation of the letters b through k (lowercase).

Compute the Internet checksum for this data.

P5. Consider the 5-bit generator, G=10011, and suppose that D has the value 1010101010. What is the value of R?

P6. Consider the previous problem, but suppose that D has the value

a. 1001010101.

b. 101101010.

c. 1010100000.

P7. In this problem, we explore some of the properties of the CRC. For the generator G(=1001) given in Section 6.2.3, answer the following questions.

a. Why can it detect any single bit error in data D?

b. Can the above G detect any odd number of bit errors? Why?

P8.
In Section 6.3, we provided an outline of the derivation of the efficiency of slotted ALOHA. In this problem we'll complete the derivation.

a. Recall that when there are N active nodes, the efficiency of slotted ALOHA is Np(1−p)^(N−1). Find the value of p that maximizes this expression.

b. Using the value of p found in (a), find the efficiency of slotted ALOHA by letting N approach infinity. Hint: (1−1/N)^N approaches 1/e as N approaches infinity.

P9. Show that the maximum efficiency of pure ALOHA is 1/(2e). Note: This problem is easy if you have completed the problem above!

P10. Consider two nodes, A and B, that use the slotted ALOHA protocol to contend for a channel. Suppose node A has more data to transmit than node B, and node A's retransmission probability pA is greater than node B's retransmission probability, pB.

a. Provide a formula for node A's average throughput. What is the total efficiency of the protocol with these two nodes?

b. If pA=2pB, is node A's average throughput twice as large as that of node B? Why or why not? If not, how can you choose pA and pB to make that happen?

c. In general, suppose there are N nodes, among which node A has retransmission probability 2p and all other nodes have retransmission probability p. Provide expressions to compute the average throughputs of node A and of any other node.

P11. Suppose four active nodes---nodes A, B, C and D---are competing for access to a channel using slotted ALOHA. Assume each node has an infinite number of packets to send. Each node attempts to transmit in each slot with probability p. The first slot is numbered slot 1, the second slot is numbered slot 2, and so on.

a. What is the probability that node A succeeds for the first time in slot 5?

b. What is the probability that some node (either A, B, C or D) succeeds in slot 4?

c. What is the probability that the first success occurs in slot 3?

d.
What is the efficiency of this four-node system?

P12. Graph the efficiency of slotted ALOHA and pure ALOHA as a function of p for the following values of N:

a. N=15.

b. N=25.

c. N=35.

P13. Consider a broadcast channel with N nodes and a transmission rate of R bps. Suppose the broadcast channel uses polling (with an additional polling node) for multiple access. Suppose the amount of time from when a node completes transmission until the subsequent node is permitted to transmit (that is, the polling delay) is dpoll. Suppose that within a polling round, a given node is allowed to transmit at most Q bits. What is the maximum throughput of the broadcast channel?

P14. Consider three LANs interconnected by two routers, as shown in Figure 6.33.

a. Assign IP addresses to all of the interfaces. For Subnet 1 use addresses of the form 192.168.1.xxx; for Subnet 2 use addresses of the form 192.168.2.xxx; and for Subnet 3 use addresses of the form 192.168.3.xxx.

b. Assign MAC addresses to all of the adapters.

c. Consider sending an IP datagram from Host E to Host B. Suppose all of the ARP tables are up to date. Enumerate all the steps, as done for the single-router example in Section 6.4.1.

d. Repeat (c), now assuming that the ARP table in the sending host is empty (and the other tables are up to date).

P15. Consider Figure 6.33. Now we replace the router between subnets 1 and 2 with a switch S1, and label the router between subnets 2 and 3 as R1.

Figure 6.33 Three subnets, interconnected by routers

a. Consider sending an IP datagram from Host E to Host F. Will Host E ask router R1 to help forward the datagram? Why? In the Ethernet frame containing the IP datagram, what are the source and destination IP and MAC addresses?

b. Suppose E would like to send an IP datagram to B, and assume that E's ARP cache does not contain B's MAC address. Will E perform an ARP query to find B's MAC address? Why?
In the Ethernet frame (containing the IP datagram destined to B) that is delivered to router R1, what are the source and destination IP and MAC addresses?

c. Suppose Host A would like to send an IP datagram to Host B, and neither A's ARP cache contains B's MAC address nor does B's ARP cache contain A's MAC address. Further suppose that the switch S1's forwarding table contains entries for Host B and router R1 only. Thus, A will broadcast an ARP request message. What actions will switch S1 perform once it receives the ARP request message? Will router R1 also receive this ARP request message? If so, will R1 forward the message to Subnet 3? Once Host B receives this ARP request message, it will send back to Host A an ARP response message. But will it send an ARP query message to ask for A's MAC address? Why? What will switch S1 do once it receives an ARP response message from Host B?

P16. Consider the previous problem, but suppose now that the router between subnets 2 and 3 is replaced by a switch. Answer questions (a)--(c) in the previous problem in this new context.

P17. Recall that with the CSMA/CD protocol, the adapter waits K⋅512 bit times after a collision, where K is drawn randomly. For K=100, how long does the adapter wait until returning to Step 2 for a 10 Mbps broadcast channel? For a 100 Mbps broadcast channel?

P18. Suppose nodes A and B are on the same 10 Mbps broadcast channel, and the propagation delay between the two nodes is 325 bit times. Suppose CSMA/CD and Ethernet packets are used for this broadcast channel. Suppose node A begins transmitting a frame and, before it finishes, node B begins transmitting a frame. Can A finish transmitting before it detects that B has transmitted? Why or why not? If the answer is yes, then A incorrectly believes that its frame was successfully transmitted without a collision. Hint: Suppose at time t=0 bits, A begins transmitting a frame.
In the worst case, A transmits a minimum-sized frame of 512+64 bit times. So A would finish transmitting the frame at t=512+64 bit times. Thus, the answer is no, if B's signal reaches A before bit time t=512+64 bits. In the worst case, when does B's signal reach A?

P19. Suppose nodes A and B are on the same 10 Mbps broadcast channel, and the propagation delay between the two nodes is 245 bit times. Suppose A and B send Ethernet frames at the same time, the frames collide, and then A and B choose different values of K in the CSMA/CD algorithm. Assuming no other nodes are active, can the retransmissions from A and B collide? For our purposes, it suffices to work out the following example. Suppose A and B begin transmission at t=0 bit times. They both detect collisions at t=245 bit times. Suppose KA=0 and KB=1. At what time does B schedule its retransmission? At what time does A begin retransmission? (Note: The nodes must wait for an idle channel after returning to Step 2---see protocol.) At what time does A's signal reach B? Does B refrain from transmitting at its scheduled time?

P20. In this problem, you will derive the efficiency of a CSMA/CD-like multiple access protocol. In this protocol, time is slotted and all adapters are synchronized to the slots. Unlike slotted ALOHA, however, the length of a slot (in seconds) is much less than a frame time (the time to transmit a frame). Let S be the length of a slot. Suppose all frames are of constant length L=kRS, where R is the transmission rate of the channel and k is a large integer. Suppose there are N nodes, each with an infinite number of frames to send. We also assume that dprop < S, so that all nodes can detect a collision before the end of a slot time. The protocol is as follows: If, for a given slot, no node has possession of the channel, all nodes contend for the channel; in particular, each node transmits in the slot with probability p.
If exactly one node transmits in the slot, that node takes possession of the channel for the subsequent k−1 slots and transmits its entire frame. If some node has possession of the channel, all other nodes refrain from transmitting until the node that possesses the channel has finished transmitting its frame. Once this node has transmitted its frame, all nodes contend for the channel. Note that the channel alternates between two states: the productive state, which lasts exactly k slots, and the nonproductive state, which lasts for a random number of slots. Clearly, the channel efficiency is the ratio k/(k+x), where x is the expected number of consecutive unproductive slots.

a. For fixed N and p, determine the efficiency of this protocol.

b. For fixed N, determine the p that maximizes the efficiency.

c. Using the p (which is a function of N) found in (b), determine the efficiency as N approaches infinity.

d. Show that this efficiency approaches 1 as the frame length becomes large.

P21. Consider Figure 6.33 in problem P14. Provide MAC addresses and IP addresses for the interfaces at Host A, both routers, and Host F. Suppose Host A sends a datagram to Host F. Give the source and destination MAC addresses in the frame encapsulating this IP datagram as the frame is transmitted (i) from A to the left router, (ii) from the left router to the right router, (iii) from the right router to F. Also give the source and destination IP addresses in the IP datagram encapsulated within the frame at each of these points in time.

P22. Suppose now that the leftmost router in Figure 6.33 is replaced by a switch. Hosts A, B, C, and D and the right router are all star-connected into this switch. Give the source and destination MAC addresses in the frame encapsulating this IP datagram as the frame is transmitted (i) from A to the switch, (ii) from the switch to the right router, (iii) from the right router to F.
Also give the source and destination IP addresses in the IP datagram encapsulated within the frame at each of these points in time.

P23. Consider Figure 6.15. Suppose that all links are 100 Mbps. What is the maximum total aggregate throughput that can be achieved among the 9 hosts and 2 servers in this network? You can assume that any host or server can send to any other host or server. Why?

P24. Suppose the three departmental switches in Figure 6.15 are replaced by hubs. All links are 100 Mbps. Now answer the questions posed in problem P23.

P25. Suppose that all the switches in Figure 6.15 are replaced by hubs. All links are 100 Mbps. Now answer the questions posed in problem P23.

P26. Let's consider the operation of a learning switch in the context of a network in which 6 nodes labeled A through F are star-connected into an Ethernet switch. Suppose that (i) B sends a frame to E, (ii) E replies with a frame to B, (iii) A sends a frame to B, (iv) B replies with a frame to A. The switch table is initially empty. Show the state of the switch table before and after each of these events. For each of these events, identify the link(s) on which the transmitted frame will be forwarded, and briefly justify your answers.

P27. In this problem, we explore the use of small packets for Voice-over-IP applications. One of the drawbacks of a small packet size is that a large fraction of link bandwidth is consumed by overhead bytes. To this end, suppose that the packet consists of L bytes and 5 bytes of header.

a. Consider sending a digitally encoded voice source directly. Suppose the source is encoded at a constant rate of 128 kbps. Assume each packet is entirely filled before the source sends the packet into the network. The time required to fill a packet is the packetization delay. In terms of L, determine the packetization delay in milliseconds.

b.
Packetization delays greater than 20 msec can cause a noticeable and unpleasant echo. Determine the packetization delay for L=1,500 bytes (roughly corresponding to a maximum-sized Ethernet packet) and for L=50 (corresponding to an ATM packet).

c. Calculate the store-and-forward delay at a single switch for a link rate of R=622 Mbps for L=1,500 bytes, and for L=50 bytes.

d. Comment on the advantages of using a small packet size.

P28. Consider the single-switch VLAN in Figure 6.25, and assume an external router is connected to switch port 1. Assign IP addresses to the EE and CS hosts and router interface. Trace the steps taken at both the network layer and the link layer to transfer an IP datagram from an EE host to a CS host. (Hint: Reread the discussion of Figure 6.19 in the text.)

P29. Consider the MPLS network shown in Figure 6.29, and suppose that routers R5 and R6 are now MPLS enabled. Suppose that we want to perform traffic engineering so that packets from R6 destined for A are switched to A via R6-R4-R3-R1, and packets from R5 destined for A are switched via R5-R4-R2-R1. Show the MPLS tables in R5 and R6, as well as the modified table in R4, that would make this possible.

P30. Consider again the same scenario as in the previous problem, but suppose that packets from R6 destined for D are switched via R6-R4-R3, while packets from R5 destined to D are switched via R4-R2-R1-R3. Show the MPLS tables in all routers that would make this possible.

P31. In this problem, you will put together much of what you have learned about Internet protocols. Suppose you walk into a room, connect to Ethernet, and want to download a Web page. What are all the protocol steps that take place, starting from powering on your PC to getting the Web page? Assume there is nothing in our DNS or browser caches when you power on your PC. (Hint: The steps include the use of Ethernet, DHCP, ARP, DNS, TCP, and HTTP protocols.)
Explicitly indicate in your steps how you obtain the IP and MAC addresses of a gateway router.

P32. Consider the data center network with hierarchical topology in Figure 6.30. Suppose now there are 80 pairs of flows, with ten flows between the first and ninth rack, ten flows between the second and tenth rack, and so on. Further suppose that all links in the network are 10 Gbps, except for the links between hosts and TOR switches, which are 1 Gbps.

a. Each flow has the same data rate; determine the maximum rate of a flow.

b. For the same traffic pattern, determine the maximum rate of a flow for the highly interconnected topology in Figure 6.31.

c. Now suppose there is a similar traffic pattern, but involving 20 hosts on each rack and 160 pairs of flows. Determine the maximum flow rates for the two topologies.

P33. Consider the hierarchical network in Figure 6.30 and suppose that the data center needs to support e-mail and video distribution among other applications. Suppose four racks of servers are reserved for e-mail and four racks are reserved for video. For each of the applications, all four racks must lie below a single tier-2 switch since the tier-2 to tier-1 links do not have sufficient bandwidth to support the intra-application traffic. For the e-mail application, suppose that for 99.9 percent of the time only three racks are used, and that the video application has identical usage patterns.

a. For what fraction of time does the e-mail application need to use a fourth rack? How about for the video application?

b. Assuming e-mail usage and video usage are independent, for what fraction of time do (equivalently, what is the probability that) both applications need their fourth rack?

c. Suppose that it is acceptable for an application to have a shortage of servers for 0.001 percent of time or less (causing rare periods of performance degradation for users).
Discuss how the topology in Figure 6.31 can be used so that only seven racks are collectively assigned to the two applications (assuming that the topology can support all the traffic).

Wireshark Labs

At the Companion website for this textbook, http://www.pearsonhighered.com/cs-resources/, you'll find a Wireshark lab that examines the operation of the IEEE 802.3 protocol and the Wireshark frame format. A second Wireshark lab examines packet traces taken in a home network scenario.

AN INTERVIEW WITH... Simon S. Lam

Simon S. Lam is Professor and Regents Chair in Computer Sciences at the University of Texas at Austin. From 1971 to 1974, he was with the ARPA Network Measurement Center at UCLA, where he worked on satellite and radio packet switching. He led a research group that invented secure sockets and prototyped, in 1993, the first secure sockets layer named Secure Network Programming, which won the 2004 ACM Software System Award. His research interests are in design and analysis of network protocols and security services. He received his BSEE from Washington State University and his MS and PhD from UCLA. He was elected to the National Academy of Engineering in 2007.

Why did you decide to specialize in networking?

When I arrived at UCLA as a new graduate student in Fall 1969, my intention was to study control theory. Then I took the queuing theory classes of Leonard Kleinrock and was very impressed by him. For a while, I was working on adaptive control of queuing systems as a possible thesis topic. In early 1972, Larry Roberts initiated the ARPAnet Satellite System project (later called Packet Satellite). Professor Kleinrock asked me to join the project. The first thing we did was to introduce a simple, yet realistic, backoff algorithm to the slotted ALOHA protocol.
Shortly thereafter, I found many interesting research problems, such as ALOHA's instability problem and need for adaptive backoff, which would form the core of my thesis.

You were active in the early days of the Internet in the 1970s, beginning with your student days at UCLA. What was it like then? Did people have any inkling of what the Internet would become?

The atmosphere was really no different from other system-building projects I have seen in industry and academia. The initially stated goal of the ARPAnet was fairly modest, that is, to provide access to expensive computers from remote locations so that many more scientists could use them. However, with the startup of the Packet Satellite project in 1972 and the Packet Radio project in 1973, ARPA's goal had expanded substantially. By 1973, ARPA was building three different packet networks at the same time, and it became necessary for Vint Cerf and Bob Kahn to develop an interconnection strategy. Back then, all of these progressive developments in networking were viewed (I believe) as logical rather than magical. No one could have envisioned the scale of the Internet and power of personal computers today. It was a decade before the appearance of the first PCs. To put things in perspective, most students submitted their computer programs as decks of punched cards for batch processing. Only some students had direct access to computers, which were typically housed in a restricted area. Modems were slow and still a rarity. As a graduate student, I had only a phone on my desk, and I used pencil and paper to do most of my work.

Where do you see the field of networking and the Internet heading in the future?

In the past, the simplicity of the Internet's IP protocol was its greatest strength in vanquishing competition and becoming the de facto standard for internetworking.
Unlike competitors, such as X.25 in the 1980s and ATM in the 1990s, IP can run on top of any link-layer networking technology, because it offers only a best-effort datagram service. Thus, any packet network can connect to the Internet. Today, IP's greatest strength is actually a shortcoming. IP is like a straitjacket that confines the Internet's development to specific directions. In recent years, many researchers have redirected their efforts to the application layer only. There is also a great deal of research on wireless ad hoc networks, sensor networks, and satellite networks. These networks can be viewed either as stand-alone systems or link-layer systems, which can flourish because they are outside of the IP straitjacket. Many people are excited about the possibility of P2P systems as a platform for novel Internet applications. However, P2P systems are highly inefficient in their use of Internet resources. A concern of mine is whether the transmission and switching capacity of the Internet core will continue to increase faster than the traffic demand on the Internet as it grows to interconnect all kinds of devices and support future P2P-enabled applications. Without substantial overprovisioning of capacity, ensuring network stability in the presence of malicious attacks and congestion will continue to be a significant challenge. The Internet's phenomenal growth also requires the allocation of new IP addresses at a rapid rate to network operators and enterprises worldwide. At the current rate, the pool of unallocated IPv4 addresses would be depleted in a few years. When that happens, large contiguous blocks of address space can only be allocated from the IPv6 address space. Since adoption of IPv6 is off to a slow start, due to lack of incentives for early adopters, IPv4 and IPv6 will most likely coexist on the Internet for many years to come.
Successful migration from an IPv4-dominant Internet to an IPv6-dominant Internet will require a substantial global effort.

What is the most challenging part of your job?

The most challenging part of my job as a professor is teaching and motivating every student in my class, and every doctoral student under my supervision, rather than just the high achievers. The very bright and motivated may require a little guidance but not much else. I often learn more from these students than they learn from me. Educating and motivating the underachievers present a major challenge.

What impacts do you foresee technology having on learning in the future?

Eventually, almost all human knowledge will be accessible through the Internet, which will be the most powerful tool for learning. This vast knowledge base will have the potential of leveling the playing field for students all over the world. For example, motivated students in any country will be able to access the best-of-class Web sites, multimedia lectures, and teaching materials. Already, it was said that the IEEE and ACM digital libraries have accelerated the development of computer science researchers in China. In time, the Internet will transcend all geographic barriers to learning.

Chapter 7 Wireless and Mobile Networks

In the telephony world, the past 20 years have arguably been the golden years of cellular telephony. The number of worldwide mobile cellular subscribers increased from 34 million in 1993 to nearly 7.0 billion subscribers by 2014, with the number of cellular subscribers now surpassing the number of wired telephone lines. There are now a larger number of mobile phone subscriptions than there are people on our planet. The many advantages of cell phones are evident to all---anywhere, anytime, untethered access to the global telephone network via a highly portable lightweight device.
More recently, laptops, smartphones, and tablets are wirelessly connected to the Internet via a cellular or WiFi network. And increasingly, devices such as gaming consoles, thermostats, home security systems, home appliances, watches, eyeglasses, cars, traffic control systems, and more are being wirelessly connected to the Internet.

From a networking standpoint, the challenges posed by networking these wireless and mobile devices, particularly at the link layer and the network layer, are so different from traditional wired computer networks that an individual chapter devoted to the study of wireless and mobile networks (i.e., this chapter) is appropriate. We'll begin this chapter with a discussion of mobile users, wireless links, and networks, and their relationship to the larger (typically wired) networks to which they connect. We'll draw a distinction between the challenges posed by the wireless nature of the communication links in such networks, and by the mobility that these wireless links enable. Making this important distinction---between wireless and mobility---will allow us to better isolate, identify, and master the key concepts in each area. Note that there are indeed many networked environments in which the network nodes are wireless but not mobile (e.g., wireless home or office networks with stationary workstations and large displays), and that there are limited forms of mobility that do not require wireless links (e.g., a worker who uses a wired laptop at home, shuts down the laptop, drives to work, and attaches the laptop to the company's wired network). Of course, many of the most exciting networked environments are those in which users are both wireless and mobile---for example, a scenario in which a mobile user (say, in the back seat of a car) maintains a Voice-over-IP call and multiple ongoing TCP connections while racing down the autobahn at 160 kilometers per hour, soon in an autonomous vehicle.
It is here, at the intersection of wireless and mobility, that we'll find the most interesting technical challenges!

We'll begin by illustrating the setting in which we'll consider wireless communication and mobility---a network in which wireless (and possibly mobile) users are connected into the larger network infrastructure by a wireless link at the network's edge. We'll then consider the characteristics of this wireless link in Section 7.2. We include a brief introduction to code division multiple access (CDMA), a shared-medium access protocol that is often used in wireless networks, in Section 7.2. In Section 7.3, we'll examine the link-level aspects of the IEEE 802.11 (WiFi) wireless LAN standard in some depth; we'll also say a few words about Bluetooth and other wireless personal area networks. In Section 7.4, we'll provide an overview of cellular Internet access, including 3G and emerging 4G cellular technologies that provide both voice and high-speed Internet access. In Section 7.5, we'll turn our attention to mobility, focusing on the problems of locating a mobile user, routing to the mobile user, and "handing off" the mobile user who dynamically moves from one point of attachment to the network to another. We'll examine how these mobility services are implemented in the mobile IP standard, in enterprise 802.11 networks, and in LTE cellular networks in Sections 7.6 and 7.7, respectively. Finally, we'll consider the impact of wireless links and mobility on transport-layer protocols and networked applications in Section 7.8.

7.1 Introduction

Figure 7.1 shows the setting in which we'll consider the topics of wireless data communication and mobility. We'll begin by keeping our discussion general enough to cover a wide range of networks, including both wireless LANs such as IEEE 802.11 and cellular networks such as a 4G network; we'll drill down into a more detailed discussion of specific wireless architectures in later sections.
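CDMA, mentioned in the chapter roadmap above and developed in Section 7.2, lets multiple senders share the medium at the same time: each sender multiplies every data bit by its own chip sequence (its code), and a receiver recovers one sender's bits by taking the inner product of the aggregate channel signal with that sender's code. The toy sketch below is our own illustration; the two 8-chip codes are hypothetical values chosen only because they are orthogonal, not chips from any deployed CDMA system.

```python
# Hypothetical orthogonal 8-chip codes (rows of a Walsh/Hadamard matrix).
CODE_A = [1,  1,  1,  1,  1,  1,  1,  1]
CODE_B = [1, -1,  1, -1,  1, -1,  1, -1]

def encode(bits, code):
    # Each data bit (represented as +1 or -1) is multiplied by every
    # chip of the sender's code.
    return [b * c for b in bits for c in code]

def decode(signal, code):
    # Recover one sender's bits: inner product of the received signal
    # with that sender's code, normalized by the code length.
    m = len(code)
    return [sum(s * c for s, c in zip(signal[i:i + m], code)) // m
            for i in range(0, len(signal), m)]

# Two senders transmit simultaneously; the channel simply adds their
# chip-level signals.
channel = [x + y for x, y in zip(encode([1, -1], CODE_A),
                                 encode([-1, -1], CODE_B))]

print(decode(channel, CODE_A))   # sender A's bits: [1, -1]
print(decode(channel, CODE_B))   # sender B's bits: [-1, -1]
```

Because the two codes are orthogonal (their inner product is zero), the other sender's contribution cancels exactly in each inner product, which is why each receiver sees only "its" sender's bits.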
We can identify -the following elements in a wireless network: Wireless hosts. As in the -case of wired networks, hosts are the end-system devices that run -applications. A wireless host might be a laptop, tablet, smartphone, or -desktop computer. The hosts themselves may or may not be mobile. - -Figure 7.1 Elements of a wireless network - -Wireless links. A host connects to a base station (defined below) or to -another wireless host through a wireless communication link. Different -wireless link technologies have different - -transmission rates and can transmit over different distances. Figure 7.2 -shows two key characteristics (coverage area and link rate) of the more -popular wireless network standards. (The figure is only meant to provide -a rough idea of these characteristics. For example, some of these types -of networks are only now being deployed, and some link rates can -increase or decrease beyond the values shown depending on distance, -channel conditions, and the number of users in the wireless network.) -We'll cover these standards later in the first half of this chapter; -we'll also consider other wireless link characteristics (such as their -bit error rates and the causes of bit errors) in Section 7.2. In Figure -7.1, wireless links connect wireless hosts located at the edge of the -network into the larger network infrastructure. We hasten to add that -wireless links are also sometimes used within a network to connect -routers, switches, and - -Figure 7.2 Link characteristics of selected wireless network standards - -other network equipment. However, our focus in this chapter will be on -the use of wireless communication at the network edge, as it is here -that many of the most exciting technical challenges, and most of the -growth, are occurring. Base station. The base station is a key part of -the wireless network infrastructure. Unlike the wireless host and -wireless link, a base station has no obvious counterpart in a wired -network. 
A base station is responsible for sending and receiving data (e.g., packets) to and from a wireless host that is associated with that base station. A base station will often be responsible for coordinating the transmission of multiple wireless hosts with which it is associated. When we say a wireless host is "associated" with a base station, we mean that (1) the host is within the wireless communication distance of the base station, and (2) the host uses that base station to relay data between it (the host) and the larger network. Cell towers in cellular networks and access points in 802.11 wireless LANs are examples of base stations. In Figure 7.1, the base station is connected to the larger network (e.g., the Internet, corporate or home network, or telephone network), thus functioning as a link-layer relay between the wireless host and the rest of the world with which the host communicates. Hosts associated with a base station are often referred to as operating in infrastructure mode, since all traditional network services (e.g., address assignment and routing) are provided by the network to which a host is connected via the base station. In ad hoc networks, wireless hosts have no such infrastructure with which to connect. In the absence of such infrastructure, the hosts themselves must provide for services such as routing, address assignment, DNS-like name translation, and more.

CASE HISTORY

PUBLIC WIFI ACCESS: COMING SOON TO A LAMP POST NEAR YOU?

WiFi hotspots---public locations where users can find 802.11 wireless access---are becoming increasingly common in hotels, airports, and cafés around the world. Most college campuses offer ubiquitous wireless access, and it's hard to find a hotel that doesn't offer wireless Internet access. Over the past decade a number of cities have designed, deployed, and operated municipal WiFi networks. The vision of providing ubiquitous WiFi access to the community as a public service (much like streetlights)---helping to bridge the digital divide by providing Internet access to all citizens and to promote economic development---is compelling. Many cities around the world, including Philadelphia, Toronto, Hong Kong, Minneapolis, London, and Auckland, have plans to provide ubiquitous wireless within the city, or have already done so to varying degrees. The goal in Philadelphia was to "turn Philadelphia into the nation's largest WiFi hotspot and help to improve education, bridge the digital divide, enhance neighborhood development, and reduce the costs of government." The ambitious program---an agreement between the city, Wireless Philadelphia (a nonprofit entity), and the Internet Service Provider Earthlink---built an operational network of 802.11b hotspots on streetlamp pole arms and traffic control devices that covered 80 percent of the city. But financial and operational concerns caused the network to be sold to a group of private investors in 2008, who later sold the network back to the city in 2010. Other cities, such as Minneapolis, Toronto, Hong Kong, and Auckland, have had success with smaller-scale efforts. The fact that 802.11 networks operate in the unlicensed spectrum (and hence can be deployed without purchasing expensive spectrum use rights) would seem to make them financially attractive. However, 802.11 access points (see Section 7.3) have much shorter ranges than 4G cellular base stations (see Section 7.4), requiring a larger number of deployed endpoints to cover the same geographic region. Cellular data networks providing Internet access, on the other hand, operate in the licensed spectrum. Cellular providers pay billions of dollars for spectrum access rights for their networks, making cellular data networks a business rather than a municipal undertaking.
When a mobile host moves beyond the range of one base station and into the range of another, it will change its point of attachment into the larger network (i.e., change the base station with which it is associated)---a process referred to as handoff. Such mobility raises many challenging questions. If a host can move, how does one find the mobile host's current location in the network so that data can be forwarded to that mobile host? How is addressing performed, given that a host can be in one of many possible locations? If the host moves during a TCP connection or phone call, how is data routed so that the connection continues uninterrupted? These and many (many!) other questions make wireless and mobile networking an area of exciting networking research.

Network infrastructure. This is the larger network with which a wireless host may wish to communicate.

Having discussed the "pieces" of a wireless network, we note that these pieces can be combined in many different ways to form different types of wireless networks. You may find a taxonomy of these types of wireless networks useful as you read on in this chapter, or read/learn more about wireless networks beyond this book. At the highest level we can classify wireless networks according to two criteria: (i) whether a packet in the wireless network crosses exactly one wireless hop or multiple wireless hops, and (ii) whether there is infrastructure such as a base station in the network:

Single-hop, infrastructure-based. These networks have a base station that is connected to a larger wired network (e.g., the Internet). Furthermore, all communication is between this base station and a wireless host over a single wireless hop. The 802.11 networks you use in the classroom, café, or library, and the 4G LTE data networks that we will learn about shortly, all fall in this category. The vast majority of our daily interactions are with single-hop, infrastructure-based wireless networks.

Single-hop, infrastructure-less. In these networks, there is no base station that is connected to a wired network. However, as we will see, one of the nodes in this single-hop network may coordinate the transmissions of the other nodes. Bluetooth networks (that connect small wireless devices such as keyboards, speakers, and headsets, and which we will study in Section 7.3.6) and 802.11 networks in ad hoc mode are single-hop, infrastructure-less networks.

Multi-hop, infrastructure-based. In these networks, a base station is present that is wired to the larger network. However, some wireless nodes may have to relay their communication through other wireless nodes in order to communicate via the base station. Some wireless sensor networks and so-called wireless mesh networks fall in this category.

Multi-hop, infrastructure-less. There is no base station in these networks, and nodes may have to relay messages among several other nodes in order to reach a destination. Nodes may also be mobile, with connectivity changing among nodes---a class of networks known as mobile ad hoc networks (MANETs). If the mobile nodes are vehicles, the network is a vehicular ad hoc network (VANET). As you might imagine, the development of protocols for such networks is challenging and is the subject of much ongoing research.

In this chapter, we'll mostly confine ourselves to single-hop networks, and then mostly to infrastructure-based networks. Let's now dig deeper into the technical challenges that arise in wireless and mobile networks. We'll begin by first considering the individual wireless link, deferring our discussion of mobility until later in this chapter.

7.2 Wireless Links and Network Characteristics

Let's begin by considering a simple wired network, say a home network, with a wired Ethernet switch (see Section 6.4) interconnecting the hosts.
If we replace the wired Ethernet with a wireless 802.11 network, a wireless network interface would replace the host's wired Ethernet interface, and an access point would replace the Ethernet switch, but virtually no changes would be needed at the network layer or above. This suggests that we focus our attention on the link layer when looking for important differences between wired and wireless networks. Indeed, we can find a number of important differences between a wired link and a wireless link:

Decreasing signal strength. Electromagnetic radiation attenuates as it passes through matter (e.g., a radio signal passing through a wall). Even in free space, the signal will disperse, resulting in decreased signal strength (sometimes referred to as path loss) as the distance between sender and receiver increases.

Interference from other sources. Radio sources transmitting in the same frequency band will interfere with each other. For example, 2.4 GHz wireless phones and 802.11b wireless LANs transmit in the same frequency band. Thus, the 802.11b wireless LAN user talking on a 2.4 GHz wireless phone can expect that neither the network nor the phone will perform particularly well. In addition to interference from transmitting sources, electromagnetic noise within the environment (e.g., a nearby motor, a microwave) can result in interference.

Multipath propagation. Multipath propagation occurs when portions of the electromagnetic wave reflect off objects and the ground, taking paths of different lengths between a sender and receiver. This results in the blurring of the received signal at the receiver. Moving objects between the sender and receiver can cause multipath propagation to change over time. For a detailed discussion of wireless channel characteristics, models, and measurements, see [Anderson 1995].

The discussion above suggests that bit errors will be more common in wireless links than in wired links.
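The free-space dispersion mentioned above is commonly quantified by the standard free-space path-loss formula from radio engineering, FSPL(dB) = 20 log10(4πdf/c). This small sketch (the formula is standard theory, not taken from this chapter) shows the often-quoted consequence that doubling the distance costs about 6 dB:

```python
import math

def free_space_path_loss_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB: 20 * log10(4 * pi * d * f / c)."""
    c = 299_792_458.0  # speed of light in m/s
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / c)

# Doubling the distance adds 20*log10(2) ~ 6.02 dB of loss, at any frequency.
loss_10m = free_space_path_loss_db(10, 2.4e9)
loss_20m = free_space_path_loss_db(20, 2.4e9)
assert abs((loss_20m - loss_10m) - 20 * math.log10(2)) < 1e-9
```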
For this reason, it is perhaps not surprising that wireless link protocols (such as the 802.11 protocol we'll examine in the following section) employ not only powerful CRC error detection codes, but also link-level reliable-data-transfer protocols that retransmit corrupted frames.

Having considered the impairments that can occur on a wireless channel, let's next turn our attention to the host receiving the wireless signal. This host receives an electromagnetic signal that is a combination of a degraded form of the original signal transmitted by the sender (degraded due to the attenuation and multipath propagation effects that we discussed above, among others) and background noise in the environment. The signal-to-noise ratio (SNR) is a relative measure of the strength of the received signal (i.e., the information being transmitted) and this noise. The SNR is typically measured in units of decibels (dB), a unit of measure that some think is used by electrical engineers primarily to confuse computer scientists. The SNR, measured in dB, is twenty times the base-10 logarithm of the ratio of the amplitude of the received signal to the amplitude of the noise. For our purposes here, we need only know that a larger SNR makes it easier for the receiver to extract the transmitted signal from the background noise. Figure 7.3 (adapted from [Holland 2001]) shows the bit error rate (BER)---roughly speaking, the probability that a transmitted bit is received in error at the receiver---versus the SNR for three different modulation techniques for encoding information for transmission on an idealized wireless channel. The theory of modulation and coding, as well as signal extraction and BER, is well beyond the scope of this text (see [Schwartz 1980] for a discussion of these topics).

Figure 7.3 Bit error rate, transmission rate, and SNR

Figure 7.4 Hidden terminal problem caused by obstacle (a) and fading (b)
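The decibel definition above translates directly into code; a quick sketch:

```python
import math

def snr_db(signal_amplitude: float, noise_amplitude: float) -> float:
    """SNR in dB: twenty times the base-10 log of the amplitude ratio."""
    return 20 * math.log10(signal_amplitude / noise_amplitude)

# A received signal with 10x the noise amplitude has an SNR of 20 dB;
# 100x the noise amplitude gives 40 dB.
assert abs(snr_db(10, 1) - 20) < 1e-9
assert abs(snr_db(100, 1) - 40) < 1e-9
```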
Nonetheless, Figure 7.3 illustrates several physical-layer characteristics that are important in understanding higher-layer wireless communication protocols:

For a given modulation scheme, the higher the SNR, the lower the BER. Since a sender can increase the SNR by increasing its transmission power, a sender can decrease the probability that a frame is received in error by increasing its transmission power. Note, however, that there is arguably little practical gain in increasing the power beyond a certain threshold, say to decrease the BER from 10^−12 to 10^−13. There are also disadvantages associated with increasing the transmission power: More energy must be expended by the sender (an important concern for battery-powered mobile users), and the sender's transmissions are more likely to interfere with the transmissions of another sender (see Figure 7.4(b)).

For a given SNR, a modulation technique with a higher bit transmission rate (whether in error or not) will have a higher BER. For example, in Figure 7.3, with an SNR of 10 dB, BPSK modulation with a transmission rate of 1 Mbps has a BER of less than 10^−7, while with QAM16 modulation with a transmission rate of 4 Mbps, the BER is 10^−1, far too high to be practically useful. However, with an SNR of 20 dB, QAM16 modulation has a transmission rate of 4 Mbps and a BER of 10^−7, while BPSK modulation has a transmission rate of only 1 Mbps and a BER that is so low as to be (literally) "off the charts." If one can tolerate a BER of 10^−7, the higher transmission rate offered by QAM16 would make it the preferred modulation technique in this situation. These considerations give rise to the final characteristic, described next.

Dynamic selection of the physical-layer modulation technique can be used to adapt the modulation technique to channel conditions. The SNR (and hence the BER) may change as a result of mobility or due to changes in the environment.
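The first characteristic (higher SNR, lower BER) can be illustrated with the standard closed-form BER of BPSK on an additive-white-Gaussian-noise channel, a textbook result from communication theory; the exact curves of Figure 7.3 differ, but the qualitative behavior is the same:

```python
import math

def bpsk_ber(snr_db: float) -> float:
    """Theoretical BPSK bit error rate on an AWGN channel:
    BER = 0.5 * erfc(sqrt(snr)), where snr is the linear (not dB) ratio."""
    snr_linear = 10 ** (snr_db / 10)
    return 0.5 * math.erfc(math.sqrt(snr_linear))

# The BER falls steeply as the SNR rises.
assert bpsk_ber(10) < bpsk_ber(5) < bpsk_ber(0) < 0.5
```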
Adaptive modulation and coding are used in cellular data systems and in the 802.11 WiFi and 4G cellular data networks that we'll study in Sections 7.3 and 7.4. This allows, for example, the selection of a modulation technique that provides the highest transmission rate possible subject to a constraint on the BER, for given channel characteristics.

A higher and time-varying bit error rate is not the only difference between a wired and wireless link. Recall that in the case of wired broadcast links, all nodes receive the transmissions from all other nodes. In the case of wireless links, the situation is not as simple, as shown in Figure 7.4. Suppose that Station A is transmitting to Station B. Suppose also that Station C is transmitting to Station B. With the so-called hidden terminal problem, physical obstructions in the environment (for example, a mountain or a building) may prevent A and C from hearing each other's transmissions, even though A's and C's transmissions are indeed interfering at the destination, B. This is shown in Figure 7.4(a). A second scenario that results in undetectable collisions at the receiver results from the fading of a signal's strength as it propagates through the wireless medium. Figure 7.4(b) illustrates the case where A and C are placed such that their signals are not strong enough to detect each other's transmissions, yet their signals are strong enough to interfere with each other at station B. As we'll see in Section 7.3, the hidden terminal problem and fading make multiple access in a wireless network considerably more complex than in a wired network.

7.2.1 CDMA

Recall from Chapter 6 that when hosts communicate over a shared medium, a protocol is needed so that the signals sent by multiple senders do not interfere at the receivers. In Chapter 6 we described three classes of medium access protocols: channel partitioning, random access, and taking turns.
Code division multiple access (CDMA) belongs to the family of channel partitioning protocols. It is prevalent in wireless LAN and cellular technologies. Because CDMA is so important in the wireless world, we'll take a quick look at CDMA now, before getting into specific wireless access technologies in the subsequent sections.

In a CDMA protocol, each bit being sent is encoded by multiplying the bit by a signal (the code) that changes at a much faster rate (known as the chipping rate) than the original sequence of data bits. Figure 7.5 shows a simple, idealized CDMA encoding/decoding scenario. Suppose that the rate at which original data bits reach the CDMA encoder defines the unit of time; that is, each original data bit to be transmitted requires a one-bit slot time. Let d_i be the value of the data bit for the ith bit slot. For mathematical convenience, we represent a data bit with a 0 value as −1. Each bit slot is further subdivided into M mini-slots; in Figure 7.5, M = 8, although in practice M is much larger. The CDMA code used by the sender consists of a sequence of M values, c_m, m = 1, ..., M, each taking a +1 or −1 value. In the example in Figure 7.5, the M-bit CDMA code being used by the sender is (1, 1, 1, −1, 1, −1, −1, −1).

Figure 7.5 A simple CDMA example: Sender encoding, receiver decoding

To illustrate how CDMA works, let us focus on the ith data bit, d_i. For the mth mini-slot of the bit-transmission time of d_i, the output of the CDMA encoder, Z_{i,m}, is the value of d_i multiplied by the mth bit in the assigned CDMA code, c_m:

$$Z_{i,m} = d_i \cdot c_m \qquad (7.1)$$

In a simple world, with no interfering senders, the receiver would receive the encoded bits, Z_{i,m}, and recover the original data bit, d_i, by computing:

$$d_i = \frac{1}{M} \sum_{m=1}^{M} Z_{i,m} \cdot c_m \qquad (7.2)$$

The reader might want to work through the details of the example in Figure 7.5 to see that the original data bits are indeed correctly recovered at the receiver using Equation 7.2.
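Equations 7.1 and 7.2 are easy to work through in code. This sketch uses the 8-chip code from Figure 7.5; data bits are represented as +1/−1, as in the text:

```python
CODE = (1, 1, 1, -1, 1, -1, -1, -1)  # the sender's M-bit CDMA code (M = 8)

def cdma_encode(data_bits, code=CODE):
    """Equation 7.1: each data bit d_i (+1/-1) becomes M chips Z_{i,m} = d_i * c_m."""
    return [d * c for d in data_bits for c in code]

def cdma_decode(chips, code=CODE):
    """Equation 7.2: d_i = (1/M) * sum over m of Z_{i,m} * c_m, one bit per M chips."""
    M = len(code)
    return [sum(z * c for z, c in zip(chips[i:i + M], code)) // M
            for i in range(0, len(chips), M)]

data = [1, -1]  # a 1 bit followed by a 0 bit (0 is encoded as -1)
assert cdma_decode(cdma_encode(data)) == data
```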
The world is far from ideal, however, and as noted above, CDMA must work in the presence of interfering senders that are encoding and transmitting their data using a different assigned code. But how can a CDMA receiver recover a sender's original data bits when those data bits are being tangled with bits being transmitted by other senders? CDMA works under the assumption that the interfering transmitted bit signals are additive. This means, for example, that if three senders send a 1 value, and a fourth sender sends a −1 value during the same mini-slot, then the received signal at all receivers during that mini-slot is a 2 (since 1 + 1 + 1 − 1 = 2). In the presence of multiple senders, sender s computes its encoded transmissions, Z^s_{i,m}, in exactly the same manner as in Equation 7.1. The value received at a receiver during the mth mini-slot of the ith bit slot, however, is now the sum of the transmitted bits from all N senders during that mini-slot:

$$Z^*_{i,m} = \sum_{s=1}^{N} Z^s_{i,m}$$

Amazingly, if the senders' codes are chosen carefully, each receiver can recover the data sent by a given sender out of the aggregate signal simply by using the sender's code in exactly the same manner as in Equation 7.2:

$$d_i = \frac{1}{M} \sum_{m=1}^{M} Z^*_{i,m} \cdot c_m \qquad (7.3)$$

as shown in Figure 7.6, for a two-sender CDMA example. The M-bit CDMA code being used by the upper sender is (1, 1, 1, −1, 1, −1, −1, −1), while the CDMA code being used by the lower sender is (1, −1, 1, 1, 1, −1, 1, 1). Figure 7.6 illustrates a receiver recovering the original data bits from the upper sender. Note that the receiver is able to extract the data from sender 1 in spite of the interfering transmission from sender 2. Recall our cocktail analogy from Chapter 6. A CDMA protocol is similar to having partygoers speaking in multiple languages; in such circumstances humans are actually quite good at locking into the conversation in the language they understand, while filtering out the remaining conversations.
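Equation 7.3 can be checked the same way. The sketch below uses the two codes from Figure 7.6; because these two codes happen to be orthogonal, each sender's bits are recovered exactly from the additive channel:

```python
CODE1 = (1, 1, 1, -1, 1, -1, -1, -1)  # upper sender's code (Figure 7.6)
CODE2 = (1, -1, 1, 1, 1, -1, 1, 1)    # lower sender's code (Figure 7.6)

def encode(bits, code):
    """Equation 7.1, applied per sender: Z^s_{i,m} = d_i * c_m."""
    return [d * c for d in bits for c in code]

def decode(channel, code):
    """Equation 7.3: d_i = (1/M) * sum over m of Z*_{i,m} * c_m."""
    M = len(code)
    return [sum(z * c for z, c in zip(channel[i:i + M], code)) // M
            for i in range(0, len(channel), M)]

bits1, bits2 = [1, -1], [-1, -1]
# The channel adds the two senders' chips mini-slot by mini-slot.
channel = [a + b for a, b in zip(encode(bits1, CODE1), encode(bits2, CODE2))]

assert decode(channel, CODE1) == bits1  # sender 1 recovered despite interference
assert decode(channel, CODE2) == bits2  # and so is sender 2
```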
Figure 7.6 A two-sender CDMA example

We see here that CDMA is a partitioning protocol in that it partitions the codespace (as opposed to time or frequency) and assigns each node a dedicated piece of the codespace. Our discussion here of CDMA is necessarily brief; in practice a number of difficult issues must be addressed. First, in order for the CDMA receivers to be able to extract a particular sender's signal, the CDMA codes must be carefully chosen. Second, our discussion has assumed that the received signal strengths from various senders are the same; in reality this can be difficult to achieve. There is a considerable body of literature addressing these and other issues related to CDMA; see [Pickholtz 1982; Viterbi 1995] for details.

7.3 WiFi: 802.11 Wireless LANs

Pervasive in the workplace, the home, educational institutions, cafés, airports, and street corners, wireless LANs are now one of the most important access network technologies in the Internet today. Although many technologies and standards for wireless LANs were developed in the 1990s, one particular class of standards has clearly emerged as the winner: the IEEE 802.11 wireless LAN, also known as WiFi. In this section, we'll take a close look at 802.11 wireless LANs, examining their frame structure, their medium access protocol, and the internetworking of 802.11 LANs with wired Ethernet LANs. There are several 802.11 standards for wireless LAN technology in the IEEE 802.11 ("WiFi") family, as summarized in Table 7.1. The different 802.11 standards all share some common characteristics. They all use the same medium access protocol, CSMA/CA, which we'll discuss shortly. They all use the same frame structure for their link-layer frames as well. They all have the ability to reduce their transmission rate in order to reach out over greater distances. And, importantly, 802.11 products are also all backwards compatible, meaning, for example, that a mobile capable only of 802.11g may still interact with a newer 802.11ac base station.

However, as shown in Table 7.1, the standards have some major differences at the physical layer. 802.11 devices operate in two different frequency ranges: 2.4–2.485 GHz (referred to as the 2.4 GHz range) and 5.1–5.8 GHz (referred to as the 5 GHz range). The 2.4 GHz range is an unlicensed frequency band, where 802.11 devices may compete for frequency spectrum with 2.4 GHz phones and microwave ovens. At 5 GHz, 802.11 LANs have a shorter transmission distance for a given power level and suffer more from multipath propagation. The two most recent standards, 802.11n [IEEE 802.11n 2012] and 802.11ac [IEEE 802.11ac 2013; Cisco 802.11ac 2015], use multiple-input multiple-output (MIMO) antennas; i.e., two or more antennas on the sending side and two or more antennas on the receiving side that are transmitting/receiving different signals [Diggavi 2004]. 802.11ac base stations may transmit to multiple stations simultaneously, and use "smart" antennas to adaptively beamform to target transmissions in the direction of a receiver. This decreases interference and increases the distance reached at a given data rate. The data rates shown in Table 7.1 are for an idealized environment, e.g., a receiver placed 1 meter away from the base station, with no interference---a scenario that we're unlikely to experience in practice! So as the saying goes, YMMV: Your Mileage (or in this case your wireless data rate) May Vary.

Table 7.1 Summary of IEEE 802.11 standards

| Standard | Frequency Range | Data Rate |
|----------|-----------------|-----------|
| 802.11b  | 2.4 GHz | up to 11 Mbps |
| 802.11a  | 5 GHz | up to 54 Mbps |
| 802.11g  | 2.4 GHz | up to 54 Mbps |
| 802.11n  | 2.4 GHz and 5 GHz | up to 450 Mbps |
| 802.11ac | 5 GHz | up to 1300 Mbps |
7.3.1 The 802.11 Architecture

Figure 7.7 illustrates the principal components of the 802.11 wireless LAN architecture. The fundamental building block of the 802.11 architecture is the basic service set (BSS). A BSS contains one or more wireless stations and a central base station, known as an access point (AP) in 802.11 parlance. Figure 7.7 shows the AP in each of two BSSs connecting to an interconnection device (such as a switch or router), which in turn leads to the Internet. In a typical home network, there is one AP and one router (typically integrated together as one unit) that connects the BSS to the Internet. As with Ethernet devices, each 802.11 wireless station has a 6-byte MAC address that is stored in the firmware of the station's adapter (that is, 802.11 network interface card). Each AP also has a MAC address for its wireless interface. As with Ethernet, these MAC addresses are administered by IEEE and are (in theory) globally unique.

Figure 7.7 IEEE 802.11 LAN architecture

As noted in Section 7.1, wireless LANs that deploy APs are often referred to as infrastructure wireless LANs, with the "infrastructure" being the APs along with the wired Ethernet infrastructure that interconnects the APs and a router. Figure 7.8 shows that IEEE 802.11 stations can also group themselves together to form an ad hoc network---a network with no central control and with no connections to the "outside world." Here, the network is formed "on the fly," by mobile devices that have found themselves in proximity to each other, that have a need to communicate, and that find no preexisting network infrastructure in their location. An ad hoc network might be formed when people with laptops get together (for example, in a conference room, a train, or a car) and want to exchange data in the absence of a centralized AP.

Figure 7.8 An IEEE 802.11 ad hoc network

There has been tremendous interest in ad hoc networking, as communicating portable devices continue to proliferate. In this section, though, we'll focus our attention on infrastructure wireless LANs.

Channels and Association

In 802.11, each wireless station needs to associate with an AP before it can send or receive network-layer data. Although all of the 802.11 standards use association, we'll discuss this topic specifically in the context of IEEE 802.11b/g. When a network administrator installs an AP, the administrator assigns a one- or two-word Service Set Identifier (SSID) to the access point. (When you choose Wi-Fi under Settings on your iPhone, for example, a list is displayed showing the SSID of each AP in range.) The administrator must also assign a channel number to the AP. To understand channel numbers, recall that 802.11 operates in the frequency range of 2.4 GHz to 2.4835 GHz. Within this 85 MHz band, 802.11 defines 11 partially overlapping channels. Any two channels are non-overlapping if and only if they are separated by four or more channels. In particular, the set of channels 1, 6, and 11 is the only set of three non-overlapping channels. This means that an administrator could create a wireless LAN with an aggregate maximum transmission rate of 33 Mbps by installing three 802.11b APs at the same physical location, assigning channels 1, 6, and 11 to the APs, and interconnecting each of the APs with a switch.

Now that we have a basic understanding of 802.11 channels, let's describe an interesting (and not completely uncommon) situation---that of a WiFi jungle. A WiFi jungle is any physical location where a wireless station receives a sufficiently strong signal from two or more APs. For example, in many cafés in New York City, a wireless station can pick up a signal from numerous nearby APs. One of the APs might be managed by the café, while the other APs might be in residential apartments near the café.
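As an aside, the claim above that channels 1, 6, and 11 form the only non-overlapping triple can be verified by brute force if we read "separated by four or more channels" as having at least four channel numbers in between (i.e., a numeric difference of at least five):

```python
from itertools import combinations

# Channels 1..11; two channels are treated as non-overlapping when at least
# four channel numbers lie between them (their numbers differ by >= 5).
non_overlapping_triples = [
    t for t in combinations(range(1, 12), 3)
    if all(b - a >= 5 for a, b in zip(t, t[1:]))
]
assert non_overlapping_triples == [(1, 6, 11)]
```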
Each of these APs would likely be located in a different IP subnet and would have been independently assigned a channel. Now suppose you enter such a WiFi jungle with your phone, tablet, or laptop, seeking wireless Internet access and a blueberry muffin. Suppose there are five APs in the WiFi jungle. To gain Internet access, your wireless device needs to join exactly one of the subnets and hence needs to associate with exactly one of the APs. Associating means the wireless device creates a virtual wire between itself and the AP. Specifically, only the associated AP will send data frames (that is, frames containing data, such as a datagram) to your wireless device, and your wireless device will send data frames into the Internet only through the associated AP. But how does your wireless device associate with a particular AP? And more fundamentally, how does your wireless device know which APs, if any, are out there in the jungle?

The 802.11 standard requires that an AP periodically send beacon frames, each of which includes the AP's SSID and MAC address. Your wireless device, knowing that APs are sending out beacon frames, scans the 11 channels, seeking beacon frames from any APs that may be out there (some of which may be transmitting on the same channel---it's a jungle out there!). Having learned about available APs from the beacon frames, you (or your wireless device) select one of the APs for association. The 802.11 standard does not specify an algorithm for selecting which of the available APs to associate with; that algorithm is left up to the designers of the 802.11 firmware and software in your wireless device. Typically, the device chooses the AP whose beacon frame is received with the highest signal strength. While a high signal strength is good (see, e.g., Figure 7.3), signal strength is not the only AP characteristic that will determine the performance a device receives. In particular, it's possible that the selected AP may have a strong signal, but may be overloaded with other affiliated devices (that will need to share the wireless bandwidth at that AP), while an unloaded AP is not selected due to a slightly weaker signal. A number of alternative ways of choosing APs have thus recently been proposed [Vasudevan 2005; Nicholson 2006; Sundaresan 2006]. For an interesting and down-to-earth discussion of how signal strength is measured, see [Bardwell 2004].

The process of scanning channels and listening for beacon frames is known as passive scanning (see Figure 7.9a). A wireless device can also perform active scanning, by broadcasting a probe frame that will be received by all APs within the wireless device's range, as shown in Figure 7.9b. APs respond to the probe request frame with a probe response frame. The wireless device can then choose the AP with which to associate from among the responding APs.

Figure 7.9 Active and passive scanning for access points

After selecting the AP with which to associate, the wireless device sends an association request frame to the AP, and the AP responds with an association response frame. Note that this second request/response handshake is needed with active scanning, since an AP responding to the initial probe request frame doesn't know which of the (possibly many) responding APs the device will choose to associate with, in much the same way that a DHCP client can choose from among multiple DHCP servers (see Figure 4.21). Once associated with an AP, the device will want to join the subnet (in the IP addressing sense of Section 4.3.3) to which the AP belongs. Thus, the device will typically send a DHCP discovery message (see Figure 4.21) into the subnet via the AP in order to obtain an IP address on the subnet. Once the address is obtained, the rest of the world then views that device simply as another host with an IP address in that subnet.
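The default association policy described above (choose the AP whose beacon arrives with the highest signal strength) is simple to sketch. The scan results and signal-strength values below are hypothetical, and, as the text notes, a real device might do better by also considering AP load:

```python
# Hypothetical passive-scan results: (SSID, channel, received signal in dBm).
scan_results = [
    ("CafeWiFi",   6, -48),
    ("Apt4B",     11, -71),
    ("LibraryNet", 1, -60),
]

def choose_ap(beacons):
    """Typical default policy: associate with the strongest beacon.
    (This ignores AP load, which is why it is not always the best choice.)"""
    return max(beacons, key=lambda beacon: beacon[2])

ssid, channel, rssi = choose_ap(scan_results)
assert ssid == "CafeWiFi"  # -48 dBm is the strongest signal of the three
```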
In order to create an association with a particular AP, the wireless device may be required to authenticate itself to the AP. 802.11 wireless LANs provide a number of alternatives for authentication and access. One approach, used by many companies, is to permit access to a wireless network based on a device's MAC address. A second approach, used by many Internet cafés, employs usernames and passwords. In both cases, the AP typically communicates with an authentication server, relaying information between the wireless device and the authentication server using a protocol such as RADIUS \[RFC 2865\] or DIAMETER \[RFC 3588\]. Separating the authentication server from the AP allows one authentication server to serve many APs, centralizing the (often sensitive) decisions of authentication and access within the single server, and keeping AP costs and complexity low. We'll see in Chapter 8 that the new IEEE 802.11i protocol defining security aspects of the 802.11 protocol family takes precisely this approach.

7.3.2 The 802.11 MAC Protocol

Once a wireless device is associated with an AP, it can start sending and receiving data frames to and from the access point. But because multiple wireless devices, or the AP itself, may want to transmit data frames at the same time over the same channel, a multiple access protocol is needed to coordinate the transmissions. In the following, we'll refer to the devices or the AP as wireless "stations" that share the multiple access channel. As discussed in Chapter 6 and Section 7.2.1, broadly speaking there are three classes of multiple access protocols: channel partitioning (including CDMA), random access, and taking turns. Inspired by the huge success of Ethernet and its random access protocol, the designers of 802.11 chose a random access protocol for 802.11 wireless LANs. This random access protocol is referred to as CSMA with collision avoidance, or more succinctly as CSMA/CA.
As with Ethernet's CSMA/CD, the "CSMA" in CSMA/CA stands for "carrier sense multiple access," meaning that each station senses the channel before transmitting, and refrains from transmitting when the channel is sensed busy. Although both Ethernet and 802.11 use carrier-sensing random access, the two MAC protocols have important differences. First, instead of using collision detection, 802.11 uses collision-avoidance techniques. Second, because of the relatively high bit error rates of wireless channels, 802.11 (unlike Ethernet) uses a link-layer acknowledgment/retransmission (ARQ) scheme. We'll describe 802.11's collision-avoidance and link-layer acknowledgment schemes below.

Recall from Sections 6.3.2 and 6.4.2 that with Ethernet's collision-detection algorithm, an Ethernet station listens to the channel as it transmits. If, while transmitting, it detects that another station is also transmitting, it aborts its transmission and tries to transmit again after waiting a small, random amount of time. Unlike the 802.3 Ethernet protocol, the 802.11 MAC protocol does not implement collision detection. There are two important reasons for this:

The ability to detect collisions requires the ability to send (the station's own signal) and receive (to determine whether another station is also transmitting) at the same time. Because the strength of the received signal is typically very small compared to the strength of the transmitted signal at the 802.11 adapter, it is costly to build hardware that can detect a collision.

More importantly, even if the adapter could transmit and listen at the same time (and presumably abort transmission when it senses a busy channel), the adapter would still not be able to detect all collisions, due to the hidden terminal problem and fading, as discussed in Section 7.2.
Because 802.11 wireless LANs do not use collision detection, once a station begins to transmit a frame, it transmits the frame in its entirety; that is, once a station gets started, there is no turning back. As one might expect, transmitting entire frames (particularly long frames) when collisions are prevalent can significantly degrade a multiple access protocol's performance. In order to reduce the likelihood of collisions, 802.11 employs several collision-avoidance techniques, which we'll shortly discuss.

Before considering collision avoidance, however, we'll first need to examine 802.11's link-layer acknowledgment scheme. Recall from Section 7.2 that when a station in a wireless LAN sends a frame, the frame may not reach the destination station intact for a variety of reasons. To deal with this non-negligible chance of failure, the 802.11 MAC protocol uses link-layer acknowledgments. As shown in Figure 7.10, when the destination station receives a frame that passes the CRC, it waits a short period of time known as the Short Inter-frame Spacing (SIFS) and then sends back an acknowledgment frame. If the transmitting station does not receive an acknowledgment within a given amount of time, it assumes that an error has occurred and retransmits the frame, using the CSMA/CA protocol to access the channel. If an acknowledgment is not received after some fixed number of retransmissions, the transmitting station gives up and discards the frame.

Figure 7.10 802.11 uses link-layer acknowledgments

Having discussed how 802.11 uses link-layer acknowledgments, we're now in a position to describe the 802.11 CSMA/CA protocol. Suppose that a station (wireless device or an AP) has a frame to transmit.

1. If initially the station senses the channel idle, it transmits its frame after a short period of time known as the Distributed Inter-frame Space (DIFS); see Figure 7.10.

2.
Otherwise, the station chooses a random backoff value using binary exponential backoff (as we encountered in Section 6.3.2) and counts down this value after DIFS when the channel is sensed idle. While the channel is sensed busy, the counter value remains frozen.

3. When the counter reaches zero (note that this can only occur while the channel is sensed idle), the station transmits the entire frame and then waits for an acknowledgment.

4. If an acknowledgment is received, the transmitting station knows that its frame has been correctly received at the destination station. If the station has another frame to send, it begins the CSMA/CA protocol at step 2. If the acknowledgment isn't received, the transmitting station reenters the backoff phase in step 2, with the random value chosen from a larger interval.

Recall that under Ethernet's CSMA/CD multiple access protocol (Section 6.3.2), a station begins transmitting as soon as the channel is sensed idle. With CSMA/CA, however, the station refrains from transmitting while counting down, even when it senses the channel to be idle. Why do CSMA/CD and CSMA/CA take such different approaches here? To answer this question, let's consider a scenario in which two stations each have a data frame to transmit, but neither station transmits immediately because each senses that a third station is already transmitting. With Ethernet's CSMA/CD, the two stations would each transmit as soon as they detect that the third station has finished transmitting. This would cause a collision, which isn't a serious issue in CSMA/CD, since both stations would abort their transmissions and thus avoid the useless transmissions of the remainders of their frames. In 802.11, however, the situation is quite different. Because 802.11 does not detect a collision and abort transmission, a frame suffering a collision will be transmitted in its entirety.
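The countdown behavior in the backoff steps above can be sketched as a toy, slot-based model in Python. This is purely illustrative: real 802.11 backoff operates on microsecond-scale slot times with a contention window that doubles on each retransmission, and the function name and busy-pattern input here are assumptions, not any real API.

```python
def slots_until_transmit(backoff, busy_pattern):
    """Return the index of the slot in which the station transmits.

    backoff: the random backoff value chosen by the station
    busy_pattern: one boolean per slot, True meaning the channel is sensed busy
    Returns None if the counter never reaches zero in the observed window.
    """
    counter = backoff
    for slot, busy in enumerate(busy_pattern):
        if busy:
            continue           # counter stays frozen while the channel is busy
        if counter == 0:
            return slot        # counter hit zero on an idle slot: transmit now
        counter -= 1           # count down only during idle slots
    return None

# With an all-idle channel, a backoff of 3 lets the station send in slot 3;
# intervening busy slots freeze the counter and push the transmission later.
```

Note how the counter decrements only in idle slots: this is the "refrain from transmitting while counting down" behavior that distinguishes CSMA/CA from CSMA/CD.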
The goal in 802.11 is thus to avoid collisions whenever possible. In 802.11, if the two stations sense the channel busy, they both immediately enter random backoff, hopefully choosing different backoff values. If these values are indeed different, once the channel becomes idle, one of the two stations will begin transmitting before the other, and (if the two stations are not hidden from each other) the "losing station" will hear the "winning station's" signal, freeze its counter, and refrain from transmitting until the winning station has completed its transmission. In this manner, a costly collision is avoided. Of course, collisions can still occur with 802.11 in this scenario: The two stations could be hidden from each other, or the two stations could choose random backoff values that are close enough that the transmission from the station starting first has yet to reach the second station. Recall that we encountered this problem earlier in our discussion of random access algorithms in the context of Figure 6.12.

Dealing with Hidden Terminals: RTS and CTS

The 802.11 MAC protocol also includes a nifty (but optional) reservation scheme that helps avoid collisions even in the presence of hidden terminals. Let's investigate this scheme in the context of Figure 7.11, which shows two wireless stations and one access point. Both of the wireless stations are within range of the AP (whose coverage is shown as a shaded circle) and both have associated with the AP. However, due to fading, the signal ranges of wireless stations are limited to the interiors of the shaded circles shown in Figure 7.11. Thus, each of the wireless stations is hidden from the other, although neither is hidden from the AP.

Let's now consider why hidden terminals can be problematic. Suppose Station H1 is transmitting a frame and halfway through H1's transmission, Station H2 wants to send a frame to the AP.
H2, not hearing the transmission from H1, will first wait a DIFS interval and then transmit the frame, resulting in a collision. The channel will therefore be wasted during the entire period of H1's transmission as well as during H2's transmission.

Figure 7.11 Hidden terminal example: H1 is hidden from H2, and vice versa

In order to avoid this problem, the IEEE 802.11 protocol allows a station to use a short Request to Send (RTS) control frame and a short Clear to Send (CTS) control frame to reserve access to the channel. When a sender wants to send a DATA frame, it can first send an RTS frame to the AP, indicating the total time required to transmit the DATA frame and the acknowledgment (ACK) frame. When the AP receives the RTS frame, it responds by broadcasting a CTS frame. This CTS frame serves two purposes: It gives the sender explicit permission to send and also instructs the other stations not to send for the reserved duration. Thus, in Figure 7.12, before transmitting a DATA frame, H1 first broadcasts an RTS frame, which is heard by all stations in its circle, including the AP. The AP then responds with a CTS frame, which is heard by all stations within its range, including H1 and H2. Station H2, having heard the CTS, refrains from transmitting for the time specified in the CTS frame. The RTS, CTS, DATA, and ACK frames are shown in Figure 7.12.

Figure 7.12 Collision avoidance using the RTS and CTS frames

The use of the RTS and CTS frames can improve performance in two important ways: The hidden station problem is mitigated, since a long DATA frame is transmitted only after the channel has been reserved. Because the RTS and CTS frames are short, a collision involving an RTS or CTS frame will last only for the duration of the short RTS or CTS frame. Once the RTS and CTS frames are correctly transmitted, the following DATA and ACK frames should be transmitted without collisions.
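The key property of the RTS/CTS reservation is that a hidden terminal defers based on the AP's CTS alone, even though it never heard the sender's RTS. A toy sketch of that deferral, with illustrative class and field names (the 2.0 ms reservation is an arbitrary example value, not a value from the standard):

```python
class Station:
    """A station that honors channel reservations it overhears."""
    def __init__(self, name):
        self.name = name
        self.defer_until = 0.0   # time before which this station stays silent

    def hear_cts(self, now, reserved_duration):
        # A station hearing the CTS refrains from sending for the reserved
        # period, even if (like a hidden terminal) it never heard the RTS.
        self.defer_until = max(self.defer_until, now + reserved_duration)

    def may_transmit(self, now):
        return now >= self.defer_until

h1, h2 = Station("H1"), Station("H2")

# H1's RTS asks the AP to reserve the channel long enough for its DATA
# frame plus the ACK (say 2.0 ms). The AP's CTS reaches both shaded
# circles, so hidden station H2 defers despite never hearing H1's RTS:
h2.hear_cts(now=0.0, reserved_duration=2.0)
```

After the exchange, `h2.may_transmit(0.5)` is false (the reservation is in force) while `h2.may_transmit(2.5)` is true (the reservation has expired), which is exactly the behavior of H2 in Figure 7.12.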
You are encouraged to check out the 802.11 applet on the textbook's Web site. This interactive applet illustrates the CSMA/CA protocol, including the RTS/CTS exchange sequence.

Although the RTS/CTS exchange can help reduce collisions, it also introduces delay and consumes channel resources. For this reason, the RTS/CTS exchange is only used (if at all) to reserve the channel for the transmission of a long DATA frame. In practice, each wireless station can set an RTS threshold such that the RTS/CTS sequence is used only when the frame is longer than the threshold. For many wireless stations, the default RTS threshold value is larger than the maximum frame length, so the RTS/CTS sequence is skipped for all DATA frames sent.

Using 802.11 as a Point-to-Point Link

Our discussion so far has focused on the use of 802.11 in a multiple access setting. We should mention that if two nodes each have a directional antenna, they can point their directional antennas at each other and run the 802.11 protocol over what is essentially a point-to-point link. Given the low cost of commodity 802.11 hardware, the use of directional antennas and an increased transmission power allow 802.11 to be used as an inexpensive means of providing wireless point-to-point connections over distances of tens of kilometers. \[Raman 2007\] describes one of the first such multi-hop wireless networks, operating in the rural Ganges plains in India using point-to-point 802.11 links.

7.3.3 The IEEE 802.11 Frame

Although the 802.11 frame shares many similarities with an Ethernet frame, it also contains a number of fields that are specific to its use for wireless links. The 802.11 frame is shown in Figure 7.13. The numbers above each of the fields in the frame represent the lengths of the fields in bytes; the numbers above each of the subfields in the frame control field represent the lengths of the subfields in bits.
Let's now examine the fields in the frame as well as some of the more important subfields in the frame's control field.

Figure 7.13 The 802.11 frame

Payload and CRC Fields

At the heart of the frame is the payload, which typically consists of an IP datagram or an ARP packet. Although the field is permitted to be as long as 2,312 bytes, it is typically fewer than 1,500 bytes, holding an IP datagram or an ARP packet. As with an Ethernet frame, an 802.11 frame includes a 32-bit cyclic redundancy check (CRC) so that the receiver can detect bit errors in the received frame. As we've seen, bit errors are much more common in wireless LANs than in wired LANs, so the CRC is even more useful here.

Address Fields

Perhaps the most striking difference in the 802.11 frame is that it has four address fields, each of which can hold a 6-byte MAC address. But why four address fields? Doesn't a source MAC field and destination MAC field suffice, as they do for Ethernet? It turns out that three address fields are needed for internetworking purposes---specifically, for moving the network-layer datagram from a wireless station through an AP to a router interface. The fourth address field is used when APs forward frames to each other in ad hoc mode. Since we are only considering infrastructure networks here, let's focus our attention on the first three address fields. The 802.11 standard defines these fields as follows:

Address 2 is the MAC address of the station that transmits the frame. Thus, if a wireless station transmits the frame, that station's MAC address is inserted in the address 2 field. Similarly, if an AP transmits the frame, the AP's MAC address is inserted in the address 2 field.

Address 1 is the MAC address of the wireless station that is to receive the frame. Thus, if a mobile wireless station transmits the frame, address 1 contains the MAC address of the destination AP.
-Similarly, if an AP transmits the frame, address 1 contains the MAC -address of the destination wireless station. - -Figure 7.14 The use of address fields in 802.11 frames: Sending frames -between H1 and R1 - -To understand address 3, recall that the BSS (consisting of the AP and -wireless stations) is part of a subnet, and that this subnet connects to -other subnets via some router interface. Address 3 contains the MAC -address of this router interface. To gain further insight into the -purpose of address 3, let's walk through an internetworking example in -the context of Figure 7.14. In this figure, there are two APs, each of -which is responsible for a number of wireless stations. Each of the APs -has a direct connection to a router, which in turn connects to the -global Internet. We should keep in mind that an AP is a link-layer -device, and thus neither "speaks" IP nor understands IP addresses. -Consider now moving a datagram from the router interface R1 to the -wireless Station H1. The router is not aware that there is an AP between -it and H1; from the router's perspective, H1 is just a host in one of -the subnets to which it (the router) is connected. The router, which -knows the IP address of H1 (from the destination address of the -datagram), uses ARP to determine the MAC address of H1, just as in an -ordinary Ethernet LAN. After obtaining H1's MAC address, router -interface R1 encapsulates the datagram within an Ethernet frame. The -source address field of this frame contains R1's MAC address, and the -destination address field contains H1's MAC address. When the Ethernet -frame arrives at the AP, the AP converts the 802.3 Ethernet frame to an -802.11 frame before transmitting the frame into the wireless channel. -The AP fills in address 1 and address 2 with H1's MAC address and its -own MAC address, respectively, as described above. For address 3, the AP -inserts the MAC address of R1. 
In this manner, H1 can determine (from address 3) the MAC address of the router interface that sent the datagram into the subnet.

Now consider what happens when the wireless station H1 responds by moving a datagram from H1 to R1. H1 creates an 802.11 frame, filling the fields for address 1 and address 2 with the AP's MAC address and H1's MAC address, respectively, as described above. For address 3, H1 inserts R1's MAC address. When the AP receives the 802.11 frame, it converts the frame to an Ethernet frame. The source address field for this frame is H1's MAC address, and the destination address field is R1's MAC address. Thus, address 3 allows the AP to determine the appropriate destination MAC address when constructing the Ethernet frame. In summary, address 3 plays a crucial role for internetworking the BSS with a wired LAN.

Sequence Number, Duration, and Frame Control Fields

Recall that in 802.11, whenever a station correctly receives a frame from another station, it sends back an acknowledgment. Because acknowledgments can get lost, the sending station may send multiple copies of a given frame. As we saw in our discussion of the rdt2.1 protocol (Section 3.4.1), the use of sequence numbers allows the receiver to distinguish between a newly transmitted frame and the retransmission of a previous frame. The sequence number field in the 802.11 frame thus serves exactly the same purpose here at the link layer as it did in the transport layer in Chapter 3.

Recall that the 802.11 protocol allows a transmitting station to reserve the channel for a period of time that includes the time to transmit its data frame and the time to transmit an acknowledgment. This duration value is included in the frame's duration field (both for data frames and for the RTS and CTS frames). As shown in Figure 7.13, the frame control field includes many subfields.
We'll say just a few words about some of the more important subfields; for a more complete discussion, you are encouraged to consult the 802.11 specification \[Held 2001; Crow 1997; IEEE 802.11 1999\]. The type and subtype fields are used to distinguish the association, RTS, CTS, ACK, and data frames. The to and from fields are used to define the meanings of the different address fields. (These meanings change depending on whether ad hoc or infrastructure modes are used and, in the case of infrastructure mode, whether a wireless station or an AP is sending the frame.) Finally, the WEP field indicates whether encryption is being used or not (WEP is discussed in Chapter 8).

7.3.4 Mobility in the Same IP Subnet

In order to increase the physical range of a wireless LAN, companies and universities will often deploy multiple BSSs within the same IP subnet. This naturally raises the issue of mobility among the BSSs---how do wireless stations seamlessly move from one BSS to another while maintaining ongoing TCP sessions? As we'll see in this subsection, mobility can be handled in a relatively straightforward manner when the BSSs are part of the same subnet. When stations move between subnets, more sophisticated mobility management protocols will be needed, such as those we'll study in Sections 7.5 and 7.6.

Let's now look at a specific example of mobility between BSSs in the same subnet. Figure 7.15 shows two interconnected BSSs with a host, H1, moving from BSS1 to BSS2. Because in this example the interconnection device that connects the two BSSs is not a router, all of the stations in the two BSSs, including the APs, belong to the same IP subnet. Thus, when H1 moves from BSS1 to BSS2, it may keep its IP address and all of its ongoing TCP connections. If the interconnection device were a router, then H1 would have to obtain a new IP address in the subnet into which it was moving.
This address change would disrupt (and eventually terminate) any ongoing TCP connections at H1. In Section 7.6, we'll see how a network-layer mobility protocol, such as mobile IP, can be used to avoid this problem.

But what specifically happens when H1 moves from BSS1 to BSS2? As H1 wanders away from AP1, H1 detects a weakening signal from AP1 and starts to scan for a stronger signal. H1 receives beacon frames from AP2 (which in many corporate and university settings will have the same SSID as AP1). H1 then disassociates with AP1 and associates with AP2, while keeping its IP address and maintaining its ongoing TCP sessions.

This addresses the handoff problem from the host and AP viewpoint. But what about the switch in Figure 7.15? How does it know that the host has moved from one AP to another? As you may recall from Chapter 6, switches are "self-learning" and automatically build their forwarding tables.

Figure 7.15 Mobility in the same subnet

This self-learning feature nicely handles occasional moves (for example, when an employee gets transferred from one department to another); however, switches were not designed to support highly mobile users who want to maintain TCP connections while moving between BSSs. To appreciate the problem here, recall that before the move, the switch has an entry in its forwarding table that pairs H1's MAC address with the outgoing switch interface through which H1 can be reached. If H1 is initially in BSS1, then a datagram destined to H1 will be directed to H1 via AP1. Once H1 associates with BSS2, however, its frames should be directed to AP2. One solution (a bit of a hack, really) is for AP2 to send a broadcast Ethernet frame with H1's source address to the switch just after the new association. When the switch receives the frame, it updates its forwarding table, allowing H1 to be reached via AP2. The 802.11f standards group is developing an inter-AP protocol to handle these and related issues.
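The broadcast-frame "hack" above works because a self-learning switch records the arrival interface of every source MAC address it sees, so a single frame carrying H1's source address, sent via AP2, repoints H1's table entry. A minimal sketch, with illustrative MAC addresses and interface names:

```python
class LearningSwitch:
    """Self-learning switch: maps each observed source MAC to its arrival interface."""
    def __init__(self):
        self.table = {}   # MAC address -> outgoing interface

    def receive(self, src_mac, arrival_interface):
        # Self-learning step: whoever sent this frame is reachable
        # through the interface the frame arrived on.
        self.table[src_mac] = arrival_interface

    def lookup(self, dst_mac):
        return self.table.get(dst_mac)   # None means flood to all interfaces

switch = LearningSwitch()
switch.receive("aa:aa:aa:aa:aa:01", "if-to-AP1")   # H1's traffic while in BSS1

# After H1 associates with AP2, AP2 sends one broadcast frame carrying
# H1's source address, which repoints the switch's entry for H1:
switch.receive("aa:aa:aa:aa:aa:01", "if-to-AP2")
```

After the broadcast frame is processed, `switch.lookup("aa:aa:aa:aa:aa:01")` returns the interface toward AP2, so datagrams destined to H1 now flow through its new BSS.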
Our discussion above has focused on mobility within the same LAN subnet. Recall that VLANs, which we studied in Section 6.4.4, can be used to connect together islands of LANs into a large virtual LAN that can span a large geographical region. Mobility among base stations within such a VLAN can be handled in exactly the same manner as above \[Yu 2011\].

7.3.5 Advanced Features in 802.11

We'll wrap up our coverage of 802.11 with a short discussion of two advanced capabilities found in 802.11 networks. As we'll see, these capabilities are not completely specified in the 802.11 standard, but rather are made possible by mechanisms specified in the standard. This allows different vendors to implement these capabilities using their own (proprietary) approaches, presumably giving them an edge over the competition.

802.11 Rate Adaptation

We saw earlier in Figure 7.3 that different modulation techniques (with the different transmission rates that they provide) are appropriate for different SNR scenarios. Consider for example a mobile 802.11 user who is initially 20 meters away from the base station, with a high signal-to-noise ratio. Given the high SNR, the user can communicate with the base station using a physical-layer modulation technique that provides high transmission rates while maintaining a low BER. This is one happy user! Suppose now that the user becomes mobile, walking away from the base station, with the SNR falling as the distance from the base station increases. In this case, if the modulation technique used in the 802.11 protocol operating between the base station and the user does not change, the BER will become unacceptably high as the SNR decreases, and eventually no transmitted frames will be received correctly. For this reason, some 802.11 implementations have a rate adaptation capability that adaptively selects the underlying physical-layer modulation technique to use based on current or recent channel characteristics.
If a node sends two frames in a row without receiving an acknowledgment (an implicit indication of bit errors on the channel), the transmission rate falls back to the next lower rate. If 10 frames in a row are acknowledged, or if a timer that tracks the time since the last fallback expires, the transmission rate increases to the next higher rate. This rate adaptation mechanism shares the same "probing" philosophy as TCP's congestion-control mechanism---when conditions are good (reflected by ACK receipts), the transmission rate is increased until something "bad" happens (the lack of ACK receipts); when something "bad" happens, the transmission rate is reduced. 802.11 rate adaptation and TCP congestion control are thus similar to the young child who is constantly pushing his/her parents for more and more (say candy for a young child, later curfew hours for the teenager) until the parents finally say "Enough!" and the child backs off (only to try again later after conditions have hopefully improved!). A number of other schemes have also been proposed to improve on this basic automatic rate-adjustment scheme \[Kamerman 1997; Holland 2001; Lacage 2004\].

Power Management

Power is a precious resource in mobile devices, and thus the 802.11 standard provides power-management capabilities that allow 802.11 nodes to minimize the amount of time that their sense, transmit, and receive functions and other circuitry need to be "on." 802.11 power management operates as follows. A node is able to explicitly alternate between sleep and wake states (not unlike a sleepy student in a classroom!). A node indicates to the access point that it will be going to sleep by setting the power-management bit in the header of an 802.11 frame to 1. A timer in the node is then set to wake up the node just before the AP is scheduled to send its beacon frame (recall that an AP typically sends a beacon frame every 100 msec).
Since the AP knows from the set power-management bit that the node is going to sleep, it (the AP) knows that it should not send any frames to that node, and will buffer any frames destined for the sleeping host for later transmission. A node will wake up just before the AP sends a beacon frame, and quickly enter the fully active state (unlike the sleepy student, this wakeup requires only 250 microseconds \[Kamerman 1997\]!). The beacon frames sent out by the AP contain a list of nodes whose frames have been buffered at the AP. If there are no buffered frames for the node, it can go back to sleep. Otherwise, the node can explicitly request that the buffered frames be sent by sending a polling message to the AP. With an inter-beacon time of 100 msec, a wakeup time of 250 microseconds, and a similarly small time to receive a beacon frame and check to ensure that there are no buffered frames, a node that has no frames to send or receive can be asleep 99% of the time, resulting in a significant energy savings.

7.3.6 Personal Area Networks: Bluetooth and Zigbee

As illustrated in Figure 7.2, the IEEE 802.11 WiFi standard is aimed at communication among devices separated by up to 100 meters (except when 802.11 is used in a point-to-point configuration with a directional antenna). Two other wireless protocols in the IEEE 802 family are Bluetooth and Zigbee (defined in the IEEE 802.15.1 and IEEE 802.15.4 standards \[IEEE 802.15 2012\]).

Bluetooth

An IEEE 802.15.1 network operates over a short range, at low power, and at low cost. It is essentially a low-power, short-range, low-rate "cable replacement" technology for interconnecting a computer with its wireless keyboard, mouse, or other peripheral devices, as well as with cellular phones, speakers, headphones, and many other devices, whereas 802.11 is a higher-power, medium-range, higher-rate "access" technology.
For this reason, 802.15.1 networks are sometimes referred to as wireless personal area networks (WPANs). The link and physical layers of 802.15.1 are based on the earlier Bluetooth specification for personal area networks \[Held 2001; Bisdikian 2001\]. 802.15.1 networks operate in the 2.4 GHz unlicensed radio band in a TDM manner, with time slots of 625 microseconds. During each time slot, a sender transmits on one of 79 channels, with the channel changing in a known but pseudo-random manner from slot to slot. This form of channel hopping, known as frequency-hopping spread spectrum (FHSS), spreads transmissions in time over the frequency spectrum. 802.15.1 can provide data rates up to 4 Mbps.

802.15.1 networks are ad hoc networks: No network infrastructure (e.g., an access point) is needed to interconnect 802.15.1 devices. Thus, 802.15.1 devices must organize themselves. 802.15.1 devices are first organized into a piconet of up to eight active devices, as shown in Figure 7.16. One of these devices is designated as the master, with the remaining devices acting as slaves. The master node truly rules the piconet---its clock determines time in the piconet, it can transmit in each odd-numbered slot, and a slave can transmit only after the master has communicated with it in the previous slot, and even then the slave can only transmit to the master.

Figure 7.16 A Bluetooth piconet

In addition to the slave devices, there can also be up to 255 parked devices in the network. These devices cannot communicate until their status has been changed from parked to active by the master node. For more information about WPANs, the interested reader should consult the Bluetooth references \[Held 2001; Bisdikian 2001\] or the official IEEE 802.15 Web site \[IEEE 802.15 2012\].

Zigbee

A second personal area network standardized by the IEEE is the 802.15.4 standard \[IEEE 802.15 2012\], known as Zigbee.
While Bluetooth networks provide a "cable replacement" data rate of over a megabit per second, Zigbee is targeted at lower-powered, lower-data-rate, lower-duty-cycle applications than Bluetooth. While we may tend to think that "bigger and faster is better," not all network applications need high bandwidth and the consequent higher costs (both economic and power costs). For example, home temperature and light sensors, security devices, and wall-mounted switches are all very simple, low-power, low-duty-cycle, low-cost devices. Zigbee is thus well-suited for these devices. Zigbee defines channel rates of 20, 40, 100, and 250 Kbps, depending on the channel frequency.

Nodes in a Zigbee network come in two flavors. So-called "reduced-function devices" operate as slave devices under the control of a single "full-function device," much as Bluetooth slave devices. A full-function device can operate as a master device as in Bluetooth by controlling multiple slave devices, and multiple full-function devices can additionally be configured into a mesh network in which full-function devices route frames amongst themselves. Zigbee shares many protocol mechanisms that we've already encountered in other link-layer protocols: beacon frames and link-layer acknowledgments (similar to 802.11), carrier-sense random access protocols with binary exponential backoff (similar to 802.11 and Ethernet), and fixed, guaranteed allocation of time slots (similar to DOCSIS).

Zigbee networks can be configured in many different ways. Let's consider the simple case of a single full-function device controlling multiple reduced-function devices in a time-slotted manner using beacon frames. Figure 7.17 shows the case where the Zigbee network divides time into recurring super frames, each of which begins with a beacon frame.

Figure 7.17 Zigbee 802.15.4 super-frame structure
Each beacon frame divides the super frame into an active period (during which devices may transmit) and an inactive period (during which all devices, including the controller, can sleep and thus conserve power). The active period consists of 16 time slots, some of which are used by devices in a CSMA/CA random access manner, and some of which are allocated by the controller to specific devices, thus providing guaranteed channel access for those devices. More details about Zigbee networks can be found at \[Baronti 2007, IEEE 802.15.4 2012\].

7.4 Cellular Internet Access

In the previous section we examined how an Internet host can access the Internet when inside a WiFi hotspot---that is, when it is within the vicinity of an 802.11 access point. But most WiFi hotspots have a small coverage area of between 10 and 100 meters in diameter. What do we do then when we have a desperate need for wireless Internet access and we cannot access a WiFi hotspot?

Given that cellular telephony is now ubiquitous in many areas throughout the world, a natural strategy is to extend cellular networks so that they support not only voice telephony but wireless Internet access as well. Ideally, this Internet access would be at a reasonably high speed and would provide for seamless mobility, allowing users to maintain their TCP sessions while traveling, for example, on a bus or a train. With sufficiently high upstream and downstream bit rates, the user could even maintain videoconferencing sessions while roaming about. This scenario is not that far-fetched. Data rates of several megabits per second are becoming available as broadband data services such as those we will cover here become more widely deployed. In this section, we provide a brief overview of current and emerging cellular Internet access technologies.
Our focus here will be on both the wireless first hop as well as the network that connects the wireless first hop into the larger telephone network and/or the Internet; in Section 7.7 we'll consider how calls are routed to a user moving between base stations. Our brief discussion will necessarily provide only a simplified and high-level description of cellular technologies. Modern cellular communications, of course, has great breadth and depth, with many universities offering several courses on the topic. Readers seeking a deeper understanding are encouraged to see \[Goodman 1997; Kaaranen 2001; Lin 2001; Korhonen 2003; Schiller 2003; Palat 2009; Scourias 2012; Turner 2012; Akyildiz 2010\], as well as the particularly excellent and exhaustive references \[Mouly 1992; Sauter 2014\].

7.4.1 An Overview of Cellular Network Architecture

In our description of cellular network architecture in this section, we'll adopt the terminology of the Global System for Mobile Communications (GSM) standards. (For history buffs, the GSM acronym was originally derived from Groupe Spécial Mobile, until the more anglicized name was adopted, preserving the original acronym letters.) In the 1980s, Europeans recognized the need for a pan-European digital cellular telephony system that would replace the numerous incompatible analog cellular telephony systems, leading to the GSM standard \[Mouly 1992\]. Europeans deployed GSM technology with great success in the early 1990s, and since then GSM has grown to be the 800-pound gorilla of the cellular telephone world, with more than 80% of all cellular subscribers worldwide using GSM.

CASE HISTORY

4G Cellular Mobile Versus Wireless LANs

Many cellular mobile phone operators are deploying 4G cellular mobile systems. In some countries (e.g., Korea and Japan), 4G LTE coverage is higher than 90%---nearly ubiquitous.
In 2015, average download rates over deployed LTE systems ranged from 10 Mbps in the US and India to close to 40 Mbps in New Zealand. These 4G systems are being deployed in licensed radio-frequency bands, with some operators paying considerable sums to governments for spectrum-use licenses. 4G systems allow users to access the Internet from remote outdoor locations while on the move, in a manner similar to today's cellular phone-only access. In many cases, a user may have simultaneous access to both wireless LANs and 4G. With the capacity of 4G systems being both more constrained and more expensive, many mobile devices default to the use of WiFi rather than 4G, when both are available. Whether wireless edge network access will be primarily over wireless LANs or cellular systems remains an open question: The emerging wireless LAN infrastructure may become nearly ubiquitous. IEEE 802.11 wireless LANs, operating at 54 Mbps and higher, are enjoying widespread deployment. Essentially all laptops, tablets, and smartphones are factory-equipped with 802.11 LAN capabilities. Furthermore, emerging Internet appliances---such as wireless cameras and picture frames---also have low-powered wireless LAN capabilities. Wireless LAN base stations can also handle mobile phone appliances. Many phones are already capable of connecting to the cellular phone network or to an IP network either natively or using a Skype-like Voice-over-IP service, thus bypassing the operator's cellular voice and 4G data services. Of course, many other experts believe that 4G not only will be a major success, but will also dramatically revolutionize the way we work and live. Most likely, WiFi and 4G will both become prevalent wireless technologies, with roaming wireless devices automatically selecting the access technology that provides the best service at their current physical location.
When people talk about cellular technology, they often classify the technology as belonging to one of several "generations." The earliest generations were designed primarily for voice traffic. First generation (1G) systems were analog FDMA systems designed exclusively for voice-only communication. These 1G systems are almost extinct now, having been replaced by digital 2G systems. The original 2G systems were also designed for voice, but later extended (2.5G) to support data (i.e., Internet) as well as voice service. 3G systems also support voice and data, but with an emphasis on data capabilities and higher-speed radio access links. The 4G systems being deployed today are based on LTE technology, feature an all-IP core network, and provide integrated voice and data at multi-Megabit speeds.

Cellular Network Architecture, 2G: Voice Connections to the Telephone Network

The term cellular refers to the fact that the region covered by a cellular network is partitioned into a number of geographic coverage areas, known as cells, shown as hexagons on the left side of Figure 7.18. As with the 802.11 WiFi standard we studied in Section 7.3.1, GSM has its own particular nomenclature.

Figure 7.18 Components of the GSM 2G cellular network architecture

Each cell contains a base transceiver station (BTS) that transmits signals to and receives signals from the mobile stations in its cell. The coverage area of a cell depends on many factors, including the transmitting power of the BTS, the transmitting power of the user devices, obstructing buildings in the cell, and the height of base station antennas. Although Figure 7.18 shows each cell containing one base transceiver station residing in the middle of the cell, many systems today place the BTS at corners where three cells intersect, so that a single BTS with directional antennas can service three cells.
The GSM standard for 2G cellular systems uses combined FDM/TDM (radio) for the air interface. Recall from Chapter 1 that, with pure FDM, the channel is partitioned into a number of frequency bands with each band devoted to a call. Also recall from Chapter 1 that, with pure TDM, time is partitioned into frames, with each frame further partitioned into slots and each call being assigned the use of a particular slot in the revolving frame. In combined FDM/TDM systems, the channel is partitioned into a number of frequency sub-bands; within each sub-band, time is partitioned into frames and slots. Thus, for a combined FDM/TDM system, if the channel is partitioned into F sub-bands and time is partitioned into T slots, then the channel will be able to support F·T simultaneous calls. Recall that we saw in Section 6.3.4 that cable access networks also use a combined FDM/TDM approach. GSM systems consist of 200-kHz frequency bands, with each band supporting eight TDM calls. GSM encodes speech at 13 kbps and 12.2 kbps.

A GSM network's base station controller (BSC) will typically service several tens of base transceiver stations. The role of the BSC is to allocate BTS radio channels to mobile subscribers, perform paging (finding the cell in which a mobile user is resident), and perform handoff of mobile users---a topic we'll cover shortly in Section 7.7.2. The base station controller and its controlled base transceiver stations collectively constitute a GSM base station subsystem (BSS). As we'll see in Section 7.7, the mobile switching center (MSC) plays the central role in user authorization and accounting (e.g., determining whether a mobile device is allowed to connect to the cellular network), call establishment and teardown, and handoff. A single MSC will typically contain up to five BSCs, resulting in approximately 200K subscribers per MSC.
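The F·T capacity arithmetic above is easy to check in a few lines of Python. The GSM figures (200-kHz sub-bands, eight TDM calls per band) come from the text; the 25 MHz of total spectrum used below is only an illustrative assumption, not a number from this section.

```python
# Capacity of a combined FDM/TDM system: F sub-bands x T slots per sub-band.
def fdm_tdm_capacity(f_subbands: int, t_slots: int) -> int:
    """Number of simultaneous calls supported by F sub-bands of T TDM slots each."""
    return f_subbands * t_slots

# GSM figures from the text: each 200-kHz sub-band carries 8 TDM calls.
# The 25 MHz of spectrum is an illustrative assumption.
spectrum_hz = 25_000_000
band_hz = 200_000

f = spectrum_hz // band_hz            # 125 sub-bands
calls = fdm_tdm_capacity(f, 8)        # 125 * 8 = 1000 simultaneous calls
print(f, calls)                       # 125 1000
```

The same computation applies to any combined FDM/TDM system, such as the cable access networks of Section 6.3.4; only the band width and slots-per-band change.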
A cellular provider's network will have a number of MSCs, with special MSCs known as gateway MSCs connecting the provider's cellular network to the larger public telephone network.

7.4.2 3G Cellular Data Networks: Extending the Internet to Cellular Subscribers

Our discussion in Section 7.4.1 focused on connecting cellular voice users to the public telephone network. But, of course, when we're on the go, we'd also like to read e-mail, access the Web, get location-dependent services (e.g., maps and restaurant recommendations), and perhaps even watch streaming video. To do this, our smartphone will need to run a full TCP/IP protocol stack (including the physical, link, network, transport, and application layers) and connect into the Internet via the cellular data network. The topic of cellular data networks is a rather bewildering collection of competing and ever-evolving standards, as one generation (and half-generation) succeeds the former and introduces new technologies and services with new acronyms. To make matters worse, there's no single official body that sets requirements for 2.5G, 3G, 3.5G, or 4G technologies, making it hard to sort out the differences among competing standards. In our discussion below, we'll focus on the UMTS (Universal Mobile Telecommunications Service) 3G and 4G standards developed by the 3rd Generation Partnership Project (3GPP) \[3GPP 2016\]. Let's first take a top-down look at the 3G cellular data network architecture shown in Figure 7.19.

Figure 7.19 3G system architecture

3G Core Network

The 3G core cellular data network connects radio access networks to the public Internet. The core network interoperates with components of the existing cellular voice network (in particular, the MSC) that we previously encountered in Figure 7.18. Given the considerable amount of existing infrastructure (and profitable services!)
in the existing cellular voice network, the approach taken by the designers of 3G data services is clear: leave the existing core GSM cellular voice network untouched, adding additional cellular data functionality in parallel to the existing cellular voice network. The alternative---integrating new data services directly into the core of the existing cellular voice network---would have raised the same challenges encountered in Section 4.3, where we discussed integrating new (IPv6) and legacy (IPv4) technologies in the Internet.

There are two types of nodes in the 3G core network: Serving GPRS Support Nodes (SGSNs) and Gateway GPRS Support Nodes (GGSNs). (GPRS stands for General Packet Radio Service, an early cellular data service in 2G networks; here we discuss the evolved version of GPRS in 3G networks.) An SGSN is responsible for delivering datagrams to/from the mobile nodes in the radio access network to which the SGSN is attached. The SGSN interacts with the cellular voice network's MSC for that area, providing user authorization and handoff, maintaining location (cell) information about active mobile nodes, and performing datagram forwarding between mobile nodes in the radio access network and a GGSN. The GGSN acts as a gateway, connecting multiple SGSNs into the larger Internet. A GGSN is thus the last piece of 3G infrastructure that a datagram originating at a mobile node encounters before entering the larger Internet. To the outside world, the GGSN looks like any other gateway router; the mobility of the 3G nodes within the GGSN's network is hidden from the outside world behind the GGSN.

3G Radio Access Network: The Wireless Edge

The 3G radio access network is the wireless first-hop network that we see as a 3G user.
The Radio Network Controller (RNC) typically controls several cell base transceiver stations similar to the base stations that we encountered in 2G systems (but officially known in 3G UMTS parlance as "Node Bs"---a rather non-descriptive name!). Each cell's wireless link operates between the mobile nodes and a base transceiver station, just as in 2G networks. The RNC connects to both the circuit-switched cellular voice network via an MSC, and to the packet-switched Internet via an SGSN. Thus, while 3G cellular voice and cellular data services use different core networks, they share a common first/last-hop radio access network.

A significant change in 3G UMTS over 2G networks is that rather than using GSM's FDMA/TDMA scheme, UMTS uses a CDMA technique known as Direct Sequence Wideband CDMA (DS-WCDMA) \[Dahlman 1998\] within TDMA slots; TDMA slots, in turn, are available on multiple frequencies---an interesting use of all three dedicated channel-sharing approaches that we earlier identified in Chapter 6, and similar to the approach taken in wired cable access networks (see Section 6.3.4). This change requires a new 3G cellular wireless-access network operating in parallel with the 2G BSS radio network shown in Figure 7.19. The data service associated with the WCDMA specification is known as HSPA (High Speed Packet Access) and promises downlink data rates of up to 14 Mbps. Details regarding 3G networks can be found at the 3rd Generation Partnership Project (3GPP) Web site \[3GPP 2016\].

7.4.3 On to 4G: LTE

Fourth generation (4G) cellular systems are becoming widely deployed. In 2015, more than 50 countries had 4G coverage exceeding 50%. The 4G Long-Term Evolution (LTE) standard \[Sauter 2014\] put forward by the 3GPP has two important innovations over 3G systems: an all-IP core network and an enhanced radio access network, as discussed below.
4G System Architecture: An All-IP Core Network

Figure 7.20 shows the overall 4G network architecture, which (unfortunately) introduces yet another (rather impenetrable) new vocabulary and set of acronyms for network components.

Figure 7.20 4G network architecture

But let's not get lost in these acronyms! There are three important high-level observations about the 4G architecture:

A unified, all-IP network architecture. Unlike the 3G network shown in Figure 7.19, which has separate network components and paths for voice and data traffic, the 4G architecture shown in Figure 7.20 is "all-IP"---both voice and data are carried in IP datagrams between the wireless device (the User Equipment, or UE, in 4G parlance) and the packet gateway (P-GW) that connects the 4G edge network to the rest of the network. With 4G, the last vestiges of cellular networks' roots in telephony have disappeared, giving way to universal IP service!

A clear separation of the 4G data plane and 4G control plane. Mirroring our distinction between the data and control planes for IP's network layer in Chapters 4 and 5, respectively, the 4G network architecture also clearly separates the data and control planes. We'll discuss their functionality below.

A clear separation between the radio access network and the all-IP core network. IP datagrams carrying user data are forwarded between the user (UE) and the gateway (P-GW in Figure 7.20) over a 4G-internal IP network to the external Internet. Control packets are exchanged over this same internal network among the 4G's control services components, whose roles are described below.

The principal components of the 4G architecture are as follows. The eNodeB is the logical descendant of the 2G base station and the 3G Radio Network Controller (a.k.a. Node B) and again plays a central role here. Its data-plane role is to forward datagrams between the UE (over the LTE radio access network) and the P-GW.
UE datagrams are encapsulated at the eNodeB and tunneled to the P-GW through the 4G network's all-IP enhanced packet core (EPC). This tunneling between the eNodeB and P-GW is similar to the tunneling we saw in Section 4.3 of IPv6 datagrams between two IPv6 endpoints through a network of IPv4 routers. These tunnels may have associated quality of service (QoS) guarantees. For example, a 4G network may guarantee that voice traffic experiences no more than a 100 msec delay between UE and P-GW, and has a packet loss rate of less than 1%; TCP traffic might have a guarantee of 300 msec and a packet loss rate of less than .0001% \[Palat 2009\]. We'll cover QoS in Chapter 9. In the control plane, the eNodeB handles registration and mobility signaling traffic on behalf of the UE.

The Packet Data Network Gateway (P-GW) allocates IP addresses to the UEs and performs QoS enforcement. As a tunnel endpoint, it also performs datagram encapsulation/decapsulation when forwarding a datagram to/from a UE.

The Serving Gateway (S-GW) is the data-plane mobility anchor point---all UE traffic will pass through the S-GW. The S-GW also performs charging/billing functions and lawful traffic interception.

The Mobility Management Entity (MME) performs connection and mobility management on behalf of the UEs resident in the cell it controls. It receives UE subscription information from the HSS. We cover mobility in cellular networks in detail in Section 7.7.

The Home Subscriber Server (HSS) contains UE information including roaming access capabilities, quality of service profiles, and authentication information. As we'll see in Section 7.7, the HSS obtains this information from the UE's home cellular provider.

Very readable introductions to the 4G network architecture and its EPC are \[Motorola 2007; Palat 2009; Sauter 2014\].
LTE Radio Access Network

LTE uses a combination of frequency division multiplexing and time division multiplexing on the downstream channel, known as orthogonal frequency division multiplexing (OFDM) \[Rohde 2008; Ericsson 2011\]. (The term "orthogonal" comes from the fact that the signals being sent on different frequency channels are created so that they interfere very little with each other, even when channel frequencies are tightly spaced.) In LTE, each active mobile node is allocated one or more 0.5 ms time slots in one or more of the channel frequencies. Figure 7.21 shows an allocation of eight time slots over four frequencies. By being allocated increasingly more time slots (whether on the same frequency or on different frequencies), a mobile node is able to achieve increasingly higher transmission rates.

Figure 7.21 Twenty 0.5 ms slots organized into 10 ms frames at each frequency. An eight-slot allocation is shown shaded.

Slot (re)allocation among mobile nodes can be performed as often as once every millisecond. Different modulation schemes can also be used to change the transmission rate; see our earlier discussion of Figure 7.3 and dynamic selection of modulation schemes in WiFi networks. The particular allocation of time slots to mobile nodes is not mandated by the LTE standard. Instead, the decision of which mobile nodes will be allowed to transmit in a given time slot on a given frequency is determined by the scheduling algorithms provided by the LTE equipment vendor and/or the network operator. With opportunistic scheduling \[Bender 2000; Kolding 2003; Kulkarni 2005\], matching the physical-layer protocol to the channel conditions between the sender and receiver and choosing the receivers to which packets will be sent based on channel conditions allow the radio network controller to make best use of the wireless medium.
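Since the LTE standard leaves slot allocation to the vendor or operator, a concrete scheduler is necessarily an invention of ours. The Python sketch below shows one greedy flavor of opportunistic scheduling: for every (slot, frequency) pair, grant the slot to the mobile reporting the best channel quality on that frequency. All names and channel-quality numbers are hypothetical; a production scheduler would also weigh fairness and service classes, since a pure best-channel rule can starve mobiles with persistently poor channels.

```python
# Toy opportunistic scheduler for an LTE-style slot/frequency grid.
# quality[mobile][freq] is that mobile's reported channel quality on a
# frequency (higher is better); the numbers below are made up.

def schedule(slots: int, frequencies: int, quality: dict) -> dict:
    """Return a dict mapping (slot, freq) -> the mobile granted that slot,
    chosen greedily as the mobile with the best channel on that frequency."""
    allocation = {}
    for slot in range(slots):
        for freq in range(frequencies):
            best = max(quality, key=lambda m: quality[m][freq])
            allocation[(slot, freq)] = best
    return allocation

# Two mobiles, two frequencies: A's channel is good on freq 0, B's on freq 1.
quality = {"A": [0.9, 0.2], "B": [0.3, 0.8]}
alloc = schedule(slots=4, frequencies=2, quality=quality)
# With this greedy rule, A receives every slot on frequency 0 and B every
# slot on frequency 1 -- and a mobile with uniformly poor channels would
# receive nothing, which is why real schedulers add fairness terms.
```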
In addition, user priorities and contracted levels of service (e.g., silver, gold, or platinum) can be used in scheduling downstream packet transmissions. In addition to the LTE capabilities described above, LTE-Advanced allows for downstream bandwidths of hundreds of Mbps by allocating aggregated channels to a mobile node \[Akyildiz 2010\].

An additional 4G wireless technology---WiMAX (Worldwide Interoperability for Microwave Access)---is a family of IEEE 802.16 standards that differ significantly from LTE. WiMAX has not yet been able to enjoy the widespread deployment of LTE. A detailed discussion of WiMAX can be found on this book's Web site.

7.5 Mobility Management: Principles

Having covered the wireless nature of the communication links in a wireless network, it's now time to turn our attention to the mobility that these wireless links enable. In the broadest sense, a mobile node is one that changes its point of attachment into the network over time. Because the term mobility has taken on many meanings in both the computer and telephony worlds, it will serve us well first to consider several dimensions of mobility in some detail.

From the network layer's standpoint, how mobile is a user? A physically mobile user will present a very different set of challenges to the network layer, depending on how he or she moves between points of attachment to the network. At one end of the spectrum in Figure 7.22, a user may carry a laptop with a wireless network interface card around in a building. As we saw in Section 7.3.4, this user is not mobile from a network-layer perspective. Moreover, if the user associates with the same access point regardless of location, the user is not even mobile from the perspective of the link layer.
At the other end of the spectrum, consider the user zooming along the autobahn in a BMW or Tesla at 150 kilometers per hour, passing through multiple wireless access networks and wanting to maintain an uninterrupted TCP connection to a remote application throughout the trip. This user is definitely mobile! In between these extremes is a user who takes a laptop from one location (e.g., office or dormitory) into another (e.g., coffeeshop, classroom) and wants to connect into the network in the new location. This user is also mobile (although less so than the BMW driver!) but does not need to maintain an ongoing connection while moving between points of attachment to the network. Figure 7.22 illustrates this spectrum of user mobility from the network layer's perspective.

Figure 7.22 Various degrees of mobility, from the network layer's point of view

How important is it for the mobile node's address to always remain the same? With mobile telephony, your phone number---essentially the network-layer address of your phone---remains the same as you travel from one provider's mobile phone network to another. Must a laptop similarly maintain the same IP address while moving between IP networks? The answer to this question will depend strongly on the applications being run. For the BMW or Tesla driver who wants to maintain an uninterrupted TCP connection to a remote application while zipping along the autobahn, it would be convenient to maintain the same IP address. Recall from Chapter 3 that an Internet application needs to know the IP address and port number of the remote entity with which it is communicating. If a mobile entity is able to maintain its IP address as it moves, mobility becomes invisible from the application standpoint. There is great value to this transparency---an application need not be concerned with a potentially changing IP address, and the same application code serves mobile and nonmobile connections alike.
We'll see in the following section that mobile IP provides this transparency, allowing a mobile node to maintain its permanent IP address while moving among networks. On the other hand, a less glamorous mobile user might simply want to turn off an office laptop, bring that laptop home, power up, and work from home. If the laptop functions primarily as a client in client-server applications (e.g., send/read e-mail, browse the Web, Telnet to a remote host) from home, the particular IP address used by the laptop is not that important. In particular, one could get by fine with an address that is temporarily allocated to the laptop by the ISP serving the home. We saw in Section 4.3 that DHCP already provides this functionality.

What supporting wired infrastructure is available? In all of our scenarios above, we've implicitly assumed that there is a fixed infrastructure to which the mobile user can connect---for example, the home's ISP network, the wireless access network in the office, or the wireless access networks lining the autobahn. What if no such infrastructure exists? If two users are within communication proximity of each other, can they establish a network connection in the absence of any other network-layer infrastructure? Ad hoc networking provides precisely these capabilities. This rapidly developing area is at the cutting edge of mobile networking research and is beyond the scope of this book. \[Perkins 2000\] and the IETF Mobile Ad Hoc Network (manet) working group Web pages \[manet 2016\] provide thorough treatments of the subject.

In order to illustrate the issues involved in allowing a mobile user to maintain ongoing connections while moving between networks, let's consider a human analogy. A twenty-something adult moving out of the family home becomes mobile, living in a series of dormitories and/or apartments, and often changing addresses.
If an old friend wants to get in touch, how can that friend find the address of her mobile friend? One common way is to contact the family, since a mobile adult will often register his or her current address with the family (if for no other reason than so that the parents can send money to help pay the rent!). The family home, with its permanent address, becomes that one place that others can go as a first step in communicating with the mobile adult. Later communication from the friend may be either indirect (for example, with mail being sent first to the parents' home and then forwarded to the mobile adult) or direct (for example, with the friend using the address obtained from the parents to send mail directly to her mobile friend).

In a network setting, the permanent home of a mobile node (such as a laptop or smartphone) is known as the home network, and the entity within the home network that performs the mobility management functions discussed below on behalf of the mobile node is known as the home agent. The network in which the mobile node is currently residing is known as the foreign (or visited) network, and the entity within the foreign network that helps the mobile node with the mobility management functions discussed below is known as a foreign agent. For mobile professionals, their home network might likely be their company network, while the visited network might be the network of a colleague they are visiting. A correspondent is the entity wishing to communicate with the mobile node. Figure 7.23 illustrates these concepts, as well as addressing concepts considered below. In Figure 7.23, note that agents are shown as being collocated with routers (e.g., as processes running on routers), but alternatively they could be executing on other hosts or servers in the network.
7.5.1 Addressing

We noted above that in order for user mobility to be transparent to network applications, it is desirable for a mobile node to keep its address as it moves from one network to another. When a mobile node is resident in a foreign network, all traffic addressed to the node's permanent address now needs to be routed to the foreign network. How can this be done?

Figure 7.23 Initial elements of a mobile network architecture

One option is for the foreign network to advertise to all other networks that the mobile node is resident in its network. This could be via the usual exchange of intradomain and interdomain routing information and would require few changes to the existing routing infrastructure. The foreign network could simply advertise to its neighbors that it has a highly specific route to the mobile node's permanent address (that is, essentially inform other networks that it has the correct path for routing datagrams to the mobile node's permanent address; see Section 4.3). These neighbors would then propagate this routing information throughout the network as part of the normal procedure of updating routing information and forwarding tables. When the mobile node leaves one foreign network and joins another, the new foreign network would advertise a new, highly specific route to the mobile node, and the old foreign network would withdraw its routing information regarding the mobile node. This solves two problems at once, and it does so without making significant changes to the network-layer infrastructure. Other networks know the location of the mobile node, and it is easy to route datagrams to the mobile node, since the forwarding tables will direct datagrams to the foreign network. A significant drawback, however, is that of scalability.
If mobility management were to be the responsibility of network routers, the routers would have to maintain forwarding table entries for potentially millions of mobile nodes, and update these entries as nodes move. Some additional drawbacks are explored in the problems at the end of this chapter.

An alternative approach (and one that has been adopted in practice) is to push mobility functionality from the network core to the network edge---a recurring theme in our study of Internet architecture. A natural way to do this is via the mobile node's home network. In much the same way that parents of the mobile twenty-something track their child's location, the home agent in the mobile node's home network can track the foreign network in which the mobile node resides. A protocol between the mobile node (or a foreign agent representing the mobile node) and the home agent will certainly be needed to update the mobile node's location.

Let's now consider the foreign agent in more detail. The conceptually simplest approach, shown in Figure 7.23, is to locate foreign agents at the edge routers in the foreign network. One role of the foreign agent is to create a so-called care-of address (COA) for the mobile node, with the network portion of the COA matching that of the foreign network. There are thus two addresses associated with a mobile node, its permanent address (analogous to our mobile youth's family's home address) and its COA, sometimes known as a foreign address (analogous to the address of the house in which our mobile youth is currently residing). In the example in Figure 7.23, the permanent address of the mobile node is 128.119.40.186. When visiting network 79.129.13/24, the mobile node has a COA of 79.129.13.2. A second role of the foreign agent is to inform the home agent that the mobile node is resident in its (the foreign agent's) network and has the given COA.
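The home agent's state can be pictured as nothing more than a binding table mapping each permanent address to the mobile's current COA. The Python sketch below uses the addresses from the Figure 7.23 example; the class and method names are hypothetical and not taken from any standard.

```python
# Sketch of a home agent's binding table: permanent address -> care-of address.
# Addresses come from the Figure 7.23 example; the HomeAgent class and its
# method names are hypothetical illustrations, not part of mobile IP itself.

class HomeAgent:
    def __init__(self):
        self.bindings = {}   # permanent address -> current COA

    def register(self, permanent_addr: str, coa: str) -> None:
        """Called (by the mobile node or its foreign agent) when the mobile
        arrives in a foreign network and obtains a new COA."""
        self.bindings[permanent_addr] = coa

    def lookup(self, permanent_addr: str):
        """Where should datagrams for this mobile be tunneled?
        Returns None if the mobile is at home (no binding)."""
        return self.bindings.get(permanent_addr)

ha = HomeAgent()
# The mobile with permanent address 128.119.40.186 visits network 79.129.13/24.
ha.register("128.119.40.186", "79.129.13.2")
assert ha.lookup("128.119.40.186") == "79.129.13.2"
assert ha.lookup("128.119.40.99") is None    # a node still at home
```

A real registration protocol would also authenticate the update and expire stale bindings; the table itself is this simple.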
We'll see shortly that the COA will be used to "reroute" datagrams to
the mobile node via its foreign agent. Although we have separated the
functionality of the mobile node and the foreign agent, it is worth
noting that the mobile node can also assume the responsibilities of the
foreign agent. For example, the mobile node could obtain a COA in the
foreign network (for example, using a protocol such as DHCP) and itself
inform the home agent of its COA.

7.5.2 Routing to a Mobile Node

We have now seen how a mobile node obtains a COA and how the home agent
can be informed of that address. But having the home agent know the COA
solves only part of the problem. How should datagrams be addressed and
forwarded to the mobile node? Since only the home agent (and not
network-wide routers) knows the location of the mobile node, it will no
longer suffice to simply address a datagram to the mobile node's
permanent address and send it into the network-layer infrastructure.
Something more must be done. Two approaches can be identified, which we
will refer to as indirect and direct routing.

Indirect Routing to a Mobile Node

Let's first consider a correspondent that wants to send a datagram to a
mobile node. In the indirect routing approach, the correspondent simply
addresses the datagram to the mobile node's permanent address and sends
the datagram into the network, blissfully unaware of whether the mobile
node is resident in its home network or is visiting a foreign network;
mobility is thus completely transparent to the correspondent. Such
datagrams are first routed, as usual, to the mobile node's home
network. This is illustrated in step 1 in Figure 7.24. Let's now turn
our attention to the home agent. In addition to being responsible for
interacting with a foreign agent to track the mobile node's COA, the
home agent has another very important function.
Its second job is to be on the lookout for -arriving datagrams addressed to nodes whose home network is that of the -home agent but that are currently resident in a foreign network. The -home agent intercepts these datagrams and then forwards them to a mobile -node in a two-step process. The datagram is first forwarded to the -foreign agent, using the mobile node's COA (step 2 in Figure 7.24), and -then forwarded from the foreign agent to the mobile node (step 3 in -Figure 7.24). - -Figure 7.24 Indirect routing to a mobile node - -It is instructive to consider this rerouting in more detail. The home -agent will need to address the datagram using the mobile node's COA, so -that the network layer will route the datagram to the foreign network. -On the other hand, it is desirable to leave the correspondent's datagram -intact, since the application receiving the datagram should be unaware -that the datagram was forwarded via the home agent. Both goals can be -satisfied by having the home agent encapsulate the correspondent's -original complete datagram within a new (larger) datagram. This larger -datagram is addressed and delivered to the mobile node's COA. The -foreign agent, who "owns" the COA, will receive and decapsulate the -datagram---that is, remove the correspondent's original datagram from -within the larger encapsulating datagram and forward (step 3 in Figure -7.24) the original datagram to the mobile node. Figure 7.25 shows a -correspondent's original datagram being sent to the home network, an -encapsulated datagram being sent to the foreign agent, and the original -datagram being delivered to the mobile node. The sharp reader will note -that the encapsulation/decapsulation described here is identical to the -notion of tunneling, discussed in Section 4.3 in the context of IP -multicast and IPv6. Let's next consider how a mobile node sends -datagrams to a correspondent. 
This is quite simple, as the mobile node can address its datagram
directly to the correspondent (using its own permanent address as the
source address, and the correspondent's address as the destination
address). Since the mobile node knows the correspondent's address,
there is no need to route the datagram back through the home agent.
This is shown as step 4 in Figure 7.24.

Figure 7.25 Encapsulation and decapsulation

Let's summarize our discussion of indirect routing by listing the new
network-layer functionality required to support mobility.

- A mobile-node-to-foreign-agent protocol. The mobile node will
  register with the foreign agent when attaching to the foreign
  network. Similarly, a mobile node will deregister with the foreign
  agent when it leaves the foreign network.
- A foreign-agent-to-home-agent registration protocol. The foreign
  agent will register the mobile node's COA with the home agent. A
  foreign agent need not explicitly deregister a COA when a mobile node
  leaves its network, because the subsequent registration of a new COA,
  when the mobile node moves to a new network, will take care of this.
- A home-agent datagram encapsulation protocol. Encapsulation and
  forwarding of the correspondent's original datagram within a datagram
  addressed to the COA.
- A foreign-agent decapsulation protocol. Extraction of the
  correspondent's original datagram from the encapsulating datagram,
  and the forwarding of the original datagram to the mobile node.

The previous discussion provides all the pieces---foreign agents, the
home agent, and indirect forwarding---needed for a mobile node to
maintain an ongoing connection while moving among networks. As an
example of how these pieces fit together, assume the mobile node is
attached to foreign network A, has registered a COA in network A with
its home agent, and is receiving datagrams that are being indirectly
routed through its home agent.
The mobile node now moves to foreign network B and registers with the
foreign agent in network B, which informs the home agent of the mobile
node's new COA. From this point on, the home agent will reroute
datagrams to foreign network B. As far as a correspondent is concerned,
mobility is transparent---datagrams are routed via the same home agent
both before and after the move. As far as the home agent is concerned,
there is no disruption in the flow of datagrams---arriving datagrams
are first forwarded to foreign network A; after the change in COA,
datagrams are forwarded to foreign network B. But will the mobile node
see an interrupted flow of datagrams as it moves between networks? As
long as the time between the mobile node's disconnection from network A
(at which point it can no longer receive datagrams via A) and its
attachment to network B (at which point it will register a new COA with
its home agent) is small, few datagrams will be lost. Recall from
Chapter 3 that end-to-end connections can suffer datagram loss due to
network congestion. Hence occasional datagram loss within a connection
when a node moves between networks is by no means a catastrophic
problem. If loss-free communication is required, upper-layer mechanisms
will recover from datagram loss, whether such loss results from network
congestion or from user mobility. An indirect routing approach is used
in the mobile IP standard \[RFC 5944\], as discussed in Section 7.6.

Direct Routing to a Mobile Node

The indirect routing approach illustrated in Figure 7.24 suffers from
an inefficiency known as the triangle routing problem---datagrams
addressed to the mobile node must be routed first to the home agent and
then to the foreign network, even when a much more efficient route
exists between the correspondent and the mobile node. In the worst
case, imagine a mobile user who is visiting the foreign network of a
colleague.
The two are sitting side by side and exchanging data over the network.
Datagrams from the correspondent (in this case the colleague of the
visitor) are routed to the mobile user's home agent and then back again
to the foreign network! Direct routing overcomes the inefficiency of
triangle routing, but does so at the cost of additional complexity. In
the direct routing approach, a correspondent agent in the
correspondent's network first learns the COA of the mobile node. This
can be done by having the correspondent agent query the home agent,
assuming that (as in the case of indirect routing) the mobile node has
an up-to-date value for its COA registered with its home agent. It is
also possible for the correspondent itself to perform the function of
the correspondent agent, just as a mobile node could perform the
function of the foreign agent. This is shown as steps 1 and 2 in Figure
7.26. The correspondent agent then tunnels datagrams directly to the
mobile node's COA, in a manner analogous to the tunneling performed by
the home agent, steps 3 and 4 in Figure 7.26.

While direct routing overcomes the triangle routing problem, it
introduces two important additional challenges:

- A mobile-user location protocol is needed for the correspondent agent
  to query the home agent to obtain the mobile node's COA (steps 1 and
  2 in Figure 7.26).
- When the mobile node moves from one foreign network to another, how
  will data now be forwarded to the new foreign network? In the case of
  indirect routing, this problem was easily solved by updating the COA
  maintained by the home agent. However, with direct routing, the home
  agent is queried for the COA by the correspondent agent only once, at
  the beginning of the session. Thus, updating the COA at the home
  agent, while necessary, will not be enough to solve the problem of
  routing data to the mobile node's new foreign network.
One solution would be to create a new protocol to notify the
correspondent of the changing COA. An alternate solution, and one that
we'll see adopted in practice in GSM networks, works as follows.
Suppose data is currently being forwarded to the mobile node in the
foreign network where the mobile node was located when the session
first started (step 1 in Figure 7.27). We'll identify the foreign agent
in that foreign network where the mobile node was first found as the
anchor foreign agent. When the mobile node moves to a new foreign
network (step 2 in Figure 7.27), the mobile node registers with the new
foreign agent (step 3), and the new foreign agent provides the anchor
foreign agent with the mobile node's new COA (step 4). When the anchor
foreign agent receives an encapsulated datagram for a departed mobile
node, it can then re-encapsulate the datagram and forward it to the
mobile node (step 5) using the new COA. If the mobile node later moves
yet again to a new foreign network, the foreign agent in that new
visited network would then contact the anchor foreign agent in order to
set up forwarding to this new foreign network.

Figure 7.26 Direct routing to a mobile user

Figure 7.27 Mobile transfer between networks with direct routing

7.6 Mobile IP

The Internet architecture and protocols for supporting mobility,
collectively known as mobile IP, are defined primarily in RFC 5944 for
IPv4. Mobile IP is a flexible standard, supporting many different modes
of operation (for example, operation with or without a foreign agent),
multiple ways for agents and mobile nodes to discover each other, use
of single or multiple COAs, and multiple forms of encapsulation. As
such, mobile IP is a complex standard, and would require an entire book
to describe in detail; indeed one such book is \[Perkins 1998b\].
Our modest goal here is to provide an overview of the most important
aspects of mobile IP and to illustrate its use in a few common-case
scenarios. The mobile IP architecture contains many of the elements we
have considered above, including the concepts of home agents, foreign
agents, care-of addresses, and encapsulation/decapsulation. The current
standard \[RFC 5944\] specifies the use of indirect routing to the
mobile node. The mobile IP standard consists of three main pieces:

- Agent discovery. Mobile IP defines the protocols used by a home or
  foreign agent to advertise its services to mobile nodes, and
  protocols for mobile nodes to solicit the services of a foreign or
  home agent.
- Registration with the home agent. Mobile IP defines the protocols
  used by the mobile node and/or foreign agent to register and
  deregister COAs with a mobile node's home agent.
- Indirect routing of datagrams. The standard also defines the manner
  in which datagrams are forwarded to mobile nodes by a home agent,
  including rules for forwarding datagrams, rules for handling error
  conditions, and several forms of encapsulation \[RFC 2003, RFC 2004\].

Security considerations are prominent throughout the mobile IP
standard. For example, authentication of a mobile node is clearly
needed to ensure that a malicious user does not register a bogus
care-of address with a home agent, which could cause all datagrams
addressed to an IP address to be redirected to the malicious user.
Mobile IP achieves security using many of the mechanisms that we will
examine in Chapter 8, so we will not address security considerations in
our discussion below.

Agent Discovery

A mobile IP node arriving to a new network, whether attaching to a
foreign network or returning to its home network, must learn the
identity of the corresponding foreign or home agent.
Indeed it is the discovery of a new foreign agent, with a new network
address, that allows the network layer in a mobile node to learn that
it has moved into a new foreign network. This process is known as agent
discovery. Agent discovery can be accomplished in one of two ways: via
agent advertisement or via agent solicitation. With agent
advertisement, a foreign or home agent advertises its services using an
extension to the existing router discovery protocol \[RFC 1256\]. The
agent periodically broadcasts an ICMP message with a type field of 9
(router discovery) on all links to which it is connected. The router
discovery message contains the IP address of the router (that is, the
agent), thus allowing a mobile node to learn the agent's IP address.
The router discovery message also contains a mobility agent
advertisement extension that contains additional information needed by
the mobile node. Among the more important fields in the extension are
the following:

- Home agent bit (H). Indicates that the agent is a home agent for the
  network in which it resides.
- Foreign agent bit (F). Indicates that the agent is a foreign agent
  for the network in which it resides.
- Registration required bit (R). Indicates that a mobile user in this
  network must register with a foreign agent. In particular, a mobile
  user cannot obtain a care-of address in the foreign network (for
  example, using DHCP) and assume the functionality of the foreign
  agent for itself, without registering with the foreign agent.
- M, G encapsulation bits. Indicate whether a form of encapsulation
  other than IP-in-IP encapsulation will be used.
- Care-of address (COA) fields. A list of one or more care-of addresses
  provided by the foreign agent.

Figure 7.28 ICMP router discovery message with mobility agent
advertisement extension
In our example below, the COA will be associated with the foreign
agent, who will receive datagrams sent to the COA and then forward them
to the appropriate mobile node. The mobile user will select one of
these addresses as its COA when registering with its home agent. Figure
7.28 illustrates some of the key fields in the agent advertisement
message. With agent solicitation, a mobile node wanting to learn about
agents without waiting to receive an agent advertisement can broadcast
an agent solicitation message, which is simply an ICMP message with
type value 10. An agent receiving the solicitation will unicast an
agent advertisement directly to the mobile node, which can then proceed
as if it had received an unsolicited advertisement.

Registration with the Home Agent

Once a mobile IP node has received a COA, that address must be
registered with the home agent. This can be done either via the foreign
agent (who then registers the COA with the home agent) or directly by
the mobile IP node itself. We consider the former case below. Four
steps are involved.

1. Following the receipt of a foreign agent advertisement, a mobile
   node sends a mobile IP registration message to the foreign agent.
   The registration message is carried within a UDP datagram and sent
   to port 434. The registration message carries a COA advertised by
   the foreign agent, the address of the home agent (HA), the permanent
   address of the mobile node (MA), the requested lifetime of the
   registration, and a 64-bit registration identification. The
   requested registration lifetime is the number of seconds that the
   registration is to be valid. If the registration is not renewed at
   the home agent within the specified lifetime, the registration will
   become invalid. The registration identifier acts like a sequence
   number and serves to match a received registration reply with a
   registration request, as discussed below.

2.
The foreign agent receives the registration message and records the - mobile node's permanent IP address. The foreign agent now knows that - it should be looking for datagrams containing an encapsulated - datagram whose destination address matches the permanent address of - the mobile node. The foreign agent then sends a mobile IP - registration message (again, within a UDP datagram) to port 434 of - the home agent. The message contains the COA, HA, MA, encapsulation - format requested, requested registration lifetime, and registration - identification. - -3. The home agent receives the registration request and checks for - authenticity and correctness. The home agent binds the mobile node's - permanent IP address with the COA; in the future, datagrams arriving - at the home agent and addressed to the mobile node will now be - encapsulated and tunneled to the COA. The home agent sends a mobile - IP registration reply containing the HA, MA, actual registration - lifetime, and the registration identification of the request that is - being satisfied with this reply. - -4. The foreign agent receives the registration reply and then forwards - it to the mobile node. - -At this point, registration is complete, and the mobile node can receive -datagrams sent to its permanent address. Figure 7.29 illustrates these -steps. Note that the home agent specifies a lifetime that is smaller -than the lifetime requested by the mobile node. A foreign agent need not -explicitly deregister a COA when a mobile node leaves its network. This -will occur automatically, when the mobile node moves to a new network -(whether another foreign network or its home network) and registers a -new COA. The mobile IP standard allows many additional scenarios and -capabilities in addition to those described previously. The interested -reader should consult \[Perkins 1998b; RFC 5944\]. 
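The fields carried in step 1 above can be made concrete with a short sketch. The code below packs the fixed-length portion of a registration request from the fields just described (message type, requested lifetime, permanent address, home agent address, COA, and the 64-bit identification). The byte layout and the flag handling are simplified assumptions for illustration, not a complete RFC 5944 encoder; the addresses reuse the example values from Figure 7.23, except the home agent address, which is hypothetical.

```python
# Sketch: pack the fixed-length portion of a mobile IP registration
# request as described in the four-step exchange above. Simplified:
# the flag bits are left at zero and extensions are omitted.
import socket
import struct

REGISTRATION_REQUEST = 1   # registration request message type
REGISTRATION_PORT = 434    # UDP port used for registration messages

def build_registration_request(home_addr, home_agent, coa, lifetime, ident):
    """Pack type, flags, lifetime, home address, HA address, COA, and
    the 64-bit identification into 24 network-byte-order bytes."""
    return struct.pack(
        "!BBH4s4s4sQ",
        REGISTRATION_REQUEST,
        0,                            # flag bits omitted in this sketch
        lifetime,                     # requested lifetime, in seconds
        socket.inet_aton(home_addr),  # mobile node's permanent address (MA)
        socket.inet_aton(home_agent), # home agent's address (HA)
        socket.inet_aton(coa),        # care-of address in the foreign network
        ident,                        # 64-bit identification, echoed in reply
    )

msg = build_registration_request(
    home_addr="128.119.40.186",   # permanent address from Figure 7.23
    home_agent="128.119.40.7",    # hypothetical home agent address
    coa="79.129.13.2",            # COA in visited network 79.129.13/24
    lifetime=9999,
    ident=0x1234567890ABCDEF,
)
# msg would be carried in a UDP datagram sent to port 434 of the agent.
```

The foreign agent would relay an analogously packed message to port 434 of the home agent, and the identification field lets the eventual reply be matched to this request.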
Figure 7.29 Agent advertisement and mobile IP registration

7.7 Managing Mobility in Cellular Networks

Having examined how mobility is managed in IP networks, let's now turn
our attention to networks with an even longer history of supporting
mobility---cellular telephony networks. Whereas we focused on the
first-hop wireless link in cellular networks in Section 7.4, we'll
focus here on mobility, using the GSM cellular network \[Goodman 1997;
Mouly 1992; Scourias 2012; Kaaranen 2001; Korhonen 2003; Turner 2012\]
as our case study, since it is a mature and widely deployed technology.
Mobility in 3G and 4G networks is similar in principle to that used in
GSM. As in the case of mobile IP, we'll see that a number of the
fundamental principles we identified in Section 7.5 are embodied in
GSM's network architecture. Like mobile IP, GSM adopts an indirect
routing approach (see Section 7.5.2), first routing the correspondent's
call to the mobile user's home network and from there to the visited
network. In GSM terminology, the mobile user's home network is referred
to as the mobile user's home public land mobile network (home PLMN).
Since the PLMN acronym is a bit of a mouthful, and mindful of our quest
to avoid an alphabet soup of acronyms, we'll refer to the GSM home PLMN
simply as the home network. The home network is the cellular provider
with which the mobile user has a subscription (i.e., the provider that
bills the user for monthly cellular service). The visited PLMN, which
we'll refer to simply as the visited network, is the network in which
the mobile user is currently residing. As in the case of mobile IP, the
responsibilities of the home and visited networks are quite different.
The home network maintains a database known as the home location
register (HLR), which contains the permanent cell phone number and
subscriber profile information for each of its subscribers.
Importantly, the HLR also contains information about the current
locations of these subscribers. That is, if a mobile user is currently
roaming in another provider's cellular network, the HLR contains enough
information to obtain (via a process we'll describe shortly) an address
in the visited network to which a call to the mobile user should be
routed. As we'll see, a special switch in the home network, known as
the Gateway Mobile services Switching Center (GMSC), is contacted by a
correspondent when a call is placed to a mobile user. Again, in our
quest to avoid an alphabet soup of acronyms, we'll refer to the GMSC
here by a more descriptive term, home MSC. The visited network
maintains a database known as the visitor location register (VLR). The
VLR contains an entry for each mobile user that is currently in the
portion of the network served by the VLR. VLR entries thus come and go
as mobile users enter and leave the network. A VLR is usually
co-located with the mobile switching center (MSC) that coordinates the
setup of a call to and from the visited network.

In practice, a provider's cellular network will serve as a home network
for its subscribers and as a visited network for mobile users whose
subscription is with a different cellular provider.

Figure 7.30 Placing a call to a mobile user: Indirect routing

7.7.1 Routing Calls to a Mobile User

We're now in a position to describe how a call is placed to a mobile
GSM user in a visited network. We'll consider a simple example below;
more complex scenarios are described in \[Mouly 1992\]. The steps, as
illustrated in Figure 7.30, are as follows:

1. The correspondent dials the mobile user's phone number. This number
   itself does not refer to a particular telephone line or location
   (after all, the phone number is fixed and the user is mobile!). The
   leading digits in the number are sufficient to globally identify the
   mobile's home network.
   The call is routed from the correspondent through the PSTN to the
   home MSC in the mobile's home network. This is the first leg of the
   call.

2. The home MSC receives the call and interrogates the HLR to
   determine the location of the mobile user. In the simplest case, the
   HLR returns the mobile station roaming number (MSRN), which we will
   refer to as the roaming number. Note that this number is different
   from the mobile's permanent phone number, which is associated with
   the mobile's home network. The roaming number is ephemeral: It is
   temporarily assigned to a mobile when it enters a visited network.
   The roaming number serves a role similar to that of the care-of
   address in mobile IP and, like the COA, is invisible to the
   correspondent and the mobile. If the HLR does not have the roaming
   number, it returns the address of the VLR in the visited network. In
   this case (not shown in Figure 7.30), the home MSC will need to
   query the VLR to obtain the roaming number of the mobile node. But
   how does the HLR get the roaming number or the VLR address in the
   first place? What happens to these values when the mobile user moves
   to another visited network? We'll consider these important questions
   shortly.

3. Given the roaming number, the home MSC sets up the second leg of
   the call through the network to the MSC in the visited network. The
   call is completed, being routed from the correspondent to the home
   MSC, and from there to the visited MSC, and from there to the base
   station serving the mobile user.

An unresolved question in step 2 is how the HLR obtains information
about the location of the mobile user. When a mobile telephone is
switched on or enters a part of a visited network that is covered by a
new VLR, the mobile must register with the visited network. This is
done through the exchange of signaling messages between the mobile and
the VLR.
The visited VLR, in turn, sends a location update request message to
the mobile's HLR. This message informs the HLR of either the roaming
number at which the mobile can be contacted, or the address of the VLR
(which can then later be queried to obtain the mobile's roaming
number). As part of this exchange, the VLR also obtains subscriber
information from the HLR about the mobile and determines what services
(if any) should be accorded the mobile user by the visited network.

7.7.2 Handoffs in GSM

A handoff occurs when a mobile station changes its association from one
base station to another during a call. As shown in Figure 7.31, a
mobile's call is initially (before handoff) routed to the mobile
through one base station (which we'll refer to as the old base
station), and after handoff is routed to the mobile through another
base station (which we'll refer to as the new base station).

Figure 7.31 Handoff scenario between base stations with a common MSC

Note that a handoff between base stations results not only in the
mobile transmitting/receiving to/from a new base station, but also in
the rerouting of the ongoing call from a switching point within the
network to the new base station. Let's initially assume that the old
and new base stations share the same MSC, and that the rerouting occurs
at this MSC. There may be several reasons for handoff to occur,
including (1) the signal between the current base station and the
mobile may have deteriorated to such an extent that the call is in
danger of being dropped, and (2) a cell may have become overloaded,
handling a large number of calls. This congestion may be alleviated by
handing off mobiles to less congested nearby cells. While it is
associated with a base station, a mobile periodically measures the
strength of a beacon signal from its current base station as well as
beacon signals from nearby base stations that it can "hear."
These measurements are reported once or twice a second to the mobile's
current base station. Handoff in GSM is initiated by the old base
station based on these measurements, the current loads of mobiles in
nearby cells, and other factors \[Mouly 1992\]. The GSM standard does
not specify the specific algorithm to be used by a base station to
determine whether or not to perform handoff. Figure 7.32 illustrates
the steps involved when a base station does decide to hand off a mobile
user:

1. The old base station (BS) informs the visited MSC that a handoff is
   to be performed and the BS (or possible set of BSs) to which the
   mobile is to be handed off.

2. The visited MSC initiates path setup to the new BS, allocating the
   resources needed to carry the rerouted call, and signaling the new
   BS that a handoff is about to occur.

3. The new BS allocates and activates a radio channel for use by the
   mobile.

4. The new BS signals back to the visited MSC and the old BS that the
   visited-MSC-to-new-BS path has been established and that the mobile
   should be informed of the impending handoff. The new BS provides all
   of the information that the mobile will need to associate with the
   new BS.

5. The mobile is informed that it should perform a handoff. Note that
   up until this point, the mobile has been blissfully unaware that the
   network has been laying the groundwork (e.g., allocating a channel
   in the new BS and allocating a path from the visited MSC to the new
   BS) for a handoff.

6. The mobile and the new BS exchange one or more messages to fully
   activate the new channel in the new BS.

7. The mobile sends a handoff complete message to the new BS, which is
   forwarded up to the visited MSC. The visited MSC then reroutes the
   ongoing call to the mobile via the new BS.

8. The resources allocated along the path to the old BS are then
   released.

Figure 7.32 Steps in accomplishing a handoff between base stations with
a common MSC
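Since GSM leaves the handoff decision to the old base station, the sketch below shows one plausible, purely hypothetical policy over the beacon-strength reports described above: hand off only when a neighbor's beacon exceeds the current base station's by a hysteresis margin, which keeps a mobile from flapping between two cells of similar strength. The 3 dB margin and the function names are illustrative assumptions, not part of the GSM standard.

```python
# Hypothetical handoff-decision policy (GSM does not specify one):
# compare a neighbor's beacon strength against the current base
# station's plus a hysteresis margin before deciding to hand off.

HYSTERESIS_DB = 3.0  # illustrative margin; not specified by GSM

def choose_handoff_target(current_bs, reports, hysteresis_db=HYSTERESIS_DB):
    """reports maps base-station id -> measured beacon strength in dBm,
    as reported by the mobile once or twice a second. Returns the id of
    the BS to hand off to, or None to stay with the current BS."""
    current_strength = reports[current_bs]
    best_id = max(reports, key=reports.get)  # strongest beacon heard
    if best_id != current_bs and \
            reports[best_id] >= current_strength + hysteresis_db:
        return best_id
    return None
```

A real base station would also weigh the load of nearby cells, as noted above, before picking the target BS that it reports to the visited MSC in step 1.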
Let's conclude our discussion of handoff by considering what happens
when the mobile moves to a BS that is associated with a different MSC
than the old BS, and what happens when this inter-MSC handoff occurs
more than once. As shown in Figure 7.33, GSM defines the notion of an
anchor MSC. The anchor MSC is the MSC visited by the mobile when a call
first begins; the anchor MSC thus remains unchanged during the call.
Throughout the call's duration and regardless of the number of
inter-MSC transfers performed by the mobile, the call is routed from
the home MSC to the anchor MSC, and then from the anchor MSC to the
visited MSC where the mobile is currently located.

Figure 7.33 Rerouting via the anchor MSC

Table 7.2 Commonalities between mobile IP and GSM mobility

| GSM element | Comment on GSM element | Mobile IP element |
|---|---|---|
| Home system | Network to which the mobile user's permanent phone number belongs. | Home network |
| Gateway mobile switching center, or simply home MSC; Home location register (HLR) | Home MSC: point of contact to obtain routable address of mobile user. HLR: database in home system containing permanent phone number, profile information, current location of mobile user, subscription information. | Home agent |
| Visited system | Network other than home system where mobile user is currently residing. | Visited network |
| Visited mobile services switching center; Visitor location register (VLR) | Visited MSC: responsible for setting up calls to/from mobile nodes in cells associated with MSC. VLR: temporary database entry in visited system, containing subscription information for each visiting mobile user. | Foreign agent |
| Mobile station roaming number (MSRN), or simply roaming number | Routable address for telephone call segment between home MSC and visited MSC, visible to neither the mobile nor the correspondent. | Care-of address |
When a mobile moves from the coverage area of one MSC to another, the
ongoing call is rerouted from the anchor MSC to the new visited MSC
containing the new base station. Thus, at all times there are at most
three MSCs (the home MSC, the anchor MSC, and the visited MSC) between
the correspondent and the mobile. Figure 7.33 illustrates the routing
of a call among the MSCs visited by a mobile user. Rather than
maintaining a single MSC hop from the anchor MSC to the current MSC, an
alternative approach would have been to simply chain the MSCs visited
by the mobile, having an old MSC forward the ongoing call to the new
MSC each time the mobile moves to a new MSC. Such MSC chaining can in
fact occur in IS-41 cellular networks, with an optional path
minimization step to remove MSCs between the anchor MSC and the current
visited MSC \[Lin 2001\]. Let's wrap up our discussion of GSM mobility
management with a comparison of mobility management in GSM and Mobile
IP. The comparison in Table 7.2 indicates that although IP and cellular
networks are fundamentally different in many ways, they share a
surprising number of common functional elements and overall approaches
in handling mobility.

7.8 Wireless and Mobility: Impact on Higher-Layer Protocols

In this chapter, we've seen that wireless networks differ significantly
from their wired counterparts at both the link layer (as a result of
wireless channel characteristics such as fading, multipath, and hidden
terminals) and at the network layer (as a result of mobile users who
change their points of attachment to the network). But are there
important differences at the transport and application layers? It's
tempting to think that these differences will be minor, since the
network layer provides the same best-effort delivery service model to
upper layers in both wired and wireless networks.
Similarly, if protocols such as TCP or -UDP are used to provide transport-layer services to applications in both -wired and wireless networks, then the application layer should remain -unchanged as well. In one sense our intuition is right---TCP and UDP can -(and do) operate in networks with wireless links. On the other hand, -transport protocols in general, and TCP in particular, can sometimes -have very different performance in wired and wireless networks, and it -is here, in terms of performance, that differences are manifested. Let's -see why. Recall that TCP retransmits a segment that is either lost or -corrupted on the path between sender and receiver. In the case of mobile -users, loss can result from either network congestion (router buffer -overflow) or from handoff (e.g., from delays in rerouting segments to a -mobile's new point of attachment to the network). In all cases, TCP's -receiver-to-sender ACK indicates only that a segment was not received -intact; the sender is unaware of whether the segment was lost due to -congestion, during handoff, or due to detected bit errors. In all cases, -the sender's response is the same---to retransmit the segment. TCP's -congestion-control response is also the same in all cases---TCP -decreases its congestion window, as discussed in Section 3.7. By -unconditionally decreasing its congestion window, TCP implicitly assumes -that segment loss results from congestion rather than corruption or -handoff. We saw in Section 7.2 that bit errors are much more common in -wireless networks than in wired networks. When such bit errors occur or -when handoff loss occurs, there's really no reason for the TCP sender to -decrease its congestion window (and thus decrease its sending rate). -Indeed, it may well be the case that router buffers are empty and -packets are flowing along the end-to-end path unimpeded by congestion. 
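The behavior just described can be made concrete with a small simulation sketch (illustrative only; the function and parameter names are ours, and real TCP congestion control as described in Section 3.7 is more involved, with slow start, timeouts, and fast recovery):

```python
# Sketch of TCP-style AIMD: the sender halves its congestion window on any
# loss, with no way to tell congestion loss from wireless bit-error or
# handoff loss. Names and simplifications are ours, for illustration.

def simulate_cwnd(loss_events, mss=1, initial_cwnd=10):
    """Additive increase, multiplicative decrease across a series of RTTs.
    loss_events[i] is True if any segment was lost in RTT i, for any reason."""
    cwnd = initial_cwnd
    trace = []
    for lost in loss_events:
        if lost:
            cwnd = max(cwnd // 2, 1)   # halve: implicitly assumes loss == congestion
        else:
            cwnd += mss                # additive increase: one MSS per loss-free RTT
        trace.append(cwnd)
    return trace

# Two losses over six RTTs: each one shrinks the window, even if they were
# wireless bit errors and every router buffer along the path was empty.
print(simulate_cwnd([False, False, True, False, True, False]))
# → [11, 12, 6, 7, 3, 4]
```

The two losses here could just as well have been corruption on the wireless hop, yet the window (and thus the sending rate) is still cut in half each time, which is precisely the problem the approaches discussed next try to address.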
Researchers realized in the early to mid 1990s that given high bit error rates on wireless links and the possibility of handoff loss, TCP's congestion-control response could be problematic in a wireless setting. Three broad classes of approaches are possible for dealing with this problem:

- Local recovery. Local recovery protocols recover from bit errors when and where they occur (e.g., at the wireless link), e.g., the 802.11 ARQ protocol we studied in Section 7.3, or more sophisticated approaches that use both ARQ and FEC [Ayanoglu 1995].

- TCP sender awareness of wireless links. In the local recovery approaches, the TCP sender is blissfully unaware that its segments are traversing a wireless link. An alternative approach is for the TCP sender and receiver to be aware of the existence of a wireless link, to distinguish between congestive losses occurring in the wired network and corruption/loss occurring at the wireless link, and to invoke congestion control only in response to congestive wired-network losses. [Balakrishnan 1997] investigates various types of TCP, assuming that end systems can make this distinction. [Liu 2003] investigates techniques for distinguishing between losses on the wired and wireless segments of an end-to-end path.

- Split-connection approaches. In a split-connection approach [Bakre 1995], the end-to-end connection between the mobile user and the other end point is broken into two transport-layer connections: one from the mobile host to the wireless access point, and one from the wireless access point to the other communication end point (which we'll assume here is a wired host). The end-to-end connection is thus formed by the concatenation of a wireless part and a wired part. The transport layer over the wireless segment can be a standard TCP connection [Bakre 1995], or a specially tailored error recovery protocol on top of UDP.
\[Yavatkar 1994\] investigates -the use of a transport-layer selective repeat protocol over the wireless -connection. Measurements reported in \[Wei 2006\] indicate that split -TCP connections are widely used in cellular data networks, and that -significant improvements can indeed be made through the use of split TCP -connections. Our treatment of TCP over wireless links has been -necessarily brief here. In-depth surveys of TCP challenges and solutions -in wireless networks can be found in \[Hanabali 2005; Leung 2006\]. We -encourage you to consult the references for details of this ongoing area -of research. Having considered transport-layer protocols, let us next -consider the effect of wireless and mobility on application-layer -protocols. Here, an important consideration is that wireless links often -have relatively low bandwidths, as we saw in Figure 7.2. As a result, -applications that operate over wireless links, particularly over -cellular wireless links, must treat bandwidth as a scarce commodity. For -example, a Web server serving content to a Web browser executing on a 4G -phone will likely not be able to provide the same image-rich content -that it gives to a browser operating over a wired connection. Although -wireless links do provide challenges at the application layer, the -mobility they enable also makes possible a rich set of location-aware -and context-aware applications \[Chen 2000; Baldauf 2007\]. More -generally, wireless and mobile networks will play a key role in -realizing the ubiquitous computing environments of the future \[Weiser -1991\]. It's fair to say that we've only seen the tip of the iceberg -when it comes to the impact of wireless and mobile networks on networked -applications and their protocols! - -7.9 Summary Wireless and mobile networks have revolutionized telephony -and are having an increasingly profound impact in the world of computer -networks as well. 
With their anytime, anywhere, untethered access into -the global network infrastructure, they are not only making network -access more ubiquitous, they are also enabling an exciting new set of -location-dependent services. Given the growing importance of wireless -and mobile networks, this chapter has focused on the principles, common -link technologies, and network architectures for supporting wireless and -mobile communication. We began this chapter with an introduction to -wireless and mobile networks, drawing an important distinction between -the challenges posed by the wireless nature of the communication links -in such networks, and by the mobility that these wireless links enable. -This allowed us to better isolate, identify, and master the key concepts -in each area. We focused first on wireless communication, considering -the characteristics of a wireless link in Section 7.2. In Sections 7.3 -and 7.4, we examined the link-level aspects of the IEEE 802.11 (WiFi) -wireless LAN standard, two IEEE 802.15 personal area networks (Bluetooth -and Zigbee), and 3G and 4G cellular Internet access. We then turned our -attention to the issue of mobility. In Section 7.5, we identified -several forms of mobility, with points along this spectrum posing -different challenges and admitting different solutions. We considered -the problems of locating and routing to a mobile user, as well as -approaches for handing off the mobile user who dynamically moves from -one point of attachment to the network to another. We examined how these -issues were addressed in the mobile IP standard and in GSM, in Sections -7.6 and 7.7, respectively. Finally, we considered the impact of wireless -links and mobility on transport-layer protocols and networked -applications in Section 7.8. Although we have devoted an entire chapter -to the study of wireless and mobile networks, an entire book (or more) -would be required to fully explore this exciting and rapidly expanding -field. 
We encourage you to delve more deeply into this field by consulting the many references provided in this chapter.

Homework Problems and Questions

Chapter 7 Review Questions

Section 7.1 R1. What does it mean for a wireless network to be operating in "infrastructure mode"? If the network is not in infrastructure mode, what mode of operation is it in, and what is the difference between that mode of operation and infrastructure mode? R2. What are the four types of wireless networks identified in our taxonomy in Section 7.1? Which of these types of wireless networks have you used?

Section 7.2 R3. What are the differences between the following types of wireless channel impairments: path loss, multipath propagation, interference from other sources? R4. As a mobile node gets farther and farther away from a base station, what are two actions that a base station could take to ensure that the loss probability of a transmitted frame does not increase?

Sections 7.3 and 7.4 R5. Describe the role of the beacon frames in 802.11. R6. True or false: Before an 802.11 station transmits a data frame, it must first send an RTS frame and receive a corresponding CTS frame. R7. Why are acknowledgments used in 802.11 but not in wired Ethernet? R8. True or false: Ethernet and 802.11 use the same frame structure. R9. Describe how the RTS threshold works. R10. Suppose the IEEE 802.11 RTS and CTS frames were as long as the standard DATA and ACK frames. Would there be any advantage to using the CTS and RTS frames? Why or why not? R11. Section 7.3.4 discusses 802.11 mobility, in which a wireless station moves from one BSS to another within the same subnet. When the APs are interconnected with a switch, an AP may need to send a frame with a spoofed MAC address to get the switch to forward the frame properly. Why?

R12. What are the differences between a master device in a Bluetooth network and a base station in an 802.11 network? R13.
What is meant by a super frame in the 802.15.4 Zigbee standard? R14. What is the role of the "core network" in the 3G cellular data architecture? R15. What is the role of the RNC in the 3G cellular data network architecture? What role does the RNC play in the cellular voice network? R16. What is the role of the eNodeB, MME, P-GW, and S-GW in 4G architecture? R17. What are three important differences between the 3G and 4G cellular architectures?

Sections 7.5 and 7.6 R18. If a node has a wireless connection to the Internet, does that node have to be mobile? Explain. Suppose that a user walks around her house with her laptop, and always accesses the Internet through the same access point. Is this user mobile from a network standpoint? Explain. R19. What is the difference between a permanent address and a care-of address? Who assigns a care-of address? R20. Consider a TCP connection going over Mobile IP. True or false: The TCP connection phase between the correspondent and the mobile host goes through the mobile's home network, but the data transfer phase is directly between the correspondent and the mobile host, bypassing the home network.

Section 7.7 R21. What are the purposes of the HLR and VLR in GSM networks? What elements of mobile IP are similar to the HLR and VLR? R22. What is the role of the anchor MSC in GSM networks?

Section 7.8 R23. What are three approaches that can be taken to avoid having a single wireless link degrade the performance of an end-to-end transport-layer TCP connection?

Problems

P1. Consider the single-sender CDMA example in Figure 7.5. What would be the sender's output (for the 2 data bits shown) if the sender's CDMA code were (1,−1,1,−1,1,−1,1,−1)?

P2. Consider sender 2 in Figure 7.6. What is the sender's output to the channel (before it is added to the signal from sender 1), Z^2_{i,m}?

P3. Suppose that the receiver in Figure 7.6 wanted to receive the data being sent by sender 2.
Show (by calculation) that the receiver is indeed able to recover sender 2's data from the aggregate channel signal by using sender 2's code.

P4. For the two-sender, two-receiver example, give an example of two CDMA codes containing 1 and −1 values that do not allow the two receivers to extract the original transmitted bits from the two CDMA senders.

P5. Suppose there are two ISPs providing WiFi access in a particular café, with each ISP operating its own AP and having its own IP address block.

a. Further suppose that by accident, each ISP has configured its AP to operate over channel 11. Will the 802.11 protocol completely break down in this situation? Discuss what happens when two stations, each associated with a different ISP, attempt to transmit at the same time.

b. Now suppose that one AP operates over channel 1 and the other over channel 11. How do your answers change?

P6. In step 4 of the CSMA/CA protocol, a station that successfully transmits a frame begins the CSMA/CA protocol for a second frame at step 2, rather than at step 1. What rationale might the designers of CSMA/CA have had in mind by having such a station not transmit the second frame immediately (if the channel is sensed idle)?

P7. Suppose an 802.11b station is configured to always reserve the channel with the RTS/CTS sequence. Suppose this station suddenly wants to transmit 1,000 bytes of data, and all other stations are idle at this time. As a function of SIFS and DIFS, and ignoring propagation delay and assuming no bit errors, calculate the time required to transmit the frame and receive the acknowledgment.

P8. Consider the scenario shown in Figure 7.34, in which there are four wireless nodes, A, B, C, and D. The radio coverage of the four nodes is shown via the shaded ovals; all nodes share the same frequency.
Figure 7.34 Scenario for problem P8

When A transmits, it can only be heard/received by B; when B transmits, both A and C can hear/receive from B; when C transmits, both B and D can hear/receive from C; when D transmits, only C can hear/receive from D. Suppose now that each node has an infinite supply of messages that it wants to send to each of the other nodes. If a message's destination is not an immediate neighbor, then the message must be relayed. For example, if A wants to send to D, a message from A must first be sent to B, which then sends the message to C, which then sends the message to D. Time is slotted, with a message transmission time taking exactly one time slot, e.g., as in slotted Aloha. During a slot, a node can do one of the following: (i) send a message, (ii) receive a message (if exactly one message is being sent to it), (iii) remain silent. As always, if a node hears two or more simultaneous transmissions, a collision occurs and none of the transmitted messages are received successfully. You can assume here that there are no bit-level errors, and thus if exactly one message is sent, it will be received correctly by those within the transmission radius of the sender.

a. Suppose now that an omniscient controller (i.e., a controller that knows the state of every node in the network) can command each node to do whatever it (the omniscient controller) wishes, i.e., to send a message, to receive a message, or to remain silent. Given this omniscient controller, what is the maximum rate at which a data message can be transferred from C to A, given that there are no other messages between any other source/destination pairs?

b. Suppose now that A sends messages to B, and D sends messages to C. What is the combined maximum rate at which data messages can flow from A to B and from D to C?

c. Suppose now that A sends messages to B, and C sends messages to D.
What is the combined maximum rate at which data messages can flow from A to B and from C to D?

d. Suppose now that the wireless links are replaced by wired links. Repeat questions (a) through (c) again in this wired scenario.

e. Now suppose we are again in the wireless scenario, and that for every data message sent from source to destination, the destination will send an ACK message back to the source (e.g., as in TCP). Also suppose that each ACK message takes up one slot. Repeat questions (a)--(c) above for this scenario.

P9. Describe the format of the 802.15.1 Bluetooth frame. You will have to do some reading outside of the text to find this information. Is there anything in the frame format that inherently limits the number of active nodes in an 802.15.1 network to eight active nodes? Explain.

P10. Consider the following idealized LTE scenario. The downstream channel (see Figure 7.21) is slotted in time, across F frequencies. There are four nodes, A, B, C, and D, reachable from the base station at rates of 10 Mbps, 5 Mbps, 2.5 Mbps, and 1 Mbps, respectively, on the downstream channel. These rates assume that the base station utilizes all time slots available on all F frequencies to send to just one station. The base station has an infinite amount of data to send to each of the nodes, and can send to any one of these four nodes using any of the F frequencies during any time slot in the downstream sub-frame.

a. What is the maximum rate at which the base station can send to the nodes, assuming it can send to any node it chooses during each time slot? Is your solution fair? Explain and define what you mean by "fair."

b. If there is a fairness requirement that each node must receive an equal amount of data during each one-second interval, what is the average transmission rate by the base station (to all nodes) during the downstream sub-frame? Explain how you arrived at your answer.

c.
Suppose that the fairness criterion is that any node can receive at most twice as much data as any other node during the sub-frame. What is the average transmission rate by the base station (to all nodes) during the sub-frame? Explain how you arrived at your answer.

P11. In Section 7.5, one proposed solution that allowed mobile users to maintain their IP addresses as they moved among foreign networks was to have a foreign network advertise a highly specific route to the mobile user and use the existing routing infrastructure to propagate this information throughout the network. We identified scalability as one concern. Suppose that when a mobile user moves from one network to another, the new foreign network advertises a specific route to the mobile user, and the old foreign network withdraws its route. Consider how routing information propagates in a distance-vector algorithm (particularly for the case of interdomain routing among networks that span the globe).

a. Will other routers be able to route datagrams immediately to the new foreign network as soon as the foreign network begins advertising its route?

b. Is it possible for different routers to believe that different foreign networks contain the mobile user?

c. Discuss the timescale over which other routers in the network will eventually learn the path to the mobile user.

P12. Suppose the correspondent in Figure 7.23 were mobile. Sketch the additional network-layer infrastructure that would be needed to route the datagram from the original mobile user to the (now mobile) correspondent. Show the structure of the datagram(s) between the original mobile user and the (now mobile) correspondent, as in Figure 7.24.

P13. In mobile IP, what effect will mobility have on end-to-end delays of datagrams between the source and destination?

P14. Consider the chaining example discussed at the end of Section 7.7.2.
Suppose a mobile user visits foreign networks A, B, and C, and that a correspondent begins a connection to the mobile user when it is resident in foreign network A. List the sequence of messages between foreign agents, and between foreign agents and the home agent as the mobile user moves from network A to network B to network C. Next, suppose chaining is not performed, and the correspondent (as well as the home agent) must be explicitly notified of the changes in the mobile user's care-of address. List the sequence of messages that would need to be exchanged in this second scenario.

P15. Consider two mobile nodes in a foreign network having a foreign agent. Is it possible for the two mobile nodes to use the same care-of address in mobile IP? Explain your answer.

P16. In our discussion of how the VLR updated the HLR with information about the mobile's current location, what are the advantages and disadvantages of providing the MSRN as opposed to the address of the VLR to the HLR?

Wireshark Lab At the Web site for this textbook, www.pearsonhighered.com/cs-resources, you'll find a Wireshark lab for this chapter that captures and studies the 802.11 frames exchanged between a wireless laptop and an access point.

AN INTERVIEW WITH... Deborah Estrin

Deborah Estrin is a Professor of Computer Science at Cornell Tech in New York City and a Professor of Public Health at Weill Cornell Medical College. She is founder of the Health Tech Hub at Cornell Tech and co-founder of the non-profit startup Open mHealth. She received her Ph.D. (1985) in Computer Science from M.I.T. and her B.S. (1980) from UC Berkeley. Estrin's early research focused on the design of network protocols, including multicast and inter-domain routing. In 2002 Estrin founded the NSF-funded Science and Technology Center at UCLA, Center for Embedded Networked Sensing (CENS, http://cens.ucla.edu).
CENS launched new areas of multi-disciplinary computer systems research, from sensor networks for environmental monitoring to participatory sensing for citizen science. Her current focus is on mobile health and small data, leveraging the pervasiveness of mobile devices and digital interactions for health and life management, as described in her 2013 TEDMED talk. Professor Estrin is an elected member of the American Academy of Arts and Sciences (2007) and the National Academy of Engineering (2009). She is a fellow of the IEEE, ACM, and AAAS. She was selected as the first ACM-W Athena Lecturer (2006), awarded the Anita Borg Institute's Women of Vision Award for Innovation (2007), inducted into the WITI hall of fame (2008), and awarded Doctor Honoris Causa from EPFL (2008) and Uppsala University (2011).

Please describe a few of the most exciting projects you have worked on during your career. What were the biggest challenges? In the mid-90s at USC and ISI, I had the great fortune to work with the likes of Steve Deering, Mark Handley, and Van Jacobson on the design of multicast routing protocols (in particular, PIM). I tried to carry many of the architectural design lessons from multicast into the design of ecological monitoring arrays, where for the first time I really began to take applications and multidisciplinary research seriously. That interest in jointly innovating in the social and technological space is what interests me so much about my latest area of research, mobile health. The challenges in these projects were as diverse as the problem domains, but what they all had in common was the need to keep our eyes open to whether we had the problem definition right as we iterated between design and deployment, prototype and pilot. None of them were problems that could be solved analytically, with simulation or even in constructed laboratory experiments.
They all challenged our ability to retain clean architectures in the presence of messy problems and contexts, and they all called for extensive collaboration.

What changes and innovations do you see happening in wireless networks and mobility in the future? In a prior edition of this interview I said that I have never put much faith into predicting the future, but I did go on to speculate that we might see the end of feature phones (i.e., those that are not programmable and are used only for voice and text messaging) as smart phones become more and more powerful and the primary point of Internet access for many---and now, not so many years later, that is clearly the case. I also predicted that we would see the continued proliferation of embedded SIMs by which all sorts of devices have the ability to communicate via the cellular network at low data rates. While that has occurred, we see many devices and "Internet of Things" that use embedded WiFi and other lower-power, shorter-range forms of connectivity to local hubs. I did not anticipate at that time the emergence of a large consumer wearables market. By the time the next edition is published I expect broad proliferation of personal applications that leverage data from IoT and other digital traces.

Where do you see the future of networking and the Internet? Again I think it's useful to look both back and forward. Previously I observed that the efforts in named data and software-defined networking would emerge to create a more manageable, evolvable, and richer infrastructure and more generally represent moving the role of architecture higher up in the stack. In the beginnings of the Internet, architecture was layer 4 and below, with applications being more siloed/monolithic, sitting on top. Now data and analytics dominate transport. The adoption of SDN (which I'm really happy to see is featured in this 7th edition of this book) has been well beyond what I ever anticipated.
However, looking up the stack, our dominant applications increasingly live in walled gardens, whether mobile apps or large consumer platforms such as Facebook. As Data Science and Big Data techniques develop, they might help to lure these applications out of their silos because of the value in connecting with other apps and platforms.

What people inspired you professionally? There are three people who come to mind. First, Dave Clark, the secret sauce and under-sung hero of the Internet community. I was lucky to be around in the early days to see him act as the "organizing principle" of the IAB and Internet governance; the priest of rough consensus and running code. Second, Scott Shenker, for his intellectual brilliance, integrity, and persistence. I strive for, but rarely attain, his clarity in defining problems and solutions. He is always the first person I e-mail for advice on matters large and small. Third, my sister Judy Estrin, who had the creativity and courage to spend her career bringing ideas and concepts to market. Without the Judys of the world the Internet technologies would never have transformed our lives.

What are your recommendations for students who want careers in computer science and networking? First, build a strong foundation in your academic work, balanced with any and every real-world work experience you can get. As you look for a working environment, seek opportunities in problem areas you really care about and with smart teams that you can learn from.

Chapter 8 Security in Computer Networks

Way back in Section 1.6 we described some of the more prevalent and damaging classes of Internet attacks, including malware attacks, denial of service, sniffing, source masquerading, and message modification and deletion. Although we have since learned a tremendous amount about computer networks, we still haven't examined how to secure networks from those attacks.
Equipped with our newly acquired expertise in computer -networking and Internet protocols, we'll now study in-depth secure -communication and, in particular, how computer networks can be defended -from those nasty bad guys. Let us introduce Alice and Bob, two people -who want to communicate and wish to do so "securely." This being a -networking text, we should remark that Alice and Bob could be two -routers that want to exchange routing tables securely, a client and -server that want to establish a secure transport connection, or two -e-mail applications that want to exchange secure e-mail---all case -studies that we will consider later in this chapter. Alice and Bob are -well-known fixtures in the security community, perhaps because their -names are more fun than a generic entity named "A" that wants to -communicate securely with a generic entity named "B." Love affairs, -wartime communication, and business transactions are the commonly cited -human needs for secure communications; preferring the first to the -latter two, we're happy to use Alice and Bob as our sender and receiver, -and imagine them in this first scenario. We said that Alice and Bob want -to communicate and wish to do so "securely," but what precisely does -this mean? As we will see, security (like love) is a many-splendored -thing; that is, there are many facets to security. Certainly, Alice and -Bob would like for the contents of their communication to remain secret -from an eavesdropper. They probably would also like to make sure that -when they are communicating, they are indeed communicating with each -other, and that if their communication is tampered with by an -eavesdropper, that this tampering is detected. In the first part of this -chapter, we'll cover the fundamental cryptography techniques that allow -for encrypting communication, authenticating the party with whom one is -communicating, and ensuring message integrity. 
In the second part of this chapter, we'll examine how the fundamental cryptography principles can be used to create secure networking protocols. Once again taking a top-down approach, we'll examine secure protocols in each of the (top four) layers, beginning with the application layer. We'll examine how to secure e-mail, how to secure a TCP connection, how to provide blanket security at the network layer, and how to secure a wireless LAN. In the third part of this chapter we'll consider operational security, which is about protecting organizational networks from attacks. In particular, we'll take a careful look at how firewalls and intrusion detection systems can enhance the security of an organizational network.

8.1 What Is Network Security? Let's begin our study of network security by returning to our lovers, Alice and Bob, who want to communicate "securely." What precisely does this mean? Certainly, Alice wants only Bob to be able to understand a message that she has sent, even though they are communicating over an insecure medium where an intruder (Trudy, the intruder) may intercept whatever is transmitted from Alice to Bob. Bob also wants to be sure that the message he receives from Alice was indeed sent by Alice, and Alice wants to make sure that the person with whom she is communicating is indeed Bob. Alice and Bob also want to make sure that the contents of their messages have not been altered in transit. They also want to be assured that they can communicate in the first place (i.e., that no one denies them access to the resources needed to communicate). Given these considerations, we can identify the following desirable properties of secure communication. Confidentiality. Only the sender and intended receiver should be able to understand the contents of the transmitted message.
Because eavesdroppers may intercept -the message, this necessarily requires that the message be somehow -encrypted so that an intercepted message cannot be understood by an -interceptor. This aspect of confidentiality is probably the most -commonly perceived meaning of the term secure communication. We'll study -cryptographic techniques for encrypting and decrypting data in Section -8.2. Message integrity. Alice and Bob want to ensure that the content of -their communication is not altered, either maliciously or by accident, -in transit. Extensions to the checksumming techniques that we -encountered in reliable transport and data link protocols can be used to -provide such message integrity. We will study message integrity in -Section 8.3. End-point authentication. Both the sender and receiver -should be able to confirm the identity of the other party involved in -the communication---to confirm that the other party is indeed who or -what they claim to be. Face-to-face human communication solves this -problem easily by visual recognition. When communicating entities -exchange messages over a medium where they cannot see the other party, -authentication is not so simple. When a user wants to access an inbox, -how does the mail server verify that the user is the person he or she -claims to be? We study end-point authentication in Section 8.4. -Operational security. Almost all organizations (companies, universities, -and so on) today have networks that are attached to the public Internet. -These networks therefore can potentially be compromised. Attackers can -attempt to deposit worms into the hosts in the network, obtain corporate -secrets, map the internal network configurations, and launch DoS -attacks. We'll see in Section 8.9 that operational devices such as -firewalls and intrusion detection systems are used to counter attacks -against an organization's network. 
A firewall sits between the organization's network and the public network, controlling packet access to and from the network. An intrusion detection system performs "deep packet inspection," alerting the network administrators about suspicious activity. Having established what we mean by network security, let's next consider exactly what information an intruder may have access to, and what actions can be taken by the intruder. Figure 8.1 illustrates the scenario. Alice, the sender, wants to send data to Bob, the receiver. In order to exchange data securely, while meeting the requirements of confidentiality, end-point authentication, and message integrity, Alice and Bob will exchange control messages and data messages (in much the same way that TCP senders and receivers exchange control segments and data segments).

Figure 8.1 Sender, receiver, and intruder (Alice, Bob, and Trudy)

All or some of these messages will typically be encrypted. As discussed in Section 1.6, an intruder can potentially perform eavesdropping (sniffing and recording control and data messages on the channel), as well as modification, insertion, or deletion of messages or message content. As we'll see, unless appropriate countermeasures are taken, these capabilities allow an intruder to mount a wide variety of security attacks: snooping on communication (possibly stealing passwords and data), impersonating another entity, hijacking an ongoing session, denying service to legitimate network users by overloading system resources, and so on. A summary of reported attacks is maintained at the CERT Coordination Center \[CERT 2016\]. Having established that there are indeed real threats loose in the Internet, what are the Internet equivalents of Alice and Bob, our friends who need to communicate securely? Certainly, Bob and Alice might be human users at two end systems, for example, a real Alice and a real Bob who really do want to exchange secure e-mail.
They might also be participants in an electronic commerce transaction. For example, a real Bob might want to transfer his credit card number securely to a Web server to purchase an item online. Similarly, a real Alice might want to interact with her bank online. The parties needing secure communication might themselves also be part of the network infrastructure. Recall that the domain name system (DNS, see Section 2.4) or routing daemons that exchange routing information (see Chapter 5) require secure communication between two parties. The same is true for network management applications, a topic we examined in Chapter 5. An intruder that could actively interfere with DNS lookups (as discussed in Section 2.4), routing computations \[RFC 4272\], or network management functions \[RFC 3414\] could wreak havoc in the Internet. Having now established the framework, a few of the most important definitions, and the need for network security, let us next delve into cryptography. While the use of cryptography in providing confidentiality is self-evident, we'll see shortly that it is also central to providing end-point authentication and message integrity---making cryptography a cornerstone of network security.

8.2 Principles of Cryptography

Although cryptography has a long history dating back at least as far as Julius Caesar, modern cryptographic techniques, including many of those used in the Internet, are based on advances made in the past 30 years. Kahn's book, The Codebreakers \[Kahn 1967\], and Singh's book, The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography \[Singh 1999\], provide a fascinating look at the long history of cryptography.

Figure 8.2 Cryptographic components

A complete discussion of cryptography itself requires a complete book \[Kaufman 1995; Schneier 1995\] and so we only touch on the essential aspects of cryptography, particularly as they are practiced on the Internet.
We also note that while our focus in this section will be on the use of cryptography for confidentiality, we'll see shortly that cryptographic techniques are inextricably woven into authentication, message integrity, nonrepudiation, and more. Cryptographic techniques allow a sender to disguise data so that an intruder can gain no information from the intercepted data. The receiver, of course, must be able to recover the original data from the disguised data. Figure 8.2 illustrates some of the important terminology. Suppose now that Alice wants to send a message to Bob. Alice's message in its original form (for example, "Bob, I love you. Alice") is known as plaintext, or cleartext. Alice encrypts her plaintext message using an encryption algorithm so that the encrypted message, known as ciphertext, looks unintelligible to any intruder. Interestingly, in many modern cryptographic systems, including those used in the Internet, the encryption technique itself is known---published, standardized, and available to everyone (for example, \[RFC 1321; RFC 3447; RFC 2420; NIST 2001\]), even a potential intruder! Clearly, if everyone knows the method for encoding data, then there must be some secret information that prevents an intruder from decrypting the transmitted data. This is where keys come in. In Figure 8.2, Alice provides a key, KA, a string of numbers or characters, as input to the encryption algorithm. The encryption algorithm takes the key and the plaintext message, m, as input and produces ciphertext as output. The notation KA(m) refers to the ciphertext form (encrypted using the key KA) of the plaintext message, m. The actual encryption algorithm that uses key KA will be evident from the context. Similarly, Bob will provide a key, KB, to the decryption algorithm that takes the ciphertext and Bob's key as input and produces the original plaintext as output.
That is, if Bob receives an encrypted message KA(m), he decrypts it by computing KB(KA(m))=m. In symmetric key systems, Alice's and Bob's keys are identical and are secret. In public key systems, a pair of keys is used. One of the keys is known to both Bob and Alice (indeed, it is known to the whole world). The other key is known only by either Bob or Alice (but not both). In the following two subsections, we consider symmetric key and public key systems in more detail.

8.2.1 Symmetric Key Cryptography

All cryptographic algorithms involve substituting one thing for another, for example, taking a piece of plaintext and then computing and substituting the appropriate ciphertext to create the encrypted message. Before studying a modern key-based cryptographic system, let us first get our feet wet by studying a very old, very simple symmetric key algorithm attributed to Julius Caesar, known as the Caesar cipher (a cipher is a method for encrypting data). For English text, the Caesar cipher would work by taking each letter in the plaintext message and substituting the letter that is k letters later (allowing wraparound; that is, having the letter z followed by the letter a) in the alphabet. For example, if k=3, then the letter a in plaintext becomes d in ciphertext; b in plaintext becomes e in ciphertext, and so on. Here, the value of k serves as the key. As an example, the plaintext message "bob, i love you. alice" becomes "ere, l oryh brx. dolfh" in ciphertext. While the ciphertext does indeed look like gibberish, it wouldn't take long to break the code if you knew that the Caesar cipher was being used, as there are only 25 possible key values. An improvement on the Caesar cipher is the monoalphabetic cipher, which also substitutes one letter of the alphabet with another letter of the alphabet.
However, rather than substituting according to a regular pattern (for example, substitution with an offset of k for all letters), any letter can be substituted for any other letter, as long as each letter has a unique substitute letter, and vice versa. The substitution rule in Figure 8.3 shows one possible rule for encoding plaintext.

Figure 8.3 A monoalphabetic cipher

The plaintext message "bob, i love you. alice" becomes "nkn, s gktc wky. mgsbc." Thus, as in the case of the Caesar cipher, this looks like gibberish. A monoalphabetic cipher would also appear to be better than the Caesar cipher in that there are 26! (on the order of 10^26) possible pairings of letters rather than 25 possible pairings. A brute-force approach of trying all 10^26 possible pairings would require far too much work to be a feasible way of breaking the encryption algorithm and decoding the message. However, by statistical analysis of the plaintext language, for example, knowing that the letters e and t are the most frequently occurring letters in typical English text (accounting for 13 percent and 9 percent of letter occurrences), and knowing that particular two- and three-letter occurrences of letters appear quite often together (for example, "in," "it," "the," "ion," "ing," and so forth), it is relatively easy to break this code. If the intruder has some knowledge about the possible contents of the message, then it is even easier to break the code. For example, if Trudy the intruder is Bob's wife and suspects Bob of having an affair with Alice, then she might suspect that the names "bob" and "alice" appear in the text. If Trudy knew for certain that those two names appeared in the ciphertext and had a copy of the example ciphertext message above, then she could immediately determine seven of the 26 letter pairings, requiring 10^9 fewer possibilities to be checked by a brute-force method.
Indeed, if Trudy suspected Bob of having an affair, she might well expect to find some other choice words in the message as well. When considering how easy it might be for Trudy to break Bob and Alice's encryption scheme, one can distinguish three different scenarios, depending on what information the intruder has.

Ciphertext-only attack. In some cases, the intruder may have access only to the intercepted ciphertext, with no certain information about the contents of the plaintext message. We have seen how statistical analysis can help in a ciphertext-only attack on an encryption scheme.

Known-plaintext attack. We saw above that if Trudy somehow knew for sure that "bob" and "alice" appeared in the ciphertext message, then she could have determined the (plaintext, ciphertext) pairings for the letters a, l, i, c, e, b, and o. Trudy might also have been fortunate enough to have recorded all of the ciphertext transmissions and then found Bob's own decrypted version of one of the transmissions scribbled on a piece of paper. When an intruder knows some of the (plaintext, ciphertext) pairings, we refer to this as a known-plaintext attack on the encryption scheme.

Chosen-plaintext attack. In a chosen-plaintext attack, the intruder is able to choose the plaintext message and obtain its corresponding ciphertext form. For the simple encryption algorithms we've seen so far, if Trudy could get Alice to send the message, "The quick brown fox jumps over the lazy dog," she could completely break the encryption scheme. We'll see shortly that for more sophisticated encryption techniques, a chosen-plaintext attack does not necessarily mean that the encryption technique can be broken. Five hundred years ago, techniques improving on monoalphabetic encryption, known as polyalphabetic encryption, were invented.
The idea behind polyalphabetic encryption is to use multiple monoalphabetic ciphers, with a specific monoalphabetic cipher to encode a letter in a specific position in the plaintext message. Thus, the same letter, appearing in different positions in the plaintext message, might be encoded differently.

Figure 8.4 A polyalphabetic cipher using two Caesar ciphers

An example of a polyalphabetic encryption scheme is shown in Figure 8.4. It has two Caesar ciphers (with k=5 and k=19), shown as rows. We might choose to use these two Caesar ciphers, C1 and C2, in the repeating pattern C1, C2, C2, C1, C2. That is, the first letter of plaintext is to be encoded using C1, the second and third using C2, the fourth using C1, and the fifth using C2. The pattern then repeats, with the sixth letter being encoded using C1, the seventh with C2, and so on. The plaintext message "bob, i love you." is thus encrypted "ghu, n etox dhz." Note that the first b in the plaintext message is encrypted using C1, while the second b is encrypted using C2. In this example, the encryption and decryption "key" is the knowledge of the two Caesar keys (k=5, k=19) and the pattern C1, C2, C2, C1, C2.

Block Ciphers

Let us now move forward to modern times and examine how symmetric key encryption is done today. There are two broad classes of symmetric encryption techniques: stream ciphers and block ciphers. We'll briefly examine stream ciphers in Section 8.7 when we investigate security for wireless LANs. In this section, we focus on block ciphers, which are used in many secure Internet protocols, including PGP (for secure e-mail), SSL (for securing TCP connections), and IPsec (for securing the network-layer transport). In a block cipher, the message to be encrypted is processed in blocks of k bits. For example, if k=64, then the message is broken into 64-bit blocks, and each block is encrypted independently.
To encode a block, the cipher uses a one-to-one mapping to map the k-bit block of cleartext to a k-bit block of ciphertext. Let's look at an example. Suppose that k=3, so that the block cipher maps 3-bit inputs (cleartext) to 3-bit outputs (ciphertext). One possible mapping is given in Table 8.1. Notice that this is a one-to-one mapping; that is, there is a different output for each input. This block cipher breaks the message up into 3-bit blocks and encrypts each block according to the above mapping. You should verify that the message 010110001111 gets encrypted into 101000111001.

Table 8.1 A specific 3-bit block cipher

| input | output | input | output |
|-------|--------|-------|--------|
| 000   | 110    | 100   | 011    |
| 001   | 111    | 101   | 010    |
| 010   | 101    | 110   | 000    |
| 011   | 100    | 111   | 001    |

Continuing with this 3-bit block example, note that the mapping in Table 8.1 is just one mapping of many possible mappings. How many possible mappings are there? To answer this question, observe that a mapping is nothing more than a permutation of all the possible inputs. There are 2^3 (= 8) possible inputs (listed under the input columns). These eight inputs can be permuted in 8! = 40,320 different ways. Since each of these permutations specifies a mapping, there are 40,320 possible mappings. We can view each of these mappings as a key---if Alice and Bob both know the mapping (the key), they can encrypt and decrypt the messages sent between them. The brute-force attack for this cipher is to try to decrypt ciphertext by using all mappings. With only 40,320 mappings (when k=3), this can quickly be accomplished on a desktop PC. To thwart brute-force attacks, block ciphers typically use much larger blocks, consisting of k=64 bits or even larger. Note that the number of possible mappings for a general k-bit block cipher is (2^k)!, which is astronomical for even moderate values of k (such as k=64).
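Table 8.1 is small enough to exercise directly. The sketch below (Python is used here purely for illustration; the function names are our own) encodes the mapping as a dictionary and checks the encryption of 010110001111 claimed above.

```python
# The 3-bit block cipher of Table 8.1 as a lookup table.
# Since the mapping is one-to-one, decryption simply inverts the table.
TABLE = {"000": "110", "001": "111", "010": "101", "011": "100",
         "100": "011", "101": "010", "110": "000", "111": "001"}
INVERSE = {out: inp for inp, out in TABLE.items()}

def encrypt(bits: str) -> str:
    # Break the message into 3-bit blocks and encrypt each independently.
    return "".join(TABLE[bits[i:i + 3]] for i in range(0, len(bits), 3))

def decrypt(bits: str) -> str:
    return "".join(INVERSE[bits[i:i + 3]] for i in range(0, len(bits), 3))

print(encrypt("010110001111"))  # 101000111001, as stated in the text
```

Because the table is a permutation of the eight possible inputs, decryption with the inverted table always recovers the original message.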
Although full-table block ciphers, as just described, with moderate values of k can produce robust symmetric key encryption schemes, they are unfortunately difficult to implement. For k=64 and for a given mapping, Alice and Bob would need to maintain a table with 2^64 input values, which is an infeasible task. Moreover, if Alice and Bob were to change keys, they would have to each regenerate the table. Thus, a full-table block cipher, providing predetermined mappings between all inputs and outputs (as in the example above), is simply out of the question. Instead, block ciphers typically use functions that simulate randomly permuted tables. An example (adapted from \[Kaufman 1995\]) of such a function for k=64 bits is shown in Figure 8.5. The function first breaks a 64-bit block into 8 chunks, with each chunk consisting of 8 bits. Each 8-bit chunk is processed by an 8-bit to 8-bit table, which is of manageable size. For example, the first chunk is processed by the table denoted by T1. Next, the 8 output chunks are reassembled into a 64-bit block. The positions of the 64 bits in the block are then scrambled (permuted) to produce a 64-bit output. This output is fed back to the 64-bit input, where another cycle begins. After n such cycles, the function provides a 64-bit block of ciphertext. The purpose of the rounds is to make each input bit affect most (if not all) of the final output bits. (If only one round were used, a given input bit would affect only 8 of the 64 output bits.) The key for this block cipher algorithm would be the eight permutation tables (assuming the scramble function is publicly known).

Figure 8.5 An example of a block cipher

Today there are a number of popular block ciphers, including DES (standing for Data Encryption Standard), 3DES, and AES (standing for Advanced Encryption Standard).
Each of these standards uses functions, rather than predetermined tables, along the lines of Figure 8.5 (albeit more complicated and specific to each cipher). Each of these algorithms also uses a string of bits for a key. For example, DES uses 64-bit blocks with a 56-bit key. AES uses 128-bit blocks and can operate with keys that are 128, 192, and 256 bits long. An algorithm's key determines the specific "mini-table" mappings and permutations within the algorithm's internals. The brute-force attack for each of these ciphers is to cycle through all the keys, applying the decryption algorithm with each key. Observe that with a key length of n, there are 2^n possible keys. NIST \[NIST 2001\] estimates that a machine that could crack 56-bit DES in one second (that is, try all 2^56 keys in one second) would take approximately 149 trillion years to crack a 128-bit AES key.

Cipher-Block Chaining

In computer networking applications, we typically need to encrypt long messages (or long streams of data). If we apply a block cipher as described by simply chopping up the message into k-bit blocks and independently encrypting each block, a subtle but important problem occurs. To see this, observe that two or more of the cleartext blocks can be identical. For example, the cleartext in two or more blocks could be "HTTP/1.1". For these identical blocks, a block cipher would, of course, produce the same ciphertext. An attacker could potentially guess the cleartext when it sees identical ciphertext blocks and may even be able to decrypt the entire message by identifying identical ciphertext blocks and using knowledge about the underlying protocol structure \[Kaufman 1995\]. To address this problem, we can mix some randomness into the ciphertext so that identical plaintext blocks produce different ciphertext blocks.
To explain this idea, let m(i) denote the ith plaintext block, c(i) denote the ith ciphertext block, and a⊕b denote the exclusive-or (XOR) of two bit strings, a and b. (Recall that 0⊕0=1⊕1=0 and 0⊕1=1⊕0=1, and that the XOR of two bit strings is done on a bit-by-bit basis. So, for example, 10101010⊕11110000=01011010.) Also, denote the block-cipher encryption algorithm with key S as KS. The basic idea is as follows. The sender creates a random k-bit number r(i) for the ith block and calculates c(i)=KS(m(i)⊕r(i)). Note that a new k-bit random number is chosen for each block. The sender then sends c(1), r(1), c(2), r(2), c(3), r(3), and so on. Since the receiver receives c(i) and r(i), it can recover each block of the plaintext by computing m(i)=KS(c(i))⊕r(i). It is important to note that, although r(i) is sent in the clear and thus can be sniffed by Trudy, she cannot obtain the plaintext m(i), since she does not know the key KS. Also note that if two plaintext blocks m(i) and m(j) are the same, the corresponding ciphertext blocks c(i) and c(j) will be different (as long as the random numbers r(i) and r(j) are different, which occurs with very high probability). As an example, consider the 3-bit block cipher in Table 8.1. Suppose the plaintext is 010010010. If Alice encrypts this directly, without including the randomness, the resulting ciphertext becomes 101101101. If Trudy sniffs this ciphertext, because each of the three cipher blocks is the same, she can correctly surmise that each of the three plaintext blocks are the same. Now suppose instead Alice generates the random blocks r(1)=001, r(2)=111, and r(3)=100 and uses the above technique to generate the ciphertext c(1)=100, c(2)=010, and c(3)=000. Note that the three ciphertext blocks are different even though the plaintext blocks are the same. Alice then sends c(1), r(1), c(2), r(2), c(3), and r(3). You should verify that Bob can obtain the original plaintext using the shared key KS.
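The worked example can be replayed in a few lines, with the Table 8.1 mapping playing the role of KS (a Python sketch for illustration only; the helper names are our own):

```python
# Table 8.1 as the block cipher KS; inverting the table gives decryption.
KS = {"000": "110", "001": "111", "010": "101", "011": "100",
      "100": "011", "101": "010", "110": "000", "111": "001"}
KS_INV = {out: inp for inp, out in KS.items()}

def xor3(a: str, b: str) -> str:
    # Bitwise XOR of two 3-bit strings.
    return format(int(a, 2) ^ int(b, 2), "03b")

m = ["010", "010", "010"]   # plaintext 010010010, three identical blocks
r = ["001", "111", "100"]   # Alice's random blocks r(1), r(2), r(3)

# Alice computes c(i) = KS(m(i) XOR r(i)).
c = [KS[xor3(mi, ri)] for mi, ri in zip(m, r)]
print(c)           # ['100', '010', '000']: distinct, despite identical plaintext

# Bob inverts KS and XORs away the randomness: m(i) = KS^-1(c(i)) XOR r(i).
recovered = [xor3(KS_INV[ci], ri) for ci, ri in zip(c, r)]
print(recovered)   # ['010', '010', '010']
```

Note that the three ciphertext blocks match those stated in the text even though every plaintext block is 010.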
The astute reader will note that introducing randomness solves one problem but creates another: namely, Alice must transmit twice as many bits as before. Indeed, for each cipher bit, she must now also send a random bit, doubling the required bandwidth. In order to have our cake and eat it too, block ciphers typically use a technique called Cipher Block Chaining (CBC). The basic idea is to send only one random value along with the very first message, and then have the sender and receiver use the computed coded blocks in place of the subsequent random number. Specifically, CBC operates as follows:

1. Before encrypting the message (or the stream of data), the sender
   generates a random k-bit string, called the Initialization Vector
   (IV). Denote this initialization vector by c(0). The sender sends
   the IV to the receiver in cleartext.

2. For the first block, the sender calculates m(1)⊕c(0), that is,
   calculates the exclusive-or of the first block of cleartext with
   the IV. It then runs the result through the block-cipher algorithm
   to get the corresponding ciphertext block; that is,
   c(1)=KS(m(1)⊕c(0)). The sender sends the encrypted block c(1) to the
   receiver.

3. For the ith block, the sender generates the ith ciphertext block
   from c(i)=KS(m(i)⊕c(i−1)).

Let's now examine some of the consequences of this approach. First, the receiver will still be able to recover the original message. Indeed, when the receiver receives c(i), it decrypts it with KS to obtain s(i)=m(i)⊕c(i−1); since the receiver also knows c(i−1), it then obtains the cleartext block from m(i)=s(i)⊕c(i−1). Second, even if two cleartext blocks are identical, the corresponding ciphertexts (almost always) will be different. Third, although the sender sends the IV in the clear, an intruder will still not be able to decrypt the ciphertext blocks, since the intruder does not know the secret key, S.
Finally, the sender only sends one overhead block (the IV), thereby negligibly increasing the bandwidth usage for long messages (consisting of hundreds of blocks). As an example, let's now determine the ciphertext for the 3-bit block cipher in Table 8.1 with plaintext 010010010 and IV=c(0)=001. The sender first uses the IV to calculate c(1)=KS(m(1)⊕c(0))=100. The sender then calculates c(2)=KS(m(2)⊕c(1))=KS(010⊕100)=000, and c(3)=KS(m(3)⊕c(2))=KS(010⊕000)=101. The reader should verify that the receiver, knowing the IV and KS, can recover the original plaintext. CBC has an important consequence when designing secure network protocols: we'll need to provide a mechanism within the protocol to distribute the IV from sender to receiver. We'll see how this is done for several protocols later in this chapter.

8.2.2 Public Key Encryption

For more than 2,000 years (since the time of the Caesar cipher and up to the 1970s), encrypted communication required that the two communicating parties share a common secret---the symmetric key used for encryption and decryption. One difficulty with this approach is that the two parties must somehow agree on the shared key; but to do so requires (presumably secure) communication! Perhaps the parties could first meet and agree on the key in person (for example, two of Caesar's centurions might meet at the Roman baths) and thereafter communicate with encryption. In a networked world, however, communicating parties may never meet and may never converse except over the network. Is it possible for two parties to communicate with encryption without having a shared secret key that is known in advance? In 1976, Diffie and Hellman \[Diffie 1976\] demonstrated an algorithm (known now as Diffie-Hellman Key Exchange) to do just that---a radically different and marvelously elegant approach toward secure communication that has led to the development of today's public key cryptography systems.
We'll see shortly that public key cryptography systems also have several wonderful properties that make them useful not only for encryption, but for authentication and digital signatures as well.

Figure 8.6 Public key cryptography

Interestingly, it has recently come to light that ideas similar to those in \[Diffie 1976\] and \[RSA 1978\] had been independently developed in the early 1970s in a series of secret reports by researchers at the Communications-Electronics Security Group in the United Kingdom \[Ellis 1987\]. As is often the case, great ideas can spring up independently in many places; fortunately, public key advances took place not only in private, but in the public view as well. The use of public key cryptography is conceptually quite simple. Suppose Alice wants to communicate with Bob. As shown in Figure 8.6, rather than Bob and Alice sharing a single secret key (as in the case of symmetric key systems), Bob (the recipient of Alice's messages) instead has two keys---a public key that is available to everyone in the world (including Trudy the intruder) and a private key that is known only to Bob. We will use the notation KB+ and KB− to refer to Bob's public and private keys, respectively. In order to communicate with Bob, Alice first fetches Bob's public key. Alice then encrypts her message, m, to Bob using Bob's public key and a known (for example, standardized) encryption algorithm; that is, Alice computes KB+(m). Bob receives Alice's encrypted message and uses his private key and a known (for example, standardized) decryption algorithm to decrypt Alice's encrypted message. That is, Bob computes KB−(KB+(m)).
We will see below that there are encryption/decryption algorithms and techniques for choosing public and private keys such that KB−(KB+(m))=m; that is, applying Bob's public key, KB+, to a message, m (to get KB+(m)), and then applying Bob's private key, KB−, to the encrypted version of m (that is, computing KB−(KB+(m))) gives back m. This is a remarkable result! In this manner, Alice can use Bob's publicly available key to send a secret message to Bob without either of them having to distribute any secret keys! We will see shortly that we can interchange the public key and private key encryption and get the same remarkable result---that is, KB−(KB+(m))=KB+(KB−(m))=m. The use of public key cryptography is thus conceptually simple. But two immediate worries may spring to mind. A first concern is that although an intruder intercepting Alice's encrypted message will see only gibberish, the intruder knows both the key (Bob's public key, which is available for all the world to see) and the algorithm that Alice used for encryption. Trudy can thus mount a chosen-plaintext attack, using the known standardized encryption algorithm and Bob's publicly available encryption key to encode any message she chooses! Trudy might well try, for example, to encode messages, or parts of messages, that she suspects that Alice might send. Clearly, if public key cryptography is to work, key selection and encryption/decryption must be done in such a way that it is impossible (or at least so hard as to be nearly impossible) for an intruder to either determine Bob's private key or somehow otherwise decrypt or guess Alice's message to Bob. A second concern is that since Bob's encryption key is public, anyone can send an encrypted message to Bob, including Alice or someone claiming to be Alice. In the case of a single shared secret key, the fact that the sender knows the secret key implicitly identifies the sender to the receiver.
In the case of public key cryptography, however, this is no longer the case since anyone can send an encrypted message to Bob using Bob's publicly available key. A digital signature, a topic we will study in Section 8.3, is needed to bind a sender to a message.

RSA

While there may be many algorithms that address these concerns, the RSA algorithm (named after its founders, Ron Rivest, Adi Shamir, and Leonard Adleman) has become almost synonymous with public key cryptography. Let's first see how RSA works and then examine why it works. RSA makes extensive use of arithmetic operations using modulo-n arithmetic. So let's briefly review modular arithmetic. Recall that x mod n simply means the remainder of x when divided by n; so, for example, 19 mod 5 = 4. In modular arithmetic, one performs the usual operations of addition, multiplication, and exponentiation. However, the result of each operation is replaced by the integer remainder that is left when the result is divided by n. Adding and multiplying with modular arithmetic is facilitated with the following handy facts:

\[(a mod n) + (b mod n)\] mod n = (a + b) mod n
\[(a mod n) − (b mod n)\] mod n = (a − b) mod n
\[(a mod n) ⋅ (b mod n)\] mod n = (a ⋅ b) mod n

It follows from the third fact that (a mod n)^d mod n = a^d mod n, which is an identity that we will soon find very useful. Now suppose that Alice wants to send to Bob an RSA-encrypted message, as shown in Figure 8.6. In our discussion of RSA, let's always keep in mind that a message is nothing but a bit pattern, and every bit pattern can be uniquely represented by an integer number (along with the length of the bit pattern). For example, suppose a message is the bit pattern 1001; this message can be represented by the decimal integer 9. Thus, when encrypting a message with RSA, it is equivalent to encrypting the unique integer number that represents the message.
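These modular-arithmetic facts are easy to spot-check numerically; the particular values a=19, b=7, n=5, and d=3 below are arbitrary choices of ours:

```python
# Spot-checking the modular-arithmetic facts used in the RSA discussion.
a, b, n, d = 19, 7, 5, 3   # arbitrary test values

assert ((a % n) + (b % n)) % n == (a + b) % n   # addition fact
assert ((a % n) - (b % n)) % n == (a - b) % n   # subtraction fact
assert ((a % n) * (b % n)) % n == (a * b) % n   # multiplication fact

# The identity that follows from the third fact:
# (a mod n)^d mod n = a^d mod n
assert pow(a % n, d, n) == pow(a, d, n)

print(19 % 5)   # 4, as in the text
```

Python's three-argument `pow(x, y, n)` computes x^y mod n efficiently, which is exactly the operation RSA relies on.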
There are two interrelated components of RSA:

- The choice of the public key and the private key
- The encryption and decryption algorithm

To generate the public and private RSA keys, Bob performs the following steps:

1. Choose two large prime numbers, p and q. How large should p and q
   be? The larger the values, the more difficult it is to break RSA,
   but the longer it takes to perform the encoding and decoding. RSA
   Laboratories recommends that the product of p and q be on the order
   of 1,024 bits. For a discussion of how to find large prime numbers,
   see \[Caldwell 2012\].

2. Compute n=pq and z=(p−1)(q−1).

3. Choose a number, e, less than n, that has no common factors (other
   than 1) with z. (In this case, e and z are said to be relatively
   prime.) The letter e is used since this value will be used in
   encryption.

4. Find a number, d, such that ed−1 is exactly divisible (that is, with
   no remainder) by z. The letter d is used because this value will be
   used in decryption. Put another way, given e, we choose d such that
   ed mod z = 1.

5. The public key that Bob makes available to the world, KB+, is the
   pair of numbers (n, e); his private key, KB−, is the pair of numbers
   (n, d).

The encryption by Alice and the decryption by Bob are done as follows: Suppose Alice wants to send Bob a bit pattern represented by the integer number m (with m\<n). To encode, Alice performs the exponentiation m^e, and then computes the integer remainder when m^e is divided by n. In other words, the encrypted value, c, of Alice's plaintext message, m, is c = m^e mod n. The bit pattern corresponding to this ciphertext c is sent to Bob. To decrypt the received ciphertext message, c, Bob computes m = c^d mod n, which requires the use of his private key (n, d).
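The key-generation and encryption/decryption steps above can be sketched with the toy values p=5 and q=7 used in the tables that follow (Python here is our choice for illustration; a real RSA implementation involves far more care, for example padding and large-prime generation):

```python
# Toy RSA following the steps in the text (values far too small for real use).
p, q = 5, 7
n = p * q                  # n = 35
z = (p - 1) * (q - 1)      # z = 24
e = 5                      # e < n, relatively prime to z
d = 29                     # ed - 1 = 144 is exactly divisible by z = 24
assert (e * d) % z == 1    # ed mod z = 1, as required by step 4

def rsa_encrypt(m: int) -> int:
    return pow(m, e, n)    # c = m^e mod n

def rsa_decrypt(c: int) -> int:
    return pow(c, d, n)    # m = c^d mod n

m = 12                     # the letter 'l' interpreted as a number
c = rsa_encrypt(m)
print(c, rsa_decrypt(c))   # 17 12
```

Note that d=29 is not the only valid choice (any d with ed mod z = 1 works); 29 is the value the example below uses.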
Table 8.2 Alice's RSA encryption, e = 5, n = 35

| Plaintext letter | m: numeric representation | m^e | Ciphertext c = m^e mod n |
|---|---|---|---|
| l | 12 | 248832 | 17 |
| o | 15 | 759375 | 15 |
| v | 22 | 5153632 | 22 |
| e | 5 | 3125 | 10 |

As a simple example of RSA, suppose Bob chooses p=5 and q=7. (Admittedly, these values are far too small to be secure.) Then n=35 and z=24. Bob chooses e=5, since 5 and 24 have no common factors. Finally, Bob chooses d=29, since 5⋅29−1 (that is, ed−1) is exactly divisible by 24. Bob makes the two values, n=35 and e=5, public and keeps the value d=29 secret. Observing these two public values, suppose Alice now wants to send the letters l, o, v, and e to Bob. Interpreting each letter as a number between 1 and 26 (with a being 1, and z being 26), Alice and Bob perform the encryption and decryption shown in Tables 8.2 and 8.3, respectively. Note that in this example, we consider each of the four letters as a distinct message. A more realistic example would be to convert the four letters into their 8-bit ASCII representations and then encrypt the integer corresponding to the resulting 32-bit bit pattern. (Such a realistic example generates numbers that are much too long to print in a textbook!) Given that the "toy" example in Tables 8.2 and 8.3 has already produced some extremely large numbers, and given that we saw earlier that p and q should each be several hundred bits long, several practical issues regarding RSA come to mind. How does one choose large prime numbers? How does one then choose e and d? How does one perform exponentiation with large numbers? A discussion of these important issues is beyond the scope of this book; see \[Kaufman 1995\] and the references therein for details.
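As a sanity check, the computations in Table 8.2 can be replayed directly (d = 29 is Bob's matching private exponent from the text):

```python
# Replaying Table 8.2: Alice encrypts "love" with Bob's public key
# (n, e) = (35, 5). Letters are numbered a=1 .. z=26.
n, e, d = 35, 5, 29

for letter in "love":
    m = ord(letter) - ord('a') + 1       # numeric representation of the letter
    c = pow(m, e, n)                     # ciphertext c = m^e mod n
    assert pow(c, d, n) == m             # Bob's decryption recovers m
    print(letter, m, m**e, c)
```

Running this prints exactly the four rows of Table 8.2 (for example, `l 12 248832 17`), and the assertion confirms that Bob's decryption, shown in the next table, undoes Alice's encryption.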
Table 8.3 Bob's RSA decryption, d = 29, n = 35

| Ciphertext c | c^d | m = c^d mod n | Plaintext letter |
|---|---|---|---|
| 17 | 481968572106750915091411825223071697 | 12 | l |
| 15 | 12783403948858939111232757568359375 | 15 | o |
| 22 | 851643319086537701956194499721106030592 | 22 | v |
| 10 | 100000000000000000000000000000 | 5 | e |

Session Keys

We note here that the exponentiation required by RSA is a rather time-consuming process. By contrast, DES is at least 100 times faster in software and between 1,000 and 10,000 times faster in hardware \[RSA Fast 2012\]. As a result, RSA is often used in practice in combination with symmetric key cryptography. For example, if Alice wants to send Bob a large amount of encrypted data, she could do the following. First Alice chooses a key that will be used to encode the data itself; this key is referred to as a session key, and is denoted by KS. Alice must inform Bob of the session key, since this is the shared symmetric key they will use with a symmetric key cipher (e.g., with DES or AES). Alice encrypts the session key using Bob's public key, that is, computes c = (KS)^e mod n. Bob receives the RSA-encrypted session key, c, and decrypts it to obtain the session key, KS. Bob now knows the session key that Alice will use for her encrypted data transfer.

Why Does RSA Work?

RSA encryption/decryption appears rather magical. Why should it be that by applying the encryption algorithm and then the decryption algorithm, one recovers the original message? In order to understand why RSA works, again denote n = pq, where p and q are the large prime numbers used in the RSA algorithm. Recall that, under RSA encryption, a message (uniquely represented by an integer), m, is exponentiated to the power e using modulo-n arithmetic, that is, c = m^e mod n. Decryption is performed by raising this value to the power d, again using modulo-n arithmetic.
The result of an encryption step followed by a decryption step is thus (m^e mod n)^d mod n. Let's now see what we can say about this quantity. As mentioned earlier, one important property of modular arithmetic is (a mod n)^d mod n = a^d mod n for any values a, n, and d. Thus, using a = m^e in this property, we have

(m^e mod n)^d mod n = m^ed mod n

It therefore remains to show that m^ed mod n = m. Although we're trying to remove some of the magic about why RSA works, to establish this, we'll need to use a rather magical result from number theory here. Specifically, we'll need the result that says if p and q are prime, n = pq, and z = (p − 1)(q − 1), then x^y mod n is the same as x^(y mod z) mod n \[Kaufman 1995\]. Applying this result with x = m and y = ed, we have

m^ed mod n = m^(ed mod z) mod n

But remember that we have chosen e and d such that ed mod z = 1. This gives us

m^ed mod n = m^1 mod n = m

which is exactly the result we are looking for! By first exponentiating to the power of e (that is, encrypting) and then exponentiating to the power of d (that is, decrypting), we obtain the original value, m. Even more wonderful is the fact that if we first exponentiate to the power of d and then exponentiate to the power of e---that is, we reverse the order of encryption and decryption, performing the decryption operation first and then applying the encryption operation---we also obtain the original value, m. This wonderful result follows immediately from the modular arithmetic:

(m^d mod n)^e mod n = m^de mod n = m^ed mod n = (m^e mod n)^d mod n

The security of RSA relies on the fact that there are no known algorithms for quickly factoring a number, in this case the public value n, into the primes p and q. If one knew p and q, then given the public value e, one could easily compute the secret key, d. On the other hand, it is not known whether or not there exist fast algorithms for factoring a number, and in this sense, the security of RSA is not guaranteed.
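The session-key pattern described earlier in this section (RSA protecting a small symmetric key, which then encrypts the bulk data) can be sketched as follows. The repeating-XOR "cipher" below is a deliberately insecure stand-in for a real symmetric cipher such as DES or AES, and KS = 19 is an arbitrary toy choice; the RSA values (n, e, d) = (35, 5, 29) are the chapter's example keys:

```python
# Toy sketch of the session-key pattern: RSA protects a small session
# key KS; KS then encrypts the bulk data with a symmetric cipher.
n, e, d = 35, 5, 29                      # Bob's toy RSA keys from the text

KS = 19                                  # Alice's chosen session key (< n)
c = pow(KS, e, n)                        # Alice sends c = (KS)^e mod n
KS_bob = pow(c, d, n)                    # Bob recovers KS with (n, d)
assert KS_bob == KS

def toy_cipher(data: bytes, key: int) -> bytes:
    # XOR with a one-byte key: NOT secure, illustration only.
    return bytes(b ^ key for b in data)

ciphertext = toy_cipher(b"a large amount of data", KS)
plaintext = toy_cipher(ciphertext, KS_bob)   # XOR is its own inverse
assert plaintext == b"a large amount of data"
print("session key exchanged:", KS_bob)
```

Only the short session key pays the cost of RSA exponentiation; everything after that uses the much faster symmetric cipher, which is the whole point of the hybrid design.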
Another popular public-key encryption algorithm is the Diffie-Hellman algorithm, which we will briefly explore in the homework problems. Diffie-Hellman is not as versatile as RSA in that it cannot be used to encrypt messages of arbitrary length; it can be used, however, to establish a symmetric session key, which is in turn used to encrypt messages.

8.3 Message Integrity and Digital Signatures

In the previous section we saw how encryption can be used to provide confidentiality to two communicating entities. In this section we turn to the equally important cryptography topic of providing message integrity (also known as message authentication). Along with message integrity, we will discuss two related topics in this section: digital signatures and end-point authentication. We define the message integrity problem using, once again, Alice and Bob. Suppose Bob receives a message (which may be encrypted or may be in plaintext) and he believes this message was sent by Alice. To authenticate this message, Bob needs to verify:

1. The message indeed originated from Alice.
2. The message was not tampered with on its way to Bob.

We'll see in Sections 8.4 through 8.7 that this problem of message integrity is a critical concern in just about all secure networking protocols. As a specific example, consider a computer network using a link-state routing algorithm (such as OSPF) for determining routes between each pair of routers in the network (see Chapter 5). In a link-state algorithm, each router needs to broadcast a link-state message to all other routers in the network. A router's link-state message includes a list of its directly connected neighbors and the direct costs to these neighbors. Once a router receives link-state messages from all of the other routers, it can create a complete map of the network, run its least-cost routing algorithm, and configure its forwarding table.
One relatively easy attack on the routing algorithm is for Trudy to distribute bogus link-state messages with incorrect link-state information. Thus the need for message integrity---when router B receives a link-state message from router A, router B should verify that router A actually created the message and, further, that no one tampered with the message in transit. In this section, we describe a popular message integrity technique that is used by many secure networking protocols. But before doing so, we need to cover another important topic in cryptography---cryptographic hash functions.

8.3.1 Cryptographic Hash Functions

As shown in Figure 8.7, a hash function takes an input, m, and computes a fixed-size string H(m) known as a hash. The Internet checksum (Chapter 3) and CRCs (Chapter 6) meet this definition. A cryptographic hash function is required to have the following additional property: It is computationally infeasible to find any two different messages x and y such that H(x)=H(y). Informally, this property means that it is computationally infeasible for an intruder to substitute one message for another message that is protected by the hash function. That is, if (m, H(m)) are the message and the hash of the message created by the sender, then an intruder cannot forge the contents of another message, y, that has the same hash value as the original message.

Figure 8.7 Hash functions

Figure 8.8 Initial message and fraudulent message have the same checksum!

Let's convince ourselves that a simple checksum, such as the Internet checksum, would make a poor cryptographic hash function. Rather than performing 1s complement arithmetic (as in the Internet checksum), let us compute a checksum by treating each character as a byte and adding the bytes together using 4-byte chunks at a time. Suppose Bob owes Alice \$100.99 and sends an IOU to Alice consisting of the text string "IOU100.99BOB". The ASCII representation (in hexadecimal notation) for these letters is 49, 4F, 55, 31, 30, 30, 2E, 39, 39, 42, 4F, 42. Figure 8.8 (top) shows that the 4-byte checksum for this message is B2 C1 D2 AC. A slightly different message (and a much more costly one for Bob) is shown in the bottom half of Figure 8.8. The messages "IOU100.99BOB" and "IOU900.19BOB" have the same checksum. Thus, this simple checksum algorithm violates the requirement above. Given the original data, it is simple to find another set of data with the same checksum. Clearly, for security purposes, we are going to need a more powerful hash function than a checksum.

The MD5 hash algorithm of Ron Rivest \[RFC 1321\] is in wide use today. It computes a 128-bit hash in a four-step process consisting of a padding step (adding a one followed by enough zeros so that the length of the message satisfies certain conditions), an append step (appending a 64-bit representation of the message length before padding), an initialization of an accumulator, and a final looping step in which the message's 16-word blocks are processed (mangled) in four rounds. For a description of MD5 (including a C source code implementation) see \[RFC 1321\]. The second major hash algorithm in use today is the Secure Hash Algorithm (SHA-1) \[FIPS 1995\]. This algorithm is based on principles similar to those used in the design of MD4 \[RFC 1320\], the predecessor to MD5. SHA-1, a US federal standard, is required for use whenever a cryptographic hash algorithm is needed for federal applications. It produces a 160-bit message digest. The longer output length makes SHA-1 more secure.

8.3.2 Message Authentication Code

Let's now return to the problem of message integrity. Now that we understand hash functions, let's take a first stab at how we might perform message integrity:

1. Alice creates message m and calculates the hash H(m) (for example with SHA-1).
2.
Alice then appends H(m) to the message m, creating an extended message (m, H(m)), and sends the extended message to Bob.
3. Bob receives an extended message (m, h) and calculates H(m). If H(m)=h, Bob concludes that everything is fine.

This approach is obviously flawed. Trudy can create a bogus message m´ in which she says she is Alice, calculate H(m´), and send Bob (m´, H(m´)). When Bob receives the message, everything checks out in step 3, so Bob doesn't suspect any funny business. To perform message integrity, in addition to using cryptographic hash functions, Alice and Bob will need a shared secret s. This shared secret, which is nothing more than a string of bits, is called the authentication key. Using this shared secret, message integrity can be performed as follows:

1. Alice creates message m, concatenates s with m to create m+s, and calculates the hash H(m+s) (for example with SHA-1). H(m+s) is called the message authentication code (MAC).
2. Alice then appends the MAC to the message m, creating an extended message (m, H(m+s)), and sends the extended message to Bob.
3. Bob receives an extended message (m, h) and knowing s, calculates the MAC H(m+s). If H(m+s)=h, Bob concludes that everything is fine.

A summary of the procedure is shown in Figure 8.9. Readers should note that the MAC here (standing for "message authentication code") is not the same MAC used in link-layer protocols (standing for "medium access control")! One nice feature of a MAC is that it does not require an encryption algorithm. Indeed, in many applications, including the link-state routing algorithm described earlier, communicating entities are only concerned with message integrity and are not concerned with message confidentiality. Using a MAC, the entities can authenticate the messages they send to each other without having to integrate complex encryption algorithms into the integrity process.

Figure 8.9 Message authentication code (MAC)

As you might expect, a number of different standards for MACs have been proposed over the years. The most popular standard today is HMAC, which can be used either with MD5 or SHA-1. HMAC actually runs data and the authentication key through the hash function twice \[Kaufman 1995; RFC 2104\]. There still remains an important issue. How do we distribute the shared authentication key to the communicating entities? For example, in the link-state routing algorithm, we would somehow need to distribute the secret authentication key to each of the routers in the autonomous system. (Note that the routers can all use the same authentication key.) A network administrator could actually accomplish this by physically visiting each of the routers. Or, if the network administrator is a lazy guy, and if each router has its own public key, the network administrator could distribute the authentication key to any one of the routers by encrypting it with the router's public key and then sending the encrypted key over the network to the router.

8.3.3 Digital Signatures

Think of the number of times you've signed your name to a piece of paper during the last week. You sign checks, credit card receipts, legal documents, and letters. Your signature attests to the fact that you (as opposed to someone else) have acknowledged and/or agreed with the document's contents. In a digital world, one often wants to indicate the owner or creator of a document, or to signify one's agreement with a document's content. A digital signature is a cryptographic technique for achieving these goals in a digital world. Just as with handwritten signatures, digital signing should be done in a way that is verifiable and nonforgeable.
That is, it must be possible to prove that a document signed by an individual was indeed signed by that individual (the signature must be verifiable) and that only that individual could have signed the document (the signature cannot be forged). Let's now consider how we might design a digital signature scheme. Observe that when Bob signs a message, Bob must put something on the message that is unique to him. Bob could consider attaching a MAC for the signature, where the MAC is created by appending his key (unique to him) to the message, and then taking the hash. But for Alice to verify the signature, she must also have a copy of the key, in which case the key would not be unique to Bob. Thus, MACs are not going to get the job done here.

Recall that with public-key cryptography, Bob has both a public and private key, with both of these keys being unique to Bob. Thus, public-key cryptography is an excellent candidate for providing digital signatures. Let us now examine how it is done. Suppose that Bob wants to digitally sign a document, m. We can think of the document as a file or a message that Bob is going to sign and send. As shown in Figure 8.10, to sign this document, Bob simply uses his private key, KB−, to compute KB−(m). At first, it might seem odd that Bob is using his private key (which, as we saw in Section 8.2, was used to decrypt a message that had been encrypted with his public key) to sign a document. But recall that encryption and decryption are nothing more than mathematical operations (exponentiation to the power of e or d in RSA; see Section 8.2) and recall that Bob's goal is not to scramble or obscure the contents of the document, but rather to sign the document in a manner that is verifiable and nonforgeable. Bob's digital signature of the document is KB−(m).

Figure 8.10 Creating a digital signature for a document

Does the digital signature KB−(m) meet our requirements of being verifiable and nonforgeable? Suppose Alice has m and KB−(m). She wants to prove in court (being litigious) that Bob had indeed signed the document and was the only person who could have possibly signed the document. Alice takes Bob's public key, KB+, and applies it to the digital signature, KB−(m), associated with the document, m. That is, she computes KB+(KB−(m)), and voilà, with a dramatic flurry, she produces m, which exactly matches the original document! Alice then argues that only Bob could have signed the document, for the following reasons:

- Whoever signed the message must have used the private key, KB−, in computing the signature KB−(m), such that KB+(KB−(m))=m.
- The only person who could have known the private key, KB−, is Bob. Recall from our discussion of RSA in Section 8.2 that knowing the public key, KB+, is of no help in learning the private key, KB−. Therefore, the only person who could know KB− is the person who generated the pair of keys, (KB+, KB−), in the first place, Bob. (Note that this assumes, though, that Bob has not given KB− to anyone, nor has anyone stolen KB− from Bob.)

It is also important to note that if the original document, m, is ever modified to some alternate form, m´, the signature that Bob created for m will not be valid for m´, since KB+(KB−(m)) does not equal m´. Thus we see that digital signatures also provide message integrity, allowing the receiver to verify that the message was unaltered as well as the source of the message. One concern with signing data by encryption is that encryption and decryption are computationally expensive. Given the overheads of encryption and decryption, signing data via complete encryption/decryption can be overkill. A more efficient approach is to introduce hash functions into the digital signature. Recall from Section 8.3.2 that a hash algorithm takes a message, m, of arbitrary length and computes a fixed-length "fingerprint" of the message, denoted by H(m). Using a hash function, Bob signs the hash of a message rather than the message itself, that is, Bob calculates KB−(H(m)). Since H(m) is generally much smaller than the original message m, the computational effort required to create the digital signature is substantially reduced. In the context of Bob sending a message to Alice, Figure 8.11 provides a summary of the operational procedure of creating a digital signature. Bob puts his original long message through a hash function. He then digitally signs the resulting hash with his private key. The original message (in cleartext) along with the digitally signed message digest (henceforth referred to as the digital signature) is then sent to Alice.

Figure 8.11 Sending a digitally signed message

Figure 8.12 provides a summary of the operational procedure for verifying the signature. Alice applies the sender's public key to the digital signature to obtain a hash result. Alice also applies the hash function to the cleartext message to obtain a second hash result. If the two hashes match, then Alice can be sure about the integrity and author of the message. Before moving on, let's briefly compare digital signatures with MACs, since they have parallels, but also have important subtle differences. Both digital signatures and MACs start with a message (or a document). To create a MAC out of the message, we append an authentication key to the message, and then take the hash of the result. Note that neither public key nor symmetric key encryption is involved in creating the MAC. To create a digital signature, we first take the hash of the message and then encrypt the hash with our private key (using public key cryptography). Thus, a digital signature is a "heavier" technique, since it requires an underlying Public Key Infrastructure (PKI) with certification authorities as described below. We'll see in Section 8.5 that PGP---a popular secure e-mail system---uses digital signatures for message integrity.
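The hash-then-sign procedure of Figures 8.11 and 8.12 can be imitated with the chapter's toy RSA key pair (n, e, d) = (35, 5, 29). Reducing the SHA-1 digest mod n is purely to keep the numbers printable and bears no resemblance to a real signature scheme; the message text is an illustrative assumption:

```python
import hashlib

# Toy hash-then-sign with the chapter's tiny RSA keys (n=35, e=5, d=29).
# Reducing the digest mod n keeps numbers small -- a real scheme signs
# the full digest with keys hundreds of bits long.
n, e, d = 35, 5, 29

def toy_hash(message: bytes) -> int:
    return int(hashlib.sha1(message).hexdigest(), 16) % n

message = b"Bob owes Alice $100.99"
signature = pow(toy_hash(message), d, n)   # Bob signs H(m) with K_B-

# Alice verifies: Bob's public key applied to the signature must
# reproduce the hash she computes herself from the cleartext.
print("signature verifies:", pow(signature, e, n) == toy_hash(message))

# A tampered message is rejected unless the toy hashes happen to
# collide (a 1-in-35 chance here; negligible with a real hash).
tampered = b"Bob owes Alice $900.19"
print("tampered verifies:", pow(signature, e, n) == toy_hash(tampered))
```

The first line printed is `signature verifies: True`; the second is almost certainly `False`, illustrating that the signature also provides message integrity.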
We've seen already that OSPF uses MACs for message integrity. We'll see in Sections 8.6 and 8.7 that MACs are also used for popular transport-layer and network-layer security protocols.

Figure 8.12 Verifying a signed message

Public Key Certification

An important application of digital signatures is public key certification, that is, certifying that a public key belongs to a specific entity. Public key certification is used in many popular secure networking protocols, including IPsec and SSL. To gain insight into this problem, let's consider an Internet-commerce version of the classic "pizza prank." Alice is in the pizza delivery business and accepts orders over the Internet. Bob, a pizza lover, sends Alice a plaintext message that includes his home address and the type of pizza he wants. In this message, Bob also includes a digital signature (that is, a signed hash of the original plaintext message) to prove to Alice that he is the true source of the message. To verify the signature, Alice obtains Bob's public key (perhaps from a public key server or from the e-mail message) and checks the digital signature. In this manner she makes sure that Bob, rather than some adolescent prankster, placed the order. This all sounds fine until clever Trudy comes along. As shown in Figure 8.13, Trudy is indulging in a prank. She sends a message to Alice in which she says she is Bob, gives Bob's home address, and orders a pizza. In this message she also includes her (Trudy's) public key, although Alice naturally assumes it is Bob's public key. Trudy also attaches a digital signature, which was created with her own (Trudy's) private key. After receiving the message, Alice applies Trudy's public key (thinking that it is Bob's) to the digital signature and concludes that the plaintext message was indeed created by Bob.

Figure 8.13 Trudy masquerades as Bob using public key cryptography
Bob will be very surprised when the delivery person brings a pizza with pepperoni and anchovies to his home! We see from this example that for public key cryptography to be useful, you need to be able to verify that you have the actual public key of the entity (person, router, browser, and so on) with whom you want to communicate. For example, when Alice wants to communicate with Bob using public key cryptography, she needs to verify that the public key that is supposed to be Bob's is indeed Bob's. Binding a public key to a particular entity is typically done by a Certification Authority (CA), whose job is to validate identities and issue certificates. A CA has the following roles:

1. A CA verifies that an entity (a person, a router, and so on) is who it says it is. There are no mandated procedures for how certification is done. When dealing with a CA, one must trust the CA to have performed a suitably rigorous identity verification. For example, if Trudy were able to walk into the Fly-by-Night CA and simply announce "I am Alice" and receive certificates associated with the identity of Alice, then one shouldn't put much faith in public keys certified by the Fly-by-Night CA. On the other hand, one might (or might not!) be more willing to trust a CA that is part of a federal or state program. You can trust the identity associated with a public key only to the extent to which you can trust a CA and its identity verification techniques. What a tangled web of trust we spin!

2. Once the CA verifies the identity of the entity, the CA creates a certificate that binds the public key of the entity to the identity. The certificate contains the public key and globally unique identifying information about the owner of the public key (for example, a human name or an IP address). The certificate is digitally signed by the CA. These steps are shown in Figure 8.14.

Figure 8.14 Bob has his public key certified by the CA
Let us now see how certificates can be used to combat pizza-ordering pranksters, like Trudy, and other undesirables. When Bob places his order he also sends his CA-signed certificate. Alice uses the CA's public key to check the validity of Bob's certificate and extract Bob's public key. Both the International Telecommunication Union (ITU) and the IETF have developed standards for CAs. ITU X.509 \[ITU 2005a\] specifies an authentication service as well as a specific syntax for certificates. \[RFC 1422\] describes CA-based key management for use with secure Internet e-mail. It is compatible with X.509 but goes beyond X.509 by establishing procedures and conventions for a key management architecture. Table 8.4 describes some of the important fields in a certificate.

Table 8.4 Selected fields in an X.509 and RFC 1422 public key certificate

| Field Name | Description |
|---|---|
| Version | Version number of X.509 specification |
| Serial number | CA-issued unique identifier for a certificate |
| Signature | Specifies the algorithm used by CA to sign this certificate |
| Issuer name | Identity of CA issuing this certificate, in distinguished name (DN) \[RFC 4514\] format |
| Validity period | Start and end of period of validity for certificate |
| Subject name | Identity of entity whose public key is associated with this certificate, in DN format |
| Subject public key | The subject's public key as well as an indication of the public key algorithm (and algorithm parameters) to be used with this key |

8.4 End-Point Authentication

End-point authentication is the process of one entity proving its identity to another entity over a computer network, for example, a user proving its identity to an e-mail server. As humans, we authenticate each other in many ways: We recognize each other's faces when we meet, we recognize each other's voices on the telephone, we are authenticated by the customs official who checks us against the picture on our passport.
In this section, we consider how one party can authenticate another party when the two are communicating over a network. We focus here on authenticating a "live" party, at the point in time when communication is actually occurring. A concrete example is a user authenticating him or herself to an e-mail server. This is a subtly different problem from proving that a message received at some point in the past did indeed come from that claimed sender, as studied in Section 8.3. When performing authentication over the network, the communicating parties cannot rely on biometric information, such as a visual appearance or a voiceprint. Indeed, we will see in our later case studies that it is often network elements such as routers and client/server processes that must authenticate each other. Here, authentication must be done solely on the basis of messages and data exchanged as part of an authentication protocol. Typically, an authentication protocol would run before the two communicating parties run some other protocol (for example, a reliable data transfer protocol, a routing information exchange protocol, or an e-mail protocol). The authentication protocol first establishes the identities of the parties to each other's satisfaction; only after authentication do the parties get down to the work at hand. As in the case of our development of a reliable data transfer (rdt) protocol in Chapter 3, we will find it instructive here to develop various versions of an authentication protocol, which we will call ap (authentication protocol), and poke holes in each version as we proceed. (If you enjoy this stepwise evolution of a design, you might also enjoy \[Bryant 1988\], which recounts a fictitious narrative between designers of an open-network authentication system, and their discovery of the many subtle issues involved.) Let's assume that Alice needs to authenticate herself to Bob.

Figure 8.15 Protocol ap1.0 and a failure scenario

8.4.1 Authentication Protocol ap1.0

Perhaps the simplest authentication protocol we can imagine is one where Alice simply sends a message to Bob saying she is Alice. This protocol is shown in Figure 8.15. The flaw here is obvious---there is no way for Bob actually to know that the person sending the message "I am Alice" is indeed Alice. For example, Trudy (the intruder) could just as well send such a message.

8.4.2 Authentication Protocol ap2.0

If Alice has a well-known network address (e.g., an IP address) from which she always communicates, Bob could attempt to authenticate Alice by verifying that the source address on the IP datagram carrying the authentication message matches Alice's well-known address. In this case, Alice would be authenticated. This might stop a very network-naive intruder from impersonating Alice, but it wouldn't stop the determined student studying this book, or many others! From our study of the network and data link layers, we know that it is not that hard (for example, if one had access to the operating system code and could build one's own operating system kernel, as is the case with Linux and several other freely available operating systems) to create an IP datagram, put whatever IP source address we want (for example, Alice's well-known IP address) into the IP datagram, and send the datagram over the link-layer protocol to the first-hop router. From then on, the incorrectly source-addressed datagram would be dutifully forwarded to Bob. This approach, shown in Figure 8.16, is a form of IP spoofing. IP spoofing can be avoided if Trudy's first-hop router is configured to forward only datagrams containing Trudy's IP source address \[RFC 2827\]. However, this capability is not universally deployed or enforced.

Figure 8.16 Protocol ap2.0 and a failure scenario
Bob would thus be foolish to assume that Trudy's network manager (who might be Trudy herself) had configured Trudy's first-hop router to forward only appropriately addressed datagrams.

8.4.3 Authentication Protocol ap3.0

One classic approach to authentication is to use a secret password. The password is a shared secret between the authenticator and the person being authenticated. Gmail, Facebook, telnet, FTP, and many other services use password authentication. In protocol ap3.0, Alice thus sends her secret password to Bob, as shown in Figure 8.17. Since passwords are so widely used, we might suspect that protocol ap3.0 is fairly secure. If so, we'd be wrong! The security flaw here is clear. If Trudy eavesdrops on Alice's communication, then she can learn Alice's password. Lest you think this is unlikely, consider the fact that when you Telnet to another machine and log in, the login password is sent unencrypted to the Telnet server. Someone connected to the Telnet client or server's LAN can possibly sniff (read and store) all packets transmitted on the LAN and thus steal the login password. In fact, this is a well-known approach for stealing passwords (see, for example, \[Jimenez 1997\]). Such a threat is obviously very real, so ap3.0 clearly won't do.

Figure 8.17 Protocol ap3.0 and a failure scenario

8.4.4 Authentication Protocol ap3.1

Our next idea for fixing ap3.0 is naturally to encrypt the password. By encrypting the password, we can prevent Trudy from learning Alice's password. If we assume that Alice and Bob share a symmetric secret key, KA−B, then Alice can encrypt the password and send her identification message, "I am Alice," and her encrypted password to Bob. Bob then decrypts the password and, assuming the password is correct, authenticates Alice. Bob feels comfortable in authenticating Alice since Alice not only knows the password, but also knows the shared secret key value needed to encrypt the password.
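The encrypted-password protocol just described can be sketched together with its central weakness. The XOR "cipher", the key, and the password below are toy placeholders for a real symmetric cipher and real secrets; the point is that Trudy can replay a recorded ciphertext without ever learning the password or the key:

```python
# Toy sketch of the encrypted-password protocol and a playback attack.
K_AB = 0x5A                                   # shared symmetric key (toy value)

def encrypt(data: bytes, key: int) -> bytes:
    # XOR stand-in for a real symmetric cipher; XOR is its own inverse,
    # so the same function also decrypts.
    return bytes(b ^ key for b in data)

def bob_authenticates(identity: bytes, encrypted_pw: bytes) -> bool:
    # Bob decrypts with the shared key and checks the password.
    return encrypt(encrypted_pw, K_AB) == b"alice-password"

# Alice authenticates legitimately.
msg = (b"I am Alice", encrypt(b"alice-password", K_AB))
assert bob_authenticates(*msg)

# Trudy records msg off the wire and simply plays it back later.
recorded = msg
print("replay accepted:", bob_authenticates(*recorded))   # True
```

Encrypting the password protects its confidentiality, but Bob has no way to tell a fresh authentication from a recording, which is exactly the flaw the next protocol addresses.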
Let's call this protocol ap3.1. While it is true that -ap3.1 prevents Trudy from learning Alice's password, the use of -cryptography here does not solve the authentication problem. Bob is -subject to a playback attack: Trudy need only eavesdrop on Alice's -communication, record the encrypted version of the password, and play -back the encrypted version of the password to Bob to pretend that she is -Alice. The use of an encrypted password in ap3.1 doesn't make the -situation manifestly different from that of protocol ap3.0 in Figure -8.17. - -8.4.5 Authentication Protocol ap4.0 The failure scenario in Figure 8.17 -resulted from the fact that Bob could not distinguish between the -original authentication of Alice and the later playback of Alice's -original authentication. That is, Bob could not tell if Alice was live -(that is, was currently really on the other end of the connection) or -whether the messages he was receiving were a recorded playback of a -previous authentication of Alice. The very (very) observant reader will -recall that the three-way TCP handshake protocol needed to address the -same problem---the server side of a TCP connection did not want to -accept a connection if the received SYN segment was an old copy -(retransmission) of a SYN segment from an earlier connection. How did -the TCP server side solve the problem of determining whether the client -was really live? It chose an initial sequence number that had not been -used in a very long time, sent that number to the client, and then -waited for the client to respond with an ACK segment containing that -number. We can adopt the same idea here for authentication purposes. A -nonce is a number that a protocol will use only once in a lifetime. That -is, once a protocol uses a nonce, it will never use that number again. -Our ap4.0 protocol uses a nonce as follows: - -1. Alice sends the message " I am Alice " to Bob. - -2. Bob chooses a nonce, R, and sends it to Alice. - -3. 
Alice encrypts the nonce using Alice and Bob's symmetric secret key, - KA−B, and sends the encrypted nonce, KA−B (R), back to Bob. As in - protocol ap3.1, it is the fact that Alice knows KA−B and uses it to - encrypt a value that lets Bob know that the message he receives was - generated by Alice. The nonce is used to ensure that Alice is live. - -4. Bob decrypts the received message. If the decrypted nonce equals the - nonce he sent Alice, then Alice is authenticated. Protocol ap4.0 is - illustrated in Figure 8.18. By using the once-in-a-lifetime value, - R, and then checking the returned value, KA−B (R), Bob can be sure - that Alice is both who she says she is (since she knows the secret - key value needed to encrypt R) and live (since she has encrypted the - nonce, R, that Bob just created). The use of a nonce and symmetric - key cryptography forms the basis of ap4.0. A natural question is - whether we can use a nonce and public key cryptography (rather than - symmetric key cryptography) to solve the authentication problem. - This issue is explored in the problems at the end of the chapter. - -Figure 8.18 Protocol ap4.0 and a failure scenario - -8.5 Securing E-Mail In previous sections, we examined fundamental issues -in network security, including symmetric key and public key -cryptography, end-point authentication, key distribution, message -integrity, and digital signatures. We are now going to examine how these -tools are being used to provide security in the Internet. Interestingly, -it is possible to provide security services in any of the top four -layers of the Internet protocol stack. When security is provided for a -specific application-layer protocol, the application using the protocol -will enjoy one or more security services, such as confidentiality, -authentication, or integrity. When security is provided for a -transport-layer protocol, all applications that use that protocol enjoy -the security services of the transport protocol. 
When security is provided at the network layer on a host-to-host basis, all transport-layer segments (and hence all application-layer data) enjoy the security services of the network layer. When security is provided on a link basis, then the data in all frames traveling over the link receive the security services of the link.

In Sections 8.5 through 8.8, we examine how security tools are being used in the application, transport, network, and link layers. Being consistent with the general structure of this book, we begin at the top of the protocol stack and discuss security at the application layer. Our approach is to use a specific application, e-mail, as a case study for application-layer security. We then move down the protocol stack. We'll examine the SSL protocol (which provides security at the transport layer), IPsec (which provides security at the network layer), and the security of the IEEE 802.11 wireless LAN protocol. You might be wondering why security functionality is being provided at more than one layer in the Internet. Wouldn't it suffice simply to provide the security functionality at the network layer and be done with it? There are two answers to this question. First, although security at the network layer can offer "blanket coverage" by encrypting all the data in the datagrams (that is, all the transport-layer segments) and by authenticating all the source IP addresses, it can't provide user-level security. For example, a commerce site cannot rely on IP-layer security to authenticate a customer who is purchasing goods at the commerce site. Thus, there is a need for security functionality at higher layers as well as blanket coverage at lower layers. Second, it is generally easier to deploy new Internet services, including security services, at the higher layers of the protocol stack.
While waiting for security to be broadly deployed at -the network layer, which is probably still many years in the future, -many application developers "just do it" and introduce security -functionality into their favorite applications. A classic example is -Pretty Good Privacy (PGP), which provides secure e-mail (discussed later -in this section). Requiring only client and server application code, PGP -was one of the first security technologies to be broadly used in the -Internet. - -8.5.1 Secure E-Mail We now use the cryptographic principles of Sections -8.2 through 8.3 to create a secure e-mail system. We create this -high-level design in an incremental manner, at each step introducing new -security services. When designing a secure e-mail system, let us keep in -mind the racy example introduced in Section 8.1---the love affair -between Alice and Bob. Imagine that Alice wants to send an e-mail -message to Bob, and Trudy wants to intrude. Before plowing ahead and -designing a secure e-mail system for Alice and Bob, we should consider -which security features would be most desirable for them. First and -foremost is confidentiality. As discussed in Section 8.1, neither Alice -nor Bob wants Trudy to read Alice's e-mail message. The second feature -that Alice and Bob would most likely want to see in the secure e-mail -system is sender authentication. In particular, when Bob receives the -message " I don't love you anymore. I never want to see you again. -Formerly yours, Alice, " he would naturally want to be sure that the -message came from Alice and not from Trudy. Another feature that the two -lovers would appreciate is message integrity, that is, assurance that -the message Alice sends is not modified while en route to Bob. Finally, -the e-mail system should provide receiver authentication; that is, Alice -wants to make sure that she is indeed sending the letter to Bob and not -to someone else (for example, Trudy) who is impersonating Bob. 
So let's begin by addressing the foremost concern, confidentiality. The most straightforward way to provide confidentiality is for Alice to encrypt the message with symmetric key technology (such as DES or AES) and for Bob to decrypt the message on receipt. As discussed in Section 8.2, if the symmetric key is long enough, and if only Alice and Bob have the key, then it is extremely difficult for anyone else (including Trudy) to read the message. Although this approach is straightforward, it has the fundamental difficulty that we discussed in Section 8.2---distributing a symmetric key so that only Alice and Bob have copies of it. So we naturally consider an alternative approach---public key cryptography (using, for example, RSA). In the public key approach, Bob makes his public key publicly available (e.g., in a public key server or on his personal Web page), Alice encrypts her message with Bob's public key, and she sends the encrypted message to Bob's e-mail address. When Bob receives the message, he simply decrypts it with his private key. Assuming that Alice knows for sure that the public key is Bob's public key, this approach is an excellent means to provide the desired confidentiality. One problem, however, is that public key encryption is relatively inefficient, particularly for long messages. To overcome the efficiency problem, let's make use of a session key (discussed in Section 8.2.2). In particular, Alice (1) selects a random symmetric session key, KS, (2) encrypts her message, m, with the symmetric key, (3) encrypts the symmetric key with Bob's public key, KB+, (4) concatenates the encrypted message and the encrypted symmetric key to form a "package," and (5) sends the package to Bob's e-mail address. The steps are illustrated in Figure 8.19.

Figure 8.19 Alice used a symmetric session key, KS, to send a secret e-mail to Bob
(In this and -the subsequent figures, the circled "+" represents concatenation and the -circled "−" represents deconcatenation.) When Bob receives the package, -he (1) uses his private key, KB−, to obtain the symmetric key, KS, and -(2) uses the symmetric key KS to decrypt the message m. Having designed -a secure e-mail system that provides confidentiality, let's now design -another system that provides both sender authentication and message -integrity. We'll suppose, for the moment, that Alice and Bob are no -longer concerned with confidentiality (they want to share their feelings -with everyone!), and are concerned only about sender authentication and -message integrity. To accomplish this task, we use digital signatures -and message digests, as described in Section 8.3. Specifically, Alice -(1) applies a hash function, H (for example, MD5), to her message, m, to -obtain a message digest, (2) signs the result of the hash function with -her private key, KA−, to create a digital signature, (3) concatenates -the original (unencrypted) message with the signature to create a -package, and (4) sends the package to Bob's e-mail address. When Bob -receives the package, he (1) applies Alice's public key, KA+, to the -signed message digest and (2) compares the result of this operation with -his own hash, H, of the message. The steps are illustrated in Figure -8.20. As discussed in Section 8.3, if the two results are the same, Bob -can be pretty confident that the message came from Alice and is -unaltered. Now let's consider designing an e-mail system that provides -confidentiality, sender authentication, and message integrity. This can -be done by combining the procedures in Figures 8.19 and 8.20. Alice -first creates a preliminary package, exactly as in Figure 8.20, that -consists of her original message along with a digitally signed hash of -the message. 
She then treats this preliminary package as a message in itself and sends this new message through the sender steps in Figure 8.19, creating a new package that is sent to Bob. The steps applied by Alice are shown in Figure 8.21. When Bob receives the package, he first applies his side of Figure 8.19 and then his side of Figure 8.20.

Figure 8.20 Using hash functions and digital signatures to provide sender authentication and message integrity

It should be clear that this design achieves the goal of providing confidentiality, sender authentication, and message integrity. Note that, in this scheme, Alice uses public key cryptography twice: once with her own private key and once with Bob's public key. Similarly, Bob also uses public key cryptography twice---once with his private key and once with Alice's public key. The secure e-mail design outlined in Figure 8.21 probably provides satisfactory security for most e-mail users for most occasions. But there is still one important issue that remains to be addressed. The design in Figure 8.21 requires Alice to obtain Bob's public key, and requires Bob to obtain Alice's public key. The distribution of these public keys is a nontrivial problem. For example, Trudy might masquerade as Bob and give Alice her own public key while saying that it is Bob's public key, enabling her to receive the message meant for Bob.

Figure 8.21 Alice uses symmetric key cryptography, public key cryptography, a hash function, and a digital signature to provide secrecy, sender authentication, and message integrity

As we learned in Section 8.3, a popular approach for securely distributing public keys is to certify the public keys using a CA.

8.5.2 PGP

Written by Phil Zimmermann in 1991, Pretty Good Privacy (PGP) is a nice example of an e-mail encryption scheme \[PGPI 2016\].
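Since PGP's design is essentially that of Figure 8.21, a toy end-to-end sketch is useful at this point. Everything below is invented for illustration: the tiny textbook RSA pairs (p=61, q=53 and p=89, q=97), the SHA-256 XOR keystream standing in for CAST/3DES/IDEA, and the ad hoc length-prefixed packaging.

```python
import hashlib
import secrets

# Toy RSA key pairs built from tiny textbook primes; real systems use
# 1024-bit or larger moduli.
ALICE = {"n": 3233, "e": 17, "d": 2753}   # p = 61, q = 53
BOB   = {"n": 8633, "e": 5,  "d": 5069}   # p = 89, q = 97

def rsa(data: bytes, exp: int, n: int) -> bytes:
    # Byte-at-a-time RSA: acceptable in a toy, insecure in real life.
    return b"".join(pow(b, exp, n).to_bytes(2, "big") for b in data)

def rsa_inv(blob: bytes, exp: int, n: int) -> bytes:
    return bytes(pow(int.from_bytes(blob[i:i + 2], "big"), exp, n)
                 for i in range(0, len(blob), 2))

def keystream(key: bytes, n: int) -> bytes:
    # SHA-256 counter keystream standing in for a real symmetric cipher.
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def sym(key: bytes, data: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# --- Alice's side (the sender steps of Figure 8.21) ---
m = b"I don't love you anymore. Formerly yours, Alice"
signature = rsa(hashlib.sha256(m).digest(), ALICE["d"], ALICE["n"])
package = len(m).to_bytes(2, "big") + m + signature    # m (+) KA-(H(m))
ks = secrets.token_bytes(16)                           # session key KS
wire = rsa(ks, BOB["e"], BOB["n"]) + sym(ks, package)  # KB+(KS) (+) KS(...)

# --- Bob's side ---
ks2 = rsa_inv(wire[:32], BOB["d"], BOB["n"])           # recover KS with KB-
pkg = sym(ks2, wire[32:])                              # strip the outer layer
mlen = int.from_bytes(pkg[:2], "big")
m2, sig = pkg[2:2 + mlen], pkg[2 + mlen:]
ok = rsa_inv(sig, ALICE["e"], ALICE["n"]) == hashlib.sha256(m2).digest()
```

Note how Alice uses her own private key once (to sign) and Bob's public key once (to seal the session key), exactly as in Figure 8.21; Bob mirrors the two operations in reverse.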
Versions -of PGP are available in the public domain; for example, you can find the -PGP software for your favorite platform as well as lots of interesting -reading at the International PGP Home Page \[PGPI 2016\]. The PGP design -is, in essence, the same as the design shown in Figure 8.21. Depending -on the version, the PGP software uses MD5 or SHA for calculating the -message digest; CAST, triple-DES, or IDEA for symmetric key encryption; -and RSA for the public key encryption. When PGP is installed, the -software creates a public key pair for the user. The public key can be -posted on the user's Web site or placed in a public key server. The -private key is protected by the use of a password. The password has to -be entered every time the user accesses the private key. PGP gives the -user the option of digitally signing the message, encrypting the -message, or both digitally signing and encrypting. Figure 8.22 shows a -PGP signed message. This message appears after the MIME header. The -encoded data in the message is KA−(H(m)), that is, the digitally signed -message digest. As we discussed above, in order for Bob to verify the -integrity of the message, he needs to have access to Alice's public key. -Figure 8.23 shows a secret PGP message. This message also appears after -the MIME header. Of course, the plaintext message is not included within -the secret e-mail message. When a sender (such as Alice) wants both -confidentiality and integrity, PGP contains a message like that of -Figure 8.23 within the message of Figure 8.22. PGP also provides a -mechanism for public key certification, but the mechanism is quite -different from the more conventional CA. PGP public keys are certified -by - -Figure 8.22 A PGP signed message - -Figure 8.23 A secret PGP message - -a web of trust. Alice herself can certify any key/username pair when she -believes the pair really belong together. 
In addition, PGP permits Alice -to say that she trusts another user to vouch for the authenticity of -more keys. Some PGP users sign each other's keys by holding key-signing -parties. Users physically gather, exchange public keys, and certify each -other's keys by signing them with their private keys. - -8.6 Securing TCP Connections: SSL In the previous section, we saw how -cryptographic techniques can provide confidentiality, data integrity, -and end-point authentication to a specific application, namely, e-mail. -In this section, we'll drop down a layer in the protocol stack and -examine how cryptography can enhance TCP with security services, -including confidentiality, data integrity, and end-point authentication. -This enhanced version of TCP is commonly known as Secure Sockets Layer -(SSL). A slightly modified version of SSL version 3, called Transport -Layer Security (TLS), has been standardized by the IETF \[RFC 4346\]. -The SSL protocol was originally designed by Netscape, but the basic -ideas behind securing TCP had predated Netscape's work (for example, see -Woo \[Woo 1994\]). Since its inception, SSL has enjoyed broad -deployment. SSL is supported by all popular Web browsers and Web -servers, and it is used by Gmail and essentially all Internet commerce -sites (including Amazon, eBay, and TaoBao). Hundreds of billions of -dollars are spent over SSL every year. In fact, if you have ever -purchased anything over the Internet with your credit card, the -communication between your browser and the server for this purchase -almost certainly went over SSL. (You can identify that SSL is being used -by your browser when the URL begins with https: rather than http.) To -understand the need for SSL, let's walk through a typical Internet -commerce scenario. Bob is surfing the Web and arrives at the Alice -Incorporated site, which is selling perfume. 
The Alice Incorporated site displays a form in which Bob is supposed to enter the type of perfume and quantity desired, his address, and his payment card number. Bob enters this information, clicks on Submit, and expects to receive (via ordinary postal mail) the purchased perfumes; he also expects to receive a charge for his order in his next payment card statement. This all sounds good, but if no security measures are taken, Bob could be in for a few surprises. If no confidentiality (encryption) is used, an intruder could intercept Bob's order and obtain his payment card information. The intruder could then make purchases at Bob's expense. If no data integrity is used, an intruder could modify Bob's order, having him purchase ten times more bottles of perfume than desired. Finally, if no server authentication is used, a server could display Alice Incorporated's famous logo when in actuality the site is maintained by Trudy, who is masquerading as Alice Incorporated. After receiving Bob's order, Trudy could take Bob's money and run. Or Trudy could carry out an identity theft by collecting Bob's name, address, and credit card number. SSL addresses these issues by enhancing TCP with confidentiality, data integrity, server authentication, and client authentication.

SSL is often used to provide security to transactions that take place over HTTP. However, because SSL secures TCP, it can be employed by any application that runs over TCP. SSL provides a simple Application Programmer Interface (API) with sockets, which is analogous to TCP's API. When an application wants to employ SSL, the application includes SSL classes/libraries. As shown in Figure 8.24, although SSL technically resides in the application layer, from the developer's perspective it is a transport protocol that provides TCP's services enhanced with security services.
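The Python standard library's TLS support makes this developer's-eye view concrete: once a TCP socket is wrapped, it is used exactly like an ordinary socket. The host name and HTTP request below are illustrative.

```python
import socket
import ssl

# A context holding trusted CA certificates and sane defaults.
context = ssl.create_default_context()

def fetch_page(host: str) -> bytes:
    with socket.create_connection((host, 443)) as raw:
        # wrap_socket runs the SSL/TLS handshake; afterwards sendall and
        # recv are transparently encrypted and integrity-protected.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(b"GET / HTTP/1.1\r\nHost: " + host.encode() +
                        b"\r\nConnection: close\r\n\r\n")
            data = b""
            while chunk := tls.recv(4096):
                data += chunk
            return data
```

The application code never touches keys, records, or MACs; from its perspective the secure socket is simply a byte stream, which is exactly the point of Figure 8.24.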
- -8.6.1 The Big Picture We begin by describing a simplified version of -SSL, one that will allow us to get a big-picture understanding of the -why and how of SSL. We will refer to this simplified - -Figure 8.24 Although SSL technically resides in the application layer, -from the developer's perspective it is a transport-layer protocol - -version of SSL as "almost-SSL." After describing almost-SSL, in the next -subsection we'll then describe the real SSL, filling in the details. -Almost-SSL (and SSL) has three phases: handshake, key derivation, and -data transfer. We now describe these three phases for a communication -session between a client (Bob) and a server (Alice), with Alice having a -private/public key pair and a certificate that binds her identity to her -public key. - -Handshake During the handshake phase, Bob needs to (a) establish a TCP -connection with Alice, (b) verify that Alice is really Alice, and (c) -send Alice a master secret key, which will be used by both Alice and Bob -to generate all the symmetric keys they need for the SSL session. These -three steps are shown in Figure 8.25. Note that once the TCP connection -is established, Bob sends Alice a hello message. Alice then responds -with her certificate, which contains her public key. As discussed in -Section 8.3, because the certificate has been certified by a CA, Bob -knows for sure that the public key in the certificate belongs to Alice. -Bob then generates a Master Secret (MS) (which will only be used for -this SSL session), encrypts the MS with Alice's public key to create the -Encrypted Master Secret (EMS), and sends the EMS to Alice. Alice -decrypts the EMS with her private key to get the MS. After this phase, -both Bob and Alice (and no one else) know the master secret for this SSL -session. 
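The handshake just described, together with the key-derivation step that follows it, can be condensed into a few lines of toy Python. The tiny textbook RSA numbers stand in for the certified public key in Alice's certificate, and real SSL derives the four keys with a key-derivation function rather than by simple slicing.

```python
import secrets

# Toy RSA pair standing in for the public key in Alice's certificate.
N, E, D = 3233, 17, 2753          # p = 61, q = 53: textbook-sized only!

# Handshake: Bob picks the Master Secret and sends EMS, the MS encrypted
# with Alice's public key (one byte at a time, a toy simplification).
MS = secrets.token_bytes(32)
EMS = [pow(b, E, N) for b in MS]

# Alice recovers the MS with her private key; now both sides share it.
MS_at_alice = bytes(pow(c, D, N) for c in EMS)

# Key derivation: both sides slice the shared MS into four session keys.
EB, MB, EA, MA = MS[0:8], MS[8:16], MS[16:24], MS[24:32]
```

Because only Alice holds the private key, only she can recover the MS, which is what ties the session keys to the authenticated server.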
Figure 8.25 The almost-SSL handshake, beginning with a TCP connection

Key Derivation In principle, the MS, now shared by Bob and Alice, could be used as the symmetric session key for all subsequent encryption and data integrity checking. It is, however, generally considered safer for Alice and Bob to each use different cryptographic keys, and also to use different keys for encryption and integrity checking. Thus, both Alice and Bob use the MS to generate four keys:

EB = session encryption key for data sent from Bob to Alice
MB = session MAC key for data sent from Bob to Alice
EA = session encryption key for data sent from Alice to Bob
MA = session MAC key for data sent from Alice to Bob

Alice and Bob each generate the four keys from the MS. This could be done by simply slicing the MS into four keys. (But in real SSL it is a little more complicated, as we'll see.) At the end of the key derivation phase, both Alice and Bob have all four keys. The two encryption keys will be used to encrypt data; the two MAC keys will be used to verify the integrity of the data.

Data Transfer Now that Alice and Bob share the same four session keys (EB, MB, EA, and MA), they can start to send secured data to each other over the TCP connection. Since TCP is a byte-stream protocol, a natural approach would be for SSL to encrypt application data on the fly and then pass the encrypted data on the fly to TCP. But if we were to do this, where would we put the MAC for the integrity check? We certainly do not want to wait until the end of the TCP session to verify the integrity of all of Bob's data that was sent over the entire session! To address this issue, SSL breaks the data stream into records, appends a MAC to each record for integrity checking, and then encrypts the record + MAC. To create the MAC, Bob inputs the record data along with the key MB into a hash function, as discussed in Section 8.3.
To encrypt the package record + MAC, Bob uses his session encryption key EB. This encrypted package is then passed to TCP for transport over the Internet. Although this approach goes a long way, it still isn't bullet-proof when it comes to providing data integrity for the entire message stream. In particular, suppose Trudy is a woman-in-the-middle and has the ability to insert, delete, and replace segments in the stream of TCP segments sent between Alice and Bob. Trudy, for example, could capture two segments sent by Bob, reverse the order of the segments, adjust the TCP sequence numbers (which are not encrypted), and then send the two reverse-ordered segments to Alice. Assuming that each TCP segment encapsulates exactly one record, let's now take a look at how Alice would process these segments.

1. TCP running in Alice would think everything is fine and pass the two records to the SSL sublayer.

2. SSL in Alice would decrypt the two records.

3. SSL in Alice would use the MAC in each record to verify the data integrity of the two records.

4. SSL would then pass the decrypted byte streams of the two records to the application layer; but the complete byte stream received by Alice would not be in the correct order due to reversal of the records!

You are encouraged to walk through similar scenarios for when Trudy removes segments or when Trudy replays segments.

The solution to this problem, as you probably guessed, is to use sequence numbers. SSL does this as follows. Bob maintains a sequence number counter, which begins at zero and is incremented for each SSL record he sends. Bob doesn't actually include a sequence number in the record itself, but when he calculates the MAC, he includes the sequence number in the MAC calculation. Thus, the MAC is now a hash of the data plus the MAC key MB plus the current sequence number.
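This record-MAC scheme can be sketched briefly; HMAC-SHA-256 is used here purely as a convenient stand-in for whatever MAC the session actually negotiated, and the key value is illustrative.

```python
import hashlib
import hmac

def record_mac(mac_key: bytes, seq: int, record: bytes) -> bytes:
    # The MAC covers the implicit sequence number as well as the data,
    # so reordered or replayed records fail verification.
    return hmac.new(mac_key, seq.to_bytes(8, "big") + record,
                    hashlib.sha256).digest()

MB = b"bob-to-alice MAC key"
records = [b"first record", b"second record"]
sent = [(r, record_mac(MB, i, r)) for i, r in enumerate(records)]

# Delivered in order, every record verifies against Alice's own counter:
in_order = [hmac.compare_digest(tag, record_mac(MB, i, r))
            for i, (r, tag) in enumerate(sent)]

# Trudy swaps the two records in flight; Alice's counter no longer
# matches the sequence number folded into each MAC:
swapped = [sent[1], sent[0]]
checks = [hmac.compare_digest(tag, record_mac(MB, i, r))
          for i, (r, tag) in enumerate(swapped)]
```

Because Alice supplies the sequence number from her own counter rather than trusting anything on the wire, reordering or replaying records makes every affected MAC check fail.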
Alice tracks Bob's -sequence numbers, allowing her to verify the data integrity of a record -by including the appropriate sequence number in the MAC calculation. -This use of SSL sequence numbers prevents Trudy from carrying out a -woman-in-the-middle attack, such as reordering or replaying segments. -(Why?) SSL Record The SSL record (as well as the almost-SSL record) is -shown in Figure 8.26. The record consists of a type field, version -field, length field, data field, and MAC field. Note that the first -three fields are not encrypted. The type field indicates whether the -record is a handshake message or a message that contains application -data. It is also used to close the SSL connection, as discussed below. -SSL at the receiving end uses the length field to extract the SSL -records out of the incoming TCP byte stream. The version field is -self-explanatory. - -8.6.2 A More Complete Picture The previous subsection covered the -almost-SSL protocol; it served to give us a basic understanding of the -why and how of SSL. Now that we have a basic understanding of SSL, we -can dig a little deeper and examine the essentials of the actual SSL -protocol. In parallel to reading this description of the SSL protocol, -you are encouraged to complete the Wireshark SSL lab, available at the -textbook's Web site. - -Figure 8.26 Record format for SSL - -SSL Handshake SSL does not mandate that Alice and Bob use a specific -symmetric key algorithm, a specific public-key algorithm, or a specific -MAC. Instead, SSL allows Alice and Bob to agree on the cryptographic -algorithms at the beginning of the SSL session, during the handshake -phase. Additionally, during the handshake phase, Alice and Bob send -nonces to each other, which are used in the creation of the - -session keys (EB, MB, EA, and MA). The steps of the real SSL handshake -are as follows: - -1. The client sends a list of cryptographic algorithms it supports, - along with a client nonce. - -2. 
From the list, the server chooses a symmetric algorithm (for example, AES), a public key algorithm (for example, RSA with a specific key length), and a MAC algorithm. It sends back to the client its choices, as well as a certificate and a server nonce.

3. The client verifies the certificate, extracts the server's public key, generates a Pre-Master Secret (PMS), encrypts the PMS with the server's public key, and sends the encrypted PMS to the server.

4. Using the same key derivation function (as specified by the SSL standard), the client and server independently compute the Master Secret (MS) from the PMS and nonces. The MS is then sliced up to generate the two encryption and two MAC keys. Furthermore, when the chosen symmetric cipher employs CBC (such as 3DES or AES), then two Initialization Vectors (IVs)---one for each side of the connection---are also obtained from the MS. Henceforth, all messages sent between client and server are encrypted and authenticated (with the MAC).

5. The client sends a MAC of all the handshake messages.

6. The server sends a MAC of all the handshake messages.

The last two steps protect the handshake from tampering. To see this, observe that in step 1, the client typically offers a list of algorithms---some strong, some weak. This list of algorithms is sent in cleartext, since the encryption algorithms and keys have not yet been agreed upon. Trudy, as a woman-in-the-middle, could delete the stronger algorithms from the list, forcing the client to select a weak algorithm. To prevent such a tampering attack, in step 5 the client sends a MAC of the concatenation of all the handshake messages it sent and received. The server can compare this MAC with the MAC of the handshake messages it received and sent. If there is an inconsistency, the server can terminate the connection.
Similarly, the server sends a MAC of the handshake messages it has seen, allowing the client to check for inconsistencies.

You may be wondering why there are nonces in steps 1 and 2. Don't sequence numbers suffice for preventing the segment replay attack? The answer is yes, but they don't alone prevent the "connection replay attack." Consider the following connection replay attack. Suppose Trudy sniffs all messages between Alice and Bob. The next day, Trudy masquerades as Bob and sends to Alice exactly the same sequence of messages that Bob sent to Alice on the previous day. If Alice doesn't use nonces, she will respond with exactly the same sequence of messages she sent the previous day. Alice will not suspect any funny business, as each message she receives will pass the integrity check. If Alice is an e-commerce server, she will think that Bob is placing a second order (for exactly the same thing). On the other hand, by including a nonce in the protocol, Alice will send different nonces for each TCP session, causing the encryption keys to be different on the two days. Therefore, when Alice receives played-back SSL records from Trudy, the records will fail the integrity checks, and the bogus e-commerce transaction will not succeed. In summary, in SSL, nonces are used to defend against the "connection replay attack" and sequence numbers are used to defend against replaying individual packets during an ongoing session.

Connection Closure At some point, either Bob or Alice will want to end the SSL session. One approach would be to let Bob end the SSL session by simply terminating the underlying TCP connection---that is, by having Bob send a TCP FIN segment to Alice. But such a naive design sets the stage for the truncation attack whereby Trudy once again gets in the middle of an ongoing SSL session and ends the session early with a TCP FIN.
If Trudy were to do this, Alice would think she had received all of Bob's data when in actuality she had received only a portion of it. The solution to this problem is to indicate in the type field whether the record serves to terminate the SSL session. (Although the SSL type is sent in the clear, it is authenticated at the receiver using the record's MAC.) By including such a field, if Alice were to receive a TCP FIN before receiving a closure SSL record, she would know that something funny was going on. This completes our introduction to SSL. We've seen that it uses many of the cryptography principles discussed in Sections 8.2 and 8.3. Readers who want to explore SSL on yet a deeper level can read Rescorla's highly readable book on SSL \[Rescorla 2001\].

8.7 Network-Layer Security: IPsec and Virtual Private Networks

The IP security protocol, more commonly known as IPsec, provides security at the network layer. IPsec secures IP datagrams between any two network-layer entities, including hosts and routers. As we will soon describe, many institutions (corporations, government branches, non-profit organizations, and so on) use IPsec to create virtual private networks (VPNs) that run over the public Internet. Before getting into the specifics of IPsec, let's step back and consider what it means to provide confidentiality at the network layer. With network-layer confidentiality between a pair of network entities (for example, between two routers, between two hosts, or between a router and a host), the sending entity encrypts the payloads of all the datagrams it sends to the receiving entity. The encrypted payload could be a TCP segment, a UDP segment, an ICMP message, and so on. If such a network-layer service were in place, all data sent from one entity to the other---including e-mail, Web pages, TCP handshake messages, and management messages (such as ICMP and SNMP)---would be hidden from any third party that might be sniffing the network.
For this reason, network-layer security is said to provide "blanket coverage." In addition to confidentiality, a network-layer security protocol could potentially provide other security services. For example, it could provide source authentication, so that the receiving entity can verify the source of the secured datagram. A network-layer security protocol could provide data integrity, so that the receiving entity can check for any tampering of the datagram that may have occurred while the datagram was in transit. A network-layer security service could also provide replay-attack prevention, meaning that Bob could detect any duplicate datagrams that an attacker might insert. We will soon see that IPsec indeed provides mechanisms for all these security services, that is, for confidentiality, source authentication, data integrity, and replay-attack prevention.

8.7.1 IPsec and Virtual Private Networks (VPNs)

An institution that extends over multiple geographical regions often desires its own IP network, so that its hosts and servers can send data to each other in a secure and confidential manner. To achieve this goal, the institution could actually deploy a stand-alone physical network---including routers, links, and a DNS infrastructure---that is completely separate from the public Internet. Such a disjoint network, dedicated to a particular institution, is called a private network. Not surprisingly, a private network can be very costly, as the institution needs to purchase, install, and maintain its own physical network infrastructure.

Instead of deploying and maintaining a private network, many institutions today create VPNs over the existing public Internet. With a VPN, the institution's inter-office traffic is sent over the public Internet rather than over a physically independent network. But to provide confidentiality, the inter-office traffic is encrypted before it enters the public Internet.
A simple example of a VPN is shown in Figure 8.27. Here the institution consists of a headquarters, a branch office, and traveling salespersons that typically access the Internet from their hotel rooms. (There is only one salesperson shown in the figure.) In this VPN, whenever two hosts within headquarters send IP datagrams to each other or whenever two hosts within the branch office want to communicate, they use good-old vanilla IPv4 (that is, without IPsec services). However, when two of the institution's hosts communicate over a path that traverses the public Internet, the traffic is encrypted before it enters the Internet.

Figure 8.27 Virtual private network (VPN)

To get a feel for how a VPN works, let's walk through a simple example in the context of Figure 8.27. When a host in headquarters sends an IP datagram to a salesperson in a hotel, the gateway router in headquarters converts the vanilla IPv4 datagram into an IPsec datagram and then forwards this IPsec datagram into the Internet. This IPsec datagram actually has a traditional IPv4 header, so that the routers in the public Internet process the datagram as if it were an ordinary IPv4 datagram---to them, the datagram is a perfectly ordinary datagram. But, as shown in Figure 8.27, the payload of the IPsec datagram includes an IPsec header, which is used for IPsec processing; furthermore, the payload of the IPsec datagram is encrypted. When the IPsec datagram arrives at the salesperson's laptop, the OS in the laptop decrypts the payload (and provides other security services, such as verifying data integrity) and passes the unencrypted payload to the upper-layer protocol (for example, to TCP or UDP). We have just given a high-level overview of how an institution can employ IPsec to create a VPN. To see the forest through the trees, we have brushed aside many important details. Let's now take a closer look.
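The gateway's first decision in the walk-through above, vanilla IPv4 for intra-site traffic versus IPsec for inter-site traffic crossing the public Internet, can be sketched as a small policy check. This is a toy illustration, not part of any standard: the site prefixes are borrowed from the example addresses used later in this section, only the two office sites are modeled, and the function name is ours.

```python
import ipaddress

# Illustrative site prefixes (these example addresses appear later in this section)
HEADQUARTERS = ipaddress.ip_network("172.16.1.0/24")
BRANCH_OFFICE = ipaddress.ip_network("172.16.2.0/24")
SITES = [HEADQUARTERS, BRANCH_OFFICE]

def needs_ipsec(src: str, dst: str) -> bool:
    """True when a datagram travels between two different institution
    sites and therefore crosses the public Internet."""
    src_site = next((net for net in SITES if ipaddress.ip_address(src) in net), None)
    dst_site = next((net for net in SITES if ipaddress.ip_address(dst) in net), None)
    # Intra-site traffic stays vanilla IPv4; inter-site traffic is tunneled
    return src_site is not None and dst_site is not None and src_site != dst_site
```

With this sketch, a headquarters-to-branch-office datagram gets IPsec treatment, while traffic between two headquarters hosts, or from a headquarters host to a public Web server, is emitted as a vanilla IPv4 datagram.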
8.7.2 The AH and ESP Protocols

IPsec is a rather complex animal---it is defined in more than a dozen RFCs. Two important RFCs are RFC 4301, which describes the overall IP security architecture, and RFC 6071, which provides an overview of the IPsec protocol suite. Our goal in this textbook, as usual, is not simply to re-hash the dry and arcane RFCs, but instead to take a more operational and pedagogic approach to describing the protocols. In the IPsec protocol suite, there are two principal protocols: the Authentication Header (AH) protocol and the Encapsulation Security Payload (ESP) protocol. When a source IPsec entity (typically a host or a router) sends secure datagrams to a destination entity (also a host or a router), it does so with either the AH protocol or the ESP protocol. The AH protocol provides source authentication and data integrity but does not provide confidentiality. The ESP protocol provides source authentication, data integrity, and confidentiality. Because confidentiality is often critical for VPNs and other IPsec applications, the ESP protocol is much more widely used than the AH protocol. In order to de-mystify IPsec and avoid much of its complication, we will henceforth focus exclusively on the ESP protocol. Readers wanting to learn about the AH protocol as well are encouraged to explore the RFCs and other online resources.

8.7.3 Security Associations

IPsec datagrams are sent between pairs of network entities, such as between two hosts, between two routers, or between a host and router. Before sending IPsec datagrams from source entity to destination entity, the source and destination entities create a network-layer logical connection. This logical connection is called a security association (SA). An SA is a simplex logical connection; that is, it is unidirectional from source to destination.
If both entities want to send secure datagrams to each other, then two SAs (that is, two logical connections) need to be established, one in each direction. For example, consider once again the institutional VPN in Figure 8.27. This institution consists of a headquarters office, a branch office and, say, n traveling salespersons. For the sake of example, let's suppose that there is bi-directional IPsec traffic between headquarters and the branch office and bi-directional IPsec traffic between headquarters and the salespersons. In this VPN, how many SAs are there? To answer this question, note that there are two SAs between the headquarters gateway router and the branch-office gateway router (one in each direction); for each salesperson's laptop, there are two SAs between the headquarters gateway router and the laptop (again, one in each direction). So, in total, there are (2+2n) SAs. Keep in mind, however, that not all traffic sent into the Internet by the gateway routers or by the laptops will be IPsec secured. For example, a host in headquarters may want to access a Web server (such as Amazon or Google) in the public Internet. Thus, the gateway router (and the laptops) will emit into the Internet both vanilla IPv4 datagrams and secured IPsec datagrams.

Figure 8.28 Security association (SA) from R1 to R2

Let's now take a look "inside" an SA. To make the discussion tangible and concrete, let's do this in the context of an SA from router R1 to router R2 in Figure 8.28. (You can think of Router R1 as the headquarters gateway router and Router R2 as the branch office gateway router from Figure 8.27.)
Router R1 will maintain state information about this SA, which will include:

- A 32-bit identifier for the SA, called the Security Parameter Index (SPI)
- The origin interface of the SA (in this case 200.168.1.100) and the destination interface of the SA (in this case 193.68.2.23)
- The type of encryption to be used (for example, 3DES with CBC)
- The encryption key
- The type of integrity check (for example, HMAC with MD5)
- The authentication key

Whenever router R1 needs to construct an IPsec datagram for forwarding over this SA, it accesses this state information to determine how it should authenticate and encrypt the datagram. Similarly, router R2 will maintain the same state information for this SA and will use this information to authenticate and decrypt any IPsec datagram that arrives from the SA. An IPsec entity (router or host) often maintains state information for many SAs. For example, in the VPN example in Figure 8.27 with n salespersons, the headquarters gateway router maintains state information for (2+2n) SAs. An IPsec entity stores the state information for all of its SAs in its Security Association Database (SAD), which is a data structure in the entity's OS kernel.

8.7.4 The IPsec Datagram

Having described SAs, we can now describe the actual IPsec datagram. IPsec has two different packet forms, one for the so-called tunnel mode and the other for the so-called transport mode. The tunnel mode, being more appropriate for VPNs, is more widely deployed than the transport mode.

Figure 8.29 IPsec datagram format

In order to further de-mystify IPsec and avoid much of its complication, we henceforth focus exclusively on the tunnel mode. Once you have a solid grip on the tunnel mode, you should be able to easily learn about the transport mode on your own. The packet format of the IPsec datagram is shown in Figure 8.29.
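The per-SA state listed above, and the SAD that holds it, can be pictured as a small record keyed by SPI. Below is a minimal sketch; the field names and key values are illustrative (a real SAD is a kernel data structure, not a Python dict).

```python
from dataclasses import dataclass

@dataclass
class SecurityAssociation:
    spi: int          # 32-bit Security Parameter Index
    src: str          # origin tunnel interface
    dst: str          # destination tunnel interface
    enc_alg: str      # type of encryption, e.g., 3DES with CBC
    enc_key: bytes    # encryption key
    auth_alg: str     # type of integrity check, e.g., HMAC with MD5
    auth_key: bytes   # authentication key

# The SAD maps each SPI to the state for one simplex SA (values made up)
sa_r1_to_r2 = SecurityAssociation(
    spi=0x1234, src="200.168.1.100", dst="193.68.2.23",
    enc_alg="3des-cbc", enc_key=b"example-enc-key!",
    auth_alg="hmac-md5", auth_key=b"example-auth-key")
sad = {sa_r1_to_r2.spi: sa_r1_to_r2}
```

A receiving entity indexes this table with the SPI carried in an arriving IPsec datagram to recover the algorithms and keys for that SA.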
You might think that packet formats are boring and insipid, but we will soon see that the IPsec datagram actually looks and tastes like a popular Tex-Mex delicacy! Let's examine the IPsec fields in the context of Figure 8.28. Suppose router R1 receives an ordinary IPv4 datagram from host 172.16.1.17 (in the headquarters network) which is destined to host 172.16.2.48 (in the branch-office network). Router R1 uses the following recipe to convert this "original IPv4 datagram" into an IPsec datagram:

1. Appends to the back of the original IPv4 datagram (which includes the original header fields!) an "ESP trailer" field
2. Encrypts the result using the algorithm and key specified by the SA
3. Appends to the front of this encrypted quantity a field called "ESP header"; the resulting package is called the "enchilada"
4. Creates an authentication MAC over the whole enchilada using the algorithm and key specified in the SA
5. Appends the MAC to the back of the enchilada, forming the payload
6. Finally, creates a brand new IP header with all the classic IPv4 header fields (together normally 20 bytes long), which it appends before the payload

Note that the resulting IPsec datagram is a bona fide IPv4 datagram, with the traditional IPv4 header fields followed by a payload. But in this case, the payload contains an ESP header, the original IP datagram, an ESP trailer, and an ESP authentication field (with the original datagram and ESP trailer encrypted). The original IP datagram has 172.16.1.17 for the source IP address and 172.16.2.48 for the destination IP address. Because the IPsec datagram includes the original IP datagram, these addresses are included (and encrypted) as part of the payload of the IPsec packet. But what about the source and destination IP addresses that are in the new IP header, that is, in the left-most header of the IPsec datagram?
As you might expect, they are set to the source and destination router interfaces at the two ends of the tunnel, namely, 200.168.1.100 and 193.68.2.23. Also, the protocol number in this new IPv4 header field is not set to that of TCP, UDP, or ICMP, but instead to 50, designating that this is an IPsec datagram using the ESP protocol. After R1 sends the IPsec datagram into the public Internet, it will pass through many routers before reaching R2. Each of these routers will process the datagram as if it were an ordinary datagram---they are completely oblivious to the fact that the datagram is carrying IPsec-encrypted data. For these public Internet routers, because the destination IP address in the outer header is R2, the ultimate destination of the datagram is R2. Having walked through an example of how an IPsec datagram is constructed, let's now take a closer look at the ingredients in the enchilada. We see in Figure 8.29 that the ESP trailer consists of three fields: padding; pad length; and next header. Recall that block ciphers require the message being encrypted to be an integer multiple of the block length. Padding (consisting of meaningless bytes) is used so that when added to the original datagram (along with the pad length and next header fields), the resulting "message" is an integer number of blocks. The pad-length field indicates to the receiving entity how much padding was inserted (and thus needs to be removed). The next header identifies the type (e.g., UDP) of data contained in the payload-data field. The payload data (typically the original IP datagram) and the ESP trailer are concatenated and then encrypted. Appended to the front of this encrypted unit is the ESP header, which is sent in the clear and consists of two fields: the SPI and the sequence number field.
The SPI indicates to the receiving entity the SA to which the datagram belongs; the receiving entity can then index its SAD with the SPI to determine the appropriate authentication/decryption algorithms and keys. The sequence number field is used to defend against replay attacks. The sending entity also appends an authentication MAC. As stated earlier, the sending entity calculates a MAC over the whole enchilada (consisting of the ESP header, the original IP datagram, and the ESP trailer---with the datagram and trailer being encrypted). Recall that to calculate a MAC, the sender appends a secret MAC key to the enchilada and then calculates a fixed-length hash of the result. When R2 receives the IPsec datagram, R2 observes that the destination IP address of the datagram is R2 itself. R2 therefore processes the datagram. Because the protocol field (in the left-most IP header) is 50, R2 sees that it should apply IPsec ESP processing to the datagram. First, peering into the enchilada, R2 uses the SPI to determine to which SA the datagram belongs. Second, it calculates the MAC of the enchilada and verifies that the MAC is consistent with the value in the ESP MAC field. If it is, it knows that the enchilada comes from R1 and has not been tampered with. Third, it checks the sequence-number field to verify that the datagram is fresh (and not a replayed datagram). Fourth, it decrypts the encrypted unit using the decryption algorithm and key associated with the SA. Fifth, it removes padding and extracts the original, vanilla IP datagram. And finally, sixth, it forwards the original datagram into the branch office network toward its ultimate destination. Whew, what a complicated recipe, huh? Well no one ever said that preparing and unraveling an enchilada was easy! There is actually another important subtlety that needs to be addressed.
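Before turning to that subtlety, the construction recipe and R2's processing steps can be sketched together as running code. The following is a toy illustration, not real ESP: a repeating XOR stands in for the SA's cipher, HMAC-SHA256 stands in for the negotiated MAC, a dict stands in for the new IP header, and the replay check (step three) is omitted. Only the overall structure, a clear ESP header, the encrypted datagram plus trailer, and a trailing MAC, follows the text; all names and key values are made up.

```python
import hmac, hashlib, struct

BLOCK = 8     # toy cipher block size
MACLEN = 32   # length of an HMAC-SHA256 digest

def toy_cipher(key, data):
    # Placeholder for the SA's real cipher (e.g., 3DES with CBC); XOR is its own inverse
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def esp_encapsulate(sa, original_datagram):
    pad_len = (-(len(original_datagram) + 2)) % BLOCK
    trailer = bytes(pad_len) + bytes([pad_len, 4])       # padding, pad length, next header (4 = IPv4)
    encrypted = toy_cipher(sa["enc_key"], original_datagram + trailer)
    esp_header = struct.pack("!II", sa["spi"], sa["seq"])  # SPI and sequence number, in the clear
    sa["seq"] += 1
    enchilada = esp_header + encrypted
    mac = hmac.new(sa["auth_key"], enchilada, hashlib.sha256).digest()
    new_ip_header = {"src": sa["tunnel_src"], "dst": sa["tunnel_dst"], "proto": 50}
    return new_ip_header, enchilada + mac                # payload = enchilada + MAC

def esp_decapsulate(sad, payload):
    enchilada, mac = payload[:-MACLEN], payload[-MACLEN:]
    spi, seq = struct.unpack("!II", enchilada[:8])
    sa = sad[spi]                                        # SPI -> SA lookup in the SAD
    expected = hmac.new(sa["auth_key"], enchilada, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):           # integrity check
        raise ValueError("MAC check failed")
    plaintext = toy_cipher(sa["enc_key"], enchilada[8:])  # decrypt datagram + trailer
    pad_len = plaintext[-2]                              # strip padding and trailer
    return plaintext[:-(pad_len + 2)]

sa = {"spi": 0x1234, "seq": 1,
      "enc_key": b"toy-encryption-key", "auth_key": b"toy-auth-key",
      "tunnel_src": "200.168.1.100", "tunnel_dst": "193.68.2.23"}
new_header, wire_payload = esp_encapsulate(sa, b"original vanilla IPv4 datagram")
recovered = esp_decapsulate({sa["spi"]: sa}, wire_payload)
```

Flipping any bit of the payload makes the MAC check fail at the receiver, which is exactly the property the enchilada's trailing MAC is there to provide.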
It centers on the following question: When R1 receives an (unsecured) datagram from a host in the headquarters network, and that datagram is destined to some destination IP address outside of headquarters, how does R1 know whether it should be converted to an IPsec datagram? And if it is to be processed by IPsec, how does R1 know which SA (of many SAs in its SAD) should be used to construct the IPsec datagram? The problem is solved as follows. Along with a SAD, the IPsec entity also maintains another data structure called the Security Policy Database (SPD). The SPD indicates what types of datagrams (as a function of source IP address, destination IP address, and protocol type) are to be IPsec processed; and for those that are to be IPsec processed, which SA should be used. In a sense, the information in an SPD indicates "what" to do with an arriving datagram; the information in the SAD indicates "how" to do it.

Summary of IPsec Services

So what services does IPsec provide, exactly? Let us examine these services from the perspective of an attacker, say Trudy, who is a woman-in-the-middle, sitting somewhere on the path between R1 and R2 in Figure 8.28. Assume throughout this discussion that Trudy does not know the authentication and encryption keys used by the SA. What can and cannot Trudy do? First, Trudy cannot see the original datagram. In fact, not only is the data in the original datagram hidden from Trudy, but so is the protocol number, the source IP address, and the destination IP address. For datagrams sent over the SA, Trudy only knows that the datagram originated from some host in 172.16.1.0/24 and is destined to some host in 172.16.2.0/24. She does not know if it is carrying TCP, UDP, or ICMP data; she does not know if it is carrying HTTP, SMTP, or some other type of application data. This confidentiality thus goes a lot farther than SSL.
Second, suppose Trudy tries to tamper with a datagram in the SA by flipping some of its bits. When this tampered datagram arrives at R2, it will fail the integrity check (using the MAC), thwarting Trudy's vicious attempts once again. Third, suppose Trudy tries to masquerade as R1, creating an IPsec datagram with source 200.168.1.100 and destination 193.68.2.23. Trudy's attack will be futile, as this datagram will again fail the integrity check at R2. Finally, because IPsec includes sequence numbers, Trudy will not be able to create a successful replay attack. In summary, as claimed at the beginning of this section, IPsec provides---between any pair of devices that process packets through the network layer---confidentiality, source authentication, data integrity, and replay-attack prevention.

8.7.5 IKE: Key Management in IPsec

When a VPN has a small number of end points (for example, just two routers as in Figure 8.28), the network administrator can manually enter the SA information (encryption/authentication algorithms and keys, and the SPIs) into the SADs of the endpoints. Such "manual keying" is clearly impractical for a large VPN, which may consist of hundreds or even thousands of IPsec routers and hosts. Large, geographically distributed deployments require an automated mechanism for creating the SAs. IPsec does this with the Internet Key Exchange (IKE) protocol, specified in RFC 5996. IKE has some similarities with the handshake in SSL (see Section 8.6). Each IPsec entity has a certificate, which includes the entity's public key. As with SSL, the IKE protocol has the two entities exchange certificates, negotiate authentication and encryption algorithms, and securely exchange key material for creating session keys in the IPsec SAs. Unlike SSL, IKE employs two phases to carry out these tasks. Let's investigate these two phases in the context of two routers, R1 and R2, in Figure 8.28.
The first phase consists of two exchanges of message pairs between R1 and R2:

- During the first exchange of messages, the two sides use Diffie-Hellman (see Homework Problems) to create a bi-directional IKE SA between the routers. To keep us all confused, this bi-directional IKE SA is entirely different from the IPsec SAs discussed in Sections 8.7.3 and 8.7.4. The IKE SA provides an authenticated and encrypted channel between the two routers. During this first message-pair exchange, keys are established for encryption and authentication for the IKE SA. Also established is a master secret that will be used to compute IPsec SA keys later in phase 2. Observe that during this first step, RSA public and private keys are not used. In particular, neither R1 nor R2 reveals its identity by signing a message with its private key.
- During the second exchange of messages, both sides reveal their identity to each other by signing their messages. However, the identities are not revealed to a passive sniffer, since the messages are sent over the secured IKE SA channel. Also during this phase, the two sides negotiate the IPsec encryption and authentication algorithms to be employed by the IPsec SAs.

In phase 2 of IKE, the two sides create an SA in each direction. At the end of phase 2, the encryption and authentication session keys are established on both sides for the two SAs. The two sides can then use the SAs to send secured datagrams, as described in Sections 8.7.3 and 8.7.4. The primary motivation for having two phases in IKE is computational cost---since the second phase doesn't involve any public-key cryptography, IKE can generate a large number of SAs between the two IPsec entities with relatively little computational cost.

8.8 Securing Wireless LANs

Security is a particularly important concern in wireless networks, where radio waves carrying frames can propagate far beyond the building containing the wireless base station and hosts.
In this section we present a brief introduction to wireless security. For a more in-depth treatment, see the highly readable book by Edney and Arbaugh \[Edney 2003\]. The issue of security in 802.11 has attracted considerable attention in both technical circles and in the media. While there has been considerable discussion, there has been little debate---there seems to be universal agreement that the original 802.11 specification contains a number of serious security flaws. Indeed, public domain software can now be downloaded that exploits these holes, making those who use the vanilla 802.11 security mechanisms as open to security attacks as users who use no security features at all. In the following section, we discuss the security mechanisms initially standardized in the 802.11 specification, known collectively as Wired Equivalent Privacy (WEP). As the name suggests, WEP is meant to provide a level of security similar to that found in wired networks. We'll then discuss a few of the security holes in WEP and discuss the 802.11i standard, a fundamentally more secure version of 802.11 adopted in 2004.

8.8.1 Wired Equivalent Privacy (WEP)

The IEEE 802.11 WEP protocol was designed in 1999 to provide authentication and data encryption between a host and a wireless access point (that is, base station) using a symmetric shared key approach. WEP does not specify a key management algorithm, so it is assumed that the host and wireless access point have somehow agreed on the key via an out-of-band method. Authentication is carried out as follows:

1. A wireless host requests authentication by an access point.

2. The access point responds to the authentication request with a 128-byte nonce value.

3. The wireless host encrypts the nonce using the symmetric key that it shares with the access point.

4. The access point decrypts the host-encrypted nonce.
   If the decrypted nonce matches the nonce value originally sent to the host, then the host is authenticated by the access point.

The WEP data encryption algorithm is illustrated in Figure 8.30. A secret 40-bit symmetric key, KS, is assumed to be known by both a host and the access point. In addition, a 24-bit Initialization Vector (IV) is appended to the 40-bit key to create a 64-bit key that will be used to encrypt a single frame. The IV will change from one frame to another, and hence each frame will be encrypted with a different 64-bit key.

Figure 8.30 802.11 WEP protocol

Encryption is performed as follows. First a 4-byte CRC value (see Section 6.2) is computed for the data payload. The payload and the four CRC bytes are then encrypted using the RC4 stream cipher. We will not cover the details of RC4 here (see \[Schneier 1995\] and \[Edney 2003\] for details). For our purposes, it is enough to know that when presented with a key value (in this case, the 64-bit (KS, IV) key), the RC4 algorithm produces a stream of key values, k_1^IV, k_2^IV, k_3^IV, ..., that are used to encrypt the data and CRC value in a frame. For practical purposes, we can think of these operations being performed a byte at a time. Encryption is performed by XOR-ing the ith byte of data, d_i, with the ith key, k_i^IV, in the stream of key values generated by the (KS, IV) pair to produce the ith byte of ciphertext, c_i:

c_i = d_i ⊕ k_i^IV

The IV value changes from one frame to the next and is included in plaintext in the header of each WEP-encrypted 802.11 frame, as shown in Figure 8.30. The receiver takes the secret 40-bit symmetric key that it shares with the sender, appends the IV, and uses the resulting 64-bit key (which is identical to the key used by the sender to perform encryption) to decrypt the frame:

d_i = c_i ⊕ k_i^IV

Proper use of the RC4 algorithm requires that the same 64-bit key value never be used more than once. Recall that the WEP key changes on a frame-by-frame basis. For a given KS (which changes rarely, if ever), this means that there are only 2^24 unique keys. If these keys are chosen randomly, we can show
Recall that the WEP key changes on a frame-by-frame -basis. For a given KS (which changes rarely, if ever), this means that -there are only 224 unique keys. If these keys are chosen randomly, we -can show - -\[Edney 2003\] that the probability of having chosen the same IV value -(and hence used the same 64-bit key) is more than 99 percent after only -12,000 frames. With 1 Kbyte frame sizes and a data transmission rate of -11 Mbps, only a few seconds are needed before 12,000 frames are -transmitted. Furthermore, since the IV is transmitted in plaintext in -the frame, an eavesdropper will know whenever a duplicate IV value is -used. To see one of the several problems that occur when a duplicate key -is used, consider the following chosen-plaintext attack taken by Trudy -against Alice. Suppose that Trudy (possibly using IP spoofing) sends a -request (for example, an HTTP or FTP request) to Alice to transmit a -file with known content, d1, d2, d3, d4,.... Trudy also observes the -encrypted data c1, c2, c3, c4,.... Since di=ci⊕kiIV, if we XOR ci with -each side of this equality we have di⊕ci=kiIV With this relationship, -Trudy can use the known values of di and ci to compute kiIV. The next -time Trudy sees the same value of IV being used, she will know the key -sequence k1IV,k2IV,k3IV,... and will thus be able to decrypt the -encrypted message. There are several additional security concerns with -WEP as well. \[Fluhrer 2001\] described an attack exploiting a known -weakness in RC4 when certain weak keys are chosen. \[Stubblefield 2002\] -discusses efficient ways to implement and exploit this attack. Another -concern with WEP involves the CRC bits shown in Figure 8.30 and -transmitted in the 802.11 frame to detect altered bits in the payload. 
However, an attacker who changes the encrypted content (e.g., substituting gibberish for the original encrypted data), computes a CRC over the substituted gibberish, and places the CRC into a WEP frame can produce an 802.11 frame that will be accepted by the receiver. What is needed here are message integrity techniques such as those we studied in Section 8.3 to detect content tampering or substitution. For more details of WEP security, see \[Edney 2003; Wright 2015\] and the references therein.

8.8.2 IEEE 802.11i

Soon after the 1999 release of IEEE 802.11, work began on developing a new and improved version of 802.11 with stronger security mechanisms. The new standard, known as 802.11i, underwent final ratification in 2004. As we'll see, while WEP provided relatively weak encryption, only a single way to perform authentication, and no key distribution mechanisms, IEEE 802.11i provides for much stronger forms of encryption, an extensible set of authentication mechanisms, and a key distribution mechanism. In the following, we present an overview of 802.11i; an excellent (streaming audio) technical overview of 802.11i is \[TechOnline 2012\]. Figure 8.31 overviews the 802.11i framework. In addition to the wireless client and access point,
Although the - client and AP are already exchanging messages, the client has not - yet been authenticated nor does it have an encryption key, and so - several more steps will be required before the client can - communicate with an arbitrary remote host over the wireless channel. - -2. Mutual authentication and Master Key (MK) generation. Authentication - takes place between the wireless client and the authentication - server. In this phase, the access point acts essentially as a relay, - forwarding messages between the client and the authentication - server. The Extensible Authentication Protocol (EAP) \[RFC 3748\] - defines the end-to-end message formats used in a simple - request/response mode of interaction between the client and - authentication server. As shown in Figure 8.32, EAP messages are - encapsulated using EAPoL (EAP over LAN, \[IEEE 802.1X\]) and sent - over the 802.11 wireless link. These EAP messages - -are then decapsulated at the access point, and then re-encapsulated -using the RADIUS protocol for transmission over UDP/IP to the -authentication server. While - -Figure 8.32 EAP is an end-to-end protocol. EAP messages are encapsulated -using EAPoL over the wireless link between the client and the access -point, and using RADIUS over UDP/IP between the access point and the -authentication server - -the RADIUS server and protocol \[RFC 2865\] are not required by the -802.11i protocol, they are de facto standard components for 802.11i. The -recently standardized DIAMETER protocol \[RFC 3588\] is likely to -replace RADIUS in the near future. With EAP, the authentication server -can choose one of a number of ways to perform authentication. While -802.11i does not mandate a particular authentication method, the EAPTLS -authentication scheme \[RFC 5216\] is often used. 
   EAP-TLS uses public key techniques (including nonce encryption and message digests) similar to those we studied in Section 8.3 to allow the client and the authentication server to mutually authenticate each other, and to derive a Master Key (MK) that is known to both parties.

3. Pairwise Master Key (PMK) generation. The MK is a shared secret known only to the client and the authentication server, which they each use to generate a second key, the Pairwise Master Key (PMK). The authentication server then sends the PMK to the AP. This is where we wanted to be! The client and AP now have a shared key (recall that in WEP, the problem of key distribution was not addressed at all) and have mutually authenticated each other. They're just about ready to get down to business.

4. Temporal Key (TK) generation. With the PMK, the wireless client and AP can now generate additional keys that will be used for communication. Of particular interest is the Temporal Key (TK), which will be used to perform the link-level encryption of data sent over the wireless link and to an arbitrary remote host. 802.11i provides several forms of encryption, including an AES-based encryption scheme and a strengthened version of WEP encryption.

8.9 Operational Security: Firewalls and Intrusion Detection Systems

We've seen throughout this chapter that the Internet is not a very safe place---bad guys are out there, wreaking all sorts of havoc. Given the hostile nature of the Internet, let's now consider an organization's network and the network administrator who administers it. From a network administrator's point of view, the world divides quite neatly into two camps---the good guys (who belong to the organization's network, and who should be able to access resources inside the organization's network in a relatively unconstrained manner) and the bad guys (everyone else, whose access to network resources must be carefully scrutinized).
In many organizations, ranging from medieval castles to modern corporate office buildings, there is a single point of entry/exit where both good guys and bad guys entering and leaving the organization are security-checked. In a castle, this was done at a gate at one end of the drawbridge; in a corporate building, this is done at the security desk. In a computer network, when traffic entering/leaving a network is security-checked, logged, dropped, or forwarded, it is done by operational devices known as firewalls, intrusion detection systems (IDSs), and intrusion prevention systems (IPSs).

8.9.1 Firewalls

A firewall is a combination of hardware and software that isolates an organization's internal network from the Internet at large, allowing some packets to pass and blocking others. A firewall allows a network administrator to control access between the outside world and resources within the administered network by managing the traffic flow to and from these resources. A firewall has three goals:

- All traffic from outside to inside, and vice versa, passes through the firewall. Figure 8.33 shows a firewall, sitting squarely at the boundary between the administered network and the rest of the Internet. While large organizations may use multiple levels of firewalls or distributed firewalls \[Skoudis 2006\], locating a firewall at a single access point to the network, as shown in Figure 8.33, makes it easier to manage and enforce a security-access policy.
- Only authorized traffic, as defined by the local security policy, will be allowed to pass. With all traffic entering and leaving the institutional network passing through the firewall, the firewall can restrict access to authorized traffic.
- The firewall itself is immune to penetration. The firewall itself is a device connected to the network.
If not designed or installed properly, it can be compromised, in which case it provides only a false sense of security (which is worse than no firewall at all!).

Figure 8.33 Firewall placement between the administered network and the outside world

Cisco and Check Point are two of the leading firewall vendors today. You can also easily create a firewall (packet filter) from a Linux box using iptables (public-domain software that is normally shipped with Linux). Furthermore, as discussed in Chapters 4 and 5, firewalls are now frequently implemented in routers and controlled remotely using SDNs. Firewalls can be classified in three categories: traditional packet filters, stateful filters, and application gateways. We'll cover each of these in turn in the following subsections.

Traditional Packet Filters

As shown in Figure 8.33, an organization typically has a gateway router connecting its internal network to its ISP (and hence to the larger public Internet). All traffic leaving and entering the internal network passes through this router, and it is at this router where packet filtering occurs. A packet filter examines each datagram in isolation, determining whether the datagram should be allowed to pass or should be dropped based on administrator-specific rules. Filtering decisions are typically based on:

- IP source or destination address
- Protocol type in IP datagram field: TCP, UDP, ICMP, OSPF, and so on
- TCP or UDP source and destination port
- TCP flag bits: SYN, ACK, and so on
- ICMP message type
- Different rules for datagrams leaving and entering the network
- Different rules for the different router interfaces

A network administrator configures the firewall based on the policy of the organization. The policy may take user productivity and bandwidth usage into account as well as the security concerns of an organization. Table 8.5 lists a number of possible policies an organization may have, and how they would be addressed with a packet filter.

Table 8.5 Policies and corresponding filtering rules for an organization's network 130.207/16 with Web server at 130.207.244.203

| Policy | Firewall Setting |
|---|---|
| No outside Web access. | Drop all outgoing packets to any IP address, port 80. |
| No incoming TCP connections, except those for organization's public Web server only. | Drop all incoming TCP SYN packets to any IP except 130.207.244.203, port 80. |
| Prevent Web-radios from eating up the available bandwidth. | Drop all incoming UDP packets---except DNS packets. |
| Prevent your network from being used for a smurf DoS attack. | Drop all ICMP ping packets going to a "broadcast" address (e.g., 130.207.255.255). |
| Prevent your network from being tracerouted. | Drop all outgoing ICMP TTL expired traffic. |

For example, if the organization doesn't want any incoming TCP connections except those for its public Web server, it can block all incoming TCP SYN segments except TCP SYN segments with destination port 80 and the destination IP address corresponding to the Web server. If the organization doesn't want its users to monopolize access bandwidth with Internet radio applications, it can block all noncritical UDP traffic (since Internet radio is often sent over UDP). If the organization doesn't want its internal network to be mapped (tracerouted) by an outsider, it can block all ICMP TTL expired messages leaving the organization's network. A filtering policy can be based on a combination of addresses and port numbers. For example, a filtering router could forward all Telnet datagrams (those with a port number of 23) except those going to and coming from a list of specific IP addresses. This policy permits Telnet connections to and from hosts on the allowed list.
Unfortunately, basing the policy on external addresses provides no protection against datagrams that have had their source addresses spoofed.

Filtering can also be based on whether or not the TCP ACK bit is set. This trick is quite useful if an organization wants to let its internal clients connect to external servers but wants to prevent external clients from connecting to internal servers.

Table 8.6 An access control list for a router interface

| action | source address | dest address | protocol | source port | dest port | flag bit |
|---|---|---|---|---|---|---|
| allow | 222.22/16 | outside of 222.22/16 | TCP | > 1023 | 80 | any |
| allow | outside of 222.22/16 | 222.22/16 | TCP | 80 | > 1023 | ACK |
| allow | 222.22/16 | outside of 222.22/16 | UDP | > 1023 | 53 | --- |
| allow | outside of 222.22/16 | 222.22/16 | UDP | 53 | > 1023 | --- |
| deny | all | all | all | all | all | all |

Recall from Section 3.5 that the first segment in every TCP connection has the ACK bit set to 0, whereas all the other segments in the connection have the ACK bit set to 1. Thus, if an organization wants to prevent external clients from initiating connections to internal servers, it simply filters all incoming segments with the ACK bit set to 0. This policy kills all TCP connections originating from the outside, but permits connections originating internally. Firewall rules are implemented in routers with access control lists, with each router interface having its own list. An example of an access control list for an organization 222.22/16 is shown in Table 8.6. This access control list is for an interface that connects the router to the organization's external ISPs. Rules are applied to each datagram that passes through the interface from top to bottom.
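The top-to-bottom, first-match evaluation of an access control list like Table 8.6 can be sketched in a few lines of Python. This is a simplified illustrative model, not a real router implementation; "inside" stands for the 222.22/16 prefix, and the rule encoding is my own.

```python
# A minimal first-match ACL evaluator modeling Table 8.6 (illustrative sketch).
# Each rule: (action, src_inside, dst_inside, proto, src_port, dst_port, ack_required)
# where None means "any" and "high" means "> 1023".

RULES = [
    ("allow", True,  False, "TCP", "high", 80,     None),   # outbound Web requests
    ("allow", False, True,  "TCP", 80,     "high", True),   # replies: ACK must be set
    ("allow", True,  False, "UDP", "high", 53,     None),   # outbound DNS queries
    ("allow", False, True,  "UDP", 53,     "high", None),   # DNS replies
    ("deny",  None,  None,  None,  None,   None,   None),   # default deny
]

def port_matches(rule_port, port):
    if rule_port is None:
        return True
    if rule_port == "high":          # "> 1023" in the table
        return port > 1023
    return rule_port == port

def filter_packet(src_inside, dst_inside, proto, src_port, dst_port, ack):
    """Apply rules top to bottom; the first matching rule decides."""
    for action, r_src, r_dst, r_proto, r_sport, r_dport, r_ack in RULES:
        if r_src is not None and r_src != src_inside:
            continue
        if r_dst is not None and r_dst != dst_inside:
            continue
        if r_proto is not None and r_proto != proto:
            continue
        if not port_matches(r_sport, src_port) or not port_matches(r_dport, dst_port):
            continue
        if r_ack and not ack:
            continue
        return action
    return "deny"

# Outbound Web request from an internal client is allowed.
print(filter_packet(True, False, "TCP", 4096, 80, ack=False))   # allow
# The server's reply (source port 80, ACK set) is allowed back in.
print(filter_packet(False, True, "TCP", 80, 4096, ack=True))    # allow
# An external SYN (ACK = 0) trying to open a connection inward is denied.
print(filter_packet(False, True, "TCP", 5555, 80, ack=False))   # deny
```

Note how the final catch-all deny rule implements the "block everything not explicitly allowed" posture of Table 8.6.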
The first two rules together allow internal users to surf the Web: The first rule allows any TCP packet with destination port 80 to leave the organization's network; the second rule allows any TCP packet with source port 80 and the ACK bit set to enter the organization's network. Note that if an external source attempts to establish a TCP connection with an internal host, the connection will be blocked, even if the source or destination port is 80. The second two rules together allow DNS packets to enter and leave the organization's network. In summary, this rather restrictive access control list blocks all traffic except Web traffic initiated from within the organization and DNS traffic. \[CERT Filtering 2012\] provides a list of recommended port/protocol packet filterings to avoid a number of well-known security holes in existing network applications.

Stateful Packet Filters

In a traditional packet filter, filtering decisions are made on each packet in isolation. Stateful filters actually track TCP connections, and use this knowledge to make filtering decisions.

Table 8.7 Connection table for stateful filter

| source address | dest address | source port | dest port |
|---|---|---|---|
| 222.22.1.7 | 37.96.87.123 | 12699 | 80 |
| 222.22.93.2 | 199.1.205.23 | 37654 | 80 |
| 222.22.65.143 | 203.77.240.43 | 48712 | 80 |

To understand stateful filters, let's reexamine the access control list in Table 8.6. Although rather restrictive, the access control list in Table 8.6 nevertheless allows any packet arriving from the outside with ACK = 1 and source port 80 to get through the filter. Such packets could be used by attackers in attempts to crash internal systems with malformed packets, carry out denial-of-service attacks, or map the internal network. The naive solution is to block TCP ACK packets as well, but such an approach would prevent the organization's internal users from surfing the Web.
Stateful filters solve this problem by tracking all ongoing TCP connections in a connection table. This is possible because the firewall can observe the beginning of a new connection by observing a three-way handshake (SYN, SYNACK, and ACK); and it can observe the end of a connection when it sees a FIN packet for the connection. The firewall can also (conservatively) assume that the connection is over when it hasn't seen any activity over the connection for, say, 60 seconds. An example connection table for a firewall is shown in Table 8.7. This connection table indicates that there are currently three ongoing TCP connections, all of which have been initiated from within the organization. Additionally, the stateful filter includes a new column, "check connection," in its access control list, as shown in Table 8.8. Note that Table 8.8 is identical to the access control list in Table 8.6, except now it indicates that the connection should be checked for two of the rules. Let's walk through some examples to see how the connection table and the extended access control list work hand-in-hand. Suppose an attacker attempts to send a malformed packet into the organization's network by sending a datagram with TCP source port 80 and with the ACK flag set. Further suppose that this packet has destination port number 12543 and source IP address 150.23.23.155. When this packet reaches the firewall, the firewall checks the access control list in Table 8.8, which indicates that the connection table must also be checked before permitting this packet to enter the organization's network. The firewall duly checks the connection table, sees that this packet is not part of an ongoing TCP connection, and rejects the packet. As a second example, suppose that an internal user wants to surf an external Web site. Because this user first sends a TCP SYN segment, the user's TCP connection gets recorded in the connection table.
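The connection-table bookkeeping just described can be sketched as a toy model in Python (illustrative only; the table is keyed by the 4-tuple of Table 8.7, and the addresses and ports are taken from the examples in the text):

```python
# Toy stateful-filter sketch: a connection table keyed by the 4-tuple
# (src addr, dst addr, src port, dst port), as in Table 8.7.

connection_table = set()

def record_outbound_syn(src, dst, sport, dport):
    """An internal SYN creates a connection-table entry (outbound direction)."""
    connection_table.add((src, dst, sport, dport))

def check_inbound(src, dst, sport, dport):
    """An inbound packet that reached a 'check connection' rule is admitted
    only if it belongs to a connection initiated from the inside. The table
    stores the outbound direction, so we look up the reversed 4-tuple."""
    return (dst, src, dport, sport) in connection_table

# Internal host 222.22.1.7 opens a connection to a Web server.
record_outbound_syn("222.22.1.7", "37.96.87.123", 12699, 80)

# The server's reply matches the reversed 4-tuple: admitted.
print(check_inbound("37.96.87.123", "222.22.1.7", 80, 12699))   # True

# An attacker's crafted ACK packet (source port 80, no matching entry): rejected.
print(check_inbound("150.23.23.155", "222.22.1.7", 80, 12543))  # False
```

A real stateful filter would also remove entries on FIN or after an inactivity timeout, as described above.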
Table 8.8 Access control list for stateful filter

| action | source address | dest address | protocol | source port | dest port | flag bit | check connection |
|---|---|---|---|---|---|---|---|
| allow | 222.22/16 | outside of 222.22/16 | TCP | > 1023 | 80 | any | |
| allow | outside of 222.22/16 | 222.22/16 | TCP | 80 | > 1023 | ACK | X |
| allow | 222.22/16 | outside of 222.22/16 | UDP | > 1023 | 53 | --- | |
| allow | outside of 222.22/16 | 222.22/16 | UDP | 53 | > 1023 | --- | X |
| deny | all | all | all | all | all | all | |

When the Web server sends back packets (with the ACK bit necessarily set), the firewall checks the table and sees that a corresponding connection is in progress. The firewall will thus let these packets pass, thereby not interfering with the internal user's Web surfing activity.

Application Gateway

In the examples above, we have seen that packet-level filtering allows an organization to perform coarse-grain filtering on the basis of the contents of IP and TCP/UDP headers, including IP addresses, port numbers, and acknowledgment bits. But what if an organization wants to provide a Telnet service to a restricted set of internal users (as opposed to IP addresses)? And what if the organization wants such privileged users to authenticate themselves first before being allowed to create Telnet sessions to the outside world? Such tasks are beyond the capabilities of traditional and stateful filters. Indeed, information about the identity of the internal users is application-layer data and is not included in the IP/TCP/UDP headers. To have finer-level security, firewalls must combine packet filters with application gateways. Application gateways look beyond the IP/TCP/UDP headers and make policy decisions based on application data. An application gateway is an application-specific server through which all application data (inbound and outbound) must pass.
Multiple application gateways can run on the same host, but each gateway is a separate server with its own processes. To get some insight into application gateways, let's design a firewall that allows only a restricted set of internal users to Telnet outside and prevents all external clients from Telneting inside. Such a policy can be accomplished by implementing a combination of a packet filter (in a router) and a Telnet application gateway, as shown in Figure 8.34. The router's filter is configured to block all Telnet connections except those that originate from the IP address of the application gateway. Such a filter configuration forces all outbound Telnet connections to pass through the application gateway.

Figure 8.34 Firewall consisting of an application gateway and a filter

Consider now an internal user who wants to Telnet to the outside world. The user must first set up a Telnet session with the application gateway. An application running in the gateway, which listens for incoming Telnet sessions, prompts the user for a user ID and password. When the user supplies this information, the application gateway checks to see if the user has permission to Telnet to the outside world. If not, the Telnet connection from the internal user to the gateway is terminated by the gateway. If the user has permission, then the gateway (1) prompts the user for the host name of the external host to which the user wants to connect, (2) sets up a Telnet session between the gateway and the external host, and (3) relays to the external host all data arriving from the user, and relays to the user all data arriving from the external host. Thus, the Telnet application gateway not only performs user authorization but also acts as a Telnet server and a Telnet client, relaying information between the user and the remote Telnet server. Note that the filter will permit step 2 because the gateway initiates the Telnet connection to the outside world.
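The authorization check and the bidirectional relay of step 3 can be sketched as a generic TCP relay using Python sockets. This is a minimal illustration, not a production gateway: the user list is hypothetical, the password check is a stub, and real gateways add logging, timeouts, and proper Telnet option handling.

```python
import socket
import threading

# Hypothetical set of users permitted to Telnet out (illustrative only).
AUTHORIZED_USERS = {"alice", "bob"}

def user_may_connect(user_id, password):
    """Stand-in for the gateway's authorization step. A real gateway would
    verify the password against a credential store."""
    return user_id in AUTHORIZED_USERS and password != ""

def pump(src, dst):
    """Copy bytes one way until the source side closes."""
    while True:
        try:
            data = src.recv(4096)
        except OSError:
            break
        if not data:
            break
        try:
            dst.sendall(data)
        except OSError:
            break
    try:
        dst.close()
    except OSError:
        pass

def relay(user_sock, remote_sock):
    """Step 3 of the gateway: relay all data between the internal user and
    the external host, in both directions, until either side closes."""
    t = threading.Thread(target=pump, args=(remote_sock, user_sock))
    t.start()
    pump(user_sock, remote_sock)
    t.join()
```

With `user_may_connect` gating access and `relay` pasting the two sessions together, the gateway acts as a server toward the internal user and as a client toward the external host, exactly as described above.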
CASE HISTORY

ANONYMITY AND PRIVACY

Suppose you want to visit a controversial Web site (for example, a political activist site) and you (1) don't want to reveal your IP address to the Web site, (2) don't want your local ISP (which may be your home or office ISP) to know that you are visiting the site, and (3) don't want your local ISP to see the data you are exchanging with the site. If you use the traditional approach of connecting directly to the Web site without any encryption, you fail on all three counts. Even if you use SSL, you fail on the first two counts: Your source IP address is presented to the Web site in every datagram you send; and the destination address of every packet you send can easily be sniffed by your local ISP. To obtain privacy and anonymity, you can instead use a combination of a trusted proxy server and SSL, as shown in Figure 8.35. With this approach, you first make an SSL connection to the trusted proxy. You then send, into this SSL connection, an HTTP request for a page at the desired site. When the proxy receives the SSL-encrypted HTTP request, it decrypts the request and forwards the cleartext HTTP request to the Web site. The Web site then responds to the proxy, which in turn forwards the response to you over SSL. Because the Web site only sees the IP address of the proxy, and not your client's address, you are indeed obtaining anonymous access to the Web site. And because all traffic between you and the proxy is encrypted, your local ISP cannot invade your privacy by logging the site you visited or recording the data you are exchanging. Many companies today (such as proxify.com) make available such proxy services. Of course, in this solution, your proxy knows everything: It knows your IP address and the IP address of the site you're surfing; and it can see all the traffic in cleartext exchanged between you and the Web site.
Such a solution, therefore, is only as good as the trustworthiness of the proxy. A more robust approach, taken by the TOR anonymizing and privacy service, is to route your traffic through a series of non-colluding proxy servers \[TOR 2016\]. In particular, TOR allows independent individuals to contribute proxies to its proxy pool. When a user connects to a server using TOR, TOR randomly chooses (from its proxy pool) a chain of three proxies and routes all traffic between client and server over the chain. In this manner, assuming the proxies do not collude, no one knows that communication took place between your IP address and the target Web site. Furthermore, although cleartext is sent between the last proxy and the server, the last proxy doesn't know what IP address is sending and receiving the cleartext.

Figure 8.35 Providing anonymity and privacy with a proxy

Internal networks often have multiple application gateways, for example, gateways for Telnet, HTTP, FTP, and e-mail. In fact, an organization's mail server (see Section 2.3) and Web cache are application gateways. Application gateways do not come without their disadvantages. First, a different application gateway is needed for each application. Second, there is a performance penalty to be paid, since all data will be relayed via the gateway. This becomes a concern particularly when multiple users or applications are using the same gateway machine. Finally, the client software must know how to contact the gateway when the user makes a request, and must know how to tell the application gateway what external server to connect to.

8.9.2 Intrusion Detection Systems

We've just seen that a packet filter (traditional and stateful) inspects IP, TCP, UDP, and ICMP header fields when deciding which packets to let pass through the firewall.
However, to detect many attack types, we need to perform deep packet inspection, that is, look beyond the header fields and into the actual application data that the packets carry. As we saw in Section 8.9.1, application gateways often do deep packet inspection. But an application gateway only does this for a specific application. Clearly, there is a niche for yet another device---a device that not only examines the headers of all packets passing through it (like a packet filter), but also performs deep packet inspection (unlike a packet filter). When such a device observes a suspicious packet, or a suspicious series of packets, it could prevent those packets from entering the organizational network. Or, because the activity is only deemed as suspicious, the device could let the packets pass, but send alerts to a network administrator, who can then take a closer look at the traffic and take appropriate actions. A device that generates alerts when it observes potentially malicious traffic is called an intrusion detection system (IDS). A device that filters out suspicious traffic is called an intrusion prevention system (IPS). In this section we study both systems---IDS and IPS---together, since the most interesting technical aspect of these systems is how they detect suspicious traffic (and not whether they send alerts or drop packets). We will henceforth collectively refer to IDS systems and IPS systems as IDS systems. An IDS can be used to detect a wide range of attacks, including network mapping (emanating, for example, from nmap), port scans, TCP stack scans, DoS bandwidth-flooding attacks, worms and viruses, OS vulnerability attacks, and application vulnerability attacks. (See Section 1.6 for a survey of network attacks.) Today, thousands of organizations employ IDS systems. Many of these deployed systems are proprietary, marketed by Cisco, Check Point, and other security equipment vendors.
But many of the deployed IDS systems are public-domain systems, such as the immensely popular Snort IDS system (which we'll discuss shortly). An organization may deploy one or more IDS sensors in its organizational network. Figure 8.36 shows an organization that has three IDS sensors. When multiple sensors are deployed, they typically work in concert, sending information about suspicious traffic activity to a central IDS processor, which collects and integrates the information and sends alarms to network administrators when deemed appropriate.

Figure 8.36 An organization deploying a filter, an application gateway, and IDS sensors

In Figure 8.36, the organization has partitioned its network into two regions: a high-security region, protected by a packet filter and an application gateway and monitored by IDS sensors; and a lower-security region---referred to as the demilitarized zone (DMZ)---which is protected only by the packet filter, but also monitored by IDS sensors. Note that the DMZ includes the organization's servers that need to communicate with the outside world, such as its public Web server and its authoritative DNS server. You may be wondering at this stage, why multiple IDS sensors? Why not just place one IDS sensor just behind the packet filter (or even integrated with the packet filter) in Figure 8.36? We will soon see that an IDS not only needs to do deep packet inspection, but must also compare each passing packet with tens of thousands of "signatures"; this can be a significant amount of processing, particularly if the organization receives gigabits/sec of traffic from the Internet. By placing the IDS sensors further downstream, each sensor sees only a fraction of the organization's traffic, and can more easily keep up. Nevertheless, high-performance IDS and IPS systems are available today, and many organizations can actually get by with just one sensor located near its access router.
IDS systems are broadly classified as either signature-based systems or anomaly-based systems. A signature-based IDS maintains an extensive database of attack signatures. Each signature is a set of rules pertaining to an intrusion activity. A signature may simply be a list of characteristics about a single packet (e.g., source and destination port numbers, protocol type, and a specific string of bits in the packet payload), or may relate to a series of packets. The signatures are normally created by skilled network security engineers who research known attacks. An organization's network administrator can customize the signatures or add its own to the database. Operationally, a signature-based IDS sniffs every packet passing by it, comparing each sniffed packet with the signatures in its database. If a packet (or series of packets) matches a signature in the database, the IDS generates an alert. The alert could be sent to the network administrator in an e-mail message, could be sent to the network management system, or could simply be logged for future inspection. Signature-based IDS systems, although widely deployed, have a number of limitations. Most importantly, they require previous knowledge of the attack to generate an accurate signature. In other words, a signature-based IDS is completely blind to new attacks that have yet to be recorded. Another disadvantage is that even if a signature is matched, it may not be the result of an attack, so that a false alarm is generated. Finally, because every packet must be compared with an extensive collection of signatures, the IDS can become overwhelmed with processing and actually fail to detect many malicious packets. An anomaly-based IDS creates a traffic profile as it observes traffic in normal operation. It then looks for packet streams that are statistically unusual, for example, an inordinate percentage of ICMP packets or a sudden exponential growth in port scans and ping sweeps.
The great thing about anomaly-based IDS systems is that they don't rely on previous knowledge about existing attacks---that is, they can potentially detect new, undocumented attacks. On the other hand, it is an extremely challenging problem to distinguish between normal traffic and statistically unusual traffic. To date, most IDS deployments are primarily signature-based, although some include some anomaly-based features.

Snort

Snort is a public-domain, open source IDS with hundreds of thousands of existing deployments \[Snort 2012; Koziol 2003\]. It can run on Linux, UNIX, and Windows platforms. It uses the generic sniffing interface libpcap, which is also used by Wireshark and many other packet sniffers. It can easily handle 100 Mbps of traffic; for installations with gigabit/sec traffic rates, multiple Snort sensors may be needed. To gain some insight into Snort, let's take a look at an example of a Snort signature:

    alert icmp $EXTERNAL_NET any -> $HOME_NET any (msg:"ICMP PING NMAP"; dsize: 0; itype: 8;)

This signature is matched by any ICMP packet that enters the organization's network ( \$HOME_NET ) from the outside ( \$EXTERNAL_NET ), is of type 8 (ICMP ping), and has an empty payload (dsize = 0). Since nmap (see Section 1.6) generates ping packets with these specific characteristics, this signature is designed to detect nmap ping sweeps. When a packet matches this signature, Snort generates an alert that includes the message "ICMP PING NMAP". Perhaps what is most impressive about Snort is the vast community of users and security experts that maintain its signature database. Typically within a few hours of a new attack, the Snort community writes and releases an attack signature, which is then downloaded by the hundreds of thousands of Snort deployments distributed around the world.
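The matching logic of this one signature can be modeled in a few lines of Python. This is an illustrative simplification (real Snort compiles thousands of rules into efficient matchers); the concrete value chosen for \$HOME_NET is an assumption borrowed from the ACL examples earlier in the section.

```python
# Model of the "ICMP PING NMAP" rule above: alert on any ICMP echo request
# (itype 8) with an empty payload (dsize 0) arriving from outside the home
# network. $HOME_NET is assumed to be 222.22.0.0/16 here for illustration.

from ipaddress import ip_address, ip_network

HOME_NET = ip_network("222.22.0.0/16")   # assumed value for $HOME_NET

def matches_nmap_ping(src_ip, dst_ip, proto, icmp_type, payload):
    return (proto == "icmp"
            and ip_address(src_ip) not in HOME_NET      # $EXTERNAL_NET
            and ip_address(dst_ip) in HOME_NET          # $HOME_NET
            and icmp_type == 8                          # itype: 8 (echo request)
            and len(payload) == 0)                      # dsize: 0

def alert(src_ip, dst_ip, proto, icmp_type, payload=b""):
    """Return the alert message if the packet matches, else None."""
    if matches_nmap_ping(src_ip, dst_ip, proto, icmp_type, payload):
        return "ICMP PING NMAP"
    return None

# An empty echo request from outside triggers the alert...
print(alert("150.23.23.155", "222.22.1.7", "icmp", 8))           # ICMP PING NMAP
# ...while an ordinary ping carrying payload bytes does not.
print(alert("150.23.23.155", "222.22.1.7", "icmp", 8, b"abcd"))  # None
```

A deployed sensor would run a check like this against every sniffed packet, which is why per-sensor traffic volume matters, as discussed above.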
Moreover, using the Snort signature syntax, network administrators can tailor the signatures to their own organization's needs by either modifying existing signatures or creating entirely new ones.

8.10 Summary

In this chapter, we've examined the various mechanisms that our secret lovers, Bob and Alice, can use to communicate securely. We've seen that Bob and Alice are interested in confidentiality (so they alone are able to understand the contents of a transmitted message), end-point authentication (so they are sure that they are talking with each other), and message integrity (so they are sure that their messages are not altered in transit). Of course, the need for secure communication is not confined to secret lovers. Indeed, we saw in Sections 8.5 through 8.8 that security can be used in various layers in a network architecture to protect against bad guys who have a large arsenal of possible attacks at hand. The first part of this chapter presented various principles underlying secure communication. In Section 8.2, we covered cryptographic techniques for encrypting and decrypting data, including symmetric key cryptography and public key cryptography. DES and RSA were examined as specific case studies of these two major classes of cryptographic techniques in use in today's networks. In Section 8.3, we examined two approaches for providing message integrity: message authentication codes (MACs) and digital signatures. The two approaches have a number of parallels. Both use cryptographic hash functions and both techniques enable us to verify the source of the message as well as the integrity of the message itself. One important difference is that MACs do not rely on encryption whereas digital signatures require a public key infrastructure. Both techniques are extensively used in practice, as we saw in Sections 8.5 through 8.8.
Furthermore, digital signatures are used to create digital certificates, which are important for verifying the validity of public keys. In Section 8.4, we examined endpoint authentication and introduced nonces to defend against the replay attack. In Sections 8.5 through 8.8 we examined several security networking protocols that enjoy extensive use in practice. We saw that symmetric key cryptography is at the core of PGP, SSL, IPsec, and wireless security. We saw that public key cryptography is crucial for both PGP and SSL. We saw that PGP uses digital signatures for message integrity, whereas SSL and IPsec use MACs. Now having an understanding of the basic principles of cryptography, and having studied how these principles are actually used, you are in a position to design your own secure network protocols! Armed with the techniques covered in Sections 8.2 through 8.8, Bob and Alice can communicate securely. (One can only hope that they are networking students who have learned this material and can thus avoid having their tryst uncovered by Trudy!) But confidentiality is only a small part of the network security picture. As we learned in Section 8.9, increasingly, the focus in network security has been on securing the network infrastructure against a potential onslaught by the bad guys. In the latter part of this chapter, we thus covered firewalls and IDS systems which inspect packets entering and leaving an organization's network. This chapter has covered a lot of ground, while focusing on the most important topics in modern network security. Readers who desire to dig deeper are encouraged to investigate the references cited in this chapter.
In particular, we recommend \[Skoudis 2006\] for attacks and operational security, \[Kaufman 1995\] for cryptography and how it applies to network security, \[Rescorla 2001\] for an in-depth but readable treatment of SSL, and \[Edney 2003\] for a thorough discussion of 802.11 security, including an insightful investigation into WEP and its flaws.

Homework Problems and Questions

Chapter 8 Review Problems

SECTION 8.1

R1. What are the differences between message confidentiality and message integrity? Can you have confidentiality without integrity? Can you have integrity without confidentiality? Justify your answer.

R2. Internet entities (routers, switches, DNS servers, Web servers, user end systems, and so on) often need to communicate securely. Give three specific example pairs of Internet entities that may want secure communication.

SECTION 8.2

R3. From a service perspective, what is an important difference between a symmetric-key system and a public-key system?

R4. Suppose that an intruder has an encrypted message as well as the decrypted version of that message. Can the intruder mount a ciphertext-only attack, a known-plaintext attack, or a chosen-plaintext attack?

R5. Consider an 8-block cipher. How many possible input blocks does this cipher have? How many possible mappings are there? If we view each mapping as a key, then how many possible keys does this cipher have?

R6. Suppose N people want to communicate with each of N−1 other people using symmetric key encryption. All communication between any two people, i and j, is visible to all other people in this group of N, and no other person in this group should be able to decode their communication. How many keys are required in the system as a whole? Now suppose that public key encryption is used. How many keys are required in this case?

R7. Suppose n=10,000, a=10,023, and b=10,004. Use an identity of modular arithmetic to calculate in your head (a⋅b) mod n.

R8.
Suppose you want to encrypt the message 10101111 by encrypting the decimal number that corresponds to the message. What is the decimal number?

SECTIONS 8.3–8.4

R9. In what way does a hash provide a better message integrity check than a checksum (such as the Internet checksum)?

R10. Can you "decrypt" a hash of a message to get the original message? Explain your answer.

R11. Consider a variation of the MAC algorithm (Figure 8.9) where the sender sends (m, H(m)+s), where H(m)+s is the concatenation of H(m) and s. Is this variation flawed? Why or why not?

R12. What does it mean for a signed document to be verifiable and nonforgeable?

R13. In what way does the public-key encrypted message hash provide a better digital signature than the public-key encrypted message?

R14. Suppose certifier.com creates a certificate for foo.com. Typically, the entire certificate would be encrypted with certifier.com's public key. True or false?

R15. Suppose Alice has a message that she is ready to send to anyone who asks. Thousands of people want to obtain Alice's message, but each wants to be sure of the integrity of the message. In this context, do you think a MAC-based or a digital-signature-based integrity scheme is more suitable? Why?

R16. What is the purpose of a nonce in an end-point authentication protocol?

R17. What does it mean to say that a nonce is a once-in-a-lifetime value? In whose lifetime?

R18. Is the message integrity scheme based on HMAC susceptible to playback attacks? If so, how can a nonce be incorporated into the scheme to remove this susceptibility?

SECTIONS 8.5–8.8

R19. Suppose that Bob receives a PGP message from Alice. How does Bob know for sure that Alice created the message (rather than, say, Trudy)? Does PGP use a MAC for message integrity?

R20. In the SSL record, there is a field for SSL sequence numbers. True or false?

R21. What is the purpose of the random nonces in the SSL handshake?

R22.
Suppose an SSL session employs a block cipher with CBC. True or false: The server sends to the client the IV in the clear.

R23. Suppose Bob initiates a TCP connection to Trudy, who is pretending to be Alice. During the handshake, Trudy sends Bob Alice's certificate. In what step of the SSL handshake algorithm will Bob discover that he is not communicating with Alice?

R24. Consider sending a stream of packets from Host A to Host B using IPsec. Typically, a new SA will be established for each packet sent in the stream. True or false?

R25. Suppose that TCP is being run over IPsec between headquarters and the branch office in Figure 8.28. If TCP retransmits the same packet, then the two corresponding packets sent by R1 will have the same sequence number in the ESP header. True or false?

R26. An IKE SA and an IPsec SA are the same thing. True or false?

R27. Consider WEP for 802.11. Suppose that the data is 10101100 and the keystream is 1111000. What is the resulting ciphertext?

R28. In WEP, an IV is sent in the clear in every frame. True or false?

SECTION 8.9

R29. Stateful packet filters maintain two data structures. Name them and briefly describe what they do.

R30. Consider a traditional (stateless) packet filter. This packet filter may filter packets based on TCP flag bits as well as other header fields. True or false?

R31. In a traditional packet filter, each interface can have its own access control list. True or false?

R32. Why must an application gateway work in conjunction with a router filter to be effective?

R33. Signature-based IDSs and IPSs inspect the payloads of TCP and UDP segments. True or false?

Problems

P1. Using the monoalphabetic cipher in Figure 8.3, encode the message "This is an easy problem." Decode the message "rmij'u uamu xyj."

P2.
Show that Trudy's known-plaintext attack, in which she knows the (ciphertext, plaintext) translation pairs for seven letters, reduces the number of possible substitutions to be checked in the example in Section 8.2.1 by approximately 10^9.

P3. Consider the polyalphabetic system shown in Figure 8.4. Will a chosen-plaintext attack that is able to get the plaintext encoding of the message "The quick brown fox jumps over the lazy dog." be sufficient to decode all messages? Why or why not?

P4. Consider the block cipher in Figure 8.5. Suppose that each block cipher Ti simply reverses the order of the eight input bits (so that, for example, 11110000 becomes 00001111). Further suppose that the 64-bit scrambler does not modify any bits (so that the output value of the mth bit is equal to the input value of the mth bit). (a) With n=3 and the original 64-bit input equal to 10100000 repeated eight times, what is the value of the output? (b) Repeat part (a) but now change the last bit of the original 64-bit input from a 0 to a 1. (c) Repeat parts (a) and (b) but now suppose that the 64-bit scrambler inverts the order of the 64 bits.

P5. Consider the block cipher in Figure 8.5. For a given "key" Alice and Bob would need to keep eight tables, each 8 bits by 8 bits. For Alice (or Bob) to store all eight tables, how many bits of storage are necessary? How does this number compare with the number of bits required for a full-table 64-bit block cipher?

P6. Consider the 3-bit block cipher in Table 8.1. Suppose the plaintext is 100100100. (a) Initially assume that CBC is not used. What is the resulting ciphertext? (b) Suppose Trudy sniffs the ciphertext. Assuming she knows that a 3-bit block cipher without CBC is being employed (but doesn't know the specific cipher), what can she surmise? (c) Now suppose that CBC is used with IV=111. What is the resulting ciphertext?

P7.
(a) Using RSA, choose p=3 and q=11, and encode the word "dog" by encrypting each letter separately. Apply the decryption algorithm to the encrypted version to recover the original plaintext message. (b) Repeat part (a) but now encrypt "dog" as one message m.

P8. Consider RSA with p=5 and q=11.

a. What are n and z?

b. Let e be 3. Why is this an acceptable choice for e?

c. Find d such that de=1 (mod z) and d < 160.

d. Encrypt the message m=8 using the key (n, e). Let c denote the corresponding ciphertext. Show all work. Hint: To simplify the calculations, use the fact: [(a mod n)⋅(b mod n)] mod n = (a⋅b) mod n

P9. In this problem, we explore the Diffie-Hellman (DH) public-key encryption algorithm, which allows two entities to agree on a shared key. The DH algorithm makes use of a large prime number p and another large number g less than p. Both p and g are made public (so that an attacker would know them). In DH, Alice and Bob each independently choose secret keys, SA and SB, respectively. Alice then computes her public key, TA, by raising g to SA and then taking mod p. Bob similarly computes his own public key TB by raising g to SB and then taking mod p. Alice and Bob then exchange their public keys over the Internet. Alice then calculates the shared secret key S by raising TB to SA and then taking mod p. Similarly, Bob calculates the shared key S′ by raising TA to SB and then taking mod p.

a. Prove that, in general, Alice and Bob obtain the same symmetric key, that is, prove S=S′.

b. With p = 11 and g = 2, suppose Alice and Bob choose private keys SA=5 and SB=12, respectively. Calculate Alice's and Bob's public keys, TA and TB. Show all work.

c. Following up on part (b), now calculate S as the shared symmetric key. Show all work.

d. Provide a timing diagram that shows how Diffie-Hellman can be attacked by a man-in-the-middle.
The timing diagram should have three vertical lines, one for Alice, one for Bob, and one for the attacker Trudy.

P10. Suppose Alice wants to communicate with Bob using symmetric key cryptography with a session key KS. In Section 8.2, we learned how public-key cryptography can be used to distribute the session key from Alice to Bob. In this problem, we explore how the session key can be distributed---without public key cryptography---using a key distribution center (KDC). The KDC is a server that shares a unique secret symmetric key with each registered user. For Alice and Bob, denote these keys by KA-KDC and KB-KDC. Design a scheme that uses the KDC to distribute KS to Alice and Bob. Your scheme should use three messages to distribute the session key: a message from Alice to the KDC; a message from the KDC to Alice; and finally a message from Alice to Bob. The first message is KA-KDC(A, B). Using the notation KA-KDC, KB-KDC, KS, A, and B, answer the following questions.

a. What is the second message?

b. What is the third message?

P11. Compute a third message, different from the two messages in Figure 8.8, that has the same checksum as the messages in Figure 8.8.

P12. Suppose Alice and Bob share two secret keys: an authentication key S1 and a symmetric encryption key S2. Augment Figure 8.9 so that both integrity and confidentiality are provided.

P13. In the BitTorrent P2P file distribution protocol (see Chapter 2), the seed breaks the file into blocks, and the peers redistribute the blocks to each other. Without any protection, an attacker can easily wreak havoc in a torrent by masquerading as a benevolent peer and sending bogus blocks to a small subset of peers in the torrent. These unsuspecting peers then redistribute the bogus blocks to other peers, which in turn redistribute the bogus blocks to even more peers.
Thus, it is critical for BitTorrent to have a mechanism that allows a peer to verify the integrity of a block, so that it doesn't redistribute bogus blocks. Assume that when a peer joins a torrent, it initially gets a .torrent file from a fully trusted source. Describe a simple scheme that allows peers to verify the integrity of blocks.

P14. The OSPF routing protocol uses a MAC rather than digital signatures to provide message integrity. Why do you think a MAC was chosen over digital signatures?

P15. Consider our authentication protocol in Figure 8.18 in which Alice authenticates herself to Bob, which we saw works well (i.e., we found no flaws in it). Now suppose that while Alice is authenticating herself to Bob, Bob must authenticate himself to Alice. Give a scenario by which Trudy, pretending to be Alice, can now authenticate herself to Bob as Alice. (Hint: Consider that the sequence of operations of the protocol, one with Trudy initiating and one with Bob initiating, can be arbitrarily interleaved. Pay particular attention to the fact that both Bob and Alice will use a nonce, and that if care is not taken, the same nonce can be used maliciously.)

P16. A natural question is whether we can use a nonce and public key cryptography to solve the end-point authentication problem in Section 8.4. Consider the following natural protocol: (1) Alice sends the message "I am Alice" to Bob. (2) Bob chooses a nonce, R, and sends it to Alice. (3) Alice uses her private key to encrypt the nonce and sends the resulting value to Bob. (4) Bob applies Alice's public key to the received message. Thus, Bob computes R and authenticates Alice.

a. Diagram this protocol, using the notation for public and private keys employed in the textbook.

b. Suppose that certificates are not used. Describe how Trudy can become a "woman-in-the-middle" by intercepting Alice's messages and then pretending to be Alice to Bob.

P17.
Figure 8.19 shows the operations that Alice must perform with PGP to provide confidentiality, authentication, and integrity. Diagram the corresponding operations that Bob must perform on the package received from Alice.

P18. Suppose Alice wants to send an e-mail to Bob. Bob has a public-private key pair (KB+, KB−), and Alice has Bob's certificate. But Alice does not have a public-private key pair. Alice and Bob (and the entire world) share the same hash function H(⋅).

a. In this situation, is it possible to design a scheme so that Bob can verify that Alice created the message? If so, show how with a block diagram for Alice and Bob.

b. Is it possible to design a scheme that provides confidentiality for sending the message from Alice to Bob? If so, show how with a block diagram for Alice and Bob.

P19. Consider the Wireshark output below for a portion of an SSL session.

a. Is Wireshark packet 112 sent by the client or server?

b. What is the server's IP address and port number?

c. Assuming no loss and no retransmissions, what will be the sequence number of the next TCP segment sent by the client?

d. How many SSL records does Wireshark packet 112 contain?

e. Does packet 112 contain a Master Secret or an Encrypted Master Secret or neither?

f. Assuming that the handshake type field is 1 byte and each length field is 3 bytes, what are the values of the first and last bytes of the Master Secret (or Encrypted Master Secret)?

g. The client encrypted handshake message takes into account how many SSL records?

h. The server encrypted handshake message takes into account how many SSL records?

P20. In Section 8.6.1, it is shown that without sequence numbers, Trudy (a woman-in-the-middle) can wreak havoc in an SSL session by interchanging TCP segments. Can Trudy do something similar by deleting a TCP segment? What does she need to do to succeed at the deletion attack? What effect will it have?
(Wireshark screenshot reprinted by permission of the Wireshark Foundation.)

P21. Suppose Alice and Bob are communicating over an SSL session. Suppose an attacker, who does not have any of the shared keys, inserts a bogus TCP segment into a packet stream with correct TCP checksum and sequence numbers (and correct IP addresses and port numbers). Will SSL at the receiving side accept the bogus packet and pass the payload to the receiving application? Why or why not?

P22. The following true/false questions pertain to Figure 8.28.

a. When a host in 172.16.1/24 sends a datagram to an Amazon.com server, the router R1 will encrypt the datagram using IPsec.

b. When a host in 172.16.1/24 sends a datagram to a host in 172.16.2/24, the router R1 will change the source and destination address of the IP datagram.

c. Suppose a host in 172.16.1/24 initiates a TCP connection to a Web server in 172.16.2/24. As part of this connection, all datagrams sent by R1 will have protocol number 50 in the left-most IPv4 header field.

d. Consider sending a TCP segment from a host in 172.16.1/24 to a host in 172.16.2/24. Suppose the acknowledgment for this segment gets lost, so that TCP resends the segment. Because IPsec uses sequence numbers, R1 will not resend the TCP segment.

P23. Consider the example in Figure 8.28. Suppose Trudy is a woman-in-the-middle, who can insert datagrams into the stream of datagrams going from R1 to R2. As part of a replay attack, Trudy sends a duplicate copy of one of the datagrams sent from R1 to R2. Will R2 decrypt the duplicate datagram and forward it into the branch-office network? If not, describe in detail how R2 detects the duplicate datagram.

P24. Consider the following pseudo-WEP protocol. The key is 4 bits and the IV is 2 bits. The IV is appended to the end of the key when generating the keystream. Suppose that the shared secret key is 1010.
The keystreams for the four possible inputs are as follows:

101000: 0010101101010101001011010100100 . . .
101001: 1010011011001010110100100101101 . . .
101010: 0001101000111100010100101001111 . . .
101011: 1111101010000000101010100010111 . . .

Suppose all messages are 8 bits long. Suppose the ICV (integrity check) is 4 bits long, and is calculated by XOR-ing the first 4 bits of data with the last 4 bits of data. Suppose the pseudo-WEP packet consists of three fields: first the IV field, then the message field, and last the ICV field, with some of these fields encrypted.

a. We want to send the message m=10100000 using the IV=11 and using WEP. What will be the values in the three WEP fields?

b. Show that when the receiver decrypts the WEP packet, it recovers the message and the ICV.

c. Suppose Trudy intercepts a WEP packet (not necessarily with the IV=11) and wants to modify it before forwarding it to the receiver. Suppose Trudy flips the first ICV bit. Assuming that Trudy does not know the keystreams for any of the IVs, what other bit(s) must Trudy also flip so that the received packet passes the ICV check?

d. Justify your answer by modifying the bits in the WEP packet in part (a), decrypting the resulting packet, and verifying the integrity check.

P25. Provide a filter table and a connection table for a stateful firewall that is as restrictive as possible but accomplishes the following:

a. Allows all internal users to establish Telnet sessions with external hosts.

b. Allows external users to surf the company Web site at 222.22.0.12.

c. But otherwise blocks all inbound and outbound traffic.

The internal network is 222.22/16. In your solution, suppose that the connection table is currently caching three connections, all from inside to outside. You'll need to invent appropriate IP addresses and port numbers.

P26. Suppose Alice wants to visit the Web site activist.com using a TOR-like service.
This service uses two non-colluding proxy servers, Proxy1 and Proxy2. Alice first obtains the certificates (each containing a public key) for Proxy1 and Proxy2 from some central server. Denote K1+(), K2+(), K1−(), and K2−() for the encryption/decryption with public and private RSA keys.

a. Using a timing diagram, provide a protocol (as simple as possible) that enables Alice to establish a shared session key S1 with Proxy1. Denote S1(m) for encryption/decryption of data m with the shared key S1.

b. Using a timing diagram, provide a protocol (as simple as possible) that allows Alice to establish a shared session key S2 with Proxy2 without revealing her IP address to Proxy2.

c. Assume now that shared keys S1 and S2 are established. Using a timing diagram, provide a protocol (as simple as possible and not using public-key cryptography) that allows Alice to request an html page from activist.com without revealing her IP address to Proxy2 and without revealing to Proxy1 which site she is visiting. Your diagram should end with an HTTP request arriving at activist.com.

Wireshark Lab

In this lab (available from the book Web site), we investigate the Secure Sockets Layer (SSL) protocol. Recall from Section 8.6 that SSL is used for securing a TCP connection, and that it is extensively used in practice for secure Internet transactions. In this lab, we will focus on the SSL records sent over the TCP connection. We will attempt to delineate and classify each of the records, with a goal of understanding the why and how for each record. We investigate the various SSL record types as well as the fields in the SSL messages. We do so by analyzing a trace of the SSL records sent between your host and an e-commerce server.

IPsec Lab

In this lab (available from the book Web site), we will explore how to create IPsec SAs between Linux boxes.
You can do the first part of the lab with two ordinary Linux boxes, each with one Ethernet adapter. But for the second part of the lab, you will need four Linux boxes, two of which have two Ethernet adapters. In the second half of the lab, you will create IPsec SAs using the ESP protocol in tunnel mode. You will do this by first manually creating the SAs, and then by having IKE create the SAs.

AN INTERVIEW WITH...

Steven M. Bellovin

Steven M. Bellovin joined the faculty at Columbia University after many years at the Network Services Research Lab at AT&T Labs Research in Florham Park, New Jersey. His focus is on networks, security, and why the two are incompatible. In 1995, he was awarded the Usenix Lifetime Achievement Award for his work in the creation of Usenet, the first newsgroup exchange network that linked two or more computers and allowed users to share information and join in discussions. Steve is also an elected member of the National Academy of Engineering. He received his BA from Columbia University and his PhD from the University of North Carolina at Chapel Hill.

What led you to specialize in the networking security area?

This is going to sound odd, but the answer is simple: It was fun. My background was in systems programming and systems administration, which leads fairly naturally to security. And I've always been interested in communications, ranging back to part-time systems programming jobs when I was in college. My work on security continues to be motivated by two things---a desire to keep computers useful, which means that their function can't be corrupted by attackers, and a desire to protect privacy.

What was your vision for Usenet at the time that you were developing it? And now?

We originally viewed it as a way to talk about computer science and computer programming around the country, with a lot of local use for administrative matters, for-sale ads, and so on.
In fact, my original prediction was one to two messages per day, from 50–100 sites at the most---ever. But the real growth was in people-related topics, including---but not limited to---human interactions with computers. My favorite newsgroups, over the years, have been things like rec.woodworking, as well as sci.crypt. To some extent, netnews has been displaced by the Web. Were I to start designing it today, it would look very different. But it still excels as a way to reach a very broad audience that is interested in the topic, without having to rely on particular Web sites.

Has anyone inspired you professionally? In what ways?

Professor Fred Brooks---the founder and original chair of the computer science department at the University of North Carolina at Chapel Hill, the manager of the team that developed the IBM S/360 and OS/360, and the author of The Mythical Man-Month---was a tremendous influence on my career. More than anything else, he taught outlook and trade-offs---how to look at problems in the context of the real world (and how much messier the real world is than a theorist would like), and how to balance competing interests in designing a solution. Most computer work is engineering---the art of making the right trade-offs to satisfy many contradictory objectives.

What is your vision for the future of networking and security?

Thus far, much of the security we have has come from isolation. A firewall, for example, works by cutting off access to certain machines and services. But we're in an era of increasing connectivity---it's gotten harder to isolate things. Worse yet, our production systems require far more separate pieces, interconnected by networks. Securing all that is one of our biggest challenges.

What would you say have been the greatest advances in security? How much further do we have to go?

At least scientifically, we know how to do cryptography. That's been a big help.
But most security problems are due to buggy code, and that's a much harder problem. In fact, it's the oldest unsolved problem in computer science, and I think it will remain that way. The challenge is figuring out how to secure systems when we have to build them out of insecure components. We can already do that for reliability in the face of hardware failures; can we do the same for security?

Do you have any advice for students about the Internet and networking security?

Learning the mechanisms is the easy part. Learning how to "think paranoid" is harder. You have to remember that probability distributions don't apply---the attackers can and will find improbable conditions. And the details matter---a lot.

Chapter 9 Multimedia Networking

While lounging in bed or riding buses and subways, people in all corners of the world are currently using the Internet to watch movies and television shows on demand. Internet movie and television distribution companies such as Netflix and Amazon in North America and Youku and Kankan in China have practically become household names. But people are not only watching Internet videos, they are also using sites like YouTube to upload and distribute their own user-generated content, becoming Internet video producers as well as consumers. Moreover, network applications such as Skype, Google Talk, and WeChat (enormously popular in China) allow people not only to make "telephone calls" over the Internet, but also to enhance those calls with video and multi-person conferencing. In fact, we predict that by the end of the current decade most video consumption and voice conversations will take place end-to-end over the Internet, often on wireless devices connected to the Internet via cellular and WiFi access networks. Traditional telephony and broadcast television are quickly becoming obsolete. We begin this chapter with a taxonomy of multimedia applications in Section 9.1.
We'll see that a multimedia application can be classified as either streaming stored audio/video, conversational voice/video-over-IP, or streaming live audio/video. We'll see that each of these classes of applications has its own unique service requirements that differ significantly from those of traditional elastic applications such as e-mail, Web browsing, and remote login. In Section 9.2, we'll examine video streaming in some detail. We'll explore many of the underlying principles behind video streaming, including client buffering, prefetching, and adapting video quality to available bandwidth. In Section 9.3, we investigate conversational voice and video, which, unlike elastic applications, are highly sensitive to end-to-end delay but can tolerate occasional loss of data. Here we'll examine how techniques such as adaptive playout, forward error correction, and error concealment can mitigate network-induced packet loss and delay. We'll also examine Skype as a case study. In Section 9.4, we'll study RTP and SIP, two popular protocols for real-time conversational voice and video applications. In Section 9.5, we'll investigate mechanisms within the network that can be used to distinguish one class of traffic (e.g., delay-sensitive applications such as conversational voice) from another (e.g., elastic applications such as browsing Web pages), and provide differentiated service among multiple classes of traffic.

9.1 Multimedia Networking Applications

We define a multimedia network application as any network application that employs audio or video. In this section, we provide a taxonomy of multimedia applications. We'll see that each class of applications in the taxonomy has its own unique set of service requirements and design issues. But before diving into an in-depth discussion of Internet multimedia applications, it is useful to consider the intrinsic characteristics of the audio and video media themselves.
9.1.1 Properties of Video

Perhaps the most salient characteristic of video is its high bit rate. Video distributed over the Internet typically ranges from 100 kbps for low-quality video conferencing to over 3 Mbps for streaming high-definition movies. To get a sense of how video bandwidth demands compare with those of other Internet applications, let's briefly consider three different users, each using a different Internet application. Our first user, Frank, is going quickly through photos posted on his friends' Facebook pages. Let's assume that Frank is looking at a new photo every 10 seconds, and that photos are on average 200 Kbytes in size. (As usual, throughout this discussion we make the simplifying assumption that 1 Kbyte = 8,000 bits.) Our second user, Martha, is streaming music from the Internet ("the cloud") to her smartphone. Let's assume Martha is using a service such as Spotify to listen to many MP3 songs, one after the other, each encoded at a rate of 128 kbps. Our third user, Victor, is watching a video that has been encoded at 2 Mbps. Finally, let's suppose that the session length for all three users is 4,000 seconds (approximately 67 minutes). Table 9.1 compares the bit rates and the total bytes transferred for these three users. We see that video streaming consumes by far the most bandwidth, with a bit rate more than ten times greater than that of the Facebook and music-streaming applications.

Table 9.1 Comparison of bit-rate requirements of three Internet applications

| User and application | Bit rate | Bytes transferred in 67 min |
|---|---|---|
| Frank (Facebook) | 160 kbps | 80 Mbytes |
| Martha (music) | 128 kbps | 64 Mbytes |
| Victor (video) | 2 Mbps | 1 Gbyte |

Therefore, when designing networked video applications, the first thing we must keep in mind is the high bit-rate requirements of video.
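The bytes-transferred column in Table 9.1 follows directly from bit rate times session length, under the chapter's convention that 1 Kbyte = 8,000 bits. A minimal sketch of the arithmetic (the function and variable names are our own):

```python
# Bytes transferred = bit rate (bits/s) * session length (s) / 8 bits per byte.
# Chapter convention: 1 Kbyte = 8,000 bits, so 1 Mbyte = 8e6 bits
# and 1 Gbyte = 8e9 bits.

SESSION_SECONDS = 4000  # approximately 67 minutes

def bytes_transferred(bit_rate_bps, seconds=SESSION_SECONDS):
    """Total bytes moved at a constant bit rate over the session."""
    return bit_rate_bps * seconds // 8

# Frank: one 200-Kbyte photo every 10 seconds -> 160,000 bits/s (160 kbps)
frank_bps = 200 * 8000 // 10
martha_bps = 128 * 1000        # Martha: 128 kbps MP3 stream
victor_bps = 2 * 1000 * 1000   # Victor: 2 Mbps video

print(bytes_transferred(frank_bps))   # 80,000,000 bytes = 80 Mbytes
print(bytes_transferred(martha_bps))  # 64,000,000 bytes = 64 Mbytes
print(bytes_transferred(victor_bps))  # 1,000,000,000 bytes = 1 Gbyte
```

Running this reproduces the three rows of Table 9.1: 80 Mbytes, 64 Mbytes, and 1 Gbyte.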
Given the popularity of video and its high bit rate, it is perhaps not surprising that Cisco predicts [Cisco 2015] that streaming and stored video will be approximately 80 percent of global consumer Internet traffic by 2019. Another important characteristic of video is that it can be compressed, thereby trading off video quality with bit rate. A video is a sequence of images, typically being displayed at a constant rate, for example, at 24 or 30 images per second. An uncompressed, digitally encoded image consists of an array of pixels, with each pixel encoded into a number of bits to represent luminance and color. There are two types of redundancy in video, both of which can be exploited by video compression. Spatial redundancy is the redundancy within a given image. Intuitively, an image that consists of mostly white space has a high degree of redundancy and can be efficiently compressed without significantly sacrificing image quality. Temporal redundancy reflects repetition from image to subsequent image. If, for example, an image and the subsequent image are exactly the same, there is no reason to re-encode the subsequent image; it is instead more efficient simply to indicate during encoding that the subsequent image is exactly the same. Today's off-the-shelf compression algorithms can compress a video to essentially any bit rate desired. Of course, the higher the bit rate, the better the image quality and the better the overall user viewing experience. We can also use compression to create multiple versions of the same video, each at a different quality level. For example, we can use compression to create, say, three versions of the same video, at rates of 300 kbps, 1 Mbps, and 3 Mbps. Users can then decide which version they want to watch as a function of their current available bandwidth.
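A client choosing among such pre-encoded versions can simply pick the highest-rate version that fits within its measured bandwidth. The three rates below are the ones from the text; the selection rule itself is our illustrative sketch, not any particular player's algorithm:

```python
# Three pre-encoded versions of the same video, in bits per second
# (300 kbps, 1 Mbps, and 3 Mbps, as in the text).
VERSION_RATES_BPS = [300_000, 1_000_000, 3_000_000]

def choose_version(available_bps, rates=VERSION_RATES_BPS):
    """Return the highest encoding rate not exceeding the available
    bandwidth; fall back to the lowest rate if none fits."""
    fitting = [r for r in rates if r <= available_bps]
    return max(fitting) if fitting else min(rates)

print(choose_version(25_000_000))  # high-speed connection -> 3,000,000 (3 Mbps)
print(choose_version(450_000))     # 3G smartphone -> 300,000 (300 kbps)
```

The same rule, applied repeatedly as bandwidth estimates change, is the kernel of the adaptive-streaming idea developed in Section 9.2.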
Users with high-speed Internet connections might choose the 3 Mbps version; users watching the video over 3G with a smartphone might choose the 300 kbps version. Similarly, the video in a video conference application can be compressed "on-the-fly" to provide the best video quality given the available end-to-end bandwidth between conversing users.

9.1.2 Properties of Audio

Digital audio (including digitized speech and music) has significantly lower bandwidth requirements than video. Digital audio, however, has its own unique properties that must be considered when designing multimedia network applications. To understand these properties, let's first consider how analog audio (which humans and musical instruments generate) is converted to a digital signal:

The analog audio signal is sampled at some fixed rate, for example, at 8,000 samples per second. The value of each sample will be some real number.

Each of the samples is then rounded to one of a finite number of values. This operation is referred to as quantization. The number of such finite values---called quantization values---is typically a power of two, for example, 256 quantization values.

Each of the quantization values is represented by a fixed number of bits. For example, if there are 256 quantization values, then each value---and hence each audio sample---is represented by one byte. The bit representations of all the samples are then concatenated together to form the digital representation of the signal. As an example, if an analog audio signal is sampled at 8,000 samples per second and each sample is quantized and represented by 8 bits, then the resulting digital signal will have a rate of 64,000 bits per second. For playback through audio speakers, the digital signal can then be converted back---that is, decoded---to an analog signal.
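The three digitization steps just described (sample, quantize, encode each value in a fixed number of bits) can be sketched as follows; the sine-wave input and the function name are our own illustration:

```python
import math

SAMPLE_RATE = 8000  # samples per second (telephone-quality speech)
NUM_LEVELS = 256    # quantization values -> 8 bits (one byte) per sample

def pcm_encode(signal, duration_s):
    """Sample an analog signal (a function of time returning values in
    [-1, 1]) at SAMPLE_RATE, quantize each sample to one of NUM_LEVELS
    values, and emit one byte per sample."""
    n = int(SAMPLE_RATE * duration_s)
    samples = []
    for i in range(n):
        x = signal(i / SAMPLE_RATE)                    # 1. sample
        level = round((x + 1) / 2 * (NUM_LEVELS - 1))  # 2. quantize
        samples.append(level)                          # 3. fixed-size code
    return bytes(samples)

# One second of a 440 Hz tone standing in for the analog input.
encoded = pcm_encode(lambda t: math.sin(2 * math.pi * 440 * t), 1.0)
print(len(encoded) * 8)  # 64000 bits for 1 second of audio, i.e., 64 kbps
```

The same arithmetic at 44,100 samples per second and 16 bits per sample gives the CD rate of 705.6 kbps per channel mentioned below.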
However, the decoded analog signal is only an approximation of the original signal, and the sound quality may be noticeably degraded (for example, high-frequency sounds may be missing in the decoded signal). By increasing the sampling rate and the number of quantization values, the decoded signal can better approximate the original analog signal. Thus (as with video), there is a trade-off between the quality of the decoded signal and the bit-rate and storage requirements of the digital signal. The basic encoding technique that we just described is called pulse code modulation (PCM). Speech encoding often uses PCM, with a sampling rate of 8,000 samples per second and 8 bits per sample, resulting in a rate of 64 kbps. The audio compact disk (CD) also uses PCM, with a sampling rate of 44,100 samples per second with 16 bits per sample; this gives a rate of 705.6 kbps for mono and 1.411 Mbps for stereo. PCM-encoded speech and music, however, are rarely used in the Internet. Instead, as with video, compression techniques are used to reduce the bit rates of the stream. Human speech can be compressed to less than 10 kbps and still be intelligible. A popular compression technique for near CD-quality stereo music is MPEG 1 layer 3, more commonly known as MP3. MP3 encoders can compress to many different rates; 128 kbps is the most common encoding rate and produces very little sound degradation. A related standard is Advanced Audio Coding (AAC), which has been popularized by Apple. As with video, multiple versions of a prerecorded audio stream can be created, each at a different bit rate. Although audio bit rates are generally much less than those of video, users are generally much more sensitive to audio glitches than video glitches. Consider, for example, a video conference taking place over the Internet. If, from time to time, the video signal is lost for a few seconds, the video conference can likely proceed without too much user frustration.
If, however, the audio signal is frequently lost, the users may have to terminate the session.

9.1.3 Types of Multimedia Network Applications

The Internet supports a large variety of useful and entertaining multimedia applications. In this subsection, we classify multimedia applications into three broad categories: (i) streaming stored audio/video, (ii) conversational voice/video-over-IP, and (iii) streaming live audio/video. As we will soon see, each of these application categories has its own set of service requirements and design issues.

Streaming Stored Audio and Video

To keep the discussion concrete, we focus here on streaming stored video, which typically combines video and audio components. Streaming stored audio (such as Spotify's streaming music service) is very similar to streaming stored video, although the bit rates are typically much lower. In this class of applications, the underlying medium is prerecorded video, such as a movie, a television show, a prerecorded sporting event, or a prerecorded user-generated video (such as those commonly seen on YouTube). These prerecorded videos are placed on servers, and users send requests to the servers to view the videos on demand. Many Internet companies today provide streaming video, including YouTube (Google), Netflix, Amazon, and Hulu. Streaming stored video has three key distinguishing features.

Streaming. In a streaming stored video application, the client typically begins video playout within a few seconds after it begins receiving the video from the server. This means that the client will be playing out from one location in the video while at the same time receiving later parts of the video from the server. This technique, known as streaming, avoids having to download the entire video file (and incurring a potentially long delay) before playout begins.

Interactivity.
Because the media is prerecorded, the user may pause, reposition forward, reposition backward, fast-forward, and so on through the video content. The time from when the user makes such a request until the action manifests itself at the client should be less than a few seconds for acceptable responsiveness.

Continuous playout. Once playout of the video begins, it should proceed according to the original timing of the recording. Therefore, data must be received from the server in time for its playout at the client; otherwise, users experience video frame freezing (when the client waits for the delayed frames) or frame skipping (when the client skips over delayed frames).

By far, the most important performance measure for streaming video is average throughput. In order to provide continuous playout, the network must provide an average throughput to the streaming application that is at least as large as the bit rate of the video itself. As we will see in Section 9.2, by using buffering and prefetching, it is possible to provide continuous playout even when the throughput fluctuates, as long as the average throughput (averaged over 5--10 seconds) remains above the video rate \[Wang 2008\]. For many streaming video applications, prerecorded video is stored on, and streamed from, a CDN rather than from a single data center. There are also many P2P video streaming applications for which the video is stored on users' hosts (peers), with different chunks of video arriving from different peers that may be spread around the globe. Given the prominence of Internet video streaming, we will explore video streaming in some depth in Section 9.2, paying particular attention to client buffering, prefetching, adapting quality to bandwidth availability, and CDN distribution.
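The continuous-playout condition just stated (average throughput, taken over a 5--10 second window, staying at or above the video bit rate) can be expressed as a short check. The per-second throughput samples and the 10-second window below are illustrative values, not from the text:

```python
def playout_sustainable(throughput_samples_bps, video_rate_bps, window=10):
    """Return True if every sliding window of per-second throughput
    samples averages at least the video consumption rate."""
    for i in range(len(throughput_samples_bps) - window + 1):
        window_avg = sum(throughput_samples_bps[i:i + window]) / window
        if window_avg < video_rate_bps:
            return False
    return True

video_rate = 1_000_000  # a 1 Mbps video
# Throughput dips to 600 kbps for two seconds, but every 10-second average holds
samples = [1_500_000] * 8 + [600_000] * 2 + [1_500_000] * 10
print(playout_sustainable(samples, video_rate))  # True: brief dips are absorbed
```

The point of the windowed average is exactly the buffering argument: short dips below the video rate are fine as long as they are compensated within the window.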
Conversational Voice- and Video-over-IP

Real-time conversational voice over the Internet is often referred to as Internet telephony, since, from the user's perspective, it is similar to the traditional circuit-switched telephone service. It is also commonly called Voice-over-IP (VoIP). Conversational video is similar, except that it includes the video of the participants as well as their voices. Most of today's voice and video conversational systems allow users to create conferences with three or more participants. Conversational voice and video are widely used in the Internet today, with the Internet companies Skype, QQ, and Google Talk boasting hundreds of millions of daily users. In our discussion of application service requirements in Chapter 2 (Figure 2.4), we identified a number of axes along which application requirements can be classified. Two of these axes---timing considerations and tolerance of data loss---are particularly important for conversational voice and video applications. Timing considerations are important because audio and video conversational applications are highly delay-sensitive. For a conversation with two or more interacting speakers, the delay from when a user speaks or moves until the action is manifested at the other end should be less than a few hundred milliseconds. For voice, delays smaller than 150 milliseconds are not perceived by a human listener, delays between 150 and 400 milliseconds can be acceptable, and delays exceeding 400 milliseconds can result in frustrating, if not completely unintelligible, voice conversations. On the other hand, conversational multimedia applications are loss-tolerant---occasional loss only causes occasional glitches in audio/video playback, and these losses can often be partially or fully concealed.
These delay-sensitive but loss-tolerant characteristics are clearly different from those of elastic data applications such as Web browsing, e-mail, social networks, and remote login. For elastic applications, long delays are annoying but not particularly harmful; the completeness and integrity of the transferred data, however, are of paramount importance. We will explore conversational voice and video in more depth in Section 9.3, paying particular attention to how adaptive playout, forward error correction, and error concealment can mitigate network-induced packet loss and delay.

Streaming Live Audio and Video

This third class of applications is similar to traditional broadcast radio and television, except that transmission takes place over the Internet. These applications allow a user to receive a live radio or television transmission---such as a live sporting event or an ongoing news event---transmitted from any corner of the world. Today, thousands of radio and television stations around the world are broadcasting content over the Internet.

Live, broadcast-like applications often have many users who receive the same audio/video program at the same time. In the Internet today, this is typically done with CDNs (Section 2.6). As with streaming stored multimedia, the network must provide each live multimedia flow with an average throughput that is larger than the video consumption rate. Because the event is live, delay can also be an issue, although the timing constraints are much less stringent than those for conversational voice. Delays of up to ten seconds or so from when the user chooses to view a live transmission to when playout begins can be tolerated. We will not cover streaming live media in this book because many of the techniques used for streaming live media---initial buffering delay, adaptive bandwidth use, and CDN distribution---are similar to those for streaming stored media.
9.2 Streaming Stored Video

For streaming video applications, prerecorded videos are placed on servers, and users send requests to these servers to view the videos on demand. The user may watch the video from beginning to end without interruption, may stop watching the video well before it ends, or may interact with the video by pausing or repositioning to a future or past scene. Streaming video systems can be classified into three categories: UDP streaming, HTTP streaming, and adaptive HTTP streaming (see Section 2.6). Although all three types of systems are used in practice, the majority of today's systems employ HTTP streaming and adaptive HTTP streaming. A common characteristic of all three forms of video streaming is the extensive use of client-side application buffering to mitigate the effects of varying end-to-end delays and varying amounts of available bandwidth between server and client. For streaming video (both stored and live), users generally can tolerate a small several-second initial delay between when the client requests a video and when video playout begins at the client. Consequently, when the video starts to arrive at the client, the client need not immediately begin playout, but can instead build up a reserve of video in an application buffer. Once the client has built up a reserve of several seconds of buffered-but-not-yet-played video, the client can then begin video playout. There are two important advantages provided by such client buffering. First, client-side buffering can absorb variations in server-to-client delay. If a particular piece of video data is delayed, as long as it arrives before the reserve of received-but-not-yet-played video is exhausted, this long delay will not be noticed. Second, if the server-to-client bandwidth briefly drops below the video consumption rate, a user can continue to enjoy continuous playback, again as long as the client application buffer does not become completely drained.
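The first advantage (absorbing delay variation) can be illustrated with a toy model in which block i is scheduled for playout a fixed interval after its predecessor, starting after an initial buffering delay. All of the timings below are invented for illustration:

```python
def playout_stalls(arrival_times, start_delay, interval):
    """Count blocks that miss their playout deadline when playout of
    block i is scheduled at start_delay + i*interval after the first
    block was requested (a toy model of client-side buffering)."""
    stalls = 0
    for i, arrived in enumerate(arrival_times):
        deadline = start_delay + i * interval
        if arrived > deadline:
            stalls += 1
    return stalls

# Blocks sent every 100 ms; network delay jitters between 50 and 250 ms
arrivals = [0.15, 0.35, 0.30, 0.55, 0.45, 0.75]  # arrival time of block i (seconds)
print(playout_stalls(arrivals, start_delay=0.15, interval=0.1))  # 3: playing immediately, some blocks are late
print(playout_stalls(arrivals, start_delay=0.35, interval=0.1))  # 0: a short buffering delay absorbs the jitter
```

Delaying the start of playout by a fraction of a second gives every later block that much extra slack, which is the effect shown graphically in Figure 9.1.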
Figure 9.1 illustrates client-side buffering. In this simple example, suppose that video is encoded at a fixed bit rate, and thus each video block contains video frames that are to be played out over the same fixed amount of time, Δ. The server transmits the first video block at t0, the second block at t0+Δ, the third block at t0+2Δ, and so on. Once the client begins playout, each block should be played out Δ time units after the previous block in order to reproduce the timing of the original recorded video. Because of the variable end-to-end network delays, different video blocks experience different delays. The first video block arrives at the client at t1 and the second block arrives at t2. The network delay for the ith block is the horizontal distance between the time the block was transmitted by the server and the time it is received at the client; note that the network delay varies from one video block to another. In this example, if the client were to begin playout as soon as the first block arrived at t1, then the second block would not have arrived in time to be played out at t1+Δ. In this case, video playout would either have to stall (waiting for block 2 to arrive) or block 2 could be skipped---both resulting in undesirable playout impairments. Instead, if the client were to delay the start of playout until t3, when blocks 1 through 6 have all arrived, periodic playout can proceed with all blocks having been received before their playout time.

Figure 9.1 Client playout delay in video streaming

9.2.1 UDP Streaming

We only briefly discuss UDP streaming here, referring the reader to more in-depth discussions of the protocols behind these systems where appropriate. With UDP streaming, the server transmits video at a rate that matches the client's video consumption rate by clocking out the video chunks over UDP at a steady rate.
For example, if the video consumption rate is 2 Mbps and each UDP packet carries 8,000 bits of video, then the server would transmit one UDP packet into its socket every (8000 bits)/(2 Mbps)=4 msec. As we learned in Chapter 3, because UDP does not employ a congestion-control mechanism, the server can push packets into the network at the consumption rate of the video without the rate-control restrictions of TCP. UDP streaming typically uses a small client-side buffer, big enough to hold less than a second of video. Before passing the video chunks to UDP, the server will encapsulate the video chunks within transport packets specially designed for transporting audio and video, using the Real-Time Transport Protocol (RTP) \[RFC 3550\] or a similar (possibly proprietary) scheme. We delay our coverage of RTP until Section 9.3, where we discuss RTP in the context of conversational voice and video systems. Another distinguishing property of UDP streaming is that in addition to the server-to-client video stream, the client and server also maintain, in parallel, a separate control connection over which the client sends commands regarding session state changes (such as pause, resume, reposition, and so on). The Real-Time Streaming Protocol (RTSP) \[RFC 2326\], explained in some detail in the Web site for this textbook, is a popular open protocol for such a control connection. Although UDP streaming has been employed in many open-source systems and proprietary products, it suffers from three significant drawbacks. First, due to the unpredictable and varying amount of available bandwidth between server and client, constant-rate UDP streaming can fail to provide continuous playout. For example, consider the scenario where the video consumption rate is 1 Mbps and the server-to-client available bandwidth is usually more than 1 Mbps, but every few minutes the available bandwidth drops below 1 Mbps for several seconds.
In such a scenario, a UDP streaming system that transmits video at a constant rate of 1 Mbps over RTP/UDP would likely provide a poor user experience, with freezing or skipped frames soon after the available bandwidth falls below 1 Mbps. The second drawback of UDP streaming is that it requires a media control server, such as an RTSP server, to process client-to-server interactivity requests and to track client state (e.g., the client's playout point in the video, whether the video is being paused or played, and so on) for each ongoing client session. This increases the overall cost and complexity of deploying a large-scale video-on-demand system. The third drawback is that many firewalls are configured to block UDP traffic, preventing the users behind these firewalls from receiving UDP video.

9.2.2 HTTP Streaming

In HTTP streaming, the video is simply stored in an HTTP server as an ordinary file with a specific URL. When a user wants to see the video, the client establishes a TCP connection with the server and issues an HTTP GET request for that URL. The server then sends the video file, within an HTTP response message, as quickly as possible, that is, as quickly as TCP congestion control and flow control will allow. On the client side, the bytes are collected in a client application buffer. Once the number of bytes in this buffer exceeds a predetermined threshold, the client application begins playback---specifically, it periodically grabs video frames from the client application buffer, decompresses the frames, and displays them on the user's screen. We learned in Chapter 3 that when transferring a file over TCP, the server-to-client transmission rate can vary significantly due to TCP's congestion control mechanism. In particular, it is not uncommon for the transmission rate to vary in a "saw-tooth" manner associated with TCP congestion control.
Furthermore, packets can also be significantly delayed due to TCP's retransmission mechanism. Because of these characteristics of TCP, the conventional wisdom in the 1990s was that video streaming would never work well over TCP. Over time, however, designers of streaming video systems learned that TCP's congestion control and reliable-data transfer mechanisms do not necessarily preclude continuous playout when client buffering and prefetching (discussed in the next section) are used. The use of HTTP over TCP also allows the video to traverse firewalls and NATs more easily (which are often configured to block most UDP traffic but to allow most HTTP traffic). Streaming over HTTP also obviates the need for a media control server, such as an RTSP server, reducing the cost of a large-scale deployment over the Internet. Due to all of these advantages, most video streaming applications today---including YouTube and Netflix---use HTTP streaming (over TCP) as their underlying streaming protocol.

Prefetching Video

As we just learned, client-side buffering can be used to mitigate the effects of varying end-to-end delays and varying available bandwidth. In our earlier example in Figure 9.1, the server transmits video at the rate at which the video is to be played out. However, for streaming stored video, the client can attempt to download the video at a rate higher than the consumption rate, thereby prefetching video frames that are to be consumed in the future. This prefetched video is naturally stored in the client application buffer. Such prefetching occurs naturally with TCP streaming, since TCP's congestion avoidance mechanism will attempt to use all of the available bandwidth between server and client. To gain some insight into prefetching, let's take a look at a simple example. Suppose the video consumption rate is 1 Mbps but the network is capable of delivering the video from server to client at a constant rate of 1.5 Mbps.
Then the client will not only be able to play out the video with a very small playout delay, but will also be able to increase the amount of buffered video data by 500 Kbits every second. In this manner, if in the future the client receives data at a rate of less than 1 Mbps for a brief period of time, the client will be able to continue to provide continuous playback due to the reserve in its buffer. \[Wang 2008\] shows that when the average TCP throughput is roughly twice the media bit rate, streaming over TCP results in minimal starvation and low buffering delays.

Client Application Buffer and TCP Buffers

Figure 9.2 illustrates the interaction between client and server for HTTP streaming. At the server side, the portion of the video file in white has already been sent into the server's socket, while the darkened portion is what remains to be sent. After "passing through the socket door," the bytes are placed in the TCP send buffer before being transmitted into the Internet, as described in Chapter 3. In Figure 9.2, because the TCP send buffer at the server side is shown to be full, the server is momentarily prevented from sending more bytes from the video file into the socket. On the client side, the client application (media player) reads bytes from the TCP receive buffer (through its client socket) and places the bytes into the client application buffer. At the same time, the client application periodically grabs video frames from the client application buffer, decompresses the frames, and displays them on the user's screen. Note that if the client application buffer is larger than the video file, then the whole process of moving bytes from the server's storage to the client's application buffer is equivalent to an ordinary file download over HTTP---the client simply pulls the video off the server as fast as TCP will allow!
Figure 9.2 Streaming stored video over HTTP/TCP

Consider now what happens when the user pauses the video during the streaming process. During the pause period, bits are not removed from the client application buffer, even though bits continue to enter the buffer from the server. If the client application buffer is finite, it may eventually become full, which will cause "back pressure" all the way back to the server. Specifically, once the client application buffer becomes full, bytes can no longer be removed from the client TCP receive buffer, so it too becomes full. Once the client TCP receive buffer becomes full, bytes can no longer be removed from the server TCP send buffer, so it also becomes full. Once the TCP send buffer becomes full, the server cannot send any more bytes into the socket. Thus, if the user pauses the video, the server may be forced to stop transmitting, in which case the server will be blocked until the user resumes the video. In fact, even during regular playback (that is, without pausing), if the client application buffer becomes full, back pressure will cause the TCP buffers to become full, which will force the server to reduce its rate. To determine the resulting rate, note that when the client application removes f bits, it creates room for f bits in the client application buffer, which in turn allows the server to send f additional bits. Thus, the server send rate can be no higher than the video consumption rate at the client. Therefore, a full client application buffer indirectly imposes a limit on the rate that video can be sent from server to client when streaming over HTTP.

Analysis of Video Streaming

Some simple modeling will provide more insight into initial playout delay and freezing due to application buffer depletion.
As shown in Figure 9.3, let B denote the size (in bits) of the client's application buffer, and let Q denote the number of bits that must be buffered before the client application begins playout. (Of course, Q\<B.) Let r denote the video consumption rate---the rate at which the client draws bits out of the client application buffer during playback. So, for example, if the video's frame rate is 30 frames/sec, and each (compressed) frame is 100,000 bits, then r=3 Mbps. To see the forest through the trees, we'll ignore TCP's send and receive buffers.

Figure 9.3 Analysis of client-side buffering for video streaming

Let's assume that the server sends bits at a constant rate x whenever the client buffer is not full. (This is a gross simplification, since TCP's send rate varies due to congestion control; we'll examine more realistic time-dependent rates x(t) in the problems at the end of this chapter.) Suppose at time t=0, the application buffer is empty and video begins arriving to the client application buffer. We now ask at what time t=tp does playout begin? And while we are at it, at what time t=tf does the client application buffer become full? First, let's determine tp, the time when Q bits have entered the application buffer and playout begins. Recall that bits arrive to the client application buffer at rate x and no bits are removed from this buffer before playout begins. Thus, the amount of time required to build up Q bits (the initial buffering delay) is tp=Q/x. Now let's determine tf, the point in time when the client application buffer becomes full. We first observe that if x\<r (that is, if the server send rate is less than the video consumption rate), then the client buffer will never become full! Indeed, starting at time tp, the buffer will be depleted at rate r and will only be filled at rate x\<r.
Eventually the client buffer will empty out entirely, at which time the video will freeze on the screen while the client buffer waits another tp seconds to build up Q bits of video. Thus, when the available rate in the network is less than the video rate, playout will alternate between periods of continuous playout and periods of freezing. In a homework problem, you will be asked to determine the length of each continuous playout and freezing period as a function of Q, r, and x. Now let's determine tf for when x\>r. In this case, starting at time tp, the buffer increases from Q to B at rate x−r since bits are being depleted at rate r but are arriving at rate x, as shown in Figure 9.3. Given these hints, you will be asked in a homework problem to determine tf, the time the client buffer becomes full. Note that when the available rate in the network is more than the video rate, after the initial buffering delay, the user will enjoy continuous playout until the video ends.

Early Termination and Repositioning the Video

HTTP streaming systems often make use of the HTTP byte-range header in the HTTP GET request message, which specifies the specific range of bytes the client currently wants to retrieve from the desired video. This is particularly useful when the user wants to reposition (that is, jump) to a future point in time in the video. When the user repositions to a new position, the client sends a new HTTP request, indicating with the byte-range header the byte from which the server should send data. When the server receives the new HTTP request, it can forget about any earlier request and instead send bytes beginning with the byte indicated in the byte-range request.
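A reposition request of the kind described above can be sketched as follows, under the simplifying assumption of a constant-bit-rate video, so that a playback position maps linearly to a byte offset (the function name and parameters are hypothetical):

```python
def range_header_for_position(position_secs, video_rate_bps):
    """Build the HTTP Range header for a reposition request, assuming a
    constant-bit-rate video so byte offset = position * rate / 8."""
    start_byte = int(position_secs * video_rate_bps / 8)
    return {"Range": "bytes=%d-" % start_byte}

# Jump 60 seconds into a 2 Mbps video: request bytes from offset 15,000,000 on
print(range_header_for_position(60, 2_000_000))  # {'Range': 'bytes=15000000-'}
```

A server that supports byte ranges answers such a request with a 206 (Partial Content) response carrying only the requested bytes; in practice, streaming clients locate the correct offset using the container format's index rather than a pure bit-rate calculation.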
While we are on the subject of repositioning, we briefly mention that when a user repositions to a future point in the video or terminates the video early, some prefetched-but-not-yet-viewed data transmitted by the server will go unwatched---a waste of network bandwidth and server resources. For example, suppose that the client buffer is full with B bits at some time t0 into the video, and at this time the user repositions to some instant t\>t0+B/r into the video, and then watches the video to completion from that point on. In this case, all B bits in the buffer will be unwatched and the bandwidth and server resources that were used to transmit those B bits have been completely wasted. There is significant wasted bandwidth in the Internet due to early termination, which can be quite costly, particularly for wireless links \[Ihm 2011\]. For this reason, many streaming systems use only a moderate-size client application buffer, or will limit the amount of prefetched video using the byte-range header in HTTP requests \[Rao 2011\]. Repositioning and early termination are analogous to cooking a large meal, eating only a portion of it, and throwing the rest away, thereby wasting food. So the next time your parents criticize you for wasting food by not eating all your dinner, you can quickly retort by saying they are wasting bandwidth and server resources when they reposition while watching movies over the Internet! But, of course, two wrongs do not make a right---both food and bandwidth are not to be wasted! In Sections 9.2.1 and 9.2.2, we covered UDP streaming and HTTP streaming, respectively. A third type of streaming is Dynamic Adaptive Streaming over HTTP (DASH), which uses multiple versions of the video, each compressed at a different rate. DASH is discussed in detail in Section 2.6.2. CDNs are often used to distribute stored and live video. CDNs are discussed in detail in Section 2.6.3.
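The playout-delay analysis from earlier in this section (initial buffering delay tp=Q/x; the buffer filling at rate x−r when x\>r; playout alternating with freezing when x\<r) can be sanity-checked with a small discrete-time simulation. The rates, buffer sizes, and step size below are arbitrary illustrative choices:

```python
def simulate(x, r, Q, B, duration, dt=0.001):
    """Discrete-time model: the server fills the buffer at rate x
    (pausing when full); the client drains at rate r once playout has
    begun. Playout starts (or resumes after a freeze) when Q bits are
    buffered. Returns (time playout first starts, number of freezes)."""
    buf, playing, started, freezes = 0.0, False, None, 0
    t = 0.0
    while t < duration:
        buf = min(buf + x * dt, B)        # server fill, capped at buffer size B
        if not playing and buf >= Q:      # (re)start playout once Q bits buffered
            playing = True
            if started is None:
                started = t
        if playing:
            if buf >= r * dt:
                buf -= r * dt             # client consumes video at rate r
            else:
                playing = False           # buffer empty: playout freezes
                freezes += 1
        t += dt
    return started, freezes

# x > r: playout starts near tp = Q/x and never freezes
tp_est, freezes = simulate(x=1.5e6, r=1.0e6, Q=1e6, B=8e6, duration=30)
print(round(tp_est, 2), freezes)  # 0.67 0
# x < r: playout alternates between playing and freezing
_, freezes2 = simulate(x=0.5e6, r=1.0e6, Q=1e6, B=8e6, duration=30)
print(freezes2 > 0)  # True
```

The first run reproduces tp=Q/x=1/1.5≈0.67 seconds with no freezing; the second shows the freeze-rebuffer cycle predicted for x\<r.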
9.3 Voice-over-IP

Real-time conversational voice over the Internet is often referred to as Internet telephony, since, from the user's perspective, it is similar to the traditional circuit-switched telephone service. It is also commonly called Voice-over-IP (VoIP). In this section we describe the principles and protocols underlying VoIP. Conversational video is similar in many respects to VoIP, except that it includes the video of the participants as well as their voices. To keep the discussion focused and concrete, we focus only on voice in this section rather than combined voice and video.

9.3.1 Limitations of the Best-Effort IP Service

The Internet's network-layer protocol, IP, provides best-effort service. That is to say, the service makes its best effort to move each datagram from source to destination as quickly as possible but makes no promises whatsoever about getting the packet to the destination within some delay bound or about a limit on the percentage of packets lost. The lack of such guarantees poses significant challenges to the design of real-time conversational applications, which are acutely sensitive to packet delay, jitter, and loss. In this section, we'll cover several ways in which the performance of VoIP over a best-effort network can be enhanced. Our focus will be on application-layer techniques, that is, approaches that do not require any changes in the network core or even in the transport layer at the end hosts. To keep the discussion concrete, we'll discuss the limitations of best-effort IP service in the context of a specific VoIP example. The sender generates bytes at a rate of 8,000 bytes per second; every 20 msecs the sender gathers these bytes into a chunk. A chunk and a special header (discussed below) are encapsulated in a UDP segment, via a call to the socket interface. Thus, the number of bytes in a chunk is (20 msecs)⋅(8,000 bytes/sec)=160 bytes, and a UDP segment is sent every 20 msecs.
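The example sender just described (8,000 bytes per second, gathered into 160-byte chunks, one UDP segment every 20 msecs) might be sketched as follows. The 8-byte header format and the destination port are invented placeholders standing in for the "special header" mentioned above; they are not the RTP format:

```python
import socket
import struct
import time

SAMPLE_RATE_BYTES = 8000  # the sender generates 8,000 bytes of audio per second
CHUNK_MSECS = 20
CHUNK_BYTES = SAMPLE_RATE_BYTES * CHUNK_MSECS // 1000  # 160 bytes per chunk

def make_packet(seq, timestamp_ms, chunk):
    """Prepend a minimal header (sequence number + timestamp) to a chunk.
    This toy 8-byte header is a placeholder, not the RTP header."""
    return struct.pack("!II", seq, timestamp_ms) + chunk

def send_stream(audio_bytes, dest=("127.0.0.1", 5004)):
    """Clock out one 160-byte chunk in a UDP segment every 20 msecs."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq in range(len(audio_bytes) // CHUNK_BYTES):
        chunk = audio_bytes[seq * CHUNK_BYTES:(seq + 1) * CHUNK_BYTES]
        sock.sendto(make_packet(seq, seq * CHUNK_MSECS, chunk), dest)
        time.sleep(CHUNK_MSECS / 1000)
    sock.close()

print(CHUNK_BYTES)                                 # 160
print(len(make_packet(0, 0, bytes(CHUNK_BYTES))))  # 168: 8-byte header + 160-byte chunk
```

The sequence number and timestamp in the header are what let the receiver detect loss and remove jitter, as discussed in the rest of this section.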
If each packet makes it to the receiver with a constant end-to-end delay, then packets arrive at the receiver periodically every 20 msecs. In these ideal conditions, the receiver can simply play back each chunk as soon as it arrives. But unfortunately, some packets can be lost and most packets will not have the same end-to-end delay, even in a lightly congested Internet. For this reason, the receiver must take more care in determining (1) when to play back a chunk, and (2) what to do with a missing chunk.

Packet Loss

Consider one of the UDP segments generated by our VoIP application. The UDP segment is encapsulated in an IP datagram. As the datagram wanders through the network, it passes through router buffers (that is, queues) while waiting for transmission on outbound links. It is possible that one or more of the buffers in the path from sender to receiver is full, in which case the arriving IP datagram may be discarded, never to arrive at the receiving application. Loss could be eliminated by sending the packets over TCP (which provides for reliable data transfer) rather than over UDP. However, retransmission mechanisms are often considered unacceptable for conversational real-time audio applications such as VoIP, because they increase end-to-end delay \[Bolot 1996\]. Furthermore, due to TCP congestion control, packet loss may result in a reduction of the TCP sender's transmission rate to a rate that is lower than the receiver's drain rate, possibly leading to buffer starvation. This can have a severe impact on voice intelligibility at the receiver. For these reasons, most existing VoIP applications run over UDP by default. \[Baset 2006\] reports that UDP is used by Skype unless a user is behind a NAT or firewall that blocks UDP segments (in which case TCP is used). But losing packets is not necessarily as disastrous as one might think.
Indeed, packet loss rates between 1 and 20 percent can be
tolerated, depending on how voice is encoded and transmitted, and on how
the loss is concealed at the receiver. For example, forward error
correction (FEC) can help conceal packet loss. We'll see below that with
FEC, redundant information is transmitted along with the original
information so that some of the lost original data can be recovered from
the redundant information. Nevertheless, if one or more of the links
between sender and receiver is severely congested, and packet loss
exceeds 10 to 20 percent (for example, on a wireless link), then there
is really nothing that can be done to achieve acceptable audio quality.
Clearly, best-effort service has its limitations.

End-to-End Delay

End-to-end delay is the accumulation of transmission, processing, and
queuing delays in routers; propagation delays in links; and end-system
processing delays. For real-time conversational applications, such as
VoIP, end-to-end delays smaller than 150 msecs are not perceived by a
human listener; delays between 150 and 400 msecs can be acceptable but
are not ideal; and delays exceeding 400 msecs can seriously hinder the
interactivity in voice conversations. The receiving side of a VoIP
application will typically disregard any packets that are delayed more
than a certain threshold, for example, more than 400 msecs. Thus,
packets that are delayed by more than the threshold are effectively
lost.

Packet Jitter

A crucial component of end-to-end delay is the varying queuing delays
that a packet experiences in the network's routers. Because of these
varying delays, the time from when a packet is generated at the source
until it is received at the receiver can fluctuate from packet to
packet, as shown in Figure 9.1. This phenomenon is called jitter. As an
example, consider two consecutive packets in our VoIP application. The
sender sends the second packet 20 msecs after sending the first packet.
-But at the receiver, the spacing between these packets can become -greater than 20 msecs. To see this, suppose the first packet arrives at -a nearly empty queue at a router, but just before the second packet -arrives at the queue a large number of packets from other sources arrive -at the same queue. Because the first packet experiences a small queuing -delay and the second packet suffers a large queuing delay at this -router, the first and second packets become spaced by more than 20 -msecs. The spacing between consecutive packets can also become less than -20 msecs. To see this, again consider two consecutive packets. Suppose -the first packet joins the end of a queue with a large number of -packets, and the second packet arrives at the queue before this first -packet is transmitted and before any packets from other sources arrive -at the queue. In this case, our two packets find themselves one right -after the other in the queue. If the time it takes to transmit a packet -on the router's outbound link is less than 20 msecs, then the spacing -between first and second packets becomes less than 20 msecs. The -situation is analogous to driving cars on roads. Suppose you and your -friend are each driving in your own cars from San Diego to Phoenix. -Suppose you and your friend have similar driving styles, and that you -both drive at 100 km/hour, traffic permitting. If your friend starts out -one hour before you, depending on intervening traffic, you may arrive at -Phoenix more or less than one hour after your friend. If the receiver -ignores the presence of jitter and plays out chunks as soon as they -arrive, then the resulting audio quality can easily become -unintelligible at the receiver. Fortunately, jitter can often be removed -by using sequence numbers, timestamps, and a playout delay, as discussed -below. 
9.3.2 Removing Jitter at the Receiver for Audio

For our VoIP application, where packets are being generated
periodically, the receiver should attempt to provide periodic playout of
voice chunks in the presence of random network jitter. This is typically
done by combining the following three mechanisms:

- Prepending each chunk with a sequence number. The sender increments
  the sequence number by one for each packet that it generates.

- Prepending each chunk with a timestamp. The sender stamps each chunk
  with the time at which the chunk was generated.

- Delaying playout of chunks at the receiver. As we saw in our earlier
  discussion of Figure 9.1, the playout delay of the received audio
  chunks must be long enough so that most of the packets are received
  before their scheduled playout times. This playout delay can either be
  fixed throughout the duration of the audio session or vary adaptively
  during the audio session lifetime.

We now discuss how these three mechanisms, when combined, can alleviate
or even eliminate the effects of jitter. We examine two playback
strategies: fixed playout delay and adaptive playout delay.

Fixed Playout Delay

With the fixed-delay strategy, the receiver attempts to play out each
chunk exactly q msecs after the chunk is generated. So if a chunk is
timestamped at the sender at time t, the receiver plays out the chunk at
time t+q, assuming the chunk has arrived by that time. Packets that
arrive after their scheduled playout times are discarded and considered
lost. What is a good choice for q? VoIP can support delays up to about
400 msecs, although a more satisfying conversational experience is
achieved with smaller values of q. On the other hand, if q is made much
smaller than 400 msecs, then many packets may miss their scheduled
playback times due to the network-induced packet jitter. Roughly
speaking, if large variations in end-to-end delay are typical, it is
preferable to use a large q; on the other hand, if delay is small and
variations in delay are also small, it is preferable to use a small q,
perhaps less than 150 msecs.
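The fixed-playout rule just described (play a chunk stamped t at time t+q; discard chunks that arrive late) can be sketched as follows. Times are in msecs, and the example packet trace is hypothetical.

```python
def fixed_playout(packets, q):
    """packets: list of (sender_timestamp, arrival_time) pairs in msecs.
    Each chunk is scheduled for playout at t + q; a chunk arriving after
    its scheduled playout time is discarded and counted as lost."""
    schedule, lost = [], 0
    for t, arrival in packets:
        if arrival <= t + q:
            schedule.append(t + q)
        else:
            lost += 1
    return schedule, lost

# The delay-loss trade-off: the second packet below hit a long queue.
pkts = [(0, 50), (20, 180), (40, 90)]
```

With q=100 the delayed second packet misses its deadline (one loss); with q=200 nothing is lost, at the cost of a longer conversational delay.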
The trade-off between the playback delay and packet loss is illustrated
in Figure 9.4. The figure shows the times at which packets are generated
and played out for a single talk spurt. Two distinct initial playout
delays are considered. As shown by the leftmost staircase, the sender
generates packets at regular intervals---say, every 20 msecs. The first
packet in this talk spurt is received at time r. As shown in the figure,
the arrivals of subsequent packets are not evenly spaced due to the
network jitter. For the first playout schedule, the fixed initial
playout delay is set to p−r. With this schedule, the fourth packet does
not arrive by its scheduled playout time, and the receiver considers it
lost. For the second playout schedule, the fixed initial playout delay
is set to p′−r. For this schedule, all packets arrive before their
scheduled playout times, and there is therefore no loss.

Figure 9.4 Packet loss for different fixed playout delays

Adaptive Playout Delay

The previous example demonstrates an important delay-loss trade-off that
arises when designing a playout strategy with fixed playout delays. By
making the initial playout delay large, most packets will make their
deadlines and there will therefore be negligible loss; however, for
conversational services such as VoIP, long delays can become bothersome
if not intolerable. Ideally, we would like the playout delay to be
minimized subject to the constraint that the loss be below a few
percent. The natural way to deal with this trade-off is to estimate the
network delay and the variance of the network delay, and to adjust the
playout delay accordingly at the beginning of each talk spurt. This
adaptive adjustment of playout delays at the beginning of the talk
spurts will cause the sender's silent periods to be compressed and
elongated; however, compression and elongation of silence by a small
amount is not noticeable in speech.
Following \[Ramjee 1994\], we now describe a generic algorithm that the
receiver can use to adaptively adjust its playout delays. To this end,
let

ti = the timestamp of the ith packet = the time the packet was generated
by the sender

ri = the time packet i is received by the receiver

pi = the time packet i is played at the receiver

The end-to-end network delay of the ith packet is ri − ti. Due to
network jitter, this delay will vary from packet to packet. Let di
denote an estimate of the average network delay upon reception of the
ith packet. This estimate is constructed from the timestamps as follows:

di = (1 − u) di−1 + u (ri − ti)

where u is a fixed constant (for example, u=0.01). Thus di is a smoothed
average of the observed network delays r1−t1, ..., ri−ti. The estimate
places more weight on the recently observed network delays than on the
observed network delays of the distant past. This form of estimate
should not be completely unfamiliar; a similar idea is used to estimate
round-trip times in TCP, as discussed in Chapter 3. Let vi denote an
estimate of the average deviation of the delay from the estimated
average delay. This estimate is also constructed from the timestamps:

vi = (1 − u) vi−1 + u \|ri − ti − di\|

The estimates di and vi are calculated for every packet received,
although they are used only to determine the playout point for the first
packet in any talk spurt. Once having calculated these estimates, the
receiver employs the following algorithm for the playout of packets. If
packet i is the first packet of a talk spurt, its playout time, pi, is
computed as:

pi = ti + di + K vi

where K is a positive constant (for example, K=4). The purpose of the
Kvi term is to set the playout time far enough into the future so that
only a small fraction of the arriving packets in the talk spurt will be
lost due to late arrivals.
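The two estimator updates and the first-packet playout rule can be sketched directly from the formulas above; the class and variable names here are our own.

```python
U = 0.01   # the EWMA weight u from the text
K = 4      # safety factor K

class PlayoutEstimator:
    """Tracks the smoothed average delay d_i and average deviation v_i,
    and computes the playout time of the first packet of a talk spurt."""
    def __init__(self):
        self.d = 0.0
        self.v = 0.0

    def update(self, t_i, r_i):
        delay = r_i - t_i                                    # end-to-end delay
        self.d = (1 - U) * self.d + U * delay                # d_i update
        self.v = (1 - U) * self.v + U * abs(delay - self.d)  # v_i update

    def first_packet_playout(self, t_i):
        return t_i + self.d + K * self.v    # p_i = t_i + d_i + K v_i
```

For example, starting from zero estimates, a single packet generated at time 0 and received at time 10 yields d = 0.1, v = 0.099, and a first-packet playout time of 0.496.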
The playout point for any subsequent packet in a talk spurt is computed
as an offset from the point in time when the first packet in the talk
spurt was played out. In particular, let

qi = pi − ti

be the length of time from when the first packet in the talk spurt is
generated until it is played out. If packet j also belongs to this talk
spurt, it is played out at time

pj = tj + qi

The algorithm just described makes perfect sense assuming that the
receiver can tell whether a packet is the first packet in the talk
spurt. This can be done by examining the signal energy in each received
packet.

9.3.3 Recovering from Packet Loss

We have discussed in some detail how a VoIP application can deal with
packet jitter. We now briefly describe several schemes that attempt to
preserve acceptable audio quality in the presence of packet loss. Such
schemes are called loss recovery schemes. Here we define packet loss in
a broad sense: A packet is lost either if it never arrives at the
receiver or if it arrives after its scheduled playout time. Our VoIP
example will again serve as a context for describing loss recovery
schemes. As mentioned at the beginning of this section, retransmitting
lost packets may not be feasible in a real-time conversational
application such as VoIP. Indeed, retransmitting a packet that has
missed its playout deadline serves absolutely no purpose. And
retransmitting a packet that overflowed a router queue cannot normally
be accomplished quickly enough. Because of these considerations, VoIP
applications often use some type of loss anticipation scheme. Two types
of loss anticipation schemes are forward error correction (FEC) and
interleaving.

Forward Error Correction (FEC)

The basic idea of FEC is to add redundant information to the original
packet stream. For the cost of marginally increasing the transmission
rate, the redundant information can be used to reconstruct
approximations or exact versions of some of the lost packets.
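One simple way to add such redundancy, sketched below under the assumption of equal-length chunks, is to send an exclusive-OR parity chunk after every n chunks; any single lost chunk of the group can then be rebuilt.

```python
def xor_chunks(chunks):
    """Byte-wise exclusive OR of a list of equal-length chunks."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

def add_redundancy(chunks):
    """Append one redundant parity chunk after the n original chunks."""
    return chunks + [xor_chunks(chunks)]

def recover(group):
    """Rebuild the single missing chunk (marked None) in a group of n+1
    chunks; with two or more chunks lost, recovery is impossible."""
    missing = [i for i, c in enumerate(group) if c is None]
    if len(missing) == 1:
        group[missing[0]] = xor_chunks([c for c in group if c is not None])
    return group
```

XOR-ing the n surviving chunks with the parity chunk reproduces the missing one, because each byte position XORs to zero across the full group.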
Following \[Bolot 1996\] and \[Perkins 1998\], we now outline
two simple FEC mechanisms. The first mechanism sends a redundant encoded
chunk after every n chunks. The redundant chunk is obtained by exclusive
OR-ing the n original chunks \[Shacham 1990\]. In this manner if any one
packet of the group of n+1 packets is lost, the receiver can fully
reconstruct the lost packet. But if two or more packets in a group are
lost, the receiver cannot reconstruct the lost packets. By keeping n+1,
the group size, small, a large fraction of the lost packets can be
recovered when loss is not excessive. However, the smaller the group
size, the greater the relative increase of the transmission rate. In
particular, the transmission rate increases by a factor of 1/n, so that,
if n=3, then the transmission rate increases by 33 percent. Furthermore,
this simple scheme increases the playout delay, as the receiver must
wait to receive the entire group of packets before it can begin playout.
For more practical details about how FEC works for multimedia transport
see \[RFC 5109\]. The second FEC mechanism is to send a lower-resolution
audio stream as the redundant information. For example, the sender might
create a nominal audio stream and a corresponding low-resolution,
low-bit rate audio stream. (The nominal stream could be a PCM encoding
at 64 kbps, and the lower-quality stream could be a GSM encoding at 13
kbps.) The low-bit rate stream is referred to as the redundant stream.
As shown in Figure 9.5, the sender constructs the nth packet by taking
the nth chunk from the nominal stream and appending to it the (n−1)st
chunk from the redundant stream. In this manner, whenever there is
nonconsecutive packet loss, the receiver can conceal the loss by playing
out the low-bit rate encoded chunk that arrives with the subsequent
packet. Of course, low-bit rate chunks give lower quality than the
nominal chunks.
However, a stream of mostly high-quality chunks, occasional low-quality
chunks, and no missing chunks gives good overall audio quality. Note
that in this scheme, the receiver only has to receive two packets before
playback, so that the increased playout delay is small. Furthermore, if
the low-bit rate encoding is much less than the nominal encoding, then
the marginal increase in the transmission rate will be small. In order
to cope with consecutive loss, we can use a simple variation. Instead of
appending just the (n−1)st low-bit rate chunk to the nth nominal chunk,
the sender can append the (n−1)st and (n−2)nd low-bit rate chunks, or
append the (n−1)st and (n−3)rd low-bit rate chunks, and so on. By
appending more low-bit rate chunks to each nominal chunk, the audio
quality at the receiver becomes acceptable for a wider variety of harsh
best-effort environments. On the other hand, the additional chunks
increase the transmission bandwidth and the playout delay.

Figure 9.5 Piggybacking lower-quality redundant information

Interleaving

As an alternative to redundant transmission, a VoIP application can send
interleaved audio. As shown in Figure 9.6, the sender resequences units
of audio data before transmission, so that originally adjacent units are
separated by a certain distance in the transmitted stream. Interleaving
can mitigate the effect of packet losses. If, for example, units are 5
msecs in length and chunks are 20 msecs (that is, four units per chunk),
then the first chunk could contain units 1, 5, 9, and 13; the second
chunk could contain units 2, 6, 10, and 14; and so on. Figure 9.6 shows
that the loss of a single packet from an interleaved stream results in
multiple small gaps in the reconstructed stream, as opposed to the
single large gap that would occur in a noninterleaved stream.
Interleaving can significantly improve the perceived quality of an audio
stream \[Perkins 1998\]. It also has low overhead.
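The interleaving example above (16 units spread over four chunks, so that chunk i carries units i, i+4, i+8, i+12) can be sketched as a simple block interleaver:

```python
def interleave(units, depth):
    """Block interleaver: chunk i of a depth-chunk block carries units
    i, i+depth, i+2*depth, ... (e.g., units 1, 5, 9, 13 with depth 4)."""
    per_chunk = len(units) // depth
    return [[units[i + depth * k] for k in range(per_chunk)]
            for i in range(depth)]

def deinterleave(chunks):
    """Invert the interleaver; a lost chunk (None) turns into several
    small gaps (None units) spread across the reconstructed stream."""
    depth = len(chunks)
    per_chunk = max(len(c) for c in chunks if c is not None)
    units = [None] * (depth * per_chunk)
    for i, chunk in enumerate(chunks):
        if chunk is None:      # lost packet
            continue
        for k, u in enumerate(chunk):
            units[i + depth * k] = u
    return units
```

Losing one of four chunks leaves four isolated 5-msec gaps in the reconstructed stream rather than one 20-msec gap, which is exactly why concealment works better on interleaved audio.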
The obvious disadvantage of interleaving is that it
increases latency. This limits its use for conversational applications
such as VoIP, although it can perform well for streaming stored audio. A
major advantage of interleaving is that it does not increase the
bandwidth requirements of a stream.

Figure 9.6 Sending interleaved audio

Error Concealment

Error concealment schemes attempt to produce a replacement for a lost
packet that is similar to the original. As discussed in \[Perkins
1998\], this is possible since audio signals, and in particular speech,
exhibit large amounts of short-term self-similarity. As such, these
techniques work for relatively small loss rates (less than 15 percent),
and for small packets (4--40 msecs). When the loss length approaches the
length of a phoneme (5--100 msecs) these techniques break down, since
whole phonemes may be missed by the listener. Perhaps the simplest form
of receiver-based recovery is packet repetition. Packet repetition
replaces lost packets with copies of the packets that arrived
immediately before the loss. It has low computational complexity and
performs reasonably well. Another form of receiver-based recovery is
interpolation, which uses audio before and after the loss to interpolate
a suitable packet to cover the loss. Interpolation performs somewhat
better than packet repetition but is significantly more computationally
intensive \[Perkins 1998\].

9.3.4 Case Study: VoIP with Skype

Skype is an immensely popular VoIP application with over 50 million
accounts active on a daily basis. In addition to providing host-to-host
VoIP service, Skype offers host-to-phone services, phone-to-host
services, and multi-party host-to-host video conferencing services.
(Here, a host is again any Internet-connected IP device, including PCs,
tablets, and smartphones.) Skype was acquired by Microsoft in 2011.
Because the Skype protocol is proprietary, and because all Skype's
control and media packets are encrypted, it is difficult to precisely
determine how Skype operates. Nevertheless, from the Skype Web site and
several measurement studies, researchers have learned how Skype
generally works \[Baset 2006; Guha 2006; Chen 2006; Suh 2006; Ren 2006;
Zhang X 2012\]. For both voice and video, the Skype clients have at
their disposal many different codecs, which are capable of encoding the
media at a wide range of rates and qualities. For example, video rates
for Skype have been measured to be as low as 30 kbps for a low-quality
session up to almost 1 Mbps for a high-quality session \[Zhang X 2012\].
Typically, Skype's audio quality is better than the "POTS" (Plain Old
Telephone Service) quality provided by the wire-line phone system.
(Skype codecs typically sample voice at 16,000 samples/sec or higher,
providing richer tones than POTS, which samples at 8,000 samples/sec.)
By default, Skype sends audio and video packets over UDP. However,
control packets are sent over TCP, and media packets are also sent over
TCP when firewalls block UDP streams. Skype uses FEC for loss recovery
for both voice and video streams sent over UDP. The Skype client also
adapts the audio and video streams it sends to current network
conditions, by changing video quality and FEC overhead \[Zhang X 2012\].
Skype uses P2P techniques in a number of innovative ways, nicely
illustrating how P2P can be used in applications that go beyond content
distribution and file sharing. As with instant messaging, host-to-host
Internet telephony is inherently P2P since, at the heart of the
application, pairs of users (that is, peers) communicate with each other
in real time. But Skype also employs P2P techniques for two other
important functions, namely, for user location and for NAT traversal.
Figure 9.7 Skype peers

As shown in Figure 9.7, the peers (hosts) in Skype are organized into a
hierarchical overlay network, with each peer classified as a super peer
or an ordinary peer. Skype maintains an index that maps Skype usernames
to current IP addresses (and port numbers). This index is distributed
over the super peers. When Alice wants to call Bob, her Skype client
searches the distributed index to determine Bob's current IP address.
Because the Skype protocol is proprietary, it is currently not known how
the index mappings are organized across the super peers, although some
form of DHT organization is very possible. P2P techniques are also used
in Skype relays, which are useful for establishing calls between hosts
in home networks. Many home network configurations provide access to the
Internet through NATs, as discussed in Chapter 4. Recall that a NAT
prevents a host from outside the home network from initiating a
connection to a host within the home network. If both Skype callers have
NATs, then there is a problem---neither can accept a call initiated by
the other, making a call seemingly impossible. The clever use of super
peers and relays nicely solves this problem. Suppose that when Alice
signs in, she is assigned to a non-NATed super peer and initiates a
session to that super peer. (Since Alice is initiating the session, her
NAT permits this session.) This session allows Alice and her super peer
to exchange control messages. The same happens for Bob when he signs in.
Now, when Alice wants to call Bob, she informs her super peer, who in
turn informs Bob's super peer, who in turn informs Bob of Alice's
incoming call. If Bob accepts the call, the two super peers select a
third non-NATed super peer---the relay peer---whose job will be to relay
data between Alice and Bob. Alice's and Bob's super peers then instruct
Alice and Bob respectively to initiate a session with the relay.
As shown in Figure 9.7, Alice then sends voice packets to the relay over
the Alice-to-relay connection (which was initiated by Alice), and the
relay then forwards these packets over the relay-to-Bob connection
(which was initiated by Bob); packets from Bob to Alice flow over these
same two relay connections in reverse. And voila!---Bob and Alice have
an end-to-end connection even though neither can accept a session
originating from outside. Up to now, our discussion on Skype has focused
on calls involving two persons. Now let's examine multi-party audio
conference calls. With N\>2 participants, if each user were to send a
copy of its audio stream to each of the N−1 other users, then a total of
N(N−1) audio streams would need to be sent into the network to support
the audio conference. To reduce this bandwidth usage, Skype employs a
clever distribution technique. Specifically, each user sends its audio
stream to the conference initiator. The conference initiator combines
the audio streams into one stream (basically by adding all the audio
signals together) and then sends a copy of each combined stream to each
of the other N−1 participants. In this manner, the number of streams is
reduced to 2(N−1). For ordinary two-person video conversations, Skype
routes the call peer-to-peer, unless NAT traversal is required, in which
case the call is relayed through a non-NATed peer, as described earlier.
For a video conference call involving N\>2 participants, due to the
nature of the video medium, Skype does not combine the call into one
stream at one location and then redistribute the stream to all the
participants, as it does for voice calls. Instead, each participant's
video stream is routed to a server cluster (located in Estonia as of
2011), which in turn relays to each participant the N−1 streams of the
N−1 other participants \[Zhang X 2012\].
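The two stream counts above are easy to check with a couple of lines:

```python
def full_mesh_streams(n):
    """Each of the n users sends its stream to the other n-1 users."""
    return n * (n - 1)

def mix_at_initiator_streams(n):
    """Each non-initiator sends one stream up to the initiator, and the
    initiator sends one combined stream back to each of them: 2(n-1)."""
    return 2 * (n - 1)
```

For a four-person audio conference, the full mesh would require 12 streams, while mixing at the initiator reduces this to 6.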
You may be wondering why each
participant sends a copy of its video stream to a server rather than
directly sending a copy to each of the other N−1 participants. Indeed,
for both approaches, N(N−1) video streams are being collectively
received by the N participants in the conference. The reason is that
upstream link bandwidths are significantly lower than downstream link
bandwidths in most access links, so the upstream links may not be able
to support the N−1 streams with the P2P approach. VoIP systems such as
Skype, WeChat, and Google Talk introduce new privacy concerns.
Specifically, when Alice and Bob communicate over VoIP, Alice can sniff
Bob's IP address and then use geo-location services \[MaxMind 2016;
Quova 2016\] to determine Bob's current location and ISP (for example,
his work or home ISP). In fact, with Skype it is possible for Alice to
block the transmission of certain packets during call establishment so
that she obtains Bob's current IP address, say every hour, without Bob
knowing that he is being tracked and without being on Bob's contact
list. Furthermore, the IP address discovered from Skype can be
correlated with IP addresses found in BitTorrent, so that Alice can
determine the files that Bob is downloading \[LeBlond 2011\]. Moreover,
it is possible to partially decrypt a Skype call by doing a traffic
analysis of the packet sizes in a stream \[White 2011\].

9.4 Protocols for Real-Time Conversational Applications

Real-time conversational applications, including VoIP and video
conferencing, are compelling and very popular. It is therefore not
surprising that standards bodies, such as the IETF and ITU, have been
busy for many years (and continue to be busy!) at hammering out
standards for this class of applications. With the appropriate standards
in place for real-time conversational applications, independent
companies are creating new products that interoperate with each other.
In this section we examine RTP and SIP for real-time conversational
applications. Both standards are enjoying widespread implementation in
industry products.

9.4.1 RTP

In the previous section, we learned that the sender side of a
VoIP application appends header fields to the audio chunks before
passing them to the transport layer. These header fields include
sequence numbers and timestamps. Since most multimedia networking
applications can make use of sequence numbers and timestamps, it is
convenient to have a standardized packet structure that includes fields
for audio/video data, sequence number, and timestamp, as well as other
potentially useful fields. RTP, defined in RFC 3550, is such a standard.
RTP can be used for transporting common formats such as PCM, AAC, and
MP3 for sound and MPEG and H.263 for video. It can also be used for
transporting proprietary sound and video formats. Today, RTP enjoys
widespread implementation in many products and research prototypes. It
is also complementary to other important real-time interactive
protocols, such as SIP. In this section, we provide an introduction to
RTP. We also encourage you to visit Henning Schulzrinne's RTP site
\[Schulzrinne-RTP 2012\], which provides a wealth of information on the
subject. Also, you may want to visit the RAT site \[RAT 2012\], which
documents a VoIP application that uses RTP.

RTP Basics

RTP typically runs on top of UDP. The sending side encapsulates a media
chunk within an RTP packet, then encapsulates the packet in a UDP
segment, and then hands the segment to IP. The receiving side extracts
the RTP packet from the UDP segment, then extracts the media chunk from
the RTP packet, and then passes the chunk to the media player for
decoding and rendering. As an example, consider the use of RTP to
transport voice. Suppose the voice source is PCM-encoded (that is,
sampled, quantized, and digitized) at 64 kbps.
Further suppose -that the application collects the encoded data in 20-msec chunks, that -is, 160 bytes in a chunk. The sending side precedes each chunk of the -audio data with an RTP header that includes the type of audio encoding, -a sequence number, and a timestamp. The RTP header is normally 12 bytes. -The audio chunk along with the RTP header form the RTP packet. The RTP -packet is then sent into the UDP socket interface. At the receiver side, -the application receives the RTP packet from its socket interface. The -application extracts the audio chunk from the RTP packet and uses the -header fields of the RTP packet to properly decode and play back the -audio chunk. If an application incorporates RTP---instead of a -proprietary scheme to provide payload type, sequence numbers, or -timestamps---then the application will more easily interoperate with -other networked multimedia applications. For example, if two different -companies develop VoIP software and they both incorporate RTP into their -product, there may be some hope that a user using one of the VoIP -products will be able to communicate with a user using the other VoIP -product. In Section 9.4.2, we'll see that RTP is often used in -conjunction with SIP, an important standard for Internet telephony. It -should be emphasized that RTP does not provide any mechanism to ensure -timely delivery of data or provide other quality-of-service (QoS) -guarantees; it does not even guarantee delivery of packets or prevent -out-of-order delivery of packets. Indeed, RTP encapsulation is seen only -at the end systems. Routers do not distinguish between IP datagrams that -carry RTP packets and IP datagrams that don't. RTP allows each source -(for example, a camera or a microphone) to be assigned its own -independent RTP stream of packets. 
For example, for a video conference
between two participants, four RTP streams could be opened---two streams
for transmitting the audio (one in each direction) and two streams for
transmitting the video (again, one in each direction). However, many
popular encoding techniques---including MPEG 1 and MPEG 2---bundle the
audio and video into a single stream during the encoding process. When
the audio and video are bundled by the encoder, then only one RTP stream
is generated in each direction. RTP packets are not limited to unicast
applications. They can also be sent over one-to-many and many-to-many
multicast trees. For a many-to-many multicast session, all of the
session's senders and sources typically use the same multicast group for
sending their RTP streams. RTP multicast streams belonging together,
such as audio and video streams emanating from multiple senders in a
video conference application, belong to an RTP session.

Figure 9.8 RTP header fields

RTP Packet Header Fields

As shown in Figure 9.8, the four main RTP packet header fields are the
payload type, sequence number, timestamp, and source identifier fields.
The payload type field in the RTP packet is 7 bits long. For an audio
stream, the payload type field is used to indicate the type of audio
encoding (for example, PCM, adaptive delta modulation, linear predictive
encoding) that is being used. If a sender decides to change the encoding
in the middle of a session, the sender can inform the receiver of the
change through this payload type field. The sender may want to change
the encoding in order to increase the audio quality or to decrease the
RTP stream bit rate. Table 9.2 lists some of the audio payload types
currently supported by RTP. For a video stream, the payload type is used
to indicate the type of video encoding (for example, motion JPEG, MPEG
1, MPEG 2, H.261). Again, the sender can change video encoding on the
fly during a session.
Table 9.3 lists some of the video payload types currently supported by
RTP. The other important fields are the following:

- Sequence number field. The sequence number field is 16 bits long. The
  sequence number increments by one for each RTP packet sent, and may be
  used by the receiver to detect packet loss and to restore packet
  sequence. For example, if the receiver side of the application
  receives a stream of RTP packets with a gap between sequence numbers
  86 and 89, then the receiver knows that packets 87 and 88 are missing.
  The receiver can then attempt to conceal the lost data.

- Timestamp field. The timestamp field is 32 bits long. It reflects the
  sampling instant of the first byte in the RTP data packet. As we saw
  in the preceding section, the receiver can use timestamps to remove
  packet jitter introduced in the network and to provide synchronous
  playout at the receiver. The timestamp is derived from a sampling
  clock at the sender. As an example, for audio the timestamp clock
  increments by one for each sampling period (for example, each 125 μsec
  for an 8 kHz sampling clock); if the audio application generates
  chunks consisting of 160 encoded samples, then the timestamp increases
  by 160 for each RTP packet when the source is active. The timestamp
  clock continues to increase at a constant rate even if the source is
  inactive.

- Synchronization source identifier (SSRC). The SSRC field is 32 bits
  long. It identifies the source of the RTP stream. Typically, each
  stream in an RTP session has a distinct SSRC. The SSRC is not the IP
  address of the sender, but instead is a number that the source assigns
  randomly when the new stream is started. The probability that two
  streams get assigned the same SSRC is very small. Should this happen,
  the two sources pick a new SSRC value.
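The fields just described can be packed into the 12-byte header of Figure 9.8. The minimal sketch below fixes version 2 and leaves the padding, extension, CSRC-count, and marker bits at zero.

```python
import struct

def build_rtp_header(payload_type, seq, timestamp, ssrc):
    """Pack a minimal 12-byte RTP header (RFC 3550): version 2 with
    P=X=CC=0 in the first byte, marker bit clear plus the 7-bit payload
    type in the second, then the 16-bit sequence number, 32-bit
    timestamp, and 32-bit SSRC in network byte order."""
    byte0 = 2 << 6                  # V=2, no padding/extension/CSRC
    byte1 = payload_type & 0x7F     # M=0, 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

# PCM mu-law (payload type 0), one 160-sample chunk into the stream:
header = build_rtp_header(0, 1, 160, 0x5EED1234)
```

The audio chunk would be appended after these 12 bytes, and the whole RTP packet handed to a UDP socket, as described in the RTP Basics discussion above.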
Table 9.2 Audio payload types supported by RTP

| Payload-Type Number | Audio Format | Sampling Rate | Rate |
|---|---|---|---|
| 0 | PCM μ-law | 8 kHz | 64 kbps |
| 1 | 1016 | 8 kHz | 4.8 kbps |
| 3 | GSM | 8 kHz | 13 kbps |
| 7 | LPC | 8 kHz | 2.4 kbps |
| 9 | G.722 | 16 kHz | 48--64 kbps |
| 14 | MPEG Audio | 90 kHz | --- |
| 15 | G.728 | 8 kHz | 16 kbps |

Table 9.3 Some video payload types supported by RTP

| Payload-Type Number | Video Format |
|---|---|
| 26 | Motion JPEG |
| 31 | H.261 |
| 32 | MPEG 1 video |
| 33 | MPEG 2 video |

9.4.2 SIP

The Session Initiation Protocol (SIP), defined in \[RFC 3261; RFC 5411\], is an open and lightweight protocol that does the following:

- It provides mechanisms for establishing calls between a caller and a callee over an IP network. It allows the caller to notify the callee that it wants to start a call. It allows the participants to agree on media encodings. It also allows participants to end calls.
- It provides mechanisms for the caller to determine the current IP address of the callee. Users do not have a single, fixed IP address because they may be assigned addresses dynamically (using DHCP) and because they may have multiple IP devices, each with a different IP address.
- It provides mechanisms for call management, such as adding new media streams during the call, changing the encoding during the call, inviting new participants during the call, call transfer, and call holding.

Setting Up a Call to a Known IP Address

To understand the essence of SIP, it is best to take a look at a concrete example. In this example, Alice is at her PC and she wants to call Bob, who is also working at his PC. Alice's and Bob's PCs are both equipped with SIP-based software for making and receiving phone calls. In this initial example, we'll assume that Alice knows the IP address of Bob's PC. Figure 9.9 illustrates the SIP call-establishment process.
In Figure 9.9, we see that an SIP session begins when Alice sends Bob an INVITE message, which resembles an HTTP request message. This INVITE message is sent over UDP to the well-known port 5060 for SIP. (SIP messages can also be sent over TCP.) The INVITE message includes an identifier for Bob (bob@193.64.210.89), an indication of Alice's current IP address, an indication that Alice desires to receive audio, which is to be encoded in format AVP 0 (PCM encoded μ-law) and encapsulated in RTP, and an indication that she wants to receive the RTP packets on port 38060.

Figure 9.9 SIP call establishment when Alice knows Bob's IP address

After receiving Alice's INVITE message, Bob sends an SIP response message, which resembles an HTTP response message. This response SIP message is also sent to the SIP port 5060. Bob's response includes a 200 OK as well as an indication of his IP address, his desired encoding and packetization for reception, and his port number to which the audio packets should be sent. Note that in this example Alice and Bob are going to use different audio-encoding mechanisms: Alice is asked to encode her audio with GSM whereas Bob is asked to encode his audio with PCM μ-law. After receiving Bob's response, Alice sends Bob an SIP acknowledgment message. After this SIP transaction, Bob and Alice can talk. (For visual convenience, Figure 9.9 shows Alice talking after Bob, but in truth they would normally talk at the same time.) Bob will encode and packetize the audio as requested and send the audio packets to port number 38060 at IP address 167.180.112.24. Alice will also encode and packetize the audio as requested and send the audio packets to port number 48753 at IP address 193.64.210.89. From this simple example, we have learned a number of key characteristics of SIP.
First, SIP is an out-of-band protocol: The SIP messages are sent and received in sockets that are different from those used for sending and receiving the media data. Second, the SIP messages themselves are ASCII-readable and resemble HTTP messages. Third, SIP requires all messages to be acknowledged, so it can run over UDP or TCP.

In this example, let's consider what would happen if Bob does not have a PCM μ-law codec for encoding audio. In this case, instead of responding with 200 OK, Bob would likely respond with a 606 Not Acceptable and list in the message all the codecs he can use. Alice would then choose one of the listed codecs and send another INVITE message, this time advertising the chosen codec. Bob could also simply reject the call by sending one of many possible rejection reply codes. (There are many such codes, including "busy," "gone," "payment required," and "forbidden.")

SIP Addresses

In the previous example, Bob's SIP address is sip:bob@193.64.210.89. However, we expect many---if not most---SIP addresses to resemble e-mail addresses. For example, Bob's address might be sip:bob@domain.com. When Alice's SIP device sends an INVITE message, the message would include this e-mail-like address; the SIP infrastructure would then route the message to the IP device that Bob is currently using (as we'll discuss below). Other possible forms for the SIP address could be Bob's legacy phone number or simply Bob's first/middle/last name (assuming it is unique). An interesting feature of SIP addresses is that they can be included in Web pages, just as people's e-mail addresses are included in Web pages with the mailto URL. For example, suppose Bob has a personal homepage, and he wants to provide a means for visitors to the homepage to call him. He could then simply include the URL sip:bob@domain.com. When the visitor clicks on the URL, the SIP application in the visitor's device is launched and an INVITE message is sent to Bob.
SIP Messages

In this short introduction to SIP, we'll not cover all SIP message types and headers. Instead, we'll take a brief look at the SIP INVITE message, along with a few common header lines. Let us again suppose that Alice wants to initiate a VoIP call to Bob, and this time Alice knows only Bob's SIP address, bob@domain.com, and does not know the IP address of the device that Bob is currently using. Then her message might look something like this:

    INVITE sip:bob@domain.com SIP/2.0
    Via: SIP/2.0/UDP 167.180.112.24
    From: sip:alice@hereway.com
    To: sip:bob@domain.com
    Call-ID: a2e3a@pigeon.hereway.com
    Content-Type: application/sdp
    Content-Length: 885

    c=IN IP4 167.180.112.24
    m=audio 38060 RTP/AVP 0

The INVITE line includes the SIP version, as does an HTTP request message. Whenever an SIP message passes through an SIP device (including the device that originates the message), it attaches a Via header, which indicates the IP address of the device. (We'll see soon that the typical INVITE message passes through many SIP devices before reaching the callee's SIP application.) Similar to an e-mail message, the SIP message includes a From header line and a To header line. The message includes a Call-ID, which uniquely identifies the call (similar to the message-ID in e-mail). It includes a Content-Type header line, which defines the format used to describe the content contained in the SIP message. It also includes a Content-Length header line, which provides the length in bytes of the content in the message. Finally, after a carriage return and line feed, the message contains the content. In this case, the content provides information about Alice's IP address and how Alice wants to receive the audio.

Name Translation and User Location

In the example in Figure 9.9, we assumed that Alice's SIP device knew the IP address where Bob could be contacted.
But this assumption is quite unrealistic, not only because IP addresses are often dynamically assigned with DHCP, but also because Bob may have multiple IP devices (for example, different devices for his home, work, and car). So now let us suppose that Alice knows only Bob's e-mail address, bob@domain.com, and that this same address is used for SIP-based calls. In this case, Alice needs to obtain the IP address of the device that the user bob@domain.com is currently using. To find this out, Alice creates an INVITE message that begins with INVITE bob@domain.com SIP/2.0 and sends this message to an SIP proxy. The proxy will respond with an SIP reply that might include the IP address of the device that bob@domain.com is currently using. Alternatively, the reply might include the IP address of Bob's voicemail box, or it might include a URL of a Web page (that says "Bob is sleeping. Leave me alone!"). Also, the result returned by the proxy might depend on the caller: If the call is from Bob's wife, he might accept the call and supply his IP address; if the call is from Bob's mother-in-law, he might respond with the URL that points to the I-am-sleeping Web page!

Now, you are probably wondering, how can the proxy server determine the current IP address for bob@domain.com? To answer this question, we need to say a few words about another SIP device, the SIP registrar. Every SIP user has an associated registrar. Whenever a user launches an SIP application on a device, the application sends an SIP register message to the registrar, informing the registrar of its current IP address. For example, when Bob launches his SIP application on his PDA, the application would send a message along the lines of:

    REGISTER sip:domain.com SIP/2.0
    Via: SIP/2.0/UDP 193.64.210.89
    From: sip:bob@domain.com
    To: sip:bob@domain.com
    Expires: 3600

Bob's registrar keeps track of Bob's current IP address.
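The registrar's bookkeeping amounts to a lookup table from a fixed identifier (such as bob@domain.com) to the dynamic IP address most recently announced in a REGISTER message, together with the Expires deadline. The following is a minimal sketch of this idea; the class and method names are hypothetical, not part of SIP:

```python
class Registrar:
    """Toy SIP registrar: maps a fixed user identifier to the
    (dynamic IP address, expiration time) it last registered."""

    def __init__(self):
        self.bindings = {}          # identifier -> (ip, expiry)

    def register(self, user, ip, expires, now):
        # A REGISTER (or refresh) replaces any earlier binding.
        self.bindings[user] = (ip, now + expires)

    def lookup(self, user, now):
        entry = self.bindings.get(user)
        if entry is None or now >= entry[1]:
            return None             # unknown user or expired binding
        return entry[0]

reg = Registrar()
# Bob's REGISTER from the example: Expires: 3600 seconds.
reg.register("bob@domain.com", "193.64.210.89", expires=3600, now=0)
```

Unless Bob's device refreshes the registration, a lookup after 3600 seconds finds no valid binding, which is exactly why refresh register messages are needed.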
Whenever Bob switches to a new SIP device, the new device sends a new register message, indicating the new IP address. Also, if Bob remains at the same device for an extended period of time, the device will send refresh register messages, indicating that the most recently sent IP address is still valid. (In the example above, refresh messages need to be sent every 3600 seconds to maintain the address at the registrar server.) It is worth noting that the registrar is analogous to a DNS authoritative name server: The DNS server translates fixed host names to fixed IP addresses; the SIP registrar translates fixed human identifiers (for example, bob@domain.com) to dynamic IP addresses. Often SIP registrars and SIP proxies are run on the same host.

Now let's examine how Alice's SIP proxy server obtains Bob's current IP address. From the preceding discussion we see that the proxy server simply needs to forward Alice's INVITE message to Bob's registrar/proxy. The registrar/proxy could then forward the message to Bob's current SIP device. Finally, Bob, having now received Alice's INVITE message, could send an SIP response to Alice. As an example, consider Figure 9.10, in which jim@umass.edu, currently working on 217.123.56.89, wants to initiate a Voice-over-IP (VoIP) session with keith@upenn.edu, currently working on 197.87.54.21. The following steps are taken:

Figure 9.10 Session initiation, involving SIP proxies and registrars

(1) Jim sends an INVITE message to the umass SIP proxy. (2) The proxy does a DNS lookup on the SIP registrar upenn.edu (not shown in diagram) and then forwards the message to the registrar server. (3) Because keith@upenn.edu is no longer registered at the upenn registrar, the upenn registrar sends a redirect response, indicating that it should try keith@nyu.edu. (4) The umass proxy sends an INVITE message to the NYU SIP registrar.
(5) The NYU registrar knows the IP address of keith@nyu.edu and forwards the INVITE message to the host 197.87.54.21, which is running Keith's SIP client. (6--8) An SIP response is sent back through registrars/proxies to the SIP client on 217.123.56.89. (9) Media is sent directly between the two clients. (There is also an SIP acknowledgment message, which is not shown.)

Our discussion of SIP has focused on call initiation for voice calls. SIP, being a signaling protocol for initiating and ending calls in general, can be used for video conference calls as well as for text-based sessions. In fact, SIP has become a fundamental component in many instant messaging applications. Readers desiring to learn more about SIP are encouraged to visit Henning Schulzrinne's SIP Web site \[Schulzrinne-SIP 2016\]. In particular, on this site you will find open source software for SIP clients and servers \[SIP Software 2016\].

9.5 Network Support for Multimedia

In Sections 9.2 through 9.4, we learned how application-level mechanisms such as client buffering, prefetching, adapting media quality to available bandwidth, adaptive playout, and loss mitigation techniques can be used by multimedia applications to improve a multimedia application's performance. We also learned how content distribution networks and P2P overlay networks can be used to provide a system-level approach for delivering multimedia content. These techniques and approaches are all designed to be used in today's best-effort Internet. Indeed, they are in use today precisely because the Internet provides only a single, best-effort class of service. But as designers of computer networks, we can't help but ask whether the network (rather than the applications or application-level infrastructure alone) might provide mechanisms to support multimedia content delivery. As we'll see shortly, the answer is, of course, "yes"!
But we'll also see that a number of these new network-level mechanisms have yet to be widely deployed. This may be due to their complexity and to the fact that application-level techniques together with best-effort service and properly dimensioned network resources (for example, bandwidth) can indeed provide a "good-enough" (even if not-always-perfect) end-to-end multimedia delivery service. Table 9.4 summarizes three broad approaches towards providing network-level support for multimedia applications.

- Making the best of best-effort service. The application-level mechanisms and infrastructure that we studied in Sections 9.2 through 9.4 can be successfully used in a well-dimensioned network where packet loss and excessive end-to-end delay rarely occur. When demand increases are forecasted, the ISPs deploy additional bandwidth and switching capacity to continue to ensure satisfactory delay and packet-loss performance \[Huang 2005\]. We'll discuss such network dimensioning further in Section 9.5.1.
- Differentiated service. Since the early days of the Internet, it's been envisioned that different types of traffic (for example, as indicated in the Type-of-Service field in the IPv4 packet header) could be provided with different classes of service, rather than a single one-size-fits-all best-effort service. With differentiated service, one type of traffic might be given strict priority over another class of traffic when both types of traffic are queued at a router. For example, packets belonging to a real-time conversational application might be given priority over other packets due to their stringent delay constraints. Introducing differentiated service into the network will require new mechanisms for packet marking (indicating a packet's class of service), packet scheduling, and more. We'll cover differentiated service, and new network mechanisms needed to implement this service, in Sections 9.5.2 and 9.5.3.
Table 9.4 Three network-level approaches to supporting multimedia applications

| Approach | Granularity | Guarantee | Mechanisms | Complexity | Deployment to date |
|---|---|---|---|---|---|
| Making the best of best-effort service | all traffic treated equally | none, or soft | application-layer support, CDNs, overlays, network-level resource provisioning | minimal | everywhere |
| Differentiated service | different classes of traffic treated differently | none, or soft | packet marking, policing, scheduling | medium | some |
| Per-connection Quality-of-Service (QoS) Guarantees | each source-destination flow treated differently | soft or hard, once flow is admitted | packet marking, policing, scheduling; call admission and signaling | high | little |

- Per-connection Quality-of-Service (QoS) Guarantees. With per-connection QoS guarantees, each instance of an application explicitly reserves end-to-end bandwidth and thus has a guaranteed end-to-end performance. A hard guarantee means the application will receive its requested quality of service (QoS) with certainty. A soft guarantee means the application will receive its requested quality of service with high probability. For example, if a user wants to make a VoIP call from Host A to Host B, the user's VoIP application reserves bandwidth explicitly in each link along a route between the two hosts. But permitting applications to make reservations and requiring the network to honor the reservations requires some big changes. First, we need a protocol that, on behalf of the applications, reserves link bandwidth on the paths from the senders to their receivers. Second, we'll need new scheduling policies in the router queues so that per-connection bandwidth reservations can be honored.
Finally, in order to make a reservation, the applications must give the network a description of the traffic that they intend to send into the network, and the network will need to police each application's traffic to make sure that it abides by that description. These mechanisms, when combined, require new and complex software in hosts and routers. Because per-connection QoS guaranteed service has not seen significant deployment, we'll cover these mechanisms only briefly in Section 9.5.4.

9.5.1 Dimensioning Best-Effort Networks

Fundamentally, the difficulty in supporting multimedia applications arises from their stringent performance requirements---low end-to-end packet delay, delay jitter, and loss---and the fact that packet delay, delay jitter, and loss occur whenever the network becomes congested. A first approach to improving the quality of multimedia applications---an approach that can often be used to solve just about any problem where resources are constrained---is simply to "throw money at the problem" and thus simply avoid resource contention. In the case of networked multimedia, this means providing enough link capacity throughout the network so that network congestion, and its consequent packet delay and loss, never (or only very rarely) occurs. With enough link capacity, packets could zip through today's Internet without queuing delay or loss. From many perspectives this is an ideal situation---multimedia applications would perform perfectly, users would be happy, and this could all be achieved with no changes to the Internet's best-effort architecture. The question, of course, is how much capacity is "enough" to achieve this nirvana, and whether the costs of providing "enough" bandwidth are practical from a business standpoint to the ISPs. The question of how much capacity to provide at network links in a given topology to achieve a given level of performance is often known as bandwidth provisioning.
The even more complicated problem of how to design a network topology (where to place routers, how to interconnect routers with links, and what capacity to assign to links) to achieve a given level of end-to-end performance is a network design problem often referred to as network dimensioning. Both bandwidth provisioning and network dimensioning are complex topics, well beyond the scope of this textbook. We note here, however, that the following issues must be addressed in order to predict application-level performance between two network end points, and thus provision enough capacity to meet an application's performance requirements.

- Models of traffic demand between network end points. Models may need to be specified at both the call level (for example, users "arriving" to the network and starting up end-to-end applications) and at the packet level (for example, packets being generated by ongoing applications). Note that workload may change over time.
- Well-defined performance requirements. For example, a performance requirement for supporting delay-sensitive traffic, such as a conversational multimedia application, might be that the probability that the end-to-end delay of the packet is greater than a maximum tolerable delay be less than some small value \[Fraleigh 2003\].
- Models to predict end-to-end performance for a given workload model, and techniques to find a minimal cost bandwidth allocation that will result in all user requirements being met. Here, researchers are busy developing performance models that can quantify performance for a given workload, and optimization techniques to find minimal-cost bandwidth allocations meeting performance requirements.

Given that today's best-effort Internet could (from a technology standpoint) support multimedia traffic at an appropriate performance level if it were dimensioned to do so, the natural question is why today's Internet doesn't do so.
The answers are primarily economic and organizational. From an economic standpoint, would users be willing to pay their ISPs enough for the ISPs to install sufficient bandwidth to support multimedia applications over a best-effort Internet? The organizational issues are perhaps even more daunting. Note that an end-to-end path between two multimedia end points will pass through the networks of multiple ISPs. From an organizational standpoint, would these ISPs be willing to cooperate (perhaps with revenue sharing) to ensure that the end-to-end path is properly dimensioned to support multimedia applications? For a perspective on these economic and organizational issues, see \[Davies 2005\]. For a perspective on provisioning tier-1 backbone networks to support delay-sensitive traffic, see \[Fraleigh 2003\].

9.5.2 Providing Multiple Classes of Service

Perhaps the simplest enhancement to the one-size-fits-all best-effort service in today's Internet is to divide traffic into classes, and provide different levels of service to these different classes of traffic. For example, an ISP might well want to provide a higher class of service to delay-sensitive Voice-over-IP or teleconferencing traffic (and charge more for this service!) than to elastic traffic such as e-mail or HTTP. Alternatively, an ISP may simply want to provide a higher quality of service to customers willing to pay more for this improved service. A number of residential wired-access ISPs and cellular wireless-access ISPs have adopted such tiered levels of service---with platinum-service subscribers receiving better performance than gold- or silver-service subscribers.
We're all familiar with different classes of service from our everyday lives---first-class airline passengers get better service than business-class passengers, who in turn get better service than those of us who fly economy class; VIPs are provided immediate entry to events while everyone else waits in line; elders are revered in some countries and provided seats of honor and the finest food at a table. It's important to note that such differential service is provided among aggregates of traffic, that is, among classes of traffic, not among individual connections. For example, all first-class passengers are handled the same (with no first-class passenger receiving any better treatment than any other first-class passenger), just as all VoIP packets would receive the same treatment within the network, independent of the particular end-to-end connection to which they belong. As we will see, by dealing with a small number of traffic aggregates, rather than a large number of individual connections, the new network mechanisms required to provide better-than-best service can be kept relatively simple. The early Internet designers clearly had this notion of multiple classes of service in mind. Recall the type-of-service (ToS) field in the IPv4 header discussed in Chapter 4. IEN123 \[ISI 1979\] describes the ToS field also present in an ancestor of the IPv4 datagram as follows: "The Type of Service \[field\] provides an indication of the abstract parameters of the quality of service desired. These parameters are to be used to guide the selection of the actual service parameters when transmitting a datagram through a particular network. Several networks offer service precedence, which somehow treats high precedence traffic as more important than other traffic." More than four decades ago, the vision of providing different levels of service to different classes of traffic was clear!
However, it's taken us an equally long period of time to realize this vision.

Motivating Scenarios

Let's begin our discussion of network mechanisms for providing multiple classes of service with a few motivating scenarios. Figure 9.11 shows a simple network scenario in which two application packet flows originate on Hosts H1 and H2 on one LAN and are destined for Hosts H3 and H4 on another LAN. The routers on the two LANs are connected by a 1.5 Mbps link. Let's assume the LAN speeds are significantly higher than 1.5 Mbps, and focus on the output queue of router R1; it is here that packet delay and packet loss will occur if the aggregate sending rate of H1 and H2 exceeds 1.5 Mbps. Let's further suppose that a 1 Mbps audio application (for example, a CD-quality audio call) shares the 1.5 Mbps link between R1 and R2 with an HTTP Web-browsing application that is downloading a Web page from H2 to H4.

Figure 9.11 Competing audio and HTTP applications

In the best-effort Internet, the audio and HTTP packets are mixed in the output queue at R1 and (typically) transmitted in a first-in-first-out (FIFO) order. In this scenario, a burst of packets from the Web server could potentially fill up the queue, causing IP audio packets to be excessively delayed or lost due to buffer overflow at R1. How should we solve this potential problem? Given that the HTTP Web-browsing application does not have time constraints, our intuition might be to give strict priority to audio packets at R1. Under a strict priority scheduling discipline, an audio packet in the R1 output buffer would always be transmitted before any HTTP packet in the R1 output buffer. The link from R1 to R2 would look like a dedicated link of 1.5 Mbps to the audio traffic, with HTTP traffic using the R1-to-R2 link only when no audio traffic is queued.
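The strict-priority discipline just described can be sketched as a pair of per-class queues, with the audio queue always drained before the HTTP queue. This is an illustrative toy model of R1's output port, not router code; the class and packet names are our own:

```python
from collections import deque

class StrictPriorityLink:
    """Toy model of R1's output queue with two classes: audio (high
    priority) and http (low priority). The link always transmits a
    queued audio packet before any HTTP packet."""

    def __init__(self):
        self.queues = {"audio": deque(), "http": deque()}

    def enqueue(self, cls, pkt):
        self.queues[cls].append(pkt)

    def dequeue(self):
        for cls in ("audio", "http"):   # audio has strict priority
            if self.queues[cls]:
                return cls, self.queues[cls].popleft()
        return None                     # both queues empty

link = StrictPriorityLink()
# Packets arrive interleaved, but audio departs first.
link.enqueue("http", "H1")
link.enqueue("audio", "A1")
link.enqueue("http", "H2")
link.enqueue("audio", "A2")
order = [link.dequeue() for _ in range(4)]
```

Note that nothing in this discipline limits the audio class, which is exactly the starvation problem discussed next.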
In order for R1 to distinguish between the audio and HTTP packets in its queue, each packet must be marked as belonging to one of these two classes of traffic. This was the original goal of the type-of-service (ToS) field in IPv4. As obvious as this might seem, this then is our first insight into mechanisms needed to provide multiple classes of traffic:

Insight 1: Packet marking allows a router to distinguish among packets belonging to different classes of traffic.

Note that although our example considers a competing multimedia and elastic flow, the same insight applies to the case that platinum, gold, and silver classes of service are implemented---a packet-marking mechanism is still needed to indicate the class of service to which a packet belongs. Now suppose that the router is configured to give priority to packets marked as belonging to the 1 Mbps audio application. Since the outgoing link speed is 1.5 Mbps, even though the HTTP packets receive lower priority, they can still, on average, receive 0.5 Mbps of transmission service. But what happens if the audio application starts sending packets at a rate of 1.5 Mbps or higher (either maliciously or due to an error in the application)? In this case, the HTTP packets will starve, that is, they will not receive any service on the R1-to-R2 link. Similar problems would occur if multiple applications (for example, multiple audio calls), all with the same class of service as the audio application, were sharing the link's bandwidth; they too could collectively starve the HTTP session. Ideally, one wants a degree of isolation among classes of traffic so that one class of traffic can be protected from the other. This protection could be implemented at different places in the network---at each and every router, at first entry to the network, or at inter-domain network boundaries.
This then is our second insight:

Insight 2: It is desirable to provide a degree of traffic isolation among classes so that one class is not adversely affected by another class of traffic that misbehaves.

We'll examine several specific mechanisms for providing such isolation among traffic classes. We note here that two broad approaches can be taken. First, it is possible to perform traffic policing, as shown in Figure 9.12. If a traffic class or flow must meet certain criteria (for example, that the audio flow not exceed a peak rate of 1 Mbps), then a policing mechanism can be put into place to ensure that these criteria are indeed observed. If the policed application misbehaves, the policing mechanism will take some action (for example, drop or delay packets that are in violation of the criteria) so that the traffic actually entering the network conforms to the criteria. The leaky bucket mechanism that we'll examine shortly is perhaps the most widely used policing mechanism. In Figure 9.12, the packet classification and marking mechanism (Insight 1) and the policing mechanism (Insight 2) are both implemented together at the network's edge, either in the end system or at an edge router.

Figure 9.12 Policing (and marking) the audio and HTTP traffic classes

A complementary approach for providing isolation among traffic classes is for the link-level packet-scheduling mechanism to explicitly allocate a fixed amount of link bandwidth to each class. For example, the audio class could be allocated 1 Mbps at R1, and the HTTP class could be allocated 0.5 Mbps. In this case, the audio and HTTP flows see a logical link with capacity 1.0 and 0.5 Mbps, respectively, as shown in Figure 9.13.

Figure 9.13 Logical isolation of audio and HTTP traffic classes
With strict enforcement of the link-level allocation of bandwidth, a class can use only the amount of bandwidth that has been allocated; in particular, it cannot utilize bandwidth that is not currently being used by others. For example, if the audio flow goes silent (for example, if the speaker pauses and generates no audio packets), the HTTP flow would still not be able to transmit more than 0.5 Mbps over the R1-to-R2 link, even though the audio flow's 1 Mbps bandwidth allocation is not being used at that moment. Since bandwidth is a "use-it-or-lose-it" resource, there is no reason to prevent HTTP traffic from using bandwidth not used by the audio traffic. We'd like to use bandwidth as efficiently as possible, never wasting it when it could be otherwise used. This gives rise to our third insight:

Insight 3: While providing isolation among classes or flows, it is desirable to use resources (for example, link bandwidth and buffers) as efficiently as possible.

Recall from our discussion in Sections 1.3 and 4.2 that packets belonging to various network flows are multiplexed and queued for transmission at the output buffers associated with a link. The manner in which queued packets are selected for transmission on the link is known as the link-scheduling discipline, and was discussed in detail in Section 4.2. Recall that in Section 4.2 three link-scheduling disciplines were discussed, namely, FIFO, priority queuing, and Weighted Fair Queuing (WFQ). We'll soon see that WFQ will play a particularly important role for isolating the traffic classes.

The Leaky Bucket

One of our earlier insights was that policing, the regulation of the rate at which a class or flow (we will assume the unit of policing is a flow in our discussion below) is allowed to inject packets into the network, is an important QoS mechanism. But what aspects of a flow's packet rate should be policed?
We can identify three important policing criteria, each differing from the other according to the time scale over which the packet flow is policed:

Average rate. The network may wish to limit the long-term average rate (packets per time interval) at which a flow's packets can be sent into the network. A crucial issue here is the interval of time over which the average rate will be policed. A flow whose average rate is limited to 100 packets per second is more constrained than a source that is limited to 6,000 packets per minute, even though both have the same average rate over a long enough interval of time. For example, the latter constraint would allow a flow to send 1,000 packets in a given second-long interval of time, while the former constraint would disallow this sending behavior.

Peak rate. While the average-rate constraint limits the amount of traffic that can be sent into the network over a relatively long period of time, a peak-rate constraint limits the maximum number of packets that can be sent over a shorter period of time. Using our example above, the network may police a flow at an average rate of 6,000 packets per minute, while limiting the flow's peak rate to 1,500 packets per second.

Burst size. The network may also wish to limit the maximum number of packets (the "burst" of packets) that can be sent into the network over an extremely short interval of time. In the limit, as the interval length approaches zero, the burst size limits the number of packets that can be instantaneously sent into the network. Even though it is physically impossible to instantaneously send multiple packets into the network (after all, every link has a physical transmission rate that cannot be exceeded!), the abstraction of a maximum burst size is a useful one.

The leaky bucket mechanism is an abstraction that can be used to characterize these policing limits.
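The gap between the two average-rate constraints in the example above, identical in the long run but different over short windows, can be checked with a line of arithmetic (an illustrative sketch; the variable names are invented):

```python
# Both policers allow an average of 100 packets/sec over the long run,
# but they enforce that average over different windows.
LIMIT_PER_SEC = 100     # at most 100 packets in any 1-second window
LIMIT_PER_MIN = 6000    # at most 6,000 packets in any 60-second window

# A source sends a 1,000-packet burst in one second, then goes silent.
burst = 1000

allowed_by_per_sec_policer = burst <= LIMIT_PER_SEC   # burst exceeds 100
allowed_by_per_min_policer = burst <= LIMIT_PER_MIN   # within the minute budget
print(allowed_by_per_sec_policer, allowed_by_per_min_policer)
```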
As shown in Figure 9.14, a leaky bucket consists of a bucket that can hold up to b tokens. Tokens are added to this bucket as follows. New tokens, which may potentially be added to the bucket, are always being generated at a rate of r tokens per second. (We assume here for simplicity that the unit of time is a second.) If the bucket is filled with less than b tokens when a token is generated, the newly generated token is added to the bucket; otherwise the newly generated token is ignored, and the token bucket remains full with b tokens.

Figure 9.14 The leaky bucket policer

Let us now consider how the leaky bucket can be used to police a packet flow. Suppose that before a packet is transmitted into the network, it must first remove a token from the token bucket. If the token bucket is empty, the packet must wait for a token. (An alternative is for the packet to be dropped, although we will not consider that option here.) Let us now consider how this behavior polices a traffic flow. Because there can be at most b tokens in the bucket, the maximum burst size for a leaky-bucket-policed flow is b packets. Furthermore, because the token generation rate is r, the maximum number of packets that can enter the network in any interval of time of length t is rt + b. Thus, the token-generation rate, r, serves to limit the long-term average rate at which packets can enter the network. It is also possible to use leaky buckets (specifically, two leaky buckets in series) to police a flow's peak rate in addition to the long-term average rate; see the homework problems at the end of this chapter.

Leaky Bucket + Weighted Fair Queuing = Provable Maximum Delay in a Queue

Let's close our discussion on policing by showing how the leaky bucket and WFQ can be combined to provide a bound on the delay through a router's queue. (Readers who have forgotten about WFQ are encouraged to review WFQ, which is covered in Section 4.2.)
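The token-adding and token-removal rules just described can be sketched as a small simulation (a minimal sketch; the class name and the time handling are illustrative, not from the text):

```python
class LeakyBucketPolicer:
    """Bucket holds up to b tokens, refilled at r tokens/sec; a packet
    may enter the network only by removing one token."""

    def __init__(self, r, b):
        self.r = r            # token-generation rate (tokens/sec)
        self.b = b            # bucket depth = maximum burst size (packets)
        self.tokens = b       # bucket starts full
        self.last = 0.0       # time of the previous update, in seconds

    def _refill(self, now):
        # Tokens accumulate at r per second, but the bucket never exceeds
        # b tokens; tokens generated while the bucket is full are ignored.
        self.tokens = min(self.b, self.tokens + self.r * (now - self.last))
        self.last = now

    def conforms(self, now):
        """True (consuming a token) if a packet arriving at time `now` may
        enter the network; False means it must wait or be dropped."""
        self._refill(now)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Over any interval of length t, at most r*t + b packets can enter.
policer = LeakyBucketPolicer(r=100, b=10)
admitted_at_t0 = sum(policer.conforms(0.0) for _ in range(15))
print(admitted_at_t0)     # the instantaneous burst is capped at b = 10
```

One second later the bucket is full again (10 tokens accumulate and the excess is discarded), so another burst of up to b packets would conform.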
Let's consider a router's output link that multiplexes n flows, each policed by a leaky bucket with parameters bi and ri, i = 1, ..., n, using WFQ scheduling. We use the term flow here loosely to refer to the set of packets that are not distinguished from each other by the scheduler. In practice, a flow might comprise traffic from a single end-to-end connection or a collection of many such connections; see Figure 9.15.

Figure 9.15 n multiplexed leaky bucket flows with WFQ scheduling

Recall from our discussion of WFQ that each flow, i, is guaranteed to receive a share of the link bandwidth equal to at least R⋅wi/(∑wj), where R is the transmission rate of the link in packets/sec. What then is the maximum delay that a packet will experience while waiting for service in the WFQ (that is, after passing through the leaky bucket)? Let us focus on flow 1. Suppose that flow 1's token bucket is initially full. A burst of b1 packets then arrives to the leaky bucket policer for flow 1. These packets remove all of the tokens (without wait) from the leaky bucket and then join the WFQ waiting area for flow 1. Since these b1 packets are served at a rate of at least R⋅w1/(∑wj) packets/sec, the last of these packets will then have a maximum delay, dmax, until its transmission is completed, where

dmax = b1 / (R⋅w1/∑wj)

The rationale behind this formula is that if there are b1 packets in the queue and packets are being serviced (removed) from the queue at a rate of at least R⋅w1/(∑wj) packets per second, then the amount of time until the last bit of the last packet is transmitted cannot be more than b1/(R⋅w1/(∑wj)). A homework problem asks you to prove that as long as r1 < R⋅w1/(∑wj), then dmax is indeed the maximum delay that any packet in flow 1 will ever experience in the WFQ queue.
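Plugging illustrative numbers into the bound dmax = b1/(R⋅w1/∑wj) makes the guarantee concrete (the link rate, weights, and bucket depth below are invented for the example):

```python
# WFQ delay bound for a leaky-bucket-policed flow:
#   d_max = b1 / (R * w1 / sum(w))
# Illustrative numbers: link rate R = 1000 packets/sec, three flows with
# weights [1, 2, 2], flow 1 policed with bucket depth b1 = 50 packets.
R = 1000.0
w = [1.0, 2.0, 2.0]
b1 = 50.0

guaranteed_rate = R * w[0] / sum(w)   # flow 1's guaranteed service rate
d_max = b1 / guaranteed_rate          # worst-case wait for the last packet
print(guaranteed_rate, d_max)         # 200.0 packets/sec, 0.25 sec
```

A burst of 50 packets drained at a guaranteed 200 packets/sec is fully transmitted within 0.25 seconds, which is exactly what the bound promises.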
9.5.3 Diffserv

Having seen the motivation, insights, and specific mechanisms for providing multiple classes of service, let's wrap up our study of approaches toward providing multiple classes of service with an example---the Internet Diffserv architecture [RFC 2475; Kilkki 1999]. Diffserv provides service differentiation---that is, the ability to handle different classes of traffic in different ways within the Internet in a scalable manner. The need for scalability arises from the fact that millions of simultaneous source-destination traffic flows may be present at a backbone router. We'll see shortly that this need is met by placing only simple functionality within the network core, with more complex control operations being implemented at the network's edge.

Let's begin with the simple network shown in Figure 9.16. We'll describe one possible use of Diffserv here; other variations are possible, as described in RFC 2475. The Diffserv architecture consists of two sets of functional elements:

Edge functions: Packet classification and traffic conditioning. At the incoming edge of the network (that is, at either a Diffserv-capable host that generates traffic or at the first Diffserv-capable router that the traffic passes through), arriving packets are marked. More specifically, the differentiated service (DS) field in the IPv4 or IPv6 packet header is set to some value [RFC 3260]. The definition of the DS field is intended to supersede the earlier definitions of the IPv4 type-of-service field and the IPv6 traffic class fields that we discussed in Chapter 4. For example, in Figure 9.16, packets being sent from H1 to H3 might be marked at R1, while packets being sent from H2 to H4 might be marked at R2. The mark that a packet receives identifies the class of traffic to which it belongs. Different classes of traffic will then receive different service within the core network.
Figure 9.16 A simple Diffserv network example

Core function: Forwarding. When a DS-marked packet arrives at a Diffserv-capable router, the packet is forwarded onto its next hop according to the so-called per-hop behavior (PHB) associated with that packet's class. The per-hop behavior influences how a router's buffers and link bandwidth are shared among the competing classes of traffic. A crucial tenet of the Diffserv architecture is that a router's per-hop behavior will be based only on packet markings, that is, the class of traffic to which a packet belongs. Thus, if packets being sent from H1 to H3 in Figure 9.16 receive the same marking as packets being sent from H2 to H4, then the network routers treat these packets as an aggregate, without distinguishing whether the packets originated at H1 or H2. For example, R3 would not distinguish between packets from H1 and H2 when forwarding these packets on to R4. Thus, the Diffserv architecture obviates the need to keep router state for individual source-destination pairs---a critical consideration in making Diffserv scalable.

An analogy might prove useful here. At many large-scale social events (for example, a large public reception, a large dance club or discothèque, a concert, or a football game), people entering the event receive a pass of one type or another: VIP passes for Very Important People; over-21 passes for people who are 21 years old or older (for example, if alcoholic drinks are to be served); backstage passes at concerts; press passes for reporters; even an ordinary pass for the Ordinary Person. These passes are typically distributed upon entry to the event, that is, at the edge of the event. It is here at the edge where computationally intensive operations, such as paying for entry, checking for the appropriate type of invitation, and matching an invitation against a piece of identification, are performed.
Furthermore, there may be a limit on the number of people of a given type that are allowed into an event. If there is such a limit, people may have to wait before entering the event. Once inside the event, one's pass allows one to receive differentiated service at many locations around the event---a VIP is provided with free drinks, a better table, free food, entry to exclusive rooms, and fawning service. Conversely, an ordinary person is excluded from certain areas, pays for drinks, and receives only basic service. In both cases, the service received within the event depends solely on the type of one's pass. Moreover, all people within a class are treated alike.

Figure 9.17 provides a logical view of the classification and marking functions within the edge router. Packets arriving to the edge router are first classified. The classifier selects packets based on the values of one or more packet header fields (for example, source address, destination address, source port, destination port, and protocol ID) and steers the packet to the appropriate marking function. As noted above, a packet's marking is carried in the DS field in the packet header.

Figure 9.17 Logical view of packet classification and traffic conditioning at the edge router

In some cases, an end user may have agreed to limit its packet-sending rate to conform to a declared traffic profile. The traffic profile might contain a limit on the peak rate, as well as the burstiness of the packet flow, as we saw previously with the leaky bucket mechanism. As long as the user sends packets into the network in a way that conforms to the negotiated traffic profile, the packets receive their priority marking and are forwarded along their route to the destination. On the other hand, if the traffic profile is violated, out-of-profile packets might be marked differently, might be shaped (for example, delayed so that a maximum rate constraint would be observed), or might be dropped at the network edge.
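The classify-then-mark step can be sketched as follows. The DSCP code points EF = 46 and AF11 = 10 come from RFC 3246 and RFC 2597, but the classification policy, field names, and address ranges below are invented for illustration:

```python
DSCP_EF = 46      # expedited forwarding PHB (RFC 3246)
DSCP_AF11 = 10    # assured forwarding, class 1, low drop precedence (RFC 2597)
DSCP_BE = 0       # default (best-effort) PHB

def classify_and_mark(pkt):
    """Choose a DS-field marking from a packet's 5-tuple header fields."""
    if pkt["proto"] == "udp" and 16384 <= pkt["dst_port"] < 32768:
        return DSCP_EF          # hypothetical voice/RTP port range
    if pkt["src"].startswith("10.1."):
        return DSCP_AF11        # hypothetical premium-subscriber subnet
    return DSCP_BE              # everything else stays best effort

voice = {"src": "10.2.0.7", "dst": "198.51.100.9",
         "src_port": 5004, "dst_port": 20000, "proto": "udp"}
web = {"src": "192.0.2.5", "dst": "198.51.100.9",
       "src_port": 49152, "dst_port": 80, "proto": "tcp"}
print(classify_and_mark(voice), classify_and_mark(web))   # 46 0
```

Core routers then consult only the returned code point, never the 5-tuple, which is what keeps the per-hop work independent of the number of flows.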
The role of the metering function, shown in Figure 9.17, is to compare the incoming packet flow with the negotiated traffic profile and to determine whether a packet is within the negotiated traffic profile. The actual decision about whether to immediately re-mark, forward, delay, or drop a packet is a policy issue determined by the network administrator and is not specified in the Diffserv architecture.

So far, we have focused on the marking and policing functions in the Diffserv architecture. The second key component of the Diffserv architecture involves the per-hop behavior (PHB) performed by Diffserv-capable routers. PHB is rather cryptically, but carefully, defined as "a description of the externally observable forwarding behavior of a Diffserv node applied to a particular Diffserv behavior aggregate" [RFC 2475]. Digging a little deeper into this definition, we can see several important considerations embedded within:

A PHB can result in different classes of traffic receiving different performance (that is, different externally observable forwarding behaviors).

While a PHB defines differences in performance (behavior) among classes, it does not mandate any particular mechanism for achieving these behaviors. As long as the externally observable performance criteria are met, any implementation mechanism and any buffer/bandwidth allocation policy can be used. For example, a PHB would not require that a particular packet-queuing discipline (for example, a priority queue versus a WFQ queue versus a FCFS queue) be used to achieve a particular behavior. The PHB is the end, to which resource allocation and implementation mechanisms are the means.

Differences in performance must be observable and hence measurable.

Two PHBs have been defined: an expedited forwarding (EF) PHB [RFC 3246] and an assured forwarding (AF) PHB [RFC 2597].
The expedited forwarding PHB specifies that the departure rate of a class of traffic from a router must equal or exceed a configured rate. The assured forwarding PHB divides traffic into four classes, where each AF class is guaranteed to be provided with some minimum amount of bandwidth and buffering.

Let's close our discussion of Diffserv with a few observations regarding its service model. First, we have implicitly assumed that Diffserv is deployed within a single administrative domain, but typically an end-to-end service must be fashioned from multiple ISPs sitting between communicating end systems. In order to provide end-to-end Diffserv service, all the ISPs between the end systems must not only provide this service, but must also cooperate and make settlements in order to offer end customers true end-to-end service. Without this kind of cooperation, ISPs directly selling Diffserv service to customers will find themselves repeatedly saying: "Yes, we know you paid extra, but we don't have a service agreement with the ISP that dropped and delayed your traffic. I'm sorry that there were so many gaps in your VoIP call!" Second, if Diffserv were actually in place and the network ran at only moderate load, most of the time there would be no perceived difference between a best-effort service and a Diffserv service. Indeed, end-to-end delay is usually dominated by access rates and router hops rather than by queuing delays in the routers. Imagine the unhappy Diffserv customer who has paid more for premium service but finds that the best-effort service being provided to others almost always has the same performance as premium service!

9.5.4 Per-Connection Quality-of-Service (QoS) Guarantees: Resource Reservation and Call Admission

In the previous section, we have seen that packet marking and policing, traffic isolation, and link-level scheduling can provide one class of service with better performance than another.
Under certain scheduling disciplines, such as priority scheduling, the lower classes of traffic are essentially "invisible" to the highest-priority class of traffic. With proper network dimensioning, the highest class of service can indeed achieve extremely low packet loss and delay---essentially circuit-like performance. But can the network guarantee that an ongoing flow in a high-priority traffic class will continue to receive such service throughout the flow's duration using only the mechanisms that we have described so far? It cannot. In this section, we'll see why yet additional network mechanisms and protocols are required when a hard service guarantee is provided to individual connections.

Let's return to our scenario from Section 9.5.2 and consider two 1 Mbps audio applications transmitting their packets over the 1.5 Mbps link, as shown in Figure 9.18. The combined data rate of the two flows (2 Mbps) exceeds the link capacity. Even with classification and marking, isolation of flows, and sharing of unused bandwidth (of which there is none), this is clearly a losing proposition. There is simply not enough bandwidth to accommodate the needs of both applications at the same time. If the two applications equally share the bandwidth, each application would lose 25 percent of its transmitted packets. This is such an unacceptably low QoS that both audio applications are completely unusable; there's no need even to transmit any audio packets in the first place.

Figure 9.18 Two competing audio applications overloading the R1-to-R2 link

Given that the two applications in Figure 9.18 cannot both be satisfied simultaneously, what should the network do? Allowing both to proceed with an unusable QoS wastes network resources on application flows that ultimately provide no utility to the end user.
The answer is hopefully clear---one of the application flows should be blocked (that is, denied access to the network), while the other should be allowed to proceed on, using the full 1 Mbps needed by the application. The telephone network is an example of a network that performs such call blocking---if the required resources (an end-to-end circuit in the case of the telephone network) cannot be allocated to the call, the call is blocked (prevented from entering the network) and a busy signal is returned to the user. In our example, there is no gain in allowing a flow into the network if it will not receive a sufficient QoS to be considered usable. Indeed, there is a cost to admitting a flow that does not receive its needed QoS, as network resources are being used to support a flow that provides no utility to the end user.

By explicitly admitting or blocking flows based on their resource requirements, and the resource requirements of already-admitted flows, the network can guarantee that admitted flows will be able to receive their requested QoS. Implicit in the need to provide a guaranteed QoS to a flow is the need for the flow to declare its QoS requirements. This process of having a flow declare its QoS requirement, and then having the network either accept the flow (at the required QoS) or block the flow, is referred to as the call admission process. This then is our fourth insight (in addition to the three earlier insights from Section 9.5.2) into the mechanisms needed to provide QoS.

Insight 4: If sufficient resources will not always be available, and QoS is to be guaranteed, a call admission process is needed in which flows declare their QoS requirements and are then either admitted to the network (at the required QoS) or blocked from the network (if the required QoS cannot be provided by the network).
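The admit-or-block decision of Insight 4, applied at a single link, can be sketched as follows (a minimal sketch; the class and method names are hypothetical, and the numbers reproduce the 1.5 Mbps R1-to-R2 link of Figure 9.18):

```python
class LinkReservations:
    """Tracks bandwidth reservations on one router output link."""

    def __init__(self, capacity_mbps):
        self.capacity = capacity_mbps
        self.reserved = 0.0

    def admit(self, request_mbps):
        # Admit only if total reservations stay within capacity, so that
        # guarantees to already-admitted calls are never violated.
        if self.reserved + request_mbps <= self.capacity:
            self.reserved += request_mbps
            return True
        return False   # blocked: the required QoS cannot be provided

# The 1.5 Mbps link of Figure 9.18, offered two 1 Mbps audio calls:
link = LinkReservations(capacity_mbps=1.5)
print(link.admit(1.0))   # first call admitted
print(link.admit(1.0))   # second call blocked; only 0.5 Mbps remains
```

Blocking the second call is what preserves the first call's usable QoS; admitting both would have made both unusable, as the text argues.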
Our motivating example in Figure 9.18 highlights the need for several new network mechanisms and protocols if a call (an end-to-end flow) is to be guaranteed a given quality of service once it begins:

Resource reservation. The only way to guarantee that a call will have the resources (link bandwidth, buffers) needed to meet its desired QoS is to explicitly allocate those resources to the call---a process known in networking parlance as resource reservation. Once resources are reserved, the call has on-demand access to these resources throughout its duration, regardless of the demands of all other calls. If a call reserves and receives a guarantee of x Mbps of link bandwidth, and never transmits at a rate greater than x, the call will see loss- and delay-free performance.

Call admission. If resources are to be reserved, then the network must have a mechanism for calls to request and reserve resources. Since resources are not infinite, a call making a call admission request will be denied admission, that is, be blocked, if the requested resources are not available. Such a call admission is performed by the telephone network---we request resources when we dial a number. If the circuits (TDMA slots) needed to complete the call are available, the circuits are allocated and the call is completed. If the circuits are not available, then the call is blocked, and we receive a busy signal. A blocked call can try again to gain admission to the network, but it is not allowed to send traffic into the network until it has successfully completed the call admission process. Of course, a router that allocates link bandwidth should not allocate more than is available at that link. Typically, a call may reserve only a fraction of the link's bandwidth, and so a router may allocate link bandwidth to more than one call.
However, the sum of the allocated bandwidth to all calls should be less than the link capacity if hard quality of service guarantees are to be provided.

Call setup signaling. The call admission process described above requires that a call be able to reserve sufficient resources at each and every network router on its source-to-destination path to ensure that its end-to-end QoS requirement is met. Each router must determine the local resources required by the session, consider the amounts of its resources that are already committed to other ongoing sessions, and determine whether it has sufficient resources to satisfy the per-hop QoS requirement of the session at this router without violating local QoS guarantees made to an already-admitted session. A signaling protocol is needed to coordinate these various activities---the per-hop allocation of local resources, as well as the overall end-to-end decision of whether or not the call has been able to reserve sufficient resources at each and every router on the end-to-end path. This is the job of the call setup protocol, as shown in Figure 9.19. The RSVP protocol [Zhang 1993, RFC 2210] was proposed for this purpose within an Internet architecture for providing quality-of-service guarantees. In ATM networks, the Q2931b protocol [Black 1995] carries this information among the ATM network's switches and endpoints.

Figure 9.19 The call setup process

Despite a tremendous amount of research and development, and even products that provide for per-connection quality of service guarantees, there has been almost no extended deployment of such services. There are many possible reasons. First and foremost, it may well be the case that the simple application-level mechanisms that we studied in Sections 9.2 through 9.4, combined with proper network dimensioning (Section 9.5.1), provide "good enough" best-effort network service for multimedia applications.
In addition, the added complexity and cost of deploying and managing a network that provides per-connection quality of service guarantees may be judged by ISPs to be simply too high given predicted customer revenues for that service.

9.6 Summary

Multimedia networking is one of the most exciting developments in the Internet today. People throughout the world are spending less and less time in front of their televisions, and are instead using their smartphones and devices to receive audio and video transmissions, both live and prerecorded. Moreover, with sites like YouTube, users have become producers as well as consumers of multimedia Internet content. In addition to video distribution, the Internet is also being used to transport phone calls. In fact, over the next 10 years, the Internet, along with wireless Internet access, may make the traditional circuit-switched telephone system a thing of the past. VoIP not only provides phone service inexpensively, but also provides numerous value-added services, such as video conferencing, online directory services, voice messaging, and integration into social networks such as Facebook and WeChat.

In Section 9.1, we described the intrinsic characteristics of video and voice, and then classified multimedia applications into three categories: (i) streaming stored audio/video, (ii) conversational voice/video-over-IP, and (iii) streaming live audio/video.

In Section 9.2, we studied streaming stored video in some depth. For streaming video applications, prerecorded videos are placed on servers, and users send requests to these servers to view the videos on demand. We saw that streaming video systems can be classified into two categories: UDP streaming and HTTP streaming. We observed that the most important performance measure for streaming video is average throughput.

In Section 9.3, we examined how conversational multimedia applications, such as VoIP, can be designed to run over a best-effort network.
For conversational multimedia, timing considerations are important because conversational applications are highly delay-sensitive. On the other hand, conversational multimedia applications are loss-tolerant---occasional loss only causes occasional glitches in audio/video playback, and these losses can often be partially or fully concealed. We saw how a combination of client buffers, packet sequence numbers, and timestamps can greatly alleviate the effects of network-induced jitter. We also surveyed the technology behind Skype, one of the leading voice- and video-over-IP companies. In Section 9.4, we examined two of the most important standardized protocols for VoIP, namely, RTP and SIP. In Section 9.5, we introduced how several network mechanisms (link-level scheduling disciplines and traffic policing) can be used to provide differentiated service among several classes of traffic.

Homework Problems and Questions

Chapter 9 Review Questions

SECTION 9.1

R1. Reconstruct Table 9.1 for when Victor Video is watching a 4 Mbps video, Facebook Frank is looking at a new 100 Kbyte image every 20 seconds, and Martha Music is listening to a 200 kbps audio stream.

R2. There are two types of redundancy in video. Describe them, and discuss how they can be exploited for efficient compression.

R3. Suppose an analog audio signal is sampled 16,000 times per second, and each sample is quantized into one of 1024 levels. What would be the resulting bit rate of the PCM digital audio signal?

R4. Multimedia applications can be classified into three categories. Name and describe each category.

SECTION 9.2

R5. Streaming video systems can be classified into three categories. Name and briefly describe each of these categories.

R6. List three disadvantages of UDP streaming.

R7. With HTTP streaming, are the TCP receive buffer and the client's application buffer the same thing? If not, how do they interact?

R8. Consider the simple model for HTTP streaming.
Suppose the server sends bits at a constant rate of 2 Mbps and playback begins when 8 million bits have been received. What is the initial buffering delay tp?

SECTION 9.3

R9. What is the difference between end-to-end delay and packet jitter? What are the causes of packet jitter?

R10. Why is a packet that is received after its scheduled playout time considered lost?

R11. Section 9.3 describes two FEC schemes. Briefly summarize them. Both schemes increase the transmission rate of the stream by adding overhead. Does interleaving also increase the transmission rate?

SECTION 9.4

R12. How are different RTP streams in different sessions identified by a receiver? How are different streams from within the same session identified?

R13. What is the role of a SIP registrar? How is the role of a SIP registrar different from that of a home agent in Mobile IP?

Problems

P1. Consider the figure below. Similar to our discussion of Figure 9.1, suppose that video is encoded at a fixed bit rate, and thus each video block contains video frames that are to be played out over the same fixed amount of time, Δ. The server transmits the first video block at t0, the second block at t0+Δ, the third block at t0+2Δ, and so on. Once the client begins playout, each block should be played out Δ time units after the previous block.

a. Suppose that the client begins playout as soon as the first block arrives at t1. In the figure below, how many blocks of video (including the first block) will have arrived at the client in time for their playout? Explain how you arrived at your answer.

b. Suppose that the client begins playout now at t1+Δ. How many blocks of video (including the first block) will have arrived at the client in time for their playout? Explain how you arrived at your answer.

c. In the same scenario as (b) above, what is the largest number of blocks that is ever stored in the client buffer, awaiting playout?
Explain how you arrived at your answer.

d. What is the smallest playout delay at the client, such that every video block has arrived in time for its playout? Explain how you arrived at your answer.

P2. Recall the simple model for HTTP streaming shown in Figure 9.3. Recall that B denotes the size of the client's application buffer, and Q denotes the number of bits that must be buffered before the client application begins playout. Also r denotes the video consumption rate. Assume that the server sends bits at a constant rate x whenever the client buffer is not full.

a. Suppose that x < r. As discussed in the text, in this case playout will alternate between periods of continuous playout and periods of freezing. Determine the length of each continuous playout and freezing period as a function of Q, r, and x.

b. Now suppose that x > r. At what time t = tf does the client application buffer become full?

P3. Recall the simple model for HTTP streaming shown in Figure 9.3. Suppose the buffer size is infinite but the server sends bits at variable rate x(t). Specifically, suppose x(t) has the following saw-tooth shape. The rate is initially zero at time t=0 and linearly climbs to H at time t=T. It then repeats this pattern again and again, as shown in the figure below.

a. What is the server's average send rate?

b. Suppose that Q=0, so that the client starts playback as soon as it receives a video frame. What will happen?

c. Now suppose Q>0 and HT/2≥Q. Determine as a function of Q, H, and T the time at which playback first begins.

d. Suppose H>2r and Q=HT/2. Prove there will be no freezing after the initial playout delay.

e. Suppose H>2r. Find the smallest value of Q such that there will be no freezing after the initial playback delay.

f. Now suppose that the buffer size B is finite. Suppose H>2r. As a function of Q, B, T, and H, determine the time t=tf when the client application buffer first becomes full.

P4.
Recall the simple model for HTTP streaming shown in Figure 9.3. Suppose
the client application buffer is infinite, the server sends at the
constant rate x, and the video consumption rate is r with r\<x. Also
suppose playback begins immediately. Suppose that the user terminates
the video early at time t=E. At the time of termination, the server
stops sending bits (if it hasn't already sent all the bits in the
video).

a. Suppose the video is infinitely long. How many bits are wasted (that
   is, sent but not viewed)?

b. Suppose the video is T seconds long with T\>E. How many bits are
   wasted (that is, sent but not viewed)?

P5. Consider a DASH system (as discussed in Section 2.6) for which there
are N video versions (at N different rates and qualities) and N audio
versions (at N different rates and qualities). Suppose we want to allow
the player to choose at any time any of the N video versions and any of
the N audio versions.

a. If we create files so that the audio is mixed in with the video, so
   that the server sends only one media stream at a given time, how many
   files will the server need to store (each with a different URL)?

b. If the server instead sends the audio and video streams separately
   and has the client synchronize the streams, how many files will the
   server need to store?

P6. In the VoIP example in Section 9.3, let h be the total number of
header bytes added to each chunk, including the UDP and IP headers.

a. Assuming an IP datagram is emitted every 20 msecs, find the
   transmission rate in bits per second for the datagrams generated by
   one side of this application.

b. What is a typical value of h when RTP is used?

P7. Consider the procedure described in Section 9.3 for estimating
average delay di. Suppose that u=0.1. Let r1−t1 be the most recent
sample delay, let r2−t2 be the next most recent sample delay, and so on.

a. For a given audio application suppose four packets have arrived at
   the receiver with sample delays r4−t4, r3−t3, r2−t2, and r1−t1.
   Express the estimate of delay d in terms of the four samples.

b. Generalize your formula for n sample delays.

c. For the formula in part (b), let n approach infinity and give the
   resulting formula. Comment on why this averaging procedure is called
   an exponential moving average.

P8. Repeat parts (a) and (b) in Question P7 for the estimate of average
delay deviation.

P9. For the VoIP example in Section 9.3, we introduced an online
procedure (exponential moving average) for estimating delay. In this
problem we will examine an alternative procedure. Let ti be the
timestamp of the ith packet received; let ri be the time at which the
ith packet is received. Let dn be our estimate of average delay after
receiving the nth packet. After the first packet is received, we set the
delay estimate equal to d1=r1−t1.

a. Suppose that we would like dn=(r1−t1+r2−t2+⋯+rn−tn)/n for all n. Give
   a recursive formula for dn in terms of dn−1, rn, and tn.

b. Describe why for Internet telephony, the delay estimate described in
   Section 9.3 is more appropriate than the delay estimate outlined in
   part (a).

P10. Compare the procedure described in Section 9.3 for estimating
average delay with the procedure in Section 3.5 for estimating
round-trip time. What do the procedures have in common? How are they
different?

P11. Consider the figure below (which is similar to Figure 9.3). A
sender begins sending packetized audio periodically at t=1. The first
packet arrives at the receiver at t=8.

a. What are the delays (from sender to receiver, ignoring any playout
   delays) of packets 2 through 8? Note that each vertical and
   horizontal line segment in the figure has a length of 1, 2, or 3
   time units.

b. If audio playout begins as soon as the first packet arrives at the
   receiver at t=8, which of the first eight packets sent will not
   arrive in time for playout?

c. If audio playout begins at t=9, which of the first eight packets
   sent will not arrive in time for playout?

d. What is the minimum playout delay at the receiver that results in
   all of the first eight packets arriving in time for their playout?

P12. Consider again the figure in P11, showing packet audio
transmission and reception times.

a. Compute the estimated delay for packets 2 through 8, using the
   formula for di from Section 9.3.2. Use a value of u=0.1.

b. Compute the estimated deviation of the delay from the estimated
   average for packets 2 through 8, using the formula for vi from
   Section 9.3.2. Use a value of u=0.1.

P13. Recall the two FEC schemes for VoIP described in Section 9.3.
Suppose the first scheme generates a redundant chunk for every four
original chunks. Suppose the second scheme uses a low-bit rate encoding
whose transmission rate is 25 percent of the transmission rate of the
nominal stream.

a. How much additional bandwidth does each scheme require? How much
   playback delay does each scheme add?

b. How do the two schemes perform if the first packet is lost in every
   group of five packets? Which scheme will have better audio quality?

c. How do the two schemes perform if the first packet is lost in every
   group of two packets? Which scheme will have better audio quality?

P14.

a. Consider an audio conference call in Skype with N\>2 participants.
   Suppose each participant generates a constant stream of rate r bps.
   How many bits per second will the call initiator need to send? How
   many bits per second will each of the other N−1 participants need to
   send? What is the total send rate, aggregated over all participants?

b. Repeat part (a) for a Skype video conference call using a central
   server.

c. Repeat part (b), but now for when each peer sends a copy of its
   video stream to each of the N−1 other peers.

P15.

a. Suppose we send into the Internet two IP datagrams, each carrying a
   different UDP segment. The first datagram has source IP address A1,
   destination IP address B, source port P1, and destination port T.
   The second datagram has source IP address A2, destination IP address
   B, source port P2, and destination port T. Suppose that A1 is
   different from A2 and that P1 is different from P2. Assuming that
   both datagrams reach their final destination, will the two UDP
   datagrams be received by the same socket? Why or why not?

b. Suppose Alice, Bob, and Claire want to have an audio conference call
   using SIP and RTP. For Alice to send and receive RTP packets to and
   from Bob and Claire, is only one UDP socket sufficient (in addition
   to the socket needed for the SIP messages)? If yes, then how does
   Alice's SIP client distinguish between the RTP packets received from
   Bob and Claire?

P16. True or false:

a. If stored video is streamed directly from a Web server to a media
   player, then the application is using TCP as the underlying
   transport protocol.

b. When using RTP, it is possible for a sender to change encoding in the
   middle of a session.

c. All applications that use RTP must use port 87.

d. If an RTP session has a separate audio and video stream for each
   sender, then the audio and video streams use the same SSRC.

e. In differentiated services, while per-hop behavior defines
   differences in performance among classes, it does not mandate any
   particular mechanism for achieving these performances.

f. Suppose Alice wants to establish an SIP session with Bob. In her
   INVITE message she includes the line: m=audio 48753 RTP/AVP 3 (AVP 3
   denotes GSM audio). Alice has therefore indicated in this message
   that she wishes to send GSM audio.

g.
Referring to the preceding statement, Alice has indicated in her
INVITE message that she will send audio to port 48753.

h. SIP messages are typically sent between SIP entities using a default
   SIP port number.

i. In order to maintain registration, SIP clients must periodically
   send REGISTER messages.

j. SIP mandates that all SIP clients support G.711 audio encoding.

P17. Consider the figure below, which shows a leaky bucket policer being
fed by a stream of packets. The token buffer can hold at most two
tokens, and is initially full at t=0. New tokens arrive at a rate of
one token per slot. The output link speed is such that if two packets
obtain tokens at the beginning of a time slot, they can both go to the
output link in the same slot. The timing details of the system are as
follows:

1. Packets (if any) arrive at the beginning of the slot. Thus in the
   figure, packets 1, 2, and 3 arrive in slot 0. If there are already
   packets in the queue, then the arriving packets join the end of the
   queue. Packets proceed towards the front of the queue in a FIFO
   manner.

2. After the arrivals have been added to the queue, if there are any
   queued packets, one or two of those packets (depending on the number
   of available tokens) will each remove a token from the token buffer
   and go to the output link during that slot. Thus, packets 1 and 2
   each remove a token from the buffer (since there are initially two
   tokens) and go to the output link during slot 0.

3. A new token is added to the token buffer if it is not full, since the
   token generation rate is r = 1 token/slot.

4. Time then advances to the next time slot, and these steps repeat.

Answer the following questions:

a. For each time slot, identify the packets that are in the queue and
   the number of tokens in the bucket, immediately after the arrivals
   have been processed (step 1 above) but before any of the packets
   have passed through the queue and removed a token. Thus, for the t=0
   time slot in the example above, packets 1, 2, and 3 are in the
   queue, and there are two tokens in the buffer.

b. For each time slot indicate which packets appear on the output after
   the token(s) have been removed from the queue. Thus, for the t=0
   time slot in the example above, packets 1 and 2 appear on the output
   link from the leaky bucket during slot 0.

P18. Repeat P17 but assume that r=2. Assume again that the bucket is
initially full.

P19. Consider P18 and suppose now that r=3 and that b=2 as before. Will
your answer to the question above change?

P20. Consider the leaky bucket policer that polices the average rate and
burst size of a packet flow. We now want to police the peak rate, p, as
well. Show how the output of this leaky bucket policer can be fed into a
second leaky bucket policer so that the two leaky buckets in series
police the average rate, peak rate, and burst size. Be sure to give the
bucket size and token generation rate for the second policer.

P21. A packet flow is said to conform to a leaky bucket specification
(r, b) with burst size b and average rate r if the number of packets
that arrive to the leaky bucket is less than rt+b packets in every
interval of time of length t for all t. Will a packet flow that conforms
to a leaky bucket specification (r, b) ever have to wait at a leaky
bucket policer with parameters r and b? Justify your answer.

P22. Show that as long as r1\<Rw1/(∑ wj), then dmax is indeed the
maximum delay that any packet in flow 1 will ever experience in the WFQ
queue.

Programming Assignment

In this lab, you will implement a streaming video server and client. The
client will use the real-time streaming protocol (RTSP) to control the
actions of the server. The server will use the real-time protocol (RTP)
to packetize the video for transport over UDP. You will be given Python
code that partially implements RTSP and RTP at the client and server.
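As background for the RTP packetization the server performs, here is a
minimal sketch of how one stored JPEG frame might be wrapped in an RTP
packet; the 12-byte header layout follows RFC 3550, but the function
name and defaults are illustrative and not part of the skeleton code
you will be given:

```python
import struct

def make_rtp_packet(payload: bytes, seq_num: int, timestamp: int,
                    ssrc: int, payload_type: int = 26, marker: int = 0) -> bytes:
    """Pack a 12-byte RTP header (RFC 3550) in network byte order,
    followed by the payload. Payload type 26 is JPEG video."""
    version, padding, extension, csrc_count = 2, 0, 0, 0
    byte0 = (version << 6) | (padding << 5) | (extension << 4) | csrc_count
    byte1 = (marker << 7) | payload_type
    header = struct.pack('!BBHII', byte0, byte1, seq_num & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload

# The client reverses this: it strips the 12-byte header, checks the
# sequence number for loss or reordering, and hands the JPEG payload
# to the decoder for display.
packet = make_rtp_packet(b'...jpeg frame bytes...', seq_num=7,
                         timestamp=140, ssrc=0x1234)
```

The server would call a function like this once per frame grabbed in the
playing state, incrementing the sequence number for each packet before
writing the result into its UDP socket.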
Your job will be to complete both the client and server code. When you
are finished, you will have created a client-server application that
does the following:

The client sends SETUP, PLAY, PAUSE, and TEARDOWN RTSP commands, and the
server responds to the commands. When the server is in the playing
state, it periodically grabs a stored JPEG frame, packetizes the frame
with RTP, and sends the RTP packet into a UDP socket. The client
receives the RTP packets, removes the JPEG frames, decompresses the
frames, and renders the frames on the client's monitor.

The code you will be given implements the RTSP protocol in the server
and the RTP depacketization in the client. The code also takes care of
displaying the transmitted video. You will need to implement RTSP in the
client and RTP in the server. This programming assignment will
significantly enhance the student's understanding of RTP, RTSP, and
streaming video. It is highly recommended. The assignment also suggests
a number of optional exercises, including implementing the RTSP DESCRIBE
command at both client and server. You can find full details of the
assignment, as well as an overview of the RTSP protocol, at the Web site
www.pearsonhighered.com/cs-resources.

AN INTERVIEW WITH . . . Henning Schulzrinne

Henning Schulzrinne is a professor, chair of the Department of Computer
Science, and head of the Internet Real-Time Laboratory at Columbia
University. He is the co-author of RTP, RTSP, SIP, and GIST---key
protocols for audio and video communications over the Internet. Henning
received his BS in electrical and industrial engineering at TU Darmstadt
in Germany, his MS in electrical and computer engineering at the
University of Cincinnati, and his PhD in electrical engineering at the
University of Massachusetts, Amherst.

What made you decide to specialize in multimedia networking?

This happened almost by accident.
As a PhD student, I got involved with DARTnet, an experimental network
spanning the United States with T1 lines. DARTnet was used as a proving
ground for multicast and Internet real-time tools. That led me to write
my first audio tool, NeVoT. Through some of the DARTnet participants, I
became involved in the IETF, in the then-nascent Audio Video Transport
working group. This group later ended up standardizing RTP.

What was your first job in the computer industry? What did it entail?

My first job in the computer industry was soldering together an Altair
computer kit when I was a high school student in Livermore, California.
Back in Germany, I started a little consulting company that devised an
address management program for a travel agency---storing data on
cassette tapes for our TRS-80 and using an IBM Selectric typewriter with
a home-brew hardware interface as a printer. My first real job was with
AT&T Bell Laboratories, developing a network emulator for constructing
experimental networks in a lab environment.

What are the goals of the Internet Real-Time Lab?

Our goal is to provide components and building blocks for the Internet
as the single future communications infrastructure. This includes
developing new protocols, such as GIST (for network-layer signaling) and
LoST (for finding resources by location), or enhancing protocols that we
have worked on earlier, such as SIP, through work on rich presence,
peer-to-peer systems, next-generation emergency calling, and service
creation tools. Recently, we have also looked extensively at wireless
systems for VoIP, as 802.11b and 802.11n networks and maybe WiMax
networks are likely to become important last-mile technologies for
telephony. We are also trying to greatly improve the ability of users to
diagnose faults in the complicated tangle of providers and equipment,
using a peer-to-peer fault diagnosis system called DYSWIS (Do You See
What I See).
We try to do practically relevant work, by building prototypes and open
source systems, by measuring performance of real systems, and by
contributing to IETF standards.

What is your vision for the future of multimedia networking?

We are now in a transition phase; just a few years shy of when IP will
be the universal platform for multimedia services, from IPTV to VoIP. We
expect radio, telephone, and TV to be available even during snowstorms
and earthquakes, so when the Internet takes over the role of these
dedicated networks, users will expect the same level of reliability. We
will have to learn to design network technologies for an ecosystem of
competing carriers, service and content providers, serving lots of
technically untrained users and defending them against a small, but
destructive, set of malicious and criminal users. Changing protocols is
becoming increasingly hard. They are also becoming more complex, as they
need to take into account competing business interests, security,
privacy, and the lack of transparency of networks caused by firewalls
and network address translators. Since multimedia networking is becoming
the foundation for almost all of consumer entertainment, there will be
an emphasis on managing very large networks, at low cost. Users will
expect ease of use, such as finding the same content on all of their
devices.

Why does SIP have a promising future?

As the current wireless network upgrade to 3G networks proceeds, there
is the hope of a single multimedia signaling mechanism spanning all
types of networks, from cable modems, to corporate telephone networks
and public wireless networks. Together with software radios, this will
make it possible in the future that a single device can be used on a
home network, as a cordless Bluetooth phone, in a corporate network via
802.11, and in the wide area via 3G networks.
Even before we have such a single universal wireless device, the
personal mobility mechanisms make it possible to hide the differences
between networks. One identifier becomes the universal means of reaching
a person, rather than remembering or passing around half a dozen
technology- or location-specific telephone numbers. SIP also breaks
apart the provision of voice (bit) transport from voice services. It now
becomes technically possible to break apart the local telephone
monopoly, where one company provides neutral bit transport, while others
provide IP "dial tone" and the classical telephone services, such as
gateways, call forwarding, and caller ID. Beyond multimedia signaling,
SIP offers a new service that has been missing in the Internet: event
notification. We have approximated such services with HTTP kludges and
e-mail, but this was never very satisfactory. Since events are a common
abstraction for distributed systems, this may simplify the construction
of new services.

Do you have any advice for students entering the networking field?

Networking bridges disciplines. It draws from electrical engineering,
all aspects of computer science, operations research, statistics,
economics, and other disciplines. Thus, networking researchers have to
be familiar with subjects well beyond protocols and routing algorithms.
Given that networks are becoming such an important part of everyday
life, students wanting to make a difference in the field should think of
the new resource constraints in networks: human time and effort, rather
than just bandwidth or storage. Work in networking research can be
immensely satisfying since it is about allowing people to communicate
and exchange ideas, one of the essentials of being human. The Internet
has become the third major global infrastructure, next to the
transportation system and energy distribution.
Almost no part of the economy can work without high-performance
networks, so there should be plenty of opportunities for the foreseeable
future.

References

A note on URLs. In the references below, we have provided URLs for Web
pages, Web-only documents, and other material that has not been
published in a conference or journal (when we have been able to locate a
URL for such material). We have not provided URLs for conference and
journal publications, as these documents can usually be located via a
search engine, from the conference Web site (e.g., papers in all ACM
SIGCOMM conferences and workshops can be located via
http://www.acm.org/sigcomm), or via a digital library subscription.
While all URLs provided below were valid (and tested) in Jan. 2016, URLs
can become out of date. Please consult the online version of this book
(www.pearsonhighered.com/cs-resources) for an up-to-date bibliography.

A note on Internet Request for Comments (RFCs): Copies of Internet RFCs
are available at many sites. The RFC Editor of the Internet Society (the
body that oversees the RFCs) maintains the site,
http://www.rfc-editor.org. This site allows you to search for a specific
RFC by title, number, or authors, and will show updates to any RFCs
listed. Internet RFCs can be updated or obsoleted by later RFCs. Our
favorite site for getting RFCs is the original
source---http://www.rfc-editor.org.

\[3GPP 2016\] Third Generation Partnership Project homepage,
http://www.3gpp.org/

\[Abramson 1970\] N. Abramson, "The Aloha System---Another Alternative
for Computer Communications," Proc. 1970 Fall Joint Computer Conference,
AFIPS Conference, p. 37, 1970.

\[Abramson 1985\] N. Abramson, "Development of the Alohanet," IEEE
Transactions on Information Theory, Vol. IT-31, No. 3 (Mar. 1985),
pp. 119--123.

\[Abramson 2009\] N. Abramson, "The Alohanet---Surfing for Wireless
Data," IEEE Communications Magazine, Vol. 47, No. 12, pp. 21--25.

\[Adhikari 2011a\] V.
K. Adhikari, S. Jain, Y. Chen, Z. L. Zhang,
"Vivisecting YouTube: An Active Measurement Study," Technical Report,
University of Minnesota, 2011.

\[Adhikari 2012\] V. K. Adhikari, Y. Gao, F. Hao, M. Varvello, V. Hilt,
M. Steiner, Z. L. Zhang, "Unreeling Netflix: Understanding and Improving
Multi-CDN Movie Delivery," Technical Report, University of Minnesota,
2012.

\[Afanasyev 2010\] A. Afanasyev, N. Tilley, P. Reiher, L. Kleinrock,
"Host-to-Host Congestion Control for TCP," IEEE Communications Surveys &
Tutorials, Vol. 12, No. 3, pp. 304--342.

\[Agarwal 2009\] S. Agarwal, J. Lorch, "Matchmaking for Online Games and
Other Latency-sensitive P2P Systems," Proc. 2009 ACM SIGCOMM.

\[Ager 2012\] B. Ager, N. Chatzis, A. Feldmann, N. Sarrar, S. Uhlig, W.
Willinger, "Anatomy of a Large European ISP," Proc. 2012 ACM SIGCOMM.

\[Ahn 1995\] J. S. Ahn, P. B. Danzig, Z. Liu, and Y. Yan, "Experience
with TCP Vegas: Emulation and Experiment," Proc. 1995 ACM SIGCOMM
(Boston, MA, Aug. 1995), pp. 185--195.

\[Akamai 2016\] Akamai homepage, http://www.akamai.com

\[Akella 2003\] A. Akella, S. Seshan, A. Shaikh, "An Empirical
Evaluation of Wide-Area Internet Bottlenecks," Proc. 2003 ACM Internet
Measurement Conference (Miami, FL, Nov. 2003).

\[Akhshabi 2011\] S. Akhshabi, A. C. Begen, C. Dovrolis, "An
Experimental Evaluation of Rate-Adaptation Algorithms in Adaptive
Streaming over HTTP," Proc. 2011 ACM Multimedia Systems Conf.

\[Akyildiz 2010\] I. Akyildiz, D. Gutierrez-Estevez, E. Reyes, "The
Evolution to 4G Cellular Systems: LTE Advanced," Physical Communication,
Elsevier, 3 (2010), 217--244.

\[Albitz 1993\] P. Albitz and C. Liu, DNS and BIND, O'Reilly &
Associates, Petaluma, CA, 1993.

\[Al-Fares 2008\] M. Al-Fares, A. Loukissas, A. Vahdat, "A Scalable,
Commodity Data Center Network Architecture," Proc. 2008 ACM SIGCOMM.

\[Amazon 2014\] J.
Hamilton, "AWS: Innovation at Scale," YouTube video,
https://www.youtube.com/watch?v=JIQETrFC_SQ

\[Anderson 1995\] J. B. Andersen, T. S. Rappaport, S. Yoshida,
"Propagation Measurements and Models for Wireless Communications
Channels," IEEE Communications Magazine (Jan. 1995), pp. 42--49.

\[Alizadeh 2010\] M. Alizadeh, A. Greenberg, D. Maltz, J. Padhye, P.
Patel, B. Prabhakar, S. Sengupta, M. Sridharan, "Data Center TCP
(DCTCP)," Proc. 2010 ACM SIGCOMM, ACM, New York, NY, USA, pp. 63--74.

\[Allman 2011\] E. Allman, "The Robustness Principle Reconsidered:
Seeking a Middle Ground," Communications of the ACM, Vol. 54, No. 8
(Aug. 2011), pp. 40--45.

\[Appenzeller 2004\] G. Appenzeller, I. Keslassy, N. McKeown, "Sizing
Router Buffers," Proc. 2004 ACM SIGCOMM (Portland, OR, Aug. 2004).

\[ASO-ICANN 2016\] The Address Supporting Organization homepage,
http://www.aso.icann.org

\[AT&T 2013\] "AT&T Vision Alignment Challenge Technology Survey," AT&T
Domain 2.0 Vision White Paper, November 13, 2013.

\[Atheros 2016\] Atheros Communications Inc., "Atheros AR5006 WLAN
Chipset Product Bulletins,"
http://www.atheros.com/pt/AR5006Bulletins.htm

\[Ayanoglu 1995\] E. Ayanoglu, S. Paul, T. F. La Porta, K. K. Sabnani,
R. D. Gitlin, "AIRMAIL: A Link-Layer Protocol for Wireless Networks,"
ACM/Baltzer Wireless Networks Journal, 1: 47--60, Feb. 1995.

\[Bakre 1995\] A. Bakre, B. R. Badrinath, "I-TCP: Indirect TCP for
Mobile Hosts," Proc. 1995 Int. Conf. on Distributed Computing Systems
(ICDCS) (May 1995), pp. 136--143.

\[Balakrishnan 1997\] H. Balakrishnan, V. Padmanabhan, S. Seshan, R.
Katz, "A Comparison of Mechanisms for Improving TCP Performance Over
Wireless Links," IEEE/ACM Transactions on Networking, Vol. 5, No. 6
(Dec. 1997).

\[Balakrishnan 2003\] H. Balakrishnan, F. Kaashoek, D. Karger, R.
Morris, I. Stoica, "Looking Up Data in P2P Systems," Communications of
the ACM, Vol. 46, No. 2 (Feb. 2003), pp. 43--48.
\[Baldauf 2007\] M. Baldauf, S. Dustdar, F. Rosenberg, "A Survey on
Context-Aware Systems," Int. J. Ad Hoc and Ubiquitous Computing, Vol. 2,
No. 4 (2007), pp. 263--277.

\[Baran 1964\] P. Baran, "On Distributed Communication Networks," IEEE
Transactions on Communication Systems, Mar. 1964. Rand Corporation
Technical report with the same title (Memorandum RM-3420-PR, 1964).
http://www.rand.org/publications/RM/RM3420/

\[Bardwell 2004\] J. Bardwell, "You Believe You Understand What You
Think I Said . . . The Truth About 802.11 Signal and Noise Metrics: A
Discussion Clarifying Often-Misused 802.11 WLAN Terminologies,"
http://www.connect802.com/download/techpubs/2004/you_believe_D100201.pdf

\[Barford 2009\] P. Barford, N. Duffield, A. Ron, J. Sommers, "Network
Performance Anomaly Detection and Localization," Proc. 2009 IEEE INFOCOM
(Apr. 2009).

\[Baronti 2007\] P. Baronti, P. Pillai, V. Chook, S. Chessa, A. Gotta,
Y. Hu, "Wireless Sensor Networks: A Survey on the State of the Art and
the 802.15.4 and ZigBee Standards," Computer Communications, Vol. 30,
No. 7 (2007), pp. 1655--1695.

\[Baset 2006\] S. A. Baset and H. Schulzrinne, "An Analysis of the
Skype Peer-to-Peer Internet Telephony Protocol," Proc. 2006 IEEE INFOCOM
(Barcelona, Spain, Apr. 2006).

\[BBC 2001\] BBC news online, "A Small Slice of Design," Apr. 2001,
http://news.bbc.co.uk/2/hi/science/nature/1264205.stm

\[Beheshti 2008\] N. Beheshti, Y. Ganjali, M. Ghobadi, N. McKeown, G.
Salmon, "Experimental Study of Router Buffer Sizing," Proc. 2008 ACM
Internet Measurement Conference (Vouliagmeni, Greece, Oct. 2008).

\[Bender 2000\] P. Bender, P. Black, M. Grob, R. Padovani, N.
Sindhushayana, A. Viterbi, "CDMA/HDR: A Bandwidth-Efficient High-Speed
Wireless Data Service for Nomadic Users," IEEE Commun. Mag., Vol. 38,
No. 7 (July 2000), pp. 70--77.

\[Berners-Lee 1989\] T. Berners-Lee, CERN, "Information Management: A
Proposal," Mar. 1989, May 1990.
http://www.w3.org/History/1989/proposal.html

\[Berners-Lee 1994\] T. Berners-Lee, R. Cailliau, A. Luotonen, H.
Frystyk Nielsen, A. Secret, "The World-Wide Web," Communications of the
ACM, Vol. 37, No. 8 (Aug. 1994), pp. 76--82.

\[Bertsekas 1991\] D. Bertsekas, R. Gallager, Data Networks, 2nd Ed.,
Prentice Hall, Englewood Cliffs, NJ, 1991.

\[Biersack 1992\] E. W. Biersack, "Performance Evaluation of Forward
Error Correction in ATM Networks," Proc. 1992 ACM SIGCOMM (Baltimore,
MD, Aug. 1992), pp. 248--257.

\[BIND 2016\] Internet Software Consortium page on BIND,
http://www.isc.org/bind.html

\[Bisdikian 2001\] C. Bisdikian, "An Overview of the Bluetooth Wireless
Technology," IEEE Communications Magazine, No. 12 (Dec. 2001),
pp. 86--94.

\[Bishop 2003\] M. Bishop, Computer Security: Art and Science, Addison
Wesley, Boston, MA, 2003.

\[Black 1995\] U. Black, ATM Volume I: Foundation for Broadband
Networks, Prentice Hall, 1995.

\[Black 1997\] U. Black, ATM Volume II: Signaling in Broadband Networks,
Prentice Hall, 1997.

\[Blumenthal 2001\] M. Blumenthal, D. Clark, "Rethinking the Design of
the Internet: The End-to-end Arguments vs. the Brave New World," ACM
Transactions on Internet Technology, Vol. 1, No. 1 (Aug. 2001),
pp. 70--109.

\[Bochman 1984\] G. V. Bochmann, C. A. Sunshine, "Formal Methods in
Communication Protocol Design," IEEE Transactions on Communications,
Vol. 28, No. 4 (Apr. 1980), pp. 624--631.

\[Bolot 1996\] J-C. Bolot, A. Vega-Garcia, "Control Mechanisms for
Packet Audio in the Internet," Proc. 1996 IEEE INFOCOM, pp. 232--239.

\[Bosshart 2013\] P. Bosshart, G. Gibb, H. Kim, G. Varghese, N. McKeown,
M. Izzard, F. Mujica, M. Horowitz, "Forwarding Metamorphosis: Fast
Programmable Match-Action Processing in Hardware for SDN," ACM SIGCOMM
Comput. Commun. Rev., Vol. 43, No. 4 (Aug. 2013), pp. 99--110.

\[Bosshart 2014\] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown,
J. Rexford, C. Schlesinger, D.
Talayco, A. Vahdat, G. Varghese, D. -Walker, "P4: Programming Protocol-Independent Packet Processors," ACM -SIGCOMM Comput. Commun. Rev. 44, 3 (July 2014), pp. 87--95. - -\[Brakmo 1995\] L. Brakmo, L. Peterson, "TCP Vegas: End to End -Congestion Avoidance on a Global Internet," IEEE Journal of Selected -Areas in Communications, Vol. 13, No. 8 (Oct. 1995), pp. 1465--1480. - -\[Bryant 1988\] B. Bryant, "Designing an Authentication System: A -Dialogue in Four Scenes," http://web.mit.edu/kerberos/www/dialogue.html - -\[Bush 1945\] V. Bush, "As We May Think," The Atlantic Monthly, July -1945. http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm - -\[Byers 1998\] J. Byers, M. Luby, M. Mitzenmacher, A. Rege, "A Digital -Fountain Approach to Reliable Distribution of Bulk Data," Proc. 1998 ACM -SIGCOMM (Vancouver, Canada, Aug. 1998), pp. 56--67. - -\[Caesar 2005a\] M. Caesar, D. Caldwell, N. Feamster, J. Rexford, A. -Shaikh, J. van der Merwe, "Design and implementation of a Routing -Control Platform," Proc. Networked Systems Design and Implementation -(May 2005). - -\[Caesar 2005b\] M. Caesar, J. Rexford, "BGP Routing Policies in ISP -Networks," IEEE Network Magazine, Vol. 19, No. 6 (Nov. 2005). - -\[Caldwell 2012\] C. Caldwell, "The Prime Pages," -http://www.utm.edu/research/primes/prove - -\[Cardwell 2000\] N. Cardwell, S. Savage, T. Anderson, "Modeling TCP -Latency," Proc. 2000 IEEE INFOCOM (Tel-Aviv, Israel, Mar. 2000). - -\[Casado 2007\] M. Casado, M. Freedman, J. Pettit, J. Luo, N. McKeown, -S. Shenker, "Ethane: Taking Control of the Enterprise," Proc. ACM -SIGCOMM '07, New York, pp. 1--12. See also IEEE/ACM Trans. Networking, -17, 4 (Aug. 2007), pp. 270--1283. - -\[Casado 2009\] M. Casado, M. Freedman, J. Pettit, J. Luo, N. Gude, N. -McKeown, S. Shenker, "Rethinking Enterprise Network Control," IEEE/ACM -Transactions on Networking (ToN), Vol. 17, No. 4 (Aug. 2009), -pp. 1270--1283. - -\[Casado 2014\] M. Casado, N. Foster, A. 
Guha, "Abstractions for -Software-Defined Networks," Communications of the ACM, Vol. 57 No. 10, -(Oct. 2014), pp. 86--95. - -\[Cerf 1974\] V. Cerf, R. Kahn, "A Protocol for Packet Network -Interconnection," IEEE Transactions on Communications Technology, Vol. -COM-22, No. 5, pp. 627--641. - -\[CERT 2001--09\] CERT, "Advisory 2001--09: Statistical Weaknesses in -TCP/IP Initial Sequence Numbers," -http://www.cert.org/advisories/CA-2001-09.html - -\[CERT 2003--04\] CERT, "CERT Advisory CA-2003-04 MS-SQL Server Worm," -http://www.cert.org/advisories/CA-2003-04.html - -\[CERT 2016\] CERT, http://www.cert.org - -\[CERT Filtering 2012\] CERT, "Packet Filtering for Firewall Systems," -http://www.cert.org/tech_tips/packet_filtering.html - -\[Cert SYN 1996\] CERT, "Advisory CA-96.21: TCP SYN Flooding and IP -Spoofing Attacks," http://www.cert.org/advisories/CA-1998-01.html - -\[Chandra 2007\] T. Chandra, R. Greisemer, J. Redstone, "Paxos Made -Live: an Engineering Perspective," Proc. of 2007 ACM Symposium on -Principles of Distributed Computing (PODC), pp. 398--407. - -\[Chao 2001\] H. J. Chao, C. Lam, E. Oki, Broadband Packet Switching -Technologies---A Practical Guide to ATM Switches and IP Routers, John -Wiley & Sons, 2001. - -\[Chao 2011\] C. Zhang, P. Dunghel, D. Wu, K. W. Ross, "Unraveling the -BitTorrent Ecosystem," IEEE Transactions on Parallel and Distributed -Systems, Vol. 22, No. 7 (July 2011). - -\[Chen 2000\] G. Chen, D. Kotz, "A Survey of Context-Aware Mobile -Computing Research," Technical Report TR2000-381, Dept. of Computer -Science, Dartmouth College, Nov. 2000. -http://www.cs.dartmouth.edu/reports/TR2000-381.pdf - -\[Chen 2006\] K.-T. Chen, C.-Y. Huang, P. Huang, C.-L. Lei, "Quantifying -Skype User Satisfaction," Proc. 2006 ACM SIGCOMM (Pisa, Italy, -Sept. 2006). - -\[Chen 2011\] Y. Chen, S. Jain, V. K. Adhikari, Z. Zhang, -"Characterizing Roles of Front-End Servers in End-to-End Performance of -Dynamic Content Distribution," Proc. 
2011 ACM Internet Measurement Conference (Berlin, Germany, Nov. 2011).

\[Cheswick 2000\] B. Cheswick, H. Burch, S. Branigan, "Mapping and
Visualizing the Internet," Proc. 2000 Usenix Conference (San Diego, CA,
June 2000).

\[Chiu 1989\] D. Chiu, R. Jain, "Analysis of the Increase and Decrease
Algorithms for Congestion Avoidance in Computer Networks," Computer
Networks and ISDN Systems, Vol. 17, No. 1, pp. 1--14.
http://www.cs.wustl.edu/\~jain/papers/cong_av.htm

\[Christiansen 2001\] M. Christiansen, K. Jeffay, D. Ott, F. D. Smith,
"Tuning Red for Web Traffic," IEEE/ACM Transactions on Networking, Vol.
9, No. 3 (June 2001), pp. 249--264.

\[Chuang 2005\] S. Chuang, S. Iyer, N. McKeown, "Practical Algorithms
for Performance Guarantees in Buffered Crossbars," Proc. 2005 IEEE
INFOCOM.

\[Cisco 802.11ac 2014\] Cisco Systems, "802.11ac: The Fifth Generation
of Wi-Fi," Technical White Paper, Mar. 2014.

\[Cisco 7600 2016\] Cisco Systems, "Cisco 7600 Series Solution and
Design Guide,"
http://www.cisco.com/en/US/products/hw/routers/ps368/prod_technical_reference09186a0080092246.html

\[Cisco 8500 2012\] Cisco Systems Inc., "Catalyst 8500 Campus Switch
Router Architecture,"
http://www.cisco.com/univercd/cc/td/doc/product/l3sw/8540/rel_12_0/w5_6f/softcnfg/1cfg8500.pdf

\[Cisco 12000 2016\] Cisco Systems Inc., "Cisco XR 12000 Series and
Cisco 12000 Series Routers,"
http://www.cisco.com/en/US/products/ps6342/index.html

\[Cisco 2012\] Cisco 2012, Data Centers, http://www.cisco.com/go/dce

\[Cisco 2015\] Cisco Visual Networking Index: Forecast and Methodology,
2014--2019, White Paper, 2015.

\[Cisco 6500 2016\] Cisco Systems, "Cisco Catalyst 6500 Architecture
White Paper,"
http://www.cisco.com/c/en/us/products/collateral/switches/catalyst-6500-series-switches/prod_white_paper0900aecd80673385.html

\[Cisco NAT 2016\] Cisco Systems Inc., "How NAT Works,"
http://www.cisco.com/en/US/tech/tk648/tk361/technologies_tech_note09186a0080094831.shtml

\[Cisco QoS 2016\] Cisco Systems Inc., "Advanced QoS Services for the
Intelligent Internet,"
http://www.cisco.com/warp/public/cc/pd/iosw/ioft/ioqo/tech/qos_wp.htm

\[Cisco Queue 2016\] Cisco Systems Inc., "Congestion Management
Overview,"
http://www.cisco.com/en/US/docs/ios/12_2/qos/configuration/guide/qcfconmg.html

\[Cisco SYN 2016\] Cisco Systems Inc., "Defining Strategies to Protect
Against TCP SYN Denial of Service Attacks,"
http://www.cisco.com/en/US/tech/tk828/technologies_tech_note09186a00800f67d5.shtml

\[Cisco TCAM 2014\] Cisco Systems Inc., "CAT 6500 and 7600 Series
Routers and Switches TCAM Allocation Adjustment Procedures,"
http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/117712-problemsolution-cat6500-00.html

\[Cisco VNI 2015\] Cisco Systems Inc., "Visual Networking Index,"
http://www.cisco.com/web/solutions/sp/vni/vni_forecast_highlights/index.html

\[Clark 1988\] D. Clark, "The Design Philosophy of the DARPA Internet
Protocols," Proc. 1988 ACM SIGCOMM (Stanford, CA, Aug. 1988).

\[Cohen 1977\] D. Cohen, "Issues in Transnet Packetized Voice
Communication," Proc. Fifth Data Communications Symposium (Snowbird, UT,
Sept. 1977), pp. 6--13.

\[Cookie Central 2016\] Cookie Central homepage,
http://www.cookiecentral.com/n_cookie_faq.htm

\[Cormen 2001\] T. H. Cormen, Introduction to Algorithms, 2nd Ed., MIT
Press, Cambridge, MA, 2001.

\[Crow 1997\] B. Crow, I. Widjaja, J. Kim, P. Sakai, "IEEE 802.11
Wireless Local Area Networks," IEEE Communications Magazine
(Sept. 1997), pp. 116--126.

\[Cusumano 1998\] M. A. Cusumano, D.
B. Yoffie, Competing on Internet
-Time: Lessons from Netscape and Its Battle with Microsoft, Free Press,
-New York, NY, 1998.
-
-\[Czyz 2014\] J. Czyz, M. Allman, J. Zhang, S. Iekel-Johnson, E.
-Osterweil, M. Bailey, "Measuring IPv6 Adoption," Proc. ACM SIGCOMM 2014,
-ACM, New York, NY, USA, pp. 87--98.
-
-\[Dahlman 1998\] E. Dahlman, B. Gudmundson, M. Nilsson, J. Sköld,
-"UMTS/IMT-2000 Based on Wideband CDMA," IEEE Communications Magazine
-(Sept. 1998), pp. 70--80.
-
-\[Daigle 1991\] J. N. Daigle, Queuing Theory for Telecommunications,
-Addison-Wesley, Reading, MA, 1991.
-
-\[DAM 2016\] Digital Attack Map, http://www.digitalattackmap.com
-
-\[Davie 2000\] B. Davie and Y. Rekhter, MPLS: Technology and
-Applications, Morgan Kaufmann Series in Networking, 2000.
-
-\[Davies 2005\] G. Davies, F. Kelly, "Network Dimensioning, Service
-Costing, and Pricing in a Packet-Switched Environment,"
-Telecommunications Policy, Vol. 28, No. 4, pp. 391--412.
-
-\[DEC 1990\] Digital Equipment Corporation, "In Memoriam: J. C. R.
-Licklider 1915--1990," SRC Research Report 61, Aug. 1990.
-http://www.memex.org/licklider.pdf
-
-\[DeClercq 2002\] J. DeClercq, O. Paridaens, "Scalability Implications
-of Virtual Private Networks," IEEE Communications Magazine, Vol. 40,
-No. 5 (May 2002), pp. 151--157.
-
-\[Demers 1990\] A. Demers, S. Keshav, S. Shenker, "Analysis and
-Simulation of a Fair Queuing Algorithm," Internetworking: Research and
-Experience, Vol. 1, No. 1 (1990), pp. 3--26.
-
-\[dhc 2016\] IETF Dynamic Host Configuration working group homepage,
-http://www.ietf.org/html.charters/dhc-charter.html
-
-\[Dhungel 2012\] P. Dhungel, K. W. Ross, M. Steiner, Y. Tian, X. Hei,
-"Xunlei: Peer-Assisted Download Acceleration on a Massive Scale,"
-Passive and Active Measurement Conference (PAM) 2012, Vienna, 2012.
-
-\[Diffie 1976\] W. Diffie, M. E. Hellman, "New Directions in
-Cryptography," IEEE Transactions on Information Theory, Vol. IT-22
-(1976), pp. 644--654.
-
-\[Diggavi 2004\] S. N.
Diggavi, N. Al-Dhahir, A. Stamoulis, R.
-Calderbank, "Great Expectations: The Value of Spatial Diversity in
-Wireless Networks," Proceedings of the IEEE, Vol. 92, No. 2 (Feb. 2004).
-
-\[Dilley 2002\] J. Dilley, B. Maggs, J. Parikh, H. Prokop, R. Sitaraman,
-B. Weihl, "Globally Distributed Content Delivery," IEEE Internet
-Computing (Sept.--Oct. 2002).
-
-\[Diot 2000\] C. Diot, B. N. Levine, B. Lyles, H. Kassem, D.
-Balensiefen, "Deployment Issues for the IP Multicast Service and
-Architecture," IEEE Network, Vol. 14, No. 1 (Jan./Feb. 2000), pp. 78--88.
-
-\[Dischinger 2007\] M. Dischinger, A. Haeberlen, K. Gummadi, S. Saroiu,
-"Characterizing Residential Broadband Networks," Proc. 2007 ACM Internet
-Measurement Conference, pp. 24--26.
-
-\[Dmitiropoulos 2007\] X. Dmitiropoulos, D. Krioukov, M. Fomenkov, B.
-Huffaker, Y. Hyun, K. C. Claffy, G. Riley, "AS Relationships: Inference
-and Validation," ACM Computer Communication Review (Jan. 2007).
-
-\[DOCSIS 2011\] Data-Over-Cable Service Interface Specifications, DOCSIS
-3.0: MAC and Upper Layer Protocols Interface Specification,
-CM-SP-MULPIv3.0-I16-110623, 2011.
-
-\[Dodge 2016\] M. Dodge, "An Atlas of Cyberspaces,"
-http://www.cybergeography.org/atlas/isp_maps.html
-
-\[Donahoo 2001\] M. Donahoo, K. Calvert, TCP/IP Sockets in C: Practical
-Guide for Programmers, Morgan Kaufmann, 2001.
-
-\[DSL 2016\] DSL Forum homepage, http://www.dslforum.org/
-
-\[Dhunghel 2008\] P. Dhungel, D. Wu, B. Schonhorst, K. W. Ross, "A
-Measurement Study of Attacks on BitTorrent Leechers," 7th International
-Workshop on Peer-to-Peer Systems (IPTPS 2008) (Tampa Bay, FL,
-Feb. 2008).
-
-\[Droms 2002\] R. Droms, T. Lemon, The DHCP Handbook (2nd Edition), SAMS
-Publishing, 2002.
-
-\[Edney 2003\] J. Edney and W. A. Arbaugh, Real 802.11 Security: Wi-Fi
-Protected Access and 802.11i, Addison-Wesley Professional, 2003.
-
-\[Edwards 2011\] W. K. Edwards, R. Grinter, R. Mahajan, D.
Wetherall,
-"Advancing the State of Home Networking," Communications of the ACM,
-Vol. 54, No. 6 (June 2011), pp. 62--71.
-
-\[Ellis 1987\] H. Ellis, "The Story of Non-Secret Encryption,"
-http://jya.com/ellisdoc.htm
-
-\[Erickson 2013\] D. Erickson, "The Beacon OpenFlow Controller," 2nd
-ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking
-(HotSDN '13), ACM, New York, NY, USA, pp. 13--18.
-
-\[Ericsson 2012\] Ericsson, "The Evolution of Edge,"
-http://www.ericsson.com/technology/whitepapers/broadband/evolution_of_EDGE.shtml
-
-\[Facebook 2014\] A. Andreyev, "Introducing Data Center Fabric, the
-Next-Generation Facebook Data Center Network,"
-https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network
-
-\[Faloutsos 1999\] C. Faloutsos, M. Faloutsos, P. Faloutsos, "What Does
-the Internet Look Like? Empirical Laws of the Internet Topology," Proc.
-1999 ACM SIGCOMM (Boston, MA, Aug. 1999).
-
-\[Farrington 2010\] N. Farrington, G. Porter, S. Radhakrishnan, H.
-Bazzaz, V. Subramanya, Y. Fainman, G. Papen, A. Vahdat, "Helios: A
-Hybrid Electrical/Optical Switch Architecture for Modular Data Centers,"
-Proc. 2010 ACM SIGCOMM.
-
-\[Feamster 2004\] N. Feamster, H. Balakrishnan, J. Rexford, A. Shaikh,
-K. van der Merwe, "The Case for Separating Routing from Routers," ACM
-SIGCOMM Workshop on Future Directions in Network Architecture,
-Sept. 2004.
-
-\[Feamster 2004\] N. Feamster, J. Winick, J. Rexford, "A Model for BGP
-Routing for Network Engineering," Proc. 2004 ACM SIGMETRICS (New York,
-NY, June 2004).
-
-\[Feamster 2005\] N. Feamster, H. Balakrishnan, "Detecting BGP
-Configuration Faults with Static Analysis," NSDI (May 2005).
-
-\[Feamster 2013\] N. Feamster, J. Rexford, E. Zegura, "The Road to SDN,"
-ACM Queue, Vol. 11, No. 12 (Dec. 2013).
-
-\[Feldmeier 1995\] D. Feldmeier, "Fast Software Implementation of Error
-Detection Codes," IEEE/ACM Transactions on Networking, Vol. 3, No.
6
-(Dec. 1995), pp. 640--652.
-
-\[Ferguson 2013\] A. Ferguson, A. Guha, C. Liang, R. Fonseca, S.
-Krishnamurthi, "Participatory Networking: An API for Application Control
-of SDNs," Proc. ACM SIGCOMM 2013, pp. 327--338.
-
-\[Fielding 2000\] R. Fielding, "Architectural Styles and the Design of
-Network-based Software Architectures," PhD Thesis, UC Irvine, 2000.
-
-\[FIPS 1995\] Federal Information Processing Standard, "Secure Hash
-Standard," FIPS Publication 180-1.
-http://www.itl.nist.gov/fipspubs/fip180-1.htm
-
-\[Floyd 1999\] S. Floyd, K. Fall, "Promoting the Use of End-to-End
-Congestion Control in the Internet," IEEE/ACM Transactions on
-Networking, Vol. 7, No. 4 (Aug. 1999), pp. 458--472.
-
-\[Floyd 2000\] S. Floyd, M. Handley, J. Padhye, J. Widmer,
-"Equation-Based Congestion Control for Unicast Applications," Proc. 2000
-ACM SIGCOMM (Stockholm, Sweden, Aug. 2000).
-
-\[Floyd 2001\] S. Floyd, "A Report on Some Recent Developments in TCP
-Congestion Control," IEEE Communications Magazine (Apr. 2001).
-
-\[Floyd 2016\] S. Floyd, "References on RED (Random Early Detection)
-Queue Management," http://www.icir.org/floyd/red.html
-
-\[Floyd Synchronization 1994\] S. Floyd, V. Jacobson, "Synchronization
-of Periodic Routing Messages," IEEE/ACM Transactions on Networking, Vol.
-2, No. 2 (Apr. 1994), pp. 122--136.
-
-\[Floyd TCP 1994\] S. Floyd, "TCP and Explicit Congestion Notification,"
-ACM SIGCOMM Computer Communications Review, Vol. 24, No. 5 (Oct. 1994),
-pp. 10--23.
-
-\[Fluhrer 2001\] S. Fluhrer, I. Mantin, A. Shamir, "Weaknesses in the
-Key Scheduling Algorithm of RC4," Eighth Annual Workshop on Selected
-Areas in Cryptography (Toronto, Canada, Aug. 2001).
-
-\[Fortz 2000\] B. Fortz, M. Thorup, "Internet Traffic Engineering by
-Optimizing OSPF Weights," Proc. 2000 IEEE INFOCOM (Tel Aviv, Israel,
-Apr. 2000).
-
-\[Fortz 2002\] B. Fortz, J. Rexford, M.
Thorup, "Traffic Engineering -with Traditional IP Routing Protocols," IEEE Communication Magazine -(Oct. 2002). - -\[Fraleigh 2003\] C. Fraleigh, F. Tobagi, C. Diot, "Provisioning IP -Backbone Networks to Support Latency Sensitive Traffic," Proc. 2003 IEEE -INFOCOM (San Francisco, CA, Mar. 2003). - -\[Frost 1994\] J. Frost, "BSD Sockets: A Quick and Dirty Primer," -http://world.std .com/\~jimf/papers/sockets/sockets.html - -\[FTC 2015\] Internet of Things: Privacy and Security in a Connected -World, Federal Trade Commission, 2015, -https://www.ftc.gov/system/files/documents/reports/ -federal-trade-commission-staff-report-november-2013-workshop-entitled-internet-things-privacy/150127iotrpt.pdf - -\[FTTH 2016\] Fiber to the Home Council, http://www.ftthcouncil.org/ - -\[Gao 2001\] L. Gao, J. Rexford, "Stable Internet Routing Without Global -Coordination," IEEE/ACM Transactions on Networking, Vol. 9, No. 6 -(Dec. 2001), pp. 681--692. - -\[Gartner 2014\] Gartner report on Internet of Things, -http://www.gartner.com/ technology/research/internet-of-things - -\[Gauthier 1999\] L. Gauthier, C. Diot, and J. Kurose, "End-to-End -Transmission Control Mechanisms for Multiparty Interactive Applications -on the Internet," Proc. 1999 IEEE INFOCOM (New York, NY, Apr. 1999). - -\[Gember-Jacobson 2014\] A. Gember-Jacobson, R. Viswanathan, C. Prakash, -R. Grandl, J. Khalid, S. Das, A. Akella, "OpenNF: Enabling Innovation in -Network Function Control," Proc. ACM SIGCOMM 2014, pp. 163--174. - -\[Goodman 1997\] David J. Goodman, Wireless Personal Communications -Systems, Prentice-Hall, 1997. - -\[Google IPv6 2015\] Google Inc. "IPv6 Statistics," -https://www.google.com/intl/en/ipv6/statistics.html - -\[Google Locations 2016\] Google data centers. -http://www.google.com/corporate/datacenter/locations.html - -\[Goralski 1999\] W. Goralski, Frame Relay for High-Speed Networks, John -Wiley, New York, 1999. - -\[Greenberg 2009a\] A. Greenberg, J. Hamilton, D. Maltz, P. 
Patel, "The
-Cost of a Cloud: Research Problems in Data Center Networks," ACM
-Computer Communications Review (Jan. 2009).
-
-\[Greenberg 2009b\] A. Greenberg, N. Jain, S. Kandula, C. Kim, P.
-Lahiri, D. Maltz, P. Patel, S. Sengupta, "VL2: A Scalable and Flexible
-Data Center Network," Proc. 2009 ACM SIGCOMM.
-
-\[Greenberg 2011\] A. Greenberg, J. Hamilton, N. Jain, S. Kandula, C.
-Kim, P. Lahiri, D. Maltz, P. Patel, S. Sengupta, "VL2: A Scalable and
-Flexible Data Center Network," Communications of the ACM, Vol. 54, No. 3
-(Mar. 2011), pp. 95--104.
-
-\[Greenberg 2015\] A. Greenberg, "SDN for the Cloud," SIGCOMM 2015
-Keynote Address,
-http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/keynote.pdf
-
-\[Griffin 2012\] T. Griffin, "Interdomain Routing Links,"
-http://www.cl.cam.ac.uk/\~tgg22/interdomain/
-
-\[Gude 2008\] N. Gude, T. Koponen, J. Pettit, B. Pfaff, M. Casado, N.
-McKeown, and S. Shenker, "NOX: Towards an Operating System for
-Networks," ACM SIGCOMM Computer Communication Review, July 2008.
-
-\[Guha 2006\] S. Guha, N. Daswani, R. Jain, "An Experimental Study of
-the Skype Peer-to-Peer VoIP System," Proc. Fifth Int. Workshop on P2P
-Systems (Santa Barbara, CA, 2006).
-
-\[Guo 2005\] L. Guo, S. Chen, Z. Xiao, E. Tan, X. Ding, X. Zhang,
-"Measurement, Analysis, and Modeling of BitTorrent-Like Systems," Proc.
-2005 ACM Internet Measurement Conference.
-
-\[Guo 2009\] C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y.
-Zhang, S. Lu, "BCube: A High Performance, Server-centric Network
-Architecture for Modular Data Centers," Proc. 2009 ACM SIGCOMM.
-
-\[Gupta 2001\] P. Gupta, N. McKeown, "Algorithms for Packet
-Classification," IEEE Network Magazine, Vol. 15, No. 2 (Mar./Apr. 2001),
-pp. 24--32.
-
-\[Gupta 2014\] A. Gupta, L. Vanbever, M. Shahbaz, S. Donovan, B.
-Schlinker, N. Feamster, J. Rexford, S. Shenker, R. Clark, E.
-Katz-Bassett, "SDX: A Software Defined Internet Exchange," Proc. ACM
-SIGCOMM 2014 (Aug. 2014), pp. 551--562.
-
-\[Ha 2008\] S. Ha, I. Rhee, L. Xu, "CUBIC: A New TCP-Friendly High-Speed
-TCP Variant," ACM SIGOPS Operating System Review, 2008.
-
-\[Halabi 2000\] S. Halabi, Internet Routing Architectures, 2nd Ed.,
-Cisco Press, 2000.
-
-\[Hanabali 2005\] A. A. Hanbali, E. Altman, P. Nain, "A Survey of TCP
-over Ad Hoc Networks," IEEE Commun. Surveys and Tutorials, Vol. 7, No. 3
-(2005), pp. 22--36.
-
-\[Hei 2007\] X. Hei, C. Liang, J. Liang, Y. Liu, K. W. Ross, "A
-Measurement Study of a Large-scale P2P IPTV System," IEEE Trans. on
-Multimedia (Dec. 2007).
-
-\[Heidemann 1997\] J. Heidemann, K. Obraczka, J. Touch, "Modeling the
-Performance of HTTP over Several Transport Protocols," IEEE/ACM
-Transactions on Networking, Vol. 5, No. 5 (Oct. 1997), pp. 616--630.
-
-\[Held 2001\] G. Held, Data Over Wireless Networks: Bluetooth, WAP, and
-Wireless LANs, McGraw-Hill, 2001.
-
-\[Holland 2001\] G. Holland, N. Vaidya, V. Bahl, "A Rate-Adaptive MAC
-Protocol for Multi-Hop Wireless Networks," Proc. 2001 ACM Int.
-Conference on Mobile Computing and Networking (Mobicom '01) (Rome,
-Italy, July 2001).
-
-\[Hollot 2002\] C. V. Hollot, V. Misra, D. Towsley, W. Gong, "Analysis
-and Design of Controllers for AQM Routers Supporting TCP Flows," IEEE
-Transactions on Automatic Control, Vol. 47, No. 6 (June 2002),
-pp. 945--959.
-
-\[Hong 2013\] C. Hong, S. Kandula, R. Mahajan, M. Zhang, V. Gill, M.
-Nanduri, R. Wattenhofer, "Achieving High Utilization with
-Software-driven WAN," ACM SIGCOMM Conference (Aug. 2013), pp. 15--26.
-
-\[Huang 2002\] C. Huang, V. Sharma, K. Owens, V. Makam, "Building
-Reliable MPLS Networks Using a Path Protection Mechanism," IEEE
-Communications Magazine, Vol. 40, No. 3 (Mar. 2002), pp. 156--162.
-
-\[Huang 2005\] Y. Huang, R. Guerin, "Does Over-Provisioning Become More
-or Less Efficient as Networks Grow Larger?," Proc. IEEE Int. Conf.
-Network Protocols (ICNP) (Boston, MA, Nov. 2005).
-
-\[Huang 2008\] C. Huang, J. Li, A. Wang, K. W.
Ross, "Understanding -Hybrid CDN-P2P: Why Limelight Needs Its Own Red Swoosh," Proc. 2008 -NOSSDAV, Braunschweig, Germany. - -\[Huitema 1998\] C. Huitema, IPv6: The New Internet Protocol, 2nd Ed., -Prentice Hall, Englewood Cliffs, NJ, 1998. - -\[Huston 1999a\] G. Huston, "Interconnection, Peering, and -Settlements---Part I," The Internet Protocol Journal, Vol. 2, No. 1 -(Mar. 1999). - -\[Huston 2004\] G. Huston, "NAT Anatomy: A Look Inside Network Address -Translators," The Internet Protocol Journal, Vol. 7, No. 3 (Sept. 2004). - -\[Huston 2008a\] G. Huston, "Confronting IPv4 Address Exhaustion," -http://www.potaroo.net/ispcol/2008-10/v4depletion.html - -\[Huston 2008b\] G. Huston, G. Michaelson, "IPv6 Deployment: Just where -are we?" http://www.potaroo.net/ispcol/2008-04/ipv6.html - -\[Huston 2011a\] G. Huston, "A Rough Guide to Address Exhaustion," The -Internet Protocol Journal, Vol. 14, No. 1 (Mar. 2011). - -\[Huston 2011b\] G. Huston, "Transitioning Protocols," The Internet -Protocol Journal, Vol. 14, No. 1 (Mar. 2011). - -\[IAB 2016\] Internet Architecture Board homepage, http://www.iab.org/ - -\[IANA Protocol Numbers 2016\] Internet Assigned Numbers Authority, -Protocol Numbers, -http://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml - -\[IBM 1997\] IBM Corp., IBM Inside APPN - The Essential Guide to the -Next-Generation SNA, SG24-3669-03, June 1997. 
- -\[ICANN 2016\] The Internet Corporation for Assigned Names and Numbers -homepage, http://www.icann.org - -\[IEEE 802 2016\] IEEE 802 LAN/MAN Standards Committee homepage, -http://www.ieee802.org/ - -\[IEEE 802.11 1999\] IEEE 802.11, "1999 Edition (ISO/IEC 8802-11: 1999) -IEEE Standards for Information Technology---Telecommunications and -Information Exchange Between Systems---Local and Metropolitan Area -Network---Specific Requirements---Part 11: Wireless LAN Medium Access -Control (MAC) and Physical Layer (PHY) Specification," -http://standards.ieee.org/getieee802/download/802.11-1999.pdf - -\[IEEE 802.11ac 2013\] IEEE, "802.11ac-2013---IEEE Standard for -Information technology---Telecommunications and Information Exchange -Between Systems---Local and Metropolitan Area Networks---Specific -Requirements---Part 11: Wireless LAN Medium Access Control (MAC) and -Physical Layer (PHY) Specifications---Amendment 4: Enhancements for Very -High Throughput for Operation in Bands Below 6 GHz." - -\[IEEE 802.11n 2012\] IEEE, "IEEE P802.11---Task Group N---Meeting -Update: Status of 802.11n," -http://grouper.ieee.org/groups/802/11/Reports/tgn_update .htm - -\[IEEE 802.15 2012\] IEEE 802.15 Working Group for WPAN homepage, -http://grouper.ieee.org/groups/802/15/. 
- -\[IEEE 802.15.4 2012\] IEEE 802.15 WPAN Task Group 4, -http://www.ieee802.org/15/pub/TG4.html - -\[IEEE 802.16d 2004\] IEEE, "IEEE Standard for Local and Metropolitan -Area Networks, Part 16: Air Interface for Fixed Broadband Wireless -Access Systems," http:// -standards.ieee.org/getieee802/download/802.16-2004.pdf - -\[IEEE 802.16e 2005\] IEEE, "IEEE Standard for Local and Metropolitan -Area Networks, Part 16: Air Interface for Fixed and Mobile Broadband -Wireless Access Systems, Amendment 2: Physical and Medium Access Control -Layers for Combined Fixed and Mobile Operation in Licensed Bands and -Corrigendum 1," http:// -standards.ieee.org/getieee802/download/802.16e-2005.pdf - -\[IEEE 802.1q 2005\] IEEE, "IEEE Standard for Local and Metropolitan -Area Networks: Virtual Bridged Local Area Networks," -http://standards.ieee.org/ getieee802/ download/802.1Q-2005.pdf - -\[IEEE 802.1X\] IEEE Std 802.1X-2001 Port-Based Network Access Control, -http://standards.ieee.org/reading/ieee/std_public/description/lanman/ -802.1x-2001_desc.html - -\[IEEE 802.3 2012\] IEEE, "IEEE 802.3 CSMA/CD (Ethernet)," -http://grouper.ieee.org/groups/802/3/ - -\[IEEE 802.5 2012\] IEEE, IEEE 802.5 homepage, http://www.ieee802.org/5/ -www8025org/ - -\[IETF 2016\] Internet Engineering Task Force homepage, -http://www.ietf.org - -\[Ihm 2011\] S. Ihm, V. S. Pai, "Towards Understanding Modern Web -Traffic," Proc. 2011 ACM Internet Measurement Conference (Berlin). - -\[IMAP 2012\] The IMAP Connection, http://www.imap.org/ - -\[Intel 2016\] Intel Corp., "Intel 710 Ethernet Adapter," -http://www.intel.com/ -content/www/us/en/ethernet-products/converged-network-adapters/ethernet-xl710 -.html - -\[Internet2 Multicast 2012\] Internet2 Multicast Working Group homepage, -http://www.internet2.edu/multicast/ - -\[ISC 2016\] Internet Systems Consortium homepage, http://www.isc.org - -\[ISI 1979\] Information Sciences Institute, "DoD Standard Internet -Protocol," Internet Engineering Note 123 (Dec. 
1979), -http://www.isi.edu/in-notes/ien/ ien123.txt - -\[ISO 2016\] International Organization for Standardization homepage, -International Organization for Standardization, http://www.iso.org/ - -\[ISO X.680 2002\] International Organization for Standardization, -"X.680: ITU-T Recommendation X.680 (2002) Information -Technology---Abstract Syntax Notation One (ASN.1): Specification of -Basic Notation," -http://www.itu.int/ITU-T/studygroups/com17/languages/X.680-0207.pdf - -\[ITU 1999\] Asymmetric Digital Subscriber Line (ADSL) Transceivers. -ITU-T G.992.1, 1999. - -\[ITU 2003\] Asymmetric Digital Subscriber Line (ADSL) -Transceivers---Extended Bandwidth ADSL2 (ADSL2Plus). ITU-T G.992.5, -2003. - -\[ITU 2005a\] International Telecommunication Union, "ITU-T X.509, The -Directory: Public-key and attribute certificate frameworks" (Aug. 2005). - -\[ITU 2006\] ITU, "G.993.1: Very High Speed Digital Subscriber Line -Transceivers (VDSL)," https://www.itu.int/rec/T-REC-G.993.1-200406-I/en, -2006. - -\[ITU 2015\] "Measuring the Information Society Report," 2015, -http://www.itu.int/en/ITU-D/Statistics/Pages/publications/mis2015.aspx - -\[ITU 2012\] The ITU homepage, http://www.itu.int/ - -\[ITU-T Q.2931 1995\] International Telecommunication Union, -"Recommendation Q.2931 (02/95)---Broadband Integrated Services Digital -Network (B-ISDN)--- Digital Subscriber Signalling System No. 2 (DSS -2)---User-Network Interface (UNI)---Layer 3 Specification for Basic -Call/Connection Control." - -\[IXP List 2016\] List of IXPs, Wikipedia, -https://en.wikipedia.org/wiki/List_of\_ Internet_exchange_points - -\[Iyengar 2015\] J. Iyengar, I. Swett, "QUIC: A UDP-Based Secure and -Reliable Transport for HTTP/2," Internet Draft -draft-tsvwg-quic-protocol-00, June 2015. - -\[Iyer 2008\] S. Iyer, R. R. Kompella, N. McKeown, "Designing Packet -Buffers for Router Line Cards," IEEE Transactions on Networking, Vol. -16, No. 3 (June 2008), pp. 705--717. - -\[Jacobson 1988\] V. 
Jacobson, "Congestion Avoidance and Control," Proc.
-1988 ACM SIGCOMM (Stanford, CA, Aug. 1988), pp. 314--329.
-
-\[Jain 1986\] R. Jain, "A Timeout-Based Congestion Control Scheme for
-Window Flow-Controlled Networks," IEEE Journal on Selected Areas in
-Communications SAC-4, 7 (Oct. 1986).
-
-\[Jain 1989\] R. Jain, "A Delay-Based Approach for Congestion Avoidance
-in Interconnected Heterogeneous Computer Networks," ACM SIGCOMM Computer
-Communications Review, Vol. 19, No. 5 (1989), pp. 56--71.
-
-\[Jain 1994\] R. Jain, FDDI Handbook: High-Speed Networking Using Fiber
-and Other Media, Addison-Wesley, Reading, MA, 1994.
-
-\[Jain 1996\] R. Jain, S. Kalyanaraman, S. Fahmy, R. Goyal, S. Kim,
-"Tutorial Paper on ABR Source Behavior," ATM Forum/96-1270, Oct. 1996.
-http://www.cse.wustl.edu/\~jain/atmf/ftp/atm96-1270.pdf
-
-\[Jain 2013\] S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A.
-Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu, J. Zolla, U. Hölzle, S.
-Stuart, A. Vahdat, "B4: Experience with a Globally Deployed Software
-Defined WAN," ACM SIGCOMM 2013, pp. 3--14.
-
-\[Jaiswal 2003\] S. Jaiswal, G. Iannaccone, C. Diot, J. Kurose, D.
-Towsley, "Measurement and Classification of Out-of-Sequence Packets in a
-Tier-1 IP Backbone," Proc. 2003 IEEE INFOCOM.
-
-\[Ji 2003\] P. Ji, Z. Ge, J. Kurose, D. Towsley, "A Comparison of
-Hard-State and Soft-State Signaling Protocols," Proc. 2003 ACM SIGCOMM
-(Karlsruhe, Germany, Aug. 2003).
-
-\[Jimenez 1997\] D. Jimenez, "Outside Hackers Infiltrate MIT Network,
-Compromise Security," The Tech, Vol. 117, No. 49 (Oct. 1997), p. 1,
-http://www-tech.mit.edu/V117/N49/hackers.49n.html
-
-\[Jin 2004\] C. Jin, D. X. Wei, S. Low, "FAST TCP: Motivation,
-Architecture, Algorithms, Performance," Proc. 2004 IEEE INFOCOM (Hong
-Kong, Mar. 2004).
-
-\[Juniper Contrail 2016\] Juniper Networks, "Contrail,"
-http://www.juniper.net/us/en/products-services/sdn/contrail/
-
-\[Juniper MX2020 2015\] Juniper Networks, "MX2020 and MX2010 3D
-Universal Edge Routers,"
-www.juniper.net/us/en/local/pdf/.../1000417-en.pdf
-
-\[Kaaranen 2001\] H. Kaaranen, S. Naghian, L. Laitinen, A. Ahtiainen, V.
-Niemi, UMTS Networks: Architecture, Mobility and Services, John Wiley &
-Sons, New York, NY, 2001.
-
-\[Kahn 1967\] D. Kahn, The Codebreakers: The Story of Secret Writing,
-The Macmillan Company, 1967.
-
-\[Kahn 1978\] R. E. Kahn, S. Gronemeyer, J. Burchfiel, R. Kunzelman,
-"Advances in Packet Radio Technology," Proceedings of the IEEE, Vol. 66,
-No. 11 (Nov. 1978).
-
-\[Kamerman 1997\] A. Kamerman, L. Monteban, "WaveLAN-II: A
-High-Performance Wireless LAN for the Unlicensed Band," Bell Labs
-Technical Journal (Summer 1997), pp. 118--133.
-
-\[Kar 2000\] K. Kar, M. Kodialam, T. V. Lakshman, "Minimum Interference
-Routing of Bandwidth Guaranteed Tunnels with MPLS Traffic Engineering
-Applications," IEEE J. Selected Areas in Communications (Dec. 2000).
-
-\[Karn 1987\] P. Karn, C. Partridge, "Improving Round-Trip Time
-Estimates in Reliable Transport Protocols," Proc. 1987 ACM SIGCOMM.
-
-\[Karol 1987\] M. Karol, M. Hluchyj, A. Morgan, "Input Versus Output
-Queuing on a Space-Division Packet Switch," IEEE Transactions on
-Communications, Vol. 35, No. 12 (Dec. 1987), pp. 1347--1356.
-
-\[Kaufman 1995\] C. Kaufman, R. Perlman, M. Speciner, Network Security:
-Private Communication in a Public World, Prentice Hall, Englewood
-Cliffs, NJ, 1995.
-
-\[Kelly 1998\] F. P. Kelly, A. Maulloo, D. Tan, "Rate Control for
-Communication Networks: Shadow Prices, Proportional Fairness and
-Stability," J. Operations Res. Soc., Vol. 49, No. 3 (Mar. 1998),
-pp. 237--252.
-
-\[Kelly 2003\] T. Kelly, "Scalable TCP: Improving Performance in High
-Speed Wide Area Networks," ACM SIGCOMM Computer Communications Review,
-Vol. 33, No. 2 (Apr. 2003), pp. 83--91.
-
-\[Kilkki 1999\] K. Kilkki, Differentiated Services for the Internet,
-Macmillan Technical Publishing, Indianapolis, IN, 1999.
-
-\[Kim 2005\] H. Kim, S. Rixner, V. Pai, "Network Interface Data
-Caching," IEEE Transactions on Computers, Vol. 54, No. 11 (Nov. 2005),
-pp. 1394--1408.
-
-\[Kim 2008\] C. Kim, M. Caesar, J. Rexford, "Floodless in SEATTLE: A
-Scalable Ethernet Architecture for Large Enterprises," Proc. 2008 ACM
-SIGCOMM (Seattle, WA, Aug. 2008).
-
-\[Kleinrock 1961\] L. Kleinrock, "Information Flow in Large
-Communication Networks," RLE Quarterly Progress Report, July 1961.
-
-\[Kleinrock 1964\] L. Kleinrock, Communication Nets: Stochastic Message
-Flow and Delay, McGraw-Hill, New York, NY, 1964.
-
-\[Kleinrock 1975\] L. Kleinrock, Queueing Systems, Vol. 1, John Wiley,
-New York, 1975.
-
-\[Kleinrock 1975b\] L. Kleinrock, F. A. Tobagi, "Packet Switching in
-Radio Channels: Part I---Carrier Sense Multiple-Access Modes and Their
-Throughput-Delay Characteristics," IEEE Transactions on Communications,
-Vol. 23, No. 12 (Dec. 1975), pp. 1400--1416.
-
-\[Kleinrock 1976\] L. Kleinrock, Queueing Systems, Vol. 2, John Wiley,
-New York, 1976.
-
-\[Kleinrock 2004\] L. Kleinrock, "The Birth of the Internet,"
-http://www.lk.cs.ucla.edu/LK/Inet/birth.html
-
-\[Kohler 2006\] E. Kohler, M. Handley, S. Floyd, "Designing DCCP:
-Congestion Control Without Reliability," Proc. 2006 ACM SIGCOMM (Pisa,
-Italy, Sept. 2006).
-
-\[Kolding 2003\] T. Kolding, K. Pedersen, J. Wigard, F. Frederiksen, P.
-Mogensen, "High Speed Downlink Packet Access: WCDMA Evolution," IEEE
-Vehicular Technology Society News (Feb. 2003), pp. 4--10.
-
-\[Koponen 2010\] T. Koponen, M. Casado, N. Gude, J. Stribling, L.
-Poutievski, M. Zhu, R. Ramanathan, Y. Iwata, H. Inoue, T. Hama, S.
-Shenker, "Onix: A Distributed Control Platform for Large-Scale
-Production Networks," 9th USENIX Conference on Operating Systems Design
-and Implementation (OSDI '10), pp. 1--6.
-
-\[Koponen 2011\] T. Koponen, S.
Shenker, H. Balakrishnan, N. Feamster,
-I. Ganichev, A. Ghodsi, P. B. Godfrey, N. McKeown, G. Parulkar, B.
-Raghavan, J. Rexford, S. Arianfar, D. Kuptsov, "Architecting for
-Innovation," ACM Computer Communications Review, 2011.
-
-\[Korhonen 2003\] J. Korhonen, Introduction to 3G Mobile Communications,
-2nd Ed., Artech House, 2003.
-
-\[Koziol 2003\] J. Koziol, Intrusion Detection with Snort, Sams
-Publishing, 2003.
-
-\[Kreutz 2015\] D. Kreutz, F. M. V. Ramos, P. Esteves Verissimo, C.
-Rothenberg, S. Azodolmolky, S. Uhlig, "Software-Defined Networking: A
-Comprehensive Survey," Proceedings of the IEEE, Vol. 103, No. 1
-(Jan. 2015), pp. 14--76. This paper is also being updated at
-https://github.com/SDN-Survey/latex/wiki
-
-\[Krishnamurthy 2001\] B. Krishnamurthy, J. Rexford, Web Protocols and
-Practice: HTTP/1.1, Networking Protocols, and Traffic Measurement,
-Addison-Wesley, Boston, MA, 2001.
-
-\[Kulkarni 2005\] S. Kulkarni, C. Rosenberg, "Opportunistic Scheduling:
-Generalizations to Include Multiple Constraints, Multiple Interfaces,
-and Short Term Fairness," Wireless Networks, Vol. 11 (2005),
-pp. 557--569.
-
-\[Kumar 2006\] R. Kumar, K. W. Ross, "Optimal Peer-Assisted File
-Distribution: Single and Multi-Class Problems," IEEE Workshop on Hot
-Topics in Web Systems and Technologies (Boston, MA, 2006).
-
-\[Labovitz 1997\] C. Labovitz, G. R. Malan, F. Jahanian, "Internet
-Routing Instability," Proc. 1997 ACM SIGCOMM (Cannes, France,
-Sept. 1997), pp. 115--126.
-
-\[Labovitz 2010\] C. Labovitz, S. Iekel-Johnson, D. McPherson, J.
-Oberheide, F. Jahanian, "Internet Inter-Domain Traffic," Proc. 2010 ACM
-SIGCOMM.
-
-\[Labrador 1999\] M. Labrador, S. Banerjee, "Packet Dropping Policies
-for ATM and IP Networks," IEEE Communications Surveys, Vol. 2, No. 3
-(Third Quarter 1999), pp. 2--14.
-
-\[Lacage 2004\] M. Lacage, M. H. Manshaei, T. Turletti, "IEEE 802.11
-Rate Adaptation: A Practical Approach," ACM Int.
Symposium on Modeling,
-Analysis, and Simulation of Wireless and Mobile Systems (MSWiM) (Venice,
-Italy, Oct. 2004).
-
-\[Lakhina 2004\] A. Lakhina, M. Crovella, C. Diot, "Diagnosing
-Network-Wide Traffic Anomalies," Proc. 2004 ACM SIGCOMM.
-
-\[Lakhina 2005\] A. Lakhina, M. Crovella, C. Diot, "Mining Anomalies
-Using Traffic Feature Distributions," Proc. 2005 ACM SIGCOMM.
-
-\[Lakshman 1997\] T. V. Lakshman, U. Madhow, "The Performance of TCP/IP
-for Networks with High Bandwidth-Delay Products and Random Loss,"
-IEEE/ACM Transactions on Networking, Vol. 5, No. 3 (1997), pp. 336--350.
-
-\[Lakshman 2004\] T. V. Lakshman, T. Nandagopal, R. Ramjee, K. Sabnani,
-T. Woo, "The SoftRouter Architecture," Proc. 3rd ACM Workshop on Hot
-Topics in Networks (Hotnets-III), Nov. 2004.
-
-\[Lam 1980\] S. Lam, "A Carrier Sense Multiple Access Protocol for Local
-Networks," Computer Networks, Vol. 4 (1980), pp. 21--32.
-
-\[Lamport 1989\] L. Lamport, "The Part-Time Parliament," Technical
-Report 49, Systems Research Center, Digital Equipment Corp., Palo Alto,
-Sept. 1989.
-
-\[Lampson 1983\] B. Lampson, "Hints for Computer System Design," ACM
-SIGOPS Operating Systems Review, Vol. 17, No. 5, 1983.
-
-\[Lampson 1996\] B. Lampson, "How to Build a Highly Available System
-Using Consensus," Proc. 10th International Workshop on Distributed
-Algorithms (WDAG '96), Özalp Babaoglu and Keith Marzullo (Eds.),
-Springer-Verlag, pp. 1--17.
-
-\[Lawton 2001\] G. Lawton, "Is IPv6 Finally Gaining Ground?" IEEE
-Computer Magazine (Aug. 2001), pp. 11--15.
-
-\[LeBlond 2011\] S. Le Blond, C. Zhang, A. Legout, K. Ross, W. Dabbous,
-"I Know Where You Are and What You Are Sharing: Exploiting P2P
-Communications to Invade Users' Privacy," Proc. 2011 ACM Internet
-Measurement Conference, ACM, New York, NY, USA, pp. 45--60.
-
-\[Leighton 2009\] T. Leighton, "Improving Performance on the Internet,"
-Communications of the ACM, Vol. 52, No. 2 (Feb. 2009), pp. 44--51.
-
-\[Leiner 1998\] B. Leiner, V.
Cerf, D. Clark, R. Kahn, L. Kleinrock, D. Lynch, J. Postel, L. Roberts, S. Wolff, "A Brief History of the Internet," http://www.isoc.org/internet/history/brief.html

\[Leung 2006\] K. Leung, V. O. K. Li, "TCP in Wireless Networks: Issues, Approaches, and Challenges," IEEE Commun. Surveys and Tutorials, Vol. 8, No. 4 (2006), pp. 64--79.

\[Levin 2012\] D. Levin, A. Wundsam, B. Heller, N. Handigol, A. Feldmann, "Logically Centralized?: State Distribution Trade-offs in Software Defined Networks," Proc. First Workshop on Hot Topics in Software Defined Networks (Aug. 2012), pp. 1--6.

\[Li 2004\] L. Li, D. Alderson, W. Willinger, J. Doyle, "A First-Principles Approach to Understanding the Internet's Router-Level Topology," Proc. 2004 ACM SIGCOMM (Portland, OR, Aug. 2004).

\[Li 2007\] J. Li, M. Guidero, Z. Wu, E. Purpus, T. Ehrenkranz, "BGP Routing Dynamics Revisited," ACM Computer Communication Review (Apr. 2007).

\[Li 2015\] S. Q. Li, "Building Softcom Ecosystem Foundation," Open Networking Summit, 2015.

\[Lin 2001\] Y. Lin, I. Chlamtac, Wireless and Mobile Network Architectures, John Wiley and Sons, New York, NY, 2001.

\[Liogkas 2006\] N. Liogkas, R. Nelson, E. Kohler, L. Zhang, "Exploiting BitTorrent for Fun (but Not Profit)," 6th International Workshop on Peer-to-Peer Systems (IPTPS 2006).

\[Liu 2003\] J. Liu, I. Matta, M. Crovella, "End-to-End Inference of Loss Nature in a Hybrid Wired/Wireless Environment," Proc. WiOpt'03: Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks.

\[Locher 2006\] T. Locher, P. Moor, S. Schmid, R. Wattenhofer, "Free Riding in BitTorrent is Cheap," Proc. ACM HotNets 2006 (Irvine, CA, Nov. 2006).

\[Lui 2004\] J. Lui, V. Misra, D. Rubenstein, "On the Robustness of Soft State Protocols," Proc. IEEE Int. Conference on Network Protocols (ICNP '04), pp. 50--60.

\[Mahdavi 1997\] J. Mahdavi, S. Floyd, "TCP-Friendly Unicast Rate-Based Flow Control," unpublished note (Jan. 1997).

\[MaxMind 2016\] http://www.maxmind.com/app/ip-location

\[Maymounkov 2002\] P. Maymounkov, D. Mazières, "Kademlia: A Peer-to-Peer Information System Based on the XOR Metric," Proc. 1st International Workshop on Peer-to-Peer Systems (IPTPS '02) (Mar. 2002), pp. 53--65.

\[McKeown 1997a\] N. McKeown, M. Izzard, A. Mekkittikul, W. Ellersick, M. Horowitz, "The Tiny Tera: A Packet Switch Core," IEEE Micro Magazine (Jan.--Feb. 1997).

\[McKeown 1997b\] N. McKeown, "A Fast Switched Backplane for a Gigabit Switched Router," Business Communications Review, Vol. 27, No. 12. http://tinytera.stanford.edu/\~nickm/papers/cisco_fasts_wp.pdf

\[McKeown 2008\] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, J. Turner, "OpenFlow: Enabling Innovation in Campus Networks," ACM SIGCOMM Computer Communication Review, Vol. 38, No. 2 (Mar. 2008), pp. 69--74.

\[McQuillan 1980\] J. McQuillan, I. Richer, E. Rosen, "The New Routing Algorithm for the ARPANET," IEEE Transactions on Communications, Vol. 28, No. 5 (May 1980), pp. 711--719.

\[Metcalfe 1976\] R. M. Metcalfe, D. R. Boggs, "Ethernet: Distributed Packet Switching for Local Computer Networks," Communications of the Association for Computing Machinery, Vol. 19, No. 7 (July 1976), pp. 395--404.

\[Meyers 2004\] A. Myers, T. Ng, H. Zhang, "Rethinking the Service Model: Scaling Ethernet to a Million Nodes," ACM HotNets Conference, 2004.

\[MFA Forum 2016\] IP/MPLS Forum homepage, http://www.ipmplsforum.org/

\[Mockapetris 1988\] P. V. Mockapetris, K. J. Dunlap, "Development of the Domain Name System," Proc. 1988 ACM SIGCOMM (Stanford, CA, Aug. 1988).

\[Mockapetris 2005\] P. Mockapetris, SIGCOMM Award Lecture, video available at http://www.postel.org/sigcomm

\[Molinero-Fernandez 2002\] P. Molinero-Fernandez, N. McKeown, H. Zhang, "Is IP Going to Take Over the World (of Communications)?" Proc. 2002 ACM HotNets.

\[Molle 1987\] M. L. Molle, K.
Sohraby, A. N. Venetsanopoulos, "Space-Time Models of Asynchronous CSMA Protocols for Local Area Networks," IEEE Journal on Selected Areas in Communications, Vol. 5, No. 6 (1987), pp. 956--968.

\[Moore 2001\] D. Moore, G. Voelker, S. Savage, "Inferring Internet Denial of Service Activity," Proc. 2001 USENIX Security Symposium (Washington, DC, Aug. 2001).

\[Motorola 2007\] Motorola, "Long Term Evolution (LTE): A Technical Overview," http://www.motorola.com/staticfiles/Business/Solutions/Industry%20Solutions/Service%20Providers/Wireless%20Operators/LTE/\_Document/Static%20Files/6834_MotDoc_New.pdf

\[Mouly 1992\] M. Mouly, M. Pautet, The GSM System for Mobile Communications, Cell and Sys, Palaiseau, France, 1992.

\[Moy 1998\] J. Moy, OSPF: Anatomy of an Internet Routing Protocol, Addison-Wesley, Reading, MA, 1998.

\[Mukherjee 1997\] B. Mukherjee, Optical Communication Networks, McGraw-Hill, 1997.

\[Mukherjee 2006\] B. Mukherjee, Optical WDM Networks, Springer, 2006.

\[Mysore 2009\] R. N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, A. Vahdat, "PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric," Proc. 2009 ACM SIGCOMM.

\[Nahum 2002\] E. Nahum, T. Barzilai, D. Kandlur, "Performance Issues in WWW Servers," IEEE/ACM Transactions on Networking, Vol. 10, No. 1 (Feb. 2002).

\[Netflix Open Connect 2016\] Netflix Open Connect CDN, 2016, https://openconnect.netflix.com/

\[Netflix Video 1\] Designing Netflix's Content Delivery System, D. Fullagar, 2014, https://www.youtube.com/watch?v=LkLLpYdDINA

\[Netflix Video 2\] Scaling the Netflix Global CDN, D. Temkin, 2015, https://www.youtube.com/watch?v=tbqcsHg-Q_o

\[Neumann 1997\] R. Neumann, "Internet Routing Black Hole," The Risks Digest: Forum on Risks to the Public in Computers and Related Systems, Vol. 19, No. 12 (May 1997). http://catless.ncl.ac.uk/Risks/19.12.html#subj1.1

\[Neville-Neil 2009\] G.
Neville-Neil, "Whither Sockets?" Communications of the ACM, Vol. 52, No. 6 (June 2009), pp. 51--55.

\[Nicholson 2006\] A. Nicholson, Y. Chawathe, M. Chen, B. Noble, D. Wetherall, "Improved Access Point Selection," Proc. 2006 ACM Mobisys Conference (Uppsala, Sweden, 2006).

\[Nielsen 1997\] H. F. Nielsen, J. Gettys, A. Baird-Smith, E. Prud'hommeaux, H. W. Lie, C. Lilley, "Network Performance Effects of HTTP/1.1, CSS1, and PNG," W3C Document, 1997 (also appears in Proc. 1997 ACM SIGCOMM (Cannes, France, Sept. 1997), pp. 155--166).

\[NIST 2001\] National Institute of Standards and Technology, "Advanced Encryption Standard (AES)," Federal Information Processing Standards 197, Nov. 2001, http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf

\[NIST IPv6 2015\] US National Institute of Standards and Technology, "Estimating IPv6 & DNSSEC Deployment SnapShots," http://fedv6-deployment.antd.nist.gov/snapall.html

\[Nmap 2012\] Nmap homepage, http://www.insecure.com/nmap

\[Nonnenmacher 1998\] J. Nonnenmacher, E. Biersack, D. Towsley, "Parity-Based Loss Recovery for Reliable Multicast Transmission," IEEE/ACM Transactions on Networking, Vol. 6, No. 4 (Aug. 1998), pp. 349--361.

\[Nygren 2010\] E. Nygren, R. K. Sitaraman, J. Sun, "The Akamai Network: A Platform for High-Performance Internet Applications," ACM SIGOPS Operating Systems Review, Vol. 44, No. 3 (Aug. 2010), pp. 2--19.

\[ONF 2016\] Open Networking Foundation, Technical Library, https://www.opennetworking.org/sdn-resources/technical-library

\[ONOS 2016\] Open Network Operating System (ONOS), "Architecture Guide," https://wiki.onosproject.org/display/ONOS/Architecture+Guide, 2016.

\[OpenFlow 2009\] Open Networking Foundation, "OpenFlow Switch Specification 1.0.0, TS-001," https://www.opennetworking.org/images/stories/downloads/sdn-resources/onf-specifications/openflow/openflow-spec-v1.0.0.pdf

\[OpenDaylight Lithium 2016\] OpenDaylight, "Lithium," https://www.opendaylight.org/lithium

\[OSI 2012\] International Organization for Standardization homepage, http://www.iso.org/iso/en/ISOOnline.frontpage

\[Osterweil 2012\] E. Osterweil, D. McPherson, S. DiBenedetto, C. Papadopoulos, D. Massey, "Behavior of DNS Top Talkers," Passive and Active Measurement Conference, 2012.

\[Padhye 2000\] J. Padhye, V. Firoiu, D. Towsley, J. Kurose, "Modeling TCP Reno Performance: A Simple Model and Its Empirical Validation," IEEE/ACM Transactions on Networking, Vol. 8, No. 2 (Apr. 2000), pp. 133--145.

\[Padhye 2001\] J. Padhye, S. Floyd, "On Inferring TCP Behavior," Proc. 2001 ACM SIGCOMM (San Diego, CA, Aug. 2001).

\[Palat 2009\] S. Palat, P. Godin, "The LTE Network Architecture: A Comprehensive Tutorial," in LTE---The UMTS Long Term Evolution: From Theory to Practice. Also available as a standalone Alcatel white paper.

\[Panda 2013\] A. Panda, C. Scott, A. Ghodsi, T. Koponen, S. Shenker, "CAP for Networks," Proc. ACM HotSDN '13, pp. 91--96.

\[Parekh 1993\] A. Parekh, R. Gallager, "A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks: The Single-Node Case," IEEE/ACM Transactions on Networking, Vol. 1, No. 3 (June 1993), pp. 344--357.

\[Partridge 1992\] C. Partridge, S. Pink, "An Implementation of the Revised Internet Stream Protocol (ST-2)," Journal of Internetworking: Research and Experience, Vol. 3, No. 1 (Mar. 1992).

\[Partridge 1998\] C. Partridge et al., "A Fifty Gigabit per Second IP Router," IEEE/ACM Transactions on Networking, Vol. 6, No. 3 (June 1998), pp. 237--248.

\[Pathak 2010\] A. Pathak, Y. A. Wang, C. Huang, A. Greenberg, Y. C. Hu, J. Li, K. W.
Ross, "Measuring and Evaluating TCP Splitting for Cloud Services," Passive and Active Measurement (PAM) Conference (Zurich, 2010).

\[Perkins 1994\] A. Perkins, "Networking with Bob Metcalfe," The Red Herring Magazine (Nov. 1994).

\[Perkins 1998\] C. Perkins, O. Hodson, V. Hardman, "A Survey of Packet Loss Recovery Techniques for Streaming Audio," IEEE Network Magazine (Sept./Oct. 1998), pp. 40--47.

\[Perkins 1998b\] C. Perkins, Mobile IP: Design Principles and Practice, Addison-Wesley, Reading, MA, 1998.

\[Perkins 2000\] C. Perkins, Ad Hoc Networking, Addison-Wesley, Reading, MA, 2000.

\[Perlman 1999\] R. Perlman, Interconnections: Bridges, Routers, Switches, and Internetworking Protocols, 2nd ed., Addison-Wesley Professional Computing Series, Reading, MA, 1999.

\[PGPI 2016\] The International PGP homepage, http://www.pgpi.org

\[Phifer 2000\] L. Phifer, "The Trouble with NAT," The Internet Protocol Journal, Vol. 3, No. 4 (Dec. 2000), http://www.cisco.com/warp/public/759/ipj_3-4/ipj\_3-4_nat.html

\[Piatek 2007\] M. Piatek, T. Isdal, T. Anderson, A. Krishnamurthy, A. Venkataramani, "Do Incentives Build Robustness in BitTorrent?" Proc. NSDI (2007).

\[Piatek 2008\] M. Piatek, T. Isdal, A. Krishnamurthy, T. Anderson, "One Hop Reputations for Peer-to-Peer File Sharing Workloads," Proc. NSDI (2008).

\[Pickholtz 1982\] R. Pickholtz, D. Schilling, L. Milstein, "Theory of Spread Spectrum Communication---A Tutorial," IEEE Transactions on Communications, Vol. 30, No. 5 (May 1982), pp. 855--884.

\[PingPlotter 2016\] PingPlotter homepage, http://www.pingplotter.com

\[Piscitello 1993\] D. Piscitello, A. Lyman Chapin, Open Systems Networking, Addison-Wesley, Reading, MA, 1993.

\[Pomeranz 2010\] H.
Pomeranz, "Practical, Visual, Three-Dimensional Pedagogy for Internet Protocol Packet Header Control Fields," https://righteousit.wordpress.com/2010/06/27/practical-visual-three-dimensional-pedagogy-for-internet-protocol-packet-header-control-fields/, June 2010.

\[Potaroo 2016\] "Growth of the BGP Table--1994 to Present," http://bgp.potaroo.net/

\[PPLive 2012\] PPLive homepage, http://www.pplive.com

\[Qazi 2013\] Z. Qazi, C. Tu, L. Chiang, R. Miao, V. Sekar, M. Yu, "SIMPLE-fying Middlebox Policy Enforcement Using SDN," ACM SIGCOMM Conference (Aug. 2013), pp. 27--38.

\[Quagga 2012\] Quagga, "Quagga Routing Suite," http://www.quagga.net/

\[Quittner 1998\] J. Quittner, M. Slatalla, Speeding the Net: The Inside Story of Netscape and How It Challenged Microsoft, Atlantic Monthly Press, 1998.

\[Quova 2016\] www.quova.com

\[Ramakrishnan 1990\] K. K. Ramakrishnan, R. Jain, "A Binary Feedback Scheme for Congestion Avoidance in Computer Networks," ACM Transactions on Computer Systems, Vol. 8, No. 2 (May 1990), pp. 158--181.

\[Raman 1999\] S. Raman, S. McCanne, "A Model, Analysis, and Protocol Framework for Soft State-based Communication," Proc. 1999 ACM SIGCOMM (Boston, MA, Aug. 1999).

\[Raman 2007\] B. Raman, K. Chebrolu, "Experiences in Using WiFi for Rural Internet in India," IEEE Communications Magazine, Special Issue on New Directions in Networking Technologies in Emerging Economies (Jan. 2007).

\[Ramaswami 2010\] R. Ramaswami, K. Sivarajan, G. Sasaki, Optical Networks: A Practical Perspective, Morgan Kaufmann Publishers, 2010.

\[Ramjee 1994\] R. Ramjee, J. Kurose, D. Towsley, H. Schulzrinne, "Adaptive Playout Mechanisms for Packetized Audio Applications in Wide-Area Networks," Proc. 1994 IEEE INFOCOM.

\[Rao 2011\] A. S. Rao, Y. S. Lim, C. Barakat, A. Legout, D. Towsley, W. Dabbous, "Network Characteristics of Video Streaming Traffic," Proc. 2011 ACM CoNEXT (Tokyo).

\[Ren 2006\] S. Ren, L. Guo, X.
Zhang, "ASAP: An AS-Aware Peer-Relay -Protocol for High Quality VoIP," Proc. 2006 IEEE ICDCS (Lisboa, -Portugal, July 2006). - -\[Rescorla 2001\] E. Rescorla, SSL and TLS: Designing and Building -Secure Systems, Addison-Wesley, Boston, 2001. - -\[RFC 001\] S. Crocker, "Host Software," RFC 001 (the very first RFC!). - -\[RFC 768\] J. Postel, "User Datagram Protocol," RFC 768, Aug. 1980. - -\[RFC 791\] J. Postel, "Internet Protocol: DARPA Internet Program -Protocol Specification," RFC 791, Sept. 1981. - -\[RFC 792\] J. Postel, "Internet Control Message Protocol," RFC 792, -Sept. 1981. - -\[RFC 793\] J. Postel, "Transmission Control Protocol," RFC 793, -Sept. 1981. - -\[RFC 801\] J. Postel, "NCP/TCP Transition Plan," RFC 801, Nov. 1981. - -\[RFC 826\] D. C. Plummer, "An Ethernet Address Resolution -Protocol---or--- Converting Network Protocol Addresses to 48-bit -Ethernet Address for Transmission on Ethernet Hardware," RFC 826, -Nov. 1982. - -\[RFC 829\] V. Cerf, "Packet Satellite Technology Reference Sources," -RFC 829, Nov. 1982. - -\[RFC 854\] J. Postel, J. Reynolds, "TELNET Protocol Specification," RFC -854, May 1993. - -\[RFC 950\] J. Mogul, J. Postel, "Internet Standard Subnetting -Procedure," RFC 950, Aug. 1985. - -\[RFC 959\] J. Postel and J. Reynolds, "File Transfer Protocol (FTP)," -RFC 959, Oct. 1985. - -\[RFC 1034\] P. V. Mockapetris, "Domain Names---Concepts and -Facilities," RFC 1034, Nov. 1987. - -\[RFC 1035\] P. Mockapetris, "Domain Names---Implementation and -Specification," RFC 1035, Nov. 1987. - -\[RFC 1058\] C. L. Hendrick, "Routing Information Protocol," RFC 1058, -June 1988. - -\[RFC 1071\] R. Braden, D. Borman, and C. Partridge, "Computing the -Internet Checksum," RFC 1071, Sept. 1988. - -\[RFC 1122\] R. Braden, "Requirements for Internet Hosts---Communication -Layers," RFC 1122, Oct. 1989. - -\[RFC 1123\] R. Braden, ed., "Requirements for Internet -Hosts---Application and Support," RFC-1123, Oct. 1989. - -\[RFC 1142\] D. 
Oran, "OSI IS-IS Intra-Domain Routing Protocol," RFC -1142, Feb. 1990. - -\[RFC 1190\] C. Topolcic, "Experimental Internet Stream Protocol: -Version 2 (ST-II)," RFC 1190, Oct. 1990. - -\[RFC 1256\] S. Deering, "ICMP Router Discovery Messages," RFC 1256, -Sept. 1991. - -\[RFC 1320\] R. Rivest, "The MD4 Message-Digest Algorithm," RFC 1320, -Apr. 1992. - -\[RFC 1321\] R. Rivest, "The MD5 Message-Digest Algorithm," RFC 1321, -Apr. 1992. - -\[RFC 1323\] V. Jacobson, S. Braden, D. Borman, "TCP Extensions for High -Performance," RFC 1323, May 1992. - -\[RFC 1422\] S. Kent, "Privacy Enhancement for Internet Electronic Mail: -Part II: Certificate-Based Key Management," RFC 1422. - -\[RFC 1546\] C. Partridge, T. Mendez, W. Milliken, "Host Anycasting -Service," RFC 1546, 1993. - -\[RFC 1584\] J. Moy, "Multicast Extensions to OSPF," RFC 1584, -Mar. 1994. - -\[RFC 1633\] R. Braden, D. Clark, S. Shenker, "Integrated Services in -the Internet Architecture: an Overview," RFC 1633, June 1994. - -\[RFC 1636\] R. Braden, D. Clark, S. Crocker, C. Huitema, "Report of IAB -Workshop on Security in the Internet Architecture," RFC 1636, Nov. 1994. - -\[RFC 1700\] J. Reynolds, J. Postel, "Assigned Numbers," RFC 1700, -Oct. 1994. - -\[RFC 1752\] S. Bradner, A. Mankin, "The Recommendations for the IP Next -Generation Protocol," RFC 1752, Jan. 1995. - -\[RFC 1918\] Y. Rekhter, B. Moskowitz, D. Karrenberg, G. J. de Groot, E. -Lear, "Address Allocation for Private Internets," RFC 1918, Feb. 1996. - -\[RFC 1930\] J. Hawkinson, T. Bates, "Guidelines for Creation, -Selection, and Registration of an Autonomous System (AS)," RFC 1930, -Mar. 1996. - -\[RFC 1939\] J. Myers, M. Rose, "Post Office Protocol---Version 3," RFC -1939, May 1996. - -\[RFC 1945\] T. Berners-Lee, R. Fielding, H. Frystyk, "Hypertext -Transfer Protocol---HTTP/1.0," RFC 1945, May 1996. - -\[RFC 2003\] C. Perkins, "IP Encapsulation Within IP," RFC 2003, -Oct. 1996. - -\[RFC 2004\] C. 
Perkins, "Minimal Encapsulation Within IP," RFC 2004, -Oct. 1996. - -\[RFC 2018\] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP Selective -Acknowledgment Options," RFC 2018, Oct. 1996. - -\[RFC 2131\] R. Droms, "Dynamic Host Configuration Protocol," RFC 2131, -Mar. 1997. - -\[RFC 2136\] P. Vixie, S. Thomson, Y. Rekhter, J. Bound, "Dynamic -Updates in the Domain Name System," RFC 2136, Apr. 1997. - -\[RFC 2205\] R. Braden, Ed., L. Zhang, S. Berson, S. Herzog, S. Jamin, -"Resource ReSerVation Protocol (RSVP)---Version 1 Functional -Specification," RFC 2205, Sept. 1997. - -\[RFC 2210\] J. Wroclawski, "The Use of RSVP with IETF Integrated -Services," RFC 2210, Sept. 1997. - -\[RFC 2211\] J. Wroclawski, "Specification of the Controlled-Load -Network Element Service," RFC 2211, Sept. 1997. - -\[RFC 2215\] S. Shenker, J. Wroclawski, "General Characterization -Parameters for Integrated Service Network Elements," RFC 2215, -Sept. 1997. - -\[RFC 2326\] H. Schulzrinne, A. Rao, R. Lanphier, "Real Time Streaming -Protocol (RTSP)," RFC 2326, Apr. 1998. - -\[RFC 2328\] J. Moy, "OSPF Version 2," RFC 2328, Apr. 1998. - -\[RFC 2420\] H. Kummert, "The PPP Triple-DES Encryption Protocol -(3DESE)," RFC 2420, Sept. 1998. - -\[RFC 2453\] G. Malkin, "RIP Version 2," RFC 2453, Nov. 1998. - -\[RFC 2460\] S. Deering, R. Hinden, "Internet Protocol, Version 6 (IPv6) -Specification," RFC 2460, Dec. 1998. - -\[RFC 2475\] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, W. -Weiss, "An Architecture for Differentiated Services," RFC 2475, -Dec. 1998. - -\[RFC 2578\] K. McCloghrie, D. Perkins, J. Schoenwaelder, "Structure of -Management Information Version 2 (SMIv2)," RFC 2578, Apr. 1999. - -\[RFC 2579\] K. McCloghrie, D. Perkins, J. Schoenwaelder, "Textual -Conventions for SMIv2," RFC 2579, Apr. 1999. - -\[RFC 2580\] K. McCloghrie, D. Perkins, J. Schoenwaelder, "Conformance -Statements for SMIv2," RFC 2580, Apr. 1999. - -\[RFC 2597\] J. Heinanen, F. Baker, W. Weiss, J. 
Wroclawski, "Assured -Forwarding PHB Group," RFC 2597, June 1999. - -\[RFC 2616\] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, -P. Leach, T. Berners-Lee, R. Fielding, "Hypertext Transfer -Protocol---HTTP/1.1," RFC 2616, June 1999. - -\[RFC 2663\] P. Srisuresh, M. Holdrege, "IP Network Address Translator -(NAT) Terminology and Considerations," RFC 2663. - -\[RFC 2702\] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus, -"Requirements for Traffic Engineering Over MPLS," RFC 2702, Sept. 1999. - -\[RFC 2827\] P. Ferguson, D. Senie, "Network Ingress Filtering: -Defeating Denial of Service Attacks which Employ IP Source Address -Spoofing," RFC 2827, May 2000. - -\[RFC 2865\] C. Rigney, S. Willens, A. Rubens, W. Simpson, "Remote -Authentication Dial In User Service (RADIUS)," RFC 2865, June 2000. - -\[RFC 3007\] B. Wellington, "Secure Domain Name System (DNS) Dynamic -Update," RFC 3007, Nov. 2000. - -\[RFC 3022\] P. Srisuresh, K. Egevang, "Traditional IP Network Address -Translator (Traditional NAT)," RFC 3022, Jan. 2001. - -\[RFC 3022\] P. Srisuresh, K. Egevang, "Traditional IP Network Address -Translator (Traditional NAT)," RFC 3022, Jan. 2001. - -\[RFC 3031\] E. Rosen, A. Viswanathan, R. Callon, "Multiprotocol Label -Switching Architecture," RFC 3031, Jan. 2001. - -\[RFC 3032\] E. Rosen, D. Tappan, G. Fedorkow, Y. Rekhter, D. Farinacci, -T. Li, A. Conta, "MPLS Label Stack Encoding," RFC 3032, Jan. 2001. - -\[RFC 3168\] K. Ramakrishnan, S. Floyd, D. Black, "The Addition of -Explicit Congestion Notification (ECN) to IP," RFC 3168, Sept. 2001. - -\[RFC 3209\] D. Awduche, L. Berger, D. Gan, T. Li, V. Srinivasan, G. -Swallow, "RSVP-TE: Extensions to RSVP for LSP Tunnels," RFC 3209, -Dec. 2001. - -\[RFC 3221\] G. Huston, "Commentary on Inter-Domain Routing in the -Internet," RFC 3221, Dec. 2001. - -\[RFC 3232\] J. Reynolds, "Assigned Numbers: RFC 1700 Is Replaced by an -On-line Database," RFC 3232, Jan. 2002. - -\[RFC 3234\] B. Carpenter, S. 
Brim, "Middleboxes: Taxonomy and Issues," -RFC 3234, Feb. 2002. - -\[RFC 3246\] B. Davie, A. Charny, J.C.R. Bennet, K. Benson, J.Y. Le -Boudec, W. Courtney, S. Davari, V. Firoiu, D. Stiliadis, "An Expedited -Forwarding PHB (Per-Hop Behavior)," RFC 3246, Mar. 2002. - -\[RFC 3260\] D. Grossman, "New Terminology and Clarifications for -Diffserv," RFC 3260, Apr. 2002. - -\[RFC 3261\] J. Rosenberg, H. Schulzrinne, G. Carmarillo, A. Johnston, -J. Peterson, R. Sparks, M. Handley, E. Schooler, "SIP: Session -Initiation Protocol," RFC 3261, July 2002. - -\[RFC 3272\] J. Boyle, V. Gill, A. Hannan, D. Cooper, D. Awduche, B. -Christian, W. S. Lai, "Overview and Principles of Internet Traffic -Engineering," RFC 3272, May 2002. - -\[RFC 3286\] L. Ong, J. Yoakum, "An Introduction to the Stream Control -Transmission Protocol (SCTP)," RFC 3286, May 2002. - -\[RFC 3346\] J. Boyle, V. Gill, A. Hannan, D. Cooper, D. Awduche, B. -Christian, W. S. Lai, "Applicability Statement for Traffic Engineering -with MPLS," RFC 3346, Aug. 2002. - -\[RFC 3390\] M. Allman, S. Floyd, C. Partridge, "Increasing TCP's -Initial Window," RFC 3390, Oct. 2002. - -\[RFC 3410\] J. Case, R. Mundy, D. Partain, "Introduction and -Applicability Statements for Internet Standard Management Framework," -RFC 3410, Dec. 2002. - -\[RFC 3414\] U. Blumenthal and B. Wijnen, "User-based Security Model -(USM) for Version 3 of the Simple Network Management Protocol (SNMPv3)," -RFC 3414, Dec. 2002. - -\[RFC 3416\] R. Presuhn, J. Case, K. McCloghrie, M. Rose, S. Waldbusser, -"Version 2 of the Protocol Operations for the Simple Network Management -Protocol (SNMP)," Dec. 2002. - -\[RFC 3439\] R. Bush, D. Meyer, "Some Internet Architectural Guidelines -and Philosophy," RFC 3439, Dec. 2003. - -\[RFC 3447\] J. Jonsson, B. Kaliski, "Public-Key Cryptography Standards -(PKCS) #1: RSA Cryptography Specifications Version 2.1," RFC 3447, -Feb. 2003. - -\[RFC 3468\] L. Andersson, G. 
Swallow, "The Multiprotocol Label -Switching (MPLS) Working Group Decision on MPLS Signaling Protocols," -RFC 3468, Feb. 2003. - -\[RFC 3469\] V. Sharma, Ed., F. Hellstrand, Ed, "Framework for -Multi-Protocol Label Switching (MPLS)-based Recovery," RFC 3469, -Feb. 2003. ftp://ftp.rfc-editor.org/innotes/rfc3469.txt - -\[RFC 3501\] M. Crispin, "Internet Message Access Protocol---Version -4rev1," RFC 3501, Mar. 2003. - -\[RFC 3550\] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: -A Transport Protocol for Real-Time Applications," RFC 3550, July 2003. - -\[RFC 3588\] P. Calhoun, J. Loughney, E. Guttman, G. Zorn, J. Arkko, -"Diameter Base Protocol," RFC 3588, Sept. 2003. - -\[RFC 3649\] S. Floyd, "HighSpeed TCP for Large Congestion Windows," RFC -3649, Dec. 2003. - -\[RFC 3746\] L. Yang, R. Dantu, T. Anderson, R. Gopal, "Forwarding and -Control Element Separation (ForCES) Framework," Internet, RFC 3746, -Apr. 2004. - -\[RFC 3748\] B. Aboba, L. Blunk, J. Vollbrecht, J. Carlson, H. -Levkowetz, Ed., "Extensible Authentication Protocol (EAP)," RFC 3748, -June 2004. - -\[RFC 3782\] S. Floyd, T. Henderson, A. Gurtov, "The NewReno -Modification to TCP's Fast Recovery Algorithm," RFC 3782, Apr. 2004. - -\[RFC 4213\] E. Nordmark, R. Gilligan, "Basic Transition Mechanisms for -IPv6 Hosts and Routers," RFC 4213, Oct. 2005. - -\[RFC 4271\] Y. Rekhter, T. Li, S. Hares, Ed., "A Border Gateway -Protocol 4 (BGP-4)," RFC 4271, Jan. 2006. - -\[RFC 4272\] S. Murphy, "BGP Security Vulnerabilities Analysis," RFC -4274, Jan. 2006. - -\[RFC 4291\] R. Hinden, S. Deering, "IP Version 6 Addressing -Architecture," RFC 4291, Feb. 2006. - -\[RFC 4340\] E. Kohler, M. Handley, S. Floyd, "Datagram Congestion -Control Protocol (DCCP)," RFC 4340, Mar. 2006. - -\[RFC 4443\] A. Conta, S. Deering, M. Gupta, Ed., "Internet Control -Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) -Specification," RFC 4443, Mar. 2006. - -\[RFC 4346\] T. Dierks, E. 
Rescorla, "The Transport Layer Security (TLS) -Protocol Version 1.1," RFC 4346, Apr. 2006. - -\[RFC 4514\] K. Zeilenga, Ed., "Lightweight Directory Access Protocol -(LDAP): String Representation of Distinguished Names," RFC 4514, June -2006. - -\[RFC 4601\] B. Fenner, M. Handley, H. Holbrook, I. Kouvelas, "Protocol -Independent Multicast---Sparse Mode (PIM-SM): Protocol Specification -(Revised)," RFC 4601, Aug. 2006. - -\[RFC 4632\] V. Fuller, T. Li, "Classless Inter-domain Routing (CIDR): -The Internet Address Assignment and Aggregation Plan," RFC 4632, -Aug. 2006. - -\[RFC 4960\] R. Stewart, ed., "Stream Control Transmission Protocol," -RFC 4960, Sept. 2007. - -\[RFC 4987\] W. Eddy, "TCP SYN Flooding Attacks and Common Mitigations," -RFC 4987, Aug. 2007. - -\[RFC 5000\] RFC editor, "Internet Official Protocol Standards," RFC -5000, May 2008. - -\[RFC 5109\] A. Li (ed.), "RTP Payload Format for Generic Forward Error -Correction," RFC 5109, Dec. 2007. - -\[RFC 5216\] D. Simon, B. Aboba, R. Hurst, "The EAP-TLS Authentication -Protocol," RFC 5216, Mar. 2008. - -\[RFC 5218\] D. Thaler, B. Aboba, "What Makes for a Successful -Protocol?," RFC 5218, July 2008. - -\[RFC 5321\] J. Klensin, "Simple Mail Transfer Protocol," RFC 5321, -Oct. 2008. - -\[RFC 5322\] P. Resnick, Ed., "Internet Message Format," RFC 5322, -Oct. 2008. - -\[RFC 5348\] S. Floyd, M. Handley, J. Padhye, J. Widmer, "TCP Friendly -Rate Control (TFRC): Protocol Specification," RFC 5348, Sept. 2008. - -\[RFC 5389\] J. Rosenberg, R. Mahy, P. Matthews, D. Wing, "Session -Traversal Utilities for NAT (STUN)," RFC 5389, Oct. 2008. - -\[RFC 5411\] J Rosenberg, "A Hitchhiker's Guide to the Session -Initiation Protocol (SIP)," RFC 5411, Feb. 2009. - -\[RFC 5681\] M. Allman, V. Paxson, E. Blanton, "TCP Congestion Control," -RFC 5681, Sept. 2009. - -\[RFC 5944\] C. Perkins, Ed., "IP Mobility Support for IPv4, Revised," -RFC 5944, Nov. 2010. - -\[RFC 6265\] A Barth, "HTTP State Management Mechanism," RFC 6265, -Apr. 
2011.

\[RFC 6298\] V. Paxson, M. Allman, J. Chu, M. Sargent, "Computing TCP's Retransmission Timer," RFC 6298, June 2011.

\[RFC 7020\] R. Housley, J. Curran, G. Huston, D. Conrad, "The Internet Numbers Registry System," RFC 7020, Aug. 2013.

\[RFC 7094\] D. McPherson, D. Oran, D. Thaler, E. Osterweil, "Architectural Considerations of IP Anycast," RFC 7094, Jan. 2014.

\[RFC 7323\] D. Borman, R. Braden, V. Jacobson, R. Scheffenegger (ed.), "TCP Extensions for High Performance," RFC 7323, Sept. 2014.

\[RFC 7540\] M. Belshe, R. Peon, M. Thomson (Eds.), "Hypertext Transfer Protocol Version 2 (HTTP/2)," RFC 7540, May 2015.

\[Richter 2015\] P. Richter, M. Allman, R. Bush, V. Paxson, "A Primer on IPv4 Scarcity," ACM SIGCOMM Computer Communication Review, Vol. 45, No. 2 (Apr. 2015), pp. 21--32.

\[Roberts 1967\] L. Roberts, T. Merrill, "Toward a Cooperative Network of Time-Shared Computers," AFIPS Fall Conference (Oct. 1966).

\[Rodriguez 2010\] R. Rodrigues, P. Druschel, "Peer-to-Peer Systems," Communications of the ACM, Vol. 53, No. 10 (Oct. 2010), pp. 72--82.

\[Rohde 2008\] Rohde & Schwarz, "UMTS Long Term Evolution (LTE) Technology Introduction," Application Note 1MA111.

\[Rom 1990\] R. Rom, M. Sidi, Multiple Access Protocols: Performance and Analysis, Springer-Verlag, New York, 1990.

\[Root Servers 2016\] Root Servers home page, http://www.root-servers.org/

\[RSA 1978\] R. Rivest, A. Shamir, L. Adleman, "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems," Communications of the ACM, Vol. 21, No. 2 (Feb. 1978), pp. 120--126.

\[RSA Fast 2012\] RSA Laboratories, "How Fast Is RSA?" http://www.rsa.com/rsalabs/node.asp?id=2215

\[RSA Key 2012\] RSA Laboratories, "How Large a Key Should Be Used in the RSA Crypto System?" http://www.rsa.com/rsalabs/node.asp?id=2218

\[Rubenstein 1998\] D. Rubenstein, J. Kurose, D.
Towsley, "Real-Time -Reliable Multicast Using Proactive Forward Error Correction," -Proceedings of NOSSDAV '98 (Cambridge, UK, July 1998). - -\[Ruiz-Sanchez 2001\] M. Ruiz-Sánchez, E. Biersack, W. Dabbous, "Survey -and Taxonomy of IP Address Lookup Algorithms," IEEE Network Magazine, -Vol. 15, No. 2 (Mar./Apr. 2001), pp. 8--23. - -\[Saltzer 1984\] J. Saltzer, D. Reed, D. Clark, "End-to-End Arguments in -System Design," ACM Transactions on Computer Systems (TOCS), Vol. 2, -No. 4 (Nov. 1984). - -\[Sandvine 2015\] "Global Internet Phenomena Report, Spring 2011," -http://www.sandvine.com/news/globalbroadbandtrends.asp, 2011. - -\[Sardar 2006\] B. Sardar, D. Saha, "A Survey of TCP Enhancements for -Last-Hop Wireless Networks," IEEE Commun. Surveys and Tutorials, Vol. 8, -No. 3 (2006), pp. 20--34. - -\[Saroiu 2002\] S. Saroiu, P. K. Gummadi, S. D. Gribble, "A Measurement -Study of Peer-to-Peer File Sharing Systems," Proc. of Multimedia -Computing and Networking (MMCN) (2002). - -\[Sauter 2014\] M. Sauter, From GSM to LTE-Advanced, John Wiley and -Sons, 2014. - -\[Savage 2015\] D. Savage, J. Ng, S. Moore, D. Slice, P. Paluch, R. -White, "Enhanced Interior Gateway Routing Protocol," Internet Draft, -draft-savage-eigrp-04.txt, Aug. 2015. - -\[Saydam 1996\] T. Saydam, T. Magedanz, "From Networks and Network -Management into Service and Service Management," Journal of Networks and -System Management, Vol. 4, No. 4 (Dec. 1996), pp. 345--348. - -\[Schiller 2003\] J. Schiller, Mobile Communications 2nd edition, -Addison Wesley, 2003. - -\[Schneier 1995\] B. Schneier, Applied Cryptography: Protocols, -Algorithms, and Source Code in C, John Wiley and Sons, 1995. - -\[Schulzrinne-RTP 2012\] Henning Schulzrinne's RTP site, -http://www.cs.columbia .edu/\~hgs/rtp - -\[Schulzrinne-SIP 2016\] Henning Schulzrinne's SIP site, -http://www.cs.columbia.edu/\~hgs/sip - -\[Schwartz 1977\] M. 
Schwartz, Computer-Communication Network Design and Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1977.

\[Schwartz 1980\] M. Schwartz, Information, Transmission, Modulation, and Noise, McGraw Hill, New York, NY, 1980.

\[Schwartz 1982\] M. Schwartz, "Performance Analysis of the SNA Virtual Route Pacing Control," IEEE Transactions on Communications, Vol. 30, No. 1 (Jan. 1982), pp. 172--184.

\[Scourias 2012\] J. Scourias, "Overview of the Global System for Mobile Communications: GSM," http://www.privateline.com/PCS/GSM0.html

\[SDNHub 2016\] SDNHub, "App Development Tutorials," http://sdnhub.org/tutorials/

\[Segaller 1998\] S. Segaller, Nerds 2.0.1: A Brief History of the Internet, TV Books, New York, 1998.

\[Sekar 2011\] V. Sekar, S. Ratnasamy, M. Reiter, N. Egi, G. Shi, "The Middlebox Manifesto: Enabling Innovation in Middlebox Deployment," Proc. 10th ACM Workshop on Hot Topics in Networks (HotNets), Article 21, 6 pages.

\[Serpanos 2011\] D. Serpanos, T. Wolf, Architecture of Network Systems, Morgan Kaufmann Publishers, 2011.

\[Shacham 1990\] N. Shacham, P. McKenney, "Packet Recovery in High-Speed Networks Using Coding and Buffer Management," Proc. 1990 IEEE INFOCOM (San Francisco, CA, Apr. 1990), pp. 124--131.

\[Shaikh 2001\] A. Shaikh, R. Tewari, M. Agrawal, "On the Effectiveness of DNS-based Server Selection," Proc. 2001 IEEE INFOCOM.

\[Singh 1999\] S. Singh, The Code Book: The Evolution of Secrecy from Mary, Queen of Scots to Quantum Cryptography, Doubleday Press, 1999.

\[Singh 2015\] A. Singh, J. Ong, A. Agarwal, G. Anderson, A. Armistead, R. Bannon, S. Boving, G. Desai, B. Felderman, P. Germano, A. Kanagala, J. Provost, J. Simmons, E. Tanda, J. Wanderer, U. Hölzle, S. Stuart, A. Vahdat, "Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network," Proc. 2015 ACM SIGCOMM.

\[SIP Software 2016\] H.
Schulzrinne Software Package site, http://www.cs.columbia.edu/IRT/software

\[Skoudis 2004\] E. Skoudis, L. Zeltser, Malware: Fighting Malicious Code, Prentice Hall, 2004.

\[Skoudis 2006\] E. Skoudis, T. Liston, Counter Hack Reloaded: A Step-by-Step Guide to Computer Attacks and Effective Defenses (2nd Edition), Prentice Hall, 2006.

\[Smith 2009\] J. Smith, "Fighting Physics: A Tough Battle," Communications of the ACM, Vol. 52, No. 7 (July 2009), pp. 60--65.

\[Snort 2012\] Sourcefire Inc., Snort homepage, http://www.snort.org/

\[Solensky 1996\] F. Solensky, "IPv4 Address Lifetime Expectations," in IPng: Internet Protocol Next Generation (S. Bradner, A. Mankin, ed.), Addison-Wesley, Reading, MA, 1996.

\[Spragins 1991\] J. D. Spragins, Telecommunications Protocols and Design, Addison-Wesley, Reading, MA, 1991.

\[Srikant 2004\] R. Srikant, The Mathematics of Internet Congestion Control, Birkhauser, 2004.

\[Steinder 2002\] M. Steinder, A. Sethi, "Increasing Robustness of Fault Localization Through Analysis of Lost, Spurious, and Positive Symptoms," Proc. 2002 IEEE INFOCOM.

\[Stevens 1990\] W. R. Stevens, Unix Network Programming, Prentice-Hall, Englewood Cliffs, NJ, 1990.

\[Stevens 1994\] W. R. Stevens, TCP/IP Illustrated, Vol. 1: The Protocols, Addison-Wesley, Reading, MA, 1994.

\[Stevens 1997\] W. R. Stevens, Unix Network Programming, Volume 1: Networking APIs-Sockets and XTI, 2nd edition, Prentice-Hall, Englewood Cliffs, NJ, 1997.

\[Stewart 1999\] J. Stewart, BGP4: Interdomain Routing in the Internet, Addison-Wesley, 1999.

\[Stone 1998\] J. Stone, M. Greenwald, C. Partridge, J. Hughes, "Performance of Checksums and CRC's Over Real Data," IEEE/ACM Transactions on Networking, Vol. 6, No. 5 (Oct. 1998), pp. 529--543.

\[Stone 2000\] J. Stone, C. Partridge, "When Reality and the Checksum Disagree," Proc. 2000 ACM SIGCOMM (Stockholm, Sweden, Aug. 2000).

\[Strayer 1992\] W. T. Strayer, B. Dempsey, A.
Weaver, XTP: The Xpress Transfer Protocol, Addison-Wesley, Reading, MA, 1992.

\[Stubblefield 2002\] A. Stubblefield, J. Ioannidis, A. Rubin, "Using the Fluhrer, Mantin, and Shamir Attack to Break WEP," Proceedings of 2002 Network and Distributed Systems Security Symposium (2002), pp. 17--22.

\[Subramanian 2000\] M. Subramanian, Network Management: Principles and Practice, Addison-Wesley, Reading, MA, 2000.

\[Subramanian 2002\] L. Subramanian, S. Agarwal, J. Rexford, R. Katz, "Characterizing the Internet Hierarchy from Multiple Vantage Points," Proc. 2002 IEEE INFOCOM.

\[Sundaresan 2006\] K. Sundaresan, K. Papagiannaki, "The Need for Cross-layer Information in Access Point Selection," Proc. 2006 ACM Internet Measurement Conference (Rio De Janeiro, Oct. 2006).

\[Suh 2006\] K. Suh, D. R. Figueiredo, J. Kurose, D. Towsley, "Characterizing and Detecting Relayed Traffic: A Case Study Using Skype," Proc. 2006 IEEE INFOCOM (Barcelona, Spain, Apr. 2006).

\[Sunshine 1978\] C. Sunshine, Y. Dalal, "Connection Management in Transport Protocols," Computer Networks, North-Holland, Amsterdam, 1978.

\[Tariq 2008\] M. Tariq, A. Zeitoun, V. Valancius, N. Feamster, M. Ammar, "Answering What-If Deployment and Configuration Questions with WISE," Proc. 2008 ACM SIGCOMM (Aug. 2008).

\[TechnOnLine 2012\] TechOnLine, "Protected Wireless Networks," online webcast tutorial, http://www.techonline.com/community/tech_topic/internet/21752

\[Teixeira 2006\] R. Teixeira, J. Rexford, "Managing Routing Disruptions in Internet Service Provider Networks," IEEE Communications Magazine (Mar. 2006).

\[Think 2012\] Technical History of Network Protocols, "Cyclades," http://www.cs.utexas.edu/users/chris/think/Cyclades/index.shtml

\[Tian 2012\] Y. Tian, R. Dey, Y. Liu, K. W. Ross, "China's Internet: Topology Mapping and Geolocating," IEEE INFOCOM Mini-Conference 2012 (Orlando, FL, 2012).

\[TLD list 2016\] TLD list maintained by Wikipedia, https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains

\[Tobagi 1990\] F. Tobagi, "Fast Packet Switch Architectures for Broadband Integrated Networks," Proceedings of the IEEE, Vol. 78, No. 1 (Jan. 1990), pp. 133--167.

\[TOR 2016\] Tor: Anonymity Online, http://www.torproject.org

\[Torres 2011\] R. Torres, A. Finamore, J. R. Kim, M. M. Munafo, S. Rao, "Dissecting Video Server Selection Strategies in the YouTube CDN," Proc. 2011 Int. Conf. on Distributed Computing Systems.

\[Tourrilhes 2014\] J. Tourrilhes, P. Sharma, S. Banerjee, J. Petit, "SDN and OpenFlow Evolution: A Standards Perspective," IEEE Computer Magazine, Nov. 2014, pp. 22--29.

\[Turner 1988\] J. S. Turner, "Design of a Broadcast Packet Switching Network," IEEE Transactions on Communications, Vol. 36, No. 6 (June 1988), pp. 734--743.

\[Turner 2012\] B. Turner, "2G, 3G, 4G Wireless Tutorial," http://blogs.nmscommunications.com/communications/2008/10/2g-3g-4g-wireless-tutorial.html

\[UPnP Forum 2016\] UPnP Forum homepage, http://www.upnp.org/

\[van der Berg 2008\] R. van der Berg, "How the 'Net Works: An Introduction to Peering and Transit," http://arstechnica.com/guides/other/peering-and-transit.ars

\[van der Merwe 1998\] J. van der Merwe, S. Rooney, I. Leslie, S. Crosby, "The Tempest: A Practical Framework for Network Programmability," IEEE Network, Vol. 12, No. 3 (May 1998), pp. 20--28.

\[Varghese 1997\] G. Varghese, A. Lauck, "Hashed and Hierarchical Timing Wheels: Efficient Data Structures for Implementing a Timer Facility," IEEE/ACM Transactions on Networking, Vol. 5, No. 6 (Dec. 1997), pp. 824--834.

\[Vasudevan 2012\] S. Vasudevan, C. Diot, J. Kurose, D. Towsley, "Facilitating Access Point Selection in IEEE 802.11 Wireless Networks," Proc. 2005 ACM Internet Measurement Conference (San Francisco, CA, Oct. 2005).

\[Villamizar 1994\] C. Villamizar, C. Song,
"High Performance TCP in ANSNET," ACM SIGCOMM Computer Communications Review, Vol. 24, No. 5 (1994), pp. 45--60.

\[Viterbi 1995\] A. Viterbi, CDMA: Principles of Spread Spectrum Communication, Addison-Wesley, Reading, MA, 1995.

\[Vixie 2009\] P. Vixie, "What DNS Is Not," Communications of the ACM, Vol. 52, No. 12 (Dec. 2009), pp. 43--47.

\[Wakeman 1992\] I. Wakeman, J. Crowcroft, Z. Wang, D. Sirovica, "Layering Considered Harmful," IEEE Network (Jan. 1992), pp. 20--24.

\[Waldrop 2007\] M. Waldrop, "Data Center in a Box," Scientific American (July 2007).

\[Wang 2004\] B. Wang, J. Kurose, P. Shenoy, D. Towsley, "Multimedia Streaming via TCP: An Analytic Performance Study," Proc. 2004 ACM Multimedia Conference (New York, NY, Oct. 2004).

\[Wang 2008\] B. Wang, J. Kurose, P. Shenoy, D. Towsley, "Multimedia Streaming via TCP: An Analytic Performance Study," ACM Transactions on Multimedia Computing Communications and Applications (TOMCCAP), Vol. 4, No. 2, Article 16 (Apr. 2008), pp. 1--22.

\[Wang 2010\] G. Wang, D. G. Andersen, M. Kaminsky, K. Papagiannaki, T. S. E. Ng, M. Kozuch, M. Ryan, "c-Through: Part-time Optics in Data Centers," Proc. 2010 ACM SIGCOMM.

\[Wei 2006\] W. Wei, C. Zhang, H. Zang, J. Kurose, D. Towsley, "Inference and Evaluation of Split-Connection Approaches in Cellular Data Networks," Proc. Active and Passive Measurement Workshop (Adelaide, Australia, Mar. 2006).

\[Wei 2007\] D. X. Wei, C. Jin, S. H. Low, S. Hegde, "FAST TCP: Motivation, Architecture, Algorithms, Performance," IEEE/ACM Transactions on Networking (2007).

\[Weiser 1991\] M. Weiser, "The Computer for the Twenty-First Century," Scientific American (Sept. 1991), pp. 94--104. http://www.ubiq.com/hypertext/weiser/SciAmDraft3.html

\[White 2011\] A. White, K. Snow, A. Matthews, F. Monrose, "Hookt on fon-iks: Phonotactic Reconstruction of Encrypted VoIP Conversations," IEEE Symposium on Security and Privacy, Oakland, CA, 2011.

\[Wigle.net 2016\] Wireless Geographic Logging Engine, http://www.wigle.net

\[Wiki Satellite 2016\] Satellite Internet access, https://en.wikipedia.org/wiki/Satellite_Internet_access

\[Wireshark 2016\] Wireshark homepage, http://www.wireshark.org

\[Wischik 2005\] D. Wischik, N. McKeown, "Part I: Buffer Sizes for Core Routers," ACM SIGCOMM Computer Communications Review, Vol. 35, No. 3 (July 2005).

\[Woo 1994\] T. Woo, R. Bindignavle, S. Su, S. Lam, "SNP: An Interface for Secure Network Programming," Proc. 1994 Summer USENIX (Boston, MA, June 1994), pp. 45--58.

\[Wright 2015\] J. Wright, J. Cache, Hacking Exposed Wireless: Wireless Security Secrets & Solutions, 3rd edition, McGraw-Hill Education, 2015.

\[Wu 2005\] J. Wu, Z. M. Mao, J. Rexford, J. Wang, "Finding a Needle in a Haystack: Pinpointing Significant BGP Routing Changes in an IP Network," Proc. USENIX NSDI (2005).

\[Xanadu 2012\] Xanadu Project homepage, http://www.xanadu.com/

\[Xiao 2000\] X. Xiao, A. Hannan, B. Bailey, L. Ni, "Traffic Engineering with MPLS in the Internet," IEEE Network (Mar./Apr. 2000).

\[Xu 2004\] L. Xu, K. Harfoush, I. Rhee, "Binary Increase Congestion Control (BIC) for Fast Long-Distance Networks," IEEE INFOCOM 2004, pp. 2514--2524.

\[Yavatkar 1994\] R. Yavatkar, N. Bhagwat, "Improving End-to-End Performance of TCP over Mobile Internetworks," Proc. Mobile 94 Workshop on Mobile Computing Systems and Applications (Dec. 1994).

\[YouTube 2009\] YouTube 2009, Google container data center tour, 2009.

\[YouTube 2016\] YouTube Statistics, 2016, https://www.youtube.com/yt/press/statistics.html

\[Yu 2004\] F. Yu, R. H. Katz, T. V. Lakshman, "Gigabit Rate Packet Pattern-Matching Using TCAM," Proc. 2004 Int. Conf. Network Protocols, pp. 174--183.

\[Yu 2011\] M. Yu, J. Rexford, X. Sun, S. Rao, N. Feamster, "A Survey of VLAN Usage in Campus Networks," IEEE Communications Magazine, July 2011.

\[Zegura 1997\] E. Zegura, K. Calvert, M.
Donahoo, "A Quantitative Comparison of Graph-based Models for Internet Topology," IEEE/ACM Transactions on Networking, Vol. 5, No. 6 (Dec. 1997). See also http://www.cc.gatech.edu/projects/gtitm for a software package that generates networks with a transit-stub structure.

\[Zhang 1993\] L. Zhang, S. Deering, D. Estrin, S. Shenker, D. Zappala, "RSVP: A New Resource Reservation Protocol," IEEE Network Magazine, Vol. 7, No. 9 (Sept. 1993), pp. 8--18.

\[Zhang 2007\] L. Zhang, "A Retrospective View of NAT," The IETF Journal, Vol. 3, Issue 2 (Oct. 2007).

\[Zhang 2015\] G. Zhang, W. Liu, X. Hei, W. Cheng, "Unreeling Xunlei Kankan: Understanding Hybrid CDN-P2P Video-on-Demand Streaming," IEEE Transactions on Multimedia, Vol. 17, No. 2, Feb. 2015.

\[Zhang X 2012\] X. Zhang, Y. Xu, Y. Liu, Z. Guo, Y. Wang, "Profiling Skype Video Calls: Rate Control and Video Quality," IEEE INFOCOM (Mar. 2012).

\[Zink 2009\] M. Zink, K. Suh, Y. Gu, J. Kurose, "Characteristics of YouTube Network Traffic at a Campus Network---Measurements, Models, and Implications," Computer Networks, Vol. 53, No. 4, pp. 501--514, 2009.

Index
