Diffstat (limited to 'book/Working With Unix Processes.txt')
| -rw-r--r-- | book/Working With Unix Processes.txt | 3264 |
1 files changed, 3264 insertions, 0 deletions
diff --git a/book/Working With Unix Processes.txt b/book/Working With Unix Processes.txt new file mode 100644 index 0000000..29b0b89 --- /dev/null +++ b/book/Working With Unix Processes.txt @@ -0,0 +1,3264 @@ + + + + + + + + +Working with Unix Processes + +Copyright © 2013 Jesse Storimer. All rights reserved. + +This is a one-man operation, please respect the time and effort that went into +this book. If you came by a free copy and find it useful, you can compensate me +at http://workingwithunixprocesses.com. + +Acknowledgements + +A big thank you to a few awesome folks who read early drafts of the book, +helped me understand how to market this thing, gave me a push when I needed it, +and were all-around extremely helpful: Sam Storry, Jesse Kaunisviita, and Marc- +André Cournoyer. + +I have to express my immense gratitude towards my wife and daughter for not +only supporting the erratic schedule that made this book possible, but also +always being there to provide a second opinion. Without your love and support I +couldn't have done this. You make it all worthwhile. + + +Contents + +Introduction +Primer +Why_Care? +Harness_the_Power! +Overview +System_Calls +Nomenclature,_wtf(2) +Processes:_The_Atoms_of_Unix +Processes_Have_IDs +Cross_Referencing +In_the_Real_World +System_Calls +Processes_Have_Parents +Cross_Referencing +In_the_Real_World +System_Calls +Processes_Have_File_Descriptors +Everything_is_a_File +Descriptors_Represent_Resources +Standard_Streams +In_the_Real_World +System_Calls +Processes_Have_Resource_Limits +Finding_the_Limits +Soft_Limits_vs._Hard_Limits +Bumping_the_Soft_Limit +Exceeding_the_Limit +Other_Resources +In_the_Real_World +System_Calls +Processes_Have_an_Environment +It's_a_hash,_right? +In_the_Real_World +System_Calls +Processes_Have_Arguments +It's_an_Array! +In_the_Real_World +Processes_Have_Names +Naming_Processes +In_the_Real_World +Processes_Have_Exit_Codes +How_to_Exit_a_Process +exit +exit! 
+abort +raise +Processes_Can_Fork +Use_the_fork(2),_Luke +Multicore_Programming? +Using_a_Block +In_the_Real_World +System_Calls +Orphaned_Processes +Out_of_Control +Abandoned_Children +Managing_Orphans +Processes_Are_Friendly +Being_CoW_Friendly +Processes_Can_Wait +Babysitting +Process.wait_and_Cousins +Communicating_with_Process.wait2 +Waiting_for_Specific_Children +Race_Conditions +In_the_Real_World +System_Calls +Zombie_Processes +Good_Things_Come_to_Those_Who_wait(2) +What_Do_Zombies_Look_Like? +In_The_Real_World +System_Calls +Processes_Can_Get_Signals +Trapping_SIGCHLD +SIGCHLD_and_Concurrency +Signals_Primer +Where_do_Signals_Come_From? +The_Big_Picture +Redefining_Signals +Ignoring_Signals +Signal_Handlers_are_Global +Being_Nice_about_Redefining_Signals +When_Can't_You_Receive_Signals? +In_the_Real_World +System_Calls +Processes_Can_Communicate +Our_First_Pipe +Pipes_Are_One-Way_Only +Sharing_Pipes +Streams_vs._Messages +Remote_IPC? +In_the_Real_World +System_Calls +Daemon_Processes +The_First_Process +Creating_Your_First_Daemon_Process +Diving_into_Rack +Daemonizing_a_Process,_Step_by_Step +Process_Groups_and_Session_Groups +In_the_Real_World +System_Calls +Spawning_Terminal_Processes +fork_+_exec +File_descriptors_and_exec +Arguments_to_exec +Kernel#system +Kernel#` +Process.spawn +IO.popen +open3 +In_the_Real_World +System_Calls +Ending +Abstraction +Communication +Farewell,_But_Not_Goodbye +Appendix:_How_Resque_Manages_Processes +The_Architecture +Forking_for_Memory_Management +Why_Bother? +Doesn't_the_GC_clean_up_for_us? +Appendix:_How_Unicorn_Reaps_Worker_Processes +Reaping_What? 
+Conclusion +Appendix:_Preforking_Servers +Efficient_use_of_memory +Many_Mongrels +Many_Unicorn +Efficient_load_balancing +Efficient_sysadminning +Basic_Example_of_a_Preforking_Server +Appendix:_Spyglass +Spyglass'_Architecture +Booting_Spyglass +Before_a_Request_Arrives +Connection_is_Made +Things_Get_Quiet +Getting_Started + +Updates + + +* December 20, 2011 - First public version +* December 21, 2011 - Typos +* December 23, 2011 - Explanation for Process.setsid +* December 27, 2011 - Section on SIGCHLD and concurrency +* December 27, 2011 - Note about redefining 'default' signal handlers +* December 28, 2011 - Typos +* December 31, 2011 - Clarification around exiting with Kernel.raise; Section + on using fork with a block; More typos; Note about getsid(2) +* January 13, 2012 - Improved code highlighting. Improved e-reader formatting. +* February 1, 2012 - New cover art. +* February 7, 2012 - New chapters: zombie processes, environment variables, + preforking servers, the spyglass project. Clarifications on CoW-friendliness + in MRI. Added sections for IO.popen and Open3. +* February 13, 2012 - Clarifications on Process::WNOHANG and file descriptor + relations. +* March 13, 2012 - Include TXT format. +* March 29, 2012 - Formatting and errata. +* April 20, 2012 - New chapters: ARGV and IPC. +* May 15, 2012 - Clarification about reentrancy in signal handlers. +* June 12, 2012 - New chapter on rlimits; Many formatting/syntax updates. +* Dec 4, 2012 - Fixed ToC target page issue; Fixed lsof reference. +* July 30, 2013 - Updated CoW section to reflect MRI's new GC; Added bit about + FD leaking to fork + exec section. + + +Introduction + +When I was growing up I was sitting in front of a computer every chance I got. +Not because I was programming, but because I was fascinated by what was +possible with this amazing machine. I grew up as a computer user using ICQ, +Winamp, and Napster. +As I got older I spent more time playing video games on the computer. 
At first +I was into first-person shooters and eventually spent most of my time playing +real-time strategy games. And then I discovered that you can play these games +online! Throughout my youth I was a 'computer guy': I knew how to use +computers, but I had no idea how they worked under the hood. +The reason I'm giving you my background is because I want you to know that I +was not a child prodigy. I did not teach myself how to program Basic at age 7. +When I took my first computer programming class I was not teaching the teacher +and correcting his mistakes. +It wasn't until my second year of a University degree that I really came to +love programming as an activity. Some may say that I'm a late bloomer, but I +have a feeling that I'm closer to the norm than you may think. +Although I came to love programming for the sake of programming itself I still +didn't have a good grasp of how the computer was working under the hood. If you +had told me back then that all of my code ran inside of a process I would have +looked at you sideways. +Fortunately for me I was given a great work opportunity at a local web startup. +This gave me a chance to do some programming on a real production system. This +changed everything for me. This gave me a reason to learn how things were +working under the hood. +As I worked on this high-traffic production system I was presented with +increasingly complex problems. As our traffic and resource demands increased we +had to begin looking at our full stack to debug and fix outstanding issues. By +just focusing on the application code we couldn't get the full picture of how +the app was functioning. +We had many layers in front of the application: a firewall, load balancer, +reverse proxy, and http cache. We had layers that worked alongside the +application: job queue, database server, and stats collector. 
Every application +will have a different set of components that comprise it, and this book won't +teach you everything there is to know about all of it. +This book will teach you all you need to know about Unix processes, and that is +guaranteed to improve your understanding of any component at work in your +application. +Through debugging issues I was forced to dig deep into Ruby projects that made +use of Unix programming concepts. Projects like Resque and Unicorn. These two +projects were my introduction to Unix programming in Ruby. +After getting a deeper understanding of how they were working I was able to +diagnose issues faster and with greater understanding, as well as debug pesky +problems that didn't make sense when looking at the application code by itself. +I even started coming up with new, faster, more efficient solutions to the +problems I was solving that used the techniques I was learning from these +projects. Alright, enough about me. Let's go down the rabbit hole. + +Primer + +This section will provide background on some key concepts used in the book. +It's definitely recommended that you read this before moving on to the meatier +chapters. + +Why Care? + +The Unix programming model has existed, in some form, since 1970. It was then +that Unix was famously invented at Bell Labs, along with the C programming +language. In the decades that have elapsed since then Unix has +stood the test of time as the operating system of choice for reliability, +security, and stability. +Unix programming concepts and techniques are not a fad, they're not the latest +popular programming language or framework. These techniques transcend programming languages. +Whether you're programming in C, C++, Ruby, Python, JavaScript, Haskell, or +[insert your favourite language here] these techniques WILL be useful. +This stuff has existed, largely unchanged, for decades. 
Smart programmers have +been using Unix programming to solve tough problems with a multitude of +programming languages for the last 40 years, and they will continue to do so +for the next 40 years. + +Harness the Power! + +I'll warn you now, the concepts and techniques described in this book can bring +you great power. With this power you can create new software, understand +complex software that is already out there, even use this knowledge to advance +your career to the next level. +Just remember, with great power comes great responsibility. Read on and I'll +tell you everything you need to know to gain the power and avoid the pitfalls. + +Overview + +This book is not meant to be read as a reference manual. It's more of a +walkthrough. To get the most out of it you should read it sequentially, since +each chapter builds on the last. Once you're finished you can use the chapter +headings to find information if you need a refresher. +This book contains many code examples. I highly recommend that you follow along +with them by actually running them yourself in a Ruby interpreter. Playing with +the code yourself and making tweaks will help the concepts sink in that much +more. +Once you've read through the book and played with the examples I'm sure you'll +be wanting to get your hands on a real world project that's a little more in +depth. At that point have a look at the included Spyglass project. +Spyglass is a web server that was created specifically for inclusion with this +book. It's designed to teach Unix programming concepts. It takes the concepts +you learn here and shows how a real-world project would put them to use. Have a +look at the last chapter in this book for a deeper introduction. + +System Calls + +To understand system calls first requires a quick explanation of the components +of a Unix system, specifically userland vs. the kernel. +The kernel of your Unix system sits atop the hardware of your computer. 
It's a +middleman for any interactions that need to happen with the hardware. This +includes things like writing/reading from the filesystem, sending data over the +network, allocating memory, or playing audio over the speakers. Given its +power, programs are not allowed direct access to the kernel. Any communication +is done via system calls. +The system call interface connects the kernel to userland. It defines the +interactions that are allowed between your program and the computer hardware. +Userland is where all of your programs run. You can do a lot in your userland +programs without ever making use of a system call: do mathematics, string +operations, control flow with logical statements. But I'd go as far as saying +that if you want your programs to do anything interesting then you'll need to +involve the kernel via system calls. +If you were a C programmer this stuff would probably be second nature to you. +System calls are at the heart of C programming. +But I'm going to expect that you, like me, don't have any C programming +experience. You learned to program in a high level language. When you learned +to write data to the filesystem you weren't told which system calls make that +happen. +The takeaway here is that system calls allow your user-space programs to +interact indirectly with the hardware of your computer, via the kernel. We'll +be looking at common system calls as we go through the chapters. + +Nomenclature, wtf(2) + +One of the roadblocks to learning about Unix programming is where to find the +proper documentation. Want to hear the kicker? It's all available via Unix +manual pages (manpages), and if you're using a Unix based computer right now +it's already on your computer! +If you've never used manpages before you can start by invoking the command man +man from a terminal. +Perfect, right? Well, kind of. The manpages for the system call api are a great +resource in two situations: + + 1. 
you're a C programmer who wants to know how to invoke a given system call, + or + 2. you're trying to figure out the purpose of a given system call + +I'm going to assume we're not C programmers here, so #1 isn't so useful, but #2 +is very useful. +You'll see references throughout this text to things like this: select(2). This +bit of text is telling you where you can find the manpage for a given system +call. You may or may not know this, but there are many sections to the Unix +manpages. +Here's a look at the most commonly used sections of the manpages for FreeBSD +and Linux systems: + +* Section 1: General Commands +* Section 2: System Calls +* Section 3: C Library Functions +* Section 4: Special Files + +So Section 1 is for general commands (a.k.a. shell commands). If I wanted to +refer you to the manual page for the find command I would write it like this: +find(1). This tells you that there is a manual page for find in section 1 of +the manpages. +If I wanted to refer to the manual page for the getpid system call I would +write it like this: getpid(2). This tells you that there is a manual page for +getpid in section 2 of the manpages. +Why do manpages need multiple sections? Because the same name may appear in +more than one section, i.e. as both a shell command and a system call. +Take stat(1) and stat(2) as an example. +To access a specific section of the manpages, pass the section number like this +on the command line: + + $ man 2 getpid + $ man 3 malloc + $ man find # same as man 1 find + +This nomenclature was not invented for this book, it's a convention that's used +everywhere (http://en.wikipedia.org/wiki/Man_page#Usage) when referring to the +manpages. So it's a good idea to learn it now and get comfortable with seeing +it. + +Processes: The Atoms of Unix + +Processes are the building blocks of a Unix system. Why? Because any code that +is executed happens inside a process. 
+For example, when you launch ruby from the command line a new process is +created for your code. When your code is finished that process exits. + + $ ruby -e "p Time.now" + +The same is true for all code running on your system. You know that MySQL +server that's always running? That's running in its own process. The e-reader +software you're using right now? That's running in its own process. The email +client that's desperately trying to tell you you have new messages? You should +ignore it by the way and keep reading! It also runs in its own process. +Things start to get interesting when you realize that one process can spawn and +manage many others. We'll be taking a look at that over the course of this +book. + +Processes Have IDs + +Every process running on your system has a unique process identifier, hereby +referred to as 'pid'. +The pid doesn't say anything about the process itself, it's simply a sequential +numeric label. This is how the kernel sees your process: as a number. +Here's how we can inspect the current pid in a ruby program. Fire up irb and +try this: + + # This line will print the pid of the current ruby process. This might be an + # irb process, a rake process, a rails server, or just a plain ruby script. + puts Process.pid + +A pid is a simple, generic representation of a process. Since it's not tied to +any aspect of the content of the process it can be understood from any +programming language and with simple tools. We'll see below how we can use the +pid to trace the process details using different utilities. + +Cross Referencing + +To get a full picture, we can use ps(1) to cross-reference our pid with what +the kernel is seeing. Leaving your irb session open run the following command +at a terminal: + + $ ps -p <pid-of-irb-process> + +That command should show a process called 'irb' with a pid matching what was +printed in the irb session. + +In the Real World + +Just knowing the pid isn't all that useful in itself. So where is it used? 
+A common place you'll find pids in the real world is in log files. When you +have multiple processes logging to one file it's imperative that you're able to +tell which log line comes from which process. Including the pid in each line +solves that problem. +Including the pid also allows you to cross reference information with the OS, +through the use of commands like top(1) or lsof(8). Here's some sample output +from the Spyglass server booting up. The first square brackets of each line +denote the pid where the log line is coming from. + + [58550] [Spyglass::Server] Listening on port 4545 + [58550] [Spyglass::Lookout] Received incoming connection + [58557] [Spyglass::Master] Loaded the app + [58557] [Spyglass::Master] Spawned 4 workers. Babysitting now... + [58558] [Spyglass::Worker] Received connection + + +System Calls + +Ruby's Process.pid maps to getpid(2). +There is also a global variable that holds the value of the current pid. You +can access it with $$. +Ruby inherits this behaviour from other languages before it (both Perl and bash +support $$), however I avoid it when possible. Typing out Process.pid in full +is much more expressive of your intent than the dollar-dollar variable, and +less likely to confuse those who haven't seen the dollar-dollar before. + +Processes Have Parents + +Every process running on your system has a parent process. Each process knows +its parent process identifier (hereby referred to as 'ppid'). +In the majority of cases the parent process for a given process is the process +that invoked it. For example, you're an OSX user who starts up Terminal.app and +lands in a bash prompt. Since everything is a process that action started a new +Terminal.app process, which in turn started a bash process. +The parent of that new bash process will be the Terminal.app process. If you +then invoke ls(1) from the bash prompt, the parent of that ls process will be +the bash process. You get the picture. 
+Since the kernel deals only in pids there is a way to get the pid of the +current parent process. Here's how it's done in Ruby: + + # Notice that this is only one character different from getting the + # pid of the current process. + puts Process.ppid + + +Cross Referencing + +Leaving your irb session open run the following command at a terminal: + + $ ps -p <ppid-of-irb-process> + +That command should show a process called 'bash' (or 'zsh' or whatever) with a +pid that matches the one that was printed in your irb session. + +In the Real World + +There aren't a ton of uses for the ppid in the real world. It can be important +when detecting daemon processes, something covered in a later chapter. + +System Calls + +Ruby's Process.ppid maps to getppid(2). + +Processes Have File Descriptors + +In much the same way as pids represent running processes, file descriptors +represent open files. + +Everything is a File + +A part of the Unix philosophy: in the land of Unix 'everything is a file'. This +means that devices are treated as files, sockets and pipes are treated as +files, and files are treated as files. +Since all of these things are treated as files I'm going to use the word +'resource' when I'm talking about files in a general sense (including devices, +pipes, sockets, etc.) and I'll use the word 'file' when I mean the classical +definition (a file on the file system). + +Descriptors Represent Resources + +Any time that you open a resource in a running process it is assigned a file +descriptor number. File descriptors are NOT shared between unrelated processes, +they live and die with the process they are bound to, just as any open +resources for a process are closed when it exits. There are special semantics +for file descriptor sharing when you fork a process, more on that later. +In Ruby, open resources are represented by the IO class. Any IO object can have +an associated file descriptor number. Use IO#fileno to get access to it. 
+# ./code/snippets/fileno.rb + + passwd = File.open('/etc/passwd') + puts passwd.fileno + +outputs: + + 3 + +Any resource that your process opens gets a number that is unique within that process. This +is how the kernel keeps track of any resources that your process is using. +What happens when we have multiple resources open? +# ./code/snippets/multiple_filenos.rb + + passwd = File.open('/etc/passwd') + puts passwd.fileno + + hosts = File.open('/etc/hosts') + puts hosts.fileno + + # Close the open passwd file. This frees up its file descriptor + # number to be used by the next opened resource. + passwd.close + + null = File.open('/dev/null') + puts null.fileno + +outputs: + + 3 + 4 + 3 + +There are two key takeaways from this example. + + 1. File descriptor numbers are assigned the lowest unused value. The first + file we opened, passwd, got file descriptor #3, the next open file got #4 + because #3 was already in use. + 2. Once a resource is closed its file descriptor number becomes available + again. Once we closed the passwd file its file descriptor number became + available again. So when we opened the file at /dev/null it was assigned + the lowest unused value, which was then #3. + +It's important to note that file descriptors keep track of open resources only. +Closed resources are not given a file descriptor number. +Stepping back to the kernel's viewpoint again this makes a lot of sense. Once a +resource is closed it no longer needs to interact with the hardware layer so +the kernel can stop keeping track of it. +Given the above, file descriptors are sometimes called 'open file descriptors'. +This is a bit of a misnomer since there is no such thing as a 'closed file +descriptor'. 
In fact, trying to read the file descriptor number from a closed +resource will raise an exception: +# ./code/snippets/closed_fileno.rb + + passwd = File.open('/etc/passwd') + puts passwd.fileno + passwd.close + puts passwd.fileno + +outputs: + + 3 + -e:4:in `fileno': closed stream (IOError) + +You may have noticed that when we open a file and ask for its file descriptor +number the lowest value we get is 3. What happened to 0, 1, and 2? + +Standard Streams + +Every Unix process comes with three open resources. These are your standard +input (STDIN), standard output (STDOUT), and standard error (STDERR) resources. +These standard resources exist for a very important reason that we take for +granted today. STDIN provides a generic way to read input from keyboard devices +or pipes, STDOUT and STDERR provide generic ways to write output to monitors, +files, printers, etc. This was one of the innovations of Unix. +Before STDIN existed your program had to include a keyboard driver for all the +keyboards it wanted to support! And if it wanted to print something to the +screen it had to know how to manipulate the pixels required to do so. So let's +all be thankful for standard streams. +# ./code/snippets/standard_streams.rb + + puts STDIN.fileno + puts STDOUT.fileno + puts STDERR.fileno + +outputs: + + 0 + 1 + 2 + +That's where those first 3 file descriptor numbers went to. + +In the Real World + +File descriptors are at the core of network programming using sockets, pipes, +etc. and are also at the core of any file system operations. +Hence, they are used by every running process and are at the core of most of +the interesting stuff you can do with a computer. You'll see many more examples +of how to use them in the following chapters or in the attached Spyglass +project. + +System Calls + +Many methods on Ruby's IO class map to system calls of the same name. These +include open(2), close(2), read(2), write(2), pipe(2), fsync(2), stat(2), among +others. 
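If you'd like to see that mapping a little more directly, here's a minimal sketch that is not part of the book's snippets. It sticks to the lower-level IO methods (IO.sysopen, IO#syswrite, IO#sysseek, IO#sysread), which skip Ruby's own buffering and so sit closest to the system calls listed above. The use of Tempfile to get a scratch path and the variable names are my own assumptions for illustration.

```ruby
# A sketch only: exercising the IO methods that sit closest to
# open(2), write(2), lseek(2), read(2) and close(2).
require 'tempfile'

# Tempfile is used here just to obtain a scratch path to work with.
scratch = Tempfile.new('fd-demo')
path = scratch.path
scratch.close

# IO.sysopen corresponds to open(2). It returns a bare file
# descriptor number rather than an IO object.
fd = IO.sysopen(path, 'w+')

# Wrap the descriptor in an IO object so we can call methods on it.
io = IO.for_fd(fd, 'w+')

# IO#syswrite corresponds to write(2) and returns the number of
# bytes written.
bytes_written = io.syswrite("hello\n")

# Move back to the start with IO#sysseek (lseek(2)), then read the
# data back with IO#sysread (read(2)).
io.sysseek(0)
contents = io.sysread(bytes_written)

# IO#close corresponds to close(2). The descriptor number is free
# to be reused once this returns.
io.close

puts bytes_written
puts contents
puts io.closed?
```

In everyday code you'd reach for File.open, IO#read and IO#write instead. They end up making the same system calls, they just add buffering and conveniences on top.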
+ +Processes Have Resource Limits + +In the last chapter we looked at the fact that open resources are represented +by file descriptors. You may have noticed that when resources aren't being +closed the file descriptor numbers continue to increase. It begs the question: +how many file descriptors can one process have? +The answer depends on your system configuration, but the important point is +there are some resource limits imposed on a process by the kernel. + +Finding the Limits + +We'll continue on the subject of file descriptors. Using Ruby we can ask +directly for the maximum number of allowed file descriptors: +# ./code/snippets/getrlimit.rb + + p Process.getrlimit(:NOFILE) + +On my machine this snippet outputs: + + [2560, 9223372036854775807] + +We used a method called Process.getrlimit and asked for the maximum number of +open files using the symbol :NOFILE. It returned a two-element Array. +The first element in the Array is the soft limit for the number of file +descriptors, the second element in the Array is the hard limit for the number +of file descriptors. + +Soft Limits vs. Hard Limits + +What's the difference? Glad you asked. The soft limit isn't really a limit. +Meaning that if you exceed the soft limit (in this case by opening more than +2560 resources at once) an exception will be raised, but you can always change +that limit if you want to. +Note that the hard limit on my system for the number of file descriptors is a +ridiculously large integer. Is it even possible to open that many? Likely not, +I'm sure you'd run into hardware constraints before that many resources could +be opened at once. +On my system that number actually represents infinity. It's repeated in the +constant Process::RLIM_INFINITY. Try comparing those two values to be sure. So, +on my system, I can effectively open as many resources as I'd like, once I bump +the soft limit for my needs. +So any process is able to change its own soft limit, but what about the hard +limit? 
Typically that can only be done by a superuser. However, your process is +also able to bump the hard limit assuming it has the required permissions. If +you're interested in changing the limits at a system-wide level then start by +having a look at sysctl(8). + +Bumping the Soft Limit + +Let's go ahead and bump the soft limit for the current process: +# ./code/snippets/setrlimit.rb + + Process.setrlimit(:NOFILE, 4096) + p Process.getrlimit(:NOFILE) + +outputs: + + [4096, 4096] + +You can see that we set a new limit for the number of open files, and upon +asking for that limit again both the hard limit and the soft limit were set to +the new value 4096. +We can optionally pass a third argument to Process.setrlimit specifying a new +hard limit as well, assuming we have the permissions to do so. Note that +lowering the hard limit, as we did in that last snippet, is irreversible: once +it comes down it won't go back up. +The following example is a common way to raise the soft limit of a system +resource to be equal with the hard limit, the maximum allowed value. +# ./code/snippets/soft_to_hard_rlimit.rb + + Process.setrlimit(:NOFILE, Process.getrlimit(:NOFILE)[1]) + + +Exceeding the Limit + +Note that exceeding the soft limit will raise Errno::EMFILE: +# ./code/snippets/exceeding_soft_rlimits.rb + + # Set the maximum number of open files to 3. We know this + # will be maxed out because the standard streams occupy + # the first three file descriptors. + Process.setrlimit(:NOFILE, 3) + + File.open('/dev/null') + +outputs: + + Errno::EMFILE: Too many open files - /dev/null + + +Other Resources + +You can use these same methods to check and modify limits on other system +resources. Some common ones are: +# ./code/snippets/rlimits.rb + + # The maximum number of simultaneous processes + # allowed for the current user. + Process.getrlimit(:NPROC) + + # The largest size file that may be created. 
+ Process.getrlimit(:FSIZE) + + # The maximum size of the stack segment of the + # process. + Process.getrlimit(:STACK) + +Have a look at the documentation +(http://www.ruby-doc.org/core-1.9.3/Process.html#method-c-setrlimit) for Process.setrlimit for a full listing of +the available options. + +In the Real World + +Modifying limits for system resources isn't a common need for most +programs. However, for some specialized tools this can be very important. +One use case is any process needing to handle thousands of simultaneous network +connections. An example of this is the httperf(1) HTTP performance tool. A +command like httperf --hog --server www --num-conn 5000 will ask httperf(1) to +create 5000 concurrent connections. Obviously this will be a problem on my +system due to its default soft limit, so httperf(1) will need to bump its soft +limit before it can properly do its testing. +Another real world use case for limiting system resources is a situation where +you execute third-party code and need to keep it within certain constraints. +You could set limits for the processes running that code and revoke the +permissions required to change them, hence ensuring that they don't use more +resources than you allow for them. + +System Calls + +Ruby's Process.getrlimit and Process.setrlimit map to getrlimit(2) and +setrlimit(2), respectively. + +Processes Have an Environment + +Environment, in this sense, refers to what's known as 'environment variables'. +Environment variables are key-value pairs that hold data for a process. +Every process inherits environment variables from its parent. They are set by a +parent process and inherited by its child processes. Environment variables are +per-process and are global to each process. +Here's a simple example of setting an environment variable in a bash shell, +launching a Ruby process, and reading that environment variable. 
+# ./code/snippets/env_launch.sh + + $ MESSAGE='wing it' ruby -e "puts ENV['MESSAGE']" + +The VAR=value syntax is the bash way of setting environment variables. The same +thing can be accomplished in Ruby using the ENV constant. +# ./code/snippets/env_set.rb + + # The same thing, with places reversed! + ENV['MESSAGE'] = 'wing it' + system "echo $MESSAGE" + +Both of these examples print: + + wing it + +In bash environment variables are accessed using the syntax: $VAR. As you can +tell from these few examples environment variables can be used to share state +between processes running different languages, bash and ruby in this case. + +It's a hash, right? + +Although ENV uses the hash-style accessor API it's not actually a Hash. For +instance, it implements Enumerable and some of the Hash API, but not all of it. +Key methods like merge are not implemented. So you can do things like +ENV.has_key?, but don't count on all hash operations working. +# ./code/snippets/env_aint_a_hash.rb + + puts ENV['EDITOR'] + puts ENV.has_key?('PATH') + puts ENV.is_a?(Hash) + +outputs: + + vim + true + false + + +In the Real World + +In the real world environment variables have many uses. Here's a few that are +common workflows in the Ruby community: + + $ RAILS_ENV=production rails server + $ EDITOR=mate bundle open actionpack + $ QUEUE=default rake resque:work + +Environment variables are often used as a generic way to accept input into a +command-line program. Any terminal (on Unix or Windows) already supports them +and most programmers are familiar with them. Using environment variables is +often less overhead than explicitly parsing command line options. + +System Calls + +There are no system calls for directly manipulating environment variables, but +the C library functions setenv(3) and getenv(3) do the brunt of the work. Also +have a look at environ(7) for an overview. + +Processes Have Arguments + +Every process has access to a special array called ARGV. 
Other programming +languages may implement it slightly differently, but every one has something +called 'argv'. +argv is a short form for 'argument vector'. In other words: a vector, or array, +of arguments. It holds the arguments that were passed in to the current process +on the command line. Here's an example of inspecting ARGV and passing in some +simple options. + + $ cat argv.rb + p ARGV + $ ruby argv.rb foo bar -va + ["foo", "bar", "-va"] + + +It's an Array! + +Unlike the previous chapter, where we learned that ENV isn't a Hash, ARGV is +simply an Array. You can add elements to it, remove elements from it, change +the elements it contains, whatever you like. But if it simply represents the +arguments passed in on the command line why would you need to change anything? +Some libraries will read from ARGV to parse command line options, for example. +You can programmatically change ARGV before they have a chance to see it in +order to modify the options at runtime. + +In the Real World + +The most common use case for ARGV is probably for accepting filenames into a +program. It's very common to write a program that takes one or more filenames +as input on the command line and does something useful with them. +The other common use case, as mentioned, is for parsing command line input. +There are many Ruby libraries for dealing with command line input. One called +optparse is available as part of the standard library. +But now that you know how ARGV works you can skip that extra overhead for +simple command line options and do it by hand. If you just want to support a +few flags you can implement them directly as array operations. + + # did the user request help? + ARGV.include?('--help') + # get the value of the -c option + ARGV.include?('-c') && ARGV[ARGV.index('-c') + 1] + + +Processes Have Names + +Unix processes have very few inherent ways of communicating about their state. +Programmers have worked around this and invented things like logfiles. 
Logfiles +allow processes to communicate anything they want about their state by writing +to the filesystem, but this operates at the level of the filesystem rather than +being inherent to the process itself. +Similarly, processes can use the network to open sockets and communicate with +other processes. But again, that operates at a different level than the process +itself, since it relies on the network. +There are two mechanisms that operate at the level of the process itself that +can be used to communicate information. One is the process name, the other is +exit codes. + +Naming Processes + +Every process on the system has a name. For example, when you start up an irb +session that process is given the name 'irb'. The neat thing about process +names is that they can be changed at runtime and used as a method of +communication. +In Ruby you can access the name of the current process in the $PROGRAM_NAME +variable. Similarly, you can assign a value to that global variable to change +the name of the current process. +# ./code/snippets/program_name.rb + + puts $PROGRAM_NAME + + 10.downto(1) do |num| + $PROGRAM_NAME = "Process: #{num}" + puts $PROGRAM_NAME + end + +outputs: + + irb + Process: 10 + Process: 9 + Process: 8 + Process: 7 + Process: 6 + Process: 5 + Process: 4 + Process: 3 + Process: 2 + Process: 1 + +As a fun exercise you can start an irb session, print the pid, and change the +process name. Then you can use the ps(1) utility to see your changes reflected +on the system. +Unfortunately this global variable (and its mirror $0) is the only mechanism +provided by Ruby for this feature. There is not a more intent-revealing way to +change the name of the current process. + +In the Real World + +To see an example of how this is used in a real project read through How Resque +Manages Processes in the appendices. + +Processes Have Exit Codes + +When a process comes to an end it has one last chance to make its mark on the +world: its exit code. 
Every process that exits does so with a numeric exit code +(0-255) denoting whether it exited successfully or with an error. +Traditionally, a process that exits with an exit code of 0 is said to be +successful. Any other exit code denotes an error, with different codes pointing +to different errors. +Though traditionally they're used to denote different errors, they're really +just a channel for communication. All you need to do is handle the different +exit codes that a process may exit with in a way that suits your program and +you've gotten away from the traditions. +It's usually a good idea to stick with the '0 as success' exit code tradition +so that your programs will play nicely with other Unix tools. + +How to Exit a Process + +There are several ways you can exit a process in Ruby, each for different +purposes. + +exit + +The simplest way to exit a process is using Kernel#exit. This is also what +happens implicitly when your script ends without an explicit exit statement. +# ./code/snippets/exit_0.rb + + # This will exit the program with the success status code (0). + exit + +# ./code/snippets/custom_exit_code.rb + + # You can pass a custom exit code to this method + exit 22 + +# ./code/snippets/at_exit.rb + + # When Kernel#exit is invoked, before exiting Ruby invokes any blocks + # defined by Kernel#at_exit. + at_exit { puts 'Last!' } + exit + +will output: + + Last! + + +exit! + +Kernel#exit! is almost exactly the same as Kernel#exit, but with two key +differences. The first is that it sets an unsuccessful status code by default +(1), and the second is that it will not invoke any blocks defined using +Kernel#at_exit. +# ./code/snippets/exit_bang.rb + + # This will exit the program with a status code 1. + exit! + +# ./code/snippets/custom_exit_bang.rb + + # You can still pass an exit code. + exit! 33 + +# ./code/snippets/exit_bang_skips_at_exit.rb + + # This block will never be invoked. + at_exit { puts 'Silence!' } + exit! 
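One way to see the difference between exit and exit! from the outside is to run tiny scripts in a separate ruby process and look at what comes back. What follows is only a minimal sketch, not one of the book's snippets: run_ruby is a hypothetical helper name, and it assumes that IO.popen and RbConfig.ruby from the standard library are an acceptable way to launch the same interpreter that is running the script.

```ruby
require 'rbconfig'

# Hypothetical helper (not from the book): runs a one-line script in a fresh
# ruby process and returns what it printed to STDOUT plus its exit code.
def run_ruby(script)
  output = IO.popen([RbConfig.ruby, '-e', script], &:read)
  # IO.popen with a block waits for the process and sets $? before returning.
  [output, $?.exitstatus]
end

p run_ruby("exit")                                   # => ["", 0]
p run_ruby("exit 22")                                # => ["", 22]
p run_ruby("at_exit { puts 'Last!' }; exit")         # => ["Last!\n", 0]
p run_ruby("exit!")                                  # => ["", 1]
p run_ruby("at_exit { puts 'Silence!' }; exit! 33")  # => ["", 33]
```

The at_exit block prints in the exit case and stays silent in the exit! case, and the default exit code flips from 0 to 1.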
+ + +abort + +Kernel#abort provides a generic way to exit a process unsuccessfully. +Kernel#abort will set the exit code to 1 for the current process. +# ./code/snippets/abort.rb + + # Will exit with exit code 1. + abort + +# ./code/snippets/abort_message.rb + + # You can pass a message to Kernel#abort. This message will be printed + # to STDERR before the process exits. + abort "Something went horribly wrong." + +# ./code/snippets/abort_with_at_exit.rb + + # Kernel#at_exit blocks are invoked when using Kernel#abort. + at_exit { puts 'Last!' } + abort "Something went horribly wrong." + +will output: + + Something went horribly wrong. + Last! + + +raise + +A different way to end a process is with an unhandled exception. This is +something that you never want to happen in a production environment, but it's +almost always happening in development and test environments. +Note that Kernel#raise, unlike the previous methods, will not exit the process +immediately. It simply raises an exception that may be rescued somewhere up the +stack. If the exception is not rescued anywhere in the codebase then the +unhandled exception will cause the process to exit. +Ending a process this way will still invoke any at_exit handlers and will print +the exception message and backtrace to STDERR. +# ./code/snippets/raise_exit.rb + + # Similar to abort, an unhandled exception will set the exit code to 1. + raise 'hell' + + +Processes Can Fork + + +Use the fork(2), Luke + +Forking is one of the most powerful concepts in Unix programming. The fork(2) +system call allows a running process to create a new process programmatically. +This new process is an exact copy of the original process. +Up until now we've talked about creating processes by launching them from the +terminal. We've also mentioned low level operating system processes that create +other processes: fork(2) is how they do it.
+When forking, the process that initiates the fork(2) is called the "parent", +and the newly created process is called the "child". +The child process inherits a copy of all of the memory in use by the parent +process, as well as any open file descriptors belonging to the parent process. +Let's take a moment to review child processes from the eye of our first three +chapters. +Since the child process is an entirely new process, it gets its own unique pid. +The parent of the child process is, obviously, its parent process. So its ppid +is set to the pid of the process that initiated the fork(2). +The child process inherits any open file descriptors from the parent at the +time of the fork(2). It's given the same map of file descriptor numbers that +the parent process has. In this way the two processes can share open files, +sockets, etc. +The child process inherits a copy of everything that the parent process has in +main memory. In this way a process could load up a large codebase, say a Rails +app, that occupies 500MB of main memory. Then this process can fork 2 new child +processes. Each of these child processes would effectively have their own copy +of that codebase loaded in memory. +The call to fork returns near-instantly so we now have 3 processes with each +using 500MB of memory. Perfect for when you want to have multiple instances of +your application loaded in memory at the same time. Because only one process +needs to load the app and forking is fast, this method is faster than loading +the app 3 times in separate instances. +The child processes would be free to modify their copy of the memory without +affecting what the parent process has in memory. See the next chapter for a +discussion of copy-on-write and how it affects memory when forking. 
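The pid, ppid and file descriptor claims above can be checked from Ruby. This is a minimal sketch under a few assumptions: fork_and_report is a hypothetical helper name, and it jumps ahead a little by using the block form of fork and Process.wait, both of which are covered later in the book. The pipe stands in for "any open file descriptor": it's opened in the parent before the fork, and the child writes to it.

```ruby
# Hypothetical helper (not from the book): forks one child and returns
# what the child observed about itself.
def fork_and_report
  # The pipe is opened before the fork, so the child inherits both ends.
  reader, writer = IO.pipe

  child_pid = fork do
    reader.close
    # The child writes through a file descriptor inherited from the parent.
    writer.puts "#{Process.pid},#{Process.ppid}"
    writer.close
    # exit! skips any at_exit handlers the child inherited from the parent.
    exit!(0)
  end

  # The parent closes its copy of the write end so that the read below
  # sees end-of-file once the child has closed its copy too.
  writer.close
  pid_in_child, ppid_in_child = reader.read.split(',').map(&:to_i)
  reader.close
  Process.wait(child_pid)

  { fork_returned: child_pid,
    pid_in_child: pid_in_child,
    ppid_in_child: ppid_in_child }
end

report = fork_and_report
puts "fork returned #{report[:fork_returned]} in the parent"
puts "the child saw its own pid as #{report[:pid_in_child]}"
puts "the child saw its ppid as #{report[:ppid_in_child]} (parent is #{Process.pid})"
```

The first two numbers printed should match each other, and the child's ppid should match the pid of the process running the script.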
+Let's get started with forking in Ruby by looking at a mind-bending example: +# ./code/snippets/if_fork.rb + + if fork + puts "entered the if block" + else + puts "entered the else block" + end + +outputs: + + entered the if block + entered the else block + +WTF! What's going on here? A call to the fork method has taken the +once-familiar if construct and turned it on its head. Somehow this piece of code is +entering both the if and else blocks of the if construct! +It's no mystery what's happening here. One call to the fork method actually +returns twice. Remember that fork creates a new process. So it returns once in +the calling process (parent) and once in the newly created process (child). +The last example becomes more obvious if we print the pids. +# ./code/snippets/if_fork_pid.rb + + puts "parent process pid is #{Process.pid}" + + if fork + puts "entered the if block from #{Process.pid}" + else + puts "entered the else block from #{Process.pid}" + end + +outputs: + + parent process pid is 21268 + entered the if block from 21268 + entered the else block from 21282 + +Now it becomes clear that the code in the if block is being executed by the +parent process, while the code in the else block is being executed by the child +process. The child process will exit after executing its code in the else +block, while the parent process will carry on. +Again, there's a rhythm to this beat, and it has to do with the return value of +the fork method. In the child process fork returns nil. Since nil is falsy it +executes the code in the else block. +In the parent process fork returns the pid of the newly created child process. +Since an integer is truthy it executes the code in the if block. +This concept is illustrated nicely by simply printing the return value of a +fork call. We use p rather than puts here because puts prints nil as an empty +line on Ruby 1.9 and later. + + p fork + +outputs: + + 21423 + nil + +Here we have the two different return values. The first value returned is the +pid of the newly created child process; this comes from the parent.
The second +return value is the nil from the child process. + +Multicore Programming? + +In a roundabout way, yes. By making new processes it means that your code is +able, but not guaranteed, to be distributed across multiple CPU cores. +Given a system with 4 CPUs, if you fork 4 new processes then those can be +handled each by a separate CPU, giving you multicore concurrency. +However, there's no guarantee that stuff will be happening in parallel. On a +busy system it's possible that all 4 of your processes are handled by the same +CPU. +fork(2) creates a new process that's a copy of the old process. So if a process +is using 500MB of main memory, then it forks, now you have 1GB in main memory. +Do this another ten times and you can quickly exhaust main memory. This is +often called a fork bomb. Before you turn up the concurrency make sure that you +know the consequences. + +Using a Block + +In the example above we've demonstrated fork with an if/else construct. It's +also possible, and more common in Ruby code, to use fork with a block. +When you pass a block to the fork method that block will be executed in the new +child process, while the parent process simply skips over it. The child process +exits when it's done executing the block. It does not continue along the same +code path as the parent. + + fork do + # Code here is only executed in the child process + end + + # Code here is only executed in the parent process. + + +In the Real World + +Have a look at either of the appendices, or the attached Spyglass project, to +see some real-world examples of using fork(2). + +System Calls + +Ruby's Kernel#fork maps to fork(2). + +Orphaned Processes + + +Out of Control + +You may have noticed when running the examples in the last chapter that when +child processes are involved, it's no longer possible to control everything +from a terminal like we're used to. 
+When starting a process via a terminal, we normally have only one process +writing to STDOUT, taking keyboard input, or listening for that Ctrl-C telling +it to exit. +But once that process has forked child processes that all becomes a little more +difficult. When you press Ctrl-C which process should exit? All of them? Only +the parent? +It's good to know about this stuff because it's actually very easy to create +orphaned processes: +# ./code/snippets/orphan_process.rb + + fork do + 5.times do + sleep 1 + puts "I'm an orphan!" + end + end + + abort "Parent process died..." + +If you run this program from a terminal you'll notice that since the parent +process dies immediately the terminal returns you to the command prompt. At +which point, it's overwritten by the STDOUT from the child process! Strange +things can start to happen when forking processes. + +Abandoned Children + +What happens to a child process when its parent dies? +The short answer is, nothing. That is to say, the operating system doesn't +treat child processes any differently than any other processes. So, when the +parent process dies the child process continues on; the parent process does not +take the child down with it. + +Managing Orphans + +Can you still manage orphaned processes? +We're getting a bit ahead of ourselves with this question, but it touches on +two interesting concepts. +The first is something called daemon processes. Daemon processes are long +running processes that are intentionally orphaned and meant to stay running +forever. These are covered in detail in a later chapter. +The second interesting bit here is communicating with processes that are not +attached to a terminal session. You can do this using something called Unix +signals. This is also covered in more detail in a later chapter. +We'll soon talk about how to properly manage and control child processes. 
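To watch an orphan outlive its parent without cluttering your terminal, here's a minimal sketch. It is not one of the book's snippets and it rests on some assumptions: orphan_report is a hypothetical helper name, it uses Process.wait (covered in a later chapter) to be sure the short-lived parent is really gone, and it relies on a fact the text above doesn't go into, namely that an orphan gets re-parented to another process (traditionally init, pid 1). The sketch only assumes that the ppid changes, not what it changes to.

```ruby
# Hypothetical helper (not from the book): creates a short-lived parent that
# forks a child and exits right away, then returns what the orphan reported.
def orphan_report
  reader, writer = IO.pipe

  parent_pid = fork do
    reader.close
    doomed_parent = Process.pid

    fork do
      # Poll for up to 5 seconds until the parent is gone. Once it is,
      # Process.ppid no longer returns the pid of the dead parent.
      50.times do
        break if Process.ppid != doomed_parent
        sleep 0.1
      end
      writer.puts "#{Process.pid},#{doomed_parent},#{Process.ppid}"
      writer.close
      exit!(0)
    end

    # The parent exits immediately without waiting on its child.
    exit!(0)
  end

  writer.close
  Process.wait(parent_pid)
  # This read returns only after the orphan has written and closed its end.
  orphan_pid, old_ppid, new_ppid = reader.read.split(',').map(&:to_i)
  reader.close

  { parent_pid: parent_pid, orphan_pid: orphan_pid,
    old_ppid: old_ppid, new_ppid: new_ppid }
end

report = orphan_report
puts "parent #{report[:parent_pid]} has exited"
puts "orphan #{report[:orphan_pid]} kept running, its ppid went " \
     "from #{report[:old_ppid]} to #{report[:new_ppid]}"
```

The orphan's message arrives after its parent has already exited, which is the point: nothing took the child down with it.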
+ +Processes Are Friendly + +Let's take a step back from looking at code for a minute to talk about a higher +level concept and how it's handled in different Ruby implementations. + +Being CoW Friendly + +As mentioned in the forking chapter, fork(2) creates a new child process that's +an exact copy of the parent process. This includes a copy of everything the +parent process has in memory. +Physically copying all of that data can be considerable overhead, so modern +Unix systems employ something called copy-on-write semantics (CoW) to combat +this. +As you may have guessed from the name, CoW delays the actual copying of memory +until it needs to be written. +So a parent process and a child process will actually share the same physical +data in memory until one of them needs to modify it, at which point the memory +will be copied so that proper separation between the two processes can be +preserved. +# ./code/snippets/cow_no_writes.rb + + arr = [1,2,3] + + fork do + # At this point the child process has been initialized. + # Using CoW this process doesn't need to copy the arr variable, + # since it hasn't modified any shared values it can continue reading + # from the same memory location as the parent process. + p arr + end + +# ./code/snippets/cow.rb + + arr = [1,2,3] + + fork do + # At this point the child process has been initialized. + # Because of CoW the arr variable hasn't been copied yet. + arr << 4 + # The above line of code modifies the array, so a copy of + # the array will need to be made for this process before + # it can modify it. The array in the parent process remains + # unchanged. + end + +This is a big win when using fork(2) as it saves on resources. It means that +fork(2) is fast since it doesn't need to copy any of the physical memory of the +parent. It also means that child processes only get a copy of the data they +need, the rest can be shared. 
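The two snippets above can be combined into something you can observe. A minimal sketch, with the usual caveats: arrays_after_fork is a hypothetical helper name, and the pipe and Process.wait are only there to get the child's view back to the parent. Copy-on-write itself happens inside the kernel and isn't visible from Ruby, so all this shows is the separation that CoW preserves, not the sharing that happens before the write.

```ruby
# Hypothetical helper (not from the book): returns the array as seen by the
# child after it appends to it, and as seen by the parent afterwards.
def arrays_after_fork
  arr = [1, 2, 3]
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    # This write only affects the child's copy of the array.
    arr << 4
    writer.puts arr.join(',')
    writer.close
    exit!(0)
  end

  writer.close
  child_view = reader.read.split(',').map(&:to_i)
  reader.close
  Process.wait(pid)

  { child: child_view, parent: arr }
end

views = arrays_after_fork
puts "child saw  #{views[:child].inspect}"   # child saw  [1, 2, 3, 4]
puts "parent has #{views[:parent].inspect}"  # parent has [1, 2, 3]
```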
+In order for you to have CoW semantics, a Ruby implementation needs to be +written in such a way that it doesn't clobber this feature provided by the +kernel. Versions of MRI >= 2.0 are written in such a way that they respect +and preserve these semantics. Versions of MRI <= 1.9 did not preserve the +semantics. +But how? +MRI's garbage collector uses a 'mark-and-sweep' algorithm. In a nutshell this +means that when the GC is invoked it must traverse the graph of live objects, +and for each one the GC must 'mark' it as alive. +In MRI <= 1.9, this 'mark' step was implemented as a modification to that +object in memory. So when the GC was invoked right after a fork, all live +objects were modified, forcing the OS to make copies of all live Ruby objects +and foregoing any benefit from CoW semantics. +MRI >= 2.0 still uses a mark-and-sweep GC, but preserves CoW semantics by +storing all of the 'marks' in a small data structure in a disparate region of +memory. So when the GC runs after a fork, this small region of memory must be +copied, but the graph of live Ruby objects can be shared between parent and +child until your code modifies an object. +What does this mean for you? +If you're building something, or using tools, that depend heavily on fork(2), +you should expect much better memory utilization with MRI 2.0 than with earlier +versions. + +Processes Can Wait + +In the examples of fork(2) up until now we have let the parent process continue +on in parallel with the child process. In some cases this led to weird results, +such as when the parent process exited before the child process. +That kind of scenario is really only suitable for one use case, fire and +forget. It's useful when you want a child process to handle something +asynchronously, but the parent process still has its own work to do. 
+# ./code/snippets/bg_process.rb + + message = 'Good Morning' + recipient = 'tree@mybackyard.com' + + fork do + # In this contrived example the parent process forks a child to take + # care of sending data to the stats collector. Meanwhile the parent + # process has continued on with its work of sending the actual payload. + + # The parent process doesn't want to be slowed down with this task, and + # it doesn't matter if this would fail for some reason. + StatsCollector.record message, recipient + end + + # send message to recipient + + +Babysitting + +For most other use cases involving fork(2) you'll want some way to keep tabs on +your child processes. In Ruby, one technique for this is provided by +Process.wait. Let's rewrite our orphan-inducing example from the Orphaned +Processes chapter to perform with fewer surprises. +# ./code/snippets/babysitting_processes.rb + + fork do + 5.times do + sleep 1 + puts "I am an orphan!" + end + end + + Process.wait + abort "Parent process died..." + +This time the output will look like: + + I am an orphan! + I am an orphan! + I am an orphan! + I am an orphan! + I am an orphan! + Parent process died... + +Not only that, but control will not be returned to the terminal until all of +the output has been printed. +So what does Process.wait do? Process.wait is a blocking call instructing the +parent process to wait for one of its child processes to exit before +continuing. + +Process.wait and Cousins + +I mentioned something key in that last statement: Process.wait blocks until any +one of its child processes exits. If you have a parent that's babysitting more +than one child process and you're using Process.wait, you need to know which +one exited. For this, you can use the return value. +Process.wait returns the pid of the child that exited. Check it out. +# ./code/snippets/wait_for_each_process.rb + + # We create 3 child processes. + 3.times do + fork do + # Each one sleeps for a random number of seconds, less than 5.
+ sleep rand(5) + end + end + + 3.times do + # We wait for each child process to exit and print the pid that + # gets returned. + puts Process.wait + end + + +Communicating with Process.wait2 + +But wait! Process.wait has a cousin called Process.wait2! +Why the name confusion? It makes sense once you know that Process.wait returns +1 value (pid), but Process.wait2 returns 2 values (pid, status). +This status can be used as communication between processes via exit codes. In +our chapter on Exit Codes we mentioned that you can use exit codes to encode +information for other processes. Process.wait2 gives you direct access to that +information. +The status returned from Process.wait2 is an instance of Process::Status. It +has a lot of useful information attached to it for figuring out exactly how a +process exited. +# ./code/snippets/wait2.rb + + # We create 5 child processes. + 5.times do + fork do + # Each generates a random number. If even they exit + # with a 111 exit code, otherwise they use a 112 exit code. + if rand(5).even? + exit 111 + else + exit 112 + end + end + end + + 5.times do + # We wait for each of the child processes to exit. + pid, status = Process.wait2 + + # If the child process exited with the 111 exit code + # then we know they encountered an even number. + if status.exitstatus == 111 + puts "#{pid} encountered an even number!" + else + puts "#{pid} encountered an odd number!" + end + end + +Communication between processes without the filesystem or network! + +Waiting for Specific Children + +But wait! The Process.wait cousins have two more cousins. Process.waitpid and +Process.waitpid2. +You can probably guess what these do. They function the same as Process.wait +and Process.wait2 except, rather than waiting for any child to exit they only +wait for a specific child to exit, specified by pid. +# ./code/snippets/waitpid2.rb + + favourite = fork do + exit 77 + end + + middle_child = fork do + abort "I want to be waited on!" 
+ end + + pid, status = Process.waitpid2 favourite + puts status.exitstatus + +Although it appears that Process.wait and Process.waitpid provide different +behaviour, don't be fooled! They are actually aliased to the same thing. Both +will accept the same arguments and behave the same. +You can pass a pid to Process.wait in order to get it to wait for a specific +child, and you can pass -1 as the pid to Process.waitpid to get it to wait for +any child process. +The same is true for Process.wait2 and Process.waitpid2. +Just like with Process.pid vs. $$ I think it's important that, as programmers, +we use the provided tools to reveal our intent where possible. Although these +methods are identical you should use Process.wait when you're waiting for any +child process and use Process.waitpid when you're waiting for a specific +process. + +Race Conditions + +As you look at these simple code examples you may start to wonder about race +conditions. +What if the code that handles one exited process is still running when another +child process exits? What if I haven't gotten back around to Process.wait and +another process exits? Let's see: +# ./code/snippets/process_wait_queue.rb + + # We create two child processes. + 2.times do + fork do + # Both processes exit immediately. + abort "Finished!" + end + end + + # The parent process waits for the first process, then sleeps for 5 seconds. + # In the meantime the second child process has exited and is no + # longer running. + puts Process.wait + sleep 5 + + # The parent process asks to wait once again, and amazingly enough, the + # second process' exit information has been queued up and is returned here. + puts Process.wait + +As you can see this technique is free from race conditions. The kernel queues +up information about exited processes until the parent asks for it.
+So even if the parent is slow at processing each exited child it will always be +able to get the information for each exited child when it's ready for it. +Take note that calling any variant of Process.wait when there are no child +processes will raise Errno::ECHILD. It's always a good idea to keep track of +how many child processes you have created so you don't encounter this +exception. + +In the Real World + +The idea of looking in on your child processes is at the core of a common Unix +programming pattern. The pattern is sometimes called babysitting processes, +master/worker, or preforking. +At the core of this pattern is the concept that you have one process that forks +several child processes, for concurrency, and then spends its time looking +after them: making sure they are still responsive, reacting if any of them +exit, etc. +For example, the Unicorn web server %{http://unicorn.bogomips.org} employs this +pattern. You tell it how many worker processes you want it to start up for you, +5 for instance. +Then a unicorn process will boot up that will fork 5 child processes to handle +web requests. The parent (or master) process maintains a heartbeat with each +child and ensures that all of the child processes stay responsive. +This pattern allows for both concurrency and reliability. Read more about +Unicorn in its Appendix at the end of the book. +For an alternative usage of this technique read through the Lookout class in +the attached Spyglass project. + +System Calls + +Ruby's Process.wait and cousins map to waitpid(2). + +Zombie Processes + +At the beginning of the last chapter we looked at an example that used a child +process to asynchronously handle a task in a fire and forget manner. We need to +revisit that example and ensure that we clean up that child process +appropriately, lest it become a zombie! 
+ +Good Things Come to Those Who wait(2) + +In the last chapter I showed that the kernel queues up status information about +child processes that have exited. So even if you call Process.wait long after +the child process has exited its status information is still available. I'm +sure you can smell a problem here... +The kernel will retain the status of exited child processes until the parent +process requests that status using Process.wait. If the parent never requests +the status then the kernel can never reap that status information. So creating +fire and forget child processes without collecting their status information is +a poor use of kernel resources. +If you're not going to wait for a child process to exit using Process.wait (or +the technique described in the next chapter) then you need to 'detach' that +child process. Here's the fire and forget example from last chapter rectified +to properly detach the child process: +# ./code/snippets/zombie_process.rb + + message = 'Good Morning' + recipient = 'tree@mybackyard.com' + + pid = fork do + # In this contrived example the parent process forks a child to take + # care of sending data to the stats collector. Meanwhile the parent + # process has continued on with its work of sending the actual payload. + + # The parent process doesn't want to be slowed down with this task, and + # it doesn't matter if this would fail for some reason. + StatsCollector.record message, recipient + end + + # This line ensures that the process performing the stats collection + # won't become a zombie. + Process.detach(pid) + +What does Process.detach do? It simply spawns a new thread whose sole job is to +wait for the child process specified by pid to exit. This ensures that the +kernel doesn't hang on to any status information we don't need. + +What Do Zombies Look Like? + +# ./code/snippets/zombies_eg.rb + + # Create a child process that exits after 1 second. + pid = fork { sleep 1 } + # Print its pid. 
+ puts pid + # Put the parent process to sleep so we can inspect the + # process status of the child + sleep 5 + +Running the following command at a terminal, using the pid printed from the +last snippet, will print the status of that zombie process. The status should +say 'z' or 'Z+', meaning that the process is a zombie. + + ps -ho pid,state -p [pid of zombie process] + + +In The Real World + +Notice that any dead process whose status hasn't been waited on is a zombie +process. So every child process that dies while its parent is still active will +be a zombie, if only for a short time. Once the parent process collects the +status from the zombie then it effectively disappears, no longer consuming +kernel resources. +It's fairly uncommon to fork child processes in a fire and forget manner, never +collecting their status. If work needs to be offloaded in the background it's +much more common to do that with a dedicated background queueing system. +That being said there is a Rubygem called spawnling %{https://github.com/tra/ +spawnling} that provides this exact functionality. Besides providing a generic +API over processes or threads, it ensures that fire and forget processes are +properly detached. + +System Calls + +There's no system call for Process.detach because it's implemented in Ruby +simply as a thread and Process.wait. The implementation in Rubinius %{https:// +github.com/rubinius/rubinius/blob/c6e8e33b37601d4a082ddcbbd60a568767074771/ +kernel/common/process.rb#L377-395} is stark in its simplicity. + +Processes Can Get Signals + +In the last chapter we looked at Process.wait. It provides a nice way for a +parent process to keep tabs on its child processes. However it is a blocking +call: it will not return until a child process dies. +What's a busy parent to do? Not every parent has the luxury of waiting around +on their children all day. There is a solution for the busy parent! And it's +our introduction to Unix signals. 
+ +Trapping SIGCHLD + +Let's take a simple example from an earlier chapter and rewrite it for a busy +parent process. +# ./code/snippets/signals_chld_naive.rb + + child_processes = 3 + dead_processes = 0 + # We fork 3 child processes. + child_processes.times do + fork do + # They sleep for 3 seconds. + sleep 3 + end + end + + # Our parent process will be busy doing some intense mathematics. + # But still wants to know when one of its children exits. + + # By trapping the :CHLD signal our process will be notified by the kernel + # when one of its children exits. + trap(:CHLD) do + # Since Process.wait queues up any data that it has for us we can ask + # for it here, since we know that one of our child processes has exited. + + puts Process.wait + dead_processes += 1 + # We exit explicitly once all the child processes are accounted for. + exit if dead_processes == child_processes + end + + # Work it. + loop do + (Math.sqrt(rand(44)) ** 8).floor + sleep 1 + end + + +SIGCHLD and Concurrency + +Before we go on I must mention a caveat. Signal delivery is unreliable. By this +I mean that if your code is handling a CHLD signal while another child process +dies you may or may not receive a second CHLD signal. +This can lead to inconsistent results with the code snippet above. Sometimes +the timing will be such that things will work out perfectly, and sometimes +you'll actually 'miss' an instance of a child process dying. +This behaviour only happens when receiving the same signal several times in +quick succession; you can always count on at least one instance of the signal +arriving. This same caveat is true for other signals you handle in Ruby; read +on to hear more about those. +To properly handle CHLD you must call Process.wait in a loop and look for as +many dead child processes as are available, since you may have received +multiple CHLD signals since entering the signal handler. But... isn't +Process.wait a blocking call?
If there's only one dead child process and I call +Process.wait again how will I avoid blocking the whole process? +Now we get to the second argument to Process.wait. In the last chapter we +looked at passing a pid to Process.wait as the first argument, but it also +takes a second argument, flags. One such flag that can be passed tells the +kernel not to block if no child has exited. Just what we need! +There's a constant that represents the value of this flag, Process::WNOHANG, +and it can be used like so: + + Process.wait(-1, Process::WNOHANG) + +Easy enough. +Here's a rewrite of the code snippet from the beginning of this chapter that +won't 'miss' any child process deaths: +# ./code/snippets/signals_chld_nohang.rb + + child_processes = 3 + dead_processes = 0 + # We fork 3 child processes. + child_processes.times do + fork do + # They sleep for 3 seconds. + sleep 3 + end + end + + # Sync $stdout so the call to #puts in the CHLD handler isn't + # buffered. Buffered output can cause a ThreadError if a signal + # handler is interrupted after calling #puts. Always a good idea + # to do this if your handlers will be doing IO. + $stdout.sync = true + + # Our parent process will be busy doing some intense mathematics. + # But still wants to know when one of its children exits. + + # By trapping the :CHLD signal our process will be notified by the kernel + # when one of its children exits. + trap(:CHLD) do + # Since Process.wait queues up any data that it has for us we can ask for + # it here, since we know that one of our child processes has exited. + + # We loop over a non-blocking Process.wait to ensure that any dead child + # processes are accounted for. + begin + while pid = Process.wait(-1, Process::WNOHANG) + puts pid + dead_processes += 1 + end + rescue Errno::ECHILD + end + end + + loop do + # We exit ourselves once all the child processes are accounted for. 
+ exit if dead_processes == child_processes + + sleep 1 + end + +One more thing to remember is that Process.wait, even this variant, will raise +Errno::ECHILD if no child processes exist. Since signals might arrive at any +time it's possible for the last CHLD signal to arrive after the previous CHLD +handler has already called Process.wait twice and gotten the last available +status. This asynchronous stuff can be mind-bending. Any line of code can be +interrupted with a signal. You've been warned! +So you must handle the Errno::ECHILD exception in your CHLD signal handler. +Also if you don't know how many child processes you are waiting on you should +rescue that exception and handle it properly. + +Signals Primer + +This was our first foray into Unix signals. Signals are asynchronous +communication. When a process receives a signal from the kernel it can do one +of the following: + + 1. ignore the signal + 2. perform a specified action + 3. perform the default action + + +Where do Signals Come From? + +Technically signals are sent by the kernel, just like text messages are sent by +a cell phone carrier. But text messages have an original sender, and so do +signals. Signals are sent from one process to another process, using the kernel +as a middleman. +The original purpose of signals was to specify different ways that a process +should be killed. Let's start there. +Let's start up two ruby programs and we'll use one to kill the other. +For these examples we won't use irb because it defines its own signal handlers +that get in the way of our demonstrations. Instead we'll just use the ruby +program itself. +Give this a try: launch the ruby program without any arguments. Enter some +code. Hit Ctrl-D. +This executes the code that you entered and then exits. +Start up two ruby processes using the technique mentioned above and we'll kill +one of them using a signal. + + 1. 
In the first ruby session execute the following code: + + puts Process.pid + sleep # so that we have time to send it a signal + + 2. In the second ruby session issue the following command to kill the first + session with a signal: + + Process.kill(:INT, <pid of first session>) + + +So the second process sent an "INT" signal to the first process, causing it to +exit. "INT" is short for "INTERRUPT". +The system default when a process receives this signal is that it should +interrupt whatever it's doing and exit immediately. + +The Big Picture + +Below is a table showing signals commonly supported on Unix systems. Every Unix +process will be able to respond to these signals and any signal can be sent to +any process. +When naming signals the SIG portion of the name is optional. The Action column +in the table describes the default action for each signal: + + + Term + means that the process will terminate immediately + + Core + means that the process will terminate immediately and dump core (stack + trace) + + Ign + means that the process will ignore the signal + + Stop + means that the process will stop (ie pause) + + Cont + means that the process will resume (ie unpause) + + + Signal Value Action Comment + ------------------------------------------------------------------------- + SIGHUP 1 Term Hangup detected on controlling terminal + or death of controlling process + SIGINT 2 Term Interrupt from keyboard + SIGQUIT 3 Core Quit from keyboard + SIGILL 4 Core Illegal Instruction + SIGABRT 6 Core Abort signal from abort(3) + SIGFPE 8 Core Floating point exception + SIGKILL 9 Term Kill signal + SIGSEGV 11 Core Invalid memory reference + SIGPIPE 13 Term Broken pipe: write to pipe with no readers + SIGALRM 14 Term Timer signal from alarm(2) + SIGTERM 15 Term Termination signal + SIGUSR1 30,10,16 Term User-defined signal 1 + SIGUSR2 31,12,17 Term User-defined signal 2 + SIGCHLD 20,17,18 Ign Child stopped or terminated + SIGCONT 19,18,25 Cont Continue if stopped + SIGSTOP 
17,19,23 Stop Stop process + SIGTSTP 18,20,24 Stop Stop typed at tty + SIGTTIN 21,21,26 Stop tty input for background process + SIGTTOU 22,22,27 Stop tty output for background process + + The signals SIGKILL and SIGSTOP cannot be trapped, blocked, or ignored. + +This table might seem a bit out of left field, but it gives you a rough idea of +what to expect when you send a certain signal to a process. You can see that, +by default, most of the signals terminate a process. +It's interesting to note the SIGUSR1 and SIGUSR2 signals. These are signals +whose action is meant specifically to be defined by your process. We'll see +shortly that we're free to redefine any of the signal actions that we please, +but those two signals are meant for your use. + +Redefining Signals + +Let's go back to our two ruby sessions and have some fun. + + 1. In the first ruby session use the following code to redefine the behaviour + of the INT signal: + + puts Process.pid + trap(:INT) { print "Na na na, you can't get me" } + sleep # so that we have time to send it a signal + + Now our process won't exit when it receives the INT signal. + 2. In the second ruby session issue the following command and notice that the + first process is taunting us! + + Process.kill(:INT, <pid of first session>) + + 3. You can try using Ctrl-C to kill that first session, and notice that it + responds the same! + 4. But as the table said there are some signals that cannot be redefined. + SIGKILL will show that guy who's boss. + + Process.kill(:KILL, <pid of first session>) + + + +Ignoring Signals + + + 1. In the first ruby session use the following code: + + puts Process.pid + trap(:INT, "IGNORE") + sleep # so that we have time to send it a signal + + 2. In the second ruby session issue the following command and notice that the + first process isn't affected. + + Process.kill(:INT, <pid of first session>) + + The first ruby session is unaffected. 
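
To tie the last two sections together, here's a rough sketch (not one of the book's snippets) that does the same experiments inside a single ruby process by having it send signals to itself. The received array and the short sleeps are my own scaffolding, and I'm assuming MRI on a Unix system. I picked USR1 because it's one of the signals meant for your own use.

```ruby
received = []

# 1. Redefine: our block runs instead of the default action (terminate).
trap(:USR1) { received << :handled }
Process.kill(:USR1, Process.pid)
sleep 0.1 # give Ruby a moment to run the handler

# 2. Ignore: the signal is thrown away, so nothing is recorded this time.
trap(:USR1, "IGNORE")
Process.kill(:USR1, Process.pid)
sleep 0.1

# 3. Put things back the way Ruby had them. We don't send USR1 again
#    after this, because the default action would terminate us.
trap(:USR1, "DEFAULT")

puts received.inspect
```

Only the first signal leaves a trace in received. The string "DEFAULT" hands the signal back to Ruby's default behaviour, which is handy once you're done experimenting.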
+ + +Signal Handlers are Global + +Signals are a great tool and are the perfect fit for certain situations. But +it's good to keep in mind that trapping a signal is a bit like using a global +variable, you might be overwriting something that some other code depends on. +And unlike global variables signal handlers can't be namespaced. +So make sure you read this next section before you go and add signal handlers +to all of your open source libraries :) + +Being Nice about Redefining Signals + +There is a way to preserve handlers defined by other Ruby code, so that your +signal handler won't trample any other ones that are already defined. It looks +something like this: +# ./code/snippets/trap_int_signal.rb + + trap(:INT) { puts 'This is the first signal handler' } + + old_handler = trap(:INT) { + old_handler.call + puts 'This is the second handler' + exit + } + sleep 5 # so that we have time to send it a signal + +Just send it a Ctrl-C to see the effect. Both signal handlers are called. +Now let's see if we can preserve the system default behaviour. Hit the code +below with a Ctrl-C. +# ./code/snippets/decorate_default_signal_behaviour.rb + + system_handler = trap(:INT) { + puts 'about to exit!' + system_handler.call + } + sleep 5 # so that we have time to send it a signal + +:/ It blew up that time. So we can't preserve the system default behaviour with +this technique, but we can preserve other Ruby code handlers that have been +defined. +In terms of best practices your code probably shouldn't define any signal +handlers, unless it's a server. As in a long-running process that's booted from +the command line. It's very rare that library code should trap a signal. + + # The 'friendly' method of trapping a signal. + + old_handler = trap(:QUIT) { + # do some cleanup + puts 'All done!' + + old_handler.call if old_handler.respond_to?(:call) + } + +This handler for the QUIT signal will preserve any previous QUIT handlers that +have been defined. 
Though this looks 'friendly' it's not generally a good idea. +Imagine a scenario where a Ruby server tells its users they can send it a QUIT +signal and it will do a graceful shutdown. You tell the users of your library +that they can send a QUIT signal and it will draw an ASCII rainbow. Now if a +user sends the QUIT signal both handlers will be invoked. This violates the +expectations of both libraries. +Whether or not you decide to preserve previously defined signal handlers is up +to you, just make sure you know why you're doing it. If you simply want to wire +up some behaviour to clean up resources before exiting you can use an at_exit +hook, which we touched on in the chapter about exit codes. + +When Can't You Receive Signals? + +Your process can receive a signal anytime. That's the beauty of them! They're +asynchronous. +Your process can be pulled out of a busy for-loop into a signal handler, or +even out of a long sleep. Your process can even be pulled from one signal +handler to another if it receives one signal while processing another. But, as +expected, it will always go back and finish the code in all the handlers that +are invoked. + +In the Real World + +With signals, any process can communicate with any other process on the system, +so long as it knows its pid. This makes signals a very powerful communication +tool. It's common to send signals from the shell using kill(1). +In the real world signals are mostly used by long running processes like +servers and daemons. And for the most part it will be the human users who are +sending signals rather than automated programs. +For instance, the Unicorn web server %{http://unicorn.bogomips.org} responds to +the INT signal by killing all of its processes and shutting down immediately. +It responds to the USR2 signal by re-executing itself for a zero-downtime +restart. It responds to the TTIN signal by incrementing the number of worker +processes it has running. 
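
To make that a little more concrete, here's a minimal sketch of how a server might wire signals to actions. This is not Unicorn's code. The names worker_count and signal_queue are made up for illustration, and the process signals itself to stand in for a human running kill(1) from a shell. The handlers only record which signal arrived and leave the real work to the main loop, since a handler can interrupt any line of code.

```ruby
worker_count = 1
signal_queue = []

# Keep the handlers tiny: just remember which signal showed up.
[:TTIN, :USR2].each do |name|
  trap(name) { signal_queue << name }
end

# Stand-in for an operator running `kill -TTIN <pid>` twice.
2.times do
  Process.kill(:TTIN, Process.pid)
  sleep 0.1 # give Ruby a moment to run the handler
end

# One pass of what would be the server's main loop.
while (name = signal_queue.shift)
  case name
  when :TTIN then worker_count += 1
  when :USR2 then puts "a real server would re-execute itself here"
  end
end

puts "workers: #{worker_count}"
```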
+See the SIGNALS file included with Unicorn %{http://unicorn.bogomips.org/ +SIGNALS.html} for a full list of the signals it supports and how it responds to +them. +The memprof project has an interesting example of being a friendly citizen when +handling signals %{https://github.com/ice799/memprof/blob/ +d4bc228aca323b58fea92dbde20c1f8ec36e5386/lib/memprof/signal.rb#L8-16}. + +System Calls + +Ruby's Process.kill maps to kill(2), Kernel#trap maps roughly to sigaction(2). +signal(7) is also useful. + +Processes Can Communicate + +Up until now we've looked at related processes that share memory and share open +resources. But what about communicating information between multiple processes? +This is part of a whole field of study called Inter-process communication (IPC +for short). There are many different ways to do IPC but I'm going to cover two +commonly useful methods: pipes and socket pairs. + +Our First Pipe + +A pipe is a uni-directional stream of data. In other words you can open a pipe, +one process can 'claim' one end of it and another process can 'claim' the other +end. Then data can be passed along the pipe but only in one direction. So if +one process 'claims' the position of reader, rather than writer, it will not be +able to write to the pipe. And vice versa. +Before we involve multiple processes let's just look at how to create a pipe +and what we get from that: +# ./code/snippets/pipe.rb + + reader, writer = IO.pipe #=> [#<IO:fd 5>, #<IO:fd 6>] + +IO.pipe returns an array with two elements, both of which are IO objects. +Ruby's amazing IO class %{http://librelist.com/browser//usp.ruby/2011/9/17/the- +ruby-io-class/} is the superclass to File, TCPSocket, UDPSocket, and others. As +such, all of these resources have a common interface. +The IO objects returned from IO.pipe can be thought of as something like +anonymous files. You can basically treat them the same way you would a File. +You can call #read, #write, #close, etc. 
But this object won't respond to #path and won't +have a location on the filesystem. +Still holding back from bringing in multiple processes let's demonstrate +communication with a pipe: +# ./code/snippets/pipe_io.rb + + reader, writer = IO.pipe + writer.write("Into the pipe I go...") + writer.close + puts reader.read + +outputs + + Into the pipe I go... + +Pretty simple right? Notice that I had to close the writer after I wrote to the +pipe? That's because when the reader calls IO#read it will continue trying to +read data until it sees an EOF (aka. end-of-file marker %{http:// +en.wikipedia.org/wiki/End-of-file}). This tells the reader that no more data +will be available for reading. +So long as the writer is still open the reader might see more data, so it +waits. By closing the writer before reading it puts an EOF on the pipe so the +reader stops reading after it gets the initial data. If you skip closing the +writer then the reader will block and continue trying to read indefinitely. + +Pipes Are One-Way Only + +# ./code/snippets/pipe_direction.rb + + reader, writer = IO.pipe + reader.write("Trying to get the reader to write something") + +outputs + + >> reader.write("Trying to get the reader to write something") + IOError: not opened for writing + from (irb):2:in `write' + from (irb):2 + +The IO objects returned by IO.pipe can only be used for uni-directional +communication. So the reader can only read and the writer can only write. +Now let's introduce processes into the mix. + +Sharing Pipes + +In the chapter on forking I described how open resources are shared, or copied, +when a process forks a child. Pipes are considered a resource, they get their +own file descriptors and everything, so they are shared with child processes. +Here's a simple example of using a pipe to communicate between a parent and +child process. 
The child indicates to the parent that it has finished an +iteration of work by writing to the pipe: +# ./code/snippets/pipe_sharing_with_fork.rb + + reader, writer = IO.pipe + + fork do + reader.close + + 10.times do + # heavy lifting + writer.puts "Another one bites the dust" + end + end + + writer.close + while message = reader.gets + $stdout.puts message + end + +outputs Another one bites the dust ten times. +Notice that, like above, the unused ends of the pipe are closed so as not to +interfere with EOF being sent. There's actually one more layer when considering +EOF now that two processes are involved. Since the file descriptors were copied +there are now 4 instances floating around. Since only two of them will be used +to communicate the other 2 instances must be closed. Hence the extra instances +of closing. +Since the ends of the pipe are IO objects we can call any IO methods on them, +not just #read and #write. In this example I use #puts and #gets to read and +write a String delimited with a newline. I actually used those here to simplify +one aspect of pipes: pipes hold a stream of data. + +Streams vs. Messages + +When I say stream I mean that when writing and reading data to a pipe there's +no concept of beginning and end. When working with an IO stream, like pipes or +TCP sockets, you write your data to the stream followed by some protocol- +specific delimiter. For example, HTTP uses a series of newlines to delimit the +headers from the body. +Then when reading data from that IO stream you read it in one chunk at a time, +stopping when you come across the delimiter. That's why I used #puts and #gets +in the last example: it used a newline as the delimiter for me. +As you may have guessed it's possible to communicate via messages instead of +streams. We can't do it with a pipe, but we can do it with Unix sockets. +Without going into too much detail, Unix sockets are a type of socket that can +only communicate on the same physical machine. 
As such they're much faster than TCP +sockets and are a great fit for IPC. +Here's an example where we create a pair of Unix sockets that can communicate +via messages: +# ./code/snippets/socketpair.rb + + require 'socket' + Socket.pair(:UNIX, :DGRAM, 0) #=> [#<Socket:fd 15>, #<Socket:fd 16>] + +This creates a pair of UNIX sockets that are already connected to each other. +These sockets communicate using datagrams, rather than a stream. In this way +you write a whole message to one of the sockets and read a whole message from +the other socket. No delimiters required. +Here's a slightly more complex version of the pipe example where the child +process actually waits for the parent to tell it what to work on, then it +reports back to the parent once it's finished the work: +# ./code/snippets/socketpair_communication.rb + + require 'socket' + + child_socket, parent_socket = Socket.pair(:UNIX, :DGRAM, 0) + maxlen = 1000 + + fork do + parent_socket.close + + 4.times do + instruction = child_socket.recv(maxlen) + child_socket.send("#{instruction} accomplished!", 0) + end + end + child_socket.close + + 2.times do + parent_socket.send("Heavy lifting", 0) + end + 2.times do + parent_socket.send("Feather lifting", 0) + end + + 4.times do + $stdout.puts parent_socket.recv(maxlen) + end + +outputs: + + Heavy lifting accomplished! + Heavy lifting accomplished! + Feather lifting accomplished! + Feather lifting accomplished! + +So whereas pipes provide uni-directional communication, a socket pair provides +bi-directional communication. The parent socket can both read and write to the +child socket, and vice versa. + +Remote IPC? + +IPC implies communication between processes running on the same machine. If +you're interested in scaling up from one machine to many machines while still +doing something resembling IPC there are a few things to look into. The first +one would simply be to communicate via TCP sockets. 
This option would require +more boilerplate code than the others for a non-trivial system. Other plausible +solutions would be RPC %{http://en.wikipedia.org/wiki/Remote_procedure_call} +(remote procedure call), a messaging system like ZeroMQ %{http:// +www.zeromq.org/}, or the general body of distributed systems %{http:// +en.wikipedia.org/wiki/Distributed_computing}. + +In the Real World + +Both pipes and socket pairs are useful abstractions for communicating between +processes. They're fast and easy. They're often used as a communication channel +instead of a more brute force approach such as a shared database or log file. +As for which method to use: it depends on your needs. Keep in mind that pipes +are uni-directional and socket pairs are bi-directional when weighing your +decision. +For a more in-depth example have a look at the Spyglass Master class in the +included Spyglass project. It uses a more involved example of the code you saw +above where many child processes communicate over a single pipe with their +parent process. + +System Calls + +Ruby's IO.pipe maps to pipe(2), Socket.pair maps to socketpair(2). Socket.recv +maps to recv(2) and Socket.send maps to send(2). + +Daemon Processes + +Daemon processes are processes that run in the background, rather than under +the control of a user at a terminal. Common examples of daemon processes are +things like web servers, or database servers which will always be running in +the background in order to serve requests. +Daemon processes are also at the core of your operating system. There are many +processes that are constantly running in the background that keep your system +functioning normally. These are things like the window server on a GUI system, +printing services or audio services so that your speakers are always ready to +play that annoying 'ding' notification. + +The First Process + +There is one daemon process in particular that has special significance for +your operating system. 
We talked in a previous chapter about every process +having a parent process. Can that be true for all processes? What about the +very first process on the system? +This is a classic who-created-the-creator kind of problem, and it has a simple +answer. When the kernel is bootstrapped it spawns a process called the init +process. This process has a ppid of 0 and is the 'grandparent of all +processes'. It's the first one and it has no ancestor. Its pid is 1. + +Creating Your First Daemon Process + +What do we need to get started? Not much. Any process can be made into a daemon +process. +Let's look to the rack project %{http://github.com/rack/rack} for an example +here. Rack ships with a rackup command to serve applications using different +rack supported web servers. Web servers are a great example of a process that +will never end; so long as your application is active you'll need a server +listening for connections. +The rackup command includes an option to daemonize the server and run it in the +background. Let's have a look at what that does. + +Diving into Rack + +# ./code/snippets/daemonize.rb + + def daemonize_app + if RUBY_VERSION < "1.9" + exit if fork + Process.setsid + exit if fork + Dir.chdir "/" + STDIN.reopen "/dev/null" + STDOUT.reopen "/dev/null", "a" + STDERR.reopen "/dev/null", "a" + else + Process.daemon + end + end + +Lots going on here. Let's first jump to the else block. Ruby 1.9.x ships with a +method called Process.daemon that will daemonize the current process! How +convenient! +But don't you want to know how it works under the hood? I knew ya did! The +truth is that if you look at the MRI source for Process.daemon %{https:// +github.com/ruby/ruby/blob/c852d76f46a68e28200f0c3f68c8c67879e79c86/ +process.c#L4817-4860} and stumble through the C code it ends up doing the exact +same thing that Rack does in the if block above. +So let's continue using that as an example. We'll break down the code line by +line. 
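
Before we break it down, here's a small sketch (not part of Rack) that lets you watch Process.daemon at work without losing your terminal. A throwaway child process calls Process.daemon and the resulting daemon reports back through a temp file. The report path and the polling loop are my own scaffolding, and I'm assuming MRI on a Unix system.

```ruby
require 'tmpdir'

# Hypothetical scratch file the daemon uses to report back to us.
report = File.join(Dir.tmpdir, "daemon_demo_#{Process.pid}.txt")

child = fork do
  # Process.daemon forks again internally and the original caller exits.
  # Whatever reaches the next line is the detached daemon.
  Process.daemon
  File.write(report, "#{Process.pid} #{Dir.pwd}")
  exit! # leave right away, skipping any at_exit hooks inherited from us
end

# The process we forked is the one that exits inside Process.daemon,
# so this wait returns promptly.
Process.wait(child)

# The daemon is not our child, so we can't wait on it. Poll the file.
50.times do
  break if File.size?(report)
  sleep 0.1
end

daemon_pid, daemon_dir = File.read(report).split
File.delete(report)

puts "forked child pid:  #{child}"
puts "daemon pid:        #{daemon_pid}"
puts "daemon working dir: #{daemon_dir}"
```

The daemon ends up with a different pid than the child we forked, and its working directory has been changed for it. The sections below explain where both of those come from.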
+ +Daemonizing a Process, Step by Step + + + exit if fork + +This line of code makes intelligent use of the return value of the fork method. +Recall from the forking chapter that fork returns twice, once in the parent +process and once in the child process. In the parent process it returns the +child's pid and in the child process it returns nil. +As always, the return value will be truth-y for the parent and false-y for the +child. This means that the parent process will exit, and as we know, orphaned +child processes carry on as normal. +If a process is orphaned then what happens when you ask for Process.ppid? +This is where knowledge of the init process becomes relevant. The ppid of +orphaned processes is always 1. This is the only process that the kernel can be +sure is active at all times. +This first step is imperative when creating a daemon because it causes the +terminal that invoked this script to think the command is done, returning +control to the terminal and taking it out of the equation. + + Process.setsid + +Calling Process.setsid does three things: + + 1. The process becomes a session leader of a new session + 2. The process becomes the process group leader of a new process group + 3. The process has no controlling terminal + +To understand exactly what effect these three things have we need to step out +of the context of our Rack example for a moment and look a little deeper. + +Process Groups and Session Groups + +Process groups and session groups are all about job control. By 'job control' +I'm referring to the way that processes are handled by the terminal. +We begin with process groups. +Each and every process belongs to a group, and each group has a unique integer +id. A process group is just a collection of related processes, typically a +parent process and its children. However you can also group your processes +arbitrarily by setting their group id using Process.setpgid(pid, new_group_id). +Have a look at the output from the following snippet. 
+# ./code/snippets/daemons_grp_eq_pid.rb + + puts Process.getpgrp + puts Process.pid + +If you ran that code in an irb session then those two values will be equal. +Typically the process group id will be the same as the pid of the process group +leader. The process group leader is the 'originating' process of a terminal +command. ie. If you start an irb process at the terminal it will become the +group leader of a new process group. Any child processes that it creates will +be made part of the same process group. +Try out the following example to see that process groups are inherited. +# ./code/snippets/daemons_grp_inherited.rb + + puts Process.pid + puts Process.getpgrp + + fork { + puts Process.pid + puts Process.getpgrp + } + +You can see that although the child process gets a unique pid it inherits the +group id from its parent. So these two processes are part of the same group. +You'll recall that we looked previously at Orphaned Processes. In that section +I said that child processes are not given special treatment by the kernel. Exit +a parent process and the child will continue on. This is the behaviour when a +parent process exits, but the behaviour is a bit different when the parent +process is being controlled by a terminal and is killed by a signal. +Consider for a moment: a Ruby script that shells out to a long-running shell +command, eg. a long backup script. What happens if you kill the Ruby script +with a Ctrl-C? +If you try this out you'll notice that the long-running backup script is not +orphaned, it does not continue on when its parent is killed. We haven't set up +any code to forward the signal from the parent to the child, so how is this +done? +The terminal receives the signal and forwards it on to any process in the +foreground process group. In this case, both the Ruby script and the long- +running shell command would be part of the same process group, so they would +both be killed by the same signal. +And then session groups... 
+A session group is one level of abstraction higher up, a collection of process +groups. Consider the following shell command: + + git log | grep shipped | less + +In this case each command will get its own process group, since each may be +creating child processes but none is a child process of another. Even though +these commands are not part of the same process group one Ctrl-C will kill them +all. +These commands are part of the same session group. Each invocation from the +shell gets its own session group. An invocation may be a single command or a +string of commands joined by pipes. +Like in the above example, a session group may be attached to a terminal. It +might also not be attached to any terminal, as in the case of a daemon. +Again, your terminal handles session groups in a special way: sending a signal +to the session leader will forward that signal to all the process groups in +that session, which will forward it to all the processes in those process +groups. Turtles all the way down ;) +There is a system call for retrieving the current session group id, getsid(2), +but Ruby's core library has no interface to it. Using Process.setsid will +return the id of the new session group it creates, so you can store that if +you need it. +So, getting back to our Rack example, in the first line a child process was +forked and the parent exited. The originating terminal recognized the exit and +returned control to the user, but the forked process still has the inherited +group id and session id from its parent. At the moment this forked process is +neither a session leader nor a group leader. +So the terminal still has a link to our forked process, if it were to send a +signal to its session group the forked process would receive it, but we want to +be fully detached from a terminal. +Process.setsid will make this forked process the leader of a new process group +and a new session group. 
Note that Process.setsid will fail in a process that +is already a process group leader, it can only be run from child processes. +This new session group does not have a controlling terminal, but technically +one could be assigned. + + exit if fork + +The forked process that had just become a process group and session group +leader forks again and then exits. +This newly forked process is no longer a process group leader nor a session +leader. Since the previous session leader had no controlling terminal, and this +process is not a session leader, it's guaranteed that this process can never +have a controlling terminal. Terminals can only be assigned to session leaders. +This dance ensures that our process is now fully detached from a controlling +terminal and will run to its completion. + + Dir.chdir "/" + +This changes the current working directory to the root directory for the +system. This isn't strictly necessary but it's an extra step to ensure that +current working directory of the daemon doesn't disappear during its execution. +This avoids problems where the directory that the daemon was started from gets +deleted or unmounted for any reason. + + STDIN.reopen "/dev/null" + STDOUT.reopen "/dev/null", "a" + STDERR.reopen "/dev/null", "a" + +This sets all of the standard streams to go to /dev/null, a.k.a. to be ignored. +Since the daemon is no longer attached to a terminal session these are of no +use anyway. They can't simply be closed because some programs expect them to +always be available. Redirecting them to /dev/null ensures that they're still +available to the program but have no effect. + +In the Real World + +As mentioned, the rackup command ships with a command line option for +daemonizing the process. Same goes with any of the popular Ruby web servers. +If you want to dig in to more internals of daemon processes you should look at +the daemons rubygem %{http://rubygems.org/gems/daemons}. 
+If you think you want to create a daemon process you should ask yourself one +basic question: Does this process need to stay responsive forever? +If the answer is no then you probably want to look at a cron job or background +job system. If the answer is yes, then you probably have a good candidate for a +daemon process. + +System Calls + +Ruby's Process.setsid maps to setsid(2), Process.getpgrp maps to getpgrp(2). +Other system calls mentioned in this chapter were covered in detail in previous +chapters. + +Spawning Terminal Processes + +A common interaction in a Ruby program is 'shelling out' from your program to +run a command in a terminal. This happens especially when I'm writing a Ruby +script to glue together some common commands for myself. There are several ways +you can spawn processes to run terminal commands in Ruby. +Before we look at the different ways of 'shelling out' let's look at the +mechanism they're all using under the hood. + +fork + exec + +All of the methods described below are variations on one theme: fork(2) + +execve(2). +We've had a good look at fork(2) in previous chapters, but this is our first +look at execve(2). It's pretty simple, execve(2) allows you to replace the +current process with a different process. +Put another way: execve(2) allows you to transform the current process into any +other process. You can take a Ruby process and turn it into a Python process, +or an ls(1) process, or another Ruby process. +execve(2) transforms the process and never returns. Once you've transformed +your Ruby process into something else you can never come back. + + exec 'ls', '--help' + +The fork + exec combo is a common one when spawning new processes. execve(2) is +a very powerful and efficient way to transform the current process into another +one; the only catch is that your current process is gone. That's where fork(2) +comes in handy. 
+You can use fork(2) to create a new process, then use execve(2) to transform +that process into anything you like. Voila! Your current process is still +running just as it was before and you were able to spawn any other process that +you want to. +If your program depends on the output from the execve(2) call you can use the +tools you learned in previous chapters to handle that. Process.wait will ensure +that your program waits for the child process to finish whatever it's doing so +you can get the result back. + +File descriptors and exec + +At the OS level, a call to execve(2) doesn't close any open file descriptors by +default. +However, a call to exec in Ruby will close all open file descriptors by default +(excluding the standard streams). +In other words, the default OS behaviour when you exec('ls') would be to give +ls a copy of any open file descriptors, eg. a database connection. This is +rarely what you want, so Ruby's default is to close all open file descriptors +before doing an exec. +This default behaviour of closing file descriptors on exec prevents file +descriptor 'leaks'. A leak may happen when you fork + exec to spawn another +process that has no need for the file descriptors you currently have open (like +your database connections, logfiles, etc.). A leak can waste resources but, even +worse, can lead to havoc when you try to close your database connection, only +to find that some other process erroneously still has the connection open. +However, you may sometimes want to keep a file descriptor open, to pass an open +logfile or live socket to another program being booted via exec%{The Unicorn +web server uses this exact behaviour to enable restarts without losing any +connections. By passing the open listener socket to the new version of itself +through an exec, it ensures that the listener socket is never closed during a +restart.}. 
You can control this behaviour by passing an options hash to exec +mapping file descriptor numbers to IO objects, as seen in the following +example. +# ./code/snippets/exec_python.rb + + hosts = File.open('/etc/hosts') + + python_code = %Q[import os; print os.fdopen(#{hosts.fileno}).read()] + + # The hash as the last arguments maps any file descriptors that should + # stay open through the exec. + exec 'python', '-c', python_code, {hosts.fileno => hosts} + +In this example we start up a Ruby program and open the /etc/hosts file. Then +we exec a python process and tell it to open the file descriptor number that +Ruby received for opening the /etc/hosts file. You can see that python +recognizes this file descriptor (because it was shared via execve(2)) and is +able to read from it without having to open the file again. +Notice the options hash mapping the file descriptor number to the IO object. If +you remove that hash, the Python program won't be able to open the file +descriptor, that declaration keeps it open through the execve(2). +Unlike fork(2), execve(2) does not share memory with the newly created process. +In the python example above, whatever was allocated in memory for the use of +the Ruby program was essentially wiped away when execve(2) was called leaving +the python program with a blank slate in terms of memory usage. + +Arguments to exec + +Notice in all of the examples above I sent an array of arguments to exec, +rather than passing them as a string? There's a subtle difference to the two +argument forms. +Pass a string to exec and it will actually start up a shell process and pass +the string to the shell to interpret. Pass an array and it will skip the shell +and set up the array directly as the ARGV to the new process. +Generally you want to avoid passing a string unless you really need to. Pass an +array where possible. Passing a string and running code through the shell can +raise security concerns. 
If user input is involved it may be possible for a user +to inject a malicious command directly into the shell, potentially gaining access +to any privileges the current process has. In a case where you want to do +something like exec("ls * | awk '{print($1)}'") you'll have to pass it as a +string. + +Kernel#system + + + system('ls') + system('ls', '--help') + system('git log | tail -10') + +The return value of Kernel#system reflects the exit code of the terminal +command in the most basic way. If the exit code of the terminal command was 0 +then it returns true, otherwise it returns false (or nil if the command could +not be run at all). +The standard streams of the terminal command are shared with the current +process (through the magic of fork(2)), so any output coming from the terminal +command should be seen in the same way output is seen from the current process. + +Kernel#` + + + `ls` + `ls --help` + %x[git log | tail -10] + +Kernel#` works slightly differently. The value returned is the STDOUT of the +terminal program collected into a String. +As mentioned, it's using fork(2) under the hood and it doesn't do anything +special with STDERR, so you can see in the second example that STDERR is +printed to the screen just as with Kernel#system. +Kernel#` and %x[] do the exact same thing. + +Process.spawn + +# ./code/snippets/process_spawn.rb + + # This call will start up the 'rails server' process with the + # RAILS_ENV environment variable set to 'test'. + Process.spawn({'RAILS_ENV' => 'test'}, 'rails server') + + # This call will merge STDERR with STDOUT for the duration + # of the 'ls --zz' program. + Process.spawn('ls', '--zz', STDERR => STDOUT) + +Process.spawn is a bit different than the others in that it is non-blocking. +If you compare the following two examples you will see that Kernel#system will +block until the command is finished, whereas Process.spawn will return +immediately. 
+# ./code/snippets/process_spawn_waitpid.rb + + # Do it the blocking way + system 'sleep 5' + + # Do it the non-blocking way + Process.spawn 'sleep 5' + + # Do it the blocking way with Process.spawn + # Notice that it returns the pid of the child process + pid = Process.spawn 'sleep 5' + Process.waitpid(pid) + +The last example in this code block is a really great example of the +flexibility of Unix programming. In previous chapters we talked a lot about +Process.wait, but it was always in the context of forking and then running some +Ruby code. You can see from this example that the kernel cares not what you are +doing in your process, it will always work the same. +So even though we fork(2) and then run the sleep(1) program (a C program) the +kernel still knows how to wait for that process to finish. Not only that, it +will be able to properly return the exit code just as was happening in our Ruby +programs. +All code looks the same to the kernel; that's what makes it such a flexible +system. You can use any programming language to interact with any other +programming language, and all will be treated equally. +Process.spawn takes many options that allow you to control the behaviour of the +child process. I showed a few useful ones in the example above. Consult the +official rdoc %{http://www.ruby-doc.org/core-1.9.3/Process.html#method-c-spawn} +for an exhaustive list. + +IO.popen + +# ./code/snippets/spawn_popen_no_block.rb + + # This example will return a file descriptor (IO object). Reading from it + # will return what was printed to STDOUT from the shell command. + IO.popen('ls') + +The most common usage for IO.popen is an implementation of Unix pipes in pure +Ruby. That's where the 'p' comes from in popen. Underneath it's still doing the +fork+exec, but it's also setting up a pipe to communicate with the spawned +process. That pipe is passed as the block argument in the block form of +IO.popen. 
+# ./code/snippets/spawn_popen_block.rb + + # An IO object is passed into the block. In this case we open the stream + # for writing, so the stream is set to the STDIN of the spawned process. + # + # If we open the stream for reading (the default) then + # the stream is set to the STDOUT of the spawned process. + IO.popen('less', 'w') { |stream| + stream.puts "some\ndata" + } + +With IO.popen you have to choose which stream you have access to. You can't +access them all at once. + +open3 + +Open3 allows simultaneous access to the STDIN, STDOUT, and STDERR of a spawned +process. +# ./code/snippets/spawn_open3_eg.rb + + # This is available as part of the standard library. + require 'open3' + + Open3.popen3('grep', 'data') { |stdin, stdout, stderr| + stdin.puts "some\ndata" + stdin.close + puts stdout.read + } + + # Open3 will use Process.spawn when available. Options can be passed to + # Process.spawn like so: + Open3.popen3('ls', '-uhh', :err => :out) { |stdin, stdout, stderr| + puts stdout.read + } + +Open3 acts like a more flexible version of IO.popen, for those times when you +need it. + +In the Real World + +All of these methods are common in the Real World. Since they all differ in +their behaviour you have to select one based on your needs. +One drawback to all of these methods is that they rely on fork(2). What's wrong +with that? Imagine this scenario: You have a big Ruby app that is using +hundreds of MB of memory. You need to shell out. If you use any of the methods +above you'll incur the cost of forking. +Even if you're shelling out to a simple ls(1) call the kernel will still need +to make sure that all of the memory that your Ruby process is using is +available for that new ls(1) process. Why? Because that's the API of fork(2). +When you fork(2) the process the kernel doesn't know that you're about to +transform that process with an exec(2). You may be forking in order to run Ruby +code, in which case you'll need to have all of the memory available. 
+It's good to keep in mind that fork(2) has a cost, and sometimes it can be a +performance bottleneck. What if you need to shell out a lot and don't want to +incur the cost of fork(2)? +There are some native Unix system calls for spawning processes without the +overhead of fork(2). Unfortunately they don't have support in the Ruby language +core library. However, there is a Rubygem that provides a Ruby interface to +these system calls. The posix-spawn project %{http://github.com/rtomayko/posix- +spawn} provides access to posix_spawn(2), which is available on most Unix +systems. +posix-spawn mimics the Process.spawn API. In fact, most of the options that you +pass to Process.spawn can also be passed to POSIX::Spawn.spawn. So you can keep +using the same API and yet reap the benefits of faster, more resource efficient +spawning. +At a basic level posix_spawn(2) is a subset of fork(2). Recall the two +discerning attributes of a new child process from fork(2): 1) it gets an exact +copy of everything that the parent process had in memory, and 2) it gets a copy +of all the file descriptors that the parent process had open. +posix_spawn(2) preserves #2, but not #1. That's the big difference between the +two. So you can expect a newly spawned process to have access to any of the +file descriptors opened by the parent, but it won't share any of the memory. +This is what makes posix_spawn(2) faster and more efficient than fork(2). But +keep in mind that it also makes it less flexible. + +System Calls + +Ruby's Kernel#system maps to system(3), Kernel#exec maps to execve(2), IO.popen +maps to popen(3), posix-spawn uses posix_spawn(2). Ruby controls the 'close-on- +exec' behaviour using fcntl(2) with the FD_CLOEXEC option. + +Ending + +Working with processes in Unix is about two things: abstraction and +communication. + +Abstraction + +The kernel has an extremely abstract (and simple) view of its processes. 
As +programmers we're used to looking at source code as the differentiator between +two programs. +We are masters of many programming languages, using each for different +purposes. We couldn't possibly write memory-efficient code in a language with a +garbage collector, we'll have to use C. But we need objects, let's use C++. On +and on. +But if you ask the kernel it all looks the same. In the end, all of our code is +compiled down to something simple that the kernel can understand. And when it's +working at that level all processes are treated the same. Everything gets its +numeric identifier and is given equal access to the resources of the kernel. +What's the point of all this jibber-jabber? Using Unix programming lets you +twiddle with these knobs a little bit. It lets you do things that you can't +accomplish when working at the programming language level. +Unix programming is programming language agnostic. It lets you interface your +Ruby script with a C program, and vice versa. It also lets you reuse its +concepts across programming languages. The Unix Programming skills that you get +from Ruby will be just as applicable in Python, or node.js, or C. These are +skills that are about programming in general. + +Communication + +Besides the basic act of creating new processes, almost everything else we +talked about was regarding communication. Following the principle of +abstraction mentioned above, the kernel provides very abstract ways of +communicating between processes. +Using signals any two processes on the system can communicate with each other. +By naming your processes you can communicate with any user who is inspecting +your program on the command line. Using exit codes you can send success/failure +messages to any process that's looking after your own. + +Farewell, But Not Goodbye + +That's the end! Congratulations for making it here! Believe it or not, you now +know more than most programmers about the inner workings of Unix processes. 
+Now that you know the fundamentals you can go out and apply your newfound knowledge +to anything that you work on. Things are going to start making more sense for +you. And the more you apply your newfound knowledge, the clearer things will +become. There's no stopping you now. +And we haven't even talked about networking :) We'll save that one for another +edition. +Read the appendices at the end of this book for a look at some popular Ruby +projects and how they use Unix processes to be awesome. +If you have any feedback on this book, find an error or build something cool +with your newfound knowledge, I'd love to hear it. Send a message to +jesse@jstorimer.com. Happy coding! + +Appendix: How Resque Manages Processes + +This section looks at how a popular Ruby job queue, Resque %{http://github.com/ +defunkt/resque#readme}, effectively manages processes. Specifically it makes +use of fork(2) to manage memory, not for concurrency or speed reasons. + +The Architecture + +To understand why Resque works the way it does we need a basic understanding of +how the system works. +From the README: + + Resque is a Redis-backed library for creating background jobs, + placing those jobs on multiple queues, and processing them later. + +The component that we're interested in is the Resque worker. Resque workers +take care of the 'processing them later' part. The job of a Resque worker is to +boot up, load your application environment, then connect to Redis and try to +reserve any pending background jobs. When it's able to reserve one such job it +works off the job, then goes back to reserving the next one. Simple enough. +For an application of non-trivial size one Resque worker is not enough. So it's +very common to spin up multiple Resque workers in parallel to work off jobs. + +Forking for Memory Management + +Resque workers employ fork(2) for memory management purposes. Let's have a look +at the relevant bit of code (from Resque v1.18.0) and then dissect it line by +line. 
+ + if @child = fork + srand # Reseeding + procline "Forked #{@child} at #{Time.now.to_i}" + Process.wait(@child) + else + procline "Processing #{job.queue} since #{Time.now.to_i}" + perform(job, &block) + exit! unless @cant_fork + end + +This bit of code is executed every time Resque works off a job. +If you've read through the Forking chapter then you'll already be familiar with +the if/else style here. Otherwise go read it now! +We'll start by looking at the code inside the parent process (ie. inside the if +block). + + srand # Reseeding + +This line is here simply because of a bug %{http://redmine.ruby-lang.org/ +issues/4338} in a certain patchlevel of MRI Ruby 1.8.7. + + procline "Forked #{@child} at #{Time.now.to_i}" + +procline is Resque's internal way of updating the name of the current process. +Remember we noted that you can change the name of the current process by +setting $0 but Ruby doesn't include a method for it? +This is Resque's solution. procline sets the name of the current process. + + Process.wait(@child) + +If you've read the chapter on Process.wait then this line of code should be +familiar to you. +The @child variable was assigned the value of the fork call. So in the parent +process that will be the child pid. This line of code tells the parent process +to block until the child is finished. +Now we'll look at what happens in the child process. + + procline "Processing #{job.queue} since #{Time.now.to_i}" + +Notice that both the if and else block make a call to procline. Even though +these two lines are part of the same logical construct they are being executed +in two different processes. Since the process name is process-specific these +two calls will set the name for the parent and child process respectively. + + perform(job, &block) + +Here in the child process is where the job is actually 'performed' by Resque. + + exit! unless @cant_fork + +Then the child process exits. + +Why Bother? 
+ +As mentioned in the first paragraph of this chapter, Resque isn't doing this to +achieve concurrency or to make things faster. In fact, it adds an extra step to +the processing of each job which makes the whole thing slower. So why go to the +trouble? Why not just process job after job? +Resque uses fork(2) to ensure that the memory usage of its worker processes +doesn't bloat. Let's review what happens when a Resque worker forks and how that +affects the Ruby VM. +You'll recall that fork(2) creates a new process that's an exact copy of the +original process. The original process, in this case, has preloaded the +application environment and nothing else. So we know that after forking we'll +have a new process with just the application environment loaded. +Then the child process will go about the task of working off the job. This is +where memory usage can go awry. The background job may require that image files +are loaded into main memory for processing, or many ActiveRecord objects are +fetched from the database, or any other operation that requires large amounts +of main memory to be used. +Once the child process is finished with the job it exits, which releases all of +its memory back to the OS to clean up. Then the original process can resume, +once again with only the application environment loaded. +So each time a job is performed by Resque you end up back at a clean +slate in terms of memory usage. This means that memory usage may spike when +jobs are being worked on, but it should always come back to that nice baseline. + +Doesn't the GC clean up for us? + +Well, yes, but it doesn't do a great job. It does an OK job. The truth is that +MRI's GC has a hard time releasing memory that it doesn't need anymore. +When the Ruby VM boots up it is allocated a certain block of main memory by the +kernel. When it uses up all that it has it needs to ask for another block of +main memory from the kernel. 
+Due to numerous issues with Ruby's GC (naive approach, heap fragmentation) it +is rare that the VM is able to release a block of memory back to the kernel. So +the memory usage of a Ruby process is likely to grow over time, but not to +shrink. Now Resque's approach begins to make sense! +If the Resque worker simply worked off each job as it became available then it +wouldn't be able to maintain that nice baseline level of memory usage. As soon +as it worked on a job that required lots of main memory, that memory would +be stuck with the worker process until it exited. +Even if subsequent jobs needed much less memory Ruby would have a hard time +giving that memory back to the kernel. Hence, the worker processes would +inevitably get bigger over time. Never shrinking. +Thanks to the power of fork(2) Resque workers are reliable and don't need to be +restarted after working a certain number of jobs. + +Appendix: How Unicorn Reaps Worker Processes + +Any investigation of Unix Programming in the Ruby language would be remiss +without many mentions of the Unicorn web server %{http://unicorn.bogomips.org}. +Indeed, the project has already been mentioned several times in this book. +What's the big deal? Unicorn is a web server that attempts to push as much +responsibility onto the kernel as it can. It uses lots of Unix Programming. The +codebase is chock full of Unix Programming techniques. +Not only that, but it's performant and reliable. It's used by lots of big Ruby +websites like GitHub and Shopify. +The point is, if this book has whetted your appetite and you want to learn more +about Unix Programming in Ruby you should plumb the depths of Unicorn. It may +take you several trips into the belly of the mythical beast but you will come +out with a better understanding and new ideas. + +Reaping What? + +Before we dive into the code I'd like to provide a bit of context about how +Unicorn works. At a very high level Unicorn is a pre-forking web server. 
+This means that you boot it up and tell it how many worker processes you would +like it to have. It starts by initializing its network sockets and loading your +application. Then it uses fork(2) to create the worker processes. It uses the +master-worker pattern we mentioned in the chapter on forking. +The Unicorn master process keeps a heartbeat on each of its workers and ensures +they're not taking too long to process requests. The code below is used when +you tell the Unicorn master process to exit. As we covered in the chapter on +forking, if a parent process doesn't kill its children before it exits they will +continue on without stopping. +So it's important that Unicorn clean up after itself before it exits. The code +below is invoked as part of Unicorn's exit procedure. Before invoking this code +it will send a QUIT signal to each of its worker processes, instructing them to +exit gracefully. +The code below is used by Unicorn (current as of v4.0.0) to clean up its +internal representation of its workers and ensure that they all exited +properly. +Let's dive in. + + # reaps all unreaped workers + def reap_all_workers + begin + wpid, status = Process.waitpid2(-1, Process::WNOHANG) + wpid or return + if reexec_pid == wpid + logger.error "reaped #{status.inspect} exec()-ed" + self.reexec_pid = 0 + self.pid = pid.chomp('.oldbin') if pid + proc_name 'master' + else + worker = WORKERS.delete(wpid) and worker.close rescue nil + m = "reaped #{status.inspect} worker=#{worker.nr rescue 'unknown'}" + status.success? ? logger.info(m) : logger.error(m) + end + rescue Errno::ECHILD + break + end while true + end + +We'll take it one line at a time: + + begin + ... + end while true + +The first thing that I want to draw your attention to is the fact that the +begin block that's started on the first line of this method actually starts an +endless loop. 
There are other ways to write endless loops in Ruby, but the +important part is to keep in mind that we are in an endless loop so we'll need +a hard return or a break in order to finish this method. + + wpid, status = Process.waitpid2(-1, Process::WNOHANG) + +This line should look familiar. We looked at Process.waitpid2 in the +chapter on Process.wait. +There we saw that passing a valid pid as the first argument would cause the +Process.waitpid call to wait only for that pid. What happens when you pass -1 +to Process.waitpid? We know that there are no processes with a pid less than +1, so... +Passing -1 waits for any child process to exit. It turns out that this is the +default option to that method. If you don't specify a pid then it uses -1 by +default. In this case, since the author needed to pass something in for the +second argument, the first argument couldn't be left blank, so it was set to +the default. +Hey, if you're waiting on any child process why not use Process.wait2 then? I +suspect that the author decided here, and I agree with him, that it was most +readable to use a waitpid variation when specifying a value for the pid. As +mentioned above the value specified is simply the default, but nonetheless it's +most salient to use waitpid if you're specifying any value for the pid. +Remember Process::WNOHANG from before? When using this flag if there are no +processes that have exited for us then it will not block and simply return nil. + + wpid or return + +This line may look a little odd but it's actually a conditional return +statement. If wpid is nil then we know that the last line returned nil. This +would mean that there are no child processes that have exited returning their +status to us. +If this is the case then this method will return and its job is done. 
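To see the non-blocking behaviour in isolation, here is a small sketch that is separate from Unicorn's code. It forks a few short-lived children and polls for them with the same Process.waitpid2(-1, Process::WNOHANG) call. The sleep durations and exit codes are arbitrary choices made for the illustration.

```ruby
# Fork three children that exit at different times with codes 0, 1 and 2.
3.times do |i|
  fork do
    sleep 0.1 * (i + 1)
    exit! i
  end
end

reaped = []
loop do
  begin
    # Returns immediately: [pid, status] if some child has exited,
    # nil if children exist but none has exited yet.
    wpid, status = Process.waitpid2(-1, Process::WNOHANG)
  rescue Errno::ECHILD
    # No child processes are left at all, so we're done.
    break
  end

  if wpid.nil?
    # Nothing to reap right now. A real server would do other work here.
    sleep 0.05
    next
  end

  reaped << status.exitstatus
end

puts "reaped exit codes: #{reaped.sort.inspect}"
```

Unlike this sketch, the Unicorn method returns as soon as it sees nil rather than polling again; it is invoked again later when there may be more workers to reap.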
+ + if reexec_pid == wpid + logger.error "reaped #{status.inspect} exec()-ed" + self.reexec_pid = 0 + self.pid = pid.chomp('.oldbin') if pid + proc_name 'master' + +I don't want to spend much time talking about this bit. The 'reexec' stuff has +to do with Unicorn internals, specifically how it handles zero-downtime +restarts. Perhaps I can cover that process in a future report. +One thing that I will draw your attention to is the call to proc_name. This is +similar to the procline method from the Resque chapter. Unicorn also has a +method for changing the display name of the current process. A critical piece +of communication with the user of your software. + + else + worker = WORKERS.delete(wpid) and worker.close rescue nil + +Unicorn stores a list of currently active worker processes in its WORKERS +constant. WORKERS is a hash where the key is the pid of the worker process and +the value is an instance of Unicorn::Worker. +So this line removes the worker process from Unicorn's internal tracking list +(WORKERS) and calls #close on the worker instance, which closes its no longer +needed heartbeat mechanism. + + m = "reaped #{status.inspect} worker=#{worker.nr rescue 'unknown'}" + +These lines craft a log message based on the status returned from the +Process.waitpid2 call. +The string is crafted by first inspecting the status variable. What does that +look like? Something like this: + + #<Process::Status: pid=32227,exited(0)> + # or + #<Process::Status: pid=32308,signaled(SIGINT=2)> + +It includes the pid of the ended process, as well as the way it ended. In the +first line the process exited itself with an exit code of 0. In the second line +the process was killed with a signal, SIGINT in this case. So a line like that +will be added to the Unicorn log. +The second part of the log line worker.nr is Unicorn's internal representation +of the worker's number. + + status.success? ? 
logger.info(m) : logger.error(m) + +This line takes the crafted log message and sends it to the logger. It uses the +success? method on the status object to log this message at either the INFO level +or the ERROR level. +The success? method will only return true in one case, when the process exited +with an exit code of 0. If it exited with a different code it will return +false. If it was killed by a signal, it will return nil. + + rescue Errno::ECHILD + break + +This is part of the top-level begin statement in this method. If this exception +is raised then the endless loop that is this method breaks and it will return. +The Errno::ECHILD exception will be raised by Process.waitpid2 (or any of its +cousins) if there are no child processes for the current process. If that +happens in this case then it means the job of this method is done! All of the +child processes have been reaped. So it returns. + +Conclusion + +If this bit of code interested you and you want to learn more about Unix +Programming in Ruby, Unicorn is a great resource. See the official site at +http://unicorn.bogomips.org and go learn! + +Appendix: Preforking Servers + +I'm glad you made it this far because this chapter may be the most action- +packed in the whole book. Preforking servers bring together a lot of the +concepts that are explained in this book into a powerful, highly-efficient +approach to solving certain problems. +There's a good chance that you've used either Phusion Passenger %{http:// +www.modrails.com/} or Unicorn %{http://unicorn.bogomips.org}. Both of those +servers, and Spyglass (the web server included with this book), are examples of +preforking servers. +At the core of all these projects is the preforking model. There are a few +things about preforking that make it special. Here are three: + + 1. Efficient use of memory. + 2. Efficient load balancing. + 3. Efficient sysadminning. + +We'll look at each in turn. 
+ +Efficient use of memory + +In the chapter on forking we discussed how fork(2) creates a new process that's +an exact copy of the calling (parent) process. This includes anything that the +parent process had in memory at the time. +Loading a Rails App +On my Macbook Pro loading only Rails 3.1 (no libraries or application code) +takes in the neighbourhood of 3 seconds. After loading Rails the process is +consuming about 70MB of memory. +Whether or not these numbers are exactly the same on your machine isn't +significant for our purposes. I'll be referring to these as a baseline in the +following examples. +Preforking uses memory more efficiently than does spawning multiple unrelated +processes. For comparison, this is like running Unicorn with 10 worker +processes compared to running 10 instances of Mongrel (a non-preforking +server). +Let's review what will happen from the standpoint of processes, first looking +at Mongrel, then at Unicorn, when we boot up 10 instances of each server. + +Many Mongrels + +Booting up 10 Mongrel processes in parallel will look about the same as booting +up 10 Mongrel processes serially. +When booting them in parallel all 10 processes will be competing for resources +from the kernel. Each will be consuming resources to load Rails, and each can +be expected to take the customary 3 seconds to boot. In total, that's 30 +seconds. On top of that, each process will be consuming 70MB of memory once +Rails has been loaded. In total, that's 700MB of memory for 10 processes. +A preforking server can do better. + +Many Unicorn + +Booting up 10 Unicorn workers will make use of 11 processes. One process will +be the master, babysitting the other worker processes, of which there are 10. +When booting Unicorn only one process, the master process, will load Rails. +There won't be competition for kernel resources. +The master process will take the customary 3 seconds to load, and forking 10 +processes will be more-or-less instantaneous. 
The master process will be +consuming 70MB of memory to load Rails and, thanks to copy-on-write, the child +processes should not be using any memory on top of what the master was using. +The truth is that it does take some time to fork a process (it's not +instantaneous) and that there is some memory overhead for each child process. +These values are negligible compared to the overhead of booting many Mongrels. +Preforking wins. +Keep in mind that the benefits of copy-on-write are forfeited if you're running +MRI. To reap these benefits you need to be using REE. + +Efficient load balancing + +I already highlighted the fact that fork(2) creates an exact copy of the +calling process. This includes any file descriptors that the parent process has +open. +The Very Basics of Sockets +Efficient load balancing has a lot to do with how sockets work. Since we're +talking about web servers: sockets are important. They're at the very core of +networking. As I hinted earlier: sockets and networking are a complex topic, +too big to fit into this book. But you need to understand the very basic +workflow in order to understand this next part. +Using a socket involves multiple steps: 1) A socket is opened and binds to a +unique port, 2) A connection is accepted on that socket using accept(2), and 3) +Data can be read from this connection, written to the connection, and +ultimately the connection is closed. The socket stays open, but the connection +is closed. +Typically this would happen in the same process. A socket is opened, then the +process waits for connections on that socket. The connection is handled, +closed, and the loop starts over again. +Preforking servers use a different workflow to let the kernel balance heavy +load across the socket. Let's look at how that's done. +In servers like Unicorn and Spyglass the first thing that the master process +does is open the socket, before even loading the Rails app. 
This is the socket +that is available for external connections from web clients. But the master +process does not accept connections. Thanks to the way fork(2) works, when the +master process forks worker processes each one gets a copy of the open socket. +This is where the magic happens. +Each worker process has an exact copy of the open socket, and each worker +process attempts to accept connections on that socket using accept(2). This is +where the kernel takes over and balances load across the 10 copies of the +socket. It ensures that one, and only one, process can accept each individual +connection. Even under heavy load the kernel ensures that the load is balanced +and that only one process handles each connection. +Compare this to how Mongrel achieves load balancing. +Given 10 unrelated processes that aren't sharing a socket each one must bind to +a unique port. Now a piece of infrastructure must sit in front of all of the +Mongrel processes. It must know which port each Mongrel process is bound to, +and it must do the job of making sure that each Mongrel is handling only one +connection at a time and that connections are load balanced properly. +Again, preforking wins both for simplicity and resource efficiency. + +Efficient sysadminning + +This point is less technical, more human-centric. +As someone administering a preforking server you typically only need to issue +commands (usually signals) to the master process. It will handle keeping track +of and relaying messages to its worker processes. +When administering many instances of a non-preforking server the sysadmin must +keep track of each instance, administer them separately and ensure that their +commands are followed. + +Basic Example of a Preforking Server + +What follows is some really basic code for a preforking server. It can respond +to requests in parallel using multiple processes and will leverage the kernel +for load balancing. 
For a more involved example of a preforking server I +suggest you check out the Spyglass source code (next chapter) or the Unicorn +source code. +# ./code/snippets/prefork.rb + + require 'socket' + + # Open a socket. + socket = TCPServer.open('0.0.0.0', 8080) + + # Preload app code. + # require 'config/environment' + + # For keeping track of child process pids. Defined before the trap + # blocks below so that they can see it as a local variable. + wpids = [] + + # Forward any relevant signals to the child processes. + [:INT, :QUIT].each do |signal| + Signal.trap(signal) { + wpids.each { |wpid| Process.kill(signal, wpid) } + } + end + + 5.times { + wpids << fork do + # Restore default signal handling so that workers exit when signalled + # instead of running the forwarding handler inherited from the master. + [:INT, :QUIT].each { |signal| Signal.trap(signal, 'DEFAULT') } + + loop { + connection = socket.accept + connection.puts 'Hello Readers!' + connection.close + } + end + } + + Process.waitall + +You can consume it with something like nc(1) or telnet(1) to see it in action. + + $ nc localhost 8080 + $ telnet localhost 8080 + +Notice that I snuck something new into that one? We haven't seen +Process.waitall yet; it appeared on the last line of the example code above. +Process.waitall is simply a convenience method around Process.wait. It runs a +loop waiting for all child processes to exit and returns an array of +[pid, status] pairs. It's useful when you don't actually want to do anything +with the process status info and just need to wait for the children to exit. + +Appendix: Spyglass + +If you want to know even more about Unix processes then your next stop should +be the included Spyglass project. Why? Because it was written specifically to +showcase Unix programming concepts. +If you have a copy of this book but didn't get the included code project, send +me an email and I'll hook you up: jesse@jstorimer.com. +The case studies you read are meant to showcase the same thing, but at times +they can be dense and hard to read when you're new to Unix programming. +Spyglass is meant to bridge that gap. + +Spyglass' Architecture + +Spyglass is a web server. It opens a socket to the outside world and handles +web requests. 
Spyglass parses HTTP, is Rack-compliant, and is awesome. +Here's a brief summary of how to start a Spyglass server and what happens when +it receives an HTTP request. + +Booting Spyglass + + + $ spyglass + $ spyglass -p other_port + $ spyglass -h # for help + + +Before a Request Arrives + +After it boots, control is passed to Spyglass::Lookout. This class DOES NOT +preload the Rack application and knows nothing about HTTP; it just waits for a +connection. At this point in time Spyglass is extremely lightweight: it's +nothing more than an open socket. + +Connection is Made + +When Spyglass::Lookout is notified that a connection has been made it forks a +Spyglass::Master to actually handle the connection. Spyglass::Lookout uses +Process.wait after forking the master process, so it remains idle until the +master exits. +Spyglass::Master is responsible for preloading the Rack application and +forking/babysitting worker processes. The master process itself doesn't know +anything about HTTP parsing or request handling. +The real work is done in Spyglass::Worker. It accepts connections using the +method outlined in the chapter on preforking, leaning on the kernel for load +balancing. Once it has a connection it parses the HTTP request, calls the Rack +app, and writes the response to the client. + +Things Get Quiet + +So long as there is a steady flow of incoming traffic Spyglass continues to act +as a preforking server. If its internal timeout expires without any more +incoming requests being received then the master process, and all its +worker processes, exit. Control is returned to Spyglass::Lookout and the +workflow begins again. + +Getting Started + +Spyglass is not a production-ready server, so don't rush to start using it for +your projects! It's a codebase that's meant to be read. It's heavily commented +and formatted documentation is generated with rocco +(http://rtomayko.github.com/rocco). 
+The best thing to do at this point is enter the code directory that comes with +this book in your terminal, find the Spyglass codebase, and run rake read. This +will open up the formatted documentation in your browser for your reading +pleasure. +Now go forth and read the code! And may the fork(2) be with you! |
