Learning Notes - How Linux Processes Are Created
1. The Big Picture: Overview of fork(), exit(), wait(), and execve()
These four system calls are the fundamental building blocks of process creation and management in UNIX/Linux.
- fork(): Creates a new child process that is an almost exact duplicate of the parent process.
- execve(): Replaces the current process's memory (text, data, stack, heap) with a completely new program.
- exit(): Terminates a process and releases its resources, leaving only an exit status for the parent to collect.
- wait(): Pauses the parent process until one of its children terminates, allowing the parent to collect the child's exit status so that the child does not linger as a "zombie" process.
Process Lifecycle Sequence Diagram
sequenceDiagram
participant P as Parent Process
participant K as Kernel
participant C as Child Process
P->>K: 1. fork()
K-->>P: Returns child PID
K-->>C: Returns 0
Note over C: Child is a duplicate<br/>of parent (COW)
C->>K: 2. execve("/bin/ls", ...)
Note over C: OLD memory DESTROYED<br/>NEW program loaded
C->>K: 3. exit(status)
Note over C: Process terminates<br/>Becomes ZOMBIE until reaped
K-->>P: SIGCHLD signal
P->>K: 4. wait(&status)
K-->>P: Returns child PID + status
Note over P: Zombie reaped<br/>Parent continues
2. fork() and File Descriptors
When fork() is called, the child process inherits copies of the parent's file descriptors. Crucial concept: these descriptors do not refer to separately opened files; they point to the same underlying "Open File Description" in the kernel.
Analogy: The Shared Safe
Imagine a bank safe (the Open File Description). The parent has a key (File Descriptor 3). When fork() happens, the bank makes an exact copy of the key and gives it to the child (the child's File Descriptor 3). Both keys open the exact same safe, and more importantly, they share the same internal pointer (the file offset). If the parent reads 10 bytes, the internal pointer moves 10 bytes forward. When the child uses its key to read next, it will start from the 11th byte.
Diagram: File Descriptor Inheritance
flowchart LR
subgraph ProcessA ["Process A (Parent)"]
direction TB
A_fd0["FD 0"]
A_fd2["FD 2"]
A_fd20["FD 20"]
end
subgraph ProcessB ["Process B (Child)"]
direction TB
B_fd0["FD 0"]
B_fd2["FD 2"]
B_fd3["FD 3"]
end
subgraph OpenFileTable ["System-wide Open File Table"]
direction TB
OFD_1["Open File Description 1<br/>Offset: 0<br/>Flags: O_RDONLY"]
OFD_2["Open File Description 2<br/>Offset: 100<br/>Flags: O_RDWR"]
OFD_3["Open File Description 3<br/>Offset: 0<br/>Flags: O_WRONLY"]
end
subgraph InodeTable ["System-wide i-node Table"]
direction TB
Inode_1["i-node 1976<br/>Size: 1024<br/>Perms: 0644"]
Inode_2["i-node 224<br/>Size: 2048<br/>Perms: 0644"]
end
%% Inherited via fork()
A_fd2 -->|"Inherited via fork"| OFD_2
B_fd2 -->|"Inherited via fork"| OFD_2
%% Duplicated via dup()
A_fd20 -->|"Duplicated via dup"| OFD_2
%% Standard inheritance for stdin/stdout
A_fd0 --> OFD_1
B_fd0 --> OFD_1
%% Opened independently after fork(): each open() call creates its own open file description
B_fd3 -.->|"Opened independently<br/>after fork"| OFD_3
%% OFD to i-node mapping
OFD_1 --> Inode_1
OFD_2 --> Inode_1
OFD_3 --> Inode_2
3. fork() and Memory Semantics (Copy-on-Write)
Historically, fork() duplicated the parent's entire memory space for the child. This was slow and wasted RAM. Modern Linux uses Copy-on-Write (COW).
Analogy: The Shared Textbook
Imagine the parent and child are given a textbook (the memory pages). Instead of printing two expensive books, the library gives them one shared book, but locks it in a glass case marked "Read-Only".
- If both just read it, no extra cost is incurred.
- If the child wants to take notes (write to memory), the glass case breaks (a Page Fault occurs). The librarian quickly photocopies only that specific page and gives the copy to the child to write on.
- This is Copy-on-Write: Memory is only physically copied when a process actually tries to modify it.
1. Copy-on-Write (COW) Structural Illustration: Memory Mapping After a Write
This flowchart illustrates the state of the virtual and physical memory after the child process has attempted to write to a shared page, triggering the Copy-on-Write mechanism.
flowchart TD
subgraph Virtual ["Virtual Address Space"]
subgraph ParentVM ["Parent Process"]
P_PT["Parent Page Table"]
end
subgraph ChildVM ["Child Process"]
C_PT["Child Page Table"]
end
end
subgraph Physical ["Physical Memory (RAM)"]
P1["Physical Page A<br/>(Original Data)<br/>Ref Count: 1"]
P2["Physical Page B<br/>(Copied Data)<br/>Ref Count: 1"]
end
P_PT -->|Maps to| P1
C_PT -. "Updated mapping<br/>after COW" .-> P2
style P1 fill:#e1f5fe,stroke:#01579b,stroke-width:2px
style P2 fill:#fff3e0,stroke:#e65100,stroke-width:2px
style P_PT fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style C_PT fill:#fce4ec,stroke:#c2185b,stroke-width:2px
2. Detailed Step-by-Step Explanation
To truly understand Copy-on-Write (COW), we have to look at the interaction between the Process, the hardware Memory Management Unit (MMU), and the Kernel. Here is the exact sequence of events:
- The fork() Call: When a process calls fork(), the kernel creates a new Page Table for the child. Crucially, it does not copy the physical memory pages. Instead, it points both the Parent's and Child's page tables to the exact same physical page frames in RAM. These shared physical pages are temporarily marked as Read-Only (or Copy-on-Write) by the kernel.
- The Write Attempt (The Trigger): Both processes can read from these shared pages without any issues. However, the moment the Child process attempts to modify (write to) one of these shared pages, the hardware MMU detects the violation (since the page is marked Read-Only) and halts the process, triggering a Page Fault.
- Kernel Intervention (The Copy): The Kernel's page fault handler intercepts this fault. It recognizes it as a COW fault. The kernel allocates a new, empty physical page frame in RAM, copies the exact contents of the original shared page into this new page, and marks the new page as Read-Write.
- Page Table Update: The kernel updates the Child's page table to point to the newly copied physical page. The Parent's page table is left untouched and continues to point to the original physical page.
- Resumption: The kernel resumes the Child process. The Child's write operation now succeeds because it is writing to its own private, Read-Write physical page.
The Result: Both processes now have their own independent copies of the modified data. However, any pages that were never modified remain shared in physical RAM, saving massive amounts of memory.
3. The COW Mechanism in Action: Sequence Diagram
This sequence diagram visualizes the exact chronological flow of the Page Fault and the Kernel's Copy-on-Write intervention described above.
sequenceDiagram
participant Child as Child Process
participant MMU as Memory Management Unit (MMU)
participant Kernel as Kernel Page Fault Handler
participant RAM as Physical RAM
Note over Child, RAM: Initial State: Both Parent & Child point to the same Read-Only physical page.
Child->>MMU: 1. Attempts to write to shared page
MMU->>MMU: 2. Checks page permissions (Sees Read-Only)
MMU->>Kernel: 3. Triggers Page Fault (COW Fault)
Note over Kernel: Kernel intercepts the fault and recognizes it as a COW event.
Kernel->>RAM: 4. Allocates a NEW physical page frame
Kernel->>RAM: 5. Copies data from old page to new page
Kernel->>Kernel: 6. Updates Child's Page Table to point to the new page
Kernel->>MMU: 7. Resumes Child Process
MMU->>RAM: 8. Child successfully writes to the new private page
Summary of the "Magic" of COW
- Before fork(): 1 set of physical pages.
- Immediately after fork(): Still 1 set of physical pages (shared, marked Read-Only).
- After a write: 2 sets of physical pages (only the specific pages that were written to are duplicated).
This mechanism is why fork() is fast and memory-efficient, even for massive processes like web browsers or databases. The kernel still has to duplicate the page tables, but it only pays the "cost" of copying a page's contents if one of the processes actually modifies that page.
Note: The Text Segment (program code) is almost always shared and remains Read-Only, as programs rarely modify their own executable code.
4. The Magic of fork() + execve()
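Why do we usually call fork() immediately followed by execve()? UNIX splits "create a process" and "load a program" into two separate calls, and Copy-on-Write is what makes that split cheap. After fork(), the child only shares the parent's pages; nothing has been copied yet. If the child calls execve() right away, the kernel discards those shared mappings (text, data, stack, heap) and replaces them with the new program's segments, so almost no memory was ever duplicated for a child that was going to throw it away anyway.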
5. Example Code: Putting it all together
Here is a complete C program demonstrating fork(), execve(), and wait().
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t child_pid;
    int status;

    printf("Parent: My PID is %d\n", getpid());

    // 1. Create a child process
    child_pid = fork();
    if (child_pid == -1) {
        perror("fork failed");
        exit(EXIT_FAILURE);
    }

    if (child_pid == 0) {
        // --- CHILD PROCESS ---
        printf("Child: My PID is %d. I am about to execve()!\n", getpid());
        fflush(stdout); // execve discards any output still sitting in the stdio buffer

        // Prepare arguments for execve
        // execve requires: path to program, array of arguments, array of environment vars
        char *args[] = {"/bin/ls", "-l", NULL};
        char *env[] = {NULL}; // Empty environment; pass `environ` instead to inherit the parent's

        // 2. Replace child's memory with the '/bin/ls' program
        // This destroys the child's original text, data, stack, and heap!
        execve("/bin/ls", args, env);

        // If execve returns, it MUST have failed.
        perror("execve failed");
        _exit(EXIT_FAILURE); // _exit: do not flush stdio buffers copied from the parent
    } else {
        // --- PARENT PROCESS ---
        printf("Parent: Forked child with PID %d. Waiting for child to finish...\n", child_pid);

        // 3. The child (now running ls) calls exit() on its own when it is done.
        // 4. Wait for the child to terminate and collect its exit status
        if (wait(&status) == -1) {
            perror("wait failed");
            exit(EXIT_FAILURE);
        }
        if (WIFEXITED(status)) {
            printf("Parent: Child exited normally with status %d\n", WEXITSTATUS(status));
        } else {
            printf("Parent: Child exited abnormally.\n");
        }
    }
    return 0;
}
Compile and run:
gcc -o fork_exec_demo fork_exec_demo.c
./fork_exec_demo
You will see the output of ls along with the process information printed by the program. Let's break down what happens under the hood using strace.
strace -f -e trace=process sh -c './fork_exec_demo'
Here is a detailed breakdown of the strace log, mapping it to the concepts from The Linux Programming Interface (TLPI).
1. The Process Tree
First, let's identify the three processes involved in this trace based on their PIDs:
- PID 21848 (sh): The outermost shell process executing sh -c './fork_exec_demo'.
- PID 21849 (./fork_exec_demo): Your compiled C program.
- PID 21850 (/bin/ls): The child created by your C program (the shell's grandchild), which executes /bin/ls.
2. Step-by-Step Log Breakdown
Phase 1: Shell starts your program
execve("/usr/bin/sh", ["sh", "-c", "./fork_exec_demo"], ...) = 0
vfork(strace: Process 21849 attached
<unfinished ...>
[pid 21849] execve("./fork_exec_demo", ["./fork_exec_demo"], ...) <unfinished ...>
[pid 21848] <... vfork resumed>) = 21849
[pid 21849] <... execve resumed>) = 0
- execve: The shell starts.
- vfork: Notice that sh uses vfork() instead of fork() to launch your program. This is a common optimization in shells. Because the shell knows the child will immediately call execve() to replace itself, vfork() avoids copying the parent's page tables, saving time and memory. The parent (sh) is blocked until the child calls execve() or _exit().
- execve: PID 21849 loads and executes ./fork_exec_demo.
Phase 2: Your program forks a child
[pid 21848] wait4(-1, Parent: My PID is 21849
<unfinished ...>
[pid 21849] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7728dd60da10) = 21850
Child: My PID is 21850. I am about to execve()!
Parent: Forked child with PID 21850. Waiting for child to finish...
- wait4: PID 21848 (sh) is now blocked in wait4() (the system call underlying glibc's wait()), waiting for PID 21849 to finish.
- clone: Your program (PID 21849) calls the C library fork(). Under the hood, glibc implements fork() using the clone() system call. SIGCHLD in the flags is the signal the kernel sends to the parent when the child terminates, while CLONE_CHILD_SETTID and CLONE_CHILD_CLEARTID let glibc keep the child's thread ID bookkeeping correct. The call creates PID 21850.
- Printfs: The child and parent print their respective messages.
Phase 3: The Grandchild executes and exits
[pid 21850] execve("/bin/ls", ["/bin/ls", "-l"], ...) <unfinished ...>
[pid 21849] wait4(-1, <unfinished ...>
[pid 21850] <... execve resumed>) = 0
total 112
-rwxrwxr-x 1 ubuntu ubuntu 26736 Jun 20 19:10 a.out
... (ls output omitted for brevity) ...
[pid 21850] exit_group(0) = ?
[pid 21850] +++ exited with 0 +++
- execve: PID 21850 replaces itself with /bin/ls -l.
- wait4: PID 21849 (your program) is now blocked in wait4(), waiting for PID 21850 (ls) to finish.
- ls output: The directory listing is printed to stdout.
- exit_group(0): PID 21850 finishes and exits with status 0. (exit_group is the Linux syscall that terminates all threads in the calling process's thread group, which is what glibc's exit() calls.)
Phase 4: Signals and Parent Cleanup
[pid 21849] <... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 21850
[pid 21849] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=21850, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
Parent: Child exited normally with status 0
[pid 21849] exit_group(0) = ?
[pid 21849] +++ exited with 0 +++
- wait4 resumes: PID 21849's wait4() unblocks because the child exited. It returns the child's PID (21850) and populates the status integer.
- SIGCHLD: The kernel delivers a SIGCHLD signal to PID 21849. Because your C program didn't install a custom handler for SIGCHLD, the default action (ignore) applies, but the signal is still delivered and recorded by strace.
- exit_group(0): Your program prints its final message and exits normally.
Phase 5: Shell Cleanup
<... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 21849
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=21849, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
wait4(-1, 0x7ffce0a5206c, WNOHANG, NULL) = -1 ECHILD (No child processes)
exit_group(0) = ?
+++ exited with 0 +++
- wait4 resumes: The outer sh (PID 21848) unblocks, seeing that PID 21849 exited.
- SIGCHLD: sh receives SIGCHLD for your program.
- wait4(..., WNOHANG) = -1 ECHILD: Shells often have internal loops to reap any remaining zombie children. Here, sh does a non-blocking wait4() with WNOHANG. Because all children have already been reaped, it returns -1 with ECHILD (No child processes).
- exit_group(0): sh exits, and the trace ends.
3. Key Takeaways from this Trace
- glibc fork() vs Linux clone(): TLPI presents fork() as the call that creates a new process. This trace shows that on Linux, glibc's fork() wrapper invokes the clone() system call with specific flags (SIGCHLD, CLONE_CHILD_SETTID, etc.) to provide standard POSIX fork() semantics.
- vfork() Optimization: The trace shows sh using vfork(). As TLPI explains, vfork() suspends the parent until the child calls execve() or _exit(). This is a useful performance optimization because it avoids duplicating the parent's page tables.
- wait() vs wait4(): Similarly, glibc's wait() is implemented using the Linux-specific wait4() system call under the hood.
- Signal Delivery (SIGCHLD): The trace explicitly shows the SIGCHLD signal being delivered to each parent process once its child has terminated via exit_group(). This illustrates the asynchronous nature of signals and process termination described in TLPI.
References & Books That I Used to Write This Article
- The Linux Programming Interface by Michael Kerrisk (Chapter 6, Sections 24.1 and 24.2.1)
- vfork
- wait
- SIGCHLD
- LWN.net
Cheers