OS Note Chapter 4: Threads

2023-03-28

Overview

Motivation Behind Threads

  • Multiple tasks within an application can be implemented by separate threads (Scalability, Responsiveness):
    • Update display
    • Fetch data
    • Spell checking
    • Answer a network request
  • Process creation is heavy-weight while thread creation is light-weight (Economy)
  • Threads run within the application/process (Resource Sharing)
  • Can simplify code, increase efficiency
  • Most modern applications are multithreaded
  • Kernels are generally multithreaded

Single and Multithreaded Processes

(Figure on slide 4: single-threaded vs. multithreaded process.)

Multithreaded Server Architecture

(Figure on slide 5: multithreaded server architecture.)

Benefits

  • Responsiveness – may allow continued execution if part of process is blocked, especially important for user interfaces
  • Resource Sharing – threads share resources of process, easier than shared memory or message passing
  • Economy – cheaper than process creation, thread switching lower overhead than context switching
  • Scalability – process can take advantage of multicore architectures, with one or two threads per core
  • But:
    • More difficult to program with threads (a single process can now do multiple things at the same time).
    • New categories of bugs become possible (synchronization between threads is then required: Chapter 6).

What is a thread?

  • A thread is the smallest sequence of programmed instructions within a process that can be managed independently by an OS scheduler; in other words, a single sequential stream of execution within a process.

  • The implementation of threads and processes differs between operating systems, but in most cases a thread is a component of a process.

  • Multiple threads can exist within one process, executing concurrently and sharing resources such as memory, while different processes do not share these resources.

  • The threads of a process share its executable code and the values of its variables (code section, data section, OS resources) at any given time.

Threads vs. processes

Similarities (like processes)

  • Threads share the CPU, and only one thread is active (running) at a time.
  • Threads within a process execute sequentially.
  • A thread can create children.
  • If one thread is blocked, another thread can run.

Threads differ from processes in that:

| Processes | Threads |
| --- | --- |
| typically independent | subsets of a process |
| carry more state information | share process state and resources |
| separate address spaces | same address space |
| interact through IPC (shared memory / message passing) | interact through shared variables |
| slower context switching | faster context switching |
| might or might not assist one another | designed to assist one another |

Multicore Programming

  • Multicore or multiprocessor systems put pressure on programmers; challenges include:
    • Dividing activities
    • Load Balance
    • Data splitting
    • Data dependency
    • Testing and debugging

Parallelism and Concurrency

  • *Parallelism* (doing things at the same time) implies a system can perform more than one task simultaneously
    • Multiple processors / cores are required
    • Parallelism is a run-time property where two or more tasks are being executed simultaneously.
  • *Concurrency* (making progress together) supports more than one task making progress
    • On a single processor / core, the CPU scheduler provides concurrency by doing context switches
    • Concurrency is a property of a program where two or more tasks can be in progress simultaneously.
  • Parallelism implies concurrency, but concurrency does not imply parallelism

Concurrency vs. Parallelism

  • Concurrent execution on single-core system:

(Figure on slide 11: concurrent execution on a single-core system.)

  • Parallelism on a multi-core system:

(Figure on slide 11: parallel execution on a multi-core system.)


Types of parallelism

Data parallelism

distributes subsets of the same data across multiple cores, performing the same operation on each subset

  • Example: when doing image processing, two cores can each process half of the image

Task parallelism

distributes threads across cores, each thread performing a unique operation

  • Example: when doing sound processing, the sound data can move through each core in sequence, with each core doing a different kind of sound processing (filtering, echo, etc.). A data-parallelism sketch follows below.

(Figure on slide 14: data parallelism vs. task parallelism.)
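To make the distinction concrete, here is a minimal data-parallelism sketch in C with Pthreads (my own illustration, not from the slides; the names `sum_half` and `sums` are hypothetical): each thread applies the same operation (summing) to its own half of the data.

```c
#include <pthread.h>
#include <stdio.h>

#define N 8
static int data[N] = {1, 2, 3, 4, 5, 6, 7, 8};
static long sums[2];  /* one partial result per thread */

/* Data parallelism: the SAME operation (summing) on DIFFERENT halves. */
void *sum_half(void *arg) {
    long id = (long)arg;
    long lo = id * (N / 2), hi = lo + N / 2, s = 0;
    for (long i = lo; i < hi; i++)
        s += data[i];
    sums[id] = s;
    return NULL;
}

int main(void) {
    pthread_t t[2];
    for (long id = 0; id < 2; id++)      /* one thread per half */
        pthread_create(&t[id], NULL, sum_half, (void *)id);
    for (int id = 0; id < 2; id++)
        pthread_join(t[id], NULL);
    printf("total = %ld\n", sums[0] + sums[1]);  /* prints 36 */
    return 0;
}
```

Task parallelism, by contrast, would give each thread a different operation (e.g., one thread filtering while another adds echo).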

Amdahl’s Law

  • Identifies performance gains from adding additional cores to an application that has both serial and parallel components

$speedup\leq\frac{1}{S+\frac{1-S}{N}}$, where $S$ is the serial portion and $N$ is the number of processing cores.

  • Proof:

$S$ is serial portion, $P$ is parallel portion of program.

So $S + P = 100\% = 1$

Running time on one core: $R_1 = S + P = 1$

Running time on $N$ cores: $R_N \geq S + \frac{P}{N} = S + \frac{1-S}{N}$

$\geq$, not $=$, because of the extra communication required between threads.

$speedup = \frac{R_1}{R_N}\leq\frac{1}{S+\frac{1-S}{N}}$

  • Example: if the application is 75% parallel and 25% serial, moving from 1 to 2 cores: 0.25 + 0.75/2 = 0.625, results in a maximum speedup of 1/0.625 = 1.6 times.

  • As N approaches infinity, the maximum speedup approaches 1 / S

    • Serial portion of an application has disproportionate effect on performance gained by adding additional cores

Gustafson’s Law

Gustafson’s law addresses a shortcoming of Amdahl’s law, which assumes a fixed problem size, that is, an execution workload that does not change as the resources improve.

$Speedup = s + p \cdot N = s + (1-s) \cdot N = N + (1-N) \cdot s$

where

$Speedup$ is the theoretical speedup of the program with parallelism (scaled speedup); $N$ is the number of processors; $s$ and $p$ are the fractions of time spent executing the serial parts and the parallel parts of the program on the parallel system, with $s + p = 1$.

  • Example: if the application is 75% parallel and 25% serial, moving from 1 to 2 cores: $2 + (1-2) \times 0.25 = 1.75$, i.e., 1.75 times faster (see the sketch below).
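As a quick arithmetic check of both laws, here is a small C sketch (my own illustration, not from the slides) that evaluates the two formulas for the 25%-serial example above:

```c
#include <stdio.h>

/* Amdahl: speedup bound for a FIXED workload with serial fraction s. */
double amdahl(double s, int n) {
    return 1.0 / (s + (1.0 - s) / n);
}

/* Gustafson: scaled speedup when the workload GROWS with the machine. */
double gustafson(double s, int n) {
    return n + (1 - n) * s;
}

int main(void) {
    double s = 0.25;  /* 25% serial, 75% parallel, as in the examples */
    for (int n = 1; n <= 16; n *= 2)
        printf("N=%2d  Amdahl <= %.3f   Gustafson = %.3f\n",
               n, amdahl(s, n), gustafson(s, n));
    return 0;
}
```

Note how the Amdahl bound levels off toward $1/S = 4$ while the Gustafson speedup keeps growing with $N$.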

Comparison Amdahl’s Law and Gustafson’s Law

In Amdahl’s law, the computational workload (problem) is fixed while the number of processors can be increased.

However, the speedup cannot be increased to infinity even if the number of processors is increased to infinity; the serial portion is therefore referred to as the sequential bottleneck of multiprocessor systems.

If your goal is to solve a problem faster, this law is reasonable.

In Gustafson’s law, the workload grows with the machine size, so the system can retain scalability as the number of processors increases.

If the workload is scaled up to maintain a fixed execution time as the number of processors increases, the speedup increases linearly.

https://www.d.umn.edu/~tkwon/course/5315/HW/MultiprocessorLaws.pdf

User Threads and Kernel Threads

User Threads

User threads - management (thread creation, thread scheduling, etc.) done by user-level threads library.

  • Advantages:
    • No need for OS support, so it works even on a very old or very simple OS that has no system calls for thread management.
    • No system call required, so thread management is fast: only a library function call is needed.
  • Disadvantages:
    • When the kernel schedules processes, a process with only one thread gets as much CPU time as a process with many threads.
    • All thread scheduling inside a process must be done at user level (not by the kernel), so each thread must cooperate with the other threads in the process and regularly yield CPU time to them. This makes the program more complicated to write.

Kernel Threads

Kernel threads - supported by the kernel.

  • Advantages:
    • Kernel knows how many threads each process contains so it can give more CPU time to the processes with many threads.
    • No need for threads to cooperate for scheduling (thread scheduling done automatically by kernel) so user program simpler to write.
  • Disadvantages:
    • Every thread management operation requires a system call, so it is slower than user-level threads.
    • Kernel’s PCB data structures more complex because the kernel needs to keep track of both processes and threads inside processes.

Examples

Virtually all general-purpose operating systems support kernel threads, including:

  • Windows
  • Linux
  • Mac OS X
  • iOS
  • Android

Multithreading Models

  • If threads are available both at user level and kernel level, then some user threads are normally associated with some kernel threads.
  • Several models of association between user threads and kernel threads are possible:
    • Many-to-One
    • One-to-One
    • Many-to-Many

Many-to-One

  • Many user-level threads mapped to single kernel thread.

  • One thread blocking (waiting for something) causes all threads to block (because their common kernel thread is blocked).

  • Multiple threads may not run in parallel on a multicore system because only one may be in the kernel at a time.

  • Few systems currently use this model.

  • Examples:

    • Solaris Green Threads
    • GNU Portable Threads

One-to-One

  • Each user-level thread maps to a kernel thread.

  • Creating a user-level thread creates a kernel thread.

  • More concurrency than many-to-one.

  • Number of threads per process sometimes restricted due to overhead.

  • Examples:

    • Windows
    • Linux

Many-to-Many Model

  • Allows many user level threads to be mapped to many kernel threads.
  • Allows the operating system to create a sufficient number of kernel threads.
  • Example: Windows with the ThreadFiber package.
  • Otherwise not very common.

Thread Libraries

  • Thread library provides programmer with API for creating and managing threads

  • Two primary ways of implementing

    • Library entirely in user space (user threads only)
    • OS-level library supported by the kernel (user threads mapped to kernel threads, with one-to-one model for example).
  • Three primary thread libraries:

    1. POSIX Pthreads
    2. Windows threads
    3. Java threads

Pthreads (POSIX Standard API)

  • May be provided either as user-level or kernel-level

  • A POSIX standard (IEEE 1003.1c) API for thread creation and synchronization

  • Specification, not implementation

  • API specifies behavior of the thread library, implementation is up to developers of the library

  • Common in UNIX operating systems (Linux & Mac OS X)

  • Examples: refer to the slides; a minimal sketch follows below
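For concreteness, here is a minimal sketch in the spirit of the classic textbook example (the function name `runner` and the summation task are illustrative): the main thread creates one worker with `pthread_create`, waits for it with `pthread_join`, then prints the shared result.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int sum;  /* shared by the thread(s) */

/* Thread entry point: sums the integers 1..upper. */
void *runner(void *param) {
    int upper = atoi(param);
    sum = 0;
    for (int i = 1; i <= upper; i++)
        sum += i;
    pthread_exit(0);
}

int main(int argc, char *argv[]) {
    pthread_t tid;        /* thread identifier */
    pthread_attr_t attr;  /* thread attributes */

    if (argc != 2) {
        fprintf(stderr, "usage: %s <integer>\n", argv[0]);
        return 1;
    }
    pthread_attr_init(&attr);                      /* default attributes */
    pthread_create(&tid, &attr, runner, argv[1]);  /* create the thread */
    pthread_join(tid, NULL);                       /* wait for it to finish */
    printf("sum = %d\n", sum);
    return 0;
}
```

Compile with `gcc prog.c -pthread`.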

Implicit Threading

  • Growing in popularity: as the number of threads increases, ensuring program correctness with explicit threads becomes more difficult

  • Creation and management of threads done by compilers and run-time libraries rather than programmers

  • Five methods explored

    1. Thread Pools
    2. Fork-Join
    3. OpenMP (http://www.openmp.org/)
    4. Grand Central Dispatch
    5. Intel Threading Building Blocks (TBB)

Thread Pools

  • Create a number of threads in advance in a pool, where they await work
  • Advantages:
    • Usually slightly faster to service a request with an existing thread than to create a new thread
    • Allows the number of threads in the application(s) to be bound to the size of the pool
    • Separating the task to be performed from the mechanics of creating the task allows different strategies for running the task
      • e.g., tasks could be scheduled to run periodically
  • The Windows API supports thread pools (see the sketch below)
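As an illustration of the last point, a minimal sketch using the Win32 `QueueUserWorkItem` call, which hands a task to the process's default thread pool instead of creating a dedicated thread (the `Sleep` at the end is a crude stand-in for real synchronization):

```c
#include <windows.h>
#include <stdio.h>

/* Work item: runs on a thread borrowed from the default pool. */
DWORD WINAPI PoolFunction(LPVOID Param) {
    printf("running task %d on a pool thread\n", *(int *)Param);
    return 0;
}

int main(void) {
    int arg = 5;
    /* Queue the task; the pool supplies (and later reclaims) the thread. */
    QueueUserWorkItem(PoolFunction, &arg, WT_EXECUTEDEFAULT);
    Sleep(100);  /* crude wait so the work item can run before main exits */
    return 0;
}
```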

Fork-Join Parallelism

  • Multiple threads (tasks) are forked, and then joined.
    • Available in Java (since Java SE 7).
    • Similar to the Hadoop MapReduce operation.

(Figure on slide 34: fork-join parallelism.)

Somewhat like divide-and-conquer (see the sketch below).
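Java's fork-join framework is library-level; as a language-neutral illustration of the same pattern, here is a divide-and-conquer sum sketched with Pthreads (my own sketch; a real fork-join library would reuse pool threads rather than fork a new thread per subtask):

```c
#include <pthread.h>
#include <stdio.h>

/* Fork-join, divide-and-conquer style: split the range, fork two
 * subtasks as threads, then join them and combine the results. */
typedef struct { int *a; int lo, hi; long sum; } Task;

void *sum_task(void *arg) {
    Task *t = arg;
    if (t->hi - t->lo <= 2) {            /* base case: solve directly */
        t->sum = 0;
        for (int i = t->lo; i < t->hi; i++)
            t->sum += t->a[i];
    } else {                             /* fork two halves, then join */
        int mid = (t->lo + t->hi) / 2;
        Task left  = {t->a, t->lo, mid,   0};
        Task right = {t->a, mid,   t->hi, 0};
        pthread_t tl, tr;
        pthread_create(&tl, NULL, sum_task, &left);
        pthread_create(&tr, NULL, sum_task, &right);
        pthread_join(tl, NULL);
        pthread_join(tr, NULL);
        t->sum = left.sum + right.sum;   /* combine step */
    }
    return NULL;
}

int main(void) {
    int a[] = {1, 2, 3, 4, 5, 6, 7, 8};
    Task root = {a, 0, 8, 0};
    sum_task(&root);                     /* run the root task */
    printf("sum = %ld\n", root.sum);     /* prints 36 */
    return 0;
}
```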

OpenMP

  • Set of compiler directives and an API for C, C++, FORTRAN
    • Provides support for parallel programming in shared-memory environments
    • Identifies parallel regions – blocks of code that can run in parallel
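For example, the `#pragma omp parallel` directive marks a parallel region; the compiler and OpenMP runtime create and manage the threads:

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    /* Parallel region: the compiler/runtime creates a team of threads. */
    #pragma omp parallel
    printf("hello from thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());

    int a[1000];
    /* Parallel loop: iterations are divided among the threads. */
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++)
        a[i] = 2 * i;
    printf("a[999] = %d\n", a[999]);

    return 0;
}
```

Compile with `gcc prog.c -fopenmp`.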

Thread-Local Storage

  • Thread-local storage (TLS) allows each thread to have its own copy of data

    • Useful when you do not have control over the thread creation process (e.g., when using a thread pool)
    • Different from local variables
      • Local variables are visible only during a single function invocation
      • TLS is visible across function invocations
    • Similar to static data, except that TLS is unique to each thread
  • On Linux, a TLS variable is declared as: __thread int number; (see the sketch below)
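A minimal sketch of `__thread` in action (illustrative; `__thread` is the GCC/Clang spelling on Linux, while C11 spells it `_Thread_local`): each worker writes its own copy, and main's copy is untouched.

```c
#include <pthread.h>
#include <stdio.h>

/* Each thread gets its own copy of 'number' (per-thread storage). */
static __thread int number = 0;

void *worker(void *arg) {
    number = (int)(long)arg;  /* writes only this thread's copy */
    printf("thread sees number = %d\n", number);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("main still sees number = %d\n", number);  /* prints 0 */
    return 0;
}
```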

Thread Cancellation

  • Terminating a thread before it has finished
  • The thread to be canceled is called the target thread
    • Two general approaches:
      1. Asynchronous cancellation terminates the target thread immediately
      2. Deferred cancellation allows the target thread to periodically check if it should be cancelled
  • Invoking thread cancellation requests cancellation, but actual cancellation depends on thread state:

| Mode | State | Type |
| --- | --- | --- |
| Off | Disabled | - |
| Deferred | Enabled | Deferred |
| Asynchronous | Enabled | Asynchronous |
  • If thread has cancellation disabled, cancellation remains pending until thread enables it

  • Default type is deferred

    • Cancellation only occurs when the thread reaches a cancellation point
      • e.g., pthread_testcancel() (see the sketch below)
      • Then the cleanup handler is invoked

  • On Linux systems, thread cancellation is handled through signals
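A minimal deferred-cancellation sketch with Pthreads (illustrative): the target thread polls `pthread_testcancel()`, so the cancellation request issued by `pthread_cancel()` takes effect only at that point.

```c
#include <pthread.h>
#include <stdio.h>

/* Target thread: deferred cancellation, honored only at cancellation points. */
void *worker(void *arg) {
    (void)arg;
    int oldtype;
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &oldtype);  /* the default */
    while (1) {
        /* ... do one unit of work ... */
        pthread_testcancel();  /* cancellation point: act on pending requests */
    }
    return NULL;  /* not reached */
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);
    pthread_cancel(tid);      /* request cancellation of the target thread */
    pthread_join(tid, NULL);  /* returns once the target has terminated */
    puts("target thread cancelled");
    return 0;
}
```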

Example: Windows Threads

  • Windows API – primary API for Windows applications
  • Implements the one-to-one mapping, kernel-level
  • Each thread contains
    • A thread id
    • Register set representing state of processor
    • Separate user and kernel stacks for when thread runs in user mode or kernel mode
    • Private data storage area used by run-time libraries and dynamic link libraries (DLLs)
  • The register set, stacks, and private data storage area are known as the context of the thread
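For reference, a minimal sketch in the style of the textbook's Windows example, using `CreateThread` and `WaitForSingleObject` (the one-to-one model: this call creates a kernel thread; the name `Summation` is illustrative):

```c
#include <windows.h>
#include <stdio.h>

DWORD Sum;  /* shared by the created thread */

/* Thread entry point: sums the integers 1..Upper. */
DWORD WINAPI Summation(LPVOID Param) {
    DWORD Upper = *(DWORD *)Param;
    for (DWORD i = 1; i <= Upper; i++)
        Sum += i;
    return 0;
}

int main(void) {
    DWORD ThreadId, Param = 5;
    HANDLE ThreadHandle = CreateThread(
        NULL,        /* default security attributes */
        0,           /* default stack size */
        Summation,   /* thread function */
        &Param,      /* parameter to the thread function */
        0,           /* default creation flags */
        &ThreadId);  /* receives the thread identifier */

    if (ThreadHandle != NULL) {
        WaitForSingleObject(ThreadHandle, INFINITE);  /* wait (join) */
        CloseHandle(ThreadHandle);
        printf("sum = %lu\n", Sum);
    }
    return 0;
}
```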