Parallelisation & OpenMP

architecture model from the programmer's perspective:

shared memory
multiple processing units

programmed with multithreading: one code, one data heap, but multiple program counters, multiple register sets and multiple runtime stacks (one of each per thread). threads are provided by the OS.

OpenMP:

compiler directives, library functions, env variables
ideal: automatic parallelisation of seq code
but data dependencies are hard to assess, and compilers must be conservative
so add annotations to sequential program for parallelisation
- #pragma omp name [clause]*: general form of a directive
- #pragma omp parallel { ... }: the code block is executed by a team of threads
for gcc, compile and link with -fopenmp
control num threads with env variable OMP_NUM_THREADS or lib function omp_set_num_threads(int)
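a minimal sketch of a parallel region, to make the directive and the thread-count control concrete. count_team is a hypothetical helper name, not part of OpenMP; omp_get_thread_num and omp_get_num_threads are library functions that these notes do not mention. the stubs in the #else branch are an assumption so the file also builds without -fopenmp (it then runs with a single thread).

```c
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* sequential fallbacks so the file also builds without -fopenmp */
static void omp_set_num_threads(int n) { (void)n; }
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* hypothetical helper: runs one parallel region and returns the size
   of the thread team that executed it */
int count_team(int requested)
{
    int team = 1;                    /* declared outside the region: shared */

    omp_set_num_threads(requested);  /* takes precedence over OMP_NUM_THREADS */

    #pragma omp parallel
    {
        int id = omp_get_thread_num();   /* declared inside: one per thread */

        if (id == 0) {
            team = omp_get_num_threads();  /* only thread 0 writes: no race */
        }
        printf("hello from thread %d\n", id);
    }   /* implicit barrier: all threads are done here */

    return team;
}
```

the runtime may hand out fewer threads than requested, so the return value is at most the requested number. the order of the printed lines can change from run to run.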

Loop parallelisation:

have each thread compute some disjoint part of the vectors
precondition: there must be no data dependence between any two iterations.
#pragma omp parallel for divides the loop iterations among the threads and synchronizes them at the end of the loop (implicit barrier)
directive must directly precede the for-loop, the for-loop must match a constrained pattern (simple init, test and increment of one loop variable), and the trip-count must be known in advance (when you reach the loop, not necessarily at compile time)
private variables: one private instance for each thread, no communication between threads within the parallel section or between parallel and sequential sections
shared variables: one shared instance for all threads, communication between threads in the parallel section and between parallel and sequential sections. concurrent access to these is problematic.
can decide private/shared with clause: #pragma omp parallel for private(i) shared(c, a, b, len)
loop-carried dependence: an iteration reads a value that an earlier iteration has written (e.g. a[i] computed from a[i-1]). such loops cannot be parallelised this way.
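the notes above can be sketched with two loops: a vector addition whose iterations are independent, and a running sum with a loop-carried dependence. vec_add and prefix_sum are hypothetical names chosen for illustration; the variable names follow the private/shared clause quoted above.

```c
/* hypothetical example: c[i] = a[i] + b[i].
   every iteration touches only its own index i, so there is no
   dependence between iterations and the loop may be split up */
void vec_add(double *c, const double *a, const double *b, int len)
{
    int i;

    #pragma omp parallel for private(i) shared(c, a, b, len)
    for (i = 0; i < len; i++) {
        c[i] = a[i] + b[i];
    }
}

/* hypothetical counter-example: running sum in place.
   iteration i reads a[i - 1], which iteration i - 1 has just written:
   a loop-carried dependence, so the loop is left sequential */
void prefix_sum(double *a, int len)
{
    for (int i = 1; i < len; i++) {
        a[i] = a[i] + a[i - 1];
    }
}
```

putting the same pragma in front of the loop in prefix_sum would compile, but the result would then depend on the order in which the threads happen to run.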

concurrent access is like a fridge in a shared apartment, your beers can disappear at any time for any reason.

race condition/data race: the behaviour of the program depends on the execution order of program parts whose relative timing is beyond the programmer's control

a critical section is used to restrict thread interleaving: only one thread at a time executes in the critical section
#pragma omp critical { ... }
disadvantage: critical sections serialise the threads, which costs parallel speedup. all unnamed critical sections exclude each other; named critical sections (#pragma omp critical (name) { ... }) only synchronise with other critical sections of the same name
for accumulations such as a sum, can use a reduction(+:sum) clause in the parallel for pragma instead
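a small sketch comparing the two ways of protecting a shared sum. sum_critical and sum_reduction are hypothetical names; both are meant to return the same value, the difference is only in how much the threads wait for each other.

```c
/* hypothetical example: sum of v[0..len-1] with a critical section.
   correct, but every single addition is serialised */
long sum_critical(const int *v, int len)
{
    long sum = 0;   /* shared between all threads */

    #pragma omp parallel for shared(sum)
    for (int i = 0; i < len; i++) {
        #pragma omp critical
        {
            sum += v[i];   /* only one thread at a time updates sum */
        }
    }
    return sum;
}

/* hypothetical example: same sum with a reduction clause.
   each thread adds into its own private copy of sum; the copies are
   combined with + once, at the end of the loop */
long sum_reduction(const int *v, int len)
{
    long sum = 0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < len; i++) {
        sum += v[i];
    }
    return sum;
}
```

without either the critical section or the reduction clause, the update of sum would be a data race. the reduction version is usually the one to prefer, since the threads only meet once instead of once per iteration.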

Programming Multi-Core and Many-Core Systems
