Programming Multi-Core and Many-Core Systems

Multi-core vs many-core

CPU levels of parallelism

Cores/threads:

GPU levels of parallelism:

The GPU is usually connected to the host over PCI Express 3.0, with a theoretical rate of 8 GT/s (gigatransfers per second) per lane.
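As a rough worked example of what 8 GT/s means in bytes, the sketch below converts the transfer rate into usable bandwidth. It assumes the 128b/130b line encoding used by PCIe 3.0 and considers several link widths, neither of which is stated in these notes; the names `pcie3_bandwidth_gbs` and `print_pcie3_table` are made up for illustration. Measured host-to-GPU copies will be slower because of protocol overhead.

```c
#include <stdio.h>

/* Theoretical one-directional PCIe 3.0 bandwidth in GB/s (1 GB = 1e9 bytes).
 * Assumptions (not from the notes): 128b/130b encoding, one bit per
 * transfer per lane. */
double pcie3_bandwidth_gbs(int lanes)
{
    const double transfers_per_s = 8e9;      /* 8 GT/s per lane */
    const double encoding = 128.0 / 130.0;   /* payload bits per line bit */
    double bits_per_s = transfers_per_s * encoding * lanes;
    return bits_per_s / 8.0 / 1e9;           /* bits -> bytes -> GB */
}

/* Hypothetical helper: print the figure for common link widths. */
void print_pcie3_table(void)
{
    const int widths[] = {1, 4, 8, 16};
    for (int i = 0; i < 4; i++)
        printf("x%-2d  %.2f GB/s\n", widths[i], pcie3_bandwidth_gbs(widths[i]));
}
```

Under these assumptions one lane carries a little under 1 GB/s, and a x16 link about 15.75 GB/s in each direction.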

Why do CPU and GPU designs differ?

Locality: programs tend to use data and instructions whose addresses are near those used recently (spatial locality), and to reuse recently accessed items (temporal locality).
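A minimal sketch of spatial locality, assuming a square matrix of doubles stored in C's row-major layout. The size `N` and all function names are illustrative choices, not taken from the notes. Both functions compute the same sum; only the order in which memory is touched differs.

```c
#include <stddef.h>

#define N 512

static double grid[N][N];   /* C stores this row by row (row-major) */

/* Fill with a simple pattern so both traversals have something to sum. */
void fill_grid(void)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            grid[i][j] = (double)(i + j);
}

/* Inner loop walks consecutive addresses: good spatial locality. */
double sum_row_major(void)
{
    double sum = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            sum += grid[i][j];
    return sum;
}

/* Inner loop jumps N * sizeof(double) bytes per step: poor spatial locality. */
double sum_col_major(void)
{
    double sum = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            sum += grid[i][j];
    return sum;
}
```

On typical cached hardware the row-major traversal is expected to run faster once the matrix no longer fits in cache, because each fetched cache line is fully used before the next one is needed. The actual gap depends on the machine and on compiler optimisations.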

CPU caches:

Hardware performance metrics:

Main constraint for optimising compilers: an optimisation must not change program behavior. So when in doubt, the compiler must be conservative.
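Pointer aliasing is one common source of such doubt. The sketch below is an illustrative example rather than something from the notes, and the function names are hypothetical. In `add_scalar` the compiler cannot prove that `s` points outside `a`, so it has to allow for `*s` changing during the loop. The C99 `restrict` qualifier lets the programmer rule that out.

```c
/* Adds *s to every element of a. Because a[i] might be the same memory as
 * *s, each store to a[i] may change *s, so the compiler cannot simply
 * load *s once before the loop. */
void add_scalar(int *a, const int *s, int n)
{
    for (int i = 0; i < n; i++)
        a[i] += *s;
}

/* Same loop, but restrict (C99) promises that a and s never overlap, so
 * loading *s once becomes a legal optimisation. Calling this version with
 * overlapping pointers is undefined behaviour. */
void add_scalar_restrict(int *restrict a, const int *restrict s, int n)
{
    for (int i = 0; i < n; i++)
        a[i] += *s;
}
```

With `a = {1, 2, 3}` and `s` pointing at `a[0]`, `add_scalar` produces `{2, 4, 5}`, whereas a version that loaded `*s` only once would produce `{2, 3, 4}`. That difference is exactly the change in behavior the compiler must not introduce on its own.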

In-core parallelism: