CUDA introduction part 1

CUDA is NVIDIA's parallel computing platform and programming model. Its purpose is to parallelize workloads across the many cores of a GPU, which NVIDIA calls CUDA cores. It is not the ideal compute resource for most applications; single-threaded applications, for example, gain nothing from it. A general-purpose GPU (GPGPU) computing platform like CUDA is best suited to massively parallelizable applications. One example is computing the determinant of a large $N \times N$ matrix.

Basic Definitions

Heterogeneous computing: using different kinds of processors in the same program, typically by offloading certain types of operations from the CPU (the host) to the GPU (the device).

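Warp: a group of 32 CUDA threads that a streaming multiprocessor (SM) schedules and executes together, with the whole warp executing one common instruction at a time. An SM usually has many warps resident at once.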
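A minimal sketch of how a thread can work out which warp of its block it belongs to and its lane (position) within that warp. It assumes a one-dimensional block and the 32-thread warp size stated above; inside kernels, CUDA's built-in variable `warpSize` holds the actual value. The names `kWarpSize`, `warp_of`, `lane_of` and `record_warp_layout` are hypothetical helpers for illustration.

```cuda
#include <cstdio>

// Assumed warp size (32, as stated above). Device code can also read the
// built-in variable warpSize.
constexpr int kWarpSize = 32;

// Hypothetical helpers: the warp a thread belongs to within its block,
// and its lane (0..31) inside that warp.
__host__ __device__ int warp_of(int tid) { return tid / kWarpSize; }
__host__ __device__ int lane_of(int tid) { return tid % kWarpSize; }

// Each thread records its own warp and lane number.
__global__ void record_warp_layout(int* warp, int* lane) {
    int tid = threadIdx.x;  // 1-D block: threadIdx.x is the thread's linear index
    warp[tid] = warp_of(tid);
    lane[tid] = lane_of(tid);
}

int main() {
    const int threads = 96;  // 96 threads per block = 3 warps
    int *warp, *lane;
    cudaMallocManaged(&warp, threads * sizeof(int));  // unified memory, readable on the host
    cudaMallocManaged(&lane, threads * sizeof(int));
    record_warp_layout<<<1, threads>>>(warp, lane);
    cudaDeviceSynchronize();  // wait for the kernel before reading results
    for (int t = 0; t < threads; t += 31)
        printf("thread %2d -> warp %d, lane %2d\n", t, warp[t], lane[t]);
    cudaFree(warp);
    cudaFree(lane);
    return 0;
}
```

Threads 0 to 31 form warp 0, threads 32 to 63 form warp 1, and so on; this is why block sizes are usually chosen as multiples of 32.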

CUDA also organizes threads into two larger units: blocks and grids. Each CUDA thread executes on a single CUDA core, and a group of threads forms one logical unit called a thread block (or CUDA block). Threads and blocks are software terms; they correspond, respectively, to CUDA cores and streaming multiprocessors in hardware, since each block is assigned to and runs entirely on one SM.
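To see how these software units map onto a particular GPU, the CUDA runtime can report the hardware limits. This sketch uses `cudaGetDeviceCount` and `cudaGetDeviceProperties`; `max_resident_warps` is a hypothetical helper introduced here. Note that the number of CUDA cores per SM is not one of the reported properties; it depends on the GPU architecture.

```cuda
#include <cstdio>

// Hypothetical helper: how many warps one SM can keep resident at the same time.
int max_resident_warps(const cudaDeviceProp& prop) {
    return prop.maxThreadsPerMultiProcessor / prop.warpSize;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA-capable device found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s\n", dev, prop.name);
        printf("  Streaming multiprocessors: %d\n", prop.multiProcessorCount);
        printf("  Warp size:                 %d\n", prop.warpSize);
        printf("  Max threads per block:     %d\n", prop.maxThreadsPerBlock);
        printf("  Max threads per SM:        %d\n", prop.maxThreadsPerMultiProcessor);
        printf("  Max resident warps per SM: %d\n", max_resident_warps(prop));
    }
    return 0;
}
```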

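A grid is the group of all blocks launched by a single kernel call and executed on the device (the GPU). This hierarchy exists so that threads within the same block can work together: they share the block's fast on-chip shared memory and can synchronize with __syncthreads(), while separate blocks are expected to run independently of one another.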
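A minimal sketch of threads in one block cooperating: each block sums its slice of an array in shared memory, using `__syncthreads()` so that no thread reads a value before another thread has written it. The kernel name `block_sum` and the block size of 256 are assumptions for illustration; the tree-reduction loop as written needs a power-of-two block size. The per-block partial sums are combined on the CPU, because blocks do not share memory with each other.

```cuda
#include <cstdio>

constexpr int kThreadsPerBlock = 256;  // assumed block size; must be a power of two here

// Each block sums up to kThreadsPerBlock elements of `in` and writes one
// partial sum to out[blockIdx.x]. Assumes blockDim.x == kThreadsPerBlock.
__global__ void block_sum(const float* in, float* out, int n) {
    __shared__ float partial[kThreadsPerBlock];  // one copy per block, shared by its threads
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (i < n) ? in[i] : 0.0f;  // out-of-range threads contribute 0
    __syncthreads();  // every thread has stored its value before anyone reads partial[]

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();  // finish this step before the next one starts
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = partial[0];
}

int main() {
    const int n = 1000;
    const int blocks = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;  // 4 blocks
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;
    block_sum<<<blocks, kThreadsPerBlock>>>(in, out, n);
    cudaDeviceSynchronize();
    float total = 0.0f;
    for (int b = 0; b < blocks; ++b) total += out[b];  // combine per-block results on the CPU
    printf("sum = %.0f\n", total);  // prints "sum = 1000"
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```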

Launching device code

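The triple angle brackets <<< >>> form the execution configuration: the first value is the number of blocks in the grid and the second is the number of threads in each block, so the kernel runs as number_of_blocks × threads_per_block threads in total.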

some_device_function<<<number_of_blocks, threads_per_block>>>();
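When the data has more elements than one block can hold (a block is limited to 1024 threads on current NVIDIA GPUs), the usual pattern is to round the block count up and have each thread compute a global index from `blockIdx.x`, `blockDim.x` and `threadIdx.x`. A minimal sketch; `blocks_for` and `add_one` are hypothetical names for illustration.

```cuda
#include <cstdio>

// Round up so that blocks * threads_per_block >= n.
int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

__global__ void add_one(float* x, int n) {
    // Global index: block number times block size, plus position inside the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // the last block may contain threads past the end of the array
        x[i] += 1.0f;
}

int main() {
    const int n = 1000;
    const int threads_per_block = 256;
    const int number_of_blocks = blocks_for(n, threads_per_block);  // 4 blocks = 1024 threads
    float* x;
    cudaMallocManaged(&x, n * sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = i;
    add_one<<<number_of_blocks, threads_per_block>>>(x, n);
    cudaDeviceSynchronize();  // wait for the GPU before reading x on the host
    printf("x[999] = %.0f\n", x[999]);  // prints "x[999] = 1000"
    cudaFree(x);
    return 0;
}
```

The `if (i < n)` guard matters because 4 × 256 = 1024 threads are launched for 1000 elements; without it the last 24 threads would write past the end of the array.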

Basic example using CUDA’s API for C

Notice the speed difference between running on your CPU (a Ryzen 3950X in my case) and on a GPU with many more compute cores (a GTX 780 here).

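#define N 256  // number of elements and threads; a single block allows at most 1024 threads
// Kernel definition
__global__ void VecAdd(float* A, float* B, float* C) {
	int i = threadIdx.x; // this thread's index within the block; launched with N threads, i runs from 0 to N-1
	C[i] = A[i] + B[i];
}

int main() {
	float *A, *B, *C;  // unified memory: accessible from both the CPU and the GPU
	cudaMallocManaged(&A, N * sizeof(float)); cudaMallocManaged(&B, N * sizeof(float)); cudaMallocManaged(&C, N * sizeof(float));
	for (int i = 0; i < N; ++i) { A[i] = i; B[i] = 2 * i; }
	VecAdd<<<1, N>>>(A, B, C);  // <<<>>> is the execution configuration syntax
	cudaDeviceSynchronize();    // wait for the kernel to finish before reading C
	cudaFree(A); cudaFree(B); cudaFree(C);
	return 0;
}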
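A fuller sketch of vector addition that can be used to observe the CPU vs GPU speed difference. It assumes explicit device buffers (`cudaMalloc`/`cudaMemcpy`) rather than unified memory, a bounds check so the array can span many blocks, CPU timing with `std::chrono`, and kernel timing with CUDA events. The macro `CUDA_CHECK`, the function `vec_add_cpu` and the array size are illustrative choices, not part of any CUDA API.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                        \
    do {                                                                        \
        cudaError_t err = (call);                                               \
        if (err != cudaSuccess) {                                               \
            fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(err)); \
            exit(EXIT_FAILURE);                                                 \
        }                                                                       \
    } while (0)

__global__ void VecAdd(const float* A, const float* B, float* C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] + B[i];
}

// CPU reference: the same addition done sequentially on one core.
void vec_add_cpu(const float* A, const float* B, float* C, int n) {
    for (int i = 0; i < n; ++i) C[i] = A[i] + B[i];
}

int main() {
    const int n = 1 << 24;  // about 16.7 million elements
    const size_t bytes = n * sizeof(float);
    std::vector<float> hA(n, 1.0f), hB(n, 2.0f), hC(n), ref(n);

    auto t0 = std::chrono::steady_clock::now();
    vec_add_cpu(hA.data(), hB.data(), ref.data(), n);
    auto t1 = std::chrono::steady_clock::now();
    double cpu_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();

    float *dA, *dB, *dC;  // device (GPU) buffers
    CUDA_CHECK(cudaMalloc(&dA, bytes));
    CUDA_CHECK(cudaMalloc(&dB, bytes));
    CUDA_CHECK(cudaMalloc(&dC, bytes));
    CUDA_CHECK(cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice));

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    CUDA_CHECK(cudaEventRecord(start));
    VecAdd<<<blocks, threads>>>(dA, dB, dC, n);
    CUDA_CHECK(cudaGetLastError());  // catches invalid launch configurations
    CUDA_CHECK(cudaEventRecord(stop));
    CUDA_CHECK(cudaEventSynchronize(stop));  // wait until the kernel has finished
    float gpu_ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&gpu_ms, start, stop));

    CUDA_CHECK(cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost));
    bool ok = (hC == ref);
    printf("CPU: %.3f ms, GPU kernel: %.3f ms, results match: %s\n",
           cpu_ms, gpu_ms, ok ? "yes" : "no");

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return ok ? 0 : 1;
}
```

Build it with `nvcc`, for example `nvcc vec_add.cu -o vec_add` (the file name is arbitrary). The GPU figure covers only the kernel; the host-to-device and device-to-host copies are excluded, and for a memory-bound operation like vector addition those copies can take longer than the kernel itself.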

Best books to start learning CUDA

  1. Learn CUDA Programming: A beginner's guide to GPU programming and parallel computing with CUDA 10.x and C/C++ by Jaegeun Han and Bharatkumar Sharma

  2. CUDA C++ Programming Guide (NVIDIA)

  3. Programming Massively Parallel Processors: A Hands-On Approach by David B. Kirk and Wen-mei W. Hwu

More useful resources

I recommend watching for video tutorials on implementing CUDA code.