CUDA Parallel Computing Platform

CUDA Parallel Computing Platform

developer.nvidia.com

2

About this website

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on GPUs (Graphics Processing Units). Released in 2007, CUDA enables developers to harness the massive computational power of NVIDIA GPUs for non-graphics workloads including scientific computing, machine learning, and data analytics. Key features: C/C++ and Fortran programming model with language extensions (function qualifiers like __device__, __global__, __host__) for writing code that runs on the GPU. Kernel-based execution model where a kernel function is launched in parallel across thousands of GPU threads organized into thread blocks and grids. Memory hierarchy including global memory, shared memory (fast on-chip memory shared within a block), constant memory, texture memory, and per-thread registers and local memory. Warp-level programming for fine-grained SIMD-like control including warp shuffle operations, cooperative groups, and warp-level primitives. CUDA libraries including cuBLAS (linear algebra), cuFFT (fast Fourier transform), cuDNN (deep neural networks), cuSPARSE (sparse matrices), cuSOLVER (dense and sparse solvers), Thrust (C++ template library for parallel algorithms), and CUTLASS (matrix multiplication). CUDA Streams for concurrent execution and overlapping computation with data transfers. Asynchronous memory copies between host and device for hiding transfer latency. Unified Memory for simplified memory management with automatic page migration between CPU and GPU. CUDA Graphs for defining, reusing, and optimizing sequences of operations with minimal launch overhead. Multi-GPU programming via NCCL for collective communication across GPUs. nvcc compiler for compiling CUDA code with PTX (parallel thread execution) intermediate representation and SASS assembly for specific GPU architectures. Profiling tools including Nsight Systems, Nsight Compute, and nvprof for performance optimization.

Tags & Categories

Statistics

2
Views
0
Clicks
0
Like
0
Dislike

Comments

Log In to post a comment

No comments yet. Be the first!