NVIDIA A40 GPU
The World’s Most Powerful Data Center GPU for Visual Computing
Modern data centers are evolving rapidly. Advanced technologies such as real-time ray tracing, AI, compute, simulation, and VR are common across industries. The need to work remotely has accelerated faster than anyone could have anticipated, with workloads that span the entire enterprise.
The NVIDIA A40 delivers the data center-based solution that designers, engineers, artists, and scientists need to meet today’s challenges. Built on the NVIDIA Ampere architecture, the A40 combines the latest-generation RT Cores, Tensor Cores, and CUDA Cores with 48 GB of graphics memory for unprecedented graphics, rendering, compute, and AI performance. From powerful virtual workstations accessible from anywhere to dedicated render nodes, the A40 is built to tackle the most demanding visual computing workloads from the data center.
SPECIFICATIONS
| Specification | Value |
| --- | --- |
| GPU architecture | NVIDIA Ampere architecture |
| GPU memory | 48 GB GDDR6 with ECC |
| Memory bandwidth | 696 GB/s |
| Interconnect interface | NVIDIA® NVLink®: 112.5 GB/s (bidirectional); PCIe Gen4: 64 GB/s |
| NVIDIA Ampere architecture-based CUDA Cores | 10,752 |
| NVIDIA second-generation RT Cores | 84 |
| NVIDIA third-generation Tensor Cores | 336 |
| Peak FP32 TFLOPS (non-Tensor) | 37.4 |
| Peak FP16 Tensor TFLOPS with FP16 Accumulate | 149.7 / 299.4 |
| Peak TF32 Tensor TFLOPS | 74.8 / 149.6 |
| RT Core performance TFLOPS | 73.1 |
| Peak BF16 Tensor TFLOPS with FP32 Accumulate | 149.7 / 299.4 |
| Peak INT8 Tensor TOPS | 299.3 / 598.6 |
| Peak INT4 Tensor TOPS | 598.7 / 1,197.4 |
| Display ports | 3x DisplayPort 1.4; supports NVIDIA Mosaic and Quadro® Sync |
| Max power consumption | 300 W |
| Power connector | 8-pin CPU |
| Thermal solution | Passive |
| Virtual GPU (vGPU) software support | NVIDIA vPC/vApps, NVIDIA RTX Virtual Workstation, NVIDIA Virtual Compute Server |
| vGPU profiles supported | See the Virtual GPU Licensing Guide |
| NVENC / NVDEC | 1x / 2x (includes AV1 decode) |
| Secure and measured boot with hardware root of trust | Yes (optional) |
| NEBS ready | Level 3 |
| Compute APIs | CUDA, DirectCompute, OpenCL™, OpenACC® |
| MIG support | No |
| Form factor | 4.4" (H) x 10.5" (L) dual slot |
| Graphics APIs | DirectX 12.07, Shader Model 5.17, OpenGL 4.68, Vulkan 1.18 |
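The figures in the specifications can be cross-checked with simple arithmetic. The Python sketch below uses only numbers listed above. Two things in it are assumptions rather than statements from this page: that each CUDA Core completes one fused multiply-add (2 FLOPs) per clock, and that the second value in each Tensor pair is a rate with structured sparsity enabled (the page does not label the two values).

```python
# Values copied from the specifications above.
CUDA_CORES = 10_752
PEAK_FP32_TFLOPS = 37.4
MAX_POWER_W = 300
TENSOR_PAIRS = {  # (first value, second value) exactly as listed
    "FP16": (149.7, 299.4),
    "TF32": (74.8, 149.6),
    "BF16": (149.7, 299.4),
    "INT8": (299.3, 598.6),
    "INT4": (598.7, 1197.4),
}


def implied_clock_ghz(tflops, cores, flops_per_core_per_clock=2):
    """Clock rate implied by a peak-FLOPS figure.

    Assumption: one fused multiply-add (2 FLOPs) per core per clock.
    """
    return tflops * 1e12 / (cores * flops_per_core_per_clock) / 1e9


def gflops_per_watt(tflops, watts):
    """Peak FP32 throughput per watt at maximum board power."""
    return tflops * 1000 / watts


clock = implied_clock_ghz(PEAK_FP32_TFLOPS, CUDA_CORES)
efficiency = gflops_per_watt(PEAK_FP32_TFLOPS, MAX_POWER_W)
# Ratio of the second listed value to the first for each precision.
ratios = {name: second / first for name, (first, second) in TENSOR_PAIRS.items()}

print(f"Implied clock: {clock:.3f} GHz")
print(f"Peak FP32 efficiency: {efficiency:.1f} GFLOPS/W")
for name, ratio in ratios.items():
    print(f"{name}: second value is {ratio:.2f}x the first")
```

In every Tensor pair the second value is twice the first, which is consistent with the sparsity reading but does not prove it.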
A Look Inside the NVIDIA Ampere Architecture
- 48 GB GDDR6 Memory with NVLink
The NVIDIA A40 features 48 GB of GDDR6 memory per card, and connecting two A40s with NVLink raises the combined capacity to 96 GB. Together with 696 GB/s of memory bandwidth, this capacity lets data scientists and researchers keep large data sets and models in GPU memory and process them efficiently, which helps the card keep pace with growing data sizes in heavy workloads.
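As a rough illustration of what the 48 GB and 96 GB figures mean in practice, the Python sketch below estimates whether a model's weights fit in memory. The parameter count is hypothetical, gigabytes are assumed to be decimal (10^9 bytes), and the estimate ignores activations, optimizer state, and framework overhead. Whether an application can actually treat two NVLink-connected GPUs as one pool depends on the software.

```python
GB = 10**9  # assumption: decimal gigabytes

A40_MEMORY_GB = 48          # single card, from the text above
NVLINK_PAIR_MEMORY_GB = 96  # two cards connected with NVLink

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weights_gb(num_params, dtype):
    """Size of the raw weights only, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / GB


def fits(num_params, dtype, capacity_gb):
    """True if the raw weights alone fit within the given capacity."""
    return weights_gb(num_params, dtype) <= capacity_gb


# Hypothetical model size chosen only for illustration.
params = 20_000_000_000

for dtype in BYTES_PER_PARAM:
    size = weights_gb(params, dtype)
    single = fits(params, dtype, A40_MEMORY_GB)
    pair = fits(params, dtype, NVLINK_PAIR_MEMORY_GB)
    print(f"{dtype}: {size:.0f} GB, fits 48 GB: {single}, fits 96 GB: {pair}")
```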
- Third-Generation Tensor Cores
The NVIDIA A40 uses third-generation Tensor Cores that provide up to 5 times the training throughput of the previous generation when using Tensor Float 32 (TF32) precision. TF32 is well suited to machine learning workloads such as training deep neural networks, where it trades some numerical precision for higher throughput.
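TF32 keeps the 8-bit exponent of standard 32-bit floating point but only 10 bits of mantissa instead of 23. The Python sketch below emulates that reduction in software to show how much precision is given up. `to_tf32` is a hypothetical helper written for this illustration, not an NVIDIA API; it assumes round-to-nearest-even, does not treat NaN or infinity specially, and may not match the exact rounding the hardware applies.

```python
import struct


def to_tf32(x: float) -> float:
    """Round x to TF32 precision (8 exponent bits, 10 mantissa bits).

    Illustrative emulation only; assumes round-to-nearest-even.
    """
    # Reinterpret the value as the 32 raw bits of an IEEE 754 float32.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 13  # float32 has 23 mantissa bits; TF32 keeps the top 10
    # Add just under half a unit, plus one if the kept part is odd,
    # so that exact ties round to the even neighbor.
    bits += (1 << (drop - 1)) - 1 + ((bits >> drop) & 1)
    # Clear the 13 dropped mantissa bits.
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


value = 0.1
rounded = to_tf32(value)
relative_error = abs(rounded - value) / value
print(f"{value} -> {rounded} (relative error {relative_error:.2e})")
```

With 10 mantissa bits, the relative rounding error for normal values is bounded by about 2^-11, compared with about 2^-24 for full 32-bit floating point.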
- Data Center Efficiency and Security
With its dual-slot design, the NVIDIA A40 is up to 2 times more power-efficient than the previous generation. This efficiency lowers energy and operating costs and reduces the environmental impact of large data centers and supercomputing systems. For security, the specifications list secure and measured boot with a hardware root of trust as an option.
- PCI Express Gen 4
The NVIDIA A40 supports PCI Express Gen 4, which doubles the bandwidth of PCIe Gen 3; the specifications list 64 GB/s for the PCIe Gen4 interface. The higher bandwidth shortens the time needed to move large amounts of data between the host and the GPU, which matters for data-intensive work such as complex analyses and parallel processing tasks.
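To put the doubling in concrete terms, the Python sketch below estimates how long it would take to fill the A40's 48 GB of memory over each PCIe generation. It assumes the listed 64 GB/s is the bidirectional total, so one direction gets half of it, and it uses theoretical peak rates; real transfers are slower because of protocol and software overhead.

```python
PCIE_GEN4_BIDIRECTIONAL_GBPS = 64.0  # from the specifications
GPU_MEMORY_GB = 48.0                 # from the specifications


def one_way_gbps(bidirectional_gbps):
    """Assumption: the bidirectional figure splits evenly by direction."""
    return bidirectional_gbps / 2


def transfer_seconds(size_gb, gbps):
    """Ideal transfer time at a constant peak rate."""
    return size_gb / gbps


gen4 = one_way_gbps(PCIE_GEN4_BIDIRECTIONAL_GBPS)
gen3 = gen4 / 2  # Gen 4 doubles Gen 3 bandwidth

for name, rate in (("PCIe Gen 3", gen3), ("PCIe Gen 4", gen4)):
    seconds = transfer_seconds(GPU_MEMORY_GB, rate)
    print(f"{name}: {rate:.0f} GB/s one way, {seconds:.1f} s to move 48 GB")
```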
- Advanced Features of the NVIDIA A40
The NVIDIA A40 combines real-time ray tracing, AI acceleration, and multi-workload flexibility in a single card, so the same hardware can serve rendering, deep learning, data science, and other compute-intensive tasks.
- Virtual Workstations and NVIDIA Software
Virtual workstations powered by the NVIDIA A40, along with NVIDIA RTX Virtual Workstation (vWS) and NVIDIA Virtual Compute Server software, benefit from extensive testing across a broad range of industry applications and professional software. These technologies ensure optimal performance and stability, enabling organizations to leverage advanced graphics and computational capabilities with enhanced security and efficiency.
Summary
The NVIDIA A40 graphics card, with its high-speed and extensive memory, third-generation Tensor Cores, and power-efficient design, represents a significant advancement in graphics and computational processing. Its advanced features meet the needs of professional users across various scientific and industrial fields, improving system performance and efficiency in complex computing tasks.