Product | AMD Instinct™ MI100 Accelerator - 32GB HBM2 - PCIe 4.0 x16 - Passive Cooling | NVIDIA® A10 GPU Computing Accelerator - 24GB GDDR6 - PCIe 4.0 x16 - Passive Cooler (w/o CEC) | NVIDIA® A40 GPU Computing Accelerator - 48GB GDDR6 - PCIe 4.0 x16 - Passive Cooling | NVIDIA® L40 ADA GPU Computing Accelerator - 48GB GDDR6 - PCIe 4.0 x16 - Passive Cooling | NVIDIA® L40S ADA GPU Computing Accelerator - 48GB GDDR6 - PCIe 4.0 x16 - Passive Cooling |
Main Specifications | |||||
Product Series | AMD Instinct | Nvidia A10 | Nvidia A40 | Nvidia L40 | Nvidia L40S |
Core Type | | NVIDIA TENSOR | NVIDIA TENSOR | NVIDIA TENSOR | NVIDIA TENSOR |
Core Clock Speed | 1502 MHz | 885 MHz (1695 MHz Boost Clock) | |||
Host Interface | PCI Express 4.0 x16 | PCI Express 4.0 x16 (64 GB/s) | PCI Express 4.0 x16 | PCI Express 4.0 x16 | PCI Express 4.0 x16 |
GPU Architecture | CDNA | Ampere | Ampere | Ada Lovelace | Ada Lovelace |
Detailed Specifications | |||||
Streaming Processor Cores | 7,680 | | 10,752 CUDA Cores | 18,176 | |
Compute Units | 120 | ||||
NVIDIA Tensor Cores | | | 336 Tensor Cores | 568 (Gen 4) | |
NVIDIA RT Cores | | 72 RT Cores | 84 RT Cores | 142 (Gen 3) | |
Memory Clock Speed | 1.2 GHz | 1563 MHz | |||
Memory Interface | 4096-bit | 384-bit | |||
Memory Speeds (GT/s) | | | 14.5 Gbps GDDR6 | | |
Max Memory Size | 32 GB HBM2 | 24 GB GDDR6 | 48 GB GDDR6 with error-correcting code (ECC) | 48 GB GDDR6 with ECC | 48GB GDDR6 with ECC |
Max Memory Bandwidth | Up to 1228.8 GB/s | 600 GB/s | 696 GB/s | 864 GB/s | |
Infinity Fabric™ Links | 3 | ||||
Peak Infinity Fabric™ Link Bandwidth | 92 GB/s | ||||
INT8 Tensor Core | | 250 TOPS (500 TOPS with sparsity) | | | 733 TOPS |
TF32 Tensor Core | | 62.5 teraFLOPS (125 teraFLOPS with sparsity) | | | 183 teraFLOPS |
FP32 | | 31.2 teraFLOPS | | | 91.6 teraFLOPS |
Peak BFLOAT16 Tensor Core | | 125 teraFLOPS (250 teraFLOPS with sparsity) | | | 362.05 teraFLOPS |
Peak FP16 Tensor Core | | 125 teraFLOPS (250 teraFLOPS with sparsity) | | | 362.05 teraFLOPS |
Peak FP8 Tensor Core | | | | | 733 teraFLOPS |
Peak INT4 Tensor Core | | 500 TOPS (1,000 TOPS with sparsity) | | | 733 TOPS |
Total NVLink Bandwidth | | | NVIDIA NVLink 112.5 GB/s (bidirectional); PCIe Gen4 16 GB/s | | Not supported |
Multi-Instance GPUs | No | ||||
vGPU Software Support | NVIDIA vPC/vApps, NVIDIA RTX Virtual Workstation (vWS) | ||||
NVENC / NVDEC | | | | 3x / 3x (includes AV1 encode & decode) | 3x / 3x (includes AV1 encode & decode) |
Secure Boot with Root of Trust | | | | Yes | Yes |
NEBS Ready | | | | Yes / Level 3 | Level 3 |
Peak Half Precision (FP16) Performance | 184.6 TFLOPs | ||||
Peak Single Precision Matrix (FP32) Performance | 46.1 TFLOPs | ||||
Peak Single Precision (FP32) Performance | 23.1 TFLOPs | ||||
Peak Double Precision (FP64) Performance | 11.5 TFLOPs | ||||
Peak INT4 Performance | 184.6 TOPs | ||||
Peak INT8 Performance | 184.6 TOPs | ||||
Peak bfloat16 | 92.3 TFLOPs | ||||
DisplayPort Connectors | | | 3x DisplayPort 1.4 (configured for virtualization by default with physical display connectors disabled; display outputs can be enabled via management software tools) | 4x DP 1.4a | 4x DisplayPort 1.4a |
OS Support | Linux x86_64 | ||||
Cooling | Passive | Passive | Passive | Passive | Passive |
Dual Slot | Yes | Single-slot | 2-slot | Yes | |
Dimensions | 10.5" (267 mm) Board Length | FHFL (full height, full length) | 4.4" (H) x 10.5" (L) | 4.4" (H) x 10.5" (L) | 4.4" (H) x 10.5" (L) |
Form Factor | PCIe | ||||
Lithography | TSMC 7nm FinFET | 8 nm | Samsung 8nm | ||
Supplementary Power Connectors | 2x PCIe 8-pin connectors | 1x 8-pin CPU | 1x 8-pin CPU (EPS12V) | 1x 16-pin PCIe CEM5 | 1x 16-pin |
Max Graphics Card Power (W) | 300W | 150W | 300W | 300W | 350W |
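Note: the peak memory bandwidth figures above follow directly from the memory interface width and the per-pin data rate. A minimal sanity check, assuming 2.4 GT/s per pin for the MI100's HBM2 (1.2 GHz, double data rate) and roughly 12.5 Gb/s per pin for the A10's GDDR6 (1563 MHz memory clock):

\[
\mathrm{BW}_{\mathrm{MI100}} = \frac{4096\,\mathrm{bit} \times 2.4\,\mathrm{GT/s}}{8\,\mathrm{bit/byte}} = 1228.8\ \mathrm{GB/s},
\qquad
\mathrm{BW}_{\mathrm{A10}} = \frac{384\,\mathrm{bit} \times 12.5\,\mathrm{Gb/s}}{8\,\mathrm{bit/byte}} = 600\ \mathrm{GB/s}
\]

Both results match the Max Memory Bandwidth row above.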