NVIDIA TESLA® V100

Tesla V100 is the flagship product of the Tesla data center computing platform for deep learning, HPC, and graphics. The Tesla platform accelerates over 550 HPC applications and every major deep learning framework. It is available everywhere from desktops to servers to cloud services, delivering both dramatic performance gains and cost-saving opportunities.

NVIDIA TESLA V100 PCIe
NVIDIA TESLA V100 NVLINK

Tesla V100 HPC Application Performance Guide Presentation


  • VOLTA ARCHITECTURE By pairing CUDA Cores and Tensor Cores within a unified architecture, a single server with Tesla V100 GPUs can replace hundreds of commodity CPU servers for traditional HPC and Deep Learning.
  • TENSOR CORE Equipped with 640 Tensor Cores, Tesla V100 delivers 125 teraFLOPS of deep learning performance. That’s 12X Tensor FLOPS for DL Training, and 6X Tensor FLOPS for DL Inference when compared to NVIDIA Pascal™ GPUs.
  • NEXT GENERATION NVLINK NVIDIA NVLink in Tesla V100 delivers 2X higher throughput compared to the previous generation. Up to eight Tesla V100 accelerators can be interconnected at up to 300 GB/s to unleash the highest application performance possible on a single server.
  • MAXIMUM EFFICIENCY MODE The new maximum efficiency mode allows data centers to achieve up to 40% higher compute capacity per rack within the existing power budget. In this mode, Tesla V100 runs at peak processing efficiency, providing up to 80% of the performance at half the power consumption.
  • HBM2 With a combination of improved raw bandwidth of 900 GB/s and higher DRAM utilization efficiency at 95%, Tesla V100 delivers 1.5X higher memory bandwidth over Pascal GPUs as measured on STREAM. Tesla V100 is now available in a 32GB configuration that doubles the memory of the standard 16GB offering.
  • PROGRAMMABILITY Tesla V100 is architected from the ground up to simplify programmability. Its new independent thread scheduling enables finer-grain synchronization and improves GPU utilization by sharing resources among small jobs.
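
The 125 teraFLOPS Tensor Core figure above can be reproduced from the published core count: each Tensor Core performs one 4×4×4 matrix multiply-accumulate (64 FMAs, i.e. 128 floating-point operations) per clock. A minimal back-of-the-envelope sketch, assuming the ~1530 MHz boost clock of the NVLink (SXM2) variant, which this page does not state explicitly:

```python
# Back-of-the-envelope check of the quoted 125 teraFLOPS deep learning figure.
tensor_cores = 640
fmas_per_core_per_clock = 4 * 4 * 4   # one 4x4x4 matrix multiply-accumulate
flops_per_fma = 2                     # a multiply plus an add
boost_clock_hz = 1530e6               # assumption: SXM2/NVLink boost clock (~1530 MHz)

peak_tflops = (tensor_cores * fmas_per_core_per_clock
               * flops_per_fma * boost_clock_hz) / 1e12
print(f"{peak_tflops:.1f} TFLOPS")    # ~125.3, matching the quoted figure
```

The same arithmetic with the PCIe card's lower boost clock yields the 112 teraFLOPS quoted in the specification table.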


NVIDIA V100 GPU Powered by Volta Tensor Cores

Designed specifically for deep learning, the first-generation Tensor Cores in Volta deliver groundbreaking performance with mixed-precision matrix multiply in FP16 and FP32—up to 12X higher peak teraflops (TFLOPS) for training and 6X higher peak TFLOPS for inference over the prior-generation NVIDIA Pascal™. This key capability enables Volta to deliver 3X performance speedups in training and inference over Pascal.

Each of Tesla V100’s 640 Tensor Cores operates on a 4×4 matrix, and their associated data paths are custom-designed to deliver the world’s fastest floating-point compute throughput with high energy efficiency.
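
The per-Tensor-Core operation is D = A×B + C on 4×4 tiles, with FP16 inputs and FP32 accumulation. A minimal NumPy sketch of that arithmetic (this only models the numerics; real code reaches Tensor Cores through cuBLAS, cuDNN, or the CUDA WMMA API, not NumPy):

```python
import numpy as np

# Model the mixed-precision tile operation D = A x B + C:
# A and B are FP16 input tiles, C and D are FP32 accumulator tiles.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)).astype(np.float16)   # FP16 input tile
B = rng.standard_normal((4, 4)).astype(np.float16)   # FP16 input tile
C = rng.standard_normal((4, 4)).astype(np.float32)   # FP32 accumulator tile

# Promote the FP16 inputs to FP32 before multiplying, so products and sums
# accumulate at full FP32 precision, mirroring the mixed-precision mode above.
D = A.astype(np.float32) @ B.astype(np.float32) + C
print(D.dtype)   # float32
```

Accumulating in FP32 is what lets mixed-precision training keep FP16's bandwidth and throughput advantages without losing the small gradient contributions that pure FP16 sums would round away.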


NVIDIA Tesla V100 Specifications

                                     Tesla V100 for NVLink   Tesla V100 for PCIe
PERFORMANCE (with NVIDIA GPU Boost™)
  Double-Precision                   7.8 teraFLOPS           7 teraFLOPS
  Single-Precision                   15.7 teraFLOPS          14 teraFLOPS
  Deep Learning                      125 teraFLOPS           112 teraFLOPS
INTERCONNECT BANDWIDTH
  Bi-Directional                     NVLink: 300 GB/s        PCIe: 32 GB/s
MEMORY (CoWoS Stacked HBM2)
  Capacity                           32/16 GB HBM2           32/16 GB HBM2
  Bandwidth                          900 GB/s                900 GB/s
POWER
  Max Consumption                    300 W                   250 W
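
The ratios implied by the specification table are worth making explicit; a short sketch using only the numbers quoted above:

```python
# Ratios implied by the Tesla V100 specification table.
nvlink_gbps = 300   # bi-directional NVLink interconnect bandwidth, GB/s
pcie_gbps = 32      # bi-directional PCIe interconnect bandwidth, GB/s
hbm2_gbps = 900     # HBM2 memory bandwidth, GB/s

print(f"NVLink vs PCIe interconnect: {nvlink_gbps / pcie_gbps:.1f}x")  # 9.4x
print(f"HBM2 vs NVLink:              {hbm2_gbps / nvlink_gbps:.1f}x")  # 3.0x
```

The roughly 9.4X interconnect advantage is why multi-GPU scaling on a single server is quoted against the NVLink variant, while the PCIe variant targets drop-in deployment in standard servers at a lower 250 W power ceiling.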