Bfloat16_floating-point_format

bfloat16 floating-point format

Floating-point number format used in computer processors

The bfloat16 (brain floating point)^[1]^[2] floating-point format is a computer number format occupying 16 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. This format is a shortened (16-bit) version of the 32-bit IEEE 754 single-precision floating-point format (binary32) with the intent of accelerating machine learning and near-sensor computing.^[3] It preserves the approximate dynamic range of 32-bit floating-point numbers by retaining 8 exponent bits, but supports only an 8-bit precision rather than the 24-bit significand of the binary32 format. More so than single-precision 32-bit floating-point numbers, bfloat16 numbers are unsuitable for integer calculations, but this is not their intended use. Bfloat16 is used to reduce the storage requirements and increase the calculation speed of machine learning algorithms.^[4]

The bfloat16 format was developed by Google Brain, an artificial intelligence research group at Google. It is utilized in many CPUs, GPUs, and AI processors, such as Intel Xeon processors (AVX-512 BF16 extensions), Intel Data Center GPU, Intel Nervana NNP-L1000, Intel FPGAs,^[5]^[6]^[7] AMD Zen, AMD Instinct, NVIDIA GPUs, Google Cloud TPUs,^[8]^[9]^[10] AWS Inferentia, AWS Trainium, ARMv8.6-A,^[11] and Apple's M2^[12] and therefore A15 chips and later. Many libraries support bfloat16, such as CUDA,^[13] Intel oneAPI Math Kernel Library, AMD ROCm,^[14] AMD Optimizing CPU Libraries, PyTorch, and TensorFlow.^[10]^[15] On these platforms, bfloat16 may also be used in mixed-precision arithmetic, where bfloat16 numbers may be operated on and expanded to wider data types.

bfloat16 floating-point format

bfloat16 has the following format:

Sign bit: 1 bit
Exponent width: 8 bits
Significand precision: 8 bits (7 explicitly stored, with an implicit leading bit), as opposed to 24 bits in a classical single-precision floating-point format

The bfloat16 format, being a shortened IEEE 754 single-precision 32-bit float, allows for fast conversion to and from an IEEE 754 single-precision 32-bit float; in conversion to the bfloat16 format, the exponent bits are preserved while the significand field can be reduced by truncation (thus corresponding to round toward 0) or other rounding mechanisms, ignoring the NaN special case. Preserving the exponent bits maintains the 32-bit float's range of ≈ 10⁻³⁸ to ≈ 3 × 10³⁸.^[16]

The bits are laid out as follows:

IEEE half-precision 16-bit float
sign		exponent (5 bit)					fraction (10 bit)
┃		┌────────┐					┌───────────────────────────┐
	0	0	1	1	0	0	0	1	0	0	0	0	0	0	0	0
	15	14				10	9									0

bfloat16
sign		exponent (8 bit)								fraction (7 bit)
┃		┌────────────────┐								┌───────────────────┐
	0	0	1	1	1	1	1	0	0	0	1	0	0	0	0	0
	15	14							7	6						0

NVidia's TensorFloat (19 bits)
sign		exponent (8 bit)								fraction (10 bit)
┃		┌───────────────┐								┌────────────────────────┐
	0	0	1	1	1	1	1	0	0	0	1	0	0	0	0	0	0	0	0
	18	17							10	9									0

AMD's fp24 format
sign		exponent (7 bit)							fraction (16 bit)
┃		┌─────────────┐							┌─────────────────────────────────────┐
	0	0	1	1	1	1	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
	23	22						16	15															0

Pixar's PXR24 format
sign		exponent (8 bit)								fraction (15 bit)
┃		┌────────────────┐								┌──────────────────────────────────┐
	0	0	1	1	1	1	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
	23	22							15	14														0

IEEE 754 single-precision 32-bit float
sign		exponent (8 bit)								fraction (23 bit)
┃		┌────────────────┐								┌────────────────────────────────────────────────┐
	0	0	1	1	1	1	1	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
	31	30							23	22																						0

Examples

These examples are given in bit representation, in hexadecimal and binary, of the floating-point value. This includes the sign, (biased) exponent, and significand.

3f80 = 0 01111111 0000000 = 1
c000 = 1 10000000 0000000 = −2

7f7f = 0 11111110 1111111 = (2⁸ − 1) × 2⁻⁷ × 2¹²⁷ ≈ 3.38953139 × 10³⁸ (max finite positive value in bfloat16 precision)
0080 = 0 00000001 0000000 = 2⁻¹²⁶ ≈ 1.175494351 × 10⁻³⁸ (min normalized positive value in bfloat16 precision and single-precision floating point)

The maximum positive finite value of a normal bfloat16 number is 3.38953139 × 10³⁸, slightly below (2²⁴ − 1) × 2⁻²³ × 2¹²⁷ = 3.402823466 × 10³⁸, the max finite positive value representable in single precision.

Zeros and infinities

0000 = 0 00000000 0000000 = 0
8000 = 1 00000000 0000000 = −0

7f80 = 0 11111111 0000000 = infinity
ff80 = 1 11111111 0000000 = −infinity

Special values

4049 = 0 10000000 1001001 = 3.140625 ≈ π ( pi )
3eab = 0 01111101 0101011 = 0.333984375 ≈ 1/3

NaNs

ffc1 = x 11111111 1000001 => qNaN
ff81 = x 11111111 0000001 => sNaN

References

[1]
Teich, Paul (2018-05-10). "Tearing Apart Google's TPU 3.0 AI Coprocessor". The Next Platform. Retrieved 2020-08-11. Google invented its own internal floating point format called "bfloat" for "brain floating point" (after Google Brain).
[2]
Wang, Shibo; Kanwar, Pankaj (2019-08-23). "BFloat16: The secret to high performance on Cloud TPUs". Google Cloud. Retrieved 2020-08-11. This custom floating point format is called "Brain Floating Point Format," or "bfloat16" for short. The name flows from "Google Brain", which is an artificial intelligence research group at Google where the idea for this format was conceived.
[3]
Tagliavini, Giuseppe; Mach, Stefan; Rossi, Davide; Marongiu, Andrea; Benin, Luca (2018). "A transprecision floating-point platform for ultra-low power computing". 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE). pp. 1051–1056. arXiv:1711.10374. doi:10.23919/DATE.2018.8342167. ISBN 978-3-9819263-0-9. S2CID 5067903.
[4]
Dr. Ian Cutress (2020-03-17). "Intel': Cooper lake Plans: Why is BF16 Important?". Retrieved 2020-05-12. The bfloat16 standard is a targeted way of representing numbers that give the range of a full 32-bit number, but in the data size of a 16-bit number, keeping the accuracy close to zero but being a bit more loose with the accuracy near the limits of the standard. The bfloat16 standard has a lot of uses inside machine learning algorithms, by offering better accuracy of values inside the algorithm while affording double the data in any given dataset (or doubling the speed in those calculation sections).
[5]
Khari Johnson (2018-05-23). "Intel unveils Nervana Neural Net L-1000 for accelerated AI training". VentureBeat. Retrieved 2018-05-23. ...Intel will be extending bfloat16 support across our AI product lines, including Intel Xeon processors and Intel FPGAs.
[6]
Michael Feldman (2018-05-23). "Intel Lays Out New Roadmap for AI Portfolio". TOP500 Supercomputer Sites. Retrieved 2018-05-23. Intel plans to support this format across all their AI products, including the Xeon and FPGA lines
[7]
Lucian Armasu (2018-05-23). "Intel To Launch Spring Crest, Its First Neural Network Processor, In 2019". Tom's Hardware. Retrieved 2018-05-23. Intel said that the NNP-L1000 would also support bfloat16, a numerical format that's being adopted by all the ML industry players for neural networks. The company will also support bfloat16 in its FPGAs, Xeons, and other ML products. The Nervana NNP-L1000 is scheduled for release in 2019.
[8]
"Available TensorFlow Ops | Cloud TPU | Google Cloud". Google Cloud. Retrieved 2018-05-23. This page lists the TensorFlow Python APIs and graph operators available on Cloud TPU.
[9]
Elmar Haußmann (2018-04-26). "Comparing Google's TPUv2 against Nvidia's V100 on ResNet-50". RiseML Blog. Archived from the original on 2018-04-26. Retrieved 2018-05-23. For the Cloud TPU, Google recommended we use the bfloat16 implementation from the official TPU repository with TensorFlow 1.7.0. Both the TPU and GPU implementations make use of mixed-precision computation on the respective architecture and store most tensors with half-precision.
[10]
Tensorflow Authors (2018-07-23). "ResNet-50 using BFloat16 on TPU". Google. Retrieved 2018-11-06.
[11]
"BFloat16 extensions for Armv8-A". community.arm.com. 29 August 2019. Retrieved 2019-08-30.
[12]
"AArch64: add support for newer Apple CPUs · llvm/llvm-project@677da09". GitHub. Retrieved 2023-05-08.
[13]
"CUDA Library bloat16 Intrinsics".
[14]
"ROCm version history". github.com. Retrieved 2019-10-23.
[15]
Joshua V. Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, Rif A. Saurous (2017-11-28). TensorFlow Distributions (Report). arXiv:1711.10604. Bibcode:2017arXiv171110604D. Accessed 2018-05-23. All operations in TensorFlow Distributions are numerically stable across half, single, and double floating-point precisions (as TensorFlow dtypes: tf.bfloat16 (truncated floating point), tf.float16, tf.float32, tf.float64). Class constructors have a validate_args flag for numerical asserts{{cite report}}: CS1 maint: multiple names: authors list (link)
[16]
"Livestream Day 1: Stage 8 (Google I/O '18) - YouTube". Google. 2018-05-08. Retrieved 2018-05-23. In many models this is a drop-in replacement for float-32
[17]
"The bfloat16 numerical format". Google Cloud. Retrieved 2023-07-11. On TPU, the rounding scheme in the conversion is round to nearest even and overflow to inf.
[18]
"Arm A64 Instruction Set Architecture". developer.arm.com. Retrieved 2023-07-26. Uses the non-IEEE Round-to-Odd rounding mode.
[19]
"1.3.5. Bfloat16 Precision Conversion and Data Movement" (PDF). docs.nvidia.com. p. 199. Retrieved 2023-07-26. Converts float number to nv_bfloat16 precision in round-to-nearest-even mode and returns nv_bfloat16 with converted value.

Share this article:

This article uses material from the Wikipedia article Bfloat16_floating-point_format, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.

[1] [1]
Teich, Paul (2018-05-10). "Tearing Apart Google's TPU 3.0 AI Coprocessor". The Next Platform. Retrieved 2020-08-11. Google invented its own internal floating point format called "bfloat" for "brain floating point" (after Google Brain).

[2] [2]
Wang, Shibo; Kanwar, Pankaj (2019-08-23). "BFloat16: The secret to high performance on Cloud TPUs". Google Cloud. Retrieved 2020-08-11. This custom floating point format is called "Brain Floating Point Format," or "bfloat16" for short. The name flows from "Google Brain", which is an artificial intelligence research group at Google where the idea for this format was conceived.

[3] [3]
Tagliavini, Giuseppe; Mach, Stefan; Rossi, Davide; Marongiu, Andrea; Benin, Luca (2018). "A transprecision floating-point platform for ultra-low power computing". 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE). pp. 1051–1056. arXiv:1711.10374. doi:10.23919/DATE.2018.8342167. ISBN 978-3-9819263-0-9. S2CID 5067903.

[Why-4] [4]
Dr. Ian Cutress (2020-03-17). "Intel': Cooper lake Plans: Why is BF16 Important?". Retrieved 2020-05-12. The bfloat16 standard is a targeted way of representing numbers that give the range of a full 32-bit number, but in the data size of a 16-bit number, keeping the accuracy close to zero but being a bit more loose with the accuracy near the limits of the standard. The bfloat16 standard has a lot of uses inside machine learning algorithms, by offering better accuracy of values inside the algorithm while affording double the data in any given dataset (or doubling the speed in those calculation sections).

[vent_Inte-5] [5]
Khari Johnson (2018-05-23). "Intel unveils Nervana Neural Net L-1000 for accelerated AI training". VentureBeat. Retrieved 2018-05-23. ...Intel will be extending bfloat16 support across our AI product lines, including Intel Xeon processors and Intel FPGAs.

[top5_Inte-6] [6]
Michael Feldman (2018-05-23). "Intel Lays Out New Roadmap for AI Portfolio". TOP500 Supercomputer Sites. Retrieved 2018-05-23. Intel plans to support this format across all their AI products, including the Xeon and FPGA lines

[toms_Inte-7] [7]
Lucian Armasu (2018-05-23). "Intel To Launch Spring Crest, Its First Neural Network Processor, In 2019". Tom's Hardware. Retrieved 2018-05-23. Intel said that the NNP-L1000 would also support bfloat16, a numerical format that's being adopted by all the ML industry players for neural networks. The company will also support bfloat16 in its FPGAs, Xeons, and other ML products. The Nervana NNP-L1000 is scheduled for release in 2019.

[clou_Avai-8] [8]
"Available TensorFlow Ops | Cloud TPU | Google Cloud". Google Cloud. Retrieved 2018-05-23. This page lists the TensorFlow Python APIs and graph operators available on Cloud TPU.

[blog_Comp-9] [9]
Elmar Haußmann (2018-04-26). "Comparing Google's TPUv2 against Nvidia's V100 on ResNet-50". RiseML Blog. Archived from the original on 2018-04-26. Retrieved 2018-05-23. For the Cloud TPU, Google recommended we use the bfloat16 implementation from the official TPU repository with TensorFlow 1.7.0. Both the TPU and GPU implementations make use of mixed-precision computation on the respective architecture and store most tensors with half-precision.

[gith_tens-10] [10]
Tensorflow Authors (2018-07-23). "ResNet-50 using BFloat16 on TPU". Google. Retrieved 2018-11-06.

[11] [11]
"BFloat16 extensions for Armv8-A". community.arm.com. 29 August 2019. Retrieved 2019-08-30.

[12] [12]
"AArch64: add support for newer Apple CPUs · llvm/llvm-project@677da09". GitHub. Retrieved 2023-05-08.

[13] [13]
"CUDA Library bloat16 Intrinsics".

[14] [14]
"ROCm version history". github.com. Retrieved 2019-10-23.

[arxiv_1711.10604-15] [15]
Joshua V. Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, Rif A. Saurous (2017-11-28). TensorFlow Distributions (Report). arXiv:1711.10604. Bibcode:2017arXiv171110604D. Accessed 2018-05-23. All operations in TensorFlow Distributions are numerically stable across half, single, and double floating-point precisions (as TensorFlow dtypes: tf.bfloat16 (truncated floating point), tf.float16, tf.float32, tf.float64). Class constructors have a validate_args flag for numerical asserts{{cite report}}: CS1 maint: multiple names: authors list (link)

[googleio18-day1-time2575-16] [16]
"Livestream Day 1: Stage 8 (Google I/O '18) - YouTube". Google. 2018-05-08. Retrieved 2018-05-23. In many models this is a drop-in replacement for float-32

[google_TPU-17] [17]
"The bfloat16 numerical format". Google Cloud. Retrieved 2023-07-11. On TPU, the rounding scheme in the conversion is round to nearest even and overflow to inf.

[arm_product-18] [18]
"Arm A64 Instruction Set Architecture". developer.arm.com. Retrieved 2023-07-26. Uses the non-IEEE Round-to-Odd rounding mode.

[19] [19]
"1.3.5. Bfloat16 Precision Conversion and Data Movement" (PDF). docs.nvidia.com. p. 199. Retrieved 2023-07-26. Converts float number to nv_bfloat16 precision in round-to-nearest-even mode and returns nv_bfloat16 with converted value.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

Exponent	Significand zero	Significand non-zero	Equation
00_H	zero, −0	subnormal numbers	(−1)^signbit×2⁻¹²⁶× 0.significandbits
01_H, ..., FE_H	normalized value		(−1)^signbit×2^{exponentbits−127}× 1.significandbits
FF_H	±infinity	NaN (quiet, signaling)

Bfloat16_floating-point_format

bfloat16 floating-point format

bfloat16 floating-point format

Exponent encoding

Rounding and conversion

Encoding of special values

Positive and negative infinity

Not a Number

Range and precision

Examples

Zeros and infinities

Special values

NaNs

See also

References

Share this article: