Chipmakers push for a standardized FP8 format for AI

In March, Nvidia introduced its GH100, the first GPU based on the new “Hopper” architecture, which targets both HPC and AI workloads and, especially for the latter, supports an eight-bit floating-point format: FP8. Two months later, rival Intel released Gaudi2, the second generation of its AI training chip, which also supports the FP8 format.

The FP8 format is important for a number of reasons, not the least of which is that until now there has been a split between AI inference, done at low precision in integer formats (usually INT8, sometimes INT4), and AI training, performed at FP16, FP32, or FP64 precision, with HPC performed at FP32 or FP64 precision. Both Nvidia and Intel claim that FP8 can be used not only for inference but also, in some cases, for AI training, dramatically increasing the effective throughput of their accelerators.
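Like larger IEEE floats, an FP8 value packs a sign bit, an exponent field, and a mantissa into a single byte. As an illustration of how such a byte decodes, the helper below unpacks an assumed 1-4-3 sign/exponent/mantissa layout with bias 7; this is a generic IEEE-style sketch, not any vendor’s exact specification, and real FP8 proposals differ in details such as how NaN and the largest exponent code are handled.

```python
def decode_fp8(byte, exp_bits=4, man_bits=3, bias=7):
    """Decode one byte as a small IEEE-style float (illustrative sketch)."""
    sign = -1.0 if (byte >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

print(decode_fp8(0b00111000))  # exponent field 7, bias 7: 2^0 * 1.0 = 1.0
print(decode_fp8(0b01000001))  # 2^1 * (1 + 1/8) = 2.25
```

With only eight bits there are just 256 representable values, which is why the debate over the format comes down to how those bits are spent between range (exponent) and resolution (mantissa).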

This is important because toggling between floating-point and integer formats is a pain in the neck, and it is much easier to stay in floating point throughout. Also, at some point in the future, if inference moves to 8-bit FP8 and maybe even 4-bit FP4 formats, valuable chip area currently dedicated to integer processing can be freed up and used for something else.
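The pain in the neck is the bookkeeping that integer quantization requires: every tensor needs a scale factor, and values must be rounded, clamped, and later rescaled. A minimal sketch of that round trip (the scale value here is hypothetical, chosen as a power of two so the rescaling is exact):

```python
def quantize_int8(x, scale):
    """Map a float to a signed 8-bit integer: round, then clamp to [-128, 127]."""
    return max(-128, min(127, round(x / scale)))

def dequantize(q, scale):
    """Map the integer back to an approximate float."""
    return q * scale

scale = 1 / 64  # hypothetical per-tensor scale factor
q = quantize_int8(0.37, scale)
print(q)                           # 24
print(dequantize(q, scale))        # 0.375 -- rounding error from the round trip
print(quantize_int8(10.0, scale))  # 127: values beyond the scale's reach saturate
```

Staying in floating point avoids this per-tensor scale management, because each value carries its own exponent.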

In a post-Moore’s Law world, every transistor is sacred and every clock cycle must be cherished. Companies are looking for more efficient ways to perform AI tasks at a time when advances in processing speed are not as rapid as in the past. Organizations need to figure out how to improve processing capabilities, especially for training, within the power currently available. Lower-precision data formats can help.

AI chip makers see the benefits. In June, Graphcore published a 30-page study showing not only the superior performance of low-precision floating-point formats over scaled integers of similar size, but also the long-term energy-consumption benefits of using them to train models whose sizes are growing fast.

“Low-precision numerical formats can be a key component of large machine learning models that provide state-of-the-art accuracy while reducing their environmental impact,” the researchers wrote. “In particular, by using 8-bit floating-point arithmetic, energy efficiency can be increased up to 4× compared to float-16 arithmetic and up to 16× compared to float-32.”

Now Graphcore is beating the drum for the IEEE to adopt the vendor’s FP8 format designed for AI as a standard anyone can use. The company made its pitch this week, with Graphcore co-founder and chief technology officer Simon Knowles saying that “the advent of 8-bit floating point offers huge performance and efficiency benefits for AI compute. It is also an opportunity for the industry to settle on a single open standard, rather than introduce a confusing mix of competing formats.”

AMD and Qualcomm are also lending their support to the Graphcore initiative, with John Kehrli, senior director of product management at Qualcomm, saying the proposal “has emerged as a compelling format for 8-bit floating-point computing, offering significant performance and efficiency benefits for inference, and can help reduce training and inference costs for cloud and edge.”

AMD is expected to support the FP8 format in the upcoming Instinct MI300A APU, which will combine an AMD GPU and an Epyc 7004 processor in a single package. We expect there will also be regular MI300 discrete GPUs, and that they will also support FP8 data and processing.

An open standard would also benefit a range of other AI chipmakers, including SambaNova, Cerebras, and Groq.

Graphcore argues that the use of lower- and mixed-precision formats – such as using 16-bit and 32-bit arithmetic simultaneously – is common in AI and strikes a good balance between precision and efficiency at a time when Moore’s Law and Dennard scaling are slowing down.
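A toy NumPy illustration of why 16-bit and 32-bit arithmetic get mixed (this is a generic illustration of the technique, not Graphcore’s actual scheme): data is stored in FP16 to save memory and bandwidth, but sums are accumulated in FP32, because a naive FP16 running sum stops growing once the total is so large that the FP16 spacing exceeds each increment.

```python
import numpy as np

x = np.full(10000, 0.1, dtype=np.float16)  # 0.1 rounds to ~0.09998 in fp16

# Naive fp16 running sum: once the total reaches 256, the fp16 spacing
# (0.25) exceeds twice the increment, so each add rounds away to nothing.
acc16 = np.float16(0.0)
for v in x:
    acc16 = np.float16(acc16 + v)

# Same data, accumulated in fp32: the wider accumulator keeps the precision.
acc32 = np.float32(0.0)
for v in x:
    acc32 += np.float32(v)

print(acc16)  # stalls far below the true total
print(acc32)  # ~999.76 (the residual gap comes from rounding 0.1 to fp16)
```

This range-versus-storage trade-off is exactly what gets harder, and more valuable, as the bit budget shrinks from 16 bits to 8.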

FP8 gives the AI industry a chance to adopt a “native AI” standard with interoperability between systems for both inference and training. Graphcore will also make its specification available to other industry players until the IEEE formalizes a standard.

“As the complexity of deep learning applications continues to increase, the scalability of machine learning systems has also become indispensable,” the Graphcore researchers wrote in their paper. “Training large distributed models creates a number of challenges, relying on the efficient use of available compute, memory, and networking resources shared between different nodes, constrained by the available power budget. In this context, the use of efficient numerical formats is of crucial importance, as it allows for increased energy efficiency due to both improved computational efficiency and communication efficiency in the exchange of data between processing units.”

Chipmakers have been evaluating the use of lower-precision formats for some time. In 2019, IBM Research unveiled a quad-core AI chip based on 7-nanometer EUV technology that supported hybrid FP16 and FP8 formats for training and inference.

“This novel hybrid training method fully preserves model accuracy across a wider range of deep learning models,” IBM Research experts wrote in a blog post. “The hybrid FP8 format also overcomes the loss of training accuracy seen previously on models such as MobileNet (vision) and Transformer (NLP), which are more susceptible to information loss due to quantization. To overcome this challenge, the hybrid FP8 scheme adopts one 8-bit format in the forward path, for higher resolution, and another 8-bit format for gradients in the backward path, for greater range.”
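IBM’s resolution-versus-range trade comes down to how the eight bits are split between exponent and mantissa. The sketch below compares two hypothetical layouts, four exponent bits with three mantissa bits versus five exponent bits with two, assuming IEEE-style biases and an IEEE-style reservation of the top exponent code for specials (actual FP8 specifications relax some of these rules, so the exact limits differ):

```python
def fp8_stats(exp_bits, man_bits, bias):
    """Largest and smallest normal values for a toy 8-bit float.

    Assumes an IEEE-style layout reserving the top exponent code for
    specials; real FP8 proposals differ in these details.
    """
    top_exp = (1 << exp_bits) - 2 - bias              # largest usable exponent
    max_val = 2.0 ** top_exp * (2 - 2.0 ** -man_bits)  # all-ones mantissa
    min_normal = 2.0 ** (1 - bias)
    return max_val, min_normal

# Three mantissa bits: finer resolution, narrower range (forward activations).
print(fp8_stats(4, 3, 7))    # (240.0, 0.015625)
# Five exponent bits: coarser resolution, far wider range (gradients).
print(fp8_stats(5, 2, 15))   # (57344.0, 6.103515625e-05)
```

Gradients span many orders of magnitude during backpropagation, which is why the backward path gets the extra exponent bit at the cost of a mantissa bit.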

Two years later, IBM demonstrated a test chip at the ISSCC 2021 event that supported 16- and 8-bit training and 4- and 2-bit inference.

“The sophistication and adoption of AI models is growing rapidly, now used for drug discovery, modernizing legacy computing applications, and writing code for new applications,” the IBM researchers wrote at the time. “But the rapid growth in the complexity of AI models also increases the energy consumption of the technology, and a big challenge has been to create sophisticated AI models without increasing the carbon footprint. Historically, the field simply accepted that if the need for compute is large, so too will be the power needed to feed it.”

Now the ball is in IEEE’s court to bring everyone together and create a standard.
