Tutorial at CVPR 2025, Nashville, USA, on June 12.
The growing size of neural networks, particularly in generative AI, poses significant challenges in terms of sustainability, time, and cost, hindering their study and practical application. Low-precision data types and computations, especially when natively supported by hardware, offer an effective solution, enabling broader research access and deployment on edge devices. However, networks are usually trained with high-precision data types and must first be prepared for low-precision execution. In this tutorial, we review different low-precision data types and showcase typical challenges of their application, such as outlier handling, with simple hands-on examples. To maintain the original task performance of neural networks, sophisticated quantization methods are required to compensate for the quantization errors induced by low-precision data types. We introduce and compare the most common and effective methods for quantizing neural networks and provide guidance for practitioners.
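The outlier problem mentioned above can be illustrated with a minimal sketch (plain Python, illustrative only and not part of the tutorial materials): in symmetric per-tensor quantization, a single large value sets the scale for the whole tensor, so the small values lose resolution.

```python
# Minimal sketch of symmetric per-tensor int8 quantization.
# Illustrative only; real toolchains use per-channel scales, clipping, etc.

def quantize_int8(values):
    """Quantize a list of floats to int8 with a symmetric per-tensor scale."""
    scale = max(abs(v) for v in values) / 127  # range set by the largest magnitude
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to floats."""
    return [x * scale for x in q]

# A well-behaved tensor quantizes with little error.
weights = [0.1, -0.2, 0.3, -0.05]
q, s = quantize_int8(weights)
recovered = dequantize(q, s)
err_small = max(abs(a - b) for a, b in zip(weights, recovered))

# A single outlier inflates the scale, so the remaining small values
# are rounded much more coarsely.
weights_outlier = weights + [10.0]
q2, s2 = quantize_int8(weights_outlier)
recovered2 = dequantize(q2, s2)
err_large = max(abs(a - b) for a, b in zip(weights, recovered2))

print(err_small, err_large)  # the outlier case has much larger error on the small values
```

This is exactly why outlier handling (e.g. clipping the range or splitting channels) is a recurring theme when preparing networks for low-precision execution.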
Thomas Pfeil (Recogni) | Markus Nagel (Qualcomm) | Tijmen Blankevoort (Meta)