3:00 PM EST
Dan Alistarh
IST Austria & Red Hat

Title: Massive Models in Low Precision: Power, Limits, and Scaling Laws

Abstract: Highly accurate machine learning models for language or vision can have massive computational and memory costs. In this talk, I will give an overview of our work on reducing these costs, both for inference (serving such models) and for training them. The first project, called FPTQ, studies one-shot, post-training quantization (PTQ) for recent compression formats, uncovering some of their surprising accuracy characteristics and leading to an efficient variant with state-of-the-art accuracy. The second project, called Quartet, investigates training with NVFP4 for all the major matrix multiplications (both forward and backward), and is the first to show that training natively in this format can be optimal in terms of accuracy versus wall-clock time in certain data and compute budget regimes. Both projects come with GPU kernel support, released in our open-source library QuTLASS.

Bio: Dan Alistarh is a Professor at IST Austria, in Vienna. Previously, he was a researcher at Microsoft, a postdoc at MIT CSAIL, and received his PhD from EPFL. His research focuses on algorithms for efficient machine learning and high-performance computing, in particular scalable DNN inference and training. In his spare time, he works with the ML research team at Neural Magic/Red Hat AI on making compression faster, more accurate, and more accessible.