NVIDIA Corporation (NASDAQ:NVDA) Q4 2023 Earnings Call Transcript

Published on February 26, 2023 at 5:17 am by Insider Monkey Transcripts in News, Transcripts


Jensen Huang: Large language models are called large because they are quite large. However, remember that we’ve accelerated and advanced AI processing by a million x over the last decade. Moore’s Law, in its best days, would have delivered 100x in a decade. By coming up with new processors, new systems, new interconnects, new frameworks and algorithms, and by working with data scientists and AI researchers on new models, across that entire span, we’ve made large language model processing a million times faster. What would have taken a couple of months in the beginning now happens in about 10 days. And of course, you still need a large infrastructure. And even for the large infrastructure, we’re introducing Hopper, which, with its transformer engine, its new NVLink switches and its new InfiniBand 400 gigabits per second data rates, lets us take another leap in the processing of large language models.

And so I think, by putting NVIDIA’s DGX supercomputers into the cloud with NVIDIA DGX Cloud, we’re going to democratize access to this infrastructure and, with accelerated training capabilities, really make this technology and this capability quite accessible. So that’s one thought. The second is that the number of large language models or foundation models that have to be developed is quite large. Different countries have different cultures, and their bodies of knowledge are different. Different fields, different domains, whether it’s imaging or it’s biology or it’s physics, each one of them needs its own domain of foundation models. With large language models, of course, we now have a prior that could be used to accelerate the development of all these other fields, which is really quite exciting.

The other thing to remember is that a number of companies in the world have their own proprietary data. The most valuable data in the world are proprietary. And they belong to the company. It’s inside their company. It will never leave the company. And that body of data will also be harnessed to train new AI models for the very first time. And so our strategy and our goal is to put the DGX infrastructure in the cloud so that we can make this capability available to every enterprise, every company in the world that would like to create proprietary models.

The second thing is about competition. We’ve had competition for a long time. Our approach, our computing architecture, as you know, is quite different on several dimensions.

Number one, it is universal, meaning you can use it for training, you can use it for inference, you can use it for models of all different types. It supports every framework. It supports every cloud. It’s everywhere. It’s cloud to private cloud, cloud to on-prem. It’s all the way out to the edge. It could be an autonomous system. This one architecture allows developers to develop their AI models and deploy them everywhere. The second very large idea is that no AI in itself is an application. There’s a preprocessing part of it and a post-processing part of it to turn it into an application or service. Most people don’t talk about the preprocessing and post-processing because it’s maybe not as sexy and not as interesting. However, it turns out that preprocessing and post-processing oftentimes consume half or two-thirds of the overall workload.

And so by accelerating the entire end-to-end pipeline, from data ingestion and data processing through preprocessing all the way to post-processing, we’re able to accelerate the entire pipeline versus just accelerating half of the pipeline. If you only accelerate half of the workload, the limit to the speedup, even if that half becomes infinitely fast, is twice as fast. Whereas if you accelerate the entire workload, you could accelerate the workload maybe 10, 20, 50x faster, which is the reason why when you hear about NVIDIA accelerating applications, you routinely hear 10x, 20x, 50x speedups. And the reason for that is because we accelerate things end to end, not just the deep learning part of it, but using CUDA to accelerate everything from end to end.
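
Editor's note: the "twice as fast" limit described above is the relationship usually called Amdahl's law, a name the speaker does not use. The short Python sketch below works through the arithmetic. The function names `overall_speedup` and `speedup_limit` and the sample fractions are illustrative assumptions, not anything taken from the call or from NVIDIA software.

```python
import math


def overall_speedup(accelerated_fraction: float, factor: float) -> float:
    """Whole-workload speedup when only part of the work is accelerated.

    accelerated_fraction: share of the original runtime that gets faster (0..1).
    factor: how many times faster that share runs (must be > 0).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    # New runtime = untouched share + accelerated share divided by its speedup.
    new_runtime = (1.0 - accelerated_fraction) + accelerated_fraction / factor
    return 1.0 / new_runtime


def speedup_limit(accelerated_fraction: float) -> float:
    """Upper bound on the speedup if the accelerated share became infinitely fast."""
    if accelerated_fraction >= 1.0:
        return math.inf
    return 1.0 / (1.0 - accelerated_fraction)


if __name__ == "__main__":
    # Hypothetical case: only the model itself (half the workload) is accelerated.
    print(speedup_limit(0.5))
    # Hypothetical case: the whole pipeline is accelerated 50x.
    print(overall_speedup(1.0, 50.0))
```

Under these assumptions, accelerating half of the workload can never beat 2x no matter how fast that half becomes, and if preprocessing and post-processing take two-thirds of the runtime, accelerating only the remaining third caps out at 1.5x. Only when the accelerated fraction approaches the whole pipeline does the overall speedup approach the per-stage factor.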