NVIDIA Accelerates Generative AI on Windows PCs with TensorRT-LLM

Generative AI, one of the most important trends in personal computing, is getting a significant performance boost. NVIDIA’s TensorRT-LLM for Windows, an open-source library for accelerating inference, is set to speed up large language models such as Llama 2 and Code Llama, making them run up to four times faster on RTX-powered Windows PCs. The release follows last month’s introduction of TensorRT-LLM for data centers.

Supercharging Language Models for Enhanced User Experience

NVIDIA’s GeForce RTX and NVIDIA RTX GPUs, equipped with dedicated AI processors called Tensor Cores, already bring generative AI to more than 100 million Windows PCs and workstations. Large language models running on these machines handle a wide range of tasks, from chat and document summarization to drafting emails and blog posts. TensorRT-LLM’s acceleration is especially useful for more demanding applications such as writing and coding assistants, which can now generate several distinct auto-complete suggestions at once.

Integration with Retrieval-Augmented Generation

The speed boost also helps when language models are combined with other techniques such as retrieval-augmented generation (RAG). In RAG, a language model is paired with a vector library or database, so it can draw on a specific dataset and give more targeted responses. In one example, the Llama 2 base model returned a more accurate answer, and returned it faster, when it was paired with RAG and accelerated by TensorRT-LLM.
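To make the retrieval step of RAG more concrete, here is a minimal, self-contained sketch in Python. It is not NVIDIA’s reference project and does not use TensorRT-LLM. The word-count "embedding", the `VectorStore` class and the `build_prompt` function are hypothetical stand-ins for a real embedding model and vector database. The sketch embeds a few documents, ranks them against a question by cosine similarity, and places the best matches into the prompt that would then be sent to a model such as Llama 2.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing words
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VectorStore:
    """Hypothetical in-memory vector store holding (vector, text) pairs."""

    def __init__(self) -> None:
        self.entries: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self.entries.append((embed(text), text))

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(store: VectorStore, question: str, k: int = 2) -> str:
    """Prepend the retrieved context to the question (hypothetical prompt format)."""
    context = "\n".join(f"- {doc}" for doc in store.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    store = VectorStore()
    store.add("TensorRT-LLM for Windows accelerates Llama 2 and Code Llama on RTX GPUs.")
    store.add("RTX Video Super Resolution 1.5 adds support for Turing GPUs.")
    store.add("TensorRT speeds up Stable Diffusion by up to two times.")
    # The prompt string is what would be passed to the language model.
    print(build_prompt(store, "Which GPUs does Video Super Resolution support?", k=1))
```

In a real deployment the toy `embed` function would be replaced by a learned embedding model and the linear scan in `search` by a vector database index; the overall flow of retrieving relevant text and then prompting the model with it stays the same.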

Developer Tools and Beyond Language Models

To help developers optimize their own language models, NVIDIA has released a set of tools, including scripts for custom model optimization and a developer reference project that demonstrates both the speed and the quality of language model responses when accelerated by TensorRT. TensorRT is not limited to language models either: it now speeds up Stable Diffusion, a generative AI diffusion model, by up to two times.

Updates in Video Super Resolution

Today’s Game Ready Driver release also brings RTX Video Super Resolution (VSR) version 1.5. The update further improves visual quality and adds support for RTX GPUs based on the NVIDIA Turing architecture.

The Future of AI with NVIDIA

With these advancements, NVIDIA continues to lead in the AI era, supercharging a wide array of applications from gaming and video to development and more. TensorRT-LLM will soon be available for download from the NVIDIA Developer website, offering a new level of performance and quality for end-users and developers alike.

GGWPTECH | Tech, Gaming & Music Gear