Scaling LLM Inference with vLLM and AWS Trainium

Join us in this hands-on workshop to learn how to deploy and optimize large language models (LLMs) for inference at enterprise scale. Participants will orchestrate distributed LLM serving with vLLM on Amazon EKS, enabling robust, flexible, and highly available deployments. The session demonstrates how to use AWS Trainium hardware within EKS to maximize throughput and cost efficiency, leveraging Kubernetes-native features for automated scaling, resource management, and seamless integration with AWS services.
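
To give a flavor of the workshop's subject matter, below is a minimal sketch of running vLLM on a Trainium (Neuron) device, following the pattern of vLLM's published offline Neuron example. The model name and parameter values are illustrative assumptions, not details from the session, and exact flags may vary across vLLM versions.

```python
# Minimal sketch: vLLM inference on AWS Trainium via the Neuron backend.
# Assumes vLLM is installed with Neuron support on a Trainium instance;
# model choice and all numeric values here are illustrative only.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example model, swap for your own
    device="neuron",          # route execution to Trainium NeuronCores
    tensor_parallel_size=2,   # shard the model across two NeuronCores
    max_num_seqs=8,           # continuous-batching width
    max_model_len=2048,
)

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(
    ["Explain continuous batching in one sentence."], params
)
for out in outputs:
    print(out.outputs[0].text)
```

In production the same engine would typically run behind vLLM's OpenAI-compatible server in an EKS pod scheduled onto Trainium nodes, with Kubernetes handling scaling and placement, which is the deployment pattern this workshop walks through.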

Location: Room 206

Duration: 1 hour

Sponsor(s): 
AWS
Speaker(s): 

Asheesh Goja
Principal GenAI Solutions Architect
AWS

Pinak Panigrahi
Sr. Machine Learning Architect - Annapurna ML
AWS
Session Type: 
General Session (Presentation)