Scaling LLM Inference with vLLM and AWS Trainium

Join us in this hands-on workshop to learn how to deploy and optimize large language models (LLMs) for inference at enterprise scale. Participants will orchestrate distributed LLM serving with vLLM on Amazon EKS, enabling robust, flexible, and highly available deployments. The session demonstrates how to use AWS Trainium hardware within EKS to maximize throughput and cost efficiency, leveraging Kubernetes-native features for automated scaling, resource management, and seamless integration with AWS services.
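
To give a flavor of the workshop's subject matter, below is a minimal sketch of running vLLM on a Trainium (Neuron) device, following the pattern of vLLM's published offline Neuron example. The model name and parameter values are illustrative assumptions, not details from the session, and exact flags may vary across vLLM versions.

```python
# Minimal sketch: vLLM inference on AWS Trainium via the Neuron backend.
# Assumes vLLM is installed with Neuron support on a Trainium instance;
# model choice and all numeric values here are illustrative only.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example model, swap for your own
    device="neuron",          # route execution to Trainium NeuronCores
    tensor_parallel_size=2,   # shard the model across two NeuronCores
    max_num_seqs=8,           # continuous-batching width
    max_model_len=2048,
)

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(
    ["Explain continuous batching in one sentence."], params
)
for out in outputs:
    print(out.outputs[0].text)
```

In production the same engine would typically run behind vLLM's OpenAI-compatible server in an EKS pod scheduled onto Trainium nodes, with Kubernetes handling scaling and placement, which is the deployment pattern this workshop walks through.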

Location: Room 206

Duration: 1 hour

Sponsor(s): 
AWS
Speaker(s): 

Asheesh Goja
Principal GenAI Solutions Architect
AWS

Pinak Panigrahi
Sr. Machine Learning Architect - Annapurna ML
AWS
Session Type: 
General Session (Presentation)