Join us in this hands-on workshop on deploying and optimizing large language models (LLMs) for inference at enterprise scale. Participants will orchestrate distributed LLM serving with vLLM on Amazon EKS, enabling robust, flexible, and highly available deployments. The session demonstrates how to use AWS Trainium-based instances within EKS to maximize throughput and cost efficiency, leveraging Kubernetes-native features for automated scaling, resource management, and seamless integration with AWS services.
Location: Room 206
Duration: 1 hour
Sponsor(s):
AWS
Speaker(s):

Asheesh Goja
Principal GenAI Solutions Architect
AWS

Pinak Panigrahi
Sr. Machine Learning Architect - Annapurna ML
AWS
Session Type:
General Session (Presentation)