VVZ API is not affiliated with ETH Zurich. Data might be outdated or incorrect. Please view the official ETHZ Vorlesungsverzeichnis for binding information.

263-5059-00L · 3 Credits · MSC, WBZ · D-MATH, D-INFK

Large-Scale AI Engineering

Lecturers & Examiners: Dr. Arnout Devos, Dr. Imanol Schlag
VVZ CR 4.1

Last Updated: 2026-06-01 11:33:07

Abstract

This course focuses on the engineering principles and practices required to develop and optimize large-scale AI systems. Students will gain hands-on experience with high-performance computing (HPC) infrastructures, with an emphasis on deploying and scaling AI models on advanced GPU clusters.

Objective

By the end of this course, students will be able to:
1. Understand the architecture and components of large-scale AI systems.
2. Apply HPC techniques to enhance the performance of AI model training and inference.
3. Implement optimizations, such as model parallelization, in AI workflows.
4. Collaborate effectively in teams to improve AI system throughput and scalability.

Content

1. Introduction to Large-Scale AI Systems: Overview of AI architectures and challenges in scaling.
2. High-Performance Computing Fundamentals: Principles of HPC, including parallel computing and GPU acceleration.
3. AI Model Optimization Techniques: Strategies such as FP8 precision and flash attention to improve efficiency.
4. Efficient Distributed Workload Execution: Deploying and managing large-scale AI workloads on advanced HPC infrastructure.
5. AI Hardware Overview: Latest advancements in AI hardware, including GPUs, specialized AI accelerators, and emerging technologies.
6. Performance Monitoring and Profiling: Tools and methods for assessing and enhancing system performance.
7. Team-Based Projects: Collaborative efforts to optimize AI models, culminating in a competition to achieve the highest throughput.

Resources

Learning Materials (Links)

General Information

Language
English
Levels
MSC, WBZ
Frequency
Semesterly recurring

Examination

Type
ungraded semester performance
Last cancellation/deregistration date for this ungraded semester performance: Sunday, 2 March 2025. No deregistration will be accepted after that date, and the course will then be considered a "fail".

Registration & Places

Max Places
80
Signup End
02.03.2025
Priority: Until 24.02.2025, registration for the course unit is only possible for the primary target group.

Course Components

Type
lecture
Title
Large-Scale AI Engineering
Time & Place
  • Tue 16:00-18:00 (OAT S 15)
  • Tue 16:00-18:00 (OAT S 16)
Hours
2 h weekly

Offered In