Production AI Evals & Observability

This workshop covers building evaluation pipelines, implementing real-time observability, collecting user feedback, and optimizing AI system performance in production environments.

Overview

This hands-on, 4-hour workshop focuses on building and evaluating AI applications in production. It covers logging and analyzing system behavior, collecting user feedback, designing evaluations, and optimizing performance. Participants work through practical exercises to build evaluation pipelines and implement observability tools, supported by real-world case studies and troubleshooting examples.

Links

https://wandb.me/weave-seattle
W&B Weave: a framework, used through its SDKs, for tracking, evaluating, and improving LLM applications.

Tech stack