You must be an AI Tinkerers active member to view these talks and demos.
Beyond RAGAS: Agent Evaluation
Learn practical methods to evaluate RAG and agent outputs using semantic similarity, factual consistency checks, hallucination detection, and planning assessment through live coding examples.
This hands-on demo will introduce practical approaches to comprehensive agent evaluation that go beyond basic RAGAS metrics. We’ll explore multi-dimensional evaluation frameworks that combine semantic similarity, factual consistency checking, and novel hallucination detection methods, demonstrated through live coding examples on real-world agent outputs. We will dig into the nuts and bolts of RAGAS and where it can fall short. We will also work toward a fundamental framework for evaluating agent outputs and planning/reasoning.
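As a rough illustration of what a multi-dimensional evaluation can look like, here is a minimal Python sketch. It is not the method presented in the demo and does not use the RAGAS library. Every function name, the stopword list, the 0.5 support threshold, and the scoring weights are hypothetical choices made for illustration. Term-frequency cosine similarity stands in for embedding-based semantic similarity, a lexical-overlap check stands in for model-based factual consistency and hallucination detection, and planning is scored by how much of an expected step sequence appears, in order, in the agent's trace.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical, deliberately tiny stopword list used only for this sketch.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "it",
             "of", "on", "or", "the", "to", "was", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def split_sentences(text: str) -> list[str]:
    """Naive split on ., ! or ? followed by whitespace; drops sentences with no content words."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if tokenize(s)]


def semantic_similarity(a: str, b: str) -> float:
    """Cosine similarity of term-frequency vectors (stand-in for embedding similarity)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(count * vb[term] for term, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def unsupported_sentences(answer: str, context: str, threshold: float = 0.5) -> list[str]:
    """Flag answer sentences whose content words are mostly absent from the context.

    A crude lexical grounding check, not an NLI- or LLM-based consistency judge.
    """
    context_vocab = set(tokenize(context))
    flagged = []
    for sentence in split_sentences(answer):
        words = tokenize(sentence)
        support = sum(w in context_vocab for w in words) / len(words)
        if support < threshold:
            flagged.append(sentence)
    return flagged


def plan_adherence(expected: list[str], executed: list[str]) -> float:
    """Longest prefix of `expected` found in order within `executed`, as a fraction of the plan."""
    if not expected:
        return 1.0
    matched = 0
    for step in executed:
        # Greedy matching is sufficient for an in-order subsequence test.
        if matched < len(expected) and step == expected[matched]:
            matched += 1
    return matched / len(expected)


@dataclass
class EvalResult:
    similarity: float  # answer vs. reference answer
    grounding: float   # share of answer sentences not flagged as unsupported
    plan: float        # plan adherence of the agent trace
    flagged: list[str]

    def overall(self, weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
        """Weighted sum of the three dimensions; the default weights are arbitrary."""
        w_sim, w_ground, w_plan = weights
        return w_sim * self.similarity + w_ground * self.grounding + w_plan * self.plan


def evaluate(answer: str, reference: str, context: str,
             expected_plan: list[str], executed_plan: list[str]) -> EvalResult:
    sentences = split_sentences(answer)
    flagged = unsupported_sentences(answer, context)
    grounding = 1 - len(flagged) / len(sentences) if sentences else 0.0
    return EvalResult(
        similarity=semantic_similarity(answer, reference),
        grounding=grounding,
        plan=plan_adherence(expected_plan, executed_plan),
        flagged=flagged,
    )


if __name__ == "__main__":
    result = evaluate(
        answer="The Eiffel Tower was completed in 1889. It was designed by aliens from Mars.",
        reference="The Eiffel Tower was finished in 1889.",
        context="Paris is the capital of France. The Eiffel Tower was completed in 1889.",
        expected_plan=["search", "read", "answer"],
        executed_plan=["search", "read", "summarize", "answer"],
    )
    print(result.flagged)  # ['It was designed by aliens from Mars.']
    print(round(result.overall(), 3))  # 0.612
```

Lexical overlap misses paraphrases and cannot catch a sentence that reuses context words to state something false, so in practice the similarity and grounding functions would typically be replaced by embedding models or LLM-based judges while keeping the same per-dimension structure.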