Beyond RAGAS: Advanced Evaluation Frameworks for RAG and Agent Natural Language Outputs and Planning

Learn practical methods to evaluate RAG and agent outputs using semantic similarity, factual consistency checks, hallucination detection, and planning assessment through live coding examples.

Overview

This hands-on demo introduces practical approaches to comprehensive agent evaluation that go beyond basic RAGAS metrics. We’ll explore multi-dimensional evaluation frameworks that combine semantic similarity, factual consistency checking, and novel hallucination detection methods, demonstrated through live coding examples using real-world agent outputs. We will investigate the nuts and bolts of RAGAS and where it might fail. We will also try to understand the fundamental framework for evaluating agent outputs and planning/reasoning.
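
To give a rough feel for what "multi-dimensional" could mean here, the following is a minimal sketch, not the method used in the demo. It assumes two deliberately simple stand-ins: bag-of-words cosine in place of embedding-based semantic similarity, and a token-overlap check against the retrieved context in place of a real factual consistency or hallucination detector. All function names and the 0.5 threshold are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word/number tokens; a crude tokenizer for illustration."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Bag-of-words cosine (assumed stand-in for embedding similarity)."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def unsupported_sentences(answer, context, threshold=0.5):
    """Flag answer sentences whose tokens mostly do not appear in the context.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    ctx = set(tokens(context))
    flagged = []
    for sent in re.split(r"(?<=[.!?])\s+", answer.strip()):
        toks = tokens(sent)
        if not toks:
            continue
        support = sum(t in ctx for t in toks) / len(toks)
        if support < threshold:
            flagged.append(sent)
    return flagged


def evaluate(answer, reference, context):
    """Report both dimensions side by side instead of one blended score."""
    return {
        "similarity": cosine_similarity(answer, reference),
        "unsupported": unsupported_sentences(answer, context),
    }
```

Keeping the dimensions separate in the result is a design choice of this sketch: an answer can be close to the reference overall and still contain a sentence that the retrieved context does not support.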

Tech stack