LABEL: LLM Assistance for Better Evaluation Labels

Learn an effective workflow for labeling data, evaluating LLM-evaluators against human judgments, and potentially optimizing those evaluators. This talk offers a practical approach to assessing LLM-powered experiences.

Overview

This talk demos a workflow and UX for labeling data, using those labels to evaluate LLM-evaluators, and then aligning the LLM-evaluator to human judgments (and perhaps optimizing the evaluator!).
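
To make the evaluation step a bit more concrete, here is a minimal sketch of how an LLM-evaluator's labels might be scored against human labels. It is not taken from the label-app repo: the function name `agreement_metrics`, the `"pass"`/`"fail"` label values, and the choice of precision, recall, and Cohen's kappa as metrics are all assumptions made for illustration.

```python
from collections import Counter


def agreement_metrics(human, llm, positive="fail"):
    """Score LLM-evaluator labels against human labels.

    Hypothetical helper: human labels are treated as the reference,
    and `positive` is the label we most care about catching.
    """
    if len(human) != len(llm) or not human:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(human)
    pairs = list(zip(human, llm))

    # Confusion counts for the positive class.
    tp = sum(h == positive and m == positive for h, m in pairs)
    fp = sum(h != positive and m == positive for h, m in pairs)
    fn = sum(h == positive and m != positive for h, m in pairs)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance is estimated from each rater's label frequencies.
    observed = sum(h == m for h, m in pairs) / n
    human_counts, llm_counts = Counter(human), Counter(llm)
    expected = sum(human_counts[k] * llm_counts[k] for k in human_counts) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0

    return {"precision": precision, "recall": recall, "kappa": kappa}


if __name__ == "__main__":
    # Made-up labels, purely for demonstration.
    human_labels = ["pass", "fail", "fail", "pass", "pass", "fail"]
    llm_labels = ["pass", "fail", "pass", "pass", "fail", "fail"]
    print(agreement_metrics(human_labels, llm_labels))
```

Tracking these numbers as more data gets labeled is one way to see whether changes to the evaluator's prompt are bringing it closer to human judgment.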

P.S. Kyle Corbitt of OpenPipe will be demoing something similar. I hope to go right before their demo so that we can have a faceoff and the audience can weigh the pros and cons of each.

Links

https://github.com/eugeneyan/label-app

Tech stack