LLM Evaluation Labeling Workflow
Learn an effective workflow for labeling data, evaluating LLM-evaluators against human judgments, and potentially optimizing them. This talk offers a practical approach to assessing LLM-powered experiences.
A demo workflow and UX for labeling data, using those labels to evaluate LLM-evaluators, and then aligning the LLM-evaluator with human judgments (and perhaps optimizing the evaluator!).
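To make "evaluating an LLM-evaluator against human judgments" concrete, here is a minimal sketch of one way to score agreement once both label sets exist. It is not the workflow from the talk: the binary label scheme (1 = flagged as a defect, 0 = pass), the function name `agreement_report`, and the choice of metrics (accuracy, precision, recall, Cohen's kappa) are all assumptions made for illustration.

```python
from collections import Counter


def agreement_report(human, llm):
    """Compare binary LLM-evaluator verdicts with human labels.

    Assumed convention (hypothetical): 1 = flagged as a defect, 0 = pass.
    Human labels are treated as the ground truth.
    """
    if len(human) != len(llm) or not human:
        raise ValueError("label lists must be non-empty and the same length")

    # Count each (human, llm) pair to build the confusion matrix.
    pairs = Counter(zip(human, llm))
    tp, fp = pairs[(1, 1)], pairs[(0, 1)]
    fn, tn = pairs[(1, 0)], pairs[(0, 0)]
    n = len(human)

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fn) / n) * ((tp + fp) / n) + ((tn + fp) / n) * ((tn + fn) / n)
    # Kappa is undefined when chance agreement is 1; this sketch reports 0.0 then.
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "kappa": kappa,
    }


# Toy labels, invented for this example only.
human_labels = [1, 1, 0, 0, 1, 0, 0, 0]
llm_labels = [1, 0, 0, 0, 1, 1, 0, 0]
report = agreement_report(human_labels, llm_labels)
print(report)
```

Kappa is included alongside accuracy because raw agreement can look high when one label dominates; aligning the evaluator would then mean revising its prompt or examples and re-running a report like this on held-out labels.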
P.S. Kyle Corbitt of OpenPipe will be demoing something similar. I hope to go right before them and make it a faceoff, so the audience can weigh the pros and cons of each approach.