How to finetune on open-source permissive-use data for hallucination detection

Explores finetuning on open-source, permissive-use data to improve hallucination detection in news summaries, using out-of-domain samples to boost model performance on the Factual Inconsistency Benchmark (FIB).

Overview

It’s easy to finetune models for our specific tasks; we just need a few hundred to a few thousand samples. However, collecting these samples is costly and time-consuming. What if we could bootstrap our tasks with out-of-domain data? We’ll explore this idea here. The task is to finetune a model to detect factual inconsistencies (aka hallucinations). We’ll focus on news summaries in the Factual Inconsistency Benchmark (FIB). It is a challenging dataset: even after finetuning for 10 epochs on the training split, the model still does poorly on the validation split. But if we first finetune on out-of-domain data, we’ll see how the model improves.
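To make the setup concrete, here is a minimal sketch of how the two finetuning stages could be laid out as data, assuming the task is framed NLI-style with the source document as the premise and the summary as the hypothesis. The label ids, the Pair class, the helper names, and the toy samples are all hypothetical and are not taken from the original project; the actual training loop is left out.

```python
from dataclasses import dataclass

# Label ids are an assumption for this sketch, not the project's actual mapping.
CONSISTENT, INCONSISTENT = 0, 1


@dataclass
class Pair:
    premise: str     # source document
    hypothesis: str  # candidate summary
    label: int       # CONSISTENT or INCONSISTENT


def to_pairs(samples):
    """Turn (document, summary, is_consistent) tuples into labelled pairs."""
    return [
        Pair(doc, summary, CONSISTENT if ok else INCONSISTENT)
        for doc, summary, ok in samples
    ]


def training_stages(out_of_domain, in_domain):
    """Return the stages in order: out-of-domain first, then the target task."""
    return [
        ("out_of_domain", to_pairs(out_of_domain)),
        ("in_domain", to_pairs(in_domain)),
    ]


# Toy, made-up samples for illustration only.
ood_samples = [("The cat sat on the mat.", "A cat was on a mat.", True)]
fib_samples = [("Shares rose 3% on Monday.", "Shares fell 3% on Monday.", False)]

stages = training_stages(ood_samples, fib_samples)
for name, pairs in stages:
    print(name, [p.label for p in pairs])
```

Each stage would then be passed to whatever training routine is in use, with the model weights carried over from the first stage to the second.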

Links

https://eugeneyan.com/writing/finetuning/
Finetuning a BART NLI model on out-of-domain data significantly improves hallucination detection.
https://github.com/eugeneyan/visualizing-finetunes
This project demonstrates that out-of-domain pre-finetuning dramatically boosts hallucination detection performance.

Tech stack