Detecting Data in LLM Training
We present a method to test whether a specific text corpus appears in a language model’s training data, using statistical probing and verification techniques.
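To make "statistical probing" concrete, here is a minimal sketch of one publicly described approach, a Min-K%-style score followed by a permutation test. It is not necessarily the method shown in the demo. It assumes you can already obtain per-token log-probabilities from the model under test. The function names `min_k_score` and `permutation_p_value`, the default `k=0.2`, and the idea of comparing against a reference set of texts known to be absent from training are all illustrative assumptions.

```python
import random
from statistics import mean


def min_k_score(token_logprobs, k=0.2):
    """Mean of the lowest k fraction of token log-probabilities for one text.

    Assumption: token_logprobs come from the model being probed.
    Text seen in training tends to have fewer very unlikely tokens,
    so its score tends to be higher (closer to zero).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    n = max(1, int(len(token_logprobs) * k))
    return mean(sorted(token_logprobs)[:n])


def permutation_p_value(candidate_scores, reference_scores, trials=10_000, seed=0):
    """One-sided permutation test on the difference of mean scores.

    candidate_scores: scores for texts from the corpus under suspicion.
    reference_scores: scores for comparable texts assumed NOT to be in training.
    Returns the estimated probability of seeing a difference at least this
    large if both groups came from the same distribution.
    """
    rng = random.Random(seed)
    observed = mean(candidate_scores) - mean(reference_scores)
    pooled = list(candidate_scores) + list(reference_scores)
    n = len(candidate_scores)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        if mean(pooled[:n]) - mean(pooled[n:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)
```

A small p-value suggests the candidate corpus is scored as more "familiar" than the reference texts, which is evidence of exposure rather than proof. The result depends heavily on how well the reference set matches the candidate corpus in style, domain, and date.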
Whether private data has been used to train an LLM has long been debated, and it is difficult to prove or disprove.
In this demo, we show a mechanism to detect whether a particular corpus has been used in training.
While it is hard (and may be impossible) to show the source from which data was scraped, it is not that challenging to show that certain data has been used in our