Hyparquet: Parquet in Browser
Parse Apache Parquet files in the browser with a pure‑JavaScript library, streaming multi‑gigabyte LLM datasets from URLs and exploring them without any backend.
Six months ago I started looking at LLM datasets on Hugging Face. Most of them are in Apache Parquet format. I wanted to LOOK at the data in these files. Of course it is possible to query Parquet in DuckDB or a notebook, but if you wanted to view it in a modern UI, the browser, there was no good way to do so. So I built Hyparquet, an open source Apache Parquet parser written in pure JavaScript that can stream data from remote URLs. I will talk about how I built it and why it's useful, and give a demo of loading massive (multi-gigabyte) datasets in the browser with no backend.
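To make the "stream from remote URLs" idea concrete, here is a minimal sketch of how a browser-side reader could locate a Parquet file's footer using HTTP range requests. It is not Hyparquet's actual code. It relies only on the documented Parquet layout (the file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`) and assumes the server reports `Content-Length` and honors `Range` headers. The function names `footerRange`, `fetchRange`, and `fetchFooter` are hypothetical.

```javascript
// Sketch only: illustrates range-based footer lookup, not Hyparquet's implementation.
const MAGIC = 'PAR1'

/**
 * Given the last 8 bytes of a Parquet file and the total file size,
 * return the byte range [start, end) that holds the footer metadata.
 */
function footerRange(tail, fileSize) {
  if (tail.byteLength !== 8) throw new Error('expected the last 8 bytes of the file')
  // Bytes 4..7 must be the trailing magic number.
  const magic = String.fromCharCode(...tail.subarray(4, 8))
  if (magic !== MAGIC) throw new Error('not a Parquet file: missing PAR1 magic')
  // Bytes 0..3 are the metadata length, little-endian.
  const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength)
  const metadataLength = view.getUint32(0, true)
  const end = fileSize - 8
  const start = end - metadataLength
  // The file also starts with a 4-byte magic, so metadata cannot begin before offset 4.
  if (start < 4) throw new Error('invalid footer length')
  return { start, end }
}

/** Fetch bytes [start, end) of a remote file. Assumes the server supports Range. */
async function fetchRange(url, start, end) {
  const res = await fetch(url, { headers: { Range: `bytes=${start}-${end - 1}` } })
  if (res.status !== 206) throw new Error(`range request not honored: HTTP ${res.status}`)
  return new Uint8Array(await res.arrayBuffer())
}

/** Download only the footer metadata bytes instead of the whole file. */
async function fetchFooter(url) {
  const head = await fetch(url, { method: 'HEAD' })
  const fileSize = Number(head.headers.get('Content-Length'))
  if (!Number.isFinite(fileSize) || fileSize < 12) throw new Error('unknown or invalid file size')
  const tail = await fetchRange(url, fileSize - 8, fileSize)
  const { start, end } = footerRange(tail, fileSize)
  return fetchRange(url, start, end)
}
```

The footer returned by `fetchFooter` is Thrift-encoded metadata describing the row groups and column chunks, so a reader can then request only the byte ranges it needs to display. That is what makes it feasible to explore a multi-gigabyte file without downloading it in full.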