
Ultravox: Open Source Speech LLM

Presented by Justin Uberti on June 07, 2024 at AI Tinkerers Seattle - June 2024

Abstract

Ultravox is a new multimodal LLM that understands speech directly; unlike current voice AI stacks, it does not require a separate speech recognition stage. This approach makes voice AI applications faster and more robust, and lets them pick up on the non-textual parts of speech.

It builds on a Llama 3 backbone, which means that it can be trained much faster than a typical foundation model. We've just open-sourced Ultravox at https://ultravox.ai and are working on growing a community around it.
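
The abstract does not spell out the architecture, so the following is only a rough sketch of one common way an LLM can consume speech without a transcription step: audio-encoder features are projected into the model's embedding space and spliced in between the text embeddings of the prompt. This design is an assumption made for illustration. The function names, dimensions, and random values are all hypothetical and are not Ultravox's actual API or weights.

```python
import random

D_AUDIO = 8    # hypothetical size of one audio-encoder feature frame
D_MODEL = 16   # hypothetical size of one LLM token embedding


def project_audio(frames, weights):
    """Linearly project audio frames (T x d_audio) into LLM space (T x d_model).

    `weights` is a d_audio x d_model matrix stored as a list of rows.
    """
    columns = list(zip(*weights))  # d_model columns, each of length d_audio
    return [
        [sum(x * w for x, w in zip(frame, col)) for col in columns]
        for frame in frames
    ]


def build_llm_input(prefix_emb, audio_emb, suffix_emb):
    """Splice projected audio between the text embeddings of the prompt."""
    return prefix_emb + audio_emb + suffix_emb


def random_matrix(rows, cols, rng):
    """Stand-in for real encoder outputs or learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
audio_frames = random_matrix(5, D_AUDIO, rng)      # 5 frames of "speech"
projector = random_matrix(D_AUDIO, D_MODEL, rng)   # assumed learned projector
prefix_emb = random_matrix(3, D_MODEL, rng)        # text tokens before the audio
suffix_emb = random_matrix(2, D_MODEL, rng)        # text tokens after the audio

audio_emb = project_audio(audio_frames, projector)
llm_input = build_llm_input(prefix_emb, audio_emb, suffix_emb)
print(len(llm_input), len(llm_input[0]))
```

In a setup like this, the language model never sees a transcript. It receives a single sequence of embeddings, some of which come from text and some from audio, which is what would let it skip the separate recognition stage.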

Justification

With the announcement of GPT-4o, speech LLMs have been in the spotlight. Ultravox shows that there is a path to supporting the same sort of functionality with open source models. Accordingly, this talk will be useful for people who are building voice AI applications or who are interested in pushing open source AI forward. The talk will include a brief discussion of multimodality, an overview of the Ultravox architecture, a basic API walkthrough, and finally, an end-to-end demo.

ultravox.ai ai.town