MoCA
Propose MoA+ to improve on Together AI’s Mixture of Agents (Jun 2024) and the Feb 2025 simplification, targeting synthetic data generation and LLM judges.
Project Description
Mixture of Agents (MoA) is an ensemble approach to language generation in which proposer and aggregator models collaboratively refine content. An MoA consists of a few layers, each containing several LLMs. After a layer completes, each agent in the next layer receives the original user query together with all outputs from the prior layer and synthesizes an improved answer. A final aggregator then produces the response returned to the user. The number of layers, the number of agents per layer, and the model diversity are all customizable. The method has shown promise, matching or exceeding the performance of top language models while relying solely on open-source models, on tasks such as LLM-as-a-Judge, synthetic data generation, and problem solving.
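The layered flow described above can be sketched in a few lines of Python. This is a minimal illustration of the general MoA pattern, not our implementation: the names `Agent`, `build_prompt`, and `run_moa` are hypothetical, the prompt wording is an assumption, and each agent is modeled as a plain function from prompt to response so the sketch stays independent of any inference backend.

```python
from typing import Callable, List, Sequence

# Assumption: an "agent" is any function mapping a prompt string to a response string.
Agent = Callable[[str], str]


def build_prompt(query: str, prior_outputs: Sequence[str]) -> str:
    """Combine the original query with all outputs from the previous layer."""
    if not prior_outputs:
        # First layer: agents see only the user query.
        return query
    numbered = "\n".join(f"{i + 1}. {out}" for i, out in enumerate(prior_outputs))
    return (
        "Synthesize an improved answer to the query using the candidate responses.\n"
        f"Query: {query}\n"
        f"Candidate responses:\n{numbered}"
    )


def run_moa(query: str, layers: Sequence[Sequence[Agent]], aggregator: Agent) -> str:
    """Run the query through each layer in order, then through the final aggregator."""
    outputs: List[str] = []
    for layer in layers:
        # Every agent in a layer gets the same prompt: query plus prior-layer outputs.
        prompt = build_prompt(query, outputs)
        outputs = [agent(prompt) for agent in layer]
    # The aggregator sees the query and the last layer's outputs.
    return aggregator(build_prompt(query, outputs))
```

In practice each agent would wrap a call to a locally served model. The number of layers and the agents per layer are just the shape of the `layers` argument, which is what makes the architecture easy to vary.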
We sought to test two architectural tweaks to the original design, calling the adjusted structure MoCA, with testing performed on a random subset of the MMLU-Pro benchmark. Because Ollama’s local-inference overhead is relatively high, the hackathon schedule limited us to a very small random subset of the benchmark. We plan to extend testing to a larger subset, or preferably the complete benchmark, in the future.
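For a small random subset to be comparable across the baseline and MoCA runs, both should see the same questions. One way to arrange that is a seeded sample, sketched below. The helper names `sample_subset` and `accuracy` are hypothetical, and exact-match scoring on the answer letter is an assumption about how multiple-choice items would be graded.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_subset(items: Sequence[T], k: int, seed: int = 0) -> List[T]:
    """Draw up to k items without replacement; the same seed gives the same subset."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    pool = list(items)
    return rng.sample(pool, min(k, len(pool)))


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)
```

Fixing the seed keeps the comparison paired: any difference in accuracy then reflects the architecture rather than a different draw of questions, though with a very small subset the estimate remains noisy.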
Baseline. We first reproduced the original MoA setup with today’s strong open-source models under 30B parameters (Qwen-3-12b, Gemma-3-12b, and Mistral-nemo) and used DeepSeek-R1-Distill-Qwen-14B as the final aggregator.
MoCA variant. We then swapped the standard architecture for our MoCA design built with the same models.
Although the hackathon theme focuses on agentic systems, our MoCA prototype has not yet been extended to autonomous task execution. Our original goal was an agentic application, but standing up the new architecture and running preliminary baseline comparisons consumed the available time. Integrating MoCA into an agentic use case remains interesting future work.
Regrettably, our baseline tests had not been completed by the submission deadline, so we aren’t yet ready to open-source our code or publish benchmark metrics. We’re happy to demo our current progress and, once we finish validation tests and additional permutations, we plan to release the code for downstream agentic use-cases and post our approach to arXiv.