Oracle // NVIDIA Demo

Explore RAG with Oracle 23ai and NVIDIA KVDB for accelerated vector search. Discover how to enhance database capabilities for AI applications.

Oracle Database 23ai, RAG, NVIDIA KVDB

Overview

This demo integrates Oracle Database 23ai with NVIDIA KVDB for vector search.

Technical approach:

RAG blueprint with Oracle 23ai database backend
Accelerated vector search using NVIDIA KVDB

Tech stack

Oracle Database 23ai

Oracle Database 23ai is a converged, long-term support database release that integrates native vector search and JSON-relational duality to simplify AI-driven application development.

Oracle Database 23ai represents a major evolution in enterprise data management, focusing heavily on artificial intelligence and developer efficiency. By incorporating built-in AI Vector Search, developers can run fast similarity queries directly alongside business data without managing a separate vector database. The release also solves the classic object-relational mismatch through JSON Relational Duality Views: a feature allowing applications to interact with data as simple JSON documents while storing it securely in relational tables. With additional capabilities like True Cache for high-speed caching and support for property graphs, this long-term support release delivers the performance and versatility required for modern, mission-critical workloads.

https://www.oracle.com/database/
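To illustrate what "similarity queries alongside business data" means, here is a minimal in-memory sketch of exact top-k cosine search. `VECTOR_DISTANCE(..., COSINE)` is Oracle Database 23ai SQL. The table `docs`, its columns `id`, `title`, and `embedding`, and the sample vectors are hypothetical. The Python only mirrors the ranking that such a query would produce. It does not connect to a database.

```python
import math

# Hypothetical rows: (id, business column, embedding).
# In Oracle Database 23ai the embedding would be stored in a VECTOR column.
ROWS = [
    (1, "invoice policy", [0.9, 0.1, 0.0]),
    (2, "gpu memory tiers", [0.0, 0.8, 0.6]),
    (3, "refund process", [0.7, 0.3, 0.1]),
]

# Roughly the SQL this sketch mirrors (shown for reference, not executed here).
SQL = """
SELECT id, title
FROM   docs
ORDER  BY VECTOR_DISTANCE(embedding, :query_vec, COSINE)
FETCH  FIRST 2 ROWS ONLY
"""


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query_vec, rows, k):
    """Exact (brute-force) nearest-neighbour search over all rows."""
    ranked = sorted(rows, key=lambda row: cosine_distance(row[2], query_vec))
    return [(row_id, title) for row_id, title, _ in ranked[:k]]


if __name__ == "__main__":
    print(top_k([1.0, 0.2, 0.0], ROWS, k=2))
```

Because the vectors sit next to ordinary columns such as `title`, a single query can return business data ranked by semantic similarity. No separate vector store is needed. This sketch scans every row. A production deployment would more likely rely on a vector index and approximate search.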

RAG

RAG (Retrieval-Augmented Generation) is a GenAI technique that grounds LLMs (like GPT-4) in external, verified data. This reduces model hallucinations and lets responses point to verifiable sources.

RAG is a widely used GenAI architecture that mitigates the LLM "hallucination" problem by adding a retrieval step before generation. The user query is embedded as a vector. That vector is used to search an external knowledge base (e.g., a Pinecone vector database) for relevant document chunks, which are often a few hundred tokens long, such as 512-token segments. The retrieved passages are added to the original prompt, giving the LLM (e.g., Gemini or Llama 3) the specific, current, or proprietary context it needs. This grounds the final response in domain-specific data without the cost and latency of retraining the model.

https://en.wikipedia.org/wiki/Retrieval-augmented_generation
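The retrieve-then-augment flow above can be sketched end to end in plain Python. This is a toy illustration under stated assumptions. `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `chunk` splits on whitespace words rather than model tokens. The sketch stops at the augmented prompt, where a real pipeline would send it to an LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (hypothetical stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def chunk(text, size):
    """Split text into chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(query, chunks, k):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, context):
    """Augment the user question with retrieved context before generation."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"


KNOWLEDGE_BASE = (
    "Refunds are accepted within thirty days. "
    "Standard shipping takes five business days. "
    "Support answers email around the clock."
)

if __name__ == "__main__":
    chunks = chunk(KNOWLEDGE_BASE, size=6)
    question = "How many days for refunds?"
    # The augmented prompt is what would be sent to the LLM.
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

Only the retrieved chunk reaches the prompt, so the model answers from the supplied refund policy instead of relying on its training data. In this demo's setup, the retrieval step would presumably run against the Oracle Database 23ai backend rather than an in-memory list.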

NVIDIA KVBM

NVIDIA KVBM is a high-performance, tiered memory manager that offloads and shares LLM key-value caches across GPU, CPU, SSD, and remote storage.

NVIDIA KVBM (Key-Value Block Manager) is the memory orchestration layer within the NVIDIA Dynamo inference framework. It relieves GPU memory pressure in long-context and agentic AI workloads by managing a four-tier storage hierarchy: GPU HBM, pinned CPU host memory, local SSDs, and remote cloud or object storage. KVBM abstracts KV cache management away from specific engine runtimes such as vLLM and TensorRT-LLM. This lets teams store, retrieve, and share context blocks across distributed GPU clusters. Reusing cached blocks avoids expensive recomputation of shared prefixes and reduces time to first token. It also lets multi-turn AI conversations scale without additional GPU hardware.

https://github.com/ai-dynamo/dynamo
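To make the tiering idea concrete, here is a hypothetical toy model in Python. It is not the KVBM API, whose real interfaces live in the Dynamo repository. Several choices here are assumptions for illustration: the LRU demotion policy, the per-tier capacities, the tier names, and the `prefix_block_keys` scheme. KV blocks are keyed by the token prefix they cover. When a tier overflows, its least-recently-used block moves one tier down. A lookup promotes the block back to the GPU tier.

```python
from collections import OrderedDict

# Tier names are illustrative labels for the four levels described above.
TIERS = ["gpu_hbm", "cpu_pinned", "local_ssd", "remote"]


def prefix_block_keys(tokens, block_size):
    """Key each full block by the token prefix ending at it (hypothetical scheme)."""
    return [tuple(tokens[:end]) for end in range(block_size, len(tokens) + 1, block_size)]


class TieredBlockCache:
    """Toy tiered KV-block store (NOT the KVBM API).

    Each tier is an LRU map from block key to payload. Overflowing a tier
    demotes its least-recently-used block to the next tier; the last tier
    ("remote") is treated as unbounded.
    """

    def __init__(self, capacities):
        if len(capacities) != len(TIERS) - 1:
            raise ValueError("need one capacity per bounded tier")
        self.capacities = list(capacities) + [None]  # None = unbounded
        self.tiers = [OrderedDict() for _ in TIERS]

    def _put(self, level, key, value):
        tier = self.tiers[level]
        tier[key] = value
        tier.move_to_end(key)  # mark as most recently used
        cap = self.capacities[level]
        if cap is not None and len(tier) > cap:
            old_key, old_value = tier.popitem(last=False)  # LRU block
            self._put(level + 1, old_key, old_value)       # demote one tier

    def store(self, key, value):
        """Place a freshly computed block in the GPU tier."""
        self._put(0, key, value)

    def lookup(self, key):
        """Return (tier_name, value) and promote the block to GPU, or None on a miss."""
        for level, tier in enumerate(self.tiers):
            if key in tier:
                value = tier.pop(key)
                self._put(0, key, value)
                return TIERS[level], value
        return None


if __name__ == "__main__":
    cache = TieredBlockCache([2, 2, 2])
    turn1 = [101, 7, 42, 9, 13, 5, 8, 21]
    for key in prefix_block_keys(turn1, block_size=4):
        cache.store(key, f"kv for {len(key)} tokens")
    turn2 = turn1 + [34, 55, 89, 144]  # next turn extends the same prefix
    for key in prefix_block_keys(turn2, block_size=4):
        hit = cache.lookup(key)
        status = f"hit in {hit[0]}" if hit else "miss: compute and store"
        print(len(key), status)
```

In the usage example, the second turn finds its first two prefix blocks already cached. Only the new block would need to be computed, which is the prefix-reuse effect described above, in miniature.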
