Cache-Augmented Generation (CAG) is an alternative to Retrieval-Augmented Generation (RAG) that improves the efficiency of Large Language Models (LLMs) by preloading knowledge into the model's context, rather than retrieving it at inference time.
How CAG works:
Preloading: Relevant documents are identified, preprocessed, and loaded into the LLM's context window at initialization.
Caching: The model runs one forward pass over this preloaded knowledge and stores the resulting key-value (KV) cache, so the knowledge never has to be re-encoded.
Query Processing: When a user query arrives, the LLM answers directly from the preloaded context and the cached keys/values; no retrieval step runs (see the sketch after this list).
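To make the flow concrete, here is a minimal sketch of preload, cache, and query using the Hugging Face transformers library. The model name, the knowledge.txt corpus, and the answer helper are placeholders chosen for illustration, not part of any official CAG implementation, and the cache-rewind step assumes a transformers version that ships DynamicCache.crop.

```python
# Minimal CAG-style preload/cache sketch with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)

# 1. Preloading: concatenate the (preprocessed) knowledge into one prompt prefix.
knowledge = open("knowledge.txt").read()  # placeholder corpus
prefix = f"Answer using only the following documents:\n{knowledge}\n"
prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids.to(model.device)

# 2. Caching: one forward pass materializes the key-value (KV) cache for the
#    whole prefix; this is the expensive step, and it is paid exactly once.
kv_cache = DynamicCache()
with torch.no_grad():
    model(input_ids=prefix_ids, past_key_values=kv_cache, use_cache=True)
prefix_len = kv_cache.get_seq_length()

# 3. Query processing: feed only the query tokens; the model attends to the
#    cached keys/values instead of re-reading the documents.
def answer(query: str, max_new_tokens: int = 128) -> str:
    ids = tokenizer(f"\nQuestion: {query}\nAnswer:",
                    return_tensors="pt").input_ids.to(model.device)
    out_tokens = []
    with torch.no_grad():
        for _ in range(max_new_tokens):  # simple greedy decoding loop
            logits = model(input_ids=ids, past_key_values=kv_cache,
                           use_cache=True).logits
            next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            if next_id.item() == tokenizer.eos_token_id:
                break
            out_tokens.append(next_id.item())
            ids = next_id
    kv_cache.crop(prefix_len)  # rewind to the prefix so the next query starts clean
    return tokenizer.decode(out_tokens, skip_special_tokens=True)
```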
Benefits of CAG:
Faster Inference: Because the knowledge is preloaded and its KV cache precomputed, CAG skips real-time retrieval and avoids re-encoding the knowledge on every query, cutting response times (see the usage example after this list).
Simplified Architecture: With no external retriever, vector store, or ranking pipeline to maintain, the overall system design is simpler end to end.
Improved Consistency: Every query is answered against the same preloaded context, so responses stay consistent with one another and with the source material.
Better for Static Knowledge: CAG is particularly well suited to stable, knowledge-heavy domains (product manuals, policy documents) where data changes infrequently; when the underlying data does change, the cache must be rebuilt.
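The latency benefit follows directly from the sketch above: encoding the knowledge prefix is paid once, and each subsequent call to the (placeholder) answer helper processes only the query and answer tokens. A rough usage example, with the example questions invented for illustration:

```python
# Continuing the sketch above: every call reuses the cached prefix,
# so only the query and answer tokens are processed per request.
import time

for q in ["What does the warranty cover?", "How do I reset the device?"]:
    t0 = time.perf_counter()
    print(answer(q))
    print(f"latency: {time.perf_counter() - t0:.2f}s (prefix not re-encoded)")
```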