Get Your DeepSeek AI Model Running in a Day
A step-by-step guide to setting up a local Retrieval-Augmented Generation (RAG) system using DeepSeek R1 as the LLM, Ollama as the model server, and LangChain for retrieval.
RAG (Retrieval-Augmented Generation) enhances LLMs by integrating a document retrieval mechanism, allowing them to generate more accurate, context-aware responses. In this guide, we will:
- Install the required dependencies and pull DeepSeek R1 via Ollama
- Split documents into chunks and store their embeddings in ChromaDB
- Retrieve the most relevant chunks and generate grounded answers with DeepSeek R1
- Build a simple Streamlit web UI for interaction
Before setting up the system, install the necessary dependencies:
pip install langchain langchain-community chromadb pypdf streamlit ollama
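The same packages can also be listed in the requirements.txt referenced in the project structure below (versions are omitted here; pin them for reproducible installs):

langchain
langchain-community
chromadb
pypdf
streamlit
ollama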
Installing DeepSeek R1 in Ollama
Run the following command to download DeepSeek R1 to your machine:
ollama pull deepseek-r1
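Once the pull finishes, you can confirm the model is available locally and give it a quick test prompt:

ollama list
ollama run deepseek-r1 "Hello"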
Below is the recommended project structure:
rag-system/
├── embeddings/
│   ├── __init__.py
│   ├── text_splitter.py    # Splits documents into smaller chunks
│   └── vector_store.py     # Handles embeddings and storage
├── ollama_model/
│   ├── __init__.py
│   └── deepseek_r1.py      # Loads DeepSeek R1 with Ollama
├── app/
│   ├── __init__.py
│   ├── retriever.py        # Retrieves relevant document chunks
│   ├── rag_chain.py        # Generates the final response
│   └── streamlit_app.py    # Web UI for interaction
├── data/
│   └── sample.pdf          # Example document for testing
├── requirements.txt        # Required dependencies
├── .env                    # API keys (if needed)
└── main.py                 # Main entry point
To ensure efficient retrieval, we need to split large documents into small chunks before storing embeddings.
File: embeddings/text_splitter.py

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_text(file_path):
    # Load the PDF and extract its text page by page
    loader = PyPDFLoader(file_path)
    documents = loader.load()
    # Split into overlapping chunks so context isn't lost at chunk boundaries
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    return splitter.split_documents(documents)
This script reads a PDF file, extracts its text, and splits it into 500-character chunks with a 50-character overlap, so sentences that straddle a chunk boundary keep some surrounding context.
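As a quick check, you can run the splitter on the sample document (assuming data/sample.pdf exists) and preview the output:

from embeddings.text_splitter import split_text

chunks = split_text("data/sample.pdf")
print(f"Produced {len(chunks)} chunks")
print(chunks[0].page_content[:200])  # preview the first chunk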
Now, we need to convert the text chunks into embeddings and store them in a vector database.
File: embeddings/vector_store.py

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

def store_embeddings(chunks):
    # Embeddings are generated locally by DeepSeek R1 through Ollama
    embeddings = OllamaEmbeddings(model="deepseek-r1")
    # Store the chunks and their embeddings in a persistent ChromaDB collection
    vector_store = Chroma.from_documents(chunks, embeddings, persist_directory="./vector_db")
    vector_store.persist()
    return vector_store
This module uses ChromaDB as the vector store and generates the embeddings locally via Ollama with the DeepSeek R1 model, persisting everything to the ./vector_db directory.
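Combining the two modules gives a one-off ingestion step; run this once (or whenever your documents change) before serving queries:

from embeddings.text_splitter import split_text
from embeddings.vector_store import store_embeddings

store_embeddings(split_text("data/sample.pdf"))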
When a user asks a question, we retrieve the most relevant text chunks from the vector database.
File: app/retriever.py

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

def retrieve_chunks(query):
    # The store must be opened with the same embedding model used at indexing
    # time, so the query is embedded into the same vector space as the documents
    embeddings = OllamaEmbeddings(model="deepseek-r1")
    vector_store = Chroma(persist_directory="./vector_db", embedding_function=embeddings)
    return vector_store.similarity_search(query, k=3)
Performs a vector similarity search over the stored embeddings and returns the three most relevant text chunks (k=3).
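Once the ingestion step has been run, you can sanity-check retrieval from a Python shell:

from app.retriever import retrieve_chunks

for chunk in retrieve_chunks("What is this document about?"):
    print(chunk.page_content[:100], "...")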
To process user queries, we need to load the DeepSeek R1 model using Ollama.
File: ollama_model/deepseek_r1.py

from langchain_community.llms import Ollama

def load_llm():
    # Load DeepSeek R1 through LangChain's Ollama LLM wrapper
    return Ollama(model="deepseek-r1")
Loads DeepSeek R1 through LangChain's Ollama wrapper, so it can be invoked like any other LangChain LLM.
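Before wiring up the full chain, you can sanity-check the model directly (this assumes the Ollama server is running and deepseek-r1 has been pulled):

from ollama_model.deepseek_r1 import load_llm

llm = load_llm()
print(llm.invoke("In one sentence, what is retrieval-augmented generation?"))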
Once we retrieve the relevant chunks, we pass them to the LLM to generate a response.
File: app/rag_chain.py

from ollama_model.deepseek_r1 import load_llm
from app.retriever import retrieve_chunks

def get_rag_response(query):
    # Fetch the most relevant chunks and join them into one context block
    retrieved_chunks = retrieve_chunks(query)
    context = "\n".join(chunk.page_content for chunk in retrieved_chunks)
    # Ask DeepSeek R1 to answer the question grounded in that context
    llm = load_llm()
    return llm.invoke(f"Use the following context to answer:\n{context}\n\nQuestion: {query}")
This function retrieves relevant text chunks and uses them as context for DeepSeek R1 to generate a response.
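A minimal end-to-end check from the project root, assuming the documents have already been indexed:

from app.rag_chain import get_rag_response

print(get_rag_response("Summarize the sample document."))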
To allow users to interact with the system, we use Streamlit for a simple web interface.
File: app/streamlit_app.py

import streamlit as st
from app.rag_chain import get_rag_response

st.title("RAG System with DeepSeek R1")

query = st.text_input("Ask a question:")
if query:
    # Run retrieval + generation and display the answer
    response = get_rag_response(query)
    st.write("### Response:")
    st.write(response)
The app provides a text input for user queries and displays responses.
Run the UI:
streamlit run app/streamlit_app.py
Once all components are ready, follow these steps to run the full system.
Start the Ollama server (if it is not already running in the background, launch it with ollama serve) and make sure DeepSeek R1 is available:
ollama pull deepseek-r1
Run the Main Pipeline
python main.py
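The guide does not show the contents of main.py; a minimal sketch, assuming the entry point should index the sample document and answer one test query, could look like this:

# main.py - hypothetical entry point: index the sample PDF, then answer one query
from embeddings.text_splitter import split_text
from embeddings.vector_store import store_embeddings
from app.rag_chain import get_rag_response

if __name__ == "__main__":
    store_embeddings(split_text("data/sample.pdf"))
    print(get_rag_response("What is this document about?"))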
Launch the Web UI
streamlit run app/streamlit_app.py
This completes the setup of a RAG system with DeepSeek R1 using Ollama and LangChain.
Ready to transform your business with our technology solutions? Contact us today to leverage our AI/ML expertise.