AI/ML

Building a RAG System with DeepSeek R1, Ollama and LangChain


Overview

A step-by-step guide to setting up a local Retrieval-Augmented Generation (RAG) system using DeepSeek R1 as the LLM, Ollama as the model server, and LangChain for retrieval.

 

RAG (Retrieval-Augmented Generation) enhances LLMs by integrating a document-retrieval mechanism, allowing them to generate more accurate, context-aware responses. In this guide, we will:

  • Load DeepSeek R1 using Ollama.
  • Process and store document embeddings.
  • Retrieve relevant documents based on user queries.
  • Generate responses using retrieved context.

 

Step 1: Install Required Dependencies

Before setting up the system, install the necessary dependencies:

pip install langchain langchain-community chromadb pypdf streamlit ollama
  • LangChain: Framework for building retrieval-based LLM applications.
  • ChromaDB: Vector database for storing and searching embeddings.
  • PyPDF: Loads and parses PDF documents.
  • Ollama: Runs the DeepSeek R1 model locally.
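
If you prefer to track these packages in the requirements.txt file referenced in the project structure below, a minimal, unpinned sketch could look like this (pin versions to match your environment):

langchain
langchain-community
chromadb
pypdf
streamlit
ollama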

Installing DeepSeek R1 in Ollama

Run the following command to download DeepSeek R1 to your machine:

ollama pull deepseek-r1
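
Before wiring up the pipeline, it is worth a quick sanity check that the model is downloaded and reachable. A minimal sketch, assuming the ollama Python package from Step 1 and a running Ollama server:

import ollama

# Send a trivial prompt to confirm deepseek-r1 responds
response = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response["message"]["content"])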

 

Step 2: Project Structure

Below is the recommended project structure:

rag-system/
│── embeddings/
│   ├── __init__.py
│   ├── text_splitter.py     # Splits documents into smaller chunks
│   ├── vector_store.py      # Handles embeddings and storage
│── ollama_model/
│   ├── __init__.py
│   ├── deepseek_r1.py       # Loads DeepSeek R1 with Ollama
│── app/
│   ├── __init__.py
│   ├── retriever.py         # Retrieves relevant document chunks
│   ├── rag_chain.py         # Generates final response
│   ├── streamlit_app.py     # Web UI for interaction
│── data/
│   ├── sample.pdf           # Example document for testing
│── requirements.txt         # Required dependencies
│── .env                     # API keys (if needed)
│── main.py                  # Main entry point

Step 3: Load and Process Documents

To ensure efficient retrieval, we need to split large documents into small chunks before storing embeddings.

File: “embeddings/text_splitter.py”

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader

def split_text(file_path):
    loader = PyPDFLoader(file_path)
    documents = loader.load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    return splitter.split_documents(documents)

 

This script reads a PDF file, extracts its text, and splits it into chunks of roughly 500 characters with a 50-character overlap between consecutive chunks.
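
To verify the splitter behaves as expected, you can run it against the example document from the project structure (the data/sample.pdf path is an assumption; substitute your own file):

from embeddings.text_splitter import split_text

chunks = split_text("data/sample.pdf")
print(f"Created {len(chunks)} chunks")
print(chunks[0].page_content[:200])  # preview the start of the first chunk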

Step 4: Generate and Store Embeddings

Now, we need to convert the text chunks into embeddings and store them in a vector database.

File: “embeddings/vector_store.py”

from langchain.vectorstores import Chroma
from langchain.embeddings import OllamaEmbeddings

def store_embeddings(chunks):
    embeddings = OllamaEmbeddings(model="deepseek-r1")
    vector_store = Chroma.from_documents(chunks, embeddings, persist_directory="./vector_db")
    vector_store.persist()

 

Uses ChromaDB to store text embeddings.

DeepSeek R1 is used to generate embeddings via Ollama.
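
Indexing is a one-time step per document. A short sketch that ties Steps 3 and 4 together, again assuming the sample document lives at data/sample.pdf:

from embeddings.text_splitter import split_text
from embeddings.vector_store import store_embeddings

# Split the sample document and persist its embeddings to ./vector_db
chunks = split_text("data/sample.pdf")
store_embeddings(chunks)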

Step 5: Retrieve Relevant Information

When a user asks a question, we retrieve the most relevant text chunks from the vector database.

File: “app/retriever.py”

from langchain.vectorstores import Chroma
from langchain.embeddings import OllamaEmbeddings

def retrieve_chunks(query):
    # Reload the persisted store with the same embedding model used at indexing time
    embeddings = OllamaEmbeddings(model="deepseek-r1")
    vector_store = Chroma(persist_directory="./vector_db", embedding_function=embeddings)
    return vector_store.similarity_search(query, k=3)

 

Runs a vector similarity search against the stored embeddings and returns the three most relevant text chunks.
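
You can exercise the retriever directly from a Python shell once embeddings are stored; the query string below is only an illustration:

from app.retriever import retrieve_chunks

for chunk in retrieve_chunks("What is this document about?"):
    print(chunk.page_content[:200])
    print("---")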

 

Step 6: Load DeepSeek R1 in Ollama

To process user queries, we need to load the DeepSeek R1 model using Ollama.

File: “ollama_model/deepseek_r1.py”

from langchain_community.llms import Ollama

def load_llm():
    # Wrap DeepSeek R1 (served locally by Ollama) as a LangChain-compatible LLM
    return Ollama(model="deepseek-r1")

 

Loads DeepSeek R1 through Ollama as the language model used by the rest of the pipeline.
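
A quick standalone check of the wrapper, assuming Ollama is running and deepseek-r1 has been pulled:

from ollama_model.deepseek_r1 import load_llm

llm = load_llm()
print(llm.invoke("Explain retrieval-augmented generation in one sentence."))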

Step 7: RAG Chain – Combining Retrieval with LLM

Once we retrieve the relevant chunks, we pass them to the LLM to generate a response.

File: “app/rag_chain.py”

from ollama_model.deepseek_r1 import load_llm
from app.retriever import retrieve_chunks

def get_rag_response(query):
    # Fetch the most relevant chunks and join them into a single context block
    retrieved_chunks = retrieve_chunks(query)
    context = "\n".join(chunk.page_content for chunk in retrieved_chunks)
    llm = load_llm()
    prompt = f"Use the following context to answer:\n{context}\n\nQuestion: {query}"
    return llm.invoke(prompt)

 

This function retrieves relevant text chunks and uses them as context for DeepSeek R1 to generate a response.
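
Once the embeddings from Step 4 exist, the chain can be tested without the UI; the question below is just an example:

from app.rag_chain import get_rag_response

print(get_rag_response("Summarize the key points of the sample document."))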

Step 8: Create a Web UI with Streamlit

To allow users to interact with the system, we use Streamlit for a simple web interface.

File: “app/streamlit_app.py”

import streamlit as st
from app.rag_chain import get_rag_response

st.title("RAG System with DeepSeek R1")

query = st.text_input("Ask a question:")

if query:
    response = get_rag_response(query)
    st.write("### Response:")
    st.write(response)

 

The app provides a text input for user queries and displays responses.

Run the UI:

streamlit run app/streamlit_app.py

 

Step 9: Running the Complete RAG System

Once all components are ready, follow these steps to run the full system.

Start Ollama and Ensure DeepSeek R1 is Available

ollama pull deepseek-r1

Run the Main Pipeline

python main.py
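
The article does not show the contents of main.py. A minimal sketch that indexes the example document and runs one end-to-end test query (the file path and question are assumptions) could be:

from embeddings.text_splitter import split_text
from embeddings.vector_store import store_embeddings
from app.rag_chain import get_rag_response

def main():
    # Index the example document so the retriever has something to search
    chunks = split_text("data/sample.pdf")
    store_embeddings(chunks)

    # Quick end-to-end check before launching the Streamlit UI
    print(get_rag_response("What is this document about?"))

if __name__ == "__main__":
    main()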

Launch the Web UI

streamlit run app/streamlit_app.py

System Requirements

  • CPU: 8-core processor (Intel/AMD)
  • RAM: 16GB+
  • GPU: NVIDIA RTX 3090+ (for faster inference)
  • Disk Space: 20GB+ (for model and embeddings)
  • OS: Ubuntu 20.04 / 22.04

Summary

  • Documents are split into smaller chunks.
  • Embeddings are stored using ChromaDB.
  • User queries retrieve relevant document chunks.
  • DeepSeek R1 generates answers grounded in the retrieved context.
  • A Streamlit UI enables user interaction.

This completes the setup of a RAG system with DeepSeek R1 using Ollama and LangChain.

Ready to transform your business with our technology solutions? Contact us today to leverage our AI/ML expertise.
