Get Your DeepSeek AI Model Running in a Day
A step-by-step guide to setting up a local Retrieval-Augmented Generation (RAG) system using DeepSeek R1 as the LLM, Ollama as the model server, and LangChain for retrieval.
RAG (Retrieval-Augmented Generation) enhances LLMs by integrating a document retrieval mechanism, allowing them to generate more accurate, context-aware responses. In this guide, we will:
- Install the required dependencies and pull DeepSeek R1 via Ollama
- Split documents into chunks and store their embeddings in ChromaDB
- Retrieve the most relevant chunks and generate grounded answers with DeepSeek R1
- Build a simple Streamlit web UI for interaction
Before setting up the system, install the necessary dependencies:
pip install langchain langchain-community chromadb pypdf streamlit ollama
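The same packages can also be listed in the requirements.txt referenced in the project structure below (versions are omitted here; pin them for reproducible installs):

langchain
langchain-community
chromadb
pypdf
streamlit
ollama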
Installing DeepSeek R1 in Ollama
Run the following command to download DeepSeek R1 to your machine:
ollama pull deepseek-r1
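Once the pull finishes, you can confirm the model is available locally and give it a quick test prompt:

ollama list
ollama run deepseek-r1 "Hello"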
Below is the recommended project structure:
rag-system/
├── embeddings/
│   ├── __init__.py
│   ├── text_splitter.py    # Splits documents into smaller chunks
│   └── vector_store.py     # Handles embeddings and storage
├── ollama_model/
│   ├── __init__.py
│   └── deepseek_r1.py      # Loads DeepSeek R1 with Ollama
├── app/
│   ├── __init__.py
│   ├── retriever.py        # Retrieves relevant document chunks
│   ├── rag_chain.py        # Generates the final response
│   └── streamlit_app.py    # Web UI for interaction
├── data/
│   └── sample.pdf          # Example document for testing
├── requirements.txt        # Required dependencies
├── .env                    # API keys (if needed)
└── main.py                 # Main entry point
To ensure efficient retrieval, we need to split large documents into small chunks before storing embeddings.
File: embeddings/text_splitter.py

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_text(file_path):
    # Load the PDF and extract its text page by page
    loader = PyPDFLoader(file_path)
    documents = loader.load()
    # Split into overlapping chunks so context isn't lost at chunk boundaries
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    return splitter.split_documents(documents)
This script reads a PDF file, extracts its text, and splits it into 500-character chunks with a 50-character overlap, so sentences that straddle a chunk boundary keep some surrounding context.
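As a quick check, you can run the splitter on the sample document (assuming data/sample.pdf exists) and preview the output:

from embeddings.text_splitter import split_text

chunks = split_text("data/sample.pdf")
print(f"Produced {len(chunks)} chunks")
print(chunks[0].page_content[:200])  # preview the first chunk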
Now, we need to convert the text chunks into embeddings and store them in a vector database.
File: embeddings/vector_store.py

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

def store_embeddings(chunks):
    # Embeddings are generated locally by DeepSeek R1 through Ollama
    embeddings = OllamaEmbeddings(model="deepseek-r1")
    # Store the chunks and their embeddings in a persistent ChromaDB collection
    vector_store = Chroma.from_documents(chunks, embeddings, persist_directory="./vector_db")
    vector_store.persist()
    return vector_store
This module uses ChromaDB as the vector store and generates the embeddings locally via Ollama with the DeepSeek R1 model, persisting everything to the ./vector_db directory.
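Combining the two modules gives a one-off ingestion step; run this once (or whenever your documents change) before serving queries:

from embeddings.text_splitter import split_text
from embeddings.vector_store import store_embeddings

store_embeddings(split_text("data/sample.pdf"))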
When a user asks a question, we retrieve the most relevant text chunks from the vector database.
File: app/retriever.py

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

def retrieve_chunks(query):
    # The store must be opened with the same embedding model used at indexing
    # time, so the query is embedded into the same vector space as the documents
    embeddings = OllamaEmbeddings(model="deepseek-r1")
    vector_store = Chroma(persist_directory="./vector_db", embedding_function=embeddings)
    return vector_store.similarity_search(query, k=3)
Performs a vector similarity search over the stored embeddings and returns the three most relevant text chunks (k=3).
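Once the ingestion step has been run, you can sanity-check retrieval from a Python shell:

from app.retriever import retrieve_chunks

for chunk in retrieve_chunks("What is this document about?"):
    print(chunk.page_content[:100], "...")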
To process user queries, we need to load the DeepSeek R1 model using Ollama.
File: ollama_model/deepseek_r1.py

from langchain_community.llms import Ollama

def load_llm():
    # Load DeepSeek R1 through LangChain's Ollama LLM wrapper
    return Ollama(model="deepseek-r1")
Loads DeepSeek R1 through LangChain's Ollama wrapper, so it can be invoked like any other LangChain LLM.
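Before wiring up the full chain, you can sanity-check the model directly (this assumes the Ollama server is running and deepseek-r1 has been pulled):

from ollama_model.deepseek_r1 import load_llm

llm = load_llm()
print(llm.invoke("In one sentence, what is retrieval-augmented generation?"))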
Once we retrieve the relevant chunks, we pass them to the LLM to generate a response.
File: app/rag_chain.py

from ollama_model.deepseek_r1 import load_llm
from app.retriever import retrieve_chunks

def get_rag_response(query):
    # Fetch the most relevant chunks and join them into one context block
    retrieved_chunks = retrieve_chunks(query)
    context = "\n".join(chunk.page_content for chunk in retrieved_chunks)
    # Ask DeepSeek R1 to answer the question grounded in that context
    llm = load_llm()
    return llm.invoke(f"Use the following context to answer:\n{context}\n\nQuestion: {query}")
This function retrieves relevant text chunks and uses them as context for DeepSeek R1 to generate a response.
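A minimal end-to-end check from the project root, assuming the documents have already been indexed:

from app.rag_chain import get_rag_response

print(get_rag_response("Summarize the sample document."))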
To allow users to interact with the system, we use Streamlit for a simple web interface.
File: app/streamlit_app.py

import streamlit as st
from app.rag_chain import get_rag_response

st.title("RAG System with DeepSeek R1")

query = st.text_input("Ask a question:")
if query:
    # Run retrieval + generation and display the answer
    response = get_rag_response(query)
    st.write("### Response:")
    st.write(response)
The app provides a text input for user queries and displays responses.
Run the UI:
streamlit run app/streamlit_app.py
Once all components are ready, follow these steps to run the full system.
Start the Ollama server (if it is not already running in the background, launch it with ollama serve) and make sure DeepSeek R1 is available:
ollama pull deepseek-r1
Run the Main Pipeline
python main.py
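The guide does not show the contents of main.py; a minimal sketch, assuming the entry point should index the sample document and answer one test query, could look like this:

# main.py - hypothetical entry point: index the sample PDF, then answer one query
from embeddings.text_splitter import split_text
from embeddings.vector_store import store_embeddings
from app.rag_chain import get_rag_response

if __name__ == "__main__":
    store_embeddings(split_text("data/sample.pdf"))
    print(get_rag_response("What is this document about?"))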
Launch the Web UI
streamlit run app/streamlit_app.py
This completes the setup of a RAG system with DeepSeek R1 using Ollama and LangChain.
Ready to transform your business with our technology solutions? Contact us today to leverage our AI/ML expertise.