A step-by-step guide to setting up a local Retrieval-Augmented Generation (RAG) system using DeepSeek R1 as the LLM, Ollama as the model server, and LangChain for retrieval.
RAG (Retrieval-Augmented Generation) enhances LLMs by integrating a document retrieval mechanism, allowing them to generate more accurate, context-aware responses. In this guide, we will install the dependencies, download DeepSeek R1 with Ollama, split documents and store their embeddings in ChromaDB, retrieve the most relevant chunks for each query, and serve the answers through a simple Streamlit web UI.
Before setting up the system, install the necessary dependencies:
pip install langchain langchain-community chromadb pypdf streamlit ollama

Installing DeepSeek R1 in Ollama
Run the following command to download DeepSeek R1 to your machine:
ollama pull deepseek-r1
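Optionally, you can confirm the model is being served correctly with a short Python check using the ollama package installed above (this assumes the Ollama server is running locally; the prompt is just an example):

import ollama

# Quick sanity check: ask DeepSeek R1 for a short reply via the local Ollama server
reply = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
)
print(reply["message"]["content"])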
Below is the recommended project structure:
rag-system/
│── embeddings/
│   ├── __init__.py
│   ├── text_splitter.py      # Splits documents into smaller chunks
│   ├── vector_store.py       # Handles embeddings and storage
│── ollama_model/
│   ├── __init__.py
│   ├── deepseek_r1.py        # Loads DeepSeek R1 with Ollama
│── app/
│   ├── __init__.py
│   ├── retriever.py          # Retrieves relevant document chunks
│   ├── rag_chain.py          # Generates final response
│   ├── streamlit_app.py      # Web UI for interaction
│── data/
│   ├── sample.pdf            # Example document for testing
│── requirements.txt          # Required dependencies
│── .env                      # API keys (if needed)
│── main.py                   # Main entry point

To ensure efficient retrieval, we need to split large documents into small chunks before storing embeddings.
File: “embeddings/text_splitter.py”
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader

def split_text(file_path):
    loader = PyPDFLoader(file_path)
    documents = loader.load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    return splitter.split_documents(documents)
This script reads a PDF file, extracts the text, and splits it into chunks of 500 characters with a 50-character overlap.
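A quick way to verify the splitter (assuming a PDF exists at data/sample.pdf) is to print the chunk count and a preview of the first chunk:

from embeddings.text_splitter import split_text

chunks = split_text("data/sample.pdf")
print(f"Created {len(chunks)} chunks")
print(chunks[0].page_content[:200])  # preview of the first chunk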
Now, we need to convert the text chunks into embeddings and store them in a vector database.
File: “embeddings/vector_store.py”
from langchain.vectorstores import Chroma
from langchain.embeddings import OllamaEmbeddings

def store_embeddings(chunks):
    embeddings = OllamaEmbeddings(model="deepseek-r1")
    vector_store = Chroma.from_documents(chunks, embeddings, persist_directory="./vector_db")
    vector_store.persist()
Uses ChromaDB to store text embeddings.
DeepSeek R1 is used to generate embeddings via Ollama.
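Note that DeepSeek R1 is a chat/reasoning model, so using it for embeddings is a convenient default rather than an optimized choice. If retrieval quality is poor, Ollama can serve a dedicated embedding model instead (for example nomic-embed-text, pulled with ollama pull nomic-embed-text); the sketch below shows the only line that changes. Whichever model you choose, the retriever below must use the same one.

from langchain.embeddings import OllamaEmbeddings

# Optional variant: a dedicated embedding model served by Ollama
# (requires: ollama pull nomic-embed-text)
embeddings = OllamaEmbeddings(model="nomic-embed-text")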
When a user asks a question, we retrieve the most relevant text chunks from the vector database.
File: “app/retriever.py”
from langchain.vectorstores import Chroma
from langchain.embeddings import OllamaEmbeddings

def retrieve_chunks(query):
    # The embedding model must match the one used when storing the documents
    embeddings = OllamaEmbeddings(model="deepseek-r1")
    vector_store = Chroma(persist_directory="./vector_db", embedding_function=embeddings)
    return vector_store.similarity_search(query, k=3)
Performs a vector similarity search against the stored embeddings and returns the three most relevant chunks (k=3).
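To see what the retriever returns before it is handed to the LLM, you can run a quick check from the project root (assuming embeddings have already been stored; the query string is just an example):

from app.retriever import retrieve_chunks

# Inspect the top-3 chunks returned for an example query
for i, doc in enumerate(retrieve_chunks("What is this document about?"), start=1):
    print(f"--- Chunk {i} ---")
    print(doc.page_content[:200])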
To process user queries, we need to load the DeepSeek R1 model using Ollama.
File: “ollama_model/deepseek_r1.py”
from langchain_community.llms import Ollama

def load_llm():
    # DeepSeek R1 served locally by Ollama, via LangChain's Ollama wrapper
    return Ollama(model="deepseek-r1")
Initializes DeepSeek R1 as the primary language model.
Once we retrieve the relevant chunks, we pass them to the LLM to generate a response.
File: “app/rag_chain.py”
from ollama_model.deepseek_r1 import load_llm
from app.retriever import retrieve_chunks

def get_rag_response(query):
    retrieved_chunks = retrieve_chunks(query)
    context = "\n".join([chunk.page_content for chunk in retrieved_chunks])
    llm = load_llm()
    response = llm.invoke(f"Use the following context to answer:\n{context}\n\nQuestion: {query}")
    return response
This function retrieves relevant text chunks and uses them as context for DeepSeek R1 to generate a response.
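Before wiring up the UI, you can test the chain end to end with a short script run from the project root (assuming main.py has already stored the embeddings; the question is a placeholder):

from app.rag_chain import get_rag_response

# Ask a test question against the indexed sample document
answer = get_rag_response("Summarise the sample document in two sentences.")
print(answer)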
To allow users to interact with the system, we use Streamlit for a simple web interface.
File: “app/streamlit_app.py”
import streamlit as st
from app.rag_chain import get_rag_response

st.title("RAG System with DeepSeek R1")

query = st.text_input("Ask a question:")
if query:
    response = get_rag_response(query)
    st.write("### Response:")
    st.write(response)
The app provides a text input for user queries and displays responses.
Run the UI:
streamlit run app/streamlit_app.py
File: “main.py”
from embeddings.text_splitter import split_text
from embeddings.vector_store import store_embeddings

def main():
    print("[1/2] Splitting and processing documents...")
    chunks = split_text("data/sample.pdf")
    print("[2/2] Generating and storing embeddings...")
    store_embeddings(chunks)
    print("Embeddings stored. You can now run the Streamlit app with:\n")
    print("  streamlit run app/streamlit_app.py")

if __name__ == "__main__":
    main()

Once all components are ready, follow these steps to run the full system.
Start Ollama and Ensure DeepSeek R1 is Available
ollama pull deepseek-r1

Run the Main Pipeline

python main.py

Launch the Web UI

streamlit run app/streamlit_app.py

This completes the setup of a RAG system with DeepSeek R1 using Ollama and LangChain.