Deploying OpenThinker 7B on Google Cloud Platform (GCP) allows for scalable, secure, and cost-efficient hosting of the model. GCP offers several deployment options, including Google Kubernetes Engine (GKE), Cloud Run, and Compute Engine (GCE).
In this guide, we will focus on deploying OpenThinker 7B using Google Kubernetes Engine (GKE), which provides managed Kubernetes infrastructure for deploying and scaling containers.
Before starting, ensure you have:
- A GCP project with billing enabled
- The gcloud CLI installed and authenticated
- Docker installed, with the openthinker-7b image built locally
- kubectl installed
Enable GCR API and Authenticate Docker
Enable Google Container Registry (GCR):
gcloud services enable containerregistry.googleapis.com
Authenticate Docker to push images to GCR:
gcloud auth configure-docker
Tag the Docker Image
Retrieve your GCP project ID:
gcloud config get-value project
Tag the image for GCR (replace <project-id> with your actual project ID):
docker tag openthinker-7b gcr.io/<project-id>/openthinker-7b:latest
Push the Image to GCR
docker push gcr.io/<project-id>/openthinker-7b:latest
Once completed, the image will be stored in Google Container Registry (GCR).
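To confirm the push succeeded, you can list the images in your registry:
gcloud container images list --repository=gcr.io/<project-id>
The openthinker-7b repository should appear in the output.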
We will use Google Kubernetes Engine (GKE) to deploy the model.
Enable GKE API
gcloud services enable container.googleapis.com
Create a GKE Cluster
gcloud container clusters create openthinker-cluster \
--zone us-central1-a \
--num-nodes 2 \
--machine-type n1-standard-4
This command creates a 2-node cluster in us-central1-a using n1-standard-4 instances (4 vCPUs and 15 GB of memory each), which leaves headroom for the 8 GiB memory limit set in the deployment below.
Connect to the Cluster
gcloud container clusters get-credentials openthinker-cluster --zone us-central1-a
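To verify that kubectl is now pointed at the new cluster, list its nodes:
kubectl get nodes
Both nodes should report a Ready status.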
Create a Kubernetes Deployment YAML File
Create a new file called openthinker-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openthinker-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: openthinker
  template:
    metadata:
      labels:
        app: openthinker
    spec:
      containers:
        - name: openthinker
          image: gcr.io/<project-id>/openthinker-7b:latest
          ports:
            - containerPort: 11434
          resources:
            limits:
              memory: "8Gi"
              cpu: "2"
Apply the Deployment
kubectl apply -f openthinker-deployment.yaml
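You can watch the rollout until the pod is ready:
kubectl rollout status deployment/openthinker-deployment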
To allow external access to OpenThinker 7B, create a Kubernetes Service.
Create a Service YAML File
Create a new file called openthinker-service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: openthinker-service
spec:
  type: LoadBalancer
  selector:
    app: openthinker
  ports:
    - protocol: TCP
      port: 80
      targetPort: 11434
Apply the Service
kubectl apply -f openthinker-service.yaml
This command exposes OpenThinker via a LoadBalancer, which assigns a public IP.
Check Running Pods
kubectl get pods
Ensure the pod is running.
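If the pod is stuck in Pending or CrashLoopBackOff, inspect it for details (replace <pod-name> with the name shown by kubectl get pods):
kubectl describe pod <pod-name>
kubectl logs <pod-name>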
Get the External IP
kubectl get service openthinker-service
Look for the EXTERNAL-IP column. Once an IP is assigned, you can access OpenThinker using:
curl http://<external-ip>
Expected output:
{"message": "Model is up and running"}
To handle high traffic, increase the number of replicas:
Update the Replica Count
kubectl scale deployment openthinker-deployment --replicas=3
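Confirm the new replica count:
kubectl get deployment openthinker-deployment
The READY column should show 3/3 once all pods are up.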
Enable Auto Scaling
kubectl autoscale deployment openthinker-deployment --cpu-percent=70 --min=1 --max=5
This creates a Horizontal Pod Autoscaler that adds or removes replicas (between 1 and 5) to keep average CPU usage around 70%.
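To check the autoscaler's current status:
kubectl get hpa openthinker-deployment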
To delete the Kubernetes deployment:
kubectl delete deployment openthinker-deployment
kubectl delete service openthinker-service
To delete the GKE cluster:
gcloud container clusters delete openthinker-cluster --zone us-central1-a
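If you no longer need the image, you can also delete it from GCR to avoid storage charges:
gcloud container images delete gcr.io/<project-id>/openthinker-7b:latest --force-delete-tags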
Deploying OpenThinker 7B on Google Cloud Platform with GKE gives you a scalable, managed setup: Google Container Registry stores the image, GKE runs and scales the pods, and a load balancer exposes the model publicly, all with minimal manual intervention.