Create Your Own Locally Hosted DeepSeek-R1 Powered Chatbot
Build a private DeepSeek-R1 chatbot with Ollama, MongoDB, and a chat UI. No external APIs. Deployment steps for local setups or AWS.
So you've probably heard about DeepSeek R1 by now, that open source large language model from the Chinese startup DeepSeek. The release made headlines and actually managed to spook the U.S. stock market, with several AI stocks taking a hit. Look, there are already tons of reviews out there telling you how impressive this model is, so I won't pile on. What I want to talk about instead is something actually useful. You can download this thing and run it on your own machine. For a quick primer on the foundations behind models like R1, see how transformer architectures power large language models like DeepSeek R1.
Why would you want to do that? Well, maybe you're not comfortable sending your data to some third party API. Or maybe you're trying to keep costs under control. I've been there. Running locally means you can fine tune the model and customize everything to fit your specific stack. Plus, you can learn how in-context learning techniques can further boost your model's accuracy and control.

Here's the good news. Getting DeepSeek R1 running on your own hardware is actually pretty straightforward. Let me walk you through exactly how I did it.
Get a machine: AWS EC2 instance
First things first, you need a machine to run DeepSeek R1. If you're just experimenting or building a personal chatbot, honestly, your local computer might be enough. But if you're thinking about production, you'll probably want dedicated servers. And if you just want to get started quickly without any fuss, a cloud instance is your fastest bet.
For a lightweight start, an AWS EC2 CPU instance will handle the 1.5B parameter variant just fine. When I first tested this, I used an m5.2xlarge. You can still use that one. But actually, you might want to consider newer generation instances like m7i.2xlarge or m7g.2xlarge for better price performance. I've noticed the newer ones run cooler too.
Now, if you want faster responses or you're planning to try the larger variants, go with a GPU instance. A g6.xlarge or g5.xlarge makes a good baseline. These give you an NVIDIA GPU with enough VRAM to handle 7B class models at practical quantization levels. Trust me, the speed difference is worth it if you're doing anything beyond basic testing.
To launch an EC2 instance, just follow the official AWS guide: AWS EC2 Getting Started Guide
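If you prefer the CLI, here's a rough sketch of launching a GPU instance with the AWS CLI. The AMI ID, key pair, and security group below are placeholders you'd swap for your own (and the Ubuntu 24.04 AMI ID differs per region), so treat this as a starting point rather than a copy-paste command.
# Launch a g6.xlarge from an Ubuntu 24.04 AMI (placeholder IDs, adjust for your region and account)
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type g6.xlarge \
  --key-name my-key-pair \
  --security-group-ids sg-0123456789abcdef0 \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100,"VolumeType":"gp3"}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=deepseek-r1}]'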
Set up your machine
Next, you'll need to install the essentials to run DeepSeek R1. I always start with a fresh Ubuntu LTS image to keep things clean. Ubuntu 24.04 LTS is what I'm using these days, works great.
Connect to your instance over SSH using AWS's official steps: Connecting to Your Linux Instance Using SSH
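For an Ubuntu AMI, the connection usually looks something like this; the key file name and public IP are placeholders from your own launch.
# Connect with the key pair you chose at launch (Ubuntu AMIs use the "ubuntu" user)
chmod 400 my-key-pair.pem
ssh -i my-key-pair.pem ubuntu@<your-instance-public-ip>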
Install dependencies
# Update system packages
sudo apt update && sudo apt upgrade -y
# Install dependencies
sudo apt install -y curl git
# Install Ollama
# This script will create, enable, and start the Ollama systemd service
curl -fsSL https://ollama.com/install.sh | sh
# Install Node.js and npm (for Chat UI)
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt install -y nodejs
# Restart shell session to apply changes
exec bash
Download and serve DeepSeek R1 on Ollama
Alright, time to choose your DeepSeek R1 model variant based on what resources you have and what performance you need. In this walkthrough, I'm using DeepSeek R1 1.5B, which is the smallest version you'll find in most community runtimes.
The 1.5B model has, as the name suggests, 1.5 billion parameters. It runs really well on consumer hardware or modest cloud instances. Lower compute demands, but still delivers solid results for everyday chat and coding tasks. I was actually surprised how capable it is for its size.
As of late 2025, you've got options for larger variants in Ollama and similar runtimes too. There are several 7B and 8B options available, for instance. The larger models respond more coherently and reason better. But here's the thing, they need more memory. Let me give you a quick rule of thumb that I've learned the hard way (there's a quick check right after this list to see what your machine has):
CPU only with 16 to 32 GB RAM. Stick with 1.5B or a quantized 7B model.
Single mid range GPU with 16 to 24 GB VRAM. You can run 7B or 8B quantized models comfortably.
High VRAM GPUs. Consider the larger models if you need stronger reasoning and can handle the higher cost.
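Not sure which bucket you fall into? A quick way to check, assuming a standard Ubuntu box (the GPU query only works on instances with NVIDIA drivers installed):
# Check available system RAM
free -h
# Check GPU model and VRAM (GPU instances with NVIDIA drivers only)
nvidia-smi --query-gpu=name,memory.total --format=csv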
# 1.5B version (smallest, lightweight, suitable for low-resource setups)
ollama pull deepseek-r1:1.5b
# 8B version (mid-range, balances performance and resource usage)
ollama pull deepseek-r1:8b
# 14B version (higher accuracy, requires more compute power)
ollama pull deepseek-r1:14b
# 32B version (powerful, best for advanced tasks, needs high-end hardware)
ollama pull deepseek-r1:32b
# 70B version (largest, highest performance, very resource-intensive)
ollama pull deepseek-r1:70b
After the download finishes, list your installed models to make sure everything loaded properly. I always do this, learned my lesson after a corrupted download once.
$ ollama list
NAME ID SIZE MODIFIED
deepseek-r1:1.5b    a42b25d8c10a    1.1 GB    2 seconds ago
Ollama serves on http://127.0.0.1:11434 by default. Check that the service is healthy with this command. Keep this API URL handy, you'll need it when you configure your chat UI.
# Check if Ollama is running and list downloaded models
curl http://127.0.0.1:11434/api/tags
You should see output listing all the models available on your machine.
{
"models":[
{
"name":"deepseek-r1:1.5b",
"model":"deepseek-r1:1.5b",
"modified_at":"2025-02-01T17:05:07.520024256Z",
"size":1117322599,
"digest":"a42b25d8c10a841bd24724309898ae851466696a7d7f3a0a408b895538ccbc96",
"details":{
"parent_model":"",
"format":"gguf",
"family":"qwen2",
"families":[
"qwen2"
],
"parameter_size":"1.8B",
"quantization_level":"Q4_K_M"
}
}
]
}
Test the model with a simple generate call. This is where you'll know if everything's working.
curl -X POST http://127.0.0.1:11434/api/generate -d '{
"model": "deepseek-r1:1.5b",
"prompt": "What is Ollama?",
"options": { "num_predict": 100 },
"stream": false
}'
Set up the chat interface
Alright, DeepSeek R1 is running. Next step is adding a chat UI so you can actually talk to your model from a browser. When you're ready to go beyond a basic interface, you might want to explore advanced chatbot architectures that integrate knowledge graphs for richer, more accurate responses.
Install MongoDB
The chat UI needs MongoDB to store conversation history. It won't work without it. I tried skipping this step once, doesn't work. The simplest approach is running a local MongoDB container with a persistent volume. Docker makes this easy and repeatable.
sudo snap install docker
sudo docker run -d -p 27017:27017 -v mongo-chat-ui:/data/db --name mongo-chat-ui mongo:latest
When MongoDB is running, you can access the database at: mongodb://localhost:27017
You'll add this URL to your chat UI configuration file (.env.local). Don't forget this step or you'll be scratching your head wondering why nothing's saving.
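Before moving on, it's worth a quick sanity check that the container is actually up and listening, for example:
# Confirm the MongoDB container is running
sudo docker ps --filter name=mongo-chat-ui
# Tail the logs and look for the "Waiting for connections" message on port 27017
sudo docker logs --tail 20 mongo-chat-ui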
Clone and install Chat UI
# Clone Chat UI
git clone https://github.com/huggingface/chat-ui.git
cd chat-ui
# Install dependencies
npm install
Configure Chat UI
Update your .env.local file with these values:
MongoDB URL: mongodb://localhost:27017. This is where your chat history gets stored.
Ollama Endpoint: http://127.0.0.1:11434. This is your local Ollama API.
Ollama Model Name: deepseek-r1:1.5b. Replace this with whatever model tag you actually installed.
# Create a .env.local file:
nano .env.local
You can tweak these parameters to match your hardware and latency goals. I usually start conservative and then bump things up.
MONGODB_URL=mongodb://localhost:27017
MODELS=`[
{
"name": "DeepSeek-R1",
"chatPromptTemplate": "<s>{{#each messages}}{{#ifUser}}[INST] {{content}} [/INST]{{/ifUser}}{{#ifAssistant}}{{content}}</s> {{/ifAssistant}}{{/each}}",
"parameters": {
"temperature": 0.3,
"top_p": 0.95,
"max_new_tokens": 1024,
"stop": ["</s>"]
},
"endpoints": [
{
"type": "ollama",
"url" : "http://127.0.0.1:11434",
"ollamaName" : "deepseek-r1:1.5b"
}
]
}
]`
When you're done, save and exit. Use CTRL+X, then Y, then ENTER.
Use your very own DeepSeek R1 chatbot
You're ready to use your DeepSeek R1 chatbot. This is the fun part.
Start Chat UI
# Start the Chat UI in development mode, making it accessible on the network
$ npm run dev -- --host 0.0.0.0
# The output confirms the server is running and displays the accessible port
> chat-ui@0.9.4 dev
> vite dev --host 0.0.0.0
VITE v5.4.14 ready in 1122 ms
➜ Local: http://localhost:5173/
➜ Network: http://100.00.00.000:5173/
➜ Network: http://100.00.0.0:5173/
➜ press h + enter to show helpIf you're running on an AWS EC2 instance, remember to open the UI port in the instance security group. You can do this in the AWS Console under EC2, then Security Groups, then Inbound Rules. Or you can use the AWS CLI if you prefer. I always forget this step and then wonder why I can't connect.
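If you go the CLI route, the call looks roughly like this. The security group ID is a placeholder, and you should restrict the source CIDR to your own IP rather than opening it to the world:
# Allow inbound TCP 5173 (the Vite dev port) from your IP only (placeholder values)
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 5173 \
  --cidr 203.0.113.10/32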
For a public deployment, you should really consider adding a reverse proxy with HTTPS and enabling authentication. You want to protect both your Ollama endpoint and the chat UI. Seriously, don't skip this if you're going to production.
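As a rough sketch, assuming you have a domain pointed at the instance, Nginx plus Certbot gets you a TLS-terminating reverse proxy in a few commands. The domain below is a placeholder, and you'd still want to add authentication in front of the UI (for example, basic auth at the proxy):
# Install Nginx and Certbot, then request a certificate for your (placeholder) domain
sudo apt install -y nginx certbot python3-certbot-nginx
sudo certbot --nginx -d chat.example.com
# In the Nginx site config, proxy traffic to the Chat UI port, e.g.:
#   location / { proxy_pass http://127.0.0.1:5173; }
# Keep Ollama (port 11434) bound to 127.0.0.1 so it is never exposed directly.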
Access your chatbot
Open your machine's public address and port in a browser. You should see the chat interface.
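If the page doesn't load, a quick curl from your laptop helps separate networking problems from app problems (replace the address with your instance's public IP):
# A 200 response means the UI is reachable; a timeout usually means the security group rule is missing
curl -I http://<your-instance-public-ip>:5173/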


Conclusion
And there you have it. You now have a powerful AI model running under your control, on your own machine, inside your own security perimeter. Pretty cool, right?
Recap
You set up the environment by installing Ollama, MongoDB, and all the required dependencies.
You downloaded and configured DeepSeek R1 to run locally.
You set up a Chat UI and connected it to MongoDB and Ollama.
You made sure network access was working by opening the needed ports on AWS.
You accessed the chatbot from your browser.
Optional. You picked a GPU instance for faster responses and larger models.
Your locally hosted DeepSeek R1 chatbot is now up and running. The whole process took me about 30 minutes the first time, and now I can spin one up in under 10. If you want to keep building your skills and plan your next projects, check out our practical roadmap for aspiring GenAI developers.