Create Your Own Locally Hosted DeepSeek-R1 Powered Chatbot
Build a private DeepSeek-R1 chatbot with Ollama, MongoDB, and a chat UI. No external APIs. Deployment steps for local setups or AWS.
So you've probably heard about DeepSeek R1 by now, that open source large language model from the Chinese startup DeepSeek. The release made headlines and actually managed to spook the U.S. stock market, with several AI stocks taking a hit. Look, there are already tons of reviews out there telling you how impressive this model is, so I won't pile on. What I want to talk about instead is something actually useful. You can download this thing and run it on your own machine. For a quick primer on the foundations behind models like R1, see how transformer architectures power large language models like DeepSeek R1.
Why would you want to do that? Well, maybe you're not comfortable sending your data to some third party API. Or maybe you're trying to keep costs under control. I've been there. Running locally means you can fine tune the model and customize everything to fit your specific stack. Plus, you can learn how in-context learning techniques can further boost your model's accuracy and control.

Here's the good news. Getting DeepSeek R1 running on your own hardware is actually pretty straightforward. Let me walk you through exactly how I did it.
Get a machine: AWS EC2 instance
First things first, you need a machine to run DeepSeek R1. If you're just experimenting or building a personal chatbot, honestly, your local computer might be enough. But if you're thinking about production, you'll probably want dedicated servers. And if you just want to get started quickly without any fuss, a cloud instance is your fastest bet.
For a lightweight start, an AWS EC2 CPU instance will handle the 1.5B parameter variant just fine. When I first tested this, I used an m5.2xlarge. You can still use that one. But actually, you might want to consider newer generation instances like m7i.2xlarge or m7g.2xlarge for better price performance. I've noticed the newer ones run cooler too.
Now, if you want faster responses or you're planning to try the larger variants, go with a GPU instance. A g6.xlarge or g5.xlarge makes a good baseline. These give you an NVIDIA GPU with enough VRAM to handle 7B class models at practical quantization levels. Trust me, the speed difference is worth it if you're doing anything beyond basic testing.
To launch an EC2 instance, just follow the official AWS guide: AWS EC2 Getting Started Guide
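If you prefer the CLI, here's a rough sketch of launching a GPU instance with the AWS CLI. The AMI ID, key pair, and security group below are placeholders you'd swap for your own (and the Ubuntu 24.04 AMI ID differs per region), so treat this as a starting point rather than a copy-paste command.
# Launch a g6.xlarge from an Ubuntu 24.04 AMI (placeholder IDs, adjust for your region and account)
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type g6.xlarge \
  --key-name my-key-pair \
  --security-group-ids sg-0123456789abcdef0 \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100,"VolumeType":"gp3"}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=deepseek-r1}]'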
Set up your machine
Next, you'll need to install the essentials to run DeepSeek R1. I always start with a fresh Ubuntu LTS image to keep things clean. Ubuntu 24.04 LTS is what I'm using these days, works great.
Connect to your instance over SSH using AWS's official steps: Connecting to Your Linux Instance Using SSH
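For an Ubuntu AMI, the connection usually looks something like this; the key file name and public IP are placeholders from your own launch.
# Connect with the key pair you chose at launch (Ubuntu AMIs use the "ubuntu" user)
chmod 400 my-key-pair.pem
ssh -i my-key-pair.pem ubuntu@<your-instance-public-ip>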
Install dependencies
# Update system packages
sudo apt update && sudo apt upgrade -y
# Install dependencies
sudo apt install -y curl git
# Install Ollama
# This script will create, enable, and start the Ollama systemd service
curl -fsSL https://ollama.com/install.sh | sh
# Install Node.js and npm (for Chat UI)
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt install -y nodejs
# Restart shell session to apply changes
exec bash
Download and serve DeepSeek R1 on Ollama
Alright, time to choose your DeepSeek R1 model variant based on what resources you have and what performance you need. In this walkthrough, I'm using DeepSeek R1 1.5B, which is the smallest version you'll find in most community runtimes.
The 1.5B model has, as the name suggests, 1.5 billion parameters. It runs really well on consumer hardware or modest cloud instances. Lower compute demands, but still delivers solid results for everyday chat and coding tasks. I was actually surprised how capable it is for its size.
As of late 2025, you've got options for larger variants in Ollama and similar runtimes too. There are several 7B and 8B options available, for instance. The larger models respond more coherently and reason better. But here's the thing, they need more memory. Let me give you a quick rule of thumb that I've learned the hard way (there's a quick check right after this list to see what your machine has):
CPU only with 16 to 32 GB RAM. Stick with 1.5B or a quantized 7B model.
Single mid range GPU with 16 to 24 GB VRAM. You can run 7B or 8B quantized models comfortably.
High VRAM GPUs. Consider the larger models if you need stronger reasoning and can handle the higher cost.
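Not sure which bucket you fall into? A quick way to check, assuming a standard Ubuntu box (the GPU query only works on instances with NVIDIA drivers installed):
# Check available system RAM
free -h
# Check GPU model and VRAM (GPU instances with NVIDIA drivers only)
nvidia-smi --query-gpu=name,memory.total --format=csv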
# 1.5B version (smallest, lightweight, suitable for low-resource setups)
ollama pull deepseek-r1:1.5b
# 8B version (mid-range, balances performance and resource usage)
ollama pull deepseek-r1:8b
# 14B version (higher accuracy, requires more compute power)
ollama pull deepseek-r1:14b
# 32B version (powerful, best for advanced tasks, needs high-end hardware)
ollama pull deepseek-r1:32b
# 70B version (largest, highest performance, very resource-intensive)
ollama pull deepseek-r1:70b
After the download finishes, list your installed models to make sure everything loaded properly. I always do this, learned my lesson after a corrupted download once.
$ ollama list
NAME ID SIZE MODIFIED
deepseek-r1:1.5b    a42b25d8c10a    1.1 GB    2 seconds ago
Ollama serves on http://127.0.0.1:11434 by default. Check that the service is healthy with this command. Keep this API URL handy, you'll need it when you configure your chat UI.
# Check if Ollama is running and list downloaded models
curl http://127.0.0.1:11434/api/tags
You should see output listing all the models available on your machine.
{
"models":[
{
"name":"deepseek-r1:1.5b",
"model":"deepseek-r1:1.5b",
"modified_at":"2025-02-01T17:05:07.520024256Z",
"size":1117322599,
"digest":"a42b25d8c10a841bd24724309898ae851466696a7d7f3a0a408b895538ccbc96",
"details":{
"parent_model":"",
"format":"gguf",
"family":"qwen2",
"families":[
"qwen2"
],
"parameter_size":"1.8B",
"quantization_level":"Q4_K_M"
}
}
]
}
Test the model with a simple generate call. This is where you'll know if everything's working.
curl -X POST http://127.0.0.1:11434/api/generate -d '{
"model": "deepseek-r1:1.5b",
"prompt": "What is Ollama?",
"options": { "num_predict": 100 },
"stream": false
}'
Set up the chat interface
Alright, DeepSeek R1 is running. Next step is adding a chat UI so you can actually talk to your model from a browser. When you're ready to go beyond a basic interface, you might want to explore advanced chatbot architectures that integrate knowledge graphs for richer, more accurate responses.
Install MongoDB
The chat UI needs MongoDB to store conversation history. It won't work without it. I tried skipping this step once, doesn't work. The simplest approach is running a local MongoDB container with a persistent volume. Docker makes this easy and repeatable.
sudo snap install docker
sudo docker run -d -p 27017:27017 -v mongo-chat-ui:/data/db --name mongo-chat-ui mongo:latest
When MongoDB is running, you can access the database at: mongodb://localhost:27017
You'll add this URL to your chat UI configuration file (.env.local). Don't forget this step or you'll be scratching your head wondering why nothing's saving.
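Before moving on, it's worth a quick sanity check that the container is actually up and listening, for example:
# Confirm the MongoDB container is running
sudo docker ps --filter name=mongo-chat-ui
# Tail the logs and look for the "Waiting for connections" message on port 27017
sudo docker logs --tail 20 mongo-chat-ui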
Clone and install Chat UI
# Clone Chat UI
git clone https://github.com/huggingface/chat-ui.git
cd chat-ui
# Install dependencies
npm install
Configure Chat UI
Update your .env.local file with these values:
MongoDB URL: mongodb://localhost:27017. This is where your chat history gets stored.
Ollama Endpoint: http://127.0.0.1:11434. This is your local Ollama API.
Ollama Model Name: deepseek-r1:1.5b. Replace this with whatever model tag you actually installed.
# Create a .env.local file:
nano .env.local
You can tweak these parameters to match your hardware and latency goals. I usually start conservative and then bump things up.
MONGODB_URL=mongodb://localhost:27017
MODELS=`[
{
"name": "DeepSeek-R1",
"chatPromptTemplate": "<s>{{#each messages}}{{#ifUser}}[INST] {{content}} [/INST]{{/ifUser}}{{#ifAssistant}}{{content}}</s> {{/ifAssistant}}{{/each}}",
"parameters": {
"temperature": 0.3,
"top_p": 0.95,
"max_new_tokens": 1024,
"stop": ["</s>"]
},
"endpoints": [
{
"type": "ollama",
"url" : "http://127.0.0.1:11434",
"ollamaName" : "deepseek-r1:1.5b"
}
]
}
]`
When you're done, save and exit. Use CTRL+X, then Y, then ENTER.
Use your very own DeepSeek R1 chatbot
You're ready to use your DeepSeek R1 chatbot. This is the fun part.
Start Chat UI
# Start the Chat UI in development mode, making it accessible on the network
$ npm run dev -- --host 0.0.0.0
# The output confirms the server is running and displays the accessible port
> chat-ui@0.9.4 dev
> vite dev --host 0.0.0.0
VITE v5.4.14 ready in 1122 ms
➜ Local: http://localhost:5173/
➜ Network: http://100.00.00.000:5173/
➜ Network: http://100.00.0.0:5173/
➜ press h + enter to show helpIf you're running on an AWS EC2 instance, remember to open the UI port in the instance security group. You can do this in the AWS Console under EC2, then Security Groups, then Inbound Rules. Or you can use the AWS CLI if you prefer. I always forget this step and then wonder why I can't connect.
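If you go the CLI route, the call looks roughly like this. The security group ID is a placeholder, and you should restrict the source CIDR to your own IP rather than opening it to the world:
# Allow inbound TCP 5173 (the Vite dev port) from your IP only (placeholder values)
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 5173 \
  --cidr 203.0.113.10/32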
For a public deployment, you should really consider adding a reverse proxy with HTTPS and enabling authentication. You want to protect both your Ollama endpoint and the chat UI. Seriously, don't skip this if you're going to production.
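As a rough sketch, assuming you have a domain pointed at the instance, Nginx plus Certbot gets you a TLS-terminating reverse proxy in a few commands. The domain below is a placeholder, and you'd still want to add authentication in front of the UI (for example, basic auth at the proxy):
# Install Nginx and Certbot, then request a certificate for your (placeholder) domain
sudo apt install -y nginx certbot python3-certbot-nginx
sudo certbot --nginx -d chat.example.com
# In the Nginx site config, proxy traffic to the Chat UI port, e.g.:
#   location / { proxy_pass http://127.0.0.1:5173; }
# Keep Ollama (port 11434) bound to 127.0.0.1 so it is never exposed directly.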
Access your chatbot
Open your machine's public address and port in a browser. You should see the chat interface.
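If the page doesn't load, a quick curl from your laptop helps separate networking problems from app problems (replace the address with your instance's public IP):
# A 200 response means the UI is reachable; a timeout usually means the security group rule is missing
curl -I http://<your-instance-public-ip>:5173/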


Conclusion
And there you have it. You now have a powerful AI model running under your control, on your own machine, inside your own security perimeter. Pretty cool, right?
Recap
You set up the environment by installing Ollama, MongoDB, and all the required dependencies.
You downloaded and configured DeepSeek R1 to run locally.
You set up a Chat UI and connected it to MongoDB and Ollama.
You made sure network access was working by opening the needed ports on AWS.
You accessed the chatbot from your browser.
Optional. You picked a GPU instance for faster responses and larger models.
Your locally hosted DeepSeek R1 chatbot is now up and running. The whole process took me about 30 minutes the first time, and now I can spin one up in under 10. If you want to keep building your skills and plan your next projects, check out our practical roadmap for aspiring GenAI developers.