The heart of this setup is Ollama, an absurdly flexible tool that makes running LLMs locally a breeze. Combine that with Open WebUI, which ties everything together into a neat little interface, and, of course, my go-to Nginx reverse proxy for easy access.
Let's break down the setup.
Ollama makes deploying LLMs locally ridiculously simple. Here's how to install it:
curl -fsSL https://ollama.com/install.sh | sh
This will install Ollama and set up everything you need to start running models locally. Want to make sure it's working? Just run:
ollama run codellama:13b
If you see an interactive prompt, congrats - you've got a local LLM running!
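Ollama also exposes an HTTP API on localhost:11434, which is exactly what Open WebUI will talk to in the next step. A quick sanity check from the shell (assuming you kept the default port):
curl http://localhost:11434/api/tags
That returns a JSON list of every model you've pulled locally.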
Ollama is great, but a web interface makes it even better. That's where Open WebUI comes in. It gives you a sleek, chat-like interface to interact with your models.
To install Open WebUI manually without Docker, follow these steps:
python3 -m venv ~/openwebui-venv
source ~/openwebui-venv/bin/activate
pip install open-webui
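Before wiring it into systemd, it's worth starting it once by hand to confirm the install works. Open WebUI serves on port 8080 by default:
open-webui serve
Once it's up and http://localhost:8080 answers, stop it with Ctrl+C and turn it into a proper service.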
To make sure Open WebUI runs on startup, create a systemd service file:
sudo nano /etc/systemd/system/openwebui.service
Paste the following content, swapping youruser for your actual username (systemd won't expand $USER inside a unit file):
[Unit]
Description=Open WebUI Service
After=network.target

[Service]
User=youruser
Group=youruser
WorkingDirectory=/home/youruser/openwebui-venv
ExecStart=/home/youruser/openwebui-venv/bin/open-webui serve
Restart=always

[Install]
WantedBy=multi-user.target
Save and exit, then reload systemd and enable the service:
sudo systemctl daemon-reload
sudo systemctl enable openwebui.service
sudo systemctl start openwebui.service
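If the service doesn't come up cleanly, the logs will tell you why:
sudo systemctl status openwebui.service
journalctl -u openwebui.service -f
With the service running, Open WebUI should be answering on http://localhost:8080 - that's the port we'll point Nginx at next.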
Now, let's make accessing our LLM easier by setting up an Nginx reverse proxy. This way, we can reach Open WebUI without exposing it directly.
Here's a basic Nginx config:
server {
    listen 443 ssl;
    server_name chat.example.com;

    # listen 443 ssl needs a certificate - these paths assume Let's Encrypt
    ssl_certificate /etc/letsencrypt/live/chat.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chat.example.com/privkey.pem;

    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        # Open WebUI streams responses over websockets
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
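This assumes a certificate for chat.example.com already exists at those Let's Encrypt paths. If you don't have one yet, certbot can issue it (assuming certbot and its Nginx plugin are installed, and port 80 is reachable for the challenge):
sudo certbot certonly --nginx -d chat.example.com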
Test the config and reload Nginx:
sudo nginx -t
sudo systemctl reload nginx
Now you can access your self-hosted LLM via https://chat.example.com. Fancy.
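A quick request from the command line confirms the whole chain - Nginx, Open WebUI, Ollama - is wired up:
curl -I https://chat.example.com
Anything in the 200 range means the proxy is doing its job; a 502 usually means the Open WebUI service isn't running.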
One of the biggest advantages of self-hosting an LLM? Your data stays with you.
No sending queries to an external API, no third-party tracking what you're asking, no potential leaks of sensitive information. It's all running on your hardware, fully under your control. Whether you're experimenting with code, processing confidential documents, or just having fun chatting with AI, everything stays local.
Of course, different models come with different memory requirements. Here's what's loaded on my RTX ADA 4000 right now (straight from ollama ps) and how much VRAM each one takes:
NAME             ID              SIZE      PROCESSOR    UNTIL
codellama:7b     8fdf8f752f6e    9.4 GB    100% GPU     2 minutes from now
codellama:13b    9f438cb9cd58    15 GB     100% GPU     4 minutes from now
gemma3:12b       6fd036cefda5    13 GB     100% GPU     4 minutes from now
This means I can comfortably run mid-sized models like codellama:13b while keeping things snappy.
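To see the same picture on your own box, ollama ps lists what's currently loaded (it's where the table above comes from), and nvidia-smi shows the actual VRAM usage on the card:
ollama ps
watch -n 1 nvidia-smi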
Picking the right GPU is all about balancing performance, VRAM, and cost - because, let's be honest, unless you're running an AI research lab, you're not dropping $30,000 on an H100. So how do the realistic options stack up?
Now, why the ADA 4000? While the RTX 5090 is the fastest in raw compute power, VRAM is king for training AI models. The ADA 4000's 20 GB of VRAM gives you enough room for Stable Diffusion training, larger batch sizes, and AI experiments without hitting the limits of consumer GPUs such as the RTX 3080 (10 GB).
Performance-wise, the 4090 and 5090 have more horsepower, but for training workloads where memory matters more than raw speed, the ADA 4000 is the more practical and cost-efficient choice. Lower power consumption also makes it a better long-term option if you're running AI workloads frequently. And then there's the physical size: it's a low-profile card that fits into pretty much any case.
At the end of the day, if you're serious about AI training and need a balance of VRAM, price, and efficiency, the ADA 4000 is the way to get started.
Prompt: implement fibonacci in python and also some unit tests using pytest
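For context, the kind of answer codellama:13b gives for that prompt looks roughly like this - my own illustrative sketch, not the model's verbatim output:

# fibonacci.py
def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number (0-indexed), computed iteratively."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# test_fibonacci.py
import pytest
from fibonacci import fibonacci

def test_base_cases():
    assert fibonacci(0) == 0
    assert fibonacci(1) == 1

def test_known_values():
    assert [fibonacci(n) for n in range(8)] == [0, 1, 1, 2, 3, 5, 8, 13]

def test_negative_input_raises():
    with pytest.raises(ValueError):
        fibonacci(-1)

Drop both files into a directory and run pytest to watch the tests pass.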
With an RTX ADA 4000, Ollama, Open WebUI, and an Nginx reverse proxy, I now have an AI-powered assistant running entirely on my own hardware. No subscriptions, no cloud dependencies, just raw, local AI power. If you're serious about AI and privacy, setting this up is a no-brainer. Give it a try, and let your GPU do some work.