Running LLMs on your local desktop.
Install and Run Ollama + Docker Desktop + Web UI Locally.
Guide for Researchers & Personal Users
A step-by-step guide to setting up Ollama with a Web UI for running AI models on your local desktop.
What is a Large Language Model (LLM)?
A Large Language Model (LLM) is an advanced artificial intelligence (AI) system trained to understand and generate human-like text. Examples include OpenAI's ChatGPT, Meta's Llama, and Google's Gemini.
LLMs are used in chatbots, content creation, coding assistance, research, and more. Many AI models are cloud-based, meaning your data is sent over the internet for processing.
100% Open-Source AI in This Guide
This tutorial focuses **only on open-source models and tools**, ensuring full transparency, privacy, and control over your AI workflows. Everything used here—including **Ollama, Open WebUI, and the AI models**—is free and open-source.
You don’t need a cloud account or proprietary software. With just a few simple steps, you can run AI models **locally on your personal computer** without any hidden costs or restrictions.
Open-Source Models You Can Use
Some of the most popular **open-source** AI models that can run on Ollama include:
- Mistral 7B – Lightweight and efficient for general AI tasks.
- Mixtral 8x7B – A powerful MoE (Mixture of Experts) model for advanced AI tasks.
- Llama 2 (7B, 13B, 70B) – Meta’s open-source model, great for chat and research.
- DeepSeek (7B, 67B) – A cutting-edge AI model optimized for multilingual tasks.
- StableLM – A lightweight model by Stability AI for creative writing and research.
- Phi-2 – Small and efficient, ideal for personal AI assistants.
- Gemma – A lightweight research-focused LLM.
- Falcon (7B, 40B) – A high-performance model optimized for text generation.
- WizardLM – A model fine-tuned for instruction-following AI tasks.
**No proprietary models required!** This guide keeps it simple and fully open-source, making AI accessible for **everyone**, including researchers outside of computer science.
**Tip:** If you're concerned about complexity, don’t worry! This tutorial is designed to be **easy to follow** and requires only basic computer knowledge.
Why Run an LLM Locally?
- Privacy: No data is sent to external servers.
- Speed: Faster responses without internet delays.
- Customization: Fine-tune or modify the model for your needs.
- Offline Capability: Use AI models without an internet connection.
To run an LLM locally, we use Ollama, a lightweight AI framework that allows you to download and run models without complex setup. We will also install a Web UI so you can interact with the AI using a browser.
Step 1: Install Ollama
Ollama is a tool that allows you to run AI models on your computer without relying on cloud services.
- Download Ollama from the official website (https://ollama.com).
- Run the installer and follow the setup instructions.
- Verify installation by opening a terminal and running:

```shell
ollama --version
```

- Test Ollama by running a basic AI model (the first run downloads the model, which may take several minutes):

```shell
ollama run mistral
```
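Beyond running a chat session, Ollama can manage models from the same terminal. A few useful commands (`mistral` is used here as an example; any model tag from the Ollama library works):

```shell
# Download a model without starting a chat session
ollama pull mistral
# List the models currently stored on your machine
ollama list
# Remove a model you no longer need (frees several GB of disk space)
ollama rm mistral
```

Pulling a model ahead of time is handy on slow connections, since `ollama run` would otherwise block on the download before the first prompt.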
Step 2: Install Docker Desktop
Docker helps run applications in an isolated environment, making it easier to manage AI tools like Web UIs.
- Download Docker Desktop from the official Docker website (https://www.docker.com/products/docker-desktop/).
- Install Docker and restart your computer if prompted.
- Verify installation by running:

```shell
docker --version
```
Step 3: Install Web UI for Ollama
Instead of using the command line, we can install a Web UI to interact with Ollama through a simple browser interface.
- Open a terminal and pull the Web UI container:

```shell
docker pull ghcr.io/open-webui/open-webui:main
```

- Run the Web UI container (Open WebUI listens on port 8080 inside the container, mapped here to port 3000 on your machine):

```shell
docker run -d --name ollama-webui \
  -p 3000:8080 \
  -v open-webui-data:/app/backend/data \
  -e OLLAMA_API_BASE_URL="http://host.docker.internal:11434" \
  ghcr.io/open-webui/open-webui:main
```

- Access the Web UI by opening http://localhost:3000 in your browser.
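If the Web UI loads but shows no models, the container usually cannot reach Ollama. A quick check from your host machine that Ollama's API is up (it listens on port 11434 by default):

```shell
# Should return a JSON list of the models Ollama has available
curl http://localhost:11434/api/tags
```

Note: `host.docker.internal` resolves automatically on Docker Desktop (Windows/macOS), but on plain Linux Docker you may need to add `--add-host=host.docker.internal:host-gateway` to the `docker run` command above.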
Step 4: Test Ollama with the Web UI
- Open the Web UI in your browser at http://localhost:3000.
- Enter a test query like: “Hello! How can I use Ollama for local AI processing?”
- If Ollama generates a response, everything is working correctly.
Step 5: Automate Startup (Optional)
To make sure the Web UI and Ollama start automatically:
- Enable Docker auto-start: Open Docker Desktop → Settings → General and check “Start Docker at system login.”
- Restart the Web UI manually if needed:

```shell
docker start ollama-webui
```
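Docker can also restart the container for you after reboots or crashes, so no manual step is needed. A minimal sketch using a restart policy (`ollama-webui` is the container name from Step 3):

```shell
# Keep this container running across reboots and crashes,
# unless you explicitly stop it yourself
docker update --restart unless-stopped ollama-webui
```

With this policy set, starting Docker Desktop at login is enough to bring the Web UI back up automatically.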
Final Check
Before you start using your local AI, verify that everything is set up:
- Ollama Installed: Run `ollama --version`
- Docker Installed: Run `docker --version`
- Web UI Running: Open http://localhost:3000 in your browser
Quick Tips for Personal Users and Researchers
Running AI models locally is exciting, but there are a few important things to consider, especially if you're using a **personal computer** or working in **academic research** without a strong computer science background.
Model Size Matters: 7B vs. 671B Parameters
- 7B Parameters (Smaller Models) – Suitable for personal desktops, runs on **CPU**, good for research.
- 13B Parameters (Medium Models) – May run slowly on a **CPU**, better if you have **lots of RAM (32GB+).**
- 65B+ Parameters (Large Models) – Needs a **high-end GPU**; not practical for most laptops or desktops.
- 671B+ Parameters (Huge Models) – These are **cloud-only models**, requiring thousands of dollars in computing power.
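As a rough rule of thumb, a 4-bit-quantized model needs about half a byte of RAM per parameter, plus some overhead for the runtime and context window. A quick back-of-the-envelope estimate (the ~20% overhead factor is an assumption, not an exact figure):

```shell
# Estimate RAM for a 4-bit-quantized model:
# billions_of_params * 0.5 GB per billion params * 1.2 (assumed ~20% overhead)
for p in 7 13 70; do
  awk -v p="$p" 'BEGIN { printf "%sB model: ~%.1f GB RAM\n", p, p * 0.5 * 1.2 }'
done
# prints ~4.2 GB for 7B, ~7.8 GB for 13B, ~42.0 GB for 70B
```

This is why 7B models fit comfortably on a 16GB machine, while 70B models are out of reach without a high-end GPU or a large-memory workstation.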
Speed vs. Hardware: CPU vs. GPU
- **CPU-Only (Standard Computers)** – Works for **small AI models (7B-13B),** but will be slower.
- **RAM Matters** – AI models store data in memory. **16GB RAM minimum, 32GB+ recommended.**
- **GPU-Accelerated (Gaming or AI GPUs)** – Needed for **large models (30B+)** for real-time responses.
- **Cloud GPUs (Google Colab, AWS, etc.)** – Can be used for larger models but may have costs.
Potential Issues & Fixes
- **Slow Responses?** – Reduce model size (use 7B instead of 13B) or add more RAM.
- **High RAM Usage?** – Close other programs. AI models use a lot of memory.
- **CPU Overheating?** – Long AI sessions can overheat laptops; use a cooling pad.
- **Battery Draining Fast?** – AI workloads use full CPU/GPU power; plug into power.
Best Practices for Academic Research
- **Use Open-Source Models** – Avoid paid models if working in academia with limited funding.
- **Use Lightweight Models First** – Test with 7B-13B models before scaling up.
- **Use Local Vector Databases** – For Retrieval-Augmented Generation (RAG) experiments.
- **Use Jupyter Notebooks** – Best for running small AI models and analyzing text output.
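For scripted experiments, you can query Ollama's built-in REST API directly instead of going through the Web UI. A minimal sketch, assuming the `mistral` model from Step 1 is already pulled and Ollama is running on its default port:

```shell
# Send a single prompt to the local Ollama API and print the reply.
# "stream": false returns one JSON object instead of a token stream.
curl -s http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Summarize what a vector database is in one sentence.",
  "stream": false
}'
```

The reply is JSON, with the generated text in its `response` field, which makes it easy to collect model outputs programmatically for analysis.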
**Tip:** If you’re running Ollama on a **CPU-based system**, stick to **7B or 13B models** for the best performance!