Arif Sheikh

-- How-To Guides | Running LLMs Locally | Installation Guide --

Running large language models (LLMs) on your local desktop.

Install and Run Ollama + Docker Desktop + Web UI Locally.
Guide for Researchers & Personal Users

A step-by-step guide to setting up Ollama with a Web UI for running AI models on your local desktop.

What is a Large Language Model (LLM)?

A Large Language Model (LLM) is an advanced artificial intelligence (AI) system trained to understand and generate human-like text. Examples include OpenAI's ChatGPT, Meta's Llama, and Google's Gemini.

LLMs are used in chatbots, content creation, coding assistance, research, and more. Many AI models are cloud-based, meaning your data is sent over the internet for processing.

100% Open-Source AI in This Guide

This tutorial focuses **only on open-source models and tools**, ensuring full transparency, privacy, and control over your AI workflows. Everything used here—including **Ollama, Open WebUI, and the AI models**—is free and open-source.

You don’t need a cloud account or proprietary software. With just a few simple steps, you can run AI models **locally on your personal computer** without any hidden costs or restrictions.

Open-Source Models You Can Use

Some of the most popular **open-source** AI models that can run on Ollama include:

  • Mistral 7B – Lightweight and efficient for general AI tasks.
  • Mixtral 8x7B – A powerful MoE (Mixture of Experts) model for advanced AI tasks.
  • Llama 2 (7B, 13B, 70B) – Meta’s open-source model, great for chat and research.
  • DeepSeek (7B, 67B) – A cutting-edge AI model optimized for multilingual tasks.
  • StableLM – A lightweight model by Stability AI for creative writing and research.
  • Phi-2 – Small and efficient, ideal for personal AI assistants.
  • Gemma – A lightweight research-focused LLM.
  • Falcon (7B, 40B) – A high-performance model optimized for text generation.
  • WizardLM – A model fine-tuned for instruction-following AI tasks.

**No proprietary models required!** This guide keeps it simple and fully open-source, making AI accessible for **everyone**, including researchers outside of computer science.

**Tip:** If you're concerned about complexity, don’t worry! This tutorial is designed to be **easy to follow** and requires only basic computer knowledge.

Why Run an LLM Locally?

  • Privacy: No data is sent to external servers.
  • Speed: Faster responses without internet delays.
  • Customization: Fine-tune or modify the model for your needs.
  • Offline Capability: Use AI models without an internet connection.

To run an LLM locally, we use Ollama, a lightweight AI framework that allows you to download and run models without complex setup. We will also install a Web UI so you can interact with the AI using a browser.

Step 1: Install Ollama

Ollama is a tool that allows you to run AI models on your computer without relying on cloud services.

  1. Download Ollama from the official website (https://ollama.com).
  2. Run the installer and follow the setup instructions.
  3. Verify installation by opening a terminal and running:
    ollama --version
  4. Test Ollama by running a basic AI model:
    ollama run mistral
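After step 4 completes, you can sanity-check the installation from a terminal. This sketch also queries Ollama's local REST API, which listens on port 11434 by default (adjust `OLLAMA_URL` if you changed the port):

```shell
# Default address of the local Ollama server (change if you customized it).
OLLAMA_URL="http://localhost:11434"

ollama --version || echo "Ollama is not on your PATH"
ollama list      || echo "Could not list models -- is Ollama installed?"
curl -s "$OLLAMA_URL/api/tags" || echo "Ollama API not reachable at $OLLAMA_URL"
```

The `/api/tags` endpoint returns a JSON list of the models you have pulled, which is handy for scripting.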

Step 2: Install Docker Desktop

Docker helps run applications in an isolated environment, making it easier to manage AI tools like Web UIs.

  1. Download Docker Desktop from the official Docker website (https://www.docker.com).
  2. Install Docker and restart your computer if prompted.
  3. Verify installation by running:
    docker --version
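Beyond checking the client version, a quick way to confirm the Docker daemon itself is working is to run the tiny `hello-world` image. This is an optional smoke test:

```shell
# Optional smoke test: pull and run Docker's tiny hello-world image.
TEST_IMAGE="hello-world"

docker --version || echo "Docker CLI not found"
docker info --format '{{.ServerVersion}}' || echo "Docker daemon is not running"
docker run --rm "$TEST_IMAGE" || echo "Could not run a test container"
```

If the daemon is not running, start Docker Desktop first and re-run the last command.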

Step 3: Install Web UI for Ollama

Instead of using the command line, we can install a Web UI to interact with Ollama through a simple browser interface.

  1. Open a terminal and pull the Web UI container:
    docker pull ghcr.io/open-webui/open-webui:main
  2. Run the Web UI container (host port 3000 maps to container port 8080, where Open WebUI listens; the --add-host flag lets the container reach Ollama on the host, which Linux needs):
    docker run -d --name ollama-webui \
    -p 3000:8080 \
    --add-host=host.docker.internal:host-gateway \
    -v open-webui-data:/app/backend/data \
    -e OLLAMA_BASE_URL="http://host.docker.internal:11434" \
    ghcr.io/open-webui/open-webui:main
  3. Access the Web UI by opening:
    http://localhost:3000
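If the page at http://localhost:3000 does not load, the container's status and logs usually explain why. The commands below assume the container name `ollama-webui` used in step 2:

```shell
CONTAINER="ollama-webui"   # name given in the docker run command above

docker ps --filter "name=$CONTAINER" || echo "Docker not available"
docker logs --tail 20 "$CONTAINER"   || echo "No container named $CONTAINER"
docker restart "$CONTAINER"          || echo "Could not restart $CONTAINER"
```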

Step 4: Test Ollama with the Web UI

  1. Open the Web UI at http://localhost:3000 in your browser.
  2. Enter a test query like:
    Hello! How can I use Ollama for local AI processing?
  3. If Ollama generates a response, everything is working correctly.

Step 5: Automate Startup (Optional)

To make sure the Web UI and Ollama start automatically:

  • Enable Docker auto-start: Open Docker Desktop → Settings → General and check “Start Docker at system login.”
  • Restart the Web UI manually if needed:
    docker start ollama-webui
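Alternatively, Docker itself can restart the container after reboots and crashes, with no Docker Desktop setting involved. This sketch applies Docker's restart-policy mechanism to the `ollama-webui` container from Step 3:

```shell
# Tell Docker to keep the Web UI container running across reboots
# and crashes (unless you stop it yourself).
CONTAINER="ollama-webui"
docker update --restart unless-stopped "$CONTAINER" || echo "Is Docker running?"
```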

Final Check

Before you start using your local AI, verify that everything is set up:

  • Ollama Installed: Run ollama --version
  • Docker Installed: Run docker --version
  • Web UI Running: Open http://localhost:3000 in your browser
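The three checks above can be run in one go. This sketch assumes the default addresses used throughout this guide (port 11434 for Ollama, port 3000 for the Web UI):

```shell
# One-shot verification of all three components.
OLLAMA_URL="http://localhost:11434"
WEBUI_URL="http://localhost:3000"

ollama --version || echo "FAIL: Ollama not installed"
docker --version || echo "FAIL: Docker not installed"
curl -s --max-time 3 -o /dev/null "$OLLAMA_URL" \
  && echo "OK: Ollama server reachable" || echo "FAIL: Ollama server"
curl -s --max-time 3 -o /dev/null "$WEBUI_URL" \
  && echo "OK: Web UI reachable" || echo "FAIL: Web UI"
```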

Quick Tips for Personal Users and Researchers

Running AI models locally is exciting, but there are a few important things to consider, especially if you're using a **personal computer** or working in **academic research** without a strong computer science background.

Model Size Matters: From 7B to 600B+ Parameters

  • 7B Parameters (Smaller Models) – Suitable for personal desktops, runs on **CPU**, good for research.
  • 13B Parameters (Medium Models) – May run slowly on a **CPU**, better if you have **lots of RAM (32GB+).**
  • 65B+ Parameters (Large Models) – Needs a **high-end GPU**; not practical for most laptops or desktops.
  • 600B+ Parameters (Huge Models) – Cloud-scale models that need data-center hardware far beyond a personal machine.

Speed vs. Hardware: CPU vs. GPU

  • **CPU-Only (Standard Computers)** – Works for **small AI models (7B-13B),** but will be slower.
  • **RAM Matters** – AI models store data in memory. **16GB RAM minimum, 32GB+ recommended.**
  • **GPU-Accelerated (Gaming or AI GPUs)** – Needed for **large models (30B+)** for real-time responses.
  • **Cloud GPUs (Google Colab, AWS, etc.)** – Can be used for larger models but may have costs.
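A rough way to see why RAM matters: a model needs approximately its parameter count times the bytes stored per parameter. The factors below (about 0.5 bytes/parameter for 4-bit quantization, 2 bytes for FP16) are common rules of thumb, not exact figures, and real usage adds overhead for context and runtime buffers:

```shell
# Back-of-the-envelope RAM estimate: params (billions) x bytes/param = GB.
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b }'
}

estimate_gb 7 0.5    # 7B model, 4-bit quant: prints 3.5 (GB)
estimate_gb 13 0.5   # 13B model, 4-bit quant: prints 6.5 (GB)
estimate_gb 70 2     # 70B model, FP16: prints 140.0 (GB), cluster territory
```

This is why 7B models fit comfortably in 16GB of RAM while 70B models do not.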

Potential Issues & Fixes

  • **Slow Responses?** – Reduce model size (use 7B instead of 13B) or add more RAM.
  • **High RAM Usage?** – Close other programs. AI models use a lot of memory.
  • **CPU Overheating?** – Long AI sessions can overheat laptops; use a cooling pad.
  • **Battery Draining Fast?** – AI workloads use full CPU/GPU power; plug into power.

Best Practices for Academic Research

  • **Use Open-Source Models** – Avoid paid models if working in academia with limited funding.
  • **Use Lightweight Models First** – Test with 7B-13B models before scaling up.
  • **Use Local Vector Databases** – For Retrieval-Augmented Generation (RAG) experiments.
  • **Use Jupyter Notebooks** – Best for running small AI models and analyzing text output.

**Tip:** If you’re running Ollama on a **CPU-based system**, stick to **7B or 13B models** for the best performance!

Author & Acknowledgments

Author: Arifuzzaman (Arif) Sheikh

Affiliation: Department of Systems Engineering, Colorado State University

URL: https://www.engr.colostate.edu/~arif2022/new_design/llm

Email: arif.sheikh@colostate.edu

This guide was created to help researchers and individual users set up and run AI models locally using Ollama, Docker, and Web UIs. It is intended for **educational and research purposes** only.

Citation & Academic Use

If you found this guide useful in your research, please cite as:

@article{sheikh_2024ollama,
  author = {Arifuzzaman (Arif) Sheikh},
  title = {A Practical Guide to Running AI Models Locally: Ollama, Docker, and Web UI for Researchers},
  year = {2024},
  journal = {Colorado State University Systems Engineering Repository},
  url = {https://www.engr.colostate.edu/~arif2022/new_design/llm}
}

If publishing an academic paper, please include proper attribution when using insights from this guide.

Copyright & License

© 2024 Arifuzzaman (Arif) Sheikh, Colorado State University. All rights reserved. This document is distributed under the MIT License for educational and research purposes. Redistribution is permitted with proper attribution.

Additional Resources

This tutorial is part of an **open educational initiative** to support **AI research in academia** and **help individual users explore local AI model deployment**.

Now you are ready to run AI models on your local machine!