The complete cookbook for running open-source AI models on your own hardware – from a budget laptop to a workstation. No cloud bills. No data leaving your machine. Real engineers. Real setups. No fluff.
Each episode is a standalone guide. Together they take you from zero (you've never run a local model) to being the person on your team who actually understands how local AI works.
Five tiers. Every common hardware profile covered. Pick your tier and see exactly which models run, what use cases unlock, and what you can't do yet.
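To check quickly whether a model fits your tier, use this rule of thumb: weight memory is about parameter count × bits per weight ÷ 8. The sketch below applies that rule and adds an assumed 20% overhead for the KV cache and runtime buffers. Real usage varies with context length, runtime, and quantization format (real 4-bit GGUF quants store slightly more than 4 bits per weight), so treat the results as estimates.

```python
def estimate_model_memory_gb(params_billion: float, bits_per_weight: float,
                             overhead: float = 0.2) -> float:
    """Rough RAM/VRAM estimate in decimal GB for running a model.

    Weights take params * bits / 8 bytes; `overhead` is an assumed extra
    fraction for KV cache and runtime buffers (hypothetical default).
    """
    weight_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb * (1 + overhead)


def fits(params_billion: float, bits_per_weight: float, available_gb: float) -> bool:
    """True if the estimated footprint fits in the memory you have."""
    return estimate_model_memory_gb(params_billion, bits_per_weight) <= available_gb


if __name__ == "__main__":
    # 14B parameters at 4 bits: 14e9 * 4 / 8 bytes = 7 GB of weights alone
    print(estimate_model_memory_gb(14, 4, overhead=0))  # 7.0
    print(fits(14, 4, 16))  # True
```

The 20% overhead is a placeholder. Long contexts can push the KV cache well beyond it.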
Every use case in this series builds on the same two tools: Ollama and Open WebUI. Install them once; every playbook below assumes both are running.
Ollama runs the models and serves a local API at localhost:11434. Every other tool connects to it.

```bash
# Linux / Mac
curl -fsSL https://ollama.com/install.sh | sh
# Windows → download the installer from ollama.com

# Pull your first models (T2 recommendation)
ollama pull gemma4:26b-a4b
ollama pull phi4
ollama pull qwen3:14b

# List the models you have downloaded
ollama list
```
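Every tool talks to Ollama over HTTP, so your own scripts can call it the same way. The sketch below uses only the Python standard library and Ollama's `/api/chat` endpoint with streaming turned off. It assumes Ollama is running on the default port and that you have already pulled `phi4`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_chat_request(model: str, prompt: str,
                       base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/chat endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=300) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With Ollama running, `print(chat("phi4", "Explain a context window in one sentence."))` prints the model's reply.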
Run Open WebUI in Docker, pointed at the Ollama server on your host machine:

```bash
docker run -d -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

# Open your browser at:
# http://localhost:3000
```
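Inside the Open WebUI container, `localhost` refers to the container itself. That is why the command points `OLLAMA_BASE_URL` at `host.docker.internal`. The sketch below applies the same pattern to your own tools. It reads the base URL from an `OLLAMA_BASE_URL` environment variable, falls back to `localhost`, and lists installed models through Ollama's `/api/tags` endpoint, the HTTP equivalent of `ollama list`. The variable name is borrowed from the command above; Ollama itself does not read it.

```python
import json
import os
import urllib.request
from collections.abc import Mapping


def ollama_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Resolve the Ollama URL: OLLAMA_BASE_URL if set, else the host default."""
    return env.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/")


def parse_model_names(tags_json: str) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]


def list_models(base_url: str) -> list[str]:
    """Ask a running Ollama server which models are installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return parse_model_names(resp.read().decode("utf-8"))
```

Suppose a container is started with `--add-host=host.docker.internal:host-gateway` and `-e OLLAMA_BASE_URL=http://host.docker.internal:11434`. From inside it, `list_models(ollama_base_url())` reaches the Ollama server on the host.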
Or, without Docker, install Open WebUI with pip:

```bash
pip install open-webui && open-webui serve
# then open http://localhost:8080
```

To chat with your documents, use the Open WebUI sidebar:

1. Go to Workspace → Documents.
2. Upload PDFs, DOCX, TXT, or MD files.
3. In chat, type # to reference a document.
4. Or enable "RAG Mode" to always search your documents.

Advanced: for a full local RAG setup, run AnythingLLM:

```bash
docker run -d -p 3001:3001 \
  -v anythingllm:/app/server/storage \
  --name anythingllm \
  mintplexlabs/anythingllm
```
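Under the hood, document chat follows one pattern. Split the documents into chunks, score each chunk against the question, and paste the best matches into the prompt. Open WebUI and AnythingLLM typically score chunks with embedding models. To stay dependency-free, this sketch swaps in plain word-count cosine similarity instead. It illustrates the retrieval step only; it does not show how either tool actually ranks text.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_words: int = 50) -> list[str]:
    """Split text into chunks of roughly `chunk_words` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]


def _vector(text: str) -> Counter:
    """Bag-of-words term counts, a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = _vector(question)
    return sorted(chunks, key=lambda c: cosine(q, _vector(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine retrieved chunks and the question into one prompt for the model."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```

The resulting prompt is what you would send to the local model, for example through Ollama's `/api/chat` endpoint.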
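For workflow automation, run n8n:

```bash
docker run -d -p 5678:5678 \
  -v n8n_data:/home/node/.n8n \
  --add-host=host.docker.internal:host-gateway \
  --name n8n \
  n8nio/n8n

# Open n8n at http://localhost:5678
# Add an Ollama node and set its base URL to http://host.docker.internal:11434
# (inside the container, localhost:11434 points at n8n itself, not at Ollama)
```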
Same foundation every time: Ollama + Open WebUI. What changes is the model you choose, the documents you upload, and how you wire the tools together.
- Coding assistant: phi4 with Continue.dev. In Continue, set the provider to ollama and the model to phi4. Press Tab for inline completion and Ctrl+I to chat in the editor.
- Document Q&A: in Open WebUI chat, type #filename to query a specific document.
- Long reports: gemma4:26b-a4b, whose 256K context handles full reports.
- Recommended models: gemma4:26b-a4b or mistral-small3.1.
- Private web search: docker run -d -p 8080:8080 searxng/searxng
- Automation: docker run -d -p 5678:5678 n8nio/n8n

A general model knows everything broadly but nothing specifically about your domain, your writing style, or your audience. Fine-tuning teaches it. Here's the complete picture: tools, techniques, process, and the hardware you actually need.
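Whether a full report actually fits in a long context window comes down to simple arithmetic. This sketch uses a common rough heuristic of about four characters per token for English text. That figure is an assumption, since real tokenizers vary. The sketch also reserves room for the instructions and the reply, and treats the 256K figure above as 256 × 1024 tokens. Ollama's default context length may be much smaller than a model's maximum, so the sketch builds an `options` dict that sets `num_ctx` explicitly.

```python
CHARS_PER_TOKEN = 4  # rough English-text heuristic (assumption, tokenizer-dependent)


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def context_needed(document: str, reply_budget: int = 2048, prompt_overhead: int = 256) -> int:
    """Tokens needed for the document, the instructions, and the model's answer."""
    return estimate_tokens(document) + prompt_overhead + reply_budget


def chat_options(document: str, model_max_ctx: int = 256 * 1024) -> dict:
    """Build an Ollama `options` dict, or raise if the report cannot fit."""
    needed = context_needed(document)
    if needed > model_max_ctx:
        raise ValueError(f"~{needed} tokens needed; model max is {model_max_ctx}")
    return {"num_ctx": needed}
```

For example, a 600,000-character report comes out at roughly 150,000 tokens, well under 256K. Pass the returned dict as the `"options"` field of an `/api/chat` request body.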
```json
{
  "messages": [
    {"role": "system", "content": "You are a senior cloud architect who explains complex systems to junior engineers in plain Urdu-influenced English."},
    {"role": "user", "content": "Explain microservices in simple terms."},
    {"role": "assistant", "content": "Think of it like a dhaba kitchen..."}
  ]
}
```

Repeat this pattern for each training example (200+ recommended). In the actual JSONL file, each example sits on a single line.
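A JSONL file holds one complete JSON object per line, so the pretty-printed example above must be collapsed onto a single line per example. The sketch below writes examples in that shape and checks their basic structure before training. The specific checks are assumptions about what a clean chat dataset looks like: allowed roles, string content, and ending on an assistant turn. No particular trainer requires them.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_example(example: dict) -> None:
    """Raise ValueError if an example doesn't match the messages format."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("example needs a non-empty 'messages' list")
    for msg in messages:
        if (not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES
                or not isinstance(msg.get("content"), str)):
            raise ValueError(f"bad message: {msg!r}")
    if messages[-1]["role"] != "assistant":
        raise ValueError("last message should be the assistant's answer")


def to_jsonl(examples: list[dict]) -> str:
    """Serialize validated examples, one JSON object per line."""
    for ex in examples:
        validate_example(ex)
    # ensure_ascii=False keeps non-English text readable in the file
    return "".join(json.dumps(ex, ensure_ascii=False) + "\n" for ex in examples)
```

Write the result to a UTF-8 file (any name, for example a hypothetical train.jsonl) and point your training script at it.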
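```python
from unsloth import FastLanguageModel  # import unsloth first so it can patch trl
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

# Load the base model in 4-bit (QLoRA) and attach LoRA adapters
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-4",
    max_seq_length=2048,
    load_in_4bit=True,  # QLoRA: frozen 4-bit base weights
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    lora_alpha=32,  # update is scaled by lora_alpha / r
    target_modules=["q_proj", "v_proj"],
)

# Load your JSONL dataset ("train.jsonl" is a placeholder name) and render
# each conversation to one string with the model's chat template
dataset = load_dataset("json", data_files="train.jsonl", split="train")
dataset = dataset.map(lambda ex: {"text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)})

# Train; hyperparameters are example values, and argument names shift
# between TRL/Unsloth versions, so check the versions you have installed
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(dataset_text_field="text", per_device_train_batch_size=2,
                   gradient_accumulation_steps=4, num_train_epochs=1,
                   learning_rate=2e-4, output_dir="outputs"),
)
trainer.train()

# Export to GGUF for Ollama, then reference the .gguf file from a Modelfile
# (FROM ./path/to/file.gguf) and run: ollama create my-model -f Modelfile
model.save_pretrained_gguf("my_model", tokenizer, quantization_method="q4_k_m")
```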
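LoRA (and QLoRA) fits on consumer hardware because of simple arithmetic. For each targeted weight matrix of shape d_out × d_in, LoRA trains two small matrices, B (d_out × r) and A (r × d_in). That adds r × (d_in + d_out) trainable parameters instead of updating all d_in × d_out. The update is scaled by lora_alpha / r, which is 2.0 with the values above. The layer count and matrix shapes in the test are hypothetical, chosen only for illustration, and are not taken from any specific model.

```python
def lora_params_per_matrix(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_in + d_out)  # A is r x d_in, B is d_out x r


def lora_summary(layers: int, shapes: dict[str, tuple[int, int]], r: int, alpha: int) -> dict:
    """Compare LoRA's trainable parameters with full fine-tuning of the same matrices.

    `shapes` maps a module name (e.g. "q_proj") to (d_in, d_out) for one layer.
    """
    per_layer = sum(lora_params_per_matrix(d_in, d_out, r) for d_in, d_out in shapes.values())
    full = sum(d_in * d_out for d_in, d_out in shapes.values()) * layers
    return {
        "lora_params": per_layer * layers,
        "full_params": full,  # what full fine-tuning of these matrices would update
        "scaling": alpha / r,
    }
```

Take 40 hypothetical layers with a 5120 × 5120 q_proj and a 5120 → 1280 v_proj. With r = 16, LoRA trains about 10.6M parameters, against 1.31B for the same matrices: under 1%.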
Local AI is a leverage skill. Every stage below unlocks a new tier of things you can build, automate, or offer. The timeline is realistic: each stage takes hours to weeks, not years.
The whole stack is free and runs on your machine. Here is every project this series is built on, with its source. Star the ones you rely on; that is how open source stays alive.