Orion1: a local AI assistant trained on a Mac
A practical write-up: what Orion1 is, what it can do, and how it was trained locally.
What Orion1 is
A small model, carefully tuned for real conversations
Orion1 is a ChatGPT-style assistant you can run entirely on your own machine. It starts from a small open instruction model (primarily Qwen2.5-1.5B-Instruct) and learns new behavior through a lightweight fine-tuning method called LoRA.
The goal wasn't "make it bigger." It was to make it more useful: clearer writing, better step-by-step reasoning, stronger coding help, and fewer made-up answers when the question includes source text.
How it works
The same chat loop as larger assistants, just local
Orion1 follows a standard chat loop:
- Build a conversation (system prompt + your messages + history)
- Convert the conversation into the model's exact chat format
- Generate the reply token-by-token (with safe stop rules)
- Stream the answer back into a local UI (Gradio) or into Ollama
Practical lesson: the Ollama Modelfile must match Qwen's chat template and stop tokens. When it doesn't, quality drops (style drift, weird identity claims, and inconsistent formatting).
Base model size
~1.5B params
Qwen2.5-1.5B-Instruct (primary)
Fine-tuning method
LoRA adapters
Train ~1% params instead of all weights
Training device
MacBook Air (M5)
Apple Silicon via PyTorch MPS
Training data mix
Configured cap across datasets: 680,927 examples (maximum)
Training data by category
- General chat274,500 (40.3%)
- Coding/agentic165,000 (24.2%)
- Reasoning157,000 (23.1%)
- Preference/safety50,000 (7.3%)
- Grounded QA34,427 (5.1%)
Source: local dataset configuration. Caps are sampling ceilings, not guaranteed counts.
Configured examples by category
Caps keep local training predictable on a laptop.
What we trained for
Practical targets and why they help
| Goal | How we trained it | Expected impact |
|---|---|---|
| Speak properly | High-quality instruction/chat + preference/safety datasets | Cleaner structure, less rambling, better tone |
| Answer complex questions | Reasoning + math/science mixtures (Stratos, Tulu, MetaMath, Nemotron) | More step-by-step problem solving and synthesis |
| Coding + agentic behavior | Code instruction + function-calling datasets | Better code generation and tool-style responses |
| Know info correctly | Add grounded QA (BoolQ/SQuAD) + use "be honest" system prompts | Better at using provided context; still imperfect for open-world facts without retrieval |
Tip: training helps style and habits. For up-to-date factual answers, pair the model with retrieval (search/RAG) or tools.
Milestones (project timeline)
What we built, in order
1
Vision model
2
Chat LoRA (Qwen2.5-1.5B)
3
UI + inference hardening
4
More coding/agentic data
5
Reasoning mix + grounded QA
6
Ollama export (GGUF + Modelfile)
Source: project history in the transcript.
Why training took days
Fanless laptop + large dataset
0.5h
Small smoke test
1.5h
20k examples
6h
100k+ examples
9h
500k+ examples
LoRA keeps memory manageable, but long runs on a MacBook Air can still be slow due to fanless thermals and limited GPU throughput.
Illustrative scaling curve (order-of-magnitude).
How Orion1 was trained (nutshell)
Training loop
We streamed multiple Hugging Face datasets and converted them into a single, consistent chat format. Then we applied Qwen's chat template and fine-tuned with LoRA on Apple Silicon (MPS), using gradient accumulation and checkpointing to fit within laptop memory.
Deployment
After training, we exported the model for local use: either via a Gradio UI in Python, or by converting to GGUF (llama.cpp) and importing into Ollama with a Modelfile that preserves the chat template.
Training hardware + constraints
Machine: MacBook Air (M5), running PyTorch with Apple's MPS backend.
Key constraints: fanless thermals, limited GPU throughput vs desktop CUDA, and unified memory pressure for long context lengths.
Why LoRA: trains a small set of adapter weights (millions) rather than all model weights (billions), making local fine-tuning feasible.
One-line summary: "Orion1 is a LoRA-adapted Qwen2.5-1.5B chat model trained locally on an M5 MacBook Air using a curated multi-dataset mixture for reasoning, coding, and grounded QA."