OpenAI-compatible /v1/* API
shipped Chat, completions, embeddings, rerank, transcriptions, speech, images. Every OpenAI SDK works unchanged against the local box.
Slot lifecycle state machine
shipped Real atomic transitions (offline → pulling → starting → warming → ready → serving), persisted and SSE-streamed so the dashboard reflects reality, not snapshots.
Hardware-aware probe
shipped Detects GPU, NPU, and unified memory; surfaces VRAM/RAM fit warnings inline before you load a model that won’t fit.
Bundled OpenWebUI
shipped Prewired chat on :3001 pointed at the local API. Zero config; the installer wires it for you.
Cosign-signed self-update
shipped Atomic version swap with rollback. Stable and nightly channels, GitHub OIDC-verified release tarballs.
Five-provider stack
shipped llama.cpp (Vulkan / ROCm) for chat and embed, FLM for NPU multiplex, Moonshine for STT, Kokoro for TTS, ComfyUI for image generation. All first-class in v1.
Image generation
shipped POST /v1/images/generations served by a ComfyUI provider on ROCm. Curated SDXL Turbo / SD 1.5 / Flux Schnell with license badges; the img slot runs inside the same lifecycle as chat, embed, and audio.
Caddy + auth + HTTPS + mDNS
shipped install.sh --auth=basic brings up Caddy at the edge with basic_auth, Bearer tokens for the OpenAI API, automatic HTTPS (internal CA for .local, Let’s Encrypt for real domains), and a best-effort Avahi service file so hal0.local resolves on the LAN. Off by default; opt in when you’re ready to expose the box.
Installer overhaul
shipped ASCII banner + step counter, extracted preflight.sh (disk + ports + python + systemd), hal0 doctor for re-running it post-install, hardware cards + auto-populated slots/primary.toml, contextual ERR-trap recovery hints, and a live-hello + QR + reachability finish.
Capability slots overlay
shipped Embed, Voice, and Img capability cards plus an NPU backend rollup in the dashboard, backed by /etc/hal0/capabilities.toml and GET|POST /api/capabilities/*. The orchestrator picks one model per child, validates (backend, model) pairs against the catalog, and reconciles drift on every apply.
FLM NPU provider live
shipped Self-contained ghcr.io/hal0ai/hal0-toolbox-flm:v1 image bundles FLM; chat and embed surfaced in the picker only when XDNA hardware and the local toolbox image are both present. Model namespace comes from flm list -j; STT slice deferred.
Proxmox host-pressure segment
shipped Drop a read-only PVEAuditor API token into Settings → Proxmox integration and the dashboard memory bar shows the physical DIMM total plus a muted Proxmox host segment for other-tenant + ZFS ARC + kernel pressure. Token sits 0600 at /etc/hal0/proxmox.json and the API redacts it on read; bare-metal installs leave the panel off.
embed-rerank built-in slot
shipped The capability orchestrator auto-creates an embed-rerank slot on first enable: bge-reranker-v2-m3-q4_k_m on port 8086 with --reranking, so /v1/rerankings stays separate from the chat surface on :8081.
First-run wizard rewrite
shipped Eight linear steps from password through hardware, primary model, capabilities, HF token, license, install, done. HF-token step is conditional on a gated model selection; legacy 5-step picker and three IA prototypes deleted.
--models-dir install flag
shipped install.sh --models-dir=PATH (or HAL0_MODELS_DIR, or an interactive tty prompt) points a fresh install at an existing model store. Persisted as [models].pull_root and included in [models].roots so the on-disk tree scans on first boot.
Models scan preview overhaul
shipped POST /api/models/scan/preview is inspection-only; kind-driven gating (llama / moonshine / kokoro / flm), shared skip rules, per-row select, dedup across HF blob-cache symlinks, GGUF magic-byte detection regardless of extension, GGUF general.name → suggested name, HF-cache repo-name fallback.
Per-slot live metrics
shipped GET /api/slots/metrics reads docker cgroup memory + ActiveEnterTimestampMonotonic + scraped llama-server /metrics, with a fallback to the systemd unit’s MemoryCurrent for native-host slots.
Orchestrator drift reconcile
shipped CapabilityOrchestrator.apply() now reconciles capabilities.toml ↔ slots/*.toml on every call rather than only on selection diff, so prior failed applies, manual edits, and install-time seeds can no longer leave the two disagreeing.
Hugging Face model pulls
in-progress POST /api/models/{id}/pull (streaming HF download) is still 501 today; the picker + first-run wizard wire it for the next 0.1.x alpha. (FLM-aware pull for NPU tags landed in PR #89, so Ollama-style family:size ids already pull from the dashboard.)
Agent management & integration
in-progress First-class agents stored on hal0 with conversation orchestration and deterministic tool-use loops. A place for agents to live on your box, not only to chat with.
Unified memory system
in-progress Persistent, federated memory across local slots and (optionally) external models. Mem0-style API with sources, search, and scoping.
MCP support, host and server
in-progress hal0 speaks Model Context Protocol both directions. Compose tools across local slots and external MCP services; discover them from the dashboard.
Extensions framework
in-progress Third-party apps packaged as hal0 extensions: systemd unit + dashboard tile + healthcheck. Bring-your-own-app, lifecycle-managed.
Benchmarks & presets UI
in-progress In-dashboard tok/s + latency runs, plus curated loadout presets you can flash onto a fresh install.
AUR PKGBUILD & Ubuntu PPA
in-progress Native distro packages on top of the install script: pacman and apt as first-class install paths.