Docs
CLI Reference
Reference for pull, chat, serve, quantize, models, and list.
This page lists the main trillim subcommands and the flags most people use.
If you installed with uv, prefix each command on this page with uv run.
trillim list
List models and adapters available on HuggingFace from the Trillim organization.
trillim list [--json]
| Flag | Description |
|---|---|
--json | Output JSON instead of a formatted table |
Downloaded items are marked as local.
trillim pull
Download a pre-quantized model from HuggingFace.
trillim pull <model_id> [--revision <ref>] [--force]
| Flag | Description |
|---|---|
model_id | HuggingFace model ID such as Trillim/BitNet-TRNQ |
--revision | Branch, tag, or commit hash to download |
--force, -f | Re-download even if the model already exists locally |
Models are stored under ~/.trillim/models/<org>/<model>/.
Example:
trillim pull Trillim/BitNet-TRNQ
trillim models
List locally downloaded models and adapters.
trillim models [--json]
| Flag | Description |
|---|---|
--json | Output JSON instead of a formatted table |
Example output:
Models
MODEL ID ARCH SIZE SOURCE
-------------------- ---------- ---- -----
Trillim/BitNet-TRNQ BitNet 1.2G microsoft/bitnet-b1.58-2B-4T-bf16
Adapters
ADAPTER ID SIZE COMPATIBLE MODELS
------------------------------ ---- -----------------
Trillim/BitNet-GenZ-LoRA-TRNQ 24M Trillim/BitNet-TRNQ
trillim chat
Start an interactive chat session with a model.
trillim chat <model_dir> [options]
| Flag | Description |
|---|---|
model_dir | Local path or HuggingFace model ID resolved from ~/.trillim/models/ |
--lora <dir> | Quantized LoRA adapter directory |
--threads <N> | Inference thread count; 0 auto-detects as num_cores - 2 |
--lora-quant <type> | LoRA quantization: none, bf16, int8, q4_0, q5_0, q6_k, q8_0 |
--unembed-quant <type> | Unembed quantization: int8, q4_0, q5_0, q6_k, q8_0 |
--trust-remote-code | Allow loading custom tokenizer code from the model directory |
--harness <name> | Harness name: default or search |
--search-provider <name> | Search provider for the search harness: ddgs or brave |
Examples:
trillim chat Trillim/BitNet-TRNQ
trillim chat ./my-model-TRNQ
trillim chat Trillim/BitNet-TRNQ --lora Trillim/BitNet-GenZ-LoRA-TRNQ
trillim chat Trillim/BitNet-TRNQ --threads 4
trillim chat Trillim/BitNet-Search-TRNQ --harness search
trillim chat Trillim/BitNet-Search-TRNQ --harness search --search-provider brave
trillim serve
Start an OpenAI-compatible API server.
trillim serve <model_dir> [options]
| Flag | Description |
|---|---|
model_dir | Local path or HuggingFace model ID |
--host <addr> | Bind address, default 127.0.0.1 |
--port <N> | Bind port, default 8000 |
--voice | Enable speech-to-text and text-to-speech endpoints |
--whisper-model <size> | Whisper model size, default base.en |
--voices-dir <dir> | Directory for persistent custom voice WAV files, default ~/.trillim/voices |
--threads <N> | Inference thread count; 0 auto-detects |
--lora-quant <type> | LoRA quantization level |
--unembed-quant <type> | Unembed quantization level |
--trust-remote-code | Allow loading custom tokenizer code |
If you want --voice, install the optional extra first with uv add "trillim[voice]" or pip install "trillim[voice]".
trillim serve starts with the default harness. To switch a running server to the search harness, call POST /v1/models/load with "harness": "search" and optional "search_provider": "ddgs" | "brave".
Examples:
trillim serve Trillim/BitNet-TRNQ
trillim serve Trillim/BitNet-TRNQ --host 0.0.0.0 --port 3000
trillim serve Trillim/BitNet-TRNQ --voice
trillim serve Trillim/BitNet-TRNQ --voice --whisper-model medium.en
trillim quantize
Quantize safetensors model weights and/or extract a LoRA adapter into Trillim’s binary format. Only works for BitNet models currently.
trillim quantize <model_dir> [--model] [--adapter <dir>]
| Flag | Description |
|---|---|
model_dir | HuggingFace model directory containing config.json and safetensors |
--model | Write <model_dir>-TRNQ/qmodel.tensors and rope.cache |
--adapter <dir> | Write <adapter_dir>-TRNQ/qmodel.lora |
You can pass both --model and --adapter in the same command.
Examples:
trillim quantize ./bitnet-2b --model
trillim quantize ./bitnet-2b --adapter ./my-lora-checkpoint
trillim quantize ./bitnet-2b --model --adapter ./my-lora-checkpoint