<tt>GET /health</tt>
Purpose:
- authoritative runtime readiness check
- active profile and release identity
- safe public telemetry surface for the current room
Important response fields:
status
ready
detail
service_rev
profile_id
router_enabled
max_doc_tokens
default_memory_tokens
default_max_new_tokens
fallback_mode
deterministic_generation
config_class
base_model_class
documents_total
service_uptime_s
Example:
curl -s http://127.0.0.1:8091/health | jq
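A script that depends on the runtime can poll this endpoint before sending work. The following is a minimal Python sketch, not an official client. It assumes that `ready` is a JSON boolean and that the service listens on the address used in the curl examples. The helper names `is_ready` and `wait_until_ready` are hypothetical.

```python
import json
import time
import urllib.request

BASE_URL = "http://127.0.0.1:8091"  # address taken from the curl examples


def is_ready(health: dict) -> bool:
    """Return True only if the payload carries ready as a JSON true (assumed type)."""
    return health.get("ready") is True


def wait_until_ready(timeout_s: float = 60.0, interval_s: float = 2.0) -> dict:
    """Poll GET /health until it reports ready, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{BASE_URL}/health", timeout=5) as resp:
                health = json.load(resp)
            if is_ready(health):
                return health
        except (OSError, ValueError):
            pass  # unreachable server or non-JSON body: retry until the deadline
        if time.monotonic() >= deadline:
            raise TimeoutError("runtime did not report ready in time")
        time.sleep(interval_s)
```

Calling `wait_until_ready()` returns the full health payload, so fields such as `service_rev` and `profile_id` can be logged alongside the readiness result.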
<tt>GET /documents</tt>
Purpose:
- list the documents currently compiled into the active runtime
Response fields per document:
doc_id
name
tokens
chars
memory_tokens
cache_size_mb
compile_time_s
extraction_time_s
total_compile_time_s
source_tokens_total
source_tokens_used
source_truncated
status
Example:
curl -s http://127.0.0.1:8091/documents | jq
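The per-document fields make it possible to spot documents whose source was cut short during compilation. The sketch below is illustrative only. It assumes the response has already been unwrapped into a list of per-document objects (the top-level shape is not specified here), that `source_truncated` is a boolean, and that `source_tokens_used` and `source_tokens_total` are counts in the same unit. The function name `truncation_report` is hypothetical.

```python
def truncation_report(docs: list) -> list:
    """List documents flagged as truncated, with the share of source tokens used."""
    report = []
    for doc in docs:
        if not doc.get("source_truncated"):
            continue  # keep only documents the runtime flagged as truncated
        total = doc.get("source_tokens_total") or 0
        used = doc.get("source_tokens_used") or 0
        report.append({
            "doc_id": doc.get("doc_id"),
            "name": doc.get("name"),
            # None when the total is missing or zero, to avoid dividing by zero
            "used_fraction": used / total if total else None,
        })
    return report
```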
<tt>POST /warmup</tt>
Purpose:
- explicitly load the runtime before the first compile or query
Example:
curl -s -X POST http://127.0.0.1:8091/warmup | jq '.status, .ready'
<tt>POST /compile_text</tt>
Purpose:
- compile already-extracted text directly into LATCH memory
Request fields:
name
text
source_type
memory_tokens optional
Example:
curl -s http://127.0.0.1:8091/compile_text \
-H 'Content-Type: application/json' \
-d '{"name":"acme.txt","text":"Acme Corp builds workflow software.","source_type":"text"}'
<tt>POST /compile_file</tt>
Purpose:
- upload one file payload and let the runtime extract and compile it
Request fields:
name or filename
content_base64
memory_tokens optional
Supported file types:
.pdf
.txt
.md
.html
.docx
.xlsx
.pptx
.csv
.json
.xml
Example:
curl -s http://127.0.0.1:8091/compile_file \
-H 'Content-Type: application/json' \
-d '{"filename":"acme-10k.pdf","content_base64":"<base64>"}'
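The `<base64>` placeholder above has to be produced from the raw file bytes. One possible way to build the request body in Python is sketched here. It assumes standard (not URL-safe) base64 and checks the extension client-side against the list above, compared case-insensitively; the server's own validation may differ. The function name `build_compile_file_payload` is hypothetical.

```python
import base64
from pathlib import Path
from typing import Optional

# Extensions listed under "Supported file types" above
SUPPORTED_SUFFIXES = {
    ".pdf", ".txt", ".md", ".html", ".docx",
    ".xlsx", ".pptx", ".csv", ".json", ".xml",
}


def build_compile_file_payload(
    filename: str, data: bytes, memory_tokens: Optional[int] = None
) -> dict:
    """Build a JSON-serializable body for POST /compile_file."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    payload = {
        "filename": filename,
        # standard base64 alphabet, decoded to str so json.dumps accepts it
        "content_base64": base64.b64encode(data).decode("ascii"),
    }
    if memory_tokens is not None:
        payload["memory_tokens"] = memory_tokens  # optional field
    return payload
```

For a file on disk, `build_compile_file_payload(path.name, path.read_bytes())` with a `pathlib.Path` yields a body that can be serialized with `json.dumps` and posted as in the curl example.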
<tt>POST /query</tt>
Purpose:
- run one query against one or more compiled documents
Request fields:
query
doc_ids
memory_tokens optional
max_new_tokens optional
Example:
curl -s http://127.0.0.1:8091/query \
-H 'Content-Type: application/json' \
-d '{"query":"Summarize the company in 3 bullets.","doc_ids":["doc_6f3a59b3f8"]}'
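The same request can be issued from Python with only the standard library. This is a sketch under assumptions: the base URL mirrors the curl examples, the response is assumed to be a JSON object, and `build_query_payload` and `post_json` are hypothetical helper names rather than part of any shipped client.

```python
import json
import urllib.request
from typing import Optional, Sequence

BASE_URL = "http://127.0.0.1:8091"  # address taken from the curl examples


def build_query_payload(
    query: str,
    doc_ids: Sequence[str],
    memory_tokens: Optional[int] = None,
    max_new_tokens: Optional[int] = None,
) -> dict:
    """Build the body for POST /query, omitting optional fields that are unset."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if not doc_ids:
        raise ValueError("doc_ids must name at least one compiled document")
    payload = {"query": query, "doc_ids": list(doc_ids)}
    if memory_tokens is not None:
        payload["memory_tokens"] = memory_tokens
    if max_new_tokens is not None:
        payload["max_new_tokens"] = max_new_tokens
    return payload


def post_json(path: str, payload: dict, timeout_s: float = 120.0) -> dict:
    """POST a JSON body to the runtime and decode the JSON response."""
    request = urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)
```

With a running service, `post_json("/query", build_query_payload("Summarize the company in 3 bullets.", ["doc_6f3a59b3f8"]))` sends the same body as the curl example.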
<tt>POST /batch</tt>
Purpose:
- execute one synchronous batch request against a set of documents and prompts
Use this endpoint to evaluate multiple prompts, multiple documents, or both in a single request.
It is the customer-facing JSON-in / JSON-out multi-prompt endpoint.
Typical flow:
- Compile one or more documents first with POST /compile_text or POST /compile_file.
- Save the compiled doc_id values returned by those compile calls.
- Put those doc_id values plus a series of prompts into a JSON file.
- POST that JSON file to /batch.
- Receive one JSON response containing the answers plus optional telemetry and economics blocks.
Key request fields:
session_id
schema_version
documents
queries
options
Per-document fields:
doc_id
filename optional
description optional
Per-query fields:
query_id
prompt
doc_scope
note optional
Execution options:
include_telemetry
include_economics
temperature
max_tokens
Notes:
- doc_scope can be a list of doc_id values or the string "all".
- temperature must remain 0.0 for the supported deterministic product path.
- The response is ordinary JSON, so it is easy to save directly to disk for downstream scripts.
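The two constraints in the notes above can be checked client-side before a request is sent. The sketch below is one way to do that and does not reproduce the server's validation. It assumes that a missing `temperature` is acceptable, and it does not check whether the listed doc_id values exist. The function name `batch_request_problems` is hypothetical.

```python
def batch_request_problems(request: dict) -> list:
    """Return human-readable problems found in a /batch request body."""
    problems = []

    # Assumption: an absent temperature is treated as acceptable here.
    temperature = request.get("options", {}).get("temperature", 0.0)
    if temperature != 0.0:
        problems.append("options.temperature must remain 0.0")

    for query in request.get("queries", []):
        scope = query.get("doc_scope")
        is_all = scope == "all"
        is_id_list = isinstance(scope, list) and all(
            isinstance(doc_id, str) for doc_id in scope
        )
        if not (is_all or is_id_list):
            problems.append(
                f"{query.get('query_id')}: doc_scope must be a list of "
                "doc_id values or the string 'all'"
            )
    return problems
```

An empty return value means neither rule was violated; it does not guarantee that the runtime will accept the request.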
Smallest useful request:
{
"session_id": "demo-001",
"schema_version": "1.0",
"documents": [
{ "doc_id": "doc-10k" }
],
"queries": [
{
"query_id": "q001",
"prompt": "Summarize this company in 2 sentences.",
"doc_scope": ["doc-10k"]
}
],
"options": {
"temperature": 0.0,
"max_tokens": 256
}
}
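A request of this shape can also be generated from saved doc_id values and a list of prompts. The following sketch makes a few assumptions: the `q001`-style numbering simply follows the example, every query is scoped to all supplied documents, and `build_batch_request` and `save_request` are hypothetical names.

```python
import json
from typing import Sequence


def build_batch_request(
    session_id: str,
    doc_ids: Sequence[str],
    prompts: Sequence[str],
    max_tokens: int = 256,
) -> dict:
    """Build a minimal /batch request: one query per prompt, scoped to every doc."""
    return {
        "session_id": session_id,
        "schema_version": "1.0",
        "documents": [{"doc_id": doc_id} for doc_id in doc_ids],
        "queries": [
            {
                "query_id": f"q{index:03d}",  # q001, q002, ... (assumed convention)
                "prompt": prompt,
                "doc_scope": list(doc_ids),
            }
            for index, prompt in enumerate(prompts, start=1)
        ],
        # temperature stays 0.0 for the supported deterministic path
        "options": {"temperature": 0.0, "max_tokens": max_tokens},
    }


def save_request(request: dict, path: str = "batch-request.json") -> None:
    """Write the request to disk so it can be posted with curl -d @file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(request, handle, indent=2)
```

Calling `build_batch_request("demo-001", ["doc-10k"], ["Summarize this company in 2 sentences."])` reproduces the request shown above.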
Example batch-request.json:
{
"session_id": "demo-001",
"schema_version": "1.0",
"documents": [
{
"doc_id": "doc-10k",
"filename": "Instructure_10-K.pdf",
"description": "Primary annual report"
}
],
"queries": [
{
"query_id": "q001",
"prompt": "What does Instructure do and what markets does it serve?",
"doc_scope": ["doc-10k"],
"note": "Opening summary question"
},
{
"query_id": "q002",
"prompt": "List the main risks in 3 bullets.",
"doc_scope": "all"
}
],
"options": {
"include_telemetry": true,
"include_economics": true,
"temperature": 0.0,
"max_tokens": 512
}
}
Example:
curl -s http://127.0.0.1:8091/batch \
-H 'Content-Type: application/json' \
-d @batch-request.json > batch-response.json
Example response shape:
{
"session_id": "demo-001",
"schema_version": "1.0",
"model": "cdlac_latch_qwen14b_locked_20260317",
"latch_version": "latch_product_nomount_20260325",
"generated_at": "2026-03-25T23:45:00Z",
"session_telemetry": {
"documents_compiled": 1,
"total_compile_time_s": 1.24,
"total_queries": 2,
"total_wall_clock_s": 2.91,
"total_tokens_generated": 58
},
"economics": {
"hardware": "NVIDIA A100 80GB PCIe",
"latch_session_cost_usd": 0.001786,
"baseline_estimated_session_cost_usd": 0.004911,
"savings_pct": 63.64,
"speedup_factor": 2.75
},
"queries": [
{
"query_id": "q001",
"prompt": "What does Instructure do and what markets does it serve?",
"doc_scope": ["doc-10k"],
"note": "Opening summary question",
"response": "Instructure provides cloud-based learning software for education and enterprise training.",
"telemetry": {
"ttft_s": 0.23,
"e2e_s": 1.67,
"tokens_generated": 23,
"decode_tps": 147.8,
"decode_mode": "single"
},
"baseline_estimate": {
"estimated_ttft_s": 1.42,
"estimated_e2e_s": 2.08,
"ttft_speedup": 6.17,
"e2e_speedup": 1.25
}
}
],
"latch_signature": {
"license_id": "sha256:1234567890abcdef",
"model_tuple": "LATCH-Qwen2.5-14B-r1-5k-step3000",
"image_tag": "latch_product_nomount_20260325"
}
}
Important response fields:
session_id
schema_version
model
latch_version
generated_at
session_telemetry
economics optional
queries
latch_signature
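Because the response is plain JSON, a downstream script can reduce it to the fields it needs. The sketch below reads the saved response and pulls out per-query timings. It assumes the field names match the example response shape and treats the `economics` and `telemetry` blocks as optional. The function name `summarize_batch_response` is hypothetical.

```python
import json


def summarize_batch_response(response: dict) -> dict:
    """Collect per-query timings and the headline savings figure, if present."""
    rows = []
    for query in response.get("queries", []):
        telemetry = query.get("telemetry") or {}  # may be absent
        rows.append({
            "query_id": query.get("query_id"),
            "ttft_s": telemetry.get("ttft_s"),
            "e2e_s": telemetry.get("e2e_s"),
            "tokens_generated": telemetry.get("tokens_generated"),
        })
    economics = response.get("economics") or {}  # optional block
    return {
        "session_id": response.get("session_id"),
        "savings_pct": economics.get("savings_pct"),
        "queries": rows,
    }


def load_response(path: str = "batch-response.json") -> dict:
    """Load a response previously saved by the curl example."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `summarize_batch_response(load_response())` condenses the file written by the curl command above into one row per query.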
<tt>POST /documents/clear</tt>
Purpose:
- clear the active compiled document set from the runtime
Example:
curl -s -X POST http://127.0.0.1:8091/documents/clear | jq