LATCH Customer API Reference
Customer-facing HTTP API and telemetry surface for the self-hosted LATCH runtime.
Endpoint Reference

<tt>GET /health</tt>

Purpose:

  • authoritative runtime readiness check
  • active profile and release identity
  • safe public telemetry surface for the current room

Important response fields:

  • status
  • ready
  • detail
  • service_rev
  • profile_id
  • router_enabled
  • max_doc_tokens
  • default_memory_tokens
  • default_max_new_tokens
  • fallback_mode
  • deterministic_generation
  • config_class
  • base_model_class
  • documents_total
  • service_uptime_s

Example:

curl -s http://127.0.0.1:8091/health | jq
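
The same check can be scripted. The following is a minimal Python sketch, not an official client: it assumes the runtime listens on the address used in the curl example and that ready is returned as a JSON boolean. The helper names fetch_health and is_ready are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8091"  # address taken from the curl example


def fetch_health(base_url=BASE_URL, timeout=5.0):
    """GET /health and return the parsed JSON body as a dict."""
    with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
        return json.load(resp)


def is_ready(health):
    """Assumption: `ready` is a JSON boolean; any other value counts as not ready."""
    return health.get("ready") is True
```

Call is_ready(fetch_health()) before sending compile or query requests; the strict boolean comparison avoids treating a string such as "true" as ready.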

<tt>GET /documents</tt>

Purpose:

  • list the documents currently compiled into the active runtime

Response fields per document:

  • doc_id
  • name
  • tokens
  • chars
  • memory_tokens
  • cache_size_mb
  • compile_time_s
  • extraction_time_s
  • total_compile_time_s
  • source_tokens_total
  • source_tokens_used
  • source_truncated
  • status

Example:

curl -s http://127.0.0.1:8091/documents | jq
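
The per-document fields make it possible to spot documents whose source was cut off at compile time. The sketch below is illustrative only: it takes the list of document objects as input (whether the endpoint returns a bare array or wraps it in an object is not stated here), and it assumes source_truncated is a boolean and that source_tokens_used counts the part of source_tokens_total that was actually compiled. Both function names are hypothetical.

```python
def truncated_documents(documents):
    """Return the doc_id of every document flagged as truncated.

    Assumption: source_truncated is a JSON boolean.
    """
    return [doc["doc_id"] for doc in documents if doc.get("source_truncated") is True]


def source_coverage(doc):
    """Fraction of the source tokens that were used, between 0.0 and 1.0.

    Assumption: source_tokens_used <= source_tokens_total. A document with
    no token counts is reported as fully covered.
    """
    total = doc.get("source_tokens_total") or 0
    used = doc.get("source_tokens_used") or 0
    return used / total if total > 0 else 1.0
```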

<tt>POST /warmup</tt>

Purpose:

  • explicitly load the runtime before the first compile or query

Example:

curl -s -X POST http://127.0.0.1:8091/warmup | jq '.status, .ready'
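
If a script needs to block until the runtime reports ready, a small polling loop is one option. This sketch assumes that readiness can lag behind the warmup call and that ready is a JSON boolean; neither is confirmed by this reference. The function name wait_until_ready is hypothetical, and health fetching is passed in as a callable so the loop stays independent of any HTTP library.

```python
import time


def wait_until_ready(fetch_health, attempts=10, delay_s=1.0, sleep=time.sleep):
    """Poll until the health payload reports ready, or give up.

    fetch_health: zero-argument callable returning the parsed /health JSON.
    Returns True as soon as `ready` is true, False after `attempts` tries.
    """
    for attempt in range(attempts):
        if fetch_health().get("ready") is True:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # no sleep after the final attempt
    return False
```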

<tt>POST /compile_text</tt>

Purpose:

  • compile already-extracted text directly into LATCH memory

Request fields:

  • name
  • text
  • source_type
  • memory_tokens (optional)

Example:

curl -s http://127.0.0.1:8091/compile_text \
  -H 'Content-Type: application/json' \
  -d '{"name":"acme.txt","text":"Acme Corp builds workflow software.","source_type":"text"}'
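
Hand-written JSON in a shell string breaks as soon as the text contains quotes or newlines. A hedged Python sketch of building the same request body follows; the function name compile_text_payload is hypothetical, and leaving memory_tokens out of the body when it is not set is an assumption about how the optional field is best omitted.

```python
import json


def compile_text_payload(name, text, source_type="text", memory_tokens=None):
    """Build the JSON body for POST /compile_text from the documented fields."""
    payload = {"name": name, "text": text, "source_type": source_type}
    if memory_tokens is not None:
        payload["memory_tokens"] = memory_tokens  # optional field, sent only when set
    return payload


# json.dumps handles quoting and escaping of the document text.
body = json.dumps(compile_text_payload("acme.txt", "Acme Corp builds workflow software."))
```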

<tt>POST /compile_file</tt>

Purpose:

  • upload one file payload and let the runtime extract and compile it

Request fields:

  • name or filename
  • content_base64
  • memory_tokens (optional)

Supported file types:

  • .pdf
  • .txt
  • .md
  • .html
  • .docx
  • .xlsx
  • .pptx
  • .csv
  • .json
  • .xml

Example:

curl -s http://127.0.0.1:8091/compile_file \
  -H 'Content-Type: application/json' \
  -d '{"filename":"acme-10k.pdf","content_base64":"<base64>"}'
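
The <base64> placeholder stands for the encoded file bytes. A minimal Python sketch of producing that body is shown below. It assumes the standard Base64 alphabet (not the URL-safe variant) and performs a client-side extension check against the supported types listed above; the case-insensitive matching and the function name compile_file_payload are illustrative choices, not documented behavior.

```python
import base64
from pathlib import Path

# File types listed as supported in this reference.
SUPPORTED_SUFFIXES = {
    ".pdf", ".txt", ".md", ".html", ".docx",
    ".xlsx", ".pptx", ".csv", ".json", ".xml",
}


def compile_file_payload(filename, data, memory_tokens=None):
    """Build the JSON body for POST /compile_file from raw file bytes."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError("unsupported file type: " + repr(suffix))
    payload = {
        "filename": filename,
        # Assumption: standard (not URL-safe) Base64, sent as an ASCII string.
        "content_base64": base64.b64encode(data).decode("ascii"),
    }
    if memory_tokens is not None:
        payload["memory_tokens"] = memory_tokens
    return payload
```

For a file on disk, pass Path("acme-10k.pdf").read_bytes() as data.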

<tt>POST /query</tt>

Purpose:

  • run one query against one or more compiled documents

Request fields:

  • query
  • doc_ids
  • memory_tokens (optional)
  • max_new_tokens (optional)

Example:

curl -s http://127.0.0.1:8091/query \
  -H 'Content-Type: application/json' \
  -d '{"query":"Summarize the company in 3 bullets.","doc_ids":["doc_6f3a59b3f8"]}'
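
For callers that prefer Python to curl, the request can be assembled with the standard library alone. This is a sketch under stated assumptions: the base URL is the one from the curl example, and query_payload and build_post_request are hypothetical helper names. Nothing is sent until the returned request object is passed to urllib.request.urlopen.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8091"  # address taken from the curl example


def query_payload(query, doc_ids, memory_tokens=None, max_new_tokens=None):
    """Build the JSON body for POST /query; optional fields are sent only when set."""
    payload = {"query": query, "doc_ids": list(doc_ids)}
    if memory_tokens is not None:
        payload["memory_tokens"] = memory_tokens
    if max_new_tokens is not None:
        payload["max_new_tokens"] = max_new_tokens
    return payload


def build_post_request(path, payload, base_url=BASE_URL):
    """Wrap a payload in a JSON POST request object without sending it."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```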

<tt>POST /batch</tt>

Purpose:

  • execute one synchronous batch request against a set of documents and prompts

Use this endpoint when you want a single request to evaluate several prompts against one or more compiled documents through the supported batch contract. It is the customer-facing multi-prompt endpoint: JSON in, JSON out.

Typical flow:

  1. Compile one or more documents first with POST /compile_text or POST /compile_file.
  2. Save the compiled doc_id values returned by those compile calls.
  3. Put those doc_id values plus a series of prompts into a JSON file.
  4. POST that JSON file to /batch.
  5. Receive one JSON response containing the answers plus optional telemetry and economics blocks.
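
The flow above can be sketched in Python as follows. This is an illustration, not a supported client: post is a hypothetical callable that sends a JSON body to a path and returns the parsed JSON reply, and the sketch assumes each compile response exposes doc_id as a top-level key. The generated query_id format is also an arbitrary choice.

```python
def run_batch_flow(post, texts, prompts, session_id="demo-001"):
    """Compile each text, then send every prompt in one /batch request.

    post:    callable (path, payload) -> dict, supplied by the caller.
    texts:   mapping of document name -> already-extracted text.
    prompts: list of prompt strings.
    """
    doc_ids = []
    for name, text in texts.items():
        compiled = post("/compile_text", {"name": name, "text": text, "source_type": "text"})
        doc_ids.append(compiled["doc_id"])  # assumption: doc_id is a top-level key

    request = {
        "session_id": session_id,
        "schema_version": "1.0",
        "documents": [{"doc_id": doc_id} for doc_id in doc_ids],
        "queries": [
            {"query_id": "q%03d" % i, "prompt": prompt, "doc_scope": "all"}
            for i, prompt in enumerate(prompts, start=1)
        ],
        "options": {"temperature": 0.0, "max_tokens": 256},
    }
    return post("/batch", request)
```

Passing post in as a parameter keeps the flow testable with a stub and leaves the choice of HTTP library to the caller.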

Key request fields:

  • session_id
  • schema_version
  • documents
  • queries
  • options

Per-document fields:

  • doc_id
  • filename (optional)
  • description (optional)

Per-query fields:

  • query_id
  • prompt
  • doc_scope
  • note (optional)

Execution options:

  • include_telemetry
  • include_economics
  • temperature
  • max_tokens

Notes:

  • doc_scope can be a list of doc_id values or the string "all".
  • temperature must remain 0.0 for the supported deterministic product path.
  • The response is ordinary JSON, so it is easy to save directly to disk for downstream scripts.
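
These rules can be checked on the client before a request is sent. The sketch below is a hypothetical pre-flight validator, not part of the product. Two of its checks go beyond what this reference states and are labeled as assumptions in the code: that every doc_id in a doc_scope list should also appear under documents, and that an omitted temperature is acceptable.

```python
def validate_batch_request(request):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    known_ids = {doc.get("doc_id") for doc in request.get("documents", [])}

    for query in request.get("queries", []):
        label = query.get("query_id", "<no query_id>")
        scope = query.get("doc_scope")
        if scope == "all":
            continue
        if not isinstance(scope, list):
            problems.append(label + ': doc_scope must be a list of doc_id values or "all"')
            continue
        # Assumption: scoped doc_id values should be declared under "documents".
        unknown = [doc_id for doc_id in scope if doc_id not in known_ids]
        if unknown:
            problems.append(label + ": doc_scope names undeclared doc_id values " + repr(unknown))

    options = request.get("options", {})
    # Assumption: a missing temperature is fine; only an explicit non-zero value is flagged.
    if "temperature" in options and options["temperature"] != 0.0:
        problems.append("options.temperature must be 0.0 on the deterministic path")
    return problems
```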

Smallest useful request:

{
  "session_id": "demo-001",
  "schema_version": "1.0",
  "documents": [
    { "doc_id": "doc-10k" }
  ],
  "queries": [
    {
      "query_id": "q001",
      "prompt": "Summarize this company in 2 sentences.",
      "doc_scope": ["doc-10k"]
    }
  ],
  "options": {
    "temperature": 0.0,
    "max_tokens": 256
  }
}

Example batch-request.json:

{
  "session_id": "demo-001",
  "schema_version": "1.0",
  "documents": [
    {
      "doc_id": "doc-10k",
      "filename": "Instructure_10-K.pdf",
      "description": "Primary annual report"
    }
  ],
  "queries": [
    {
      "query_id": "q001",
      "prompt": "What does Instructure do and what markets does it serve?",
      "doc_scope": ["doc-10k"],
      "note": "Opening summary question"
    },
    {
      "query_id": "q002",
      "prompt": "List the main risks in 3 bullets.",
      "doc_scope": "all"
    }
  ],
  "options": {
    "include_telemetry": true,
    "include_economics": true,
    "temperature": 0.0,
    "max_tokens": 512
  }
}

Example:

curl -s http://127.0.0.1:8091/batch \
  -H 'Content-Type: application/json' \
  -d @batch-request.json > batch-response.json

Example response shape:

{
  "session_id": "demo-001",
  "schema_version": "1.0",
  "model": "cdlac_latch_qwen14b_locked_20260317",
  "latch_version": "latch_product_nomount_20260325",
  "generated_at": "2026-03-25T23:45:00Z",
  "session_telemetry": {
    "documents_compiled": 1,
    "total_compile_time_s": 1.24,
    "total_queries": 2,
    "total_wall_clock_s": 2.91,
    "total_tokens_generated": 58
  },
  "economics": {
    "hardware": "NVIDIA A100 80GB PCIe",
    "latch_session_cost_usd": 0.001786,
    "baseline_estimated_session_cost_usd": 0.004911,
    "savings_pct": 63.64,
    "speedup_factor": 2.75
  },
  "queries": [
    {
      "query_id": "q001",
      "prompt": "What does Instructure do and what markets does it serve?",
      "doc_scope": ["doc-10k"],
      "note": "Opening summary question",
      "response": "Instructure provides cloud-based learning software for education and enterprise training.",
      "telemetry": {
        "ttft_s": 0.23,
        "e2e_s": 1.67,
        "tokens_generated": 23,
        "decode_tps": 147.8,
        "decode_mode": "single"
      },
      "baseline_estimate": {
        "estimated_ttft_s": 1.42,
        "estimated_e2e_s": 2.08,
        "ttft_speedup": 6.17,
        "e2e_speedup": 1.25
      }
    }
  ],
  "latch_signature": {
    "license_id": "sha256:1234567890abcdef",
    "model_tuple": "LATCH-Qwen2.5-14B-r1-5k-step3000",
    "image_tag": "latch_product_nomount_20260325"
  }
}

Important response fields:

  • session_id
  • schema_version
  • model
  • latch_version
  • generated_at
  • session_telemetry
  • economics (optional)
  • queries
  • latch_signature
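
Downstream scripts usually need only the answers and a few timing numbers from a saved response. The following Python sketch shows one way to pull them out, based on the example response shape above. The function names are hypothetical, and per-query telemetry is treated as possibly absent since telemetry is requested through include_telemetry.

```python
def answers_by_query_id(response):
    """Map each query_id to its response text."""
    return {q["query_id"]: q.get("response", "") for q in response.get("queries", [])}


def latency_rows(response):
    """Return (query_id, ttft_s, e2e_s) tuples; timings are None when telemetry is missing."""
    rows = []
    for q in response.get("queries", []):
        telemetry = q.get("telemetry") or {}
        rows.append((q["query_id"], telemetry.get("ttft_s"), telemetry.get("e2e_s")))
    return rows
```

A saved batch-response.json can be loaded with json.load and passed to either function.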

<tt>POST /documents/clear</tt>

Purpose:

  • clear the active compiled document set from the runtime

Example:

curl -s -X POST http://127.0.0.1:8091/documents/clear | jq