Questions
Technical and commercial basics in one place.
Relevant product details also appear in the benchmarks, portable format, and licensing sections of the main site.
What is LATCH?
LATCH is a proprietary inference layer that compiles document sets into persistent LLM memory. After a one-time compilation step, every subsequent query runs against the compiled representation without re-reading, re-chunking, or re-embedding source documents. The result is saved as a portable .latch or .latchdoc binary file.
How is LATCH different from RAG?
RAG chunks documents, embeds them, retrieves relevant chunks per query, and injects them into the context window every time. LATCH compiles the entire document set once into a persistent model-level representation. There is no chunking, no retrieval step, and no per-query cost after the initial compile.
This eliminates chunking artifacts and enables full cross-document reasoning. For the longer version, see the full LATCH vs RAG comparison.
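The per-query cost difference can be sketched numerically. The unit costs below are illustrative placeholders, not LATCH pricing: the retrieval approach pays a retrieval-and-injection cost on every query, while a compile-once approach pays a fixed upfront cost and nothing per query afterward.

```python
def cumulative_cost(upfront: float, per_query: float, queries: int) -> float:
    """Total cost after a number of queries: one-time cost plus per-query cost."""
    return upfront + per_query * queries

# Hypothetical unit costs, chosen only to show the crossover shape.
rag_total = cumulative_cost(upfront=0.0, per_query=1.0, queries=25)       # grows linearly
compiled_total = cumulative_cost(upfront=5.0, per_query=0.0, queries=25)  # flat after compile
```

Past the crossover point, every additional query widens the gap in favor of the compiled representation.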
How is LATCH different from KV cache?
Standard KV caches are session-bound. They are evicted when the session ends and cannot be persisted to disk or shared. LATCH produces a persistent binary file that can be saved, transferred, and reloaded in 1.6ms.
LATCH also reduces VRAM usage by 50%, a saving that standard KV caching does not provide.
What models does LATCH support?
LATCH currently supports four model families: Qwen (2.5 14B benchmarked), Mistral, Llama, and DeepSeek.
What hardware do I need?
LATCH requires an NVIDIA GPU with 80GB VRAM. The H100 and A100 are the recommended and benchmarked GPUs. It runs as a Docker container on Linux.
The deployment path is documented in the Ubuntu GPU quickstart and the RunPod guide.
How much does LATCH cost?
The evaluation/personal license is $79 one-time, covering up to 3 activations for one user. Commercial deployment and enterprise/OEM licenses are available by contacting sales.
Is LATCH a hosted service?
No. LATCH is self-hosted by default. You run the Docker container on your own infrastructure and your documents never leave your environment. A managed hosted option is planned for the future.
What document formats does LATCH accept?
PDF, DOCX, XLSX, PPTX, TXT, MD, HTML, CSV, JSON, and XML. The current API surface is documented in the customer API reference.
What is a .latch file?
A .latch file is a portable binary containing only the compiled model-level memory with no source text. It can be shared without exposing the original documents and reloaded in 1.6ms.
What is a .latchdoc file?
A .latchdoc file includes everything in a .latch file plus embedded raw text, enabling full-text search and automatic quality fallback for edge-case queries. It is the recommended default format.
The format overview on the main site lives in the portable document memory section.
What are the benchmarked performance numbers?
On NVIDIA H100 80GB with vLLM: 0.11s time-to-first-token (vs 23.1s baseline), 210× faster cold start, 1.6ms cache reload, 91.7% multi-document pass rate, 97% cost reduction after 25 queries, 50% less VRAM, and 5.2× end-to-end speedup.
See the main-site benchmarks section for the compact summary.
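The headline 210× cold-start figure follows directly from the two time-to-first-token numbers above:

```python
baseline_ttft = 23.1  # seconds, baseline cold start (benchmarked)
latch_ttft = 0.11     # seconds, LATCH time-to-first-token (benchmarked)

# Ratio of the two benchmarked figures gives the quoted cold-start speedup.
speedup = baseline_ttft / latch_ttft
print(round(speedup))  # 210
```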
Is the API compatible with OpenAI's format?
Yes. LATCH exposes an OpenAI-format compatible REST API, so existing tooling and integrations work with minimal changes.
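A request body in the OpenAI chat-completions format looks like the sketch below. The model name and the self-hosted endpoint in the comment are assumptions for illustration; consult the customer API reference for the actual base URL and supported model identifiers.

```python
import json

payload = {
    "model": "qwen2.5-14b",  # hypothetical model identifier
    "messages": [
        {"role": "user", "content": "Summarize the key terms across these contracts."}
    ],
}

# POST this JSON body to your self-hosted endpoint, e.g. (assumed URL)
# http://localhost:8000/v1/chat/completions
body = json.dumps(payload)
```

Because the request shape matches OpenAI's, existing SDKs and tooling typically only need the base URL pointed at your LATCH container.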