Questions
Technical and commercial basics in one place.
Relevant product details also appear in the benchmarks, portable format, and licensing sections of the main site.
What is LATCH?
LATCH is a proprietary inference layer that compiles document sets into persistent LLM memory. After a one-time compilation step, every subsequent query runs against the compiled representation without re-reading, re-chunking, or re-embedding source documents. The result is saved as a portable .latch or .latchdoc binary file.
How is LATCH different from RAG?
RAG chunks documents, embeds them, retrieves relevant chunks per query, and injects them into the context window every time. LATCH compiles the entire document set once into a persistent model-level representation. There is no chunking, no retrieval step, and no per-query cost after the initial compile.
This eliminates chunking artifacts and enables full cross-document reasoning. For the longer version, see the full LATCH vs RAG comparison.
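The per-query cost difference can be sketched numerically. The unit costs below are illustrative placeholders, not LATCH pricing: the retrieval approach pays a retrieval-and-injection cost on every query, while a compile-once approach pays a fixed upfront cost and nothing per query afterward.

```python
def cumulative_cost(upfront: float, per_query: float, queries: int) -> float:
    """Total cost after a number of queries: one-time cost plus per-query cost."""
    return upfront + per_query * queries

# Hypothetical unit costs, chosen only to show the crossover shape.
rag_total = cumulative_cost(upfront=0.0, per_query=1.0, queries=25)       # grows linearly
compiled_total = cumulative_cost(upfront=5.0, per_query=0.0, queries=25)  # flat after compile
```

Past the crossover point, every additional query widens the gap in favor of the compiled representation.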
How is LATCH different from KV cache?
Standard KV caches are session-bound. They are evicted when the session ends and cannot be persisted to disk or shared. LATCH produces a persistent binary file that can be saved, transferred, and reloaded in 1.6ms.
LATCH also reduces VRAM usage by 50%, a saving that standard KV caching does not provide.
What models does LATCH support?
LATCH currently supports four model families: Qwen (2.5 14B benchmarked), Mistral, Llama, and DeepSeek.
What hardware do I need?
LATCH requires an NVIDIA GPU with 80GB VRAM. The H100 and A100 are the recommended and benchmarked GPUs. It runs as a Docker container on Linux.
The deployment path is documented in the Ubuntu GPU quickstart and the RunPod guide.
How much does LATCH cost?
The evaluation/personal license is $79 one-time, covering up to 3 activations for one user. Commercial deployment and enterprise/OEM licenses are available by contacting sales.
Is LATCH a hosted service?
No. LATCH is self-hosted by default. You run the Docker container on your own infrastructure and your documents never leave your environment. A managed hosted option is planned for the future.
What document formats does LATCH accept?
PDF, DOCX, XLSX, PPTX, TXT, MD, HTML, CSV, JSON, and XML. The current API surface is documented in the customer API reference.
What is a .latch file?
A .latch file is a portable binary containing only the compiled model-level memory with no source text. It can be shared without exposing the original documents and reloaded in 1.6ms.
What is a .latchdoc file?
A .latchdoc file includes everything in a .latch file plus embedded raw text, enabling full-text search and automatic quality fallback for edge-case queries. It is the recommended default format.
The format overview on the main site lives in the portable document memory section.
What are the benchmarked performance numbers?
On NVIDIA H100 80GB with vLLM: 0.11s time-to-first-token (vs 23.1s baseline), 210× faster cold start, 1.6ms cache reload, 91.7% multi-document pass rate, 97% cost reduction after 25 queries, 50% less VRAM, and 5.2× end-to-end speedup.
See the main-site benchmarks section for the compact summary.
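The headline 210× cold-start figure follows directly from the two time-to-first-token numbers above:

```python
baseline_ttft = 23.1  # seconds, baseline cold start (benchmarked)
latch_ttft = 0.11     # seconds, LATCH time-to-first-token (benchmarked)

# Ratio of the two benchmarked figures gives the quoted cold-start speedup.
speedup = baseline_ttft / latch_ttft
print(round(speedup))  # 210
```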
Is the API compatible with OpenAI's format?
Yes. LATCH exposes an OpenAI-format compatible REST API, so existing tooling and integrations work with minimal changes.
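A request body in the OpenAI chat-completions format looks like the sketch below. The model name and the self-hosted endpoint in the comment are assumptions for illustration; consult the customer API reference for the actual base URL and supported model identifiers.

```python
import json

payload = {
    "model": "qwen2.5-14b",  # hypothetical model identifier
    "messages": [
        {"role": "user", "content": "Summarize the key terms across these contracts."}
    ],
}

# POST this JSON body to your self-hosted endpoint, e.g. (assumed URL)
# http://localhost:8000/v1/chat/completions
body = json.dumps(payload)
```

Because the request shape matches OpenAI's, existing SDKs and tooling typically only need the base URL pointed at your LATCH container.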