Context windows are expensive, and RAG breaks the reasoning path.
Every major LLM, whether hosted or open-weight, handles documents the same basic way: the text goes into the context window, the model reads it, and then it answers. Ask another question and the system reads the corpus again from scratch, with the same latency and nearly the same cost profile as the first query.
RAG was supposed to fix that. It helps on token cost by chunking documents into a vector database and retrieving only the nearest pieces, but it introduces a different failure mode: cross-document reasoning silently degrades when the answer depends on connections across sections and files that never get retrieved together.
Your 200th question should not cost the same as your 1st. LATCH is built around that premise.
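The amortization premise can be made concrete with a toy cost model. All numbers below are illustrative assumptions, not LATCH benchmarks: a hypothetical per-token price, a compile cost assumed to equal a few full reads of the corpus, and a per-query cost assumed to be a small fraction of a re-read.

```python
# Toy cost comparison: re-reading a 100k-token corpus on every query
# versus compiling once and paying a small per-query cost afterwards.
# Every constant here is an illustrative assumption.

CORPUS_TOKENS = 100_000
PRICE_PER_1K_TOKENS = 0.003                               # hypothetical input price ($)
REREAD_COST = CORPUS_TOKENS / 1000 * PRICE_PER_1K_TOKENS  # cost of one full re-read
COMPILE_COST = 5 * REREAD_COST    # assume compiling costs ~5 full reads
QUERY_COST = 0.01 * REREAD_COST   # assume a compiled query costs ~1% of a re-read

def cumulative_reread(n):
    # Naive approach: every query re-reads the whole corpus.
    return n * REREAD_COST

def cumulative_compiled(n):
    # Compile once, then pay only the small per-query cost.
    return COMPILE_COST + n * QUERY_COST

# Break-even query count: compile_cost + n*q < n*r  =>  n > compile_cost / (r - q)
break_even = COMPILE_COST / (REREAD_COST - QUERY_COST)
```

Under these made-up numbers the compiled path breaks even after roughly five queries, and by the 200th query the cumulative cost is a small fraction of the re-read baseline. The real figures depend entirely on the actual compile and query costs.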
Compile the corpus once, then query the compiled memory.
LATCH compiles a document set into a persistent neural representation. It is not an embedding, not a summary, and not a vector index. The goal is to preserve the semantic content of the corpus and the relationships across it in a reusable latent memory tensor.
After compilation, queries run against that compiled memory instead of forcing the model to re-read the source tokens. The compile cost is paid once and then amortized over every future query.
The result is saved as a portable file that can move between machines. One person can compile the document set, send the output to a colleague, and that colleague can reload and query it without recompiling the original corpus.
The output comes in two flavors. One contains only the compiled memory: no source text is included, and the file is intended to be non-reversible back into the original documents. The other contains the compiled memory plus extracted source text, enabling full-text search and automatic quality fallback for edge-case questions.
Both formats are encrypted, portable, and designed to reload in approximately 1.6 milliseconds.
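The compile-once, share, reload-anywhere workflow can be sketched in miniature. Everything below is a stand-in, not the LATCH API: the class and function names are hypothetical, the "compiled memory" is a trivial bag-of-words dictionary standing in for a latent tensor, and plain pickling stands in for the encrypted file format.

```python
# Minimal sketch of the compile -> save -> share -> reload -> query pattern.
# All names (CompiledMemory, compile_corpus, ...) are illustrative, not LATCH's.
import pickle
from collections import Counter

class CompiledMemory:
    def __init__(self, vector):
        self.vector = vector  # derived representation only; no source text kept

    def save(self, path):
        # Stand-in for writing the portable file (the real format is encrypted).
        with open(path, "wb") as f:
            pickle.dump(self.vector, f)

    @classmethod
    def load(cls, path):
        # A colleague reloads the file without recompiling the corpus.
        with open(path, "rb") as f:
            return cls(pickle.load(f))

    def query(self, question):
        # Toy scoring against the compiled vector; a real system would run
        # the model against the latent memory instead of the source tokens.
        terms = question.lower().split()
        return sum(self.vector.get(t, 0) for t in terms)

def compile_corpus(docs):
    # The read cost is paid exactly once, here.
    counts = Counter(w for d in docs for w in d.lower().split())
    return CompiledMemory(dict(counts))

mem = compile_corpus(["alpha beta", "beta gamma"])
mem.save("corpus.mem")
reloaded = CompiledMemory.load("corpus.mem")
score = reloaded.query("beta")
```

The structural point is that `reloaded` answers queries from the saved artifact alone: the original documents never travel with the file.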
Measured on enterprise documents, not toy prompts.
On the flagship Qwen 2.5 14B configuration, benchmarked on an NVIDIA H100 80GB, the current measured profile looks like this:
Start self-hosted, then make the format portable enough to travel anywhere.
Today, creating a .latchdoc requires a self-hosted LATCH runtime on an H100 or A100 GPU. That is the right starting point for enterprise customers who need private, on-infrastructure document intelligence.
The longer-term direction is broader: a cloud compilation service that can turn uploaded documents into a .latchdoc without the user owning a GPU, and a lightweight local reader capable of opening and querying those files on consumer hardware.
The end state is straightforward: create in the cloud, read anywhere, and treat .latchdoc as a portable format for AI-compiled document intelligence.
One founder, one codebase, one thesis carried through.
CoDynamics Lab Corporation is a Delaware C-Corp based in Gilbert, Arizona. The company is currently a solo-founder operation, and the product has been built by the founder directly. The founder's background includes more than 20 years in systems optimization and 13 US patents across multiple technical domains.
LATCH is already available in self-hosted form.
- Self-hosted evaluation license: $79 on Gumroad
- Technical documentation: codynamicslab.com/documentation
- Commercial and enterprise licensing: mike@codynamicslab.com