# The Blackstone Graph

> Source: <https://isaacus.com/blog/announcing-the-blackstone-graph>
> Published: 2026-06-04 10:33:13+00:00

For most of my life, I’ve been obsessed with two things: law and data. That obsession is what drove me to study law, train my own legal AI models, lead machine learning at a justice agency, and, most recently, start my own foundational legal AI research lab.

What I’ve found time and time again is that law and data don’t often play well together. Leaving aside data availability, accessibility, and quality issues (and there are many!), at a fundamental level, law is less a world unto itself and more a process that operates on the world. To truly model law in digital form, you need to model the world itself.

Today, I’m announcing the culmination of my life’s work to make law and data get along: the Blackstone Graph. To get your hands on it when it enters closed beta in a couple of months, sign up to the [Isaacus Beta Program](https://isaacus.com/beta).

## A world model for law

The Blackstone Graph is a living network of interrelated laws, regulations, cases, people, companies, locations, and other legal and real-world entities, covering every major civil and common law jurisdiction, represented through a new standardized, information-rich legal world model.

Blackstone has no analogs. Unlike previous attempts at modeling law, Blackstone considers the fundamental atoms of the world of law to be people, places, and instruments, *not* legal norms. Under the assumption that law is a process that operates on the world, we have left the inherently normative process of interpreting legal norms to the users of Blackstone: lawyers and agents.

Unlike contemporary legal data services, Blackstone will be international, agent-facing, and open from day one. Blackstone’s schema, in particular, will support common, civil, mixed, and even tribal legal systems regardless of their language and jurisdictional peculiarities. At the same time, fundamental, recurring legal concepts have been rigorously standardized to ensure querying and retrieval are ergonomic for humans and agents alike. In the interests of propelling the field of legal research forward, Blackstone’s schema will be made freely available under a commercially permissive open-source license.

Finally, unlike other legal standards, Blackstone and its schema will be interoperable with popular legal ontologies and formats, including Akoma Ntoso (AKN), the Federated Open Legal Information Ontology (FOLIO), and the Legal Matter Standard Specification (LMSS).

## Legal data is in disarray

In 2026, the year of agents, the state of public legal data remains in disarray.

To build Blackstone, we’ve had to index over 400 data sources (and counting) across the US, UK, Canada, Australia, New Zealand, Ireland, Singapore, and the EU. In the process, we’ve found some truly egregious examples of missing, incomplete, and inaccessible data. The copy of the Alaska Statutes hosted by the Alaska State Legislature, for example, is two years old and rendered in an interface more befitting of a Bloomberg Terminal. Worse still, the Supreme Court of Victoria does not even host its own judgments, instead directing members of the public to visit AustLII, a private data provider.

*An actual page from the Alaska State Legislature’s reproduction of the Alaska Statutes.*

Given that one must know the law to be able to comply with it, the general inaccessibility of most legal data is not just an annoyance; rather, it risks corrosion of the rule of law.

Today, there are a handful of enterprises holding an oligopoly on high-quality legal data—higher quality even than the data held by the parliaments and courts that actually issue laws and decisions. Lexis, for example, has a more up-to-date copy of the Colorado Revised Statutes than the one hosted by the Colorado General Assembly.

Members of the public are therefore pushed to adopt commercial solutions that lock them into subpar interfaces. Startups, in turn, end up totally unable to compete on quality and coverage.

Having both built my own legal AI models and spent countless hours at law school scouring through Lexis and Westlaw to find relevant precedent, I know that legal knowledge could be represented and delivered so much better than how it is today. A lack of competition has led to stagnation and complacency.

## Reimagining the delivery of legal data

For three decades, the interface to legal data services remained the same: a search box. Now, however, legal data is more often being delivered via agents running their own queries, analyzing results, and distilling their findings back to users. That process is similar to how a lawyer might conduct research, yet the interface is fundamentally different. Instead of querying a search box, agents prefer to make API requests, speak to MCP servers, and grep file systems.

To get access to the same information lawyers have, agents are being forced to spin up browser instances where they interact with websites like a human would. That approach sets the ceiling on performance far lower than it could be.

For an LLM, accessing legal data through human-facing interfaces is like drinking through a straw instead of from a jug. With direct access to a rich, ergonomic, up-to-date legal knowledge graph, an LLM could query and reason about complex legal relationships in ways that would otherwise be impossible.

Those holding an oligopoly on high-quality legal data are aware of this but are concerned about opening the floodgates to anyone being able to pull and manipulate data outside of their walled gardens.

That presents an opportunity for us at Isaacus. We are not and will never be mere data brokers. We build state-of-the-art data and AI services. We deliver value on top of data by enriching and manipulating it in ways that would otherwise be impossible to achieve without owning your own self-improving legal AI stack.

We therefore plan on making Blackstone as accessible and interface-agnostic as possible.

At release, the Blackstone Graph will support direct API access and MCP. In the months thereafter, we will release the Isaacus Research platform, enabling end users to experience the end-to-end Isaacus stack for the first time.

Additionally, we’re currently in talks with select enterprises about offering private deployments of the entire Blackstone Graph system to ingest, enrich, and serve their own data.

The ontology, standards, and formats powering the Blackstone Graph will also all be made available under an open-source license. Everyone, including our competitors, will be free to build their own applications around our knowledge model.

This interface-agnostic, open-source approach to the delivery of legal data is, to our knowledge, the first of its kind.

## How we’re pulling it off

A project of the scale we’ve envisioned for Blackstone demands more depth, precision, and forethought than any other project I’ve worked on.

Although most of the architectural infrastructure we’re building now will not be needed by what we deliver at Blackstone’s launch, it will absolutely be required at scale.

Retrofitting support for common law jurisdictions in a system fundamentally designed around EU civil law is not something that can be done without compromising ergonomics, stability, or quality. Most legal data providers end up either focusing on the same segment they’ve always served or acquiring a competitor in another jurisdiction and offering separate jurisdiction-specific interfaces.

That’s why, from day one, we’ve considered how all the various peculiarities of different jurisdictions’ conceptions of law, norms, authority, competence, and bindingness can be standardized in a way that minimizes loss of valuable distinctions, maximizes ergonomics, and is robust to the introduction of new, exotic data.

We’ve achieved this by building our own schema for representing information for legal research, known as the Blackstone Graph Schema (BGS). BGS currently consists of over 50 entity types, 40 taxonomies, and 150 relationships. It offers first-class support for distinguishing and linking courts, the authorities they wield over jurisdictions, the jurisdictional forums through which they exercise such authority, the proceedings they adjudicate, decisions in those proceedings, and opinions expressed in decisions. It also tracks the identity and evolution of individual sections and subsections of laws across amendments and translations. Almost anything you can conceive of as being genuinely valuable for legal research has either already been or is soon to be baked into BGS.
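To make the idea of typed entities and relationships more concrete, here is a minimal sketch in Python. It is not the actual BGS: the class names, the `kind` labels, and relationship names such as `adjudicates`, `results_in`, and `contains` are all hypothetical, chosen only to mirror the chain of courts, proceedings, decisions, and opinions described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str
    kind: str  # hypothetical type label, e.g. "court" or "decision"
    name: str


@dataclass(frozen=True)
class Relationship:
    source_id: str
    kind: str  # hypothetical relationship label, e.g. "adjudicates"
    target_id: str


class Graph:
    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}
        self.relationships: list[Relationship] = []

    def add(self, entity_id: str, kind: str, name: str) -> Entity:
        entity = Entity(entity_id, kind, name)
        self.entities[entity_id] = entity
        return entity

    def link(self, source: Entity, kind: str, target: Entity) -> None:
        self.relationships.append(Relationship(source.entity_id, kind, target.entity_id))

    def follow(self, start: Entity, *path: str) -> list[Entity]:
        """Walk a chain of relationship kinds outward from one entity."""
        frontier = [start]
        for rel_kind in path:
            frontier = [
                self.entities[rel.target_id]
                for entity in frontier
                for rel in self.relationships
                if rel.source_id == entity.entity_id and rel.kind == rel_kind
            ]
        return frontier


# Placeholder data only; none of these records are real.
graph = Graph()
court = graph.add("court-1", "court", "Example Supreme Court")
jurisdiction = graph.add("jurisdiction-1", "jurisdiction", "Exampleland")
proceeding = graph.add("proceeding-1", "proceeding", "Example v Example")
decision = graph.add("decision-1", "decision", "Example v Example (judgment)")
majority = graph.add("opinion-1", "opinion", "Majority opinion")
dissent = graph.add("opinion-2", "opinion", "Dissenting opinion")

graph.link(court, "has_authority_over", jurisdiction)
graph.link(court, "adjudicates", proceeding)
graph.link(proceeding, "results_in", decision)
graph.link(decision, "contains", majority)
graph.link(decision, "contains", dissent)

# Every opinion reachable from the court via proceeding and decision.
opinions = graph.follow(court, "adjudicates", "results_in", "contains")
print([opinion.name for opinion in opinions])
```

Keeping courts, proceedings, decisions, and opinions as separate linked entities, rather than flattening them into a single "case" record, is what lets a query like the last one stay short.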

BGS is just one of several architectural innovations we’ve had to develop to realize the Blackstone Graph.

For example, given that, at scale, Blackstone will need to ingest millions of records from thousands of sources, each served, formatted, and maintained differently, we decided to roll our own ingestion framework, including a bespoke HTTP client, XML parser, task scheduler, and more. This decision was made not because of an immediate need nor because we enjoy over-engineering, but because Blackstone’s full potential can never be reached without solid infrastructure to power it.

Now, instead of writing thousands of brittle, bespoke, complex one-off scripts, the framework we’ve built will allow us to write simple specifications that glue a curated selection of tools together in a modular fashion. Over time, we can and will improve those tools without ever needing to touch the pipelines that rely on them.
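As a rough illustration of what a specification that glues tools together might look like, here is a small Python sketch. It is an assumption about the general pattern, not our actual framework: the `TOOLS` registry, the `tool` decorator, the spec format, and the two text-cleaning tools are all hypothetical.

```python
from typing import Callable

# Hypothetical registry mapping tool names to reusable text transformations.
TOOLS: dict[str, Callable[[str], str]] = {}


def tool(name: str):
    """Register a function under a name that specifications can refer to."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("normalize_quotes")
def normalize_quotes(text: str) -> str:
    # Replace curly quotes with their plain ASCII equivalents.
    return (
        text.replace("\u2018", "'")
        .replace("\u2019", "'")
        .replace("\u201c", '"')
        .replace("\u201d", '"')
    )


@tool("collapse_whitespace")
def collapse_whitespace(text: str) -> str:
    # Collapse runs of spaces and newlines into single spaces.
    return " ".join(text.split())


# A pipeline is plain data: it names a source and the tools to apply, in order.
SPEC = {
    "source": "example-statutes",  # hypothetical source identifier
    "steps": ["normalize_quotes", "collapse_whitespace"],
}


def run_pipeline(spec: dict, raw: str) -> str:
    """Apply each named tool from the spec to the raw text in sequence."""
    result = raw
    for step in spec["steps"]:
        result = TOOLS[step](result)
    return result


print(run_pipeline(SPEC, "  Section 1\n\n  The \u201cAct\u201d  applies. "))
```

Because the spec refers to tools only by name, a tool's implementation can be improved later without editing any pipeline that uses it.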

Our focus on centralizing enrichment further enables us to track the provenance of data at an unprecedented level of granularity. For any given field of any given BGS entity, we will be able to know the exact artifacts it was sourced from and how those artifacts were processed by our framework, including the precise AI models and libraries used to transform them. Over time, we will leverage that information to identify, correct, and improve enrichments produced by outdated approaches.
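Field-level provenance of this kind could be sketched as follows. This is an illustrative assumption rather than our real data model: `Provenance`, `TrackedEntity`, `stale_fields`, and the processor names and versions are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Provenance:
    artifact_ids: tuple[str, ...]  # source artifacts the value was derived from
    processor: str                 # hypothetical model or library name
    processor_version: str         # dotted version string, e.g. "1.0.0"


@dataclass
class TrackedEntity:
    entity_id: str
    values: dict[str, str] = field(default_factory=dict)
    provenance: dict[str, Provenance] = field(default_factory=dict)

    def set_field(self, name: str, value: str, source: Provenance) -> None:
        # Store the value and, alongside it, a record of how it was produced.
        self.values[name] = value
        self.provenance[name] = source


def parse_version(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def stale_fields(
    entities: list[TrackedEntity], processor: str, min_version: str
) -> list[tuple[str, str]]:
    """Find (entity id, field name) pairs produced by an older processor version."""
    threshold = parse_version(min_version)
    return [
        (entity.entity_id, name)
        for entity in entities
        for name, source in entity.provenance.items()
        if source.processor == processor
        and parse_version(source.processor_version) < threshold
    ]


# Placeholder records only.
decision = TrackedEntity("decision-1")
decision.set_field(
    "title", "Example v Example",
    Provenance(("artifact-1",), "example-segmenter", "1.0.0"),
)
decision.set_field(
    "court", "court-1",
    Provenance(("artifact-1", "artifact-2"), "example-linker", "2.1.0"),
)

# Which fields should be re-enriched now that the segmenter is at 1.2.0?
print(stale_fields([decision], "example-segmenter", "1.2.0"))
```

Recording the processor and version per field, rather than per record, is what makes it possible to re-run only the enrichments that an outdated approach produced.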

In addition to BGS and our ingestion framework, one of the most important innovations powering the Blackstone Graph is our Kanon family of legal encoder models. From extracting, linking, and disambiguating entities to identifying the full hierarchy of sections and subsections within documents, much of what we need to enrich our data into BGS can already be performed by one of our existing Kanon models. For the remainder, we’re internally finetuning our own task-specific models alongside a handful of new advanced natural language processing algorithms.

Because our Kanon models are all non-generative encoder models, we have already achieved state-of-the-art performance (see, eg, the reproducible, open-source [Massive Legal Embedding Benchmark](https://isaacus.com/mleb)) with significantly less latency than typical generative-AI-based approaches at a fraction of the cost. The fact that they are not generative models further makes them architecturally incapable of producing generative hallucinations, allowing us to guarantee the integrity of our data in ways that others simply cannot.

## Recursive improvement

Because we are, at our core, a foundational AI research company and not (just) a data broker or application builder, we have the unique and rare privilege of being able to train our own models to solve the problems of our industry.

Soon after the initial release of the Blackstone Graph, we will be releasing a successor to every task-specific Kanon model we’ve ever built: Kanon 3. Kanon 3 will be able to vectorize, classify, segment, and enrich data all at once. It will support custom schemas, multiple languages, and even span- and entity-level embeddings for latent disambiguation. Just like every Kanon model before it, Kanon 3 will achieve state-of-the-art performance with an architectural guarantee of zero generative hallucinations.

The Blackstone Graph will be instrumental to achieving that. Every single annotation and link in the Blackstone Graph will be fed into training Kanon 3. Because the Blackstone Graph is itself the product of all of our previous Kanon models, training Kanon 3 on it will, for the first time for any legal AI company, establish a recursive improvement loop.

Through Kanon 3, we plan on re-enriching the entire Blackstone Graph, dramatically elevating quality and depth to new heights. After that, we will train our first generative model, Kadi, on the Blackstone Graph. New Isaacus models will continue to feed into and be fed the Blackstone Graph.

*Kanon transforms unstructured legal data into the Blackstone Graph Schema (BGS).*

On a personal level, I have never been more excited in my professional life to see a project come to fruition. I’m confident there will be a step change in the performance and capabilities of our models through our recursive improvement loop (we already had a taste of that when we used Kanon 2 Embedder to train Kanon 2 Reranker) but have no idea what it will look like. To our knowledge, there is no precedent in the legal AI space. We are, in essence, operating at and pushing the frontier of what’s possible.

## Join us at the frontier

In the coming months, we expect to release the Blackstone Graph through the [Isaacus Beta Program](https://isaacus.com/beta). The beta program will allow trusted design partners to play with the Blackstone Graph in a raw form ahead of its wider public release. Program members will share their feedback with us and participate in joint marketing.

Members of the Isaacus Beta Program include Harvey, KPMG Law, Alvarez & Marsal, Clifford Chance, Clyde & Co, Carey Olsen, Smokeball, Moonlit, LawY, and the NSW Government.

If you’re a legal tech builder or represent an enterprise, government, or startup and would like to join us in pushing the frontier in legal AI, you’re welcome to apply to the [Isaacus Beta Program](https://isaacus.com/beta). There are a limited number of slots available, so we encourage applying sooner rather than later.
