Local by default: why most AI should run on your own hardware

Simon Hoffmann

COO

On the evening of June 12, 2026, one of the most capable AI models in the world went dark for every customer who relied on it. Not because of an outage or because of a bug. The US government issued an export-control directive ordering that the models be cut off from any foreign national, inside or outside the country. Because there is no reliable way to check a user's nationality at the moment an API call comes in, the only way to comply was to switch the models off for everyone, everywhere.

If you were a company that had built a product on top of one of those models, you found out the same way the rest of us did. The capability you depended on was simply gone, with no notice, no migration window, and no say in the matter. The provider disputed the directive and said it was working to restore access. That is a fair position, and the policy fight is its own story. But for anyone building on borrowed infrastructure, the policy fight is beside the point. The lesson is simpler and older than this particular event: when the thing your business runs on lives on someone else's servers, in someone else's jurisdiction, you do not control it. Someone else does.


This was not a one-off

It is tempting to view that day as an "unusual one-off" and move on. We think that is the wrong read. It is a recent illustration of a structural reality that has been building for years.

Europe runs an enormous share of its software, its cloud, and now its AI on infrastructure owned by a handful of US companies. For a long time that felt like a convenience and a bargain. Increasingly it looks like a dependency. The direction of travel is toward more restriction, not less: export controls on advanced compute and models, data-transfer rules that keep getting tested in court, and laws that let a government compel a domestic provider to hand over data even when it sits on servers abroad. None of this requires bad faith from anyone. It just means that a capability sitting on a foreign platform can be re-priced, re-licensed, geofenced, or switched off, for reasons that have nothing to do with you.


Some data should never leave the building

The documents businesses care about most are also the ones that should travel the least: invoices, contracts, medical records, financial statements, customer files, internal reports. These are exactly the inputs people now want to push through AI to pull out structured data. And the default way to do that today is to upload them to a third-party API in another country and hope that the terms of service, the data-handling promises, and the geopolitics all hold.

For sensitive documents, that should not be a dependency at all. The general principle is not new: you do not want a single external party holding something core to how you operate, whether that is your data, your tooling, or your ability to keep running next week. What is new is that, for this specific job, doing it yourself is no longer a research project or a compromise. The open models are good enough. The hardware is reachable. The only thing missing was tooling that made it boringly simple.

That is the gap we built ParseHawk to close.


What ParseHawk is

ParseHawk turns PDFs, scans, images, and text into structured, validated JSON, running entirely on your own hardware. You describe the fields you want with a schema, give it plain-language instructions, and it returns clean JSON that matches. No document ever leaves your machine.
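
To make "JSON that matches a schema" concrete, here is a minimal Python sketch of the idea. It does not use ParseHawk's actual API or schema format: the schema layout, the field names, the `validate` helper, and the sample values are all hypothetical, chosen only to illustrate what checking an extracted record against a declared set of fields could look like.

```python
import json

# Hypothetical schema for illustration: field name -> accepted Python type(s).
# This is NOT ParseHawk's schema format.
INVOICE_SCHEMA = {
    "invoice_number": str,
    "issue_date": str,
    "total": (int, float),
    "currency": str,
}


def validate(record, schema):
    """Return a list of problems; an empty list means the record matches."""
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for field: {field}")
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


# Made-up JSON standing in for what an extraction step might return.
raw = (
    '{"invoice_number": "INV-001", "issue_date": "2026-01-15", '
    '"total": 1250.5, "currency": "EUR"}'
)
record = json.loads(raw)
errors = validate(record, INVOICE_SCHEMA)
print("valid" if not errors else errors)
```

The design choice worth noting is that validation returns every problem rather than stopping at the first, which makes it easier to see at a glance how far a given document's output is from the shape you asked for.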

It runs the model locally so you can do real extraction on a server or even on a MacBook. You drive the same workflow from a web UI, from the command line, or from a REST API, whichever fits the way you already work. It is open source under Apache-2.0, built on an open-weight extraction model.

The point is not just privacy as a feature. The point is that nobody outside your organization can revoke it, meter it, geofence it, or read what runs through it. It is yours.


What it is not, yet

We would rather you hear this from us than discover it later.

ParseHawk runs a small, quantized model on local hardware. On clean, structured documents it does very well. On genuinely messy or dense ones, it will not match the very largest frontier cloud models, and if your use case demands that last few percent of accuracy on hard inputs, you should test it against your own documents. It is also early. We are in developer preview, which means the API can still change between releases.


Who this is for

If you are a team handling documents that legally or ethically cannot go to a cloud API (a clinic, a law firm, an accountant, an insurer, a public body), ParseHawk is built for exactly your constraint. If you are a European company that has quietly grown uncomfortable with how much of your stack depends on infrastructure you cannot govern, this is one concrete place to take a piece of it back. And if you just believe, as we do, that your own documents should run on your own machine, you are in the right place.

We build ParseHawk in Vienna, in the open. Give it a try.