Running AI without sending customer data to OpenAI

EU-sovereign AI field guide for founders who can't send customer data to OpenAI. Mistral, IONOS, Azure EU data zones — real costs, real trade-offs, GDPR-safe.

Most of the public conversation about "sovereign AI" mixes three different things into one slogan: a regulatory question about where data sits, a political question about who controls the infrastructure, and a marketing question about how a vendor wants to differentiate. The result is that founders trying to decide what to actually do end up either over-engineering for sovereignty they do not need, or under-engineering for sovereignty they cannot escape.

This guide is the version we wish we had had when we started running deployments for EU companies. The numbers are real. They come from a recent engagement with a Berlin research firm whose business model makes data leaving the building structurally impossible: a 20–30 person team, 500+ NDA-bound client firms feeding them sensitive compensation data, a hard requirement that nothing leaves their physical control during inference. The cost discovery was uncomfortable. A fully air-gapped local deployment for that team comes in around €714,000 over three years in the recommended configuration. The equivalent capability on IONOS AI Model Hub in Frankfurt (also German jurisdiction, also GDPR-aligned, also auditable) costs €5,000–€15,000 over three years. The sovereignty premium is roughly 50×.

That is the number that should frame the rest of this conversation. Sovereignty is not free. It is rarely binary. And the right answer for most EU small companies sits somewhere in the middle of that spectrum, not at either end.

Garden's working thesis: the right architecture for most EU small companies is a hybrid. Put sensitive sub-tasks on EU-hosted open weights. Put non-sensitive heavy reasoning on the best model regardless of origin. The hard work, the work most vendors will not do for you, is drawing the line between the two.

This guide walks through how to draw that line.

What "EU-Sovereign AI" Actually Means (and What It Doesn't)

EU-sovereign AI means an AI deployment where data, model weights, and inference compute remain inside the legal and physical perimeter of the European Union, governed by EU law alone, with no exposure to extraterritorial subpoena regimes (chiefly the US CLOUD Act).

There are three layers stacked underneath that sentence.

Data sovereignty is about where customer data goes during inference. If a user pastes a confidential document into a prompt, whose servers process it, who can be compelled to produce a copy?

Compute sovereignty is about who owns and operates the physical infrastructure. A French company on AWS Frankfurt is hosting in the EU; it is not running on EU-controlled infrastructure. Amazon is a US company subject to US law, including the CLOUD Act, which can compel US companies to produce data stored anywhere in the world. The data is in Frankfurt; the control is not.

Model sovereignty is about who builds, owns, and updates the model. Using GPT-4.1 through Azure EU data zones gives you data residency. It does not give you model sovereignty. The model can be updated, deprecated, or restricted by OpenAI at any time. Your stack is downstream of a foreign company's commercial decisions.

A genuinely sovereign deployment satisfies all three. A "sovereign-washed" deployment satisfies one and uses the word "sovereign" anyway. Most compliance obligations care about data sovereignty; most strategic risk sits in compute and model sovereignty. If you only think about one, you solve half the problem.

The regulatory backdrop in 2026:

GDPR governs personal data processing. Cross-border transfer to the US requires Standard Contractual Clauses plus a Transfer Impact Assessment, or reliance on the EU-US Data Privacy Framework (Schrems III is pending).
EU AI Act is fully in force from August 2, 2026, with proposed extensions to December 2027 via the November 2025 Digital Omnibus for high-risk system obligations. Most internal-use AI assistants are not Annex III high-risk; the main obligations are transparency (Article 50) and GPAI documentation (only if you fine-tune above 10^23 FLOP of training compute, which essentially nobody does).
CLOUD Act is US law, but it is the reason this conversation exists. It can compel any US company (including the EU subsidiaries of US hyperscalers) to disclose data on demand. Contractual guarantees of EU data residency do not override this.

The one-line summary: if your customer data is subject to NDAs or sector-specific confidentiality (legal, healthcare, finance, defense, certain HR data), routing it through any US-controlled cloud (even one billed as "EU data zone") may be a breach you cannot paper over with terms of service.

When Sovereignty Matters, and When It's Vanity

This is the section vendors will not write for you. Most companies do not need fully sovereign AI. Some companies absolutely do. The interesting work is figuring out which one you are, and which parts of your workload sit on which side of that line.

02.01 When sovereignty actually matters

HR and employee data. GDPR Article 88 lets member states impose additional rules on processing employee data, and Germany has done so. An AI tool that processes performance reviews, salaries, internal communications, or candidate evaluations faces a high bar for non-EU processing.

Healthcare and life sciences. Patient data is special-category data under GDPR Article 9; Germany's BDSG layers additional requirements. Routing identifiable patient records through OpenAI's API is not compliant without an enormous lift, and even then the optics with patients and ethics boards are bad.

Legal services, especially litigation. Lawyer-client privilege does not survive transmission to a third-party AI processor without appropriate contractual structures, and even with those, the disclosure question stays uncomfortable. Several large EU law firms have moved to fully internal LLM deployments specifically to preserve privilege.

Financial services with regulated activities. BaFin-regulated work, MiFID II reporting, AML/KYC, insurance underwriting touching health data: these have layered regulators who do not accept "but the vendor promised" as a control. The German BSI's August 2025 "Design Principles for LLM-based Systems with Zero Trust" and the January 2026 "Evasion-Attacks auf LLMs — Gegenmaßnahmen in der Praxis" are the operational checklists German auditors will reference.

Public sector and contracts with public sector. France's "Cloud de Confiance" labeling (SecNumCloud) sets requirements most US providers cannot meet through their EU subsidiaries. If you sell into Bundesländer, ministries, or local government, sovereign infrastructure is a procurement gate, not a preference.

Companies whose data is the product. If your core asset is proprietary data collected under strict NDAs from your customers, the relevant question is not "does our cloud provider have good privacy controls?" It is "does routing this data through any third-party infrastructure comply with the confidentiality terms of every individual agreement we have signed?" That question is very hard to answer in the affirmative, and the cost of a single counterparty legitimately objecting is high.

Defense, intelligence, dual-use research. Self-explanatory.

02.02 When sovereignty is mostly vanity

Marketing copy. A blog post about your product features is not confidential. There is no compliance argument for refusing to use GPT-5 to draft a LinkedIn post. Use the best model.

Public-facing customer support drafts. Customer queries are typically not special-category data. Standard DPAs from major vendors cover this well enough for most businesses, especially with retention policies and PII scrubbing in place.

Code generation for non-proprietary internal tools. Your tax calculator script does not need a sovereign LLM. The code is your IP, but writing it does not expose it in any way that creates real risk.

Research, learning, internal prototyping. Using ChatGPT to understand a new technical area is not a sovereignty event. And before committing to architecture, you should know what is achievable. Frontier closed models set the quality ceiling on many hard tasks. Sandboxing against them with synthetic or sanitized data is how you measure how much quality you are giving up.

The most common mistake we see: teams choosing sovereignty at the company level rather than the task level. A boutique consulting firm decrees "we use Mistral, never OpenAI" as policy. Now the marketing intern is writing newsletters with a model worse at marketing copy than the alternative, while partners are casually pasting client documents into ChatGPT in defiance of the same policy. Policy at the wrong granularity is worse than no policy.

The Four Architectural Approaches

Once you accept that the question is "which parts of which workloads," you need a vocabulary for the architectural choices available. There are four. Each has a different protection mechanism, a different cost structure, and a different failure mode.

03.01 Approach A: Fully Local (On-Premises, Air-Gapped)

Inference runs on hardware you own, in your facility or a dedicated colocation. Model weights are downloaded once and run entirely inside your perimeter. No data crosses an organizational boundary during operation.

Protection mechanism: Technical. Data physically cannot reach external systems.

Best for: Companies where the data is the product (compensation surveys, M&A advisory, defense research, legal litigation files), or where NDAs run from clients directly to you in ways that do not permit any third-party processing.

What it costs: €80k–€160k CapEx for a 20–30 person team at 70B-class model quality, plus €60k–€85k annual OpEx in colocation, electricity, support, and personnel. Realistic three-year total: €470k–€714k depending on hosting (office vs colo) and configuration.

Honest limits: Procurement lead times are 3–6 months for current-generation GPUs in the EU. You need MLOps capacity (0.5–1 FTE equivalent, ~€125k–€140k loaded for a senior MLE in Berlin, or a consultancy retainer at €36k–€60k/year). And the model quality ceiling sits below frontier closed-source on the hardest tasks. Open 70B-class models materially trail GPT-4 on TableBench, and the gap is wider on multi-step reasoning.

03.02 Approach B: EU-Sovereign Cloud (Managed Hosting Inside EU Jurisdiction)

Inference runs on third-party infrastructure that is physically located in the EU and operated by an EU company subject exclusively to EU law. The hardware is not yours; the legal perimeter is.

Protection mechanism: Contractual plus jurisdictional. Data leaves your physical control but stays within EU legal boundaries, covered by a data processing agreement.

Providers worth knowing in 2026:

IONOS AI Model Hub (Frankfurt, German jurisdiction, BSI C5 attestation). Hosts open-weight models (Mistral, Llama, Qwen). Charges roughly €1–€3 per million tokens depending on model. No training on customer data, contractually.
Mistral La Plateforme (French jurisdiction). Hosts Mistral's own models including Large 2 and Codestral. Pricing around €5 per million tokens for Large 2. GDPR-aligned, EU-resident, EU-controlled.
OVHcloud AI Endpoints (Strasbourg, French jurisdiction). Hosts a curated set of open-weight models on French infrastructure.
Schwarz Digits / STACKIT (German jurisdiction, Lidl Group). EU sovereign cloud with AI services. Growing but less mature for AI workloads specifically.
Aleph Alpha (Heidelberg, German company). Builds its own sovereign LLM family, sells direct enterprise deployments.

Best for: Companies where EU jurisdiction is sufficient (most German SMEs in regulated sectors), where the contractual sovereignty of an EU provider satisfies your compliance posture, and where the cost of full local hosting does not justify itself given your token volumes.

What it costs: For a 20-person team processing ~250M tokens per year (a realistic mid-range estimate), IONOS AI Model Hub costs roughly €500–€1,500 per year at standard pricing. Mistral La Plateforme runs higher, €1,250–€3,750 per year for the same volume on Large 2. Three-year totals stay well under €10k for most small teams.

Honest limits: You depend on the provider's roadmap. If IONOS drops a model you rely on, or Mistral changes its pricing structure, you have a migration project. Performance and model availability are narrower than the hyperscaler portfolios. And for some specific NDA scenarios (where the contracting client explicitly forbids any third-party processing) even German jurisdiction is not enough.

03.03 Approach C: Frontier Cloud with EU Data Zones (Contractual Sovereignty)

Using OpenAI, Anthropic, or Microsoft 365 Copilot with enterprise agreements that commit to EU data residency, no training on customer data, and various compliance attestations (SOC 2, ISO 27001, GDPR DPA frameworks). Azure OpenAI EU Data Zone is the canonical example.

Protection mechanism: Contractual. Trust is placed in the vendor's operational commitments and audit certifications.

Best for: Companies that want the quality of frontier closed-source models, have lawyers who can negotiate enterprise terms, and have determined that their data exposure does not breach any NDA-style obligations to their own clients. Most B2C companies and many B2B-with-low-confidentiality companies fit here.

What it costs: GPT-4.1 at $2/$8 per million input/output tokens, Claude Sonnet 4.6 at $3/$15 per million input/output tokens. ChatGPT Enterprise at $60 per user per month. M365 Copilot at €28.10 per user per month. For a 20-person team on per-seat plans this lands at roughly €20k–€40k over three years, depending on plan and usage.

Honest limits: The CLOUD Act remains in force regardless of the data processing agreement. Microsoft has been transparent that they have, on rare occasions, complied with US warrants for data stored in EU regions. Azure OpenAI EU Data Zone caveats include that diagnostic and abuse-monitoring data may still flow to the US in some configurations; you have to read the docs carefully and configure correctly. And the model is not yours: the vendor can deprecate, change pricing, or restrict access.

There is a subtler issue. For data that is sensitive to your clients (not only to you) you can have the best DPA in the world and still have a breach problem if your client's NDA explicitly says "no third-party processing." The vendor's compliance posture is irrelevant to that contract. You signed something with your customer that you have to honor regardless of how good Microsoft's certifications are.

03.04 Approach D: Algorithmic Sovereignty (Sanitization at Inference Time)

A preprocessing layer strips or anonymizes sensitive fields before sending queries to an external model. The external system processes the anonymized version; results are post-processed to re-insert context.
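A minimal sketch of that strip-and-restore loop, assuming two illustrative field types. The regular expressions, the `CUST-` ID format, and the function names are hypothetical and stand in for a real PII detector; the point is only that the placeholder mapping stays inside your perimeter while the sanitized text goes out.

```python
import re

# Hypothetical patterns for illustration only; a real deployment would use a
# dedicated PII detector and a much broader rule set.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "CUSTOMER_ID": re.compile(r"\bCUST-\d{6}\b"),
}


def sanitize(text: str) -> tuple[str, dict[str, str]]:
    """Replace sensitive spans with placeholders; return text plus the mapping."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        def substitute(match: re.Match, label: str = label) -> str:
            original = match.group(0)
            # Reuse the placeholder if the same value appears again.
            for placeholder, value in mapping.items():
                if value == original:
                    return placeholder
            placeholder = f"[{label}_{len(mapping) + 1}]"
            mapping[placeholder] = original
            return placeholder
        text = pattern.sub(substitute, text)
    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Re-insert the original values into the external model's response."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

In use, `sanitize()` runs before the external call and `restore()` runs on the response; the mapping is never sent anywhere. Anything the patterns miss still leaves the building.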

Protection mechanism: Algorithmic. A filter acts as a buffer between sensitive data and the external system.

Best for: Workloads where the sensitive fields are clearly delimited and removable without losing task fidelity. Customer-support tickets where you can strip the customer ID before routing the body to GPT. Legal documents where you can replace party names with placeholders and the legal reasoning still works.

What it costs: Minimal direct cost beyond your normal API spend, but the sanitization layer is engineering work. Building a robust PII scrubber for arbitrary inputs is a several-week project. Off-the-shelf options exist (Presidio, Skyflow) but require integration.

Honest limits: Sanitization is imperfect. Re-identification risk exists whenever numerical patterns, country-firm combinations, or contextual specifics survive in the query. For some tasks (anything where the sensitive data is the task, like comparing compensation figures across firm types) there is nothing to sanitize without losing the task. Compliance auditors and counterparties may not accept algorithmic anonymization as equivalent to non-disclosure. And it adds latency and architectural complexity without providing technical guarantees.

This approach is useful as a hybrid component: sanitize the sensitive parts, route them through sovereign infrastructure, and let the rest go to frontier models. It is rarely the right answer as the whole architecture.

The Realistic 2026 Stack

Here is what we actually build for clients in 2026, broken down by component. None of this is exotic. All of it is open or sovereign.

04.01 Inference engine

vLLM 0.7+ is the production default. Red Hat's August 2025 benchmarks, re-validated in 2026, measured vLLM at roughly 793 tokens per second peak throughput versus Ollama's 41 tokens per second on identical hardware, with P99 latency of 80 ms versus 673 ms. Hugging Face's official Inference Endpoints documentation now recommends migrating off TGI to vLLM or SGLang. For multi-user serving on NVIDIA hardware, vLLM is the unambiguous choice. Ollama is fine for single-developer workstation use; it does not scale past one or two concurrent users.

04.02 Hosted EU options

IONOS AI Model Hub is the workhorse: German jurisdiction, BSI C5 attested, no training on customer data, hosts Mistral Small/Medium, Llama 3.1 8B/70B, and others. Mistral La Plateforme is the pick when you specifically want Mistral models (Large 2, Codestral) under French jurisdiction; their commercial license terms are clear. OVHcloud AI Endpoints is a credible third option, particularly for French-anchored stacks.

04.03 Open-weight models we deploy in 2026

Qwen 2.5-72B-Instruct for general-purpose strong reasoning at 70B class. Community top pick for math and careful editing. One of only two models that retained meaningful long-context accuracy on RULER at 128K tokens in 2025 multilingual evaluations, and the only open-weight one of the two (the other is Gemini 1.5 Flash).

Qwen3-32B for the same job at smaller scale. Apache 2.0 license, fits on a single L40S 48GB or a single H100. The Qwen3 Technical Report (arXiv:2505.09388, May 2025) puts Qwen3-235B-A22B (Thinking) above DeepSeek-R1 on 17 of 23 benchmarks.

Llama 3.3-70B-Instruct when ecosystem support matters more than peak reasoning. Cleanest instruction-following, broadest tooling, weaker on multi-step numerical reasoning.

Mistral Large 2 / Codestral when EU provenance is a procurement requirement. Strong on European-language tasks; commercial license required for production.

DeepSeek V3 / R1 distilled-70B for the strongest open reasoning. The full 671B variant needs 8× H200, not realistic for small companies. The distilled 70B is the practical pick on 2-GPU hardware.

04.04 Hybrid sub-task routing

For non-sensitive heavy reasoning we route to Anthropic Claude (Sonnet 4.6 or Opus) or OpenAI GPT-4.1 through standard API contracts with PII scrubbing on the way in. Azure OpenAI EU Data Zone is the standard pick if you want frontier OpenAI quality with the strongest available contractual EU residency. Read the docs carefully, configure the abuse-monitoring opt-out where your compliance posture requires it, document the configuration in your DPIA.

The point is not to refuse to use these models. The point is to send them only the work that should be sent to them.

04.05 Chat UI, orchestration, hardening

Open WebUI is the user-facing chat surface (137,000+ stars, multi-user, RBAC, RAG, MCP, audit logs).
Letta (formerly MemGPT) is the local memory layer (~83% LongMemEval, encrypted SQLite).
n8n handles workflow automation (IONOS published official nodes in January 2026).
Self-hosted Langfuse covers observability and replaces LangSmith in sovereign deployments.
Smolagents with a sandboxed Docker code interpreter handles any task involving numerical reasoning: the LLM emits Python, the sandbox executes it, and results stream back. This pattern closes the open-vs-frontier quality gap on TableBench-style tasks more effectively than picking a bigger model.
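The emit-then-execute pattern can be sketched in a few lines. This is an illustration under a loud assumption: a child Python process stands in for the sandbox, and a child process is not a security boundary. The function name and the sample "model output" are hypothetical; the stack described here uses a sandboxed Docker interpreter for the isolation itself.

```python
import subprocess
import sys


def run_emitted_code(code: str, timeout_s: float = 10.0) -> str:
    """Run model-emitted Python in a separate interpreter and return stdout.

    ASSUMPTION: the child process is a stand-in for a real sandbox and gives
    no isolation guarantees of its own.
    """
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()


# Hypothetical model output: the LLM writes the calculation instead of
# stating the answer, so the arithmetic is done by the interpreter.
emitted = """
salaries = [72000, 85000, 91000, 64000]
print(sum(salaries) / len(salaries))
"""

if __name__ == "__main__":
    print(run_emitted_code(emitted))
```

The design choice is that the model is trusted to write the computation but never to perform it, which is why the approach helps on numerical tasks regardless of model size.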

For the server itself, the BSI/ANSSI "Design Principles for LLM-based Systems with Zero Trust" (August 2025) is the relevant German guideline: no inbound internet route from the inference node, separation of high-trust (private data) and low-trust (web/email) contexts in the same session, structured prompts with role separators, Prometheus plus DCGM exporter for monitoring.

"Sovereign AI" is not a switch. It's a hundred small operational decisions, none of them glamorous, all of them load-bearing.

Real Numbers: What Each Option Actually Costs

This section uses the Berlin research firm engagement as a worked example. The team is 20–30 people. They handle compensation survey data under NDA from 500+ consulting firms. Token volume estimate: 250 million tokens per year (roughly 50,000 tokens per user per working day).

05.01 Option 1: Full local, recommended tier (2× H200 NVL in Berlin colocation)

Hardware CapEx: Supermicro 4U/8U dual-EPYC chassis with 1 TB DDR5 ECC (€30k), 2× NVIDIA H200 NVL 141 GB at €29,741 each (€59.5k, Delta Computer commercial pricing per heise online, May 2026), enterprise NVMe RAID 10 (€4.5k), networking and NAS and assembly (€18k). Total: €112,000.

Year-one implementation: Berlin specialist consultancy for 4–6 months covering procurement, deployment, security review, knowledge transfer (€80k); DPIA and privacy counsel per Art. 35 GDPR (€15k); BSI Zero-Trust security audit and penetration testing (€10k); team training (€3.5k); 10% contingency (€13k). Total: €121,500.

Annual OpEx: Tier 3 Berlin colocation 10 kW rack at NTT/Maincubes/ScaleUp midpoint (€60k), electricity in colo at PUE 1.3 (€4.4k), cross-connects and remote hands (€3.9k), hardware support contract (€5k), internal AI platform owner uplift on an existing IT generalist (€15k), consultancy retainer at ~€3k/month (€36k), annual model refresh (€15k), backups and DR (€6k), compliance and annual security audit (€16k). Total: ~€161,000/year.

Three-year total: €394,500 + €158,400 + €161,600 = ~€714,500. Cost per employee per month averaged over three years: ~€992; in years 2–3, when hardware amortization is no longer in the mix, ~€820/month.
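The arithmetic behind these totals can be re-run as a short script. The line items are copied from the breakdown above; the dictionary keys are hypothetical labels, the year-two and year-three totals are taken as stated, and the per-employee figure assumes a 20-person basis.

```python
USERS = 20    # assumption: the lower end of the 20-30 person team
MONTHS = 36

# Hardware CapEx in euros, from the breakdown above.
capex = {
    "chassis_and_ram": 30_000,
    "gpus_2x_h200_nvl": 59_500,
    "nvme_raid": 4_500,
    "network_nas_assembly": 18_000,
}

# Year-one implementation costs in euros.
implementation = {
    "consultancy": 80_000,
    "dpia_and_counsel": 15_000,
    "security_audit": 10_000,
    "training": 3_500,
    "contingency": 13_000,
}

year_one_opex = 161_000  # rounded annual OpEx as stated above

year_one = sum(capex.values()) + sum(implementation.values()) + year_one_opex
yearly_totals = [year_one, 158_400, 161_600]  # years two and three as stated

three_year_total = sum(yearly_totals)
per_employee_month = three_year_total / (USERS * MONTHS)

if __name__ == "__main__":
    print(f"CapEx:              EUR {sum(capex.values()):,}")
    print(f"Implementation:     EUR {sum(implementation.values()):,}")
    print(f"Year one:           EUR {year_one:,}")
    print(f"Three-year total:   EUR {three_year_total:,}")
    print(f"Per employee/month: EUR {per_employee_month:,.0f}")
```

Changing `USERS` to 30 shows how sensitive the per-employee figure is to headcount, which is worth doing before presenting the number internally.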

Office hosting instead of colocation saves ~€180k over three years (€535k total, €743 per employee per month) but takes on noise, cooling, and physical-security risk. We do not recommend office hosting for a 2× H200 server. 70+ dB fan noise alone makes it incompatible with most 20-person offices.

05.02 Option 2: EU-sovereign cloud (IONOS AI Model Hub)

Same workload on IONOS: token cost at ~€1–€3 per million blended runs €500–€1,500/year. Add a self-hosted Open WebUI plus Letta plus Langfuse on a small office workstation (~€500 one-time) and a DPIA for the AI use case (~€5k). Three-year total: ~€5,000–€15,000.

Roughly 50× cheaper than full local. Data sits in Frankfurt, German jurisdiction, BSI C5 attested, no training on customer data by contract. What you give up: control over the model roadmap, the ability to tell a counterparty "no data ever leaves our infrastructure" when their NDA requires that exact phrasing, and ownership of the inference stack.

05.03 Option 3: Mistral La Plateforme

Same workload on Mistral's own EU-hosted Large 2: token cost at ~€5/M blended is ~€1,250/year. Same self-hosted tooling. Three-year total: ~€5,000–€10,000. Slightly more per token than IONOS, but you get Mistral Large 2 specifically: the strongest European-language model in the open-weight ecosystem.

05.04 Option 4: Hybrid (sovereign for sensitive, frontier for non-sensitive)

This is the configuration we recommend for most EU SMEs without a Vencon-style structural requirement, that is, one where data leaving the building is ruled out entirely.

Architecture: sensitive workloads (NDA-bound data, employee records, internal financials) route to IONOS or Mistral La Plateforme through Open WebUI with strict access controls and Langfuse audit logs. Non-sensitive heavy reasoning (research, open-source code review, marketing drafts, internal docs) routes to Claude Sonnet 4.6 or GPT-4.1 via standard enterprise API with PII scrubbing (Presidio or Skyflow) on the way in.

Three-year cost for 20 users: IONOS/Mistral sovereign sub-tasks (~€5k); frontier API spend at ~100M tokens/year on Claude (~€18k); engineering for routing and PII scrubber one-time (~€15k); DPIA covering the hybrid architecture (~€8k). Total: ~€46,000. Roughly 15× cheaper than full local, with frontier quality available where it matters.

05.05 Option 5: Pure frontier (Azure OpenAI EU Data Zone)

For comparison, with no sovereignty constraint: ChatGPT Enterprise at $60/user/month over three years lands at ~€38k; GitHub Copilot Enterprise at ~€25k; M365 Copilot at ~€20k. The lowest sovereignty-friction option, M365 Copilot with EU Data Boundary configured, runs roughly €20,000 over three years.

05.06 Side-by-side

| # | Approach | 3-year cost (20 users) | Sovereignty | Best for |
| --- | --- | --- | --- | --- |
| 01 | ChatGPT Enterprise (US contractual) | ~€38,000 | None (US jurisdiction, CLOUD Act applies) | Companies with no NDA constraint on customer data |
| 02 | M365 Copilot (EU Data Boundary) | ~€20,000 | Weak (Microsoft EU subsidiary, CLOUD Act applies) | Default office-productivity AI; not for confidential customer data |
| 03 | Mistral La Plateforme (FR) | ~€10,000 | Strong (French jurisdiction) | European-language workflows, EU provenance for procurement |
| 04 | IONOS AI Model Hub (DE) | ~€5,000–€15,000 | Strong (German jurisdiction, BSI C5) | The default sovereign baseline for German SMEs |
| 05 | Hybrid (sovereign + frontier) | ~€46,000 | Per-task (drawn carefully) | Most EU SMEs that want frontier quality where it is safe |
| 06 | Full local, office hosting | ~€535,000 | Maximum (technical guarantee) | Companies with structural sovereignty requirements and MLOps capacity |
| 07 | Full local, colocation | ~€714,500 | Maximum (technical guarantee) | Same, but who need 24/7 operational reliability |

The premium for full sovereignty is real. About 50× over EU-sovereign cloud, about 15× over the hybrid configuration. The premium is justified only by a structural reason: NDA, sector regulation, or a specific contractual obligation that nothing else can satisfy.
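The multiples can be checked directly from the three-year totals. This is a small sketch using only the figures in this section; the constant and function names are hypothetical.

```python
# Three-year totals in euros, taken from the figures in this section.
FULL_LOCAL_COLO = 714_500
HYBRID = 46_000
SOVEREIGN_CLOUD_LOW, SOVEREIGN_CLOUD_HIGH = 5_000, 15_000


def premium(baseline: float, alternative: float) -> float:
    """How many times more the baseline costs than the alternative."""
    return baseline / alternative


if __name__ == "__main__":
    print(f"vs hybrid:                 {premium(FULL_LOCAL_COLO, HYBRID):.1f}x")
    print(f"vs sovereign cloud (high): {premium(FULL_LOCAL_COLO, SOVEREIGN_CLOUD_HIGH):.1f}x")
    print(f"vs sovereign cloud (low):  {premium(FULL_LOCAL_COLO, SOVEREIGN_CLOUD_LOW):.1f}x")
```

The "about 50×" figure corresponds to the upper end of the sovereign-cloud range (47.6×); against the lower end the ratio is considerably higher (142.9×), and against hybrid it is 15.5×.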

The Trade-Offs Nobody Mentions in the Pitch Deck

Going sovereign costs more than money. Here are the things that get quietly traded away.

06.01 Latency

Frontier closed-source models have heavily optimized inference with global edge deployment. OpenAI's median time-to-first-token on GPT-4.1 sits around 400 ms. A well-configured, self-hosted Qwen 2.5-72B on a 2× H100 node lands at roughly 40–60 tokens per second per request with 20 concurrent users. That is fine for chat, but time-to-first-token will be longer and you will feel it on streaming. Mistral La Plateforme and IONOS sit between self-hosted and frontier. For voice-first or real-time applications, sovereign options narrow significantly.

06.02 Top-end quality on hard reasoning

This is the trade-off vendors most aggressively obscure. Frontier closed models materially outperform the best open 70B-class models on the hardest reasoning tasks:

TableBench (AAAI 2025, 18 sub-categories of tabular QA): "the most advanced model, GPT-4, achieves only a modest score compared to humans." Open 70B-class models trail GPT-4 by ~10–15 points on multi-hop numerical reasoning.
DABstep (Adyen/Hugging Face, multi-step data-analysis agents): o3-mini at 16%, DeepSeek-R1 at 13%, DeepSeek-V3 at 6%. Even the best agents score below 20% on the hardest tasks. The open-vs-closed gap is widest exactly there.
MMLU-Pro: GPT-5 and Claude Opus in high-80s, Qwen3-235B-A22B at 80.6%. Small gap at the top of the benchmark; the gap widens on out-of-distribution problems.
RULER 128K long context: only Gemini 1.5 Flash and Qwen 2.5-72B retain meaningful 128K accuracy in 2025 multilingual evaluations; most others degrade sharply.

For most professional knowledge work (drafting, summarizing, structured Q&A, code completion) open 70B-class is good enough. For the hardest reasoning, sovereignty costs you quality. Plan workflows around it: human-in-the-loop on numerical outputs, sandboxed code interpreter for math (let the LLM emit Pandas; the sandbox computes), evaluation on your actual workloads rather than generic benchmarks.

06.03 Ecosystem, refresh, vendor concentration, staffing

Four trade-offs we will not pretend away.

Ecosystem: the open-weight stack has closed most of the tooling gap (vLLM supports OpenAI-compatible tool use, Open WebUI supports MCP, Langfuse covers observability) but documentation is thinner and the polish gap is real. The first 80% is open; the last 20% takes work.

Refresh cadence: the open-weight landscape moves on a 2–4 month cycle. Today's picks (Qwen 2.5-72B, DeepSeek V3.2, Llama 3.3-70B) will likely be superseded within a deployment's three-year window. Frontier vendors deprecate and upgrade for you. With sovereign deployment, you do the upgrades. Budget one model refresh per year.

Vendor concentration: the sovereign supplier landscape (IONOS, Mistral, OVHcloud, STACKIT, Aleph Alpha) is narrower than the hyperscaler field. A pricing or roadmap shift at any one of them affects more of your stack than the same shift at a hyperscaler would. The market is consolidating.

Staffing: a frontier-cloud-only stack can be operated by a competent backend developer. A sovereign-on-prem stack needs at minimum a senior engineer with MLOps experience or a credible consultancy retainer. If you do not have that capacity and cannot afford it, do not start the on-prem path. A misconfigured local LLM is worse than no local LLM: it produces confident-but-wrong outputs that humans then trust, and the failure modes are quieter than cloud failures because there is no vendor pager.

H100/H200 lead times in the EU were 3–6 months through most of 2025, improving but still longer than US lead times in 2026. Bridging the wait with EU-sovereign cloud (run the sandbox on IONOS while hardware ships) is standard practice.

An Honest Evaluation Framework

Here is the decision tree we use with clients. It is not the only frame, but it has held up across the engagements we have run.

07.01 Step 1: Classify your workloads by data sensitivity

List every AI-augmented task you can imagine in your business. Tag each one:

Class 1: No customer data, no employee data, no internal financials, no IP-significant content. (Blog posts about your product, public-company research, summarizing public web content.)
Class 2: Internal employee or operational data, no special-category protection, no external NDA. (Meeting notes, project status updates, internal training docs.)
Class 3: Customer data, financial records, or content covered by client-to-you NDAs. (Customer support tickets with personal data, financial statements under review, M&A documents.)
Class 4: Special-category data under GDPR (health, religious belief, biometric, criminal records, trade union membership) or sector-specific confidentiality (lawyer-client privileged material, medical records, defense research).

07.02 Step 2: Map classes to architectural approaches

Class 1: Frontier closed-source is fine. Use the best model.
Class 2: EU Data Zone with a frontier vendor is typically defensible with a DPA in place; hybrid with PII scrubbing also works.
Class 3: EU-sovereign cloud (IONOS, Mistral, OVHcloud) is the default; frontier with sanitization may be acceptable where sensitive fields are cleanly removable.
Class 4: Full local (Approach A) is the default, or the most rigorous EU-sovereign cloud if your sector regulators accept it. Document everything; get specialized counsel.
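The mapping can be written down as a small routing table. This is a sketch of the defaults only: the enum member names and target labels are hypothetical, and the rule that a mixed task follows its most restrictive class is an assumption rather than something stated above.

```python
from enum import IntEnum


class DataClass(IntEnum):
    """Sensitivity classes from Step 1; a higher value is more restrictive."""
    PUBLIC = 1       # no customer, employee, financial, or IP-significant data
    INTERNAL = 2     # internal employee or operational data, no external NDA
    CLIENT_NDA = 3   # customer data, financial records, NDA-covered content
    SPECIAL = 4      # GDPR special-category or privileged material


# Default target per class. The target names are hypothetical labels,
# not product identifiers.
DEFAULT_TARGET = {
    DataClass.PUBLIC: "frontier_api",
    DataClass.INTERNAL: "frontier_eu_data_zone",
    DataClass.CLIENT_NDA: "eu_sovereign_cloud",
    DataClass.SPECIAL: "full_local",
}


def route(classes_touched: list[DataClass]) -> str:
    """Pick a target for a task from every data class it touches.

    ASSUMPTION: a task that mixes classes is routed by its most
    restrictive class.
    """
    if not classes_touched:
        raise ValueError("classify the task before routing it")
    return DEFAULT_TARGET[max(classes_touched)]
```

For example, a task that combines internal meeting notes with an NDA-covered client document would resolve to the sovereign-cloud target under this rule.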

07.03 Step 3: Estimate token volumes per class

This breaks most ROI calculations. Do not go on-prem until you have measured your actual token volume. Run a pilot on EU-sovereign cloud for three months. Capture per-user-per-day token counts. Multiply out. Empirically: 20 users at ~50,000 tokens per user per working day = ~250M tokens/year. Real organizations vary 2–5× around this. If you come in below 100M tokens/year, full local is almost certainly not justified. The EU-sovereign cloud cost at that volume is too small to defend the CapEx against.
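A sketch of the extrapolation from pilot logs, assuming a log of `(user, day, tokens)` records. The record format, the function names, and the 250 working days per year are assumptions; the last one is simply what makes 20 users at 50,000 tokens per day come out to 250M.

```python
from collections import defaultdict
from statistics import mean

WORKING_DAYS_PER_YEAR = 250        # assumption: implied by 20 x 50,000 x 250 = 250M
ON_PREM_THRESHOLD = 100_000_000    # below this, full local is unlikely to pay off


def annual_tokens(pilot_log: list[tuple[str, str, int]], users: int) -> int:
    """Extrapolate yearly volume from (user, day, tokens) pilot records."""
    per_user_day: dict[tuple[str, str], int] = defaultdict(int)
    for user, day, tokens in pilot_log:
        per_user_day[(user, day)] += tokens
    average = mean(per_user_day.values())
    return round(average * users * WORKING_DAYS_PER_YEAR)


def annual_token_cost(tokens: int, eur_per_million: float) -> float:
    """Yearly API cost at a blended per-million-token price."""
    return tokens / 1_000_000 * eur_per_million


if __name__ == "__main__":
    # Hypothetical pilot data: two users, one day.
    log = [("a", "2026-03-02", 30_000), ("a", "2026-03-02", 20_000),
           ("b", "2026-03-02", 50_000)]
    volume = annual_tokens(log, users=20)
    print(volume, volume >= ON_PREM_THRESHOLD)
    print(annual_token_cost(volume, eur_per_million=5.0))
```

Averaging per user per day rather than per request keeps the estimate stable when a few users generate most of the traffic, but three months of real data is still what makes the number credible.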

07.04 Step 4: Check the staffing constraint

Staffing kills more sovereign deployments than money does. On-prem (Approach A) needs either a senior MLE on staff (~€125k–€140k loaded in Berlin) or a credible consultancy retainer (~€36k–€60k/year). Sovereign cloud (Approach B) needs a competent backend engineer to operate Open WebUI, configure auth, manage Langfuse logs, and respond to incidents. If neither is available and the budget cannot stretch, stay on EU Data Zones with a frontier vendor until you have built the operational capacity. Sovereignty without operational capacity is theater.
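
The staffing check reduces to a short decision rule. The sketch below is one hedged reading of it, with hypothetical function and label names. It returns the most demanding approach the team can actually operate, which is a ceiling and not a recommendation.

```python
def operable_ceiling(has_senior_mle: bool,
                     has_consultancy_retainer: bool,
                     has_backend_engineer: bool) -> str:
    """Most demanding approach the current team can operate."""
    if has_senior_mle or has_consultancy_retainer:
        # Either one covers the on-prem operational burden.
        return "on-prem (Approach A)"
    if has_backend_engineer:
        # Enough to run the front-end, auth, logging, and incident response.
        return "sovereign cloud (Approach B)"
    # No operational capacity yet: stay put until it exists.
    return "EU Data Zone with a frontier vendor"


print(operable_ceiling(False, False, True))
# sovereign cloud (Approach B)
```

Whether to go all the way to the ceiling is still decided by data class and token volume, not by headcount alone.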

07.05 Step 5: Build the evaluation suite, then phase

Before committing to a model, build an evaluation suite on 50–100 representative tasks from your own corpus with reference outputs scored by your own people. Generic benchmarks tell you about average behavior on academic-style tasks. They do not tell you whether Qwen 2.5-72B handles your spreadsheet pipelines.
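
An evaluation suite does not need a framework to get started. The sketch below shows the basic shape, assuming each case is a prompt plus a reference output. `EvalCase`, `run_suite`, `exact_match`, and `echo_model` are all hypothetical names; the exact-match scorer and the toy model are stand-ins for your own reviewers' scoring and a real model call.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    task_id: str
    prompt: str
    reference: str  # reference output agreed by your own people


def run_suite(cases: list[EvalCase],
              generate: Callable[[str], str],
              score: Callable[[str, str], float]) -> tuple[dict[str, float], float]:
    """Run every case through `generate`, score against the reference,
    and return per-task scores plus the mean score."""
    results = {c.task_id: score(generate(c.prompt), c.reference) for c in cases}
    return results, mean(list(results.values()))


def exact_match(output: str, reference: str) -> float:
    """Stand-in scorer: 1.0 on an exact match, else 0.0."""
    return 1.0 if output.strip() == reference.strip() else 0.0


def echo_model(prompt: str) -> str:
    """Stand-in for a real model call."""
    return prompt.upper()


cases = [
    EvalCase("t1", "abc", "ABC"),
    EvalCase("t2", "def", "xyz"),
]
per_task, average = run_suite(cases, echo_model, exact_match)
```

Keeping `generate` as a plain callable means the same suite can be pointed at an EU-sovereign cloud endpoint during the pilot and at a local server later without touching the cases.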

Then phase:

Months 1–3: pilot on EU-sovereign cloud, build the front-end stack, measure token volume, capture evaluation data on real tasks.
Months 3–6: if the pilot validates and volume justifies it, place the hardware order in parallel. If volume is below 100M tokens/year, stop here. EU-sovereign cloud is the destination, not a waypoint. Save the €500k.
Months 6–9: deploy on-prem if applicable, migrate the same software stack to point at local vLLM, keep the EU-sovereign cloud as a 60-day fallback.
Quarterly: re-evaluate model choices.
Annually: plan one model upgrade.

This staged path validates the use case before sinking €400k+ of CapEx into the wrong configuration.

When This Approach Is Wrong

We try to be honest about when our own framing breaks. Four scenarios where the "draw the line carefully" thesis is not the right answer:

You are too small to draw the line. A two-person consultancy will not get value out of building a hybrid routing layer with PII scrubbing. The engineering overhead exceeds the benefit. The right answer is usually: use the most defensible single tool (Mistral La Plateforme for everything, or M365 Copilot with EU Data Boundary configuration for everything), document the choice, move on. Sophistication has a fixed cost. Pay it only when the workload justifies it.

You are too large to operate at the task level. A 500-person company needs centralized policies, not per-task discretion. You cannot trust every employee to correctly classify data sensitivity at the point of use. At that scale, the right architecture is usually a strict enterprise policy with a small number of approved tools, automated guardrails (DLP at the endpoint, automated PII scrubbing in routing), and clear escalation for edge cases. The hybrid-by-task model is too fragile at scale.

Your regulator has spoken. If BaFin, BSI, or a sectoral authority has issued specific guidance for your industry, follow it. The guidance will usually be more conservative than this framework. Our job is to help you think clearly when the rules are ambiguous. When they are not ambiguous, the rules win.

The data is the product, full stop. This is the Vencon case from the introduction. When the entire business rests on confidentiality agreements with sources, and those agreements predate the AI era, the operative question is not "what is the most cost-effective architecture?" The operative question is "what arrangement do our counterparties accept?" In our experience, "we run a fully local air-gapped inference stack with no outbound network access during operation" is one of the few answers that sources will accept without a renegotiation. The €500k+ is the cost of that conversation not happening. Sometimes it is worth it.

FAQ

Is GDPR compliance enough? Why talk about sovereignty at all?

GDPR is about lawful processing of personal data. It is necessary but not sufficient for many real-world confidentiality scenarios. NDAs from your clients to you may restrict third-party processing in ways GDPR does not require but does not preempt either. The CLOUD Act creates a parallel exposure that GDPR cannot defend against directly. Sovereignty is the layer above GDPR that addresses jurisdiction and control, not lawful basis alone.

Is Mistral really European in any meaningful sense?

Mistral AI is a French company headquartered in Paris with predominantly European leadership and ownership; its models ship with clear commercial licenses and La Plateforme runs on EU infrastructure. Funding has included international (including US) investors, so "European" is not "European-only capital." For most procurement and compliance purposes, Mistral satisfies an EU sovereign AI requirement. For politically sensitive deployments where ownership matters, dig deeper before committing.

What about Azure OpenAI EU Data Zone? Doesn't that solve this?

Partially. It gives strong contractual data residency. It does not eliminate CLOUD Act exposure, because Microsoft is a US company. For most B2C and many B2B use cases, the contractual posture is sufficient. For NDA scenarios where third-party processing is restricted, it may not be. Read your client contracts before deciding. And read the Azure OpenAI configuration docs: diagnostic and abuse-monitoring data flows have caveats you have to configure for.

Can I use Claude or GPT through OpenRouter and claim sovereignty?

No. The data still flows to Anthropic or OpenAI for processing. The gateway adds a billing relationship, not a sovereignty layer. If you need sovereignty, the model itself has to run on sovereign infrastructure.

How fast is the gap between open-weight and frontier closing?

Unevenly. On general knowledge (MMLU, simple QA), open 70B-class is essentially at parity with frontier for most practical purposes. On hard reasoning (BrowseComp, DABstep, ARC-style novel problems), frontier still outperforms by 2 to 5 times on the hardest sub-tasks. On code generation, the gap is small and narrowing fast. On long-context retention, both camps degrade past 32K. Pragmatic 2026 working number: for 80% of small-company workflows, open 70B is good enough. For the hardest 20%, it is not. Design the stack around that ratio.

What does Garden actually do in a sovereign deployment engagement?

Three phases. Audit (2–4 weeks): map workflows, classify data, identify which workloads belong on which side of the line, quantify token volumes and accuracy thresholds. Sandbox (4–6 weeks): stand up the recommended architecture on EU-sovereign cloud, run it against your real tasks, measure performance against your evaluation suite. Deploy (4–8 weeks): install the production version, integrate with auth and monitoring, train the team, document the handover. Sandbox runs on IONOS or Mistral while procurement is in flight for on-prem clients. A Phase 1 of audit-plus-sandbox typically lands in the €57k–€75k range based on the engagement we ran with the Berlin research firm.

What to Do Next

The temptation when reading a guide like this is to decide on the architecture before doing the diagnostic work. Resist it. The architecture is downstream of three things you may not have measured yet: what your actual data sensitivity classes are, what your actual token volumes look like, and what your actual quality bar is for the hardest workloads.

The minimum useful next step is a 30-day diagnostic:

List your top 20 AI use cases by anticipated frequency.
Tag each one with the data class (1–4 from Section 7).
For three of them (one from each of the heaviest-volume classes), run a sandbox against open-weight on EU-sovereign cloud and against a frontier model with PII scrubbing. Score the outputs on your own evaluation criteria.
Calculate the realistic token volume across all 20 use cases per month.
Re-read Section 5 with those numbers in hand.
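
The volume step of the diagnostic can be sketched as a small aggregation. The use cases, run counts, and tokens-per-run figures below are hypothetical placeholders; replace them with what you measure in the sandbox.

```python
from collections import defaultdict

# (use case, data class 1-4, runs per month, tokens per run): all hypothetical
use_cases = [
    ("Summarize public web content", 1, 400, 6_000),
    ("Meeting notes", 2, 300, 8_000),
    ("Support ticket triage", 3, 2_000, 3_000),
]


def monthly_tokens_by_class(cases):
    """Sum monthly token volume per data class."""
    totals = defaultdict(int)
    for _name, data_class, runs_per_month, tokens_per_run in cases:
        totals[data_class] += runs_per_month * tokens_per_run
    return dict(totals)


per_class = monthly_tokens_by_class(use_cases)
monthly_total = sum(per_class.values())
annual_total = monthly_total * 12

print(per_class)      # {1: 2400000, 2: 2400000, 3: 6000000}
print(annual_total)   # 129600000
```

Splitting the total by class matters because only the volume in the sensitive classes argues for sovereign infrastructure; the rest can go wherever quality is best.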

Most teams find, after this exercise, that the right answer is somewhere in the hybrid zone: sovereign cloud for the sensitive 40%, frontier with care for the rest, full on-prem only if a structural reason demands it. The exceptions (companies where on-prem is the only honest answer) usually know who they are before reading any of this.

If you are trying to draw that line for your team and would rather not draw it alone, that is the conversation we have in a Garden audit. We run sovereign-deployment engagements: audit → sandbox on EU infrastructure → deploy. The output is a working system with your team trained to maintain it, not a slide deck. Email a@gardenresearch.eu.

Garden Research helps EU teams of up to ~20 people adopt AI without the hype, the chaos, or the 95% pilot failure rate. We run our own sovereign-deployed agents in production and we write what we learn.