Dr. Claudio Fantinuoli
August 12, 2025

Private by Design: Rethinking AI Interpreting Beyond the Cloud

In recent weeks, an interpreter made headlines. According to Le Monde, the European Commission dismissed an interpreter suspected of espionage on behalf of Moscow. The individual had reportedly taken notes (while not interpreting) during a high-level, closed-door meeting with Ukrainian President Volodymyr Zelensky in late 2024. The dismissal followed an internal investigation into the interpreter’s activities.

This case — for which, to be honest, we have more speculation than actual detail — underscores the persistent risk of data leaks in high-stakes interpreting, not only when humans are involved, as in this instance, but increasingly as machines are introduced in one role or another into sensitive communication settings. For many organizations exploring machine interpreting — whether in diplomacy, defense, or corporate environments — one obstacle or fear looms larger than all others: data security.

Today’s commercial AI solutions, whether built into conferencing platforms, mobile apps, or dedicated devices, almost without exception rely on cloud-based services. This means that speech data, sometimes of a sensitive nature, must by design be sent to external, and often multiple, servers for processing, a reality that remains a thorn in the side of hospitals, law enforcement agencies, courts, and other potential adopters bound by strict data protection requirements. The fundamental question persists: how secure can your data truly be if it leaves your control?

The reasons behind the reliance on cloud services in AI interpreting

The near-universal dependence on cloud infrastructure is no accident. It is driven mainly by the need for robust and powerful hardware, for two reasons:

  • Architecture Complexity: Current solutions comprise many components, and their architectures can be very complex, making them difficult to deploy and maintain locally.
  • Model Performance: The most advanced ASR, MT, and TTS systems, those with the best quality and language coverage, are mainly proprietary, built by AI giants such as OpenAI, Google, and Meta. They are large and computationally intensive.

Cloud deployments, while often a technical necessity in many current use cases, also offer several key advantages that would make them the preferred choice even if the technology for on-device deployment were available.

  • Scalability: Cloud solutions can support thousands of audio streams concurrently, scaling to meet demand without user-side investment in hardware.
  • Accessibility: Powerful interpreting capabilities become available with nothing more than a stable internet connection on any device.
  • Lower Entry Barrier: For many users and organizations, cloud solutions offer a quicker and more affordable path to adopting advanced language technologies without requiring particular expertise or infrastructure.

The downside: data security, privacy, and sovereignty

The reliance of current AI interpreting systems on cloud infrastructure comes at a cost. As introduced above, sensitive speech data must be transmitted to servers operated by third parties, frequently across jurisdictions, in order to be processed. For sectors such as healthcare, law enforcement, defense, or critical infrastructure, this is a compromise that may prove unacceptable. Even with end-to-end encryption in place, the data still leaves the organization’s trusted perimeter, a risk that cannot be entirely mitigated by technical means alone.

Today, this issue is typically addressed through extensive contractual agreements between service providers and clients. These contracts stipulate, often in great detail, how data can be used and what measures are in place to protect it. To align with regulatory requirements, particularly in regions like the European Union, many providers offer regional hosting options. Hosting AI services within the EU, for instance, helps ensure compliance with the General Data Protection Regulation (GDPR), offering legal reassurance that data is handled appropriately.

In the case of AI services, concerns about confidentiality are arguably amplified by the technology’s reputation. AI companies have been widely criticized for their data practices, not necessarily because they aim to compromise users, but because data is often said to be used to improve models, sometimes without clear user consent. As a result, AI is perceived not just as a tool, but as a potential liability. While contractual safeguards and cybersecurity defenses are sufficient for most use cases, there are contexts where absolute confidentiality is non-negotiable. No legal clauses, and no assurances of encryption, can replace the need for full control. Consider, for example, a closed-door meeting between the European Commission and President Zelensky to discuss military assistance. In such cases, 100% security is not a luxury: it is a prerequisite. Many stakeholders will simply require that no data leaves the security perimeter of the meeting room.

This raises a crucial question for speech language technologies, like AI interpreting: Is it possible to achieve high-quality translation services without sending data to external platforms, and thereby eliminate the risk entirely?

A turning point: the rise of offline AI interpreting

The only 99.99% secure solution for high-stakes scenarios is to ensure that the entire AI interpreting pipeline runs offline, either on a local server or a powerful consumer-grade machine. Is this technically feasible? The answer is yes, and the first building blocks are already emerging.

We have already witnessed this shift in automatic speech recognition (ASR). Thanks to open-source (or, more accurately, open-weight) models like Whisper, high-quality offline transcription is now accessible to a broad audience. Applications such as MacWhisper show that local ASR can match, and in some cases outperform, older cloud-based systems. This success is paving the way for the next step: bringing the entire real-time speech translation pipeline offline.
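How little code local transcription now takes is easy to demonstrate. The following is a minimal sketch, assuming the open-source openai-whisper package and model weights already present in the local cache; the file name is a placeholder.

```python
# Offline transcription with an open-weight Whisper model.
# Once the weights are cached, no audio ever leaves the machine.
import whisper

model = whisper.load_model("large-v3")    # loaded from the local cache
result = model.transcribe("meeting.wav")  # processed entirely on-device
print(result["text"])
```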

By combining local offline models for ASR (e.g., Whisper), machine translation (MT, e.g., Meta’s NLLB) or large language models (LLMs, e.g., DeepSeek), and text-to-speech synthesis (e.g., Kokoro), it is already possible to perform fully offline translation of recordings; a sketch of such a cascade is shown below. I have tested a simple pipeline on my own computer, and with a few tricks the quality is already very high, resembling what commercial applications achieve with a cascading approach for video translation. More remarkably, streaming translation, and even near-simultaneous interpretation, is quickly becoming feasible as well. I am testing a prototype, and the results are encouraging.
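To make the cascading approach concrete, here is a minimal sketch of such an offline pipeline. It assumes the openai-whisper, transformers, kokoro, and soundfile packages, with all model weights already downloaded to the local cache so that nothing is fetched at runtime; the language pair (German to English), file names, and voice are placeholders, and the Kokoro call follows the API of the kokoro package at the time of writing.

```python
# Fully offline cascade: ASR -> MT -> TTS, everything runs on the local machine.
import soundfile as sf
import whisper
from kokoro import KPipeline
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# 1. ASR: transcribe the source speech locally with Whisper.
asr = whisper.load_model("large-v3")
source_text = asr.transcribe("meeting.wav", language="de")["text"]

# 2. MT: translate with Meta's NLLB; the distilled 600M variant
#    fits on a consumer-grade GPU and even runs on CPU.
mt_name = "facebook/nllb-200-distilled-600M"
tok = AutoTokenizer.from_pretrained(mt_name, src_lang="deu_Latn")
mt = AutoModelForSeq2SeqLM.from_pretrained(mt_name)
inputs = tok(source_text, return_tensors="pt")
out = mt.generate(
    **inputs, forced_bos_token_id=tok.convert_tokens_to_ids("eng_Latn")
)
target_text = tok.decode(out[0], skip_special_tokens=True)

# 3. TTS: synthesize the English translation with Kokoro (24 kHz output).
tts = KPipeline(lang_code="a")  # "a" selects American English voices
for i, (_, _, audio) in enumerate(tts(target_text, voice="af_heart")):
    sf.write(f"translated_{i}.wav", audio, 24000)
```

Swapping the MT step for a local LLM, or the Whisper model for a smaller variant, is a matter of changing a few lines; this flexibility is precisely what makes the cascading approach attractive for self-hosted deployments.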

While this kind of deployment is not yet widespread and will likely unfold in stages, first with speech-to-text (for example, for captioning) and later with speech-to-speech, the trajectory is clear: offline AI interpreting is no longer a distant vision, but an emerging reality.
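As a rough illustration of where the streaming prototype mentioned above is heading, the following hypothetical sketch approximates near-simultaneous operation with a simple chunked loop. It assumes the sounddevice package for microphone capture and the same locally cached Whisper and NLLB models as in the cascade above; chunk length, language pair, and silence threshold are arbitrary placeholders.

```python
# Rough "streaming" loop: record short chunks and translate them as they arrive.
# Not true simultaneous interpretation, but a rolling-window approximation.
import numpy as np
import sounddevice as sd
import whisper
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono float32 input
CHUNK_SECONDS = 5      # latency vs. quality trade-off

asr = whisper.load_model("small")  # a smaller model keeps latency low
mt_name = "facebook/nllb-200-distilled-600M"
tok = AutoTokenizer.from_pretrained(mt_name, src_lang="deu_Latn")
mt = AutoModelForSeq2SeqLM.from_pretrained(mt_name)

while True:
    # Record one chunk from the default microphone (blocking call).
    chunk = sd.rec(CHUNK_SECONDS * SAMPLE_RATE, samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
    sd.wait()
    audio = chunk.flatten()
    if np.abs(audio).max() < 0.01:   # crude silence gate: skip empty chunks
        continue

    text = asr.transcribe(audio, language="de")["text"].strip()
    if text:
        inputs = tok(text, return_tensors="pt")
        out = mt.generate(
            **inputs, forced_bos_token_id=tok.convert_tokens_to_ids("eng_Latn")
        )
        print(tok.decode(out[0], skip_special_tokens=True))
```

A real system would replace the fixed-size chunks with voice-activity detection and overlapping windows to avoid cutting words in half, but the building blocks are the same.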

What this means for stakeholders

These developments will mark the emergence of a new market: self-hosted machine interpreting, designed with privacy, adaptability, and sovereignty at its core. Similar solutions are already available for transcription services. Stakeholders with the highest security requirements will be able to deploy solutions that are secure by design, running entirely offline or within their own trusted infrastructure.

The main challenge, however, will likely be quality. Many of these systems can be built using open-source components, and for now, it is difficult to imagine them matching the performance of proprietary models offered by companies like Google, Microsoft, or OpenAI.¹

Imagine large institutions owning their machine interpreting infrastructure or licensing it for internal use, with the full assurance that no data ever leaves their control. This shift could dramatically expand the use of AI interpreting in sectors where absolute privacy is paramount.

Much work remains on the technical front. Plug and play is not an option here, and finding the right components and combining them in the right way remains an art. Productization will take time, but the core components are rapidly falling into place. The cloud will undoubtedly remain dominant for the vast majority of use cases, but the future of machine interpreting will not be confined to it: viable alternatives are beginning to emerge.

I will keep building and tweaking various offline pipelines over the next months, and see where this journey goes.

  1. One advantage of open-source models is that they open the door to customization: they offer the possibility to fine-tune pipelines for specific domains, verticals, or under-served languages, potentially delivering better performance in areas where today’s generic cloud APIs fall short. However, this still requires considerable expertise, which is not available in every company. ↩︎
