Over the last few years, AI interpreting has made a leap, from flashy demos to real-world products. Machines can now translate speech between languages, sometimes with impressive accuracy. But interpreting isn’t just about getting the words right. It’s about context, nuance, stakes. And that’s where things get tricky.
So, how good is AI at interpreting today?
It depends. Compared to top-tier professionals — the ones interpreting high-stakes diplomacy at the UN or the European Commission — the answer is simple: not even close. Compared to casual, lower-stakes settings where the goal is to foster some form of access, AI already does a decent job. Think: catching the gist of a talk in another language you’d otherwise miss. For that, it works.
Still, mismatched expectations are everywhere. Take the WHO’s recent public evaluation campaign of a commercial AI interpreter. I was surprised to see so much effort put into answering a question that, had they asked me, I’d have resolved in a sentence: save your time and budget, AI interpreting isn’t yet built for the level of demand and precision you are testing against. And I can say, with a fair degree of confidence, that it won’t be anytime soon. For context: the WHO is an international organization accustomed to the highest standards of multilingualism, employing the best interpreters on the market, with years of experience. To be honest, I can’t say whether the WHO ran the evaluation genuinely expecting the tool might meet those standards, merely to demonstrate that AI isn’t yet ready for them, or simply out of curiosity.
Don’t get me wrong: with the surprise came appreciation, not for the result, which, as I said, was predictable, but for what the process revealed: evaluating AI interpreting is hard. It’s not just about asking whether it can work, but for whom, when, and why. Learning how to answer that question is a skill we still need to develop, and it will become crucial in the near future, as increasingly capable machines enter the field.
So let’s be clear: AI interpreting in 2025 is not ready for prime time, if by “prime time” we mean any setting traditionally handled by highly skilled professional interpreters, especially those where expectations match the level of quality that only top-notch interpreters deliver. In other words, it is not a replacement for them. That said, it is a powerful tool for low(er)-stakes, high-volume communication, where the focus is on priorities other than that level of quality. Even the WHO might find AI interpreting useful, for example, in meetings where no world-changing decisions are made, in breakout rooms, and in other situations that would typically remain monolingual (see my article Towards non-discriminatory multilingualism). Probably the best way to frame it today is as a tool that works best not when we ask it to match human excellence, but when we ask it to fill human absence. I am aware that reality is more nuanced than this, and that the truth likely lies somewhere in between, within a continuum of expectations, needs, available resources, and much more. The gray zone is immense, and the discussion would need more space.
So why the inflated expectations in this kind of evaluation campaign?
What puzzles me is how we’ve reached a point where stakeholders either expect AI to deliver the same quality they once paid a premium for from highly skilled interpreters or, conversely, seek proof that it cannot. Why this disconnect between expectations and the common sense of today’s reality? Blame the general AI hype since the release of ChatGPT. Blame the fact that such organizations lack in-house experts in speech translation, and that external ones are very often not involved at all. Maybe it is a combination of all of this. If asked, experts would tell you plainly: stop pretending this tech is ready for this kind of prime time. It’s not. It’s improving, and fast, but it’s still learning (see my article on Human parity in AI interpreting). Instead, use this technology for what it can do best; there are many use cases where it makes perfect sense, as I said before, even in such international organizations. Having the right expectations is key. When, in the near future, LLMs and multimodal models drive the next major leap forward, we can reassess their usability in more demanding contexts, even very high-stakes scenarios like the WHO or the UN General Assembly (see the upcoming International Conference on Spoken Language Translation for up-to-date developments in the field).
In my eyes, it is like watching stakeholders in the industry confuse a glider with a jumbo jet. Sure, both fly (albeit in a different way), but you don’t expect to cross the Atlantic in a glider. If you use a glider for its intended purpose, it can be brilliant. But load it with cargo and aim it at New York from Rome, and it won’t just fail — it won’t even get off the ground. Just ask aviation experts. They’ll tell you not to load it in the first place.
This should be obvious, but apparently it is not. I imagine that, at the dawn of aviation, many people made similar mistakes. Perhaps that’s simply the natural course of things. The technology is already great and will undoubtedly evolve. But let’s not kid ourselves: there’s still a long road ahead to pass the Turing Test for Spoken Language Translation.
Hi, Claudio. The WHO had to do it, with reliable and trusted resources, because the general public and those who hold the purse strings choose to believe AI is perfect. These individuals are responding to well-crafted and well-worded advertising. In the meantime, crucial information is missed and concepts are misinterpreted; however, since the general public does not understand the source language, all of this goes undetected until a larger problem arises.
Thank you for this one little piece in particular: “What puzzles me is how we’ve reached a point where stakeholders either expect AI to deliver the same quality they once paid a premium for from highly skilled interpreters or, conversely, seek proof that it cannot. Why this disconnect between expectations and commonsense reality?”
Warm regards.