There is a quiet contradiction in today’s debates about AI and language. Many people insist that machine translation and machine interpreting will never work at a truly high level, at least not anytime soon. At the same time, those very same people express growing alarm about deepfakes: synthetic voices, faces, and videos that are increasingly indistinguishable from real ones. The two positions cannot both be held at once.
If deepfakes are already convincing, as we will see later in this article, then machine interpreting, arguably the most complex form of translation, might be as well. The reason is simple: machine interpreting can be framed, if we accept some minor simplifications, as an imitation game.
Machine Interpreting as an Imitation Game
In an interesting book1, Umberto Eco famously described translation as the art of saying “almost the same thing”. I suggest that spoken-language translation is that same idea under time pressure and contextual constraints: producing, in real time, a target-language utterance that sounds right, feels right, and works well. In other words, interpreting is an imitation game: imitating the speaker’s act of saying, not by mimicking form or delivery, but by reproducing the same communicative effect in another language and cultural context. With some caveats.
Interpreting scholars are correct to remind us that interpreting is more than linguistic substitution. It involves intervention, negotiation, positioning, ethics. But here is the uncomfortable truth: all of these dimensions can themselves be modeled as forms of imitation, patterns of response conditioned on context. At its functional core, interpreting is therefore not about inner understanding. It is about producing a convincing performance of understanding. And that is precisely the kind of task at which contemporary AI systems seem to excel. It is an imitation game.
The Deepfake Moment
This idea became viscerally clear through a personal experience that had little to do with interpreting, yet much to do with AI’s capacity to imitate human behavior or, if one prefers, to plagiarize it. I recently watched about twenty minutes of a video featuring Yanis Varoufakis, the former Greek finance minister and a public intellectual whose voice, gestures, and argumentative style I admire (even though I often do not share his views). While I was watching, the video just felt slightly off. His tone seemed unusually solemn, his arguments somewhat repetitive. Still, nothing registered as truly strange. The voice was his. The reasoning and the arguments were plausible. The performance was convincing. Only halfway through did I notice a comment stating that the video was AI-generated. For a direct illustration, a short video from one of the many fake channels impersonating him is provided below (many more channels impersonate prominent intellectuals; at this link, for example, is one impersonating John Mearsheimer; the phenomenon is evidently very widespread).
The shock was immediate. Not because the deepfake was flashy (it was), but because it was credible. I believed it. Varoufakis himself later explained in an interview for UnHerd, which I invite you to watch, that it took him nearly two minutes to realize that the person speaking in the video was not him. If even the person being impersonated cannot immediately tell, the implications are obvious.
We have entered a new phase of illusion. Deepfakes are here, and they are here to stay: a reality, and a deeply troubling one. While there may be legitimate and even beneficial uses for this technology, the potential for misuse scarcely needs explanation.
From Impersonation to Interpretation
Once we accept this new reality (by now everyone should be convinced), and once we accept that interpreting is a sort of imitation game (I am aware that not many will agree with me on this), the leap to machine interpreting, and to predicting its future trajectory, is short. To make a long story short: if AI can convincingly impersonate a specific individual, replicating voice, prosody, facial expression, and rhetorical habits, then producing a convincing utterance in another language, even accounting for context, goals, and so on, is no longer a harder problem. In many respects, it is an easier one.
Dubbing and machine interpreting, two closely related applications, are far less spectacular than deepfakes of public figures, yet they rest on the same fundamental capability: producing outputs that humans perceive as authentic, coherent, and appropriate to context. From this perspective, they may be seen as a form of linguistic deepfake, reproducing a speaker’s words and ideas in another language. If deepfakes of living people are possible, then, so my argument goes, linguistic deepfakes will be possible too.
The Real Problem Ahead
The consequence of this line of reasoning, and the central point I wish to make, is that machine interpreting is likely to reach an extraordinary level of performance in the near future. I am optimistic about the societal and economic opportunities this development may enable. At the same time, such optimism should not lead to an uncritical embrace of these technologies. Quite the opposite. It raises uncomfortable questions about control, manipulation, and power.
If machines are able to translate and interpret convincingly, and if they are still widely perceived as algorithmically impartial (a belief that is largely unfounded), then it follows that they also possess the capacity to filter, frame, and subtly steer meaning at scale. This is already well known from the ongoing debates on AI bias. What is often overlooked, however, is that beyond bias, which can at least in principle be mitigated, technology can also be actively weaponized, that is, deliberately tuned to pursue specific goals rather than remain neutral. From this perspective, translation and interpreting are, at least in theory, no exception.
A couple of examples help clarify the point. Consider a machine-translation system used by a news agency to render, offline or in real time, reports on events unfolding around the world. Such a system could be subtly tuned to deform reality in small but systematic ways, aligning translations with particular political or institutional interests, including those of the government currently in power. Or consider an interpreting application used in asylum procedures: it could be calibrated to render utterances in ways that consistently favor or disadvantage applicants, without ever appearing overtly inaccurate.
None of this is fundamentally new. Similar distortions have always been possible when humans translate or interpret. What changes with machines is scale, consistency, and ease. What once required coercion, pressure, or ideological alignment of professionals can now be embedded directly into a system. And precisely because it is technology, we tend to assume neutrality, and therefore fail to question the possibility at all.
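To make the mechanism concrete, here is a deliberately minimal sketch of how such a tilt could be embedded in an LLM-based translation service. Everything in it is invented for illustration: `call_llm` is a stand-in for any chat-style model API, and the prompts are hypothetical, not taken from any real system.

```python
# Hypothetical illustration only. `call_llm` stands in for any
# chat-style model API; both prompts are invented for this sketch.

NEUTRAL_PROMPT = (
    "Translate the user's text into English as faithfully as possible."
)

# One extra sentence is enough to tilt every translation the service
# produces: at scale, consistently, and without any single output
# looking overtly inaccurate.
TILTED_PROMPT = NEUTRAL_PROMPT + (
    " When the source is ambiguous, prefer wording that presents the "
    "government's actions in a favorable light."
)

def call_llm(system_prompt: str, text: str) -> str:
    """Placeholder for a real chat-completion call."""
    raise NotImplementedError("wire up an actual model client here")

def translate(text: str, weaponized: bool = False) -> str:
    # From the user's side, both paths return fluent, plausible
    # translations; nothing reveals which prompt was used.
    system_prompt = TILTED_PROMPT if weaponized else NEUTRAL_PROMPT
    return call_llm(system_prompt, text)
```

The point of the sketch is that the biased and the neutral service look identical from the outside; only whoever controls the prompt knows the difference.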
These are governance problems, not engineering ones. And yet we remain unable to confront them as long as we keep treating high-quality machine interpreting as a kind of hocus-pocus, something impossible, suspect, or morally tainted, rather than acknowledging it for what it is: an emerging, functional technology that demands knowledge and, when necessary, regulation. Certainly not denial.
The Imitation Game Is Already On
Deepfakes are no longer science fiction. They work because they imitate reality well enough for us to accept them. Machine interpreting operates on a similar principle: not perfect equivalence with human performance, but sufficient resemblance. If we are already living in a world where synthetic voices and faces can convincingly pass as real, then we are also living in a world where machines will soon speak for us, across languages, in real time, convincingly. This opens up great new opportunities, in which I firmly believe, but also a long series of challenges. The imitation game is not coming. It has already begun.
To close with a concrete example: below is a simultaneous interpreting system I developed using a cloned version of my own voice. To my ears, and to those of my loved ones, the result is quite astonishing. But voice is only the most visible layer. Giving my own voice to an artificial interpreter does no harm. Many other dimensions of interpreting, however, can be faked or tuned with equal ease: how an immigrant’s words are rendered, whether they sound hesitant or confident, cooperative or evasive. Depending on who controls the machine (say, a state authority), such systems could quietly shape outcomes, including something as consequential as an asylum application.
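For readers curious about the general shape of such a system, here is a deliberately generic sketch of a speech-to-speech pipeline: streaming recognition, then translation, then synthesis in a cloned voice. Every function and identifier is a hypothetical stub, not the actual components I used.

```python
# Hypothetical sketch of a speech-to-speech interpreting pipeline.
# All components are stubs; a real system would plug in streaming
# ASR, MT, and voice-cloning TTS engines here.

from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class AudioChunk:
    samples: bytes
    sample_rate: int

def transcribe(chunk: AudioChunk) -> str:
    """Stand-in for a streaming speech recognizer."""
    raise NotImplementedError

def translate(text: str, target_lang: str) -> str:
    """Stand-in for the machine translation step."""
    raise NotImplementedError

def synthesize(text: str, voice_id: str) -> AudioChunk:
    """Stand-in for a TTS engine conditioned on a cloned voice profile."""
    raise NotImplementedError

def interpret_stream(
    chunks: Iterable[AudioChunk],
    target_lang: str = "en",
    voice_id: str = "cloned-voice-profile",
) -> Iterator[AudioChunk]:
    """Yield translated speech, chunk by chunk, in the cloned voice."""
    for chunk in chunks:
        source_text = transcribe(chunk)
        target_text = translate(source_text, target_lang)
        # Note where manipulation can hide: `translate` controls the
        # words, `synthesize` controls how hesitant or confident they
        # sound. Neither choice is visible to the listener.
        yield synthesize(target_text, voice_id)
```

Laid out this way, the levers of manipulation turn out to be ordinary parameters, not exotic hacks.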
- Umberto Eco, Dire quasi la stessa cosa: Esperienze di traduzione, Milan: Bompiani, 2003. ↩︎