Dr. Claudio Fantinuoli
October 29, 2025

InterpretBank ASR 3.0 – Some thoughts from behind the scenes

A few days ago, we finally released InterpretBank ASR 3.0. This version means a lot to me — not because it’s “new”, but because it feels right — or at least that’s my genuine feeling about it. It took a few years, and a few wrong turns, to get here. But I think the wait was worth it.

When I first started working on Automatic Speech Recognition (ASR) for interpreters back in 2017, it was just a small prototype for a conference paper presented in London. The idea was simple, and maybe a bit ahead of its time: could speech recognition and real-time suggestions for terminology and numbers be used to support simultaneous interpreters while they work? Not to replace them, but to help: to make the job a little lighter, the listening a little sharper, and ultimately the translation a little better.

The first proper version accessible to the public came out in 2019, and together with my paper, it quickly found its way into labs and interpreting booths. Researchers and professionals began experimenting with it, often showing me uses I hadn’t even imagined. That period was exciting. The idea that AI (though at the time we didn’t even call it that) could genuinely assist interpreters started to feel real. The number of scholars writing papers — even several PhD theses — using InterpretBank ASR was surprisingly high.

At the same time, the slow adoption rate and, admittedly, my own waning enthusiasm for that particular line of research meant that the project never received the attention it truly deserved. Still, together with the University of Ghent, we launched a European Commission–funded project (EABM) to explore what a “virtual boothmate” might look like. We had funding, ran an experiment with more than twenty professional interpreters, and gathered valuable insights, though the results were not especially groundbreaking in terms of redefining what a live CAI tool could be.

Meanwhile, the software itself received little of my attention. I was deeply involved in my work as CTO at KUDO, where we were developing several other exciting projects, among them KUDO Assist, an AI-driven system designed to support interpreters in real time.

Then came version 2, and honestly, it was probably the worst tool I’ve ever designed. It looked clumsy. It didn’t feel right. It had too much of my idea of how it should be, and too little of how interpreters actually wanted it to be. That was a good lesson.

When you design a tool that hasn’t existed before, you do need some intuition — your vision of how it might work — but for the most part, you need to listen to your users. You need to understand the challenges they face, and how they, not you, want the tool to support them. Lesson learned.

So, in the last few months, I went back to the beginning. This time, I sat down with twenty interpreters, along with a few expert trainers who give workshops with InterpretBank, listened to their feedback, and rebuilt the tool from scratch, with their workflow, their stress points, and their needs in mind.

The New InterpretBank ASR 3

Now, ASR 3.0 feels different. It listens, transcribes, highlights numbers, and suggests terminology from curated glossaries. It can even translate on the fly, including live translation of the transcript itself. There’s also a simple drawing pad for note-taking, useful in healthcare or legal interpreting.
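
Just to give a flavour of the number part: the sketch below shows how figures in a transcript segment could be wrapped in highlighting markup with a simple regular expression. This is only an illustration in Python; the pattern, the `<b>` markup, and the function name are my own assumptions, not InterpretBank’s actual code.

```python
import re

# Hypothetical pattern for integers, decimals, and thousand-separated figures.
NUMBER_PATTERN = re.compile(r"\b\d+(?:[.,]\d+)*\b")

def highlight_numbers(transcript_segment: str) -> str:
    """Wrap every number in the segment in <b>…</b> so the UI can render it prominently."""
    return NUMBER_PATTERN.sub(lambda m: f"<b>{m.group(0)}</b>", transcript_segment)

print(highlight_numbers("Revenue grew by 12.5% to 3,400,000 euros in 2024."))
# -> Revenue grew by <b>12.5</b>% to <b>3,400,000</b> euros in <b>2024</b>.
```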

Interestingly, many people were using it not for simultaneous interpreting, as it was originally designed, but for consecutive or dialog interpreting (read this article about the mistake we made as a community). So, the interface can now be fully customized, from font size to layout, something truly unique in this space. In the end, all the research carried out by so many scholars hasn’t provided much insight into how the tool should be designed. So perhaps the best solution is to let each user decide what it should do, and how it should look.

In this new version, everything is intentionally minimalist, because interpreters don’t need more noise.

Yet beneath that simplicity, there’s quite a lot of AI and NLP at work. For instance, the system can retrieve glossary terms even when they appear in inflected forms: singular or plural, different tenses, and so on. The engine always finds the right match from the user’s glossaries. This happens server-side, through real-time syntactic and part-of-speech analysis. To my knowledge, competing tools don’t do this and only match exact word forms. An immense difference1.
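
For readers curious about what matching inflected forms means in practice, here is a rough sketch in Python using spaCy as the morphological analyser. The glossary entries, the function name, and the library choice are assumptions made purely for illustration; the actual server-side engine is of course a different, more sophisticated piece of machinery.

```python
import spacy

# Illustrative only: a tiny English glossary keyed by lemma (base form).
nlp = spacy.load("en_core_web_sm")

GLOSSARY = {
    "clutch": "Kupplung",
    "run": "laufen",
}

def suggest_terms(transcript_segment: str) -> list[tuple[str, str]]:
    """Return (surface form, target term) pairs for every glossary hit.

    Matching is done on lemmas, so 'clutches', 'ran', or 'running' all
    resolve to their base forms before the lookup.
    """
    doc = nlp(transcript_segment)
    hits = []
    for token in doc:
        target = GLOSSARY.get(token.lemma_.lower())
        if target:
            hits.append((token.text, target))
    return hits

print(suggest_terms("The clutches were slipping while the engine was running."))
# Expected (model-dependent): [('clutches', 'Kupplung'), ('running', 'laufen')]
```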

Looking at it now, I realize this project has always been a dialogue — between technology and people, between how things could be and how they should actually work. And I think this version finally found that balance.

  1. This means that, for such tools to match the word “clutches” in a speech, the plural form must be entered in the database (the singular “clutch” will not match). InterpretBank ASR, by contrast, can match every variation of the form.
