DoblAI: AI for easy and fast dubbing of multimedia content

February 2, 2026

The AgroTech research group at the Universitat Politècnica de Catalunya – BarcelonaTech (UPC) and its spin-off Ugiat Technologies are behind DoblAI, an AI platform that integrates transcription, translation, subtitling and video dubbing into a single workflow. The solution, which uses deep learning and either cloned or default voice models, is designed specifically for the journalism and communications sector.


In the digital era, audiovisual content has become one of the key formats for reaching global audiences. Media organisations, communications departments and content creators face a growing challenge: producing accessible, multilingual videos without increasing costs or extending production timelines. Manual transcription, translation and dubbing remain slow, fragmented and often outsourced processes that limit how far content can scale and how easily it reaches international audiences. In this context, tools that automate these steps are essential for improving operational efficiency and for staying competitive in global markets saturated with information.

DoblAI is an AI-based service for automatic video transcription, translation, subtitling and dubbing. Users can upload audio or video files (WAV, MP3, MP4) or work directly from links to content hosted on online platforms. They then select the source language and one or more target languages, and choose whether to generate a transcription only, use standard voices, or clone the original speakers' voices.
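The article does not describe DoblAI's programming interface, so the following is only a minimal Python sketch of the choices a user makes when starting a job. The `DubbingJob` class, the `Mode` values and the validation rules are hypothetical; only the accepted file types and the three output options come from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import PurePosixPath

# File types listed in the article; any other local file is rejected here.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".mp4"}


class Mode(Enum):
    """The three output options described in the article (names are hypothetical)."""
    TRANSCRIPTION_ONLY = "transcription_only"
    STANDARD_VOICES = "standard_voices"
    CLONED_VOICES = "cloned_voices"


@dataclass
class DubbingJob:
    """Hypothetical description of one job; not DoblAI's real API."""
    source: str                      # local file name or platform URL
    source_language: str             # e.g. "ca"
    target_languages: list[str] = field(default_factory=list)
    mode: Mode = Mode.TRANSCRIPTION_ONLY

    def validate(self) -> None:
        """Raise ValueError if the job cannot be processed (assumed rules)."""
        is_url = self.source.startswith(("http://", "https://"))
        if not is_url:
            ext = PurePosixPath(self.source).suffix.lower()
            if ext not in SUPPORTED_EXTENSIONS:
                raise ValueError(f"unsupported file type: {ext or '(none)'}")
        # Dubbing with standard or cloned voices needs at least one target language.
        if self.mode is not Mode.TRANSCRIPTION_ONLY and not self.target_languages:
            raise ValueError("dubbing needs at least one target language")
```

For example, `DubbingJob("interview.mp4", "ca", ["en", "es"], Mode.CLONED_VOICES).validate()` passes, while a `.txt` file raises `ValueError`.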

The system combines video and speech analysis techniques: speaker identification (diarisation), transcription in the original language, machine translation and text-to-speech (TTS) synthesis. The result is an integrated workflow that helps journalists, editors and communicators produce accessible, multilingual content faster. Through an intuitive web interface, users can review and edit results before publishing, which reduces labour-intensive manual work and brings all stages together in a single tool.
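How DoblAI combines the output of diarisation with the transcription is not described. One common approach, shown here as an assumption rather than the platform's actual method, is to give each transcribed segment the speaker whose diarisation turns overlap it for the longest time. The `Turn` and `Segment` types and the `assign_speakers` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    """A diarisation turn: who speaks from start to end (seconds)."""
    speaker: str
    start: float
    end: float


@dataclass
class Segment:
    """A transcribed segment; speaker is filled in by assign_speakers."""
    text: str
    start: float
    end: float
    speaker: Optional[str] = None


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: list[Segment], turns: list[Turn]) -> list[Segment]:
    """Label each segment with the speaker whose turns overlap it the most."""
    for seg in segments:
        totals: dict[str, float] = {}
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + ov
        # Segments with no overlapping turn are left unlabelled (None).
        seg.speaker = max(totals, key=totals.get) if totals else None
    return segments
```

Summing overlap per speaker, rather than taking the single longest turn, keeps the choice stable when one speaker's turn is split into several short pieces around an overlapping voice.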

Beyond the core automatic speech recognition, machine translation and speech synthesis technologies, the solution includes features such as speaker diarisation (identifying who is speaking at each moment, even when voices overlap), synchronisation of the dubbed voices with the timing of the original speech, and audio enhancement to reduce background noise. The system can produce transcriptions and translations in more than 90 languages and dubbing in more than 40. Dubbing is not currently available in the demo version, but a dubbing demonstration can be requested.
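The article mentions synchronising the dubbed voices with the original speech timing without explaining the mechanism. A simple technique, sketched below under the assumption that the synthesized clip can be time-stretched, is to compute the playback rate that makes the clip fit the original segment and clamp it to a range that still sounds natural. The limits `MIN_RATE` and `MAX_RATE` and the function `fit_to_slot` are hypothetical; real systems may instead shorten the translation or shift neighbouring segments.

```python
from dataclasses import dataclass

# Hypothetical limits: stretching speech too far sounds unnatural,
# so the rate is clamped. Real systems would tune these values.
MIN_RATE = 0.8
MAX_RATE = 1.25


@dataclass
class FitResult:
    rate: float        # playback-rate factor (>1 means faster speech)
    duration: float    # dubbed duration after applying the rate (seconds)
    overflow: float    # seconds the dubbed clip still exceeds the slot


def fit_to_slot(tts_duration: float, slot_start: float, slot_end: float) -> FitResult:
    """Choose a rate so a synthesized clip matches the original segment length."""
    slot = slot_end - slot_start
    if slot <= 0 or tts_duration <= 0:
        raise ValueError("durations must be positive")
    # The ideal rate is tts_duration / slot; clamp it to the allowed range.
    rate = min(max(tts_duration / slot, MIN_RATE), MAX_RATE)
    duration = tts_duration / rate
    return FitResult(rate, duration, max(0.0, duration - slot))
```

Any remaining `overflow` marks a segment where the translation is too long for the available time, the kind of case a reviewer would fix by editing the text or adjusting timings.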

A web-based editor allows users to review and modify results: edit texts, adjust timings, change voices or speakers, and regenerate the dubbing. Finally, users can download the dubbed video or audio, as well as subtitles in standard formats (VTT, SRT and TXT), all within a single workflow designed for non-technical professionals, especially journalists and communications teams.

The UPC audiovisual technologies group (AgroTech) has contributed its expertise by validating how the different system components work and identifying the most suitable models. It has also taken part in other research tasks, such as improving the databases used for automatic language identification. The system has been tested extensively on a range of TVE programmes to assess its performance under varying conditions, including different levels of ambient noise and several people speaking at once.

Collaboration with the spin-off Ugiat Technologies has been key in translating research outcomes into an applied solution with direct impact on the media sector.

Budget and funding

The project ran for one year (2023) and had a total budget of €30,000.


