I have always missed having subtitles for the Croatian TV series Bitange i princeze, because the actors speak pretty fast. Recently, I found out that Google Cloud offers Speech-to-Text conversion. However, to use Google Cloud you have to provide your credit card information, and I already use Amazon AWS for cloud services. Thus, it did not seem reasonable to register a Google account just to try subtitling a series.

However, Google allows you to test the service with audio files of up to one minute. I extracted a few scenes from an episode and tried it, and the results did not seem bad. I also tried a few excerpts from German movies, but with those I was not very pleased: the transcriptions did not match the dialogues very well. So even though the Croatian results for my series seemed OK, I did not pursue it further.

Later, an acquaintance recommended the project autosub to me. This tool takes a very clever approach: it splits the video into individual sections with speech activity (i.e. it makes a cut wherever nobody seems to be talking) and then sends these sections to the Google Web Speech API for transcription. As far as I understand, the Google Web Speech API is the endpoint Google normally uses for speech recognition in its Chrome browser. For example, when you give your browser a voice command (probably more common on a phone), the browser forwards the audio to Google and receives the transcription in return.
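To make the splitting idea more concrete, here is a minimal sketch of one simple way to find speech regions: compute the RMS energy of fixed-size frames and treat the quietest frames as silence. This is an assumption about the general technique, not autosub's actual code; the function names and the tuning values (frame size, the 20% quiet fraction, minimum and maximum region lengths) are hypothetical choices. It expects already decoded mono PCM samples; extracting the audio track from the video (e.g. with ffmpeg) is left out.

```javascript
// Hypothetical sketch of energy-based speech detection; not autosub's code.

// Root-mean-square energy of one frame of samples.
function rms(frame) {
  let sum = 0;
  for (const sample of frame) sum += sample * sample;
  return Math.sqrt(sum / frame.length);
}

// Value at the given fraction of the sorted list (e.g. 0.2 -> 20th percentile).
function percentile(values, fraction) {
  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.floor(fraction * sorted.length));
  return sorted[index];
}

// Returns [startSec, endSec] pairs where the audio seems to contain speech.
function findSpeechRegions(samples, sampleRate, {
  frameSize = 4096,     // samples per analysis frame (assumed value)
  quietFraction = 0.2,  // frames at or below the 20th percentile count as silence
  minRegion = 0.5,      // drop regions shorter than this many seconds
  maxRegion = 6,        // force a cut once a region reaches this many seconds
} = {}) {
  const frameDuration = frameSize / sampleRate;
  const energies = [];
  // A trailing partial frame is ignored.
  for (let i = 0; i + frameSize <= samples.length; i += frameSize) {
    energies.push(rms(samples.slice(i, i + frameSize)));
  }
  if (energies.length === 0) return [];
  const threshold = percentile(energies, quietFraction);

  const regions = [];
  let start = null; // start time of the currently open region, if any
  energies.forEach((energy, index) => {
    const time = index * frameDuration;
    const silent = energy <= threshold;
    const tooLong = start !== null && time - start >= maxRegion;
    if (start !== null && (silent || tooLong)) {
      // Close the open region at the start of this frame.
      if (time - start >= minRegion) regions.push([start, time]);
      // After a forced cut during speech, a new region starts right away.
      start = silent ? null : time;
    } else if (start === null && !silent) {
      start = time; // speech begins in this frame
    }
  });
  // Close a region that is still open when the audio ends.
  const end = energies.length * frameDuration;
  if (start !== null && end - start >= minRegion) regions.push([start, end]);
  return regions;
}
```

The maximum region length forces a cut during long uninterrupted speech, which presumably keeps each chunk short enough to be sent to a speech API in a single request.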

In fact, the Web Speech API is a standard browser API that different browsers can implement. Each browser vendor can choose its own recognition backend, but the JavaScript interface is the same. This means that autosub probably does something similar to what this JavaScript snippet (from Wikipedia) would do in a browser:

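// Chrome exposes the constructor only with a "webkit" prefix
var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
var recognition = new SpeechRecognition();
recognition.lang = 'de'; // set the recognition language to German
recognition.onresult = function (event) {
  if (event.results.length > 0) {
    alert(event.results[0][0].transcript); // show the first result
  }
};
recognition.start();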

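We can use this tool to subtitle the series. It is very simple to use: you only have to specify the language and the input file, and autosub automatically names the subtitle file after the video. The option -S sets the source language (the language spoken in the video) and -D the destination language of the subtitles; since both are Croatian (hr) here, no translation is involved.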

autosub -S hr -D hr Bitange-i-princeze.mp4
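Each transcribed speech region ends up as one numbered subtitle entry. Assuming SRT output (which I believe is autosub's default format), the following sketch shows how a list of (start, end, text) entries could be turned into the body of an .srt file, and how the subtitle name can be derived from the video name. The helper names `formatTimestamp`, `toSrt` and `subtitlePathFor` are hypothetical; this illustrates the file format, not autosub's implementation.

```javascript
const path = require('node:path');

// Format seconds as an SRT timestamp "HH:MM:SS,mmm".
function formatTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width) => String(n).padStart(width, '0');
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Turn [start, end, text] entries into SRT: a running index, a
// "start --> end" line, the text, and a blank line between entries.
function toSrt(entries) {
  return entries
    .map(([start, end, text], i) =>
      `${i + 1}\n${formatTimestamp(start)} --> ${formatTimestamp(end)}\n${text}\n`)
    .join('\n');
}

// Derive the subtitle name from the video name, e.g. "a.mp4" -> "a.srt".
function subtitlePathFor(videoPath) {
  const { dir, name } = path.parse(videoPath);
  return path.join(dir, `${name}.srt`);
}
```

For example, `subtitlePathFor('Bitange-i-princeze.mp4')` returns `Bitange-i-princeze.srt`, and an entry from 1 s to 2.5 s gets the timing line `00:00:01,000 --> 00:00:02,500`.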

With a simple batch loop, this command can easily be run on all available files from the series.

Bitange i princeze subtitled with autosub

The recognition is, of course, far from perfect, but in some scenes it helps me understand the dialogue better. Sometimes Google's transcription is better than my own understanding of Croatian, but sometimes I also spot errors in it.

I do not maintain a comments section. If you have any questions or comments regarding my posts, please do not hesitate to send me an e-mail to blog@stefan-koch.name.