Speech recognition: a must-have or a nice-to-have?

Solutions for automated speech recognition, or speech to text, have made tremendous progress in accuracy, scalability and usability. More and more organizations see the benefits this type of solution has to offer, while others still struggle with the question of what it can do for them. In other words, is speech to text a must-have or a nice-to-have? In my humble opinion speech to text most definitely is a must-have, provided it’s leveraged the right way. Let me share a couple of examples of how speech recognition can save time and resources, make content easier to search, and help meet legislative demands in the area of digital accessibility.

Digital accessibility: a legal obligation

In most European countries, but also in the U.S. and the Philippines for example, companies have a legal obligation to make sure their content is accessible to everyone. In Europe, for example, a new standard will go into effect on the 23rd of September 2020 requiring all governments to improve digital accessibility. At Scriptix we have been working closely with our partner Arbor Media to help local governments do just that. Together we worked on a customized speech to text model that handles political jargon better. Arbor Media has integrated the solution into their new platform, enabling automated subtitles for VOD streams for municipalities. Thanks to this cooperation, these governmental institutions can meet the European standard in a fully automated way while saving time and resources.

Bigger archives require better metadata  

Apart from the legal obligation, we see that companies are looking for ways to get a better grip on their content. The amount of audio/video content is growing exponentially and, as a result, so are the content archives of broadcast companies, streaming services, political bodies such as parliaments, educational institutions, and also influencers on YouTube, for example. The common ways of adding metadata to all that AV content are becoming insufficient to search through it adequately. There are all kinds of new ways to index content; think of face and object detection, for example. However, the most relevant information lies in the audio. Think about it: a face or object cannot be detected in a radio broadcast, but nearly all content that’s being produced contains audio. And it is in that audio that we can find a lot of relevant information using speech recognition. First and foremost, all spoken words can be turned into text, which makes the content directly searchable because you now have the words in writing. For the content creators out there, this also greatly helps with your Google ranking, since the search engine indexes based on text. But there’s more. Using speech to text, we can also gather additional relevant information from audio. It’s possible to detect speakers, so when you’re browsing or indexing your archive you can quickly find out not only what is being said, but also who’s saying it.
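To make the idea of searching on both what is said and who says it a bit more concrete, here is a minimal sketch in Python. It assumes a transcript is available as a list of segments with a speaker label, a start time and the recognized text. The `Segment` class, the `build_index` and `search` functions and the sample sentences are all made up for illustration; they are not the Scriptix output format or API.

```python
from collections import defaultdict
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Segment:
    """One stretch of speech from a transcript (hypothetical format)."""
    speaker: str   # label produced by speaker detection
    start: float   # offset into the recording, in seconds
    text: str      # recognized words


def build_index(segments):
    """Map each lowercased word to the positions of the segments containing it."""
    index = defaultdict(set)
    for position, segment in enumerate(segments):
        for word in re.findall(r"\w+", segment.text.lower()):
            index[word].add(position)
    return index


def search(segments, index, word, speaker=None):
    """Return segments containing `word`, optionally limited to one speaker."""
    hits = sorted(index.get(word.lower(), ()))
    return [segments[i] for i in hits
            if speaker is None or segments[i].speaker == speaker]


# Made-up sample data standing in for a real transcript.
segments = [
    Segment("Speaker 1", 0.0, "The council will vote on the budget tonight."),
    Segment("Speaker 2", 6.5, "The budget leaves too little for public transport."),
    Segment("Speaker 1", 12.0, "Public transport is covered in the next item."),
]
index = build_index(segments)

for hit in search(segments, index, "transport", speaker="Speaker 1"):
    print(f"{hit.start:>6.1f}s  {hit.speaker}: {hit.text}")
```

Because every hit carries its start time, a search result can point straight to the right moment in the recording. A real archive would likely use a proper search engine rather than an in-memory dictionary, but the principle is the same.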

Automate your process to save time and money 

The fact that speech to text is automated greatly reduces the time needed to add metadata to content, or to create subtitles, for example. Cloud services have really taken off recently, giving us the possibility to scale up like never before. Hundreds of thousands of hours of content can now be processed in a heartbeat. Imagine all the metadata that had to be added manually before, which can now be added in a fully automated fashion. The same goes for subtitles. At Scriptix we work with various broadcasters and providers of subtitle software. The latter can now ingest content into their platform while having it transcribed automatically with our speech to text solution. This way their clients only need to make a few minor adjustments to get the subtitles ready for airing. In other words, the entire process of creating subtitles has become much more efficient, since the bulk of the work has already been done. That enables the users of such software to work through much more content, which, as we know, is increasing at an exponential rate.
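As a rough illustration of why little manual work is left once the transcript exists, the sketch below turns timed transcript segments into SubRip (SRT) subtitle text in Python. It assumes the speech to text output has already been reduced to `(start, end, text)` tuples with times in seconds; the function names and sample cues are invented for this example and do not describe how Scriptix or any subtitle software does it internally.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp style SRT uses."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues):
    """Turn (start, end, text) tuples into the text of an SRT file."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        # Each cue: sequence number, time range, text, then a blank line.
        blocks.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


# Made-up sample cues standing in for real speech to text output.
cues = [
    (0.0, 2.5, "Good evening and welcome."),
    (2.5, 6.04, "Tonight we discuss the budget."),
]
print(to_srt(cues))
```

An editor then only has to correct the occasional misrecognized word or adjust a line break, rather than typing and timing every cue by hand.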

Whether automated speech recognition is a must-have or a nice-to-have depends on the use case, and there are many out there. We believe the ones described above, each to a certain degree, require new techniques such as speech to text to save time, meet legal demands or keep you from losing sight of the wood for the trees when indexing your media archives. For these use cases, at least, speech to text is a must-have.

What are your thoughts? Do you believe speech to text is a must-have or a nice-to-have? Let’s get in touch and discuss some more. Drop me an email at: [email protected]
