Voice to Text and Beyond

VoIP News & PR
06-11-2014, 06:08 AM
How the world has changed over the past couple of decades! Over the last 15 years, the technology we use daily has evolved at an unprecedented rate. In the 1990s, email was a revolutionary tool for many people; today, it is merely a basic necessity.

A video made its way around the web last week in which several teenagers’ reactions were gauged as they viewed an educational video (http://www.pcmag.com/article2/0,2817,2458835,00.asp) about the Internet. After the kids watched the video, they were asked several questions, such as “How do you connect to the Internet?” and “Do you know what a modem does?” One girl, informed that her smartphone was not an option at that time, says, “I’ll stick to my phone,” after listening to the noise a modem makes while connecting to the Internet.

Those of us who remember that time can probably recall the thrill of early UC (unified communications) solutions like AIM (AOL Instant Messenger). The sound of the modem connecting meant you were on the cusp of Internet exploration and a sea of chat windows on your screen.

Of course, instant messaging is still used today as part of many VoIP-based solutions, but technology has improved substantially to offer more means of communication. Voice and video are now options not only on our computers but also on our phones and tablets. All of this would have seemed like science fiction 20 years ago.

The current state of technology in plain text

Whenever we look back to a time before we were born, it’s hard to imagine what life was like without certain conveniences. Before voicemail, a phone call was simply missed if no one was around to answer it, and it was not even logged by caller ID. It’s alarming for some of us to imagine a time when urgent calls requiring immediate action would have gone unanswered.

In the 1970s, when voicemail first became widely available, it was a big step for communications. Over the past several years, however, voicemail has declined in use among everyday consumers. There are several reasons for this. For example, many of us can see missed calls logged on our phones and call back pertinent contacts while ignoring others.

Further, thanks to texting and applications like Facebook Messenger, it is often much quicker to convey a message in writing than to spend the couple of minutes required to access and listen to a voicemail. This meme sums it up adequately:


Realizing that voicemail is on the decline, many providers have added extra features to it. For example, voicemail to text is a common feature, offered by many carriers and providers, that makes voicemail less cumbersome. And it’s all thanks to voice transcription.

Text to voice and vice versa

When you use features like Siri or Google voice search, you are using an evolving technology that is changing how we use the Internet. Converting text to speech is known as “speech synthesis,” and it was originally developed for people who have reading difficulties. Since then, a few companies have taken on the reverse task of converting speech to text, known as speech recognition, which is what these applications rely on to understand what you say.

Creating such software requires a masterful comprehension of linguistics, which delves much deeper into language than simply finding the meaning of words. Phonetics, syntax, tone and morphology are all taken into consideration. The US alone has many different dialects: essentially, a linguistic Rubik’s cube. To transcribe every region’s accent effectively, a large sample of speech from different areas must be analyzed and applied within the technology, and that is for US English alone!

When you perform a voice search in English on a mature platform, most words are recognized easily, with some exceptions. In a nutshell, the sound waves of an individual voice are fed into an algorithm and processed. This is possible because data from various voice search engines has been anonymously collected and analyzed against the intended search queries, and the results are then applied back to the application.
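To make the idea of comparing what was heard against intended queries a little more concrete, here is a minimal Python sketch. It is only a toy: real voice search works on audio and statistical models, not string matching, and the query list and the function name closest_known_query are invented for illustration.

```python
import difflib
from typing import List, Optional

# Hypothetical sample of previously logged queries (invented for illustration).
KNOWN_QUERIES = [
    "weather in boston today",
    "restaurants near me",
    "directions to the airport",
]


def closest_known_query(
    transcript: str, known: List[str], cutoff: float = 0.6
) -> Optional[str]:
    """Return the logged query most similar to the transcript, or None."""
    # get_close_matches ranks candidates by a character-level similarity ratio
    # and drops any candidate scoring below the cutoff.
    matches = difflib.get_close_matches(transcript.lower(), known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# A misheard "weather" still lands on the intended query.
print(closest_known_query("whether in boston today", KNOWN_QUERIES))
```

Here the misheard transcript differs from the logged query by only a couple of characters, so the lookup returns "weather in boston today"; a transcript with nothing in common with the list returns None.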

Translating with Skype

A new breakthrough in speech recognition technology was announced last week. Skype recently unveiled an early version of its technology that can translate speech into different languages (http://guardianlv.com/2014/05/skype-translator-makes-life-easier-video/). Data collected over the past 15 years has been compiled and analyzed with the help of Microsoft Research (the Microsoft division responsible for developing the Xbox Kinect peripheral). It is slated to be available in a beta format later this year after further development.

At this point, the translator is not very accurate, which should come as no surprise. Most of us in the US speak English as our first and often only language, and therefore do not realize its complexity. Beyond its various dialects, English is well known (and often criticized) for the slang and idioms that pervade everyday conversation. With so many exceptions to its rules, meanings that vary with context, homophones and other quirks, English is often cited as a difficult language to learn, so programming a machine to comprehend it is understandably challenging.
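Homophones are a handy way to see why context matters: “their,” “there” and “they’re” sound identical, so software has to guess the spelling from the surrounding words. The Python sketch below is a toy illustration of that idea, not how Skype or any real product works; the word-pair counts and the function name pick_homophone are made up for the example.

```python
from collections import Counter
from typing import List

# Hypothetical counts of (previous word, word) pairs; invented for illustration.
BIGRAM_COUNTS = Counter({
    ("over", "there"): 12,
    ("lost", "their"): 9,
    ("think", "they're"): 7,
})


def pick_homophone(previous_word: str, candidates: List[str]) -> str:
    """Pick the candidate spelling seen most often after previous_word."""
    # A Counter returns 0 for pairs it has never seen, so unseen pairs simply lose.
    return max(candidates, key=lambda word: BIGRAM_COUNTS[(previous_word, word)])


SOUND_ALIKES = ["their", "there", "they're"]
print(pick_homophone("over", SOUND_ALIKES))  # there
print(pick_homophone("lost", SOUND_ALIKES))  # their
```

Real systems draw on far more context than a single preceding word, which is part of why large amounts of collected speech data matter.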

Obstacles to overcome…

Translating a sentence word for word is one task, but creating a meaningful sentence is a more elaborate process. Consider an Eastern, non-Latin-based language. Here is an example of a Facebook post translated by Bing into English that illustrates the difficulty of many translation tasks:


Though the technology requires polishing, it will soon open doors in communication, especially for international business. Fortunately, thanks to information collected from the web and other data, more complex translation issues may be solved; facial recognition (http://www.cngl.ie/cngl-researchers-use-facial-recognition-to-translate-emotion-into-speech-output/) will even play a role. The challenges posed by English, which is thick with turns of phrase, and by Eastern languages, which often rely on inflection and volume, will someday be overcome by software, making the world a more connected place.