“WOW it’s very accurate!!” This is how new customers and partners react the first time they try SayIt from nVoq. But it hasn’t always been a slam dunk like this.

When we built the first version of SayIt ten years ago, we did so with the best technology available. At the time, automatic speech recognition was dominated by Gaussian distribution acoustic modeling and statistical language modeling based on the probability of n-word sequences. Incremental improvements were developed by, and concentrated inside, a few large companies. This technology was good for many use cases but was somewhat temperamental and didn’t deliver the great end-user experience we were all searching for. Acoustic models needed to be trained for each speaker, and changes in microphones, mobile devices, or background noise often led to poor results. But that has changed.

About the Author

Jon Ford

VP Engineering

New technologies are driving disruptive change in speech recognition. Graphics processing units (GPUs) and corresponding advances in machine learning algorithms enable the training of neural networks in a fraction of the time required just a few years ago. For example, NVIDIA’s DGX-1 Deep Learning System uses GPUs to deliver 170 TFLOPS in a single rack-mounted server. Model construction and training that would have taken more than 6 days on CPUs can be done in two hours on one of these machines [1]. In fact, the DGX-1 would have been the fastest computer in the world in 2005 [2].

Neural networks and machine learning are not new. But this computing power is, and NVIDIA’s developer tools make it available. Open source machine learning libraries such as TensorFlow, Torch, and Warp-CTC, supported by research groups at Google, Facebook, Baidu, and Twitter, are changing the way we see, hear, and interpret our world. At nVoq, we use these tools and others to process enough text and audio to build speech models that deliver a fantastic real-time speech recognition experience. And, using open and off-the-shelf web technologies, we make this available via a zero-footprint HTTP API and web client.

The next generation of speech is here, driving the next round of improvements in efficient and accurate documentation. We welcome you to join us on this amazing journey.

[1] http://www.nvidia.com/object/deep-learning-system.html
[2] https://en.wikipedia.org/wiki/History_of_supercomputing#Historical_TOP500_table

Comments 1

Chad Hiner, RN, MS, Director, Healthcare Industry Solutions
July 26, 2016 at 4:23 pm

So interesting, Jon, thanks for sharing. It is exciting to be working for such an innovative company. Good things to come! – Chad

