Manufacturers compete in speech recognition: who can ignite the market?

How long will it be before people can use intelligent, advanced voice assistants like the one in Iron Man? Last year, several domestic speech recognition manufacturers announced new speech recognition strategies. Natural voice interaction between humans and machines seems to be gradually approaching.

HKUST Xunfei announced that its independently developed offline voice dictation engine will be applied to "Xunfei input method" and other products, meeting users' demand for voice technology when there is no network or only a weak one. A few days earlier, another company, Spitz, announced at an industry salon that it would redefine the direction of the human-computer interaction experience, advocating that machines should move from being able to speak to being able to listen.

Foreign giants are also active in speech recognition. Some foreign media have reported that Microsoft is developing its own voice personal assistant software, code-named "Cortana", and plans to launch it in the next major upgrade of the Windows Phone platform to compete with Google Now and Apple's Siri.

As Li Jianhui, vice president and general manager of the dialogue workshop, said, the development of smart devices and the arrival of the mobile internet era have made perceptual computing the future direction of human-computer interaction, which calls for more natural, intuitive and immersive ways of interacting.

Zhang Jidong, deputy general manager of HKUST Xunfei's Mobile Internet Division, described the evolution of speech recognition products as a marathon. Along the way, many manufacturers have dropped out: Sogou's voice assistant is no longer promoted prominently, Airi stopped updating a year ago, and small i robot has switched to the B2B market.

Some manufacturers are exiting while new ones are entering. A new round of positioning and competition around speech recognition applications has begun.

A poor voice interaction experience

Although the speech recognition rate of the Xunfei input method can exceed 95%, across speech recognition applications as a whole the user experience at this stage can only be described as poor.

On the one hand, there is an inherent flaw: errors in voice interaction propagate easily from one stage to the next. "If the accuracy rate of speech recognition is between 85% and 95%, and the accuracy rate of semantic analysis is between 85% and 95%, the accuracy rate of the final recognition is only 70% to 90%," said Yu Kai, chief scientist of Spitz.
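The arithmetic behind this quote can be sketched as follows. This is a minimal illustration, assuming the two stages fail independently so that their accuracies simply multiply; the function name is hypothetical and the independence assumption is not stated in the article.

```python
def end_to_end_accuracy(recognition: float, understanding: float) -> float:
    """Accuracy of a two-stage pipeline.

    Assumption (not from the article): errors in the two stages are
    independent, so the request succeeds only if both stages succeed.
    """
    return recognition * understanding


# Worst case and best case from the ranges quoted above.
low = end_to_end_accuracy(0.85, 0.85)
high = end_to_end_accuracy(0.95, 0.95)

print(f"end-to-end accuracy: {low:.1%} to {high:.1%}")
```

Under this assumption the range comes out at roughly 72% to 90%, in line with the approximate "70% to 90%" figure in the quote.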

Offline voice technology is even harder. At present, the two international giants Google and Apple, as well as HKUST Xunfei, have offline voice technology. However, without a network connection and with limited storage space, the success rate of Xunfei's offline voice recognition is only about 85%, "just reaching the usable level."

On the other hand, because speech recognition has a very high technical threshold, the evolution toward advanced functions has only just begun. "From speech evaluation and speech synthesis to the understanding of natural semantics, each direction requires a sufficient corpus and algorithms for continuous optimization," Zhang Jidong said.

While optimizing the technology, it is also necessary to build an ecosystem. For example, a question such as which movies Andy Lau has appeared in might be answered from community Q&A or from a knowledge graph of music and video.

"It is a trend that natural voice-based interaction is becoming more and more convenient and will replace keyboard input and other methods, but it is not yet time for it to become a must-have," Zhang Jidong said.

Heavy investment in speech recognition

Despite the difficulties, the general direction of speech recognition technology is irreversible.

"All mobile phone manufacturers are investing in voice, expanding voice technology, creating more elegant designs and integrating them deeper into mobile phones," said Michael Thompson, senior vice president of speech recognition technology company Nuance.

Although Apple's Siri has been mocked repeatedly, and even called one of Apple's biggest failures, Apple's investment has continued to increase. Apple even set up a mysterious office near the Massachusetts Institute of Technology (MIT) to research and develop Siri's speech recognition technology. Yu Kai revealed that the staff of Siri's voice technology department is kept at a ratio of 1:4: for every one person studying speech input and output, four work on natural language processing to overcome the difficulty of natural voice interaction.

Domestic manufacturers deeply involved in speech recognition have also received investment for R&D. The year before last, Spitz received joint investment from Lenovo and Tus. China Mobile, through a subsidiary, acquired a 15% stake in HKUST Xunfei for 1.363 billion yuan, and in December of that year the two jointly launched the intelligent voice portal product "Lingxi". Lingxi supports functions such as voice calls, text messages, and weather checks.

Who can ignite voice interaction?

"Sometimes it may simply come from persevering, or it may even be driven by something from another direction in the future," Zhang Jidong said. He sees WeChat as one such driver.

When WeChat first launched, many people were confused to see other users "talking to themselves" on their mobile phones. Later, they discovered that it was WeChat's voice intercom function. Now, people have become accustomed to talking into WeChat.

Zhang Jidong believes that the next thing likely to ignite speech recognition applications is the increasingly popular category of wearable devices. For example, a bracelet can upload user data to the cloud, where it is analyzed to produce personal health suggestions. If the data shows that a user's schedule is irregular, the voice assistant can even give a voice prompt when the user needs to rest.

More realistic applications are on wearable devices such as smart watches, where functions such as voiceprint recognition and voice wake-up can become typical uses. With the former, users can use their voice as the password to unlock the device; the latter wakes the device without the user touching it.

"We are also working with chip manufacturers to try to integrate voice recognition technology into smart wearable devices, to reduce power consumption and extend how long voice recognition can run on wearable devices," said a person in charge at a voice recognition technology manufacturer.
