Voice bots have become the norm even in the most seemingly conservative areas of business. They take on routine tasks and handle them faster, more cheaply and more efficiently than people, and they sound increasingly human. Andrey Kirsanov, CTO at Brainy Solutions, talks about the main trends in the development of speech technologies in 2021 and how customizing them benefits businesses.

Demand for “custom tailoring” is the main trend

Today bots are being trained to behave like a live person. A bot must not only understand what is being said and interpret it correctly, but also show simple human reactions. Does the customer call right after placing an order in the online store? Use that context. Were you interrupted? Stop talking, listen, and decide whether you can rephrase your answer to be more helpful. Did the client not understand? Rephrase the question instead of repeating the same thing. Is the client unhappy? Defuse the situation, adapt your speech, or gently hand the call over to a human operator.
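The reactions listed above can be thought of as a small decision policy. The sketch below is a minimal rule-based illustration, assuming hypothetical per-turn signals (`interrupted`, `understood`, `sentiment`, `recent_order`, `failed_rephrasings`) that a real platform would have to estimate from recognition output, timing and sentiment analysis. All names and thresholds are hypothetical; this is not the policy used by any particular vendor.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    STOP_AND_LISTEN = auto()
    REPHRASE = auto()
    SOFTEN_TONE = auto()
    HAND_OFF_TO_OPERATOR = auto()
    USE_ORDER_CONTEXT = auto()
    CONTINUE = auto()


@dataclass
class TurnSignals:
    # Hypothetical signals for one dialogue turn (assumed, not a real API).
    interrupted: bool = False       # caller started talking over the bot
    understood: bool = True         # caller seems to have understood the bot
    sentiment: float = 0.0          # -1.0 (very unhappy) .. 1.0 (happy)
    recent_order: bool = False      # caller placed an order shortly before calling
    failed_rephrasings: int = 0     # how many rephrasings have already failed


def choose_action(s: TurnSignals,
                  unhappy_threshold: float = -0.5,
                  max_rephrasings: int = 2) -> Action:
    """Pick the bot's next move; rules are checked in priority order."""
    if s.interrupted:
        # Barge-in: stop speaking and let the caller finish.
        return Action.STOP_AND_LISTEN
    if s.sentiment <= unhappy_threshold:
        # Clearly unhappy caller: hand over to a human.
        return Action.HAND_OFF_TO_OPERATOR
    if not s.understood:
        # Rephrase instead of repeating, but give up after a few attempts.
        if s.failed_rephrasings >= max_rephrasings:
            return Action.HAND_OFF_TO_OPERATOR
        return Action.REPHRASE
    if s.sentiment < 0:
        # Mildly unhappy: adapt the tone and try to defuse the situation.
        return Action.SOFTEN_TONE
    if s.recent_order:
        # Caller likely phones about the fresh order: use that context.
        return Action.USE_ORDER_CONTEXT
    return Action.CONTINUE


print(choose_action(TurnSignals(understood=False)))  # Action.REPHRASE
```

The order of the rules encodes priority: an interruption always wins, and escalation to an operator takes precedence over rephrasing. A production system might learn such a policy from call data rather than hard-code it.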

Specialists in analytics, psychology and content also take part in developing voice bots, but some tasks can only be solved by technical means. Speech technologies, namely speech recognition and synthesis, are the foundation of successful communication between a bot and a person. If important information is lost at this stage, all further work on the bot becomes virtually meaningless.

Improved customization of speech recognition 

Bots work with text, so they rely on speech recognition results. How can we tell whether recognition is good enough? A standard metric is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognized text into the reference transcript, divided by the number of words in the reference. The lower the WER, the better. If the bot, for example, misrecognizes the street name for a delivery, the addressee may be left without a New Year's gift.
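WER is usually computed with a word-level edit distance. The sketch below is a minimal illustration, assuming both texts are already normalized (same casing, no punctuation) and split on whitespace; the function name `word_error_rate` is our own and not taken from any specific toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference.

    Assumes both strings are already normalized (lowercased, no punctuation).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (Levenshtein DP, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion (word missed)
                          curr[j - 1] + 1,     # insertion (extra word)
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


# One misrecognized street name out of five words gives WER = 0.2.
print(word_error_rate("deliver to 12 maple street",
                      "deliver to 12 apple street"))
```

Note that WER can exceed 1.0 when the recognizer inserts many extra words, and that a single wrong word can matter far more than the aggregate number suggests, as the street-name example shows.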

To reduce errors, the recognition model is trained on examples of real-life dialogues, so real calls are used as a priority. In addition, the business changes and so do customer requests, which means the model needs ongoing adjustment. This adjustment should be driven by real metrics rather than by the customer's subjective impression. We are already on our way to that.

Improved speech synthesis customization 

It is important for a business that clients feel no discomfort when talking to the bot. Conventional speech synthesis conveys information well enough, but a bot sounds different from a human, and not everyone wants to talk to a bot. To improve conversion and user experience, synthesis is often replaced with pre-recorded phrases voiced by a professional speaker. These fragments are inserted in the right context, and the dialog is assembled like a construction kit. This approach is not universal: to add variables such as the subscriber's name and address or the product name and price, you have to assemble the phrase from separate parts. But the difference in pronunciation between the parts destroys the magic of human communication, and it is impossible to record every possible option.
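The construction-kit approach can be sketched as splitting a phrase template into fixed, pre-recorded fragments and variable slots. The example below is a minimal illustration under stated assumptions: `split_template`, `assemble` and `full_phrase_recordings` are hypothetical helpers, the `"clip"` entries stand for pre-recorded audio, and the `"tts"` entries are a placeholder for a synthesis fallback rather than a call to any real engine.

```python
import math
from string import Formatter


def split_template(template: str) -> list[tuple[str, str]]:
    """Split a template into ('recorded', text) and ('slot', name) segments."""
    segments = []
    for literal, field, _spec, _conv in Formatter().parse(template):
        if literal:
            segments.append(("recorded", literal))
        if field is not None:
            segments.append(("slot", field))
    return segments


def assemble(template: str,
             values: dict[str, str],
             recorded_values: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Build a playlist of audio pieces for one phrase.

    Fixed fragments are always pre-recorded clips. A slot value is a clip
    only if the speaker recorded that exact value; otherwise it falls back
    to a hypothetical 'tts' entry (a placeholder, not a real engine call).
    """
    playlist = []
    for kind, content in split_template(template):
        if kind == "recorded":
            playlist.append(("clip", content))
        else:
            value = values[content]
            if value in recorded_values.get(content, set()):
                playlist.append(("clip", value))
            else:
                playlist.append(("tts", value))
    return playlist


def full_phrase_recordings(slot_option_counts: dict[str, int]) -> int:
    """Whole-phrase recordings needed to avoid splicing: one per combination."""
    return math.prod(slot_option_counts.values())


print(assemble("Hello, {name}! Your order costs {price}.",
               {"name": "Anna", "price": "990 rubles"},
               {"name": {"Anna", "Ivan"}}))
```

Each boundary between entries in the playlist, and especially between a `"clip"` and a `"tts"` entry, is where a pronunciation mismatch can be heard. `full_phrase_recordings` shows the alternative: with, say, 1,000 names and 500 prices, recording every full phrase would take 500,000 takes, which is why recording all options is impractical.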

The most advanced and universal method is speech synthesis that approximates the speaker's style as closely as possible, including generation of the variable parts. We have already managed to achieve 98% human similarity. There is growing demand for custom synthesis to increase brand awareness, and many companies are looking to personalize the robots in their call centres.

Other trends

A growing share of voice interfaces

People are getting used to talking to applications instead of pressing buttons on a keyboard or screen. The spread of broadband communication channels and cloud platforms will allow even small businesses to use full-fledged trainable bots with very little effort.


The days of chatbots that reply only with text messages are over. Text interfaces will be combined with voice and video depending on the client's needs. The future belongs to virtual assistants with their own character and manners. They will be able to use jargon, idioms and aphorisms, and to build phrases that include “extra” words and sounds such as interjections or sighs: in short, the entire arsenal of communication techniques.

Safety of interaction between bots and humans

Bots will learn new skills, access more information and take on more responsibility. This will require their creators to manage risk and exercise control carefully, much as they would with any other staff.

Bot learning algorithms and voice synthesis will become valuable intellectual property: patented, licensed and subject to major lawsuits.

Expanding the integration of speech technologies

Developers of smart home solutions will begin to standardize voice control and build it into all household appliances and electronics, including Internet of Things devices, of which about 8 billion were connected in 2020; their number may triple by 2030.

How to get the most out of speech technology in business

  1. Leverage the expertise of existing platforms. Out-of-the-box customization tools give you the benefits of customization without having to design everything from scratch.
  2. Use cloud solutions. They are accessible from anywhere, for employees and customers alike. You pay only for the resources you consume, and free trials are often available.
  3. Tailor the solution to your business. Professional customization provides a competitive advantage by improving service quality and customer loyalty at a low cost.
  4. Don't forget about support. Speech technology is not a “set and forget” service: the system will evolve along with the business.
Published On: March 4th, 2021 / Categories: Artificial Intelligence