A US-based company specializing in conference services contacted AltexSoft to streamline the transcription of its online and offline meetings. Our team built a web app with automated speech recognition (ASR) that streams event audio, supports multiple speakers, and converts their words into text in real time.
The ASR-based system we created allows conference participants to tailor their speeches on the fly and get a written record of the event once it's over, with no need for manual transcribers. All in all, the software addresses several business challenges at once.
Before anything else, we conducted a thorough analysis of available ASR tools that could be integrated into the web app. After checking all significant players against basic criteria, we shortlisted six tools. Our team created a sandbox to run the candidates on several browsers and machines with different configurations, focusing on the parameters essential for our client: transcription accuracy, the number of sessions supported simultaneously, response speed, and more. After evaluating the tools across these dimensions, AltexSoft chose the most relevant option.
Since AI tools evolve dynamically, chances are that another ASR provider will soon be able to deliver better services. So it's critical to have the technical ability to switch quickly between third-party APIs. We built the needed flexibility into the app architecture by creating a separate API integration layer with adapters. As a result, we can swap vendors relatively quickly, with minimal to no impact on the core business logic.
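To illustrate the approach, here is a minimal TypeScript sketch of such an adapter layer. The interface, the vendor class, and the endpoint URL are hypothetical; the point is only to show how a common contract isolates the core logic from any particular ASR API.

```typescript
// Common contract every ASR vendor adapter implements.
// The core app talks only to this interface, never to a vendor directly.
interface AsrAdapter {
  startSession(sampleRateHz: number): Promise<void>;
  sendAudioChunk(chunk: Uint8Array): void;
  onTranscript(handler: (text: string, isFinal: boolean) => void): void;
  endSession(): Promise<void>;
}

// A vendor-specific adapter translates the contract into the provider's
// own wire format; swapping vendors means swapping this class only.
class VendorXAdapter implements AsrAdapter {
  private socket?: WebSocket;
  private transcriptHandler?: (text: string, isFinal: boolean) => void;

  async startSession(sampleRateHz: number): Promise<void> {
    this.socket = new WebSocket("wss://asr.vendor-x.example/stream"); // placeholder URL
    await new Promise<void>((resolve, reject) => {
      this.socket!.onopen = () => resolve();
      this.socket!.onerror = () => reject(new Error("connection failed"));
    });
    this.socket.send(JSON.stringify({ type: "start", sampleRateHz }));
    this.socket.onmessage = (event) => {
      const msg = JSON.parse(event.data as string);
      this.transcriptHandler?.(msg.text, msg.final === true);
    };
  }

  sendAudioChunk(chunk: Uint8Array): void {
    this.socket?.send(chunk);
  }

  onTranscript(handler: (text: string, isFinal: boolean) => void): void {
    this.transcriptHandler = handler;
  }

  async endSession(): Promise<void> {
    this.socket?.close();
  }
}
```

Because the business logic depends only on `AsrAdapter`, replacing Vendor X with another provider comes down to writing one new adapter class and changing a single construction site.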
The ASR API integration is based on the WebSockets protocol, which enables two-way real-time communication between a web server and a browser. Yet the technology is sensitive to network disruptions, which may occur for many reasons and at any stage of data transfer. We implemented buffering so that a speaker's local device accumulates chunks of audio data when the Internet connection drops, allowing the online audio transcription to resume once the connection is restored.
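Below is a simplified sketch of how such client-side buffering could look in TypeScript. The class name, the fixed reconnect delay, and the replay strategy are illustrative assumptions rather than the production code.

```typescript
// Audio chunks captured while the socket is down are queued locally
// and flushed in capture order after reconnection.
class BufferedAudioSender {
  private queue: Uint8Array[] = [];
  private socket?: WebSocket;

  constructor(private url: string) {
    this.connect();
  }

  private connect(): void {
    this.socket = new WebSocket(this.url);
    this.socket.onopen = () => this.flush();
    // On any drop, retry after a short delay; chunks keep accumulating meanwhile.
    this.socket.onclose = () => setTimeout(() => this.connect(), 1000);
  }

  send(chunk: Uint8Array): void {
    if (this.socket && this.socket.readyState === WebSocket.OPEN) {
      this.socket.send(chunk);
    } else {
      this.queue.push(chunk); // network is down: accumulate locally
    }
  }

  private flush(): void {
    // Replay everything buffered during the outage before resuming live sends.
    while (this.queue.length > 0) {
      this.socket!.send(this.queue.shift()!);
    }
  }
}
```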
One of the technical challenges was unifying clocks across all users during a live session. We set a server-side timer as the single source of truth and synchronized local clocks with it. As a result, all participants, no matter their location, see the same time on their screens, accurate to the millisecond.
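A common way to achieve this kind of synchronization is an NTP-style handshake over the existing WebSocket connection. The sketch below assumes hypothetical `time-request`/`time` message types and a roughly symmetric network delay; it is one possible technique, not necessarily the exact one used in the project.

```typescript
// Estimate the offset between the local clock and the server clock:
// timestamp a ping, read the server's clock from the reply, and treat
// half the round trip as the one-way delay.
function estimateClockOffset(socket: WebSocket): Promise<number> {
  return new Promise((resolve) => {
    const t0 = Date.now(); // client time at send
    socket.addEventListener("message", function handler(event) {
      const msg = JSON.parse(event.data as string);
      if (msg.type !== "time") return;
      socket.removeEventListener("message", handler);
      const t1 = Date.now();          // client time at receive
      const oneWay = (t1 - t0) / 2;   // assume symmetric latency
      // offset = server time adjusted for travel time, minus local time now
      resolve(msg.serverTimeMs + oneWay - t1);
    });
    socket.send(JSON.stringify({ type: "time-request" }));
  });
}
```

The UI can then render `Date.now() + offset` instead of the raw local clock, so every participant sees the same server-anchored time.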
Our team designed the app's interface with simplicity in mind. The transcription process is automated, with human intervention reduced to two main operations: starting and stopping the program. It takes conference organizers and participants only a short while to begin using the software in real settings.
Security and compliance with data protection regulations are our client's top priorities. To mitigate possible risks, we deployed the system in an Amazon virtual private cloud (VPC), a logically isolated environment that allows for strict access control and management. Security measures include a unique 16-character ID and passcode for each conference session, a strong authentication process, and encryption of Internet traffic. Only a limited number of people with the proper permission level can access and manually download conference data.
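As a rough illustration, per-session credentials of this kind can be derived from a cryptographically secure random source. The helper name, character set, and passcode length below are assumptions for the sketch, not the client's actual scheme.

```typescript
import { randomBytes } from "node:crypto";

// Generate an unpredictable identifier from a CSPRNG. The 32-character
// alphabet omits ambiguous glyphs (I, O, 0, 1) and divides 256 evenly,
// so mapping bytes with modulo introduces no bias.
function generateSessionId(length = 16): string {
  const alphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";
  const bytes = randomBytes(length);
  let id = "";
  for (let i = 0; i < length; i++) {
    id += alphabet[bytes[i] % alphabet.length];
  }
  return id;
}

const sessionId = generateSessionId();  // 16-character session ID
const passcode = generateSessionId(8);  // shorter passcode, same generator
```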
The project is still in progress, with two backend developers, a frontend developer, a project manager, a QA engineer, and a UI/UX designer involved.
The tech stack includes AWS, DynamoDB, Node.js, Vue.js, and WebSockets.