Choosing the best-fitting ASR solution
Before anything else, we conducted a thorough analysis of the ASR tools available for integration into the web app. After checking all significant players against basic criteria, we shortlisted six tools. Our team created a sandbox to run them on several browsers and machines with different configurations. We focused on the parameters essential for our client, such as transcription accuracy, the number of simultaneously supported sessions, and response speed. After evaluating the tools across these dimensions, AltexSoft chose the most relevant option.
Enabling effortless switching between ASR providers
Since AI tools evolve rapidly, chances are that another ASR provider will soon be able to deliver better services. So, it’s critical to be able to switch quickly between third-party APIs. We built the needed flexibility into the app architecture by creating a separate API integration layer with adapters. As a result, we can swap vendors relatively quickly with minimal to no impact on the core business logic.
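The adapter idea can be illustrated with a minimal sketch. It is not the project's actual code: the `AsrAdapter` interface, the two vendor classes, and the `make_adapter` helper are hypothetical names, and the vendor calls are stubbed out.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Transcript:
    text: str
    is_final: bool


class AsrAdapter(ABC):
    """Hypothetical common interface that the core logic depends on."""

    @abstractmethod
    def transcribe_chunk(self, audio: bytes) -> Transcript:
        ...


class VendorAAdapter(AsrAdapter):
    def transcribe_chunk(self, audio: bytes) -> Transcript:
        # A real adapter would call vendor A's API here; this is a stub.
        return Transcript(text=f"[vendor-a] {len(audio)} bytes", is_final=True)


class VendorBAdapter(AsrAdapter):
    def transcribe_chunk(self, audio: bytes) -> Transcript:
        # A real adapter would call vendor B's API here; this is a stub.
        return Transcript(text=f"[vendor-b] {len(audio)} bytes", is_final=True)


# Swapping vendors means changing one configuration value, not the core logic.
ADAPTERS = {"vendor_a": VendorAAdapter, "vendor_b": VendorBAdapter}


def make_adapter(name: str) -> AsrAdapter:
    return ADAPTERS[name]()


def caption(adapter: AsrAdapter, audio: bytes) -> str:
    """Core business logic: knows only the AsrAdapter interface."""
    return adapter.transcribe_chunk(audio).text
```

Because `caption` depends only on the interface, adding a new provider in this sketch means writing one more adapter class and registering it in `ADAPTERS`.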
Ensuring quick recovery after connection issues
The ASR API integration is based on the WebSocket protocol, which enables two-way real-time communication between a web server and a browser. Yet, the technology is sensitive to network disruptions, which may happen for many reasons and at any stage of data transfer. We implemented buffering so that the speaker’s local device accumulates chunks of audio data while the Internet connection is down. This allows online audio transcription to resume once the connection is restored.
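A rough sketch of such a buffer is shown below. It is an illustration under assumptions rather than the app's implementation: `BufferedAudioSender`, the `send` callback, and the `max_chunks` limit are hypothetical, and the actual WebSocket client is abstracted behind a callable that raises `ConnectionError` when the link is down.

```python
from collections import deque
from typing import Callable, Deque


class BufferedAudioSender:
    """Sends audio chunks, keeping them locally while the connection is down."""

    def __init__(self, send: Callable[[bytes], None], max_chunks: int = 1000):
        self._send = send
        # Bounded queue: if it fills up, the oldest chunks are dropped.
        self._pending: Deque[bytes] = deque(maxlen=max_chunks)
        self.connected = True

    def push(self, chunk: bytes) -> None:
        if self.connected:
            try:
                self._send(chunk)
                return
            except ConnectionError:
                self.connected = False
        # Offline: accumulate the chunk for later delivery.
        self._pending.append(chunk)

    def on_reconnect(self) -> None:
        """Flush buffered chunks in their original order."""
        self.connected = True
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                self.connected = False
                return
            # Remove a chunk only after it was sent successfully.
            self._pending.popleft()
```

The bounded queue is a design choice made for this sketch: it caps memory use on the speaker's device at the cost of losing the oldest audio during a very long outage.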
Synchronizing time to milliseconds
One of the technical challenges was synchronizing clocks across all users during a live session. We set a server-side timer as the single source of truth and made local clocks match the server’s clock. As a result, all participants, no matter their location, see the same time on their screens, accurate to the millisecond.
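The text does not say how the local clocks are aligned with the server. One common approach, shown here purely as an assumption, is a Cristian-style estimate: the client notes when it sent a time request (`t0`) and when the reply arrived (`t1`), and assumes the server's timestamp was taken halfway through the round trip. All function names below are hypothetical.

```python
from typing import Iterable, Tuple

# One sample: (client send time, server timestamp, client receive time), in ms.
Sample = Tuple[float, float, float]


def estimate_offset_ms(t0: float, server_time: float, t1: float) -> float:
    """Offset to add to the local clock, assuming a symmetric network delay."""
    return server_time - (t0 + t1) / 2


def best_offset_ms(samples: Iterable[Sample]) -> float:
    """Use the sample with the shortest round trip, which is the least noisy."""
    t0, server_time, t1 = min(samples, key=lambda s: s[2] - s[0])
    return estimate_offset_ms(t0, server_time, t1)


def synced_now_ms(local_now_ms: float, offset_ms: float) -> float:
    """Server-aligned time that every participant's screen would display."""
    return local_now_ms + offset_ms
```

For example, a request sent at local time 1000 ms and answered at 1100 ms with a server timestamp of 5050 ms gives an offset of 5050 - 1050 = 4000 ms.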
Designing an intuitive interface
Our team created the app’s interface with simplicity in mind. The transcribing process is automated, with human intervention boiled down to two main operations: starting and stopping the program. Conference organizers and participants need only a short while to begin using the software in real settings.
Addressing security concerns
Safety and compliance with data protection regulations are our client’s top priorities. To mitigate possible risks, we deployed the system in the Amazon Virtual Private Cloud (VPC), a logically isolated environment that allows for strict access control and management. Security measures include creating a unique 16-character ID and passcode for each conference session, a strong authentication process, and Internet traffic encryption. Only a limited number of people with the proper permission level can access and manually download conference data.
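As an illustration of the session credentials, the sketch below generates a 16-character ID and a passcode with Python's standard `secrets` module. Only the 16-character ID length comes from the description above. The alphabet, the 8-digit passcode length, and the function names are assumptions made for this example.

```python
import secrets
import string

# Assumed alphabet: uppercase letters and digits.
ID_ALPHABET = string.ascii_uppercase + string.digits


def new_session_id(length: int = 16) -> str:
    """Random session ID drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ID_ALPHABET) for _ in range(length))


def new_passcode(length: int = 8) -> str:
    """Random numeric passcode; the length here is an assumed value."""
    return "".join(secrets.choice(string.digits) for _ in range(length))


def passcode_matches(expected: str, supplied: str) -> bool:
    """Constant-time comparison, which avoids leaking timing information."""
    return secrets.compare_digest(expected, supplied)
```

Random generation alone does not guarantee uniqueness, so a real system would also check each new ID against the existing sessions before issuing it.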