AltexSoft Built a Conferencing App with Embedded Speech Recognition

Business domain
Professional Services
Technology
API, Node.js, Vue.js, Amazon Web Services, DynamoDB, WebSocket

Background

A US-based company specializing in conference services contacted AltexSoft to streamline the transcription of its online and offline meetings. Our team built a web app with automated speech recognition (ASR) that streams the audio of an event, supports multiple speakers, and converts their words into text in real time.

Challenges

The ASR-based system we created lets conference participants adjust their speeches on the fly and receive a written record of the event as soon as it is over, with no need for human transcribers. The software addresses the following business challenges:

Enable participants to follow the conference online and take part in it remotely

Make the process more convenient for speakers

Minimize the cost of transcription

Value Delivered

Choosing the best-fitting ASR solution

Before anything else, we conducted a thorough analysis of the ASR tools that could be integrated into the web app. After checking all significant players against basic criteria, we shortlisted six tools. Our team created a sandbox to run them on several browsers and on machines with different configurations. We focused on the parameters essential for our client, such as transcription accuracy, the number of sessions supported simultaneously, and response speed. After evaluating the tools across these dimensions, AltexSoft chose the most relevant option.

Enabling effortless switch between ASR providers

Since AI tools evolve rapidly, chances are that another ASR provider will soon be able to deliver better services. So, it’s critical to be able to switch quickly between third-party APIs. We built this flexibility into the app architecture by creating a separate API integration layer with adapters. As a result, we can swap vendors relatively quickly, with minimal to no impact on the core business logic.
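
To illustrate the idea, here is a minimal sketch of an adapter layer in TypeScript. It is not the project's actual code: the interface, the class names, and the "vendor-a" and "vendor-b" keys are hypothetical, and the stand-in adapters return placeholder text instead of calling a real ASR API.

```typescript
// Hypothetical shape of a transcript fragment handed to the core logic.
interface TranscriptSegment {
  speaker: string;
  text: string;
  isFinal: boolean;
}

// The only contract the core business logic depends on.
interface AsrAdapter {
  readonly vendor: string;
  transcribe(speaker: string, audioChunk: Uint8Array): Promise<TranscriptSegment>;
}

// Stand-in adapters. A real adapter would call the vendor's API here
// and map the vendor-specific response into a TranscriptSegment.
class VendorAAdapter implements AsrAdapter {
  readonly vendor = "vendor-a";
  async transcribe(speaker: string, audioChunk: Uint8Array): Promise<TranscriptSegment> {
    return { speaker, text: `[vendor-a] ${audioChunk.length} bytes`, isFinal: true };
  }
}

class VendorBAdapter implements AsrAdapter {
  readonly vendor = "vendor-b";
  async transcribe(speaker: string, audioChunk: Uint8Array): Promise<TranscriptSegment> {
    return { speaker, text: `[vendor-b] ${audioChunk.length} bytes`, isFinal: true };
  }
}

// Registry of available adapters, keyed by a configuration value.
const adapterFactories = new Map<string, () => AsrAdapter>([
  ["vendor-a", () => new VendorAAdapter()],
  ["vendor-b", () => new VendorBAdapter()],
]);

function createAdapter(vendor: string): AsrAdapter {
  const factory = adapterFactories.get(vendor);
  if (!factory) {
    throw new Error(`Unknown ASR vendor: ${vendor}`);
  }
  return factory();
}

// Core logic sees only the AsrAdapter interface, never a concrete vendor.
async function transcribeChunk(
  adapter: AsrAdapter,
  speaker: string,
  chunk: Uint8Array,
): Promise<string> {
  const segment = await adapter.transcribe(speaker, chunk);
  return `${segment.speaker}: ${segment.text}`;
}
```

In a layout like this, switching providers means writing one new adapter class and changing the key passed to createAdapter, while transcribeChunk and everything above it stay untouched.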

Ensuring quick recovery after connection issues

The ASR API integration is based on the WebSocket protocol, which enables two-way real-time communication between a web server and a browser. Yet, the technology is sensitive to network disruptions, which may happen for many reasons and at any stage of data transfer. We implemented buffering so that the speaker's device accumulates chunks of audio data while the Internet connection is down. Once the connection is restored, the buffered audio is sent and online transcription resumes.
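
The buffering idea can be sketched as follows. This is an illustration under assumptions, not the production code: ChunkTransport is a hypothetical minimal wrapper around a WebSocket connection, and the buffer limit with its drop-oldest policy is an arbitrary choice made for the example.

```typescript
// Hypothetical minimal view of a connection; a thin wrapper around a
// browser WebSocket could implement it.
interface ChunkTransport {
  isOpen(): boolean;
  send(chunk: Uint8Array): void;
}

class BufferedAudioSender {
  private pending: Uint8Array[] = [];
  private transport: ChunkTransport;
  private maxBuffered: number;

  constructor(transport: ChunkTransport, maxBuffered = 1000) {
    this.transport = transport;
    this.maxBuffered = maxBuffered;
  }

  // Send right away when online and nothing is queued;
  // otherwise queue the chunk so that ordering is preserved.
  push(chunk: Uint8Array): void {
    if (this.transport.isOpen() && this.pending.length === 0) {
      this.transport.send(chunk);
      return;
    }
    if (this.pending.length >= this.maxBuffered) {
      this.pending.shift(); // assumed policy: drop the oldest chunk when full
    }
    this.pending.push(chunk);
  }

  // Call when the connection is restored. Returns the number of chunks sent.
  flush(): number {
    let sent = 0;
    while (this.pending.length > 0 && this.transport.isOpen()) {
      const chunk = this.pending.shift();
      if (chunk === undefined) break;
      this.transport.send(chunk);
      sent++;
    }
    return sent;
  }

  get bufferedCount(): number {
    return this.pending.length;
  }
}
```

A caller would invoke push for every recorded chunk and call flush from the reconnect handler, so the audio captured during the outage reaches the ASR service in its original order.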

Synchronizing time to milliseconds

One of the technical challenges was unifying clocks across all users during a live session. We set a server-side timer as the single source of truth and aligned each local clock with the server’s. As a result, all participants, no matter their location, see the same time on their screens, down to the millisecond.
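
How the local clocks were aligned is not described here. One common approach, shown below purely as an assumption, is a round-trip-based offset estimate in the spirit of Cristian's algorithm: the client notes when it sent a time request and when the reply arrived, and assumes the server's timestamp was taken halfway through the round trip. All names in the sketch are hypothetical.

```typescript
// One time-sync exchange, all values in milliseconds.
interface SyncSample {
  sentAt: number;      // local clock when the request left
  serverTime: number;  // server clock reported in the reply
  receivedAt: number;  // local clock when the reply arrived
}

// Offset to add to the local clock to approximate the server clock.
// Assumes the network delay is roughly symmetric.
function estimateOffset(sample: SyncSample): number {
  const roundTrip = sample.receivedAt - sample.sentAt;
  return sample.serverTime + roundTrip / 2 - sample.receivedAt;
}

// Prefer the sample with the shortest round trip: less delay means
// less uncertainty about when the server read its clock.
function bestOffset(samples: SyncSample[]): number {
  if (samples.length === 0) {
    throw new Error("at least one sample is required");
  }
  const best = samples.reduce((a, b) =>
    b.receivedAt - b.sentAt < a.receivedAt - a.sentAt ? b : a,
  );
  return estimateOffset(best);
}

// Server-aligned time for a given local timestamp.
function serverNow(localNow: number, offset: number): number {
  return localNow + offset;
}
```

For example, a request sent at local time 1000 and answered at 1100 with a server time of 5050 gives a round trip of 100 ms and an offset of 5050 + 50 - 1100 = 4000 ms. The achievable precision depends on how symmetric the network delay is.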

Designing an intuitive interface

Our team created the app’s interface with simplicity in mind. The transcribing process is automated, with human intervention reduced to two main operations: starting and stopping the program. Conference organizers and participants need very little time to begin using the software in real settings.

Addressing security concerns

Security and compliance with data protection regulations are our client’s top priorities. To mitigate possible risks, we deployed the system in Amazon Virtual Private Cloud (VPC), a logically isolated environment that allows for strict access control and management. Security measures include a unique 16-character ID and passcode for each conference session, a strong authentication process, and encryption of Internet traffic. Only a limited number of people with the proper permission level can access and manually download conference data.
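
As a rough illustration of per-session credentials, the sketch below generates a 16-character session ID and a passcode with Node.js's cryptographically secure randomInt. The alphabet, the function names, and the default passcode length of 8 are assumptions made for the example; the actual format used in the project is not specified here.

```typescript
import { randomInt } from "node:crypto";

// Assumed alphabet: uppercase letters and digits without look-alikes (I, O, 0, 1).
const ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";

// Build a random string using a cryptographically secure source.
function randomString(length: number, alphabet: string = ALPHABET): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += alphabet.charAt(randomInt(alphabet.length));
  }
  return out;
}

interface SessionCredentials {
  sessionId: string;
  passcode: string;
}

// The passcode length is a hypothetical parameter; only the 16-character
// ID length comes from the text above.
function createSessionCredentials(passcodeLength = 8): SessionCredentials {
  return {
    sessionId: randomString(16),
    passcode: randomString(passcodeLength),
  };
}
```

Random generation alone makes collisions extremely unlikely but does not strictly guarantee uniqueness, so a real system would also verify that a new ID is not already in use before storing the session.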

Approach and Technical Info

The project is in progress, with a team of two backend developers, a frontend developer, a project manager, a QA engineer, and a UI/UX designer.

The tech stack includes AWS, DynamoDB, Node.js, Vue.js, and WebSocket.

 
