The AI secret sauce: Going beyond buzzwords to look into this new world of tech
We’re always thinking about how we can take our software to the next level and add that Voyage secret sauce. For us, part of this means using powerful new technologies such as artificial intelligence (AI) in our customers' products. These technologies are key to our innovation or, as we like to put it, 'raising the bar.'
'Alright, but what is it?' - we hear you ask. The practical application of AI can mean different things for different needs and use cases. One exciting example is the pairing of Speech-To-Text (STT) with Large Language Models (LLMs). On this front, we recently finished a successful project for a client: we were tasked with taking the latest AI tech available and turning a tool that struggled to pick up simple commands into a smart, robust system that understands context, multiple speakers, and complex sentences.
The STT system was implemented as a stepping stone in an AI journey for this client’s financial technology products. We delivered against their initial brief to implement an STT feature that is both accurate and reliable, complete with an easy-to-use, sleek user interface that can also accommodate ongoing growth.
Meeting the sauce makers - Voyage’s team of developers 🧑‍🍳
To deliver this project, we needed a team (cue the 80s montage music). Luckily, the developers I work with are some of the best around, so it didn’t take long for us to come up with multiple potential STT solutions using Amazon Transcribe, AssemblyAI, and Deepgram. To test our theories, we brought these into our in-depth discovery phase, building a Proof of Concept to compare all of the options.
Eventually we settled on AssemblyAI as it offered the best features for the job at hand, including the ability to understand multiple speakers and rewrite text if it perceives a correction needs to be made.
Once the client and the Voyage team settled on the architecture for the project, we got to work. My background is in intelligent systems, so this task was a great fit! The image below shows how this architecture was implemented in the code.
The result ✨
Once we had our solid foundation (a bit of React/server code magic) and a fresh lick of paint (otherwise known as the user interface), we were able to share our final result, as seen in the demo video below. With usability front of mind, we made sure the client could use the input field wherever it is needed - meaning it’s not restricted to one single input or project.
Right off the bat, the client said they were happy with what we created, and they have since reported back that they love the STT functionality and use it ‘all the time’.
Ok… but why is that important? 🔬
Now that we are able to take high-quality user input and create function-forward AI tools, what can we do with this tech going forward? Raise the bar, naturally.
As is the nature of AI, the more data we capture and assimilate into the tools we build, the better they become. In this example, when our client activates recording, we securely capture audio from the user's microphone and send it, encrypted, to a speech-to-text service to understand what is being said. We can then use a Large Language Model (LLM) to process and summarise the text, auto-generate specialised intelligent responses, correct input errors, auto-generate data, and even create graphics. These LLM-powered services look like the diagram below, and this is just the start!
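For the curious, here is a minimal sketch of that record, transcribe, then process flow. It is not the code from the client project: `process_recording`, `VoiceResult`, and the two callables `transcribe` and `complete` are hypothetical names, standing in for whichever speech-to-text and LLM clients you plug in. We also assume the audio travels over an encrypted connection handled by those clients.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceResult:
    """What the pipeline hands back to the UI (hypothetical shape)."""
    transcript: str  # raw text from the speech-to-text step
    response: str    # LLM output built from that transcript


def process_recording(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    complete: Callable[[str], str],
) -> VoiceResult:
    """Run captured audio through an STT step, then an LLM step.

    `transcribe` and `complete` are injected so the sketch does not
    assume any particular vendor SDK; transport encryption is assumed
    to be handled inside those callables.
    """
    if not audio:
        raise ValueError("no audio captured")

    # Step 1: speech-to-text.
    transcript = transcribe(audio).strip()

    # Step 2: ask the LLM to summarise and tidy up the dictated text.
    prompt = (
        "Summarise the following dictated note and correct obvious input errors:\n\n"
        + transcript
    )
    return VoiceResult(transcript=transcript, response=complete(prompt))
```

Passing the two services in as plain functions keeps the flow easy to test with stand-ins, and means swapping one STT provider for another only touches the function you pass in.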
Behind the scenes, it’s also possible to provide these LLMs not only with what the user has input but also with other data, including data that is unorganised or comes from different sources. These LLM systems can also be given access to data and tools through the internet, allowing them to do even more number crunching and sorting behind the scenes, growing more useful and specialised over time.
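To make that a little more concrete, here is a rough sketch of the two ideas: folding extra data from several sources into the prompt alongside the user's input, and letting the system call a named tool. The names `build_prompt` and `run_tool`, and the bracketed section labels, are our own illustrative assumptions rather than any specific LLM vendor's API.

```python
from typing import Callable, Mapping


def build_prompt(user_input: str, context: Mapping[str, str]) -> str:
    """Combine labelled context snippets with what the user said.

    `context` maps a source name (hypothetical, e.g. "crm") to raw text;
    empty snippets are skipped so the prompt stays compact.
    """
    sections = [
        f"[{source}]\n{text.strip()}"
        for source, text in context.items()
        if text.strip()
    ]
    sections.append(f"[user input]\n{user_input.strip()}")
    return "\n\n".join(sections)


def run_tool(
    name: str,
    argument: str,
    tools: Mapping[str, Callable[[str], str]],
) -> str:
    """Dispatch a tool request by name to a registered function."""
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    return tools[name](argument)
```

In a real system the LLM would decide which tool to request and the result would be fed back into the next prompt; the sketch only shows the plumbing around that loop.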
As AI technology continues to develop and our horizons expand, we say welcome to the future, where the sky's the limit!
Get in touch with Voyage to chat about how AI can enhance your business