Curious how exactly AI powers meeting summaries? Learn which machine learning technologies are shaping the future of automatic meeting summaries.
Nowadays, everyone is talking about AI and how it can be used to automate tasks. The goal is, and always has been, to make people's lives easier. The problem is that many see AI as a buzzword for technology that accomplishes impressive things in seemingly magical ways. The reality, however, is different. We recently built our first AI-enabled product, jamie, which automatically summarises meetings. In this article, I want to shed light on the technology that makes automatic meeting summarisation possible and give a solid introduction to the technical details behind this seemingly magical experience.
First, we need to expand our understanding of AI beyond the marketing buzz we hear in the media. As you know, AI stands for artificial intelligence. In essence, it means enabling computer systems to perform tasks that previously only humans could perform. Common tasks include visual recognition, speech recognition, decision-making, and translation between languages.
This sounds exciting on a high level, but to dive deeper, we need to define how these systems are built. This is where machine learning, a subfield of AI, comes into play. Machine learning is best described as the ability of a computer program to learn from data without being explicitly programmed with specific behaviors. Diving deeper still, we discover the world of deep learning (pun intended), a more sophisticated form of machine learning that relies on neural networks, which are loosely inspired by the way our brain works. Most of the advancements and applications you hear about in the world of AI are deep learning models adapted for different purposes.
Deep learning has many sub-fields. The most relevant one for automatic meeting summaries is natural language processing (NLP). You can think of it as the sub-field that deals with all sorts of language tasks, ranging from classification to text generation.
Now that we have a basic overview of the field of AI, let's look at what AI-powered software for end users can look like. We recently built jamie, a personal AI assistant that generates meeting summaries in business-writing quality within seconds. This is how it works for users:
After installing jamie on your Mac or Windows computer, you open the application and "start jamie". You can then hold your meeting as you usually would, without any changes to your workflow, whether it takes place in person or in any virtual meeting software. Once the meeting is over, you provide jamie with the title of the meeting and select a goal type. jamie then gets to work. Within seconds, you receive a notification and get a perfectly written meeting summary. You can quickly edit it and copy and paste it into your favorite tool. The best part is that jamie works seamlessly in more than 15 languages.
The reaction of our users is always the same: "wooow, I didn't expect it works so well". So how does this magic experience work in the background?
Several puzzle pieces work together to power the seamless experience described above: a beautiful and robust app for Mac and Windows, reliable audio-to-text transcription, and the summarisation capability itself. Together, they make the idea of jamie a reality.
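To make the interplay of these pieces more concrete, here is a minimal sketch in Python of how captured audio could flow through a transcription step and a summarisation step. It is an illustration only, not jamie's actual code: the class name MeetingPipeline, its fields, and the idea of passing the meeting title and goal type into the summary step are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MeetingPipeline:
    """Hypothetical glue between the app, transcription, and summarisation."""

    # Turns raw audio bytes into a transcript (any speech-to-text backend).
    transcribe: Callable[[bytes], str]
    # Turns (transcript, title, goal type) into a summary (for example an LLM).
    summarize: Callable[[str, str, str], str]

    def run(self, audio: bytes, title: str, goal: str) -> str:
        # Step 1: audio-to-text transcription.
        transcript = self.transcribe(audio)
        # Step 2: summary generation, guided by the user-provided title and goal.
        return self.summarize(transcript, title, goal)
```

Because both steps are plain callables, a cloud API or a self-hosted model can be swapped in without touching the rest of the application.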
Once a seamless application is in place and the audio can be captured, the audio needs to be transcribed into text. For this, there is a large variety of APIs, such as Google Speech-to-Text or AssemblyAI, as well as open-source deep learning models like wav2vec, HuBERT, or Whisper that can be deployed on virtual machines in the cloud. Converting the audio into text is the prerequisite for the next step: generating the meeting summary.
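As an illustration of the open-source route, the following sketch transcribes an audio file with the openai-whisper Python package and turns the timestamped segments into a readable transcript. This is an assumption-laden example, not a description of what jamie runs in production: the helper names, the "base" model size, and the timestamp format are chosen purely for illustration.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Render a position in the recording as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def segments_to_text(segments: Iterable[Mapping]) -> str:
    """Join timestamped segments into one transcript, skipping empty ones."""
    lines = []
    for segment in segments:
        text = segment["text"].strip()
        if text:
            lines.append(f"[{format_timestamp(segment['start'])}] {text}")
    return "\n".join(lines)


def transcribe_file(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file locally (hypothetical helper)."""
    # Imported lazily so the helpers above work without the package installed.
    # Requires `pip install openai-whisper` and ffmpeg on the machine.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    # Whisper returns the full text plus a list of timestamped segments.
    return segments_to_text(result["segments"])
```

A hosted API would replace only transcribe_file; the downstream summary step just needs the resulting text.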
So how do you generate an intelligent summary from potentially very long texts? The answer lies in large language models (LLMs), which are deep learning models for NLP tasks. As the name suggests, these models have been trained on enormous amounts of data (billions of data points and more) and are made available as pre-trained models for others to use. This is the technology powering summarisation in human-like quality.
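Language models can only read a limited amount of text at once, so very long transcripts are often split up first. One common approach, shown here as a sketch and not necessarily how jamie handles it, is to summarise each chunk separately and then summarise the combined partial summaries. The function names, the word-based chunk size, and the summarize callable (a stand-in for any LLM call) are assumptions for this example.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_long_text(
    text: str,
    summarize: Callable[[str], str],
    max_words: int = 500,
) -> str:
    """Summarise each chunk, then summarise the joined partial summaries."""
    chunks = split_into_chunks(text, max_words)
    if not chunks:
        return ""
    if len(chunks) == 1:
        return summarize(chunks[0])
    # "Map" step: one summary per chunk.
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # "Reduce" step: assumes the joined partial summaries fit into one call.
    return summarize(" ".join(partial_summaries))
```

Splitting by word count is the simplest option; a real system would more likely split on speaker turns or topic changes so that related points stay together.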
The most prominent models currently available are Google's T5, OpenAI's GPT-3, and open-source projects like EleutherAI's GPT-J. Because these models have been pre-trained on a wide range of text, they perform quite well out of the box. Yet significant performance and accuracy improvements can be achieved through fine-tuning, the process of creating a custom version of a model by training it further on your own data.
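To give a feeling for what "your own data" can look like, here is a small sketch that turns pairs of transcripts and human-written summaries into JSON Lines records. The prompt/completion field names and the prompt wording are assumptions for illustration; the exact format depends on the model or fine-tuning service you use, and this is not a description of jamie's training data.

```python
import json
from typing import Dict, Iterable, List, Tuple


def build_training_records(
    pairs: Iterable[Tuple[str, str]],
) -> List[Dict[str, str]]:
    """Turn (transcript, summary) pairs into prompt/completion records."""
    records = []
    for transcript, summary in pairs:
        transcript, summary = transcript.strip(), summary.strip()
        # Skip incomplete examples instead of training on empty text.
        if not transcript or not summary:
            continue
        records.append({
            # Hypothetical prompt template; adapt it to your model.
            "prompt": (
                "Summarize the following meeting transcript:\n\n"
                f"{transcript}\n\nSummary:"
            ),
            "completion": " " + summary,
        })
    return records


def to_jsonl(records: Iterable[Dict[str, str]]) -> str:
    """Serialise records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

The quality of these pairs matters more than their number: the model learns to imitate whatever style of summary the examples contain.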
The technology accessible today already produces impressive results. Take jamie as an example: not only can our product capture, extract, and rephrase the points made in a meeting based on their actual meaning while filtering out unneeded details, sarcasm, and noise, it also understands the basic cause-effect relationships between the points that are raised. The best way to see how far AI-powered automatic meeting summaries have come is to try it out. We even offer a free trial with no credit card required.
Nevertheless, the interesting question is where the field is headed. Although predicting the future is tremendously difficult, some trends are on the horizon. Firstly, new architectures like Switch Transformers allow LLMs to be scaled to more than a trillion parameters and trained on even more data, which should further improve summarisation quality. Secondly, audio-to-text transcription will keep getting better, capturing new and specialised words more accurately, which will deliver better results overall. Lastly, the field will incrementally move towards real-time capabilities, allowing summaries to be created nearly instantaneously as every step in the process becomes significantly faster. All this paints a bright picture for the future of automatic meeting summaries, powered by the latest advancements in deep learning.
If you want to join us on the journey of building the best meeting AI assistant in the world, feel free to drop us a message.