Using a language model to make sense of an email inbox
In September 2023, Bard (an experimental AI chatbot by Google) received a big
update, including a new ability to interact with Gmail.
In this post we will see what the new Gmail integration in Bard can do and what
its current limitations are.
How to enable Gmail in Bard?
Visit Bard, click on the Extensions icon in the upper right section and enable “Google Workspace”:
Asking about a specific conversation
Let’s start with the basics, asking the chatbot about the status of a specific
task that we know is tracked by some emails somewhere in our inbox:
Wow, this just worked!
How does it work?
When Bard answers a query based on information from Gmail, the UI lists the
emails that were used. These are displayed below the response itself.
That suggests that the integration likely
works like this:
1. The LLM reads our initial query. Behind the scenes it generates a Gmail
search query to retrieve the emails that would be related to it.
2. It picks a few of the top resulting emails and feeds them to the language
model, prompting it to answer the original question.
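The two-step flow above can be sketched with a toy in-memory inbox. Everything here is an assumption about the mechanism, not Bard's real internals: the keyword search stands in for the generated Gmail query, and the "answer" step stands in for the second LLM call.

```python
# A tiny in-memory "inbox"; each email is (subject, body).
INBOX = [
    ("Your car rental reservation", "Pickup in Milan on Oct 3."),
    ("Weekly newsletter", "Tips on writing better prose."),
    ("Boarding pass", "Flight AZ123, seat 14C."),
]

def make_search_query(question):
    # Step 1 (stub): a real system would ask the LLM to produce a
    # Gmail search query; here we just keep the question's keywords.
    stopwords = {"what", "is", "the", "my", "of", "status"}
    return [w for w in question.lower().split() if w not in stopwords]

def search_inbox(keywords, limit=5):
    # Retrieve emails matching any keyword, capped at `limit` results
    # (mirroring the at-most-5-emails behaviour observed below).
    hits = [e for e in INBOX
            if any(k in (e[0] + " " + e[1]).lower() for k in keywords)]
    return hits[:limit]

def answer(question):
    # Step 2 (stub): a real system would prompt the LLM with the
    # retrieved emails; here we just report which emails were used.
    emails = search_inbox(make_search_query(question))
    return [subject for subject, _ in emails]

print(answer("What is the status of my car rental?"))
# → ['Your car rental reservation']
```

The key design point is that the language model never scans the whole inbox: everything it says is grounded (or not) in the handful of emails the search step returns.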
🗺️ Making a map based on emails
Given that Bard now also has an integration with Google Maps, could it bridge
between Gmail and Maps and draw a map based on emails in our inbox?
… yes it can …
I love that this just worked!
The 5 email limit
In my experiments, no matter what email-related query I make, the results are
always based on at most 5 emails. This means that the tool can only handle
queries that can be answered by reading at most 5 emails, and only if the
right 5 emails can be identified.
For example, let’s try a broad query that would require consulting many email
threads: Is there anything in my Gmail I should be following up on?
Pretty good answers, but the catch applies: as far as we can tell, Bard only
uses at most 5 emails to answer the question. We may have 🔥 20 exploding
unanswered crisis-mode conversations that the tool didn’t surface, so don’t be
too reassured by this type of response :).
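The consequence of a hard retrieval cap can be seen in a one-line toy calculation (the cap of 5 is the observed behaviour; the numbers are made up to match the example above):

```python
# 20 unanswered crisis-mode threads, but only 5 can ever be retrieved.
urgent_threads = ["crisis-%d" % i for i in range(20)]

RETRIEVAL_CAP = 5
surfaced = urgent_threads[:RETRIEVAL_CAP]
missed = len(urgent_threads) - len(surfaced)

print(missed)  # 15 urgent threads the answer never mentions
```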
In my experiments, Bard would occasionally respond in a way that was completely
detached from the retrieved emails.
This seems pretty legit, but I didn’t remember any of these conversations. Then
I took a look at the “emails used” for this response:
A newsletter about writing, a boarding pass, two train travel notifications and
one hotel booking. It turns out that the response above was completely made up.
This seems to be an example of what’s called “hallucination”: the language model
is responding in a way that sounds convincing, but isn’t grounded in facts. In
my experiments in September 2023 this was happening less than 10% of the time,
but was quite stark when it did.
🧹 Deleting unimportant emails may improve accuracy
Over the 15 years that I’ve been using Gmail, I have accumulated tens of thousands
of low-interest automated emails of all sorts: social media notifications,
promotional emails, etc. They don’t take up much space, so I never had a reason
to clean them up. Until now, that is!
As we saw above, Bard’s ability to answer queries using Gmail is based on
finding the right 5 emails. The more noise we have in our archive, the higher the chance
that the noisy emails will crowd out the ones that are relevant for the query.
In my experiments I saw Bard failing to pick the right email that contained my
car booking reservation. In the screenshot below it picked 5 emails, including a
pretty obscure Couchsurfing notification from 2015 (which probably included the
word “Milan” in it), but it didn’t include the one email from the car rental
company that contained my reservation:
I then went to Gmail and used the query category:promotions older_than:1y to bulk delete thousands of old emails, including that Couchsurfing notification:
After this, the car rental booking query in Bard started to work as a charm :).
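This crowding-out effect, and why the cleanup helped, can be sketched with a toy ranking. The cap of 5 and the keyword matching are my assumptions about the mechanism, not Bard's actual internals:

```python
# Ten old promotional emails that happen to mention "Milan", plus the
# one email we actually care about.
NOISE = ["Promo #%d: deals in Milan!" % i for i in range(10)]
RELEVANT = "Your car rental reservation in Milan"
ARCHIVE = NOISE + [RELEVANT]

def top_5(archive, keyword):
    # Return the first 5 matches, like a capped search result list.
    return [s for s in archive if keyword.lower() in s.lower()][:5]

# With the noise present, the relevant email never makes the top 5...
print(RELEVANT in top_5(ARCHIVE, "Milan"))   # False

# ...but after bulk-deleting the promotional emails, it does.
cleaned = [s for s in ARCHIVE if not s.startswith("Promo")]
print(RELEVANT in top_5(cleaned, "Milan"))   # True
```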
Based on these examples, integrating a language model tool like Bard with tools
and data sources such as Gmail looks promising: it may enable handling useful
requests that require orchestration across multiple systems. This already seems
to work well when the query can be answered using a single email or a small
number of emails.
We also saw that in its current form the Gmail integration in Bard seems to
always use at most 5 emails to answer each query. That limits what types of
queries can be handled well.
The integration also relies on identifying the right 5 emails for each query.
Deleting old, low-importance emails from the archive may make it easier for the
tool to pick the right ones each time.
Personally, I’m excited to get some increasingly capable AI help in managing the
chaos of my digital life 💫.