How BERT is revolutionizing Knowledgebases

Anthony Loss
22 June 2022
It feels like every online product or platform has incorporated some type of machine learning today.  Online retailers like Walmart, Target, and Amazon use machine learning to understand users’ behavior and recommend products.  If you have Gmail, Google uses machine learning to sort your emails into categories such as Primary, Promotions, or Social.  Have you sent a text message recently?  Notice how your phone suggests your next word?  Yes, that’s machine learning, too (specifically a decoder model, but we’ll get into that soon).

There are basically three types of machine learning: supervised, unsupervised, and reinforcement.  For this article, we’re going to focus on unsupervised learning.  Unsupervised learning uses machine learning algorithms to analyze and cluster unlabeled datasets. These algorithms determine hidden patterns or data groupings without the need for human intervention (hence “unsupervised”).
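To make "clustering unlabeled data" a little more concrete, here is a minimal sketch of k-means, one common unsupervised algorithm.  The function name `kmeans_1d` and the purchase amounts are made up for illustration, and the starting centers are chosen in a deliberately simple way; treat it as a toy, not a production implementation.

```python
def kmeans_1d(points, k=2, iterations=10):
    """Group unlabeled numbers into k clusters with a basic k-means loop."""
    # Simplification: start from the k smallest distinct values.
    # Real implementations usually pick random starting centers.
    centers = sorted(set(points))[:k]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its points.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters


# Made-up order totals with no labels attached.
purchases = [5, 7, 6, 120, 130, 125]
groups = kmeans_1d(purchases)
print(groups)  # [[5, 7, 6], [120, 130, 125]]
```

Nobody told the algorithm which orders were "small" or "large"; it found the two groups on its own, which is the "hidden patterns without human intervention" idea in miniature.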

The topic of our time together today is BERT, a machine learning model that is pre-trained on unlabeled text (unsupervised or, more precisely, self-supervised learning), and how it’s revolutionizing today’s knowledgebases.  Let’s define BERT first.  BERT stands for Bidirectional Encoder Representations from Transformers.  Makes sense, right?

Now, without diving too far into the T in BERT, I should explain what a Transformer is.  The Transformer architecture was introduced by Google researchers in 2017 to replace earlier, sub-par models and improve results on natural language processing (NLP) tasks.  NLP is simply the ability of a computer program to understand human language as it is spoken and written.  To simplify even more, a Transformer is built around an attention mechanism that learns contextual relations between words.  In its most basic form, it includes two separate mechanisms: an “encoder” that reads the text input and a “decoder” that produces a prediction for the task.

Let’s circle back to BERT.  BERT is a Transformer model, and yes, it was designed by Google.  It’s an increasingly popular encoder model that takes words, or sentences, as input and outputs a numerical representation for each word.  The numerical representation is called a feature vector.  One vector per word (strictly speaking, per token).  The dimension of the vector is fixed by the BERT model you use (768 for BERT-base, for example).  Each vector doesn’t just describe the word itself, but also the words around it, on both sides (hence bidirectional).  So, for example, if the words “Welcome to NYC” were fed into BERT, the “to” vector would also carry information from “Welcome” and “NYC”.  This is what provides context.
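If you’d like to see how a vector can "carry information" from its neighbors, here is a toy sketch of self-attention, the core operation inside a Transformer encoder.  It is heavily simplified: the 3-number starting vectors are made up, and the learned projection weights, multiple heads, and stacked layers that real BERT uses are left out.  The function names are my own, not BERT’s API.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Return one context-aware vector per input vector."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Score the current word against every word, to its left and right.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # The new vector is a weighted mix of all the word vectors.
        mixed = [sum(w * v[d] for w, v in zip(weights, vectors))
                 for d in range(dim)]
        outputs.append(mixed)
    return outputs


# Hypothetical starting vectors, made up for illustration.
words = ["Welcome", "to", "NYC"]
embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

contextual = self_attention(embeddings)
to_vector = contextual[words.index("to")]
print(to_vector)
```

Before attention, the “to” vector was zero in the “Welcome” and “NYC” positions.  Afterwards every position is non-zero, because the output blends in a share of both neighbors while keeping the largest share for “to” itself.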

This methodology makes BERT an excellent candidate for a Question/Answer tool.  BERT can extract answers to questions like “Who invented the Transformer architecture?”, provided the answer is somewhere in the dataset.  Another reason BERT is becoming widely used is that it needs minimal to no upkeep.  Fine-tuning a BERT model can absolutely enhance performance, but it’s not necessary.

How does this relate to knowledgebases?  Well, a true knowledgebase is designed to transfer knowledge to its users.  In the past, a knowledgebase was searched with “key words”, and the only data returned was data that included those “key words”.  However, to make knowledgebases more helpful, knowledgebase providers wanted a way for users to search their data even if they didn’t have those exact keywords.
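Here is a small sketch of that old-style keyword search and where it falls short.  The `keyword_search` function and the two article titles are hypothetical, made up purely to illustrate the point.

```python
def keyword_search(query, documents):
    """Return the documents that contain every query word verbatim."""
    terms = query.lower().split()
    return [doc for doc in documents
            if all(term in doc.lower().split() for term in terms)]


# Hypothetical knowledgebase entries, made up for illustration.
articles = [
    "how to reset your password",
    "steps to recover a locked account",
]

exact = keyword_search("reset password", articles)
missed = keyword_search("forgot login", articles)
print(exact)   # ['how to reset your password']
print(missed)  # []
```

The second query is asking about the very same problem, but because it shares no exact words with the article, the user gets nothing back.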

Here’s where machine learning, more specifically BERT, comes in.  Remember how we said BERT is great for Q/A?  I hope so, because BERT is applied in a ton of today’s knowledgebases.  Knowledgebase providers are implementing BERT so that their tools are more helpful to their users.  Its natural language processing/understanding lets users ask the knowledgebase questions in everyday language, while BERT figures out the rest and returns answers based on the context of the question.
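One common way to search by meaning rather than by exact words is to compare vectors instead of text.  The sketch below assumes an encoder such as BERT has already turned each article and the question into a vector; the 3-number vectors here are invented stand-ins for that output, and `semantic_search` is a hypothetical helper, not any particular provider’s implementation.

```python
import math


def cosine_similarity(a, b):
    """Measure how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vector, article_vectors):
    """Return the title whose vector is most similar to the query vector."""
    return max(article_vectors,
               key=lambda title: cosine_similarity(query_vector,
                                                   article_vectors[title]))


# Made-up vectors standing in for what an encoder such as BERT would produce.
article_vectors = {
    "how to reset your password": [0.9, 0.1, 0.0],
    "quarterly sales report": [0.0, 0.2, 0.9],
}
# Pretend encoding of the question "I forgot my login".
query_vector = [0.8, 0.3, 0.1]

best = semantic_search(query_vector, article_vectors)
print(best)  # how to reset your password
```

The question and the winning article share no words at all, yet they match because their vectors point in nearly the same direction.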

You probably use BERT every day.  Google uses BERT when you search.  Other innovators in the space are using BERT in their products, too.  One such knowledgebase provider focuses its platform on your email archives as the data source.  Companies have tons of vital information trapped in their email archives, and a tool like this provides a way, via BERT, for users to search in natural language and get an answer.  All in all, you can add yet another way businesses are utilizing machine learning to propel themselves forward.  As machine learning continues to grow, we will no longer be talking about how machine learning differentiates a business, but rather how a business without machine learning can even survive.
