Retrieval Augmented Generation for LLM Hallucinations

People have become aware that when you ask a GenAI tool questions that go beyond the surface level, or that concern recent events outside its training data, it tends to “hallucinate”. In this state, the AI writes inaccurate responses or nonsense unrelated to the question asked.

This is a very common problem with Generative AI (Artificial Intelligence), and many organizations are researching techniques to keep GenAI tools from confusing themselves to the point that they no longer provide relevant information.

As technology improves, the time available to deliver definite solutions keeps shrinking. In some cases, say a project in an IT company, it is impractical to keep fine-tuning or retraining your LLM (Large Language Model) just to stay updated on current events. This is especially true when the data you want to add is massive.

These “hallucinations” cause a huge problem, particularly for those using GenAI for customer support. While GenAI is very helpful and cost-effective for businesses, it has to get the nitty-gritty details unique to the business right, or it will continue to provide wrong information.

To combat this problem, a concept was introduced that addresses most of the limitations mentioned above: Retrieval Augmented Generation (RAG), implemented on top of Generative AI/LLMs.

Start Implementing RAG-Based AI Customer Support with AptEdge - Request a Demo

Retrieval Augmented Generation for LLM Limitations

One of the most common limitations of an LLM is that it can only answer from the data it was trained on. For example, at the time of writing, ChatGPT is trained only on data up to 2022. If you ask about events in 2023, the answers are likely to be incorrect. This is detrimental when you’re trying to provide customer support in real time but can’t use data about new customers, because updating all the data and fine-tuning the model on it is such a large task.

This problem can be solved by adding RAG to your LLM. It involves retrieving new knowledge relevant to the question, adding it to the context, and then using that context to generate the answer.

1. Frozen in time

Once an LLM is created and trained, it stays “stuck” with the information it was trained on. In NLP especially, adding new words to the vocabulary means revisiting the dictionaries, definitions, and so on, so that the model handles the new word correctly instead of assigning it the wrong meaning.

It can’t handle current events well, which means that if your business is working on growing its customer base, you will have difficulty keeping the model up to date.

2. Too expensive to handle

Another problem is that maintaining and training LLMs is ridiculously expensive. Maintaining ChatGPT reportedly costs OpenAI $3.2 million. So you can see why many companies are reluctant to upgrade or fine-tune a model, since it might have to be done frequently as new data keeps appearing. Imagine spending over $3.2 million every week! Even the tech giants would think twice about that.

These models are trained to answer every question the user asks, and in a way that discourages them from saying “I don’t have any clue”. Like a person answering a question on a topic they don’t fully understand, the Generative AI model will ‘confidently’ say something that seems to match your question. It looks convincing because it is said with such conviction. But if you check its work, you may well find it is completely wrong. Saying something wrong with confidence is what people mean by the AI “hallucinating”.

This causes a lot of problems when you’re using Gen AI for customer support. You can’t spend millions of dollars tuning your AI model to prevent hallucinations; that’s highly impractical. This is where RAG (Retrieval Augmented Generation) comes in. With RAG, you don’t have to fine-tune the model for every new piece of information. Instead, for any query you ask, the model goes to a knowledge base (there can be several, and they can be updated easily) and pulls the relevant data from it.
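To make that flow concrete, here is a minimal sketch of the retrieve-then-augment step in Python. Everything in it is an assumption made for illustration: the three knowledge-base entries are invented, the function names are hypothetical, and plain word overlap stands in for a real retriever. The resulting prompt is what you would hand to whichever LLM you use.

```python
import re

# Hypothetical knowledge base; in practice these would be your support articles.
KNOWLEDGE_BASE = [
    "Refunds are issued within 5 business days of an approved return.",
    "Password resets are done from the Account Settings page.",
    "The Pro plan includes priority email support.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents sharing the most words with the query.

    Word overlap is a simple stand-in for a real retriever.
    """
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, documents: list[str], top_k: int = 1) -> str:
    """Put the retrieved passages in front of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, top_k))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


prompt = build_prompt("How long do refunds take?", KNOWLEDGE_BASE)
print(prompt)
```

Updating what the model "knows" then comes down to editing the entries in the knowledge base, with no retraining involved.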

Addressing GenAI Hallucinations in Large Language Models

Generative AI (Gen AI) tools such as Bard, ChatGPT, and Claude are built on LLMs (Large Language Models), which are trained on voluminous datasets to mimic human interactions. Even though LLMs are one of the more recent breakthroughs in automation, they come with a caveat: they are monoliths that are frozen in time once created. Updating them to reflect current events is a mountainous task. While data is plentiful, it takes a lot of time and money to fine-tune a model when new information comes out every second and renders previous information obsolete or, in some cases, wrong. There are a couple of ways the AI can hallucinate.

Types of Hallucinations

1. Current Events Hallucination

Say you want to learn about current events, like who the president of the United States of America is, or who the Prime Minister of the UK is. You can ask ChatGPT, but it comes with a warning that, unlike Google, it isn’t up to date, as you can see below.

[Figure: Current Events Hallucination]

This is totally wrong, since at the time of writing the Prime Minister of the UK is Rishi Sunak. Now you can see where the problems start.

2. Hallucinating in math problems

When it comes to mathematical problems, which require critical thinking and logical reasoning, Generative AI falls short. Here’s an example.

[Figure: Hallucinating in math problems]

Normally, if you were to solve this, you would get $151. ChatGPT is able to multiply and get the cost of each product, but it performs the addition wrongly. This is because ChatGPT is an AI language model, and Generative AI doesn’t necessarily come with an inbuilt calculator. That is another reason why ChatGPT “hallucinates” when solving maths problems.
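One common way around the missing calculator is to let the language model write out the arithmetic expression and have ordinary code compute the result. The sketch below is only an illustration of that idea: the `evaluate` helper is hypothetical, and the order total is invented rather than taken from the example above.

```python
import ast
import operator

# Only these arithmetic operators are allowed.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression: str) -> float:
    """Evaluate +, -, *, / arithmetic without calling eval()."""

    def walk(node):
        # Plain numbers (booleans are rejected even though they are ints).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        # Binary operations such as 3 * 12 or a + b.
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        # Negative numbers such as -5.
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


# Invented order: 3 items at $12, 5 items at $7, 2 items at $20.
total = evaluate("3 * 12 + 5 * 7 + 2 * 20")
print(total)
```

Because the expression is parsed and checked node by node, anything other than basic arithmetic is rejected instead of being executed.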

Ways to Reduce Hallucinations

There are other ways to combat Generative AI hallucinations. Some of them are:

1. Model Fine-Tuning

From time to time, you can update the training data, fine-tune the parameters, or set up default responses to make the Generative AI less likely to hallucinate. In fact, OpenAI spent quite a while fine-tuning the LLM behind ChatGPT.

2. Iterative Prompting

Instead of regular prompt engineering, you ask the same query in different ways a couple of times. This makes the AI agent in the model go back and forth between the vector databases.

It uses the concept of Forward-Looking Active Retrieval augmented generation, aka FLARE.

[Figure: Iterative Prompting: Forward-Looking Active Retrieval Generation]

Here, the AI agent goes back and forth to the vector database. Used in combination with RAG, iterative querying lets the LLM find the semantic context of the query and then generate a set of similar queries against the knowledge base in the vector database. It then compiles all the responses into one complete response and returns it.

One of the best-known examples of iterative querying is LangChain, which specializes in connecting LLMs and databases. Another option is to use AptEdge’s product, AnswerGPT, with your business’s customer support.
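The "ask the same thing several ways, then combine the results" pattern described above can be sketched roughly as follows. This is not FLARE itself, not LangChain's API, and not how AnswerGPT works; it is a simplified illustration with hypothetical names. A fixed table of rephrasings stands in for the LLM that would normally generate them, and word overlap stands in for the vector database lookup.

```python
import re

# Invented support articles standing in for the knowledge base.
DOCS = {
    "doc-1": "To reset your password, open Account Settings and choose Reset.",
    "doc-2": "Locked accounts unlock automatically after 30 minutes.",
    "doc-3": "Invoices are emailed on the first day of each month.",
}

# Hypothetical stand-in for an LLM that rephrases the customer's question.
REPHRASINGS = {
    "I can't log in": ["reset password", "locked accounts"],
}


def words(text: str) -> set[str]:
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, top_k: int = 1) -> list[str]:
    """Return ids of the documents sharing the most words with the query."""
    query_words = words(query)
    scored = [(len(query_words & words(text)), doc_id) for doc_id, text in DOCS.items()]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
    return [doc_id for _, doc_id in hits[:top_k]]


def iterative_retrieve(question: str) -> list[str]:
    """Search with the question and each rephrasing, merging unique results."""
    queries = [question] + REPHRASINGS.get(question, [])
    merged = []
    for query in queries:
        for doc_id in search(query):
            if doc_id not in merged:
                merged.append(doc_id)
    return merged


print(search("I can't log in"))              # the original wording finds nothing
print(iterative_retrieve("I can't log in"))  # the rephrasings find two articles
```

The merged list of documents is what would then be compiled into a single answer for the customer.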

How AptEdge’s AnswerGPT works

[Figure: How AptEdge’s AnswerGPT works]

How Does Retrieval Augmented Generation Work?

RAG uses the concept of semantic search, which in layman's terms means that the system should understand the context, “read between the lines”, and match results by meaning instead of doing a simple keyword match. The model goes back and forth to relevant knowledge bases, which are updated frequently and stored as vectors.

[Figure: How Retrieval Augmented Generation Works]

Here, the data for the knowledge base can be taken from the dataset used to train the LLM and stored as such. When a query is asked, the system looks it up in an index over the knowledge base and uses the closest matches to provide the most accurate response. This is very helpful in a customer support scenario, since the knowledge base can be refreshed periodically and the model can draw on all of that information.

This data can also be supplied by users, with the LLM told to base its answers on it.
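The difference between a keyword match and a semantic match is easier to see with a toy example. In the sketch below, the three-number vectors are hand-written and invented purely for illustration (a real system would get them from an embedding model), and the function names are hypothetical.

```python
import math
import re

QUERY = "How do I get my money back?"

# Hand-written 3-dimensional vectors, invented for illustration only.
# A real system would obtain them from an embedding model.
QUERY_VECTOR = [0.9, 0.1, 0.0]
DOCUMENTS = {
    "Refund policy: returns accepted within 30 days.": [0.8, 0.2, 0.1],
    "Back up your data before upgrading.": [0.0, 0.2, 0.9],
}


def words(text: str) -> set[str]:
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def best_keyword_match(query: str, documents: dict) -> str:
    """Pick the document that shares the most words with the query."""
    return max(documents, key=lambda doc: len(words(query) & words(doc)))


def best_semantic_match(query_vector: list[float], documents: dict) -> str:
    """Pick the document whose vector points closest to the query vector."""
    return max(documents, key=lambda doc: cosine_similarity(query_vector, documents[doc]))


print(best_keyword_match(QUERY, DOCUMENTS))          # misled by the shared word "back"
print(best_semantic_match(QUERY_VECTOR, DOCUMENTS))  # finds the refund policy
```

The keyword match latches onto the word "back", while the vector comparison picks the refund policy even though it shares no words with the question.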

Base Model

Before you start implementing RAG, you’ll need a base LLM, like the one behind ChatGPT. After training and testing the LLM, you can deploy it, for example as open-source software. (Note: ChatGPT is not open source.)

After building and deploying it, you’ll need to keep feeding it new information to prevent it from giving wrong answers.

Build a vector database

So that the knowledge base can be easily replaced and changed, you store its data in a vector database.

[Figure: Vector database]

A vector database enables high-speed similarity search over that data. Combine that with semantic search and you have an ingenious plan.

For this kind of query, vector databases are typically faster than a traditional RDBMS, because they index the vectors so that a search only visits the entries most likely to be relevant to your query. An RDBMS without such an index would have to compare the query against every single record, which takes a substantial amount of time in very large databases.

Additionally, you can choose among various search algorithms for vectors, something most other types of databases are not designed for.
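As a rough picture of what a vector database does, here is a tiny in-memory stand-in. It is a sketch under loud assumptions: the class and its method names are hypothetical, the vectors and plan prices are invented, and it compares the query against every entry, whereas a real vector database would use an index to avoid exactly that.

```python
import math


class TinyVectorStore:
    """In-memory stand-in for a vector database (brute-force search)."""

    def __init__(self):
        self._items = {}  # item id -> (vector, text)

    def upsert(self, item_id, vector, text):
        """Insert a new entry or overwrite an existing one."""
        self._items[item_id] = (list(vector), text)

    def delete(self, item_id):
        """Remove an entry if it exists."""
        self._items.pop(item_id, None)

    def search(self, query_vector, top_k=1):
        """Return (id, text) pairs for the top_k most similar entries."""

        def similarity(vector):
            # Cosine similarity between the query and a stored vector.
            dot = sum(q * v for q, v in zip(query_vector, vector))
            norms = math.hypot(*query_vector) * math.hypot(*vector)
            return dot / norms if norms else 0.0

        ranked = sorted(
            self._items.items(),
            key=lambda item: similarity(item[1][0]),
            reverse=True,
        )
        return [(item_id, text) for item_id, (_, text) in ranked[:top_k]]


store = TinyVectorStore()
store.upsert("pricing", [1.0, 0.0], "The Pro plan costs $20 per month.")
store.upsert("hours", [0.0, 1.0], "Support is open 9am to 5pm.")

# Knowledge changed: overwrite the entry, with no retraining needed.
store.upsert("pricing", [1.0, 0.0], "The Pro plan costs $25 per month.")

result = store.search([0.9, 0.1])
print(result)
```

The second `upsert` on "pricing" is the point of the example: replacing a fact is a single write to the store.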

Why is RAG preferred for customer support?

The main reason RAG is preferred over other methods is its relatively low use of resources. Here are the key reasons it suits customer support.

1. RAG is the Most Cost-Effective

Building a vector index over your knowledge base takes a lot less time than retraining and is more efficient too. In a business, you can’t simply leave Gen AI to “hallucinate” and cause problems, and you can’t fine-tune the models frequently because of the high cost. OpenAI reportedly spends over $11 million to maintain ChatGPT’s LLM, and not every business can spend that kind of money on fine-tuning.

2. RAG is Easy to Implement

The second reason is that it takes a lot less time than other approaches. You simply add AptEdge’s customer support to the LLM you’ve trained for your business. Instead of other expensive, time-consuming methods, you just need to build vector databases and index them. Vector database capabilities are available in products such as DataStax’s AstraDB, which can be integrated with your LLM.

3. RAG for Faster Response

Retrieval Augmented Generation, as mentioned earlier, uses vector databases with indexed contents. This allows fast responses to your customers’ queries and supplies semantic context, which plays a major role in customer support, where questions are often vague.

On top of this, fast search algorithms, such as greedy best-first search among many others, can be used to make it even faster, depending on your choices.
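To give a feel for greedy best-first search in this setting, the sketch below walks a small graph of points, always expanding the candidate closest to the query. It is one possible reading of the idea, not the algorithm of any particular product: the points, the neighbor links, and the function name are all invented, and production indexes use more elaborate variants.

```python
import heapq
import math


def greedy_best_first(points, neighbors, query, entry):
    """Return the id of the closest point reachable by greedy expansion.

    points:    id -> coordinates
    neighbors: id -> list of linked ids
    entry:     id of the point where the search starts
    """

    def distance(node):
        return math.dist(points[node], query)

    visited = {entry}
    frontier = [(distance(entry), entry)]  # min-heap ordered by distance
    best = (distance(entry), entry)

    while frontier:
        dist, node = heapq.heappop(frontier)
        if dist > best[0]:
            break  # every remaining candidate is farther than the best so far
        for neighbor in neighbors[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                neighbor_dist = distance(neighbor)
                heapq.heappush(frontier, (neighbor_dist, neighbor))
                if neighbor_dist < best[0]:
                    best = (neighbor_dist, neighbor)
    return best[1]


# Invented 2-D points and links between nearby points.
POINTS = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (3, 0), "e": (0, 3)}
NEIGHBORS = {"a": ["b", "e"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "e": ["a"]}

nearest = greedy_best_first(POINTS, NEIGHBORS, query=(2.9, 0.1), entry="a")
print(nearest)
```

The search stops as soon as the closest remaining candidate is farther away than the best point found, which is what makes it fast. The trade-off is that on a poorly connected graph it can settle on a point that is close but not the closest.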

[Figure: RAG for Faster Response]

It is said that customer satisfaction increases when a problem is resolved in fewer chats. GPT-powered answers from Generative AI can cut response time to a fraction of a second. You can use AptEdge’s Edge automation techniques to improve your chat resolution time.

4. RAG Increases Performance

Thanks to fast responses and a combination of algorithms, this architecture increases the performance of the model. With RAG added to your LLM, hallucinations become far less of a problem. Adding RAG to your LLM architecture can transform customer support.



Generative AI has transformed the digital world in ways never seen before. But as every rose has its thorn, Gen AI has some kinks that need to be ironed out. Retrieval Augmented Generation (RAG) is one of the best ways to iron out hallucination problems, and it also improves the quality of your LLM’s answers. RAG works best in combination with other techniques such as iterative querying, search algorithms, and so on.

It has been estimated that Generative AI will create many more skilled jobs and far higher profits for businesses. Because of this, it has become equally important to research techniques that improve its responses. To help you with this, AptEdge provides customer support with RAG.

Get Going Today!

AptEdge is easy to use, works out of the box, and ready to go in minutes.