How Retrieval-Augmented Generation Delivers Better GenAI Answers

The team of researchers who first introduced the world to the concept of retrieval-augmented generation (RAG) can perhaps be forgiven for also coining its less-than-ideal acronym. After all, RAG elegantly addresses some of the key limitations of large language models (LLMs) employed by generative AI (genAI) tools such as ChatGPT, Meta AI, and Google’s Gemini. Subpar acronym aside, RAG is relevant to anyone interested in improving LLMs’ effectiveness. Crucially, RAG sharpens the accuracy and relevance of responses by tethering models to specific, trusted, and up-to-date knowledge bases. RAG also enables users to trace these responses back to the source. This highly efficient technique also cuts down on the need to train models on vast swathes of data in the first place. As a result, RAG gives data-driven enterprises exciting opportunities to add business value. And the potential use cases stretch well beyond the world of the chatbot to encompass fields such as data analytics.


Welcome to the RAG Generation

The landmark paper that first introduced RAG to the supercharged world of genAI development was published by a group of Meta researchers back in 2020. Four years on, RAG now represents another potentially valuable, easily accessible AI tool. However, before taking a deeper dive into how RAG works, it’s worth reviewing the core capabilities and limitations of LLMs and genAI. Why? Because RAG is all about mitigating weaknesses and harnessing strengths.


Trust Me, I’m a Large Language Model

Even a casual user of genAI tools like ChatGPT will acknowledge that they’re hit-or-miss. Sometimes the results are brilliant. Occasionally they are laughably (and even dangerously) off the mark. More typically, the responses simply err on the side of generalization.   

This is for good reason. LLMs are generally very effective at accessing relevant data. The leading genAI tools are also good at creating natural, human-like text responses. However, LLMs are typically trained on huge, publicly available datasets. Consequently, they often don’t have access to the most up-to-date information. And it probably won’t come as a surprise to learn that not everything on the web is true. So LLMs are not only exposed to a lot of irrelevant data, but in many cases, they’re also exposed to information that’s misleading or simply incorrect. In addition, an LLM will sift through a lot of material before concluding that essential knowledge is missing. In such situations, the nature of genAI is to generate a response, not admit it can’t find the complete answer to the query. 

All these characteristics create obstacles for organizations looking to utilize LLMs. Tools such as ChatGPT have built an unenviable reputation for inventing answers, known as “hallucinating.” What’s more, LLMs don’t allow users to check what the model is basing its responses on. Inevitably, this creates trust issues. In addition, enterprises must factor in the time and expertise required to train and/or fine-tune models in the first place.


How Retrieval-Augmented Generation Works

As the name suggests, RAG overcomes these barriers by integrating the best elements of both retrieval and generation components to optimize LLMs’ outputs. In a nutshell, that means leveraging both LLMs’ ability to retrieve relevant data and genAI’s ability to create high-quality, contextual, natural language responses.

Reflecting this, the RAG process comprises two distinct phases: retrieval and generation. RAG also utilizes several novel components and techniques that are critical to the speed and accuracy with which it can deliver answers to users’ queries. These include an embedding model, which employs a process called text embedding to convert text into a machine-readable numerical format known as a vector. Significantly, these vectors can capture words’ nuance and “real” meaning. 
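
To make the idea of text embedding concrete, here is a minimal sketch in Python. It is only a stand-in: the function name toy_embed, the 16-slot vector size, and the word-hashing scheme are all assumptions made for illustration. A real embedding model is a trained neural network that captures meaning, whereas this toy only shows the shape of the operation, with text going in and a fixed-length list of numbers coming out.

```python
import hashlib
import math

DIM = 16  # hypothetical vector size; real embedding models typically use hundreds of dimensions


def toy_embed(text: str) -> list[float]:
    """Map text to a fixed-length unit vector by hashing each word into a slot.

    Illustrative only: this does NOT capture meaning the way a trained
    embedding model does.
    """
    vec = [0.0] * DIM
    for word in text.lower().split():
        # md5 is used only as a deterministic hash, not for security.
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # Normalize so that vectors of different-length texts are comparable.
    return [x / norm for x in vec] if norm else vec


vector = toy_embed("How many vacation days do I get?")
print(len(vector))  # 16
```

Whatever the length of the input text, the output always has the same number of dimensions, which is what allows vectors to be stored and compared efficiently.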

Figures 1 and 2. RAG sharpens the accuracy and relevance of responses generated by a conventional LLM such as ChatGPT (Figure 1) by tethering the process to trusted and up-to-date knowledge in the form of a vector database (Figure 2).

The embedding model converts the trusted data specified by the model owner into vectors, and stores them in a vector database (also known as a vector store). Vectors are important because they support semantic searches. Compared to a simple keyword search, semantic searches are based on a more sophisticated understanding of both the user’s query and the meaning and nuance encompassed within the vectors stored in the vector database. 
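
The difference between keyword and semantic search can be sketched in a few lines of Python. Everything here is hypothetical: the three policy sentences, the hand-assigned three-dimensional vectors (a real embedding model would produce them), and the function names. Cosine similarity is used as the comparison measure because it is a common choice for vector search, not because the source specifies it.

```python
import math


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical vectors, hand-assigned for illustration only.
# The dimensions loosely stand for: [time off, expenses, IT security].
vector_store = {
    "Employees receive 25 days of annual leave.": [0.9, 0.1, 0.0],
    "Submit travel receipts within 30 days.": [0.1, 0.9, 0.0],
    "Passwords must be rotated every 90 days.": [0.0, 0.1, 0.9],
}


def semantic_search(query_vector, store, top_k=1):
    """Return the top_k texts whose vectors are closest to the query vector."""
    ranked = sorted(store.items(), key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def keyword_search(query, store):
    """Return texts that share at least one exact word with the query."""
    terms = set(query.lower().split())
    return [text for text in store if terms & set(text.lower().rstrip(".").split())]


query = "vacation allowance"
query_vector = [0.8, 0.2, 0.1]  # assumed output of an embedding model for this query

print(keyword_search(query, vector_store))         # [] because no words are shared
print(semantic_search(query_vector, vector_store))  # the annual leave sentence
```

The keyword search finds nothing because "vacation allowance" shares no words with the stored text, while the vector comparison still surfaces the annual leave policy.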

As Figure 2 shows, RAG utilizes these components in what’s essentially a detour in the conventional process flow of an LLM (Figure 1). In the retrieval phase, the user’s query is initially sent to an embedding model to create a vector. This vector is compared to the contents of the vector database to find a match or matches, and the associated data is retrieved. At this point, the retrieved data is synthesized with the user’s query to craft a much stronger, more contextually appropriate prompt, perhaps best described as a “RAG-enriched smart prompt.” In the generation phase, the LLM creates a contextualized, natural language response. 
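
The retrieval and generation phases described above can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not a production design: toy_embed is a word-count stand-in for a real embedding model, VECTOR_DB is an in-memory list rather than a real vector database, the two policy sentences are invented, and generate is a placeholder where a call to an actual LLM would go.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for an embedding model: a sparse word-count vector."""
    words = (w.strip(".,?!") for w in text.lower().split())
    return Counter(w for w in words if w)


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical trusted documents chosen by the knowledge-base owner.
KNOWLEDGE_BASE = [
    "Employees receive 25 days of annual leave per year.",
    "Travel receipts must be submitted within 30 days.",
]
VECTOR_DB = [(toy_embed(doc), doc) for doc in KNOWLEDGE_BASE]


def retrieve(query, top_k=1):
    """Retrieval phase: embed the query and find the closest stored passages."""
    query_vector = toy_embed(query)
    ranked = sorted(VECTOR_DB, key=lambda entry: cosine(query_vector, entry[0]), reverse=True)
    return [doc for _, doc in ranked[:top_k]]


def build_prompt(query, passages):
    """Combine the retrieved passages with the user's query into one prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. If it is insufficient, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


def generate(prompt):
    """Generation phase placeholder: a real system would call an LLM here."""
    return f"[LLM response to a {len(prompt)}-character prompt]"


query = "How many days of annual leave do employees receive?"
passages = retrieve(query)
prompt = build_prompt(query, passages)
print(prompt)
print(generate(prompt))
```

The enriched prompt carries the trusted passage alongside the question, so the model is asked to answer from supplied context rather than from whatever it absorbed during training.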

Of course, with RAG, the knowledge base utilized by the LLM is defined and controlled by the owner. In other words, the owner has the keys to the cabinet: they decide what goes into the cabinet, what needs removing and replacing, what stays outside, and how it stays up to date. RAG makes editing easy, allowing changes to be made on the fly, sparing the owner the time and effort involved in fine-tuning a conventional LLM. 
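
A rough sketch of what "editing on the fly" can look like in practice is shown below. The KnowledgeBase class, its method names, and the document IDs are all hypothetical, and toy_embed is again a word-count stand-in for a real embedding model. The point is only that replacing or removing an entry changes what is retrieved immediately, with no model retraining involved.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for an embedding model: a sparse word-count vector."""
    words = (w.strip(".,?!") for w in text.lower().split())
    return Counter(w for w in words if w)


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KnowledgeBase:
    """Hypothetical owner-controlled store: doc_id -> (vector, text)."""

    def __init__(self):
        self._entries = {}

    def upsert(self, doc_id, text):
        """Add a document, or replace it if the ID already exists."""
        self._entries[doc_id] = (toy_embed(text), text)

    def remove(self, doc_id):
        self._entries.pop(doc_id, None)

    def search(self, query):
        """Return (doc_id, text) of the best match, so answers stay traceable."""
        if not self._entries:
            return None
        query_vector = toy_embed(query)
        doc_id, (_, text) = max(
            self._entries.items(), key=lambda item: cosine(query_vector, item[1][0])
        )
        return doc_id, text


kb = KnowledgeBase()
kb.upsert("leave-policy", "Employees receive 25 days of annual leave.")
kb.upsert("expenses-policy", "Travel receipts must be submitted within 30 days.")
before = kb.search("annual leave days")

# The policy changes: overwrite the entry. No retraining or fine-tuning is needed.
kb.upsert("leave-policy", "Employees receive 28 days of annual leave.")
after = kb.search("annual leave days")

print(before)
print(after)
```

Because each result comes back with its document ID, the same structure also supports tracing a response back to its source.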


RAG to Riches 

Most organizations require models that understand their specific fields of operation and tap into their knowledge bases and sources of expertise. Equally, organizations don’t want their model output clouded by a vast and often irrelevant mass of publicly available information. They also need to be able to explain how the results are generated. When employing genAI tools that tap into public domain content, copyright and IP issues are another area of growing concern.  

Essentially, RAG offers a way to harness the power of genAI and the LLM while sidestepping their limitations. In terms of use cases, the obvious first port of call is the chatbot. Here, RAG offers the opportunity to deliver far more accurate responses. Chatbots can be tethered to an organization’s relevant, up-to-date policy documents and guidelines, which dramatically reduces the need for script writing. But RAG’s potential extends much further. Medical diagnostics and the legal profession are just two environments where users must draw on trusted sources and specialized knowledge and employ very specific terminology. In education, the creation of interactive training courses tailored to individual learning requirements also needs to be based on recognized and authoritative sources.

RAG offers similarly exciting opportunities in areas that include data analytics. Moreover, platforms such as Altair® RapidMiner® are simplifying and democratizing deployment. Techniques including text embedding and vector stores are brought together, as is access to open source resources such as Hugging Face. Solutions can be implemented quickly, without needing to write a single line of code. 

Figure 3. Within the Altair RapidMiner platform, users can take advantage of an intuitive tool that mirrors the step-by-step RAG workflow introduced in Figure 2.  


Time to Focus

It’s easy to understand why so many people regard genAI as a fresh-faced upstart, inextricably linked with the launch of ChatGPT. In reality, genAI’s history stretches back to the 1960s. However, there’s no doubt that the landscape of genAI is now evolving at unprecedented speed. Alongside popular blockbusters from the likes of Google, Meta, and OpenAI, future development will continue to be defined by the creation of more specialized, more focused, and more efficient tools such as RAG. Overall, RAG addresses many of the flaws of conventional LLMs, from outdated knowledge and hallucination to the lack of traceable sources, opening the door to a new generation of AI use cases shaped by accuracy, trust, and accountability.


RAG and Altair RapidMiner

The Altair RapidMiner platform includes a versatile, customizable extension that provides a secure and cost-effective way of leveraging the broad capabilities of genAI, including RAG. Users can also enjoy easy access to hundreds of thousands of LLMs. 

What’s next for RAG? Altair’s acquisition of Cambridge Semantics promises another significant step forward in this exciting branch of AI. Instead of relying on embedding models and third-party vector databases, users will soon be able to take advantage of their own on-premises graph databases to create smart prompts.

For further details, visit