Educational

RAG: A Deeper Dive

2024-11-25 · 14 min
RAG · Equity Research · Deep Dive

In-depth exploration of Retrieval-Augmented Generation (RAG) and its applications in equity research.

RAG – A Deeper Dive

As mentioned in a prior blog post, Retrieval-Augmented Generation (RAG) is a technique used to enhance the capabilities of Large Language Models (LLMs) by combining their generative power with the ability to access and process external information. Essentially, it allows LLMs to "look up" information they don't know, making their responses more factual, up-to-date, and relevant.

More specifically, RAG is “an advanced technique in natural language processing that combines the strengths of information retrieval and generative AI models to produce more accurate, factual, and context-aware responses. By grounding the model’s responses in external, reliable data sources, RAG addresses one of the key limitations of LLMs – the hallucination problem.”

How RAG Works

  1. Prompt: The user provides a question or prompt to the system.
  2. Information Retrieval: The system uses a retriever to search for relevant information in an external knowledge base. This could be anything from a specific document, a collection of web pages, a database, or even a code repository. The retriever often uses techniques like semantic search to find the most relevant passages rather than just keyword matching. The retrieved information is often in the form of documents or passages called "chunks."
  3. Contextualization: The retrieved information is processed and formatted to be used by the LLM. This might involve summarizing, highlighting key facts, or structuring it in a specific way.
  4. Generation: The LLM uses both the original prompt and the retrieved context to generate a response. This allows the LLM to incorporate factual information and ground its answer in evidence, rather than relying solely on its pre-trained knowledge.
  5. Response: The system presents the generated response to the user. This response is now more likely to be accurate, comprehensive, and up-to-date.
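Putting the five steps above together, a minimal RAG pipeline might look like the sketch below. The embed, vector_store, and llm objects are placeholders for whatever embedding model, vector index, and language model a real system uses; they are illustrative assumptions, not a specific library's API.

def answer_with_rag(prompt: str, vector_store, llm, embed, top_k: int = 5) -> str:
    # 1. Prompt: the user's question arrives as `prompt`.

    # 2. Information retrieval: embed the prompt and fetch the most
    #    similar chunks from the external knowledge base.
    query_vector = embed(prompt)
    chunks = vector_store.search(query_vector, top_k=top_k)

    # 3. Contextualization: format the retrieved chunks into a context block
    #    (a real system might also summarize or label each chunk's source).
    context = "\n\n".join(chunks)

    # 4. Generation: the LLM sees both the original prompt and the retrieved context.
    augmented_prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {prompt}"
    )

    # 5. Response: return the grounded answer to the user.
    return llm.generate(augmented_prompt)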

But RAG is not one-size-fits-all: there are many different styles of RAG, and the technique is evolving rapidly along with the improving capabilities of AI models.

Choosing the right RAG style depends heavily on the specific application and the desired balance between accuracy, efficiency, and complexity. Factors to consider include the size and nature of the knowledge base, the types of queries expected, and the available resources.

RAG Based on Retrieval Method

Different retrieval methods exist because each technique addresses specific challenges and involves trade-offs in the information-retrieval process. When weighing performance against speed, dense retrieval gives better semantic understanding but runs slower, while sparse retrieval is much faster but may miss related concepts.

There are also different types of search needs. For searches that need exact matches, sparse retrieval works better. On the other hand, for concept matching (e.g., finding documents about similar topics), dense retrieval works better.

Resource constraints are a consideration as well, with dense methods being more compute and memory intensive, while sparse methods can work on simple hardware.

Data characteristics and query complexity are determining factors as well, as more complex natural language queries tend to require more complex information retrieval approaches.

1. Dense Retrieval

Uses dense vector embeddings to represent both the query and the documents in the knowledge base. Semantic similarity is calculated between the query vector and document vectors, and the most similar documents are retrieved. This approach is good at capturing semantic meaning and finding relevant documents even if they don't share exact keywords with the query.

Examples in Equity Research:

  • An analyst researching Tesla might use a dense retrieval system to find information not just on Tesla directly, but also on competitor strategies, battery technology advancements, government regulations impacting electric vehicles, and consumer sentiment towards sustainable transportation.
  • The system could identify a report on lithium mining in South America as relevant, even if it doesn't explicitly mention Tesla, because it understands the connection between lithium supply and electric vehicle production.
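To make this concrete, here is a toy sketch of dense retrieval: documents and the query are both mapped to vectors by an embedding model (the embed function below is a stand-in, not a specific API), and documents are ranked by cosine similarity, so a lithium-mining report can rank highly for a Tesla query even without keyword overlap.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dense_retrieve(query: str, documents: list[str], embed, top_k: int = 3) -> list[str]:
    # Embed the query and every document with the same (hypothetical) model.
    query_vec = embed(query)
    doc_vecs = [embed(doc) for doc in documents]
    # Rank documents by semantic closeness to the query, not keyword overlap.
    scored = sorted(
        zip(documents, (cosine_similarity(query_vec, v) for v in doc_vecs)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [doc for doc, _ in scored[:top_k]]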

2. Sparse Retrieval

This relies on traditional keyword-based search techniques like TF-IDF or BM25. It looks for documents containing keywords that overlap with the query. This is generally faster than dense retrieval.

Examples in Equity Research:

  • An analyst needing to quickly find Apple's latest quarterly earnings report could use a sparse retrieval system with keywords like "Apple," "Q4 2023," "Earnings," and "Financial Results."
  • A simple equity research platform may employ this to surface past company filings based on keywords such as "10-K" and the company name.
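A toy version of sparse retrieval is sketched below: documents are scored by how often they contain the query's keywords, weighted by inverse document frequency in the spirit of TF-IDF and BM25. A production system would use a real search engine; this only illustrates the mechanics.

import math
from collections import Counter

def sparse_retrieve(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(documents)

    def idf(term: str) -> float:
        # Terms that appear in few documents carry more weight.
        df = sum(1 for tokens in tokenized if term in tokens)
        return math.log((n_docs + 1) / (df + 1)) + 1.0

    query_terms = query.lower().split()
    scored = []
    for doc, tokens in zip(documents, tokenized):
        tf = Counter(tokens)
        scored.append((doc, sum(tf[term] * idf(term) for term in query_terms)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in scored[:top_k]]

# e.g. sparse_retrieve("Apple Q4 2023 earnings financial results", filings)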

3. Hybrid Retrieval

This approach combines the strengths of both dense and sparse retrieval methods.

Examples in Equity Research:

  • An analyst researching the impact of inflation on the retail sector could use a hybrid approach. They might start with a sparse retrieval using keywords like "inflation," "retail," "consumer spending," and "CPI."
  • Then, a dense retrieval step could refine the results by identifying documents that discuss the specific impact of inflation on retail margins, supply chain disruptions, and changing consumer behavior.
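One common way to merge the two result lists is reciprocal rank fusion (RRF): a document's score is the sum of 1 / (k + rank) across every ranked list it appears in. The sketch below assumes the ranked lists come from the sparse and dense retrievers sketched earlier.

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Documents that rank well in either list float to the top of the fused list.
    return sorted(scores, key=scores.get, reverse=True)

# sparse_hits = sparse_retrieve("inflation retail consumer spending CPI", docs)
# dense_hits = dense_retrieve("impact of inflation on retail margins", docs, embed)
# fused = reciprocal_rank_fusion([sparse_hits, dense_hits])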

RAG Based on Retrieval Architecture

RAG systems have different approaches based on the architecture of the retriever mechanism. Retrieval architectures differ based on data scale and complexity, query processing needs, response time requirements, and integration requirements.

Single-stage retrieval architectures work better for smaller datasets, but large-scale systems may require multi-stage architectures.

Query processing can follow a serial architecture, handling one step at a time, or a parallel architecture, running multiple retrievals simultaneously.

Information can be retrieved with synchronous architectures (e.g., wait for the best result) or asynchronously (return initial results fast, then improve).
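As a small illustration of parallel retrieval, independent retrieval calls can be awaited concurrently rather than one after another. The two retriever functions below are empty placeholders standing in for real index queries.

import asyncio

async def retrieve_filings(query: str) -> list[str]:
    return []  # placeholder: a real implementation would query a filings index

async def retrieve_news(query: str) -> list[str]:
    return []  # placeholder: a real implementation would query a news index

async def parallel_retrieval(query: str) -> list[str]:
    # Both retrievers run concurrently, so total latency is roughly the
    # slower of the two calls rather than their sum.
    filings, news = await asyncio.gather(retrieve_filings(query), retrieve_news(query))
    return filings + news

# results = asyncio.run(parallel_retrieval("Apple supply chain risk"))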

1. Single Retriever

Uses one retriever to fetch relevant documents. This is simpler to implement but may not be as effective as using multiple retrievers.

Examples in Equity Research:

  • A basic equity research platform might use a single retriever based on keyword search over a database of company filings (10-Ks, 10-Qs, etc.).
  • When an analyst searches for "Apple financial performance," the retriever would look for documents containing those keywords within the filings database.

2. Multiple Retrievers (“Ensemble”)

Employs multiple retrievers, each using potentially different techniques or focusing on different parts of the knowledge base. Results are then combined, resulting in comprehensive coverage with improved recall and precision.

Examples in Equity Research:

  • Retriever 1: Dense retrieval on a database of news articles and analyst reports to capture broader market sentiment and qualitative insights.
  • Retriever 2: Sparse retrieval on a database of financial statements to extract precise financial data and metrics.
  • Combining information from these diverse sources using multiple retrievers allows for more nuanced and comprehensive analyses.
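A rough sketch of this two-retriever ensemble, reusing the toy dense_retrieve and sparse_retrieve functions from earlier, might look like the following; the corpus names and source labels are illustrative.

def ensemble_retrieve(query: str, news_corpus: list[str], filings_corpus: list[str],
                      embed, top_k: int = 3) -> list[tuple[str, str]]:
    # Retriever 1: dense retrieval over news articles and analyst reports.
    qualitative = dense_retrieve(query, news_corpus, embed, top_k=top_k)
    # Retriever 2: sparse retrieval over financial statements.
    quantitative = sparse_retrieve(query, filings_corpus, top_k=top_k)
    # Label each chunk with its source so the LLM can cite where a fact came from.
    return ([("news/analyst report", doc) for doc in qualitative]
            + [("financial statement", doc) for doc in quantitative])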

RAG Based on Generative Model Integration

RAG systems vary based on how they integrate with generative models, including the timing of integration with the model, as well as different types of context window usage, different control flow, and different forms of knowledge integration.

1. Sequential RAG

The retriever first fetches the relevant documents, and then the LLM generates the response based on the retrieved information.

Examples in Equity Research:

  • An analyst researching a company's competitive landscape might use a sequential RAG system.
  • The retriever would first gather relevant documents about the company and its competitors from sources like news articles, company filings, and industry reports.
  • This is a straightforward approach, but it lacks the ability to dynamically adjust the search based on the initial findings.
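In code, sequential RAG is simply one retrieval pass followed by one generation pass, with no feedback from the draft answer into the search. The retriever and llm objects below are placeholders, as in the earlier sketches.

def sequential_rag(question: str, retriever, llm) -> str:
    context_docs = retriever.search(question)          # single retrieval pass
    context = "\n\n".join(context_docs)
    return llm.generate(f"Context:\n{context}\n\nQuestion: {question}")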

2. Iterative RAG

Involves multiple rounds of retrieval and generation. The LLM might generate an initial response, then use that response to refine the search query and retrieve more relevant information, and then generate a more informed response based on the updated context.

Examples in Equity Research:

  • An analyst investigating a company's supply chain vulnerabilities could use an iterative RAG system.
  • The initial query might be broad, like "Risks to [Company X]'s supply chain." The first iteration might retrieve documents about general supply chain risks.
  • Based on this initial retrieval, the LLM could refine the query to be more specific, such as "[Company X] reliance on specific suppliers in [Country Y]."
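A sketch of that iterative loop is below: retrieve, draft an answer, then let the model propose a sharper query and retrieve again. The refinement prompt and the DONE convention are illustrative assumptions, not a standard interface.

def iterative_rag(question: str, retriever, llm, max_rounds: int = 3) -> str:
    query = question            # e.g. "Risks to Company X's supply chain"
    answer = ""
    for _ in range(max_rounds):
        context = "\n\n".join(retriever.search(query))
        answer = llm.generate(f"Context:\n{context}\n\nQuestion: {question}")
        # Ask the model for a narrower follow-up query based on what it found,
        # e.g. "Company X reliance on specific suppliers in Country Y".
        query = llm.generate(
            f"Given this draft answer:\n{answer}\n"
            "Suggest one more specific search query, or reply DONE if none is needed."
        )
        if query.strip().upper() == "DONE":
            break
    return answer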

3. Interleaved RAG

Tightly integrates retrieval and generation, allowing the LLM to make retrieval calls during the generation process. The model can dynamically request more information as needed while generating the response, allowing it to adapt and refine its answer in real-time.

Examples in Equity Research:

  • An analyst evaluating a potential merger between two companies might use an interleaved RAG system.
  • As the LLM generates its analysis, it might dynamically query for information about specific aspects of the merger, such as antitrust regulations, overlapping product lines, or cultural compatibility.
  • While assessing the merger's impact on market share, the LLM might dynamically query for the latest market share data for both companies and their competitors.
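A rough sketch of interleaved RAG follows: generation proceeds until the model emits a retrieval request (here a line starting with "SEARCH:"), the system fetches documents for that query, appends them to the context, and generation resumes. The SEARCH convention is purely illustrative; real systems typically use structured tool or function calls.

def interleaved_rag(question: str, retriever, llm, max_calls: int = 5) -> str:
    context = ""
    prompt = (
        "Answer the question. If you need more information, reply with a single "
        "line 'SEARCH: <query>' and nothing else.\n"
        f"Question: {question}\n"
    )
    for _ in range(max_calls):
        output = llm.generate(prompt + context)
        if not output.startswith("SEARCH:"):
            return output                                # final, grounded answer
        query = output[len("SEARCH:"):].strip()          # e.g. "latest EV market share data"
        docs = "\n".join(retriever.search(query))
        context += f"\n[Retrieved for '{query}']\n{docs}\n"
    # Retrieval budget exhausted: force a final answer from what was gathered.
    return llm.generate(prompt + context + "\nAnswer now without further searches.\n")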

RAG Based on Knowledge Source

Data structure type, data update frequency, data source location, and domain specificity all influence which type of RAG should be used.

Structured data sourced from databases or tables is easier to work with than unstructured data (free-form content such as documents or text) or semi-structured data such as JSON.

Data can be static knowledge that rarely changes (e.g., historical documents), dynamic knowledge (e.g., news feeds), or real-time knowledge that is constantly changing (e.g., stock prices).

The external data can be sourced internally from local knowledge bases, externally from third-party APIs, or from a hybrid of both.

Domain-specific data focused on a particular field would require a different approach as compared to broad, diverse information considered general knowledge.

1. Open-domain RAG

The knowledge base is vast and covers a wide range of topics, such as the entire web or a massive document collection.

Examples in Equity Research:

  • An analyst researching the overall macroeconomic environment might use an open-domain RAG system to gather information from sources like news articles, economic reports from government agencies, academic research papers, and Wikipedia.

2. Closed-domain RAG

The knowledge base is limited to a specific domain or topic, which allows for more focused and accurate retrieval within that domain.

Examples in Equity Research:

  • An analyst evaluating a specific company's financial performance might use a closed-domain RAG system with access to a database of that company's historical financial statements, earnings call transcripts, and analyst reports.
  • Platforms like Bloomberg Terminal or FactSet, when augmented with LLM capabilities, could function as closed-domain RAG systems for financial analysis.

Conclusion

Retrieval-Augmented Generation (RAG) represents a significant advancement in enhancing the capabilities of Large Language Models by integrating robust information retrieval mechanisms. By addressing limitations such as factual accuracy and contextual relevance, RAG systems offer more reliable and comprehensive responses, particularly valuable in data-intensive fields like investment management and equity research. As RAG techniques continue to evolve, their applications will expand, offering even greater potential for intelligent, efficient, and precise information processing across various domains.