Will MCP Make RAG Obsolete?


Hello Reader,

MCP is all the rage now. But does this make RAG obsolete? This is becoming a burning question in real-world projects and in interviews. In today's edition, let's take a closer look and find out the answer.

Let's quickly go over what each one is.

RAG (Retrieval Augmented Generation)

  1. RAG is used where a response can be improved with company-specific context that the LLM does NOT have. You store the relevant company data in a vector database. This is done through embedding, a process that transforms the data into numeric vectors
  2. The user gives a prompt that can be improved by adding company-specific info
  3. A process (code/Jupyter notebook/application) converts the prompt into a vector and then searches the vector database. Relevant info from the vector database is RETRIEVED (first part of RAG) and returned
  4. The original prompt is AUGMENTED (second part of RAG) with this company-specific info and sent to the LLM
  5. The LLM GENERATES (last part of RAG) the response and sends it back to the user
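
The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production setup: the word-count `embed` function stands in for a real embedding model, the `VECTOR_DB` list stands in for a real vector database, and `generate` is a placeholder where the LLM call would go. All names and the sample documents are made up for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Step 1: embed company data into the "vector database" (a plain list here).
DOCS = [
    "Refunds are processed within 5 business days.",
    "Support is available Monday to Friday, 9am to 5pm.",
    "Premium customers get free shipping on all orders.",
]
VECTOR_DB = [(doc, embed(doc)) for doc in DOCS]


def retrieve(prompt, k=1):
    """Step 3: embed the prompt and RETRIEVE the k most similar documents."""
    query_vec = embed(prompt)
    ranked = sorted(VECTOR_DB, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def augment(prompt, context):
    """Step 4: AUGMENT the original prompt with the retrieved context."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {prompt}"


def generate(augmented_prompt):
    """Step 5: placeholder for the LLM call that GENERATES the response."""
    return f"[LLM answer based on]\n{augmented_prompt}"


def rag_answer(prompt):
    return generate(augment(prompt, retrieve(prompt)))


print(rag_answer("How long do refunds take?"))
```

Notice that the LLM never sees the whole document set, only the prompt plus the handful of chunks the retrieval step picked.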

MCP (Model Context Protocol)

MCP standardizes the communication between agentic code and its tools and data sources. What does this mean?

  • An MCP client (think of a piece of code running inside the agent) connects to the MCP server instead of connecting to the tool directly with a predefined API URL
  • The developers of each tool expose this MCP server
  • The MCP client asks the server, "What can you do?" In response, the MCP server returns each tool's capabilities, description, and input/output schemas
  • IMPORTANT - this discovery is dynamic and happens at runtime. If an input/output field changes, the next discovery call will reveal the updated fields
  • The MCP client registers all of these, and can then invoke the tool via the MCP server
  • The MCP server handles the connection to the tool. As a result, the code does NOT need to hardcode the API URLs like before
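
To make the discovery flow above concrete, here is a minimal in-process sketch. The message shapes are loosely modelled on MCP's JSON-RPC `tools/list` and `tools/call` methods, but the transport and the initialization handshake are left out, and `ToyMCPServer`, `ToyMCPClient`, and the `get_order_status` tool are hypothetical names invented for this example. Treat it as an illustration of the idea, not as the official SDK.

```python
class ToyMCPServer:
    """Hypothetical server: owns the tools and the connection details behind them."""

    def __init__(self):
        self._tools = {
            "get_order_status": {
                "description": "Look up the status of an order by its ID.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
                # The server, not the agent, knows how to reach the real tool.
                "handler": lambda args: f"Order {args['order_id']} has shipped.",
            }
        }

    def handle(self, request):
        method = request["method"]
        if method == "tools/list":
            # "What can you do?" -> names, descriptions, and input schemas.
            result = {"tools": [
                {"name": name, "description": tool["description"],
                 "inputSchema": tool["inputSchema"]}
                for name, tool in self._tools.items()
            ]}
        elif method == "tools/call":
            params = request["params"]
            text = self._tools[params["name"]]["handler"](params["arguments"])
            result = {"content": [{"type": "text", "text": text}]}
        else:
            return {"jsonrpc": "2.0", "id": request["id"],
                    "error": {"code": -32601, "message": "Method not found"}}
        return {"jsonrpc": "2.0", "id": request["id"], "result": result}


class ToyMCPClient:
    """Hypothetical client: discovers tools at runtime, then invokes them by name."""

    def __init__(self, server):
        self.server = server
        self.tools = {}
        self._next_id = 0

    def _request(self, method, params=None):
        self._next_id += 1
        request = {"jsonrpc": "2.0", "id": self._next_id,
                   "method": method, "params": params or {}}
        return self.server.handle(request)["result"]

    def discover(self):
        # Register whatever the server advertises right now; nothing is hardcoded.
        self.tools = {tool["name"]: tool for tool in self._request("tools/list")["tools"]}
        return sorted(self.tools)

    def call(self, name, arguments):
        if name not in self.tools:
            raise KeyError(f"Unknown tool: {name}")
        result = self._request("tools/call", {"name": name, "arguments": arguments})
        return result["content"][0]["text"]


client = ToyMCPClient(ToyMCPServer())
print(client.discover())
print(client.call("get_order_status", {"order_id": "A42"}))
```

The client code contains no tool names or URLs of its own. If the server adds a tool or changes a schema, the next `discover()` call picks it up.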

The Showdown

The obvious question on your mind is: if MCP can access the data source, and the same data source is being used in RAG, why not skip the whole RAG part and just ask the question to the MCP host app? There are a couple of problems with this approach:

  • If you want MCP to connect to the database directly, the app team needs to give the MCP server access to the database
  • If you have ever worked at an enterprise, you know teams don't want to give access to their actual database. You might think this is territorial (part of it is!), but there are also good reasons for it:
    • Once one team is granted access, more teams will ask for it
    • As these teams access the actual database, they eat into the read and write capacity the business application needs to serve customers
    • Databases are expensive, and scaling up means more cost
    • The app team that manages the database can't change the schema freely, because doing so would break the MCP flow

RAG actually solves this problem. How?

  • The RAG flow does NOT access the actual database, only the vector database. This way, RAG queries don't consume database capacity
  • The embedding process typically runs nightly during off-hours, so reading the source data doesn't compete with customer traffic
  • Embedding into a vector database also indexes the data, which is ideal for text queries
  • Queries against the vector database are not impacted even if the schema changes in the underlying database; at most, the embedding job that reads from it needs an update
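
Here is a small sketch of that decoupling, under stated assumptions: an in-memory SQLite table named `products` plays the role of the application database, `nightly_embed_job` is a hypothetical batch job, and a simple token-overlap index stands in for real embeddings and a real vector database. The point is only that queries run against the snapshot, not against the source.

```python
import re
import sqlite3


def tokens(text):
    """Toy stand-in for an embedding: the set of lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def nightly_embed_job(conn):
    """Read the source table once, off-hours, and build a separate index."""
    rows = conn.execute("SELECT * FROM products").fetchall()
    index = []
    for row in rows:
        # Flatten each row to text, so the index does not depend on column layout.
        text = ", ".join(f"{key}: {row[key]}" for key in row.keys())
        index.append((text, tokens(text)))
    return index


def search(index, query, k=1):
    """Query the index only; the source database is never touched here."""
    query_tokens = tokens(query)
    ranked = sorted(index, key=lambda item: len(query_tokens & item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# The application database (in-memory SQLite for the example).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("laptop", 999.0), ("mouse", 25.0)])

index = nightly_embed_job(conn)

# The app team changes the schema, and the database is not even reachable...
conn.execute("ALTER TABLE products ADD COLUMN currency TEXT")
conn.close()

# ...yet RAG-style queries keep working against the index.
print(search(index, "laptop price"))
```

The trade-off is freshness: the index only reflects the data as of the last job run.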

As we can see, RAG also has some strengths. And MCP is already powerful. What now? Turns out there is a middle ground, a best-of-both-worlds solution.

The Solution

Sometimes you do want to query the MCP host, because the MCP host has access to many tools, and even other agents! In those cases, if you need to utilize the vector database, you can use an MCP server that interacts directly with the vector database. On AWS, such an MCP server can interact directly with a Bedrock Knowledge Base, which is basically the vector database.

On the other hand, if you only need the RAG flow, you can go the RAG route. The RAG route is generally faster because it just queries the vector database. The MCP flow, meanwhile, involves handshaking and discovery, as discussed in the brief MCP overview above. This back and forth introduces some latency.
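
The two routes can be compared by counting round trips rather than guessing at timings. In the sketch below, `VectorStore` is a toy stand-in for the vector database and `KnowledgeBaseMCPServer` is a hypothetical MCP-style server that exposes it as a single `search_knowledge_base` tool. The method names follow MCP's `initialize`, `tools/list`, and `tools/call`, but everything else is simplified for illustration.

```python
KNOWLEDGE_BASE = {
    "refund policy": "Refunds are processed within 5 business days.",
    "support hours": "Support is available Monday to Friday.",
}


class VectorStore:
    """Toy stand-in for the vector database; counts the queries it serves."""

    def __init__(self):
        self.queries = 0

    def search(self, query):
        self.queries += 1
        words = set(query.lower().split())
        best = max(KNOWLEDGE_BASE, key=lambda topic: len(words & set(topic.split())))
        return KNOWLEDGE_BASE[best]


class KnowledgeBaseMCPServer:
    """Hypothetical MCP-style server exposing the vector store as one tool."""

    def __init__(self, store):
        self.store = store
        self.round_trips = 0

    def handle(self, method, params=None):
        self.round_trips += 1
        if method == "initialize":
            return {"capabilities": {"tools": {}}}
        if method == "tools/list":
            return {"tools": [{"name": "search_knowledge_base",
                               "description": "Search company knowledge."}]}
        if method == "tools/call":
            text = self.store.search(params["arguments"]["query"])
            return {"content": [{"type": "text", "text": text}]}
        raise ValueError(f"Unsupported method: {method}")


store = VectorStore()
question = "what is the refund policy"

# RAG route: one direct query to the vector database.
direct_answer = store.search(question)

# MCP route: handshake, discovery, then the tool call that hits the same store.
server = KnowledgeBaseMCPServer(store)
server.handle("initialize")
tool_name = server.handle("tools/list")["tools"][0]["name"]
result = server.handle("tools/call", {"name": tool_name, "arguments": {"query": question}})
mcp_answer = result["content"][0]["text"]

print(direct_answer == mcp_answer, server.round_trips)
```

Both routes return the same answer from the same store; the MCP route simply takes more hops to get there. In a long-lived session the handshake and discovery would normally happen once rather than per question, so the extra cost matters most for short, one-off interactions.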

In summary, RAG is going nowhere, and MCP complements RAG! If you get this question in your interview or projects, knock it out of the park!


Keep learning and keep rocking 🚀,

Raj
