Will MCP Make RAG Obsolete?

Hello Reader,

MCP is all the rage now. But does this make RAG obsolete? This is becoming a burning question in real-world projects and in interviews. In today's edition, let's take a closer look and find out the answer.

Let's quickly go over what each one is.

RAG (Retrieval Augmented Generation)

RAG (Retrieval Augmented Generation) is used where the response can be improved with company-specific context that the LLM does NOT have. You store relevant company data in a vector database. This is done through a process called embedding, where the data is transformed into numeric vectors.
The user gives a prompt that can be improved by adding company-specific info.
A process (code/Jupyter notebook/application) converts the prompt into a vector and then searches the vector database. Relevant info from the vector database is RETRIEVED (first part of RAG) and returned.
The original prompt is AUGMENTED (second part of RAG) with this company-specific info and sent to the LLM.
The LLM GENERATES (last part of RAG) the response, which is sent back to the user.
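
To make the three parts concrete, here is a minimal sketch of the flow in Python. Everything in it is an assumption for illustration: the `embed` function is a toy word-count vector standing in for a real embedding model, `VECTOR_DB` is an in-memory list standing in for a vector database, and `generate` is a placeholder where the actual LLM call would go.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a word-count vector (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical company documents, embedded ahead of time into the "vector database".
DOCS = [
    "Refunds are processed within 5 business days",
    "Support hours are 9am to 5pm on weekdays",
    "Premium plan includes priority support",
]
VECTOR_DB = [(doc, embed(doc)) for doc in DOCS]


def retrieve(prompt: str, k: int = 1) -> list[str]:
    """RETRIEVE: embed the prompt and return the k most similar documents."""
    query = embed(prompt)
    ranked = sorted(VECTOR_DB, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def augment(prompt: str, context: list[str]) -> str:
    """AUGMENT: prepend the retrieved company context to the original prompt."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {prompt}"


def generate(augmented_prompt: str) -> str:
    """GENERATE: placeholder for the LLM call; it just echoes what the LLM would receive."""
    return f"[LLM would answer using]\n{augmented_prompt}"


def rag(prompt: str) -> str:
    return generate(augment(prompt, retrieve(prompt)))


if __name__ == "__main__":
    print(rag("how long do refunds take"))
```

In a real project, the embedding model, the vector database, and the LLM would each be a managed service or library, but the retrieve, augment, generate shape stays the same.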

MCP (Model Context Protocol)

MCP standardizes the communication between agentic code and its tools and data sources. What does this mean?

An MCP client (think of a piece of code running inside the agent) connects to the MCP server, instead of connecting to the tool directly with a predefined API URL.
The developers of each tool expose this MCP server.
The MCP client asks the server, "What can you do?" The MCP server responds with each tool's capabilities, description, and input/output schemas.
IMPORTANT: this discovery is dynamic and happens at runtime. If the input/output fields change, the next discovery call reveals the updated fields at runtime.
The MCP client registers all of these, and can then invoke the tool via the MCP server.
The MCP server handles the connection to the tool. As a result, the code does NOT need to hardcode the API URLs like before.

The Showdown

The obvious question on your mind is: if MCP can access the data source, and the same data source is being used in RAG, why not skip the whole RAG part and just ask the MCP Host App, as shown above? There are a couple of problems with this approach:

If you want MCP to connect to the database directly, that means the app team needs to give access to the database to the MCP server.
If you have ever worked at an enterprise, you know teams don't want to give access to their actual database. You might think this is territorial (part of it is!), but there are also good reasons for it:
- If one team is granted access, more teams will ask for, and get, access
- As these teams query the actual database, they eat into the read and write capacity the business application needs to serve customers
- Databases are expensive, and scaling up means more cost
- The app team that manages the database can't change the schema freely, because that would break the MCP flow

RAG actually solves this problem. How?

The RAG flow does NOT access the actual database, only the vector database. This way, RAG queries don't consume the application database's capacity.
The embedding process typically runs nightly during off-hours, so its reads don't compete with customer traffic.
Embedding into a vector database also indexes the data for similarity search, which is ideal for text queries.
The vector database is not impacted even if the schema of the underlying database changes. At most, the embedding job needs an update.
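
The decoupling described above can be sketched as a small batch job. This is only an illustration under assumptions: the `articles` table, the in-memory SQLite database standing in for the application database, the dict standing in for the vector database, and the toy `embed` function are all hypothetical.

```python
import sqlite3
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def nightly_embedding_job(conn: sqlite3.Connection, vector_db: dict) -> int:
    """Read the source table once, in one batch, and refresh the vector store."""
    rows = conn.execute("SELECT id, body FROM articles").fetchall()
    vector_db.clear()
    for row_id, body in rows:
        vector_db[row_id] = {"text": body, "vector": embed(body)}
    return len(rows)


def search(vector_db: dict, query: str) -> str:
    """Serve a query from the vector store only; the source database is not touched."""
    q = embed(query)
    best = max(vector_db.values(), key=lambda e: sum((q & e["vector"]).values()))
    return best["text"]


# Hypothetical application database with a couple of rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO articles (body) VALUES (?)",
    [
        ("Refunds are processed within 5 business days",),
        ("Support hours are 9am to 5pm on weekdays",),
    ],
)

vector_db: dict = {}
indexed = nightly_embedding_job(conn, vector_db)

# Close the source connection to show that queries no longer depend on it.
conn.close()
answer = search(vector_db, "refunds")

if __name__ == "__main__":
    print(indexed, "rows embedded")
    print(answer)
```

If the app team renames a column, only the `SELECT` inside the job has to change. The query path that users hit stays the same.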

As we can see, RAG also has some strengths. And MCP is already powerful. What now? Turns out there is a middle ground, a best-of-both-worlds solution.

The Solution

Sometimes you do want to query the MCP host, because the MCP host has access to many tools, and even other agents! In those cases, if you need to use the vector database, you can put an MCP server in front of it that interacts directly with the vector database. On AWS, such an MCP server can interact directly with a Bedrock Knowledge Base, which is basically the managed retrieval layer on top of the vector database.
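
As a rough sketch of this middle ground, the retrieval step can be exposed as just another tool next to the others. The tool names, the `TOOLS` registry, and the keyword lookup standing in for a vector search are all hypothetical here. This is not the API of any specific MCP server, only the shape of the idea.

```python
# Hypothetical knowledge base; a real setup would query a vector database instead.
KNOWLEDGE_BASE = {
    "refunds": "Refunds are processed within 5 business days",
    "support": "Support hours are 9am to 5pm on weekdays",
}


def search_knowledge_base(arguments: dict) -> str:
    """Tool handler: keyword lookup standing in for a vector similarity search."""
    query_words = set(arguments["query"].lower().split())
    hits = [text for topic, text in KNOWLEDGE_BASE.items() if topic in query_words]
    return "\n".join(hits) or "No matching documents"


def get_order_status(arguments: dict) -> str:
    """A second, unrelated tool, to show retrieval sitting next to other tools."""
    return f"Order {arguments['order_id']} is shipped"


# What the server would advertise during discovery, plus the handler behind each tool.
TOOLS = {
    "search_knowledge_base": {
        "description": "Search company documents for passages relevant to a query",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
        "handler": search_knowledge_base,
    },
    "get_order_status": {
        "description": "Look up the status of an order by its ID",
        "inputSchema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
        "handler": get_order_status,
    },
}


def call_tool(name: str, arguments: dict) -> str:
    """Dispatch a tool call by name, the same way for retrieval and for any other tool."""
    return TOOLS[name]["handler"](arguments)


if __name__ == "__main__":
    print(call_tool("search_knowledge_base", {"query": "when are refunds paid"}))
    print(call_tool("get_order_status", {"order_id": "A1"}))
```

The agent gets company context and other actions through one interface, while the application database stays out of the picture.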

On the other hand, if all you need is the RAG flow, you can go the RAG route. The RAG route is generally faster because it just queries the vector database. The MCP flow, meanwhile, involves handshaking (connection and tool discovery), as discussed above in the brief MCP overview. This back and forth introduces some latency.

In summary, RAG is going nowhere, and MCP complements RAG! If you get this question in your interview or projects, knock it out of the park!

If you have found this newsletter helpful, and want to support me 🙏:

Check out my bestselling courses on AWS, System Design, Kubernetes, DevOps, and more: Max discounted links

AWS SA Bootcamp with Live Classes, Mock Interviews, Hands-On, Resume Improvement and more: https://www.sabootcamp.com/

Keep learning and keep rocking 🚀,

Raj

Fast Track To Cloud
