Hello Reader,

Gen AI is here to stay! One of the most popular patterns in Gen AI is RAG, which stands for Retrieval Augmented Generation. In my day-to-day work as an AWS SA, it comes up more and more in my customer conversations. In this newsletter edition, we will first level set on what RAG is, and then look at one implementation architecture on AWS.

RAG (Retrieval Augmented Generation):

1. RAG is used where the response can be made better by using company-specific context that the LLM does NOT have. You store relevant company data in a vector database. This is done through a process called embedding, where data is transformed into numeric vectors.
2. The user gives a prompt that can be made better by adding company-specific info.
3. A process (code/Jupyter notebook/application) converts the prompt into a vector and then searches the vector database. Relevant info from the vector database is RETRIEVED (first part of RAG) and returned.
4. The original prompt is AUGMENTED (second part of RAG) with this company-specific info and sent to the LLM.
5. The LLM GENERATES (last part of RAG) the response and sends it back to the user.

AWS Sample Implementation:

The above is a serverless implementation (pay as you go, and no servers to manage!) of RAG.

Question for you readers: How are you studying Gen AI?

Keep Learning and Keep Rocking 🚀,
Cloud With Raj
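P.S. If you like to see ideas in code, here is a minimal Python sketch of the five RAG steps from this edition. It is a toy under stated assumptions, not the AWS sample implementation: a simple word-count vector stands in for a real embedding model, an in-memory list stands in for the vector database, and `generate` is a placeholder where a real LLM call would go. Every name and document in it (`DOCUMENTS`, `embed`, `retrieve`, `augment`, `generate`, `answer`) is hypothetical and made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical company documents (assumption: stand-ins for real company data).
DOCUMENTS = [
    "Employees get 25 days of paid vacation per year.",
    "The VPN endpoint for remote work is vpn.example.com.",
    "Expense reports are due on the fifth day of each month.",
]


def embed(text):
    """Toy embedding: a word-count vector. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Step 1: store company data as vectors (an in-memory list plays the vector database).
VECTOR_DB = [(doc, embed(doc)) for doc in DOCUMENTS]


def retrieve(prompt, k=1):
    """Step 3: convert the prompt into a vector and search the vector database."""
    query = embed(prompt)
    ranked = sorted(VECTOR_DB, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def augment(prompt, context):
    """Step 4: add the retrieved company-specific info to the original prompt."""
    joined = "\n".join(context)
    return f"Use this company context to answer.\nContext:\n{joined}\n\nQuestion: {prompt}"


def generate(augmented_prompt):
    """Step 5: placeholder for the LLM call; it only reports the prompt size."""
    return f"[LLM response to a prompt of {len(augmented_prompt)} characters]"


def answer(prompt):
    """Steps 2 to 5 chained together: prompt, retrieve, augment, generate."""
    return generate(augment(prompt, retrieve(prompt)))


if __name__ == "__main__":
    question = "How many vacation days do employees get?"
    print(augment(question, retrieve(question)))
    print(answer(question))
```

In a real build, you would swap `embed` for an embedding model, `VECTOR_DB` for a managed vector store, and `generate` for an actual LLM call. The retrieve, augment, generate flow stays the same.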