Hello Reader,

Have you ever repeated yourself to an AI and thought, “Didn’t we already talk about this?” That frustration isn’t your fault. It’s how GenAI systems work by default. To overcome this, we need to implement memory. Now, there is a lot of confusion around this: do we need different types of memory, does this make RAG obsolete, and how does this even work? Let's learn all of it in today's edition.

Agents Are Stateless

By default, agents are stateless. Previously, we used to combat this by stuffing everything from the current session into the context each time. However, this was unsustainable: the context window fills up, and cost and latency grow with every turn.
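To make "stateless" concrete, here is a toy sketch (all names are made up for illustration): the agent only knows what the caller re-sends in the prompt, so a new session starts from zero.

```python
# Toy illustration: a stateless "agent" can only use what's in the
# context string passed to it - nothing persists between calls.

def stateless_agent(context: str) -> str:
    """Stand-in for an LLM call: it only sees `context`."""
    if "my name is Raj" in context:
        return "Hi Raj!"
    return "Hi! What's your name?"

# Turn 1: we introduce ourselves, and the agent "knows" the name
# only because it's in the context we sent.
history = "User: my name is Raj"
print(stateless_agent(history))  # -> Hi Raj!

# A NEW session: the old history is gone, so the agent forgets.
print(stateless_agent("User: hello"))  # -> Hi! What's your name?
```

This is why the old workaround was to re-send the entire history every time, and why that stops scaling.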
Hence the concept of memory was born. Let's take a look at that.

Memory - Short Vs Long

There are two kinds of memory - short term and long term. This is very similar to human memory. Short term memory is tied to the current session, and it generally lives on the same hardware stack the LLM is running on. Since most LLMs these days run on GPUs, short term memory sits in video RAM. What do we know about RAM: it's fast, it's expensive, and it's volatile.
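A minimal sketch of short-term (session) memory, assuming an in-process message buffer; the class and method names here are illustrative, not any specific framework's API:

```python
# Short term memory as an in-RAM message buffer: fast, but it
# disappears when the session (process) ends.

class ShortTermMemory:
    """Holds the current session's messages in memory."""

    def __init__(self) -> None:
        self.messages: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def as_context(self) -> str:
        """Render the buffer as the context sent with every LLM call."""
        return "\n".join(f"{m['role']}: {m['content']}" for m in self.messages)

stm = ShortTermMemory()
stm.add("user", "My name is Raj")
stm.add("assistant", "Nice to meet you, Raj!")
print(stm.as_context())
# When the process exits, this buffer is gone - that's the "ephemeral"
# property that long term memory exists to work around.
```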
Now, short term memory is great for the current session. But what if you close your session and come back later? It'd be terribly inconvenient if you had to repeat yourself. Short term memory is ephemeral, so how can we persist the info? This is where long term memory comes into play! Periodically, certain info is extracted from short term memory and saved to long term memory. Long term memory lives in a vector store, which typically sits on a hard disk. Hence it's durable and cheap, but a little slower than short term memory. That's okay: once the info is retrieved for another session and used, it lands in short term memory again for faster access. Now, even though long term memory is cheaper compared to short term memory, you don't wanna fill it up with ALL info. Hence only the below things are extracted from short term memory:
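The save/retrieve flow above can be sketched as a toy vector store. To keep the example dependency-free, I'm assuming a trivial word-overlap "similarity" instead of real embeddings; production systems use learned embeddings and a disk-backed store.

```python
# Toy long term memory: durable store + similarity-based retrieval.
# The word-overlap scoring is a stand-in for embedding similarity.

class LongTermMemory:
    def __init__(self) -> None:
        self.entries: list[str] = []

    def save(self, fact: str) -> None:
        self.entries.append(fact)

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored facts sharing the most words with the query."""
        q = set(query.lower().split())
        scored = sorted(
            self.entries,
            key=lambda e: len(q & set(e.lower().split())),
            reverse=True,
        )
        return scored[:k]

ltm = LongTermMemory()
ltm.save("User's name is Raj")
ltm.save("User prefers AWS for deployments")
print(ltm.retrieve("what is the user's name?"))  # -> ["User's name is Raj"]
```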
How is Memory Extraction Done?

This is actually simple! Think about a process that can extract certain things from a wall of text - an LLM! An LLM with an extraction prompt runs periodically and pulls out the three categories. You can customize this: based on the agent type, you can extract other types of info and save them into long term memory. Now, the question is how the agent gets the info from memory based on the user's query. Let's find out.
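Here's a hedged sketch of that extraction step: a prompt asks the LLM to return memorable items as JSON, and a small parser handles the reply. The `call_llm` function is a placeholder (wire in your provider's SDK), and the categories named in the prompt are illustrative.

```python
# Memory extraction sketch: prompt an LLM for durable facts as JSON,
# then parse its reply. call_llm is a placeholder, not a real API.
import json

EXTRACTION_PROMPT = """From the conversation below, extract any facts,
preferences, or decisions worth remembering across sessions.
Reply with a JSON list of short strings. Conversation:
{transcript}"""

def call_llm(prompt: str) -> str:
    # Placeholder: in a real system this calls your model provider.
    raise NotImplementedError

def parse_memories(llm_reply: str) -> list[str]:
    """Parse the model's JSON reply, tolerating a non-list answer."""
    data = json.loads(llm_reply)
    return [str(item) for item in data] if isinstance(data, list) else []

# Exercising the parser with a canned reply (no live LLM call needed):
canned = '["User\'s name is Raj", "User deploys on AWS"]'
print(parse_memories(canned))
```

In practice this job runs on a schedule (or at session end), and whatever it returns is embedded and written to the vector store.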
Implementation

You'll separate yourself from the pack if you can talk about the implementation! I am a tad biased toward AWS (for the new readers - I was a Principal SA at AWS, where I spent 6.5 years before leaving to build my own startup). I am showing the implementation with AWS, but the major components are open source:
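Putting the pieces together, the runtime loop is roughly: recall from long term memory, respond with short term context, then remember the turn. The sketch below is framework-agnostic; every name in it is illustrative, not a specific AWS or open-source API.

```python
# One agent turn wiring short term and long term memory together:
# recall -> respond -> remember. All names are illustrative.

def handle_turn(user_msg, short_term, long_term, llm):
    """One agent turn: recall, respond, remember."""
    # 1. Recall relevant long term memories for this query.
    recalled = long_term.retrieve(user_msg)
    # 2. Copy them into short term memory for fast reuse this session.
    for fact in recalled:
        short_term.append(f"memory: {fact}")
    short_term.append(f"user: {user_msg}")
    # 3. Answer using the full short term context.
    reply = llm("\n".join(short_term))
    short_term.append(f"assistant: {reply}")
    return reply

# Tiny fakes so the flow is runnable end to end:
class FakeLTM:
    def retrieve(self, q):
        return ["User's name is Raj"] if "name" in q else []

stm: list[str] = []
fake_llm = lambda ctx: "You told me your name is Raj!" if "Raj" in ctx else "Hmm."
print(handle_turn("what's my name?", stm, FakeLTM(), fake_llm))
```

Swap the fakes for a real vector store and model client and the shape of the loop stays the same.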
Now, for how deep you should go on this in interviews, and for an explanation with a use case, check out my detailed video on this topic.

Hope this helped you understand Gen AI memory, and answered the question: RAG is still alive and kicking! Till next time!

If you have found this newsletter helpful, and want to support me 🙏: Check out my bestselling courses on AWS, System Design, Kubernetes, DevOps, and more: Max discounted links

AWS SA Bootcamp with Live Classes, Mock Interviews, Hands-On, Resume Improvement and more: https://www.sabootcamp.com/

Keep learning and keep rocking 🚀,

Raj