Hello Reader,

Have you ever repeated yourself to an AI and thought, “Didn’t we already talk about this?” That frustration isn’t your fault. It’s how GenAI systems work by default. To overcome this, we need to implement memory. Now, there is a lot of confusion around this - do we need different types of memory, does this make RAG obsolete, and how does this even work? Let's learn all of it in today's edition.

Agents Are Stateless

By default, agents are stateless. Previously, we used to combat this by adding everything from the current session to the context each time. However, this was unsustainable because:
Hence the concept of memory was born. Let's take a look at that.

Memory - Short Vs Long

There are two kinds of memory - short term and long term. This is very similar to human memory. Short term memory is related to the current session. It generally lives on the same hardware stack the LLM is running on. Since most LLMs these days run on GPUs, the short term memory will be the video RAM (VRAM). What do we know about RAM:
Now, short term memory is great for the current session. But what if you close your session and come back later? It'd be terribly inconvenient if you had to repeat yourself. Short term memory is ephemeral, so how can we persist the info? This is where long term memory comes into play! Periodically, certain info from short term memory is extracted and saved to long term memory. Long term memory is saved in a vector store, which typically sits on a hard disk. Hence, it's durable and cheap, but a little slower than short term memory. That's okay, because once the info is retrieved and used in another session, it is saved in short term memory again for faster access.

Now, even though long term memory is cheaper compared to short term memory, you don't want to fill it up with ALL the info. Hence, only the things below are extracted from short term memory:
How is Memory Extraction Done?

This is actually simple! Think about a process that can extract certain things from a wall of text - an LLM! An LLM with a prompt runs periodically and extracts the three categories. You can customize this. Based on the agent type, you can extract other types of info and save them into long term memory. Now, the question is: how does the agent get the info from memory based on the user query? Let's find out.
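Before we get to the implementation, here is a minimal Python sketch of the loop described so far: a session keeps short term memory, a periodic extraction step copies selected items into long term memory, and a later session retrieves them by similarity to the user query. Everything in it is an assumption for illustration, not a real product's API: `embed` is a toy bag-of-words stand-in for an embedding model, `extract_memories` is a keyword rule standing in for the LLM extraction prompt (its markers are hypothetical and are not the three categories mentioned above), and `LongTermMemory` is an in-memory list standing in for a vector store on disk.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extract_memories(transcript: list[str]) -> list[str]:
    """Stand-in for the periodic LLM extraction call (hypothetical keyword rule).
    A real system would send the transcript plus an extraction prompt to an LLM."""
    markers = ("i prefer", "my name is", "i work")
    return [turn for turn in transcript if any(m in turn.lower() for m in markers)]


@dataclass
class LongTermMemory:
    """Durable store. Here a list in memory; in practice a vector store."""
    records: list[tuple[Counter, str]] = field(default_factory=list)

    def save(self, text: str) -> None:
        # Skip exact duplicates so recalled memories are not stored twice.
        if all(text != stored for _, stored in self.records):
            self.records.append((embed(text), text))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        scored = [(cosine(q, vec), text) for vec, text in self.records]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[:k] if score > 0]


@dataclass
class Session:
    long_term: LongTermMemory
    short_term: list[str] = field(default_factory=list)

    def user_says(self, message: str) -> list[str]:
        # Retrieve relevant long term memories and load them into short term memory.
        recalled = self.long_term.search(message)
        self.short_term.extend(m for m in recalled if m not in self.short_term)
        self.short_term.append(message)
        return recalled

    def close(self) -> None:
        # Extraction step: persist selected items, then drop the ephemeral state.
        for memory in extract_memories(self.short_term):
            self.long_term.save(memory)
        self.short_term.clear()


store = LongTermMemory()
first = Session(store)
first.user_says("I prefer aisle seats on long flights")
first.user_says("What is the weather in Seattle?")
first.close()  # only the preference is persisted

second = Session(store)
print(second.user_says("Remind me about my seats for flights"))
```

In this sketch the second session never saw the first one, yet the seat preference comes back because it was extracted and saved when the first session closed. Note that the retrieval step is the same similarity search idea that RAG relies on, just pointed at memories instead of documents.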
Implementation

You'd separate yourself from the pack if you can talk about the implementation! I am a tad biased towards AWS (for the new readers - I was a Principal SA at AWS, where I spent 6.5 years before leaving to build my own startup). I am showing the implementation with AWS, but the major components are open source:
For how deep you should go on this in interviews, and for an explanation with a use case, check out my detailed video on this topic.

Hope this helped you understand Gen AI memory, and got you the answer that RAG is still alive and kicking! Till next time!

If you have found this newsletter helpful and want to support me 🙏:

Check out my bestselling courses on AWS, System Design, Kubernetes, DevOps, and more: Max discounted links

AWS SA Bootcamp with Live Classes, Mock Interviews, Hands-On, Resume Improvement and more: https://www.sabootcamp.com/

Keep learning and keep rocking 🚀,
Raj