There is No Best Design - Only Pros and Cons ⚖


Hello Reader,

When I became a Cloud Architect a decade ago, I always pursued the perfect architecture. I'd search for the AWS service that is THE best! However, after designing and deploying many critical systems, I came to the conclusion that no system design is perfect. Every service, tool, and design has trade-offs. In today's edition, let's discuss some of the main ones:

  • Serverless Vs Kubernetes: Let's start with the most contentious one. I have been in countless meetings where person A thinks Lambda is the best and person B thinks Kubernetes is the best. In reality, both have their superpowers and considerations. With AWS Lambda, you don't need to manage any physical or virtual infrastructure. It scales automatically, is pay-as-you-go, and is inherently highly available. You might think you'd use AWS Lambda for every application! However, it's time-consuming (and sometimes complex) to refactor existing apps running on VMs to Lambda, and Lambda doesn't support GPUs and can run for at most 15 minutes per invocation. On the other hand, if you are a Kubernetes fanatic, you may point out that it's open source, cloud agnostic, and can run virtually anything. But you have to consider the overhead of managing and upgrading the cluster, maintaining the AMIs of the worker nodes, and the time needed to upskill your team. These are just a few examples. Hopefully, you are starting to see that nothing is the absolute best or worst; everything is a mix of pros and cons.

  • DynamoDB Vs DSQL Vs Amazon RDS: Choosing a database requires careful consideration. I often come across students who are gung-ho about DynamoDB and want to use it for everything. Sure, DynamoDB offers low latency, high availability, active-active replication via Global Tables, and more. But real-world systems often need to join tables and run complex queries, which DynamoDB doesn't support. On the other hand, Amazon RDS supports multiple database engines (MySQL, Oracle, etc.), is easier to migrate to from on-prem databases, and provides complex join and querying capabilities. But you pay for the underlying RDS instance irrespective of how much you utilize it, you have to ensure high availability yourself, and it can't handle as high a scale as DynamoDB unless you implement techniques such as sharding, read replicas, storage scaling, caching, etc.
    • DSQL is AWESOME - I mean, I said so myself in my LinkedIn post after its release at re:Invent 2024. Even though it has awesome features like autoscaling, pay-as-you-go pricing, and active-active replication for a SQL database, it has some considerations as well. For example, DSQL does NOT support foreign keys, which are widely used in transactional systems. It also doesn't support Views, Triggers, etc. Hopefully, it's becoming clear that every service can be awesome or inapplicable depending on your project requirements.

  • Increased Reliability Vs Cost: This one is pretty evident. You can provision more EC2 instances in multiple AZs and even in other AWS regions. The reliability increases, and so does the cost. Hence, you need to decide on the degree of reliability based on the criticality of the app and the cost. I have implemented projects where we had to set up multi-site active-active DR, which is super expensive, but the app was critical for the company and was bringing in a lot of revenue ;). On the other hand, I have also implemented a single-AZ deployment because the application was just informational and not critical.
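
To make the reliability-versus-cost trade-off a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the `Deployment` class, the 99% per-copy availability, the $100 monthly cost per copy, and the simplification that copies fail independently of each other (real AZ failures are not always independent). It is not an AWS pricing or SLA calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical deployment option; all numbers are illustrative assumptions."""
    name: str
    copies: int               # independent copies of the app (e.g. one per AZ)
    copy_availability: float  # assumed availability of a single copy, between 0 and 1
    copy_monthly_cost: float  # assumed monthly cost of a single copy, in USD


def combined_availability(d: Deployment) -> float:
    """Probability that at least one copy is up, assuming independent failures."""
    return 1 - (1 - d.copy_availability) ** d.copies


def monthly_cost(d: Deployment) -> float:
    """Cost grows linearly with the number of copies in this simplified model."""
    return d.copies * d.copy_monthly_cost


def expected_downtime_minutes(d: Deployment, minutes_per_month: int = 30 * 24 * 60) -> float:
    """Expected minutes per month during which no copy is available."""
    return (1 - combined_availability(d)) * minutes_per_month


if __name__ == "__main__":
    options = [
        Deployment("single-AZ", 1, 0.99, 100.0),
        Deployment("two-AZ", 2, 0.99, 100.0),
        Deployment("three-AZ", 3, 0.99, 100.0),
    ]
    for d in options:
        print(
            f"{d.name}: availability={combined_availability(d):.6f}, "
            f"cost=${monthly_cost(d):.2f}/month, "
            f"expected downtime={expected_downtime_minutes(d):.1f} min/month"
        )
```

Under these assumptions, each extra copy adds the same cost but buys a smaller absolute gain in availability, which is why the criticality of the app should drive how far you go.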

As you can see, there is no right or wrong decision; you choose based on the requirements. I would like to end this newsletter with one of my favorite quotes from former AWS CEO and current Amazon CEO Andy Jassy: "You have to use the right tool for the right job."

Over to you - what design trade-offs have you made in the past?

If you have found this newsletter helpful, and want to support me 🙏:

Check out my bestselling courses on AWS, System Design, Kubernetes, DevOps, and more: Max discounted links

AWS SA Bootcamp with Live Classes, Mock Interviews, Hands-On, Resume Improvement and more: https://www.sabootcamp.com/

Keep learning and keep rocking 🚀,

Raj

Fast Track To Cloud

Free Cloud Interview Guide to crush your next interview. Plus, real-world answers for cloud interviews, and system design from a top Solutions Architect at AWS.
