A newbie’s guide to launching a highly scalable website with AWS

Scaling up web applications is hard, and there’s no silver bullet or one magical solution that works for everyone. Many of our customers start with our directory software on shared hosting, and a big part of them will never have to worry about scaling up.

But some of you might be planning huge websites, and that’s when the challenges come in.

Some time back I wrote a post recommending Digital Ocean over AWS/EC2 because of cost. However, EC2 is the right fit when you know you’ll eventually be scaling up and it’s worth paying extra.

A lot of you may have heard about the magical powers of AWS, but it might still be a mystery to you. After all, there’s a lot going on there. I hope that by the end of this article you’ll have uncovered some of it.

I don’t need to rant about shared hosting issues but if you’re really interested, check this out: why we don’t recommend GoDaddy.

Types of large websites

When we say large websites, we can mean websites with:

a) A big database
b) High traffic
c) Both

A small note on technology stacks: a lot of people email us asking whether we’re scalable enough given that we use MySQL, or why we aren’t using NoSQL.

To be honest, it doesn’t matter what your technology stack is (most of the time); what matters is how you architect your web app. NoSQL does make it easier to scale up, but at the same time it makes querying data a little more challenging. And yes, MySQL does scale.

Why is architecture important?

As I said, each web app presents its own challenges, and you need to think about those challenges upfront to solve the scaling problem. For example, if you have a large database but low-to-moderate traffic, you’ll want a different strategy than someone with heavy traffic and a small database.

Another reason is resource utilization: you can always launch the most expensive EC2 instance you can afford, but what’s the point if you aren’t utilizing it the way you should? You’ll just end up paying more!

Apart from that, architecture prepares you for strategically scaling up or down. And finally, it allows you to closely monitor different services.

A crash course in AWS terminology

Before we move forward, I’d like to clarify the AWS terminology we’re going to use in the next section. These are the different services available on AWS (you can read more on their website if my one-line explanations don’t make sense, or skip this entirely if you already know it):

EC2 (Elastic Compute Cloud): A service that allows you to launch virtual machines

S3 (Simple Storage Service): Storage for your static files such as images, videos, etc.

RDS (Relational Database Service): Managed MySQL hosting (among other database engines)

ELB (Elastic Load Balancing): A load balancer that distributes traffic across your available nodes/VMs

Lambda: Serverless computing that allows you to run snippets of code without a dedicated machine/VM

CloudWatch: A monitoring service that can trigger different actions based on your resource usage
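
If you prefer seeing these services as code rather than console screens, here is a minimal sketch using boto3, the official AWS SDK for Python. This is purely my illustration, not part of any particular setup, and it assumes your AWS credentials are already configured (e.g. via `aws configure`).

```python
# Minimal sketch: each AWS service above can be driven from code with boto3.
import boto3

ec2 = boto3.client("ec2")                 # virtual machines
s3 = boto3.client("s3")                   # object storage for static files
rds = boto3.client("rds")                 # managed MySQL (and other engines)
elb = boto3.client("elbv2")               # load balancing
cloudwatch = boto3.client("cloudwatch")   # metrics and alarms

# For example, list the S3 buckets and EC2 reservations in your account.
print([bucket["Name"] for bucket in s3.list_buckets()["Buckets"]])
print(len(ec2.describe_instances()["Reservations"]), "EC2 reservations")
```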

Common patterns of deployment

Now that we’re on the same page, let’s explore some architecture patterns and when to use them. This isn’t limited to Crowd Vox deployments; it applies to any web app. With Crowd Vox, however, we have many of these implementations ready to use, so get in touch with us if you need any assistance.

For each setup below, I’ll note when to use it and why it works.

Setup #1 – Single EC2 instance

When to use it: When you have no idea how big your web app is going to be, but you know you’ll eventually need to scale.

Why: All your files and data are already on AWS, so moving up to a more complex setup within AWS will be much easier.
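
To make this concrete, here is a hedged boto3 sketch for launching that single instance. The AMI ID, key pair and security group are placeholders you’d replace with your own values (e.g. an Ubuntu AMI for your region).

```python
# Hedged sketch for Setup #1: launch one EC2 instance.
# All IDs below are placeholders, not real resources.
import boto3

ec2 = boto3.client("ec2")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",            # placeholder AMI ID
    InstanceType="t3.small",                    # start small; you can resize later
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",                      # placeholder key pair for SSH
    SecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder security group
)
print(response["Instances"][0]["InstanceId"])
```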

Setup #2 – RDS + Single EC2 instance

When to use it: When you have a large and growing database but traffic is low-to-moderate.

Why: RDS is optimized for databases and can be scaled up at any time. A single EC2 instance is enough to handle the traffic.
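
In the app itself, the main change is that MySQL no longer lives on localhost but at the RDS endpoint. A minimal sketch using PyMySQL, with a made-up endpoint, credentials and database name:

```python
# Hedged sketch for Setup #2: the web app connects to MySQL on RDS instead of
# a local MySQL server. Endpoint, credentials and database name are placeholders.
import pymysql

connection = pymysql.connect(
    host="mydb.abc123xyz.us-east-1.rds.amazonaws.com",  # placeholder RDS endpoint
    user="app_user",
    password="change-me",
    database="mywebsite",
    connect_timeout=5,
)

with connection.cursor() as cursor:
    cursor.execute("SELECT VERSION()")
    print(cursor.fetchone())
```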

Setup #3 – S3 + Single EC2 instance

When to use it: A small database, but lots of user-uploaded/static files, with low-to-moderate traffic.

Why: S3 has no size limits and you pay only for what you use, so this part scales up automatically.
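
In practice this means your app writes user uploads to an S3 bucket instead of the instance’s local disk. A minimal boto3 sketch; the bucket name and file paths are placeholders:

```python
# Hedged sketch for Setup #3: store user uploads in S3 instead of on the
# EC2 instance's disk. Bucket name and paths are placeholders.
import boto3

s3 = boto3.client("s3")
bucket = "my-site-uploads"  # placeholder bucket name

# Upload a file the user just submitted.
s3.upload_file("/tmp/profile-photo.jpg", bucket, "uploads/user-42/profile-photo.jpg")

# Hand back a temporary URL the browser can use to fetch it.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": bucket, "Key": "uploads/user-42/profile-photo.jpg"},
    ExpiresIn=3600,  # valid for one hour
)
print(url)
```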

Setup #4 – RDS + S3 + Single EC2 instance

When to use it: A large database + lots of static files, with low-to-moderate traffic.

Why: See Setups #2 and #3 above.

Setup #5 – RDS + S3 + Multiple EC2 Instances + ELB

When to use it: A large database + lots of user-generated content + high traffic.

Why: ELB distributes traffic across your multiple EC2 instances so that no single VM runs out of memory.
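
As an illustration of how the pieces connect, here is a hedged boto3 sketch that registers two running instances with an existing load balancer target group. The ARN and instance IDs are placeholders.

```python
# Hedged sketch for Setup #5: put two EC2 instances behind a load balancer by
# registering them with an existing target group. ARN and IDs are placeholders.
import boto3

elbv2 = boto3.client("elbv2")

target_group_arn = (
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
    "targetgroup/web/0123456789abcdef"
)  # placeholder target group ARN

elbv2.register_targets(
    TargetGroupArn=target_group_arn,
    Targets=[
        {"Id": "i-0aaaaaaaaaaaaaaaa"},  # placeholder instance IDs
        {"Id": "i-0bbbbbbbbbbbbbbbb"},
    ],
)

# Check that both instances pass health checks before sending real traffic.
health = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
print([t["TargetHealth"]["State"] for t in health["TargetHealthDescriptions"]])
```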

Setup #6 – Lambda + RDS + S3 + Multiple EC2 Instances + ELB

When to use it: A large database + lots of user-generated content + high traffic + recurring memory-intensive processes.

Why: A memory-intensive process is an extra spike that occurs a few times a day, e.g. importing a large file into the database. Lambda charges you only for that extra compute time, without requiring a whole new VM.
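
For example, that file-import job could live in a Lambda function triggered whenever the file lands in S3, so the spike never touches your web servers. A hedged sketch of such a handler; the bucket layout, table, endpoint and credentials are all made up for illustration:

```python
# Hedged sketch for Setup #6: a Lambda handler that imports a CSV dropped into
# S3 into the database. Table name, endpoint and credentials are placeholders.
import csv
import io

import boto3
import pymysql

s3 = boto3.client("s3")

def handler(event, context):
    # Triggered by an S3 "object created" event.
    record = event["Records"][0]["s3"]
    body = s3.get_object(
        Bucket=record["bucket"]["name"], Key=record["object"]["key"]
    )["Body"].read()

    connection = pymysql.connect(
        host="mydb.abc123xyz.us-east-1.rds.amazonaws.com",  # placeholder endpoint
        user="app_user", password="change-me", database="mywebsite",
    )
    with connection.cursor() as cursor:
        for row in csv.reader(io.StringIO(body.decode("utf-8"))):
            cursor.execute("INSERT INTO listings (name, city) VALUES (%s, %s)", row)
    connection.commit()
    return {"imported": True}
```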

Setup #7 – Lambda + RDS + S3 + Multiple EC2 Instances + ELB + Auto Scaling with CloudWatch

When to use it: A large database + lots of user-generated content + high traffic + recurring memory-intensive processes + big variations in traffic or fast-growing traffic.

Why: If you constantly find yourself adding/removing EC2 nodes behind the ELB, CloudWatch can automate that for you, so you scale up and down automatically.
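
In current AWS terms this is an Auto Scaling group with a CloudWatch-driven scaling policy. A hedged sketch of a target-tracking policy that keeps average CPU around 60%; the group name and target value are placeholders:

```python
# Hedged sketch for Setup #7: attach a target-tracking scaling policy to an
# existing Auto Scaling group so CloudWatch adds/removes EC2 nodes for you.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-servers",        # placeholder group name
    PolicyName="keep-cpu-around-60-percent",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 60.0,  # scale out above ~60% average CPU, scale in below
    },
)
```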

 

As you can see, things get more complicated as you grow, but AWS has an answer for each stage, and that’s what makes it worth paying extra for. Big players like Netflix use Setup #7 – actually an even more complicated version of it. Check out this talk if you’re interested: Another Day in the Life of a Netflix Engineer (I was watching it the other day, and it’s what inspired me to write this article).

Scaling up & down

Now that you know some common deployment patterns, it also pays to know when it’s time to make a switch, and to what. Unfortunately we can’t provide all the answers here, but I’ll describe a few cases that should give you a basic idea.

Let’s say you are running a single EC2 instance (Setup #1) and you observe a lot of CPU spikes throughout the day (monitoring is an important aspect of running an app, but we haven’t covered it here).

Now let’s say these CPU spikes are the result of growing traffic (something you have to deduce yourself); your next step would be to consider upgrading the EC2 instance to a larger instance type.

But what if you are already running a powerful machine? Perhaps it’s time to introduce a load balancer.
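
If you want a number rather than a gut feeling, CloudWatch already records CPU utilization for every instance. Here is a hedged sketch that pulls the last 24 hours of hourly averages for one instance; the instance ID is a placeholder:

```python
# Hedged sketch: pull 24 hours of average CPU utilization for one instance
# from CloudWatch to see whether those spikes are becoming the norm.
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(hours=24),
    EndTime=datetime.utcnow(),
    Period=3600,                 # one data point per hour
    Statistics=["Average"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1), "%")
```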

But as I said, knowing when to act is a whole other part of this game, one that requires monitoring your app metrics and being ready to scale up (which means a lot of testing).

Back when I had no idea how this stuff worked, I hired three engineers to answer one simple question: ‘When do I need to scale up?’ Their answers could be summed up as “we don’t know”. Later I realized it wasn’t because they were incompetent – they just didn’t have the proper tools to answer the question. In short, it was our lack of monitoring and metrics that forced them to that answer.

So I’ll leave it at that; you’ll need to find these things out for yourself:

a) When to scale up/down
b) When to add nodes or entirely switch to another architecture

Conclusion

Launching a highly scalable web app is not difficult with the set of tools offered by AWS, but you still need to put in the work to determine how you need to scale and choose your tool stack accordingly. You also need to be prepared to handle changes in architecture as you learn more and more about your user base and how they use the application.

If you’re looking to scale up your web app or Crowd Vox deployment, please feel free to reach out to us. At Crowd Vox we have worked with hundreds of customers on a variety of solutions and have launched some really, really big websites.