Cruise Control for the Cloud

Agenda - Table of Contents

Cruise Control for the Cloud

Lakitu

Ronald Dahlgren - Software Engineer @ Lootsie

What do I mean by ‘Cloud’?

  • Infrastructure as a Service (IaaS) products
  • Virtualized, sometimes ephemeral machines
  • On-demand, networked computational resources

The Hockey Stick Graph

  • Business-speak for a period of geometric growth
  • Important to expect
  • Don’t fear it!

An Example

[Figure: Hockey Stick Graph]

Geometric Growth

Once you get traction with your audience, expect the hockey stick…

[Figures: Geometric Growth, with the user count climbing from one user to six, fifteen, thirty, and thirty-seven users]

Typical System Load

[Figure: Hockey Stick Graph]

Load Testing Limits

[Figure: Hockey Stick Graph]

Unknown Behavior!

[Figure: Hockey Stick Graph]

Planning for Scaling

Plan for the hockey stick early; don’t fear success.

This can be done more cheaply than you might think!

What this talk is about

Some ideas for building a friendly system.

  • You can use FOSS tools and an inexpensive IaaS provider to build really amazing things.
  • Keeping it healthy doesn’t require a full operations team.

So what kind of qualities make a system friendly?

An ideal system…

  • Is easy to inspect
  • Speaks up when there’s a problem
  • Doesn’t cry wolf

Easy to Inspect

  • Provides monitoring of both the host and the application(s)
  • Exposes health checks that give you that warm, fuzzy feeling
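A health check can be as small as an HTTP endpoint that a load balancer polls. Below is a minimal sketch using only the Python standard library, assuming a `/health` path, port 8080, and a 10% free-disk threshold (all hypothetical choices). It answers 200 when healthy and 503 otherwise, a status that load balancers such as HAProxy can be configured to treat as a failed check.

```python
import json
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical threshold: report unhealthy when less than 10% of the disk is free.
MIN_FREE_FRACTION = 0.10


def check_health(path="/", min_free_fraction=MIN_FREE_FRACTION):
    """Return (http_status, body_dict) describing this node's health."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    healthy = free_fraction >= min_free_fraction
    body = {
        "status": "ok" if healthy else "degraded",
        "disk_free_fraction": round(free_fraction, 3),
    }
    # 200 keeps the node in the pool; 503 signals that it should be pulled.
    return (200 if healthy else 503), body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = check_health()
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    # Port 8080 is an arbitrary choice for this sketch.
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

Application-specific checks (can we reach the database? is the queue draining?) would be added to `check_health` alongside the disk check.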

Speaks up when there’s a problem

  • Alerts people when something bad happens

Doesn’t Cry Wolf

Doesn’t raise alerts or sound fire alarms over trivialities, or over problems it can handle on its own, such as:

  • Hard disk space
  • Host has gone dark
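One common way to avoid crying wolf is to alert only after several consecutive failed checks, so a single dropped probe never pages anyone. A minimal sketch of that idea, assuming checks run on a fixed schedule; the class name and the threshold of three failures are hypothetical:

```python
class DebouncedAlert:
    """Fire an alert only after `threshold` consecutive failed checks.

    A single failed probe (a slow response, a dropped packet) is forgotten as
    soon as the next check passes, so transient blips never page anyone.
    """

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0
        self.firing = False

    def record(self, check_passed):
        """Record one check result; return True only when a new alert should fire."""
        if check_passed:
            self.consecutive_failures = 0
            self.firing = False
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold and not self.firing:
            self.firing = True  # fire once per incident, not on every failed check
            return True
        return False
```

HAProxy’s `fall` and `rise` server check parameters apply the same idea to taking nodes out of, and back into, a pool.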

Key Concepts

Themes to keep in mind

  • Automation
  • Monitoring
  • Failover
  • Alerting
  • Scaling
  • Distribution of authority
  • Be as stateless as possible

Better Living through Automation

Always keep in mind: “Fix, then automate”

  • Anything done repeatedly can be automated
  • Code deployments, starting new machines, testing, etc.
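As a “fix, then automate” example: if the manual fix for a full disk was deleting old rotated logs, that fix can be captured in a script and run on a schedule (cron, Jenkins, Rundeck). A sketch assuming rotated files are named like `app.log.1` or `app.log.2.gz`; that naming and the seven-day retention are hypothetical and depend on your log rotation setup:

```python
import time
from pathlib import Path


def purge_old_logs(log_dir, max_age_days=7, pattern="*.log.*"):
    """Delete rotated log files older than max_age_days; return the deleted paths.

    The live file (e.g. app.log) does not match the default pattern, so only
    rotated copies such as app.log.1 or app.log.2.gz are ever removed.
    """
    cutoff = time.time() - max_age_days * 86400
    deleted = []
    for path in Path(log_dir).glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted
```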

What’s Happening? (Monitoring)

  • Examples
    • Host level
      • Disk full?
      • Network I/O going crazy?
      • CPU pegged?
      • Swapping like mad?
    • Application level
      • Requests per second
      • Request latency
      • Application health
  • Best Practices
    • Application health checks
    • Dashboards
    • System-wide view, able to drill down to machine level view
    • Log aggregation
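For the application-level numbers, requests per second and request latency can be tracked over a sliding window. The hand-rolled sketch below is illustrative only (the class and its parameters are hypothetical); a metrics library such as the one listed under FOSS tools would normally provide timers and meters. It reports a nearest-rank percentile, so tail latency is visible rather than hidden inside an average.

```python
import math
import time
from collections import deque


class RequestStats:
    """Sliding-window request rate and latency percentiles for one service."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.samples = deque()  # (timestamp, latency_seconds), oldest first

    def record(self, latency_seconds):
        self.samples.append((self.clock(), latency_seconds))
        self._expire()

    def _expire(self):
        # Drop samples that have fallen out of the window.
        cutoff = self.clock() - self.window
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def requests_per_second(self):
        self._expire()
        return len(self.samples) / self.window

    def latency_percentile(self, pct):
        """Nearest-rank percentile of latencies in the window (pct in 0..100)."""
        self._expire()
        if not self.samples:
            return None
        latencies = sorted(lat for _, lat in self.samples)
        rank = max(1, math.ceil(pct / 100 * len(latencies)))
        return latencies[rank - 1]
```

Passing in `clock` keeps the class testable; in a real service you would leave the default and ship these numbers to Graphite or Ganglia on a timer.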

Have a Wing Man (Failover)

  • Requirements
    • Detect failures (health checks)
    • Remove misbehaving nodes from service pools
    • Ideally, add a replacement instance back into the pool
  • Best Practices
    • Put a load balancer in front of every distinct service component
    • Stop bad nodes rather than destroying the instance, so it can be examined later (forensics)
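The failover requirements above can be sketched as a pool that runs health checks, quarantines a node after repeated failures, and asks for a replacement. This is an illustration under assumptions: `check` and `provision` are hypothetical callables (for instance, an HTTP probe of a health endpoint and a call to your IaaS provider’s API), and in production HAProxy would handle the checking and routing.

```python
class ServicePool:
    """Health-check-driven pool of backend nodes (a sketch, not a load balancer)."""

    def __init__(self, nodes, check, provision, max_failures=3):
        self.active = list(nodes)
        self.check = check          # hypothetical: node -> bool
        self.provision = provision  # hypothetical: () -> new node
        self.max_failures = max_failures
        self.failures = {node: 0 for node in self.active}
        self.quarantined = []  # stopped, not destroyed: kept around for forensics

    def run_checks(self):
        for node in list(self.active):
            if self.check(node):
                self.failures[node] = 0
                continue
            self.failures[node] += 1
            if self.failures[node] >= self.max_failures:
                # Pull the node from service, then add a replacement to the pool.
                self.active.remove(node)
                self.quarantined.append(node)
                replacement = self.provision()
                self.active.append(replacement)
                self.failures[replacement] = 0
```

Quarantined nodes are only taken out of the pool, never terminated, in line with the forensics point above.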

See Something, Say Something (Alerting)

  • Requirements
    • Fixable failures
      • Disk space is close to full (fix then automate)
      • Heavy swap usage
      • CPU is pegged
    • Unrecoverable errors
      • Host has gone dark
  • Best Practices
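The alert categories above map naturally onto severities: fixable problems raise a warning, while a host that stops reporting is critical. A sketch assuming each host pushes a metrics dict with a `last_seen` timestamp; the thresholds, field names, and function name are hypothetical. Nagios expresses the same split through its WARNING and CRITICAL states.

```python
import time

# Hypothetical thresholds; tune them for your own hosts.
DISK_USED_WARN = 0.90
SWAP_USED_WARN = 0.50
CPU_BUSY_WARN = 0.95
HEARTBEAT_TIMEOUT = 120  # seconds without a report before a host counts as dark


def evaluate_host(metrics, now=None):
    """Map one host's latest metrics to a list of (severity, message) alerts.

    `metrics` looks like {"last_seen": ts, "disk_used": 0.93,
    "swap_used": 0.1, "cpu_busy": 0.4}, with usage given as fractions 0..1.
    """
    now = time.time() if now is None else now
    if now - metrics["last_seen"] > HEARTBEAT_TIMEOUT:
        # Unrecoverable from the host's side; its other numbers are stale anyway.
        return [("critical", "host has gone dark")]
    alerts = []
    if metrics["disk_used"] >= DISK_USED_WARN:
        alerts.append(("warning", "disk space is close to full"))
    if metrics["swap_used"] >= SWAP_USED_WARN:
        alerts.append(("warning", "heavy swap usage"))
    if metrics["cpu_busy"] >= CPU_BUSY_WARN:
        alerts.append(("warning", "CPU is pegged"))
    return alerts
```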

Scaling

  • Requirements
    • New nodes can be provisioned automatically
    • Under-used resources can be released (watch for oscillations!)
  • Best Practices
    • Use configuration management (Chef, Puppet, etc.)
    • Use an IaaS provider with a REST API for provisioning
    • With AWS, bake release AMIs
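One way to release under-used resources without oscillating is to combine a gap between the scale-up and scale-down thresholds (hysteresis) with a cooldown after every change. A sketch of the decision step only, using hypothetical thresholds on average CPU; actually starting or stopping instances would go through your provider’s REST API and configuration management.

```python
def desired_node_count(current, avg_cpu, last_change, now,
                       scale_up_at=0.75, scale_down_at=0.30,
                       cooldown=600, min_nodes=2, max_nodes=20):
    """Return how many nodes to run next (equal to `current` to hold steady).

    The gap between scale_up_at and scale_down_at, plus the cooldown, keep the
    group from flapping: adding one node should not immediately drop load low
    enough to trigger a removal. All thresholds here are hypothetical.
    """
    if now - last_change < cooldown:
        return current  # still settling after the last change
    if avg_cpu > scale_up_at:
        return min(current + 1, max_nodes)
    if avg_cpu < scale_down_at:
        return max(current - 1, min_nodes)
    return current
```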

Distributed Sources of Authority

A system design facet

  • Single sources of authority are inherently bottlenecks
  • RFC 4122 UUIDs are magic: any node can generate a unique ID without coordinating
  • Distributed data solutions are out there, and free
    • Hadoop
    • Riak
    • Redis with replication
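Because a version 4 UUID carries 122 random bits, any node can mint identifiers on its own without asking a central sequence or database for the next number. A minimal sketch using Python’s standard `uuid` module; `new_order_id` is a hypothetical name:

```python
import uuid


def new_order_id():
    """Generate an ID locally: no central counter and no database round trip."""
    return str(uuid.uuid4())
```

Version 1 UUIDs (`uuid.uuid1()`) instead combine a timestamp with a node identifier, typically the MAC address, which also avoids coordination but reveals which host created the ID.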

Wiring It Together

  • Example of a self-healing component
  • Example of on-demand scaling

FOSS Tools

Let’s take a look at some of the free open-source software tools out there…

FOSS Tools, Automation

  • Chef - http://www.opscode.com/chef/
  • Jenkins CI - http://jenkins-ci.org/
  • Rundeck - http://rundeck.org

FOSS Tools, Monitoring

  • Ganglia - http://ganglia.sourceforge.net/
  • Graphite - http://graphite.wikidot.com/
  • Riemann - http://riemann.io/
  • Graylog2 - http://graylog2.org/
  • Metrics Libraries - http://metrics.codahale.com/

FOSS Tools, Failover

  • HAProxy - http://haproxy.1wt.eu/

FOSS Tools, Alerting

  • Nagios - http://www.nagios.org/
  • OpenNMS - http://www.opennms.org/

Tools to make this whole process easier

  • AWS OpsWorks
  • AWS CloudFormation
  • RightScale
  • Rundeck + Chef + IaaS services
  • OpenStack?

Quick Recap

  • Start planning for growth early
  • Keep the fundamentals in mind when designing systems

Next Time…

  • Specific architectural patterns
  • Specific tool choices
  • Lessons learned with the suggested tools

Q&A Time!

Slides available at http://dahlgren.work/