Site Reliability Engineering - How Google Runs Production Systems

Niall Richard Murphy, Betsy Beyer, Chris Jones, Jennifer Petoff

Summary

In the complex world of large-scale software systems, ensuring reliability while maintaining rapid development and deployment cycles is a formidable challenge. This book explores how Google, a pioneer in internet-scale computing, addresses this challenge through Site Reliability Engineering (SRE). It offers a comprehensive look at the principles, practices, and organizational philosophies that enable the company to run its production systems efficiently and resiliently.

  • SRE as a discipline: Combining software engineering and operations, SRE focuses on creating scalable and highly reliable software systems.
  • Service Level Objectives (SLOs): Defining clear, measurable targets for system performance...
