Site Reliability Engineering - How Google Runs Production Systems
Niall Richard Murphy, Betsy Beyer, Chris Jones, Jennifer Petoff
Summary
In large-scale software systems, keeping services reliable while sustaining rapid development and deployment cycles is a formidable challenge. This book explores how Google, a pioneer in internet-scale computing, addresses it through Site Reliability Engineering (SRE). It offers a comprehensive look at the principles, practices, and organizational philosophies that enable the company to run its production systems efficiently and resiliently.
- SRE as a discipline: Combining software engineering and operations, SRE focuses on creating scalable and highly reliable software systems.
- Service Level Objectives (SLOs): Defining clear, measurable targets for system performance...