About Site Reliability Engineering - How Google Runs Production Systems

In the complex world of large-scale software systems, ensuring reliability while maintaining rapid development and deployment cycles is a formidable challenge. This book explores how Google, a pioneer in internet-scale computing, addresses this challenge through Site Reliability Engineering (SRE). It offers a comprehensive look at the principles, practices, and organizational philosophies that enable the company to run its production systems efficiently and resiliently.

SRE as a discipline: Combining software engineering and operations, SRE focuses on creating scalable and highly reliable software systems.
Service Level Objectives (SLOs): Defining clear, measurable targets for system performance...
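To make "clear, measurable targets" concrete, here is a minimal sketch. It assumes an availability SLO, where the measured indicator (SLI) is the fraction of successful requests out of all requests in some window. The `SLO` class, its method names, and the 99.9% target are illustrative assumptions, not definitions taken from the book.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    """Hypothetical availability SLO: the fraction of requests that must succeed."""

    target: float  # e.g. 0.999 means at least 99.9% of requests must be "good"

    def sli(self, good: int, total: int) -> float:
        """Service Level Indicator: the measured fraction of good requests."""
        if good < 0 or total < 0 or good > total:
            raise ValueError("require 0 <= good <= total")
        if total == 0:
            return 1.0  # assumption: no traffic counts as meeting the objective
        return good / total

    def is_met(self, good: int, total: int) -> bool:
        """True if the measured SLI reaches the target."""
        return self.sli(good, total) >= self.target


slo = SLO(target=0.999)
print(slo.sli(999_500, 1_000_000))     # 0.9995
print(slo.is_met(999_500, 1_000_000))  # True: 99.95% >= 99.9%
print(slo.is_met(998_000, 1_000_000))  # False: 99.8% < 99.9%
```

Expressing the objective as a single number compared against a measured ratio is what makes it "measurable". Whether a given window met its target becomes a yes/no question rather than a matter of opinion.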
