merkez / sre-interview-prep-guide Public

forked from mxssl/sre-interview-prep-guide

Site Reliability Engineer (SRE) Interview Preparation Guide

This repository consolidates useful resources for Site Reliability Engineer (SRE) interview preparation.

Contributing

Please take a look at the contribution guidelines first. Contributions are always welcome!

Basics

Simple: What happens when you type in ‘www.cnn.com’ in your browser?
Detailed: What happens when you type google.com into your browser's address box and press enter?

Linux

What every SRE should know about GNU/Linux shell related internals: file descriptors, pipes, terminals, user sessions, process groups and daemons

Boot Process

An introduction to the Linux boot and startup processes
What happens when we turn on computer?
From Power up to login prompt

Filesystem

Understanding Inodes
Understand UNIX / Linux Inodes Basics with Examples
Understanding proc filesystem
Common Mount Options
Understanding Linux filesystems: ext4 and beyond

Kernel

Explain the basics of Linux kernel
Kernel Space and User Space
Linux Kernel Process Management
Linux Addressing
Linux Kernel Memory Management
STACK AND HEAP
Paging and Segmentation
Linux Kernel System Calls
The Virtual Filesystem
Concurrency and Race Conditions
Memory Leak
What is a kernel Panic?
Book about the linux kernel

Troubleshooting

Linux troubleshooting tools
Linux Performance Analysis in 60,000 Milliseconds
strace
lsof
Linux system debugging
SaaS where users can test their Linux troubleshooting skills

Networking

The Internet explained from first principles
Network protocols for anyone who knows a programming language
Introduction to Linux interfaces for virtual networking
Multi-tier load-balancing with Linux
Introduction to modern network load balancing and proxying
Load Balancing Algorithms

Containers

Introduction to Docker and Containers
Containers Patterns
Docker Container Anti Patterns
Anti-Patterns When Building Container Images

Kubernetes

Deploying and Scaling Microservices with Docker and Kubernetes
Demystifying the Kubernetes Iceberg
What happens when ... Kubernetes edition!
Kubernetes Production Patterns
Kubernetes production best practices
A Guide to the Kubernetes Networking Model
47 Things To Become a Kubernetes Expert
Kubernetes Best Practices 101
15 Kubernetes Best Practices Every Developer Should Know
THE KUBERNETES NETWORKING GUIDE
The life of a DNS query in Kubernetes

Infrastructure as code / Configuration management

Terraform
A Comprehensive Guide to Terraform
Ansible
Getting Started With Terraform on AWS
Google Cloud: Best practices for using Terraform

Databases

Things You Should Know About Databases
7 Database Paradigms
CAP theorem
Evolutionary Database Design
ACID vs BASE in Databases
Understanding Database Sharding
Database Replication
SQL vs. NoSQL Database: When to Use, How to Choose
How do database indexes work?
Redis Explained
Database Sharding Explained

CI/CD

7 Pipeline Design Patterns for Continuous Delivery
CI/CD patterns
Six Strategies for Application Deployment

Clouds

The Open Guide to Amazon Web Services
Learning Azure
Hands-On Training with GCP

Programming

Python

Python Basics
Python For Everyone
Complete Python Tutorial

Go (Golang)

A tour of Go
Go by Example
Go Tutorials & Examples
Learn Go with Tests
Getting up and running with Go
Effective Go
Go Design Patterns
Go Memory Management
Style Guide
Style Decisions
Best Practices
50 Shades of Go: Traps, Gotchas, and Common Mistakes for New Golang Devs

Big O Notation, Algorithms and Data Structures

AlgoExpert
Hacking a Google Interview – Handout 1
Hacking a Google Interview – Handout 2
Hacking a Google Interview – Handout 3

System design

SystemsExpert course from AlgoExpert
System Design 101
Grokking the System Design Interview
The System Design Primer
Crack the System Design Interview
System design interview for IT companies
Web Architecture 101
What's in a Production Web Application?
Distributed systems
Failover

System design examples

Designing WhatsApp
Designing Uber
Designing Tinder
Designing Instagram
Designing Netflix

Monitoring

SLOs & You: A Guide To Service Level Objectives
Setting up Service Monitoring — The Why’s and What’s
How NOT to Measure Latency
The four Golden Signals of Kubernetes monitoring

Prometheus

Introduction to Prometheus
Prometheus Relabeling Training
Avoid These 6 Mistakes When Getting Started With Prometheus
A Deep Dive Into the Four Types of Prometheus Metrics
How Prometheus Querying Works
PromQL Cheat Sheet

Processes

The practical guide to incident management
Incident Response
Postmortems
Runbooks
Identifying and tracking toil using SRE principles
Building SRE from Scratch
SRE at Google: Our complete list of CRE life lessons
Incident Management vs. Incident Response - What's the Difference?
Practical Guide to SRE: Using SLOs to Increase Reliability
Practical Guide to SRE: Automating On-Call
Going from Zero to SRE
An Incident Command Training Handbook
Howie guide to post‑incident investigations
Rundown of LinkedIn’s SRE practices
Rundown of Uber’s SRE practice
SRE in the Real World
SRE Engagement Models
SRE Checklist
Why bother with SLI and SLO?
The System Resiliency Pyramid

Resume

SRE Complete Resume Writing Guide

Interview

SRE interview process

How to hire talent
Recruitment process for a Google job (SRE, Site Reliability Engineer)

Interview Questions

A collection of questions to practice with for SRE interviews
SRE Interview Questions
Sysadmin Test Questions
Kubernetes job interview questions
DevOps Guide
Questions I ask in SRE interviews
DevOps Roadmap: Learn to become a DevOps Engineer or SRE
The Must-Know Terraform Interview Questions

Blogposts

SRE Interviews in Silicon Valley
Preparing the SRE interview
How to Get Into SRE
My Job Interview at Google
Path to Site Reliability Management
Becoming a Site Reliability Engineer
How I get a job at Google as SRE
Become A DevOps Engineer in 2023: [Detailed Guide]
How to Get an SRE Role

Books

SRE books

Site Reliability Engineering
The Site Reliability Workbook
Seeking SRE
Building Secure and Reliable Systems
Implementing Service Level Objectives

Linux

Linux Kernel Development (3rd Edition)
UNIX and Linux System Administration Handbook (5th Edition)
Linux Pocket Guide, 3rd Edition

Networking

TCP/IP Illustrated, Volume 1

Troubleshooting and Performance

Systems Performance: Enterprise and the Cloud
Systems Performance, 2nd Edition

Courses

Site Reliability Engineering: Measuring and Managing Reliability
School of SRE
