How to hire Site Reliability Engineers (SREs): 5 top qualities

The Site Reliability Engineer (SRE) role continues to gain momentum in enterprise IT. Hiring managers, consider this advice on how to spot a strong candidate.

Site Reliability Engineering (SRE) continues to gain momentum among IT organizations. According to the Upskilling 2021: Enterprise DevOps Skills Report, 47 percent of survey respondents (up from 28 percent in 2020) say SRE is a must-have process and framework skill. As the demand for strong SRE skills rises, so does SRE hiring.

However, a challenge for business and hiring managers is determining which skills, traits, and competencies make a strong site reliability engineer. In light of the upcoming SRE-focused SKILup Day conference, I asked several DevOps Institute Ambassadors and SRE subject matter experts to weigh in on what makes a great SRE. Here’s what they had to say:

1. "Great SREs have a passion for high-quality automation. They have a lot of ideas about automation of toilsome production tasks that can improve reliability and save a lot of time for operations. They are good communicators and like to spend time with developers to understand how new products and services can be deployed and operated in high-scale, high-reliability environments." -Marc Hornbeek, CEO and principal consultant at Engineering DevOps Consulting and author of Engineering DevOps


2. "A great SRE ensures SLOs (Service Level Objectives) are set at the correct boundaries of a service; they define alerts to detect when SLI (Service Level Indicator) thresholds are crossed. They enable developers on CI/CD automation, quality thresholds, and deployment automation using infrastructure as code. They enable developers to understand how their applications are performing in production by building observability. They thoroughly understand deployment and fail-safe strategies. They influence the building of fault-tolerant, autoscaling, cost-efficient, high-performing design and architecture.

"An SRE should ensure the consumption of platform standards and consistency of tooling. SREs handle on-call events and do post-mortems. They ensure error budgets are followed, they ensure self-regulation of velocity and stability, and they ensure excess Ops work overflows to the Dev team." -Shivagami Gugan, CTO at CX Tech Unicorn

3. Prize communication. "A great SRE must have a mix of developer and operations skills. Ideally, the person is not just an ops person and not just a development person. The person must transition between ops and dev very smoothly. A great SRE knows how to communicate well, either writing documentation or talking with their colleagues (especially when working remotely)." -Andre Almar, Co-founder and technical trainer at DevOps Bootcamp


4. Look for longer-term support experience. "When Google pioneered the SRE approach, they were adamant that all SREs be skilled developers. So, spotting a good SRE is very similar to how one would identify/screen for a good developer. In our company, we use HackerRank to test the proficiency of the devs we hire. Culturally though, the best SREs are developers who have spent time actually maintaining the products that they have built. Many organizations and service providers still adopt short-term project-oriented team structures, so developers end up being shuffled from one product to another instead of sticking with the same product and learning how to support/improve/stabilize it over time." -Lisa Chan, Head of software engineering & DevOps at PETRONAS

5. Look for a person who demonstrates empathy. "Typically, the greatest concentration is on the technical skills, and yes, these are important and to be considered when looking at the toolset to be employed. However, knowledge in the use of tools is something that can be easily trained. Furthermore, any enterprise implementing good SRE is also considering that tools can be easily swapped out, so the need to know and have experience in specific technologies is really not as fundamental as other areas that can’t be trained.

"To spot a great SRE, it is key to find someone who has empathy. The greatest barrier to the implementation of any way of working is culture, and for Agile, DevOps, and SRE, it is about an open culture. The greatest enemy to having a flowing and open culture is a closed mind. If a candidate is the kind of person who will consider their own role as primary and all others as secondary, they are possibly not the best fit. Therefore, and this is good advice for candidates too, have a holistic perspective on the role you are in and a balanced perspective on how you fit with and impact the other roles around you. Beyond holistics, it is also about having respect for what others do and the challenges they may face. In all, empathy!" -Stephen Walters, Solution architect at xMatters, Inc.

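For hiring managers less familiar with the SLO, SLI, and error budget vocabulary in the second quote, here is a minimal Python sketch of how an error budget could be computed for a request-based availability SLO. The `ErrorBudget` class, its field names, and the example numbers are hypothetical and chosen only for illustration; real teams define their own SLIs, windows, and policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical helper: tracks failures against an availability SLO."""

    slo_target: float     # assumed example: 0.999 means 99.9% of requests succeed
    total_requests: int   # requests observed in the SLO window
    failed_requests: int  # requests that missed the SLI threshold

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining(self) -> float:
        # Budget left after subtracting the failures already observed.
        return self.allowed_failures - self.failed_requests

    @property
    def exhausted(self) -> bool:
        # Once the budget is spent, a team would typically favor stability work.
        return self.remaining <= 0


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the article.
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    print(f"allowed failures: {budget.allowed_failures:.0f}")
    print(f"remaining budget: {budget.remaining:.0f}")
    print(f"budget exhausted: {budget.exhausted}")
```

With these assumed numbers, a 99.9 percent target over one million requests allows roughly 1,000 failures, so 400 failures leave about 600 in the budget. A candidate who can walk through this kind of arithmetic and explain what the team should do when the budget runs out is showing the balance of velocity and stability the quote describes.
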
Want to continue the SRE discussion? Connect with DevOps Institute Ambassadors and speakers during SKILup Day: Site Reliability Engineering on May 20. Register here.

Jayne Groll is co-founder and CEO of the DevOps Institute (DOI). Jayne holds many IT credentials, including ITIL Expert™, Certified ScrumMaster, Certified Agile Service Manager, DevOps Foundation, and Certified Process Design Engineer (CPDE)™.