Today’s IT infrastructure and operations leaders face daunting challenges. New infrastructure and architectures such as microservices, containers, and serverless have enabled IT teams to increase scalability and flexibility. At the same time, modern business technology ecosystems that serve hundreds of millions of customers simultaneously have become so complex that no single person can grasp every detail of their infrastructure, software, and services.
According to DevOps Institute chief research director Eveline Oehrlich, the present and future of the digital business are driving a wide variety of applications, business services, and data sources deployed on diverse platforms including on-premises, private cloud, hybrid, public cloud, and multi-cloud environments. Adoption of new technologies, coupled with the increased speed of application delivery, has created demand for IT operations professionals with updated skills and knowledge. Findings from the 2021 Upskilling Report show that global site reliability engineering (SRE) adoption in 2021 was 22 percent and is predicted to double in 2022.
As a result, companies are searching for site reliability engineers, whose job is to help create and implement automated software tools that maximize a system’s reliability and efficiency while working closely with software development and IT operations teams.
Why IT professionals see rising demand for SREs
I asked DevOps Institute Ambassadors to share their thoughts about why the role of site reliability engineer is one of the most in-demand job titles this year. Here’s what they had to say.
Removal of performance bottlenecks
“The demand to perform continuing reliability analysis of existing infrastructure by focusing on removing performance bottlenecks – all while optimizing the infrastructure and workflows to deliver operational resilience – makes a site reliability engineer an attractive role to fill.
“Over the past few years, site reliability engineering teams have reduced significant amounts of toil across the deployment pipeline, reduced technical debt, and ensured improved customer experiences. SRE has been the only approach to derive the value you get from something running in production and help recalibrate design and development for sustainable business value.” - Suresh GP, managing director, TaUB Solutions LLC, USA
Software development has caused a shift
“We are in the middle of a massive shift in how we develop software. First came the use of containers, then new ways to manage containers in a production environment, and now microservices. All of these changes require extra help in the area of maintaining service level objectives and ensuring the reliability of software. This shift drives the rising demand for site reliability engineers. Adapting the DevOps pipeline to manage the full application stack will be critical now and in the future.” - Tracy Ragan, CEO and co-founder, DeployHub
Complicated problems get a real-time solution
“Site reliability engineers sound like problem-fixers to hiring managers, which is always a quality in high demand. They jump in and find a way to move from identifying a problem to finding the root causes and offering solutions. These tend to be senior people in DevOps organizations who understand the process from idea to delivery to customer response.
“Employing an individual you can hand a complicated problem to and getting a real-time solution is powerful. However, the truth is, very few individuals can make this work regularly in large organizations. Things are too complicated, rooted in paperwork, and tied to other structures that prevent success. If your organization has dedicated DevOps teams, site reliability engineers can make a huge difference in taking items that would appear as unplanned work off the team and contribute to overall success.” - Mark Peters, technical lead, Novetta
Increased DevOps adoption requires more SRE roles
“With the increased adoption of DevOps by organizations and digital businesses, the site reliability engineer role becomes more important to support sustainable growth. They play a key role in business improvement and building better IT services within an organization. Their role can also be seen as a more proactive form of QA and they are dedicated full-time to creating software that improves the reliability of systems in production, fixing issues, responding to incidents, and usually taking on-call responsibilities.” - Parveen Kr. Arora, co-founder and director, VVnT Foundation
Providing a bridge between development and operations
“By bringing a software engineering perspective to system administration, site reliability engineers provide a bridge between development and operations. They teach developers how to create dependable services, which any company needs. They aim to ensure that an organization’s systems are operational 24 hours a day, seven days a week. Here, security, stability, and scalability are critical.” - Supratip Banerjee, solutions architect, Principal Global Services
Changing the siloed processes of Dev and Ops
“Because DevOps has a problem! The Puppet State of DevOps Report has shown that organizations are mired in the mid-tier of DevOps transformation maturity. In many cases, this is because it is development teams that have been driving the transformation off the back of Agile transformations. The first thing they have implemented as technologists is technology – automation and toolchains. In that process, they have forgotten about changing the culture, learning process, measurements, and sharing what they do. They still allow the last great silo wall between Dev and Ops to survive, and Ops gets left behind.
“To combat this, IT Ops undertakes an SRE transformation that lags the DevOps transformation that created so many DevOps Engineers. Now we have a demand for site reliability engineers. Ironically, this continues the great silo wall between Dev and Ops.” - Stephen Walters, field CTO, CEM Digital
SREs can keep up with competitive marketplaces
“Site reliability engineers are in demand simply because, without a smart and capable SRE practice, most organizations wouldn’t be able to survive in current competitive marketplaces.
“Adding to that, it is not easy to find a good site reliability engineer because it is a special function that requires many skills – i.e., coding, scripting in multiple languages; and knowledge in multiple areas – i.e., cloud computing, virtualization, containerization, Kubernetes, OS, etc.” - Samer Akkoub, senior alliances/channels solutions architect (APJ), GitLab
Reliable service depends on SRE talents
“All companies depend on software to run and can no longer afford IT outages where even a few minutes outage can mean a significant loss in revenue and reputation. The adoption of cloud technologies and microservices architecture has made it easy for companies to have global reach and rapidly scale their services to meet the demand of millions of customers. However, running at that scale requires robust practices to ensure you can run a reliable service. That’s what site reliability engineering aims to solve and why site reliability engineers are so in demand.” - Craig Cook, principal engineer, Catapult CX
Continue learning about Site Reliability Engineering at the next SKILup Day on February 17, 2022.