Digital transformation: How we took NoOps to the next level

Dynatrace CTO Bernd Greifeneder explains how an evolution of the "no operations" approach is making life easier for engineers and improving customer experience

Dynatrace is a Software-as-a-Service company with its origins in digital, so our own digital transformation might look a little different from that of organizations that have had to move processes from analog/manual to digital/automated or take on cloud migration from scratch. We’re working on transformation in multiple areas, but our operations approach has been at the core of much of that change.

As our SaaS and Managed Service products have evolved and our team has grown, scaling became one of our biggest challenges. For the last five years we have operated on a NoOps approach: there is no operations team, only a DevOps-inspired release coordination team acting as a NoOps role model for the entire engineering team.

This leaves engineering teams responsible for deploying and operating their software, from automation all the way to production. Because we were already organized around the NoOps model, it seemed counterintuitive to try to grow the release coordination team to scale with the business. It’s called NoOps for a reason; teams shouldn’t be that large or complex.

Instead of expanding that team, we created a practice based on the NoOps idea and on our own experience with cultural, business, and organizational best practices that could be documented and taught. We needed this to bring NoOps to more of our rapidly growing engineering teams, now through a more structured enablement approach. Taking NoOps to scale in this way is what we call Autonomous Cloud Enablement (ACE).

ACE vs. SRE: Our logic

We’ll have to take a step back here to talk about what ACE is and why we went that route. We knew that many other companies employ Site Reliability Engineering (SRE). SRE teams are responsible for reliability aspects including availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning, and they act as a bridge to operations. With such a full task list, SRE teams create automation for software they didn’t build themselves, which, ironically, starts to push them back toward serving the same role as a traditional ops team, just with more dev-like automation.

With ACE, our goal was to keep the production (ops) responsibility with the developers who built the services, because this is the only way they feel the pain of poor architecture, poor performance, poor coding, and poor usability. As a result, they proactively take precautions, add monitoring, and include production information in daily standups. And because no developer wants to be woken up at night because a system is down, they are incentivized to ensure the software they build is as self-healing as possible. 

The key difference here is that while a site reliability engineer might automate the product after the fact, ACE practices ensure the product and services are engineered to be self-healing from the start. This removes bottlenecks when it comes to scalability, and creates bandwidth for teams to be creative and proactive.

Culture change required

ACE is more than an ops model; it’s also a philosophy and culture, and it has facilitated our organizational growth and ushered in new ways for us to collaborate. This has been an important transformation for Dynatrace. Obviously, ACE is about removing silos, which automatically leads to greater collaboration, but it doesn’t stop there. The model is not well supported by traditional ways of working together, so we had to identify, and in some instances create, new tools and methodologies that allowed us to get the most out of ACE.

I've previously written about how using automation and cloud technologies to support a NoOps model can impact your company culture. In the early stages of NoOps adoption, it’s very common to see internal teams becoming nervous about the change; technicians are hesitant to understand the business, while the business teams are often cautious about learning operations. ACE teams face many of the same challenges in implementing a new set of practices across an organization, which is a significant cultural shift.

We achieved that culture shift by introducing new automation tools and training teams on them, breaking down departmental silos, deploying new processes, and getting buy-in from leadership. This was a gradual process, and onboarding took us a year.

Supported by automation benefits, our ACE culture leads to a more effective team overall and, therefore, better software. Treating the symptoms of a broken product after the fact is not as effective or efficient as building software that heals itself before issues reach customers. Taking this approach to operations, products, and services makes life easier for our engineers and, ultimately, leads to more value for our customers.

Taking customer experience to the next level

Results demonstrate our initial success. From 2017 to 2019, customer-identified production bugs remained steady at 7 to 8 percent, despite growth of more than 100 percent in the number of customers using Dynatrace SaaS or managed services during that same period. This means that, despite our dramatically increasing scale, our dev teams are finding roughly 92 to 93 percent of our bugs before they impact our customers’ experience.
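The arithmetic behind that figure is simple enough to sketch. The snippet below is only an illustration: the function name `internally_found_share` is hypothetical, and it assumes every bug is counted as either customer-identified or found internally, which the numbers above imply but do not state outright.

```python
def internally_found_share(customer_found_pct: float) -> float:
    """Return the percentage of bugs caught before customers reported them.

    Assumes each bug is either customer-identified or found internally.
    """
    if not 0.0 <= customer_found_pct <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    return 100.0 - customer_found_pct


if __name__ == "__main__":
    # The 7 to 8 percent range quoted above for customer-identified bugs.
    for customer_pct in (7.0, 8.0):
        internal_pct = internally_found_share(customer_pct)
        print(f"{customer_pct:.0f}% customer-identified, "
              f"{internal_pct:.0f}% found internally")
```

Under that assumption, a customer-identified share of 7 to 8 percent corresponds to 92 to 93 percent of bugs being caught by the dev teams first.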

This is what I consider another key area of our transformation – transforming Dynatrace from a mere product and service provider to an active driver of new value for our customers. The ACE model has been key to this, opening new opportunities for us to predict and anticipate customer needs and answer them before the customer even has to ask. As you might imagine, that entails a great deal of automation functionality. ACE means that we have built a foundation of practices on which all our teams can use those tools to their best potential.

Powered by automation, we’re not only making sure we have the right support, analytics, and other elements in place to anticipate customer needs, but also creating an environment where our engineers can mature with the capabilities of the software and proactively identify and take advantage of opportunities to do more. ACE is an evolution of DevOps, NoOps, and SRE that I credit with setting Dynatrace, and our customers, up for our own evolutions and steady, scalable, sustainable growth.

Always driven by what’s just over the horizon, SVP and Chief Technology Officer Bernd Greifeneder is taking Dynatrace solutions into a brilliant future. He’s a serial entrepreneur, Dynatrace being his third successful venture.