COVID-19 pivot: Lessons learned at Bloomberg
How did IT teams at financial data giant Bloomberg pivot to stay enterprise ready during COVID-19? Here’s what their head of telemetry shared
During the first days and weeks as countries, states, and cities around the world went into a coronavirus lockdown, companies had to deal with considerable stress on their people, processes and technology. I talked with Stig Sorensen, Head of Telemetry at Bloomberg, about their organization’s experience and lessons learned.
Sorensen started his career as a financial software developer building Bloomberg trading systems in the early 2000s. “Looking back, the team I started on then would be called an SRE team today, so I can say that built a solid foundation,” he says. “After 12 years of building software and software teams in the Trading Systems department, I moved to manage a new Production Visibility group responsible for Telemetry and inventory management tools, as well as serving as the executive sponsor for the SRE movement across our Engineering department.”
Six questions with Stig Sorensen, Head of Telemetry at Bloomberg
Q: Which pre-existing processes were in place to ensure that Bloomberg was prepared for near 100 percent work-from-home requirements?
Sorensen: We have always been a place where everyone is on call, so in that sense, it has been business as usual. We’ve had to do more triage and issue handling from home now. While we haven’t traditionally been a ‘remote first’ organization, we are more than capable of empowering our workforce to operate in a collaborative manner from home. Kicking off new larger projects may require different processes. Lots of our ‘keeping the lights on’ processes have always been there.
We have a solid incident management org and good communication channels. If you rely on people being in the office, you’re probably worse off. People here are prepared to answer questions and requests from home. While doing so, they choose the person most likely to be able to fix it or who would be awake. Having everyone at home for long periods of time is new for us, but we are generally comfortable with working from home.
Q: What stresses did this put on Bloomberg’s systems and infrastructure? Also, how did the highly volatile markets contribute to these stresses?
Sorensen: We had two main factors that weren’t standard. First, market instability and volatility were off the charts, and many of our systems are designed to scale to handle the increase in volumes of market data. The second factor is that many of our clients are now working from home, where they are using different methods than normal to connect with our systems. Some systems that aren’t normally under stress have now become stressed. Clients are coming in more via public broadband or our mobile Bloomberg Professional app.
In terms of preparation for this kind of business disruption, we’ve learned from previous incidents, like 9/11 and Hurricane Sandy, which displaced our clients for shorter periods of time. For most clients, this is the first time they’ve had to use our systems to do certain things while working from home for the long term. It isn’t easy to understand or anticipate all the different patterns or how clients will use our systems. Different access points create different stresses within our network and application infrastructure.
Q: Which tools do teams use to collaborate while working remotely?
Sorensen: Instant Bloomberg (IB) is the instant messaging system that our clients use to stay connected and share trading ideas and prices with one another. Our employees also use this internally to communicate, as one element of our collaboration suite. Our engineers also use it as our ticket management system and it’s well integrated with alarm issues and so on. We also have our NEXI video communications platform that our employees can use for face-to-face meetings.
We work in an Agile manner, so the purer the Agile teams are, the easier it is to run in a distributed model. With daily check-ins, it’s easy to spot if someone is stuck and not making progress. Doing things like whiteboarding in a room with your team will usually feel more collaborative, but we have plenty of experience with handling these scenarios remotely.
Q: Do measurements and metrics change when teams are all WFH? If so, what kind of changes have been made?
Sorensen: No. Managers must think of it differently to make sure that not everyone starts managing by numbers. Don’t get me wrong, this isn’t easy to do when we’re not all together in person, but they have to trust their teams and learn to read people’s emotions differently.
Most teams have doubled the time for their daily stand-ups. In practice, this means spending 15 minutes to catch up and chat before the usual 15 minutes for the work portion of the meeting. We have people who are single or have no family at home, and they might need additional support right now. Teams are learning to understand that. Our leadership and HR teams have communicated this throughout our ranks. Family and health are priority number one. Work will still be there when we come back. This sets the right tone from the top and helps relieve some of the pressure that everyone is feeling.
Q: Where did you go wrong? What could have been done better?
Sorensen: From a technical point of view, we are in a pretty good place. The initial market data volumes were much larger than expected, but we’ve been able to handle them. We invested a lot in local development (on laptops) and productivity for our engineers. This has been a tremendous win to be able to take this and have the same experience developing at home. If we had not made these preparations, we would have been in a worse place right now. These were early investments we made well before the COVID-19 pandemic. People are as happy as they can be, relatively speaking, given the circumstances.
Q: Tell us about your team and the importance of the telemetry platform.
Sorensen: A key thing is to have a centralized monitoring system. When so many systems are under so much extra stress, having one place to go for monitoring is incredibly important. Having people share their views and be able to look across all the systems is important.
We had invested in this over the last few years, which has allowed us to pull up centralized monitoring as a service. Our telemetry platform has become a center of excellence for monitoring. This is managed for everyone across our department of 6,000-plus engineers around the globe and it is easy to look across. If we didn’t have this, each team would have their own instance and there would be no overarching view.
[ Editor's note: A version of this article appeared originally on the mthree blog. It is republished here with permission. ]