Behind-the-scenes: How the FCC migrated to the cloud (Part 1)
Part one of the two-part series on the FCC’s IT modernization and journey to the cloud. Read Part Two.
Last year, The Enterprisers Project spoke with Dr. David Bray, CIO of the U.S. Federal Communications Commission, about his ambitious plans to overhaul what were then 207 legacy systems at the agency and move all physical servers off site to a commercial managed services vendor.
The FCC IT organization recently completed “Operation Server Lift” (as the initiative was dubbed) and is now enjoying a number of benefits. In all, the move to the cloud and a commercial service provider reduced the rack count from 90 to 72, lowering both footprint and operating costs. The move also eliminated or consolidated infrastructure where possible, such as retiring two 11-year-old servers that weighed one ton each. It cut ongoing maintenance costs from more than 85 percent of the IT budget to less than 50 percent. The effort also gave the FCC a more accurate inventory of all the applications, unique configurations, and software in its environment. Bray likened the feeling to moving to a new house after 20 years in the old one. “You finally get a sense of everything you own,” he said.
The overhaul (not quite Mission: Impossible, but definitely Mission: Complex) follows a timeline that dates back to January 2014 and culminates in a countdown to Sept. 4, 2015. In this two-part series, we asked Bray to walk us through the successes, lessons learned, and nail-biting moments along the way.
Making a case for an IT overhaul
As you might expect, upgrading 207 legacy systems and making a large-scale move to the cloud takes a lot of planning. Bray came on board as CIO of the FCC in late 2013 and, after an initial “listen and learn” tour with the FCC’s 18 different Bureaus and Offices, determined in January 2014 that the Commission’s IT was not in a sustainable state. Eighty-five percent of its IT budget went to maintaining legacy systems. Morale on the IT team was also low; the Commission had gone through nine CIOs in the eight years before his arrival. Having listened to multiple perspectives both inside and outside the Commission, Bray swiftly got to work developing an overhaul plan.
As Bray refined his overhaul plan, which he later dubbed an “intervention,” he had to work out how to move forward on a tightly constrained budget. The first step was to stop having to patch and maintain individual PCs.
“Starting in January and within a four-month window, we rolled out Virtual Desktop to the entire Commission at an accelerated clip,” said Bray. “That essentially did away with the users' desktops, meaning, we could refocus resources previously spent on individually patching and maintaining each PC in the organization to focus on the planned ‘Operation Server Lift.’ Making this shift would also allow us to move the servers with as minimal disruption to the user as possible, because we wouldn’t have to actually move their software and their hardware.”
To strengthen his case, in September 2014 Bray concentrated efforts on replacing the FCC's 15-year-old Consumer Help Desk to show that it was possible to go from a legacy IT system to a cloud-based one quickly. “The private sector quoted that to do it 'on premise' would probably take between one to two years and cost about $3.2 million. We didn’t have that time, let alone that money. So in September, we explored doing it as Software-as-a-Service using a commercial cloud product, a move that would let us get it done in six months,” Bray said.
Within four months, the FCC rolled out the SaaS Consumer Help Desk to the public. “Thanks to doing Software-as-a-Service, we did it in less than six months and for about one-sixth the price,” Bray said. [Read How a change agent saved the FCC millions on a legacy IT upgrade for more background.]
The project was a success, but of course, not everything can be easily moved to the cloud. The FCC had some legacy 32-bit applications running on old hardware that couldn’t make a direct leap to the cloud. As long as the FCC kept maintaining these servers, an ever-growing share of its budget would be tied up just supporting old systems. “Maintaining 207 systems on-premise at the FCC just didn’t make sense from a cost or a workforce perspective,” Bray said. “We had to do a dramatic intervention.”
By January 2015, the FCC had shown that, given funding, it could move legacy systems to newer cloud platforms for a fraction of the price of a new on-premise system, and in less than half the time. That is when “Operation Server Lift” really began to take shape and ramp up. Some legacy systems could follow the help desk's path to the cloud, some needed to be retired, and some needed to be moved to a commercial provider: one that could take over maintaining and caring for those systems so the FCC wouldn't have to, and do it with better economies of scale, Bray explained.
“We literally retired two Sun E25Ks, which as background, these systems each weigh one ton. We clearly did not want to load those into the trucks, and they were 11 years old. Those were moved to newer server blades that were lighter, more modular, etc., so that they could be more easily transported to the commercial data facility. Those one-ton systems could now be gracefully retired and disposed of as appropriate,” Bray said.
By the end of Bray's first year as CIO, significant progress had been made.
“We had gone from having 207 different IT systems down to approximately 100. But despite the forward momentum, we still had over 300 servers and about 400 applications running different software on the stack,” Bray said. “There was still a long road ahead of us.”
Preparing for all outcomes
As 2015 marched on, Bray continued to spearhead projects that would ultimately make “Operation Server Lift” a success. In February, the FCC awarded a contract to move all email and documents to the cloud. This was an important step because the FCC didn’t have the budget to fully replicate all its existing servers at the commercial service provider. With funds severely constrained, the only way “Operation Server Lift” could succeed was to power down the servers, physically move them, and power them back up at the commercial service provider.
Bray explained that migrating the FCC’s email and documents to the cloud enabled the FCC to complete the server lift without affecting anyone's inbox or access to email. It also reduced the FCC's email server footprint, internal operations and maintenance costs, and contractor overhead.
With the FCC’s 1,750 employees and 500 contractors, moving all email to the cloud was estimated to take about five months, but Bray’s ambitious goal was to complete it in only two. The process, which began in June 2015, was indeed completed by August at a marathon pace.
“To the entire team’s credit (especially two positive change agents Layli Suel and Nitin Madan), everyone rallied and achieved a roll-out within a very tight window,” said Bray. The compressed timeframe, Bray told us, was driven by the possibility of a government shutdown in October 2015; the FCC wanted to complete “Operation Server Lift” before any such event.
In March 2015, an RFP was issued for datacenter consolidation, optimization, and the actual move of the FCC’s servers, and two months later a contract was awarded. In the midst of this activity, after six months of user feedback, web content and web analytics analysis, card sorting, external audience focus groups, and one-on-one user experience interviews, Bray's team rolled out a prototype of a revamped FCC.gov website that would be fully hosted in the cloud. “There are so many different stakeholders that use different FCC.gov services, it was extremely important we spend time listening and involving them in designing a refreshed design for FCC.gov’s web content,” Bray said.
As the D-Day for “Operation Server Lift” neared, Bray weighed all possible outcomes to ensure he was prepared.
“FCC had about 400 terabytes of data, all on these different servers on our storage area network,” Bray said. “Our budget was so constrained we couldn’t do a mirrored replication of everything, so my biggest concern was 'What if the storage area network is in a truck that gets hit by a car?' We can restore it from backup, but that's a very slow process. It would probably require two to three weeks to do. I didn’t think the public would be that patient with us if it came to that conclusion.”
To prepare for this worst-case scenario, “The FCC IT Team made a complete copy of the Commission’s storage area network and planned to have the two versions leave on separate days, following two separate routes, in two separate trucks,” Bray said. “One would not leave the FCC until we confirmed that the other one had arrived in the commercial facility intact.”
In early August 2015, 28 days before the servers were to be powered down, Bray's team started a countdown clock tracking the days, hours, minutes, and seconds until “Operation Server Lift.” In Part Two, find out what happened to those two trucks as Bray shares the lessons learned and the critical moments from the last leg of the journey.