You've spent millions of dollars on systems that were state-of-the-art the year you installed them. But time and technology have marched on since then. Should you keep patching what you've got or rip it out and start over? And just how do you decide what to keep and what to replace?
Take the long view, and take "technical debt" into account, advises Guido Laures, CTO of Spreadshirt, an e-commerce platform for print-on-demand apparel and accessories. In an interview with The Enterprisers Project, he explains how he and his team make these decisions at Spreadshirt.
The Enterprisers Project (TEP): Spreadshirt is a 15-year-old company and technology has advanced quite a bit in that time. Once investments have been made in legacy systems and they are used to operate key functions, how do you make the difficult decision to make changes?
Laures: The stark reality of a technology-driven business is that there is no such thing as "protecting investments" in IT systems. No matter how much you invested in a technical component, it is always subject to replacement.
Take a long-term view and realize that every investment in technology immediately turns into technical debt, which generates associated costs. Usually, the higher the investment, the higher the maintenance costs. Hence, killing a component that represents a lot of investment also means cost savings. In most cases, you gain flexibility if you replace old components that are hard to use. Understand that holding onto outdated technology makes the enterprise slower and less efficient, and that this also has associated costs.
TEP: How has taking a long-term view helped you make these decisions at Spreadshirt?
Laures: Our e-commerce platform evolved over more than 10 years, starting as a PHP monolith back in 2003. In 2013, our system reached the "point-of-no-more-innovation." Any improvements, or even small new features, required major development efforts. Every change had become risky because of the many interdependencies between mostly unrelated parts of the platform. Changing something in one place caused multiple issues in other places in the architecture. The decision to replace instead of further updating wasn't difficult.
The IT department decided to migrate the monolith to a modern, microservice-based architecture while still creating new, business-generating features. Given some previously unsuccessful attempts to update the systems, we knew that we would require at least three years for the process of transferring our core business process into the bright new service-based world.
TEP: That sounds like a big project. How did you decide where to start?
Laures: The order of migration is defined by innovation pressure. We analyzed those areas of the business that were suffering most from the inability to change and modernize. It turned out that the front-end areas, to which search engines and ads route most of our traffic, were the most affected.
But we only migrated what needed to be touched anyway. Techies sometimes tend to work in an all-or-nothing approach. Trying to avoid complexity can lead to something like: "Let's start from scratch on a green field and switch everything to the new platform once it is finished."
This is fine as long as the system to be migrated has only a short history. But that's not always the case. At Spreadshirt, the monolith consisted of a couple of million lines of code covering three different business models, ERP and production processes. The option of starting from scratch was off the table fairly quickly, as it would have taken years and the old system would not have provided any technical innovation to our customers during the transition. A successful big-bang migration at the end of such a process is something that I have never seen during my 15-year IT career.
Touching only those components that will be modernized next is a better way to reduce complexity: you stick to the things that work and still innovate in certain areas.
TEP: Once you've determined what needs to be migrated, what approach do you use to get the job done?
Laures: After identifying the first component to transfer into the new world, we had to define the process of migration. Besides the Big Bang approach, there are actually two additional paths to modernization: "Duplicate and Sync" and "Rip and Integrate."
With "Duplicate and Sync," one starts on a green field (but only for one component) and duplicates the business logic within the new (in our case) microservice. New components, especially new front ends, can then use these services and do not have to be part of the monolith anymore. To ensure that the service has access to the required data, database synchronization between the legacy and the new system is set up. After that, new front ends can easily integrate with the new service using modern front-end technologies.
"Rip and Integrate" tries to avoid error-prone and expensive data synchronizations by switching off the legacy component once the new microservice is available and rerouting all clients to the new service. This approach could also be seen as the last step of a component's legacy migration.
Organizing a large migration task within agile teams turned out to be the most efficient strategy for making sure that one team owns the entire business process.
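The two paths described in this answer can be illustrated with a minimal Python sketch. It is not Spreadshirt's implementation: the names `MigrationRouter`, `duplicate`, `rip`, `route`, and `sync` are hypothetical, the "databases" are plain dictionaries, and the synchronization is a naive one-way copy, whereas a real setup would need something far more careful.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A handler stands in for a component (legacy code path or new service).
Handler = Callable[[dict], str]


@dataclass
class MigrationRouter:
    """Hypothetical router that decides, per component, who serves a request."""
    legacy: Dict[str, Handler]
    modern: Dict[str, Handler] = field(default_factory=dict)

    def duplicate(self, component: str, handler: Handler) -> None:
        """'Duplicate and Sync': the new service runs alongside the legacy one."""
        self.modern[component] = handler

    def rip(self, component: str) -> None:
        """'Rip and Integrate': switch off legacy so all clients hit the new service."""
        if component not in self.modern:
            raise ValueError(f"no new service registered for {component!r}")
        self.legacy.pop(component, None)

    def route(self, component: str, request: dict, new_client: bool = False) -> str:
        if component not in self.legacy:
            # Legacy component has been switched off: everyone is rerouted.
            return self.modern[component](request)
        if new_client and component in self.modern:
            # During 'Duplicate and Sync', only new front ends use the new service.
            return self.modern[component](request)
        return self.legacy[component](request)


def sync(legacy_db: dict, new_db: dict) -> None:
    """Naive one-way data sync (assumption: legacy is the source of truth)."""
    new_db.update(legacy_db)


# Usage: migrate a hypothetical "checkout" component in two steps.
router = MigrationRouter(legacy={"checkout": lambda req: "legacy checkout"})
router.duplicate("checkout", lambda req: "new checkout")
old_front_end = router.route("checkout", {})                   # still served by legacy
new_front_end = router.route("checkout", {}, new_client=True)  # served by new service
router.rip("checkout")
after_rip = router.route("checkout", {})                       # rerouted to new service
```

While both handlers are registered, `sync` has to keep the new store up to date; after `rip`, the legacy handler is gone and the synchronization is no longer needed, which is the cost saving this path aims for.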
TEP: Can you describe an example of how this worked?
Laures: With the tactics described, it was an easy task to identify the first component to work on: The checkout! Back in 2013, the checkout was a multiple-step HTML page wizard, which was slow, ugly, and not responsive.
Almost 80 percent of customers who entered the checkout with the intention of buying some products dropped off somewhere in the checkout process. This was especially true for mobile users. Hence, we decided to start modernizing the checkout in early summer 2014 with the goal of putting it in front of our customers before the 2014 holiday season started. We were able to stick to this plan and released the MVP using a REST services back end and a modern UI based on responsive HTML5/JavaScript technologies.
The new checkout led to a 14 percent uplift in orders right after "go live." To get rid of the legacy code, we invested another six months so we could remove large portions of the PHP code. All of this was completed by one agile seven-person team including product management, UX and developers.