Implementing new technologies to solve existing issues is part and parcel of the CIO's world. As new technologies appear and are validated as useful and safe for enterprise deployment, active IT departments seize the opportunity to put them to work.
Big data is one of the technologies touted as a solution for many issues. Technology executives who can move from initial deployment to a fully optimized, elegant implementation that delivers more than was initially envisioned help move their companies ahead of the pack.
Austin Park, senior vice president and chief technology officer for Paragon Development Systems, works with projects that involve big data and offered his thoughts on how to optimize big data projects.
"Optimizing a big data environment can be a bit daunting simply due to the sheer volume of data we have to be concerned with. But you can break it down to some key areas of concern and approach them sequentially," Park said.
"One of the first considerations is the storage costs. Most organizations make storage hardware decisions based on a cost/performance formula they perform once every three years during a refresh. They need to adopt a far more granular approach when selecting appropriate storage for big data. Often making quarterly or even monthly adjustments to the storage architecture. Many are looking at software defined storage to make this process easier, but ultimately it needs to support the tools you plan on using.
"Secondly, big data involves big compute. Optimizing that inevitably involves taking a look at new infrastructure architectures that incorporate lots of automation to ensure that you squeeze out every last bit of computational capacity. This usually involves building your own open stack environment or leveraging a cloud service.
"Thirdly, given the amount of data we are working with, the better organized the information is, the better optimized the solution will be. This, of course, is much easier said than done. Therefore, many organizations have started the work of creating data lakes. This is a relatively new term to describe a data-centered architecture, where silos are minimized, and processing happens with little friction in a scalable, distributed environment. This means the data itself would not be encumbered by poorly-organized schema decisions that may have happened years before knowing the type of data we are working with."
Clearly, the path to optimizing the use of big data is not a simple one. But since we can expect big data to become even bigger over time, it's a journey better started as soon as possible, with a solid plan in place.