Performance-related issues are among the hardest IT problems to solve, period. When something is broken, alarm bells sound (metaphorically, in most cases) and alerts are sent to let IT ops know there’s an issue. But when performance slows for end-user applications, there’s often no notification until someone calls to complain. Yet losing 10 or 15 seconds out of every minute for an hour is at least as bad as an hour of downtime in an 8-hour day, and maybe worse: beyond the raw productivity hit there’s significant frustration, because at least when the system is really down, users can turn their attention to other tasks while the fixes are installed.
One reason this remains an ongoing issue for many IT organizations is that few management tools provide an overall view of the entire IT infrastructure with the ability to correlate across all of its different components. It’s not that there aren’t individual tools to monitor and manage everything in the system; it’s that coordinating results from these different tools is time-consuming and hard to do. There are plenty of choices for server monitoring, desktop monitoring, application monitoring, network monitoring, cloud monitoring and so on, and there are suites of products that cover many of the bases. The challenge is that in most cases these management components never get integrated to the point where the overall solution can find the problem and quickly identify the root cause.
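To make that correlation gap concrete, here is a minimal sketch in Python of the kind of logic an integrated solution has to apply across tool boundaries. The alert records, field names and topology relationships are hypothetical illustrations, not any vendor’s API: alerts from separate monitors are grouped by time window, and the most upstream resource in each group is flagged as the probable root cause.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    tool: str         # which monitor raised it (network, server, app, ...)
    resource: str     # the component the alert refers to
    depends_on: str   # upstream dependency, from a hypothetical topology map
    timestamp: float  # seconds since the epoch

def correlate(alerts, window=120):
    """Group alerts arriving within `window` seconds of each other, then
    treat the most upstream resource in each group as the likely root
    cause and everything downstream of it as a symptom."""
    alerts = sorted(alerts, key=lambda a: a.timestamp)
    groups, current = [], []
    for a in alerts:
        if current and a.timestamp - current[-1].timestamp > window:
            groups.append(current)
            current = []
        current.append(a)
    if current:
        groups.append(current)

    findings = []
    for group in groups:
        in_group = {a.resource for a in group}
        # Root-cause candidates: alerting resources whose own upstream
        # dependency is NOT alerting, so the fault likely starts there.
        candidates = [a for a in group if a.depends_on not in in_group]
        root = (candidates[0] if candidates else group[0]).resource
        findings.append((root, sorted(in_group - {root})))
    return findings

# Three tools each report a symptom; the correlated view points at the switch.
alerts = [
    Alert("network", "core-switch-1", "datacenter",    1000.0),
    Alert("server",  "db-server-3",   "core-switch-1", 1030.0),
    Alert("app",     "crm-frontend",  "db-server-3",   1055.0),
]
print(correlate(alerts))  # [('core-switch-1', ['crm-frontend', 'db-server-3'])]
```

In a real environment the hard part is exactly what this sketch assumes away: a shared data model and an up-to-date topology map spanning every tool’s domain.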
If IT were a static science, it’s a good bet this problem would have been solved long ago. But as we know, IT is a hotbed of innovation. New services, capabilities and approaches are released regularly, and the immense variety of infrastructure components supporting today’s IT environments makes it difficult for monitoring and management vendors to keep up. New management tools appear frequently too, but the effort of supporting existing infrastructures is often cost-prohibitive for start-ups trying to bring their new management capabilities to market quickly.
IT management is its own “big data” problem.
As IT infrastructure grows ever more complex, IT management is becoming its own big data problem. Querying an individual device or server to check status and performance may retrieve only a relatively small amount of data, but it is a diverse set of information covering numerous parameters and configuration items. Polling mobile devices and desktops, servers, applications, cloud services, hypervisors, routers, switches, firewalls and more generates a whole lot of data, with each type of item having its own unique set of parameters and configurations to retrieve. Polling hundreds, thousands or even tens of thousands of devices every few minutes (so that the management system stays current with device status) can create significant network traffic that must be carried without impacting business applications.

On top of that, the volume of data, the polling frequency and the resulting velocity of traffic must all be accommodated to support storage, trend analysis and real-time processing. System management information is usually stored for months or even years so that historical trends and analyses can be performed. Most importantly, the management system needs to process information rapidly, in real time: correlating cause and effect, suppressing downstream alert and alarm conditions, and performing predictive analysis so that valid messages can proactively alert IT ops.

This is where system management architecture becomes important. Add the need for flexibility to accommodate the ever-changing IT landscape, and management system design to support this “big data application” becomes a critical issue.
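A quick back-of-envelope calculation makes the scale concrete. The figures below are illustrative assumptions rather than measurements, but even these modest per-device numbers compound into terabytes of retained telemetry:

```python
# Back-of-envelope sizing for a polled monitoring system.
# All figures below are illustrative assumptions, not measurements.

devices          = 10_000   # managed endpoints, servers, network gear
metrics_per_poll = 50       # parameters/configuration items per device
bytes_per_metric = 100      # serialized name + value + timestamp
poll_interval_s  = 300      # poll every 5 minutes
retention_days   = 365      # keep a year of history for trend analysis

polls_per_day = 86_400 / poll_interval_s                                  # 288
bytes_per_day = devices * metrics_per_poll * bytes_per_metric * polls_per_day
avg_bits_per_s = devices * metrics_per_poll * bytes_per_metric * 8 / poll_interval_s

print(f"Data ingested per day:  {bytes_per_day / 1e9:.1f} GB")            # 14.4 GB
print(f"Stored over a year:     {bytes_per_day * retention_days / 1e12:.1f} TB")  # 5.3 TB
print(f"Average poll bandwidth: {avg_bits_per_s / 1e6:.1f} Mbit/s")       # 1.3 Mbit/s
```

The averages look tame; it is the burstiness of polling, the diversity of per-device schemas, and the real-time correlation workload layered on top that turn this into a genuine big data architecture problem.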
This, in part, is why IT management vendors are migrating their solutions to the cloud. As IT infrastructures continue to expand in size, complexity and diversity, supporting the volumes of data generated, and delivering the performance needed for comprehensive root-cause analysis, becomes more and more challenging for purely on-premises solutions. In addition, cloud solutions offer the promise of “best practice” advice that can only be derived from a shared-service model spanning many customers, with the caveat that security and privacy remain paramount.
Of course, cloud solutions, with their pay-as-you-use pricing and installations that require no on-premises infrastructure, are also becoming far more attractive than perpetual licensing arrangements. The bottom line, however, is that cloud-architected solutions are extremely extensible and can more quickly and easily accommodate new functionality to the benefit of all users. Not least among those benefits is the ability to deploy better diagnostic tools and capabilities to meet the needs of today’s diverse IT user communities for high levels of system availability AND performance.