How to avoid a GDPR compliance audit: Best practices
How can CIOs prepare for the impending GDPR privacy regulations? Here are four strategies for Linux environments
The General Data Protection Regulation (GDPR), a set of consumer data privacy regulations that applies common guidelines to companies across the EU, poses a looming issue for CIOs. The enforcement date is May 2018, at which time companies must comply or face significant fines.
We’re not just talking about EU companies. According to the informational site about the regulations: “The GDPR not only applies to organisations located within the EU, but it will also apply to organisations located outside of the EU if they offer goods or services to, or monitor the behaviour of, EU data subjects. It applies to all companies processing and holding the personal data of data subjects residing in the European Union, regardless of the company’s location.”
However, many U.S. organizations are still racing to get ready for the GDPR. (See CIO.com’s article, GDPR: Why it is time for the CIOs to be counted, for more background on GDPR and what it means for IT.)
How can CIOs prepare in a way that heads off a GDPR compliance audit in a Linux environment?
Hadoop, OpenStack, hypervisor, and high-performance computing installers make it easy to build mega Linux clusters and to handle node failure with reasonable simplicity. However, GDPR compliance may not be so easy, even for Linux environments. With fines of up to €20 million or 4 percent of total worldwide annual turnover (whichever is higher) looming for a GDPR violation, data privacy and encryption are important and should remain a key part of your GDPR compliance planning. Yet CIOs also need to deal with data infrastructure and, in an audit, provide clear documentation of how the software manipulates, aggregates, and anonymizes or de-anonymizes consumers’ data.
How auditors will conduct their review of your organization’s people-related data storage and manipulation should be equally important as you shape your privacy and security measures. Here are four best practices to consider.
1. Streamline certified admins
An audit needs to prove how personal data is stored and how it is manipulated. Therefore, you must identify who on your staff can create, change, or log in to application-specific accounts or, worse, the operating system root account.
To partly address this challenge, there are big data and HPC cluster software tools with specific setup and deployment models that suggest standard templates for installation. Unfortunately, fewer than 20 percent of Hadoop licences purchased worldwide have moved into live production. Furthermore, typical cluster installation tool models from commercial edition vendors do not reflect the compliance standards needed in 2018.
Establishing a consistent way to trace organizational user IDs (UIDs) and group IDs (GIDs) early is also necessary in Linux environments. The unique application user and group IDs used by your cluster’s software need to fit into a matrix that spans the organization’s entire infrastructure, not just the cluster. Likewise, staff IDs need to be unique across the business. Best practice dictates multi-factor authentication challenges when admins log in and move from node to node, to prove they are a real person and not a stolen account and password pair.
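As a rough illustration of the UID/GID point, the sketch below compares passwd-format account listings gathered from several nodes and flags two problems: an account name that maps to different IDs on different nodes, and a UID that is shared by more than one account name. The node names, account names, and the function names `parse_passwd` and `find_id_conflicts` are hypothetical; a real environment would more likely check against a central directory service.

```python
from collections import defaultdict


def parse_passwd(text):
    """Return {account_name: (uid, gid)} from passwd-format text."""
    accounts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed entries
        accounts[fields[0]] = (int(fields[2]), int(fields[3]))
    return accounts


def find_id_conflicts(passwd_by_node):
    """Find names with differing IDs and UIDs shared by several names."""
    ids_by_name = defaultdict(set)
    names_by_uid = defaultdict(set)
    for text in passwd_by_node.values():
        for name, (uid, gid) in parse_passwd(text).items():
            ids_by_name[name].add((uid, gid))
            names_by_uid[uid].add(name)
    inconsistent = {n: sorted(i) for n, i in ids_by_name.items() if len(i) > 1}
    shared = {u: sorted(n) for u, n in names_by_uid.items() if len(n) > 1}
    return inconsistent, shared


# Hypothetical sample data standing in for /etc/passwd from two nodes.
sample = {
    "node01": (
        "root:x:0:0:root:/root:/bin/bash\n"
        "hdfs:x:1001:1001::/home/hdfs:/bin/bash\n"
        "alice:x:2001:2001::/home/alice:/bin/bash\n"
    ),
    "node02": (
        "root:x:0:0:root:/root:/bin/bash\n"
        "hdfs:x:1002:1001::/home/hdfs:/bin/bash\n"
        "bob:x:2001:2002::/home/bob:/bin/bash\n"
    ),
}

inconsistent, shared = find_id_conflicts(sample)
print("Names with differing IDs:", inconsistent)
print("UIDs shared by several names:", shared)
```

In this sample, the `hdfs` application account has a different UID on each node, and UID 2001 belongs to two different staff accounts. Either finding would make it harder to show an auditor who did what.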
2. Incorporate a SIEM platform for better reporting
Clusters can generate an overwhelming number of log files. An operating system, application, or cluster manager’s log viewer shows you only slices of a typical event. By incorporating a security information and event management (SIEM) platform, you gain a more efficient way to track correlated events. Sending all logs at all levels to your enterprise SIEM is safer and more complete, and report creation can then become another team’s responsibility.
Furthermore, ensuring your admin staff have unique account names and account IDs makes events simple to correlate across the network, operating system, and software layers. In fact, auditors and business data owners prefer that someone outside your Linux admin team mark the documentation.
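To see why unique account IDs matter for correlation, consider this minimal sketch. It assumes log events have already been normalized into records with a timestamp, a layer, and a numeric account ID. The field names, the sample events, and the function `correlate_by_account` are all illustrative and not taken from any particular SIEM product.

```python
from collections import defaultdict
from datetime import datetime


def correlate_by_account(events):
    """Group events by account ID and order each group by timestamp."""
    timeline = defaultdict(list)
    for event in events:
        timeline[event["uid"]].append(event)
    for uid in timeline:
        timeline[uid].sort(key=lambda e: datetime.fromisoformat(e["time"]))
    return dict(timeline)


# Hypothetical events from three layers, arriving out of order.
events = [
    {"time": "2018-03-01T10:05:00", "layer": "application", "uid": 2001,
     "action": "export customer table"},
    {"time": "2018-03-01T10:01:00", "layer": "network", "uid": 2001,
     "action": "ssh session opened"},
    {"time": "2018-03-01T10:02:00", "layer": "os", "uid": 2001,
     "action": "switch to application account"},
    {"time": "2018-03-01T10:03:00", "layer": "os", "uid": 2002,
     "action": "login"},
]

timeline = correlate_by_account(events)
for event in timeline[2001]:
    print(event["time"], event["layer"], event["action"])
```

Because UID 2001 identifies exactly one person, the network, operating system, and application events line up into a single readable sequence. If two staff members shared that ID, the same grouping would merge their activity.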
3. Make it easy for auditors
If it takes a team member more than two days to help auditors during an audit cycle, you’re not making it easy for them. That could result in more scrutiny. It’s better to focus on the “what actually happened” events from a SIEM report than to interrupt the operation of your data cluster.
A common challenge, however, is that clusters tuned to achieve 10- or 100-times data processing performance make it nearly impossible to get a time-based snapshot of customer data from 45 days ago.
Thankfully, most open source and commercial SIEM systems have interactive reporting capabilities, and there are credible third-party report tools. Auditor training on these reporting tools can take as little as one to two days, a significant audit cycle cost saving compared with attempting to train auditors in the full operation of your cluster. Just remember to have technical audit headcount with the appropriate admin and lifecycle experience.
4. Think beyond privacy
When planning for GDPR compliance, a focus on data privacy alone can be problematic. “User-less” big data software solutions are vulnerable to small teams of administration staff who can easily subvert the cluster’s technical platforms. We can learn from the financial services industry how to meet compliance requirements in Linux environments.
By implementing the best practices noted here, international banks have passed similar quarterly audit cycles with ease on UNIX and Linux platforms for three decades, and on data forecasting clusters for the last 15 years.
During your 2018 operational and GDPR planning, carefully consider how auditors will work their way through checklists. With potential fines of up to 4 percent of your company’s total worldwide annual turnover, your scope needs to be organization-wide.
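For scoping purposes, it can help to put a number on that exposure. GDPR Article 83 caps administrative fines at the higher of a fixed amount and a share of total worldwide annual turnover: €10 million or 2 percent for the lower tier of infringements, and €20 million or 4 percent for the upper tier. The sketch below computes that ceiling only. The function name `max_fine_eur` and the turnover figures are illustrative, and actual fines are set case by case by the supervisory authority.

```python
def max_fine_eur(annual_turnover_eur, upper_tier=True):
    """Return the Article 83 fine ceiling in euros for a given turnover."""
    if upper_tier:
        fixed_cap, percent = 20_000_000, 4
    else:
        fixed_cap, percent = 10_000_000, 2
    # Integer arithmetic avoids floating-point rounding on large figures.
    turnover_cap = annual_turnover_eur * percent // 100
    return max(fixed_cap, turnover_cap)


# Hypothetical turnover figures.
print(max_fine_eur(1_000_000_000))                    # 4% exceeds the fixed cap
print(max_fine_eur(100_000_000))                      # fixed cap applies
print(max_fine_eur(1_000_000_000, upper_tier=False))  # lower tier
```

For a company with €1 billion in turnover, the percentage sets the ceiling. For a smaller company, the fixed amount does, which is why the exposure is significant at any size.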