Dark data and other big data surprises

CIO Code

The image above shows how JPL employees used the popular tool GitHub to share open source code across the organization. One day JPL hopes to use big data visualizations like this one to incentivize specific code-sharing behaviors based on positive outcomes.

By Tom Soderstrom, IT Chief Technology Officer, Jet Propulsion Laboratory

Jet Propulsion Laboratory and NASA have been producers and users of big data for a long time. You could say it’s part of our DNA. Until recently, our big data strategy focused only on science and engineering, such as how we could predict hurricanes and earthquakes or find the most interesting rocks on Mars. These are complex, difficult problems, and they take time to solve. So this past summer we decided to put our big data to work on some problems closer to home. The whole initiative was even more successful than we had hoped.

To begin, we defined what we called a low-hanging-fruit big data strategy to rapidly prototype some ideas. First we hired an IT data scientist, Rob Witoff, and admittedly we were lucky there. Then we went around JPL and asked people, “What data-intensive problems do you wish you could solve?” Rob and I collected 70 end-user problems and narrowed them to seven we considered low-hanging ‘fruits’ (something one or two students could solve in a few months). Then we scoured the US for the best students in analytics, a new field focused on getting something actionable out of big data. We brought these students in for the summer, set them up in a start-up environment, and gave each of them one of our seven big data problems to crack.

Dark Data: It’s Everywhere You Look

One surprise from this project was how much of what we now call dark data we found. Just as the universe is mostly made up of dark matter and dark energy, whose function isn’t clear and which, at this point, we can only measure, dark data is there but has either been unknown or had its potential left untapped. We found that we could mine that data, trend it over years, and surface things we didn’t know. Data on meetings, such as time sheets and project codes that tell us how many meetings were held and which levels of employees attended them, is valuable dark data. So are help desk tickets; we have collected half a million of them over the years. So is outside data that we can combine with our internal data to gain new insights. That’s all dark data.
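As a rough illustration of what trending dark data over years can mean in practice, here is a minimal Python sketch that counts help desk tickets per year, both overall and for a single category. The record layout `(ticket_id, date_opened, category)`, the functions `tickets_per_year` and `category_trend`, and the sample records are all hypothetical; they are not JPL’s data or tooling.

```python
from collections import Counter
from datetime import date

# Hypothetical ticket records: (ticket_id, date_opened, category).
# Made-up values for illustration only; not real help desk data.
tickets = [
    (1, date(2011, 3, 14), "email"),
    (2, date(2011, 7, 2), "network"),
    (3, date(2012, 1, 9), "email"),
    (4, date(2012, 5, 21), "email"),
    (5, date(2012, 11, 30), "laptop"),
    (6, date(2013, 2, 17), "network"),
]


def tickets_per_year(records):
    """Count tickets opened in each calendar year, sorted by year."""
    counts = Counter(opened.year for _, opened, _ in records)
    return sorted(counts.items())


def category_trend(records, category):
    """Count tickets of one category per year, to see whether it grows or shrinks."""
    counts = Counter(opened.year for _, opened, cat in records if cat == category)
    return sorted(counts.items())


print(tickets_per_year(tickets))         # [(2011, 2), (2012, 3), (2013, 1)]
print(category_trend(tickets, "email"))  # [(2011, 1), (2012, 2)]
```

With hundreds of thousands of real tickets the same grouping would more likely run in a database or analytics tool, but the idea of bucketing old records by time is the same.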

Another surprise was the power of visual representations of big data. The visuals are what interested users and got them excited. They asked, “That’s interesting; why is there a spike there?” or “Why is there a dip there?” Those turned out to be the valuable questions.
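One simple way to surface the kind of spikes and dips that prompted those questions is to flag points that sit far from the mean of a series. The sketch below assumes a plain z-score rule with a hypothetical `threshold` of 2 standard deviations and made-up weekly counts; it is one common heuristic, not the method JPL used.

```python
import statistics


def flag_spikes_and_dips(series, threshold=2.0):
    """Return (index, value, label) for points more than `threshold`
    population standard deviations above ("spike") or below ("dip") the mean."""
    mean = statistics.mean(series)
    stdev = statistics.pstdev(series)
    if stdev == 0:
        return []  # a perfectly flat series has no spikes or dips
    flagged = []
    for i, value in enumerate(series):
        z = (value - mean) / stdev
        if z > threshold:
            flagged.append((i, value, "spike"))
        elif z < -threshold:
            flagged.append((i, value, "dip"))
    return flagged


# Hypothetical weekly counts (illustrative only); index 5 is an obvious outlier.
weekly = [40, 42, 38, 41, 39, 120, 40, 43]
print(flag_spikes_and_dips(weekly))  # [(5, 120, 'spike')]
```

A mean-and-standard-deviation rule is easy to explain, but large outliers also inflate the standard deviation they are measured against, so on short series a median-based rule may flag anomalies more reliably.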

A third surprise was the importance of not jumping the gun. Analytics really comes in three stages: descriptive, predictive, and prescriptive. Jumping too quickly to the prescriptive stage is a mistake. You have to bring in the users who really understand the data to get their takeaway.
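To make the difference between the first two stages concrete, the following sketch summarizes a series (descriptive) and then extrapolates one step ahead with an ordinary least-squares line (predictive). The function names and the yearly figures are hypothetical, and a straight-line fit is only the simplest possible predictive model. The prescriptive stage is deliberately left out, in keeping with the point that it needs the judgment of people who understand the data.

```python
def describe(series):
    """Descriptive stage: summarize what has already happened."""
    return {
        "n": len(series),
        "min": min(series),
        "max": max(series),
        "mean": sum(series) / len(series),
    }


def fit_line(series):
    """Ordinary least-squares line through the points (t, series[t]), t = 0..n-1."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return slope, intercept


def predict_next(series):
    """Predictive stage (simplest form): extrapolate the fitted line one step ahead."""
    slope, intercept = fit_line(series)
    return slope * len(series) + intercept


# Hypothetical yearly counts, for illustration only.
yearly = [100, 110, 120, 130]
print(describe(yearly))      # {'n': 4, 'min': 100, 'max': 130, 'mean': 115.0}
print(predict_next(yearly))  # 140.0
```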

Finally, we were surprised by the importance of tools from the outside world, especially those that help manage unstructured data. Tableau, for example, works very well in the cloud, and Splunk is great for real-time data. There are many others.

Let Them Do Analytics

So where does this lead JPL going forward? Toward a great deal of self-service analytics. In a few prototype projects, we gave users the ability to drill into and select whatever data they wanted to analyze. In HR, for example, you could look at the percentage of time a person had worked on a project, their starting year, where they came from, and where they went in the organization. Another example was JPL’s IT charge-back system, where users could drill in to see where they were spending money on IT and how that compared to their peers. It has gone all the way to giving members of the Curiosity Rover team the ability to run analytics comparing thermal, engineering, and electrical power data from the rover on Mars. Not surprisingly, our long-term big data goal is to enable self-service analytics across JPL within three years.
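The charge-back drill-down described above amounts to filtering and aggregating by whatever dimension the user picks, then setting the result beside a peer baseline. A minimal sketch, assuming hypothetical rows of `(organization, service, dollars)`, made-up organization names, and hypothetical helper functions `spend_by` and `compare_to_peers`, might look like this:

```python
from collections import defaultdict

# Hypothetical charge-back rows: (organization, service, dollars).
# Made-up values for illustration only.
charges = [
    ("Org A", "storage", 1200.0),
    ("Org A", "cloud", 800.0),
    ("Org B", "storage", 400.0),
    ("Org B", "cloud", 1600.0),
    ("Org C", "storage", 900.0),
    ("Org C", "cloud", 1100.0),
]


def spend_by(rows, org):
    """Drill into one organization's IT spending, broken down by service."""
    totals = defaultdict(float)
    for o, service, dollars in rows:
        if o == org:
            totals[service] += dollars
    return dict(totals)


def compare_to_peers(rows, org):
    """For each service the organization uses, return (its spend, peer average),
    where the peer average is taken over the other organizations using that service."""
    peer_totals = defaultdict(lambda: defaultdict(float))
    for o, service, dollars in rows:
        if o != org:
            peer_totals[service][o] += dollars
    result = {}
    for service, amount in spend_by(rows, org).items():
        peers = peer_totals.get(service)
        if peers:  # skip services no peer uses, since there is nothing to compare
            result[service] = (amount, sum(peers.values()) / len(peers))
    return result


print(spend_by(charges, "Org A"))          # {'storage': 1200.0, 'cloud': 800.0}
print(compare_to_peers(charges, "Org A"))  # {'storage': (1200.0, 650.0), 'cloud': (800.0, 1350.0)}
```

A real self-service tool would presumably let users choose both the grouping and the comparison group rather than hard-coding them as this sketch does.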


Tom Soderstrom is Chief Technology and Innovation Officer, Office of the CIO at the Jet Propulsion Laboratory (JPL) in Los Angeles, CA. JPL is the lead U.S. center for robotic exploration of the solar system and conducts major programs in space-based Earth sciences. JPL currently has several dozen spacecraft and instruments conducting active missions in and outside of our solar system.

Tom serves as Chief Technology and Innovation Officer in the Office of the CIO at NASA's Jet Propulsion Laboratory, where his mission is to identify and infuse new IT technologies into JPL's environment. He has led remote teams and large scale IT best practices development and change efforts in both small startup and large commercial companies, in international venues, and in the US Governm