Stop that truck! – a conversation on big data with Paul Bower
As Senior Vice President and General Manager of Advanced Business Intelligence Solutions at Ventyx, Paul Bower leads the company’s analytics business and is the resident “big data” practitioner. I had a chance recently to speak with Paul and get a preview of what he’ll be presenting this week at Ventyx World.
Q: “Big data” is a popular buzzword at the moment, but what does it really mean?
A: Broadly speaking, it’s the combination of having large amounts of data available and the ability to do something with it. Data might come from operational systems, enterprise systems or even third-party sources. The real challenge is in understanding how to harness these various sources to extract useful information.
Q: So how do companies go about doing that?
A: There are four key elements, which we in the industry call the “four V’s” of big data, that must be understood for any big data project to work: volume, velocity, variety and veracity. The first three refer to the fact that more data is being produced, more quickly, from more sources. Veracity is the issue of data integrity. This makes implementing a master data approach critical, so that you have one repository of timely, accurate data you can draw on from any given application.
The industries we’re primarily working with are seeing a sharp rise in data volumes, and new data streams are being added all the time. There is also a large body of historical data that needs to be mined.
Q: How are industrial businesses using big data today?
A: Taking the power industry as an example, one of our focus areas has been meter analysis. Advanced Metering Infrastructure (AMI) systems generate lots of useful data, but meters are fragile, relatively speaking. Problems can come up, but they’re not necessarily things that need field work. In fact, we estimate that close to half of all errors reported by meters are false positives, and we’ve worked with meter vendors to develop models and tools that can bring that rate down to around 15%. When you consider that it costs a utility around $150 to send a field crew out to check on a meter, the cost implications are significant.
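To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. Only the figures from the answer above (about 50% and about 15% false positives, about $150 per field visit) come from the interview. The count of 10,000 reported errors, the function name and the assumption that every reported error triggers a field visit are hypothetical and chosen purely for illustration.

```python
def wasted_dispatch_cost(reported_errors: int,
                         false_positive_rate: float,
                         cost_per_dispatch: float = 150.0) -> float:
    """Cost of field visits spent on meter errors that were false positives.

    Assumes (hypothetically) that every reported error leads to one dispatch.
    """
    return reported_errors * false_positive_rate * cost_per_dispatch


# Hypothetical volume of reported meter errors; not a figure from the interview.
reported = 10_000

before = wasted_dispatch_cost(reported, 0.50)  # "close to half" are false positives
after = wasted_dispatch_cost(reported, 0.15)   # rate reduced to "around 15%"
savings = before - after

print(f"Wasted cost before: ${before:,.0f}")
print(f"Wasted cost after:  ${after:,.0f}")
print(f"Avoided cost:       ${savings:,.0f}")
```

Under these assumptions, the avoided cost scales linearly with the number of reported errors, so the result is easy to rescale to a utility's actual error volume.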
Utilities are also very interested in storm response and recovery. They want to apply lessons learned from previous events to future ones. We’re looking now at how we can incorporate many different data streams to provide a more nuanced picture of what’s going on as well as to model and predict the impact of those storms. That would obviously include data from monitoring and control systems but also maintenance histories, weather data, customer inquiries and so on. One major challenge with this is that not all of these sources produce structured data.
Q: What is “structured” vs. “unstructured” data?
A: Structured data is what most of us would think of as being “data”: things that fit neatly into a database, like numbers and codes. Everything else falls into two groups: semi-structured data, such as text and emails, and fully unstructured data, such as images, video and voice. These could provide a lot of information, but they have to be interpreted, and we’re working on tools to address this. For example, we have developed a method for extracting text from unstructured PDF documents and converting it to structured values in our Asset Health solutions.
Going forward, we will see things like images and video incorporated as well. With the explosion of handheld smart devices, people are starting to see images and video as a standard form of data. The objective is to provide as much context as possible around the business processes that we are improving with our solutions.
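As a rough illustration of what "converting text to structured values" can look like, here is a minimal Python sketch. It is not the method used in the Asset Health solutions, which the interview does not describe. It assumes the text has already been pulled out of the PDF by some other tool, and the sample report, field names and `extract_fields` helper are all hypothetical.

```python
import re

# Matches lines of the form "Name: <number> <optional unit>".
FIELD_PATTERN = re.compile(
    r"^[ \t]*(?P<name>[A-Za-z][A-Za-z ]*?)[ \t]*:[ \t]*"
    r"(?P<value>-?\d+(?:\.\d+)?)[ \t]*(?P<unit>[A-Za-z%]*)[ \t]*$",
    re.MULTILINE,
)


def extract_fields(text: str) -> list[dict]:
    """Turn 'Name: value unit' lines into structured records; skip free text."""
    records = []
    for match in FIELD_PATTERN.finditer(text):
        records.append({
            "name": match.group("name").strip().lower().replace(" ", "_"),
            "value": float(match.group("value")),
            "unit": match.group("unit") or None,
        })
    return records


# Hypothetical text, as it might look after extraction from a PDF report.
sample = """Transformer inspection report
Oil temperature: 78.5 C
Load: 92 %
Notes: slight discoloration on bushing
"""

records = extract_fields(sample)
for record in records:
    print(record)
```

Lines that carry a numeric reading become records that fit a database table, while free-text lines such as the notes are left out and would need a different kind of interpretation.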
Q: What is the advantage of using big data with Business Analytics Intelligence?
A: Previously, business intelligence (BI) solutions focused mostly on internal structured data and processed that information in regularly occurring cycles. Big data expands your view of the enterprise by increasing the range and variety of data that can be analyzed so that you have additional context and insight to enable better decision making. In addition, big data scales in a predictable and straightforward way, both in size and speed, so that BI or more powerful Business Analytics solutions can grow with your business.
Speed is also important. With decreased time to actionable results, big data can provide an advantage by adding a real-time analytics capability that can enable your personnel to be more responsive in day-to-day situations.