The term “Big Data” is everywhere lately. You can’t toss a rock without hitting a company that has some kind of big data offering, and you’d be hard-pressed to think of another tech buzzword getting more hype. Everyone seems to have a big data solution, whether it’s hardware companies like HP, IBM, and EMC; software companies like Teradata, Cloudera, Hortonworks, and Microsoft; or even new hybrid or hosted solutions like GoodData.
With all this hype and buzz, it can be difficult for SMBs to get a handle on big data and how it can affect their business. At Business.com we’re working with big data to deliver more relevant content to our audience and to ensure that our customers can reach this audience with relevant messages at the right time in the marketing funnel. When all content and marketing is relevant to the audience, both the audience and the advertiser benefit. The promise of big data technology is to deliver these kinds of hard-won insights to any business, drawn from the mountain of data it generates.
So What Is Big Data?
There are plenty of opinions circulating about what big data is, and a quick Google search will turn up many of them. The simplest way to explain big data is to split it into two parts: the content and the technology.
The content of big data is extremely large, rapidly growing data sets, often made up of unstructured data (read: dissimilar and from many sources), that are too big to be managed and mined using traditional database technology. A data set can likely be considered big data if it meets some or all of IBM’s “Four Vs”:
- Volume – the sheer size of the data set: hundreds of millions to billions of records or more, more likely measured in terabytes (1,024 gigabytes) or petabytes (1,048,576 gigabytes)
- Velocity – the very high rate at which new data points are generated, often many times per second
- Variety – the multitude of related but unstructured data sources and types that make up the set to be analyzed
- Veracity – the messiness and uncertain quality of the data (typos, transcribed speech, hashtags, user-generated content)
Retail transactions, website activity, phone records, shipping GPS measurements, and DNA or other biological data are some common examples.
The technology grew out of two sources: a 2004 Google paper describing a technique called MapReduce, and Hadoop, the open-source implementation of that concept developed by Doug Cutting at Yahoo!. Using Hadoop (the most common big data processing framework), these massive data sets can be broken into chunks and processed in parallel across tens, hundreds, thousands, or more individual servers, or nodes.
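To make the split-and-process idea concrete, here is a minimal, single-machine sketch of the MapReduce pattern, using the classic word-count job. It assumes a worker thread can stand in for a cluster node; the function names (`map_chunk`, `shuffle`, `reduce_group`, `word_count`) are hypothetical and are not part of the Hadoop API. Real Hadoop distributes the chunks, the shuffle, and the reducers across many machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_chunk(lines: List[str]) -> List[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in this chunk."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle phase: group every emitted value by its key, across all chunks."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines: List[str], num_chunks: int = 4) -> Dict[str, int]:
    # Split the input into roughly equal chunks, standing in for the
    # blocks Hadoop would spread across nodes (ceiling division).
    size = max(1, -(-len(lines) // num_chunks))
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]

    # Hypothetical stand-in for a cluster: each thread maps one chunk.
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        mapped = list(pool.map(map_chunk, chunks))

    groups = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in groups.items())


if __name__ == "__main__":
    logs = ["big data is big", "data about data", "Hadoop splits big jobs"]
    print(word_count(logs, num_chunks=2))
```

Threads keep the example self-contained, but because of Python’s GIL they illustrate the structure of the job rather than true CPU parallelism. The point is that the map step never needs to see more than one chunk at a time, which is what lets the work scale out across nodes.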
Once all this processing power is harnessed to attack these massive data sets, we can generate meaningful insights from an unwieldy mound of data. For example, if you’re an online retailer using many different marketing channels, you likely want to know where you’re getting the best ROI. Most commonly, a sale or conversion is attributed to a marketing channel using either “first touch” or “last touch” methodology (which means exactly what you’d think: credit goes to the first or last channel a user interacted with). Big data technology lets us tackle this traditional marketing problem by taking every touchpoint into account. Where a user’s multiple interactions with your marketing might previously have been lost, we can now find actionable trends across that huge data set. Think about how powerful this is!
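To show how first-touch, last-touch, and multi-touch attribution differ, here is a small sketch. The journey data format (an ordered list of channel names per converting user) is a hypothetical assumption, and the “linear” model, which splits each sale’s credit equally across every touchpoint, is just one simple multi-touch scheme chosen for illustration; this post doesn’t prescribe a specific model. At real scale, this same aggregation would run as a distributed job rather than in memory.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical input: each converting user's touchpoints, in order,
# from first interaction to the last one before the sale.
Journey = List[str]

MODELS = ("first_touch", "last_touch", "linear")


def attribute(journeys: List[Journey], model: str = "linear") -> Dict[str, float]:
    """Sum the conversion credit each channel earns under the chosen model."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")

    credit: Dict[str, float] = defaultdict(float)
    for path in journeys:
        if not path:
            continue  # no touchpoints recorded, nothing to credit
        if model == "first_touch":
            credit[path[0]] += 1.0  # all credit to the first channel
        elif model == "last_touch":
            credit[path[-1]] += 1.0  # all credit to the last channel
        else:
            share = 1.0 / len(path)  # linear: equal split across touchpoints
            for channel in path:
                credit[channel] += share
    return dict(credit)


if __name__ == "__main__":
    journeys = [
        ["search", "email", "social"],
        ["search", "social"],
        ["display", "email"],
    ]
    for model in MODELS:
        print(model, attribute(journeys, model))
```

With this sample, first touch never credits email or social and last touch never credits search or display, while the linear model gives every channel a share. That gap is the kind of signal that gets lost when only one touchpoint per sale is counted.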
Does This Mean Data Warehouses (and Other Database Systems) Are Dead?
Not at all. While Hadoop and similar tools offer powerful data mining capabilities, you still need a way to store the structured or semi-structured data these processes generate; you wouldn’t want to repeat that hard data mining work over and over again. Traditional database and data warehouse solutions like Microsoft SQL Server (MSSQL), Oracle, MySQL, and Postgres will continue to be widely used to store the structured relational data generated by big data operations. The recent advent of NoSQL, schemaless, and unstructured databases presents even more interesting options for managing the crunched data sets produced by Hadoop and others. Bleeding-edge technologies like graph databases are yet another potential storage option for these data sets, depending on your use case. Many people think that big data technologies replace these traditional and less traditional databases, but that is simply not the case.
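Here is a minimal sketch of that “crunch once, store, query many times” pattern. It uses Python’s built-in SQLite as a stand-in for a relational store like the ones named above; the `channel_credit` table, its columns, and the `batch_id` key are hypothetical. The idea is that an expensive batch job writes its aggregated output once, and reports then query the small result table instead of rerunning the job.

```python
import sqlite3
from typing import Dict, List, Tuple


def store_results(conn: sqlite3.Connection, batch_id: str, credit: Dict[str, float]) -> None:
    """Persist one batch of crunched results (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS channel_credit (
               batch_id TEXT NOT NULL,
               channel  TEXT NOT NULL,
               credit   REAL NOT NULL,
               PRIMARY KEY (batch_id, channel)
           )"""
    )
    # INSERT OR REPLACE lets a rerun of the same batch overwrite stale rows.
    conn.executemany(
        "INSERT OR REPLACE INTO channel_credit (batch_id, channel, credit) VALUES (?, ?, ?)",
        [(batch_id, channel, value) for channel, value in credit.items()],
    )
    conn.commit()


def top_channels(conn: sqlite3.Connection, batch_id: str, limit: int = 3) -> List[Tuple[str, float]]:
    """Cheap query against stored results; no reprocessing of raw data."""
    cursor = conn.execute(
        "SELECT channel, credit FROM channel_credit WHERE batch_id = ? "
        "ORDER BY credit DESC LIMIT ?",
        (batch_id, limit),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_results(conn, "batch-1", {"search": 2.0, "display": 1.0, "email": 0.5})
    print(top_channels(conn, "batch-1", limit=2))
    conn.close()
```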
Big data technology has burst onto the scene, and it is definitely here to stay. While there is a lot of buzz and it can be confusing to comb through all the offerings, most companies have an opportunity to turn data they may not even realize they’re generating into meaningful insights.