Diamond mining is an obsessive pursuit of quality that drives companies to literally move mountains: up to 1,750 tonnes of earth for every carat of gem-quality diamond.
The analogy is particularly relevant in the context of our obsession with big data.
In our pursuit of the perfect correlation, we’ve become obsessed with collecting as much data as possible, fulfilling the “big” in big data but losing perspective on how best to retrieve that one carat of flawless data that validates the investment.
This is not an argument against more data. On the contrary, a business needs to strip-mine huge swaths of real estate to find a data gem. Rather, it is a reminder that big data can be flawed, irrelevant and even uninterpretable, requiring both computing power and human intuition to identify the correlations that underpin a project’s value.

The model above illustrates that big data is really a mix of data of varying quality and utility. Found within all this ‘noise’ is a much smaller ‘information’ section, and within that, relevant information containing pertinent insights. It’s a fraction of a fraction.
The sheer volume of data originates from five broad categories: stated, behavioral, social, psychological and genetic. There are implicit and explicit types in several categories to be sure, but broadly speaking, a big data effort should attempt to harness each source to find the necessary correlations or predictors. If it does not, the ‘mining’ process falls short and is caught in a race against time to exploit the insights before their relevance fades.

This is why grading data, and subsequent insights, is central to the big data story.
A high-grade data ‘diamond’ will draw on high-quality anchors: explicit stated data such as CRM records (demographics, postcode, income strata), explicit social data, and potentially psychological (character traits, mindset) and genetic indicators. By comparison, lower-grade correlations will be dominated by behavioral data and by implicit signals for stated and social data, such as search terms and sentiment.
With this in mind, and in the context of real-time bidding (RTB) in Australia right now, how would you rate the quality of the data applied to the bidding process? You’d probably rate it low to medium grade, with 80 percent of the data at its most effective within 48 hours of its timestamp.
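To make the grading idea concrete, here is a minimal sketch of how signals might be scored by source and freshness. Everything in it is an assumption for illustration: the Signal type, the weights, the grade thresholds, and the choice to discard signals older than the 48-hour window mentioned above are hypothetical, not an established grading scheme.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: signals older than this are ignored (borrowed from the 48-hour figure).
FRESH_WINDOW = timedelta(hours=48)


@dataclass
class Signal:
    category: str      # "stated", "behavioral", "social", "psychological" or "genetic"
    explicit: bool     # True for explicit data, False for implicit signals
    timestamp: datetime


def weight(signal: Signal) -> int:
    # Assumption: explicit, non-behavioral signals act as high-quality anchors.
    if signal.explicit and signal.category != "behavioral":
        return 3
    return 1


def grade(signals: list[Signal], now: datetime) -> str:
    # Keep only signals that are still inside the freshness window.
    fresh = [s for s in signals if now - s.timestamp <= FRESH_WINDOW]
    if not fresh:
        return "low"
    score = sum(weight(s) for s in fresh) / len(fresh)
    # Hypothetical thresholds on the average weight.
    if score >= 2.5:
        return "high"
    if score >= 1.5:
        return "medium"
    return "low"
```

Under these assumptions, a profile built only from fresh search behavior grades as low, while one anchored in fresh CRM data grades as high. A mix of the two lands in the middle, which is roughly where the RTB data described above would sit.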
Time to dig a little deeper, and a little smarter.










