Anomalies and False Narratives: Finding Truth in Big Data
By Kerry Pearce, SVP, Product Development, UberMedia
It’s rare for marketers to find clear answers in Big Data. But for inquisitive marketers, Big Data is incredibly helpful for identifying the right questions.
Recently, a fitness apparel brand sought to identify potential customers by analyzing mobile location data. We focused on gyms, public recreation areas, and stand-alone fitness locations like yoga and cycling studios. We found the audience, but we also found something unexpected: a large overlap with audiences that frequent fast-food establishments, none of which could be classified as healthy options. To the client, this looked like the kind of previously unseen and counterintuitive insight Big Data is famous for. Yes, the client reasoned, logic dictates that you’re unlikely to find a significant audience of gym rats who also have a serious french fry habit. But with so much data, how could that conclusion be wrong?
Big Data is really about picking the relevant “small” data
For all its emphasis on scale, Big Data is really about analyzing the small fraction of collected data that is relevant to the inquiry. Collecting data at that scale inevitably means collecting even more noise, which is why data scientists talk about “cleaning up” the data as a prerequisite to analysis. The idea isn’t to find the needle in the haystack, but rather to locate the relevant haystacks.
So is it possible that people who work out a lot also enjoy eating unhealthy food? Of course it’s possible, and we can even come up with some behavioral theories to explain why. Perhaps these gym rats work out to offset junk food. Or maybe the appeal is convenience, because people who work out are pressed for time. Neither of these theories is inherently wrong, but the further down the road we go with this particular data-driven narrative, the more susceptible we become to our own bias. Put simply, we want to believe we’ve discovered a new audience segment, and so we tell ourselves a story where Big Data unearthed a hidden clue. But is it truly a relevant data point? Put another way, is this an insight that will move the needle for a marketer?
When the data adds up, worry
One constant requirement of narrative is that the storyteller ties up loose ends. But that’s not how the real world works because the real world is messy. So if the data adds up to a tidy story – the audience segment for fitness apparel is huge because everyone is passionate about fitness! – it’s time to worry.
Too often, marketers seek out only data that confirms narratives they already believe to be true, or narratives they want to believe to be true. All humans are susceptible to this problem, by the way – it’s why we want to believe news reports about studies that tout the health benefits of drinking alcohol and eating dessert. But marketers can be especially prone to this type of bias because marketers are the guardians of a brand’s intangible qualities and values. They know their brands, which simultaneously makes them experts and the least likely people in the room to see the bias of their own assumptions. They believe everyone cares about fitness – whether or not people actually demonstrate that care – because everyone who works at a fitness brand is demonstrably passionate about working out. The question is not how to get rid of that bias – you can’t – but how marketers can use Big Data to seek out deeper truths that may upend what we think we know for sure.
Embrace the anomalies
The overlap between the fitness and fast-food audiences is an anomaly – one we must embrace. If taken at face value, the overlap seems to prove what we want to believe: everyone is passionate about fitness. But the same data can actually be used to tell just the opposite story. People who go to the gym and frequent fast-food establishments may not be passionate about fitness at all. True, they do exercise and so may require workout gear, but their level of enthusiasm for exercise might actually be diminished by their interest in fast food. Put another way, the anomaly didn’t enlarge the fitness audience; it made it smaller, because the deeper we dug into the data, the more nuance we found. And in that nuance we discovered a subset of the fitness segment that doesn’t really share the primary values associated with the overall segment.
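For readers who want to see the mechanics, the split described above can be sketched in a few lines of Python. This is only an illustration, not the analysis we ran for the client: the device IDs, the variable names, and the split_fitness_audience helper are all hypothetical, and a real analysis would work from cleaned location data rather than two hand-typed sets.

```python
# Hypothetical device IDs observed at each type of location.
gym_visitors = {"d1", "d2", "d3", "d4", "d5"}
fast_food_visitors = {"d3", "d4", "d5", "d6"}


def split_fitness_audience(gym, fast_food):
    """Split the gym audience into devices also seen at fast-food
    locations (the overlap) and devices that were not (the rest)."""
    overlap = gym & fast_food   # seen at both kinds of location
    core = gym - fast_food      # seen at gyms only
    return core, overlap


core, overlap = split_fitness_audience(gym_visitors, fast_food_visitors)

# Share of the gym audience that also frequents fast food.
overlap_share = len(overlap) / len(gym_visitors)

print("Gym only:", sorted(core))
print("Gym and fast food:", sorted(overlap))
print("Overlap share:", overlap_share)
```

The smaller "gym only" group is a candidate for the more passionate segment, not proof of it. Not being seen at a fast-food location is weak evidence on its own, so in practice this split would be one input among several.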
Of course, trading a large data set for a smaller, albeit more useful, one feels counterproductive because doing so forces us to abandon the story we told ourselves. In embracing the anomaly, we found hard evidence that the passion for fitness is not universal. But in exchange for letting go of that false narrative, we put ourselves in a better position to locate that passion. It’s not an easy trade, even if it is a good one. In fact, for marketers, letting go of stories that neatly summarize brands and customers is terrifying. But the alternative ought to be even more frightening, because a strategy based on a false narrative is doomed to fail.