
Impact | Listen and learn

Volume 21, Number 5
May 2015

Building a better tool box for mining consumer preferences.

Data mining has come a long way since the legendary discovery of the infamous “diaper-beer correlation,” which helped to launch a thriving data warehousing market long before “big data” was a household term.

For decades, industry vendors have talked about how data mining software uncovered a golden relationship between beer and diapers, like old-time prospectors discussing a mother lode during the Klondike gold rush. “It is part of the folklore of data processing,” Forbes noted in 1998, adding the story typically goes something like this: “A retail chain put all its checkout-counter data into a giant digital warehouse and set the disk drives spinning. Out popped a most unexpected correlation: sales of diapers and beer. Evidently, young fathers would make a late-night run to the store to pick up Pampers and get some Bud Light while they were there. Capitalizing on the discovery, the store placed the disparate items together. Sales zoomed.”
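The folklore above describes what is now usually called market basket analysis, or association rule mining. The minimal sketch below, run on a handful of invented baskets (hypothetical data, not any retailer's records), shows how the support, confidence, and lift of a pairing such as diapers and beer could be computed. Lift above 1 means two items appear together more often than they would if purchases were independent. The function name `pair_rules` and the `min_support` threshold are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout baskets, invented purely for illustration.
BASKETS = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"diapers", "beer", "wipes"},
    {"milk", "bread"},
]


def pair_rules(baskets, min_support=0.2):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs.

    support    = share of baskets containing both items
    confidence = P(consequent | antecedent)
    lift       = confidence / P(consequent)
    """
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # too rare to be worth reporting
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


if __name__ == "__main__":
    for rule in pair_rules(BASKETS):
        print(rule)
```

In this toy data the diapers-to-beer rule has a lift of 1.125, weaker than chips and beer (1.5), a reminder that the headline pairing need not be the strongest association in a data set.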

Unfortunately, like the early hype around data mining, the diaper-beer legend wasn’t exactly accurate. Truth be told, many organizations that jumped on the data mining bandwagon in the 1990s saw no return on investment whatsoever. The estimated failure rate for early adopters was 70 per cent. Technical problems included underestimating the difficulty of cleaning data, not to mention the need to link multiple silos of enterprise data into one warehousing system. On the design front, end users were often not consulted, so the resulting systems didn’t always meet business needs. Simply put, instead of mining internally generated data for profitable insights into customer behaviour, many companies found themselves with sophisticated multimillion-dollar software applications that marketing staff left to gather dust.

Since the start of the 21st century, of course, digital data has grown exponentially. As a result, the potential payoff of data mining has cracked through the ceilings of corporate data farms and soared into the so-called cloud. Data sets of previously unimaginable value are now readily accessible to mining applications seeking to uncover patterns, trends, and associations in consumer preferences and behaviour. And many more marketing professionals can now see that potential, because much of the data waiting to be mined is provided by consumers themselves.

“Social media can tell companies how to improve their product offerings,” says data mining expert Xin (Shane) Wang, an Assistant Professor of Marketing at the Ivey Business School. “All businesses need to do is figure out how to listen effectively. And that is what my research aims to help them do.”

Prior to pursuing a PhD in Marketing at the University of Cincinnati, Wang studied mathematics and worked as a statistician in Washington, D.C. He was drawn into marketing by an interest in cognitive psychology and behavioural economics, along with a desire to use his programming and data analysis skills to help companies derive actionable intelligence from user-generated content (UGC). Like manufacturer-provided content, UGC is readily available in the big data era. But its large volume and unstructured nature create notable challenges for academics and practitioners. Wang’s research has helped overcome these difficulties by contributing to the development of an innovative method of mining UGC that advances and automates market structure analysis. The method integrates sentiment analysis techniques with Latent Dirichlet Allocation (LDA), a probabilistic topic model that identifies latent topics (groups of words that tend to occur together) in large collections of unlabelled text, and it can be used to mine consumer-generated information such as online product reviews.
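To make the combination of topic modelling and sentiment analysis concrete, here is a minimal sketch, not Wang’s actual method. It fits LDA with collapsed Gibbs sampling (one standard way to estimate LDA) and then scores each topic by the lexicon sentiment of the reviews that load on it. The toy reviews, the tiny `POSITIVE`/`NEGATIVE` lexicons, the hyperparameters, and the helper names are all hypothetical assumptions.

```python
import random
from collections import defaultdict

# Hypothetical toy lexicon (an illustrative assumption, not the study's lexicon).
POSITIVE = {"great", "sharp", "bright", "long", "fast"}
NEGATIVE = {"dim", "glare", "short", "slow", "laggy"}

# Hypothetical tokenized-by-whitespace reviews.
REVIEWS = [
    "sharp bright screen great display",
    "display dim screen glare",
    "battery lasts long great charge",
    "battery short charge slow",
    "screen display sharp",
    "battery charge long",
]


def lda_gibbs(docs, n_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling.

    Returns (ndk, nkw, z): per-document topic counts, per-topic word counts,
    and the topic currently assigned to every token.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    ndk = [[0] * n_topics for _ in docs]
    nkw = [defaultdict(int) for _ in range(n_topics)]
    nk = [0] * n_topics
    z = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1  # remove token
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1  # add token back
    return ndk, nkw, z


def top_words(nkw, k, n=3):
    """Most frequent words currently assigned to topic k."""
    ranked = sorted(nkw[k].items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, c in ranked if c > 0][:n]


def topic_sentiment(docs, ndk):
    """Mean lexicon sentiment of reviews, weighted by each review's topic share."""
    n_topics = len(ndk[0])
    num, den = [0.0] * n_topics, [0.0] * n_topics
    for doc, counts in zip(docs, ndk):
        pos = sum(w in POSITIVE for w in doc)
        neg = sum(w in NEGATIVE for w in doc)
        score = (pos - neg) / len(doc)  # in [-1, 1]
        total = sum(counts)
        for k in range(n_topics):
            share = counts[k] / total
            num[k] += share * score
            den[k] += share
    return [n / d if d else 0.0 for n, d in zip(num, den)]


if __name__ == "__main__":
    docs = [r.split() for r in REVIEWS]
    ndk, nkw, _ = lda_gibbs(docs, n_topics=2)
    for k, score in enumerate(topic_sentiment(docs, ndk)):
        print(k, top_words(nkw, k), round(score, 2))
```

On a corpus this small the topics are unstable; the point is the pipeline shape: topics stand in for product attributes, and sentiment attached to each topic indicates how consumers feel about that attribute.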

“Unlike transaction records collected from legacy systems, consumer-generated product reviews contain rich insights into behavioural information and product attributes that matter most to consumers,” noted Wang. “In other words, although researchers must take steps to avoid bogus reviews, firms can now actually learn what customers consider the strengths and weaknesses of their products and competitor products—right from the horse’s mouth—without conducting costly surveys or focus groups.”
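Learning the strengths and weaknesses of one’s own and competitors’ products from reviews amounts to aggregating sentiment by product and attribute. The sketch below is a deliberately simple, hypothetical version of that idea: the keyword lists, the lexicon, and the sentence-level scoring rule are assumptions for illustration, not the approach used in Wang’s research.

```python
import re
from collections import defaultdict

# Hypothetical attribute keywords and sentiment lexicon (illustrative assumptions).
ATTRIBUTES = {"screen": {"screen", "display"}, "battery": {"battery", "charge"}}
POSITIVE = {"great", "sharp", "long", "fast"}
NEGATIVE = {"dim", "short", "slow", "laggy"}


def attribute_scores(reviews):
    """Average sentence-level sentiment per (product, attribute).

    reviews: list of (product, text) pairs. A sentence that mentions an
    attribute keyword scores +1 per distinct positive word and -1 per
    distinct negative word it contains.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for product, text in reviews:
        for sentence in re.split(r"[.!?]", text.lower()):
            words = set(re.findall(r"[a-z]+", sentence))
            score = len(words & POSITIVE) - len(words & NEGATIVE)
            for attr, keys in ATTRIBUTES.items():
                if words & keys:
                    totals[product][attr] += score
                    counts[product][attr] += 1
    return {p: {a: totals[p][a] / counts[p][a] for a in counts[p]} for p in counts}


if __name__ == "__main__":
    sample = [("A", "Great sharp screen. Battery is short."),
              ("A", "Display is dim."),
              ("B", "Battery lasts long.")]
    print(attribute_scores(sample))
```

Reading the output side by side, a negative mean for one brand’s attribute next to a positive mean for a rival’s flags a relative weakness without running a survey.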

Wang’s research contributed to the development of a prototype system that was deployed to analyze online tablet computer reviews. The results indicated that the proposed data mining method achieves high validity compared with existing market structure analysis methods. The tablet computer data set, which contains information on product characteristics and market dynamics gleaned from more than 40,000 product reviews spanning 24 weeks, is available online to benefit the practice of marketing. It is currently being used by other marketing researchers to help accelerate the drive toward big data analytics. Although this data set relates to tablet computers and was built through data collection, cleaning, filtering, and integration processes rather than simple web crawling, the proposed data collection method can be used to obtain data about other product categories.
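The difference between a curated data set and a raw crawl lies in steps like those below. This is a hypothetical cleaning pass, not the procedure used to build the tablet data set: the `Review` record, the `catalog` dictionary, the study-window check, the five-word minimum, and the duplicate rule are all assumptions chosen to illustrate cleaning, filtering, and integration.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Review:
    product_id: str
    posted: date
    text: str


def normalize(text: str) -> str:
    """Lower-case, strip leftover HTML tags and punctuation, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def clean_reviews(raw, catalog, start, end, min_words=5):
    """Hypothetical cleaning pass: filter, normalize, de-duplicate, integrate.

    raw        -- iterable of Review records as crawled
    catalog    -- dict mapping product_id -> product attributes (e.g. brand)
    start, end -- inclusive date window for the study period
    """
    seen = set()
    cleaned = []
    for r in raw:
        if not (start <= r.posted <= end):
            continue  # outside the study window
        if r.product_id not in catalog:
            continue  # cannot be linked to product data
        text = normalize(r.text)
        if len(text.split()) < min_words:
            continue  # too short to carry attribute information
        key = (r.product_id, text)
        if key in seen:
            continue  # verbatim duplicate: possible spam or re-post
        seen.add(key)
        # Integration: merge the review with the product's catalog attributes.
        cleaned.append({**catalog[r.product_id], "product_id": r.product_id,
                        "posted": r.posted, "text": text})
    return cleaned
```

Exact-duplicate removal is only a crude first defence against bogus reviews; real pipelines typically add near-duplicate detection and reviewer-level signals.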

Wang’s interest in data mining goes well beyond using consumer-generated product reviews to automate market structure analyses. Along with a group of like-minded academics, Wang recently conducted a historical analysis of the Journal of Consumer Research (JCR), using text mining and citation analysis to identify trends in past research, as well as the JCR articles that have had the most impact since the journal’s inception decades ago. Using these results, the group was able to project future research trends.
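Both text mining for trends and citation analysis can start from simple counts. The hypothetical helpers below track the share of articles per year whose abstract contains a term, fit an ordinary least-squares line to project that share forward, and rank articles by how often they are cited. They are illustrative assumptions, not the method of the JCR study.

```python
from collections import Counter


def term_share_by_year(articles, term):
    """articles: list of (year, abstract). Fraction of each year's articles using `term`."""
    totals, hits = Counter(), Counter()
    for year, abstract in articles:
        totals[year] += 1
        if term in abstract.lower().split():
            hits[year] += 1
    return {y: hits[y] / totals[y] for y in sorted(totals)}


def linear_forecast(series, future_year):
    """Least-squares line through (year, share) points, evaluated at future_year."""
    xs, ys = list(series), list(series.values())
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my + slope * (future_year - mx)


def most_cited(citations, top=3):
    """citations: list of (citing_id, cited_id) pairs; returns the most-cited ids."""
    return Counter(cited for _, cited in citations).most_common(top)
```

A straight-line projection can overshoot; on a term whose share rose from 0 to 1 over three years it predicts a share above 1 the following year. Such forecasts hold only as long as one is willing to extrapolate.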

So where is the field of consumer research heading?

According to Wang, if you are willing to extrapolate from past trends, a safe bet for the next big thing in marketing is research on social identities. This area of study aims to help marketers better understand how subtle shifts in social context can subconsciously influence consumer behaviour and dramatically change how open an individual is to specific messages at any given moment. And that could be the key to understanding why beer and diaper sales don’t always go hand in hand.

