Machine Learning and Unstructured Data: The New Peanut Butter on Toast

Businesses typically operate on structured data. In the Internet era, however, most of the data available to businesses is unstructured: text from sources such as webpages and user forums, as well as audio and video. Most organizations have mounds of this unstructured data, and they often struggle to make sense of it or to generate useful, actionable insights from it. To create any sort of intelligence from unstructured data, analysts have to manipulate it in some way, and that is where machine learning really helps. Individually, unstructured data and machine learning can be academically bland, but used together, they provide a rich smorgasbord of insight for strategic marketing.

Machine Learning Today

Although machine learning can be valuable on any type of data, structured or unstructured, unstructured data is virtually useless without it. Most companies have more data than a human could consume in 10 lifetimes, so the only practical way to make sense of it is to apply machine-learning algorithms, such as Natural Language Processing (NLP), text-mining, and pattern-recognition or classification algorithms. Search engines and recommendation systems are prime examples of applied machine learning.

The dramatic drop in the cost of computing power and hard-drive storage over the last 10 years has made it possible to analyze data on a scale never before imagined. With all this computing power now available, the bottleneck has shifted to a shortage of people skilled at working with the data. The volume of Big Data has grown rapidly over the past decade, while academic programs that train professionals for this work have only recently been established, creating a large gap between the supply of and demand for that talent.

Why Do This if It Is So Difficult?

The short answer is the Internet — a massive source of unstructured data with huge potential for providing actionable insights.

I recently worked with a publisher who wanted to know what types of content were most popular with their viewers. They had a ton of articles, representing an enormous amount of unstructured data. Thanks to Adobe Analytics, they already knew how many people were viewing their content, but they hoped to learn specifically which parts of the content were resonating with viewers. Was the main topic driving viewership, or was it something else?

I worked with this publisher to apply text-mining algorithms to every article. These algorithms classified the content by the major subtopics each article contained. By combining these classifications with the viewership data they had already recorded, the publisher was able to identify the topics that most often drew viewers to their content.
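The article does not say which text-mining algorithms were used. As a rough illustration only, the Python sketch below substitutes a deliberately simple keyword-lexicon tagger for them; the topic names, the `TOPIC_KEYWORDS` lexicon, the helper functions, and the sample articles and view counts are all hypothetical. It tags each article with every subtopic it mentions and then sums the recorded page views per topic, which is the kind of join between classified content and existing viewership data described above.

```python
import re
from collections import Counter

# Hypothetical topic lexicon standing in for a trained text-mining model.
TOPIC_KEYWORDS = {
    "travel": {"flight", "hotel", "beach", "itinerary"},
    "finance": {"budget", "savings", "invest", "mortgage"},
    "food": {"recipe", "dinner", "bake", "restaurant"},
}


def tag_subtopics(text, min_hits=1):
    """Return every topic whose keywords occur at least min_hits times in text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        topic
        for topic, words in TOPIC_KEYWORDS.items()
        if sum(counts[w] for w in words) >= min_hits
    }


def views_by_topic(articles):
    """Sum page views per subtopic; a multi-topic article credits each topic."""
    totals = Counter()
    for text, views in articles:
        for topic in tag_subtopics(text):
            totals[topic] += views
    return totals


# Hypothetical (article text, recorded page views) pairs.
articles = [
    ("Budget travel: book a cheap flight and hotel", 1200),
    ("Ten dinner recipes you can bake tonight", 800),
    ("How to invest your savings", 500),
]

totals = views_by_topic(articles)
print(totals.most_common())  # [('finance', 1700), ('travel', 1200), ('food', 800)]
```

Because an article tagged with several subtopics credits its views to each one, the per-topic totals can add up to more than the total number of views. `totals.most_common(1)` then gives the single most-viewed topic. A production system would more likely use a trained classifier or topic model than a hand-written lexicon.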

Simply knowing the popular topics was not enough, however; next, they wanted to know which single topic was the most popular across all of their content.

Fortunately, because they were able to mine this rich unstructured data, they could answer those questions without conducting a market survey of their audience. Businesses typically already have most of the data they need at their disposal; they just have to apply the right machine-learning techniques. With the help of a data scientist, a content creator can go into the data, ask, “What topics will really resonate with millennials, so more of them will visit our website?” and find the answer. This was impossible 10 years ago.

This works for both content creators and marketers, helping them focus their messages tightly and personalize them for specific audiences.

Other Benefits

Beyond content production, businesses can apply these techniques to other tasks, such as setting prices, predicting demand, and managing inventory. Keyword advertising is one area where we see disruptive technology using unstructured data. Adobe Media Optimizer performs in-depth marketing analysis of search keywords, display advertisements, and social postings from people across the Internet. That information is then used for automated, real-time bidding on ads and ad posts.

Many businesses pay top dollar for their branded search keywords. Machine-learning algorithms can help search engine marketing (SEM) marketers uncover ways to reduce spending on expensive branded keywords and invest instead in long-tail keywords, for a higher ROI than they could otherwise achieve.

A human could never examine all of this information and find specific keywords far out in the long tail. Machine learning can not only find these keywords but also predict which of them will drive conversions, enabling SEM marketers to justify increased spending to C-suite executives.
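One way to picture this long-tail analysis is to estimate each keyword's conversion rate and rank keywords by expected conversions per dollar of cost-per-click. The Python sketch below is a hypothetical illustration, not how Adobe Media Optimizer works: the keyword names and figures are made up, and the "prediction" is simply a historical conversion rate shrunk toward an assumed prior (a 2% rate weighted as 50 pseudo-clicks), so that keywords with only a handful of clicks are not ranked on noise alone.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    text: str
    clicks: int
    conversions: int
    cost_per_click: float  # average cost per click, in dollars


def smoothed_cvr(kw, prior_rate=0.02, prior_weight=50):
    """Estimate the conversion rate, shrunk toward prior_rate.

    prior_weight acts like a number of pseudo-clicks that converted at
    prior_rate, so keywords with few real clicks stay close to the prior.
    """
    return (kw.conversions + prior_rate * prior_weight) / (kw.clicks + prior_weight)


def conversions_per_dollar(kw, **prior):
    """Expected conversions per dollar spent on clicks for this keyword."""
    return smoothed_cvr(kw, **prior) / kw.cost_per_click


def rank_keywords(keywords, **prior):
    """Sort keywords from best to worst expected conversions per dollar."""
    return sorted(keywords, key=lambda kw: conversions_per_dollar(kw, **prior), reverse=True)


# Hypothetical keyword statistics, for illustration only.
keywords = [
    Keyword("acme shoes", clicks=10_000, conversions=400, cost_per_click=2.50),  # branded
    Keyword("shoes", clicks=5_000, conversions=50, cost_per_click=1.20),  # generic
    Keyword("waterproof trail running shoes wide fit", clicks=120, conversions=6, cost_per_click=0.40),  # long tail
    Keyword("vegan leather hiking boots", clicks=3, conversions=1, cost_per_click=0.90),  # long tail, sparse
]

for kw in rank_keywords(keywords):
    print(f"{kw.text}: {conversions_per_dollar(kw):.4f} conversions per dollar")
```

With `prior_weight=0` the estimate falls back to the raw conversion rate, and the three-click keyword jumps to first place on the strength of a single conversion. The shrinkage keeps it below the better-evidenced long-tail keyword, while both long-tail keywords still outrank the expensive branded and generic terms.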

Looking Ahead

New approaches are being developed based on foundational research that has only recently emerged. Many machine-learning groups in academia, information technology, and marketing are approaching this problem from different angles.

At Adobe, we are trying to make this type of analysis more accessible to data citizens, rather than catering exclusively to data scientists. No one should need a PhD to answer questions like “What is the most popular topic on our website?” or “Which search keywords should I bid on?”

Today, many of our customers are doing well in this area, and our strategy is to make it easier for even more companies. We will continue to make machine learning and data more accessible, enabling companies to act on the data they collect and to find insights that help them in marketing and beyond.