How Adobe Research Is Using Data Science Workspace to Deliver Customer Insights and Predictions in Innovative Ways at Scale

by Seth Reilly

posted on 10-17-2019

In today’s market landscape, data is key to success and survival; the better you know and understand your customers, the better you can make use of personalization tools to provide them with the best digital experiences. If you’re not able to analyze and act on the data you have, at scale, then you’re at risk of missing out on key opportunities. Unfortunately, this is often the case when dealing with large data sets coming from many different places.

Adobe Experience Platform allows you to centralize and standardize how you collect and maintain customer data from many sources. Data Science Workspace sits on top of Adobe Experience Platform and accelerates your ability to drive innovation through this data by enabling enterprises to create, train, and tune machine learning models quickly. This includes leveraging Adobe Sensei, our artificial intelligence and machine learning technology, in the form of pre-built “recipes” or model specifications, as well as automating and streamlining data processing pipelines and model training and re-training workflows. You also control cost through shared, auto-scaling compute resources and a flexible pricing model that scales with project demands.

Data Science Workspace was officially launched as a product available to our customers in June, but at Adobe, we believe that it’s important to be users as well as developers of our own products. Therefore, along with several other early users, Adobe Research had over 100 of its summer interns make full use of it during the course of their projects and found that it led to significant gains in efficiency and time savings.

Hierarchical relationship between Recipes, Models, Training Runs, and Scoring Runs.

“Having full access to your data and compute is the most important thing,” said Tom Jacobs, director of the Systems Technology Lab at Adobe Research. “The number one goal of Data Science Workspace is to get customer data from Adobe Analytics or other sources into the data lake, and then just use it whenever you want. This keeps us from having to download it to different compute clusters multiple times.”

Data Science Workspace is being tried and tested, and it has already produced some great results for our customers. Our teams of interns in California and Bangalore created several interesting projects. In this article, we’ll dive into how the Adobe Research intern projects show Data Science Workspace making it easier to collect and analyze data to meet business needs.

How Adobe Research interns used Data Science Workspace to create new ML models efficiently

Data Science Workspace, as a single platform to create, deliver, manage, and monitor AI-powered services, has helped to transform how Adobe works with its interns. “We know the pain of having to copy data and moving it onto hard drives, from place to place, cloud to cloud. Getting it all in one place, with lots of compute, in a reasonable, industry-standard framework, really is a fantastic outcome,” said Tom. Data Science Workspace allows teams to considerably shorten the time from data to insights by streamlining the entire data science workflow, from gathering data to authoring and re-training models to deploying intelligent services.

A diagram of Data Science Workspace infrastructure.

This has meant intern projects have been more collaborative and more useful than ever before. “In past years, with the interns, we’d be writing our own scripts to analyze the customer data we get. Now, with Data Science Workspace, we can directly use the SQL Query service and with Apache Spark, which is a distributed system that can work on a number of files at a time, we’ve been able to reduce compute time,” added Shiv Kumar, the Research Lead for Customer Intelligence Research at Adobe, who has been mentoring ML research interns for over 5 years. In addition, in some cases, interns could quickly experiment with using state-of-the-art ‘recipes’ to apply novel solutions to common customer problems. Finally, the process of retraining and re-testing models with different data for different use cases is more automated than ever.

Having all these tools at their disposal, Adobe’s research interns were able to work on a number of dynamic projects, including several using massive datasets from one of our mass retailer customers. Here’s a quick look at some of what they were able to accomplish:

Cold Start Analytics

An effective way to forecast SKU-level sales for new products.

We took data on thousands of products from this mass retailer and tackled a key research problem: while our customers have a range of time-series tools at their disposal to forecast sales for their established products, what happens when they want to forecast sales for products that have just launched and don’t have historical data to work with?

For this project, the interns were able to use a relatively new deep learning approach via Data Science Workspace. Moreover, we’ll be able to take the code that was employed and reuse it for other product launch forecasts or for other use cases, such as forecasting the readership of newly launched news articles.

Disentangling click-through data

An effective way to determine the cause-effect relationships related to click-throughs (for example, when a business’s potential customer clicks an email offer).

A search ad click involves three separate events: a customer deciding to search, a firm deciding to show a search ad to the customer, and the customer deciding to click. The firm will be interested in knowing the effect of showing the search ad. This project provides techniques to disentangle the effect of these three events on a future conversion event and is particularly useful for attribution, determining which scenarios lead to click-through conversion, and many other applications.

For this project, the interns benefited greatly from the data pipelines included in Data Science Workspace, which got the data into a form that could easily be used to create and train models.

Estimating share of wallet

A tool for understanding customer spending.

A customer’s “share of wallet” for a firm is the fraction of the customer’s total spend on a product that goes to the firm. This project estimates the share of wallet when only partial data is available. At the customer level, a share of wallet estimate can be used to identify key opportunities to increase product sales to different segments.

Here, Data Science Workspace provided an improved development experience by combining a familiar environment for those who use Jupyter notebooks with easy data access and optimized cluster management.
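To make the share-of-wallet definition above concrete, here is a minimal Python sketch of the ratio itself. It assumes both spend figures are fully known, so it does not reproduce the project's method for estimating share of wallet from partial data. The function name, the customer labels, and the spend figures are all hypothetical and chosen only for illustration.

```python
def share_of_wallet(spend_with_firm: float, total_spend: float) -> float:
    """Return the fraction of a customer's total spend that goes to the firm.

    Assumes both amounts are known; the project described above works with
    partial data, which this sketch does not attempt to model.
    """
    if total_spend <= 0:
        raise ValueError("total_spend must be positive")
    if not 0 <= spend_with_firm <= total_spend:
        raise ValueError("spend_with_firm must be between 0 and total_spend")
    return spend_with_firm / total_spend


# Hypothetical figures: (spend with the firm, total spend on the product).
customers = {
    "customer_a": (120.0, 400.0),
    "customer_b": (50.0, 60.0),
}

shares = {
    name: share_of_wallet(with_firm, total)
    for name, (with_firm, total) in customers.items()
}

for name, share in shares.items():
    print(f"{name}: {share:.0%} share of wallet")
```

In this made-up example, the customer with the lower share is the one with more spend going elsewhere, which is the kind of opportunity a share-of-wallet estimate is meant to surface.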

Several other interesting use cases arose from the interns’ use of Data Science Workspace, from B2B attribution, to models for better data-based decision making, to identifying the causes of cognitive biases in analysts’ decision-making processes, and more.

Data Science Workspace: A leader in the Forrester Wave

As evidenced by the Adobe Research intern project results to date, Data Science Workspace is about working smarter and delivering the personalized experiences your customers not only appreciate, but now expect at scale.

We’re continuing to improve and develop Data Science Workspace, but already the industry is noticing: Adobe was named the only ‘Leader’ in The Forrester Wave: Digital Intelligence Platforms report for Q2 2017. We received the highest scores possible in nine criteria, including behavioral targeting and online testing.

It’s a testament to our investment in the future of AI-powered data science. After all, AI empowers brands to differentiate themselves and create brand loyalty, and we’re committed to being our users’ best partner to do so.

Topics: Digital Transformation
