The Data Is Wrong! Or is it?

A deeper look at data quality.

A few years back I was having a discussion with a client about the quality of data his team was receiving from Adobe. He said they saw inconsistencies, the data couldn’t be relied upon, it was bad, and, bottom line, it was wrong.

I have had this discussion before, with other clients and with many engineers, and I’ve seen the problem firsthand. It is a frustrating place to be. Many people have spent a lot of time putting their implementation in place, and just as much, if not more, digging through it to troubleshoot an issue. With this client, we agreed that I would come on-site and walk through what the issue was. I don’t think they expected, or got, what they were asking for.

Data quality is an interesting thing. It reminds me of the famous quote, “I can’t tell you what it is, but I know it when I see it.” When you press most people on it, they can tell you they want better data quality, but they have no way of measuring it (or choose not to). Data quality is also one of those sneaky issues: when data is obviously wrong, we can adjust for it, ignore it, and so on. The real problem is when we don’t know it is wrong. Thus, back to our quote… what if you can’t see it? In a previous role I discovered a data quality issue that ran back years, and no one knew. In fact, the problem was so large that the company ended up having to restate revenue. So what does this mean for you? You may not be in finance or passing data into your P&L, but does that make your data any less important?

The importance of data

Data permeates every corner of our data-driven marketing ecosystem, and yet data quality still feels like something we know is important but aren’t quite sure how to deal with. At times the problem feels so simple that we query or code our way around it; at other times it is so massive that we end up submitting tickets to fix it. Often we don’t know the source of the problem, and then we are asked to justify the resources, or we get bounced around between vendors and internal engineering trying to isolate the root cause before anyone will work on it. So, how do we change that and make data quality part of how things are done? Not best practice, but standard practice.

It starts with education. Everyone knows the “garbage in, garbage out” phrase, but do they go two, three, or four steps further to understand where the data is coming from, where it is going, and so on, until they understand the full lifecycle, end to end, from data creation to data deletion?

Back to my on-site visit. I arrived at the client’s office and spent the day walking the team through the basics of how Adobe reports on data. I then walked back through all the various ways that data could have been manipulated along the way. We discussed processing rules, VISTA rules, DTM, SAINT classifications, data sources, custom JavaScript, SDKs, the Data Insertion API, and the data layer. While they were shocked that so many touchpoints were involved, I didn’t stop there. I explained that, in the end, what Adobe reports is what it is given. It isn’t creating data out of thin air; it can, however, manipulate data based on rules in the system. Given that, the core of most data quality issues falls into these categories (illustrated in the sketch after this list):

  1. Ensuring the data is put into the data layer correctly.
  2. Ensuring the data is mapped to the correct data element/eVar/prop/event/mbox/etc.
  3. Ensuring the event fires at the right time.
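
To make those three failure points concrete, here is a minimal, hypothetical sketch. It assumes an AppMeasurement-style s object and a digitalData-style data layer; the object shape, variable numbers, and values are illustrative, not a spec.

    // 1. Is the data put into the data layer correctly?
    window.digitalData = {
      page: { name: "products:widgets:blue-widget" },
      transaction: { orderId: "ORD-10042", revenue: 49.99 } // a number, not "49.99 USD"
    };

    // 2. Is it mapped to the correct variable? A slip like eVar12 vs. eVar21
    // fails silently and surfaces downstream as "Adobe is wrong."
    s.eVar21 = window.digitalData.page.name;
    s.events = "purchase";

    // 3. Does the beacon fire at the right time? Firing before the data
    // layer is populated produces "missing" data that looks like a tool bug.
    document.addEventListener("DOMContentLoaded", function () {
      s.t(); // send the page view only once the data layer is ready
    });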

While the above is specific to Adobe Analytics, each solution has its own methods of modifying data. Adobe Target has its profile scripts, while Audience Manager has derived signals. Launch, by Adobe, has endless ways of manipulating data as well. The point being: the closer the data in the data layer is to the end state each solution needs, the less manipulating each solution must do, and, therefore, the simpler the whole ecosystem becomes.
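
For example, rather than re-deriving the same value in a Target profile script, an Audience Manager signal, and a Launch rule, you can normalize it once upstream. A minimal sketch, where the channel codes and the digitalData structure are invented for illustration:

    // Normalize a campaign channel once, in the data layer, instead of
    // re-deriving it separately inside each downstream solution.
    function normalizedChannel(raw) {
      var map = { em: "email", sem: "paid-search", soc: "social" };
      return map[(raw || "").toLowerCase()] || "other";
    }

    window.digitalData = window.digitalData || {};
    window.digitalData.campaign = {
      channel: normalizedChannel(new URLSearchParams(location.search).get("ch"))
    };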

It was at this point that I started walking them through their back-end systems, asking where each data point that populated the data layer came from. We discussed whether the data layer was populated dynamically or pulled from a table or page properties. How were those populated and maintained? Did the people who determined those values preview the data sent into Adobe? We dove into what the process was for getting changes onto the site when a bug was found, and how long that took.
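
One lightweight way to answer those questions is to sanity-check the data layer in the browser console before anyone blames the reports. A minimal sketch, assuming the same hypothetical digitalData structure as above:

    // Flag obvious data layer problems before debugging what Adobe reports.
    function auditDataLayer(dl) {
      var problems = [];
      if (!dl) return ["data layer missing entirely"];
      if (!dl.page || !dl.page.name) problems.push("page.name not set");
      if (dl.transaction && typeof dl.transaction.revenue !== "number") {
        problems.push("transaction.revenue is not numeric");
      }
      return problems;
    }

    console.log(auditDataLayer(window.digitalData)); // [] means nothing obvious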

After we were done with the upstream dependencies, we went through the downstream dependencies, so we had a complete end-to-end picture, from where the data originated to the last point it was used. We went through not just how they were ingesting data into dashboards, but also how data flowed from Analytics into Audience Manager, out to Target through the segments it defined, and into any off-site remarketing (more on this in another post). Since they were utilizing data from Analytics in their offline analysis, we also walked through the various places it was used by their internal systems.

Available resources

Rather than leaving them overwhelmed and on their own, we discussed some resources available to them within the Adobe ecosystem.

  1. Adobe Consulting’s Analytics Health Dashboard
  2. ObservePoint
  3. Anomaly Detection & Contribution Analysis
  4. Intelligent Alerts
  5. Experience Platform Auditor (technically, this didn’t exist at the time, but you should use it)

Things to keep in mind

So, when you’re ready to dig into your own data quality initiative (sounds better than “issues”), feel free to use the above resources. Do some research on your data sources and destinations. Remember to take a holistic look end to end, and sit down and put together a plan that covers:

1. Measure with KPIs.

2. Monitor it so you know your changes are effective and when new issues arise (a minimal sketch follows this list).

3. Know and establish your thresholds beforehand.

4. Start simple.

5. Create a process and/or methodology for fixing anomalies that doesn’t put you at the bottom of a priority list (dependent on resources you don’t control).

6. Establish governance around data, segments, and reuse across both Adobe and external solutions.

7. Communicate and document when fixes go live.
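
For items 2 and 3, monitoring doesn’t have to mean a new platform on day one. Here is a deliberately simple sketch; it assumes you can pull a daily metric from whatever source you trust and that the tolerance was agreed on beforehand (all numbers are illustrative):

    // Compare today's metric against a baseline and a pre-agreed tolerance.
    function checkMetric(name, today, baseline, tolerance) {
      tolerance = tolerance || 0.2; // 20% by default; agree on this up front
      var delta = Math.abs(today - baseline) / baseline;
      if (delta > tolerance) {
        console.warn(name + " moved " + (delta * 100).toFixed(1) +
          "% vs. baseline; investigate before anyone reports on it");
      }
      return delta <= tolerance;
    }

    checkMetric("orders", 812, 1040); // warns: a ~21.9% drop exceeds the 20% tolerance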

Good luck, and remember — it is a journey. Don’t chase perfection.