Using the ASUG Taxonomy to improve data quality

June 21, 2012

Now that the new version of the ASUG Data Quality Taxonomy has been published, I thought it might be a good idea to review some of the ways the taxonomy is being used, and how it could be used as more companies implement this open-source taxonomy.



What is it? 

The ASUG Taxonomy for Data Quality Controls is simply an open-source classification system for data quality queries and controls based on the SAP data model. While it was originally established as a means of benchmarking data quality methods between companies, its use has evolved to include other benefits that I will touch on below.

  • It is intended to allow better transparency in data quality benchmarking, both for what is being measured and how, and for the results.

  • It is not a tool, although it can be used to help evaluate tools.

  • It is not a list of KPIs, but it can easily serve to aggregate individual data quality queries into enterprise-level KPIs in a standard way.

  • More background is available in a whitepaper posted on the ASUG website or by contacting me for a copy. 


Why use a standard organizational framework on your queries in the first place? 

The best answer, after the networking and benchmarking value, is to avoid blind spots in your data quality approach. If all you are ever doing is creating data quality tests or controls based on problems after the fact, then you are never getting ahead of the curve! The Taxonomy lets you view your controls across a framework so you can see more easily whether there are specific areas that you are not targeting for controls. If those areas also map to the key drivers of your enterprise’s objectives (improving perfect order percentage or lowering inventory, for instance), then you have a gap in how you ensure your data is meeting the needs of the business. The taxonomy does not do the work for you, of course, but it does help to organize and visualize it. Because it is a hierarchy, it also allows you to benchmark against other companies, or other regions in your own company, using the same open-source framework.
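To make the blind-spot idea concrete, here is a minimal sketch in Python. It assumes each control is tagged with the taxonomy node it covers. The node names and control names below are invented for illustration and are not the actual ASUG taxonomy categories.

```python
# Hypothetical taxonomy nodes (NOT the real ASUG categories).
TAXONOMY_NODES = {
    "Material/Valuation",
    "Material/Planning",
    "Material/Sales",
    "Customer/Address",
}

# Hypothetical controls, each tagged with the taxonomy node it covers.
controls = [
    {"name": "val_class_matches_material_type", "node": "Material/Valuation"},
    {"name": "mrp_type_populated", "node": "Material/Planning"},
    {"name": "postal_code_format", "node": "Customer/Address"},
]


def find_blind_spots(nodes, controls):
    """Return the taxonomy nodes that no control is tagged against."""
    covered = {control["node"] for control in controls}
    return sorted(nodes - covered)


blind_spots = find_blind_spots(TAXONOMY_NODES, controls)
print(blind_spots)
```

In this made-up example the only node without a control is "Material/Sales". Whether that is a real gap depends on whether the node maps to one of your enterprise's objectives.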


Follow the money. Better yet: Chase the money!


So why use it if you do not want to benchmark?


Use it to save your company money and avoid data problems BEFORE the business asks.


All too often, the Data Governance Group at a company is too focused on data maintenance and simple process improvement to see the bigger picture of enterprise quality and how much larger its role can be in making the company money. By mapping the taxonomy to your enterprise’s key corporate objectives, you can proactively focus on the zones of data where poor quality is costing the business money. Using the taxonomy as a checklist and roadmap for saving your company money and helping it meet its objectives is really a no-brainer.


Improve communication


You can also use the taxonomy to improve communication with your executive leadership, as well as the operational leaders in the business, in a common, standard way. Nothing glazes over an executive’s eyeballs like looking at 500 lines of data quality errors (lesson learned the hard way; heck, I thought they were fascinating).


Because the taxonomy is a hierarchy, you can express your quality specifics in terms of its levels: the top level is what matters to the execs, and they may have an interest in level 2. The operational leadership is focused on level 2 and 3 specifics, and more detail is required the deeper into the organization you go, until the steward gets a very detailed report. Imagine if your quality dashboard leveraged the same framework as other companies. Not only is your roll-up more credible, but opportunities for easier and deeper benchmarking come for free.
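As a rough illustration of rolling results up the hierarchy, the Python sketch below aggregates per-control results to any chosen level. The three-level paths, the record counts, and the pass-rate metric are all assumptions made for this example. They are not taken from the taxonomy itself.

```python
from collections import defaultdict

# Hypothetical results: (taxonomy path, records checked, records failed).
results = [
    (("Material", "Valuation", "Valuation Class"), 1000, 50),
    (("Material", "Valuation", "Price Control"), 1000, 10),
    (("Material", "Planning", "MRP Type"), 500, 25),
]


def roll_up(results, depth):
    """Aggregate pass rates to the first `depth` levels of each path.

    Assumes every result has a positive number of records checked.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [checked, failed]
    for path, checked, failed in results:
        key = path[:depth]
        totals[key][0] += checked
        totals[key][1] += failed
    return {
        key: 1 - failed / checked
        for key, (checked, failed) in totals.items()
    }


executive_view = roll_up(results, depth=1)    # one number per top-level domain
operational_view = roll_up(results, depth=2)  # one number per level-2 node
```

The same result rows feed every audience. Only the depth changes, so the executive view and the steward's detailed report stay consistent with each other.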


One may be saying to oneself about now, “My landscape for data quality monitoring is pretty simple. Why is something like this even needed?” The answer is that three years ago, mature organizations were running on average 80-100 control checks for the material master in SAP. Today, based on observing these same companies, I estimate the average is well over 200 controls, and some companies are over 900. You must keep these organized, and while you can invent something, wouldn’t using a taxonomy framework with 4 years of vetting history from 20+ companies make sense? If you are using only a structured name to organize your controls, as many of the DQ tools require, then there is a risk that your naming becomes overly technical and so less useful to the business. That technique is also essentially flat, so somewhere around 200-300 total controls, people will start to have trouble finding a monitoring query and will just create their own (redundant) ones. The taxonomy framework promotes reuse and redeployment rather than recreating individually focused checks from scratch.


The fastest way to 3000 queries, with many duplicates clogging your data quality tool, is not to use a framework for organizing them!


Active / front end vs. Passive / back end controls 


As more companies use the Taxonomy for organizing their controls and their “world of quality queries”, some interesting new ideas are taking root.

The best new idea that I have heard is to use this framework for the “active controls” as well as for organizing the data quality monitoring queries. For instance, suppose you are looking at quality monitoring for “Valuation Class”. If you have an active control hard-coded in your SAP system that pre-populates Val Class based on material type and group, then there is less need to monitor Val Class on the back end.


However, if you are benchmarking or just looking for blind spots and you do not document this active control, one could get the impression that you have a gap. One company installing the taxonomy is addressing this by documenting its active controls as well as its passive (or back-end monitoring) controls in the same framework, to get a complete picture of its data governance coverage versus the total master data in use. I think this is brilliant.


You should consider this if you are a governance leader in your company, because it is your responsibility to be able to quickly find all the data quality controls, active or passive, and to know where the blind spots are. It is also your responsibility to be able to communicate your coverage of the data elements in use. (So what is the percentage of your master data that is controlled?) If you cannot do this, then you are at risk of being caught in a tough position when something fails.
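One way to answer that coverage question is sketched below in Python. It assumes you keep a list of the master data fields in use and a list of documented controls, each marked active or passive. The field names and the control list are hypothetical.

```python
# Hypothetical master data fields in use (illustrative names only).
fields_in_use = {
    "material_type",
    "material_group",
    "valuation_class",
    "mrp_type",
    "base_unit",
}

# Hypothetical documented controls: active (front end) or passive (back end).
controls = [
    {"field": "valuation_class", "kind": "active"},
    {"field": "mrp_type", "kind": "passive"},
    {"field": "material_type", "kind": "passive"},
    {"field": "material_type", "kind": "active"},
]


def coverage(fields_in_use, controls):
    """Return (percent of fields with any control, sorted uncovered fields)."""
    controlled = {control["field"] for control in controls} & fields_in_use
    uncovered = sorted(fields_in_use - controlled)
    percent = 100 * len(controlled) / len(fields_in_use)
    return percent, uncovered


percent, uncovered = coverage(fields_in_use, controls)
print(f"{percent:.0f}% of fields in use are controlled; gaps: {uncovered}")
```

A field counts as covered here if it has either kind of control, which matches the point above: an undocumented active control would otherwise show up as a false gap.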


You can create your own framework to protect you or use one that is created already.




The ASUG Quality (controls) Taxonomy is easy to follow and has been vetted at top SAP-running companies.


Not only is it an improvement over naming your controls solely with highly technical conventions based on the metadata of the dependent fields, but because it is a hierarchy, it also provides additional advantages: you can use it as a framework for rolling the quality data up consistently into a dashboard, and for communicating the percentage of control coverage that you have by domain and type of data.


This might sound like a big chunk of work.


It is not.


It truly isn’t. 


However, like any journey, it helps to have a guide.


RuleBase 5.0 leverages the Taxonomy in its controls database (QueryBase).


Richard A. King

