How do I Maintain Data Warehouse Quality?

Article Details
  • Written By: Carol Francois
  • Edited By: A. Joseph
  • Last Modified Date: 21 October 2019
  • Copyright Protected:
    Conjecture Corporation

There are four primary factors to consider when maintaining data warehouse quality: data integrity, the data input source and methodology, the frequency of data import, and the audience. A data warehouse is an electronic repository of large quantities of data. Businesses and other large organizations increasingly use one to store data in a form that supports reporting and data output requirements. The usefulness of a data warehouse is driven primarily by the quality of its data and its responsiveness to user requirements.

Data integrity is central to data warehouse quality. It covers the rules governing the relationships among the data, dates, definitions and business rules that shape the relevance of the data to the organization. Keeping the data consistent and reconcilable is the foundation of data integrity. Steps used to maintain data warehouse quality must include a cohesive data architecture plan, regular inspection of the data, and the use of rules and processes to keep the data consistent whenever possible.
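One way to make "reconcilable" concrete is to compare a load against its source after every import. The following is a minimal sketch in Python, not a prescribed method: the function name `reconcile`, the `amount` column and the tolerance are all assumptions chosen for illustration.

```python
def reconcile(source_rows, warehouse_rows, key="amount", tolerance=0.005):
    """Compare row counts and the total of one numeric column.

    `key` and `tolerance` are illustrative assumptions; a real check would
    use whichever columns the business rules say must agree.
    """
    source_total = sum(float(row[key]) for row in source_rows)
    warehouse_total = sum(float(row[key]) for row in warehouse_rows)
    return {
        "count_matches": len(source_rows) == len(warehouse_rows),
        "total_matches": abs(source_total - warehouse_total) <= tolerance,
    }


if __name__ == "__main__":
    source = [{"amount": "10.00"}, {"amount": "5.50"}]
    loaded = [{"amount": 10.0}]  # one row went missing during the load
    print(reconcile(source, loaded))
```

A check like this does not say which rows are wrong, only that the two sides disagree, which is usually the signal that a closer inspection is needed.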


The data input source for a data warehouse is typically an import tool or program. The easiest way to maintain data warehouse quality is to implement rules and checkpoints in the data import program itself. Data that does not follow the expected pattern is not added to the data warehouse; instead, it requires user intervention to correct or reconcile the data, or to change the program. In many organizations, changes to the import program can be implemented only by the data warehouse architect, which limits ad hoc changes and helps protect data warehouse quality.
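The idea of rules and checkpoints in the import program can be sketched as follows. This is an illustrative Python example under stated assumptions: the field names (`customer_id`, `amount`, `order_date`) and the three rules are hypothetical, and a real import tool would take its rules from the organization's own data definitions.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ImportResult:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (row, problems) pairs


def validate_row(row):
    """Return the list of rule violations for one incoming row (empty if clean)."""
    problems = []
    if not str(row.get("customer_id", "")).strip():
        problems.append("missing customer_id")
    try:
        if float(row.get("amount")) < 0:
            problems.append("negative amount")
    except (TypeError, ValueError):
        problems.append("amount is not a number")
    try:
        date.fromisoformat(str(row.get("order_date")))
    except ValueError:
        problems.append("order_date is not an ISO date")
    return problems


def run_import(rows):
    """Accept clean rows; hold back the rest for a person to review."""
    result = ImportResult()
    for row in rows:
        problems = validate_row(row)
        if problems:
            result.rejected.append((row, problems))
        else:
            result.accepted.append(row)
    return result
```

Rejected rows are kept together with the reasons they failed, so whoever corrects or reconciles them can see what went wrong without rerunning the import.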

The accuracy and relevance of the data are essential to maintaining data warehouse quality. The timing and frequency of imports have a large impact on the overall usefulness of the tool, as well as on its quality. For example, if purchase order information is entered into the warehouse regularly but invoices are updated only intermittently, the ability to report accurately on purchase-related activity is compromised.
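A simple freshness check can flag feeds that have fallen behind, such as the intermittently updated invoices in the example above. The sketch below is in Python and rests on assumptions: the feed names and the one-day threshold are invented for illustration, and the load timestamps are presumed to be recorded somewhere by the import process.

```python
from datetime import datetime, timedelta


def stale_feeds(last_loaded, max_age, now):
    """Return the names of feeds whose most recent load is older than max_age."""
    return sorted(name for name, loaded_at in last_loaded.items()
                  if now - loaded_at > max_age)


if __name__ == "__main__":
    now = datetime(2019, 10, 21, 12, 0)
    last_loaded = {
        "purchase_orders": datetime(2019, 10, 21, 6, 0),
        "invoices": datetime(2019, 10, 14, 6, 0),
    }
    print(stale_feeds(last_loaded, timedelta(days=1), now))
```

Reports that join a fresh feed to a stale one can then carry a warning, or be held back until the lagging feed catches up.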

Data warehouse quality is easiest to maintain and support if the users are knowledgeable and have a solid understanding of the business processes. Training users not only to build queries but also to understand the underlying data warehouse structure enables them to identify inconsistencies much faster and to highlight potential issues early in the process. Any changes to the data tables, structure or linkages, and any addition of new data fields, must be reviewed with the entire team of users and support staff members to ensure a consistent understanding of the risks and challenges that might occur.


Discuss this Article

Post 4

@everetra - I would prefer more power in data warehouse reporting. I’ve been able to pull reports, but it seems that the data architects create these subsets of data for us to work on.

That’s okay, I suppose, but sometimes I want to pull in larger data sets. I have to jump through hoops to get the architect to load more data into the warehouse for us to work on.

I understand his position. He wants to make sure we only get what we need to do our job, and not bog down the server. Still, I find it a bit too restrictive for my needs.

Post 3

@NathanG - I think it’s important not to give end users too much power. That way they can’t break the data, so to speak. For example, in our company end users pretty much have read-only access. That means they can build basic “SELECT” queries, but they can’t run queries that will modify the tables in any way.
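As a small illustration of read-only access, here is a sketch using Python's built-in sqlite3 module. SQLite is only a stand-in chosen so the example is self-contained; the post does not say which database is in use, and a production warehouse would normally enforce this through database user permissions instead. The helper name `open_read_only` is made up for the example.

```python
import sqlite3


def open_read_only(path):
    """Open an existing SQLite database file so that writes are refused.

    Assumes `path` contains no characters that need URI escaping.
    """
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)
```

On a connection opened this way, SELECT queries run normally, while an INSERT, UPDATE or DELETE raises sqlite3.OperationalError.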

If they enter data, the data gets checked for proper formatting and stuff like that at the entry stage. If it doesn’t follow proper formatting it gets flagged and doesn’t get entered.

We use business intelligence software to input the information. With business intelligence, data warehouse checks are put in place so that only good data enters the system. Of course, these checks are only as good as the programmers who define them, so it works as long as they’ve done their jobs properly.

Post 2

@David09 - Well, one way to avoid the data integrity issue is to closely monitor how the data gets there in the first place. As long as everyone is following the same data warehouse methodology, the data entered should follow the same format.

As an architect I think you could be the gatekeeper in this respect. You could create a data entry form for all users to use to enter the data.

If data is not coming in through a data entry form but instead is being bulk loaded (like through another server or something) then I suppose that you would have to stipulate what the data warehouse requirements are for the data that you’re putting in.

Post 1

Of all the components listed for data warehouse ETL, I would have to say that data integrity is the most important. The most common problem that I’ve seen in businesses I’ve worked at is a lack of consistency.

For example, a simple customer record may be listed with different names but they are all the same customer. Of course, if you have a customer ID then that should help you tie the records together. But these records themselves may have information that is inconsistent.

So while it seems at first glance that it would be an automated process, at some point you will have to ask around to prune up the data correctly. It tends to be very time consuming in the end I think.
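The customer-name problem described in this post can at least be detected automatically, even if fixing it still takes a person. The following Python sketch is only an illustration: the function name `conflicting_names` and the (customer ID, name) record layout are assumptions, and the normalization ignores only case and extra spaces.

```python
from collections import defaultdict


def conflicting_names(records):
    """Map each customer ID to its distinct names when there is more than one.

    `records` is an iterable of (customer_id, name) pairs. Names are compared
    after collapsing whitespace and ignoring case.
    """
    names_by_id = defaultdict(set)
    for customer_id, name in records:
        names_by_id[customer_id].add(" ".join(name.split()).casefold())
    return {cid: sorted(names)
            for cid, names in names_by_id.items() if len(names) > 1}


if __name__ == "__main__":
    records = [
        ("C1", "Acme Corp"),
        ("C1", "ACME  corp"),        # same name, different case and spacing
        ("C1", "Acme Corporation"),  # genuinely different spelling
        ("C2", "Globex"),
    ]
    print(conflicting_names(records))
```

The result is a list of IDs to review by hand, which matches the point that some of the cleanup cannot be fully automated.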
