Data warehouses store vast amounts of data for use in many different fields. There are two main types of data warehouse design: top-down and bottom-up. The two designs have their own advantages and disadvantages. Bottom-up is easier and cheaper to implement, but it is less complete, and data correlations are more sporadic. In a top-down design, connections between data are obvious and well-established, but the data may be out of date, and the system is costly to implement.
Data marts are the central figure in data warehouse design. A data mart is a collection of data based around a single concept. Each data mart is a unique and complete subset of data. Each of these collections is completely correlated internally and often has connections to external data marts.
The way data marts are handled is the main difference between the two styles of data warehouse design. In the top-down design, data marts occur naturally as data is put into the system. In the bottom-up design, data marts are made directly and connected together to form the warehouse. While this may seem like a minor difference, it makes for a very different design.
The top-down method was the original data warehouse design. Using this method, all of the information the organization holds is put into the system. Each broad subject will have its own general area within the databases. As the data is used, connections will appear between correlative data points, and data marts will appear. In addition, any data in the system stays there forever—even if the data is superseded or trivialized by later information, it will stay in the system as a record of past events.
The bottom-up method of data warehouse design works from the opposite direction. A company puts in information as a standalone data mart. As time goes on, other data sets are added to the system, either as their own data mart or as part of one that already exists. When two data marts are considered connected enough, they merge together into a single unit.
The two data warehouse designs each have their own strong and weak points. The top-down method is a huge project for even smaller data sets. Since big projects are also more costly, it is the most expensive in terms of money and manpower. If the data warehouse is finished and maintained, it is a vast collection, containing everything that the company knows.
The bottom-up process is much faster and cheaper, but since the data is entered as needed, the database will never actually be complete. In addition, correlations between data marts are only as strong as their usage makes them. If a strong correlation exists, but no users see it, it goes unconnected.