A large volume of data is generated every day. Storing this data and making it usable by various departments for analytics, reporting and decision making is essential at every level of an organization. Data warehousing is the process of collecting, storing and managing this data. In this blog, we’ll be talking about the top 66 data warehouse interview questions and answers which you must learn in 2021.
4. What is data transformation?
Data transformation is the process or method of changing the format, structure, or values of data.
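As a rough illustration, the sketch below applies one change of each kind (format, structure and value) to a single record. The field names and sample values are made up for this example and do not come from any particular source system.

```python
from datetime import datetime


def transform(record):
    """Return a cleaned copy of one raw source record (hypothetical fields)."""
    first_name, last_name = record["customer"].split(" ", 1)
    return {
        # Format change: US-style date string -> ISO 8601 date string.
        "order_date": datetime.strptime(record["order_date"], "%m/%d/%Y").date().isoformat(),
        # Value change: drop thousands separators and convert text to a number.
        "amount": float(record["amount"].replace(",", "")),
        # Structure change: split one combined field into two columns.
        "first_name": first_name,
        "last_name": last_name,
    }


raw = {"order_date": "03/15/2021", "amount": "1,200.50", "customer": "Ada Lovelace"}
clean = transform(raw)
print(clean)
```

In a real pipeline, rules like these usually run in the transform step of ETL before the data is loaded into the warehouse.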
39. Explain the Chameleon method used in data warehousing.
Chameleon is a hierarchical clustering algorithm that overcomes the limitations of existing clustering models and methods. It operates on a sparse graph in which nodes represent data items and weighted edges represent the similarities between data items. This representation allows large data sets to be handled successfully. The method finds the clusters in the data set using a two-phase algorithm. The first phase uses graph partitioning to cluster the data items into a large number of relatively small sub-clusters; the second phase uses an agglomerative hierarchical clustering algorithm to find the genuine clusters by repeatedly combining these sub-clusters.
60. What is the benefit of denormalization?
Denormalization adds selected redundant data to tables so that queries can avoid complex joins and other expensive operations. Denormalization doesn’t mean that normalization is skipped; rather, the denormalization process takes place after the normalization process.
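To make the trade-off concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`customers`, `orders`, `orders_denorm`, `city`) are assumptions chosen for illustration. The same report is produced once with a join on the normalized tables and once from a denormalized table that stores the city redundantly on every order row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design: city lives only in the customers table.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Pune'), (2, 'Delhi');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);

    -- Denormalized copy: city is repeated on every order row.
    CREATE TABLE orders_denorm AS
        SELECT o.order_id, o.amount, c.city
        FROM orders o JOIN customers c ON c.customer_id = o.customer_id;
""")

# Report on the normalized tables: needs a join.
joined = conn.execute("""
    SELECT c.city, SUM(o.amount)
    FROM orders o JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.city ORDER BY c.city
""").fetchall()

# Same report on the denormalized table: no join required.
flat = conn.execute(
    "SELECT city, SUM(amount) FROM orders_denorm GROUP BY city ORDER BY city"
).fetchall()

print(joined)
print(flat)
```

Both queries return the same totals. The denormalized version is simpler to query, at the cost of storing `city` many times and having to keep those copies consistent.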
31. What is a degenerate dimension?
In a data warehouse, a degenerate dimension is a dimension key in the fact table that does not have its own dimension table. Degenerate dimensions commonly occur when the fact table’s grain is a single transaction (or transaction line).
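A small sketch may help here, again using Python's built-in `sqlite3` module with hypothetical table names. The fact table below has one row per order line; `order_number` is stored directly in the fact table with no dimension table behind it, which makes it a degenerate dimension, whereas `product_key` points to a regular dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);

    -- Grain: one row per order line.
    -- order_number has no dimension table of its own (degenerate dimension).
    CREATE TABLE fact_sales (
        order_number TEXT,
        product_key  INTEGER REFERENCES dim_product(product_key),
        quantity     INTEGER
    );

    INSERT INTO dim_product VALUES (1, 'Keyboard'), (2, 'Mouse');
    INSERT INTO fact_sales VALUES
        ('SO-1001', 1, 2),
        ('SO-1001', 2, 1),
        ('SO-1002', 2, 3);
""")

# The degenerate dimension is still useful for grouping lines of one order.
lines_per_order = conn.execute("""
    SELECT order_number, COUNT(*) AS line_count, SUM(quantity) AS units
    FROM fact_sales
    GROUP BY order_number
    ORDER BY order_number
""").fetchall()

print(lines_per_order)
```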
6. Why do we need a Data Warehouse?
The primary reason for a data warehouse is to give an organization an advantage over its competitors. It also helps the organization make smarter decisions, which is possible only if the executives responsible for taking such decisions have data at their disposal.
15. What is the difference between ER Modelling and Dimensional Modelling?
ER Modelling | Dimensional Modelling |
Used for OLTP application design. Optimized for Select / Insert / Update / Delete. | Used for OLAP application design. Optimized for retrieving data and answering business queries. |
Revolves around entities and their relationships to capture the process. | Revolves around dimensions for decision making. Doesn’t capture the process. |
The unit of storage is a table. | The unit of storage is a cube. |
Contains normalized data. | Contains denormalized data. |
35. What is the level of granularity of a fact table?
A fact table is usually designed at a low level of granularity, which means that it stores the most detailed level of information available. Granularity is the lowest level of information stored in the fact table. For example, overall employee performance is a very high level of granularity, while employee performance daily and employee performance weekly can be considered low levels of granularity because they are recorded much more frequently. In the date dimension, the depth of the data level is known as its granularity: the level could be year, month, quarter, period, week or day, with day being the lowest level and year the highest. Determining the granularity consists of the following two steps: determining the dimensions that are to be included, and determining where in the hierarchy of each dimension the information is located. Both determinations are made as per the requirements.
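The sketch below shows why the lowest grain matters, using made-up daily figures. Facts stored at day level can always be rolled up to month or year, but the reverse is not possible. The `roll_up` helper and the sample data are illustrative assumptions, not part of any warehouse product.

```python
from collections import defaultdict
from datetime import date

# Facts stored at the lowest grain of the date dimension: one row per day.
daily_facts = [
    (date(2021, 1, 4), 8),
    (date(2021, 1, 5), 6),
    (date(2021, 2, 1), 7),
]


def roll_up(facts, level):
    """Aggregate day-level facts to a higher level of the date hierarchy."""
    if level not in ("month", "year"):
        raise ValueError("level must be 'month' or 'year'")
    totals = defaultdict(int)
    for day, value in facts:
        key = (day.year, day.month) if level == "month" else (day.year,)
        totals[key] += value
    return dict(totals)


by_month = roll_up(daily_facts, "month")
by_year = roll_up(daily_facts, "year")
print(by_month)
print(by_year)
```

Had the fact table stored only yearly totals, the monthly and daily breakdowns could not be recovered from it.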