T-SQL/MSBI Knowledge Share Videos

Basic Data Warehouse Concepts

What is OLTP system and OLAP system?
OLTP system: OnLine Transaction Processing system is one where the transactions will be happening very often i.e. the data gets changed frequently. Example, any application database is a OLTP system.Consider any online banking application. The moment you transfer the amount, the value gets over written in the database with new balance and you won't get the old balance amount. No history is captured here. In a Data Ware house architecture, source systems are the OLTP systems.

OLAP system: OnLine Analytical Prcoessing system is nothing but a Data Warehouse. It maintains the history of record and is mainly used for analytics to analyze the business growth. Example, in case of online banking if we take OLTP system it just over writes the data, but OLAP contains the old value and new value as well.


Difference between OLTP and OLAP?
  • Used for transaction processing.
  • Holds current and most recent data.
  • Application driven
  • Normalized data
  • Volatile data
  • Used for Query processing
  • Holds historical data
  • Analysis driven
  • Denormalized data
  • Non-Volatile data
What is Dimension?
A dimension is something that qualifies a quantity(measure).
If we just say... "20kg", it does not mean anything. But 20kg of Rice(Product) is sold to Ramesh (Customer) on 5th April (Date), gives a meaningful sense.These Product, Customer, Date are some dimension that qualified the measure. 

What is Fact?
A Fact is something that is quantifiable (or measurable). Facts are typically (but not always) numerical values that can be aggregated.

What is Factless Fact?
A Factless fact table is just like a bridge table between two dimensions to resolve many-to-many relationship. 
We know that fact table is a collection of many facts and measures having multiple keys joined with one or more dimension tables. Facts contains both numeric and additive fields, but factless fact table are different from all these.
A factless fact table is fact table that does not contain fact. They contain only dimensional keys and it captures events that happen only at information level but not included in the calculations level. Just the information about an event that happened over a period.

There are two types of factless fact table :
  • Those that describes events
  • Those that describes condtions.
Factless fact tables for Events:
This table captures events. many event-tracking tables in dimensional data warehouses turn out to be fact less. Sometimes there seem to be no facts associated with an important business process. Events or activities occur that you wish to track, but you find no measurements. In situations like this, build a standard transaction-grained fact table that contains no facts.
For eg.

The above fact is used to capture the leave taken by employee. Whenever an employee takes leave a record is created with the dimensions. Using the fact FACT_LEAVE we can answer many questions like 
  • Number of leaves taken by an employee
  • The type of leave an employee takes
  • Details of the employee who took leave
Factless fact table for Conditions:
Factless fact tables are also used to model conditions or other important relationships among dimensions. In these case, there are no clear transactions or events. It is used to support negative analysis report. For example a store that did not sell a product for a given period. To produce such report, you need to have a fact table to capture all the possible combinations. You can then figure out what is missing.
For eg, FACT_PROMO gives the information about the products which have promotions but still dint not sell

The fact answers the below questions:
  • To find out products that have promotions.
  • To find out products that have promotion that sell.
  • The list of products that have promotion but did not sell.
Apart from these another example to capture the events using factless fact is, Consider Teacher and student dimensions. One teacher teaches many student and one student is taught by many teachers. For this we can establish a factless fact table which hold the key from each table and we can find the following informations.
  • A student is taught by how many teachers?
  • How many students a teacher teaches?
  • Which teacher teaches more number of students?
What is the relationship between Fact and the Dimension?
There is one-to-Many relationship between Dimension and the Fact. i.e. one record in dimension table references multiple records in the fact table.

What is the Difference between ER Modeling and Dimensional Modeling?
ER modeling is used for normalizing the OLTP database design. Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.

 What is a business key?
A business key uniquely identifies each record in the table that is used to populate the dimensional table. i.e., a column which is unique in the source table which is populating the dimension table.

 What is Surrogate key and the purpose of surrogate key?
A surrogate key is a numeric number that uniquely identifies records in the dimension table. 
Dimension table stores the history of a record. Each record has a primary key in the source table. When tracking the history of a record it's difficult to distinguish the record, since for each record same primary key from the source in the dimension table. So to avoid that conflict a Surrogate key is used.