What is Star-Schema Data Model?
A star-schema is a data modelling technique used in data warehousing. It consists of a central fact table and related dimension tables. The fact table contains the measures, or quantitative data, that we want to analyse, while the dimension tables contain descriptive data that provide context for the measures.
What are Dimensions and Facts?
Dimensions and facts are the two main types of tables in a star-schema. Dimension tables provide the context for the quantitative data in the fact table, while the fact table contains the measures or quantitative data we want to analyse.
- A dimension table provides descriptive information about a business entity, such as a customer, product, or location.
- A fact table contains the quantitative data that we want to analyse.
How to Identify Dimension Tables
To identify a dimension table, we look for tables that contain descriptive information about a business entity, such as customers, products, or locations. Here are some criteria for identifying dimension tables:
- A dimension table contains attributes that describe the entity, such as name, location, and demographic information.
- A dimension table usually has a one-to-many relationship with the fact table. For example, one customer can have multiple orders in the fact table.
- A dimension table typically does not contain many numerical or quantitative data.
Example of Dimension Tables:
Product Dimension Table: This table contains information about products that a company sells, such as the product name, category, price, brand, etc. Each product is a unique entity that can be identified by a primary key.Customer Dimension Table: This table contains information about customers, such as their name, address, email, phone number, etc. Each customer is a unique entity that can be identified by a primary key.Customer Segment Dimension Table: This table contains information about customer segments, such as new customers, loyal customers, high-value customers, etc. Each segment is a unique entity that can be identified by a primary key.Customer Demographics Dimension Table: This table contains information about customer demographics, such as age, gender, income, education level, etc. Each demographic group is a unique entity that can be identified by a primary key.Product Category Dimension Table: This table contains information about product categories, such as electronics, clothing, home goods, etc. Each category is a unique entity that can be identified by a primary key.Supplier Dimension Table: This table contains information about suppliers, such as their name, address, phone number, email, etc. Each supplier is a unique entity that can be identified by a primary key.Transaction Type Dimension Table: This table contains information about transaction types, such as sale, return, exchange, etc. Each transaction type is a unique entity that can be identified by a primary key.Employee Dimension Table: This table contains information about employees, such as their name, job title, department, hire date, salary, etc. Each employee is a unique entity that can be identified by a primary key.Salesperson Dimension Table: This table contains information about salespeople, such as their name, sales region, sales quota, etc. Each salesperson is a unique entity that can be identified by a primary key.Time Dimension Table: This table contains information about time, such as the date, month, year, week, day of the week, etc. Each date is a unique entity that can be identified by a primary key.Geography Dimension Table: This table contains information about geographical regions, such as countries, states, cities, and zip codes. Each region is a unique entity that can be identified by a primary key.Location Dimension Table: This table contains information about locations, such as store locations, distribution centers, etc. Each location is a unique entity that can be identified by a primary key.
How to Identify Fact Tables
To identify a fact table, we look for tables that contain numerical or quantitative data that we want to analyse. Here are some criteria for identifying fact tables:
- A fact table contains measures, or quantitative data, that can be analysed or aggregated.
- A fact table usually has a many-to-one relationship with dimension tables.
- A fact table typically does not contain descriptive information about a business entity.
Example of Fact table:
Sales/Orders Fact Table: This table contains information about sales made by a company, such as the quantity sold, the price, the discount, etc. Each sale is a unique event that can be identified by a combination of foreign keys from the product and customer dimension tables, as well as the date from the time dimension table.Inventory Fact Table: This table contains information about inventory levels, such as the number of units in stock, units sold, units returned, etc. Each inventory item is a unique event that can be identified by a combination of foreign keys from the product and geography dimension tables, as well as the date from the time dimension table.Website Conversion Fact Table: This table contains information about website conversions, such as completed purchases, form submissions, account registrations, etc. Each conversion is a unique event that can be identified by a combination of foreign keys from the visitor and time dimension tables.Web Traffic Fact Table: This table contains information about website traffic, such as the number of page views, clicks, unique visitors, etc. Each page view or click is a unique event that can be identified by a combination of foreign keys from the website, visitor, and time dimension tables.Web Analytics Fact Table: This table contains information about website interactions, such as page views, clicks, bounce rates, etc. Each interaction is a unique event that can be identified by a combination of foreign keys from the visitor and time dimension tables.Supply Chain Fact Table: This table contains information about the supply chain, such as the number of orders, order lead time, order fill rate, etc. Each order is a unique event that can be identified by a combination of foreign keys from the product and supplier dimension tables, as well as the date from the time dimension table.Financial Fact Table: This table contains information about financial transactions, such as revenue, expenses, profit, etc. Each transaction is a unique event that can be identified by a combination of foreign keys from the time dimension table, as well as the account dimension table.Employee Performance Fact Table: This table contains information about employee performance, such as sales revenue generated, customer satisfaction ratings, number of calls handled, etc. Each performance metric is a unique event that can be identified by a combination of foreign keys from the employee and time dimension tables.Marketing Campaign Fact Table: This table contains information about marketing campaigns, such as the number of emails sent, number of clicks, conversion rate, etc. Each campaign is a unique event that can be identified by a combination of foreign keys from the marketing channel and time dimension tables.Hotel Booking Fact Table: This table contains information about hotel bookings, such as the number of rooms booked, check-in and check-out dates, room rates, etc. Each booking is a unique event that can be identified by a combination of foreign keys from the hotel, customer, and time dimension tables.
Example 1:
Below is a sample schema for Airbnb, we will use this to create a star-schema.
listings table: This table contains detailed information about Airbnb listings.
| Column Name | Data Type | Description |
|---|---|---|
id | string | Unique identifier for the listing |
name | string | Name of the listing |
description | string | Description of the listing |
host_id | string | Unique identifier for the host of the listing |
price | float | Price per night for the listing |
neighbourhood | string | Name of the neighbourhood where the listing is located |
bedrooms | integer | Number of bedrooms in the listing |
bathrooms | float | Number of bathrooms in the listing |
calendar table: This table contains information about the availability and price of Airbnb listings for a given date range.
| Column Name | Data Type | Description |
|---|---|---|
listing_id | string | Unique identifier for the listing |
date | date | Date for which availability and price information is provided |
available | string | Indicates whether the listing is available on the given date (values: “t” or “f”) |
price | float | Price per night for the listing on the given date |
reviews table: This table contains reviews written by Airbnb guests about their experiences staying in a particular listing.
| Column Name | Data Type | Description |
|---|---|---|
listing_id | string | Unique identifier for the listing |
id | integer | Unique identifier for the review |
date | date | Date on which the review was written |
reviewer_id | string | Unique identifier for the guest who wrote the review |
comments | string | Text of the review comments |
hosts table: This table contains information about the hosts who list their properties on Airbnb.
| Column Name | Data Type | Description |
|---|---|---|
id | string | Unique identifier for the host |
name | string | Name of the host |
location | string | Location of the host |
host_since | date | Date on which the host joined Airbnb |
neighborhoods table: This table contains a list of neighbourhoods in various cities around the world, along with their geographic coordinates.
| Column Name | Data Type | Description |
|---|---|---|
neighbourhood | string | Name of the neighbourhood |
neighbourhood_group | string | Name of the city or region in which the neighbourhood is located |
latitude | float | Latitude coordinate of the neighbourhood |
longitude | float | Longitude coordinate of the neighbourhood |
Identify Dim
hosts table provides descriptive information about the hosts of each listing, such as their name, location, and the number of properties they manage.
neighborhoods table provides information about the neighbourhoods where the listings are located, such as the neighbourhood name, it’s geographic location, and other demographic information. These tables would be good candidates for dimension tables in a star-schema.
Identify Fact
listings table contains detailed information about each Airbnb listing, such as its name, description, pricing and the number of bedrooms and bathrooms. This table would be a good candidate for the fact table in a star-schema.