How to create a star-schema data model?

What is Star-Schema Data Model?

A star-schema is a data modelling technique used in data warehousing. It consists of a central fact table and related dimension tables. The fact table contains the measures, or quantitative data, that we want to analyse, while the dimension tables contain descriptive data that provide context for the measures.

What are Dimensions and Facts?

Dimensions and facts are the two main types of tables in a star-schema. Dimension tables provide the context for the quantitative data in the fact table, while the fact table contains the measures or quantitative data we want to analyse.

  • A dimension table provides descriptive information about a business entity, such as a customer, product, or location.
  • A fact table contains the quantitative data that we want to analyse.

How to Identify Dimension Tables

To identify a dimension table, we look for tables that contain descriptive information about a business entity, such as customers, products, or locations. Here are some criteria for identifying dimension tables:

  • A dimension table contains attributes that describe the entity, such as name, location, and demographic information.
  • A dimension table usually has a one-to-many relationship with the fact table. For example, one customer can have multiple orders in the fact table.
  • A dimension table typically does not contain many numerical or quantitative data.

Example of Dimension Tables:

  1. Product Dimension Table: This table contains information about products that a company sells, such as the product name, category, price, brand, etc. Each product is a unique entity that can be identified by a primary key.
  2. Customer Dimension Table: This table contains information about customers, such as their name, address, email, phone number, etc. Each customer is a unique entity that can be identified by a primary key.
  3. Customer Segment Dimension Table: This table contains information about customer segments, such as new customers, loyal customers, high-value customers, etc. Each segment is a unique entity that can be identified by a primary key.
  4. Customer Demographics Dimension Table: This table contains information about customer demographics, such as age, gender, income, education level, etc. Each demographic group is a unique entity that can be identified by a primary key.
  5. Product Category Dimension Table: This table contains information about product categories, such as electronics, clothing, home goods, etc. Each category is a unique entity that can be identified by a primary key.
  6. Supplier Dimension Table: This table contains information about suppliers, such as their name, address, phone number, email, etc. Each supplier is a unique entity that can be identified by a primary key.
  7. Transaction Type Dimension Table: This table contains information about transaction types, such as sale, return, exchange, etc. Each transaction type is a unique entity that can be identified by a primary key.
  8. Employee Dimension Table: This table contains information about employees, such as their name, job title, department, hire date, salary, etc. Each employee is a unique entity that can be identified by a primary key.
  9. Salesperson Dimension Table: This table contains information about salespeople, such as their name, sales region, sales quota, etc. Each salesperson is a unique entity that can be identified by a primary key.
  10. Time Dimension Table: This table contains information about time, such as the date, month, year, week, day of the week, etc. Each date is a unique entity that can be identified by a primary key.
  11. Geography Dimension Table: This table contains information about geographical regions, such as countries, states, cities, and zip codes. Each region is a unique entity that can be identified by a primary key.
  12. Location Dimension Table: This table contains information about locations, such as store locations, distribution centers, etc. Each location is a unique entity that can be identified by a primary key.

How to Identify Fact Tables

To identify a fact table, we look for tables that contain numerical or quantitative data that we want to analyse. Here are some criteria for identifying fact tables:

  • A fact table contains measures, or quantitative data, that can be analysed or aggregated.
  • A fact table usually has a many-to-one relationship with dimension tables.
  • A fact table typically does not contain descriptive information about a business entity.

Example of Fact table:

  1. Sales/Orders Fact Table: This table contains information about sales made by a company, such as the quantity sold, the price, the discount, etc. Each sale is a unique event that can be identified by a combination of foreign keys from the product and customer dimension tables, as well as the date from the time dimension table.
  2. Inventory Fact Table: This table contains information about inventory levels, such as the number of units in stock, units sold, units returned, etc. Each inventory item is a unique event that can be identified by a combination of foreign keys from the product and geography dimension tables, as well as the date from the time dimension table.
  3. Website Conversion Fact Table: This table contains information about website conversions, such as completed purchases, form submissions, account registrations, etc. Each conversion is a unique event that can be identified by a combination of foreign keys from the visitor and time dimension tables.
  4. Web Traffic Fact Table: This table contains information about website traffic, such as the number of page views, clicks, unique visitors, etc. Each page view or click is a unique event that can be identified by a combination of foreign keys from the website, visitor, and time dimension tables.
  5. Web Analytics Fact Table: This table contains information about website interactions, such as page views, clicks, bounce rates, etc. Each interaction is a unique event that can be identified by a combination of foreign keys from the visitor and time dimension tables.
  6. Supply Chain Fact Table: This table contains information about the supply chain, such as the number of orders, order lead time, order fill rate, etc. Each order is a unique event that can be identified by a combination of foreign keys from the product and supplier dimension tables, as well as the date from the time dimension table.
  7. Financial Fact Table: This table contains information about financial transactions, such as revenue, expenses, profit, etc. Each transaction is a unique event that can be identified by a combination of foreign keys from the time dimension table, as well as the account dimension table.
  8. Employee Performance Fact Table: This table contains information about employee performance, such as sales revenue generated, customer satisfaction ratings, number of calls handled, etc. Each performance metric is a unique event that can be identified by a combination of foreign keys from the employee and time dimension tables.
  9. Marketing Campaign Fact Table: This table contains information about marketing campaigns, such as the number of emails sent, number of clicks, conversion rate, etc. Each campaign is a unique event that can be identified by a combination of foreign keys from the marketing channel and time dimension tables.
  10. Hotel Booking Fact Table: This table contains information about hotel bookings, such as the number of rooms booked, check-in and check-out dates, room rates, etc. Each booking is a unique event that can be identified by a combination of foreign keys from the hotel, customer, and time dimension tables.

Example 1:

Below is a sample schema for Airbnb, we will use this to create a star-schema.

listings table: This table contains detailed information about Airbnb listings.

Column NameData TypeDescription
idstringUnique identifier for the listing
namestringName of the listing
descriptionstringDescription of the listing
host_idstringUnique identifier for the host of the listing
pricefloatPrice per night for the listing
neighbourhoodstringName of the neighbourhood where the listing is located
bedroomsintegerNumber of bedrooms in the listing
bathroomsfloatNumber of bathrooms in the listing

calendar table: This table contains information about the availability and price of Airbnb listings for a given date range.

Column NameData TypeDescription
listing_idstringUnique identifier for the listing
datedateDate for which availability and price information is provided
availablestringIndicates whether the listing is available on the given date (values: “t” or “f”)
pricefloatPrice per night for the listing on the given date

reviews table: This table contains reviews written by Airbnb guests about their experiences staying in a particular listing.

Column NameData TypeDescription
listing_idstringUnique identifier for the listing
idintegerUnique identifier for the review
datedateDate on which the review was written
reviewer_idstringUnique identifier for the guest who wrote the review
commentsstringText of the review comments

hosts table: This table contains information about the hosts who list their properties on Airbnb.

Column NameData TypeDescription
idstringUnique identifier for the host
namestringName of the host
locationstringLocation of the host
host_sincedateDate on which the host joined Airbnb

neighborhoods table: This table contains a list of neighbourhoods in various cities around the world, along with their geographic coordinates.

Column NameData TypeDescription
neighbourhoodstringName of the neighbourhood
neighbourhood_groupstringName of the city or region in which the neighbourhood is located
latitudefloatLatitude coordinate of the neighbourhood
longitudefloatLongitude coordinate of the neighbourhood

Identify Dim

hosts table provides descriptive information about the hosts of each listing, such as their name, location, and the number of properties they manage.

neighborhoods table provides information about the neighbourhoods where the listings are located, such as the neighbourhood name, it’s geographic location, and other demographic information. These tables would be good candidates for dimension tables in a star-schema.

Identify Fact

listings table contains detailed information about each Airbnb listing, such as its name, description, pricing and the number of bedrooms and bathrooms. This table would be a good candidate for the fact table in a star-schema.