In this article I’m going to show you how I build my time dimensions. There are lots of different ways to create time dimensions, and there are lots of different requirements, from the attributes in the dimension to the granularity of the data. You can use these scripts to build an entire time dimension, and you can use them as a starting point for yours.
I’ve built time dimensions using various methods, but this is the one that I’ve been using for a while. The big diversion that I take from the norm is that the key for my time dimension is not an integer, it’s a date. I do this for a few reasons:
You don’t always need to join to the time dimension
The biggest cost associated with querying a well structured data warehouse is the joins. When your time dimension key is a date and you are simply querying for data within a date range you don’t need to join to the time dimension to perform that query. One less join. Maybe more, depending on how many date keys you have in your fact table.
I can happily clear and reload the time dimension and not lose relationships
The way that I add new dates to my time dimension is to clear and reload the table with the new range. It’s super quick and with a date as the key you’re guaranteed that the value of the key before will match the value of the key afterwards. If you’re using an incrementing integer then that’s not going to happen. You can hold your data as an INT in a YYYYMMDD format (e.g. 20190130), but if you’re doing that why not just go for a date?
Dates are smaller than INTs
The DATE data type is 3 bytes. INTs are 4 bytes. If you’re going down to intra-day precision such as hourly entries in your time dimension, your data type might be a DATETIME2 which would be 6 bytes in this scenario, compared to a BIGINT which is 8 bytes.
The Code
So enough about why I do it. Here’s how I go about doing it.
Create and populate a LookupNumbers table
Firstly I create a table to store a lot of numbers.
CREATE TABLE [dbo].[LookupNumbers] ( [Number] INT NOT NULL )
Then I populate it with a lot of numbers.
-- Generate lots of numbers in sequence to allow us to efficiently generate dates for the time dimension IF NOT EXISTS (SELECT 1 FROM LookupNumbers) BEGIN PRINT 'Loading LookupNumbers' INSERT dbo.LookupNumbers (Number) VALUES (1); INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers; END
Create a table valued function to get all of the dates within a range
--===================================================================== --Created by: Chris Jenkins --Date: 6th September 2018 --Description: Gets all dates within the passed in range. Used to -- populate the time dimension. --===================================================================== CREATE FUNCTION [dbo].[udf_GetAllDatesInRange] ( @dStartDate DATE , @dEndDate DATE ) RETURNS TABLE AS RETURN (SELECT DATEADD(DAY,Number-1,@dStartDate) as TheDate FROM LookupNumbers WHERE Number <= DATEDIFF(DAY,@dStartDate,@dEndDate) + 1);
Now create the time dimension
CREATE TABLE [dbo].[DimDate] ( [DateKey] DATE NOT NULL, [FullDateName] VARCHAR(30) NOT NULL, [CalendarYear] SMALLINT NOT NULL, [CalendarQuarterKey] INT NOT NULL, [CalendarQuarterNumber] TINYINT NOT NULL, [CalendarQuarterShortName] CHAR(2) NOT NULL, [CalendarQuarterFullName] CHAR(9) NOT NULL, [CalendarMonthKey] INT NOT NULL, [CalendarMonthNumber] TINYINT NOT NULL, [CalendarMonthShortName] CHAR(3) NOT NULL, [CalendarMonthFullName] VARCHAR(9) NOT NULL, [CalendarMonthYearShortName] CHAR(8) NOT NULL, [CalendarMonthYearFullName] VARCHAR(14) NOT NULL, [CalendarWeekKey] INT NOT NULL, [CalendarWeekNumber] TINYINT NOT NULL, [CalendarWeekShortName] VARCHAR(4) NOT NULL, [CalendarWeekFullName] VARCHAR(7) NOT NULL, [CalendarWeekYearShortName] VARCHAR(9) NOT NULL, [CalendarWeekYearFullName] VARCHAR(12) NOT NULL, [CalendarDayOfMonthNumber] TINYINT NOT NULL, [CalendarDayOfMonthName] VARCHAR(4) NOT NULL, [CalendarDayOfWeekNumber] TINYINT NOT NULL, [CalendarDayShortName] CHAR(3) NOT NULL, [CalendarDayFullName] VARCHAR(9) NOT NULL )
Finally populate the time dimension
--================================================================================= --Created by: Chris Jenkins --Date: 6th September 2018 --Description: Regenerates all data for the time dimension. This is easier and -- cleaner than checking and maintaining existing data. We just clear -- and repopulate the table. It should complete almost instantaneously. --================================================================================= CREATE PROCEDURE [dbo].[usp_PopulateDimDate] AS SET NOCOUNT ON; DECLARE @nFututeYearsToPopulate INT = 1 -- This can be passed in or retrieved from lookup table , @nPastYearsToRetain INT = 5; -- This can be passed in or retrieved from lookup table --Get the first day of this year and then take off @nPastYearsToRetain to get the start date DECLARE @dStartDate DATE = DATEADD(YEAR, -@nPastYearsToRetain, DATEADD(YEAR, DATEDIFF(YEAR,0,SYSDATETIME()), 0)); --Get the last day of this year and then add @nFututeYearsToPopulate to get the end date DECLARE @dEndDate DATE = DATEADD(YEAR, @nFututeYearsToPopulate, DATEADD(YEAR, DATEDIFF(YEAR,0,SYSDATETIME()) + 1, -1)); BEGIN TRY BEGIN TRAN TRUNCATE TABLE dbo.DimDate; INSERT INTO [dbo].[DimDate] ([DateKey] ,[FullDateName] ,[CalendarYear] ,[CalendarQuarterKey] ,[CalendarQuarterNumber] ,[CalendarQuarterShortName] ,[CalendarQuarterFullName] ,[CalendarMonthKey] ,[CalendarMonthNumber] ,[CalendarMonthShortName] ,[CalendarMonthFullName] ,[CalendarMonthYearShortName] ,[CalendarMonthYearFullName] ,[CalendarWeekKey] ,[CalendarWeekNumber] ,[CalendarWeekShortName] ,[CalendarWeekFullName] ,[CalendarWeekYearShortName] ,[CalendarWeekYearFullName] ,[CalendarDayOfMonthNumber] ,[CalendarDayOfMonthName] ,[CalendarDayOfWeekNumber] ,[CalendarDayShortName] ,[CalendarDayFullName]) SELECT TheDate AS DateKey , DATENAME(DAY, TheDate) + CASE WHEN RIGHT(DATENAME(DAY, TheDate), 1) = '0' THEN 'th ' WHEN DATENAME(DAY, TheDate) IN ('11', '12', '13') THEN 'th ' WHEN RIGHT(DATENAME(DAY, TheDate), 1) = '1' THEN 'st ' WHEN LEN(TheDate) = 2 THEN 'th ' WHEN RIGHT(DATENAME(DAY, TheDate), 1) = '2' THEN 'nd ' WHEN RIGHT(DATENAME(DAY, TheDate), 1) = '3' THEN 'rd ' ELSE 'th ' END + DATENAME(MONTH, TheDate) + ' ' + DATENAME(YEAR, TheDate) AS FullDateName , DATEPART(YEAR, TheDate) AS CalendarYear , DATENAME(YEAR, TheDate) + DATENAME(QUARTER, TheDate) AS CalendarQuarterKey , DATEPART(QUARTER, TheDate) AS CalendarQuarterNumber , 'Q' + DATENAME(QUARTER, TheDate) AS CalendarQuarterShortName , 'Quarter ' + DATENAME(QUARTER, TheDate) AS CalendarQuarterFullName , DATENAME(YEAR, TheDate) + CONVERT(VARCHAR, REPLACE(STR(DATEPART(MONTH, TheDate),2),SPACE(1), '0')) AS CalendarMonthKey , DATEPART(MONTH, TheDate) AS CalendarMonthNumber , LEFT(DATENAME(MONTH, TheDate), 3) AS CalendarMonthShortName , DATENAME(MONTH, TheDate) AS CalendarMonthFullName , LEFT(DATENAME(MONTH, TheDate), 3) + ' ' + DATENAME(YEAR, TheDate) AS CalendarMonthYearShortName , DATENAME(MONTH, TheDate) + ' ' + DATENAME(YEAR, TheDate) AS CalendarMonthYearFullName , DATENAME(YEAR, TheDate) + CONVERT(VARCHAR, REPLACE(STR(DATEPART(WEEK, TheDate),2),SPACE(1), '0')) AS CalendarWeekKey , DATEPART(WEEK, TheDate) AS CalendarWeekNumber , 'WK' + DATENAME(WEEK, TheDate) AS CalendarWeekShortName , 'Week ' + DATENAME(WEEK, TheDate) AS CalendarWeekFullName , 'WK' + DATENAME(WEEK, TheDate) + ' ' + DATENAME(YEAR, TheDate) AS CalendarWeekYearShortName , 'Week ' + DATENAME(WEEK, TheDate) + ' ' + DATENAME(YEAR, TheDate) AS CalendarWeekYearFullName , DATEPART(DAY, TheDate) AS CalendarDayOfMonthNumber , DATENAME(DAY, TheDate) + CASE WHEN RIGHT(DATENAME(DAY, TheDate), 1) = '0' THEN 'th' WHEN DATENAME(DAY, TheDate) IN ('11', '12', '13') THEN 'th' WHEN RIGHT(DATENAME(DAY, TheDate), 1) = '1' THEN 'st' WHEN LEN(TheDate) = 2 THEN 'th' WHEN RIGHT(DATENAME(DAY, TheDate), 1) = '2' THEN 'nd' WHEN RIGHT(DATENAME(DAY, TheDate), 1) = '3' THEN 'rd' ELSE 'th' END AS CalendarDayOfMonthName , DATEPART(WEEKDAY, TheDate) AS CalendarDayOfWeekNumber , LEFT(DATENAME(WEEKDAY, TheDate), 3) AS CalendarDayShortName , DATENAME(WEEKDAY, TheDate) AS CalendarDayFullName FROM dbo.udf_GetAllDatesInRange (@dStartDate, @dEndDate) as c; COMMIT TRAN END TRY --If there have been any failures we'll jump into the catch block. --Rollback the transaction to ensure everything is as it was before --we started and show the error. BEGIN CATCH IF (@@TRANCOUNT > 0) BEGIN ROLLBACK TRAN; END SELECT ERROR_MESSAGE() AS ErrorMsg; END CATCH
Usage
The intended usage is that the usp_PopulateDimDate stored procedure is executed as often as needed. I’ve tested it in a number of scenarios with a number of date ranges and it usually repopulates the time dimension in less than a second. I usually have it set up so that you pass in the date range or you fetch it from a global variables table.
You don’t need to do anything other than re-running that procedure. Just make sure you don’t clear your LookupNumbers table.
This is just one of many ways to do the same or similar things, but I hope you find it useful. You’ll find all of the scripts used above in a zip file in the resources section of the website here.
[…] https://bidesign.uk/data-warehousing-tip-7/, https://bidesign.uk, http://www.sqlservercentral.com/blogs/chris-jenkins-blog/, […]
[…] https://bidesign.uk/data-warehousing-tip-7/, https://bidesign.uk, http://www.sqlservercentral.com/blogs/chris-jenkins-blog/, […]
A couple of things. First, I use a smallint rather than an int for my date dimension keys. Smallints are smaller than dates. I would strongly suggest never integrating a date and time dimension together. The row count would explode. They should be kept separate in almost any normal use case. I can’t think of an exception, but I’m sure there is one. I always make my joins. The database engine uses this information, and I have not seen them cause cube performance issues. Most cubes only have one date dimension. But I’ve designed several accumulating snapshots that have quite a few dates.
Hi Ron. It’s really good to hear how you implement your time dimensions and you make some very good points. You are quite right that a smallint would be absolutely fine for the vast majority of time dimensions where you have one record per day, so only 365 records per year. That’s a lot of years…
It’s only if you have a warehouse that you’re querying directly that the option of not joining to the time dimension can be of benefit. The time dimension is central to cubes though so I’d never consider circumventing it there.
Forgot to add that I use an integer field without an identity. I populate it with a date difference value from Day 0 of the data warehouse. I used to use an identity, but this provides more flexibility.
Hi Ron. That’s a good way of using a integer key and still being able to tear down and rebuild your time dimension (if you choose to) and retain your existing links in your fact tables. Thanks again for sharing.