Turning Point-in-Time Data Into Time Series
Journey through time while standing still.
February 3, 2018
Introduction
Recently at work I needed to turn different sets of “point-in-time” (PiT) data into time series. Specifically, given rows in a SQL Server database that represent the current state of an entity, how can we show how long that entity has been or was “in process?”
I am sure it is not original, but since I figured this out independently, I thought I would document it here.
Assumptions
The PiT data has to have at least three columns for this technique to work:
Created on - a column indicating the date and time at which the entity was originally created.
Modified on - a column indicating the date and time at which the entity was last altered.
Status - some sort of code indicating whether the entity is still “open” (in-flight, in-process) or else “closed” (finished, expired, cancelled, etc.) This doesn’t have to be a simple yes/no or open/closed status, as we will see.
There are many business entities that have that sort of data - e.g., service tickets in a help desk system, opportunities in a CRM system, loan applications, etc.
Also, we will need a simple table with a series of dates in it. See the next section for that.
Create a Sequence of Dates
I can’t take credit for this - I ripped off the original logic from the accepted answer at this StackOverflow post. However, I just wanted a fixed range, and I chose from January 1, 1980, well before our business data begins, until December 31, 2100, long after I am dead (I will let this be part of someone’s Y2100 problem).
First I created a view:
CREATE VIEW [dbo].[ExplodeDates]
AS
WITH
N0 AS (SELECT 1 as n UNION ALL SELECT 1)
,N1 AS (SELECT 1 as n FROM N0 t1, N0 t2)
,N2 AS (SELECT 1 as n FROM N1 t1, N1 t2)
,N3 AS (SELECT 1 as n FROM N2 t1, N2 t2)
,N4 AS (SELECT 1 as n FROM N3 t1, N3 t2)
,N5 AS (SELECT 1 as n FROM N4 t1, N4 t2)
,N6 AS (SELECT 1 as n FROM N5 t1, N5 t2)
,nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS num FROM N6)
SELECT DATEADD(day,num-1,'1980-01-01') AS thedate
FROM nums
WHERE num <= DATEDIFF(day,'1980-01-01','2100-12-31') + 1
Note that the above generates the given date range of over 44,000 rows fast - under a second (because there is no I/O involved). However, after some testing it seemed like the view would be too slow (not much the SQL optimizer can do with it), so then I created a materialized view and populated it as follows:
CREATE TABLE dbo.ExplodedDates
(
TheDate DATETIME NOT NULL,
CONSTRAINT PK_ComplianceChecklist PRIMARY KEY CLUSTERED
(
TheDate ASC
)
)
GO
INSERT INTO ExplodedDates SELECT * FROM ExplodeDates
Now we have a simple, keyed table with all the dates from 1/1/1980 through 12/31/2100. Cool.
Convert PiT Rows to Time Series
For example purposes I’ll use opportunities from Microsoft CRM. However, the concept remains the same regardless of your entity, as long as it has the three columns I mentioned above, i.e., “created on,” “modified on,” and “status.” Let’s create and populate the time series table for opportunities.
CREATE TABLE CurrentOpportunityStatus
(
StartedOn DATETIME NOT NULL,
LastTouched DATETIME NOT NULL,
OpportunityId UNIQUEIDENTIFIER NOT NULL,
CONSTRAINT PK_CurrentOpportunityStatus PRIMARY KEY CLUSTERED
(
StartedOn ASC,
LastTouched ASC,
OpportunityId ASC
)
)
GO
INSERT INTO CurrentOpportunityStatus
SELECT
FO.createdon AS StartedOn,
CASE FO.statecode
WHEN 0 THEN GETDATE() -- Open
WHEN 1 THEN FO.modifiedon -- Won
WHEN 2 THEN FO.modifiedon -- Lost
ELSE GETDATE() -- Unknown state code
END AS LastTouched,
FO.opportunityid
FROM Your_MSCRM.dbo.FilteredOpportunity FO WITH(NOLOCK)
Note that we use the current date and time if the opportunity is still open, otherwise we used the modifiedon
value to indicate when it was closed. For some other entities I have done this for, the CASE
statement can get quite long, but at the end, the logic remains:
For “open” status codes, the
WHEN
returns the current date and time.For “closed” status codes, the
WHEN
returns the last modified date and time.
That’s all there is to it.
How Many X Were Open on Any Given Date?
The whole reason I went down this path was then to be able to chart how many of each entity type was open on any given date. That is where the ExplodedDates
table comes in to play. For each entity, simply create a view similar to the following:
CREATE VIEW TimeSeriesOfOpportunities
AS
SELECT TOP 10000000 -- TOP used to get ORDER BY in a view
TheDate,
COUNT(*) AS OpenOpps
FROM ExplodedDates ED
INNER JOIN CurrentOpportunityStatus O ON
ED.TheDate >= O.StartedOn
AND ED.TheDate <= O.LastTouched
GROUP BY TheDate
ORDER BY TheDate
Basically, for each date in the date range, if is is on or after the opportunity’s “created on” date and it is before or on the “last touched” date (either “modified on” or today’s date), then it is counted as “open” on that day. This yields a series like:
TheDate OpenOpps
1/1/2017 1455
1/2/2017 1455
1/3/2017 1456
1/4/2017 1453
1/5/2017 1463
1/6/2017 1465
1/7/2017 1465
...lots more rows...
1/27/2018 899
1/28/2018 899
1/29/2018 899
1/30/2018 828
1/31/2018 816
2/1/2018 810
2/2/2018 801
2/3/2018 781
Conclusion
Creating the above takes very little time for each business entity. Then, you can start showing graphs that show counts of the various types of entities in your business that are active by date. It is very handy for dashboards and the like.