[With apologies to Edsger Dijkstra.]
We have a vendor database at work to which we’ve built some custom integrations. It is a document management system. One of the columns in the main table contains object names - folders, files, users, whatever - they store all types of objects in one big table.
If the folder is a “customer-level” folder then it contains the customer’s name followed by the customer id in parentheses (I didn’t design any of this!) There is no direct access to the customer id by itself, so I wrote some SQL that looks to see if a customer folder already exists using the customer id1. It looked like this:
SELECT TOP 1 CASE WHEN Id IS NULL THEN 0 ELSE Id END AS Id, Name FROM Objects WITH(NoLock) WHERE Name LIKE '%(' + CAST(@custno AS NVARCHAR(10)) + ')%'
This has been getting slower and slower as the database grows in size, and was taking 10-15 seconds per run. Not good.
Changing it to use the
PATINDEX function instead of
LIKE didn’t help at all, nor did I expect it to.
Then I changed the wildcard search to instead do an “ends with” search:
SELECT TOP 1 CASE WHEN Id IS NULL THEN 0 ELSE Id END AS Id, Name FROM Objects WITH(NoLock) WHERE RIGHT(Name, LEN('(' + CAST(@custno AS NVARCHAR(10)) + ')')) = '(' + CAST(@custno AS NVARCHAR(10)) + ')'
This ran in under a second. I have now rolled it into production.
In retrospect, this result doesn’t surprise me. What did surprise me was the order of magnitude difference.
Since it is a doc management system it would be fair to ask, “Why isn’t the customer number stored as metadata on the folder?” The answer is, “It is,” but because of the way the vendor’s database is designed, it actually takes longer to look up objects by using the metadata than using the approach above!↩