
If you’ve escaped string parsing thus far, count thyself lucky. I’ve seen some numbing scripts from pre-SQL Server 2016 when STRING_SPLIT was introduced. I think the fact this month’s T-SQL Tuesday is entirely on string parsing says enough about how many ways there are to paint this particular cat.
I’ve been using the IMDb’s public dataset since my early days of presenting. AdventureWorks is great, but why not have demos with movie data I can make fun graphics about Lord of the Rings with? Having used IMDb the website all the time I figured I knew what sort of information I was going to get. What I did not expect was this file of writers and directors by title despite the existence of a cast file that also contains, surprise surprise, writers and directors.

This is a different though overlapping list that must now by normalized to flesh out my cast table. I’ve loaded the file to TitletoDirectorWriter matching the file schema one-to-one. Foreshadowing dictates using STRING_SPLIT() to solve this problem.
My initial exposure to STRING_SPLIT() was when parsing a comma delimited list of values for reporting.
DECLARE @ItemTypes nvarchar(max) = 'Reverse,Skip,Draw4,Draw2,WildCard'

I also grew up learning CROSS APPLY and OUTER APPLY were of the devil and not to be used, but over the last few years I’ve started to see the light. They have a place, and this is one of them.
SELECT tdw.TitleId [TitleId]
,Writer.Value [PersonId]
,WriterPersonIds
,'Writer' [Category]
FROM TitleToDirectorWriter tdw
CROSS APPLY STRING_SPLIT(WriterPersonIds, ',') [Writer]
WHERE WriterPersonIds IS NOT NULL
UNION ALL
SELECT tdw.TitleId [TitleId]
,Director.Value [DirectorId]
,DirectorPersonIds
,'Director' [Role]
FROM TitleToDirectorWriter tdw
CROSS APPLY STRING_SPLIT(DirectorPersonIds, ',') [Director]
Where DirectorPersonIds IS NOT NULL
ORDER BY TitleId
We get the following results with some potentially unexpected behavior.

SQL Server doesn’t like making promises, especially when it comes to returning ordered data. SELECT TOP 1 Name FROM YourTable with no order by, and you’re gonna have a bad time. STRING_SPLIT() is no different, or at least it was. SQL Server 2022 brought to us a third argument enable_ordinal. Now if we adjust our script to include the original column from our string split with the extra argument passed in we can guarantee the order of our results.
SELECT tdw.TitleId [TitleId]
,Writer.Value [PersonId]
,Writer.ordinal
,WriterPersonIds
,'Writer' [Category]
FROM TitleToDirectorWriter tdw
CROSS APPLY STRING_SPLIT(WriterPersonIds, ',', 1) [Writer]
WHERE WriterPersonIds IS NOT NULL
UNION ALL
SELECT tdw.TitleId [TitleId]
,Director.Value [DirectorId]
,Director.ordinal
,DirectorPersonIds
,'Director' [Role]
FROM TitleToDirectorWriter tdw
CROSS APPLY STRING_SPLIT(DirectorPersonIds, ',', 1) [Director]
WHERE DirectorPersonIds IS NOT NULL
ORDER BY TitleId
,ordinal

Here’s to October and many more T-SQL Tuesdays.
Random Fun Fact: Plastic Tupperware containers are most easily closed by pressing down firmly in the center, not fussing along the outside lip of the lid.

Leave a comment