SQL union vs cross join

Friday 10 May 2013 @ 6:08 am

I often have reports that require me to include multiple copies of the same records. Normally this means using a command object in Crystal, rather than going directly to the tables. And in most cases I write a UNION query which appends one set of records to itself multiple times. But last month I had a report that required 24 copies of the same 2-year dataset.

I started testing the performance and found that getting a single copy of the data took nearly 5 minutes, partly because the data was coming from a view. I then found that adding a UNION of a second copy of the data added another 4 minutes to the query, and each additional UNION added 4 more minutes. It would take nearly 2 hours to get 24 copies of the data, so I had to scrap the UNION idea.

Then I wondered if using a CROSS JOIN might be faster. A CROSS JOIN is when you add a table to a report, but you leave it unlinked. Normally this is a very bad thing to do but it is handy when you need duplicate data.

So I found an unrelated table that had a column of consecutive numbers and wrote a query that selected the numbers from 1 to 7. Then I added the SQL for that small query as a cross-join in my command. The result was 7 copies of every record in the dataset, retrieved in about 5 minutes. Even when I scaled it up to generate all 24 copies of the data, it only took about 7 minutes total. Obviously much better than 2 hours for the UNION.

Now maybe this would be obvious to someone who works primarily writing SQL, but the magnitude of the difference was a surprise to me. I will be using CROSS JOINS whenever possible, now.









Leave a Reply

Recrystallize Pro