Dashes and strange sorting

Thursday 6 February 2014 @ 10:34 pm

I noticed something odd the other day when sorting/grouping a character column. The column was date values stored as strings in the format Year-Month-Day. When I looked at the data I noticed that some values had dashes and some did not, like this:

2013-10-30
20131031
20131101
2013-11-02
20131103
2013-11-04
20131105
20131106

What surprised me was that they were still sorting in chronological order, as if the dashes didn’t exist. I had never seen that before. Since I like my sorts to be consistent and predictable I did some quick testing to help me understand what was happening.

I discovered that if you sort or group on a database field – and even on some simpler formulas – the database driver will add an “ORDER BY” to the SQL query. This means the database will do the sorting on the server, and many databases ignore dashes when they sort columns. This has something to do with Unicode “word sort” rules, but the main point is that it ignores dashes.

When you sort or group on most formulas fields the database driver does NOT add an “ORDER BY”. In those cases CR does the sort locally. So I wrote a formula like this and used it as a sort:

Left ({Table.Field} , 12)

I have found that formulas that use the LEFT() or RIGHT() function never pass any logic to the SQL. The value looks identical to the original field, but my sort order was now like this with dashes sorting ahead of numbers:

2013-10-30
2013-11-02
2013-11-04
20131031
20131101
20131103
20131105
20131106

No, this isn’t chronological but it is what I would expect when the data is not formatted consistently. But there was still one last surprise. When the dashes were between alpha characters even the local sort in CR would ignore the dashes. So a local sort recognizes dashes between numbers but not between alpha characters. My formula using the LEFT() function wouldn’t help with alpha characters. I got a sort like this:

2013-10-30
2013-11-02
2013-11-04
20131031
20131101
20131103
20131105
20131106
ABA
AB-B
ABCD
AB-Q

So if you ever need a sort that always recognizes the dashes you will have to use the REPLACE() function to replace dashes with another  character that has a similar ASCII value. For example, replacing all dashes with plus signs [+] makes it sort consistently. The formula would look like this:

Replace ( {Table1.Field1} ,'-','+')

You would still display the original field but use the formula as the sort field. The sort would end up in this order:

2013-10-30
2013-11-02
2013-11-04
20131031
20131101
20131103
20131105
20131106
AB-BB
AB-QB
ABABA
ABCDE

I found two links that discuss this.  One from StackOverflow and the other from Microsoft.  If anyone has any more information about sort anomalies related to Unicode, drop me a line.

(For examples of my most popular formulas, please visit the FORMULAS page on my website.)







Leave a Reply

Jeff-Net
Recrystallize Pro

Crystal Reports Server