How to reorder tables to improve performance

Monday 17 June 2019 @ 12:20 pm

So say you have four tables A, B, C and D. A joins to B, B joins to C and C joins to D. If all the tables are required in the results (i.e. you are using inner joins) you can theoretically use 4 different link configurations that should give you the exact same output. If you start with A or D the joins would be in a straight line (ABCD or DCBA).  If you start with B or C you would get a fork, like B to A and B to C with C linking to D. But even though the results will be the same, the performance could be dramatically different. So how do you decide which pattern is most efficient?

There isn’t a simple answer that works in every case, so testing is important. However, there are two places I look that often help: the indexed fields and the WHERE clause fields. You can often see the indexed fields in the linking window (colored tabs) or you can ask someone who knows the database what the indexes are on each table. To see the WHERE clause fields go to the database menu in Crystal and select “Show SQL Query”. The fields mentioned in the WHERE clause should match your record selection formula.  If they don’t you may need to tweak the formula so that the criteria can translate into SQL.

Indexed fields:
When linking you want your join to go TO indexed fields and ideally to ALL the fields in that index. So say Tables A and B are linked on two fields from each table. And say that these four fields all have red index tabs. But table B has a third field with a red tab and that field isn’t part of the join. That would mean you should link from B to A. This uses the complete index in A, which is more efficient than linking to the partial index in B.
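To make the rule concrete, here is a small Python model (not Crystal syntax) of the decision. It assumes, for simplicity, that the join fields have the same names in both tables, and the field names Field1 to Field3 are hypothetical. It only checks index coverage and says nothing about actual query cost, so testing is still needed.

```python
from __future__ import annotations


def covers_full_index(join_fields: set[str], index_fields: set[str]) -> bool:
    """True if the join uses every field in the target table's index."""
    return index_fields <= join_fields


def preferred_target(join_fields: set[str], indexes: dict[str, set[str]]) -> str | None:
    """Return the one table whose index the join fully covers, else None."""
    covered = [table for table, idx in indexes.items() if covers_full_index(join_fields, idx)]
    return covered[0] if len(covered) == 1 else None


# Scenario from the text: A and B are linked on two fields,
# and B's index also contains a third field that is not in the join.
join = {"Field1", "Field2"}
indexes = {
    "A": {"Field1", "Field2"},            # the join covers the complete index
    "B": {"Field1", "Field2", "Field3"},  # only a partial index match
}
print(preferred_target(join, indexes))  # link TO A, i.e. from B to A
```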

And don’t assume that because B is sitting on the left that the join starts at B and goes to A. I always hit the “auto-arrange” button in the links window to confirm the direction of the joins. After hitting “auto-arrange” all the joins flow from left to right. If a join is backwards, you can right-click that join and select “reverse join”, then click “auto-arrange” again to confirm the new direction.

Here are some other posts where I discuss the effects of linking on indexes:
https://kenhamady.com/cru/archives/2923
https://kenhamady.com/cru/archives/2653

WHERE clause fields:
Now let’s also say that most of the WHERE clause criteria apply to the C table. I try to take the table with the most restrictive criteria and put it all the way to the left (or as far left as possible). That way the query starts out with the smallest data set possible and each subsequent join has fewer matches to find.

If the primary field in the WHERE clause is found in more than one table you get some flexibility. You can select the table that works best for indexing and then use the field from that table in the criteria.

So based on the above scenario I would recommend starting with table C. Then forking from C to both B and D, with a final link from B to A.
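The four possible patterns can be sketched in Python by orienting each link away from whichever table you start with. This is a model of join direction only, using the A-B-C-D chain from the example; it does not estimate which pattern will be fastest.

```python
from __future__ import annotations

from collections import deque

# The chain of links from the example: A-B, B-C, C-D.
LINKS = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}


def join_pattern(start: str, links: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Orient every link so it flows away from the starting table (breadth-first)."""
    joins, seen, queue = [], {start}, deque([start])
    while queue:
        table = queue.popleft()
        for neighbor in links[table]:
            if neighbor not in seen:
                seen.add(neighbor)
                joins.append((table, neighbor))  # join FROM table TO neighbor
                queue.append(neighbor)
    return joins


for start in "ABCD":
    print(start, join_pattern(start, LINKS))
```

Starting from C reproduces the pattern recommended above: C to B, C to D, then B to A.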

In some rare cases the indexed fields and the WHERE clause fields can’t both be optimized at the same time because they point in opposite directions. When that happens you have to test different join patterns to see which works best.

One last note. In most reports the order of the joins is obvious from the link pattern.  But if you look at the SQL and the links aren’t in the order you want, you might have to use the “order links” feature of the database expert.

 





The last resort when you need an extra pass

Saturday 8 June 2019 @ 2:07 pm

I had a request this week that sounded relatively simple on the surface. The data was a list of people, each with from 1 to 10 characteristic rows. They wanted me to assemble all the characteristics for each person into a single alphabetized string, and then count how many people had each string combination. This meant that I had to Group [by person], then Sort [by characteristic], then Group again [by the combined string]. That is one pass more than Crystal Reports can do.

My normal solution for this would be to do the first pass in the database using a SQL command. And I would have succeeded if the data had been in SQL Server or Oracle. The ROW_NUMBER() function and PARTITION BY clause I wrote about recently would have been part of the solution. But, alas, the data was in a classic MS Access MDB file.

After quite a bit of research I found a way to write a SQL command for MS Access that would do the job.  It worked in my test data, but it took hours to run on a normally sized sample of data.

So I offered the customer a relatively fast two-step approach, which is my last resort for getting an extra pass. This involves writing one report to do part of the work, then exporting the results to a spreadsheet, and finally creating a second report to create the final output from the spreadsheet data.

In this case the first report groups, sorts and assembles the string of characteristics for each person. This is exported to a spreadsheet as one column of data, with one row per person. Then the second report reads this spreadsheet and groups on that column and counts occurrences of each value. The process takes a few minutes.
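The logic of the two passes can be modeled in a few lines of Python. The sample rows are hypothetical, and the sort here is a plain case-sensitive alphabetical sort, which may differ from how Crystal sorts the characteristics in the first report.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (person, characteristic), 1 to 10 rows per person.
rows = [
    ("Ann", "Tall"), ("Ann", "Blue"),
    ("Bob", "Blue"), ("Bob", "Tall"),
    ("Cy", "Green"),
]

# Pass 1 (the first report): group by person, sort the characteristics,
# and join them into one string per person.
by_person = defaultdict(list)
for person, trait in rows:
    by_person[person].append(trait)
combined = {person: ", ".join(sorted(traits)) for person, traits in by_person.items()}

# Pass 2 (the second report): group on the combined string and count people.
counts = Counter(combined.values())
print(counts)
```

In the real solution the `combined` column is what gets exported to the spreadsheet, one row per person.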

One thing to note: this process is very simple if your export can use classic (XLS) spreadsheets. Crystal includes a native driver that can read tables in XLS files. But XLS files are limited to 65,536 (64K) rows.

The newer XLSX files can hold up to one million rows, and versions of Crystal since 2011 can export to XLSX files. But reading an XLSX file with Crystal requires that you have newer MS Office drivers that don’t come with Crystal Reports. To see if you have these drivers you can create a new OLEDB connection and look in the providers list for:

“Microsoft Office 12.0 Access Database Engine OLE DB Provider”

If you don’t see this provider listed you can download and install the drivers.





Server-based scheduler comparison (2019)

Monday 27 May 2019 @ 10:20 pm

I have just updated my comparison of server-based scheduling tools for 2019. These tools are similar to the desktop-based scheduling tools I write about every March, but these are designed to run on a server. This allows multiple people to schedule reports for automated delivery by Email, FTP or network folder.

There are 11 products on the list this year and a few feature updates and price changes. The blog page provides a brief overview of each product. It also has a link to the feature matrix that compares roughly 70 features of these tools. There is even a feature glossary that defines all the terms. So if you need a short course in automating Crystal Reports delivery, this is a pretty good place to start.





Using the ExtractString function

Friday 24 May 2019 @ 7:49 pm

I recently found a function in a Crystal Report that I hadn’t noticed before. Technically it is an additional function (UFL) but I am pretty sure it has been installed automatically with Crystal Reports for a long time. The file is dated 11/8/2000 so it might have been introduced with Crystal Reports V8.

The function is called ExtractString (). It is designed to locate two character strings within a longer string and return all the characters in between those two strings. I have done something similar using the InStr() function but it is much more complicated. A good example of a use for this is when you have XML tags in the middle of a long memo field and you want to extract the value between two specific tags. Say it looks something like this:

“blah blah blah <price>19.99</price> blah blah blah <price2>29.99</price2> blah blah blah “

To extract the value for price2 you would use the following formula:

ExtractString ( {table.xmlField}, '<price2>', '</price2>' )

You give the function three arguments:

  • The field you are searching
  • The string that marks the start
  • The string that marks the end

  • If it doesn’t find the start string it returns a blank (even if the end string is there).
  • If it finds the start string but no end string it returns everything after the start string.
  • If it finds both the start and the end it returns all the characters between them.
  • It skips over any end strings that occur before the start string.
  • If it finds multiple starts or ends it uses the first.
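Here is a Python model of ExtractString based only on the rules listed above. It is a sketch for reasoning about edge cases, not the UFL’s actual implementation; anything those rules don’t cover, such as case sensitivity, is an assumption here.

```python
def extract_string(text: str, start: str, end: str) -> str:
    """Model of ExtractString(text, start, end) following the rules listed above."""
    begin = text.find(start)
    if begin == -1:
        return ""                    # no start string: blank, even if the end exists
    begin += len(start)
    finish = text.find(end, begin)   # only look for the end AFTER the start
    if finish == -1:
        return text[begin:]          # start but no end: everything after the start
    return text[begin:finish]        # first start, first end after it


memo = "blah blah blah <price>19.99</price> blah blah blah <price2>29.99</price2> blah blah blah "
print(extract_string(memo, "<price2>", "</price2>"))
```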

Next time I have to parse XML or do something similar I will use this function and save myself a few steps.





Impossible link in Pervasive SQL (Elliot)

Tuesday 14 May 2019 @ 3:14 pm

I have written before about databases that take selection criteria from Crystal and then use the wrong index so that valid records are missing from the results. The solution is to write the criteria so those rules don’t make it into the SQL WHERE clause. Crystal can then apply that criteria locally so it is done correctly.

But today I ran into a similar problem that didn’t have a simple solution. I was creating a report to read Elliot data. Elliot is what used to be called Macola Accounting. We were connecting to a Pervasive SQL DB using an ODBC connection. We were trying to link the Item file to a second instance of the Item file to get a list of components for manufactured items. What we found is that when we joined the component ID from instance one to the part ID in instance two, the results would not return a single match between the two tables.

Looking at the tables separately showed the matching data was there. And when I tried to filter to a single PartID the results would not find that ID. This is when I realized that we had an index problem like the one I described above.

So I looked at the index tabs in the Database Expert and noticed that this table had two red index tabs, meaning there were two fields in the primary index. The tabs were on Item Number and Sequence Number. We were linking from a table where there was a component item number but there was not a component sequence number. It appears that Pervasive SQL defaults to using the primary index for ODBC joins, even if the fields you are using for the join don’t completely match the fields in the index. So the link will fail every time. I even unchecked the option “use indexes or server for speed” to see if that would help, but it didn’t have any effect.

We were lucky that the table we were using had an equivalent view. We linked this view to itself and we were finally able to find matching records. I assume this worked because this view, like most views, is not indexed.





Group sort reverses ascending and descending

Wednesday 8 May 2019 @ 5:13 pm

I solved another occasional mystery today.

Crystal allows you to put your groups in order based on the summary fields that exist for that group. This feature is in the “Report” menu under the label “Group Sort”. So if you group by customer and subtotal sales for each customer you can put the customers in order based on their sum. You can rank the customers this way in either ascending or descending order.

You are allowed to use any type of summary field for group sorting, and on occasion I have used a summary that is based on a date field. This could be the first (minimum) order date for each customer or the last (maximum) order date for each customer. I have noticed that sometimes when I rank a group based on one of these date summaries, Ascending seems to behave like Descending or vice versa. For instance I might pick Descending and expect the groups with the latest dates to be first. Sometimes they are and sometimes they aren’t. Before today I had never taken the time to figure out what was going on. Now I know.

If your summary is the Maximum of a date field, like the last order date for each customer, then setting the group sort to ascending or descending will behave in the expected way. Ascending will put the groups with the earliest summary dates first and descending will put the groups with the latest dates first.

But if your summary is the Minimum of a date field, like the first order date for each customer, then group sorting for that field will work in reverse. Ascending will put the groups with the latest dates first and descending will put the groups with the earliest dates first. To get the groups to go the way you want you just have to pick the opposite direction.

A couple of notes:
1) This doesn’t happen with a minimum summary of numbers or strings, just a minimum of dates.
2) You can get the same summary value as the minimum by doing the Nth smallest (with N = 1). If you use the Nth Smallest summary for group sorting it does the ascending/descending the normal way, not reversed like the minimum function.





Using the function DrillDownGroupLevel

Tuesday 30 April 2019 @ 4:42 pm

Crystal has a function that calculates the drill-down level of the current window. This allows you to have objects and sections format differently at specific levels of drill-down. The function is called DrillDownGroupLevel and it gives you a numeric value that tells you the current drill-down level for any preview tab. The normal preview is level 0, the first drill-down is level 1, etc. So if you want a section with column headings to appear at the second level of drill-down and ONLY at the second level of drill-down you can use the following as a suppression formula for that section:

DrillDownGroupLevel <> 2

That section will be suppressed in normal preview (drill-down level 0) and also at the first drill-down level, but will appear if you drill down again from level 1. If you are ever unsure which level of drill-down you are on you can write a formula that says simply:

DrillDownGroupLevel

Place this formula in the section in question. When you see that section the number shown will tell you the current drill-down level.

DrillDownGroupLevel also helps you solve one of the dilemmas of report design.  One of the lessons in my advanced class is how to have a report be either a detailed report or a summary report, based on a parameter prompt. You accomplish this with a parameter that has two choices (Summary/Detail) and you use that parameter in the detail section’s suppress formula. It has to be the suppress property because the “hide” property doesn’t have a condition formula button. And, because you are using the suppress property you can no longer drill-down from the summary version of the report to see the details for a group.

But by using the following as your suppress formula you can have the parameter to choose summary or detail, and still have drill-down available on the summary version:

{?Parameter} = "Summary" and
DrillDownGroupLevel <> 1

This will suppress the section when you choose “Summary” while you are in the main preview tab (level 0). As soon as you drill down to level 1 the suppress condition is no longer met and the details become visible.
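To check the formula at each drill-down level, here is a Python mirror of the suppress condition. The level numbers stand in for the values DrillDownGroupLevel would return. Note that, as written, the condition is true again at level 2, which only matters in reports with more than one group level.

```python
def suppress_details(parameter: str, drill_level: int) -> bool:
    """Python mirror of: {?Parameter} = "Summary" and DrillDownGroupLevel <> 1"""
    return parameter == "Summary" and drill_level != 1


for choice in ("Summary", "Detail"):
    for level in (0, 1, 2):
        state = "suppressed" if suppress_details(choice, level) else "visible"
        print(f"{choice:8} level {level}: details {state}")
```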





CurrentDate vs DataDate

Tuesday 23 April 2019 @ 11:20 pm

There are two ways to use today’s date in a Crystal formula. These are useful when you want the report to automatically determine a date range, like the last three days. You can calculate the date for three days ago by using one of the following calculations:

CurrentDate - 3

or

DataDate - 3

These expressions will generate the same value at the time the report is refreshed. But if you interact with reports after refreshing them, or if you open reports with saved data, it is important to know how these functions differ.

CurrentDate is identical to the functions PrintDate and Today. PrintDate is also a special field. These date functions will all update whenever one of three things happens. When the report is:

1) opened
2) previewed after a modification.
3) printed

Interestingly, exporting a report to a PDF does NOT update these dates.

DataDate is a function and is also a special field. Both are updated when the report is refreshed.

So let’s say I ran a report yesterday, saved it with data yesterday and then reopened it today without refreshing it. The CurrentDate function will show today’s date while the DataDate function will show yesterday’s date.

You can see how these functions could affect record selection. If I calculate criteria to include the last 3 days using the CurrentDate function, and then reopen the report later with saved data, the CurrentDate will change and so will the criteria. The saved data will be reduced or eliminated. However, if I wrote the same criteria using DataDate there is no change because I have not refreshed the report. So when deciding to use one of these functions you should think about how the formula should respond when you reopen the report with saved data. If it should use the date when the report is opened, then use CurrentDate. If it should use the date when it was refreshed, then use DataDate.
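A small Python model shows the difference. It assumes a hypothetical order date field and that the selection formula is re-evaluated against the saved data when the report is reopened, as described above; the dates are made up.

```python
from __future__ import annotations

from datetime import date, timedelta


def last_three_days(saved_dates: list[date], reference: date) -> list[date]:
    """Models criteria like {Orders.OrderDate} >= <reference date> - 3."""
    cutoff = reference - timedelta(days=3)
    return [d for d in saved_dates if d >= cutoff]


refreshed_on = date(2019, 4, 20)  # what DataDate keeps returning until the next refresh
reopened_on = date(2019, 4, 22)   # what CurrentDate returns when the file is reopened
saved = [date(2019, 4, day) for day in range(17, 21)]  # Apr 17-20, saved with the report

print("DataDate criteria keeps", len(last_three_days(saved, refreshed_on)), "records")
print("CurrentDate criteria keeps", len(last_three_days(saved, reopened_on)), "records")
```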

Also note that there are corresponding time functions (CurrentTime, PrintTime, DataTime) and special fields with the same names. The time functions follow the same pattern as the date functions.





Hiding part of your criteria from the database

Tuesday 16 April 2019 @ 5:21 pm

I first wrote about this issue a decade ago when it showed up in databases like Paradox, Btrieve and Visual dBASE. But I recently saw it twice in PostgreSQL environments so I am going to write about it again.

The problem occurs when you add a new rule to the selection formula. The main symptom is that the criteria works fine when you click “use saved data” but then when you refresh you lose some valid records. Some people think they have a bad join which can also cause you to lose records, but a bad join will lose records even if you click “use saved data”.

In my experience, the issue is usually connected with an index in the database.  When Crystal Reports sends your new rule to the database, the database tries to use an index to speed things up. But some indexes don’t work correctly for filtering and will cause the report to miss some or all valid records. The solution is to prevent that rule from being sent to the database by forcing the rule to be applied locally. There are three approaches I have used:

1) Starting with CR 2008 (v12) you can put this rule into the “Saved Data” selection formula. This selection formula is always applied locally so the database usually doesn’t see it. Crystal will apply this selection formula to the records that are sent back by the database.

2) The first option may not work in all cases, and of course it won’t work if you use an older version of Crystal that doesn’t have that feature. In that case you have to use the method I wrote about in 2007. You write the selection formula in a way that forces Crystal to apply that rule locally. If the report is SQL based you want to prevent Crystal’s SQL generator from converting that rule into the SQL WHERE clause. The most reliable way I have found is by putting the database field inside a Crystal function. For instance if your problem rule uses a numeric field in the selection formula like this:

{table.Amount} > 50

I would write it as

Round({table.Amount},2) > 50

Or if your selection formula is:

{Transaction.Code} = "ABC"

I would write it as

Left({Transaction.Code},3) = "ABC"

Applying a function to a database field in the selection formula will usually prevent that rule from being converted into the SQL.  In some cases I write the rule as a separate formula field called {@Criteria} and then reference {@Criteria} in the selection formula by adding a line to the selection formula like:

…. and {@Criteria}

Note that this is the exact opposite of what you normally want to do. Normally we want ALL of the record selection formula to be sent to the database or converted into the SQL WHERE clause. It makes your query more efficient. When I have a slow report one of the first things I check is for functions in the selection formula. But it doesn’t help to be efficient if the database can’t handle the rule correctly.

3) I recently had one case where neither of the above options solved the problem. No matter how I wrote the criteria formula, as soon as it was part of the selection formula I would lose records.  So we gave up on selection and used suppression instead.  This is a last resort.  You don’t use the {@Criteria} formula in the selection formula at all.  Without the rule the report will include some records that you don’t want. You then suppress the unwanted records with a suppression formula like this:

not {@Criteria}

When you use this method you also have to make sure these records aren’t included in any totals. This means writing a formula that only includes the records that you want and then doing all totals on these formulas. For instance I could write a formula like this:

if  {@Criteria}
then {table.Amount}
else 0

If I total this formula it won’t matter that I have suppressed records in the report. I don’t see them and they aren’t in the totals.
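The difference between suppressing rows and totaling a conditional formula can be modeled in Python. The records and the rule standing in for {@Criteria} are hypothetical; the point is that a plain total still counts the suppressed rows while the conditional formula does not.

```python
# Hypothetical records; criteria() stands in for the {@Criteria} formula.
records = [
    {"code": "ABC", "amount": 60},
    {"code": "XYZ", "amount": 40},
    {"code": "ABC", "amount": 25},
]


def criteria(rec: dict) -> bool:
    return rec["code"] == "ABC"


# Suppression: unwanted rows are still in the data, just not displayed.
visible = [r for r in records if criteria(r)]

# Conditional formula: if {@Criteria} then {table.Amount} else 0
conditional_total = sum(r["amount"] if criteria(r) else 0 for r in records)
naive_total = sum(r["amount"] for r in records)  # still includes suppressed rows

print(conditional_total, naive_total)
```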





Solving problems when reporting on CSV files

Monday 8 April 2019 @ 6:39 pm

You can create Crystal Reports that directly read CSV files using the Access/Excel(DAO) connection.  Just keep in mind that CSV files don’t always make an ideal data source. Like spreadsheets, CSV files don’t have set data types for each column. This can cause data type ambiguity which might cause you to lose some data. And in some cases the report will read the first record of the CSV file as the column headings, removing the first record of data from the dataset.

But here CSV files have one advantage over XLS files. CSV files allow you to introduce a Schema.ini file to define the data type for each column in the CSV. This is something you can’t do with spreadsheets. The schema.ini file is a simple text file that sits in the same folder as your CSV. There are many attributes available in schema.ini, but you only need to include the ones your file requires. The other attributes will be set based on defaults stored in the registry. Here are the most common problems I find that can be solved with a schema.ini file.

  1. The CSV is reading the first row of data as column headings
  2. Columns read as the wrong data type
  3. Character column is read as numeric and shows only the numeric values
  4. The columns are parsed using the wrong character

Here is an example of a schema.ini that defines two different CSV files in the same folder:

[sample1.csv]
Format=CSVDelimited
ColNameHeader=False

[sample2.csv]
Format=CSVDelimited
ColNameHeader=False
Col1=OrderDate date
Col2=Amount long
Col3=CustID text
Col4=CustName text
Col5=CustCategory text

As you can see, a single INI file can define multiple CSV files when they are in the same folder. Each file gets its own section of the INI file. Both files are set to be read as comma delimited. Both files are set to NOT treat the first row as column headings. In the first file we allow the driver to name the fields (usually A, B, C, etc) and determine the data type automatically. In the second file we name each column and assign each a data type.
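If you have many CSV files to describe, a short script can write the sections for you. This is a hypothetical Python helper, not part of Crystal or the driver, and it only produces the attributes used in the example (Format, ColNameHeader and ColN). Save the output as schema.ini in the folder that holds the CSV files.

```python
from __future__ import annotations


def schema_section(csv_name: str, columns: list[tuple[str, str]] | None = None,
                   has_header: bool = False) -> str:
    """Build one [file] section; columns are (name, type) pairs in file order."""
    lines = [
        f"[{csv_name}]",
        "Format=CSVDelimited",
        f"ColNameHeader={has_header}",  # renders as True or False
    ]
    for number, (name, col_type) in enumerate(columns or [], start=1):
        lines.append(f"Col{number}={name} {col_type}")
    return "\n".join(lines)


schema_ini = "\n\n".join([
    schema_section("sample1.csv"),  # let the driver name and type the columns
    schema_section("sample2.csv", [
        ("OrderDate", "date"), ("Amount", "long"), ("CustID", "text"),
        ("CustName", "text"), ("CustCategory", "text"),
    ]),
]) + "\n"

print(schema_ini)
```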

So if you are reporting on a CSV file and running into issues, using a Schema.ini file may help solve the problem.




