Dealing with duplicate records or “table inflation”

Thursday 2 July 2009 @ 8:50 pm

This week I had 4 different customers ask about the same problem – duplicate records in a report.  After explaining the process I use to troubleshoot this for the fourth time I decided it might make a good BLOG article.

1) The first thing I do is check the joins to see if one of them is obviously wrong or only partially complete.  A partial join is a join that needs 2 or more fields linked, but where only some of the needed links were made.  For instance if you have an accounting system that allows multiple companies you may need to link two fields to identify the vendor – Company ID and Vendor ID.  This is because Vendor #7 might be a different person in one company than in the other.  Or maybe to link a GL account you need both the account and the subaccount.  If you only make one of these links then your link isn’t specific enough and you will get multiple irrelevant matches for each record.

2) If that doesn’t solve it I start a brand new report to test all the links and identify which table is causing the duplicates.  I start the report with just the primary table and put on just enough fields to confirm that there aren’t any duplicates there.  Then I add another table to the report, link it to the first, and add one or two fields from that table to the report.  After the fields from each new table are added I check to see if that the duplicates have appeared.  If so then I know that this table link is the source of the problem.  (Note – it is very important to add at least one field from each new table to the layout of the report.  Otherwise Crystal won’t incorporate that table into the query, and you don’t really know if the link is valid or not.)

3) Once you identify the  table that is the source of the duplicates you have to determine if the duplicates are irrelevant records caused by an incomplete join, or if these are valid records in an intentional one-to-many table relationship.  For instance if you link payments to charges you may find that a single charge can actually have more than one payment.  Linking the charge to multiple payments would make the charge appear multiple times, as if it were a duplicate.  But this is an intentional part of the database design.

Once you know which table is causing the duplicates you can sample a few fields from that table to see if the  data in each the row corresponds with the existing columns of the report.  If all the matching records in the inflating table match up with the prior data then you probably have an intentional one-to-many relationship. But if some (or all) of the data in the inflating table does NOT match, you probably have a bad or incomplete link.

This should get you started and may even solve the problem.  If you need help troubleshooting duplicates or totaling the intentional duplicates, give me a call.

Leave a Reply

Recrystallize Pro