Data-type changes in Excel.

Thursday 6 June 2013 @ 8:59 am

People love to work with Excel, and Crystal does a decent job of reading Excel tables. But there is one huge headache that I have had repeatedly over the years. Excel columns don’t have a data type so Crystal (or more specifically the Excel driver) has to guess the data types of each column. The problems start when you have columns with mix of data types, usually a string column that has some numeric values mixed in. Crystal has to decide (guess) the data type of the column based on the data that it sees. Once it decides on a type, all values that are NOT that type become null.

The problem for me has been that I could never find the rules for the decision. But with a bit of research and testing I now feel I have a handle on it. The Excel driver (by default) will scan the first 8 rows of the spreadsheet and each column will be assigned the data-type of the majority of the values in those 8 rows (excluding the header row). Ties go to numeric. So if the first 8 data rows have 5 numbers and 3 strings, the column will be called a numeric and all strings are NULL.

The problem is usually that people need ALL the values and they want them all to be strings. Unfortunately the only way to get this is to change the data in the column. I did some testing today and found that you can’t simply format the column to TEXT format. The format doesn’t change the value and the Excel driver ignore the format. What worked consistently for me was to write the following formula:

=""&A2

In other words I took the problem column and created a new column that appended an empty string to the front of it. This fixes the numerics by converting them to a string but doesn’t change the strings that are already working as strings.

The same problem comes up when reading text files. To fix in text files you can pull into Excel and use the method above, or you can create a schema.ini file in the same directory as the CSV and use that to define the data types of the fields.

Here are some related links for working with Excel or schema.ini:

http://msdn.microsoft.com/en-us/library/windows/desktop/ms709353(v=vs.85).aspx
http://support.microsoft.com/kb/210073
http://support.microsoft.com/kb/257819
http://scn.sap.com/thread/907537








2 Responses to 'Data-type changes in Excel.'

  1. Ken Hamady - June 6th, 2013 at 10:04 pm

    I received the following comment from Bob Antaki of the Bay Area News Group:

    In Excel, the way that I convert numbers (for example, account numbers that are really string values in the database, even though the values are all numeric digits) is to:

    – Select the whole column in Excel (or select cells in one column).
    – In the Data tab or menu, select “Text to Columns”.
    – In the Wizard window step 1, accept the default and click Next.
    – In the Wizard window step 2, click Next.
    – In the Wizard window step 3, for “Column data format”, select the “Text” radio button, and click Finish.

    I use this so often, the shortcut keys are in my finger memory: Alt-D, E, Enter, Enter, T, Enter

    Similarly, when opening a text file in Excel, you get should get the Text Import Wizard, which is basically the same as the Wizard above. In step 3, you can force one or more column to be text in Excel by selecting the columns, one at a time or several adjacent columns, and then clicking the “Text” radio button. The column heading will then change from “General” to “Text”. You can also use this process to force a column to be a date value.

    You can also use Text to Columns to split one column into multiple columns, based on a character in the column. Be sure that there are empty columns on the right to hold all the splits.

    For instance, if you have values that are in the format Last Name, comma, First Name, then you can split the column into two separate columns, using the comma as the delimiter character. Assuming that there is a maximum of one comma per cell, then you would need one empty column to the right. Or you can split columns based on fixed width.

    If I am going to use an Excel file as a data source in CR, especially if I am going to run the report multiple times with different data, then I:

    – Create a “dummy file” that the report will use as the data source.
    – Row 1 contains column headings.
    – Row 2 contains “dummy” sample data with the format that I want the column to have. For example, “x” for text, 0.00 for numbers, or an Excel date for dates, etc.
    – Save the report. This way CR sets the data types.
    – Before running the report with actual data, copy the “real” data from the source spreadsheet into the dummy spreadsheet (or a copy). I usually use Copy Special -> Values (shortcut Alt-E, S, V), in case the cell formatting in the source spreadsheet is different than in the dummy spreadsheet. You could delete the row 2 dummy data, but I leave the row in the worksheet, and filter it out in the record selection formula.
    – Save the spreadsheet.
    – Run the report.

  2. Ken Hamady - July 29th, 2013 at 12:54 pm

    Jame Becker sent another suggestion:

    If this is something that you have to do a lot or if you do a lot of work in Excel, I would recommend looking into ASAP Utilities. The plug-in has a quick function that will change a highlighted column from number values to text values by inserting an apostrophe in front. While I wouldn’t necessarily buy the add-in for only this feature, it is an excellent Excel tool with a lot of functions that is very reasonably priced with a free trial (free for students or charitable organizations).


Leave a Reply

Recrystallize Pro