Friday 13 May 2011

SSIS: Excel Source Causes NULLs

I've come across a strange issue using SQL2008 recently trying to import data from an Excel datasource. Essentially, i was using the Unpivot task in order to import some legacy time series data to a SQL table. Some of the data in the columns was incomplete for the oldest dates so some cells were blank early on and this is where the problems came. Essentially, for those columns with "leading" blanks, I was getting no data at all for any of the dates once it had been unpivotted. Using the dataviewer I was able to establish that the problem was with the Excel Source component.

Hopefully, the screenshots will show the issue:

Excel Source

Note here how there are leading blanks in 2 of the columns.



SSIS Package

Nothing fancy in this package. Just a simple task to unpivot that data into a normalised table.



Datagrid View

But look what happens when we run it. The Datagrid View shows us that the 3s in ColC have been replaced by NULLs. This will have ramifications for the data as these will not be imported into the database.



So why has this happened? The key is with the behaviour of the Excel driver. Essentially, it will guess the datatypes of the values in the columns based upon what is in the first 8 rows. If it there is a conversion error then the value will be exposed as a NULL and this is exactly what is happening in this example. The values in ColB are unaffected as there are values in the sampled 8 rows whereas ColB has blanks for the first 8 rows and it assumes that anything after this will be Text.

You can read about the behaviour here where it tells you of the workaround which is to:

"add IMEX=1 to the value of Extended Properties in the connection string of the Excel connection manager in the Properties window"

Example:

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\SSIS\TestExcelSource.xls;Extended Properties="EXCEL 8.0;HDR=YES";

No comments:

Post a Comment

/* add this crazy stuff in so i can use syntax highlighter