"Checking "NOT NULL" in SQL Server Dynamically"
In Extract Transform & Load (ETL) operation, we come across scenario's where the target column where we are pushing data is mandatory (not null) column. But often due to bad data in source \ lack of business rules to map a particular column to target, we end up trying to put NULL in columns which are mandatory and the procedure \ ETL fails.
In approach below, I have tried to put forward a solution that will keep the ETL process running, even in case of such scenario's and these problematic rows can be later retrieved and sent back to business.
"Steps" :-
1) Define all the "NOT NULL" columns in table with some default values when you create a table.
Primarily data types can be broadly classified in 3 types (in SQL Server)
- String ('char','varchar','text','nchar','nvarchar','ntext','binary','varbinary','image')
- Numeric ('bit','tinyint','smallint','int','bigint','decimal','numeric','smallmoney','money','float','real')
- Date ('datetime','datetime2','smalldatetime','date','time','datetimeoffset','timestamp')
The default's to be assigned are as follows
- String ('#' is the default value)
- Numeric (-1) (can be tweaked as per your requirement, as in your case -1 can be a valid value)
- Date ('01-Jan-1753') (this is the earliest date that you can use in SQL Server)
2) Create the objects using the defaults mentioned above.
create table Profile.Test_Defaults ( ID INT NOT NULL , tmp_date date NOT NULL default '01-Jan-1753' , tmp_str varchar(10) NOT NULL default '#' , tmp_num INT NOT NULL default -1 )
Insert data in table
insert into Profile.Test_Defaults (ID) select (1) insert into Profile.Test_Defaults (ID) select (2) insert into Profile.Test_Defaults (ID,tmp_date,tmp_str, tmp_num) select 3,'01-Jan-1900','Test Str',2
As seen data for ID = 1 & 2 is problematic in nature.
3) Use Dynamic SQL to fetch all the data in all tables.
IF OBJECT_ID('tempdb..#SQL_Temp') IS NOT NULL DROP TABLE #SQL_Temp select Base.TABLE_NAME ,CONCAT( case when Base.Col_Order = 1 then concat ('SELECT * FROM ' ,Base.TABLE_SCHEMA,'.' ,Base.TABLE_NAME, ' WHERE ') else '' end ,case when Base.Col_Order = 1 and Base.DATA_TYPE in( 'datetime','datetime2','smalldatetime','date','time','datetimeoffset','timestamp') then concat(CONCAT(' CAST(',Base.COLUMN_NAME,' AS DATE )'), ' = ') when Base.Col_Order = 1 then concat(Base.COLUMN_NAME, ' = ') when (Base.Col_Order <> 1 and Base.DATA_TYPE in( 'datetime','datetime2','smalldatetime','date','time','datetimeoffset','timestamp')) then CONCAT(' OR ',CONCAT(CONCAT(' CAST(',Base.COLUMN_NAME,' AS DATE )'), ' = ')) else concat(' OR ', Base.COLUMN_NAME, ' = ') end ,case when Base.DATA_TYPE in( 'char','varchar','text','nchar','nvarchar','ntext','binary','varbinary','image') then '''#''' when Base.DATA_TYPE in( 'datetime','datetime2','smalldatetime','date','time','datetimeoffset','timestamp') then '''01-Jan-1753''' when Base.DATA_TYPE in ('bit','tinyint','smallint','int','bigint','decimal','numeric','smallmoney','money','float','real') then '-1' end ) SQl_CODE ,Base.Col_Order into #SQL_Temp FROM ( select Col.TABLE_NAME ,Col.COLUMN_NAME ,Col.DATA_TYPE ,Col.TABLE_SCHEMA ,ROW_NUMBER() OVER (PARTITION BY col.TABLE_NAME ORDER BY col.TABLE_NAME, col.ORDINAL_POSITION ASC) Col_Order from INFORMATION_SCHEMA.COLUMNS Col where Col.TABLE_SCHEMA = 'Profile' and col.TABLE_CATALOG = 'TransformHome' and IS_NULLABLE = 'NO'
) Base
Now when you do
Select * from #SQL_Temp;
As seen above the only operation remaining is to combine these multiple columns in a single row, which is done be piece of code below.
SELECT SQL_Out.TABLE_NAME, SQL_CODE =STUFF ( ( SELECT ' ' + CAST(SQL_IN.SQl_CODE AS VARCHAR(MAX)) FROM #SQL_Temp SQL_IN WHERE SQL_IN.TABLE_NAME = SQL_Out.TABLE_NAME order by SQL_IN.TABLE_NAME,SQL_IN.Col_Order FOR XML PATH('') ),1,1,'' ) FROM #SQL_Temp SQL_Out
GROUP BY SQL_Out.TABLE_NAME;
The output look like this.
When you run the peice of code generated above.
SELECT * FROM Profile.Test_Defaults WHERE ID = -1 OR CAST(tmp_date AS DATE ) = '01-Jan-1753' OR tmp_str = '#' OR tmp_num = -1
We get the following output.
The piece of code above will generate similar queries for all tables mentioned in clause.
where Col.TABLE_SCHEMA = 'Profile' and col.TABLE_CATALOG = 'TransformHome' and IS_NULLABLE = 'NO'
The main advantage of this is that the PROCEDURE \ ETL process does not stop or fail in between due to NULL value.
But being said that the target system might have default data , which may not be desirable in every situation. So depending on your scenario the solution can be tweaked.
Hope you found this useful, do let me know your thought.