"Checking "NOT NULL"? in SQL Server Dynamically"?

"Checking "NOT NULL" in SQL Server Dynamically"

In Extract Transform & Load (ETL) operation, we come across scenario's where the target column where we are pushing data is mandatory (not null) column. But often due to bad data in source \ lack of business rules to map a particular column to target, we end up trying to put NULL in columns which are mandatory and the procedure \ ETL fails.

In approach below, I have tried to put forward a solution that will keep the ETL process running, even in case of such scenario's and these problematic rows can be later retrieved and sent back to business.

"Steps" :-

1) Define all the "NOT NULL" columns in table with some default values when you create a table.

Primarily data types can be broadly classified in 3 types (in SQL Server)

  • String ('char','varchar','text','nchar','nvarchar','ntext','binary','varbinary','image')
  • Numeric ('bit','tinyint','smallint','int','bigint','decimal','numeric','smallmoney','money','float','real')
  • Date ('datetime','datetime2','smalldatetime','date','time','datetimeoffset','timestamp')

The default's to be assigned are as follows

  • String ('#' is the default value)
  • Numeric (-1) (can be tweaked as per your requirement, as in your case -1 can be a valid value)
  • Date ('01-Jan-1753') (this is the earliest date that you can use in SQL Server)

2) Create the objects using the defaults mentioned above.

create table  Profile.Test_Defaults
(
   ID INT NOT NULL
 , tmp_date date  NOT NULL default '01-Jan-1753'
 , tmp_str varchar(10) NOT NULL default '#'
,  tmp_num INT NOT NULL default -1
  

)

Insert data in table

insert into Profile.Test_Defaults
(ID)
select 
(1)




insert into Profile.Test_Defaults
(ID)
select 
(2)




insert into Profile.Test_Defaults
(ID,tmp_date,tmp_str, tmp_num)
select 
3,'01-Jan-1900','Test Str',2

As seen data for ID = 1 & 2 is problematic in nature.

No alt text provided for this image

3) Use Dynamic SQL to fetch all the data in all tables.

IF OBJECT_ID('tempdb..#SQL_Temp') IS NOT NULL 
DROP TABLE #SQL_Temp


select
 Base.TABLE_NAME
,CONCAT(	
	case
		when Base.Col_Order = 1 then concat ('SELECT * FROM ' ,Base.TABLE_SCHEMA,'.' ,Base.TABLE_NAME, ' WHERE ')
		else ''
	end	
	,case
			when Base.Col_Order = 1 and Base.DATA_TYPE in( 'datetime','datetime2','smalldatetime','date','time','datetimeoffset','timestamp') then concat(CONCAT(' CAST(',Base.COLUMN_NAME,' AS DATE )'), ' = ')
			when Base.Col_Order = 1 then concat(Base.COLUMN_NAME, ' = ')
			when (Base.Col_Order <> 1 and Base.DATA_TYPE in( 'datetime','datetime2','smalldatetime','date','time','datetimeoffset','timestamp')) then CONCAT(' OR ',CONCAT(CONCAT(' CAST(',Base.COLUMN_NAME,' AS DATE )'), ' = '))
			else concat(' OR ', Base.COLUMN_NAME, ' = ')
	end		
	,case
		when Base.DATA_TYPE in( 'char','varchar','text','nchar','nvarchar','ntext','binary','varbinary','image') then '''#'''
		when Base.DATA_TYPE in( 'datetime','datetime2','smalldatetime','date','time','datetimeoffset','timestamp') then '''01-Jan-1753'''
		when Base.DATA_TYPE in ('bit','tinyint','smallint','int','bigint','decimal','numeric','smallmoney','money','float','real') then '-1'
	end 	
) SQl_CODE
,Base.Col_Order
into #SQL_Temp
FROM
(
	select 
		 Col.TABLE_NAME
		,Col.COLUMN_NAME
		,Col.DATA_TYPE
		,Col.TABLE_SCHEMA
		,ROW_NUMBER() OVER (PARTITION BY col.TABLE_NAME ORDER BY col.TABLE_NAME, col.ORDINAL_POSITION ASC) Col_Order
	from INFORMATION_SCHEMA.COLUMNS Col
	where 
		Col.TABLE_SCHEMA = 'Profile'
		and col.TABLE_CATALOG = 'TransformHome'
		and IS_NULLABLE = 'NO'
) Base

Now when you do

Select * from #SQL_Temp;
No alt text provided for this image

As seen above the only operation remaining is to combine these multiple columns in a single row, which is done be piece of code below.

SELECT SQL_Out.TABLE_NAME,  
SQL_CODE =STUFF  
(  
     (  
       SELECT 
	   ' ' + CAST(SQL_IN.SQl_CODE AS VARCHAR(MAX))  
       FROM #SQL_Temp SQL_IN   
       WHERE SQL_IN.TABLE_NAME = SQL_Out.TABLE_NAME	   
       order by SQL_IN.TABLE_NAME,SQL_IN.Col_Order   
	   FOR XML PATH('')  	   
     ),1,1,''  	 
)  
FROM #SQL_Temp SQL_Out 
GROUP BY SQL_Out.TABLE_NAME; 

The output look like this.

No alt text provided for this image

When you run the peice of code generated above.

SELECT * FROM Profile.Test_Defaults WHERE ID = -1  OR  CAST(tmp_date AS DATE ) = '01-Jan-1753'  OR tmp_str = '#'  OR tmp_num = -1

We get the following output.

No alt text provided for this image

The piece of code above will generate similar queries for all tables mentioned in clause.

where 
		Col.TABLE_SCHEMA = 'Profile'
		and col.TABLE_CATALOG = 'TransformHome'
		and IS_NULLABLE = 'NO'

The main advantage of this is that the PROCEDURE \ ETL process does not stop or fail in between due to NULL value.

But being said that the target system might have default data , which may not be desirable in every situation. So depending on your scenario the solution can be tweaked.

Hope you found this useful, do let me know your thought.

要查看或添加评论,请登录

Swanand Marathe的更多文章

社区洞察

其他会员也浏览了