Thursday, November 12, 2015

SQL ID Extraction from String

Suppose we have a requirement that we have to extract numbers from a set of alphanumeric strings.E.g. the input will be

ID StringValue
1 My Address is #456, 13th Main, 16th Cross, Area code - 560009
2 My first mobile number:91161181100. Second mobile number:- 1111111111. 

Third one is
: 1212121212


The respective output will be
ID CommaSeperatedValues
1 456,13,16,560009
2 91161181100,1111111111,1212121212


Inorder to solve this problem,first we will split the strings with space(' ') as the delimeter as shown under

DECLARE @T TABLE(ID INT IDENTITY,StringValue VARCHAR(500))
INSERT INTO @T 
SELECT 'My Address is #456, 13th Main, 16th Cross, Area code - 560009' UNION ALL
SELECT 'My first mobile number:91161181100. Second mobile number:- 1111111111. Third one is: 1212121212'

;WITH SplitCTE AS(
SELECT 
 F1.ID, 
 X.SplitData 
 ,Position = PATINDEX('%[0-9]%', X.SplitData)
FROM (
 SELECT *, 
 CAST(''+REPLACE(StringValue,' ','')+'' AS XML) AS XmlFilter 
 FROM @T F
 )F1
 CROSS APPLY
 ( 
 SELECT fdata.D.value('.','varchar(50)') AS SplitData 
 FROM f1.xmlfilter.nodes('X') AS fdata(D)) X
 )
 SELECT *
 FROM SplitCTE
The output is as expected
ID SplitData Position
1 My   0
1 Address   0
1 is   0
1 #456   2
1    0
1 13th   1
1 Main   0
1    0
1 16th   1
1 Cross   0
1    0
1 Area   0
1 code   0
1 -   0
1 560009   1
2 My   0
2 first   0
2 mobile   0
2 number:91161181100. 8
2 Second   0
2 mobile   0
2 number:-  0
2 1111111111.  1
2 Third   0
2 one   0
2 is:   0
2 1212121212  1

Now, it is quite silmple to figure out the set of numeric and alphanumeric strings by using the

PATINDEX function as shown under

WHERE PATINDEX('%[0-9]%', SplitData) > 0


The output will be
ID SplitData Position
1 #456   2
1 13th   1
1 16th   1
1 560009   1
2 number:91161181100. 8
2 1111111111.  1
2 1212121212  1

The next task is to find out the numeric values which can be done by using the following SQL Code snippet

SELECT ID ,AllNumeric = LEFT(SUBSTRING(SplitData, Position, LEN(SplitData)), PATINDEX('%[^0-9]%', SUBSTRING(SplitData, Position, LEN(SplitData)) + 't') - 1)   
FROM SplitCTE

The output will be
ID AllNumeric
1 456
1 13
1 16
1 560009
2 91161181100
2 1111111111
2 1212121212
So now as a final step, we need to use the FORXMLPATH and group by the items by using the ID

SELECT ID,STUFF(( SELECT ',' + c1.AllNumeric
    FROM numericCTE c1
    WHERE c1.ID = c2.ID
    FOR XML PATH(''),TYPE)
    .value('.','NVARCHAR(MAX)'),1,1,'') AS CommaSeperatedValues
FROM numericCTE c2
GROUP BY ID

The result
ID CommaSeperatedValues
1 456,13,16,560009
2 91161181100,1111111111,1212121212

No comments:

Post a Comment