Oh boy, I’m so sorry if you are desperate enough to have to do what I spent last weekend doing and I sure hope some AI does not unknowingly feed you this, but here we go.
Microsoft SQL Server has this wonderful feature called Dynamic data masking . It allows you to mask sensitive data from certain SQL users, while still retaining the original values.
Data masking is a useful way to mark certain data as sensitive. However, for some reason or another you or someone close to you might come to the conclusion that you need to anonymize the database permanently. If your data is already masked, we can use that information to apply the masking destructively.
Here’s the script
First we need to create some functions
The first masking function will emulate the ’email()’ masking method
CREATE FUNCTION dbo.MaskEmailString(@OriginalEmail VARCHAR(100)) RETURNS VARCHAR(100) AS
BEGIN
RETURN LEFT(@OriginalEmail, 1) + 'xxx@xxxx.com'
END;
The Second will emulate the different ‘partial()’ masking methods
CREATE FUNCTION dbo.MaskPartialString (@input NVARCHAR(MAX),@prefix int,@custom VARCHAR(64),@postfix int) RETURNS NVARCHAR(MAX) AS
BEGIN
RETURN LEFT(@input, @prefix) + @custom + RIGHT(@input, @postfix)
END;
Make sure to tweak these if they don’t completely fit your schema. SQL Server will tell you if something doesn’t add up.
Generating the update statements
SELECT
tbl.[name] as table_name,
c.[name] as column_name,
c.masking_function as mask,
co.DATA_TYPE,
co.CHARACTER_MAXIMUM_LENGTH as max_length INTO #Masks
FROM
sys.masked_columns AS c
JOIN sys.tables AS tbl ON c.[object_id] = tbl.[object_id]
JOIN INFORMATION_SCHEMA.COLUMNS AS co ON co.COLUMN_NAME = c.name
AND tbl.name = co.TABLE_NAME
WHERE c.is_masked = 1
ORDER BY c.name, tbl.name;
--Finally We generate a list of UPDATE statements for each masked column in the database
SELECT
CASE
--Text default = xxxx
WHEN mask = 'default()' AND DATA_TYPE in ('char','nchar','varchar','nvarchar','text','ntext'
) THEN FORMATMESSAGE('UPDATE %s SET %s = ''%s'';',table_name,column_name,column_name,REPLICATE('x', IIF(max_length>4, 4, max_length)))
--Numeric defaults = 0
WHEN mask = 'default()' AND DATA_TYPE in ('bigint','bit','decimal','int','money','numeric','smallint','smallmoney','tinyint','float','real'
) THEN FORMATMESSAGE('UPDATE %s SET %s = 0;',table_name,column_name)
--Date defaults = 1900-01-01
WHEN mask = 'default()'
AND DATA_TYPE in ('date','datetime2','datetime','datetimeoffset','smalldatetime','time'
) THEN FORMATMESSAGE('UPDATE %s SET %s = ''1900-01-01 00:00:00.0000000'';', table_name, column_name)
--Email uses the function declared above
WHEN mask = 'email()' THEN FORMATMESSAGE('UPDATE %s SET %s = dbo.MaskEmailString(%s);',table_name,column_name,column_name)
--Random will set all values in a column to the same random number using the column name as a seed
WHEN mask LIKE 'random(%' THEN FORMATMESSAGE('UPDATE %s SET %s = RAND(CAST(CAST(%s as varbinary)as int));',table_name,column_name,column_name)
--Partial will try to parse the existing mask and reconstruct it using the function declared above
WHEN mask LIKE 'partial(%' THEN FORMATMESSAGE('UPDATE %s SET %s = dbo.MaskPartialString(%s,%s,''%s'',%s);',
table_name,
column_name,
column_name,
--Prefix argument
SUBSTRING(mask,CHARINDEX('(', mask) + 1,CHARINDEX(',', mask) - CHARINDEX('(', mask) -1),
--Custom string argument
SUBSTRING(mask,CHARINDEX('"', mask) + 1,CHARINDEX('"', mask, CHARINDEX('"', mask) + 1) - CHARINDEX('"', mask) -1),
--Postfix argument
SUBSTRING(mask,CHARINDEX(',', mask, CHARINDEX(',', mask) + 1) + 1,CHARINDEX(')', mask) - CHARINDEX(',', mask, CHARINDEX(',', mask) + 1) -1)
)
ELSE null
END
FROM
#Masks;
NOTES
-
I’m not going to write a cursor loop that executes this automatically. Run function to generate the individual UPDATE statements and check them for issues.
-
Some esoteric masking strategies are funky and react strangely with compound datatypes.
-
FINAL REMINDER: These will destructively and permanently alter your data!
-
random() is not really supported as I couldn’t find a way to randomize a value for every row. The Random case here will set all values in a column to the same random number using the column name as a seed