From: SnapDive on 13 Apr 2010 16:01 SQL Server 2008, server and database are both set to default Latin CI collation. Some characters are in there that are not valid for consuming systems. I would like to write a SQL-Update that will replace the varchar value with a value with a "restricted collation" like ASCII 7-bit so characters not on a standard US keyboard will be removed. How can I do that? Thanks.
From: Jeroen Mostert on 14 Apr 2010 01:34 On 2010-04-13 22:01, SnapDive wrote: > SQL Server 2008, server and database are both set to default Latin CI > collation. Some characters are in there that are not valid for > consuming systems. I would like to write a SQL-Update that will > replace the varchar value with a value with a "restricted collation" > like ASCII 7-bit so characters not on a standard US keyboard will be > removed. > > How can I do that? > You can't define a custom collation; the only real solution is to write a function that filters the value and use that in triggers or CHECK constraints: CREATE FUNCTION dbo.filter_non_ascii(@s VARCHAR(MAX)) RETURNS VARCHAR(MAX) WITH SCHEMABINDING AS BEGIN DECLARE @result VARCHAR(MAX) = ''; DECLARE @i INT = 1; DECLARE @c INT; WHILE @i <= LEN(@s) BEGIN SET @c = ASCII(SUBSTRING(@s, @i, 1)); SET @result += CASE WHEN @c > 127 THEN '?' ELSE CHAR(@c) END; SET @i += 1; END; RETURN @result; END; Writing INSTEAD OF triggers is a pain; I recommend checking and rejecting the value (CHECK mycolumn = dbo.filter_non_ascii(mycolumn)) instead of filtering it as part of updating/inserting, forcing the clients to filter the value as they see fit before passing it to the database. This also minimizes surprises. T-SQL is none too fast at string manipulation, so if you're passing around huge strings this might become a bottleneck. You may want to consider using a CLR function in that case, but don't do this as a case of premature optimization, as this involves some nontrivial work. If you are limiting the range of characters to ASCII because you are going to use the strings as machine-readable data, you may want to consider using Latin1_General_BIN2 for the column's collation. The binary collation does bitwise comparisons of strings (by code point, rather) and can offer a performance benefit. Beware that strings must then match by exact characters, no variation in case is allowed. -- J.
|
Pages: 1 Prev: Limiting a column to certain values Next: Create Adjancency Model From Flat Data? |