From: Volker Jordan on 7 Jun 2010 08:19 Hi NG, is there a way to select nvarchar fields, that have non latin characters ? Do I have to use UNICODE() and search for high numbers ? Regards Volker Jordan
From: Erland Sommarskog on 7 Jun 2010 09:00 Volker Jordan (v_jordan(a)web.de) writes: > is there a way to select nvarchar fields, that have non latin characters ? > > Do I have to use UNICODE() and search for high numbers ? That or an expression with patindex: SELECT ... FROM tbl WHERE patindex('%[' + nchar(nnn) + '-' + nchar(65535) + ']%', col COLLATE Latin1_General_BIN2) This assumes that there is a code point at which the Latin characters "ends". I will have to admit that I don't know whether is such a simple limit. It also depends on what characters you really want to find. Is LATIN CAPIAL LETTER N WITH GRAVE (used in Pinyin) a "Latin" character by your standards? In any case, for these kind of patterns it is a good idea to force a binary collation so that the ranges works you would expect. A possibility to explore is what support the .Net Framework offers in this regard. If .Net Fx includes function for classification of characters, you could write a CLR function for the task. -- Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
From: Volker Jordan on 7 Jun 2010 10:16 Am Mon, 07 Jun 2010 13:00:49 +0000 schrieb Erland Sommarskog: > Volker Jordan (v_jordan(a)web.de) writes: >> is there a way to select nvarchar fields, that have non latin >> characters ? >> > > SELECT ... > FROM tbl > WHERE patindex('%[' + nchar(nnn) + '-' + nchar(65535) + ']%', > col COLLATE Latin1_General_BIN2) > > In any case, for these kind of patterns it is a good idea to force a > binary collation so that the ranges works you would expect. > That looks very good, but why do I have to set a binary collate ? Regards Volker
From: Erland Sommarskog on 7 Jun 2010 17:39 Volker Jordan (v_jordan(a)web.de) writes: >> SELECT ... >> FROM tbl >> WHERE patindex('%[' + nchar(nnn) + '-' + nchar(65535) + ']%', >> col COLLATE Latin1_General_BIN2) >> >> In any case, for these kind of patterns it is a good idea to force a >> binary collation so that the ranges works you would expect. >> > > That looks very good, but why do I have to set a binary collate ? I tried to explain that in my post, but consider this example: create table #data (a nvarchar(20) NOT NULL) go insert #data (a) VALUES ('RABARBER') insert #data (a) VALUES ('EWIGKEIT') insert #data (a) VALUES ('Wissenschaft') go SELECT * FROM #data WHERE a LIKE '%[a-z]%' SELECT * FROM #data WHERE a COLLATE Latin1_General_BIN2 LIKE '%[a-z]%' go DROP TABLE #data The desire is to find words with lowercase letters in them. But you will find that the first select returns all three words. This is because the range a-z relates to the collation so it goes aBbCc...z. If you force a binary collation, you use the actual ASCII range. To avoid such surprises, you should specify a binary collation. I should also add that for your Unicode quest, it is important that you use a BIN2 collation. The BIN collations are quire weird. -- Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se Links for SQL Server Books Online: SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
From: Volker Jordan on 8 Jun 2010 08:46 Am Mon, 07 Jun 2010 23:39:22 +0200 schrieb Erland Sommarskog: > To avoid such surprises, you should specify a binary collation. I should > also add that for your Unicode quest, it is important that you use a > BIN2 collation. The BIN collations are quire weird. Many thanks !
|
Pages: 1 Prev: How to group the column in SQL Next: Writing from SQL Server to a UNC path |