Prev: very slow SQL client
Next: SQL Server Output Buffer
From: Erland Sommarskog on 5 Dec 2009 05:19 Jay (spam(a)nospam.org) writes: > OK, I'll bite. When is it good to use a binary collation? When you care more about performance than case-sensitivity. As Jeroen indicates, it may not be the best choice for the server default or database default, but say that you have this query: select * from persons where email like '%abd%' and there are a lot of persons. It does not help if there is an index on email, due to the initial wildcard. But not only does SQL Server have to scan the index, it has to scan the full string for all rows as well. The comparison rules for Unicode are quite complex, so this is costly. If you force a binary collation: select * from persons where email COLLATE Latin1_General_BIN2 LIKE '%abd%' You can gain speed with a factor of 10. Email addresses are case- insensitive, but it is perfectly acceptable to force them to lowercase when you save them. Notes: 1) If you have an SQL collation and email varchar, this is almost as good. For varchar, an SQL collation only have 255 characters to deal with, and comparisons are a lot simpler. 2) For even faster solutions to the problem get this book: http://www.sqlservermvpdeepdives.com/ and read chapter 17. If you buy this book you also help War Child International to help children affected by war, because that's where our royalies go. -- Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se Links for SQL Server Books Online: SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
From: Jay on 5 Dec 2009 10:28 OK, I got it. Thanks. It also clears something up that has bugged me for a while: Unix clearly uses a binary collation. Thanks, Jay
From: Jeroen Mostert on 5 Dec 2009 11:36 Jay wrote: > It also clears something up that has bugged me for a while: Unix clearly > uses a binary collation. > Oh dear, I feel another enormous post coming up. But I'll keep it short this time. "Unix" is a very broad moniker that could apply to just about anything, but if (as I'm thinking) you mean that file systems mostly used by Unix operating systems tend to be case- and accent-sensitive, then yes. However, NTFS has the same property. The only reason the file system appears case-insensitive to you is because the Win32 subsystem usually layered over interactions with the OS compensates for htis. It is possible to have file names differing only in case on an NTFS system, and the regular Win32 functions will not be able to properly distinguish between them. It is certainly not true that "Unix" uses binary collations for everything. For example, on most recent systems, the "ls" command will sort according to the collation implied by your locale (or by LC_COLLATE, if you've set it), not by a simple binary order. -- J.
From: Erland Sommarskog on 5 Dec 2009 14:33
Jay (spam(a)nospam.org) writes: > It also clears something up that has bugged me for a while: Unix clearly > uses a binary collation. As Jeroen say, not all do. Here is an authentic output: +� ls �lder �ldre arvode autoreply.pl* autoreserv.txt auto-se.pl* bin/ brev/ crontab.save disapproved emacs/ flimsy.pl forward.test lib/ Maildir/ mbox mbox.save mbox.tmp.4473 nail-11.25/ News/ �vre perllib/ Pnews.1 procmaillog procmailrc.bcl progs/ public_html/ quiz/ Rnmail.1 slask socket.pl spamfilter.log spamfilter.pl sql-grupper SQLMPUMS testsh* trn.1 trn-4.0-test76/ That is clearly not binary. In that case you would have see "�ldre", "�lder" and "�vre" last in the listin that order. In SQL Server, there are 18 collations for each collation designator. Taking Finnish_Swedish as an example, there are: Finnish_Swedish_BIN Finnish_Swedish_BIN2 Finnish_Swedish_CI_AI Finnish_Swedish_CI_AI_WS Finnish_Swedish_CI_AI_KS Finnish_Swedish_CI_AI_KS_WS Finnish_Swedish_CI_AS Finnish_Swedish_CI_AS_WS Finnish_Swedish_CI_AS_KS Finnish_Swedish_CI_AS_KS_WS Finnish_Swedish_CS_AI Finnish_Swedish_CS_AI_WS Finnish_Swedish_CS_AI_KS Finnish_Swedish_CS_AI_KS_WS Finnish_Swedish_CS_AS Finnish_Swedish_CS_AS_WS Finnish_Swedish_CS_AS_KS Finnish_Swedish_CS_AS_KS_WS That is two binary collations, and then 16 which are all combinations of case- accent, kana- and width-sensitivity. There are no KI and WI collations in the list, but that is implied by their absence. -- Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se Links for SQL Server Books Online: SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx |