Prev: very slow SQL client
Next: SQL Server Output Buffer
From: Jay on 3 Dec 2009 17:15 I'm going through a 2005 Install lesson and it is saying to change the default collation to: Latin1_General with a Binary sort order. Well Latin1_General is normal enough, but why would they want to change the default sort order?
From: Russell Fields on 3 Dec 2009 17:35 Since it is a lesson, perhaps just to teach you how to do it. (However, back a decade or so, when I set up a SQL Servers for installing SAP, that was the required sort order using code page 850.) RLF "Jay" <spam(a)nospam.org> wrote in message news:%23Qj%23YYGdKHA.1640(a)TK2MSFTNGP06.phx.gbl... > I'm going through a 2005 Install lesson and it is saying to change the > default collation to: Latin1_General with a Binary sort order. > > Well Latin1_General is normal enough, but why would they want to change > the default sort order? >
From: Jay on 3 Dec 2009 17:42 Well, I think it would change: SQL_Latin1_General_CP1_CI_AS to SQL_Latin1_General_CP1_CI_AS_BIN. I'm looking at: http://msdn.microsoft.com/en-us/library/ms144250(SQL.90).aspx "Russell Fields" <russellfields(a)nomail.com> wrote in message news:%23JWG3jGdKHA.4724(a)TK2MSFTNGP05.phx.gbl... > Since it is a lesson, perhaps just to teach you how to do it. (However, > back a decade or so, when I set up a SQL Servers for installing SAP, that > was the required sort order using code page 850.) > > RLF > > "Jay" <spam(a)nospam.org> wrote in message > news:%23Qj%23YYGdKHA.1640(a)TK2MSFTNGP06.phx.gbl... >> I'm going through a 2005 Install lesson and it is saying to change the >> default collation to: Latin1_General with a Binary sort order. >> >> Well Latin1_General is normal enough, but why would they want to change >> the default sort order? >> >
From: Jay on 3 Dec 2009 18:28 "Jay" <spam(a)nospam.org> wrote in message news:%23Qj%23YYGdKHA.1640(a)TK2MSFTNGP06.phx.gbl... > I'm going through a 2005 Install lesson and it is saying to change the > default collation to: Latin1_General with a Binary sort order. > > Well Latin1_General is normal enough, but why would they want to change > the default sort order? OK, the answer I'm getting is vague, to say the least. However, the implication is that: if you're not using Unicode, there will be small system-wide performance improvement in character operations - basically for free. All this for specifying: SQL_Latin1_General_CP1_CI_AS_BIN, rather than SQL_Latin1_General_CP1_CI_AS. I assume, but do not know, that if the binary sort order is specified and you use Unicode, there would be some slight performance degradation, but no errors. My problem is that the accent-e is a Unicode character (I think) and that these characters have become very common in datasets. So, this "improvement" could easily end up costing you (a small penalty in ) CPU cycles, if Unicode characters are present AND you're doing character operations. Last, what specific kind of operations would be affected by the _BIN? String manipulations?
From: Jeroen Mostert on 3 Dec 2009 19:27
Jay wrote: > I'm going through a 2005 Install lesson and it is saying to change the > default collation to: Latin1_General with a Binary sort order. > > Well Latin1_General is normal enough, but why would they want to change the > default sort order? > Because they don't care about their end users. :-) The binary collations relieve SQL Server from thinking about things like accent- and case-sensitivity: they compare characters on their code points, not strings on their language sorting order. Under a binary collation, 'TEST' < 'Test' because 'E' < 'e', and that's because 69 < 101. This speeds up string comparisons. The problem is that if you set this as the default it affects *everything*, including the system catalogs. In particular, table names and column names then use binary collation as well, so everything becomes case- and accent-sensitive (everything-sensitive, really). Your statements must have been written to take this into account. Worse, it allows for awful mistakes like creating a table "Foo" and a table "FOO" and a table "foo", or a table "cr�pes" with the "�" a single LATIN SMALL LETTER E WITH CIRCUMFLEX and a table "cr�pes" with the "�" a LATIN SMALL LETTER E followed by a COMBINING CIRCUMFLEX ACCENT. Under a binary collation, these are all different strings and so they are admissible as unique names. Good luck sorting that out (no pun intended). This is also not friendly on the user data. Any ORDER BY will sort by code point values[*], and that order matches the sorting rules of no natural language on Earth (that includes English). There's little point to being faster if users can't find what they're looking for or if they can introduce logical inconsistencies by inserting values that differ in code points but not in characters. You can avert this by including an explicit COLLATE clause in your queries, but if you don't have an index on a computed field using this collation then I hope you like index scans and explicit sort operations, because that's what you're going to get. It is generally much more practicable to use a sensible non-binary collation as the default (like Latin1_General_CI_AS) and use the binary collations (like Latin1_General_BIN) only for specialized fields where the performance gain makes sense and you can afford true binary lookup. In those cases it still nearly always makes sense to have a separate computed field that uses a non-binary collation, for purposes of human interaction. -- J. [*]This is not strictly true because SQL Server 2005's binary collations are sort-of broken: the first character is compared according to code point, but the remainder is compared by comparing the bytes. This has quite unintuitive results, and SQL Server 2008 fixes this with the new _BIN2 collations that do compare by code point for all characters. |