Collation settings [Server Configuration]

Prev: very slow SQL client
Next: SQL Server Output Buffer

From: Jay on 3 Dec 2009 17:15

I'm going through a 2005 Install lesson and it is saying to change the
default collation to: Latin1_General with a Binary sort order.

Well Latin1_General is normal enough, but why would they want to change the
default sort order?

From: Russell Fields on 3 Dec 2009 17:35

Since it is a lesson, perhaps just to teach you how to do it. (However,
back a decade or so, when I set up a SQL Servers for installing SAP, that
was the required sort order using code page 850.)

RLF

"Jay" <spam(a)nospam.org> wrote in message
news:%23Qj%23YYGdKHA.1640(a)TK2MSFTNGP06.phx.gbl...
> I'm going through a 2005 Install lesson and it is saying to change the
> default collation to: Latin1_General with a Binary sort order.
>
> Well Latin1_General is normal enough, but why would they want to change
> the default sort order?
>

From: Jay on 3 Dec 2009 17:42

Well, I think it would change: SQL_Latin1_General_CP1_CI_AS to
SQL_Latin1_General_CP1_CI_AS_BIN.

I'm looking at:
http://msdn.microsoft.com/en-us/library/ms144250(SQL.90).aspx

"Russell Fields" <russellfields(a)nomail.com> wrote in message
news:%23JWG3jGdKHA.4724(a)TK2MSFTNGP05.phx.gbl...
> Since it is a lesson, perhaps just to teach you how to do it. (However,
> back a decade or so, when I set up a SQL Servers for installing SAP, that
> was the required sort order using code page 850.)
>
> RLF
>
> "Jay" <spam(a)nospam.org> wrote in message
> news:%23Qj%23YYGdKHA.1640(a)TK2MSFTNGP06.phx.gbl...
>> I'm going through a 2005 Install lesson and it is saying to change the
>> default collation to: Latin1_General with a Binary sort order.
>>
>> Well Latin1_General is normal enough, but why would they want to change
>> the default sort order?
>>
>

From: Jay on 3 Dec 2009 18:28

"Jay" <spam(a)nospam.org> wrote in message
news:%23Qj%23YYGdKHA.1640(a)TK2MSFTNGP06.phx.gbl...
> I'm going through a 2005 Install lesson and it is saying to change the
> default collation to: Latin1_General with a Binary sort order.
>
> Well Latin1_General is normal enough, but why would they want to change
> the default sort order?

OK, the answer I'm getting is vague, to say the least. However, the
implication is that: if you're not using Unicode, there will be small
system-wide performance improvement in character operations - basically for
free.

All this for specifying: SQL_Latin1_General_CP1_CI_AS_BIN, rather than
SQL_Latin1_General_CP1_CI_AS.

I assume, but do not know, that if the binary sort order is specified and
you use Unicode, there would be some slight performance degradation, but no
errors.

My problem is that the accent-e is a Unicode character (I think) and that
these characters have become very common in datasets. So, this "improvement"
could easily end up costing you (a small penalty in ) CPU cycles, if Unicode
characters are present AND you're doing character operations.

Last, what specific kind of operations would be affected by the _BIN? String
manipulations?

From: Jeroen Mostert on 3 Dec 2009 19:27

Jay wrote:
> I'm going through a 2005 Install lesson and it is saying to change the
> default collation to: Latin1_General with a Binary sort order.
>
> Well Latin1_General is normal enough, but why would they want to change the
> default sort order?
>
Because they don't care about their end users. :-)

The binary collations relieve SQL Server from thinking about things like
accent- and case-sensitivity: they compare characters on their code points,
not strings on their language sorting order. Under a binary collation,
'TEST' < 'Test' because 'E' < 'e', and that's because 69 < 101. This speeds
up string comparisons.

The problem is that if you set this as the default it affects *everything*,
including the system catalogs. In particular, table names and column names
then use binary collation as well, so everything becomes case- and
accent-sensitive (everything-sensitive, really). Your statements must have
been written to take this into account. Worse, it allows for awful mistakes
like creating a table "Foo" and a table "FOO" and a table "foo", or a table
"cr�pes" with the "�" a single LATIN SMALL LETTER E WITH CIRCUMFLEX and a
table "cr�pes" with the "�" a LATIN SMALL LETTER E followed by a COMBINING
CIRCUMFLEX ACCENT. Under a binary collation, these are all different strings
and so they are admissible as unique names. Good luck sorting that out (no
pun intended).

This is also not friendly on the user data. Any ORDER BY will sort by code
point values[*], and that order matches the sorting rules of no natural
language on Earth (that includes English). There's little point to being
faster if users can't find what they're looking for or if they can introduce
logical inconsistencies by inserting values that differ in code points but
not in characters. You can avert this by including an explicit COLLATE
clause in your queries, but if you don't have an index on a computed field
using this collation then I hope you like index scans and explicit sort
operations, because that's what you're going to get.

It is generally much more practicable to use a sensible non-binary collation
as the default (like Latin1_General_CI_AS) and use the binary collations
(like Latin1_General_BIN) only for specialized fields where the performance
gain makes sense and you can afford true binary lookup. In those cases it
still nearly always makes sense to have a separate computed field that uses
a non-binary collation, for purposes of human interaction.

--
J.
[*]This is not strictly true because SQL Server 2005's binary collations are
sort-of broken: the first character is compared according to code point, but
the remainder is compared by comparing the bytes. This has quite unintuitive
results, and SQL Server 2008 fixes this with the new _BIN2 collations that
do compare by code point for all characters.

| Next | Last
Pages: 1 2 3
Prev: very slow SQL client
Next: SQL Server Output Buffer