New bie to SQL, confused about SQL collation [Using MS SQL Server]

Prev: Are IF statemens legal in a VIEW?
Next: a brand new query tool is out on beta, we need feedback - so you get a free copy

From: Althaf on 5 Nov 2009 08:34

QUESTON # 1
----------------------
I'm confused about the SQL collation, what is it actually for?. I
created a new database and later checked my SQL collaction it is set
to SQL_Latin1_General_CP1_CS_AS. The problem what i have now is
accessing the field names / tablename has become case sensitive. No
idea how this happend.

for ex. SELECT CHANNEL_NAME from table1 does not work
select Channel_Name from Table1 works

I'm getting mad about this, why is the field and tablenames have
become case sensitive. how do i make it to normal i mean in case
sensitive.

QUESTON # 2
----------------------
I'm making a content database for news management. Example, A single
article would have English, Arabic and spanish content. So my table is
as below.

Table # 1

ID Bigint
English_Title Nvarchar
Spanish_Title Nvarchar
Arabic_Title Nvarchar
English_Content Nvarchar
Spanish_Content Nvarchar
Arabic_Content Nvarchar

is it advisable to do so?, i made it as above as i want the data
related to one article (of all languages) to be stored in a single row
of a table. Now what should be the collation for my database in this
case. is collation related to storing of content also.

Please help me....

From: bill on 5 Nov 2009 12:42

#1. The collation pages determine the order in which your data is
sorted and (as you discovered) whether your database (both the data in
your tables AND the catalog object names themselves (e.g. columns,
tables, etc) are case sensitive. Unless you have a very strong
reason, I would go with a case insensitive page. You can choose the
default collation page for your database when you create it. However,
you can always override the default on a column-by-column basis (once
again, I wouldn't do this unless you have a particularly pressing
need). To change the collation of existing columns, look up the ALTER
statement in BOL.

If you try to join on columns with differing collation pages, you will
get a collation page conflict. I prefer to fix the tables to avoid
this issue, but if that is not possible, you can change collation on
the fly in your join, like this:

SELECT
<list of column_names>
from
<table_1> as a
inner join
<table_2> as b
ON a.<column_whatever> = <b.some_column> COLLATE
<collation_page_name>

The above statement will change the collation for b.some_column for
the purposes of the join only. I would expect some kind of
performance penalty, but I tested the statement on a very large join,
and didn't notice a significant difference. It's much better to have
all the columns collated the same to begin with, but it's handy to
know about changing on the fly.

#2. I would recommend against the approach you outline, and instead
go for a table like this: CREATE TABLE CONTENT_ITEM_TITLE
(CONTENT_ITEM_ID (int) NOT NULL, LANGUAGE_CD varchar(2) NOT NULL,
TITLE_TX varchar(75) NOT NULL). ALTER TABLE CONTENT_ITEM_TITLE ADD
CONSTRAINT CONTENT_TITLE_PK primary key clustered CONTENT_ITEM_ID,
LANGUAGE_CD). CONTENT_ITEM_ID would be a foreign key to the
CONTENT_ITEM table and LANGUAGE_CD should be a foreign key to a table
with all the whose primary key is the ISO code for each language.

The advantages are:
1. It will be very easy for you to add or drop languages this way.

2. You won't have to deal with NULLS in cases where you're missing a
title in a particular language. NULLs complicate queries because they
represent an unknown truth-value and this do not evaluate to either
true or false. You thus have to put extra stuff into your query like
"OR <column> is NULL". But if you have other conditions in your WHERE
and they are connected with AND operators, then you have to logically
punctuate to avoid unexpected results. Also, if you do outer joins
and the joined (e.g. the "outer" table) has NULLs in a column of
interest for selectivity, you can be in for a surprise. This is
especially true if you use the old style join syntax. But even if you
use the ANSI syntax (which is much better), it's still possible to put
the filter in the wrong place (e.g. WHERE clause instead of Join
Condition) and be confused by the results.

3. Your SQL queries will be exactly the same, regardless of which
language the user wants. You don't want to change columns in your
SELECT list (e.g SELECT spanish_title. . . versus SELECT
english_title . . .) simply because the user has chosen a particular
language. With the table that I have outlined, your query is always
of the form "SELECT <column list> from CONTENT_ITEM as ci INNER JOIN
CONTENT_ITEM_TITLE as ct ON ci.<content_item_id> =
ct.<content_item_id> AND ct.LANGUAGE_CD = <whatever the user wants>

Thanks,

Bill

Thanks,

Bil

From: Erland Sommarskog on 6 Nov 2009 18:35

Althaf (althafexcel(a)gmail.com) writes:
> QUESTON # 1
> ----------------------
> I'm confused about the SQL collation, what is it actually for?. I
> created a new database and later checked my SQL collaction it is set
> to SQL_Latin1_General_CP1_CS_AS. The problem what i have now is
> accessing the field names / tablename has become case sensitive. No
> idea how this happend.
>
> for ex. SELECT CHANNEL_NAME from table1 does not work
> select Channel_Name from Table1 works
>
> I'm getting mad about this, why is the field and tablenames have
> become case sensitive. how do i make it to normal i mean in case
> sensitive.

As Bill said, the database collation affects both data and metadata.

I like to point out that since you intend to use multiple languages and
use Unicode, you should not use an SQL collation, but you should use a
Windows collation. For nvarchar an SQL collation behaves just like a
Windows collation, but for varchar SQL collations have its own specific
rules. This can lead to unexpceted surprises, not the least with
performance.

> I'm making a content database for news management. Example, A single
> article would have English, Arabic and spanish content. So my table is
> as below.
>
> Table # 1
>
> ID Bigint
> English_Title Nvarchar
> Spanish_Title Nvarchar
> Arabic_Title Nvarchar
> English_Content Nvarchar
> Spanish_Content Nvarchar
> Arabic_Content Nvarchar
>
> is it advisable to do so?, i made it as above as i want the data
> related to one article (of all languages) to be stored in a single row
> of a table. Now what should be the collation for my database in this
> case. is collation related to storing of content also.

Bill made a good point that if you store the strings in a table keyed by
ID and language, you have something which is more extensible and easier
to use.

However, if you need to sort and index these values, that approach raiaes
some problems. With separate columns you can set the appropriate collation
for each column, Latin1_General_CS_AS for English, Modern_Spanish_CS_AS for
Spanish and Arabic_CS_AS for Arabic.

--
Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

|
Pages: 1
Prev: Are IF statemens legal in a VIEW?
Next: a brand new query tool is out on beta, we need feedback - so you get a free copy