From: SnapDive on 28 Jul 2010 14:30 I have a table with only a simple int pk. I do a merge operation and need to compare several columns in order to know if I need to 'when not matched then insert'. There are 20,000 rows+ in the source table (a temp table) and a few million in the destination table. The table schemata are the same. I know the approx number of unique values in each table, and think I need to create a unique index across all of the columns. Is that true? What is the right syntax for doing that? -- ColumnA: 167 unique values in #temptable -- ColumnB: 1 unique values in #temptable -- ColumnC: 13 unique values in #temptable -- ColumnD: 89 unique values in #temptable -- ColumnE: 550 unique values in #temptable -- ColumnF: 487 unique values in #temptable -- ColumnA: 3690 unique values in desttable -- ColumnB: 3 unique values (plus null is possible) in desttable -- ColumnC: 1113 unique values in desttable -- ColumnD: 2662 unique values in desttable -- ColumnE: 1770 unique values in desttable -- ColumnF: 1480 unique values in desttable
From: Erland Sommarskog on 28 Jul 2010 18:07 SnapDive (SnapDive(a)community.nospam) writes: > I have a table with only a simple int pk. I do a merge operation and > need to compare several columns in order to know if I need to 'when > not matched then insert'. There are 20,000 rows+ in the source table > (a temp table) and a few million in the destination table. The table > schemata are the same. > > I know the approx number of unique values in each table, and think I > need to create a unique index across all of the columns. Is that true? > What is the right syntax for doing that? Not really sure why you would need to create an index over all the columns, although creating the index on the desttable may be good for performance. The syntax? You did not try Books Online? Seems like a more efficient way to get answer of a question than waiting for a response in a newsgroup. But here it goes: CREATE UNIQUE INDEX my_index ON desttable(ColumnA, ColumnB, ...) -- Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se Links for SQL Server Books Online: SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
From: SnapDive on 28 Jul 2010 19:37 "synatx" was not the right word. Should have used strategy. One thing that came to mind was creating a new varchar(42) column and stuffing the sha1 hash of the column values in there, then I could have a simple unique index on that single column instead of a unique index on the 6 columns. Not sure how that would perform though. On Thu, 29 Jul 2010 00:07:15 +0200, Erland Sommarskog <esquel(a)sommarskog.se> wrote: >SnapDive (SnapDive(a)community.nospam) writes: >> I have a table with only a simple int pk. I do a merge operation and >> need to compare several columns in order to know if I need to 'when >> not matched then insert'. There are 20,000 rows+ in the source table >> (a temp table) and a few million in the destination table. The table >> schemata are the same. >> >> I know the approx number of unique values in each table, and think I >> need to create a unique index across all of the columns. Is that true? >> What is the right syntax for doing that? > >Not really sure why you would need to create an index over all >the columns, although creating the index on the desttable may be good for >performance. > >The syntax? You did not try Books Online? Seems like a more efficient way to >get answer of a question than waiting for a response in a newsgroup. But >here it goes: > > CREATE UNIQUE INDEX my_index ON desttable(ColumnA, ColumnB, ...) >
From: Dan on 29 Jul 2010 03:48 I'm not sure if it would perform better, but how about a computed column using CHECKSUM for the columns you need to compare - this might be quicker than a "roll your own" hashing solution. Dan "SnapDive" <SnapDive(a)community.nospam> wrote in message news:8nf156p8jaf05ueco0vmsc167qltm0hi16(a)4ax.com... > "synatx" was not the right word. Should have used strategy. One thing > that came to mind was creating a new varchar(42) column and stuffing > the sha1 hash of the column values in there, then I could have a > simple unique index on that single column instead of a unique index on > the 6 columns. Not sure how that would perform though. > > > > > On Thu, 29 Jul 2010 00:07:15 +0200, Erland Sommarskog > <esquel(a)sommarskog.se> wrote: > >>SnapDive (SnapDive(a)community.nospam) writes: >>> I have a table with only a simple int pk. I do a merge operation and >>> need to compare several columns in order to know if I need to 'when >>> not matched then insert'. There are 20,000 rows+ in the source table >>> (a temp table) and a few million in the destination table. The table >>> schemata are the same. >>> >>> I know the approx number of unique values in each table, and think I >>> need to create a unique index across all of the columns. Is that true? >>> What is the right syntax for doing that? >> >>Not really sure why you would need to create an index over all >>the columns, although creating the index on the desttable may be good for >>performance. >> >>The syntax? You did not try Books Online? Seems like a more efficient way >>to >>get answer of a question than waiting for a response in a newsgroup. But >>here it goes: >> >> CREATE UNIQUE INDEX my_index ON desttable(ColumnA, ColumnB, ...) >> >
From: Erland Sommarskog on 29 Jul 2010 07:46
SnapDive (SnapDive(a)community.nospam) writes: > "synatx" was not the right word. Should have used strategy. One thing > that came to mind was creating a new varchar(42) column and stuffing > the sha1 hash of the column values in there, then I could have a > simple unique index on that single column instead of a unique index on > the 6 columns. Not sure how that would perform though. I would go for the index on the six columns, at least at first. You know, keep it simple. Dan suggested CHECKSUM(), but with a few million rows in the destination table, you are bound to get rows with the same checksum, even if they have different content. A 32-bit value is not enough for that volume. -- Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se Links for SQL Server Books Online: SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx |