Prev: ADO: Mass updating a database with unique key
Next: add initial MSHFLexGrid row number to adodb. recordset
From: Tony Toews [MVP] on 2 Feb 2010 14:47

"Ivar" <ivar.ekstromer000(a)ntlworld.com> wrote:

>If it's speed you're looking for then my first thought would be to keep
>communication with the ADO to a minimum. Run a query on the ADO to return a
>single field recordset of the unique keys that already exist in the
>database. Create a VB collection and add each record of the recordset to it.
>Create a second VB collection. Try to add the new keys to the first
>collection; if a duplicate key error happens when adding to the first
>collection then add the new key to the second collection. The second
>collection will then contain only the unique keys of the records to add to
>the database. I'm sure this would be a lot faster than using ADO error
>checking.

Interesting idea. That would depend on the ratio of the number of records
already present to the number to be inserted. If 10M existing and 20 new then
this solution wouldn't be practical. If 1M and 1M then it would make a lot
of sense.

Tony
--
Tony Toews, Microsoft Access MVP
Tony's Main MS Access pages - http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
For a convenient utility to keep your users FEs and other files updated see
http://www.autofeupdater.com/
Granite Fleet Manager http://www.granitefleet.com/
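[Ivar's two-collection sieve could be sketched in VB6 roughly as follows. All
variable and recordset names here are assumed for illustration, and the
sieve is written so that the second collection receives the keys that add
*without* a duplicate error, i.e. the genuinely new ones:]

' Load the existing keys once with a single query, then sieve the
' new keys locally. A Collection raises error 457 on a duplicate key.
Dim colExisting As Collection, colToInsert As Collection
Set colExisting = New Collection
Set colToInsert = New Collection

' 1) One round trip: single-field recordset of all existing keys.
Do Until rsKeys.EOF
    colExisting.Add True, CStr(rsKeys.Fields(0).Value)
    rsKeys.MoveNext
Loop

' 2) Sieve the candidate keys without touching the DB again.
Dim v As Variant
For Each v In NewKeys
    On Error Resume Next
    colExisting.Add True, CStr(v)
    If Err.Number = 0 Then
        colToInsert.Add v      ' added cleanly, so the key is new
    End If
    Err.Clear
    On Error GoTo 0
Next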
From: Schmidt on 2 Feb 2010 17:02

"Tony Toews [MVP]" <ttoews(a)telusplanet.net> schrieb im Newsbeitrag
news:s50hm5ditqornk7uo3g4mankv2t618tuqp(a)4ax.com...
> "Ivar" <ivar.ekstromer000(a)ntlworld.com> wrote:
>
> >If it's speed you're looking for then my first thought would
> >be to keep communication with the ADO to a minimum.
> >Run a query on the ADO to return a single field recordset
> >of the unique keys that already exist in the database.
> >Create a VB collection and add each record of the recordset
> >to it...
>
> Interesting idea. That would depend on the ratio of the
> number of records already present and to be inserted.
> If 10M and 20 then this solution wouldn't be practical.

This solution becomes relatively *more* performant the more records already
exist in the DB. The key point (as Ivar already said) is to avoid inefficient
actions against the DB on a per-record basis, all the more so if each of
these single "record-exists checks" is additionally buffered (or wrapped) by
a VB error handler. A DB engine is very good at well-sized set operations,
but not that good at high-frequency single DB actions.

An exists check against a Dictionary, which offers such an Exists method
directly (avoiding the On Error wrapping), or maybe against a lightweight,
self-written B-tree structure, cannot be beaten IMO when compared with
DB-based checks, which have to go through too many layers.

To reduce the amount of "hashed record keys to check against", one can first
perform a fast MinKey/MaxKey determination on the (as Faraz writes) about
2000 new records to be inserted. This can be done in one simple loop, with no
sorting needed on the 2000 insert candidates: just check and extend a Min and
a Max variable in a single pass. Though pre-sorting would not hurt.
Then the Select against the already existing record keys in the huge table
can be reduced to that range (or to several ranges, if there are large gaps
in between). This way you end up with one to three smaller (single key field)
recordsets, which then need to be transferred into the "fast key-lookup"
container structures.

I've already had to implement such a scenario and ended up with something
like that; everything else I tested was slower, including something like
Larry's suggestion based on temp tables, which would already work faster than
the "error-buffered #2 approach", since it puts much less stress on the
DB-layer *interface* and works more "DB-engine-internally".

That was based on my SortedDictionary implementation and SQLite, but the
principle should work just as well with ADO/JET (or DAO/JET), I'm quite sure
about that.

Olaf
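[The Dictionary exists check plus Min/Max range reduction described above
could look roughly like this in VB6. Table, field and variable names are
assumed for illustration; `NewKeys` is taken to be a zero-based array of
numeric insert candidates and `cn` an open ADO connection:]

' 1) One pass over the insert candidates to find the key range.
Dim i As Long, MinKey As Long, MaxKey As Long
MinKey = NewKeys(0): MaxKey = NewKeys(0)
For i = 1 To UBound(NewKeys)
    If NewKeys(i) < MinKey Then MinKey = NewKeys(i)
    If NewKeys(i) > MaxKey Then MaxKey = NewKeys(i)
Next

' 2) Pull only the existing keys in that range - one set operation.
Dim dicExisting As Object
Set dicExisting = CreateObject("Scripting.Dictionary")
Dim rs As ADODB.Recordset
Set rs = New ADODB.Recordset
rs.Open "SELECT KeyField FROM BigTable WHERE KeyField BETWEEN " & _
        MinKey & " AND " & MaxKey, cn
Do Until rs.EOF
    dicExisting(rs.Fields("KeyField").Value) = True
    rs.MoveNext
Loop
rs.Close

' 3) The Exists check needs no On Error wrapping at all.
For i = 0 To UBound(NewKeys)
    If Not dicExisting.Exists(NewKeys(i)) Then
        ' append NewKeys(i) to the batch of records to insert
    End If
Next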
From: Faraz Azhar on 4 Feb 2010 00:08

On Feb 1, 3:53 pm, "Ivar" <ivar.ekstromer...(a)ntlworld.com> wrote:
> If it's speed you're looking for then my first thought would be to keep
> communication with the ADO to a minimum. Run a query on the ADO to return a
> single field recordset of the unique keys that already exist in the
> database. Create a VB collection and add each record of the recordset to it.
> Create a second VB collection. Try to add the new keys to the first
> collection; if a duplicate key error happens when adding to the first
> collection then add the new key to the second collection. The second
> collection will then contain only the unique keys of the records to add to
> the database. I'm sure this would be a lot faster than using ADO error
> checking.
>
> Ivar

I tried the #2 option. It was pretty slow. I'll try this one as well. Yes,
speed is a big concern for me. On every synchronization I run about 3k SQL
queries to achieve the #2 option. That really pulls down the speed. I'll try
the buffered approach using collections (or dictionaries). Let's see if that
increases the throughput.
From: David Youngblood on 4 Feb 2010 06:54

"Faraz Azhar" <itzfaraz(a)gmail.com> wrote...
> I tried the #2 option. It was pretty slow. I'll try this one as well.
> Yes, speed is a big concern for me. On every synchronization I run
> about 3k SQL queries to achieve #2 option. That really pulls down the
> speed. I'll try the buffered approach by using collections (or
> dictionaries). Lets see if that increases the throughput.

Here are a couple of things you might try.

DAO - generally faster than ADO against an Access db. It also has a Seek
method that provides for faster lookup of an indexed field.

BeginTrans/CommitTrans - can be used to buffer your data, improving speed.

I haven't used it since VB3, but something like this. This is an outline,
not code.

BeginTrans
Do While Not EOF(ff)
    Get next record
    i = i + 1

    .Seek "=", keyvalue
    If .NoMatch Then
        .AddNew
        Fill in new record with data
        .Update
    End If

    If i Mod 100 = 0 Then
        CommitTrans
        BeginTrans
    End If
Loop
CommitTrans

This may not work though, if your new data has duplicates within itself.

David
From: Helmut Meukel on 4 Feb 2010 10:49

"David Youngblood" <dwy(a)flash.net> schrieb im Newsbeitrag
news:OPRNvDZpKHA.5588(a)TK2MSFTNGP02.phx.gbl...
>
> Here are a couple of things you might try.
>
> DAO - generally faster than ADO against an access db. It also has a Seek
> method that provides for faster lookup of an indexed field.

I found Seek unbeatably fast, compared to the Find methods or SQL.

> BeginTrans/CommitTrans - can be used to buffer your data, improving speed
>
> [outline snipped]
>
> This may not work though, if your new data has duplicates within itself.

Hmm, I never tested whether Seek will find a new entry while it's still part
of an open transaction, but otherwise it will find the new entry and the code
will skip the duplicate in your new data just fine. Worst case, you can
always add the key values of the not-yet-committed new records to an array
and check each new key value against the values in that array first. Clear
the array after the CommitTrans.

Helmut.
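[David's outline, combined with Helmut's safeguard for duplicates within the
new data, could be fleshed out roughly as below. A Dictionary is used here in
place of Helmut's array for the pending keys; all names (`ws`, `db`, table
and index names, `keyvalue`) are assumed for illustration, and DAO's Seek
requires a table-type recordset with an index set:]

' Sketch: DAO Seek inside batched transactions, with an in-memory
' check for keys added since the last CommitTrans.
Dim ws As DAO.Workspace, db As DAO.Database, rs As DAO.Recordset
Set ws = DBEngine.Workspaces(0)
Set db = ws.OpenDatabase("C:\data\mydb.mdb")
Set rs = db.OpenRecordset("BigTable", dbOpenTable)
rs.Index = "PrimaryKey"

Dim dicPending As Object, i As Long
Set dicPending = CreateObject("Scripting.Dictionary")

ws.BeginTrans
Do While Not EOF(ff)
    ' ... read the next source record into keyvalue and field values ...
    i = i + 1
    rs.Seek "=", keyvalue
    If rs.NoMatch And Not dicPending.Exists(keyvalue) Then
        rs.AddNew
        ' ... fill in the new record's fields ...
        rs.Update
        dicPending.Add keyvalue, True   ' remember until committed
    End If
    If i Mod 100 = 0 Then
        ws.CommitTrans
        dicPending.RemoveAll            ' committed rows are visible again
        ws.BeginTrans
    End If
Loop
ws.CommitTrans
rs.Close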