IMPORT vs. LOAD [DB2]

Prev: HADR is failing during online backup
Next: CLP giving weird error code for simple SQL

From: Gregor Kovač on 20 May 2010 00:31

Hi!

Today we had a hot discussion about the performance of LOAD vs. IMPORT.
I'd like to hear what you have to say:
- Should LOAD be faster that IMPORT in every circumstance?
- Is IMPORT more suited for small files and LOAD for large files?
- Is LOAD more suited for table that have many indexes and IMPORT for tables
that have few or no indexes?

Best regards,
Kovi

From: Frederik Engelen on 20 May 2010 04:27

On May 20, 6:31Â am, Gregor KovaÄ <gregor.ko...(a)mikropis.si> wrote:
> Hi!
>
> Today we had a hot discussion about the performance of LOAD vs. IMPORT.
> I'd like to hear what you have to say:
> - Should LOAD be faster that IMPORT in every circumstance?
> - Is IMPORT more suited for small files and LOAD for large files?
> - Is LOAD more suited for table that have many indexes and IMPORT for tables
> that have few or no indexes?
>
> Best regards,
> Â Â Â Â Kovi

Hello Kovi,

Interesting questions, although I'm not sure they are always the rigth
ones to ask. There are factors that can influence your choice for
IMPORT or LOAD that might be more important:
- recoverability of the operation
- for moving data between databases, load from cursor has a lot of
advantages
- availability of data during/after insert
- presence of generated columns
- compression on tables
- ...

I'll still try to give a shot at your questions, though:
> - Should LOAD be faster that IMPORT in every circumstance?
I tried a small table with a lot of indexes. A load operation takes
consistently 4 times longer than the corresponding insert. Another
thing to consider for no-so large tables: how fast can you type 'SET
INTEGRITY...'? :-)

> - Is IMPORT more suited for small files and LOAD for large files?
I think you need to consider at least the points above before taking
into account the size of the file.

> - Is LOAD more suited for table that have many indexes and IMPORT for tables
According to my little test on a small table with lots of indexes (15
on a 3-column table), it's exactly the inverse. I didn't do the test
for large tables, there it might also depend on the indexing mode you
choose (incremental or rebuild). I suggest you just give it a try
yourself.

--
Frederik Engelen
Realdolmen

From: Serge Rielau on 20 May 2010 10:29

Kovi,

Feedback from backstage:

Load is faster than import in most of cases. Better to describe cases
where import can be faster:

1) Correct about small files. Load has a startup and shutdown cost.
2) Parallel import can catch up and beat load (ie, running lots of
parallel import command with the data split already) specifically when
XML is in the table.

More indexes will likely hurt import more than load, because the data
isn't sorted, and the index inserts are a single row at a time. Load
will be faster with larger files regardless of the number of indexes

--
Serge Rielau
SQL Architect DB2 for LUW
IBM Toronto Lab

From: kenfar on 21 May 2010 09:09

On May 20, 8:29 am, Serge Rielau <srie...(a)ca.ibm.com> wrote:

> More indexes will likely hurt import more than load, because the data
> isn't sorted, and the index inserts are a single row at a time. Load
> will be faster with larger files regardless of the number of indexes

I've found that indexes are my enemy with the load utility - even
though I can load the data in parallel and only do incremental index
builds, the index build phase isn't parallel. Rebuilding indexes
isn't a great strategy for real-time data warehousing, and I don't yet
have enough funding to go to DPF. My current server has a decent
number of rather slow CPUs. So, in this scenario I can split a file
into 8 individual pieces then import each simultaneously and get
better overall performance. Unfortunately. Think I might be missing
anything?

Thanks,

Ken

From: Serge Rielau on 21 May 2010 23:36

On 5/21/2010 9:09 AM, kenfar wrote:
> On May 20, 8:29 am, Serge Rielau<srie...(a)ca.ibm.com> wrote:
>
>> More indexes will likely hurt import more than load, because the data
>> isn't sorted, and the index inserts are a single row at a time. Load
>> will be faster with larger files regardless of the number of indexes
>
> I've found that indexes are my enemy with the load utility - even
> though I can load the data in parallel and only do incremental index
> builds, the index build phase isn't parallel. Rebuilding indexes
> isn't a great strategy for real-time data warehousing, and I don't yet
> have enough funding to go to DPF. My current server has a decent
> number of rather slow CPUs. So, in this scenario I can split a file
> into 8 individual pieces then import each simultaneously and get
> better overall performance. Unfortunately. Think I might be missing
> anything?
Just a hunch. You are on DB2 9.1 or Db2 8? I.e. prior to the
multi-threaded engine?

Cheers
Serge

--
Serge Rielau
SQL Architect DB2 for LUW
IBM Toronto Lab

| Next | Last
Pages: 1 2
Prev: HADR is failing during online backup
Next: CLP giving weird error code for simple SQL