From: Tom Anderson on
On Wed, 7 Apr 2010, Lew wrote:

> Arne Vajh?j wrote:
>>>>> Especially with a high update ratio then an UPDATE with a WHERE and
>>>>> only an INSERT if no rows were updated could be worth considering.
>
> Andreas Leitgeb wrote:
>>>> Would it be safe? Or could one end up with two entries for that key,
>>>> if the attempted updates happen at the same time?
>
> Martin Gregorie wrote:
>>> Should be OK provided you use explicit commit units rather than default
>>> automatic commits.
>
> Andreas Leitgeb wrote:
>> "Should be ..." doesn't really sound too convincing in the context of
>> possible concurrency problems ;-)
>
> If the database engine supports transactions and you remember to use
> them, it will be safe.

You mean, if it has them, you remember to use them, and you're happy to
live with the consequences of using the serializable isolation level,
you'll be safe.

Being able to do an 'upsert' in a single atomic operation makes it
possible to be safe much faster than having to do it with two queries.

> Another trick to consider is judicious use of SELECT ... FOR UPDATE.

True. I don't think it's applicable here, though.

tom

--
Thinking about it, history begins now -- sarah
From: Tom Anderson on
On Tue, 6 Apr 2010, Arne Vajh?j wrote:

> On 05-04-2010 10:09, Tom Anderson wrote:
>> On Sun, 4 Apr 2010, Arne Vajh?j wrote:
>>
>>> On 04-04-2010 20:12, Lew wrote:
>>>> I hear good things about H2.
>>>
>>> ;ALLOW_LITERALS=NONE in the connection URL certainly is nice !
>>
>> H2 supports the rather useful MERGE INTO command [...] Anyway, argh,
>> looks like i'm going to have to add a facility for database-specific
>> SQL for the insert operation if i want this to be portable. I'd really
>> like to avoid having to fall back to running multiple queries to probe
>> the database for the existence of the key and then do UPDATE or INSERT
>> accordingly.
>
> Given the maintenance work of adding a specific SQL statement for every
> new database wouldn't it just be easier to make two statements?

Easier, of course. Significantly easier? Not really. It's just one SQL
statement, and not a very complicated one.

> Especially with a high update ratio then an UPDATE with a WHERE and only
> an INSERT if no rows were updated could be worth considering.

Yes, absolutely. And easier to do than the INSERT-first case, because you
have to detect failure of the primary key constraint to trigger the
UPDATE, and i'm not confident about being able to do that portably.

> Anyway the cost should not be so bad, because the first of the two
> statements should get the relevant pages loaded in memory so the second
> should be fast.

As discussed downstream, i worry that it will clobber concurrency if done
safely.

tom

--
Thinking about it, history begins now -- sarah
From: Martin Gregorie on
On Wed, 07 Apr 2010 21:11:01 +0000, Andreas Leitgeb wrote:

> "Should be ..." doesn't really sound too convincing in the context of
> possible concurrency problems ;-)
>
That was me being cautious since we don't know what DBMS the OP has in
mind. For the following code to work the DBMS *must* support explicit
transaction control, serialised transaction mode and (probably) the
ability to set the lock timeout period. If we have all that the code is
straight forward:

set_session_conditions
{
set auto-commit off;
set transaction serialisation;
set lock timeout = 5 seconds; // needs to be at least 2x the max
// transaction execution time.
}

add_entry
{
start transaction // may be implied
select count(*) from table where key='?'
if (count == zero)
insert into table (key, value) values ('?', '?');
commit
}

If there are no explicit transactions, then all bets are off: it can't
be done without serialising the database access, i.e. you'll need to
queue updates for execution by a single worker thread - or something like
that.

If you can't serialise transactions, then you *must* make the key unique
and be prepared for inserts to fail with a duplicate key exception. You
must also be prepared to rollback and retry if the transaction times out.

> Which row or page would be locked, if the where-clause didn't match
> anything? The whole table?
>
Depends what locks you're using - table, page or row. A decent DBMS will
let you specify at least table or row locking and some may allow page
locking as well. It doesn't actually matter in this case since there's
only a single row accessed during a transaction. However, the bigger the
scope of the lock the more chance that a concurrent transaction will fail
due to a lock conflict and need to be retried

Ideally you'd use row locking: then the select will lock zero or 1 rows
and the insert, if run, will lock one row. Page locking extends this to
the page containing the row and table locking prevents any concurrency
for the duration of the transaction.


--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
From: Andreas Leitgeb on
Martin Gregorie <martin(a)address-in-sig.invalid> wrote:
>> Which row or page would be locked, if the where-clause didn't match
>> anything? The whole table?
> Depends what locks you're using - table, page or row. ...
> Ideally you'd use row locking: then the select will lock zero or 1 rows
> and the insert, if run, will lock one row.

That explained it. Thanks.

So second thread waits on its insert-line, until first thread commits,
and then second thread gets some unique constraint error. It then just
repeats the whole transaction, which (assuming no further concurrencies)
succeeds right at the update, so second thread's value wins, as expected
for put-semantics.

I don't know, why I was so stuck at expecting threads to already block on
the empty update...

From: Tom Anderson on
On Wed, 7 Apr 2010, Martin Gregorie wrote:

> On Wed, 07 Apr 2010 21:11:01 +0000, Andreas Leitgeb wrote:
>
>> "Should be ..." doesn't really sound too convincing in the context of
>> possible concurrency problems ;-)
>
> That was me being cautious since we don't know what DBMS the OP has in
> mind. For the following code to work the DBMS *must* support explicit
> transaction control, serialised transaction mode and (probably) the
> ability to set the lock timeout period. If we have all that the code is
> straight forward:
>
> set_session_conditions
> {
> set auto-commit off;
> set transaction serialisation;
> set lock timeout = 5 seconds; // needs to be at least 2x the max
> // transaction execution time.
> }
>
> add_entry
> {
> start transaction // may be implied
> select count(*) from table where key='?'
> if (count == zero)
> insert into table (key, value) values ('?', '?');
> commit
> }

I'd be keen to see an update in there somewhere too!

tom

--
The sunlights differ, but there is only one darkness. -- Ursula K. LeGuin,
'The Dispossessed'