MappedByteBuffer [Java Help]

From: markspace on 24 Jul 2010 18:57

Alan Gutierrez wrote:

> I'm also eager to hear someone destroy the idea entirely, to tell me,
> "no you're not supposed to use `MappedByteBuffer` that way,"

I think you're well on your way to becoming the expert on memory mapped
B-trees around here. ;)

I don't have a lot of experience using memory mapped files like this.
What I would question is why you want to do this at all. Why not just
use JDB or SQLite and not implement your own B-trees at all? I assume
you must have your reasons.

I think I'd like to see the difference in performance between hand
rolled B-trees and JDB. Should be useful to know if the extra effort
results in a worthwhile performance gain, or if JDB is fast enough as-is.

From: Alan Gutierrez on 25 Jul 2010 05:18

markspace wrote:
> Alan Gutierrez wrote:
>
>> I'm also eager to hear someone destroy the idea entirely, to tell me,
>> "no you're not supposed to use `MappedByteBuffer` that way,"
>
>
> I think you're well on your way to becoming the expert on memory mapped
> B-trees around here. ;)

Green light then. Trial and error, here I come.

> I don't have a lot of experience using memory mapped files like this.
> What I would question is why you want to do this at all. Why not just
> use JDB or SQLite and not implement your own B-trees at all? I assume
> you must have your reasons.

Reasons? Why, of course I have my reasons! I'm building a Skyscraper to
the Moon!

For this I will need a write ahead log, a B+tree, and, eventually, a
Paxos implementation.

And by Skyscraper to the Moon, I mean, a personal project that seems
absurdly ambitious, even to myself, that only I could possibly care
about, especially at this stage.

I'm asserting my freedom to tinker. I find the topic interesting. I've
got a web application I'd like to build. It's not a secret, but the
details are off topic. The application is an end to justify the means.
The means is a database assembled from database primitives. The
motivation is that I've come to suspect the value of the myriad of
abstraction layers I've come to employ in my Java programming, especially
for solo programmer projects.

> I think I'd like to see the difference in performance between hand
> rolled B-trees and JDB. Should be useful to know if the extra effort
> results in a worthwhile performance gain, or if JDB is fast enough as-is.

When I work with Hibernate, the notion that Hibernate abstracts the
database so I don't have to think about it is something that I'm
subscribing to in order to humor Hibernate, to make Hibernate feel like
my super best helper, when in my mind, I know that I've had to
de-normalize, time, tune, index and configure, configure, configure to
get the performance I seek. I'm very conscious of each index, and the
strategy for each query, and even which indexes are written to which
disk, and which disk hosts my write ahead log.

Actually, Hibernate *is* my super best helper, but I don't consider it
an abstraction. Everything I do with Hibernate is very deliberate.

In fact, Hibernate has talked me out of pessimistic concurrency
controls, so I've begun to structure my web applications with very
deliberate concurrency controls, optimistic locking that catches
concurrency errors and retries operations, usually in worker threads,
with some form of queuing, so I can reduce collisions. The Hibernate
community favors optimistic concurrency control, and now so do I.
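
A minimal sketch of the read, modify, retry loop described above. This is not Hibernate code: an in-memory `AtomicReference` stands in for a versioned database row, and the names `OptimisticRetrySketch`, `Versioned` and `update` are made up for illustration. With JPA/Hibernate the version check would typically come from a `@Version` column and the failure would surface as an optimistic-lock exception instead of a failed compare-and-set.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.IntUnaryOperator;

public class OptimisticRetrySketch {
    /** Immutable snapshot of a "row": a value plus the version it was written at. */
    public static final class Versioned {
        public final long version;
        public final int value;

        public Versioned(long version, int value) {
            this.version = version;
            this.value = value;
        }
    }

    // Hypothetical stand-in for a database row with a version column.
    private final AtomicReference<Versioned> row =
            new AtomicReference<>(new Versioned(0, 0));

    public Versioned current() {
        return row.get();
    }

    /**
     * Applies the change optimistically: read a snapshot, compute the new
     * value, and write it back only if nobody else wrote in the meantime.
     * Returns the number of attempts used; gives up after maxAttempts.
     */
    public int update(IntUnaryOperator change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Versioned seen = row.get();
            Versioned next = new Versioned(seen.version + 1,
                                           change.applyAsInt(seen.value));
            if (row.compareAndSet(seen, next)) {
                return attempt; // no concurrent writer got in first
            }
            // Another writer won the race: loop, re-read, and try again.
        }
        throw new IllegalStateException(
                "gave up after " + maxAttempts + " attempts");
    }
}
```

Funnelling updates through a queue and a small pool of worker threads, as mentioned above, keeps the number of retries low because fewer writers collide on the same row.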

Recently, I set out to implement incremental backups on some MySQL
databases that had gotten rather large to dump whole. The
experience, not yet complete, of implementing incremental backups on
INNODB was the last straw. It shook my confidence right down to the point
where I thought, if the only thing keeping me pursuing my interest in
distributed databases is a fear that I won't get durability right, well,
here I am learning about the inner workings of INNODB to get durability
right.

Thus, when I work with Hibernate, the database is in my head, from
concurrency, to schema, to indexes, to the backups. My experiment is to
see if a rugged write ahead log and generic B+tree is *less* effort than
all this.

I sense an opportunity for arbitrage.

If you nose around my GitHub account (bigeasy) you'll find it with its
MIT license, but it's not set up for distribution yet, and I'm not
flogging it here, and never will. There are some incredible open source
projects producing powerful alternatives to ORM+RDBMS, to which I could
contribute, but that's not what I want to do.

I want to build a Skyscraper to the Moon.

--
Alan Gutierrez - alan(a)blogometer.com - http://twitter.com/bigeasy

From: Esmond Pitt on 25 Jul 2010 23:36

> Green light then. Trial and error, here I come.

The Lucene project found about a 20% speed improvement using mapped byte
buffers for reading Lucene indexes.

They couldn't use it for output because you have to predetermine the
size, so you can't extend the file.

One *major* gotcha you need to be aware of is that there is no GC of the
address space of a MappedByteBuffer, so you can't just open them
willy-nilly. As in the Lucene case they are best used when you have a
known number of files of known size.
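
To make the fixed-size point concrete, here is a minimal sketch (not Lucene's code) that maps an existing file of known size and reads and writes within it. The class and method names are made up for illustration, and it assumes the file has already been created at its final size. The mapped region is sized when `map` is called, so an access past the end fails rather than growing the mapping.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FixedSizeMappingSketch {
    /**
     * Maps the whole file read-write and stores an int at the given byte
     * offset. Returns the capacity of the mapping, which equals the file
     * size at the time of the map() call.
     */
    public static int writeInt(Path file, int offset, int value)
            throws IOException {
        try (FileChannel ch = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
            // Absolute put: throws IndexOutOfBoundsException if the int
            // would fall outside the mapped region.
            buf.putInt(offset, value);
            buf.force(); // ask the OS to flush the change to the file
            return buf.capacity();
        }
    }

    /** Maps the whole file read-only and reads an int at the byte offset. */
    public static int readInt(Path file, int offset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.getInt(offset);
        }
    }
}
```

For a 16-byte file, `writeInt(file, 4, 42)` followed by `readInt(file, 4)` round-trips the value, while `writeInt(file, 16, 1)` throws because offset 16 lies past the end of the mapping.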

From: markspace on 26 Jul 2010 01:15

Esmond Pitt wrote:

> One *major* gotcha you need to be aware of is that there is no GC of the
> address space of a MappedByteBuffer, so you can't just open them
> willy-nilly. As in the Lucene case they are best used when you have a
> known number of files of known size.

So, if a memory mapped file is closed, its address space is never
reclaimed? That seems really wrong....

From: Esmond Pitt on 26 Jul 2010 07:18

> So, if a memory mapped file is closed, its address space is never
> reclaimed? That seems really wrong....

I told you it was a gotcha. See the Bug Parade; there's endless discussion about it.
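
For what it's worth, the behaviour can be seen in a small sketch; the class and method names are made up for illustration. The Javadoc for `FileChannel.map` says that closing the channel has no effect on the validity of the mapping, and the `MappedByteBuffer` Javadoc says the mapping remains valid until the buffer itself is garbage-collected. The standard `MappedByteBuffer` API has no explicit unmap method, so when the address space is given back is up to the collector.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappingOutlivesChannelSketch {
    /**
     * Maps the whole file read-only and returns the buffer after the
     * channel has been closed. The mapping stays usable: closing the
     * channel does not release it.
     */
    public static MappedByteBuffer mapThenClose(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } // channel is closed here by try-with-resources
    }
}
```

The caller can keep reading from the returned buffer even though the channel is gone, which is why opening many mappings of many files can exhaust address space before the collector gets around to releasing them.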
