From: Arne Vajhøj on 25 Apr 2010 12:58

On 25-04-2010 12:44, Zlatko Duric wrote:
> On 04/25/2010 06:14 PM, Tom Anderson wrote:
>> And as Arne said, when you're trying to do something unusual, you may be
>> outside the limits of what ORM can comfortably do, and you'll be better
>> off using straight JDBC. Or perhaps a combination of ORM for any CRUDdy
>> / domain logicky bits, and JDBC for complex queries.
>
> I inherited something that uses Hibernate, and I'm thinking about
> speeding up a few things. I was just thinking about how difficult it
> would be to speed all the slow stuff up by replacing all the Hibernate
> code with JDBC queries, and with my experience there's no chance I'll
> be doing that. But this approach (combination of ORM and JDBC) sounds
> very interesting to me.
>
> Now, my data is all objects - that suits me perfectly. But there is some
> information about all those objects I'd like to store in a single table
> or maybe two of them, that'd be super-fast to reach, without having to
> look for all those parent/children/node/parameters/other links and
> without having other issues to think about. I believe that part of the
> features would benefit from it a lot in terms of performance.
>
> Now, how common is this approach (combination)? Is there something
> really important I should read about this, before starting with the
> implementation?

If you are accessing the data as objects, then I don't think that
switching from Hibernate to raw JDBC is the right direction to optimize
the code.

Instead you should focus on tuning Hibernate and the database itself.

Hibernate can be slow and Hibernate can be fast. It all depends on the
guy writing the code.

Arne
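To make "tuning Hibernate" concrete, here is a rough sketch of one of
the most common fixes: eliminating N+1 selects with a fetch join. It
assumes the Hibernate 3 Session API; the Order entity and its lazy
"lines" collection are invented for illustration.

    import java.util.List;
    import org.hibernate.Session;

    // Hypothetical mapping (details omitted):
    //   Order has a lazily loaded collection "lines" of OrderLine.
    public class OrderDao {
        // Naive code that walks order.getLines() after loading the
        // orders issues one extra SELECT per order (the N+1 problem).
        // A fetch join pulls the orders and their lines back in a
        // single SELECT instead; distinct deduplicates the parent rows.
        @SuppressWarnings("unchecked")
        public List<Order> findOrdersWithLines(Session session) {
            return session.createQuery(
                    "select distinct o from Order o left join fetch o.lines")
                    .list();
        }
    }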
From: Arne Vajhøj on 25 Apr 2010 13:00

On 25-04-2010 12:14, Tom Anderson wrote:
> On Sun, 25 Apr 2010, Arne Vajhøj wrote:
>
>> On 25-04-2010 01:03, Jack wrote:
>>
>>> When I work on database development projects, I use JDBC and SQL. Many
>>> people use hibernate/spring. Can somebody explain the pros and cons of
>>> using JDBC and SQL vs using hibernate/spring on database
>>> developments?
>>
>> That is a rather big discussion.
>>
>> The ultra short version is:
>> - ORM (Hibernate or other) is best when the problem to
>>   be solved is CRUD of objects
>> - pure SQL (JDBC) is best when you want to do something
>>   more unusual
>
> I'd rephrase that slightly to say that ORM is best when you want to deal
> with your data as objects - when you need to be able to call methods,
> traverse object graphs, and generally think of your data as objects.
>
> If your data is something that isn't usefully thought of as objects
> (perhaps a big boring spew of temperature measurements over time or
> something) then there isn't much benefit to ORM. There's probably no
> real harm either, so if you prefer ORM, you can still use it.
>
> And as Arne said, when you're trying to do something unusual, you may be
> outside the limits of what ORM can comfortably do, and you'll be better
> off using straight JDBC. Or perhaps a combination of ORM for any CRUDdy
> / domain logicky bits, and JDBC for complex queries.

It is possible to mix different persistence technologies, but it raises
lots of potential consistency issues.

I would avoid it if possible.

Arne
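One way to do the combination while sidestepping the consistency issues
Arne mentions is to run the raw JDBC on the Session's own connection,
inside the same transaction, so the SQL sees the same uncommitted state
as the ORM operations. Hibernate 3.2+ exposes that through
Session.doWork(); the report SQL and table names below are invented.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import org.hibernate.Session;
    import org.hibernate.jdbc.Work;

    public class ReportQuery {
        public void run(Session session) {
            // The Work callback receives the very connection the
            // Session is using, so ORM writes earlier in the same
            // transaction are visible to this query.
            session.doWork(new Work() {
                public void execute(Connection connection) throws SQLException {
                    PreparedStatement ps = connection.prepareStatement(
                        "select customer_id, sum(total) from orders "
                        + "group by customer_id");
                    try {
                        ResultSet rs = ps.executeQuery();
                        while (rs.next()) {
                            System.out.println(
                                rs.getLong(1) + " -> " + rs.getBigDecimal(2));
                        }
                    } finally {
                        ps.close();
                    }
                }
            });
        }
    }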
From: markspace on 25 Apr 2010 13:35

Zlatko Duric wrote:
> But there is some
> information about all those objects I'd like to store in a single table
> or maybe two of them, that'd be super-fast to reach, without having to
> look for all those parent/children/node/parameters/other links ...

> Now, how common is this approach (combination)? Is there something
> really important I should read about this, before starting with the
> implementation?

As far as I know, de-normalizing a database for faster access is very
common, as long as you started with a good normalized design, you
document carefully what you denormalize, and you measure carefully the
performance boost and can justify the extra maintenance.

I don't have any links handy, but if you Google for "database
denormalization optimization" there seems to be plenty of info. I'd try
some standard techniques for denormalization first, rather than try to
improvise something.
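One such standard technique is to keep the normalized tables
authoritative and treat the denormalized table as a rebuildable cache,
so a bug in the summary can never become the only copy of the truth. A
rough JDBC sketch; the node/node_summary schema is invented for
illustration, and the caller is assumed to wrap this in a transaction.

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class SummaryRefresher {
        // Rebuilds the denormalized summary entirely from the
        // normalized source, so it stays derivable at all times.
        public void refresh(Connection connection) throws SQLException {
            Statement st = connection.createStatement();
            try {
                st.executeUpdate("delete from node_summary");
                st.executeUpdate(
                    "insert into node_summary (node_id, parent_name, child_count) "
                    + "select n.id, p.name, count(c.id) "
                    + "from node n "
                    + "join node p on p.id = n.parent_id "
                    + "left join node c on c.parent_id = n.id "
                    + "group by n.id, p.name");
            } finally {
                st.close();
            }
        }
    }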
From: Arne Vajhøj on 25 Apr 2010 13:40

On 25-04-2010 13:35, markspace wrote:
> Zlatko Duric wrote:
>> But there is some information about all those objects I'd like to
>> store in a single table or maybe two of them, that'd be super-fast to
>> reach, without having to look for all those
>> parent/children/node/parameters/other links ...
>
>> Now, how common is this approach (combination)? Is there something
>> really important I should read about this, before starting with the
>> implementation?
>
> As far as I know, de-normalizing a database for faster access is very
> common, as long as you started with a good normalized design, and you
> document carefully what you denormalize, and you measure carefully the
> performance boost and can justify the extra maintenance.
>
> I don't have any links handy, but if you Google for "database
> denormalization optimization" there seems to be plenty of info. I'd try
> some standard techniques for denormalization first, rather than try to
> improvise something.

It is very common to denormalise databases for "performance".

I have a strong suspicion that in more than 90% of cases it is
unwarranted.

Databases are extremely optimized to do joins efficiently.

If the logical and physical design is good, then joins are usually not
the problem.

Even if joins are a problem, the specific database may offer
materialized views to solve it.

Arne
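For reference, a materialized view asks the database itself to maintain
the precomputed copy, so application code never has to keep two
representations in sync by hand. The DDL is vendor-specific (shown here
in Oracle-flavored syntax via JDBC, with an invented schema), and the
refresh policy (on commit vs. on demand) depends on the vendor and
setup.

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class MaterializedViewSetup {
        public void create(Connection connection) throws SQLException {
            Statement st = connection.createStatement();
            try {
                // Oracle-flavored DDL; other vendors differ or lack the
                // feature. Queries can then read node_summary_mv like an
                // ordinary table, while the database keeps it consistent
                // with the base tables.
                st.executeUpdate(
                    "create materialized view node_summary_mv "
                    + "as select n.parent_id, count(*) as child_count "
                    + "from node n group by n.parent_id");
            } finally {
                st.close();
            }
        }
    }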
From: Lew on 25 Apr 2010 14:09

Zlatko Duric wrote:
>>> But there is some information about all those objects I'd like to
>>> store in a single table or maybe two of them, that'd be super-fast to
>>> reach, without having to look for all those
>>> parent/children/node/parameters/other links ...

Wrong approach.

>>> Now, how common is this approach (combination)? Is there something

Common doesn't mean correct.

>>> really important I should read about this, before starting with the
>>> implementation?

Yes. Read about why normalization is important in the first place. Read
about the things others have mentioned, like (materialized) views. Read
about why "premature optimization is the root of all evil."

Whatever you do to "optimize" won't, at least not unless you actually
*measure* performance before and after your so-called "optimizations"
under realistic loads and field conditions.

Don't forget to take into account the cost of the increased code
complexity for denormalized structures, and compare that to the cost of
keeping data normalized. Don't forget to take into account the actuarial
cost of the risk to your data from the denormalization.

Better yet, stick with best practices.

markspace wrote:
>> As far as I know, de-normalizing a database for faster access is very
>> common, as long as you started with a good normalized design, and you
>> document carefully what you denormalize, and you measure carefully the
>> performance boost and can justify the extra maintenance.
>>
>> I don't have any links handy, but if you Google for "database
>> denormalization optimization" there seems to be plenty of info. I'd try
>> some standard techniques for denormalization first, rather than try to
>> improvise something.

Arne Vajhøj wrote:
> It is very common to denormalise databases for "performance".

OP: Notice how some of us put "performance" in quotation marks? There's
a good reason for that.

> I have a strong suspicion that in more than 90% of cases it
> is unwarranted.

You are being kind.

> Databases are extremely optimized to do joins efficiently.
>
> If the logical and physical design is good, then joins are
> usually not the problem.
>
> Even if joins are a problem, the specific database may offer
> materialized views to solve it.

When one denormalizes a database for "performance", one usually winds up
with none of the expected performance gains and all of the expected
increase in risk to the data. At least, one ought to expect that risk.
The purpose of normalizing a database is to prevent data anomalies and
enforce data constraints. Denormalize and you screw that up.

As for ORM efficiency, as Arne pointed out:

> Hibernate [or any other ORM framework] can be slow
> and Hibernate can be fast.
> It all depends on the guy writing the code.

Properly written, JPA code is no slower than raw SQL coding, takes less
time to develop and maintain (part of the cost-benefit equation,
folks!), and is a much more natural fit to the object model of the
application.

Furthermore, JPA frameworks offload much of the management effort for
persistent storage connections and O-R mapping. This is similar to how
managed-memory environments like the JVM and .Net offload the effort,
expense and risk of memory management from the programmer. Don't give
that up lightly.

Beyond that, Hibernate and other JPA frameworks lend themselves well to
inbuilt and outboard cache approaches. Out of the box, JPA gives you a
"level one" cache (a.k.a. a "session") that will help optimize
interaction with the database.
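A minimal sketch of what that level-one cache buys you, using the plain
JPA API; the Customer entity and the "examplePU" persistence unit are
invented for illustration.

    import javax.persistence.EntityManager;
    import javax.persistence.EntityManagerFactory;
    import javax.persistence.Persistence;

    // Hypothetical entity (mapping omitted):
    //   @Entity public class Customer { @Id Long id; ... }
    public class Level1CacheDemo {
        public static void main(String[] args) {
            EntityManagerFactory emf =
                    Persistence.createEntityManagerFactory("examplePU");
            EntityManager em = emf.createEntityManager();
            try {
                Customer a = em.find(Customer.class, 42L); // hits the database
                Customer b = em.find(Customer.class, 42L); // served from the
                        // persistence context; no second SELECT is issued
                System.out.println(a == b); // true: the same managed instance
            } finally {
                em.close();
                emf.close();
            }
        }
    }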
If you're looking to optimize database access, it is far more productive
to pay attention to things like client- and server-side statement
preparation, scope and lifecycle of 'EntityManager' instances, database
tuning parameters (such as work memory or whatever the DBMS calls it),
connection pooling, disk speed (use high-rotational-speed drives in a
RAID array with a battery-backed RAID controller), scalability,
concurrency, indexes, partitioning (database, not disk), and other
adjustments that will improve performance WITHOUT TOTALLY HOSING YOUR
DATA MODEL.

A good object model that caches well without concurrency bottlenecks
will scale well to additional hardware and provide much more performance
than a messed-up data model, without the risks of the latter.

I've been part of performance optimization efforts for databases a
number of times. Denormalization usually has hurt, not helped. In one
case that I recall fondly, the denormalized structure caused a quadratic
increase in processing time with data quantity, rather than the linear
increase a normalized database would have provided (and did, when they
finally accepted my recommendation to normalize, but not before causing
a major problem with their customer that got the project manager
replaced and nearly cost the contract).

Program run time is rarely the dominant cost in a project. Code
correctly and well first. That will almost always give sufficient
performance. If not, MEASURE and optimize and MEASURE again, focusing
first and foremost on things that don't mess you up by harming data
integrity, maintainability or scalability.

--
Lew
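To illustrate the statement-preparation point from the list above:
preparing once, binding many times, and batching is exactly the kind of
tuning that speeds up access without touching the data model. A sketch
with an invented measurement table.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class MeasurementWriter {
        // The statement is parsed and planned once; each row only
        // binds parameters, and the batch goes to the server in a
        // single round trip instead of one per row.
        public void insertAll(Connection connection, List<double[]> rows)
                throws SQLException {
            PreparedStatement ps = connection.prepareStatement(
                    "insert into measurement (sensor_id, value) values (?, ?)");
            try {
                for (double[] row : rows) {
                    ps.setInt(1, (int) row[0]);
                    ps.setDouble(2, row[1]);
                    ps.addBatch();      // queue locally
                }
                ps.executeBatch();      // one round trip for the whole batch
            } finally {
                ps.close();
            }
        }
    }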