From: Arne Vajhøj on 25 Apr 2010 12:58

On 25-04-2010 12:44, Zlatko Duric wrote:
> On 04/25/2010 06:14 PM, Tom Anderson wrote:
>> And as Arne said, when you're trying to do something unusual, you may be
>> outside the limits of what ORM can comfortably do, and you'll be better
>> off using straight JDBC. Or perhaps a combination of ORM for any CRUDdy
>> / domain logicky bits, and JDBC for complex queries.
>
> I inherited something that uses Hibernate, and I'm thinking about
> speeding up a few things. I was just thinking about how difficult it
> would be to speed all the slow stuff up by replacing all the Hibernate
> code with JDBC queries, and with my experience there's no chance I'll
> be doing that. But this approach (combination of ORM and JDBC) sounds
> very interesting to me.
>
> Now, my data is all objects - that suits me perfectly. But there is some
> information about all those objects I'd like to store in a single table
> or maybe two of them, that'd be super-fast to reach, without having to
> look for all those parent/children/node/parameters/other links and
> without having other issues to think about. I believe that part of the
> features would benefit from it a lot in terms of performance.
>
> Now, how common is this approach (combination)? Is there something
> really important I should read about this, before starting with the
> implementation?

If you are accessing the data as objects, then I don't think that
switching from Hibernate to raw JDBC is the right direction to optimize
the code.

Instead you should focus on tuning Hibernate and the database itself.

Hibernate can be slow and Hibernate can be fast. It all depends on the
guy writing the code.

Arne
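To make "tuning Hibernate" concrete, here is a rough sketch of one of
the most common fixes: eliminating N+1 selects with a fetch join. It
assumes the Hibernate 3 Session API; the Order entity and its lazy
"lines" collection are invented for illustration.

    import java.util.List;
    import org.hibernate.Session;

    // Hypothetical mapping (details omitted):
    //   Order has a lazily loaded collection "lines" of OrderLine.
    public class OrderDao {
        // Naive code that walks order.getLines() after loading the
        // orders issues one extra SELECT per order (the N+1 problem).
        // A fetch join pulls the orders and their lines back in a
        // single SELECT instead; distinct deduplicates the parent rows.
        @SuppressWarnings("unchecked")
        public List<Order> findOrdersWithLines(Session session) {
            return session.createQuery(
                    "select distinct o from Order o left join fetch o.lines")
                    .list();
        }
    }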
From: Arne Vajhøj on 25 Apr 2010 13:00

On 25-04-2010 12:14, Tom Anderson wrote:
> On Sun, 25 Apr 2010, Arne Vajhøj wrote:
>
>> On 25-04-2010 01:03, Jack wrote:
>>
>>> When I work on database development projects, I use JDBC and SQL. Many
>>> people use hibernate/spring. Can somebody explain the pros and cons of
>>> using JDBC and SQL vs using hibernate/spring on database
>>> developments?
>>
>> That is a rather big discussion.
>>
>> The ultra short version is:
>> - ORM (Hibernate or other) is best when the problem to
>>   be solved is CRUD of objects
>> - pure SQL (JDBC) is best when you want to do something
>>   more unusual
>
> I'd rephrase that slightly to say that ORM is best when you want to deal
> with your data as objects - when you need to be able to call methods,
> traverse object graphs, and generally think of your data as objects.
>
> If your data is something that isn't usefully thought of as objects
> (perhaps a big boring spew of temperature measurements over time or
> something) then there isn't much benefit to ORM. There's probably no
> real harm either, so if you prefer ORM, you can still use it.
>
> And as Arne said, when you're trying to do something unusual, you may be
> outside the limits of what ORM can comfortably do, and you'll be better
> off using straight JDBC. Or perhaps a combination of ORM for any CRUDdy
> / domain logicky bits, and JDBC for complex queries.

It is possible to mix different persistence technologies, but it raises
lots of potential consistency issues.

I would avoid it if possible.

Arne
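One way to do the combination while sidestepping the consistency issues
Arne mentions is to run the raw JDBC on the Session's own connection,
inside the same transaction, so the SQL sees the same uncommitted state
as the ORM operations. Hibernate 3.2+ exposes that through
Session.doWork(); the report SQL and table names below are invented.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import org.hibernate.Session;
    import org.hibernate.jdbc.Work;

    public class ReportQuery {
        public void run(Session session) {
            // The Work callback receives the very connection the
            // Session is using, so ORM writes earlier in the same
            // transaction are visible to this query.
            session.doWork(new Work() {
                public void execute(Connection connection) throws SQLException {
                    PreparedStatement ps = connection.prepareStatement(
                        "select customer_id, sum(total) from orders "
                        + "group by customer_id");
                    try {
                        ResultSet rs = ps.executeQuery();
                        while (rs.next()) {
                            System.out.println(
                                rs.getLong(1) + " -> " + rs.getBigDecimal(2));
                        }
                    } finally {
                        ps.close();
                    }
                }
            });
        }
    }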
From: markspace on 25 Apr 2010 13:35

Zlatko Duric wrote:
> But there is some
> information about all those objects I'd like to store in a single table
> or maybe two of them, that'd be super-fast to reach, without having to
> look for all those parent/children/node/parameters/other links ...

> Now, how common is this approach (combination)? Is there something
> really important I should read about this, before starting with the
> implementation?

As far as I know, de-normalizing a database for faster access is very
common, as long as you started with a good normalized design, you
document carefully what you denormalize, and you measure carefully the
performance boost and can justify the extra maintenance.

I don't have any links handy, but if you Google for "database
denormalization optimization" there seems to be plenty of info. I'd try
some standard techniques for denormalization first, rather than try to
improvise something.
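One such standard technique is to keep the normalized tables
authoritative and treat the denormalized table as a rebuildable cache,
so a bug in the summary can never become the only copy of the truth. A
rough JDBC sketch; the node/node_summary schema is invented for
illustration, and the caller is assumed to wrap this in a transaction.

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class SummaryRefresher {
        // Rebuilds the denormalized summary entirely from the
        // normalized source, so it stays derivable at all times.
        public void refresh(Connection connection) throws SQLException {
            Statement st = connection.createStatement();
            try {
                st.executeUpdate("delete from node_summary");
                st.executeUpdate(
                    "insert into node_summary (node_id, parent_name, child_count) "
                    + "select n.id, p.name, count(c.id) "
                    + "from node n "
                    + "join node p on p.id = n.parent_id "
                    + "left join node c on c.parent_id = n.id "
                    + "group by n.id, p.name");
            } finally {
                st.close();
            }
        }
    }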
From: Arne Vajhøj on 25 Apr 2010 13:40

On 25-04-2010 13:35, markspace wrote:
> Zlatko Duric wrote:
>> But there is some information about all those objects I'd like to
>> store in a single table or maybe two of them, that'd be super-fast to
>> reach, without having to look for all those
>> parent/children/node/parameters/other links ...
>
>> Now, how common is this approach (combination)? Is there something
>> really important I should read about this, before starting with the
>> implementation?
>
> As far as I know, de-normalizing a database for faster access is very
> common, as long as you started with a good normalized design, and you
> document carefully what you denormalize, and you measure carefully the
> performance boost and can justify the extra maintenance.
>
> I don't have any links handy, but if you Google for "database
> denormalization optimization" there seems to be plenty of info. I'd try
> some standard techniques for denormalization first, rather than try to
> improvise something.

It is very common to denormalise databases for "performance".

I have a strong suspicion that in more than 90% of cases it is
unwarranted.

Databases are extremely optimized to do joins efficiently.

If the logical and physical design is good, then joins are usually not
the problem.

Even if joins are a problem, the specific database may offer
materialized views to solve it.

Arne
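For reference, a materialized view asks the database itself to maintain
the precomputed copy, so application code never has to keep two
representations in sync by hand. The DDL is vendor-specific (shown here
in Oracle-flavored syntax via JDBC, with an invented schema), and the
refresh policy (on commit vs. on demand) depends on the vendor and
setup.

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class MaterializedViewSetup {
        public void create(Connection connection) throws SQLException {
            Statement st = connection.createStatement();
            try {
                // Oracle-flavored DDL; other vendors differ or lack the
                // feature. Queries can then read node_summary_mv like an
                // ordinary table, while the database keeps it consistent
                // with the base tables.
                st.executeUpdate(
                    "create materialized view node_summary_mv "
                    + "as select n.parent_id, count(*) as child_count "
                    + "from node n group by n.parent_id");
            } finally {
                st.close();
            }
        }
    }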
From: Lew on 25 Apr 2010 14:09

Zlatko Duric wrote:
>>> But there is some information about all those objects I'd like to
>>> store in a single table or maybe two of them, that'd be super-fast to
>>> reach, without having to look for all those
>>> parent/children/node/parameters/other links ...

Wrong approach.

>>> Now, how common is this approach (combination)? Is there something

Common doesn't mean correct.

>>> really important I should read about this, before starting with the
>>> implementation?

Yes. Read about why normalization is important in the first place. Read
about the things others have mentioned, like (materialized) views. Read
about why "premature optimization is the root of all evil."

Whatever you do to "optimize" won't, at least not unless you actually
*measure* performance before and after your so-called "optimizations"
under realistic loads and field conditions.

Don't forget to take into account the cost of the increased code
complexity for denormalized structures, and compare that to the cost of
keeping data normalized. Don't forget to take into account the actuarial
cost of the risk to your data from the denormalization.

Better yet, stick with best practices.

markspace wrote:
>> As far as I know, de-normalizing a database for faster access is very
>> common, as long as you started with a good normalized design, and you
>> document carefully what you denormalize, and you measure carefully the
>> performance boost and can justify the extra maintenance.
>>
>> I don't have any links handy, but if you Google for "database
>> denormalization optimization" there seems to be plenty of info. I'd try
>> some standard techniques for denormalization first, rather than try to
>> improvise something.

Arne Vajhøj wrote:
> It is very common to denormalise databases for "performance".

OP: Notice how some of us put "performance" in quotation marks? There's
a good reason for that.

> I have a strong suspicion that in more than 90% of cases it
> is unwarranted.

You are being kind.

> Databases are extremely optimized to do joins efficiently.
>
> If the logical and physical design is good, then joins are
> usually not the problem.
>
> Even if joins are a problem, the specific database may offer
> materialized views to solve it.

When one denormalizes a database for "performance", one usually winds up
with none of the expected performance gains and all of the expected
increase in risk to the data. At least, one ought to expect that risk.
The purpose of normalizing a database is to prevent data anomalies and
enforce data constraints. Denormalize and you screw that up.

As for ORM efficiency, as Arne pointed out:

> Hibernate [or any other ORM framework] can be slow
> and Hibernate can be fast.
> It all depends on the guy writing the code.

Properly written, JPA code is no slower than raw SQL coding, takes less
time to develop and maintain (part of the cost-benefit equation,
folks!), and is a much more natural fit to the object model of the
application.

Furthermore, JPA frameworks offload much of the management effort for
persistent storage connections and O-R mapping. This is similar to how
managed-memory environments like the JVM and .Net offload the effort,
expense and risk of memory management from the programmer. Don't give
that up lightly.

Beyond that, Hibernate and other JPA frameworks lend themselves well to
inbuilt and outboard cache approaches. Out of the box, JPA gives you a
"level one" cache (a.k.a. a "session") that will help optimize
interaction with the database.
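A minimal sketch of what that level-one cache buys you, using the plain
JPA API; the Customer entity and the "examplePU" persistence unit are
invented for illustration.

    import javax.persistence.EntityManager;
    import javax.persistence.EntityManagerFactory;
    import javax.persistence.Persistence;

    // Hypothetical entity (mapping omitted):
    //   @Entity public class Customer { @Id Long id; ... }
    public class Level1CacheDemo {
        public static void main(String[] args) {
            EntityManagerFactory emf =
                    Persistence.createEntityManagerFactory("examplePU");
            EntityManager em = emf.createEntityManager();
            try {
                Customer a = em.find(Customer.class, 42L); // hits the database
                Customer b = em.find(Customer.class, 42L); // served from the
                        // persistence context; no second SELECT is issued
                System.out.println(a == b); // true: the same managed instance
            } finally {
                em.close();
                emf.close();
            }
        }
    }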
If you're looking to optimize database access, it is far more productive
to pay attention to things like client- and server-side statement
preparation, scope and lifecycle of 'EntityManager' instances, database
tuning parameters (such as work memory or whatever the DBMS calls it),
connection pooling, disk speed (use high-rotational-speed drives in a
RAID array with a battery-backed RAID controller), scalability,
concurrency, indexes, partitioning (database, not disk), and other
adjustments that will improve performance WITHOUT TOTALLY HOSING YOUR
DATA MODEL.

A good object model that caches well without concurrency bottlenecks
will scale well to additional hardware and provide much more performance
than a messed-up data model, without the risks of the latter.

I've been part of performance optimization efforts for databases a
number of times. Denormalization usually has hurt, not helped. In one
case that I recall fondly, the denormalized structure caused a quadratic
increase in processing time with data quantity, rather than the linear
increase a normalized database would have provided (and did, when they
finally accepted my recommendation to normalize, but not before causing
a major problem with their customer that got the project manager
replaced and nearly cost the contract).

Program run time is rarely the dominant cost in a project. Code
correctly and well first. That will almost always give sufficient
performance. If not, MEASURE and optimize and MEASURE again, focusing
first and foremost on things that don't mess you up by harming data
integrity, maintainability or scalability.

--
Lew
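To illustrate the statement-preparation point from the list above:
preparing once, binding many times, and batching is exactly the kind of
tuning that speeds up access without touching the data model. A sketch
with an invented measurement table.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class MeasurementWriter {
        // The statement is parsed and planned once; each row only
        // binds parameters, and the batch goes to the server in a
        // single round trip instead of one per row.
        public void insertAll(Connection connection, List<double[]> rows)
                throws SQLException {
            PreparedStatement ps = connection.prepareStatement(
                    "insert into measurement (sensor_id, value) values (?, ?)");
            try {
                for (double[] row : rows) {
                    ps.setInt(1, (int) row[0]);
                    ps.setDouble(2, row[1]);
                    ps.addBatch();      // queue locally
                }
                ps.executeBatch();      // one round trip for the whole batch
            } finally {
                ps.close();
            }
        }
    }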