Saturday, January 14, 2012

Hibernate or JDBC - play smart

We know how ORM frameworks like Hibernate and JPA make a developer's life easier. Given a choice, we would always code in objects for manipulating data. But once in a while you come across a use case where JDBC code clearly trumps Hibernate in terms of performance. I came across such a use case, where I was required to parse thousands of SQL statements and insert the data into relational tables. What I found was that even after I followed Hibernate's best practices, the Hibernate code was still twice as slow as JDBC. This post is meant to generate conversation and to invite expert opinion on how to determine when not to use Hibernate. Is my use case a perfect example of this? Do you know of any other use cases?

Typically, Hibernate's best practices for improving performance include the following techniques related to my use case (a configuration sketch follows the list).

  • Use batching when performing bulk inserts/updates
  • Use the second-level cache
  • Specify which objects to cache
  • Clear the session to avoid object accumulation in the first-level cache

And so forth...
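For reference, here is a minimal sketch of how these knobs are typically set. It is not the exact configuration I used; the batch size of 20 and the Ehcache provider are illustrative assumptions, shown programmatically on a Hibernate 3.x Configuration object.

```java
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateSetup {

    // Builds a SessionFactory with JDBC batching and the second-level cache
    // enabled. The concrete values (batch size 20, Ehcache as the provider)
    // are illustrative assumptions, not my exact setup.
    public static SessionFactory buildSessionFactory() {
        Configuration cfg = new Configuration().configure(); // reads hibernate.cfg.xml

        // Batch inserts/updates in groups of 20 JDBC statements
        cfg.setProperty("hibernate.jdbc.batch_size", "20");

        // Turn on the second-level cache and plug in a cache provider
        cfg.setProperty("hibernate.cache.use_second_level_cache", "true");
        cfg.setProperty("hibernate.cache.provider_class",
                "net.sf.ehcache.hibernate.EhCacheProvider");

        return cfg.buildSessionFactory();
    }
}
```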

I found that the last option, flushing the session often, was the most helpful for my project. But I wanted to post this information to invite help, guidance, and advice from Hibernate experts on performance tuning in general, and to verify whether what I did (switch to JDBC) was the best option for the fastest performance.

For the more curious folks, here are the top five URLs from a Google search on 'hibernate performance tuning'; none of them discuss the use case I have here.

http://www.javadev.org/files/Hibernate%20Performance%20Tuning.pdf
http://java.dzone.com/articles/hibernate-performance-tuning
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html
http://www.javaperformancetuning.com/news/interview041.shtml
http://arnosoftwaredev.blogspot.com/2011/01/hibernate-performance-tips.html

The application where I had to use JDBC instead of Hibernate for performance reasons was really very simple. Continuously streamed SQL data was parsed and loaded into three relational tables: a parent, a child, and a child of the child. Let's call them tables A, B, and C respectively for simplicity. There was a 1-1 relation between A and B, while C had a variable number of rows for each row of table B.
[Figure: Table relationship]
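To make the structure concrete, here is a hedged sketch of how the three tables could be mapped with JPA annotations. The entity names, column names, and cascades (EntityA, EntityB, EntityC, b_id, and so on) are placeholders rather than the real schema, and getters, setters, and constructors are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import javax.persistence.*;

@Entity
@Table(name = "A")
public class EntityA {
    @Id @GeneratedValue
    private Long id;

    // 1-1: every row in A has exactly one corresponding row in B
    @OneToOne(cascade = CascadeType.ALL)
    @JoinColumn(name = "b_id")
    private EntityB b;
}

@Entity
@Table(name = "B")
class EntityB {
    @Id @GeneratedValue
    private Long id;

    // 1-many: each row in B owns a variable number of rows in C
    @OneToMany(mappedBy = "parent", cascade = CascadeType.ALL)
    private List<EntityC> children = new ArrayList<EntityC>();
}

@Entity
@Table(name = "C")
class EntityC {
    @Id @GeneratedValue
    private Long id;

    @ManyToOne
    @JoinColumn(name = "b_id")
    private EntityB parent;
}
```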

The code I wrote with Hibernate added a sample of 100K rows to A, 100K rows to B, and around 200K rows to table C. There was no need to read the data while it was being written. The 'for loop' I wrote repeated 100K times, with each iteration adding a row to table A first, followed by table B and then table C. Before I debugged my Hibernate code, performance was very sluggish: the speed of the program progressively decreased and almost slowed to a crawl before it threw an 'out of memory' error. I had to add session.flush() at the end of each cycle of the loop. Adding the session.flush() statement greatly improved write performance, and because there was no need to read the data at the time of writing, flushing the session carried no performance penalty.
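A minimal sketch of that write loop follows, assuming the entity mappings above and a hypothetical ParsedRecord helper that carries one parsed cycle of stream data. The flush at the end of every cycle is the change that kept memory flat; I have paired it here with a clear of the first-level cache, in line with the best-practice item listed earlier.

```java
import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class StreamLoader {

    private final SessionFactory sessionFactory;

    public StreamLoader(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // One iteration per parsed record: save a row in A (cascading to its B row
    // and that B's C rows), then flush so the inserts hit the database and
    // clear so the first-level cache does not keep growing.
    public void load(List<ParsedRecord> records) {   // ParsedRecord is a hypothetical helper
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            for (ParsedRecord record : records) {
                EntityA a = record.toEntityA();      // builds A with its B and C children attached
                session.save(a);

                session.flush();                     // push pending inserts to the database
                session.clear();                     // detach saved objects from the session cache
            }
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }
}
```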

Typically, in any real-life OLTP application, some user sessions will always be reading the data while other user sessions are actively writing it. Hibernate's best practices suggest the use of the second-level cache for better performance, and in most cases it works well when multiple sessions are adding data. Remember, Hibernate caches recently added data automatically in the session (first-level) cache. The second-level cache comes into play when multiple sessions are writing and reading data, but in the scenario I discussed, I had only one session writing and no session reading the data. Hence turning the second-level cache on or off had no impact on performance at all.
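For completeness, marking an entity for the second-level cache looks roughly like the sketch below; the READ_WRITE strategy is just an illustrative choice, and, as noted above, it made no difference in my single-writer, zero-reader test.

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Only entities explicitly marked like this participate in the second-level
// cache; READ_WRITE is shown purely as an example strategy.
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class CachedEntityB {

    @Id @GeneratedValue
    private Long id;

    // remaining fields and mappings as in the earlier sketch
}
```

Query results can be cached in a similar way with query.setCacheable(true), which would only start to matter once there are reading sessions.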

But a slightly modified scenario could potentially be a real-life use case. What are the Hibernate best practices to achieve optimum performance in such a use case? The modified scenario is to have one session writing the data and multiple sessions reading it. For example, results and statistics of the Summer Olympics are fed to a data store by some feed engine, while participants and viewers view the data in real time. What steps could we take to get the best read/write performance using Hibernate? Please consider, for discussion's sake, that the data from the feed engine is not a bulk upload but more of a live stream accepted using Hibernate code (I guess that's not really a best practice). Do we turn the second-level cache off, do we flush the session on each write cycle, do we batch the write operations, or something else?

Of course, performance tests using a simulated workload provide the best answer, but as an architect, how do we propose a solution that is ultimately validated by performance tests? The following table shows the performance numbers I got in my test runs with different configurations. The code was really simple and performed the operations I mentioned earlier.


[Table: Hibernate vs. JDBC performance numbers]
If you are interested, please add a comment and I will send the code out to you. Please note that the code was run from within Eclipse on Windows XP with a 512 MB max memory setting in Eclipse. I used a Sybase ASE 15.5 server to host the tables. The relationship between the tables is really simple: each table has a primary key, with a foreign key for the one-to-many relation between tables B and C and for the one-to-one relation between A and B. It's clear that flushing the session after each write cycle (after adding one record to table A, one to table B, and multiple records to table C) was the most important factor in getting a faster execution time. But the JDBC code still outperformed the Hibernate code; it was twice as fast as Hibernate. See the last row in the table.
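For comparison, the JDBC version of the same write cycle was essentially prepared statements executed in a loop. The sketch below is a simplified reconstruction rather than the exact code; the column names and the ParsedRecord/ChildRow helpers are placeholders.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class JdbcLoader {

    // Same write cycle as the Hibernate loop: one insert into A, one into B
    // and a variable number into C per parsed record, committed at the end.
    public void load(Connection conn, List<ParsedRecord> records) throws SQLException {
        conn.setAutoCommit(false);
        PreparedStatement insertA = conn.prepareStatement(
                "insert into A (id, payload) values (?, ?)");
        PreparedStatement insertB = conn.prepareStatement(
                "insert into B (id, a_id, payload) values (?, ?, ?)");
        PreparedStatement insertC = conn.prepareStatement(
                "insert into C (id, b_id, payload) values (?, ?, ?)");
        try {
            for (ParsedRecord record : records) {
                insertA.setLong(1, record.getAId());
                insertA.setString(2, record.getAPayload());
                insertA.executeUpdate();

                insertB.setLong(1, record.getBId());
                insertB.setLong(2, record.getAId());
                insertB.setString(3, record.getBPayload());
                insertB.executeUpdate();

                for (ChildRow child : record.getChildRows()) {
                    insertC.setLong(1, child.getId());
                    insertC.setLong(2, record.getBId());
                    insertC.setString(3, child.getPayload());
                    insertC.executeUpdate();
                }
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            insertA.close();
            insertB.close();
            insertC.close();
        }
    }
}
```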

Another interesting finding was that batching (with a batch size of 20, as suggested by the documentation) actually performed worse than not batching. From a complexity perspective, this application was really simple: not many tables, a simple object structure, simple relationships, no complex SQL. So it wasn't too difficult or cumbersome to use JDBC to get the best performance.
There is an abundance of technical literature available on the internet on every technical subject. We read and follow guidelines from well-regarded sources without thinking twice. But on some occasions it's useful to try things out and validate them for yourself. That's what I was forced to do, and I am glad I did it, because I got some interesting results to share with the community. All Hibernate enthusiasts, feel free to comment and advise.