Skip to main content

Mysql JDBC driver and Streaming large resultset

We are moving one legacy component from Berkeley db to mysql so that we can replicate it to distribute read requests and shard it to scale it. I had to dump the entire db contents and stream it over http to some other component. the db content for a single customer can range from 100K to 3-4 Million records. We are using spring JDBC to make the task of dealing with jdbc api simple. Now I was using a ResultSetExtractor to stream the resultset like this.

            ResultSetExtractor resultSet = new ResultSetExtractor() {
                @Override
                public Object extractData(ResultSet result) throws SQLException, DataAccessException {
                    while(result.next()){
                        XXXXX
                    }
                    return "";
                }
            };

getJdbcTemplate(context).getJdbcOperations().query( sql, resultSetExtractor);

But it appears that Mysql native JDBC driver loads entire resultset into memory before passing the control onto ResultSetExtractor and that was causing OOM.

By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.
To enable this functionality, create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
              java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this, any result sets created with the statement will be retrieved row-by-row.


This is explained here http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html 

It seems SimpleJdbcTemplate doesnt have a setFetchSize method and JdbcTemplate has a setFetchSize but that doesn not work when you call the query method and if you use statement then the applySettings method applies fetchSize only if its >0.  So the solution is to use PreparedStatement directly

callback = new PreparedStatementCallback() {
                @Override
                public Void doInPreparedStatement(PreparedStatement pstmt) throws SQLException, DataAccessException {
                    ResultSet rs = pstmt.executeQuery();
                    resultSetExtractor.extractData(rs);
                    rs.close();
                    return null;
                }
            };

        executeStreamed(jdbcTemplate, callback, sql);

    /**
     * http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html
     * Unless you specify the statement settings as below the mysql driver is going to load all results in memory.
     *
     * @param jdbcTemplate
     * @param callback
     * @param sql
     */
    protected void executeStreamed(SimpleJdbcTemplate jdbcTemplate, PreparedStatementCallback callback, final String sql) {
        PreparedStatementCreator creator = new PreparedStatementCreator() {
            @Override
            public PreparedStatement createPreparedStatement(Connection conn) throws SQLException {
                PreparedStatement pstmt = conn.prepareStatement(sql, java.sql.ResultSet.TYPE_FORWARD_ONLY,
                        java.sql.ResultSet.CONCUR_READ_ONLY);
                pstmt.setFetchSize(Integer.MIN_VALUE);
                return pstmt;
            }
        };
        jdbcTemplate.getJdbcOperations().execute(creator, callback);
    }


Comments

  1. Alternatively, we can subclass JdbcTemplate and override applySettings method and use this subclass.

    ReplyDelete

Post a Comment

Popular posts from this blog

RabbitMQ java clients for beginners

Here is a sample of a consumer and producer example for RabbitMQ. The steps are
Download ErlangDownload Rabbit MQ ServerDownload Rabbit MQ Java client jarsCompile and run the below two class and you are done.
This sample create a Durable Exchange, Queue and a Message. You will have to start the consumer first before you start the for the first time.

For more information on AMQP, Exchanges, Queues, read this excellent tutorial
http://blogs.digitar.com/jjww/2009/01/rabbits-and-warrens/

+++++++++++++++++RabbitMQProducer.java+++++++++++++++++++++++++++
import com.rabbitmq.client.Connection; import com.rabbitmq.client.Channel; import com.rabbitmq.client.*; public class RabbitMQProducer { public static void main(String []args) throws Exception { ConnectionFactory factory = new ConnectionFactory(); factory.setUsername("guest"); factory.setPassword("guest"); factory.setVirtualHost("/"); factory.setHost("127.0.0.1"); factory.setPort(5672); Conne…

Logging to Graphite monitoring tool from java

We use Graphite as a tool for monitoring some stats and watch trends. A requirement is to monitor impact of new releases as build is deployed to app nodes to see if things like
1) Has the memcache usage increased.
2) Has the no of Java exceptions went up.
3) Is the app using more tomcat threads.
Here is a screenshot

We changed the installer to log a deploy event when a new build is deployed. I wrote a simple spring bean to log graphite events using java. Logging to graphite is easy, all you need to do is open a socket and send lines of events.
import org.slf4j.Logger;import org.slf4j.LoggerFactory; import java.io.OutputStreamWriter; import java.io.Writer; import java.net.Socket; import java.util.HashMap; import java.util.Map; public class GraphiteLogger { private static final Logger logger = LoggerFactory.getLogger(GraphiteLogger.class); private String graphiteHost; private int graphitePort; public String getGraphiteHost() { return graphiteHost; } public void setGraphite…

Jersey posting multipart data

This took me sometime to figure out mostly it was because I was only including jersey-multipart-1.6.jar but I was not including mimepull-1.3.jar.

So the intent is to upload a file using REST api and we need pass meta attributes in addition to uploading the file. Also the intent is to stream the file instead of first storing it on the local disk. Here is some sample code.
@Path("/upload-service") public class UploadService { @Context protected HttpServletResponse response; @Context protected HttpServletRequest request; @POST @Consumes(MediaType.MULTIPART_FORM_DATA) @Produces(MediaType.APPLICATION_JSON) public String uploadFile(@PathParam("fileName") final String fileName, @FormDataParam("workgroupId") String workgroupId, @FormDataParam("userId") final int userId, @FormDataParam("content") final InputStream content) throws JSONException { //.......Upload the file to S3 or netapp or any storage service } }
Now to tes…