Monday, February 6, 2012

Mysql JDBC driver and Streaming large resultset

We are moving one legacy component from Berkeley db to mysql so that we can replicate it to distribute read requests and shard it to scale it. I had to dump the entire db contents and stream it over http to some other component. the db content for a single customer can range from 100K to 3-4 Million records. We are using spring JDBC to make the task of dealing with jdbc api simple. Now I was using a ResultSetExtractor to stream the resultset like this.

            ResultSetExtractor resultSet = new ResultSetExtractor() {
                @Override
                public Object extractData(ResultSet result) throws SQLException, DataAccessException {
                    while(result.next()){
                        XXXXX
                    }
                    return "";
                }
            };

getJdbcTemplate(context).getJdbcOperations().query( sql, resultSetExtractor);

But it appears that Mysql native JDBC driver loads entire resultset into memory before passing the control onto ResultSetExtractor and that was causing OOM.

By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.
To enable this functionality, create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
              java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this, any result sets created with the statement will be retrieved row-by-row.


This is explained here http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html 

It seems SimpleJdbcTemplate doesnt have a setFetchSize method and JdbcTemplate has a setFetchSize but that doesn not work when you call the query method and if you use statement then the applySettings method applies fetchSize only if its >0.  So the solution is to use PreparedStatement directly

callback = new PreparedStatementCallback() {
                @Override
                public Void doInPreparedStatement(PreparedStatement pstmt) throws SQLException, DataAccessException {
                    ResultSet rs = pstmt.executeQuery();
                    resultSetExtractor.extractData(rs);
                    rs.close();
                    return null;
                }
            };

        executeStreamed(jdbcTemplate, callback, sql);

    /**
     * http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html
     * Unless you specify the statement settings as below the mysql driver is going to load all results in memory.
     *
     * @param jdbcTemplate
     * @param callback
     * @param sql
     */
    protected void executeStreamed(SimpleJdbcTemplate jdbcTemplate, PreparedStatementCallback callback, final String sql) {
        PreparedStatementCreator creator = new PreparedStatementCreator() {
            @Override
            public PreparedStatement createPreparedStatement(Connection conn) throws SQLException {
                PreparedStatement pstmt = conn.prepareStatement(sql, java.sql.ResultSet.TYPE_FORWARD_ONLY,
                        java.sql.ResultSet.CONCUR_READ_ONLY);
                pstmt.setFetchSize(Integer.MIN_VALUE);
                return pstmt;
            }
        };
        jdbcTemplate.getJdbcOperations().execute(creator, callback);
    }


1 comment:

  1. Alternatively, we can subclass JdbcTemplate and override applySettings method and use this subclass.

    ReplyDelete