近来,项目中有一个耗时较长的Job存在CPU占用过高的问题,经排查发现,首要时刻耗费在往MyBatis中批量刺进数据。mapper configuration是用foreach循环做的,差不多是这样。(因为项目保密,以下代码均为自己手写的demo代码)

<insert id="batchInsert" parameterType="java.util.List">
    insert into USER (id, name) values  
    <foreach collection="list" item="model" index="index" separator=",">   
        (#{model.id}, #{model.name})  
    </foreach>  
</insert>  

这个办法提高批量刺进速度的原理是,将传统的:

INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");  
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");  
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");  
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");  

转化为:

INSERT INTO `table1` (`field1`, `field2`)
      VALUES ("data1", "data2"),  
             ("data1", "data2"),  
             ("data1", "data2"),  
             ("data1", "data2"),  
             ("data1", "data2");  

在MySql Docs中也提到过这个trick,假如要优化刺进速度时,能够将许多小型操作组合到一个大型操作中。抱负情况下,这样能够在单个衔接中一次性发送许多新行的数据,并将一切索引更新和一致性查看推迟到最后才进行。

乍看上去这个foreach没有问题,可是通过项目实践发现,当表的列数较多(20+),以及一次性刺进的行数较多(5000+)时,整个刺进的耗时非常漫长,达到了14分钟,这是不能忍的。在资猜中也提到了一句话:

Of course don’t combine ALL of them, if the amount is HUGE. Say you have 1000 rows you need to insert, then don’t do it one at a time. You shouldn’t equally try to have all 1000 rows in a single query. Instead break it into smaller sizes.

它强调,当刺进数量很多时,不能一次性全放在一条句子里。可是为什么不能放在同一条句子里呢?这条句子为什么会耗时这么久呢?我查阅了材料发现:

Insert inside Mybatis foreach is not batch, this is a single (could become giant) SQL statement and that brings drawbacks:

some database such as Oracle here does not support.

in relevant cases: there will be a large number of records to insert and the database configured limit (by default around 2000 parameters per statement) will be hit, and eventually possibly DB stack error if the statement itself become too large.

Iteration over the collection must not be done in the mybatis XML. Just execute a simple Insertstatement in a Java Foreach loop. The most important thing is the session Executor type.

SqlSession session = sessionFactory.openSession(ExecutorType.BATCH);
for(Model model : list) {
session.insert(“insertStatement”, model);
}
session.flushStatements();

Unlike default ExecutorType.SIMPLE, the statement will be prepared once and executed for each record to insert.

从资猜中可知,默认执行器类型为Simple,会为每个句子创立一个新的预处理句子,也便是创立一个PreparedStatement目标。

在我们的项目中,会不停地运用批量刺进这个办法,而因为MyBatis关于含有的句子,无法采用缓存,那么在每次调用办法时,都会重新解析sql句子。

Internally, it still generates the same single insert statement with many placeholders as the JDBC code above.

MyBatis has an ability to cache PreparedStatement, but this statement cannot be cached because it containselement and the statement varies depending on the parameters. As a result, MyBatis has to 1) evaluate the foreach part and 2) parse the statement string to build parameter mapping [1] on every execution of this statement. And these steps are relatively costly process when the statement string is big and contains many placeholders.

[1] simply put, it is a mapping between placeholders and the parameters.

从上述材料可知,耗时就耗在,因为我foreach后有5000+个values,所以这个PreparedStatement特别长,包含了很多占位符,关于占位符和参数的映射尤其耗时。并且,查阅相关材料可知,values的增加与所需的解析时刻,是呈指数型增加的。

这次被 foreach 坑惨了,再也不敢乱用了...

图片

所以,假如非要运用 foreach 的办法来进行批量刺进的话,能够考虑削减一条 insert 句子中 values 的个数,最好能达到上面曲线的最底部的值,使速度最快。一般按经历来说,一次性插20~50行数量是比较适宜的,时刻耗费也能承受。

要点来了。上面讲的是,假如非要用的办法来刺进,能够提高功能的办法。而实际上,MyBatis文档中写批量刺进的时候,是引荐运用别的一种办法。(能够看
www.mybatis.org/mybatis-dyn… Insert Support标题里的内容)

SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH);
try {  
    SimpleTableMapper mapper = session.getMapper(SimpleTableMapper.class);  
    List<SimpleTableRecord> records = getRecordsToInsert(); // not shown  
    BatchInsert<SimpleTableRecord> batchInsert = insert(records)  
            .into(simpleTable)  
            .map(id).toProperty("id")  
            .map(firstName).toProperty("firstName")  
            .map(lastName).toProperty("lastName")  
            .map(birthDate).toProperty("birthDate")  
            .map(employed).toProperty("employed")  
            .map(occupation).toProperty("occupation")  
            .build()  
            .render(RenderingStrategy.MYBATIS3);  
    batchInsert.insertStatements().stream().forEach(mapper::insert);  
    session.commit();  
} finally {  
    session.close();  
}  

即基本思想是将MyBatis session的executor type设为 Batch ,然后多次执行刺进句子。就类似于JDBC的下面句子相同。

Connection connection = DriverManager.getConnection("jdbc:mysql://127.0.0.1:3306/mydb?useUnicode=true&characterEncoding=UTF-8&useServerPrepStmts=false&rewriteBatchedStatements=true","root","root");  
connection.setAutoCommit(false);  
PreparedStatement ps = connection.prepareStatement(  
        "insert into tb_user (name) values(?)");  
for (int i = 0; i < stuNum; i++) {  
    ps.setString(1,name);  
    ps.addBatch();  
}  
ps.executeBatch();  
connection.commit();  
connection.close();  

通过实验,运用了ExecutorType.BATCH的刺进办法,功能明显提高,不到 2s 便能全部刺进完成。

总结一下

假如MyBatis需要进行批量刺进,引荐运用ExecutorType.BATCH的刺进办法,假如非要运用的刺进的话,需要将每次刺进的记录控制在 20~50 左右。