1.背景概述

客户业务产生死锁的报错,依据业务程序日志及业务流程,发现形成死锁的原因是:业务1 delete insert ,业务2 delete insert 2个业务交替履行导致的死锁;因为GAP锁堵塞了刺进意向锁,而且当delete的数据存在时死锁不会产生,当delete的数据不存在时,会产生死锁。

2.问题复现

本次测验根据 GreatSQL-8.0.32-24,阻隔级别为 RR

2.1 创建测验表

greatsql> create database test;
greatsql> create table test(c1 int unique key, c2 int, c3 int, c4 int);
greatsql> insert into test values (1,1,1,1),(3,3,3,3),(5,5,5,5),(9,9,9,9);
greatsql> select * from test;
 ------ ------ ------ ------ 
| c1  | c2  | c3  | c4  |
 ------ ------ ------ ------ 
|   1 |   1 |   1 |   1 |
|   3 |   3 |   3 |   3 |
|   5 |   5 |   5 |   5 |
|   9 |   9 |   9 |   9 |
 ------ ------ ------ ------ 
4 rows in set (0.01 sec)

2.2 业务履行顺序

按以下业务履行顺序,假如要删去的数据存在,则不会产生死锁;假如要删去的数据不存,而且要删去的数据在同一个GAP锁的区间内则会产生死锁;

时间 业务1 业务2
T1 BEGIN; BEGIN;
T2 delete from test where c1=?;
T3 delete from test where c1=?;
T4 insert into test value(?,?,?,?);
T5 insert into test value(?,?,?,?);

2.3 当delete的数据存在时

业务1:delete

greatsql> begin;
Query OK, 0 rows affected (0.00 sec)
greatsql> delete from test where c1=3;
Query OK, 1 row affected (0.00 sec)

检查锁信息:

greatsql> select THREAD_ID,OBJECT_SCHEMA,OBJECT_NAME,INDEX_NAME,LOCK_TYPE,LOCK_MODE,LOCK_STATUS,LOCK_DATA from performance_schema.data_locks;
 ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- 
| THREAD_ID | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME    | LOCK_TYPE | LOCK_MODE   | LOCK_STATUS | LOCK_DATA     |
 ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- 
|     59 | test      | test     | NULL     | TABLE   | IX       | GRANTED   | NULL        |
|     59 | test      | test     | c1       | RECORD   | X,REC_NOT_GAP | GRANTED   | 3, 0x000000000201 |
|     59 | test      | test     | GEN_CLUST_INDEX | RECORD   | X,REC_NOT_GAP | GRANTED   | 0x000000000201   |
 ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- 
3 rows in set (0.00 sec)

此刻业务1给 3, 0x000000000201 这条数据加了 记载锁 X,REC_NOT_GAP

业务2:delete

greatsql> begin;
Query OK, 0 rows affected (0.00 sec)
greatsql> delete from test where c1=5;
Query OK, 1 row affected (0.00 sec)

检查锁信息:

greatsql> select THREAD_ID,OBJECT_SCHEMA,OBJECT_NAME,INDEX_NAME,LOCK_TYPE,LOCK_MODE,LOCK_STATUS,LOCK_DATA from performance_schema.data_locks;
 ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- 
| THREAD_ID | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME    | LOCK_TYPE | LOCK_MODE   | LOCK_STATUS | LOCK_DATA     |
 ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- 
|     57 | test      | test     | NULL       | TABLE   | IX       | GRANTED   | NULL        |
|     57 | test      | test     | c1        | RECORD   | X,REC_NOT_GAP | GRANTED   | 5, 0x000000000202 |
|     57 | test      | test     | GEN_CLUST_INDEX | RECORD   | X,REC_NOT_GAP | GRANTED   | 0x000000000202   |
|     59 | test      | test     | NULL       | TABLE   | IX       | GRANTED   | NULL        |
|     59 | test      | test     | c1        | RECORD   | X,REC_NOT_GAP | GRANTED   | 3, 0x000000000201 |
|     59 | test      | test     | GEN_CLUST_INDEX | RECORD   | X,REC_NOT_GAP | GRANTED   | 0x000000000201   |
 ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- 
6 rows in set (0.00 sec)

此刻业务2给 5, 0x000000000202 这条数据加了 记载锁 X,REC_NOT_GAP

业务1:insert

greatsql> insert into test value(3,3,3,3);
Query OK, 1 row affected (0.00 sec)

检查锁信息:

greatsql> select THREAD_ID,OBJECT_SCHEMA,OBJECT_NAME,INDEX_NAME,LOCK_TYPE,LOCK_MODE,LOCK_STATUS,LOCK_DATA from performance_schema.data_locks;
 ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- 
| THREAD_ID | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME    | LOCK_TYPE | LOCK_MODE   | LOCK_STATUS | LOCK_DATA     |
 ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- 
|     57 | test      | test     | NULL       | TABLE   | IX       | GRANTED   | NULL        |
|     57 | test      | test     | c1        | RECORD   | X,REC_NOT_GAP | GRANTED   | 5, 0x000000000202 |
|     57 | test      | test     | GEN_CLUST_INDEX | RECORD   | X,REC_NOT_GAP | GRANTED   | 0x000000000202   |
|     59 | test      | test     | NULL       | TABLE   | IX       | GRANTED   | NULL        |
|     59 | test      | test     | c1        | RECORD   | X,REC_NOT_GAP | GRANTED   | 3, 0x000000000201 |
|     59 | test      | test     | GEN_CLUST_INDEX | RECORD   | X,REC_NOT_GAP | GRANTED   | 0x000000000201   |
|     59 | test      | test     | c1        | RECORD   | S,GAP     | GRANTED   | 3, 0x000000000201 |
|     59 | test      | test     | c1        | RECORD   | S,GAP     | GRANTED   | 5, 0x000000000202 |
|     59 | test      | test     | c1        | RECORD   | S,GAP     | GRANTED   | 3, 0x000000000206 |
 ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- 
9 rows in set (0.00 sec)

此刻业务1给 被delete删去的数据 3, 0x000000000201 ,刺进的数据 3, 0x000000000202,以及相邻的下一条数据 5, 0x000000000202 加了 空隙锁 S,GAP

业务2:insert

greatsql> insert into test value(5,5,5,5);
Query OK, 1 row affected (0.01 sec)

检查锁信息:

greatsql> select THREAD_ID,OBJECT_SCHEMA,OBJECT_NAME,INDEX_NAME,LOCK_TYPE,LOCK_MODE,LOCK_STATUS,LOCK_DATA from performance_schema.data_locks;
 ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- 
| THREAD_ID | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME    | LOCK_TYPE | LOCK_MODE   | LOCK_STATUS | LOCK_DATA     |
 ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- 
|     57 | test      | test     | NULL       | TABLE   | IX       | GRANTED   | NULL        |
|     57 | test      | test     | c1        | RECORD   | X,REC_NOT_GAP | GRANTED   | 5, 0x000000000202 |
|     57 | test      | test     | GEN_CLUST_INDEX | RECORD   | X,REC_NOT_GAP | GRANTED   | 0x000000000202   |
|     57 | test      | test     | c1        | RECORD   | S,GAP     | GRANTED   | 5, 0x000000000202 |
|     57 | test      | test     | c1        | RECORD   | S,GAP     | GRANTED   | 9, 0x000000000203 |
|     57 | test      | test     | c1        | RECORD   | S,GAP     | GRANTED   | 5, 0x000000000207 |
|     59 | test      | test     | NULL       | TABLE   | IX       | GRANTED   | NULL        |
|     59 | test      | test     | c1        | RECORD   | X,REC_NOT_GAP | GRANTED   | 3, 0x000000000201 |
|     59 | test      | test     | GEN_CLUST_INDEX | RECORD   | X,REC_NOT_GAP | GRANTED   | 0x000000000201   |
|     59 | test      | test     | c1        | RECORD   | S,GAP     | GRANTED   | 3, 0x000000000201 |
|     59 | test      | test     | c1        | RECORD   | S,GAP     | GRANTED   | 5, 0x000000000202 |
|     59 | test      | test     | c1        | RECORD   | S,GAP     | GRANTED   | 3, 0x000000000206 |
 ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- 
12 rows in set (0.00 sec)

此刻业务2给 被delete删去的数据 5, 0x000000000202 ,刺进的数据 5, 0x000000000207,以及相邻的下一条数据 9, 0x000000000203 加了 空隙锁 S,GAP

因为GAP锁之间是彼此兼容的,所以没有产生锁等候及死锁,此刻业务1,业务2都履行完结,能够正常提交。

2.4 当delete的数据不存在时

业务1:delete

greatsql> begin;
Query OK, 0 rows affected (0.01 sec)
greatsql> delete from test where c1=6;
Query OK, 0 rows affected (0.00 sec)

检查锁信息:

greatsql> select THREAD_ID,OBJECT_SCHEMA,OBJECT_NAME,INDEX_NAME,LOCK_TYPE,LOCK_MODE,LOCK_STATUS,LOCK_DATA from performance_schema.data_locks;
 ----------- --------------- ------------- ------------ ----------- ----------- ------------- ------------------- 
| THREAD_ID | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA     |
 ----------- --------------- ------------- ------------ ----------- ----------- ------------- ------------------- 
|     61 | test      | test     | NULL    | TABLE   | IX     | GRANTED   | NULL        |
|     61 | test      | test     | c1     | RECORD   | X,GAP   | GRANTED   | 9, 0x000000000203 |
 ----------- --------------- ------------- ------------ ----------- ----------- ------------- ------------------- 
2 rows in set (0.00 sec)

此刻业务1给 9, 0x000000000203 这条数据加了 空隙锁 X,GAP

业务2:delete

greatsql> begin;
Query OK, 0 rows affected (0.00 sec)
greatsql> delete from test where c1=7;
Query OK, 0 rows affected (0.00 sec)

检查锁信息:

greatsql> select THREAD_ID,OBJECT_SCHEMA,OBJECT_NAME,INDEX_NAME,LOCK_TYPE,LOCK_MODE,LOCK_STATUS,LOCK_DATA from performance_schema.data_locks;
 ----------- --------------- ------------- ------------ ----------- ----------- ------------- ------------------- 
| THREAD_ID | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA     |
 ----------- --------------- ------------- ------------ ----------- ----------- ------------- ------------------- 
|     62 | test      | test     | NULL    | TABLE   | IX     | GRANTED   | NULL        |
|     62 | test      | test     | c1     | RECORD   | X,GAP   | GRANTED   | 9, 0x000000000203 |
|     61 | test      | test     | NULL    | TABLE   | IX     | GRANTED   | NULL        |
|     61 | test      | test     | c1     | RECORD   | X,GAP   | GRANTED   | 9, 0x000000000203 |
 ----------- --------------- ------------- ------------ ----------- ----------- ------------- ------------------- 
4 rows in set (0.00 sec)

此刻业务2给 9, 0x000000000203 这条数据加了 空隙锁 X,GAP 空隙锁能够彼此兼容,因而没有报错

业务1:insert

greatsql> insert into test value(6,6,6,6);
---hang住,处于锁等候

检查锁信息:

greatsql> select THREAD_ID,OBJECT_SCHEMA,OBJECT_NAME,INDEX_NAME,LOCK_TYPE,LOCK_MODE,LOCK_STATUS,LOCK_DATA from performance_schema.data_locks;
 ----------- --------------- ------------- ------------ ----------- ------------------------ ------------- ------------------- 
| THREAD_ID | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME | LOCK_TYPE | LOCK_MODE        | LOCK_STATUS | LOCK_DATA     |
 ----------- --------------- ------------- ------------ ----------- ------------------------ ------------- ------------------- 
|     62 | test      | test     | NULL    | TABLE   | IX           | GRANTED   | NULL        |
|     62 | test      | test     | c1     | RECORD   | X,GAP          | GRANTED   | 9, 0x000000000203 |
|     61 | test      | test     | NULL    | TABLE   | IX           | GRANTED   | NULL        |
|     61 | test      | test     | c1     | RECORD   | X,GAP          | GRANTED   | 9, 0x000000000203 |
|     61 | test      | test     | c1     | RECORD   | X,GAP,INSERT_INTENTION | WAITING   | 9, 0x000000000203 |
 ----------- --------------- ------------- ------------ ----------- ------------------------ ------------- ------------------- 
5 rows in set (0.00 sec)

此刻业务1,要请求给 9, 0x000000000203 这条数据加 空隙锁,刺进意向锁 X,GAP,INSERT_INTENTION;因为业务2现已加了 空隙锁 X,GAP 空隙锁与刺进意向锁并不兼容,因而业务1的insert处于锁等候状态

业务2:insert

greatsql> insert into test value(7,7,7,7);
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

检查锁信息:

greatsql> select THREAD_ID,OBJECT_SCHEMA,OBJECT_NAME,INDEX_NAME,LOCK_TYPE,LOCK_MODE,LOCK_STATUS,LOCK_DATA from performance_schema.data_locks;
 ----------- --------------- ------------- ----------------- ----------- ------------------------ ------------- ------------------------ 
| THREAD_ID | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME    | LOCK_TYPE | LOCK_MODE        | LOCK_STATUS | LOCK_DATA        |
 ----------- --------------- ------------- ----------------- ----------- ------------------------ ------------- ------------------------ 
|     61 | test      | test     | NULL       | TABLE   | IX           | GRANTED   | NULL          |
|     61 | test      | test     | c1        | RECORD   | X,GAP          | GRANTED   | 9, 0x000000000203    |
|     61 | test      | test     | c1        | RECORD   | X,GAP          | GRANTED   | 6, 0x000000000213    |
|     61 | test      | test     | GEN_CLUST_INDEX | RECORD   | X            | GRANTED   | supremum pseudo-record |
|     61 | test      | test     | GEN_CLUST_INDEX | RECORD   | X,GAP          | GRANTED   | 0x000000000213     |
|     61 | test      | test     | c1        | RECORD   | X,GAP,INSERT_INTENTION | GRANTED   | 9, 0x000000000203    |
 ----------- --------------- ------------- ----------------- ----------- ------------------------ ------------- ------------------------ 
6 rows in set (0.01 sec)

业务2回滚,只要业务1的加锁信息。

因为此刻业务2,要请求给 9, 0x000000000203 这条数据加 空隙锁,刺进意向锁因为业务1现已加了 空隙锁 X,GAP 空隙锁与刺进意向锁并不兼容,因而业务2的insert处于锁等候状态,2个业务彼此等候锁导致死锁,此刻业务2回滚。

3.总结

此次死锁的产生主要是GAP 锁 和 刺进意向锁的冲突,主张让业务修改一下逻辑,先判别数据是否存在 select 一下,存在的话 delete 后在 insert ; 不存在的话直接 insert 不必delete了。

Enjoy GreatSQL :)

关于 GreatSQL

GreatSQL是适用于金融级应用的国内自主开源数据库,具备高性能、高可靠、高易用性、高安全等多个核心特性,能够作为MySQL或Percona Server的可选替换,用于线上生产环境,且完全免费并兼容MySQL或Percona Server。

相关链接: GreatSQL社区 Gitee GitHub Bilibili

GreatSQL社区:

GreatSQL 死锁事例分析

社区有奖主张反应: greatsql.cn/thread-54-1…

社区博客有奖征稿概况: greatsql.cn/thread-100-…

(对文章有疑问或者有独到见解都能够去社区官网提出或共享哦~)

技术交流群:

微信&QQ群:

QQ群:533341697

微信群:增加GreatSQL社区助手(微信号:wanlidbc )好友,待社区助手拉您进群。