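Our company runs a batch of HP servers with Intel Xeon E5 v3 CPUs and FusionIO SSD cards, and MySQL frequently hung on them.
We went to HP and swapped drivers and hardware over and over, but the fault persisted. Recently the manager of our DBA team attended an industry meetup and learned from peers at another company that the hangs are caused by a kernel bug, one that shows up especially on Intel Haswell CPUs. See the references for details. The bug affects the following versions: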
RHEL 6 (and CentOS 6, and SL 6): 6.0-6.5 are good. 6.6 is BAD. 6.6.z is good.
RHEL 7 (and CentOS 7, and SL 7): 7.1 is BAD. As of yesterday, there does not yet appear to be a 7.x fix. [May 13, 2015]
RHEL 5 (and CentOS 5, and SL 5): All versions are good (including 5.11).
We have confirmed that CentOS 6.7 fixes this problem, so upgrading to 6.7 is enough.
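The version list above can be turned into a quick check. The following is a minimal sketch in C, not an official tool: the names `classify_release` and `classify_release_string` are hypothetical, and the release strings in the usage note are assumed examples of what `/etc/redhat-release` contains. It only encodes what is stated above, so a plain "6.6" is reported as affected even though a 6.6.z errata kernel may already be installed, and any release the list does not mention is reported as unknown.

```c
#include <stdio.h>
#include <string.h>

/* Status of a release with respect to the futex bug, per the list above. */
enum futex_bug_status { STATUS_GOOD, STATUS_AFFECTED, STATUS_UNKNOWN };

/* Classify a RHEL/CentOS/SL release by major.minor number.
 * Hypothetical helper: it mirrors the list in the text and nothing more. */
enum futex_bug_status classify_release(int major, int minor)
{
    if (major == 5)
        return STATUS_GOOD;              /* all 5.x versions are good */
    if (major == 6) {
        if (minor >= 0 && minor <= 5)
            return STATUS_GOOD;          /* 6.0-6.5 are good */
        if (minor == 6)
            return STATUS_AFFECTED;      /* 6.6 is bad unless 6.6.z is applied */
        if (minor == 7)
            return STATUS_GOOD;          /* 6.7 contains the fix */
        return STATUS_UNKNOWN;
    }
    if (major == 7 && minor == 1)
        return STATUS_AFFECTED;          /* 7.1 was bad as of May 13, 2015 */
    return STATUS_UNKNOWN;               /* not covered by the list */
}

/* Extract the first "major.minor" pair from a release string and classify it. */
enum futex_bug_status classify_release_string(const char *text)
{
    int major = 0, minor = 0;
    const char *digits = strpbrk(text, "0123456789");

    if (digits == NULL || sscanf(digits, "%d.%d", &major, &minor) != 2)
        return STATUS_UNKNOWN;
    return classify_release(major, minor);
}
```

For example, `classify_release_string("CentOS release 6.6 (Final)")` yields `STATUS_AFFECTED`. On a 6.6 machine the installed kernel package still has to be compared against the vendor errata, which this sketch does not attempt.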
The fix can be seen in the CentOS 6.7 kernel source, linux-2.6.32-573.el6/kernel/futex.c, lines 224-230:
212 static void get_futex_key_refs(union futex_key *key)
213 {
214         if (!key->both.ptr)
215                 return;
216
217         switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
218         case FUT_OFF_INODE:
219                 atomic_inc(&key->shared.inode->i_count);
220                 break;
221         case FUT_OFF_MMSHARED:
222                 futex_get_mm(key);
223                 break;
224         default:
225                 /*
226                  * Private futexes do not hold reference on an inode or
227                  * mm, therefore the only purpose of calling get_futex_key_refs
228                  * is because we need the barrier for the lockless waiter check.
229                  */
230                 smp_mb();
231         }
232 }
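To see why the "lockless waiter check" mentioned in that comment needs a full barrier, here is a user-space analogy written with C11 atomics. It is a sketch under stated assumptions, not kernel code: `futex_word`, `waiters`, `waiter_prepare` and `waker_publish` are hypothetical names, and `atomic_thread_fence(memory_order_seq_cst)` stands in for the kernel's `smp_mb()`. The waiter announces itself and then re-reads the value. The waker publishes a new value and then looks for waiters. Without a full barrier between the store and the load on each side, both loads could see stale data, leaving the waiter asleep with nobody to wake it, which matches the hang described above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state standing in for a futex word and a waiter count. */
atomic_int futex_word;   /* value the waiter sleeps on */
atomic_int waiters;      /* number of threads that announced they will sleep */

/* Waiter side: announce ourselves, then re-check the futex word.
 * Returns true if the caller should go to sleep. */
bool waiter_prepare(int expected)
{
    atomic_fetch_add_explicit(&waiters, 1, memory_order_relaxed);
    /* Full barrier: the increment above is ordered before the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&futex_word, memory_order_relaxed) != expected) {
        /* The value already changed, so withdraw and do not sleep. */
        atomic_fetch_sub_explicit(&waiters, 1, memory_order_relaxed);
        return false;
    }
    return true;
}

/* Waker side: publish the new value, then check for waiters without a lock.
 * Returns true if a wakeup has to be issued. */
bool waker_publish(int new_value)
{
    atomic_store_explicit(&futex_word, new_value, memory_order_relaxed);
    /* Full barrier: the store above is ordered before the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&waiters, memory_order_relaxed) > 0;
}
```

With both fences in place, at least one side always observes the other: either the waiter sees the new value and does not sleep, or the waker sees the waiter and issues the wakeup. Removing either fence allows the case where both miss each other on hardware that reorders a store with a later load.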
References:
https://groups.google.com/forum/#!msg/mechanical-sympathy/QbmpZxp6C64/0M4_EbzSLj4J
https://www.infoq.com/news/2015/05/redhat-futex
https://github.com/torvalds/linux/commit/76835b0ebf8a7fe85beb03c75121419a7dec52f0