About once a month, a server that had previously run without problems for a couple of years reboots on its own. Or it simply hangs, so hard that the video signal to the monitor drops and the keyboard stops responding too. The only thing it reacts to is the hardware Reset button.
I have already replaced the power supply (I suspected the plus/minus 5 V rails) with an expensive, known-good one from a store.
After a reset it runs like new for another month.
Dear community, where should I look?
This is what the logs show before the crash:
/var/log/kernel/errors
5d c3 <55> 89 e5 57 89 c7 56 53 83 ec 04 64 8b 1d 10 80 76 c1 8d 76 00
Oct 12 03:05:18 server-cr kernel: [2218515.868002] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.79-std-def-alt0.M70P.1 #1
Oct 12 03:05:18 server-cr kernel: [2218515.868002] Hardware name: System manufacturer System Product Name/P5B-VM SE, BIOS 0601 08/23/2007
Oct 12 03:05:18 server-cr kernel: [2218515.868002] task: c167ba40 ti: c1670000 task.ti: c1670000
Oct 12 03:05:18 server-cr kernel: [2218515.868002] Stack:
Oct 12 03:05:18 server-cr kernel: [2218515.868002] Call Trace:
Oct 12 03:05:18 server-cr kernel: [2218515.868002] Code: ae f0 66 90 89 e0 31 d2 25 00 e0 ff ff 83 c0 08 89 d1 0f 01 c8 64 a1 74 86 76 c1 8b 40 04 8b 40 08 a8 08 75 0d 31 c0 fb 0f 01 c9 <eb> 0b 90 8d 74 26 00 fb 66 66 90 66 90 89 e0 25 00 e0 ff ff 83
Oct 12 03:05:41 server-cr kernel: [2218540.064001] BUG: soft lockup - CPU#1 stuck for 22s! [smbd:7012]
Oct 12 03:05:41 server-cr kernel: [2218540.064001] CPU: 1 PID: 7012 Comm: smbd Not tainted 3.14.79-std-def-alt0.M70P.1 #1
Oct 12 03:05:41 server-cr kernel: [2218540.064001] Hardware name: System manufacturer System Product Name/P5B-VM SE, BIOS 0601 08/23/2007
Oct 12 03:05:41 server-cr kernel: [2218540.064001] task: f3be2db0 ti: f3b8c000 task.ti: f3b8c000
Oct 12 03:05:41 server-cr kernel: [2218540.064001] Stack:
Oct 12 03:05:42 server-cr kernel: [2218540.064001] Call Trace:
Oct 12 03:05:42 server-cr kernel: [2218540.064001] Code: 0f b6 c6 38 d0 75 01 c3 55 89 e5 83 ec 10 89 5d f8 89 c3 83 e3 fe 89 75 fc 0f b6 f3 b8 00 80 00 00 eb 08 90 f3 90 83 e8 01 74 11 <0f> b6 11 38 d3 75 f2 8b 5d f8 8b 75 fc 89 ec 5d c3 89 c8 89 f2
Oct 12 03:06:09 server-cr kernel: [2218568.064001] BUG: soft lockup - CPU#1 stuck for 22s! [smbd:7012]
Oct 12 03:06:09 server-cr kernel: [2218568.064001] CPU: 1 PID: 7012 Comm: smbd Not tainted 3.14.79-std-def-alt0.M70P.1 #1
Oct 12 03:06:09 server-cr kernel: [2218568.064001] Hardware name: System manufacturer System Product Name/P5B-VM SE, BIOS 0601 08/23/2007
Oct 12 03:06:09 server-cr kernel: [2218568.064001] task: f3be2db0 ti: f3b8c000 task.ti: f3b8c000
Oct 12 03:06:10 server-cr kernel: [2218568.064001] Stack:
Oct 12 03:06:10 server-cr kernel: [2218568.064001] Call Trace:
Oct 12 03:06:10 server-cr kernel: [2218568.064001] Code: f0 66 0f c1 10 0f b6 c6 38 d0 75 01 c3 55 89 e5 83 ec 10 89 5d f8 89 c3 83 e3 fe 89 75 fc 0f b6 f3 b8 00 80 00 00 eb 08 90 f3 90 <83> e8 01 74 11 0f b6 11 38 d3 75 f2 8b 5d f8 8b 75 fc 89 ec 5d
Oct 12 03:06:37 server-cr kernel: [2218596.064001] BUG: soft lockup - CPU#1 stuck for 22s! [smbd:7012]
Oct 12 03:06:37 server-cr kernel: [2218596.064001] CPU: 1 PID: 7012 Comm: smbd Not tainted 3.14.79-std-def-alt0.M70P.1 #1
Oct 12 03:06:37 server-cr kernel: [2218596.064001] Hardware name: System manufacturer System Product Name/P5B-VM SE, BIOS 0601 08/23/2007
Oct 12 03:06:37 server-cr kernel: [2218596.064001] task: f3be2db0 ti: f3b8c000 task.ti: f3b8c000
Oct 12 03:06:37 server-cr kernel: [2218596.064001] Stack:
Oct 12 03:06:38 server-cr kernel: [2218596.064001] Call Trace:
Oct 12 03:06:38 server-cr kernel: [2218596.064001] Code: f0 66 0f c1 10 0f b6 c6 38 d0 75 01 c3 55 89 e5 83 ec 10 89 5d f8 89 c3 83 e3 fe 89 75 fc 0f b6 f3 b8 00 80 00 00 eb 08 90 f3 90 <83> e8 01 74 11 0f b6 11 38 d3 75 f2 8b 5d f8 8b 75 fc 89 ec 5d
Oct 12 03:07:05 server-cr kernel: [2218624.064002] BUG: soft lockup - CPU#1 stuck for 22s! [smbd:7012]
Oct 12 03:07:05 server-cr kernel: [2218624.064002] CPU: 1 PID: 7012 Comm: smbd Not tainted 3.14.79-std-def-alt0.M70P.1 #1
Oct 12 03:07:05 server-cr kernel: [2218624.064002] Hardware name: System manufacturer System Product Name/P5B-VM SE, BIOS 0601 08/23/2007
Oct 12 03:07:05 server-cr kernel: [2218624.064002] task: f3be2db0 ti: f3b8c000 task.ti: f3b8c000
Oct 12 03:07:06 server-cr kernel: [2218624.064002] Stack:
Oct 12 03:07:06 server-cr kernel: [2218624.064002] Call Trace:
Oct 12 03:07:06 server-cr kernel: [2218624.064002] Code: f0 66 0f c1 10 0f b6 c6 38 d0 75 01 c3 55 89 e5 83 ec 10 89 5d f8 89 c3 83 e3 fe 89 75 fc 0f b6 f3 b8 00 80 00 00 eb 08 90 f3 90 <83> e8 01 74 11 0f b6 11 38 d3 75 f2 8b 5d f8 8b 75 fc 89 ec 5d
Oct 12 03:07:33 server-cr kernel: [2218652.064001] BUG: soft lockup - CPU#1 stuck for 22s! [smbd:7012]
Oct 12 03:07:33 server-cr kernel: [2218652.064001] CPU: 1 PID: 7012 Comm: smbd Not tainted 3.14.79-std-def-alt0.M70P.1 #1
Oct 12 03:07:33 server-cr kernel: [2218652.064001] Hardware name: System manufacturer System Product Name/P5B-VM SE, BIOS 0601 08/23/2007
Oct 12 03:07:33 server-cr kernel: [2218652.064001] task: f3be2db0 ti: f3b8c000 task.ti: f3b8c000
Oct 12 03:07:33 server-cr kernel: [2218652.064001] Stack:
Oct 12 03:07:34 server-cr kernel: [2218652.064001] Call Trace:
Oct 12 03:07:34 server-cr kernel: [2218652.064001] Code: 0f b6 c6 38 d0 75 01 c3 55 89 e5 83 ec 10 89 5d f8 89 c3 83 e3 fe 89 75 fc 0f b6 f3 b8 00 80 00 00 eb 08 90 f3 90 83 e8 01 74 11 <0f> b6 11 38 d3 75 f2 8b 5d f8 8b 75 fc 89 ec 5d c3 89 c8 89 f2
Oct 12 03:08:01 server-cr kernel: [2218680.064001] BUG: soft lockup - CPU#1 stuck for 22s! [smbd:7012]
Oct 12 03:08:01 server-cr kernel: [2218680.064001] CPU: 1 PID: 7012 Comm: smbd Not tainted 3.14.79-std-def-alt0.M70P.1 #1
Oct 12 03:08:01 server-cr kernel: [2218680.064001] Hardware name: System manufacturer System Product Name/P5B-VM SE, BIOS 0601 08/23/2007
Oct 12 03:08:01 server-cr kernel: [2218680.064001] task: f3be2db0 ti: f3b8c000 task.ti: f3b8c000
Oct 12 03:08:01 server-cr kernel: [2218680.064001] Stack:
Oct 12 03:08:02 server-cr kernel: [2218680.064001] Call Trace:
Oct 12 03:08:02 server-cr kernel: [2218680.064001] Code: f0 66 0f c1 10 0f b6 c6 38 d0 75 01 c3 55 89 e5 83 ec 10 89 5d f8 89 c3 83 e3 fe 89 75 fc 0f b6 f3 b8 00 80 00 00 eb 08 90 f3 90 <83> e8 01 74 11 0f b6 11 38 d3 75 f2 8b 5d f8 8b 75 fc 89 ec 5d
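To make the pattern in the excerpt above easier to see, here is a minimal Python sketch that tallies the "soft lockup" reports by CPU and process. It is not part of the original logs; the function name `summarize_lockups` and the regular expression are illustrative assumptions based only on the message format visible above.

```python
import re
from collections import Counter

# Assumed message shape, taken from the excerpt:
# "BUG: soft lockup - CPU#1 stuck for 22s! [smbd:7012]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^:\]]+):(\d+)\]"
)

def summarize_lockups(lines):
    """Count soft lockup reports per (cpu, process name, pid)."""
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            cpu, _seconds, comm, pid = match.groups()
            counts[(int(cpu), comm, int(pid))] += 1
    return counts

# Three lines copied from the log above; the last one is not a lockup report.
sample = [
    "Oct 12 03:05:41 server-cr kernel: [2218540.064001] BUG: soft lockup - CPU#1 stuck for 22s! [smbd:7012]",
    "Oct 12 03:06:09 server-cr kernel: [2218568.064001] BUG: soft lockup - CPU#1 stuck for 22s! [smbd:7012]",
    "Oct 12 03:06:10 server-cr kernel: [2218568.064001] Stack:",
]
for key, count in summarize_lockups(sample).items():
    print(key, count)
```

Run against the full excerpt, every one of the six reports falls on CPU#1 in `smbd` with PID 7012, about 28 seconds apart.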
/var/log/kernel/info
Oct 12 02:35:01 server-cr kernel: mod
Oct 12 02:35:17 server-cr kernel: [2216715.744001] sending NMI to all CPUs:
Oct 12 02:35:41 server-cr kernel: mod
Oct 12 02:36:37 server-cr last message repeated 2 times
Oct 12 02:37:33 server-cr last message repeated 2 times
Oct 12 02:38:01 server-cr kernel: mod
Oct 12 02:38:17 server-cr kernel: [2216895.756001] sending NMI to all CPUs:
Oct 12 02:38:41 server-cr kernel: mod
Oct 12 02:39:37 server-cr last message repeated 2 times
Oct 12 02:40:33 server-cr last message repeated 2 times
Oct 12 02:41:01 server-cr kernel: mod
Oct 12 02:41:17 server-cr kernel: [2217075.768001] sending NMI to all CPUs:
Oct 12 02:41:41 server-cr kernel: mod
Oct 12 02:42:37 server-cr last message repeated 2 times
Oct 12 02:43:33 server-cr last message repeated 2 times
Oct 12 02:44:01 server-cr kernel: mod
Oct 12 02:44:17 server-cr kernel: [2217255.780001] sending NMI to all CPUs:
Oct 12 02:44:41 server-cr kernel: mod
Oct 12 02:45:37 server-cr last message repeated 2 times
Oct 12 02:46:33 server-cr last message repeated 2 times
Oct 12 02:47:01 server-cr kernel: mod
Oct 12 02:47:17 server-cr kernel: [2217435.792001] sending NMI to all CPUs:
Oct 12 02:47:41 server-cr kernel: mod
Oct 12 02:48:37 server-cr last message repeated 2 times
Oct 12 02:49:33 server-cr last message repeated 2 times
Oct 12 02:50:01 server-cr kernel: mod
Oct 12 02:50:17 server-cr kernel: [2217615.804001] sending NMI to all CPUs:
Oct 12 02:50:41 server-cr kernel: mod
Oct 12 02:51:37 server-cr last message repeated 2 times
Oct 12 02:52:33 server-cr last message repeated 2 times
Oct 12 02:53:01 server-cr kernel: mod
Oct 12 02:53:17 server-cr kernel: [2217795.816001] sending NMI to all CPUs:
Oct 12 02:53:41 server-cr kernel: mod
Oct 12 02:54:37 server-cr last message repeated 2 times
Oct 12 02:55:33 server-cr last message repeated 2 times
Oct 12 02:56:01 server-cr kernel: mod
Oct 12 02:56:17 server-cr kernel: [2217975.828001] sending NMI to all CPUs:
Oct 12 02:56:41 server-cr kernel: mod
Oct 12 02:57:37 server-cr last message repeated 2 times
Oct 12 02:58:33 server-cr last message repeated 2 times
Oct 12 02:59:01 server-cr kernel: mod
Oct 12 02:59:17 server-cr kernel: [2218155.840001] sending NMI to all CPUs:
Oct 12 02:59:41 server-cr kernel: mod
Oct 12 03:00:37 server-cr last message repeated 2 times
Oct 12 03:01:33 server-cr last message repeated 2 times
Oct 12 03:02:01 server-cr kernel: mod
Oct 12 03:02:17 server-cr kernel: [2218335.852001] sending NMI to all CPUs:
Oct 12 03:02:41 server-cr kernel: mod
Oct 12 03:03:37 server-cr last message repeated 2 times
Oct 12 03:04:33 server-cr last message repeated 2 times
Oct 12 03:05:01 server-cr kernel: mod
Oct 12 03:05:17 server-cr kernel: [2218515.864001] sending NMI to all CPUs:
Oct 12 03:05:41 server-cr kernel: mod
Oct 12 03:06:37 server-cr last message repeated 2 times
Oct 12 03:07:33 server-cr last message repeated 2 times
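The regularity in this second log can be checked the same way. The sketch below is an illustration only (the helper name `nmi_intervals` is hypothetical): it pulls the kernel uptime stamp out of each "sending NMI to all CPUs" line and prints the gap between consecutive messages.

```python
import re

# Matches the bracketed kernel uptime stamp in lines such as
# "... kernel: [2216715.744001] sending NMI to all CPUs:"
NMI_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+sending NMI to all CPUs")

def nmi_intervals(lines):
    """Return the gaps, in seconds, between consecutive NMI messages."""
    stamps = []
    for line in lines:
        match = NMI_RE.search(line)
        if match:
            stamps.append(float(match.group(1)))
    # Pair each stamp with its successor and take the difference.
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]

# Lines copied from the log above.
sample = [
    "Oct 12 02:35:17 server-cr kernel: [2216715.744001] sending NMI to all CPUs:",
    "Oct 12 02:35:41 server-cr kernel: mod",
    "Oct 12 02:38:17 server-cr kernel: [2216895.756001] sending NMI to all CPUs:",
    "Oct 12 02:41:17 server-cr kernel: [2217075.768001] sending NMI to all CPUs:",
]
print(nmi_intervals(sample))
```

For the lines shown here the gaps come out at roughly 180 seconds each, which matches the three-minute spacing of the wall-clock timestamps (02:35:17, 02:38:17, 02:41:17, and so on).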
I did not find anything else suspicious in the other logs.
Ventilation and cooling of the server are fine (the case is no warmer than 19 °C).
What could this be? My gut tells me it is most likely hardware, but what exactly? And why does it happen at almost regular intervals?