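Problem description
I have installed the latest version of OSSEC (2.8.1) and enabled email notifications. I am receiving a large number of notifications like the one below, reporting a hardware error and mentioning mce: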
OSSEC HIDS Notification.
2015 Apr 04 20:09:22
Received From: Bath-Towel->/var/log/syslog
Rule: 1002 fired (level 2) -> "Unknown problem somewhere in the system."
Portion of the log(s):
Apr 4 20:09:21 Bath-Towel kernel: [ 1873.680872] mce: [Hardware Error]: Machine check events logged
--END OF NOTIFICATION
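So what exactly does this mean? What does mce stand for? Is this apparent hardware error something I should be worried about?
Operating system information: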
Description: Ubuntu 14.10
Release: 14.10
Best answer
A Machine Check Exception (MCE) is a type of computer hardware error that occurs when a computer’s central processing unit detects a hardware problem.
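Your computer encountered a hardware error, and the kernel logged an event in a buffer. You can use mcelog to log and view machine check events. From the mcelog man page: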
X86 CPUs report errors detected by the CPU as machine check events (MCEs). These can be data corruption detected in the CPU caches, in main memory by an integrated memory controller, data transfer errors on the front side bus or CPU interconnect or other internal errors. Possible causes can be cosmic radiation, instable power supplies, cooling problems, broken hardware, running systems out of specification, or bad luck.
Most errors can be corrected by the CPU by internal error correction mechanisms. Uncorrected errors cause machine check exceptions which may kill processes or panic the machine. A small number of corrected errors is usually not a cause for worry, but a large number can indicate future failure.
When a corrected or recovered error happens the x86 kernel writes a record describing the MCE into a internal ring buffer available through the /dev/mcelog device. mcelog retrieves errors from /dev/mcelog, decodes them into a human readable format and prints them on the standard output or optionally into the system log.
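Since the man page distinguishes between a small and a large number of corrected errors, it can help to see how often the kernel message actually appears. The following is a minimal sketch, assuming the messages look like the one in the notification and that the log uses the traditional syslog timestamp format ("Mon DD ..."). The function name count_mce_per_day is made up for illustration and is not part of mcelog or OSSEC.

```shell
#!/bin/sh
# Count "Machine check events logged" kernel messages per day in a syslog file.
# count_mce_per_day is a hypothetical helper name used only for this sketch.
count_mce_per_day() {
    # Default to the log file named in the OSSEC notification.
    logfile="${1:-/var/log/syslog}"
    # grep -F matches the string literally, so the brackets are not a regex class.
    # awk splits on whitespace: $1 is the month, $2 the day of the month.
    grep -F 'mce: [Hardware Error]: Machine check events logged' "$logfile" |
        awk '{ count[$1 " " $2]++ } END { for (d in count) print d, count[d] }' |
        sort  # stable output order (alphabetical, not chronological)
}
```

Calling count_mce_per_day /var/log/syslog prints one line per day with the number of matching messages. Reading that file may require root or membership in the adm group.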
If you have not noticed any crashes, the error was probably corrected successfully. Even so, I recommend installing mcelog to keep track of such events:
sudo apt-get install mcelog
Events will be logged to /var/log/mcelog. You can also run:
sudo mcelog --client
to query the mcelog daemon for errors.
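To check quickly whether anything has been recorded yet, a small wrapper around the log file can be used. This is only a sketch: it assumes the log path /var/log/mcelog mentioned above, and show_logged_mce is a hypothetical helper name, not an mcelog command.

```shell
#!/bin/sh
# Print the decoded machine check records, or a short note if there are none.
# show_logged_mce is a hypothetical helper name used only for this sketch.
show_logged_mce() {
    # Assumed default: the log path given in the answer.
    logfile="${1:-/var/log/mcelog}"
    # -s is true only if the file exists and is not empty.
    if [ -s "$logfile" ]; then
        cat "$logfile"
    else
        echo "no machine check records in $logfile"
    fi
}
```

An empty or missing log only means that nothing has been decoded into that file so far. It says nothing about events that occurred before mcelog was installed.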