Discussion:
KASLR causes intermittent boot failures on some systems
Dave Young
2017-04-12 08:24:33 UTC
Hi,
commit 021182e52fe01 ("x86/mm: Enable KASLR for physical mapping memory
regions") causes some of my systems with persistent memory (whether real
or emulated) to fail to boot with a couple of different crash
signatures. The first signature is an NMI watchdog lockup of all but one
CPU, which makes it very difficult to extract useful information from
the console. The second variant is an invalid paging request, listed
below.
On some systems, I haven't hit this problem at all. Other systems
experience a failed boot maybe 20-30% of the time. To reproduce it,
configure some emulated pmem on your system. You can find directions
for that here: https://nvdimm.wiki.kernel.org/
Install ndctl (https://github.com/pmem/ndctl).
# ndctl create-namespace -f -e namespace0.0 -m memory
Then just reboot several times (5 should be enough), and hopefully
you'll hit the issue.
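For readers without real persistent memory, one way to get an emulated pmem region is the memmap= kernel parameter, which is also the method mentioned in the reply below. The following is only a sketch: the helper name pmem_memmap_arg and the 4G/12G values are placeholders and do not come from this thread.

```shell
#!/bin/sh
# Build a memmap= kernel argument that marks a RAM range as emulated pmem.
# Kernel parameter syntax: memmap=<size>!<start>
# ASSUMPTION: pmem_memmap_arg is a hypothetical helper, and the 4G size and
# 12G offset are example values only; pick a range that is usable RAM on
# the machine under test.
pmem_memmap_arg() {
    size="$1"    # length of the region, e.g. 4G
    start="$2"   # physical start offset, e.g. 12G
    printf 'memmap=%s!%s\n' "$size" "$start"
}

# Print the argument to add to the kernel command line.
pmem_memmap_arg 4G 12G

# After rebooting with that argument, the step from the report applies:
#   ndctl create-namespace -f -e namespace0.0 -m memory
```

The printed string goes on the kernel command line through the bootloader configuration. Which offset is safe depends on the memory map of the specific machine.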
I've attached both my .config and the dmesg output from a successful
boot at the end of this mail.
[snip]

I ran some tests with emulated pmem via memmap=. The kdump kernel hangs or
just reboots early while compressing the kernel, and I have no clue how to
handle it. Since KASLR is pointless for the kdump kernel, a workaround is
to use "nokaslr".

In Fedora or RHEL, just add "nokaslr" to KDUMP_COMMANDLINE_APPEND
in /etc/sysconfig/kdump.
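
The edit described above could be scripted roughly as follows. This is a sketch, not a tested procedure from this thread: add_nokaslr is a hypothetical helper name, it assumes the value is written in double quotes on a single line, and it relies on GNU sed for in-place editing.

```shell
#!/bin/sh
# Append "nokaslr" to KDUMP_COMMANDLINE_APPEND in a kdump sysconfig file.
# ASSUMPTION: add_nokaslr is a hypothetical helper; the file path is passed
# in explicitly (on Fedora/RHEL it would be /etc/sysconfig/kdump, as root).
add_nokaslr() {
    conf="$1"
    # Leave the file alone if the option is already on the APPEND line.
    if grep -q '^KDUMP_COMMANDLINE_APPEND=.*nokaslr' "$conf"; then
        return 0
    fi
    # Insert " nokaslr" just before the closing quote of the value.
    sed -i 's/^\(KDUMP_COMMANDLINE_APPEND="[^"]*\)"/\1 nokaslr"/' "$conf"
}

# Example (run as root on the real file):
#   add_nokaslr /etc/sysconfig/kdump
```

The kdump service has to be restarted afterwards (for example with systemctl restart kdump) so that the crash kernel is reloaded with the new command line.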

Can you check whether this works?

Thanks
Dave
Dave Young
2017-04-12 08:27:44 UTC
Post by Dave Young
I ran some tests with emulated pmem via memmap=. The kdump kernel hangs or
just reboots early while compressing the kernel, and I have no clue how to
handle it.
s/compressing/uncompressing
Post by Dave Young
Since KASLR is pointless for the kdump kernel, a workaround is to use "nokaslr".
In Fedora or RHEL, just add "nokaslr" to KDUMP_COMMANDLINE_APPEND
in /etc/sysconfig/kdump.
Can you check whether this works?
Thanks
Dave
Dave Young
2017-04-12 08:40:37 UTC
Post by Dave Young
I ran some tests with emulated pmem via memmap=. The kdump kernel hangs or
just reboots early while compressing the kernel, and I have no clue how to
handle it. Since KASLR is pointless for the kdump kernel, a workaround is
to use "nokaslr".
In Fedora or RHEL, just add "nokaslr" to KDUMP_COMMANDLINE_APPEND
in /etc/sysconfig/kdump.
Can you check whether this works?
Oops, your problem is with normal boot rather than kdump, so these are two
different problems. It seems we have not hit your bug yet.

Thanks
Dave
Jeff Moyer
2017-04-12 12:52:36 UTC
Post by Dave Young
Post by Dave Young
I ran some tests with emulated pmem via memmap=. The kdump kernel hangs or
just reboots early while compressing the kernel, and I have no clue how to
handle it. Since KASLR is pointless for the kdump kernel, a workaround is
to use "nokaslr".
In Fedora or RHEL, just add "nokaslr" to KDUMP_COMMANDLINE_APPEND
in /etc/sysconfig/kdump.
Can you check whether this works?
Oops, your problem is with normal boot rather than kdump, so these are two
different problems. It seems we have not hit your bug yet.
Correct.

-Jeff
