Philip Prindeville
2017-03-04 23:26:10 UTC
Hi.
I’m working on a couple of minimal embedded systems, and one of the things I do is platform bring-up (which means dealing with crashes).
We have kexec-tools ported to our distro (OpenWRT/LEDE) but not the OS scripting to integrate them. I had a couple of questions about Documentation/kdump.txt that I was hoping you all wouldn’t mind answering.
The kernel I’m building (for a Xeon Haswell and an Atom64 Pineview) has:
# CONFIG_SMP is not set
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
# CONFIG_RANDOMIZE_BASE is not set
CONFIG_PROC_VMCORE=y
CONFIG_SYSFS=y
CONFIG_DEBUG_INFO=y
all per the directions in kdump.txt.
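A minimal sketch for checking a build's .config against the =y options in the list above, assuming the file is available on the build host. The function name check_kdump_config is hypothetical, and it does not check the value of CONFIG_PHYSICAL_START or that SMP and RANDOMIZE_BASE are turned off.

```sh
#!/bin/sh
# Hypothetical helper: report which kdump-related options are not set to =y
# in the kernel .config file given as $1. Prints one "missing:" line per
# absent option and returns non-zero if any are missing.
check_kdump_config() {
    missing=0
    for opt in CONFIG_KEXEC CONFIG_CRASH_DUMP CONFIG_RELOCATABLE \
               CONFIG_PROC_VMCORE CONFIG_SYSFS CONFIG_DEBUG_INFO; do
        # Match the whole line so CONFIG_KEXEC does not match CONFIG_KEXEC_FILE.
        if ! grep -q "^$opt=y\$" "$1"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return $missing
}
```

For the configuration shown above, check_kdump_config .config should print nothing and succeed.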
> 2) Or use the system kernel binary itself as dump-capture kernel and there is
> no need to build a separate dump-capture kernel. This is possible
> only with the architectures which support a relocatable kernel. As
> of today, i386, x86_64, ppc64, ia64 and arm architectures support relocatable
> kernel.
> 5) Make and install the kernel and its modules. DO NOT add this kernel
> to the boot loader configuration files.
In the case of having a single system kernel binary, you’d have to install this kernel and its modules, and add this kernel to the boot loader configuration files, wouldn’t you? What do my GRUB arguments look like?
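Whatever the boot loader entry ends up looking like, one way to confirm after booting the system kernel that a crashkernel= argument actually reserved memory is to look for it in /proc/cmdline and for the "Crash kernel" region in /proc/iomem. A minimal sketch, assuming an x86 kernel; the function name is hypothetical, and the optional arguments exist only so it can be pointed at copies of those files.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the running kernel was booted with a
# crashkernel= argument AND the reservation shows up in the iomem map.
# $1 defaults to /proc/cmdline, $2 to /proc/iomem.
crashkernel_reserved() {
    grep -q 'crashkernel=' "${1:-/proc/cmdline}" &&
        grep -q 'Crash kernel' "${2:-/proc/iomem}"
}
```

If it fails on the running system, loading a dump-capture kernel with kexec -p is expected to fail as well, since there is no reserved region to load it into.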
> where Y specifies how much memory to reserve for the dump-capture kernel
> and X specifies the beginning of this reserved memory. For example, [...]
> starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
> If you are using a compressed bzImage/vmlinuz, then use following command
> to load dump-capture kernel.
> kexec -p <dump-capture-kernel-bzImage> \
> --initrd=<initrd-for-dump-capture-kernel> \
> --append="root=<root-dev> <arch-specific-options>"
Not sure I understand this part. So if we have a relocatable kernel with crash-dump support built into our system kernel, do we need to load two kernels, with only the <arch-specific-options> differing and everything else the same?
Would the <arch-specific-options> be:
crashkernel=***@16M 1 irqpoll maxcpus=1 reset_devices
in that case?
On a normally running system, using an overlay root, our cmdline looks like:
BOOT_IMAGE=/boot/vmlinuz block2mtd.block2mtd=/dev/sda2,65536,rootfs,5 root=/dev/mtdblock0 rootfstype=squashfs rootwait console=tty0 console=ttyS0,115200n8r noinitrd
so I guess we’d just mash on those extra arguments. On a running system, our mount points are:
/dev/root on /rom type squashfs (ro,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,noatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,noatime)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noatime)
tmpfs on /tmp/root type tmpfs (rw,noatime,mode=755)
tmpfs on /dev type tmpfs (rw,nosuid,relatime,size=512k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,mode=600)
debugfs on /sys/kernel/debug type debugfs (rw,noatime)
/dev/mtdblock1 on /overlay type jffs2 (rw,noatime)
overlayfs:/overlay on / type overlay (rw,noatime,lowerdir=/,upperdir=/overlay/upper,workdir=/overlay/work)
but it doesn’t sound like any of that would change (except perhaps mounting a USB thumb-drive if we wanted to copy our crashdump to that device instead).
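If the dump is copied to a USB thumb drive, the dump-capture kernel's init scripts may run before the drive has finished enumerating (the same reason rootwait exists for asynchronously detected root devices). A minimal sketch of a bounded retry, assuming findfs is available on the target (both BusyBox and util-linux provide it); the helper name wait_for is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: run the command "$@" once a second, for at most
# $1 attempts, and succeed as soon as it does. Returns 1 on timeout.
wait_for() {
    tries=$1
    shift
    while ! "$@" >/dev/null 2>&1; do
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] || return 1
        sleep 1
    done
}
```

For example: wait_for 10 findfs LABEL=crashdrive && mount LABEL=crashdrive /mnt/crashdrive. Bounding the wait keeps a missing drive from stalling the capture kernel forever.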
So if I’ve understood correctly, when the first kernel (the system kernel) crashes, it will reboot into the dump-capture kernel previously loaded with something like:
kexec -p /boot/vmlinuz \
--append="$(cat /proc/cmdline) irqpoll maxcpus=1 reset_devices 1"
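Because $(cat /proc/cmdline) also carries whatever crashkernel= argument the system kernel was booted with, one option is to filter it out when building the dump-capture kernel's command line, on the assumption that the capture kernel has no use for a crash reservation of its own. A minimal sketch; the function name is hypothetical, and since it splits on whitespace it assumes there are no quoted arguments containing spaces.

```sh
#!/bin/sh
# Hypothetical helper: print a dump-capture command line built from the
# system kernel's (default /proc/cmdline, or the file in $1), minus
# crashkernel=, plus the options kdump.txt suggests for x86.
# The body runs in a subshell so "set -f" and $out stay local.
capture_cmdline() (
    set -f                          # no globbing while splitting words
    out=
    for arg in $(cat "${1:-/proc/cmdline}"); do
        case $arg in
            crashkernel=*) ;;       # only meaningful to the system kernel
            *) out="$out $arg" ;;
        esac
    done
    printf '%s 1 irqpoll maxcpus=1 reset_devices\n' "${out# }"
)
```

Usage: kexec -p /boot/vmlinuz --append="$(capture_cmdline)"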
> Kernel Panic
> ============
> After successfully loading the dump-capture kernel as previously
> described, the system will reboot into the dump-capture kernel if a
> system crash is triggered. [snip]
That’s assuming the system isn’t so badly hosed that a WDT expires and causes a BIOS reset, etc.
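One way to exercise this path on the bench, rather than waiting for a field crash, is a SysRq-triggered crash, which only makes sense once a dump-capture kernel is actually loaded. A minimal sketch of that guard; the function name is hypothetical, and it relies on /sys/kernel/kexec_crash_loaded reading 1 once kexec -p has loaded a kernel.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if a dump-capture kernel is loaded.
# $1 defaults to the real sysfs attribute; a missing file counts as "no".
crash_kernel_loaded() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = 1 ]
}
```

Then, on a test unit: crash_kernel_loaded && echo c > /proc/sysrq-trigger. This assumes CONFIG_MAGIC_SYSRQ, which is not in the option list above, and it really does crash the machine.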
Do both kernels use the same crashkernel= value, or do they need different base addresses?
And assuming you’re using the same kernel image for both, how do the init.d scripts on the dump-capture kernel (the second instance of the kernel) know that it’s not the normal system kernel? Do we use /sys/kernel/kexec_loaded for this purpose, or do we just check for the existence of /proc/vmcore?
And then have something in my init.d scripts like:
kexec_loaded=$(cat /sys/kernel/kexec_loaded)
if [ "$kexec_loaded" = 0 ]; then
    kexec -p /boot/vmlinuz \
        --append="$(cat /proc/cmdline) irqpoll maxcpus=1 reset_devices 1"
else
    echo "*** HANDLING CRASH DUMP COLLECTION"
    mkdir -p /mnt/crashdrive
    mount LABEL=crashdrive /mnt/crashdrive
    # might do something clever here with "df --output=avail -m /mnt/crashdrive" to make
    # sure I have enough space for the copy, perhaps deleting older dumps until I do…
    cp /proc/vmcore /mnt/crashdrive
    sync
    umount /mnt/crashdrive
    echo "*** NOW REBOOTING"
    reboot -f
fi
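The free-space check from the comment in the script could look like the following. This sketch assumes the target's df may be BusyBox's, which may lack GNU's --output, so it parses the POSIX df -P -k format instead; it also assumes dumps are saved under hypothetical vmcore-<timestamp> names rather than a single vmcore file. All function names are hypothetical.

```sh
#!/bin/sh
# Print the free space, in KiB, of the filesystem holding directory $1.
# "df -Pk" prints one header line, then one line per filesystem.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if directory $1 has at least $2 KiB free.
has_room() {
    [ "$(free_kib "$1")" -ge "$2" ]
}

# Delete the oldest vmcore-* files in $1 until $2 KiB are free.
# Returns 1 if the space still isn't there once no old dumps remain.
make_room() {
    dir=$1 need=$2
    while ! has_room "$dir" "$need"; do
        oldest=$(ls -1tr "$dir"/vmcore-* 2>/dev/null | head -n 1)
        [ -n "$oldest" ] || return 1
        rm -f "$oldest"
    done
}
```

In the capture kernel, the size to ask for could come from stat -c %s /proc/vmcore, which should report the dump size without reading it: need=$(( $(stat -c %s /proc/vmcore) / 1024 + 1 )), then make_room /mnt/crashdrive "$need" && cp /proc/vmcore "/mnt/crashdrive/vmcore-$(date +%Y%m%d-%H%M%S)".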
Do I need to reboot in a particular way to avoid looping? The “Kernel Panic” section seems to state that normal reboots won’t be affected.
I appreciate the documentation you’ve written, but it’s a little unclear (to me at least) how to handle the degenerate case of using the same kernel as both the system kernel and the crashdump kernel…
I want to make sure that I don’t inadvertently set things up to loop through infinitely nested kernels, etc.
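One way to rule out nested loading is to have the init script arm a dump-capture kernel only when it is running as the system kernel and nothing is loaded yet. A minimal sketch, assuming /proc/vmcore and an elfcorehdr= argument (which kexec-tools adds when it loads a dump-capture kernel) are present only in the dump-capture kernel; the function names are hypothetical, and the optional arguments only let the checks be pointed at test files.

```sh
#!/bin/sh
# Succeed if this boot is the dump-capture kernel.
# $1 defaults to /proc/vmcore, $2 to /proc/cmdline.
in_capture_kernel() {
    [ -e "${1:-/proc/vmcore}" ] ||
        grep -q 'elfcorehdr=' "${2:-/proc/cmdline}" 2>/dev/null
}

# Succeed if it is safe to run "kexec -p": we are the system kernel and no
# dump-capture kernel is loaded yet. A missing sysfs file counts as "don't".
# $3 defaults to /sys/kernel/kexec_crash_loaded.
should_arm_crash_kernel() {
    ! in_capture_kernel "$1" "$2" &&
        [ "$(cat "${3:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = 0 ]
}
```

In the init script, the kexec -p line would then sit behind if should_arm_crash_kernel, and the copy-and-reboot branch behind in_capture_kernel, so the capture kernel never tries to arm another capture kernel.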
I’m probably overthinking this, but… we’re having crashes in the field and the customers are a little riled up right now, so I don’t want to spend a lot of time saying “here, try this image”. They want their smoking gun and they want it soon.
Thanks,
-Philip