Discussion:
Questions about kexec-tools (resend to list)
Philip Prindeville
2017-03-04 23:26:10 UTC
Hi.

I’m working on a couple of minimal embedded systems, and one of the things I do is platform bring-up (which means dealing with crashes).

We have kexec-tools ported to our distro (OpenWRT/LEDE) but not the OS scripting to integrate them. I had a couple of questions about Documentation/kdump.txt that I was hoping you all wouldn’t mind answering.

The kernel I’m building (for a Xeon Haswell and an Atom64 Pineview) has:

# CONFIG_SMP is not set
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
# CONFIG_RANDOMIZE_BASE is not set
CONFIG_PROC_VMCORE=y
CONFIG_SYSFS=y
CONFIG_DEBUG_INFO=y

all per the directions above.
2) Or use the system kernel binary itself as dump-capture kernel and there is
no need to build a separate dump-capture kernel. This is possible
only with the architectures which support a relocatable kernel. As
of today, i386, x86_64, ppc64, ia64 and arm architectures support relocatable
kernel.
5) Make and install the kernel and its modules. DO NOT add this kernel
to the boot loader configuration files.
In the case of having a single system kernel binary, then you’d have to install this kernel and its modules, and add this kernel to the boot loader configuration files, wouldn’t you? What do my grub arguments look like?
where Y specifies how much memory to reserve for the dump-capture kernel
and X specifies the beginning of this reserved memory. For example,
starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
If you are using a compressed bzImage/vmlinuz, then use following command
to load dump-capture kernel.
kexec -p <dump-capture-kernel-bzImage> \
--initrd=<initrd-for-dump-capture-kernel> \
--append="root=<root-dev> <arch-specific-options>"
Not sure I understand this part. So if we have a relocatable kernel with crashdump built-in to our system kernel, do we need to load two kernels, just with different <arch-specific-options> and everything else being the same?

Would the <arch-specific-options> be:

crashkernel=***@16M 1 irqpoll maxcpus=1 reset_devices

in that case?

On a normally running system, using an overlay root, our cmdline looks like:

BOOT_IMAGE=/boot/vmlinuz block2mtd.block2mtd=/dev/sda2,65536,rootfs,5 root=/dev/mtdblock0 rootfstype=squashfs rootwait console=tty0 console=ttyS0,115200n8r noinitrd


so I guess we’d just mash on those extra arguments. On a running system, our mount points are:

/dev/root on /rom type squashfs (ro,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,noatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,noatime)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noatime)
tmpfs on /tmp/root type tmpfs (rw,noatime,mode=755)
tmpfs on /dev type tmpfs (rw,nosuid,relatime,size=512k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,mode=600)
debugfs on /sys/kernel/debug type debugfs (rw,noatime)
/dev/mtdblock1 on /overlay type jffs2 (rw,noatime)
overlayfs:/overlay on / type overlay (rw,noatime,lowerdir=/,upperdir=/overlay/upper,workdir=/overlay/work)


but it doesn’t sound like any of that would change (except perhaps mounting a USB thumb-drive if we wanted to copy our crashdump to that device instead).

So if I’ve understood, when the first loaded kernel (the system kernel) crashes, kexec will then try the next kernel it sees… which will be something like:

kexec -p /boot/vmlinuz \
--append="$(cat /proc/cmdline) irqpoll maxcpus=1 reset_devices 1"
Kernel Panic
============
After successfully loading the dump-capture kernel as previously
described, the system will reboot into the dump-capture kernel if a
system crash is triggered. [snip]
assuming the system isn’t so badly hosed that a WDT expires causing a BIOS reset, etc.

Do both kernels use the same "crashkernel=" value, or do they need different base addresses?

And assuming that you’re using the same kernel, etc. how does the init.d scripting on the crashdump (2nd instance of the kernel) know that it’s not the nominal kernel? Do we use /sys/kernel/kexec_loaded for this purpose? Or do we just look for the existence of /proc/vmcore?

And then have something in my init.d scripts like:

kexec_loaded=$(< /sys/kernel/kexec_loaded)

if [ "$kexec_loaded" = 0 ]; then
    kexec -p /boot/vmlinuz \
        --append="$(cat /proc/cmdline) irqpoll maxcpus=1 reset_devices 1"
else
    echo "*** HANDLING CRASH DUMP COLLECTION"
    mkdir -p /mnt/crashdrive
    mount LABEL=crashdrive /mnt/crashdrive
    # might do something clever here with "df --output=avail -m /mnt/crashdrive" to make
    # sure I have enough space for the copy, perhaps deleting older dumps until I do…
    cp /proc/vmcore /mnt/crashdrive
    sync
    umount /mnt/crashdrive
    echo "*** NOW REBOOTING"
    reboot -f
fi

Do I need to reboot in a particular way to avoid looping? The “Kernel Panic” section seems to state that normal reboots won’t be affected.

I appreciate the documentation you’ve written, but it’s a little unclear (to me at least) how to handle the degenerate case of using the same kernel as the system kernel and the crashdump kernel…

I want to make sure that I don’t inadvertently set it up to do looping infinitely nested kernels, etc.

I’m probably overthinking this, but… we’re having crashes in the field and the customers are a little riled up right now so I don’t want to spend a lot of time saying “here try this image”. They want their smoking gun and they want it soon.

Thanks,

-Philip
Pratyush Anand
2017-03-07 14:53:09 UTC
Hi Philip,

On Sunday 05 March 2017 04:56 AM, Philip Prindeville wrote:

[...]
Post by Philip Prindeville
In the case of having a single system kernel binary, then you’d have to install this kernel and it’s modules, and add this kernel to the boot loader configuration files, wouldn’t you? What do my grub arguments look like?
Not necessarily all the modules. The kdump kernel needs only a minimal
set of modules. You can build your initramfs with just the modules needed
to boot and copy vmcore.
In the first kernel you need to pass "crashkernel=". Giving only the size
(64M) should also work; the kernel should then find an appropriate start
address for the crash kernel region.
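
One way to confirm that the reservation took effect is to look for the "Crash kernel" entry in /proc/iomem and compute its size. What follows is only a minimal sketch, assuming a POSIX sh; the function name crash_region_size and its optional file argument are hypothetical, introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the size in bytes of the "Crash kernel" region
# listed in an iomem-style file (defaults to /proc/iomem).
# Prints 0 when no such region is listed, i.e. nothing was reserved.
crash_region_size() {
    iomem="${1:-/proc/iomem}"
    # Matching lines look like "  34000000-37ffffff : Crash kernel".
    range=$(sed -n 's/^ *\([0-9a-fA-F]*-[0-9a-fA-F]*\) : Crash kernel$/\1/p' "$iomem" | head -n 1)
    if [ -z "$range" ]; then
        echo 0
        return
    fi
    start=${range%-*}   # text before the dash
    end=${range#*-}     # text after the dash
    # Both bounds are inclusive, hence the +1.
    echo $(( 0x$end - 0x$start + 1 ))
}
```

For an entry such as "34000000-37ffffff : Crash kernel" this prints 67108864, which is 64M.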
Post by Philip Prindeville
where Y specifies how much memory to reserve for the dump-capture kernel
and X specifies the beginning of this reserved memory. For example,
starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
If you are using a compressed bzImage/vmlinuz, then use following command
to load dump-capture kernel.
kexec -p <dump-capture-kernel-bzImage> \
--initrd=<initrd-for-dump-capture-kernel> \
--append="root=<root-dev> <arch-specific-options>"
Not sure I understand this part. So if we have a relocatable kernel with crashdump built-in to our system kernel, do we need to load two kernels, just with different <arch-specific-options> and everything else being the same?
You are in the primary kernel and you need to load the crash kernel.

`kexec -p /boot/vmlinuz --initrd=/boot/kdump-initrd --reuse-cmdline
--append="irqpoll maxcpus=1 reset_devices"` should work.

You need to prepare a kdump-initrd, OR you can use the current initrd, but
that will load all the modules of the 1st kernel, and 64M might not be
enough space then.
"crashkernel=" *must* *not* be passed to the crash kernel. It is only for
the primary kernel.
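
If the command line is assembled by hand (for example from /proc/cmdline) rather than with --reuse-cmdline, the crashkernel= parameter has to be filtered out first. The following is a minimal sketch, assuming a POSIX sh; capture_cmdline is a hypothetical helper name, and the appended options are simply the ones suggested above.

```shell
#!/bin/sh
# Hypothetical helper: build a command line for the crash kernel from the
# primary kernel's command line (passed as $1).
# The body is a subshell so that "set -f" does not leak into the caller.
capture_cmdline() (
    set -f                      # no glob expansion while word-splitting $1
    out=""
    for arg in $1; do
        case "$arg" in
            crashkernel=*) ;;   # must not be passed to the crash kernel
            *) out="${out:+$out }$arg" ;;
        esac
    done
    printf '%s\n' "${out:+$out }irqpoll maxcpus=1 reset_devices"
)
```

It could then be used as: kexec -p /boot/vmlinuz --append="$(capture_cmdline "$(cat /proc/cmdline)")"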
Post by Philip Prindeville
in that case?
BOOT_IMAGE=/boot/vmlinuz block2mtd.block2mtd=/dev/sda2,65536,rootfs,5 root=/dev/mtdblock0 rootfstype=squashfs rootwait console=tty0 console=ttyS0,115200n8r noinitrd
So, it should also have crashkernel=64M.
Post by Philip Prindeville
/dev/root on /rom type squashfs (ro,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,noatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,noatime)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noatime)
tmpfs on /tmp/root type tmpfs (rw,noatime,mode=755)
tmpfs on /dev type tmpfs (rw,nosuid,relatime,size=512k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,mode=600)
debugfs on /sys/kernel/debug type debugfs (rw,noatime)
/dev/mtdblock1 on /overlay type jffs2 (rw,noatime)
overlayfs:/overlay on / type overlay (rw,noatime,lowerdir=/,upperdir=/overlay/upper,workdir=/overlay/work)
but it doesn’t sound like any of that would change (except perhaps mounting a USB thumb-drive if we wanted to copy our crashdump to that device instead).
kexec -p /boot/vmlinuz \
—-append=“$(cat /proc/cmdline) irqpoll maxcpus=1 reset_devices 1”
OK, so you can omit the --initrd argument to kexec.
Post by Philip Prindeville
Kernel Panic
============
After successfully loading the dump-capture kernel as previously
described, the system will reboot into the dump-capture kernel if a
system crash is triggered. [snip]
assuming the system isn’t so badly hosed that a WDT expires causing a BIOS reset, etc.
Do both kernels use the same “crashdump=“ value, or do they need different base addresses?
Again, only the 1st kernel needs "crashkernel=".
Post by Philip Prindeville
And assuming that you’re using the same kernel, etc. how does the init.d scripting on the crashdump (2nd instance of the kernel) know that it’s not the nominal kernel? Do we use /sys/kernel/kexec_loaded for this purpose? Or do we just look for the existence of /proc/vmcore?
Yep, you can find /proc/vmcore in the 2nd kernel but not in the 1st kernel.
/sys/kernel/kexec_crash_loaded should read 1 in the 1st kernel (once the
crash kernel has been loaded) and 0 in the crash kernel.
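
The two checks can be combined so that an init script can tell three states apart: running as the crash kernel, running as the primary kernel with a crash kernel loaded, and running as the primary kernel with nothing loaded yet. This is a minimal sketch, assuming a POSIX sh; kdump_role and the VMCORE and CRASH_LOADED override variables are hypothetical names, present only so the logic can be exercised without a real crash.

```shell
#!/bin/sh
# Hypothetical helper: report which role the running kernel has.
#   capture  - /proc/vmcore exists, so this is the crash (dump-capture) kernel
#   armed    - primary kernel with a crash kernel already loaded
#   unarmed  - primary kernel, "kexec -p" has not succeeded yet
# VMCORE and CRASH_LOADED may be overridden for testing (an assumption of
# this sketch, not something kexec-tools provides).
kdump_role() {
    vmcore="${VMCORE:-/proc/vmcore}"
    loaded="${CRASH_LOADED:-/sys/kernel/kexec_crash_loaded}"
    if [ -e "$vmcore" ]; then
        echo capture
    elif [ "$(cat "$loaded" 2>/dev/null)" = 1 ]; then
        echo armed
    else
        echo unarmed
    fi
}
```

An init script would then run kexec -p only in the unarmed state and copy /proc/vmcore only in the capture state. Testing kexec_crash_loaded alone cannot make that distinction, since it reads 0 in both of those states.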
Post by Philip Prindeville
kexec_loaded=$(< /sys/kernel/kexec_loaded)
/sys/kernel/kexec_crash_loaded
Post by Philip Prindeville
if [ “$kexec_loaded” = 0 ]; then
kexec -p /boot/vmlinuz \
—-append=“$(cat /proc/cmdline) irqpoll maxcpus=1 reset_devices 1”
else
echo “*** HANDLING CRASH DUMP COLLECTION"
mkdir -p /mnt/crashdrive
mount LABEL=crashdrive /mnt/crashdrive
# might do something clever here with “df —output=avail -m /mnt/crashdrive” to make
# sure I have enough space for the copy, perhaps deleting older dumps until I do…
cp /proc/vmcore /mnt/crashdrive
sync
umount /mnt/crashdrive
echo “*** NOW REBOOTING"
reboot -f
fi
Above should work.

There are many ways to do this. You can have a look at the Fedora kexec-tools code.
http://pkgs.fedoraproject.org/cgit/rpms/kexec-tools.git/
Post by Philip Prindeville
Do I need to reboot in a particular way to avoid looping? The “Kernel Panic” section seems to state that normal reboots won’t be affected.
When you execute reboot, it will reboot to the 1st kernel through grub
(boot loader).
Post by Philip Prindeville
I appreciate the documentation you’ve written, but it’s a little unclear (to me at least) how to handle the degenerate case of using the same kernel as the system kernel and the crashdump kernel…
I want to make sure that I don’t inadvertently set it up to do looping infinitely nested kernels, etc.
I’m probably overthinking this, but… we’re having crashes in the field and the customers are a little riled up right now so I don’t want to spend a lot of time saying “here try this image”. They want their smoking gun and they want it soon.
~Pratyush
Philip Prindeville
2017-03-07 23:34:38 UTC
Post by Pratyush Anand
Hi Philip,
[...]
Post by Philip Prindeville
In the case of having a single system kernel binary, then you’d have to install this kernel and it’s modules, and add this kernel to the boot loader configuration files, wouldn’t you? What do my grub arguments look like?
Not necessarily all the modules. Kdump kernel will use only minimal modules. You can build your initramfs with a minimum needed module, so that you can boot and copy vmcore.
In the first kernel you need to pass "crashkernel=". Only size(64M )should also work. Kernel should find the appropriate start address of crash kernel location.
Post by Philip Prindeville
where Y specifies how much memory to reserve for the dump-capture kernel
and X specifies the beginning of this reserved memory. For example,
starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
If you are using a compressed bzImage/vmlinuz, then use following command
to load dump-capture kernel.
kexec -p <dump-capture-kernel-bzImage> \
--initrd=<initrd-for-dump-capture-kernel> \
--append="root=<root-dev> <arch-specific-options>"
Not sure I understand this part. So if we have a relocatable kernel with crashdump built-in to our system kernel, do we need to load two kernels, just with different <arch-specific-options> and everything else being the same?
You are in primary kernel and you need to load crash kernel.
`kexec -p /boot/vmlinuz --initrd=/boot/kdump-initrd --reuse-cmdline --append="irqpoll maxcpus=1 reset_devices"` should work.
Tried something like that:

***@PowercodeBMU:/# kexec -p /boot/vmlinuz --reuse-cmdline --append="irqpoll maxcpus=1 reset_devices 1"
Cannot get kernel page_offset_base symbol address
Cannot load /boot/vmlinuz
***@PowercodeBMU:/#

Not sure why I’m seeing this.
Post by Pratyush Anand
You need to prepare kdump-initrd, OR you can use current initrd, but that will load all your modules of 1st kernel and 64M might not be sufficient space then.
It’s an embedded system so it’s pretty skinny. Everything needed to boot is “baked in”. Everything else gets loaded as a module into the booting kernel via init.d scripts …
Post by Pratyush Anand
"crashkernel=" *must* *not* be passed to crash kernel. It is only for the primary kernel.
Okay. And --reuse-cmdline takes care of stripping that out for you, it looks like. That option isn’t discussed in Documentation/kdump/ but it might be handy to add something about it.
Post by Pratyush Anand
Post by Philip Prindeville
in that case?
BOOT_IMAGE=/boot/vmlinuz block2mtd.block2mtd=/dev/sda2,65536,rootfs,5 root=/dev/mtdblock0 rootfstype=squashfs rootwait console=tty0 console=ttyS0,115200n8r noinitrd
So, it should also have crashkernel=64M.
Well, right. I was talking about a nominal system before I started trying to get it to be crash-dump capable.
Post by Pratyush Anand
Post by Philip Prindeville
/dev/root on /rom type squashfs (ro,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,noatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,noatime)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noatime)
tmpfs on /tmp/root type tmpfs (rw,noatime,mode=755)
tmpfs on /dev type tmpfs (rw,nosuid,relatime,size=512k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,mode=600)
debugfs on /sys/kernel/debug type debugfs (rw,noatime)
/dev/mtdblock1 on /overlay type jffs2 (rw,noatime)
overlayfs:/overlay on / type overlay (rw,noatime,lowerdir=/,upperdir=/overlay/upper,workdir=/overlay/work)
but it doesn’t sound like any of that would change (except perhaps mounting a USB thumb-drive if we wanted to copy our crashdump to that device instead).
Ah, actually, that’s not quite right. /boot has been unmounted early on but we’ll need to keep it mounted (even if we remount it as ‘ro’).
Post by Pratyush Anand
Post by Philip Prindeville
kexec -p /boot/vmlinuz \
—-append=“$(cat /proc/cmdline) irqpoll maxcpus=1 reset_devices 1”
OK..so you can exclude --initrd argument to kexec.
Yes.
Post by Pratyush Anand
Post by Philip Prindeville
Kernel Panic
============
After successfully loading the dump-capture kernel as previously
described, the system will reboot into the dump-capture kernel if a
system crash is triggered. [snip]
assuming the system isn’t so badly hosed that a WDT expires causing a BIOS reset, etc.
Do both kernels use the same “crashdump=“ value, or do they need different base addresses?
Again, only 1st kernel need "crashkernel=“.
Okay, got it.
Post by Pratyush Anand
Post by Philip Prindeville
And assuming that you’re using the same kernel, etc. how does the init.d scripting on the crashdump (2nd instance of the kernel) know that it’s not the nominal kernel? Do we use /sys/kernel/kexec_loaded for this purpose? Or do we just look for the existence of /proc/vmcore?
Yep, you can find /proc/vmcore in 2nd kernel but not in 1st kernel.
/sys/kernel/kexec_crash_loaded should have 1 in 1st kernel while 0 in crash kernel.
So far I’m seeing the opposite:

***@PowercodeBMU:/# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz block2mtd.block2mtd=/dev/sda2,65536,rootfs,5 root=/dev/mtdblock0 rootfstype=squashfs rootwait console=tty0 console=ttyS0,115200n8r noinitrd crashkernel=64M
***@PowercodeBMU:/# cat /sys/kernel/kexec_crash_loaded
0
***@PowercodeBMU:/#

Maybe it’s the other way around?
Post by Pratyush Anand
Post by Philip Prindeville
kexec_loaded=$(< /sys/kernel/kexec_loaded)
/sys/kernel/kexec_crash_loaded
Right.
Post by Pratyush Anand
Post by Philip Prindeville
if [ “$kexec_loaded” = 0 ]; then
kexec -p /boot/vmlinuz \
—-append=“$(cat /proc/cmdline) irqpoll maxcpus=1 reset_devices 1”
else
echo “*** HANDLING CRASH DUMP COLLECTION"
mkdir -p /mnt/crashdrive
mount LABEL=crashdrive /mnt/crashdrive
# might do something clever here with “df —output=avail -m /mnt/crashdrive” to make
# sure I have enough space for the copy, perhaps deleting older dumps until I do…
cp /proc/vmcore /mnt/crashdrive
sync
umount /mnt/crashdrive
echo “*** NOW REBOOTING"
reboot -f
fi
Above should work.
Question… will crashkernel being 64M mean that /sys/kernel/kexec_crash_size is also 64M (67108864) and that would also be the size of /proc/vmcore?
Post by Pratyush Anand
There can be many ways. You can have a look on fedora kexec-tools code.
http://pkgs.fedoraproject.org/cgit/rpms/kexec-tools.git/
Post by Philip Prindeville
Do I need to reboot in a particular way to avoid looping? The “Kernel Panic” section seems to state that normal reboots won’t be affected.
When you execute reboot, it will reboot to the 1st kernel through grub (boot loader).
Okay.

Thanks,

-Philip
Post by Pratyush Anand
Post by Philip Prindeville
I appreciate the documentation you’ve written, but it’s a little unclear (to me at least) how to handle the degenerate case of using the same kernel as the system kernel and the crashdump kernel…
I want to make sure that I don’t inadvertently set it up to do looping infinitely nested kernels, etc.
I’m probably overthinking this, but… we’re having crashes in the field and the customers are a little riled up right now so I don’t want to spend a lot of time saying “here try this image”. They want their smoking gun and they want it soon.
~Pratyush
Pratyush Anand
2017-03-08 11:33:23 UTC
On Wednesday 08 March 2017 05:04 AM, Philip Prindeville wrote:
[...]
Post by Philip Prindeville
Cannot get kernel page_offset_base symbol address
That you can ignore.
The following patch will change this error message into a warning:
http://lists.infradead.org/pipermail/kexec/2017-March/018299.html
Post by Philip Prindeville
Cannot load /boot/vmlinuz
Hmm, can you please run with -d and share the debug output?
[...]
Post by Philip Prindeville
Post by Pratyush Anand
"crashkernel=" *must* *not* be passed to crash kernel. It is only for the primary kernel.
Okay. And --reuse-cmdline takes care of stripping that out for you, it looks like. That option isn’t discussed in Documentation/kdump/ but it might be handy to add something about it.
You should find it described in the kexec-tools documentation:

man kexec
[...]
Post by Philip Prindeville
Post by Pratyush Anand
Post by Philip Prindeville
And assuming that you’re using the same kernel, etc. how does the init.d scripting on the crashdump (2nd instance of the kernel) know that it’s not the nominal kernel? Do we use /sys/kernel/kexec_loaded for this purpose? Or do we just look for the existence of /proc/vmcore?
Yep, you can find /proc/vmcore in 2nd kernel but not in 1st kernel.
/sys/kernel/kexec_crash_loaded should have 1 in 1st kernel while 0 in crash kernel.
BOOT_IMAGE=/boot/vmlinuz block2mtd.block2mtd=/dev/sda2,65536,rootfs,5 root=/dev/mtdblock0 rootfstype=squashfs rootwait console=tty0 console=ttyS0,115200n8r noinitrd crashkernel=64M
0
Because your kexec -p has not succeeded yet.

~Pratyush
Philip Prindeville
2017-03-08 17:29:00 UTC
Post by Pratyush Anand
[...]
Post by Philip Prindeville
Cannot get kernel page_offset_base symbol address
That you can ignore.
http://lists.infradead.org/pipermail/kexec/2017-March/018299.html
Done. Thanks!
Post by Pratyush Anand
Post by Philip Prindeville
Cannot load /boot/vmlinuz
Humm..can you pl run with -d and share debug output.
Sure thing:

***@PowercodeBMU:/# file /tmp/boot/boot/vmlinuz
/tmp/boot/boot/vmlinuz: Linux kernel x86 boot executable bzImage, version 4.4.14 (***@ubuntu16) #10 Wed Mar 8 17:19:10 UTC 2017, RO-rootFS, swap_dev 0x2, Normal VGA
***@PowercodeBMU:/#
***@PowercodeBMU:/# kexec -d -p /tmp/boot/boot/vmlinuz --reuse-cmdline --append="irqpoll maxcpus=1 reset_devices 1"
Try gzip decompression.
Try LZMA decompression.
lzma_decompress_file: read on /tmp/boot/boot/vmlinuz of 65536 bytes failed
kernel: 0x7f18b2596020 kernel_size: 0x27af20
MEMORY RANGES
0000000000000100-0000000000099bff (0)
0000000000099c00-000000000009ffff (1)
00000000000e0000-00000000000fffff (1)
0000000000100000-00000000cc309fff (0)
00000000cc30a000-00000000cc310fff (3)
00000000cc311000-00000000cc946fff (0)
00000000cc947000-00000000ccb5bfff (1)
00000000ccb5c000-00000000db8b4fff (0)
00000000db8b5000-00000000db957fff (1)
00000000db958000-00000000db96dfff (2)
00000000db96e000-00000000dbac3fff (3)
00000000dbac4000-00000000dbffefff (1)
00000000dbfff000-00000000dbffffff (0)
00000000dd000000-00000000df1fffff (1)
00000000f8000000-00000000fbffffff (1)
00000000fec00000-00000000fec00fff (1)
00000000fed00000-00000000fed03fff (1)
00000000fed1c000-00000000fed1ffff (1)
00000000fee00000-00000000fee00fff (1)
00000000ff000000-00000000ffffffff (1)
0000000100000000-000000041fdfffff (0)
CRASH MEMORY RANGES
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (3)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (2)
0000000000000005-ffffffffffffffff (3)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000130-00007f18b2a21938 (0)
Cannot get kernel page_offset_base symbol address
kernel symbol _stext vaddr = a0000000005
kernel vaddr = 0xffffffff81000000 size = 0x818000
Memmap after adding segment
0000000000000000-000000000009ffff (0)
Cannot load /tmp/boot/boot/vmlinuz
***@PowercodeBMU:/#

-Philip
Post by Pratyush Anand
[...]
Post by Philip Prindeville
Post by Pratyush Anand
"crashkernel=" *must* *not* be passed to crash kernel. It is only for the primary kernel.
Okay. And --reuse-cmdline takes care of stripping that out for you, it looks like. That option isn’t discussed in Documentation/kdump/ but it might be handy to add something about it.
You should see about it in kexec-tools doc.
man kexec
[...]
Post by Philip Prindeville
Post by Pratyush Anand
Post by Philip Prindeville
And assuming that you’re using the same kernel, etc. how does the init.d scripting on the crashdump (2nd instance of the kernel) know that it’s not the nominal kernel? Do we use /sys/kernel/kexec_loaded for this purpose? Or do we just look for the existence of /proc/vmcore?
Yep, you can find /proc/vmcore in 2nd kernel but not in 1st kernel.
/sys/kernel/kexec_crash_loaded should have 1 in 1st kernel while 0 in crash kernel.
BOOT_IMAGE=/boot/vmlinuz block2mtd.block2mtd=/dev/sda2,65536,rootfs,5 root=/dev/mtdblock0 rootfstype=squashfs rootwait console=tty0 console=ttyS0,115200n8r noinitrd crashkernel=64M
0
Because your kexec -p did not succeed yet.
~Pratyush
Pratyush Anand
2017-03-09 05:29:16 UTC
Post by Philip Prindeville
CRASH MEMORY RANGES
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (3)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (2)
0000000000000005-ffffffffffffffff (3)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000130-00007f18b2a21938 (0)
Not sure what's going on here, but the above does not look good. It should
have values from /proc/iomem.
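
To make that comparison easier, the top-level "System RAM" entries of /proc/iomem can be printed in the same 16-digit form that kexec -d uses. This is only a rough sketch, assuming a POSIX sh and awk; iomem_ram_ranges is a hypothetical helper name, and the output is meant for eyeballing against the debug dump, not as an exact reproduction of what kexec computes.

```shell
#!/bin/sh
# Hypothetical helper: list the top-level "System RAM" ranges of an
# iomem-style file (default /proc/iomem), zero-padded to 16 hex digits so
# they line up with the ranges printed by "kexec -d".
iomem_ram_ranges() {
    awk '
        # Left-pad a hex string with zeros to 16 characters.
        function pad(s) { return substr("0000000000000000" s, length(s) + 1) }
        # Top-level entries are not indented; nested ones start with spaces.
        /^[0-9a-f]+-[0-9a-f]+ : System RAM$/ {
            split($1, r, "-")
            print pad(r[1]) "-" pad(r[2])
        }
    ' "${1:-/proc/iomem}"
}
```

In the debug output quoted above, the MEMORY RANGES lines ending in (0) appear to correspond to these System RAM entries, so the CRASH MEMORY RANGES would be expected to resemble them as well.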
Post by Philip Prindeville
Cannot get kernel page_offset_base symbol address
kernel symbol _stext vaddr = a0000000005
kernel vaddr = 0xffffffff81000000 size = 0x818000
Memmap after adding segment
0000000000000000-000000000009ffff (0)
Cannot load /tmp/boot/boot/vmlinuz
The wrong ranges above might be the reason for the failure, most likely in
crash_create_elf**_headers().


~Pratyush
Philip Prindeville
2017-03-10 19:09:20 UTC
Post by Philip Prindeville
CRASH MEMORY RANGES
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (3)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (2)
0000000000000005-ffffffffffffffff (3)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000130-00007f18b2a21938 (0)
No sure whats going here..but above does not look good. It should have values from /proc/iomem.
For comparison, here’s what /proc/iomem looks like:

***@PowercodeBMU:/# cat /proc/iomem
00000000-00000fff : reserved
00001000-00099bff : System RAM
00099c00-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000cebff : Video ROM
000d0000-000d3fff : PCI Bus 0000:00
000d4000-000d7fff : PCI Bus 0000:00
000d8000-000dbfff : PCI Bus 0000:00
000dc000-000dffff : PCI Bus 0000:00
000e0000-000fffff : reserved
000e0000-000e3fff : PCI Bus 0000:00
000e4000-000e7fff : PCI Bus 0000:00
000f0000-000fffff : System ROM
00100000-cc309fff : System RAM
01000000-01483d1a : Kernel code
01483d1b-016b897f : Kernel data
01777000-017f1fff : Kernel bss
34000000-37ffffff : Crash kernel
cc30a000-cc310fff : ACPI Non-volatile Storage
cc311000-cc946fff : System RAM
cc947000-ccb5bfff : reserved
ccb5c000-db8b4fff : System RAM
db8b5000-db957fff : reserved
db958000-db96dfff : ACPI Tables
db96e000-dbac3fff : ACPI Non-volatile Storage
dbac4000-dbffefff : reserved
dbfff000-dbffffff : System RAM
dd000000-df1fffff : reserved
df200000-feafffff : PCI Bus 0000:00
e0000000-efffffff : 0000:00:02.0
f0000000-f04fffff : PCI Bus 0000:02
f0000000-f01fffff : 0000:02:00.1
f0000000-f01fffff : ixgbe
f0200000-f03fffff : 0000:02:00.0
f0200000-f03fffff : ixgbe
f0400000-f0403fff : 0000:02:00.1
f0400000-f0403fff : ixgbe
f0404000-f0407fff : 0000:02:00.0
f0404000-f0407fff : ixgbe
f0600000-f0afffff : PCI Bus 0000:01
f0600000-f07fffff : 0000:01:00.1
f0600000-f07fffff : ixgbe
f0800000-f09fffff : 0000:01:00.0
f0800000-f09fffff : ixgbe
f0a00000-f0a03fff : 0000:01:00.1
f0a00000-f0a03fff : ixgbe
f0a04000-f0a07fff : 0000:01:00.0
f0a04000-f0a07fff : ixgbe
f6000000-f63fffff : 0000:00:02.0
f6400000-f66fffff : PCI Bus 0000:0a
f6400000-f64fffff : 0000:0a:00.0
f6500000-f65fffff : 0000:0a:00.0
f6500000-f65fffff : igb
f6600000-f6603fff : 0000:0a:00.0
f6600000-f6603fff : igb
f6700000-f69fffff : PCI Bus 0000:09
f6700000-f67fffff : 0000:09:00.0
f6800000-f68fffff : 0000:09:00.0
f6800000-f68fffff : igb
f6900000-f6903fff : 0000:09:00.0
f6900000-f6903fff : igb
f6a00000-f6cfffff : PCI Bus 0000:08
f6a00000-f6afffff : 0000:08:00.0
f6b00000-f6bfffff : 0000:08:00.0
f6b00000-f6bfffff : igb
f6c00000-f6c03fff : 0000:08:00.0
f6c00000-f6c03fff : igb
f6d00000-f6ffffff : PCI Bus 0000:07
f6d00000-f6dfffff : 0000:07:00.0
f6e00000-f6efffff : 0000:07:00.0
f6e00000-f6efffff : igb
f6f00000-f6f03fff : 0000:07:00.0
f6f00000-f6f03fff : igb
f7000000-f72fffff : PCI Bus 0000:06
f7000000-f70fffff : 0000:06:00.0
f7100000-f71fffff : 0000:06:00.0
f7100000-f71fffff : igb
f7200000-f7203fff : 0000:06:00.0
f7200000-f7203fff : igb
f7300000-f75fffff : PCI Bus 0000:05
f7300000-f73fffff : 0000:05:00.0
f7400000-f74fffff : 0000:05:00.0
f7400000-f74fffff : igb
f7500000-f7503fff : 0000:05:00.0
f7500000-f7503fff : igb
f7600000-f78fffff : PCI Bus 0000:04
f7600000-f76fffff : 0000:04:00.0
f7700000-f77fffff : 0000:04:00.0
f7700000-f77fffff : igb
f7800000-f7803fff : 0000:04:00.0
f7800000-f7803fff : igb
f7900000-f7bfffff : PCI Bus 0000:03
f7900000-f79fffff : 0000:03:00.0
f7a00000-f7afffff : 0000:03:00.0
f7a00000-f7afffff : igb
f7b00000-f7b03fff : 0000:03:00.0
f7b00000-f7b03fff : igb
f7c00000-f7c0ffff : 0000:00:14.0
f7c00000-f7c0ffff : xhci-hcd
f7c11000-f7c110ff : 0000:00:1f.3
f7c12000-f7c127ff : 0000:00:1f.2
f7c12000-f7c127ff : ahci
f7c14000-f7c1400f : 0000:00:16.0
f7fdf000-f7fdffff : pnp 00:09
f7fe0000-f7feffff : pnp 00:09
f8000000-fbffffff : reserved
f8000000-fbffffff : pnp 00:09
fec00000-fec00fff : reserved
fec00000-fec003ff : IOAPIC 0
fed00000-fed03fff : reserved
fed00000-fed003ff : HPET 0
fed00000-fed003ff : PNP0103:00
fed10000-fed17fff : pnp 00:09
fed18000-fed18fff : pnp 00:09
fed19000-fed19fff : pnp 00:09
fed1c000-fed1ffff : reserved
fed1c000-fed1ffff : pnp 00:09
fed1f410-fed1f414 : iTCO_wdt.0.auto
fed1f410-fed1f414 : iTCO_wdt
fed20000-fed3ffff : pnp 00:09
fed40000-fed44fff : pnp 00:00
fed45000-fed8ffff : pnp 00:09
fed90000-fed93fff : pnp 00:09
fee00000-fee00fff : Local APIC
fee00000-fee00fff : reserved
ff000000-ffffffff : reserved
ff000000-ffffffff : INT0800:00
ff000000-ffffffff : pnp 00:09
100000000-41fdfffff : System RAM
41fe00000-41fffffff : RAM buffer
Post by Philip Prindeville
Cannot get kernel page_offset_base symbol address
kernel symbol _stext vaddr = a0000000005
kernel vaddr = 0xffffffff81000000 size = 0x818000
Memmap after adding segment
0000000000000000-000000000009ffff (0)
Cannot load /tmp/boot/boot/vmlinuz
Above wrong ranges might be the reason of failure most likely in crash_create_elf**_headers().
~Pratyush
Philip Prindeville
2017-03-10 20:20:21 UTC
Post by Philip Prindeville
Post by Pratyush Anand
Humm..can you pl run with -d and share debug output.
Try gzip decompression.
Try LZMA decompression.
lzma_decompress_file: read on /tmp/boot/boot/vmlinuz of 65536 bytes failed
kernel: 0x7f18b2596020 kernel_size: 0x27af20
MEMORY RANGES
0000000000000100-0000000000099bff (0)
0000000000099c00-000000000009ffff (1)
00000000000e0000-00000000000fffff (1)
0000000000100000-00000000cc309fff (0)
00000000cc30a000-00000000cc310fff (3)
00000000cc311000-00000000cc946fff (0)
00000000cc947000-00000000ccb5bfff (1)
00000000ccb5c000-00000000db8b4fff (0)
00000000db8b5000-00000000db957fff (1)
00000000db958000-00000000db96dfff (2)
00000000db96e000-00000000dbac3fff (3)
00000000dbac4000-00000000dbffefff (1)
00000000dbfff000-00000000dbffffff (0)
00000000dd000000-00000000df1fffff (1)
00000000f8000000-00000000fbffffff (1)
00000000fec00000-00000000fec00fff (1)
00000000fed00000-00000000fed03fff (1)
00000000fed1c000-00000000fed1ffff (1)
00000000fee00000-00000000fee00fff (1)
00000000ff000000-00000000ffffffff (1)
0000000100000000-000000041fdfffff (0)
CRASH MEMORY RANGES
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (3)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (2)
0000000000000005-ffffffffffffffff (3)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (1)
0000000000000005-ffffffffffffffff (0)
0000000000000130-00007f18b2a21938 (0)
Cannot get kernel page_offset_base symbol address
kernel symbol _stext vaddr = a0000000005
kernel vaddr = 0xffffffff81000000 size = 0x818000
Memmap after adding segment
0000000000000000-000000000009ffff (0)
Cannot load /tmp/boot/boot/vmlinuz
-Philip
The above memory ranges look highly suspect.

Anyone have any ideas where I should start digging to figure out where things are going sideways?

I’m using 2.0.14 on a Linux 4.4.19 kernel, with gcc 5.3.0 and MUSL 1.1.16.

Thanks,

-Philip
Baoquan He
2017-03-11 00:16:42 UTC
Post by Philip Prindeville
The above memory ranges look highly suspect.
Anyone have any ideas where I should start digging to figure out where things are going sideways?
I’m using 2.0.14 on a Linux 4.4.19 kernel, with gcc 5.3.0 and MUSL 1.1.16.
Hi,

This is a user-space program running in the 1st kernel. I really suggest
you start gdb to track what's going on, since your real /proc/iomem
differs from kexec's debug output. Honestly, it won't be too difficult;
I really like this kind of debugging. If it happened in the kdump
kernel, even in user-space tools like makedumpfile, you would have to add
debug printing again and again.

Thanks
Baoquan
Philip Prindeville
2017-03-11 03:14:28 UTC
Post by Baoquan He
Post by Philip Prindeville
The above memory ranges look highly suspect.
Anyone have any ideas where I should start digging to figure out where things are going sideways?
I’m using 2.0.14 on a Linux 4.4.19 kernel, with gcc 5.3.0 and MUSL 1.1.16.
Hi,
This is user space program and using in 1st kernel. I really suggest you
should start a gdb to track what's going on when your real /proc/iomem
is different with the debug printing of kexec. Honestly, it won't be too
difficult, I really like this kind of debugging. If happened in kdump
kernel, even in user space tools like makedumpfile, you have to add
debug printing again and again.
Thanks
Baoquan
Actually, I have my smoking gun. And found a clue (regrettably after I had done all the digging) to confirm that I was on the right track:

http://git.alsa-project.org/?p=alsa-lib.git;a=commitdiff;h=1d3f7975f920f47e6a8a324f547da2180e64171a

Sending a patch, separately.

-Philip
