Discussion:
kexec on panic
Denys Fedoryshchenko
2017-02-10 08:14:02 UTC
Hello,

After years of using kexec, and after a recent unpleasant experience with modern
(supposedly blazing fast to boot) hardware that needs 5-10 minutes
just to pass POST,
one question came to mind:
Is it possible to run a regular kexec (not the special "panic" one used to
capture crash data) on panic, to reduce reboot time?

Thanks!
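For context, the kernel keeps the two kinds of image in separate slots: the regular one loaded with `kexec -l` and the crash one loaded with `kexec -p`. Their state is exported in /sys/kernel/kexec_loaded and /sys/kernel/kexec_crash_loaded. A minimal sketch for checking them (the helper name `read_kexec_flag` is hypothetical):

```c
#include <stdio.h>

/* Read a 0/1 flag such as /sys/kernel/kexec_loaded (regular image)
 * or /sys/kernel/kexec_crash_loaded (panic/crash image).
 * Returns the value read (normally 0 or 1), or -1 if the file
 * cannot be opened or does not start with an integer. */
int read_kexec_flag(const char *path)
{
    FILE *f = fopen(path, "r");
    int value;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

For example, `read_kexec_flag("/sys/kernel/kexec_loaded")` should return 1 once `kexec -l` has loaded an image, independently of whether a crash image is also loaded.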
Petr Tesarik
2017-02-10 15:43:13 UTC
On Fri, 10 Feb 2017 10:14:02 +0200
Post by Denys Fedoryshchenko
Hello,
After years of using kexec, and after a recent unpleasant experience with modern
(supposedly blazing fast to boot) hardware that needs 5-10 minutes
just to pass POST,
Is it possible to run a regular kexec (not the special "panic" one used to
capture crash data) on panic, to reduce reboot time?
No. But you can load a specially crafted panic initrd which kexec's
back to the production kernel.

HTH,
Petr T
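A minimal sketch of what the /init of such a panic initrd might do, assuming an x86_64 capture kernel built with CONFIG_KEXEC_FILE, enough memory in the crashkernel= reservation to load the production kernel, and that the production kernel and initrd are reachable from the capture environment. It does roughly what `kexec -l` followed by `kexec -e` does, but through the kexec_file_load(2) syscall and reboot(LINUX_REBOOT_CMD_KEXEC). The function names are hypothetical, and saving /proc/vmcore first is left out:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/kexec.h>   /* KEXEC_FILE_NO_INITRAMFS */
#include <linux/reboot.h>  /* LINUX_REBOOT_CMD_KEXEC */
#include <string.h>
#include <sys/reboot.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Load `kernel` (plus optional `initrd`) into the regular kexec slot,
 * the same slot `kexec -l` uses. Returns 0 on success, -1 with errno set. */
int load_production_kernel(const char *kernel, const char *initrd,
                           const char *cmdline)
{
    unsigned long flags = 0;
    int kernel_fd, initrd_fd = -1, saved_errno;
    long rc;

    kernel_fd = open(kernel, O_RDONLY | O_CLOEXEC);
    if (kernel_fd < 0)
        return -1;

    if (initrd != NULL) {
        initrd_fd = open(initrd, O_RDONLY | O_CLOEXEC);
        if (initrd_fd < 0) {
            saved_errno = errno;
            close(kernel_fd);
            errno = saved_errno;
            return -1;
        }
    } else {
        flags |= KEXEC_FILE_NO_INITRAMFS;  /* no initrd to pass */
    }

#ifdef SYS_kexec_file_load
    /* cmdline_len must count the terminating NUL byte. */
    rc = syscall(SYS_kexec_file_load, kernel_fd, initrd_fd,
                 strlen(cmdline) + 1, cmdline, flags);
#else
    errno = ENOSYS;  /* this architecture has no kexec_file_load */
    rc = -1;
#endif

    saved_errno = errno;  /* close() must not clobber the syscall's errno */
    close(kernel_fd);
    if (initrd_fd >= 0)
        close(initrd_fd);
    errno = saved_errno;
    return rc == 0 ? 0 : -1;
}

/* Jump into the loaded image, like `kexec -e`; returns only on failure. */
int boot_loaded_kernel(void)
{
    return reboot(LINUX_REBOOT_CMD_KEXEC);
}
```

The initrd's /init would then call something like `load_production_kernel("/boot/vmlinuz", "/boot/initrd.img", "root=/dev/sda1 ro")` (hypothetical paths and command line) and, if that returns 0, `boot_loaded_kernel()`.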
Clif Houck
2017-02-15 17:29:32 UTC
Is it possible to kexec on demand (not panic!) into another kernel with
the idea being to avoid a reboot?

For instance, say you had Linux running in a ramdisk, and all that
ramdisk Linux did was lay down a bootable Linux image onto the main
disk, and then waited for a command to kexec into the Linux image on disk? Is
something like that possible? Would I still need to specially craft the
initrd? If so, is there any literature available on how to do that?

Thanks,
Clif Houck
_______________________________________________
kexec mailing list
http://lists.infradead.org/mailman/listinfo/kexec
Marc Milgram
2017-02-15 21:45:10 UTC
It is possible to boot into a new kernel as documented on the following
page:

https://access.redhat.com/discussions/682993

That said, even though this is documented on a Red Hat page, Red Hat
does not officially support it.

Marc
Jon Masters
2017-02-18 07:42:12 UTC
Hi Denys,
After years of using kexec, and after a recent unpleasant experience with modern
(supposedly blazing fast to boot) hardware that needs 5-10 minutes just to pass POST,
Is it possible to run a regular kexec (not the special "panic" one used to capture
crash data) on panic, to reduce reboot time?
Generally, you don't want to do this, because platform devices may be in
non-quiescent states (still doing DMA to random memory, etc.), along with
other nastiness, which means you don't want to do more than the minimum
in a kexec on panic (crash). We've seen no end of fun and games even with
just regular crash dumps while hardware is busily writing to memory that
it shouldn't be. An IOMMU helps, but isn't a cure-all.

Jon.
Denys Fedoryshchenko
2017-02-18 08:09:39 UTC
Well, I have to try. Even with a regular kexec I sometimes run into
hardware that doesn't come back up, but a small customer has an HP server
that needs almost 6 minutes to boot, no hot spare (hard to arrange for
many reasons: no spare 10G ports, hardware cost, etc.), and some nasty
bugs that are not resolved yet, which forces me to look for a way to
reduce reboot time.
If I can find a way to save the backtrace and reboot fast, it will help a
lot with debugging kernels with minimal downtime when a bug is
reproducible only on a live system.

What I have done so far might be insanely wrong, but:
diff -Naur linux-4.9.9-vanilla/kernel/kexec_core.c linux-4.9.9/kernel/kexec_core.c
--- linux-4.9.9-vanilla/kernel/kexec_core.c 2017-02-09 07:08:40.000000000 +0000
+++ linux-4.9.9/kernel/kexec_core.c 2017-02-17 12:54:49.000000000 +0000
@@ -897,6 +897,10 @@
 			machine_crash_shutdown(&fixed_regs);
 			machine_kexec(kexec_crash_image);
 		}
+		if (kexec_image) {
+			machine_shutdown();
+			machine_kexec(kexec_image);
+		}
 		mutex_unlock(&kexec_mutex);
 	}
 }

Then

kexec -l /mnt/flash/kernel --append="intel_idle.max_cstate=0 processor.max_cstate=1"

and

echo c > /proc/sysrq-trigger

worked even on a busy network router, but I'm not sure it will behave the
same on a real networking stack crash.
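A plain-C model of the control flow the patch above produces in __crash_kexec() may help show when the fast reboot path is actually taken. This is not kernel code; the enum and function names are hypothetical. Because machine_kexec() does not return, a loaded crash image always wins, and nothing happens at all if kexec_mutex cannot be taken:

```c
#include <stdbool.h>

/* Which image a panic ends up in with the patch applied. */
enum panic_action {
    PANIC_KEXEC_CRASH_IMAGE,   /* unchanged path: kdump capture kernel */
    PANIC_KEXEC_REGULAR_IMAGE, /* new path: image loaded with kexec -l */
    PANIC_NO_KEXEC             /* continue with normal panic handling */
};

/* Models the patched __crash_kexec(): the crash image is tried first,
 * then the regular image, all under mutex_trylock(&kexec_mutex). */
enum panic_action choose_panic_action(bool got_kexec_mutex,
                                      bool crash_image_loaded,
                                      bool regular_image_loaded)
{
    if (!got_kexec_mutex)  /* mutex_trylock() failed: no kexec at all */
        return PANIC_NO_KEXEC;
    if (crash_image_loaded)
        return PANIC_KEXEC_CRASH_IMAGE;
    if (regular_image_loaded)
        return PANIC_KEXEC_REGULAR_IMAGE;
    return PANIC_NO_KEXEC;
}
```

In other words, for the regular image to be used, /sys/kernel/kexec_crash_loaded should read 0 and /sys/kernel/kexec_loaded should read 1 at the time of the crash.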
