Discussion:
crash_kexec in oops_end() and panic()
Daniel Walker
2017-06-07 16:20:51 UTC
Permalink
Hi,

These two paths seem to be duplicating each other. We have an issue
where we're using mtdoops to collect kernel logs on oops and panic, we
also have a crash kernel (which also collects these logs). mtdoops saves
logs differently for oops and panic, since oops isn't always fatal it
schedules a write to the flash. Since panic() is always fatal is writes
the logs immediately. In oops_end() the crash kernel runs immediately
while still signaling an OOPS condition to mtdoops. Since mtdoops
schedules a write to flash later, there is no later since the crash
kernel runs immediately, we end up without getting the logs

I'm wondering what the significance is to have these two paths ?
oops_end() could just call into panic() or a modified panic_with_regs()
then we would collapse multiple paths. There is what I would call a hack
in kexec_should_crash() which checks if there are
crash_kexec_post_notifiers and it runs panic() if they exist. This
wouldn't be needed if we always called panic() . I also wonder if there
are other things in panic() which we should be running , but don't get
run because of these two paths.


Any info appreciated.


Daniel
Eric W. Biederman
2017-06-07 16:46:27 UTC
Permalink
Post by Daniel Walker
Hi,
These two paths seem to be duplicating each other. We have an issue
where we're using mtdoops to collect kernel logs on oops and panic, we
also have a crash kernel (which also collects these logs). mtdoops
saves logs differently for oops and panic, since oops isn't always
fatal it schedules a write to the flash. Since panic() is always fatal
is writes the logs immediately. In oops_end() the crash kernel runs
immediately while still signaling an OOPS condition to mtdoops. Since
mtdoops schedules a write to flash later, there is no later since the
crash kernel runs immediately, we end up without getting the logs
I'm wondering what the significance is to have these two paths ?
oops_end() could just call into panic() or a modified
panic_with_regs() then we would collapse multiple paths. There is what
I would call a hack in kexec_should_crash() which checks if there are
crash_kexec_post_notifiers and it runs panic() if they exist. This
wouldn't be needed if we always called panic() . I also wonder if
there are other things in panic() which we should be running , but
don't get run because of these two paths.
crash_kexec_post_notifiers is a horrible hack it is broken by design and
no one should use it.

Looking at the history and it still seems valid is the point of
kexec_should_crash is so that crash_kexec could be called with the
registers at the time of the crash.

The code that is run in kexec on panic path the less well it works.
This has been a known fact for years.

Please figure out how to depend on less code running in a broken
kernel. Trying to figure out how to run more code is not the solution
to making the kernel reliable at the time of a crash.

Eric
Daniel Walker
2017-06-07 17:00:15 UTC
Permalink
Post by Eric W. Biederman
Post by Daniel Walker
Hi,
These two paths seem to be duplicating each other. We have an issue
where we're using mtdoops to collect kernel logs on oops and panic, we
also have a crash kernel (which also collects these logs). mtdoops
saves logs differently for oops and panic, since oops isn't always
fatal it schedules a write to the flash. Since panic() is always fatal
is writes the logs immediately. In oops_end() the crash kernel runs
immediately while still signaling an OOPS condition to mtdoops. Since
mtdoops schedules a write to flash later, there is no later since the
crash kernel runs immediately, we end up without getting the logs
I'm wondering what the significance is to have these two paths ?
oops_end() could just call into panic() or a modified
panic_with_regs() then we would collapse multiple paths. There is what
I would call a hack in kexec_should_crash() which checks if there are
crash_kexec_post_notifiers and it runs panic() if they exist. This
wouldn't be needed if we always called panic() . I also wonder if
there are other things in panic() which we should be running , but
don't get run because of these two paths.
crash_kexec_post_notifiers is a horrible hack it is broken by design and
no one should use it.
Looking at the history and it still seems valid is the point of
kexec_should_crash is so that crash_kexec could be called with the
registers at the time of the crash.
This is why I mention panic_with_regs() , it's not that hard to just
send them into panic().
Post by Eric W. Biederman
The code that is run in kexec on panic path the less well it works.
This has been a known fact for years.
Please figure out how to depend on less code running in a broken
kernel. Trying to figure out how to run more code is not the solution
to making the kernel reliable at the time of a crash.
Eric
While this may be true, your cutting short an already existing panic()
path which has been around and is well used. I would think if it's good
enough for everyone else it should be good enough for the crash kernel path.

Daniel
Eric W. Biederman
2017-06-07 17:11:08 UTC
Permalink
Post by Daniel Walker
Post by Eric W. Biederman
Post by Daniel Walker
Hi,
These two paths seem to be duplicating each other. We have an issue
where we're using mtdoops to collect kernel logs on oops and panic, we
also have a crash kernel (which also collects these logs). mtdoops
saves logs differently for oops and panic, since oops isn't always
fatal it schedules a write to the flash. Since panic() is always fatal
is writes the logs immediately. In oops_end() the crash kernel runs
immediately while still signaling an OOPS condition to mtdoops. Since
mtdoops schedules a write to flash later, there is no later since the
crash kernel runs immediately, we end up without getting the logs
I'm wondering what the significance is to have these two paths ?
oops_end() could just call into panic() or a modified
panic_with_regs() then we would collapse multiple paths. There is what
I would call a hack in kexec_should_crash() which checks if there are
crash_kexec_post_notifiers and it runs panic() if they exist. This
wouldn't be needed if we always called panic() . I also wonder if
there are other things in panic() which we should be running , but
don't get run because of these two paths.
crash_kexec_post_notifiers is a horrible hack it is broken by design and
no one should use it.
Looking at the history and it still seems valid is the point of
kexec_should_crash is so that crash_kexec could be called with the
registers at the time of the crash.
This is why I mention panic_with_regs() , it's not that hard to just
send them into panic().
Given your reasoning adding more useless code that is likely to
destabalize the code path I am not particularly interested.
Post by Daniel Walker
Post by Eric W. Biederman
The code that is run in kexec on panic path the less well it works.
This has been a known fact for years.
Please figure out how to depend on less code running in a broken
kernel. Trying to figure out how to run more code is not the solution
to making the kernel reliable at the time of a crash.
Eric
While this may be true, your cutting short an already existing panic()
path which has been around and is well used. I would think if it's
good enough for everyone else it should be good enough for the crash
kernel path.
You are talking about decisions over a decade old at this point. So
this isn't some new thing you are complaing about. In fact mtdoops is
much newer. So if it isn't working that is the code with the bug.

Furthermore we have lots of emperical evidence that doing lots of things
in the kernel at crash time doesn't work in practice.

Without kexec-on-panic there just isn't anything else the code can do,
so it makes sense to do a lot of flaky things.

I am not at all sympathetic with the notion of reorganizing the code so
that more code can run in the kexec on panic path and reduce it's
changes of working. We have been there. We have the scars. That is
the reason why kexec is used at all.

There used to be a core dumper in the kernel it only worked in the lab
never in the field.

What I hear you asking for is to put a core dumper (or something as
liable to failure) back in the kernel without acknowledging any of the
reasons why that doesn't work in practice. I am hearing you arguing for
making the code buggier. I am hearing that and I am saying not interested.

Eric
Daniel Walker
2017-06-07 17:38:50 UTC
Permalink
Post by Eric W. Biederman
Post by Daniel Walker
Post by Eric W. Biederman
Post by Daniel Walker
Hi,
These two paths seem to be duplicating each other. We have an issue
where we're using mtdoops to collect kernel logs on oops and panic, we
also have a crash kernel (which also collects these logs). mtdoops
saves logs differently for oops and panic, since oops isn't always
fatal it schedules a write to the flash. Since panic() is always fatal
is writes the logs immediately. In oops_end() the crash kernel runs
immediately while still signaling an OOPS condition to mtdoops. Since
mtdoops schedules a write to flash later, there is no later since the
crash kernel runs immediately, we end up without getting the logs
I'm wondering what the significance is to have these two paths ?
oops_end() could just call into panic() or a modified
panic_with_regs() then we would collapse multiple paths. There is what
I would call a hack in kexec_should_crash() which checks if there are
crash_kexec_post_notifiers and it runs panic() if they exist. This
wouldn't be needed if we always called panic() . I also wonder if
there are other things in panic() which we should be running , but
don't get run because of these two paths.
crash_kexec_post_notifiers is a horrible hack it is broken by design and
no one should use it.
Looking at the history and it still seems valid is the point of
kexec_should_crash is so that crash_kexec could be called with the
registers at the time of the crash.
This is why I mention panic_with_regs() , it's not that hard to just
send them into panic().
Given your reasoning adding more useless code that is likely to
destabalize the code path I am not particularly interested.
Post by Daniel Walker
Post by Eric W. Biederman
The code that is run in kexec on panic path the less well it works.
This has been a known fact for years.
Please figure out how to depend on less code running in a broken
kernel. Trying to figure out how to run more code is not the solution
to making the kernel reliable at the time of a crash.
Eric
While this may be true, your cutting short an already existing panic()
path which has been around and is well used. I would think if it's
good enough for everyone else it should be good enough for the crash
kernel path.
You are talking about decisions over a decade old at this point. So
this isn't some new thing you are complaing about. In fact mtdoops is
much newer. So if it isn't working that is the code with the bug.
Furthermore we have lots of emperical evidence that doing lots of things
in the kernel at crash time doesn't work in practice.
Without kexec-on-panic there just isn't anything else the code can do,
so it makes sense to do a lot of flaky things.
I am not at all sympathetic with the notion of reorganizing the code so
that more code can run in the kexec on panic path and reduce it's
changes of working. We have been there. We have the scars. That is
the reason why kexec is used at all.
There used to be a core dumper in the kernel it only worked in the lab
never in the field.
What I hear you asking for is to put a core dumper (or something as
liable to failure) back in the kernel without acknowledging any of the
reasons why that doesn't work in practice. I am hearing you arguing for
making the code buggier. I am hearing that and I am saying not interested.
panic() already runs crash_kexec(). I'm not familiar with this "dumper"
, can you explain the connection? I'm just suggesting we combine two
similar code paths. The panic() call does minimal additional code prior
to calling crash_kexec , but I'm not sure it's significant enough to
cause reliability to change. Is there something specific in panic()
which you can point to which you know would cause a problem ?

It seems your suggesting crash_kexec() inside panic() never , or rarely,
works in the field. I think it's worked fine a Cisco more than once.

Daniel

Loading...