[PATCH v26 0/7] arm64: add kdump support
AKASHI Takahiro
2016-09-07 04:37:46 UTC
v26-specific note: After a comment from Rob [0], the idea of adding
"linux,usable-memory-range" was dropped. Instead, the existing
"reserved-memory" node will be used to limit the usable memory ranges
on the crash dump kernel.
This works not only on UEFI/ACPI systems but also on DT-only systems,
but if he really insists on using the DT-specific "usable-memory" property,
I will post additional patches for kexec-tools. Those would be
redundant, though.
Even in that case, the kernel will not have to be changed.
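Limiting the crash dump kernel's usable memory to a given range amounts to intersecting each memory region with that range and dropping whatever falls outside it. The sketch below only illustrates that idea in plain C; `struct mem_region` and `cap_regions()` are hypothetical names, and this is not the kernel's memblock or reserved-memory code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor covering [base, base + size). */
struct mem_region {
	uint64_t base;
	uint64_t size;
};

/*
 * Clip every region in regs[0..n) to [cap_base, cap_base + cap_size),
 * dropping regions that lie entirely outside it. Returns the new count.
 * Assumes no region wraps past the end of the 64-bit address space.
 */
static size_t cap_regions(struct mem_region *regs, size_t n,
			  uint64_t cap_base, uint64_t cap_size)
{
	uint64_t cap_end = cap_base + cap_size;
	size_t out = 0;

	assert(cap_end >= cap_base);	/* usable range must not wrap */

	for (size_t i = 0; i < n; i++) {
		uint64_t start = regs[i].base;
		uint64_t end = regs[i].base + regs[i].size;

		if (start < cap_base)
			start = cap_base;
		if (end > cap_end)
			end = cap_end;
		if (start >= end)
			continue;	/* no overlap with the usable range */

		regs[out].base = start;
		regs[out].size = end - start;
		out++;
	}
	return out;
}
```

For example, with DRAM at [0x80000000, 0x100000000) and [0x880000000, 0x1000000000), capping to a 256 MiB range at 0x9f0000000 leaves a single region, [0x9f0000000, 0xa00000000).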
This patch series adds kdump support on arm64.
There are some prerequisite patches [1], [2].
To load a crash-dump kernel onto the system, a series of patches to
kexec-tools, which have not yet been merged upstream, is needed.
Please always use my latest kdump patches, v3 [3].
To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
- crash utility (upcoming v7.1.6 or later) [4]
(Necessary patches have already been queued in the master branch.)
[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/452582.html
[1] "arm64: mark reserved memblock regions explicitly in iomem"
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/450433.html
[2] "efi: arm64: treat regions with WT/WC set but WB cleared as memory"
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/451491.html
[3] T.B.D.
For kexec-tools, see:
http://lists.infradead.org/pipermail/kexec/2016-September/017158.html
[4] https://github.com/crash-utility/crash.git

-Takahiro AKASHI
o Use /reserved-memory instead of "linux,usable-memory-range" property
(dropping v25's patch#2 and #3, updating ex-patch#9.)
o Rebase to Linux-4.8-rc4
o Use memremap() instead of ioremap_cache() [patch#5]
o Rebase to Linux-4.8-rc1
o Update descriptions about newly added DT properties
o Move memblock_reserve() to a single place in reserve_crashkernel()
o Use cpu_park_loop() in ipi_cpu_crash_stop()
o Always enforce ARCH_LOW_ADDRESS_LIMIT on the memory range of the crash kernel
o Re-implement fdt_enforce_memory_region() to remove non-reserved regions
(for ACPI) from usable memory at crash kernel
o Export "crashkernel-base" and "crashkernel-size" via device-tree,
and add some descriptions about them in chosen.txt
o Rename "usable-memory" to "usable-memory-range" to avoid inconsistency
with powerpc's "usable-memory"
o Make cosmetic changes regarding "ifdef" usage
o Correct some wordings in kdump.txt
o Remove kexec patches.
o Rebase to arm64's for-next/core (Linux-4.7-rc4 based).
o Clarify the description about kvm in kdump.txt.
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/438780.html
arm64: kdump: reserve memory for crash dump kernel
memblock: add memblock_cap_memory_range()
arm64: limit memory regions based on DT property, usable-memory-range
arm64: kdump: implement machine_crash_shutdown()
arm64: kdump: add kdump support
arm64: kdump: add VMCOREINFO's for user-space coredump tools
arm64: kdump: enable kdump in the arm64 defconfig
arm64: kdump: update a kernel doc
Documentation: dt: chosen properties for arm64 kdump
Documentation/devicetree/bindings/chosen.txt | 45 ++++++
Documentation/kdump/kdump.txt | 16 ++-
arch/arm64/Kconfig | 11 ++
arch/arm64/configs/defconfig | 1 +
arch/arm64/include/asm/hardirq.h | 2 +-
arch/arm64/include/asm/kexec.h | 41 +++++-
arch/arm64/include/asm/smp.h | 2 +
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/crash_dump.c | 71 ++++++++++
arch/arm64/kernel/machine_kexec.c | 67 ++++++++-
arch/arm64/kernel/setup.c | 7 +-
arch/arm64/kernel/smp.c | 63 +++++++++
arch/arm64/mm/init.c | 202 +++++++++++++++++++++++++++
include/linux/memblock.h | 1 +
mm/memblock.c | 28 ++++
15 files changed, 551 insertions(+), 7 deletions(-)
create mode 100644 arch/arm64/kernel/crash_dump.c
--
2.9.0
arm64: kdump: reserve memory for crash dump kernel
arm64: kdump: implement machine_crash_shutdown()
arm64: kdump: add kdump support
arm64: kdump: add VMCOREINFO's for user-space coredump tools
arm64: kdump: enable kdump in the arm64 defconfig
arm64: kdump: update a kernel doc
Documentation: dt: chosen properties for arm64 kdump
Documentation/devicetree/bindings/chosen.txt | 30 +++++
Documentation/kdump/kdump.txt | 16 ++-
arch/arm64/Kconfig | 11 ++
arch/arm64/configs/defconfig | 1 +
arch/arm64/include/asm/hardirq.h | 2 +-
arch/arm64/include/asm/kexec.h | 41 ++++++-
arch/arm64/include/asm/smp.h | 2 +
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/crash_dump.c | 71 ++++++++++++
arch/arm64/kernel/machine_kexec.c | 67 ++++++++++-
arch/arm64/kernel/setup.c | 7 +-
arch/arm64/kernel/smp.c | 63 ++++++++++
arch/arm64/mm/init.c | 167 +++++++++++++++++++++++++++
13 files changed, 472 insertions(+), 7 deletions(-)
create mode 100644 arch/arm64/kernel/crash_dump.c
--
2.9.0
AKASHI Takahiro
2016-09-07 04:32:03 UTC
From: James Morse <***@arm.com>

Add documentation for
linux,crashkernel-base and linux,crashkernel-size,
linux,elfcorehdr
used by arm64 kexec/kdump to describe the kdump reserved area, and
the elfcorehdr's location within it.

Signed-off-by: James Morse <***@arm.com>
[***@linaro.org: added "linux,crashkernel-base" and "-size" ]
Signed-off-by: AKASHI Takahiro <***@linaro.org>
---
Documentation/devicetree/bindings/chosen.txt | 30 ++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
index 6ae9d82..6257ee7 100644
--- a/Documentation/devicetree/bindings/chosen.txt
+++ b/Documentation/devicetree/bindings/chosen.txt
@@ -52,3 +52,33 @@ This property is set (currently only on PowerPC, and only needed on
book3e) by some versions of kexec-tools to tell the new kernel that it
is being booted by kexec, as the booting environment may differ (e.g.
a different secondary CPU release mechanism)
+
+linux,crashkernel-base
+linux,crashkernel-size
+----------------------
+
+These properties (currently used on PowerPC and arm64) indicate
+the base address and the size, respectively, of the reserved memory
+range for the crash dump kernel.
+e.g.
+
+/ {
+ chosen {
+ linux,crashkernel-base = <0x9 0xf0000000>;
+ linux,crashkernel-size = <0x0 0x10000000>;
+ };
+};
+
+linux,elfcorehdr
+----------------
+
+This property (currently used only on arm64) holds the memory range,
+i.e. the address and the size, of the ELF core header, which mainly describes
+the panicked kernel's memory layout as PT_LOAD segments in ELF format.
+e.g.
+
+/ {
+ chosen {
+ linux,elfcorehdr = <0x9 0xfffff000 0x0 0x800>;
+ };
+};
--
2.9.0
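In the examples above each 64-bit value is written as two 32-bit cells, high word first, which assumes #address-cells = <2> and #size-cells = <2> at the root. So <0x9 0xf0000000> is 0x9f0000000, <0x0 0x10000000> is 256 MiB, and <0x9 0xfffff000 0x0 0x800> is a 2 KiB header at 0x9fffff000, inside the reserved range [0x9f0000000, 0xa00000000). The sketch below decodes such raw big-endian property bytes in the spirit of the kernel's dt_mem_next_cell(); `be32_cell()` and `read_cells()` are hypothetical standalone helpers, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one big-endian 32-bit FDT cell from raw property bytes. */
static uint32_t be32_cell(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Combine 'ncells' consecutive cells (1 or 2) into one value, high
 * cell first, and advance *pp past the cells that were consumed.
 */
static uint64_t read_cells(int ncells, const uint8_t **pp)
{
	uint64_t v = 0;

	assert(ncells == 1 || ncells == 2);
	while (ncells--) {
		v = (v << 32) | be32_cell(*pp);
		*pp += 4;
	}
	return v;
}
```

Reading a "linux,elfcorehdr" property is then two calls: one with the address cell count and one with the size cell count.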
Rob Herring
2016-09-16 13:03:08 UTC
Post by AKASHI Takahiro
Add documentation for
linux,crashkernel-base and linux,crashkernel-size,
linux,elfcorehdr
used by arm64 kexec/kdump to describe the kdump reserved area, and
the elfcorehdr's location within it.
---
Documentation/devicetree/bindings/chosen.txt | 30 ++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
Acked-by: Rob Herring <***@kernel.org>
AKASHI Takahiro
2016-09-07 04:29:08 UTC
This patch adds arch-specific descriptions about kdump usage on arm64
to kdump.txt.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: Baoquan He <***@redhat.com>
Acked-by: Dave Young <***@redhat.com>
---
Documentation/kdump/kdump.txt | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 88ff63d..c090531 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -18,7 +18,7 @@ memory image to a dump file on the local disk, or across the network to
a remote system.

Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64,
-s390x and arm architectures.
+s390x, arm and arm64 architectures.

When the system kernel boots, it reserves a small section of memory for
the dump-capture kernel. This ensures that ongoing Direct Memory Access
@@ -249,6 +249,13 @@ Dump-capture kernel config options (Arch Dependent, arm)

AUTO_ZRELADDR=y

+Dump-capture kernel config options (Arch Dependent, arm64)
+----------------------------------------------------------
+
+- Please note that kvm of the dump-capture kernel will not be enabled
+ on non-VHE systems even if it is configured. This is because the CPU
+ cannot be reset to EL2 on panic.
+
Extended crashkernel syntax
===========================

@@ -305,6 +312,8 @@ Boot into System Kernel
kernel will automatically locate the crash kernel image within the
first 512MB of RAM if X is not given.

+ On arm64, use "crashkernel=Y[@X]". Note that the start address of
+ the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).

Load the Dump-capture Kernel
============================
@@ -327,6 +336,8 @@ For s390x:
- Use image or bzImage
For arm:
- Use zImage
+For arm64:
+ - Use vmlinux or Image

If you are using a uncompressed vmlinux image then use following command
to load dump-capture kernel.
@@ -370,6 +381,9 @@ For s390x:
For arm:
"1 maxcpus=1 reset_devices"

+For arm64:
+ "1 maxcpus=1 reset_devices"
+
Notes on loading the dump-capture kernel:

* By default, the ELF headers are stored in ELF64 format to support
--
2.9.0
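The arm64 rule above, that an explicit start address X in "crashkernel=Y[@X]" must be 2 MiB aligned, is a simple mask test. The sketch below parses a "Y[@X]" value with optional K/M/G suffixes, loosely in the style of the kernel's memparse(), and applies that check. `parse_size()` and `parse_crashkernel_simple()` are hypothetical userspace helpers, not the kernel's parse_crashkernel(), and they ignore the extended crashkernel syntax.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define CRASH_ALIGN_2M 0x200000ULL	/* 2 MiB */

/* Parse a number with an optional K/M/G suffix (memparse()-like). */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	if (*end == s)
		return 0;	/* no digits at all */

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Parse "Y[@X]". Returns false on malformed input or when an explicit
 * base X is not 2 MiB aligned. *base is set to 0 when X is omitted.
 */
static bool parse_crashkernel_simple(const char *arg, uint64_t *size,
				     uint64_t *base)
{
	char *end;

	*size = parse_size(arg, &end);
	if (end == arg || *size == 0)
		return false;

	*base = 0;
	if (*end == '@') {
		const char *x = end + 1;

		*base = parse_size(x, &end);
		if (end == x)
			return false;
		if (*base & (CRASH_ALIGN_2M - 1))
			return false;	/* X must be 2 MiB aligned */
	}
	return *end == '\0';
}
```

With this check, "256M@0x9f0000000" is accepted, while "256M@0x9f0001000" is rejected because its low 21 bits are not zero.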
James Morse
2016-09-16 16:08:28 UTC
Hi Akashi,
Post by AKASHI Takahiro
This patch adds arch specific descriptions about kdump usage on arm64
to kdump.txt.
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
@@ -249,6 +249,13 @@ Dump-capture kernel config options (Arch Dependent, arm)
AUTO_ZRELADDR=y
+Dump-capture kernel config options (Arch Dependent, arm64)
+----------------------------------------------------------
+
+- Please note that kvm of the dump-capture kernel will not be enabled
+ on non-VHE systems even if it is configured. This is because the CPU
+ cannot be reset to EL2 on panic.
Nit:
cannot be -> will not be

We could try to do this, but it's more code that could prevent us reaching the
kdump kernel, so we choose not to.
Post by AKASHI Takahiro
"1 maxcpus=1 reset_devices"
+ "1 maxcpus=1 reset_devices"
+
'maxcpus=1' is a bit fragile. Since 44dbcc93ab67145 ("arm64: Fix behavior of
maxcpus=N") udev on ubuntu vivid (running on Juno) has taken it upon itself to
bring the secondary cores online, even when booted with 'maxcpus=1'.

Can we change the recommendation to "1 nosmp reset_devices"?


Thanks,

James
AKASHI Takahiro
2016-09-20 08:27:37 UTC
Post by James Morse
Hi Akashi,
Post by AKASHI Takahiro
This patch adds arch specific descriptions about kdump usage on arm64
to kdump.txt.
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
@@ -249,6 +249,13 @@ Dump-capture kernel config options (Arch Dependent, arm)
AUTO_ZRELADDR=y
+Dump-capture kernel config options (Arch Dependent, arm64)
+----------------------------------------------------------
+
+- Please note that kvm of the dump-capture kernel will not be enabled
+ on non-VHE systems even if it is configured. This is because the CPU
+ cannot be reset to EL2 on panic.
cannot be -> will not be
OK.
Post by James Morse
We could try to do this, but it's more code that could prevent us reaching the
kdump kernel, so we choose not to.
Post by AKASHI Takahiro
"1 maxcpus=1 reset_devices"
+ "1 maxcpus=1 reset_devices"
+
'maxcpus=1' is a bit fragile. Since 44dbcc93ab67145 ("arm64: Fix behavior of
maxcpus=N") udev on ubuntu vivid (running on Juno) has taken it upon itself to
bring the secondary cores online, even when booted with 'maxcpus=1'.
Can we change the recommendation to "1 nosmp reset_devices"?
Well, I have no strong opinion here, but I'm not quite sure whether
this change makes any difference in practice.
Looking at kernel/smp.c, the only difference is setup_max_cpus. But
given that arch_disable_smp_support() is null and smp_cpus_done()
ignores "max_cpus" on arm64, I don't think that the change is very
meaningful.

I might miss something.

Thanks,
-Takahiro AKASHI
Post by James Morse
Thanks,
James
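Akashi's point about setup_max_cpus can be illustrated with a simplified model of the boot-time bring-up loop in smp_init() (kernel/smp.c), which walks the present CPUs and stops once num_online_cpus() reaches setup_max_cpus. The function below is a hypothetical userspace model, not kernel code; it deliberately leaves out the arch hooks (arch_disable_smp_support(), smp_cpus_done()) discussed above and anything user space may do after boot.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model of the smp_init() bring-up loop. CPU 0 is the boot
 * CPU and is already online; other present CPUs are brought up until
 * the online count reaches setup_max_cpus. Returns the number of CPUs
 * online when the loop finishes.
 */
static unsigned int model_smp_init(unsigned int nr_present,
				   unsigned int setup_max_cpus)
{
	bool online[64] = { [0] = true };
	unsigned int nr_online = 1;

	assert(nr_present >= 1 && nr_present <= 64);

	for (unsigned int cpu = 0; cpu < nr_present; cpu++) {
		if (nr_online >= setup_max_cpus)
			break;
		if (!online[cpu]) {
			online[cpu] = true;	/* models cpu_up(cpu) */
			nr_online++;
		}
	}
	return nr_online;
}
```

In this model, maxcpus=1 (setup_max_cpus = 1) and nosmp (setup_max_cpus = 0) both leave only the boot CPU online at boot, which is consistent with the observation that this loop alone does not distinguish the two options.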
Matthias Brugger
2016-09-26 17:21:23 UTC
Post by James Morse
Hi Akashi,
Post by AKASHI Takahiro
This patch adds arch specific descriptions about kdump usage on arm64
to kdump.txt.
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
@@ -249,6 +249,13 @@ Dump-capture kernel config options (Arch Dependent, arm)
AUTO_ZRELADDR=y
+Dump-capture kernel config options (Arch Dependent, arm64)
+----------------------------------------------------------
+
+- Please note that kvm of the dump-capture kernel will not be enabled
+ on non-VHE systems even if it is configured. This is because the CPU
+ cannot be reset to EL2 on panic.
cannot be -> will not be
We could try to do this, but it's more code that could prevent us reaching the
kdump kernel, so we choose not to.
Post by AKASHI Takahiro
"1 maxcpus=1 reset_devices"
+ "1 maxcpus=1 reset_devices"
+
'maxcpus=1' is a bit fragile. Since 44dbcc93ab67145 ("arm64: Fix behavior of
maxcpus=N") udev on ubuntu vivid (running on Juno) has taken it upon itself to
bring the secondary cores online, even when booted with 'maxcpus=1'.
This looks pretty much like a bug to me and should get fixed on their side.
Post by James Morse
Can we change the recommendation to "1 nosmp reset_devices"?
Thanks,
James
_______________________________________________
linux-arm-kernel mailing list
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
AKASHI Takahiro
2016-09-07 04:29:07 UTC
Signed-off-by: AKASHI Takahiro <***@linaro.org>
---
arch/arm64/configs/defconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index eadf485..e181132 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -77,6 +77,7 @@ CONFIG_CMA=y
CONFIG_SECCOMP=y
CONFIG_XEN=y
CONFIG_KEXEC=y
+CONFIG_CRASH_DUMP=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
CONFIG_COMPAT=y
CONFIG_CPU_IDLE=y
--
2.9.0
AKASHI Takahiro
2016-09-07 04:29:06 UTC
For the current crash utility, we need to know, at least,
- kimage_voffset
- PHYS_OFFSET
to handle the contents of the core dump file (/proc/vmcore) correctly due to
the introduction of KASLR (CONFIG_RANDOMIZE_BASE) in v4.6.
This patch adds them to the file as VMCOREINFO entries.

- VA_BITS
is also added for the makedumpfile command.
More VMCOREINFO entries may be added later.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
---
arch/arm64/kernel/machine_kexec.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 8ac9dba8..38b4411 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -17,6 +17,7 @@

#include <asm/cacheflush.h>
#include <asm/cpu_ops.h>
+#include <asm/memory.h>
#include <asm/mmu_context.h>

#include "cpu-reset.h"
@@ -260,3 +261,13 @@ void machine_crash_shutdown(struct pt_regs *regs)

pr_info("Starting crashdump kernel...\n");
}
+
+void arch_crash_save_vmcoreinfo(void)
+{
+ VMCOREINFO_NUMBER(VA_BITS);
+ /* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */
+ vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
+ kimage_voffset);
+ vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
+ PHYS_OFFSET);
+}
--
2.9.0
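A user-space tool can read these entries back and use them to translate kernel virtual addresses. The sketch below assumes the arm64 layout of this era, where addresses with bit (VA_BITS - 1) set belong to the linear map (PAGE_OFFSET = ~0 << (VA_BITS - 1)) and are offset by PHYS_OFFSET, while kernel-image addresses are translated by subtracting kimage_voffset; it is modelled on the kernel's __virt_to_phys() for that layout. `vmcoreinfo_number()` and `arm64_vtop()` are hypothetical helpers, not crash or makedumpfile code. strtoull() with base 0 accepts both the decimal VA_BITS value and the 0x-prefixed hex values written above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look up "NUMBER(<key>)=<value>" in a VMCOREINFO text blob. */
static int vmcoreinfo_number(const char *info, const char *key,
			     uint64_t *val)
{
	char pattern[64];
	const char *p;

	snprintf(pattern, sizeof(pattern), "NUMBER(%s)=", key);
	p = strstr(info, pattern);
	if (!p)
		return -1;
	/* base 0: decimal for VA_BITS, hex for the 0x-prefixed values */
	*val = strtoull(p + strlen(pattern), NULL, 0);
	return 0;
}

/* Translate a kernel VA to a PA, assuming the layout described above. */
static uint64_t arm64_vtop(uint64_t va, unsigned int va_bits,
			   uint64_t phys_offset, uint64_t kimage_voffset)
{
	uint64_t page_offset;

	assert(va_bits > 0 && va_bits < 64);
	page_offset = ~0ULL << (va_bits - 1);

	if (va & (1ULL << (va_bits - 1)))	/* linear map */
		return (va & ~page_offset) + phys_offset;
	return va - kimage_voffset;		/* kernel image */
}
```

The values one would feed in (for example VA_BITS=48 and PHYS_OFFSET=0x80000000) are illustrative only, not taken from a real dump.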
James Morse
2016-09-16 16:04:45 UTC
Permalink
Post by AKASHI Takahiro
For the current crash utility, we need to know, at least,
- kimage_voffset
- PHYS_OFFSET
to handle the contents of core dump file (/proc/vmcore) correctly due to
the introduction of KASLR (CONFIG_RANDOMIZE_BASE) in v4.6.
This patch puts them as VMCOREINFO's into the file.
- VA_BITS
is also added for makedumpfile command.
More VMCOREINFO's may be added later.
Reviewed-by: James Morse <***@arm.com>


Thanks,

James
AKASHI Takahiro
2016-09-07 04:29:05 UTC
On the crash dump kernel, all the information about the primary kernel's
system memory (core image) is available in the ELF core header.
The crash dump kernel sets aside this header with reserve_elfcorehdr()
at boot time; its location is passed in via a new device-tree property,
"linux,elfcorehdr".

Please note that all other architectures use the traditional "elfcorehdr="
kernel parameter for this purpose.

Then the crash dump kernel accesses the primary kernel's memory with
copy_oldmem_page(), which reads one page at a time by memremap'ing it,
since that memory does not reside in the linear mapping on the crash
dump kernel.

We also need our own elfcorehdr_read() here since the header is placed
within the crash dump kernel's usable memory.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
---
arch/arm64/Kconfig | 11 +++++++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/crash_dump.c | 71 ++++++++++++++++++++++++++++++++++++++++++
arch/arm64/mm/init.c | 54 ++++++++++++++++++++++++++++++++
4 files changed, 137 insertions(+)
create mode 100644 arch/arm64/kernel/crash_dump.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index bc3f00f..9c15c66 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -683,6 +683,17 @@ config KEXEC
but it is independent of the system firmware. And like a reboot
you can start any kernel with it, not just Linux.

+config CRASH_DUMP
+ bool "Build kdump crash kernel"
+ help
+ Generate crash dump after being started by kexec. This should
+ normally only be set in special crash dump kernels which are
+ loaded in the main kernel with kexec-tools into a specially
+ reserved region and then later executed after a crash by
+ kdump/kexec.
+
+ For more details see Documentation/kdump/kdump.txt
+
config XEN_DOM0
def_bool y
depends on XEN
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 14f7b65..f1cbfc8 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -48,6 +48,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
arm64-obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
arm64-obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o \
cpu-reset.o
+arm64-obj-$(CONFIG_CRASH_DUMP) += crash_dump.o

obj-y += $(arm64-obj-y) vdso/ probes/
obj-m += $(arm64-obj-m)
diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
new file mode 100644
index 0000000..bc5b932
--- /dev/null
+++ b/arch/arm64/kernel/crash_dump.c
@@ -0,0 +1,71 @@
+/*
+ * Routines for doing kexec-based kdump
+ *
+ * Copyright (C) 2014 Linaro Limited
+ * Author: AKASHI Takahiro <***@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/crash_dump.h>
+#include <linux/errno.h>
+#include <linux/io.h>
+#include <linux/memblock.h>
+#include <linux/uaccess.h>
+#include <asm/memory.h>
+
+/**
+ * copy_oldmem_page() - copy one page from old kernel memory
+ * @pfn: page frame number to be copied
+ * @buf: buffer where the copied page is placed
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page
+ * @userbuf: if set, @buf is in a user address space
+ *
+ * This function copies one page from old kernel memory into buffer pointed by
+ * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
+ * copied or negative error in case of failure.
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+ size_t csize, unsigned long offset,
+ int userbuf)
+{
+ void *vaddr;
+
+ if (!csize)
+ return 0;
+
+ vaddr = memremap(__pfn_to_phys(pfn), PAGE_SIZE, MEMREMAP_WB);
+ if (!vaddr)
+ return -ENOMEM;
+
+ if (userbuf) {
+ if (copy_to_user(buf, vaddr + offset, csize)) {
+ memunmap(vaddr);
+ return -EFAULT;
+ }
+ } else {
+ memcpy(buf, vaddr + offset, csize);
+ }
+
+ memunmap(vaddr);
+
+ return csize;
+}
+
+/**
+ * elfcorehdr_read - read from ELF core header
+ * @buf: buffer where the data is placed
+ * @count: number of bytes to read
+ * @ppos: physical address of the data to read
+ *
+ * This function reads @count bytes from the ELF core header, which resides
+ * in the crash dump kernel's memory.
+ */
+ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+ memcpy(buf, phys_to_virt((phys_addr_t)*ppos), count);
+ return count;
+}
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index dd273ec..e4d9c38 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -36,6 +36,7 @@
#include <linux/efi.h>
#include <linux/swiotlb.h>
#include <linux/kexec.h>
+#include <linux/crash_dump.h>

#include <asm/boot.h>
#include <asm/fixmap.h>
@@ -186,6 +187,57 @@ static void __init reserve_crashkernel(void)
}
#endif /* CONFIG_KEXEC_CORE */

+#ifdef CONFIG_CRASH_DUMP
+static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
+ const char *uname, int depth, void *data)
+{
+ const __be32 *reg;
+ int len;
+
+ if (depth != 1 || strcmp(uname, "chosen") != 0)
+ return 0;
+
+ reg = of_get_flat_dt_prop(node, "linux,elfcorehdr", &len);
+ if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells) * (int)sizeof(__be32)))
+ return 1;
+
+ elfcorehdr_addr = dt_mem_next_cell(dt_root_addr_cells, &reg);
+ elfcorehdr_size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+ return 1;
+}
+
+/*
+ * reserve_elfcorehdr() - reserves memory for elf core header
+ *
+ * This function reserves the elf core header whose location is given by
+ * the "linux,elfcorehdr" property in the /chosen node of the device tree.
+ * This region contains all the information about the primary kernel's core
+ * image and is used by a dump capture kernel to access the system memory
+ * of the primary kernel.
+ */
+static void __init reserve_elfcorehdr(void)
+{
+ of_scan_flat_dt(early_init_dt_scan_elfcorehdr, NULL);
+
+ if (!elfcorehdr_size)
+ return;
+
+ if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
+ pr_warn("elfcorehdr is overlapped\n");
+ return;
+ }
+
+ memblock_reserve(elfcorehdr_addr, elfcorehdr_size);
+
+ pr_info("Reserving %lldKB of memory at 0x%llx for elfcorehdr\n",
+ elfcorehdr_size >> 10, elfcorehdr_addr);
+}
+#else
+static void __init reserve_elfcorehdr(void)
+{
+ ;
+}
+#endif /* CONFIG_CRASH_DUMP */
/*
* Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
* currently assumes that for memory starting above 4G, 32-bit devices will
@@ -409,6 +461,8 @@ void __init arm64_memblock_init(void)

reserve_crashkernel();

+ reserve_elfcorehdr();
+
dma_contiguous_reserve(arm64_dma_phys_limit);

memblock_allow_resize();
--
2.9.0
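The ELF core header located via "linux,elfcorehdr" is an ordinary ELF64 header followed by program headers, with PT_LOAD entries describing the panicked kernel's memory. A minimal sketch of walking such a header from user space, assuming glibc's <elf.h> definitions; `count_pt_load()` is a hypothetical helper, and it only validates what it needs (magic, class, program header size and bounds).

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Count PT_LOAD program headers in an ELF64 core header held in buf and
 * sum their p_memsz into *total_bytes. Returns -1 if the buffer is not
 * a plausible ELF64 header or the program headers run past its end.
 */
static int count_pt_load(const unsigned char *buf, size_t len,
			 uint64_t *total_bytes)
{
	Elf64_Ehdr eh;
	int n = 0;

	if (len < sizeof(eh))
		return -1;
	memcpy(&eh, buf, sizeof(eh));
	if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh.e_ident[EI_CLASS] != ELFCLASS64 ||
	    eh.e_phentsize != sizeof(Elf64_Phdr))
		return -1;
	if (eh.e_phoff > len ||
	    (uint64_t)eh.e_phnum * sizeof(Elf64_Phdr) > len - eh.e_phoff)
		return -1;

	*total_bytes = 0;
	for (unsigned int i = 0; i < eh.e_phnum; i++) {
		Elf64_Phdr ph;

		memcpy(&ph, buf + eh.e_phoff + i * sizeof(ph), sizeof(ph));
		if (ph.p_type == PT_LOAD) {
			n++;
			*total_bytes += ph.p_memsz;
		}
	}
	return n;
}
```

A real header also carries PT_NOTE entries (for the per-cpu register notes), which this helper simply skips.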
James Morse
2016-09-16 14:50:24 UTC
Post by AKASHI Takahiro
On crash dump kernel, all the information about primary kernel's system
memory (core image) is available in elf core header.
The primary kernel will set aside this header with reserve_elfcorehdr()
at boot time and inform crash dump kernel of its location via a new
device-tree property, "linux,elfcorehdr".
Please note that all other architectures use traditional "elfcorehdr="
kernel parameter for this purpose.
Then crash dump kernel will access the primary kernel's memory with
copy_oldmem_page(), which reads one page by ioremap'ing it since it does
not reside in linear mapping on crash dump kernel.
We also need our own elfcorehdr_read() here since the header is placed
within crash dump kernel's usable memory.
One nit below, looks good.

Reviewed-by: James Morse <***@arm.com>


Thanks,

James
Post by AKASHI Takahiro
diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
+/**
+ * copy_oldmem_page() - copy one page from old kernel memory
+ *
+ * This function copies one page from old kernel memory into buffer pointed by
+ * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
+ * copied or negative error in case of failure.
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+ size_t csize, unsigned long offset,
+ int userbuf)
+{
+ void *vaddr;
+
+ if (!csize)
+ return 0;
+
+ vaddr = memremap(__pfn_to_phys(pfn), PAGE_SIZE, MEMREMAP_WB);
+ if (!vaddr)
+ return -ENOMEM;
+
+ if (userbuf) {
+ if (copy_to_user(buf, vaddr + offset, csize)) {
../arch/arm64/kernel/crash_dump.c:45:34: warning: incorrect type in argument 1 (different address spaces)
../arch/arm64/kernel/crash_dump.c:45:34: expected void [noderef] <asn:1>*to
../arch/arm64/kernel/crash_dump.c:45:34: got char *buf
AKASHI Takahiro
2016-09-20 07:46:23 UTC
Post by James Morse
Post by AKASHI Takahiro
On crash dump kernel, all the information about primary kernel's system
memory (core image) is available in elf core header.
The primary kernel will set aside this header with reserve_elfcorehdr()
at boot time and inform crash dump kernel of its location via a new
device-tree property, "linux,elfcorehdr".
Please note that all other architectures use traditional "elfcorehdr="
kernel parameter for this purpose.
Then crash dump kernel will access the primary kernel's memory with
copy_oldmem_page(), which reads one page by ioremap'ing it since it does
not reside in linear mapping on crash dump kernel.
We also need our own elfcorehdr_read() here since the header is placed
within crash dump kernel's usable memory.
One nit below, looks good.
Fixed.

Thanks,
-Takahiro AKASHI
Post by James Morse
Thanks,
James
Post by AKASHI Takahiro
diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
+/**
+ * copy_oldmem_page() - copy one page from old kernel memory
+ *
+ * This function copies one page from old kernel memory into buffer pointed by
+ * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
+ * copied or negative error in case of failure.
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+ size_t csize, unsigned long offset,
+ int userbuf)
+{
+ void *vaddr;
+
+ if (!csize)
+ return 0;
+
+ vaddr = memremap(__pfn_to_phys(pfn), PAGE_SIZE, MEMREMAP_WB);
+ if (!vaddr)
+ return -ENOMEM;
+
+ if (userbuf) {
+ if (copy_to_user(buf, vaddr + offset, csize)) {
../arch/arm64/kernel/crash_dump.c:45:34: warning: incorrect type in argument 1 (different address spaces)
../arch/arm64/kernel/crash_dump.c:45:34: expected void [noderef] <asn:1>*to
../arch/arm64/kernel/crash_dump.c:45:34: got char *buf
Matthias Brugger
2016-09-22 15:50:57 UTC
Post by AKASHI Takahiro
On crash dump kernel, all the information about primary kernel's system
memory (core image) is available in elf core header.
The primary kernel will set aside this header with reserve_elfcorehdr()
at boot time and inform crash dump kernel of its location via a new
device-tree property, "linux,elfcorehdr".
Please note that all other architectures use traditional "elfcorehdr="
kernel parameter for this purpose.
Then crash dump kernel will access the primary kernel's memory with
copy_oldmem_page(), which reads one page by ioremap'ing it since it does
not reside in linear mapping on crash dump kernel.
We also need our own elfcorehdr_read() here since the header is placed
within crash dump kernel's usable memory.
---
arch/arm64/Kconfig | 11 +++++++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/crash_dump.c | 71 ++++++++++++++++++++++++++++++++++++++++++
arch/arm64/mm/init.c | 54 ++++++++++++++++++++++++++++++++
4 files changed, 137 insertions(+)
create mode 100644 arch/arm64/kernel/crash_dump.c
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index bc3f00f..9c15c66 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -683,6 +683,17 @@ config KEXEC
but it is independent of the system firmware. And like a reboot
you can start any kernel with it, not just Linux.
+config CRASH_DUMP
+ bool "Build kdump crash kernel"
+ help
+ Generate crash dump after being started by kexec. This should
+ be normally only set in special crash dump kernels which are
+ loaded in the main kernel with kexec-tools into a specially
+ reserved region and then later executed after a crash by
+ kdump/kexec.
+
+ For more details see Documentation/kdump/kdump.txt
+
config XEN_DOM0
def_bool y
depends on XEN
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 14f7b65..f1cbfc8 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -48,6 +48,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
arm64-obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
arm64-obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o \
cpu-reset.o
+arm64-obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
obj-y += $(arm64-obj-y) vdso/ probes/
obj-m += $(arm64-obj-m)
diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
new file mode 100644
index 0000000..bc5b932
--- /dev/null
+++ b/arch/arm64/kernel/crash_dump.c
@@ -0,0 +1,71 @@
+/*
+ * Routines for doing kexec-based kdump
+ *
+ * Copyright (C) 2014 Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/crash_dump.h>
+#include <linux/errno.h>
+#include <linux/io.h>
+#include <linux/memblock.h>
+#include <linux/uaccess.h>
+#include <asm/memory.h>
+
+/**
+ * copy_oldmem_page() - copy one page from old kernel memory
+ *
+ * This function copies one page from old kernel memory into buffer pointed by
+ * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
+ * copied or negative error in case of failure.
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+ size_t csize, unsigned long offset,
+ int userbuf)
+{
+ void *vaddr;
+
+ if (!csize)
+ return 0;
+
+ vaddr = memremap(__pfn_to_phys(pfn), PAGE_SIZE, MEMREMAP_WB);
+ if (!vaddr)
+ return -ENOMEM;
+
+ if (userbuf) {
+ if (copy_to_user(buf, vaddr + offset, csize)) {
+ memunmap(vaddr);
+ return -EFAULT;
+ }
+ } else {
+ memcpy(buf, vaddr + offset, csize);
+ }
+
+ memunmap(vaddr);
+
+ return csize;
+}
+
+/**
+ * elfcorehdr_read - read from ELF core header
+ *
+ * This function reads @count bytes from elf core header which exists
+ * on crash dump kernel's memory.
+ */
+ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+ memcpy(buf, phys_to_virt((phys_addr_t)*ppos), count);
+ return count;
+}
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index dd273ec..e4d9c38 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -36,6 +36,7 @@
#include <linux/efi.h>
#include <linux/swiotlb.h>
#include <linux/kexec.h>
+#include <linux/crash_dump.h>
#include <asm/boot.h>
#include <asm/fixmap.h>
@@ -186,6 +187,57 @@ static void __init reserve_crashkernel(void)
}
#endif /* CONFIG_KEXEC_CORE */
+#ifdef CONFIG_CRASH_DUMP
+static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
+ const char *uname, int depth, void *data)
+{
+ const __be32 *reg;
+ int len;
+
+ if (depth != 1 || strcmp(uname, "chosen") != 0)
+ return 0;
+
+ reg = of_get_flat_dt_prop(node, "linux,elfcorehdr", &len);
+ if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+ return 1;
+
+ elfcorehdr_addr = dt_mem_next_cell(dt_root_addr_cells, &reg);
+ elfcorehdr_size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+ return 1;
+}
+
+/*
+ * reserve_elfcorehdr() - reserves memory for elf core header
+ *
+ * This function reserves elf core header given in "elfcorehdr=" kernel
+ * command line parameter. This region contains all the information about
+ * primary kernel's core image and is used by a dump capture kernel to
+ * access the system memory on primary kernel.
+ */
+static void __init reserve_elfcorehdr(void)
+{
+ of_scan_flat_dt(early_init_dt_scan_elfcorehdr, NULL);
+
Do I understand correctly that we can pass the crashkernel address/size
through the kernel boot parameter, but elfcorehdr can only be provided via
the device tree?
Why? If there is a reason for doing so, we should fix the documentation.

Regards,
Matthias
AKASHI Takahiro
2016-09-07 04:29:04 UTC
The primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
and save the registers' status in per-cpu ELF notes before starting the crash
dump kernel. See crash_kexec().
Even if not all secondary cpus have shut down, we do kdump anyway.

As we don't have to mark non-boot (crashed) cpus offline (to preserve the
correct status of cpus at crash dump) before shutting down, this patch
also adds a variant of smp_send_stop().

Signed-off-by: AKASHI Takahiro <***@linaro.org>
---
arch/arm64/include/asm/hardirq.h | 2 +-
arch/arm64/include/asm/kexec.h | 41 ++++++++++++++++++++++++-
arch/arm64/include/asm/smp.h | 2 ++
arch/arm64/kernel/machine_kexec.c | 56 ++++++++++++++++++++++++++++++++--
arch/arm64/kernel/smp.c | 63 +++++++++++++++++++++++++++++++++++++++
5 files changed, 159 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
index 8740297..1473fc2 100644
--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -20,7 +20,7 @@
#include <linux/threads.h>
#include <asm/irq.h>

-#define NR_IPI 6
+#define NR_IPI 7

typedef struct {
unsigned int __softirq_pending;
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 04744dc..a908958 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -40,7 +40,46 @@
static inline void crash_setup_regs(struct pt_regs *newregs,
struct pt_regs *oldregs)
{
- /* Empty routine needed to avoid build errors. */
+ if (oldregs) {
+ memcpy(newregs, oldregs, sizeof(*newregs));
+ } else {
+ u64 tmp1, tmp2;
+
+ __asm__ __volatile__ (
+ "stp x0, x1, [%2, #16 * 0]\n"
+ "stp x2, x3, [%2, #16 * 1]\n"
+ "stp x4, x5, [%2, #16 * 2]\n"
+ "stp x6, x7, [%2, #16 * 3]\n"
+ "stp x8, x9, [%2, #16 * 4]\n"
+ "stp x10, x11, [%2, #16 * 5]\n"
+ "stp x12, x13, [%2, #16 * 6]\n"
+ "stp x14, x15, [%2, #16 * 7]\n"
+ "stp x16, x17, [%2, #16 * 8]\n"
+ "stp x18, x19, [%2, #16 * 9]\n"
+ "stp x20, x21, [%2, #16 * 10]\n"
+ "stp x22, x23, [%2, #16 * 11]\n"
+ "stp x24, x25, [%2, #16 * 12]\n"
+ "stp x26, x27, [%2, #16 * 13]\n"
+ "stp x28, x29, [%2, #16 * 14]\n"
+ "mov %0, sp\n"
+ "stp x30, %0, [%2, #16 * 15]\n"
+
+ "/* faked current PSTATE */\n"
+ "mrs %0, CurrentEL\n"
+ "mrs %1, DAIF\n"
+ "orr %0, %0, %1\n"
+ "mrs %1, NZCV\n"
+ "orr %0, %0, %1\n"
+
+ /* pc */
+ "adr %1, 1f\n"
+ "1:\n"
+ "stp %1, %0, [%2, #16 * 16]\n"
+ : "=r" (tmp1), "=r" (tmp2), "+r" (newregs)
+ :
+ : "memory"
+ );
+ }
}

#endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 0226447..6b0f2c7 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -136,6 +136,8 @@ static inline void cpu_panic_kernel(void)
*/
bool cpus_are_stuck_in_kernel(void);

+extern void smp_send_crash_stop(void);
+
#endif /* ifndef __ASSEMBLY__ */

#endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index bc96c8a..8ac9dba8 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -9,6 +9,9 @@
* published by the Free Software Foundation.
*/

+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
#include <linux/kexec.h>
#include <linux/smp.h>

@@ -22,6 +25,7 @@
extern const unsigned char arm64_relocate_new_kernel[];
extern const unsigned long arm64_relocate_new_kernel_size;

+bool in_crash_kexec;
static unsigned long kimage_start;

/**
@@ -148,7 +152,8 @@ void machine_kexec(struct kimage *kimage)
/*
* New cpus may have become stuck_in_kernel after we loaded the image.
*/
- BUG_ON(cpus_are_stuck_in_kernel() || (num_online_cpus() > 1));
+ BUG_ON((cpus_are_stuck_in_kernel() || (num_online_cpus() > 1)) &&
+ !WARN_ON(in_crash_kexec));

reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
@@ -200,13 +205,58 @@ void machine_kexec(struct kimage *kimage)
* relocation is complete.
*/

- cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
+ cpu_soft_restart(!in_crash_kexec, reboot_code_buffer_phys, kimage->head,
kimage_start, 0);

BUG(); /* Should never get here. */
}

+static void machine_kexec_mask_interrupts(void)
+{
+ unsigned int i;
+ struct irq_desc *desc;
+
+ for_each_irq_desc(i, desc) {
+ struct irq_chip *chip;
+ int ret;
+
+ chip = irq_desc_get_chip(desc);
+ if (!chip)
+ continue;
+
+ /*
+ * First try to remove the active state. If this
+ * fails, try to EOI the interrupt.
+ */
+ ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
+
+ if (ret && irqd_irq_inprogress(&desc->irq_data) &&
+ chip->irq_eoi)
+ chip->irq_eoi(&desc->irq_data);
+
+ if (chip->irq_mask)
+ chip->irq_mask(&desc->irq_data);
+
+ if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
+ chip->irq_disable(&desc->irq_data);
+ }
+}
+
+/**
+ * machine_crash_shutdown - shutdown non-crashing cpus and save registers
+ */
void machine_crash_shutdown(struct pt_regs *regs)
{
- /* Empty routine needed to avoid build errors. */
+ local_irq_disable();
+
+ in_crash_kexec = true;
+
+ /* shutdown non-crashing cpus */
+ smp_send_crash_stop();
+
+ /* for crashing cpu */
+ crash_save_cpu(regs, smp_processor_id());
+ machine_kexec_mask_interrupts();
+
+ pr_info("Starting crashdump kernel...\n");
}
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d93d433..b401b25 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -37,6 +37,7 @@
#include <linux/completion.h>
#include <linux/of.h>
#include <linux/irq_work.h>
+#include <linux/kexec.h>

#include <asm/alternative.h>
#include <asm/atomic.h>
@@ -71,6 +72,7 @@ enum ipi_msg_type {
IPI_RESCHEDULE,
IPI_CALL_FUNC,
IPI_CPU_STOP,
+ IPI_CPU_CRASH_STOP,
IPI_TIMER,
IPI_IRQ_WORK,
IPI_WAKEUP
@@ -734,6 +736,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
S(IPI_RESCHEDULE, "Rescheduling interrupts"),
S(IPI_CALL_FUNC, "Function call interrupts"),
S(IPI_CPU_STOP, "CPU stop interrupts"),
+ S(IPI_CPU_CRASH_STOP, "CPU stop (for crash dump) interrupts"),
S(IPI_TIMER, "Timer broadcast interrupts"),
S(IPI_IRQ_WORK, "IRQ work interrupts"),
S(IPI_WAKEUP, "CPU wake-up interrupts"),
@@ -808,6 +811,29 @@ static void ipi_cpu_stop(unsigned int cpu)
cpu_relax();
}

+#ifdef CONFIG_KEXEC_CORE
+static atomic_t waiting_for_crash_ipi;
+#endif
+
+static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+{
+#ifdef CONFIG_KEXEC_CORE
+ crash_save_cpu(regs, cpu);
+
+ atomic_dec(&waiting_for_crash_ipi);
+
+ local_irq_disable();
+
+#ifdef CONFIG_HOTPLUG_CPU
+ if (cpu_ops[cpu]->cpu_die)
+ cpu_ops[cpu]->cpu_die(cpu);
+#endif
+
+ /* just in case */
+ cpu_park_loop();
+#endif
+}
+
/*
* Main handler for inter-processor interrupts
*/
@@ -838,6 +864,15 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
irq_exit();
break;

+ case IPI_CPU_CRASH_STOP:
+ if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
+ irq_enter();
+ ipi_cpu_crash_stop(cpu, regs);
+
+ unreachable();
+ }
+ break;
+
#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
case IPI_TIMER:
irq_enter();
@@ -910,6 +945,34 @@ void smp_send_stop(void)
cpumask_pr_args(cpu_online_mask));
}

+#ifdef CONFIG_KEXEC_CORE
+void smp_send_crash_stop(void)
+{
+ cpumask_t mask;
+ unsigned long timeout;
+
+ if (num_online_cpus() == 1)
+ return;
+
+ cpumask_copy(&mask, cpu_online_mask);
+ cpumask_clear_cpu(smp_processor_id(), &mask);
+
+ atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+
+ pr_crit("SMP: stopping secondary CPUs\n");
+ smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
+
+ /* Wait up to one second for other CPUs to stop */
+ timeout = USEC_PER_SEC;
+ while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
+ udelay(1);
+
+ if (atomic_read(&waiting_for_crash_ipi) > 0)
+ pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
+ cpumask_pr_args(cpu_online_mask));
+}
+#endif
+
/*
* not supported here
*/
--
2.9.0
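For readers following the crash_setup_regs() asm above: it stores x0..x29 in pairs at offsets 16*0 through 16*14, then x30 and sp at 16*15, then pc and the faked PSTATE at 16*16. The sketch below checks those offsets against a hypothetical mirror (`struct sketch_user_pt_regs`) of the register block at the start of arm64 `struct pt_regs`; the mirror is an assumption for illustration, not the kernel header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the leading register block of arm64 pt_regs. */
struct sketch_user_pt_regs {
	uint64_t regs[31];	/* x0..x30 */
	uint64_t sp;
	uint64_t pc;
	uint64_t pstate;
};

/* Byte offset of the pair written by "stp a, b, [%2, #16 * n]". */
#define STP_OFFSET(n) (16 * (n))

_Static_assert(offsetof(struct sketch_user_pt_regs, regs[30]) == STP_OFFSET(15),
	       "x30 is the first half of pair 15");
_Static_assert(offsetof(struct sketch_user_pt_regs, sp) == STP_OFFSET(15) + 8,
	       "sp is the second half of pair 15");
_Static_assert(offsetof(struct sketch_user_pt_regs, pc) == STP_OFFSET(16),
	       "pc is the first half of pair 16");
_Static_assert(offsetof(struct sketch_user_pt_regs, pstate) == STP_OFFSET(16) + 8,
	       "pstate is the second half of pair 16");
```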
James Morse
2016-09-14 18:09:33 UTC
Hi Akashi,

(CC: Marc who knows how this irqchip wizardry works
Cover letter: https://www.spinics.net/lists/arm-kernel/msg529520.html )
Post by AKASHI Takahiro
The primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
and save their register state in per-cpu ELF notes before starting the crash
dump kernel. See kernel_kexec().
Even if not all secondary cpus have shut down, we proceed with kdump anyway.
As we don't have to take non-boot (crashed) cpus offline (to preserve the
correct cpu state for the crash dump) before shutting down, this patch
also adds a variant of smp_send_stop().
---
arch/arm64/include/asm/hardirq.h | 2 +-
arch/arm64/include/asm/kexec.h | 41 ++++++++++++++++++++++++-
arch/arm64/include/asm/smp.h | 2 ++
arch/arm64/kernel/machine_kexec.c | 56 ++++++++++++++++++++++++++++++++--
arch/arm64/kernel/smp.c | 63 +++++++++++++++++++++++++++++++++++++++
5 files changed, 159 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
index 8740297..1473fc2 100644
--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -20,7 +20,7 @@
#include <linux/threads.h>
#include <asm/irq.h>
-#define NR_IPI 6
+#define NR_IPI 7
typedef struct {
unsigned int __softirq_pending;
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 04744dc..a908958 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -40,7 +40,46 @@
static inline void crash_setup_regs(struct pt_regs *newregs,
struct pt_regs *oldregs)
{
- /* Empty routine needed to avoid build errors. */
+ if (oldregs) {
+ memcpy(newregs, oldregs, sizeof(*newregs));
+ } else {
+ u64 tmp1, tmp2;
+
+ __asm__ __volatile__ (
+ "stp x0, x1, [%2, #16 * 0]\n"
+ "stp x2, x3, [%2, #16 * 1]\n"
+ "stp x4, x5, [%2, #16 * 2]\n"
+ "stp x6, x7, [%2, #16 * 3]\n"
+ "stp x8, x9, [%2, #16 * 4]\n"
+ "stp x10, x11, [%2, #16 * 5]\n"
+ "stp x12, x13, [%2, #16 * 6]\n"
+ "stp x14, x15, [%2, #16 * 7]\n"
+ "stp x16, x17, [%2, #16 * 8]\n"
+ "stp x18, x19, [%2, #16 * 9]\n"
+ "stp x20, x21, [%2, #16 * 10]\n"
+ "stp x22, x23, [%2, #16 * 11]\n"
+ "stp x24, x25, [%2, #16 * 12]\n"
+ "stp x26, x27, [%2, #16 * 13]\n"
+ "stp x28, x29, [%2, #16 * 14]\n"
+ "mov %0, sp\n"
+ "stp x30, %0, [%2, #16 * 15]\n"
+
+ "/* faked current PSTATE */\n"
+ "mrs %0, CurrentEL\n"
+ "mrs %1, DAIF\n"
+ "orr %0, %0, %1\n"
+ "mrs %1, NZCV\n"
+ "orr %0, %0, %1\n"
+
What about SPSEL? While we don't use it, it is correctly preserved for
everything except a CPU that calls panic()...
Post by AKASHI Takahiro
+ /* pc */
+ "adr %1, 1f\n"
+ "1:\n"
+ "stp %1, %0, [%2, #16 * 16]\n"
+ : "=r" (tmp1), "=r" (tmp2), "+r" (newregs)
+ :
+ : "memory"
+ );
+ }
}
#endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 0226447..6b0f2c7 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -136,6 +136,8 @@ static inline void cpu_panic_kernel(void)
*/
bool cpus_are_stuck_in_kernel(void);
+extern void smp_send_crash_stop(void);
+
#endif /* ifndef __ASSEMBLY__ */
#endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index bc96c8a..8ac9dba8 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -9,6 +9,9 @@
* published by the Free Software Foundation.
*/
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
#include <linux/kexec.h>
#include <linux/smp.h>
@@ -22,6 +25,7 @@
extern const unsigned char arm64_relocate_new_kernel[];
extern const unsigned long arm64_relocate_new_kernel_size;
+bool in_crash_kexec;
static?
Post by AKASHI Takahiro
static unsigned long kimage_start;
/**
@@ -148,7 +152,8 @@ void machine_kexec(struct kimage *kimage)
/*
* New cpus may have become stuck_in_kernel after we loaded the image.
*/
- BUG_ON(cpus_are_stuck_in_kernel() || (num_online_cpus() > 1));
+ BUG_ON((cpus_are_stuck_in_kernel() || (num_online_cpus() > 1)) &&
+ !WARN_ON(in_crash_kexec));
In the kdump case, num_online_cpus() is unchanged as ipi_cpu_crash_stop()
doesn't update the online cpu masks, so this WARN_ON always fires. This is
confusing as the 'failed to stop secondary CPUs' message doesn't appear, because
those CPUs did stop, and waiting_for_crash_ipi has the expected value...
Post by AKASHI Takahiro
reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
@@ -200,13 +205,58 @@ void machine_kexec(struct kimage *kimage)
* relocation is complete.
*/
- cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
+ cpu_soft_restart(!in_crash_kexec, reboot_code_buffer_phys, kimage->head,
kimage_start, 0);
BUG(); /* Should never get here. */
}
+static void machine_kexec_mask_interrupts(void)
+{
+ unsigned int i;
+ struct irq_desc *desc;
+
+ for_each_irq_desc(i, desc) {
+ struct irq_chip *chip;
+ int ret;
+
+ chip = irq_desc_get_chip(desc);
+ if (!chip)
+ continue;
+
+ /*
+ * First try to remove the active state. If this
+ * fails, try to EOI the interrupt.
+ */
+ ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
+
+ if (ret && irqd_irq_inprogress(&desc->irq_data) &&
+ chip->irq_eoi)
+ chip->irq_eoi(&desc->irq_data);
+
+ if (chip->irq_mask)
+ chip->irq_mask(&desc->irq_data);
+
+ if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
+ chip->irq_disable(&desc->irq_data);
+ }
+}
This function is over my head ... I have no idea how this works; I can only
comment that it's different from the version under arch/arm.

/me adds Marc Z to CC.
Post by AKASHI Takahiro
+/**
+ * machine_crash_shutdown - shutdown non-crashing cpus and save registers
+ */
void machine_crash_shutdown(struct pt_regs *regs)
{
- /* Empty routine needed to avoid build errors. */
+ local_irq_disable();
+
+ in_crash_kexec = true;
+
+ /* shutdown non-crashing cpus */
+ smp_send_crash_stop();
+
+ /* for crashing cpu */
+ crash_save_cpu(regs, smp_processor_id());
+ machine_kexec_mask_interrupts();
+
+ pr_info("Starting crashdump kernel...\n");
}
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d93d433..b401b25 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -37,6 +37,7 @@
#include <linux/completion.h>
#include <linux/of.h>
#include <linux/irq_work.h>
+#include <linux/kexec.h>
#include <asm/alternative.h>
#include <asm/atomic.h>
@@ -71,6 +72,7 @@ enum ipi_msg_type {
IPI_RESCHEDULE,
IPI_CALL_FUNC,
IPI_CPU_STOP,
+ IPI_CPU_CRASH_STOP,
IPI_TIMER,
IPI_IRQ_WORK,
IPI_WAKEUP
@@ -734,6 +736,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
S(IPI_RESCHEDULE, "Rescheduling interrupts"),
S(IPI_CALL_FUNC, "Function call interrupts"),
S(IPI_CPU_STOP, "CPU stop interrupts"),
+ S(IPI_CPU_CRASH_STOP, "CPU stop (for crash dump) interrupts"),
S(IPI_TIMER, "Timer broadcast interrupts"),
S(IPI_IRQ_WORK, "IRQ work interrupts"),
S(IPI_WAKEUP, "CPU wake-up interrupts"),
@@ -808,6 +811,29 @@ static void ipi_cpu_stop(unsigned int cpu)
cpu_relax();
}
+#ifdef CONFIG_KEXEC_CORE
+static atomic_t waiting_for_crash_ipi;
+#endif
+
+static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+{
+#ifdef CONFIG_KEXEC_CORE
+ crash_save_cpu(regs, cpu);
+
+ atomic_dec(&waiting_for_crash_ipi);
+
+ local_irq_disable();
+
+#ifdef CONFIG_HOTPLUG_CPU
+ if (cpu_ops[cpu]->cpu_die)
+ cpu_ops[cpu]->cpu_die(cpu);
+#endif
+
+ /* just in case */
+ cpu_park_loop();
+#endif
+}
+
/*
* Main handler for inter-processor interrupts
*/
@@ -838,6 +864,15 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
irq_exit();
break;
+ case IPI_CPU_CRASH_STOP:
+ if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
+ irq_enter();
+ ipi_cpu_crash_stop(cpu, regs);
+
+ unreachable();
+ }
+ break;
+
#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
case IPI_TIMER:
irq_enter();
@@ -910,6 +945,34 @@ void smp_send_stop(void)
cpumask_pr_args(cpu_online_mask));
}
+#ifdef CONFIG_KEXEC_CORE
+void smp_send_crash_stop(void)
+{
+ cpumask_t mask;
+ unsigned long timeout;
+
+ if (num_online_cpus() == 1)
+ return;
+
+ cpumask_copy(&mask, cpu_online_mask);
+ cpumask_clear_cpu(smp_processor_id(), &mask);
+
+ atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+
+ pr_crit("SMP: stopping secondary CPUs\n");
+ smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
+
+ /* Wait up to one second for other CPUs to stop */
+ timeout = USEC_PER_SEC;
+ while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
+ udelay(1);
+
+ if (atomic_read(&waiting_for_crash_ipi) > 0)
+ pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
+ cpumask_pr_args(cpu_online_mask));
+}
+#endif
This is very similar to smp_send_stop() which also has the timeout. Is it
possible to merge them? You could use in_crash_kexec to choose the IPI type.
Post by AKASHI Takahiro
+
/*
* not supported here
*/
Reviewed-by: James Morse <***@arm.com>


Thanks,

James
Marc Zyngier
2016-09-15 08:13:49 UTC
Hi James,

Thanks for cc-ing me.
Post by James Morse
Hi Akashi,
(CC: Marc who knows how this irqchip wizardry works
Cover letter: https://www.spinics.net/lists/arm-kernel/msg529520.html )
Post by AKASHI Takahiro
The primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
and save their register state in per-cpu ELF notes before starting the crash
dump kernel. See kernel_kexec().
Even if not all secondary cpus have shut down, we proceed with kdump anyway.
As we don't have to take non-boot (crashed) cpus offline (to preserve the
correct cpu state for the crash dump) before shutting down, this patch
also adds a variant of smp_send_stop().
---
arch/arm64/include/asm/hardirq.h | 2 +-
arch/arm64/include/asm/kexec.h | 41 ++++++++++++++++++++++++-
arch/arm64/include/asm/smp.h | 2 ++
arch/arm64/kernel/machine_kexec.c | 56 ++++++++++++++++++++++++++++++++--
arch/arm64/kernel/smp.c | 63 +++++++++++++++++++++++++++++++++++++++
5 files changed, 159 insertions(+), 5 deletions(-)
[...]
Post by James Morse
Post by AKASHI Takahiro
+static void machine_kexec_mask_interrupts(void)
+{
+ unsigned int i;
+ struct irq_desc *desc;
+
+ for_each_irq_desc(i, desc) {
+ struct irq_chip *chip;
+ int ret;
+
+ chip = irq_desc_get_chip(desc);
+ if (!chip)
+ continue;
+
+ /*
+ * First try to remove the active state. If this
+ * fails, try to EOI the interrupt.
+ */
+ ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
+
+ if (ret && irqd_irq_inprogress(&desc->irq_data) &&
+ chip->irq_eoi)
+ chip->irq_eoi(&desc->irq_data);
+
+ if (chip->irq_mask)
+ chip->irq_mask(&desc->irq_data);
+
+ if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
+ chip->irq_disable(&desc->irq_data);
+ }
+}
This function is over my head ... I have no idea how this works; I can only
comment that it's different from the version under arch/arm.
/me adds Marc Z to CC.
I wrote the damn code! ;-)

The main idea is that simply EOIing an interrupt is not good enough if
the interrupt has been offloaded to a VM. It needs to be actively
deactivated for the state machine to be reset.
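The difference between EOIing and deactivating can be pictured with a toy model. This is an illustration under stated assumptions, not the GIC programming interface: it assumes EOI is split into a priority drop and a separate deactivation (the mode in which an interrupt can be left active for a VM to complete), and the `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state of one interrupt whose completion was handed to a VM. */
struct toy_irq {
	bool active;		/* still active in the distributor */
	bool prio_dropped;	/* running priority released on this cpu */
	bool masked;
};

/* EOI in split mode: only drops the running priority. */
static void toy_eoi(struct toy_irq *irq)
{
	irq->prio_dropped = true;
}

/*
 * Explicit deactivation, in the spirit of
 * irq_set_irqchip_state(irq, IRQCHIP_STATE_ACTIVE, false).
 */
static void toy_deactivate(struct toy_irq *irq)
{
	irq->active = false;
}

/* Crash-time cleanup: returns true if the interrupt ends up inactive. */
static bool toy_crash_cleanup(struct toy_irq *irq, bool deactivate)
{
	if (deactivate)
		toy_deactivate(irq);
	else
		toy_eoi(irq);
	irq->masked = true;
	return !irq->active;
}
```

In this model, masking plus EOI leaves the interrupt stuck active; only the explicit deactivation resets its state.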

But realistically, even that is not enough. What we need is a way to
completely shut off the GIC, irrespective of the state of the various
interrupts. A "panic button" of some sort, with no return.

That would probably work for GICv3 (assuming that we don't need to
involve the secure side of things), but anything GICv2 based would be
difficult to deal with (you cannot access the other CPU private
interrupt configuration). Maybe that'd be enough, maybe not. Trying to
boot a crash kernel is like buying a lottery ticket anyway (and with
similar odds...).

I'll have a look.

Thanks,

M.
--
Jazz is not dead. It just smells funny...
AKASHI Takahiro
2016-09-16 03:21:02 UTC
James,

Thank you for your review.
Post by James Morse
Hi Akashi,
(CC: Marc who knows how this irqchip wizardry works
Cover letter: https://www.spinics.net/lists/arm-kernel/msg529520.html )
Post by AKASHI Takahiro
The primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
and save their register state in per-cpu ELF notes before starting the crash
dump kernel. See kernel_kexec().
Even if not all secondary cpus have shut down, we proceed with kdump anyway.
As we don't have to take non-boot (crashed) cpus offline (to preserve the
correct cpu state for the crash dump) before shutting down, this patch
also adds a variant of smp_send_stop().
---
arch/arm64/include/asm/hardirq.h | 2 +-
arch/arm64/include/asm/kexec.h | 41 ++++++++++++++++++++++++-
arch/arm64/include/asm/smp.h | 2 ++
arch/arm64/kernel/machine_kexec.c | 56 ++++++++++++++++++++++++++++++++--
arch/arm64/kernel/smp.c | 63 +++++++++++++++++++++++++++++++++++++++
5 files changed, 159 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
index 8740297..1473fc2 100644
--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -20,7 +20,7 @@
#include <linux/threads.h>
#include <asm/irq.h>
-#define NR_IPI 6
+#define NR_IPI 7
typedef struct {
unsigned int __softirq_pending;
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 04744dc..a908958 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -40,7 +40,46 @@
static inline void crash_setup_regs(struct pt_regs *newregs,
struct pt_regs *oldregs)
{
- /* Empty routine needed to avoid build errors. */
+ if (oldregs) {
+ memcpy(newregs, oldregs, sizeof(*newregs));
+ } else {
+ u64 tmp1, tmp2;
+
+ __asm__ __volatile__ (
+ "stp x0, x1, [%2, #16 * 0]\n"
+ "stp x2, x3, [%2, #16 * 1]\n"
+ "stp x4, x5, [%2, #16 * 2]\n"
+ "stp x6, x7, [%2, #16 * 3]\n"
+ "stp x8, x9, [%2, #16 * 4]\n"
+ "stp x10, x11, [%2, #16 * 5]\n"
+ "stp x12, x13, [%2, #16 * 6]\n"
+ "stp x14, x15, [%2, #16 * 7]\n"
+ "stp x16, x17, [%2, #16 * 8]\n"
+ "stp x18, x19, [%2, #16 * 9]\n"
+ "stp x20, x21, [%2, #16 * 10]\n"
+ "stp x22, x23, [%2, #16 * 11]\n"
+ "stp x24, x25, [%2, #16 * 12]\n"
+ "stp x26, x27, [%2, #16 * 13]\n"
+ "stp x28, x29, [%2, #16 * 14]\n"
+ "mov %0, sp\n"
+ "stp x30, %0, [%2, #16 * 15]\n"
+
+ "/* faked current PSTATE */\n"
+ "mrs %0, CurrentEL\n"
+ "mrs %1, DAIF\n"
+ "orr %0, %0, %1\n"
+ "mrs %1, NZCV\n"
+ "orr %0, %0, %1\n"
+
What about SPSEL? While we don't use it, it is correctly preserved for
everything except a CPU that calls panic()...
My comment above might be confusing, but what I want to fake
here is "spsr", as pt_regs.pstate is normally set from spsr_el1.
So there is no field in spsr corresponding to SPSEL.
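The faked PSTATE works because each MRS read used in the asm returns its field already in the bit position SPSR uses: NZCV in bits 31..28, DAIF in bits 9..6, and CurrentEL's EL in bits 3..2, which lines up with the M[3:2] part of the mode field. A small sketch of that composition follows; the masks are an addition for illustration (the patch ORs the raw values), and `fake_pstate()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions shared by the MRS-visible registers and SPSR. */
#define PSR_NZCV_MASK	0xf0000000ULL	/* N, Z, C, V: bits 31..28 */
#define PSR_DAIF_MASK	0x000003c0ULL	/* D, A, I, F: bits 9..6 */
#define PSR_EL_MASK	0x0000000cULL	/* M[3:2]: exception level */

/*
 * Each input is the value an MRS of CurrentEL, DAIF or NZCV would return;
 * masking keeps any stray bit from leaking into another field.
 */
static uint64_t fake_pstate(uint64_t currentel, uint64_t daif, uint64_t nzcv)
{
	return (currentel & PSR_EL_MASK) | (daif & PSR_DAIF_MASK) |
	       (nzcv & PSR_NZCV_MASK);
}
```

For example, EL1 with all of DAIF masked and only Z set composes to 0x400003c4, whose low nibble 0x4 decodes as EL1t.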
Post by James Morse
Post by AKASHI Takahiro
+ /* pc */
+ "adr %1, 1f\n"
+ "1:\n"
+ "stp %1, %0, [%2, #16 * 16]\n"
+ : "=r" (tmp1), "=r" (tmp2), "+r" (newregs)
+ :
+ : "memory"
+ );
+ }
}
#endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 0226447..6b0f2c7 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -136,6 +136,8 @@ static inline void cpu_panic_kernel(void)
*/
bool cpus_are_stuck_in_kernel(void);
+extern void smp_send_crash_stop(void);
+
#endif /* ifndef __ASSEMBLY__ */
#endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index bc96c8a..8ac9dba8 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -9,6 +9,9 @@
* published by the Free Software Foundation.
*/
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
#include <linux/kexec.h>
#include <linux/smp.h>
@@ -22,6 +25,7 @@
extern const unsigned char arm64_relocate_new_kernel[];
extern const unsigned long arm64_relocate_new_kernel_size;
+bool in_crash_kexec;
static?
Yes, it can be.
Post by James Morse
Post by AKASHI Takahiro
static unsigned long kimage_start;
/**
@@ -148,7 +152,8 @@ void machine_kexec(struct kimage *kimage)
/*
* New cpus may have become stuck_in_kernel after we loaded the image.
*/
- BUG_ON(cpus_are_stuck_in_kernel() || (num_online_cpus() > 1));
+ BUG_ON((cpus_are_stuck_in_kernel() || (num_online_cpus() > 1)) &&
+ !WARN_ON(in_crash_kexec));
In the kdump case, num_online_cpus() is unchanged as ipi_cpu_crash_stop()
doesn't update the online cpu masks, so this WARN_ON always fires. This is
confusing as the 'failed to stop secondary CPUs' message doesn't appear, because
those CPUs did stop, and waiting_for_crash_ipi has the expected value...
Good catch! I've never noticed that the message was wrong.
The line should be changed to:
BUG_ON((cpus_are_stuck_in_kernel() || (num_online_cpus() > 1)) &&
!in_crash_kexec);
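Because the kernel's WARN_ON() evaluates to its condition, `!WARN_ON(in_crash_kexec)` and `!in_crash_kexec` give the same boolean; the only difference is the warning side effect, which fires on every kdump because the secondaries stay in the online mask. A userspace sketch of that, using a hypothetical `SKETCH_WARN_ON()` stand-in built on a GCC/Clang statement expression:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings;

/* Evaluates to the condition and reports when it is true, like WARN_ON(). */
#define SKETCH_WARN_ON(cond) ({					\
	bool sk_cond = !!(cond);				\
	if (sk_cond) {						\
		warnings++;					\
		fprintf(stderr, "WARNING: %s\n", #cond);	\
	}							\
	sk_cond;						\
})

/* v26 predicate: warns whenever kdump runs with other cpus still "online". */
static bool bug_v26(bool stuck, int online, bool in_crash_kexec)
{
	return (stuck || online > 1) && !SKETCH_WARN_ON(in_crash_kexec);
}

/* Proposed predicate: same truth table, no spurious warning. */
static bool bug_fixed(bool stuck, int online, bool in_crash_kexec)
{
	return (stuck || online > 1) && !in_crash_kexec;
}
```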
Post by James Morse
Post by AKASHI Takahiro
reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
@@ -200,13 +205,58 @@ void machine_kexec(struct kimage *kimage)
* relocation is complete.
*/
- cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
+ cpu_soft_restart(!in_crash_kexec, reboot_code_buffer_phys, kimage->head,
kimage_start, 0);
BUG(); /* Should never get here. */
}
+static void machine_kexec_mask_interrupts(void)
+{
+ unsigned int i;
+ struct irq_desc *desc;
+
+ for_each_irq_desc(i, desc) {
+ struct irq_chip *chip;
+ int ret;
+
+ chip = irq_desc_get_chip(desc);
+ if (!chip)
+ continue;
+
+ /*
+ * First try to remove the active state. If this
+ * fails, try to EOI the interrupt.
+ */
+ ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
+
+ if (ret && irqd_irq_inprogress(&desc->irq_data) &&
+ chip->irq_eoi)
+ chip->irq_eoi(&desc->irq_data);
+
+ if (chip->irq_mask)
+ chip->irq_mask(&desc->irq_data);
+
+ if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
+ chip->irq_disable(&desc->irq_data);
+ }
+}
This function is over my head ... I have no idea how this works; I can only
comment that it's different from the version under arch/arm.
/me adds Marc Z to CC.
This function was once borrowed from arch/arm, then dropped temporarily
and revamped based on Marc's comment IIRC.
So I would like to defer to Marc.
Post by James Morse
Post by AKASHI Takahiro
+/**
+ * machine_crash_shutdown - shutdown non-crashing cpus and save registers
+ */
void machine_crash_shutdown(struct pt_regs *regs)
{
- /* Empty routine needed to avoid build errors. */
+ local_irq_disable();
+
+ in_crash_kexec = true;
+
+ /* shutdown non-crashing cpus */
+ smp_send_crash_stop();
+
+ /* for crashing cpu */
+ crash_save_cpu(regs, smp_processor_id());
+ machine_kexec_mask_interrupts();
+
+ pr_info("Starting crashdump kernel...\n");
}
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d93d433..b401b25 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -37,6 +37,7 @@
#include <linux/completion.h>
#include <linux/of.h>
#include <linux/irq_work.h>
+#include <linux/kexec.h>
#include <asm/alternative.h>
#include <asm/atomic.h>
@@ -71,6 +72,7 @@ enum ipi_msg_type {
IPI_RESCHEDULE,
IPI_CALL_FUNC,
IPI_CPU_STOP,
+ IPI_CPU_CRASH_STOP,
IPI_TIMER,
IPI_IRQ_WORK,
IPI_WAKEUP
@@ -734,6 +736,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
S(IPI_RESCHEDULE, "Rescheduling interrupts"),
S(IPI_CALL_FUNC, "Function call interrupts"),
S(IPI_CPU_STOP, "CPU stop interrupts"),
+ S(IPI_CPU_CRASH_STOP, "CPU stop (for crash dump) interrupts"),
S(IPI_TIMER, "Timer broadcast interrupts"),
S(IPI_IRQ_WORK, "IRQ work interrupts"),
S(IPI_WAKEUP, "CPU wake-up interrupts"),
@@ -808,6 +811,29 @@ static void ipi_cpu_stop(unsigned int cpu)
cpu_relax();
}
+#ifdef CONFIG_KEXEC_CORE
+static atomic_t waiting_for_crash_ipi;
+#endif
+
+static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+{
+#ifdef CONFIG_KEXEC_CORE
+ crash_save_cpu(regs, cpu);
+
+ atomic_dec(&waiting_for_crash_ipi);
+
+ local_irq_disable();
+
+#ifdef CONFIG_HOTPLUG_CPU
+ if (cpu_ops[cpu]->cpu_die)
+ cpu_ops[cpu]->cpu_die(cpu);
+#endif
+
+ /* just in case */
+ cpu_park_loop();
+#endif
+}
+
/*
* Main handler for inter-processor interrupts
*/
@@ -838,6 +864,15 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
irq_exit();
break;
+ case IPI_CPU_CRASH_STOP:
+ if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
+ irq_enter();
+ ipi_cpu_crash_stop(cpu, regs);
+
+ unreachable();
+ }
+ break;
+
#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
case IPI_TIMER:
irq_enter();
@@ -910,6 +945,34 @@ void smp_send_stop(void)
cpumask_pr_args(cpu_online_mask));
}
+#ifdef CONFIG_KEXEC_CORE
+void smp_send_crash_stop(void)
+{
+ cpumask_t mask;
+ unsigned long timeout;
+
+ if (num_online_cpus() == 1)
+ return;
+
+ cpumask_copy(&mask, cpu_online_mask);
+ cpumask_clear_cpu(smp_processor_id(), &mask);
+
+ atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+
+ pr_crit("SMP: stopping secondary CPUs\n");
+ smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
+
+ /* Wait up to one second for other CPUs to stop */
+ timeout = USEC_PER_SEC;
+ while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
+ udelay(1);
+
+ if (atomic_read(&waiting_for_crash_ipi) > 0)
+ pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
+ cpumask_pr_args(cpu_online_mask));
+}
+#endif
This is very similar to smp_send_stop() which also has the timeout. Is it
possible to merge them? You could use in_crash_kexec to choose the IPI type.
Yeah, we could merge them along with ipi_cpu_(crash_)stop().
But the resulting code would be quite noisy if every line
were switched on "if (in_crash_kexec)".
Otherwise, we may have one big "if" like:
void smp_send_stop(void)
{
if (in_crash_kexec)
...
else
...
}
It seems to me that it is not much different from the current code.
What do you think?
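As a sketch of the "one shared helper" direction (not what the patch does): the caller selects the IPI from `in_crash_kexec` once, and the counter setup and bounded wait are written a single time. `send` stands in for smp_cross_call() and `delay` for udelay(1); the enum, helpers, and test doubles are hypothetical. It also glosses over the fact that smp_send_stop() currently waits on num_online_cpus() rather than on a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ids mirroring IPI_CPU_STOP and IPI_CPU_CRASH_STOP. */
enum sketch_ipi { SKETCH_IPI_CPU_STOP, SKETCH_IPI_CPU_CRASH_STOP };

static enum sketch_ipi last_ipi;	/* records what the test doubles were sent */

static enum sketch_ipi pick_ipi(bool in_crash_kexec)
{
	return in_crash_kexec ? SKETCH_IPI_CPU_CRASH_STOP : SKETCH_IPI_CPU_STOP;
}

/* Returns true if every other cpu reported in before the timeout. */
static bool stop_other_cpus(enum sketch_ipi ipi, int nr_others,
			    atomic_int *pending,
			    void (*send)(enum sketch_ipi, atomic_int *),
			    void (*delay)(void), unsigned long timeout_us)
{
	if (nr_others == 0)
		return true;

	atomic_store(pending, nr_others);
	send(ipi, pending);
	while (atomic_load(pending) > 0 && timeout_us--)
		delay();
	return atomic_load(pending) == 0;
}

/* Test doubles: every cpu acknowledges at once, or none ever does. */
static void ack_all(enum sketch_ipi ipi, atomic_int *pending)
{
	last_ipi = ipi;
	atomic_store(pending, 0);
}

static void ack_none(enum sketch_ipi ipi, atomic_int *pending)
{
	last_ipi = ipi;
	(void)pending;
}

static void no_delay(void) { }
```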

-Takahiro AKASHI
Post by James Morse
Post by AKASHI Takahiro
+
/*
* not supported here
*/
Thanks,
James
James Morse
2016-09-16 14:49:31 UTC
Post by AKASHI Takahiro
Post by James Morse
Post by AKASHI Takahiro
The primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
and save their register state in per-cpu ELF notes before starting the crash
dump kernel. See kernel_kexec().
Even if not all secondary cpus have shut down, we proceed with kdump anyway.
As we don't have to take non-boot (crashed) cpus offline (to preserve the
correct cpu state for the crash dump) before shutting down, this patch
also adds a variant of smp_send_stop().
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 04744dc..a908958 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -40,7 +40,46 @@
static inline void crash_setup_regs(struct pt_regs *newregs,
struct pt_regs *oldregs)
{
- /* Empty routine needed to avoid build errors. */
+ if (oldregs) {
+ memcpy(newregs, oldregs, sizeof(*newregs));
+ } else {
+ u64 tmp1, tmp2;
+
+ __asm__ __volatile__ (
+ "stp x0, x1, [%2, #16 * 0]\n"
+ "stp x2, x3, [%2, #16 * 1]\n"
+ "stp x4, x5, [%2, #16 * 2]\n"
+ "stp x6, x7, [%2, #16 * 3]\n"
+ "stp x8, x9, [%2, #16 * 4]\n"
+ "stp x10, x11, [%2, #16 * 5]\n"
+ "stp x12, x13, [%2, #16 * 6]\n"
+ "stp x14, x15, [%2, #16 * 7]\n"
+ "stp x16, x17, [%2, #16 * 8]\n"
+ "stp x18, x19, [%2, #16 * 9]\n"
+ "stp x20, x21, [%2, #16 * 10]\n"
+ "stp x22, x23, [%2, #16 * 11]\n"
+ "stp x24, x25, [%2, #16 * 12]\n"
+ "stp x26, x27, [%2, #16 * 13]\n"
+ "stp x28, x29, [%2, #16 * 14]\n"
+ "mov %0, sp\n"
+ "stp x30, %0, [%2, #16 * 15]\n"
+
+ "/* faked current PSTATE */\n"
+ "mrs %0, CurrentEL\n"
+ "mrs %1, DAIF\n"
+ "orr %0, %0, %1\n"
+ "mrs %1, NZCV\n"
+ "orr %0, %0, %1\n"
+
What about SPSEL? While we don't use it, it is correctly preserved for
everything except a CPU that calls panic()...
My comment above might be confusing, but what I want to fake
here is "spsr", as pt_regs.pstate is normally set from spsr_el1.
So there is no field in spsr corresponding to SPSEL.
Here is my logic; I may have missed something obvious, so see what you think:

SPSR_EL{1,2} shows the CPU mode 'M' in bits 0-4. From AArch64, bit 4 is always 0.
From the register definitions in the ARM-ARM C5.2, likely values in 0-3 are:
0100 EL1t
0101 EL1h
1000 EL2t
1001 EL2h

I'm pretty sure this least significant bit is what SPSEL changes, so it does get
implicitly recorded in SPSR.
CurrentEL returns a value in bits 0-3, of which 0-1 are RES0, so we lose the
difference between EL?t and EL?h.
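This argument can be checked with a short decoding sketch: take the M[3:0] encodings listed above, build a mode from CurrentEL alone, and compare it with a mode that also folds in SPSel (bit 0). Reading SPSel with `mrs` is an assumption about how a fix might look, not something the patch does; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Names for the AArch64 SPSR M[3:0] encodings listed above. */
static const char *mode_name(uint64_t spsr)
{
	switch (spsr & 0xf) {
	case 0x0: return "EL0t";
	case 0x4: return "EL1t";
	case 0x5: return "EL1h";
	case 0x8: return "EL2t";
	case 0x9: return "EL2h";
	default:  return "other";
	}
}

/* CurrentEL holds the EL in bits [3:2]; bits [1:0] are RES0. */
static uint64_t mode_from_currentel(uint64_t currentel)
{
	return currentel & 0xc;
}

/* Hypothetical fix: also fold in SPSel (bit 0) to tell EL?t from EL?h. */
static uint64_t mode_with_spsel(uint64_t currentel, uint64_t spsel)
{
	return (currentel & 0xc) | (spsel & 1);
}
```

Built from CurrentEL alone, a kernel running on SP_EL1 is always reported as EL1t; folding in SPSel = 1 gives EL1h.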
Post by AKASHI Takahiro
Post by James Morse
Post by AKASHI Takahiro
+ /* pc */
+ "adr %1, 1f\n"
+ "1:\n"
+ "stp %1, %0, [%2, #16 * 16]\n"
+ : "=r" (tmp1), "=r" (tmp2), "+r" (newregs)
+ :
+ : "memory"
Do you need the memory clobber? This asm only modifies values in newregs.
Post by AKASHI Takahiro
Post by James Morse
Post by AKASHI Takahiro
+ );
+ }
}
#endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
+#ifdef CONFIG_KEXEC_CORE
+void smp_send_crash_stop(void)
+{
+ cpumask_t mask;
+ unsigned long timeout;
+
+ if (num_online_cpus() == 1)
+ return;
+
+ cpumask_copy(&mask, cpu_online_mask);
+ cpumask_clear_cpu(smp_processor_id(), &mask);
+
+ atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+
+ pr_crit("SMP: stopping secondary CPUs\n");
+ smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
+
+ /* Wait up to one second for other CPUs to stop */
+ timeout = USEC_PER_SEC;
+ while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
+ udelay(1);
+
+ if (atomic_read(&waiting_for_crash_ipi) > 0)
+ pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
+ cpumask_pr_args(cpu_online_mask));
+}
+#endif
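The bounded wait at the end of smp_send_crash_stop() can be modelled in user space roughly as follows. This is a sketch under stated assumptions: C11 atomics stand in for atomic_t, and the hypothetical delay_1us callback stands in for udelay(1) while other CPUs acknowledge the IPI.

```c
#include <stdatomic.h>

/*
 * Poll the count of CPUs that have not yet acknowledged the stop IPI,
 * giving up after timeout_us iterations. Returns the number still
 * pending; the caller carries on with kdump regardless.
 */
static int sketch_wait_for_crash_ipi(atomic_int *waiting, unsigned long timeout_us,
                                     void (*delay_1us)(atomic_int *waiting))
{
    while (atomic_load(waiting) > 0 && timeout_us--)
        delay_1us(waiting);     /* stands in for udelay(1) */

    return atomic_load(waiting);
}

/* Example callback: one CPU acknowledges per simulated microsecond. */
static void sketch_one_cpu_acks(atomic_int *waiting)
{
    atomic_fetch_sub(waiting, 1);
}

/* Example callback: no CPU ever acknowledges. */
static void sketch_no_cpu_acks(atomic_int *waiting)
{
    (void)waiting;
}
```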
This is very similar to smp_send_stop() which also has the timeout. Is it
possible to merge them? You could use in_crash_kexec to choose the IPI type.
Yeah, we could merge them along with ipi_cpu_(crash_)stop().
But the resulting code would be quite noisy if each line
is switched by "if (in_crash_kexec)."
void smp_send_stop(void)
{
if (in_crash_kexec)
...
else
...
}
It seems to me that it is not much different from the current code.
What do you think?
Hmm, yes, it's too fiddly to keep the existing behaviour of both.

The problems are that ipi_cpu_stop() doesn't call cpu_die() (I can't see a good
reason for this, but more archaeology is needed), and that ipi_cpu_crash_stop()
doesn't modify the online cpu mask.

I don't suggest we do this yet, but it could be future cleanup if it's proved to
be safe:
smp_send_stop() is only called from: machine_halt(), machine_power_off(),
machine_restart() and panic(). In all those cases the CPUs are never expected to
come back, so we can probably merge the IPIs. This involves modifying the
online cpu mask during kdump (which I think is fine, as it uses the atomic
bitops so we won't get blocked on a lock), and promoting in_crash_kexec to some
atomic type.
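The possible future cleanup could look roughly like this. Everything here is hypothetical: the enum values stand in for IPI_CPU_STOP/IPI_CPU_CRASH_STOP, an atomic_bool stands in for in_crash_kexec once promoted to an atomic type, and a 64-bit atomic word stands in for the online cpu mask that a merged handler would clear with atomic bitops.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two arm64 stop IPIs. */
enum sketch_stop_ipi { SKETCH_IPI_CPU_STOP, SKETCH_IPI_CPU_CRASH_STOP };

/* in_crash_kexec promoted to an atomic type. */
static atomic_bool sketch_in_crash_kexec;

/* One bit per CPU; a stand-in for cpu_online_mask. */
static _Atomic uint64_t sketch_online_mask;

/* A merged smp_send_stop() would choose the IPI type from the flag. */
static enum sketch_stop_ipi sketch_choose_stop_ipi(void)
{
    return atomic_load(&sketch_in_crash_kexec) ? SKETCH_IPI_CPU_CRASH_STOP
                                               : SKETCH_IPI_CPU_STOP;
}

/*
 * A merged handler would take its CPU offline with an atomic bit clear,
 * so a CPU stopping during kdump does not block on a lock here.
 */
static void sketch_mark_cpu_offline(unsigned int cpu)
{
    atomic_fetch_and(&sketch_online_mask, ~(UINT64_C(1) << cpu));
}
```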

But I think we should leave it as it is for now.


Thanks,

James
AKASHI Takahiro
2016-09-20 07:36:35 UTC
Post by James Morse
Post by AKASHI Takahiro
Post by James Morse
Post by AKASHI Takahiro
The primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
and save the registers' status in per-cpu ELF notes before starting the crash
dump kernel. See kernel_kexec().
Even if not all secondary cpus have shut down, we do kdump anyway.
As we don't have to make non-boot (crashed) cpus offline (to preserve the
correct status of cpus at crash dump) before shutting down, this patch
also adds a variant of smp_send_stop().
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 04744dc..a908958 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -40,7 +40,46 @@
static inline void crash_setup_regs(struct pt_regs *newregs,
struct pt_regs *oldregs)
{
- /* Empty routine needed to avoid build errors. */
+ if (oldregs) {
+ memcpy(newregs, oldregs, sizeof(*newregs));
+ } else {
+ u64 tmp1, tmp2;
+
+ __asm__ __volatile__ (
+ "stp x0, x1, [%2, #16 * 0]\n"
+ "stp x2, x3, [%2, #16 * 1]\n"
+ "stp x4, x5, [%2, #16 * 2]\n"
+ "stp x6, x7, [%2, #16 * 3]\n"
+ "stp x8, x9, [%2, #16 * 4]\n"
+ "stp x10, x11, [%2, #16 * 5]\n"
+ "stp x12, x13, [%2, #16 * 6]\n"
+ "stp x14, x15, [%2, #16 * 7]\n"
+ "stp x16, x17, [%2, #16 * 8]\n"
+ "stp x18, x19, [%2, #16 * 9]\n"
+ "stp x20, x21, [%2, #16 * 10]\n"
+ "stp x22, x23, [%2, #16 * 11]\n"
+ "stp x24, x25, [%2, #16 * 12]\n"
+ "stp x26, x27, [%2, #16 * 13]\n"
+ "stp x28, x29, [%2, #16 * 14]\n"
+ "mov %0, sp\n"
+ "stp x30, %0, [%2, #16 * 15]\n"
+
+ "/* faked current PSTATE */\n"
+ "mrs %0, CurrentEL\n"
+ "mrs %1, DAIF\n"
+ "orr %0, %0, %1\n"
+ "mrs %1, NZCV\n"
+ "orr %0, %0, %1\n"
+
What about SPSEL? While we don't use it, it is correctly preserved for
everything except a CPU that calls panic()...
My comment above might be confusing, but what I want to fake
here is "spsr", as pt_regs.pstate is normally set from spsr_el1.
So there is no field in spsr corresponding to SPSEL.
SPSR_EL{1,2} shows the CPU mode 'M' in bits 0-4; from AArch64, bit 4 is always 0.
From the register definitions in the ARM-ARM C5.2, likely values in 0-3 are:
0100 EL1t
0101 EL1h
1000 EL2t
1001 EL2h
I'm pretty sure this least significant bit is what SPSEL changes, so it does get
implicitly recorded in SPSR.
CurrentEL returns a value in bits 0-3, of which 0-1 are RES0, so we lose the
difference between EL?t and EL?h.
OK.
SPSel will be added assuming that CurrentEL is never 0 here.
Post by James Morse
Post by AKASHI Takahiro
Post by James Morse
Post by AKASHI Takahiro
+ /* pc */
+ "adr %1, 1f\n"
+ "1:\n"
+ "stp %1, %0, [%2, #16 * 16]\n"
+ : "=r" (tmp1), "=r" (tmp2), "+r" (newregs)
+ : "memory"
Do you need the memory clobber? This asm only modifies values in newregs.
What about this (including the change above):
| "/* faked current PSTATE */\n"
| "mrs %0, CurrentEL\n"
| "mrs %1, SPSEL\n"
| "orr %0, %0, %1\n"
| "mrs %1, DAIF\n"
| "orr %0, %0, %1\n"
| "mrs %1, NZCV\n"
| "orr %0, %0, %1\n"
| /* pc */
| "adr %1, 1f\n"
| "1:\n"
| "stp %1, %0, [%2, #16 * 16]\n"
| : "+r" (tmp1), "+r" (tmp2)
| : "r" (newregs)
| : "memory"
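The asm stores register pairs at fixed offsets from newregs. A small user-space check, assuming the arm64 layout of the user-visible part of struct pt_regs (regs[31], then sp, pc and pstate), confirms where each stp lands. The struct below is a hypothetical mirror, not the kernel definition. Because the stores go through the pointer held in %2 rather than through a declared memory output operand, the compiler cannot see them, which is the usual reason for keeping a "memory" clobber.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the user_pt_regs part of arm64 struct pt_regs. */
struct sketch_user_pt_regs {
    uint64_t regs[31];
    uint64_t sp;
    uint64_t pc;
    uint64_t pstate;
};

/* "stp x30, %0, [%2, #16 * 15]" stores x30, then sp ... */
_Static_assert(offsetof(struct sketch_user_pt_regs, regs[30]) == 16 * 15, "x30 offset");
_Static_assert(offsetof(struct sketch_user_pt_regs, sp) == 16 * 15 + 8, "sp offset");
/* ... and "stp %1, %0, [%2, #16 * 16]" stores pc, then the faked pstate. */
_Static_assert(offsetof(struct sketch_user_pt_regs, pc) == 16 * 16, "pc offset");
_Static_assert(offsetof(struct sketch_user_pt_regs, pstate) == 16 * 16 + 8, "pstate offset");
```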
Post by James Morse
Post by AKASHI Takahiro
Post by James Morse
Post by AKASHI Takahiro
+ );
+ }
}
#endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
+#ifdef CONFIG_KEXEC_CORE
+void smp_send_crash_stop(void)
+{
+ cpumask_t mask;
+ unsigned long timeout;
+
+ if (num_online_cpus() == 1)
+ return;
+
+ cpumask_copy(&mask, cpu_online_mask);
+ cpumask_clear_cpu(smp_processor_id(), &mask);
+
+ atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+
+ pr_crit("SMP: stopping secondary CPUs\n");
+ smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
+
+ /* Wait up to one second for other CPUs to stop */
+ timeout = USEC_PER_SEC;
+ while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
+ udelay(1);
+
+ if (atomic_read(&waiting_for_crash_ipi) > 0)
+ pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
+ cpumask_pr_args(cpu_online_mask));
+}
+#endif
This is very similar to smp_send_stop() which also has the timeout. Is it
possible to merge them? You could use in_crash_kexec to choose the IPI type.
Yeah, we could merge them along with ipi_cpu_(crash_)stop().
But the resulting code would be quite noisy if each line
is switched by "if (in_crash_kexec)."
void smp_send_stop(void)
{
if (in_crash_kexec)
...
else
...
}
It seems to me that it is not much different from the current code.
What do you think?
Hmm, yes, it's too fiddly to keep the existing behaviour of both.
The problems are that ipi_cpu_stop() doesn't call cpu_die() (I can't see a good
reason for this, but more archaeology is needed), and that ipi_cpu_crash_stop()
doesn't modify the online cpu mask.
I don't suggest we do this yet, but it could be future cleanup if it's proved to
be safe:
smp_send_stop() is only called from: machine_halt(), machine_power_off(),
machine_restart() and panic(). In all those cases the CPUs are never expected to
come back, so we can probably merge the IPIs. This involves modifying the
online cpu mask during kdump (which I think is fine, as it uses the atomic
bitops so we won't get blocked on a lock), and promoting in_crash_kexec to some
atomic type.
But I think we should leave it as it is for now.
Sure.

Thanks,
-Takahiro AKASHI
Post by James Morse
Thanks,
James
AKASHI Takahiro
2016-09-07 04:29:03 UTC
At startup of the primary kernel, the memory region used by the crash dump
kernel must be specified by the "crashkernel=" kernel parameter.
reserve_crashkernel() will allocate and reserve the region for later use.

User space tools, like kexec-tools, will be able to find that region as
- "Crash kernel" in /proc/iomem, or
- "linux,crashkernel-base" and "linux,crashkernel-size" under
/sys/firmware/devicetree/base/chosen
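How a user-space tool might pick the region out of /proc/iomem: a minimal C sketch, assuming the usual "start-end : name" line format with hexadecimal, inclusive bounds and space indentation for nested entries. The function name is hypothetical; kexec-tools has its own parser.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parse e.g. "  80000000-8fffffff : Crash kernel" into [start, end]. */
static bool sketch_parse_crash_kernel_line(const char *line,
                                           unsigned long long *start,
                                           unsigned long long *end)
{
    unsigned long long s, e;
    int name_off = 0;

    /* Leading " " skips indentation; %n records where the name starts. */
    if (sscanf(line, " %llx-%llx : %n", &s, &e, &name_off) != 2 || name_off == 0)
        return false;

    /* Match the name exactly, allowing a trailing newline. */
    if (strncmp(line + name_off, "Crash kernel", 12) != 0)
        return false;
    if (line[name_off + 12] != '\0' && line[name_off + 12] != '\n')
        return false;

    *start = s;
    *end = e;   /* inclusive, like crashk_res.end */
    return true;
}
```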

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Signed-off-by: Mark Salter <***@redhat.com>
Signed-off-by: Pratyush Anand <***@redhat.com>
Reviewed-by: James Morse <***@arm.com>
---
arch/arm64/kernel/setup.c | 7 ++-
arch/arm64/mm/init.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 514b4e3..38589b5 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -31,7 +31,6 @@
#include <linux/screen_info.h>
#include <linux/init.h>
#include <linux/kexec.h>
-#include <linux/crash_dump.h>
#include <linux/root_dev.h>
#include <linux/cpu.h>
#include <linux/interrupt.h>
@@ -225,6 +224,12 @@ static void __init request_standard_resources(void)
kernel_data.end <= res->end)
request_resource(res, &kernel_data);
}
+
+#ifdef CONFIG_KEXEC_CORE
+ /* User space tools will find "Crash kernel" region in /proc/iomem. */
+ if (crashk_res.end)
+ insert_resource(&iomem_resource, &crashk_res);
+#endif
}

u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index bbb7ee7..dd273ec 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -29,11 +29,13 @@
#include <linux/gfp.h>
#include <linux/memblock.h>
#include <linux/sort.h>
+#include <linux/of.h>
#include <linux/of_fdt.h>
#include <linux/dma-mapping.h>
#include <linux/dma-contiguous.h>
#include <linux/efi.h>
#include <linux/swiotlb.h>
+#include <linux/kexec.h>

#include <asm/boot.h>
#include <asm/fixmap.h>
@@ -76,6 +78,114 @@ static int __init early_initrd(char *p)
early_param("initrd", early_initrd);
#endif

+#ifdef CONFIG_KEXEC_CORE
+static unsigned long long crash_size, crash_base;
+static struct property crash_base_prop = {
+ .name = "linux,crashkernel-base",
+ .length = sizeof(u64),
+ .value = &crash_base
+};
+static struct property crash_size_prop = {
+ .name = "linux,crashkernel-size",
+ .length = sizeof(u64),
+ .value = &crash_size,
+};
+
+static int __init export_crashkernel(void)
+{
+ struct device_node *node;
+ int ret;
+
+ if (!crashk_res.end)
+ return 0;
+
+ crash_base = cpu_to_be64(crashk_res.start);
+ crash_size = cpu_to_be64(crashk_res.end - crashk_res.start + 1);
+
+ /* Add /chosen/linux,crashkernel-* properties */
+ node = of_find_node_by_path("/chosen");
+ if (!node)
+ return -ENOENT;
+
+ /*
+ * There might be existing crash kernel properties, but we can't
+ * be sure what's in them, so remove them.
+ */
+ of_remove_property(node, of_find_property(node,
+ "linux,crashkernel-base", NULL));
+ of_remove_property(node, of_find_property(node,
+ "linux,crashkernel-size", NULL));
+
+ ret = of_add_property(node, &crash_base_prop);
+ if (ret)
+ goto ret_err;
+
+ ret = of_add_property(node, &crash_size_prop);
+ if (ret)
+ goto ret_err;
+
+ return 0;
+
+ret_err:
+ pr_warn("Exporting crashkernel region to device tree failed\n");
+ return ret;
+}
+late_initcall(export_crashkernel);
+
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by dump capture kernel when
+ * primary kernel is crashing.
+ */
+static void __init reserve_crashkernel(void)
+{
+ int ret;
+
+ ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+ &crash_size, &crash_base);
+ /* no crashkernel= or invalid value specified */
+ if (ret || !crash_size)
+ return;
+
+ if (crash_base == 0) {
+ /* Current arm64 boot protocol requires 2MB alignment */
+ crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
+ crash_size, SZ_2M);
+ if (crash_base == 0) {
+ pr_warn("Unable to allocate crashkernel (size:%llx)\n",
+ crash_size);
+ return;
+ }
+ } else {
+ /* User specifies base address explicitly. */
+ if (!memblock_is_region_memory(crash_base, crash_size) ||
+ memblock_is_region_reserved(crash_base, crash_size)) {
+ pr_warn("crashkernel has wrong address or size\n");
+ return;
+ }
+
+ if (!IS_ALIGNED(crash_base, SZ_2M)) {
+ pr_warn("crashkernel base address is not 2MB aligned\n");
+ return;
+ }
+ }
+ memblock_reserve(crash_base, crash_size);
+
+ pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
+ crash_size >> 20, crash_base >> 20);
+
+ crashk_res.start = crash_base;
+ crashk_res.end = crash_base + crash_size - 1;
+}
+#else
+static void __init reserve_crashkernel(void)
+{
+ ;
+}
+#endif /* CONFIG_KEXEC_CORE */
+
/*
* Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
* currently assumes that for memory starting above 4G, 32-bit devices will
@@ -296,6 +406,9 @@ void __init arm64_memblock_init(void)
arm64_dma_phys_limit = max_zone_dma_phys();
else
arm64_dma_phys_limit = PHYS_MASK + 1;
+
+ reserve_crashkernel();
+
dma_contiguous_reserve(arm64_dma_phys_limit);

memblock_allow_resize();
--
2.9.0
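The explicit-base path of reserve_crashkernel() boils down to a few checks plus the inclusive-end bookkeeping. A user-space sketch, assuming the two memblock queries have already been answered and are passed in as booleans (all names are hypothetical); the 2MB figure is the alignment the patch attributes to the current arm64 boot protocol.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SZ_2M (UINT64_C(2) << 20)

/* Inclusive bounds, like struct resource / crashk_res. */
struct sketch_res { uint64_t start, end; };

/*
 * is_memory and is_reserved stand in for memblock_is_region_memory()
 * and memblock_is_region_reserved() on [base, base + size).
 */
static bool sketch_check_fixed_base(uint64_t base, uint64_t size,
                                    bool is_memory, bool is_reserved,
                                    struct sketch_res *res)
{
    if (!size)
        return false;               /* no crashkernel= or invalid value */
    if (!is_memory || is_reserved)
        return false;               /* "wrong address or size" */
    if (base & (SKETCH_SZ_2M - 1))
        return false;               /* "not 2MB aligned" */

    res->start = base;
    res->end = base + size - 1;     /* crashk_res.end is inclusive */
    return true;
}
```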
Matthias Bruger
2016-09-22 10:23:08 UTC
Post by AKASHI Takahiro
At startup of the primary kernel, the memory region used by the crash dump
kernel must be specified by the "crashkernel=" kernel parameter.
reserve_crashkernel() will allocate and reserve the region for later use.
User space tools, like kexec-tools, will be able to find that region as
- "Crash kernel" in /proc/iomem, or
- "linux,crashkernel-base" and "linux,crashkernel-size" under
/sys/firmware/devicetree/base/chosen
---
arch/arm64/kernel/setup.c | 7 ++-
arch/arm64/mm/init.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 119 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 514b4e3..38589b5 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -31,7 +31,6 @@
#include <linux/screen_info.h>
#include <linux/init.h>
#include <linux/kexec.h>
-#include <linux/crash_dump.h>
#include <linux/root_dev.h>
#include <linux/cpu.h>
#include <linux/interrupt.h>
@@ -225,6 +224,12 @@ static void __init request_standard_resources(void)
kernel_data.end <= res->end)
request_resource(res, &kernel_data);
}
+
+#ifdef CONFIG_KEXEC_CORE
+ /* User space tools will find "Crash kernel" region in /proc/iomem. */
+ if (crashk_res.end)
+ insert_resource(&iomem_resource, &crashk_res);
+#endif
}
u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index bbb7ee7..dd273ec 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -29,11 +29,13 @@
#include <linux/gfp.h>
#include <linux/memblock.h>
#include <linux/sort.h>
+#include <linux/of.h>
#include <linux/of_fdt.h>
#include <linux/dma-mapping.h>
#include <linux/dma-contiguous.h>
#include <linux/efi.h>
#include <linux/swiotlb.h>
+#include <linux/kexec.h>
#include <asm/boot.h>
#include <asm/fixmap.h>
@@ -76,6 +78,114 @@ static int __init early_initrd(char *p)
early_param("initrd", early_initrd);
#endif
+#ifdef CONFIG_KEXEC_CORE
+static unsigned long long crash_size, crash_base;
+static struct property crash_base_prop = {
+ .name = "linux,crashkernel-base",
+ .length = sizeof(u64),
+ .value = &crash_base
+};
+static struct property crash_size_prop = {
+ .name = "linux,crashkernel-size",
+ .length = sizeof(u64),
+ .value = &crash_size,
+};
+
+static int __init export_crashkernel(void)
+{
+ struct device_node *node;
+ int ret;
+
+ if (!crashk_res.end)
+ return 0;
+
+ crash_base = cpu_to_be64(crashk_res.start);
+ crash_size = cpu_to_be64(crashk_res.end - crashk_res.start + 1);
+
Shouldn't these be the same values as in reserve_crashkernel()?
IMHO they do not need to be recalculated here.

Regards,
Matthias
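To make the point concrete, a small sketch assuming crashk_res.end is inclusive as set in reserve_crashkernel(): the size derived from the resource bounds is exactly the crash_size that was parsed, so the recalculation adds nothing. DT property values are stored big-endian, so if the properties keep pointing at crash_base/crash_size, a cpu_to_be64()-style conversion would presumably still be needed somewhere. The helper names are hypothetical.

```c
#include <stdint.h>

/* Size recovered from inclusive resource bounds. */
static uint64_t sketch_res_size(uint64_t start, uint64_t end)
{
    return end - start + 1;
}

/* Portable big-endian encoding, the byte layout a DT property expects. */
static void sketch_put_be64(uint8_t out[8], uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        out[i] = (uint8_t)(v & 0xff);   /* least significant byte goes last */
        v >>= 8;
    }
}
```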
Post by AKASHI Takahiro
+ /* Add /chosen/linux,crashkernel-* properties */
+ node = of_find_node_by_path("/chosen");
+ if (!node)
+ return -ENOENT;
+
+ /*
+ * There might be existing crash kernel properties, but we can't
+ * be sure what's in them, so remove them.
+ */
+ of_remove_property(node, of_find_property(node,
+ "linux,crashkernel-base", NULL));
+ of_remove_property(node, of_find_property(node,
+ "linux,crashkernel-size", NULL));
+
+ ret = of_add_property(node, &crash_base_prop);
+ if (ret)
+ goto ret_err;
+
+ ret = of_add_property(node, &crash_size_prop);
+ if (ret)
+ goto ret_err;
+
+ return 0;
+
+ret_err:
+ pr_warn("Exporting crashkernel region to device tree failed\n");
+ return ret;
+}
+late_initcall(export_crashkernel);
+
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by dump capture kernel when
+ * primary kernel is crashing.
+ */
+static void __init reserve_crashkernel(void)
+{
+ int ret;
+
+ ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+ &crash_size, &crash_base);
+ /* no crashkernel= or invalid value specified */
+ if (ret || !crash_size)
+ return;
+
+ if (crash_base == 0) {
+ /* Current arm64 boot protocol requires 2MB alignment */
+ crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
+ crash_size, SZ_2M);
+ if (crash_base == 0) {
+ pr_warn("Unable to allocate crashkernel (size:%llx)\n",
+ crash_size);
+ return;
+ }
+ } else {
+ /* User specifies base address explicitly. */
+ if (!memblock_is_region_memory(crash_base, crash_size) ||
+ memblock_is_region_reserved(crash_base, crash_size)) {
+ pr_warn("crashkernel has wrong address or size\n");
+ return;
+ }
+
+ if (!IS_ALIGNED(crash_base, SZ_2M)) {
+ pr_warn("crashkernel base address is not 2MB aligned\n");
+ return;
+ }
+ }
+ memblock_reserve(crash_base, crash_size);
+
+ pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
+ crash_size >> 20, crash_base >> 20);
+
+ crashk_res.start = crash_base;
+ crashk_res.end = crash_base + crash_size - 1;
+}
+#else
+static void __init reserve_crashkernel(void)
+{
+ ;
+}
+#endif /* CONFIG_KEXEC_CORE */
+
/*
* Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
* currently assumes that for memory starting above 4G, 32-bit devices will
@@ -296,6 +406,9 @@ void __init arm64_memblock_init(void)
arm64_dma_phys_limit = max_zone_dma_phys();
else
arm64_dma_phys_limit = PHYS_MASK + 1;
+
+ reserve_crashkernel();
+
dma_contiguous_reserve(arm64_dma_phys_limit);
memblock_allow_resize();
AKASHI Takahiro
2016-09-23 08:37:29 UTC
Post by Matthias Bruger
Post by AKASHI Takahiro
At startup of the primary kernel, the memory region used by the crash dump
kernel must be specified by the "crashkernel=" kernel parameter.
reserve_crashkernel() will allocate and reserve the region for later use.
User space tools, like kexec-tools, will be able to find that region as
- "Crash kernel" in /proc/iomem, or
- "linux,crashkernel-base" and "linux,crashkernel-size" under
/sys/firmware/devicetree/base/chosen
---
arch/arm64/kernel/setup.c | 7 ++-
arch/arm64/mm/init.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 119 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 514b4e3..38589b5 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -31,7 +31,6 @@
#include <linux/screen_info.h>
#include <linux/init.h>
#include <linux/kexec.h>
-#include <linux/crash_dump.h>
#include <linux/root_dev.h>
#include <linux/cpu.h>
#include <linux/interrupt.h>
@@ -225,6 +224,12 @@ static void __init request_standard_resources(void)
kernel_data.end <= res->end)
request_resource(res, &kernel_data);
}
+
+#ifdef CONFIG_KEXEC_CORE
+ /* User space tools will find "Crash kernel" region in /proc/iomem. */
+ if (crashk_res.end)
+ insert_resource(&iomem_resource, &crashk_res);
+#endif
}
u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index bbb7ee7..dd273ec 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -29,11 +29,13 @@
#include <linux/gfp.h>
#include <linux/memblock.h>
#include <linux/sort.h>
+#include <linux/of.h>
#include <linux/of_fdt.h>
#include <linux/dma-mapping.h>
#include <linux/dma-contiguous.h>
#include <linux/efi.h>
#include <linux/swiotlb.h>
+#include <linux/kexec.h>
#include <asm/boot.h>
#include <asm/fixmap.h>
@@ -76,6 +78,114 @@ static int __init early_initrd(char *p)
early_param("initrd", early_initrd);
#endif
+#ifdef CONFIG_KEXEC_CORE
+static unsigned long long crash_size, crash_base;
+static struct property crash_base_prop = {
+ .name = "linux,crashkernel-base",
+ .length = sizeof(u64),
+ .value = &crash_base
+};
+static struct property crash_size_prop = {
+ .name = "linux,crashkernel-size",
+ .length = sizeof(u64),
+ .value = &crash_size,
+};
+
+static int __init export_crashkernel(void)
+{
+ struct device_node *node;
+ int ret;
+
+ if (!crashk_res.end)
+ return 0;
+
+ crash_base = cpu_to_be64(crashk_res.start);
+ crash_size = cpu_to_be64(crashk_res.end - crashk_res.start + 1);
+
Shouldn't these be the same values as in reserve_crashkernel()?
IMHO they do not need to be recalculated here.
Right. crashk_res is calculated from crash_base/size.
So I should and will remove those lines.

Thanks,
-Takahiro AKASHI
Post by Matthias Bruger
Regards,
Matthias
Post by AKASHI Takahiro
+ /* Add /chosen/linux,crashkernel-* properties */
+ node = of_find_node_by_path("/chosen");
+ if (!node)
+ return -ENOENT;
+
+ /*
+ * There might be existing crash kernel properties, but we can't
+ * be sure what's in them, so remove them.
+ */
+ of_remove_property(node, of_find_property(node,
+ "linux,crashkernel-base", NULL));
+ of_remove_property(node, of_find_property(node,
+ "linux,crashkernel-size", NULL));
+
+ ret = of_add_property(node, &crash_base_prop);
+ if (ret)
+ goto ret_err;
+
+ ret = of_add_property(node, &crash_size_prop);
+ if (ret)
+ goto ret_err;
+
+ return 0;
+
+ret_err:
+ pr_warn("Exporting crashkernel region to device tree failed\n");
+ return ret;
+}
+late_initcall(export_crashkernel);
+
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by dump capture kernel when
+ * primary kernel is crashing.
+ */
+static void __init reserve_crashkernel(void)
+{
+ int ret;
+
+ ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+ &crash_size, &crash_base);
+ /* no crashkernel= or invalid value specified */
+ if (ret || !crash_size)
+ return;
+
+ if (crash_base == 0) {
+ /* Current arm64 boot protocol requires 2MB alignment */
+ crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
+ crash_size, SZ_2M);
+ if (crash_base == 0) {
+ pr_warn("Unable to allocate crashkernel (size:%llx)\n",
+ crash_size);
+ return;
+ }
+ } else {
+ /* User specifies base address explicitly. */
+ if (!memblock_is_region_memory(crash_base, crash_size) ||
+ memblock_is_region_reserved(crash_base, crash_size)) {
+ pr_warn("crashkernel has wrong address or size\n");
+ return;
+ }
+
+ if (!IS_ALIGNED(crash_base, SZ_2M)) {
+ pr_warn("crashkernel base address is not 2MB aligned\n");
+ return;
+ }
+ }
+ memblock_reserve(crash_base, crash_size);
+
+ pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
+ crash_size >> 20, crash_base >> 20);
+
+ crashk_res.start = crash_base;
+ crashk_res.end = crash_base + crash_size - 1;
+}
+#else
+static void __init reserve_crashkernel(void)
+{
+ ;
+}
+#endif /* CONFIG_KEXEC_CORE */
+
/*
* Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
* currently assumes that for memory starting above 4G, 32-bit devices will
@@ -296,6 +406,9 @@ void __init arm64_memblock_init(void)
arm64_dma_phys_limit = max_zone_dma_phys();
else
arm64_dma_phys_limit = PHYS_MASK + 1;
+
+ reserve_crashkernel();
+
dma_contiguous_reserve(arm64_dma_phys_limit);
memblock_allow_resize();
James Morse
2016-09-16 16:04:34 UTC
(Cc: Ard),

Mark, Ard, how does/will reserved-memory work on an ACPI-only system?
v26-specific note: After a comment from Rob[0], an idea of adding
"linux,usable-memory-range" was dropped. Instead, an existing
"reserved-memory" node will be used to limit usable memory ranges
on crash dump kernel.
This works not only on UEFI/ACPI systems but also on DT-only systems,
but if he really insists on using DT-specific "usable-memory" property,
I will post additional patches for kexec-tools. Those would be
redundant, though.
Even in that case, the kernel will not have to be changed.
Some narrative on how the old memory ranges get reserved, as there is no longer
any code in the series doing this (which is pretty neat!):

kexec-tools parses the list of memory ranges in /proc/iomem, and adds nodes under
/reserved-memory for the System RAM ranges that don't cover the crash kernel.
Decompiling the crash-kernel DT from Seattle, it looks roughly like this:

reserved-memory {
ranges;
#size-cells = <0x2>;
#address-cells = <0x2>;

***@83ffe50000 {
no-map;
reg = <0x83 0xffe50000 0x0 0x1b0000>;
};

[ ... ]
};
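Decoding the reg property above, assuming #address-cells = <0x2> and #size-cells = <0x2> as shown: each 64-bit value is split over two 32-bit cells, most significant first. The cells are taken here as host-order integers (in the flattened blob they are big-endian), and the names are hypothetical.

```c
#include <stdint.h>

struct sketch_region { uint64_t base, size; };

/* Join two 32-bit cells, most significant first. */
static uint64_t sketch_cells_to_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* reg = <addr_hi addr_lo size_hi size_lo> */
static struct sketch_region sketch_decode_reg(const uint32_t cells[4])
{
    struct sketch_region r = {
        sketch_cells_to_u64(cells[0], cells[1]),
        sketch_cells_to_u64(cells[2], cells[3]),
    };
    return r;
}
```

For the node shown, this gives a 0x1b0000-byte region starting at 0x83ffe50000 and ending exactly at 0x8400000000.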


'no-map' means it's doing the same thing to memblock as
'linux,usable-memory-range' did in earlier versions;
early_init_dt_reserve_memory_arch() takes no-map to mean memblock_remove().
We trigger the removing via early_init_fdt_scan_reserved_mem() in
arch/arm64/mm/init.c. This happens later than before, but it's before the
crashkernel and cma ranges get reserved.

One difference I can see is that before we avoided memblock_remove()ing ranges
that were also in memblock.nomap. This was to avoid the ACPI tables getting
mapped as device memory by mistake; this is fixed by [1]. Now these ranges are
published in /proc/iomem as 'reserved' and won't get covered by a
reserved-memory node, and so we don't need to check memblock.nomap when
memblock_remove()ing.
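The kexec-tools step described above (System RAM minus the crash kernel, each leftover piece becoming a no-map child of /reserved-memory) can be sketched as range subtraction. This assumes inclusive ranges as in /proc/iomem and ignores nested 'reserved' entries; the function and type names are hypothetical, not kexec-tools' own.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive range, matching the "start-end" format of /proc/iomem. */
struct sketch_iomem_range { uint64_t start, end; };

/*
 * For each System RAM range, write the parts lying outside the crash
 * kernel region to out (room for 2 * nram entries). Returns the count.
 */
static size_t sketch_ram_outside_crash(const struct sketch_iomem_range *ram, size_t nram,
                                       struct sketch_iomem_range crash,
                                       struct sketch_iomem_range *out)
{
    size_t n = 0;

    for (size_t i = 0; i < nram; i++) {
        struct sketch_iomem_range r = ram[i];

        if (r.end < crash.start || r.start > crash.end) {
            out[n++] = r;                        /* no overlap: hide it all */
            continue;
        }
        if (r.start < crash.start)               /* piece below the crash kernel */
            out[n++] = (struct sketch_iomem_range){ r.start, crash.start - 1 };
        if (r.end > crash.end)                   /* piece above the crash kernel */
            out[n++] = (struct sketch_iomem_range){ crash.end + 1, r.end };
    }
    return n;
}
```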


The only odd thing I can see is for a (mythical?) pure-ACPI system. The EFI stub
will create a DT with a chosen node containing pointers to the memory map and
the efi command line. Now such a system may also grow a /reserved-memory node
after kdump. I don't think this is a problem, but it may not match how an
acpi-only system reserves memory. (how does that work?)
[1] "arm64: mark reserved memblock regions explicitly in iomem"
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/450433.html
This is queued in Will's arm64/for-next/core,
[2] "efi: arm64: treat regions with WT/WC set but WB cleared as memory"
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/451491.html
This is queued in tip, but I can't see why kdump depends on it. It only has an
effect if the uefi memory map has !WB regions that linux needs to use.



Thanks,

James
Ard Biesheuvel
2016-09-16 20:17:33 UTC
Post by James Morse
(Cc: Ard),
Mark, Ard, how does/will reserved-memory work on an ACPI-only system?
It works by accident, at the moment. We used to ignore both
/memreserve/s and the /reserved-memory node, but due to some unrelated
refactoring, we ended up honouring the reserved-memory node when
booting via UEFI

I proposed some patches a while ago to at least check the
reservations, given that UEFI itself is unaware of them and may end up
occupying a region that should have been reserved.

http://article.gmane.org/gmane.linux.kernel.efi/6464
Post by James Morse
v26-specific note: After a comment from Rob[0], an idea of adding
"linux,usable-memory-range" was dropped. Instead, an existing
"reserved-memory" node will be used to limit usable memory ranges
on crash dump kernel.
This works not only on UEFI/ACPI systems but also on DT-only systems,
but if he really insists on using DT-specific "usable-memory" property,
I will post additional patches for kexec-tools. Those would be
redundant, though.
Even in that case, the kernel will not have to be changed.
Some narrative on how the old memory ranges get reserved, as there is no longer
any code in the series doing this (which is pretty neat!):
kexec-tools parses the list of memory ranges in /proc/iomem, and adds nodes under
/reserved-memory for the System RAM ranges that don't cover the crash kernel.
reserved-memory {
ranges;
#size-cells = <0x2>;
#address-cells = <0x2>;
***@83ffe50000 {
no-map;
reg = <0x83 0xffe50000 0x0 0x1b0000>;
};
[ ... ]
};
'no-map' means it's doing the same thing to memblock as
'linux,usable-memory-range' did in earlier versions;
early_init_dt_reserve_memory_arch() takes no-map to mean memblock_remove().
We trigger the removing via early_init_fdt_scan_reserved_mem() in
arch/arm64/mm/init.c. This happens later than before, but it's before the
crashkernel and cma ranges get reserved.
One difference I can see is that before we avoided memblock_remove()ing ranges
that were also in memblock.nomap. This was to avoid the ACPI tables getting
mapped as device memory by mistake; this is fixed by [1]. Now these ranges are
published in /proc/iomem as 'reserved' and won't get covered by a
reserved-memory node, and so we don't need to check memblock.nomap when
memblock_remove()ing.
The only odd thing I can see is for a (mythical?) pure-ACPI system. The EFI stub
will create a DT with a chosen node containing pointers to the memory map and
the efi command line. Now such a system may also grow a /reserved-memory node
after kdump. I don't think this is a problem, but it may not match how an
acpi-only system reserves memory. (how does that work?)
[1] "arm64: mark reserved memblock regions explicitly in iomem"
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/450433.html
This is queued in Will's arm64/for-next/core,
[2] "efi: arm64: treat regions with WT/WC set but WB cleared as memory"
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/451491.html
This is queued in tip, but I can't see why kdump depends on it. It only has an
effect if the uefi memory map has !WB regions that linux needs to use.
Thanks,
James
James Morse
2016-09-19 16:05:48 UTC
Post by Ard Biesheuvel
Post by James Morse
Mark, Ard, how does/will reserved-memory work on an ACPI-only system?
It works by accident, at the moment. We used to ignore both
/memreserve/s and the /reserved-memory node, but due to some unrelated
refactoring, we ended up honouring the reserved-memory node when
booting via UEFI
Okay, so kdump probably shouldn't rely on this behaviour...

For an acpi-only system, we could get reserve_crashkernel() to copy the uefi
memory map into the reserved region, changing the region types for existing
kernel memory to EfiReservedMemoryType (for example) and fixing up the reserved
region boundaries.

This second memory map could then be added alongside the real one in the
DT/chosen, and used in preference to it the second time we go through uefi_init() in
the crash kernel.
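A rough sketch of the memory-map copy proposed above, under several assumptions: the descriptor is a hypothetical, trimmed-down version of the UEFI one (no virtual address or attribute fields), the crash kernel region is given as a half-open range, and only EfiConventionalMemory is retyped. The type values 0 (EfiReservedMemoryType) and 7 (EfiConventionalMemory) are the ones the UEFI specification defines; everything else is illustrative. Descriptors straddling the region are split so that its boundaries are fixed up.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EFI_RESERVED_TYPE       0   /* EfiReservedMemoryType */
#define SKETCH_EFI_CONVENTIONAL_MEMORY 7   /* EfiConventionalMemory */
#define SKETCH_EFI_PAGE_SHIFT          12  /* EFI pages are 4KiB */

/* Hypothetical, simplified memory descriptor. */
struct sketch_efi_desc {
    uint32_t type;
    uint64_t phys_start;
    uint64_t num_pages;
};

/*
 * Copy the map, leaving conventional memory usable only inside
 * [crash_start, crash_end) and retyping the rest as reserved.
 * out needs room for 3 * n entries. Returns the number written.
 */
static size_t sketch_crash_memmap(const struct sketch_efi_desc *in, size_t n,
                                  uint64_t crash_start, uint64_t crash_end,
                                  struct sketch_efi_desc *out)
{
    size_t k = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t s = in[i].phys_start;
        uint64_t e = s + (in[i].num_pages << SKETCH_EFI_PAGE_SHIFT);  /* exclusive */
        uint64_t lo = s > crash_start ? s : crash_start;
        uint64_t hi = e < crash_end ? e : crash_end;

        if (in[i].type != SKETCH_EFI_CONVENTIONAL_MEMORY) {
            out[k++] = in[i];           /* firmware regions are kept as-is */
            continue;
        }
        if (lo >= hi) {                 /* outside the crash region entirely */
            out[k] = in[i];
            out[k].type = SKETCH_EFI_RESERVED_TYPE;
            k++;
            continue;
        }
        if (s < lo)                     /* piece below the crash region */
            out[k++] = (struct sketch_efi_desc){ SKETCH_EFI_RESERVED_TYPE, s,
                                                 (lo - s) >> SKETCH_EFI_PAGE_SHIFT };
        out[k++] = (struct sketch_efi_desc){ SKETCH_EFI_CONVENTIONAL_MEMORY, lo,
                                             (hi - lo) >> SKETCH_EFI_PAGE_SHIFT };
        if (hi < e)                     /* piece above the crash region */
            out[k++] = (struct sketch_efi_desc){ SKETCH_EFI_RESERVED_TYPE, hi,
                                                 (e - hi) >> SKETCH_EFI_PAGE_SHIFT };
    }
    return k;
}
```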

kexec-tools would still need to keep the '/reserved-memory' node for non-uefi
systems.

Doing this doesn't depend on userspace, and means the uefi memory map is still
the one and only true source of memory layout information. If fixing it like
this is valid I don't think it should block kdump.

... I will think about this some more before trying to put it together.



Thanks,

James
Ard Biesheuvel
2016-09-19 16:10:27 UTC
Post by James Morse
Post by Ard Biesheuvel
Post by James Morse
Mark, Ard, how does/will reserved-memory work on an ACPI-only system?
It works by accident, at the moment. We used to ignore both
/memreserve/s and the /reserved-memory node, but due to some unrelated
refactoring, we ended up honouring the reserved-memory node when
booting via UEFI
Okay, so kdump probably shouldn't rely on this behaviour...
No, but I would still like to keep /reserved-memory node support for
dynamic ranges, since they are guaranteed not to contain anything
'magic' left behind by the firmware. So if we keep /that/, keeping
static allocation support (with validation, as I proposed in the
series I quoted) is only a small step.
--
Ard.
Post by James Morse
For an acpi-only system, we could get reserve_crashkernel() to copy the uefi
memory map into the reserved region, changing the region types for existing
kernel memory to EfiReservedMemoryType (for example) and fixing up the reserved
region boundaries.
This second memory map could then be added alongside the real one in the
DT/chosen, and used in preference to it the second time we go through uefi_init() in
the crash kernel.
kexec-tools would still need to keep the '/reserved-memory' node for non-uefi
systems.
Doing this doesn't depend on userspace, and means the uefi memory map is still
the one and only true source of memory layout information. If fixing it like
this is valid I don't think it should block kdump.
... I will think about this some more before trying to put it together.
Thanks,
James
AKASHI Takahiro
2016-09-21 07:42:50 UTC
Post by James Morse
Post by Ard Biesheuvel
Post by James Morse
Mark, Ard, how does/will reserved-memory work on an ACPI-only system?
It works by accident, at the moment. We used to ignore both
/memreserve/s and the /reserved-memory node, but due to some unrelated
refactoring, we ended up honouring the reserved-memory node when
booting via UEFI
Okay, so kdump probably shouldn't rely on this behaviour...
For an ACPI-only system, we could get reserve_crashkernel() to copy the UEFI
memory map into the reserved region, changing the region types for existing
kernel memory to EfiReservedMemoryType (for example) and fixing up the reserved
region boundaries.
This second memory map could then be added alongside the real one in the
DT/chosen node, and used in preference to it the second time we go through
uefi_init() in the crash kernel.
Do we need to add this map as a second one?
Why not replace "linux,uefi-mmap-start" in a new blob?
Post by James Morse
kexec-tools would still need to keep the '/reserved-memory' node for non-UEFI
systems.
Yeah, but if we go our own way on UEFI/ACPI systems, we may want to
go a DT-specific way, as PPC does, on DT systems.
(That is, "linux,usable-memory" in memory nodes.)

Thanks,
-Takahiro AKASHI
Post by James Morse
Doing this doesn't depend on userspace, and means the UEFI memory map is still
the one and only true source of memory layout information. If fixing it like
this is valid, I don't think it should block kdump.
... I will think about this some more before trying to put it together.
Thanks,
James
AKASHI Takahiro
2016-09-21 07:33:43 UTC
James,
Post by James Morse
(Cc: Ard),
Mark, Ard, how does/will reserved-memory work on an ACPI-only system?
v26-specific note: After a comment from Rob[0], an idea of adding
"linux,usable-memory-range" was dropped. Instead, an existing
"reserved-memory" node will be used to limit usable memory ranges
on crash dump kernel.
This works not only on UEFI/ACPI systems but also on DT-only systems,
but if he really insists on using DT-specific "usable-memory" property,
I will post additional patches for kexec-tools. Those would be
redundant, though.
Even in that case, the kernel will not have to be changed.
Some narrative on how the old memory ranges get reserved, as there is no longer
Thank you for the detailed explanation :)
I was wondering whether I should have added such a description,
but I believed it was nothing more than "normal" DT behavior.
Post by James Morse
kexec-tools parses the list of memory ranges in /proc/iomem, and adds a node to
/reserved-memory for the System RAM ranges that don't cover the crash kernel.
reserved-memory {
ranges;
#size-cells = <0x2>;
#address-cells = <0x2>;
no-map;
reg = <0x83 0xffe50000 0x0 0x1b0000>;
};
[ ... ]
};
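The kexec-tools step described above amounts to subtracting the crash kernel range from each System RAM range. The sketch below shows only that arithmetic, on inclusive [start, end] ranges as /proc/iomem prints them. The struct and function names are hypothetical, and this is not the kexec-tools implementation. The second helper shows how a 64-bit value is split into the two cells that reg uses when #address-cells and #size-cells are both <0x2>. This is how the example node's reg = <0x83 0xffe50000 0x0 0x1b0000> encodes base 0x83ffe50000 and size 0x1b0000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A physical range [start, end], inclusive, as printed in /proc/iomem. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical helper: for each System RAM range, emit the parts lying
 * outside the crash kernel range; each piece would become a no-map child
 * of /reserved-memory. 'out' needs room for 2 * n_ram entries.
 */
static size_t ranges_outside_crashkernel(const struct range *ram, size_t n_ram,
					  struct range crash, struct range *out)
{
	size_t n = 0;

	assert(crash.start <= crash.end);
	for (size_t i = 0; i < n_ram; i++) {
		struct range r = ram[i];

		if (crash.end < r.start || crash.start > r.end) {
			out[n++] = r;		/* no overlap: hide all of it */
			continue;
		}
		if (r.start < crash.start)	/* piece below the crash kernel */
			out[n++] = (struct range){ r.start, crash.start - 1 };
		if (r.end > crash.end)		/* piece above the crash kernel */
			out[n++] = (struct range){ crash.end + 1, r.end };
	}
	return n;
}

/* Split a 64-bit value into two 32-bit cells, most significant first,
 * as used when #address-cells = #size-cells = <0x2>. */
static void to_two_cells(uint64_t v, uint32_t cells[2])
{
	cells[0] = (uint32_t)(v >> 32);
	cells[1] = (uint32_t)v;
}
```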
'no-map' means it's doing the same thing to memblock as
'linux,usable-memory-range' did in earlier versions:
early_init_dt_reserve_memory_arch() takes no-map to mean memblock_remove().
We trigger the removal via early_init_fdt_scan_reserved_mem() in
arch/arm64/mm/init.c. This happens later than before, but it's before the
crashkernel and CMA ranges get reserved.
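Below is a rough model of the difference between a no-map region and an ordinary reserved-memory region, following the description above (early_init_dt_reserve_memory_arch() treating no-map as memblock_remove()). It is a toy, not the memblock API: struct toy_memblock and toy_reserve_memory_arch() are hypothetical, and they track byte totals instead of region lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for memblock's "memory" and "reserved" lists (totals only). */
struct toy_memblock {
	uint64_t memory;	/* RAM the kernel knows about and will map */
	uint64_t reserved;	/* part of that RAM the allocator must not use */
};

/* no-map: drop the range from memory altogether (like memblock_remove());
 * otherwise: keep it as RAM but mark it reserved (like memblock_reserve()). */
static int toy_reserve_memory_arch(struct toy_memblock *mb, uint64_t size,
				   bool nomap)
{
	if (size > mb->memory)
		return -1;
	if (nomap)
		mb->memory -= size;
	else
		mb->reserved += size;
	return 0;
}
```

The distinction matters for kdump: in this model a removed range simply stops being RAM for the crash kernel, whereas a merely reserved range would still be part of its memory map (and, in the real kernel, still mapped).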
One difference I can see is that, before, we avoided memblock_remove()ing ranges
that were also in memblock.nomap. This was to avoid the ACPI tables getting
mapped as device memory by mistake; this is fixed by [1]. Now these ranges are
published in /proc/iomem as 'reserved' and won't get covered by a
reserved-memory node, so we don't need to check memblock.nomap when
memblock_remove()ing.
The only odd thing I can see is for a (mythical?) pure-ACPI system. The EFI stub
will create a DT with a chosen node containing pointers to the memory map and
the EFI command line. Such a system may now also grow a /reserved-memory node
after kdump. I don't think this is a problem, but it may not match how an
ACPI-only system reserves memory. (How does that work?)
I didn't get what you mean by "may grow a /reserved-memory after kdump."
Post by James Morse
[1] "arm64: mark reserved memblock regions explicitly in iomem"
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/450433.html
This is queued in Will's arm64/for-next/core.
[2] "efi: arm64: treat regions with WT/WC set but WB cleared as memory"
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/451491.html
This is queued in tip, but I can't see why kdump depends on it. It only has an
effect if the UEFI memory map has !WB regions that Linux needs to use.
I referenced it only because you said that the patch had fixed your problem on Seattle.
If I misunderstood, I can remove this reference from
my commit message.

Thanks,
-Takahiro AKASHI
Post by James Morse
Thanks,
James
Manish Jaggi
2016-10-03 07:54:34 UTC
Hi Akashi,
v26-specific note: After a comment from Rob[0], an idea of adding
"linux,usable-memory-range" was dropped. Instead, an existing
"reserved-memory" node will be used to limit usable memory ranges
on crash dump kernel.
This works not only on UEFI/ACPI systems but also on DT-only systems,
but if he really insists on using DT-specific "usable-memory" property,
I will post additional patches for kexec-tools. Those would be
redundant, though.
Even in that case, the kernel will not have to be changed.
This patch series adds kdump support on arm64.
There are some prerequisite patches [1],[2].
To load a crash-dump kernel to the systems, a series of patches to
kexec-tools, which have not yet been merged upstream, are needed.
Please always use my latest kdump patches, v3 [3].
To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
- crash utility (coming v7.1.6 or later) [4]
(Necessary patches have already been queued in the master.)
[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/452582.html
[1] "arm64: mark reserved memblock regions explicitly in iomem"
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/450433.html
[2] "efi: arm64: treat regions with WT/WC set but WB cleared as memory"
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/451491.html
[3] T.B.D.
[4] https://github.com/crash-utility/crash.git
With the v26 kdump patches, v3 kexec-tools and top-of-tree crash.git, below are the tests I ran.
Attached is a patch to crash.git (symbols.c) that makes the crash utility work on my setup.
Can you please have a look and provide your comments?

To generate a panic, I have a kernel module which calls panic() on init.

Observations:
1.1 Dump-capture kernel shows a different memory map
---------------------------------------------------
In the dump-capture kernel, /proc/meminfo and /proc/iomem differ:

***@arm64:/home/ubuntu/CODE/crash#
MemTotal: 65882432 kB
MemFree: 65507136 kB
MemAvailable: 60373632 kB
Buffers: 29248 kB
Cached: 46720 kB
SwapCached: 0 kB
Active: 63872 kB
Inactive: 19776 kB
Active(anon): 8256 kB
Inactive(anon): 7616 kB

The first kernel is booted with the mem=2G crashkernel=1G command-line options,
while the system has 64G of memory.

***@arm64:/home/ubuntu/CODE/crash# cat /proc/iomem
41400000-fffeffff : System RAM
41480000-420cffff : Kernel code
42490000-4278ffff : Kernel data
ffff0000-ffffffff : reserved
100000000-ffaa7ffff : System RAM
ffaa80000-ffaabffff : reserved
ffaac0000-fffa6ffff : System RAM
fffa70000-fffacffff : reserved
fffad0000-fffffffff : System RAM
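To cross-check the two files, the System RAM lines of the /proc/iomem listing above can simply be summed. The sketch below parses lines in the "start-end : name" format and adds up the top-level System RAM ranges. The function name is hypothetical, and the input is the listing as posted.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Sum the sizes of top-level "System RAM" lines in /proc/iomem-style text
 * and return the total in kB. Indented child lines (e.g. "Kernel code")
 * and other resource names are skipped.
 */
static uint64_t system_ram_kb(const char *const *lines, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t start, end;
		char name[64];

		if (lines[i][0] == ' ')
			continue;	/* nested resource */
		if (sscanf(lines[i], "%" SCNx64 "-%" SCNx64 " : %63[^\n]",
			   &start, &end, name) != 3)
			continue;	/* not a resource line */
		if (strcmp(name, "System RAM") == 0)
			total += end - start + 1;	/* ranges are inclusive */
	}
	return total / 1024;
}
```

For the listing above this comes to 66039104 kB (about 63 GiB), slightly more than the MemTotal of 65882432 kB. MemTotal presumably excludes memory the kernel reserves for itself at boot. So the two files appear to agree with each other: both describe roughly 63G, not the 1G crashkernel region.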

1.2 Live crash dump fails with an error
--------------------------------------
$ crash vmlinux

crash 7.1.5++
Copyright (C) 2002-2016 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-unknown-linux-gnu"...

crash: read error: kernel virtual address: ffff800ffffffcc0 type: "pglist node_id"

Observation 2
------------
If a saved vmcore file is used:

$ crash vmlinux vmcore_saved
I got the error below.

please wait... (gathering module symbol data)crash: malloc.c:2846: mremap_chunk: Assertion `((size + offset) & (_rtld_global_ro._dl_pagesize - 1)) == 0' failed.
Aborted

Experiment 3
------------
If crash.git is modified with a hack patch to symbols.c, the crash utility works fine: the log and bt commands work.
-------------------
Patch: symbols.c
git diff symbols.c
diff --git a/symbols.c b/symbols.c
index 13282f4..f7c6cac 100644
--- a/symbols.c
+++ b/symbols.c
@@ -2160,6 +2160,7 @@ store_module_kallsyms_v2(struct load_module *lm, int start
 		FREEBUF(module_buf);
 		return 0;
 	}
+	lm->mod_init_size = 0;
 
 	if (lm->mod_init_size > 0) {
 		module_buf_init = GETBUF(lm->mod_init_size);
------------------

$ crash vmlinux vmcore_saved
KERNEL: /home/ubuntu/CODE/linux/vmlinux
DUMPFILE: vm
CPUS: 48 [OFFLINE: 46]
DATE: Mon Oct 3 00:11:47 2016
UPTIME: 00:02:41
LOAD AVERAGE: 0.36, 0.14, 0.05
TASKS: 171
NODENAME: arm64
RELEASE: 4.8.0-rc3-00044-g070a615-dirty
VERSION: #63 SMP Sat Oct 1 01:39:45 PDT 2016
MACHINE: aarch64 (unknown Mhz)
MEMORY: 2 GB
PANIC: "Kernel panic - not syncing: crash module starting"
PID: 958
COMMAND: "insmod"
TASK: ffff800007859300 [THREAD_INFO: ffff80000c940000]
CPU: 0
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 958 TASK: ffff800007859300 CPU: 0 COMMAND: "insmod"
#0 [ffff80000c943980] __crash_kexec at ffff000008144fe8
#1 [ffff80000c943ae0] panic at ffff0000081ae704
#2 [ffff80000c943ba0] init_module at ffff000000900014 [crash]
#3 [ffff80000c943bb0] do_one_initcall at ffff000008083bb4
#4 [ffff80000c943c40] do_init_module at ffff0000081af6f0
#5 [ffff80000c943c70] load_module at ffff000008140b7c
#6 [ffff80000c943e10] sys_finit_module at ffff000008141634
#7 [ffff80000c943ed0] el0_svc_naked at ffff0000080833ec
PC: 00000003 LR: ffffaca050a0 SP: ffffaca865a0 PSTATE: 00000111
X12: ffffac941a5c X11: 00000080 X10: 00000004 X9: 00000030
X8: ffffffff X7: fefefefefefeff40 X6: 00000111 X5: 00000001
X4: 00000001 X3: 0002ed61 X2: 00000000 X1: 00000003
X0: 00000000
crash>


---
Thanks,
manish
AKASHI Takahiro
2016-10-03 11:04:25 UTC
Manish,
Post by James Morse
Hi Akashi,
[ ... ]
With the v26 kdump and v3 kexec-tools and top of tree crash.git, below are the tests done
Attached is a patch in crash.git (symbols.c) to make crash utility work on my setup.
Can you please have a look and provide your comments.
To generate a panic, i have a kernel module which on init calls panic.
1.1. Dump capture kernel shows different memory map.
---------------------------------------------------
In dump capture kernel /proc/meminfo and /proc/iomem differ
MemTotal: 65882432 kB
MemFree: 65507136 kB
MemAvailable: 60373632 kB
Buffers: 29248 kB
Cached: 46720 kB
SwapCached: 0 kB
Active: 63872 kB
Inactive: 19776 kB
Active(anon): 8256 kB
Inactive(anon): 7616 kB
First kernel is booted with mem=2G crashkernel=1G command line option.
While the system has 64G memory.
41400000-fffeffff : System RAM
41480000-420cffff : Kernel code
42490000-4278ffff : Kernel data
ffff0000-ffffffff : reserved
100000000-ffaa7ffff : System RAM
ffaa80000-ffaabffff : reserved
ffaac0000-fffa6ffff : System RAM
fffa70000-fffacffff : reserved
fffad0000-fffffffff : System RAM
Are you saying that "mem=..." doesn't have any effect?
What if you don't specify "crashkernel=..."?
Post by James Morse
1.2 Live crash dump fails with error
--------------------------------------
$crash vmlinux
crash 7.1.5++
[ ... ]
crash: read error: kernel virtual address: ffff800ffffffcc0 type: "pglist node_id"
I have no idea here.
Post by James Morse
Observation 2
------------
If saved vmcore file is used
$crash vmlinux vmcore_saved
Got the below error.
please wait... (gathering module symbol data)crash: malloc.c:2846: mremap_chunk: Assertion `((size + offset) & (_rtld_global_ro._dl_pagesize - 1)) == 0' failed.
Aborted
I have no idea here.
Post by James Morse
Experiment 3
------------
If crash.git is modified with a hack patch in symbols.c. Crash utility works fine log, bt commands work.
In which case: "crash vmlinux" or "crash vmlinux vmcore_saved"?

I was able to reproduce this issue in the latter case
(but with a different error message).
It seems to be a bug in the crash utility.
Please report it to the crash-utility mailing list.
I will post a patch.

Thanks,
-Takahiro AKASHI
Manish Jaggi
2016-10-03 12:41:40 UTC
Post by AKASHI Takahiro
Manish,
Post by James Morse
Hi Akashi,
[ ... ]
With the v26 kdump and v3 kexec-tools and top of tree crash.git, below are the tests done
Attached is a patch in crash.git (symbols.c) to make crash utility work on my setup.
Can you please have a look and provide your comments.
To generate a panic, i have a kernel module which on init calls panic.
1.1. Dump capture kernel shows different memory map.
---------------------------------------------------
In dump capture kernel /proc/meminfo and /proc/iomem differ
MemTotal: 65882432 kB
MemFree: 65507136 kB
MemAvailable: 60373632 kB
Buffers: 29248 kB
Cached: 46720 kB
SwapCached: 0 kB
Active: 63872 kB
Inactive: 19776 kB
Active(anon): 8256 kB
Inactive(anon): 7616 kB
First kernel is booted with mem=2G crashkernel=1G command line option.
While the system has 64G memory.
41400000-fffeffff : System RAM
41480000-420cffff : Kernel code
42490000-4278ffff : Kernel data
ffff0000-ffffffff : reserved
100000000-ffaa7ffff : System RAM
ffaa80000-ffaabffff : reserved
ffaac0000-fffa6ffff : System RAM
fffa70000-fffacffff : reserved
fffad0000-fffffffff : System RAM
Are you saying that "mem=..." doesn't have any effect?
What I am saying is that if the first kernel is booted with the mem= and crashkernel= options,
the memory for the second kernel has to be within the crashkernel size.
According to the System RAM entries in /proc/iomem the information is correct, but /proc/meminfo shows total memory
much larger than the first kernel had in the first place.
Post by AKASHI Takahiro
What about if you don't specify "crashkernel=...?"
In that case the second kernel will not boot, as kexec-tools complains that the memory is not reserved.
Post by AKASHI Takahiro
Post by James Morse
1.2 Live crash dump fails with error
--------------------------------------
$crash vmlinux
crash 7.1.5++
[ ... ]
crash: read error: kernel virtual address: ffff800ffffffcc0 type: "pglist node_id"
I have no idea here.
If I run with debug logs, the physical address accessed is above 64G (0x10413ffcc0).
It could be that 0x10413ffcc0 = 64G + addr, and that addr alone was what was actually required:
addr = 0x413ffcc0, which seems in line with 0x424b0c50.


Logs:
<read_dev_mem: addr: ffff0000090b3008 paddr: 424b3008 cnt: 8>
node_online_map: [1] -> nodes online: 1
<readmem: ffff0000090b0c50, KVADDR, "node_data", 8, (ROE), ffffc330eb00>
<read_dev_mem: addr: ffff0000090b0c50 paddr: 424b0c50 cnt: 8>
<readmem: ffff800ffffffcc0, KVADDR, "pglist node_id", 4, (FOE), ffffc330f1e4>
<read_dev_mem: addr: ffff800ffffffcc0 paddr: 10413ffcc0 cnt: 4>
/dev/mem: Bad address
crash: read(/dev/mem, 10413ffcc0, 4): 4294967295 (ffffffff)
crash: read error: kernel virtual address: ffff800ffffffcc0 type: "pglist node_id"
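The failing address in the log can be reproduced with the arm64 linear-map arithmetic pa = va - PAGE_OFFSET + PHYS_OFFSET. The sketch makes two assumptions. First, PAGE_OFFSET = 0xffff800000000000 (48-bit VAs, as in the 4.8 kernel). Second, crash used PHYS_OFFSET = 0x41400000, the start of the first System RAM range in the kdump kernel's /proc/iomem quoted earlier. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: start of the arm64 linear map with 48-bit VAs (4.8-era). */
#define ASSUMED_PAGE_OFFSET 0xffff800000000000ULL

/* Linear-map translation: pa = va - PAGE_OFFSET + PHYS_OFFSET. */
static uint64_t linear_va_to_pa(uint64_t va, uint64_t phys_offset)
{
	assert(va >= ASSUMED_PAGE_OFFSET);	/* only valid for linear-map VAs */
	return va - ASSUMED_PAGE_OFFSET + phys_offset;
}
```

Under those assumptions, ffff800ffffffcc0 maps to 0x10413ffcc0, that is 64G (0x1000000000) + 0x413ffcc0. This is past the last System RAM byte listed earlier (0xfffffffff), which would explain why the /dev/mem read fails.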
Post by AKASHI Takahiro
Post by James Morse
Observation 2
------------
If saved vmcore file is used
$crash vmlinux vmcore_saved
Got the below error.
please wait... (gathering module symbol data)crash: malloc.c:2846: mremap_chunk: Assertion `((size + offset) & (_rtld_global_ro._dl_pagesize - 1)) == 0' failed.
Aborted
I have no idea here.
Post by James Morse
Experiment 3
------------
If crash.git is modified with a hack patch in symbols.c. Crash utility works fine log, bt commands work.
In which case, "crash vmlinux" or "crash vmlinux vmcore_saved?"
vmcore_saved
Post by AKASHI Takahiro
I was able to reproduce this issue in the latter case
(but with a different error message).
It seems to be a crash util's bug.
Please report it to crash-util mailing list.
I will post a patch.
The same patch as below?
Can you please share your patch?
AKASHI Takahiro
2016-10-04 02:56:58 UTC
Post by Manish Jaggi
Post by AKASHI Takahiro
Manish,
Post by James Morse
Hi Akashi,
[ ... ]
With the v26 kdump and v3 kexec-tools and top of tree crash.git, below are the tests done
Attached is a patch in crash.git (symbols.c) to make crash utility work on my setup.
Can you please have a look and provide your comments.
To generate a panic, i have a kernel module which on init calls panic.
1.1. Dump capture kernel shows different memory map.
---------------------------------------------------
In dump capture kernel /proc/meminfo and /proc/iomem differ
MemTotal: 65882432 kB
MemFree: 65507136 kB
MemAvailable: 60373632 kB
Buffers: 29248 kB
Cached: 46720 kB
SwapCached: 0 kB
Active: 63872 kB
Inactive: 19776 kB
Active(anon): 8256 kB
Inactive(anon): 7616 kB
First kernel is booted with mem=2G crashkernel=1G command line option.
While the system has 64G memory.
41400000-fffeffff : System RAM
41480000-420cffff : Kernel code
42490000-4278ffff : Kernel data
ffff0000-ffffffff : reserved
100000000-ffaa7ffff : System RAM
ffaa80000-ffaabffff : reserved
ffaac0000-fffa6ffff : System RAM
fffa70000-fffacffff : reserved
fffad0000-fffffffff : System RAM
Are you saying that "mem=..." doesn't have any effect?
What I am saying is that if the first kernel is booted with the mem= and crashkernel= options,
the memory for the second kernel has to be within the crashkernel size.
According to the System RAM entries in /proc/iomem the information is correct, but /proc/meminfo shows total memory
much larger than the first kernel had in the first place.
Post by AKASHI Takahiro
What about if you don't specify "crashkernel=...?"
In that case the second kernel will not boot as kexec tools will complain that memory not reserved.
Post by AKASHI Takahiro
Post by James Morse
1.2 Live crash dump fails with error
--------------------------------------
$crash vmlinux
crash 7.1.5++
[ ... ]
crash: read error: kernel virtual address: ffff800ffffffcc0 type: "pglist node_id"
I have no idea here.
If I run with debug logs phys address accessed is > 64G. (10413ffcc0)
Could be that somehow 64 + 1G + (addr) = 10413ffcc0 and actually addr was required.
addr = 413ffcc0 which seems in line with 424b0c50
<read_dev_mem: addr: ffff0000090b3008 paddr: 424b3008 cnt: 8>
node_online_map: [1] -> nodes online: 1
<readmem: ffff0000090b0c50, KVADDR, ""node_data"", 8, (ROE), ffffc330eb00>
<read_dev_mem: addr: ffff0000090b0c50 paddr: 424b0c50 cnt: 8>
<readmem: ffff800ffffffcc0, KVADDR, ""pglist node_id"", 4, (FOE), ffffc330f1e4>
<read_dev_mem: addr: ffff800ffffffcc0 paddr: 10413ffcc0 cnt: 4>
/dev/mem: Bad address
crash: read(/dev/mem, 10413ffcc0, 4): 4294967295 (ffffffff)
crash: read error: kernel virtual address: ffff800ffffffcc0 type: ""pglist node_id""
"
Post by AKASHI Takahiro
Post by James Morse
Observation 2
------------
If saved vmcore file is used
$crash vmlinux vmcore_saved
Got the below error.
please wait... (gathering module symbol data)crash: malloc.c:2846: mremap_chunk: Assertion `((size + offset) & (_rtld_global_ro._dl_pagesize - 1)) == 0' failed.
Aborted
I have no idea here.
Post by James Morse
Experiment 3
------------
If crash.git is modified with a hack patch in symbols.c. Crash utility works fine log, bt commands work.
In which case, "crash vmlinux" or "crash vmlinux vmcore_saved?"
vmcore_saved
Post by AKASHI Takahiro
I was able to reproduce this issue in the latter case
(but with a different error message).
It seems to be a crash util's bug.
Please report it to crash-util mailing list.
I will post a patch.
The same patch as below ?
No.
Post by Manish Jaggi
Can you please share your patch
I submitted a bug fix patch. See:
https://www.redhat.com/archives/crash-utility/2016-October/msg00000.html

-Takahiro AKASHI
James Morse
2016-10-04 09:46:27 UTC
Hi Manish,
Post by Manish Jaggi
Post by AKASHI Takahiro
Post by Manish Jaggi
With the v26 kdump and v3 kexec-tools and top of tree crash.git, below are the tests done
Attached is a patch in crash.git (symbols.c) to make crash utility work on my setup.
Can you please have a look and provide your comments.
To generate a panic, I have a kernel module which calls panic() on init.
... modules ... I haven't tested that. I bet it causes some problems!
We probably need to include module_alloc_base as an elf note in the vmcore file...
Post by Manish Jaggi
Post by AKASHI Takahiro
Post by Manish Jaggi
First kernel is booted with mem=2G crashkernel=1G command line option.
While the system has 64G memory.
Are you saying that "mem=..." doesn't have any effect?
What I am saying is that if the first kernel is booted using the mem= and crashkernel= options,
the memory for the second kernel has to be within the crashkernel size.
As per the System RAM entries in /proc/iomem the information is correct, but /proc/meminfo shows total memory
much larger than the first kernel had in the first place.
So your second crashkernel has 63G of memory? Unless you provide the same 'mem='
to the kdump kernel, this is the expected behaviour. The
DT:/reserved-memory/crash_dump describes the memory not to use.

On your first boot with 'mem=2G' memblock_mem_limit_remove_map() called from
arm64_memblock_init() removed the top 62G of memory. Neither the first kernel
nor kexec-tools know about the top 62G.
When you run kexec-tools, it describes what it sees in /proc/iomem in the
DT:/reserved-memory/crash_dump, which is just the remaining 1G of memory.

When we crash and reboot, the crash kernel discovers all 64G of memory from the
EFI memory map.
kexec-tools described the 1G of memory that the first kernel was using in the
DT:/reserved-memory/crash_dump node, so early_init_fdt_scan_reserved_mem()
reserves the 1G of memory the first kernel used. This leaves us with 63G of memory.

This may change with the next version of kdump if it switches back to using
DT:/chosen/linux,usable-memory-range.
If you need v26 to avoid the top 62G of memory, you need to provide the same
'mem=' to the first and second kernel.
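A minimal C sketch of the arithmetic above, under stated assumptions: it is a simplified model, not kernel code; the helpers `seen_by_kernel()` and `kdump_usable()` are hypothetical; a `mem=` value of 0 stands for "no mem= passed"; and, as in the layouts discussed in this thread, the first kernel's own memory is assumed to sit inside whatever range the kdump kernel keeps.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Memory a kernel ends up with, given what firmware reports and an
 * optional 'mem=' limit (0 means no 'mem=' was passed). */
static uint64_t seen_by_kernel(uint64_t installed, uint64_t mem_limit)
{
	if (mem_limit != 0 && mem_limit < installed)
		return mem_limit;
	return installed;
}

/* Hypothetical model of the kdump kernel's usable memory with v26:
 * kexec-tools describes the first kernel's own memory (everything it
 * saw except the crashkernel reservation) in
 * DT:/reserved-memory/crash_dump, and the kdump kernel subtracts that
 * from whatever it discovers itself. */
static uint64_t kdump_usable(uint64_t installed, uint64_t first_mem,
			     uint64_t crashkernel, uint64_t kdump_mem)
{
	uint64_t first_seen = seen_by_kernel(installed, first_mem);
	uint64_t kdump_seen = seen_by_kernel(installed, kdump_mem);
	uint64_t first_used;

	assert(crashkernel <= first_seen);
	first_used = first_seen - crashkernel;

	/* Assumption: the reserved range lies within what kdump sees. */
	if (first_used >= kdump_seen)
		return 0;
	return kdump_seen - first_used;
}
```

With the figures from this thread, `kdump_usable(64 * GiB, 2 * GiB, 1 * GiB, 0)` gives 63G, the number James arrives at above, while passing `mem=2G` to both kernels gives 1G; an 8G board with the same settings also gives 1G, consistent with the juno-r1 pattern James reports later. Under these assumptions a kdump `mem=1G` leaves the model with nothing usable at all.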
Post by Manish Jaggi
Post by AKASHI Takahiro
Post by Manish Jaggi
1.2 Live crash dump fails with error
... do we expect this to work? I don't think it has anything to do with this
series...


Thanks,

James
Manish Jaggi
2016-10-04 10:05:43 UTC
Post by James Morse
On your first boot with 'mem=2G' memblock_mem_limit_remove_map() called from
arm64_memblock_init() removed the top 62G of memory. Neither the first kernel
nor kexec-tools know about the top 62G.
When you run kexec-tools, it describes what it sees in /proc/iomem in the
DT:/reserved-memory/crash_dump, which is just the remaining 1G of memory.
When we crash and reboot, the crash kernel discovers all 64G of memory from the
EFI memory map.
So should iomem and meminfo be the same or different for the second kernel?
Also, I assumed that crashkernel=1G should restrict the second kernel to 1G.
This is my understanding from the description. It should not require a second mem= option.
Post by James Morse
kexec-tools described the 1G of memory that the first kernel was using in the
DT:/reserved-memory/crash_dump node, so early_init_fdt_scan_reserved_mem()
reserves the 1G of memory the first kernel used. This leaves us with 63G of memory.
This may change with the next version of kdump if it switches back to using
DT:/chosen/linux,usable-memory-range.
If you need v26 to avoid the top 62G of memory, you need to provide the same
'mem=' to the first and second kernel.
If I provide it for the second kernel, I don't see any prints after "Bye".
Have you tried this at all?
Post by James Morse
Post by Manish Jaggi
Post by AKASHI Takahiro
Post by Manish Jaggi
1.2 Live crash dump fails with error
... do we expect this to work? I don't think it has anything to do with this
series...
Why shouldn't it?
I saved the vmcore file while in the second kernel. Since crash didn't run without a vmcore file,
I tried it with the vmcore file and it worked. It's just that if you want to boot a second kernel
with a read-only file system and no network, live crash dump analysis is handy.
James Morse
2016-10-04 10:53:36 UTC
Hi Manish,
Post by Manish Jaggi
So should iomem and meminfo be the same or different for the second kernel?
Also, I assumed that crashkernel=1G should restrict the second kernel to 1G.
Not with v26 of this series. What should it do with the 62G of memory that was
removed by booting with 'mem=2G'? It isn't part of the crashkernel reserved
area, and it isn't part of the vmcore described in elfcorehdr either...
Post by Manish Jaggi
This is my understanding from the description. It should not require a second mem= option.
If I provide it for the second kernel, I don't see any prints after "Bye".
Have you tried this at all?
Yes, on juno-r1 passing 'mem=2G' to both the first and second kernel causes only
the first 2G of memory to be used with this pattern:
first kernel: [1G used for linux] [1G reserved for Crash kernel] [6G memory
hidden]
kdump kernel: [1G vmcore] [1G used for linux] [6G memory hidden]
Post by Manish Jaggi
Why shouldn't it?
I saved the vmcore file while in the second kernel. Since crash didn't run without a vmcore file,
I tried it with the vmcore file and it worked. It's just that if you want to boot a second kernel
with a read-only file system and no network, live crash dump analysis is handy.
Ah, you want to run /usr/bin/crash with the kdump boot of linux. You still need
to tell it where to find the memory image: "crash /path/to/vmlinux /proc/vmcore"
should do the trick.
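In the kdump kernel, /proc/vmcore is exported as an ELF core file, which is why crash can read it directly next to vmlinux. As a hedged illustration, the C sketch below checks that a file such as /proc/vmcore (or a saved copy) starts with a 64-bit ELF header of type `ET_CORE` before it is handed to crash. The helper names `is_elf64_core()` and `file_is_elf64_core()` are hypothetical, and the code assumes a Linux/glibc `<elf.h>`.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if buf holds a 64-bit ELF header of type ET_CORE, else 0.
 * Fields are read in host byte order, which matches when the dump is
 * examined on the machine that produced it (e.g. /proc/vmcore). */
static int is_elf64_core(const unsigned char *buf, size_t len)
{
	Elf64_Ehdr ehdr;

	if (len < sizeof(ehdr))
		return 0;
	memcpy(&ehdr, buf, sizeof(ehdr));	/* avoid unaligned access */

	if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
		return 0;
	if (ehdr.e_ident[EI_CLASS] != ELFCLASS64)
		return 0;
	return ehdr.e_type == ET_CORE;
}

/* Read just the ELF header of a file such as /proc/vmcore. */
static int file_is_elf64_core(const char *path)
{
	unsigned char buf[sizeof(Elf64_Ehdr)];
	size_t n;
	FILE *f;

	assert(path != NULL);
	f = fopen(path, "rb");
	if (f == NULL)
		return 0;
	n = fread(buf, 1, sizeof(buf), f);
	fclose(f);
	return is_elf64_core(buf, n);
}
```

Inside the kdump kernel one would expect `file_is_elf64_core("/proc/vmcore")` to return 1, after which `crash /path/to/vmlinux /proc/vmcore` can be run as suggested above. The check only looks at the header; it says nothing about whether the dump contents are complete.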


Thanks,

James
Manish Jaggi
2016-10-04 13:23:28 UTC
Post by James Morse
Yes, on juno-r1 passing 'mem=2G' to both the first and second kernel causes only
the first 2G of memory to be used with this pattern:
first kernel: [1G used for linux] [1G reserved for Crash kernel] [6G memory
hidden]
kdump kernel: [1G vmcore] [1G used for linux] [6G memory hidden]
Oh, ok!
I was giving mem=1G to the crash kernel when testing. With mem=2G it works.
Post by James Morse
Ah, you want to run /usr/bin/crash with the kdump boot of linux. You still need
to tell it where to find the memory image: "crash /path/to/vmlinux /proc/vmcore"
should do the trick.
We should fix the kdump documentation then.
Since it is not supported, it should be removed.
AKASHI Takahiro
2016-10-05 05:48:51 UTC
Manish,
Post by Manish Jaggi
Oh, ok!
I was giving mem=1G to the crash kernel when testing. With mem=2G it works.
I didn't know that you specified "mem=1G" in our local discussions ...
Post by Manish Jaggi
We should fix the kdump documentation then.
Since it is not supported, it should be removed.
Remove what?

And can you please double-check if you still have any problem
on a live system or with a saved core file?
(except for "mem=" stuff)

-Takahiro AKASHI
AKASHI Takahiro
2016-10-05 05:41:12 UTC
Post by James Morse
Hi Manish,
Post by Manish Jaggi
Post by AKASHI Takahiro
Post by Manish Jaggi
With the v26 kdump and v3 kexec-tools and top of tree crash.git, below are the tests done
Attached is a patch in crash.git (symbols.c) to make crash utility work on my setup.
Can you please have a look and provide your comments.
To generate a panic, I have a kernel module which calls panic() on init.
... modules ... I haven't tested that. I bet it causes some problems!
We probably need to include module_alloc_base as an elf note in the vmcore file...
No, I don't think so :)
I created a test module like the one Manish described and tested kdump:
(My kernel here even enables KASLR.)
===8<===
$ crash vmlinux vmcore
...
please wait... (gathering module symbol data)
...
crash> mod -S
MODULE NAME SIZE OBJECT FILE
ffff04d78f4b8000 testmod 16384 /opt/buildroot/15.11_64/root/kexec/testmod.ko
crash> bt
PID: 1102 TASK: ffffb4da8e910000 CPU: 0 COMMAND: "insmod"
#0 [ffffb4da8e9afa30] __crash_kexec at ffff0e0045020a54
#1 [ffffb4da8e9afb90] panic at ffff0e004505523c
#2 [ffffb4da8e9afc50] testmod_init at ffff04d78f4b6014 [testmod]
#3 [ffffb4da8e9afb40] do_one_initcall at ffff0e0044f7333c
--- <Exception in user> ---
PC: 0000000a LR: 00000000 SP: ffff04d78f4b6000 PSTATE: 7669726420656c75
X12: ffffb4da8e9ac000 X11: ffff04d78f4b6018 X10: ffffb4da8e9afc50 X9: 20676e6973756143
X8: 00000000 X7: ffff0e0045e5ce00 X6: ffff0e0045e5c000 X5: 600001c5
X4: ffff0e0045020a58 X3: ffffb4da8e9afa30 X2: ffff0e004502098c X1: ffffb4da8e9afa30
X0: 00000124
crash> disas testmod_init
Dump of assembler code for function testmod_init:
0xffff04d78f4b6000 <+0>: stp x29, x30, [sp,#-16]!
0xffff04d78f4b6004 <+4>: mov x29, sp
0xffff04d78f4b6008 <+8>: ldr x0, 0xffff04d78f4b6018
0xffff04d78f4b600c <+12>: bl 0xffff04d78f4b6090
0xffff04d78f4b6010 <+16>: ldr x0, 0xffff04d78f4b6020
0xffff04d78f4b6014 <+20>: bl 0xffff04d78f4b6080
End of assembler dump.
===>8===
(I see some issues in the disassembled code, though.)
Post by James Morse
kexec-tools described the 1G of memory that the first kernel was using in the
DT:/reserved-memory/crash_dump node, so early_init_fdt_scan_reserved_mem()
reserves the 1G of memory the first kernel used. This leaves us with 63G of memory.
Thank you very much for explaining this on my behalf!
Post by James Morse
This may change with the next version of kdump if it switches back to using
DT:/chosen/linux,usable-memory-range.
Indeed.
We need to talk to Rob.

Thanks,
-Takahiro AKASHI
Mark Rutland
2016-10-04 10:18:33 UTC
Post by Manish Jaggi
Post by AKASHI Takahiro
Post by Manish Jaggi
1.1. Dump capture kernel shows different memory map.
---------------------------------------------------
In the dump capture kernel, /proc/meminfo and /proc/iomem differ:
MemTotal: 65882432 kB
MemFree: 65507136 kB
MemAvailable: 60373632 kB
Buffers: 29248 kB
Cached: 46720 kB
SwapCached: 0 kB
Active: 63872 kB
Inactive: 19776 kB
Active(anon): 8256 kB
Inactive(anon): 7616 kB
First kernel is booted with mem=2G crashkernel=1G command line option.
While the system has 64G memory.
41400000-fffeffff : System RAM
41480000-420cffff : Kernel code
42490000-4278ffff : Kernel data
ffff0000-ffffffff : reserved
100000000-ffaa7ffff : System RAM
ffaa80000-ffaabffff : reserved
ffaac0000-fffa6ffff : System RAM
fffa70000-fffacffff : reserved
fffad0000-fffffffff : System RAM
Are you saying that "mem=..." doesn't have any effect?
What I am saying is that if the first kernel is booted using the mem= and crashkernel= options,
the memory for the second kernel has to be within the crashkernel size.
Please don't try to use mem= to limit the kernel to a specific range of
memory. It's really only there as a tool to test handling of low-memory
situations.

While it guarantees that at most the amount requested will be used
(modulo a number of edge cases with reserved memory ranges), it does not
guarantee *which* memory will be used. It is *very* fragile.
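To make the "which memory" point concrete: the memory that survives `mem=` is decided by the layout of the firmware memory map, not by the user. The C sketch below is a simplified, hypothetical model loosely based on how memblock picks the cut-off for a memory limit (walk the memory regions in ascending address order and stop once `limit` bytes are accounted for). The real code also has to handle reserved and no-map regions, so treat it as an illustration only; `mem_limit_cutoff()` and `iomem_ram[]` are made-up names, and the region list reuses the System RAM ranges from the /proc/iomem output quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct region {
	uint64_t base;
	uint64_t size;
};

/* Return the address above which memory would be dropped so that only
 * 'limit' bytes remain, walking regions in ascending address order.
 * Returns UINT64_MAX if the limit covers all memory.
 * Simplified model: ignores reserved/no-map handling. */
static uint64_t mem_limit_cutoff(const struct region *regions, size_t n,
				 uint64_t limit)
{
	size_t i;

	assert(regions != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (limit <= regions[i].size)
			return regions[i].base + limit;
		limit -= regions[i].size;
	}
	return UINT64_MAX;
}

/* System RAM ranges (inclusive end addresses) from the quoted iomem. */
static const struct region iomem_ram[] = {
	{ 0x41400000ULL,  0xfffeffffULL  - 0x41400000ULL  + 1 },
	{ 0x100000000ULL, 0xffaa7ffffULL - 0x100000000ULL + 1 },
	{ 0xffaac0000ULL, 0xfffa6ffffULL - 0xffaac0000ULL + 1 },
	{ 0xfffad0000ULL, 0xfffffffffULL - 0xfffad0000ULL + 1 },
};
```

With these ranges a 2G limit keeps memory below 0xc1400000, while a 4G limit spills into the second bank and cuts off at 0x141410000. A machine with a different firmware memory map would keep a different range for the same `mem=` value, which is why relying on it to place the crash kernel is fragile.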

Thanks,
Mark.
Ruslan Bilovol
2016-10-17 15:41:01 UTC
Hi,
v26-specific note: After a comment from Rob[0], an idea of adding
"linux,usable-memory-range" was dropped. Instead, an existing
"reserved-memory" node will be used to limit usable memory ranges
on crash dump kernel.
This works not only on UEFI/ACPI systems but also on DT-only systems,
but if he really insists on using DT-specific "usable-memory" property,
I will post additional patches for kexec-tools. Those would be
redundant, though.
Even in that case, the kernel will not have to be changed.
This patch series adds kdump support on arm64.
There are some prerequisite patches [1],[2].
To load a crash-dump kernel to the systems, a series of patches to
kexec-tools, which have not yet been merged upstream, are needed.
Please always use my latest kdump patches, v3 [3].
To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
- crash utility (coming v7.1.6 or later) [4]
(Necessary patches have already been queued in the master.)
[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/452582.html
[1] "arm64: mark reserved memblock regions explicitly in iomem"
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/450433.html
[2] "efi: arm64: treat regions with WT/WC set but WB cleared as memory"
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/451491.html
[3] T.B.D.
[4] https://github.com/crash-utility/crash.git
Are you going to rebase your patch series onto the v4.9-rc1 tag soon? I see
that patches [1] and [2] are already in v4.9-rc1, but when I tried to apply
this series, I got a conflict on the first patch of the series ("arm64: kdump:
reserve memory for crash dump kernel"). I want to try the arm64 kdump
patches again on my board, so I'm interested in this. The question is
whether I need to rebase it myself or whether you will do it (and address the
comments) soon.

Also, I see Geoff published v6 of the arm64 kexec-tools patches, so the same
question applies to the "(kexec-tools) arm64: add kdump support"
patch series.

Thanks,
Ruslan
AKASHI Takahiro
2016-10-18 06:26:19 UTC
Ruslan,
Post by Ruslan Bilovol
Hi,
Are you going to rebase your patch series onto the v4.9-rc1 tag soon? I see
Yes, definitely, as soon as possible! (Actually, I've already done it.)
But before submitting a new version, I need to convince Rob (Herring)
to accept my old approach (v25) regarding specifying usable
memory for the crash dump kernel:
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-September/459379.html
Post by Ruslan Bilovol
that patches [1] and [2] are already in v4.9-rc1, but when I tried to apply
this series, I got a conflict on the first patch of the series ("arm64: kdump:
reserve memory for crash dump kernel"). I want to try the arm64 kdump
patches again on my board, so I'm interested in this. The question is
whether I need to rebase it myself or whether you will do it (and address the
comments) soon.
Thank you for your interest, and sorry for any inconvenience.

-Takahiro AKASHI
Ruslan Bilovol
2016-11-01 12:19:43 UTC
Post by AKASHI Takahiro
Ruslan,
Post by Ruslan Bilovol
Hi,
Are you going to rebase your patch series onto the v4.9-rc1 tag soon? I see
Yes, definitely, as soon as possible! (Actually, I've already done it.)
But before submitting a new version, I need to convince Rob (Herring)
to accept my old approach (v25) regarding specifying usable
memory for the crash dump kernel:
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-September/459379.html
It looks like the patches got stuck in review.

Could you please share the rebased versions of the kernel and
kexec-tools patches (perhaps even on a private Linaro git repo)? I'd like
to try them on our HW while the review is in progress.

Thanks,
Ruslan
