Discussion:
[PATCH v35 00/14] arm64: add kdump support
AKASHI Takahiro
2017-04-03 02:21:39 UTC
Permalink
This patch series adds kdump support on arm64.

To load a crash-dump kernel to the systems, a series of patches to
kexec-tools[1] are also needed. Please use the latest one, v6 [2].
For your convinience, you can pick them up from:
https://git.linaro.org/people/takahiro.akashi/linux-aarch64.git arm64/kdump
https://git.linaro.org/people/takahiro.akashi/kexec-tools.git arm64/kdump

To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
- crash utility (v7.1.8 or later) [3]

I tested this patchset on fast model and hikey.

The previous versions were also:
Tested-by: Pratyush Anand <***@redhat.com> (v33, mustang and seattle)
Tested-by: James Morse <***@arm.com> (v27/v32?, Juno)
Tested-by: Sameer Goel (v33, QDT2400)

Changes for v35 (Apr 3, 2017)
o rebased to arm64/for-next/core, solving conflicts on mmu.c (patch#6)

Changes for v34 (Mar 28, 2017)
o add and use set_memory_valid() instead of create_pgd_mapping()
(patch #5,6)
o rename functions from kexec_* to crash_* (patch #7)
o supress WARN_ON() message if successfully shutting down secondary cpus
(patch #8)

Changes for v33 (Mar 15, 2017)
o rebased to v4.11-rc2+
o arch_kexec_(un)protect_crashkres() now protects loaded data segments
only along with moving copying of control_code_page back to machine_kexec()
(patch #6)
o reduce the size of hibernation image when kdump and hibernation are
comfigured at the same time (patch #7)
o clearify that "linux,usable-memory-range" and "linux,elfcorehdr"
have values of the size of root node's "#address-cells" and "#size-cells"
(patch #13)
o add "efi/libstub/arm*: Set default address and size cells values for
an empty dtb" from Sameer Goel (patch #14)
(I didn't test the case though.)

Changes for v32 (Feb 7, 2017)
o isolate crash dump kernel memory as well as kernel text/data by using
MEMBLOCK_MAP attribute to and then specifically map them in map_mem()
(patch #1,6)
o delete remove_pgd_mapping() and instead modify create_pgd_mapping() to
allowing for unmapping a kernel mapping (patch #5)
o correct a commit message as well as a comment in the source (patch#10)
o other trivial changes after Mark's comments (patch#3,4)

Changes for v31 (Feb 1, 2017)
o add/use remove_pgd_mapping() instead of modifying (__)create_pgd_mapping()
to protect crash dump kernel memory (patch #4,5)
o fix an issue at the isolation of crash dump kernel memory in
map_mem()/__map_memblock(), adding map_crashkernel() (patch#5)
o preserve the contents of crash dump kernel memory around hibernation
(patch#6)

Changes for v30 (Jan 24, 2017)
o rebased to Linux-v4.10-rc5
o remove "linux,crashkernel-base/size" from exported device tree
o protect memory region for crash-dump kernel (adding patch#4,5)
o remove "in_crash_kexec" variable
o and other trivial changes

Changes for v29 (Dec 28, 2016)
o rebased to Linux-v4.10-rc1
o change asm constraints in crash_setup_regs() per Catalin

Changes for v28 (Nov 22, 2016)
o rebased to Linux-v4.9-rc6
o revamp patch #1 and merge memblock_cap_memory_range() with
memblock_mem_limit_remove_map()

Changes for v27 (Nov 1, 2016)
o rebased to Linux-v4.9-rc3
o revert v26 change, i.e. revive "linux,usable-memory-range" property
(patch #2/#3, updating patch #9)
o minor fixes per review comments (patch #3/#4/#6/#8)
o re-order patches and improve commit messages for readability

Changes for v26 (Sep 7, 2016):
o Use /reserved-memory instead of "linux,usable-memory-range" property
(dropping v25's patch#2 and #3, updating ex-patch#9.)

Changes for v25 (Aug 29, 2016):
o Rebase to Linux-4.8-rc4
o Use memremap() instead of ioremap_cache() [patch#5]

Changes for v24 (Aug 9, 2016):
o Rebase to Linux-4.8-rc1
o Update descriptions about newly added DT proerties

Changes for v23 (July 26, 2016):

o Move memblock_reserve() to a single place in reserve_crashkernel()
o Use cpu_park_loop() in ipi_cpu_crash_stop()
o Always enforce ARCH_LOW_ADDRESS_LIMIT to the memory range of crash kernel
o Re-implement fdt_enforce_memory_region() to remove non-reserve regions
(for ACPI) from usable memory at crash kernel

Changes for v22 (July 12, 2016):

o Export "crashkernel-base" and "crashkernel-size" via device-tree,
and add some descriptions about them in chosen.txt
o Rename "usable-memory" to "usable-memory-range" to avoid inconsistency
with powerpc's "usable-memory"
o Make cosmetic changes regarding "ifdef" usage
o Correct some wordings in kdump.txt

Changes for v21 (July 6, 2016):

o Remove kexec patches.
o Rebase to arm64's for-next/core (Linux-4.7-rc4 based).
o Clarify the description about kvm in kdump.txt.

See the link [4] for older changes.


[1] https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
[2] http://lists.infradead.org/pipermail/kexec/2017-March/018356.html
[3] https://github.com/crash-utility/crash.git
[4] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/438780.html

AKASHI Takahiro (11):
memblock: add memblock_clear_nomap()
memblock: add memblock_cap_memory_range()
arm64: limit memory regions based on DT property, usable-memory-range
arm64: kdump: reserve memory for crash dump kernel
arm64: mm: add set_memory_valid()
arm64: hibernate: preserve kdump image around hibernation
arm64: kdump: implement machine_crash_shutdown()
arm64: kdump: add VMCOREINFO's for user-space tools
arm64: kdump: provide /proc/vmcore file
arm64: kdump: enable kdump in defconfig
Documentation: kdump: describe arm64 port

James Morse (1):
Documentation: dt: chosen properties for arm64 kdump

Sameer Goel (1):
efi/libstub/arm*: Set default address and size cells values for an
empty dtb

Takahiro Akashi (1):
arm64: kdump: protect crash dump kernel memory

Documentation/devicetree/bindings/chosen.txt | 45 +++++++
Documentation/kdump/kdump.txt | 16 ++-
arch/arm64/Kconfig | 11 ++
arch/arm64/configs/defconfig | 1 +
arch/arm64/include/asm/cacheflush.h | 1 +
arch/arm64/include/asm/hardirq.h | 2 +-
arch/arm64/include/asm/kexec.h | 52 +++++++-
arch/arm64/include/asm/smp.h | 3 +
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/crash_dump.c | 71 +++++++++++
arch/arm64/kernel/hibernate.c | 10 +-
arch/arm64/kernel/machine_kexec.c | 170 +++++++++++++++++++++++--
arch/arm64/kernel/setup.c | 7 +-
arch/arm64/kernel/smp.c | 68 ++++++++++
arch/arm64/mm/init.c | 181 +++++++++++++++++++++++++++
arch/arm64/mm/mmu.c | 103 +++++++--------
arch/arm64/mm/pageattr.c | 15 ++-
drivers/firmware/efi/libstub/fdt.c | 28 ++++-
include/linux/memblock.h | 2 +
mm/memblock.c | 56 ++++++---
20 files changed, 757 insertions(+), 86 deletions(-)
create mode 100644 arch/arm64/kernel/crash_dump.c
--
2.11.1
AKASHI Takahiro
2017-04-03 02:23:54 UTC
Permalink
This function, with a combination of memblock_mark_nomap(), will be used
in a later kdump patch for arm64 when it temporarily isolates some range
of memory from the other memory blocks in order to create a specific
kernel mapping at boot time.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: Ard Biesheuvel <***@linaro.org>
---
include/linux/memblock.h | 1 +
mm/memblock.c | 12 ++++++++++++
2 files changed, 13 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index bdfc65af4152..e82daffcfc44 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -93,6 +93,7 @@ int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
int memblock_mark_nomap(phys_addr_t base, phys_addr_t size);
+int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
ulong choose_memblock_flags(void);

/* Low level functions */
diff --git a/mm/memblock.c b/mm/memblock.c
index 696f06d17c4e..2f4ca8104ea4 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -805,6 +805,18 @@ int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
}

/**
+ * memblock_clear_nomap - Clear flag MEMBLOCK_NOMAP for a specified region.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Return 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
+{
+ return memblock_setclr_flag(base, size, 0, MEMBLOCK_NOMAP);
+}
+
+/**
* __next_reserved_mem_region - next function for for_each_reserved_region()
* @idx: pointer to u64 loop variable
* @out_start: ptr to phys_addr_t for start address of the region, can be %NULL
--
2.11.1
AKASHI Takahiro
2017-04-03 02:23:55 UTC
Permalink
Add memblock_cap_memory_range() which will remove all the memblock regions
except the memory range specified in the arguments. In addition, rework is
done on memblock_mem_limit_remove_map() to re-implement it using
memblock_cap_memory_range().

This function, like memblock_mem_limit_remove_map(), will not remove
memblocks with MEMMAP_NOMAP attribute as they may be mapped and accessed
later as "device memory."
See the commit a571d4eb55d8 ("mm/memblock.c: add new infrastructure to
address the mem limit issue").

This function is used, in a succeeding patch in the series of arm64 kdump
suuport, to limit the range of usable memory, or System RAM, on crash dump
kernel.
(Please note that "mem=" parameter is of little use for this purpose.)

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: Will Deacon <***@arm.com>
Acked-by: Catalin Marinas <***@arm.com>
Acked-by: Dennis Chen <***@arm.com>
Cc: linux-***@kvack.org
Cc: Andrew Morton <***@linux-foundation.org>
Reviewed-by: Ard Biesheuvel <***@linaro.org>
---
include/linux/memblock.h | 1 +
mm/memblock.c | 44 +++++++++++++++++++++++++++++---------------
2 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index e82daffcfc44..4ce24a376262 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -336,6 +336,7 @@ phys_addr_t memblock_mem_size(unsigned long limit_pfn);
phys_addr_t memblock_start_of_DRAM(void);
phys_addr_t memblock_end_of_DRAM(void);
void memblock_enforce_memory_limit(phys_addr_t memory_limit);
+void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
void memblock_mem_limit_remove_map(phys_addr_t limit);
bool memblock_is_memory(phys_addr_t addr);
int memblock_is_map_memory(phys_addr_t addr);
diff --git a/mm/memblock.c b/mm/memblock.c
index 2f4ca8104ea4..b049c9b2dba8 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1543,11 +1543,37 @@ void __init memblock_enforce_memory_limit(phys_addr_t limit)
(phys_addr_t)ULLONG_MAX);
}

+void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t size)
+{
+ int start_rgn, end_rgn;
+ int i, ret;
+
+ if (!size)
+ return;
+
+ ret = memblock_isolate_range(&memblock.memory, base, size,
+ &start_rgn, &end_rgn);
+ if (ret)
+ return;
+
+ /* remove all the MAP regions */
+ for (i = memblock.memory.cnt - 1; i >= end_rgn; i--)
+ if (!memblock_is_nomap(&memblock.memory.regions[i]))
+ memblock_remove_region(&memblock.memory, i);
+
+ for (i = start_rgn - 1; i >= 0; i--)
+ if (!memblock_is_nomap(&memblock.memory.regions[i]))
+ memblock_remove_region(&memblock.memory, i);
+
+ /* truncate the reserved regions */
+ memblock_remove_range(&memblock.reserved, 0, base);
+ memblock_remove_range(&memblock.reserved,
+ base + size, (phys_addr_t)ULLONG_MAX);
+}
+
void __init memblock_mem_limit_remove_map(phys_addr_t limit)
{
- struct memblock_type *type = &memblock.memory;
phys_addr_t max_addr;
- int i, ret, start_rgn, end_rgn;

if (!limit)
return;
@@ -1558,19 +1584,7 @@ void __init memblock_mem_limit_remove_map(phys_addr_t limit)
if (max_addr == (phys_addr_t)ULLONG_MAX)
return;

- ret = memblock_isolate_range(type, max_addr, (phys_addr_t)ULLONG_MAX,
- &start_rgn, &end_rgn);
- if (ret)
- return;
-
- /* remove all the MAP regions above the limit */
- for (i = end_rgn - 1; i >= start_rgn; i--) {
- if (!memblock_is_nomap(&type->regions[i]))
- memblock_remove_region(type, i);
- }
- /* truncate the reserved regions */
- memblock_remove_range(&memblock.reserved, max_addr,
- (phys_addr_t)ULLONG_MAX);
+ memblock_cap_memory_range(0, max_addr);
}

static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
--
2.11.1
Catalin Marinas
2017-04-05 17:20:43 UTC
Permalink
Post by AKASHI Takahiro
Add memblock_cap_memory_range() which will remove all the memblock regions
except the memory range specified in the arguments. In addition, rework is
done on memblock_mem_limit_remove_map() to re-implement it using
memblock_cap_memory_range().
This function, like memblock_mem_limit_remove_map(), will not remove
memblocks with MEMMAP_NOMAP attribute as they may be mapped and accessed
later as "device memory."
See the commit a571d4eb55d8 ("mm/memblock.c: add new infrastructure to
address the mem limit issue").
This function is used, in a succeeding patch in the series of arm64 kdump
suuport, to limit the range of usable memory, or System RAM, on crash dump
kernel.
(Please note that "mem=" parameter is of little use for this purpose.)
---
include/linux/memblock.h | 1 +
mm/memblock.c | 44 +++++++++++++++++++++++++++++---------------
2 files changed, 30 insertions(+), 15 deletions(-)
Andrew, are you ok with patches 1 and 2 in this series (touching
mm/memblock.c and include/linux/memblock.h) to go in via the arm64 tree?

Thanks.
--
Catalin
AKASHI Takahiro
2017-04-03 02:24:33 UTC
Permalink
This function validates and invalidates PTE entries, and will be utilized
in kdump to protect loaded crash dump kernel image.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: Ard Biesheuvel <***@linaro.org>
---
arch/arm64/include/asm/cacheflush.h | 1 +
arch/arm64/mm/pageattr.c | 15 +++++++++------
2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index 5a2a6ee65f65..728f933cef8c 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -154,5 +154,6 @@ int set_memory_ro(unsigned long addr, int numpages);
int set_memory_rw(unsigned long addr, int numpages);
int set_memory_x(unsigned long addr, int numpages);
int set_memory_nx(unsigned long addr, int numpages);
+int set_memory_valid(unsigned long addr, unsigned long size, int enable);

#endif
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 8def55e7249b..3212ee0558f6 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -125,20 +125,23 @@ int set_memory_x(unsigned long addr, int numpages)
}
EXPORT_SYMBOL_GPL(set_memory_x);

-#ifdef CONFIG_DEBUG_PAGEALLOC
-void __kernel_map_pages(struct page *page, int numpages, int enable)
+int set_memory_valid(unsigned long addr, int numpages, int enable)
{
- unsigned long addr = (unsigned long) page_address(page);
-
if (enable)
- __change_memory_common(addr, PAGE_SIZE * numpages,
+ return __change_memory_common(addr, PAGE_SIZE * numpages,
__pgprot(PTE_VALID),
__pgprot(0));
else
- __change_memory_common(addr, PAGE_SIZE * numpages,
+ return __change_memory_common(addr, PAGE_SIZE * numpages,
__pgprot(0),
__pgprot(PTE_VALID));
}
+
+#ifdef CONFIG_DEBUG_PAGEALLOC
+void __kernel_map_pages(struct page *page, int numpages, int enable)
+{
+ set_memory_valid((unsigned long)page_address(page), numpages, enable);
+}
#ifdef CONFIG_HIBERNATION
/*
* When built with CONFIG_DEBUG_PAGEALLOC and CONFIG_HIBERNATION, this function
--
2.11.1
AKASHI Takahiro
2017-04-03 02:24:35 UTC
Permalink
Since arch_kexec_protect_crashkres() removes a mapping for crash dump
kernel image, the loaded data won't be preserved around hibernation.

In this patch, helper functions, crash_prepare_suspend()/
crash_post_resume(), are additionally called before/after hibernation so
that the relevant memory segments will be mapped again and preserved just
as the others are.

In addition, to minimize the size of hibernation image, crash_is_nosave()
is added to pfn_is_nosave() in order to recognize only the pages that hold
loaded crash dump kernel image as saveable. Hibernation excludes any pages
that are marked as Reserved and yet "nosave."

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: James Morse <***@arm.com>
---
arch/arm64/include/asm/kexec.h | 10 ++++++
arch/arm64/kernel/hibernate.c | 10 +++++-
arch/arm64/kernel/machine_kexec.c | 71 +++++++++++++++++++++++++++++++++++++++
arch/arm64/mm/init.c | 27 +++++++++++++++
4 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 04744dc5fb61..90aabbe893b7 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -43,6 +43,16 @@ static inline void crash_setup_regs(struct pt_regs *newregs,
/* Empty routine needed to avoid build errors. */
}

+#if defined(CONFIG_KEXEC_CORE) && defined(CONFIG_HIBERNATION)
+extern bool crash_is_nosave(unsigned long pfn);
+extern void crash_prepare_suspend(void);
+extern void crash_post_resume(void);
+#else
+static inline bool crash_is_nosave(unsigned long pfn) {return false; }
+static inline void crash_prepare_suspend(void) {}
+static inline void crash_post_resume(void) {}
+#endif
+
#endif /* __ASSEMBLY__ */

#endif
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 97a7384100f3..a44e13942d30 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -28,6 +28,7 @@
#include <asm/cacheflush.h>
#include <asm/cputype.h>
#include <asm/irqflags.h>
+#include <asm/kexec.h>
#include <asm/memory.h>
#include <asm/mmu_context.h>
#include <asm/pgalloc.h>
@@ -102,7 +103,8 @@ int pfn_is_nosave(unsigned long pfn)
unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);

- return (pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn);
+ return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn)) ||
+ crash_is_nosave(pfn);
}

void notrace save_processor_state(void)
@@ -286,6 +288,9 @@ int swsusp_arch_suspend(void)
local_dbg_save(flags);

if (__cpu_suspend_enter(&state)) {
+ /* make the crash dump kernel image visible/saveable */
+ crash_prepare_suspend();
+
sleep_cpu = smp_processor_id();
ret = swsusp_save();
} else {
@@ -297,6 +302,9 @@ int swsusp_arch_suspend(void)
if (el2_reset_needed())
dcache_clean_range(__hyp_idmap_text_start, __hyp_idmap_text_end);

+ /* make the crash dump kernel image protected again */
+ crash_post_resume();
+
/*
* Tell the hibernation core that we've just restored
* the memory
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index b63baa749609..a6d66b98d795 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -10,6 +10,7 @@
*/

#include <linux/kexec.h>
+#include <linux/page-flags.h>
#include <linux/smp.h>

#include <asm/cacheflush.h>
@@ -230,3 +231,73 @@ void arch_kexec_unprotect_crashkres(void)
__phys_to_virt(kexec_crash_image->segment[i].mem),
kexec_crash_image->segment[i].memsz >> PAGE_SHIFT, 1);
}
+
+#ifdef CONFIG_HIBERNATION
+/*
+ * To preserve the crash dump kernel image, the relevant memory segments
+ * should be mapped again around the hibernation.
+ */
+void crash_prepare_suspend(void)
+{
+ if (kexec_crash_image)
+ arch_kexec_unprotect_crashkres();
+}
+
+void crash_post_resume(void)
+{
+ if (kexec_crash_image)
+ arch_kexec_protect_crashkres();
+}
+
+/*
+ * crash_is_nosave
+ *
+ * Return true only if a page is part of reserved memory for crash dump kernel,
+ * but does not hold any data of loaded kernel image.
+ *
+ * Note that all the pages in crash dump kernel memory have been initially
+ * marked as Reserved in kexec_reserve_crashkres_pages().
+ *
+ * In hibernation, the pages which are Reserved and yet "nosave" are excluded
+ * from the hibernation iamge. crash_is_nosave() does thich check for crash
+ * dump kernel and will reduce the total size of hibernation image.
+ */
+
+bool crash_is_nosave(unsigned long pfn)
+{
+ int i;
+ phys_addr_t addr;
+
+ if (!crashk_res.end)
+ return false;
+
+ /* in reserved memory? */
+ addr = __pfn_to_phys(pfn);
+ if ((addr < crashk_res.start) || (crashk_res.end < addr))
+ return false;
+
+ if (!kexec_crash_image)
+ return true;
+
+ /* not part of loaded kernel image? */
+ for (i = 0; i < kexec_crash_image->nr_segments; i++)
+ if (addr >= kexec_crash_image->segment[i].mem &&
+ addr < (kexec_crash_image->segment[i].mem +
+ kexec_crash_image->segment[i].memsz))
+ return false;
+
+ return true;
+}
+
+void crash_free_reserved_phys_range(unsigned long begin, unsigned long end)
+{
+ unsigned long addr;
+ struct page *page;
+
+ for (addr = begin; addr < end; addr += PAGE_SIZE) {
+ page = phys_to_page(addr);
+ ClearPageReserved(page);
+ free_reserved_page(page);
+ }
+}
+#endif /* CONFIG_HIBERNATION */
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 09d19207362d..89ba3cd0fe44 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -134,10 +134,35 @@ static void __init reserve_crashkernel(void)
crashk_res.start = crash_base;
crashk_res.end = crash_base + crash_size - 1;
}
+
+static void __init kexec_reserve_crashkres_pages(void)
+{
+#ifdef CONFIG_HIBERNATION
+ phys_addr_t addr;
+ struct page *page;
+
+ if (!crashk_res.end)
+ return;
+
+ /*
+ * To reduce the size of hibernation image, all the pages are
+ * marked as Reserved initially.
+ */
+ for (addr = crashk_res.start; addr < (crashk_res.end + 1);
+ addr += PAGE_SIZE) {
+ page = phys_to_page(addr);
+ SetPageReserved(page);
+ }
+#endif
+}
#else
static void __init reserve_crashkernel(void)
{
}
+
+static void __init kexec_reserve_crashkres_pages(void)
+{
+}
#endif /* CONFIG_KEXEC_CORE */

/*
@@ -517,6 +542,8 @@ void __init mem_init(void)
/* this will put all unused low memory onto the freelists */
free_all_bootmem();

+ kexec_reserve_crashkres_pages();
+
mem_init_print_info(NULL);

#define MLK(b, t) b, t, ((t) - (b)) >> 10
--
2.11.1
AKASHI Takahiro
2017-04-03 02:24:39 UTC
Permalink
Kdump is enabled by default as kexec is.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Acked-by: Catalin Marinas <***@arm.com>
---
arch/arm64/configs/defconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 7c48028ec64a..927ee18bbdf2 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -82,6 +82,7 @@ CONFIG_CMA=y
CONFIG_SECCOMP=y
CONFIG_XEN=y
CONFIG_KEXEC=y
+CONFIG_CRASH_DUMP=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
CONFIG_COMPAT=y
CONFIG_CPU_IDLE=y
--
2.11.1
AKASHI Takahiro
2017-04-03 02:24:38 UTC
Permalink
Arch-specific functions are added to allow for implementing a crash dump
file interface, /proc/vmcore, which can be viewed as a ELF file.

A user space tool, like kexec-tools, is responsible for allocating
a separate region for the core's ELF header within crash kdump kernel
memory and filling it in when executing kexec_load().

Then, its location will be advertised to crash dump kernel via a new
device-tree property, "linux,elfcorehdr", and crash dump kernel preserves
the region for later use with reserve_elfcorehdr() at boot time.

On crash dump kernel, /proc/vmcore will access the primary kernel's memory
with copy_oldmem_page(), which feeds the data page-by-page by ioremap'ing
it since it does not reside in linear mapping on crash dump kernel.

Meanwhile, elfcorehdr_read() is simple as the region is always mapped.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: James Morse <***@arm.com>
Acked-by: Catalin Marinas <***@arm.com>
---
arch/arm64/Kconfig | 11 +++++++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/crash_dump.c | 71 ++++++++++++++++++++++++++++++++++++++++++
arch/arm64/mm/init.c | 53 +++++++++++++++++++++++++++++++
4 files changed, 136 insertions(+)
create mode 100644 arch/arm64/kernel/crash_dump.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3741859765cf..e7f043efff41 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -736,6 +736,17 @@ config KEXEC
but it is independent of the system firmware. And like a reboot
you can start any kernel with it, not just Linux.

+config CRASH_DUMP
+ bool "Build kdump crash kernel"
+ help
+ Generate crash dump after being started by kexec. This should
+ be normally only set in special crash dump kernels which are
+ loaded in the main kernel with kexec-tools into a specially
+ reserved region and then later executed after a crash by
+ kdump/kexec.
+
+ For more details see Documentation/kdump/kdump.txt
+
config XEN_DOM0
def_bool y
depends on XEN
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 1606c6b2a280..0fe44a7513ff 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -50,6 +50,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
arm64-obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
arm64-obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o \
cpu-reset.o
+arm64-obj-$(CONFIG_CRASH_DUMP) += crash_dump.o

obj-y += $(arm64-obj-y) vdso/ probes/
obj-m += $(arm64-obj-m)
diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
new file mode 100644
index 000000000000..f46d57c31443
--- /dev/null
+++ b/arch/arm64/kernel/crash_dump.c
@@ -0,0 +1,71 @@
+/*
+ * Routines for doing kexec-based kdump
+ *
+ * Copyright (C) 2017 Linaro Limited
+ * Author: AKASHI Takahiro <***@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/crash_dump.h>
+#include <linux/errno.h>
+#include <linux/io.h>
+#include <linux/memblock.h>
+#include <linux/uaccess.h>
+#include <asm/memory.h>
+
+/**
+ * copy_oldmem_page() - copy one page from old kernel memory
+ * @pfn: page frame number to be copied
+ * @buf: buffer where the copied page is placed
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page
+ * @userbuf: if set, @buf is in a user address space
+ *
+ * This function copies one page from old kernel memory into buffer pointed by
+ * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
+ * copied or negative error in case of failure.
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+ size_t csize, unsigned long offset,
+ int userbuf)
+{
+ void *vaddr;
+
+ if (!csize)
+ return 0;
+
+ vaddr = memremap(__pfn_to_phys(pfn), PAGE_SIZE, MEMREMAP_WB);
+ if (!vaddr)
+ return -ENOMEM;
+
+ if (userbuf) {
+ if (copy_to_user((char __user *)buf, vaddr + offset, csize)) {
+ memunmap(vaddr);
+ return -EFAULT;
+ }
+ } else {
+ memcpy(buf, vaddr + offset, csize);
+ }
+
+ memunmap(vaddr);
+
+ return csize;
+}
+
+/**
+ * elfcorehdr_read - read from ELF core header
+ * @buf: buffer where the data is placed
+ * @csize: number of bytes to read
+ * @ppos: address in the memory
+ *
+ * This function reads @count bytes from elf core header which exists
+ * on crash dump kernel's memory.
+ */
+ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+ memcpy(buf, phys_to_virt((phys_addr_t)*ppos), count);
+ return count;
+}
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 89ba3cd0fe44..5960bef0170d 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -39,6 +39,7 @@
#include <linux/vmalloc.h>
#include <linux/mm.h>
#include <linux/kexec.h>
+#include <linux/crash_dump.h>

#include <asm/boot.h>
#include <asm/fixmap.h>
@@ -165,6 +166,56 @@ static void __init kexec_reserve_crashkres_pages(void)
}
#endif /* CONFIG_KEXEC_CORE */

+#ifdef CONFIG_CRASH_DUMP
+static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
+ const char *uname, int depth, void *data)
+{
+ const __be32 *reg;
+ int len;
+
+ if (depth != 1 || strcmp(uname, "chosen") != 0)
+ return 0;
+
+ reg = of_get_flat_dt_prop(node, "linux,elfcorehdr", &len);
+ if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+ return 1;
+
+ elfcorehdr_addr = dt_mem_next_cell(dt_root_addr_cells, &reg);
+ elfcorehdr_size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+ return 1;
+}
+
+/*
+ * reserve_elfcorehdr() - reserves memory for elf core header
+ *
+ * This function reserves the memory occupied by an elf core header
+ * described in the device tree. This region contains all the
+ * information about primary kernel's core image and is used by a dump
+ * capture kernel to access the system memory on primary kernel.
+ */
+static void __init reserve_elfcorehdr(void)
+{
+ of_scan_flat_dt(early_init_dt_scan_elfcorehdr, NULL);
+
+ if (!elfcorehdr_size)
+ return;
+
+ if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
+ pr_warn("elfcorehdr is overlapped\n");
+ return;
+ }
+
+ memblock_reserve(elfcorehdr_addr, elfcorehdr_size);
+
+ pr_info("Reserving %lldKB of memory at 0x%llx for elfcorehdr\n",
+ elfcorehdr_size >> 10, elfcorehdr_addr);
+}
+#else
+static void __init reserve_elfcorehdr(void)
+{
+}
+#endif /* CONFIG_CRASH_DUMP */
/*
* Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
* currently assumes that for memory starting above 4G, 32-bit devices will
@@ -423,6 +474,8 @@ void __init arm64_memblock_init(void)

reserve_crashkernel();

+ reserve_elfcorehdr();
+
dma_contiguous_reserve(arm64_dma_phys_limit);

memblock_allow_resize();
--
2.11.1
AKASHI Takahiro
2017-04-03 02:24:40 UTC
Permalink
Add arch specific descriptions about kdump usage on arm64 to kdump.txt.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: Baoquan He <***@redhat.com>
Acked-by: Dave Young <***@redhat.com>
Acked-by: Catalin Marinas <***@arm.com>
---
Documentation/kdump/kdump.txt | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index b0eb27b956d9..615434d81108 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -18,7 +18,7 @@ memory image to a dump file on the local disk, or across the network to
a remote system.

Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64,
-s390x and arm architectures.
+s390x, arm and arm64 architectures.

When the system kernel boots, it reserves a small section of memory for
the dump-capture kernel. This ensures that ongoing Direct Memory Access
@@ -249,6 +249,13 @@ Dump-capture kernel config options (Arch Dependent, arm)

AUTO_ZRELADDR=y

+Dump-capture kernel config options (Arch Dependent, arm64)
+----------------------------------------------------------
+
+- Please note that kvm of the dump-capture kernel will not be enabled
+ on non-VHE systems even if it is configured. This is because the CPU
+ will not be reset to EL2 on panic.
+
Extended crashkernel syntax
===========================

@@ -305,6 +312,8 @@ Boot into System Kernel
kernel will automatically locate the crash kernel image within the
first 512MB of RAM if X is not given.

+ On arm64, use "crashkernel=Y[@X]". Note that the start address of
+ the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).

Load the Dump-capture Kernel
============================
@@ -327,6 +336,8 @@ For s390x:
- Use image or bzImage
For arm:
- Use zImage
+For arm64:
+ - Use vmlinux or Image

If you are using a uncompressed vmlinux image then use following command
to load dump-capture kernel.
@@ -370,6 +381,9 @@ For s390x:
For arm:
"1 maxcpus=1 reset_devices"

+For arm64:
+ "1 maxcpus=1 reset_devices"
+
Notes on loading the dump-capture kernel:

* By default, the ELF headers are stored in ELF64 format to support
--
2.11.1
AKASHI Takahiro
2017-04-03 02:24:36 UTC
Permalink
Primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
and save registers' status in per-cpu ELF notes before starting crash
dump kernel. See kernel_kexec().
Even if not all secondary cpus have shut down, we do kdump anyway.

As we don't have to make non-boot(crashed) cpus offline (to preserve
correct status of cpus at crash dump) before shutting down, this patch
also adds a variant of smp_send_stop().

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: James Morse <***@arm.com>
Acked-by: Catalin Marinas <***@arm.com>
---
arch/arm64/include/asm/hardirq.h | 2 +-
arch/arm64/include/asm/kexec.h | 42 +++++++++++++++++++++++-
arch/arm64/include/asm/smp.h | 3 ++
arch/arm64/kernel/machine_kexec.c | 58 ++++++++++++++++++++++++++++++---
arch/arm64/kernel/smp.c | 68 +++++++++++++++++++++++++++++++++++++++
5 files changed, 167 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
index 8740297dac77..1473fc2f7ab7 100644
--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -20,7 +20,7 @@
#include <linux/threads.h>
#include <asm/irq.h>

-#define NR_IPI 6
+#define NR_IPI 7

typedef struct {
unsigned int __softirq_pending;
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 90aabbe893b7..e17f0529a882 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -40,7 +40,47 @@
static inline void crash_setup_regs(struct pt_regs *newregs,
struct pt_regs *oldregs)
{
- /* Empty routine needed to avoid build errors. */
+ if (oldregs) {
+ memcpy(newregs, oldregs, sizeof(*newregs));
+ } else {
+ u64 tmp1, tmp2;
+
+ __asm__ __volatile__ (
+ "stp x0, x1, [%2, #16 * 0]\n"
+ "stp x2, x3, [%2, #16 * 1]\n"
+ "stp x4, x5, [%2, #16 * 2]\n"
+ "stp x6, x7, [%2, #16 * 3]\n"
+ "stp x8, x9, [%2, #16 * 4]\n"
+ "stp x10, x11, [%2, #16 * 5]\n"
+ "stp x12, x13, [%2, #16 * 6]\n"
+ "stp x14, x15, [%2, #16 * 7]\n"
+ "stp x16, x17, [%2, #16 * 8]\n"
+ "stp x18, x19, [%2, #16 * 9]\n"
+ "stp x20, x21, [%2, #16 * 10]\n"
+ "stp x22, x23, [%2, #16 * 11]\n"
+ "stp x24, x25, [%2, #16 * 12]\n"
+ "stp x26, x27, [%2, #16 * 13]\n"
+ "stp x28, x29, [%2, #16 * 14]\n"
+ "mov %0, sp\n"
+ "stp x30, %0, [%2, #16 * 15]\n"
+
+ "/* faked current PSTATE */\n"
+ "mrs %0, CurrentEL\n"
+ "mrs %1, SPSEL\n"
+ "orr %0, %0, %1\n"
+ "mrs %1, DAIF\n"
+ "orr %0, %0, %1\n"
+ "mrs %1, NZCV\n"
+ "orr %0, %0, %1\n"
+ /* pc */
+ "adr %1, 1f\n"
+ "1:\n"
+ "stp %1, %0, [%2, #16 * 16]\n"
+ : "=&r" (tmp1), "=&r" (tmp2)
+ : "r" (newregs)
+ : "memory"
+ );
+ }
}

#if defined(CONFIG_KEXEC_CORE) && defined(CONFIG_HIBERNATION)
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index d050d720a1b4..55f08c5acfad 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -148,6 +148,9 @@ static inline void cpu_panic_kernel(void)
*/
bool cpus_are_stuck_in_kernel(void);

+extern void smp_send_crash_stop(void);
+extern bool smp_crash_stop_failed(void);
+
#endif /* ifndef __ASSEMBLY__ */

#endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index a6d66b98d795..779a80046066 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -9,6 +9,9 @@
* published by the Free Software Foundation.
*/

+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
#include <linux/kexec.h>
#include <linux/page-flags.h>
#include <linux/smp.h>
@@ -143,11 +146,15 @@ void machine_kexec(struct kimage *kimage)
{
phys_addr_t reboot_code_buffer_phys;
void *reboot_code_buffer;
+ bool in_kexec_crash = (kimage == kexec_crash_image);
+ bool stuck_cpus = cpus_are_stuck_in_kernel();

/*
* New cpus may have become stuck_in_kernel after we loaded the image.
*/
- BUG_ON(cpus_are_stuck_in_kernel() || (num_online_cpus() > 1));
+ BUG_ON(!in_kexec_crash && (stuck_cpus || (num_online_cpus() > 1)));
+ WARN(in_kexec_crash && (stuck_cpus || smp_crash_stop_failed()),
+ "Some CPUs may be stale, kdump will be unreliable.\n");

reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
@@ -199,15 +206,58 @@ void machine_kexec(struct kimage *kimage)
* relocation is complete.
*/

- cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
- kimage->start, 0);
+ cpu_soft_restart(kimage != kexec_crash_image,
+ reboot_code_buffer_phys, kimage->head, kimage->start, 0);

BUG(); /* Should never get here. */
}

+static void machine_kexec_mask_interrupts(void)
+{
+ unsigned int i;
+ struct irq_desc *desc;
+
+ for_each_irq_desc(i, desc) {
+ struct irq_chip *chip;
+ int ret;
+
+ chip = irq_desc_get_chip(desc);
+ if (!chip)
+ continue;
+
+ /*
+ * First try to remove the active state. If this
+ * fails, try to EOI the interrupt.
+ */
+ ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
+
+ if (ret && irqd_irq_inprogress(&desc->irq_data) &&
+ chip->irq_eoi)
+ chip->irq_eoi(&desc->irq_data);
+
+ if (chip->irq_mask)
+ chip->irq_mask(&desc->irq_data);
+
+ if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
+ chip->irq_disable(&desc->irq_data);
+ }
+}
+
+/**
+ * machine_crash_shutdown - shutdown non-crashing cpus and save registers
+ */
void machine_crash_shutdown(struct pt_regs *regs)
{
- /* Empty routine needed to avoid build errors. */
+ local_irq_disable();
+
+ /* shutdown non-crashing cpus */
+ smp_send_crash_stop();
+
+ /* for crashing cpu */
+ crash_save_cpu(regs, smp_processor_id());
+ machine_kexec_mask_interrupts();
+
+ pr_info("Starting crashdump kernel...\n");
}

void arch_kexec_protect_crashkres(void)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d4739552da28..ffee4e454ac5 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -39,6 +39,7 @@
#include <linux/completion.h>
#include <linux/of.h>
#include <linux/irq_work.h>
+#include <linux/kexec.h>

#include <asm/alternative.h>
#include <asm/atomic.h>
@@ -76,6 +77,7 @@ enum ipi_msg_type {
IPI_RESCHEDULE,
IPI_CALL_FUNC,
IPI_CPU_STOP,
+ IPI_CPU_CRASH_STOP,
IPI_TIMER,
IPI_IRQ_WORK,
IPI_WAKEUP
@@ -756,6 +758,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
S(IPI_RESCHEDULE, "Rescheduling interrupts"),
S(IPI_CALL_FUNC, "Function call interrupts"),
S(IPI_CPU_STOP, "CPU stop interrupts"),
+ S(IPI_CPU_CRASH_STOP, "CPU stop (for crash dump) interrupts"),
S(IPI_TIMER, "Timer broadcast interrupts"),
S(IPI_IRQ_WORK, "IRQ work interrupts"),
S(IPI_WAKEUP, "CPU wake-up interrupts"),
@@ -830,6 +833,29 @@ static void ipi_cpu_stop(unsigned int cpu)
cpu_relax();
}

+#ifdef CONFIG_KEXEC_CORE
+static atomic_t waiting_for_crash_ipi = ATOMIC_INIT(0);
+#endif
+
+static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+{
+#ifdef CONFIG_KEXEC_CORE
+ crash_save_cpu(regs, cpu);
+
+ atomic_dec(&waiting_for_crash_ipi);
+
+ local_irq_disable();
+
+#ifdef CONFIG_HOTPLUG_CPU
+ if (cpu_ops[cpu]->cpu_die)
+ cpu_ops[cpu]->cpu_die(cpu);
+#endif
+
+ /* just in case */
+ cpu_park_loop();
+#endif
+}
+
/*
* Main handler for inter-processor interrupts
*/
@@ -860,6 +886,15 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
irq_exit();
break;

+ case IPI_CPU_CRASH_STOP:
+ if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
+ irq_enter();
+ ipi_cpu_crash_stop(cpu, regs);
+
+ unreachable();
+ }
+ break;
+
#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
case IPI_TIMER:
irq_enter();
@@ -932,6 +967,39 @@ void smp_send_stop(void)
cpumask_pr_args(cpu_online_mask));
}

+#ifdef CONFIG_KEXEC_CORE
+void smp_send_crash_stop(void)
+{
+ cpumask_t mask;
+ unsigned long timeout;
+
+ if (num_online_cpus() == 1)
+ return;
+
+ cpumask_copy(&mask, cpu_online_mask);
+ cpumask_clear_cpu(smp_processor_id(), &mask);
+
+ atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+
+ pr_crit("SMP: stopping secondary CPUs\n");
+ smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
+
+ /* Wait up to one second for other CPUs to stop */
+ timeout = USEC_PER_SEC;
+ while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
+ udelay(1);
+
+ if (atomic_read(&waiting_for_crash_ipi) > 0)
+ pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
+ cpumask_pr_args(&mask));
+}
+
+bool smp_crash_stop_failed(void)
+{
+ return (atomic_read(&waiting_for_crash_ipi) > 0);
+}
+#endif
+
/*
* not supported here
*/
--
2.11.1
David Woodhouse
2017-04-03 09:21:34 UTC
Permalink
Post by AKASHI Takahiro
+static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+{
+#ifdef CONFIG_KEXEC_CORE
+       crash_save_cpu(regs, cpu);
+
+       atomic_dec(&waiting_for_crash_ipi);
+
+       local_irq_disable();
Shouldn't the above two be the other way round? Stop handling
interrupts *before* we tell the surviving CPU that we're done?

Also, should the surviving CPU be calling __cpu_die() to attempt to
check that the others really have died? Sure, it can't do much more
than print a warning if the call to ->cpu_kill() times out, but that's
still useful information. And perhaps we could allow platforms to
register a method to shoot other CPUs down forcefully if they don't
respond. My board could apparently support that.

It would certainly be useful to allow smp_send_crash_stop() to be
overridden with a platform-specific method instead of normal IPIs. The
platform might be able to do something... less maskable.
AKASHI Takahiro
2017-04-04 07:29:23 UTC
Permalink
Post by David Woodhouse
Post by AKASHI Takahiro
+static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+{
+#ifdef CONFIG_KEXEC_CORE
+       crash_save_cpu(regs, cpu);
+
+       atomic_dec(&waiting_for_crash_ipi);
+
+       local_irq_disable();
Shouldn't the above two be the other way round? Stop handling
interrupts *before* we tell the surviving CPU that we're done?
Might be, but I'm not sure how well such a change works.
Do you see any improvement in, say, saving failed-to-reboot cases?
Post by David Woodhouse
Also, should the surviving CPU be calling __cpu_die() to attempt to
check that the others really have died? Sure, it can't do much more
than print a warning if the call to ->cpu_kill() times out, but that's
still useful information.
A pair of cpu_wait_death() and cpu_report_death() merely does
a hand-shake protocol over per-cpu "cpu_hotplug_state" variable.
In this sense, it's not much different from "waiting_for_crash_ipi"
approach used here in ipi_cpu_crash_stop().

Having said that, cpu_ops->cpu_kill() might help forcefully tear down
the cpu(s). One of my concerns is that we may lose some data in cache.
Post by David Woodhouse
And perhaps we could allow platforms to
register a method to shoot other CPUs down forcefully if they don't
respond. My board could apparently support that.
It would certainly be useful to allow smp_send_crash_stop() to be
overridden with a platform-specific method instead of normal IPIs. The
platform might be able to do something... less maskable.
I agree.
It would be great if you post your own patch for such a new interface.
(Without any hardware, I can't implement/test this feature, anyway.)

Thanks,
-Takahiro AKASHI
AKASHI Takahiro
2017-04-03 02:26:33 UTC
Permalink
From: Sameer Goel <***@codeaurora.org>

In cases where a device tree is not provided (ie ACPI based system), an
empty fdt is generated by efistub. #address-cells and #size-cells are not
set in the empty fdt, so they default to 1 (4 byte wide). This can be an
issue on 64-bit systems where values representing addresses, etc may be
8 bytes wide as the default value does not align with the general
requirements for an empty DTB, and is fragile when passed to other agents
as extra care is required to read the entire width of a value.

This issue is observed on Qualcomm Technologies QDF24XX platforms when
kexec-tools inserts 64-bit addresses into the "linux,elfcorehdr" and
"linux,usable-memory-range" properties of the fdt. When the values are
later consumed, they are truncated to 32-bit.

Setting #address-cells and #size-cells to 2 at creation of the empty fdt
resolves the observed issue, and makes the fdt less fragile.

Signed-off-by: Sameer Goel <***@codeaurora.org>
Signed-off-by: Jeffrey Hugo <***@codeaurora.org>
Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: Ard Biesheuvel <***@linaro.org>
---
drivers/firmware/efi/libstub/fdt.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
index 260c4b4b492e..82973b86efe4 100644
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -16,6 +16,22 @@

#include "efistub.h"

+#define EFI_DT_ADDR_CELLS_DEFAULT 2
+#define EFI_DT_SIZE_CELLS_DEFAULT 2
+
+static void fdt_update_cell_size(efi_system_table_t *sys_table, void *fdt)
+{
+ int offset;
+
+ offset = fdt_path_offset(fdt, "/");
+ /* Set the #address-cells and #size-cells values for an empty tree */
+
+ fdt_setprop_u32(fdt, offset, "#address-cells",
+ EFI_DT_ADDR_CELLS_DEFAULT);
+
+ fdt_setprop_u32(fdt, offset, "#size-cells", EFI_DT_SIZE_CELLS_DEFAULT);
+}
+
static efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
unsigned long orig_fdt_size,
void *fdt, int new_fdt_size, char *cmdline_ptr,
@@ -42,10 +58,18 @@ static efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
}
}

- if (orig_fdt)
+ if (orig_fdt) {
status = fdt_open_into(orig_fdt, fdt, new_fdt_size);
- else
+ } else {
status = fdt_create_empty_tree(fdt, new_fdt_size);
+ if (status == 0) {
+ /*
+ * Any failure from the following function is non
+ * critical
+ */
+ fdt_update_cell_size(sys_table, fdt);
+ }
+ }

if (status != 0)
goto fdt_set_fail;
--
2.11.1
AKASHI Takahiro
2017-04-03 02:26:06 UTC
Permalink
From: James Morse <***@arm.com>

Add documentation for DT properties:
linux,usable-memory-range
linux,elfcorehdr
used by arm64 kdump. Those are, respectively, a usable memory range
allocated to crash dump kernel and the elfcorehdr's location within it.

Signed-off-by: James Morse <***@arm.com>
[***@linaro.org: update the text due to recent changes ]
Signed-off-by: AKASHI Takahiro <***@linaro.org>
Acked-by: Mark Rutland <***@arm.com>
Cc: ***@vger.kernel.org
Cc: Rob Herring <robh+***@kernel.org>
---
Documentation/devicetree/bindings/chosen.txt | 45 ++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)

diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
index 6ae9d82d4c37..b5e39af4ddc0 100644
--- a/Documentation/devicetree/bindings/chosen.txt
+++ b/Documentation/devicetree/bindings/chosen.txt
@@ -52,3 +52,48 @@ This property is set (currently only on PowerPC, and only needed on
book3e) by some versions of kexec-tools to tell the new kernel that it
is being booted by kexec, as the booting environment may differ (e.g.
a different secondary CPU release mechanism)
+
+linux,usable-memory-range
+-------------------------
+
+This property (arm64 only) holds a base address and size, describing a
+limited region in which memory may be considered available for use by
+the kernel. Memory outside of this range is not available for use.
+
+This property describes a limitation: memory within this range is only
+valid when also described through another mechanism that the kernel
+would otherwise use to determine available memory (e.g. memory nodes
+or the EFI memory map). Valid memory may be sparse within the range.
+e.g.
+
+/ {
+ chosen {
+ linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
+ };
+};
+
+The main usage is for crash dump kernel to identify its own usable
+memory and exclude, at its boot time, any other memory areas that are
+part of the panicked kernel's memory.
+
+While this property does not represent a real hardware, the address
+and the size are expressed in #address-cells and #size-cells,
+respectively, of the root node.
+
+linux,elfcorehdr
+----------------
+
+This property (currently used only on arm64) holds the memory range,
+the address and the size, of the elf core header which mainly describes
+the panicked kernel's memory layout as PT_LOAD segments of elf format.
+e.g.
+
+/ {
+ chosen {
+ linux,elfcorehdr = <0x9 0xfffff000 0x0 0x800>;
+ };
+};
+
+While this property does not represent a real hardware, the address
+and the size are expressed in #address-cells and #size-cells,
+respectively, of the root node.
--
2.11.1
AKASHI Takahiro
2017-04-03 02:24:37 UTC
Permalink
In addition to common VMCOREINFO's defined in
crash_save_vmcoreinfo_init(), we need to know, for crash utility,
- kimage_voffset
- PHYS_OFFSET
to examine the contents of a dump file (/proc/vmcore) correctly
due to the introduction of KASLR (CONFIG_RANDOMIZE_BASE) in v4.6.

- VA_BITS
is also required for makedumpfile command.

arch_crash_save_vmcoreinfo() appends them to the dump file.
More VMCOREINFO's may be added later.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: James Morse <***@arm.com>
Acked-by: Catalin Marinas <***@arm.com>
---
arch/arm64/kernel/machine_kexec.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 779a80046066..481f54a866c5 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -18,6 +18,7 @@

#include <asm/cacheflush.h>
#include <asm/cpu_ops.h>
+#include <asm/memory.h>
#include <asm/mmu.h>
#include <asm/mmu_context.h>
#include <asm/page.h>
@@ -351,3 +352,13 @@ void crash_free_reserved_phys_range(unsigned long begin, unsigned long end)
}
}
#endif /* CONFIG_HIBERNATION */
+
+void arch_crash_save_vmcoreinfo(void)
+{
+ VMCOREINFO_NUMBER(VA_BITS);
+ /* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */
+ vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
+ kimage_voffset);
+ vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
+ PHYS_OFFSET);
+}
--
2.11.1
AKASHI Takahiro
2017-04-03 02:24:34 UTC
Permalink
From: Takahiro Akashi <***@linaro.org>

arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
are meant to be called by kexec_load() in order to protect the memory
allocated for crash dump kernel once the image is loaded.

The protection is implemented by unmapping the relevant segments in crash
dump kernel memory, rather than making it read-only as other archs do,
to prevent coherency issues due to potential cache aliasing (with
mismatched attributes).

Page-level mappings are consistently used here so that we can change
the attributes of segments in page granularity as well as shrink the region
also in page granularity through /sys/kernel/kexec_crash_size, putting
the freed memory back to buddy system.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: Ard Biesheuvel <***@linaro.org>
---
arch/arm64/kernel/machine_kexec.c | 32 +++++++++---
arch/arm64/mm/mmu.c | 103 ++++++++++++++++++++------------------
2 files changed, 80 insertions(+), 55 deletions(-)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index bc96c8a7fc79..b63baa749609 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -14,7 +14,9 @@

#include <asm/cacheflush.h>
#include <asm/cpu_ops.h>
+#include <asm/mmu.h>
#include <asm/mmu_context.h>
+#include <asm/page.h>

#include "cpu-reset.h"

@@ -22,8 +24,6 @@
extern const unsigned char arm64_relocate_new_kernel[];
extern const unsigned long arm64_relocate_new_kernel_size;

-static unsigned long kimage_start;
-
/**
* kexec_image_info - For debugging output.
*/
@@ -64,8 +64,6 @@ void machine_kexec_cleanup(struct kimage *kimage)
*/
int machine_kexec_prepare(struct kimage *kimage)
{
- kimage_start = kimage->start;
-
kexec_image_info(kimage);

if (kimage->type != KEXEC_TYPE_CRASH && cpus_are_stuck_in_kernel()) {
@@ -183,7 +181,7 @@ void machine_kexec(struct kimage *kimage)
kexec_list_flush(kimage);

/* Flush the new image if already in place. */
- if (kimage->head & IND_DONE)
+ if ((kimage != kexec_crash_image) && (kimage->head & IND_DONE))
kexec_segment_flush(kimage);

pr_info("Bye!\n");
@@ -201,7 +199,7 @@ void machine_kexec(struct kimage *kimage)
*/

cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
- kimage_start, 0);
+ kimage->start, 0);

BUG(); /* Should never get here. */
}
@@ -210,3 +208,25 @@ void machine_crash_shutdown(struct pt_regs *regs)
{
/* Empty routine needed to avoid build errors. */
}
+
+void arch_kexec_protect_crashkres(void)
+{
+ int i;
+
+ kexec_segment_flush(kexec_crash_image);
+
+ for (i = 0; i < kexec_crash_image->nr_segments; i++)
+ set_memory_valid(
+ __phys_to_virt(kexec_crash_image->segment[i].mem),
+ kexec_crash_image->segment[i].memsz >> PAGE_SHIFT, 0);
+}
+
+void arch_kexec_unprotect_crashkres(void)
+{
+ int i;
+
+ for (i = 0; i < kexec_crash_image->nr_segments; i++)
+ set_memory_valid(
+ __phys_to_virt(kexec_crash_image->segment[i].mem),
+ kexec_crash_image->segment[i].memsz >> PAGE_SHIFT, 1);
+}
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 91502e36e6d9..0c429ec6fde8 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -22,6 +22,8 @@
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/init.h>
+#include <linux/ioport.h>
+#include <linux/kexec.h>
#include <linux/libfdt.h>
#include <linux/mman.h>
#include <linux/nodemask.h>
@@ -393,10 +395,28 @@ static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
flush_tlb_kernel_range(virt, virt + size);
}

-static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end)
+static void __init __map_memblock(pgd_t *pgd, phys_addr_t start,
+ phys_addr_t end, pgprot_t prot, int flags)
+{
+ __create_pgd_mapping(pgd, start, __phys_to_virt(start), end - start,
+ prot, early_pgtable_alloc, flags);
+}
+
+void __init mark_linear_text_alias_ro(void)
+{
+ /*
+ * Remove the write permissions from the linear alias of .text/.rodata
+ */
+ update_mapping_prot(__pa_symbol(_text), (unsigned long)lm_alias(_text),
+ (unsigned long)__init_begin - (unsigned long)_text,
+ PAGE_KERNEL_RO);
+}
+
+static void __init map_mem(pgd_t *pgd)
{
phys_addr_t kernel_start = __pa_symbol(_text);
phys_addr_t kernel_end = __pa_symbol(__init_begin);
+ struct memblock_region *reg;
int flags = 0;

if (debug_pagealloc_enabled())
@@ -405,30 +425,28 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
/*
* Take care not to create a writable alias for the
* read-only text and rodata sections of the kernel image.
+ * So temporarily mark them as NOMAP to skip mappings in
+ * the following for-loop
*/
+ memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
+#ifdef CONFIG_KEXEC_CORE
+ if (crashk_res.end)
+ memblock_mark_nomap(crashk_res.start,
+ resource_size(&crashk_res));
+#endif

- /* No overlap with the kernel text/rodata */
- if (end < kernel_start || start >= kernel_end) {
- __create_pgd_mapping(pgd, start, __phys_to_virt(start),
- end - start, PAGE_KERNEL,
- early_pgtable_alloc, flags);
- return;
- }
+ /* map all the memory banks */
+ for_each_memblock(memory, reg) {
+ phys_addr_t start = reg->base;
+ phys_addr_t end = start + reg->size;

- /*
- * This block overlaps the kernel text/rodata mappings.
- * Map the portion(s) which don't overlap.
- */
- if (start < kernel_start)
- __create_pgd_mapping(pgd, start,
- __phys_to_virt(start),
- kernel_start - start, PAGE_KERNEL,
- early_pgtable_alloc, flags);
- if (kernel_end < end)
- __create_pgd_mapping(pgd, kernel_end,
- __phys_to_virt(kernel_end),
- end - kernel_end, PAGE_KERNEL,
- early_pgtable_alloc, flags);
+ if (start >= end)
+ break;
+ if (memblock_is_nomap(reg))
+ continue;
+
+ __map_memblock(pgd, start, end, PAGE_KERNEL, flags);
+ }

/*
* Map the linear alias of the [_text, __init_begin) interval
@@ -440,37 +458,24 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
* Note that contiguous mappings cannot be remapped in this way,
* so we should avoid them here.
*/
- __create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
- kernel_end - kernel_start, PAGE_KERNEL,
- early_pgtable_alloc, NO_CONT_MAPPINGS);
-}
+ __map_memblock(pgd, kernel_start, kernel_end,
+ PAGE_KERNEL, NO_CONT_MAPPINGS);
+ memblock_clear_nomap(kernel_start, kernel_end - kernel_start);

-void __init mark_linear_text_alias_ro(void)
-{
+#ifdef CONFIG_KEXEC_CORE
/*
- * Remove the write permissions from the linear alias of .text/.rodata
+ * Use page-level mappings here so that we can shrink the region
+ * in page granularity and put back unused memory to buddy system
+ * through /sys/kernel/kexec_crash_size interface.
*/
- update_mapping_prot(__pa_symbol(_text), (unsigned long)lm_alias(_text),
- (unsigned long)__init_begin - (unsigned long)_text,
- PAGE_KERNEL_RO);
-}
-
-static void __init map_mem(pgd_t *pgd)
-{
- struct memblock_region *reg;
-
- /* map all the memory banks */
- for_each_memblock(memory, reg) {
- phys_addr_t start = reg->base;
- phys_addr_t end = start + reg->size;
-
- if (start >= end)
- break;
- if (memblock_is_nomap(reg))
- continue;
-
- __map_memblock(pgd, start, end);
+ if (crashk_res.end) {
+ __map_memblock(pgd, crashk_res.start, crashk_res.end + 1,
+ PAGE_KERNEL,
+ NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
+ memblock_clear_nomap(crashk_res.start,
+ resource_size(&crashk_res));
}
+#endif
}

void mark_rodata_ro(void)
--
2.11.1
AKASHI Takahiro
2017-04-03 02:24:32 UTC
Permalink
"crashkernel=" kernel parameter specifies the size (and optionally
the start address) of the system ram to be used by crash dump kernel.
reserve_crashkernel() will allocate and reserve that memory at boot time
of primary kernel.

The memory range will be exposed to userspace as a resource named
"Crash kernel" in /proc/iomem.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Signed-off-by: Mark Salter <***@redhat.com>
Signed-off-by: Pratyush Anand <***@redhat.com>
Reviewed-by: James Morse <***@arm.com>
Acked-by: Catalin Marinas <***@arm.com>
Reviewed-by: Ard Biesheuvel <***@linaro.org>
---
arch/arm64/kernel/setup.c | 7 ++++-
arch/arm64/mm/init.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 42274bda0ccb..28855ec1be95 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -31,7 +31,6 @@
#include <linux/screen_info.h>
#include <linux/init.h>
#include <linux/kexec.h>
-#include <linux/crash_dump.h>
#include <linux/root_dev.h>
#include <linux/cpu.h>
#include <linux/interrupt.h>
@@ -226,6 +225,12 @@ static void __init request_standard_resources(void)
if (kernel_data.start >= res->start &&
kernel_data.end <= res->end)
request_resource(res, &kernel_data);
+#ifdef CONFIG_KEXEC_CORE
+ /* Userspace will find "Crash kernel" region in /proc/iomem. */
+ if (crashk_res.end && crashk_res.start >= res->start &&
+ crashk_res.end <= res->end)
+ request_resource(res, &crashk_res);
+#endif
}
}

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 290794b1a0f1..09d19207362d 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -30,6 +30,7 @@
#include <linux/gfp.h>
#include <linux/memblock.h>
#include <linux/sort.h>
+#include <linux/of.h>
#include <linux/of_fdt.h>
#include <linux/dma-mapping.h>
#include <linux/dma-contiguous.h>
@@ -37,6 +38,7 @@
#include <linux/swiotlb.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>
+#include <linux/kexec.h>

#include <asm/boot.h>
#include <asm/fixmap.h>
@@ -77,6 +79,67 @@ static int __init early_initrd(char *p)
early_param("initrd", early_initrd);
#endif

+#ifdef CONFIG_KEXEC_CORE
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by dump capture kernel when
+ * primary kernel is crashing.
+ */
+static void __init reserve_crashkernel(void)
+{
+ unsigned long long crash_base, crash_size;
+ int ret;
+
+ ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+ &crash_size, &crash_base);
+ /* no crashkernel= or invalid value specified */
+ if (ret || !crash_size)
+ return;
+
+ crash_size = PAGE_ALIGN(crash_size);
+
+ if (crash_base == 0) {
+ /* Current arm64 boot protocol requires 2MB alignment */
+ crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
+ crash_size, SZ_2M);
+ if (crash_base == 0) {
+ pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
+ crash_size);
+ return;
+ }
+ } else {
+ /* User specifies base address explicitly. */
+ if (!memblock_is_region_memory(crash_base, crash_size)) {
+ pr_warn("cannot reserve crashkernel: region is not memory\n");
+ return;
+ }
+
+ if (memblock_is_region_reserved(crash_base, crash_size)) {
+ pr_warn("cannot reserve crashkernel: region overlaps reserved memory\n");
+ return;
+ }
+
+ if (!IS_ALIGNED(crash_base, SZ_2M)) {
+ pr_warn("cannot reserve crashkernel: base address is not 2MB aligned\n");
+ return;
+ }
+ }
+ memblock_reserve(crash_base, crash_size);
+
+ pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
+ crash_base, crash_base + crash_size, crash_size >> 20);
+
+ crashk_res.start = crash_base;
+ crashk_res.end = crash_base + crash_size - 1;
+}
+#else
+static void __init reserve_crashkernel(void)
+{
+}
+#endif /* CONFIG_KEXEC_CORE */
+
/*
* Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
* currently assumes that for memory starting above 4G, 32-bit devices will
@@ -332,6 +395,9 @@ void __init arm64_memblock_init(void)
arm64_dma_phys_limit = max_zone_dma_phys();
else
arm64_dma_phys_limit = PHYS_MASK + 1;
+
+ reserve_crashkernel();
+
dma_contiguous_reserve(arm64_dma_phys_limit);

memblock_allow_resize();
--
2.11.1
AKASHI Takahiro
2017-04-03 02:24:31 UTC
Permalink
Crash dump kernel uses only a limited range of available memory as System
RAM. On arm64 kdump, This memory range is advertised to crash dump kernel
via a device-tree property under /chosen,
linux,usable-memory-range = <BASE SIZE>

Crash dump kernel reads this property at boot time and calls
memblock_cap_memory_range() to limit usable memory which are listed either
in UEFI memory map table or "memory" nodes of a device tree blob.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: Geoff Levand <***@infradead.org>
Acked-by: Catalin Marinas <***@arm.com>
Acked-by: Mark Rutland <***@arm.com>
Reviewed-by: Ard Biesheuvel <***@linaro.org>
---
arch/arm64/mm/init.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index e19e06593e37..290794b1a0f1 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -188,10 +188,45 @@ static int __init early_mem(char *p)
}
early_param("mem", early_mem);

+static int __init early_init_dt_scan_usablemem(unsigned long node,
+ const char *uname, int depth, void *data)
+{
+ struct memblock_region *usablemem = data;
+ const __be32 *reg;
+ int len;
+
+ if (depth != 1 || strcmp(uname, "chosen") != 0)
+ return 0;
+
+ reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
+ if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+ return 1;
+
+ usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+ usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+ return 1;
+}
+
+static void __init fdt_enforce_memory_region(void)
+{
+ struct memblock_region reg = {
+ .size = 0,
+ };
+
+ of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
+
+ if (reg.size)
+ memblock_cap_memory_range(reg.base, reg.size);
+}
+
void __init arm64_memblock_init(void)
{
const s64 linear_region_size = -(s64)PAGE_OFFSET;

+ /* Handle linux,usable-memory-range property */
+ fdt_enforce_memory_region();
+
/*
* Ensure that the linear region takes up exactly half of the kernel
* virtual address space. This way, we can distinguish a linear address
--
2.11.1
Loading...