Discussion:
[PATCH v34 00/14] arm64: add kdump support
AKASHI Takahiro
2017-03-28 06:48:31 UTC
This patch series adds kdump support on arm64.

To load a crash dump kernel into the system, a series of patches to
kexec-tools [1] is also needed. Please use the latest version, v6 [2].
For your convenience, you can pick them up from:
https://git.linaro.org/people/takahiro.akashi/linux-aarch64.git arm64/kdump
https://git.linaro.org/people/takahiro.akashi/kexec-tools.git arm64/kdump

To examine vmcore (/proc/vmcore) on the crash dump kernel, you can use
- the crash utility (v7.1.8 or later) [3]

I tested this patchset on Fast Model and HiKey.

The previous versions were also tested:
Tested-by: Pratyush Anand <***@redhat.com> (v33, mustang and seattle)
Tested-by: James Morse <***@arm.com> (v27/v32?, Juno)
Tested-by: Sameer Goel (v33, QDT2400)

Changes for v34 (Mar 28, 2017)
o add and use set_memory_valid() instead of create_pgd_mapping()
(patch #5,6)
o rename functions from kexec_* to crash_* (patch #7)
o suppress the WARN_ON() message when secondary cpus are shut down successfully
(patch #8)

Changes for v33 (Mar 15, 2017)
o rebased to v4.11-rc2+
o arch_kexec_(un)protect_crashkres() now protects only the loaded data
segments; copying of the control_code_page was moved back to machine_kexec()
(patch #6)
o reduce the size of the hibernation image when kdump and hibernation are
configured at the same time (patch #7)
o clarify that "linux,usable-memory-range" and "linux,elfcorehdr"
take values sized by the root node's "#address-cells" and "#size-cells"
(patch #13)
o add "efi/libstub/arm*: Set default address and size cells values for
an empty dtb" from Sameer Goel (patch #14)
(I didn't test the case though.)

Changes for v32 (Feb 7, 2017)
o isolate the crash dump kernel memory, as well as the kernel text/data, by
using the MEMBLOCK_NOMAP attribute, and then map them explicitly in map_mem()
(patch #1,6)
o delete remove_pgd_mapping() and instead modify create_pgd_mapping() to
allow for unmapping a kernel mapping (patch #5)
o correct a commit message as well as a comment in the source (patch #10)
o other trivial changes after Mark's comments (patch #3,4)

Changes for v31 (Feb 1, 2017)
o add/use remove_pgd_mapping() instead of modifying (__)create_pgd_mapping()
to protect crash dump kernel memory (patch #4,5)
o fix an issue with the isolation of crash dump kernel memory in
map_mem()/__map_memblock(), adding map_crashkernel() (patch #5)
o preserve the contents of crash dump kernel memory around hibernation
(patch #6)

Changes for v30 (Jan 24, 2017)
o rebased to Linux-v4.10-rc5
o remove "linux,crashkernel-base/size" from exported device tree
o protect memory region for crash-dump kernel (adding patch#4,5)
o remove "in_crash_kexec" variable
o and other trivial changes

Changes for v29 (Dec 28, 2016)
o rebased to Linux-v4.10-rc1
o change asm constraints in crash_setup_regs() per Catalin

Changes for v28 (Nov 22, 2016)
o rebased to Linux-v4.9-rc6
o revamp patch #1 and merge memblock_cap_memory_range() with
memblock_mem_limit_remove_map()

Changes for v27 (Nov 1, 2016)
o rebased to Linux-v4.9-rc3
o revert v26 change, i.e. revive "linux,usable-memory-range" property
(patch #2/#3, updating patch #9)
o minor fixes per review comments (patch #3/#4/#6/#8)
o re-order patches and improve commit messages for readability

Changes for v26 (Sep 7, 2016):
o Use /reserved-memory instead of "linux,usable-memory-range" property
(dropping v25's patch#2 and #3, updating ex-patch#9.)

Changes for v25 (Aug 29, 2016):
o Rebase to Linux-4.8-rc4
o Use memremap() instead of ioremap_cache() [patch#5]

Changes for v24 (Aug 9, 2016):
o Rebase to Linux-4.8-rc1
o Update descriptions about newly added DT properties

Changes for v23 (July 26, 2016):

o Move memblock_reserve() to a single place in reserve_crashkernel()
o Use cpu_park_loop() in ipi_cpu_crash_stop()
o Always enforce ARCH_LOW_ADDRESS_LIMIT on the memory range of the crash kernel
o Re-implement fdt_enforce_memory_region() to remove non-reserved regions
(for ACPI) from usable memory in the crash kernel

Changes for v22 (July 12, 2016):

o Export "crashkernel-base" and "crashkernel-size" via device-tree,
and add some descriptions about them in chosen.txt
o Rename "usable-memory" to "usable-memory-range" to avoid inconsistency
with powerpc's "usable-memory"
o Make cosmetic changes regarding "ifdef" usage
o Correct some wordings in kdump.txt

Changes for v21 (July 6, 2016):

o Remove kexec patches.
o Rebase to arm64's for-next/core (Linux-4.7-rc4 based).
o Clarify the description about kvm in kdump.txt.

See the link [4] for older changes.


[1] https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
[2] http://lists.infradead.org/pipermail/kexec/2017-March/018356.html
[3] https://github.com/crash-utility/crash.git
[4] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/438780.html

AKASHI Takahiro (12):
memblock: add memblock_clear_nomap()
memblock: add memblock_cap_memory_range()
arm64: limit memory regions based on DT property, usable-memory-range
arm64: kdump: reserve memory for crash dump kernel
arm64: mm: add set_memory_valid()
arm64: kdump: protect crash dump kernel memory
arm64: hibernate: preserve kdump image around hibernation
arm64: kdump: implement machine_crash_shutdown()
arm64: kdump: add VMCOREINFO's for user-space tools
arm64: kdump: provide /proc/vmcore file
arm64: kdump: enable kdump in defconfig
Documentation: kdump: describe arm64 port

James Morse (1):
Documentation: dt: chosen properties for arm64 kdump

Sameer Goel (1):
efi/libstub/arm*: Set default address and size cells values for an
empty dtb

Documentation/devicetree/bindings/chosen.txt | 45 +++++++
Documentation/kdump/kdump.txt | 16 ++-
arch/arm64/Kconfig | 11 ++
arch/arm64/configs/defconfig | 1 +
arch/arm64/include/asm/cacheflush.h | 1 +
arch/arm64/include/asm/hardirq.h | 2 +-
arch/arm64/include/asm/kexec.h | 52 +++++++-
arch/arm64/include/asm/smp.h | 3 +
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/crash_dump.c | 71 +++++++++++
arch/arm64/kernel/hibernate.c | 10 +-
arch/arm64/kernel/machine_kexec.c | 170 +++++++++++++++++++++++--
arch/arm64/kernel/setup.c | 7 +-
arch/arm64/kernel/smp.c | 68 ++++++++++
arch/arm64/mm/init.c | 181 +++++++++++++++++++++++++++
arch/arm64/mm/mmu.c | 90 ++++++-------
arch/arm64/mm/pageattr.c | 15 ++-
drivers/firmware/efi/libstub/fdt.c | 28 ++++-
include/linux/memblock.h | 2 +
mm/memblock.c | 56 ++++++---
20 files changed, 749 insertions(+), 81 deletions(-)
create mode 100644 arch/arm64/kernel/crash_dump.c
--
2.11.1
AKASHI Takahiro
2017-03-28 06:50:32 UTC
This function, used in combination with memblock_mark_nomap(), will be
needed by a later kdump patch for arm64, which temporarily isolates a
range of memory from the other memory blocks in order to create a
specific kernel mapping for it at boot time.
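
As an illustration (not part of the patch), the intended pairing looks
roughly like this, with base/size standing in for the range to be
isolated:

        /* Hide the region from generic iteration over memory blocks,
         * e.g. so a later for_each_memblock(memory, reg) loop skips it.
         */
        memblock_mark_nomap(base, size);

        /* ... create the special-purpose kernel mapping here ... */

        /* Make the region visible as normal memory again. */
        memblock_clear_nomap(base, size);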

Signed-off-by: AKASHI Takahiro <***@linaro.org>
---
include/linux/memblock.h | 1 +
mm/memblock.c | 12 ++++++++++++
2 files changed, 13 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index bdfc65af4152..e82daffcfc44 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -93,6 +93,7 @@ int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
int memblock_mark_nomap(phys_addr_t base, phys_addr_t size);
+int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
ulong choose_memblock_flags(void);

/* Low level functions */
diff --git a/mm/memblock.c b/mm/memblock.c
index 696f06d17c4e..2f4ca8104ea4 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -805,6 +805,18 @@ int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
}

/**
+ * memblock_clear_nomap - Clear flag MEMBLOCK_NOMAP for a specified region.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Return 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
+{
+ return memblock_setclr_flag(base, size, 0, MEMBLOCK_NOMAP);
+}
+
+/**
* __next_reserved_mem_region - next function for for_each_reserved_region()
* @idx: pointer to u64 loop variable
* @out_start: ptr to phys_addr_t for start address of the region, can be %NULL
--
2.11.1
Ard Biesheuvel
2017-03-28 09:47:38 UTC
Post by AKASHI Takahiro
This function, used in combination with memblock_mark_nomap(), will be
needed by a later kdump patch for arm64, which temporarily isolates a
range of memory from the other memory blocks in order to create a
specific kernel mapping for it at boot time.
[...]
AKASHI Takahiro
2017-03-28 06:50:33 UTC
Add memblock_cap_memory_range(), which removes all memblock regions
except those within the memory range specified by the arguments. In
addition, rework memblock_mem_limit_remove_map() to re-implement it on
top of memblock_cap_memory_range().

This function, like memblock_mem_limit_remove_map(), will not remove
memblocks with the MEMBLOCK_NOMAP attribute, as they may be mapped and
accessed later as "device memory."
See the commit a571d4eb55d8 ("mm/memblock.c: add new infrastructure to
address the mem limit issue").

This function is used, in a succeeding patch of this arm64 kdump
series, to limit the range of usable memory, or System RAM, on the
crash dump kernel.
(Please note that the "mem=" parameter is of little use for this purpose.)
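
As a hedged illustration (the addresses are made up; the actual caller
appears later in this series), capping System RAM to a single window
would look like:

        /* Keep only [0x80000000, 0x80000000 + SZ_512M) as System RAM.
         * Regions marked MEMBLOCK_NOMAP are preserved so that firmware
         * tables remain accessible as device memory.
         */
        memblock_cap_memory_range(0x80000000, SZ_512M);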

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: Will Deacon <***@arm.com>
Acked-by: Catalin Marinas <***@arm.com>
Acked-by: Dennis Chen <***@arm.com>
Cc: linux-***@kvack.org
Cc: Andrew Morton <***@linux-foundation.org>
---
include/linux/memblock.h | 1 +
mm/memblock.c | 44 +++++++++++++++++++++++++++++---------------
2 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index e82daffcfc44..4ce24a376262 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -336,6 +336,7 @@ phys_addr_t memblock_mem_size(unsigned long limit_pfn);
phys_addr_t memblock_start_of_DRAM(void);
phys_addr_t memblock_end_of_DRAM(void);
void memblock_enforce_memory_limit(phys_addr_t memory_limit);
+void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
void memblock_mem_limit_remove_map(phys_addr_t limit);
bool memblock_is_memory(phys_addr_t addr);
int memblock_is_map_memory(phys_addr_t addr);
diff --git a/mm/memblock.c b/mm/memblock.c
index 2f4ca8104ea4..b049c9b2dba8 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1543,11 +1543,37 @@ void __init memblock_enforce_memory_limit(phys_addr_t limit)
(phys_addr_t)ULLONG_MAX);
}

+void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t size)
+{
+ int start_rgn, end_rgn;
+ int i, ret;
+
+ if (!size)
+ return;
+
+ ret = memblock_isolate_range(&memblock.memory, base, size,
+ &start_rgn, &end_rgn);
+ if (ret)
+ return;
+
+ /* remove all the MAP regions */
+ for (i = memblock.memory.cnt - 1; i >= end_rgn; i--)
+ if (!memblock_is_nomap(&memblock.memory.regions[i]))
+ memblock_remove_region(&memblock.memory, i);
+
+ for (i = start_rgn - 1; i >= 0; i--)
+ if (!memblock_is_nomap(&memblock.memory.regions[i]))
+ memblock_remove_region(&memblock.memory, i);
+
+ /* truncate the reserved regions */
+ memblock_remove_range(&memblock.reserved, 0, base);
+ memblock_remove_range(&memblock.reserved,
+ base + size, (phys_addr_t)ULLONG_MAX);
+}
+
void __init memblock_mem_limit_remove_map(phys_addr_t limit)
{
- struct memblock_type *type = &memblock.memory;
phys_addr_t max_addr;
- int i, ret, start_rgn, end_rgn;

if (!limit)
return;
@@ -1558,19 +1584,7 @@ void __init memblock_mem_limit_remove_map(phys_addr_t limit)
if (max_addr == (phys_addr_t)ULLONG_MAX)
return;

- ret = memblock_isolate_range(type, max_addr, (phys_addr_t)ULLONG_MAX,
- &start_rgn, &end_rgn);
- if (ret)
- return;
-
- /* remove all the MAP regions above the limit */
- for (i = end_rgn - 1; i >= start_rgn; i--) {
- if (!memblock_is_nomap(&type->regions[i]))
- memblock_remove_region(type, i);
- }
- /* truncate the reserved regions */
- memblock_remove_range(&memblock.reserved, max_addr,
- (phys_addr_t)ULLONG_MAX);
+ memblock_cap_memory_range(0, max_addr);
}

static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
--
2.11.1
Ard Biesheuvel
2017-03-28 09:48:42 UTC
Post by AKASHI Takahiro
Add memblock_cap_memory_range(), which removes all memblock regions
except those within the memory range specified by the arguments. In
addition, rework memblock_mem_limit_remove_map() to re-implement it on
top of memblock_cap_memory_range().
This function, like memblock_mem_limit_remove_map(), will not remove
memblocks with the MEMBLOCK_NOMAP attribute, as they may be mapped and
accessed later as "device memory."
See the commit a571d4eb55d8 ("mm/memblock.c: add new infrastructure to
address the mem limit issue").
This function is used, in a succeeding patch of this arm64 kdump
series, to limit the range of usable memory, or System RAM, on the
crash dump kernel.
(Please note that the "mem=" parameter is of little use for this purpose.)
[...]
AKASHI Takahiro
2017-03-28 06:51:21 UTC
The crash dump kernel uses only a limited range of the available memory
as System RAM. On arm64 kdump, this memory range is advertised to the
crash dump kernel via a device-tree property under /chosen,
linux,usable-memory-range = <BASE SIZE>

The crash dump kernel reads this property at boot time and calls
memblock_cap_memory_range() to limit the usable memory, which is listed
either in the UEFI memory map or in the "memory" nodes of a device tree
blob.
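
For example, a hypothetical /chosen entry generated by the kexec-tools
patches might look like this (values made up, assuming two address
cells and two size cells in the root node):

        chosen {
                /* 256MB of usable memory at 0xc0000000 */
                linux,usable-memory-range = <0x0 0xc0000000 0x0 0x10000000>;
        };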

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: Geoff Levand <***@infradead.org>
Acked-by: Catalin Marinas <***@arm.com>
Acked-by: Mark Rutland <***@arm.com>
---
arch/arm64/mm/init.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index e19e06593e37..290794b1a0f1 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -188,10 +188,45 @@ static int __init early_mem(char *p)
}
early_param("mem", early_mem);

+static int __init early_init_dt_scan_usablemem(unsigned long node,
+ const char *uname, int depth, void *data)
+{
+ struct memblock_region *usablemem = data;
+ const __be32 *reg;
+ int len;
+
+ if (depth != 1 || strcmp(uname, "chosen") != 0)
+ return 0;
+
+ reg = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
+ if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+ return 1;
+
+ usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+ usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+ return 1;
+}
+
+static void __init fdt_enforce_memory_region(void)
+{
+ struct memblock_region reg = {
+ .size = 0,
+ };
+
+ of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
+
+ if (reg.size)
+ memblock_cap_memory_range(reg.base, reg.size);
+}
+
void __init arm64_memblock_init(void)
{
const s64 linear_region_size = -(s64)PAGE_OFFSET;

+ /* Handle linux,usable-memory-range property */
+ fdt_enforce_memory_region();
+
/*
* Ensure that the linear region takes up exactly half of the kernel
* virtual address space. This way, we can distinguish a linear address
--
2.11.1
Ard Biesheuvel
2017-03-28 09:50:31 UTC
Post by AKASHI Takahiro
The crash dump kernel uses only a limited range of the available memory
as System RAM. On arm64 kdump, this memory range is advertised to the
crash dump kernel via a device-tree property under /chosen,
linux,usable-memory-range = <BASE SIZE>
The crash dump kernel reads this property at boot time and calls
memblock_cap_memory_range() to limit the usable memory, which is listed
either in the UEFI memory map or in the "memory" nodes of a device tree
blob.
[...]
AKASHI Takahiro
2017-03-28 06:51:24 UTC
arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
are meant to be called by kexec_load() in order to protect the memory
allocated for crash dump kernel once the image is loaded.

The protection is implemented by unmapping the relevant segments in crash
dump kernel memory, rather than making it read-only as other archs do,
to prevent any corruption due to potential cache alias (with different
attributes) problem.

Page-level mappings are consistently used here so that we can change
the attributes of segments at page granularity, as well as shrink the
region, also at page granularity, through /sys/kernel/kexec_crash_size,
putting the freed memory back to the buddy system.
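
In effect, (un)protection toggles the valid bit on the linear-map
entries covering each loaded segment. A rough sketch of the pattern
used in the patch below, where seg is a hypothetical pointer to one
kexec segment:

        /* Protect: invalidate the linear mapping so that any stray
         * write to the loaded image faults instead of corrupting it.
         */
        set_memory_valid(__phys_to_virt(seg->mem),
                         seg->memsz >> PAGE_SHIFT, 0);

        /* Unprotect: make the mapping valid again before the image
         * is read or reloaded.
         */
        set_memory_valid(__phys_to_virt(seg->mem),
                         seg->memsz >> PAGE_SHIFT, 1);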

Signed-off-by: AKASHI Takahiro <***@linaro.org>
---
arch/arm64/kernel/machine_kexec.c | 32 +++++++++++---
arch/arm64/mm/mmu.c | 90 ++++++++++++++++++++-------------------
2 files changed, 72 insertions(+), 50 deletions(-)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index bc96c8a7fc79..b63baa749609 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -14,7 +14,9 @@

#include <asm/cacheflush.h>
#include <asm/cpu_ops.h>
+#include <asm/mmu.h>
#include <asm/mmu_context.h>
+#include <asm/page.h>

#include "cpu-reset.h"

@@ -22,8 +24,6 @@
extern const unsigned char arm64_relocate_new_kernel[];
extern const unsigned long arm64_relocate_new_kernel_size;

-static unsigned long kimage_start;
-
/**
* kexec_image_info - For debugging output.
*/
@@ -64,8 +64,6 @@ void machine_kexec_cleanup(struct kimage *kimage)
*/
int machine_kexec_prepare(struct kimage *kimage)
{
- kimage_start = kimage->start;
-
kexec_image_info(kimage);

if (kimage->type != KEXEC_TYPE_CRASH && cpus_are_stuck_in_kernel()) {
@@ -183,7 +181,7 @@ void machine_kexec(struct kimage *kimage)
kexec_list_flush(kimage);

/* Flush the new image if already in place. */
- if (kimage->head & IND_DONE)
+ if ((kimage != kexec_crash_image) && (kimage->head & IND_DONE))
kexec_segment_flush(kimage);

pr_info("Bye!\n");
@@ -201,7 +199,7 @@ void machine_kexec(struct kimage *kimage)
*/

cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
- kimage_start, 0);
+ kimage->start, 0);

BUG(); /* Should never get here. */
}
@@ -210,3 +208,25 @@ void machine_crash_shutdown(struct pt_regs *regs)
{
/* Empty routine needed to avoid build errors. */
}
+
+void arch_kexec_protect_crashkres(void)
+{
+ int i;
+
+ kexec_segment_flush(kexec_crash_image);
+
+ for (i = 0; i < kexec_crash_image->nr_segments; i++)
+ set_memory_valid(
+ __phys_to_virt(kexec_crash_image->segment[i].mem),
+ kexec_crash_image->segment[i].memsz >> PAGE_SHIFT, 0);
+}
+
+void arch_kexec_unprotect_crashkres(void)
+{
+ int i;
+
+ for (i = 0; i < kexec_crash_image->nr_segments; i++)
+ set_memory_valid(
+ __phys_to_virt(kexec_crash_image->segment[i].mem),
+ kexec_crash_image->segment[i].memsz >> PAGE_SHIFT, 1);
+}
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d28dbcf596b6..f6a3c0e9d37f 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -22,6 +22,8 @@
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/init.h>
+#include <linux/ioport.h>
+#include <linux/kexec.h>
#include <linux/libfdt.h>
#include <linux/mman.h>
#include <linux/nodemask.h>
@@ -332,56 +334,31 @@ static void create_mapping_late(phys_addr_t phys, unsigned long virt,
NULL, debug_pagealloc_enabled());
}

-static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end)
+static void __init __map_memblock(pgd_t *pgd, phys_addr_t start,
+ phys_addr_t end, pgprot_t prot,
+ bool page_mappings_only)
+{
+ __create_pgd_mapping(pgd, start, __phys_to_virt(start), end - start,
+ prot, early_pgtable_alloc,
+ page_mappings_only);
+}
+
+static void __init map_mem(pgd_t *pgd)
{
phys_addr_t kernel_start = __pa_symbol(_text);
phys_addr_t kernel_end = __pa_symbol(__init_begin);
+ struct memblock_region *reg;

/*
- * Take care not to create a writable alias for the
- * read-only text and rodata sections of the kernel image.
+ * Temporarily marked as NOMAP to skip mapping in the next for-loop
*/
+ memblock_mark_nomap(kernel_start, kernel_end - kernel_start);

- /* No overlap with the kernel text/rodata */
- if (end < kernel_start || start >= kernel_end) {
- __create_pgd_mapping(pgd, start, __phys_to_virt(start),
- end - start, PAGE_KERNEL,
- early_pgtable_alloc,
- debug_pagealloc_enabled());
- return;
- }
-
- /*
- * This block overlaps the kernel text/rodata mappings.
- * Map the portion(s) which don't overlap.
- */
- if (start < kernel_start)
- __create_pgd_mapping(pgd, start,
- __phys_to_virt(start),
- kernel_start - start, PAGE_KERNEL,
- early_pgtable_alloc,
- debug_pagealloc_enabled());
- if (kernel_end < end)
- __create_pgd_mapping(pgd, kernel_end,
- __phys_to_virt(kernel_end),
- end - kernel_end, PAGE_KERNEL,
- early_pgtable_alloc,
- debug_pagealloc_enabled());
-
- /*
- * Map the linear alias of the [_text, __init_begin) interval as
- * read-only/non-executable. This makes the contents of the
- * region accessible to subsystems such as hibernate, but
- * protects it from inadvertent modification or execution.
- */
- __create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
- kernel_end - kernel_start, PAGE_KERNEL_RO,
- early_pgtable_alloc, debug_pagealloc_enabled());
-}
-
-static void __init map_mem(pgd_t *pgd)
-{
- struct memblock_region *reg;
+#ifdef CONFIG_KEXEC_CORE
+ if (crashk_res.end)
+ memblock_mark_nomap(crashk_res.start,
+ resource_size(&crashk_res));
+#endif

/* map all the memory banks */
for_each_memblock(memory, reg) {
@@ -393,8 +370,33 @@ static void __init map_mem(pgd_t *pgd)
if (memblock_is_nomap(reg))
continue;

- __map_memblock(pgd, start, end);
+ __map_memblock(pgd, start, end,
+ PAGE_KERNEL, debug_pagealloc_enabled());
+ }
+
+ /*
+ * Map the linear alias of the [_text, __init_begin) interval as
+ * read-only/non-executable. This makes the contents of the
+ * region accessible to subsystems such as hibernate, but
+ * protects it from inadvertent modification or execution.
+ */
+ __map_memblock(pgd, kernel_start, kernel_end,
+ PAGE_KERNEL_RO, debug_pagealloc_enabled());
+ memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
+
+#ifdef CONFIG_KEXEC_CORE
+ /*
+ * Use page-level mappings here so that we can shrink the region
+ * at page granularity and put unused memory back to the buddy system
+ * through /sys/kernel/kexec_crash_size interface.
+ */
+ if (crashk_res.end) {
+ __map_memblock(pgd, crashk_res.start, crashk_res.end + 1,
+ PAGE_KERNEL, true);
+ memblock_clear_nomap(crashk_res.start,
+ resource_size(&crashk_res));
}
+#endif
}

void mark_rodata_ro(void)
--
2.11.1
Ard Biesheuvel
2017-03-28 10:07:05 UTC
Post by AKASHI Takahiro
arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
are meant to be called by kexec_load() in order to protect the memory
allocated for crash dump kernel once the image is loaded.
The protection is implemented by unmapping the relevant segments in crash
dump kernel memory, rather than making it read-only as other archs do,
to prevent any corruption due to potential cache alias (with different
attributes) problem.
I think it would be more accurate to replace 'corruption' with
'coherency issues', given that this patch does not solve the issue of
writable aliases that may be used to modify the contents of the
region, but it does prevent issues related to mismatched attributes
(which are arguably a bigger concern)
Post by AKASHI Takahiro
Page-level mappings are consistently used here so that we can change
the attributes of segments at page granularity, as well as shrink the
region, also at page granularity, through /sys/kernel/kexec_crash_size,
putting the freed memory back to the buddy system.
As a heads-up, this patch is going to conflict heavily with patches
that are queued up in arm64/for-next/core atm.

Some questions below.
Post by AKASHI Takahiro
[...]
/*
- * Take care not to create a writable alias for the
- * read-only text and rodata sections of the kernel image.
+ * Temporarily marked as NOMAP to skip mapping in the next for-loop
*/
+ memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
OK, so the trick is to mark a memblock region NOMAP temporarily, so
that we can iterate over the regions more easily?
Is that the sole reason for using NOMAP in this series?
[...]
AKASHI Takahiro
2017-03-28 11:07:34 UTC
Ard,
Post by Ard Biesheuvel
Post by AKASHI Takahiro
arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
are meant to be called by kexec_load() in order to protect the memory
allocated for crash dump kernel once the image is loaded.
The protection is implemented by unmapping the relevant segments in crash
dump kernel memory, rather than making it read-only as other archs do,
to prevent any corruption due to potential cache alias (with different
attributes) problem.
I think it would be more accurate to replace 'corruption' with
'coherency issues', given that this patch does not solve the issue of
writable aliases that may be used to modify the contents of the
region, but it does prevent issues related to mismatched attributes
(which are arguably a bigger concern)
OK
Post by Ard Biesheuvel
Post by AKASHI Takahiro
Page-level mappings are consistently used here so that we can change
the attributes of segments at page granularity, as well as shrink the
region, also at page granularity, through /sys/kernel/kexec_crash_size,
putting the freed memory back to the buddy system.
As a heads-up, this patch is going to conflict heavily with patches
that are queued up in arm64/for-next/core atm.
I'll look into it later, but
Post by Ard Biesheuvel
Some questions below.
Post by AKASHI Takahiro
[...]
/*
- * Take care not to create a writable alias for the
- * read-only text and rodata sections of the kernel image.
+ * Temporarily marked as NOMAP to skip mapping in the next for-loop
*/
+ memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
OK, so the trick is to mark a memblock region NOMAP temporarily, so
that we can iterate over the regions more easily?
Is that the sole reason for using NOMAP in this series?
Yes. (I followed Mark's suggestion.)

So I assume that my change here will be essentially orthogonal
to the changes in for-next/core, at least, in its intent.

Thanks,
-Takahiro AKASHI
Ard Biesheuvel
2017-03-28 14:05:58 UTC
Post by AKASHI Takahiro
Ard,
Post by Ard Biesheuvel
Post by AKASHI Takahiro
arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
are meant to be called by kexec_load() in order to protect the memory
allocated for crash dump kernel once the image is loaded.
The protection is implemented by unmapping the relevant segments in crash
dump kernel memory, rather than making it read-only as other archs do,
to prevent any corruption due to potential cache alias (with different
attributes) problem.
I think it would be more accurate to replace 'corruption' with
'coherency issues', given that this patch does not solve the issue of
writable aliases that may be used to modify the contents of the
region, but it does prevent issues related to mismatched attributes
(which are arguably a bigger concern)
OK
Post by Ard Biesheuvel
Post by AKASHI Takahiro
Page-level mappings are consistently used here so that we can change
the attributes of segments at page granularity, as well as shrink the
region, also at page granularity, through /sys/kernel/kexec_crash_size,
putting the freed memory back to the buddy system.
As a heads-up, this patch is going to conflict heavily with patches
that are queued up in arm64/for-next/core atm.
I'll look into it later, but
Post by Ard Biesheuvel
Some questions below.
Post by AKASHI Takahiro
[...]
/*
- * Take care not to create a writable alias for the
- * read-only text and rodata sections of the kernel image.
+ * Temporarily marked as NOMAP to skip mapping in the next for-loop
*/
+ memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
OK, so the trick is to mark a memblock region NOMAP temporarily, so
that we can iterate over the regions more easily?
Is that the sole reason for using NOMAP in this series?
Yes. (I followed Mark's suggestion.)
OK. It is slightly hacky, but it should work without any problems afaict
Post by AKASHI Takahiro
So I assume that my change here will be essentially orthogonal
to the changes in for-next/core, at least, in its intent.
Yes. The changes should not conflict in fundamental ways, but the code
has changed in ways that git will not be able to deal with.
Unfortunately, that does mean another respin :-(
AKASHI Takahiro
2017-03-30 09:56:27 UTC
Ard,
Post by Ard Biesheuvel
Post by AKASHI Takahiro
Ard,
Post by Ard Biesheuvel
Post by AKASHI Takahiro
arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
are meant to be called by kexec_load() in order to protect the memory
allocated for crash dump kernel once the image is loaded.
The protection is implemented by unmapping the relevant segments in crash
dump kernel memory, rather than making it read-only as other archs do,
to prevent any corruption due to potential cache alias (with different
attributes) problem.
I think it would be more accurate to replace 'corruption' with
'coherency issues', given that this patch does not solve the issue of
writable aliases that may be used to modify the contents of the
region, but it does prevent issues related to mismatched attributes
(which are arguably a bigger concern)
OK
Post by Ard Biesheuvel
Post by AKASHI Takahiro
Page-level mappings are consistently used here so that we can change
the attributes of segments at page granularity, as well as shrink the
region, also at page granularity, through /sys/kernel/kexec_crash_size,
putting the freed memory back to the buddy system.
As a heads-up, this patch is going to conflict heavily with patches
that are queued up in arm64/for-next/core atm.
I'll look into it later, but
Post by Ard Biesheuvel
Some questions below.
Post by AKASHI Takahiro
[...]
/*
- * Take care not to create a writable alias for the
- * read-only text and rodata sections of the kernel image.
+ * Temporarily marked as NOMAP to skip mapping in the next for-loop
*/
+ memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
OK, so the trick is to mark a memblock region NOMAP temporarily, so
that we can iterate over the regions more easily?
Is that the sole reason for using NOMAP in this series?
Yes. (I followed Mark's suggestion.)
OK. It is slightly hacky, but it should work without any problems afaict
Post by AKASHI Takahiro
So I assume that my change here will be essentially orthogonal
to the changes in for-next/core, at least, in its intent.
Yes. The changes should not conflict in fundamental ways, but the code
has changed in ways that git will not be able to deal with.
Unfortunately, that does mean another respin :-(
Can you please review the patch attached below?
The hunks look a bit complex, but the resulting code is good, I believe.

If you are happy with it, I will post the entire patchset as v35.

Thanks,
-Takahiro AKASHI

===8<===
From 5b546ab2a755cc2abf858c2abbd5887cdcbb31fd Mon Sep 17 00:00:00 2001
From: Takahiro Akashi <***@linaro.org>
Date: Tue, 28 Mar 2017 15:51:24 +0900
Subject: [PATCH] arm64: kdump: protect crash dump kernel memory

arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
are meant to be called by kexec_load() in order to protect the memory
allocated for crash dump kernel once the image is loaded.

The protection is implemented by unmapping the relevant segments in crash
dump kernel memory, rather than making it read-only as other archs do,
to prevent coherency issues due to potential cache aliasing (with
mismatched attributes).

Page-level mappings are consistently used here so that we can change
the attributes of segments at page granularity, as well as shrink the
region, also at page granularity, through /sys/kernel/kexec_crash_size,
putting the freed memory back to the buddy system.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
---
arch/arm64/kernel/machine_kexec.c | 32 +++++++++---
arch/arm64/mm/mmu.c | 103 ++++++++++++++++++++------------------
2 files changed, 80 insertions(+), 55 deletions(-)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index bc96c8a7fc79..b63baa749609 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -14,7 +14,9 @@

#include <asm/cacheflush.h>
#include <asm/cpu_ops.h>
+#include <asm/mmu.h>
#include <asm/mmu_context.h>
+#include <asm/page.h>

#include "cpu-reset.h"

@@ -22,8 +24,6 @@
extern const unsigned char arm64_relocate_new_kernel[];
extern const unsigned long arm64_relocate_new_kernel_size;

-static unsigned long kimage_start;
-
/**
* kexec_image_info - For debugging output.
*/
@@ -64,8 +64,6 @@ void machine_kexec_cleanup(struct kimage *kimage)
*/
int machine_kexec_prepare(struct kimage *kimage)
{
- kimage_start = kimage->start;
-
kexec_image_info(kimage);

if (kimage->type != KEXEC_TYPE_CRASH && cpus_are_stuck_in_kernel()) {
@@ -183,7 +181,7 @@ void machine_kexec(struct kimage *kimage)
kexec_list_flush(kimage);

/* Flush the new image if already in place. */
- if (kimage->head & IND_DONE)
+ if ((kimage != kexec_crash_image) && (kimage->head & IND_DONE))
kexec_segment_flush(kimage);

pr_info("Bye!\n");
@@ -201,7 +199,7 @@ void machine_kexec(struct kimage *kimage)
*/

cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
- kimage_start, 0);
+ kimage->start, 0);

BUG(); /* Should never get here. */
}
@@ -210,3 +208,25 @@ void machine_crash_shutdown(struct pt_regs *regs)
{
/* Empty routine needed to avoid build errors. */
}
+
+void arch_kexec_protect_crashkres(void)
+{
+ int i;
+
+ kexec_segment_flush(kexec_crash_image);
+
+ for (i = 0; i < kexec_crash_image->nr_segments; i++)
+ set_memory_valid(
+ __phys_to_virt(kexec_crash_image->segment[i].mem),
+ kexec_crash_image->segment[i].memsz >> PAGE_SHIFT, 0);
+}
+
+void arch_kexec_unprotect_crashkres(void)
+{
+ int i;
+
+ for (i = 0; i < kexec_crash_image->nr_segments; i++)
+ set_memory_valid(
+ __phys_to_virt(kexec_crash_image->segment[i].mem),
+ kexec_crash_image->segment[i].memsz >> PAGE_SHIFT, 1);
+}
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 91502e36e6d9..3cde5f2d30ec 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -22,6 +22,8 @@
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/init.h>
+#include <linux/ioport.h>
+#include <linux/kexec.h>
#include <linux/libfdt.h>
#include <linux/mman.h>
#include <linux/nodemask.h>
@@ -393,10 +395,28 @@ static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
flush_tlb_kernel_range(virt, virt + size);
}

-static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end)
+static void __init __map_memblock(pgd_t *pgd, phys_addr_t start,
+ phys_addr_t end, pgprot_t prot, int flags)
+{
+ __create_pgd_mapping(pgd, start, __phys_to_virt(start), end - start,
+ prot, early_pgtable_alloc, flags);
+}
+
+void __init mark_linear_text_alias_ro(void)
+{
+ /*
+ * Remove the write permissions from the linear alias of .text/.rodata
+ */
+ update_mapping_prot(__pa_symbol(_text), (unsigned long)lm_alias(_text),
+ (unsigned long)__init_begin - (unsigned long)_text,
+ PAGE_KERNEL_RO);
+}
+
+static void __init map_mem(pgd_t *pgd)
{
phys_addr_t kernel_start = __pa_symbol(_text);
phys_addr_t kernel_end = __pa_symbol(__init_begin);
+ struct memblock_region *reg;
int flags = 0;

if (debug_pagealloc_enabled())
@@ -405,30 +425,28 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
/*
* Take care not to create a writable alias for the
* read-only text and rodata sections of the kernel image.
+ * So temporarily mark them as NOMAP to skip mappings in
+ * the following for-loop
*/
+ memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
+#ifdef CONFIG_KEXEC_CORE
+ if (crashk_res.end)
+ memblock_mark_nomap(crashk_res.start,
+ resource_size(&crashk_res));
+#endif

- /* No overlap with the kernel text/rodata */
- if (end < kernel_start || start >= kernel_end) {
- __create_pgd_mapping(pgd, start, __phys_to_virt(start),
- end - start, PAGE_KERNEL,
- early_pgtable_alloc, flags);
- return;
- }
+ /* map all the memory banks */
+ for_each_memblock(memory, reg) {
+ phys_addr_t start = reg->base;
+ phys_addr_t end = start + reg->size;

- /*
- * This block overlaps the kernel text/rodata mappings.
- * Map the portion(s) which don't overlap.
- */
- if (start < kernel_start)
- __create_pgd_mapping(pgd, start,
- __phys_to_virt(start),
- kernel_start - start, PAGE_KERNEL,
- early_pgtable_alloc, flags);
- if (kernel_end < end)
- __create_pgd_mapping(pgd, kernel_end,
- __phys_to_virt(kernel_end),
- end - kernel_end, PAGE_KERNEL,
- early_pgtable_alloc, flags);
+ if (start >= end)
+ break;
+ if (memblock_is_nomap(reg))
+ continue;
+
+ __map_memblock(pgd, start, end, PAGE_KERNEL, flags);
+ }

/*
* Map the linear alias of the [_text, __init_begin) interval
@@ -440,37 +458,24 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
* Note that contiguous mappings cannot be remapped in this way,
* so we should avoid them here.
*/
- __create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
- kernel_end - kernel_start, PAGE_KERNEL,
- early_pgtable_alloc, NO_CONT_MAPPINGS);
-}
+ __map_memblock(pgd, kernel_start, kernel_end,
+ PAGE_KERNEL_RO, NO_CONT_MAPPINGS);
+ memblock_clear_nomap(kernel_start, kernel_end - kernel_start);

-void __init mark_linear_text_alias_ro(void)
-{
+#ifdef CONFIG_KEXEC_CORE
/*
- * Remove the write permissions from the linear alias of .text/.rodata
+ * Use page-level mappings here so that we can shrink the region
+ * at page granularity and put unused memory back to the buddy
+ * system through the /sys/kernel/kexec_crash_size interface.
*/
- update_mapping_prot(__pa_symbol(_text), (unsigned long)lm_alias(_text),
- (unsigned long)__init_begin - (unsigned long)_text,
- PAGE_KERNEL_RO);
-}
-
-static void __init map_mem(pgd_t *pgd)
-{
- struct memblock_region *reg;
-
- /* map all the memory banks */
- for_each_memblock(memory, reg) {
- phys_addr_t start = reg->base;
- phys_addr_t end = start + reg->size;
-
- if (start >= end)
- break;
- if (memblock_is_nomap(reg))
- continue;
-
- __map_memblock(pgd, start, end);
+ if (crashk_res.end) {
+ __map_memblock(pgd, crashk_res.start, crashk_res.end + 1,
+ PAGE_KERNEL,
+ NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
+ memblock_clear_nomap(crashk_res.start,
+ resource_size(&crashk_res));
}
+#endif
}

void mark_rodata_ro(void)
--
2.11.1
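
(For readers following the series: the two hooks above are driven by the
generic kexec core. A simplified paraphrase of the call sites in
kernel/kexec.c -- not the literal upstream code -- looks like:

    /* sys_kexec_load(), simplified sketch */
    if (flags & KEXEC_ON_CRASH) {
            dest_image = &kexec_crash_image;
            if (kexec_crash_image)
                    /* remap the region so new segments can be written */
                    arch_kexec_unprotect_crashkres();
    }
    ...
    for (i = 0; i < nr_segments; i++)
            kimage_load_segment(image, &image->segment[i]);
    kimage_terminate(image);
    image = xchg(dest_image, image);  /* install the new image */
    if ((flags & KEXEC_ON_CRASH) && kexec_crash_image)
            /* loading is done; unmap the region again */
            arch_kexec_protect_crashkres();

so the crashkernel region is only left mapped while a load is actually
in progress.)
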
Ard Biesheuvel
2017-03-30 13:58:14 UTC
Permalink
Post by AKASHI Takahiro
Date: Tue, 28 Mar 2017 15:51:24 +0900
Subject: [PATCH] arm64: kdump: protect crash dump kernel memory
arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
are meant to be called by kexec_load() in order to protect the memory
allocated for crash dump kernel once the image is loaded.
The protection is implemented by unmapping the relevant segments in crash
dump kernel memory, rather than making it read-only as other archs do,
to prevent coherency issues due to potential cache aliasing (with
mismatched attributes).
Page-level mappings are consistently used here so that we can change
the attributes of segments at page granularity, and likewise shrink the
region at page granularity through /sys/kernel/kexec_crash_size,
returning the freed memory to the buddy system.
This looks mostly correct. One comment below.
Post by AKASHI Takahiro
[...]
- __create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
- kernel_end - kernel_start, PAGE_KERNEL,
- early_pgtable_alloc, NO_CONT_MAPPINGS);
-}
+ __map_memblock(pgd, kernel_start, kernel_end,
+ PAGE_KERNEL_RO, NO_CONT_MAPPINGS);
This should be PAGE_KERNEL not PAGE_KERNEL_RO.
Post by AKASHI Takahiro
[...]
AKASHI Takahiro
2017-04-03 02:28:44 UTC
Permalink
Post by Ard Biesheuvel
Post by AKASHI Takahiro
[...]
+ __map_memblock(pgd, kernel_start, kernel_end,
+ PAGE_KERNEL_RO, NO_CONT_MAPPINGS);
This should be PAGE_KERNEL not PAGE_KERNEL_RO.
Ah, yes. Fixed it.

Thanks,
-Takahiro AKASHI
AKASHI Takahiro
2017-03-28 06:51:26 UTC
Permalink
The primary kernel calls machine_crash_shutdown() to shut down non-boot
cpus and save register state in per-cpu ELF notes before starting the
crash dump kernel. See kernel_kexec().
Even if not all secondary cpus have shut down, kdump proceeds anyway.

As we don't need to take the non-boot (crashed) cpus offline before
shutting them down (this preserves the correct status of the cpus at the
time of the crash), this patch also adds a dedicated variant of
smp_send_stop().

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: James Morse <***@arm.com>
Acked-by: Catalin Marinas <***@arm.com>
---
arch/arm64/include/asm/hardirq.h | 2 +-
arch/arm64/include/asm/kexec.h | 42 +++++++++++++++++++++++-
arch/arm64/include/asm/smp.h | 3 ++
arch/arm64/kernel/machine_kexec.c | 58 ++++++++++++++++++++++++++++++---
arch/arm64/kernel/smp.c | 68 +++++++++++++++++++++++++++++++++++++++
5 files changed, 167 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
index 8740297dac77..1473fc2f7ab7 100644
--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -20,7 +20,7 @@
#include <linux/threads.h>
#include <asm/irq.h>

-#define NR_IPI 6
+#define NR_IPI 7

typedef struct {
unsigned int __softirq_pending;
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 90aabbe893b7..e17f0529a882 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -40,7 +40,47 @@
static inline void crash_setup_regs(struct pt_regs *newregs,
struct pt_regs *oldregs)
{
- /* Empty routine needed to avoid build errors. */
+ if (oldregs) {
+ memcpy(newregs, oldregs, sizeof(*newregs));
+ } else {
+ u64 tmp1, tmp2;
+
+ __asm__ __volatile__ (
+ "stp x0, x1, [%2, #16 * 0]\n"
+ "stp x2, x3, [%2, #16 * 1]\n"
+ "stp x4, x5, [%2, #16 * 2]\n"
+ "stp x6, x7, [%2, #16 * 3]\n"
+ "stp x8, x9, [%2, #16 * 4]\n"
+ "stp x10, x11, [%2, #16 * 5]\n"
+ "stp x12, x13, [%2, #16 * 6]\n"
+ "stp x14, x15, [%2, #16 * 7]\n"
+ "stp x16, x17, [%2, #16 * 8]\n"
+ "stp x18, x19, [%2, #16 * 9]\n"
+ "stp x20, x21, [%2, #16 * 10]\n"
+ "stp x22, x23, [%2, #16 * 11]\n"
+ "stp x24, x25, [%2, #16 * 12]\n"
+ "stp x26, x27, [%2, #16 * 13]\n"
+ "stp x28, x29, [%2, #16 * 14]\n"
+ "mov %0, sp\n"
+ "stp x30, %0, [%2, #16 * 15]\n"
+
+ "/* faked current PSTATE */\n"
+ "mrs %0, CurrentEL\n"
+ "mrs %1, SPSEL\n"
+ "orr %0, %0, %1\n"
+ "mrs %1, DAIF\n"
+ "orr %0, %0, %1\n"
+ "mrs %1, NZCV\n"
+ "orr %0, %0, %1\n"
+ /* pc */
+ "adr %1, 1f\n"
+ "1:\n"
+ "stp %1, %0, [%2, #16 * 16]\n"
+ : "=&r" (tmp1), "=&r" (tmp2)
+ : "r" (newregs)
+ : "memory"
+ );
+ }
}

#if defined(CONFIG_KEXEC_CORE) && defined(CONFIG_HIBERNATION)
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index d050d720a1b4..55f08c5acfad 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -148,6 +148,9 @@ static inline void cpu_panic_kernel(void)
*/
bool cpus_are_stuck_in_kernel(void);

+extern void smp_send_crash_stop(void);
+extern bool smp_crash_stop_failed(void);
+
#endif /* ifndef __ASSEMBLY__ */

#endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index a6d66b98d795..779a80046066 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -9,6 +9,9 @@
* published by the Free Software Foundation.
*/

+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
#include <linux/kexec.h>
#include <linux/page-flags.h>
#include <linux/smp.h>
@@ -143,11 +146,15 @@ void machine_kexec(struct kimage *kimage)
{
phys_addr_t reboot_code_buffer_phys;
void *reboot_code_buffer;
+ bool in_kexec_crash = (kimage == kexec_crash_image);
+ bool stuck_cpus = cpus_are_stuck_in_kernel();

/*
* New cpus may have become stuck_in_kernel after we loaded the image.
*/
- BUG_ON(cpus_are_stuck_in_kernel() || (num_online_cpus() > 1));
+ BUG_ON(!in_kexec_crash && (stuck_cpus || (num_online_cpus() > 1)));
+ WARN(in_kexec_crash && (stuck_cpus || smp_crash_stop_failed()),
+ "Some CPUs may be stale, kdump will be unreliable.\n");

reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
@@ -199,15 +206,58 @@ void machine_kexec(struct kimage *kimage)
* relocation is complete.
*/

- cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
- kimage->start, 0);
+ cpu_soft_restart(kimage != kexec_crash_image,
+ reboot_code_buffer_phys, kimage->head, kimage->start, 0);

BUG(); /* Should never get here. */
}

+static void machine_kexec_mask_interrupts(void)
+{
+ unsigned int i;
+ struct irq_desc *desc;
+
+ for_each_irq_desc(i, desc) {
+ struct irq_chip *chip;
+ int ret;
+
+ chip = irq_desc_get_chip(desc);
+ if (!chip)
+ continue;
+
+ /*
+ * First try to remove the active state. If this
+ * fails, try to EOI the interrupt.
+ */
+ ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
+
+ if (ret && irqd_irq_inprogress(&desc->irq_data) &&
+ chip->irq_eoi)
+ chip->irq_eoi(&desc->irq_data);
+
+ if (chip->irq_mask)
+ chip->irq_mask(&desc->irq_data);
+
+ if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
+ chip->irq_disable(&desc->irq_data);
+ }
+}
+
+/**
+ * machine_crash_shutdown - shutdown non-crashing cpus and save registers
+ */
void machine_crash_shutdown(struct pt_regs *regs)
{
- /* Empty routine needed to avoid build errors. */
+ local_irq_disable();
+
+ /* shutdown non-crashing cpus */
+ smp_send_crash_stop();
+
+ /* for crashing cpu */
+ crash_save_cpu(regs, smp_processor_id());
+ machine_kexec_mask_interrupts();
+
+ pr_info("Starting crashdump kernel...\n");
}

void arch_kexec_protect_crashkres(void)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ef1caae02110..8016914591d2 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -39,6 +39,7 @@
#include <linux/completion.h>
#include <linux/of.h>
#include <linux/irq_work.h>
+#include <linux/kexec.h>

#include <asm/alternative.h>
#include <asm/atomic.h>
@@ -76,6 +77,7 @@ enum ipi_msg_type {
IPI_RESCHEDULE,
IPI_CALL_FUNC,
IPI_CPU_STOP,
+ IPI_CPU_CRASH_STOP,
IPI_TIMER,
IPI_IRQ_WORK,
IPI_WAKEUP
@@ -755,6 +757,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
S(IPI_RESCHEDULE, "Rescheduling interrupts"),
S(IPI_CALL_FUNC, "Function call interrupts"),
S(IPI_CPU_STOP, "CPU stop interrupts"),
+ S(IPI_CPU_CRASH_STOP, "CPU stop (for crash dump) interrupts"),
S(IPI_TIMER, "Timer broadcast interrupts"),
S(IPI_IRQ_WORK, "IRQ work interrupts"),
S(IPI_WAKEUP, "CPU wake-up interrupts"),
@@ -829,6 +832,29 @@ static void ipi_cpu_stop(unsigned int cpu)
cpu_relax();
}

+#ifdef CONFIG_KEXEC_CORE
+static atomic_t waiting_for_crash_ipi = ATOMIC_INIT(0);
+#endif
+
+static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+{
+#ifdef CONFIG_KEXEC_CORE
+ crash_save_cpu(regs, cpu);
+
+ atomic_dec(&waiting_for_crash_ipi);
+
+ local_irq_disable();
+
+#ifdef CONFIG_HOTPLUG_CPU
+ if (cpu_ops[cpu]->cpu_die)
+ cpu_ops[cpu]->cpu_die(cpu);
+#endif
+
+ /* just in case */
+ cpu_park_loop();
+#endif
+}
+
/*
* Main handler for inter-processor interrupts
*/
@@ -859,6 +885,15 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
irq_exit();
break;

+ case IPI_CPU_CRASH_STOP:
+ if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
+ irq_enter();
+ ipi_cpu_crash_stop(cpu, regs);
+
+ unreachable();
+ }
+ break;
+
#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
case IPI_TIMER:
irq_enter();
@@ -931,6 +966,39 @@ void smp_send_stop(void)
cpumask_pr_args(cpu_online_mask));
}

+#ifdef CONFIG_KEXEC_CORE
+void smp_send_crash_stop(void)
+{
+ cpumask_t mask;
+ unsigned long timeout;
+
+ if (num_online_cpus() == 1)
+ return;
+
+ cpumask_copy(&mask, cpu_online_mask);
+ cpumask_clear_cpu(smp_processor_id(), &mask);
+
+ atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+
+ pr_crit("SMP: stopping secondary CPUs\n");
+ smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
+
+ /* Wait up to one second for other CPUs to stop */
+ timeout = USEC_PER_SEC;
+ while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
+ udelay(1);
+
+ if (atomic_read(&waiting_for_crash_ipi) > 0)
+ pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
+ cpumask_pr_args(&mask));
+}
+
+bool smp_crash_stop_failed(void)
+{
+ return (atomic_read(&waiting_for_crash_ipi) > 0);
+}
+#endif
+
/*
* not supported here
*/
--
2.11.1
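
(To see where machine_crash_shutdown() fits: the panic-time path in the
generic code is roughly the following, a simplified paraphrase of
__crash_kexec() in kernel/kexec_core.c rather than the literal source:

    if (kexec_crash_image) {
            struct pt_regs fixed_regs;

            crash_setup_regs(&fixed_regs, regs);  /* fill in ELF note regs */
            crash_save_vmcoreinfo();
            machine_crash_shutdown(&fixed_regs);  /* added by this patch */
            machine_kexec(kexec_crash_image);
    }

i.e. the crashing cpu saves its own register state, stops the others via
the new IPI_CPU_CRASH_STOP, masks interrupts and branches into the crash
dump kernel.)
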
AKASHI Takahiro
2017-03-28 06:51:27 UTC
Permalink
In addition to the common VMCOREINFO entries defined in
crash_save_vmcoreinfo_init(), the crash utility needs to know
- kimage_voffset
- PHYS_OFFSET
in order to examine the contents of a dump file (/proc/vmcore) correctly
due to the introduction of KASLR (CONFIG_RANDOMIZE_BASE) in v4.6.

- VA_BITS
is also required by the makedumpfile command.

arch_crash_save_vmcoreinfo() appends them to the dump file.
More VMCOREINFO entries may be added later.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: James Morse <***@arm.com>
Acked-by: Catalin Marinas <***@arm.com>
---
arch/arm64/kernel/machine_kexec.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 779a80046066..481f54a866c5 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -18,6 +18,7 @@

#include <asm/cacheflush.h>
#include <asm/cpu_ops.h>
+#include <asm/memory.h>
#include <asm/mmu.h>
#include <asm/mmu_context.h>
#include <asm/page.h>
@@ -351,3 +352,13 @@ void crash_free_reserved_phys_range(unsigned long begin, unsigned long end)
}
}
#endif /* CONFIG_HIBERNATION */
+
+void arch_crash_save_vmcoreinfo(void)
+{
+ VMCOREINFO_NUMBER(VA_BITS);
+ /* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */
+ vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
+ kimage_voffset);
+ vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
+ PHYS_OFFSET);
+}
--
2.11.1
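
(For illustration, the entries appended to the VMCOREINFO ELF note end up
looking like the lines below; the values are examples only and depend on
the configuration and the KASLR offset:

    NUMBER(VA_BITS)=48
    NUMBER(kimage_voffset)=0xfffffe0000000000
    NUMBER(PHYS_OFFSET)=0x80000000

crash and makedumpfile read these to translate between the panicked
kernel's virtual and physical addresses.)
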
AKASHI Takahiro
2017-03-28 06:51:28 UTC
Permalink
Arch-specific functions are added to allow for implementing a crash dump
file interface, /proc/vmcore, which can be viewed as an ELF file.

A user space tool, like kexec-tools, is responsible for allocating
a separate region for the core's ELF header within the crash dump
kernel memory and filling it in when executing kexec_load().

Then, its location will be advertised to the crash dump kernel via a new
device-tree property, "linux,elfcorehdr", and the crash dump kernel
preserves the region for later use with reserve_elfcorehdr() at boot time.

On the crash dump kernel, /proc/vmcore will access the primary kernel's
memory with copy_oldmem_page(), which feeds the data page-by-page by
memremap'ing it, since that memory does not reside in the linear mapping
of the crash dump kernel.

Meanwhile, elfcorehdr_read() is simple, as the region is always mapped.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: James Morse <***@arm.com>
Acked-by: Catalin Marinas <***@arm.com>
---
arch/arm64/Kconfig | 11 +++++++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/crash_dump.c | 71 ++++++++++++++++++++++++++++++++++++++++++
arch/arm64/mm/init.c | 53 +++++++++++++++++++++++++++++++
4 files changed, 136 insertions(+)
create mode 100644 arch/arm64/kernel/crash_dump.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3741859765cf..e7f043efff41 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -736,6 +736,17 @@ config KEXEC
but it is independent of the system firmware. And like a reboot
you can start any kernel with it, not just Linux.

+config CRASH_DUMP
+ bool "Build kdump crash kernel"
+ help
+ Generate crash dump after being started by kexec. This should
+ be normally only set in special crash dump kernels which are
+ loaded in the main kernel with kexec-tools into a specially
+ reserved region and then later executed after a crash by
+ kdump/kexec.
+
+ For more details see Documentation/kdump/kdump.txt
+
config XEN_DOM0
def_bool y
depends on XEN
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 1606c6b2a280..0fe44a7513ff 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -50,6 +50,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
arm64-obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
arm64-obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o \
cpu-reset.o
+arm64-obj-$(CONFIG_CRASH_DUMP) += crash_dump.o

obj-y += $(arm64-obj-y) vdso/ probes/
obj-m += $(arm64-obj-m)
diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
new file mode 100644
index 000000000000..f46d57c31443
--- /dev/null
+++ b/arch/arm64/kernel/crash_dump.c
@@ -0,0 +1,71 @@
+/*
+ * Routines for doing kexec-based kdump
+ *
+ * Copyright (C) 2017 Linaro Limited
+ * Author: AKASHI Takahiro <***@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/crash_dump.h>
+#include <linux/errno.h>
+#include <linux/io.h>
+#include <linux/memblock.h>
+#include <linux/uaccess.h>
+#include <asm/memory.h>
+
+/**
+ * copy_oldmem_page() - copy one page from old kernel memory
+ * @pfn: page frame number to be copied
+ * @buf: buffer where the copied page is placed
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page
+ * @userbuf: if set, @buf is in a user address space
+ *
+ * This function copies one page from old kernel memory into buffer pointed by
+ * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
+ * copied or negative error in case of failure.
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+ size_t csize, unsigned long offset,
+ int userbuf)
+{
+ void *vaddr;
+
+ if (!csize)
+ return 0;
+
+ vaddr = memremap(__pfn_to_phys(pfn), PAGE_SIZE, MEMREMAP_WB);
+ if (!vaddr)
+ return -ENOMEM;
+
+ if (userbuf) {
+ if (copy_to_user((char __user *)buf, vaddr + offset, csize)) {
+ memunmap(vaddr);
+ return -EFAULT;
+ }
+ } else {
+ memcpy(buf, vaddr + offset, csize);
+ }
+
+ memunmap(vaddr);
+
+ return csize;
+}
+
+/**
+ * elfcorehdr_read - read from ELF core header
+ * @buf: buffer where the data is placed
+ * @csize: number of bytes to read
+ * @ppos: address in the memory
+ *
+ * This function reads @count bytes from elf core header which exists
+ * on crash dump kernel's memory.
+ */
+ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+ memcpy(buf, phys_to_virt((phys_addr_t)*ppos), count);
+ return count;
+}
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 89ba3cd0fe44..5960bef0170d 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -39,6 +39,7 @@
#include <linux/vmalloc.h>
#include <linux/mm.h>
#include <linux/kexec.h>
+#include <linux/crash_dump.h>

#include <asm/boot.h>
#include <asm/fixmap.h>
@@ -165,6 +166,56 @@ static void __init kexec_reserve_crashkres_pages(void)
}
#endif /* CONFIG_KEXEC_CORE */

+#ifdef CONFIG_CRASH_DUMP
+static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
+ const char *uname, int depth, void *data)
+{
+ const __be32 *reg;
+ int len;
+
+ if (depth != 1 || strcmp(uname, "chosen") != 0)
+ return 0;
+
+ reg = of_get_flat_dt_prop(node, "linux,elfcorehdr", &len);
+ if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+ return 1;
+
+ elfcorehdr_addr = dt_mem_next_cell(dt_root_addr_cells, &reg);
+ elfcorehdr_size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+ return 1;
+}
+
+/*
+ * reserve_elfcorehdr() - reserves memory for elf core header
+ *
+ * This function reserves the memory occupied by an elf core header
+ * described in the device tree. This region contains all the
+ * information about primary kernel's core image and is used by a dump
+ * capture kernel to access the system memory on primary kernel.
+ */
+static void __init reserve_elfcorehdr(void)
+{
+ of_scan_flat_dt(early_init_dt_scan_elfcorehdr, NULL);
+
+ if (!elfcorehdr_size)
+ return;
+
+ if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
+ pr_warn("elfcorehdr is overlapped\n");
+ return;
+ }
+
+ memblock_reserve(elfcorehdr_addr, elfcorehdr_size);
+
+ pr_info("Reserving %lldKB of memory at 0x%llx for elfcorehdr\n",
+ elfcorehdr_size >> 10, elfcorehdr_addr);
+}
+#else
+static void __init reserve_elfcorehdr(void)
+{
+}
+#endif /* CONFIG_CRASH_DUMP */
/*
* Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
* currently assumes that for memory starting above 4G, 32-bit devices will
@@ -423,6 +474,8 @@ void __init arm64_memblock_init(void)

reserve_crashkernel();

+ reserve_elfcorehdr();
+
dma_contiguous_reserve(arm64_dma_phys_limit);

memblock_allow_resize();
--
2.11.1
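
(The consumer of copy_oldmem_page() is the generic /proc/vmcore code:
read_from_oldmem() in fs/proc/vmcore.c walks the requested range one page
at a time, roughly as in this simplified paraphrase:

    offset = (unsigned long)(*ppos % PAGE_SIZE);
    pfn = (unsigned long)(*ppos / PAGE_SIZE);
    do {
            nr_bytes = min(count, PAGE_SIZE - offset);
            tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
            if (tmp < 0)
                    return tmp;
            *ppos += nr_bytes;
            buf += nr_bytes;
            count -= nr_bytes;
            ++pfn;
            offset = 0;
    } while (count);

which is why the arch implementation above only ever needs to memremap()
a single page at a time.)
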
AKASHI Takahiro
2017-03-28 06:51:29 UTC
Permalink
Kdump is enabled by default in defconfig, just as kexec is.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Acked-by: Catalin Marinas <***@arm.com>
---
arch/arm64/configs/defconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 7c48028ec64a..927ee18bbdf2 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -82,6 +82,7 @@ CONFIG_CMA=y
CONFIG_SECCOMP=y
CONFIG_XEN=y
CONFIG_KEXEC=y
+CONFIG_CRASH_DUMP=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
CONFIG_COMPAT=y
CONFIG_CPU_IDLE=y
--
2.11.1
AKASHI Takahiro
2017-03-28 06:51:25 UTC
Permalink
Since arch_kexec_protect_crashkres() removes the mapping for the crash
dump kernel image, the loaded data won't be preserved around hibernation.

In this patch, the helper functions crash_prepare_suspend()/
crash_post_resume() are additionally called before/after hibernation so
that the relevant memory segments will be mapped again and preserved just
as the others are.

In addition, to minimize the size of the hibernation image,
crash_is_nosave() is added to pfn_is_nosave() so that only the pages
which hold the loaded crash dump kernel image are treated as saveable;
hibernation excludes any pages that are marked as Reserved and yet
"nosave."

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: James Morse <***@arm.com>
---
arch/arm64/include/asm/kexec.h | 10 ++++++
arch/arm64/kernel/hibernate.c | 10 +++++-
arch/arm64/kernel/machine_kexec.c | 71 +++++++++++++++++++++++++++++++++++++++
arch/arm64/mm/init.c | 27 +++++++++++++++
4 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 04744dc5fb61..90aabbe893b7 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -43,6 +43,16 @@ static inline void crash_setup_regs(struct pt_regs *newregs,
/* Empty routine needed to avoid build errors. */
}

+#if defined(CONFIG_KEXEC_CORE) && defined(CONFIG_HIBERNATION)
+extern bool crash_is_nosave(unsigned long pfn);
+extern void crash_prepare_suspend(void);
+extern void crash_post_resume(void);
+#else
+static inline bool crash_is_nosave(unsigned long pfn) {return false; }
+static inline void crash_prepare_suspend(void) {}
+static inline void crash_post_resume(void) {}
+#endif
+
#endif /* __ASSEMBLY__ */

#endif
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 97a7384100f3..a44e13942d30 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -28,6 +28,7 @@
#include <asm/cacheflush.h>
#include <asm/cputype.h>
#include <asm/irqflags.h>
+#include <asm/kexec.h>
#include <asm/memory.h>
#include <asm/mmu_context.h>
#include <asm/pgalloc.h>
@@ -102,7 +103,8 @@ int pfn_is_nosave(unsigned long pfn)
unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);

- return (pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn);
+ return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn)) ||
+ crash_is_nosave(pfn);
}

void notrace save_processor_state(void)
@@ -286,6 +288,9 @@ int swsusp_arch_suspend(void)
local_dbg_save(flags);

if (__cpu_suspend_enter(&state)) {
+ /* make the crash dump kernel image visible/saveable */
+ crash_prepare_suspend();
+
sleep_cpu = smp_processor_id();
ret = swsusp_save();
} else {
@@ -297,6 +302,9 @@ int swsusp_arch_suspend(void)
if (el2_reset_needed())
dcache_clean_range(__hyp_idmap_text_start, __hyp_idmap_text_end);

+ /* make the crash dump kernel image protected again */
+ crash_post_resume();
+
/*
* Tell the hibernation core that we've just restored
* the memory
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index b63baa749609..a6d66b98d795 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -10,6 +10,7 @@
*/

#include <linux/kexec.h>
+#include <linux/page-flags.h>
#include <linux/smp.h>

#include <asm/cacheflush.h>
@@ -230,3 +231,73 @@ void arch_kexec_unprotect_crashkres(void)
__phys_to_virt(kexec_crash_image->segment[i].mem),
kexec_crash_image->segment[i].memsz >> PAGE_SHIFT, 1);
}
+
+#ifdef CONFIG_HIBERNATION
+/*
+ * To preserve the crash dump kernel image, the relevant memory segments
+ * should be mapped again around the hibernation.
+ */
+void crash_prepare_suspend(void)
+{
+ if (kexec_crash_image)
+ arch_kexec_unprotect_crashkres();
+}
+
+void crash_post_resume(void)
+{
+ if (kexec_crash_image)
+ arch_kexec_protect_crashkres();
+}
+
+/*
+ * crash_is_nosave
+ *
+ * Return true only if a page is part of reserved memory for crash dump kernel,
+ * but does not hold any data of loaded kernel image.
+ *
+ * Note that all the pages in crash dump kernel memory have been initially
+ * marked as Reserved in kexec_reserve_crashkres_pages().
+ *
+ * In hibernation, the pages which are Reserved and yet "nosave" are excluded
+ * from the hibernation image. crash_is_nosave() does this check for crash
+ * dump kernel and will reduce the total size of hibernation image.
+ */
+
+bool crash_is_nosave(unsigned long pfn)
+{
+ int i;
+ phys_addr_t addr;
+
+ if (!crashk_res.end)
+ return false;
+
+ /* in reserved memory? */
+ addr = __pfn_to_phys(pfn);
+ if ((addr < crashk_res.start) || (crashk_res.end < addr))
+ return false;
+
+ if (!kexec_crash_image)
+ return true;
+
+ /* not part of loaded kernel image? */
+ for (i = 0; i < kexec_crash_image->nr_segments; i++)
+ if (addr >= kexec_crash_image->segment[i].mem &&
+ addr < (kexec_crash_image->segment[i].mem +
+ kexec_crash_image->segment[i].memsz))
+ return false;
+
+ return true;
+}
+
+void crash_free_reserved_phys_range(unsigned long begin, unsigned long end)
+{
+ unsigned long addr;
+ struct page *page;
+
+ for (addr = begin; addr < end; addr += PAGE_SIZE) {
+ page = phys_to_page(addr);
+ ClearPageReserved(page);
+ free_reserved_page(page);
+ }
+}
+#endif /* CONFIG_HIBERNATION */
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 09d19207362d..89ba3cd0fe44 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -134,10 +134,35 @@ static void __init reserve_crashkernel(void)
crashk_res.start = crash_base;
crashk_res.end = crash_base + crash_size - 1;
}
+
+static void __init kexec_reserve_crashkres_pages(void)
+{
+#ifdef CONFIG_HIBERNATION
+ phys_addr_t addr;
+ struct page *page;
+
+ if (!crashk_res.end)
+ return;
+
+ /*
+ * To reduce the size of hibernation image, all the pages are
+ * marked as Reserved initially.
+ */
+ for (addr = crashk_res.start; addr < (crashk_res.end + 1);
+ addr += PAGE_SIZE) {
+ page = phys_to_page(addr);
+ SetPageReserved(page);
+ }
+#endif
+}
#else
static void __init reserve_crashkernel(void)
{
}
+
+static void __init kexec_reserve_crashkres_pages(void)
+{
+}
#endif /* CONFIG_KEXEC_CORE */

/*
@@ -517,6 +542,8 @@ void __init mem_init(void)
/* this will put all unused low memory onto the freelists */
free_all_bootmem();

+ kexec_reserve_crashkres_pages();
+
mem_init_print_info(NULL);

#define MLK(b, t) b, t, ((t) - (b)) >> 10
--
2.11.1
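
(For context, pfn_is_nosave() is consulted by the hibernation core when
deciding which pages to snapshot; a simplified paraphrase of
saveable_page() in kernel/power/snapshot.c, not the literal source:

    if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page))
            return NULL;
    if (PageReserved(page) &&
        (!kernel_page_present(page) || pfn_is_nosave(pfn)))
            return NULL;  /* excluded, i.e. "nosave" */

so marking the whole crashkernel range as Reserved and then reporting
only the non-image pages as "nosave" keeps just the loaded segments in
the hibernation image.)
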
AKASHI Takahiro
2017-03-28 06:52:42 UTC
Permalink
From: James Morse <***@arm.com>

Add documentation for DT properties:
linux,usable-memory-range
linux,elfcorehdr
used by arm64 kdump. These are, respectively, the usable memory range
allocated to the crash dump kernel and the elfcorehdr's location within it.

Signed-off-by: James Morse <***@arm.com>
[***@linaro.org: update the text due to recent changes ]
Signed-off-by: AKASHI Takahiro <***@linaro.org>
Acked-by: Mark Rutland <***@arm.com>
Cc: ***@vger.kernel.org
Cc: Rob Herring <robh+***@kernel.org>
---
Documentation/devicetree/bindings/chosen.txt | 45 ++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)

diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
index 6ae9d82d4c37..b5e39af4ddc0 100644
--- a/Documentation/devicetree/bindings/chosen.txt
+++ b/Documentation/devicetree/bindings/chosen.txt
@@ -52,3 +52,48 @@ This property is set (currently only on PowerPC, and only needed on
book3e) by some versions of kexec-tools to tell the new kernel that it
is being booted by kexec, as the booting environment may differ (e.g.
a different secondary CPU release mechanism)
+
+linux,usable-memory-range
+-------------------------
+
+This property (arm64 only) holds a base address and size, describing a
+limited region in which memory may be considered available for use by
+the kernel. Memory outside of this range is not available for use.
+
+This property describes a limitation: memory within this range is only
+valid when also described through another mechanism that the kernel
+would otherwise use to determine available memory (e.g. memory nodes
+or the EFI memory map). Valid memory may be sparse within the range.
+e.g.
+
+/ {
+ chosen {
+ linux,usable-memory-range = <0x9 0xf0000000 0x0 0x10000000>;
+ };
+};
+
+The main usage is for the crash dump kernel to identify its own usable
+memory and exclude, at boot time, any other memory areas that are
+part of the panicked kernel's memory.
+
+While this property does not represent real hardware, the address
+and the size are expressed in #address-cells and #size-cells,
+respectively, of the root node.
+
+linux,elfcorehdr
+----------------
+
+This property (currently used only on arm64) holds the memory range,
+the address and the size, of the elf core header which mainly describes
+the panicked kernel's memory layout as PT_LOAD segments of elf format.
+e.g.
+
+/ {
+ chosen {
+ linux,elfcorehdr = <0x9 0xfffff000 0x0 0x800>;
+ };
+};
+
+While this property does not represent real hardware, the address
+and the size are expressed in #address-cells and #size-cells,
+respectively, of the root node.
--
2.11.1
AKASHI Takahiro
2017-03-28 06:51:30 UTC
Permalink
Add arch-specific descriptions about kdump usage on arm64 to kdump.txt.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Reviewed-by: Baoquan He <***@redhat.com>
Acked-by: Dave Young <***@redhat.com>
Acked-by: Catalin Marinas <***@arm.com>
---
Documentation/kdump/kdump.txt | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index b0eb27b956d9..615434d81108 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -18,7 +18,7 @@ memory image to a dump file on the local disk, or across the network to
a remote system.

Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64,
-s390x and arm architectures.
+s390x, arm and arm64 architectures.

When the system kernel boots, it reserves a small section of memory for
the dump-capture kernel. This ensures that ongoing Direct Memory Access
@@ -249,6 +249,13 @@ Dump-capture kernel config options (Arch Dependent, arm)

AUTO_ZRELADDR=y

+Dump-capture kernel config options (Arch Dependent, arm64)
+----------------------------------------------------------
+
+- Please note that KVM in the dump-capture kernel will not be enabled
+ on non-VHE systems even if it is configured, because the CPU will not
+ be reset to EL2 on panic.
+
Extended crashkernel syntax
===========================

@@ -305,6 +312,8 @@ Boot into System Kernel
kernel will automatically locate the crash kernel image within the
first 512MB of RAM if X is not given.

+ On arm64, use "crashkernel=Y[@X]". Note that the start address of
+ the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).

Load the Dump-capture Kernel
============================
@@ -327,6 +336,8 @@ For s390x:
- Use image or bzImage
For arm:
- Use zImage
+For arm64:
+ - Use vmlinux or Image

If you are using a uncompressed vmlinux image then use following command
to load dump-capture kernel.
@@ -370,6 +381,9 @@ For s390x:
For arm:
"1 maxcpus=1 reset_devices"

+For arm64:
+ "1 maxcpus=1 reset_devices"
+
Notes on loading the dump-capture kernel:

* By default, the ELF headers are stored in ELF64 format to support
--
2.11.1
AKASHI Takahiro
2017-03-28 06:51:23 UTC
Permalink
This function validates and invalidates PTE entries, and will be utilized
in kdump to protect the loaded crash dump kernel image.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
---
arch/arm64/include/asm/cacheflush.h | 1 +
arch/arm64/mm/pageattr.c | 15 +++++++++------
2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index 5a2a6ee65f65..728f933cef8c 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -154,5 +154,6 @@ int set_memory_ro(unsigned long addr, int numpages);
int set_memory_rw(unsigned long addr, int numpages);
int set_memory_x(unsigned long addr, int numpages);
int set_memory_nx(unsigned long addr, int numpages);
+int set_memory_valid(unsigned long addr, int numpages, int enable);

#endif
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 8def55e7249b..3212ee0558f6 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -125,20 +125,23 @@ int set_memory_x(unsigned long addr, int numpages)
}
EXPORT_SYMBOL_GPL(set_memory_x);

-#ifdef CONFIG_DEBUG_PAGEALLOC
-void __kernel_map_pages(struct page *page, int numpages, int enable)
+int set_memory_valid(unsigned long addr, int numpages, int enable)
{
- unsigned long addr = (unsigned long) page_address(page);
-
if (enable)
- __change_memory_common(addr, PAGE_SIZE * numpages,
+ return __change_memory_common(addr, PAGE_SIZE * numpages,
__pgprot(PTE_VALID),
__pgprot(0));
else
- __change_memory_common(addr, PAGE_SIZE * numpages,
+ return __change_memory_common(addr, PAGE_SIZE * numpages,
__pgprot(0),
__pgprot(PTE_VALID));
}
+
+#ifdef CONFIG_DEBUG_PAGEALLOC
+void __kernel_map_pages(struct page *page, int numpages, int enable)
+{
+ set_memory_valid((unsigned long)page_address(page), numpages, enable);
+}
#ifdef CONFIG_HIBERNATION
/*
* When built with CONFIG_DEBUG_PAGEALLOC and CONFIG_HIBERNATION, this function
--
2.11.1
Ard Biesheuvel
2017-03-28 09:54:21 UTC
Permalink
Post by AKASHI Takahiro
This function validates and invalidates PTE entries, and will be utilized
in kdump to protect the loaded crash dump kernel image.
[...]
AKASHI Takahiro
2017-03-28 06:51:22 UTC
Permalink
"crashkernel=" kernel parameter specifies the size (and optionally
the start address) of the system RAM to be used by the crash dump kernel.
reserve_crashkernel() will allocate and reserve that memory at boot time
of the primary kernel.

The memory range will be exposed to userspace as a resource named
"Crash kernel" in /proc/iomem.

Signed-off-by: AKASHI Takahiro <***@linaro.org>
Signed-off-by: Mark Salter <***@redhat.com>
Signed-off-by: Pratyush Anand <***@redhat.com>
Reviewed-by: James Morse <***@arm.com>
Acked-by: Catalin Marinas <***@arm.com>
---
arch/arm64/kernel/setup.c | 7 ++++-
arch/arm64/mm/init.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 42274bda0ccb..28855ec1be95 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -31,7 +31,6 @@
#include <linux/screen_info.h>
#include <linux/init.h>
#include <linux/kexec.h>
-#include <linux/crash_dump.h>
#include <linux/root_dev.h>
#include <linux/cpu.h>
#include <linux/interrupt.h>
@@ -226,6 +225,12 @@ static void __init request_standard_resources(void)
if (kernel_data.start >= res->start &&
kernel_data.end <= res->end)
request_resource(res, &kernel_data);
+#ifdef CONFIG_KEXEC_CORE
+ /* Userspace will find "Crash kernel" region in /proc/iomem. */
+ if (crashk_res.end && crashk_res.start >= res->start &&
+ crashk_res.end <= res->end)
+ request_resource(res, &crashk_res);
+#endif
}
}

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 290794b1a0f1..09d19207362d 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -30,6 +30,7 @@
#include <linux/gfp.h>
#include <linux/memblock.h>
#include <linux/sort.h>
+#include <linux/of.h>
#include <linux/of_fdt.h>
#include <linux/dma-mapping.h>
#include <linux/dma-contiguous.h>
@@ -37,6 +38,7 @@
#include <linux/swiotlb.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>
+#include <linux/kexec.h>

#include <asm/boot.h>
#include <asm/fixmap.h>
@@ -77,6 +79,67 @@ static int __init early_initrd(char *p)
early_param("initrd", early_initrd);
#endif

+#ifdef CONFIG_KEXEC_CORE
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by dump capture kernel when
+ * primary kernel is crashing.
+ */
+static void __init reserve_crashkernel(void)
+{
+ unsigned long long crash_base, crash_size;
+ int ret;
+
+ ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+ &crash_size, &crash_base);
+ /* no crashkernel= or invalid value specified */
+ if (ret || !crash_size)
+ return;
+
+ crash_size = PAGE_ALIGN(crash_size);
+
+ if (crash_base == 0) {
+ /* Current arm64 boot protocol requires 2MB alignment */
+ crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
+ crash_size, SZ_2M);
+ if (crash_base == 0) {
+ pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
+ crash_size);
+ return;
+ }
+ } else {
+ /* User specifies base address explicitly. */
+ if (!memblock_is_region_memory(crash_base, crash_size)) {
+ pr_warn("cannot reserve crashkernel: region is not memory\n");
+ return;
+ }
+
+ if (memblock_is_region_reserved(crash_base, crash_size)) {
+ pr_warn("cannot reserve crashkernel: region overlaps reserved memory\n");
+ return;
+ }
+
+ if (!IS_ALIGNED(crash_base, SZ_2M)) {
+ pr_warn("cannot reserve crashkernel: base address is not 2MB aligned\n");
+ return;
+ }
+ }
+ memblock_reserve(crash_base, crash_size);
+
+ pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
+ crash_base, crash_base + crash_size, crash_size >> 20);
+
+ crashk_res.start = crash_base;
+ crashk_res.end = crash_base + crash_size - 1;
+}
+#else
+static void __init reserve_crashkernel(void)
+{
+}
+#endif /* CONFIG_KEXEC_CORE */
+
/*
* Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
* currently assumes that for memory starting above 4G, 32-bit devices will
@@ -332,6 +395,9 @@ void __init arm64_memblock_init(void)
arm64_dma_phys_limit = max_zone_dma_phys();
else
arm64_dma_phys_limit = PHYS_MASK + 1;
+
+ reserve_crashkernel();
+
dma_contiguous_reserve(arm64_dma_phys_limit);

memblock_allow_resize();
--
2.11.1
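
(Once this is in place, the reservation becomes visible to userspace; for
illustration, /proc/iomem on a machine booted with "crashkernel=512M"
would contain something like the following, where all addresses are
examples only:

    40000000-ffffffff : System RAM
      40080000-40f1ffff : Kernel code
      41040000-411e8fff : Kernel data
      c0000000-dfffffff : Crash kernel

kexec-tools reads the "Crash kernel" line to decide where to load the
dump-capture kernel and what to write into "linux,usable-memory-range".)
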
Ard Biesheuvel
2017-03-28 09:52:29 UTC
Permalink
Post by AKASHI Takahiro
"crashkernel=" kernel parameter specifies the size (and optionally
the start address) of the system ram to be used by crash dump kernel.
reserve_crashkernel() will allocate and reserve that memory at boot time
of primary kernel.
The memory range will be exposed to userspace as a resource named
"Crash kernel" in /proc/iomem.
---
arch/arm64/kernel/setup.c | 7 ++++-
arch/arm64/mm/init.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 72 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 42274bda0ccb..28855ec1be95 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -31,7 +31,6 @@
#include <linux/screen_info.h>
#include <linux/init.h>
#include <linux/kexec.h>
-#include <linux/crash_dump.h>
#include <linux/root_dev.h>
#include <linux/cpu.h>
#include <linux/interrupt.h>
@@ -226,6 +225,12 @@ static void __init request_standard_resources(void)
if (kernel_data.start >= res->start &&
kernel_data.end <= res->end)
request_resource(res, &kernel_data);
+#ifdef CONFIG_KEXEC_CORE
+ /* Userspace will find "Crash kernel" region in /proc/iomem. */
+ if (crashk_res.end && crashk_res.start >= res->start &&
+ crashk_res.end <= res->end)
+ request_resource(res, &crashk_res);
+#endif
}
}
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 290794b1a0f1..09d19207362d 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -30,6 +30,7 @@
#include <linux/gfp.h>
#include <linux/memblock.h>
#include <linux/sort.h>
+#include <linux/of.h>
#include <linux/of_fdt.h>
#include <linux/dma-mapping.h>
#include <linux/dma-contiguous.h>
@@ -37,6 +38,7 @@
#include <linux/swiotlb.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>
+#include <linux/kexec.h>
#include <asm/boot.h>
#include <asm/fixmap.h>
@@ -77,6 +79,67 @@ static int __init early_initrd(char *p)
early_param("initrd", early_initrd);
#endif
+#ifdef CONFIG_KEXEC_CORE
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by dump capture kernel when
+ * primary kernel is crashing.
+ */
+static void __init reserve_crashkernel(void)
+{
+ unsigned long long crash_base, crash_size;
+ int ret;
+
+ ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+ &crash_size, &crash_base);
+ /* no crashkernel= or invalid value specified */
+ if (ret || !crash_size)
+ return;
+
+ crash_size = PAGE_ALIGN(crash_size);
+
+ if (crash_base == 0) {
+ /* Current arm64 boot protocol requires 2MB alignment */
+ crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
+ crash_size, SZ_2M);
+ if (crash_base == 0) {
+ pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
+ crash_size);
+ return;
+ }
+ } else {
+ /* User specifies base address explicitly. */
+ if (!memblock_is_region_memory(crash_base, crash_size)) {
+ pr_warn("cannot reserve crashkernel: region is not memory\n");
+ return;
+ }
+
+ if (memblock_is_region_reserved(crash_base, crash_size)) {
+ pr_warn("cannot reserve crashkernel: region overlaps reserved memory\n");
+ return;
+ }
+
+ if (!IS_ALIGNED(crash_base, SZ_2M)) {
+ pr_warn("cannot reserve crashkernel: base address is not 2MB aligned\n");
+ return;
+ }
+ }
+ memblock_reserve(crash_base, crash_size);
+
+ pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
+ crash_base, crash_base + crash_size, crash_size >> 20);
+
+ crashk_res.start = crash_base;
+ crashk_res.end = crash_base + crash_size - 1;
+}
+#else
+static void __init reserve_crashkernel(void)
+{
+}
+#endif /* CONFIG_KEXEC_CORE */
+
/*
* Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
* currently assumes that for memory starting above 4G, 32-bit devices will
@@ -332,6 +395,9 @@ void __init arm64_memblock_init(void)
arm64_dma_phys_limit = max_zone_dma_phys();
else
arm64_dma_phys_limit = PHYS_MASK + 1;
+
+ reserve_crashkernel();
+
dma_contiguous_reserve(arm64_dma_phys_limit);
memblock_allow_resize();
--
2.11.1
David Woodhouse
2017-04-03 08:18:12 UTC
Permalink
+       if (crash_base == 0) {
+               /* Current arm64 boot protocol requires 2MB alignment */
+               crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
+                               crash_size, SZ_2M);
+               if (crash_base == 0) {
+                       pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
+                               crash_size);
+                       return;
+               }
+       } else {
+               /* User specifies base address explicitly. */
+               if (!memblock_is_region_memory(crash_base, crash_size)) {
+                       pr_warn("cannot reserve crashkernel: region is not memory\n");
+                       return;
+               }
+
+               if (memblock_is_region_reserved(crash_base, crash_size)) {
+                       pr_warn("cannot reserve crashkernel: region overlaps reserved memory\n");
+                       return;
+               }
+
+               if (!IS_ALIGNED(crash_base, SZ_2M)) {
+                       pr_warn("cannot reserve crashkernel: base address is not 2MB aligned\n");
+                       return;
+               }
You still have typos here.
AKASHI Takahiro
2017-04-04 05:41:45 UTC
Permalink
Post by David Woodhouse
You still have typos here.
I'd like to defer to the maintainers whether we prefer MiB over MB.

Thanks,
-Takahiro AKASHI
David Woodhouse
2017-04-04 06:14:55 UTC
Permalink
Post by AKASHI Takahiro
Post by David Woodhouse
You still have typos here.
I'd like to defer to the maintainers whether we prefer MiB over MB.
It is not really a matter of preference. One is correct; the other is not.

While simple errors can of course be forgiven, I cannot understand why you
would deliberately repeat an error once it has been pointed out to you.
--
dwmw2
AKASHI Takahiro
2017-04-04 07:35:05 UTC
Permalink
Post by David Woodhouse
Post by AKASHI Takahiro
Post by David Woodhouse
You still have typos here.
I'd like to defer to the maintainers whether we prefer MiB over MB.
It is not really a matter of preference. One is correct; the other is not.
While simple errors can of course be forgiven, I cannot understand why you
would deliberately repeat an error once it has been pointed out to you.
Because I think that people sometimes use those two interchangeably.
So I said I would defer to the maintainers.

-Takahiro AKASHI
Post by David Woodhouse
--
dwmw2
Ard Biesheuvel
2017-04-04 07:39:28 UTC
Permalink
Post by AKASHI Takahiro
Post by David Woodhouse
Post by AKASHI Takahiro
Post by David Woodhouse
You still have typos here.
I'd like to defer to the maintainers whether we prefer MiB over MB.
It is not really a matter of preference. One is correct; the other is not.
While simple errors can of course be forgiven, I cannot understand why you
would deliberately repeat an error once it has been pointed out to you.
Because I think that people sometimes use those two interchangeably.
So I said I would defer to the maintainers.
I have to agree with Akashi-san here: while you are technically
correct, the reality is that the MiB is not as widely adopted as you
suggest, and there is no ambiguity whatsoever in this particular case
(i.e., when referring to blocks of RAM), so I feel it is somewhat
counterproductive to confuse reviewers by stating 'You still have
typos here.' without specifying that it is MiB vs MB that you are
actually referring to.

These patches are complex enough as they are, so could we *please*
focus on the things that matter?
David Woodhouse
2017-04-04 07:44:04 UTC
Permalink
Post by AKASHI Takahiro
Because I think that people sometimes use those two interchangeably.
So I said I would defer to the maintainers.
Sometimes they do, yes. Just as sometimes people use "their",
"they're", and "there" interchangeably.

Rarely in a professional context, though. And even more rarely when
their error has already been pointed out to them.

There are no good reasons to *deliberately* get it wrong.

I've heard it suggested that 'MiB' would confuse people who have never
seen it before. And that it was ugly. Those arguments were fairly
specious when they were first made, and they're even sillier now — more
than 20 years since the binary prefixes were introduced.

The alleged confusion, and the perceived ugliness, are purely due to
unfamiliarity and will pass.

The correctness, and the lack of ambiguity, will not.
Will Deacon
2017-04-04 09:26:04 UTC
Permalink
Guys, we were supposed to stop discussing this three days ago.
Post by David Woodhouse
Post by AKASHI Takahiro
Because I think that people sometimes use those two interchangeably.
So I said I would defer to the maintainers.
Sometimes they do, yes. Just as sometimes people use "their",
"they're", and "there" interchangeably.
Rarely in a professional context, though. And even more rarely when
their error has already been pointed out to them.
There are no good reasons to *deliberately* get it wrong.
I've heard it suggested that 'MiB' would confuse people who have never
seen it before. And that it was ugly. Those arguments were fairly
specious when they were first made, and they're even sillier now — more
than 20 years since the binary prefixes were introduced.
The alleged confusion, and the perceived ugliness, are purely due to
unfamiliarity and will pass.
The correctness, and the lack of ambiguity, will not.
I think consistency comes into play here. We (arm64) and the rest of the
kernel get this wrong all the time, so if we're going to fix it then we
should look at the wider codebase and I'd rather not do that as part of the
kdump series. Also, why stop at the suffixes? We don't have any occurrences
of 'mebibyte' in the kernel sources, but plenty of busted 'megabytes'. A
patch making arm64 consistent could be discussed separately, otherwise kdump
becomes the pedantic ISO guy trying to lead by example, but really everybody
ignores him because it's completely inconsequential and they also know he
went 35 versions without giving a monkey's.

David, since you seem to be the most outraged, fancy sending a patch? ;)

Alternatively, who fancies burning some dictionaries?

Will
David Woodhouse
2017-04-13 12:18:58 UTC
Permalink
Less important than in user-visible messages, but still good practice as
there's still no excuse for ARM64 code to look like it was written before
1996.

Signed-off-by: David Woodhouse <***@amazon.co.uk>
---
arch/arm64/boot/dts/arm/juno-base.dtsi | 2 +-
.../boot/dts/arm/vexpress-v2f-1xv7-ca53x2.dts | 2 +-
arch/arm64/boot/dts/broadcom/ns2-xmc.dts | 24 +++++++++++-----------
arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts | 2 +-
arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi | 2 +-
arch/arm64/boot/dts/marvell/berlin4ct-dmp.dts | 2 +-
arch/arm64/boot/dts/marvell/berlin4ct-stb.dts | 2 +-
arch/arm64/boot/dts/renesas/r8a7795-h3ulcb.dts | 2 +-
arch/arm64/boot/dts/renesas/r8a7795-salvator-x.dts | 2 +-
arch/arm64/boot/dts/renesas/r8a7796-m3ulcb.dts | 2 +-
arch/arm64/boot/dts/renesas/r8a7796-salvator-x.dts | 2 +-
arch/arm64/include/asm/assembler.h | 2 +-
arch/arm64/include/asm/boot.h | 4 ++--
arch/arm64/include/asm/elf.h | 2 +-
arch/arm64/include/asm/fixmap.h | 6 +++---
arch/arm64/kernel/head.S | 8 ++++----
arch/arm64/kernel/hyp-stub.S | 2 +-
arch/arm64/kernel/kaslr.c | 14 ++++++-------
arch/arm64/kernel/module-plts.c | 2 +-
arch/arm64/kernel/vmlinux.lds.S | 18 ++++++++--------
arch/arm64/mm/mmu.c | 2 +-
arch/arm64/mm/proc.S | 2 +-
22 files changed, 53 insertions(+), 53 deletions(-)

diff --git a/arch/arm64/boot/dts/arm/juno-base.dtsi b/arch/arm64/boot/dts/arm/juno-base.dtsi
index df539e8..f70fb13 100644
--- a/arch/arm64/boot/dts/arm/juno-base.dtsi
+++ b/arch/arm64/boot/dts/arm/juno-base.dtsi
@@ -694,7 +694,7 @@

memory@80000000 {
device_type = "memory";
- /* last 16MB of the first memory area is reserved for secure world use by firmware */
+ /* last 16MiB of the first memory area is reserved for secure world use by firmware */
reg = <0x00000000 0x80000000 0x0 0x7f000000>,
<0x00000008 0x80000000 0x1 0x80000000>;
};
diff --git a/arch/arm64/boot/dts/arm/vexpress-v2f-1xv7-ca53x2.dts b/arch/arm64/boot/dts/arm/vexpress-v2f-1xv7-ca53x2.dts
index e3a1711..38ece4b 100644
--- a/arch/arm64/boot/dts/arm/vexpress-v2f-1xv7-ca53x2.dts
+++ b/arch/arm64/boot/dts/arm/vexpress-v2f-1xv7-ca53x2.dts
@@ -60,7 +60,7 @@

memory@80000000 {
device_type = "memory";
- reg = <0 0x80000000 0 0x80000000>; /* 2GB @ 2GB */
+ reg = <0 0x80000000 0 0x80000000>; /* 2GiB @ 2GiB */
};

gic: interrupt-controller@2c001000 {
diff --git a/arch/arm64/boot/dts/broadcom/ns2-xmc.dts b/arch/arm64/boot/dts/broadcom/ns2-xmc.dts
index 99a2723..5908524 100644
--- a/arch/arm64/boot/dts/broadcom/ns2-xmc.dts
+++ b/arch/arm64/boot/dts/broadcom/ns2-xmc.dts
@@ -87,36 +87,36 @@

partition@0 {
label = "nboot";
- reg = <0x00000000 0x00280000>; /* 2.5MB */
+ reg = <0x00000000 0x00280000>; /* 2.5MiB */
read-only;
};

partition@280000 {
label = "nenv";
- reg = <0x00280000 0x00040000>; /* 0.25MB */
+ reg = <0x00280000 0x00040000>; /* 0.25MiB */
read-only;
};

partition@2c0000 {
label = "ndtb";
- reg = <0x002c0000 0x00040000>; /* 0.25MB */
+ reg = <0x002c0000 0x00040000>; /* 0.25MiB */
read-only;
};

partition@300000 {
label = "nsystem";
- reg = <0x00300000 0x03d00000>; /* 61MB */
+ reg = <0x00300000 0x03d00000>; /* 61MiB */
read-only;
};

partition@4000000 {
label = "nrootfs";
- reg = <0x04000000 0x06400000>; /* 100MB */
+ reg = <0x04000000 0x06400000>; /* 100MiB */
};

partition@0a400000{
label = "ncustfs";
- reg = <0x0a400000 0x35c00000>; /* 860MB */
+ reg = <0x0a400000 0x35c00000>; /* 860MiB */
};
};
};
@@ -156,32 +156,32 @@

partition@0 {
label = "bl0";
- reg = <0x00000000 0x00080000>; /* 512KB */
+ reg = <0x00000000 0x00080000>; /* 512KiB */
};

partition@80000 {
label = "fip";
- reg = <0x00080000 0x00150000>; /* 1344KB */
+ reg = <0x00080000 0x00150000>; /* 1344KiB */
};

partition@1e0000 {
label = "env";
- reg = <0x001e0000 0x00010000>;/* 64KB */
+ reg = <0x001e0000 0x00010000>;/* 64KiB */
};

partition@1f0000 {
label = "dtb";
- reg = <0x001f0000 0x00010000>; /* 64KB */
+ reg = <0x001f0000 0x00010000>; /* 64KiB */
};

partition@200000 {
label = "kernel";
- reg = <0x00200000 0x00e00000>; /* 14MB */
+ reg = <0x00200000 0x00e00000>; /* 14MiB */
};

partition@1000000 {
label = "rootfs";
- reg = <0x01000000 0x01000000>; /* 16MB */
+ reg = <0x01000000 0x01000000>; /* 16MiB */
};
};
};
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts b/arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts
index c37110b..71ce1f1 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts
@@ -126,7 +126,7 @@
flash@0 {
#address-cells = <1>;
#size-cells = <1>;
- compatible = "n25q128a13", "jedec,spi-nor"; /* 16MB */
+ compatible = "n25q128a13", "jedec,spi-nor"; /* 16MiB */
reg = <0>;
spi-max-frequency = <1000000>; /* input clock */
};
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
index e5935f2..43e549f 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
@@ -152,7 +152,7 @@
memory@80000000 {
device_type = "memory";
reg = <0x00000000 0x80000000 0 0x80000000>;
- /* DRAM space - 1, size : 2 GB DRAM */
+ /* DRAM space - 1, size : 2 GiB DRAM */
};

sysclk: sysclk {
diff --git a/arch/arm64/boot/dts/marvell/berlin4ct-dmp.dts b/arch/arm64/boot/dts/marvell/berlin4ct-dmp.dts
index fae6c69..4b3a5e2 100644
--- a/arch/arm64/boot/dts/marvell/berlin4ct-dmp.dts
+++ b/arch/arm64/boot/dts/marvell/berlin4ct-dmp.dts
@@ -56,7 +56,7 @@

memory@1000000 {
device_type = "memory";
- /* the first 16MB is for firmwares' usage */
+ /* the first 16MiB is for firmwares' usage */
reg = <0 0x01000000 0 0x7f000000>;
};
};
diff --git a/arch/arm64/boot/dts/marvell/berlin4ct-stb.dts b/arch/arm64/boot/dts/marvell/berlin4ct-stb.dts
index d47edad..049018b 100644
--- a/arch/arm64/boot/dts/marvell/berlin4ct-stb.dts
+++ b/arch/arm64/boot/dts/marvell/berlin4ct-stb.dts
@@ -56,7 +56,7 @@

memory@1000000 {
device_type = "memory";
- /* the first 16MB is for firmwares' usage */
+ /* the first 16MiB is for firmwares' usage */
reg = <0 0x01000000 0 0x7f000000>;
};
};
diff --git a/arch/arm64/boot/dts/renesas/r8a7795-h3ulcb.dts b/arch/arm64/boot/dts/renesas/r8a7795-h3ulcb.dts
index c5f8f69..15f57e6 100644
--- a/arch/arm64/boot/dts/renesas/r8a7795-h3ulcb.dts
+++ b/arch/arm64/boot/dts/renesas/r8a7795-h3ulcb.dts
@@ -29,7 +29,7 @@

memory@48000000 {
device_type = "memory";
- /* first 128MB is reserved for secure area. */
+ /* first 128MiB is reserved for secure area. */
reg = <0x0 0x48000000 0x0 0x38000000>;
};

diff --git a/arch/arm64/boot/dts/renesas/r8a7795-salvator-x.dts b/arch/arm64/boot/dts/renesas/r8a7795-salvator-x.dts
index 7a8986e..35f7cd2 100644
--- a/arch/arm64/boot/dts/renesas/r8a7795-salvator-x.dts
+++ b/arch/arm64/boot/dts/renesas/r8a7795-salvator-x.dts
@@ -52,7 +52,7 @@

memory@48000000 {
device_type = "memory";
- /* first 128MB is reserved for secure area. */
+ /* first 128MiB is reserved for secure area. */
reg = <0x0 0x48000000 0x0 0x38000000>;
};

diff --git a/arch/arm64/boot/dts/renesas/r8a7796-m3ulcb.dts b/arch/arm64/boot/dts/renesas/r8a7796-m3ulcb.dts
index c3f064a..4d1295e 100644
--- a/arch/arm64/boot/dts/renesas/r8a7796-m3ulcb.dts
+++ b/arch/arm64/boot/dts/renesas/r8a7796-m3ulcb.dts
@@ -28,7 +28,7 @@

memory@48000000 {
device_type = "memory";
- /* first 128MB is reserved for secure area. */
+ /* first 128MiB is reserved for secure area. */
reg = <0x0 0x48000000 0x0 0x38000000>;
};

diff --git a/arch/arm64/boot/dts/renesas/r8a7796-salvator-x.dts b/arch/arm64/boot/dts/renesas/r8a7796-salvator-x.dts
index c7f40f8..2a019a3 100644
--- a/arch/arm64/boot/dts/renesas/r8a7796-salvator-x.dts
+++ b/arch/arm64/boot/dts/renesas/r8a7796-salvator-x.dts
@@ -28,7 +28,7 @@

memory@48000000 {
device_type = "memory";
- /* first 128MB is reserved for secure area. */
+ /* first 128MiB is reserved for secure area. */
reg = <0x0 0x48000000 0x0 0x78000000>;
};

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 1b67c37..56094e2 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -165,7 +165,7 @@ lr .req x30 // link register

/*
* Pseudo-ops for PC-relative adr/ldr/str <reg>, <symbol> where
- * <symbol> is within the range +/- 4 GB of the PC when running
+ * <symbol> is within the range +/- 4 GiB of the PC when running
* in core kernel context. In module context, a movz/movk sequence
* is used, since modules may be loaded far away from the kernel
* when KASLR is in effect.
diff --git a/arch/arm64/include/asm/boot.h b/arch/arm64/include/asm/boot.h
index ebf2481..95e26bc 100644
--- a/arch/arm64/include/asm/boot.h
+++ b/arch/arm64/include/asm/boot.h
@@ -6,14 +6,14 @@

/*
* arm64 requires the DTB to be 8 byte aligned and
- * not exceed 2MB in size.
+ * not exceed 2MiB in size.
*/
#define MIN_FDT_ALIGN 8
#define MAX_FDT_SIZE SZ_2M

/*
* arm64 requires the kernel image to placed
- * TEXT_OFFSET bytes beyond a 2 MB aligned base
+ * TEXT_OFFSET bytes beyond a 2 MiB aligned base
*/
#define MIN_KIMG_ALIGN SZ_2M

diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index 5d17004..28e63ff 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -156,7 +156,7 @@ struct linux_binprm;
extern int arch_setup_additional_pages(struct linux_binprm *bprm,
int uses_interp);

-/* 1GB of VA */
+/* 1GiB of VA */
#ifdef CONFIG_COMPAT
#define STACK_RND_MASK (test_thread_flag(TIF_32BIT) ? \
0x7ff >> (PAGE_SHIFT - 12) : \
diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index caf86be..7e90ab2 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -37,13 +37,13 @@ enum fixed_addresses {
FIX_HOLE,

/*
- * Reserve a virtual window for the FDT that is 2 MB larger than the
+ * Reserve a virtual window for the FDT that is 2 MiB larger than the
* maximum supported size, and put it at the top of the fixmap region.
* The additional space ensures that any FDT that does not exceed
* MAX_FDT_SIZE can be mapped regardless of whether it crosses any
- * 2 MB alignment boundaries.
+ * 2 MiB alignment boundaries.
*
- * Keep this at the top so it remains 2 MB aligned.
+ * Keep this at the top so it remains 2 MiB aligned.
*/
#define FIX_FDT_SIZE (MAX_FDT_SIZE + SZ_2M)
FIX_FDT_END,
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 4fb6ccd..067ddd6 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -45,11 +45,11 @@
#define __PHYS_OFFSET (KERNEL_START - TEXT_OFFSET)

#if (TEXT_OFFSET & 0xfff) != 0
-#error TEXT_OFFSET must be at least 4KB aligned
+#error TEXT_OFFSET must be at least 4KiB aligned
#elif (PAGE_OFFSET & 0x1fffff) != 0
-#error PAGE_OFFSET must be at least 2MB aligned
+#error PAGE_OFFSET must be at least 2MiB aligned
#elif TEXT_OFFSET > 0x1fffff
-#error TEXT_OFFSET must be less than 2MB
+#error TEXT_OFFSET must be less than 2MiB
#endif

/*
@@ -360,7 +360,7 @@ ENDPROC(preserve_boot_args)
* Setup the initial page tables. We only setup the barest amount which is
* required to get the kernel running. The following sections are required:
* - identity mapping to enable the MMU (low address, TTBR0)
- * - first few MB of the kernel linear mapping to jump to once the MMU has
+ * - first few MiB of the kernel linear mapping to jump to once the MMU has
* been enabled
*/
__create_page_tables:
diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index d3b5f75..9585ea7 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -106,7 +106,7 @@ ENDPROC(\label)
* be called on each CPU.
*
* x0 must be the physical address of the new vector table, and must be
- * 2KB aligned.
+ * 2KiB aligned.
*
* Before calling this, you must check that the stub hypervisor is installed
* everywhere, by waiting for any secondary CPUs to be brought up and then
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index d7e90d9..a5a3068 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -118,9 +118,9 @@ u64 __init kaslr_early_init(u64 dt_phys, u64 modulo_offset)
* OK, so we are proceeding with KASLR enabled. Calculate a suitable
* kernel image offset from the seed. Let's place the kernel in the
* lower half of the VMALLOC area (VA_BITS - 2).
- * Even if we could randomize at page granularity for 16k and 64k pages,
- * let's always round to 2 MB so we don't interfere with the ability to
- * map using contiguous PTEs
+ * Even if we could randomize at page granularity for 16KiB and 64KiB
+ * pages, let's always round to 2 MiB so we don't interfere with the
+ * ability to map using contiguous PTEs
*/
mask = ((1UL << (VA_BITS - 2)) - 1) & ~(SZ_2M - 1);
offset = seed & mask;
@@ -129,10 +129,10 @@ u64 __init kaslr_early_init(u64 dt_phys, u64 modulo_offset)
memstart_offset_seed = seed >> 48;

/*
- * The kernel Image should not extend across a 1GB/32MB/512MB alignment
- * boundary (for 4KB/16KB/64KB granule kernels, respectively). If this
- * happens, increase the KASLR offset by the size of the kernel image
- * rounded up by SWAPPER_BLOCK_SIZE.
+ * The kernel Image should not extend across a 1GiB/32MiB/512MiB
+ * alignment boundary (for 4KiB/16KiB/64KiB granule kernels,
+ * respectively). If this happens, increase the KASLR offset by
+ * the size of the kernel image rounded up by SWAPPER_BLOCK_SIZE.
*/
if ((((u64)_text + offset + modulo_offset) >> SWAPPER_TABLE_SHIFT) !=
(((u64)_end + offset + modulo_offset) >> SWAPPER_TABLE_SHIFT)) {
diff --git a/arch/arm64/kernel/module-plts.c b/arch/arm64/kernel/module-plts.c
index 1ce90d8..0d66734 100644
--- a/arch/arm64/kernel/module-plts.c
+++ b/arch/arm64/kernel/module-plts.c
@@ -118,7 +118,7 @@ static unsigned int count_plts(Elf64_Sym *syms, Elf64_Rela *rela, int num)
* We only have to consider branch targets that resolve
* to undefined symbols. This is not simply a heuristic,
* it is a fundamental limitation, since the PLT itself
- * is part of the module, and needs to be within 128 MB
+ * is part of the module, and needs to be within 128 MiB
* as well, so modules can never grow beyond that limit.
*/
s = syms + ELF64_R_SYM(rela[i].r_info);
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index b8deffa..93b39d6 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -25,12 +25,12 @@ jiffies = jiffies_64;

#define HYPERVISOR_TEXT \
/* \
- * Align to 4 KB so that \
+ * Align to 4 KiB so that \
* a) the HYP vector table is at its minimum \
* alignment of 2048 bytes \
* b) the HYP init code will not cross a page \
* boundary if its size does not exceed \
- * 4 KB (see related ASSERT() below) \
+ * 4 KiB (see related ASSERT() below) \
*/ \
. = ALIGN(SZ_4K); \
VMLINUX_SYMBOL(__hyp_idmap_text_start) = .; \
@@ -60,7 +60,7 @@ jiffies = jiffies_64;
* The size of the PE/COFF section that covers the kernel image, which
* runs from stext to _edata, must be a round multiple of the PE/COFF
* FileAlignment, which we set to its minimum value of 0x200. 'stext'
- * itself is 4 KB aligned, so padding out _edata to a 0x200 aligned
+ * itself is 4 KiB aligned, so padding out _edata to a 0x200 aligned
* boundary should be sufficient.
*/
PECOFF_FILE_ALIGNMENT = 0x200;
@@ -74,16 +74,16 @@ PECOFF_FILE_ALIGNMENT = 0x200;

#if defined(CONFIG_DEBUG_ALIGN_RODATA)
/*
- * 4 KB granule: 1 level 2 entry
- * 16 KB granule: 128 level 3 entries, with contiguous bit
- * 64 KB granule: 32 level 3 entries, with contiguous bit
+ * 4 KiB granule: 1 level 2 entry
+ * 16 KiB granule: 128 level 3 entries, with contiguous bit
+ * 64 KiB granule: 32 level 3 entries, with contiguous bit
*/
#define SEGMENT_ALIGN SZ_2M
#else
/*
- * 4 KB granule: 16 level 3 entries, with contiguous bit
- * 16 KB granule: 4 level 3 entries, without contiguous bit
- * 64 KB granule: 1 level 3 entry
+ * 4 KiB granule: 16 level 3 entries, with contiguous bit
+ * 16 KiB granule: 4 level 3 entries, without contiguous bit
+ * 64 KiB granule: 1 level 3 entry
*/
#define SEGMENT_ALIGN SZ_64K
#endif
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d28dbcf..5d8b743 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -229,7 +229,7 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
next = pud_addr_end(addr, end);

/*
- * For 4K granule only, attempt to put down a 1GB block
+ * For 4KiB granule only, attempt to put down a 1GiB block
*/
if (use_1G_block(addr, next, phys) && !page_mappings_only) {
pud_set_huge(pud, phys, prot);
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 877d42f..daba6ba 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -221,7 +221,7 @@ ENTRY(__cpu_setup)
bic x0, x0, x5 // clear bits
orr x0, x0, x6 // set bits
/*
- * Set/prepare TCR and TTBR. We use 512GB (39-bit) address range for
+ * Set/prepare TCR and TTBR. We use 512GiB (39-bit) address range for
* both user and kernel.
*/
ldr x10, =TCR_TxSZ(VA_BITS) | TCR_CACHE_FLAGS | TCR_SMP_FLAGS | \
--
2.9.3
--
dwmw2
Simon Horman
2017-04-16 23:12:23 UTC
Permalink
[Cc linux-renesas-soc]
Post by David Woodhouse
Less important than in user-visible messages, but still good practice as
there's still no excuse for ARM64 code to look like it was written before
1996.
Hi David,

I'd be happy to take the Renesas portions of this change if they were
broken out into a separate patch. The reason I would prefer a separate
patch is to avoid the possibility of conflicts, even trivial ones.
Post by David Woodhouse
---
arch/arm64/boot/dts/arm/juno-base.dtsi | 2 +-
.../boot/dts/arm/vexpress-v2f-1xv7-ca53x2.dts | 2 +-
arch/arm64/boot/dts/broadcom/ns2-xmc.dts | 24 +++++++++++-----------
arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts | 2 +-
arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi | 2 +-
arch/arm64/boot/dts/marvell/berlin4ct-dmp.dts | 2 +-
arch/arm64/boot/dts/marvell/berlin4ct-stb.dts | 2 +-
arch/arm64/boot/dts/renesas/r8a7795-h3ulcb.dts | 2 +-
arch/arm64/boot/dts/renesas/r8a7795-salvator-x.dts | 2 +-
arch/arm64/boot/dts/renesas/r8a7796-m3ulcb.dts | 2 +-
arch/arm64/boot/dts/renesas/r8a7796-salvator-x.dts | 2 +-
arch/arm64/include/asm/assembler.h | 2 +-
arch/arm64/include/asm/boot.h | 4 ++--
arch/arm64/include/asm/elf.h | 2 +-
arch/arm64/include/asm/fixmap.h | 6 +++---
arch/arm64/kernel/head.S | 8 ++++----
arch/arm64/kernel/hyp-stub.S | 2 +-
arch/arm64/kernel/kaslr.c | 14 ++++++-------
arch/arm64/kernel/module-plts.c | 2 +-
arch/arm64/kernel/vmlinux.lds.S | 18 ++++++++--------
arch/arm64/mm/mmu.c | 2 +-
arch/arm64/mm/proc.S | 2 +-
22 files changed, 53 insertions(+), 53 deletions(-)
...
Geert Uytterhoeven
2017-04-17 11:54:49 UTC
Permalink
Post by Simon Horman
Post by David Woodhouse
Less important than in user-visible messages, but still good practice as
there's still no excuse for ARM64 code to look like it was written before
1996.
Hi David,
I'd be happy to take the Renesas portions of this change if they were
broken out into a separate patch. The reason I would prefer a separate
patch is to avoid the possibility of conflicts, even trivial ones.
s/possibility/reality/, unless this series goes in before v4.12-rc1.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ***@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
Catalin Marinas
2017-04-18 14:13:12 UTC
Permalink
Post by Geert Uytterhoeven
Post by Simon Horman
Post by David Woodhouse
Less important than in user-visible messages, but still good practice as
there's still no excuse for ARM64 code to look like it was written before
1996.
Hi David,
I'd be happy to take the Renesas portions of this change if they were
broken out into a separate patch. The reason I would prefer a separate
patch is to avoid the possibility of conflicts, even trivial ones.
s/possibility/reality/, unless this series goes in before v4.12-rc1.
Maybe the arm-soc guys can take both patches on top of the other series
they pull, in which case, for both patches:

Acked-by: Catalin Marinas <***@arm.com>
Olof Johansson
2017-04-19 14:25:03 UTC
Permalink
Post by Catalin Marinas
Post by Geert Uytterhoeven
Post by Simon Horman
Post by David Woodhouse
Less important than in user-visible messages, but still good practice as
there's still no excuse for ARM64 code to look like it was written before
1996.
Hi David,
I'd be happy to take the Renesas portions of this change if they were
broken out into a separate patch. The reason I would prefer a separate
patch is to avoid the possibility of conflicts, even trivial ones.
s/possibility/reality/, unless this series goes in before v4.12-rc1.
Maybe the arm-soc guys can take both patches on top of the other series
Sure, please resend to ***@kernel.org with collected acks.

-Olof

David Woodhouse
2017-04-13 12:15:54 UTC
Permalink
Post by Will Deacon
A patch making arm64 consistent could be discussed separately, otherwise kdump
becomes the pedantic ISO guy trying to lead by example, but really everybody
ignores him because it's completely inconsequential and they also know he
went 35 versions without giving a monkey's.
I still don't see the logic there for *wanting* kdump to be wrong.

Sure, kdump getting it right doesn't necessarily make a big difference
in itself.

But given that the error has been pointed out, what is the motivation
for *wanting* the error to remain in the kdump code, instead of just
fixing it? "Consistency" isn't an answer because we are *already*
inconsistent — some code gets it right, and other code doesn't.

We should converge towards *correctness* rather than deliberately
adding more incorrect code.

I've used my 'i' key more times just in typing this email than it would
have taken to just fix the problem the first time it was pointed out.
Post by Will Deacon
David, since you seem to be the most outraged, fancy sending a patch? ;)
Coming up. In two parts — user-visible messages, followed by cosmetic
and less relevant stuff.
--
dwmw2
David Woodhouse
2017-04-13 12:17:47 UTC
Permalink
The IEC binary prefixes (Ki, Mi, Gi, etc.) were published over twenty
years ago. We should use them consistently, especially in user-visible
messages.

Sure, it doesn't often matter, just as *most* typos and spelling or
grammar mistakes don't often matter. But sometimes, such misuse really
does introduce ambiguity, and we should avoid that.

Conversely, there is absolutely no good reason *not* to be using the
binary prefixes. Some people once claimed to find them "ugly", or that
they would cause confusion. But those are purely down to unfamiliarity.

The perceived ugliness, and the alleged confusion, will pass with use.

The correctness, and the lack of ambiguity, will not.

ARM64 in particular, as a new platform, has no excuse for not using the
IEC prefixes which predate its existence by a decade and a half.

What's worse is that some people are pointing at the existing errors and
actually claiming that they want their *new* code to be deliberately
wrong in order to be "consistent" with what's there.

So let's fix the user-visible messages in all of arch/arm64 and nip
*that* particular stupidity in the bud...

Signed-off-by: David Woodhouse <***@amazon.co.uk>
---
arch/arm64/Kconfig | 22 +++++++++++-----------
arch/arm64/kernel/efi.c | 2 +-
arch/arm64/kernel/setup.c | 2 +-
arch/arm64/mm/init.c | 36 ++++++++++++++++++------------------
4 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3741859..9643223 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -442,7 +442,7 @@ config CAVIUM_ERRATUM_22375
This implements two gicv3-its errata workarounds for ThunderX. Both
with small impact affecting only ITS table allocation.

- erratum 22375: only alloc 8MB table size
+ erratum 22375: only alloc 8MiB table size
erratum 24313: ignore memory access type

The fixes are in ITS initialization and basically ignore memory access
@@ -528,21 +528,21 @@ choice
Page size (translation granule) configuration.

config ARM64_4K_PAGES
- bool "4KB"
+ bool "4KiB"
help
- This feature enables 4KB pages support.
+ This feature enables 4KiB pages support.

config ARM64_16K_PAGES
- bool "16KB"
+ bool "16KiB"
help
- The system will use 16KB pages support. AArch32 emulation
- requires applications compiled with 16K (or a multiple of 16K)
+ The system will use 16KiB pages support. AArch32 emulation
+ requires applications compiled with 16KiB (or a multiple of 16KiB)
aligned segments.

config ARM64_64K_PAGES
- bool "64KB"
+ bool "64KiB"
help
- This feature enables 64KB pages support (4KB by default)
+ This feature enables 64KiB pages support (4KiB by default)
allowing only two levels of page tables and faster TLB
look-up. AArch32 emulation requires applications compiled
with 64K aligned segments.
@@ -1063,9 +1063,9 @@ config COMPAT
the user helper functions, VFP support and the ptrace interface are
handled appropriately by the kernel.

- If you use a page size other than 4KB (i.e, 16KB or 64KB), please be aware
- that you will only be able to execute AArch32 binaries that were compiled
- with page size aligned segments.
+ If you use a page size other than 4KiB (i.e, 16KiB or 64KiB), please be
+ aware that you will only be able to execute AArch32 binaries that were
+ compiled with page size aligned segments.

If you want to execute 32-bit userspace applications, say Y.

diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 5d17f37..d80c11f 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -31,7 +31,7 @@ static __init pteval_t create_mapping_protection(efi_memory_desc_t *md)
return PROT_DEVICE_nGnRE;

if (WARN_ONCE(!PAGE_ALIGNED(md->phys_addr),
- "UEFI Runtime regions are not aligned to 64 KB -- buggy firmware?"))
+ "UEFI Runtime regions are not aligned to 64 KiB -- buggy firmware?"))
/*
* If the region is not aligned to the page size of the OS, we
* can not use strict permissions, since that would also affect
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 42274bd..86de05d 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -185,7 +185,7 @@ static void __init setup_machine_fdt(phys_addr_t dt_phys)
if (!dt_virt || !early_init_dt_scan(dt_virt)) {
pr_crit("\n"
"Error: invalid device tree blob at physical address %pa (virtual address 0x%p)\n"
- "The dtb must be 8-byte aligned and must not exceed 2 MB in size\n"
+ "The dtb must be 8-byte aligned and must not exceed 2 MiB in size\n"
"\nPlease check your bootloader.",
&dt_phys, dt_virt);

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index e19e065..85481b9 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -111,7 +111,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)

memset(zone_size, 0, sizeof(zone_size));

- /* 4GB maximum for 32-bit only capable devices */
+ /* 4GiB maximum for 32-bit only capable devices */
#ifdef CONFIG_ZONE_DMA
max_dma = PFN_DOWN(arm64_dma_phys_limit);
zone_size[ZONE_DMA] = max_dma - min;
@@ -182,7 +182,7 @@ static int __init early_mem(char *p)
return 1;

memory_limit = memparse(p, &p) & PAGE_MASK;
- pr_notice("Memory limited to %lldMB\n", memory_limit >> 20);
+ pr_notice("Memory limited to %lldMiB\n", memory_limit >> 20);

return 0;
}
@@ -242,7 +242,7 @@ void __init arm64_memblock_init(void)
* We can only add back the initrd memory if we don't end up
* with more memory than we can address via the linear mapping.
* It is up to the bootloader to position the kernel and the
- * initrd reasonably close to each other (i.e., within 32 GB of
+ * initrd reasonably close to each other (i.e., within 32 GiB of
* each other) so that all granule/#levels combinations can
* always access both.
*/
@@ -292,7 +292,7 @@ void __init arm64_memblock_init(void)

early_init_fdt_scan_reserved_mem();

- /* 4GB maximum for 32-bit only capable devices */
+ /* 4GiB maximum for 32-bit only capable devices */
if (IS_ENABLED(CONFIG_ZONE_DMA))
arm64_dma_phys_limit = max_zone_dma_phys();
else
@@ -425,35 +425,35 @@ void __init mem_init(void)

pr_notice("Virtual kernel memory layout:\n");
#ifdef CONFIG_KASAN
- pr_notice(" kasan : 0x%16lx - 0x%16lx (%6ld GB)\n",
+ pr_notice(" kasan : 0x%16lx - 0x%16lx (%6ld GiB)\n",
MLG(KASAN_SHADOW_START, KASAN_SHADOW_END));
#endif
- pr_notice(" modules : 0x%16lx - 0x%16lx (%6ld MB)\n",
+ pr_notice(" modules : 0x%16lx - 0x%16lx (%6ld MiB)\n",
MLM(MODULES_VADDR, MODULES_END));
- pr_notice(" vmalloc : 0x%16lx - 0x%16lx (%6ld GB)\n",
+ pr_notice(" vmalloc : 0x%16lx - 0x%16lx (%6ld GiB)\n",
MLG(VMALLOC_START, VMALLOC_END));
- pr_notice(" .text : 0x%p" " - 0x%p" " (%6ld KB)\n",
+ pr_notice(" .text : 0x%p" " - 0x%p" " (%6ld KiB)\n",
MLK_ROUNDUP(_text, _etext));
- pr_notice(" .rodata : 0x%p" " - 0x%p" " (%6ld KB)\n",
+ pr_notice(" .rodata : 0x%p" " - 0x%p" " (%6ld KiB)\n",
MLK_ROUNDUP(__start_rodata, __init_begin));
- pr_notice(" .init : 0x%p" " - 0x%p" " (%6ld KB)\n",
+ pr_notice(" .init : 0x%p" " - 0x%p" " (%6ld KiB)\n",
MLK_ROUNDUP(__init_begin, __init_end));
- pr_notice(" .data : 0x%p" " - 0x%p" " (%6ld KB)\n",
+ pr_notice(" .data : 0x%p" " - 0x%p" " (%6ld KiB)\n",
MLK_ROUNDUP(_sdata, _edata));
- pr_notice(" .bss : 0x%p" " - 0x%p" " (%6ld KB)\n",
+ pr_notice(" .bss : 0x%p" " - 0x%p" " (%6ld KiB)\n",
MLK_ROUNDUP(__bss_start, __bss_stop));
- pr_notice(" fixed : 0x%16lx - 0x%16lx (%6ld KB)\n",
+ pr_notice(" fixed : 0x%16lx - 0x%16lx (%6ld KiB)\n",
MLK(FIXADDR_START, FIXADDR_TOP));
- pr_notice(" PCI I/O : 0x%16lx - 0x%16lx (%6ld MB)\n",
+ pr_notice(" PCI I/O : 0x%16lx - 0x%16lx (%6ld MiB)\n",
MLM(PCI_IO_START, PCI_IO_END));
#ifdef CONFIG_SPARSEMEM_VMEMMAP
- pr_notice(" vmemmap : 0x%16lx - 0x%16lx (%6ld GB maximum)\n",
+ pr_notice(" vmemmap : 0x%16lx - 0x%16lx (%6ld GiB maximum)\n",
MLG(VMEMMAP_START, VMEMMAP_START + VMEMMAP_SIZE));
- pr_notice(" 0x%16lx - 0x%16lx (%6ld MB actual)\n",
+ pr_notice(" 0x%16lx - 0x%16lx (%6ld MiB actual)\n",
MLM((unsigned long)phys_to_page(memblock_start_of_DRAM()),
(unsigned long)virt_to_page(high_memory)));
#endif
- pr_notice(" memory : 0x%16lx - 0x%16lx (%6ld MB)\n",
+ pr_notice(" memory : 0x%16lx - 0x%16lx (%6ld MiB)\n",
MLM(__phys_to_virt(memblock_start_of_DRAM()),
(unsigned long)high_memory));

@@ -523,7 +523,7 @@ __setup("keepinitrd", keepinitrd_setup);
static int dump_mem_limit(struct notifier_block *self, unsigned long v, void *p)
{
if (memory_limit != (phys_addr_t)ULLONG_MAX) {
- pr_emerg("Memory Limit: %llu MB\n", memory_limit >> 20);
+ pr_emerg("Memory Limit: %llu MiB\n", memory_limit >> 20);
} else {
pr_emerg("Memory Limit: none\n");
}
--
2.9.3
--
dwmw2
Geert Uytterhoeven
2017-04-19 09:29:22 UTC
Permalink
Hi David,
Post by David Woodhouse
The IEC binary prefixes (Ki, Mi, Gi, etc.) were published over twenty
years ago. We should use them consistently, especially in user-visible
messages.
Sure, it doesn't often matter, just as *most* typos and spelling or
grammar mistakes don't often matter. But sometimes, such misuse really
does introduce ambiguity, and we should avoid that.
Conversely, there is absolutely no good reason *not* to be using the
binary prefixes. Some people once claimed to find them "ugly", or that
they would cause confusion. But those are purely down to unfamiliarity.
The perceived ugliness, and the alleged confusion, will pass with use.
The correctness, and the lack of ambiguity, will not.
ARM64 in particular, as a new platform, has no excuse for not using the
IEC prefixes which predate its existence by a decade and a half.
What's worse is that some people are pointing at the existing errors and
actually claiming that they want their *new* code to be deliberately
wrong in order to be "consistent" with what's there.
So let's fix the user-visible messages in all of arch/arm64 and nip
*that* particular stupidity in the bud...
For correctness:
Reviewed-by: Geert Uytterhoeven <geert+***@glider.be>

For the policy:
Acked-by: Geert Uytterhoeven <geert+***@glider.be>

You're gonna need a new sweep for soon-to-be-added occurrences, though.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ***@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
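
One rough way to do such a sweep, sketched here with plain git; the
pattern is deliberately loose, so its hits need manual review (register
names and identifiers can match too):

    git grep -nE '[0-9]+ ?[KMG]B' arch/arm64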
AKASHI Takahiro
2017-03-28 06:53:19 UTC
Permalink
From: Sameer Goel <***@codeaurora.org>

In cases where a device tree is not provided (i.e. an ACPI-based system), an
empty fdt is generated by efistub. #address-cells and #size-cells are not
set in the empty fdt, so they default to 1 (4 bytes wide). This can be an
issue on 64-bit systems where values representing addresses, etc. may be
8 bytes wide, as the default value does not align with the general
requirements for an empty DTB and is fragile when passed to other agents,
as extra care is required to read the entire width of a value.

This issue is observed on Qualcomm Technologies QDF24XX platforms when
kexec-tools inserts 64-bit addresses into the "linux,elfcorehdr" and
"linux,usable-memory-range" properties of the fdt. When the values are
later consumed, they are truncated to 32-bit.

Setting #address-cells and #size-cells to 2 at creation of the empty fdt
resolves the observed issue, and makes the fdt less fragile.

Signed-off-by: Sameer Goel <***@codeaurora.org>
Signed-off-by: Jeffrey Hugo <***@codeaurora.org>
---
drivers/firmware/efi/libstub/fdt.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
index 260c4b4b492e..82973b86efe4 100644
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -16,6 +16,22 @@

#include "efistub.h"

+#define EFI_DT_ADDR_CELLS_DEFAULT 2
+#define EFI_DT_SIZE_CELLS_DEFAULT 2
+
+static void fdt_update_cell_size(efi_system_table_t *sys_table, void *fdt)
+{
+ int offset;
+
+ offset = fdt_path_offset(fdt, "/");
+ /* Set the #address-cells and #size-cells values for an empty tree */
+
+ fdt_setprop_u32(fdt, offset, "#address-cells",
+ EFI_DT_ADDR_CELLS_DEFAULT);
+
+ fdt_setprop_u32(fdt, offset, "#size-cells", EFI_DT_SIZE_CELLS_DEFAULT);
+}
+
static efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
unsigned long orig_fdt_size,
void *fdt, int new_fdt_size, char *cmdline_ptr,
@@ -42,10 +58,18 @@ static efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
}
}

- if (orig_fdt)
+ if (orig_fdt) {
status = fdt_open_into(orig_fdt, fdt, new_fdt_size);
- else
+ } else {
status = fdt_create_empty_tree(fdt, new_fdt_size);
+ if (status == 0) {
+ /*
+ * Any failure from the following function is non
+ * critical
+ */
+ fdt_update_cell_size(sys_table, fdt);
+ }
+ }

if (status != 0)
goto fdt_set_fail;
--
2.11.1
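
To make the truncation concrete: with #address-cells = #size-cells = 2,
every 64-bit value in the fdt is stored as two big-endian 32-bit cells
that the consumer must recombine. Below is a minimal, hypothetical libfdt
sketch of that step, not the actual kexec-tools code; it assumes the
property lives under /chosen as in the kdump bindings used by this series:

    #include <stdint.h>
    #include <libfdt.h>

    /* Read one (base, size) pair from "linux,usable-memory-range",
     * assuming #address-cells = #size-cells = 2 on the root node. */
    static int read_usable_range(const void *fdt, uint64_t *base, uint64_t *size)
    {
            const fdt32_t *prop;
            int node, len;

            node = fdt_path_offset(fdt, "/chosen");
            if (node < 0)
                    return -1;

            prop = fdt_getprop(fdt, node, "linux,usable-memory-range", &len);
            if (!prop || len < 4 * (int)sizeof(fdt32_t))
                    return -1;

            /* Each 64-bit value spans two big-endian 32-bit cells. */
            *base = ((uint64_t)fdt32_to_cpu(prop[0]) << 32) | fdt32_to_cpu(prop[1]);
            *size = ((uint64_t)fdt32_to_cpu(prop[2]) << 32) | fdt32_to_cpu(prop[3]);
            return 0;
    }

With the one-cell default, the same property would be written and read as
32-bit values, which is exactly the truncation observed on QDF24XX.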
Ard Biesheuvel
2017-03-28 10:08:24 UTC
Permalink
Post by AKASHI Takahiro
In cases where a device tree is not provided (i.e. an ACPI-based system), an
empty fdt is generated by efistub. #address-cells and #size-cells are not
set in the empty fdt, so they default to 1 (4 bytes wide). This can be an
issue on 64-bit systems where values representing addresses, etc. may be
8 bytes wide, as the default value does not align with the general
requirements for an empty DTB and is fragile when passed to other agents,
as extra care is required to read the entire width of a value.
This issue is observed on Qualcomm Technologies QDF24XX platforms when
kexec-tools inserts 64-bit addresses into the "linux,elfcorehdr" and
"linux,usable-memory-range" properties of the fdt. When the values are
later consumed, they are truncated to 32-bit.
Setting #address-cells and #size-cells to 2 at creation of the empty fdt
resolves the observed issue, and makes the fdt less fragile.
---
drivers/firmware/efi/libstub/fdt.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
index 260c4b4b492e..82973b86efe4 100644
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -16,6 +16,22 @@
#include "efistub.h"
+#define EFI_DT_ADDR_CELLS_DEFAULT 2
+#define EFI_DT_SIZE_CELLS_DEFAULT 2
+
+static void fdt_update_cell_size(efi_system_table_t *sys_table, void *fdt)
+{
+ int offset;
+
+ offset = fdt_path_offset(fdt, "/");
+ /* Set the #address-cells and #size-cells values for an empty tree */
+
+ fdt_setprop_u32(fdt, offset, "#address-cells",
+ EFI_DT_ADDR_CELLS_DEFAULT);
+
+ fdt_setprop_u32(fdt, offset, "#size-cells", EFI_DT_SIZE_CELLS_DEFAULT);
+}
+
static efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
unsigned long orig_fdt_size,
void *fdt, int new_fdt_size, char *cmdline_ptr,
@@ -42,10 +58,18 @@ static efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
}
}
- if (orig_fdt)
+ if (orig_fdt) {
status = fdt_open_into(orig_fdt, fdt, new_fdt_size);
- else
+ } else {
status = fdt_create_empty_tree(fdt, new_fdt_size);
+ if (status == 0) {
+ /*
+ * Any failure from the following function is non
+ * critical
+ */
+ fdt_update_cell_size(sys_table, fdt);
+ }
+ }
if (status != 0)
goto fdt_set_fail;
--
2.11.1