Discussion:
[PATCH] x86_64, kexec: Avoid unnecessary identity mappings for kdump
Xunlei Pang
2017-03-17 10:22:30 UTC
Permalink
kexec setups identity mappings for all the memory mapped in 1st kernel,
this is not necessary for the kdump case. Actually it can cause extra
memory consumption for paging structures, which is quite considerable
on modern machines with huge memory.

E.g. On our 24TB machine, it will waste around 96MB (around 4MB/TB)
from the reserved memory range if setting all the identity mappings.

It also causes some trouble for distributions that use an intelligent
policy to evaluate the proper "crashkernel=X" for users.

To solve it, in case of kdump, we only setup identity mappings for the
crash memory and the ISA memory(may be needed by purgatory/kdump boot).

Signed-off-by: Xunlei Pang <***@redhat.com>
---
arch/x86/kernel/machine_kexec_64.c | 34 ++++++++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 857cdbd..db77a76 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -112,14 +112,40 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)

level4p = (pgd_t *)__va(start_pgtable);
clear_page(level4p);
- for (i = 0; i < nr_pfn_mapped; i++) {
- mstart = pfn_mapped[i].start << PAGE_SHIFT;
- mend = pfn_mapped[i].end << PAGE_SHIFT;

+ if (image->type == KEXEC_TYPE_CRASH) {
+ /* Always map the ISA range */
result = kernel_ident_mapping_init(&info,
- level4p, mstart, mend);
+ level4p, 0, ISA_END_ADDRESS);
if (result)
return result;
+
+ /* crashk_low_res may not be initialized when reaching here */
+ if (crashk_low_res.end) {
+ mstart = crashk_low_res.start;
+ mend = crashk_low_res.end + 1;
+ result = kernel_ident_mapping_init(&info,
+ level4p, mstart, mend);
+ if (result)
+ return result;
+ }
+
+ mstart = crashk_res.start;
+ mend = crashk_res.end + 1;
+ result = kernel_ident_mapping_init(&info,
+ level4p, mstart, mend);
+ if (result)
+ return result;
+ } else {
+ for (i = 0; i < nr_pfn_mapped; i++) {
+ mstart = pfn_mapped[i].start << PAGE_SHIFT;
+ mend = pfn_mapped[i].end << PAGE_SHIFT;
+
+ result = kernel_ident_mapping_init(&info,
+ level4p, mstart, mend);
+ if (result)
+ return result;
+ }
}

/*
--
1.8.3.1
Eric W. Biederman
2017-03-17 17:38:49 UTC
Permalink
Post by Xunlei Pang
kexec setups identity mappings for all the memory mapped in 1st kernel,
this is not necessary for the kdump case. Actually it can cause extra
memory consumption for paging structures, which is quite considerable
on modern machines with huge memory.
E.g. On our 24TB machine, it will waste around 96MB (around 4MB/TB)
from the reserved memory range if setting all the identity mappings.
It also causes some trouble for distributions that use an intelligent
policy to evaluate the proper "crashkernel=X" for users.
To solve it, in case of kdump, we only setup identity mappings for the
crash memory and the ISA memory(may be needed by purgatory/kdump boot).
How about instead we detect the presence of 1GiB pages and use them
if they are available. We already use 2MiB pages. If we can do that
we will only need about 192K for page tables in the case you have
described and this all becomes a non-issue.

I strongly suspect that the presence of 24TiB of memory in an x86 system
strongly correlates to the presence of 1GiB pages.

In principle we certainly can use a less extensive mapping but that
should not be something that differs between the two kexec cases.

I can see forcing the low 1MiB range in. But calling it ISA range is
very wrong and misleading. The reasons that range are special during
boot-up have nothing to do with ISA. But have everything to do with
where legacy page tables are mapped, and where we need identity pages to
start other cpus. I think the only user that actually cares is
purgatory where it plays swapping games with the low 1MiB because we
can't preload what we need to down there or it would mess up the running
kernel. So saying anything about the old ISA bus is wrong and
misleading. At the very very least we need accurate comments.

Eric
Post by Xunlei Pang
---
arch/x86/kernel/machine_kexec_64.c | 34 ++++++++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 857cdbd..db77a76 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -112,14 +112,40 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
level4p = (pgd_t *)__va(start_pgtable);
clear_page(level4p);
- for (i = 0; i < nr_pfn_mapped; i++) {
- mstart = pfn_mapped[i].start << PAGE_SHIFT;
- mend = pfn_mapped[i].end << PAGE_SHIFT;
+ if (image->type == KEXEC_TYPE_CRASH) {
+ /* Always map the ISA range */
result = kernel_ident_mapping_init(&info,
- level4p, mstart, mend);
+ level4p, 0, ISA_END_ADDRESS);
if (result)
return result;
+
+ /* crashk_low_res may not be initialized when reaching here */
+ if (crashk_low_res.end) {
+ mstart = crashk_low_res.start;
+ mend = crashk_low_res.end + 1;
+ result = kernel_ident_mapping_init(&info,
+ level4p, mstart, mend);
+ if (result)
+ return result;
+ }
+
+ mstart = crashk_res.start;
+ mend = crashk_res.end + 1;
+ result = kernel_ident_mapping_init(&info,
+ level4p, mstart, mend);
+ if (result)
+ return result;
+ } else {
+ for (i = 0; i < nr_pfn_mapped; i++) {
+ mstart = pfn_mapped[i].start << PAGE_SHIFT;
+ mend = pfn_mapped[i].end << PAGE_SHIFT;
+
+ result = kernel_ident_mapping_init(&info,
+ level4p, mstart, mend);
+ if (result)
+ return result;
+ }
}
/*
Xunlei Pang
2017-03-20 02:22:44 UTC
Permalink
Post by Eric W. Biederman
Post by Xunlei Pang
kexec setups identity mappings for all the memory mapped in 1st kernel,
this is not necessary for the kdump case. Actually it can cause extra
memory consumption for paging structures, which is quite considerable
on modern machines with huge memory.
E.g. On our 24TB machine, it will waste around 96MB (around 4MB/TB)
from the reserved memory range if setting all the identity mappings.
It also causes some trouble for distributions that use an intelligent
policy to evaluate the proper "crashkernel=X" for users.
To solve it, in case of kdump, we only setup identity mappings for the
crash memory and the ISA memory(may be needed by purgatory/kdump boot).
How about instead we detect the presence of 1GiB pages and use them
if they are available. We already use 2MiB pages. If we can do that
we will only need about 192K for page tables in the case you have
described and this all becomes a non-issue.
I strongly suspect that the presence of 24TiB of memory in an x86 system
strongly correlates to the presence of 1GiB pages.
In principle we certainly can use a less extensive mapping but that
should not be something that differs between the two kexec cases.
Ok, will try gbpages for the identity mapping.

Regards,
Xunlei
Post by Eric W. Biederman
I can see forcing the low 1MiB range in. But calling it ISA range is
very wrong and misleading. The reasons that range are special during
boot-up have nothing to do with ISA. But have everything to do with
where legacy page tables are mapped, and where we need identity pages to
start other cpus. I think the only user that actually cares is
purgatory where it plays swapping games with the low 1MiB because we
can't preload what we need to down there or it would mess up the running
kernel. So saying anything about the old ISA bus is wrong and
misleading. At the very very least we need accurate comments.
Eric
Post by Xunlei Pang
---
arch/x86/kernel/machine_kexec_64.c | 34 ++++++++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 857cdbd..db77a76 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -112,14 +112,40 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
level4p = (pgd_t *)__va(start_pgtable);
clear_page(level4p);
- for (i = 0; i < nr_pfn_mapped; i++) {
- mstart = pfn_mapped[i].start << PAGE_SHIFT;
- mend = pfn_mapped[i].end << PAGE_SHIFT;
+ if (image->type == KEXEC_TYPE_CRASH) {
+ /* Always map the ISA range */
result = kernel_ident_mapping_init(&info,
- level4p, mstart, mend);
+ level4p, 0, ISA_END_ADDRESS);
if (result)
return result;
+
+ /* crashk_low_res may not be initialized when reaching here */
+ if (crashk_low_res.end) {
+ mstart = crashk_low_res.start;
+ mend = crashk_low_res.end + 1;
+ result = kernel_ident_mapping_init(&info,
+ level4p, mstart, mend);
+ if (result)
+ return result;
+ }
+
+ mstart = crashk_res.start;
+ mend = crashk_res.end + 1;
+ result = kernel_ident_mapping_init(&info,
+ level4p, mstart, mend);
+ if (result)
+ return result;
+ } else {
+ for (i = 0; i < nr_pfn_mapped; i++) {
+ mstart = pfn_mapped[i].start << PAGE_SHIFT;
+ mend = pfn_mapped[i].end << PAGE_SHIFT;
+
+ result = kernel_ident_mapping_init(&info,
+ level4p, mstart, mend);
+ if (result)
+ return result;
+ }
}
/*
Loading...