summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/e820.c
AgeCommit message (Collapse)Author
2017-01-28x86/boot/e820: Consolidate 'struct e820_entry *entry' local variable namesIngo Molnar
So the E820 code has a lot of cases of: struct e820_entry *ei; ... but the 'ei' name makes very little sense if you think about it, it's not an abbreviation of anything obviously related to E820 table entries. This results in weird looking lines such as: if (type && ei->type != type) where you might have to double check what 'ei' really means, plus weird looking secondary variable names, such as: u64 ei_end; The 'ei' name was introduced in a single function over a decade ago, and then mindlessly cargo-copied over into other functions - with usage growing to over 60 uses altogether (!). ( My best guess is that it might have been originally meant as abbreviation of 'entry interval'. ) Anyway, rename these to the much more obvious: struct e820_entry *entry; No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Rename memblock_x86_fill() to e820__memblock_setup() and ↵Ingo Molnar
improve the explanations So memblock_x86_fill() is another E820 code misnomer: - nothing in its name tells us that it's part of the E820 subsystem ... - The 'fill' wording is ambiguous and doesn't tell us whether it's a single entry or some process - while the _real_ purpose of the function is hidden, which is to do a complete setup of the (platform independent) memblock regions. So rename it accordingly, to e820__memblock_setup(). Also translate this incomprehensible and misleading comment: /* * EFI may have more than 128 entries * We are safe to enable resizing, beause memblock_x86_fill() * is rather later for x86 */ memblock_allow_resize(); The worst aspect of this comment isn't even the sloppy typos, but that it casually mentions a '128' number with no explanation, which makes one lead to the assumption that this is related to the well-known limit of a maximum of 128 E820 entries passed via legacy bootloaders. But no, the _real_ meaning of 128 here is that of the memblock subsystem, which too happens to have a 128 entries limit for very early memblock regions (which is unrelated to E820), via INIT_MEMBLOCK_REGIONS ... So change the comment to a more comprehensible version: /* * The bootstrap memblock region count maximum is 128 entries * (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries * than that - so allow memblock resizing. * * This is safe, because this call happens pretty late during x86 setup, * so we know about reserved memory regions already. (This is important * so that memblock resizing does no stomp over reserved areas.) */ memblock_allow_resize(); No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Basic cleanup of e820.cIngo Molnar
Over the last decade or so e820.c has become an ureadable mess of tinkerware. Perform some very basic cleanups before doing more intricate cleanups, so that my eyes don't start bleeding when I look at it. Here's some of the excesses: - Total disregard of countless aspects of Documentation/CodingStyle. - Totally inconsistent hodge-podge of various coding styles and practices. - Gems like: (unsigned long long) e820_table->entries[i].addr ... which is a completely unnecessary type conversion of an u64 value. - Incomprehensible comments while there are major functions with absolutely no explanation - plus an armada of typos and grammar mistakes. - Mindless checkpatch artifacts such as: if (append_e820_table(boot_params.e820_table, boot_params.e820_entries) < 0) { for_each_free_mem_range(u, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) { - Actively misleading comments: /* In case someone cares... */ return who; ( The usage site of the return value just a few lines further down makes it clear that we very much care about the return value, we use it to print out the e820 map... ) - Colorfully inconsistent capitalization and punctuation throughout. - etc. This patch fixes only the worst excesses - there's more to fix. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Rename e820_table_saved to e820_table_firmware and improve ↵Ingo Molnar
the description So the 'e820_table_saved' is a bit of a misnomer that hides its real purpose. At first sight the name suggests that it's some sort save/restore mechanism, as this is how we typically name such facilities in the kernel. But that is not so, e820_table_saved is the original firmware version of the e820 table, not modified by the kernel. This table is displayed in the /sys/firmware/memmap file, and it's also used by the hibernation code to calculate a physical memory layout MD5 fingerprint checksum which is invariant of the kernel. So rename it to 'e820_table_firmware' and update all the comments to better describe the main e820 data strutures. Also rename: 'initial_e820_table_saved' => 'e820_table_firmware_init' 'e820_update_range_saved' => 'e820_update_range_firmware' ... to better match the new nomenclature. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Rename default_machine_specific_memory_setup() to ↵Ingo Molnar
e820__memory_setup_default() The default_machine_specific_memory_setup() is a mouthful and despite the many words it doesn't actually tell us clearly what it does. The function is the x86 legacy memory layout setup code, based on E820-formatted memory layout information passed by the bootloader via the boot_params. Rename it to e820__memory_setup_default() to better signal its purpose. Also rename the related higher level function to be consistent with this new naming: setup_memory_map() => e820__memory_setup() No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Harmonize the 'struct e820_table' fieldsIngo Molnar
So the e820_table->map and e820_table->nr_map names are a bit confusing, because it's not clear what a 'map' really means (it could be a bitmap, or some other data structure), nor is it clear what nr_map means (is it a current index, or some other count). Rename the fields from: e820_table->map => e820_table->entries e820_table->nr_map => e820_table->nr_entries which makes it abundantly clear that these are entries of the table, and that the size of the table is ->nr_entries. Propagate the changes to all affected files. Where necessary, adjust local variable names to better reflect the new field names. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Rename everything to e820_tableIngo Molnar
No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Rename 'e820_map' variables to 'e820_array'Ingo Molnar
In line with the rename to 'struct e820_array', harmonize the naming of common e820 table variable names as well: e820 => e820_array e820_saved => e820_array_saved e820_map => e820_array initial_e820 => e820_array_init This makes the variable names more consistent and easier to grep for. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Remove e820_mark_nosave_regions() definition ugliesIngo Molnar
The e820_mark_nosave_regions definition has a number of ugly #ifdef conditions that unnecessarily uglify both the header and the e820.c file. Make this function unconditional: most distro kernels have hibernation enabled. If LTO functionality is added in the future it will be able to eliminate unused functions without uglifying the source code. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Rename the basic e820 data types to 'struct e820_entry' and ↵Ingo Molnar
'struct e820_array' The 'e820entry' and 'e820map' names have various annoyances: - the missing underscore departs from the usual kernel style and makes the code look weird, - in the past I kept confusing the 'map' with the 'entry', because a 'map' is ambiguous in that regard, - it's not really clear from the 'e820map' that this is a regular C array. Rename them to 'struct e820_entry' and 'struct e820_array' accordingly. ( Leave the legacy UAPI header alone but do the rename in the bootparam.h and e820/types.h file - outside tools relying on these defines should either adjust their code, or should use the legacy header, or should create their private copies for the definitions. ) No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Move asm/e820.h to asm/e820/api.hIngo Molnar
In line with asm/e820/types.h, move the e820 API declarations to asm/e820/api.h and update all usage sites. This is just a mechanical, obviously correct move & replace patch, there will be subsequent changes to clean up the code and to make better use of the new header organization. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-12x86/e820/32: Fix e820_search_gap() error handling on x86-32Arnd Bergmann
GCC correctly points out that on 32-bit kernels, e820_search_gap() not finding a start now leads to pci_mem_start ('gapstart') being set to an uninitialized value: arch/x86/kernel/e820.c: In function 'e820_setup_gap': arch/x86/kernel/e820.c:641:16: error: 'gapstart' may be used uninitialized in this function [-Werror=maybe-uninitialized] This restores the behavior from before this cleanup: b4ed1d15b453 ("x86/e820: Make e820_search_gap() static and remove unused variables") ... defaulting to address 0x10000000 if nothing was found. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Wei Yang <richard.weiyang@gmail.com> Fixes: b4ed1d15b453 ("x86/e820: Make e820_search_gap() static and remove unused variables") Link: http://lkml.kernel.org/r/20170111144926.695369-1-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-28x86/e820: Make e820_search_gap() static and remove unused variablesWei Yang
e820_search_gap() is just used locally now and the 'start_addr' and 'end_addr' parameters are fixed values. Also, 'gapstart' is not checked in this function anymore. So make the function static and remove those unused variables. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akataria@vmware.com Link: http://lkml.kernel.org/r/1482676551-11411-1-git-send-email-richard.weiyang@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-16x86/e820: Don't merge consecutive E820_PRAM rangesDan Williams
Commit: 917db484dc6a ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation") ... fixed up the broken manipulations of max_pfn in the presence of E820_PRAM ranges. However, it also broke the sanitize_e820_map() support for not merging E820_PRAM ranges. Re-introduce the enabling to keep resource boundaries between consecutive defined ranges. Otherwise, for example, an environment that boots with memmap=2G!8G,2G!10G will end up with a single 4G /dev/pmem0 device instead of a /dev/pmem0 and /dev/pmem1 device 2G in size. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhang Yi <yizhan@redhat.com> Cc: linux-nvdimm@lists.01.org Fixes: 917db484dc6a ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation") Link: http://lkml.kernel.org/r/147629530854.10618.10383744751594021268.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulationDan Williams
In commit: ec776ef6bbe1 ("x86/mm: Add support for the non-standard protected e820 type") Christoph references the original patch I wrote implementing pmem support. The intent of the 'max_pfn' changes in that commit were to enable persistent memory ranges to be covered by the struct page memmap by default. However, that approach was abandoned when Christoph ported the patches [1], and that functionality has since been replaced by devm_memremap_pages(). In the meantime, this max_pfn manipulation is confusing kdump [2] that assumes that everything covered by the max_pfn is "System RAM". This results in kdump hanging or crashing. [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-March/000348.html [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1351098 So fix it. Reported-by: Zhang Yi <yizhan@redhat.com> Reported-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Zhang Yi <yizhan@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@vger.kernel.org> # v4.1 and later kernels Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-nvdimm@lists.01.org Fixes: ec776ef6bbe1 ("x86/mm: Add support for the non-standard protected e820 type") Link: http://lkml.kernel.org/r/147448744538.34910.11287693517367139607.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-21x86/e820: Use much less memory for e820/e820_saved, save up to 120kDenys Vlasenko
The maximum size of e820 map array for EFI systems is defined as E820_X_MAX (E820MAX + 3 * MAX_NUMNODES). In x86_64 defconfig, this ends up with E820_X_MAX = 320, e820 and e820_saved are 6404 bytes each. With larger configs, for example Fedora kernels, E820_X_MAX = 3200, e820 and e820_saved are 64004 bytes each. Most of this space is wasted. Typical machines have some 20-30 e820 areas at most. After previous patch, e820 and e820_saved are pointers to e280 maps. Change them to initially point to maps which are __initdata. At the very end of kernel init, just before __init[data] sections are freed in free_initmem(), allocate smaller blocks, copy maps there, and change pointers. The late switch makes sure that all functions which can be used to change e820 maps are no longer accessible (they are all __init functions). Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160918182125.21000-1-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-21x86/e820: Prepare e280 code for switch to dynamic storageDenys Vlasenko
This patch turns e820 and e820_saved into pointers to e820 tables, of the same size as before. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160917213927.1787-2-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-21x86/e820: Mark some static functions __initDenys Vlasenko
They are all called only from other __init functions in e820.c Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160917213927.1787-1-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08x86/e820: Fix very large 'size' handling boundary conditionWei Yang
The (start, size) tuple represents a range [start, start + size - 1], which means "start" and "start + size - 1" should be compared to see whether the range overflows. For example, a range with (start, size): (0xffffffff fffffff0, 0x00000000 00000010) represents [0xffffffff fffffff0, 0xffffffff ffffffff] ... would be judged overflow in the original code, while actually it is not. This patch fixes this and makes sure it still works when size is zero. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yinghai@kernel.org Link: http://lkml.kernel.org/r/1471657213-31817-1-git-send-email-richard.weiyang@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-15Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "This is another big update. Main changes are: - lots of x86 system call (and other traps/exceptions) entry code enhancements. In particular the complex parts of the 64-bit entry code have been migrated to C code as well, and a number of dusty corners have been refreshed. (Andy Lutomirski) - vDSO special mapping robustification and general cleanups (Andy Lutomirski) - cpufeature refactoring, cleanups and speedups (Borislav Petkov) - lots of other changes ..." * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) x86/cpufeature: Enable new AVX-512 features x86/entry/traps: Show unhandled signal for i386 in do_trap() x86/entry: Call enter_from_user_mode() with IRQs off x86/entry/32: Change INT80 to be an interrupt gate x86/entry: Improve system call entry comments x86/entry: Remove TIF_SINGLESTEP entry work x86/entry/32: Add and check a stack canary for the SYSENTER stack x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed x86/entry: Vastly simplify SYSENTER TF (single-step) handling x86/entry/traps: Clear DR6 early in do_debug() and improve the comment x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions x86/entry/32: Restore FLAGS on SYSEXIT x86/entry/32: Filter NT and speed up AC filtering in SYSENTER x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test selftests/x86: In syscall_nt, test NT|TF as well x86/asm-offsets: Remove PARAVIRT_enabled x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled uprobes: __create_xol_area() must nullify xol_mapping.fault x86/cpufeature: Create a new synthetic cpu capability for machine check recovery ...
2016-01-30x86/cpufeature: Carve out X86_FEATURE_*Borislav Petkov
Move them to a separate header and have the following dependency: x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h This makes it easier to use the header in asm code and not include the whole cpufeature.h and add guards for asm. Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30x86/e820: Set System RAM type and descriptorToshi Kani
Change e820_reserve_resources() to set 'flags' and 'desc' from e820 types. Set E820_RESERVED_KERN and E820_RAM's (System RAM) io resource type to IORESOURCE_SYSTEM_RAM. Do the same for "Kernel data", "Kernel code", and "Kernel bss", which are child nodes of System RAM. I/O resource descriptor is set to 'desc' for entries that are (and will be) target ranges of walk_iomem_res() and region_intersects(). Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joerg Roedel <jroedel@suse.de> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Mark Salter <msalter@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: WANG Chao <chaowang@redhat.com> Cc: linux-arch@vger.kernel.org Cc: linux-mm <linux-mm@kvack.org> Link: http://lkml.kernel.org/r/1453841853-11383-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-30x86/e820: Deinline e820_type_to_string, save 126 bytesDenys Vlasenko
This function compiles to 102 bytes of machine code. It has two callsites. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Link: http://lkml.kernel.org/r/1443443037-22077-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-29Merge tag 'libnvdimm-for-4.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm Pull libnvdimm subsystem from Dan Williams: "The libnvdimm sub-system introduces, in addition to the libnvdimm-core, 4 drivers / enabling modules: NFIT: Instantiates an "nvdimm bus" with the core and registers memory devices (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware Interface table). After registering NVDIMMs the NFIT driver then registers "region" devices. A libnvdimm-region defines an access mode and the boundaries of persistent memory media. A region may span multiple NVDIMMs that are interleaved by the hardware memory controller. In turn, a libnvdimm-region can be carved into a "namespace" device and bound to the PMEM or BLK driver which will attach a Linux block device (disk) interface to the memory. PMEM: Initially merged in v4.1 this driver for contiguous spans of persistent memory address ranges is re-worked to drive PMEM-namespaces emitted by the libnvdimm-core. In this update the PMEM driver, on x86, gains the ability to assert that writes to persistent memory have been flushed all the way through the caches and buffers in the platform to persistent media. See memcpy_to_pmem() and wmb_pmem(). BLK: This new driver enables access to persistent memory media through "Block Data Windows" as defined by the NFIT. The primary difference of this driver to PMEM is that only a small window of persistent memory is mapped into system address space at any given point in time. Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access different portions of the media. BLK-mode, by definition, does not support DAX. BTT: This is a library, optionally consumed by either PMEM or BLK, that converts a byte-accessible namespace into a disk with atomic sector update semantics (prevents sector tearing on crash or power loss). The sinister aspect of sector tearing is that most applications do not know they have a atomic sector dependency. At least today's disk's rarely ever tear sectors and if they do one almost certainly gets a CRC error on access. NVDIMMs will always tear and always silently. Until an application is audited to be robust in the presence of sector-tearing the usage of BTT is recommended. Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig, Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox, Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael Wysocki, and Bob Moore" * tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm: (33 commits) arch, x86: pmem api for ensuring durability of persistent memory updates libnvdimm: Add sysfs numa_node to NVDIMM devices libnvdimm: Set numa_node to NVDIMM devices acpi: Add acpi_map_pxm_to_online_node() libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only pmem: flag pmem block devices as non-rotational libnvdimm: enable iostat pmem: make_request cleanups libnvdimm, pmem: fix up max_hw_sectors libnvdimm, blk: add support for blk integrity libnvdimm, btt: add support for blk integrity fs/block_dev.c: skip rw_page if bdev has integrity libnvdimm: Non-Volatile Devices tools/testing/nvdimm: libnvdimm unit test infrastructure libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory nd_btt: atomic sector updates libnvdimm: infrastructure for btt devices libnvdimm: write blk label set libnvdimm: write pmem label set libnvdimm: blk labels and namespace instantiation ...
2015-06-24mm/memblock: add extra "flags" to memblock to allow selection of memory ↵Tony Luck
based on attribute Some high end Intel Xeon systems report uncorrectable memory errors as a recoverable machine check. Linux has included code for some time to process these and just signal the affected processes (or even recover completely if the error was in a read only page that can be replaced by reading from disk). But we have no recovery path for errors encountered during kernel code execution. Except for some very specific cases were are unlikely to ever be able to recover. Enter memory mirroring. Actually 3rd generation of memory mirroing. Gen1: All memory is mirrored Pro: No s/w enabling - h/w just gets good data from other side of the mirror Con: Halves effective memory capacity available to OS/applications Gen2: Partial memory mirror - just mirror memory begind some memory controllers Pro: Keep more of the capacity Con: Nightmare to enable. Have to choose between allocating from mirrored memory for safety vs. NUMA local memory for performance Gen3: Address range partial memory mirror - some mirror on each memory controller Pro: Can tune the amount of mirror and keep NUMA performance Con: I have to write memory management code to implement The current plan is just to use mirrored memory for kernel allocations. This has been broken into two phases: 1) This patch series - find the mirrored memory, use it for boot time allocations 2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the unused mirrored memory from mm/memblock.c and only give it out to select kernel allocations (this is still being scoped because page_alloc.c is scary). This patch (of 3): Add extra "flags" to memblock to allow selection of memory based on attribute. No functional changes Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Xiexiuqi <xiexiuqi@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-27e820, efi: add ACPI 6.0 persistent memory typesDan Williams
ACPI 6.0 formalizes e820-type-7 and efi-type-14 as persistent memory. Mark it "reserved" and allow it to be claimed by a persistent memory device driver. This definition is in addition to the Linux kernel's existing type-12 definition that was recently added in support of shipping platforms with NVDIMM support that predate ACPI 6.0 (which now classifies type-12 as OEM reserved). Note, /proc/iomem can be consulted for differentiating legacy "Persistent Memory (legacy)" E820_PRAM vs standard "Persistent Memory" E820_PMEM. Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jens Axboe <axboe@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-04-18Merge branch 'x86-pmem-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull PMEM driver from Ingo Molnar: "This is the initial support for the pmem block device driver: persistent non-volatile memory space mapped into the system's physical memory space as large physical memory regions. The driver is based on Intel code, written by Ross Zwisler, with fixes by Boaz Harrosh, integrated with x86 e820 memory resource management and tidied up by Christoph Hellwig. Note that there were two other separate pmem driver submissions to lkml: but apparently all parties (Ross Zwisler, Boaz Harrosh) are reasonably happy with this initial version. This version enables minimal support that enables persistent memory devices out in the wild to work as block devices, identified through a magic (non-standard) e820 flag and auto-discovered if CONFIG_X86_PMEM_LEGACY=y, or added explicitly through manipulating the memory maps via the "memmap=..." boot option with the new, special '!' modifier character. Limitations: this is a regular block device, and since the pmem areas are not struct page backed, they are invisible to the rest of the system (other than the block IO device), so direct IO to/from pmem areas, direct mmap() or XIP is not possible yet. The page cache will also shadow and double buffer pmem contents, etc. Initial support is for x86" * 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: drivers/block/pmem: Fix 32-bit build warning in pmem_alloc() drivers/block/pmem: Add a driver for persistent memory x86/mm: Add support for the non-standard protected e820 type
2015-04-01x86/mm: Add support for the non-standard protected e820 typeChristoph Hellwig
Various recent BIOSes support NVDIMMs or ADR using a non-standard e820 memory type, and Intel supplied reference Linux code using this type to various vendors. Wire this e820 table type up to export platform devices for the pmem driver so that we can use it in Linux. Based on earlier work from: Dave Jiang <dave.jiang@intel.com> Dan Williams <dan.j.williams@intel.com> Includes fixes for NUMA regions from Boaz Harrosh. Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jens Axboe <axboe@fb.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <keith.busch@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-nvdimm@ml01.01.org Link: http://lkml.kernel.org/r/1427872339-6688-2-git-send-email-hch@lst.de [ Minor cleanups. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-24x86/mm: Use early_memunmap() instead of early_iounmap()Juergen Gross
Memory mapped via early_memremap() should be unmapped with early_memunmap() instead of early_iounmap(). Signed-off-by: Juergen Gross <jgross@suse.com> Cc: matt.fleming@intel.com Link: http://lkml.kernel.org/r/1424769211-11378-2-git-send-email-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-23x86, e820: Clean up sanitize_e820_map() usersWANG Chao
The argument 3 of sanitize_e820_map() will only be updated upon a successful sanitization. Some of the callers have extra conditionals for the same purpose. Clean them up. default_machine_specific_memory_setup() must keep the extra conditional because boot_params.e820_entries is an u8 and not an u32, so the direct update would overwrite other fields in boot_params. [ tglx: Massaged changelog ] Signed-off-by: WANG Chao <chaowang@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Lee Chun-Yi <joeyli.kernel@gmail.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Link: http://lkml.kernel.org/r/1420601859-18439-1-git-send-email-chaowang@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-11x86/mm: Use min() instead of min_t() in the e820 printout codeXishi Qiu
The type of "MAX_DMA_PFN" and "xXx_pfn" are both unsigned long now, so use min() instead of min_t(). Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Cc: Linux MM <linux-mm@kvack.org> Cc: <dave@sr71.net> Cc: Rik van Riel <riel@redhat.com> Link: http://lkml.kernel.org/r/5487AB3F.7050807@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-16x86/mm, hibernate: Do not assume the first e820 area to be RAMLee, Chun-Yi
In arch/x86/kernel/setup.c::trim_bios_range(), the codes introduced by 1b5576e6 (base on d8a9e6a5), it updates the first 4Kb of memory to be E820_RESERVED region. That's because it's a BIOS owned area but generally not listed in the E820 table: e820: BIOS-provided physical RAM map: BIOS-e820: [mem 0x0000000000000000-0x0000000000096fff] usable BIOS-e820: [mem 0x0000000000097000-0x0000000000097fff] reserved ... e820: update [mem 0x00000000-0x00000fff] usable ==> reserved e820: remove [mem 0x000a0000-0x000fffff] usable But the region of first 4Kb didn't register to nosave memory: PM: Registered nosave memory: [mem 0x00097000-0x00097fff] PM: Registered nosave memory: [mem 0x000a0000-0x000fffff] The code in e820_mark_nosave_regions() assumes the first e820 area to be RAM, so it causes the first 4Kb E820_RESERVED region ignored when register to nosave. This patch removed assumption of the first e820 area. Signed-off-by: Lee, Chun-Yi <jlee@suse.com> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <len.brown@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1410491038-17576-1-git-send-email-jlee@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-21x86/mm: memblock: switch to use NUMA_NO_NODEGrygorii Strashko
Update X86 code to use NUMA_NO_NODE instead of MAX_NUMNODES while calling memblock APIs, because memblock API will be changed to use NUMA_NO_NODE and will produce warning during boot otherwise. See: https://lkml.org/lkml/2013/12/9/898 Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-13x86: avoid remapping data in parse_setup_data()Linn Crosetto
Type SETUP_PCI, added by setup_efi_pci(), may advertise a ROM size larger than early_memremap() is able to handle, which is currently limited to 256kB. If this occurs it leads to a NULL dereference in parse_setup_data(). To avoid this, remap the setup_data header and allow parsing functions for individual types to handle their own data remapping. Signed-off-by: Linn Crosetto <linn@hp.com> Link: http://lkml.kernel.org/r/1376430401-67445-1-git-send-email-linn@hp.com Acked-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17x86, mm: Let "memmap=" take more entries one timeYinghai Lu
Current "memmap=" only can take one entry every time. when we have more entries, we have to use memmap= for each of them. For pxe booting, we have command line length limitation, those extra "memmap=" would waste too much space. This patch make memmap= could take several entries one time, and those entries will be split with ',' Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-47-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-10-24x86, mm: Trim memory in memblock to be page alignedYinghai Lu
We will not map partial pages, so need to make sure memblock allocation will not allocate those bytes out. Also we will use for_each_mem_pfn_range() to loop to map memory range to keep them consistent. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.com Acked-by: Jacob Shin <jacob.shin@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org>
2012-07-30firmware_map: make firmware_map_add_early() argument consistent with ↵Yasuaki Ishimatsu
firmware_map_add_hotplug() There are two ways to create /sys/firmware/memmap/X sysfs: - firmware_map_add_early When the system starts, it is calledd from e820_reserve_resources() - firmware_map_add_hotplug When the memory is hot plugged, it is called from add_memory() But these functions are called without unifying value of end argument as below: - end argument of firmware_map_add_early() : start + size - 1 - end argument of firmware_map_add_hogplug() : start + size The patch unifies them to "start + size". Even if applying the patch, /sys/firmware/memmap/X/end file content does not change. [akpm@linux-foundation.org: clarify comments] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29x86: print e820 physical addresses consistently with other parts of kernelBjorn Helgaas
Print physical address info in a style consistent with the %pR style used elsewhere in the kernel. For example: -BIOS-provided physical RAM map: +e820: BIOS-provided physical RAM map: - BIOS-e820: 0000000000000100 - 000000000009e000 (usable) +BIOS-e820: [mem 0x0000000000000100-0x000000000009dfff] usable -Allocating PCI resources starting at 90000000 (gap: 90000000:6ed1c000) +e820: [mem 0x90000000-0xfed1bfff] available for PCI devices -reserve RAM buffer: 000000000009e000 - 000000000009ffff +e820: reserve RAM buffer [mem 0x0009e000-0x0009ffff] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-18Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux This includes initial support for the recently published ACPI 5.0 spec. In particular, support for the "hardware-reduced" bit that eliminates the dependency on legacy hardware. APEI has patches resulting from testing on real hardware. Plus other random fixes. * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (52 commits) acpi/apei/einj: Add extensions to EINJ from rev 5.0 of acpi spec intel_idle: Split up and provide per CPU initialization func ACPI processor: Remove unneeded variable passed by acpi_processor_hotadd_init V2 ACPI processor: Remove unneeded cpuidle_unregister_driver call intel idle: Make idle driver more robust intel_idle: Fix a cast to pointer from integer of different size warning in intel_idle ACPI: kernel-parameters.txt : Add intel_idle.max_cstate intel_idle: remove redundant local_irq_disable() call ACPI processor: Fix error path, also remove sysdev link ACPI: processor: fix acpi_get_cpuid for UP processor intel_idle: fix API misuse ACPI APEI: Convert atomicio routines ACPI: Export interfaces for ioremapping/iounmapping ACPI registers ACPI: Fix possible alignment issues with GAS 'address' references ACPI, ia64: Use SRAT table rev to use 8bit or 16/32bit PXM fields (ia64) ACPI, x86: Use SRAT table rev to use 8bit or 32bit PXM fields (x86/x86-64) ACPI: Store SRAT table revision ACPI, APEI, Resolve false conflict between ACPI NVS and APEI ACPI, Record ACPI NVS regions ACPI, APEI, EINJ, Refine the fix of resource conflict ...
2012-01-17ACPI, Record ACPI NVS regionsHuang Ying
Some firmware will access memory in ACPI NVS region via APEI. That is, instructions in APEI ERST/EINJ table will read/write ACPI NVS region. The original resource conflict checking in APEI code will check memory/ioport accessed by APEI via general resource management mechanism. But ACPI NVS region is marked as busy already, so that the false resource conflict will prevent APEI ERST/EINJ to work. To fix this, this patch record ACPI NVS regions, so that we can avoid request resources for memory region inside it. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2012-01-11Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/numa: Add constraints check for nid parameters mm, x86: Remove debug_pagealloc_enabled x86/mm: Initialize high mem before free_all_bootmem() arch/x86/kernel/e820.c: quiet sparse noise about plain integer as NULL pointer arch/x86/kernel/e820.c: Eliminate bubble sort from sanitize_e820_map() x86: Fix mmap random address range x86, mm: Unify zone_sizes_init() x86, mm: Prepare zone_sizes_init() for unification x86, mm: Use max_low_pfn for ZONE_NORMAL on 64-bit x86, mm: Wrap ZONE_DMA32 with CONFIG_ZONE_DMA32 x86, mm: Use max_pfn instead of highend_pfn x86, mm: Move zone init from paging_init() on 64-bit x86, mm: Use MAX_DMA_PFN for ZONE_DMA on 32-bit
2011-12-12Revert "x86, efi: Calling __pa() with an ioremap()ed address is invalid"Keith Packard
This hangs my MacBook Air at boot time; I get no console messages at all. I reverted this on top of -rc5 and my machine boots again. This reverts commit e8c7106280a305e1ff2a3a8a4dfce141469fb039. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Keith Packard <keithp@keithp.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Huang Ying <huang.ying.caritas@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-09x86, efi: Calling __pa() with an ioremap()ed address is invalidMatt Fleming
If we encounter an efi_memory_desc_t without EFI_MEMORY_WB set in ->attribute we currently call set_memory_uc(), which in turn calls __pa() on a potentially ioremap'd address. On CONFIG_X86_32 this is invalid, resulting in the following oops on some machines: BUG: unable to handle kernel paging request at f7f22280 IP: [<c10257b9>] reserve_ram_pages_type+0x89/0x210 [...] Call Trace: [<c104f8ca>] ? page_is_ram+0x1a/0x40 [<c1025aff>] reserve_memtype+0xdf/0x2f0 [<c1024dc9>] set_memory_uc+0x49/0xa0 [<c19334d0>] efi_enter_virtual_mode+0x1c2/0x3aa [<c19216d4>] start_kernel+0x291/0x2f2 [<c19211c7>] ? loglevel+0x1b/0x1b [<c19210bf>] i386_start_kernel+0xbf/0xc8 A better approach to this problem is to map the memory region with the correct attributes from the start, instead of modifying it after the fact. The uncached case can be handled by ioremap_nocache() and the cached by ioremap_cache(). Despite first impressions, it's not possible to use ioremap_cache() to map all cached memory regions on CONFIG_X86_64 because EFI_RUNTIME_SERVICES_DATA regions really don't like being mapped into the vmalloc space, as detailed in the following bug report, https://bugzilla.redhat.com/show_bug.cgi?id=748516 Therefore, we need to ensure that any EFI_RUNTIME_SERVICES_DATA regions are covered by the direct kernel mapping table on CONFIG_X86_64. To accomplish this we now map E820_RESERVED_EFI regions via the direct kernel mapping with the initial call to init_memory_mapping() in setup_arch(), whereas previously these regions wouldn't be mapped if they were after the last E820_RAM region until efi_ioremap() was called. Doing it this way allows us to delete efi_ioremap() completely. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Huang Ying <huang.ying.caritas@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console-pimps.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-08memblock: s/memblock_analyze()/memblock_allow_resize()/ and update usersTejun Heo
The only function of memblock_analyze() is now allowing resize of memblock region arrays. Rename it to memblock_allow_resize() and update its users. * The following users remain the same other than renaming. arm/mm/init.c::arm_memblock_init() microblaze/kernel/prom.c::early_init_devtree() powerpc/kernel/prom.c::early_init_devtree() openrisc/kernel/prom.c::early_init_devtree() sh/mm/init.c::paging_init() sparc/mm/init_64.c::paging_init() unicore32/mm/init.c::uc32_memblock_init() * In the following users, analyze was used to update total size which is no longer necessary. powerpc/kernel/machine_kexec.c::reserve_crashkernel() powerpc/kernel/prom.c::early_init_devtree() powerpc/mm/init_32.c::MMU_init() powerpc/mm/tlb_nohash.c::__early_init_mmu() powerpc/platforms/ps3/mm.c::ps3_mm_add_memory() powerpc/platforms/embedded6xx/wii.c::wii_memory_fixups() sh/kernel/machine_kexec.c::reserve_crashkernel() * x86/kernel/e820.c::memblock_x86_fill() was directly setting memblock_can_resize before populating memblock and calling analyze afterwards. Call memblock_allow_resize() before start populating. memblock_can_resize is now static inside memblock.c. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-12-05arch/x86/kernel/e820.c: quiet sparse noise about plain integer as NULL pointerH Hartley Sweeten
The last parameter to sort() is a pointer to the function used to swap items. This parameter should be NULL, not 0, when not used. This quiets the following sparse warning: warning: Using plain integer as NULL pointer Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: hartleys@visionengravers.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05arch/x86/kernel/e820.c: Eliminate bubble sort from sanitize_e820_map()Mike Ditto
Replace the bubble sort in sanitize_e820_map() with a call to the generic kernel sort function to avoid pathological performance with large maps. On large (thousands of entries) E820 maps, the previous code took minutes to run; with this change it's now milliseconds. Signed-off-by: Mike Ditto <mditto@google.com> Cc: sassmann@kpanic.de Cc: yuenn@google.com Cc: Stefan Assmann <sassmann@kpanic.de> Cc: Nancy Yuen <yuenn@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-28Merge branch 'master' into x86/memblockTejun Heo
Conflicts & resolutions: * arch/x86/xen/setup.c dc91c728fd "xen: allow extra memory to be in multiple regions" 24aa07882b "memblock, x86: Replace memblock_x86_reserve/free..." conflicted on xen_add_extra_mem() updates. The resolution is trivial as the latter just want to replace memblock_x86_reserve_range() with memblock_reserve(). * drivers/pci/intel-iommu.c 166e9278a3f "x86/ia64: intel-iommu: move to drivers/iommu/" 5dfe8660a3d "bootmem: Replace work_with_active_regions() with..." conflicted as the former moved the file under drivers/iommu/. Resolved by applying the chnages from the latter on the moved file. * mm/Kconfig 6661672053a "memblock: add NO_BOOTMEM config symbol" c378ddd53f9 "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option" conflicted trivially. Both added config options. Just letting both add their own options resolves the conflict. * mm/memblock.c d1f0ece6cdc "mm/memblock.c: small function definition fixes" ed7b56a799c "memblock: Remove memblock_memory_can_coalesce()" confliected. The former updates function removed by the latter. Resolution is trivial. Signed-off-by: Tejun Heo <tj@kernel.org>
2011-10-31x86: Fix files explicitly requiring export.h for EXPORT_SYMBOL/THIS_MODULEPaul Gortmaker
These files were implicitly getting EXPORT_SYMBOL via device.h which was including module.h, but that will be fixed up shortly. By fixing these now, we can avoid seeing things like: arch/x86/kernel/rtc.c:29: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’ arch/x86/kernel/pci-dma.c:20: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’ arch/x86/kernel/e820.c:69: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL_GPL’ [ with input from Randy Dunlap <rdunlap@xenotime.net> and also from Stephen Rothwell <sfr@canb.auug.org.au> ] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-07-14memblock, x86: Reimplement memblock_find_dma_reserve() using iteratorsTejun Heo
memblock_find_dma_reserve() wants to find out how much memory is reserved under MAX_DMA_PFN. memblock_x86_memory_[free_]in_range() are used to find out the amounts of all available and free memory in the area, which are then subtracted to find out the amount of reservation. memblock_x86_memblock_[free_]in_range() are implemented using __memblock_x86_memory_in_range() which builds ranges from memblock and then count them, which is rather unnecessarily complex. This patch open codes the counting logic directly in memblock_find_dma_reserve() using memblock iterators and removes now unused __memblock_x86_memory_in_range() and find_range_array(). Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310462166-31469-11-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14x86: Use __memblock_alloc_base() in early_reserve_e820()Tejun Heo
early_reserve_e820() implements its own ad-hoc early allocator using memblock_x86_find_in_range_size(). Use __memblock_alloc_base() instead and remove the unnecessary @startt parameter (it's top-down allocation anyway). Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310462166-31469-6-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>