summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2008-04-19x86: move dma_sync_single_for_device to common headerGlauber Costa
i386 gets an empty function. Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86: move dma_sync_single_for_cpu to common headerGlauber Costa
i386 gets an empty function. Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86: move dma_unmap_sg to common headerGlauber Costa
i386 gets an empty function. Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86: move dma_map_sg to common headerGlauber Costa
the old i386 implementation is moved to pci-base_32.c Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86: move dma_unmap_single to common headerGlauber Costa
i386 base does not need it, so it gets an empty function. Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86: implement dma_map_single through dma_opsGlauber Costa
That's already the name of the game for x86_64. For i386, we add a pci-base_32.c, that will hold the default operations. The function call itself goes through dma-mapping.h , the common header Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86: move dma_ops struct definition to dma-mapping.hGlauber Costa
take it off the x86_64 specific header Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86: reserve dma32 early for gartYinghai Lu
a system with 256 GB of RAM, when NUMA is disabled crashes the following way: Your BIOS doesn't leave a aperture memory hole Please enable the IOMMU option in the BIOS setup This costs you 64 MB of RAM Cannot allocate aperture memory hole (ffff8101c0000000,65536K) Kernel panic - not syncing: Not enough memory for aperture Pid: 0, comm: swapper Not tainted 2.6.25-rc4-x86-latest.git #33 Call Trace: [<ffffffff84037c62>] panic+0xb2/0x190 [<ffffffff840381fc>] ? release_console_sem+0x7c/0x250 [<ffffffff847b1628>] ? __alloc_bootmem_nopanic+0x48/0x90 [<ffffffff847b0ac9>] ? free_bootmem+0x29/0x50 [<ffffffff847ac1f7>] gart_iommu_hole_init+0x5e7/0x680 [<ffffffff847b255b>] ? alloc_large_system_hash+0x16b/0x310 [<ffffffff84506a2f>] ? _etext+0x0/0x1 [<ffffffff847a2e8c>] pci_iommu_alloc+0x1c/0x40 [<ffffffff847ac795>] mem_init+0x45/0x1a0 [<ffffffff8479ff35>] start_kernel+0x295/0x380 [<ffffffff8479f1c2>] _sinittext+0x1c2/0x230 the root cause is : memmap PMD is too big, [ffffe200e0600000-ffffe200e07fffff] PMD ->ffff81383c000000 on node 0 almost near 4G..., and vmemmap_alloc_block will use up the ram under 4G. solution will be: 1. make memmap allocation get memory above 4G... 2. reserve some dma32 range early before we try to set up memmap for all. and release that before pci_iommu_alloc, so gart or swiotlb could get some range under 4g limit for sure. the patch is using method 2. because method1 may need more code to handle SPARSEMEM and SPASEMEM_VMEMMAP will get Your BIOS doesn't leave a aperture memory hole Please enable the IOMMU option in the BIOS setup This costs you 64 MB of RAM Mapping aperture over 65536 KB of RAM @ 4000000 Memory: 264245736k/268959744k available (8484k kernel code, 4187464k reserved, 4004k data, 724k init) Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19srat, x86: add support for nodes spanning other nodesSuresh Siddha
For example, If the physical address layout on a two node system with 8 GB memory is something like: node 0: 0-2GB, 4-6GB node 1: 2-4GB, 6-8GB Current kernels fail to boot/detect this NUMA topology. ACPI SRAT tables can expose such a topology which needs to be supported. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86: fpu xstate split fixSuresh Siddha
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86: fpu xstate split cleanupSuresh Siddha
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86, fpu: lazy allocation of FPU area - v5Suresh Siddha
Only allocate the FPU area when the application actually uses FPU, i.e., in the first lazy FPU trap. This could save memory for non-fpu using apps. for example: on my system after boot, there are around 300 processes, with only 17 using FPU. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86, fpu: split FPU state from task struct - v5Suresh Siddha
Split the FPU save area from the task struct. This allows easy migration of FPU context, and it's generally cleaner. It also allows the following two optimizations: 1) only allocate when the application actually uses FPU, so in the first lazy FPU trap. This could save memory for non-fpu using apps. Next patch does this lazy allocation. 2) allocate the right size for the actual cpu rather than 512 bytes always. Patches enabling xsave/xrstor support (coming shortly) will take advantage of this. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86: rename find_max_pfn() to propagate_e820_map()Ingo Molnar
this function doesnt just 'find' the max_pfn - it also has other side-effects such as registering sparse memory maps. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86: implement prctl PR_GET_TSC and PR_SET_TSCErik Bosman
This patch implements the PR_GET_TSC and PR_SET_TSC prctl() commands on the x86 platform (both 32 and 64 bit.) These commands control the ability to read the timestamp counter from userspace (the RDTSC instruction.) While the RDTSC instuction is a useful profiling tool, it is also the source of some non-determinism in ring-3. For deterministic replay applications it is useful to be able to trap and emulate (and record the outcome of) this instruction. This patch uses code earlier used to disable the timestamp counter for the SECCOMP framework. A side-effect of this patch is that the SECCOMP environment will now also disable the timestamp counter on x86_64 due to the addition of the TIF_NOTSC define on this platform. The code which enables/disables the RDTSC instruction during context switches is in the __switch_to_xtra function, which already handles other unusual conditions, so normal performance should not have to suffer from this change. Signed-off-by: Erik Bosman <ejbosman@cs.vu.nl> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19generic, x86: add prctl commands PR_GET_TSC and PR_SET_TSCErik Bosman
This patch adds prctl commands that make it possible to deny the execution of timestamp counters in userspace. If this is not implemented on a specific architecture, prctl will return -EINVAL. ned-off-by: Erik Bosman <ejbosman@cs.vu.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86: cleanup boot-heap usageAlexander van Heukelum
The kernel decompressor wrapper uses memory located beyond the end of the image. This might lead to hard to debug problems, but even if it can be proven to be safe, it is at the very least unclean. I don't see any advantages either, unless you count it not being zeroed out as an advantage. This patch moves the boot-heap area to the bss segment. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86: pageattr.c fix shadowed variable warningHarvey Harrison
irqs_disabled() uses flags internally, use _flags to avoid shadowing code calling into this macro. Introduced between 2.6.25-rc3 and -rc4 Fixes the sparse warning: arch/x86/mm/pageattr.c:383:21: warning: symbol 'flags' shadows an earlier one arch/x86/mm/pageattr.c:369:16: originally declared here Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86: EFI_PAGE_SHIFT fixHuang, Ying
Make x86 EFI code works when EFI_PAGE_SHIFT != PAGE_SHIFT. The memrage_efi_to_native() provided in this patch can be used on other EFI platform such as IA64 too. This patch has been tested on Intel x86_64 platform with EFI 64/32 firmware. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19Merge branch 'merge-fixes' into develRussell King
2008-04-19Merge branch 'omap2-upstream' into develRussell King
2008-04-19Merge branches 'arm', 'at91', 'ep93xx', 'iop', 'ks8695', 'misc', 'mxc', ↵Russell King
'ns9x', 'orion', 'pxa', 'sa1100', 's3c' and 'sparsemem' into devel
2008-04-19[ARM] pxa: V4L2 soc_camera driver for PXA270Guennadi Liakhovetski
This patch adds a driver for the Quick Capture Interface on the PXA270. It is based on the original driver from Intel, but has been re-worked multiple times since then, now it also supports the V4L2 API. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 4988/1: Add GPIO lib support to the EP93xxRyan Mallon
Adds support for the generic GPIO lib to the EP93xx family. The gpio handling code has been moved from core.c to a new file called gpio.c. The GPIO based IRQ code has not been changed. Signed-off-by: Ryan Mallon <ryan@bluewatersys.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] Add initial sparsemem supportRussell King
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 5002/1: tosa: add two more ledsDmitry Baryshkov
This adds support for two more leds: the wlan one (found in SL-6000W and SL-6000L) and the blutooth one (found in SL-6000W). Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 5004/1: Tosa: make several unreferenced structures static.Dmitry Baryshkov
Now that scoop gpio's are converted to generic_gpio, tosascoop_device and tosascoop_jc_device don't have to be exported. Also make tosa_gpio_* static Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 4976/1: zylonite: Configure GPIO for WM9713 IRQ lineMark Brown
Set up the IRQ line for the WM9713 device on the Zylonite. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: eric miao <eric.y.miao@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 4973/1: Tosa: use leds-gpio driver.Dmitry Baryshkov
Now as the scoop pins are covered by the generic gpio API, we can use leds-gpio driver instead of special leds-tosa. Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 4972/1: Tosa: convert scoop GPIOs usage to generic gpio codeDmitry Baryshkov
Convert set/reset_scoop_gpio to generic gpio calls. This patch depends on the pxaficp_ir hooks patch. Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 4971/1: pxaficp_ir: provide startup and shutdown hooksDmitry Baryshkov
Let platform do some specific initialisation and cleanup things during pxaficp_ir probing and removing. E.g. this can be usefull to request/free gpios used by the platform to control the transceiver. Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 4964/1: htc-pasic3: MFD driver for PASIC3 LED control + DS1WM chipPhilipp Zabel
This driver will provide registers, clocks and GPIOs of the HTC PASIC3 (AIC3) and PASIC2 (AIC2) chips to the ds1wm and leds-pasic3 drivers. Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 4959/1: PXA: Fix misprint in CICR1_RGBT_CONVMike Rapoport
This patch fixes misprint in definition of CICR1_RGBT_CONV in include/asm-arm/arch-pxa/pxa-regs.h Signed-off-by: Mike Rapoport <mike@compulab.co.il> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 4952/1: magician: add LCD detection, LCD power switching, update pxafb ↵Philipp Zabel
settings All magician devices I've encountered so far have featured the Toppoly TD028STEB1 display, so the Samsung LTP280QV support is untested. The power-on sequence is not correct because pxafb doesn't yet support enabling the LCD controller in the middle of the it. Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 4948/1: magician: use htc-egpio to drive the GPIO/IRQ expander CPLDPhilipp Zabel
needed for power management (audio, BT, charging, GSM, LCD, SD), GSM, flash and SD operation and audio routing. Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 4947/1: htc-egpio, a driver for GPIO/IRQ expanders with fixed ↵Philipp Zabel
input/output pins implemented in CPLD chips on several HTC devices. The original driver was written by Kevin O'Connor, I have adapted it to use gpiolib and made the bus/register widths configurable. Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 4943/2: magician: fix magician.h GPIO header includesPhilipp Zabel
PXA GPIO definitions were split from pxa-regs.h into pxa2xx-gpio.h. Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 4867/1: Adds flash, udc, mci support for gumstix F boardsJaya Kumar
This patch implements support for Gumstix-F flash, udc and mci. Fixes since the last time are: - Steve Sakoman as maintainer - cleanup for udc and mci setup Signed-off-by: Jaya Kumar <jayakumar.lkml@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] pxa: remove keypad register definitions from pxa-regs.heric miao
Keypad registers are now fully defined within pxa27x-keypad.c, no need to keep those definitions in pxa-regs.h Signed-off-by: eric miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] pxa: add pxa27x_keypad device and pxa_set_keypad_info()eric miao
also update the clk definitions in pxa27x and pxa3xx. Signed-off-by: eric miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] pxa: allow dynamic enable/disable of GPIO wakeup for pxa{25x,27x}eric miao
Changes include: 1. rename MFP_LPM_WAKEUP_ENABLE into MFP_LPM_CAN_WAKEUP to indicate the board capability of this pin to wakeup the system 2. add gpio_set_wake() and keypad_set_wake() to allow dynamically enable/disable wakeup from GPIOs and keypad GPIO * these functions are currently kept in mfp-pxa2xx.c due to their dependency to the MFP configuration 3. pxa2xx_mfp_config() only gives early warning if MFP_LPM_CAN_WAKEUP is set on incorrect pins So that the GPIO's wakeup capability is now decided by the following: a) processor's capability: (only those GPIOs which have dedicated bits within PWER/PRER/PFER can wakeup the system), this is initialized by pxa{25x,27x}_init_mfp() b) board design decides: - whether the pin is designed to wakeup the system (some of the GPIOs are configured as other functions, which is not intended to be a wakeup source), by OR'ing the pin config with MFP_LPM_CAN_WAKEUP - which edge the pin is designed to wakeup the system, this may depends on external peripherals/connections, which is totally board specific; this is indicated by MFP_LPM_EDGE_* c) the corresponding device's (most likely the gpio_keys.c) wakeup attribute: Signed-off-by: eric miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] pxa: add MFP-alike pin configuration support for pxa{25x, 27x}eric miao
Pin configuration on pxa{25x,27x} has now separated from generic GPIO into dedicated mfp-pxa2xx.c by this patch. The name "mfp" is borrowed from pxa3xx and is used here to alert the difference between the two concepts: pin configuration and generic GPIOs. A GPIO can be called a "GPIO" _only_ when the corresponding pin is configured so. A pin configuration on pxa{25x,27x} is composed of: - alternate function selection (or pin mux as commonly called) - low power state or sleep state - wakeup enabling from low power mode The following MFP_xxx bit definitions in mfp.h are re-used: - MFP_PIN(x) - MFP_AFx - MFP_LPM_DRIVE_{LOW, HIGH} - MFP_LPM_EDGE_* Selecting alternate function on pxa{25x, 27x} involves configuration of GPIO direction register GPDRx, so a new bit and MFP_DIR_{IN, OUT} are introduced. And pin configurations are defined by the following two macros: - MFP_CFG_IN : for input alternate functions - MFP_CFG_OUT : for output alternate functions Every configuration should provide a low power state if it configured as output using MFP_CFG_OUT(). As a general guideline, the low power state should be decided to minimize the overall power dissipation. As an example, it is better to drive the pin as high level in low power mode if the GPIO is configured as an active low chip select. Pins configured as GPIO are defined by MFP_CFG_IN(). This is to avoid side effects when it is firstly configured as output. The actual direction of the GPIO is configured by gpio_direction_{input, output} Wakeup enabling on pxa{25x, 27x} is actually GPIO based wakeup, thus the device based enable_irq_wake() mechanism is not applicable here. E.g. invoking enable_irq_wake() with a GPIO IRQ as in the following code to enable OTG wakeup is by no means portable and intuitive, and it is valid _only_ when GPIO35 is configured as USB_P2_1: enable_irq_wake( gpio_to_irq(35) ); To make things worse, not every GPIO is able to wakeup the system. Only a small number of them can, on either rising or falling edge, or when level is high (for keypad GPIOs). Thus, another new bit is introduced to indicate that the GPIO will wakeup the system: - MFP_LPM_WAKEUP_ENABLE The following macros can be used in platform code, and be OR'ed to the GPIO configuration to enable its wakeup: - WAKEUP_ON_EDGE_{RISE, FALL, BOTH} - WAKEUP_ON_LEVEL_HIGH The WAKEUP_ON_LEVEL_HIGH is used for keypad GPIOs _only_, there is no edge settings for those GPIOs. These WAKEUP_ON_* flags OR'ed on wrong GPIOs will be ignored in case that platform code author is careless enough. The tradeoff here is that the wakeup source is fully determined by the platform configuration, instead of enable_irq_wake(). Signed-off-by: eric miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] pxa: separate GPIOs and their mode definitions to pxa2xx-gpio.heric miao
two reasons: 1. GPIO namings and their mode definitions are conceptually not part of the PXA register definitions 2. this is actually a temporary move in the transition of PXA2xx to use MFP-alike APIs (as what PXA3xx is now doing), so that legacy code will still work and new code can be added in step by step Signed-off-by: eric miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] pxa: integrate low IRQ chip (ICIP) and high IRQ chip (ICIP2) into oneeric miao
This makes the code better organized and simplified a bit. The change will lose a bit of performance when performing IRQ ack/mask/unmask,but that's not too much after checking the result binary. This patch also removes the ugly #ifdef CONFIG_PXA27x .. #endif by carefully not to access those pxa{27x,3xx} specific registers, this is done by keeping an internal IRQ number variable. The pxa-regs.h is also modified so registers for IRQ > PXA_IRQ(31) are made public even if CONFIG_PXA{27x,3xx} isn't defined (for pxa25x's sake) The incorrect assumption in the original code that internal irq starts from 0 is also corrected by comparing with PXA_IRQ(0). "struct sys_device" for the IRQ are reduced into one single device on pxa{27x,3xx}. Signed-off-by: eric miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 4868/1: Enhance pxa270 GPIO definitionsRobert Jarzmik
Enhanced GPIO alternate functions descriptions, taken from Intel PXA270 Developers Manual. Signed-off-by: Robert Jarzmik <rjarzmik@free.fr> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 4830/1: Add support for the CLK_POUT pin on PXA3xx CPUsMark Brown
Expose control of the PXA3xx 13MHz CLK_POUT pin via the clock API Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[ARM] 4852/1: Add timerfd_create, timerfd_settime and timerfd_gettime ↵Uwe Kleine-König
syscall entries Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-04-19[PATCH] r/o bind mounts: debugging for missed callsDave Hansen
There have been a few oopses caused by 'struct file's with NULL f_vfsmnts. There was also a set of potentially missed mnt_want_write()s from dentry_open() calls. This patch provides a very simple debugging framework to catch these kinds of bugs. It will WARN_ON() them, but should stop us from having any oopses or mnt_writer count imbalances. I'm quite convinced that this is a good thing because it found bugs in the stuff I was working on as soon as I wrote it. [hch: made it conditional on a debug option. But it's still a little bit too ugly] [hch: merged forced remount r/o fix from Dave and akpm's fix for the fix] Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19[PATCH] r/o bind mounts: honor mount writer counts at remountDave Hansen
Originally from: Herbert Poetzl <herbert@13thfloor.at> This is the core of the read-only bind mount patch set. Note that this does _not_ add a "ro" option directly to the bind mount operation. If you require such a mount, you must first do the bind, then follow it up with a 'mount -o remount,ro' operation: If you wish to have a r/o bind mount of /foo on bar: mount --bind /foo /bar mount -o remount,ro /bar Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19[PATCH] r/o bind mounts: track numbers of writers to mountsDave Hansen
This is the real meat of the entire series. It actually implements the tracking of the number of writers to a mount. However, it causes scalability problems because there can be hundreds of cpus doing open()/close() on files on the same mnt at the same time. Even an atomic_t in the mnt has massive scalaing problems because the cacheline gets so terribly contended. This uses a statically-allocated percpu variable. All want/drop operations are local to a cpu as long that cpu operates on the same mount, and there are no writer count imbalances. Writer count imbalances happen when a write is taken on one cpu, and released on another, like when an open/close pair is performed on two Upon a remount,ro request, all of the data from the percpu variables is collected (expensive, but very rare) and we determine if there are any outstanding writers to the mount. I've written a little benchmark to sit in a loop for a couple of seconds in several cpus in parallel doing open/write/close loops. http://sr71.net/~dave/linux/openbench.c The code in here is a a worst-possible case for this patch. It does opens on a _pair_ of files in two different mounts in parallel. This should cause my code to lose its "operate on the same mount" optimization completely. This worst-case scenario causes a 3% degredation in the benchmark. I could probably get rid of even this 3%, but it would be more complex than what I have here, and I think this is getting into acceptable territory. In practice, I expect writing more than 3 bytes to a file, as well as disk I/O to mask any effects that this has. (To get rid of that 3%, we could have an #defined number of mounts in the percpu variable. So, instead of a CPU getting operate only on percpu data when it accesses only one mount, it could stay on percpu data when it only accesses N or fewer mounts.) [AV] merged fix for __clear_mnt_mount() stepping on freed vfsmount Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>