linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2014-07-17	usb: musb: Fix panic upon musb_am335x module removal	Ezequiel Garcia
	commit 7adb5c876e9c0677078a1e1094c6eafd29c30b74 upstream. At probe time, the musb_am335x driver register its childs by calling of_platform_populate(), which registers all childs in the devicetree hierarchy recursively. On the other side, the driver's remove() function uses of_device_unregister() to remove each child of musb_am335x's. However, when musb_dsps is loaded, its devices are attached to the musb_am335x device as musb_am335x childs. Hence, musb_am335x remove() will attempt to unregister the devices registered by musb_dsps, which produces a kernel panic. In other words, the childs in the "struct device" hierarchy are not the same as the childs in the "devicetree" hierarchy. Ideally, we should enforce the removal of the devices registered by musb_am335x only, instead of all its child devices. However, because of the recursive nature of of_platform_populate, this doesn't seem possible. Therefore, as the only solution at hand, this commit disables musb_am335x driver removal capability, preventing it from being ever removed. This was originally suggested by Sebastian Siewior: https://www.mail-archive.com/linux-omap@vger.kernel.org/msg104946.html And for reference, here's the panic upon module removal: musb-hdrc musb-hdrc.0.auto: remove, state 4 usb usb1: USB disconnect, device number 1 musb-hdrc musb-hdrc.0.auto: USB bus 1 deregistered Unable to handle kernel NULL pointer dereference at virtual address 0000008c pgd = de11c000 [0000008c] pgd=9e174831, pte=00000000, *ppte=00000000 Internal error: Oops: 17 [#1] ARM Modules linked in: musb_am335x(-) musb_dsps musb_hdrc usbcore usb_common CPU: 0 PID: 623 Comm: modprobe Not tainted 3.15.0-rc4-00001-g24efd13 #69 task: de1b7500 ti: de122000 task.ti: de122000 PC is at am335x_shutdown+0x10/0x28 LR is at am335x_shutdown+0xc/0x28 pc : [<c0327798>] lr : [<c0327794>] psr: a0000013 sp : de123df8 ip : 00000004 fp : 00028f00 r10: 00000000 r9 : de122000 r8 : c000e6c4 r7 : de0e3c10 r6 : de0e3800 r5 : de624010 r4 : de1ec750 r3 : de0e3810 r2 : 00000000 r1 : 00000001 r0 : 00000000 Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 10c5387d Table: 9e11c019 DAC: 00000015 Process modprobe (pid: 623, stack limit = 0xde122240) Stack: (0xde123df8 to 0xde124000) 3de0: de0e3810 bf054488 3e00: bf05444c de624010 60000013 bf043650 000012fc de624010 de0e3810 bf043a20 3e20: de0e3810 bf04b240 c0635b88 c02ca37c c02ca364 c02c8db0 de1b7500 de0e3844 3e40: de0e3810 c02c8e28 c0635b88 de02824c de0e3810 c02c884c de0e3800 de0e3810 3e60: de0e3818 c02c5b20 bf05417c de0e3800 de0e3800 c0635b88 de0f2410 c02ca838 3e80: bf05417c de0e3800 bf055438 c02ca8cc de0e3c10 bf054194 de0e3c10 c02ca37c 3ea0: c02ca364 c02c8db0 de1b7500 de0e3c44 de0e3c10 c02c8e28 c0635b88 de02824c 3ec0: de0e3c10 c02c884c de0e3c10 de0e3c10 de0e3c18 c02c5b20 de0e3c10 de0e3c10 3ee0: 00000000 bf059000 a0000013 c02c5bc0 00000000 bf05900c de0e3c10 c02c5c48 3f00: de0dd0c0 de1ec970 de0f2410 bf05929c de0f2444 bf05902c de0f2410 c02ca37c 3f20: c02ca364 c02c8db0 bf05929c de0f2410 bf05929c c02c94c8 bf05929c 00000000 3f40: 00000800 c02c8ab4 bf0592e0 c007fc40 c00dd820 6273756d 336d615f 00783533 3f60: c064a0ac de1b7500 de122000 de1b7500 c000e590 00000001 c000e6c4 c0060160 3f80: 00028e70 00028e70 00028ea4 00000081 60000010 00028e70 00028e70 00028ea4 3fa0: 00000081 c000e500 00028e70 00028e70 00028ea4 00000800 becb59f8 00027608 3fc0: 00028e70 00028e70 00028ea4 00000081 00000001 00000001 00000000 00028f00 3fe0: b6e6b6f0 becb59d4 000160e8 b6e6b6fc 60000010 00028ea4 00000000 00000000 [<c0327798>] (am335x_shutdown) from [<bf054488>] (dsps_musb_exit+0x3c/0x4c [musb_dsps]) [<bf054488>] (dsps_musb_exit [musb_dsps]) from [<bf043650>] (musb_shutdown+0x80/0x90 [musb_hdrc]) [<bf043650>] (musb_shutdown [musb_hdrc]) from [<bf043a20>] (musb_remove+0x24/0x68 [musb_hdrc]) [<bf043a20>] (musb_remove [musb_hdrc]) from [<c02ca37c>] (platform_drv_remove+0x18/0x1c) [<c02ca37c>] (platform_drv_remove) from [<c02c8db0>] (__device_release_driver+0x70/0xc8) [<c02c8db0>] (__device_release_driver) from [<c02c8e28>] (device_release_driver+0x20/0x2c) [<c02c8e28>] (device_release_driver) from [<c02c884c>] (bus_remove_device+0xdc/0x10c) [<c02c884c>] (bus_remove_device) from [<c02c5b20>] (device_del+0x104/0x198) [<c02c5b20>] (device_del) from [<c02ca838>] (platform_device_del+0x14/0x9c) [<c02ca838>] (platform_device_del) from [<c02ca8cc>] (platform_device_unregister+0xc/0x20) [<c02ca8cc>] (platform_device_unregister) from [<bf054194>] (dsps_remove+0x18/0x38 [musb_dsps]) [<bf054194>] (dsps_remove [musb_dsps]) from [<c02ca37c>] (platform_drv_remove+0x18/0x1c) [<c02ca37c>] (platform_drv_remove) from [<c02c8db0>] (__device_release_driver+0x70/0xc8) [<c02c8db0>] (__device_release_driver) from [<c02c8e28>] (device_release_driver+0x20/0x2c) [<c02c8e28>] (device_release_driver) from [<c02c884c>] (bus_remove_device+0xdc/0x10c) [<c02c884c>] (bus_remove_device) from [<c02c5b20>] (device_del+0x104/0x198) [<c02c5b20>] (device_del) from [<c02c5bc0>] (device_unregister+0xc/0x20) [<c02c5bc0>] (device_unregister) from [<bf05900c>] (of_remove_populated_child+0xc/0x14 [musb_am335x]) [<bf05900c>] (of_remove_populated_child [musb_am335x]) from [<c02c5c48>] (device_for_each_child+0x44/0x70) [<c02c5c48>] (device_for_each_child) from [<bf05902c>] (am335x_child_remove+0x18/0x30 [musb_am335x]) [<bf05902c>] (am335x_child_remove [musb_am335x]) from [<c02ca37c>] (platform_drv_remove+0x18/0x1c) [<c02ca37c>] (platform_drv_remove) from [<c02c8db0>] (__device_release_driver+0x70/0xc8) [<c02c8db0>] (__device_release_driver) from [<c02c94c8>] (driver_detach+0xb4/0xb8) [<c02c94c8>] (driver_detach) from [<c02c8ab4>] (bus_remove_driver+0x4c/0xa0) [<c02c8ab4>] (bus_remove_driver) from [<c007fc40>] (SyS_delete_module+0x128/0x1cc) [<c007fc40>] (SyS_delete_module) from [<c000e500>] (ret_fast_syscall+0x0/0x48) Fixes: 97238b35d5bb ("usb: musb: dsps: use proper child nodes") Acked-by: George Cherian <george.cherian@ti.com> Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	usb: musb: Ensure that cppi41 timer gets armed on premature DMA TX irq	Thomas Gleixner
	commit c58d80f523ffc15ef4d062fc7aeb03793fe39701 upstream. Some TI chips raise the DMA complete interrupt before the actual transfer has been completed. The code tries to busy wait for a few microseconds and if that fails it arms an hrtimer to recheck. So far so good, but that has the following issue: CPU 0 CPU1 start_next_transfer(RQ1); DMA interrupt if (premature_irq(RQ1)) if (!hrtimer_active(timer)) hrtimer_start(timer); hrtimer expires timer->state = CALLBACK_RUNNING; timer->fn() cppi41_recheck_tx_req() complete_request(RQ1); if (requests_pending()) start_next_transfer(RQ2); DMA interrupt if (premature_irq(RQ2)) if (!hrtimer_active(timer)) hrtimer_start(timer); timer->state = INACTIVE; The premature interrupt of request2 on CPU1 does not arm the timer and therefor the request completion never happens because it checks for !hrtimer_active(). hrtimer_active() evaluates: timer->state != HRTIMER_STATE_INACTIVE which of course evaluates to true in the above case as timer->state is CALLBACK_RUNNING. That's clearly documented: * A timer is active, when it is enqueued into the rbtree or the * callback function is running or it's in the state of being migrated * to another cpu. But that's not what the code wants to check. The code wants to check whether the timer is queued, i.e. whether its armed and waiting for expiry. We have a helper function for this: hrtimer_is_queued(). This evaluates: timer->state & HRTIMER_STATE_QUEUED So in the above case this evaluates to false and therefor forces the DMA interrupt on CPU1 to call hrtimer_start(). Use hrtimer_is_queued() instead of hrtimer_active() and evrything is good. Reported-by: Torben Hohn <torbenh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	usb: musb: ux500: don't propagate the OF node	Linus Walleij
	commit 82363cf2eeafeea6ba88849f5e2febdc8a05943f upstream. There is a regression in the upcoming v3.16-rc1, that is caused by a problem that has been around for a while but now finally hangs the system. The bootcrawl looks like this: pinctrl-nomadik soc:pinctrl: pin GPIO256_AF28 already requested by a03e0000.usb_per5; cannot claim for musb-hdrc.0.auto pinctrl-nomadik soc:pinctrl: pin-256 (musb-hdrc.0.auto) status -22 pinctrl-nomadik soc:pinctrl: could not request pin 256 (GPIO256_AF28) from group usb_a_1 on device pinctrl-nomadik musb-hdrc musb-hdrc.0.auto: Error applying setting, reverse things back HS USB OTG: no transceiver configured musb-hdrc musb-hdrc.0.auto: musb_init_controller failed with status -517 platform musb-hdrc.0.auto: Driver musb-hdrc requests probe deferral (...) The ux500 MUSB driver propagates the OF node to the dynamically created musb-hdrc device, which is incorrect as it makes the OF core believe there are two devices spun from the very same DT node, which confuses other parts of the device core, notably the pin control subsystem, which will try to apply all the pin control settings also to the HDRC device as it gets instantiated. (The OMAP2430 for example, does not set the of_node member.) Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	usb: option: add/modify Olivetti Olicard modems	Bjørn Mork
	commit b0ebef36e93703e59003ad6a1a20227e47714417 upstream. Adding a couple of Olivetti modems and blacklisting the net function on a couple which are already supported. Reported-by: Lars Melin <larsm17@gmail.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	USB: option: add device ID for SpeedUp SU9800 usb 3g modem	Oliver Neukum
	commit 1cab4c68e339086cdaff7535848e878e8f261fca upstream. Reported by Alif Mubarak Ahmad: This device vendor and product id is 1c9e:9800 It is working as serial interface with generic usbserial driver. I thought it is more suitable to use usbserial option driver, which has better capability distinguishing between modem serial interface and micro sd storage interface. [ johan: style changes ] Signed-off-by: Oliver Neukum <oneukum@suse.de> Tested-by: Alif Mubarak Ahmad <alive4ever@live.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	xhci: Fix runtime suspended xhci from blocking system suspend.	Wang, Yu
	commit d6236f6d1d885aa19d1cd7317346fe795227a3cc upstream. The system suspend flow as following: 1, Freeze all user processes and kenrel threads. 2, Try to suspend all devices. 2.1, If pci device is in RPM suspended state, then pci driver will try to resume it to RPM active state in the prepare stage. 2.2, xhci_resume function calls usb_hcd_resume_root_hub to queue two workqueue items to resume usb2&usb3 roothub devices. 2.3, Call suspend callbacks of devices. 2.3.1, All suspend callbacks of all hcd's children, including roothub devices are called. 2.3.2, Finally, hcd_pci_suspend callback is called. Due to workqueue threads were already frozen in step 1, the workqueue items can't be scheduled, and the roothub devices can't be resumed in this flow. The HCD_FLAG_WAKEUP_PENDING flag which is set in usb_hcd_resume_root_hub won't be cleared. Finally, hcd_pci_suspend will return -EBUSY, and system suspend fails. The reason why this issue doesn't show up very often is due to that choose_wakeup will be called in step 2.3.1. In step 2.3.1, if udev->do_remote_wakeup is not equal to device_may_wakeup(&udev->dev), then udev will resume to RPM active for changing the wakeup settings. This has been a lucky hit which hides this issue. For some special xHCI controllers which have no USB2 port, then roothub will not match hub driver due to probe failed. Then its do_remote_wakeup will be set to zero, and we won't be as lucky. xhci driver doesn't need to resume roothub devices everytime like in the above case. It's only needed when there are pending event TRBs. This patch should be back-ported to kernels as old as 3.2, that contains the commit f69e3120df82391a0ee8118e0a156239a06b2afb "USB: XHCI: resume root hubs when the controller resumes" Signed-off-by: Wang, Yu <yu.y.wang@intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> [use readl() instead of removed xhci_readl(), reword commit message -Mathias] Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	xhci: correct burst count field for isoc transfers on 1.0 xhci hosts	Mathias Nyman
	commit 3213b151387df0b95f4eada104f68eb1c1409cb3 upstream. The transfer burst count (TBC) field in xhci 1.0 hosts should be set to the number of bursts needed to transfer all packets in a isoc TD. Supported values are 0-2 (1 to 3 bursts per service interval). Formula for TBC calculation is given in xhci spec section 4.11.2.3: TBC = roundup( Transfer Descriptor Packet Count / Max Burst Size +1 ) - 1 This patch should be applied to stable kernels since 3.0 that contain the commit 5cd43e33b9519143f06f507dd7cbee6b7a621885 "xhci 1.0: Set transfer burst count field." Suggested-by: ShiChun Ma <masc2008@qq.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	virtio-scsi: fix various bad behavior on aborted requests	Paolo Bonzini
	commit 8faeb529b2dabb9df691d614dda18910a43d05c9 upstream. Even though the virtio-scsi spec guarantees that all requests related to the TMF will have been completed by the time the TMF itself completes, the request queue's callback might not have run yet. This causes requests to be completed more than once, and as a result triggers a variety of BUGs or oopses. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Venkatesh Srinivas <venkateshs@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	scsi_error: fix invalid setting of host byte	Ulrich Obergfell
	commit 8922a908908ff947f1f211e07e2e97f1169ad3cb upstream. After scsi_try_to_abort_cmd returns, the eh_abort_handler may have already found that the command has completed in the device, causing the host_byte to be nonzero (e.g. it could be DID_ABORT). When this happens, ORing DID_TIME_OUT into the host byte will corrupt the result field and initiate an unwanted command retry. Fix this by using set_host_byte instead, following the model of commit 2082ebc45af9c9c648383b8cde0dc1948eadbf31. Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> [Fix all instances according to review comments. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	virtio-scsi: avoid cancelling uninitialized work items	Paolo Bonzini
	commit cdda0e5acbb78f7b777049f8c27899e5c5bb368f upstream. Calling the workqueue interface on uninitialized work items isn't a good idea even if they're zeroed. It's not failing catastrophically only through happy accidents. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	ibmvscsi: Add memory barriers for send / receive	Brian King
	commit 7114aae02742d6b5c5a0d39a41deb61d415d3717 upstream. Add a memory barrier prior to sending a new command to the VIOS to ensure the VIOS does not receive stale data in the command buffer. Also add a memory barrier when processing the CRQ for completed commands. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	ibmvscsi: Abort init sequence during error recovery	Brian King
	commit 9ee755974bea2f9880e517ec985dc9dede1b3a36 upstream. If a CRQ reset is triggered for some reason while in the middle of performing VSCSI adapter initialization, we don't want to call the done function for the initialization MAD commands as this will only result in two threads attempting initialization at the same time, resulting in failures. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	ALSA: usb-audio: Fix races at disconnection and PCM closing	Takashi Iwai
	commit 92a586bdc06de6629dae1b357dac221253f55ff8 upstream. When a USB-audio device is disconnected while PCM is still running, we still see some race: the disconnect callback calls snd_usb_endpoint_free() that calls release_urbs() and then kfree() while a PCM stream would be closed at the same time and calls stop_endpoints() that leads to wait_clear_urbs(). That is, the EP object might be deallocated while a PCM stream is syncing with wait_clear_urbs() with the same EP. Basically calling multiple wait_clear_urbs() would work fine, also calling wait_clear_urbs() and release_urbs() would work, too, as wait_clear_urbs() just reads some fields in ep. The problem is the succeeding kfree() in snd_pcm_endpoint_free(). This patch moves out the EP deallocation into the later point, the destructor callback. At this stage, all PCMs must have been already closed, so it's safe to free the objects. Reported-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	tracing: Fix syscall_*regfunc() vs copy_process() race	Oleg Nesterov
	commit 4af4206be2bd1933cae20c2b6fb2058dbc887f7c upstream. syscall_regfunc() and syscall_unregfunc() should set/clear TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race with copy_process() and miss the new child which was not added to the process/thread lists yet. Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT under tasklist. Link: http://lkml.kernel.org/p/20140413185854.GB20668@redhat.com Fixes: a871bd33a6c0 "tracing: Add syscall tracepoints" Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	tracing: Try again for saved cmdline if failed due to locking	Steven Rostedt (Red Hat)
	commit 379cfdac37923653c9d4242d10052378b7563005 upstream. In order to prevent the saved cmdline cache from being filled when tracing is not active, the comms are only recorded after a trace event is recorded. The problem is, a comm can fail to be recorded if the trace_cmdline_lock is held. That lock is taken via a trylock to allow it to happen from any context (including NMI). If the lock fails to be taken, the comm is skipped. No big deal, as we will try again later. But! Because of the code that was added to only record after an event, we may not try again later as the recording is made as a oneshot per event per CPU. Only disable the recording of the comm if the comm is actually recorded. Fixes: 7ffbd48d5cab "tracing: Cache comms only after an event occurred" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	Documentation/SubmittingPatches: describe the Fixes: tag	Jacob Keller
	commit 8401aa1f59975c03eeebd3ac6d264cbdfe9af5de upstream. Update the SubmittingPatches process to include howto about the new 'Fixes:' tag to be used when a patch fixes an issue in a previous commit (found by git-bisect for example). Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	lz4: add overrun checks to lz4_uncompress_unknownoutputsize()	Greg Kroah-Hartman
	commit 4a3a99045177369700c60d074c0e525e8093b0fc upstream. Jan points out that I forgot to make the needed fixes to the lz4_uncompress_unknownoutputsize() function to mirror the changes done in lz4_decompress() with regards to potential pointer overflows. The only in-kernel user of this function is the zram code, which only takes data from a valid compressed buffer that it made itself, so it's not a big issue. But due to external kernel modules using this function, it's better to be safe here. Reported-by: Jan Beulich <JBeulich@suse.com> Cc: "Don A. Bailey" <donb@securitymouse.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	ptrace,x86: force IRET path after a ptrace_stop()	Tejun Heo
	commit b9cd18de4db3c9ffa7e17b0dc0ca99ed5aa4d43a upstream. The 'sysret' fastpath does not correctly restore even all regular registers, much less any segment registers or reflags values. That is very much part of why it's faster than 'iret'. Normally that isn't a problem, because the normal ptrace() interface catches the process using the signal handler infrastructure, which always returns with an iret. However, some paths can get caught using ptrace_event() instead of the signal path, and for those we need to make sure that we aren't going to return to user space using 'sysret'. Otherwise the modifications that may have been done to the register set by the tracer wouldn't necessarily take effect. Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from arch_ptrace_stop_needed() which is invoked from ptrace_stop(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	ipvs: Fix panic due to non-linear skb	Peter Christensen
	commit f44a5f45f544561302e855e7bd104e5f506ec01b upstream. Receiving a ICMP response to an IPIP packet in a non-linear skb could cause a kernel panic in __skb_pull. The problem was introduced in commit f2edb9f7706dcb2c0d9a362b2ba849efe3a97f5e ("ipvs: implement passive PMTUD for IPIP packets"). Signed-off-by: Peter Christensen <pch@ordbogen.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	MIPS: KVM: Fix memory leak on VCPU	Deng-Cheng Zhu
	commit 8c9eb041cf76038eb3b62ee259607eec9b89f48d upstream. kvm_arch_vcpu_free() is called in 2 code paths: 1) kvm_vm_ioctl() kvm_vm_ioctl_create_vcpu() kvm_arch_vcpu_destroy() kvm_arch_vcpu_free() 2) kvm_put_kvm() kvm_destroy_vm() kvm_arch_destroy_vm() kvm_mips_free_vcpus() kvm_arch_vcpu_free() Neither of the paths handles VCPU free. We need to do it in kvm_arch_vcpu_free() corresponding to the memory allocation in kvm_arch_vcpu_create(). Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	MIPS: KVM: Remove redundant NULL checks before kfree()	James Hogan
	commit c6c0a6637f9da54f9472144d44f71cf847f92e20 upstream. The kfree() function already NULL checks the parameter so remove the redundant NULL checks before kfree() calls in arch/mips/kvm/. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	reiserfs: call truncate_setsize under tailpack mutex	Jeff Mahoney
	commit 22e7478ddbcb670e33fab72d0bbe7c394c3a2c84 upstream. Prior to commit 0e4f6a791b1e (Fix reiserfs_file_release()), reiserfs truncates serialized on i_mutex. They mostly still do, with the exception of reiserfs_file_release. That blocks out other writers via the tailpack mutex and the inode openers counter adjusted in reiserfs_file_open. However, NFS will call reiserfs_setattr without having called ->open, so we end up with a race when nfs is calling ->setattr while another process is releasing the file. Ultimately, it triggers the BUG_ON(inode->i_size != new_file_size) check in maybe_indirect_to_direct. The solution is to pull the lock into reiserfs_setattr to encompass the truncate_setsize call as well. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	powerpc: Don't skip ePAPR spin-table CPUs	Scott Wood
	commit 6663a4fa6711050036562ddfd2086edf735fae21 upstream. Commit 59a53afe70fd530040bdc69581f03d880157f15a "powerpc: Don't setup CPUs with bad status" broke ePAPR SMP booting. ePAPR says that CPUs that aren't presently running shall have status of disabled, with enable-method being used to determine whether the CPU can be enabled. Fix by checking for spin-table, which is currently the only supported enable-method. Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Emil Medve <Emilian.Medve@Freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	powerpc: Add AT_HWCAP2 to indicate V.CRYPTO category support	Benjamin Herrenschmidt
	commit dd58a092c4202f2bd490adab7285b3ff77f8e467 upstream. The Vector Crypto category instructions are supported by current POWER8 chips, advertise them to userspace using a specific bit to properly differentiate with chips of the same architecture level that might not have them. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	powerpc: Don't setup CPUs with bad status	Michael Neuling
	commit 59a53afe70fd530040bdc69581f03d880157f15a upstream. OPAL will mark a CPU that is guarded as "bad" in the status property of the CPU node. Unfortunatley Linux doesn't check this property and will put the bad CPU in the present map. This has caused hangs on booting when we try to unsplit the core. This patch checks the CPU is avaliable via this status property before putting it in the present map. Signed-off-by: Michael Neuling <mikey@neuling.org> Tested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	powerpc: fix typo 'CONFIG_PPC_CPU'	Paul Bolle
	commit b69a1da94f3d1589d1942b5d1b384d8cfaac4500 upstream. Commit cd64d1697cf0 ("powerpc: mtmsrd not defined") added a check for CONFIG_PPC_CPU were a check for CONFIG_PPC_FPU was clearly intended. Fixes: cd64d1697cf0 ("powerpc: mtmsrd not defined") Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	powerpc/perf: Ensure all EBB register state is cleared on fork()	Michael Ellerman
	commit 3df48c981d5a9610e02e9270b1bc4274fb536710 upstream. In commit 330a1eb "Core EBB support for 64-bit book3s" I messed up clear_task_ebb(). It clears some but not all of the task's Event Based Branch (EBB) registers when we duplicate a task struct. That allows a child task to observe the EBBHR & EBBRR of its parent, which it should not be able to do. Fix it by clearing EBBHR & EBBRR. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	powerpc: fix typo 'CONFIG_PMAC'	Paul Bolle
	commit 6e0fdf9af216887e0032c19d276889aad41cad00 upstream. Commit b0d278b7d3ae ("powerpc/perf_event: Reduce latency of calling perf_event_do_pending") added a check for CONFIG_PMAC were a check for CONFIG_PPC_PMAC was clearly intended. Fixes: b0d278b7d3ae ("powerpc/perf_event: Reduce latency of calling perf_event_do_pending") Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	powerpc: 64bit sendfile is capped at 2GB	Anton Blanchard
	commit 5d73320a96fcce80286f1447864c481b5f0b96fa upstream. commit 8f9c0119d7ba (compat: fs: Generic compat_sys_sendfile implementation) changed the PowerPC 64bit sendfile call from sys_sendile64 to sys_sendfile. Unfortunately this broke sendfile of lengths greater than 2G because sys_sendfile caps at MAX_NON_LFS. Restore what we had previously which fixes the bug. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	powerpc/serial: Use saner flags when creating legacy ports	Benjamin Herrenschmidt
	commit c4cad90f9e9dcb85afc5e75a02ae3522ed077296 upstream. We had a mix & match of flags used when creating legacy ports depending on where we found them in the device-tree. Among others we were missing UPF_SKIP_TEST for some kind of ISA ports which is a problem as quite a few UARTs out there don't support the loopback test (such as a lot of BMCs). Let's pick the set of flags used by the SoC code and generalize it which means autoconf, no loopback test, irq maybe shared and fixed port. Sending to stable as the lack of UPF_SKIP_TEST is breaking serial on some machines so I want this back into distros Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	powerpc/mm: Check paca psize is up to date for huge mappings	Michael Ellerman
	commit 09567e7fd44291bfc08accfdd67ad8f467842332 upstream. We have a bug in our hugepage handling which exhibits as an infinite loop of hash faults. If the fault is being taken in the kernel it will typically trigger the softlockup detector, or the RCU stall detector. The bug is as follows: 1. mmap(0xa0000000, ..., MAP_FIXED \| MAP_HUGE_TLB \| MAP_ANONYMOUS ..) 2. Slice code converts the slice psize to 16M. 3. The code on lines 539-540 of slice.c in slice_get_unmapped_area() synchronises the mm->context with the paca->context. So the paca slice mask is updated to include the 16M slice. 3. Either: * mmap() fails because there are no huge pages available. * mmap() succeeds and the mapping is then munmapped. In both cases the slice psize remains at 16M in both the paca & mm. 4. mmap(0xa0000000, ..., MAP_FIXED \| MAP_ANONYMOUS ..) 5. The slice psize is converted back to 64K. Because of the check on line 539 of slice.c we DO NOT update the paca->context. The paca slice mask is now out of sync with the mm slice mask. 6. User/kernel accesses 0xa0000000. 7. The SLB miss handler slb_allocate_realmode() uses the paca slice mask to create an SLB entry and inserts it in the SLB. 18. With the 16M SLB entry in place the hardware does a hash lookup, no entry is found so a data access exception is generated. 19. The data access handler calls do_page_fault() -> handle_mm_fault(). 10. __handle_mm_fault() creates a THP mapping with do_huge_pmd_anonymous_page(). 11. The hardware retries the access, there is still nothing in the hash table so once again a data access exception is generated. 12. hash_page() calls into __hash_page_thp() and inserts a mapping in the hash. Although the THP mapping maps 16M the hashing is done using 64K as the segment page size. 13. hash_page() returns immediately after calling __hash_page_thp(), skipping over the code at line 1125. Resulting in the mismatch between the paca->context and mm->context not being detected. 14. The hardware retries the access, the hash it generates using the 16M SLB entry does NOT match the hash we inserted. 15. We take another data access and go into __hash_page_thp(). 16. We see a valid entry in the hpte_slot_array and so we call updatepp() which succeeds. 17. Goto 14. We could fix this in two ways. The first would be to remove or modify the check on line 539 of slice.c. The second option is to cause the check of paca psize in hash_page() on line 1125 to also be done for THP pages. We prefer the latter, because the check & update of the paca psize is not done until we know it's necessary. It's also done only on the current cpu, so we don't need to IPI all other cpus. Without further rearranging the code, the simplest fix is to pull out the code that checks paca psize and call it in two places. Firstly for THP/hugetlb, and secondly for other mappings as before. Thanks to Dave Jones for trinity, which originally found this bug. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	powerpc/pseries: Fix overwritten PE state	Gavin Shan
	commit 54f112a3837d4e7532bbedbbbf27c0de277be510 upstream. In pseries_eeh_get_state(), EEH_STATE_UNAVAILABLE is always overwritten by EEH_STATE_NOT_SUPPORT because of the missed "break" there. The patch fixes the issue. Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	nfs: Fix cache_validity check in nfs_write_pageuptodate()	Scott Mayhew
	commit 18dd78c427513fb0f89365138be66e6ee8700d1b upstream. NFS_INO_INVALID_DATA cannot be ignored, even if we have a delegation. We're still having some problems with data corruption when multiple clients are appending to a file and those clients are being granted write delegations on open. To reproduce: Client A: vi /mnt/`hostname -s` while :; do echo "XXXXXXXXXXXXXXX" >>/mnt/file; sleep $(( $RANDOM % 5 )); done Client B: vi /mnt/`hostname -s` while :; do echo "YYYYYYYYYYYYYYY" >>/mnt/file; sleep $(( $RANDOM % 5 )); done What's happening is that in nfs_update_inode() we're recognizing that the file size has changed and we're setting NFS_INO_INVALID_DATA accordingly, but then we ignore the cache_validity flags in nfs_write_pageuptodate() because we have a delegation. As a result, in nfs_updatepage() we're extending the write to cover the full page even though we've not read in the data to begin with. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	NFS: populate ->net in mount data when remounting	Mateusz Guzik
	commit a914722f333b3359d2f4f12919380a334176bb89 upstream. Otherwise the kernel oopses when remounting with IPv6 server because net is dereferenced in dev_get_by_name. Use net ns of current thread so that dev_get_by_name does not operate on foreign ns. Changing the address is prohibited anyway so this should not affect anything. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Cc: linux-nfs@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	NFS: Don't declare inode uptodate unless all attributes were checked	Trond Myklebust
	commit 43b6535e717d2f656f71d9bd16022136b781c934 upstream. Fix a bug, whereby nfs_update_inode() was declaring the inode to be up to date despite not having checked all the attributes. The bug occurs because the temporary variable in which we cache the validity information is 'sanitised' before reapplying to nfsi->cache_validity. Reported-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	nfsd: getattr for FATTR4_WORD0_FILES_AVAIL needs the statfs buffer	Christoph Hellwig
	commit 12337901d654415d9f764b5f5ba50052e9700f37 upstream. Note nobody's ever noticed because the typical client probably never requests FILES_AVAIL without also requesting something else on the list. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	pNFS: Handle allocation errors correctly in filelayout_alloc_layout_hdr()	Trond Myklebust
	commit 6df200f5d5191bdde4d2e408215383890f956781 upstream. Return the NULL pointer when the allocation fails. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	SUNRPC: Fix a module reference leak in svc_handle_xprt	Trond Myklebust
	commit c789102c20bbbdda6831a273e046715be9d6af79 upstream. If the accept() call fails, we need to put the module reference. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	IB/umad: Fix use-after-free on close	Bart Van Assche
	commit 60e1751cb52cc6d1ae04b6bd3c2b96e770b5823f upstream. Avoid that closing /dev/infiniband/umad<n> or /dev/infiniband/issm<n> triggers a use-after-free. __fput() invokes f_op->release() before it invokes cdev_put(). Make sure that the ib_umad_device structure is freed by the cdev_put() call instead of f_op->release(). This avoids that changing the port mode from IB into Ethernet and back to IB followed by restarting opensmd triggers the following kernel oops: general protection fault: 0000 [#1] PREEMPT SMP RIP: 0010:[<ffffffff810cc65c>] [<ffffffff810cc65c>] module_put+0x2c/0x170 Call Trace: [<ffffffff81190f20>] cdev_put+0x20/0x30 [<ffffffff8118e2ce>] __fput+0x1ae/0x1f0 [<ffffffff8118e35e>] ____fput+0xe/0x10 [<ffffffff810723bc>] task_work_run+0xac/0xe0 [<ffffffff81002a9f>] do_notify_resume+0x9f/0xc0 [<ffffffff814b8398>] int_signal+0x12/0x17 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=75051 Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	IB/umad: Fix error handling	Bart Van Assche
	commit 8ec0a0e6b58218bdc1db91dd70ebfcd6ad8dd6cd upstream. Avoid leaking a kref count in ib_umad_open() if port->ib_dev == NULL or if nonseekable_open() fails. Avoid leaking a kref count, that sm_sem is kept down and also that the IB_PORT_SM capability mask is not cleared in ib_umad_sm_open() if nonseekable_open() fails. Since container_of() never returns NULL, remove the code that tests whether container_of() returns NULL. Moving the kref_get() call from the start of ib_umad_*open() to the end is safe since it is the responsibility of the caller of these functions to ensure that the cdev pointer remains valid until at least when these functions return. Signed-off-by: Bart Van Assche <bvanassche@acm.org> [ydroneaud@opteya.com: rework a bit to reduce the amount of code changed] Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> [ nonseekable_open() can't actually fail, but.... - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	IB/srp: Fix a sporadic crash triggered by cable pulling	Bart Van Assche
	commit 024ca90151f5e4296d30f72c13ff9a075e23c9ec upstream. Avoid that the loops that iterate over the request ring can encounter a pointer to a SCSI command in req->scmnd that is no longer associated with that request. If the function srp_unmap_data() is invoked twice for a SCSI command that is not in flight then that would cause ib_fmr_pool_unmap() to be invoked with an invalid pointer as argument, resulting in a kernel oops. Reported-by: Sagi Grimberg <sagig@mellanox.com> Reference: http://thread.gmane.org/gmane.linux.drivers.rdma/19068/focus=19069 Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	IB/ipath: Translate legacy diagpkt into newer extended diagpkt	Dennis Dalessandro
	commit 7e6d3e5c70f13874fb06e6b67696ed90ce79bd48 upstream. This patch addresses an issue where the legacy diagpacket is sent in from the user, but the driver operates on only the extended diagpkt. This patch specifically initializes the extended diagpkt based on the legacy packet. Reported-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	IB/qib: Fix port in pkey change event	Mike Marciniszyn
	commit 911eccd284d13d78c92ec4f1f1092c03457d732a upstream. The code used a literal 1 in dispatching an IB_EVENT_PKEY_CHANGE. As of the dual port qib QDR card, this is not necessarily correct. Change to use the port as specified in the call. Reported-by: Alex Estrin <alex.estrin@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	IB/mlx5: add missing padding at end of struct mlx5_ib_create_srq	Yann Droneaud
	commit 43bc889380c2ad9aa230eccc03a15cc52cf710d4 upstream. The i386 ABI disagrees with most other ABIs regarding alignment of data type larger than 4 bytes: on most ABIs a padding must be added at end of the structures, while it is not required on i386. So for most ABIs struct mlx5_ib_create_srq gets implicitly padded to be aligned on a 8 bytes multiple, while for i386, such padding is not added. Tool pahole could be used to find such implicit padding: $ pahole --anon_include \ --nested_anon_include \ --recursive \ --class_name mlx5_ib_create_srq \ drivers/infiniband/hw/mlx5/mlx5_ib.o Then, structure layout can be compared between i386 and x86_64: # +++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-28 11:43:07.386413682 +0100 # --- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-27 13:06:17.788472721 +0100 # @@ -69,7 +68,6 @@ struct mlx5_ib_create_srq { # __u64 db_addr; /* 8 8 / # __u32 flags; / 16 4 / # # - / size: 20, cachelines: 1, members: 3 / # - / last cacheline: 20 bytes / # + / size: 24, cachelines: 1, members: 3 / # + / padding: 4 / # + / last cacheline: 24 bytes */ # }; ABI disagreement will make an x86_64 kernel try to read past the buffer provided by an i386 binary. When boundary check will be implemented, the x86_64 kernel will refuse to read past the i386 userspace provided buffer and the uverb will fail. Anyway, if the structure lay in memory on a page boundary and next page is not mapped, ib_copy_from_udata() will fail and the uverb will fail. This patch makes create_srq_user() takes care of the input data size to handle the case where no padding was provided. This way, x86_64 kernel will be able to handle struct mlx5_ib_create_srq as sent by unpatched and patched i386 libmlx5. Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com Fixes: e126ba97dba9e ("mlx5: Add driver for Mellanox Connect-IB adapter") Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	IB/mlx5: add missing padding at end of struct mlx5_ib_create_cq	Yann Droneaud
	commit a8237b32a3faab155a5dc8f886452147ce73da3e upstream. The i386 ABI disagrees with most other ABIs regarding alignment of data type larger than 4 bytes: on most ABIs a padding must be added at end of the structures, while it is not required on i386. So for most ABI struct mlx5_ib_create_cq get padded to be aligned on a 8 bytes multiple, while for i386, such padding is not added. The tool pahole can be used to find such implicit padding: $ pahole --anon_include \ --nested_anon_include \ --recursive \ --class_name mlx5_ib_create_cq \ drivers/infiniband/hw/mlx5/mlx5_ib.o Then, structure layout can be compared between i386 and x86_64: # +++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-28 11:43:07.386413682 +0100 # --- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-27 13:06:17.788472721 +0100 # @@ -34,9 +34,8 @@ struct mlx5_ib_create_cq { # __u64 db_addr; /* 8 8 / # __u32 cqe_size; / 16 4 / # # - / size: 20, cachelines: 1, members: 3 / # - / last cacheline: 20 bytes / # + / size: 24, cachelines: 1, members: 3 / # + / padding: 4 / # + / last cacheline: 24 bytes */ # }; This ABI disagreement will make an x86_64 kernel try to read past the buffer provided by an i386 binary. When boundary check will be implemented, a x86_64 kernel will refuse to read past the i386 userspace provided buffer and the uverb will fail. Anyway, if the structure lies in memory on a page boundary and next page is not mapped, ib_copy_from_udata() will fail when trying to read the 4 bytes of padding and the uverb will fail. This patch makes create_cq_user() takes care of the input data size to handle the case where no padding is provided. This way, x86_64 kernel will be able to handle struct mlx5_ib_create_cq as sent by unpatched and patched i386 libmlx5. Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com Fixes: e126ba97dba9e ("mlx5: Add driver for Mellanox Connect-IB adapter") Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	watchdog: kempld-wdt: Use the correct value when configuring the prescaler ↵	gundberg
	with the watchdog commit a9e0436b303e94ba57d3bd4b1fcbeaa744b7ebeb upstream. Use the prescaler index, rather than its value, to configure the watchdog. This will prevent a mismatch with the prescaler used to calculate the cycles. Signed-off-by: Per Gundberg <per.gundberg@icomera.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Michael Brunner <michael.brunner@kontron.com> Tested-by: Michael Brunner <michael.brunner@kontron.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	watchdog: ath79_wdt: avoid spurious restarts on AR934x	Gabor Juhos
	commit 23afeb613ec0e10aecfae7838a14d485db62ac52 upstream. On some AR934x based systems, where the frequency of the AHB bus is relatively high, the built-in watchdog causes a spurious restart when it gets enabled. The possible cause of these restarts is that the timeout value written into the TIMER register does not reaches the hardware in time. Add an explicit delay into the ath79_wdt_enable function to avoid the spurious restarts. Signed-off-by: Gabor Juhos <juhosg@openwrt.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	watchdog: sp805: Set watchdog_device->timeout from ->set_timeout()	Viresh Kumar
	commit 938626d96a3ffb9eb54552bb0d3a4f2b30ffdeb0 upstream. Implementation of ->set_timeout() is supposed to set 'timeout' field of 'struct watchdog_device' passed to it. sp805 was rather setting this in a local variable. Fix it. Reported-by: Arun Ramamurthy <arun.ramamurthy@broadcom.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	UBIFS: Remove incorrect assertion in shrink_tnc()	hujianyang
	commit 72abc8f4b4e8574318189886de627a2bfe6cd0da upstream. I hit the same assert failed as Dolev Raviv reported in Kernel v3.10 shows like this: [ 9641.164028] UBIFS assert failed in shrink_tnc at 131 (pid 13297) [ 9641.234078] CPU: 1 PID: 13297 Comm: mmap.test Tainted: G O 3.10.40 #1 [ 9641.234116] [<c0011a6c>] (unwind_backtrace+0x0/0x12c) from [<c000d0b0>] (show_stack+0x20/0x24) [ 9641.234137] [<c000d0b0>] (show_stack+0x20/0x24) from [<c0311134>] (dump_stack+0x20/0x28) [ 9641.234188] [<c0311134>] (dump_stack+0x20/0x28) from [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs]) [ 9641.234265] [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs]) from [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs]) [ 9641.234307] [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs]) from [<c00cdad8>] (shrink_slab+0x1d4/0x2f8) [ 9641.234327] [<c00cdad8>] (shrink_slab+0x1d4/0x2f8) from [<c00d03d0>] (do_try_to_free_pages+0x300/0x544) [ 9641.234344] [<c00d03d0>] (do_try_to_free_pages+0x300/0x544) from [<c00d0a44>] (try_to_free_pages+0x2d0/0x398) [ 9641.234363] [<c00d0a44>] (try_to_free_pages+0x2d0/0x398) from [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8) [ 9641.234382] [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8) from [<c00f62d8>] (new_slab+0x78/0x238) [ 9641.234400] [<c00f62d8>] (new_slab+0x78/0x238) from [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c) [ 9641.234419] [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c) from [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188) [ 9641.234459] [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188) from [<bf227908>] (do_readpage+0x168/0x468 [ubifs]) [ 9641.234553] [<bf227908>] (do_readpage+0x168/0x468 [ubifs]) from [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs]) [ 9641.234606] [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs]) from [<c00c17c0>] (filemap_fault+0x304/0x418) [ 9641.234638] [<c00c17c0>] (filemap_fault+0x304/0x418) from [<c00de694>] (__do_fault+0xd4/0x530) [ 9641.234665] [<c00de694>] (__do_fault+0xd4/0x530) from [<c00e10c0>] (handle_pte_fault+0x480/0xf54) [ 9641.234690] [<c00e10c0>] (handle_pte_fault+0x480/0xf54) from [<c00e2bf8>] (handle_mm_fault+0x140/0x184) [ 9641.234716] [<c00e2bf8>] (handle_mm_fault+0x140/0x184) from [<c0316688>] (do_page_fault+0x150/0x3ac) [ 9641.234737] [<c0316688>] (do_page_fault+0x150/0x3ac) from [<c000842c>] (do_DataAbort+0x3c/0xa0) [ 9641.234759] [<c000842c>] (do_DataAbort+0x3c/0xa0) from [<c0314e38>] (__dabt_usr+0x38/0x40) After analyzing the code, I found a condition that may cause this failed in correct operations. Thus, I think this assertion is wrong and should be removed. Suppose there are two clean znodes and one dirty znode in TNC. So the per-filesystem atomic_t @clean_zn_cnt is (2). If commit start, dirty_znode is set to COW_ZNODE in get_znodes_to_commit() in case of potentially ops on this znode. We clear COW bit and DIRTY bit in write_index() without @tnc_mutex locked. We don't increase @clean_zn_cnt in this place. As the comments in write_index() shows, if another process hold @tnc_mutex and dirty this znode after we clean it, @clean_zn_cnt would be decreased to (1). We will increase @clean_zn_cnt to (2) with @tnc_mutex locked in free_obsolete_znodes() to keep it right. If shrink_tnc() performs between decrease and increase, it will release other 2 clean znodes it holds and found @clean_zn_cnt is less than zero (1 - 2 = -1), then hit the assertion. Because free_obsolete_znodes() will soon correct @clean_zn_cnt and no harm to fs in this case, I think this assertion could be removed. 2 clean zondes and 1 dirty znode, @clean_zn_cnt == 2 Thread A (commit) Thread B (write or others) Thread C (shrinker) ->write_index ->clear_bit(DIRTY_NODE) ->clear_bit(COW_ZNODE) @clean_zn_cnt == 2 ->mutex_locked(&tnc_mutex) ->dirty_cow_znode ->!ubifs_zn_cow(znode) ->!test_and_set_bit(DIRTY_NODE) ->atomic_dec(&clean_zn_cnt) ->mutex_unlocked(&tnc_mutex) @clean_zn_cnt == 1 ->mutex_locked(&tnc_mutex) ->shrink_tnc ->destroy_tnc_subtree ->atomic_sub(&clean_zn_cnt, 2) ->ubifs_assert <- hit ->mutex_unlocked(&tnc_mutex) @clean_zn_cnt == -1 ->mutex_lock(&tnc_mutex) ->free_obsolete_znodes ->atomic_inc(&clean_zn_cnt) ->mutux_unlock(&tnc_mutex) @clean_zn_cnt == 0 (correct after shrink) Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-17	UBIFS: fix an mmap and fsync race condition	hujianyang
	commit 691a7c6f28ac90cccd0dbcf81348ea90b211bdd0 upstream. There is a race condition in UBIFS: Thread A (mmap) Thread B (fsync) ->__do_fault ->write_cache_pages -> ubifs_vm_page_mkwrite -> budget_space -> lock_page -> release/convert_page_budget -> SetPagePrivate -> TestSetPageDirty -> unlock_page -> lock_page -> TestClearPageDirty -> ubifs_writepage -> do_writepage -> release_budget -> ClearPagePrivate -> unlock_page -> !(ret & VM_FAULT_LOCKED) -> lock_page -> set_page_dirty -> ubifs_set_page_dirty -> TestSetPageDirty (set page dirty without budgeting) -> unlock_page This leads to situation where we have a diry page but no budget allocated for this page, so further write-back may fail with -ENOSPC. In this fix we return from page_mkwrite without performing unlock_page. We return VM_FAULT_LOCKED instead. After doing this, the race above will not happen. Signed-off-by: hujianyang <hujianyang@huawei.com> Tested-by: Laurence Withers <lwithers@guralp.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>