linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2020-03-11	driver code: clarify and fix platform device DMA mask allocation	Christoph Hellwig
	This does three inter-related things to clarify the usage of the platform device dma_mask field. In the process, fix the bug introduced by cdfee5623290 ("driver core: initialize a default DMA mask for platform device") that caused Artem Tashkinov's laptop to not boot with newer Fedora kernels. This does: - First off, rename the field to "platform_dma_mask" to make it greppable. We have way too many different random fields called "dma_mask" in various data structures, where some of them are actual masks, and some of them are just pointers to the mask. And the structures all have pointers to each other, or embed each other inside themselves, and "pdev" sometimes means "platform device" and sometimes it means "PCI device". So to make it clear in the code when you actually use this new field, give it a unique name (it really should be something even more unique like "platform_device_dma_mask", since it's per platform device, not per platform, but that gets old really fast, and this is unique enough in context). To further clarify when the field gets used, initialize it when we actually start using it with the default value. - Then, use this field instead of the random one-off allocation in platform_device_register_full() that is now unnecessary since we now already have a perfectly fine allocation for it in the platform device structure. - The above then allows us to fix the actual bug, where the error path of platform_device_register_full() would unconditionally free the platform device DMA allocation with 'kfree()'. That kfree() was dont regardless of whether the allocation had been done earlier with the (now removed) kmalloc, or whether setup_pdev_dma_masks() had already been used and the dma_mask pointer pointed to the mask that was part of the platform device. It seems most people never triggered the error path, or only triggered it from a call chain that set an explicit pdevinfo->dma_mask value (and thus caused the unnecessary allocation that was "cleaned up" in the error path) before calling platform_device_register_full(). Robin Murphy points out that in Artem's case the wdat_wdt driver failed in platform_device_add(), and that was the one that had called platform_device_register_full() with pdevinfo.dma_mask = 0, and would have caused that kfree() of pdev.dma_mask corrupting the heap. A later unrelated kmalloc() then oopsed due to the heap corruption. Fixes: cdfee5623290 ("driver core: initialize a default DMA mask for platform device") Reported-bisected-and-tested-by: Artem S. Tashkinov <aros@gmx.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-11	mmc: sdhci-tegra: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY	Ulf Hansson
	It has turned out that the sdhci-tegra controller requires the R1B response, for commands that has this response associated with them. So, converting from an R1B to an R1 response for a CMD6 for example, leads to problems with the HW busy detection support. Fix this by informing the mmc core about the requirement, via setting the host cap, MMC_CAP_NEED_RSP_BUSY. Reported-by: Bitan Biswas <bbiswas@nvidia.com> Reported-by: Peter Geis <pgwipeout@gmail.com> Suggested-by: Sowjanya Komatineni <skomatineni@nvidia.com> Cc: <stable@vger.kernel.org> Tested-by: Sowjanya Komatineni <skomatineni@nvidia.com> Tested-By: Peter Geis <pgwipeout@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-03-11	mmc: sdhci-omap: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY	Ulf Hansson
	It has turned out that the sdhci-omap controller requires the R1B response, for commands that has this response associated with them. So, converting from an R1B to an R1 response for a CMD6 for example, leads to problems with the HW busy detection support. Fix this by informing the mmc core about the requirement, via setting the host cap, MMC_CAP_NEED_RSP_BUSY. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Anders Roxell <anders.roxell@linaro.org> Reported-by: Faiz Abbas <faiz_abbas@ti.com> Cc: <stable@vger.kernel.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: Faiz Abbas <faiz_abbas@ti.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-03-11	mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for erase/trim/discard	Ulf Hansson
	The busy timeout that is computed for each erase/trim/discard operation, can become quite long and may thus exceed the host->max_busy_timeout. If that becomes the case, mmc_do_erase() converts from using an R1B response to an R1 response, as to prevent the host from doing HW busy detection. However, it has turned out that some hosts requires an R1B response no matter what, so let's respect that via checking MMC_CAP_NEED_RSP_BUSY. Note that, if the R1B gets enforced, the host becomes fully responsible of managing the needed busy timeout, in one way or the other. Suggested-by: Sowjanya Komatineni <skomatineni@nvidia.com> Cc: <stable@vger.kernel.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: Sowjanya Komatineni <skomatineni@nvidia.com> Tested-by: Faiz Abbas <faiz_abbas@ti.com> Tested-By: Peter Geis <pgwipeout@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-03-11	mmc: core: Allow host controllers to require R1B for CMD6	Ulf Hansson
	It has turned out that some host controllers can't use R1B for CMD6 and other commands that have R1B associated with them. Therefore invent a new host cap, MMC_CAP_NEED_RSP_BUSY to let them specify this. In __mmc_switch(), let's check the flag and use it to prevent R1B responses from being converted into R1. Note that, this also means that the host are on its own, when it comes to manage the busy timeout. Suggested-by: Sowjanya Komatineni <skomatineni@nvidia.com> Cc: <stable@vger.kernel.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: Sowjanya Komatineni <skomatineni@nvidia.com> Tested-by: Faiz Abbas <faiz_abbas@ti.com> Tested-By: Peter Geis <pgwipeout@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-03-11	x86/ioremap: Map EFI runtime services data as encrypted for SEV	Tom Lendacky
	The dmidecode program fails to properly decode the SMBIOS data supplied by OVMF/UEFI when running in an SEV guest. The SMBIOS area, under SEV, is encrypted and resides in reserved memory that is marked as EFI runtime services data. As a result, when memremap() is attempted for the SMBIOS data, it can't be mapped as regular RAM (through try_ram_remap()) and, since the address isn't part of the iomem resources list, it isn't mapped encrypted through the fallback ioremap(). Add a new __ioremap_check_other() to deal with memory types like EFI_RUNTIME_SERVICES_DATA which are not covered by the resource ranges. This allows any runtime services data which has been created encrypted, to be mapped encrypted too. [ bp: Move functionality to a separate function. ] Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Joerg Roedel <jroedel@suse.de> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: <stable@vger.kernel.org> # 5.3 Link: https://lkml.kernel.org/r/2d9e16eb5b53dc82665c95c6764b7407719df7a0.1582645327.git.thomas.lendacky@amd.com
2020-03-11	ftrace: Return the first found result in lookup_rec()	Artem Savkov
	It appears that ip ranges can overlap so. In that case lookup_rec() returns whatever results it got last even if it found nothing in last searched page. This breaks an obscure livepatch late module patching usecase: - load livepatch - load the patched module - unload livepatch - try to load livepatch again To fix this return from lookup_rec() as soon as it found the record containing searched-for ip. This used to be this way prior lookup_rec() introduction. Link: http://lkml.kernel.org/r/20200306174317.21699-1-asavkov@redhat.com Cc: stable@vger.kernel.org Fixes: 7e16f581a817 ("ftrace: Separate out functionality from ftrace_location_range()") Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-11	mac80211: Do not send mesh HWMP PREQ if HWMP is disabled	Nicolas Cavallari
	When trying to transmit to an unknown destination, the mesh code would unconditionally transmit a HWMP PREQ even if HWMP is not the current path selection algorithm. Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr> Link: https://lore.kernel.org/r/20200305140409.12204-1-cavallar@lri.fr Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-11	nl80211: add missing attribute validation for channel switch	Jakub Kicinski
	Add missing attribute validation for NL80211_ATTR_OPER_CLASS to the netlink policy. Fixes: 1057d35ede5d ("cfg80211: introduce TDLS channel switch commands") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20200303051058.4089398-4-kuba@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-11	nl80211: add missing attribute validation for beacon report scanning	Jakub Kicinski
	Add missing attribute validation for beacon report scanning to the netlink policy. Fixes: 1d76250bd34a ("nl80211: support beacon report scanning") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20200303051058.4089398-3-kuba@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-11	nl80211: add missing attribute validation for critical protocol indication	Jakub Kicinski
	Add missing attribute validation for critical protocol fields to the netlink policy. Fixes: 5de17984898c ("cfg80211: introduce critical protocol indication from user-space") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20200303051058.4089398-2-kuba@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-11	KVM: s390: Also reset registers in sync regs for initial cpu reset	Christian Borntraeger
	When we do the initial CPU reset we must not only clear the registers in the internal data structures but also in kvm_run sync_regs. For modern userspace sync_regs is the only place that it looks at. Fixes: 7de3f1423ff9 ("KVM: s390: Add new reset vcpu API") Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-03-10	scsi: ipr: Fix softlockup when rescanning devices in petitboot	Wen Xiong
	When trying to rescan disks in petitboot shell, we hit the following softlockup stacktrace: Kernel panic - not syncing: System is deadlocked on memory [ 241.223394] CPU: 32 PID: 693 Comm: sh Not tainted 5.4.16-openpower1 #1 [ 241.223406] Call Trace: [ 241.223415] [c0000003f07c3180] [c000000000493fc4] dump_stack+0xa4/0xd8 (unreliable) [ 241.223432] [c0000003f07c31c0] [c00000000007d4ac] panic+0x148/0x3cc [ 241.223446] [c0000003f07c3260] [c000000000114b10] out_of_memory+0x468/0x4c4 [ 241.223461] [c0000003f07c3300] [c0000000001472b0] __alloc_pages_slowpath+0x594/0x6d8 [ 241.223476] [c0000003f07c3420] [c00000000014757c] __alloc_pages_nodemask+0x188/0x1a4 [ 241.223492] [c0000003f07c34a0] [c000000000153e10] alloc_pages_current+0xcc/0xd8 [ 241.223508] [c0000003f07c34e0] [c0000000001577ac] alloc_slab_page+0x30/0x98 [ 241.223524] [c0000003f07c3520] [c0000000001597fc] new_slab+0x138/0x40c [ 241.223538] [c0000003f07c35f0] [c00000000015b204] ___slab_alloc+0x1e4/0x404 [ 241.223552] [c0000003f07c36c0] [c00000000015b450] __slab_alloc+0x2c/0x48 [ 241.223566] [c0000003f07c36f0] [c00000000015b754] kmem_cache_alloc_node+0x9c/0x1b4 [ 241.223582] [c0000003f07c3760] [c000000000218c48] blk_alloc_queue_node+0x34/0x270 [ 241.223599] [c0000003f07c37b0] [c000000000226574] blk_mq_init_queue+0x2c/0x78 [ 241.223615] [c0000003f07c37e0] [c0000000002ff710] scsi_mq_alloc_queue+0x28/0x70 [ 241.223631] [c0000003f07c3810] [c0000000003005b8] scsi_alloc_sdev+0x184/0x264 [ 241.223647] [c0000003f07c38a0] [c000000000300ba0] scsi_probe_and_add_lun+0x288/0xa3c [ 241.223663] [c0000003f07c3a00] [c000000000301768] __scsi_scan_target+0xcc/0x478 [ 241.223679] [c0000003f07c3b20] [c000000000301c64] scsi_scan_channel.part.9+0x74/0x7c [ 241.223696] [c0000003f07c3b70] [c000000000301df4] scsi_scan_host_selected+0xe0/0x158 [ 241.223712] [c0000003f07c3bd0] [c000000000303f04] store_scan+0x104/0x114 [ 241.223727] [c0000003f07c3cb0] [c0000000002d5ac4] dev_attr_store+0x30/0x4c [ 241.223741] [c0000003f07c3cd0] [c0000000001dbc34] sysfs_kf_write+0x64/0x78 [ 241.223756] [c0000003f07c3cf0] [c0000000001da858] kernfs_fop_write+0x170/0x1b8 [ 241.223773] [c0000003f07c3d40] [c0000000001621fc] __vfs_write+0x34/0x60 [ 241.223787] [c0000003f07c3d60] [c000000000163c2c] vfs_write+0xa8/0xcc [ 241.223802] [c0000003f07c3db0] [c000000000163df4] ksys_write+0x70/0xbc [ 241.223816] [c0000003f07c3e20] [c00000000000b40c] system_call+0x5c/0x68 As a part of the scan process Linux will allocate and configure a scsi_device for each target to be scanned. If the device is not present, then the scsi_device is torn down. As a part of scsi_device teardown a workqueue item will be scheduled and the lockups we see are because there are 250k workqueue items to be processed. Accoding to the specification of SIS-64 sas controller, max_channel should be decreased on SIS-64 adapters to 4. The patch fixes softlockup issue. Thanks for Oliver Halloran's help with debugging and explanation! Link: https://lore.kernel.org/r/1583510248-23672-1-git-send-email-wenxiong@linux.vnet.ibm.com Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-10	Merge branch 's390-qeth-fixes'	David S. Miller
	Julian Wiedmann says: ==================== s390/qeth: fixes 2020-03-10 This fixes three minor issues: 1) a setup parameter gets cleared unnecessarily when the HW config changes, 2) insufficient error handling when initially filling the RX ring, and 3) a rarely used worker that needs to be cancelled during tear down. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10	s390/qeth: cancel RX reclaim work earlier	Julian Wiedmann
	When qeth's napi poll code fails to refill an entirely empty RX ring, it kicks off buffer_reclaim_work to try again later. Make sure that this worker is cancelled when setting the qeth device offline. Otherwise a RX refill action can unexpectedly end up running concurrently to bigger re-configurations (eg. resizing the buffer pool), without any locking. Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10	s390/qeth: handle error when backing RX buffer	Julian Wiedmann
	qeth_init_qdio_queues() fills the RX ring with an initial set of RX buffers. If qeth_init_input_buffer() fails to back one of the RX buffers with memory, we need to bail out and report the error. Fixes: 4a71df50047f ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10	s390/qeth: don't reset default_out_queue	Julian Wiedmann
	When an OSA device in prio-queue setup is reduced to 1 TX queue due to HW restrictions, we reset its the default_out_queue to 0. In the old code this was needed so that qeth_get_priority_queue() gets the queue selection right. But with proper multiqueue support we already reduced dev->real_num_tx_queues to 1, and so the stack puts all traffic on txq 0 without even calling .ndo_select_queue. Thus we can preserve the user's configuration, and apply it if the OSA device later re-gains support for multiple TX queues. Fixes: 73dc2daf110f ("s390/qeth: add TX multiqueue support for OSA devices") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10	scsi: ufs: Fix possible unclocked access to auto hibern8 timer register	Can Guo
	Before access auto hibner8 timer register, make sure power and clock are properly configured to avoid unclocked register access. Link: https://lore.kernel.org/r/1583398391-14273-1-git-send-email-cang@codeaurora.org Fixes: ba7af5ec5126 ("scsi: ufs: export ufshcd_auto_hibern8_update for vendor usage") Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-10	Merge branch 'MACSec-bugfixes-related-to-MAC-address-change'	David S. Miller
	Igor Russkikh says: ==================== MACSec bugfixes related to MAC address change We found out that there's an issue in MACSec code when the MAC address is changed. Both s/w and offloaded implementations don't update SCI when the MAC address changes at the moment, but they should do so, because SCI contains MAC in its first 6 octets. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10	net: macsec: invoke mdo_upd_secy callback when mac address changed	Dmitry Bogdanov
	Notify the offload engine about MAC address change to reconfigure it accordingly. Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure") Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10	net: macsec: update SCI upon MAC address change.	Dmitry Bogdanov
	SCI should be updated, because it contains MAC in its first 6 octets. Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10	ibmvnic: Do not process device remove during device reset	Juliet Kim
	The ibmvnic driver does not check the device state when the device is removed. If the device is removed while a device reset is being processed, the remove may free structures needed by the reset, causing an oops. Fix this by checking the device state before processing device remove. Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10	net/smc: cancel event worker during device removal	Karsten Graul
	During IB device removal, cancel the event worker before the device structure is freed. Fixes: a4cf0443c414 ("smc: introduce SMC as an IB-client") Reported-by: syzbot+b297c6825752e7a07272@syzkaller.appspotmail.com Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10	ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interface	Hangbin Liu
	Rafał found an issue that for non-Ethernet interface, if we down and up frequently, the memory will be consumed slowly. The reason is we add allnodes/allrouters addressed in multicast list in ipv6_add_dev(). When link down, we call ipv6_mc_down(), store all multicast addresses via mld_add_delrec(). But when link up, we don't call ipv6_mc_up() for non-Ethernet interface to remove the addresses. This makes idev->mc_tomb getting bigger and bigger. The call stack looks like: addrconf_notify(NETDEV_REGISTER) ipv6_add_dev ipv6_dev_mc_inc(ff01::1) ipv6_dev_mc_inc(ff02::1) ipv6_dev_mc_inc(ff02::2) addrconf_notify(NETDEV_UP) addrconf_dev_config /* Alas, we support only Ethernet autoconfiguration. */ return; addrconf_notify(NETDEV_DOWN) addrconf_ifdown ipv6_mc_down igmp6_group_dropped(ff02::2) mld_add_delrec(ff02::2) igmp6_group_dropped(ff02::1) igmp6_group_dropped(ff01::1) After investigating, I can't found a rule to disable multicast on non-Ethernet interface. In RFC2460, the link could be Ethernet, PPP, ATM, tunnels, etc. In IPv4, it doesn't check the dev type when calls ip_mc_up() in inetdev_event(). Even for IPv6, we don't check the dev type and call ipv6_add_dev(), ipv6_dev_mc_inc() after register device. So I think it's OK to fix this memory consumer by calling ipv6_mc_up() for non-Ethernet interface. v2: Also check IFF_MULTICAST flag to make sure the interface supports multicast Reported-by: Rafał Miłecki <zajec5@gmail.com> Tested-by: Rafał Miłecki <zajec5@gmail.com> Fixes: 74235a25c673 ("[IPV6] addrconf: Fix IPv6 on tuntap tunnels") Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when set link down") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10	Merge tag 'clang-format-for-linus-v5.6-rc6' of git://github.com/ojeda/linux	Linus Torvalds
	Pull clang-format update from Miguel Ojeda: "Another update for the .clang-format macro list It has been a while since the last time I sent one!" * tag 'clang-format-for-linus-v5.6-rc6' of git://github.com/ojeda/linux: clang-format: Update with the latest for_each macro list
2020-03-10	net: memcg: late association of sock to memcg	Shakeel Butt
	If a TCP socket is allocated in IRQ context or cloned from unassociated (i.e. not associated to a memcg) in IRQ context then it will remain unassociated for its whole life. Almost half of the TCPs created on the system are created in IRQ context, so, memory used by such sockets will not be accounted by the memcg. This issue is more widespread in cgroup v1 where network memory accounting is opt-in but it can happen in cgroup v2 if the source socket for the cloning was created in root memcg. To fix the issue, just do the association of the sockets at the accept() time in the process context and then force charge the memory buffer already used and reserved by the socket. Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10	cgroup: memcg: net: do not associate sock with unrelated cgroup	Shakeel Butt
	We are testing network memory accounting in our setup and noticed inconsistent network memory usage and often unrelated cgroups network usage correlates with testing workload. On further inspection, it seems like mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in irq context specially for cgroup v1. mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in irq context and kind of assumes that this can only happen from sk_clone_lock() and the source sock object has already associated cgroup. However in cgroup v1, where network memory accounting is opt-in, the source sock can be unassociated with any cgroup and the new cloned sock can get associated with unrelated interrupted cgroup. Cgroup v2 can also suffer if the source sock object was created by process in the root cgroup or if sk_alloc() is called in irq context. The fix is to just do nothing in interrupt. WARNING: Please note that about half of the TCP sockets are allocated from the IRQ context, so, memory used by such sockets will not be accouted by the memcg. The stack trace of mem_cgroup_sk_alloc() from IRQ-context: CPU: 70 PID: 12720 Comm: ssh Tainted: 5.6.0-smp-DEV #1 Hardware name: ... Call Trace: <IRQ> dump_stack+0x57/0x75 mem_cgroup_sk_alloc+0xe9/0xf0 sk_clone_lock+0x2a7/0x420 inet_csk_clone_lock+0x1b/0x110 tcp_create_openreq_child+0x23/0x3b0 tcp_v6_syn_recv_sock+0x88/0x730 tcp_check_req+0x429/0x560 tcp_v6_rcv+0x72d/0xa40 ip6_protocol_deliver_rcu+0xc9/0x400 ip6_input+0x44/0xd0 ? ip6_protocol_deliver_rcu+0x400/0x400 ip6_rcv_finish+0x71/0x80 ipv6_rcv+0x5b/0xe0 ? ip6_sublist_rcv+0x2e0/0x2e0 process_backlog+0x108/0x1e0 net_rx_action+0x26b/0x460 __do_softirq+0x104/0x2a6 do_softirq_own_stack+0x2a/0x40 </IRQ> do_softirq.part.19+0x40/0x50 __local_bh_enable_ip+0x51/0x60 ip6_finish_output2+0x23d/0x520 ? ip6table_mangle_hook+0x55/0x160 __ip6_finish_output+0xa1/0x100 ip6_finish_output+0x30/0xd0 ip6_output+0x73/0x120 ? __ip6_finish_output+0x100/0x100 ip6_xmit+0x2e3/0x600 ? ipv6_anycast_cleanup+0x50/0x50 ? inet6_csk_route_socket+0x136/0x1e0 ? skb_free_head+0x1e/0x30 inet6_csk_xmit+0x95/0xf0 __tcp_transmit_skb+0x5b4/0xb20 __tcp_send_ack.part.60+0xa3/0x110 tcp_send_ack+0x1d/0x20 tcp_rcv_state_process+0xe64/0xe80 ? tcp_v6_connect+0x5d1/0x5f0 tcp_v6_do_rcv+0x1b1/0x3f0 ? tcp_v6_do_rcv+0x1b1/0x3f0 __release_sock+0x7f/0xd0 release_sock+0x30/0xa0 __inet_stream_connect+0x1c3/0x3b0 ? prepare_to_wait+0xb0/0xb0 inet_stream_connect+0x3b/0x60 __sys_connect+0x101/0x120 ? __sys_getsockopt+0x11b/0x140 __x64_sys_connect+0x1a/0x20 do_syscall_64+0x51/0x200 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The stack trace of mem_cgroup_sk_alloc() from IRQ-context: Fixes: 2d7580738345 ("mm: memcontrol: consolidate cgroup socket tracking") Fixes: d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10	Merge tag 'auxdisplay-for-linus-v5.6-rc6' of git://github.com/ojeda/linux	Linus Torvalds
	Pull auxdisplay updates from Miguel Ojeda: "A few minor auxdisplay improvements: - charlcd: replace zero-length array with flexible-array member (kernel-wide cleanup by Gustavo A. R. Silva) - img-ascii-lcd: convert to devm_platform_ioremap_resource (Yangtao Li) - Fix Kconfig indentation (Krzysztof Kozlowski) * tag 'auxdisplay-for-linus-v5.6-rc6' of git://github.com/ojeda/linux: auxdisplay: charlcd: replace zero-length array with flexible-array member auxdisplay: img-ascii-lcd: convert to devm_platform_ioremap_resource auxdisplay: Fix Kconfig indentation
2020-03-10	MAINTAINERS: update cxgb4vf maintainer to Vishal	Jakub Kicinski
	Casey Leedomn <leedom@chelsio.com> is bouncing, Vishal indicated he's happy to take the role. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10	Merge branch 'for-5.6-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - cgroup.procs listing related fixes. It didn't interlock properly with exiting tasks leaving a short window where a cgroup has empty cgroup.procs but still can't be removed and misbehaved on short reads. - psi_show() crash fix on 32bit ino archs - Empty release_agent handling fix * 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup1: don't call release_agent when it is "" cgroup: fix psi_show() crash on 32bit ino archs cgroup: Iterate tasks that did not finish do_exit() cgroup: cgroup_procs_next should increase position index cgroup-v1: cgroup_pidlist_next should update position index
2020-03-10	Merge branch 'for-5.6-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: "Workqueue has been incorrectly round-robining per-cpu work items. Hillf's patch fixes that. The other patch documents memory-ordering properties of workqueue operations" * 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: don't use wq_select_unbound_cpu() for bound works workqueue: Document (some) memory-ordering properties of {queue,schedule}_work()
2020-03-10	drm/amdgpu/powerplay: nv1x, renior copy dcn clock settings of watermark to ↵	Hersen Wu
	smu during boot up dc to pplib interface is changed for navi1x, renoir. display_config_changed is not called by dc anymore. smu_write_watermarks_table is not executed for navi1x, renoir during boot up. solution: call smu_write_watermarks_table just after dc pass watermark clock settings to pplib Signed-off-by: Hersen Wu <hersenxs.wu@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10	blk-iocost: fix incorrect vtime comparison in iocg_is_idle()	Tejun Heo
	vtimes may wrap and time_before/after64() should be used to determine whether a given vtime is before or after another. iocg_is_idle() was incorrectly using plain "<" comparison do determine whether done_vtime is before vtime. Here, the only thing we're interested in is whether done_vtime matches vtime which indicates that there's nothing in flight. Let's test for inequality instead. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-10	workqueue: don't use wq_select_unbound_cpu() for bound works	Hillf Danton
	wq_select_unbound_cpu() is designed for unbound workqueues only, but it's wrongly called when using a bound workqueue too. Fixing this ensures work queued to a bound workqueue with cpu=WORK_CPU_UNBOUND always runs on the local CPU. Before, that would happen only if wq_unbound_cpumask happened to include it (likely almost always the case), or was empty, or we got lucky with forced round-robin placement. So restricting /sys/devices/virtual/workqueue/cpumask to a small subset of a machine's CPUs would cause some bound work items to run unexpectedly there. Fixes: ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Hillf Danton <hdanton@sina.com> [dj: massage changelog] Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org>
2020-03-10	i2c: gpio: suppress error on probe defer	Hamish Martin
	If a GPIO we are trying to use is not available and we are deferring the probe, don't output an error message. This seems to have been the intent of commit 05c74778858d ("i2c: gpio: Add support for named gpios in DT") but the error was still output due to not checking the updated 'retdesc'. Fixes: 05c74778858d ("i2c: gpio: Add support for named gpios in DT") Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2020-03-10	macintosh: windfarm: fix MODINFO regression	Wolfram Sang
	Commit af503716ac14 made sure OF devices get an OF style modalias with I2C events. It assumed all in-tree users were converted, yet it missed some Macintosh drivers. Add an OF module device table for all windfarm drivers to make them automatically load again. Fixes: af503716ac14 ("i2c: core: report OF style module alias for devices registered via OF") Link: https://bugzilla.kernel.org/show_bug.cgi?id=199471 Reported-by: Erhard Furtner <erhard_f@mailbox.org> Tested-by: Erhard Furtner <erhard_f@mailbox.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org # v4.17+
2020-03-10	i2c: designware-pci: Fix BUG_ON during device removal	Jarkko Nikula
	Function i2c_dw_pci_remove() -> pci_free_irq_vectors() -> pci_disable_msi() -> free_msi_irqs() will throw a BUG_ON() for MSI enabled device since the driver has not released the requested IRQ before calling the pci_free_irq_vectors(). Here driver requests an IRQ using devm_request_irq() but automatic release happens only after remove callback. Fix this by explicitly freeing the IRQ before calling pci_free_irq_vectors(). Fixes: 21aa3983d619 ("i2c: designware-pci: Switch over to MSI interrupts") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2020-03-10	iommu/vt-d: Silence RCU-list debugging warnings	Qian Cai
	Similar to the commit 02d715b4a818 ("iommu/vt-d: Fix RCU list debugging warnings"), there are several other places that call list_for_each_entry_rcu() outside of an RCU read side critical section but with dmar_global_lock held. Silence those false positives as well. drivers/iommu/intel-iommu.c:4288 RCU-list traversed in non-reader section!! 1 lock held by swapper/0/1: #0: ffffffff935892c8 (dmar_global_lock){+.+.}, at: intel_iommu_init+0x1ad/0xb97 drivers/iommu/dmar.c:366 RCU-list traversed in non-reader section!! 1 lock held by swapper/0/1: #0: ffffffff935892c8 (dmar_global_lock){+.+.}, at: intel_iommu_init+0x125/0xb97 drivers/iommu/intel-iommu.c:5057 RCU-list traversed in non-reader section!! 1 lock held by swapper/0/1: #0: ffffffffa71892c8 (dmar_global_lock){++++}, at: intel_iommu_init+0x61a/0xb13 Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-10	iommu/vt-d: Fix RCU-list bugs in intel_iommu_init()	Qian Cai
	There are several places traverse RCU-list without holding any lock in intel_iommu_init(). Fix them by acquiring dmar_global_lock. WARNING: suspicious RCU usage ----------------------------- drivers/iommu/intel-iommu.c:5216 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 no locks held by swapper/0/1. Call Trace: dump_stack+0xa0/0xea lockdep_rcu_suspicious+0x102/0x10b intel_iommu_init+0x947/0xb13 pci_iommu_init+0x26/0x62 do_one_initcall+0xfe/0x500 kernel_init_freeable+0x45a/0x4f8 kernel_init+0x11/0x139 ret_from_fork+0x3a/0x50 DMAR: Intel(R) Virtualization Technology for Directed I/O Fixes: d8190dc63886 ("iommu/vt-d: Enable DMA remapping after rmrr mapped") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-10	i2c: i801: Do not add ICH_RES_IO_SMI for the iTCO_wdt device	Mika Westerberg
	Martin noticed that nct6775 driver does not load properly on his system in v5.4+ kernels. The issue was bisected to commit b84398d6d7f9 ("i2c: i801: Use iTCO version 6 in Cannon Lake PCH and beyond") but it is likely not the culprit because the faulty code has been in the driver already since commit 9424693035a5 ("i2c: i801: Create iTCO device on newer Intel PCHs"). So more likely some commit that added PCI IDs of recent chipsets made the driver to create the iTCO_wdt device on Martins system. The issue was debugged to be PCI configuration access to the PMC device that is not present. This returns all 1's when read and this caused the iTCO_wdt driver to accidentally request resourses used by nct6775. It turns out that the SMI resource is only required for some ancient systems, not the ones supported by this driver. For this reason do not populate the SMI resource at all and drop all the related code. The driver now always populates the main I/O resource and only in case of SPT (Intel Sunrisepoint) compatible devices it adds another resource for the NO_REBOOT bit. These two resources are of different types so platform_get_resource() used by the iTCO_wdt driver continues to find the both resources at index 0. Link: https://lore.kernel.org/linux-hwmon/CAM1AHpQ4196tyD=HhBu-2donSsuogabkfP03v1YF26Q7_BgvgA@mail.gmail.com/ Fixes: 9424693035a5 ("i2c: i801: Create iTCO device on newer Intel PCHs") [wsa: complete fix needs all of http://patchwork.ozlabs.org/project/linux-i2c/list/?series=160959&state=*] Reported-by: Martin Volf <martin.volf.42@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2020-03-10	watchdog: iTCO_wdt: Make ICH_RES_IO_SMI optional	Mika Westerberg
	The iTCO_wdt driver only needs ICH_RES_IO_SMI I/O resource when either turn_SMI_watchdog_clear_off module parameter is set to match ->iTCO_version (or higher), and when legacy iTCO_vendorsupport is set. Modify the driver so that ICH_RES_IO_SMI is optional if the two conditions are not met. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2020-03-10	watchdog: iTCO_wdt: Export vendorsupport	Mika Westerberg
	In preparation for making ->smi_res optional the iTCO_wdt driver needs to know whether vendorsupport is being set to non-zero. For this reason export the variable. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2020-03-10	Merge tag 'gvt-fixes-2020-03-10' of https://github.com/intel/gvt-linux into ↵	Jani Nikula
	drm-intel-fixes gvt-fixes-2020-03-10 - Fix vgpu idr destroy causing timer destroy failure (Zhenyu) - Fix VBT size (Tina) Signed-off-by: Jani Nikula <jani.nikula@intel.com> From: Zhenyu Wang <zhenyuw@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200310080933.GE28483@zhen-hp.sh.intel.com
2020-03-10	Merge tag 'linux-cpupower-5.6-rc6' of ↵	Rafael J. Wysocki
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux Pull cpupower utility fix for v5.6 from Shuah Khan: "This cpupower update for Linux 5.6-rc6 consists of a fix from Mike Gilbert for build failures when -fno-common is enabled. -fno-common will be default in gcc v10." * tag 'linux-cpupower-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux: cpupower: avoid multiple definition with gcc -fno-common
2020-03-10	drm/exynos: Fix cleanup of IOMMU related objects	Marek Szyprowski
	Store the IOMMU mapping created by the device core of each Exynos DRM sub-device and restore it when the Exynos DRM driver is unbound. This fixes IOMMU initialization failure for the second time when a deferred probe is triggered from the bind() callback of master's compound DRM driver. This also fixes the following issue found using kmemleak detector: unreferenced object 0xc2137640 (size 64): comm "swapper/0", pid 1, jiffies 4294937900 (age 3127.400s) hex dump (first 32 bytes): 50 a3 14 c2 80 a2 14 c2 01 00 00 00 20 00 00 00 P........... ... 00 10 00 00 00 80 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<3acd268d>] arch_setup_dma_ops+0x4c/0x104 [<9f7d2cce>] of_dma_configure+0x19c/0x3a4 [<ba07704b>] really_probe+0xb0/0x47c [<4f510e4f>] driver_probe_device+0x78/0x1c4 [<7481a0cf>] device_driver_attach+0x58/0x60 [<0ff8f5c1>] __driver_attach+0xb8/0x158 [<86006144>] bus_for_each_dev+0x74/0xb4 [<10159dca>] bus_add_driver+0x1c0/0x200 [<8a265265>] driver_register+0x74/0x108 [<e0f3451a>] exynos_drm_init+0xb0/0x134 [<db3fc7ba>] do_one_initcall+0x90/0x458 [<6da35917>] kernel_init_freeable+0x188/0x200 [<db3f74d4>] kernel_init+0x8/0x110 [<1f3cddf9>] ret_from_fork+0x14/0x20 [<8cd12507>] 0x0 unreferenced object 0xc214a280 (size 128): comm "swapper/0", pid 1, jiffies 4294937900 (age 3127.400s) hex dump (first 32 bytes): 00 a0 ec ed 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<3acd268d>] arch_setup_dma_ops+0x4c/0x104 [<9f7d2cce>] of_dma_configure+0x19c/0x3a4 [<ba07704b>] really_probe+0xb0/0x47c [<4f510e4f>] driver_probe_device+0x78/0x1c4 [<7481a0cf>] device_driver_attach+0x58/0x60 [<0ff8f5c1>] __driver_attach+0xb8/0x158 [<86006144>] bus_for_each_dev+0x74/0xb4 [<10159dca>] bus_add_driver+0x1c0/0x200 [<8a265265>] driver_register+0x74/0x108 [<e0f3451a>] exynos_drm_init+0xb0/0x134 [<db3fc7ba>] do_one_initcall+0x90/0x458 [<6da35917>] kernel_init_freeable+0x188/0x200 [<db3f74d4>] kernel_init+0x8/0x110 [<1f3cddf9>] ret_from_fork+0x14/0x20 [<8cd12507>] 0x0 unreferenced object 0xedeca000 (size 4096): comm "swapper/0", pid 1, jiffies 4294937900 (age 3127.400s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<3acd268d>] arch_setup_dma_ops+0x4c/0x104 [<9f7d2cce>] of_dma_configure+0x19c/0x3a4 [<ba07704b>] really_probe+0xb0/0x47c [<4f510e4f>] driver_probe_device+0x78/0x1c4 [<7481a0cf>] device_driver_attach+0x58/0x60 [<0ff8f5c1>] __driver_attach+0xb8/0x158 [<86006144>] bus_for_each_dev+0x74/0xb4 [<10159dca>] bus_add_driver+0x1c0/0x200 [<8a265265>] driver_register+0x74/0x108 [<e0f3451a>] exynos_drm_init+0xb0/0x134 [<db3fc7ba>] do_one_initcall+0x90/0x458 [<6da35917>] kernel_init_freeable+0x188/0x200 [<db3f74d4>] kernel_init+0x8/0x110 [<1f3cddf9>] ret_from_fork+0x14/0x20 [<8cd12507>] 0x0 unreferenced object 0xc214a300 (size 128): comm "swapper/0", pid 1, jiffies 4294937900 (age 3127.400s) hex dump (first 32 bytes): 00 a3 14 c2 00 a3 14 c2 00 40 18 c2 00 80 18 c2 .........@...... 02 00 02 00 ad 4e ad de ff ff ff ff ff ff ff ff .....N.......... backtrace: [<08cbd8bc>] iommu_domain_alloc+0x24/0x50 [<b835abee>] arm_iommu_create_mapping+0xe4/0x134 [<3acd268d>] arch_setup_dma_ops+0x4c/0x104 [<9f7d2cce>] of_dma_configure+0x19c/0x3a4 [<ba07704b>] really_probe+0xb0/0x47c [<4f510e4f>] driver_probe_device+0x78/0x1c4 [<7481a0cf>] device_driver_attach+0x58/0x60 [<0ff8f5c1>] __driver_attach+0xb8/0x158 [<86006144>] bus_for_each_dev+0x74/0xb4 [<10159dca>] bus_add_driver+0x1c0/0x200 [<8a265265>] driver_register+0x74/0x108 [<e0f3451a>] exynos_drm_init+0xb0/0x134 [<db3fc7ba>] do_one_initcall+0x90/0x458 [<6da35917>] kernel_init_freeable+0x188/0x200 [<db3f74d4>] kernel_init+0x8/0x110 [<1f3cddf9>] ret_from_fork+0x14/0x20 Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2020-03-09	Merge tag 'batadv-net-for-davem-20200306' of git://git.open-mesh.org/linux-merge	David S. Miller
	Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - Don't schedule OGM for disabled interface, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09	net: mscc: ocelot: properly account for VLAN header length when setting MRU	Vladimir Oltean
	What the driver writes into MAC_MAXLEN_CFG does not actually represent VLAN_ETH_FRAME_LEN but instead ETH_FRAME_LEN + ETH_FCS_LEN. Yes they are numerically equal, but the difference is important, as the switch treats VLAN-tagged traffic specially and knows to increase the maximum accepted frame size automatically. So it is always wrong to account for VLAN in the MAC_MAXLEN_CFG register. Unconditionally increase the maximum allowed frame size for double-tagged traffic. Accounting for the additional length does not mean that the other VLAN membership checks aren't performed, so there's no harm done. Also, stop abusing the MTU name for configuring the MRU. There is no support for configuring the MRU on an interface at the moment. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Fixes: fa914e9c4d94 ("net: mscc: ocelot: create a helper for changing the port MTU") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09	ipvlan: do not use cond_resched_rcu() in ipvlan_process_multicast()	Eric Dumazet
	Commit e18b353f102e ("ipvlan: add cond_resched_rcu() while processing muticast backlog") added a cond_resched_rcu() in a loop using rcu protection to iterate over slaves. This is breaking rcu rules, so lets instead use cond_resched() at a point we can reschedule Fixes: e18b353f102e ("ipvlan: add cond_resched_rcu() while processing muticast backlog") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09	cgroup, netclassid: periodically release file_lock on classid updating	Dmitry Yakunin
	In our production environment we have faced with problem that updating classid in cgroup with heavy tasks cause long freeze of the file tables in this tasks. By heavy tasks we understand tasks with many threads and opened sockets (e.g. balancers). This freeze leads to an increase number of client timeouts. This patch implements following logic to fix this issue: аfter iterating 1000 file descriptors file table lock will be released thus providing a time gap for socket creation/deletion. Now update is non atomic and socket may be skipped using calls: dup2(oldfd, newfd); close(oldfd); But this case is not typical. Moreover before this patch skip is possible too by hiding socket fd in unix socket buffer. New sockets will be allocated with updated classid because cgroup state is updated before start of the file descriptors iteration. So in common cases this patch has no side effects. Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09	macvlan: add cond_resched() during multicast processing	Mahesh Bandewar
	The Rx bound multicast packets are deferred to a workqueue and macvlan can also suffer from the same attack that was discovered by Syzbot for IPvlan. This solution is not as effective as in IPvlan. IPvlan defers all (Tx and Rx) multicast packet processing to a workqueue while macvlan does this way only for the Rx. This fix should address the Rx codition to certain extent. Tx is still suseptible. Tx multicast processing happens when .ndo_start_xmit is called, hence we cannot add cond_resched(). However, it's not that severe since the user which is generating / flooding will be affected the most. Fixes: 412ca1550cbe ("macvlan: Move broadcasts into a work queue") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>