linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2021-05-12	scsi: smartpqi: Use host-wide tag space	Don Brace
	[ Upstream commit c6d3ee209b9e863c6251f72101511340451ca324 ] Correct SCSI midlayer sending more requests than exposed host queue depth causing firmware ASSERT and lockup issues by enabling host-wide tags. Note: This also results in better performance. Link: https://lore.kernel.org/r/161549369787.25025.8975999483518581619.stgit@brunhilda Suggested-by: Ming Lei <ming.lei@redhat.com> Suggested-by: John Garry <john.garry@huawei.com> Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	power: supply: cpcap-charger: Add usleep to cpcap charger to avoid usb plug ↵	Carl Philipp Klemm
	bounce [ Upstream commit 751faedf06e895a17e985a88ef5b6364ffd797ed ] Adds 80000 us sleep when the usb cable is plugged in to hopefully avoid bouncing contacts. Upon pluging in the usb cable vbus will bounce for some time, causing cpcap to dissconnect charging due to detecting an undervoltage condition. This is a scope of vbus on xt894 while quickly inserting the usb cable with firm force, probed at the far side of the usb socket and vbus loaded with approx 1k: http://uvos.xyz/maserati/usbplug.jpg. As can clearly be seen, vbus is all over the place for the first 15 ms or so with a small blip at ~40 ms this causes the cpcap to trip up and disable charging again. The delay helps cpcap_usb_detect avoid the worst of this. It is, however, still not ideal as strong vibrations can cause the issue to reapear any time during charging. I have however not been able to cause the device to stop charging due to this in practice as it is hard to vibrate the device such that the vbus pins start bouncing again but cpcap_usb_detect is not called again due to a detected disconnect/reconnect event. Signed-off-by: Carl Philipp Klemm <philipp@uvos.xyz> Tested-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	selftests/resctrl: Fix checking for < 0 for unsigned values	Fenghua Yu
	[ Upstream commit 1205b688c92558a04d8dd4cbc2b213e0fceba5db ] Dan reported following static checker warnings tools/testing/selftests/resctrl/resctrl_val.c:545 measure_vals() warn: 'bw_imc' unsigned <= 0 tools/testing/selftests/resctrl/resctrl_val.c:549 measure_vals() warn: 'bw_resc_end' unsigned <= 0 These warnings are reported because 1. measure_vals() declares 'bw_imc' and 'bw_resc_end' as unsigned long variables 2. Return value of get_mem_bw_imc() and get_mem_bw_resctrl() are assigned to 'bw_imc' and 'bw_resc_end' respectively 3. The returned values are checked for <= 0 to see if the calls failed Checking for < 0 for an unsigned value doesn't make any sense. Fix this issue by changing the implementation of get_mem_bw_imc() and get_mem_bw_resctrl() such that they now accept reference to a variable and set the variable appropriately upon success and return 0, else return < 0 on error. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Tested-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	selftests/resctrl: Fix incorrect parsing of iMC counters	Fenghua Yu
	[ Upstream commit d81343b5eedf84be71a4313e8fd073d0c510afcf ] iMC (Integrated Memory Controller) counters are usually at "/sys/bus/event_source/devices/" and are named as "uncore_imc_<n>". num_of_imcs() function tries to count number of such iMC counters so that it could appropriately initialize required number of perf_attr structures that could be used to read these iMC counters. num_of_imcs() function assumes that all the directories under this path that start with "uncore_imc" are iMC counters. But, on some systems there could be directories named as "uncore_imc_free_running" which aren't iMC counters. Trying to read from such directories will result in "not found file" errors and MBM/MBA tests will fail. Hence, fix the logic in num_of_imcs() such that it looks at the first character after "uncore_imc_" to check if it's a numerical digit or not. If it's a digit then the directory represents an iMC counter, else, skip the directory. Reported-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	selftests/resctrl: Use resctrl/info for feature detection	Fenghua Yu
	[ Upstream commit ee0415681eb661efa1eb2db7acc263f2c7df1e23 ] Resctrl test suite before running any unit test (like cmt, cat, mbm and mba) should first check if the feature is enabled (by kernel and not just supported by H/W) on the platform or not. validate_resctrl_feature_request() is supposed to do that. This function intends to grep for relevant flags in /proc/cpuinfo but there are several issues here 1. validate_resctrl_feature_request() calls fgrep() to get flags from /proc/cpuinfo. But, fgrep() can only return a string with maximum of 255 characters and hence the complete cpu flags are never returned. 2. The substring search logic is also busted. If strstr() finds requested resctrl feature in the cpu flags, it returns pointer to the first occurrence. But, the logic negates the return value of strstr() and hence validate_resctrl_feature_request() returns false if the feature is present in the cpu flags and returns true if the feature is not present. 3. validate_resctrl_feature_request() checks if a resctrl feature is reported in /proc/cpuinfo flags or not. Having a cpu flag means that the H/W supports the feature, but it doesn't mean that the kernel enabled it. A user could selectively enable only a subset of resctrl features using kernel command line arguments. Hence, /proc/cpuinfo isn't a reliable source to check if a feature is enabled or not. The 3rd issue being the major one and fixing it requires changing the way validate_resctrl_feature_request() works. Since, /proc/cpuinfo isn't the right place to check if a resctrl feature is enabled or not, a more appropriate place is /sys/fs/resctrl/info directory. Change validate_resctrl_feature_request() such that, 1. For cat, check if /sys/fs/resctrl/info/L3 directory is present or not 2. For mba, check if /sys/fs/resctrl/info/MB directory is present or not 3. For cmt, check if /sys/fs/resctrl/info/L3_MON directory is present and check if /sys/fs/resctrl/info/L3_MON/mon_features has llc_occupancy 4. For mbm, check if /sys/fs/resctrl/info/L3_MON directory is present and check if /sys/fs/resctrl/info/L3_MON/mon_features has mbm_<total/local>_bytes Please note that only L3_CAT, L3_CMT, MBA and MBM are supported. CDP and L2 variants can be added later. Reported-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	selftests/resctrl: Fix missing options "-n" and "-p"	Fenghua Yu
	[ Upstream commit d7af3d0d515cbdf63b6c3398a3c15ecb1bc2bd38 ] resctrl test suite accepts command line arguments (like -b, -t, -n and -p) as documented in the help. But passing -n and -p throws an invalid option error. This happens because -n and -p are missing in the list of characters that getopt() recognizes as valid arguments. Hence, they are treated as invalid options. Fix this by adding them to the list of characters that getopt() recognizes as valid arguments. Please note that the main() function already has the logic to deal with the values passed as part of these arguments and hence no changes are needed there. Tested-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	selftests/resctrl: Clean up resctrl features check	Fenghua Yu
	[ Upstream commit 2428673638ea28fa93d2a38b1c3e8d70122b00ee ] Checking resctrl features call strcmp() to compare feature strings (e.g. "mba", "cat" etc). The checkings are error prone and don't have good coding style. Define the constant strings in macros and call strncmp() to solve the potential issues. Suggested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	selftests/resctrl: Fix compilation issues for other global variables	Fenghua Yu
	[ Upstream commit 896016d2ad051811ff9c9c087393adc063322fbc ] Reinette reported following compilation issue on Fedora 32, gcc version 10.1.1 /usr/bin/ld: resctrl_tests.o:<src_dir>/resctrl.h:65: multiple definition of `bm_pid'; cache.o:<src_dir>/resctrl.h:65: first defined here Other variables are ppid, tests_run, llc_occup_path, is_amd. Compiler isn't happy because these variables are defined globally in two .c files but are not declared as extern. To fix issues for the global variables, declare them as extern. Chang Log: - Split this patch from v4's patch 1 (Shuah). Reported-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	selftests/resctrl: Fix compilation issues for global variables	Fenghua Yu
	[ Upstream commit 8236c51d85a64643588505a6791e022cc8d84864 ] Reinette reported following compilation issue on Fedora 32, gcc version 10.1.1 /usr/bin/ld: cqm_test.o:<src_dir>/cqm_test.c:22: multiple definition of `cache_size'; cat_test.o:<src_dir>/cat_test.c:23: first defined here The same issue is reported for long_mask, cbm_mask, count_of_bits etc variables as well. Compiler isn't happy because these variables are defined globally in two .c files namely cqm_test.c and cat_test.c and the compiler during compilation finds that the variable is already defined (multiple definition error). Taking a closer look at the usage of these variables reveals that these variables are used only locally in functions such as cqm_resctrl_val() (defined in cqm_test.c) and cat_perf_miss_val() (defined in cat_test.c). These variables are not shared between those functions. So, there is no need for these variables to be global. Hence, fix this issue by making them static variables. Reported-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	selftests/resctrl: Enable gcc checks to detect buffer overflows	Fenghua Yu
	[ Upstream commit a9d26a302dea29eb84f491b1340a57e56c631a71 ] David reported a buffer overflow error in the check_results() function of the cmt unit test and he suggested enabling _FORTIFY_SOURCE gcc compiler option to automatically detect any such errors. Feature Test Macros man page describes_FORTIFY_SOURCE as below "Defining this macro causes some lightweight checks to be performed to detect some buffer overflow errors when employing various string and memory manipulation functions (for example, memcpy, memset, stpcpy, strcpy, strncpy, strcat, strncat, sprintf, snprintf, vsprintf, vsnprintf, gets, and wide character variants thereof). For some functions, argument consistency is checked; for example, a check is made that open has been supplied with a mode argument when the specified flags include O_CREAT. Not all problems are detected, just some common cases. If _FORTIFY_SOURCE is set to 1, with compiler optimization level 1 (gcc -O1) and above, checks that shouldn't change the behavior of conforming programs are performed. With _FORTIFY_SOURCE set to 2, some more checking is added, but some conforming programs might fail. Some of the checks can be performed at compile time (via macros logic implemented in header files), and result in compiler warnings; other checks take place at run time, and result in a run-time error if the check fails. Use of this macro requires compiler support, available with gcc since version 4.0." Fix the buffer overflow error in the check_results() function of the cmt unit test and enable _FORTIFY_SOURCE gcc check to catch any future buffer overflow errors. Reported-by: David Binderman <dcb314@hotmail.com> Suggested-by: David Binderman <dcb314@hotmail.com> Tested-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	nvmet: return proper error code from discovery ctrl	Hou Pu
	[ Upstream commit 79695dcd9ad4463a82def7f42960e6d7baa76f0b ] Return NVME_SC_INVALID_FIELD from discovery controller like normal controller when executing identify or get log page command. Signed-off-by: Hou Pu <houpu.main@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	drm/komeda: Fix bit check to import to value of proper type	Carsten Haitzler
	[ Upstream commit a1c3be890440a1769ed6f822376a3e3ab0d42994 ] Another issue found by KASAN. The bit finding is buried inside the dp_for_each_set_bit() macro (that passes on to for_each_set_bit() that calls the bit stuff. These bit functions want an unsigned long pointer as input and just dumbly casting leads to out-of-bounds accesses. This fixes that. Signed-off-by: Carsten Haitzler <carsten.haitzler@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: James Qian Wang <james.qian.wang@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210204131102.68658-1-carsten.haitzler@foss.arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	ata: ahci: Disable SXS for Hisilicon Kunpeng920	Xingui Yang
	[ Upstream commit 234e6d2c18f5b080cde874483c4c361f3ae7cffe ] On Hisilicon Kunpeng920, ESP is set to 1 by default for all ports of SATA controller. In some scenarios, some ports are not external SATA ports, and it cause disks connected to these ports to be identified as removable disks. So disable the SXS capability on the software side to prevent users from mistakenly considering non-removable disks as removable disks and performing related operations. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Reviewed-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1615544676-61926-1-git-send-email-luojiaxing@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	mmc: sdhci-brcmstb: Remove CQE quirk	Al Cooper
	[ Upstream commit f0bdf98fab058efe7bf49732f70a0f26d1143154 ] Remove the CQHCI_QUIRK_SHORT_TXFR_DESC_SZ quirk because the latest chips have this fixed and earlier chips have other CQE problems that prevent the feature from being enabled. Signed-off-by: Al Cooper <alcooperx@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20210325192834.42955-1-alcooperx@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	mmc: sdhci-pci: Add PCI IDs for Intel LKF	Adrian Hunter
	[ Upstream commit ee629112be8b4eff71d4d3d108a28bc7dc877e13 ] Add PCI IDs for Intel LKF eMMC and SD card host controllers. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20210322055356.24923-1-adrian.hunter@intel.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	mmc: sdhci-esdhc-imx: validate pinctrl before use it	Peng Fan
	[ Upstream commit f410ee0aa2df050a9505f5c261953e9b18e21206 ] When imx_data->pinctrl is not a valid pointer, pinctrl_lookup_state will trigger kernel panic. When we boot Dual OS on Jailhouse hypervisor, we let the 1st Linux to configure pinmux ready for the 2nd OS, so the 2nd OS not have pinctrl settings. Similar to this commit b62eee9f804e ("mmc: sdhci-esdhc-imx: no fail when no pinctrl available"). Reviewed-by: Bough Chen <haobo.chen@nxp.com> Reviewed-by: Alice Guo <alice.guo@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Link: https://lore.kernel.org/r/1614222604-27066-6-git-send-email-peng.fan@oss.nxp.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	scsi: qla2xxx: Fix use after free in bsg	Quinn Tran
	[ Upstream commit 2ce35c0821afc2acd5ee1c3f60d149f8b2520ce8 ] On bsg command completion, bsg_job_done() was called while qla driver continued to access the bsg_job buffer. bsg_job_done() would free up resources that ended up being reused by other task while the driver continued to access the buffers. As a result, driver was reading garbage data. localhost kernel: BUG: KASAN: use-after-free in sg_next+0x64/0x80 localhost kernel: Read of size 8 at addr ffff8883228a3330 by task swapper/26/0 localhost kernel: localhost kernel: CPU: 26 PID: 0 Comm: swapper/26 Kdump: loaded Tainted: G OE --------- - - 4.18.0-193.el8.x86_64+debug #1 localhost kernel: Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 08/12/2016 localhost kernel: Call Trace: localhost kernel: <IRQ> localhost kernel: dump_stack+0x9a/0xf0 localhost kernel: print_address_description.cold.3+0x9/0x23b localhost kernel: kasan_report.cold.4+0x65/0x95 localhost kernel: debug_dma_unmap_sg.part.12+0x10d/0x2d0 localhost kernel: qla2x00_bsg_sp_free+0xaf6/0x1010 [qla2xxx] Link: https://lore.kernel.org/r/20210329085229.4367-6-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	drm/vkms: fix misuse of WARN_ON	Dmitry Vyukov
	[ Upstream commit b4142fc4d52d051d4d8df1fb6c569e5b445d369e ] vkms_vblank_simulate() uses WARN_ON for timing-dependent condition (timer overrun). This is a mis-use of WARN_ON, WARN_ON must be used to denote kernel bugs. Use pr_warn() instead. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: syzbot+4fc21a003c8332eb0bdd@syzkaller.appspotmail.com Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com> Cc: Melissa Wen <melissa.srw@gmail.com> Cc: Haneen Mohammed <hamohammed.sa@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: dri-devel@lists.freedesktop.org Cc: linux-kernel@vger.kernel.org Acked-by: Melissa Wen <melissa.srw@gmail.com> Signed-off-by: Melissa Wen <melissa.srw@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210320132840.1315853-1-dvyukov@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	scsi: qla2xxx: Always check the return value of qla24xx_get_isp_stats()	Bart Van Assche
	[ Upstream commit a2b2cc660822cae08c351c7f6b452bfd1330a4f7 ] This patch fixes the following Coverity warning: CID 361199 (#1 of 1): Unchecked return value (CHECKED_RETURN) 3. check_return: Calling qla24xx_get_isp_stats without checking return value (as is done elsewhere 4 out of 5 times). Link: https://lore.kernel.org/r/20210320232359.941-7-bvanassche@acm.org Cc: Quinn Tran <qutran@marvell.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Himanshu Madhani <himanshu.madhani@oracle.com> Cc: Daniel Wagner <dwagner@suse.de> Cc: Lee Duncan <lduncan@suse.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	drm/amd/display: fix dml prefetch validation	Dmytro Laktyushkin
	[ Upstream commit 8ee0fea4baf90e43efe2275de208a7809f9985bc ] Incorrect variable used, missing initialization during validation. Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com> Reviewed-by: Eric Bernstein <Eric.Bernstein@amd.com> Acked-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	drm/amd/display: DCHUB underflow counter increasing in some scenarios	Aric Cyr
	[ Upstream commit 4710430a779e6077d81218ac768787545bff8c49 ] [Why] When unplugging a display, the underflow counter can be seen to increase because PSTATE switch is allowed even when some planes are not blanked. [How] Check that all planes are not active instead of all streams before allowing PSTATE change. Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Acked-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	drm/amd/display: Fix UBSAN warning for not a valid value for type '_Bool'	Anson Jacob
	[ Upstream commit 6a30a92997eee49554f72b462dce90abe54a496f ] [Why] dc_cursor_position do not initialise position.translate_by_source when crtc or plane->state->fb is NULL. UBSAN caught this error in dce110_set_cursor_position, as the value was garbage. [How] Initialise dc_cursor_position structure elements to 0 in handle_cursor_update before calling get_cursor_position. Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1471 Reported-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Anson Jacob <Anson.Jacob@amd.com> Reviewed-by: Aurabindo Jayamohanan Pillai <Aurabindo.Pillai@amd.com> Acked-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	drm/amd/pm: fix workload mismatch on vega10	Kenneth Feng
	[ Upstream commit 0979d43259e13846d86ba17e451e17fec185d240 ] Workload number mapped to the correct one. This issue is only on vega10. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	drm/amdgpu : Fix asic reset regression issue introduce by 8f211fe8ac7c4f	shaoyunl
	[ Upstream commit c8941550aa66b2a90f4b32c45d59e8571e33336e ] This recent change introduce SDMA interrupt info printing with irq->process function. These functions do not require a set function to enable/disable the irq Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	drm/amdkfd: Fix UBSAN shift-out-of-bounds warning	Anson Jacob
	[ Upstream commit 50e2fc36e72d4ad672032ebf646cecb48656efe0 ] If get_num_sdma_queues or get_num_xgmi_sdma_queues is 0, we end up doing a shift operation where the number of bits shifted equals number of bits in the operand. This behaviour is undefined. Set num_sdma_queues or num_xgmi_sdma_queues to ULLONG_MAX, if the count is >= number of bits in the operand. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1472 Reported-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Anson Jacob <Anson.Jacob@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	drm/amdgpu: mask the xgmi number of hops reported from psp to kfd	Jonathan Kim
	[ Upstream commit 4ac5617c4b7d0f0a8f879997f8ceaa14636d7554 ] The psp supplies the link type in the upper 2 bits of the psp xgmi node information num_hops field. With a new link type, Aldebaran has these bits set to a non-zero value (1 = xGMI3) so the KFD topology will report the incorrect IO link weights without proper masking. The actual number of hops is located in the 3 least significant bits of this field so mask if off accordingly before passing it to the KFD. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Amber Lin <amber.lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	backlight: qcom-wled: Fix FSC update issue for WLED5	Kiran Gunda
	[ Upstream commit 4d6e9cdff7fbb6bef3e5559596fab3eeffaf95ca ] Currently, for WLED5, the FSC (Full scale current) setting is not updated properly due to driver toggling the wrong register after an FSC update. On WLED5 we should only toggle the MOD_SYNC bit after a brightness update. For an FSC update we need to toggle the SYNC bits instead. Fix it by adopting the common wled3_sync_toggle() for WLED5 and introducing new code to the brightness update path to compensate. Signed-off-by: Kiran Gunda <kgunda@codeaurora.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	backlight: qcom-wled: Use sink_addr for sync toggle	Obeida Shamoun
	[ Upstream commit cdfd4c689e2a52c313b35ddfc1852ff274f91acb ] WLED3_SINK_REG_SYNC is, as the name implies, a sink register offset. Therefore, use the sink address as base instead of the ctrl address. This fixes the sync toggle on wled4, which can be observed by the fact that adjusting brightness now works. It has no effect on wled3 because sink and ctrl base addresses are the same. This allows adjusting the brightness without having to disable then reenable the module. Signed-off-by: Obeida Shamoun <oshmoun100@googlemail.com> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org> Signed-off-by: Marijn Suijten <marijn.suijten@somainline.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Acked-by: Kiran Gunda <kgunda@codeaurora.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	power: supply: Use IRQF_ONESHOT	dongjian
	[ Upstream commit 2469b836fa835c67648acad17d62bc805236a6ea ] Fixes coccicheck error: drivers/power/supply/pm2301_charger.c:1089:7-27: ERROR: drivers/power/supply/lp8788-charger.c:502:8-28: ERROR: drivers/power/supply/tps65217_charger.c:239:8-33: ERROR: drivers/power/supply/tps65090-charger.c:303:8-33: ERROR: Threaded IRQ with no primary handler requested without IRQF_ONESHOT Signed-off-by: dongjian <dongjian@yulong.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	media: v4l2-ctrls.c: initialize flags field of p_fwht_params	Hans Verkuil
	[ Upstream commit ea1611ba3a544b34f89ffa3d1e833caab30a3f09 ] The V4L2_CID_STATELESS_FWHT_PARAMS compound control was missing a proper initialization of the flags field, so after loading the vicodec module for the first time, running v4l2-compliance for the stateless decoder would fail on this control because the initial control value was considered invalid by the vicodec driver. Initializing the flags field to sane values fixes this. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	media: gspca/sq905.c: fix uninitialized variable	Hans Verkuil
	[ Upstream commit eaaea4681984c79d2b2b160387b297477f0c1aab ] act_len can be uninitialized if usb_bulk_msg() returns an error. Set it to 0 to avoid a KMSAN error. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Reported-by: syzbot+a4e309017a5f3a24c7b3@syzkaller.appspotmail.com Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	media: media/saa7164: fix saa7164_encoder_register() memory leak bugs	Daniel Niv
	[ Upstream commit c759b2970c561e3b56aa030deb13db104262adfe ] Add a fix for the memory leak bugs that can occur when the saa7164_encoder_register() function fails. The function allocates memory without explicitly freeing it when errors occur. Add a better error handling that deallocate the unused buffers before the function exits during a fail. Signed-off-by: Daniel Niv <danielniv3@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	extcon: arizona: Fix various races on driver unbind	Hans de Goede
	[ Upstream commit e5b499f6fb17bc95a813e85d0796522280203806 ] We must free/disable all interrupts and cancel all pending works before doing further cleanup. Before this commit arizona_extcon_remove() was doing several register writes to shut things down before disabling the IRQs and it was cancelling only 1 of the 3 different works used. Move all the register-writes shutting things down to after the disabling of the IRQs and add the 2 missing cancel_delayed_work_sync() calls. This fixes various possible races on driver unbind. One of which would always trigger on devices using the mic-clamp feature for jack detection. The ARIZONA_MICD_CLAMP_MODE_MASK update was done before disabling the IRQs, causing: 1. arizona_jackdet() to run 2. detect a jack being inserted (clamp disabled means jack inserted) 3. call arizona_start_mic() which: 3.1 Enables the MICVDD regulator 3.2 takes a pm_runtime_reference And this was all happening after the ARIZONA_MICD_ENA bit clearing, which would undo 3.1 and 3.2 because the ARIZONA_MICD_CLAMP_MODE_MASK update was being done after the ARIZONA_MICD_ENA bit clearing. So this means that arizona_extcon_remove() would exit with 1. MICVDD enabled and 2. The pm_runtime_reference being unbalanced. MICVDD still being enabled caused the following oops when the regulator is released by the devm framework: [ 2850.745757] ------------[ cut here ]------------ [ 2850.745827] WARNING: CPU: 2 PID: 2098 at drivers/regulator/core.c:2123 _regulator_put.part.0+0x19f/0x1b0 [ 2850.745835] Modules linked in: extcon_arizona ... ... [ 2850.746909] Call Trace: [ 2850.746932] regulator_put+0x2d/0x40 [ 2850.746946] release_nodes+0x22a/0x260 [ 2850.746984] __device_release_driver+0x190/0x240 [ 2850.747002] driver_detach+0xd4/0x120 ... [ 2850.747337] ---[ end trace f455dfd7abd9781f ]--- Note this oops is just one of various theoretically possible races caused by the wrong ordering inside arizona_extcon_remove(), this fixes the ordering fixing all possible races, including the reported oops. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com> Tested-by: Charles Keepax <ckeepax@opensource.cirrus.com> Acked-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	extcon: arizona: Fix some issues when HPDET IRQ fires after the jack has ↵	Hans de Goede
	been unplugged [ Upstream commit c309a3e8793f7e01c4a4ec7960658380572cb576 ] When the jack is partially inserted and then removed again it may be removed while the hpdet code is running. In this case the following may happen: 1. The "JACKDET rise" or ""JACKDET fall" IRQ triggers 2. arizona_jackdet runs and takes info->lock 3. The "HPDET" IRQ triggers 4. arizona_hpdet_irq runs, blocks on info->lock 5. arizona_jackdet calls arizona_stop_mic() and clears info->hpdet_done 6. arizona_jackdet releases info->lock 7. arizona_hpdet_irq now can continue running and: 7.1 Calls arizona_start_mic() (if a mic was detected) 7.2 sets info->hpdet_done Step 7 is undesirable / a bug: 7.1 causes the device to stay in a high power-state (with MICVDD enabled) 7.2 causes hpdet to not run on the next jack insertion, which in turn causes the EXTCON_JACK_HEADPHONE state to never get set This fixes both issues by skipping these 2 steps when arizona_hpdet_irq runs after the jack has been unplugged. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com> Tested-by: Charles Keepax <ckeepax@opensource.cirrus.com> Acked-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	power: supply: bq27xxx: fix power_avg for newer ICs	Matthias Schiffer
	[ Upstream commit c4d57c22ac65bd503716062a06fad55a01569cac ] On all newer bq27xxx ICs, the AveragePower register contains a signed value; in addition to handling the raw value as unsigned, the driver code also didn't convert it to µW as expected. At least for the BQ28Z610, the reference manual incorrectly states that the value is in units of 1mW and not 10mW. I have no way of knowing whether the manuals of other supported ICs contain the same error, or if there are models that actually use 1mW. At least, the new code shouldn't be less correct than the old version for any device. power_avg is removed from the cache structure, se we don't have to extend it to store both a signed value and an error code. Always getting an up-to-date value may be desirable anyways, as it avoids inconsistent current and power readings when switching between charging and discharging. Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	atomisp: don't let it go past pipes array	Mauro Carvalho Chehab
	[ Upstream commit 1f6c45ac5fd70ab59136ab5babc7def269f3f509 ] In practice, IA_CSS_PIPE_ID_NUM should never be used when calling atomisp_q_video_buffers_to_css(), as the driver should discover the right pipe before calling it. Yet, if some pipe parsing issue happens, it could end using it. So, add a WARN_ON() to prevent such case. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	media: imx: capture: Return -EPIPE from __capture_legacy_try_fmt()	Laurent Pinchart
	[ Upstream commit cc271b6754691af74d710b761eaf027e3743e243 ] The correct return code to report an invalid pipeline configuration is -EPIPE. Return it instead of -EINVAL from __capture_legacy_try_fmt() when the capture format doesn't match the media bus format of the connected subdev. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Rui Miguel Silva <rmfrfs@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	media: cx23885: add more quirks for reset DMA on some AMD IOMMU	Brad Love
	[ Upstream commit 5f864cfbf59bfed2057bd214ce7fbf6ad420d54b ] The folowing AMD IOMMU are affected by the RiSC engine stall, requiring a reset to maintain continual operation. After being added to the broken_dev_id list the systems are functional long term. 0x1481 is the PCI ID for the IOMMU found on Starship/Matisse 0x1419 is the PCI ID for the IOMMU found on 15h (Models 10h-1fh) family 0x5a23 is the PCI ID for the IOMMU found on RD890S/RD990 Signed-off-by: Brad Love <brad@nextdimension.cc> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	media: drivers/media/usb: fix memory leak in zr364xx_probe	Pavel Skripkin
	[ Upstream commit 9c39be40c0155c43343f53e3a439290c0fec5542 ] syzbot reported memory leak in zr364xx_probe()[1]. The problem was in invalid error handling order. All error conditions rigth after v4l2_ctrl_handler_init() must call v4l2_ctrl_handler_free(). Reported-by: syzbot+efe9aefc31ae1e6f7675@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	media: drivers: media: pci: sta2x11: fix Kconfig dependency on GPIOLIB	Julian Braha
	[ Upstream commit 24df8b74c8b2fb42c49ffe8585562da0c96446ff ] When STA2X11_VIP is enabled, and GPIOLIB is disabled, Kbuild gives the following warning: WARNING: unmet direct dependencies detected for VIDEO_ADV7180 Depends on [n]: MEDIA_SUPPORT [=y] && GPIOLIB [=n] && VIDEO_V4L2 [=y] && I2C [=y] Selected by [y]: - STA2X11_VIP [=y] && MEDIA_SUPPORT [=y] && MEDIA_PCI_SUPPORT [=y] && MEDIA_CAMERA_SUPPORT [=y] && PCI [=y] && VIDEO_V4L2 [=y] && VIRT_TO_BUS [=y] && I2C [=y] && (STA2X11 [=n] \|\| COMPILE_TEST [=y]) && MEDIA_SUBDRV_AUTOSELECT [=y] This is because STA2X11_VIP selects VIDEO_ADV7180 without selecting or depending on GPIOLIB, despite VIDEO_ADV7180 depending on GPIOLIB. Signed-off-by: Julian Braha <julianbraha@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	media: ite-cir: check for receive overflow	Sean Young
	[ Upstream commit 28c7afb07ccfc0a939bb06ac1e7afe669901c65a ] It's best if this condition is reported. Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	scsi: target: pscsi: Fix warning in pscsi_complete_cmd()	Chaitanya Kulkarni
	[ Upstream commit fd48c056a32ed6e7754c7c475490f3bed54ed378 ] This fixes a compilation warning in pscsi_complete_cmd(): drivers/target/target_core_pscsi.c: In function ‘pscsi_complete_cmd’: drivers/target/target_core_pscsi.c:624:5: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] ; /* XXX: TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE */ Link: https://lore.kernel.org/r/20210228055645.22253-5-chaitanya.kulkarni@wdc.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	drm/virtio: fix possible leak/unlock virtio_gpu_object_array	xndcn
	[ Upstream commit 377f8331d0565e6f71ba081c894029a92d0c7e77 ] virtio_gpu_object array is not freed or unlocked in some failed cases. Signed-off-by: xndcn <xndchn@gmail.com> Link: http://patchwork.freedesktop.org/patch/msgid/20210305151819.14330-1-xndchn@gmail.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	kvfree_rcu: Use same set of GFP flags as does single-argument	Uladzislau Rezki (Sony)
	[ Upstream commit ee6ddf58475cce8a3d3697614679cd8cb4a6f583 ] Running an rcuscale stress-suite can lead to "Out of memory" of a system. This can happen under high memory pressure with a small amount of physical memory. For example, a KVM test configuration with 64 CPUs and 512 megabytes can result in OOM when running rcuscale with below parameters: ../kvm.sh --torture rcuscale --allcpus --duration 10 --kconfig CONFIG_NR_CPUS=64 \ --bootargs "rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=16 rcuscale.holdoff=20 \ rcuscale.kfree_loops=10000 torture.disable_onoff_at_boot" --trust-make <snip> [ 12.054448] kworker/1:1H invoked oom-killer: gfp_mask=0x2cc0(GFP_KERNEL\|__GFP_NOWARN), order=0, oom_score_adj=0 [ 12.055303] CPU: 1 PID: 377 Comm: kworker/1:1H Not tainted 5.11.0-rc3+ #510 [ 12.055416] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014 [ 12.056485] Workqueue: events_highpri fill_page_cache_func [ 12.056485] Call Trace: [ 12.056485] dump_stack+0x57/0x6a [ 12.056485] dump_header+0x4c/0x30a [ 12.056485] ? del_timer_sync+0x20/0x30 [ 12.056485] out_of_memory.cold.47+0xa/0x7e [ 12.056485] __alloc_pages_slowpath.constprop.123+0x82f/0xc00 [ 12.056485] __alloc_pages_nodemask+0x289/0x2c0 [ 12.056485] __get_free_pages+0x8/0x30 [ 12.056485] fill_page_cache_func+0x39/0xb0 [ 12.056485] process_one_work+0x1ed/0x3b0 [ 12.056485] ? process_one_work+0x3b0/0x3b0 [ 12.060485] worker_thread+0x28/0x3c0 [ 12.060485] ? process_one_work+0x3b0/0x3b0 [ 12.060485] kthread+0x138/0x160 [ 12.060485] ? kthread_park+0x80/0x80 [ 12.060485] ret_from_fork+0x22/0x30 [ 12.062156] Mem-Info: [ 12.062350] active_anon:0 inactive_anon:0 isolated_anon:0 [ 12.062350] active_file:0 inactive_file:0 isolated_file:0 [ 12.062350] unevictable:0 dirty:0 writeback:0 [ 12.062350] slab_reclaimable:2797 slab_unreclaimable:80920 [ 12.062350] mapped:1 shmem:2 pagetables:8 bounce:0 [ 12.062350] free:10488 free_pcp:1227 free_cma:0 ... [ 12.101610] Out of memory and no killable processes... [ 12.102042] Kernel panic - not syncing: System is deadlocked on memory [ 12.102583] CPU: 1 PID: 377 Comm: kworker/1:1H Not tainted 5.11.0-rc3+ #510 [ 12.102600] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014 <snip> Because kvfree_rcu() has a fallback path, memory allocation failure is not the end of the world. Furthermore, the added overhead of aggressive GFP settings must be balanced against the overhead of the fallback path, which is a cache miss for double-argument kvfree_rcu() and a call to synchronize_rcu() for single-argument kvfree_rcu(). The current choice of GFP_KERNEL\|__GFP_NOWARN can result in longer latencies than a call to synchronize_rcu(), so less-tenacious GFP flags would be helpful. Here is the tradeoff that must be balanced: a) Minimize use of the fallback path, b) Avoid pushing the system into OOM, c) Bound allocation latency to that of synchronize_rcu(), and d) Leave the emergency reserves to use cases lacking fallbacks. This commit therefore changes GFP flags from GFP_KERNEL\|__GFP_NOWARN to GFP_KERNEL\|__GFP_NORETRY\|__GFP_NOMEMALLOC\|__GFP_NOWARN. This combination leaves the emergency reserves alone and can initiate reclaim, but will not invoke the OOM killer. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	sched/topology: fix the issue groups don't span domain->span for NUMA ↵	Barry Song
	diameter > 2 [ Upstream commit 585b6d2723dc927ebc4ad884c4e879e4da8bc21f ] As long as NUMA diameter > 2, building sched_domain by sibling's child domain will definitely create a sched_domain with sched_group which will span out of the sched_domain: +------+ +------+ +-------+ +------+ \| node \| 12 \|node \| 20 \| node \| 12 \|node \| \| 0 +---------+1 +--------+ 2 +-------+3 \| +------+ +------+ +-------+ +------+ domain0 node0 node1 node2 node3 domain1 node0+1 node0+1 node2+3 node2+3 + domain2 node0+1+2 \| group: node0+1 \| group:node2+3 <-------------------+ when node2 is added into the domain2 of node0, kernel is using the child domain of node2's domain2, which is domain1(node2+3). Node 3 is outside the span of the domain including node0+1+2. This will make load_balance() run based on screwed avg_load and group_type in the sched_group spanning out of the sched_domain, and it also makes select_task_rq_fair() pick an idle CPU outside the sched_domain. Real servers which suffer from this problem include Kunpeng920 and 8-node Sun Fire X4600-M2, at least. Here we move to use the child domain of the child domain of node2's domain2 as the new added sched_group. At the same, we re-use the lower level sgc directly. +------+ +------+ +-------+ +------+ \| node \| 12 \|node \| 20 \| node \| 12 \|node \| \| 0 +---------+1 +--------+ 2 +-------+3 \| +------+ +------+ +-------+ +------+ domain0 node0 node1 +- node2 node3 \| domain1 node0+1 node0+1 \| node2+3 node2+3 \| domain2 node0+1+2 \| group: node0+1 \| group:node2 <-------------------+ While the lower level sgc is re-used, this patch only changes the remote sched_groups for those sched_domains playing grandchild trick, therefore, sgc->next_update is still safe since it's only touched by CPUs that have the group span as local group. And sgc->imbalance is also safe because sd_parent remains the same in load_balance and LB only tries other CPUs from the local group. Moreover, since local groups are not touched, they are still getting roughly equal size in a TL. And should_we_balance() only matters with local groups, so the pull probability of those groups are still roughly equal. Tested by the below topology: qemu-system-aarch64 -M virt -nographic \ -smp cpus=8 \ -numa node,cpus=0-1,nodeid=0 \ -numa node,cpus=2-3,nodeid=1 \ -numa node,cpus=4-5,nodeid=2 \ -numa node,cpus=6-7,nodeid=3 \ -numa dist,src=0,dst=1,val=12 \ -numa dist,src=0,dst=2,val=20 \ -numa dist,src=0,dst=3,val=22 \ -numa dist,src=1,dst=2,val=22 \ -numa dist,src=2,dst=3,val=12 \ -numa dist,src=1,dst=3,val=24 \ -m 4G -cpu cortex-a57 -kernel arch/arm64/boot/Image w/o patch, we get lots of "groups don't span domain->span": [ 0.802139] CPU0 attaching sched-domain(s): [ 0.802193] domain-0: span=0-1 level=MC [ 0.802443] groups: 0:{ span=0 cap=1013 }, 1:{ span=1 cap=979 } [ 0.802693] domain-1: span=0-3 level=NUMA [ 0.802731] groups: 0:{ span=0-1 cap=1992 }, 2:{ span=2-3 cap=1943 } [ 0.802811] domain-2: span=0-5 level=NUMA [ 0.802829] groups: 0:{ span=0-3 cap=3935 }, 4:{ span=4-7 cap=3937 } [ 0.802881] ERROR: groups don't span domain->span [ 0.803058] domain-3: span=0-7 level=NUMA [ 0.803080] groups: 0:{ span=0-5 mask=0-1 cap=5843 }, 6:{ span=4-7 mask=6-7 cap=4077 } [ 0.804055] CPU1 attaching sched-domain(s): [ 0.804072] domain-0: span=0-1 level=MC [ 0.804096] groups: 1:{ span=1 cap=979 }, 0:{ span=0 cap=1013 } [ 0.804152] domain-1: span=0-3 level=NUMA [ 0.804170] groups: 0:{ span=0-1 cap=1992 }, 2:{ span=2-3 cap=1943 } [ 0.804219] domain-2: span=0-5 level=NUMA [ 0.804236] groups: 0:{ span=0-3 cap=3935 }, 4:{ span=4-7 cap=3937 } [ 0.804302] ERROR: groups don't span domain->span [ 0.804520] domain-3: span=0-7 level=NUMA [ 0.804546] groups: 0:{ span=0-5 mask=0-1 cap=5843 }, 6:{ span=4-7 mask=6-7 cap=4077 } [ 0.804677] CPU2 attaching sched-domain(s): [ 0.804687] domain-0: span=2-3 level=MC [ 0.804705] groups: 2:{ span=2 cap=934 }, 3:{ span=3 cap=1009 } [ 0.804754] domain-1: span=0-3 level=NUMA [ 0.804772] groups: 2:{ span=2-3 cap=1943 }, 0:{ span=0-1 cap=1992 } [ 0.804820] domain-2: span=0-5 level=NUMA [ 0.804836] groups: 2:{ span=0-3 mask=2-3 cap=3991 }, 4:{ span=0-1,4-7 mask=4-5 cap=5985 } [ 0.804944] ERROR: groups don't span domain->span [ 0.805108] domain-3: span=0-7 level=NUMA [ 0.805134] groups: 2:{ span=0-5 mask=2-3 cap=5899 }, 6:{ span=0-1,4-7 mask=6-7 cap=6125 } [ 0.805223] CPU3 attaching sched-domain(s): [ 0.805232] domain-0: span=2-3 level=MC [ 0.805249] groups: 3:{ span=3 cap=1009 }, 2:{ span=2 cap=934 } [ 0.805319] domain-1: span=0-3 level=NUMA [ 0.805336] groups: 2:{ span=2-3 cap=1943 }, 0:{ span=0-1 cap=1992 } [ 0.805383] domain-2: span=0-5 level=NUMA [ 0.805399] groups: 2:{ span=0-3 mask=2-3 cap=3991 }, 4:{ span=0-1,4-7 mask=4-5 cap=5985 } [ 0.805458] ERROR: groups don't span domain->span [ 0.805605] domain-3: span=0-7 level=NUMA [ 0.805626] groups: 2:{ span=0-5 mask=2-3 cap=5899 }, 6:{ span=0-1,4-7 mask=6-7 cap=6125 } [ 0.805712] CPU4 attaching sched-domain(s): [ 0.805721] domain-0: span=4-5 level=MC [ 0.805738] groups: 4:{ span=4 cap=984 }, 5:{ span=5 cap=924 } [ 0.805787] domain-1: span=4-7 level=NUMA [ 0.805803] groups: 4:{ span=4-5 cap=1908 }, 6:{ span=6-7 cap=2029 } [ 0.805851] domain-2: span=0-1,4-7 level=NUMA [ 0.805867] groups: 4:{ span=4-7 cap=3937 }, 0:{ span=0-3 cap=3935 } [ 0.805915] ERROR: groups don't span domain->span [ 0.806108] domain-3: span=0-7 level=NUMA [ 0.806130] groups: 4:{ span=0-1,4-7 mask=4-5 cap=5985 }, 2:{ span=0-3 mask=2-3 cap=3991 } [ 0.806214] CPU5 attaching sched-domain(s): [ 0.806222] domain-0: span=4-5 level=MC [ 0.806240] groups: 5:{ span=5 cap=924 }, 4:{ span=4 cap=984 } [ 0.806841] domain-1: span=4-7 level=NUMA [ 0.806866] groups: 4:{ span=4-5 cap=1908 }, 6:{ span=6-7 cap=2029 } [ 0.806934] domain-2: span=0-1,4-7 level=NUMA [ 0.806953] groups: 4:{ span=4-7 cap=3937 }, 0:{ span=0-3 cap=3935 } [ 0.807004] ERROR: groups don't span domain->span [ 0.807312] domain-3: span=0-7 level=NUMA [ 0.807386] groups: 4:{ span=0-1,4-7 mask=4-5 cap=5985 }, 2:{ span=0-3 mask=2-3 cap=3991 } [ 0.807686] CPU6 attaching sched-domain(s): [ 0.807710] domain-0: span=6-7 level=MC [ 0.807750] groups: 6:{ span=6 cap=1017 }, 7:{ span=7 cap=1012 } [ 0.807840] domain-1: span=4-7 level=NUMA [ 0.807870] groups: 6:{ span=6-7 cap=2029 }, 4:{ span=4-5 cap=1908 } [ 0.807952] domain-2: span=0-1,4-7 level=NUMA [ 0.807985] groups: 6:{ span=4-7 mask=6-7 cap=4077 }, 0:{ span=0-5 mask=0-1 cap=5843 } [ 0.808045] ERROR: groups don't span domain->span [ 0.808257] domain-3: span=0-7 level=NUMA [ 0.808571] groups: 6:{ span=0-1,4-7 mask=6-7 cap=6125 }, 2:{ span=0-5 mask=2-3 cap=5899 } [ 0.808848] CPU7 attaching sched-domain(s): [ 0.808860] domain-0: span=6-7 level=MC [ 0.808880] groups: 7:{ span=7 cap=1012 }, 6:{ span=6 cap=1017 } [ 0.808953] domain-1: span=4-7 level=NUMA [ 0.808974] groups: 6:{ span=6-7 cap=2029 }, 4:{ span=4-5 cap=1908 } [ 0.809034] domain-2: span=0-1,4-7 level=NUMA [ 0.809055] groups: 6:{ span=4-7 mask=6-7 cap=4077 }, 0:{ span=0-5 mask=0-1 cap=5843 } [ 0.809128] ERROR: groups don't span domain->span [ 0.810361] domain-3: span=0-7 level=NUMA [ 0.810400] groups: 6:{ span=0-1,4-7 mask=6-7 cap=5961 }, 2:{ span=0-5 mask=2-3 cap=5903 } w/ patch, we don't get "groups don't span domain->span" any more: [ 1.486271] CPU0 attaching sched-domain(s): [ 1.486820] domain-0: span=0-1 level=MC [ 1.500924] groups: 0:{ span=0 cap=980 }, 1:{ span=1 cap=994 } [ 1.515717] domain-1: span=0-3 level=NUMA [ 1.515903] groups: 0:{ span=0-1 cap=1974 }, 2:{ span=2-3 cap=1989 } [ 1.516989] domain-2: span=0-5 level=NUMA [ 1.517124] groups: 0:{ span=0-3 cap=3963 }, 4:{ span=4-5 cap=1949 } [ 1.517369] domain-3: span=0-7 level=NUMA [ 1.517423] groups: 0:{ span=0-5 mask=0-1 cap=5912 }, 6:{ span=4-7 mask=6-7 cap=4054 } [ 1.520027] CPU1 attaching sched-domain(s): [ 1.520097] domain-0: span=0-1 level=MC [ 1.520184] groups: 1:{ span=1 cap=994 }, 0:{ span=0 cap=980 } [ 1.520429] domain-1: span=0-3 level=NUMA [ 1.520487] groups: 0:{ span=0-1 cap=1974 }, 2:{ span=2-3 cap=1989 } [ 1.520687] domain-2: span=0-5 level=NUMA [ 1.520744] groups: 0:{ span=0-3 cap=3963 }, 4:{ span=4-5 cap=1949 } [ 1.520948] domain-3: span=0-7 level=NUMA [ 1.521038] groups: 0:{ span=0-5 mask=0-1 cap=5912 }, 6:{ span=4-7 mask=6-7 cap=4054 } [ 1.522068] CPU2 attaching sched-domain(s): [ 1.522348] domain-0: span=2-3 level=MC [ 1.522606] groups: 2:{ span=2 cap=1003 }, 3:{ span=3 cap=986 } [ 1.522832] domain-1: span=0-3 level=NUMA [ 1.522885] groups: 2:{ span=2-3 cap=1989 }, 0:{ span=0-1 cap=1974 } [ 1.523043] domain-2: span=0-5 level=NUMA [ 1.523092] groups: 2:{ span=0-3 mask=2-3 cap=4037 }, 4:{ span=4-5 cap=1949 } [ 1.523302] domain-3: span=0-7 level=NUMA [ 1.523352] groups: 2:{ span=0-5 mask=2-3 cap=5986 }, 6:{ span=0-1,4-7 mask=6-7 cap=6102 } [ 1.523748] CPU3 attaching sched-domain(s): [ 1.523774] domain-0: span=2-3 level=MC [ 1.523825] groups: 3:{ span=3 cap=986 }, 2:{ span=2 cap=1003 } [ 1.524009] domain-1: span=0-3 level=NUMA [ 1.524086] groups: 2:{ span=2-3 cap=1989 }, 0:{ span=0-1 cap=1974 } [ 1.524281] domain-2: span=0-5 level=NUMA [ 1.524331] groups: 2:{ span=0-3 mask=2-3 cap=4037 }, 4:{ span=4-5 cap=1949 } [ 1.524534] domain-3: span=0-7 level=NUMA [ 1.524586] groups: 2:{ span=0-5 mask=2-3 cap=5986 }, 6:{ span=0-1,4-7 mask=6-7 cap=6102 } [ 1.524847] CPU4 attaching sched-domain(s): [ 1.524873] domain-0: span=4-5 level=MC [ 1.524954] groups: 4:{ span=4 cap=958 }, 5:{ span=5 cap=991 } [ 1.525105] domain-1: span=4-7 level=NUMA [ 1.525153] groups: 4:{ span=4-5 cap=1949 }, 6:{ span=6-7 cap=2006 } [ 1.525368] domain-2: span=0-1,4-7 level=NUMA [ 1.525428] groups: 4:{ span=4-7 cap=3955 }, 0:{ span=0-1 cap=1974 } [ 1.532726] domain-3: span=0-7 level=NUMA [ 1.532811] groups: 4:{ span=0-1,4-7 mask=4-5 cap=6003 }, 2:{ span=0-3 mask=2-3 cap=4037 } [ 1.534125] CPU5 attaching sched-domain(s): [ 1.534159] domain-0: span=4-5 level=MC [ 1.534303] groups: 5:{ span=5 cap=991 }, 4:{ span=4 cap=958 } [ 1.534490] domain-1: span=4-7 level=NUMA [ 1.534572] groups: 4:{ span=4-5 cap=1949 }, 6:{ span=6-7 cap=2006 } [ 1.534734] domain-2: span=0-1,4-7 level=NUMA [ 1.534783] groups: 4:{ span=4-7 cap=3955 }, 0:{ span=0-1 cap=1974 } [ 1.536057] domain-3: span=0-7 level=NUMA [ 1.536430] groups: 4:{ span=0-1,4-7 mask=4-5 cap=6003 }, 2:{ span=0-3 mask=2-3 cap=3896 } [ 1.536815] CPU6 attaching sched-domain(s): [ 1.536846] domain-0: span=6-7 level=MC [ 1.536934] groups: 6:{ span=6 cap=1005 }, 7:{ span=7 cap=1001 } [ 1.537144] domain-1: span=4-7 level=NUMA [ 1.537262] groups: 6:{ span=6-7 cap=2006 }, 4:{ span=4-5 cap=1949 } [ 1.537553] domain-2: span=0-1,4-7 level=NUMA [ 1.537613] groups: 6:{ span=4-7 mask=6-7 cap=4054 }, 0:{ span=0-1 cap=1805 } [ 1.537872] domain-3: span=0-7 level=NUMA [ 1.537998] groups: 6:{ span=0-1,4-7 mask=6-7 cap=6102 }, 2:{ span=0-5 mask=2-3 cap=5845 } [ 1.538448] CPU7 attaching sched-domain(s): [ 1.538505] domain-0: span=6-7 level=MC [ 1.538586] groups: 7:{ span=7 cap=1001 }, 6:{ span=6 cap=1005 } [ 1.538746] domain-1: span=4-7 level=NUMA [ 1.538798] groups: 6:{ span=6-7 cap=2006 }, 4:{ span=4-5 cap=1949 } [ 1.539048] domain-2: span=0-1,4-7 level=NUMA [ 1.539111] groups: 6:{ span=4-7 mask=6-7 cap=4054 }, 0:{ span=0-1 cap=1805 } [ 1.539571] domain-3: span=0-7 level=NUMA [ 1.539610] groups: 6:{ span=0-1,4-7 mask=6-7 cap=6102 }, 2:{ span=0-5 mask=2-3 cap=5845 } Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Meelis Roos <mroos@linux.ee> Link: https://lkml.kernel.org/r/20210224030944.15232-1-song.bao.hua@hisilicon.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	sched/pelt: Fix task util_est update filtering	Vincent Donnefort
	[ Upstream commit b89997aa88f0b07d8a6414c908af75062103b8c9 ] Being called for each dequeue, util_est reduces the number of its updates by filtering out when the EWMA signal is different from the task util_avg by less than 1%. It is a problem for a sudden util_avg ramp-up. Due to the decay from a previous high util_avg, EWMA might now be close enough to the new util_avg. No update would then happen while it would leave ue.enqueued with an out-of-date value. Taking into consideration the two util_est members, EWMA and enqueued for the filtering, ensures, for both, an up-to-date value. This is for now an issue only for the trace probe that might return the stale value. Functional-wise, it isn't a problem, as the value is always accessed through max(enqueued, ewma). This problem has been observed using LISA's UtilConvergence:test_means on the sd845c board. No regression observed with Hackbench on sd845c and Perf-bench sched pipe on hikey/hikey960. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210225165820.1377125-1-vincent.donnefort@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	drm/amdgpu: Fix some unload driver issues	Emily Deng
	[ Upstream commit bb0cd09be45ea457f25fdcbcb3d6cf2230f26c46 ] When unloading driver after killing some applications, it will hit sdma flush tlb job timeout which is called by ttm_bo_delay_delete. So to avoid the job submit after fence driver fini, call ttm_bo_lock_delayed_workqueue before fence driver fini. And also put drm_sched_fini before waiting fence. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	scsi: lpfc: Fix ADISC handling that never frees nodes	James Smart
	[ Upstream commit 309b477462df7542355ac984674a6e89c01c89aa ] While testing target port swap test with ADISC enabled, several nodes remain in UNUSED state. These nodes are never freed and rmmod hangs for long time before finising with "0233 Nodelist not empty" error. During PLOGI completion lpfc_plogi_confirm_nport() looks for existing nodes with same WWPN. If found, the existing node is used to continue discovery. The node on which plogi was performed is freed. When ADISC is enabled, an ADISC els request is triggered in response to an RSCN. It's possible that the ADISC may be rejected by the remote port causing the ADISC completion handler to clear the port and node name in the node. If this occurs, if a PLOGI is received it causes a node lookup based on wwpn to now fail, causing the port swap logic to kick in which allocates a new node and swaps to it. This effectively orphans the original node structure. Fix the situation by detecting when the lookup fails and forgo the node swap and node allocation by using the node on which the PLOGI was issued. Link: https://lore.kernel.org/r/20210301171821.3427-15-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	scsi: lpfc: Fix PLOGI ACC to be transmit after REG_LOGIN	James Smart
	[ Upstream commit 143753059b8b957f1cf4355338a3e3a32f3a85bf ] The driver is seeing a scenario where PLOGI response was issued and traffic is arriving while the adapter is still setting up the login context. This is resulting in errors handling the traffic. Change the driver so that PLOGI response is sent after the login context has been setup to avoid the situation. Link: https://lore.kernel.org/r/20210301171821.3427-14-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	scsi: lpfc: Fix status returned in lpfc_els_retry() error exit path	James Smart
	[ Upstream commit 148bc64d38fe314475a074c4f757ec9d84537d1c ] An unlikely error exit path from lpfc_els_retry() returns incorrect status to a caller, erroneously indicating that a retry has been successfully issued or scheduled. Change error exit path to indicate no retry. Link: https://lore.kernel.org/r/20210301171821.3427-12-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>