summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-02-05kvm: mmu: Replace unsigned with unsigned int for PTE accessBen Gardon
There are several functions which pass an access permission mask for SPTEs as an unsigned. This works, but checkpatch complains about it. Switch the occurrences of unsigned to unsigned int to satisfy checkpatch. No functional change expected. Tested by running kvm-unit-tests on an Intel Haswell machine. This commit introduced no new failures. Signed-off-by: Ben Gardon <bgardon@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: nVMX: Remove stale comment from nested_vmx_load_cr3()Sean Christopherson
The blurb pertaining to the return value of nested_vmx_load_cr3() no longer matches reality, remove it entirely as the behavior it is attempting to document is quite obvious when reading the actual code. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: MIPS: Fold comparecount_func() into comparecount_wakeup()Sean Christopherson
Fold kvm_mips_comparecount_func() into kvm_mips_comparecount_wakeup() to eliminate the nondescript function name as well as its unnecessary cast of a vcpu to "unsigned long" and back to a vcpu. Presumably func() was used as a callback at some point during pre-upstream development, as wakeup() is the only user of func() and has been the only user since both with introduced by commit 669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM"). Cc: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: MIPS: Fix a build error due to referencing not-yet-defined functionSean Christopherson
Hoist kvm_mips_comparecount_wakeup() above its only user, kvm_arch_vcpu_create() to fix a compilation error due to referencing an undefined function. Fixes: d11dfed5d700 ("KVM: MIPS: Move all vcpu init code into kvm_arch_vcpu_create()") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05x86/kvm: do not setup pv tlb flush when not paravirtualizedThadeu Lima de Souza Cascardo
kvm_setup_pv_tlb_flush will waste memory and print a misguiding message when KVM paravirtualization is not available. Intel SDM says that the when cpuid is used with EAX higher than the maximum supported value for basic of extended function, the data for the highest supported basic function will be returned. So, in some systems, kvm_arch_para_features will return bogus data, causing kvm_setup_pv_tlb_flush to detect support for pv tlb flush. Testing for kvm_para_available will work as it checks for the hypervisor signature. Besides, when the "nopv" command line parameter is used, it should not continue as well, as kvm_guest_init will no be called in that case. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: fix overflow of zero page refcount with ksm runningZhuang Yanying
We are testing Virtual Machine with KSM on v5.4-rc2 kernel, and found the zero_page refcount overflow. The cause of refcount overflow is increased in try_async_pf (get_user_page) without being decreased in mmu_set_spte() while handling ept violation. In kvm_release_pfn_clean(), only unreserved page will call put_page. However, zero page is reserved. So, as well as creating and destroy vm, the refcount of zero page will continue to increase until it overflows. step1: echo 10000 > /sys/kernel/pages_to_scan/pages_to_scan echo 1 > /sys/kernel/pages_to_scan/run echo 1 > /sys/kernel/pages_to_scan/use_zero_pages step2: just create several normal qemu kvm vms. And destroy it after 10s. Repeat this action all the time. After a long period of time, all domains hang because of the refcount of zero page overflow. Qemu print error log as follow: … error: kvm run failed Bad address EAX=00006cdc EBX=00000008 ECX=80202001 EDX=078bfbfd ESI=ffffffff EDI=00000000 EBP=00000008 ESP=00006cc4 EIP=000efd75 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA] SS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] DS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] FS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] GS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy GDT= 000f7070 00000037 IDT= 000f70ae 00000000 CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 EFER=0000000000000000 Code=00 01 00 00 00 e9 e8 00 00 00 c7 05 4c 55 0f 00 01 00 00 00 <8b> 35 00 00 01 00 8b 3d 04 00 01 00 b8 d8 d3 00 00 c1 e0 08 0c ea a3 00 00 01 00 c7 05 04 … Meanwhile, a kernel warning is departed. [40914.836375] WARNING: CPU: 3 PID: 82067 at ./include/linux/mm.h:987 try_get_page+0x1f/0x30 [40914.836412] CPU: 3 PID: 82067 Comm: CPU 0/KVM Kdump: loaded Tainted: G OE 5.2.0-rc2 #5 [40914.836415] RIP: 0010:try_get_page+0x1f/0x30 [40914.836417] Code: 40 00 c3 0f 1f 84 00 00 00 00 00 48 8b 47 08 a8 01 75 11 8b 47 34 85 c0 7e 10 f0 ff 47 34 b8 01 00 00 00 c3 48 8d 78 ff eb e9 <0f> 0b 31 c0 c3 66 90 66 2e 0f 1f 84 00 0 0 00 00 00 48 8b 47 08 a8 [40914.836418] RSP: 0018:ffffb4144e523988 EFLAGS: 00010286 [40914.836419] RAX: 0000000080000000 RBX: 0000000000000326 RCX: 0000000000000000 [40914.836420] RDX: 0000000000000000 RSI: 00004ffdeba10000 RDI: ffffdf07093f6440 [40914.836421] RBP: ffffdf07093f6440 R08: 800000424fd91225 R09: 0000000000000000 [40914.836421] R10: ffff9eb41bfeebb8 R11: 0000000000000000 R12: ffffdf06bbd1e8a8 [40914.836422] R13: 0000000000000080 R14: 800000424fd91225 R15: ffffdf07093f6440 [40914.836423] FS: 00007fb60ffff700(0000) GS:ffff9eb4802c0000(0000) knlGS:0000000000000000 [40914.836425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [40914.836426] CR2: 0000000000000000 CR3: 0000002f220e6002 CR4: 00000000003626e0 [40914.836427] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [40914.836427] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [40914.836428] Call Trace: [40914.836433] follow_page_pte+0x302/0x47b [40914.836437] __get_user_pages+0xf1/0x7d0 [40914.836441] ? irq_work_queue+0x9/0x70 [40914.836443] get_user_pages_unlocked+0x13f/0x1e0 [40914.836469] __gfn_to_pfn_memslot+0x10e/0x400 [kvm] [40914.836486] try_async_pf+0x87/0x240 [kvm] [40914.836503] tdp_page_fault+0x139/0x270 [kvm] [40914.836523] kvm_mmu_page_fault+0x76/0x5e0 [kvm] [40914.836588] vcpu_enter_guest+0xb45/0x1570 [kvm] [40914.836632] kvm_arch_vcpu_ioctl_run+0x35d/0x580 [kvm] [40914.836645] kvm_vcpu_ioctl+0x26e/0x5d0 [kvm] [40914.836650] do_vfs_ioctl+0xa9/0x620 [40914.836653] ksys_ioctl+0x60/0x90 [40914.836654] __x64_sys_ioctl+0x16/0x20 [40914.836658] do_syscall_64+0x5b/0x180 [40914.836664] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [40914.836666] RIP: 0033:0x7fb61cb6bfc7 Signed-off-by: LinFeng <linfeng23@huawei.com> Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05qed: Fix timestamping issue for L2 unicast ptp packets.Sudarsana Reddy Kalluru
commit cedeac9df4b8 ("qed: Add support for Timestamping the unicast PTP packets.") handles the timestamping of L4 ptp packets only. This patch adds driver changes to detect/timestamp both L2/L4 unicast PTP packets. Fixes: cedeac9df4b8 ("qed: Add support for Timestamping the unicast PTP packets.") Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05KVM: x86: Take a u64 when checking for a valid dr7 valueSean Christopherson
Take a u64 instead of an unsigned long in kvm_dr7_valid() to fix a build warning on i386 due to right-shifting a 32-bit value by 32 when checking for bits being set in dr7[63:32]. Alternatively, the warning could be resolved by rewriting the check to use an i386-friendly method, but taking a u64 fixes another oddity on 32-bit KVM. Beause KVM implements natural width VMCS fields as u64s to avoid layout issues between 32-bit and 64-bit, a devious guest can stuff vmcs12->guest_dr7 with a 64-bit value even when both the guest and host are 32-bit kernels. KVM eventually drops vmcs12->guest_dr7[63:32] when propagating vmcs12->guest_dr7 to vmcs02, but ideally KVM would not rely on that behavior for correctness. Cc: Jim Mattson <jmattson@google.com> Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com> Fixes: ecb697d10f70 ("KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: x86: use raw clock values consistentlyPaolo Bonzini
Commit 53fafdbb8b21f ("KVM: x86: switch KVMCLOCK base to monotonic raw clock") changed kvmclock to use tkr_raw instead of tkr_mono. However, the default kvmclock_offset for the VM was still based on the monotonic clock and, if the raw clock drifted enough from the monotonic clock, this could cause a negative system_time to be written to the guest's struct pvclock. RHEL5 does not like it and (if it boots fast enough to observe a negative time value) it hangs. There is another thing to be careful about: getboottime64 returns the host boot time with tkr_mono frequency, and subtracting the tkr_raw-based kvmclock value will cause the wallclock to be off if tkr_raw drifts from tkr_mono. To avoid this, compute the wallclock delta from the current time instead of being clever and using getboottime64. Fixes: 53fafdbb8b21f ("KVM: x86: switch KVMCLOCK base to monotonic raw clock") Cc: stable@vger.kernel.org Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: x86: reorganize pvclock_gtod_data membersPaolo Bonzini
We will need a copy of tk->offs_boot in the next patch. Store it and cleanup the struct: instead of storing tk->tkr_xxx.base with the tk->offs_boot included, store the raw value in struct pvclock_clock and sum it in do_monotonic_raw and do_realtime. tk->tkr_xxx.xtime_nsec also moves to struct pvclock_clock. While at it, fix a (usually harmless) typo in do_monotonic_raw, which was using gtod->clock.shift instead of gtod->raw_clock.shift. Fixes: 53fafdbb8b21f ("KVM: x86: switch KVMCLOCK base to monotonic raw clock") Cc: stable@vger.kernel.org Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: nVMX: delete meaningless nested_vmx_run() declarationMiaohe Lin
The function nested_vmx_run() declaration is below its implementation. So this is meaningless and should be removed. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: SVM: allow AVIC without split irqchipPaolo Bonzini
SVM is now able to disable AVIC dynamically whenever the in-kernel PIT sets up an ack notifier, so we can enable it even if in-kernel IOAPIC/PIC/PIT are in use. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: ioapic: Lazy update IOAPIC EOISuravee Suthikulpanit
In-kernel IOAPIC does not receive EOI with AMD SVM AVIC since the processor accelerate write to APIC EOI register and does not trap if the interrupt is edge-triggered. Workaround this by lazy check for pending APIC EOI at the time when setting new IOPIC irq, and update IOAPIC EOI if no pending APIC EOI. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: ioapic: Refactor kvm_ioapic_update_eoi()Suravee Suthikulpanit
Refactor code for handling IOAPIC EOI for subsequent patch. There is no functional change. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: i8254: Deactivate APICv when using in-kernel PIT re-injection mode.Suravee Suthikulpanit
AMD SVM AVIC accelerates EOI write and does not trap. This causes in-kernel PIT re-injection mode to fail since it relies on irq-ack notifier mechanism. So, APICv is activated only when in-kernel PIT is in discard mode e.g. w/ qemu option: -global kvm-pit.lost_tick_policy=discard Also, introduce APICV_INHIBIT_REASON_PIT_REINJ bit to be used for this reason. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05svm: Temporarily deactivate AVIC during ExtINT handlingSuravee Suthikulpanit
AMD AVIC does not support ExtINT. Therefore, AVIC must be temporary deactivated and fall back to using legacy interrupt injection via vINTR and interrupt window. Also, introduce APICV_INHIBIT_REASON_IRQWIN to be used for this reason. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> [Rename svm_request_update_avic to svm_toggle_avic_for_extint. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05svm: Deactivate AVIC when launching guest with nested SVM supportSuravee Suthikulpanit
Since AVIC does not currently work w/ nested virtualization, deactivate AVIC for the guest if setting CPUID Fn80000001_ECX[SVM] (i.e. indicate support for SVM, which is needed for nested virtualization). Also, introduce a new APICV_INHIBIT_REASON_NESTED bit to be used for this reason. Suggested-by: Alexander Graf <graf@amazon.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: hyperv: Use APICv update request interfaceSuravee Suthikulpanit
Since disabling APICv has to be done for all vcpus on AMD-based system, adopt the newly introduced kvm_request_apicv_update() interface, and introduce a new APICV_INHIBIT_REASON_HYPERV. Also, remove the kvm_vcpu_deactivate_apicv() since no longer used. Cc: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05svm: Add support for dynamic APICvSuravee Suthikulpanit
Add necessary logics to support (de)activate AVIC at runtime. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Introduce x86 ops hook for pre-update APICvSuravee Suthikulpanit
AMD SVM AVIC needs to update APIC backing page mapping before changing APICv mode. Introduce struct kvm_x86_ops.pre_update_apicv_exec_ctrl function hook to be called prior KVM APICv update request to each vcpu. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Introduce APICv x86 ops for checking APIC inhibit reasonsSuravee Suthikulpanit
Inibit reason bits are used to determine if APICv deactivation is applicable for a particular hardware virtualization architecture. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: svm: avic: Add support for dynamic setup/teardown of virtual APIC ↵Suravee Suthikulpanit
backing page Re-factor avic_init_access_page() to avic_update_access_page() since activate/deactivate AVIC requires setting/unsetting the memory region used for virtual APIC backing page (APIC_ACCESS_PAGE_PRIVATE_MEMSLOT). Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: svm: Add support to (de)activate posted interruptsSuravee Suthikulpanit
Introduce interface for (de)activate posted interrupts, and implement SVM hooks to toggle AMD IOMMU guest virtual APIC mode. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Add APICv (de)activate request trace pointsSuravee Suthikulpanit
Add trace points when sending request to (de)activate APICv. Suggested-by: Alexander Graf <graf@amazon.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Add support for dynamic APICv activationSuravee Suthikulpanit
Certain runtime conditions require APICv to be temporary deactivated during runtime. The current implementation only support run-time deactivation of APICv when Hyper-V SynIC is enabled, which is not temporary. In addition, for AMD, when APICv is (de)activated at runtime, all vcpus in the VM have to operate in the same mode. Thus the requesting vcpu must notify the others. So, introduce the following: * A new KVM_REQ_APICV_UPDATE request bit * Interfaces to request all vcpus to update APICv status * A new interface to update APICV-related parameters for each vcpu Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: x86: remove get_enable_apicv from kvm_x86_opsPaolo Bonzini
It is unused now. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Introduce APICv inhibit reason bitsSuravee Suthikulpanit
There are several reasons in which a VM needs to deactivate APICv e.g. disable APICv via parameter during module loading, or when enable Hyper-V SynIC support. Additional inhibit reasons will be introduced later on when dynamic APICv is supported, Introduce KVM APICv inhibit reason bits along with a new variable, apicv_inhibit_reasons, to help keep track of APICv state for each VM, Initially, the APICV_INHIBIT_REASON_DISABLE bit is used to indicate the case where APICv is disabled during KVM module load. (e.g. insmod kvm_amd avic=0 or insmod kvm_intel enable_apicv=0). Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> [Do not use get_enable_apicv; consider irqchip_split in svm.c. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: lapic: Introduce APICv update helper functionSuravee Suthikulpanit
Re-factor code into a helper function for setting lapic parameters when activate/deactivate APICv, and export the function for subsequent usage. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05Merge branch 'macb-TSO-bug-fixes'David S. Miller
Harini Katakam says: ==================== macb: TSO bug fixes An IP errata was recently discovered when testing TSO enabled versions with perf test tools where a false amba error is reported by the IP. Some ways to reproduce would be to use iperf or applications with payload descriptor sizes very close to 16K. Once the error is observed TXERR (or bit 6 of ISR) will be constantly triggered leading to a series of tx path error handling and clean up. Workaround the same by limiting this size to 0x3FC0 as recommended by Cadence. There was no performance impact on 1G system that I tested with. Note on patch 1: The alignment code may be unused but leaving it there in case anyone is using UFO. Added Fixes tag to patch 1. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05net: macb: Limit maximum GEM TX length in TSOHarini Katakam
GEM_MAX_TX_LEN currently resolves to 0x3FF8 for any IP version supporting TSO with full 14bits of length field in payload descriptor. But an IP errata causes false amba_error (bit 6 of ISR) when length in payload descriptors is specified above 16387. The error occurs because the DMA falsely concludes that there is not enough space in SRAM for incoming payload. These errors were observed continuously under stress of large packets using iperf on a version where SRAM was 16K for each queue. This errata will be documented shortly and affects all versions since TSO functionality was added. Hence limit the max length to 0x3FC0 (rounded). Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05net: macb: Remove unnecessary alignment check for TSOHarini Katakam
The IP TSO implementation does NOT require the length to be a multiple of 8. That is only a requirement for UFO as per IP documentation. Hence, exit macb_features_check function in the beginning if the protocol is not UDP. Only when it is UDP, proceed further to the alignment checks. Update comments to reflect the same. Also remove dead code checking for protocol TCP when calculating header length. Fixes: 1629dd4f763c ("cadence: Add LSO support.") Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05bonding/alb: properly access headers in bond_alb_xmit()Eric Dumazet
syzbot managed to send an IPX packet through bond_alb_xmit() and af_packet and triggered a use-after-free. First, bond_alb_xmit() was using ipx_hdr() helper to reach the IPX header, but ipx_hdr() was using the transport offset instead of the network offset. In the particular syzbot report transport offset was 0xFFFF This patch removes ipx_hdr() since it was only (mis)used from bonding. Then we need to make sure IPv4/IPv6/IPX headers are pulled in skb->head before dereferencing anything. BUG: KASAN: use-after-free in bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452 Read of size 2 at addr ffff8801ce56dfff by task syz-executor.2/18108 (if (ipx_hdr(skb)->ipx_checksum != IPX_NO_CHECKSUM) ...) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: [<ffffffff8441fc42>] __dump_stack lib/dump_stack.c:17 [inline] [<ffffffff8441fc42>] dump_stack+0x14d/0x20b lib/dump_stack.c:53 [<ffffffff81a7dec4>] print_address_description+0x6f/0x20b mm/kasan/report.c:282 [<ffffffff81a7e0ec>] kasan_report_error mm/kasan/report.c:380 [inline] [<ffffffff81a7e0ec>] kasan_report mm/kasan/report.c:438 [inline] [<ffffffff81a7e0ec>] kasan_report.cold+0x8c/0x2a0 mm/kasan/report.c:422 [<ffffffff81a7dc4f>] __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:469 [<ffffffff82c8c00a>] bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452 [<ffffffff82c60c74>] __bond_start_xmit drivers/net/bonding/bond_main.c:4199 [inline] [<ffffffff82c60c74>] bond_start_xmit+0x4f4/0x1570 drivers/net/bonding/bond_main.c:4224 [<ffffffff83baa558>] __netdev_start_xmit include/linux/netdevice.h:4525 [inline] [<ffffffff83baa558>] netdev_start_xmit include/linux/netdevice.h:4539 [inline] [<ffffffff83baa558>] xmit_one net/core/dev.c:3611 [inline] [<ffffffff83baa558>] dev_hard_start_xmit+0x168/0x910 net/core/dev.c:3627 [<ffffffff83bacf35>] __dev_queue_xmit+0x1f55/0x33b0 net/core/dev.c:4238 [<ffffffff83bae3a8>] dev_queue_xmit+0x18/0x20 net/core/dev.c:4278 [<ffffffff84339189>] packet_snd net/packet/af_packet.c:3226 [inline] [<ffffffff84339189>] packet_sendmsg+0x4919/0x70b0 net/packet/af_packet.c:3252 [<ffffffff83b1ac0c>] sock_sendmsg_nosec net/socket.c:673 [inline] [<ffffffff83b1ac0c>] sock_sendmsg+0x12c/0x160 net/socket.c:684 [<ffffffff83b1f5a2>] __sys_sendto+0x262/0x380 net/socket.c:1996 [<ffffffff83b1f700>] SYSC_sendto net/socket.c:2008 [inline] [<ffffffff83b1f700>] SyS_sendto+0x40/0x60 net/socket.c:2004 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05devlink: report 0 after hitting end in region readJacob Keller
commit fdd41ec21e15 ("devlink: Return right error code in case of errors for region read") modified the region read code to report errors properly in unexpected cases. In the case where the start_offset and ret_offset match, it unilaterally converted this into an error. This causes an issue for the "dump" version of the command. In this case, the devlink region dump will always report an invalid argument: 000000000000ffd0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 000000000000ffe0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff devlink answers: Invalid argument 000000000000fff0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff This occurs because the expected flow for the dump is to return 0 after there is no further data. The simplest fix would be to stop converting the error code to -EINVAL if start_offset == ret_offset. However, avoid unnecessary work by checking for when start_offset is larger than the region size and returning 0 upfront. Fixes: fdd41ec21e15 ("devlink: Return right error code in case of errors for region read") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05net: ethernet: dec: tulip: Fix length mask in receive length calculationMoritz Fischer
The receive frame length calculation uses a wrong mask to calculate the length of the received frames. Per spec table 4-1 the length is contained in the FL (Frame Length) field in bits 30:16. This didn't show up as an issue so far since frames were limited to 1500 bytes which falls within the 11 bit window. Signed-off-by: Moritz Fischer <mdf@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05Merge branch 'wg-fixes'David S. Miller
Jason A. Donenfeld says: ==================== wireguard fixes for 5.6-rc1 Here are fixes for WireGuard before 5.6-rc1 is tagged. It includes: 1) A fix for a UaF (caused by kmalloc failing during a very small allocation) that syzkaller found, from Eric Dumazet. 2) A fix for a deadlock that syzkaller found, along with an additional selftest to ensure that the bug fix remains correct, from me. 3) Two little fixes/cleanups to the selftests from Krzysztof Kozlowski and me. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05wireguard: selftests: tie socket waiting to target pidJason A. Donenfeld
Without this, we wind up proceeding too early sometimes when the previous process has just used the same listening port. So, we tie the listening socket query to the specific pid we're interested in. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05wireguard: selftests: cleanup CONFIG_ENABLE_WARN_DEPRECATEDKrzysztof Kozlowski
CONFIG_ENABLE_WARN_DEPRECATED is gone since commit 771c035372a0 ("deprecate the '__deprecated' attribute warnings entirely and for good"). Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05wireguard: selftests: ensure non-addition of peers with failed precomputationJason A. Donenfeld
Ensure that peers with low order points are ignored, both in the case where we already have a device private key and in the case where we do not. This adds points that naturally give a zero output. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05wireguard: noise: reject peers with low order public keysJason A. Donenfeld
Our static-static calculation returns a failure if the public key is of low order. We check for this when peers are added, and don't allow them to be added if they're low order, except in the case where we haven't yet been given a private key. In that case, we would defer the removal of the peer until we're given a private key, since at that point we're doing new static-static calculations which incur failures we can act on. This meant, however, that we wound up removing peers rather late in the configuration flow. Syzkaller points out that peer_remove calls flush_workqueue, which in turn might then wait for sending a handshake initiation to complete. Since handshake initiation needs the static identity lock, holding the static identity lock while calling peer_remove can result in a rare deadlock. We have precisely this case in this situation of late-stage peer removal based on an invalid public key. We can't drop the lock when removing, because then incoming handshakes might interact with a bogus static-static calculation. While the band-aid patch for this would involve breaking up the peer removal into two steps like wg_peer_remove_all does, in order to solve the locking issue, there's actually a much more elegant way of fixing this: If the static-static calculation succeeds with one private key, it *must* succeed with all others, because all 32-byte strings map to valid private keys, thanks to clamping. That means we can get rid of this silly dance and locking headaches of removing peers late in the configuration flow, and instead just reject them early on, regardless of whether the device has yet been assigned a private key. For the case where the device doesn't yet have a private key, we safely use zeros just for the purposes of checking for low order points by way of checking the output of the calculation. The following PoC will trigger the deadlock: ip link add wg0 type wireguard ip addr add 10.0.0.1/24 dev wg0 ip link set wg0 up ping -f 10.0.0.2 & while true; do wg set wg0 private-key /dev/null peer AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= allowed-ips 10.0.0.0/24 endpoint 10.0.0.3:1234 wg set wg0 private-key <(echo AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=) done [ 0.949105] ====================================================== [ 0.949550] WARNING: possible circular locking dependency detected [ 0.950143] 5.5.0-debug+ #18 Not tainted [ 0.950431] ------------------------------------------------------ [ 0.950959] wg/89 is trying to acquire lock: [ 0.951252] ffff8880333e2128 ((wq_completion)wg-kex-wg0){+.+.}, at: flush_workqueue+0xe3/0x12f0 [ 0.951865] [ 0.951865] but task is already holding lock: [ 0.952280] ffff888032819bc0 (&wg->static_identity.lock){++++}, at: wg_set_device+0x95d/0xcc0 [ 0.953011] [ 0.953011] which lock already depends on the new lock. [ 0.953011] [ 0.953651] [ 0.953651] the existing dependency chain (in reverse order) is: [ 0.954292] [ 0.954292] -> #2 (&wg->static_identity.lock){++++}: [ 0.954804] lock_acquire+0x127/0x350 [ 0.955133] down_read+0x83/0x410 [ 0.955428] wg_noise_handshake_create_initiation+0x97/0x700 [ 0.955885] wg_packet_send_handshake_initiation+0x13a/0x280 [ 0.956401] wg_packet_handshake_send_worker+0x10/0x20 [ 0.956841] process_one_work+0x806/0x1500 [ 0.957167] worker_thread+0x8c/0xcb0 [ 0.957549] kthread+0x2ee/0x3b0 [ 0.957792] ret_from_fork+0x24/0x30 [ 0.958234] [ 0.958234] -> #1 ((work_completion)(&peer->transmit_handshake_work)){+.+.}: [ 0.958808] lock_acquire+0x127/0x350 [ 0.959075] process_one_work+0x7ab/0x1500 [ 0.959369] worker_thread+0x8c/0xcb0 [ 0.959639] kthread+0x2ee/0x3b0 [ 0.959896] ret_from_fork+0x24/0x30 [ 0.960346] [ 0.960346] -> #0 ((wq_completion)wg-kex-wg0){+.+.}: [ 0.960945] check_prev_add+0x167/0x1e20 [ 0.961351] __lock_acquire+0x2012/0x3170 [ 0.961725] lock_acquire+0x127/0x350 [ 0.961990] flush_workqueue+0x106/0x12f0 [ 0.962280] peer_remove_after_dead+0x160/0x220 [ 0.962600] wg_set_device+0xa24/0xcc0 [ 0.962994] genl_rcv_msg+0x52f/0xe90 [ 0.963298] netlink_rcv_skb+0x111/0x320 [ 0.963618] genl_rcv+0x1f/0x30 [ 0.963853] netlink_unicast+0x3f6/0x610 [ 0.964245] netlink_sendmsg+0x700/0xb80 [ 0.964586] __sys_sendto+0x1dd/0x2c0 [ 0.964854] __x64_sys_sendto+0xd8/0x1b0 [ 0.965141] do_syscall_64+0x90/0xd9a [ 0.965408] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 0.965769] [ 0.965769] other info that might help us debug this: [ 0.965769] [ 0.966337] Chain exists of: [ 0.966337] (wq_completion)wg-kex-wg0 --> (work_completion)(&peer->transmit_handshake_work) --> &wg->static_identity.lock [ 0.966337] [ 0.967417] Possible unsafe locking scenario: [ 0.967417] [ 0.967836] CPU0 CPU1 [ 0.968155] ---- ---- [ 0.968497] lock(&wg->static_identity.lock); [ 0.968779] lock((work_completion)(&peer->transmit_handshake_work)); [ 0.969345] lock(&wg->static_identity.lock); [ 0.969809] lock((wq_completion)wg-kex-wg0); [ 0.970146] [ 0.970146] *** DEADLOCK *** [ 0.970146] [ 0.970531] 5 locks held by wg/89: [ 0.970908] #0: ffffffff827433c8 (cb_lock){++++}, at: genl_rcv+0x10/0x30 [ 0.971400] #1: ffffffff82743480 (genl_mutex){+.+.}, at: genl_rcv_msg+0x642/0xe90 [ 0.971924] #2: ffffffff827160c0 (rtnl_mutex){+.+.}, at: wg_set_device+0x9f/0xcc0 [ 0.972488] #3: ffff888032819de0 (&wg->device_update_lock){+.+.}, at: wg_set_device+0xb0/0xcc0 [ 0.973095] #4: ffff888032819bc0 (&wg->static_identity.lock){++++}, at: wg_set_device+0x95d/0xcc0 [ 0.973653] [ 0.973653] stack backtrace: [ 0.973932] CPU: 1 PID: 89 Comm: wg Not tainted 5.5.0-debug+ #18 [ 0.974476] Call Trace: [ 0.974638] dump_stack+0x97/0xe0 [ 0.974869] check_noncircular+0x312/0x3e0 [ 0.975132] ? print_circular_bug+0x1f0/0x1f0 [ 0.975410] ? __kernel_text_address+0x9/0x30 [ 0.975727] ? unwind_get_return_address+0x51/0x90 [ 0.976024] check_prev_add+0x167/0x1e20 [ 0.976367] ? graph_lock+0x70/0x160 [ 0.976682] __lock_acquire+0x2012/0x3170 [ 0.976998] ? register_lock_class+0x1140/0x1140 [ 0.977323] lock_acquire+0x127/0x350 [ 0.977627] ? flush_workqueue+0xe3/0x12f0 [ 0.977890] flush_workqueue+0x106/0x12f0 [ 0.978147] ? flush_workqueue+0xe3/0x12f0 [ 0.978410] ? find_held_lock+0x2c/0x110 [ 0.978662] ? lock_downgrade+0x6e0/0x6e0 [ 0.978919] ? queue_rcu_work+0x60/0x60 [ 0.979166] ? netif_napi_del+0x151/0x3b0 [ 0.979501] ? peer_remove_after_dead+0x160/0x220 [ 0.979871] peer_remove_after_dead+0x160/0x220 [ 0.980232] wg_set_device+0xa24/0xcc0 [ 0.980516] ? deref_stack_reg+0x8e/0xc0 [ 0.980801] ? set_peer+0xe10/0xe10 [ 0.981040] ? __ww_mutex_check_waiters+0x150/0x150 [ 0.981430] ? __nla_validate_parse+0x163/0x270 [ 0.981719] ? genl_family_rcv_msg_attrs_parse+0x13f/0x310 [ 0.982078] genl_rcv_msg+0x52f/0xe90 [ 0.982348] ? genl_family_rcv_msg_attrs_parse+0x310/0x310 [ 0.982690] ? register_lock_class+0x1140/0x1140 [ 0.983049] netlink_rcv_skb+0x111/0x320 [ 0.983298] ? genl_family_rcv_msg_attrs_parse+0x310/0x310 [ 0.983645] ? netlink_ack+0x880/0x880 [ 0.983888] genl_rcv+0x1f/0x30 [ 0.984168] netlink_unicast+0x3f6/0x610 [ 0.984443] ? netlink_detachskb+0x60/0x60 [ 0.984729] ? find_held_lock+0x2c/0x110 [ 0.984976] netlink_sendmsg+0x700/0xb80 [ 0.985220] ? netlink_broadcast_filtered+0xa60/0xa60 [ 0.985533] __sys_sendto+0x1dd/0x2c0 [ 0.985763] ? __x64_sys_getpeername+0xb0/0xb0 [ 0.986039] ? sockfd_lookup_light+0x17/0x160 [ 0.986397] ? __sys_recvmsg+0x8c/0xf0 [ 0.986711] ? __sys_recvmsg_sock+0xd0/0xd0 [ 0.987018] __x64_sys_sendto+0xd8/0x1b0 [ 0.987283] ? lockdep_hardirqs_on+0x39b/0x5a0 [ 0.987666] do_syscall_64+0x90/0xd9a [ 0.987903] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 0.988223] RIP: 0033:0x7fe77c12003e [ 0.988508] Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 4 [ 0.989666] RSP: 002b:00007fffada2ed58 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 0.990137] RAX: ffffffffffffffda RBX: 00007fe77c159d48 RCX: 00007fe77c12003e [ 0.990583] RDX: 0000000000000040 RSI: 000055fd1d38e020 RDI: 0000000000000004 [ 0.991091] RBP: 000055fd1d38e020 R08: 000055fd1cb63358 R09: 000000000000000c [ 0.991568] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000002c [ 0.992014] R13: 0000000000000004 R14: 000055fd1d38e020 R15: 0000000000000001 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05wireguard: allowedips: fix use-after-free in root_remove_peer_listsEric Dumazet
In the unlikely case a new node could not be allocated, we need to remove @newnode from @peer->allowedips_list before freeing it. syzbot reported: BUG: KASAN: use-after-free in __list_del_entry_valid+0xdc/0xf5 lib/list_debug.c:54 Read of size 8 at addr ffff88809881a538 by task syz-executor.4/30133 CPU: 0 PID: 30133 Comm: syz-executor.4 Not tainted 5.5.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x32 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:639 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135 __list_del_entry_valid+0xdc/0xf5 lib/list_debug.c:54 __list_del_entry include/linux/list.h:132 [inline] list_del include/linux/list.h:146 [inline] root_remove_peer_lists+0x24f/0x4b0 drivers/net/wireguard/allowedips.c:65 wg_allowedips_free+0x232/0x390 drivers/net/wireguard/allowedips.c:300 wg_peer_remove_all+0xd5/0x620 drivers/net/wireguard/peer.c:187 wg_set_device+0xd01/0x1350 drivers/net/wireguard/netlink.c:542 genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline] genl_family_rcv_msg net/netlink/genetlink.c:717 [inline] genl_rcv_msg+0x67d/0xea0 net/netlink/genetlink.c:734 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 genl_rcv+0x29/0x40 net/netlink/genetlink.c:745 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:672 ____sys_sendmsg+0x753/0x880 net/socket.c:2343 ___sys_sendmsg+0x100/0x170 net/socket.c:2397 __sys_sendmsg+0x105/0x1d0 net/socket.c:2430 __do_sys_sendmsg net/socket.c:2439 [inline] __se_sys_sendmsg net/socket.c:2437 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x45b399 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f99a9bcdc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f99a9bce6d4 RCX: 000000000045b399 RDX: 0000000000000000 RSI: 0000000020001340 RDI: 0000000000000003 RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004 R13: 00000000000009ba R14: 00000000004cb2b8 R15: 0000000000000009 Allocated by task 30103: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] __kasan_kmalloc mm/kasan/common.c:513 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:527 kmem_cache_alloc_trace+0x158/0x790 mm/slab.c:3551 kmalloc include/linux/slab.h:556 [inline] kzalloc include/linux/slab.h:670 [inline] add+0x70a/0x1970 drivers/net/wireguard/allowedips.c:236 wg_allowedips_insert_v4+0xf6/0x160 drivers/net/wireguard/allowedips.c:320 set_allowedip drivers/net/wireguard/netlink.c:343 [inline] set_peer+0xfb9/0x1150 drivers/net/wireguard/netlink.c:468 wg_set_device+0xbd4/0x1350 drivers/net/wireguard/netlink.c:591 genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline] genl_family_rcv_msg net/netlink/genetlink.c:717 [inline] genl_rcv_msg+0x67d/0xea0 net/netlink/genetlink.c:734 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 genl_rcv+0x29/0x40 net/netlink/genetlink.c:745 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:672 ____sys_sendmsg+0x753/0x880 net/socket.c:2343 ___sys_sendmsg+0x100/0x170 net/socket.c:2397 __sys_sendmsg+0x105/0x1d0 net/socket.c:2430 __do_sys_sendmsg net/socket.c:2439 [inline] __se_sys_sendmsg net/socket.c:2437 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 30103: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] kasan_set_free_info mm/kasan/common.c:335 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474 kasan_slab_free+0xe/0x10 mm/kasan/common.c:483 __cache_free mm/slab.c:3426 [inline] kfree+0x10a/0x2c0 mm/slab.c:3757 add+0x12d2/0x1970 drivers/net/wireguard/allowedips.c:266 wg_allowedips_insert_v4+0xf6/0x160 drivers/net/wireguard/allowedips.c:320 set_allowedip drivers/net/wireguard/netlink.c:343 [inline] set_peer+0xfb9/0x1150 drivers/net/wireguard/netlink.c:468 wg_set_device+0xbd4/0x1350 drivers/net/wireguard/netlink.c:591 genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline] genl_family_rcv_msg net/netlink/genetlink.c:717 [inline] genl_rcv_msg+0x67d/0xea0 net/netlink/genetlink.c:734 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 genl_rcv+0x29/0x40 net/netlink/genetlink.c:745 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:672 ____sys_sendmsg+0x753/0x880 net/socket.c:2343 ___sys_sendmsg+0x100/0x170 net/socket.c:2397 __sys_sendmsg+0x105/0x1d0 net/socket.c:2430 __do_sys_sendmsg net/socket.c:2439 [inline] __se_sys_sendmsg net/socket.c:2437 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff88809881a500 which belongs to the cache kmalloc-64 of size 64 The buggy address is located 56 bytes inside of 64-byte region [ffff88809881a500, ffff88809881a540) The buggy address belongs to the page: page:ffffea0002620680 refcount:1 mapcount:0 mapping:ffff8880aa400380 index:0x0 raw: 00fffe0000000200 ffffea000250b748 ffffea000254bac8 ffff8880aa400380 raw: 0000000000000000 ffff88809881a000 0000000100000020 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88809881a400: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff88809881a480: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc >ffff88809881a500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ^ ffff88809881a580: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff88809881a600: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: wireguard@lists.zx2c4.com Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05net_sched: fix a resource leak in tcindex_set_parms()Cong Wang
Jakub noticed there is a potential resource leak in tcindex_set_parms(): when tcindex_filter_result_init() fails and it jumps to 'errout1' which doesn't release the memory and resources allocated by tcindex_alloc_perfect_hash(). We should just jump to 'errout_alloc' which calls tcindex_free_perfect_hash(). Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()") Reported-by: Jakub Kicinski <kuba@kernel.org> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05mptcp: fix use-after-free on tcp fallbackFlorian Westphal
When an mptcp socket connects to a tcp peer or when a middlebox interferes with tcp options, mptcp needs to fall back to plain tcp. Problem is that mptcp is trying to be too clever in this case: It attempts to close the mptcp meta sk and transparently replace it with the (only) subflow tcp sk. Unfortunately, this is racy -- the socket is already exposed to userspace. Any parallel calls to send/recv/setsockopt etc. can cause use-after-free: BUG: KASAN: use-after-free in atomic_try_cmpxchg include/asm-generic/atomic-instrumented.h:693 [inline] CPU: 1 PID: 2083 Comm: syz-executor.1 Not tainted 5.5.0 #2 atomic_try_cmpxchg include/asm-generic/atomic-instrumented.h:693 [inline] queued_spin_lock include/asm-generic/qspinlock.h:78 [inline] do_raw_spin_lock include/linux/spinlock.h:181 [inline] __raw_spin_lock_bh include/linux/spinlock_api_smp.h:136 [inline] _raw_spin_lock_bh+0x71/0xd0 kernel/locking/spinlock.c:175 spin_lock_bh include/linux/spinlock.h:343 [inline] __lock_sock+0x105/0x190 net/core/sock.c:2414 lock_sock_nested+0x10f/0x140 net/core/sock.c:2938 lock_sock include/net/sock.h:1516 [inline] mptcp_setsockopt+0x2f/0x1f0 net/mptcp/protocol.c:800 __sys_setsockopt+0x152/0x240 net/socket.c:2130 __do_sys_setsockopt net/socket.c:2146 [inline] __se_sys_setsockopt net/socket.c:2143 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2143 do_syscall_64+0xb7/0x3d0 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x44/0xa9 While the use-after-free can be resolved, there is another problem: sock->ops and sock->sk assignments are not atomic, i.e. we may get calls into mptcp functions with sock->sk already pointing at the subflow socket, or calls into tcp functions with a mptcp meta sk. Remove the fallback code and call the relevant functions for the (only) subflow in case the mptcp socket is connected to tcp peer. Reported-by: Christoph Paasch <cpaasch@apple.com> Diagnosed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05net: dsa: microchip: Platform data shan't include kernel.hAndy Shevchenko
Replace with appropriate types.h. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05net: dsa: b53: Platform data shan't include kernel.hAndy Shevchenko
Replace with appropriate types.h. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05netdevsim: fix ptr_ret.cocci warningskbuild test robot
drivers/net/netdevsim/dev.c:937:1-3: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci Fixes: 6556ff32f12d ("netdevsim: use IS_ERR instead of IS_ERR_OR_NULL for debugfs") CC: Taehee Yoo <ap420073@gmail.com> Signed-off-by: kbuild test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05net: sgi: ioc3-eth: Remove leftover free_irq()Thomas Bogendoerfer
Commit 0ce5ebd24d25 ("mfd: ioc3: Add driver for SGI IOC3 chip") moved request_irq() from ioc3_open into probe function, but forgot to remove free_irq() from ioc3_close. Fixes: 0ce5ebd24d25 ("mfd: ioc3: Add driver for SGI IOC3 chip") Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05cifs: fail i/o on soft mounts if sessionsetup errors outRonnie Sahlberg
RHBZ: 1579050 If we have a soft mount we should fail commands for session-setup failures (such as the password having changed/ account being deleted/ ...) and return an error back to the application. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org>
2020-02-05smb3: fix problem with null cifs super block with previous patchSteve French
Add check for null cifs_sb to create_options helper Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-02-05Merge tag 'asoc-v5.6-2' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v5.6 A collection of updates for bugs fixed since the initial pull request, the most important one being the addition of COMMON_CLK for wcd934x which is needed for MFD to be merged.
2020-02-05ASoC: wcd934x: Add missing COMMON_CLK dependency to SND_SOC_ALL_CODECSGeert Uytterhoeven
Just adding a dependency on COMMON_CLK to SND_SOC_WCD934X is not sufficient, as enabling SND_SOC_ALL_CODECS will still select it, breaking the build later: WARNING: unmet direct dependencies detected for SND_SOC_WCD934X Depends on [n]: SOUND [=m] && !UML && SND [=m] && SND_SOC [=m] && COMMON_CLK [=n] && MFD_WCD934X [=m] Selected by [m]: - SND_SOC_ALL_CODECS [=m] && SOUND [=m] && !UML && SND [=m] && SND_SOC [=m] && COMPILE_TEST [=y] && MFD_WCD934X [=m] ... ERROR: "of_clk_add_provider" [sound/soc/codecs/snd-soc-wcd934x.ko] undefined! ERROR: "of_clk_src_simple_get" [sound/soc/codecs/snd-soc-wcd934x.ko] undefined! ERROR: "clk_hw_register" [sound/soc/codecs/snd-soc-wcd934x.ko] undefined! ERROR: "__clk_get_name" [sound/soc/codecs/snd-soc-wcd934x.ko] undefined! Fix this by adding the missing dependency to SND_SOC_ALL_CODECS Fixes: 42b716359beca106 ("ASoC: wcd934x: Add missing COMMON_CLK dependency") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20200204131857.7634-1-geert@linux-m68k.org Signed-off-by: Mark Brown <broonie@kernel.org>