summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-10-06iwlwifi: mvm: show more HE radiotap data for TB PPDUsJohannes Berg
For trigger-based PPDUs, most values aren't part of the HE-SIG-A because they're preconfigured by the trigger frame. However, we still have this information since we used the trigger frame to configure the hardware, so we can (and do) read it back out and can thus show it in radiotap. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: mvm: decode HE information for MU (without ext info)Johannes Berg
When the info type is MU, we still have the data from the TSF overload words, so should decode that. When it's MU_EXT_INFO we additionally have the SIG-B common 0/1/2 fields. Also document the validity depending on the info type and fix the name of the regular TB PPDU info type accordingly. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: add debugfs to send host commandShahar S Matityahu
Add debugfs to send host command in mvm and fmac op modes. Allows to send host command at runtime via send_hcmd debugfs file. The command is received as a string that represents hex values. The struct of the command is as follows: [cmd_id][flags][length][data] cmd_id and flags are 8 chars long each. length is 4 chars long. data is length * 2 chars long. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: runtime: add send host command op to firmware runtime op structShahar S Matityahu
Add send host command op to firmware runtime op struct to allow sending host commands to the op mode from the fw runtime context. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: bump firmware API version for 9000 and 22000 series devicesJohannes Berg
Bump the firmware API version to 41. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: mvm Support new MCC update responseHaim Dreyfuss
Change MCC update response API to be compatible with new FW API. While at it change v2 which is not in use anymore to v3 and cleanup mcc_update v1 command and response which is obsolete. Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: pcie: check iwl_pcie_txq_build_tfd() return valueJohannes Berg
If we use the iwl_pcie_txq_build_tfd() return value for BIT(), we should validate that it's not going to be negative, so do the check and bail out if we hit an error. We shouldn't, as we check if it'll fit beforehand, but better be safe. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: add fall through commentJohannes Berg
The fall-through to the MVM case is intended as we have to do *something* to continue, and can't easily clean up. So we'll just fail in mvm later, if this does happen. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: pcie gen2: check iwl_pcie_gen2_set_tb() return valueJohannes Berg
If we use the iwl_pcie_gen2_set_tb() return value for BIT(), we should validate that it's not going to be negative, so do the check and bail out if we hit an error. We shouldn't, as we check if it'll fit beforehand, but better be safe. Fixes: ab6c644539e9 ("iwlwifi: pcie: copy TX functions to new transport") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: nvm: get num of hw addresses from firmwareNaftali Goldstein
With NICs that don't read the NVM directly and instead rely on getting the relevant data from the firmware, the number of reserved MAC addresses was not added to the API. This caused the driver to assume there is only one address which results in all interfaces getting the same address. Update the API to fix this. While at it, fix-up the comments with firmware api names to actually match what we have in the firmware. Fixes: e9e1ba3dbf00 ("iwlwifi: mvm: support getting nvm data from firmware") Signed-off-by: Naftali Goldstein <naftali.goldstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: add dump collection in case alive flow failsShahar S Matityahu
Trigger dump collection if the alive flow fails, regardless of the reason. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: pcie: avoid empty free RB queueShaul Triebitz
If all free RB queues are empty, the driver will never restock the free RB queue. That's because the restocking happens in the Rx flow, and if the free queue is empty there will be no Rx. Although there's a background worker (a.k.a. allocator) allocating memory for RBs so that the Rx handler can restock them, the worker may run only after the free queue has become empty (and then it is too late for restocking as explained above). There is a solution for that called 'emergency': If the number of used RB's reaches half the amount of all RB's, the Rx handler will not wait for the allocator but immediately allocate memory for the used RB's and restock the free queue. But, since the used RB's is per queue, it may happen that the used RB's are spread between the queues such that the emergency check will fail for each of the queues (and still run out of RBs, causing the above symptom). To fix it, move to emergency mode if the sum of *all* used RBs (for all Rx queues) reaches half the amount of all RB's Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: mvm: set max TX/RX A-MPDU subframes to HE limitJohannes Berg
In mac80211, the default remains for HT, so set the limit to HE for our driver. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: mvm: add more information to HE radiotapJohannes Berg
For SU/SU-ER/MU PPDUs we have spatial reuse. For those where it's relevant we also know the pre-FEC padding factor, PE disambiguity bit, beam change bit and doppler bit. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: mvm: add LDPC-XSYM to HE radiotap dataJohannes Berg
Add information about the LDCP extra symbol segment to the HE data when applicable (not for trigger-based PPDUs). While at it, clean up the code for UL/DL a bit. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: mvm: add TXOP to HE radiotap dataJohannes Berg
We have this data available, so add it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: mvm: move HE-MU LTF_NUM parsing to he_phy_data parsingJohannes Berg
This code gets shorter if it doesn't have to check all the conditions, so move it to an appropriate place that has all of them validated already. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: mvm: clean up HE radiotap RU allocation parsingJohannes Berg
Split the code out into a separate routine, and move that to be called inside the previously introduced iwl_mvm_decode_he_phy_data() function. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: mvm: pull some he_phy_data decoding into a separate functionJohannes Berg
Pull some of the decoding of he_phy_data into a separate function so we don't need to check over and over again if it's valid. While at it, fix the UL/DL bit reporting to be for all but trigger- based frames. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: mvm: put HE SIG-B symbols/users data correctlyJohannes Berg
As detected by Luca during code review when I move this in the next patch, the code here is putting the data into the wrong field (flags1 instead of flags2). Fix that. Fixes: e5721e3f770f ("iwlwifi: mvm: add radiotap data for HE") Reported-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: mvm: minor cleanups to HE radiotap codeJohannes Berg
Remove a stray empty line, unbreak some lines that aren't really that long, and move on variable setting into the initializer to avoid initializing it twice. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: mvm: remove unnecessary overload variableJohannes Berg
This is equivalent to checking he_phy_data != HE_PHY_DATA_INVAL, which is already done in a number of places, so remove the extra 'overload' variable entirely. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: mvm: clear HW_RESTART_REQUESTED when stopping the interfaceEmmanuel Grumbach
Fix a bug that happens in the following scenario: 1) suspend without WoWLAN 2) mac80211 calls drv_stop because of the suspend 3) __iwl_mvm_mac_stop deallocates the aux station 4) during drv_stop the firmware crashes 5) iwlmvm: * sets IWL_MVM_STATUS_HW_RESTART_REQUESTED * asks mac80211 to kick the restart flow 6) mac80211 puts the restart worker into a freezable queue which means that the worker will not run for now since the workqueue is already frozen 7) ... 8) resume 9) mac80211 runs ieee80211_reconfig as part of the resume 10) mac80211 detects that a restart flow has been requested and that we are now resuming from suspend and cancels the restart worker 11) mac80211 calls drv_start() 12) __iwl_mvm_mac_start checks that IWL_MVM_STATUS_HW_RESTART_REQUESTED clears it, sets IWL_MVM_STATUS_IN_HW_RESTART and calls iwl_mvm_restart_cleanup() 13) iwl_fw_error_dump gets called and accesses the device to get debug data 14) iwl_mvm_up adds the aux station 15) iwl_mvm_add_aux_sta() allocates an internal station for the aux station 16) iwl_mvm_allocate_int_sta() tests IWL_MVM_STATUS_IN_HW_RESTART and doesn't really allocate a station ID for the aux station 17) a new queue is added for the aux station Note that steps from 5 to 9 aren't really part of the problem but were described for the sake of completeness. Once the iwl_mvm_mac_stop() is called, the device is not accessible, meaning that step 12) can't succeed and we'll see the following: drivers/net/wireless/intel/iwlwifi/pcie/trans.c:2122 iwl_trans_pcie_grab_nic_access+0xc0/0x1d6 [iwlwifi]() Timeout waiting for hardware access (CSR_GP_CNTRL 0x080403d8) Call Trace: [<ffffffffc03e6ad3>] iwl_trans_pcie_grab_nic_access+0xc0/0x1d6 [iwlwifi] [<ffffffffc03e6a13>] iwl_trans_pcie_dump_regs+0x3fd/0x3fd [iwlwifi] [<ffffffffc03dad42>] iwl_fw_error_dump+0x4f5/0xe8b [iwlwifi] [<ffffffffc04bd43e>] __iwl_mvm_mac_start+0x5a/0x21a [iwlmvm] [<ffffffffc04bd6d2>] iwl_mvm_mac_start+0xd4/0x103 [iwlmvm] [<ffffffffc042d378>] drv_start+0xa1/0xc5 [iwl7000_mac80211] [<ffffffffc045a339>] ieee80211_reconfig+0x145/0xf50 [mac80211] [<ffffffffc044788b>] ieee80211_resume+0x62/0x66 [mac80211] [<ffffffffc0366c5b>] wiphy_resume+0xa9/0xc6 [cfg80211] The station id of the aux station is set to 0xff in step 3 and because we don't really allocate a new station id for the auxliary station (as explained in 16), we end up sending a command to the firmware asking to connect the queue to station id 0xff. This makes the firmware crash with the following information: 0x00002093 | ADVANCED_SYSASSERT 0x000002F0 | trm_hw_status0 0x00000000 | trm_hw_status1 0x00000B38 | branchlink2 0x0001978C | interruptlink1 0x00000000 | interruptlink2 0xFF080501 | data1 0xDEADBEEF | data2 0xDEADBEEF | data3 Firmware error during reconfiguration - reprobe! FW error in SYNC CMD SCD_QUEUE_CFG Fix this by clearing IWL_MVM_STATUS_HW_RESTART_REQUESTED in iwl_mvm_mac_stop(). We won't be able to collect debug data anyway and when we will brought up again, we will have a clean state from the firmware perspective. Since we won't have IWL_MVM_STATUS_IN_HW_RESTART set in step 12) we won't get to the 2093 ASSERT either. Fixes: bf8b286f86fc ("iwlwifi: mvm: defer setting IWL_MVM_STATUS_IN_HW_RESTART") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: dbg: group trigger condition to helper functionSara Sharon
The triplet of get trigger, is trigger enabled and is trigger stopped repeats itself. Group them in a function to avoid code duplication. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: dbg: dump memory in a helper functionSara Sharon
The code that dumps various memory types repeats itself. Move it to a function to avoid duplication. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: mvm: allow channel reorder optimization during scanAyala Beker
Allow the FW to reorder HB channels and first scan HB channels with assumed APs, in order to reduce the scan duration. Currently enable it for all scan requests types. Signed-off-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: dbg: split iwl_fw_error_dump to two functionsSara Sharon
Split iwl_fw_error_dump to two parts. The first part will dump the actual data, and second will do the file allocations, trans calls and actual file operations. This is done in order to enable reuse of the code for the new debug ini infrastructure. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: dbg: refactor dump code to improve readabilitySara Sharon
Add a macro to replace all the conditions checking for valid dump length. In addition, move the fifo len calculation to a helper function. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: dbg: move debug data to a structSara Sharon
The debug variables are bloating the iwl_fw struct. And the fields are out of order, missing docs and some are redundant. Clean this up. This serves as preparation for unionizing it for the new ini infra. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-06iwlwifi: mvm: check for n_profiles validity in EWRD ACPILuca Coelho
When reading the profiles from the EWRD table in ACPI, we loop over the data and set it into our internal table. We use the number of profiles specified in ACPI without checking its validity, so if the ACPI table is corrupted and the number is larger than our array size, we will try to make an out-of-bounds access. Fix this by making sure the value specified in the ACPI table is valid. Fixes: 6996490501ed ("iwlwifi: mvm: add support for EWRD (Dynamic SAR) ACPI table") Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-05Merge branch 'akpm'Greg Kroah-Hartman
* akpm: mm: madvise(MADV_DODUMP): allow hugetlbfs pages ocfs2: fix locking for res->tracking and dlm->tracking_list mm/vmscan.c: fix int overflow in callers of do_shrink_slab() mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly mm/vmstat.c: fix outdated vmstat_text proc: restrict kernel stack dumps to root mm/hugetlb: add mmap() encodings for 32MB and 512MB page sizes mm/migrate.c: split only transparent huge pages when allocation fails ipc/shm.c: use ERR_CAST() for shm_lock() error return mm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctl mm, thp: fix mlocking THP page with migration enabled ocfs2: fix crash in ocfs2_duplicate_clusters_by_page() hugetlb: take PMD sharing into account when flushing tlb/caches mm: migration: fix migration of huge PMD shared pages
2018-10-05mm: madvise(MADV_DODUMP): allow hugetlbfs pagesDaniel Black
Reproducer, assuming 2M of hugetlbfs available: Hugetlbfs mounted, size=2M and option user=testuser # mount | grep ^hugetlbfs hugetlbfs on /dev/hugepages type hugetlbfs (rw,pagesize=2M,user=dan) # sysctl vm.nr_hugepages=1 vm.nr_hugepages = 1 # grep Huge /proc/meminfo AnonHugePages: 0 kB ShmemHugePages: 0 kB HugePages_Total: 1 HugePages_Free: 1 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB Hugetlb: 2048 kB Code: #include <sys/mman.h> #include <stddef.h> #define SIZE 2*1024*1024 int main() { void *ptr; ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_HUGETLB | MAP_ANONYMOUS, -1, 0); madvise(ptr, SIZE, MADV_DONTDUMP); madvise(ptr, SIZE, MADV_DODUMP); } Compile and strace: mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = 0x7ff7c9200000 madvise(0x7ff7c9200000, 2097152, MADV_DONTDUMP) = 0 madvise(0x7ff7c9200000, 2097152, MADV_DODUMP) = -1 EINVAL (Invalid argument) hugetlbfs pages have VM_DONTEXPAND in the VmFlags driver pages based on author testing with analysis from Florian Weimer[1]. The inclusion of VM_DONTEXPAND into the VM_SPECIAL defination was a consequence of the large useage of VM_DONTEXPAND in device drivers. A consequence of [2] is that VM_DONTEXPAND marked pages are unable to be marked DODUMP. A user could quite legitimately madvise(MADV_DONTDUMP) their hugetlbfs memory for a while and later request that madvise(MADV_DODUMP) on the same memory. We correct this omission by allowing madvice(MADV_DODUMP) on hugetlbfs pages. [1] https://stackoverflow.com/questions/52548260/madvisedodump-on-the-same-ptr-size-as-a-successful-madvisedontdump-fails-wit [2] commit 0103bd16fb90 ("mm: prepare VM_DONTDUMP for using in drivers") Link: http://lkml.kernel.org/r/20180930054629.29150-1-daniel@linux.ibm.com Link: https://lists.launchpad.net/maria-discuss/msg05245.html Fixes: 0103bd16fb90 ("mm: prepare VM_DONTDUMP for using in drivers") Reported-by: Kenneth Penza <kpenza@gmail.com> Signed-off-by: Daniel Black <daniel@linux.ibm.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05ocfs2: fix locking for res->tracking and dlm->tracking_listAshish Samant
In dlm_init_lockres() we access and modify res->tracking and dlm->tracking_list without holding dlm->track_lock. This can cause list corruptions and can end up in kernel panic. Fix this by locking res->tracking and dlm->tracking_list with dlm->track_lock instead of dlm->spinlock. Link: http://lkml.kernel.org/r/1529951192-4686-1-git-send-email-ashish.samant@oracle.com Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Reviewed-by: Changwei Ge <ge.changwei@h3c.com> Acked-by: Joseph Qi <jiangqi903@gmail.com> Acked-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <ge.changwei@h3c.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05mm/vmscan.c: fix int overflow in callers of do_shrink_slab()Kirill Tkhai
do_shrink_slab() returns unsigned long value, and the placing into int variable cuts high bytes off. Then we compare ret and 0xfffffffe (since SHRINK_EMPTY is converted to ret type). Thus a large number of objects returned by do_shrink_slab() may be interpreted as SHRINK_EMPTY, if low bytes of their value are equal to 0xfffffffe. Fix that by declaration ret as unsigned long in these functions. Link: http://lkml.kernel.org/r/153813407177.17544.14888305435570723973.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reported-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properlyJann Horn
5dd0b16cdaff ("mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP") made the availability of the NR_TLB_REMOTE_FLUSH* counters inside the kernel unconditional to reduce #ifdef soup, but (either to avoid showing dummy zero counters to userspace, or because that code was missed) didn't update the vmstat_array, meaning that all following counters would be shown with incorrect values. This only affects kernel builds with CONFIG_VM_EVENT_COUNTERS=y && CONFIG_DEBUG_TLBFLUSH=y && CONFIG_SMP=n. Link: http://lkml.kernel.org/r/20181001143138.95119-2-jannh@google.com Fixes: 5dd0b16cdaff ("mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP") Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Kemi Wang <kemi.wang@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05mm/vmstat.c: fix outdated vmstat_textJann Horn
7a9cdebdcc17 ("mm: get rid of vmacache_flush_all() entirely") removed the VMACACHE_FULL_FLUSHES statistics, but didn't remove the corresponding entry in vmstat_text. This causes an out-of-bounds access in vmstat_show(). Luckily this only affects kernels with CONFIG_DEBUG_VM_VMACACHE=y, which is probably very rare. Link: http://lkml.kernel.org/r/20181001143138.95119-1-jannh@google.com Fixes: 7a9cdebdcc17 ("mm: get rid of vmacache_flush_all() entirely") Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Kemi Wang <kemi.wang@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05proc: restrict kernel stack dumps to rootJann Horn
Currently, you can use /proc/self/task/*/stack to cause a stack walk on a task you control while it is running on another CPU. That means that the stack can change under the stack walker. The stack walker does have guards against going completely off the rails and into random kernel memory, but it can interpret random data from your kernel stack as instruction pointers and stack pointers. This can cause exposure of kernel stack contents to userspace. Restrict the ability to inspect kernel stacks of arbitrary tasks to root in order to prevent a local attacker from exploiting racy stack unwinding to leak kernel task stack contents. See the added comment for a longer rationale. There don't seem to be any users of this userspace API that can't gracefully bail out if reading from the file fails. Therefore, I believe that this change is unlikely to break things. In the case that this patch does end up needing a revert, the next-best solution might be to fake a single-entry stack based on wchan. Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com Fixes: 2ec220e27f50 ("proc: add /proc/*/stack") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ken Chen <kenchen@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05mm/hugetlb: add mmap() encodings for 32MB and 512MB page sizesAnshuman Khandual
ARM64 architecture also supports 32MB and 512MB HugeTLB page sizes. This just adds mmap() system call argument encoding for them. Link: http://lkml.kernel.org/r/1537841300-6979-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05mm/migrate.c: split only transparent huge pages when allocation failsAnshuman Khandual
split_huge_page_to_list() fails on HugeTLB pages. I was experimenting with moving 32MB contig HugeTLB pages on arm64 (with a debug patch applied) and hit the following stack trace when the kernel crashed. [ 3732.462797] Call trace: [ 3732.462835] split_huge_page_to_list+0x3b0/0x858 [ 3732.462913] migrate_pages+0x728/0xc20 [ 3732.462999] soft_offline_page+0x448/0x8b0 [ 3732.463097] __arm64_sys_madvise+0x724/0x850 [ 3732.463197] el0_svc_handler+0x74/0x110 [ 3732.463297] el0_svc+0x8/0xc [ 3732.463347] Code: d1000400 f90b0e60 f2fbd5a2 a94982a1 (f9000420) When unmap_and_move[_huge_page]() fails due to lack of memory, the splitting should happen only for transparent huge pages not for HugeTLB pages. PageTransHuge() returns true for both THP and HugeTLB pages. Hence the conditonal check should test PagesHuge() flag to make sure that given pages is not a HugeTLB one. Link: http://lkml.kernel.org/r/1537798495-4996-1-git-send-email-anshuman.khandual@arm.com Fixes: 94723aafb9 ("mm: unclutter THP migration") Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05ipc/shm.c: use ERR_CAST() for shm_lock() error returnKees Cook
This uses ERR_CAST() instead of an open-coded cast, as it is casting across structure pointers, which upsets __randomize_layout: ipc/shm.c: In function `shm_lock': ipc/shm.c:209:9: note: randstruct: casting between randomized structure pointer types (ssa): `struct shmid_kernel' and `struct kern_ipc_perm' return (void *)ipcp; ^~~~~~~~~~~~ Link: http://lkml.kernel.org/r/20180919180722.GA15073@beast Fixes: 82061c57ce93 ("ipc: drop ipc_lock()") Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05mm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctlYueHaibing
get_user_pages_fast() will return negative value if no pages were pinned, then be converted to a unsigned, which is compared to zero, giving the wrong result. Link: http://lkml.kernel.org/r/20180921095015.26088-1-yuehaibing@huawei.com Fixes: 09e35a4a1ca8 ("mm/gup_benchmark: handle gup failures") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05mm, thp: fix mlocking THP page with migration enabledKirill A. Shutemov
A transparent huge page is represented by a single entry on an LRU list. Therefore, we can only make unevictable an entire compound page, not individual subpages. If a user tries to mlock() part of a huge page, we want the rest of the page to be reclaimable. We handle this by keeping PTE-mapped huge pages on normal LRU lists: the PMD on border of VM_LOCKED VMA will be split into PTE table. Introduction of THP migration breaks[1] the rules around mlocking THP pages. If we had a single PMD mapping of the page in mlocked VMA, the page will get mlocked, regardless of PTE mappings of the page. For tmpfs/shmem it's easy to fix by checking PageDoubleMap() in remove_migration_pmd(). Anon THP pages can only be shared between processes via fork(). Mlocked page can only be shared if parent mlocked it before forking, otherwise CoW will be triggered on mlock(). For Anon-THP, we can fix the issue by munlocking the page on removing PTE migration entry for the page. PTEs for the page will always come after mlocked PMD: rmap walks VMAs from oldest to newest. Test-case: #include <unistd.h> #include <sys/mman.h> #include <sys/wait.h> #include <linux/mempolicy.h> #include <numaif.h> int main(void) { unsigned long nodemask = 4; void *addr; addr = mmap((void *)0x20000000UL, 2UL << 20, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0); if (fork()) { wait(NULL); return 0; } mlock(addr, 4UL << 10); mbind(addr, 2UL << 20, MPOL_PREFERRED | MPOL_F_RELATIVE_NODES, &nodemask, 4, MPOL_MF_MOVE); return 0; } [1] https://lkml.kernel.org/r/CAOMGZ=G52R-30rZvhGxEbkTw7rLLwBGadVYeo--iizcD3upL3A@mail.gmail.com Link: http://lkml.kernel.org/r/20180917133816.43995-1-kirill.shutemov@linux.intel.com Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Reviewed-by: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05ocfs2: fix crash in ocfs2_duplicate_clusters_by_page()Larry Chen
ocfs2_duplicate_clusters_by_page() may crash if one of the extent's pages is dirty. When a page has not been written back, it is still in dirty state. If ocfs2_duplicate_clusters_by_page() is called against the dirty page, the crash happens. To fix this bug, we can just unlock the page and wait until the page until its not dirty. The following is the backtrace: kernel BUG at /root/code/ocfs2/refcounttree.c:2961! [exception RIP: ocfs2_duplicate_clusters_by_page+822] __ocfs2_move_extent+0x80/0x450 [ocfs2] ? __ocfs2_claim_clusters+0x130/0x250 [ocfs2] ocfs2_defrag_extent+0x5b8/0x5e0 [ocfs2] __ocfs2_move_extents_range+0x2a4/0x470 [ocfs2] ocfs2_move_extents+0x180/0x3b0 [ocfs2] ? ocfs2_wait_for_recovery+0x13/0x70 [ocfs2] ocfs2_ioctl_move_extents+0x133/0x2d0 [ocfs2] ocfs2_ioctl+0x253/0x640 [ocfs2] do_vfs_ioctl+0x90/0x5f0 SyS_ioctl+0x74/0x80 do_syscall_64+0x74/0x140 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Once we find the page is dirty, we do not wait until it's clean, rather we use write_one_page() to write it back Link: http://lkml.kernel.org/r/20180829074740.9438-1-lchen@suse.com [lchen@suse.com: update comments] Link: http://lkml.kernel.org/r/20180830075041.14879-1-lchen@suse.com [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Larry Chen <lchen@suse.com> Acked-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05hugetlb: take PMD sharing into account when flushing tlb/cachesMike Kravetz
When fixing an issue with PMD sharing and migration, it was discovered via code inspection that other callers of huge_pmd_unshare potentially have an issue with cache and tlb flushing. Use the routine adjust_range_if_pmd_sharing_possible() to calculate worst case ranges for mmu notifiers. Ensure that this range is flushed if huge_pmd_unshare succeeds and unmaps a PUD_SUZE area. Link: http://lkml.kernel.org/r/20180823205917.16297-3-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05mm: migration: fix migration of huge PMD shared pagesMike Kravetz
The page migration code employs try_to_unmap() to try and unmap the source page. This is accomplished by using rmap_walk to find all vmas where the page is mapped. This search stops when page mapcount is zero. For shared PMD huge pages, the page map count is always 1 no matter the number of mappings. Shared mappings are tracked via the reference count of the PMD page. Therefore, try_to_unmap stops prematurely and does not completely unmap all mappings of the source page. This problem can result is data corruption as writes to the original source page can happen after contents of the page are copied to the target page. Hence, data is lost. This problem was originally seen as DB corruption of shared global areas after a huge page was soft offlined due to ECC memory errors. DB developers noticed they could reproduce the issue by (hotplug) offlining memory used to back huge pages. A simple testcase can reproduce the problem by creating a shared PMD mapping (note that this must be at least PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using migrate_pages() to migrate process pages between nodes while continually writing to the huge pages being migrated. To fix, have the try_to_unmap_one routine check for huge PMD sharing by calling huge_pmd_unshare for hugetlbfs huge pages. If it is a shared mapping it will be 'unshared' which removes the page table entry and drops the reference on the PMD page. After this, flush caches and TLB. mmu notifiers are called before locking page tables, but we can not be sure of PMD sharing until page tables are locked. Therefore, check for the possibility of PMD sharing before locking so that notifiers can prepare for the worst possible case. Link: http://lkml.kernel.org/r/20180823205917.16297-2-mike.kravetz@oracle.com [mike.kravetz@oracle.com: make _range_in_vma() a static inline] Link: http://lkml.kernel.org/r/6063f215-a5c8-2f0c-465a-2c515ddc952d@oracle.com Fixes: 39dde65c9940 ("shared page table for hugetlb page") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05Merge tag 'pci-v4.19-fixes-3' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Bjorn writes: "PCI fixes for v4.19: - Reprogram bridge prefetch registers to fix NVIDIA and Radeon issues after suspend/resume (Daniel Drake) - Fix mvebu I/O mapping creation sequence (Thomas Petazzoni) - Fix minor MAINTAINERS file match issue (Bjorn Helgaas)" * tag 'pci-v4.19-fixes-3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: mvebu: Fix PCI I/O mapping creation sequence MAINTAINERS: Remove obsolete drivers/pci pattern from ACPI section PCI: Reprogram bridge prefetch registers on resume
2018-10-05Merge tag 'for-4.19/dm-fixes-2' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Mike writes: "device mapper fixes - Fix a DM thinp __udivdi3 undefined on 32-bit bug introduced during 4.19 merge window. - Fix leak and dangling pointer in DM multipath's scsi_dh related code. - A couple stable@ fixes for DM cache's resize support. - A DM raid fix to remove "const" from decipher_sync_action()'s return type." * tag 'for-4.19/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm cache: fix resize crash if user doesn't reload cache table dm cache metadata: ignore hints array being too small during resize dm raid: remove bogus const from decipher_sync_action() return type dm mpath: fix attached_handler_name leak and dangling hw_handler_name pointer dm thin metadata: fix __udivdi3 undefined on 32-bit
2018-10-05Merge tag 'gpio-v4.19-3' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Linus writes: "A single GPIO fix: Free the last used descriptor, an off by one error. This is tagged for stable as well." * tag 'gpio-v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpiolib: Free the last requested descriptor
2018-10-05Merge tag 'pm-4.19-rc7' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Rafael writes: "Power management fix for 4.19-rc7 Fix a bug that may cause runtime PM to misbehave for some devices after a failing or aborted system suspend which is nasty enough for an -rc7 time frame fix." * tag 'pm-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / core: Clear the direct_complete flag on errors
2018-10-05Merge branch 'perf-urgent-for-linus' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Ingo writes: "perf fixes: - fix a CPU#0 hot unplug bug and a PCI enumeration bug in the x86 Intel uncore PMU driver - fix a CPU event enumeration bug in the x86 AMD PMU driver - fix a perf ring-buffer corruption bug when using tracepoints - fix a PMU unregister locking bug" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX perf/ring_buffer: Prevent concurent ring buffer access perf/x86/intel/uncore: Use boot_cpu_data.phys_proc_id instead of hardcorded physical package ID 0 perf/core: Fix perf_pmu_unregister() locking