summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-08-30Merge tag 'fs_for_v5.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF and isofs updates from Jan Kara: "Several smaller fixes and cleanups in UDF and isofs" * tag 'fs_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf_get_extendedattr() had no boundary checks. isofs: joliet: Fix iocharset=utf8 mount option udf: Fix iocharset=utf8 mount option udf: Get rid of 0-length arrays in struct fileIdentDesc udf: Get rid of 0-length arrays udf: Remove unused declaration udf: Check LVID earlier
2021-08-30Merge tag 'fiemap_for_v5.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull FIEMAP cleanups from Jan Kara: "FIEMAP cleanups from Christoph transitioning all remaining filesystems supporting FIEMAP (ext2, hpfs) to iomap API and removing the old helper" * tag 'fiemap_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fs: remove generic_block_fiemap hpfs: use iomap_fiemap to implement ->fiemap ext2: use iomap_fiemap to implement ->fiemap ext2: make ext2_iomap_ops available unconditionally
2021-08-30Merge tag 'fsnotify_for_v5.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "fsnotify speedups when notification actually isn't used and support for identifying processes which caused fanotify events through pidfd instead of normal pid" * tag 'fsnotify_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: optimize the case of no marks of any type fsnotify: count all objects with attached connectors fsnotify: count s_fsnotify_inode_refs for attached connectors fsnotify: replace igrab() with ihold() on attach connector fanotify: add pidfd support to the fanotify API fanotify: introduce a generic info record copying helper fanotify: minor cosmetic adjustments to fid labels kernel/pid.c: implement additional checks upon pidfd_create() parameters kernel/pid.c: remove static qualifier from pidfd_create()
2021-08-30vt_kdsetmode: extend console lockingLinus Torvalds
As per the long-suffering comment. Reported-by: Minh Yuan <yuanmingbuaa@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-30console: consume APC, DM, DCSnick black
The Linux console's VT102 implementation already consumes OSC ("Operating System Command") sequences, probably because that's how palette changes are transmitted. In addition to OSC, there are three other major clases of ANSI control strings: APC ("Application Program Command"), PM ("Privacy Message"), and DCS ("Device Control String"). They are handled similarly to OSC in terms of termination. Source: vt100.net Add three new enumerated states, one for each of these types. All three are handled the same way right now--they simply consume input until terminated. I hope to expand upon this firmament in the future. Add new predicate ansi_control_string(), returning true for any of these states. Replace explicit checks against ESosc with calls to this function. Transition to these states appropriately from the escape initiation (ESesc) state. This was motivated by the following Notcurses bugs: https://github.com/dankamongmen/notcurses/issues/2050 https://github.com/dankamongmen/notcurses/issues/1828 https://github.com/dankamongmen/notcurses/issues/2069 where standard VT sequences are not consumed by the Linux console. It's not necessary that the Linux console *support* these sequences, but it ought *consume* these well-specified classes of sequences. Tested by sending a variety of escape sequences to the console, and verifying that they still worked, or were now properly consumed. Verified that the escapes were properly terminated at a generic level. Verified that the Notcurses tools continued to show expected output on the Linux console, except now without escape bleedthrough. Link: https://lore.kernel.org/lkml/YSydL0q8iaUfkphg@schwarzgerat.orthanc/ Signed-off-by: nick black <dankamongmen@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-30string: improve default out-of-line memcmp() implementationLinus Torvalds
This just does the "if the architecture does efficient unaligned handling, start the memcmp using 'unsigned long' accesses", since Nikolay Borisov found a load that cares. This is basically the minimal patch, and limited to architectures that are known to not have slow unaligned handling. We've had the stupid byte-at-a-time version forever, and nobody has ever even noticed before, so let's keep the fix minimal. A potential further improvement would be to align one of the sources in order to at least minimize unaligned cases, but the only real case of bigger memcmp() users seems to be the FIDEDUPERANGE ioctl(). As David Sterba says, the dedupe ioctl is typically called on ranges spanning many pages so the common case will all be page-aligned anyway. All the relevant architectures select HAVE_EFFICIENT_UNALIGNED_ACCESS, so I'm not going to worry about the combination of a very rare use-case and a rare architecture until somebody actually hits it. Particularly since Nikolay also tested the more complex patch with extra alignment handling code, and it only added overhead. Link: https://lore.kernel.org/lkml/20210721135926.602840-1-nborisov@suse.com/ Reported-by: Nikolay Borisov <nborisov@suse.com> Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-30io-wq: fix wakeup race when adding new workJens Axboe
When new work is added, io_wqe_enqueue() checks if we need to wake or create a new worker. But that check is done outside the lock that otherwise synchronizes us with a worker going to sleep, so we can end up in the following situation: CPU0 CPU1 lock insert work unlock atomic_read(nr_running) != 0 lock atomic_dec(nr_running) no wakeup needed Hold the wqe lock around the "need to wakeup" check. Then we can also get rid of the temporary work_flags variable, as we know the work will remain valid as long as we hold the lock. Cc: stable@vger.kernel.org Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-30io-wq: wqe and worker locks no longer need to be IRQ safeJens Axboe
io_uring no longer queues async work off completion handlers that run in hard or soft interrupt context, and that use case was the only reason that io-wq had to use IRQ safe locks for wqe and worker locks. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-30io-wq: check max_worker limits if a worker transitions bound stateJens Axboe
For the two places where new workers are created, we diligently check if we are allowed to create a new worker. If we're currently at the limit of how many workers of a given type we can have, then we don't create any new ones. If you have a mixed workload with various types of bound and unbounded work, then it can happen that a worker finishes one type of work and is then transitioned to the other type. For this case, we don't check if we are actually allowed to do so. This can cause io-wq to temporarily exceed the allowed number of workers for a given type. When retrieving work, check that the types match. If they don't, check if we are allowed to transition to the other type. If not, then don't handle the new work. Cc: stable@vger.kernel.org Reported-by: Johannes Lundberg <johalun0@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-30net: ipv4: Fix the warning for dereferenceYajun Deng
Add a if statements to avoid the warning. Dan Carpenter report: The patch faf482ca196a: "net: ipv4: Move ip_options_fragment() out of loop" from Aug 23, 2021, leads to the following Smatch complaint: net/ipv4/ip_output.c:833 ip_do_fragment() warn: variable dereferenced before check 'iter.frag' (see line 828) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: faf482ca196a ("net: ipv4: Move ip_options_fragment() out of loop") Link: https://lore.kernel.org/netdev/20210830073802.GR7722@kadam/T/#t Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30net: qrtr: make checks in qrtr_endpoint_post() stricterDan Carpenter
These checks are still not strict enough. The main problem is that if "cb->type == QRTR_TYPE_NEW_SERVER" is true then "len - hdrlen" is guaranteed to be 4 but we need to be at least 16 bytes. In fact, we can reject everything smaller than sizeof(*pkt) which is 20 bytes. Also I don't like the ALIGN(size, 4). It's better to just insist that data is needs to be aligned at the start. Fixes: 0baa99ee353c ("net: qrtr: Allow non-immediate node routing") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30fix array-index-out-of-bounds in taprio_changeHaimin Zhang
syzbot report an array-index-out-of-bounds in taprio_change index 16 is out of range for type '__u16 [16]' that's because mqprio->num_tc is lager than TC_MAX_QUEUE,so we check the return value of netdev_set_num_tc. Reported-by: syzbot+2b3e5fb6c7ef285a94f6@syzkaller.appspotmail.com Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30net: fix NULL pointer reference in cipso_v4_doi_free王贇
In netlbl_cipsov4_add_std() when 'doi_def->map.std' alloc failed, we sometime observe panic: BUG: kernel NULL pointer dereference, address: ... RIP: 0010:cipso_v4_doi_free+0x3a/0x80 ... Call Trace: netlbl_cipsov4_add_std+0xf4/0x8c0 netlbl_cipsov4_add+0x13f/0x1b0 genl_family_rcv_msg_doit.isra.15+0x132/0x170 genl_rcv_msg+0x125/0x240 This is because in cipso_v4_doi_free() there is no check on 'doi_def->map.std' when doi_def->type got value 1, which is possibe, since netlbl_cipsov4_add_std() haven't initialize it before alloc 'doi_def->map.std'. This patch just add the check to prevent panic happen in similar cases. Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30Merge branch 'inet-exceptions-less-predictable'David S. Miller
Eric Dumazet says: ==================== inet: make exception handling less predictible This second round of patches is addressing Keyu Man recommendations to make linux hosts more robust against a class of brute force attacks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30ipv4: make exception cache less predictibleEric Dumazet
Even after commit 6457378fe796 ("ipv4: use siphash instead of Jenkins in fnhe_hashfun()"), an attacker can still use brute force to learn some secrets from a victim linux host. One way to defeat these attacks is to make the max depth of the hash table bucket a random value. Before this patch, each bucket of the hash table used to store exceptions could contain 6 items under attack. After the patch, each bucket would contains a random number of items, between 6 and 10. The attacker can no longer infer secrets. This is slightly increasing memory size used by the hash table, by 50% in average, we do not expect this to be a problem. This patch is more complex than the prior one (IPv6 equivalent), because IPv4 was reusing the oldest entry. Since we need to be able to evict more than one entry per update_or_create_fnhe() call, I had to replace fnhe_oldest() with fnhe_remove_oldest(). Also note that we will queue extra kfree_rcu() calls under stress, which hopefully wont be a too big issue. Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Keyu Man <kman001@ucr.edu> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: David Ahern <dsahern@kernel.org> Tested-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30ipv6: make exception cache less predictibleEric Dumazet
Even after commit 4785305c05b2 ("ipv6: use siphash in rt6_exception_hash()"), an attacker can still use brute force to learn some secrets from a victim linux host. One way to defeat these attacks is to make the max depth of the hash table bucket a random value. Before this patch, each bucket of the hash table used to store exceptions could contain 6 items under attack. After the patch, each bucket would contains a random number of items, between 6 and 10. The attacker can no longer infer secrets. This is slightly increasing memory size used by the hash table, we do not expect this to be a problem. Following patch is dealing with the same issue in IPv4. Fixes: 35732d01fe31 ("ipv6: introduce a hash table to store dst cache") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Keyu Man <kman001@ucr.edu> Cc: Wei Wang <weiwan@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30s390: remove SCHED_CORE from defconfigsHeiko Carstens
This causes too many problems. Enable it again when everything has been sorted out. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-08-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Clean up and consolidate ct ecache infrastructure by merging ct and expect notifiers, from Florian Westphal. 2) Missing counters and timestamp in nfnetlink_queue and _log conntrack information. 3) Missing error check for xt_register_template() in iptables mangle, as a incremental fix for the previous pull request, also from Florian Westphal. 4) Add netfilter hooks for the SRv6 lightweigh tunnel driver, from Ryoga Sato. The hooks are enabled via nf_hooks_lwtunnel sysctl to make sure existing netfilter rulesets do not break. There is a static key to disable the hooks by default. The pktgen_bench_xmit_mode_netif_receive.sh shows no noticeable impact in the seg6_input path for non-netfilter users: similar numbers with and without this patch. This is a sample of the perf report output: 11.67% kpktgend_0 [ipv6] [k] ipv6_get_saddr_eval 7.89% kpktgend_0 [ipv6] [k] __ipv6_addr_label 7.52% kpktgend_0 [ipv6] [k] __ipv6_dev_get_saddr 6.63% kpktgend_0 [kernel.vmlinux] [k] asm_exc_nmi 4.74% kpktgend_0 [ipv6] [k] fib6_node_lookup_1 3.48% kpktgend_0 [kernel.vmlinux] [k] pskb_expand_head 3.33% kpktgend_0 [ipv6] [k] ip6_rcv_core.isra.29 3.33% kpktgend_0 [ipv6] [k] seg6_do_srh_encap 2.53% kpktgend_0 [ipv6] [k] ipv6_dev_get_saddr 2.45% kpktgend_0 [ipv6] [k] fib6_table_lookup 2.24% kpktgend_0 [kernel.vmlinux] [k] ___cache_free 2.16% kpktgend_0 [ipv6] [k] ip6_pol_route 2.11% kpktgend_0 [kernel.vmlinux] [k] __ipv6_addr_type ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30Merge branch 'IXP46x-PTP-Timer'David S. Miller
Linus Walleij says: ==================== IXP46x PTP Timer clean-up and DT ChangeLog v2->v3: - Dropped the patch enabling compile tests: we are still dependent on some machine-specific headers. The plan is to get rid of this after device tree conversion. We include one of the compile testing fixes anyway, because it is nice to have fixed. - Rebased on the latest net-next ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30ixp4xx_eth: Probe the PTP module from the device treeLinus Walleij
This adds device tree probing support for the PTP module adjacent to the ethernet module. It is pretty straight forward, all resources are in the device tree as they come to the platform device. Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30ixp4xx_eth: Add devicetree bindingsLinus Walleij
This adds device tree bindings for the IXP46x PTP Timer, a companion to the IXP4xx ethernet in newer platforms. Cc: devicetree@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30ixp4xx_eth: Stop referring to GPIOsLinus Walleij
The driver is being passed interrupts, then looking up the same interrupts as GPIOs a second time to convert them into interrupts and set properties on them. This is pointless: the GPIO and irqchip APIs of a GPIO chip are orthogonal. Just request the interrupts and be done with it, drop reliance on any GPIO functions or definitions. Use devres-managed functions and add a small devress quirk to unregister the clock as well and we can rely on devres to handle all the resources and cut down a bunch of boilerplate in the process. Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30ixp4xx_eth: fix compile-testingArnd Bergmann
Change the driver to use portable integer types to avoid warnings during compile testing, including: drivers/net/ethernet/xscale/ixp4xx_eth.c:721:21: error: cast to 'u32 *' (aka 'unsigned int *') from smaller integer type 'int' [-Werror,-Wint-to-pointer-cast] memcpy_swab32(mem, (u32 *)((int)skb->data & ~3), bytes / 4); ^ drivers/net/ethernet/xscale/ixp4xx_eth.c:963:12: error: incompatible pointer types passing 'u32 *' (aka 'unsigned int *') to parameter of type 'dma_addr_t *' (aka 'unsigned long long *') [-Werror,-Wincompatible-pointer-types] &port->desc_tab_phys))) ^~~~~~~~~~~~~~~~~~~~ include/linux/dmapool.h:27:20: note: passing argument to parameter 'handle' here dma_addr_t *handle); ^ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30ixp4xx_eth: make ptp support a platform driverArnd Bergmann
After the recent ixp4xx cleanups, the ptp driver has gained a build failure in some configurations: drivers/net/ethernet/xscale/ptp_ixp46x.c: In function 'ptp_ixp_init': drivers/net/ethernet/xscale/ptp_ixp46x.c:290:51: error: 'IXP4XX_TIMESYNC_BASE_VIRT' undeclared (first use in this function) Avoid the last bit of hardcoded constants from platform headers by turning the ptp driver bit into a platform driver and passing the IRQ and MMIO address as resources. This is a bit tricky: - The interface between the two drivers is now the new ixp46x_ptp_find() function, replacing the global ixp46x_phc_index variable. The call is done as late as possible, in hwtstamp_set(), to ensure that the ptp device is fully probed. - As the ptp driver is now called by the network driver, the link dependency is reversed, which in turn requires a small Makefile hack - The GPIO number is still left hardcoded. This is clearly not great, but it can be addressed later. Note that commit 98ac0cc270b7 ("ARM: ixp4xx: Convert to MULTI_IRQ_HANDLER") changed the IRQ number to something meaningless. Passing the correct IRQ in a resource fixes this. - When the PTP driver is disabled, ethtool .get_ts_info() now correctly lists only software timestamping regardless of the hardware. Signed-off-by: Arnd Bergmann <arnd@arndb.de> [Fix a missing include] Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30Merge branch 'hns3-cleanups'David S. Miller
Guangbin Huang says: ==================== net: hns3: add some cleanups This series includes some cleanups for the HNS3 ethernet driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30net: hns3: uniform parameter name of hclge_ptp_clean_tx_hwts()Hao Chen
The parameter name of hclge_ptp_clean_tx_hwts() in declaration is "dev", but the definition of this function is used the common name "hdev" as other functions, so modify it. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30net: hnss3: use max() to simplify codeHao Chen
Replace the "? :" statement wich max() to simplify code. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30net: hns3: modify a print format of hns3_dbg_queue_map()Hao Chen
The type of tqp_vector->vector_irq is int, so modify its print format to "%d". Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30net: hns3: refine function hclge_dbg_dump_tm_pri()Guangbin Huang
To improve flexibility, simplicity and maintainability to dump info of every element of tm priority, add a struct hclge_dbg_item array of tm priority and fill string of every data according to this array. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30net: hns3: reconstruct function hclge_ets_validate()Guangbin Huang
This patch reconstructs function hclge_ets_validate() to reduce the code cycle complexity and make code more concise. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30net: hns3: reconstruct function hns3_self_testPeng Li
This patch reconstructs function hns3_self_test to reduce the code cycle complexity and make code more concise. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30net: hns3: initialize each member of structure array on a separate lineJiaran Zhang
To make the format of each member initialization of structure array clearer, initialize each member on a separate line. Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30Merge branch 'bnxt_en-fw-messages'David S. Miller
Michael Chan says: ==================== bnxt_en: Implement new driver APIs to send FW messages The current driver APIs to send messages to the firmware allow only one outstanding message in flight. There is only one buffer for the firmware response for each firmware channel. To send a firmware message, all callers must take a mutex and it is released after the firmware response has been read. This scheme does not allow multiple firmware messages in flight. Firmware may take a long time to respond to some messages (e.g. NVRAM related ones) and this causes the mutex to be held for a long time, blocking other callers. This patchset intoduces the new driver APIs to address the above shortcomings. The new APIs are compatible with new and old firmware. But the new deferred firmware response mechanism will require newer firmware in order to allow multiple outstanding firmware commands. All callers are updated to use the new APIs. v2: Patch 4 and patch 9 updated to fix issues reported by test robot ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30bnxt_en: support multiple HWRM commands in flightEdwin Peer
Add infrastructure to maintain a pending list of HWRM commands awaiting completion and reduce the scope of the hwrm_cmd_lock mutex so that it protects only the request mailbox. The mailbox is free to use for one or more concurrent commands after receiving deferred response events. For uniformity and completeness, use the same pending list for collecting completions for commands that respond via a completion ring. These commands are only used for freeing rings and for IRQ test and we only support one such command in flight. Note deferred responses are also only supported on the main channel. The secondary channel (KONG) does not support deferred responses. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30bnxt_en: remove legacy HWRM interfaceEdwin Peer
There are no longer any callers relying on the old API. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30bnxt_en: update all firmware calls to use the new APIsEdwin Peer
The conversion follows this general pattern for most of the calls: 1. The input message is changed from a stack variable initialized using bnxt_hwrm_cmd_hdr_init() to a pointer allocated and intialized using hwrm_req_init(). 2. If we don't need to read the firmware response, the hwrm_send_message() call is replaced with hwrm_req_send(). 3. If we need to read the firmware response, the mutex lock is replaced by hwrm_req_hold() to hold the response. When the response is read, the mutex unlock is replaced by hwrm_req_drop(). If additional DMA buffers are needed for firmware response data, the hwrm_req_dma_slice() is used instead of calling dma_alloc_coherent(). Some minor refactoring is also done while doing these conversions. v2: Fix unintialized variable warnings in __bnxt_hwrm_get_tx_rings() and bnxt_approve_mac() Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30bnxt_en: use link_lock instead of hwrm_cmd_lock to protect link_infoEdwin Peer
We currently use the hwrm_cmd_lock to serialize the update of the firmware's link status response data and the copying of link status data to the VF. This won't work when we update the firmware message APIs, so we use the link_lock mutex instead. All link_info data should be updated under the link_lock mutex. Also add link_lock to functions that touch link_info in __bnxt_open_nic() and bnxt_probe_phy(). The locking is probably not strictly necessary during probe, but it's more consistent. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30bnxt_en: add support for HWRM request slicesEdwin Peer
Slices are a mechanism for suballocating DMA mapped regions from the request buffer. Such regions can be used for indirect command data instead of creating new mappings with dma_alloc_coherent(). The advantage of using a slice is that the lifetime of the slice is bound to the request and will be automatically unmapped when the request is consumed. A single external region is also supported. This allows for regions that will not fit inside the spare request buffer space such that the same API can be used consistently even for larger mappings. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30bnxt_en: add HWRM request assignment APIEdwin Peer
hwrm_req_replace() provides an assignment like operation to replace a managed HWRM request object with data from a pre-built source. This is useful for handling request data provided by higher layer HWRM clients. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30bnxt_en: discard out of sequence HWRM responsesEdwin Peer
During firmware crash recovery, it is possible for firmware to respond to stale HWRM commands that have already timed out. Because response buffers may be reused, any out of sequence responses need to be ignored and only the matching seq_id should be accepted. Also, READ_ONCE should be used for the reads from the DMA buffer to ensure that the necessary loads are scheduled. Reviewed-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30bnxt_en: introduce new firmware message API based on DMA poolsEdwin Peer
This change constitutes a major step towards supporting multiple firmware commands in flight by maintaining a separate response buffer for the duration of each request. These firmware commands are also known as Hardware Resource Manager (HWRM) commands. Using separate response buffers requires an API change in order for callers to be able to free the buffer when done. It is impossible to keep the existing APIs unchanged. The existing usage for a simple HWRM message request such as the following: struct input req = {0}; bnxt_hwrm_cmd_hdr_init(bp, &req, REQ_TYPE, -1, -1); rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT); if (rc) /* error */ changes to: struct input *req; rc = hwrm_req_init(bp, req, REQ_TYPE); if (rc) /* error */ rc = hwrm_req_send(bp, req); /* consumes req */ if (rc) /* error */ The key changes are: 1. The req is no longer allocated on the stack. 2. The caller must call hwrm_req_init() to allocate a req buffer and check for a valid buffer. 3. The req buffer is automatically released when hwrm_req_send() returns. 4. If the caller wants to check the firmware response, the caller must call hwrm_req_hold() to take ownership of the response buffer and release it afterwards using hwrm_req_drop(). The caller is no longer required to explicitly hold the hwrm_cmd_lock mutex to read the response. 5. Because the firmware commands and responses all have different sizes, some safeguards are added to the code. This patch maintains legacy API compatibiltiy, implementing the old API in terms of the new. The follow-on patches will convert all callers to use the new APIs. v2: Fix redefined writeq with parisc .config Fix "cast from pointer to integer of different size" warning in hwrm_calc_sentinel() Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30bnxt_en: move HWRM API implementation into separate fileEdwin Peer
Move all firmware messaging functions and definitions to new bnxt_hwrm.[ch]. The follow-on patches will make major modifications to these APIs. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30bnxt_en: Refactor the HWRM_VER_GET firmware callsEdwin Peer
Refactor the code so that __bnxt_hwrm_ver_get() does not call bnxt_hwrm_do_send_msg() directly. The new APIs will not expose this internal call. Add a new bnxt_hwrm_poll() to poll the HWRM_VER_GET firmware call silently. The other bnxt_hwrm_ver_get() function will send the HWRM_VER_GET message directly with error logs enabled. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30bnxt_en: remove DMA mapping for KONG responseEdwin Peer
The additional response buffer serves no useful purpose. There can be only one firmware command in flight due to the hwrm_cmd_lock mutex, which is taken for the entire duration of any command completion, KONG or otherwise. It is thus safe to share a single DMA buffer. Removing the code associated with the additional mapping will simplify matters in the next patch, which allocates response buffers from DMA pools on a per request basis. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30netfilter: add netfilter hooks to SRv6 data planeRyoga Saito
This patch introduces netfilter hooks for solving the problem that conntrack couldn't record both inner flows and outer flows. This patch also introduces a new sysctl toggle for enabling lightweight tunnel netfilter hooks. Signed-off-by: Ryoga Saito <contact@proelbtn.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-29io_uring: allow updating linked timeoutsPavel Begunkov
We allow updating normal timeouts, add support for adjusting timings of linked timeouts as well. Reported-by: Victor Stewart <v@nametag.social> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-29io_uring: keep ltimeouts in a listPavel Begunkov
A preparation patch. Keep all queued linked timeout in a list, so they may be found and updated. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-29Linux 5.14v5.14Linus Torvalds
2021-08-29Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fix from Stephen Boyd: "One hotfix for a NULL pointer deref in the Renesas usb clk driver" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: renesas: rcar-usb2-clock-sel: Fix kernel NULL pointer dereference
2021-08-29Merge tag 'irqchip-5.15' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: - API updates: - Treewide conversion to generic_handle_domain_irq() for anything that looks like a chained interrupt controller - Update the irqdomain documentation - Use of bitmap_zalloc() throughout the tree - New functionalities: - Support for GICv3 EPPI partitions - Fixes: - Qualcomm PDC hierarchy fixes - Yet another priority decoding fix for the GICv3 pseudo-NMIs - Fix the apple-aic driver irq_eoi() callback to always unmask the interrupt - Properly handle edge interrupts on loongson-pch-pic - Let the mtk-sysirq driver advertise IRQCHIP_SKIP_SET_WAKE Link: https://lore.kernel.org/r/20210828121013.2647964-1-maz@kernel.org