summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-06-13amd-xgbe: extend 10Mbps support to MAC version 21HRaju Rangoju
MAC version 21H supports the 10Mbps speed. So, extend support to platforms that support it. Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-13Merge branch 'octeontx2-updates'David S. Miller
Naveen Mamindlapalli says: ==================== RVU NIX AF driver updates This patch series includes a few enhancements and other updates to the RVU NIX AF driver. The first patch adds devlink option to configure NPC MCAM high priority zone entries reservation. This is useful when the requester needs more high priority entries than default reserved entries. The second patch adds support for RSS hash computation using L3 SRC or DST only, or L4 SRC or DST only. The third patch updates DWRR MTU configuration for CN10KB silicon. HW uses the DWRR MTU to compute DWRR weight. Patch 4 configures the LBK link in TL3_TL2 configuration only when switch mode is enabled. Patch 5 adds an option in the mailbox request to enable/disable DROP_RE bit which drops packets with L2 errors when set. Patch 6 updates SMQ flush mechanism to stop other child nodes from enqueuing any packets while SMQ flush is active. Otherwise SMQ flush may timeout. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-13octeontx2-af: Set XOFF on other child transmit schedulers during SMQ flushNaveen Mamindlapalli
When multiple transmit scheduler queues feed a TL1 transmit link, the SMQ flush initiated on a low priority queue might get stuck when a high priority queue fully subscribes the transmit link. This inturn effects interface teardown. To avoid this, temporarily XOFF all TL1's other immediate child transmit scheduler queues and also clear any rate limit configuration on all the scheduler queues in SMQ(flush) hierarchy. Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-13octeontx2-af: add option to toggle DROP_RE enable in rx cfgNithin Dabilpuram
Add option to toggle DROP_RE bit in rx cfg mbox. This helps in modifying the config runtime as opposed to setting available via nix_lf_alloc() mbox at NIX LF init time. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Jerin Jacob Kollanukkaran <jerinj@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-13octeontx2-af: Enable LBK links only when switch mode is on.Subbaraya Sundeep
Currently, all the TL3_TL2 nodes are being configured to enable switch LBK channel 63 in them. Instead enable them only when switch mode is enabled. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-13octeontx2-af: cn10k: Set NIX DWRR MTU for CN10KB siliconSunil Goutham
The DWRR MTU config added for SDP and RPM/LBK links on CN10K silicon is further extended on CK10KB silicon variant and made it configurable. Now there are 4 DWRR MTU config to choose while setting transmit scheduler's RR_WEIGHT. Here we are reserving one config for each of RPM, SDP and LBK. NIXX_AF_DWRR_MTUX(0) ---> RPM NIXX_AF_DWRR_MTUX(1) ---> SDP NIXX_AF_DWRR_MTUX(2) ---> LBK PF/VF drivers can choose the DWRR_MTU to be used by setting SMQX_CFG[pkt_link_type] to one of above. TLx_SCHEDULE[RR_WEIGHT] is to be as configured 'quantum / 2^DWRR_MTUX[MTU]'. DWRR_MTU of each link is exposed to PF/VF drivers via mailbox for RR_WEIGHT calculation. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-13octeontx2-af: extend RSS supported offload typesKiran Kumar K
Add support to select L3 SRC or DST only, L4 SRC or DST only for RSS calculation. AF consumer may have requirement as we can select only SRC or DST data for RSS calculation in L3, L4 layers. With this requirement there will be following combinations, IPV[4,6]_SRC_ONLY, IPV[4,6]_DST_ONLY, [TCP,UDP,SCTP]_SRC_ONLY, [TCP,UDP,SCTP]_DST_ONLY. So, instead of creating a bit for each combination, we are using upper 4 bits (31:28) in the flow_key_cfg to represent the SRC, DST selection. 31 => L3_SRC, 30 => L3_DST, 29 => L4_SRC, 28 => L4_DST. These won't be part of flow_cfg, so that we don't need to change the existing ABI. Signed-off-by: Kiran Kumar K <kirankumark@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-13octeontx2-af: Add devlink option to adjust mcam high prio zone entriesNaveen Mamindlapalli
The NPC MCAM entries are currently divided into three priority zones in AF driver: high, mid, and low. The high priority zone and low priority zone take up 1/8th (each) of the available MCAM entries, and remaining going to the mid priority zone. The current allocation scheme may not meet certain requirements, such as when a requester needs more high priority zone entries than are reserved. This patch adds a devlink configurable option to increase the number of high priority zone entries that can be allocated by requester. The max number of entries that can be reserved for high priority usage is 100% of available MCAM entries. Usage: 1) Change high priority zone percentage to 75%: devlink -p dev param set pci/0002:01:00.0 name npc_mcam_high_zone_percent \ value 75 cmode runtime 2) Read high priority zone percentage: devlink -p dev param show pci/0002:01:00.0 name npc_mcam_high_zone_percent The devlink set configuration is only permitted when no MCAM entries are assigned, i.e., all MCAM entries are free, indicating that no PF/VF driver is loaded. So user must unload/unbind PF/VF driver/devices before modifying the high priority zone percentage. Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-13selftests: forwarding: Fix layer 2 miss test syntaxIdo Schimmel
The test currently specifies "l2_miss" as "true" / "false", but the version that eventually landed in iproute2 uses "1" / "0" [1]. Align the test accordingly. [1] https://lore.kernel.org/netdev/20230607153550.3829340-1-idosch@nvidia.com/ Fixes: 8c33266ae26a ("selftests: forwarding: Add layer 2 miss test cases") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12Merge branch 'splice-net-some-miscellaneous-msg_splice_pages-changes'Jakub Kicinski
David Howells says: ==================== splice, net: Some miscellaneous MSG_SPLICE_PAGES changes Now that the splice_to_socket() has been rewritten so that nothing now uses the ->sendpage() file op[1], some further changes can be made, so here are some miscellaneous changes that can now be done. (1) Remove the ->sendpage() file op. (2) Remove hash_sendpage*() from AF_ALG. (3) Make sunrpc send multiple pages in single sendmsg() call rather than calling sendpage() in TCP (or maybe TLS). (4) Make tcp_bpf_sendpage() a wrapper around tcp_bpf_sendmsg(). (5) Make AF_KCM use sendmsg() when calling down to TCP and then make it send entire fragment lists in single sendmsg calls. Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=fd5f4d7da29218485153fd8b4c08da7fc130c79f [1] ==================== Link: https://lore.kernel.org/r/20230609100221.2620633-1-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12kcm: Send multiple frags in one sendmsg()David Howells
Rewrite the AF_KCM transmission loop to send all the fragments in a single skb or frag_list-skb in one sendmsg() with MSG_SPLICE_PAGES set. The list of fragments in each skb is conveniently a bio_vec[] that can just be attached to a BVEC iter. Note: I'm working out the size of each fragment-skb by adding up bv_len for all the bio_vecs in skb->frags[] - but surely this information is recorded somewhere? For the skbs in head->frag_list, this is equal to skb->data_len, but not for the head. head->data_len includes all the tail frags too. Signed-off-by: David Howells <dhowells@redhat.com> cc: Tom Herbert <tom@herbertland.com> cc: Tom Herbert <tom@quantonium.net> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12kcm: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpageDavid Howells
When transmitting data, call down into the transport socket using sendmsg with MSG_SPLICE_PAGES to indicate that content should be spliced rather than using sendpage. Signed-off-by: David Howells <dhowells@redhat.com> cc: Tom Herbert <tom@herbertland.com> cc: Tom Herbert <tom@quantonium.net> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12tcp_bpf: Make tcp_bpf_sendpage() go through tcp_bpf_sendmsg(MSG_SPLICE_PAGES)David Howells
Make tcp_bpf_sendpage() a wrapper around tcp_bpf_sendmsg(MSG_SPLICE_PAGES) rather than a loop calling tcp_sendpage(). sendpage() will be removed in the future. Signed-off-by: David Howells <dhowells@redhat.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jakub Sitnicki <jakub@cloudflare.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12sunrpc: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpageDavid Howells
When transmitting data, call down into TCP using sendmsg with MSG_SPLICE_PAGES to indicate that content should be spliced rather than performing sendpage calls to transmit header, data pages and trailer. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: Anna Schumaker <anna@kernel.org> cc: Jeff Layton <jlayton@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12algif: Remove hash_sendpage*()David Howells
Remove hash_sendpage*() as nothing should now call it since the rewrite of splice_to_socket()[1]. Signed-off-by: David Howells <dhowells@redhat.com> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=2dc334f1a63a8839b88483a3e73c0f27c9c1791c [1] Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12Remove file->f_op->sendpageDavid Howells
Remove file->f_op->sendpage as splicing to a socket now calls sendmsg rather than sendpage. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12Merge branch 'net-flower-add-cfm-support'Jakub Kicinski
Zahari Doychev says: ==================== net: flower: add cfm support The first patch adds cfm support to the flow dissector. The second adds the flower classifier support. The third adds a selftest for the flower cfm functionality. ==================== Link: https://lore.kernel.org/r/20230608105648.266575-1-zahari.doychev@linux.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: net: add tc flower cfm testZahari Doychev
New cfm flower test case is added to the net forwarding selfttests. Example output: # ./tc_flower_cfm.sh p1 p2 TEST: CFM opcode match test [ OK ] TEST: CFM level match test [ OK ] TEST: CFM opcode and level match test [ OK ] Signed-off-by: Zahari Doychev <zdoychev@maxlinear.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12net: flower: add support for matching cfm fieldsZahari Doychev
Add support to the tc flower classifier to match based on fields in CFM information elements like level and opcode. tc filter add dev ens6 ingress protocol 802.1q \ flower vlan_id 698 vlan_ethtype 0x8902 cfm mdl 5 op 46 \ action drop Signed-off-by: Zahari Doychev <zdoychev@maxlinear.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12net: flow_dissector: add support for cfm packetsZahari Doychev
Add support for dissecting cfm packets. The cfm packet header fields maintenance domain level and opcode can be dissected. Signed-off-by: Zahari Doychev <zdoychev@maxlinear.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12Merge branch 'mptcp-fixes'Jakub Kicinski
Matthieu Baerts says: ==================== selftests: mptcp: skip tests not supported by old kernels (part 3) After a few years of increasing test coverage in the MPTCP selftests, we realised [1] the last version of the selftests is supposed to run on old kernels without issues. Supporting older versions is not that easy for this MPTCP case: these selftests are often validating the internals by checking packets that are exchanged, when some MIB counters are incremented after some actions, how connections are getting opened and closed in some cases, etc. In other words, it is not limited to the socket interface between the userspace and the kernelspace. In addition to that, the current MPTCP selftests run a lot of different sub-tests but the TAP13 protocol used in the selftests don't support sub-tests: one failure in sub-tests implies that the whole selftest is seen as failed at the end because sub-tests are not tracked. It is then important to skip sub-tests not supported by old kernels. To minimise the modifications and reduce the complexity to support old versions, the idea is to look at external signs and skip the whole selftest or just some sub-tests before starting them. This cannot be applied in all cases. Similar to the second part, this third one focuses on marking different sub-tests as skipped if some MPTCP features are not supported. This time, only in "mptcp_join.sh" selftest, the remaining one, is modified. Several techniques are used here to achieve this task: - Before starting some tests: - Check if a file (sysctl knob) is present: that's what patch 12/17 is doing for the userspace PM feature. - Check if a required kernel symbol is present in /proc/kallsyms: patches 9, 10, 14 and 15/17 are using this technique. - Check if it is possible to setup a particular network environment requiring Netfilter or TC: if the preparation step fail, the linked sub-test is marked as skipped. Patch 5/17 is doing that. - Check if a MIB counter is available: patches 7 and 13/17 do that. - Check if the kernel version is newer than a specific one: patch 1/17 adds some helpers in mptcp_lib.sh to ease its use. That's not ideal and it is only used as last resort but as mentioned above, it is important to skip tests if they are not supported not to have the whole selftest always being marked as failed on old kernels. Patches 11 and 17/17 are checking the kernel version. An alternative would be to ignore the results for some sub-tests but that's not ideal too. Note that SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK env var can be set to 1 not to skip these tests if the running kernel doesn't have a supported version. - After having launched the tests: - Adapt the expectations depending on the presence of a kernel symbol (patch 6/17) or a kernel version (patch 8/17). - Check is a MIB counter is available and skip the verification if not. Patch 4/17 is using this technique. Before skipping tests, SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var value is checked: if it is set to 1, the test is marked as "failed" instead of "skipped". MPTCP public CI expects to have all features supported and it sets this env var to 1 to catch regressions in these new checks. Patch 2/17 uses 'iptables-legacy' if available because it might be needed when using an older kernel not supporting iptables-nft. Patch 3/17 adds some helpers used in the other patches mentioned to easily mark sub-tests as skipped. Patch 16/17 uniforms MPTCP Join "listener" tests: it was imported code from userspace_pm.sh but without using the "code style" and ways of using tools and printing messages from MPTCP Join selftest. Link: https://lore.kernel.org/stable/CA+G9fYtDGpgT4dckXD-y-N92nqUxuvue_7AtDdBcHrbOMsDZLg@mail.gmail.com/ [1] Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 ==================== Link: https://lore.kernel.org/r/20230609-upstream-net-20230610-mptcp-selftests-support-old-kernels-part-3-v1-0-2896fe2ee8a3@tessares.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: join: skip mixed tests if not supportedMatthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features. One of them is the support of a mix of subflows in v4 and v6 by the in-kernel PM introduced by commit b9d69db87fb7 ("mptcp: let the in-kernel PM use mixed IPv4 and IPv6 addresses"). It looks like there is no external sign we can use to predict the expected behaviour. Instead of accepting different behaviours and thus not really checking for the expected behaviour, we are looking here for a specific kernel version. That's not ideal but it looks better than removing the test because it cannot support older kernel versions. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: ad3493746ebe ("selftests: mptcp: add test-cases for mixed v4/v6 subflows") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: join: uniform listener testsMatthieu Baerts
The alignment was different from the other tests because tabs were used instead of spaces. While at it, also use 'echo' instead of 'printf' to print the result to keep the same style as done in the other sub-tests. And, even if it should be better with, also remove 'stdbuf' and sed's '--unbuffered' option because they are not used in the other subtests and they are not available when using a minimal environment with busybox. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 178d023208eb ("selftests: mptcp: listener test for in-kernel PM") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: join: skip PM listener tests if not supportedMatthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features. One of them is the support of PM listener events introduced by commit f8c9dfbd875b ("mptcp: add pm listener events"). It is possible to look for "mptcp_event_pm_listener" in kallsyms to know in advance if the kernel supports this feature. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 178d023208eb ("selftests: mptcp: listener test for in-kernel PM") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: join: skip MPC backups tests if not supportedMatthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features. One of them is the support of sending an MP_PRIO signal for the initial subflow, introduced by commit c157bbe776b7 ("mptcp: allow the in kernel PM to set MPC subflow priority"). It is possible to look for "mptcp_subflow_send_ack" in kallsyms because it was needed to introduce the mentioned feature. So we can know in advance if the feature is supported instead of trying and accepting any results. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 914f6a59b10f ("selftests: mptcp: add MPC backup tests") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: join: skip fail tests if not supportedMatthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features. One of them is the support of the MP_FAIL / infinite mapping introduced by commit 1e39e5a32ad7 ("mptcp: infinite mapping sending") and the following ones. It is possible to look for one of the infinite mapping counters to know in advance if the this feature is available. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: b6e074e171bc ("selftests: mptcp: add infinite map testcase") Cc: stable@vger.kernel.org Fixes: 2ba18161d407 ("selftests: mptcp: add MP_FAIL reset testcase") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: join: skip userspace PM tests if not supportedMatthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features. One of them is the support of the userspace PM introduced by commit 4638de5aefe5 ("mptcp: handle local addrs announced by userspace PMs") and the following ones. It is possible to look for the MPTCP pm_type's sysctl knob to know in advance if the userspace PM is available. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 5ac1d2d63451 ("selftests: mptcp: Add tests for userspace PM type") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: join: skip fullmesh flag tests if not supportedMatthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features. One of them is the support of the fullmesh flag for the in-kernel PM introduced by commit 2843ff6f36db ("mptcp: remote addresses fullmesh") and commit 1a0d6136c5f0 ("mptcp: local addresses fullmesh"). It looks like there is no easy external sign we can use to predict the expected behaviour. We could add the flag and then check if it has been added but for that, and for each fullmesh test, we would need to setup a new environment, do the checks, clean it and then only start the test from yet another clean environment. To keep it simple and avoid introducing new issues, we look for a specific kernel version. That's not ideal but an acceptable solution for this case. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 6a0653b96f5d ("selftests: mptcp: add fullmesh setting tests") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: join: skip backup if set flag on ID not supportedMatthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features. Commit bccefb762439 ("selftests: mptcp: simplify pm_nl_change_endpoint") has simplified the way the backup flag is set on an endpoint. Instead of doing: ./pm_nl_ctl set 10.0.2.1 flags backup Now we do: ./pm_nl_ctl set id 1 flags backup The new way is easier to maintain but it is also incompatible with older kernels not supporting the implicit endpoints putting in place the infrastructure to set flags per ID, hence the second Fixes tag. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: bccefb762439 ("selftests: mptcp: simplify pm_nl_change_endpoint") Cc: stable@vger.kernel.org Fixes: 4cf86ae84c71 ("mptcp: strict local address ID selection") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: join: skip implicit tests if not supportedMatthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features. One of them is the support of the implicit endpoints introduced by commit d045b9eb95a9 ("mptcp: introduce implicit endpoints"). It is possible to look for "mptcp_subflow_send_ack" in kallsyms because it was needed to introduce the mentioned feature. So we can know in advance if the feature is supported instead of trying and accepting any results. Note that here and in the following commits, we re-do the same check for each sub-test of the same function for a few reasons. The main one is not to break the ID assign to each test in order to be able to easily compare results between different kernel versions. Also, we can still run a specific test even if it is skipped. Another reason is that it makes it clear during the review that a specific subtest will be skipped or not under certain conditions. At the end, it looks OK to call the exact same helper multiple times: it is not a critical path and it is the same code that is executed, not really more cases to maintain. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: join: support RM_ADDR for used endpoints or notMatthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features. At some points, a new feature caused internal behaviour changes we are verifying in the selftests, see the Fixes tag below. It was not a UAPI change but because in these selftests, we check some internal behaviours, it is normal we have to adapt them from time to time after having added some features. It looks like there is no external sign we can use to predict the expected behaviour. Instead of accepting different behaviours and thus not really checking for the expected behaviour, we are looking here for a specific kernel version. That's not ideal but it looks better than removing the test because it cannot support older kernel versions. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 6fa0174a7c86 ("mptcp: more careful RM_ADDR generation") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: join: skip Fastclose tests if not supportedMatthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features. One of them is the support of MP_FASTCLOSE introduced in commit f284c0c77321 ("mptcp: implement fastclose xmit path"). If the MIB counter is not available, the test cannot be verified and the behaviour will not be the expected one. So we can skip the test if the counter is missing. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 01542c9bf9ab ("selftests: mptcp: add fastclose testcase") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: join: support local endpoint being tracked or notMatthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features. At some points, a new feature caused internal behaviour changes we are verifying in the selftests, see the Fixes tag below. It was not a uAPI change but because in these selftests, we check some internal behaviours, it is normal we have to adapt them from time to time after having added some features. It is possible to look for "mptcp_pm_subflow_check_next" in kallsyms because it was needed to introduce the mentioned feature. So we can know in advance what the behaviour we are expecting here instead of supporting the two behaviours. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: join: skip test if iptables/tc cmds failMatthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features. Some tests are using IPTables and/or TC commands to force some behaviours. If one of these commands fails -- likely because some features are not available due to missing kernel config -- we should intercept the error and skip the tests requiring these features. Note that if we expect to have these features available and if SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests will be marked as failed instead of skipped. This patch also replaces the 'exit 1' by 'return 1' not to stop the selftest in the middle without the conclusion if there is an issue with NF or TC. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 8d014eaa9254 ("selftests: mptcp: add ADD_ADDR timeout test case") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: join: skip check if MIB counter not supportedMatthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features. One of them is the MPTCP MIB counters introduced in commit fc518953bc9c ("mptcp: add and use MIB counter infrastructure") and more later. The MPTCP Join selftest heavily relies on these counters. If a counter is not supported by the kernel, it is not displayed when using 'nstat -z'. We can then detect that and skip the verification. A new helper (get_counter()) has been added to do the required checks and return an error if the counter is not available. Note that if we expect to have these features available and if SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests will be marked as failed instead of skipped. This new helper also makes sure we get the exact counter we want to avoid issues we had in the past, e.g. with MPTcpExtRmAddr and MPTcpExtRmAddrDrop sharing the same prefix. While at it, we uniform the way we fetch a MIB counter. Note for the backports: we rarely change these modified blocks so if there is are conflicts, it is very likely because a counter is not used in the older kernels and we don't need that chunk. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: b08fbf241064 ("selftests: add test-cases for MPTCP MP_JOIN") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: join: helpers to skip testsMatthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features. Here are some helpers that will be used to mark subtests as skipped if a feature is not supported. Marking as a fix for the commit introducing this selftest to help with the backports. While at it, also check if kallsyms feature is available as it will also be used in the following commits to check if MPTCP features are available before starting a test. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: b08fbf241064 ("selftests: add test-cases for MPTCP MP_JOIN") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: join: use 'iptables-legacy' if availableMatthieu Baerts
IPTables commands using 'iptables-nft' fail on old kernels, at least 5.15 because it doesn't see the default IPTables chains: $ iptables -L iptables/1.8.2 Failed to initialize nft: Protocol not supported As a first step before switching to NFTables, we can use iptables-legacy if available. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 8d014eaa9254 ("selftests: mptcp: add ADD_ADDR timeout test case") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12selftests: mptcp: lib: skip if not below kernel versionMatthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features. A new function is now available to easily detect if a feature is missing by looking at the kernel version. That's clearly not ideal and this kind of check should be avoided as soon as possible. But sometimes, there are no external sign that a "feature" is available or not: internal behaviours can change without modifying the uAPI and these selftests are verifying the internal behaviours. Sometimes, the only (easy) way to verify if the feature is present is to run the test but then the validation cannot determine if there is a failure with the feature or if the feature is missing. Then it looks better to check the kernel version instead of having tests that can never fail. In any case, we need a solution not to have a whole selftest being marked as failed just because one sub-test has failed. Note that this env var car be set to 1 not to do such check and run the linked sub-test: SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK. This new helper is going to be used in the following commits. In order to ease the backport of such future patches, it would be good if this patch is backported up to the introduction of MPTCP selftests, hence the Fixes tag below: this type of check was supposed to be done from the beginning. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12Merge branch 'fixes-for-q-usgmii-speeds-and-autoneg'Jakub Kicinski
Maxime Chevallier says: ==================== fixes for Q-USGMII speeds and autoneg This is the second version of a small changeset for QUSGMII support, fixing inconsistencies in reported max speed and control word parsing. As reported here [1], there are some inconsistencies for the Q-USGMII mode speeds and configuration. The first patch in this fixup series makes so that we correctly report the max speed of 1Gbps for this mode. The second patch uses a dedicated helper to decode the control word. This is necessary as although USGMII control words are close to USXGMII, they don't support the same speeds. [1] : https://lore.kernel.org/netdev/ZHnd+6FUO77XFJvQ@shell.armlinux.org.uk/ ==================== Link: https://lore.kernel.org/r/20230609080305.546028-1-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12net: phylink: use a dedicated helper to parse usgmii control wordMaxime Chevallier
Q-USGMII is a derivative of USGMII, that uses a specific formatting for the control word. The layout is close to the USXGMII control word, but doesn't support speeds over 1Gbps. Use a dedicated decoding logic for the USGMII control word, re-using USXGMII definitions but only considering 10/100/1000Mbps speeds Fixes: 5e61fe157a27 ("net: phy: Introduce QUSGMII PHY mode") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12net: phylink: report correct max speed for QUSGMIIMaxime Chevallier
Q-USGMII is the quad port version of USGMII, and supports a max speed of 1Gbps on each line. Make so that phylink_interface_max_speed() reports this information correctly. Fixes: ae0e4bb2a0e0 ("net: phylink: Adjust link settings based on rate matching") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12Merge tag 'mm-hotfixes-stable-2023-06-12-12-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "19 hotfixes. 14 are cc:stable and the remainder address issues which were introduced during this development cycle or which were considered inappropriate for a backport" * tag 'mm-hotfixes-stable-2023-06-12-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: zswap: do not shrink if cgroup may not zswap page cache: fix page_cache_next/prev_miss off by one ocfs2: check new file size on fallocate call mailmap: add entry for John Keeping mm/damon/core: fix divide error in damon_nr_accesses_to_accesses_bp() epoll: ep_autoremove_wake_function should use list_del_init_careful mm/gup_test: fix ioctl fail for compat task nilfs2: reject devices with insufficient block count ocfs2: fix use-after-free when unmounting read-only filesystem lib/test_vmalloc.c: avoid garbage in page array nilfs2: fix possible out-of-bounds segment allocation in resize ioctl riscv/purgatory: remove PGO flags powerpc/purgatory: remove PGO flags x86/purgatory: remove PGO flags kexec: support purgatories with .text.hot sections mm/uffd: allow vma to merge as much as possible mm/uffd: fix vma operation where start addr cuts part of vma radix-tree: move declarations to header nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()
2023-06-12igb: fix nvm.ops.read() error handlingAleksandr Loktionov
Add error handling into igb_set_eeprom() function, in case nvm.ops.read() fails just quit with error code asap. Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-12igc: Fix possible system crash when loading moduleVinicius Costa Gomes
Guarantee that when probe() is run again, PTM and PCI busmaster will be in the same state as it was if the driver was never loaded. Avoid an i225/i226 hardware issue that PTM requests can be made even though PCI bus mastering is not enabled. These unexpected PTM requests can crash some systems. So, "force" disable PTM and busmastering before removing the driver, so they can be re-enabled in the right order during probe(). This is more like a workaround and should be applicable for i225 and i226, in any platform. Fixes: 1b5d73fb8624 ("igc: Enable PCIe PTM") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-12igc: Clean the TX buffer and TX descriptor ringMuhammad Husaini Zulkifli
There could be a race condition during link down where interrupt being generated and igc_clean_tx_irq() been called to perform the TX completion. Properly clear the TX buffer/descriptor ring and disable the TX Queue ring in igc_free_tx_resources() to avoid that. Kernel trace: [ 108.237177] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLIFUI1.R00.4204.A00.2105270302 05/27/2021 [ 108.237178] RIP: 0010:refcount_warn_saturate+0x55/0x110 [ 108.242143] RSP: 0018:ffff9e7980003db0 EFLAGS: 00010286 [ 108.245555] Code: 84 bc 00 00 00 c3 cc cc cc cc 85 f6 74 46 80 3d 20 8c 4d 01 00 75 ee 48 c7 c7 88 f4 03 ab c6 05 10 8c 4d 01 01 e8 0b 10 96 ff <0f> 0b c3 cc cc cc cc 80 3d fc 8b 4d 01 00 75 cb 48 c7 c7 b0 f4 03 [ 108.250434] [ 108.250434] RSP: 0018:ffff9e798125f910 EFLAGS: 00010286 [ 108.254358] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 108.259325] [ 108.259325] RAX: 0000000000000000 RBX: ffff8ddb935b8000 RCX: 0000000000000027 [ 108.261868] RDX: ffff8de250a28800 RSI: ffff8de250a1c580 RDI: ffff8de250a1c580 [ 108.265538] RDX: 0000000000000027 RSI: 0000000000000002 RDI: ffff8de250a9c588 [ 108.265539] RBP: ffff8ddb935b8000 R08: ffffffffab2655a0 R09: ffff9e798125f898 [ 108.267914] RBP: ffff8ddb8a5b8d80 R08: 0000005648eba354 R09: 0000000000000000 [ 108.270196] R10: 0000000000000001 R11: 000000002d2d2d2d R12: ffff9e798125f948 [ 108.270197] R13: ffff9e798125fa1c R14: ffff8ddb8a5b8d80 R15: 7fffffffffffffff [ 108.273001] R10: 000000002d2d2d2d R11: 000000002d2d2d2d R12: ffff8ddb8a5b8ed4 [ 108.276410] FS: 00007f605851b740(0000) GS:ffff8de250a80000(0000) knlGS:0000000000000000 [ 108.280597] R13: 00000000000002ac R14: 00000000ffffff99 R15: ffff8ddb92561b80 [ 108.282966] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 108.282967] CR2: 00007f053c039248 CR3: 0000000185850003 CR4: 0000000000f70ee0 [ 108.286206] FS: 0000000000000000(0000) GS:ffff8de250a00000(0000) knlGS:0000000000000000 [ 108.289701] PKRU: 55555554 [ 108.289702] Call Trace: [ 108.289704] <TASK> [ 108.293977] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 108.297562] sock_alloc_send_pskb+0x20c/0x240 [ 108.301494] CR2: 00007f053c03a168 CR3: 0000000184394002 CR4: 0000000000f70ef0 [ 108.301495] PKRU: 55555554 [ 108.306464] __ip_append_data.isra.0+0x96f/0x1040 [ 108.309441] Call Trace: [ 108.309443] ? __pfx_ip_generic_getfrag+0x10/0x10 [ 108.314927] <IRQ> [ 108.314928] sock_wfree+0x1c7/0x1d0 [ 108.318078] ? __pfx_ip_generic_getfrag+0x10/0x10 [ 108.320276] skb_release_head_state+0x32/0x90 [ 108.324812] ip_make_skb+0xf6/0x130 [ 108.327188] skb_release_all+0x16/0x40 [ 108.330775] ? udp_sendmsg+0x9f3/0xcb0 [ 108.332626] napi_consume_skb+0x48/0xf0 [ 108.334134] ? xfrm_lookup_route+0x23/0xb0 [ 108.344285] igc_poll+0x787/0x1620 [igc] [ 108.346659] udp_sendmsg+0x9f3/0xcb0 [ 108.360010] ? ttwu_do_activate+0x40/0x220 [ 108.365237] ? __pfx_ip_generic_getfrag+0x10/0x10 [ 108.366744] ? try_to_wake_up+0x289/0x5e0 [ 108.376987] ? sock_sendmsg+0x81/0x90 [ 108.395698] ? __pfx_process_timeout+0x10/0x10 [ 108.395701] sock_sendmsg+0x81/0x90 [ 108.409052] __napi_poll+0x29/0x1c0 [ 108.414279] ____sys_sendmsg+0x284/0x310 [ 108.419507] net_rx_action+0x257/0x2d0 [ 108.438216] ___sys_sendmsg+0x7c/0xc0 [ 108.439723] __do_softirq+0xc1/0x2a8 [ 108.444950] ? finish_task_switch+0xb4/0x2f0 [ 108.452077] irq_exit_rcu+0xa9/0xd0 [ 108.453584] ? __schedule+0x372/0xd00 [ 108.460713] common_interrupt+0x84/0xa0 [ 108.467840] ? clockevents_program_event+0x95/0x100 [ 108.474968] </IRQ> [ 108.482096] ? do_nanosleep+0x88/0x130 [ 108.489224] <TASK> [ 108.489225] asm_common_interrupt+0x26/0x40 [ 108.496353] ? __rseq_handle_notify_resume+0xa9/0x4f0 [ 108.503478] RIP: 0010:cpu_idle_poll+0x2c/0x100 [ 108.510607] __sys_sendmsg+0x5d/0xb0 [ 108.518687] Code: 05 e1 d9 c8 00 65 8b 15 de 64 85 55 85 c0 7f 57 e8 b9 ef ff ff fb 65 48 8b 1c 25 00 cc 02 00 48 8b 03 a8 08 74 0b eb 1c f3 90 <48> 8b 03 a8 08 75 13 8b 05 77 63 cd 00 85 c0 75 ed e8 ce ec ff ff [ 108.525817] do_syscall_64+0x44/0xa0 [ 108.531563] RSP: 0018:ffffffffab203e70 EFLAGS: 00000202 [ 108.538693] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 108.546775] [ 108.546777] RIP: 0033:0x7f605862b7f7 [ 108.549495] RAX: 0000000000000001 RBX: ffffffffab20c940 RCX: 000000000000003b [ 108.551955] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 108.554068] RDX: 4000000000000000 RSI: 000000002da97f6a RDI: 00000000002b8ff4 [ 108.559816] RSP: 002b:00007ffc99264058 EFLAGS: 00000246 [ 108.564178] RBP: 0000000000000000 R08: 00000000002b8ff4 R09: ffff8ddb01554c80 [ 108.571302] ORIG_RAX: 000000000000002e [ 108.571303] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f605862b7f7 [ 108.574023] R10: 000000000000015b R11: 000000000000000f R12: ffffffffab20c940 [ 108.574024] R13: 0000000000000000 R14: ffff8de26fbeef40 R15: ffffffffab20c940 [ 108.578727] RDX: 0000000000000000 RSI: 00007ffc992640a0 RDI: 0000000000000003 [ 108.578728] RBP: 00007ffc99264110 R08: 0000000000000000 R09: 175f48ad1c3a9c00 [ 108.581187] do_idle+0x62/0x230 [ 108.585890] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc992642d8 [ 108.585891] R13: 00005577814ab2ba R14: 00005577814addf0 R15: 00007f605876d000 [ 108.587920] cpu_startup_entry+0x1d/0x20 [ 108.591422] </TASK> [ 108.596127] rest_init+0xc5/0xd0 [ 108.600490] ---[ end trace 0000000000000000 ]--- Test Setup: DUT: - Change mac address on DUT Side. Ensure NIC not having same MAC Address - Running udp_tai on DUT side. Let udp_tai running throughout the test Example: ./udp_tai -i enp170s0 -P 100000 -p 90 -c 1 -t 0 -u 30004 Host: - Perform link up/down every 5 second. Result: Kernel panic will happen on DUT Side. Fixes: 13b5b7fd6a4a ("igc: Add support for Tx/Rx rings") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-12zswap: do not shrink if cgroup may not zswapNhat Pham
Before storing a page, zswap first checks if the number of stored pages exceeds the limit specified by memory.zswap.max, for each cgroup in the hierarchy. If this limit is reached or exceeded, then zswap shrinking is triggered and short-circuits the store attempt. However, since the zswap's LRU is not memcg-aware, this can create the following pathological behavior: the cgroup whose zswap limit is 0 will evict pages from other cgroups continually, without lowering its own zswap usage. This means the shrinking will continue until the need for swap ceases or the pool becomes empty. As a result of this, we observe a disproportionate amount of zswap writeback and a perpetually small zswap pool in our experiments, even though the pool limit is never hit. More generally, a cgroup might unnecessarily evict pages from other cgroups before we drive the memcg back below its limit. This patch fixes the issue by rejecting zswap store attempt without shrinking the pool when obj_cgroup_may_zswap() returns false. [akpm@linux-foundation.org: fix return of unintialized value] [akpm@linux-foundation.org: s/ENOSPC/ENOMEM/] Link: https://lkml.kernel.org/r/20230530222440.2777700-1-nphamcs@gmail.com Link: https://lkml.kernel.org/r/20230530232435.3097106-1-nphamcs@gmail.com Fixes: f4840ccfca25 ("zswap: memcg accounting") Signed-off-by: Nhat Pham <nphamcs@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12page cache: fix page_cache_next/prev_miss off by oneMike Kravetz
Ackerley Tng reported an issue with hugetlbfs fallocate here[1]. The issue showed up after the conversion of hugetlb page cache lookup code to use page_cache_next_miss. Code in hugetlb fallocate, userfaultfd and GUP is now using page_cache_next_miss to determine if a page is present the page cache. The following statement is used. present = page_cache_next_miss(mapping, index, 1) != index; There are two issues with page_cache_next_miss when used in this way. 1) If the passed value for index is equal to the 'wrap-around' value, the same index will always be returned. This wrap-around value is 0, so 0 will be returned even if page is present at index 0. 2) If there is no gap in the range passed, the last index in the range will be returned. When passed a range of 1 as above, the passed index value will be returned even if the page is present. The end result is the statement above will NEVER indicate a page is present in the cache, even if it is. As noted by Ackerley in [1], users can see this by hugetlb fallocate incorrectly returning EEXIST if pages are already present in the file. In addition, hugetlb pages will not be included in core dumps if they need to be brought in via GUP. userfaultfd UFFDIO_COPY also uses this code and will not notice pages already present in the cache. It may try to allocate a new page and potentially return ENOMEM as opposed to EEXIST. Both page_cache_next_miss and page_cache_prev_miss have similar issues. Fix by: - Check for index equal to 'wrap-around' value and do not exit early. - If no gap is found in range, return index outside range. - Update function description to say 'wrap-around' value could be returned if passed as index. [1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/ Link: https://lkml.kernel.org/r/20230602225747.103865-2-mike.kravetz@oracle.com Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Cc: Erdem Aktas <erdemaktas@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vishal Annapurve <vannapurve@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12ocfs2: check new file size on fallocate callLuís Henriques
When changing a file size with fallocate() the new size isn't being checked. In particular, the FSIZE ulimit isn't being checked, which makes fstest generic/228 fail. Simply adding a call to inode_newsize_ok() fixes this issue. Link: https://lkml.kernel.org/r/20230529152645.32680-1-lhenriques@suse.de Signed-off-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Mark Fasheh <mark@fasheh.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12mailmap: add entry for John KeepingJohn Keeping
Map my corporate address to my personal one, as I am leaving the company. Link: https://lkml.kernel.org/r/20230531144839.1157112-1-john@keeping.me.uk Signed-off-by: John Keeping <john@keeping.me.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12mm/damon/core: fix divide error in damon_nr_accesses_to_accesses_bp()Kefeng Wang
If 'aggr_interval' is smaller than 'sample_interval', max_nr_accesses in damon_nr_accesses_to_accesses_bp() becomes zero which leads to divide error, let's validate the values of them in damon_set_attrs() to fix it, which similar to others attrs check. Link: https://lkml.kernel.org/r/20230527032101.167788-1-wangkefeng.wang@huawei.com Fixes: 2f5bef5a590b ("mm/damon/core: update monitoring results for new monitoring attributes") Reported-by: syzbot+841a46899768ec7bec67@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=841a46899768ec7bec67 Link: https://lore.kernel.org/damon/00000000000055fc4e05fc975bc2@google.com/ Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>