field | value | date |
---|---|---|
author | Linus Torvalds <torvalds@linux-foundation.org> | 2025-07-30 08:58:55 -0700 |
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2025-07-30 08:58:55 -0700 |
commit | 8be4d31cb8aaeea27bde4b7ddb26e28a89062ebf | |
tree | fec3039a08284cd87f4ec9c3bea5b5a439f1859f /drivers/net/tun.c | |
parent | 4b290aae788e06561754b28c6842e4080957d3f7 | |
parent | fa582ca7e187a15e772e6a72fe035f649b387a60 | |
Merge tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Wrap datapath globals into net_aligned_data, to avoid false sharing
- Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container)
- Add SO_INQ and SCM_INQ support to AF_UNIX
- Add SIOCINQ support to AF_VSOCK
- Add TCP_MAXSEG sockopt to MPTCP
- Add IPv6 force_forwarding sysctl to enable forwarding per interface
- Make TCP validation of whether packet fully fits in the receive
window and the rcv_buf more strict. With increased use of HW
aggregation a single "packet" can be multiple 100s of kB
- Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
improves latency up to 33% for sockmap users
- Convert TCP send queue handling from tasklet to BH workqueue
- Improve BPF iteration over TCP sockets to see each socket exactly
once
- Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code
- Support enabling kernel threads for NAPI processing on per-NAPI
instance basis rather than a whole device. Fully stop the kernel
NAPI thread when threaded NAPI gets disabled. Previously thread
would stick around until ifdown due to tricky synchronization
- Allow multicast routing to take effect on locally-generated packets
- Add output interface argument for End.X in segment routing
- MCTP: add support for gateway routing, improve bind() handling
- Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink
- Add a new neighbor flag ("extern_valid"), which cedes refresh
responsibilities to userspace. This is needed for EVPN multi-homing
where a neighbor entry for a multi-homed host needs to be synced
across all the VTEPs among which the host is multi-homed
- Support NUD_PERMANENT for proxy neighbor entries
- Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM
- Add sequence numbers to netconsole messages. Unregister
netconsole's console when all net targets are removed. Code
refactoring. Add a number of selftests
- Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
should be used for an inbound SA lookup
- Support inspecting ref_tracker state via DebugFS
- Don't force bonding advertisement frames tx to ~333 ms boundaries.
Add broadcast_neighbor option to send ARP/ND on all bonded links
- Allow providing upcall pid for the 'execute' command in openvswitch
- Remove DCCP support from Netfilter's conntrack
- Disallow multiple packet duplications in the queuing layer
- Prevent use of deprecated iptables code on PREEMPT_RT
Driver API:
- Support RSS and hashing configuration over ethtool Netlink
- Add dedicated ethtool callbacks for getting and setting hashing
fields
- Add support for power budget evaluation strategy in PSE /
Power-over-Ethernet. Generate Netlink events for overcurrent etc
- Support DPLL phase offset monitoring across all device inputs.
Support providing clock reference and SYNC over separate DPLL
inputs
- Support traffic classes in devlink rate API for bandwidth
management
- Remove rtnl_lock dependency from UDP tunnel port configuration
Device drivers:
- Add a new Broadcom driver for 800G Ethernet (bnge)
- Add a standalone driver for Microchip ZL3073x DPLL
- Remove IBM's NETIUCV device driver
- Ethernet high-speed NICs:
  - Broadcom (bnxt):
    - support zero-copy Tx of DMABUF memory
    - take page size into account for page pool recycling rings
  - Intel (100G, ice, idpf):
    - idpf: XDP and AF_XDP support preparations
    - idpf: add flow steering
    - add link_down_events statistic
    - clean up the TSPLL code
    - preparations for live VM migration
  - nVidia/Mellanox:
    - support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
    - optimize context memory usage for matchers
    - expose serial numbers in devlink info
    - support PCIe congestion metrics
  - Meta (fbnic):
    - add 25G, 50G, and 100G link modes to phylink
    - support dumping FW logs
  - Marvell/Cavium:
    - support for CN20K generation of the Octeon chips
  - Amazon:
    - add HW clock (without timestamping, just hypervisor time access)
- Ethernet virtual:
  - VirtIO net:
    - support segmentation of UDP-tunnel-encapsulated packets
  - Google (gve):
    - support packet timestamping and clock synchronization
  - Microsoft vNIC:
    - add handler for device-originated servicing events
    - allow dynamic MSI-X vector allocation
    - support Tx bandwidth clamping
- Ethernet NICs consumer, and embedded:
  - AMD:
    - amd-xgbe: hardware timestamping and PTP clock support
  - Broadcom integrated MACs (bcmgenet, bcmasp):
    - use napi_complete_done() return value to support NAPI polling
    - add support for re-starting auto-negotiation
  - Broadcom switches (b53):
    - support BCM5325 switches
    - add bcm63xx EPHY power control
  - Synopsys (stmmac):
    - lots of code refactoring and cleanups
  - TI:
    - icssg-prueth: read firmware-names from device tree
    - icssg: PRP offload support
  - Microchip:
    - lan78xx: convert to PHYLINK for improved PHY and MAC management
    - ksz: add KSZ8463 switch support
  - Intel:
    - support similar queue priority scheme in multi-queue and
      time-sensitive networking (taprio)
    - support packet pre-emption in both
  - RealTek (r8169):
    - enable EEE at 5Gbps on RTL8126
  - Airoha:
    - add PPPoE offload support
    - MDIO bus controller for Airoha AN7583
- Ethernet PHYs:
  - support for the IPQ5018 internal GE PHY
  - micrel KSZ9477 switch-integrated PHYs:
    - add MDI/MDI-X control support
    - add RX error counters
    - add cable test support
    - add Signal Quality Indicator (SQI) reporting
  - dp83tg720: improve reset handling and reduce link recovery time
  - support bcm54811 (and its MII-Lite interface type)
  - air_en8811h: support resume/suspend
  - support PHY counters for QCA807x and QCA808x
  - support WoL for QCA807x
- CAN drivers:
  - rcar_canfd: support for Transceiver Delay Compensation
  - kvaser: report FW versions via devlink dev info
- WiFi:
  - extended regulatory info support (6 GHz)
  - add statistics and beacon monitor for Multi-Link Operation (MLO)
  - support S1G aggregation, improve S1G support
  - add Radio Measurement action fields
  - support per-radio RTS threshold
  - some work around how FIPS affects wifi, which was wrong (RC4 is
    used by TKIP, not only WEP)
  - improvements for unsolicited probe response handling
- WiFi drivers:
  - RealTek (rtw88):
    - IBSS mode for SDIO devices
  - RealTek (rtw89):
    - BT coexistence for MLO/WiFi7
    - concurrent station + P2P support
    - support for USB devices RTL8851BU/RTL8852BU
  - Intel (iwlwifi):
    - use embedded PNVM in (to be released) FW images to fix
      compatibility issues
    - many cleanups (unused FW APIs, PCIe code, WoWLAN)
    - some FIPS interoperability
  - MediaTek (mt76):
    - firmware recovery improvements
    - more MLO work
  - Qualcomm/Atheros (ath12k):
    - fix scan on multi-radio devices
    - more EHT/Wi-Fi 7 features
    - encapsulation/decapsulation offload
  - Broadcom (brcm80211):
    - support SDIO 43751 device
- Bluetooth:
  - hci_event: add support for handling LE BIG Sync Lost event
  - ISO: add socket option to report packet seqnum via CMSG
  - ISO: support SCM_TIMESTAMPING for ISO TS
- Bluetooth drivers:
  - intel_pcie: support Function Level Reset
  - nxpuart: add support for 4M baudrate
  - nxpuart: implement powerup sequence, reset, FW dump, and FW loading"
* tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits)
dpll: zl3073x: Fix build failure
selftests: bpf: fix legacy netfilter options
ipv6: annotate data-races around rt->fib6_nsiblings
ipv6: fix possible infinite loop in fib6_info_uses_dev()
ipv6: prevent infinite loop in rt6_nlmsg_size()
ipv6: add a retry logic in net6_rt_notify()
vrf: Drop existing dst reference in vrf_ip6_input_dst
net/sched: taprio: align entry index attr validation with mqprio
net: fsl_pq_mdio: use dev_err_probe
selftests: rtnetlink.sh: remove esp4_offload after test
vsock: remove unnecessary null check in vsock_getname()
igb: xsk: solve negative overflow of nb_pkts in zerocopy mode
stmmac: xsk: fix negative overflow of budget in zerocopy mode
dt-bindings: ieee802154: Convert at86rf230.txt yaml format
net: dsa: microchip: Disable PTP function of KSZ8463
net: dsa: microchip: Setup fiber ports for KSZ8463
net: dsa: microchip: Write switch MAC address differently for KSZ8463
net: dsa: microchip: Use different registers for KSZ8463
net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver
dt-bindings: net: dsa: microchip: Add KSZ8463 switch support
...
Diffstat (limited to 'drivers/net/tun.c')
-rw-r--r-- | drivers/net/tun.c | 72 |
1 files changed, 55 insertions, 17 deletions
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index f8c5e2fd04df..cc6c50180663 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -186,7 +186,8 @@ struct tun_struct {
 	struct net_device *dev;
 	netdev_features_t set_features;
 #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
-			  NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4)
+			  NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4 | \
+			  NETIF_F_GSO_UDP_TUNNEL | NETIF_F_GSO_UDP_TUNNEL_CSUM)
 
 	int align;
 	int vnet_hdr_sz;
@@ -925,6 +926,7 @@ static int tun_net_init(struct net_device *dev)
 	dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
 			   TUN_USER_FEATURES | NETIF_F_HW_VLAN_CTAG_TX |
 			   NETIF_F_HW_VLAN_STAG_TX;
+	dev->hw_enc_features = dev->hw_features;
 	dev->features = dev->hw_features;
 	dev->vlan_features = dev->features &
 			     ~(NETIF_F_HW_VLAN_CTAG_TX |
@@ -1000,8 +1002,8 @@ static unsigned int run_ebpf_filter(struct tun_struct *tun,
 /* Net device start xmit */
 static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 {
+	enum skb_drop_reason drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
 	struct tun_struct *tun = netdev_priv(dev);
-	enum skb_drop_reason drop_reason;
 	int txq = skb->queue_mapping;
 	struct netdev_queue *queue;
 	struct tun_file *tfile;
@@ -1030,10 +1032,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	if (tfile->socket.sk->sk_filter &&
-	    sk_filter(tfile->socket.sk, skb)) {
-		drop_reason = SKB_DROP_REASON_SOCKET_FILTER;
+	    sk_filter_reason(tfile->socket.sk, skb, &drop_reason))
 		goto drop;
-	}
 
 	len = run_ebpf_filter(tun, skb, len);
 	if (len == 0) {
@@ -1698,7 +1698,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	struct sk_buff *skb;
 	size_t total_len = iov_iter_count(from);
 	size_t len = total_len, align = tun->align, linear;
-	struct virtio_net_hdr gso = { 0 };
+	struct virtio_net_hdr_v1_hash_tunnel hdr;
+	struct virtio_net_hdr *gso;
 	int good_linear;
 	int copylen;
 	int hdr_len = 0;
@@ -1708,6 +1709,15 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	int skb_xdp = 1;
 	bool frags = tun_napi_frags_enabled(tfile);
 	enum skb_drop_reason drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
+	netdev_features_t features = 0;
+
+	/*
+	 * Keep it easy and always zero the whole buffer, even if the
+	 * tunnel-related field will be touched only when the feature
+	 * is enabled and the hdr size id compatible.
+	 */
+	memset(&hdr, 0, sizeof(hdr));
+	gso = (struct virtio_net_hdr *)&hdr;
 
 	if (!(tun->flags & IFF_NO_PI)) {
 		if (len < sizeof(pi))
@@ -1721,7 +1731,9 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	if (tun->flags & IFF_VNET_HDR) {
 		int vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz);
 
-		hdr_len = tun_vnet_hdr_get(vnet_hdr_sz, tun->flags, from, &gso);
+		features = tun_vnet_hdr_guest_features(vnet_hdr_sz);
+		hdr_len = __tun_vnet_hdr_get(vnet_hdr_sz, tun->flags,
+					     features, from, gso);
 		if (hdr_len < 0)
 			return hdr_len;
 
@@ -1755,7 +1767,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		 * (e.g gso or jumbo packet), we will do it at after
 		 * skb was created with generic XDP routine.
 		 */
-		skb = tun_build_skb(tun, tfile, from, &gso, len, &skb_xdp);
+		skb = tun_build_skb(tun, tfile, from, gso, len, &skb_xdp);
 		err = PTR_ERR_OR_ZERO(skb);
 		if (err)
 			goto drop;
@@ -1799,7 +1811,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		}
 	}
 
-	if (tun_vnet_hdr_to_skb(tun->flags, skb, &gso)) {
+	if (tun_vnet_hdr_tnl_to_skb(tun->flags, features, skb, &hdr)) {
 		atomic_long_inc(&tun->rx_frame_errors);
 		err = -EINVAL;
 		goto free_skb;
@@ -2050,13 +2062,21 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 	}
 
 	if (vnet_hdr_sz) {
-		struct virtio_net_hdr gso;
+		struct virtio_net_hdr_v1_hash_tunnel hdr;
+		struct virtio_net_hdr *gso;
 
-		ret = tun_vnet_hdr_from_skb(tun->flags, tun->dev, skb, &gso);
+		ret = tun_vnet_hdr_tnl_from_skb(tun->flags, tun->dev, skb,
+						&hdr);
 		if (ret)
 			return ret;
 
-		ret = tun_vnet_hdr_put(vnet_hdr_sz, iter, &gso);
+		/*
+		 * Drop the packet if the configured header size is too small
+		 * WRT the enabled offloads.
+		 */
+		gso = (struct virtio_net_hdr *)&hdr;
+		ret = __tun_vnet_hdr_put(vnet_hdr_sz, tun->dev->features,
+					 iter, gso);
 		if (ret)
 			return ret;
 	}
@@ -2356,13 +2376,14 @@ static int tun_xdp_one(struct tun_struct *tun,
 		       struct tun_page *tpage)
 {
 	unsigned int datasize = xdp->data_end - xdp->data;
-	struct tun_xdp_hdr *hdr = xdp->data_hard_start;
-	struct virtio_net_hdr *gso = &hdr->gso;
+	struct virtio_net_hdr *gso = xdp->data_hard_start;
+	struct virtio_net_hdr_v1_hash_tunnel *tnl_hdr;
 	struct bpf_prog *xdp_prog;
 	struct sk_buff *skb = NULL;
 	struct sk_buff_head *queue;
+	netdev_features_t features;
 	u32 rxhash = 0, act;
-	int buflen = hdr->buflen;
+	int buflen = xdp->frame_sz;
 	int metasize = 0;
 	int ret = 0;
 	bool skb_xdp = false;
@@ -2426,7 +2447,9 @@ build:
 	if (metasize > 0)
 		skb_metadata_set(skb, metasize);
 
-	if (tun_vnet_hdr_to_skb(tun->flags, skb, gso)) {
+	features = tun_vnet_hdr_guest_features(READ_ONCE(tun->vnet_hdr_sz));
+	tnl_hdr = (struct virtio_net_hdr_v1_hash_tunnel *)gso;
+	if (tun_vnet_hdr_tnl_to_skb(tun->flags, features, skb, tnl_hdr)) {
 		atomic_long_inc(&tun->rx_frame_errors);
 		kfree_skb(skb);
 		ret = -EINVAL;
@@ -2812,6 +2835,8 @@ static void tun_get_iff(struct tun_struct *tun, struct ifreq *ifr)
 
 }
 
+#define PLAIN_GSO (NETIF_F_GSO_UDP_L4 | NETIF_F_TSO | NETIF_F_TSO6)
+
 /* This is like a cut-down ethtool ops, except done via tun fd so no
  * privs required. */
 static int set_offload(struct tun_struct *tun, unsigned long arg)
@@ -2841,6 +2866,18 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
 			features |= NETIF_F_GSO_UDP_L4;
 			arg &= ~(TUN_F_USO4 | TUN_F_USO6);
 		}
+
+		/*
+		 * Tunnel offload is allowed only if some plain offload is
+		 * available, too.
+		 */
+		if (features & PLAIN_GSO && arg & TUN_F_UDP_TUNNEL_GSO) {
+			features |= NETIF_F_GSO_UDP_TUNNEL;
+			if (arg & TUN_F_UDP_TUNNEL_GSO_CSUM)
+				features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
+			arg &= ~(TUN_F_UDP_TUNNEL_GSO |
+				 TUN_F_UDP_TUNNEL_GSO_CSUM);
+		}
 	}
 
 	/* This gives the user a way to test for new features in future by
@@ -3186,7 +3223,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 
 	case SIOCGIFHWADDR:
 		/* Get hw address */
-		dev_get_mac_address(&ifr.ifr_hwaddr, net, tun->dev->name);
+		netif_get_mac_address(&ifr.ifr_hwaddr, net, tun->dev->name);
 		if (copy_to_user(argp, &ifr, ifreq_len))
 			ret = -EFAULT;
 		break;
@@ -3695,3 +3732,4 @@ MODULE_AUTHOR(DRV_COPYRIGHT);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_MISCDEV(TUN_MINOR);
 MODULE_ALIAS("devname:net/tun");
+MODULE_IMPORT_NS("NETDEV_INTERNAL");
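
The set_offload() hunk in this diff allows UDP tunnel GSO only when at least one plain GSO feature is already enabled, and clears the tunnel request bits from arg only in that case. The gating can be sketched in userspace C as below. This is a rough illustration under stated assumptions, not the kernel code: the SK_* constants, struct sk_offload, and sk_apply_tunnel_offload() are hypothetical stand-ins with made-up bit values for the kernel's TUN_F_* and NETIF_F_* definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical request bits, standing in for TUN_F_UDP_TUNNEL_GSO{,_CSUM}. */
#define SK_REQ_UDP_TUNNEL_GSO       (1u << 0)
#define SK_REQ_UDP_TUNNEL_GSO_CSUM  (1u << 1)

/* Hypothetical feature bits, standing in for the NETIF_F_* flags. */
#define SK_FEAT_TSO                 (1u << 0)
#define SK_FEAT_TSO6                (1u << 1)
#define SK_FEAT_GSO_UDP_L4          (1u << 2)
#define SK_FEAT_GSO_UDP_TUNNEL      (1u << 3)
#define SK_FEAT_GSO_UDP_TUNNEL_CSUM (1u << 4)

/* Mirrors PLAIN_GSO from the diff: any non-tunnel segmentation offload. */
#define SK_PLAIN_GSO (SK_FEAT_GSO_UDP_L4 | SK_FEAT_TSO | SK_FEAT_TSO6)

struct sk_offload {
	uint32_t features; /* feature bits enabled so far */
	uint32_t arg;      /* request bits not yet consumed */
};

/*
 * Enable tunnel GSO only when some plain GSO feature is already set,
 * and consume the tunnel request bits only in that case.
 */
static struct sk_offload sk_apply_tunnel_offload(struct sk_offload s)
{
	if ((s.features & SK_PLAIN_GSO) && (s.arg & SK_REQ_UDP_TUNNEL_GSO)) {
		s.features |= SK_FEAT_GSO_UDP_TUNNEL;
		if (s.arg & SK_REQ_UDP_TUNNEL_GSO_CSUM)
			s.features |= SK_FEAT_GSO_UDP_TUNNEL_CSUM;
		s.arg &= ~(SK_REQ_UDP_TUNNEL_GSO | SK_REQ_UDP_TUNNEL_GSO_CSUM);
	}
	return s;
}

/* Small self-check exercising one accepted and one rejected request. */
static void sk_selfcheck(void)
{
	struct sk_offload ok = sk_apply_tunnel_offload((struct sk_offload){
		.features = SK_FEAT_TSO,
		.arg = SK_REQ_UDP_TUNNEL_GSO | SK_REQ_UDP_TUNNEL_GSO_CSUM,
	});
	assert(ok.features & SK_FEAT_GSO_UDP_TUNNEL);
	assert(ok.features & SK_FEAT_GSO_UDP_TUNNEL_CSUM);
	assert(ok.arg == 0);

	struct sk_offload no = sk_apply_tunnel_offload((struct sk_offload){
		.features = 0,
		.arg = SK_REQ_UDP_TUNNEL_GSO,
	});
	assert(no.features == 0);
	assert(no.arg == SK_REQ_UDP_TUNNEL_GSO);
}
```

In this sketch, a tunnel GSO request made without any plain GSO feature leaves the request bits set in arg, so a caller can still see that they were not honored. A checksum-only request (SK_REQ_UDP_TUNNEL_GSO_CSUM without SK_REQ_UDP_TUNNEL_GSO) is likewise left unconsumed, matching the nesting of the conditions in the hunk.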