path: root/net/ipv4
Age  Commit message  Author
2005-10-10[PATCH] BIC coding bug in Linux 2.6.13Stephen Hemminger
Please consider this change for 2.6.13-stable. Since BIC is the default congestion control algorithm, this fix is quite important. A missing parenthesis causes BIC to be slow in increasing the congestion window. Spotted by Injong Rhee. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-10-03[PATCH] Don't over-clamp window in tcp_clamp_window()Alexey Kuznetsov
Better handle the case where the sender sends full sized frames initially, then moves to a mode where it trickles out small amounts of data at a time. This known problem is even mentioned in the comments above tcp_grow_window() in tcp_input.c, specifically: ... * The scheme does not work when sender sends good segments opening * window and then starts to feed us spagetti. But it should work * in common situations. Otherwise, we have to rely on queue collapsing. ... When the sender gives full sized frames, the "struct sk_buff" overhead from each packet is small. So we'll advertise a larger window. If the sender moves to a mode where small segments are sent, this ratio becomes tilted to the other extreme and we start overrunning the socket buffer space. tcp_clamp_window() tries to address this, but its clamping of tp->window_clamp is a wee bit too aggressive for this particular case. Fix confirmed by Ion Badulescu. Signed-off-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@osdl.org>
2005-10-03[PATCH] tcp: set default congestion control correctly for incoming connectionsStephen Hemminger
Patch from Joel Sing to fix the default congestion control algorithm for incoming connections. If a new congestion control handler is added (via module), it should become the default for new connections. Instead, the incoming connections use reno. The cause is incorrect initialisation, which makes the tcp_init_congestion_control() function return after the initial if test fails. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Acked-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@osdl.org>
2005-10-03[PATCH] ipvs: ip_vs_ftp breaks connections using persistenceJulian Anastasov
ip_vs_ftp, when loaded, can create NAT connections with an unknown client port for passive FTP. For such expectations we look up with cport=0 on the incoming packet, but that matches the format of the persistence templates, causing packets to other persistent virtual servers to be forwarded to the real server without creating a connection. Later the reply packets are treated as foreign and not SNAT-ed. If the IPVS box serves both FTP and other services (e.g. HTTP), then while we wait for the first packet of the FTP data connections with unknown client port (there can be many), other HTTP connections that have nothing in common with the FTP connection break: the HTTP client sends a SYN to the virtual IP, but the SYN+ACK is not NAT-ed properly in the IPVS box and the client box returns RST to the real server IP. The result can be 10% broken HTTP traffic if 10% of the time there are passive FTP connections in connecting state. It hurts only IPVS connections. This patch changes the connection lookup for packets from clients:
* introduce the IP_VS_CONN_F_TEMPLATE connection flag to mark the connection as a template
* create a new connection lookup function just for templates - ip_vs_ct_in_get
* make sure ip_vs_conn_in_get hits only connections with the IP_VS_CONN_F_NO_CPORT flag set when s_port is 0. This way we avoid returning a template when looking for cport=0 (ftp)
Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Chris Wright <chrisw@osdl.org>
2005-09-16[PATCH] Fix DHCP + MASQUERADE problemPatrick McHardy
In 2.6.13-rcX the MASQUERADE target was changed not to exclude local packets for better source address consistency. This breaks DHCP clients using UDP sockets when the DHCP requests are caught by a MASQUERADE rule because the MASQUERADE target drops packets when no address is configured on the outgoing interface. This patch makes it ignore packets with a source address of 0. Thanks to Rusty for this suggestion. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Chris Wright <chrisw@osdl.org>
2005-09-09[PATCH] raw_sendmsg DoS (CAN-2005-2492)Al Viro
Fix unchecked __get_user that could be tricked into generating a memory read on an arbitrary address. The result of the read is not returned directly but you may be able to divine some information about it, or use the read to cause a crash on some architectures by reading hardware state. CAN-2005-2492. Fix from Al Viro, ack from Dave Miller. Signed-off-by: Chris Wright <chrisw@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-09-09[PATCH] Reassembly trim not clearing CHECKSUM_HWStephen Hemminger
[IPV4]: Reassembly trim not clearing CHECKSUM_HW This was found by inspection while looking for checksum problems with the skge driver that sets CHECKSUM_HW. It did not fix the problem, but it looks like it is needed. If IP reassembly is trimming an overlapping fragment, it should reset (or adjust) the hardware checksum flag on the skb. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-08-23[FIB_TRIE]: Don't ignore negative results from fib_semantic_matchPatrick McHardy
When a semantic match occurs either success, not found or an error (for matching unreachable routes/blackholes) is returned. fib_trie ignores the errors and looks for a different matching route. Treat results other than "no match" as success and end lookup. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
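The lookup rule described above can be sketched in a few lines of C. This is a minimal illustration, not the fib_trie code: the function name `lookup_route`, the array of precomputed per-candidate results, and the return convention (0 for success, 1 for "no match", negative for an error such as an unreachable or blackhole route) are all assumptions made for the example.

```c
#include <stddef.h>

/* Assumed convention for this sketch only:
 * 0 = route found, 1 = no match, negative = error. */
enum { MATCH_OK = 0, MATCH_NONE = 1 };

/* Hypothetical lookup: results[] stands in for the outcome of a
 * semantic match on each candidate, in search order. */
static int lookup_route(const int *results, size_t n, size_t *hit)
{
    for (size_t i = 0; i < n; i++) {
        if (results[i] == MATCH_NONE)
            continue;        /* only "no match" continues the search */
        *hit = i;            /* success or error: the lookup ends here */
        return results[i];
    }
    return MATCH_NONE;       /* every candidate reported "no match" */
}
```

With results of {1, -1, 0}, the lookup stops at the second candidate and returns the error instead of skipping ahead to the third, which is the behaviour the commit message asks for.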
2005-08-23[TCP]: Document non-trivial locking path in tcp_v{4,6}_get_port().David S. Miller
This trips up a lot of folks reading this code. Put an unlikely() around the port-exhaustion test for good measure. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23[TCP]: Unconditionally clear TCP_NAGLE_PUSH in skb_entail().David S. Miller
Intention of this bit is to force pushing of the existing send queue when TCP_CORK or TCP_NODELAY state changes via setsockopt(). But it's easy to create a situation where the bit never clears. For example, if the send queue starts empty: 1) set TCP_NODELAY 2) clear TCP_NODELAY 3) set TCP_CORK 4) do small write() The current code will leave TCP_NAGLE_PUSH set after that sequence. Unconditionally clearing the bit when new data is added via skb_entail() solves the problem. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23[NETFILTER]: Fix HW checksum handling in ip_queue/ip6_queuePatrick McHardy
The checksum needs to be filled in on output; after mangling a packet, ip_summed needs to be reset. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23[IPV4]: Fix negative timer loop with lots of ipv4 peers.Dave Johnson
From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com> Found this bug while doing some scaling testing that created 500K inet peers. peer_check_expire() in net/ipv4/inetpeer.c isn't using inet_peer_gc_mintime correctly and will end up creating an expire timer with less than the minimum duration, and even zero/negative if enough active peers are present. If >65K peers, the timer will be less than inet_peer_gc_mintime, and with >70K peers, the timer duration will reach zero and go negative. The timer handler will continue to schedule another zero/negative timer in a loop until peers can be aged. This can continue for at least a few minutes or even longer if the peers remain active due to arriving packets while the loop is occurring. Bug is present in both 2.4 and 2.6. Same patch will apply to both just fine. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
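The arithmetic behind this bug can be sketched as follows. This is not the code from net/ipv4/inetpeer.c: the function names are hypothetical, and the values used in the usage note (a 10 second minimum, a 120 second maximum, a threshold of 65664 peers) are assumptions chosen only because they reproduce the figures quoted above.

```c
/* Sketch: the interval shrinks linearly from gc_max as the number of
 * peers grows. Without a lower bound it drops below gc_min once
 * peers > threshold and eventually reaches zero or goes negative. */
static long gc_interval_unclamped(long gc_min, long gc_max,
                                  long peers, long threshold)
{
    return gc_max - (gc_max - gc_min) * peers / threshold;
}

/* Same computation, but never returning less than gc_min. */
static long gc_interval_clamped(long gc_min, long gc_max,
                                long peers, long threshold)
{
    if (peers >= threshold)
        return gc_min;
    return gc_interval_unclamped(gc_min, gc_max, peers, threshold);
}
```

Under those assumed values, 70000 peers give an unclamped interval of 3 seconds (below the minimum), 72000 peers give 0, and the clamped variant returns 10 in both cases.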
2005-08-23[TCP]: Do TSO deferral even if tail SKB can go out now.Dmitry Yusupov
If the tail SKB fits into the window, it is still beneficial to defer until the goal percentage of the window is available. This gives the application time to feed more data into the send queue and thus results in larger TSO frames going out. Patch from Dmitry Yusupov <dima@neterion.com>. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-20[NETFILTER]: Fix HW checksum handling in TCPMSS targetPatrick McHardy
Most importantly, remove bogus BUG() in receive path. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-20[NETFILTER]: Fix HW checksum handling in ECN targetPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-20[NETFILTER]: Fix ECN target TCP markingPatrick McHardy
An incorrect check made it bail out before doing anything. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-18[IPCOMP]: Fix false smp_processor_id warningHerbert Xu
This patch fixes a false-positive from debug_smp_processor_id(). The processor ID is only used to look up crypto_tfm objects. Any processor ID is acceptable here as long as it is one that is iterated on by for_each_cpu(). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-18[IPV4]: Fix DST leak in icmp_push_reply()Patrick McHardy
Based upon a bug report and initial patch by Ollie Wild. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-16[TCP]: Fix bug #5070: kernel BUG at net/ipv4/tcp_output.c:864Herbert Xu
1) We send out a normal sized packet with TSO on to start off.
2) ICMP is received indicating a smaller MTU.
3) We send the current sk_send_head which needs to be fragmented since it was created before the ICMP event. The first fragment is then sent out. At this point the remaining fragment is allocated by tcp_fragment. However, its size is padded to fit the L1 cache-line size, therefore creating tail-room up to 124 bytes long. This fragment will also be sitting at sk_send_head.
4) tcp_sendmsg is called again and it stores data in the tail-room of the fragment.
5) tcp_push_one is called by tcp_sendmsg which then calls tso_fragment since the packet as a whole exceeds the MTU. At this point we have a packet that has data in the head area being fed to tso_fragment which bombs out.
My take on this is that we shouldn't ever call tcp_fragment on a TSO socket for a packet that is yet to be transmitted since this creates a packet on sk_send_head that cannot be extended. So here is a patch to change it so that tso_fragment is always used in this case. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-10[TCP]: Adjust {p,f}ackets_out correctly in tcp_retransmit_skb()Herbert Xu
Well I've only found one potential cause for the assertion failure in tcp_mark_head_lost. First of all, this can only occur if cnt > 1 since tp->packets_out is never zero here. If it did hit zero we'd have much bigger problems. So cnt is equal to fackets_out - reordering. Normally fackets_out is less than packets_out. The only reason I've found that might cause fackets_out to exceed packets_out is if tcp_fragment is called from tcp_retransmit_skb with a TSO skb and the current MSS is greater than the MSS stored in the TSO skb. This might occur as the result of an expiring dst entry. In that case, packets_out may decrease (line 1380-1381 in tcp_output.c). However, fackets_out is unchanged which means that it may in fact exceed packets_out. Previously tcp_retrans_try_collapse was the only place where packets_out can go down and it takes care of this by decrementing fackets_out. So we should make sure that fackets_out is reduced by an appropriate amount here as well. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
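The invariant at stake (fackets_out must not exceed packets_out) can be illustrated with a small sketch. The struct and helper below are hypothetical and are not the tcp_output.c code; they only show one way to shrink both counters together when a re-fragmented skb accounts for `diff` fewer segments.

```c
/* Hypothetical stand-in for the two counters discussed above. */
struct tcp_counts {
    unsigned int packets_out;
    unsigned int fackets_out;
};

/* Sketch: when packets_out drops by diff, drop fackets_out by the same
 * amount (saturating at zero) so it can never exceed packets_out.
 * Assumes diff <= packets_out. */
static void shrink_counts(struct tcp_counts *c, unsigned int diff)
{
    c->packets_out -= diff;
    c->fackets_out = c->fackets_out > diff ? c->fackets_out - diff : 0;
}
```

If only packets_out were reduced, a state such as packets_out = 10, fackets_out = 9 with diff = 2 would end with fackets_out (9) above packets_out (8), which is the condition the message identifies as the trigger for the assertion.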
2005-08-08Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
2005-08-08[IPV4]: Debug cleanupHeikki Orsila
Here's a small patch to clean up NETDEBUG() use in net/ipv4/ for Linux kernel 2.6.13-rc5. Also weird use of indentation is changed in some places. Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-08[PATCH] don't try to do any NAT on untracked connectionsHarald Welte
With the introduction of 'rustynat' in 2.6.11, the old tricks of preventing NAT of 'untracked' connections (e.g. NOTRACK target in 'raw' table) are no longer sufficient. The ip_conntrack_untracked.status |= IPS_NAT_DONE_MASK effectively prevents iteration of the 'nat' table, but doesn't prevent nat_packet() from being executed. Since nr_manips is gone in 'rustynat', nat_packet() now implicitly thinks that it has to do NAT on the packet. This patch fixes that problem by explicitly checking for ip_conntrack_untracked in ip_nat_fn(). Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-06[IPSEC]: Restrict socket policy loading to CAP_NET_ADMIN.Herbert Xu
The interface needs much redesigning if we wish to allow normal users to do this in some way. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-05[IPV4]: Fix memory leak during fib_info hash expansion.David S. Miller
When we grow the tables, we forget to free the old ones up. Noticed by Yan Zheng. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-04[PATCH] tcp: fix TSO cwnd caching bugHerbert Xu
tcp_write_xmit caches the cwnd value indirectly in cwnd_quota. When tcp_transmit_skb reduces the cwnd because of tcp_enter_cwr, the cached value becomes invalid. This patch ensures that the cwnd value is always reread after each tcp_transmit_skb call. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
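The difference between caching the quota once and re-reading it after every transmit can be shown with a toy model. Everything below is a hypothetical simulation, not tcp_write_xmit: `struct conn`, `transmit`, and the `reduce_at` parameter (the 1-based send at which the window is cut, 0 for never) exist only for this sketch.

```c
struct conn {
    unsigned int cwnd;       /* congestion window, in segments */
    unsigned int in_flight;  /* segments currently outstanding */
};

static unsigned int quota(const struct conn *c)
{
    return c->in_flight < c->cwnd ? c->cwnd - c->in_flight : 0;
}

/* Hypothetical transmit: sending may shrink cwnd as a side effect,
 * standing in for a congestion-window reduction during transmit. */
static void transmit(struct conn *c, int reduce)
{
    c->in_flight++;
    if (reduce)
        c->cwnd = c->in_flight;
}

/* Stale variant: the quota is read once before the loop. */
static unsigned int send_cached(struct conn *c, unsigned int pending,
                                unsigned int reduce_at)
{
    unsigned int sent = 0, q = quota(c);
    while (sent < pending && q > 0) {
        transmit(c, sent + 1 == reduce_at);
        sent++;
        q--;
    }
    return sent;
}

/* Fixed variant: the quota is re-read after every transmit. */
static unsigned int send_reread(struct conn *c, unsigned int pending,
                                unsigned int reduce_at)
{
    unsigned int sent = 0;
    while (sent < pending && quota(c) > 0) {
        transmit(c, sent + 1 == reduce_at);
        sent++;
    }
    return sent;
}
```

Starting from cwnd = 10 with nothing in flight and a reduction on the third send, the cached loop keeps going until all 10 pending segments are out, while the re-reading loop stops after 3.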
2005-08-04[PATCH] tcp: fix TSO sizing bugsDavid S. Miller
MSS changes can be lost since we preemptively initialize the tso_segs count for an SKB before we 100% commit to sending it out. So, by the time we send it out, the tso_size information can be stale due to PMTU events. This mucks up all of the logic in our send engine, and can even result in the BUG() triggering in tcp_tso_should_defer(). Another problem we have is that we're storing the tp->mss_cache, not the SACK block normalized MSS, as the tso_size. That's wrong too. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-30[NET]: fix oops after tunnel module unloadAlexey Kuznetsov
Tunnel modules used to obtain a module refcount each time a tunnel was created, which meant that the module could be unloaded only after all the tunnels were deleted. Since the old MOD_*_USE_COUNT macros were killed, this protection is gone. It is possible to restore it with module_get/put, but it looks more natural and practically useful to force destruction of all the child tunnels on module unload. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-30[NETFILTER] Inherit masq_index to slave connectionsHarald Welte
masq_index is used for cleanup in case the interface address changes (such as a dialup ppp link with dynamic addresses). Without this patch, slave connections are not evicted in such a case, since they don't inherit masq_index. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-30[NET]: Spelling mistakes threshoulds -> thresholdsBaruch Even
Just simple spelling mistake fixes. Signed-Off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-27[NET]: Move in_aton from net/ipv4/utils.c to net/core/utils.cMatt Mackall
Move in_aton to allow netpoll and pktgen to work without the rest of the IPv4 stack. Fix whitespace and add comment for the odd placement. Delete now-empty net/ipv4/utils.c. Re-enable netpoll/netconsole without CONFIG_INET. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-27[NETFILTER]: Fix -Wunder error in ip_conntrack_core.cNick Sillik
Signed-off-by: Nick Sillik <n.sillik@temple.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-27[IPV4]: Fix Kconfig syntax errorHans-Juergen Tappe (SYSGO AG)
From: "Hans-Juergen Tappe (SYSGO AG)" <hjt@sysgo.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22[NETFILTER]: Use correct byteorder in ICMP NATPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22[NETFILTER]: Wait until all references to ip_conntrack_untracked are dropped on unloadPatrick McHardy
Fixes a crash when unloading ip_conntrack. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22[NETFILTER]: Fix potential memory corruption in NAT code (aka memory NAT)Patrick McHardy
The portptr pointing to the port in the conntrack tuple is declared static, which could result in memory corruption when two packets of the same protocol are NATed at the same time and one conntrack goes away. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-21[NETFILTER]: ip_conntrack_expect_related must not free expectationRusty Russell
If a connection tracking helper tells us to expect a connection, and we're already expecting that connection, we simply free the one they gave us and return success. The problem is that NAT helpers (eg. FTP) have to allocate the expectation first (to see what port is available) then rewrite the packet. If that rewrite fails, they try to remove the expectation, but it was freed in ip_conntrack_expect_related. This is one example of a larger problem: having registered the expectation, the pointer is no longer ours to use. Reference counting is needed for ctnetlink anyway, so introduce it now. To have a single "put" path, we need to grab the reference to the connection on creation, rather than open-coding it in the caller. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
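The ownership rule described above (a single "put" path, with each holder owning its own reference) can be sketched with a plain counter. This is an illustration only: `struct expect` and the three helpers are hypothetical names, the counter is not atomic, and none of it is the ip_conntrack API.

```c
#include <stdlib.h>

/* Hypothetical expectation object with a reference count. */
struct expect {
    int refcnt;
    int port;
};

/* The creator starts out holding one reference. */
static struct expect *expect_alloc(int port)
{
    struct expect *e = malloc(sizeof(*e));
    if (e) {
        e->refcnt = 1;
        e->port = port;
    }
    return e;
}

/* Anyone who stores the pointer takes its own reference. */
static void expect_get(struct expect *e)
{
    e->refcnt++;
}

/* Single release path: returns 1 if this call freed the object. */
static int expect_put(struct expect *e)
{
    if (--e->refcnt == 0) {
        free(e);
        return 1;
    }
    return 0;
}
```

In this model a helper that registered an expectation can still safely read or unregister it after the registry drops its own reference, because the helper's reference keeps the object alive until the helper calls `expect_put` itself.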
2005-07-19[NET]: Make ipip/ip6_tunnel independent of XFRMPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19[IPV4]: Fix up lots of little whitespace indentation stuff in fib_trie.Stephen Hemminger
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19[IPV4]: Don't select XFRM for ip_grePatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18[IPV4]: fix IP_FIB_HASH kconfig warningAdrian Bunk
This patch fixes the following kconfig warning: net/ipv4/Kconfig:92:warning: defaults for choice values not supported Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-12[NETFILTER]: Revert nf_reset changePhil Oester
Revert the nf_reset change that caused so much trouble, drop conntrack references manually before packets are queued to packet sockets. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11[NET]: move config options out to individual protocolsSam Ravnborg
Move the protocol specific config options out to the specific protocols. With this change net/Kconfig now starts to become readable and serve as a good basis for further re-structuring. The menu structure is left almost intact, except that indentation is fixed in most cases. Most visible are the INET changes where several "depends on INET" are replaced with a single ifdef INET / endif pair. Several new files were created to accomplish this change - they are small but serve the purpose that config options are now distributed out where they belong. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11[IPV4]: Prevent oops when printing martian sourceOlaf Kirch
In some cases, we may be generating packets with a source address that qualifies as martian. This can happen when we're in the middle of setting up the network, and netfilter decides to reject a packet with an RST. The IPv4 routing code would try to print a warning and oops, because locally generated packets do not have a valid skb->mac.raw pointer at this point. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11[IPVS]: Add and reorder bh locks after moving to keventd.Julian Anastasov
An addition to the last ipvs changes that move update_defense_level/si_meminfo to keventd: - ip_vs_random_dropentry now runs in process context and should use _bh locks to protect from softirqs - update_defense_level still needs _bh locks after si_meminfo is called, for the same purpose Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08[IPV4]: fix IPv4 leave-group group matchingDavid L Stevens
This patch fixes the multicast group matching for IP_DROP_MEMBERSHIP, similar to the IP_ADD_MEMBERSHIP fix in a prior patch. Groups are identified by <group address,interface> and including the interface address in the match will fail if a leave-group is done by address when the join was done by index, or if different addresses on the same interface are used in the join and leave. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08[IPV4]: (INCLUDE,empty)/leave-group equivalence for full-state MSF APIs & errno fixDavid L Stevens
1) Adds (INCLUDE, empty)/leave-group equivalence to the full-state multicast source filter APIs (IPv4 and IPv6) 2) Fixes an incorrect errno in the IPv6 leave-group (ENOENT should be EADDRNOTAVAIL) Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08[IPV4]: multicast API "join" issuesDavid L Stevens
1) In the full-state API when imsf_numsrc == 0 errno should be "0", but returns EADDRNOTAVAIL 2) An illegal filter mode change errno should be EINVAL, but returns EADDRNOTAVAIL 3) Trying to do an any-source option without IP_ADD_MEMBERSHIP errno should be EINVAL, but returns EADDRNOTAVAIL 4) Adds comments for the less obvious error return values Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08[IPV4]: multicast API "join" issuesDavid L Stevens
1) Changes IP_ADD_SOURCE_MEMBERSHIP and MCAST_JOIN_SOURCE_GROUP to ignore EADDRINUSE errors on a "courtesy join" -- prior membership or not is ok for these. 2) Adds "leave group" equivalence of (INCLUDE, empty) filters in the delta-based API. Without this, mixing delta-based API calls that end in an (INCLUDE, empty) filter would not allow a subsequent regular IP_ADD_MEMBERSHIP. It also frees socket buffer memory that isn't needed for both the multicast group record and source filter. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08[IPV4]: multicast API "join" issuesDavid L Stevens
This patch corrects a few problems with the IP_ADD_MEMBERSHIP socket option:
1) The existing code makes an attempt at reference counting joins when using the ip_mreqn/imr_ifindex interface. Joining the same group on the same socket is an error, whatever the API. This leads to unexpected results when mixing ip_mreqn by index with ip_mreqn by address, ip_mreq, or other APIs. For example, ip_mreq followed by ip_mreqn of the same group will "work" while the same two reversed will not. Fixed to always return EADDRINUSE on a duplicate join and removed the (now unused) reference count in ip_mc_socklist.
2) The group-search list in ip_mc_join_group() is comparing a full ip_mreqn structure and all of it must match for it to find the group. This doesn't correctly match a group that was joined with ip_mreq or ip_mreqn with an address (with or without an index). It also doesn't match groups that are joined by different addresses on the same interface. All of these are the same multicast group, which is identified by group address and interface index. Fixed the check to correctly match groups so we don't get duplicate group entries on the ip_mc_socklist.
3) The old code allocates a multicast address before searching for duplicates, requiring it to free in various error cases. This patch moves the allocation until after the search and igmp_max_memberships check, so there is never a need to allocate and then free an entry.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
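The matching rule from item 2 can be sketched as a comparison that looks only at the group address and the interface index. The struct and function below are hypothetical and are not the ip_mc_join_group() code; the sketch also assumes the interface index has already been resolved, whichever API (by address or by index) the caller used.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical membership record for this sketch. */
struct mc_member {
    uint32_t group;       /* multicast group address */
    uint32_t local_addr;  /* local address given at join time, if any */
    int      ifindex;     /* resolved interface index */
};

/* Two memberships name the same group when the group address and the
 * interface index agree; the local address is deliberately ignored. */
static bool same_group(const struct mc_member *a, const struct mc_member *b)
{
    return a->group == b->group && a->ifindex == b->ifindex;
}
```

Comparing the whole record instead would treat joins made through different addresses on the same interface as distinct groups, which is the duplicate-entry problem described in item 2.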