path: root/net
Age | Commit message | Author
2011-03-05 | batman-adv: remove orig_hash spinlock | Marek Lindner
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: increase refcount in create_neighbor to be consistent | Marek Lindner
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: Correct rcu refcounting for orig_node | Marek Lindner
It might be possible that 2 threads access the same data in the same rcu grace period. The first thread calls call_rcu() to decrement the refcount and free the data while the second thread increases the refcount to use the data. To avoid this race condition all refcount operations have to be atomic. Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
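The race described above is commonly closed by taking a reference only when the count is still non-zero. The following is a minimal userspace sketch of that idea using C11 atomics, not the batman-adv code itself; `struct node`, `node_get_unless_zero()` and `node_put()` are hypothetical names, and the `freed` flag merely stands in for the deferred free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object such as orig_node. */
struct node {
    atomic_int refcount;
    bool freed; /* set when the last reference is dropped */
};

static void node_init(struct node *n)
{
    atomic_init(&n->refcount, 1);
    n->freed = false;
}

/* Take a reference only if the object is still alive (count > 0). */
static bool node_get_unless_zero(struct node *n)
{
    int old = atomic_load(&n->refcount);

    while (old > 0) {
        /* On failure, old is reloaded and the loop re-checks it. */
        if (atomic_compare_exchange_weak(&n->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; the caller that drops the last one releases the object. */
static void node_put(struct node *n)
{
    int old = atomic_fetch_sub(&n->refcount, 1);

    assert(old > 0);
    if (old == 1)
        n->freed = true; /* real code would free here, after a grace period */
}
```

A reader that finds the object during a lookup calls `node_get_unless_zero()` and skips the object if it returns false, so a concurrent final `node_put()` can no longer be followed by a late increment.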
2011-03-05 | batman-adv: remove extra layer between hash and hash element - hash bucket | Marek Lindner
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: separate ethernet comparing calls from hash functions | Marek Lindner
Note: The function compare_ether_addr() provided by the Linux kernel requires aligned memory. Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: Fix possible buffer overflow in softif neigh list output | Linus Lüssing
When printing the soft interface table the number of entries in the softif neigh list are first being counted and a fitting buffer allocated. After that the softif neigh list gets locked again and the buffer printed - which has the following two issues: For one thing, the softif neigh list might have grown when reacquiring the rcu lock, which results in writing outside of the allocated buffer. Furthermore 31 Bytes are not enough for printing an entry with a vid of more than 2 digits. The manual buffering is unnecessary, we can safely print to the seq directly during the rcu_read_lock(). Signed-off-by: Linus Lüssing <linus.luessing@ascom.ch> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: Increase orig_node refcount before releasing rcu read lock | Linus Lüssing
When unicast_send_skb() is increasing the orig_node's refcount another thread might have been freeing this orig_node already. We need to increase the refcount in the rcu read lock protected area to avoid that. Signed-off-by: Linus Lüssing <linus.luessing@ascom.ch> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: Make bat_priv->curr_gw an rcu protected pointer | Linus Lüssing
The rcu protected macros rcu_dereference() and rcu_assign_pointer() for the bat_priv->curr_gw need to be used, as well as spin/rcu locking. Otherwise we might end up using a curr_gw pointer pointing to already freed memory. Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Linus Lüssing <linus.luessing@ascom.ch> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: make broadcast seqno operations atomic | Marek Lindner
Batman-adv could receive several payload broadcasts at the same time that would trigger access to the broadcast seqno sliding window to determine whether this is a new broadcast or not. If these incoming broadcasts are accessing the sliding window simultaneously it could be left in an inconsistent state. Therefore it is necessary to make sure this access is atomic. Reported-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
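To illustrate why the window check must be atomic, here is a simplified userspace sketch in which the test and the update of a sliding window happen under one lock. The 64-entry window, the `atomic_flag` spinlock and all names (`seqno_window`, `window_check_and_mark()`) are assumptions made for this example and do not mirror the batman-adv data structures.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define WINDOW_SIZE 64

struct seqno_window {
    atomic_flag lock;    /* minimal spinlock guarding the fields below */
    uint32_t last_seqno; /* newest sequence number seen so far */
    uint64_t bits;       /* bit i set => (last_seqno - i) was seen */
};

static void window_init(struct seqno_window *w, uint32_t start)
{
    atomic_flag_clear(&w->lock);
    w->last_seqno = start;
    w->bits = 1; /* treat the starting seqno as already seen */
}

/* Returns true if seqno is new; check and mark happen as one locked step. */
static bool window_check_and_mark(struct seqno_window *w, uint32_t seqno)
{
    bool is_new = false;

    while (atomic_flag_test_and_set_explicit(&w->lock, memory_order_acquire))
        ; /* spin until the lock is free */

    int32_t diff = (int32_t)(seqno - w->last_seqno);

    if (diff > 0) {
        /* Newer than anything seen: slide the window forward. */
        w->bits = (diff >= WINDOW_SIZE) ? 0 : (w->bits << diff);
        w->bits |= 1;
        w->last_seqno = seqno;
        is_new = true;
    } else if (diff > -WINDOW_SIZE) {
        /* Inside the window: new only if its bit is still clear. */
        uint64_t mask = UINT64_C(1) << -diff;

        if (!(w->bits & mask)) {
            w->bits |= mask;
            is_new = true;
        }
    }
    /* Anything older than the window is treated as a duplicate. */

    atomic_flag_clear_explicit(&w->lock, memory_order_release);
    return is_new;
}
```

Without the lock, two receivers could both read a clear bit for the same seqno and both conclude the broadcast is new, or one could shift the window while the other is setting a bit in it.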
2011-03-05 | batman-adv: protect bit operations to count OGMs with spinlock | Marek Lindner
Reported-by: Linus Lüssing <linus.luessing@saxnet.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: Correct rcu refcounting for batman_if | Marek Lindner
It might be possible that 2 threads access the same data in the same rcu grace period. The first thread calls call_rcu() to decrement the refcount and free the data while the second thread increases the refcount to use the data. To avoid this race condition all refcount operations have to be atomic. Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: Correct rcu refcounting for softif_neigh | Marek Lindner
It might be possible that 2 threads access the same data in the same rcu grace period. The first thread calls call_rcu() to decrement the refcount and free the data while the second thread increases the refcount to use the data. To avoid this race condition all refcount operations have to be atomic. Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: Correct rcu refcounting for gw_node | Marek Lindner
It might be possible that 2 threads access the same data in the same rcu grace period. The first thread calls call_rcu() to decrement the refcount and free the data while the second thread increases the refcount to use the data. To avoid this race condition all refcount operations have to be atomic. Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: Correct rcu refcounting for neigh_node | Marek Lindner
It might be possible that 2 threads access the same data in the same rcu grace period. The first thread calls call_rcu() to decrement the refcount and free the data while the second thread increases the refcount to use the data. To avoid this race condition all refcount operations have to be atomic. Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: protect bonding with rcu locks | Simon Wunderlich
Bonding / alternating candidates need to be secured by rcu locks as well. This patch therefore converts the bonding list from a plain pointer list to an rcu securable list and references the bonding candidates. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: protect ogm counter arrays with spinlock | Marek Lindner
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: protect originator nodes with reference counters | Marek Lindner
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: protect each hash row with rcu locks | Marek Lindner
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: protect neigh_nodes used outside of rcu_locks with refcounting | Marek Lindner
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: free neighbors when an interface is deactivated | Marek Lindner
hardif_disable_interface() calls purge_orig_ref() to immediately free all neighbors associated with the interface that is going down. purge_orig_neighbors() checked if the interface status is IF_INACTIVE which is set to IF_NOT_IN_USE shortly before calling purge_orig_ref(). Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: protect neighbor list with rcu locks | Marek Lindner
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: convert neighbor list to hlist | Marek Lindner
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | batman-adv: protect neighbor nodes with reference counters | Marek Lindner
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-03-05 | ipx: remove the BKL | Arnd Bergmann
This replaces all instances of lock_kernel in the IPX code with lock_sock. As far as I can tell, this is safe to do, because there is no global state that needs to be locked in IPX, and the code does not recursively take the lock or sleep indefinitely while holding it. Compile-tested only. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David S. Miller <davem@davemloft.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: netdev@vger.kernel.org
2011-03-05 | appletalk: remove the BKL | Arnd Bergmann
This changes appletalk to use lock_sock instead of lock_kernel for serialization. I tried to make sure that we don't hold the socket lock during sleeping functions, but I did not try to prove whether the locks are necessary in the first place. Compile-tested only. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David S. Miller <davem@davemloft.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: David Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org
2011-03-05 | x25: remove the BKL | Arnd Bergmann
This replaces all instances of lock_kernel in x25 with lock_sock, taking care to release the socket lock around sleeping functions (sock_alloc_send_skb and skb_recv_datagram). It is not clear whether this is a correct solution, but it seems to be what other protocols do in the same situation. Includes a fix suggested by Eric Dumazet. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: Andrew Hendry <andrew.hendry@gmail.com> Cc: linux-x25@vger.kernel.org Cc: netdev@vger.kernel.org Cc: Eric Dumazet <eric.dumazet@gmail.com>
2011-03-04 | ipv4: Remove flowi from struct rtable. | David S. Miller
The only necessary parts are the src/dst addresses, the interface indexes, the TOS, and the mark. The rest is unnecessary bloat, which amounts to nearly 50 bytes on 64-bit. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-04 | ipv4: Set rt->rt_iif more sanely on output routes. | David S. Miller
rt->rt_iif is only ever inspected on input routes, for example DCCP uses this to populate a route lookup flow key when generating replies to another packet. Therefore, setting it to anything other than zero on output routes makes no sense. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-04 | ipv4: Get peer more cheaply in rt_init_metrics(). | David S. Miller
We know this is a new route object, so doing atomics and stuff makes no sense at all. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-04 | ipv4: Optimize flow initialization in output route lookup. | David S. Miller
We burn a lot of useless cycles, cpu store buffer traffic, and memory operations memset()'ing the on-stack flow used to perform output route lookups in __ip_route_output_key().

Only the first half of the flow object members even matter for output route lookups in this context, specifically:

    FIB rules matching cares about: dst, src, tos, iif, oif, mark
    FIB trie lookup cares about: dst
    FIB semantic match cares about: tos, scope, oif

Therefore only initialize these specific members and elide the memset entirely.

On Niagara2 this kills about ~300 cycles from the output route lookup path.

Likely, we can take things further, since all callers of output route lookups essentially throw away the on-stack flow they use. So they don't care if we use it as a scratch-pad to compute the final flow key.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-03-04 | inetpeer: seqlock optimization | Eric Dumazet
David noticed:

------------------
Eric, I was profiling the non-routing-cache case and something that stuck out is the case of calling inet_getpeer() with create==0.

If an entry is not found, we have to redo the lookup under a spinlock to make certain that a concurrent writer rebalancing the tree does not "hide" an existing entry from us.

This makes the case of a create==0 lookup for a not-present entry really expensive. It is on the order of 600 cpu cycles on my Niagara2.

I added a hack to not do the relookup under the lock when create==0 and it now costs less than 300 cycles.

This is now a pretty common operation with the way we handle COW'd metrics, so I think it's definitely worth optimizing.
-----------------

One solution is to use a seqlock instead of a spinlock to protect struct inet_peer_base.

After a failed avl tree lookup, we can easily detect if a writer did some changes during our lookup. Taking the lock and redoing the lookup is only necessary in this case.

Note: Add one private rcu_deref_locked() macro to place in one spot the access to spinlock included in seqlock.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
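The retry-only-if-a-writer-interfered idea can be sketched in userspace with a plain sequence counter. This is an illustration under simplifying assumptions, not the inetpeer code: the table is a flat array rather than an AVL tree, a single writer is assumed (a kernel seqlock pairs the counter with a spinlock), and every name here (`peer_table`, `read_begin()`, `read_retry()`, `lookup_no_create()`) is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TABLE_SIZE 8

/* Hypothetical table guarded by a sequence counter; 0 marks an empty slot. */
struct peer_table {
    atomic_uint seq; /* odd while a writer is mid-update */
    atomic_int keys[TABLE_SIZE];
};

static void table_init(struct peer_table *t)
{
    atomic_init(&t->seq, 0);
    for (int i = 0; i < TABLE_SIZE; i++)
        atomic_init(&t->keys[i], 0);
}

static unsigned read_begin(struct peer_table *t)
{
    return atomic_load(&t->seq);
}

/* True if a writer was active at, or finished after, read_begin(). */
static bool read_retry(struct peer_table *t, unsigned start)
{
    return (start & 1u) != 0 || atomic_load(&t->seq) != start;
}

static bool table_lookup(struct peer_table *t, int key)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        if (atomic_load(&t->keys[i]) == key)
            return true;
    return false;
}

/* Single writer assumed: bump the counter before and after the update. */
static bool table_insert(struct peer_table *t, int key)
{
    assert(key != 0);
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (atomic_load(&t->keys[i]) == 0) {
            atomic_fetch_add(&t->seq, 1);
            atomic_store(&t->keys[i], key);
            atomic_fetch_add(&t->seq, 1);
            return true;
        }
    }
    return false;
}

/* Lookup without create: only a miss that raced with a writer is redone. */
static bool lookup_no_create(struct peer_table *t, int key, bool *relooked)
{
    unsigned start = read_begin(t);
    bool found = table_lookup(t, key);

    *relooked = false;
    if (!found && read_retry(t, start)) {
        /* Kernel code would take the lock here before looking again. */
        *relooked = true;
        found = table_lookup(t, key);
    }
    return found;
}
```

In the common case of a miss with no concurrent writer, the counter is unchanged and the second, locked lookup is skipped entirely, which is the saving the commit message describes.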
2011-03-04 | Merge branch 'for-davem' of ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 | David S. Miller
2011-03-04 | libceph: fix msgr standby handling | Sage Weil
The standby logic used to be pretty dependent on the work requeueing behavior that changed when we switched to WQ_NON_REENTRANT. It was also very fragile.

Restructure things so that:
 - We clear WRITE_PENDING when we set STANDBY. This ensures we will requeue work when we wake up later.
 - con_work backs off if STANDBY is set. There is nothing to do if we are in standby.
 - clear_standby() helper is called by both con_send() and con_keepalive(), the two actions that can wake us up again. Move the connect_seq++ logic here.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04 | libceph: fix msgr keepalive flag | Sage Weil
There was some broken keepalive code using a dead variable. Shift to using the proper bit flag. Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04 | libceph: fix msgr backoff | Sage Weil
With commit f363e45f we replaced a bunch of hacky workqueue mutual exclusion logic with the WQ_NON_REENTRANT flag. One piece of fallout is that the exponential backoff breaks in certain cases:

 * con_work attempts to connect.
 * we get an immediate failure, and the socket state change handler queues immediate work.
 * con_work calls con_fault, we decide to back off, but can't queue delayed work.

In this case, we add a BACKOFF bit to make con_work reschedule delayed work next time it runs (which should be immediately).

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem | John W. Linville
2011-03-04 | mac80211: Remove redundant preamble and RTS flag setup in minstrel_ht | Helmut Schaa
mac80211 does the same afterwards anyway. Hence, just drop this redundant code. Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Acked-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-03-04 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-next-2.6 | John W. Linville
2011-03-03 | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 | David S. Miller
Conflicts:
    drivers/net/bnx2x/bnx2x.h
2011-03-03 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 | Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
    DNS: Fix a NULL pointer deref when trying to read an error key [CVE-2011-1076]
2011-03-03 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 | Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
    MAINTAINERS: Add Andy Gospodarek as co-maintainer.
    r8169: disable ASPM
    RxRPC: Fix v1 keys
    AF_RXRPC: Handle receiving ACKALL packets
    cnic: Fix lost interrupt on bnx2x
    cnic: Prevent status block race conditions with hardware
    net: dcbnl: check correct ops in dcbnl_ieee_set()
    e1000e: disable broken PHY wakeup for ICH10 LOMs, use MAC wakeup instead
    igb: fix sparse warning
    e1000: fix sparse warning
    netfilter: nf_log: avoid oops in (un)bind with invalid nfproto values
    dccp: fix oops on Reset after close
    ipvs: fix dst_lock locking on dest update
    davinci_emac: Add Carrier Link OK check in Davinci RX Handler
    bnx2x: update driver version to 1.62.00-6
    bnx2x: properly calculate lro_mss
    bnx2x: perform statistics "action" before state transition.
    bnx2x: properly configure coefficients for MinBW algorithm (NPAR mode).
    bnx2x: Fix ethtool -t link test for MF (non-pmf) devices.
    bnx2x: Fix nvram test for single port devices.
    ...
2011-03-04 | DNS: Fix a NULL pointer deref when trying to read an error key [CVE-2011-1076] | David Howells
When a DNS resolver key is instantiated with an error indication, attempts to read that key will result in an oops because user_read() is expecting there to be a payload - and there isn't one [CVE-2011-1076].

Give the DNS resolver key its own read handler that returns the error cached in key->type_data.x[0] as an error rather than crashing.

Also make the kenter() at the beginning of dns_resolver_instantiate() limit the amount of data it prints, since the data is not necessarily NUL-terminated.

The buggy code was added in:

    commit 4a2d789267e00b5a1175ecd2ddefcc78b83fbf09
    Author: Wang Lei <wang840925@gmail.com>
    Date: Wed Aug 11 09:37:58 2010 +0100
    Subject: DNS: If the DNS server returns an error, allow that to be cached [ver #2]

This can trivially be reproduced by any user with the following program compiled with -lkeyutils:

    #include <stdlib.h>
    #include <keyutils.h>
    #include <err.h>

    static char payload[] = "#dnserror=6";

    int main()
    {
        key_serial_t key;

        key = add_key("dns_resolver", "a", payload, sizeof(payload),
                      KEY_SPEC_SESSION_KEYRING);
        if (key == -1)
            err(1, "add_key");

        if (keyctl_read(key, NULL, 0) == -1)
            err(1, "read_key");

        return 0;
    }

What should happen is that keyctl_read() reports error 6 (ENXIO) to the user:

    dns-break: read_key: No such device or address

but instead the kernel oopses.

This cannot be reproduced with the 'keyutils add' or 'keyutils padd' commands as both of those cut the data down below the NUL termination that must be included in the data. Without this dns_resolver_instantiate() will return -EINVAL and the key will not be instantiated such that it can be read.

The oops looks like:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
    IP: [<ffffffff811b99f7>] user_read+0x4f/0x8f
    PGD 3bdf8067 PUD 385b9067 PMD 0
    Oops: 0000 [#1] SMP
    last sysfs file: /sys/devices/pci0000:00/0000:00:19.0/irq
    CPU 0
    Modules linked in:

    Pid: 2150, comm: dns-break Not tainted 2.6.38-rc7-cachefs+ #468 /DG965RY
    RIP: 0010:[<ffffffff811b99f7>] [<ffffffff811b99f7>] user_read+0x4f/0x8f
    RSP: 0018:ffff88003bf47f08 EFLAGS: 00010246
    RAX: 0000000000000001 RBX: ffff88003b5ea378 RCX: ffffffff81972368
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003b5ea378
    RBP: ffff88003bf47f28 R08: ffff88003be56620 R09: 0000000000000000
    R10: 0000000000000395 R11: 0000000000000002 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffffffffa1
    FS: 00007feab5751700(0000) GS:ffff88003e000000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000010 CR3: 000000003de40000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process dns-break (pid: 2150, threadinfo ffff88003bf46000, task ffff88003be56090)
    Stack:
     ffff88003b5ea378 ffff88003b5ea3a0 0000000000000000 0000000000000000
     ffff88003bf47f68 ffffffff811b708e ffff88003c442bc8 0000000000000000
     00000000004005a0 00007fffba368060 0000000000000000 0000000000000000
    Call Trace:
     [<ffffffff811b708e>] keyctl_read_key+0xac/0xcf
     [<ffffffff811b7c07>] sys_keyctl+0x75/0xb6
     [<ffffffff81001f7b>] system_call_fastpath+0x16/0x1b
    Code: 75 1f 48 83 7b 28 00 75 18 c6 05 58 2b fb 00 01 be bb 00 00 00 48 c7 c7 76 1c 75 81 e8 13 c2 e9 ff 4c 8b b3 e0 00 00 00 4d 85 ed <41> 0f b7 5e 10 74 2d 4d 85 e4 74 28 e8 98 79 ee ff 49 39 dd 48
    RIP [<ffffffff811b99f7>] user_read+0x4f/0x8f
     RSP <ffff88003bf47f08>
    CR2: 0000000000000010

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
cc: Wang Lei <wang840925@gmail.com>
Signed-off-by: James Morris <jmorris@namei.org>
2011-03-03 | libceph: retry after authorization failure | Sage Weil
If we mark the connection CLOSED we will give up trying to reconnect to this server instance. That is appropriate for things like a protocol version mismatch that won't change until the server is restarted, at which point we'll get a new addr and reconnect. An authorization failure like this is probably due to the server not properly rotating its secret keys, however, and should be treated as transient so that the normal backoff and retry behavior kicks in. Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-03 | libceph: fix handling of short returns from get_user_pages | Sage Weil
get_user_pages() can return fewer pages than we ask for. We were returning a bogus pointer/error code in that case. Instead, loop until we get all the pages we want or get an error we can return to the caller. Signed-off-by: Sage Weil <sage@newdream.net>
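The "loop until we have everything or hit an error" pattern can be sketched generically in userspace C. This is not the libceph change itself: `fetch_all()`, the `fetch_fn` callback type, the `chunk_src` demo producer and the `ERR_NO_PROGRESS` code are all hypothetical names chosen for this example, standing in for a call that may return fewer items than requested.

```c
#include <assert.h>
#include <stddef.h>

#define ERR_NO_PROGRESS (-1) /* hypothetical error code for a zero-length return */

/* Hypothetical producer: fills up to `want` slots, returns count or negative error. */
typedef int (*fetch_fn)(void *ctx, int *out, int want);

/* Keep calling fetch() until `want` items are collected or an error occurs. */
static int fetch_all(fetch_fn fetch, void *ctx, int *out, int want)
{
    int got = 0;

    assert(want >= 0);
    while (got < want) {
        int rc = fetch(ctx, out + got, want - got);

        if (rc < 0)
            return rc; /* propagate the producer's error unchanged */
        if (rc == 0)
            return ERR_NO_PROGRESS; /* avoid spinning forever on no progress */
        got += rc;
    }
    return got;
}

/* Demo producer that never hands out more than `chunk` items per call. */
struct chunk_src {
    int next;
    int chunk;
};

static int chunk_fetch(void *ctx, int *out, int want)
{
    struct chunk_src *s = ctx;
    int n = want < s->chunk ? want : s->chunk;

    for (int i = 0; i < n; i++)
        out[i] = s->next++;
    return n;
}
```

The caller either receives exactly the number of items it asked for or a negative error code, so a short return can no longer be mistaken for success.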
2011-03-03 | netlink: kill eff_cap from struct netlink_skb_parms | Patrick McHardy
Netlink message processing in the kernel is synchronous these days, capabilities can be checked directly in security_netlink_recv() from the current process. Signed-off-by: Patrick McHardy <kaber@trash.net> Reviewed-by: James Morris <jmorris@namei.org> [chrisw: update to include pohmelfs and uvesafb] Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03 | ipv6: Use ERR_CAST in addrconf_dst_alloc. | David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03 | ipv4: Fix __ip_dev_find() to use ifa_local instead of ifa_address. | David S. Miller
Reported-by: Stephen Hemminger <shemminger@vyatta.com> Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03 | net_sched: reduce fifo qdisc size | Eric Dumazet
Because of various alignments [SLUB / qdisc], we use 512 bytes of memory for one {p|b}fifo qdisc, instead of 256 bytes on 64bit arches and 192 bytes on 32bit ones. Move the "u32 limit" inside "struct Qdisc" (no impact on other qdiscs). Change qdisc_alloc(), first trying a regular allocation before an oversized one. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03 | netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms | Patrick McHardy
Netlink message processing in the kernel is synchronous these days, the session information can be collected when needed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03 | ipv4: Fix crash in dst_release when udp_sendmsg route lookup fails. | David S. Miller
As reported by Eric:

    [11483.697233] IP: [<c12b0638>] dst_release+0x18/0x60
    ...
    [11483.697741] Call Trace:
    [11483.697764] [<c12fc9d2>] udp_sendmsg+0x282/0x6e0
    [11483.697790] [<c12a1c01>] ? memcpy_toiovec+0x51/0x70
    [11483.697818] [<c12dbd90>] ? ip_generic_getfrag+0x0/0xb0

The pointer passed to dst_release() is -EINVAL, that's because we leave an error pointer in the local variable "rt" by accident.

NULL it out to fix the bug.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
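The bug class is easier to see in a small userspace model of error pointers. The helpers below (`err_ptr()`, `ptr_err()`, `is_err()`) are lowercase re-creations written for this sketch in the spirit of the kernel's error-pointer convention, and `struct route`, `route_lookup()`, `route_release()` and `send_datagram()` are hypothetical stand-ins rather than the actual udp_sendmsg() code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a small negative errno in a pointer value, and decode it again. */
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline bool is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)(-MAX_ERRNO);
}

struct route {
    int refs;
};

/* Hypothetical lookup: returns a referenced route or an encoded error. */
static struct route *route_lookup(struct route *entry, bool fail)
{
    if (fail)
        return err_ptr(-EINVAL);
    entry->refs++;
    return entry;
}

/* Accepts NULL, but must never be handed an encoded error. */
static void route_release(struct route *rt)
{
    if (rt) {
        assert(!is_err(rt));
        rt->refs--;
    }
}

static int send_datagram(struct route *entry, bool fail_lookup)
{
    int err = 0;
    struct route *rt = route_lookup(entry, fail_lookup);

    if (is_err(rt)) {
        err = (int)ptr_err(rt);
        rt = NULL; /* the fix: do not leave the error pointer in rt */
        goto out;
    }
    /* ... the datagram would be transmitted here ... */
out:
    route_release(rt);
    return err;
}
```

Without the `rt = NULL` assignment, the shared cleanup path would pass the encoded `-EINVAL` to `route_release()`, which treats any non-NULL value as a valid object, mirroring the dst_release() crash in the trace.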