From owner-freebsd-net@FreeBSD.ORG Sat Oct 9 01:15:40 2010 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9C316106564A; Sat, 9 Oct 2010 01:15:40 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 46B078FC13; Sat, 9 Oct 2010 01:15:40 +0000 (UTC) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id BC9EC46B5C; Fri, 8 Oct 2010 21:15:39 -0400 (EDT) Date: Sat, 9 Oct 2010 02:15:39 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Attilio Rao In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="621616949-1924871622-1286586939=:1232" Cc: FreeBSD Current , freebsd-net@freebsd.org, Sergey Kandaurov , Jack F Vogel , Ryan Stone , Ed Maste Subject: Re: [PATCH] Netdump for review and testing -- preliminary version X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Oct 2010 01:15:40 -0000 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --621616949-1924871622-1286586939=:1232 Content-Type: TEXT/PLAIN; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8BIT On Fri, 8 Oct 2010, Attilio Rao wrote: >> GENERAL FRAMEWORK ARCHITECTURE >> >> Netdump is composed, right now, by an userland "server" and a kernel >> "client". The former is run on the target machine (where the dump will >> phisically happen) and it is responsible for receiving  the packets >> containing coredumps frame and for correctly writing them on-disk. The >> latter is part of the kernel installed on the source machine (where the >> dump is initiated) and is responsible for building correctly UDP packets >> containing the coredump frames, pushing through the network interface and >> routing them appropriately. Hi Attilio: Network dumps would be a great addition to the FreeBSD debugging suite! A few casual comments and questions, I need to spend some more time pondering the implications of the current netdump design later in the week. (1) Did you consider using tftp as the network dump protocol, rather than a custom protocol? It's also a simple UDP-based, ACKed file transfer protocol, with the advantage that it's widely supported by exiting tftp daemons, packet sniffers, security devices, etc. This wouldn't require using a stock tftpd, although that would certainly be a benefit as well. The post-processing of crashdumps that seems to happen in the netdump server could, presumably, happen "offline" as easily...? (2) Have you thought about adding a checksum to the dump format, since packets are going over the wire on UDP, etc? Even an MD5 sum wouldn't be too hard to arrange: all the code is in the kernel already, requires relatively little state storage, and is designed for streamed data. (3) As the bounds checking/etc behavior in dumping grows more complex, it seems a shame to replicate it in architecture-specific code. Could we pull the mediaoffset/mediasize checking into common code? Also, a reserved size/offset of "0" meaning "no limit" sounds slightly worrying; it might be slightly more conservative to add a flags field to dumperinfo and have a flag indicating "size is no object" for netdump, rather than overloading the existing fields. Some specific patch comments: + * XXX: This should be split into machdep and non-machdep parts What MD parts are in the file? The sysctl parts of the patch have a number of issues: - Please use sysctl_handle_string rather than manual calls and string bounds checking with SYSCTL_IN/SYSCTL_OUT. In general, type-specific sysctl handlers are named sysctl_handle_, which I recommend here as well. - Please use stack-local, fixed-length string buffers as the source/destination of copyin/copyout on ifnet names, rather than the fields in the structure itself. We support interface renaming, so the field can change, and sysctl has the potential to block for extended periods mid-copyin/copyout. - Because we support ifnet renaming, "none" is probably not a good reserved name. Could you use the empty string or something similar, or just a flag to indicate that netdump is disabled? - "ifp" rather than "ifn" and "nic" is the preferred variable name for ifnet pointers throughout the kernel. - ifnets can, and are, removeable at runtime. We have a refcount, but that's not really what you want. I suggest simply looking up the ifnet by name when it's needed during a dump or for sysctl output, rather than maintaining a long-term pointer, which can become stale. Especially with the auto-detect code. - Since sysctl_nic is only used once, I'd prefer it were inlined into a use-specific handler function, avoiding the arg1/arg2 complications that make this code a bit harder to read. sysctl_ip is useful because it's used more than once, though. However, it also very much wants you to call sysctl_handle_string! +sysctl_force_crash(SYSCTL_HANDLER_ARGS) Does this belong in the netdump code? We already have some of these options in debug.kdb.*, but perhaps others should be added there as well. + /* + * get and fill a header mbuf, then chain data as an extended + * mbuf. + */ + MGETHDR(m, M_DONTWAIT, MT_DATA); The idea of calling into the mbuf allocator in this context is just freaky, and may have some truly awful side effects. I suppose this is the cost of trying to combine code paths in the network device driver rather than have an independent path in the netdump case, but it's quite unfortunate and will significantly reduce the robustness of netdumps in the face of, for example, mbuf starvation. + if (ntohs(ah->ar_hrd) != ARPHRD_ETHER && + ntohs(ah->ar_hrd) != ARPHRD_IEEE802 && + ntohs(ah->ar_hrd) != ARPHRD_ARCNET && + ntohs(ah->ar_hrd) != ARPHRD_IEEE1394) { + NETDDEBUG("nd_handle_arp: unknown hardware address fmt " + "0x%2D)\n", (unsigned char *)&ah->ar_hrd, ""); + return; + } Are you sure you don't want to just check for ETHER here? + /* XXX: Probably should check if we're the recipient MAC address */ + /* Done ethernet processing. Strip off the ethernet header */ Yes, quite probably. What if you panic in promiscuous mode? + /* + * The first write (at offset 0) is the kernel dump header. Flag it + * for the server to treat specially. XXX: This doesn't strip out the + * footer KDH, although it shouldn't hurt anything. + */ The footer allows us to confirm that the tail end of a dump matches the beginning; while probably not strictly necessary in this scenario, it's not a bad idea given the lack of a checksum. + /* Default the nic to the first available interface */ + if ((ifn = TAILQ_FIRST(&ifnet)) != NULL) do { + if ((ifn->if_flags & IFF_UP) == 0) + continue; More locking needed. I'm not sure I like the idea of defaulting to net dumping out the first interface -- in the future, we'll want to compile netdump into the kernel in GENERIC so that it's always available. But we won't want to default to doing the above for a number of reasons. This means the defaults need to be sensible and quite conservative (i.e., disabled and non-autoconfigured). + /* Default the client to the first IP on nd_nic */ + if ((ifa = TAILQ_FIRST(&nd_nic->if_addrhead)) != NULL) do { + if (ifa->ifa_addr->sa_family != AF_INET) { + continue; + } + nd_client = ((struct sockaddr_in *)ifa->ifa_addr)->sin_addr; + } while ((ifa = TAILQ_NEXT(ifa, ifa_link)) != NULL && + nd_client.s_addr == INADDR_ANY); More locking needed here too. I'm guess this code is forward-ported from an older FreeBSD version without locking here, so you will want to check for other changes in the replicated code paths, such as input / arp / etc handling. And is this really a good idea? Addresses change, and I think you aren't registering a callback to pick up the change in the event you're using an auto-configured address. In some ways, in the same way you might want to look up the ifnet to use at dump-time, you might simultaneously want to auto-configure an address then, if auto-configuration is selected (i.e., netdump is enabled and no IP is configured). - void *if_pspare[7]; + struct netdump_methods *if_ndumpfuncs; /* netdump virtual methods */ + void *if_pspare[6]; int if_ispare[4]; In HEAD, at least, you should add your field and not use the spare. The spare should only be used on a merge to a KBI-stable branch (such as 8.x). I need to ponder your changes to ifnet and to the drivers some more; I recognize the benefits of maximal code alignment, but I worry a lot about these code paths having side effects (not least, calls into memory allocators, which in turn can enter VM, etc). I wonder if, given that, we need to teach the mbuf allocator to be much more conservative if netdumping is active... There are style bugs throughout. Robert --621616949-1924871622-1286586939=:1232--