Date: Mon, 6 May 2013 12:13:00 +0400
From: Gleb Smirnoff
To: Richard Sharpe
Cc: jfv@FreeBSD.org, freebsd-net@freebsd.org, Eric van Gyzen, Alfred Perlstein, Jack Vogel
Subject: Re: Seeing EINVAL from writev on 8.0 to a non-blocking socket even though the data seems to hit the wire
Message-ID: <20130506081300.GU15182@glebius.int.ru>

[I'm adding Jack Vogel, maintainer of ixgbe, to cc]

On Fri, May 03, 2013 at 07:01:18PM -0700, Richard Sharpe wrote:
R> On Fri, May 3, 2013 at 10:18 AM, Richard Sharpe wrote:
R> > On Fri, May 3, 2013 at 7:39 AM, Eric van Gyzen wrote:
R> >> On
05/02/2013 19:00, Richard Sharpe wrote:
R> >>> On Thu, May 2, 2013 at 7:52 AM, Eric van Gyzen wrote:
R> >>>> On 05/02/2013 08:48, Richard Sharpe wrote:
R> >>>>> On Wed, May 1, 2013 at 9:34 PM, Alfred Perlstein wrote:
R> >>>>>> On 5/1/13 8:03 PM, Richard Sharpe wrote:
R> >>>>>>> Hi folks,
R> >>>>>>>
R> >>>>>>> I am checking to see if there are any known bugs with respect to this
R> >>>>>>> in FreeBSD 8.0.
R> >>>>>>>
R> >>>>>>> Situation is that Samba 3.6.6 uses writev to a non-blocking socket to
R> >>>>>>> get the SMB2 requests on the wire.
R> >>>>>>>
R> >>>>>>> Intermittently, we see the writev return EINVAL even though the data
R> >>>>>>> has gotten on the wire. This I have verified by grabbing a capture and
R> >>>>>>> comparing the SMB Sequence number in the last outgoing packet on the
R> >>>>>>> wire vs the in-memory contents when we get EINVAL.
R> >>>>>>>
R> >>>>>>> Sometimes it occurs on a four-element IOVEC, sometimes we get EAGAIN
R> >>>>>>> on the four-element IOVEC and then we get EINVAL when retrying on a
R> >>>>>>> smaller IOVEC.
R> >>>>>>>
R> >>>>>>> Where should I look to check if there is some path where this might be
R> >>>>>>> happening? Is this even the correct mailing list?
R> >>>>>>>
R> >>>>>> What does the iovec look like when you get EINVAL? Can you sanity check
R> >>>>>> it? Is there anything special about it? (zero length vecs?)
R> >>>>>>
R> >>>>>> I think there are a few "maxvals" that if overrun cause EINVAL to be
R> >>>>>> returned. example is if your iovec is somehow huge or has many, many
R> >>>>>> elements.
R> >>>>> Can anyone tell me the call graph down to the TCP code?
R> >>>>>
R> >>>> writev kern/sys_generic.c
R> >>>> kern_writev
R> >>>> dofilewrite
R> >>>> fo_write in sys/file.h
R> >>>> soo_write in kern/sys_socket.c
R> >>>> sosend in kern/uipc_socket.c
R> >>>> sosend_generic
R> >>>> tcp_usr_send in netinet/tcp_usrreq.c
R> >>> Is there a tool that generates call graphs?
R> >>
R> >> I'm not aware of one that works in the kernel--other than the kernel
R> >> itself, of course. With DDB compiled in, you could set a breakpoint on,
R> >> say, tcp_output, and show the call stack with bt.
R> >>
R> >> Also, take a look at stack(9).
R> >>
R> >>> I have been able to demonstrate that I am getting EINVAL returned from
R> >>> writev kern/sys_generic.c, kern_writev, dofilewrite and soo_write,
R> >>> but when I add printfs to sosend/sosend_generic it becomes very hard
R> >>> to provoke this problem.
R> >>
R> >> So, either relocating code or changing the timing has changed the
R> >> behavior--a Heisenbug.
R> >>
R> >> If your code looks like this:
R> >>
R> >>     if (error == EINVAL)
R> >>         printf("you are here\n");
R> >>
R> >> You might add __predict_false, like this:
R> >>
R> >>     if (__predict_false(error == EINVAL))
R> >>         printf("you are here\n");
R> >>
R> >> That /might/ reduce the impact on runtime behavior.
R> >
R> > Thanks for that. The problem does not appear to be in the TCP or IP
R> > layers. Rather, it appears to be in the ixgbe driver.
R> >
R> > The problem takes a little more effort to provoke, but simple printfs
R> > are doing the job so far.
R>
R> The version of the ixgbe driver we are using seems to set the max size
R> of a dma element to 65535 (IXGBE_TSO_SIZE) and, even though large
R> numbers of iovecs are sent where the last element is 65536 bytes in
R> size, sometimes this causes EINVAL to be returned ...

Jack, can you look at this please?

--
Totus tuus, Glebius.