Date: Fri, 5 Apr 2013 07:09:24 -0400 From: Matt Miller <matt@matthewjmiller.net> To: Rick Macklem <rmacklem@uoguelph.ca> Cc: FreeBSD Net <freebsd-net@freebsd.org>, Juan Mojica <jmojica@gmail.com> Subject: Re: panic in tcp_do_segment() Message-ID: <CAFc6gu9bmJQ-Uxv7PzspSLDyFz6zE3fV=u7HEumCh6f=YQLU2Q@mail.gmail.com> In-Reply-To: <1043692819.529554.1365114790772.JavaMail.root@erie.cs.uoguelph.ca> References: <1043692819.529554.1365114790772.JavaMail.root@erie.cs.uoguelph.ca>
next in thread | previous in thread | raw e-mail | index | archive | help
Hey Rick, I believe Juan and I have root caused this crash recently. The t_state = 0x1, TCPS_LISTEN, in the link provided at the time of the assertion. In tcp_input(), if we're in TCPS_LISTEN, SO_ACCEPTCONN should be set on the socket and we should never enter tcp_do_segment() for this state. I think if you look in your corefile, you'll see the socket *doesn't* have this flag set in your case. 1043 /* 1044 * When the socket is accepting connections (the INPCB is in LISTEN 1045 * state) we look into the SYN cache if this is a new connection 1046 * attempt or the completion of a previous one. Because listen 1047 * sockets are never in TCPS_ESTABLISHED, the V_tcbinfo lock will be 1048 * held in this case. 1049 */ 1050 if (so->so_options & SO_ACCEPTCONN) { 1051 struct in_conninfo inc; 1052 1053 KASSERT(tp->t_state == TCPS_LISTEN, ("%s: so accepting but " 1054 "tp not listening", __func__)); ... 1356 syncache_add(&inc, &to, th, inp, &so, m, NULL, NULL); 1357 /* 1358 * Entry added to syncache and mbuf consumed. 1359 * Everything already unlocked by syncache_add(). 1360 */ 1361 INP_INFO_UNLOCK_ASSERT(&V_tcbinfo); 1362 return; 1363 } ... 1384 /* 1385 * Segment belongs to a connection in SYN_SENT, ESTABLISHED or later 1386 * state. tcp_do_segment() always consumes the mbuf chain, unlocks 1387 * the inpcb, and unlocks pcbinfo. 1388 */ 1389 tcp_do_segment(m, th, so, tp, drop_hdrlen, tlen, iptos, ti_locked); I think this has to do with this patch in soclose() where SO_ACCEPTCONN is being turned off in soclose(). I suspect if you look at the other threads in your corefile, you'll see one at this point in soclose() operating on the same socket as the one in the tcp_do_segment() thread. http://svnweb.freebsd.org/base?view=revision&revision=243627 817 /* 818 * Prevent new additions to the accept queues due 819 * to ACCEPT_LOCK races while we are draining them. 820 */ 821 so->so_options &= ~SO_ACCEPTCONN; 822 while ((sp = TAILQ_FIRST(&so->so_incomp)) != NULL) { 823 TAILQ_REMOVE(&so->so_incomp, sp, so_list); 824 so->so_incqlen--; 825 sp->so_qstate &= ~SQ_INCOMP; 826 sp->so_head = NULL; 827 ACCEPT_UNLOCK(); 828 soabort(sp); 829 ACCEPT_LOCK(); 830 } Juan had evaluated this code path and it seemed safe to just drop the packet in this case: + /* + * In closing down the socket, the SO_ACCEPTCONN flag is removed to + * prevent new connections from being established. This means that + * any frames in that were in the midst of being processed could + * make it here. Need to just drop the frame. + */ + if (TCPS_LISTEN == tp->t_state) { + TCPSTAT_INC(tcps_rcvwhileclosing); + goto drop; + } KASSERT(tp->t_state > TCPS_LISTEN, ("%s: TCPS_LISTEN", __func__)); Or, if there's someone more familiar with the locking in these paths, they may be able to come up with a way to restructure the locks and logic to close this window. -Matt On Thu, Apr 4, 2013 at 6:33 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote: > Hi, > > When pho@ was doing some NFS testing, he got the > following crash, which I can't figure out. (As far > as I can see, INP_WLOCK() is always held when > tp->t_state = TCPS_CLOSED and it is held from before > the test for TCPS_CLOSED in tcp_input() up until > the tcp_do_segment() call. As such, I don't see how > tp->t_state can be TCPS_CLOSED, but that seems to > be what causes the panic?) > > The "umount -f" will result in: > soshutdown(so, SHUT_WR); > soclose(so); > being done by the krpc on the socket. > > Anyone have any ideas on this? > pho@ wrote: > > I continued running the NFS tests and got this "panic: tcp_do_segment: > > TCPS_LISTEN". It's the second time I get this panic with the same test > > scenario, so it seems to be reproducible. The scenario is "umount -f" > > of a mount point that is very active. > > > > http://people.freebsd.org/~pho/stress/log/kostik555.txt > > Thanks in advance for any help, rick > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAFc6gu9bmJQ-Uxv7PzspSLDyFz6zE3fV=u7HEumCh6f=YQLU2Q>