Date: Thu, 27 May 2010 22:30:04 GMT From: "Robert N. M. Watson" <rwatson@FreeBSD.org> To: freebsd-net@FreeBSD.org Subject: Re: kern/146845: [libc] close(2) returns error 54 (connection reset by peer) wrongly Message-ID: <201005272230.o4RMU4C8033464@freefall.freebsd.org>
next in thread | raw e-mail | index | archive | help
The following reply was made to PR kern/146845; it has been noted by GNATS. From: "Robert N. M. Watson" <rwatson@FreeBSD.org> To: Mikolaj Golub <to.my.trociny@gmail.com> Cc: bug-followup@FreeBSD.org, Anton Lavrentiev <lavr@ncbi.nlm.nih.gov>, freebsd-net@FreeBSD.org Subject: Re: kern/146845: [libc] close(2) returns error 54 (connection reset by peer) wrongly Date: Thu, 27 May 2010 23:21:06 +0100 On 27 May 2010, at 22:25, Mikolaj Golub wrote: > We observed the same issue on our FreeBSD6 and 7 servers. I tried to = reproduce > the problem writing a simple test case but failed -- I didn't come to = the idea > of shutdown()/close() sequence (as Anton did). Although looking now at = the > code we had the issue with I see that shutdown()/close() sequence was = used > there too. Hi Mikolaj-- Indeed, this looks very like the ENOTCONN issue in kern/144061 -- = another indication that a bit more refinement is required in the = synchronization for our socket-layer state machine. Unfortunately, not = an issue we'll likely address in a general sense until 9.0. However, a = similar workaround to kenr/144061 does sound like the solution -- I'll = get this new regression test in the tree, as well as the workaround, = tomorrow, with the hopes of making 8.1. Thanks all! Robert >=20 > It looks like SO_LINGER is not important to reproduce ECONNRESET. > shutdown()/close() on one end and close() on the other is enough. = Also, > slowdown of one the processes (done by Anton using select()) is not = important > too. Taking this into consideration I have wrote a simplified version = of a test > to reproduce the bug (may be it worth of including to = tools/regression/sockets?). >=20 > I can easily reproduce the error with this test on FreeBSD7.1 and > 8-STABLE. Adding some prints to the kernel code I localized the place = where > the error appears and added panic() to get a backtrace. >=20 > So, the backtrace: >=20 > (kgdb) bt > #0 doadump () at pcpu.h:246 > #1 0xc04ec829 in db_fncall (dummy1=3D-1064461270, dummy2=3D0, = dummy3=3D-1, dummy4=3D0xe85e58b0 "=C4X^=E8") > at /usr/src/sys/ddb/db_command.c:548 > #2 0xc04ecc5f in db_command (last_cmdp=3D0xc0e0af9c, cmd_table=3D0x0, = dopager=3D0) > at /usr/src/sys/ddb/db_command.c:445 > #3 0xc04ecd14 in db_command_script (command=3D0xc0e0bec4 "call = doadump") > at /usr/src/sys/ddb/db_command.c:516 > #4 0xc04f0e50 in db_script_exec (scriptname=3D0xe85e59bc = "kdb.enter.panic", warnifnotfound=3DVariable "warnifnotfound" is not = available. > ) > at /usr/src/sys/ddb/db_script.c:302 > #5 0xc04f0f37 in db_script_kdbenter (eventname=3D0xc0cc78ea "panic") > at /usr/src/sys/ddb/db_script.c:324 > #6 0xc04eec18 in db_trap (type=3D3, code=3D0) at = /usr/src/sys/ddb/db_main.c:228 > #7 0xc08d9aa6 in kdb_trap (type=3D3, code=3D0, tf=3D0xe85e5af8) at = /usr/src/sys/kern/subr_kdb.c:535 > #8 0xc0befecb in trap (frame=3D0xe85e5af8) at = /usr/src/sys/i386/i386/trap.c:690 > #9 0xc0bd15eb in calltrap () at = /usr/src/sys/i386/i386/exception.s:165 > #10 0xc08d9c2a in kdb_enter (why=3D0xc0cc78ea "panic", msg=3D0xc0cc78ea = "panic") at cpufunc.h:71 > #11 0xc08a95b6 in panic (fmt=3D0xc0ce6585 "ECONNRESET") at = /usr/src/sys/kern/kern_shutdown.c:562 > #12 0xc0a3d805 in tcp_usr_disconnect (so=3D0xc715c670) at = /usr/src/sys/netinet/tcp_usrreq.c:552 > #13 0xc09111bd in sodisconnect (so=3D0xc715c670) at = /usr/src/sys/kern/uipc_socket.c:810 > #14 0xc0914144 in soclose (so=3D0xc715c670) at = /usr/src/sys/kern/uipc_socket.c:658 > #15 0xc08f6459 in soo_close (fp=3D0xc743e230, td=3D0xc7023000) > at /usr/src/sys/kern/sys_socket.c:291 > #16 0xc086efc3 in _fdrop (fp=3D0xc743e230, td=3D0xc7023000) at = file.h:293 > #17 0xc0870cf0 in closef (fp=3D0xc743e230, td=3D0xc7023000) > at /usr/src/sys/kern/kern_descrip.c:2117 > #18 0xc0871097 in kern_close (td=3D0xc7023000, fd=3D4) at = /usr/src/sys/kern/kern_descrip.c:1162 > #19 0xc087123a in close (td=3D0xc7023000, uap=3D0xe85e5cf8) > at /usr/src/sys/kern/kern_descrip.c:1114 > #20 0xc0bef600 in syscall (frame=3D0xe85e5d38) at = /usr/src/sys/i386/i386/trap.c:1111 > #21 0xc0bd1680 in Xint0x80_syscall () at = /usr/src/sys/i386/i386/exception.s:261 > #22 0x00000033 in ?? () > Previous frame inner to this frame (corrupt stack?) > (kgdb) fr 12 > #12 0xc0a3d805 in tcp_usr_disconnect (so=3D0xc715c670) at = /usr/src/sys/netinet/tcp_usrreq.c:552 > 552 panic("ECONNRESET"); > (kgdb) list > 547 inp =3D sotoinpcb(so); > 548 KASSERT(inp !=3D NULL, ("tcp_usr_disconnect: inp =3D=3D = NULL")); > 549 INP_WLOCK(inp); > 550 if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) { > 551 error =3D ECONNRESET; > 552 panic("ECONNRESET"); > 553 /* log(LOG_INFO, "ECONNRESET 3: file %s; line = %d\n", __FILE__, __LINE__); */ > 554 goto out; > 555 } > 556 tp =3D intotcpcb(inp); > (kgdb) p/x inp->inp_flags > $1 =3D 0x4800000 >=20 > #define INP_DROPPED 0x04000000 /* protocol drop flag */ >=20 > (kgdb) fr 14 > #14 0xc0914144 in soclose (so=3D0xc715c670) at = /usr/src/sys/kern/uipc_socket.c:658 > 658 error =3D sodisconnect(so); > (kgdb) list > 653 > 654 CURVNET_SET(so->so_vnet); > 655 funsetown(&so->so_sigio); > 656 if (so->so_state & SS_ISCONNECTED) { > 657 if ((so->so_state & SS_ISDISCONNECTING) =3D=3D = 0) { > 658 error =3D sodisconnect(so); > 659 if (error) { > 660 if (error =3D=3D ENOTCONN) > 661 error =3D 0; > 662 goto drop; > (kgdb) p/x so->so_state =20 > $2 =3D 0x2000 >=20 > #define SS_ISDISCONNECTED 0x2000 /* socket disconnected from = peer */ >=20 > (the code differs a bit here from original 8-STABLE version as it = includes > patch from kern/144061 to deal with ENOTCONN errors but this does not > influence). >=20 > Actually, I think that the issue here is similar to kern/144061. In = the code > above so_state is checked for SS_ISCONNECTED and SS_ISDISCONNECTING = without > locking and then sodisconnect() is called. It looks like the following = may > happen: >=20 > 1) after shutdown() our output is closed; >=20 > 2) then we call close(), soclose() checks that we are still in = SS_ISCONNECTED > and calls sodisconnect(); >=20 > 3) at this time FIN arrives from the other end, which has called = close() too, > and the kernel disconnects the socket (INP_DROPPED is set); >=20 > 4) sodisconnect()/tcp_usr_disconnect() checks for INP_DROPPED and = returns > ECONNRESET. >=20 > I am attaching the patch, which may not be a solution but rather for > illustration to described above. Running the test with this patch I am > observing the following messages in error logs >=20 > May 27 23:55:41 zhuzha kernel: ECONNRESET: so->state: 0x2000; file = /usr/src/sys/kern/uipc_socket.c; line 664 >=20 > and test does not fail. >=20 > --=20 > Mikolaj Golub >=20 > <tcp_close.c><uipc_socket.c.econnreset.patch>
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201005272230.o4RMU4C8033464>