Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 27 May 2010 23:21:06 +0100
From:      "Robert N. M. Watson" <rwatson@FreeBSD.org>
To:        Mikolaj Golub <to.my.trociny@gmail.com>
Cc:        freebsd-net@FreeBSD.org, Anton Lavrentiev <lavr@ncbi.nlm.nih.gov>, bug-followup@FreeBSD.org
Subject:   Re: kern/146845: [libc] close(2) returns error 54 (connection reset by peer) wrongly
Message-ID:  <90057FE4-4DCF-48F1-8D95-F1438E0799CD@FreeBSD.org>
In-Reply-To: <86vda982ax.fsf@kopusha.home.net>
References:  <201005231404.o4NE4xJd006658@freefall.freebsd.org> <86vda982ax.fsf@kopusha.home.net>

next in thread | previous in thread | raw e-mail | index | archive | help

On 27 May 2010, at 22:25, Mikolaj Golub wrote:

> We observed the same issue on our FreeBSD6 and 7 servers. I tried to =
reproduce
> the problem writing a simple test case but failed -- I didn't come to =
the idea
> of shutdown()/close() sequence (as Anton did). Although looking now at =
the
> code we had the issue with I see that shutdown()/close() sequence was =
used
> there too.

Hi Mikolaj--

Indeed, this looks very like the ENOTCONN issue in kern/144061 -- =
another indication that a bit more refinement is required in the =
synchronization for our socket-layer state machine. Unfortunately, not =
an issue we'll likely address in a general sense until 9.0. However, a =
similar workaround to kenr/144061 does sound like the solution -- I'll =
get this new regression test in the tree, as well as the workaround, =
tomorrow, with the hopes of making 8.1.

Thanks all!

Robert

>=20
> It looks like SO_LINGER is not important to reproduce ECONNRESET.
> shutdown()/close() on one end and close() on the other is enough. =
Also,
> slowdown of one the processes (done by Anton using select()) is not =
important
> too. Taking this into consideration I have wrote a simplified version =
of a test
> to reproduce the bug (may be it worth of including to =
tools/regression/sockets?).
>=20
> I can easily reproduce the error with this test on FreeBSD7.1 and
> 8-STABLE. Adding some prints to the kernel code I localized the place =
where
> the error appears and added panic() to get a backtrace.
>=20
> So, the backtrace:
>=20
> (kgdb) bt
> #0  doadump () at pcpu.h:246
> #1  0xc04ec829 in db_fncall (dummy1=3D-1064461270, dummy2=3D0, =
dummy3=3D-1, dummy4=3D0xe85e58b0 "=C4X^=E8")
>    at /usr/src/sys/ddb/db_command.c:548
> #2  0xc04ecc5f in db_command (last_cmdp=3D0xc0e0af9c, cmd_table=3D0x0, =
dopager=3D0)
>    at /usr/src/sys/ddb/db_command.c:445
> #3  0xc04ecd14 in db_command_script (command=3D0xc0e0bec4 "call =
doadump")
>    at /usr/src/sys/ddb/db_command.c:516
> #4  0xc04f0e50 in db_script_exec (scriptname=3D0xe85e59bc =
"kdb.enter.panic", warnifnotfound=3DVariable "warnifnotfound" is not =
available.
> )
>    at /usr/src/sys/ddb/db_script.c:302
> #5  0xc04f0f37 in db_script_kdbenter (eventname=3D0xc0cc78ea "panic")
>    at /usr/src/sys/ddb/db_script.c:324
> #6  0xc04eec18 in db_trap (type=3D3, code=3D0) at =
/usr/src/sys/ddb/db_main.c:228
> #7  0xc08d9aa6 in kdb_trap (type=3D3, code=3D0, tf=3D0xe85e5af8) at =
/usr/src/sys/kern/subr_kdb.c:535
> #8  0xc0befecb in trap (frame=3D0xe85e5af8) at =
/usr/src/sys/i386/i386/trap.c:690
> #9  0xc0bd15eb in calltrap () at =
/usr/src/sys/i386/i386/exception.s:165
> #10 0xc08d9c2a in kdb_enter (why=3D0xc0cc78ea "panic", msg=3D0xc0cc78ea =
"panic") at cpufunc.h:71
> #11 0xc08a95b6 in panic (fmt=3D0xc0ce6585 "ECONNRESET") at =
/usr/src/sys/kern/kern_shutdown.c:562
> #12 0xc0a3d805 in tcp_usr_disconnect (so=3D0xc715c670) at =
/usr/src/sys/netinet/tcp_usrreq.c:552
> #13 0xc09111bd in sodisconnect (so=3D0xc715c670) at =
/usr/src/sys/kern/uipc_socket.c:810
> #14 0xc0914144 in soclose (so=3D0xc715c670) at =
/usr/src/sys/kern/uipc_socket.c:658
> #15 0xc08f6459 in soo_close (fp=3D0xc743e230, td=3D0xc7023000)
>    at /usr/src/sys/kern/sys_socket.c:291
> #16 0xc086efc3 in _fdrop (fp=3D0xc743e230, td=3D0xc7023000) at =
file.h:293
> #17 0xc0870cf0 in closef (fp=3D0xc743e230, td=3D0xc7023000)
>    at /usr/src/sys/kern/kern_descrip.c:2117
> #18 0xc0871097 in kern_close (td=3D0xc7023000, fd=3D4) at =
/usr/src/sys/kern/kern_descrip.c:1162
> #19 0xc087123a in close (td=3D0xc7023000, uap=3D0xe85e5cf8)
>    at /usr/src/sys/kern/kern_descrip.c:1114
> #20 0xc0bef600 in syscall (frame=3D0xe85e5d38) at =
/usr/src/sys/i386/i386/trap.c:1111
> #21 0xc0bd1680 in Xint0x80_syscall () at =
/usr/src/sys/i386/i386/exception.s:261
> #22 0x00000033 in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb) fr 12
> #12 0xc0a3d805 in tcp_usr_disconnect (so=3D0xc715c670) at =
/usr/src/sys/netinet/tcp_usrreq.c:552
> 552                     panic("ECONNRESET");
> (kgdb) list
> 547             inp =3D sotoinpcb(so);
> 548             KASSERT(inp !=3D NULL, ("tcp_usr_disconnect: inp =3D=3D =
NULL"));
> 549             INP_WLOCK(inp);
> 550             if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) {
> 551                     error =3D ECONNRESET;
> 552                     panic("ECONNRESET");
> 553                     /* log(LOG_INFO, "ECONNRESET 3: file %s; line =
%d\n", __FILE__, __LINE__); */
> 554                     goto out;
> 555             }
> 556             tp =3D intotcpcb(inp);
> (kgdb) p/x inp->inp_flags
> $1 =3D 0x4800000
>=20
> #define INP_DROPPED             0x04000000 /* protocol drop flag */
>=20
> (kgdb) fr 14
> #14 0xc0914144 in soclose (so=3D0xc715c670) at =
/usr/src/sys/kern/uipc_socket.c:658
> 658                             error =3D sodisconnect(so);
> (kgdb) list
> 653
> 654             CURVNET_SET(so->so_vnet);
> 655             funsetown(&so->so_sigio);
> 656             if (so->so_state & SS_ISCONNECTED) {
> 657                     if ((so->so_state & SS_ISDISCONNECTING) =3D=3D =
0) {
> 658                             error =3D sodisconnect(so);
> 659                             if (error) {
> 660                                     if (error =3D=3D ENOTCONN)
> 661                                             error =3D 0;
> 662                                     goto drop;
> (kgdb) p/x so->so_state =20
> $2 =3D 0x2000
>=20
> #define SS_ISDISCONNECTED       0x2000  /* socket disconnected from =
peer */
>=20
> (the code differs a bit here from original 8-STABLE version as it =
includes
> patch from kern/144061 to deal with ENOTCONN errors but this does not
> influence).
>=20
> Actually, I think that the issue here is similar to kern/144061. In =
the code
> above so_state is checked for SS_ISCONNECTED and SS_ISDISCONNECTING =
without
> locking and then sodisconnect() is called. It looks like the following =
may
> happen:
>=20
> 1) after shutdown() our output is closed;
>=20
> 2) then we call close(), soclose() checks that we are still in =
SS_ISCONNECTED
> and calls sodisconnect();
>=20
> 3) at this time FIN arrives from the other end, which has called =
close() too,
> and the kernel disconnects the socket (INP_DROPPED is set);
>=20
> 4) sodisconnect()/tcp_usr_disconnect() checks for INP_DROPPED and =
returns
> ECONNRESET.
>=20
> I am attaching the patch, which may not be a solution but rather for
> illustration to described above. Running the test with this patch I am
> observing the following messages in error logs
>=20
> May 27 23:55:41 zhuzha kernel: ECONNRESET: so->state: 0x2000; file =
/usr/src/sys/kern/uipc_socket.c; line 664
>=20
> and test does not fail.
>=20
> --=20
> Mikolaj Golub
>=20
> <tcp_close.c><uipc_socket.c.econnreset.patch>




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?90057FE4-4DCF-48F1-8D95-F1438E0799CD>