Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 27 May 2010 21:30:04 GMT
From:      Mikolaj Golub <to.my.trociny@gmail.com>
To:        freebsd-net@FreeBSD.org
Subject:   Re: kern/146845: [libc] close(2) returns error 54 (connection reset by peer) wrongly
Message-ID:  <201005272130.o4RLU4NB082160@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help
The following reply was made to PR kern/146845; it has been noted by GNATS.

From: Mikolaj Golub <to.my.trociny@gmail.com>
To: bug-followup@FreeBSD.org
Cc: Anton Lavrentiev <lavr@ncbi.nlm.nih.gov>, Robert Watson <rwatson@FreeBSD.org>, freebsd-net@FreeBSD.org
Subject: Re: kern/146845: [libc] close(2) returns error 54 (connection reset by peer) wrongly
Date: Fri, 28 May 2010 00:25:42 +0300

 --=-=-=
 Content-Type: text/plain; charset=koi8-r
 Content-Transfer-Encoding: 8bit
 
 Hi,
 
 We observed the same issue on our FreeBSD6 and 7 servers. I tried to reproduce
 the problem writing a simple test case but failed -- I didn't come to the idea
 of shutdown()/close() sequence (as Anton did). Although looking now at the
 code we had the issue with I see that shutdown()/close() sequence was used
 there too.
 
 It looks like SO_LINGER is not important to reproduce ECONNRESET.
 shutdown()/close() on one end and close() on the other is enough. Also,
 slowdown of one the processes (done by Anton using select()) is not important
 too. Taking this into consideration I have wrote a simplified version of a test
 to reproduce the bug (may be it worth of including to tools/regression/sockets?).
 
 I can easily reproduce the error with this test on FreeBSD7.1 and
 8-STABLE. Adding some prints to the kernel code I localized the place where
 the error appears and added panic() to get a backtrace.
 
 So, the backtrace:
 
 (kgdb) bt
 #0  doadump () at pcpu.h:246
 #1  0xc04ec829 in db_fncall (dummy1=-1064461270, dummy2=0, dummy3=-1, dummy4=0xe85e58b0 "ÄX^è")
     at /usr/src/sys/ddb/db_command.c:548
 #2  0xc04ecc5f in db_command (last_cmdp=0xc0e0af9c, cmd_table=0x0, dopager=0)
     at /usr/src/sys/ddb/db_command.c:445
 #3  0xc04ecd14 in db_command_script (command=0xc0e0bec4 "call doadump")
     at /usr/src/sys/ddb/db_command.c:516
 #4  0xc04f0e50 in db_script_exec (scriptname=0xe85e59bc "kdb.enter.panic", warnifnotfound=Variable "warnifnotfound" is not available.
 )
     at /usr/src/sys/ddb/db_script.c:302
 #5  0xc04f0f37 in db_script_kdbenter (eventname=0xc0cc78ea "panic")
     at /usr/src/sys/ddb/db_script.c:324
 #6  0xc04eec18 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:228
 #7  0xc08d9aa6 in kdb_trap (type=3, code=0, tf=0xe85e5af8) at /usr/src/sys/kern/subr_kdb.c:535
 #8  0xc0befecb in trap (frame=0xe85e5af8) at /usr/src/sys/i386/i386/trap.c:690
 #9  0xc0bd15eb in calltrap () at /usr/src/sys/i386/i386/exception.s:165
 #10 0xc08d9c2a in kdb_enter (why=0xc0cc78ea "panic", msg=0xc0cc78ea "panic") at cpufunc.h:71
 #11 0xc08a95b6 in panic (fmt=0xc0ce6585 "ECONNRESET") at /usr/src/sys/kern/kern_shutdown.c:562
 #12 0xc0a3d805 in tcp_usr_disconnect (so=0xc715c670) at /usr/src/sys/netinet/tcp_usrreq.c:552
 #13 0xc09111bd in sodisconnect (so=0xc715c670) at /usr/src/sys/kern/uipc_socket.c:810
 #14 0xc0914144 in soclose (so=0xc715c670) at /usr/src/sys/kern/uipc_socket.c:658
 #15 0xc08f6459 in soo_close (fp=0xc743e230, td=0xc7023000)
     at /usr/src/sys/kern/sys_socket.c:291
 #16 0xc086efc3 in _fdrop (fp=0xc743e230, td=0xc7023000) at file.h:293
 #17 0xc0870cf0 in closef (fp=0xc743e230, td=0xc7023000)
     at /usr/src/sys/kern/kern_descrip.c:2117
 #18 0xc0871097 in kern_close (td=0xc7023000, fd=4) at /usr/src/sys/kern/kern_descrip.c:1162
 #19 0xc087123a in close (td=0xc7023000, uap=0xe85e5cf8)
     at /usr/src/sys/kern/kern_descrip.c:1114
 #20 0xc0bef600 in syscall (frame=0xe85e5d38) at /usr/src/sys/i386/i386/trap.c:1111
 #21 0xc0bd1680 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:261
 #22 0x00000033 in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 (kgdb) fr 12
 #12 0xc0a3d805 in tcp_usr_disconnect (so=0xc715c670) at /usr/src/sys/netinet/tcp_usrreq.c:552
 552                     panic("ECONNRESET");
 (kgdb) list
 547             inp = sotoinpcb(so);
 548             KASSERT(inp != NULL, ("tcp_usr_disconnect: inp == NULL"));
 549             INP_WLOCK(inp);
 550             if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) {
 551                     error = ECONNRESET;
 552                     panic("ECONNRESET");
 553                     /* log(LOG_INFO, "ECONNRESET 3: file %s; line %d\n", __FILE__, __LINE__); */
 554                     goto out;
 555             }
 556             tp = intotcpcb(inp);
 (kgdb) p/x inp->inp_flags
 $1 = 0x4800000
 
 #define INP_DROPPED             0x04000000 /* protocol drop flag */
 
 (kgdb) fr 14
 #14 0xc0914144 in soclose (so=0xc715c670) at /usr/src/sys/kern/uipc_socket.c:658
 658                             error = sodisconnect(so);
 (kgdb) list
 653
 654             CURVNET_SET(so->so_vnet);
 655             funsetown(&so->so_sigio);
 656             if (so->so_state & SS_ISCONNECTED) {
 657                     if ((so->so_state & SS_ISDISCONNECTING) == 0) {
 658                             error = sodisconnect(so);
 659                             if (error) {
 660                                     if (error == ENOTCONN)
 661                                             error = 0;
 662                                     goto drop;
 (kgdb) p/x so->so_state  
 $2 = 0x2000
 
 #define SS_ISDISCONNECTED       0x2000  /* socket disconnected from peer */
 
 (the code differs a bit here from original 8-STABLE version as it includes
 patch from kern/144061 to deal with ENOTCONN errors but this does not
 influence).
 
 Actually, I think that the issue here is similar to kern/144061. In the code
 above so_state is checked for SS_ISCONNECTED and SS_ISDISCONNECTING without
 locking and then sodisconnect() is called. It looks like the following may
 happen:
 
 1) after shutdown() our output is closed;
 
 2) then we call close(), soclose() checks that we are still in SS_ISCONNECTED
 and calls sodisconnect();
 
 3) at this time FIN arrives from the other end, which has called close() too,
 and the kernel disconnects the socket (INP_DROPPED is set);
 
 4) sodisconnect()/tcp_usr_disconnect() checks for INP_DROPPED and returns
 ECONNRESET.
 
 I am attaching the patch, which may not be a solution but rather for
 illustration to described above. Running the test with this patch I am
 observing the following messages in error logs
 
 May 27 23:55:41 zhuzha kernel: ECONNRESET: so->state: 0x2000; file /usr/src/sys/kern/uipc_socket.c; line 664
 
 and test does not fail.
 
 -- 
 Mikolaj Golub
 
 
 --=-=-=
 Content-Type: application/octet-stream
 Content-Disposition: attachment; filename=tcp_close.c
 Content-Transfer-Encoding: base64
 
 I2luY2x1ZGUgPHN5cy90eXBlcy5oPgojaW5jbHVkZSA8c3lzL3NvY2tldC5oPgojaW5jbHVkZSA8
 c3lzL3N5c2N0bC5oPgojaW5jbHVkZSA8bmV0aW5ldC9pbi5oPgojaW5jbHVkZSA8c2lnbmFsLmg+
 CiNpbmNsdWRlIDxzdGRpby5oPgojaW5jbHVkZSA8c3RyaW5nLmg+CiNpbmNsdWRlIDxzdGRsaWIu
 aD4KI2luY2x1ZGUgPHVuaXN0ZC5oPgojaW5jbHVkZSA8ZXJyLmg+CgojZGVmaW5lIExPT1BTICAg
 MTAwMDAwCiNkZWZpbmUgVVNMRUVQICAxMDAKCmludAptYWluKGludCBhcmdjLCBjaGFyICoqYXJn
 dikKewoJc3RydWN0IHNvY2thZGRyX2luIHNpbjsKCWludCBsaXN0ZW5mZCwgY29ubmZkLCBwaWQ7
 Cgl1X2ludCBjb3VudGVyLCBuY3B1czsKCXNpemVfdCBsZW47Cglzb2NrbGVuX3Qgc29sZW47CgoJ
 bGVuID0gc2l6ZW9mKG5jcHVzKTsKCWlmIChzeXNjdGxieW5hbWUoImtlcm4uc21wLmNwdXMiLCAm
 bmNwdXMsICZsZW4sIE5VTEwsIDApIDwgMCkKCQllcnIoMSwgImtlcm4uc21wLmNwdXMiKTsKCWlm
 IChsZW4gIT0gc2l6ZW9mKG5jcHVzKSkKCQllcnJ4KDEsICJrZXJuLnNtcC5jcHVzOiBpbnZhbGlk
 IGxlbmd0aCIpOwoJaWYgKG5jcHVzIDwgMikKCQl3YXJueCgiU01QIG5vdCBwcmVzZW50LCB0ZXN0
 IG1heSBiZSB1bmFibGUgdG8gdHJpZ2dlciByYWNlIik7CgoJLyoKCSAqIENyZWF0ZSBhIHRjcCBz
 b2NrZXQgdGhhdCB0aGUgY2hpbGQgd2lsbCByZXBlYXRlZGx5IGFjY2VwdCgpCgkgKiBmcm9tLCBh
 bmQgdGhhdCB0aGUgcGFyZW50IHdpbGwgcmVwZWF0ZWRseSBjb25uZWN0KCkgdG8uCgkgKiBDaG9v
 c2UgdGhlIGxpc3RlbiBwb3J0IGR5bmFtaWNhbGx5IGFuZCBnZXQgaXRzIHZhbHVlIHVzaW5nCgkg
 KiBnZXRzb2NrbmFtZSgpLgoJICovCglpZiAoKGxpc3RlbmZkID0gc29ja2V0KEFGX0lORVQsIFNP
 Q0tfU1RSRUFNLCAwKSkgPCAwKQoJCWVycigxLCAicGFyZW50OiBzb2NrZXQgZXJyb3IiKTsKCW1l
 bXNldCgmc2luLCAwLCBzaXplb2Yoc2luKSk7CglzaW4uc2luX2xlbiA9IHNpemVvZihzaW4pOwoJ
 c2luLnNpbl9mYW1pbHkgPSBBRl9JTkVUOwoJaWYgKGJpbmQobGlzdGVuZmQsIChzdHJ1Y3Qgc29j
 a2FkZHIgKikgJnNpbiwKCQkgc2l6ZW9mKHNpbikpIDwgMCkKCQllcnIoMSwgInBhcmVudDogYmlu
 ZCBlcnJvciIpOwoJaWYgKGdldHNvY2tuYW1lKGxpc3RlbmZkLCAoc3RydWN0IHNvY2thZGRyICop
 ICZzaW4sCgkJCSZzb2xlbikgPCAwKQoJCWVycigxLCAicGFyZW50OiBnZXRzb2NrbmFtZSBlcnJv
 ciIpOwoJaWYgKGxpc3RlbihsaXN0ZW5mZCwgMTAyNCkgPCAwKQoJCWVycigxLCAicGFyZW50OiBs
 aXN0ZW4gZXJyb3IiKTsKCXBpZCA9IGZvcmsoKTsKCWlmIChwaWQgPT0gLTEpCgkJZXJyKDEsICJm
 b3JrKCkiKTsKCWlmIChwaWQgIT0gMCkgewoJCS8qCgkJICogSW4gdGhlIHBhcmVudCwgcmVwZWF0
 ZWRseSBjb25uZWN0IGFuZCBkaXNjb25uZWN0CgkJICogZnJvbSB0aGUgc29ja2V0LCBhdHRlbXB0
 aW5nIHRvIGluZHVjZSB0aGUgcmFjZS4KCQkgKi8KCQljbG9zZShsaXN0ZW5mZCk7CgkJc2xlZXAo
 MSk7CgkJZm9yIChjb3VudGVyID0gMDsgY291bnRlciA8IExPT1BTOyBjb3VudGVyKyspIHsKCQkJ
 aWYgKChjb25uZmQgPSBzb2NrZXQoQUZfSU5FVCwgU09DS19TVFJFQU0sIDApKSA8IDApIHsKCQkJ
 CSh2b2lkKWtpbGwocGlkLCBTSUdURVJNKTsKCQkJCWVycigxLCAicGFyZW50OiBzb2NrZXQgZXJy
 b3IiKTsKCQkJfQoJCQlpZiAoY29ubmVjdChjb25uZmQsIChzdHJ1Y3Qgc29ja2FkZHIgKikmc2lu
 LAoJCQkJICAgIHNpemVvZihzaW4pKSA8IDApIHsKCQkJCWVycigxLCAicGFyZW50OiBjb25uZWN0
 IGVycm9yIChhZnRlciAlZCBpdGVyYXRpb25zKSIsCgkJCQkgICAgY291bnRlcik7CgkJCX0KCQkJ
 aWYgKGNsb3NlKGNvbm5mZCkgPCAwKSB7CgkJCQkodm9pZClraWxsKHBpZCwgU0lHVEVSTSk7CgkJ
 CQllcnIoMSwgInBhcmVudDogY2xvc2UgZXJyb3IiKTsKCQkJfQoJCQl1c2xlZXAoVVNMRUVQKTsK
 CQl9CgkJKHZvaWQpa2lsbChwaWQsIFNJR1RFUk0pOwoJfSBlbHNlIHsKCQkvKgoJCSAqIEluIHRo
 ZSBjaGlsZCwgbG9vcCBhY2NlcHRpbmcsIHNodXRkb3duIGFuZCBjbG9zaW5nLgoJCSAqIFdlIG1h
 eSBwaWNrIHVwIHRoZSByYWNlIGhlcmUgc28gcmVwb3J0IGVycm9ycyBmcm9tCgkJICogY2xvc2Uo
 KS4KCQkgKi8KCQlmb3IgKCA7IDsgKSB7CgkJCWlmICgoY29ubmZkID0gYWNjZXB0KGxpc3RlbmZk
 LAoJCQkgICAgKHN0cnVjdCBzb2NrYWRkciAqKU5VTEwsIE5VTEwpKSA8IDApCgkJCQllcnIoMSwg
 ImNoaWxkOiBhY2NlcHQgZXJyb3IiKTsKCQkJaWYgKHNodXRkb3duKGNvbm5mZCwgU0hVVF9XUikg
 PCAwKQoJCQkJZXJyKDEsICJjaGlsZDogc2h1dGRvd24gZXJyb3IiKTsKCQkJaWYgKGNsb3NlKGNv
 bm5mZCkgPCAwKQoJCQkJZXJyKDEsICJjaGlsZDogY2xvc2UgZXJyb3IiKTsKCQl9Cgl9Cglwcmlu
 dGYoIk9LXG4iKTsKCWV4aXQoMCk7Cn0K
 --=-=-=
 Content-Type: application/octet-stream
 Content-Disposition: attachment; filename=uipc_socket.c.econnreset.patch
 Content-Transfer-Encoding: base64
 
 LS0tIHN5cy9rZXJuL3VpcGNfc29ja2V0LmMub3JpZwkyMDEwLTA0LTAyIDEzOjM2OjE5LjAwMDAw
 MDAwMCArMDMwMAorKysgc3lzL2tlcm4vdWlwY19zb2NrZXQuYwkyMDEwLTA1LTI3IDIyOjA5OjA2
 LjAwMDAwMDAwMCArMDMwMApAQCAtMTI5LDYgKzEyOSw3IEBAIF9fRkJTRElEKCIkRnJlZUJTRDog
 c3JjL3N5cy9rZXJuL3VpcGNfc28KICNpbmNsdWRlIDxzeXMvc3lzY3RsLmg+CiAjaW5jbHVkZSA8
 c3lzL3Vpby5oPgogI2luY2x1ZGUgPHN5cy9qYWlsLmg+CisjaW5jbHVkZSA8c3lzL3N5c2xvZy5o
 PgogCiAjaW5jbHVkZSA8bmV0L3ZuZXQuaD4KIApAQCAtNjU2LDggKzY1NywxNSBAQCBzb2Nsb3Nl
 KHN0cnVjdCBzb2NrZXQgKnNvKQogCWlmIChzby0+c29fc3RhdGUgJiBTU19JU0NPTk5FQ1RFRCkg
 ewogCQlpZiAoKHNvLT5zb19zdGF0ZSAmIFNTX0lTRElTQ09OTkVDVElORykgPT0gMCkgewogCQkJ
 ZXJyb3IgPSBzb2Rpc2Nvbm5lY3Qoc28pOwotCQkJaWYgKGVycm9yKQorCQkJaWYgKGVycm9yKSB7
 CisJCQkJaWYgKGVycm9yID09IEVDT05OUkVTRVQpIHsKKwkJCQkJbG9nKExPR19JTkZPLCAKKwkJ
 CQkJICAgICJFQ09OTlJFU0VUOiBzby0+c3RhdGU6IDB4JXg7IGZpbGUgJXM7IGxpbmUgJWRcbiIs
 CisJCQkJCSAgICBzby0+c29fc3RhdGUsIF9fRklMRV9fLCBfX0xJTkVfXyk7CisJCQkJCWVycm9y
 ID0gMDsKKwkJCQl9CiAJCQkJZ290byBkcm9wOworCQkJfQogCQl9CiAJCWlmIChzby0+c29fb3B0
 aW9ucyAmIFNPX0xJTkdFUikgewogCQkJaWYgKChzby0+c29fc3RhdGUgJiBTU19JU0RJU0NPTk5F
 Q1RJTkcpICYmCg==
 --=-=-=--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201005272130.o4RLU4NB082160>