From owner-freebsd-current@FreeBSD.ORG Sun Jul 16 14:23:48 2006 Return-Path: X-Original-To: freebsd-current@FreeBSD.org Delivered-To: freebsd-current@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A8D9F16A4DA for ; Sun, 16 Jul 2006 14:23:48 +0000 (UTC) (envelope-from nork@FreeBSD.org) Received: from sakura.ninth-nine.com (sakura.ninth-nine.com [219.127.74.120]) by mx1.FreeBSD.org (Postfix) with ESMTP id 31AE543D4C for ; Sun, 16 Jul 2006 14:23:48 +0000 (GMT) (envelope-from nork@FreeBSD.org) Received: from nadesico.ninth-nine.com ([IPv6:2001:3e0:4cf:1:d2:ff:fe23:1b4]) (authenticated bits=0) by sakura.ninth-nine.com (8.13.6/8.13.6/NinthNine) with ESMTP id k6GENfU6052399 for ; Sun, 16 Jul 2006 23:23:46 +0900 (JST) (envelope-from nork@FreeBSD.org) Date: Sun, 16 Jul 2006 23:23:38 +0900 From: Norikatsu Shigemura To: freebsd-current@FreeBSD.org Message-Id: <20060716232338.2357f50a.nork@FreeBSD.org> X-Mailer: Sylpheed version 2.2.6 (GTK+ 2.8.20; i386-portbld-freebsd7.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (sakura.ninth-nine.com [IPv6:2001:3e0:4cf:0:230:48ff:fe41:2455]); Sun, 16 Jul 2006 23:23:46 +0900 (JST) Cc: Subject: many thread applications are unstable on 7-current X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Jul 2006 14:23:48 -0000 I think that this is FYI, maybe. Sorry, I don't know why it causes. I could not trace problems. The problems are happened for about one ~ two month, but I report following environment with SYMVER_ENABLE=YES. I don't know what happened on 7-current without SYMBER_ENABLE. FreeBSD nadesico.ninth-nine.com 7.0-CURRENT FreeBSD 7.0-CURRENT #87: Sun Jul 16 11:48:32 JST 2006 nork@nadesico.ninth-nine.com:/usr/obj/usr/src/sys/NADESICO i386 The applications which I troubled are following: 1. cvsync (net/cvsync) 2. openoffice.org (editors/openoffice.org-2.0) 3. firefox (www/firefox) 4. scim-launcher (textproc/scim) It offten crash, and automatically recovery. And it is too hard to trace:-(. So I didn't trace it. ---------------------------------------------------------- 1. cvsync (net/cvsync) # ktrace -i /usr/local/bin/cvsync -v -c FreeBSD.conf : Updating (collection freebsd-cvsroot-ports/rcs) /home/ncvs/CVSROOT-ports/commitlogs/.cvsync.QCnj4R: Unknown error: 0 Updater(RCS): UPDATE: /home/ncvs/CVSROOT-ports/commitlogs/ports Updater: RCS Error Socket Error: recv: 2 residue 2 Receiver Error: recv Mux(RECV) Error: not running: 0 FileScan: RCS Error Mux(SEND) Error: not running: 0 DirScan: RCS Error Failed I got following kdump's result: (snip) 4667 cvsync CALL utimes(0x831071c,0xbf5faeb0) 4667 cvsync NAMI "/home/ncvs/CVSROOT-ports/commitlogs/.cvsync.QCnj4R" 4667 cvsync RET utimes 0 4667 cvsync CALL rename(0x831071c,0x8310218) 4667 cvsync NAMI "/home/ncvs/CVSROOT-ports/commitlogs/.cvsync.QCnj4R" 4667 cvsync NAMI "/home/ncvs/CVSROOT-ports/commitlogs/ports" 4667 cvsync RET rename 0 4667 cvsync CALL unlink(0x831071c) 4667 cvsync NAMI "/home/ncvs/CVSROOT-ports/commitlogs/.cvsync.QCnj4R" 4667 cvsync RET unlink -1 errno 2 No such file or directory 4667 cvsync CALL write(0x2,0xbf5fa690,0x45) 4667 cvsync GIO fd 2 wrote 69 bytes "/home/ncvs/CVSROOT-ports/commitlogs/.cvsync.QCnj4R: Unknown error: 0 " 4667 cvsync RET write 69/0x45 (snip) It looks like errno cross-talking per threads. ---------------------------------------------------------- 2. openoffice.org (editors/openoffice.org-2.0) $ ktrace -i /usr/local/bin/openoffice.org (Open new document) X IO Error I got following kdump's result: (snip) 4720 soffice.bin CALL socket(PF_LOCAL|PF_UNIX,SOCK_STREAM,0) 4720 soffice.bin RET socket 36/0x24 (snip) 4720 soffice.bin CALL connect(0x24,0xbfbfceb8,0x13) 4720 soffice.bin NAMI "/tmp/.X11-unix/X0" 4720 soffice.bin RET connect 0 (snip) 4720 soffice.bin CALL fcntl(0x24,F_SETFD,FD_CLOEXEC) 4720 soffice.bin RET fcntl 0 (snip) 4720 soffice.bin CALL write(0x24,0x9014000,0x8) 4720 soffice.bin GIO fd 36 wrote 8 bytes 0x0000 1700 0200 2401 0000 |....$...| 4720 soffice.bin RET write 8 4720 soffice.bin CALL read(0x24,0xbf4f9aac,0x20) 4720 soffice.bin RET read -1 errno 35 Resource temporarily unavailable 4720 soffice.bin CALL write(0x2,0xbf4f9404,0xb) 4720 soffice.bin GIO fd 2 wrote 11 bytes "X IO Error " 4720 soffice.bin RET write 11/0xb 4720 soffice.bin CALL exit(0) Sorry, I don't have any coments. ---------------------------------------------------------- 3. firefox (www/firefox) $ ktrace -i /usr/X11R6/bin/firefox http://www.google.co.jp/ I got following kdump's result: (snip) 4862 firefox-bin CALL socket(PF_INET,SOCK_STREAM,0) 4862 firefox-bin RET socket 34/0x22 4862 firefox-bin CALL fcntl(0x22,F_GETFL,0xbf8fdb38) 4862 firefox-bin RET fcntl 2 4862 firefox-bin CALL fcntl(0x22,F_SETFL,O_RDWR|O_NONBLOCK) 4862 firefox-bin RET fcntl 0 4862 firefox-bin CALL connect(0x22,0xbf8fdc50,0x10) 4862 firefox-bin RET connect -1 errno 36 Operation now in progress 4862 firefox-bin CALL write(0x9,0x68284d2d,0x1) 4862 firefox-bin GIO fd 9 wrote 1 byte "8" 4862 firefox-bin RET write 1 4862 firefox-bin CALL close(0x22) 4862 firefox-bin RET close 0 (snip) 4862 firefox-bin CALL socket(PF_INET,SOCK_STREAM,0) 4862 firefox-bin RET socket 35/0x23 4862 firefox-bin CALL fcntl(0x23,F_GETFL,0xbf8fdb38) 4862 firefox-bin RET fcntl 2 4862 firefox-bin CALL fcntl(0x23,F_SETFL,O_RDWR|O_NONBLOCK) 4862 firefox-bin RET fcntl 0 4862 firefox-bin CALL connect(0x23,0xbf8fdc50,0x10) 4862 firefox-bin RET connect -1 errno 36 Operation now in progress 4862 firefox-bin CALL write(0x9,0x68284d2d,0x1) 4862 firefox-bin GIO fd 9 wrote 1 byte "8" 4862 firefox-bin RET write 1 4862 firefox-bin CALL close(0x23) 4862 firefox-bin RET close 0 (snip) I got following result of 'tcpdump -pn not port ssh'. 23:10:32.340676 IP 219.127.74.122.54603 > 66.249.89.104.80: S 3969568294:3969568294(0) win 65535 23:10:32.358419 IP 66.249.89.104.80 > 219.127.74.122.54603: S 2563066654:2563066654(0) ack 3969568295 win 8190 23:10:32.358486 IP 219.127.74.122.54603 > 66.249.89.104.80: R 3969568295:3969568295(0) win 0 23:10:32.823420 IP 219.127.74.122.54748 > 63.245.209.21.80: S 3885439161:3885439161(0) win 65535 23:10:32.970835 IP 63.245.209.21.80 > 219.127.74.122.54748: S 2028454746:2028454746(0) ack 3885439162 win 8190 23:10:32.970930 IP 219.127.74.122.54748 > 63.245.209.21.80: R 3885439162:3885439162(0) win 0 According to tcpdump, this is two times connect(2) failure are matched. 219.127.74.122:54603 -> 66.249.89.104:80 SYN 219.127.74.122:54603 <- 66.249.89.104:80 SYN+ACK 219.127.74.122:54603 -> 66.249.89.104:80 RSET 219.127.74.122.54748 -> 63.245.209.21.80 SYN 219.127.74.122.54748 <- 63.245.209.21.80 SYN+ACK 219.127.74.122.54748 -> 63.245.209.21.80 RSET I know why connect(2) is falure, but I don't know why close(2).