From: "David E. Cross"
To: freebsd-hackers@freebsd.org
Subject: sigh... ypserv bug still very much alive
Date: Mon, 09 Apr 2001 11:40:11 -0400
Message-Id: <200104091540.LAA52639@cs.rpi.edu>

The ypserv bug (the one where ypserv randomly stops responding or just segfaults) is still very much alive. I had to restart it about 11 times over 20 minutes this morning. That's the bad news; the good news is that each time I started it under 'ktrace -i'.

Going back a bit, Matt Dillon suggested that the problem may lie in the SIGCHLD signal handler. I looked at the handler and it does not appear to do anything dangerous at all (just a child_count--;). Is it doing something dangerous that I am just not seeing?

Also, in the last 200 lines of kdump output for every single crash there is the sequence of calls "select(); gettimeofday();". That sequence never appears in the ypserv source code, but it does appear in svc_tcp.c in librpc. So my question is: "ypserv defines its own svc_run, and it handles TCP connections itself, very carefully; how is the svc_tcp.c code getting called at all?" I think the answer to that is the source of the problem. (It should also be noted that in the runs where ypserv hasn't died and I have collected ktrace information, up to 8 gig of it, the "select(); gettimeofday();" sequence is _never_ called.)
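For comparison, here is a minimal sketch of the usual "safe" SIGCHLD pattern in a select()-based server loop. It is not ypserv's code; got_sigchld, handle_pending_sigchld(), install_sigchld_handler() and wait_readable() are hypothetical names. It illustrates two generic hazards worth ruling out. First, even a bare child_count-- in a handler is a read-modify-write on a plain int that can race with the main loop's child_count++ after fork(); C only guarantees that writing a volatile sig_atomic_t from a handler is safe. Second, a SIGCHLD that arrives while the process sits in select() makes select() return -1 with errno set to EINTR, and a loop that treats that as fatal will stop serving. Here the handler only sets a flag, and the counter is adjusted (and children reaped) in normal context.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler, cleared by the main loop (hypothetical name).
 * volatile sig_atomic_t is the only object type C guarantees a
 * handler may safely write. */
static volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    got_sigchld = 1;            /* no counters, no library calls */
}

int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP; /* no SA_RESTART, so select() reports EINTR */
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Reap every exited child without blocking; runs in normal context,
 * so the counter is only ever modified here. */
static void handle_pending_sigchld(int *child_count)
{
    int status;
    if (!got_sigchld)
        return;
    got_sigchld = 0;
    while (waitpid(-1, &status, WNOHANG) > 0)
        (*child_count)--;
}

/* Wait until fd is readable. Returns 1 if readable, 0 on timeout,
 * -1 on a real select() error. EINTR means "a signal arrived",
 * not "give up". */
int wait_readable(int fd, int timeout_ms, int *child_count)
{
    for (;;) {
        handle_pending_sigchld(child_count);

        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);

        /* select() may modify the timeval, so rebuild it on every pass.
         * (Simplification: an EINTR restarts the full timeout.) */
        struct timeval tv;
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        int n = select(fd + 1, &readfds, NULL, NULL, &tv);
        int err = errno;        /* waitpid() below may overwrite errno */
        if (n < 0 && err == EINTR)
            continue;           /* interrupted by a signal: loop, don't bail */
        handle_pending_sigchld(child_count);
        if (n < 0) {
            errno = err;
            return -1;
        }
        return n > 0 && FD_ISSET(fd, &readfds);
    }
}
```

A signal that lands between the flag check and the select() call still waits out one timeout before it is handled; a production loop would close that window with pselect() or a self-pipe. The point of the sketch is only that the handler does nothing but set a flag, and that EINTR from select() is expected rather than fatal.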
One of my ktraces is _very_ small, only 330K from fork()/exec() to SIG_DFL/SEGV, so I am hoping it will provide easily digestible information. I did not include context-switch information in the ktrace, for two reasons:

1) It didn't appear to be useful. Since I did specify -i, it is obvious where the context switches go: to the only thing that could affect anything, the children.
2) It caused ypserv to act strangely: instead of dying, it just got very slow and stopped responding.

Anyone interested in helping me track this one down?

--
David Cross                       | email: crossd@cs.rpi.edu
Lab Director                      | Rm: 308 Lally Hall
Rensselaer Polytechnic Institute, | Ph: 518.276.2860
Department of Computer Science    | Fax: 518.276.4033
I speak only for myself.          | WinNT:Linux::Linux:FreeBSD