From: Barry Lustig
Date: Tue, 13 Apr 1999 10:44:08 -0400
To: "David E. Cross"
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: ypserv

David,

You need to compile ypserv with ElectricFence to start tracking down this problem. Otherwise, you don't see the problem until long after it really occurs.

I have a network with three 2.x-STABLE machines running ypserv and one 3.x-STABLE machine running ypserv. These machines serve YP to Solaris 2.5.1, 2.6, and 2.7; IRIX 5.3, 6.2, 6.3, 6.4, and 6.5; and OpenStep 4.2 machines. I see dozens of SEGVs from the ypserv processes on the FreeBSD boxes.

Here is what I managed to find. I was waiting for Bill Paul to get back to me on this, but haven't heard from him.
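The reason a debugging allocator helps is that a stray write or a read of freed memory normally succeeds silently and only crashes much later. A tool in the spirit of ElectricFence instead backs each allocation with whole pages and uses the MMU so the bad access faults on the spot. The sketch below is a toy illustration of that idea, not ElectricFence's code; guard_alloc(), guard_free() and touch_after_free() are hypothetical names, and it assumes a POSIX system with mmap(), mprotect() and MAP_ANONYMOUS.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Round a request up to whole pages (hypothetical helper). */
static size_t guard_span(size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	return ((len + page - 1) / page) * page;
}

/* Hypothetical guard allocator: every allocation gets its own pages. */
static void *guard_alloc(size_t len)
{
	void *p = mmap(NULL, guard_span(len), PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* "Free" by revoking all access instead of recycling the memory,
 * so any later use faults at the offending instruction. */
static void guard_free(void *p, size_t len)
{
	int rc = mprotect(p, guard_span(len), PROT_NONE);
	assert(rc == 0);
	(void)rc;
}

/* Fork a child that reads an int after freeing it. Returns the signal
 * that killed the child, 0 if the stale read went unnoticed, or -1 if
 * fork() failed. */
static int touch_after_free(void)
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		int *p = guard_alloc(sizeof *p);
		if (p == NULL)
			_exit(2);
		*p = 42;
		guard_free(p, sizeof *p);
		_exit(*(volatile int *)p);	/* stale read: faults here */
	}
	int status = 0;
	waitpid(pid, &status, 0);
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With an ordinary malloc()/free() pair, the same stale read would usually return 42 and the child would exit normally. With the guard pages, touch_after_free() reports the child was killed by a memory-access signal (SIGSEGV on most systems, SIGBUS on some).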
Here is a stack trace from one of my cores:

(gdb) bt
#0  0x62ef in readtcp (xprt=0x20524fcc, buf=0x2066b05c, len=4000)
    at /usr/src/lib/libc/../libc/rpc/svc_tcp.c:346
#1  0xf103 in fill_input_buf (rstrm=0x204eefbc)
    at /usr/src/lib/libc/../libc/xdr/xdr_rec.c:510
#2  0xf1ed in skip_input_bytes (rstrm=0x204eefbc, cnt=4)
    at /usr/src/lib/libc/../libc/xdr/xdr_rec.c:576
#3  0xefd5 in xdrrec_eof (xdrs=0x204ece58)
    at /usr/src/lib/libc/../libc/xdr/xdr_rec.c:436
#4  0x636d in svctcp_stat (xprt=0x20524fcc)
    at /usr/src/lib/libc/../libc/rpc/svc_tcp.c:381
#5  0x7a0e in svc_getreqset (readfds=0xefbfd924)
    at /usr/src/lib/libc/../libc/rpc/svc.c:480
#6  0x4241 in yp_svc_run () at /usr/src/usr.sbin/ypserv/yp_main.c:144
#7  0x47e4 in main (argc=1, argv=0xefbfd99c)
    at /usr/src/usr.sbin/ypserv/yp_main.c:335

*** By the way, this applies to 3.x-STABLE as well.

If you follow the stack frames up, it is clear that the problem stems from svc_getreqset() in libc/rpc/svc.c (which I've included below). I get the SEGV at the call to SVC_STAT(xprt). Here are the values of xprt, sock and bit:

$31 = (SVCXPRT *) 0x20524fcc
(gdb) p sock
$32 = 0
(gdb) p bit
$33 = 21
(gdb) p *xprt
Error accessing memory address 0x20524fcc: Bad address.

Since the loop indexes xports[sock + bit - 1], xprt should be pointing to xports[20]. Here is xports[20]:

(gdb) p xports[20]
$28 = (SVCXPRT *) 0x0

I think that one of the child processes (or the parent process) may be calling svctcp_destroy(xprt) or svcudp_destroy(xprt) on the slot that the parent process (or a child process) is working with.
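The jump from "sock = 0, bit = 21" to xports[20] comes from the way svc_getreqset() walks the select() mask: ffs() returns a 1-based bit position, so the descriptor is sock + bit - 1 = 0 + 21 - 1 = 20. The sketch below reproduces just that scan on one mask word so the arithmetic can be checked in isolation. ready_fds() is a hypothetical helper, and it uses a 32-bit unsigned int (rather than the u_long in the quoted code) so that ffs(), which takes an int, sees the whole word.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>	/* ffs() */

/* Collect the descriptors whose bits are set in one mask word, scanning
 * the same way svc_getreqset() does: take the lowest set bit with ffs(),
 * map it to base + bit - 1, then clear it. Stores at most max entries in
 * out and returns the total number of ready descriptors found. */
static size_t ready_fds(unsigned int mask, int base, int *out, size_t max)
{
	size_t n = 0;
	int bit;

	assert(out != NULL || max == 0);
	for (; (bit = ffs((int)mask)) != 0; mask ^= 1u << (bit - 1)) {
		if (n < max)
			out[n] = base + bit - 1;
		n++;
	}
	return n;
}
```

For example, a word with only bit 20 set (mask = 1u << 20) at base 0 makes ffs() return 21 and yields descriptor 20, which is exactly the xports[] slot that gdb shows as NULL.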
barry

void
svc_getreqset(readfds)
	fd_set *readfds;
{
	enum xprt_stat stat;
	struct rpc_msg msg;
	int prog_found;
	u_long low_vers;
	u_long high_vers;
	struct svc_req r;
	register SVCXPRT *xprt;
	register u_long mask;
	register int bit;
	register u_long *maskp;
	register int setsize;
	register int sock;
	char cred_area[2*MAX_AUTH_BYTES + RQCRED_SIZE];
	msg.rm_call.cb_cred.oa_base = cred_area;
	msg.rm_call.cb_verf.oa_base = &(cred_area[MAX_AUTH_BYTES]);
	r.rq_clntcred = &(cred_area[2*MAX_AUTH_BYTES]);

	setsize = _rpc_dtablesize();
	maskp = (u_long *)readfds->fds_bits;
	for (sock = 0; sock < setsize; sock += NFDBITS) {
	    for (mask = *maskp++; (bit = ffs(mask)); mask ^= (1 << (bit - 1))) {
		/* sock has input waiting */
		xprt = xports[sock + bit - 1];
		if (xprt == NULL)
			/* But do we control sock? */
			continue;
		/* now receive msgs from xprtprt (support batch calls) */
		do {
			if (SVC_RECV(xprt, &msg)) {

				/* now find the exported program and call it */
				register struct svc_callout *s;
				enum auth_stat why;

				r.rq_xprt = xprt;
				r.rq_prog = msg.rm_call.cb_prog;
				r.rq_vers = msg.rm_call.cb_vers;
				r.rq_proc = msg.rm_call.cb_proc;
				r.rq_cred = msg.rm_call.cb_cred;
				/* first authenticate the message */
				if ((why= _authenticate(&r, &msg)) != AUTH_OK) {
					svcerr_auth(xprt, why);
					goto call_done;
				}
				/* now match message with a registered service*/
				prog_found = FALSE;
				low_vers = 0 - 1;
				high_vers = 0;
				for (s = svc_head; s != NULL_SVC; s = s->sc_next) {
					if (s->sc_prog == r.rq_prog) {
						if (s->sc_vers == r.rq_vers) {
							(*s->sc_dispatch)(&r, xprt);
							goto call_done;
						}  /* found correct version */
						prog_found = TRUE;
						if (s->sc_vers < low_vers)
							low_vers = s->sc_vers;
						if (s->sc_vers > high_vers)
							high_vers = s->sc_vers;
					}   /* found correct program */
				}
				/*
				 * if we got here, the program or version
				 * is not served ...
				 */
				if (prog_found)
					svcerr_progvers(xprt,
					    low_vers, high_vers);
				else
					 svcerr_noprog(xprt);
				/* Fall through to ... */
			}
		call_done:
			if ((stat = SVC_STAT(xprt)) == XPRT_DIED){
				SVC_DESTROY(xprt);
				break;
			}
		} while (stat == XPRT_MOREREQS);
	    }
	}
}

static void
svctcp_destroy(xprt)
	register SVCXPRT *xprt;
{
	register struct tcp_conn *cd = (struct tcp_conn *)xprt->xp_p1;

	xprt_unregister(xprt);
	(void)close(xprt->xp_sock);
	if (xprt->xp_port != 0) {
		/* a rendezvouser socket */
		xprt->xp_port = 0;
	} else {
		/* an actual connection socket */
		XDR_DESTROY(&(cd->xdrs));
	}
	mem_free((caddr_t)cd, sizeof(struct tcp_conn));
	mem_free((caddr_t)xprt, sizeof(SVCXPRT));
}

static void
svcudp_destroy(xprt)
	register SVCXPRT *xprt;
{
	register struct svcudp_data *su = su_data(xprt);

	xprt_unregister(xprt);
	(void)close(xprt->xp_sock);
	XDR_DESTROY(&(su->su_xdrs));
	mem_free(rpc_buffer(xprt), su->su_iosz);
	mem_free((caddr_t)su, sizeof(struct svcudp_data));
	mem_free((caddr_t)xprt, sizeof(SVCXPRT));
}
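Both destroy routines call xprt_unregister(), which clears the transport's xports[] slot, and then free the SVCXPRT itself. So if the transport gets destroyed while svc_getreqset() still holds xprt, the later SVC_STAT(xprt) dereferences freed memory, which would be consistent with a NULL xports[20] next to a bad xprt pointer. The toy model below shows that hazard in a single process, plus one possible defensive check: re-reading the table slot before touching the saved pointer again. Everything in it (struct xprt, table, xprt_create(), xprt_destroy(), dispatch(), serve()) is hypothetical and only mimics the shape of the libc code; it is a sketch under those assumptions, not the libc fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_FDS 64

/* Hypothetical stand-in for SVCXPRT: just enough state for the sketch. */
struct xprt {
	int sock;
	int pending;			/* requests still buffered */
};

static struct xprt *table[MAX_FDS];	/* plays the role of xports[] */

static struct xprt *xprt_create(int sock, int pending)
{
	struct xprt *x = malloc(sizeof *x);
	assert(x != NULL && sock >= 0 && sock < MAX_FDS);
	x->sock = sock;
	x->pending = pending;
	table[sock] = x;		/* like xprt_register() */
	return x;
}

/* Like svctcp_destroy(): clear the slot, then free the transport. */
static void xprt_destroy(struct xprt *x)
{
	table[x->sock] = NULL;		/* like xprt_unregister() */
	free(x);
}

/* A handler that may tear down its own transport, e.g. on an error. */
static void dispatch(struct xprt *x, bool fatal)
{
	x->pending--;
	if (fatal)
		xprt_destroy(x);
}

/* Serve one socket. After dispatch() returns, check the table slot
 * before touching x again: if the slot was cleared, x has been freed
 * and must not be dereferenced. Returns the number of requests handled. */
static int serve(int sock, bool fatal_on_first)
{
	int handled = 0;
	struct xprt *x = table[sock];

	while (x != NULL && x->pending > 0) {
		dispatch(x, fatal_on_first && handled == 0);
		handled++;
		if (table[sock] == NULL)	/* destroyed underneath us */
			break;
	}
	return handled;
}
```

Without the table check, the loop condition would read x->pending from freed memory after a fatal dispatch, which is the same pattern as calling SVC_STAT(xprt) on a destroyed transport. Note the check only reads table[], never the stale pointer itself.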