From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 6 08:11:35 2005
Date: Tue, 6 Sep 2005 11:11:33 +0300
From: Tom Alsberg
To: FreeBSD Hackers List
Message-ID: <20050906081133.GA22769@cs.huji.ac.il>

Greetings,

We have a FreeBSD 5.4 server which we are trying to run in production, but we have severe problems with it crashing every few days. When run with DDB, the crashes originally appeared to be filesystem and NFS related. However, I enabled dumps to swap and fired up kgdb on the core, and the crash seems to go through vn_read and ffs_read, giving the impression that it is not NFS related.
I haven't yet figured out how to check which file the pread tried to access, which process it was, and what the process was doing at that time (any pointers to documentation appreciated, but I honestly haven't looked that much yet). Anyway, it seems to be caused by pread somehow.

Can somebody make more of it using the information I have now at hand (stack trace follows)?

This began when I upgraded the server from FreeBSD 4.10 to FreeBSD 5.4. The server is filesystem intensive, mainly NFS (running Samba).

It appears that frames 11 and up are the relevant ones - everything below that is initiated by the trap and DDB itself.

P.S. Are there easy ways to access individual processes (their data, list of open files, running status, IPCs, etc., and perhaps even a stack trace of them) given a kernel core file?

  -- Tom

Follows bt from gdb:

#0  doadump () at pcpu.h:160
#1  0xc046657a in db_fncall (dummy1=0, dummy2=0, dummy3=-1065484837,
    dummy4=0xeb4e1850 "|<...binary crap...>\n")
    at /r+d/5.4/src/sys/ddb/db_command.c:531
#2  0xc0466388 in db_command (last_cmdp=0xc0906664, cmd_table=0x0,
    aux_cmd_tablep=0xc0885e1c, aux_cmd_tablep_end=0xc0885e38)
    at /r+d/5.4/src/sys/ddb/db_command.c:349
#3  0xc0466450 in db_command_loop ()
    at /r+d/5.4/src/sys/ddb/db_command.c:455
#4  0xc0467fe9 in db_trap (type=12, code=0)
    at /r+d/5.4/src/sys/ddb/db_main.c:221
#5  0xc0646483 in kdb_trap (type=12, code=0, tf=0x1)
    at /r+d/5.4/src/sys/kern/subr_kdb.c:470
#6  0xc07fafc5 in trap_fatal (frame=0xeb4e19e4, eva=28)
    at /r+d/5.4/src/sys/i386/i386/trap.c:812
#7  0xc07fad23 in trap_pfault (frame=0xeb4e19e4, usermode=0, eva=28)
    at /r+d/5.4/src/sys/i386/i386/trap.c:735
#8  0xc07fa939 in trap (frame=
      {tf_fs = -1067319272, tf_es = -699793392, tf_ds = 1048592,
       tf_edi = -699757236, tf_esi = -699757236, tf_ebp = -347203024,
       tf_isp = -347203056, tf_ebx = -699757236, tf_edx = 0,
       tf_ecx = -1024473216, tf_eax = 4, tf_trapno = 12, tf_err = 2,
       tf_eip = -1066976993, tf_cs = 8, tf_eflags = 66050,
       tf_esp = -699757236, tf_ss = -699757236})
    at /r+d/5.4/src/sys/i386/i386/trap.c:425
#9  0xc07e890a in calltrap ()
    at /r+d/5.4/src/sys/i386/i386/exception.s:140
#10 0xc0620018 in linker_load_file (filename=0xd64a8d4c "\002", result=0x1)
    at /r+d/5.4/src/sys/kern/kern_linker.c:327
#11 0xc0674176 in getnewbuf (slpflag=0, slptimeo=0, size=16384,
    maxsize=16384) at /r+d/5.4/src/sys/kern/vfs_bio.c:1885
#12 0xc06755fd in getblk (vp=0xc3242318, blkno=19, size=16384, slpflag=0,
    slptimeo=0, flags=0) at /r+d/5.4/src/sys/kern/vfs_bio.c:2585
#13 0xc0679b32 in cluster_read (vp=0xc3242318, filesize=1302528, lblkno=19,
    size=16384, cred=0x0, totread=32768, seqcount=0, bpp=0x0)
    at /r+d/5.4/src/sys/kern/vfs_cluster.c:117
#14 0xc076ed72 in ffs_read (ap=0x0)
    at /r+d/5.4/src/sys/ufs/ffs/ffs_vnops.c:462
#15 0xc068ed9c in vn_read (fp=0xc3be1088, uio=0xeb4e1cbc,
    active_cred=0xc2d55800, flags=1, td=0xc2efc780) at vnode_if.h:398
#16 0xc064f4d5 in dofileread (td=0xc2efc780, fd=61, fp=0xc3be1088,
    auio=0xeb4e1cbc, offset=Unhandled dwarf expression opcode 0x93
    ) at file.h:233
#17 0xc064f435 in kern_preadv (td=0xc2efc780, fd=61, auio=0xeb4e1cbc,
    offset=319488) at /r+d/5.4/src/sys/kern/sys_generic.c:242
#18 0xc064f2e3 in pread (td=0xc2efc780, uap=0x0)
    at /r+d/5.4/src/sys/kern/sys_generic.c:151
#19 0xc07fb333 in syscall (frame=
      {tf_fs = 47, tf_es = 131119, tf_ds = 137691183, tf_edi = 0,
       tf_esi = 319488, tf_ebp = -1077959384, tf_isp = -347202204,
       tf_ebx = -2008021396, tf_edx = 137863231, tf_ecx = 61,
       tf_eax = 198, tf_trapno = 0, tf_err = 2, tf_eip = -2008518601,
       tf_cs = 31, tf_eflags = 646, tf_esp = -1077959428, tf_ss = 47})
    at /r+d/5.4/src/sys/i386/i386/trap.c:1009
#20 0xc07e895f in Xint0x80_syscall ()
    at /r+d/5.4/src/sys/i386/i386/exception.s:201

-- 
Tom Alsberg - hacker (being the best description fitting this space)
Web page: http://www.cs.huji.ac.il/~alsbergt/
DISCLAIMER: The above message does not even necessarily represent what
my fingers have typed on the keyboard, save anything further.
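
A note on reading the trap frame in frame #8: gdb prints the saved registers as signed 32-bit integers, which hides the addresses. The following is only a rough sketch (in Python, not part of the original report) of how those values might be decoded. The helper names `to_u32` and `decode_pf_err` are made up for illustration; the bit meanings are the standard i386 page-fault error code bits, and the input numbers are copied from frames #6 to #8 above.

```python
def to_u32(value):
    """Reinterpret a signed 32-bit integer (as printed by gdb) as unsigned."""
    return value & 0xFFFFFFFF


def decode_pf_err(err):
    """Decode the low three bits of an i386 page-fault error code.

    bit 0: 0 = page not present, 1 = protection violation
    bit 1: 0 = read access,      1 = write access
    bit 2: 0 = supervisor mode,  1 = user mode
    """
    return {
        "present": bool(err & 0x1),
        "write": bool(err & 0x2),
        "user": bool(err & 0x4),
    }


# Values copied from frames #6-#8 of the backtrace above.
tf_eip = -1066976993   # saved instruction pointer at the fault
tf_ebp = -347203024    # saved frame pointer at the fault
tf_err = 2             # page-fault error code
eva = 28               # faulting virtual address (from trap_pfault)

if __name__ == "__main__":
    print("faulting eip: 0x%08x" % to_u32(tf_eip))
    print("saved ebp:    0x%08x" % to_u32(tf_ebp))
    print("fault addr:   0x%x" % eva)
    print("error code:  ", decode_pf_err(tf_err))
```

Under these assumptions the faulting instruction is at 0xc067391f, and the error code describes a kernel-mode write to a page that is not present, at address 0x1c. An address that low often points to a store through a NULL structure pointer, but that is a guess until the source line is checked, for example with `list *0xc067391f` in kgdb.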