From owner-freebsd-current@FreeBSD.ORG Sun Aug 19 21:53:55 2007 Return-Path: Delivered-To: current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CE2BD16A418; Sun, 19 Aug 2007 21:53:55 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id A7E2413C428; Sun, 19 Aug 2007 21:53:55 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.103] (c-67-160-44-208.hsd1.wa.comcast.net [67.160.44.208]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id l7JLrqQY088531 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES256-SHA bits=256 verify=NO); Sun, 19 Aug 2007 17:53:54 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Sun, 19 Aug 2007 14:56:38 -0700 (PDT) From: Jeff Roberson X-X-Sender: jroberson@10.0.0.1 To: Robert Watson In-Reply-To: <20070819144903.U568@10.0.0.1> Message-ID: <20070819145532.O568@10.0.0.1> References: <20070819214953.I1099@fledge.watson.org> <20070819144903.U568@10.0.0.1> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: attilio@freebsd.org, dds@freebsd.org, current@freebsd.org Subject: Re: Crash in accounting code: encode_long(), due to bad rusage data? X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Aug 2007 21:53:56 -0000 On Sun, 19 Aug 2007, Jeff Roberson wrote: > On Sun, 19 Aug 2007, Robert Watson wrote: > >> >> Diomidis, Jeff, Attilio, >> >> I recently upgraded two servers from FreeBSD 6-STABLE to FreeBSD 7-CURRENT >> in anticipation of the forthcoming release. Both of them run with >> accounting enabled at all times. When a large pine session was exiting on >> one of the two boxes, I ran into the following panic: >> >> panic: encode_long: -ve value -32749 >> KDB: enter: panic >> db> bt >> Tracing pid 81759 tid 100377 td 0xc4a76400 >> kdb_enter(c0a9f844,1,c0a9ab3f,e6937b10,1,...) at kdb_enter+0x32 >> panic(c0a9ab3f,ffff8013,e6937bb0,e6937c14,c0719dc2,...) at panic+0x124 >> encode_long(e6937bb0,e6937bb8,e6937bc0,e6937bb8,0,...) at encode_long+0x1d >> acct_process(c4a76400,c0a9c0c2,162,129,e6937c50,...) at acct_process+0x1c2 >> exit1(c4a76400,0,e6937d2c,c0a08303,c4a76400,...) at exit1+0x957 >> sys_exit(c4a76400,e6937cfc,4,e6937d38,c0b4bed8,...) at sys_exit+0x1d >> syscall(e6937d38) at syscall+0x2b3 >> Xint0x80_syscall() at Xint0x80_syscall+0x20 >> --- syscall (1, FreeBSD ELF32, sys_exit), eip = 0x355f2093, esp = >> 0xbfbfe0ec, ebp = 0xbfbfe108 --- >> >> kgdb was of mixed helpfulness: >> >> #2 0xc048d575 in db_command_loop () at /usr/src/sys/ddb/db_command.c:401 >> #3 0xc048ece5 in db_trap (type=3, code=0) at >> /usr/src/sys/ddb/db_main.c:222 >> #4 0xc0773d56 in kdb_trap (type=3, code=0, tf=0xe6937a90) >> at /usr/src/sys/kern/subr_kdb.c:502 >> #5 0xc0a08aeb in trap (frame=0xe6937a90) at >> /usr/src/sys/i386/i386/trap.c:621 >> #6 0xc09ede3b in calltrap () at /usr/src/sys/i386/i386/exception.s:139 >> #7 0xc0773ed2 in kdb_enter (msg=0xc0a9f844 "panic") at cpufunc.h:60 >> #8 0xc074c5d4 in panic (fmt=0xc0a9ab3f "encode_long: -ve value %ld") >> at /usr/src/sys/kern/kern_shutdown.c:547 >> #9 0xc0719bad in encode_long (val=18) at /usr/src/sys/kern/kern_acct.c:525 >> #10 0xc0719dc2 in acct_process (td=0xc4a76400) >> at /usr/src/sys/kern/kern_acct.c:391 >> #11 0xc072d7e7 in exit1 (td=0xc4a76400, rv=0) >> at /usr/src/sys/kern/kern_exit.c:355 >> #12 0xc072e21d in sys_exit (td=Could not find the frame base for >> "sys_exit". >> ) at /usr/src/sys/kern/kern_exit.c:98 >> #13 0xc0a08303 in syscall (frame=0xe6937d38) >> at /usr/src/sys/i386/i386/trap.c:1008 >> #14 0xc09edea0 in Xint0x80_syscall () >> at /usr/src/sys/i386/i386/exception.s:196 >> #15 0x00000033 in ?? () >> Previous frame inner to this frame (corrupt stack?) >> (kgdb) frame 9 >> #9 0xc0719bad in encode_long (val=18) at /usr/src/sys/kern/kern_acct.c:525 >> 525 KASSERT(val >= 0, ("encode_long: -ve value %ld", val)); >> (kgdb) print val >> $1 = 18 >> (kgdb) list >> 520 encode_long(long val) >> 521 { >> 522 int norm_exp; /* Normalized exponent */ >> 523 int shift; >> 524 >> 525 KASSERT(val >= 0, ("encode_long: -ve value %ld", val)); >> 526 if (val == 0) >> 527 return (0); >> 528 norm_exp = fls(val) - 1; >> 529 shift = FLT_MANT_DIG - norm_exp - 1; >> (kgdb) up >> #10 0xc0719dc2 in acct_process (td=0xc4a76400) >> at /usr/src/sys/kern/kern_acct.c:391 >> 391 acct.ac_mem = encode_long((ru.ru_ixrss + >> ru.ru_idrss > > ru_idrss an integral value of the amount of unshared memory residing > in the data segment of a process (expressed in units of > kilobytes * ticks-of-execution). > > That seems like it's pretty prone to overflow given that this is a 32bit > machine. This may just be an improper assert. I'm shocked that noone has > run into this so far. Sorry, should elaborate, see below. > > Jeff > >> + >> (kgdb) list >> 386 tmp = ut; >> 387 timevaladd(&tmp, &st); >> 388 /* Convert tmp (i.e. u + s) into hz units to match ru_i*. >> */ >> 389 t = tmp.tv_sec * hz + tmp.tv_usec / tick; >> 390 if (t) >> 391 acct.ac_mem = encode_long((ru.ru_ixrss + >> ru.ru_idrss + >> 392 + ru.ru_isrss) / t); >> 393 else >> 394 acct.ac_mem = 0; >> 395 >> (kgdb) inspect ru >> $2 = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, >> tv_usec = 0}, ru_maxrss = 509908, ru_ixrss = 26810576, >> ru_idrss = -2016077424, ru_isrss = 1044992, ru_minflt = 372535, >> ru_majflt = 42, ru_nswap = 0, ru_inblock = 58, ru_oublock = 278, >> ru_msgsnd = 1907, ru_msgrcv = 4671, ru_nsignals = 32, ru_nvcsw = 19892, >> ru_nivcsw = 4614} > Can you tell me what's in t, tmp, hz and tick? Thanks, Jeff > >> I find the large negative value in ru_idrss somewhat sad to contemplate, >> and while this could well be a problem with capturing of process runtime >> information, I'd like it if the accounting code were robust against this >> sort of bug, rather than panicking, and I guess I'd also rather than the >> process runtime information also be correctly captured :-). >> >> I have a usable core dump, and have disabled accounting in the interim. >> >> Robert N M Watson >> Computer Laboratory >> University of Cambridge >> _______________________________________________ >> freebsd-current@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-current >> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" >> >