Date: Wed, 02 Apr 2003 07:11:26 -0800
From: Terry Lambert <tlambert2@mindspring.com>
To: Dmitry Sivachenko <mitya@cavia.pp.ru>
Cc: hackers@freebsd.org
Subject: Re: Repeated similar panics on -STABLE
Message-ID: <3E8AFD9E.A34213B4@mindspring.com>
References: <20030402134428.GA43549@fling-wing.demos.su>

Dmitry Sivachenko wrote:
> We have three machines under relatively high load. They are running -STABLE
> on the same hardware with 2 processors (and SMP kernel).
> Periodically (approximately once a week) they panic with similar symptoms:

[ ... ]

Panic.

> #18 0xc0162549 in panic (fmt=0xc028e3b9 "%s")
>     at /mnt/se3/releng_4/src/sys/kern/kern_shutdown.c:595
> #19 0xc0251b1a in trap_fatal (frame=0xeb278e04, eva=1558020096)
>     at /mnt/se3/releng_4/src/sys/i386/i386/trap.c:974
> #20 0xc0251775 in trap_pfault (frame=0xeb278e04, usermode=0, eva=1558020096)
>     at /mnt/se3/releng_4/src/sys/i386/i386/trap.c:867
> #21 0xc02512b7 in trap (frame={tf_fs = -1072300008, tf_es = -361627632,
>       tf_ds = 16, tf_edi = -1070989600, tf_esi = -349729108,
>       tf_ebp = -349729176, tf_isp = -349729232, tf_ebx = -1070870564,
>       tf_edx = 1558020096, tf_ecx = 7, tf_eax = 128, tf_trapno = 12,
>       tf_err = 0, tf_eip = -1072309505, tf_cs = 8, tf_eflags = 66054,
>       tf_esp = 0, tf_ss = -349729108})
>     at /mnt/se3/releng_4/src/sys/i386/i386/trap.c:466

Page not present error.

> #22 0xc015daff in malloc (size=72, type=0xc029fee0, flags=0)
>     at /mnt/se3/releng_4/src/sys/kern/kern_malloc.c:243

The source code did not check for a malloc failure here; probably
the kbp list was just refreshed, and while you were calling the
failing malloc, the list was emptied again.

What this generally means is that KVA was exhausted, and the
caller did not expect that.

To work around: don't exhaust the KVA space; probably you have
tuned some kernel parameter way too high.

To fix: at line 243, you need to check if va is NULL; if it is,
you need to check for M_WAITOK, and if it is set, restart the
allocation. This has to be done before the next line, where "va"
is dereferenced.

Maybe something like:

Change:

	va = kbp->kb_next;
	kbp->kb_next = ((struct freelist *)va)->next;

To:

	va = kbp->kb_next;
	if (va == NULL) {
		if (flags & M_NOWAIT) {
			splx(s);
			return ((void *) NULL);
		}
		goto restart;	/* put this label above the "while" */
	}
	kbp->kb_next = ((struct freelist *)va)->next;

Working around the problem is easier (IMO): just change your
tuning parameters to avoid running out of KVA. Probably your
mbufs or mbuf clusters are way too large for your amount of
physical RAM; remember that, except in very special circumstances,
kernel memory is non-pageable.

> #23 0xc015a3fe in exit1 (p=0xea726820, rv=15)
>     at /mnt/se3/releng_4/src/sys/kern/kern_exit.c:166

It was trying to allocate a "zombie" structure.

> #24 0xc0164011 in sigexit (p=0xea726820, sig=15)
>     at /mnt/se3/releng_4/src/sys/kern/kern_sig.c:1503

For a process someone sent a SIGTERM to, to kill it.

> #25 0xc0163d9c in postsig (sig=15)
>     at /mnt/se3/releng_4/src/sys/kern/kern_sig.c:1406
> #26 0xc0251fc5 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
>       tf_edi = 174, tf_esi = 1049187701, tf_ebp = -1077936960,
>       tf_isp = -349728812, tf_ebx = 1, tf_edx = 3, tf_ecx = -1078002496,
>       tf_eax = 3, tf_trapno = 7, tf_err = 2, tf_eip = 672039098, tf_cs = 31,
>       tf_eflags = 659, tf_esp = -1078069180, tf_ss = 47})
>     at /mnt/se3/releng_4/src/sys/i386/i386/trap.c:174

Looks like you caused a floating point exception, and died
when the exit1 failed to create a zombie structure for the
process.

-- Terry
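
For anyone who wants to see the "check va before dereferencing it,
then fail or restart" pattern outside the kernel, here is a minimal
userland sketch. It is not the kern_malloc.c code. Every name
starting with sketch_ or SKETCH_ is hypothetical, the "spare" and
"later" fields merely simulate memory that a refill or a sleep might
make available, and sketch_wait() returns 0 where a real kernel
would keep sleeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for M_NOWAIT; not the kernel's value. */
#define SKETCH_NOWAIT 0x0001

struct freelist {
	struct freelist *next;
};

/* Hypothetical stand-in for a malloc bucket. */
struct sketch_bucket {
	struct freelist *kb_next;	/* head of the free list */
	struct freelist *spare;		/* chunk a refill can supply now */
	struct freelist *later;		/* chunk that appears only after waiting */
};

/* Try to refill the free list; this may add nothing at all. */
static void
sketch_refill(struct sketch_bucket *kbp)
{
	if (kbp->spare != NULL) {
		kbp->spare->next = kbp->kb_next;
		kbp->kb_next = kbp->spare;
		kbp->spare = NULL;
	}
}

/*
 * Simulate sleeping until memory is freed.  Returns 1 if waiting
 * produced something to refill from, 0 if it never will (the sketch
 * gives up at that point instead of sleeping forever).
 */
static int
sketch_wait(struct sketch_bucket *kbp)
{
	if (kbp->later == NULL)
		return (0);
	kbp->spare = kbp->later;
	kbp->later = NULL;
	return (1);
}

void *
sketch_alloc(struct sketch_bucket *kbp, int flags)
{
	struct freelist *va;

	assert(kbp != NULL);
restart:
	if (kbp->kb_next == NULL)
		sketch_refill(kbp);
	va = kbp->kb_next;
	if (va == NULL) {
		/* The list is still empty: never dereference va here. */
		if ((flags & SKETCH_NOWAIT) || !sketch_wait(kbp))
			return (NULL);
		goto restart;
	}
	kbp->kb_next = va->next;
	return (va);
}
```

A caller that passes SKETCH_NOWAIT has to handle a NULL return
itself; a caller that passes 0 gets the restart path, which is the
case the backtrace above hit (flags=0).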