Date: Sat, 31 Aug 2002 14:14:50 -0300 (ADT)
From: "Marc G. Fournier" <scrappy@hub.org>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Arnvid Karstad <arnvid@karstad.org>, <bmah@FreeBSD.ORG>, <freebsd-stable@FreeBSD.ORG>
Subject: Re: Problems with FreeBSD - causing zalloc to return 0 ?!
Message-ID: <20020831140602.O14642-101000@mail1.hub.org>
In-Reply-To: <20020831014427.X14642-100000@mail1.hub.org>
[-- Attachment #1 --]
Morning ... as I said, shouldn't take long :)
I set up the script to log at 1 second intervals, to a log.`date +%S` file,
hoping that you only wanted the last few seconds of data? If not, I can
set it up differently, but here are the results from the last minute of its
apparent existence ...
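(For reference, the loop is roughly the following; the log.* naming and
one-file-per-second layout are just what I picked, so adjust to taste:)

        #!/bin/csh
        # one snapshot file per second; date +%S cycles 00-59, so only
        # the newest ~60 seconds of snapshots are kept around
        while (1)
                set f = log.`date +%S`
                date > $f
                vmstat -m >> $f
                vmstat -z >> $f
                netstat -m >> $f
                sleep 1
        end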
First, the kernel panic:
Aug 31 02:02:53 jupiter main[9361]: [1] ERROR: Cannot insert a duplicate key into unique index trackip_idx
Aug 31 03:56:53 jupiter /kernel:
Aug 31 03:56:53 jupiter /kernel:
Aug 31 03:56:53 jupiter /kernel: Fatal trap 12: page fault while in kernel mode
Aug 31 03:56:53 jupiter /kernel: mp_lock = 01000002; cpuid = 1; lapic.id = 01000000
Aug 31 03:56:53 jupiter /kernel: fault virtual address = 0x10713
Aug 31 03:56:53 jupiter /kernel: fault code = supervisor read, page not present
Aug 31 03:56:53 jupiter /kernel: instruction pointer = 0x8:0xc01e6bab
Aug 31 03:56:53 jupiter /kernel: stack pointer = 0x10:0xf9422dbc
Aug 31 03:56:53 jupiter /kernel: frame pointer = 0x10:0xf9422dc4
Aug 31 03:56:53 jupiter /kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Aug 31 03:56:53 jupiter /kernel: = DPL 0, pres 1, def32 1, gran 1
Aug 31 03:56:53 jupiter /kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Aug 31 03:56:53 jupiter /kernel: current process = 9404 (sh)
Aug 31 03:56:53 jupiter /kernel: interrupt mask = none <- SMP: XXX
Aug 31 03:56:53 jupiter /kernel: trap number = 12
Aug 31 03:56:53 jupiter /kernel: panic: page fault
Aug 31 03:56:53 jupiter /kernel: mp_lock = 01000002; cpuid = 1; lapic.id = 01000000
Aug 31 03:56:53 jupiter /kernel: boot() called on cpu#1
Aug 31 03:56:53 jupiter /kernel:
Aug 31 03:56:53 jupiter /kernel: syncing disks... Copyright (c) 1992-2002 The FreeBSD Project.
jupiter# nm -n /kernel | grep c01e6bab
jupiter# nm -n /kernel | grep c01e6ba
jupiter# nm -n /kernel | grep c01e6b
c01e6b24 t vm_map_entry_create
c01e6b68 T vm_map_lookup_entry
c01e6bd4 T vm_map_insert
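(Same lookup in one shot, if it's useful; since nm -n emits fixed-width
lowercase hex, plain string comparison in awk is enough to bracket the
address against the sorted symbol table:)

        #!/bin/csh
        # print the last symbol at or below the faulting EIP
        set addr = c01e6bab
        nm -n /kernel | awk -v a=$addr '$1 <= a { s = $1 " " $3 } $1 > a { exit } END { print s }'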
So the faulting instruction pointer (0xc01e6bab) falls between
vm_map_lookup_entry (0xc01e6b68) and vm_map_insert (0xc01e6bd4), which puts
the crash inside vm_map_lookup_entry. You'll also notice that the timestamp
on the last 'logfile' corresponds to the last entry in the syslog output
*before* the panic itself, which appears to happen almost 2 hours later?
I've attached a tar file of the previous 60 stat files to this email, in
case those help at all?
Also note that this kernel is slightly older than the one on venus, since I
was trying to concentrate on one machine before mucking any further with the
other. If you think there have been commits since this date that might be
relevant, just say the word and I'll upgrade and re-run the tests:
jupiter# uname -a
FreeBSD jupiter.hub.org 4.6-STABLE FreeBSD 4.6-STABLE #15: Fri Aug 16 22:15:03 CDT 2002 root@jupiter.hub.org:/usr/obj/usr/src/sys/kernel i386
Sat Aug 31 02:02:53 CDT 2002
Memory statistics by bucket size
Size In Use Free Requests HighWater Couldfree
16 1105 943 1544966 1280 0
32 5978 6310 670444 640 350
64 118659 61 3631368 320 2984
128 5664 3584 773146 160 84355
256 89289 7 943543 80 73
512 698 22 53234 40 1206
1K 386 842 297419 20 60124
2K 53 327 3939 10 2288
4K 8 2 7484 5 0
8K 4 4 852 5 0
16K 9 0 9 5 0
32K 6 0 1310 5 0
64K 1 0 1 5 0
128K 4 0 4 5 0
256K 1 0 2 5 0
512K 6 0 6 5 0
Memory usage type by bucket size
Size Type(s)
16 NFSV3 srvdesc, p1003.1b, routetbl, ether_multi, BPF, vnodes, mount,
pcb, soname, shm, rman, bus, sysctloid, sysctl, kld, temp, devbuf,
atexit, proc-args
32 linux, atkbddev, dirrem, mkdir, diradd, freefile, freefrag, indirdep,
bmsafemap, newblk, tseg_qent, in_multi, routetbl, ether_multi,
ifaddr, BPF, vnodes, cluster_save buffer, pcb, soname, taskqueue,
SWAP, eventhandler, bus, sysctl, uidinfo, subproc, pgrp, kld, temp,
devbuf, proc-args, sigio
64 NFS daemon, NFS req, isadev, allocindir, allocdirect, pagedep,
in6_multi, routetbl, ether_multi, ifaddr, vnodes,
cluster_save buffer, vfscache, pcb, soname, rman, eventhandler, bus,
sysctloid, subproc, session, ip6ndp, temp, devbuf, lockf, proc-args,
file
128 ZONE, dev_t, freeblks, inodedep, routetbl, BPF, vnodes, mount,
vfscache, timecounter, soname, ttys, iov, bus, cred, kld, ip6ndp,
temp, devbuf, zombie, proc-args
256 NFSV3 srvdesc, NFS daemon, NFS srvsock, FFS node, newblk, routetbl,
faith, ifaddr, vnodes, vfscache, ttys, bus, subproc, temp, devbuf,
proc-args, kqueue, file desc
512 NFS daemon, UFS mount, ifaddr, mount, BIO buffer, ptys, msg,
ioctlops, bus, ip6ndp, temp, devbuf, prison, file desc
1K NQNFS Lease, BIO buffer, shm, ioctlops, bus, uidinfo, temp, kqueue,
file desc
2K UFS mount, ifaddr, BIO buffer, pcb, bus, kld, temp, devbuf,
file desc
4K memdesc, sem, msg, bus, proc, kld, temp, devbuf, file desc
8K UFS mount, indirdep, syncache, temp, file desc
16K shm, msg, kld, devbuf
32K mbuf, UFS mount, BPF, devbuf
64K sem
128K pagedep, sem, kld, temp
256K VM pgdata, SWAP
512K NFS hash, UFS ihash, inodedep, vfscache, SWAP, kld
Memory statistics by type
Type InUse MemUse HighUse Limit Requests Type-Limit Kern-Limit Size(s)
linux 7 1K 1K 102400K 7 0 0 32
NFS hash 1 512K 512K 102400K 1 0 0 512K
NQNFS Lease 1 1K 1K 102400K 1 0 0 1K
NFSV3 srvdesc 0 0K 2K 102400K 278150 0 0 16,256
NFS daemon 69 7K 7K 102400K 69 0 0 64,256,512
NFS req 0 0K 2K 102400K 138904 0 0 64
NFS srvsock 1 1K 1K 102400K 1 0 0 256
atkbddev 2 1K 1K 102400K 2 0 0 32
memdesc 1 4K 4K 102400K 1 0 0 4K
mbuf 1 24K 24K 102400K 1 0 0 32K
isadev 6 1K 1K 102400K 6 0 0 64
ZONE 15 2K 2K 102400K 15 0 0 128
VM pgdata 1 256K 256K 102400K 1 0 0 256K
dev_t 1209 152K 152K 102400K 1209 0 0 128
UFS mount 15 59K 59K 102400K 15 0 0 512,2K,8K,32K
UFS ihash 1 512K 512K 102400K 1 0 0 512K
FFS node 83862 20966K 20966K 102400K 549687 0 0 256
dirrem 3 1K 46K 102400K 39169 0 0 32
mkdir 0 0K 1K 102400K 380 0 0 32
diradd 42 2K 48K 102400K 39689 0 0 32
freefile 1 1K 38K 102400K 17913 0 0 32
freeblks 14 2K 146K 102400K 25171 0 0 128
freefrag 1 1K 20K 102400K 26209 0 0 32
allocindir 0 0K 170K 102400K 214872 0 0 64
indirdep 3 1K 41K 102400K 1638 0 0 32,8K
allocdirect 34 3K 13K 102400K 62646 0 0 64
bmsafemap 12 1K 6K 102400K 25154 0 0 32
newblk 1 1K 1K 102400K 277519 0 0 32,256
inodedep 73 521K 851K 102400K 60847 0 0 128,512K
pagedep 37 131K 166K 102400K 21968 0 0 64,128K
p1003.1b 1 1K 1K 102400K 1 0 0 16
in6_multi 6 1K 1K 102400K 6 0 0 64
syncache 1 8K 8K 102400K 1 0 0 8K
tseg_qent 0 0K 1K 102400K 110 0 0 32
in_multi 2 1K 1K 102400K 2 0 0 32
routetbl 1042 158K 1804K 102400K 81831 0 0 16,32,64,128,256
faith 1 1K 1K 102400K 1 0 0 256
ether_multi 28 2K 2K 102400K 28 0 0 16,32,64
ifaddr 187 47K 47K 102400K 187 0 0 32,64,256,512,2K
BPF 8 65K 129K 102400K 1968 0 0 16,32,128,32K
vnodes 114 7K 7K 102400K 355 0 0 16,32,64,128,256
mount 291 146K 146K 102400K 293 0 0 16,128,512
cluster_save buffer 0 0K 1K 102400K 3030 0 0 32,64
vfscache 94761 6947K 6947K 102400K 647062 0 0 64,128,256,512K
BIO buffer 119 157K 1433K 102400K 19293 0 0 512,1K,2K
timecounter 5 1K 1K 102400K 5 0 0 128
pcb 264 9K 24K 102400K 107022 0 0 16,32,64,2K
soname 1180 102K 108K 102400K 1305781 0 0 16,32,64,128
ptys 3 2K 2K 102400K 3 0 0 512
ttys 168 22K 26K 102400K 675 0 0 128,256
shm 71 74K 76K 102400K 2176 0 0 16,1K,16K
sem 3 164K 164K 102400K 3 0 0 4K,64K,128K
msg 4 25K 25K 102400K 4 0 0 512,4K,16K
rman 44 3K 3K 102400K 399 0 0 16,64
iov 0 0K 1K 102400K 170 0 0 128
ioctlops 0 0K 1K 102400K 15 0 0 512,1K
taskqueue 1 1K 1K 102400K 1 0 0 32
SWAP 2 1097K 1237K 102400K 4 0 0 32,256K,512K
eventhandler 13 1K 1K 102400K 13 0 0 32,64
bus 320 31K 31K 102400K 643 0 0 16,32,64,128,256,512,1K,2K,4K
sysctloid 20 1K 1K 102400K 20 0 0 16,64
sysctl 0 0K 1K 102400K 45810 0 0 16,32
uidinfo 12 2K 2K 102400K 146 0 0 32,1K
cred 2783 348K 350K 102400K 304626 0 0 128
subproc 7429 525K 537K 102400K 226466 0 0 32,64,256
proc 2 8K 8K 102400K 2 0 0 4K
session 1067 67K 71K 102400K 6355 0 0 64
pgrp 1070 34K 36K 102400K 6418 0 0 32
kld 16 453K 459K 102400K 69 0 0 16,32,128,2K,4K,16K,128K,512K
ip6ndp 3 1K 1K 102400K 4 0 0 64,128,512
temp 120 78K 93K 102400K 487840 0 0 16,32,64,128,256,512,1K,2K,4K,8K,128K
devbuf 82 130K 130K 102400K 146 0 0 16,32,64,128,256,512,2K,4K,16K,32K
lockf 363 23K 25K 102400K 219975 0 0 64
prison 141 71K 71K 102400K 144 0 0 512
atexit 1 1K 1K 102400K 1 0 0 16
zombie 0 0K 27K 102400K 105491 0 0 128
proc-args 2509 129K 129K 102400K 623644 0 0 16,32,64,128,256
kqueue 233 230K 310K 102400K 87154 0 0 256,1K
sigio 28 1K 1K 102400K 167 0 0 32
file 17767 1111K 1127K 102400K 1741065 0 0 64
file desc 4173 1125K 1140K 102400K 119863 0 0 256,512,1K,2K,4K,8K
Memory Totals: In Use Free Requests
36540K 2213K 7927734
ITEM SIZE LIMIT USED FREE REQUESTS
NFSNODE: 352, 0, 3724, 544, 6112
NFSMOUNT: 544, 0, 144, 10, 144
PIPE: 160, 0, 4688, 412, 74444
SWAPMETA: 160, 509724, 249, 32, 249
unpcb: 160, 0, 1895, 155, 64106
ripcb: 192, 15360, 0, 21, 1
syncache: 160, 15359, 0, 51, 12982
tcpcb: 544, 15360, 1417, 661, 20978
udpcb: 192, 15360, 301, 104, 105893
socket: 192, 15360, 3613, 163, 191118
KNOTE: 64, 0, 4, 188, 100135
VNODE: 192, 0, 87883, 97, 87883
NAMEI: 1024, 0, 28, 108, 3748936
VMSPACE: 192, 0, 3276, 52, 108768
PROC: 416, 0, 3281, 51, 108773
DP fakepg: 64, 0, 0, 0, 0
PV ENTRY: 28, 2887566, 578636, 453547, 26313720
MAP ENTRY: 48, 0, 96358, 2030, 3940163
KMAP ENTRY: 48, 253727, 12559, 283, 97456
MAP: 108, 0, 7, 3, 7
VM OBJECT: 96, 0, 137679, 85, 1639209
141/608/61440 mbufs in use (current/peak/max):
131 mbufs allocated to data
10 mbufs allocated to packet headers
130/326/15360 mbuf clusters in use (current/peak/max)
804 Kbytes allocated to network (1% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
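(If it helps to eyeball one allocation's growth across the snapshots, I can
pull its line out of each file with something along these lines; substitute
whatever zone or malloc-type name looks interesting:)

        #!/bin/csh
        # print one zone's line from each per-second snapshot, oldest first
        foreach f (`ls -rt log.*`)
                printf "%s: " $f
                grep 'KMAP ENTRY' $f
        end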
On Sat, 31 Aug 2002, Marc G. Fournier wrote:
>
> Okay, I've set up this script on both my (un)STABLE servers, and will
> report after the next crash ... which shouldn't take long :)
>
> On Fri, 30 Aug 2002, Matthew Dillon wrote:
>
> >
> > :Is there any definite way of determining this? From what I can find in
> > :the man page, VNODES have to do with the file system and directory lists
> > :... my last two crashes look to be related to the file system itself, so
> > :I have started to watch the VNODE numbers too, but is there some way of
> > :determining what I should raise, and where?
> >
> > If you can get to a DDB> prompt on the crash (kernel config w/ DDB)
> > do this:
> >
> > ddb> print *kernel_vm_end
> >
> > And tell me what you get. Also note the fault address.
> >
> > In order to really track down the cause I need a vmcore and kernel.debug
> > to play with, but barring that you might be able to dump memory
> > statistics to a file like once a second until the machine crashes.
> >
> > while (1)
> > sleep 1
> > date >> stats.log
> > vmstat -m >> stats.log
> > vmstat -z >> stats.log
> > netstat -m >> stats.log
> > fsync stats.log
> > end
> >
> > :I'm running 4Gig of RAM and Dual CPU over here, so having a swap device
> > :large enough to 'dump core' is kinda out of the question, so I can't
> > :provide any more information than I have so far in other messages :(
> >
> > If you can reproduce the crash with less memory you may be able to
> > generate a core. To boot the machine with less memory add a line
> > to your /boot/loader.conf file:
> >
> > hw.physmem="768m"
> >
> > I usually keep such a line in my loader.conf file, commented
> > out (e.g. #hw.physmem=...) until I need it, because I always forget
> > the name of the variable :-)
> >
> > :And, also, should any 'server class' operating system be more graceful
> > :about such things? Some sort of soft limit that triggers it to refuse new
> > :processes or something when it's hit, so that it doesn't actually crash?
> > :Kinda like the NMBCLUSTERS warning/error message when it's set too low?
> >
> > FreeBSD-current will be far more graceful, but FreeBSD-stable is still
> > using algorithms based on circa 1990 system memory capacities. As
> > memory capacities have grown larger than available KVM, the algorithms
> > have been less able to cope with the massive number of resources
> > that can now be cached.
> >
> > -Matt
> >
> >
>
>
[-- Attachment #2: tar of the previous 60 stat files (binary data omitted) --]