From owner-freebsd-hackers  Wed May  3 16:53:38 2000
From: Howard Leadmon <howardl@account.abs.net>
Message-Id: <200005032353.TAA49749@account.abs.net>
Subject: Re: Debugging Kernel/System Crashes, can anyone help??
In-Reply-To: <200005031724.KAA63381@apollo.backplane.com> from Matthew Dillon
 at "May 3, 2000 10:24:36 am"
To: Matthew Dillon <dillon@apollo.backplane.com>
Date: Wed, 3 May 2000 19:53:11 -0400 (EDT)
Cc: Greg Lehey <grog@lemis.com>, freebsd-stable@FreeBSD.ORG,
	freebsd-hackers@FreeBSD.ORG


OK, well, as I don't remember which options were in which kernel for the
old crashes, I just set up the machine to generate more crash dumps, and sure
enough it was willing to give me one quickly. :)

Here is a backtrace from the dump I got only a few minutes ago.  Please also
note that I am currently using a DEC-based network card instead of the EEpro
adapter, as I had both sitting around.  If it would be better to get the
dumps with the EEpro, I am sure it can be arranged.  Also note that I am
currently running SMP, but I did remove one CPU and build a non-SMP kernel to
see what happened, and the machine still dies.


Here is the gdb info:

# gdb -k kernel.0 vmcore.0
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
(no debugging symbols found)...
SMP 2 cpus
IdlePTD 3112960
initial pcb at 2815e0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
mp_lock = 00000002; cpuid = 0; lapic.id = 00000000
fault virtual address   = 0x261e930
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc017246b
stack pointer           = 0x10:0xff806f34
frame pointer           = 0x10:0xff806f79
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = net  <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 00000002; cpuid = 0; lapic.id = 00000000
boot() called on cpu#0

syncing disks...

Fatal trap 12: page fault while in kernel mode
mp_lock = 00000003; cpuid = 0; lapic.id = 00000000
fault virtual address   = 0x30
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc01cdca5
stack pointer           = 0x10:0xff806d5c
frame pointer           = 0x10:0xff806d60
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = net bio cam  <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 00000003; cpuid = 0; lapic.id = 00000000
boot() called on cpu#0
Uptime: 4m48s

dumping to dev #ad/0x20001, offset 128
dump ata0: resetting devices .. done
383 382 381 380 379 378 377 376 375 374 373 372 371 370 369 368 367 366 365
364 363 362 361 360 359 358 357 356 355 354 353 352 351 350 349 348 347 346
345 344 343 342 341 340 339 338 337 336 335 334 333 332 331 330 329 328 327
326 325 324 323 322 321 320 319 318 317 316 315 314 313 312 311 310 309 308
307 306 305 304 303 302 301 300 299 298 297 296 295 294 293 292 291 290 289
288 287 286 285 284 283 282 281 280 279 278 277 276 275 274 273 272 271 270
269 268 267 266 265 264 263 262 261 260 259 258 257 256 255 254 253 252 251
250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235 234 233 232
231 230 229 228 227 226 225 224 223 222 221 220 219 218 217 216 215 214 213
212 211 210 209 208 207 206 205 204 203 202 201 200 199 198 197 196 195 194
193 192 191 190 189 188 187 186 185 184 183 182 181 180 179 178 177 176 175
174 173 172 171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156
155 154 153 152 151 150 149 148 147 146 145 144 143 142 141 140 139 138 137
136 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118
117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99
98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74
73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49
48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
---
#0  0xc013abd1 in boot ()
(kgdb) back
#0  0xc013abd1 in boot ()
#1  0xc013af94 in poweroff_wait ()
#2  0xc02283cc in trap_fatal ()
#3  0xc022805d in trap_pfault ()
#4  0xc0227c57 in trap ()
#5  0xc01cdca5 in acquire_lock ()
#6  0xc01d2f7c in softdep_count_dependencies ()
#7  0xc01d624c in ffs_fsync ()
#8  0xc01d4d66 in ffs_sync ()
#9  0xc01673ef in sync ()
#10 0xc013a9b3 in boot ()
#11 0xc013af94 in poweroff_wait ()
#12 0xc02283cc in trap_fatal ()
#13 0xc022805d in trap_pfault ()
#14 0xc0227c57 in trap ()
#15 0xc017246b in bpfioctl ()
#16 0xc01c19 in ?? ()
cannot read proc at 0
(kgdb)
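
For comparing several dumps, the "Fatal trap" reports can be reduced to
their name/value fields. The following is only a minimal sketch:
parse_trap_report is a hypothetical helper (not part of gdb or FreeBSD), and
it assumes each field sits on its own line in plain "name = value" form. The
mp_lock line and the continuation line of "code segment" are deliberately
skipped.

```python
import re

# Hypothetical helper for illustration only.
# Matches lines such as "fault virtual address   = 0x261e930".
# Names made of lowercase letters and spaces only, so "mp_lock = ..." and the
# "= DPL 0, ..." continuation line do not match.
FIELD_RE = re.compile(r"^\s*([a-z][a-z ]*?)\s*=\s*(.+?)\s*$")


def parse_trap_report(text):
    """Return one dict of fields per 'Fatal trap' block found in text."""
    traps = []
    current = None
    for line in text.splitlines():
        if line.startswith("Fatal trap"):
            current = {"summary": line.strip()}
            traps.append(current)
            continue
        if current is None:
            continue
        if line.startswith("panic:"):
            # The panic line closes the current trap report.
            current = None
            continue
        match = FIELD_RE.match(line)
        if match:
            current.setdefault(match.group(1), match.group(2))
    return traps


sample = """Fatal trap 12: page fault while in kernel mode
mp_lock = 00000002; cpuid = 0; lapic.id = 00000000
fault virtual address   = 0x261e930
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc017246b
panic: page fault
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x30
instruction pointer     = 0x8:0xc01cdca5
panic: page fault
"""

for trap in parse_trap_report(sample):
    print(trap["instruction pointer"], trap["fault virtual address"])
```

The instruction pointers it prints are the same addresses that appear in
frames #15 and #5 of the backtrace, which is a quick way to tie each trap
report to its frame.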


>     Judging by your original bug report, Howard, it seems likely that either
>     the machine or the network the machine is sitting on is being attacked
>     and the machine is running out of some resource (probably network mbufs).
>     Increasing the NMBCLUSTERS any more will probably not help.
>
>     What you need to do is figure out what kind of attack it is and start
>     experimenting with the various kernel config (see LINT) and sysctl
>     features to try to stem the attack.
>
>     Now, of course the kernel should not be crashing... if you can obtain
>     a backtrace from some of your cores it might help us locate the
>     problem.
>
>     gunzip vmcore.*.gz kernel.*.gz
>
>     gdb -k kernel.0 vmcore.0
>     back
>
>     gdb -k kernel.1 vmcore.1
>     back
>
>     I do not think this is vinum or fxp related.  If fxp is getting device
>     timeouts it's probably due to the machine or network being attacked.
>
>     It's also possible that bad network cabling or a bad switch port
>     is to blame.
>
> 						-Matt
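
The per-dump steps quoted above could be generated for every dump in the
crash directory. This is a rough sketch under stated assumptions:
backtrace_commands and the bt.gdb command file (containing the single line
"back") are hypothetical names, /var/crash is assumed as the dump directory,
and whether the standard gdb options -batch and -x combine cleanly with
FreeBSD's -k on this gdb 4.18 is untested here. The function only builds the
command lines; it does not run gdb.

```python
import os
import re


def backtrace_commands(dump_dir, script="bt.gdb"):
    """Return one gdb argv per kernel.N/vmcore.N pair found in dump_dir.

    Hypothetical helper: 'script' is assumed to be a gdb command file
    holding the single command 'back'.
    """
    names = set(os.listdir(dump_dir))
    numbers = []
    for name in names:
        match = re.fullmatch(r"vmcore\.(\d+)", name)
        # Only keep dumps whose matching kernel image is also present.
        if match and "kernel." + match.group(1) in names:
            numbers.append(int(match.group(1)))

    commands = []
    for n in sorted(numbers):  # numeric order, so dump 10 comes after dump 2
        commands.append([
            "gdb", "-k", "-batch", "-x", script,
            os.path.join(dump_dir, "kernel.%d" % n),
            os.path.join(dump_dir, "vmcore.%d" % n),
        ])
    return commands


if __name__ == "__main__":
    crash_dir = "/var/crash"  # assumed location of the dumps
    if os.path.isdir(crash_dir):
        for argv in backtrace_commands(crash_dir):
            print(" ".join(argv))
```

Printing the commands instead of executing them leaves room to check each
kernel/vmcore pairing by hand before starting gdb.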


---
Howard Leadmon - howardl@abs.net - http://www.abs.net
ABSnet Internet Services - Phone: 410-361-8160 - FAX: 410-361-8162

