From owner-freebsd-hackers Wed May  3 16:53:38 2000
From: Howard Leadmon
Message-Id: <200005032353.TAA49749@account.abs.net>
Subject: Re: Debugging Kernel/System Crashes, can anyone help??
In-Reply-To: <200005031724.KAA63381@apollo.backplane.com> from Matthew Dillon at "May 3, 2000 10:24:36 am"
To: Matthew Dillon
Date: Wed, 3 May 2000 19:53:11 -0400 (EDT)
Cc: Greg Lehey, freebsd-stable@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG

OK, well, as I don't remember which options were in which kernel for the
old crashes, I just set up the machine to generate more crash dumps, and sure
enough it was willing to give me one quickly.. :)

Here is a backtrace done on the dump I got only a few minutes ago. Also please
note that I am currently using a DEC based network card instead of the
EEpro adapter, as I had both sitting around. If it would be better to try
and get the dumps with the EEpro, I am sure it can be arranged. Also note that
I am currently running SMP, but I did remove one CPU and build a non-SMP
kernel to see what happened, and the machine still dies..

Here is the gdb info:

# gdb -k kernel.0 vmcore.0
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
(no debugging symbols found)...
SMP 2 cpus
IdlePTD 3112960
initial pcb at 2815e0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
mp_lock = 00000002; cpuid = 0; lapic.id = 00000000
fault virtual address   = 0x261e930
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc017246b
stack pointer           = 0x10:0xff806f34
frame pointer           = 0x10:0xff806f79
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = net  <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 00000002; cpuid = 0; lapic.id = 00000000
boot() called on cpu#0

syncing disks...

Fatal trap 12: page fault while in kernel mode
mp_lock = 00000003; cpuid = 0; lapic.id = 00000000
fault virtual address   = 0x30
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc01cdca5
stack pointer           = 0x10:0xff806d5c
frame pointer           = 0x10:0xff806d60
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = net bio cam  <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 00000003; cpuid = 0; lapic.id = 00000000
boot() called on cpu#0
Uptime: 4m48s

dumping to dev #ad/0x20001, offset 128
dump ata0: resetting devices .. done
383 382 381 380 379 378 377 376 375 374 373 372 371 370 369 368 367 366 365 364
363 362 361 360 359 358 357 356 355 354 353 352 351 350 349 348 347 346 345 344
343 342 341 340 339 338 337 336 335 334 333 332 331 330 329 328 327 326 325 324
323 322 321 320 319 318 317 316 315 314 313 312 311 310 309 308 307 306 305 304
303 302 301 300 299 298 297 296 295 294 293 292 291 290 289 288 287 286 285 284
283 282 281 280 279 278 277 276 275 274 273 272 271 270 269 268 267 266 265 264
263 262 261 260 259 258 257 256 255 254 253 252 251 250 249 248 247 246 245 244
243 242 241 240 239 238 237 236 235 234 233 232 231 230 229 228 227 226 225 224
223 222 221 220 219 218 217 216 215 214 213 212 211 210 209 208 207 206 205 204
203 202 201 200 199 198 197 196 195 194 193 192 191 190 189 188 187 186 185 184
183 182 181 180 179 178 177 176 175 174 173 172 171 170 169 168 167 166 165 164
163 162 161 160 159 158 157 156 155 154 153 152 151 150 149 148 147 146 145 144
143 142 141 140 139 138 137 136 135 134 133 132 131 130 129 128 127 126 125 124
123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104
103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84
83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64
63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44
43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4
3 2 1 0
---
#0  0xc013abd1 in boot ()
(kgdb) back
#0  0xc013abd1 in boot ()
#1  0xc013af94 in poweroff_wait ()
#2  0xc02283cc in trap_fatal ()
#3  0xc022805d in trap_pfault ()
#4  0xc0227c57 in trap ()
#5  0xc01cdca5 in acquire_lock ()
#6  0xc01d2f7c in softdep_count_dependencies ()
#7  0xc01d624c in ffs_fsync ()
#8  0xc01d4d66 in ffs_sync ()
#9  0xc01673ef in sync ()
#10 0xc013a9b3 in boot ()
#11 0xc013af94 in poweroff_wait ()
#12 0xc02283cc in trap_fatal ()
#13 0xc022805d in trap_pfault ()
#14 0xc0227c57 in trap ()
#15 0xc017246b in bpfioctl ()
#16 0xc01c19 in ?? ()
cannot read proc at 0
(kgdb)

> Judging by your original bug report, Howard, it seems likely that either
> the machine or the network the machine is sitting on is being attacked
> and the machine is running out of some resource (probably network mbufs).
> Increasing the NMBCLUSTERS any more will probably not help.
>
> What you need to do is figure out what kind of attack it is and start
> experimenting with the various kernel config (see LINT) and sysctl
> features to try to stem the attack.
>
> Now, of course the kernel should not be crashing... if you can obtain
> a backtrace from some of your cores it might help us locate the
> problem.
>
>     gunzip vmcore.*.gz kernel.*.gz
>
>     gdb -k kernel.0 vmcore.0
>     back
>
>     gdb -k kernel.1 vmcore.1
>     back
>
> I do not think this is vinum or fxp related. If fxp is getting device
> timeouts it's probably due to the machine or network being attacked.
>
> It's also possible that bad network cabling or a bad switch port
> is to blame.
>
> -Matt

---
Howard Leadmon - howardl@abs.net - http://www.abs.net
ABSnet Internet Services - Phone: 410-361-8160 - FAX: 410-361-8162