Date: Mon, 16 Apr 2012 21:56:44 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Rainer Hurling <rhurlin@gwdg.de>
Cc: matt <sendtomatt@gmail.com>, "O. Hartmann" <ohartman@mail.zedat.fu-berlin.de>, ken@freebsd.org, freebsd-current@freebsd.org, trasz@freebsd.org, "Conrad J. Sabatier" <conrads@cox.net>
Subject: Re: Kernel builds, but crashes at boot (amd64, Revision: 234306)
Message-ID: <20120416185644.GI2358@deviant.kiev.zoral.com.ua>
In-Reply-To: <4F8C5DE1.60200@gwdg.de>
References: <20120415053032.370280f9@cox.net> <4F8BDF13.4060903@mail.zedat.fu-berlin.de> <4F8C2E2B.20408@gmail.com> <20120416145543.GB2358@deviant.kiev.zoral.com.ua> <4F8C45A4.2050407@gwdg.de> <20120416173150.GH2358@deviant.kiev.zoral.com.ua> <4F8C5DE1.60200@gwdg.de>
On Mon, Apr 16, 2012 at 07:58:57PM +0200, Rainer Hurling wrote:
> On 16.04.2012 19:31 (UTC+1), Konstantin Belousov wrote:
> >On Mon, Apr 16, 2012 at 06:15:32PM +0200, Rainer Hurling wrote:
> >>On 16.04.2012 16:55 (UTC+1), Konstantin Belousov wrote:
> >>>On Mon, Apr 16, 2012 at 07:35:23AM -0700, matt wrote:
> >>>>On 04/16/12 01:57, O. Hartmann wrote:
> >>>>>On 04/15/12 12:30, Conrad J. Sabatier wrote:
> >>>>>>Today I'm suddenly unable to boot a newly built kernel without
> >>>>>>crashing right near the end of the device probes, just before the
> >>>>>>system is about to actually come up:
> >>>>>>
> >>>>>>Fatal trap 18: integer divide fault while in kernel mode
> >>>>>>
> >>>>>>Stopped at 0xffffffff803b2646 = g_label_ufs_taste_common+0x36
> >>>>>>divl 0x50(%rcx),%eax
> >>>>>>
> >>>>>>Backtrace lists this chain of calls:
> >>>>>>g_label_ufs_taste_common
> >>>>>>g_label_taste
> >>>>>>g_new_provider_event
> >>>>>>g_run_events
> >>>>>>g_event_procbody
> >>>>>>fork_exit
> >>>>>>fork_trampoline
> >>>>>>
> >>>>>>Whether built with clang or gcc, CUSTOM config or GENERIC, same
> >>>>>>results on rebooting. No idea why this suddenly started happening,
> >>>>>>haven't changed anything at all in my setup.
> >>>>>My recent kernel does the same on two "FreeBSD 10.0-CURRENT #1 r234309:
> >>>>>Sun Apr 15 14:14:11 CEST 2012" boxes. What both boxes have in common is
> >>>>>that they are attached to a Dell UltraSharp U2711 screen which has a
> >>>>>built-in USB/MMC hub. I realized that it was possible to log into my
> >>>>>lab's box remotely when I'm not in the lab, and that usually coincides
> >>>>>with a switched-off screen.
> >>>>>This morning I logged in from home, logged out and got to the office,
> >>>>>switched on the screen - and reboot! I wasn't able to get the system
> >>>>>running again, it always got stuck in a
> >>>>>
> >>>>>Fatal trap 18: integer divide fault while in kernel mode
> >>>>>
> >>>>>Unplugging the screen's USB hub makes the system boot again!
> >>>>>
> >>>>>Following is one of the last logged messages from the kernel; I do not
> >>>>>know whether this is useful in looking for the problem.
> >>>>>
> >>>>>Regards,
> >>>>>Oliver
> >>>>>
> >>>>>Apr 12 15:32:33 telesto kernel: hwpmc:
> >>>>>SOFT/16/64/0x67<INT,USR,SYS,REA,WRI> TSC/1/64/0x20<REA>
> >>>>>IAP/4/48/0x3ff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA,PRC>
> >>>>>IAF/3/48/0x61<INT,REA,WRI>
> >>>>>UCP/8/48/0x3f8<EDG,THR,REA,WRI,INV,QUA,PRC>
> >>>>>UCF/1/48/0x60<REA,WRI>
> >>>>>Apr 12 15:32:33 telesto kernel: uhub1: 4 ports with 4 removable, self
> >>>>>powered
> >>>>>Apr 12 15:32:33 telesto kernel: uhub2: 4 ports with 4 removable, self
> >>>>>powered
> >>>>>Apr 12 15:32:33 telesto kernel: uhub3: 2 ports with 2 removable, self
> >>>>>powered
> >>>>>Apr 12 15:32:33 telesto kernel: uhub0: 2 ports with 2 removable, self
> >>>>>powered
> >>>>>Apr 12 15:32:33 telesto kernel: ugen3.2:<vendor 0x8087> at usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: uhub4:<vendor 0x8087 product 0x0024,
> >>>>>class 9/0, rev 2.00/0.00, addr 2> on usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: ugen0.2:<vendor 0x8087> at usbus0
> >>>>>Apr 12 15:32:33 telesto kernel: uhub5:<vendor 0x8087 product 0x0024,
> >>>>>class 9/0, rev 2.00/0.00, addr 2> on usbus0
> >>>>>Apr 12 15:32:33 telesto kernel: Root mount waiting for: usbus3 usbus0
> >>>>>Apr 12 15:32:33 telesto kernel: uhub5: 6 ports with 6 removable, self
> >>>>>powered
> >>>>>Apr 12 15:32:33 telesto kernel: uhub4: 8 ports with 8 removable, self
> >>>>>powered
> >>>>>Apr 12 15:32:33 telesto kernel: ugen3.3:<Cherry GmbH> at usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: ukbd0:<Cherry GmbH wired keyboard,
> >>>>>class 0/0, rev 2.00/1.11, addr 3> on usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: kbd2 at ukbd0
> >>>>>Apr 12 15:32:33 telesto kernel: uhid0:<Cherry GmbH wired keyboard,
> >>>>>class 0/0, rev 2.00/1.11, addr 3> on usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: Root mount waiting for: usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: ugen3.4:<vendor 0x0424> at usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: uhub6:<vendor 0x0424 product 0x2514,
> >>>>>class 9/0, rev 2.00/0.00, addr 4> on usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: Root mount waiting for: usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: uhub6: 3 ports with 2 removable, self
> >>>>>powered
> >>>>>Apr 12 15:32:33 telesto kernel: ugen3.5:<vendor 0x0424> at usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: uhub7:<vendor 0x0424 product 0x2640,
> >>>>>class 9/0, rev 2.00/0.00, addr 5> on usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: Root mount waiting for: usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: uhub7: 3 ports with 2 removable, self
> >>>>>powered
> >>>>>Apr 12 15:32:33 telesto kernel: Root mount waiting for: usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: ugen3.6:<Generic> at usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: umass0:<Generic Ultra Fast Media
> >>>>>Reader, class 0/0, rev 2.00/1.91, addr 6> on usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: Root mount waiting for: usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: (probe0:umass-sim0:0:0:0): TEST UNIT
> >>>>>READY. CDB: 0 0 0 0 0 0
> >>>>>Apr 12 15:32:33 telesto kernel: (probe0:umass-sim0:0:0:0): CAM status:
> >>>>>SCSI Status Error
> >>>>>Apr 12 15:32:33 telesto kernel: (probe0:umass-sim0:0:0:0): SCSI status:
> >>>>>Check Condition
> >>>>>Apr 12 15:32:33 telesto kernel: (probe0:umass-sim0:0:0:0): SCSI sense:
> >>>>>NOT READY asc:3a,0 (Medium not present)
> >>>>>Apr 12 15:32:33 telesto kernel: da0 at umass-sim0 bus 0 scbus14 target 0
> >>>>>lun 0
> >>>>>Apr 12 15:32:33 telesto kernel: da0:<Generic Ultra HS-SD/MMC 1.91>
> >>>>>Removable Direct Access SCSI-0 device
> >>>>>Apr 12 15:32:33 telesto kernel: da0: 40.000MB/s transfers
> >>>>>Apr 12 15:32:33 telesto kernel: da0: Attempt to query device size
> >>>>>failed: NOT READY, Medium not present
> >>>>>Apr 12 15:32:33 telesto kernel: ugen3.7:<Logitech> at usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: ums0:<Logitech USB Laser Mouse, class
> >>>>>0/0, rev 2.00/56.01, addr 7> on usbus3
> >>>>>Apr 12 15:32:33 telesto kernel: ums0: 8 buttons and [XYZT] coordinates
> >>>>>ID=0
> >>>>>Apr 12 15:32:33 telesto kernel: Trying to mount root from
> >>>>>ufs:/dev/gpt/root [rw]...
> >>>>>Apr 12 15:32:33 telesto kernel: nvidia0:<GeForce GTX 570> on vgapci0
> >>>>>Apr 12 15:32:33 telesto kernel: vgapci0: child nvidia0 requested
> >>>>>pci_enable_io
> >>>>>Apr 12 15:32:33 telesto kernel: vgapci0: child nvidia0 requested
> >>>>>pci_enable_io
> >>>>>Apr 12 15:32:33 telesto kernel: vboxdrv: fAsync=0 offMin=0x2d8
> >>>>>offMax=0x603c
> >>>>>Apr 12 15:32:33 telesto kernel: module_register: module ng_ether already
> >>>>>exists!
> >>>>>Apr 12 15:32:33 telesto kernel: Module ng_ether failed to register: 17
> >>>>>
> >>>>Disconnect "Generic Ultra HS-SD/MMC" device which is presenting
> >>>>da0...same problem here. System will boot if da0 is either not present
> >>>>or has media (I think). In my case it was a different card reader that
> >>>>had no cards in it, which seems to be similar to your case.
> >>>>
> >>>>My guess is that this problem is related to recent changes in da, but I
> >>>>couldn't pinpoint in the diff what's going wrong in a quick look.
> >>>
> >>>So did you try to revert r234177 and/or r233963?
> >>
> >>I just updated my system to r234342, only downgraded
> >>/usr/src/sys/cam/scsi/scsi_da.c to r233746, and now the system is
> >>booting again. So obviously there is something wrong with the newest
> >>patch to scsi_da.c.
> >It is too broad, try to revert exactly one patch and see whether it works.
>
> Sorry for my bad English. I wanted to say that I only reverted exactly
> one patch (file scsi_da.c from 234177 back to 233746 manually). The rest
> is up to r234342.

I see, it is confusing but right. If you reverted only the single file,
namely scsi_da.c, then that file has only a single commit, r234177, in the
range r233746-r234342. So it is definitely trasz' commit.

Edward?