Date: Mon, 3 Feb 2014 17:00:41 -0500
From: Garrett Wollman <wollman@csail.mit.edu>
To: "Kenneth D. Merry" <ken@freebsd.org>
Cc: freebsd-scsi@freebsd.org, scottl@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: Heap overflow in mps(4) (was: Re: stable/9 mps(4) rev 254938 == BOOM!)
Message-ID: <21232.4489.544435.898780@khavrinen.csail.mit.edu>
In-Reply-To: <20140131003342.GA11755@nargothrond.kdm.org>
References: <21225.19508.683025.581620@khavrinen.csail.mit.edu>
 <201401292137.s0TLbD5G006716@hergotha.csail.mit.edu>
 <20140129221514.GA47535@nargothrond.kdm.org>
 <21225.38749.179621.454579@khavrinen.csail.mit.edu>
 <20140131003342.GA11755@nargothrond.kdm.org>
<<On Thu, 30 Jan 2014 17:33:42 -0700, "Kenneth D. Merry" <ken@freebsd.org> said:

> The attached patch should fix the leaked allocations. I'm CCing Steve and
> Kashyap at LSI so that they can verify that this is the right place to do
> the mapping shutdown.

It does fix the leak.

> I don't know yet why that particular change is causing problems. Perhaps
> it just moved things around and exposed an existing problem.

> The fact that the redzone code doesn't expose any problems makes it more
> likely that it is a problem other than a heap overflow.

> Since it is consistent, is there any chance you could hook up remote gdb to
> the box and poke around when it crashes? Perhaps you'll see something
> interesting that will point to the problem.

No way to do a remote GDB, unfortunately. However, I tried a few
other things:

- It makes no difference whether mps.ko is preloaded or loaded in
  single-user mode.

- If I boot a kernel/modules without redzone, loading mps.ko
  instapanics, in a very different place (apologies for the poor
  transcription; I can either be up in the machine room to plug in USB
  sticks or use the serial console, not both):

--- trap 0xc, rip = 0xffff....f807e934a, rsp = 0xff...94da4c48f0, rbp = 0xff...94da4c4950 ---
bzero() at bzero+0xa/frame 0xff...94da4c4af0
mpssas_add_device() at mpssas_add_device+0x78/frame 0xff..94da4c4af0
mpssas_firmware_event_work() at mpssas_firmware_event_work+0x437/frame 0xff....94da4c4b78
taskqueue_run_locked() at taskqueue_run_locked+0x74/frame 0xff..94da4c4bc0
taskqueue_thread_loop() at taskqueue_thread_loop+0x46/frame 0xff..94da4c4be0

Inspection of the code does not reveal any arc from mpssas_add_device
to bzero. The return address in the frame is the location of the
first function call (to mpssas_startup_increment()) in
mpssas_add_device(). So I think it's fair to say that something is
scribbling over memory in quite a bad way.
Two things that may be relevant: on boot, this server's MPT2 BIOS
always complains "adapter configuration may have changed", and I
haven't discovered anything in the configuration utility that changes
this. Also, on boot, I always get the following messages:

failure at /usr/src-9-stable/sys/dev/mps/mps_sas_lsi.c:667/mpssas_add_device()! Could not get ID for device with handle 0x0010
mpssas_fw_work: failed to add device with handle 0x10

This has been true across mps(4) revisions, on all three copies of
this hardware that I have in service.

-GAWollman
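
Illustrative note on the redzone argument quoted above: a guard-zone checker notices writes that run off the end of a buffer into the guard bytes beside it, but a stray store through a bad pointer can land anywhere and leave every guard intact. The sketch below is a toy model in Python, not the kernel's redzone code. ToyHeap, the guard pattern, the sizes, and both scenario functions are hypothetical names made up for illustration, and the toy only places a guard after each buffer.

```python
GUARD = 0xA5      # guard fill pattern (arbitrary, assumed for this toy)
GUARD_LEN = 16    # guard bytes placed after each buffer (arbitrary)


class ToyHeap:
    """A flat byte array carved into buffers, each followed by a guard zone."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.next_free = 0
        self.buffers = []  # (offset, length) of every buffer handed out

    def alloc(self, length):
        """Reserve `length` bytes plus a trailing guard; return the offset."""
        off = self.next_free
        guard_start = off + length
        if guard_start + GUARD_LEN > len(self.mem):
            raise MemoryError("toy heap exhausted")
        self.mem[guard_start:guard_start + GUARD_LEN] = bytes([GUARD]) * GUARD_LEN
        self.next_free = guard_start + GUARD_LEN
        self.buffers.append((off, length))
        return off

    def damaged_guards(self):
        """Return offsets of buffers whose trailing guard was modified."""
        expected = bytes([GUARD]) * GUARD_LEN
        bad = []
        for off, length in self.buffers:
            guard = self.mem[off + length:off + length + GUARD_LEN]
            if guard != expected:
                bad.append(off)
        return bad


def linear_overflow():
    """Write 40 bytes into a 32-byte buffer: the run spills into its guard."""
    heap = ToyHeap(256)
    a = heap.alloc(32)
    heap.alloc(32)
    heap.mem[a:a + 40] = b"\xff" * 40
    return heap.damaged_guards()


def stray_write():
    """Store one byte into the middle of an unrelated buffer: no guard touched."""
    heap = ToyHeap(256)
    heap.alloc(32)
    b = heap.alloc(32)
    heap.mem[b + 8] = 0xFF  # corrupts b's contents, far from any guard
    return heap.damaged_guards(), heap.mem[b + 8]
```

In this toy, linear_overflow() reports the first buffer as damaged, while stray_write() reports no damaged guards even though the second buffer's contents have changed. That is the sense in which a clean redzone run points away from a plain overflow without ruling out other memory corruption.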