Date:      Mon, 3 Feb 2014 17:00:41 -0500
From:      Garrett Wollman <wollman@csail.mit.edu>
To:        "Kenneth D. Merry" <ken@freebsd.org>
Cc:        freebsd-scsi@freebsd.org, scottl@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: Heap overflow in mps(4) (was: Re: stable/9 mps(4) rev 254938 == BOOM!)
Message-ID:  <21232.4489.544435.898780@khavrinen.csail.mit.edu>
In-Reply-To: <20140131003342.GA11755@nargothrond.kdm.org>
References:  <21225.19508.683025.581620@khavrinen.csail.mit.edu> <201401292137.s0TLbD5G006716@hergotha.csail.mit.edu> <20140129221514.GA47535@nargothrond.kdm.org> <21225.38749.179621.454579@khavrinen.csail.mit.edu> <20140131003342.GA11755@nargothrond.kdm.org>

<<On Thu, 30 Jan 2014 17:33:42 -0700, "Kenneth D. Merry" <ken@freebsd.org> said:

> The attached patch should fix the leaked allocations.  I'm CCing Steve and
> Kashyap at LSI so that they can verify that this is the right place to do
> the mapping shutdown.

It does fix the leak.

> I don't know yet why that particular change is causing problems.  Perhaps
> it just moved things around and exposed an existing problem.

> The fact that the redzone code doesn't expose any problems makes it more
> likely that it is a problem other than a heap overflow.

> Since it is consistent, is there any chance you could hook up remote gdb to
> the box and poke around when it crashes?  Perhaps you'll see something
> interesting that will point to the problem.

No way to do a remote GDB, unfortunately.  However, I tried a few
other things:

- It makes no difference whether mps.ko is preloaded or loaded in
single-user mode.

- If I boot a kernel and modules built without redzone, loading mps.ko
panics immediately, in a very different place (apologies for the poor
transcription; I can either be up in the machine room to plug in USB
sticks or use the serial console, not both):

--- trap 0xc, rip = 0xffff....f807e934a, rsp = 0xff...94da4c48f0, rbp = 0xff...94da4c4950 ---
bzero() at bzero+0xa/frame 0xff...94da4c4af0
mpssas_add_device() at mpssas_add_device+0x78/frame 0xff..94da4c4af0
mpssas_firmware_event_work() at mpssas_firmware_event_work+0x437/frame 0xff....94da4c4b78
taskqueue_run_locked() at taskqueue_run_locked+0x74/frame 0xff..94da4c4bc0
taskqueue_thread_loop() at taskqueue_thread_loop+0x46/frame 0xff..94da4c4be0

Inspection of the code does not reveal any arc from mpssas_add_device
to bzero.  The return address in the frame is the location of the
first function call (to mpssas_startup_increment()) in
mpssas_add_device().

So I think it's fair to say that something is scribbling over memory
in quite a bad way.

Two things that may be relevant: on boot, this server's MPT2 BIOS
always complains "adapter configuration may have changed", and I
haven't discovered anything in the configuration utility that changes
this.  Also, on boot, I always get the following messages:

failure at /usr/src-9-stable/sys/dev/mps/mps_sas_lsi.c:667/mpssas_add_device()! Could not get ID for device with handle 0x0010
mpssas_fw_work: failed to add device with handle 0x10

This has been true across mps(4) revisions, on all three copies of
this hardware that I have in service.

-GAWollman


