Date:      Wed, 03 Aug 2011 21:43:21 +0900
From:      Stephane LAPIE <stephane.lapie@darkbsd.org>
To:        freebsd-geom@freebsd.org
Subject:   Poor interaction between gmultipath(8), ZFS and isp(4)
Message-ID:  <4E394269.3090208@darkbsd.org>

Hello list,

(I am not 100% sure whether the bug is in GEOM_MULTIPATH or in another driver.)

I am running a FreeBSD 8.2-RELEASE server with ZFS v15 on the
following hardware:

http://www.darkbsd.org/~darksoul/server_dmesg.txt

I have a dual fibre-channel controller (isp(4) driver), and I am
accessing 16 RAID0 logical drives on a Promise vTrak E630fD (one volume
per physical disk).

Since both controllers are plugged into the same storage unit with no LUN
masking, both controllers end up seeing the same devices, which is why I
combined these devices using geom_multipath.

Here is my zpool structure:
config:

        NAME                  STATE     READ WRITE CKSUM
        data                  ONLINE       0     0     0
          raidz1              ONLINE       0     0     0
            multipath/disk0   ONLINE       0     0     0
            multipath/disk1   ONLINE       0     0     0
            multipath/disk2   ONLINE       0     0     0
            multipath/disk3   ONLINE       0     0     0
            multipath/disk4   ONLINE       0     0     0
            multipath/disk5   ONLINE       0     0     0
            multipath/disk6   ONLINE       0     0     0
            multipath/disk7   ONLINE       0     0     0
          raidz1              ONLINE       0     0     0
            multipath/disk8   ONLINE       0     0     0
            multipath/disk9   ONLINE       0     0     0
            multipath/disk10  ONLINE       0     0     0
            multipath/disk11  ONLINE       0     0     0
            multipath/disk12  ONLINE       0     0     0
            multipath/disk13  ONLINE       0     0     0
            multipath/disk14  ONLINE       0     0     0
            multipath/disk15  ONLINE       0     0     0

errors: No known data errors


Using gmultipath, I eventually want disk{1,3,5,7,9,11,13,15} to use
the second controller, while the rest use the first. The idea is that
if anyone removes a fiber, everything switches over to the remaining
fiber.

For the sake of testing, I put every multipath device on the same
controller, isp1.

Here is the kernel log fragment I could capture during my test (pulling
a fiber while transfers are actively running on it). However, since I
don't have serial console access, I couldn't capture the relevant kernel
panic trace: the last readable section only mentions a kernel trap (page
fault) in g_mp_kt, and it looks as if every CPU is raising the panic
message at the same time.

http://www.darkbsd.org/~darksoul/server_lastlog_before_kernelpanic.txt

After that, I get the aforementioned kernel panic. I can reproduce it
consistently, and will try to get serial console output for a more
detailed panic trace, but it feels like everything is occurring at the
same time without proper locking or checks that the relevant structures
are still allocated. This looks like a race condition between the isp(4)
loop down triggering da(4) destruction and the gmultipath(8) failover,
with g_mp_kt accessing a da(4) structure that is being destroyed, or has
already been destroyed, and therefore touching unallocated memory.

This may be similar to the following issue:
http://freebsd.1045724.n5.nabble.com/Kernel-panic-with-gmultipath-td4204700.html


Could this be tuned so that:
1) initially, on isp(4) loop down, the da(4) devices depending on it return
SCSI errors, triggering a clean gmultipath failover;
2) afterwards, on isp(4) timeout, the da(4) devices are destroyed?
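The two-phase behaviour requested above can be sketched as a tiny state model, assuming a single grace period after the link is lost. `PathState`, `path_state`, `failover_precedes_teardown` and `grace_seconds` are hypothetical names; this describes the desired policy, not how isp(4) or da(4) actually behave, and whether either boot hint maps onto `grace_seconds` is the open question about the hints.

```python
from enum import Enum


class PathState(Enum):
    # Hypothetical states for a da(4)-like device behind one FC link.
    UP = "up"            # link up, I/O flows normally
    FAILING = "failing"  # link down, device kept, I/O returns errors
    GONE = "gone"        # grace period over, device torn down


def path_state(link_up: bool, seconds_down: float, grace_seconds: float) -> PathState:
    """State of the device under the requested two-phase policy.

    Phase 1 (FAILING): the device still exists but its I/O fails, so the
    multipath layer can switch paths while the device is still valid.
    Phase 2 (GONE): only after grace_seconds is the device destroyed.
    """
    if link_up:
        return PathState.UP
    if seconds_down < grace_seconds:
        return PathState.FAILING
    return PathState.GONE


def failover_precedes_teardown(grace_seconds: float, step: float = 1.0) -> bool:
    """True if the device passes through FAILING (the failover window)
    before it becomes GONE, i.e. before the multipath layer loses it."""
    t = 0.0
    seen_failing = False
    while True:
        state = path_state(False, t, grace_seconds)
        if state is PathState.FAILING:
            seen_failing = True
        elif state is PathState.GONE:
            return seen_failing
        t += step
```

With a zero grace period the device goes straight from UP to GONE, so `failover_precedes_teardown(0)` is false: there is no window in which the multipath layer sees errors on a still-valid device, which matches the race suspected above.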

Is this a case for using the following boot hints?
- "hint.isp.0.loop_down_limit" and "hint.isp.0.gone_device_time" (though
I am not quite sure what the difference between the two is. Which one
actually deallocates the underlying devices?)

Thanks in advance for your time,
-- 
Stephane LAPIE, EPITA SRS, Promo 2005
"Even when they have digital readouts, I can't understand them."
--MegaTokyo

