From nobody Fri Sep 10 15:45:54 2021
From: Michael Proto <mike@jellydonut.org>
Date: Fri, 10 Sep 2021 11:45:54 -0400
Subject: Constant state-changes in a ZFS array
To: FreeBSD-STABLE Mailing List, freebsd-hardware@freebsd.org
List-Id: General discussion of FreeBSD hardware
List-Archive: https://lists.freebsd.org/archives/freebsd-hardware
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Hey all,

I had a server go dark earlier this week, and after several hardware
swaps I'm left scratching my head. The server is an HP DL380p Gen8 with
a D3600 shelf attached, using 2 EF0600FARNA HP 600G disks in a ZFS
mirror (da0 and da1) and another 22 8TB Ultrastar disks in a ZFS RAID10
for data (da3 through da23, though da23 has been removed in this
situation). They're all attached to an LSI SAS2308 operating in HBA
mode.

The large array threw a disk shortly before the outage, which we would
normally handle online as we've done dozens of times before. In this
case there's a bigger problem I'm struggling with: in addition to the
thrown disk, I'm now unable to bring the larger ZFS array online.
Commands issued to check array status or bring it online during boot
stall. The 2-disk zroot mirror is recognized on boot and loads, so I can
get into the OS as normal, but the larger tank array fails to come
online.

Looking at syslog, I'm seeing a regular stream of messages from devd
regarding media and state-change events from ZFS, GEOM, and DEVFS.
Sample below:

Sep 10 04:28:33 backup11 devd: Processing event '!system=DEVFS subsystem=CDEV type=MEDIACHANGE cdev=da2'
Sep 10 04:28:33 backup11 devd: Processing event '!system=GEOM subsystem=DEV type=MEDIACHANGE cdev=da2'
Sep 10 04:28:33 backup11 devd: Processing event '!system=ZFS subsystem=ZFS type=resource.fs.zfs.statechange version=0 class=resource.fs.zfs.statechange pool_guid=9328454021323814501 vdev_guid=8915574321583737794'
Sep 10 04:28:33 backup11 devd: Processing event '!system=DEVFS subsystem=CDEV type=MEDIACHANGE cdev=da2p1'
Sep 10 04:28:33 backup11 devd: Processing event '!system=GEOM subsystem=DEV type=MEDIACHANGE cdev=da2p1'
Sep 10 04:28:33 backup11 devd: Processing event '!system=DEVFS subsystem=CDEV type=MEDIACHANGE cdev=da6'
Sep 10 04:28:33 backup11 devd: Processing event '!system=DEVFS subsystem=CDEV type=MEDIACHANGE cdev=da2'
Sep 10 04:28:33 backup11 devd: Processing event '!system=GEOM subsystem=DEV type=MEDIACHANGE cdev=da2'
Sep 10 04:28:33 backup11 devd: Processing event '!system=ZFS subsystem=ZFS type=resource.fs.zfs.statechange version=0 class=resource.fs.zfs.statechange pool_guid=9328454021323814501 vdev_guid=8915574321583737794'
Sep 10 04:28:33 backup11 devd: Processing event '!system=DEVFS subsystem=CDEV type=MEDIACHANGE cdev=da2p1'
Sep 10 04:28:33 backup11 devd: Processing event '!system=GEOM subsystem=DEV type=MEDIACHANGE cdev=da2p1'
Sep 10 04:28:33 backup11 devd: Processing event '!system=DEVFS subsystem=CDEV type=MEDIACHANGE cdev=da6'
Sep 10 04:28:33 backup11 devd: Processing event '!system=GEOM subsystem=DEV type=MEDIACHANGE cdev=da6'
Sep 10 04:28:33 backup11 devd: Processing event '!system=ZFS subsystem=ZFS type=resource.fs.zfs.statechange version=0 class=resource.fs.zfs.statechange pool_guid=9328454021323814501 vdev_guid=7024987654522270730'
Sep 10 04:28:33 backup11 devd: Processing event '!system=DEVFS subsystem=CDEV type=MEDIACHANGE cdev=da6p1'
Sep 10 04:28:33 backup11 devd: Processing event '!system=GEOM subsystem=DEV type=MEDIACHANGE cdev=da6p1'
Sep 10 04:28:33 backup11 devd: Processing event '!system=DEVFS subsystem=CDEV type=MEDIACHANGE cdev=da9'
Sep 10 04:28:33 backup11 devd: Processing event '!system=GEOM subsystem=DEV type=MEDIACHANGE cdev=da9'
Sep 10 04:28:33 backup11 devd: Processing event '!system=ZFS subsystem=ZFS type=resource.fs.zfs.statechange version=0 class=resource.fs.zfs.statechange pool_guid=9328454021323814501 vdev_guid=4207599288564790488'
Sep 10 04:28:33 backup11 devd: Processing event '!system=DEVFS subsystem=CDEV type=MEDIACHANGE cdev=da9p1'
Sep 10 04:28:33 backup11 devd: Processing event '!system=GEOM subsystem=DEV type=MEDIACHANGE cdev=da9p1'

The disk devices appearing in these messages are all disks in the
RAID10 array. They appear as a group, every 5 seconds. Furthermore, the
state changes seem to be happening evenly across all affected devices,
with the exception of da15, which shows precisely half the volume.
Here's a count from /var/log/messages piped to sort and uniq (count,
then message):

32100 cdev=da10'
32100 cdev=da10p1'
32100 cdev=da11'
32100 cdev=da11p1'
32100 cdev=da12'
32100 cdev=da12p1'
32100 cdev=da13'
32100 cdev=da13p1'
32100 cdev=da14'
32100 cdev=da14p1'
16050 cdev=da15'
16050 cdev=da15p1'
32100 cdev=da16'
32100 cdev=da16p1'
32100 cdev=da17'
32100 cdev=da17p1'
32100 cdev=da18'
32100 cdev=da18p1'
32100 cdev=da19'
32100 cdev=da19p1'
32100 cdev=da2'
32100 cdev=da20'
32100 cdev=da20p1'
32100 cdev=da21'
32100 cdev=da21p1'
32100 cdev=da22'
32100 cdev=da22p1'
32100 cdev=da2p1'
32100 cdev=da3'
32100 cdev=da3p1'
32100 cdev=da4'
32100 cdev=da4p1'
32100 cdev=da5'
32100 cdev=da5p1'
32100 cdev=da6'
32100 cdev=da6p1'
32100 cdev=da7'
32100 cdev=da7p1'
32100 cdev=da8'
32100 cdev=da8p1'
32100 cdev=da9'
32100 cdev=da9p1'

I can run diskinfo against all the listed disks with no problem, and I
see them via camcontrol. I can also issue a reset via camcontrol to
both the chassis and the D3600 with no issues. sesutil map sees the
chassis, shelf, and all disk devices.
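(For anyone wanting to reproduce a tally like the one above: the post
doesn't show the exact pipeline, so the grep pattern and sample file
below are assumptions, run against a small inline sample rather than
the real /var/log/messages.)

```shell
#!/bin/sh
# Sketch of the per-device tally described above. The pattern and the
# sample file are assumptions; the post's exact pipeline wasn't shown.
# An inline sample stands in for /var/log/messages so this runs anywhere.
cat > /tmp/devd_sample.log <<'EOF'
Sep 10 04:28:33 backup11 devd: Processing event '!system=DEVFS subsystem=CDEV type=MEDIACHANGE cdev=da2'
Sep 10 04:28:33 backup11 devd: Processing event '!system=GEOM subsystem=DEV type=MEDIACHANGE cdev=da2'
Sep 10 04:28:33 backup11 devd: Processing event '!system=DEVFS subsystem=CDEV type=MEDIACHANGE cdev=da2p1'
EOF
# Pull out the trailing cdev=... field and count occurrences per device.
grep -o "cdev=[a-z0-9]*'" /tmp/devd_sample.log | sort | uniq -c
```

On the sample this prints a count of 2 for cdev=da2' and 1 for
cdev=da2p1', matching the (count, then message) layout above.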
So far I've swapped the LSI controller, the D3600 shelf (twice), and
the cabling; same behavior. Previously, when a group of disks went
problematic like this, we swapped the D3600 shelf or occasionally just
reseated the external cabling and everything came back to normal. Not
this time. I'm scheduling a chassis swap for next week, but I figured
I'd throw this out here to see if anyone has seen this before.

Thanks!
Mike Proto

From nobody Fri Sep 10 16:10:05 2021
From: Michael Proto <mike@jellydonut.org>
Date: Fri, 10 Sep 2021 12:10:05 -0400
Subject: Re: Constant state-changes in a ZFS array
To: FreeBSD-STABLE Mailing List, freebsd-hardware@freebsd.org
List-Id: General discussion of FreeBSD hardware
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

Just realized I neglected version info. This is on FreeBSD 11.4-RELEASE-p3.

On Fri, Sep 10, 2021 at 11:45 AM Michael Proto wrote:
> [snip]