Date:      Thu, 11 Apr 2019 14:52:28 -0400
From:      Zaphod Beeblebrox <zbeeble@gmail.com>
To:        Karl Denninger <karl@denninger.net>
Cc:        FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:   Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
Message-ID:  <CACpH0MfmPzEO5BO2kFk8-F1hP9TsXEiXbfa1qxcvB8YkvAjWWw@mail.gmail.com>
In-Reply-To: <2bc8a172-6168-5ba9-056c-80455eabc82b@denninger.net>
References:  <f87f32f2-b8c5-75d3-4105-856d9f4752ef@denninger.net> <c96e31ad-6731-332e-5d2d-7be4889716e1@FreeBSD.org> <9a96b1b5-9337-fcae-1a2a-69d7bb24a5b3@denninger.net> <CACpH0MdLNQ_dqH+to=amJbUuWprx3LYrOLO0rQi7eKw-ZcqWJw@mail.gmail.com> <1866e238-e2a1-ef4e-bee5-5a2f14e35b22@denninger.net> <3d2ad225-b223-e9db-cce8-8250571b92c9@FreeBSD.org> <2bc8a172-6168-5ba9-056c-80455eabc82b@denninger.net>

On Wed, Apr 10, 2019 at 10:41 AM Karl Denninger <karl@denninger.net> wrote:


> In this specific case the adapter in question is...
>
> mps0: <Avago Technologies (LSI) SAS2116> port 0xc000-0xc0ff mem
> 0xfbb3c000-0xfbb3ffff,0xfbb40000-0xfbb7ffff irq 30 at device 0.0 on pci3
> mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
> mps0: IOCCapabilities:
> 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
>
> Which is indeed a "dumb" HBA (in IT mode), and Zaphod says he connects
> his drives via dumb on-MoBo direct SATA connections.
>

Maybe I'm in good company.  My current setup has 8 of the disks connected
to:

mps0: <Avago Technologies (LSI) SAS2308> port 0xb000-0xb0ff mem
0xfe240000-0xfe24ffff,0xfe200000-0xfe23ffff irq 32 at device 0.0 on pci6
mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities:
5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>

... just with a cable that breaks out each of the 2 connectors into 4
SATA-style connectors, and the other 8 disks (plus boot disks and SSD
cache/log) connected to ports on...

- ahci0: <ASMedia ASM1062 AHCI SATA controller> port
0xd050-0xd057,0xd040-0xd043,0xd030-0xd037,0xd020-0xd023,0xd000-0xd01f mem
0xfe900000-0xfe9001ff irq 44 at device 0.0 on pci2
- ahci2: <Marvell 88SE9230 AHCI SATA controller> port
0xa050-0xa057,0xa040-0xa043,0xa030-0xa037,0xa020-0xa023,0xa000-0xa01f mem
0xfe610000-0xfe6107ff irq 40 at device 0.0 on pci7
- ahci3: <AMD SB7x0/SB8x0/SB9x0 AHCI SATA controller> port
0xf040-0xf047,0xf030-0xf033,0xf020-0xf027,0xf010-0xf013,0xf000-0xf00f mem
0xfea07000-0xfea073ff irq 19 at device 17.0 on pci0

... each drive connected to a single port.

I can actually reproduce this at will.  Because I have 16 drives, when one
fails I need to find it: I pull the SATA cable for a drive and determine
whether it's the drive in question; if not, I reconnect it, "ONLINE" it and
wait for the resilver to finish... usually only a minute or two.
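
(For reference, the "ONLINE" step is just the stock zpool command; the
pool and device names below are only an example, borrowed from the status
output further down:)

  # put the trial-pulled disk back into service and watch the resilver
  zpool online vr1 gpt/v1-d3
  zpool status vr1     # repeat until the resilver has finished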

... if I do this 4 to 6 times to find the drive (I can tell, in general,
whether a drive hangs off the SAS controller or the SATA controllers, so
I'm only ever looking among 8) ... then I "REPLACE" the problem drive.
More often than not, a scrub will then find a few problems.  In fact, the
most recent scrub appears to be an example:

[1:7:306]dgilbert@vr:~> zpool status
  pool: vr1
 state: ONLINE
  scan: scrub repaired 32K in 47h16m with 0 errors on Mon Apr  1 23:12:03 2019
config:

        NAME            STATE     READ WRITE CKSUM
        vr1             ONLINE       0     0     0
          raidz2-0      ONLINE       0     0     0
            gpt/v1-d0   ONLINE       0     0     0
            gpt/v1-d1   ONLINE       0     0     0
            gpt/v1-d2   ONLINE       0     0     0
            gpt/v1-d3   ONLINE       0     0     0
            gpt/v1-d4   ONLINE       0     0     0
            gpt/v1-d5   ONLINE       0     0     0
            gpt/v1-d6   ONLINE       0     0     0
            gpt/v1-d7   ONLINE       0     0     0
          raidz2-2      ONLINE       0     0     0
            gpt/v1-e0c  ONLINE       0     0     0
            gpt/v1-e1b  ONLINE       0     0     0
            gpt/v1-e2b  ONLINE       0     0     0
            gpt/v1-e3b  ONLINE       0     0     0
            gpt/v1-e4b  ONLINE       0     0     0
            gpt/v1-e5a  ONLINE       0     0     0
            gpt/v1-e6a  ONLINE       0     0     0
            gpt/v1-e7c  ONLINE       0     0     0
        logs
          gpt/vr1log    ONLINE       0     0     0
        cache
          gpt/vr1cache  ONLINE       0     0     0

errors: No known data errors

... it doesn't say so now, but there were 5 CKSUM errors on one of the
drives that I had trial-removed (and not on the one I replaced).
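
(The "REPLACE" step and the follow-up check are likewise the ordinary
zpool invocations; the device names here are placeholders:)

  # swap the suspect disk for its replacement and let it resilver
  zpool replace vr1 gpt/v1-d3 gpt/v1-d3new
  # then scrub and look at the per-device CKSUM column
  zpool scrub vr1
  zpool status -v vr1

zpool status -v will also list any files affected by unrecoverable errors,
which is the quickest way to see whether the checksum errors cost anything.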


