Date: Wed, 14 Oct 2015 15:34:34 -0500
From: cruxpot <cruxpot@gmail.com>
To: Juan Bernhard <juan@inti.gob.ar>, freebsd-questions@freebsd.org
Subject: Re: zfs pool ssd cache drive dropping off
Message-ID: <CAPYfQ9yNWcthCzLDxTr1uJUxo0n5ou7jvk6b=4YH=hdAQfTROQ@mail.gmail.com>
In-Reply-To: <561EA1BA.2030402@inti.gob.ar>
References: <CAPYfQ9yn4Xvcv09rKedXA5J2MGnpKhKiWbhBw1mUP03C%2By5EkQ@mail.gmail.com> <561EA1BA.2030402@inti.gob.ar>
When I first put it in, it was working fine, but I logged in one day and
that was the state it was in. It dropped off after 7 days. After a recent
reboot, it dropped off an hour later according to the timestamps. Are there
any diagnostic commands on FreeBSD to help me determine whether the SSD is
failing? I'm wondering if ZFS killed it. I simply added it to a raid-z pool
with the "zpool add <pool> cache /dev/ada4" command.

I can try a different SATA cable and port, but there is probably only a
small chance that is the problem, because this seems to be an intermittent
issue.

On Wed, Oct 14, 2015 at 1:40 PM, Juan Bernhard <juan@inti.gob.ar> wrote:

> On 14/10/2015 at 02:24 p.m., cruxpot wrote:
>
>> I recently added a Crucial 64GB SSD drive that I had lying around to my
>> zfs pool. Unfortunately, it keeps dropping off and I'm not sure why. The
>> drive wasn't failed when I removed it from an old laptop. It has happened
>> twice and only a system restart brings it back. Here are the log
>> messages, they repeat but here is the base mess:
>>
>> zpool status
>>   pool: zrewt
>>  state: ONLINE
>> status: One or more devices has been removed by the administrator.
>>         Sufficient replicas exist for the pool to continue functioning in a
>>         degraded state.
>> action: Online the device using 'zpool online' or replace the device with
>>         'zpool replace'.
>>   scan: none requested
>> config:
>>
>>         NAME                      STATE     READ WRITE CKSUM
>>         zrewt                     ONLINE       0     0     0
>>           raidz1-0                ONLINE       0     0     0
>>             ada0                  ONLINE       0     0     0
>>             ada1                  ONLINE       0     0     0
>>             ada2                  ONLINE       0     0     0
>>             ada3                  ONLINE       0     0     0
>>         cache
>>           16818205039835910221    REMOVED      0     0     0  was /dev/ada4
>>
>> errors: No known data errors
>>
>> kernel:
>> Trying to mount root from zfs:zrewt []...
>> ahcich4: Timeout on slot 0 port 0
>> ahcich4: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00000000 cmd 0004c017
>> (ada4:ahcich4:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 38 78 2f 05 40 00 00 00 00 00 00
>> (ada4:ahcich4:0:0:0): CAM status: Command timeout
>> (ada4:ahcich4:0:0:0): Retrying command
>> ahcich4: AHCI reset: device not ready after 31000ms (tfd = 00000080)
>> ahcich4: Timeout on slot 1 port 0
>> ahcich4: is 00000000 cs 00000002 ss 00000000 rs 00000002 tfd 80 serr 00000000 cmd 0004c117
>> (aprobe0:ahcich4:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
>> (aprobe0:ahcich4:0:0:0): CAM status: Command timeout
>> (aprobe0:ahcich4:0:0:0): Retrying command
>> ahcich4: AHCI reset: device not ready after 31000ms (tfd = 00000080)
>> ahcich4: Timeout on slot 2 port 0
>> ahcich4: is 00000000 cs 00000004 ss 00000000 rs 00000004 tfd 80 serr 00000000 cmd 0004c217
>> (aprobe0:ahcich4:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
>> (aprobe0:ahcich4:0:0:0): CAM status: Command timeout
>> (aprobe0:ahcich4:0:0:0): Error 5, Retries exhausted
>> ahcich4: AHCI reset: device not ready after 31000ms (tfd = 00000080)
>> ahcich4: Timeout on slot 3 port 0
>> ahcich4: is 00000000 cs 00000008 ss 00000000 rs 00000008 tfd 80 serr 00000000 cmd 0004c317
>> (aprobe0:ahcich4:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
>> (aprobe0:ahcich4:0:0:0): CAM status: Command timeout
>> (aprobe0:ahcich4:0:0:0): Error 5, Retry was blocked
>> ada4 at ahcich4 bus 0 scbus6 target 0 lun 0
>> ada4: <M4-CT064M4SSD2 0009> s/n 0000000011290314E425 detached
>> ahcich4: AHCI reset: device not ready after 31000ms (tfd = 00000080)
>> ahcich4: Timeout on slot 4 port 0
>> ahcich4: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 00000000 cmd 0004c417
>> (aprobe0:ahcich4:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
>> (aprobe0:ahcich4:0:0:0): CAM status: Command timeout
>> (aprobe0:ahcich4:0:0:0): Retrying command
>> ahcich4: AHCI reset: device not ready after 31000ms (tfd = 00000080)
>> ahcich4: Timeout on slot 5 port 0
>> ahcich4: is 00000000 cs 00000020 ss 00000000 rs 00000020 tfd 80 serr 00000000 cmd 0004c517
>> (aprobe0:ahcich4:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
>> (aprobe0:ahcich4:0:0:0): CAM status: Command timeout
>> (aprobe0:ahcich4:0:0:0): Error 5, Retries exhausted
>> ahcich4: AHCI reset: device not ready after 31000ms (tfd = 00000080)
>> ahcich4: Poll timeout on slot 7 port 0
>> ahcich4: is 00000000 cs 00000080 ss 00000000 rs 00000080 tfd 80 serr 00000000 cmd 0004c717
>> (aprobe0:ahcich4:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
>> (aprobe0:ahcich4:0:0:0): CAM status: Command timeout
>> (aprobe0:ahcich4:0:0:0): Error 5, Retries exhausted
>> ahcich4: Timeout on slot 8 port 0
>> ahcich4: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd 80 serr 00000000 cmd 0004c817
>> (ada4:ahcich4:0:0:0): SETFEATURES ENABLE RCACHE. ACB: ef aa 00 00 00 40 00 00 00 00 00 00
>> (ada4:ahcich4:0:0:0): CAM status: Command timeout
>> (ada4:ahcich4:0:0:0): Error 5, Periph was invalidated
>> ahcich4: AHCI reset: device not ready after 31000ms (tfd = 00000080)
>> ahcich4: Poll timeout on slot 10 port 0
>> ahcich4: is 00000000 cs 00000400 ss 00000000 rs 00000400 tfd 80 serr 00000000 cmd 0004ca17
>> (aprobe0:ahcich4:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
>> (aprobe0:ahcich4:0:0:0): CAM status: Command timeout
>> (aprobe0:ahcich4:0:0:0): Error 5, Retries exhausted
>> ahcich4: Timeout on slot 11 port 0
>> ahcich4: is 00000000 cs 00000800 ss 00000800 rs 00000800 tfd 80 serr 00000000 cmd 0004cb17
>> (ada4:ahcich4:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 38 78 2f 05 40 00 00 00 00 00 00
>> (ada4:ahcich4:0:0:0): CAM status: Command timeout
>> (ada4:ahcich4:0:0:0): Error 5, Periph was invalidated
>> (ada4:ahcich4:0:0:0): Periph destroyed
>> ahcich4: AHCI reset: device not ready after 31000ms (tfd = 00000080)
>> ahcich4: Poll timeout on slot 13 port 0
>> ahcich4: is 00000000 cs 00002000 ss 00000000 rs 00002000 tfd 80 serr 00000000 cmd 0004cd17
>> (aprobe0:ahcich4:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
>> (aprobe0:ahcich4:0:0:0): CAM status: Command timeout
>> (aprobe0:ahcich4:0:0:0): Error 5, Retries exhausted
>>
>
> The SSD takes 31 seconds to respond. Try using it as a regular disk and
> run some benchmarks on it to test it under load. If the disk was working
> on another computer, check the cable and the SATA port.
>
> Regards, Juan
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"
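The quoted kernel log repeats two kinds of ahcich4 messages: command timeouts ("Timeout on slot N" / "Poll timeout on slot N") and failed resets ("AHCI reset: device not ready after 31000ms"). As a rough aid for reading a long log like this, here is a minimal Python sketch that tallies both per channel. The function name summarize, the regular expressions, and the short sample excerpt are illustrative assumptions, not part of any FreeBSD tool.

```python
import re
from collections import defaultdict

# Illustrative patterns for the two ahcich messages seen in the quoted log.
TIMEOUT_RE = re.compile(r"(ahcich\d+): (?:Poll timeout|Timeout) on slot (\d+) port \d+")
RESET_RE = re.compile(r"(ahcich\d+): AHCI reset: device not ready after (\d+)ms")


def summarize(log_text):
    """Count command timeouts and failed resets per AHCI channel."""
    stats = defaultdict(
        lambda: {"timeouts": 0, "failed_resets": 0, "max_reset_wait_ms": 0}
    )
    for line in log_text.splitlines():
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            stats[timeout.group(1)]["timeouts"] += 1
            continue
        reset = RESET_RE.search(line)
        if reset:
            entry = stats[reset.group(1)]
            entry["failed_resets"] += 1
            # Remember the longest wait reported before the reset gave up.
            entry["max_reset_wait_ms"] = max(
                entry["max_reset_wait_ms"], int(reset.group(2))
            )
    return dict(stats)


# Short excerpt copied from the quoted log, for demonstration only.
SAMPLE_LOG = """\
ahcich4: Timeout on slot 0 port 0
(ada4:ahcich4:0:0:0): CAM status: Command timeout
ahcich4: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich4: Poll timeout on slot 7 port 0
"""

if __name__ == "__main__":
    for channel, counts in summarize(SAMPLE_LOG).items():
        print(channel, counts)
```

A tally like this only shows that the device stopped answering on that channel. By itself it cannot tell a failing SSD apart from a bad cable or port.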