Date: Mon, 13 Jun 2022 20:20:06 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 253954] kernel: g_access(958): provider da8 has error 6 set
Message-ID: <bug-253954-227-J5vQkNzJzc@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-253954-227@https.bugs.freebsd.org/bugzilla/>
References: <bug-253954-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253954

jnaughto@ee.ryerson.ca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jnaughto@ee.ryerson.ca

--- Comment #4 from jnaughto@ee.ryerson.ca ---
Any update on this bug? I just experienced the exact same issue. I have 8
disks (all SATA) connected to a FreeBSD 12.3 system. The ZFS pool is set up
as a raidz3. Got in today and found one drive was "REMOVED":

# zpool status pool
  pool: pool
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 0 days 02:32:26 with 0 errors on Sat Jun 11
        05:32:26 2022
config:

        NAME                     STATE     READ WRITE CKSUM
        pool                     DEGRADED     0     0     0
          raidz3-0               DEGRADED     0     0     0
            ada0                 ONLINE       0     0     0
            ada1                 ONLINE       0     0     0
            ada2                 ONLINE       0     0     0
            ada3                 ONLINE       0     0     0
            ada4                 ONLINE       0     0     0
            8936423309855741075  REMOVED      0     0     0  was /dev/ada5
            ada6                 ONLINE       0     0     0
            ada7                 ONLINE       0     0     0

I assumed that the drive had died and pulled it. I put a new drive in place
and attempted to replace it:

# zpool replace pool 8936423309855741075 ada5
cannot replace 8936423309855741075 with ada5: no such pool or dataset

It seems that the old drive is somehow still remembered by the system. I dug
through the logs and found the following occurring when the new drive is
inserted into the system:

Jun 13 13:03:15 server kernel: cam_periph_alloc: attempt to re-allocate valid device ada5 rejected flags 0x118 refcount 1
Jun 13 13:03:15 server kernel: adaasync: Unable to attach to new device due to status 0x6
Jun 13 13:04:23 server kernel: g_access(961): provider ada5 has error 6 set

Did a reboot without the new drive in place.
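A sketch of one way to drive the replacement by vdev GUID rather than by the
old device node, since ZFS remembers the departed disk by its GUID. The
sample line is copied from the status output above; the final `zpool replace`
invocation is hypothetical and assumes the pool name and new device from this
report:

```shell
# Extract the stale vdev's GUID from a `zpool status` line. On a live system
# you would pipe `zpool status pool` through the same awk filter.
sample='            8936423309855741075  REMOVED      0     0     0  was /dev/ada5'
guid=$(printf '%s\n' "$sample" |
    awk '$1 ~ /^[0-9]+$/ && ($2 == "REMOVED" || $2 == "FAULTED") {print $1}')
echo "$guid"
# zpool replace pool "$guid" ada9    # hypothetical: run against the live pool
```

The GUID form avoids the problem that the old adaX name may no longer exist
(or may now point at a different disk) after a reboot.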
On reboot the output of the pool did look somewhat different:

# zpool status pool
  pool: pool
 state: DEGRADED
status: One or more devices could not be used because the label is missing
        or invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0 days 02:32:26 with 0 errors on Sat Jun 11
        05:32:26 2022
config:

        NAME                      STATE     READ WRITE CKSUM
        pool                      DEGRADED     0     0     0
          raidz3-0                DEGRADED     0     0     0
            ada0                  ONLINE       0     0     0
            ada1                  ONLINE       0     0     0
            ada2                  ONLINE       0     0     0
            ada3                  ONLINE       0     0     0
            ada4                  ONLINE       0     0     0
            8936423309855741075   FAULTED      0     0     0  was /dev/ada5
            ada5                  ONLINE       0     0     0
            diskid/DISK-Z1W4HPXX  ONLINE       0     0     0

errors: No known data errors

I assumed this was due to the fact that there was one less drive attached
and the system assigned new adaX values to each drive. At this point, when I
inserted the new drive, it appeared as ada9. So I re-issued the zpool
replace command, now with ada9. It did take about 3 minutes before the zpool
replace command responded, which really concerned me. Yet the server has
quite a few users accessing the filesystem, so I thought as long as the new
drive was resilvering I would be fine....

I do a weekly scrub of the pool and I believe the error crept up after the
scrub. At 11am today the logs showed the following:

Jun 13 11:29:15 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jun 13 11:29:15 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout
Jun 13 11:29:15 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): Retrying command, 0 more tries remain
Jun 13 11:30:35 172.16.20.66 kernel: ahcich5: Timeout on slot 5 port 0
Jun 13 11:30:35 172.16.20.66 kernel: ahcich5: is 00000000 cs 00000060 ss 00000000 rs 00000060 tfd c0 serr 00000000 cmd 0004c517
Jun 13 11:30:35 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jun 13 11:30:35 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout
Jun 13 11:30:35 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): Retrying command, 0 more tries remain
Jun 13 11:31:08 172.16.20.66 kernel: ahcich5: AHCI reset: device not ready after 31000ms (tfd = 00000080)

At 11:39 I believe the following log entries are of note:

Jun 13 11:39:45 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): CAM status: Unconditionally Re-queue Request
Jun 13 11:39:45 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): Error 5, Periph was invalidated
Jun 13 11:39:45 172.16.20.66 ZFS[92964]: vdev state changed, pool_guid=$5100646062824685774 vdev_guid=$8936423309855741075
Jun 13 11:39:45 172.16.20.66 ZFS[92966]: vdev is removed, pool_guid=$5100646062824685774 vdev_guid=$8936423309855741075
Jun 13 11:39:46 172.16.20.66 kernel: g_access(961): provider ada5 has error 6 set
Jun 13 11:39:47 reactor syslogd: last message repeated 1 times
Jun 13 11:39:47 172.16.20.66 syslogd: last message repeated 1 times
Jun 13 11:39:47 172.16.20.66 kernel: ZFS WARNING: Unable to attach to ada5.

Any idea on what was the issue?

-- 
You are receiving this mail because:
You are the assignee for the bug.
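For what it's worth, error 6 in the g_access message is ENXIO ("Device not
configured"), i.e. GEOM is still holding the invalidated provider. A small
sketch for pulling the affected provider and errno out of such a log line;
the sample is copied verbatim from the messages above:

```shell
# Parse "g_access(...): provider <name> has error <n> set" kernel messages.
logline='Jun 13 11:39:46 172.16.20.66 kernel: g_access(961): provider ada5 has error 6 set'
printf '%s\n' "$logline" |
    sed -n 's/.*provider \([a-z0-9]*\) has error \([0-9]*\) set/\1 (errno \2)/p'
# prints: ada5 (errno 6)
```

On a live system the same filter could be run over /var/log/messages to see
which providers CAM has marked dead since the last boot.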