Date: Thu, 11 Jul 2019 16:18:00 +0300
From: Daniel Braniss <danny@cs.huji.ac.il>
To: Allan Jude <allanjude@freebsd.org>
Cc: freebsd-hackers@freebsd.org
Subject: Re: zpool errors
Message-ID: <88CFC175-8275-4C4E-B7BE-110E07C0A31C@cs.huji.ac.il>
In-Reply-To: <05D8BD75-78B4-4336-8A8A-C84A901CB3D4@cs.huji.ac.il>
References: <52CE32B1-7E01-4C35-A2AB-84D3D5BD4E2F@cs.huji.ac.il>
	<27c3e59a-07ea-5df3-9de2-302d5290a477@freebsd.org>
	<831204B6-3F3B-4736-89FA-1207C4C46A7E@cs.huji.ac.il>
	<70f1be10-e37a-de20-e188-6155fda2d06a@freebsd.org>
	<05D8BD75-78B4-4336-8A8A-C84A901CB3D4@cs.huji.ac.il>
> On 11 Jul 2019, at 10:39, Daniel Braniss <danny@cs.huji.ac.il> wrote:
>
>
>
>> On 10 Jul 2019, at 20:23, Allan Jude <allanjude@freebsd.org> wrote:
>>
>> On 2019-07-10 11:37, Daniel Braniss wrote:
>>>
>>>
>>>> On 10 Jul 2019, at 18:24, Allan Jude <allanjude@freebsd.org> wrote:
>>>>
>>>> On 2019-07-10 10:48, Daniel Braniss wrote:
>>>>> hi,
>>>>> i got a degraded pool, but can’t make sense of the file name:
>>>>>
>>>>> protonew-2# zpool status -vx
>>>>>   pool: h
>>>>>  state: ONLINE
>>>>> status: One or more devices has experienced an error resulting in data
>>>>>         corruption.  Applications may be affected.
>>>>> action: Restore the file in question if possible.  Otherwise restore the
>>>>>         entire pool from backup.
>>>>>    see: http://illumos.org/msg/ZFS-8000-8A
>>>>>   scan: scrub repaired 6.50K in 17h30m with 0 errors on Wed Jul 10 12:06:14 2019
>>>>> config:
>>>>>
>>>>>         NAME          STATE     READ WRITE CKSUM
>>>>>         h             ONLINE       0     0 14.4M
>>>>>           gpt/r5/zfs  ONLINE       0     0 57.5M
>>>>>
>>>>> errors: Permanent errors have been detected in the following files:
>>>>>
>>>>>         <0x102>:<0x30723>
>>>>>         <0x102>:<0x30726>
>>>>>         <0x102>:<0x3062a>
>>>>>         …
>>>>>         <0x281>:<0x0>
>>>>>         <0x6aa>:<0x305cd>
>>>>>         <0xffffffffffffffff>:<0x305cd>
>>>>>
>>>>>
>>>>> any hints as to how I can identify these files?
>>>>>
>>>>> thanks,
>>>>> danny
>>>>>
>>>>> _______________________________________________
>>>>> freebsd-hackers@freebsd.org mailing list
>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>>>>> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>>>>>
>>>>
>>>> Once a file has been deleted, ZFS can have a hard time determining its
>>>> filename.
>>>>
>>>> It is inode 198186 (0x3062a) on dataset 0x102. The file has been
>>>> deleted, but still exists in at least one snapshot.
>>>>
>>>> Although, 57 million checksum errors seems like there may be some other
>>>> problem. You might look for and resolve the problem with what appears to
>>>> be a raid5 you have built your ZFS pool on top of? Then do 'zpool
>>>> clear' to reset the counters to zero, and 'zpool scrub' to try to read
>>>> everything again.
>>>>
>>>> --
>>>> Allan Jude
>>>>
>>> I don’t know when the first error was detected, and this host has been up for 367 days!
>>> I did a scrub but no change.
>>> i will remove old snapshots and see if it helps.
>>>
>>> is it possible to know at least which volume?
>>>
>>> thanks,
>>> danny
>>>
>>>
>>
>> zdb -ddddd h 0x102
>>
>> Should tell you which dataset that is
>>
>> --
>> Allan Jude
>>
>

the above didn’t work for me, but, after removing old snapshots I reduced the problematic files to 1!
	<0xffffffffffffffff>:<0x305cd>
which seems very odd: -1?
so now I removed more old snapshots, and started a new zpool scrub.
what still worries me is the fast-growing checksum count.

thanks,
	danny

> firstly, thanks for your help!
> now, after doing a zpool clear, I notice that the CHKSUM is growing,
> the pool is on a raid controller raid5 (PERC from dell) which is showing
> it’s correcting the errors (‘Corrected medium error during recovery on PD …).
>
> so what can be the cause? btw, the FreeBSD is 10.3-stable.
>
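Allan's reading of `<0x102>:<0x3062a>` as inode 198186 on dataset 0x102 is a plain hex-to-decimal conversion of the two fields in each `zpool status -v` error entry, and the odd `<0xffffffffffffffff>` prefix is the all-ones 64-bit value, which reads as -1 when interpreted as a signed integer. A minimal Python sketch of that arithmetic follows, assuming every entry has the `<0xDATASET>:<0xOBJECT>` form seen in the quoted output; the helper names `decode_entry` and `to_signed64` are hypothetical, and the script only does the conversion, it does not query ZFS.

```python
import re

# Matches entries such as "<0x102>:<0x3062a>" as printed by `zpool status -v`.
# Assumption: first field = dataset ID, second field = object (inode) number.
ENTRY = re.compile(r"<0x([0-9a-fA-F]+)>:<0x([0-9a-fA-F]+)>")


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as signed (two's complement)."""
    return value - (1 << 64) if value >= (1 << 63) else value


def decode_entry(line: str) -> tuple[int, int]:
    """Return (dataset_id, object_number) in decimal, read as signed 64-bit."""
    m = ENTRY.search(line)
    if m is None:
        raise ValueError(f"not a <dataset>:<object> entry: {line!r}")
    dataset, obj = (int(g, 16) for g in m.groups())
    return to_signed64(dataset), to_signed64(obj)


if __name__ == "__main__":
    # A few entries copied from the quoted `zpool status -vx` output.
    for entry in ["<0x102>:<0x3062a>", "<0x281>:<0x0>",
                  "<0xffffffffffffffff>:<0x305cd>"]:
        ds, obj = decode_entry(entry)
        print(f"{entry:35} dataset {ds:>4}  object/inode {obj}")
```

For example, `decode_entry("<0x102>:<0x3062a>")` returns `(258, 198186)`, matching Allan's figure. Since `zdb -d h` lists datasets with decimal IDs, the decimal value may help match 258 to a dataset name; and if the file still exists on a mounted dataset or in one of its snapshots, something like `find /path/to/mountpoint -inum 198186` (the mountpoint is hypothetical) could then locate it.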