Date: Mon, 21 Sep 2020 17:02:34 -0700
From: Kirk McKusick <mckusick@mckusick.com>
To: Colin Percival <cperciva@tarsnap.com>
Cc: ericr <erobison@gmail.com>, freebsd-cloud@freebsd.org
Subject: Re: Fwd: filesystem checksum problems on AWS EC2 instances
Message-ID: <202009220002.08M02YQ3054819@chez.mckusick.com>
In-Reply-To: <01000174b15f88a5-8de101fb-8f1b-4adb-bd7e-3c752bf86d61-000000@email.amazonses.com>

> Date: Fri, 18 Sep 2020 20:00:49 -0700
> From: Colin Percival <cperciva@tarsnap.com>
> Subject: Re: filesystem checksum problems on AWS EC2 instances
> To: ericr <erobison@gmail.com>, freebsd-cloud@freebsd.org,
>     Kirk McKusick <mckusick@FreeBSD.org>
>
> [Adding Kirk since this seems like a UFS issue...]
>
> On 2020-09-16 15:15, ericr wrote:
>> On Tue, Sep 15, 2020 at 6:24 PM Colin Percival <cperciva@tarsnap.com> wrote:
>>> On 2020-09-15 14:30, ericr wrote:
>>>> Sep 1 20:50:15 <kern.crit> freebsd kernel: UFS /dev/gpt/rootfs (/)
>>>> cylinder checksum failed: cg 0, cgp: 0x9c14700e != bp: 0x27bfa3d0
>>>> Sep 1 20:50:15 <kern.crit> freebsd syslogd: last message repeated 1 times
>>>> Sep 1 20:50:15 <kern.crit> freebsd kernel: UFS /dev/gpt/rootfs (/)
>>>> cylinder checksum failed: cg 7, cgp: 0x43ed3fa1 != bp: 0xe9b0182e
>>>>
>>>> and from there on, I get cylinder checksum errors pretty often.
>>>
>>> Do you get this if you launch from the non-Marketplace AMIs listed in the
>>> release announcement?
>>> https://www.freebsd.org/releases/12.1R/announce.html
>>
>>
>> Yes. I just tried both of these AMI's from the release notes:
>> us-east-1 region: ami-0de268ac2498ba33d
>> us-east-2 region: ami-0a44f10b2c6deb365
>>
>> I got the same errors.
>
> I've managed to reproduce this, with a filesystem which I've
> verified is clean (at least, which passes fsck) before resizing
> up to ~ 200 GB:
>
>> root@freebsd:/usr/home/ec2-user # fsck_ufs /dev/nvd1p2
>> ** /dev/nvd1p2
>> ** Last Mounted on /releng/12-amd64-GENERIC-release/usr/obj/usr/src/amd64.amd64/release/cw-ec2/new
>> ** Phase 1 - Check Blocks and Sizes
>> ** Phase 2 - Check Pathnames
>> ** Phase 3 - Check Connectivity
>> ** Phase 4 - Check Reference Counts
>> ** Phase 5 - Check Cyl groups
>> 25701 files, 758977 used, 229774 free (9654 frags, 27515 blocks, 1.0% fragmentation)
>>
>> ***** FILE SYSTEM IS CLEAN *****
>> root@freebsd:/usr/home/ec2-user # gpart recover /dev/nvd1
>> nvd1 recovered
>> root@freebsd:/usr/home/ec2-user # gpart resize -i 2 /dev/nvd1
>> nvd1p2 resized
>> root@freebsd:/usr/home/ec2-user # growfs -y /dev/nvd1p2
>> super-block backups (for fsck_ffs -b #) at:
>> [snip]
>> root@freebsd:/usr/home/ec2-user # fsck_ufs /dev/nvd1p2
>> ** /dev/nvd1p2
>> ** Last Mounted on
>> ** Phase 1 - Check Blocks and Sizes
>> ** Phase 2 - Check Pathnames
>> ** Phase 3 - Check Connectivity
>> ** Phase 4 - Check Reference Counts
>> ** Phase 5 - Check Cyl groups
>> CG 0: BAD CHECK-HASH 0x9c14700e vs 0xc9441f74
>> SUMMARY INFORMATION BAD
>> SALVAGE? [yn] n
>>
>> CG 7: BAD CHECK-HASH 0xad168305 vs 0x74ba48a
>> 25701 files, 758977 used, 50019285 free (9661 frags, 6251203 blocks, 0.0% fragmentation)
>>
>> ***** FILE SYSTEM MARKED DIRTY *****
>>
>> ***** PLEASE RERUN FSCK *****
>
> This seems like a bug in UFS and/or growfs, but I'm not familiar enough
> with either to say any more.
>
> Kirk, are you aware of any issues on FreeBSD 12.1-RELEASE which can cause
> cylinder checksum errors after growfs? (On amd64 if it matters.) If it
> would help I can provide you with SSH access to an affected EC2 instance.
>
> --
> Colin Percival
> Security Officer Emeritus, FreeBSD | The power to serve
> Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid

I have managed to reproduce a similar problem in one of my rather
ancient 12.0 bhyve images that I have lying around:

FreeBSD 12.0-STABLE (GENERIC) #5 r350458M: Sat Oct 26 21:18:51 UTC 2019

The following patch fixes it in that instance. Could you please try
this in the EC2 instance and see if it also resolves your problem?

	Kirk McKusick

=-=-=

Index: sbin/growfs/growfs.c
===================================================================
--- sbin/growfs/growfs.c	(revision 365971)
+++ sbin/growfs/growfs.c	(working copy)
@@ -572,6 +572,7 @@ updjcg(int cylno, time_t modtime, int fsi, int fso
 	if (sblock.fs_magic == FS_UFS1_MAGIC)
 		acg.cg_old_ncyl = sblock.fs_old_cpg;
 
+	cgckhash(&acg);
 	wtfs(fsbtodb(&sblock, cgtod(&sblock, cylno)),
 	    (size_t)sblock.fs_cgsize, (void *)&acg, fso, Nflag);
 	DBG_PRINT0("jcg written\n");
@@ -947,6 +948,7 @@ updcsloc(time_t modtime, int fsi, int fso, unsigne
 	 * Now write the former cylinder group containing the cylinder
 	 * summary back to disk.
 	 */
+	cgckhash(&acg);
 	wtfs(fsbtodb(&sblock, cgtod(&sblock, ocscg)),
 	    (size_t)sblock.fs_cgsize, (void *)&acg, fso, Nflag);
 	DBG_PRINT0("oscg written\n");
@@ -1039,6 +1041,7 @@ updcsloc(time_t modtime, int fsi, int fso, unsigne
 	 * Write the new cylinder group containing the cylinder summary
 	 * back to disk.
 	 */
+	cgckhash(&acg);
 	wtfs(fsbtodb(&sblock, cgtod(&sblock, ncscg)),
 	    (size_t)sblock.fs_cgsize, (void *)&acg, fso, Nflag);
 	DBG_PRINT0("nscg written\n");