From: bugzilla-noreply@freebsd.org
To: stable@FreeBSD.org
Subject: [Bug 228174] [dump] dump(8) can read garbage and loop forever
Date: Fri, 11 May 2018 22:21:20 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=228174

            Bug ID: 228174
           Summary: [dump] dump(8) can read garbage and loop forever
           Product: Base System
           Version: 11.1-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: stable@FreeBSD.org
          Reporter: eugen@freebsd.org
                CC: fs@FreeBSD.org, longwitz@incore.de, mckusick@FreeBSD.org

Hi!

I have several FreeBSD boxes with UFS2 and soft-updates enabled (no SU+J)
mounted as (ufs, local, soft-updates).
They periodically run dump(8) to make level-zero backups once a week and
level-one incremental backups on the other days of the week. The command
line is like:

dump -$level -C 32 -h 0 -au -f - /home

That is, no snapshots are used and the cache size is big. Sometimes it
finishes just fine. Sometimes its four children start to eat CPU at max and
loop forever, making no progress and never finishing until killed manually.

I've rebuilt /sbin/dump with debugging options -O0 -g and installed the
unstripped binary, and when the problem reproduced, I ran

sysctl kern.corefile=/var/tmp/%N.%P.core && killall -QUIT dump

So I have five core files with debug info.

I've dug into it a bit and I think I found the problem. Here is what I get
for FreeBSD 11.1-STABLE/amd64 r332356 and a file system having 41943040
sectors, 512 bytes per sector and standard bsize=32768, fsize=4096:

Core was generated by `dump: /dev/mirror/gm0s1g: pass 4: 60.53% done, finished in 0:03 at Fri May 11 03'.
Program terminated with signal 3, Quit.
Reading symbols from /lib/libc.so.7...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x000000080098bfea in _read () from /lib/libc.so.7
(gdb) bt
#0  0x000000080098bfea in _read () from /lib/libc.so.7
#1  0x0000000000408806 in atomic (func=0x4021f4, fd=5, buf=0x7ffffffedb44 "", count=4)
    at /usr/local/src/sbin/dump/tape.c:877
#2  0x0000000000407d75 in flushtape () at /usr/local/src/sbin/dump/tape.c:245
#3  0x000000000040814b in dumpblock (blkno=3906360519982919218, size=32768)
    at /usr/local/src/sbin/dump/tape.c:188
#4  0x000000000040cd38 in ufs2_blksout (dp=0x8011a9fc0, blkp=0x7ffffffedc58, frags=679, ino=522029, last=1)
    at /usr/local/src/sbin/dump/traverse.c:704
#5  0x000000000040cfea in dmpindir (dp=0x8011a9fc0, ino=522029, blk=5832576, ind_level=0, size=0x7fffffffdd58)
    at /usr/local/src/sbin/dump/traverse.c:609
#6  0x000000000040c322 in dumpino (dp=0x8011a9fc0, ino=522029)
    at /usr/local/src/sbin/dump/traverse.c:573
#7  0x0000000000404cd1 in main (argc=0, argv=0x7fffffffec60)
    at /usr/local/src/sbin/dump/main.c:579

Note the insanely large blkno in frame 3. I think the problem is in the
dmpindir() function, frame 5.

Note ino=522029. It corresponds to a large CSV text file this box produces
periodically; "find /home -inum 522029 -ls" shows its pathname, and I can
read its contents to the end with "less" just fine. However:

(gdb) frame 5
#5  0x000000000040cfea in dmpindir (dp=0x8011a9fc0, ino=522029, blk=5832576, ind_level=0, size=0x7fffffffdd58)
    at /usr/local/src/sbin/dump/traverse.c:609
(gdb) l dmpindir
580      * Read indirect blocks, and pass the data blocks to be dumped.
581      */
582     static void
583     dmpindir(union dinode *dp, ino_t ino, ufs2_daddr_t blk, int ind_level,
584         off_t *size)
585     {
586             union {
587                     ufs1_daddr_t ufs1[MAXBSIZE / sizeof(ufs1_daddr_t)];
588                     ufs2_daddr_t ufs2[MAXBSIZE / sizeof(ufs2_daddr_t)];
589             } idblk;
590             int i, cnt, last;
591
592             if (blk != 0)
593                     bread(fsbtodb(sblock, blk), (char *)&idblk,
594                         (int)sblock->fs_bsize);
595             else
596                     memset(&idblk, 0, sblock->fs_bsize);
597             if (ind_level <= 0) {
598                     if (*size > NINDIR(sblock) * sblock->fs_bsize) {
599                             cnt = NINDIR(sblock) * sblock->fs_frag;
(gdb) p blk
$1 = 5832576
(gdb) p idblk.ufs2[0]
$2 = 3329910757767148590

The code reads garbage from the file system with bread() (line 593) and uses
it later without any sanity checks.
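
For illustration, the kind of sanity check that seems to be missing could
look like the sketch below. It is not the actual dump(8) code and not a
proposed patch: struct fs_geom, blkptr_is_sane() and count_bad_blkptrs() are
hypothetical names, and the sketch assumes that block pointers are fragment
addresses bounded by the superblock's fs_size. With the geometry reported
here, 41943040 sectors of 512 bytes and fsize=4096 give 5242880 fragments,
and bsize/fsize gives 8 fragments per block.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the two superblock values the check needs.
 * Assumption: size_frags mirrors fs_size (counted in fragments) and
 * frags_per_blk mirrors fs_frag.
 */
struct fs_geom {
	int64_t size_frags;	/* total fragments in the file system */
	int32_t frags_per_blk;	/* fragments per full block */
};

/*
 * Return true if blkno is plausible as the address of a full block.
 * Zero is accepted because it denotes a hole in a sparse file.
 */
static bool
blkptr_is_sane(const struct fs_geom *g, int64_t blkno)
{
	if (blkno == 0)
		return (true);
	if (blkno < 0)
		return (false);
	/* The whole block must end inside the file system. */
	return (blkno <= g->size_frags - g->frags_per_blk);
}

/*
 * Count the entries of an indirect block that fail the check, so that a
 * caller could refuse to follow them instead of passing them on.
 */
static int
count_bad_blkptrs(const struct fs_geom *g, const int64_t *ptrs, int n)
{
	int i, bad = 0;

	for (i = 0; i < n; i++)
		if (!blkptr_is_sane(g, ptrs[i]))
			bad++;
	return (bad);
}
```

With struct fs_geom g = { 5242880, 8 }, blkptr_is_sane(&g,
3329910757767148590LL) returns false, so a value like the one in
idblk.ufs2[0] would be caught before it reaches dumpblock(). What to do with
a rejected pointer (skip the block, re-read it, abort the dump) is a separate
decision that this sketch does not address.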