Date: Mon, 3 Jul 95 21:43 CDT
From: uhclem%nemesis@fw.ast.com (Frank Durda IV)
To: bugs@freebsd.org
Subject: State of Problem 389 (and 392)?
Message-ID: <m0sSxxM-0004w1C@nemesis.lonestar.org>

Has anybody looked into problem 389 since it was reported back in May?
This had to do with the filesystem being corrupted by lots of
file/directory deletions and file/directory creations going on at the
same time. You eventually end up with directories that can't be
deleted by rmdir because the link counts are wrong. Then you must run
fsck two or three times to completely straighten things out.

This still happens in 2.0.5R. Two of my client sites are really
bugging me about this, as they clean the filesystems every day and
encounter the residue of this bug. Makes them paranoid.

There was a similar problem with DOS file systems that was reported
under 392 and has apparently been closed, but I see no evidence of it
being fixed. If anyone knows what happened to 392, I'd like to know.

Thanks.

Frank Durda IV
uhclem%nemesis@fw.ast.com

Here is the 389 report again.

>Number:         389
>Category:       bin
>Synopsis:       Simultaneous creation/deletion of dirs corrupts filesystem [FDIV024]
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs (FreeBSD bugs mailing list)
>State:          open
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon May 8 21:20:00 1995
>Originator:     Frank Durda IV
>Organization:
>Release:        FreeBSD 2.0.950412-SNAP i386 and FreeBSD 1.1.5.1
>Environment:

[FDIV024]
FreeBSD 2.0.950412-SNAP i386 (also on 2.0.5R)
Stock kernel, "make world" kernel, or custom kernel.
Problem also noted in FreeBSD 1.1.5.1 on stock and custom kernel.

>Description:

On my 1.1.5.1, I discovered that I frequently ended up with
directories that could not be deleted in my news partition. rmdir
refused to delete the directories because of bad link counts. Running
fsck at least two times would correct the link counts so that the
directories could be deleted.

I recently discovered that I could cause bogus link counts on demand,
simply by trying to remove files and directories while other processes
were trying to create files and directories in the same tree.

In my case, I was doing some rm -rf commands on selected portions of
the newsgroups to obtain space, but at the same time the cnews system
was injecting new articles and re-creating some of the directories I
was deleting.

Note that the partition DOES NOT have to be low on space to create the
problem. I reproduced it on a root filesystem that had 7.7 Meg free
worst case.

I tested the latest snapshot and determined the problem still exists.

>How-To-Repeat:

By using tar and rm I can reproduce the problem on the latest SNAP or
1.1.5.1. In my case, I created a tar file containing about 6 Meg of a
heavily expired alt.* tree using

    cd /usr/spool/news/alt
    tar cvf /tmp/news.tar *

FYI, the alt tree consisted of 538 directories and 1684 files. It
seems more important to have a large number of directories than it is
to have lots of files. Using the news tree provided this, but the
failure can probably be caused by using other distribution trees that
have lots of directories and small files.

Now log in as root on at least two screens of the system to test.

On screen 1,

    cd /
    mkdir test
    cd test

Now, ftp the news.tar file from the remote system to this location.
DO NOT USE /tmp in place of /test! (If you crash - you lose things)

    mkdir scramble
    cd scramble
    tar xvf ../news.tar
    sync

You can fsck here to verify things are sane at this point if you want.

Now that the news tree is extracted, begin to exercise the system. The
numbers indicate which virtual screen to use for the commands:

    1   tar xvf ../news.tar &
    2   rm -rf [l-r]* &
    2   rm -rf [a-k]* &
    2   rm -rf [0-9]* &
    2   rm -rf [s-z]* &

Now monitor on screen 1 until the tar is about half-way through (by
directory), and then repeat all of the above commands.

Now wait until both tars complete and wait for all of the rm's to
finish. Then issue:

    rm -rf *

and note any "Directory not removed..." messages. If the rm finishes
and you didn't get any error messages, start over, and maybe start
three cycles of extract and rm running at once.

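The extract/remove steps above can be sketched as a single script for
anyone who wants to run them unattended. This is only an illustration
under assumptions: churn_tree is a hypothetical helper name, it runs one
extract/remove cycle rather than the overlapping two described above,
and it uses "tar xf" instead of "tar xvf" to keep the output quiet. Give
it a scratch directory you can afford to lose and an absolute path to
the tar file.

```shell
#!/bin/sh
# Sketch of one extract/remove cycle from the How-To-Repeat steps.
# churn_tree is a hypothetical name, not part of the original report.
#   $1 = scratch directory (its contents are destroyed)
#   $2 = absolute path of the tar archive to extract
churn_tree() {
    scratch=$1
    archive=$2
    (
        # Stop if the scratch directory is unusable, so the rm commands
        # below can never run in some other directory.
        cd "$scratch" || exit 2

        # Initial extraction, so the removals have something to delete.
        tar xf "$archive" || exit 2

        # "Screen 1": extract again while the removals are running.
        tar xf "$archive" 2>/dev/null &

        # "Screen 2": remove slices of the tree at the same time.
        rm -rf [l-r]* 2>/dev/null &
        rm -rf [a-k]* 2>/dev/null &
        rm -rf [0-9]* 2>/dev/null &
        rm -rf [s-z]* 2>/dev/null &

        # Let the background tar and every rm finish.
        wait

        # Final cleanup. On an affected system this is where the
        # "Directory not removed" messages would show up, and the
        # non-zero status of rm becomes the function's return value.
        rm -rf ./*
    )
}
```

A call such as "churn_tree /test/scramble /test/news.tar" returns
non-zero when the final rm could not remove something; a zero return
only means this particular run did not trip the problem.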
[WARNING - Doing too many extract/rm pairs at once caused the
processes to hang with no disk I/O. Characters were echoed (for a
while) and CAPS LOCK toggles. Then the system output a message
indicating that syslogd had terminated and that it was syncing disks.
However it just hung there and never halted. This only happened once
and may be related to the VNODE lock problem. I think this
lock/shutdown is unrelated to the problem I am reporting. My systems
have between 8 and 12 Meg of RAM]

Using the above procedure, I eventually ended up with the following
undeletable directories:

    ls -aliR
    total 5
     9032 drwxrwxr-x   4 root  bin   3072 May  8 21:55 .
      142 drwxrwxr-x   3 root  wheel  512 May  8 21:55 ..
    13788 drwxrwxr-x   5 news  news   512 May  8 21:49 politics
    13524 drwxrwxr-x  10 news  news   512 May  8 21:49 society

    scramble/politics:
    total 4
    13788 drwxrwxr-x   5 news  news   512 May  8 21:49 .
     9032 drwxrwxr-x   4 root  bin   3072 May  8 21:55 ..

    scramble/society:
    total 4
    13524 drwxrwxr-x  10 news  news   512 May  8 21:49 .
     9032 drwxrwxr-x   4 root  bin   3072 May  8 21:55 ..

I then sync'ed and halted the system. On reboot, I ran fsck with these
results:

    fsck -y /dev/wd0a
    ** /dev/rwd0a
    ** Last Mounted on /
    ** Root file system
    ** Phase 1 - Check Blocks and Sizes
    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    UNREF DIR I=13581 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:43 1995
    RECONNECT? [yn]
    DIR I=13581 CONNECTED. PARENT WAS I=13524
    UNREF DIR I=13578 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:43 1995
    RECONNECT? [yn]
    DIR I=13578 CONNECTED. PARENT WAS I=13524
    UNREF DIR I=13544 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:43 1995
    RECONNECT? [yn]
    DIR I=13544 CONNECTED. PARENT WAS I=13524
    UNREF DIR I=13792 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:47 1995
    RECONNECT? [yn]
    DIR I=13792 CONNECTED. PARENT WAS I=13788
    UNREF DIR I=13539 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:43 1995
    RECONNECT? [yn]
    DIR I=13539 CONNECTED. PARENT WAS I=13524
    UNREF DIR I=13555 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:43 1995
    RECONNECT? [yn]
    DIR I=13555 CONNECTED. PARENT WAS I=13524
    UNREF DIR I=13536 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:43 1995
    RECONNECT? [yn]
    DIR I=13536 CONNECTED. PARENT WAS I=13524
    UNREF DIR I=9037 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:43 1995
    RECONNECT? [yn]
    DIR I=9037 CONNECTED. PARENT WAS I=13524
    UNREF DIR I=399 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:47 1995
    RECONNECT? [yn]
    DIR I=399 CONNECTED. PARENT WAS I=13788
    UNREF DIR I=4892 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:47 1995
    RECONNECT? [yn]
    DIR I=4892 CONNECTED. PARENT WAS I=13788
    UNREF DIR I=166 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:43 1995
    RECONNECT? [yn]
    DIR I=166 CONNECTED. PARENT WAS I=13524
    ** Phase 4 - Check Reference Counts
    LINK COUNT DIR I=166 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:43 1995
    COUNT 1 SHOULD BE 2 ADJUST? [yn]
    LINK COUNT DIR I=399 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:47 1995
    COUNT 2 SHOULD BE 3 ADJUST? [yn]
    LINK COUNT DIR I=4892 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:47 1995
    COUNT 1 SHOULD BE 2 ADJUST? [yn]
    LINK COUNT DIR I=9037 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:43 1995
    COUNT 1 SHOULD BE 2 ADJUST? [yn]
    LINK COUNT DIR I=13536 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:43 1995
    COUNT 1 SHOULD BE 2 ADJUST? [yn]
    LINK COUNT DIR I=13539 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:43 1995
    COUNT 1 SHOULD BE 2 ADJUST? [yn]
    LINK COUNT DIR I=13544 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:43 1995
    COUNT 1 SHOULD BE 2 ADJUST? [yn]
    LINK COUNT DIR I=13555 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:43 1995
    COUNT 1 SHOULD BE 2 ADJUST? [yn]
    LINK COUNT DIR I=13578 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:43 1995
    COUNT 1 SHOULD BE 2 ADJUST? [yn]
    LINK COUNT DIR I=13581 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:43 1995
    COUNT 1 SHOULD BE 2 ADJUST? [yn]
    LINK COUNT DIR I=13792 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:47 1995
    COUNT 1 SHOULD BE 2 ADJUST? [yn]
    ** Phase 5 - Check Cyl groups
    CLEAN FLAG NOT SET IN SUPERBLOCK
    FIX? [yn]
    924 files, 43271 used, 32792 free (272 frags, 4065 blocks, 0.4% fragmentation)

    ***** FILE SYSTEM WAS MODIFIED *****

    ***** REBOOT NOW *****

Now I re-ran fsck because in the past it always took multiple passes
to really correct the problems:

    fsck -y /dev/wd0a
    ** /dev/rwd0a
    ** Last Mounted on /
    ** Root file system
    ** Phase 1 - Check Blocks and Sizes
    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    ** Phase 4 - Check Reference Counts
    LINK COUNT DIR I=13524 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:49 1995
    COUNT 10 SHOULD BE 2 ADJUST? [yn]
    LINK COUNT DIR I=13788 OWNER=news MODE=40775 SIZE=512 MTIME=May 8 21:49 1995
    COUNT 5 SHOULD BE 2 ADJUST? [yn]
    ** Phase 5 - Check Cyl groups
    924 files, 43271 used, 32792 free (272 frags, 4065 blocks, 0.4% fragmentation)

    ***** FILE SYSTEM WAS MODIFIED *****

    ***** REBOOT NOW *****

Finally, I re-ran fsck a third time:

    ** /dev/rwd0a
    ** Last Mounted on /
    ** Root file system
    ** Phase 1 - Check Blocks and Sizes
    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    ** Phase 4 - Check Reference Counts
    ** Phase 5 - Check Cyl groups
    924 files, 43271 used, 32792 free (272 frags, 4065 blocks, 0.4% fragmentation)

Ok, now here is what the directory looks like now:

    total 6102
      142 drwxrwxr-x   3 root  wheel      512 May  8 22:04 .
        2 drwxr-xr-x  17 root  wheel      512 May  8 21:55 ..
      143 -rw-rw-r--   1 root  wheel      505 May  8 21:55 sample1 *
      145 -rw-rw-r--   1 root  wheel     3135 May  8 22:01 sample2 *
      146 -rw-rw-r--   1 root  wheel      588 May  8 22:02 sample3 *
      147 -rw-rw-r--   1 root  wheel      297 May  8 22:02 sample4 *
      148 -rw-rw-r--   1 root  wheel        0 May  8 22:04 sample5 *
     9032 drwxrwxr-x   4 root  bin       3072 May  8 21:55 scramble
      144 -rw-rw-r--   1 root  wheel  6225920 May  8 21:55 news.tar

    ./scramble:
    total 5
     9032 drwxrwxr-x   4 root  bin       3072 May  8 21:55 .
      142 drwxrwxr-x   3 root  wheel      512 May  8 22:04 ..
    13788 drwxrwxr-x   2 news  news       512 May  8 21:49 politics
    13524 drwxrwxr-x   2 news  news       512 May  8 21:49 society

    ./scramble/politics:
    total 4
    13788 drwxrwxr-x   2 news  news       512 May  8 21:49 .
     9032 drwxrwxr-x   4 root  bin       3072 May  8 21:55 ..

    ./scramble/society:
    total 4
    13524 drwxrwxr-x   2 news  news       512 May  8 21:49 .
     9032 drwxrwxr-x   4 root  bin       3072 May  8 21:55 ..

* are the "tee" logs of fsck and ls for the bug report. They were
written to a different partition and moved back to this location after
the fscks completed and the system was rebooted.

At this point, "politics" and "society" could be deleted with rmdir.
(The directories and their files reconnected by fsck land in
lost+found.)

>Fix:

Not known.

*END*

>Audit-Trail:
>Unformatted:

*END2*

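As a cross-check on the logs in this report, the stale link counts can
be derived from the first fsck pass. A directory's link count is
normally 2 plus one for the ".." of each subdirectory, so a parent that
lost N directory entries while its children survived would still show
2 + N. The sketch below counts the "PARENT WAS I=<n>" messages per
parent inode and adds 2; stale_links is a hypothetical helper name, and
it assumes the fsck output has one message per line. It only restates
the arithmetic in the logs and says nothing about the cause.

```shell
#!/bin/sh
# stale_links (hypothetical name): read fsck output on stdin and, for
# each parent inode named in a "... CONNECTED. PARENT WAS I=<n>" line,
# print "<parent inode> <2 + number of reconnected children>".
stale_links() {
    awk '
        /CONNECTED\. PARENT WAS I=/ {
            sub(/.*PARENT WAS I=/, "")   # keep only what follows "I="
            sub(/[^0-9].*/, "")          # drop anything after the number
            kids[$0]++                   # one more orphan for this parent
        }
        END {
            for (ino in kids)
                print ino, kids[ino] + 2
        }
    ' | sort -n
}
```

Fed the first fsck log from this report, it counts 8 directories whose
parent was I=13524 and 3 whose parent was I=13788, giving 10 and 5.
Those match the link counts that ls showed for society and politics
before the repair, and the "COUNT 10 SHOULD BE 2" and "COUNT 5 SHOULD
BE 2" adjustments made by the second fsck pass.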