Date: Fri, 20 Aug 1999 11:13:13 -0700 (PDT)
From: Julian Elischer <julian@whistle.com>
To: fs@freebsd.org
Subject: Re: BUG in 3.2 fsck! (fwd)
Message-ID: <Pine.BSF.3.95.990820111248.1212A-100000@current1.whistle.com>
further discussion...

---------- Forwarded message ----------
Date: Fri, 20 Aug 1999 02:57:25 -0700 (PDT)
From: milt <milt@vicor-nb.com>
To: julian@whistle.com, milt@vicor-nb.com
Cc: cayford@vicor-nb.com, conor@vicor-nb.com, davep@vicor-nb.com, daver@vicor-nb.com, jrh@vicor-nb.com
Subject: Re: BUG in 3.2 fsck!

HIYA

Well, soft updates sure sounds interesting. One of our current problems is that our damn RAIDs don't preserve the disk write order requested by Unix. I'm fighting that - maybe soft updates will give me enough ammunition to win that fight. If not, soft updates won't help us! I haven't read the soft updates paper yet - I will, but not tonight.

One question: I am not sure how soft updates are intended to interact with fsck. Is it your intention that fsck behave differently only in preen mode? If so, you have misspelled at least one if statement (the LINK COUNT INCREASING test in dir.c::adjust).

I am writing this now because I want to let you see my current status for fsck fixes before you install my previous patch. I will repeat all of this in later mail (with test instructions yet) once I figure out what the heck I want to do.

What I have right now is:

BUG 1: When lost+found is allocated on a new highest block number for a cylinder group, it ends up without an inoinfo entry and will be flagged available during pass 5. This is the one in my previous mail.

BUG 2: When an orphan directory happens to start with a parent pointer to an inode which will become a newly allocated lost+found, the loop in pass 2 will skip the i_dotdot update because it points to a USTATE inode, but pass 3 will unwind the update which wasn't done, because it unwinds i_dotdot for everything it connects! (The inode isn't USTATE anymore because it's now lost+found's inode.)

BUG 3: It has become virtually impossible to learn things from redirected output. Some lines go partially to stderr and partially to stdout, with disastrous results even when both stdout and stderr are redirected!
What I am running right now is an fsck that does not mention stderr. (2.2.8's fsck mentions stderr only for fatal setup problems - that works too, but it requires less thought to just eliminate stderr.)

BUG 4: When Milt's new code puts over 32768 files in lost+found, it is committing a grave error (di_nlink is a signed, 16-bit quantity). Milt better get his act together before he publishes this. NOTE: there is no problem with allocation or extra passes here. fsck has long been allocating disk pages as it extends lost+found.

NEW FEATURE: a q switch which suppresses output for, and questions about, things that would be (or are) fixed in preen mode. When q is in effect, preen mode fixes just happen - no notification to the operator and no questions. This allows us to get a screen which shows only the interesting errors. The preen mode problems get fixed quietly and only the serious stuff ends up in the operator's face or in the redirected output file! My original intention was to have preen mode keep running after some errors, but I now understand why you thought that would be hard. This new switch achieves my goal of seeing only the real problems and is lots easier to implement.

DISCUSSION:

As you can deduce from my discovery of bug 4, I really am having lots of fun testing all this junk.

Current solution to bug 2 is to update the i_dotdot count even in USTATE inodes during pass 2. That causes lost+found to come out right but precludes adding inodes in midstream, invalidating my previous patch for bug 1. Currently, I am pre-allocating one extra inoinfo slot per cylinder group (which prevents bug 1) and updating USTATE counts (which fixes bug 2).

I realized that bug 4 was out there only a few minutes ago. Two solutions occur to me:

a. Switch to a new directory under a different name when lost+found has 32760 entries.

b. Bag it and claim lost+found is full when it has 32760 files in it.
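
To make the limit behind bug 4 concrete, here is a minimal sketch of the kind of guard that either solution would need. It assumes only what the message states (a signed 16-bit link count and a cutoff of 32760). The names LF_SOFT_LIMIT and lf_can_add_link are hypothetical and are not taken from fsck; the code checks a plain counter and does not touch any real inode structure.

```c
#include <stdint.h>

/* Hypothetical cutoff, taken from the "32760 entries" figure in the message.
 * It sits a little below INT16_MAX (32767), the largest value a signed
 * 16-bit link count can hold. */
#define LF_SOFT_LIMIT 32760

/* Hypothetical helper: returns 1 if one more link may be added while
 * staying at or below the cutoff, 0 otherwise. A negative count is
 * treated as already overflowed. */
static int
lf_can_add_link(int16_t nlink)
{
	return nlink >= 0 && nlink < LF_SOFT_LIMIT;
}
```

Under solution a, a 0 result would be the point at which to switch to a new directory; under solution b, it would be the point at which to report lost+found as full.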
With 5 to 8 million files/directories in a file system, 32760 isn't very many, so I am not enthused about b. On the other hand, I can't think of a fix for bugs 1/2 which is compatible with solution a. So, I think I'll go to bed!

Hmmm, pondering and rereading this, an interesting possibility occurs to me. On a bad hardware day, it would help if we put each fsck run in a different lost+found directory (lost+found.01, lost+found.02, etc.). fsck would ALWAYS allocate a new lost+found, and if you had multiple crashes on one day it would be easier to tell which lost+found files should be recovered to where. (We really do have tools to recover these beasties and are working on improving them.) Which makes the unimplementable solution a more interesting!

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message
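
The per-run naming floated at the end of the message (lost+found.01, lost+found.02, etc.) could be sketched as below. This is only an illustration of the naming pattern: the helper name lf_run_name, the two-digit range of 1 to 99, and the error convention are all assumptions, and nothing here reflects how fsck actually allocates or names lost+found.

```c
#include <stdio.h>

/* Hypothetical helper: writes "lost+found.NN" for the given run number
 * into buf. Returns 0 on success, or -1 if the run number is outside
 * the assumed range 1..99 or the buffer is too small. */
static int
lf_run_name(char *buf, size_t buflen, int run)
{
	int n;

	if (run < 1 || run > 99)
		return -1;
	n = snprintf(buf, buflen, "lost+found.%02d", run);
	if (n < 0 || (size_t)n >= buflen)
		return -1;	/* output error or truncated name */
	return 0;
}
```

A caller would try run numbers in order until it finds a name that does not yet exist in the file system root.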