Date: Wed, 18 Mar 2015 12:29:03 +0100
From: Polytropon
To: CK
Cc: freebsd-questions@freebsd.org
Subject: Re: thrashing + lost files
Message-Id: <20150318122903.05e189f8.freebsd@edvax.de>
In-Reply-To: <0M8Nme-1ZTz3v3hkQ-00vvpg@mail.gmx.com>

On Wed, 18 Mar 2015 03:05:28 -0800, CK wrote:
> > On Tue, 17 Mar 2015 23:56:25 -0800, CK wrote:
> > > I would like any thoughts or ideas on how to prevent the following problem,
> > > because it is making my computer completely unusable, wasting many efforts.
> > > I am using this mail-list because freebsd.forums.org has become completely
> > > unusable to those with dial-up connections, requiring 10 seconds for each
> > > character typed ... no exaggeration.
> >
> > A common reply would be: "Who still uses dial-up anyway?" ;-)
>
> 10s of millions in the USA. High-speed internet is way too expensive,
> over $100/mo where I live. Over $1200/yr. Easily 5-10% of take-home
> salary for many minimum wage workers.

Here in Germany, many people believe that the USA is a "technology
utopia", a "magical wonderland" where people earn high wages and have
the fastest Internet in the world... :-)

> > > The result is the loss of many critical files from a hard drive, as if
> > > "rm *" was done in the home directory. This occurs after the thrashing when
> > > Xwindow is accidentally shut down with Opera open with many javascript page
> > > tabs, eg, being a memory pig - consuming 1/2 of RAM (256M), which after
> > > dumping core, writes a large amount of data (crashlog) even after Xwindow
> > > is down:
> > >
> > > pid 1118 (opera), uid 1001: exited on signal 11 (core dumped)
> >
> > I thought Opera would simply write a core dump, well, still
> > several 100s of MB though...
>
> Interestingly, the core dump was deleted out of the home directory. I caught a
> quick glimpse of it doing "ls" before it was deleted. As I said, it was
> exactly like "rm *". Dot files were left intact.

Oh, that's surprising! I also had that experience once: home directory
empty (!) _except_ for dot files (and other directories), just like
"rm *" had been issued... very strange...

> At first, I thought it was a bug with journaling/soft-updates, so I disabled
> those things with tunefs (to the best of my memory). But now it has happened
> again.

I can't imagine it has to do with that. Massive file loss can appear
when a directory inode has been damaged.
Then fsck will remove the directory altogether. But it is possible to
rescue the files' _content_, because fsck reconnects those files into
lost+found/, named after their (orphaned) inode numbers. So their names
are lost, but their content is kept.

> The drive was being written to for about 1 minute by the Opera
> crashlog/coredump. About 45 seconds after Xwindow was already down.

This kind of crash indicates a significant problem. Are you sure the
drives are fully intact? Check with "smartctl -a" just to be sure. And
even if it sounds stupid: check the cables.

> > > FSCK RESULTS:
> > > ------------
> > > Of interest, is that each time fsck was run, more files were lost!
> > >
> > > # fsck -t ufs -p /dev/ada0p6.eli
> > > /dev/ada0p6.eli: NO WRITE ACCESS
> > > /dev/ada0p6.eli: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
> >
> > This message should alert you. Don't just preen the disk.
> > In this mode, only a subset of errors will be detected,
> > and not all of them can be corrected. You should actually
> > perform
> >
> > # fsck -t ufs -f /dev/ada0p6.eli
>
> Thanks, I didn't think of using the -f option.

The -f option *f*orces a *f*ull check. You can even run the command
twice. The second run should then report no errors, and the file system
stays marked clean.

> After reading a paper by
> Marshall McKusick on fsck, it was my understanding that "preen mode" only
> fixed errors that could be fixed with 100% accuracy.

I also read that famous paper to gain a better understanding of how UFS
works and what fsck does. Data loss teaches you a lot of fundamental
knowledge. :-)

> > There are several errors shown:
> >
> > > INCORRECT BLOCK COUNT I=2327435 (8 should be 0)
> > > [...]
> > > UNREF FILE I=2327428 OWNER=abc MODE=100600
> > > [...]
> > > UNREF FILE I=2327439 OWNER=abc MODE=100600
> > > [...]
> > > FREE BLK COUNT(S) WRONG IN SUPERBLK
> > > [...]
> > > SUMMARY INFORMATION BAD
> > > [...]
> > > BLK(S) MISSING IN BIT MAPS
>
> I lost about 8 files, a lot of legal research/work, in case that is what the
> (8 should be 0) is citing.

The question is: Is the data still there? Just because the file is gone
(that is, its inode entry), this does not imply that the data is no
longer on the disk. Everything is on the disk as long as it hasn't been
overwritten.

When I found out that one of my files (which I had worked on for a whole
day) was gone (0 bytes) after a freeze + reboot + fsck, I immediately
forced a read-only mount of the /home partition and grepped the partition
for some text fragment I could remember. I found the block it was in,
dumped that block, and trimmed it to become the original file again. The
data wasn't lost; it was fully intact, but no longer referenced (!).

> > Unmount the partition, let fsck do its job. :-)
>
> fsck -t ufs -f /dev/ada0p6.eli only reported that
> everything was clean.

So at _this_ point in time the file system was consistent.

Do you maybe have background_fsck="YES" in /etc/rc.conf? Set it to
background_fsck="NO". Always perform file system checks _prior_ to
accessing a file system r/o or even r/w. This may take some time, but
you have to find a trade-off between time and data safety that reflects
your priorities. :-)

> > Copy files to a different disk (or maybe even external storage,
> > such as USB sticks) temporarily, just to be sure.
>
> Yes, I do this of course, with a USB SDRAM device. But I still lose days of
> work, because I can't back up every minute.

You could automate this. On the other hand, when a crash occurs, it
might also affect the backup process and its results.

> This should not happen at all.

Yes, it sounds very unusual.

> I have used FreeBSD for 20 years, since 1995, and I never had problems like
> this before - and I have the same hardware since 2003, which I ran FreeBSD
> 4.11 on until recently. But only now does this problem occur. Certainly,
> there is a bug somewhere.
> My gut feeling is that something is allowing Opera to do
> things it should not do, or something in the filesystem layers is breaking
> under the stress of Opera's crash dumps.

I'd think it's somewhere filesystem-related. I have tortured Opera with
approx. 100 tabs open with "Flash" content and JS stuff in it. No crash,
it just started swapping heavily. Sometimes I can get Opera to crash, but
it successfully "resumes". However, when my system freezes (due to a
faulty GPU) while Opera is running, sometimes the bookmarks are lost.
That's why I tend to copy them to ~/ from time to time, just to be sure.
In a few cases, the Opera settings have also been reset. A copy of
~/.opera is helpful.

Maybe it's just program design that got worse: first reading a file into
memory, then keeping that file open, maybe modifying it, or not, and upon
program exit, writing the memory content back to the file. When normal
program termination is not reached, a damaged or empty file is left
behind. I have no idea what makes people write software that way, but it
seems to be "modern" now...

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
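The recovery approach described in this thread (search the partition for a remembered text fragment, then dump the block containing it) can be sketched in Python. This is a minimal, hypothetical helper, not the tool used in the thread; the names find_fragment and dump_block are made up for illustration. It assumes you search the decrypted GELI provider (/dev/ada0p6.eli, not the encrypted /dev/ada0p6) or, preferably, an image of it copied with dd(1), and it assumes a UFS block size of 32 KiB, which you should check against the bsize value that dumpfs(8) reports for your file system.

```python
import sys

# Assumption: UFS block size of 32 KiB; verify with dumpfs(8) ("bsize").
BLOCK_SIZE = 32768


def find_fragment(path, fragment, chunk_size=1 << 20):
    """Yield absolute byte offsets at which `fragment` occurs in `path`.

    The file (a device node or a disk image) is read in chunks; the last
    len(fragment) - 1 bytes of each chunk are carried over so that a match
    spanning a chunk boundary is still found, and found only once.
    """
    if not fragment:
        raise ValueError("fragment must not be empty")
    overlap = len(fragment) - 1
    base = 0      # absolute offset of buf[0] within the file
    buf = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf += chunk
            i = buf.find(fragment)
            while i != -1:
                yield base + i
                i = buf.find(fragment, i + 1)
            # Keep only a tail that is too short to hold a complete match.
            keep = min(overlap, len(buf))
            base += len(buf) - keep
            buf = buf[len(buf) - keep:]


def dump_block(path, byte_offset, block_size=BLOCK_SIZE):
    """Return the block-aligned chunk of block_size bytes containing byte_offset."""
    start = (byte_offset // block_size) * block_size
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(block_size)


if __name__ == "__main__" and len(sys.argv) == 3:
    device, text = sys.argv[1], sys.argv[2].encode()
    seen = set()
    for off in find_fragment(device, text):
        start = (off // BLOCK_SIZE) * BLOCK_SIZE
        print(f"match at byte {off} (block starting at {start})")
        if start not in seen:
            seen.add(start)
            # Hypothetical output naming: one file per matching block.
            with open(f"block_{start}.bin", "wb") as out:
                out.write(dump_block(device, off))
```

Saved under a hypothetical name such as findfrag.py, it would be run as `python3 findfrag.py /path/to/image "some remembered phrase"`. It only reads the source and writes the dumped blocks to the current directory, so that directory should be on a different disk. A UFS file's data can be spread over several blocks and fragments, so each dumped block usually still has to be trimmed by hand, as described in the thread.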