Date: Mon, 24 Jan 2000 15:50:28 -0600 (CST)
From: Sean Heber <sean@fifthace.com>
To: freebsd-questions@freebsd.org
Subject: Update regarding stuck file systems
Message-ID: <Pine.BSF.4.10.10001241456160.2386-100000@marvin.fifthace.com>

OK, you may remember my previous e-mail about this a few days ago. I have since done a LOT of testing. I don't have much of a conclusion (which is why I'm writing again).

As you may recall, my system had an odd problem. If I ran my backup script (which tars files on one hard drive and puts them on another hard drive), all file system access stopped. The box would still be up and top would still be running on the console, but nothing would work because the OS couldn't seem to read from the drive. The kicker, though, is that there are no error messages. Nothing in the logs. Nothing on the console. It would just stop, and the processes would happily wait for data from the drives, but none would ever come.

So, after a whole lot of swearing and Dew drinking, I have narrowed it down only slightly. It seems that for some reason this only happens around 1:00 - 2:30 AM or so. Never any other times. For example, as I write this a backup is being performed. For testing purposes I've been running one backup after another since 8:00 AM (it's 3:30 PM now). No problems at all. I can't think of any reason why this would fail in the early morning hours and never any other time.

It's not uptime related, since just yesterday I had the box up and down (while testing this) and everything was going great. When I tried to run the backup again around 1:30 AM, it died. I was forced to hit the reset button. Once the system came back up, I figured I would try to narrow things down more. So, I unloaded vinum on my two IDE backup drives (see below), reformatted one and gave it the same mount point. (So the backup would still work. I don't need all that space just yet.) Once that was done, vinum was not loaded and I gave it another shot. The backup froze again. The box had only been up about 30 minutes.

The first night I set up the backup process, I put it at the end of my daily.local cron script. It runs at 1:59 or something like that. Before that time, the box was up for 2 days.
That first night brought it down with a frozen file system. The night after, I gave the backup script its own entry in crontab for 3:00 AM. It worked just fine. When I woke up in the morning things still worked. Just the other night I changed the cron run time to 12:05 AM. That also made it through the night just fine.

Does any of this make any sense? It doesn't to me. I suppose I have two basic questions here:

1) Is there any way to make this work aside from the obvious "Don't run it between 1:00 and 2:30 AM"? Because this really bothers me. I have no idea if heavy server load would cause this to happen or if this is just a backup problem due to something stupid I'm doing.

2) I really need a better backup method. The idea originally was to have a duplicate structure on the backup drive as well as the main drive so that in the event of a disk failure the broken drive could just be unplugged. Is that reasonable? Obviously using tar the way I am doesn't really allow this. The catch (at least it seems like one to me) is that the drives are all different sizes. (see below)

OK, the famed "below":

Running FreeBSD 3.3-RELEASE (I had 3.4-STABLE before. Don't ask. Long story. But the problem is still the same in either case.)
SMP Kernel
256 MB RAM
Dual PII-400MHz
Currently sitting in my room with no other active users and no outside activity via web or anything (it's still being configured, after all)

Drives:
SCSI id6:       4.5 GB (boot: /, /usr, swap)
SCSI id9:       9.0 GB (backup: /eddie)
IDE bus1master: 37 GB (data: /sites)
IDE bus1slave:  none
IDE bus2master: 25 GB (backup1)
IDE bus2slave:  20 GB (backup2)

The last two backup drives are concatenated using vinum and mounted as /wowbagger. The idea is that everything on the boot SCSI drive could be on the backup SCSI drive, and the same for the IDE. The layout is like this because our original plan was to have the ability to unplug the broken drive and get things back up with minimum pain.
But using tar sort of defeats the purpose, which is why I would like some more suggestions. :-)

The backup script does this right now:

    echo "Backup /:"
    tar -cslpf /eddie/root.tar /
    echo

    # Backup by itself to be handy, maybe.
    echo "Backup /usr/local:"
    tar -clspf /eddie/usr.local.tar /usr/local
    echo

    echo "Backup all of /usr:"
    tar -clpsf /eddie/usr.tar /usr
    echo

    echo "Backup /sites:"
    tar -clpsf /wowbagger/sites.tar /sites
    echo

Make sense? One thing I just realized, though, is that I might hit that famed 2GB file limit. I imagine FreeBSD is prone to this? Oh well. I need a better method anyway.

Just so you know, here's the current df:

    Filesystem         1K-blocks     Used    Avail Capacity  Mounted on
    /dev/da0s1a            99183    45741    45508    50%    /
    /dev/da0s1e          3713364   507654  2908641    15%    /usr
    /dev/da1s1e          8679993  1227161  6758433    15%    /eddie
    /dev/wd0s1e         35503710   449097 32214317     1%    /sites
    /dev/vinum/vinum0   43643010   996729 39154841     2%    /wowbagger
    procfs                     4        4        0   100%    /proc

As you can see, none of the partitions being backed up holds more than 2GB of data, so that shouldn't be the problem right now.

Anyway, I'm looking for some input here. It's very, very hard to make this problem happen. I can try all day and nothing will come of it, but wait until 1:30 AM or so, and it happens almost (key word) every time. Is something deadlocking? Perhaps something to do with SMP? Or am I doing something terribly stupid? (feel free to flame.. I need to learn sometime, right? :-)

I hope someone has a clue of where to start digging, at least. The last e-mail generated one response. The person suggested I try removing drives one by one from the equation. I'm going to attempt that tonight in more detail. The problem is, setting the clock to 1:30 AM myself doesn't seem to matter. Maybe it's tied to the BIOS time... Or perhaps it's not time related at all and it's just really, really coincidental that it happens around that time all the time, regardless of how long the box was up, how hot it is, etc.

l8r
Sean

PS> ARG!!!!
(This has been driving me nuts for the past 4.5 days now)

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message
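
On question 2 (a duplicate directory structure on the backup drive rather than a .tar file), one common approach is to pipe one tar into another so the files land on the backup drive as an ordinary tree. What follows is only a sketch under assumptions, not a tested fix for the freeze: the function name mirror_tree and the destination directory are made up for illustration, and unlike the script above it passes no option to stay on one file system, so it should not be pointed at / as written.

```shell
#!/bin/sh
# mirror_tree SRC DST  (hypothetical helper, not part of the original script)
# Copies the contents of SRC into DST as a plain directory tree.
mirror_tree() {
    src=$1
    dst=$2
    # Create the destination if it does not exist yet.
    mkdir -p "$dst" || return 1
    # The first tar writes an archive of SRC to stdout; the second tar
    # unpacks that stream inside DST. -p keeps permissions on extraction.
    (cd "$src" && tar -cf - .) | (cd "$dst" && tar -xpf -)
}
```

A call such as `mirror_tree /sites /wowbagger/sites` (the destination path is an assumption) would leave individual files on the backup drive instead of a single archive, so no single large archive file is ever written.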