Date:      Fri, 28 Oct 2005 15:50:09 +0200
From:      Attila Nagy <bra@fsn.hu>
To:        current@FreeBSD.org
Subject:   Freezes with 6.0 and 7-CURRENT when working with many symlinks/dirs
Message-ID:  <43622C91.5080300@fsn.hu>

Hello,

I have been struggling with this bug for a while now. I have a fully
reproducible freeze with both RELENG_6 and HEAD in amd64 mode (I could
not try i386).

It strikes when I try to synchronise a large pool of
symlinks/directories from another machine to this FreeBSD one.
The total number of files is about 6-10 million.

The freeze occurs at random points, either when rsync deletes a massive
number of symlinks or directories on the local machine, or when it
starts to create them. Either way it freezes, no matter what I do.
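
The workload described above (mass creation, then mass deletion, of
symlinks and directories) can be approximated locally with a sketch like
the following. It is not the rsync run from this report: the function
names, the tree layout and the counts are hypothetical, and whether it
triggers the freeze is untested.

```python
import os
import shutil
import tempfile


def build_tree(root, n_dirs, links_per_dir):
    """Create n_dirs directories under root, each holding links_per_dir symlinks.

    Returns the total number of filesystem entries created.
    """
    created = 0
    for d in range(n_dirs):
        dir_path = os.path.join(root, "d%06d" % d)
        os.mkdir(dir_path)
        created += 1
        for i in range(links_per_dir):
            # Dangling targets are fine: only the symlink entry itself matters.
            os.symlink("target-%d-%d" % (d, i),
                       os.path.join(dir_path, "l%06d" % i))
            created += 1
    return created


def count_entries(root):
    """Count directories and symlinks below root without following links."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def remove_tree(root):
    """Delete everything below root, leaving root itself in place."""
    for name in os.listdir(root):
        shutil.rmtree(os.path.join(root, name))


if __name__ == "__main__":
    # Small numbers for a dry run; the report involves 6-10 million entries.
    scratch = tempfile.mkdtemp(prefix="symlink-stress-")
    print("created", build_tree(scratch, n_dirs=100, links_per_dir=100))
    print("found", count_entries(scratch))
    remove_tree(scratch)
    os.rmdir(scratch)
```

To exercise a particular volume, the scratch directory would have to be
created on it (tempfile.mkdtemp accepts a dir= argument), with the
counts raised toward the numbers given above.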

The machine itself is an HP DL380G4 (two Xeons, HTT on), which has an
additional SmartArray 6402 controller (ciss0 is the SmartArray 6i on the
motherboard and ciss1 is the 6402). I would like to sync onto ciss1;
that's where the activity happens.

By "freeze" I mean the machine stops working: I cannot ping it, ssh
sessions disconnect and the console hangs. I can do two things at this
stage: turning MP_WATCHDOG on catches the hang and enters the debugger,
and issuing an NMI has the same effect (of course :).

I've tried the following to work around or locate the source of this problem:
- turn HTT off
- turn softupdates off
- turn ACPI off (with the beastie menu)
- turn preemption off
- debug.mpsafevfs=0 and debug.mpsafenet=0
- turn dirhash off
all without success.

I have nfsd and quota enabled, but currently the former is not in use.
The synchronised directories and files are owned by many nonexistent
uids (not in /etc/master.passwd) and I have quotas for most of those
uids.

I was able to collect three traces; some of them are a little bit
mangled by the ILO (ssh access to the console).

http://people.fsn.hu/~bra/freebsd/crash-20051028/

crash1 and crash2 are from the in-kernel debugger; crash3 is from after
MP_WATCHDOG fired, followed by call doadump and kgdb kernel /var/crash/vmcore...

Any ideas what else I should try, or what I should do in the debugger to
make it easier to find where the problem is?

Thanks,
-- 
Attila Nagy                                   e-mail: Attila.Nagy@fsn.hu
Adopt a directory on our free software         phone: +3630 306 6758
server! http://www.fsn.hu/?f=brick


