Date:      Thu, 05 Apr 2007 03:44:27 -0700
From:      "Chris H." <chris#@1command.com>
To:        freebsd-stable@freebsd.org
Subject:   Re: NFS == lock && reboot
Message-ID:  <20070405034427.7apn8a1lc8s4wkok@webmail.1command.com>
In-Reply-To: <200704050800.l35805AQ086224@lurza.secnetix.de>
References:  <200704050800.l35805AQ086224@lurza.secnetix.de>

Quoting Oliver Fromme <olli@lurza.secnetix.de>:

> Chris H. <chris#@1command.com> wrote:
> > Oliver Fromme wrote:
> > > [...]
> > > However, I don't think that your actual problem (lock-up
> > > and panics) is related to rpc.lockd or rpc.statd.  It
> > > rather sounds like something else is wrong with your
> > > machine.  NFS works perfectly fine for me, including
> > > copying huge files.
> > >
> > > You wrote that you had a lot of crashes that accumulated
> > > many files in lost+found.  Well, maybe your filesystem
> > > was somehow damaged in the process.  It is possible to
> > > damage file systems in a way that can lead to panics, and
> > > it's not necessarily detected and repaired by fsck.
> >
> > Indeed. I /too/ considered this. However, I largely dismissed this
> > as a possibility, as almost all of them are zero-length. The others
> > are fragments of logs. I'm not /completely/ ruling this out, though.
>
> The files in lost+found aren't the problem.  The problem
> is the things that you cannot see, and fsck won't move
> those to lost+found.
>
> In particular, if you use softupdates on drives that have
> write-caching enabled, or on drives that illegally cache
> data even if it's disabled (be it intentionally or because
> of bugs in the firmware), it's almost guaranteed that the
> FS will take damage beyond repair on a crash, and even more
> so after several crashes.
>
> Another potential cause of problems is the background fsck
> feature in FreeBSD 6.  I'm not sure if it has been fixed
> in 6-stable, maybe it has.  I don't want to spread FUD.
> But in the past, if a machine crashed and rebooted during
> a background fsck, that was almost a guarantee for damage
> beyond repair, too.  That's why I always disable background
> fsck on my machines.  (Let me repeat:  It _might_ be fixed
> in 6-stable, I don't know.  I haven't seen a definitive
> confirmation of it being fixed on the mailing lists so
> far.  If somebody knows otherwise, please correct me.)
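
For readers following along, the two precautions Oliver describes map onto ordinary FreeBSD settings. This is a minimal sketch that assumes ATA disks handled by ata(4), whose `hw.ata.wc` loader tunable controls write caching on FreeBSD 5.x/6.x. SCSI disks manage their cache differently, so check the documentation for your hardware. Disabling the write cache costs write performance, and, as Oliver notes, some drives may cache anyway.

```sh
# /boot/loader.conf: disable ATA write caching (ata(4) tunable, read at boot)
hw.ata.wc="0"

# /etc/rc.conf: after an unclean shutdown, run fsck in the foreground
# before filesystems are mounted, instead of in the background
background_fsck="NO"
```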

Greetings, and thank you for your thoughtful reply.
Understood on all points. As mentioned, I wasn't /completely/
ruling that out. I have always refused to permit background fsck,
/not/ because of any lack of faith in FBSD (frankly, I have nothing
/but/ faith, perhaps more than I ought to), but rather because I
insist on keeping tabs on what's going on /at all times/. So, should
the system crash, shut down, or halt for any reason, the BIOS keeps
it in a "shutdown" state if it gains control. In the case of a kernel
reboot/crash, the loader simply sits and waits for my confirmation
before starting the system. That way I am always guaranteed the
opportunity to start in single-user mode and respond to any anomalies
the system reports with an affirmative/negative.
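
The "loader waits for confirmation" behaviour can also be set in software. This is a minimal sketch that assumes the stock FreeBSD boot loader. According to loader.conf(5), setting `autoboot_delay` to "NO" suppresses the automatic boot, so the loader stops and waits. Boot-menu handling has varied between releases, so check this against your version's man page.

```sh
# /boot/loader.conf: do not boot automatically; wait at the loader prompt
autoboot_delay="NO"
```

At the loader prompt, `boot -s` starts the system in single-user mode, and `boot` continues with a normal multi-user start.
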
So, in summary, I am /not/ completely ruling out your suggestion that
irreparable damage has been done as a result of the multitude of crashes
imposed upon it. I am also grateful to you for taking the time to share
your experiences and insight with me. I simply haven't found anything
/definitive/ yet. Kris might argue here that NFS seems to be working
fine for everyone else, which would also lend credence to your theory.
Both of you may indeed be correct. :)
I just think it'd be worth the time to follow through, set up a dump
device, and crash it to find the /definitive/ reason for this. It may
in fact turn out to be some obscure, near-impossible anomaly in the NFS
code that /I/ was just (un)lucky enough to stub my toe on. :)
At any rate, as this is a production server, and a /really/ busy one
at that, I want to get a (confirmed) good backup off of it before
willingly bashing it any further. It currently serves the largest
Netscape browser client archive on the net: all the 0.x - 4.x series
browser clients. You'd be amazed how many people still use them.
Since backing it up onto the NFS-mounted backup server is currently
out of the question, and there's more than a terabyte of browser
clients alone, it's going to take me a little longer to follow through
with the dump device > crash > dump > backtrace sequence than it would
otherwise, but it will be done. :)
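
For reference, the dump device > crash > dump > backtrace sequence roughly corresponds to the following on FreeBSD 5.x. This is a minimal sketch: `/dev/ad0s1b` is a hypothetical swap partition, and `MYKERNEL` is a hypothetical kernel configuration built with `makeoptions DEBUG=-g`, which produces a `kernel.debug` with symbols. Substitute your own names.

```sh
# /etc/rc.conf: device the kernel writes a crash dump to on panic
# (hypothetical swap partition; it must be at least as large as RAM)
dumpdev="/dev/ad0s1b"
# savecore(8) copies the dump here on the next boot (this is the default)
dumpdir="/var/crash"

# After the panic and reboot, open the saved dump with the debug kernel:
#   kgdb /usr/obj/usr/src/sys/MYKERNEL/kernel.debug /var/crash/vmcore.0
# then type "bt" at the (kgdb) prompt to print the backtrace.
```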

Thank you again for taking the time to share your thoughts, suggestions
and experiences. I really appreciate it.

--Chris

>
> Best regards
>   Oliver
>
> --
> Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
> Commercial register: Registry court Munich, HRA 74606,  Management:
> secnetix Verwaltungsgesellsch. mbH, Commercial register: Registry court
> Munich, HRB 125758,  Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart
>
> FreeBSD services, products and more:  http://www.secnetix.de/bsd
>
> "Python is an experiment in how much freedom programmers need.
> Too much freedom and nobody can read another's code; too little
> and expressiveness is endangered."
>        -- Guido van Rossum
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>



-- 
panic: kernel trap (ignored)



-----------------------------------------------------------------
FreeBSD 5.4-RELEASE-p12 (SMP - 900x2) Tue Mar 7 19:37:23 PST 2006
/////////////////////////////////////////////////////////////////



