Date:      Wed, 1 Oct 1997 23:56:33 +0300 (EEST)
From:      Heikki Suonsivu <hsu@mail.clinet.fi>
To:        Karl Denninger <karl@mcs.net>
Cc:        freebsd-current@freebsd.org
Subject:   Re: WARNING! Builds from the last few days have BROKEN NFS
Message-ID:  <199710012056.XAA05552@katiska.clinet.fi>
In-Reply-To: Karl Denninger's message of 27 Sep 1997 23:37:48 +0300
References:  <19970927145131.64000@Mars.Mcs.Net> <13496.875390123@critter.freebsd.dk> <19970927150755.39452@Mars.Mcs.Net>

Similar lockups also happen under 2.2-STABLE.  I do not know whether the
cause is the same; it just happens to match your symptoms.  Other symptoms
are panics which are repeatable by running a specific sequence of commands
(this evening, generating a zip file larger than 10M reproducibly caused a
panic.  See my last PR for traces).

In article <19970927150755.39452@Mars.Mcs.Net> Karl Denninger  <karl@mcs.net> writes:
   Ok -- tomorrow evening I will check out another copy of the current sources,
   and give this another shot with the parameters you're referring to.

   The problem shows up only during fairly heavy I/O load -- my initial tests
   didn't show it, but putting the code on a reasonably-busy web server does,
   and it's easily reproduced in about 20-30 minutes.

   Same with the shell systems here.  The symptom is that a single disk I/O
   request will hang in a "D" state.  Further attempts to access that same
   object then also hang, but others, even to the same disk pack, do NOT.

   That is, a "df" still works, but a "cat <object>" locks up.  If "object"
   is a directory then a "ls" will freeze.  If it's a file then you have
   to reference the specific file to see the behavior.  Over a fairly short
   period of time once this starts you're in *big* trouble; you'll end up with
   thousands of processes hung in a disk wait for a specific file, and
   eventually run out of either process slots or page space (most people retry
   failed accesses, which makes the problem worse).  Per-user process limits
   (which I have turned off on these machines) would stop some of the damage,
   but not all.

Exactly the same symptoms.
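
A minimal sketch for spotting the pile-up described above before the
process table fills: it picks out processes whose state starts with "D"
(disk wait) from ps-style output.  The column layout (PID, STAT, COMMAND),
the function name and the sample listing are assumptions made for
illustration; they are not taken from either of the affected machines.

```python
def hung_in_disk_wait(ps_output):
    """Return (pid, command) pairs whose STAT field starts with 'D'.

    Assumes three whitespace-separated columns: PID, STAT, COMMAND,
    with a single header line at the top.
    """
    hung = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header line
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # ignore lines that do not have all three columns
        pid, stat, command = fields
        if stat.startswith("D"):
            hung.append((int(pid), command))
    return hung


# Hypothetical sample listing, made up for this example.
SAMPLE = """\
  PID STAT COMMAND
  101 Ss   /usr/sbin/cron
  202 D    cat /home/www/logs/access_log
  203 D+   ls /home/www/logs
  204 I    -csh (csh)
"""

for pid, command in hung_in_disk_wait(SAMPLE):
    print(pid, command)
```

Fed with the output of something like "ps -axo pid,stat,command", a
count that keeps growing for the same file or directory would match the
behaviour Karl describes.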

   I was trying to resolve cache inconsistency problems with NFS when I ran
   headfirst into this.  There is a problem with V3 mounts (the default now)
   where you can "mv" a file on one client, and another client never sees the
   change.  This is particularly distressing when you "mv" the access_log
   file from a web server (from another machine), kick the server to re-create
   the access_log file, and then find that it never shows up on the other
   system (or does with zero length, but no data in it -- ever).

Maybe far-fetched, but among the directories which are locking up on us
are users' www access log directories.  I do not know if this is related.

   If you look on the other system, a "ls" doesn't show the errant file.
   But a "cat" does -- the data is still there.  Needless to say this is
   pretty troublesome, and leads to lots of head-scratching.

   -- 
   Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
   http://www.mcs.net/~karl     | T1's from $600 monthly to FULL DS-3 Service
				| NEW! K56Flex modem support is now available
   Voice: [+1 312 803-MCS1 x219]| 56kbps DIGITAL ISDN DOV on analog lines!
   Fax:   [+1 312 803-4929]     | 2 FULL DS-3 Internet links; 400Mbps B/W Internal

-- 
Heikki Suonsivu, Täysikuu 10 C 83/02210 Espoo/FINLAND, hsu@clinet.fi
mobile +358-40-5519679 work +358-9-43542270 fax -4555276
