Date: Thu, 23 Nov 2006 07:14:18 +0000
From: Chris <chrcoluk@gmail.com>
To: "Kris Kennaway" <kris@obsecurity.org>
Cc: FreeBSD Stable <freebsd-stable@freebsd.org>
Subject: Re: sshfs/nfs cause server lockup
Message-ID: <3aaaa3a0611222314k62de0884j610f697c2070c867@mail.gmail.com>
In-Reply-To: <20061123061137.GA49872@xor.obsecurity.org>
References: <3aaaa3a0611212149u21146180ra84503472a0336e3@mail.gmail.com> <20061122170353.GA38104@xor.obsecurity.org> <3aaaa3a0611222125v36344f17rbc59a60516836b44@mail.gmail.com> <20061123061137.GA49872@xor.obsecurity.org>

On 23/11/06, Kris Kennaway <kris@obsecurity.org> wrote:
> On Thu, Nov 23, 2006 at 05:25:21AM +0000, Chris wrote:
> > On 22/11/06, Kris Kennaway <kris@obsecurity.org> wrote:
> > >On Wed, Nov 22, 2006 at 05:49:12AM +0000, Chris wrote:
> > >> On a few occasions all different remote servers I have had nfs cause
> > >> servers to stop responding so I stopped using it all the servers were
> > >> either 6.0 release 6.1 release or 6-stable.
> > >>
> > >> We recently discovered sshfs which supports cross platform mounting
> > >> server is linux and I mounted on a freebsd 6.1 release using security
> > >> branch up to date.
> > >>
> > >> it was working fine for around 5 to 6 days with some problems with
> > >> sshfs not updating files that are updated but wasnt compromising the
> > >> stability of the freebsd server I just remounted to keep up to date.
> > >> Then today the linux server had network problems so the sshfs timed
> > >> out and there is 2 dirs I mount, the first mounted fine a bit slow but
> > >> connected but when I ran the command to mount the 2nd dir the server
> > >> stopped responding.
> > >>
> > >> My 2nd ssh terminal was alive I tried to run top to see if sshfs was
> > >> hanging or something but when I hit enter top didnt run and the 2nd
> > >> terminal was froze, note both terminals didnt timeout and a ircd
> > >> running on the server also did not timeout but the box wasnt listening
> > >> to any new requests, it was responding to pings fine.
> > >>
> > >> I have a remote reboot facility on the box but no local access and no
> > >> kvm/serial console facility available this is the case for all of my
> > >> servers. I initially tried a soft reboot which uses ctrl-alt-delete
> > >> but the pings kept replying so I could see the reboot wasn initiated
> > >> indicating some kind of console lockup as well, I then did a hard
> > >> reboot which brought the server back.
> > >>
> > >> All logs stopped when the first lockup occured so no errors etc.
> > >> recorded bear in mind I have no local access to this machine. It does
> > >> appear that 6.x has some kind of serious remote mounting bug because I
> > >> never had these nfs problems in freebsd 5.x.
> > >>
> > >> I would be interested in any thoughts as to what could help me I have
> > >> rebooted the server now with network mpsafe disabled to see if this
> > >> will help it is using a generic kernel with the following changes.
> > >
> > >Sounds like your "sshfs" is causing the kernel to deadlock in that
> > >error situation. You can confirm by enabling DEBUG_LOCKS and
> > >DEBUG_VFS_LOCKS, then breaking to DDB and running 'show lockedvnods'
> > >when the deadlock occurs.
> > >
> > >If you're still having problems with NFS on 6.2, we'd much rather you
> > >reported those so that we can investigate and try to fix them.
> > >
> > >Kris
> > >
> >
> > Ok thanks, I will make sure this box is updated to 6.2 when it hits
> > release, if I enable the options in the kernel I will need local
> > access to use ddb?
>
> Yeah, you'll need a form of console access (local or serial).
>
> In principle you could extract the information from a coredump
> (i.e. trigger a coredump when the system deadlocks), but I don't think
> there's a kgdb macro equivalent of 'show lockedvnods'.
>
> Kris
>

Kris, a development on this: someone else posted about an NFS problem,
and reading his post, he made a startling point about network cards. He
stated he only gets the bug on sis, rl and fxp. I have 2 servers that
have no problems; they show dc0 and re0 in ifconfig. Of the servers that
lock up, 2 use fxp0, 1 uses rl0, and another, which I no longer have,
used sis0. This would back up what he was saying.

I won't be able to use ddb on my remote server since the datacentre
won't provide KVM even if I offer cash for the service. My local server
is rl0, so I will try to repeat the problem on that.

Chris
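
The interface comparison above (dc0 and re0 fine; fxp0, rl0 and sis0 locking up) can be repeated across a set of machines with a small script. What follows is only a rough sketch: the driver list is the anecdotal one from this thread and not a confirmed set of affected drivers, and the names `SUSPECT_DRIVERS`, `driver_of` and `suspect_interfaces` are made up for illustration. It assumes the input is a whitespace-separated list of interface names, such as the output of `ifconfig -l`.

```python
import re

# Driver names reported in this thread as showing the lockup.
# Assumption: anecdotal list only, not a confirmed set of affected drivers.
SUSPECT_DRIVERS = {"fxp", "rl", "sis"}


def driver_of(ifname):
    """Strip the trailing unit number from an interface name: 'fxp0' -> 'fxp'."""
    return re.sub(r"\d+$", "", ifname)


def suspect_interfaces(interface_list):
    """Return the interfaces whose driver is in SUSPECT_DRIVERS.

    interface_list is a whitespace-separated string of interface names.
    """
    return [name for name in interface_list.split()
            if driver_of(name) in SUSPECT_DRIVERS]


# Example with a hand-written interface list (hypothetical machine).
print(suspect_interfaces("fxp0 rl0 lo0"))  # prints ['fxp0', 'rl0']
```

Note that re0 is not flagged, because the name is matched on the whole driver prefix ("re") and not on a shared first letter with "rl".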