Date: Wed, 14 Dec 2011 10:22:52 -0800
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Andrey Zonov <andrey@zonov.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: directory listing hangs in "ufs" state
Message-ID: <20111214182252.GA5176@icarus.home.lan>
In-Reply-To: <4EE8E6E3.7050202@zonov.org>
References: <4EE7BF77.5000504@zonov.org> <20111213221501.GA85563@icarus.home.lan> <4EE8E6E3.7050202@zonov.org>

On Wed, Dec 14, 2011 at 10:11:47PM +0400, Andrey Zonov wrote:
> Hi Jeremy,
>
> This is not hardware problem, I've already checked that. I also ran
> fsck today and got no errors.
>
> After some more exploration of how mongodb works, I found that then
> listing hangs, one of mongodb thread is in "biowr" state for a long
> time. It periodically calls msync(MS_SYNC) accordingly to ktrace
> out.
>
> If I'll remove msync() calls from mongodb, how often data will be
> sync by OS?
>
> --
> Andrey Zonov
>
> On 14.12.2011 2:15, Jeremy Chadwick wrote:
> >On Wed, Dec 14, 2011 at 01:11:19AM +0400, Andrey Zonov wrote:
> >>
> >>Have you any ideas what is going on? or how to catch the problem?
> >
> >Assuming this isn't a file on the root filesystem, try booting the
> >machine in single-user mode and using "fsck -f" on the filesystem in
> >question.
> >
> >Can you verify there's no problems with the disk this file lives on as
> >well (smartctl -a /dev/disk)? I'm doubting this is the problem, but
> >thought I'd mention it.

I have no real answer, I'm sorry. The msync(2) man page indicates the
call is effectively deprecated (see BUGS). It looks like this is
effectively an mmap version of fsync(2).

I'm extremely confused by this problem. What you're describing above is
that the process is "stuck in biowr state for a long time", but what you
stated originally was that the process was "stuck in ufs state for a few
minutes":

> I've got STABLE-8 (r221983) with mongodb-1.8.1 installed on it. A
> couple days ago I observed that listing of mongodb directory stuck in
> a few minutes in "ufs" state.

Can we narrow down what we're talking about here? Does the process
actually deadlock? Or are you concerned about performance implications?

I know nothing about this "mongodb" software, but it calls msync()
because it wants to ensure that the data it changed in an mmap()-mapped
page is reflected (fully written) on the disk.
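
For illustration only, here is a minimal Python sketch of that pattern:
change data through a shared mapping, then flush it synchronously. This
is not mongodb's code. The helper names write_and_sync() and demo() and
the one-page mapping size are made up for the example. As far as I know,
CPython's mmap.flush() calls msync() with MS_SYNC on Unix-like systems.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # hypothetical mapping size: a single page


def write_and_sync(path, payload):
    """Modify a file through a shared mapping, then flush it synchronously."""
    with open(path, "r+b") as f:
        # On Unix the mapping is shared and read/write by default.
        with mmap.mmap(f.fileno(), PAGE) as m:
            m[:len(payload)] = payload  # dirties the page in memory only
            m.flush()  # blocks until the dirty page has been written back


def demo():
    """Create a one-page scratch file, update it via mmap, read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * PAGE)  # the file must be at least as long as the mapping
        os.close(fd)
        write_and_sync(path, b"hello")
        with open(path, "rb") as f:
            return f.read(5)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

Reading the file back only confirms that the write went through the
mapping. It says nothing about durability, because the read is served
from the same buffer cache. The flush() call is the step that waits for
the disk, which is presumably where a thread would sit while the flush
is in progress.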
This behaviour is fairly common within database software, but "how
often" the software chooses to do this is entirely a design and
implementation choice by the authors.

Meaning: if mongodb is either 1) continually calling msync(), or 2)
waiting for too long a period of time before calling msync(),
performance within the process will suffer. #1 could result in overall
bad performance, while #2 could result in a process that's spending a
lot of time doing I/O (flushing to disk) and therefore appears
"deadlocked" when in fact the kernel/subsystems are doing exactly what
they were told to do.

Removing the msync() call could result in inconsistent data (possibly
non-recoverable) if the mongodb software crashes or if some other piece
(thread or child? Not sure) expects to open a new fd on that file which
has mmap()'d data.

This is about all I know. I would love to be able to tell you "consider
a different database" but that seems like an excuse rather than an
actual solution. I guess if all you're seeing is the process "stall"
for long periods of time, but recover normally, then I would open up a
support ticket with the mongodb folks to discuss performance.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |