From: Charles Sprickman
To: freebsd-stable@freebsd.org
Date: Wed, 19 Jul 2006 16:19:14 -0400 (EDT)
Subject: Re: 6.1 quota issues
Message-ID: <20060719161612.R34644@sporker.bway.net>

Replying to myself and top-posting...

A very helpful person contacted me off-list and pointed me to this PR, which dates back to 2.2:

http://www.freebsd.org/cgi/query-pr.cgi?pr=2325

I'm going to submit a follow-up to it so that whoever claims it can see that the bug persists into at least 6.1.

In short, the machine I was rsyncing from had some very high UIDs, and these seem to trip up the quota code. The big hint was the 4GB+ quota.user file. I'm still doing more testing, but so far it looks very much like this bug was the root of all my problems.
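To show why a few high UIDs can inflate quota.user past 4GB, here is a minimal sketch. It assumes the classic (pre-64-bit) UFS quota file layout, where each UID's record is stored at offset uid * sizeof(struct dqblk) and each record is eight 32-bit fields (32 bytes). Under that assumption, any UID at or above 134,217,728 places its record beyond the 4 GiB mark. The `offset_as_int32` helper is hypothetical. It only shows how such an offset would wrap if it were ever held in a signed 32-bit integer. That is one plausible, unconfirmed route to the "seek failed: Invalid argument" error quoted below, since lseek(2) rejects negative resulting offsets with EINVAL.

```python
import struct

# ASSUMPTION: on-disk record modeled on the pre-64-bit FreeBSD struct dqblk:
# six unsigned 32-bit fields (block hard/soft limits, current blocks,
# inode hard/soft limits, current inodes) plus two signed 32-bit grace times.
DQBLK_FORMAT = "<6I2i"
DQBLK_SIZE = struct.calcsize(DQBLK_FORMAT)  # 32 bytes, no padding with "<"

def record_offset(uid: int) -> int:
    """Byte offset of a UID's record when the file is indexed directly by UID."""
    return uid * DQBLK_SIZE

def offset_as_int32(offset: int) -> int:
    """Hypothetical: the same offset truncated to a signed 32-bit integer."""
    offset &= 0xFFFFFFFF
    return offset - (1 << 32) if offset >= (1 << 31) else offset

# Smallest UID whose record starts at or beyond the 4 GiB mark.
FOUR_GIB = 1 << 32
first_uid_past_4gib = FOUR_GIB // DQBLK_SIZE

if __name__ == "__main__":
    # 5315 is "micro" from the quoted session; 4294967294 is (uid_t)-2,
    # used here only as an illustrative "very high" UID.
    for uid in (5315, 65534, first_uid_past_4gib, 4294967294):
        off = record_offset(uid)
        print(f"uid {uid:>10}: offset {off:>13} bytes, as int32 {offset_as_int32(off)}")
```

Under these assumptions, micro's record (uid 5315) sits at byte 170,080, while uid 4294967294 lands around 128 GiB into the file. If that offset were truncated to 32 bits, it would wrap to -64, a value lseek(2) refuses.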
Thanks,

Charles

On Fri, 7 Jul 2006, Charles Sprickman wrote:

> Hello all,
>
> I'm in the process of rolling out a new shell server and for numerous reasons
> have decided 6.x is the best fit (jail improvements, SMP improvements, 3Ware
> driver, pf). The shell server is within a jail, and the UIDs there are
> unique so that quotas remain sane. There are about 5000 active accounts
> using about 40GB of a 210GB partition. The quota.user file is about 4GB.
>
> I just started work on getting quotas set up for everyone after rsyncing all
> the homedirs over from the old server. At first all seemed well, but then I ran
> into a few issues on subsequent rsyncs. I had people with large (1GB+)
> homedirs and quotas in the 1GB-4GB range, and as rsync was chowning the files
> to the users it was throwing "quota exceeded" errors. Here's a brief
> example that illustrates what I was seeing:
>
> root@beta[/home/staff/micro/tmp]# quota micro
> Disk quotas for user micro (uid 5315):
> Filesystem    usage    quota    limit   grace   files   quota   limit   grace
>          /  1630026  3000000  3100000           13393       0       0
> root@beta[/home/staff/micro/tmp]# chown micro index.html
> chown: index.html: Disc quota exceeded
> root@beta[/home/staff/micro/tmp]#
>
> In the past, when I've seen inconsistencies indicating that I needed a
> manual run of quotacheck, they showed up in the output of the quota
> command, i.e., "quota" would show the user had more usage than "du"
> would indicate. The above example is a bit odd: "quota" shows that he's
> well within his limits, but the kernel thinks otherwise.
>
> Thinking it would be a good idea to stop the jails, turn off quotas, unmount
> the partition, fsck it, mount it and then run quotacheck, I found more
> problems.
> My first run of quotacheck ran for a few minutes, reported many
> inconsistencies, and then sat there for quite some time before spitting this
> out:
>
> quotacheck: /jails/quota.user: seek failed: Invalid argument
>
> Trying again, it reported the same inconsistencies, then sat there for more
> than an hour taking up all the available CPU on the box until I killed it.
> The mtime on quota.user had not changed during the run.
>
> Running it yet again now gives me this:
>
> /jails: fixed: inodes 27 -> 0 blocks 156 -> 0
> quotacheck: /jails/quota.user: seek failed: Invalid argument
> THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY:
> /dev/twed0s1g (/jails)
>
> For now I can live without quotas, but if there's anything I can test from
> -stable that might address this, I'd like to try it. I'd say this thing is
> still a good month from going live, since we have lots of dependency mess on
> the old box to clean up before cutting over.
>
> Any ideas what's going on here? Is this related to the large number of users
> and the size of the partition? I've seen some of the discussions about
> snapshots + quotas, but that seems like an entirely different issue. For the
> time being I've disabled "background_fsck" and "check_quotas" in rc.conf, and
> I'll avoid dumping that fs with the snapshot flag.
>
> What other information can I provide to help better define where this bug
> lives?
>
> Thanks,
>
> Charles
>