From owner-freebsd-stable@FreeBSD.ORG  Wed Jul 19 20:26:26 2006
Date: Wed, 19 Jul 2006 16:19:14 -0400 (EDT)
From: Charles Sprickman <spork@bway.net>
To: freebsd-stable@freebsd.org
In-Reply-To: <Pine.OSX.4.61.0607072227510.3261@white.nat.fasttrackmonkey.com>
Message-ID: <20060719161612.R34644@sporker.bway.net>
References: <Pine.OSX.4.61.0607072227510.3261@white.nat.fasttrackmonkey.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Subject: Re: 6.1 quota issues

Replying to myself and top-posting...

A very helpful person contacted me off-list and pointed me to this PR that 
dates back to 2.2:

http://www.freebsd.org/cgi/query-pr.cgi?pr=2325

I'm going to submit a followup to that so that whoever claims it can see 
that it persists into at least 6.1.

In short, the machine I was rsyncing from had some very high UIDs and 
these seem to trip up the quota code.  The big hint was the 4GB+ 
quota.user file.  I'm still doing some more testing, but so far it looks 
very much like this bug was the root of all my problems.
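
A rough sketch of the arithmetic, for anyone following along. It assumes 
quota.user is a flat array of fixed-size records indexed by UID, and that 
each record is 32 bytes; the record size is an assumption here, not 
something taken from the PR, so check <ufs/ufs/quota.h> on the release in 
question. The function names are made up for illustration.

```python
# Assumption: quota.user holds one fixed-size record per UID, stored at
# offset uid * record_size. The 32-byte record size is an assumed value
# (eight 32-bit fields); verify it against the system headers.
ASSUMED_RECORD_SIZE = 32


def record_offset(uid, record_size=ASSUMED_RECORD_SIZE):
    """Byte offset at which the record for `uid` would start."""
    return uid * record_size


def min_file_size(max_uid, record_size=ASSUMED_RECORD_SIZE):
    """Apparent file size needed to hold a record for `max_uid`."""
    return (max_uid + 1) * record_size


def smallest_uid_past(limit_bytes, record_size=ASSUMED_RECORD_SIZE):
    """Smallest UID whose record would extend beyond `limit_bytes`."""
    return limit_bytes // record_size


if __name__ == "__main__":
    # uid 5315 is the example user from the original post.
    print("offset for uid 5315:", record_offset(5315))
    # A 4GB+ quota.user implies at least one UID at or above this value.
    print("first uid past 4 GiB:", smallest_uid_past(2**32))
```

Under those assumptions, a single file owned by a UID in the hundreds of 
millions is enough to push the apparent size of quota.user past 4GB, even 
with only a few thousand real accounts.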

Thanks,

Charles

On Fri, 7 Jul 2006, Charles Sprickman wrote:

> Hello all,
>
> I'm in the process of rolling out a new shell server and for numerous reasons 
> have decided 6.x is the best fit (jail improvements, SMP improvements, 3Ware 
> driver, pf).  The shell server is within a jail, and the uids there are 
> unique so that quotas remain sane.  There are about 5000 active accounts 
> using about 40GB of a 210GB partition.  The quota.user file is about 4GB.
>
> I just started work on getting quotas set up for everyone after rsyncing all 
> the homedirs from the old server over.  At first, all seemed well, then I ran 
> into a few issues on subsequent rsyncs.  I had people with large (1GB+) 
> homedirs and quotas in the 1GB-4GB range and as rsync was chowning the files 
> to the users it was throwing errors about "quota exceeded".  Here's a brief 
> example that illustrates what I was seeing:
>
> root@beta[/home/staff/micro/tmp]# quota micro
> Disk quotas for user micro (uid 5315):
> Filesystem   usage   quota   limit   grace   files   quota   limit grace
>     /       1630026 3000000 3100000         13393       0       0
> root@beta[/home/staff/micro/tmp]# chown micro index.html
> chown: index.html: Disc quota exceeded
> root@beta[/home/staff/micro/tmp]#
>
> I know in the past when I've seen inconsistencies indicating that I needed a 
> manual run of quotacheck, they would show up in the output of the quota 
> command; i.e., the "quota" command would show the user had more usage than "du" 
> would indicate.  The above example is a bit odd - "quota" shows that he's 
> well within his limits, but the kernel thinks otherwise.
>
> Thinking it would be a good idea to stop the jails, turn off quotas, umount 
> the partition, fsck it, mount it and then run quotacheck, I found more 
> problems.  My first run of quotacheck ran for a few minutes, reported many 
> inconsistencies and then sat there for quite some time before spitting this 
> out:
>
> quotacheck: /jails/quota.user: seek failed: Invalid argument
>
> Trying again, it reported the same inconsistencies then sat there for more 
> than an hour taking up all the available CPU on the box until I killed it. 
> The mtime on quota.user had not changed during the run.
>
> Running it yet again now gives me this:
>
> /jails:          fixed: inodes 27 -> 0  blocks 156 -> 0
> quotacheck: /jails/quota.user: seek failed: Invalid argument
> THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY:
>        /dev/twed0s1g (/jails)
>
> For now I can live without quotas, but if there's anything I can test from 
> -stable that might address this I'd like to try it.  I'd say this thing is 
> still a good month from going live since we have lots of dependency mess on 
> the old box to clean up before cutting over.
>
> Any ideas what's going on here?  Is this related to the large number of users 
> and the size of the partition?  I've seen some of the discussions about 
> snapshots + quotas, but that seems like an entirely different issue. For the 
> time being I've killed "background_fsck" and "check_quotas" in rc.conf, and 
> I'll avoid dumping that fs with the snapshot flag.
>
> What other information can I provide to help better define where this bug 
> lives?
>
> Thanks,
>
> Charles
>