Message-Id: <200612080141.BAA07359@sopwith.solgatos.com>
To: Kris Kennaway
In-reply-to: Your message of "Thu, 07 Dec 2006 19:09:32 EST." <20061208000932.GA35387@xor.obsecurity.org>
Date: Thu, 07 Dec 2006 17:41:51 +0000
From: Dieter
Cc: freebsd-questions@freebsd.org
Subject: Re: processes not getting fair share of available disk I/O (was: Re: TCP parameters and interpreting tcpdump output )

> > > > > > > > hw.ata.wc=0
> > > > > > >   ^^^^^^^^^^^
> > > > > > > "Make my hard drive go reeeeally slow please (just in case I crash)" :)
> > > > > >
> > > > > > Slower, yes, but not *that* slow.
> > > > > >
> > > > > > Normal ls : 0.032 second.
> > > > > > Two processes using same disk, multiply by two,
> > > > > > so 0.064 second.  Maybe the multiplier is more than 2, call it 10x, so
> > > > > > 0.32 second.  But I'm seeing a factor of over 9100x.
> > > > >
> > > > > Humour me and turn it back on, then see what happens.
> > > >
> > > > Where is the knob to turn the write cache on/off on a per-drive basis
> > > > in FreeBSD?  I can do this in NetBSD, but the only knob I can find in
> > > > FreeBSD affects all drives, and requires a reboot.
> > >
> > > Yes, I think you need to do it globally at boot time.
> > >
> > > > Humour me and read the Subject line.  The ls does not get its fair share
> > > > of disk I/O.
> > > >
> > > > Both times are with the disk's write cache in write-through mode.
> > > > I'm not comparing times with the write cache in different modes.
> > > > I'm comparing ls by itself against ls competing with cp.
> > >
> > > Your cp is going to be running synchronously, i.e. spend a lot of time
> > > waiting on the disk to perform the writes.  This may well be the cause
> > > of your problem.  Once we have established whether or not it is the
> > > cause, we can proceed to whether this behaviour can be improved.
> >
> > I submitted PR 106340 asking for a way to control the disk write cache on
> > a per disk basis like NetBSD can.  Meanwhile, I added a PATA via USB disk,
> > which judging from the write speed, appears to be immune from hw.ata.wc=0.
> >
> > So I now have a disk which has the write cache on, is connected via a different
> > controller, and thus uses a different device driver.
> >
> > I still see the same problems.  Writing to one disk *significantly* slows down
> > writing to another disk.  Even if one process is at normal default priority
> > and the other is running at rtprio 5.  Regardless of which process uses the
> > USB disk and which uses the direct-to-chipset disk.
> > Even if the rtprio 5
> > process only needs a very small fraction of the disk bandwidth, it still gets
> > slowed down to the point that data is lost.
> >
> > My current SWAG is that writing to a disk requires some spl/mutex/lock that
> > is global across all disks on the system.  And this spl/mutex/lock is a
> > bottleneck.
>
> In the case of USB devices, yes - all USB accesses require Giant so
> all USB I/O is serialized.  This isn't true in general though, unless
> you have debug.mpsafevfs=0 set (or forced because of something else,
> e.g. quotas).  If this is set then all filesystem I/O is serialized
> (and maybe it's even worse, if there are also device drivers in the
> I/O path that also require Giant, like USB).

debug.mpsafevfs: 1
machine is single CPU
I'm not using quotas.

> However, I don't know what you mean by "data is lost".  Data should
> never be lost from the filesystem regardless of how slow the I/O is
> happening, unless there's something else going wrong (e.g. driver
> bug).
>
> Also, rtprio should not be used in general - see the manpage.  Were
> you using rtprio in your original scenario?  It can easily cause
> resource starvation.

I have data arriving on Ethernet.  The data rate is 2.5 MB/s max,
but the other end only has a small buffer.  If the BSD box doesn't
read the port fast enough, the data is lost.

I have a C program (port2file) reading from the port into a *large*
circular buffer, currently 431,226,880 bytes.  This should be enough
to buffer over 2 minutes of data.  It does non-blocking 64KB writes
to stdout.  Shell script calls this program and redirects stdout to
a disk file.  Very little if any other i/o to this disk.  Even with
disk cache in write-through mode, I can write at about 6-7 MB/s.
The process needs very little CPU.  Sounds like this should be no
problem.  And it seems to work okay if the system is otherwise idle.

The problem is that if some other process is writing to some other disk,
it somehow slows down writes to ALL disks.  Enough that, despite the
non-blocking writes (?), the TCP receive window shrinks and shrinks and
finally is smaller than a packet.  The src machine obediently stops
sending packets, its small buffer fills up, and data is lost.

Things I have done so far:

BIG buffer (over 2 minutes worth).
The port2file process cranks up the TCP receive window from 65700 to 197100.
It also cranks up rtprio from 20 to 5.
sysctl net.inet.tcp.delayed_ack=0

The only process running rtprio is port2file.  All other processes
are either default priority or niced down with the classic nice(1).
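
The buffering figures above can be sanity-checked with a little arithmetic. The following is only a rough sketch, not part of port2file: the helper name seconds_until_full is made up for illustration, and it assumes "MB" means 10**6 bytes, which the message does not specify.

```python
# Back-of-envelope check of the buffering figures quoted in the message.
# Assumption: "MB" means 10**6 bytes; the message does not say which.

BUFFER_BYTES = 431_226_880      # circular buffer size from the message
INPUT_RATE = 2.5e6              # bytes/s, maximum arrival rate from the message
RCV_WINDOW = 197_100            # bytes, enlarged TCP receive window
WRITE_SIZE = 64 * 1024          # bytes per non-blocking write


def seconds_until_full(capacity, in_rate, out_rate=0.0):
    """Time for a buffer to fill when data arrives faster than it drains.

    Hypothetical helper for illustration only.
    """
    net = in_rate - out_rate
    if net <= 0:
        return float("inf")     # drains at least as fast as it fills
    return capacity / net


# Worst case: the disk accepts nothing at all while data keeps arriving.
stall = seconds_until_full(BUFFER_BYTES, INPUT_RATE)
# Once the big buffer is full, only the receive window is left.
window = seconds_until_full(RCV_WINDOW, INPUT_RATE)
writes = BUFFER_BYTES // WRITE_SIZE

print(f"buffer absorbs a total disk stall for {stall:.1f} s")
print(f"receive window alone absorbs {window * 1000:.1f} ms")
print(f"buffer holds {writes} writes of {WRITE_SIZE} bytes")
```

Under those assumptions the circular buffer covers roughly 172 seconds of a complete write stall, consistent with "over 2 minutes", while the 197100-byte receive window by itself covers only about 79 ms at the full data rate. Passing a non-zero out_rate shows how long a partially slowed disk can be tolerated.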