To: current@FreeBSD.ORG
Subject: Re: Preview: GEOMs statistics code. 
From: phk@phk.freebsd.dk
In-Reply-To: Your message of "Tue, 04 Feb 2003 22:44:47 +0100."
             <25779.1044395087@critter.freebsd.dk> 
Date: Sat, 08 Feb 2003 22:40:39 +0100
Message-ID: <21960.1044740439@critter.freebsd.dk>


I have played with the statistics collection in GEOM a bit and need
more feedback, but first:  please try it out yourselves.

This assumes you are running -current as of today; otherwise, install
the include files and libgeom by hand first.

Apply this patch in src/sys/geom and make a new kernel.
	http://phk.freebsd.dk/patch/geom_io.patch

Stick these two in a directory and run "make".

	http://phk.freebsd.dk/patch/Makefile
	http://phk.freebsd.dk/patch/gstat.c

Then run
	sysctl kern.geom.collectstats=1
to enable collection of statistics in the kernel (reports of the
performance impact of doing so welcome!)

and then run the "gstat" program in an xterm.  Note that the program
uses two ANSI escape sequences directly ("ESC [ 2 J" and "ESC [ H"),
since I didn't want to mess with curses right now.
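
For reference, a minimal sketch of what emitting those two sequences
looks like in C.  This is not taken from gstat.c; the macro names and
the clear_and_home() helper are made up for illustration.  "ESC [ 2 J"
erases the whole display and "ESC [ H" moves the cursor to the top
left corner.

```c
#include <assert.h>
#include <stdio.h>

/* ESC is octal 033; these are the two sequences named in the text. */
#define ANSI_CLEAR_SCREEN "\033[2J"	/* erase the entire display */
#define ANSI_CURSOR_HOME  "\033[H"	/* move cursor to row 1, column 1 */

/*
 * Hypothetical helper: write both sequences to "out" and flush.
 * Returns the number of bytes written, or a negative value on error.
 */
static int
clear_and_home(FILE *out)
{
	int n;

	assert(out != NULL);
	n = fprintf(out, "%s%s", ANSI_CLEAR_SCREEN, ANSI_CURSOR_HOME);
	fflush(out);
	return (n);
}
```

Calling clear_and_home(stdout) before each refresh redraws the table
in place on any terminal that understands these sequences, which is
why an xterm is suggested.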

You will get a display like this:

    dT L(q)  ops/s    r/s   ms/r    w/s   ms/w    d/s   ms/d %busy  Id
 1.010    0      0      0     .       0     .       0     .    0.0  0xc412a580
 1.010    0      0      0     .       0     .       0     .    0.0  0xc3fe2b40
 1.010    0     44      0     .      21    1.0     23    0.0   1.9  0xc412a400
 1.010    0      0      0     .       0     .       0     .    0.0  0xc3fe2680
 1.010    0     44      0     .      21    1.0     23    0.0   1.9  0xc3fe2600
 1.010    0     44      0     .      21    1.0     23    0.1   2.0  0xc4108e80
 1.010    0     44      0     .      21    1.0     23    0.1 101.0  0xc3fe2500
 1.010    0     44      0     .      21    1.1     23    0.6   3.1  0xc4108d00
 1.010    0      0      0     .       0     .       0     .    0.0  0xc3fe24c0
 1.010    0     44      0     .      21    1.1     23    0.6   3.1  0xc3fe2bc0
 1.010    0      0      0     .       0     .       0     .    0.0  0xc418c180
 1.010    0      0      0     .       0     .       0     .    0.0  0xc4312140

The columns are:
   dT
	Seconds in this measurement interval (change the sleep at the
	bottom of gstat.c to modify).
   L(q)
	Number of transactions in the queue at this moment.
   ops/s
	Operations per second in this interval.
   r/s, w/s, d/s
	Reads, writes and deletes per second in this interval.
   ms/r, ms/w, ms/d
	Milliseconds per read, write and delete (average for interval).
   %busy
	Attempted calculation of %busy, following the earlier discussion
	on this list.
   Id
	A not very random number which can be translated to something
	meaningful with the output of
		sysctl -b kern.geom.confxml
	Another easy way is to use
		dd if=/dev/ad0 of=/dev/null
		dd if=/dev/ad0s1 of=/dev/null
		...
	to identify the various devices if this is important.
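
As a rough illustration of how the per-interval columns above can be
derived from two successive samples of cumulative counters, here is a
small sketch.  It is not the code in gstat.c: the struct layout, the
field names and the %busy formula (busy time divided by interval
length) are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cumulative counters for one operation type. */
struct op_counters {
	uint64_t count;		/* operations completed so far */
	double	 total_sec;	/* summed service time of those operations */
};

/* r/s, w/s or d/s: operations completed per second over dt seconds. */
static double
ops_per_sec(struct op_counters prev, struct op_counters cur, double dt)
{
	assert(dt > 0.0);
	return ((double)(cur.count - prev.count) / dt);
}

/*
 * ms/r, ms/w or ms/d: average milliseconds per operation in the
 * interval.  Returns a negative value when nothing completed; the
 * display above prints "." in that case.
 */
static double
ms_per_op(struct op_counters prev, struct op_counters cur)
{
	uint64_t n = cur.count - prev.count;

	if (n == 0)
		return (-1.0);
	return ((cur.total_sec - prev.total_sec) * 1000.0 / (double)n);
}

/* %busy, assuming a cumulative "seconds busy" counter is available. */
static double
percent_busy(double busy_prev, double busy_cur, double dt)
{
	assert(dt > 0.0);
	return (100.0 * (busy_cur - busy_prev) / dt);
}
```

With dt taken from the dT column, 44 completed operations in a one
second interval give 44 ops/s, and 0.25 seconds of accumulated busy
time in that interval gives 25 %busy under this definition.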

And as you will probably discover, the %busy is not very calm and
the other columns will take a hit every so often too.

I can of course make these statistics a perfect snapshot by employing
locks around all the updates (that will be cheaper than atomics because
there are several fields updated at the same time), and grabbing the
lock when I get a snapshot.
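
A minimal userland sketch of that locked scheme, using a pthread
mutex as a stand-in for whatever lock the kernel would use.  The
struct and function names are hypothetical and do not correspond to
anything in the GEOM sources.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-device statistics; several fields change together. */
struct disk_stats {
	uint64_t ops;		/* completed operations */
	uint64_t bytes;		/* bytes transferred */
	double	 busy_sec;	/* accumulated service time */
};

struct locked_stats {
	pthread_mutex_t	  lock;
	struct disk_stats s;
};

static void
stats_init(struct locked_stats *ls)
{
	int rc = pthread_mutex_init(&ls->lock, NULL);

	assert(rc == 0);
	(void)rc;
	ls->s = (struct disk_stats){0};
}

/* One lock round trip covers all three field updates. */
static void
stats_record(struct locked_stats *ls, uint64_t nbytes, double service_sec)
{
	pthread_mutex_lock(&ls->lock);
	ls->s.ops++;
	ls->s.bytes += nbytes;
	ls->s.busy_sec += service_sec;
	pthread_mutex_unlock(&ls->lock);
}

/* Copy everything under the lock, so the fields agree with each other. */
static struct disk_stats
stats_snapshot(struct locked_stats *ls)
{
	struct disk_stats copy;

	pthread_mutex_lock(&ls->lock);
	copy = ls->s;
	pthread_mutex_unlock(&ls->lock);
	return (copy);
}
```

The point of the sketch is that a single lock/unlock pair protects
the whole group of fields, whereas per-field atomics would need one
atomic operation per field and still would not make the group
consistent.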

But doing so will cost us in performance:

The actual lock operations, even if uncontested, cost something;
while that may not affect I/O throughput, it will affect the entire
system's throughput.

Then there is the lock contention.  That is probably not too bad:
these are cheap operations and they are not _that_ frequent.

I will need to use read(2) or ioctl(2) to pull the data out of the
kernel instead of the mmap(2) I use now, since I need a place to
grab the lock.  That means userland/kernel transitions.

A number of intermediate solutions exist, such as flagging the
structures while they are being updated (possibly with memory
barriers).
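
One way such flagging could look is a sequence counter that is odd
while an update is in progress, so that a reader can detect a torn
copy and retry.  The sketch below uses C11 atomics and assumes that
updates to a given structure are already serialized (a single
writer).  All names are hypothetical; this is one possible technique,
not a description of what GEOM does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct flagged_stats {
	atomic_uint	 seq;	/* odd while an update is in progress */
	_Atomic uint64_t ops;
	_Atomic uint64_t bytes;
};

struct stats_copy {
	uint64_t ops;
	uint64_t bytes;
};

static void
flagged_init(struct flagged_stats *fs)
{
	atomic_init(&fs->seq, 0);
	atomic_init(&fs->ops, 0);
	atomic_init(&fs->bytes, 0);
}

/* Writer side: flag, barrier, update the fields, unflag. */
static void
flagged_update(struct flagged_stats *fs, uint64_t nbytes)
{
	unsigned seq = atomic_load_explicit(&fs->seq, memory_order_relaxed);

	assert((seq & 1u) == 0);	/* single writer: none in progress */
	atomic_store_explicit(&fs->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&fs->ops,
	    atomic_load_explicit(&fs->ops, memory_order_relaxed) + 1,
	    memory_order_relaxed);
	atomic_store_explicit(&fs->bytes,
	    atomic_load_explicit(&fs->bytes, memory_order_relaxed) + nbytes,
	    memory_order_relaxed);
	atomic_store_explicit(&fs->seq, seq + 2, memory_order_release);
}

/* Reader side: returns false if the copy may be torn; caller retries. */
static bool
flagged_try_read(struct flagged_stats *fs, struct stats_copy *out)
{
	unsigned before, after;

	before = atomic_load_explicit(&fs->seq, memory_order_acquire);
	if (before & 1u)
		return (false);
	out->ops = atomic_load_explicit(&fs->ops, memory_order_relaxed);
	out->bytes = atomic_load_explicit(&fs->bytes, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	after = atomic_load_explicit(&fs->seq, memory_order_relaxed);
	return (before == after);
}
```

A reader would loop on flagged_try_read() until it returns true.  The
writer never blocks and uses no read-modify-write instructions, only
ordered loads and stores, at the price of occasional retries on the
reader side.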

What it all comes down to in the end is:  how much of a performance
hit do we want to take to collect disk statistics?

Input still very much appreciated...

Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
