From owner-freebsd-arch@FreeBSD.ORG  Mon Jan 14 16:16:13 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id D8DF95E6;
 Mon, 14 Jan 2013 16:16:13 +0000 (UTC) (envelope-from bright@mu.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
 by mx1.freebsd.org (Postfix) with ESMTP id B1E43254;
 Mon, 14 Jan 2013 16:16:13 +0000 (UTC)
Received: from Alfreds-MacBook-Pro-9.local (unknown [64.25.27.130])
 by elvis.mu.org (Postfix) with ESMTPSA id 0FA2A1A3C24;
 Mon, 14 Jan 2013 08:16:11 -0800 (PST)
Message-ID: <50F42F4B.303@mu.org>
Date: Mon, 14 Jan 2013 11:16:11 -0500
From: Alfred Perlstein <bright@mu.org>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7;
 rv:17.0) Gecko/17.0 Thunderbird/17.0
MIME-Version: 1.0
To: Andre Oppermann <andre@freebsd.org>
Subject: Re: svn commit: r243631 - in head/sys: kern sys
References: <201211272119.qARLJxXV061083@svn.freebsd.org>
 <ABB3E29B-91F3-4C25-8FAB-869BBD7459E1@bluezbox.com>
 <50C1BC90.90106@freebsd.org> <50C25A27.4060007@bluezbox.com>
 <50C26331.6030504@freebsd.org> <50C26AE9.4020600@bluezbox.com>
 <50C3A3D3.9000804@freebsd.org> <50C3AF72.4010902@rice.edu>
 <330405A1-312A-45A5-BB86-4969478D8BBD@bluezbox.com>
 <50D03E83.8060908@rice.edu> <50DD081E.8000409@bluezbox.com>
 <50EB1841.5030006@bluezbox.com> <50EB22D2.6090103@rice.edu>
 <50EB415F.8020405@freebsd.org>
 <CA+7sy7CkdoyScOEDEXWuwJxjCS5zTcC8_fu9isCeTFxT8opNJQ@mail.gmail.com>
 <50F04FE5.7010406@rice.edu>
 <CA+7sy7D=ZjTLirGW3BVGcAu0h8-dWpib+YziUjEqegOL9J4adw@mail.gmail.com>
 <CAJ-VmonLoL4E3UsNwx87p2FuHXTbJe7wFs9hBn5Zmr7TTQOSkg@mail.gmail.com>
 <50F1BD69.4060104@mu.org>
 <CAJ-VmokjZ_vpcmYeD65pWJN5tfhqn6yDXrFFcXf8dvYc55tQtg@mail.gmail.com>
 <50F2F79C.7040109@mu.org> <50F41F8C.5030900@freebsd.org>
In-Reply-To: <50F41F8C.5030900@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Adrian Chadd <adrian@freebsd.org>, src-committers@freebsd.org,
 Alan Cox <alc@rice.edu>, "Jayachandran C." <jchandra@freebsd.org>,
 svn-src-all@freebsd.org, Oleksandr Tymoshenko <gonzo@bluezbox.com>,
 freebsd-arch@freebsd.org, svn-src-head@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 14 Jan 2013 16:16:13 -0000

On 1/14/13 10:09 AM, Andre Oppermann wrote:
> On 13.01.2013 19:06, Alfred Perlstein wrote:
>> On 1/12/13 10:32 PM, Adrian Chadd wrote:
>>> On 12 January 2013 11:45, Alfred Perlstein <bright@mu.org> wrote:
>>>
>>>> I'm not sure if regressing to the waterfall method of development 
>>>> is a good
>>>> idea at this point.
>>>>
>>>> I see a light at the end of the tunnel and we need to continue to just
>>>> handle
>>>> these minor corner cases as we progress.
>>>>
>>>> If we move to a model where a minor bug is grounds to completely 
>>>> remove
>>>> helpful code then nothing will ever get done.
>>>>
>>> Allocating 512MB worth of callwheels on a 16GB MIPS machine is a
>>> little silly, don't you think?
>>>
>>> That suggests to me that the extent of which maxfiles/maxusers/etc
>>> percolates the codebase wasn't totally understood by those who wish to
>>> change it.
>>>
>>> I'd rather see some more investigative work into outlining things that
>>> need fixing and start fixing those, rather than "just change stuff and
>>> fix whatever issues creep up."
>>>
>>> I kinda hope we all understand what we're working on in the kernel a
>>> little better than that.
>>
>> Cool!   I'm glad people are now aware of the callwheel allocation 
>> being insane with large maxusers.
>>
>> I saw this about a month ago (if not longer), but since there were 
>> half a dozen people calling me an
>> imbecile who hadn't really yet read the code I didn't want to inflame 
>> them more by fixing that with
>> "a hack". (actually a simple fix).
>>
>> A simple fix is to clamp callwheel size to the previous result of a 
>> maxusers of 384 and call it a day.
>>
>> However the simplicity of that approach would probably inflame too 
>> many feelings so I am unsure as
>> how to proceed.
>>
>> Any ideas?
>
> I noticed the callwheel dependency as well and asked mav@ about it
> in a short email exchange.  He said it has only little use and goes
> away with the calloutng import.  While that is outstanding we need
> to clamp it to a sane value.
>
> However I don't know what a sane value would be and why its size is
> directly derived from maxproc and maxfiles.  If there can be one
> callout per process and open file descriptor in the system, then
> it probably has to be so big.  If it can deal with 'collisions'
> in the wheel it can be much smaller.
>
If it really goes away with calloutng, then we should probably leave it 
be in -current.

As for clipping it when/if we push the maxusers fixes into -stable (which
we must do): my impression (although I may be wrong) is that the
callwheels (cc_callwheel) are just arrays of hash buckets, indexed by the
tick at which a callout will fire MOD callwheelmask.  This means that if
cc_callwheel is way too small we wind up with collisions; if it's
enormous, we wind up with a window so large that it can accommodate
callouts scheduled a very long way into the future.
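
To make that picture concrete, here is a minimal sketch of the bucket
selection as I understand it.  It assumes the wheel size is a power of
two so that "MOD" becomes a mask; the helper names wheel_bucket() and
wheel_collides() are made up for illustration and are not the kernel's.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bucket index for a callout due at absolute tick
 * `when`.  Assumes callwheelsize is a power of two, so the mask is
 * callwheelsize - 1.
 */
uint32_t
wheel_bucket(uint64_t when, uint32_t callwheelsize)
{
	uint32_t callwheelmask = callwheelsize - 1;

	return ((uint32_t)(when & callwheelmask));
}

/*
 * Two deadlines land in the same bucket when they are a multiple of
 * callwheelsize ticks apart.
 */
int
wheel_collides(uint64_t a, uint64_t b, uint32_t callwheelsize)
{
	return (wheel_bucket(a, callwheelsize) ==
	    wheel_bucket(b, callwheelsize));
}
```

With an 8-slot wheel, ticks 5 and 13 share a bucket; with a much larger
wheel they do not, which is the size versus collision trade-off.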

Example:
> Loaded symbols for /boot/kernel/profile.ko
> #0  sched_switch (td=0xffffffff81373e40, newtd=0xfffffe001aab5960,
>     flags=<value optimized out>) at ../../../kern/sched_ule.c:1954
> 1954            cpuid = PCPU_GET(cpuid);
> (kgdb) p callwheelsize
> $1 = 2097152
> Current language:  auto; currently minimal
> (kgdb) # .(16:06:31)(root@dan)
> /usr/home/alfred # sysctl -a | grep hz
> kern.clockrate: { hz = 1000, tick = 1000, profhz = 8128, stathz = 127 }
> kern.dcons.poll_hz: 25
> kern.hz: 1000
> debug.psm.hz: 20
> .(16:06:37)(root@dan)
> /usr/home/alfred # 2097152
> .(16:06:40)(root@dan)
> /usr/home/alfred # bc
> 2097152 / 1000
> 2097
> ^D# .(16:06:56)(root@dan)
> /usr/home/alfred # sysctl kern.maxusers
> kern.maxusers: 3406

So basically on this box there are enough callwheel slots for something
like 2097 seconds, or about 35 minutes, into the future.
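
The arithmetic from the bc session can be written down as a tiny helper.
This is only a sketch: wheel_window_seconds() is a made-up name, and it
assumes one wheel slot per tick with hz ticks per second.

```c
/*
 * Hypothetical helper: whole seconds of future time the wheel covers
 * before bucket indices wrap, assuming one slot per tick.
 */
unsigned long
wheel_window_seconds(unsigned long callwheelsize, unsigned long hz)
{
	return (callwheelsize / hz);
}
```

For callwheelsize = 2097152 and hz = 1000 this gives the 2097 seconds
shown above.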

I would assume that a machine that was capped at 384 maxusers would wind 
up with something that could handle callouts up to ~3 minutes in the 
future without wraparound and collisions.

As for ncallout, that is for timeout(9) support.  At a glance I'm not
aware of any users of timeout(9) that are not "per device", so there is
unlikely to be a need for a pre-allocated timeout(9) callout per
process/file.  More likely we need something like N-devices*4, which is
fine at something way lower than the max allocated at 384 maxusers
before all the changes we have made.

I could be wrong, but I still believe that it would be quite the system
that would need more than
callout=get_callout_from_maxusers(min(maxusers, 384));
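
Spelled out, the clamp could look like the sketch below.  The formulas
are an assumption on my part, modeled on the historic subr_param.c
defaults (maxproc = 20 + 16 * maxusers, maxfiles = 2 * maxproc,
ncallout = 16 + maxproc + maxfiles); get_callout_from_maxusers() is the
pseudo-function from the line above and clamped_ncallout() is a made-up
name, neither is an existing kernel function.

```c
/* The cap suggested above. */
#define	MAXUSERS_CALLOUT_CAP	384

/*
 * Assumed sizing, modeled on the historic defaults:
 *   maxproc  = 20 + 16 * maxusers
 *   maxfiles = 2 * maxproc
 *   ncallout = 16 + maxproc + maxfiles
 */
int
get_callout_from_maxusers(int maxusers)
{
	int maxproc = 20 + 16 * maxusers;
	int maxfiles = 2 * maxproc;

	return (16 + maxproc + maxfiles);
}

/* Hypothetical wrapper: size ncallout as if maxusers never exceeded the cap. */
int
clamped_ncallout(int maxusers)
{
	int capped = maxusers < MAXUSERS_CALLOUT_CAP ?
	    maxusers : MAXUSERS_CALLOUT_CAP;

	return (get_callout_from_maxusers(capped));
}
```

Under those assumed formulas, a box with maxusers of 3406 gets the same
ncallout as one with 384, and smaller machines are unaffected.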

Functions calling this function: timeout

   File                 Function                  Line
0 si.c                 si_start                  1439 pp->lstart_ch = timeout(si_lstart, (caddr_t)pp, time);
1 sio.c                siobusycheck              1269 timeout(siobusycheck, com, hz / 100);
2 sio.c                siopoll                   1744 timeout(siobusycheck, com, hz / 100);
3 sio.c                siosettimeout             2203 sio_timeout_handle = timeout(comwakeup, (void *)NULL,
4 sio.c                comwakeup                 2220 sio_timeout_handle = timeout(comwakeup, (void *)NULL, sio_timeout);
5 syscons.c            scrn_timer                1834 timeout(scrn_timer, sc, hz / 10);
6 syscons.c            scrn_timer                1884 timeout(scrn_timer, sc, hz / 10);
7 syscons.c            scrn_timer                1902 timeout(scrn_timer, sc, hz / 25);
8 syscons.c            blink_screen              3847 timeout(blink_screen, scp, hz / 10);
9 trm.c                trm_ExecuteSRB             478 ccb->ccb_h.timeout_ch = timeout(trmtimeout, (caddr_t)srb, (ccb->ccb_h.timeout * hz) / 1000);
a tws_cam.c            tws_execute_scsi           782 ccb_h->timeout_ch = timeout(tws_timeout, req, (ccb_h->timeout * hz)/1000);
b tws_cam.c            tws_send_scsi_cmd          820 req->thandle = timeout(tws_timeout, req, (TWS_IO_TIMEOUT * hz));
c tws_cam.c            tws_set_param              867 req->thandle = timeout(tws_timeout, req, (TWS_IOCTL_TIMEOUT * hz));
d tws_services.c       tws_print_stats            398 timeout(tws_print_stats, sc, 300*hz);
e if_wl.c              wlstart                   1022 sc->watchdog_ch = timeout(wlwatchdog, sc, 10);
f spic.c               spictimeout                429 sc->sc_timeout_ch = timeout(spictimeout, sc, spic_pollrate);
g spic.c               spictimeout                442 sc->sc_timeout_ch = timeout(spictimeout, sc, spic_pollrate);
h spic.c               spicopen                   459 timeout(spictimeout, sc, spic_pollrate);
i kern_cons.c          sysbeep                    624 timeout(sysbeepstop, (void *)NULL, period);
j kern_fail.c          fail_point_sleep           133 timeout(fp->fp_sleep_fn, fp->fp_sleep_arg, timo);
k aarp.c               aarptimer                  128 aarptimer_ch = timeout(aarptimer, NULL, AARPT_AGE * hz);
l aarp.c               aarptnew                   580 aarptimer_ch = timeout(aarptimer, (caddr_t)0, hz);
m ng_btsocket_l2cap.c  ng_btsocket_l2cap_timeout 2663 pcb->timo = timeout(ng_btsocket_l2cap_process_timeout, pcb,
n ng_btsocket_rfcomm.c ng_btsocket_rfcomm_timeou 3449 pcb->timo = timeout(ng_btsocket_rfcomm_process_timeout, pcb,
o ng_fec.c             ng_fec_init                642 priv->fec_ch = timeout(ng_fec_tick, priv, hz);
p ng_fec.c             ng_fec_tick                717 priv->fec_ch = timeout(ng_fec_tick, priv, hz);
q key.c                key_timehandler           4551 (void )timeout((void *)key_timehandler, (void *)0, hz);
r key.c                key_init                  7776 timeout((void *)key_timehandler, (void *)0, hz);
s ncp_subr.c           ncp_init                   107 ncp_timer_handle = timeout(ncp_timer, NULL, NCP_TIMER_TICK);
t fdc.c                fd_turnon                 1186 timeout(fd_motor_on, fd, hz);
u fdc.c                fdstate                   1786 fd->toffhandle = timeout(fd_turnoff, fd, 4 * hz);
v fdc.c                fdstate                   1877 timeout(fd_pseudointr, fdc, hz / 16);
w fdc.c                fdstate                   2092 fd->tohandle = timeout(fd_iotimeout, fdc, hz);
x fdc.c                fdstate                   2101 fd->tohandle = timeout(fd_iotimeout, fdc, hz);
y fdc.c                fdstate                   2218 timeout(fd_pseudointr, fdc, hz / 8);


   File                 Function                  Line
0 olpt.c               lptopen                    421 timeout (lptout, (caddr_t)sc,
1 olpt.c               lptout                     440 timeout (lptout, (caddr_t)sc, sc->sc_backoff);
2 pckbd.c              pckbd_timeout              260 timeout(pckbd_timeout, arg, hz/10);
3 sio.c                sioattach                 2012 timeout(siobusycheck, com, hz / 100);
4 sio.c                sioattach                 2696 timeout(siobusycheck, com, hz / 100);
5 sio.c                sioattach                 3330 sio_timeout_handle = timeout(comwakeup, (void *)NULL,
6 sio.c                sioattach                 3347 sio_timeout_handle = timeout(comwakeup, (void *)NULL, sio_timeout);
7 sio.c                sioattach                 3933 timeout(pc98_check_msr, (caddr_t)dev,
8 sio.c                sioattach                 3951 timeout(pc98_check_msr, (caddr_t)dev,
9 ncr.c                ncr_timeout               5171 timeout (ncr_timeout, (caddr_t) np, step ? step : 1);


-Alfred