From: Evgeniy <ekamyshev@omsk.multinex.ru>
To: freebsd-net@freebsd.org
Date: Sat, 23 Jan 2010 03:32:11 +0600
Subject: Netgraph performance with ng_ipfw

Hi,

I have several routers under heavy load, running FreeBSD 7.2. These
routers use Netgraph to implement traffic shaping and accounting (using
ng_car and ng_netflow nodes). Packets are passed from the firewall to
Netgraph using the following rules:

accounting:

        netgraph 100 ip from any to any in

shaping:

        netgraph tablearg ip from any to table(118) out
        netgraph tablearg ip from table(118) to any in

Table 118 contains users' IP addresses, with the tablearg referencing the
individual ng_car node configured for each user. At peak, there are
1500-2000 entries in the table, and as many configured nodes.

The problem is that at peak load the router loses packets. After studying
the sources and doing some debugging, it became clear that packets are
being dropped at the Netgraph queue, in the ng_alloc_item function:

static __inline item_p
ng_alloc_item(int type, int flags)
{
        item_p item;

        KASSERT(((type & ~NGQF_TYPE) == 0),
            ("%s: incorrect item type: %d", __func__, type));

        item = uma_zalloc((type == NGQF_DATA) ? ng_qdzone : ng_qzone,
            ((flags & NG_WAITOK) ? M_WAITOK : M_NOWAIT) | M_ZERO);

        if (item) {
                item->el_flags = type;
#ifdef NETGRAPH_DEBUG
                mtx_lock(&ngq_mtx);
                TAILQ_INSERT_TAIL(&ng_itemlist, item, all);
                allocated++;
                mtx_unlock(&ngq_mtx);
#endif
        }

        return (item);
}

It returns NULL if it is unable to allocate an entry in ng_qdzone. When it
is called from ng_package_data, this causes the packet to be dropped:

item_p
ng_package_data(struct mbuf *m, int flags)
{
        item_p item;

        if ((item = ng_alloc_item(NGQF_DATA, flags)) == NULL) {
                NG_FREE_M(m);
                return (NULL);
        }

        ITEM_DEBUG_CHECKS;
        item->el_flags |= NGQF_READER;
        NGI_M(item) = m;
        return (item);
}

After tuning the maxdata parameter (net.graph.maxdata), I was able to
decrease the losses (and increase delays), but the question is: why does
the system not keep some kind of counter of packets dropped at the
Netgraph queue? It seems a trivial task to add, for example, a sysctl
variable reflecting the number of dropped packets, and it would really
simplify things.
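For illustration, here is a minimal sketch of what such a counter could
look like, hung off the existing net.graph sysctl tree; the variable
ng_pkt_drops, the sysctl name qdrops, and the placement are my own
invention, not existing code:

/* Hypothetical counter of packets dropped on item allocation failure. */
static u_long ng_pkt_drops = 0;
SYSCTL_ULONG(_net_graph, OID_AUTO, qdrops, CTLFLAG_RD,
    &ng_pkt_drops, 0, "Packets dropped due to ng_alloc_item() failure");

item_p
ng_package_data(struct mbuf *m, int flags)
{
        item_p item;

        if ((item = ng_alloc_item(NGQF_DATA, flags)) == NULL) {
                atomic_add_long(&ng_pkt_drops, 1); /* count the drop */
                NG_FREE_M(m);
                return (NULL);
        }

        ITEM_DEBUG_CHECKS;
        item->el_flags |= NGQF_READER;
        NGI_M(item) = m;
        return (item);
}

The counter could then be watched as net.graph.qdrops while tuning
net.graph.maxdata, instead of guessing at the drops from indirect
symptoms.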
The second question is about the effectiveness of Netgraph queueing and
the ng_ipfw node with an SMP kernel. In the ng_ipfw_connect function, when
the node is connected to some other node, the hook is set to queueing mode
to avoid recursion:

/*
 * Set hooks into queueing mode, to avoid recursion between
 * netgraph layer and ip_{input,output}.
 */
static int
ng_ipfw_connect(hook_p hook)
{
        NG_HOOK_FORCE_QUEUE(hook);
        return (0);
}

This causes packets to be queued when they are passed back to the ng_ipfw
node. On SMP kernels, several kernel processes are created to service the
queues (they are seen as ng_queue* processes in ps). Here is the code of
ngthread, which processes the queue:

static void
ngthread(void *arg)
{
        for (;;) {
                node_p node;

                /* Get node from the worklist. */
                NG_WORKLIST_LOCK();
                while ((node = TAILQ_FIRST(&ng_worklist)) == NULL)
                        NG_WORKLIST_SLEEP();
                TAILQ_REMOVE(&ng_worklist, node, nd_work);
                NG_WORKLIST_UNLOCK();
                CTR3(KTR_NET, "%20s: node [%x] (%p) taken off worklist",
                    __func__, node->nd_ID, node);
                /*
                 * We have the node. We also take over the reference
                 * that the list had on it.
                 * Now process as much as you can, until it won't
                 * let you have another item off the queue.
                 * All this time, keep the reference
                 * that lets us be sure that the node still exists.
                 * Let the reference go at the last minute.
                 */
                for (;;) {
                        item_p item;
                        int rw;

                        NG_QUEUE_LOCK(&node->nd_input_queue);
                        item = ng_dequeue(&node->nd_input_queue, &rw);
                        if (item == NULL) {
                                atomic_clear_int(&node->nd_flags,
                                    NGF_WORKQ);
                                NG_QUEUE_UNLOCK(&node->nd_input_queue);
                                break; /* go look for another node */
                        } else {
                                NG_QUEUE_UNLOCK(&node->nd_input_queue);
                                NGI_GET_NODE(item, node); /* zaps stored node */
                                ng_apply_item(node, item, rw);
                                NG_NODE_UNREF(node);
                        }
                }
                NG_NODE_UNREF(node);
        }
}

It takes a node from ng_worklist and tries to process as many items from
its queue as possible, until ng_dequeue returns NULL (no more items). Note
that ng_worklist usually contains only one node, ng_ipfw, if the other
nodes did not configure queueing for themselves, which is the case with
the ng_car and ng_netflow nodes.

If a large number of packets is passed back to the ng_ipfw node from the
other nodes, one kernel process (ng_queue*) will simply take that single
node, and if packets are queued faster than ng_ipfw can process them
(i.e. send them on to ip_input or ip_output), that ng_queue* process will
consume 100% of one CPU core while the others process nothing.

I have seen exactly this behavior on my routers: at peak load, one of the
ng_queue* processes takes 100% of one core, and the other ng_queue*
processes are seen in top taking 0% of CPU. This seems to be a problem of
ng_ipfw: it does not work well with SMP. My question is, can it somehow be
fixed?
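One direction that might help, sketched here purely as an illustration:
instead of letting one thread drain a node's queue to exhaustion, dequeue
a single item and, if more work remains, put the node straight back on the
worklist before applying the item, so that another ng_queue* thread can
pick up the next item in parallel (at least for reader items, which is
what ng_package_data creates). The helper ng_worklist_readd() is
hypothetical, and the reference counting is hand-waved:

static void
ngthread(void *arg)
{
        for (;;) {
                node_p node;
                item_p item;
                int rw;

                /* Take a node off the worklist, as before. */
                NG_WORKLIST_LOCK();
                while ((node = TAILQ_FIRST(&ng_worklist)) == NULL)
                        NG_WORKLIST_SLEEP();
                TAILQ_REMOVE(&ng_worklist, node, nd_work);
                NG_WORKLIST_UNLOCK();

                /* Dequeue ONE item instead of draining the queue. */
                NG_QUEUE_LOCK(&node->nd_input_queue);
                item = ng_dequeue(&node->nd_input_queue, &rw);
                if (item == NULL) {
                        /* Queue empty: clear the flag, drop our ref. */
                        atomic_clear_int(&node->nd_flags, NGF_WORKQ);
                        NG_QUEUE_UNLOCK(&node->nd_input_queue);
                        NG_NODE_UNREF(node);
                        continue;
                }
                /*
                 * More items may remain: hypothetically re-insert the
                 * node into the worklist, handing back the reference
                 * we inherited from it, so another ng_queue* thread
                 * can start on the next item while we apply this one
                 * (the item holds its own node reference).
                 */
                ng_worklist_readd(node);
                NG_QUEUE_UNLOCK(&node->nd_input_queue);

                NGI_GET_NODE(item, node); /* zaps stored node */
                ng_apply_item(node, item, rw);
                NG_NODE_UNREF(node);
        }
}

Whether this would preserve the ordering guarantees netgraph makes for
items on one node's queue is a separate question, so take it only as a
sketch of the idea.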
The third question is about the algorithm for finding hooks in ng_ipfw.
When a packet is passed from the firewall, ng_ipfw_input is called, which
in turn calls the ng_ipfw_findhook1 function to find the hook matching the
cookie from struct ip_fw_args *fwa:

        if (fw_node == NULL ||
            (hook = ng_ipfw_findhook1(fw_node, fwa->cookie)) == NULL) {
                if (tee == 0)
                        m_freem(*m0);
                return (ESRCH); /* no hook associated with this rule */
        }

The ng_ipfw_findhook function, used for lookups by name, converts the name
to its numeric representation and likewise calls ng_ipfw_findhook1:

/* Look up hook by name */
hook_p
ng_ipfw_findhook(node_p node, const char *name)
{
        u_int16_t n;    /* numeric representation of hook */
        char *endptr;

        n = (u_int16_t)strtol(name, &endptr, 10);

        if (*endptr != '\0')
                return NULL;

        return ng_ipfw_findhook1(node, n);
}

and ng_ipfw_findhook1 simply walks the whole list of hooks to find the one
matching the given cookie:

/* Look up hook by rule number */
static hook_p
ng_ipfw_findhook1(node_p node, u_int16_t rulenum)
{
        hook_p hook;
        hpriv_p hpriv;

        LIST_FOREACH(hook, &node->nd_hooks, hk_hooks) {
                hpriv = NG_HOOK_PRIVATE(hook);
                if (NG_HOOK_IS_VALID(hook) && (hpriv->rulenum == rulenum))
                        return (hook);
        }

        return (NULL);
}

With a large number of hooks present, as in the configuration described at
the beginning of this message, this causes an obvious decrease in
performance: for each packet passed from ipfw to Netgraph, 1 to 1500-2000
list iterations are needed to find the matching hook. And again, it seems
a trivial task to rewrite this code to find the hook via a hash table or
even a plain array.
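To illustrate the "array" variant: since the cookie is a u_int16_t, a
direct-mapped table of 65536 hook pointers (512 KB on a 64-bit kernel)
would turn the lookup into a single index operation. Everything below is a
hypothetical sketch, not existing code; entries would be set on connect,
cleared again on disconnect, and updates would need to be synchronized
with concurrent lookups:

/* Hypothetical direct-mapped lookup table, indexed by rule cookie. */
static hook_p ng_ipfw_hooktab[65536];

/* Look up hook by rule number: O(1) instead of a list walk. */
static hook_p
ng_ipfw_findhook1(node_p node, u_int16_t rulenum)
{
        hook_p hook = ng_ipfw_hooktab[rulenum];

        if (hook != NULL && NG_HOOK_IS_VALID(hook))
                return (hook);
        return (NULL);
}

/* On connect, record the hook under its rule number. */
static int
ng_ipfw_connect(hook_p hook)
{
        hpriv_p hpriv = NG_HOOK_PRIVATE(hook);

        ng_ipfw_hooktab[hpriv->rulenum] = hook;
        NG_HOOK_FORCE_QUEUE(hook);
        return (0);
}

Since ng_ipfw is a single persistent node, a file-scope table does not
seem unreasonable; if 512 KB is considered too much, a small hash table
keyed on the rule number would give nearly the same effect.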