From: Evgeniy <ekamyshev@omsk.multinex.ru>
To: freebsd-net@freebsd.org
Date: Sat, 23 Jan 2010 03:32:11 +0600
Subject: Netgraph performance with ng_ipfw

Hi,

I have several routers under heavy load, running FreeBSD 7.2. These
routers use Netgraph to implement traffic shaping and accounting (using
ng_car and ng_netflow nodes). Packets are passed from the firewall to
Netgraph using the following rules:

accounting:

        netgraph 100 ip from any to any in

shaping:

        netgraph tablearg ip from any to table(118) out
        netgraph tablearg ip from table(118) to any in

Table 118 contains users' IP addresses, with the tablearg referencing the
individual ng_car node configured for each user. At peak, there are
1500-2000 entries in the table, and as many configured nodes.

The problem is that at peak load the router loses packets. After studying
the sources and doing some debugging, it became clear that packets are
being dropped at the Netgraph queue, in the ng_alloc_item function:

static __inline item_p
ng_alloc_item(int type, int flags)
{
        item_p item;

        KASSERT(((type & ~NGQF_TYPE) == 0),
            ("%s: incorrect item type: %d", __func__, type));

        item = uma_zalloc((type == NGQF_DATA) ? ng_qdzone : ng_qzone,
            ((flags & NG_WAITOK) ? M_WAITOK : M_NOWAIT) | M_ZERO);

        if (item) {
                item->el_flags = type;
#ifdef NETGRAPH_DEBUG
                mtx_lock(&ngq_mtx);
                TAILQ_INSERT_TAIL(&ng_itemlist, item, all);
                allocated++;
                mtx_unlock(&ngq_mtx);
#endif
        }

        return (item);
}

It returns NULL if it is unable to allocate an entry in ng_qdzone. When it
is called from ng_package_data, this causes the packet to be dropped:

item_p
ng_package_data(struct mbuf *m, int flags)
{
        item_p item;

        if ((item = ng_alloc_item(NGQF_DATA, flags)) == NULL) {
                NG_FREE_M(m);
                return (NULL);
        }

        ITEM_DEBUG_CHECKS;
        item->el_flags |= NGQF_READER;
        NGI_M(item) = m;
        return (item);
}

After tuning the maxdata parameter (net.graph.maxdata), I was able to
decrease the losses (and increase delays), but the question is: why does
the system not keep some kind of counter of packets dropped at the
Netgraph queue? It seems a trivial task to add, for example, a sysctl
variable reflecting the number of dropped packets, and it would really
simplify things.
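For illustration, here is a minimal sketch of what such a counter could
look like, hung off the existing net.graph sysctl tree; the variable
ng_pkt_drops, the sysctl name qdrops, and the placement are my own
invention, not existing code:

/* Hypothetical counter of packets dropped on item allocation failure. */
static u_long ng_pkt_drops = 0;
SYSCTL_ULONG(_net_graph, OID_AUTO, qdrops, CTLFLAG_RD,
    &ng_pkt_drops, 0, "Packets dropped due to ng_alloc_item() failure");

item_p
ng_package_data(struct mbuf *m, int flags)
{
        item_p item;

        if ((item = ng_alloc_item(NGQF_DATA, flags)) == NULL) {
                atomic_add_long(&ng_pkt_drops, 1); /* count the drop */
                NG_FREE_M(m);
                return (NULL);
        }

        ITEM_DEBUG_CHECKS;
        item->el_flags |= NGQF_READER;
        NGI_M(item) = m;
        return (item);
}

The counter could then be watched as net.graph.qdrops while tuning
net.graph.maxdata, instead of guessing at the drops from indirect
symptoms.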
The second question is about the effectiveness of Netgraph queueing and
the ng_ipfw node with an SMP kernel. In the ng_ipfw_connect function, when
the node is connected to some other node, the hook is set to queueing mode
to avoid recursion:

/*
 * Set hooks into queueing mode, to avoid recursion between
 * netgraph layer and ip_{input,output}.
 */
static int
ng_ipfw_connect(hook_p hook)
{
        NG_HOOK_FORCE_QUEUE(hook);
        return (0);
}

This causes packets to be queued when they are passed back to the ng_ipfw
node. On SMP kernels, several kernel processes are created to service the
queues (they are seen as ng_queue* processes in ps). Here is the code of
ngthread, which processes the queue:

static void
ngthread(void *arg)
{
        for (;;) {
                node_p node;

                /* Get node from the worklist. */
                NG_WORKLIST_LOCK();
                while ((node = TAILQ_FIRST(&ng_worklist)) == NULL)
                        NG_WORKLIST_SLEEP();
                TAILQ_REMOVE(&ng_worklist, node, nd_work);
                NG_WORKLIST_UNLOCK();
                CTR3(KTR_NET, "%20s: node [%x] (%p) taken off worklist",
                    __func__, node->nd_ID, node);
                /*
                 * We have the node. We also take over the reference
                 * that the list had on it.
                 * Now process as much as you can, until it won't
                 * let you have another item off the queue.
                 * All this time, keep the reference
                 * that lets us be sure that the node still exists.
                 * Let the reference go at the last minute.
                 */
                for (;;) {
                        item_p item;
                        int rw;

                        NG_QUEUE_LOCK(&node->nd_input_queue);
                        item = ng_dequeue(&node->nd_input_queue, &rw);
                        if (item == NULL) {
                                atomic_clear_int(&node->nd_flags,
                                    NGF_WORKQ);
                                NG_QUEUE_UNLOCK(&node->nd_input_queue);
                                break; /* go look for another node */
                        } else {
                                NG_QUEUE_UNLOCK(&node->nd_input_queue);
                                NGI_GET_NODE(item, node); /* zaps stored node */
                                ng_apply_item(node, item, rw);
                                NG_NODE_UNREF(node);
                        }
                }
                NG_NODE_UNREF(node);
        }
}

It takes a node from ng_worklist and tries to process as many items from
its queue as possible, until ng_dequeue returns NULL (no more items). Note
that ng_worklist usually contains only one node, ng_ipfw, if the other
nodes did not configure queueing for themselves, which is the case with
the ng_car and ng_netflow nodes.

If a large number of packets is passed back to the ng_ipfw node from the
other nodes, one kernel process (ng_queue*) will simply take that single
node, and if packets are queued faster than ng_ipfw can process them
(i.e. send them on to ip_input or ip_output), that ng_queue* process will
consume 100% of one CPU core while the others process nothing.

I have seen exactly this behavior on my routers: at peak load, one of the
ng_queue* processes takes 100% of one core, and the other ng_queue*
processes are seen in top taking 0% of CPU. This seems to be a problem of
ng_ipfw: it does not work well with SMP. My question is, can it somehow be
fixed?
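One direction that might help, sketched here purely as an illustration:
instead of letting one thread drain a node's queue to exhaustion, dequeue
a single item and, if more work remains, put the node straight back on the
worklist before applying the item, so that another ng_queue* thread can
pick up the next item in parallel (at least for reader items, which is
what ng_package_data creates). The helper ng_worklist_readd() is
hypothetical, and the reference counting is hand-waved:

static void
ngthread(void *arg)
{
        for (;;) {
                node_p node;
                item_p item;
                int rw;

                /* Take a node off the worklist, as before. */
                NG_WORKLIST_LOCK();
                while ((node = TAILQ_FIRST(&ng_worklist)) == NULL)
                        NG_WORKLIST_SLEEP();
                TAILQ_REMOVE(&ng_worklist, node, nd_work);
                NG_WORKLIST_UNLOCK();

                /* Dequeue ONE item instead of draining the queue. */
                NG_QUEUE_LOCK(&node->nd_input_queue);
                item = ng_dequeue(&node->nd_input_queue, &rw);
                if (item == NULL) {
                        /* Queue empty: clear the flag, drop our ref. */
                        atomic_clear_int(&node->nd_flags, NGF_WORKQ);
                        NG_QUEUE_UNLOCK(&node->nd_input_queue);
                        NG_NODE_UNREF(node);
                        continue;
                }
                /*
                 * More items may remain: hypothetically re-insert the
                 * node into the worklist, handing back the reference
                 * we inherited from it, so another ng_queue* thread
                 * can start on the next item while we apply this one
                 * (the item holds its own node reference).
                 */
                ng_worklist_readd(node);
                NG_QUEUE_UNLOCK(&node->nd_input_queue);

                NGI_GET_NODE(item, node); /* zaps stored node */
                ng_apply_item(node, item, rw);
                NG_NODE_UNREF(node);
        }
}

Whether this would preserve the ordering guarantees netgraph makes for
items on one node's queue is a separate question, so take it only as a
sketch of the idea.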
The third question is about the algorithm for finding hooks in ng_ipfw.
When a packet is passed from the firewall, ng_ipfw_input is called, which
in turn calls the ng_ipfw_findhook1 function to find the hook matching the
cookie from struct ip_fw_args *fwa:

        if (fw_node == NULL ||
            (hook = ng_ipfw_findhook1(fw_node, fwa->cookie)) == NULL) {
                if (tee == 0)
                        m_freem(*m0);
                return (ESRCH); /* no hook associated with this rule */
        }

The ng_ipfw_findhook function, used for lookups by name, converts the name
to its numeric representation and likewise calls ng_ipfw_findhook1:

/* Look up hook by name */
hook_p
ng_ipfw_findhook(node_p node, const char *name)
{
        u_int16_t n;    /* numeric representation of hook */
        char *endptr;

        n = (u_int16_t)strtol(name, &endptr, 10);

        if (*endptr != '\0')
                return NULL;

        return ng_ipfw_findhook1(node, n);
}

and ng_ipfw_findhook1 simply walks the whole list of hooks to find the one
matching the given cookie:

/* Look up hook by rule number */
static hook_p
ng_ipfw_findhook1(node_p node, u_int16_t rulenum)
{
        hook_p hook;
        hpriv_p hpriv;

        LIST_FOREACH(hook, &node->nd_hooks, hk_hooks) {
                hpriv = NG_HOOK_PRIVATE(hook);
                if (NG_HOOK_IS_VALID(hook) && (hpriv->rulenum == rulenum))
                        return (hook);
        }

        return (NULL);
}

With a large number of hooks present, as in the configuration described at
the beginning of this message, this causes an obvious decrease in
performance: for each packet passed from ipfw to Netgraph, 1 to 1500-2000
list iterations are needed to find the matching hook. And again, it seems
a trivial task to rewrite this code to find the hook via a hash table or
even a plain array.
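To illustrate the "array" variant: since the cookie is a u_int16_t, a
direct-mapped table of 65536 hook pointers (512 KB on a 64-bit kernel)
would turn the lookup into a single index operation. Everything below is a
hypothetical sketch, not existing code; entries would be set on connect,
cleared again on disconnect, and updates would need to be synchronized
with concurrent lookups:

/* Hypothetical direct-mapped lookup table, indexed by rule cookie. */
static hook_p ng_ipfw_hooktab[65536];

/* Look up hook by rule number: O(1) instead of a list walk. */
static hook_p
ng_ipfw_findhook1(node_p node, u_int16_t rulenum)
{
        hook_p hook = ng_ipfw_hooktab[rulenum];

        if (hook != NULL && NG_HOOK_IS_VALID(hook))
                return (hook);
        return (NULL);
}

/* On connect, record the hook under its rule number. */
static int
ng_ipfw_connect(hook_p hook)
{
        hpriv_p hpriv = NG_HOOK_PRIVATE(hook);

        ng_ipfw_hooktab[hpriv->rulenum] = hook;
        NG_HOOK_FORCE_QUEUE(hook);
        return (0);
}

Since ng_ipfw is a single persistent node, a file-scope table does not
seem unreasonable; if 512 KB is considered too much, a small hash table
keyed on the rule number would give nearly the same effect.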