From: Andreas Longwitz
Date: Thu, 18 Jun 2020 22:08:30 +0200
To: freebsd-net@freebsd.org
Subject: Re: pxeboot is very slow on servers sending gratuitous ARP probably caused by commit r317887
Message-ID: <5EEBC9BE.6020206@incore.de>
In-Reply-To: <5EE28A7D.7090406@incore.de>

In the meantime I did some more research. I am now sure that since
commit r317887 pxeboot is not robust when ARP packets come in. Before
this commit, pxe.c had a function readudp() which used PXENV_UDP_READ to
get data from the NFS server, so ARP packets never reached the loader.
Now pxe.c reads data from the NFS server in the function
pxe_netif_receive() using the PXENV_UNDI_ISR API. In my network a lot of
ARP packets come in; they can be seen in arp.c when ARP_DEBUG is enabled.

I have a sneaking suspicion that the incoming gratuitous ARPs sent by
many of the servers on my network are not handled correctly in
pxe_netif_receive(), and that this is the cause of the slowness. At home,
on a small network without gratuitous ARPs, the current pxeboot runs at
normal speed.
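
To measure how many of the received frames are broadcast or gratuitous ARPs, a small classifier like the one below could be called from a debug build of the receive path. This is only a sketch: the function and macro names are hypothetical and are not part of pxe.c or libsa. It assumes untagged Ethernet II frames (no VLAN header) and ARP for IPv4 over Ethernet, where EtherType 0x0806 marks ARP and a gratuitous ARP carries the same sender and target IP address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for this sketch; not taken from pxe.c or libsa. */
#define SK_ETH_ADDR_LEN   6
#define SK_ETH_HDR_LEN    14      /* dst(6) + src(6) + ethertype(2) */
#define SK_ETHERTYPE_ARP  0x0806
#define SK_ARP_SPA_OFF    (SK_ETH_HDR_LEN + 14)  /* sender IPv4 address */
#define SK_ARP_TPA_OFF    (SK_ETH_HDR_LEN + 24)  /* target IPv4 address */
#define SK_ARP_FRAME_LEN  (SK_ETH_HDR_LEN + 28)  /* header + IPv4 ARP body */

/* Return 1 if the frame carries the ARP EtherType, else 0. */
static int
sk_is_arp(const uint8_t *frame, size_t len)
{
	if (frame == NULL || len < SK_ETH_HDR_LEN)
		return (0);
	return (((frame[12] << 8) | frame[13]) == SK_ETHERTYPE_ARP);
}

/* Return 1 if the frame is ARP addressed to ff:ff:ff:ff:ff:ff, else 0. */
static int
sk_is_broadcast_arp(const uint8_t *frame, size_t len)
{
	static const uint8_t bcast[SK_ETH_ADDR_LEN] =
	    { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

	if (!sk_is_arp(frame, len))
		return (0);
	return (memcmp(frame, bcast, SK_ETH_ADDR_LEN) == 0);
}

/* Return 1 if the frame is ARP with sender IP equal to target IP, else 0. */
static int
sk_is_gratuitous_arp(const uint8_t *frame, size_t len)
{
	if (!sk_is_arp(frame, len) || len < SK_ARP_FRAME_LEN)
		return (0);
	return (memcmp(frame + SK_ARP_SPA_OFF, frame + SK_ARP_TPA_OFF, 4) == 0);
}
```

Counting the frames for which these functions return 1 while a kernel is being loaded would show whether the slowdown correlates with the ARP load on the network.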

The following patch for r360998 removes FLTR_BRDCST from the UNDI packet
filter, which disables all broadcast ARPs. With it, pxeboot again runs at
normal speed on all of my networks:

--- pxe.c.orig	2019-01-13 08:25:55.000000000 +0100
+++ pxe.c	2020-06-15 21:30:19.000000000 +0200
@@ -424,7 +424,7 @@
 	if (undi_open_p == NULL)
 		return;
 	bzero(undi_open_p, sizeof(*undi_open_p));
-	undi_open_p->PktFilter = FLTR_DIRECTED | FLTR_BRDCST;
+	undi_open_p->PktFilter = FLTR_DIRECTED;
 	pxe_call(PXENV_UNDI_OPEN, undi_open_p);
 	if (undi_open_p->Status != 0)
 		printf("undi open failed: %x\n", undi_open_p->Status);

Enabling ARP_DEBUG in libsa/arp.c requires the following small patch:

--- arp.c.orig	2018-12-17 16:13:58.000000000 +0100
+++ arp.c	2020-06-18 21:51:37.920045000 +0200
@@ -178,7 +178,7 @@
 	if (n == -1 || n < sizeof(struct ether_arp)) {
 #ifdef ARP_DEBUG
 		if (debug)
-			printf("bad len=%d\n", n);
+			printf("bad len=%zd\n", n);
 #endif
 		free(ptr);
 		return (-1);

I could not figure out why, or in which way, ARPs are not always handled
correctly. I do not understand the PXENV_UNDI_ISR API used in
pxe_netif_receive(). The well-known PXE documentation pxespec.pdf
describes it, but only for the case where the program itself handles the
PIC interrupt, and we have no assembler code that does this.

I would be grateful if somebody could point me to documentation of the
PXENV_UNDI_ISR API as it is used in pxe_netif_receive().

Andreas Longwitz