Date: Mon, 14 Jul 2003 16:53:01 -0500 (CDT)
From: Mike Silbersack
To: arch@freebsd.org
Subject: 4.x mbuf binary compatibility; can it be broken?
Message-ID: <20030714164426.R8225@odysseus.silby.com>

In the process of hunting down reported panics in xl_newbuf, I've come to the conclusion that the panics are a result of mbuf cluster refcounts overflowing. This is not too surprising, as we use an array of chars to store the refcounts. (-current uses ints, and doesn't have this problem.)

It's easy enough to switch from a char to an int array in 4.x to fix the problem there, but there is a problem: Our friendly mbuf macros (MCLALLOC and MCLFREE) manipulate the refcount. This means that 3rd party modules which use the macros will no longer work properly. Hence, the question posed on the subject line.

Aside from putting hacks in many of the mbuf functions so that they avoid reference counts growing into the danger zone, there's no solution to the problem that I can see.

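To make the overflow concrete, here is a minimal userland sketch of a char-sized counter wrapping. It is not the 4.x kernel code: the function names are hypothetical, and it uses unsigned char so that the wraparound is well defined in C, whereas the real array is plain char, whose signedness depends on the platform.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical model of one cluster's refcount, for illustration only.
 * unsigned char is an assumption made here so the wrap is well defined.
 */

/* Take nrefs references on a char-sized counter; return the stored count. */
unsigned int
narrow_refs_after(unsigned int nrefs)
{
	unsigned char refcnt = 0;
	unsigned int i;

	for (i = 0; i < nrefs; i++)
		refcnt++;	/* wraps back to 0 once it passes UCHAR_MAX */
	return (refcnt);
}

/* The same loop with an int-sized counter, as the mail says -current uses. */
unsigned int
wide_refs_after(unsigned int nrefs)
{
	unsigned int refcnt = 0;
	unsigned int i;

	for (i = 0; i < nrefs; i++)
		refcnt++;
	return (refcnt);
}

/*
 * Returns 1 when the narrow counter reads 0 even though nrefs holders
 * still exist, i.e. the cluster would look free while it is in use.
 */
int
narrow_looks_free(unsigned int nrefs)
{
	assert(nrefs > 0);
	return (narrow_refs_after(nrefs) == 0);
}
```

With these definitions, UCHAR_MAX references are still counted correctly by the narrow counter, but one more reference makes it read 0, while the int-sized counter keeps the true value.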
So, what's our policy on ABI breakage for modules? It'd be nice to ignore this problem, but the xl-related PRs filed which seem to describe this exact problem are too numerous to ignore. (No, this isn't if_xl's fault; it's simply a victim because it properly uses its descriptor lists, thereby using mbuf cluster refcounts rather than packet copies as many cheaper NICs are required to do.)

Thanks,

Mike "Silby" Silbersack