From owner-freebsd-net@FreeBSD.ORG Tue Sep 25 20:07:23 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 257CF106564A for ; Tue, 25 Sep 2012 20:07:23 +0000 (UTC) (envelope-from rysto32@gmail.com) Received: from mail-qa0-f47.google.com (mail-qa0-f47.google.com [209.85.216.47]) by mx1.freebsd.org (Postfix) with ESMTP id CFCF18FC15 for ; Tue, 25 Sep 2012 20:07:22 +0000 (UTC) Received: by qafi29 with SMTP id i29so3952643qaf.13 for ; Tue, 25 Sep 2012 13:07:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=f+t0sShP6epQamAwT8MT/38yYk9JKK9Ty25Ia4LdtvA=; b=OuFxt3GwB+UvtvxON4HYY81H3UXF42jGFDDDaWUtAiPZuOKDhICiaLdicCGCPtvWof r3witgASN6kmTEUpZVG+3KRYyJ9Ge1Sb+4ccgeFydeOiDju8ogaBwtXwkiQAsrurKHp4 BVAv/r5QyUgJdKfFdxpfQhVWCwo7aIKxYnR/z3UqrjRlpVAYAS6BC6dQHrsqcyi8tG6x KQZBj3pUDCkC2lV1Ee1AWyBBMUR2gYKqPX/rnxzHmn3MqGCweU3mijlyKwZFtDfGVj/L HnVzUgcGrXIzv/Xwl3zAKDlJ3TLlNgY/EHBktBTvYwu/L4dtf2WB1PE9dq+z1FEvzgr+ M5cg== MIME-Version: 1.0 Received: by 10.229.136.17 with SMTP id p17mr9082604qct.139.1348603641882; Tue, 25 Sep 2012 13:07:21 -0700 (PDT) Received: by 10.49.50.103 with HTTP; Tue, 25 Sep 2012 13:07:21 -0700 (PDT) Date: Tue, 25 Sep 2012 16:07:21 -0400 Message-ID: From: Ryan Stone To: freebsd-net Content-Type: text/plain; charset=ISO-8859-1 Subject: netgraph: item->depth not set properly for some queued items X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Sep 2012 20:07:23 -0000 When ng_snd_item tries to send an item, has to acquire a "lock" on the node that it's sending to (I put lock in quotes because this isn't a standard FreeBSD synchronization primitive like a rw_lock or anything; netgraph has rolled its own synchronization primitive using atomic ops). Netgraph's node locks are not blocking locks (presumably to prevent deadlocks from cyclic graphs), so if ng_acquire_read/ng_acquire_write fails to get a lock of the proper type, it queues the item for processing from one of the ngthreads. When it hits this code path, item->depth is not updated. This means that after the message has been delivered by ngthread calling ng_apply_item, the following code does not send the return value of the rcvmsg/rcvdata method to the apply callback (note that depth must be 1 for it to pass the error through): /* Apply callback. */ if (apply != NULL) { if (depth == 1 && error != 0) apply->error = error; if (refcount_release(&apply->refs)) (*apply->apply)(apply->context, apply->error); } In the case that I saw, ngctl sent a message to a node and the item got enqueued. The node's rcvmsg method returned an error, but the error didn't propagate back to ngctl because the depth wasn't set right. The following patch resolves this particular issue for me but I don't entirely buy that it's the correct fix. I don't really understand why the depth should be forced to 1 when the item is enqueued, but it does match what's already done in another code path in ng_snd_item where an item is queued: Index: ng_base.c =================================================================== --- ng_base.c (revision 240923) +++ ng_base.c (working copy) @@ -2008,6 +2008,7 @@ NGI_SET_WRITER(item); else NGI_SET_READER(item); + item->depth = 1; NG_QUEUE_LOCK(ngq); /* Set OP_PENDING flag and enqueue the item. */ @@ -2286,7 +2287,6 @@ } if (queue) { - item->depth = 1; /* Put it on the queue for that node*/ ng_queue_rw(node, item, rw); return ((flags & NG_PROGRESS) ? EINPROGRESS : 0); If any netgraph experts have thoughts on this please chime in as I'm really not that confident in my understanding of the subtleties in netgraph.