From: Julian Elischer
To: FreeBSD Stable, freebsd
Subject: fix for use-after-free problem in 10.x
Date: Tue, 4 Oct 2016 20:06:12 -0700
Message-ID: <7b732876-8cc3-a638-7ff1-e664060d4907@freebsd.org>
List-Id: Production branch of FreeBSD source code

In 11 and 12 the taskqueue code has been rewritten in this area, but under 10 this bug still occurs. On our appliances this bug stops the system from mounting the ZFS root, so it is quite severe.

Basically, while the thread is sleeping during the ZFS mount of root (in the while loop), another thread can free the 'task' item it is checking in that loop, and the memory can be reused or filled with 'deadcode' etc., with the waiting code unaware of the change. The fix is to refetch the item at the end of the queue each time around the loop.

I don't really want to do the bigger change of MFCing the change in 11, as it is more extensive, though if someone else does, that's ok by me (if it's ABI compatible).

Any comments or suggestions?

Here's the fix in diff form:

[robot@porridge /usr/src]$ p4 diff -du ...
--- //depot/pbranches/jelischer/FreeBSD-PZ/10.3/sys/kern/subr_taskqueue.c	2016-09-27 09:14:59.000000000 -0700
+++ /usr/src/sys/kern/subr_taskqueue.c	2016-09-27 09:14:59.000000000 -0700
@@ -441,9 +441,10 @@
 
 	TQ_LOCK(queue);
 	task = STAILQ_LAST(&queue->tq_queue, task, ta_link);
-	if (task != NULL)
-		while (task->ta_pending != 0)
-			TQ_SLEEP(queue, task, &queue->tq_mutex, PWAIT, "-", 0);
+	while (task != NULL && task->ta_pending != 0) {
+		TQ_SLEEP(queue, task, &queue->tq_mutex, PWAIT, "-", 0);
+		task = STAILQ_LAST(&queue->tq_queue, task, ta_link);
+	}
 	taskqueue_drain_running(queue);
 	KASSERT(STAILQ_EMPTY(&queue->tq_queue),
 	    ("taskqueue queue is not empty after draining"));
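
To make the refetch pattern easier to see outside the kernel, here is a minimal user-space sketch in C. It is not the kernel code: struct item, struct queue, enqueue(), queue_last(), sleep_while_worker_runs() and drain() are all hypothetical names standing in for struct task, struct taskqueue, STAILQ_LAST and TQ_SLEEP, and the "sleep" is only simulated by a function that completes and frees the head item, the way another thread might while the waiter is blocked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct task. */
struct item {
	int pending;
	struct item *next;
};

/* Hypothetical stand-in for struct taskqueue. */
struct queue {
	struct item *head;
	struct item *tail;
};

/* Append a new item with the given pending count to the tail. */
static void
enqueue(struct queue *q, int pending)
{
	struct item *it = malloc(sizeof(*it));

	assert(it != NULL);
	it->pending = pending;
	it->next = NULL;
	if (q->tail != NULL)
		q->tail->next = it;
	else
		q->head = it;
	q->tail = it;
}

/* Stand-in for STAILQ_LAST: current tail item, or NULL if empty. */
static struct item *
queue_last(struct queue *q)
{
	return (q->tail);
}

/*
 * Stand-in for TQ_SLEEP (assumption: one task completes per wakeup).
 * While the waiter "sleeps", a worker dequeues the head item, marks
 * it done and frees it, so pointers to it become invalid.
 */
static void
sleep_while_worker_runs(struct queue *q)
{
	struct item *it = q->head;

	if (it == NULL)
		return;
	q->head = it->next;
	if (q->head == NULL)
		q->tail = NULL;
	it->pending = 0;
	free(it);
}

/*
 * Fixed loop shape: refetch the tail after every wakeup instead of
 * trusting the pointer read before sleeping. Returns the wakeup count.
 */
static int
drain(struct queue *q)
{
	struct item *it = queue_last(q);
	int wakeups = 0;

	while (it != NULL && it->pending != 0) {
		sleep_while_worker_runs(q);
		wakeups++;
		it = queue_last(q);	/* the old pointer may be freed */
	}
	return (wakeups);
}
```

With three pending items queued, drain() wakes three times and leaves the queue empty. A loop that kept the pointer fetched before the first sleep would, on the last pass, test the pending field of an item that had already been freed, which is the failure described above.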