Date: Sun, 23 Mar 2014 13:03:00 +0100
From: Henrik Gulbrandsen <henrik@gulbra.net>
To: bug-followup@freebsd.org, freebsd-java@freebsd.org
Cc: Craig Rodrigues <rodrigc@freebsd.org>, Alan Cox <alc@freebsd.org>, Konstantin Belousov <kib@freebsd.org>
Subject: Re: kern/187238: vm.pmap.pcid_enabled="1" causes Java to coredump in FBSD 10
Message-ID: <831cb7b9f4719265e66a26edcf6c0859@www.gulbra.net>
This is the most time-consuming bug I've encountered in my life, and not only because I started looking for it in the JVM, but now it seems to have been hiding in plain sight.

I'm pretty sure that pmap->pm_save is handled incorrectly in the current kernel. Judging from the code, it's supposed to include all CPUs where the pmap has been active since the latest call to pmap_invalidate_all(...). However, that means that it should always be a superset of pmap->pm_active, since any CPU where the pmap is active may cache pmap information at any time. Currently, this is not the case, and since only CPUs in pmap->pm_save are targeted in the TLB shootdown, we are left with inconsistencies that crash the process soon afterwards.

The attached patch solves this by only clearing a CPU from pmap->pm_save if it is not currently included in pmap->pm_active. As far as I can tell, that eliminates the bug. The patch is against STABLE, since that's what I'm currently running, but CURRENT should be pretty close, except for the default setting of pmap_pcid_enabled.

By the way, the logic in the invalidation functions is a bit messy now and can probably be simplified. Also, is there a good reason for ignoring the pmap argument in smp_masked_invltlb(...)?

/Henrik

P.S. After five days it turns out that mx1.FreeBSD.org has been rejecting this email due to a slight misconfiguration of my mail server. I hope that I haven't caused too many hours of frustration by this failure to report the bug fix in due time. Anyway, in the meantime my test (java/openjdk6 building itself) has been running continuously in the background. It used to fail almost every single time, but has now gone through 765 iterations without a single crash. I believe that indicates that the bug is fixed.
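The invariant described in the message above (pm_save must always be a superset of pm_active) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the attached patch or the kernel code: `toy_cpuset`, `struct toy_pmap`, and every function name below are hypothetical, CPU sets are modelled as plain 64-bit masks rather than the kernel's `cpuset_t`, and the "unguarded" clear is an assumed stand-in for whatever path in the unpatched kernel removes an active CPU from pm_save.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical user-space model: one bit per CPU, not the kernel's cpuset_t. */
typedef uint64_t toy_cpuset;

struct toy_pmap {
    toy_cpuset pm_active; /* CPUs currently running on this pmap */
    toy_cpuset pm_save;   /* CPUs that a TLB shootdown will target */
};

static toy_cpuset cpu_bit(unsigned int cpu)
{
    return (toy_cpuset)1 << cpu;
}

/* A CPU that switches to the pmap may start caching translations at once. */
static void toy_activate(struct toy_pmap *p, unsigned int cpu)
{
    p->pm_active |= cpu_bit(cpu);
    p->pm_save |= cpu_bit(cpu);
}

/* Switching away leaves the pm_save bit set: cached entries may survive. */
static void toy_deactivate(struct toy_pmap *p, unsigned int cpu)
{
    p->pm_active &= ~cpu_bit(cpu);
}

/* Assumed model of the bug: the bit is cleared even if the CPU is active. */
static void clear_saved_unguarded(struct toy_pmap *p, unsigned int cpu)
{
    p->pm_save &= ~cpu_bit(cpu);
}

/* Model of the fix described above: clear only if the CPU is not active. */
static void clear_saved_guarded(struct toy_pmap *p, unsigned int cpu)
{
    if ((p->pm_active & cpu_bit(cpu)) == 0)
        p->pm_save &= ~cpu_bit(cpu);
}

/* The invariant: every active CPU is also in pm_save. */
static int invariant_holds(const struct toy_pmap *p)
{
    return (p->pm_active & ~p->pm_save) == 0;
}

/* Active CPUs that a shootdown restricted to pm_save would skip. */
static toy_cpuset missed_cpus(const struct toy_pmap *p)
{
    return p->pm_active & ~p->pm_save;
}

#ifdef TOY_PMAP_DEMO
int main(void)
{
    struct toy_pmap p = { 0, 0 };

    toy_activate(&p, 0);
    toy_activate(&p, 1);
    clear_saved_unguarded(&p, 1); /* CPU 1 is still active */
    printf("unguarded: invariant=%d missed=0x%" PRIx64 "\n",
        invariant_holds(&p), missed_cpus(&p));

    p.pm_save |= cpu_bit(1);      /* restore, then retry with the guard */
    clear_saved_guarded(&p, 1);
    printf("guarded:   invariant=%d missed=0x%" PRIx64 "\n",
        invariant_holds(&p), missed_cpus(&p));
    return 0;
}
#endif
```

Compiling with `-DTOY_PMAP_DEMO` builds the small demonstration in `main`. In this model a CPU that appears in `missed_cpus()` keeps running with translations that a shootdown limited to pm_save never flushes, which corresponds to the inconsistency the message describes.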
From owner-freebsd-java@FreeBSD.ORG Sun Mar 23 12:28:06 2014
Date: Thu, 20 Mar 2014 20:02:32 +0100
From: Henrik Gulbrandsen <henrik@gulbra.net>
To: freebsd-java <freebsd-java@freebsd.org>
Cc: Craig Rodrigues <rodrigc@freebsd.org>, Alan Cox <alc@freebsd.org>, Konstantin Belousov <kib@freebsd.org>
Subject: Re: kern/187238: vm.pmap.pcid_enabled="1" causes Java to coredump in FBSD 10
Message-ID: <2f02a3e211b23241c496fbff48ea85de@www.gulbra.net>

It looks like any attempt to send an email to this list without being subscribed means that you end up in moderation indefinitely, so I try to send again, just in case there are other people out there that have been working on this bug. I have been running my test (java/openjdk6 rebuilding itself) continuously for 59 hours now. That used to fail most of the time, but I haven't seen a single crash with this patch.

/Henrik

-------- Original Message --------
Subject: Re: kern/187238: vm.pmap.pcid_enabled="1" causes Java to coredump in FBSD 10
Date: 2014-03-18 11:54
From: Henrik Gulbrandsen <henrik@gulbra.net>
To: bug-followup@FreeBSD.org, freebsd-java@FreeBSD.org
Cc: Craig Rodrigues <rodrigc@FreeBSD.org>, Konstantin Belousov <kib@FreeBSD.org>, Alan Cox <alc@FreeBSD.org>

This is the most time-consuming bug I've encountered in my life, and not only because I started looking for it in the JVM, but now it seems to have been hiding in plain sight.

I'm pretty sure that pmap->pm_save is handled incorrectly in the current kernel. Judging from the code, it's supposed to include all CPUs where the pmap has been active since the latest call to pmap_invalidate_all(...). However, that means that it should always be a superset of pmap->pm_active, since any CPU where the pmap is active may cache pmap information at any time. Currently, this is not the case, and since only CPUs in pmap->pm_save are targeted in the TLB shootdown, we are left with inconsistencies that crash the process soon afterwards.

The attached patch solves this by only clearing a CPU from pmap->pm_save if it is not currently included in pmap->pm_active. As far as I can tell, that eliminates the bug.
The patch is against STABLE, since that's what I'm currently running, but CURRENT should be pretty close, except for the default setting of pmap_pcid_enabled.

By the way, the logic in the invalidation functions is a bit messy now and can probably be simplified. Also, is there a good reason for ignoring the pmap argument in smp_masked_invltlb(...)?

/Henrik

From owner-freebsd-java@FreeBSD.ORG Sun Mar 23 18:49:28 2014
Date: Sun, 23 Mar 2014 11:49:25 -0700
From: Craig Rodrigues <rodrigc@FreeBSD.org>
To: Henrik Gulbrandsen <henrik@gulbra.net>
Cc: Alan Cox <alc@freebsd.org>, Konstantin Belousov <kib@freebsd.org>, freebsd-java <freebsd-java@freebsd.org>
Subject: Re: kern/187238: vm.pmap.pcid_enabled="1" causes Java to coredump in FBSD 10
Message-ID: <CAG=rPVdg9MGEmGhw1-c2b0CQHq7uy2jh-1k7sf_CgxnZxGcmfg@mail.gmail.com>
In-Reply-To: <2f02a3e211b23241c496fbff48ea85de@www.gulbra.net>
References: <2f02a3e211b23241c496fbff48ea85de@www.gulbra.net>

On Thu, Mar 20, 2014 at 12:02 PM, Henrik Gulbrandsen <henrik@gulbra.net> wrote:
> It looks like any attempt to send an email to this list without being
> subscribed means that you end up in moderation indefinitely, so I try
> to send again, just in case there are other people out there that have
> been working on this bug. I have been running my test (java/openjdk6
> rebuilding itself) continuously for 59 hours now. That used to fail
> most of the time, but I haven't seen a single crash with this patch.
>
> /Henrik

Henrik,

Thanks for your persistence in analyzing the problem and coming up with a patch. I've assigned PR 187238 to kib@ for review.

--
Craig