Date:      Sun, 23 Mar 2014 13:03:00 +0100
From:      Henrik Gulbrandsen <henrik@gulbra.net>
To:        bug-followup@freebsd.org, freebsd-java@freebsd.org
Cc:        Craig Rodrigues <rodrigc@freebsd.org>, Alan Cox <alc@freebsd.org>, Konstantin Belousov <kib@freebsd.org>
Subject:   Re: kern/187238: vm.pmap.pcid_enabled="1" causes Java to coredump in FBSD 10
Message-ID:  <831cb7b9f4719265e66a26edcf6c0859@www.gulbra.net>

This is the most time-consuming bug I've encountered in my life, and not
only because I started looking for it in the JVM, but now it seems to
have been hiding in plain sight. I'm pretty sure that pmap->pm_save is
handled incorrectly in the current kernel. Judging from the code, it's
supposed to include all CPUs where the pmap has been active since the
latest call to pmap_invalidate_all(...). However, that means that it
should always be a superset of pmap->pm_active, since any CPU where the
pmap is active may cache pmap information at any time. Currently, this
is not the case, and since only CPUs in pmap->pm_save are targeted in
the TLB shootdown, we are left with inconsistencies that crash the
process soon afterwards.
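
To make the superset requirement concrete, here is a minimal user-space
sketch, not kernel code. It assumes CPU sets can be modeled as 64-bit
masks. The names toy_cpuset_t, struct toy_pmap, toy_cpu_bit,
toy_save_covers_active and toy_missed_cpus are hypothetical and exist
only for illustration; only the field names pm_active and pm_save come
from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy CPU set: bit n stands for CPU n (at most 64 CPUs). */
typedef uint64_t toy_cpuset_t;

/* Toy stand-in holding only the two fields discussed in the message. */
struct toy_pmap {
	toy_cpuset_t pm_active;	/* CPUs currently running with this pmap */
	toy_cpuset_t pm_save;	/* CPUs that may still cache its translations */
};

/* Return the mask with only the bit for the given CPU set. */
static toy_cpuset_t
toy_cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return ((toy_cpuset_t)1 << cpu);
}

/* True when every CPU in pm_active is also present in pm_save. */
static bool
toy_save_covers_active(const struct toy_pmap *pmap)
{
	return ((pmap->pm_active & ~pmap->pm_save) == 0);
}

/* CPUs that a shootdown aimed only at pm_save would fail to reach. */
static toy_cpuset_t
toy_missed_cpus(const struct toy_pmap *pmap)
{
	return (pmap->pm_active & ~pmap->pm_save);
}
```

For example, if pm_active holds CPUs 0 and 2 while pm_save holds only
CPU 0, toy_missed_cpus() returns the bit for CPU 2. In this model, that
is the CPU a shootdown limited to pm_save would skip.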

The attached patch solves this by only clearing a CPU from pmap->pm_save
if it is not currently included in pmap->pm_active. As far as I can
tell, that eliminates the bug. The patch is against STABLE, since that's
what I'm currently running, but CURRENT should be pretty close, except
for the default setting of pmap_pcid_enabled.
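
The rule described above can be sketched as a user-space toy model. This
is not the attached patch and does not reproduce any kernel function: it
assumes CPU sets are plain 64-bit masks, and the names toy_cpuset_t,
struct toy_pmap, toy_cpu_bit, toy_clear_save_unconditional and
toy_clear_save_guarded are hypothetical. The unguarded variant is
included only for contrast.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy CPU set: bit n stands for CPU n (at most 64 CPUs). */
typedef uint64_t toy_cpuset_t;

/* Toy stand-in holding only the two fields discussed in the message. */
struct toy_pmap {
	toy_cpuset_t pm_active;	/* CPUs currently running with this pmap */
	toy_cpuset_t pm_save;	/* CPUs that may still cache its translations */
};

/* Return the mask with only the bit for the given CPU set. */
static toy_cpuset_t
toy_cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return ((toy_cpuset_t)1 << cpu);
}

/* Contrast case: always drops the CPU, even if it is still active. */
static void
toy_clear_save_unconditional(struct toy_pmap *pmap, int cpu)
{
	pmap->pm_save &= ~toy_cpu_bit(cpu);
}

/* Guarded rule: drop the CPU from pm_save only if it is not active. */
static void
toy_clear_save_guarded(struct toy_pmap *pmap, int cpu)
{
	if ((pmap->pm_active & toy_cpu_bit(cpu)) == 0)
		pmap->pm_save &= ~toy_cpu_bit(cpu);
}
```

With the guarded variant, a pmap that starts with pm_save as a superset
of pm_active keeps that property after any number of clears, as long as
pm_active is unchanged. The unguarded variant can break it with a single
call on an active CPU.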

By the way, the logic in the invalidation functions is a bit messy now
and can probably be simplified. Also, is there a good reason for
ignoring the pmap argument in smp_masked_invltlb(...)?

/Henrik

P.S. After five days it turns out that mx1.FreeBSD.org has been
rejecting this email due to a slight misconfiguration of my mail server.
I hope that I haven't caused too many hours of frustration by this
failure to report the bug fix in due time. Anyway, in the meantime my
test (java/openjdk6 building itself) has been running continuously in
the background. It used to fail almost every single time, but has now
gone through 765 iterations without a single crash. I believe that
indicates that the bug is fixed.
From owner-freebsd-java@FreeBSD.ORG  Sun Mar 23 12:28:06 2014
To: freebsd-java <freebsd-java@freebsd.org>
Subject: Re: kern/187238: vm.pmap.pcid_enabled="1" causes Java to coredump in
 FBSD 10
X-PHP-Originating-Script: 0:rcmail.php
MIME-Version: 1.0
Date: Thu, 20 Mar 2014 20:02:32 +0100
From: Henrik Gulbrandsen <henrik@gulbra.net>
Message-ID: <2f02a3e211b23241c496fbff48ea85de@www.gulbra.net>
X-Sender: henrik@gulbra.net
User-Agent: Roundcube Webmail/0.9.0
Content-Type: text/plain; charset=UTF-8;
 format=flowed
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.17
Cc: Craig Rodrigues <rodrigc@freebsd.org>, Alan Cox <alc@freebsd.org>,
 Konstantin Belousov <kib@freebsd.org>
X-BeenThere: freebsd-java@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Porting Java to FreeBSD <freebsd-java.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-java>,
 <mailto:freebsd-java-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-java/>
List-Post: <mailto:freebsd-java@freebsd.org>
List-Help: <mailto:freebsd-java-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-java>,
 <mailto:freebsd-java-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 23 Mar 2014 12:28:06 -0000

It looks like any attempt to send an email to this list without being
subscribed means that you end up in moderation indefinitely, so I try
to send again, just in case there are other people out there that have
been working on this bug. I have been running my test (java/openjdk6
rebuilding itself) continuously for 59 hours now. That used to fail
most of the time, but I haven't seen a single crash with this patch.

/Henrik

-------- Original Message --------
Subject: Re: kern/187238: vm.pmap.pcid_enabled="1" causes Java to coredump in FBSD 10
Date: 2014-03-18 11:54
From: Henrik Gulbrandsen <henrik@gulbra.net>
To: bug-followup@FreeBSD.org, freebsd-java@FreeBSD.org
Cc: Craig Rodrigues <rodrigc@FreeBSD.org>, Konstantin Belousov <kib@FreeBSD.org>, Alan Cox <alc@FreeBSD.org>

This is the most time-consuming bug I've encountered in my life, and not
only because I started looking for it in the JVM, but now it seems to
have been hiding in plain sight. I'm pretty sure that pmap->pm_save is
handled incorrectly in the current kernel. Judging from the code, it's
supposed to include all CPUs where the pmap has been active since the
latest call to pmap_invalidate_all(...). However, that means that it
should always be a superset of pmap->pm_active, since any CPU where the
pmap is active may cache pmap information at any time. Currently, this
is not the case, and since only CPUs in pmap->pm_save are targeted in
the TLB shootdown, we are left with inconsistencies that crash the
process soon afterwards.

The attached patch solves this by only clearing a CPU from pmap->pm_save
if it is not currently included in pmap->pm_active. As far as I can
tell, that eliminates the bug. The patch is against STABLE, since that's
what I'm currently running, but CURRENT should be pretty close, except
for the default setting of pmap_pcid_enabled.

By the way, the logic in the invalidation functions is a bit messy now
and can probably be simplified. Also, is there a good reason for
ignoring the pmap argument in smp_masked_invltlb(...)?

/Henrik
From owner-freebsd-java@FreeBSD.ORG  Sun Mar 23 18:49:28 2014
MIME-Version: 1.0
X-Received: by 10.152.1.199 with SMTP id 7mr42990872lao.24.1395600565326; Sun,
 23 Mar 2014 11:49:25 -0700 (PDT)
Sender: crodr001@gmail.com
Received: by 10.112.169.68 with HTTP; Sun, 23 Mar 2014 11:49:25 -0700 (PDT)
In-Reply-To: <2f02a3e211b23241c496fbff48ea85de@www.gulbra.net>
References: <2f02a3e211b23241c496fbff48ea85de@www.gulbra.net>
Date: Sun, 23 Mar 2014 11:49:25 -0700
X-Google-Sender-Auth: uqjKjB-DKzPwGtHhDTJ05Si4HGM
Message-ID: <CAG=rPVdg9MGEmGhw1-c2b0CQHq7uy2jh-1k7sf_CgxnZxGcmfg@mail.gmail.com>
Subject: Re: kern/187238: vm.pmap.pcid_enabled="1" causes Java to coredump in
 FBSD 10
From: Craig Rodrigues <rodrigc@FreeBSD.org>
To: Henrik Gulbrandsen <henrik@gulbra.net>
Content-Type: text/plain; charset=ISO-8859-1
Cc: Alan Cox <alc@freebsd.org>, Konstantin Belousov <kib@freebsd.org>,
 freebsd-java <freebsd-java@freebsd.org>
X-List-Received-Date: Sun, 23 Mar 2014 18:49:28 -0000

On Thu, Mar 20, 2014 at 12:02 PM, Henrik Gulbrandsen <henrik@gulbra.net> wrote:
> It looks like any attempt to send an email to this list without being
> subscribed means that you end up in moderation indefinitely, so I try
> to send again, just in case there are other people out there that have
> been working on this bug. I have been running my test (java/openjdk6
> rebuilding itself) continuously for 59 hours now. That used to fail
> most of the time, but I haven't seen a single crash with this patch.
>
> /Henrik


Henrik,

Thanks for your persistence in analyzing the problem and coming up with a patch.
I've assigned PR 187238 to kib@ for review.

--
Craig


