From owner-freebsd-smp@FreeBSD.ORG  Thu May  1 22:17:08 2008
Date: Thu, 1 May 2008 14:12:14 -0700 (PDT)
From: "Dorr H. Clark" <dclark@engr.scu.edu>
X-Sender: dclark@nova46.dc.engr.scu.edu
To: bug-followup@FreeBSD.org
Message-ID: <Pine.GSO.4.21.0805011409120.8716-100000@nova46.dc.engr.scu.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Cc: bugs@FreeBSD.org, smp@FreeBSD.org, yrao@force10networks.com
Subject: Re: kern/114370: [hang] 6.2 kernel with SMP options hangs when
 dumping core on dual cpu board


http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/114370

We believe we have reproduced this issue on 6.3.
We have test code that helps trigger it,
and we have a proposed fix.

Our version of the symptom is slightly different:
the core dump gets a couple of numbers into its block
countdown before hanging, but otherwise it is the same.

Note that this test code exists solely to explore
the hypothesis behind the fix. It is not required to
exhibit the issue, but it makes the issue convenient to
trigger on an SMP GENERIC kernel (i.e., no special config).

We have added two commands, FIOCNCR1 and FIOCNCR2, to the ioctl system
call, which is implemented in kern/sys_generic.c.
This is throwaway code added only to reproduce the issue.

<test code begins>

#define FIOCNCR1 _IO('f', 3)
#define FIOCNCR2 _IO('f', 4)


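        /* ncr1 and ncr2 are flags whose declarations are not shown here. */
        case FIOCNCR1:
                /* Pin the calling thread to CPU 0. */
                mtx_lock_spin(&sched_lock);
                sched_bind(curthread, 0);
                mtx_unlock_spin(&sched_lock);
                /* Loop on CPU 0, yielding periodically, while ncr1 is set. */
                while (ncr1) {
                        DELAY(100000);
                        yield(curthread, NULL);
                }
                return (0);

        case FIOCNCR2:
                /* Pin the calling thread to CPU 1. */
                mtx_lock_spin(&sched_lock);
                sched_bind(curthread, 1);
                mtx_unlock_spin(&sched_lock);
                while (ncr1) {
                        /* Force the panic from CPU 1 once ncr2 is set. */
                        if (ncr2) {
                                panic("force panic on CPU 1");
                        }
                        DELAY(100000);
                        yield(curthread, NULL);
                }
                return (0);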

<test code ends>
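
For illustration only, a user-space caller for these two commands might
look like the sketch below. It is not part of the test code above: the
helper name issue_ncr() and the choice of file descriptor are assumptions,
and how ncr1 and ncr2 get set is not shown here. On a kernel without the
test code, the ioctl is expected to fail.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>

/* Command numbers copied from the test code above. */
#define FIOCNCR1 _IO('f', 3)
#define FIOCNCR2 _IO('f', 4)

/*
 * Hypothetical helper: issue one of the two test commands on fd.
 * Returns 0 on success, or the errno value if ioctl(2) fails.
 * With the test code in place the call loops in the kernel while
 * ncr1 is set, so each command would need its own thread or process.
 */
static int
issue_ncr(int fd, unsigned long cmd)
{
	assert(cmd == FIOCNCR1 || cmd == FIOCNCR2);
	if (ioctl(fd, cmd) == -1)
		return (errno);
	return (0);
}
```

Calling issue_ncr() with an invalid descriptor returns EBADF, which is a
quick way to check the plumbing without touching a patched kernel.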

Here is our explanation of the issue.

When CPU 1 generates the dump, it never leaves the following
loop in ata-queue.c (ata_start(), line 213):


                if (dumping) {
                    mtx_unlock(&ch->state_mtx);
                    mtx_unlock(&ch->queue_mtx);
                    while (!ata_interrupt(ch))
                        DELAY(10);
                    return;
                }

The stack trace looks like this:

DELAY(a) at DELAY+0x92
ata_start() at ata_start+0x313
ata_queue_request() at ata_queue_request+0x27f
ad_strategy() at ad_strategy+0x169
ad_dump() at ad_dump+0xa4
cb_dumpdata() at cb_dumpdata+0x100
foreach_chunk() at foreach_chunk+0x23
dumpsys() at dumpsys+0x1ec
doadump() at doadump+0x48
boot() at boot+0x4ea
panic() at panic+0x1c9
trap_fatal() at trap_fatal+0x31e
trap_pfault() at trap_pfault+0x1d7
trap() at trap+0x309
calltrap() at calltrap+0x5

Basically, a request is issued to the disk and the dumping thread waits
for the disk I/O to complete. Interrupts are not turned off, so the
interrupt thread for the disk controller processes the disk I/O
completion. The thread that is waiting for the completion is not aware
of this and waits forever for ata_interrupt() to return a non-zero value.
(If interrupts were turned off, ata_interrupt() would have returned a
non-zero value.)
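
The hang can be sketched in user space. The model below is a deliberate
simplification, not the driver code: struct sim_channel,
sim_ata_interrupt() and sim_poll() are hypothetical names, the two flags
stand in for ch->hw.status() and ch->running, and the poll loop is
bounded so that it terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ata_channel (assumption). */
struct sim_channel {
	bool irq_pending;	/* models ch->hw.status() saying "for us" */
	bool running;		/* models ch->running != NULL */
};

/*
 * Models the unpatched return value: 1 only when this particular
 * call completes the running request, 0 on either early exit.
 */
static int
sim_ata_interrupt(struct sim_channel *ch)
{
	assert(ch != NULL);
	if (!ch->irq_pending)
		return (0);	/* interrupt is not for us */
	if (!ch->running)
		return (0);	/* no running request */
	ch->irq_pending = false;
	ch->running = false;
	return (1);
}

/*
 * Bounded version of "while (!ata_interrupt(ch)) DELAY(10);".
 * Returns the number of polls needed, or -1 if the budget ran out.
 */
static int
sim_poll(struct sim_channel *ch, int max_polls)
{
	for (int i = 1; i <= max_polls; i++)
		if (sim_ata_interrupt(ch))
			return (i);
	return (-1);
}
```

If the first sim_ata_interrupt() call on a channel comes from the
interrupt-thread side, a later sim_poll() on the same channel exhausts
its budget and returns -1, which corresponds to the endless DELAY(10)
loop in the trace.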

The proposed patch makes ata_interrupt() return 1 if there is no running
request and a dump is in progress. The patch has no effect when no dump
is in progress. With this patch, a correct dump is generated (we forced
a panic from the slave) and kgdb can read the dump.
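
The effect of the patch on the two early exits can be written as a small
decision function. This is only a sketch that mirrors the branch
structure of the proposed diff: early_exit_rv(), early_exit_examples()
and FALL_THROUGH are hypothetical names, and the rest of ata_interrupt()
is deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Marker: neither early exit was taken, normal handling continues. */
enum { FALL_THROUGH = -1 };

/*
 * Return value at the two early exits of ata_interrupt().
 *   patched     - true models a build with FIX114370 defined
 *   irq_for_us  - false models ch->hw.status() returning 0
 *   has_running - models ch->running != NULL
 *   dumping     - models the global "dumping" flag
 */
static int
early_exit_rv(bool patched, bool irq_for_us, bool has_running, bool dumping)
{
	if (!irq_for_us)
		return ((patched && dumping && !has_running) ? 1 : 0);
	if (!has_running)
		return ((patched && dumping) ? 1 : 0);
	return (FALL_THROUGH);
}

/* The cases that matter for the hang described above. */
static void
early_exit_examples(void)
{
	/* Unpatched: request already completed elsewhere, poller sees 0. */
	assert(early_exit_rv(false, true, false, true) == 0);
	/* Patched: the same situation now reports completion. */
	assert(early_exit_rv(true, true, false, true) == 1);
	/* Patched, but not dumping: unchanged behaviour. */
	assert(early_exit_rv(true, true, false, false) == 0);
}
```

Note that when the interrupt is not for us but a request is still
running, the sketch keeps returning 0 even during a dump, so the poller
keeps waiting for that request.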

An alternative solution may be to disable interrupts across the system,
but this is not currently done in FreeBSD 6.3.  Note this code in boot()
in kern_shutdown.c:

        /* XXX This doesn't disable interrupts any more.  Reconsider? */
        splhigh();

        if ((howto & (RB_HALT|RB_DUMP)) == RB_DUMP && !cold && !dumping)
                doadump();

In the context of the current code, here is a proposed fix:

@@ -315,16 +315,34 @@
 {
     struct ata_channel *ch = (struct ata_channel *)data;
     struct ata_request *request;

+#if defined(FIX114370)
+    int rv = 0;
+#endif

     mtx_lock(&ch->state_mtx);

     do {

        /* ignore interrupt if its not for us */

+#if defined(FIX114370)
+       if (ch->hw.status && !ch->hw.status(ch->dev)) {
+           if ((dumping) && (ch->running == NULL))
+               rv = 1;
+           break;
+       }
+
+       /* do we have a running request */
+       if (!(request = ch->running)) {
+           if (dumping)
+               rv = 1;
+           break;
+       }
+#else

        if (ch->hw.status && !ch->hw.status(ch->dev))
            break;

        /* do we have a running request */
        if (!(request = ch->running))
            break;
+#endif

        ATA_DEBUG_RQ(request, "interrupt");

@@ -349,7 +367,11 @@

        }
     } while (0);
     mtx_unlock(&ch->state_mtx);

+#if defined(FIX114370)
+    return rv;
+#else
     return 0;
+#endif

 }

 /*

If someone can explain why this is not a fix, identify ill side effects,
or propose a better solution, please respond.

Thanks,

Chitti Nimmagadda
Engineer

Dorr H. Clark
Advisor

Graduate School of Engineering
Santa Clara University
Santa Clara, CA