Date:      Tue, 7 Mar 2017 11:40:42 -0800
From:      Mark Millard <markmi@dsl-only.net>
To:        Mark Johnston <markj@FreeBSD.org>
Cc:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, Justin Hibbits <chmeeedalf@gmail.com>, Nathan Whitehorn <nwhitehorn@freebsd.org>
Subject:   Re: powerpc64 head -r314687 (PowerMac G5 so-called "Quad Core", clang based): CAM status: Command timeout (always?)
Message-ID:  <1FB4C473-28A7-461A-890E-E22E8D890A76@dsl-only.net>
In-Reply-To: <20170307171526.GA42761@wkstn-mjohnston.west.isilon.com>
References:  <98A62E0D-C2A0-40B1-AE6D-5810906208AE@dsl-only.net> <4C78F6AA-5ABD-4445-B5EF-4E6778CE36FE@dsl-only.net> <20170306164341.GA83069@wkstn-mjohnston.west.isilon.com> <466C25ED-0A70-4988-9BB1-3B43BD031E5E@dsl-only.net> <E67A6606-941D-4F00-993D-4347C2A1D332@dsl-only.net> <20170307010204.GA3611@wkstn-mjohnston.west.isilon.com> <2FA8AC16-8108-4FC7-B1E6-788CBD32F372@dsl-only.net> <20170307171526.GA42761@wkstn-mjohnston.west.isilon.com>

On 2017-Mar-7, at 9:15 AM, Mark Johnston <markj at FreeBSD.org> wrote:

> On Mon, Mar 06, 2017 at 08:03:08PM -0800, Mark Millard wrote:
>> On 2017-Mar-6, at 5:02 PM, Mark Johnston <markj at FreeBSD.org> wrote:
>>
>>> On Mon, Mar 06, 2017 at 02:01:06PM -0800, Mark Millard wrote:
>>>> [scsi_pass.c -r314624 is the problem file vintage of the two files.]
>>>>
>>>> On 2017-Mar-6, at 10:36 AM, Mark Millard <markmi at dsl-only.net> wrote:
>>>>
>>>>> On 2017-Mar-6, at 8:43 AM, Mark Johnston <markj at FreeBSD.org> wrote:
>>>>>
>>>>>> On Mon, Mar 06, 2017 at 02:05:39AM -0800, Mark Millard wrote:
>>>>>>> On 2017-Mar-6, at 1:37 AM, Mark Millard <markmi at dsl-only.net> wrote:
>>>>>>> [...]
>>>>>>> Yep: reverting the two files allowed the PowerMac G5 so-called
>>>>>>> "Quad Core" to boot fully and I could log in.
>>>>>>
>>>>>> Do you have a full dmesg of the failed boot? Am I correct in thinking
>>>>>> that the boot failed before making it to user mode?
>>>>>
>>>>> . . .
>>>>>> If so I'm rather
>>>>>> puzzled, as the change should only affect userland applications.
>>>>>> Specifically, it modified a couple of ioctl handlers.
>>>>>>
>>>>>>>
>>>>>>> It appears that if such powerpc64 machines are to stay bootable
>>>>>>> then other things need to be cleaned up before the two updated
>>>>>>> files from -r314624 should be used.
>>>>>>>
>>>>>>> Should the 2 files be reverted until other things are cleaned up?
>>>>>>
>>>>>> I don't mind reverting the change, but my suspicion is that it uncovered
>>>>>> a problem rather than introducing it. If you're willing to narrow things
>>>>>> down a bit, could you try booting with one of the file modifications and
>>>>>> not the other? They are independent.
>>>>>
>>>>> In a while I'll try each of the files individually, one old, one modern
>>>>> each time.
>>>>
>>>> scsi_pass.c -r314624 (new) and cam_xpt.c -r314283 (old): fails.
>>>>
>>>> cam_xpt.c -r314624 (new) and scsi_pass.c -r308451 (old): works fine so far.
>>>>
>>>> Prior results:
>>>>
>>>> cam_xpt.c and scsi_pass.c both being -r314624 (both new): fails
>>>>
>>>> cam_xpt.c -r314283 and scsi_pass.c -r308451 (both old): works fine.
>>>
>>> Thank you. I'm still failing to see how the change is connected with the
>>> symptoms you're seeing. Are you testing with a kernel that has
>>> INVARIANTS and WITNESS configured?
>>>
>>> I've broken up the scsi_pass.c change into several patches. They are
>>> sequential; can you try testing the result of each patch in the series?
>>
>> I'm no longer able to reproduce the problem, not even with an
>> "svnlite update -r314687" based build where "svnlite status
>> /usr/src/" does not list either of the files. This was after
>> trying the patch sequence, which had no failures at any stage.
>>
>> This suggests some sort of intermittent problem someplace.
>>
>> At least it fits with your not finding a way for your code
>> update to cause the results that I got.
>>
>> But finding such an intermittent problem is a pain. I've
>> no clue if/when I'll even see an example again, much less
>> find a way to investigate it if I do. (PowerMacs do not
>> take ddb input early.)
>>
>> There is the possibility that the recent atomic_fcmpset-based
>> locking changes still have some sort of problem, just not seen
>> often. Not easy to find if true.
>>
>> Anyway I'm now running -r314687 with:
>
> Indeed, this kind of problem is tricky to track down. A couple of
> thoughts:
> - Were you using the same compiler for all of your tests? I noticed your
>  post yesterday about clang 3.9 vs. 4.0 for powerpc and powerpc64.

All the powerpc64 builds were cross builds from amd64 -r314687 --
and so all are system-clang 4.0 based.

Those notes are because I've been a long-term tester and issue
reporter for clang targeting the powerpc family. I also report
to the llvm bugzilla for this. I have history to compare against
without running new tests for 3.9.1.

> - Was the rest of the source tree (i.e., everything but cam_xpt.c and
>  scsi_pass.c) the same in all of your testing? I've noticed in the past
>  that unrelated changes to the source tree can result in various kernel
>  linker sets having a different order than they would have otherwise,
>  and that can expose or hide bugs. See this recent post for an example:
>  https://lists.freebsd.org/pipermail/freebsd-current/2016-December/064122.html

Yes: the same. In fact, I use reproducible builds now, and my
2017-Mar-4 /boot/kerc40/* matches my 2017-Mar-6 build at issue
exactly. (This is not a debug-kernel build context.) Booting
kerc40 no longer gets the problem either, which is part of why
I did that diff -r and discovered the exact match for the build
that did not revert either file.

===
Mark Millard
markmi at dsl-only.net


