Date: Mon, 13 Jul 2015 10:07:30 +0200
From: Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
To: Edward Tomasz Napierała <trasz@freebsd.org>, FreeBSD Stable <freebsd-stable@freebsd.org>, kib@freebsd.org
Subject: Re: r284665 causes MSI problems -> ahcich2: Timeout in slot 11 port 0
Message-ID: <55A371C2.9030009@omnilan.de>
In-Reply-To: <55A23A75.8050003@omnilan.de>
References: <55A158E1.3000905@omnilan.de> <20150712094153.GA1549@brick> <55A23A75.8050003@omnilan.de>
Regarding Harald Schmalzbauer's message of 12.07.2015 11:59 (localtime):
…
>>> I can't find suspicious code in r282213 which could cause this strange
>>> regression, but I verified carefully that problem arises with r284665.
>>> Actually, r282901
>>> (https://svnweb.freebsd.org/base?view=revision&sortby=date&revision=282901)
>>> is the real trigger, verified by putting
>>> nooptions RACCT
>>> nooptions RACCT_DEFAULT_TO_DISABLED
>>> nooptions RCTL
>>> into my kernel config -> problem vanishes!
>>>
>>> Setting "kern.racct.enable=1" doesn't make any difference, as soon as
>>> 'kern.features.racct' exists, there's the ahci(4)/ahcich2 timeout and
>>> machine doesn't finish booting.
>>>
>>> Unfortunately, I don't have any idea how to track this down to the
>>> actual culprit, but I hope the RACCT hackers do have ;-)
>>>
>>> Shall I open a bugzilla ticket?
>> That's... curious. I don't see how those two things could be related.
>> What's the FreeBSD version? How reproducible is it? Have you tried
>> compiling with and without those three lines a couple of times?
> Yes, I tried several times, and confirmed that with r284665 the timeouts
> reproducibly show up (which blocks the booting process, a major issue in
> my case).
> I also verified that several different revisions <284665 don't lead to
> that problem, and also that the changes in ahci code paths for the last
> year are not involved.
> I also can't see any relation, which doesn't mean much since I don't have
> the kernel skills, but I'm sure the symptoms start with "options RACCT"

While it is still true that I _always_ had trouble with ahcich timeouts
when using "options RACCT", I have now seen the same problem with a kernel
compiled without the RACCT option :-(
In that case it's random: I was lucky several times in a row, but later on
the ahcich timeouts prevented the box from booting several times in a row.
So "options RACCT" does have an influence: as mentioned, I could never
boot the machine with a kernel >= r284665 and "options RACCT", but most of
the time I can with the same kernel without "options RACCT". Removing
"options RACCT" from the kernel config is _not_ a true solution, though.
It only improves things to the point where booting is possible at all,
most of the time, but it still sometimes ends in ahcich timeouts.
At least with kernels >= r284665; I couldn't re-check with older
revisions. The next chance for tests is the weekend after next :-(

Do you have an idea which race "options RACCT" could influence?
Verbose booting showed the same IRQ mapping in all cases (with or without
timeouts), as far as I could see.
Is it likely to be an ACPI routing problem?
Once the machine has booted, I haven't seen any ahcich timeout in
production so far.

Thanks,

-Harry
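
Since the failure is intermittent without "options RACCT", tallying the timeout messages per captured boot log could make comparisons between kernels less anecdotal. The following is only a minimal sketch: it assumes the message format seen in the subject line ("ahcich2: Timeout in slot 11 port 0"), and the function name and the sample log text are hypothetical, not taken from the actual machine.

```python
import re
from collections import Counter

# Pattern assumed from the subject line of this thread:
#   "ahcich2: Timeout in slot 11 port 0"
TIMEOUT_RE = re.compile(r"^(ahcich\d+): Timeout in slot (\d+) port (\d+)")


def count_ahci_timeouts(log_text):
    """Return a Counter mapping each AHCI channel to its number of timeout lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; a real run would read a saved console or dmesg capture.
    sample = (
        "ahcich2: Timeout in slot 11 port 0\n"
        "some unrelated boot message\n"
        "ahcich2: Timeout in slot 11 port 0\n"
    )
    for channel, n in sorted(count_ahci_timeouts(sample).items()):
        print(channel, n)
```

Running this over one saved log per boot attempt, for kernels built with and without "options RACCT", would give a rough failure rate per configuration.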
