Date: Tue, 1 Nov 2016 13:58:53 -0700 From: Jason Harmening <jason.harmening@gmail.com> To: freebsd-stable@freebsd.org Subject: huge nanosleep variance on 11-stable Message-ID: <c88341e2-4c52-ed3c-a469-6446da4415f4@gmail.com>
next in thread | raw e-mail | index | archive | help
This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --A38r7Uo4OBWvgQ5M9aqooXw4kV9U9CR9I Content-Type: multipart/mixed; boundary="LsAW7xxhbRV5jXDkQ8W8EX2K92PL5wtT6"; protected-headers="v1" From: Jason Harmening <jason.harmening@gmail.com> To: freebsd-stable@freebsd.org Message-ID: <c88341e2-4c52-ed3c-a469-6446da4415f4@gmail.com> Subject: huge nanosleep variance on 11-stable --LsAW7xxhbRV5jXDkQ8W8EX2K92PL5wtT6 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hi everyone, I recently upgraded my main amd64 server from 10.3-stable (r302011) to 11.0-stable (r308099). It went smoothly except for one big issue: certain applications (but not the system as a whole) respond very sluggishly, and video playback of any kind is extremely choppy. The system is under very light load, and I see no evidence of abnormal interrupt latency or interrupt load. More interestingly, if I place the system under full load (~0.0% idle) the problem *disappears* and playback/responsiveness are smooth and quick. Running ktrace on some of the affected apps points me at the problem: huge variance in the amount of time spent in the nanosleep system call. A sleep of, say, 5ms might take anywhere from 5ms to ~500ms from entry to return of the syscall. OTOH, anything CPU-bound or that waits on condvars or I/O interrupts seems to work fine, so this doesn't seem to be an issue with overall system latency. I can repro this with a simple program that just does a 3ms usleep in a tight loop (i.e. roughly the amount of time a video player would sleep between frames @ 30fps). At light load ktrace will show the huge nanosleep variance; under heavy load every nanosleep will complete in almost exactly 3ms. FWIW, I don't see this on -current, although right now all my -current images are VMs on different HW so that might not mean anything. I'm not aware of any recent timer- or scheduler- specific changes, so I'm wondering if perhaps the recent IPI or taskqueue changes might be somehow to blame. I'm not especially familiar w/ the relevant parts of the kernel, so any guidance on where I should focus my debugging efforts would be much appreciated. Thanks, Jason --LsAW7xxhbRV5jXDkQ8W8EX2K92PL5wtT6-- --A38r7Uo4OBWvgQ5M9aqooXw4kV9U9CR9I Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- iQF8BAEBCgBmBQJYGQIOXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRENkY3MTQyREU0MTU4MTgyRkZDNUU2ODVC QjlGOEJGOTkyODQxRDFCAAoJELufi/mShB0bmzQH/jmKuVxn4Ug1TvlyWv38ThsS j0xDqygjghN5FT1HtOm8VXr+y67rXrPbfwcJrpBsaizUl3MHhGp9xaYdkcrH34Nt XfL1EtRjSMghf2pmAP1uiiOS4XWuQ9M2r4R0Kd0k9raftKfkn/DfDf/IQQ94i/ah FCzL/Bd9f2KaDzADc8sO0iZpSHIyCBB9uXKFXvWka3ahXaXg0drzBLnv1FJHl2N/ TYueOUlr0wYe6kygo7n4JJ2vA4uZP7lM9Z881j7zIlcGAkap9hcYqWYyflamCikY px4JfNtzXUgarhEIHevYrBnN31ulQSkIzqyPG5kV9B702R1J/mxuwvej3Kp8a4k= =E/vK -----END PGP SIGNATURE----- --A38r7Uo4OBWvgQ5M9aqooXw4kV9U9CR9I--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?c88341e2-4c52-ed3c-a469-6446da4415f4>