Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 1 Nov 2016 13:58:53 -0700
From:      Jason Harmening <jason.harmening@gmail.com>
To:        freebsd-stable@freebsd.org
Subject:   huge nanosleep variance on 11-stable
Message-ID:  <c88341e2-4c52-ed3c-a469-6446da4415f4@gmail.com>

next in thread | raw e-mail | index | archive | help
This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--A38r7Uo4OBWvgQ5M9aqooXw4kV9U9CR9I
Content-Type: multipart/mixed; boundary="LsAW7xxhbRV5jXDkQ8W8EX2K92PL5wtT6";
 protected-headers="v1"
From: Jason Harmening <jason.harmening@gmail.com>
To: freebsd-stable@freebsd.org
Message-ID: <c88341e2-4c52-ed3c-a469-6446da4415f4@gmail.com>
Subject: huge nanosleep variance on 11-stable

--LsAW7xxhbRV5jXDkQ8W8EX2K92PL5wtT6
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hi everyone,

I recently upgraded my main amd64 server from 10.3-stable (r302011) to
11.0-stable (r308099).  It went smoothly except for one big issue:
certain applications (but not the system as a whole) respond very
sluggishly, and video playback of any kind is extremely choppy.

The system is under very light load, and I see no evidence of abnormal
interrupt latency or interrupt load.  More interestingly, if I place the
system under full load (~0.0% idle) the problem *disappears* and
playback/responsiveness are smooth and quick.

Running ktrace on some of the affected apps points me at the problem:
huge variance in the amount of time spent in the nanosleep system call.
A sleep of, say, 5ms might take anywhere from 5ms to ~500ms from entry
to return of the syscall.  OTOH, anything CPU-bound or that waits on
condvars or I/O interrupts seems to work fine, so this doesn't seem to
be an issue with overall system latency.

I can repro this with a simple program that just does a 3ms usleep in a
tight loop (i.e. roughly the amount of time a video player would sleep
between frames @ 30fps).  At light load ktrace will show the huge
nanosleep variance; under heavy load every nanosleep will complete in
almost exactly 3ms.

FWIW, I don't see this on -current, although right now all my -current
images are VMs on different HW so that might not mean anything.  I'm not
aware of any recent timer- or scheduler- specific changes, so I'm
wondering if perhaps the recent IPI or taskqueue changes might be
somehow to blame.

I'm not especially familiar w/ the relevant parts of the kernel, so any
guidance on where I should focus my debugging efforts would be much
appreciated.

Thanks,
Jason


--LsAW7xxhbRV5jXDkQ8W8EX2K92PL5wtT6--

--A38r7Uo4OBWvgQ5M9aqooXw4kV9U9CR9I
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQF8BAEBCgBmBQJYGQIOXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w
ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRENkY3MTQyREU0MTU4MTgyRkZDNUU2ODVC
QjlGOEJGOTkyODQxRDFCAAoJELufi/mShB0bmzQH/jmKuVxn4Ug1TvlyWv38ThsS
j0xDqygjghN5FT1HtOm8VXr+y67rXrPbfwcJrpBsaizUl3MHhGp9xaYdkcrH34Nt
XfL1EtRjSMghf2pmAP1uiiOS4XWuQ9M2r4R0Kd0k9raftKfkn/DfDf/IQQ94i/ah
FCzL/Bd9f2KaDzADc8sO0iZpSHIyCBB9uXKFXvWka3ahXaXg0drzBLnv1FJHl2N/
TYueOUlr0wYe6kygo7n4JJ2vA4uZP7lM9Z881j7zIlcGAkap9hcYqWYyflamCikY
px4JfNtzXUgarhEIHevYrBnN31ulQSkIzqyPG5kV9B702R1J/mxuwvej3Kp8a4k=
=E/vK
-----END PGP SIGNATURE-----

--A38r7Uo4OBWvgQ5M9aqooXw4kV9U9CR9I--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?c88341e2-4c52-ed3c-a469-6446da4415f4>