Date: Mon, 20 Jan 2014 15:22:23 -0500 From: John Baldwin <jhb@freebsd.org> To: freebsd-virtualization@freebsd.org Cc: Julian Stecklina <jsteckli@os.inf.tu-dresden.de> Subject: Re: KVM Clock Message-ID: <3298215.G6vx8QjdMB@pippin.baldwin.cx> In-Reply-To: <1389805150.16498.18.camel@janus.xn--pl-wia.net> References: <lb3jnb$qo8$1@ger.gmane.org> <52D6B18D.4080204@citrix.com> <1389805150.16498.18.camel@janus.xn--pl-wia.net>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wednesday 15 January 2014 17:59:10 Julian Stecklina wrote: > On Mi, 2014-01-15 at 17:04 +0100, Roger Pau Monn=E9 wrote: > > On 15/01/14 14:45, Julian Stecklina wrote: > > > On Mi, 2014-01-15 at 14:08 +0100, Roger Pau Monn=E9 wrote: > > >> On 15/01/14 13:05, Julian Stecklina wrote: > > >>> On 01/14/2014 05:13 PM, Peter Grehan wrote: > > >>>> Hi Julian, > > >>>>=20 > > >>>>> is anyone working on KVM clock support for FreeBSD? If not, I= > > >>>>> might take a shot at it. > > >>>>=20 > > >>>> None I know of: go for it :) > > >>>=20 > > >>> Works for me so far: > > >>> https://github.com/blitz/freebsd/commit/cdc5f872b3e48cc0dda031f= c7d6bde > > >>> dc65c3148f> >>=20 > > >> Looking > > >>=20 > > >> at the code it seems some common routines could be shared > > >> between the KVM PV clock and the Xen PV clock > > >> (sys/dev/xen/timer/timer.c). The data passed from the hypervisor= to > > >> the guest has exactly the same format (see struct vcpu_time_info= in > > >> Xen public headers). > > >=20 > > > Yes, I know. Didn't get around to making it pretty yesterday even= ing. ;) > > > I'll post an updated patch, when I have some time. Any suggestion= s where > > > to put the two common functions? > > >=20 > > >> At a first sight the KVM clock can benefit from using scale_delt= a > > >> (which is going to be faster than the C version implemented in > > >> kvmclock_get_timecount), > > >=20 > > > At least somewhat on amd64. 32-bit looks pretty identical. > > >=20 > > >> and xen_fetch_vcpu_tinfo is exactly the same > > >>=20 > > >> as kvmclock_fetch. > > >=20 > > > I think xen_fetch_vcpu_tinfo has a subtle bug: > > > 217 do { > > > 218 dst->version =3D src->version; > > > 219 rmb(); > > > 220 dst->tsc_timestamp =3D src->tsc_timestamp; > > > 221 dst->system_time =3D src->system_time; > > > 222 dst->tsc_to_system_mul =3D src->tsc_to_syst= em_mul; > > > 223 dst->tsc_shift =3D src->tsc_shift; > > > 224 rmb(); > > > 225 } while ((src->version & 1) | (dst->version ^ > > >=20 > > > src->version)); > > >=20 > > > In line 225 src->version is potentially read twice. If you consid= er the > > > following schedule: > > >=20 > > > host starts updating data, sets src->version to 1 > > > guest reads src->version (1) and stores it into dst->version. > > > guest copies inconsistent data > > > guest reads src->version (1) and computes xor with dst->version. > > > host finishes updating data and sets src->version to 2 > > > guest reads src->version (2) and checks whether lower bit is not = set. > > > while loop exits with inconsistent data! > > >=20 > > > I think the C standard (at least C11) permits this as it does not= > > > specify in which order the two reads in line 225 need to happen. > >=20 > > According to the operator precedence and associativity in C > > (http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B#Operator_p= receden > > ce), if I'm reading it right, the condition in the while line will = be > > evaluated as follows (because of the left-to-right associativity of= the > >=20 > > | operator): > > 1. (src->version & 1) > > 2. (dst->version ^ src->version) > > 3. result of 1 | result of 2 > >=20 > > So AFAICT the flow that you describe could never happen, because > > (src->version & 1) is always done before (dst->version ^ src->versi= on). >=20 > Operator precedence doesn't matter. Consider it written like this: >=20 > or(and(src->version, 1), xor(dst->version, src->version)) >=20 > C gives you no guarantees whether the and or the xor will be executed= > first. There is no sequence point in between. And even with a sequenc= e > point, the C11 memory model gives you no guarantees about how the rea= ds > are ordered. >=20 > This discussion is somewhat theoretical, because > a) the hypervisor will probably update the data structure in the > current vCPU context (making memory consistency issues go away). > b) the compiler will likely not be an asshole. ;) >=20 > Pseudocode for a better fetch could be: >=20 > dst->version =3D src->version; > rmb(); > ... read data ... > rmb(); > version_post =3D src->version; > rmb(); <- keeps the compiler from reading src->version multiple times= >=20 > and then doing the test with version_post and dst->version. > Unfortunately, rmb() expands into lfence, even if there is no need fo= r > that here. There is the __compiler_membar() macro in <sys/cdefs.h> that you could = use if=20 this code is x86-specific (and thus knows it only needs a compiler barr= ier). --=20 John Baldwin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3298215.G6vx8QjdMB>