Date: Mon, 24 Jun 2013 23:16:33 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Gleb Smirnoff <glebius@FreeBSD.org> Cc: Konstantin Belousov <kostikbel@gmail.com>, svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Bruce Evans <brde@optusnet.com.au> Subject: Re: svn commit: r252032 - head/sys/amd64/include Message-ID: <20130624211428.O2235@besplex.bde.org> In-Reply-To: <20130624081056.GD1214@FreeBSD.org> References: <20130621065839.J916@besplex.bde.org> <20130621081116.E1151@besplex.bde.org> <20130621090207.F1318@besplex.bde.org> <20130621064901.GS1214@FreeBSD.org> <20130621184140.G848@besplex.bde.org> <20130621135427.GA1214@FreeBSD.org> <20130622110352.J2033@besplex.bde.org> <20130622124832.S2347@besplex.bde.org> <20130622174921.I3112@besplex.bde.org> <20130623073343.GY91021@kib.kiev.ua> <20130624081056.GD1214@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 24 Jun 2013, Gleb Smirnoff wrote: > On Sun, Jun 23, 2013 at 10:33:43AM +0300, Konstantin Belousov wrote: > K> On Sat, Jun 22, 2013 at 06:58:15PM +1000, Bruce Evans wrote: > K> > So the i386 version be simply "addl; adcl" to memory. Each store in > K> > this is atomic at the per-CPU level. If there is no carry, then the > K> > separate stores are equivalent to adding separate nonnegative values and > K> > the counter value is valid at all times. If there is carry, then the > K> > separate stores are equivalent to adding a negative value followed by > K> > a larger positive value. The counter transiently goes backwards, but > K> > no information is lost and the counter is eventually set to the correct > K> > forward-going value. > K> > K> This is quite interesting idea, but I still did not decided if it > K> acceptable. The issue is that we could add the carry to the other > K> processor counter, if the preemption kicks in at right time between > K> two instructions. I did not found any argument why would it be > K> wrong, the races for fetch operation seems to be the same with either > K> local update method. > > This would be wrong since update isn't locked. Thus, if we are put on > other CPU between two instructions, and in second instruction updating > another CPU counter simultaneously with the original CPU we were on, > then we are losing an update. Hmm, this is subtle. The update is never lost, but is applied to different counter, non-atomically at a later time. Non-atomicity only matters when there is a carry since the counter only goes transiently backards in this case. For example: initial state: CPU1 counter: 00000000 ffffffff CPU2 counter: 00000000 fffffffe Start adding 1 to the first counter, doing it non-atomically by incrementing the low word first. CPU1 counter: 00000000 00000000 (carry in CPU1 eflags) CPU2 counter: 00000000 fffffffe Get preempted at this point: CPU1 counter: 00000000 00000000 CPU2 counter: 00000000 fffffffe (carry in CPU2 eflags) The fetching code running at this point will see a wrong value, but has more serious bugs. Once it is fixed, it can try using a heuristic to detect this case, or the fix might involve a heuristic that detects this case too. I was thinking of keeping a global count of the previous value for each counter, and adding 0x100000000 when the count goes backwards. However, in the worst each component of a counter may be making the transition from 0xffffffff to 0x100000000, with the carry bit in eflags for all CPUs. Detecting this requires watching the low word of all components of all counters. Increments must be limited to considerably less than 32 bits, and negative increments should be disallowed, so that the low word doesn't wap faster than it can be watched. CPU2 runs and completes the increment by adding the carry to the high word: CPU1 counter: 00000000 00000000 CPU2 counter: 00000001 fffffffe The total count becomes correct once the second word is written. Statistics for each CPU should be correct, but losing this race is so rare that maybe we especially don't care about this subcase. There is currently no API for exporting the counter statistics for each CPU. Similarly if the preemption is to another thread on the same CPU, or to another thread that doesn't touch this counter and then back to the original thread (much later) on the same CPU. There may be any number of updates to the low word before the corresponding updates to the high word, mixed in any order except for the low word being updated before the high word for each increment. State for the half-done updates is saved in carry flags and registers in any number of threads. If large increments are allowed, the value may appear to have had many negative increments. > Yes, the probability of such event is extremely low, but still possible. > The idea on counter(9) is that although fetching might be not precise, > the internal value of a counter is consistent and doesn't lose even a > single update. This would allow later to use counter(9) as a cheap > reference counter. The fetching code has essentially the same race (in the 32-bit case) that is laboriously worked around in the update code, irrespective of how atomic the update code is, even with 1 CPU: CPU1 counter: 00000000 ffffffff Start reading it on CPU1, using 32-bit load of the low word (the compiler could make this worse or just different by loading the high word first): ffffffff (read this value) Get preempted; counter update occurs and adds 1 perfectly atomically: CPU1 counter: 00000001 00000000 Back to fetching code on CPU1; read high word: 00000001 ffffffff (read this value; it is garbage) If the compiler loads the high word first, it will see the value off by ~2**32 in the opposite direction (00000000 00000000). Like I said before, this can probably be handled by rereading each pcpu counter until it doesn't change. Or if large increments are not allowed, just read the low word, the high word, and the low word again until the low word doesn't change. This works on the CPU owning pcpu counter provided the counter update is atomic. On other CPUs, I think it depends on MD memory ordering being fairly strict. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130624211428.O2235>