From owner-freebsd-hackers@freebsd.org Fri Oct 4 19:39:39 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 9BF7A137260 for ; Fri, 4 Oct 2019 19:39:39 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 46lKvX6h11z3GG5 for ; Fri, 4 Oct 2019 19:39:36 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id CA06043F06D; Sat, 5 Oct 2019 05:39:29 +1000 (AEST) Date: Sat, 5 Oct 2019 05:39:29 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Poul-Henning Kamp cc: Sebastian Huber , Warner Losh , Konstantin Belousov , Bruce Evans , FreeBSD Subject: Re: Why is tc_get_timecount() called two times in tc_init()? In-Reply-To: <60167.1570198248@critter.freebsd.dk> Message-ID: <20191005024530.U1757@besplex.bde.org> References: <0e27fb3e-0f60-68e1-dbba-f17c3d91c332@embedded-brains.de> <20191002140040.GA44691@kib.kiev.ua> <20191003013314.O2151@besplex.bde.org> <20191002163946.GE44691@kib.kiev.ua> <20191003030837.C2787@besplex.bde.org> <20191003084021.GI44691@kib.kiev.ua> <47834.1570116246@critter.freebsd.dk> <141ee0af-2ff4-50fc-b4e4-6d1fc47e04f3@embedded-brains.de> <60167.1570198248@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=FNpr/6gs c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=kj9zAlcOel0A:10 a=PGrZBZ7u4VfUz6DTkGYA:9 a=CjuIK1q_8ugA:10 X-Rspamd-Queue-Id: 46lKvX6h11z3GG5 X-Spamd-Bar: - Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=none; spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates 211.29.132.246 as permitted sender) smtp.mailfrom=brde@optusnet.com.au X-Spamd-Result: default: False [-1.30 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; RCVD_COUNT_TWO(0.00)[2]; FROM_HAS_DN(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23]; FREEMAIL_FROM(0.00)[optusnet.com.au]; MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; IP_SCORE_FREEMAIL(0.00)[]; RCPT_COUNT_FIVE(0.00)[6]; TO_MATCH_ENVRCPT_SOME(0.00)[]; TO_DN_ALL(0.00)[]; IP_SCORE(0.00)[ip: (-6.92), ipnet: 211.28.0.0/14(-3.23), asn: 4804(-2.38), country: AU(0.01)]; RCVD_NO_TLS_LAST(0.10)[]; RCVD_IN_DNSWL_LOW(-0.10)[246.132.29.211.list.dnswl.org : 127.0.5.1]; R_DKIM_NA(0.00)[]; SUBJECT_ENDS_QUESTION(1.00)[]; ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[optusnet.com.au]; FROM_EQ_ENVFROM(0.00)[] X-Mailman-Approved-At: Sat, 12 Oct 2019 23:39:36 +0000 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 04 Oct 2019 19:39:39 -0000 On Fri, 4 Oct 2019, Poul-Henning Kamp wrote: > -------- > In message <141ee0af-2ff4-50fc-b4e4-6d1fc47e04f3@embedded-brains.de>, Sebastian > Huber writes: > >>> I think the original reason for this was (locked) delta-based >>> timecounters, (ie counters which roll over rapidly) in order that >>> their first "real" use would not return truly bogus values. >> >> Ok, this explanation makes sense. When I ported the FreeBSD timecounter >> implementation to RTEMS I was a bit surprised how many chips (even a >> Cortex-M has nothing out of the box) lack a simple free running counter. >> Maybe it should be added as a comment to these two tc_get_timecount() calls? > > As long as the counter can be read atomically and does not roll over > in a matter of milliseconds, two reads are not necessary. The i8254 timecounter rolls over in a matter of microseconds if suitably (mis)configured. E.g., for pcaudio i8254 periodic timer was run at 16 kHz so it rolled over every 74 or 75 cycles. I only used this for stress tests. Now you can do even better stress tests and foot shooting using event timers: make the i8254 the active timecounter and the active event timer in aperiodic mode, and ask it to generate aperiod interrupts faster than 16 kHz. The pcaudio code attempts to avoid glitches in the timecounter when switching the frequency between HZ and 16k (mostly by deferring the switch to the interrupt handler). The one-shot switching not careful, and can be exercised at higher frequencies by unprivileged applications (after privilege is used to configure). No matter how fast the counter can roll over, 2 reads are only useful or needed accidentally. And they are needed for rollover detection for the i8254 timecounter: XX low = inb(TIMER_CNTR0); XX high = inb(TIMER_CNTR0); XX count = i8254_max_count - ((high << 8) | low); XX if (count < i8254_lastcount || i8254_lastcount is garbage after only 1 read, but it is used here at the start of the rollover detection. Thus the first read returns garbage and also leaves some internal state as garbage, but it will update i8254_lastcount in its internal state and that is enough for the second read to work correctly. XX (!i8254_ticked && (clkintr_pending || XX ((count < 20 || (!(flags & PSL_I) && The magic 20 is something that is hopefully larger than the interrupt latency (IIRC, this dates from McCann's version in ~1992 when we hoped that i386's had interrupt latency this low). XX count < i8254_max_count / 2u)) && 16 kHz gives a non-magic 37 for 'i8254_max_count / 2u' and we do a faster test for rollover between 20 and 37 and no rollover test above 37. XX i8254_pending != NULL && i8254_pending(i8254_intsrc))))) { XX i8254_ticked = 1; XX i8254_offset += i8254_max_count; XX } Rollover detection requires calling i8254_get_timecounter() at least {once per rollover} + {~20 cycles}. It looks like I fixed switching to the i8254 timecounter by adding an extra call in 1 place. This was not so good, and it turned into 2 calls in 1 place, then was later replicated in places that shouldn't know anything about this. The need for 2 calls is only an optimization, but I don't a better way. On any call, we can't know if the previous call left garbage (or was not made) without reading the counter twice. The caller now keeps track of the state and makes a second call when necessary. This fits well with another responsibility of the callers -- calls must be made often enough to detect rollover. Perhaps requiring calls strictly more often than rollover is best. Then the rollover detection in the i8254 timecounter can be removed. I now see the correct design for handling timecounters across suspend/ resume: - in the suspend method for the active timecounter, if this stops the hardware, then first stop the software timecounter, perhaps by switching to a null timecounter. Don't leave the active timecounter returning garbage as now. (acpi suspend does some bogus switching instead, I think from the active timecounter to the non-null acpi timecounter. This can only help if the acpi timecounter is suspended after the active timecounter, which it should be if acpi is configured.) - in the resume method for the active timecounter, if this timecounter was switched away from in suspend, first start the hardware timecounter, then switch to it. - restore switch_timecounter() and fix it to work and call it instead of replicating it. Call it at boot time for dummy -> normal, at sysctl change time for normal -> another, at suspend time for normal -> null and at resume time for null -> normal. - the timecounter for transitions in suspend/resume can't be dummy since we don't know how fast or far it should advance. Since the h/w timer is expected to stop while suspended, it may as well stop for transitions too (acpi might already implement this by stopping and starting its timer as the last and first things in suspend and resume, but I don't trust it to stop all the other CPUs and subsystems and the above works without acpi). - fix races and minimize drift when restoring the RTC time using inittodr(). My version uses only splhigh() locking which stopped working in FreeBSD-5. It keeps track of the drift between the RTC and the timecounter in a recent long interval and assumes that the drift is the same while suspended (it is not the same, but better than 0). My version also fixes the average 0.5 seconds error reading the RTC. - fix monotonic times across suspend/resume. Don't clobber boottime to make the real time come out right after stopping the monotonic clock, but step the monotonic clock forward to approximate physical time since the unclobbered boottime. - fix timeouts across suspend/resume. Most should occur soon after resume, but not all at once. Bruce