From owner-svn-src-head@FreeBSD.ORG Fri Oct 24 19:45:52 2014 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id AF7CB4B4; Fri, 24 Oct 2014 19:45:52 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3A0EF361; Fri, 24 Oct 2014 19:45:52 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s9OJjkbs049020 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 24 Oct 2014 22:45:46 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s9OJjkbs049020 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s9OJjkgY049019; Fri, 24 Oct 2014 22:45:46 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 24 Oct 2014 22:45:46 +0300 From: Konstantin Belousov To: Rui Paulo Subject: Re: svn commit: r273598 - in head: include sys/dev/acpica Message-ID: <20141024194546.GE1877@kib.kiev.ua> References: <33decfcd-e77c-4e4c-8161-9f4a232213c6@me.com> MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <33decfcd-e77c-4e4c-8161-9f4a232213c6@me.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Rui Paulo X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Oct 2014 19:45:52 -0000 On Fri, Oct 24, 2014 at 07:33:07PM +0000, Rui Paulo wrote: > On Oct 24, 2014, at 12:20 PM, Konstantin Belousov wrote: > > On Fri, Oct 24, 2014 at 06:39:16PM +0000, Rui Paulo wrote: > > Author: rpaulo > > Date: Fri Oct 24 18:39:15 2014 > > New Revision: 273598 > > URL: https://svnweb.freebsd.org/changeset/base/273598 > > > > Log: > > HPET: create /dev/hpetN as a way to access HPET from userland. > > > > In some cases, TSC is broken and special applications might benefit > > from memory mapping HPET and reading the registers to count time. > > Most often the main HPET counter is 32-bit only[1], so this only gives > > the application a 300 second window based on the default HPET > > interval. > > Other applications, such as Intel's DPDK, expect /dev/hpet to be > > present and use it to count time as well. > > > > Although we have an almost userland version of gettimeofday() which > > uses rdtsc in userland, it's not always possible to use it, depending > > on how broken the multi-socket hardware is. > Yes, and hpet userland mapping would be better handled through the same > fake-vdso framework. As designed, it has discriminator to inform > userspace about algorithm, and can happilly utilize HPET timecounter > automatically mapped by kernel into the process address space. > š > I'm aware of that, but I found the vdso a bit confusing and decided to work on that later. > > > +static int > > +hpet_open(struct cdev *cdev, int oflags, int devtype, struct thread *td) > > +{ > > + š š šstruct hpet_softc *sc; > > + > > + š š š šsc = cdev->si_drv1; > > + š š š šif (!sc->mmap_allow) > > + š š š š šreturn (EPERM); > > + š š šif (atomic_cmpset_32(&sc->devinuse, 0, 1) == 0) > > + š š š š š šreturn (EBUSY); > This is extra-weird. > The devinuse business disallows simultaneous opens, which prevents > other process from opening and mapping. But if the original caller > does mmap and close, second process now is allowed to open and mmap. > > That said, why do you need this devinuse at all ? > š > Hmm, I wanted to avoid multiple mmap's, but that doesn't work like you said. šI may just remove this restriction.š > This is probably best. > > +static int > > +hpet_mmap(struct cdev *cdev, vm_ooffset_t offset, vm_paddr_t *paddr, > > + int nprot, vm_memattr_t *memattr) > > +{ > > + š šstruct hpet_softc *sc; > > + > > + š šsc = cdev->si_drv1; > > + š š š šif (offset > rman_get_size(sc->mem_res)) > > + š š š š š š š šreturn (EINVAL); > > + š šif (!sc->mmap_allow_write && (nprot & PROT_WRITE)) > > + š š šreturn (EPERM); > > + š š š*paddr = rman_get_start(sc->mem_res) + offset; > What is the memattr for the backing page ? Is it set to non-cached > mode somehow ? I was not able to find place where would this happen. > š > I expect it to be set to non-cached since it's a device anyway, but I don't know where it is. šDuring my testing, I did not see any problems with cached values, though. > I am not claiming that it is wrong, only that I do not see an easy reason why it is right. Just printing the *memattr would provide the confidence. > > + šsc->pdev = make_dev(&hpet_cdevsw, 0, UID_ROOT, GID_WHEEL, > > + š š š š 0600, "hpet%d", device_get_unit(dev)); > > + š š š šif (sc->pdev) { > > + š š š š š š š šsc->pdev->si_drv1 = sc; > > + š š šsc->mmap_allow = 1; > > + š š š š š š š šTUNABLE_INT_FETCH("hw.acpi.hpet.mmap_allow", > > + š š š š š š &sc->mmap_allow); > > + š š š š š šsc->mmap_allow_write = 1; > > + š š š š š šTUNABLE_INT_FETCH("hw.acpi.hpet.mmap_allow_write", > > + š š š š š š š š &sc->mmap_allow_write); > > + š š š š š š š šSYSCTL_ADD_INT(device_get_sysctl_ctx(dev), > > + š š š š š š š š SYSCTL_CHILDREN(device_get_sysctl_tree(dev)), > > + š š š OID_AUTO, "mmap_allow", > > + š š š š š CTLFLAG_RW, &sc->mmap_allow, 0, > > + š š š š š š š š "Allow userland to memory map HPET"); > Why is mmap_allow is per-instance, while mmap_allow_write taken from > the global tunable ? > š > Are you asking why there's no sysctl for it? IMO the allow-write must be controllable per-instance, or just managed by /dev/hpet* unix permissions. Having one global knob, which is consulted at the module load, is not flexible. What is the use-case for writing to HPET page ? To manually micro-adjust the timer ? Then, you probably want to disable write for instance used by system timecounter or eventtimer, but allow for HPET utilized by other consumers.