Date: Mon, 03 Jan 2005 20:58:50 +0100
From: Søren Schmidt <sos@DeepCore.dk>
To: Nate Lawson <nate@root.org>
Cc: FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: ATA rman performance enhancement
Message-ID: <41D9A3FA.20204@DeepCore.dk>
In-Reply-To: <41D9A2CF.30704@root.org>
References: <41D984A5.7010408@root.org> <41D99FBD.8070500@DeepCore.dk> <41D9A2CF.30704@root.org>

Nate Lawson wrote:
> Søren Schmidt wrote:
>
>> Nate Lawson wrote:
>>
>>> While doing some benchmarking of other code, I noticed that there
>>> were a lot of calls to rman_get_bustag/handle().  They weren't taking
>>> up much actual time since they're pretty lightweight but seemed to be
>>> unnecessary.
>>>
>>> I worked up the attached diff and benchmarked it.  There are about
>>> 1000 calls a second to the rman routines without this patch and
>>> essentially none with it.  It makes about a 1% difference in
>>> throughput under some IO loads.  It is only for non-Promise or
>>> non-SII controllers right now since I didn't extend the
>>> initialization step to more than ata-pci.c.  The same approach could
>>> be used for the other INW/OUTW calls as well but they're not in the
>>> fast path.  I think it may make more of a difference with small reads.
>>
>> I had something similar to this once back when, but since HW got lots
>> faster I couldn't measure it anymore, but maybe things have changed...
>>
>> Anyhow it needs to be applied to ata-isa.c ata-card.c ata-cbus.c etc
>> to not break anything at least.
>
> Yes, I agree.  I limited the change only to the IDX macros that were
> used in the fast path although it could apply to all of them.

It *must* be applied to those files, otherwise the handle & tag fields
are not initialized when using !pci based devices with your patch.

>> I'll think about it and eventually do something about it in ATA-mkIII
>> if it really is measurable again..
>
> It is a minimal difference but with small transfers on a fast drive with
> a low-powered CPU, it's measurable.  I made the same change a while back
> in acpi_timer.c since the resource is only allocated once but accessed
> very often and in a low-latency context.

Right, but I'd like to get the much bigger fish caught first before
doing micro-optimizations, that's what ATA-mkIII is all about in the
first place.  However I'll put it back on the TODO list as something to
look into when the time comes...

-- 
-Søren
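
For readers following the thread, the idea under discussion (look up the bus tag and handle once at attach time, then reuse the cached copies in the fast path) can be sketched roughly as below. This is not the attached diff and not the real ATA driver code: every name here (fake_resource, fake_channel, fake_get_bustag, fake_bus_read_1 and so on) is a hypothetical userland stand-in, and the call counter exists only to make the difference visible. It also illustrates the point made above that every bus front end has to run the attach-time initialization, since otherwise the cached fields are never filled in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a bus tag, a bus handle and a resource. */
typedef int       fake_bus_tag_t;
typedef uintptr_t fake_bus_handle_t;

struct fake_resource {
    fake_bus_tag_t    tag;
    fake_bus_handle_t handle;
};

/* Counts accessor calls so the two approaches can be compared. */
static unsigned long accessor_calls;

static fake_bus_tag_t
fake_get_bustag(const struct fake_resource *r)
{
    accessor_calls++;
    return r->tag;
}

static fake_bus_handle_t
fake_get_bushandle(const struct fake_resource *r)
{
    accessor_calls++;
    return r->handle;
}

/* Simulated register file standing in for device I/O space. */
static uint8_t fake_regs[16];

static uint8_t
fake_bus_read_1(fake_bus_tag_t tag, fake_bus_handle_t handle, size_t off)
{
    (void)tag;  /* unused in this simulation */
    return ((const uint8_t *)handle)[off];
}

struct fake_channel {
    struct fake_resource *res;
    fake_bus_tag_t        tag;     /* cached at attach time */
    fake_bus_handle_t     handle;  /* cached at attach time */
};

/* Each bus front end must call this, or tag/handle stay uninitialized. */
static void
fake_channel_attach(struct fake_channel *ch, struct fake_resource *res)
{
    assert(res != NULL);
    ch->res = res;
    ch->tag = fake_get_bustag(res);
    ch->handle = fake_get_bushandle(res);
}

/* Uncached: two accessor calls on every register read. */
static uint8_t
fake_read_uncached(struct fake_channel *ch, size_t off)
{
    return fake_bus_read_1(fake_get_bustag(ch->res),
        fake_get_bushandle(ch->res), off);
}

/* Cached: no accessor calls in the fast path. */
static uint8_t
fake_read_cached(struct fake_channel *ch, size_t off)
{
    return fake_bus_read_1(ch->tag, ch->handle, off);
}
```

After fake_channel_attach() has run once, fake_read_cached() leaves accessor_calls unchanged, while each fake_read_uncached() call adds two to it. The saving per access is small, which fits the roughly 1% figure quoted above.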