From owner-freebsd-hackers Sun Jul 20 20:29:30 1997
Return-Path:
Received: (from root@localhost)
	by hub.freebsd.org (8.8.5/8.8.5) id UAA18988
	for hackers-outgoing; Sun, 20 Jul 1997 20:29:30 -0700 (PDT)
Received: from rah.star-gate.com (rah.star-gate.com [204.188.121.18])
	by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id UAA18971;
	Sun, 20 Jul 1997 20:29:22 -0700 (PDT)
Received: from rah.star-gate.com (localhost.star-gate.com [127.0.0.1])
	by rah.star-gate.com (8.8.5/8.8.5) with ESMTP id UAA00494;
	Sun, 20 Jul 1997 20:28:52 -0700 (PDT)
Message-Id: <199707210328.UAA00494@rah.star-gate.com>
X-Mailer: exmh version 2.0gamma 1/27/96
To: Michael Smith
cc: luigi@iet.unipi.it (Luigi Rizzo), hackers@FreeBSD.ORG,
	multimedia@FreeBSD.ORG
Subject: Re: dma handling in the sound driver
In-reply-to: Your message of "Mon, 21 Jul 1997 11:28:44 +0930."
	<199707210158.LAA20128@genesis.atrad.adelaide.edu.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sun, 20 Jul 1997 20:28:52 -0700
From: Amancio Hasty
Sender: owner-freebsd-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Hi Guys,

All I am trying to do is to explain the existing dma algorithm so that
the new redesign does not break the sound apps.

Michael's explanation is very close to the current dma algorithm.

SNDCTL_DSP_SETFRAGMENT sets the size of the io block which the
sub-modules use to send or receive data.

SNDCTL_DSP_SETBLKSIZE is, for all practical purposes, the same as the
above.

If the user does not set the block size, the dma routines choose a
size equal to about 1/2 second's worth of data. For typical Sun-style
au files this is 4096 bytes.

A quick walk through the existing dma routines:

cat TSTSND-elvis-has-left-bldg.au >/dev/audio

Jul 20 19:15:29 cioloco /kernel: audio cnt 18364

audio_write initiates a request for dma.

Jul 20 19:15:29 cioloco /kernel: dmabuf start count 65536

Set up an auto dma buffer of size 65536.
The current block size which we use for io is 4096.
qlen is the number of buffers queued up.
The dma routines loop on the auto dma buffer modulo 4096.

Jul 20 19:15:29 cioloco /kernel: intrflag 0 cnt 4095 dsp_count 4095

This is the first time through in sb16_dsp.c, so we kick off the dma
process.

Jul 20 19:15:29 cioloco /kernel: what qlen 4 qhead 1

We got an interrupt and we have 4 buffers queued up.

Jul 20 19:15:29 cioloco /kernel: return cnt 4095
Jul 20 19:15:30 cioloco /kernel: what qlen 3 qhead 2
Jul 20 19:15:30 cioloco /kernel: return cnt 4095
Jul 20 19:15:30 cioloco /kernel: what qlen 2 qhead 3
Jul 20 19:15:30 cioloco /kernel: return cnt 4095
Jul 20 19:15:31 cioloco /kernel: what qlen 1 qhead 4
Jul 20 19:15:31 cioloco /kernel: what qlen 0 qhead 5

No more buffers to process; we are done.

I don't think latency is a problem with the current dma routines,
given that we can always set the io block size. However, the dma
routines are very complex and simplification is needed.

From The Desk Of Michael Smith :
> Luigi Rizzo stands accused of saying:
> >
> > - with two DMA buffers, we can refill one buffer while the other one
> >   is in use by the DMA engine. We can still have troubles if we start
> >   a long refill near the end of operation of DMA on the other buffer,
> >   but this problem can be minimized (but not avoided; if we are late,
> >   we are late, no matter how many buffers we have!)
>
> This is the traditional double-buffered approach. You go on to
> describe a triple-buffered approach which is more suited to the high
> latency that is often encountered in multiprocessing situations.
>
> > In our implementation, we use a single memory block structured as
> > three logical, variable-size, buffers: one in use by the dma engine,
> > the next one ready for use by the dma (already filled up), the last
> > one free for refills.
> >
> >           dp,dl         rp,rl           fp,fl
> >   +-------+-------------+---------------+------+
> >   | free  | used by dma | ready for use | free |
> >   +-------+-------------+---------------+------+
> >
> > Both the "ready" and "free" areas can wrap around the end of the
> > buffer.
>
> I presume that the plan here is that the host DMA controller loops
> endlessly over this buffer in autoinit mode?
>
> If not, there's no apparent need for such a complex approach; you can
> just use three separate buffers each sized suitably to cover the
> latency involved in filling the next.
>
> Another alternative would be to simply use an endlessly-recirculating
> DMA buffer of appropriate size thus :
>
>  DMA
>   V
>   XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXOOOOOOOOOOOOOOOOOO
>
> Some data has been written to the sound device. Regardless of how
> much data you have, you start the DMA going.
>
>                 DMA
>                  V
>   OOOOOOOOOOOOOOOXXXXXXXXXXXXXXXXXXXOOOOOOOOOOOOOOOOOO
>
> Here the DMA is in progress, consuming data. A selecting writer would be
> able to write here.
>
>                           DMA
>                            V
>   YYYYOOOOOOOOOOOOOOOOOOOOOXXXXXXXXXYYYYYYYYYYYYYYYYYY
>
> ... and more data has arrived. The key to keeping more data in the buffer
> than is consumed by the looping DMA is to make sure that any selecting
> writer is woken up often enough to keep you busy. In order to do this,
> you need something that interrupts you more than once per DMA loop.
>
> At 44kHz, 16-bit stereo you are looking at 160kB/sec throughput, which
> equates to 1.6kB per possible wakeup(). This isn't too hard to manage
> really; a 64kB recirculating buffer will give you 400ms of audio, or
> a 200ms mean wakeup time.
>
> You can play some more neat games. In the original first case :
>
>  DMA
>   V
>   XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXOOOOOOOOOOOOOOOOOO
>                                    ^
>                                    TC
>
> you set the terminal count short of the end of the buffer. This avoids
> your having to worry about the DMA running into unfilled buffer space.
>
> In the later case, where the buffer contents have wrapped :
>
>                           DMA
>                            V
>   YYYYOOOOOOOOOOOOOOOOOOOOOXXXXXXXXXYYYYYYYYYYYYYYYYYY
>                                                      ^
>                                                      TC
>
> TC is set to the end of the buffer, and the autoinit bit is set, so
> that it will loop back to the start. When it does loop, you'll get an
> interrupt, and you can reprogram TC again :
>
>  DMA
>   V
>   YYYYOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
>      ^
>      TC
>
> If an application is too slow to respond to your waking it up, then
> there's really nothing more you can do. As Amancio has observed, many
> applications will want to write small amounts of data on a regular
> basis. A timer event run once every 1/hz seconds can easily monitor
> the progress of the DMA and update the buffer tail pointer & wake up
> any writers.
>
> In this model, the overhead of uiomove is effectively irrelevant; data
> is solicited from the application as early as is possible.
>
> I'm not sure if this actually helps you...
>
> > Luigi Rizzo                     Dip. di Ingegneria dell'Informazione
>
> --
> ]] Mike Smith, Software Engineer        msmith@gsoft.com.au             [[
> ]] Genesis Software                     genesis@gsoft.com.au            [[
> ]] High-speed data acquisition and      (GSM mobile)     0411-222-496   [[
> ]] realtime instrument control.         (ph)          +61-8-8267-3493   [[
> ]] Unix hardware collector.             "Where are your PEZ?" The Tick  [[
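
For reference, the recirculating-buffer scheme quoted above can be
sketched in a few lines of C. This is only an illustration under stated
assumptions, not code from the existing driver: struct dma_ring and the
ring_*() helpers are hypothetical names, offsets are plain byte counts,
and actually programming the DMA controller and waking writers is left
out. The one idea it shows is the TC choice: stop at the end of the
valid data when the data has not wrapped, otherwise run to the end of
the buffer and let autoinit loop back.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical bookkeeping for one recirculating DMA buffer.
 * None of these names exist in the current sound driver.
 */
struct dma_ring {
    size_t size;  /* total buffer size in bytes */
    size_t head;  /* offset the DMA engine consumes next */
    size_t fill;  /* valid bytes starting at head; may wrap past size */
};

/* Bytes a writer may add without overwriting unplayed data. */
static size_t
ring_free(const struct dma_ring *r)
{
    return r->size - r->fill;
}

/* Account for a write of up to n bytes; returns the bytes accepted. */
static size_t
ring_write(struct dma_ring *r, size_t n)
{
    size_t space = ring_free(r);

    if (n > space)
        n = space;
    r->fill += n;
    return n;
}

/* Account for n bytes consumed by DMA (from an interrupt or a timer tick). */
static void
ring_consume(struct dma_ring *r, size_t n)
{
    assert(n <= r->fill);
    r->head = (r->head + n) % r->size;
    r->fill -= n;
}

/*
 * Offset at which the current DMA pass should stop (the "TC" in the
 * diagrams).  If the valid data does not reach the end of the buffer,
 * stop short at the end of the data; otherwise run to the end of the
 * buffer and rely on autoinit to loop back to offset 0.
 */
static size_t
ring_terminal_count(const struct dma_ring *r)
{
    size_t end = r->head + r->fill;

    return end < r->size ? end : r->size;
}
```

A timer event run once every 1/hz seconds, as suggested in the quoted
text, would call something like ring_consume() with the number of bytes
the DMA has moved since the last tick, and then wake any writer for
which ring_free() is non-zero.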