Date: Mon, 02 Nov 1998 12:26:26 -0800
From: grady@xcf.berkeley.edu (Steven Grady)
To: multimedia@FreeBSD.ORG
Subject: How can we switch to a higher-level audio interface?
Message-ID: <199811022025.MAA29067@hub.freebsd.org>
[Summary: the current standard of interacting directly with the audio device sucks. What approach can we take to improve it, if any?]

As I've been experimenting more with various audio-related pieces of software in the last few months, I've become more and more concerned that an increasing body of software uses a _really_ broken approach to sound, namely, opening the device directly. This has three serious problems: it doesn't work over the network, only one application at a time can play a sound, and it is a low-level API. All of these problems used to exist for graphics, which was of course why X was developed.

Various solutions have been proposed. The Network Audio Server is probably the most advanced, but its development and maintenance seem to have been abandoned, and the documentation is skimpy (I know -- I'm trying to write an application that uses it). There's also rplay, KDE's kaudioserver (which I haven't looked into yet), and some others. The problem is that none of these solutions is being adopted on a widespread basis. Instead, new applications still use /dev/audio, /dev/dsp, etc.

It seems to me that there are a few reasons why no server-based model has been adopted. The main reason is that there is no standard: unlike X, there is no single interface that everyone uses. Also, the various solutions are not particularly robust -- who wants to spend their precious development time debugging someone else's code? But most likely, I think it is just a huge blind spot for most people. People seem comfortable with direct device access, despite a much better alternative model (X) being literally in front of their faces.

It's worth analyzing the three problems I mentioned above:

  non-network: Apparently not a serious problem for most people, since most people run applications on the machine in front of them. An obvious exception is X terminals, which is why NCD took over the development of NAS. But most BSD/Linux people don't run on X terminals.

  low-level API: Only affects developers. I'm sure the developers dealing with sound would like a higher-level API (I know I do), but since the user won't see it, it doesn't seem worth dealing with.

  exclusive access: This is the thing that amazes me. I HATE not having shared access. I want to use the audio to play MP3s, have interesting system sounds, use speak-freely, run timidity, play games, etc. Right now, I have to choose which one I want and turn off everything else. That SUCKS. Why aren't users up in arms about this?

Okay, so my question is: is it too late to do anything about this? Here are some possible approaches:

  - Change the device driver so that multiple access is allowed.
    Advantages: No applications need to be changed. Addresses the most obvious problem with the least amount of effort.
    Disadvantages: Terrible design -- that kind of functionality should _not_ go in a device driver. Doesn't address all the problems. Not beneficial outside the FreeBSD world. Dangerous -- bugs in tricky algorithms could crash the system.

  - Implement a dynamically loadable device driver that allows multiple access.
    Advantages: No applications need to be changed.
    Disadvantages: Still not a good design. Doesn't address all the problems. Not beneficial outside the FreeBSD world. Requires additional non-trivial work to make the sound driver dynamic (although I think someone should do this anyway). Still dangerous.

  - Port all the applications to use NAS (or rplay, kaudioserver, etc.).
    Advantages: Moves to a more X-like model.
    Disadvantages: SERIOUS porting effort. Locks us into one audio API (which may not end up being standard). Such major changes may not be accepted back into the code base by the original developers. Existing APIs are still fairly low-level. The chosen audio interface must be debugged (both client and server) and potentially enhanced to support all existing device-level functionality.

  - Develop a toolkit-level API, port all applications to use that API, and implement it on top of one or more audio server formats. Ultimately, create multiple language bindings.
    Advantages: Moves to a more Tk-like model. Programs that use audio in a simple way (e.g. playing a sound) may require only very simple changes, promoting acceptance by the original application developers. Eases cross-platform (i.e. beyond UNIX) implementations. Makes it easy to switch the low-level interface to whichever is best (most robust and/or functional).
    Disadvantages: SERIOUS development effort. Non-trivial porting effort. For audio-intensive applications (timidity, speak freely, etc.), not likely to be accepted back into the original codebase (until the toolkit becomes established). A proliferation of libraries either bloats code or causes installation problems (with dynamic libraries).

Another issue to consider is that we FreeBSD folks are in a privileged position with respect to the Linux folks. Since our ports model incorporates patching existing code bases, we can make programs work with FreeBSD by supplying a patch, rather than trying to get all the individual developers to incorporate whatever changes we come up with. So if we followed one of these approaches, we could experiment just within our world, rather than in the entire free software community.

What do people think about these issues? Are there others? Is there any chance that I'll be able to play an MPEG AND have my system ring when a chat request comes in, without switching to Windows?

(Oh, one more thing. This problem occurred with displays, and it also occurs with sound. In fact, it's a general device-access issue. Even if we fix it for sound, we still have the problem of accessing the joystick, the CD-ROM, tape drives, etc. People come up with individual ad-hoc solutions; maybe it's time to think about what could be abstracted into a general solution.)
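To make the toolkit-level approach a little more concrete, here is a minimal Python sketch under stated assumptions. All names (play_sound, AudioBackend, DeviceBackend, MixingServer, mix) are hypothetical and not taken from NAS, rplay, kaudioserver or any other existing package. Applications see a single simple call; the low-level interface behind it can be swapped, and a server-style backend mixes several clients' signed 16-bit PCM streams (summing and clipping) so that no program needs exclusive access to the device. Real servers also have to handle sample rates, formats, timing and the network, none of which is shown.

```python
# Hypothetical sketch of a toolkit-level audio API. None of these names come
# from NAS, rplay, kaudioserver or any real library; they are illustrative.
from abc import ABC, abstractmethod
from array import array

SAMPLE_MIN, SAMPLE_MAX = -32768, 32767  # range of signed 16-bit PCM samples


def mix(streams):
    """Mix signed 16-bit PCM streams by summing sample-by-sample and clipping.

    A stream shorter than the others is treated as silence once it runs out,
    so the result is as long as the longest input.
    """
    length = max((len(s) for s in streams), default=0)
    out = array("h", [0] * length)
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        out[i] = max(SAMPLE_MIN, min(SAMPLE_MAX, total))  # clip, don't wrap
    return out


class AudioBackend(ABC):
    """What the toolkit needs from any low-level audio interface."""

    @abstractmethod
    def write(self, samples):
        """Queue or play a sequence of signed 16-bit samples."""


class DeviceBackend(AudioBackend):
    """Today's model: write raw samples straight to an open device file,
    e.g. open("/dev/dsp", "wb"). Setting sample format and rate (done with
    ioctl calls on a real device) is omitted. Only one program at a time
    can use the device this way."""

    def __init__(self, fileobj):
        self.fileobj = fileobj

    def write(self, samples):
        self.fileobj.write(array("h", samples).tobytes())


class MixingServer(AudioBackend):
    """In-process stand-in for an audio server: every client's samples are
    buffered, then mixed into one stream, so no client needs the device
    to itself."""

    def __init__(self):
        self.pending = []

    def write(self, samples):
        self.pending.append(array("h", samples))

    def flush(self):
        """Return the mix of everything written since the last flush."""
        mixed = mix(self.pending)
        self.pending = []
        return mixed


def play_sound(backend, samples):
    """The whole application-facing API: one call, backend-agnostic."""
    backend.write(samples)
```

With this split, an application only ever calls play_sound(); whether the samples go straight to the device or through a mixing server is decided in one place. For example, two clients writing [1000, 2000, 30000] and [500, -2000, 10000, 7] to the same MixingServer get back [1500, 0, 32767, 7] from flush(): the third sample is clipped to the 16-bit maximum rather than wrapping around.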
(And another thing -- although I've been thinking about this for a while, I was inspired to actually send this e-mail after reading a very interesting memo on open source from a high-level Microsoft dude, annotated by Eric Raymond. It occurred to me while reading it that while X provides a wonderful example on the plus side of open source, our current audio problems are a strong example on the minus side. I'd hate to think the audio situation was a lost cause. URL: http://www.tuxedo.org/~esr/halloween.html)

Steven
grady@xcf.berkeley.edu

"I think life should be more like TV. I think all of life's problems ought to be solved in 30 minutes with simple homilies, don't you? I think weight and oral hygiene ought to be our biggest concerns. I think we should all have powerful, high-paying jobs, and everyone should drive fancy sports cars. All our desires should be instantly gratified. Women should always wear tight clothes, and men should carry powerful handguns. Life overall should be more glamorous, thrill-packed, and filled with applause."

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-multimedia" in the body of the message