Digital and Analog Audio Representation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Analog Representations
~~~~~~~~~~~~~~~~~~~~~~

Air
~~~

Most fundamentally, sound exists as fluctuations of air pressure in our atmosphere. These fluctuations are commonly graphed as functions of time, producing the wave images we are accustomed to viewing. The wave's amplitude reflects the air pressure at a fixed point at each moment in time, where zero amplitude refers to "normal" atmospheric pressure. As the amplitude increases, our ears interpret a louder signal. There is theoretically no limit to the volume of sound, but past a certain level our minds stop registering any increase.

It is important to realize that there is never any part of the signal representing "pitch": pitch is entirely a construction of our minds. The pitch of a signal is directly related to the frequency of the fluctuations in amplitude. Understand that at any given instant, no particular tone is being played. Tones only exist over (potentially very short) periods of time.

Wire
~~~~

Audio can be transmitted across 2 wires fairly easily. A carbon microphone has a diaphragm whose electrical resistance changes as it distorts, and it distorts with the fluctuations in air pressure that make up sound. When the audio is transmitted across the wire, the amount of current flowing through the wire corresponds to the change in air pressure. In order to represent both sides of the wave, the signal is carried as alternating current. This also means that there are no polarity requirements for speakers. They can be plugged in backward, with the only consequence that they may be "out of phase" with other speakers, potentially cancelling out certain frequencies of the sound.

Since many microphones use essentially the same diaphragm-and-coil setup as headphones, both can work as recording *and* playback devices, and can be used interchangeably in many situations. So, if you're ever stuck without a microphone, just plug in your headphones instead.

Radio Waves
~~~~~~~~~~~

When transmitting electric signals through the air, it is generally not practical to transmit at audio frequencies, for two reasons. One, you could only transmit one audio signal at a time without interfering with every other one, and two, the wavelengths at such low frequencies would require massive antennas for effective broadcasting. There are many techniques for broadcasting audio over the airwaves. I will briefly discuss 2 of them: AM and FM.

AM
~~

Amplitude Modulation is fairly self-explanatory. Essentially, a "carrier wave" is generated at, say, 800,000 cycles per second; it is nothing more than a sine wave oscillating at this frequency, often expressed as 800 kHz. Next, the amplitude of the carrier signal is varied to correspond to the air pressure currently being detected at the microphone. Notice that the frequency of the carrier signal does not change, only the "height" of its waves. This signal is then radiated into the air. A person who wants to decode a signal on 800 kHz generates his/her own 800 kHz carrier wave, mixes it with the signal picked up from the air, and is left with a signal that can be filtered, amplified, and sent straight to a speaker.
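To make this concrete, here is a minimal sketch of AM in Python (assuming numpy is available). The 10 kHz carrier and 500 Hz tone are toy values chosen only so the simulation stays small; a real broadcast carrier would sit at something like 800 kHz:

    import numpy as np

    sample_rate = 100_000                      # simulation rate, in samples per second
    t = np.arange(0, 0.01, 1 / sample_rate)    # 10 milliseconds of time

    audio   = np.sin(2 * np.pi * 500 * t)      # pretend microphone signal: a 500 Hz tone
    carrier = np.sin(2 * np.pi * 10_000 * t)   # toy 10 kHz carrier, standing in for 800 kHz

    transmitted = (1 + 0.5 * audio) * carrier  # the carrier's amplitude follows the audio

    # Receiver: mix with a locally generated copy of the carrier, then low-pass
    # filter (a crude moving average here) and strip the DC offset to get the
    # original audio back, scaled down.
    mixed     = transmitted * carrier
    kernel    = np.ones(20) / 20
    recovered = np.convolve(mixed, kernel, mode="same")
    recovered = recovered - recovered.mean()   # roughly 0.25 * audio

The "1 +" keeps the transmitted envelope positive, which is what lets a simple receiver recover the audio just by following the height of the carrier's peaks.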
FM
~~

Frequency Modulation is slightly more complicated than AM, but generally makes for better quality sound. Since the standard commercial FM bands are at much higher frequencies than the AM bands, a much smaller antenna is needed for broadcasting. Because of this, FM has become the pirate radio station's modulation of choice.

FM works by modulating the frequency instead of the amplitude (duh). The range of frequencies that a signal takes up using this method is called its "bandwidth". For commercial FM broadcasts, the bandwidth is roughly 150 kHz: the carrier swings up to 75 kHz above and below its center frequency. Since an FM signal's amplitude remains constant, it is less susceptible than AM to RF interference (the fuzz, clicks, and pops). On the commercial FM band, the channels are spaced 200 kHz apart starting at 87.5 MHz, which is why commercial FM stations always sit on an odd 100-kHz digit (88.1, 94.7, and so on).

FM is also able to transmit stereo signals. A 19 kHz pilot tone is added when stereo is being broadcast, which is what lights up the stereo indicator LED on many radios. On the normal channel, the radio receives the left signal added to the right signal, L+R, so that mono radios will play both channels mixed together. On another subcarrier, the FM station broadcasts L-R. When the receiver wants to play stereo, it adds the 2 channels together:

    (L+R) + (L-R) = L + L + R - R = 2L

and it subtracts them:

    (L+R) - (L-R) = L - L + R + R = 2R

It then has 2L and 2R, which it sends to the appropriate speakers at half volume.

Telephone
~~~~~~~~~

A telephone works similarly to an ordinary microphone-and-speaker wire, except that the telephone is a full-duplex device (meaning you can talk and receive at the same time): both parties' signals travel over the same pair of wires simultaneously.

The telephone takes some of the signal from your own microphone and adds its inverse to the signal fed into your earpiece, cancelling out most, but deliberately not all, of your own voice. The portion you still hear is called the "sidetone". The sidetone is there, essentially, to quiet people down: with too little sidetone, callers tend to start shouting into the phone, and if your voice were cancelled out completely, the phone would appear to be dead.

Although there is no technical requirement for it, the telephone company usually does not transmit frequencies below about 300 hertz or above 3400 hertz. The main advantage of this is that more calls can be carried at once over long-distance lines than would be possible if the full frequency spectrum had to be maintained.

A telephone is also unusual in that it draws its power from the phone line itself. The telephone company feeds this power down the line from large lead-acid battery banks, which ensures that phones keep working during power outages and provides the "hum-free" supply needed for clean audio transmission.

Misc
~~~~

Aside from being transmitted, analog encodings are also popular for storing sound. A phonograph stores the waveform as a groove etched into a vinyl disc; a cassette stores it as varying magnetization along a strip of magnetic tape; and so on. However, analog encoding is not the encoding method of choice for all applications. This brings us to digital representations.

Digital Representation
~~~~~~~~~~~~~~~~~~~~~~

I'll discuss 4 kinds of digital audio representation: PCM, FM synthesis, compressed PCM, and a miscellaneous category.

PCM
~~~

Pulse Code Modulation is perhaps the most common form of digital audio representation. PCM works by taking samples of the air pressure at the microphone at consecutive points in time and recording them as digital numbers. These measurements are taken according to a clock that ticks at a fixed rate: the sample rate. For CD quality audio, the sample rate is 44100 samples/second; 8000 samples/second is usually good enough for human speech. The amount of space used for each sample is called the "sample width".
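As a rough illustration (again a sketch in Python with numpy, using made-up values), the following quantizes a 440 Hz tone into 16-bit PCM samples at 8000 samples/second and works out how much space the clip occupies:

    import numpy as np

    sample_rate  = 8000        # samples per second -- adequate for speech
    sample_width = 16          # bits per sample
    seconds      = 2

    t        = np.arange(0, seconds, 1 / sample_rate)
    pressure = np.sin(2 * np.pi * 440 * t)    # pretend microphone signal: a 440 Hz tone

    # Quantize each sample to a signed 16-bit integer, as a wave file would store it.
    samples = np.int16(pressure * 32767)

    size_in_bits = sample_rate * sample_width * seconds   # one (mono) channel
    print(size_in_bits // 8, "bytes")                     # 32000 bytes for this 2 second clip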
CD quality audio uses 16 bits per sample; most human speech systems use only 8. Often, inside a computer, the sound is processed in 32 bits even though it was recorded at 16 (or 24 with modern soundcards). This is to prevent "clipping" of the audio during effects processing.

The amount of space that a PCM recording takes can easily be calculated from these two values. For instance, a 10 second stereo clip at CD quality would take:

    44100 * 16 * 10 * 2 = 14,112,000 bits, or 1,764,000 bytes

About 1.7 megs. This is about the size a CD quality stereo wave file takes too, for good reason: a wave file is no more than a large PCM file (plus headers).

An interesting property of PCM audio is that it cannot represent frequencies higher than half of the sample rate. This is because both the bottom and the top half of each cycle need to be recorded, so one cycle of a wave needs at least 2 samples. So, a CD's maximum representable frequency (its Nyquist limit) is 44100/2 = 22050 hertz.

Note that most sound cards work exclusively in PCM for both their input and their output, so any other kind of digital audio usually has to be converted to PCM before it can be played.

FM Synthesis
~~~~~~~~~~~~

Frequency Modulation Synthesis was a rather poorly chosen name for this type of representation, because it is easily confused with FM radio; FM synthesis has nothing to do with FM radio. FM synthesis essentially builds sounds out of sine-wave oscillators, with one oscillator modulating the frequency of another; only the oscillators' parameters (frequencies, durations, and so on) are stored, not the waveform itself. Some sound cards support this directly, but it has almost completely been replaced by PCM. FM synthesis's main advantage over PCM is that the audio takes far less space. It requires more computation than PCM for playback, but the main disadvantage is that the sound quality is usually poor: music and voice generally sound "tinny".

Compressed PCM
~~~~~~~~~~~~~~

Compressed PCM representation is becoming more and more popular. I won't go into how it works, but essentially it is any compression method that, when uncompressed, yields PCM audio. There are 2 types of compressed PCM: lossy and lossless. Lossless compression guarantees that when the audio is uncompressed you will have an exact copy of the original PCM data. Lossy compression throws out some of the PCM data in order to conserve space. MP3, OGG, etc. are examples of lossy compression; FLAC is an example of lossless audio compression.

Other
~~~~~

One other method of storing digital audio is to store a list of particular wave files to play, or particular notes and volumes, etc. MIDI is a good example of this. It stores musical data (which notes to play, when, and how loudly), and the MIDI player is responsible for generating the PCM waveforms it sends to the soundcard. As a consequence, MIDI files are very small, but they can't represent sound in the same sense that PCM or even FM synthesis can.

As you can see, there are quite a few ways to represent sound, and this is really only the tip of the iceberg: there are dozens more ways of representing audio in both analog and digital forms. This was only a quick summary of them.

Fractal
Hardcore Software
hcsw.org