Wednesday, November 7, 2012

Do the WAV, Part I

Whew, sorry for the hiatus (again), folks; they just seem to keep getting longer! I'm going to try breaking my posts into shorter updates so I can write more often.

Today's update will be on audio file formats, specifically WAV. WAV stands for Waveform Audio File Format, and it is built on the RIFF specification. RIFF, which stands for Resource Interchange File Format, was developed jointly by IBM and Microsoft and first released in 1991. The RIFF specification covers a multitude of multimedia resource types and is not limited to audio. For example, it also defines a container for MIDI (Musical Instrument Digital Interface) data, the RMID form, so I'll probably try to touch upon MIDI later down the road.

RIFF files are built out of sections of data called "chunks". Each chunk contains a four-character chunk ID, a chunk size (a 32-bit little-endian integer giving the number of data bytes), and the data itself. If the chunk size is odd, a single pad byte follows the data so that the next chunk starts on an even offset. Because every chunk announces its own size, any program that reads RIFF files can skip over chunks with IDs it doesn't recognize. This is a simple yet robust way to stay compatible: when new features are added to a particular file format, older programs can still read the parts they understand.
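
Here's a minimal Python sketch of that skipping behavior, assuming the stream is positioned at the first chunk to scan (in a WAV file, that is just after the 12-byte "RIFF"/size/"WAVE" header). The name find_chunk is my own hypothetical helper, not part of any library.

```python
import io
import struct

def find_chunk(stream, wanted):
    """Return the data of the first chunk whose ID is `wanted`, or None.

    Each chunk is a 4-byte ASCII ID, a 4-byte little-endian size, then
    `size` bytes of data, plus one pad byte when `size` is odd.
    """
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return None  # ran out of chunks without finding `wanted`
        chunk_id, size = struct.unpack("<4sI", header)
        if chunk_id == wanted:
            return stream.read(size)
        # Unknown or uninteresting chunk: jump over its data and any pad byte.
        stream.seek(size + (size & 1), io.SEEK_CUR)

# An unrecognized "junk" chunk (odd size, so it carries a pad byte), then "data".
raw = (b"junk" + struct.pack("<I", 3) + b"xyz\x00"
       + b"data" + struct.pack("<I", 4) + b"\x01\x02\x03\x04")
print(find_chunk(io.BytesIO(raw), b"data"))  # b'\x01\x02\x03\x04'
```

The reader never needs to know what "junk" means; the size field alone is enough to step past it.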

So, the simplest possible arrangement for a WAV file is as follows:
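
Offset  Size  Field
     0     4  Chunk ID: "RIFF"
     4     4  Chunk size
     8     4  Format: "WAVE"
    12     4  Chunk ID: "fmt "
    16     4  Chunk size: 16 for PCM
    20     2  Format category: 1 for PCM
    22     2  Number of channels
    24     4  Sample rate
    28     4  Byte rate
    32     2  Block alignment
    34     2  Bits per sample
    36     4  Chunk ID: "data"
    40     4  Chunk size
    44     n  Audio data

All offsets and sizes are in bytes, and multi-byte integers are stored little-endian.
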
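Since this minimal arrangement always starts with the same fixed 44-byte header, a reader can unpack it in one go. Below is a minimal Python sketch, assuming the file holds exactly the "RIFF", "fmt " and "data" chunks in that order with nothing in between (real-world files often contain extra chunks, which is where chunk skipping comes in). WAV_HEADER and parse_header are hypothetical names of my own.

```python
import struct

# One format string covering the whole 44-byte minimal header.
# "<" = little-endian with no padding, "4s" = 4 raw bytes, "I" = uint32, "H" = uint16.
WAV_HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")

def parse_header(data):
    """Unpack the fixed 44-byte header of a minimal PCM WAV file into a dict."""
    (riff, riff_size, wave, fmt_id, fmt_size, fmt_category, channels,
     sample_rate, byte_rate, block_align, bits_per_sample,
     data_id, data_size) = WAV_HEADER.unpack_from(data)
    if (riff, wave, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a minimal RIFF/WAVE layout")
    return {
        "riff_size": riff_size,
        "fmt_size": fmt_size,
        "fmt_category": fmt_category,
        "channels": channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,
        "block_align": block_align,
        "bits_per_sample": bits_per_sample,
        "data_size": data_size,
    }

print(WAV_HEADER.size)  # 44
```

For a file on disk, something like parse_header(open("example.wav", "rb").read(44)) would do, where "example.wav" is just a placeholder name.
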
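The first chunk ID is the ASCII string "RIFF", signifying that this is a RIFF-formatted file. The chunk size that follows is the size of the remainder of the file after the size field, i.e. the total file size minus 8 bytes (4 for "RIFF" and 4 for the size field itself). The format string "WAVE" that comes next is counted in that size, and it signifies that the rest of the file follows WAV conventions. For the minimal arrangement, the size works out to 36 plus the size of the audio data.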
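
To make the size arithmetic concrete, here's a small Python sketch of the 12-byte RIFF header, assuming the length of everything after "WAVE" (the sub-chunks) is already known. riff_header is a hypothetical helper name.

```python
import struct

def riff_header(body_len):
    """Return the 12-byte RIFF/WAVE header for `body_len` bytes of sub-chunks."""
    # The size field counts everything after itself: the 4-byte "WAVE" tag
    # plus all sub-chunks. That equals the total file size minus 8.
    return struct.pack("<4sI4s", b"RIFF", 4 + body_len, b"WAVE")

# Minimal PCM file: "fmt " chunk (8 + 16 bytes) plus "data" chunk (8 + n bytes).
n = 1000  # bytes of audio data
header = riff_header((8 + 16) + (8 + n))
print(struct.unpack("<I", header[4:8])[0])  # 1036, i.e. 36 + n
```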

The next chunk is the "fmt " chunk. Be certain to note that an ASCII space character follows the first three characters of the ID string. The chunk size is 16 bytes for PCM (pulse code modulation); other format categories can use a larger chunk, but I don't intend to touch upon any format category other than PCM. Next, of course, is the format category itself; PCM is represented by 0x1 in WAVE files. Following the format category is the number of channels in the file: 1 for mono, 2 for stereo. The sample rate is the number of samples per second, per channel (44100 for CD audio, for example), and the byte rate is the number of bytes per second of audio, which works out to sample rate × number of channels × bits per sample / 8. A program reading the WAV file can use the byte rate to estimate the size of the buffer it needs for playback. Block alignment sounds like black magic, but it is simply the number of bytes in one sample frame (one sample for every channel), i.e. number of channels × bits per sample / 8; readers process audio in whole blocks of this size. Finally, bits per sample is the size of each individual sample in the data.
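
The two derived fields are easy to get wrong, so here's a minimal Python sketch that computes them and packs a PCM "fmt " chunk. fmt_chunk is a hypothetical helper, and it assumes byte-sized samples (8 or 16 bits) so the division by 8 is exact.

```python
import struct

def fmt_chunk(channels, sample_rate, bits_per_sample):
    """Build a 24-byte PCM "fmt " chunk (8-byte chunk header + 16-byte body)."""
    block_align = channels * bits_per_sample // 8  # bytes per sample frame
    byte_rate = sample_rate * block_align          # bytes per second of audio
    return struct.pack("<4sIHHIIHH",
                       b"fmt ", 16,   # chunk ID (note the trailing space) and size
                       1,             # format category: 1 = PCM
                       channels, sample_rate, byte_rate,
                       block_align, bits_per_sample)

chunk = fmt_chunk(channels=2, sample_rate=44100, bits_per_sample=16)
print(len(chunk))                          # 24
print(struct.unpack("<II", chunk[12:20]))  # (44100, 176400)
```

For CD-quality stereo, each frame is 2 channels × 2 bytes = 4 bytes, so one second of audio takes 44100 × 4 = 176400 bytes.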

Finally, the last chunk holds the actual audio data. If PCM is used, each sample is usually an 8- or 16-bit discrete representation of the audio signal, much like those I have described in detail in my previous posts. Note that 8-bit samples are stored unsigned (128 is silence), while 16-bit samples are signed little-endian integers. The chunk ID is simply the four ASCII letters "data", and the chunk size is the length, in bytes, of the audio data. If the audio is in stereo, the samples are interleaved: left, right, left, right, and so on.
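
Interleaving is easiest to see in code. Below is a minimal Python sketch, assuming 16-bit stereo audio supplied as two equal-length lists of signed integers in the range -32768 to 32767 (struct.pack raises an error for anything outside that range). data_chunk_stereo16 is a hypothetical helper name.

```python
import struct

def data_chunk_stereo16(left, right):
    """Build a "data" chunk from two equal-length lists of signed 16-bit samples."""
    if len(left) != len(right):
        raise ValueError("left and right channels need the same number of samples")
    frames = bytearray()
    for left_sample, right_sample in zip(left, right):
        # One sample frame: left first, then right, each a little-endian int16.
        frames += struct.pack("<hh", left_sample, right_sample)
    # Each frame is 4 bytes, so the data size is always even and needs no pad byte.
    return struct.pack("<4sI", b"data", len(frames)) + bytes(frames)

chunk = data_chunk_stereo16([0, 1000], [-1000, 32767])
print(len(chunk))  # 16: 8-byte chunk header + 2 frames of 4 bytes
```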


Well, I think this is enough for one post! In my post tomorrow, I'll provide code samples for writing WAV files, as well as some of the resulting files for your listening... erm... pleasure. Until next time!

-End transmission-
