Wednesday, November 28, 2012

How FM Killed the Additive Star, Part II

Now that we've looked at some of the mathematics behind FM synthesis, let's try to attain a more concrete, intuitive grasp of the concepts. First, I've zipped up some code, which you can get here. Extract it and follow the instructions in the README.

The code is fairly straightforward. I pulled elements from previous wave synthesis work, including the WavWriter, ByteConverter, and Oscillator base class. The FmOscillator simply implements the nextSample() method which Oscillator defines. One issue which has not yet been addressed is that of unbounded growth in the phase. If the note is sufficiently long, the floating point variable representing the current phase eventually grows large enough to lose precision, causing some potentially undesirable effects in the audio. While this issue is easily rectified in most other synthesis techniques (simply subtract 2 pi from the current phase when it exceeds 2 pi), the FM wave actually has two frequencies, and therefore two phases, one for the carrier frequency and one for the modulating frequency. Finding the exact point at which they align might be tricky (or even impossible, I'm not certain), but it's definitely an interesting problem which ought to be addressed.
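One possible way around the alignment question, sketched here under my own assumptions rather than taken from the zipped code: if the carrier and the modulator each get their own phase accumulator, each one can be wrapped at 2 pi independently, since sin() does not care about whole turns. The class name, member names, and constructor arguments below are all hypothetical.

```cpp
#include <cassert>
#include <cmath>

const double kTwoPi = 6.283185307179586;

// Hypothetical stand-in for an FM oscillator; not the FmOscillator from the zip.
class WrappedFmOscillator {
public:
    WrappedFmOscillator(double carrierHz, double harmonicity,
                        double modIndex, double sampleRate)
        : m_carrierStep(kTwoPi * carrierHz / sampleRate),
          m_modStep(kTwoPi * carrierHz * harmonicity / sampleRate),
          m_modIndex(modIndex),
          m_carrierPhase(0.0),
          m_modPhase(0.0) {
        assert(sampleRate > 0.0);
    }

    // Returns the next sample in [-1, 1] and advances both phases.
    double nextSample() {
        const double sample =
            std::sin(m_carrierPhase + m_modIndex * std::sin(m_modPhase));
        // sin() is 2*pi periodic, so each phase is wrapped on its own;
        // the two never need to line up.
        m_carrierPhase = std::fmod(m_carrierPhase + m_carrierStep, kTwoPi);
        m_modPhase = std::fmod(m_modPhase + m_modStep, kTwoPi);
        return sample;
    }

    double carrierPhase() const { return m_carrierPhase; }
    double modPhase() const { return m_modPhase; }

private:
    double m_carrierStep;   // carrier phase increment per sample
    double m_modStep;       // modulator phase increment per sample
    double m_modIndex;
    double m_carrierPhase;
    double m_modPhase;
};
```

The trade-off is one extra accumulator and one extra fmod per sample, in exchange for phases that stay small no matter how long the note runs.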

The two parameters which vary in this sample program are harmonicity and modulation index. Harmonicity, from a mathematical standpoint, is the ratio of the modulating frequency to the carrier frequency of the FM wave. This ratio is convenient, since it allows us to express our modulating frequency in terms of our carrier frequency, rather than defining two distinct frequencies. Modulation index scales the sinusoid generated by the modulating frequency.
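To make the roles of the two parameters concrete, here is a minimal sketch of a single FM sample computed directly from time. The function names are my own for illustration and the amplitude is fixed at 1; this is not the code from the zip.

```cpp
#include <cassert>
#include <cmath>

const double kTwoPi = 6.283185307179586;

// Harmonicity is the ratio modulating / carrier, so the modulating
// frequency falls straight out of the carrier frequency.
double modulatingFrequency(double carrierHz, double harmonicity) {
    assert(carrierHz > 0.0 && harmonicity >= 0.0);
    return carrierHz * harmonicity;
}

// One FM sample at time t (in seconds), with the amplitude fixed at 1.
double fmSampleAt(double t, double carrierHz,
                  double harmonicity, double modIndex) {
    const double modHz = modulatingFrequency(carrierHz, harmonicity);
    // The modulation index scales the modulating sinusoid...
    const double modulator = modIndex * std::sin(kTwoPi * modHz * t);
    // ...which is then added to the carrier's phase.
    return std::sin(kTwoPi * carrierHz * t + modulator);
}
```

With a modulation index of 0 the modulator term vanishes and the function returns a plain sine wave at the carrier frequency.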

So, let's take a look at a few wave forms and see how varying harmonicity and modulation index affects them. First, we shall generate a wave with a harmonicity of 1.0 and a modulation index of 1.0. The resulting wave looks like this:

With our parameters, the resultant output is reminiscent of a simple additive sine wave we produced earlier in this blog series.

Now, keeping the harmonicity constant, let's increase the modulation index a few times:

The above waves have modulation indices of 1.1 and 1.5, respectively. Harmonicity is held at a constant 1.0. It appears that not too much has changed visually about these waveforms; the extent to which the wave is altered by that "additive-esque" bit simply increases. Perhaps then, we can think of the modulation index as the amount by which the carrier wave is affected by the modulating wave.

Now let's keep modulation index constant at 1.0, and vary harmonicity:

Whoa, that's a bit of a difference! The waveforms are still periodic, but it's difficult to see the carrier frequency sinusoid at all now! Using a non-integral multiple of the carrier frequency as the modulating frequency puts it out of phase with the carrier frequency, generating these longer periods of repetition in the wave. When played back, these generated waves sound a bit like a bell tone, much like a ringing phone or a door bell.

Finally, let's vary both parameters a bit:

harm = 1.1, mod = 1.1

harm = 1.1, mod = 1.5

harm = 1.5, mod = 1.1

harm = 1.5, mod = 1.5

We can see here that the observations we have made above apply to these waveforms as well. As modulation index increases, so does the amplitude of the subsidiary "humps". As harmonicity increases, we get more interesting phase differences between the carrier and modulating frequencies. Another way to think of these parameters is as such: the harmonicity dictates our modulating frequency's value. The modulating frequency, which is faster than our carrier frequency whenever harmonicity is above 1, forces our phase to deviate from its linear path. The modulation index determines the amount by which the carrier frequency phase deviates. This deviation leads to the production of audio sidebands in the wave spectrum, which is described mathematically by the infinite sum identity of the FM equation. Our modulation index determines the bandwidth, or number of significant sidebands generated, while the modulating frequency determines the frequency locations of those sidebands. Choosing a harmonic (integer multiple of the carrier) modulating frequency gives us harmonic sidebands, while non-integer multiples will give us inharmonic sidebands.
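As a rough illustration of where those sidebands land, the sketch below lists the frequencies carrier +/- n * modulator for the first few values of n. The function name is hypothetical, and folding negative results back to positive frequencies is the usual textbook treatment rather than something the sample program does. How loud each sideband is depends on the modulation index and is not computed here.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Frequencies of the sideband pairs fc - n*fm and fc + n*fm for n = 1..pairs,
// returned in the order lower, upper, lower, upper, ...
// Negative results are folded back to positive frequencies.
std::vector<double> sidebandFrequencies(double carrierHz,
                                        double harmonicity, int pairs) {
    assert(pairs >= 0);
    const double modHz = carrierHz * harmonicity;
    std::vector<double> result;
    for (int n = 1; n <= pairs; ++n) {
        result.push_back(std::fabs(carrierHz - n * modHz));  // lower sideband
        result.push_back(carrierHz + n * modHz);             // upper sideband
    }
    return result;
}
```

For a 100 Hz carrier, a harmonicity of 1.0 gives sidebands at 0, 200, 100, and 300 Hz (all multiples of the carrier), while a harmonicity of 1.5 puts the first pair at 50 and 250 Hz, neither of which is a multiple of the carrier.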

Cool stuff, no? I wish I could show just where the sidebands appear in the frequency domain, but unfortunately, I don't have a way to spectrally analyze the waveforms... yet. Perhaps this implies an adventure with Discrete Fourier Transforms and their efficient implementations, Fast Fourier Transforms? We shall see...

The Things Yet to Come

Well, the semester is coming quickly (too quickly) to a close, and so I'll only have time for a few more topics. I hope to touch upon the following items:
  • DFTs and FFTs
  • MIDI format
  • JACK Audio
Additionally, I would like to create a project which appropriately sums up my semester's work. I think a simple sequencer application will be a nice piece with which to end my work for this independent study.

--- end transmission ---

Thursday, November 15, 2012

How FM Killed the Additive Star, Part I

Additive synthesis, as we have explored earlier in the blog, is an excellent way to create very complex and interesting periodic waves. Since it is modeled after the Fourier series, it is also quite simple to understand, both conceptually and mathematically. However, it becomes quite expensive to compute waves built from many harmonics. An oscillator must be provided for each wave that is part of the summation, and calculating every sample of every oscillator creates a fair bit of overhead. Additionally, while the waveforms are intricate in their own right, studies have shown that a large portion of our ability to recognize instrument timbre is due to the attack and decay patterns of each instrument. Additive synthesis does a poor job of accurately emulating these patterns.
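To see where the cost comes from, here is a minimal sketch of one additive sample. The function name and the 1/k weighting of the harmonics are assumptions chosen for illustration; the point is only that every output sample needs one sin() call per harmonic.

```cpp
#include <cassert>
#include <cmath>

const double kTwoPi = 6.283185307179586;

// One sample of an additive wave at time t (in seconds): a sum of
// `harmonics` sines at integer multiples of the fundamental.
// The 1/k weights are an arbitrary choice for this sketch.
double additiveSample(double t, double fundamentalHz, int harmonics) {
    assert(harmonics >= 1);
    double sum = 0.0;
    for (int k = 1; k <= harmonics; ++k) {
        // one oscillator evaluation per harmonic, for every single sample
        sum += std::sin(kTwoPi * k * fundamentalHz * t) / k;
    }
    return sum;
}
```

At 44,100 samples per second, a wave with 32 harmonics already means over 1.4 million sine evaluations for each second of audio.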

Fortunately, Frequency Modulation Synthesis (FM synthesis) arose to address these shortcomings. Developed by John Chowning at Stanford during the 1970's, FM synthesis is an alternative means by which complex audio signals may be synthesized. The process is much cheaper than additive synthesis and provides ample flexibility to model, to a very granular level of accuracy, the intricate attacks and decays of actual instruments. While FM synthesis is somewhat "recent" in its conception, frequency modulation has seen its fair share of applications in years prior, most notably in FM radio.

Take a look at the FM synthesis equation below:
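In its usual textbook form, the equation reads as follows. The symbol names are my own shorthand: A(t) for amplitude, f_c for the carrier frequency, I for the modulation index, and f_m for the modulating frequency.

```latex
y(t) = A(t)\,\sin\bigl(2\pi f_c t + I \sin(2\pi f_m t)\bigr)
```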

Amplitude, which may vary by time, simply controls the height of the wave at a given time t. The carrier frequency is the audio frequency about which all the FM sidebands are clustered. Modulation index indicates the amount by which the modulating frequency will affect the carrier frequency over time. The modulating frequency is an additional audio-ranged frequency which is used to alter the carrier frequency.

A bit confusing? Understandably so. To perhaps elucidate the consequences of the equation, we can simplify its appearance a bit:

where phi is the angle swept out by our carrier frequency and beta is the angle swept out by our modulating frequency. We can safely ignore amplitude and modulation index for now. The resulting equation has a trig identity which expands it into an infinite sum of sinusoidal waves at different frequencies, each multiplied by a Bessel function (with which I have NO experience... mathematics is black magic!). These infinite sinusoids actually represent the sidebands previously mentioned. They can be thought of as our harmonics when adding periodic waves together in additive synthesis. Cool stuff, no?
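For reference, the identity is usually stated as below. I am assuming phi = 2 pi f_c t and beta = 2 pi f_m t, and I have put the modulation index I back in, since it is what the Bessel functions J_n take as their argument.

```latex
\sin(\varphi + I \sin\beta) = \sum_{n=-\infty}^{\infty} J_n(I)\,\sin(\varphi + n\beta)
```

Each term with n other than 0 is one sideband, sitting n modulating frequencies away from the carrier, and J_n(I) sets how strong it is.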

For a visual and audio walkthrough on this stuff, check out this link here.

In the next post, I'll provide a code and audio example of FM synthesis so that we can get a more tangible idea of the technique.

-- end transmission --

Friday, November 9, 2012

Do the WAV, Part Deux

File formats are rather dry, so I'm going to try to wrap my overview of WAV up with this post. Fortunately, I'll be providing code snippets and sample WAV files, so the monotony might be somewhat alleviated. As an aside, this post will assume you have basic knowledge of C++ file I/O, as well as some understanding of bit manipulation. If not, just check out the docs online - it's pretty simple stuff.

Okay, so the code listed below is the method I wrote to handle writing the header to the WAV file.

bool WavWriter::writeHeader( std::ofstream & fout, int length ) {
     // scratch buffers live on the stack, so the early return below cannot leak them
     char intBuf[4];
     char shortBuf[2];

     // write "RIFF"
     fout.write( ckId, 4 );
     // write the RIFF chunk size: everything after this field, which is
     // 36 bytes of remaining header plus the audio data
     ByteConverter::intToBytes( 36 + length, intBuf );
     fout.write( intBuf, 4 );
     // write "WAVE"
     fout.write( format, 4 );
     // write "fmt "
     fout.write( fmt_, 4 );

     // write chunk one size
     ByteConverter::intToBytes( 16, intBuf );
     fout.write( intBuf, 4 );

     // write the format category (1 means PCM)
     if( m_format == PCM ) {
          ByteConverter::shortToBytes( 1, shortBuf );
          fout.write( shortBuf, 2 );
     } else {
          // support for other compression formats later
          return false;
     }

     // write the number of channels (2 for stereo, 1 for mono)
     int channels = isStereo() ? 2 : 1;
     ByteConverter::shortToBytes( (short) channels, shortBuf );
     fout.write( shortBuf, 2 );

     // write the sample rate
     ByteConverter::intToBytes( m_sampleRate, intBuf );
     fout.write( intBuf, 4 );

     // write byteRate
     int byteRate = m_sampleRate * channels * ( m_bitsPerSample / 8 );
     ByteConverter::intToBytes( byteRate, intBuf );
     fout.write( intBuf, 4 );

     // write block align
     short blockAlign = channels * ( m_bitsPerSample / 8 );
     ByteConverter::shortToBytes( blockAlign, shortBuf );
     fout.write( shortBuf, 2 );

     // write bits per sample
     ByteConverter::shortToBytes( (short) m_bitsPerSample, shortBuf );
     fout.write( shortBuf, 2 );

     // write "data"
     fout.write( data, 4 );

     return true;
}

It's a bit verbose, but fairly straightforward. Each write to the file (fout) is either a 32-bit or 16-bit length byte array, depending upon which part of the chunk is being written. As you can see, the chunk IDs and chunk sizes are 4 bytes in length, as well as the sample and byte rates. The rest of the fields need only be 2 bytes in length. Please refer to the chart I placed in my previous post to get a more visual overview of the tag ordering in the header.

The ByteConverter methods seen in the code above simply convert the 32-bit and 16-bit datatypes to byte arrays, with the MSB (most significant byte, in this case) being last in the array. The code for such a method would look as such:

void ByteConverter::shortToBytes( short value, char * buffer ) {
/* Writes a short in byte array format, little endian (least significant byte first) */
     buffer[0] = value & 0xFF;           // low byte
     buffer[1] = ( value >> 8 ) & 0xFF;  // high byte (the MSB) goes last
}
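The 32-bit version follows the same pattern. Here is a minimal sketch written as a free function with a name of my own choosing, so it should not be read as the actual ByteConverter::intToBytes.

```cpp
#include <cassert>
#include <cstdint>

// Writes a 32-bit value into buffer[0..3], least significant byte first
// (little endian), which is the byte order WAV headers use.
// Hypothetical helper, not the ByteConverter method from the post.
void intToBytesLE(std::uint32_t value, char* buffer) {
    assert(buffer != nullptr);
    for (int i = 0; i < 4; ++i) {
        // shift the wanted byte down to the bottom, then mask it off
        buffer[i] = static_cast<char>((value >> (8 * i)) & 0xFF);
    }
}
```

For example, the value 0x12345678 ends up in the buffer as the bytes 0x78, 0x56, 0x34, 0x12.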

Now that the header-writing code is taken care of, we can worry about the data being written. This is actually quite simple, and is handled in the method shown below:

1:  bool WavWriter::writeWav( char * wave, int length ) {  
2:       if( isStereo() ) {  
3:            return writeWav( wave, wave, length );  
4:       }  
6:       std::ofstream fout( m_filename.c_str(), std::ios::out | std::ios::binary );  
7:       if( !fout.is_open() ) {  
8:            return false;  
9:       }  
11:       if( !writeHeader( fout, length ) ) {  
12:            return false;  
13:       }  
15:       char * intBuf = new char[4];  
16:       // write chunk two size  
17:       ByteConverter::intToBytes( length, intBuf );  
18:       fout.write( intBuf, 4 );  
20:       // write the data  
21:       fout.write( wave, length );  
23:       fout.flush();  
24:       fout.close();  
26:       delete [] intBuf;  
28:       return true;  
29:  }  

The check for isStereo() at line 2 simply checks if the user wishes to write the data into two channels. I won't post the code for that here, since it's quite similar to this code, but one must simply interleave the two sets of data for left and right channels, alternating samples in the file. The rest of this code writes the size of the data chunk to the file, followed by the data. The file is then flushed and closed, as is good practice, and voila! a WAV file, hot from the oven.
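For the curious, the interleaving step might look something like the sketch below. It is not the stereo writeWav from my source; the function name and the use of std::vector are assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Interleaves two equally sized channel buffers sample by sample:
// left sample 0, right sample 0, left sample 1, right sample 1, ...
// Hypothetical helper for illustration only.
std::vector<char> interleaveStereo(const char* left, const char* right,
                                   std::size_t bytesPerChannel,
                                   std::size_t bytesPerSample) {
    assert(bytesPerSample > 0 && bytesPerChannel % bytesPerSample == 0);
    std::vector<char> out;
    out.reserve(2 * bytesPerChannel);
    for (std::size_t i = 0; i < bytesPerChannel; i += bytesPerSample) {
        // copy one whole sample from each channel, left first
        out.insert(out.end(), left + i, left + i + bytesPerSample);
        out.insert(out.end(), right + i, right + i + bytesPerSample);
    }
    return out;
}
```

The resulting buffer is twice the size of one channel, so the data chunk size written to the file has to be doubled as well.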

My main function looks as such below. Don't mind the oscillator objects I have used in there; they simply encapsulate waveform creation. They are basically giving me one sample of the waveform every time I loop, so that I may fill up my data buffer. The WAV is then written, and the program exits.

#include <cstring>   // strcmp
#include <iostream>
#include <string>
// (plus the project's own headers for WavWriter, ByteConverter, and the oscillators)

int main( int argc, char ** argv ) {
     if( argc <= 2 ) {
          std::cout << "please provide valid command line arguments. Syntax is '/wav_writer <filename.wav> <oscillatortype>'" << std::endl;
          return -1;
     }

     std::string filename;
     filename = "res/";
     filename += argv[1];

     Oscillator * oscillator;
     Oscillator * oscillatorTwo = 0;

     // pick the oscillator(s) named on the command line; default to a sine
     if( strcmp( argv[2], "triangle" ) == 0 ) {
          oscillator = new TriangleOscillator();
     } else if( strcmp( argv[2], "rsaw" ) == 0 ) {
          oscillator = new RisingSawtoothOscillator();
     } else if( strcmp( argv[2], "additive" ) == 0 ) {
          oscillator = new SineOscillator();
          oscillatorTwo = new SineOscillator( 523.0f );
     } else {
          oscillator = new SineOscillator();
     }

     WavWriter wavWriter( filename );

     wavWriter.setBitsPerSample( 16 );
     wavWriter.setStereo( false );

     int dataSize = 5 * oscillator->getSampleRate() * 2; // duration in seconds * sample rate * bytes per sample
     char * data = new char[dataSize];

     if( oscillatorTwo != 0 ) {
          // additive: halve each oscillator so the sum stays within 16 bits
          for( int i = 0; i < dataSize - 1; i+=2 ) {
               ByteConverter::shortToBytes( oscillator->nextSample() / 2 + oscillatorTwo->nextSample() / 2, data, i );
          }
     } else {
          for( int i = 0; i < dataSize - 1; i+=2 ) {
               ByteConverter::shortToBytes( oscillator->nextSample(), data, i );
          }
     }

     if( wavWriter.writeWav( data, dataSize ) ) {
          std::cout << "hooray! it worked!" << std::endl;
     } else {
          std::cout << "aww, no worky." << std::endl;
     }

     // release the sample buffer along with the oscillators
     delete [] data;
     delete oscillator;
     if( oscillatorTwo != 0 ) {
          delete oscillatorTwo;
     }

     return 0;
}

And that's it! Fairly simple, no? Below, I've posted links to WAV files I've produced using the oscillators listed in the code above. If you want to see the shapes of the waveforms produced below, just open them up in your favorite waveform editor. I would suggest Audacity for a lightweight, yet powerful editor.

Rising Sawtooth

Looking for the complete source code? Just hit me up in a comment! Cheers!

- End Transmission -

Wednesday, November 7, 2012

Do the WAV, Part I

Whew, sorry for the hiatus (again) folks; they just seem to keep getting longer! I'm going to try breaking my posts into shorter updates, so I can write more often.

Today's update will be on audio file formats, specifically WAV. WAV stands for Waveform Audio File Format, and is a subset of the RIFF specification. RIFF, developed jointly by IBM and Microsoft and first released in 1991, stands for Resource Interchange File Format. The RIFF specification encompasses a multitude of multimedia resource types and is not simply limited to audio formats. It also defines a wrapper for MIDI (Musical Instrument Digital Interface) files, so I'll probably try to touch upon that later down the road.

RIFF files are constructed out of sections of data called "chunks". Each chunk contains a chunk ID, a chunk size, and the data contained in the chunk. This format allows any program designed to read files adhering to the RIFF specification to skip over chunks with unknown IDs, thus providing a simple yet robust way for files to be backwards compatible, when new features come out for a particular file format.
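To show why this layout makes skipping so easy, here is a minimal sketch of a chunk walker over a buffer already loaded into memory. The function names are my own, and it assumes the caller passes the offset of the first sub-chunk (offset 12 in a WAV file, just past "RIFF", the size, and "WAVE").

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reads a little-endian 32-bit value from four bytes.
std::uint32_t readU32LE(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Walks the chunks starting at `start` and returns the offset of the data
// belonging to the first chunk whose ID matches wantedId, or -1 if none does.
long findChunk(const std::vector<unsigned char>& bytes,
               std::size_t start, const char* wantedId) {
    assert(wantedId != nullptr);
    std::size_t pos = start;
    while (pos + 8 <= bytes.size()) {
        const std::uint32_t size = readU32LE(&bytes[pos + 4]);
        if (std::memcmp(&bytes[pos], wantedId, 4) == 0) {
            return static_cast<long>(pos + 8);
        }
        // Not the chunk we want: hop over its ID, size, and data without
        // needing to understand it. RIFF pads odd-sized chunks by one byte.
        pos += 8 + size + (size & 1);
    }
    return -1;
}
```

A reader written this way keeps working when a file contains chunk types it has never heard of, which is exactly the backwards compatibility described above.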

So, the simplest possible arrangement for a WAV file is as follows:

The first chunk ID is the ASCII string "RIFF", signifying this is a RIFF formatted file. The chunk size that follows represents the size of the remainder of the file, which is the total file size less 8 bytes to account for "RIFF" and the chunk size field itself. The format string "WAVE" signifies this will be following WAV file conventions.

The next chunk is the "fmt " chunk. Be certain to note that an ASCII space character follows the first 3 characters of the ID string. The chunk size is generally 16 bytes, unless a different format category is used, but I don't intend on touching upon any other format category than PCM (pulse code modulation). Next, of course, is the format category. PCM is represented by 0x1 in WAVE files. Following format category is the number of channels represented in the file; this may be either mono or stereo. Sample rate is the number of samples per second to be processed, and byte rate is the corresponding number of bytes per second (sample rate times channels times bytes per sample). A program reading the WAV file can use the byte rate to estimate the size of the buffer it needs for audio. Block alignment sounds like black magic, but it is simply the number of bytes in one sample across all channels (channels times bytes per sample), which helps a reader align its buffers. Bits per sample is simply the size of each sample in the data.

Finally, the last chunk is that of the actual audio data. If PCM is used, this is usually an 8 or 16 bit discrete representation of an audio sample, much like those I have described in detail in my previous posts. The chunk ID is simply the four ASCII letters "data" and the chunk size is the length, in bytes, of the data. If audio is in stereo, then the samples alternate left-right.
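The derived header fields are just arithmetic on the others. Here is a small sketch, with struct and function names of my own invention, assuming PCM data and the minimal 44-byte header described above.

```cpp
#include <cassert>
#include <cstdint>

// The three values the user actually chooses (hypothetical struct).
struct WavFormat {
    std::uint16_t channels;       // 1 = mono, 2 = stereo
    std::uint32_t sampleRate;     // samples per second, per channel
    std::uint16_t bitsPerSample;  // 8 or 16 for the PCM data discussed here
};

// Bytes in one sample across all channels.
std::uint16_t blockAlign(const WavFormat& f) {
    assert(f.bitsPerSample % 8 == 0);
    return static_cast<std::uint16_t>(f.channels * (f.bitsPerSample / 8));
}

// Bytes of audio data consumed per second of playback.
std::uint32_t byteRate(const WavFormat& f) {
    return f.sampleRate * blockAlign(f);
}

// Value of the RIFF chunk size field: the 44-byte header minus the 8 bytes
// for "RIFF" and the size field itself, plus the audio data.
std::uint32_t riffChunkSize(std::uint32_t dataBytes) {
    return 36 + dataBytes;
}
```

For 16-bit stereo at 44,100 Hz this gives a block alignment of 4 bytes and a byte rate of 176,400 bytes per second.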

Well, I think this is enough for one post! In my post tomorrow, I'll provide code samples for writing WAV files, as well as some of the resulting files for your listening... erm... pleasure. Until next time!

-End transmission-