Time domain frequency tracking for voice and tuba

A voice-like waveform, lovingly rendered in drawing software
There are certain timbres which are most easily pitch tracked in the time domain, rather than the frequency domain. The tuba is one such timbre. It, like the human voice, tends to have one large impulse followed by several smaller impulses. The rate at which the large impulses repeat is the fundamental frequency. The little hills that follow are overtones and formants and whatnot. For the tuba, every large impulse is caused by a single buzz of the lips. Lips flap open, impulse happens, impulse echoes, lips flap open again, another impulse, more echoes inside the horn. Your vocal cords work the same way. They vibrate like buzzing lips and echo in your head. So the fundamental frequency of your voice is the frequency of the large impulses, not the smaller ones that follow.

So to know the pitch, all you have to do is know how often those peaks come. How do you recognize the peaks? Well, they certainly spike up above the average amplitude, while also raising that average. A good amplitude-following algorithm is the Root Mean Square (RMS). Since the tuba and the voice have low fundamental frequencies (around 220 Hz for a typical female voice, while the tuba gets down around 40 Hz and possibly below), it’s important to have a long enough window length for the RMS. You want enough samples in the window that the smaller hills stay below the RMS and don’t register as false positives.
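To put rough numbers on "long enough", here is a small offline sketch in Python (not SuperCollider, just a way to check the arithmetic). It assumes a 44.1 kHz sample rate and a rule of thumb that the window should span at least one full period of the lowest pitch you expect. Both of those are assumptions for illustration, and the function names are made up.

```python
import math


def rms_window_samples(lowest_f0_hz, sample_rate=44100, periods=1.0):
    """Samples needed to span `periods` cycles of the lowest expected pitch.

    The one-period default is an assumed rule of thumb, not a hard requirement.
    """
    return math.ceil(periods * sample_rate / lowest_f0_hz)


def running_rms(signal, window):
    """RMS over the most recent `window` samples, one value per input sample.

    Always divides by `window`, so the first few values read low until
    the window has filled up.
    """
    out = []
    total = 0.0
    for i, x in enumerate(signal):
        total += x * x
        if i >= window:
            # drop the sample that just slid out of the window
            total -= signal[i - window] ** 2
        out.append(math.sqrt(max(total, 0.0) / window))
    return out


print(rms_window_samples(220))  # voice around 220 Hz
print(rms_window_samples(40))   # tuba around 40 Hz
```

By this estimate, a window of about 200 samples covers roughly one period at 220 Hz, while a tuba down around 40 Hz would want something over 1100 samples.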
Then, you can subtract the RMS from the original signal. All but the high peaks will drop below 0. Then count the zero crossings. You’ll have to divide by two, since each peak crosses zero twice: once on the way up and once on the way down.
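Those steps can be sketched offline in Python on a toy signal. This is only an illustration: the waveform below (one tall sample followed by small alternating ripples, at an assumed 1000 Hz sample rate) is made up, the function name is hypothetical, and the sketch averages over the whole buffer rather than tracking pitch continuously.

```python
import math


def estimate_pitch(signal, sample_rate, window):
    """Average pitch in Hz: subtract a running RMS, count zero crossings, halve."""
    total = 0.0
    crossings = 0
    prev_above = None
    for i, x in enumerate(signal):
        # running sum of squares over the last `window` samples
        total += x * x
        if i >= window:
            total -= signal[i - window] ** 2
        rms = math.sqrt(max(total, 0.0) / window)
        # only the big peaks should stay above zero after subtracting the RMS
        above = (x - rms) > 0.0
        if prev_above is not None and above != prev_above:
            crossings += 1
        prev_above = above
    duration = len(signal) / sample_rate
    # each peak crosses zero twice: once going up, once coming down
    return (crossings / 2) / duration


# Toy "voice": a peak of 1.0 every 10 samples, ripples of +/-0.3 in between.
sample_rate = 1000
one_period = [1.0] + [0.3 * (-1) ** k for k in range(1, 10)]
signal = one_period * 100  # one second at a true pitch of 100 Hz

print(estimate_pitch(signal, sample_rate, window=10))  # 99.5
```

The estimate lands just under the true 100 Hz because the very first peak has no upward crossing before it.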
If you suspect that all of your pulse energy is negative for some reason, you have two options: You can try multiplying your original signal by -1 before subtracting. Or you can take the absolute value of the signal and use that.
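Both options are one-liners. Here is a hedged Python sketch of them, plus a crude check for upside-down pulses. The check (is the deepest trough bigger than the tallest peak?) is an assumption for illustration only, and both function names are hypothetical.

```python
def pulses_look_negative(signal):
    """Hypothetical heuristic: the deepest trough outweighs the tallest peak."""
    return -min(signal) > max(signal)


def orient_pulses(signal, mode="invert"):
    """Return a copy of the signal with its big pulses pointing upward."""
    if mode == "invert":
        # option 1: multiply by -1
        return [-x for x in signal]
    if mode == "rectify":
        # option 2: take the absolute value
        return [abs(x) for x in signal]
    raise ValueError("mode must be 'invert' or 'rectify'")


upside_down = [-1.0, 0.3, -0.3, 0.3, -0.3]
if pulses_look_negative(upside_down):
    print(orient_pulses(upside_down, "invert"))
    print(orient_pulses(upside_down, "rectify"))
```

Either way the RMS itself does not change, since squaring a sample throws away its sign.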
Here’s some sample code for SuperCollider. Use headphones to prevent feedback and then sing into your computer. If your voice is low, you may need to increase the window size, rmswindow. Note that it’s given in samples.

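  SynthDef("test-time-domain-freq-tracker", { arg in = 1, out = 0, rmswindow = 200;

   var rms, xings, inner, peaks, sin;

   // AudioIn numbers hardware inputs starting from 1
   inner = AudioIn.ar(in, 1);

   // running RMS over the last rmswindow samples
   // (RunningSum reads the window length once, when the synth starts)
   rms = (RunningSum.ar(inner.squared, rmswindow)/rmswindow).sqrt;

   // only the large impulses should stay above zero
   peaks = inner - rms;

   // ZeroCrossing outputs a frequency estimate in Hz
   xings = ZeroCrossing.ar(peaks);

   // halve it (each peak is taken to cross zero twice) and play it as a sine
   sin = SinOsc.ar(xings/2);

   Out.ar(out, sin);

  }).send(s);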

Published by

Charles Céleste Hutchins

Supercolliding since 2003
