Problems with sound capture:
How do we process sound to classify and extract information?
Basic features of sound
When the signal is sinusoidal, it's simple to calculate the frequency from the period T: f = 1/T.
But if it’s not sinusoidal, what do you do? Analyse frequency spectrum. Enter Fourier.
Fourier: almost every signal can be broken down into multiple sinusoidal waves with different frequencies and amplitudes.
Instead of describing the signal's amplitude as a function of time, represent it as a function of frequency.
The result is a Fourier series: a sum of simple sinusoidal waves with frequencies kf₀, amplitudes Ak and phase shifts φk:
$x(t) = A_{0} + \sum_{k=1}^N A_{k} \sin (2 \pi k f_{0} t + \phi_{k})$
A periodic signal has a frequency spectrum made up of harmonics:
the component frequencies are integer multiples of the fundamental frequency f₀.
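To make the series concrete, here is a minimal sketch that evaluates x(t) from the formula above. The function name `fourier_series` and the square-wave coefficients 4/(πk) for odd k are assumptions chosen for illustration, not part of these notes.

```python
import math

def fourier_series(t, a0, amplitudes, phases, f0):
    """Evaluate x(t) = A0 + sum_k A_k * sin(2*pi*k*f0*t + phi_k) for k = 1..N."""
    total = a0
    for k, (a_k, phi_k) in enumerate(zip(amplitudes, phases), start=1):
        total += a_k * math.sin(2 * math.pi * k * f0 * t + phi_k)
    return total

# Hypothetical example: approximate a square wave with the first 9 harmonics.
# Only odd harmonics are non-zero, with amplitude 4 / (pi * k) (assumed coefficients).
f0 = 100.0  # fundamental frequency in Hz
amps = [4 / (math.pi * k) if k % 2 == 1 else 0.0 for k in range(1, 10)]
phases = [0.0] * len(amps)

# A quarter of a period in, the square wave is at its plateau (close to 1).
value = fourier_series(1 / (4 * f0), 0.0, amps, phases, f0)
```

Adding more harmonics brings the sum closer to the square wave, which is the sense in which a non-sinusoidal signal breaks down into sinusoids.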
You can calculate the amplitudes Ak with an algorithm called FFT (Fast Fourier Transform), which returns them as a vector.
You put in the vector of samples and the number of samples N, and you get out a vector of N complex values; for a real signal only the first N/2+1 are useful, because the rest mirror them.
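As a rough sketch of what goes in and what comes out, the block below implements a textbook recursive radix-2 FFT in plain Python and turns its output into amplitudes. It assumes N is a power of two and a real-valued signal, keeping the N/2+1 non-mirrored amplitudes; the names `fft` and `amplitudes` and the test signal are made up for illustration. In practice you would call a library routine instead.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])   # FFT of even-indexed samples
    odd = fft(samples[1::2])    # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def amplitudes(samples):
    """One-sided amplitude spectrum: N/2 + 1 values for N real samples."""
    n = len(samples)
    spectrum = fft(samples)
    amps = [abs(c) / n for c in spectrum[: n // 2 + 1]]
    # Bins strictly between 0 and N/2 also appear mirrored in the upper half,
    # so double them to recover the amplitude of the sinusoid.
    for k in range(1, n // 2):
        amps[k] *= 2
    return amps

# Hypothetical test signal: offset 0.5 plus one sine of amplitude 2,
# sampled over exactly one period with N = 8 samples.
N = 8
signal = [0.5 + 2.0 * math.sin(2 * math.pi * i / N) for i in range(N)]
amps = amplitudes(signal)
```

Here `amps[0]` recovers the offset A₀ and `amps[1]` recovers the amplitude of the first harmonic; the remaining entries are zero up to rounding error.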
Formulas:
| Frequency step | Frequency of amplitude k | Nyquist frequency | Last useful amplitude |
|---|---|---|---|
| $\Delta f = \frac{F_s}{N}$ | $f_{k} = k \Delta f = \frac{kF_{s}}{N}$ | $f_c = \frac{F_s}{2}$ | $f_{N/2} = \frac{N}{2} \Delta f$ |
Nyquist frequency (fc): the maximum frequency that can be detected using FFT; it equals half the sampling rate Fs.
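A small worked example of these formulas, assuming a sampling rate of 8000 Hz and N = 16 samples (both values are picked only for illustration, as is the function name `bin_frequencies`):

```python
def bin_frequencies(sample_rate, n_samples):
    """Return the frequency f_k = k * Fs / N for each useful bin k = 0..N/2."""
    delta_f = sample_rate / n_samples  # frequency step
    return [k * delta_f for k in range(n_samples // 2 + 1)]

Fs = 8000   # assumed sampling rate in Hz
N = 16      # assumed number of samples

freqs = bin_frequencies(Fs, N)
delta_f = freqs[1] - freqs[0]   # 500.0 Hz between neighbouring amplitudes
nyquist = Fs / 2                # 4000.0 Hz
last_useful = freqs[-1]         # f_{N/2}, equal to the Nyquist frequency
```

With these numbers the amplitudes sit 500 Hz apart, and the last useful one lands exactly on the Nyquist frequency of 4000 Hz.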
Some sound signals are periodic only for a very short time.
Cut the speech into segments (frames), then apply FFT to each piece. This is called segmentation or windowing.
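A minimal sketch of the segmentation step, assuming fixed-length frames that may overlap. The parameter names `frame_len` and `hop` are hypothetical, and the Hann taper is an extra that these notes do not mention; it is commonly applied to each frame before the FFT to soften the cut edges.

```python
import math

def split_into_frames(samples, frame_len, hop):
    """Cut samples into frames of frame_len, starting a new frame every hop samples."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

def hann_window(frame):
    """Taper a frame (length > 1) towards zero at both ends (assumed extra step)."""
    n = len(frame)
    return [s * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
            for i, s in enumerate(frame)]

# Hypothetical input: 10 samples, frames of 4 samples, 50% overlap.
samples = list(range(10))
frames = split_into_frames(samples, frame_len=4, hop=2)
tapered = [hann_window(f) for f in frames]
```

Samples left over at the end that do not fill a whole frame are dropped in this sketch; padding the last frame with zeros is another option.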
The frequency spectrum varies in time.
This can be shown as a graph (a spectrogram) with time on the x-axis, frequency on the y-axis and colour representing the amplitude of each frequency.
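The data behind such a graph can be sketched as a table with one row per frame and one column per frequency. To keep the block short, it uses a naive DFT in place of an FFT (same values, just slower); the function names and the test signal are assumptions for illustration, and the plotting itself is left out.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (an FFT gives the same values faster)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        mags.append(abs(acc) / n)
    return mags

def spectrogram(samples, frame_len, hop):
    """Rows are frames (time), columns are frequency bins, values are magnitudes."""
    return [dft_magnitudes(samples[s:s + frame_len])
            for s in range(0, len(samples) - frame_len + 1, hop)]

# Hypothetical signal whose frequency changes halfway through:
# the first frame holds one sine cycle, the second frame holds two.
frame_len = 8
first = [math.sin(2 * math.pi * 1 * i / frame_len) for i in range(frame_len)]
second = [math.sin(2 * math.pi * 2 * i / frame_len) for i in range(frame_len)]
spec = spectrogram(first + second, frame_len, hop=frame_len)

# Strongest frequency bin in each frame: moves from bin 1 to bin 2 over time.
peaks = [row.index(max(row)) for row in spec]
```

Colouring each cell of `spec` by its value, with rows laid out along the time axis, gives the graph described above.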
Time domain: moving average filter
Frequency domain: