Lab 1 – October 2023
Introduction
This lab is common to EBU6018 and EBU5303.
Part 1 (Questions 1 to 4) of the lab is a revision of the Discrete Fourier Transform (DFT), the Fast Fourier Transform (FFT), and an introduction to the Short-Time Fourier Transform (STFT) and to spectrograms. Marks will go towards the assessment of EBU6018.
Part 2 (Questions 5 to 7) is a guided exploration of the characteristics of speech sounds. Marks will go towards the assessment of EBU5303.
The MATLAB programming outcomes from the lab (i.e., MATLAB .m function files and plot figures) are to be handed in as a “folder” of results, showing that you have completed the steps of the lab successfully.
You must also answer some questions directly within this document, which must be saved and submitted with other outcomes in your folder of results.
You must submit your combined report to BOTH the EBU6018 AND EBU5303 QM+ course areas.
Getting Started
In your home directory, create the subdirectories “MULTIMEDIA_Y3_Sem1” and “MULTIMEDIA_Y3_Sem1/lab1”.
Download all the resources needed for the lab (i.e. audio files) into “lab1”.
Start Matlab. Use “cd <directory>” to change to the directory “lab1” you have just created.
In Matlab, type “edit” to start the Matlab editor.
Part 1 (EBU6018)
1. Discrete Fourier Transform
a. Create a Matlab function in the file “dft.m” to calculate the Discrete Fourier Transform of a signal. Recall that the DFT is given by [e.g. Qian, eqn (2.34) and course notes]
$S(m) = \sum_{n=0}^{N-1} s(n)\, e^{-j 2\pi m n / M}, \quad m = 0, 1, \ldots, M-1$
Hints:
• Start your function with: function sw = dft(st)
where “st” is the time waveform vector, and “sw” is the frequency waveform vector
• Matlab vectors (e.g. st and sw) start from 1, not zero, so use “n-1” and “m-1” to refer to the appropriate element
• Assume that N=M, and use “length(st)” to find their value.
An example outline for your DFT Matlab function is provided below.
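One possible outline, written as a minimal sketch that follows the hints above: a direct double loop over m and n, with N = M = length(st). It is not the official solution, and it always returns a row vector whatever the orientation of st.

```matlab
function sw = dft(st)
% DFT  Discrete Fourier Transform by direct summation (O(N^2) operations).
%   sw = dft(st) returns the DFT of the vector st, with N = M = length(st).
N = length(st);              % number of time samples
M = N;                       % number of frequency samples (N = M assumed)
sw = zeros(1, M);            % preallocate the output as a row vector
for m = 1:M                  % MATLAB index m corresponds to frequency m-1
    for n = 1:N              % MATLAB index n corresponds to time n-1
        sw(m) = sw(m) + st(n) * exp(-1j * 2 * pi * (m-1) * (n-1) / M);
    end
end
end
```

Usage: save the function as dft.m in “lab1” and call, for example, sw = dft(ones(1,64));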
b. Generate some waveforms to test your function. Test your dft on the following four signals:
• Uniform function: “s=ones(1,64);”
• Delta function: “s = ((1:64)==1);”
[NB: “1:64” generates the vector (1 2 … 64) ].
• Sine wave: “s = sin(((1:64)-1)*2*pi*w/100)” for various values of w (at least two different values, one of which should be w=12.5).
Why do we need to use “(1:64)-1”?
What values of w give the cleanest dft?
What happens if we use “cos”?
• Symmetrical rectangular pulse: “s = [0:31 32:-1:1]<T” for various values of T.
(NB: Why doesn’t this “look” symmetrical? Remember that the DFT repeats, so the time interval 32 .. 63 is “the same as” the interval -32 .. -1.)
The following function may be useful to display your results:
If you want zero frequency (or time) to appear in the middle of your plot, use “fftshift”, e.g. “stem4(fftshift(dft(s)));”
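If stem4 is not available to you, a hypothetical stand-in such as plot_dft_parts below can display the same kind of information. It assumes the display should show the real part, imaginary part, magnitude and phase, the four quantities the marking scheme asks for; the name and layout are illustrative only.

```matlab
function plot_dft_parts(sw)
% PLOT_DFT_PARTS  Hypothetical display helper (a stand-in, not the sheet's stem4):
% stem plots of the real part, imaginary part, magnitude and phase of sw.
k = 0:length(sw)-1;                          % index counted from 0
subplot(4, 1, 1); stem(k, real(sw));  ylabel('Real');
subplot(4, 1, 2); stem(k, imag(sw));  ylabel('Imaginary');
subplot(4, 1, 3); stem(k, abs(sw));   ylabel('Magnitude');
subplot(4, 1, 4); stem(k, angle(sw)); ylabel('Phase (rad)');
xlabel('Sample / bin index');
end
```

Usage: plot_dft_parts(fftshift(dft(s))); after fftshift the index axis is not relabelled, so zero frequency sits at the centre index. The phase of bins whose magnitude is only rounding error is essentially random, so interpret the phase panel together with the magnitude panel.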
2. Comparison with Matlab’s FFT function
Matlab has a built-in Fast Fourier Transform, “fft”.
a. Compare the results of your dft against the built-in fft. Are the results the same? If so, why? If not, why not?
b. Find out the complexity of your dft and of the built-in fft, i.e. how long they take to perform their calculation for various lengths of s. Use “tic” and “toc” to measure the time taken to perform the operation, for example
tic; dft(ones(1,4)); toc % No “;” for final expression
will report how long a 4-point DFT took to calculate.
Hint: You may find your dft is too fast for tic/toc to measure any useful difference. If so, run it several times, e.g.
tic; for (i=1:1e4) dft(ones(1,4)); end; toc
(Of course, remember to divide your measure by the number of times round the loop!)
c. Make a log-log plot (using “loglog”) showing how the time increases with the size n of s.
On your plot, show that the DFT takes O(n^2) time, while the FFT takes O(n log n).
Hint: Use “hold on” if you want to add a second “loglog” plot to an existing plot.
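A sketch of the whole timing experiment, assuming dft.m from Question 1 is on the Matlab path. The range of sizes, the repeat count and the choice to anchor the two reference curves at the largest n (where the asymptotic behaviour dominates) are arbitrary choices; increase reps if the small sizes give noisy timings.

```matlab
% Time the hand-written dft against the built-in fft for increasing lengths.
sizes = 2.^(2:10);               % n = 4, 8, ..., 1024
reps  = 20;                      % repeat each call so tic/toc can resolve it
tDft = zeros(size(sizes));
tFft = zeros(size(sizes));
for k = 1:numel(sizes)
    s = rand(1, sizes(k));
    tic; for r = 1:reps, dft(s); end; tDft(k) = toc / reps;   % time per call
    tic; for r = 1:reps, fft(s); end; tFft(k) = toc / reps;
end
loglog(sizes, tDft, 'o-'); hold on
loglog(sizes, tFft, 's-');
% Reference curves scaled to pass through the measurement at the largest n
loglog(sizes, tDft(end) * (sizes / sizes(end)).^2, 'k--');
loglog(sizes, tFft(end) * (sizes .* log2(sizes)) / (sizes(end) * log2(sizes(end))), 'k:');
hold off
xlabel('n'); ylabel('time per call (s)');
legend('dft', 'fft', 'O(n^2) reference', 'O(n log n) reference', 'Location', 'northwest');
```

If the measured dft curve runs parallel to the dashed line, its slope on the log-log axes is 2, which is the O(n^2) behaviour the question asks you to show.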
3. Single Windowed Fourier Transform
Read an audio file into Matlab using “s = audioread('file.wav')”, where ‘file.wav’ is ‘music.wav’ (this file is provided in Part 2 of this lab sheet).
a. Plot the magnitude (“abs”) of the FFT of the waveform. (“plot” is probably better than “stem” for these longer signals.)
We will now construct a function that will allow you to “zoom in” on a short section of the signal. To smooth out end effects, we will use a “Hanning” window to multiply the segment that we select. You can show the Hanning window of length 256 in Matlab using “plot(hanning(256))”.
b. Construct a Matlab function in the file “wft.m” that will select a section from the signal, window it and return its Fourier transform. The function “wft” is to be called as follows:
y = wft(s, t, n);
where s is the signal, t is the time (sample index) at the centre of the window, and n is the window length.
You might use the following steps:
1) Select the desired section from the signal, for example using
s(floor(t-n/2)+(1:n));
(if you don’t see how this works, try “help colon”).
2) Multiply elementwise with a Hanning window of length n, using “.*”
3) Use the built-in Matlab fft function to calculate the DFT.
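The three steps above can be sketched as follows. Two details are choices made for this sketch rather than requirements of the sheet: samples that fall outside s (near its beginning or end) are treated as zeros, and the signal is forced into a column first. The second matters because hanning(n) returns a column, and in recent Matlab versions multiplying a row by a column with “.*” silently expands to an n-by-n matrix instead of raising an error.

```matlab
function y = wft(s, t, n)
% WFT  Single windowed Fourier transform (sketch).
%   y = wft(s, t, n) takes n samples of s centred on sample t, multiplies
%   them by a Hanning window of length n and returns their n-point FFT.
if isvector(s)
    s = s(:);                         % force a column vector
else
    s = s(:, 1);                      % multichannel (e.g. stereo): first channel
end
idx = floor(t - n/2) + (1:n)';        % step 1: n indices centred on t
seg = zeros(n, 1);
valid = idx >= 1 & idx <= numel(s);   % indices that fall inside the signal
seg(valid) = s(idx(valid));           % out-of-range samples stay zero
seg = seg .* hanning(n);              % step 2: elementwise Hanning window
y = fft(seg);                         % step 3: n-point DFT via the FFT
end
```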
Use window lengths of 54, 256 and 1024.
Use times of 10000, 60000 and 110000 for the window position.
c. Plot the magnitude of this single windowed Fourier transform of your signal for various values of t and n (note that values of t near the beginning and end of s may cause an error, depending on how clever you were at step (1)). Also try plotting with a logarithmic y-axis (e.g. using “semilogy”).
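A possible plotting script for the window lengths and positions listed above, assuming wft(s, t, n) behaves as described in step b and returns the full n-point FFT. Only the first half of each spectrum (the non-negative frequencies) is plotted, since the magnitude of the FFT of a real signal is symmetric; one figure is produced per window length, with linear and logarithmic amplitude side by side.

```matlab
% Plot |FFT| of windowed sections of music.wav for several lengths and positions.
s = audioread('music.wav');
lengths = [54 256 1024];               % window lengths n from the sheet
times   = [10000 60000 110000];        % window centres t (in samples)
for n = lengths
    half = 1:floor(n/2);               % non-negative frequency bins only
    figure;
    for k = 1:numel(times)
        y = wft(s, times(k), n);
        subplot(numel(times), 2, 2*k - 1);
        plot(half - 1, abs(y(half)));              % linear amplitude
        title(sprintf('n = %d, t = %d, linear', n, times(k)));
        subplot(numel(times), 2, 2*k);
        semilogy(half - 1, abs(y(half)));          % logarithmic amplitude
        title(sprintf('n = %d, t = %d, log', n, times(k)));
    end
end
```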
4. STFT and Spectrogram
Now we will construct a “spectrogram” to visualise the time-frequency information in a signal on one image.
a. Read the Matlab documentation for the “spectrogram” function.
Plot and investigate the following audio file using the “spectrogram” function: music.wav.
Hint: if the sound s is stereo, you may want to take a single channel as a row vector:
y = s'; y = y(1,:);
b.Try different window sizes (nfft) to see the effect. For fastest results on long files, use powers of 2 (Why?). Record what values of window size give best visualisation results for different files and suggest why.
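A sketch for comparing window sizes, assuming the Signal Processing Toolbox functions spectrogram and hamming are available. The three nfft values and the 50% overlap are example choices; here the window length equals nfft, so larger values trade time resolution for frequency resolution.

```matlab
% Spectrograms of music.wav for several power-of-2 window sizes.
[s, fs] = audioread('music.wav');
y = s(:, 1)';                          % first channel as a row vector
for nfft = [256 1024 4096]
    figure;
    % Hamming window of length nfft, 50% overlap, nfft-point FFT
    spectrogram(y, hamming(nfft), nfft/2, nfft, fs, 'yaxis');
    title(sprintf('music.wav, nfft = %d', nfft));
end
```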
4.1 Analysis of music sound
a. Read the ‘music.wav’ audio file using “audioread”:
[x, fs] = audioread('music.wav'); % fs = sampling frequency
Record the sampling frequency, fs.
If you have headphones, try listening to the signal, using:
soundsc(x,fs); %fs is the sampling frequency of x
b. Plot a spectrogram of x, using the ‘spectrogram’ function.
c. From the spectrogram plot, estimate the fundamental frequencies (f0) of the notes in the sample, giving your answers in Hz.
Repeat your estimates for different window sizes (nfft).
So that the frequency axis is in Hz, supply spectrogram with the correct value of fs when you call it.
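As a cross-check on estimates read from the plot, the spectrogram values can also be returned as outputs. This sketch, with nfft = 4096 as an arbitrary example, prints the frequency spacing between bins (fs/nfft, which limits how precisely f0 can be read) and plots the strongest component in each frame. The strongest component is not necessarily the fundamental, since a harmonic can be louder, so treat it only as a guide.

```matlab
% Strongest frequency component per frame, with the frequency axis in Hz.
[x, fs] = audioread('music.wav');
x = x(:, 1);                                   % first channel only
nfft = 4096;                                   % example window size
[S, F, T] = spectrogram(x, hamming(nfft), nfft/2, nfft, fs);  % F in Hz
[~, k] = max(abs(S), [], 1);                   % strongest bin in each frame
plot(T, F(k), '.');
xlabel('Time (s)'); ylabel('Strongest component (Hz)');
fprintf('Bin spacing: %.2f Hz\n', fs / nfft);
```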
Part 2 (EBU5303)
5. Comparing noise, music and speech
a. Create a Matlab function in the file “mySpectrogram.m”. Revise your use of the spectrogram function (if needed) so that the x axis shows time and the y axis shows frequency.
Hint: use this version of the function:
spectrogram(y,w,noverlap,nfft,fs,'yaxis');
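where:
y is the time signal (mono);
w = 0.54 - 0.46 * cos(2*pi*[0:nfft-1]/(nfft-1)), a Hamming window of length nfft;
noverlap < nfft is the number of samples by which consecutive windows overlap;
nfft is the number of samples in a 20 ms (millisecond) window (e.g. 160 samples at 8 kHz);
fs is the signal sampling rate.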
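A minimal sketch of mySpectrogram.m built from the definitions above. The signature mySpectrogram(y, fs) and the 50% overlap are assumptions made for this sketch (any noverlap < nfft satisfies the constraint), and the Signal Processing Toolbox is assumed to be installed.

```matlab
function mySpectrogram(y, fs)
% MYSPECTROGRAM  Spectrogram with time on the x axis and frequency on the
% y axis, using a 20 ms Hamming window (hypothetical signature).
if isvector(y)
    y = y(:);                                  % mono: force a column
else
    y = y(:, 1);                               % multichannel: first channel
end
nfft = round(0.020 * fs);                      % samples in a 20 ms window
w = 0.54 - 0.46 * cos(2*pi*(0:nfft-1)/(nfft-1));   % Hamming window, as above
noverlap = floor(nfft / 2);                    % 50% overlap (must be < nfft)
spectrogram(y, w, noverlap, nfft, fs, 'yaxis');
end
```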
b. Plot the following audio files using your spectrogram function: firework-launch.mp3, music.wav, sp10.wav, sp10_white.wav. Save each plot as a separate jpg image in your folder of results.
Hints for easier comparison:
• When needed, resample the signal to 8 kHz using the resample Matlab function, so that all the spectrograms use the same scale on the Y frequency axis, from 0 to 4 kHz.
• Display all the spectrograms in the same size window, e.g.:
figure('Position', [200 250 800 250]);
spectrogram(…);
title(filename); % name of the audio file
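The hints above can be combined into one loop. This sketch assumes your mySpectrogram.m takes the signal and its sampling rate as mySpectrogram(y, fs) (a hypothetical signature; adapt the call to your own function). resample(y, p, q) changes the rate by the factor p/q, so resample(y, 8000, fs) gives an 8 kHz signal. The saved file names follow the audio file names, so rename firework-launch.jpg if you want to match the hand-in list exactly.

```matlab
% Plot and save the Q5.b spectrograms at a common 8 kHz sampling rate.
files = {'firework-launch.mp3', 'music.wav', 'sp10.wav', 'sp10_white.wav'};
fsTarget = 8000;                                  % 0 to 4 kHz frequency axis
for k = 1:numel(files)
    [y, fs] = audioread(files{k});
    y = y(:, 1);                                  % keep the first channel
    if fs ~= fsTarget
        y = resample(y, fsTarget, fs);            % new rate = fs * fsTarget / fs
    end
    figure('Position', [200 250 800 250]);        % same size for every plot
    mySpectrogram(y, fsTarget);
    title(files{k});                              % name of the audio file
    [~, name] = fileparts(files{k});
    saveas(gcf, [name '.jpg']);                   % e.g. music.jpg
end
```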
c. For each ~20 ms time window of signal (which we will roughly approximate to an individual “sound”), describe and compare what you see in the four images (e.g., what do the short horizontal lines indicate? Do they look the same in all the images?)
6. The different sounds in speech
a. Plot the following audio files using the spectrogram function you developed in Question 5: (1) can-you-keep-a-secret.wav, (2) come-on-you-can-do-it.wav, (3) maybe-next-time-huh.wav. Save each plot as a separate jpg image in your folder of results.
Hint: To estimate the fundamental frequencies more accurately, you may want to stretch the frequency axis by making the figure taller, e.g.:
figure('Position', [200 250 800 500]);
b. In the spectrograms you created in Q6.a, try to identify different types of speech sounds and annotate the images accordingly (annotate at least 10 sounds).
7. Comparing different intonations and different voices
a. Plot the following audio files: (1) ohyeah1.m4a, (2) ohyeah2.m4a, (3) ohyeah3.m4a using the spectrogram function you developed in Question 5. Save each plot as a separate jpg image in your folder of results.
b. Make more recordings of the word “ohyeah” using your own voice and the voices of a few friends (make at least four more recordings, including both male and female voices and varying the intonation).
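One way to make the recordings directly in Matlab, as a sketch using the built-in audiorecorder object. The 16 kHz sampling rate, the 2-second duration and the output file name are example choices; any recording tool that produces a file audioread can open would do equally well.

```matlab
% Record a short utterance from the default microphone and save it to disk.
fs = 16000;                              % sampling rate (Hz), example choice
recorder = audiorecorder(fs, 16, 1);     % 16-bit, mono
disp('Recording for 2 seconds...');
recordblocking(recorder, 2);             % blocks until recording finishes
x = getaudiodata(recorder);              % column vector of doubles in [-1, 1]
audiowrite('ohyeah4.wav', x, fs);        % example file name
```

The same code, with a different file name, can be reused for the four Mandarin-tone recordings in Q7.e.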
c. Plot the audio files you obtained using the spectrogram function you developed in Question 5. Save each plot as a separate jpg image in your folder of results.
d. Compare the various spectrograms you plotted in Q7.a and Q7.c (there should be at least seven of them). Describe the differences. In your opinion, what are the implications for a speech recognition system?
e. Using your own voice, make four different recordings of the same short word, pronounced each time using a different tone (use the four tones of the Mandarin language).
f. Plot the four audio files you obtained using the spectrogram function you developed in Question 5. Save each plot as a separate jpg image in your folder of results.
g. Compare the four spectrograms of Q7.f. Describe the similarities and the differences. What are the implications for a speech recognition system for Mandarin, compared with one for English?
Handing In
Compile the answers to the exercises, including the answers to specific questions, program listings (including comments), and plots from experiments, into a “folder” of results showing that you have completed the lab.
Name the folder: EBU6018_5303_Lab1
Rename this document ‘Lab1_xxxxxxxx’ where xxxxxxxx is your QM student number and save it in your result folder together with your MATLAB files and plot images.
For Part 2, your folder should contain (1) this document; (2) your MATLAB code (mySpectrogram.m); and (3) at least 18 JPEG images (firework_launch.jpg, music.jpg, sp10.jpg, sp10_white.jpg, can-you-keep-a-secret.jpg, come-on-you-can-do-it.jpg, maybe-next-time.jpg, ohyeah1.jpg, ohyeah2.jpg, ohyeah3.jpg, ohyeah4.jpg, ohyeah5.jpg, ohyeah6.jpg, ohyeah7.jpg, word1.jpg, word2.jpg, word3.jpg, word4.jpg).
Submit the folder as a zip archive on QMplus in BOTH the EBU6018 and EBU5303 course areas before the deadline (i.e., submit the same zip archive twice).
IMPORTANT: Plagiarism (copying from other students or copying the work of others without proper referencing) is cheating and will not be tolerated.
IF TWO “FOLDERS” ARE FOUND TO CONTAIN IDENTICAL MATERIAL, BOTH WILL BE GIVEN A MARK OF ZERO.
Marking scheme
Part 1 EBU6018 (max 82 marks)
Q1. Up to 25 marks:
DFT real part, imaginary part, magnitude and phase for each signal (uniform, delta, sine, cosine, pulses): 1 mark each, 4 x 5 = 20, plus 1 mark for a statement about each signal. Total 25.
Q2. FFT marking as for DFT. Total 25.
Q3. Up to 20 marks.
For magnitude plot of one waveform: 1 mark for time domain, 1 mark for frequency domain, 1 mark for statement. Total 3 marks.
For plots of a chosen signal (e.g. piccolo): 3 different window lengths, linear amplitude, 1 mark each, log amplitude 1 mark each. 3 different window positions: linear amplitude 1 mark each, log amplitude 1 mark each. Total 12 marks. Plus 1 mark each for 5 statements.
Q4: up to 12 marks
Q4.a 1 mark for each plot (includes Q4.1 b), 4 marks
Q4.b 1 mark for radix-2, up to 2 marks for the best nfft values with justifications
Q4.1.c up to 5 marks for the frequency estimations with explanations
Part 2 EBU5303 (max 50 marks):
Q5: up to 12 marks
Q5.a up to 5 marks for the mySpectrogram.m function
Q5.b 2 marks for the plots (0.5 for each plot)
Q5.c up to 5 marks for the comments
Q6: up to 13 marks
Q6.a 3 marks for the plots (1 for each plot)
Q6.b up to 10 marks (1 for each correctly identified sound type)
Q7: up to 25 marks
Q7.a 3 marks for the plots (1 for each plot)
Q7.b up to 4 marks (1 for each recording)
Q7.c up to 2 marks (0.5 for each plot)
Q7.d up to 5 marks for the comments
Q7.e up to 4 marks (1 for each recording)
Q7.f up to 2 marks (0.5 for each plot)
Q7.g up to 5 marks for the comments
Updated by MPD, MEPD
Modified by ARW for EBU6018.
Modified by MLB for EBU5303.