I need to adapt the "Speech Command Recognition Using Deep Learning" example so that it reads audio from a WAV file and returns the time intervals in which each command appears, but I don't know how to change the example's real-time analysis of microphone input into analysis of a static file. Thank you for your help.
%% Detect Commands Using Streaming Audio from Microphone
% Test your newly trained command detection network on streaming audio from
% your microphone. If you have not trained a network, then type
% |load('commandNet.mat')| at the command line to load a pretrained network
% and the parameters required to classify live, streaming audio. Try
% saying one of the commands, for example, _yes_, _no_, or _stop_.
% Then, try saying one of the unknown words such as _Marvin_, _Sheila_, _bed_,
% _house_, _cat_, _bird_, or any number from zero to nine.
%%
% Specify the audio sampling rate and classification rate in Hz and create
% an audio device reader that can read audio from your microphone.
fs = 16e3;
classificationRate = 20;
audioIn = audioDeviceReader('SampleRate',fs, ...
    'SamplesPerFrame',floor(fs/classificationRate));
%%
% Specify parameters for the streaming spectrogram computations and
% initialize a buffer for the audio. Extract the classification labels of
% the network. Initialize buffers of half a second for the labels and
% classification probabilities of the streaming audio. Use these buffers to
% compare the classification results over a longer period of time and by
% that build 'agreement' over when a command is detected.
frameLength = frameDuration*fs;
hopLength = hopDuration*fs;
waveBuffer = zeros([fs,1]);
labels = trainedNet.Layers(end).Classes;
YBuffer(1:classificationRate/2) = categorical("background");
probBuffer = zeros([numel(labels),classificationRate/2]);
framesNumber = audio/frameLength;
%%
% Create a figure and detect commands as long as the created figure exists.
% To stop the live detection, simply close the figure.
h = figure('Units','normalized','Position',[0.2 0.1 0.6 0.8]);
while ishandle(h)
    % Extract audio samples from the audio device and add the samples to
    % the buffer.
    x = audioIn();
    waveBuffer(1:end-numel(x)) = waveBuffer(numel(x)+1:end);
    waveBuffer(end-numel(x)+1:end) = x;
    % Compute the spectrogram of the latest audio samples.
    spec = auditorySpectrogram(waveBuffer,fs, ...
        'WindowLength',frameLength, ...
        'OverlapLength',frameLength-hopLength, ...
        'NumBands',numBands, ...
        'Range',[50,7000], ...
        'WindowType','Hann', ...
        'WarpType','Bark', ...
        'SumExponent',2);
    spec = log10(spec + epsil);
    % Classify the current spectrogram, save the label to the label buffer,
    % and save the predicted probabilities to the probability buffer.
    [YPredicted,probs] = classify(trainedNet,spec,'ExecutionEnvironment','cpu');
    YBuffer(1:end-1)= YBuffer(2:end);
    YBuffer(end) = YPredicted;
    probBuffer(:,1:end-1) = probBuffer(:,2:end);
    probBuffer(:,end) = probs';
    % Plot the current waveform and spectrogram.
    subplot(2,1,1);
    plot(waveBuffer)
    axis tight
    ylim([-0.2,0.2])
    subplot(2,1,2)
    pcolor(spec)
    caxis([specMin+2 specMax])
    shading flat
    % Now do the actual command detection by performing a very simple
    % thresholding operation. Declare a detection and display it in the
    % figure title if all of the following hold:
    % 1) The most common label is not |background|.
    % 2) At least |countThreshold| of the latest frame labels agree.
    % 3) The maximum predicted probability of the predicted label is at
    % least |probThreshold|. Otherwise, do not declare a detection.
    [YMode,count] = mode(YBuffer);
    countThreshold = ceil(classificationRate*0.2);
    maxProb = max(probBuffer(labels == YMode,:));
    probThreshold = 0.5;
    subplot(2,1,1);
    if YMode == "background" || count < countThreshold || maxProb < probThreshold
        title(" ")
    else
        title(string(YMode),'FontSize',20)
    end
    drawnow
end
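If you would rather keep the streaming structure of the example, the only part tied to the microphone is `x = audioIn();`, which returns the next `floor(fs/classificationRate)` samples. With the whole file already in memory, those samples can be sliced out of the signal instead, and the slice position tells you which second of audio the current buffer covers. A minimal sketch of that replacement, assuming a mono signal at the network's sampling rate; `streamClassify` and its `classifyFcn` argument are hypothetical names, and the plotting and thresholding from the loop are left out:

```matlab
function [labels, tEnd] = streamClassify(x, fs, classificationRate, classifyFcn)
% streamClassify  (hypothetical helper) Run an in-memory mono signal through
% the same sliding one-second buffer that the microphone loop uses.
%   x                  : mono signal, sampled at fs
%   fs                 : sampling rate in Hz (integer, e.g. 16e3)
%   classificationRate : classifications per second, as in the example
%   classifyFcn        : function handle taking the fs-by-1 buffer and
%                        returning a label (assumption: wraps spectrogram + classify)
% Returns one label per step and the time (s) at which each buffer ends.
% Buffer n covers [tEnd(n)-1, tEnd(n)] seconds; during the first second the
% buffer is partly zero-filled, just as in the live example.
x = x(:);                                 % force a column vector
hop = floor(fs/classificationRate);       % same as 'SamplesPerFrame' in the example
waveBuffer = zeros(fs, 1);                % one second of audio
numSteps = floor(numel(x)/hop);           % a trailing partial step is dropped
labels = strings(numSteps, 1);
tEnd = zeros(numSteps, 1);
for n = 1:numSteps
    frame = x((n-1)*hop+1 : n*hop);       % replaces x = audioIn();
    waveBuffer(1:end-hop) = waveBuffer(hop+1:end);
    waveBuffer(end-hop+1:end) = frame;
    labels(n) = string(classifyFcn(waveBuffer));
    tEnd(n) = n*hop/fs;                   % samples consumed so far, in seconds
end
end
```

With `trainedNet`, `frameLength`, `hopLength`, `numBands` and `epsil` defined as in the code above, the handle can wrap the same spectrogram and `classify` calls (this assumes a mono 16 kHz file):

```matlab
sig = audioread('filename.wav');
specFcn = @(b) log10(auditorySpectrogram(b, fs, ...
    'WindowLength', frameLength, 'OverlapLength', frameLength-hopLength, ...
    'NumBands', numBands, 'Range', [50,7000], 'WindowType', 'Hann', ...
    'WarpType', 'Bark', 'SumExponent', 2) + epsil);
[stepLabels, tEnd] = streamClassify(sig, fs, classificationRate, ...
    @(b) classify(trainedNet, specFcn(b), 'ExecutionEnvironment', 'cpu'));
```

The same mode/threshold rule from the loop can then be applied over a sliding window of `stepLabels`, with `tEnd` giving the time of each decision.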
1) Read the audio from the WAV file:
[audioIn,fs] = audioread('filename.wav');
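Before splitting, the signal probably needs two adjustments: `audioread` returns one column per channel while `buffer` only accepts a vector, and the example network works on 16 kHz audio (`fs = 16e3` in the streaming code). A minimal sketch, assuming the Signal Processing Toolbox `resample` function is available; the helper name `prepareForCommandNet` is hypothetical:

```matlab
function x = prepareForCommandNet(x, fsIn, fsNet)
% prepareForCommandNet  (hypothetical helper) Mix a multichannel signal
% down to mono and resample it to the rate the network expects.
%   x     : samples-by-channels matrix, as returned by audioread
%   fsIn  : sampling rate of x in Hz (integer)
%   fsNet : sampling rate the network was trained at, e.g. 16e3
x = mean(x, 2);                    % average the channels into one column
if fsIn ~= fsNet
    x = resample(x, fsNet, fsIn);  % rational-rate resampling; integer rates only
end
end
```

Usage, keeping `fs` in sync because the next step uses it as the chunk length:

```matlab
[audioIn,fs] = audioread('filename.wav');
audioIn = prepareForCommandNet(audioIn, fs, 16e3);
fs = 16e3;
```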
2) Split the signal into 1-second chunks, with overlap between consecutive chunks. The higher the overlap, the finer the time resolution, i.e. the more precisely you can locate where the keyword occurred. With an overlap of 3/4 of a second (round(3*fs/4) samples), consecutive chunks start 0.25 s apart:
y = buffer(audioIn, fs, round(3*fs/4));
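Each column of `y` can then be classified with the same `auditorySpectrogram` settings as the streaming code, and the column index converted back to a time span. Called this way (without the `'nodelay'` option), `buffer` prepends `round(3*fs/4)` zeros, so column k begins that many samples before sample `(k-1)*hop`, where `hop = fs - round(3*fs/4)`. A minimal sketch of that bookkeeping, assuming the per-chunk labels are already collected in a string array; `chunkIntervals` is a hypothetical name, and treating both `background` and `unknown` as non-commands is a design choice, not something the example prescribes:

```matlab
function intervals = chunkIntervals(chunkLabels, fs, chunkLength, overlap)
% chunkIntervals  (hypothetical helper) Merge per-chunk labels into
% [start, end] time intervals, one row per run of identical command labels.
%   chunkLabels : one label per column of y = buffer(x, chunkLength, overlap)
%   fs          : sampling rate in Hz
%   chunkLength : samples per chunk (fs for 1-second chunks)
%   overlap     : overlap in samples, as passed to buffer
% Assumes buffer's default behaviour, which inserts 'overlap' zeros before
% x(1), so column k spans zero-based samples
% (k-1)*hop - overlap ... (k-1)*hop - overlap + chunkLength.
hop = chunkLength - overlap;                  % samples between chunk starts
chunkLabels = string(chunkLabels(:));         % column of string labels
k = (1:numel(chunkLabels)).';
startSample = (k-1)*hop - overlap;            % may be negative (zero lead-in)
startTime = max(startSample, 0) / fs;         % clamp the padded part to t = 0
endTime = (startSample + chunkLength) / fs;   % last chunk may extend past the file
isCommand = chunkLabels ~= "background" & chunkLabels ~= "unknown";

Label = strings(0,1);
StartTime = zeros(0,1);
EndTime = zeros(0,1);
runStart = 1;
while runStart <= numel(chunkLabels)
    if ~isCommand(runStart)
        runStart = runStart + 1;
        continue
    end
    runEnd = runStart;
    % extend the run while the next chunk carries the same command label
    while runEnd < numel(chunkLabels) && chunkLabels(runEnd+1) == chunkLabels(runStart)
        runEnd = runEnd + 1;
    end
    Label(end+1,1) = chunkLabels(runStart);
    StartTime(end+1,1) = startTime(runStart);
    EndTime(end+1,1) = endTime(runEnd);
    runStart = runEnd + 1;
end
intervals = table(Label, StartTime, EndTime);
end
```

Usage, with `trainedNet`, `frameLength`, `hopLength`, `numBands` and `epsil` defined as in the streaming code:

```matlab
chunkLabels = strings(size(y,2), 1);
for k = 1:size(y,2)
    spec = log10(auditorySpectrogram(y(:,k), fs, ...
        'WindowLength', frameLength, 'OverlapLength', frameLength-hopLength, ...
        'NumBands', numBands, 'Range', [50,7000], 'WindowType', 'Hann', ...
        'WarpType', 'Bark', 'SumExponent', 2) + epsil);
    chunkLabels(k) = string(classify(trainedNet, spec, 'ExecutionEnvironment', 'cpu'));
end
intervals = chunkIntervals(chunkLabels, fs, fs, round(3*fs/4))
```

Single-chunk detections can be noisy; dropping runs shorter than two chunks, or adding a probability threshold as the streaming example does, may help.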
