To improve the performance of a speaker verification model, we can use the MUSAN dataset for audio augmentation. However, many audio files in MUSAN are long, so we need to split them into short clips first. This tutorial shows how to do that.
Musan dataset
MUSAN is a corpus of music, speech and noise. You can download it here:
http://www.openslr.org/17/
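After extraction, the top-level musan directory contains three subdirectories (music, noise and speech), with the wav files stored in nested folders below them.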
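To inspect the layout on your own machine, a minimal sketch like the following counts the wav files under each top-level category. The function name count_wavs_by_category is hypothetical, and the sketch assumes the archive was extracted into a folder whose path you pass in.

```python
from collections import Counter
from pathlib import Path


def count_wavs_by_category(root):
    """Count .wav files under each top-level subdirectory of root."""
    root = Path(root)
    counts = Counter()
    for wav in root.rglob("*.wav"):
        parts = wav.relative_to(root).parts
        if len(parts) < 2:
            continue  # ignore wav files that sit directly in root
        # The first path component below root is the category, e.g. "noise".
        counts[parts[0]] += 1
    return dict(counts)
```

Calling count_wavs_by_category("musan") from the directory that holds the extracted dataset should give one entry per category.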
How to split audio files in musan into small files?
We will refer to this code:
https://github.com/clovaai/voxceleb_trainer/blob/master/dataprep.py
Based on it, we will write an example script that splits the files.
Here is the full code:
# musan split
import os

import librosa
import soundfile


def traverseDir(dir, filetype=".wav"):
    # Recursively collect the paths of all files ending with filetype.
    files = []
    for entry in os.scandir(dir):
        if entry.is_dir():
            files.extend(traverseDir(entry.path, filetype))
        elif entry.is_file() and entry.path.endswith(filetype):
            files.append(entry.path)
    return files


def getFilePathInfo(absolute):
    # Split a path into its directory, file name (no extension) and extension.
    dirname = os.path.dirname(absolute)
    basename = os.path.basename(absolute)
    filename, extend = os.path.splitext(basename)
    return dirname, filename, extend


def save_wav(audio, fx, sr=16000):
    # Write the clip as a 16-bit PCM wav file.
    soundfile.write(fx, audio, sr, "PCM_16")


step_time = 3 * 16000  # hop between clip starts: 3 seconds at 16 kHz
max_time = 5 * 16000   # clip length: 5 seconds at 16 kHz

all_files = traverseDir(dir="musan", filetype=".wav")
for f in all_files:
    dirname, filename, extend = getFilePathInfo(f)
    # Mirror the musan directory layout under musan_split.
    path = dirname.replace("musan/", "musan_split/", 1)
    os.makedirs(path, exist_ok=True)
    # Load as mono at 16 kHz (librosa resamples if needed).
    audio, sr = librosa.load(f, sr=16000, mono=True)
    # enumerate keeps clip ids in step with offsets even when a clip is skipped.
    for id, st in enumerate(range(0, len(audio) - max_time, step_time)):
        file_path = path + "/" + filename + "_" + str(id) + ".wav"
        if os.path.exists(file_path):
            continue  # already written by a previous run
        clip = audio[st:st + max_time]
        save_wav(clip, file_path, sr=16000)
print("end")
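Note that each clip is 5 seconds long and a new clip starts every 3 seconds, so consecutive clips overlap by 2 seconds.
The sample rate of each wav file in musan is 16000 Hz, so 5 seconds correspond to max_time = 5 * 16000 = 80000 samples.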
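To make the arithmetic concrete, the sketch below reproduces only the range expression from the splitting loop and returns the sample offsets of each clip. The helper name clip_bounds is hypothetical and is not part of the script above.

```python
SAMPLE_RATE = 16000
MAX_TIME = 5 * SAMPLE_RATE   # clip length in samples
STEP_TIME = 3 * SAMPLE_RATE  # hop between clip starts in samples


def clip_bounds(num_samples, max_time=MAX_TIME, step_time=STEP_TIME):
    """Return (start, end) sample offsets for every clip cut from one file."""
    return [
        (st, st + max_time)
        for st in range(0, num_samples - max_time, step_time)
    ]


# A 12-second file yields clips covering 0-5 s, 3-8 s and 6-11 s.
bounds = clip_bounds(12 * SAMPLE_RATE)
print([(s / SAMPLE_RATE, e / SAMPLE_RATE) for s, e in bounds])
```

With this range expression, the last second of the 12-second file is dropped, and a file of 5 seconds or less produces no clips at all.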
Running this example splits the musan dataset into a musan_split dataset, which contains many small wav files.
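As a quick sanity check on the output, a sketch like the one below walks a folder and reports any wav file that is not exactly 5 seconds at 16000 Hz. It uses Python's standard wave module and assumes the clips are plain 16-bit PCM wav files. The function name check_clips is hypothetical.

```python
import wave
from pathlib import Path


def check_clips(root, expected_rate=16000, expected_seconds=5):
    """Return (number of wav files checked, paths that do not match)."""
    total = 0
    bad = []
    for path in sorted(Path(root).rglob("*.wav")):
        total += 1
        with wave.open(str(path), "rb") as w:
            ok = (
                w.getframerate() == expected_rate
                and w.getnframes() == expected_rate * expected_seconds
            )
        if not ok:
            bad.append(str(path))
    return total, bad
```

For example, total, bad = check_clips("musan_split") should leave bad empty if every clip has the expected length and sample rate.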