Split MUSAN Dataset for Audio Augmentation: A Step-by-Step Guide – Deep Learning Tutorial

admin

2 years ago

To improve the performance of a speaker verification model, we can use the MUSAN dataset for audio augmentation. However, the audio files in MUSAN are usually long, so we need to split them into smaller files. In this tutorial, we will show you how to do this.

MUSAN dataset

MUSAN is a corpus of music, speech and noise. You can download it here:

http://www.openslr.org/17/

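After extraction, the corpus is organized into three top-level folders: music, noise and speech. Each of them contains subfolders of .wav files (for example, musan/noise/free-sound).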
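Below is a minimal sketch for inspecting this layout on your own copy. It assumes the archive was extracted to a local folder named musan, and it counts the .wav files under each top-level folder. The function name count_wavs_by_category is our own choice and is not part of any library.

```python
from collections import Counter
from pathlib import Path


def count_wavs_by_category(root="musan"):
    """Count .wav files under each top-level folder of root (e.g. music, noise, speech)."""
    root_path = Path(root)
    counts = Counter()
    for wav in root_path.rglob("*.wav"):
        rel = wav.relative_to(root_path)
        # The first path component is the category folder; files sitting directly in root count as "."
        category = rel.parts[0] if len(rel.parts) > 1 else "."
        counts[category] += 1
    return dict(counts)


if __name__ == "__main__":
    # Assumption: the MUSAN archive was extracted to ./musan
    if Path("musan").is_dir():
        for category, n in sorted(count_wavs_by_category("musan").items()):
            print(f"{category}: {n} wav files")
```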

How to split MUSAN audio files into small files?

We will base our script on the data preparation code from voxceleb_trainer:

https://github.com/clovaai/voxceleb_trainer/blob/master/dataprep.py

Based on it, we will write an example that splits the MUSAN files.

Here is the full code:

# musan split

import os
import librosa
import soundfile

def traverseDir(root, filetype=".wav"):
    # Recursively collect every file under root whose name ends with filetype
    files = []
    for entry in os.scandir(root):
        if entry.is_dir():
            files.extend(traverseDir(entry.path, filetype))
        elif entry.is_file() and entry.path.endswith(filetype):
            files.append(entry.path)
    return files

def getFilePathInfo(absolute):
    # Split a path into (directory, file name without extension, extension)
    dirname = os.path.dirname(absolute)
    basename = os.path.basename(absolute)
    filename, extend = os.path.splitext(basename)
    return dirname, filename, extend

def save_wav(audio, fx, sr=16000):
    # Write a float waveform to fx as a 16-bit PCM wav file
    soundfile.write(fx, audio, sr, "PCM_16")


step_time = 3 * 16000  # hop between clip starts: 3 seconds at 16 kHz
max_time = 5 * 16000   # length of each clip: 5 seconds at 16 kHz

all_files = traverseDir(root="musan", filetype=".wav")

for f in all_files:
    dirname, filename, extend = getFilePathInfo(f)
    # Mirror the musan/ folder layout under musan_split/
    path = os.path.join("musan_split", os.path.relpath(dirname, "musan"))
    os.makedirs(path, exist_ok=True)

    # Load as mono, resampling to 16 kHz if necessary
    audio, sr = librosa.load(f, sr=16000, mono=True)
    # Slide a 5 s window over the audio with a 3 s hop
    for clip_id, st in enumerate(range(0, len(audio) - max_time, step_time)):
        file_path = os.path.join(path, filename + "_" + str(clip_id) + ".wav")
        if os.path.exists(file_path):
            continue  # already written by an earlier run
        clip = audio[st:st + max_time]
        save_wav(clip, file_path, sr=sr)
print("end")

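Note that each clip is exactly 5 seconds long. Consecutive clips start 3 seconds apart, so neighboring clips overlap by 2 seconds. Files of 5 seconds or less produce no clips, and any audio after the last full clip is dropped.

The wav files in MUSAN are sampled at 16 kHz, and librosa.load(..., sr=16000) resamples to 16 kHz in any case. So we set max_time = 5 * 16000 and step_time = 3 * 16000, both measured in samples.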
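To make the windowing arithmetic concrete, here is a small sketch that computes the clip start offsets with the same range() call as the split loop. The helper name clip_starts is hypothetical and is introduced only for illustration.

```python
def clip_starts(num_samples, max_time=5 * 16000, step_time=3 * 16000):
    """Start offsets (in samples) of the clips cut by the split loop.

    Mirrors range(0, len(audio) - max_time, step_time): each clip is max_time
    samples long and the next one starts step_time samples later.
    """
    return list(range(0, num_samples - max_time, step_time))


if __name__ == "__main__":
    sr = 16000
    # Hypothetical 12-second file: 12 * 16000 = 192000 samples
    for i, st in enumerate(clip_starts(12 * sr)):
        print(f"clip {i}: {st / sr:.0f}s to {(st + 5 * sr) / sr:.0f}s")
```

For a 12-second file this prints three clips: 0s to 5s, 3s to 8s and 6s to 11s. The final second is dropped.

Because the stop value of range() is exclusive, a file of 5 seconds or less yields no clips. An 8-second file yields only one clip, even though a second clip from 3s to 8s would fit exactly.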

Running this example splits the musan dataset into a musan_split dataset, which contains many small wav files arranged in the same folder layout as musan.
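As an optional sanity check, the sketch below walks musan_split and flags any clip whose sample rate or length differs from 16 kHz and 5 seconds. It assumes the clips are plain 16-bit PCM wav files, as written by the script above, so that Python's standard wave module can read them. The function name check_clips is our own.

```python
import os
import wave


def check_clips(root="musan_split", expected_sr=16000, expected_seconds=5):
    """Return paths of .wav files under root whose sample rate or length is unexpected."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".wav"):
                continue
            path = os.path.join(dirpath, name)
            with wave.open(path, "rb") as w:
                # A 5 s clip at 16 kHz should contain exactly 80000 frames
                ok = (w.getframerate() == expected_sr
                      and w.getnframes() == expected_sr * expected_seconds)
            if not ok:
                bad.append(path)
    return bad


if __name__ == "__main__":
    problems = check_clips("musan_split")
    print(f"{len(problems)} unexpected clip(s)")
```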