Understand Speech (Voiceprint) Recognition 1:N and 1:1 – Deep Learning Tutorial

admin

2 years ago

When building a speech (voiceprint) recognition system, we often face two kinds of recognition: 1:N and 1:1. In this tutorial, we will introduce the difference between them.

1:1 Recognition

For example, suppose a database contains 1,000,000 members, and each of them has a unique voiceprint feature. This means 1,000,000 voiceprint features are also stored in this database: [v1, v2, …, v1,000,000].

In 1:1 recognition, when you have a new voiceprint feature v, you have to compare v with every stored feature [v1, v2, …, v1,000,000], which means 1,000,000 comparisons. The best-matching feature then tells you which person in the database spoke the speech.
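The comparison loop described above can be sketched in Python. This is a minimal sketch, assuming each voiceprint feature is a fixed-length embedding vector and that cosine similarity is used as the comparison score; the function names `cosine_similarity` and `identify_speaker`, the dictionary layout of the database, and the toy vectors are all hypothetical. A real database would hold 1,000,000 features instead of three.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify_speaker(v, database):
    """Compare v with every stored feature and return the best-matching member.

    database maps member id -> one voiceprint feature (hypothetical layout),
    so the number of comparisons equals the number of members.
    """
    best_id, best_score = None, float("-inf")
    for member_id, feature in database.items():
        score = cosine_similarity(v, feature)
        if score > best_score:
            best_id, best_score = member_id, score
    return best_id, best_score

# Toy database with 3 members instead of 1,000,000 (illustrative values only).
database = {
    "user_1": [1.0, 0.0, 0.0],
    "user_2": [0.0, 1.0, 0.0],
    "user_3": [0.0, 0.0, 1.0],
}
member, score = identify_speaker([0.9, 0.1, 0.0], database)
print(member)  # user_1
```

Because every stored feature is visited once, the cost grows linearly with the number of members; that is why the example above needs 1,000,000 comparisons.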

1:N Recognition

If N = 2

Using the example above, suppose your database contains 2 voiceprint features for each member. This means 2,000,000 voiceprint features are stored.

When you get a new voiceprint feature v, you have to compare it 2,000,000 times. However, because each member has 2 voiceprint features, the comparison result may be better than with 1:1 recognition, since v has two chances to match each member.
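Here is a minimal sketch of the N = 2 case, under the same assumptions as before (embedding vectors compared by cosine similarity). Each member now maps to a list of features, and a member's score is taken as the maximum over its features. That max aggregation is an assumption for illustration; averaging the scores is another common choice. The function name `identify_speaker_multi` and the toy data are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify_speaker_multi(v, database):
    """Compare v with all N features of every member.

    database maps member id -> list of N voiceprint features (hypothetical layout).
    Total comparisons = number of members * N.
    A member's score is the best (max) score over its features (assumed aggregation).
    """
    best_id, best_score = None, float("-inf")
    comparisons = 0
    for member_id, features in database.items():
        member_score = float("-inf")
        for feature in features:
            member_score = max(member_score, cosine_similarity(v, feature))
            comparisons += 1
        if member_score > best_score:
            best_id, best_score = member_id, member_score
    return best_id, best_score, comparisons

# Toy database: 2 members, 2 features each (illustrative values only).
database = {
    "user_1": [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]],
    "user_2": [[0.0, 1.0, 0.0], [0.0, 0.6, 0.8]],
}
member, score, comparisons = identify_speaker_multi([0.7, 0.7, 0.0], database)
print(member, comparisons)  # user_1 4
```

In this toy case the query matches user_1's first feature and user_2's first feature equally well (about 0.707 each), but user_1's second feature scores about 0.99, so having two features per member is what separates the two candidates.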

Moreover, we often do not need to compare 2,000,000 times. If we already know the user id of a member, we can look up that member's 2 features by this user id, and then we only need to compare v 2 times.
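When the user id is already known, the search reduces to a lookup plus N comparisons. The sketch below assumes the same dictionary layout as the previous example and accepts the speaker if the best score reaches a threshold; the function name `verify_speaker` and the threshold value 0.8 are hypothetical, and a real system would tune the threshold on its own data.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify_speaker(v, user_id, database, threshold=0.8):
    """Compare v only with the features stored under user_id.

    threshold is a hypothetical value chosen for illustration.
    Returns (accepted, best_score, comparisons).
    """
    features = database[user_id]  # look up this member's N features by user id
    scores = [cosine_similarity(v, f) for f in features]
    best = max(scores)
    return best >= threshold, best, len(scores)

# Toy database: 2 members, 2 features each (illustrative values only).
database = {
    "user_1": [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]],
    "user_2": [[0.0, 1.0, 0.0], [0.0, 0.6, 0.8]],
}
accepted, score, comparisons = verify_speaker([0.7, 0.7, 0.0], "user_1", database)
print(accepted, comparisons)  # True 2
```

Only the 2 features of the given user are compared, no matter how many members the database holds.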