When building a voiceprint recognition system, we encounter two settings: 1:1 recognition and 1:N recognition. In this tutorial, we will introduce their difference.
1:1 Recognition
For example, suppose a database contains 1,000,000 members and each member has one unique voiceprint feature. The database then stores 1,000,000 voiceprint features: [v1, v2, …, v1,000,000].
In 1:1 recognition, when you get a new voiceprint feature v, you have to compare it with every feature in [v1, v2, …, v1,000,000], which means 1,000,000 comparisons. Then you can determine which person in the database spoke this speech.
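The comparison loop described above can be sketched in a few lines of Python. This is only a minimal illustration: the text does not say how two voiceprint features are compared, so cosine similarity is an assumption here, and the names `cosine_similarity`, `identify`, and the tiny 3-member `database` are hypothetical stand-ins for a real 1,000,000-member store.

```python
import math

def cosine_similarity(a, b):
    # Assumed similarity measure; both vectors must be non-zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(v, database):
    # Compare v with every stored feature (one per member)
    # and return the best matching member id and its score.
    best_id, best_score = None, -1.0
    for member_id, feature in database.items():
        score = cosine_similarity(v, feature)
        if score > best_score:
            best_id, best_score = member_id, score
    return best_id, best_score

# Hypothetical toy database: one voiceprint feature per member.
database = {
    "member_1": [1.0, 0.0, 0.0],
    "member_2": [0.0, 1.0, 0.0],
    "member_3": [0.0, 0.0, 1.0],
}

v = [0.1, 0.9, 0.1]  # new voiceprint feature
best_id, best_score = identify(v, database)
print(best_id, round(best_score, 3))
```

The number of comparisons equals the number of stored features, so the cost of this loop grows linearly with the size of the database.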
1:N Recognition
If N = 2
Continuing the example above, the database now contains 2 voiceprint features for each member, which means 2,000,000 voiceprint features are stored.
When you get a new voiceprint feature v, you have to compare it 2,000,000 times. However, because each member has 2 voiceprint features, the recognition result may be better than in the 1:1 case.
Moreover, we often do not need to compare 2,000,000 times. If you already know the user id of the member, you can look up that member's 2 features by the user id, and you will only need to compare 2 times.
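The difference between comparing against everything and comparing only against one known user id can be sketched as follows. Again, this is an illustration under assumptions: cosine similarity, the acceptance `threshold` of 0.8, the function names `identify_all` and `verify_by_id`, and the toy `database` are all hypothetical and not taken from any particular system.

```python
import math

def cosine_similarity(a, b):
    # Assumed similarity measure; both vectors must be non-zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify_all(v, database):
    # Compare v with every feature of every member.
    # Returns (best member id, best score, number of comparisons).
    best_id, best_score, comparisons = None, -1.0, 0
    for member_id, features in database.items():
        for feature in features:
            comparisons += 1
            score = cosine_similarity(v, feature)
            if score > best_score:
                best_id, best_score = member_id, score
    return best_id, best_score, comparisons

def verify_by_id(v, database, user_id, threshold=0.8):
    # Compare v only with the features stored under user_id.
    # Returns (accepted, best score, number of comparisons).
    scores = [cosine_similarity(v, f) for f in database[user_id]]
    best = max(scores)
    return best >= threshold, best, len(scores)

# Hypothetical toy database: N = 2 features per member.
database = {
    "alice": [[1.0, 0.0], [0.9, 0.1]],
    "bob": [[0.0, 1.0], [0.1, 0.9]],
    "carol": [[0.7, 0.7], [0.6, 0.8]],
}

v = [0.95, 0.05]  # new voiceprint feature
print(identify_all(v, database))           # scans all 6 stored features
print(verify_by_id(v, database, "alice"))  # scans only alice's 2 features
```

With 1,000,000 members and N = 2, the first function would perform 2,000,000 comparisons, while the second performs only 2 regardless of how many members the database holds.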