(简体中文|English)

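Speaker Verification

Introduction

Speaker verification is a technique that uses a computer program to automatically extract speaker characteristics.

This demo extracts speaker characteristics from a given audio file. It can be run with a single PaddleSpeech command or a few lines of Python code.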

Usage

1. Installation

See the installation documentation.

You can choose one of three installation methods: easy, medium, or hard.

2. Prepare Input

The input to the speaker verification CLI demo should be a WAV file (.wav), and its sample rate must match the sample rate of the model.
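
The sample-rate requirement above can be checked before running the demo. The following is a minimal sketch using only Python's standard `wave` module. The helper names `check_wav_sample_rate` and `write_silent_wav` are illustrative and not part of PaddleSpeech, and the 16000 Hz default is an assumption taken from the `sample_rate` default documented in this README.

```python
import os
import tempfile
import wave


def check_wav_sample_rate(path, expected_rate=16000):
    """Return True if the WAV file at `path` has the expected sample rate.

    `expected_rate` defaults to 16000 Hz, an assumption based on this demo's
    default `sample_rate`; pass the rate your model actually needs.
    """
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate() == expected_rate


def write_silent_wav(path, sample_rate, num_frames=160):
    """Write a short mono 16-bit silent WAV file (only used by the demo below)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * num_frames)


if __name__ == "__main__":
    # Self-contained demo: create two throwaway files and check each one.
    with tempfile.TemporaryDirectory() as tmp_dir:
        ok_path = os.path.join(tmp_dir, "ok_16k.wav")
        bad_path = os.path.join(tmp_dir, "bad_8k.wav")
        write_silent_wav(ok_path, 16000)
        write_silent_wav(bad_path, 8000)
        print(ok_path, check_wav_sample_rate(ok_path))
        print(bad_path, check_wav_sample_rate(bad_path))
```

To check a real input, call `check_wav_sample_rate('./85236145389.wav')` after downloading the sample audio.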

Sample audio files for this demo can be downloaded:

# The content of this audio is the digit string 85236145389
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/123456789.wav

3. Usage

Command line (recommended)

paddlespeech vector --task spk --input 85236145389.wav

echo -e "demo1 85236145389.wav" > vec.job
paddlespeech vector --task spk --input vec.job

echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk

paddlespeech vector --task score --input "./85236145389.wav ./123456789.wav"

echo -e "demo4 85236145389.wav 85236145389.wav \n demo5 85236145389.wav 123456789.wav" > vec.job
paddlespeech vector --task score --input vec.job
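
The job files used above hold one entry per line: an ID followed by one audio path for the `spk` task, or by two audio paths for the `score` task. As a rough sketch, the same text could be generated from Python. The helper names `format_spk_job` and `format_score_job` are hypothetical and only mirror the `echo` commands above.

```python
def format_spk_job(utterances):
    """Build spk job text: one '<utt_id> <wav_path>' line per utterance."""
    return "".join(f"{utt_id} {wav_path}\n" for utt_id, wav_path in utterances)


def format_score_job(trials):
    """Build score job text: one '<utt_id> <first_wav> <second_wav>' line per trial."""
    return "".join(
        f"{utt_id} {first_wav} {second_wav}\n"
        for utt_id, first_wav, second_wav in trials
    )


if __name__ == "__main__":
    # Same entries as the echo commands in this README.
    spk_text = format_spk_job([("demo1", "85236145389.wav")])
    score_text = format_score_job([
        ("demo4", "85236145389.wav", "85236145389.wav"),
        ("demo5", "85236145389.wav", "123456789.wav"),
    ])
    print(spk_text, end="")
    print(score_text, end="")
```

Writing either string to `vec.job` gives a file that can be passed to `--input` in the same way as the one created with `echo`.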

Usage:

paddlespeech vector --help

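Arguments:

input (required): the audio file to process.
task (required): the specific task for vector to run; the default is spk.
model: the model for the speaker verification task; default: ecapatdnn_voxceleb12.
sample_rate: the audio sample rate; default: 16000.
config: the config file for the speaker verification task; if not set, the default config of the pretrained model is used; default: None.
ckpt_path: the model checkpoint file; if not set, the pretrained model is downloaded and used; default: None.
device: the device to run inference on; default: the default device of paddlepaddle on the current system.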
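
To show how these arguments might be combined, here is a small sketch that assembles a command line in Python. It assumes each argument listed above is passed as `--<name> <value>`, in the same way as `--task` and `--input` in the commands shown earlier; this README does not confirm the flag spelling for the other arguments, so check it against `paddlespeech vector --help`. The helper `build_vector_command` is hypothetical.

```python
import shlex


def build_vector_command(input_path, task="spk", **options):
    """Assemble a `paddlespeech vector` command as an argument list.

    Assumption: every documented argument is passed as `--<name> <value>`.
    Options set to None are skipped so that the tool's own defaults apply.
    """
    command = ["paddlespeech", "vector", "--task", task, "--input", input_path]
    for name, value in options.items():
        if value is not None:
            command += [f"--{name}", str(value)]
    return command


if __name__ == "__main__":
    command = build_vector_command(
        "85236145389.wav",
        task="spk",
        model="ecapatdnn_voxceleb12",
        sample_rate=16000,
        config=None,  # None: keep the pretrained model's default config
    )
    # Print a shell-quoted version that can be copied into a terminal.
    print(shlex.join(command))
```

The returned list can also be handed to `subprocess.run` once PaddleSpeech is installed.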

Output:

  [ -1.3251206    7.8606825   -4.620626     0.3000721    2.2648535
  -1.1931441    3.0647137    7.673595    -6.0044727  -12.02426
  -1.9496069    3.1269536    1.618838    -7.6383104   -1.2299773
-12.338331     2.1373026   -5.3957124    9.717328     5.6752305
  3.7805123    3.0597172    3.429692     8.97601     13.174125
  -0.53132284   8.9424715    4.46511     -4.4262476   -9.726503
  8.399328     7.2239175   -7.435854     2.9441683   -4.3430395
-13.886965    -1.6346735  -10.9027405   -5.311245     3.8007221
  3.8976038   -2.1230774   -2.3521194    4.151031    -7.4048667
  0.13911647   2.4626107    4.9664545    0.9897574    5.4839754
  -3.3574002   10.1340065   -0.6120171  -10.403095     4.6007543
  16.00935     -7.7836914   -4.1945305   -6.9368606    1.1789556
  11.490801     4.2380238    9.550931     8.375046     7.5089145
  -0.65707296  -0.30051577   2.8406055    3.0828028    0.730817
  6.148354     0.13766119 -13.424735    -7.7461405   -2.3227983
  -8.305252     2.9879124  -10.995229     0.15211068  -2.3820348
  -1.7984174    8.495629    -5.8522367   -3.755498     0.6989711
  -5.2702994   -2.6188622   -1.8828466   -4.64665     14.078544
  -0.5495333   10.579158    -3.2160501    9.349004    -4.381078
-11.675817    -2.8630207    4.5721755    2.246612    -4.574342
  1.8610188    2.3767874    5.6257877   -9.784078     0.64967257
  -1.4579505    0.4263264   -4.9211264   -2.454784     3.4869802
  -0.42654222   8.341269     1.356552     7.0966883  -13.102829
  8.016734    -7.1159344    1.8699781    0.208721    14.699384
  -1.025278    -2.6107233   -2.5082312    8.427193     6.9138527
  -6.2912464    0.6157366    2.489688    -3.4668267    9.921763
  11.200815    -0.1966403    7.4916005   -0.62312716  -0.25848144
  -9.947997    -0.9611041    1.1649219   -2.1907122   -1.5028487
  -0.51926106  15.165954     2.4649463   -0.9980445    7.4416637
  -2.0768049    3.5896823   -7.3055434   -7.5620847    4.323335
  0.0804418   -6.56401     -2.3148053   -1.7642345   -2.4708817
  -7.675618    -9.548878    -1.0177554    0.16986446   2.5877135
  -1.8752296   -0.36614323  -6.0493784   -2.3965611   -5.9453387
  0.9424033  -13.155974    -7.457801     0.14658108  -3.742797
  5.8414927   -1.2872906    5.5694313   12.57059      1.0939219
  2.2142086    1.9181576    6.9914207   -5.888139     3.1409824
  -2.003628     2.4434285    9.973139     5.03668      2.0051203
  2.8615603    5.860224     2.9176188   -1.6311141    2.0292206
  -4.070415    -6.831437  ]

Python API

import paddle
from paddlespeech.cli.vector import VectorExecutor

vector_executor = VectorExecutor()
audio_emb = vector_executor(
    model='ecapatdnn_voxceleb12',
    sample_rate=16000,
    config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
    ckpt_path=None,
    audio_file='./85236145389.wav',
    device=paddle.get_device())
print('Audio embedding Result: \n{}'.format(audio_emb))

test_emb = vector_executor(
    model='ecapatdnn_voxceleb12',
    sample_rate=16000,
    config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
    ckpt_path=None,
    audio_file='./123456789.wav',
    device=paddle.get_device())
print('Test embedding Result: \n{}'.format(test_emb))

# score range [0, 1]
score = vector_executor.get_embeddings_score(audio_emb, test_emb)
print(f"Eembeddings Score: {score}")

Output:

# Vector Result:
 Audio embedding Result:
  [ -1.3251206    7.8606825   -4.620626     0.3000721    2.2648535
    -1.1931441    3.0647137    7.673595    -6.0044727  -12.02426
    -1.9496069    3.1269536    1.618838    -7.6383104   -1.2299773
  -12.338331     2.1373026   -5.3957124    9.717328     5.6752305
    3.7805123    3.0597172    3.429692     8.97601     13.174125
    -0.53132284   8.9424715    4.46511     -4.4262476   -9.726503
    8.399328     7.2239175   -7.435854     2.9441683   -4.3430395
  -13.886965    -1.6346735  -10.9027405   -5.311245     3.8007221
    3.8976038   -2.1230774   -2.3521194    4.151031    -7.4048667
    0.13911647   2.4626107    4.9664545    0.9897574    5.4839754
    -3.3574002   10.1340065   -0.6120171  -10.403095     4.6007543
    16.00935     -7.7836914   -4.1945305   -6.9368606    1.1789556
    11.490801     4.2380238    9.550931     8.375046     7.5089145
    -0.65707296  -0.30051577   2.8406055    3.0828028    0.730817
    6.148354     0.13766119 -13.424735    -7.7461405   -2.3227983
    -8.305252     2.9879124  -10.995229     0.15211068  -2.3820348
    -1.7984174    8.495629    -5.8522367   -3.755498     0.6989711
    -5.2702994   -2.6188622   -1.8828466   -4.64665     14.078544
    -0.5495333   10.579158    -3.2160501    9.349004    -4.381078
  -11.675817    -2.8630207    4.5721755    2.246612    -4.574342
    1.8610188    2.3767874    5.6257877   -9.784078     0.64967257
    -1.4579505    0.4263264   -4.9211264   -2.454784     3.4869802
    -0.42654222   8.341269     1.356552     7.0966883  -13.102829
    8.016734    -7.1159344    1.8699781    0.208721    14.699384
    -1.025278    -2.6107233   -2.5082312    8.427193     6.9138527
    -6.2912464    0.6157366    2.489688    -3.4668267    9.921763
    11.200815    -0.1966403    7.4916005   -0.62312716  -0.25848144
    -9.947997    -0.9611041    1.1649219   -2.1907122   -1.5028487
    -0.51926106  15.165954     2.4649463   -0.9980445    7.4416637
    -2.0768049    3.5896823   -7.3055434   -7.5620847    4.323335
    0.0804418   -6.56401     -2.3148053   -1.7642345   -2.4708817
    -7.675618    -9.548878    -1.0177554    0.16986446   2.5877135
    -1.8752296   -0.36614323  -6.0493784   -2.3965611   -5.9453387
    0.9424033  -13.155974    -7.457801     0.14658108  -3.742797
    5.8414927   -1.2872906    5.5694313   12.57059      1.0939219
    2.2142086    1.9181576    6.9914207   -5.888139     3.1409824
    -2.003628     2.4434285    9.973139     5.03668      2.0051203
    2.8615603    5.860224     2.9176188   -1.6311141    2.0292206
    -4.070415    -6.831437  ]
  # get the test embedding
  Test embedding Result:
  [  2.5247195    5.119042    -4.335273     4.4583654    5.047907
    3.5059214    1.6159848    0.49364898 -11.6899185   -3.1014526
    -5.6589785   -0.42684984   2.674276   -11.937654     6.2248464
  -10.776924    -5.694543     1.112041     1.5709964    1.0961034
    1.3976512    2.324352     1.339981     5.279319    13.734659
    -2.5753925   13.651442    -2.2357535    5.1575427   -3.251567
    1.4023279    6.1191974   -6.0845175   -1.3646189   -2.6789894
  -15.220778     9.779349    -9.411551    -6.388947     6.8313975
    -9.245996     0.31196198   2.5509644   -4.413065     6.1649427
    6.793837     2.6328635    8.620976     3.4832475    0.52491665
    2.9115407    5.8392377    0.6702376   -3.2726715    2.6694255
    16.91701     -5.5811176    0.23362345  -4.5573606  -11.801059
    14.728292    -0.5198082   -3.999922     7.0927105   -7.0459595
    -5.4389      -0.46420583  -5.1085467   10.376568    -8.889225
    -0.37705845  -1.659806     2.6731026   -7.1909504    1.4608804
    -2.163136    -0.17949677   4.0241547    0.11319201   0.601279
    2.039692     3.1910992  -11.649526    -8.121584    -4.8707457
    0.3851982    1.4231744   -2.3321972    0.99332285  14.121717
    5.899413     0.7384519  -17.760096    10.555021     4.1366534
    -0.3391071   -0.20792882   3.208204     0.8847948   -8.721497
    -6.432868    13.006379     4.8956      -9.155822    -1.9441519
    5.7815638   -2.066733    10.425042    -0.8802383   -2.4314315
    -9.869258     0.35095334  -5.3549943    2.1076174   -8.290468
    8.4433365   -4.689333     9.334139    -2.172678    -3.0250976
    8.394216    -3.2110903   -7.93868      2.3960824   -2.3213403
    -1.4963245   -3.476059     4.132903   -10.893354     4.362673
    -0.45456508  10.258634    -1.1655927   -6.7799754    0.22885278
    -4.399287     2.333433    -4.84745     -4.2752337   -1.3577863
    -1.0685898    9.505196     7.3062205    0.08708266  12.927811
    -9.57974      1.3936648   -1.9444873    5.776769    15.251903
    10.6118355   -1.4903594   -9.535318    -3.6553776   -1.6699586
    -0.5933151    7.600357    -4.8815503   -8.698617   -15.855757
    0.25632986  -7.2235737    0.9506656    0.7128582   -9.051738
    8.74869     -1.6426028   -6.5762258    2.506905    -6.7431564
    5.129912   -12.189555    -3.6435068   12.068113    -6.0059533
    -2.3535995    2.9014351   22.3082      -1.5563312   13.193291
    2.7583609   -7.468798     1.3407065   -4.599617    -6.2345777
    10.7689295    7.137627     5.099476     0.3473359    9.647881
    -2.0484571   -5.8549366 ]
  # get the score between enroll and test
  Eembeddings Score: 0.45332613587379456
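
The score above expresses how similar the two embeddings are. A common way to compare speaker embeddings is cosine similarity, and the sketch below shows that computation in plain Python. It is an illustration only: this README does not state how `get_embeddings_score` is implemented, and the function name `cosine_score` is hypothetical.

```python
import math


def cosine_score(emb_a, emb_b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same length")
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 3-dimensional vectors, not real speaker embeddings.
    print(cosine_score([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors
    print(cosine_score([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal vectors
```

Vectors pointing in the same direction score close to 1, while orthogonal vectors score 0.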

4. Pretrained Models

Below is the list of pretrained models provided by PaddleSpeech that can be used from the command line and the Python API:

Model	Sample Rate
ecapatdnn_voxceleb12	16k
