Python code for identifying the time segments in which different people speak in an audio file

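Language: python

Category: Other

Description: Python code that identifies, from an audio file, the time segments in which different people speak. It is suited to automatically sorting out who spoke when in a meeting recording and outputting the time-segment data for each speaker. Combined with speech-to-text, it can be used to produce meeting minutes attributed to each speaker.

Tags: python audio recognition different speakers time segments data code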

Below is a partial preview of the code. For the complete code, click download or open it in the bfwstudio webide.

#!/usr/local/python3/bin/python3
# -*- coding: utf-8 -*-
#!pip install -qq pyannote.audio==3.1.1
#!pip install -qq ipython==7.34.0

import torch
from pyannote.audio import Model, Pipeline, Inference
from pyannote.core import Segment
from scipy.spatial.distance import cosine
import numpy as np

class SpeakerRecognition:
    def __init__(self, token):
        self.token = token
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        
        # Initialize the diarization model
        self.pipeline = Pipeline.from_pretrained(
            "pyannote/speaker-diarization-3.1",  # request access at https://huggingface.co/pyannote/speaker-diarization-3.1
            use_auth_token=token
        )
        self.pipeline.to(self.device)
        
        # Initialize the voiceprint (speaker embedding) model
        self.embed_model = Model.from_pretrained(
            "pyannote/embedding",
            use_auth_token=token
        )
        self.inference = Inference(self.embed_model, window="whole")
        
        # Stored speaker voiceprints
        self.speaker_embeddings = {}

    def extract_speaker_embedding(self, audio_file, segment):
        """Extract the voiceprint embedding for a specific time segment"""
        try:
            return self.inference.crop(audio_file, segment)
        except Exception as e:
            print(f"Failed to extract voiceprint embedding: {e}")
            return None

    def add_speaker(self, speaker_name, audio_file):
        """Add a speaker to the voiceprint library"""
        try:
            diarization = self.pipeline(audio_file)
            embeddings = []
            
            for turn, _, speaker_label in diarization.iter.........

(The preview is cut off here. The complete code can be downloaded after logging in.)
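
Since the preview stops before the matching and output steps, the following is only a minimal sketch of how those steps might look. It is not the author's code. It assumes the speaker turns have already been extracted as plain `(start, end, label)` tuples and that voiceprints are plain lists of floats. The names `cosine_distance`, `match_speaker` and `segments_by_speaker`, the `0.5` distance threshold and the `0.5` second merge gap are all hypothetical. A pure-Python cosine distance stands in for `scipy.spatial.distance.cosine` so that the sketch has no third-party dependencies.

```python
import math
from collections import defaultdict


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def match_speaker(embedding, enrolled, threshold=0.5):
    """Return the enrolled name closest to `embedding`.

    Returns None when no enrolled voiceprint is closer than `threshold`
    (an arbitrary value chosen for illustration; tune it on real data).
    """
    best_name, best_dist = None, threshold
    for name, reference in enrolled.items():
        dist = cosine_distance(embedding, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


def segments_by_speaker(turns, max_gap=0.5):
    """Group (start, end, label) turns per speaker.

    Consecutive turns of the same speaker that are separated by at most
    `max_gap` seconds are merged into a single (start, end) span.
    """
    grouped = defaultdict(list)
    for start, end, label in sorted(turns):
        spans = grouped[label]
        if spans and start - spans[-1][1] <= max_gap:
            # Extend the previous span instead of starting a new one.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return dict(grouped)


if __name__ == "__main__":
    # Toy data: two enrolled voiceprints and four diarized turns.
    enrolled = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
    print(match_speaker([1.0, 0.1], enrolled))

    turns = [
        (0.0, 2.0, "SPEAKER_00"),
        (2.3, 4.0, "SPEAKER_00"),
        (4.5, 6.0, "SPEAKER_01"),
        (9.0, 10.0, "SPEAKER_00"),
    ]
    for label, spans in segments_by_speaker(turns).items():
        print(label, spans)
```

The per-speaker spans returned by `segments_by_speaker` are the kind of time-segment data the description refers to. Each span could then be passed to a speech-to-text step to build attributed meeting minutes.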
