Voice control on a PC and Raspberry Pi with Whisper


The idea of the project is to give voice commands to interact with our PC or our Raspberry Pi, using the speech-to-text model Whisper.

We give a spoken command, Whisper transcribes it to text, and we then analyze that text to execute the matching action, which can range from running a program to putting voltage on the Raspberry Pi pins.

I'm going to use an old Raspberry Pi 2 and a USB microphone, together with the speech-to-text model recently released by OpenAI, Whisper. At the end of the article you can read a little more about Whisper.

Everything is programmed in Python.

I leave you a demonstration of how it works in this video, controlling the PC by voice.

Assembly

To use it with the PC, we only need a microphone.

If you are going to set it up on the Raspberry Pi, you will need a USB microphone, because the jack it has is output only.

Since the tool's general purpose is voice recognition, I find it very useful to integrate it into the operation of other devices.

We need:

  • USB microphone
  • Raspberry Pi with an operating system (Raspbian, for example)
  • Electronics (LED, wires, 480 ohm resistor and breadboard)

We connect the LED to pin 17 (GPIO 17 in the BCM numbering that gpiozero uses), which is the one we will switch on and off in this experiment.

Code development

It is divided into three parts. The first is the audio recording, for which I took the code from geeksforgeeks, because I am not familiar with those libraries. The second is the conversion of the audio to text with Whisper, and the third is the processing of that text and the response on the Raspberry Pi.

In the test example I am only going to interact with an LED, making it light up or blink, but the script can be developed further to adapt it to our needs.

I know this is a Raspberry Pi 2 and that it will be much slower than a Raspberry Pi 4, but for testing it is fine.

Before you can get it working, you will need to install the following:

# Install Whisper
pip install git+https://github.com/openai/whisper.git
sudo apt update && sudo apt install ffmpeg

# For audio recording to work
python3 -m pip install sounddevice --user
pip install git+https://github.com/WarrenWeckesser/wavio.git
# scipy is imported by the script as well
python3 -m pip install scipy --user

# If you are installing it on the Raspberry Pi:
# grant permission to use the GPIO
sudo apt install python3-gpiozero
sudo usermod -aG gpio <username>

The complete code

#!/usr/bin/env python3
import whisper
import time
from gpiozero import LED
import sounddevice as sd
from scipy.io.wavfile import write
import wavio as wv

        
def main ():
    inicio = time.time()
    record_audio ()

    model = whisper.load_model("tiny")
    result = model.transcribe("audio1.wav")
    words = result["text"].split()

    for word in words:
        word = word.replace(',', '').replace('.', '').lower()
        # Membership test: `word == 'a' or 'b'` would be true for every word
        if word in ('enciende', 'encender'):
            encender()
            break
        if word in ('parpadea', 'parpadear'):
            parpadear()
            break
    fin = time.time()
    print(fin-inicio)

def encender ():
    LED(17).on()

def parpadear ():
    light = LED(17)
    while True:
        light.on()
        time.sleep(1)  # sleep() lives in the time module imported above
        light.off()
        time.sleep(1)

def record_audio ():
    # Sampling frequency
    freq = 44100
    # Recording duration
    duration = 5
    # Start recorder with the given values
    # of duration and sample frequency
    recording = sd.rec(int(duration * freq),
                    samplerate=freq, channels=2)
    # Record audio for the given number of seconds
    sd.wait()
    # This will convert the NumPy array to an audio
    # file with the given sampling frequency
    write("audio0.wav", freq, recording)
    # Convert the NumPy array to audio file
    wv.write("audio1.wav", recording, freq, sampwidth=2)
        
main ()


#Grant permission to use the GPIO
#sudo apt install python3-gpiozero
#sudo usermod -aG gpio <username>

#Install Whisper
#pip install git+https://github.com/openai/whisper.git
#sudo apt update && sudo apt install ffmpeg

I have not been able to test it because I do not have a microSD card for the Raspberry Pi, or a USB speaker to connect, but as soon as I try it I will fix any error that may easily have slipped in.

Step-by-step explanation of the code

#!/usr/bin/env python3

The shebang tells the system which language the script is written in and which interpreter to run it with, python3 in this case. Although it may seem pointless, leaving it out causes errors on many occasions when the script is executed directly.

The imported libraries

import whisper
import time
from gpiozero import LED
import sounddevice as sd
from scipy.io.wavfile import write
import wavio as wv

whisper to work with the model; time, because I use it to measure how long the script takes to run; gpiozero to work with the Raspberry Pi's GPIO pins; and sounddevice, scipy and wavio to record the audio.

Functions

I have created 4 functions:

  • main()
  • encender()
  • parpadear()
  • record_audio()

encender() simply puts voltage on pin 17 of the Raspberry Pi, where in this example we have connected an LED for testing.

def encender ():
    LED(17).on()

parpadear() is similar to encender(), but it makes the LED blink by switching it on and off inside a loop.

def parpadear ():
    light = LED(17)
    while True:
        light.on()
        time.sleep(1)  # sleep() lives in the time module imported above
        light.off()
        time.sleep(1)
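
Because this loop never ends, main() never reaches its final timing print once blinking starts. A minimal sketch of a bounded variant follows. The helper `blink_times` and the stand-in class `FakeLED` are assumptions made for illustration (neither is in the original script), so that the example runs without hardware. On a Raspberry Pi you would pass a gpiozero `LED(17)` instead; gpiozero's `LED` also offers its own `blink()` method, which could replace the loop.

```python
import time


class FakeLED:
    """Stand-in for gpiozero.LED so the sketch runs without a Raspberry Pi."""

    def __init__(self):
        self.history = []

    def on(self):
        self.history.append("on")

    def off(self):
        self.history.append("off")


def blink_times(light, times=3, interval=1.0, sleep=time.sleep):
    """Switch `light` on and off `times` times, waiting `interval` seconds per step."""
    for _ in range(times):
        light.on()
        sleep(interval)
        light.off()
        sleep(interval)


led = FakeLED()
blink_times(led, times=2, interval=0.01)
print(led.history)
```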

With record_audio() we record the audio file

def record_audio ():
    # Sampling frequency
    freq = 44100
    # Recording duration
    duration = 5
    # Start recorder with the given values
    # of duration and sample frequency
    recording = sd.rec(int(duration * freq),
                    samplerate=freq, channels=2)
    # Record audio for the given number of seconds
    sd.wait()
    # This will convert the NumPy array to an audio
    # file with the given sampling frequency
    write("audio0.wav", freq, recording)
    # Convert the NumPy array to audio file
    wv.write("audio1.wav", recording, freq, sampwidth=2)
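
To get a feel for what these parameters produce, here is a small sketch that uses only the standard library `wave` module, not the sounddevice and wavio calls of the script. It writes silence with the same settings and reads the header back. The file name `silence_demo.wav` is a made-up example.

```python
import os
import tempfile
import wave

freq = 44100      # samples per second, as in record_audio()
duration = 5      # seconds
channels = 2
sampwidth = 2     # bytes per sample (16-bit), the value passed to wavio

frames = duration * freq
payload_bytes = frames * channels * sampwidth

path = os.path.join(tempfile.gettempdir(), "silence_demo.wav")
with wave.open(path, "wb") as wav_file:
    wav_file.setnchannels(channels)
    wav_file.setsampwidth(sampwidth)
    wav_file.setframerate(freq)
    wav_file.writeframes(bytes(payload_bytes))  # all zeros = silence

with wave.open(path, "rb") as wav_file:
    read_frames = wav_file.getnframes()
    read_rate = wav_file.getframerate()

print(frames, payload_bytes, read_frames, read_rate)
```

Five seconds of 16-bit stereo at 44.1 kHz comes to a little under 1 MB. Whisper resamples its input to 16 kHz mono before transcribing, so a lower sample rate and a single channel would probably be enough for short commands.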

main() is the main function. Note that the only thing we have outside the functions is the call to main() at the end of the script. This way, when the script starts, it first imports the libraries and then makes the function call.

def main ():
    inicio = time.time()
    record_audio ()

    model = whisper.load_model("tiny")
    result = model.transcribe("audio1.wav")
    words = result["text"].split()

    for word in words:
        word = word.replace(',', '').replace('.', '').lower()
        # Membership test: `word == 'a' or 'b'` would be true for every word
        if word in ('enciende', 'encender'):
            encender()
            break
        if word in ('parpadea', 'parpadear'):
            parpadear()
            break
    fin = time.time()
    print(fin-inicio)

We store the time at which the function starts running and then call record_audio(), which records our command to an audio file (a .wav file in this script).

    inicio = time.time()
    record_audio ()
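
The timing pattern on its own, as a minimal sketch: `time.sleep(0.05)` stands in for the real work (recording plus transcription), and `time.perf_counter()` is used instead of `time.time()` because it is intended for measuring intervals. That swap is a suggestion, not what the script does.

```python
import time

start = time.perf_counter()
time.sleep(0.05)          # placeholder for record_audio() and the transcription
elapsed = time.perf_counter() - start

print(f"Elapsed: {elapsed:.2f} s")
```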

  

Once we have the audio, Whisper is called and we tell it which model we want to use. There are 5 available, and we will use tiny. Although it is the least accurate, it is the fastest, and the audio will be simple, only 3 or 4 words.

    model = whisper.load_model("tiny")
    result = model.transcribe("audio1.wav")
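
If you want to try other model sizes, the choice can be wrapped in a small function. This is only a sketch under assumptions: `transcribe_command` and `WHISPER_MODELS` are hypothetical names, the tuple holds the five size names listed in the Whisper README at the time of writing, and `language="es"` assumes the commands are spoken in Spanish, as the keywords in this script suggest.

```python
WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")


def transcribe_command(path, model_name="tiny", language="es"):
    """Transcribe a short audio file and return the recognised text."""
    if model_name not in WHISPER_MODELS:
        raise ValueError(
            f"Unknown model {model_name!r}; expected one of {WHISPER_MODELS}"
        )

    import whisper  # imported lazily, because loading it is slow on a Pi

    model = whisper.load_model(model_name)
    # language= skips automatic language detection; fp16=False avoids the
    # half-precision warning when running on a CPU
    result = model.transcribe(path, language=language, fp16=False)
    return result["text"]
```

Calling `transcribe_command("audio1.wav")` would then return the text for the recording made earlier. Larger models are more accurate but noticeably slower.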

  

With this we have the audio converted to text and stored in a variable. Let's process it a little.

We convert the result into a list containing each word of the audio

    words = result["text"].split()
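
What the split produces can be seen with a made-up transcription. The dictionary below is a hypothetical stand-in for Whisper's result (its text usually starts with a space, which split() discards).

```python
# Hypothetical transcription result, not real Whisper output
result = {"text": " Enciende la luz, por favor."}

words = result["text"].split()
print(words)  # ['Enciende', 'la', 'luz,', 'por', 'favor.']
```

Note that capital letters and punctuation stay attached to the words, which is why each word is cleaned up before it is compared.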

  

Now everything is ready to interact with our device. We just have to create the conditions we want.

If the audio contains word X, do Y. Since we have the words in a list, it is very easy to add conditions

    for word in words:
        word = word.replace(',', '').replace('.', '').lower()
        # Membership test: `word == 'a' or 'b'` would be true for every word
        if word in ('enciende', 'encender'):
            encender()
            break
        if word in ('parpadea', 'parpadear'):
            parpadear()
            break
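
One pitfall is worth spelling out when a word has to match several alternatives. In Python, `word == 'a' or 'b'` is parsed as `(word == 'a') or 'b'`, and a non-empty string always counts as true, so such a condition succeeds for every word. A membership test does what is intended. A short demonstration with a made-up word:

```python
word = "hola"  # made-up word that should NOT trigger anything

always_true = bool(word == "enciende" or "encender")   # truthy for any word
intended = word in ("enciende", "encender")             # True only on a match

print(always_true, intended)  # True False
```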

  

The line

         
        word = word.replace(',', '').replace('.', '').lower()


  

I use it to convert the words from the audio to lowercase and to remove commas and periods. That way we avoid errors in the comparisons.
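
The two replace() calls only cover commas and periods. If the transcription may also contain question or exclamation marks, a more general cleanup is possible. The sketch below assumes a hypothetical helper `normalize` built on `str.translate`, with the Spanish opening marks added to the standard punctuation set.

```python
import string

# ASCII punctuation plus the Spanish opening marks
PUNCTUATION = string.punctuation + "¿¡"
_STRIP_TABLE = str.maketrans("", "", PUNCTUATION)


def normalize(word):
    """Lowercase a word and drop any punctuation characters."""
    return word.translate(_STRIP_TABLE).lower()


print([normalize(w) for w in "¡Enciende, la luz!".split()])
# ['enciende', 'la', 'luz']
```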

In each if, when the word matches any of the words we have chosen, the function that does what we want is called.

This is where we tell it to activate the pin that lights the LED or makes it blink. It could equally run some other code, or shut down the computer.
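
As the number of commands grows, the chain of if statements can be replaced by a lookup table. A minimal sketch, assuming hypothetical names (`COMMANDS`, `handle`) and placeholder functions that only return a label instead of touching the GPIO:

```python
def encender():
    return "LED on"          # placeholder for LED(17).on()


def parpadear():
    return "LED blinking"    # placeholder for the blink loop


# Every accepted spelling points at the function to run
COMMANDS = {
    "enciende": encender,
    "encender": encender,
    "parpadea": parpadear,
    "parpadear": parpadear,
}


def handle(words):
    """Run the action for the first recognised word; return None if none matches."""
    for word in words:
        action = COMMANDS.get(word.replace(",", "").replace(".", "").lower())
        if action is not None:
            return action()
    return None


print(handle(["Por", "favor,", "enciende", "la", "luz."]))  # LED on
```

Adding a new command then means adding one function and one or two dictionary entries, without touching the loop.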

All of this is a basic idea. From here you can develop the project and improve it however you like. Each person can find a different use for it.

Things we can do with this setup

These are ideas that come to mind for taking advantage of this setup. Once the skeleton is in place, we can use it to trigger by voice anything we can think of: we can activate a relay that starts a motor, or we can launch a script that runs a program, sends an email or does anything else.
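
Launching another program from a recognised command can be sketched with the standard `subprocess` module. The helper name `run_script` is an assumption for illustration, and the child process here is just the Python interpreter printing a line, standing in for whatever script you want to launch.

```python
import subprocess
import sys


def run_script(args):
    """Run a command given as a list of arguments and return its text output."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout.strip()


# Stand-in for a real script: ask the current interpreter to print a message
output = run_script([sys.executable, "-c", "print('motor started')"])
print(output)  # motor started
```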

What is Whisper

Whisper is a speech recognition model. It is multilingual, working with a large number of languages, and it can also translate into English. It is what we know as a speech-to-text tool, but this one is open source, released by the OpenAI team, the creators of DALL-E.
