Voice control on PC and Raspberry Pi with Whisper

Voice control on the PC and the Raspberry Pi

The idea of the project is to give voice instructions to interact with our PC or our Raspberry Pi, using the voice-to-text model Whisper.

We give a command that is recorded, converted to text with Whisper, and then analyzed to execute the appropriate action, which can range from running a program to applying voltage to the Raspberry Pi pins.

I am going to use an old Raspberry Pi 2 and a USB microphone, together with the voice-to-text model released by OpenAI, Whisper. At the end of the article you can read a bit more about Whisper.

Everything is programmed in Python.

I leave you a demonstration of how it works in this video, controlling the PC by voice.

Assembly

To use it with the PC, we only need a microphone.

If you are going to set it up on the Raspberry Pi, you need a USB microphone, because the jack it has is output only.

Although the main purpose of the setup is voice recognition, I find it very interesting to integrate it into the operation of other devices.

We need:

  • USB microphone
  • Raspberry Pi with an operating system (Raspbian, for example)
  • Electronics (LED, wires, 480 ohm resistor and breadboard)

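We connect the LED to GPIO 17, which is the pin we will turn on and off in this demonstration. Note that gpiozero uses Broadcom GPIO numbering by default, so this is GPIO 17 and not physical pin 17 on the header.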

Code development

It is divided into three parts. The first is the audio recording, for which I took code from geeksforgeeks because I do not know those libraries. The second is the conversion of the audio to text with Whisper, and the third is the processing of that text and the response on the Raspberry Pi.

In the test example I am only going to interact with an LED, making it light up or blink, but we could develop the script to fit our needs.

I am aware that this is a Raspberry Pi 2 and that it will be much slower than a Raspberry Pi 4, but for testing it is fine.

Before it can work, you need to install the following:

# Install Whisper
pip install git+https://github.com/openai/whisper.git
sudo apt update && sudo apt install ffmpeg

# Needed for the audio recording to work
python3 -m pip install sounddevice --user
python3 -m pip install scipy --user   # the script imports scipy.io.wavfile
pip install git+https://github.com/WarrenWeckesser/wavio.git

# If you are going to install it on the Raspberry Pi:
# grant permission to use the GPIO
sudo apt install python3-gpiozero
sudo usermod -aG gpio <username>

The full code

#!/usr/bin/env python3
import whisper
import time
from gpiozero import LED
import sounddevice as sd
from scipy.io.wavfile import write
import wavio as wv

        
def main ():
    inicio = time.time()
    record_audio ()

    model = whisper.load_model("tiny")
    result = model.transcribe("audio1.wav")
    words = result["text"].split()

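    for word in words:
        # Normalize: strip commas and periods, convert to lowercase
        word = word.replace(',', '').replace('.', '').lower()
        # `word == 'a' or 'b'` is always true, so test membership instead
        if word in ('enciende', 'encender'):
            encender()
            break
        if word in ('parpadea', 'parpadear'):
            parpadear()
            break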
    fin = time.time()
    print(fin-inicio)

def encender ():
    LED(17).on()

def parpadear ():
    light = LED(17)
    while True:
        light.on()
        time.sleep(1)  # only the time module is imported, so qualify sleep
        light.off()
        time.sleep(1)

def record_audio ():
    # Sampling frequency
    freq = 44100
    # Recording duration
    duration = 5
    # Start recorder with the given values
    # of duration and sample frequency
    recording = sd.rec(int(duration * freq),
                    samplerate=freq, channels=2)
    # Record audio for the given number of seconds
    sd.wait()
    # This will convert the NumPy array to an audio
    # file with the given sampling frequency
    write("audio0.wav", freq, recording)
    # Convert the NumPy array to audio file
    wv.write("audio1.wav", recording, freq, sampwidth=2)
        
main ()


# Grant permission to use the GPIO
# sudo apt install python3-gpiozero
# sudo usermod -aG gpio <username>

# Install Whisper
# pip install git+https://github.com/openai/whisper.git
# sudo apt update && sudo apt install ffmpeg

I have not been able to test it on the Raspberry Pi because I do not have a microSD card for it, nor a USB microphone to connect, but as soon as I test it I will correct any error that may have slipped in.

Step-by-step explanation of the code

#!/usr/bin/env python3

The shebang tells the system which language we have programmed in and which interpreter to use. Although it looks trivial, leaving it out causes errors on many occasions.

Imported libraries

import whisper
import time
from gpiozero import LED
import sounddevice as sd
from scipy.io.wavfile import write
import wavio as wv

whisper to work with the model; time, because I use it to measure how long the script takes to run; gpiozero to work with the GPIO pins of the Raspberry Pi; and sounddevice, scipy and wavio to record the audio.

Functions

I have created 4 functions:

  • main()
  • encender()
  • parpadear()
  • record_audio()

encender() simply applies voltage to GPIO 17 of the Raspberry Pi, where in this case we have connected the LED for testing.

def encender ():
    LED(17).on()

parpadear() is similar to encender(), but it makes the LED blink by switching it on and off inside a loop.

def parpadear ():
    light = LED(17)
    while True:
        light.on()
        time.sleep(1)  # only the time module is imported, so qualify sleep
        light.off()
        time.sleep(1)
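
Because parpadear() loops forever, main() never reaches its final timing print once the blink command is recognized. A minimal sketch of a bounded variant follows. The name blink_times, its parameters and the FakeLight class are hypothetical and not part of the original script; the function accepts any object with on() and off() methods, so it can be tried without hardware.

```python
import time


def blink_times(light, times=5, interval=1.0, sleep=time.sleep):
    """Blink `light` a fixed number of times, then return.

    `light` is any object with on() and off() methods (for example a
    gpiozero LED). `sleep` is injectable so the function can be tried
    without real delays. All names here are illustrative.
    """
    for _ in range(times):
        light.on()
        sleep(interval)
        light.off()
        sleep(interval)


class FakeLight:
    """Stand-in for an LED that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def on(self):
        self.calls.append("on")

    def off(self):
        self.calls.append("off")


if __name__ == "__main__":
    fake = FakeLight()
    blink_times(fake, times=2, interval=0.1)
    print(fake.calls)
```

On the Raspberry Pi the call would look like blink_times(LED(17), times=5). gpiozero's LED class also has its own blink() method, with on_time, off_time and n parameters, which may be enough for simple cases.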

With record_audio() we record the audio file.

def record_audio ():
    # Sampling frequency
    freq = 44100
    # Recording duration
    duration = 5
    # Start recorder with the given values
    # of duration and sample frequency
    recording = sd.rec(int(duration * freq),
                    samplerate=freq, channels=2)
    # Record audio for the given number of seconds
    sd.wait()
    # This will convert the NumPy array to an audio
    # file with the given sampling frequency
    write("audio0.wav", freq, recording)
    # Convert the NumPy array to audio file
    wv.write("audio1.wav", recording, freq, sampwidth=2)
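
When the microphone is not set up correctly, the recording can come out empty or shorter than expected, so it can help to inspect the file before passing it to Whisper. The sketch below is an assumption-based check using only the standard wave module. The names wav_info and write_silence are hypothetical, and the check is meant for audio1.wav, which is written as 16-bit PCM; the standard module reads only uncompressed PCM, so it may reject audio0.wav if scipy stored it as floating point.

```python
import wave


def wav_info(path):
    """Return (channels, sample_rate, duration_seconds) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return channels, rate, duration


def write_silence(path, seconds=1, rate=44100, channels=2):
    """Write a silent 16-bit PCM WAV file, only for trying wav_info()."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * rate * seconds)


if __name__ == "__main__":
    write_silence("silence_demo.wav")
    print(wav_info("silence_demo.wav"))
```

After a real recording, wav_info("audio1.wav") should report the 2 channels, the 44100 Hz rate and roughly the 5 seconds requested in record_audio().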

main() is the main function. Note that the only thing we have outside the functions is the call to main() at the end of the script. This way, on startup the script imports the libraries and then makes the function call.

def main ():
    inicio = time.time()
    record_audio ()

    model = whisper.load_model("tiny")
    result = model.transcribe("audio1.wav")
    words = result["text"].split()

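    for word in words:
        # Normalize: strip commas and periods, convert to lowercase
        word = word.replace(',', '').replace('.', '').lower()
        # `word == 'a' or 'b'` is always true, so test membership instead
        if word in ('enciende', 'encender'):
            encender()
            break
        if word in ('parpadea', 'parpadear'):
            parpadear()
            break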
    fin = time.time()
    print(fin-inicio)

We save the time at which the function starts executing and then call the audio recording function, which records our instruction to a .wav, .mp3, etc. file that we will then convert to text.

    inicio = time.time()
    record_audio ()
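
On a slow board such as the Raspberry Pi 2 it can be useful to know how the total time splits between recording and transcription. A minimal sketch, assuming we want one measurement per stage: the helper name timed and the dictionary of results are hypothetical, and the sleep calls only stand in for the real functions.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the elapsed seconds of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


if __name__ == "__main__":
    timings = {}
    with timed("record", timings):
        time.sleep(0.1)  # stand-in for record_audio()
    with timed("transcribe", timings):
        time.sleep(0.2)  # stand-in for model.transcribe(...)
    for label, seconds in timings.items():
        print(f"{label}: {seconds:.2f} s")
```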

  

Once we have the audio, Whisper is called and we tell it which model we want to use. There are 5 sizes available (tiny, base, small, medium and large), and we will use tiny: although it is the least accurate, it is the fastest, and the audio will be simple, only 3 or 4 words.

    model = whisper.load_model("tiny")
    result = model.transcribe("audio1.wav")
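
Since the commands in this project are spoken in Spanish, one possible refinement is to pass the language to transcribe() so Whisper does not have to detect it. The sketch below assumes that situation. The wrapper name transcribe_words and the StubModel class are hypothetical; the stub exists only so the function can be tried without Whisper installed.

```python
def transcribe_words(model, audio_path, language="es"):
    """Return the transcript of `audio_path` as a list of words.

    `model` is expected to behave like the object returned by
    whisper.load_model(): it offers transcribe(path, **options) and
    returns a dict with a "text" key.
    """
    # language= skips automatic language detection; fp16=False avoids the
    # half-precision warning on CPU-only machines such as a Raspberry Pi.
    result = model.transcribe(audio_path, language=language, fp16=False)
    return result["text"].split()


class StubModel:
    """Fake model used to exercise the function without Whisper."""

    def transcribe(self, audio_path, **options):
        return {"text": " Enciende la luz."}


if __name__ == "__main__":
    print(transcribe_words(StubModel(), "audio1.wav"))
```

With the real library the call would be transcribe_words(whisper.load_model("tiny"), "audio1.wav").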

  

With this, we have converted the audio to text and stored it in a variable. Let's modify it a little.

We convert the result into a list containing each of the words in the audio.

     words = result["text"].split()

  

Now everything is ready to interact with our device. We only have to create the conditions we want.

If the audio contains word X, do Y. Since the words are in a list, it is very easy to add conditions.

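    for word in words:
        # Normalize: strip commas and periods, convert to lowercase
        word = word.replace(',', '').replace('.', '').lower()
        # `word == 'a' or 'b'` is always true, so test membership instead
        if word in ('enciende', 'encender'):
            encender()
            break
        if word in ('parpadea', 'parpadear'):
            parpadear()
            break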
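
As the number of commands grows, a chain of if statements gets long. One alternative, sketched here as an assumption rather than as the article's method, is a dictionary that maps each spoken word to the function it should trigger. The names find_action and COMMANDS are hypothetical, and the two action functions are placeholders that return a string instead of touching the GPIO.

```python
def find_action(words, commands):
    """Return the action for the first recognized word, or None."""
    for word in words:
        action = commands.get(word)
        if action is not None:
            return action
    return None


def encender():
    return "LED on"  # placeholder for the real GPIO call


def parpadear():
    return "LED blinking"  # placeholder for the real GPIO call


# Each spoken variant points to the function it should trigger.
COMMANDS = {
    "enciende": encender,
    "encender": encender,
    "parpadea": parpadear,
    "parpadear": parpadear,
}

if __name__ == "__main__":
    action = find_action(["por", "favor", "enciende", "la", "luz"], COMMANDS)
    if action is not None:
        print(action())
```

Adding a new voice command then means adding one entry to COMMANDS instead of another if block. The words are assumed to be already lowercased and stripped of punctuation.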

  

The line

         
        word = word.replace(',', '').replace('.', '').lower()


  

I use it to convert the words in the audio to lowercase and to remove commas and periods. This avoids errors in the comparisons.
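
Whisper may also return other punctuation, such as exclamation or question marks, which the two replace() calls do not cover. A minimal sketch of a more general cleanup, assuming Spanish commands: the function name normalize and the PUNCTUATION constant are hypothetical.

```python
import string

# Standard ASCII punctuation plus the Spanish opening marks.
PUNCTUATION = string.punctuation + "¡¿"
_STRIP_TABLE = str.maketrans("", "", PUNCTUATION)


def normalize(word):
    """Lowercase a word and remove any punctuation characters."""
    return word.translate(_STRIP_TABLE).lower()


if __name__ == "__main__":
    print([normalize(w) for w in "¡Enciende, la luz!".split()])
```

Accented letters are left untouched, so the keywords being compared need to be written with the same accents Whisper produces.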

In each if, when the condition of finding one of the words we have chosen is met, a function is called that does what we want.

This is where we tell it to activate a pin that lights an LED or makes it blink. It could also run some code, or shut down the computer.
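
For actions on the PC side, such as running a program, the standard subprocess module is one option. The sketch below is illustrative only: the helper name run_command is hypothetical, and the shutdown command is an assumption for a Linux machine that is defined but never executed here.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of arguments and return its stdout."""
    completed = subprocess.run(
        args, capture_output=True, text=True, check=True
    )
    return completed.stdout


# Hypothetical command for shutting down a Linux machine. It needs the
# appropriate privileges and is only defined here, not executed.
SHUTDOWN = ["sudo", "shutdown", "-h", "now"]

if __name__ == "__main__":
    # Harmless demonstration: run the current Python interpreter.
    print(run_command([sys.executable, "-c", "print('hola')"]))
```

Passing the command as a list, without a shell, keeps any recognized text from being interpreted as shell syntax.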

All of this is a basic idea. From here you can develop the project and improve it as you wish. Each person can find a different use for it.

Things we can do with this setup

These are some ideas that occur to me for taking advantage of this setup. Once the skeleton is in place, we can use it to activate by voice anything that comes to mind: we can switch a relay that starts a motor, or launch a script that runs a program, sends an email, or whatever else.

What is Whisper

Whisper is a speech recognition model. It is multilingual, works with a large number of languages, and can translate into English. It is what we know as a voice-to-text tool, but this one is open source, released by the OpenAI team, the creators of GPT and DALL-E.
