The idea of this project is to give voice commands to interact with our PC or our Raspberry Pi, using OpenAI's Whisper speech-to-text model.
We say a command, which is recorded, transcribed to text with Whisper and then analyzed to carry out the appropriate action. That action can be anything from running a program to supplying voltage to the Raspberry Pi's pins.
I will use an old Raspberry Pi 2 and a USB microphone, together with Whisper, the speech-to-text model released by OpenAI. At the end of the article you can read more about Whisper.
Everything is programmed in Python.
In this video you can see a demonstration of how it works, controlling the PC by voice.
Setup
To use it with the PC, all we need is a microphone.
If you are going to set it up on the Raspberry Pi, you need a USB microphone, because the Pi's audio jack is output only.
Although the main purpose of the build is voice recognition, I think it is a great idea to combine it with other devices.
We need:
- USB microphone
- Raspberry Pi with an operating system (Raspbian, for example)
- Electronics (LED, wires, 480 ohm resistor and breadboard)
We connect the LED to pin 17, which is the one we will turn on and off for this test. This is GPIO17 in the BCM numbering that gpiozero uses, which is physical pin 11 on the header.
Code development
The code is divided into three parts:
1. Audio recording. I took this code from GeeksforGeeks, since I was not familiar with those libraries.
2. Conversion of the audio to text with Whisper.
3. Processing that text and responding on the Raspberry Pi.
In the test example I am only going to interact with an LED, turning it on or making it blink. The script can be extended to suit our needs.
I know this is a Raspberry Pi 2 and it will be slower than a Raspberry Pi 4, but it is fine for testing.
Before it can work, you need to install the following:
# Install Whisper
pip install git+https://github.com/openai/whisper.git
sudo apt update && sudo apt install ffmpeg
# Needed for audio recording (sounddevice needs the PortAudio library on Linux)
sudo apt install libportaudio2
python3 -m pip install sounddevice --user
python3 -m pip install scipy --user
pip install git+https://github.com/WarrenWeckesser/wavio.git
# If you are installing it on the Raspberry Pi,
# grant permissions to use the GPIO
sudo apt install python3-gpiozero
sudo usermod -aG gpio <username>
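Before running the script, it can help to confirm that everything above is actually installed. The following is a minimal sketch that is not part of the original project. It uses only the standard library: `importlib.util.find_spec` looks for each Python module, and `shutil.which` looks for the `ffmpeg` binary that Whisper relies on to decode audio. The module list simply mirrors the script's imports, and the function name `check_dependencies` is hypothetical.

```python
import importlib.util
import shutil

# Python modules imported by the voice-control script (mirrors its imports)
REQUIRED_MODULES = ["whisper", "sounddevice", "scipy", "wavio", "gpiozero"]
# External programs the script relies on; Whisper uses ffmpeg to decode audio
REQUIRED_PROGRAMS = ["ffmpeg"]


def check_dependencies(modules=REQUIRED_MODULES, programs=REQUIRED_PROGRAMS):
    """Return a list describing what is missing (an empty list means all found)."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        if importlib.util.find_spec(name) is None:
            missing.append(f"python module: {name}")
    for program in programs:
        # which returns None when the executable is not on the PATH
        if shutil.which(program) is None:
            missing.append(f"program: {program}")
    return missing


if __name__ == "__main__":
    problems = check_dependencies()
    if problems:
        print("Missing dependencies:")
        for item in problems:
            print("  -", item)
    else:
        print("All dependencies found.")
```

On a PC, `gpiozero` will normally show up as missing. That is expected, because it is only needed on the Raspberry Pi.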
The complete code
#!/usr/bin/env python3
import time

import whisper
import sounddevice as sd
import wavio as wv
from gpiozero import LED
from scipy.io.wavfile import write


def main():
    inicio = time.time()  # start time, used to measure the total run time
    record_audio()
    model = whisper.load_model("tiny")
    result = model.transcribe("audio1.wav")
    words = result["text"].split()
    for word in words:
        # Lowercase and drop commas/periods so the comparison is reliable
        word = word.replace(',', '').replace('.', '').lower()
        # Membership test: "word == 'a' or 'b'" would always be true
        if word in ('enciende', 'encender'):
            encender()
            break
        if word in ('parpadea', 'parpadear'):
            parpadear()  # loops forever; stop the script with Ctrl+C
            break
    fin = time.time()
    print(fin - inicio)


def encender():
    # Supply voltage to GPIO17, where the LED is connected.
    # gpiozero may release the pin when the script exits, turning the LED off.
    LED(17).on()


def parpadear():
    light = LED(17)
    # Turn the LED on and off every second, forever
    while True:
        light.on()
        time.sleep(1)
        light.off()
        time.sleep(1)


def record_audio():
    # Sampling frequency
    freq = 44100
    # Recording duration in seconds
    duration = 5
    # Start recorder with the given duration and sample frequency
    # (use channels=1 if your USB microphone is mono)
    recording = sd.rec(int(duration * freq), samplerate=freq, channels=2)
    # Wait until the recording is finished
    sd.wait()
    # Save the NumPy array as a WAV file (float samples)
    write("audio0.wav", freq, recording)
    # Save it again as a 16-bit WAV; this is the file Whisper transcribes
    wv.write("audio1.wav", recording, freq, sampwidth=2)


main()

# Grant permissions to use the GPIO
# sudo apt install python3-gpiozero
# sudo usermod -aG gpio <username>
# Install Whisper
# pip install git+https://github.com/openai/whisper.git
# sudo apt update && sudo apt install ffmpeg
I have not been able to test it on the Raspberry Pi, because I have neither a microSD card for it nor a USB microphone to connect. As soon as I test it, I will fix any errors that may have slipped in.
Step-by-step explanation of the code
#!/usr/bin/env python3
The shebang tells the shell which interpreter should run the script when the file is executed directly (for example with ./script.py after chmod +x). Although it is a small detail, leaving it out often causes errors.
Imported libraries
import whisper
import time
from gpiozero import LED
import sounddevice as sd
from scipy.io.wavfile import write
import wavio as wv
- whisper, to work with the model.
- time, because I use it to measure how long the script takes to run.
- gpiozero, to work with the Raspberry Pi's GPIO pins.
- sounddevice, scipy and wavio, to record the audio.
Functions
I have created 4 functions:
- main()
- encender() ("turn on")
- parpadear() ("blink")
- record_audio()
encender() simply supplies voltage to pin 17 of the Raspberry Pi, which is where we connected the LED for this test.
def encender():
    # Supply voltage to GPIO17, where the LED is connected
    LED(17).on()
parpadear() is like encender(), but it makes the LED blink by turning it on and off inside a loop.
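def parpadear():
    light = LED(17)
    # Turn the LED on and off every second, forever
    # (sleep comes from the time module imported at the top)
    while True:
        light.on()
        time.sleep(1)
        light.off()
        time.sleep(1)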
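Because the `while True` loop never returns, the `break` after `parpadear()` in `main()` is never reached, and the script has to be stopped with Ctrl+C. Below is a sketch of a bounded alternative. This is my own suggestion, not the original code. `blink_n` and `PrintLED` are hypothetical names. `blink_n` takes any object with `on()` and `off()` methods (such as a `gpiozero.LED`) and blinks it a fixed number of times, so the logic can also be tried on a PC without GPIO pins. gpiozero's own `LED.blink()` method is another option if you prefer to let the library handle the timing in the background.

```python
import time


def blink_n(led, times=5, interval=1.0, sleep=time.sleep):
    """Blink `led` a fixed number of times instead of looping forever.

    `led` is any object with on() and off() methods, e.g. gpiozero.LED(17).
    `sleep` is injectable so the function can be exercised without waiting.
    """
    for _ in range(times):
        led.on()
        sleep(interval)
        led.off()
        sleep(interval)


class PrintLED:
    """Stand-in for gpiozero.LED on a machine without GPIO pins."""

    def __init__(self):
        self.events = []  # record of calls, handy for checking the sequence

    def on(self):
        self.events.append("on")
        print("LED on")

    def off(self):
        self.events.append("off")
        print("LED off")
```

On the Raspberry Pi you would call `blink_n(LED(17), times=5)`. On a PC, `blink_n(PrintLED(), times=3, interval=0.2)` prints the on/off sequence instead.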
With record_audio() we record the audio file.
def record_audio():
    # Sampling frequency
    freq = 44100
    # Recording duration in seconds
    duration = 5
    # Start recorder with the given duration and sample frequency
    # (use channels=1 if your USB microphone is mono)
    recording = sd.rec(int(duration * freq), samplerate=freq, channels=2)
    # Wait until the recording is finished
    sd.wait()
    # Save the NumPy array as a WAV file (float samples)
    write("audio0.wav", freq, recording)
    # Save it again as a 16-bit WAV; this is the file Whisper transcribes
    wv.write("audio1.wav", recording, freq, sampwidth=2)
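The script writes the same recording twice: `audio0.wav` with SciPy and `audio1.wav` with wavio using 16-bit samples (`sampwidth=2`). Only `audio1.wav` is passed to Whisper. If you want to drop the extra dependencies, the standard-library `wave` module can also write a 16-bit PCM file. The sketch below is an alternative I am suggesting, not the original code. `save_wav_16bit` is a hypothetical helper that assumes float samples in the range [-1, 1], which is what `sd.rec` returns with its default `float32` dtype.

```python
import array
import sys
import wave


def save_wav_16bit(path, samples, rate=44100, channels=1):
    """Write float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    With more than one channel, `samples` must be interleaved
    (left, right, left, right, ...).
    """
    pcm = array.array("h")  # signed 16-bit integers
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to the valid range
        pcm.append(int(round(s * 32767)))  # scale to the int16 range
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(pcm.tobytes())
```

With the `recording` array from `sd.rec` (shape frames × channels), a call such as `save_wav_16bit("audio1.wav", recording.ravel(), rate=freq, channels=recording.shape[1])` would produce an equivalent file. `ravel()` flattens the rows, which gives the interleaved order.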
main() is the main function. Note that the only thing outside the functions is the call to main() at the end of the script. When the script starts, it imports the libraries and then calls this function.
def main():
    inicio = time.time()
    record_audio()
    model = whisper.load_model("tiny")
    result = model.transcribe("audio1.wav")
    words = result["text"].split()
    for word in words:
        word = word.replace(',', '').replace('.', '').lower()
        # Membership test: "word == 'a' or 'b'" would always be true
        if word in ('enciende', 'encender'):
            encender()
            break
        if word in ('parpadea', 'parpadear'):
            parpadear()  # loops forever; stop the script with Ctrl+C
            break
    fin = time.time()
    print(fin - inicio)
We store the time at which the function starts running. Then we call the audio recording function, which saves our spoken instruction to an audio file (.wav here; Whisper also accepts .mp3 and other formats) that we will convert to text.
inicio = time.time()
record_audio()
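`time.time()` works for a rough measurement, but it follows the system clock. That clock can jump when it is adjusted, for example when a Raspberry Pi without a real-time clock syncs over the network after booting. `time.perf_counter()` is the clock intended for measuring durations. The sketch below wraps it in a small context manager, `stopwatch`, which is a hypothetical helper and not part of the original script. It lets each stage be timed separately.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label="elapsed", report=print):
    """Measure the time spent inside a `with` block using perf_counter."""
    result = {}
    start = time.perf_counter()
    try:
        yield result  # the caller can read result["seconds"] afterwards
    finally:
        result["seconds"] = time.perf_counter() - start
        report(f"{label}: {result['seconds']:.2f} s")
```

For example, `with stopwatch("transcription"): result = model.transcribe("audio1.wav")` would show whether recording or transcription takes most of the time on the Pi 2.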
Once we have the audio, we call Whisper and tell it which model we want to use. There are five sizes available (tiny, base, small, medium and large). We will use the smallest, tiny. It is the least accurate, but it is the fastest, and the audio is simple: only 3 or 4 words.
model = whisper.load_model("tiny")
result = model.transcribe("audio1.wav")
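Since the commands are in Spanish, it may help to tell Whisper the language instead of letting it detect it from a few seconds of audio. On a CPU-only board such as the Raspberry Pi, passing `fp16=False` also avoids the warning Whisper prints when half precision is not available. `transcribe()` accepts decoding options such as `language`, `task` (`"transcribe"`, or `"translate"` to translate into English) and `fp16` as keyword arguments. The helper below, `transcribe_options` (a hypothetical name), just builds that dictionary. Treat it as a sketch, and check the option names against the Whisper version you install.

```python
def transcribe_options(language="es", translate=False, on_cpu=True):
    """Build keyword arguments for whisper's model.transcribe().

    language: code of the spoken language ("es" for Spanish), or None
              to let Whisper detect it.
    translate: if True, ask Whisper to translate the speech into English.
    on_cpu: if True, disable half precision (fp16), which the CPU path does not use.
    """
    options = {"task": "translate" if translate else "transcribe"}
    if language is not None:
        options["language"] = language
    if on_cpu:
        options["fp16"] = False
    return options
```

In the script, the call would become `result = model.transcribe("audio1.wav", **transcribe_options())`.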
With this, we have converted the audio to text and stored it in a variable. Now let's process it a little.
We split the result into a list containing each word of the audio.
words = result["text"].split()
Everything is now ready to interact with our device. All that remains is to create the conditions we want.
If word X appears in the audio, do Y. Since the words are in a list, adding conditions is very easy.
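for word in words:
    word = word.replace(',', '').replace('.', '').lower()
    # "word == 'enciende' or 'encender'" would always be true, because the
    # non-empty string 'encender' is truthy on its own; test membership instead
    if word in ('enciende', 'encender'):
        encender()
        break
    if word in ('parpadea', 'parpadear'):
        parpadear()
        break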
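With more than a couple of commands, a chain of `if` statements grows quickly. One alternative is a dictionary that maps each keyword to the function it triggers, which keeps the loop the same size however many commands you add. This is a sketch of that approach, not the original code. `COMMANDS` and `dispatch` are hypothetical names, and the `encender`/`parpadear` defined here are print-only stand-ins so the logic can be tried on a PC.

```python
def encender():
    print("LED on")  # stand-in; on the Pi this would call LED(17).on()


def parpadear():
    print("LED blinking")  # stand-in for the blinking function


# Every accepted spelling of a command maps to the function it should run
COMMANDS = {
    "enciende": encender,
    "encender": encender,
    "parpadea": parpadear,
    "parpadear": parpadear,
}


def dispatch(words, commands=COMMANDS):
    """Run the action for the first known command word and return that word.

    Returns None if no command word was found.
    """
    for word in words:
        word = word.replace(",", "").replace(".", "").lower()
        action = commands.get(word)
        if action is not None:
            action()
            return word
    return None
```

In `main()`, the whole `for` loop would then reduce to `dispatch(result["text"].split())`.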
The line
word = word.replace(',', '').replace('.', '').lower()
I use it to convert the words of the audio to lowercase and remove commas and periods. This way the comparison does not fail because of capitalization or punctuation.
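Whisper's output may contain more punctuation than commas and periods. In Spanish it can include exclamation and question marks, including the inverted `¡` and `¿`, and `"¡Enciende!"` would not match `'enciende'` with only the two `replace()` calls. A slightly more general version strips all punctuation from both ends of each word. This is a sketch under that assumption, and `normalize`/`normalize_text` are hypothetical helpers.

```python
import string

# ASCII punctuation plus the Spanish inverted marks and a few common quotes
PUNCTUATION = string.punctuation + "¡¿«»“”…"


def normalize(word):
    """Lowercase `word` and strip punctuation from both of its ends."""
    return word.strip(PUNCTUATION).lower()


def normalize_text(text):
    """Split a transcription into normalized words, dropping empty ones."""
    return [w for w in (normalize(t) for t in text.split()) if w]
```

Only the ends of each word are stripped, so accented letters such as `á` or `ñ` are kept intact.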
Inside each if, when one of the words we chose is found, a function is called to do whatever we want.
This is where we tell it to activate a pin to light an LED or make it blink, to run some code, or to shut down the computer.
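The same pattern works for actions other than GPIO: a command word can launch another program through the standard `subprocess` module. The sketch below is an illustration, not part of the original project. `run_program` is a hypothetical helper, and the command you map to it is up to you (for example `["sudo", "shutdown", "-h", "now"]` to turn the computer off, which requires the appropriate permissions). Passing the command as a list, without `shell=True`, avoids shell-quoting problems.

```python
import subprocess


def run_program(command, timeout=30):
    """Run `command` (a list of strings) and return (exit code, stdout).

    Raises subprocess.TimeoutExpired if it takes longer than `timeout` seconds.
    """
    completed = subprocess.run(
        command,
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as text
        timeout=timeout,
    )
    return completed.returncode, completed.stdout
```

For a harmless test, `run_program([sys.executable, "-c", "print('hola')"])` runs a one-line Python program with the current interpreter.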
All of this is a basic idea. From here you can develop the project and improve it however you like; everyone may find a different use for it.
What we can do with this setup
These are some ideas that occur to me for using this setup. Once the skeleton is solid, we can use it to trigger anything we can think of by voice. We can activate a relay to start a motor, or launch a script that runs a program, sends an email, and so on.
What is Whisper
Whisper is a speech recognition model. It is multilingual, works with a large number of languages and can translate speech into English. It is what we know as a speech-to-text tool, but this one is open source, released by OpenAI, the creators of DALL·E.