How to convert tables from PDF to Excel or CSV with Tabula

Converting PDF to CSV and Excel

Looking at the historical data published by a meteorological observatory in my city, I see that they only offer it as charts and as a PDF download. I don't understand why they don't let you download it as CSV, which would be far more useful for everyone.

So I have been looking for a way to move these tables from PDF to CSV or, for anyone who prefers it, to Excel or LibreOffice format. I like CSV because you can do anything with it: process it with Python and its libraries, or import it easily into any spreadsheet.
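As a small illustration of why CSV is so convenient, here is a minimal sketch that uses only Python's standard csv module. The column names and values are made up for the example and are not taken from the observatory's data; the sketch assumes Python 3.

```python
import csv
import io

# Hypothetical sample data; a real file would be opened with open("out.csv", newline="")
sample = "date,temp_max,temp_min\n2020-01-01,14.2,5.1\n2020-01-02,15.0,6.3\n"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(sample)
# Values come back as strings, so convert before comparing numerically
warmest = max(rows, key=lambda row: float(row["temp_max"]))
print(warmest["date"], warmest["temp_max"])
```

The same rows could be loaded into a spreadsheet unchanged, which is the flexibility a PDF does not give you.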

Since the goal is an automated process, what I want is a script I can run with Python, and that is where Tabula comes in.

Convert PDF to CSV with Tabula

The steps are very simple. The first is to install the Tabula library in our development environment. Tabula lets us extract the data from tables in a PDF into DataFrames from pandas, a Python library designed for working with tabular data such as CSV files and arrays.

It can also extract PDF tables and convert them to JSON, CSV and TSV. A gem. You can find much more information in its GitHub repository.

I have taken advantage of the work from a few days ago and installed it in Anaconda. In the link you can see how to install Anaconda.

Installing Tabula

# first activate the development environment; in our case it is called comparador
conda activate comparador
pip install tabula-py

When I ran it, it gave me an error.

The solution, as shown in their documentation, was to uninstall the old tabula package and install tabula-py in its place.

pip uninstall tabula
pip install tabula-py
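If you are not sure which of the two packages pip has installed, a quick check can help. This is only a sketch: `installed_version` is a hypothetical helper name, and it assumes Python 3.8 or later, where `importlib.metadata` is part of the standard library.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "tabula" and "tabula-py" are the two distribution names from the pip commands above
for name in ("tabula", "tabula-py"):
    print(name, "->", installed_version(name) or "not installed")
```

After the fix, you would expect tabula to show as not installed and tabula-py to show a version number.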

Creating the .py file

Reading tables from a PDF into a CSV

I create a script that I call pdftocsv.py and put it in my Descargas/eltiempo folder (Descargas is my Downloads folder). The file contains the following code:

import tabula
# Extract the tables from the PDF; recent tabula-py versions return a list of DataFrames
dfs = tabula.read_pdf("inforatge.pdf")
# Write the first table to out.csv, encoded as UTF-8 (sep='\t' makes it tab-separated)
dfs[0].to_csv('out.csv', sep='\t', encoding='utf-8')

The PDF to read is called inforatge.pdf, and I tell the script that the output file is called out.csv. It will be written to the folder we are working in.
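A PDF can contain more than one table, and depending on the tabula-py version `read_pdf` may hand back a list of DataFrames instead of a single one. The sketch below writes each table to its own file. `write_tables` is a hypothetical helper, not part of tabula-py, and it accepts any objects with a pandas-style `to_csv` method.

```python
def write_tables(tables, stem="out"):
    """Write each table to <stem>_<n>.csv and return the file names.

    `tables` is any sequence of objects with a pandas-style to_csv method.
    """
    names = []
    for number, table in enumerate(tables, start=1):
        name = f"{stem}_{number}.csv"
        # index=False leaves out the DataFrame's row index column
        table.to_csv(name, index=False, encoding="utf-8")
        names.append(name)
    return names


# Usage with tabula-py (needs Java and the PDF to be present):
# import tabula
# write_tables(tabula.read_pdf("inforatge.pdf", pages="all"))
```

With `pages="all"` tabula-py looks at every page, so a multi-page report ends up as out_1.csv, out_2.csv and so on.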

We go to the directory that contains the script and the PDF we want to convert. This is important, because otherwise it will tell us that it cannot find the file.

cd Descargas/eltiempo

In this directory we have the PDF and the .py file we created, and this is where the CSV we want will be written.
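To get a clearer message than a raw traceback when the script is started from the wrong directory, the path can be checked before Tabula is called. This is a minimal sketch assuming Python 3; `find_pdf` is a hypothetical helper name.

```python
from pathlib import Path


def find_pdf(name, folder=None):
    """Return the absolute path to `name` inside `folder`, or raise a clear error."""
    base = Path(folder) if folder is not None else Path.cwd()
    candidate = (base / name).resolve()
    if not candidate.is_file():
        raise FileNotFoundError(f"{name} not found in {base.resolve()}")
    return candidate


# Example: look next to the script instead of in the current directory
# pdf_path = find_pdf("inforatge.pdf", Path(__file__).parent)
```

Passing the script's own folder, as in the commented example, makes the script work no matter where it is launched from.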

Running the code

python pdftocsv.py

Note that I used python rather than python3, which failed for me; on my system that means the script runs with Python 2. And that's it: if it returns no error, we have our file.

Using Tabula in our Anaconda development environment

We have added 3 more lines to the file to measure the execution time. In the end, our pdftocsv.py file looks like this:

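import time

import tabula

# Record the start time so we can report how long the conversion takes
start_time = time.time()

# Recent tabula-py versions return a list of DataFrames, one per table found
dfs = tabula.read_pdf("inforatge.pdf")
dfs[0].to_csv('out.csv', sep='\t', encoding='utf-8')

print("--- %s seconds ---" % (time.time() - start_time))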
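If you want to time several steps without repeating the start and print lines, one option is a small context manager. This is a sketch, not what the script above does: `timed` is a hypothetical name, the summed range is only a stand-in for the Tabula calls, and the f-string assumes Python 3.6 or later.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print how long the enclosed block took, labelled with `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"--- {label}: {elapsed:.3f} seconds ---")


with timed("conversion"):
    # Stand-in for the read_pdf / to_csv calls
    total = sum(range(100_000))
```

`time.perf_counter()` is meant for measuring short intervals, so it is a reasonable swap for `time.time()` here.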

Other options in Tabula

Here are more examples of things we can do. There are many options to choose from, so it is best to go to the official GitHub repository that I mentioned earlier.

# Read a remote PDF and convert it into DataFrames
df2 = tabula.read_pdf("https://github.com/tabulapdf/tabula-java/raw/master/src/test/resources/technology/tabula/arabic.pdf")

# Convert a PDF into a CSV file
tabula.convert_into("test.pdf", "output.csv", output_format="csv")

Without a doubt, one of the most useful features is converting every PDF file in a directory to CSV, JSON and so on.

tabula.convert_into_by_batch("input_directory", output_format='csv')

With this we can automate tasks that would otherwise be long and tedious. In the end, that is one of the main reasons to use this library.
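If you need more control than the batch call gives, for example to see which files failed, the same job can be sketched as a loop over `tabula.convert_into`. The helper names `plan_batch` and `convert_all` are hypothetical; only `convert_into` comes from the examples above. The sketch assumes Python 3.

```python
from pathlib import Path


def plan_batch(input_dir, suffix=".csv"):
    """Pair every PDF in input_dir with the output path it would be written to."""
    pdfs = sorted(Path(input_dir).glob("*.pdf"))
    return [(pdf, pdf.with_suffix(suffix)) for pdf in pdfs]


def convert_all(input_dir):
    """Convert each PDF individually so one bad file does not stop the rest."""
    import tabula  # imported here so plan_batch works without tabula-py installed

    failed = []
    for pdf, out in plan_batch(input_dir):
        try:
            tabula.convert_into(str(pdf), str(out), output_format="csv")
        except Exception as error:  # collect the failure and keep going
            failed.append((pdf, error))
    return failed
```

Calling `plan_batch("input_directory")` first shows which files would be touched before anything is converted.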

Convert PDF to Excel online

If all we want is to convert a single file, extracting the data from a table in a PDF into Excel, LibreOffice Calc or similar, there is no need to make things so complicated. There are tools for this, some that you install and others that do the job online.

I have tried these two online tools and they work very well.

Keep in mind that this is not an automated task, which is why my review of these tools has not been exhaustive. I only mention them for anyone who is interested.

The classic method

And we always have the classic method, the most manual and time-consuming one, but in the end it is an option when there is not much work to do.

Copy the table cells from the PDF and paste them into our spreadsheet.
