How to convert tables from a PDF to Excel or CSV with Tabula

(Image: converting a PDF to CSV and Excel)

Looking at the historical data offered by a weather station in my city, I see that they only provide it as charts and as a PDF download. I don't understand why they don't let you download it as csv, which would be far more useful for everyone.

So I went looking for a way to get these tables from pdf into csv or, for anyone who prefers it, into Excel or LibreOffice format. I prefer csv because with a csv you can do anything: you can process it with Python and its libraries, or import it easily into any spreadsheet.

Since the idea is to automate the process, what I want is a script that works with Python, and that is where Tabula comes in.

Convert pdf to csv with Tabula

The steps, and the way it works, are very simple. The first step is to install the Tabula library (the Python package is tabula-py) in our development environment. Tabula lets us extract the data from PDF tables into DataFrames from pandas, a Python library designed for working with tabular data such as csv files and arrays.

It can also extract tables from a PDF and convert them to JSON, CSV and TSV. A real gem. You can find much more information in its github repository


I'm reusing all the work from a few days ago and installing it in Anaconda. The link shows how to install Anaconda.

We install Tabula

# first we activate our development environment; in our case that would be: conda activate comparador
pip install tabula-py

When I ran it, it gave me an error.

The solution, as their documentation indicates, is to uninstall the old tabula package and install the new one, tabula-py.

pip uninstall tabula
pip install tabula-py
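
A quick way to confirm which package is answering to the name tabula is to check whether the installed module exposes read_pdf. This is only a minimal sketch: the helper name module_exposes is made up for illustration and is not part of tabula-py.

```python
import importlib
import importlib.util


def module_exposes(module_name, attribute):
    """Return None if the module is not installed, else whether it has the attribute."""
    if importlib.util.find_spec(module_name) is None:
        return None
    module = importlib.import_module(module_name)
    return hasattr(module, attribute)


if __name__ == "__main__":
    result = module_exposes("tabula", "read_pdf")
    if result is None:
        print("tabula is not installed in this environment")
    elif result:
        print("the installed tabula module exposes read_pdf")
    else:
        print("a different 'tabula' package seems to be installed")
```

If the last message appears, the uninstall and reinstall shown above is the step to repeat.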

We create the executable .py file

(Image: reading tables from a pdf into a csv)

I create an executable .py file that I call pdftocsv.py, I put it in my Descargas/eltiempo folder, and it contains the following code

import tabula
# Extract the tables from the PDF; multiple_tables=True returns a list of DataFrames
tables = tabula.read_pdf("inforatge.pdf", multiple_tables=True)
# Write the first table to out.csv, tab-separated (sep='\t') and encoded as UTF-8
tables[0].to_csv('out.csv', sep='\t', encoding='utf-8')

The PDF to read is called inforatge.pdf, and I tell it that the output file is called out.csv; it will be written to the folder we are working in.
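
Because the script passes sep='\t', out.csv is tab-separated despite its extension, and pandas also writes the row index as an unnamed first column by default. Below is a small sketch of reading such a file back with the standard csv module. The sample values are invented for the illustration and do not come from the real PDF.

```python
import csv
import io


def read_tab_separated(handle):
    """Parse tab-separated text into a list of rows (each a list of strings)."""
    return list(csv.reader(handle, delimiter="\t"))


# Invented sample in the shape pandas writes: an empty header cell for the index column.
sample = "\tdate\ttemp\n0\t2020-01-01\t11.5\n1\t2020-01-02\t9.0\n"
rows = read_tab_separated(io.StringIO(sample))
print(rows[0])  # header row
print(rows[1])  # first data row
```

For the real file, the same function can be given open("out.csv", newline="", encoding="utf-8") instead of the in-memory sample.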

We go to the directory that holds both our executable and the pdf we want to convert. This matters because otherwise it will tell you that it cannot find the file.

cd Descargas/eltiempo

In this directory we have the PDF and the .py file we created, and this is also where the csv we want will be written.
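
An alternative to changing directory first is to build the PDF path from the location of the script itself, and to fail early with a clear message if the file is missing. This is a sketch under that assumption; sibling_path and require_file are hypothetical helper names, not part of Tabula.

```python
from pathlib import Path


def sibling_path(script_path, filename):
    """Build the path of a file that sits in the same folder as the script."""
    return Path(script_path).parent / filename


def require_file(path):
    """Raise a descriptive error early instead of failing inside the PDF reader."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError("PDF not found: {}".format(path))
    return path


print(sibling_path("Descargas/eltiempo/pdftocsv.py", "inforatge.pdf"))
```

Inside pdftocsv.py this could be used as require_file(sibling_path(__file__, "inforatge.pdf")), so the script finds the PDF whatever the current directory is.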

We run the code

python pdftocsv.py

Note that I used python, that is, I told it to run with Python 2 and not with python3, which failed for me. With that it should not give any problems, and we are done.
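
Since the python command can point to different interpreters depending on which environment is active, it can help to check which one is actually running when one command works and the other fails. A minimal sketch, written so that it runs under either major Python version; interpreter_summary is a name chosen for this example.

```python
import sys


def interpreter_summary():
    """Describe the interpreter that is executing this script."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return "Python {} at {}".format(version, sys.executable)


print(interpreter_summary())
```

Running this once with python and once with python3 shows whether both commands resolve to the environment where tabula-py was installed.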

(Image: running Tabula in our Anaconda development environment)

We have also added 3 lines to the file to measure the execution time. In the end our pdftocsv.py file looks like this

import tabula
import time

start_time = time.time()

# multiple_tables=True makes read_pdf return a list of DataFrames
tables = tabula.read_pdf("inforatge.pdf", multiple_tables=True)
tables[0].to_csv('out.csv', sep='\t', encoding='utf-8')

print("--- %s seconds ---" % (time.time() - start_time))
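
If several steps need timing, the same idea can be wrapped in a small context manager. The sketch below uses time.perf_counter, which is intended for measuring intervals; the name timed and the sleep call standing in for the Tabula calls are illustrative assumptions.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the elapsed wall-clock seconds for the with-block under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("sleep", timings):
    time.sleep(0.01)  # stand-in for the read_pdf and to_csv calls

print("--- %.3f seconds ---" % timings["sleep"])
```

Each block gets its own label, so reading the PDF and writing the csv can be timed separately.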

Other Tabula options

More examples of what we can do. There are many options, so it is best to visit the Github repository I linked

# Read a remote PDF and convert it into DataFrames
df2 = tabula.read_pdf("https://github.com/tabulapdf/tabula-java/raw/master/src/test/resources/technology/tabula/arabic.pdf")

# Convert a PDF into a CSV file
tabula.convert_into("test.pdf", "output.csv", output_format="csv")
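
When a PDF holds several tables, one option is to extract them all and write each to its own csv. This sketch assumes that read_pdf returns a list of DataFrames, which is the default in tabula-py 2.x, and uses its pages and lattice parameters. The function names and the numbered file naming scheme are invented for the example.

```python
from pathlib import Path


def table_outputs(pdf_path, count):
    """Name one CSV per extracted table: report.pdf -> report_1.csv, report_2.csv, ..."""
    pdf = Path(pdf_path)
    return [pdf.with_name("{}_{}.csv".format(pdf.stem, i)) for i in range(1, count + 1)]


def save_all_tables(pdf_path, lattice=False):
    """Extract every table on every page and write each one to its own CSV."""
    import tabula  # imported here so the naming helper works without tabula-py

    # pages="all" scans the whole document; lattice=True suits tables with ruled lines
    tables = tabula.read_pdf(str(pdf_path), pages="all", lattice=lattice)
    outputs = table_outputs(pdf_path, len(tables))
    for table, output in zip(tables, outputs):
        table.to_csv(output, index=False, encoding="utf-8")
    return outputs


print(table_outputs("inforatge.pdf", 2))
```

Calling save_all_tables("inforatge.pdf") would then leave inforatge_1.csv, inforatge_2.csv and so on next to the PDF, one per detected table.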

And without doubt one of the most important features: converting all the PDF files in a directory to CSV, JSON, etc.

tabula.convert_into_by_batch("input_directory", output_format='csv')

With this we can automate tasks that would otherwise be long and tedious. In the end, this is one of the main reasons to use this library.
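
For more control over a batch run, for instance to choose the output folder or to keep going when one PDF fails, the files can also be looped over by hand with convert_into. This is a sketch of that alternative, not how convert_into_by_batch is implemented; plan_batch and convert_all are hypothetical names.

```python
import tempfile
from pathlib import Path


def plan_batch(input_dir, output_dir):
    """Pair every PDF in input_dir with a CSV path of the same name in output_dir."""
    pdfs = sorted(Path(input_dir).glob("*.pdf"))
    return [(pdf, Path(output_dir) / (pdf.stem + ".csv")) for pdf in pdfs]


def convert_all(input_dir, output_dir):
    """Convert each PDF on its own so one bad file does not stop the rest."""
    import tabula  # imported here so plan_batch can be tried without tabula-py

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf, target in plan_batch(input_dir, output_dir):
        try:
            tabula.convert_into(str(pdf), str(target), output_format="csv", pages="all")
        except Exception as error:  # broad on purpose: keep going and report later
            failed.append((pdf, error))
    return failed


# Demonstration of the pairing step only, using empty placeholder files.
with tempfile.TemporaryDirectory() as folder:
    for name in ("b.pdf", "a.pdf", "notes.txt"):
        (Path(folder) / name).touch()
    plan = plan_batch(folder, Path(folder) / "csv")
    print([target.name for _, target in plan])
```

convert_all returns the list of files that could not be converted, which makes it easy to retry or inspect them afterwards.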

Convert pdf to excel online

If all we want is to convert a single file, extracting the table data from a PDF into Excel, LibreOffice Calc or something similar, there is no need to complicate things this much. There are tools available for this, some to install and others to use online.

I have tried these two online tools and they work quite well.

Keep in mind that this is not the automated approach, so I have not studied these tools in depth. I only mention them for those who may be interested.

The classic method

And we always have the classic method, cruder and more laborious, but in the end it is an option when there is only a little work to do.

Copy the tables from the pdf and paste them into our spreadsheet.
