From MariaDB to Parquet with Python and Apache Arrow
This article shows how to create a Parquet file with Python and Apache Arrow.
The data comes from a MariaDB database.
Both dependencies, PyMySQL and PyArrow, can be installed directly with pip:
pip install pymysql pyarrow
Here is the full script:
import pymysql
import pymysql.cursors
import pyarrow as pa
import pyarrow.parquet as pq

# Connect to MariaDB; DictCursor returns each row as a dict keyed by column name
connection = pymysql.connect(
    host='localhost',
    user='root',
    password='9211',
    database='Sql592686_5',
    cursorclass=pymysql.cursors.DictCursor
)

try:
    with connection.cursor() as cursor:
        query = """
            SELECT mov_tipo,
                   mov_valore,
                   mov_data,
                   causale_nome
            FROM movimenti
            INNER JOIN causali ON causale_id = mov_causale_fk
        """
        cursor.execute(query)
        results = cursor.fetchall()
        # The first element of each description entry is the column name
        column_names = [desc[0] for desc in cursor.description]

    # Pivot the row dicts into one list per column, the layout pa.table() expects
    data_dict = {col: [] for col in column_names}
    for row in results:
        for col in column_names:
            data_dict[col].append(row[col])

    # Build the Arrow table and write it as a Snappy-compressed Parquet file
    arrow_table = pa.table(data_dict)
    pq.write_table(arrow_table, 'movimenti_export.parquet', compression='snappy')
    print(f"Saved {arrow_table.num_rows} records")
finally:
    connection.close()
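The central step of the script is the pivot from rows to columns: the cursor returns one dict per row, while pa.table() takes one list per column. The sketch below isolates that step so it can be tried without a database. The rows_to_columns helper and the sample rows are hypothetical, made up for illustration; the real column types depend on the movimenti and causali tables.

```python
from datetime import date
from decimal import Decimal


def rows_to_columns(rows, column_names):
    """Pivot a list of row dicts into a dict holding one list per column."""
    columns = {name: [] for name in column_names}
    for row in rows:
        for name in column_names:
            # A SQL NULL arrives as None and is kept as-is in the column list
            columns[name].append(row[name])
    return columns


# Hypothetical sample rows, shaped like the output of DictCursor.fetchall()
sample_rows = [
    {"mov_tipo": "E", "mov_valore": Decimal("12.50"),
     "mov_data": date(2024, 1, 5), "causale_nome": "Spesa"},
    {"mov_tipo": "U", "mov_valore": Decimal("40.00"),
     "mov_data": date(2024, 1, 9), "causale_nome": None},
]
sample_names = ["mov_tipo", "mov_valore", "mov_data", "causale_nome"]

sample_columns = rows_to_columns(sample_rows, sample_names)
print(sample_columns["mov_tipo"])
```

The resulting dict has the same shape as data_dict in the script, so it could be passed to pa.table() in the same way.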
Enjoy!