Python Script to Combine Parquet Files

Today’s blog post is quick and simple: a script I find myself using quite often these days. It takes a collection of CSV or Parquet files and combines them into a single Parquet file. I find it useful whenever I need to query the same data across multiple files. It beats importing the individual files into a DB client like DBeaver.

```python
import os
import pandas as pd

# Directory containing your files
dir_path = '/Users/yunier/Downloads'

# Output file name
output_name = 'combined_output1.parquet'
output_file = os.path.join(dir_path, output_name)

# List of files to combine (CSV and Parquet).
# The output file lives in the same directory, so skip it to avoid
# re-reading the result of a previous run.
files = [
    f for f in os.listdir(dir_path)
    if f.endswith(('.csv', '.parquet')) and f != output_name
]

# List to hold DataFrames
dfs = []

for file in files:
    file_path = os.path.join(dir_path, file)
    if file.endswith('.csv'):
        df = pd.read_csv(file_path)
    else:  # .parquet
        df = pd.read_parquet(file_path)
    df['source_file'] = file  # Track source of data.
    dfs.append(df)

# Combine all DataFrames
if dfs:
    combined_df = pd.concat(dfs, ignore_index=True)
    combined_df.to_parquet(output_file)
    print(f'Combined file saved to: {output_file}')
else:
    print('No CSV or Parquet files found to combine.')
```
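
One thing worth knowing before you run this on files that don't share the exact same columns: `pd.concat` keeps the union of all columns and fills the gaps with `NaN`. Here is a minimal sketch of that behavior using two small in-memory DataFrames in place of real files. The frame names, column names, and file names are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for two input files with different columns.
orders_a = pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0]})
orders_b = pd.DataFrame({"id": [3], "amount": [5.0], "currency": ["USD"]})

# Tag each frame with its (made-up) source file, as the script does.
orders_a["source_file"] = "a.csv"
orders_b["source_file"] = "b.parquet"

combined = pd.concat([orders_a, orders_b], ignore_index=True)

# Rows from a.csv have no "currency" column, so they get NaN there.
missing_currency = int(combined["currency"].isna().sum())

# The source_file column makes it easy to count or filter rows per file.
rows_per_source = combined.groupby("source_file").size().to_dict()

print(combined)
print(f"Rows missing currency: {missing_currency}")
print(f"Rows per source: {rows_per_source}")
```

If the gaps are a problem for your queries, filtering on `source_file` is a quick way to see which file is missing which column.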

August 20, 2025 · Yunier