How to Write a Pandas DataFrame to Google Cloud Storage or BigQuery

To write a Pandas DataFrame to Google Cloud Storage, you can combine df.to_csv() with the Cloud Storage client library; for BigQuery, you can load the DataFrame directly with the BigQuery client library. Both approaches are shown below.

Writing a Pandas DataFrame to Google Cloud Storage

First, you need to install the google-cloud-storage package:

pip install google-cloud-storage

Then, you can use the following code to write a DataFrame as a CSV object in Google Cloud Storage:

import pandas as pd
from google.cloud import storage

# Your DataFrame
data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data)

# Set up Google Cloud Storage
bucket_name = 'your_bucket_name'
destination_blob_name = 'your_destination_blob_name.csv'

# Authenticate using a JSON key file. Replace 'path/to/keyfile.json'
# with the path to your JSON key file.
storage_client = storage.Client.from_service_account_json(
    'path/to/keyfile.json'
)
bucket = storage_client.get_bucket(bucket_name)

# Write the DataFrame to a CSV file in memory
csv_data = df.to_csv(index=False).encode('utf-8')

# Upload the CSV data to Google Cloud Storage
blob = bucket.blob(destination_blob_name)
blob.upload_from_string(csv_data, content_type='text/csv')

print(f"DataFrame uploaded to {destination_blob_name}")
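If you prefer to skip the client library entirely, pandas can also write straight to a gs:// URL when the optional gcsfs package is installed (pip install gcsfs). Here is a minimal sketch, assuming your service account key is available through the GOOGLE_APPLICATION_CREDENTIALS environment variable and that the bucket already exists:

import pandas as pd

data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data)

# pandas hands gs:// paths to gcsfs, which by default picks up
# credentials from the GOOGLE_APPLICATION_CREDENTIALS environment variable.
df.to_csv('gs://your_bucket_name/your_destination_blob_name.csv', index=False)

This avoids building the CSV string and blob object yourself, at the cost of one extra dependency.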

Writing a Pandas DataFrame to BigQuery

First, you need to install the google-cloud-bigquery package, along with pyarrow (which the client uses to serialize the DataFrame for loading) and pandas-gbq (used in the alternative at the end of this section):

pip install google-cloud-bigquery pyarrow pandas-gbq

Then, you can use the following code to write a DataFrame to a BigQuery table:

import pandas as pd
from google.cloud import bigquery

# Your DataFrame
data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data)

# Set up BigQuery
project_id = 'your_project_id'
dataset_id = 'your_dataset_id'
table_id = 'your_table_id'

# Authenticate using a JSON key file.
# Replace 'path/to/keyfile.json' with the path to your JSON key file.
client = bigquery.Client.from_service_account_json('path/to/keyfile.json')

# Create a job config object to specify the write disposition
job_config = bigquery.LoadJobConfig(
  write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE
)

# Write the DataFrame to BigQuery. Client.dataset() is deprecated,
# so pass a fully qualified table ID instead.
table_ref = f"{project_id}.{dataset_id}.{table_id}"
job = client.load_table_from_dataframe(df, table_ref, job_config=job_config)
job.result()

print(f"DataFrame uploaded to {dataset_id}.{table_id}")

This code will overwrite the existing table in BigQuery. If you want to append the data instead, change the write_disposition to bigquery.WriteDisposition.WRITE_APPEND.
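Alternatively, the pandas-gbq package installed above wraps the whole load in a single call. Here is a minimal sketch, assuming the same project, dataset, table, and key file as before:

import pandas as pd
import pandas_gbq
from google.oauth2 import service_account

data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data)

# Build credentials from the same JSON key file used earlier.
credentials = service_account.Credentials.from_service_account_file(
    'path/to/keyfile.json'
)

# if_exists='replace' overwrites the table; use 'append' to add rows instead.
pandas_gbq.to_gbq(
    df,
    'your_dataset_id.your_table_id',
    project_id='your_project_id',
    credentials=credentials,
    if_exists='replace',
)

pandas-gbq also creates the destination table if it does not already exist, which makes it convenient for quick, one-off loads.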
