Excel is a powerful tool for creating reports, charts and analyzing data, however, there are times where we need to extract the data from a database, especially when dealing with large datasets, often requires a bit of SQL knowledge.
In this post, I’ll show you how we can leverage Python to automate the extraction and creation of Excel files, making the reporting process more efficient.
The Challenge
A few months ago, I faced a client request for an Excel reports that contained data from a SQL database, the volume of the data was so huge that it couldn’t fit into a single sheet.
To address this challenge, I decided to split the data by month across multiple sheets, the thing was that, doing this manually would have been a nightmare.
So, in my quest for efficiency, I chose to use Python to automate the extraction and creation of the Excel files. This not only saved me a few hours of manual work, but also provided a scalable solution for future reporting needs.
The Automated Solution
Let’s go into Python. First we need to import the necessary packages.
import pyodbc
import pandas as pd
from xlsxwriter import Workbook
Since I’m going to be using SQL Server, we need pyodbc
to manage the connection to the database.
pandas
is to manipulate the data in the data frames, and xlsxwriter
will help with the Excel file.
Next, create the connection to the SQL Server database.
conn = pyodbc.connect(
'DRIVER={SQL SERVER};'
'SERVER=YOUR_SERVER_NAME;'
'DATABASE=YOUR_DATABASE;'
'UID=USERNAME;'
'PWD=PASSWORD;'
)
Don’t forget to change the server name, database, user and password to match your SQL Server configuration.
Now, let’s execute the query and get the data into a pandas data frame.
sql_query = '''
SELECT
transaction_date, account_number, transaction_type, amount
FROM
transactions
'''
query_df = pd.read_sql(sql_query, conn)
Pandas is giving me this warning:
UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
It just means that pandas
prefers using SQLAlchemy for database access. For simplicity, I’ll ignore it.
Moving on, I need to create a writer object, it’s a pandas object to handle the creation of the Excel file and write multiple data frames to into different sheets.
I’ll also create a list of months, this is going to be the name of the sheets in the file.
writer = pd.ExcelWriter('path/to/your/file/ExcelReport.xlsx')
months = ['January','February','March','April','May','June','July','August','September','October','November','December']
Using a for loop, I’ll go through each month, filter the data, and insert it into the corresponding sheet.
I’m converting the column transaction_date
to a pandas datetime object and extracting the full month name.
for month in months:
df = query_df[pd.to_datetime(query_df['transaction_date']).dt.strftime('%B') == month]
df.to_excel(writer, sheet_name=month, index=False)
writer.close()
Once the script completes, you’ll find your Excel report in the folder you specified with separate sheets for each month.
This process could be enhanced adding error handling and logging mechanisms to capture any issues that may show up during the extraction and creation process, this could also be run at a regular interval using a task scheduler or a cron job.
Thank you for stopping by! Feel free to follow me on Medium where I write about SQL, Python and other data related topics.
Cheers! 🍻
Discover more from Ben Rodríguez
Subscribe to get the latest posts sent to your email.