Automate Excel Reports with Python and SQL

Excel is a powerful tool for creating reports, charts and analyzing data, however, there are times where we need to extract the data from a database, especially when dealing with large datasets, often requires a bit of SQL knowledge.

In this post, I’ll show you how we can leverage Python to automate the extraction and creation of Excel files, making the reporting process more efficient.

The Challenge

A few months ago, I faced a client request for an Excel reports that contained data from a SQL database, the volume of the data was so huge that it couldn’t fit into a single sheet.

To address this challenge, I decided to split the data by month across multiple sheets, the thing was that, doing this manually would have been a nightmare.

So, in my quest for efficiency, I chose to use Python to automate the extraction and creation of the Excel files. This not only saved me a few hours of manual work, but also provided a scalable solution for future reporting needs.

The Automated Solution

Let’s go into Python. First we need to import the necessary packages.

import pyodbc
import pandas as pd
from xlsxwriter import Workbook

Since I’m going to be using SQL Server, we need pyodbc to manage the connection to the database.

pandas is to manipulate the data in the data frames, and xlsxwriter will help with the Excel file.

Next, create the connection to the SQL Server database.

conn = pyodbc.connect(
    'DRIVER={SQL SERVER};'
    'SERVER=YOUR_SERVER_NAME;'
    'DATABASE=YOUR_DATABASE;'
    'UID=USERNAME;'
    'PWD=PASSWORD;'
)

Don’t forget to change the server name, database, user and password to match your SQL Server configuration.

Now, let’s execute the query and get the data into a pandas data frame.

sql_query = '''
    SELECT
	    transaction_date, account_number, transaction_type, amount
    FROM 
        transactions
'''

query_df = pd.read_sql(sql_query, conn)

Pandas is giving me this warning:

UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.

It just means that pandas prefers using SQLAlchemy for database access. For simplicity, I’ll ignore it.

Moving on, I need to create a writer object, it’s a pandas object to handle the creation of the Excel file and write multiple data frames to into different sheets.

I’ll also create a list of months, this is going to be the name of the sheets in the file.

writer = pd.ExcelWriter('path/to/your/file/ExcelReport.xlsx')

months = ['January','February','March','April','May','June','July','August','September','October','November','December']

Using a for loop, I’ll go through each month, filter the data, and insert it into the corresponding sheet.

I’m converting the column transaction_date to a pandas datetime object and extracting the full month name.

for month in months:
    df = query_df[pd.to_datetime(query_df['transaction_date']).dt.strftime('%B') == month]
    df.to_excel(writer, sheet_name=month, index=False)

writer.close()

Once the script completes, you’ll find your Excel report in the folder you specified with separate sheets for each month.

This process could be enhanced adding error handling and logging mechanisms to capture any issues that may show up during the extraction and creation process, this could also be run at a regular interval using a task scheduler or a cron job.

Thank you for stopping by! Feel free to follow me on Medium where I write about SQL, Python and other data related topics.

Cheers!

Discover more from Ben Rodríguez

Subscribe to get the latest posts sent to your email.