How to Automate SAP Data Pulls with Python
Edit: While still helpful for understanding the basics of SAP scripting, the process outlined in this article is unnecessary because I made a Python library that does it for you, which is linked here: GitHub Repo. Docs for the library are in the wiki!
A day rarely goes by when I don't hear the word automation. While everyone acknowledges its importance, comprehensive guides on the subject are scarce. Fortunately, I've created a step-by-step guide to help you get started with automating processes
Step 1. Import necessary libraries:
import pandas as pd
import win32com.client
from datetime import datetime
import subprocess
Step 2. Connect to SAP:
SapGuiAuto = win32com.client.GetObject('SAPGUI')
application = SapGuiAuto.GetScriptingEngine
connection = application.Children(0)
session = connection.Children(0)
This code returns an object that gives your script access to all open SAP GUI windows.
Step 3. Handle dates, filename, and folderDir:
current_date = datetime.now()
formatted_date = current_date.strftime('%m.%d.%Y')
folderDir = r'C:\Users\your_name\Desktop\PythonProjects'
filename = f'FBL5N Pull {formatted_date}.XLSX'
Step 4. Walk-through SAP process:
I highly recommend walking through the process and writing down the shortcut keys
Step 5. Record the script:
Recording the script
Step 6. Paste script into your IDE:
Step 7. Convert the VB script to Python syntax:
The script you pasted into your IDE is in VB syntax. We need to convert it to Python. To do so, add parentheses behind verbs, like maximize(), and surround the numbers after sendVKey with parentheses. If you're unsure, you can follow the guide below.
note: I set _PATH = folderDir and _FILENAME = filename.
Step 8. Shut down Excel:
Just like you can't open the same Excel file twice, you can't read an already open file with pandas. Unfortunately, after you save an Excel file from SAP, it automatically opens. To shut this down, we need to close Excel on your computer. To do so, use this code.
subprocess.call(['taskkill', '/F', '/IM', 'EXCEL.EXE'])
Warning: this will force-quit all of your open Excel workbooks. Ensure you don't have anything you need to save open when you're running this code.
Step 9. Test run:
Make sure your script is working.
Step 10. Copy information for script:
Think back to recording the script. You pasted accounts from your clipboard that you copied from somewhere. We need to access that file and automate the copying of those accounts
accounts_file = r'C:\Users\your_name\Desktop\PythonProjects\accounts.XLSX'
accounts_df = pd.read_excel(accounts_file, sheet_name='accounts data')
It's likely that the file you're reading is shared and includes information you don't need. You need to remove the information you don't need from DataFrame. Don't worry; this will not affect the Excel sheet. Let's pretend my name, 'Eoin', is listed in a column next to the accounts I need.
eoin_accounts_df = accounts_df[accounts_df['Name'] == 'Eoin']
This will create a new DataFrame called eoin_accounts_df that only contains rows where my name is listed in the 'Name' column.
This next part is important; the copy function will copy the entire Excel sheet, but you only want the account numbers, so you'll need to remove everything from the DataFrame except for the 'Account' column.
copy_df = eoin_accounts_df['Account']
Now, using the .to_clipboard() function, copy the column to your clipboard.
copy_df.to_clipboard(excel=True, index=False, header=False)
At this point, if you were to run your code, it would copy the necessary information, paste it into SAP, execute it, and save it to your computer. Pretty sweet!
Step 11. Access the exported file:
Now that you've pulled all the data and saved it to your computer, we need to read the exported file. If you followed all of the steps, this code should work.
export_df = pd.read_excel(folderDir + filename)
Step 12. Automate the report:
Now you can do whatever analysis your heart desires. For the sake of this article, I'll pretend there are xlookups I need to do with the account numbers. Let's say each account has one cost center that is accessible on the accounts_df (the one we made in step 10).
We can't easily write functions in Excel, so we'll do the equivalent process in Python: create a dictionary that connects each account number with its corresponding cost center. If you're not familiar with programming, think of it like the periodic table. In the periodic table, each element has a unique atomic number. Similarly, each account has a unique cost center. By setting up this "dictionary," we're essentially telling the computer, "Hey, if you ever need to find the cost center for a particular account number, just look it up in this dictionary we've created." It's like having a handy reference guide for the computer to quickly provide information when needed.
acct_cost_center_dict = accounts_df.set_index('Account')['Cost Center']
Then, we use the dictionary to assign the correct value in the 'Cost Center' column in our SAP export.
export_df['Cost Center'] = export_df['Account].map(acct_cost_center_dict)
Step 13. Save the file
Now you're done. All you need to do is save your DataFrame as an Excel sheet, and you're finished. Use the .to_excel() function.
export_df.to_excel(f'Finished Report {formatted_date}.xlsx', index=False)
If you've made it this far, congratulations! Please understand that the examples in this guide are rudimentary, and I chose them to show what is easily accomplishable. You can automate almost anything given enough time. If anyone has any questions/wants help with trying to automate a process, feel free to message me! ChatGPT and BingAI are also very helpful.
Dynamic Data Analyst | Python | R | MySQL | Stat&Math modeling
7 个月Right what I need! Thanks for knowledge sharing ??
Fascinating work, Eoin! Excited to watch you progress as you continue your coding journey.
Financial Consultant at Optum
1 年Great guide, Eoin! Thank you for sharing
Account Executive @ Gartner
1 年Love this, Eoin!