Does a Business Intelligence developer need to know Python?
Let's start this article with getting to know what exactly a BI (Business Intelligence) developer has to do or better to say needs to master. let's take a look at Wikipedia , Business intelligence (BI) comprise the strategies and technologies used by enterprises for the data analysis of business information. BI technologies provide historical, current and predictive views of business operations. So all a BI developer has to do is to find where business data is stored , "gather" them and "store" them in one specific place so that they can use it for further analysis and reporting. The biggest claim a BI developer can have is data integration. They have to be able to read from different sources of data including relational databases like SQL Server or Oracle , No-SQL databases like MongoDB , other services like Google docs or Google Analytics data or even data on social media like twitter and Instagram.
A BI developer typically uses Microsoft SSIS (SQL Server Integration Services) or similar tools to integrate data from different sources into their data warehouse but the problem with this tool is that it is perfectly designed to read from relational databases and once you try to step a little further it either needs components written by ordinary people which often malfunction or does not support reading from the source you want at all. It is when you as an experienced BI developer must know a way out of this problem because otherwise you cannot support the CLAIM. But don't panic , there is always an easy learning and developing programming language to help you get away with that too , PYTHON.
Fortunately, python supports reading from almost whatever data source there is with the help of its useful and diverse packages and yes I know that there are many analytics , data mining , machine learning and data science packages to help with the modeling process but in this article I wanted to be as simple as a BI developer trying to solve their ETL/ELT (Extract . Load , Transform) problems. So here we go with some real world examples happened here in Digikala BI team.
The first problem happened when we Google sheets started to find their place among business end users, there were a variety of data that was stored on Google Sheets and got share between departments and of course they would have needed to use that data in their reports too. There is a very useful package named gspread_Pandas to read from Google Sheets and store them in Pandas data frames so that you can manipulate data in a way you want.You can see the instructions and documentation here.
After that we had a request from Digipay team to help them with their data which was mainly stored on MongoDB so we decided to ask our dear friend ,Python to help us. There is a package called PyMongo with which you can create a connection MongoDB and read from its collections and documents and if you were wondering about the connection string , this model helped us.
Con = pymongo.MongoClient("mongodb://{Username}:{Password}@{IP}:{Post}/{Authentication Collection Name}")
Have you ever used Google Analytics service on your website? Have ever wished to be able to store its data and use it again in a way you like? Well we had and there was another team that started the project using python but fortunately it was later delivered to our team so that now we could get familiar with another magic that Python performs. The process which is not as straight forward as the previous ones is explained here .
Finally let's talk about one of the hottest data oriented subjects nowadays , Social Media. Yes of course we need to use data from social media to know how well our business works and 2 of most important ones are twitter and Instagram. Both Instagram and Twitter provide APIs to use but they have their own limitations so I am going to propose something else, web scrapping. You can start crawling these 2 sites reading Instagram comments or twitter hashtags using Selenium package which is not easy but it is really fun so try these 2 links for Instagram and twitter to get to know the process better.
I was going to forget , there is something else , as a BI developer you always need a connection to your Data Warehouse which usually is a relational database like SQL Server database to store data. Pyodbc package will help you with that. :)
YC Alum (S13). Just started pumpup.com
4 年This is a great post. I think you've touched on something key: when you're using Python + SQL, you have the flexibility to integrate into a bunch of the different platforms that are used inside companies. You don't have to rely on a propietary connector or for all data to be centralised (which isn't a bad thing - but is often not the case). Another part of this is that the Python ecosystem will always beat a propietary tool in terms of number of libraries, functionality etc., because the open source community is so strong (I'm glad to see gspread-pandas got a mention - it does make working with sheets really simple). We have a bunch use-cases where need to do things like network plots of customers to see how they are related, plotting trends on Google Earth maps, analysing seasonality ontop of Salesforce; in a traditional BI platform, you'd either need to pay for a third-party plugin or outsource these to another SaaS tool - but in Python, someone has usually built a library which you plug right now. Finally, being able to version control scripts, dashboards, reports etc. in git and review them in GitHub instead of relying on a propietary tool's internal system is a lot cleaner. I'm excited to see how this plays out over the next few years, as I'm seeing the velocity of the ecosystem growing so quickly. We're working on an open source library and API (https://docs.datapane.com) which makes it easy to build reports from Python and deploy Notebooks and scripts so that other people can use them for reporting - check it out if you need some help bridging the gap between the Python scripts and the rest of the company.
Business Intelligence Developer at Mytheresa.com
5 年great..you are always keen to try new platform and technology and share it to us. i have learned a lot from you.thanks god we have you in our team.
Sr Data Engineer&Analyst
5 年It was really useful, specially reading socials using python. But BI is not all about Integration. Data cleansing( it is not happend only in ETL step) , DWH designing , OLAP , dashboarding ( data story telling ) are as important as integration, I think.