Data Science articles I have read (w/c 15/11/21)
Another cool visualisation article for bookmarking. As you can tell, I was busy looking at data this week. This post explains how to show dashboards and tables in Jupyter notebooks. It’s not completely straightforward, and I guess if you have access to Looker, Power BI or other visualisation/reporting software you wouldn’t bother with it.
The article shows a worked example of time series clustering with smart meter load data. There are links to a Python notebook and further reading. The author looked at two kinds of clusters: daily load profiles and weekly load profiles. It’s interesting to see clear differences in the patterns. It’s a great starting point when exploring energy-related time series data.
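To make the idea of clustering daily load profiles concrete, here is a minimal sketch of my own, not the article's method. It assumes plain k-means on synthetic 24-hour profiles (one group with a morning peak, one with an evening peak). The data, the `kmeans` helper and the farthest-first initialisation are all assumptions for illustration; real smart meter data would need cleaning and a more careful choice of the number of clusters.

```python
import numpy as np


def kmeans(profiles, k, n_iter=20):
    """Plain k-means on the rows of `profiles`; returns (labels, centroids)."""
    # Farthest-first initialisation keeps this sketch deterministic.
    centroids = [profiles[0]]
    for _ in range(k - 1):
        dists = np.min(
            [np.linalg.norm(profiles - c, axis=1) for c in centroids], axis=0
        )
        centroids.append(profiles[np.argmax(dists)])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        # Assign every profile to its nearest centroid.
        d = np.linalg.norm(profiles[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its members (skip empty clusters).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = profiles[labels == j].mean(axis=0)
    return labels, centroids


# Synthetic data (an assumption): 10 households peaking around 08:00
# and 10 households peaking around 19:00, with a little noise.
rng = np.random.default_rng(0)
hours = np.arange(24)
morning = np.exp(-0.5 * ((hours - 8) / 2.0) ** 2)
evening = np.exp(-0.5 * ((hours - 19) / 2.0) ** 2)
profiles = np.vstack([
    morning + 0.05 * rng.standard_normal((10, 24)),
    evening + 0.05 * rng.standard_normal((10, 24)),
])

# Normalise each day so clusters reflect the shape of the load, not its size.
profiles = profiles / profiles.sum(axis=1, keepdims=True)

labels, centroids = kmeans(profiles, k=2)
peak_hours = [int(c.argmax()) for c in centroids]
print(labels, peak_hours)
```

The same approach would work for weekly profiles by using 168 hourly values (or 7 daily totals) per row instead of 24.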
Nice overview of exploratory data analysis (EDA) charts in Python. The title suggests that some sort of software is involved, but it turns out the article lays out ‘ready-made’ charts from different Python libraries, such as seaborn, that are useful for EDA. Definitely a nice article to bookmark for when you are stuck for ways to view your data.
One to bookmark: this is a great introduction to Matplotlib figures in Python. It explains in some detail what all the parts of a figure are called and how to change them, such as minor and major ticks. There are some great graphs in there that can be useful for practice, such as plotting predictions effectively.
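As a small illustration of the major and minor ticks mentioned above, here is a sketch of my own (not taken from the article). It uses Matplotlib's `MultipleLocator` to place major ticks every 2 units and minor ticks every 0.5 units on the x-axis; the data and the spacing values are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

# Arbitrary example data (an assumption for illustration).
x = list(range(11))
y = [v ** 2 for v in x]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, y, marker="o")

# Major ticks every 2 units, minor ticks every 0.5 units on the x-axis.
ax.xaxis.set_major_locator(MultipleLocator(2))
ax.xaxis.set_minor_locator(MultipleLocator(0.5))

# Style the two kinds of tick marks differently so they are easy to tell apart.
ax.tick_params(axis="x", which="major", length=8)
ax.tick_params(axis="x", which="minor", length=4)

# A faint grid on the minor ticks, a stronger one on the major ticks.
ax.grid(which="minor", axis="x", alpha=0.2)
ax.grid(which="major", axis="x", alpha=0.6)

ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("Major and minor ticks")
```

In a notebook you would drop the `matplotlib.use("Agg")` line and the figure would render inline.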
This is a fascinating report about using a health app and the pitfalls of its recommendations. The author describes how, after her initial enthusiasm for the app waned, she stopped putting much effort into recording her daily eating habits. She would often just copy foods and amounts from the previous day. As a result there was no variation in the data, and the algorithm suggested she stop eating so much cauliflower, even though we would assume eating cauliflower every day shouldn’t be an issue and should actually be encouraged. As Data Scientists, how do we deal with bad data? How can we ensure that our algorithm gives useful recommendations even if the data is sparse or terrible?
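One simple guard against this kind of problem is to check data quality before recommending anything. The sketch below is purely hypothetical and has nothing to do with how the app in the report works: it measures how many days in a food diary are exact copies of the previous day and withholds recommendations when that share is too high. The diary structure, the `copied_day_fraction` helper and the 0.5 threshold are all assumptions.

```python
def copied_day_fraction(diary):
    """Share of days whose entries exactly match the previous day's entries."""
    if len(diary) < 2:
        return 0.0
    repeats = sum(1 for prev, cur in zip(diary, diary[1:]) if prev == cur)
    return repeats / (len(diary) - 1)


# Hypothetical diary: one dict per day, mapping food to grams eaten.
diary = [
    {"cauliflower": 200, "rice": 150},
    {"cauliflower": 200, "rice": 150},  # copied from the day before
    {"cauliflower": 200, "rice": 150},  # copied again
    {"lentils": 180, "bread": 60},
    {"lentils": 180, "bread": 60},      # copied from the day before
]

MAX_COPIED = 0.5  # hypothetical threshold, would need tuning on real data

fraction = copied_day_fraction(diary)
trust_recommendations = fraction <= MAX_COPIED
print(f"{fraction:.0%} of days are copies; trust recommendations: {trust_recommendations}")
```

Here three of the four day-to-day transitions are copies, so the sketch would hold back recommendations rather than draw conclusions from data with almost no variation.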
This is an interview with Luis Fernando García, who is fighting the Mexican government’s introduction of a universal surveillance system that collects personal data from its citizens. The stated aim of the digital ID is to combat crime, but the interviewee argues that government and crime are in business together and that an ID won’t solve any crime issues. Concerns are also raised about rich countries’ interest in such data collection, especially because the scheme is funded by the World Bank, which recipient countries see as an instrument of colonisation and suppression. We can also see attempts at digital IDs in the West, with Covid passports as a starting point.
As Data Scientists, I think we need to be selective about our jobs, and we have a responsibility not to work with ethically questionable data even if it would be interesting and satisfy our personal curiosity. If the results of our work are used to suppress and discriminate, we need to speak up or vote with our feet.