Data Analysts, Stop Ignoring Pandas Series
When we talk about pandas, most of the conversation revolves around the pandas DataFrame, and barely any attention is given to the pandas Series. This is understandable considering how powerful a DataFrame is. While DataFrames excel at handling tabular data, a Series offers a versatile way to work with one-dimensional labeled data. Series are the foundation of DataFrames (each column of a DataFrame is a Series) and serve as the building blocks for many data manipulation tasks. In this article, we are going to shine some light on the pandas Series. We'll explore how to create one, its key operations, and how it helps you work efficiently with your data.
Understanding the Pandas Series
At its core, a Series is simply a sequence of data. Think of it as a single column in a spreadsheet: it holds a collection of elements (data points), each with a corresponding label (index). Index labels can be unique, but they do not have to be. Unique labels allow efficient, unambiguous retrieval by label, while duplicate labels mean that a single lookup can return several values, which requires extra care when manipulating the Series. A Series can be created from data structures such as lists, NumPy arrays, and dictionaries. Below, we create a Series from a list of names:
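The original code for this step is not shown, so here is a minimal sketch; the names in the list are hypothetical placeholders:

```python
import pandas as pd

# Hypothetical list of names (placeholders, not the article's original data)
names = ["Alice", "Bob", "Charlie", "Diana"]

# Create a Series from the list; pandas assigns a default integer index 0, 1, 2, ...
names_series = pd.Series(names)
print(names_series)
```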
By default, the data points in the Series are indexed with integers starting from zero. However, the Series constructor has an index parameter that we can use to assign custom labels to the data points.
In the code below, we pass alphabetic characters as the index. For this to work, the length of the custom index must match the number of data points. If there is a mismatch (more data points than labels, or vice versa), pandas raises a ValueError.
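A minimal sketch of a custom index, reusing the same hypothetical names. The try/except block is only there to show the ValueError that a length mismatch triggers; the mismatch_raised flag is an illustrative name:

```python
import pandas as pd

names = ["Alice", "Bob", "Charlie", "Diana"]  # hypothetical data

# One alphabetic label per data point, passed through the index parameter
names_series = pd.Series(names, index=["a", "b", "c", "d"])
print(names_series)

# Four data points but only three labels: pandas refuses to build the Series
mismatch_raised = False
try:
    pd.Series(names, index=["a", "b", "c"])
except ValueError as err:
    mismatch_raised = True
    print(f"ValueError: {err}")
```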
The data type of the Series above is "object." pandas infers this from the data, which consists of strings. If the data consists of integers, the inferred data type is int64. See below:
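A short sketch comparing inferred dtypes, using hypothetical sample values. The object dtype for strings reflects pandas 2.x behavior; pandas versions that enable the newer default string dtype may report a string dtype instead:

```python
import pandas as pd

# String data: pandas 2.x infers the generic object dtype
names_series = pd.Series(["Alice", "Bob", "Charlie"])
print(names_series.dtype)  # object (pandas 2.x)

# Integer data: pandas infers int64
numbers = pd.Series([10, 20, 30, 40])
print(numbers.dtype)  # int64
```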
However, we can also use the dtype parameter to set the data type of the Series. In the code below, we have set the Series to a float data type.
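A minimal sketch, assuming hypothetical values 20, 30, 40, and 50 labeled "a" to "d". Passing dtype=float stores them as float64 even though the inputs are integers:

```python
import pandas as pd

# Integer inputs, but dtype=float forces a floating-point Series
scores = pd.Series([20, 30, 40, 50], index=["a", "b", "c", "d"], dtype=float)
print(scores)
print(scores.dtype)  # float64
```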
While a Series usually holds a single data type, mixed data makes pandas fall back to the object data type, the only data type that can hold any kind of Python object. In the example below, we have a mix of integers and strings. You can see that the data type is object.
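A minimal sketch with a hypothetical mix of integers and strings:

```python
import pandas as pd

# Integers and strings together: no single numeric or string dtype fits,
# so pandas stores the elements as generic Python objects
mixed = pd.Series([1, "two", 3, "four"])
print(mixed)
print(mixed.dtype)  # object
```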
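Please note that extensive mixing of data types can lead to performance issues and unexpected behavior: an object Series stores references to individual Python objects rather than a compact array of values, so vectorized operations on it are slower. It is generally recommended to keep a Series to a single type of data.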
Creating a Series from a Dictionary
We can also create a Series from a dictionary. By default, the keys of the dictionary will be the index, and the values will be the data points. See below:
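A minimal sketch, assuming a hypothetical dictionary describing a person. Only the "gender": "Male" pair comes from the article; the other keys and values are placeholders:

```python
import pandas as pd

# Hypothetical record: keys become index labels, values become data points
person = {"name": "John", "age": 30, "gender": "Male", "city": "London"}

person_series = pd.Series(person)
print(person_series)
```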
If you want to create a Series with only specific key-value pairs, or in a different order, pass a custom index containing the keys you want to include. See the example below:
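Here is a sketch using a hypothetical dictionary (only "gender": "Male" is from the article). The index lists the keys to keep, in the desired order, and leaves out "gender". The second Series shows what happens when the index contains a label (the made-up "country") that is not a key:

```python
import pandas as pd

person = {"name": "John", "age": 30, "gender": "Male", "city": "London"}

# Only the listed keys are kept, in the order given; "gender" is dropped
subset = pd.Series(person, index=["city", "name", "age"])
print(subset)

# A label that is not a dictionary key is kept, but its value is NaN
with_missing = pd.Series(person, index=["name", "country"])
print(with_missing)
```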
You can see in the output that the key "gender" and its value "Male" are not in the Series. Remember, for this to work, the custom index should match the keys you want to include: any label in the index that is not a key in the dictionary still appears in the Series, but with a NaN value.
Indexing and Slicing pandas Series
The Series supports both label-based and positional indexing. You can access elements by their index label or integer position. In the example below, we use the index label "a" to access 20.0 from the Series.
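A minimal sketch, assuming a hypothetical float Series whose label "a" holds 20.0. It shows label-based access (plain brackets and .loc) alongside positional access with .iloc:

```python
import pandas as pd

# Hypothetical float Series labeled "a" to "d"
my_series = pd.Series([20, 30, 40, 50], index=["a", "b", "c", "d"], dtype=float)

# Label-based access
print(my_series["a"])      # 20.0
print(my_series.loc["a"])  # 20.0, explicit label-based accessor

# Positional access: the first element
print(my_series.iloc[0])   # 20.0
```

Using .iloc for positions is the clearer choice: with a non-integer index, recent pandas versions emit a FutureWarning when an integer inside plain square brackets (such as my_series[0]) is treated as a position.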
Additionally, pandas Series supports slicing operations, allowing you to extract subsets of data easily. Let's say we want to extract a subset with numbers 20 to 40 from the above series. Here is how we can use label slicing to extract the subset:
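A sketch using a hypothetical Series with values 20.0 to 50.0, labeled "a" to "d". Unlike regular Python slicing, a label slice includes its end label:

```python
import pandas as pd

my_series = pd.Series([20, 30, 40, 50], index=["a", "b", "c", "d"], dtype=float)

# Label slicing includes BOTH endpoints: "a", "b", and "c"
subset = my_series["a":"c"]
print(subset)

# Positional equivalent: the stop position 3 is excluded, as in Python slicing
same_subset = my_series.iloc[0:3]
print(same_subset)
```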
Using Filtering to Extract a Subset
We can extract specific subsets of data based on conditions. We can leverage Boolean expressions or comparison operators to define filtering conditions. Let's say we want to extract a subset of numbers greater than 10 from the Series. Here is how we can do it using filtering:
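A minimal sketch with hypothetical values. It builds the Boolean mask explicitly and then uses it to filter; the last lines show that conditions can be combined with &, wrapping each comparison in parentheses:

```python
import pandas as pd

# Hypothetical values for illustration
my_series = pd.Series([5, 12, 8, 20, 15])

# The comparison returns a Boolean mask: one True/False per element
mask = my_series > 10
print(mask)

# Indexing with the mask keeps only the elements where the mask is True
greater_than_10 = my_series[mask]
print(greater_than_10)

# Combine conditions with & (parentheses are required around each comparison)
between_10_and_20 = my_series[(my_series > 10) & (my_series < 20)]
print(between_10_and_20)
```

Note that the filtered Series keeps the original index labels (1, 3, 4 for the values greater than 10) rather than renumbering from zero.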
In this code, my_series > 10 returns a Boolean mask: a Series of True/False values with one entry per element. my_series[my_series > 10] then uses this mask to filter the original Series. Only the elements whose corresponding value in the mask is True are included in the subset. In this case, that means only the values greater than 10.
Wrap-Up
The pandas Series is a powerful tool for manipulating one-dimensional labeled data, which makes it a great fit for many data analysis tasks. We've explored creating a Series from various data structures, like lists and dictionaries. We've also covered indexing, slicing, and filtering techniques to extract specific subsets of data based on labels, positions, and conditions.
Now that you're equipped with these foundational concepts, why not explore further? Experiment with creating your own Series, practice filtering operations, and discover the vast potential of Series in your data analysis endeavors. You can also attempt the Day 9 challenge from the book "50 Days of Data Analysis with Python: The Ultimate Challenges Book for Beginners." Thanks for reading.