How to Wrangle Data with Python
Introduction
Much of the programming work in data analysis and modeling goes into data preparation: loading, cleaning, transforming, and rearranging. Occasionally, the data stored in files or databases is not in the form that a data processing application needs.
Many people choose to do ad hoc processing of data from one form to another, using a general-purpose programming language such as Python, Perl, R, or Java, or UNIX text processing tools like sed or awk. Luckily, pandas together with the Python standard library offers a high-level, flexible, and high-performance set of core manipulations and algorithms that let us wrangle data into the right form without much trouble.
Description
Importance of Data Wrangling
Combining and Merging Data Sets
Database-style DataFrame Merges
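Example (the sessions in this article assume import numpy as np, import pandas as pd, and from pandas import DataFrame):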
In [15]: df1 = DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
   ....:                  'data1': range(7)})
In [16]: df2 = DataFrame({'key': ['a', 'b', 'd'],
   ....:                  'data2': range(3)})
In [17]: df1
Out[17]:
   data1 key
0      0   b
1      1   b
2      2   a
3      3   c
4      4   a
5      5   a
6      6   b
In [18]: df2
Out[18]:
   data2 key
0      0   a
1      1   b
2      2   d
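When no key is given, merge joins on the overlapping column names (here, key) and performs an inner join by default:
In [19]: pd.merge(df1, df2)
Out[19]:
   data1 key  data2
0      2   a      0
1      4   a      0
2      5   a      0
3      0   b      1
4      1   b      1
5      6   b      1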
It is good practice to name the key column explicitly with on:
In [20]: pd.merge(df1, df2, on='key')
Out[20]:
   data1 key  data2
0      2   a      0
1      4   a      0
2      5   a      0
3      0   b      1
4      1   b      1
5      6   b      1
If the key columns have different names in the two objects, name them separately with left_on and right_on:
In [21]: df3 = DataFrame({'lkey': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
   ....:                  'data1': range(7)})
In [22]: df4 = DataFrame({'rkey': ['a', 'b', 'd'],
   ....:                  'data2': range(3)})
In [23]: pd.merge(df3, df4, left_on='lkey', right_on='rkey')
Out[23]:
   data1 lkey  data2 rkey
0      2    a      0    a
1      4    a      0    a
2      5    a      0    a
3      0    b      1    b
4      1    b      1    b
5      6    b      1    b
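The inner join above drops the 'c' and 'd' rows because each key appears in only one of the two objects. An outer join takes the union of the keys and fills the missing values with NaN:
In [24]: pd.merge(df1, df2, how='outer')
Out[24]:
   data1 key  data2
0      2   a      0
1      4   a      0
2      5   a      0
3      0   b      1
4      1   b      1
5      6   b      1
6      3   c    NaN
7    NaN   d      2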
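The join types shown above can be compared side by side in a short script. This is a minimal sketch rather than part of the original sessions: the variable names inner, left, and outer are chosen here for illustration, and it assumes a pandas version in which pd.merge accepts the indicator argument, which adds a _merge column recording where each row came from. Column and row order in recent pandas versions may differ from the older output printed above.

```python
import pandas as pd

# Same example frames as in the sessions above.
df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                    'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'],
                    'data2': range(3)})

# how='inner' is the default: only keys present in both frames survive.
inner = pd.merge(df1, df2, on='key')

# how='left' keeps every row of df1, matched or not.
left = pd.merge(df1, df2, on='key', how='left')

# how='outer' keeps the union of keys; indicator=True adds a '_merge'
# column with the values 'left_only', 'right_only', or 'both'.
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)

print(len(inner), len(left), len(outer))
print(outer[['key', '_merge']])
```

The _merge column is a quick way to check which keys failed to match before deciding which join type a task needs.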
Merging on Index
In [36]: left1 = DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'],
   ....:                    'value': range(6)})
In [37]: right1 = DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])
In [38]: left1
Out[38]:
  key  value
0   a      0
1   b      1
2   a      2
3   a      3
4   b      4
5   c      5
In [39]: right1
Out[39]:
   group_val
a        3.5
b        7.0
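Passing right_index=True tells merge to use the index of right1 as its merge key, matched against the key column of left1:
In [40]: pd.merge(left1, right1, left_on='key', right_index=True)
Out[40]:
  key  value  group_val
0   a      0        3.5
2   a      2        3.5
3   a      3        3.5
1   b      1        7.0
4   b      4        7.0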
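Because the default is an inner join, the row with key 'c' is dropped above; an outer join keeps it:
In [41]: pd.merge(left1, right1, left_on='key', right_index=True, how='outer')
Out[41]:
  key  value  group_val
0   a      0        3.5
2   a      2        3.5
3   a      3        3.5
1   b      1        7.0
4   b      4        7.0
5   c      5        NaN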
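The same index-based lookup can also be written with the DataFrame.join method, which matches a column of the calling frame against the index of the other frame. The sketch below is an illustration rather than part of the original sessions; the names merged and joined are chosen here. Note that join defaults to a left join, so unlike the default merge it keeps the unmatched 'c' row.

```python
import pandas as pd

left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'],
                      'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])

# Inner merge: the 'key' column of left1 against the index of right1.
merged = pd.merge(left1, right1, left_on='key', right_index=True)

# join() with on='key' does the same lookup, but as a left join by
# default, so the 'c' row is kept with NaN in group_val.
joined = left1.join(right1, on='key')

print(merged)
print(joined)
```

Passing how='inner' to join drops the unmatched row again, giving the same rows as the default merge.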
Concatenating Along an Axis
NumPy's concatenate function joins raw arrays along an axis; with axis=1 the arrays are placed side by side:
In [58]: arr = np.arange(12).reshape((3, 4))
In [59]: arr
Out[59]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [60]: np.concatenate([arr, arr], axis=1)
Out[60]:
array([[ 0,  1,  2,  3,  0,  1,  2,  3],
       [ 4,  5,  6,  7,  4,  5,  6,  7],
       [ 8,  9, 10, 11,  8,  9, 10, 11]])
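For labeled pandas objects, the analogous operation is pd.concat. The following is a minimal sketch, not taken from the sessions above: the Series s1 and s2 and the result names are hypothetical examples. It shows both axes for a NumPy array and then the same idea for two Series whose indexes do not overlap.

```python
import numpy as np
import pandas as pd

arr = np.arange(12).reshape((3, 4))

# axis=0 stacks the arrays vertically, axis=1 places them side by side.
stacked_rows = np.concatenate([arr, arr], axis=0)   # shape (6, 4)
stacked_cols = np.concatenate([arr, arr], axis=1)   # shape (3, 8)

# Hypothetical Series with non-overlapping index labels.
s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])

# Default axis=0 glues values and index labels end to end.
long_result = pd.concat([s1, s2])

# axis=1 aligns on the union of the index labels and returns a
# DataFrame, with NaN wherever a Series has no value for a label.
wide_result = pd.concat([s1, s2], axis=1)

print(stacked_rows.shape, stacked_cols.shape)
print(long_result)
print(wide_result)
```

Unlike np.concatenate, pd.concat aligns on labels, which is why the axis=1 result contains missing values instead of raising a shape error.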
For more details, visit: https://www.technologiesinindustry4.com/2021/09/how-to-wrangle-the-data-with-python.html