The PIPE function [python tips]
With pandas, the .pipe method can make your code more readable and more succinct.
To avoid repeating lines of transformations or nesting function calls, .pipe lets you lay out a Python script more clearly. It must be used with functions that expect a pd.Series or a DataFrame.
The signature is DataFrame.pipe(func, *args, **kwargs).
Enough talk, let's practice. Here is the example from the .pipe documentation.
def subtract_federal_tax(df):
    return df * 0.9

def subtract_state_tax(df, rate):
    return df * (1 - rate)

def subtract_national_insurance(df, rate, rate_increase):
    new_rate = rate + rate_increase
    return df * (1 - new_rate)
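Without .pipe, there are two options in this case. The first is to store each intermediate result in its own variable:
df1 = subtract_federal_tax(df)
df2 = subtract_state_tax(df1, rate)
....
df_final = subtract_national_insurance(dfn, rate, rate_increase)
The second is to nest the function calls, as follows: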
subtract_national_insurance(
    subtract_state_tax(subtract_federal_tax(df), rate=0.12),
    rate=0.05,
    rate_increase=0.02)
Or chain the calls with .pipe in a single expression:
(
    df.pipe(subtract_federal_tax)
    .pipe(subtract_state_tax, rate=0.12)
    .pipe(subtract_national_insurance, rate=0.05, rate_increase=0.02)
)
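To check that the nested and chained styles give the same result, here is a small self-contained sketch. The three functions are the ones from the documentation example; the DataFrame, its column names and its values are invented for this illustration.

```python
import pandas as pd


def subtract_federal_tax(df):
    return df * 0.9


def subtract_state_tax(df, rate):
    return df * (1 - rate)


def subtract_national_insurance(df, rate, rate_increase):
    new_rate = rate + rate_increase
    return df * (1 - new_rate)


# Hypothetical data, made up for this example.
df = pd.DataFrame({"salary": [1000.0, 2000.0], "bonus": [100.0, 200.0]})

# Nested calls: they have to be read from the inside out.
nested = subtract_national_insurance(
    subtract_state_tax(subtract_federal_tax(df), rate=0.12),
    rate=0.05,
    rate_increase=0.02,
)

# Chained calls: they read from top to bottom, in execution order.
piped = (
    df.pipe(subtract_federal_tax)
    .pipe(subtract_state_tax, rate=0.12)
    .pipe(subtract_national_insurance, rate=0.05, rate_increase=0.02)
)

# Both styles apply the same operations in the same order,
# so this raises no error.
pd.testing.assert_frame_equal(nested, piped)
print(piped)
```

The result is the same either way; what changes is how easy it is to follow the order of the steps.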
*args and **kwargs are the positional and keyword arguments, respectively, that .pipe passes on to func after the DataFrame.
Here is a data science example from Tom Augspurger, using decorators. His article is very useful, and covers method chaining in general.
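As a minimal sketch of how the arguments are forwarded: the DataFrame normally goes in as the first argument of func, and everything else follows it. When a function expects the data somewhere else, pandas also accepts a tuple (callable, data_keyword) naming the keyword that should receive the DataFrame. The functions scale and scale_data_last below are hypothetical, written only for this illustration.

```python
import pandas as pd


def scale(df, factor, offset=0.0):
    # The DataFrame arrives as the first positional argument.
    return df * factor + offset


def scale_data_last(factor, data):
    # Hypothetical function that expects the DataFrame under the keyword "data".
    return data * factor


df = pd.DataFrame({"x": [1.0, 2.0, 3.0]})

# Equivalent to scale(df, 2, offset=1.0).
a = df.pipe(scale, 2, offset=1.0)

# Tuple form: (callable, name of the keyword that receives the DataFrame).
# Equivalent to scale_data_last(10, data=df).
b = df.pipe((scale_data_last, "data"), 10)

print(a["x"].tolist())  # [3.0, 5.0, 7.0]
print(b["x"].tolist())  # [10.0, 20.0, 30.0]
```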
from functools import wraps
import logging

import pandas as pd

def log_shape(func):
    # Log the shape of the DataFrame returned by func.
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        logging.info("%s,%s" % (func.__name__, result.shape))
        return result
    return wrapper

def log_dtypes(func):
    # Log the dtypes of the DataFrame returned by func.
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        logging.info("%s,%s" % (func.__name__, result.dtypes))
        return result
    return wrapper

@log_shape
@log_dtypes
def load(fp):
    df = pd.read_csv(fp, index_col=0, parse_dates=True)
    return df  # the decorators need the DataFrame back in order to log it

@log_shape
@log_dtypes
def update_events(df, new_events):
    df.loc[new_events.index, 'foo'] = new_events
    return df
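Decorated functions like these fit naturally into a .pipe chain, since each step then logs its own output. The sketch below reuses the log_shape idea on a tiny in-memory DataFrame; the steps drop_missing and add_total and the data are hypothetical, chosen only so the example runs without a CSV file.

```python
import logging
from functools import wraps

import pandas as pd

# Show INFO messages on the console.
logging.basicConfig(level=logging.INFO)


def log_shape(func):
    # Log the shape of the DataFrame returned by func.
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        logging.info("%s,%s", func.__name__, result.shape)
        return result
    return wrapper


@log_shape
def drop_missing(df):
    # Hypothetical step: remove rows that contain a missing value.
    return df.dropna()


@log_shape
def add_total(df):
    # Hypothetical step: add a column with the sum of "a" and "b".
    return df.assign(total=df["a"] + df["b"])


# Made-up data with one missing value in column "a".
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [4.0, 5.0, 6.0]})

# Each step logs its name and the shape of what it returns.
result = df.pipe(drop_missing).pipe(add_total)
print(result)
```

Here the log shows the shape going from (2, 2) after drop_missing to (2, 3) after add_total, which makes it easy to spot the step where rows or columns appear or disappear.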
To read more about pipe, the links below could be useful.