The PIPE function [python tips]

With pandas, the .pipe method is a good way to improve the readability of your code and keep a Python script succinct.

To avoid repeating transformation lines or nesting function calls, .pipe lets you lay out a Python script more clearly. It must be applied with functions that expect a pd.Series or a DataFrame; the object is passed to the function as its first argument.

The signature is DataFrame.pipe(func, *args, **kwargs).
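As a minimal sketch of what this signature means in practice: df.pipe(func, *args, **kwargs) is equivalent to calling func(df, *args, **kwargs). The helpers add_offset and scale_by, and the sample data, are hypothetical names made up for this illustration. The second call uses the tuple form of .pipe, for functions that expect the data under a keyword rather than as the first argument.

```python
import pandas as pd

def add_offset(df, offset, scale=1.0):
    # Hypothetical helper: shift every value by `offset`, then scale.
    return (df + offset) * scale

def scale_by(factor, data):
    # Hypothetical helper that takes the data as its SECOND argument.
    return data * factor

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Positional and keyword arguments are forwarded to the function.
piped = df.pipe(add_offset, 10, scale=2.0)
direct = add_offset(df, 10, scale=2.0)
same = piped.equals(direct)

# Tuple form: (callable, name of the keyword that should receive the data).
piped_tuple = df.pipe((scale_by, "data"), 3)
```

The tuple form is handy when a function you do not control does not take the DataFrame first.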

Enough talk, let's practice. Here is the example from the .pipe documentation.

def subtract_federal_tax(df):
    return df * 0.9
def subtract_state_tax(df, rate):
    return df * (1 - rate)
def subtract_national_insurance(df, rate, rate_increase):
    new_rate = rate + rate_increase
    return df * (1 - new_rate)        

There are three ways to chain these functions:

  • Create new intermediate output variables:

df1 = subtract_federal_tax(df)
df2 = subtract_state_tax(df1, rate)
....
df_final = subtract_national_insurance(dfn, rate, rate_increase)        

  • Use nested functions as follows:

subtract_national_insurance(
    subtract_state_tax(subtract_federal_tax(df), rate=0.12),
    rate=0.05,
    rate_increase=0.02)          

  • Or use .pipe in a single chained expression:

(
    df.pipe(subtract_federal_tax)
    .pipe(subtract_state_tax, rate=0.12)
    .pipe(subtract_national_insurance, rate=0.05, rate_increase=0.02)
)        

*args and **kwargs are the positional and keyword arguments, respectively, that .pipe forwards to the function.
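To check that the nested and piped versions really compute the same thing, here is a small self-contained sketch. The three functions are the ones from the documentation example above; the salaries DataFrame is hypothetical sample data added only for this illustration.

```python
import pandas as pd

def subtract_federal_tax(df):
    return df * 0.9

def subtract_state_tax(df, rate):
    return df * (1 - rate)

def subtract_national_insurance(df, rate, rate_increase):
    new_rate = rate + rate_increase
    return df * (1 - new_rate)

# Hypothetical sample data, not taken from the article.
salaries = pd.DataFrame({"salary": [1000.0, 2000.0]})

# Nested calls: read from the inside out.
nested = subtract_national_insurance(
    subtract_state_tax(subtract_federal_tax(salaries), rate=0.12),
    rate=0.05,
    rate_increase=0.02,
)

# Piped calls: read from top to bottom, in execution order.
piped = (
    salaries.pipe(subtract_federal_tax)
    .pipe(subtract_state_tax, rate=0.12)
    .pipe(subtract_national_insurance, rate=0.05, rate_increase=0.02)
)

results_match = piped.equals(nested)
```

Both versions apply the same operations in the same order, so the only difference is how the code reads.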

Here is a data science example from Tom Augspurger that uses decorators. His article is very useful for method chaining in general.

from functools import wraps
import logging

import pandas as pd

def log_shape(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        logging.info("%s,%s" % (func.__name__, result.shape))
        return result
    return wrapper

def log_dtypes(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        logging.info("%s,%s" % (func.__name__, result.dtypes))
        return result
    return wrapper


@log_shape
@log_dtypes
def load(fp):
    df = pd.read_csv(fp, index_col=0, parse_dates=True)
    return df  # the decorators need the result to log its shape and dtypes

@log_shape
@log_dtypes
def update_events(df, new_events):
    df.loc[new_events.index, 'foo'] = new_events
    return df        
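Decorated functions like these combine naturally with .pipe, since each step logs its result as the chain runs. The sketch below is an assumption about how one might wire them together, not code from the original article: drop_missing, add_total, and the raw DataFrame are hypothetical names and data, built in memory so the example runs without a CSV file.

```python
import logging
from functools import wraps

import pandas as pd

logging.basicConfig(level=logging.INFO)

def log_shape(func):
    # Log the shape of whatever the wrapped function returns.
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        logging.info("%s,%s", func.__name__, result.shape)
        return result
    return wrapper

@log_shape
def drop_missing(df):
    # Hypothetical step: remove rows that contain missing values.
    return df.dropna()

@log_shape
def add_total(df, columns):
    # Hypothetical step: add a row-wise sum of the given columns.
    return df.assign(total=df[columns].sum(axis=1))

# Hypothetical in-memory data standing in for a loaded file.
raw = pd.DataFrame({"x": [1.0, None, 3.0], "y": [4.0, 5.0, 6.0]})

clean = raw.pipe(drop_missing).pipe(add_total, columns=["x", "y"])
```

Because each step returns a new DataFrame instead of modifying its input, raw is left untouched and every stage of the chain can be logged or tested on its own.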

To read more about pipe, the links below could be useful.

