Vectorized Functions in R and Python

Data analytics tools such as the popular open-source languages R and Python often have nuanced functions and procedures that make application code more efficient, scalable, and maintainable. These include vectorized functions, which let analysts manipulate large serialized objects in single calls.

R

A recent StackOverflow R question posted by @commissar-vasili-karlovic shows the familiar path newcomers to R take with multi-step procedures: nested for loops (available in most general-purpose languages) build large data objects iteratively by row and column indexing, as shown below:

a1 <- data.frame(); b1 <- data.frame(); rs <- data.frame()

k <- ncol(company)/levels
l <- 1 - k
for (j in 1:levels) {
  l <- l + k
  k <- k + k
  for (i in l:k) {
    mod <- lm(company[,i] ~ benchmark[,j])

    a1[i,j] <- mod$coefficients[1]
    b1[i,j] <- mod$coefficients[2]
    rs[i,j] <- summary(mod)$adj.r.squared
  }
}

table <- data.frame('Alpha_Coef' = a1, 'Beta_Coef' = b1, 
                    'Adj.R_Squared' = rs)

My answer (@parfait) suggested using vectorized functions such as Map(), lapply(), and do.call() to manipulate entire objects in single calls:

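# Fit one regression for a company/benchmark column pair and return its statistics
reghandle <- function(x, y){
    mod <- lm(company[[x]] ~ benchmark[[y]])

    return(list(Alpha_coef = unname(mod$coefficients[1]),
                Beta_coef = unname(mod$coefficients[2]),
                Adj.R_Squared = unname(summary(mod)$adj.r.squared)))
}

# Map() calls reghandle elementwise over the paired column names
# (benchmarknames is not defined here; it is assumed to be a character vector
# of benchmark column names matched to names(company))
tablelist <- Map(reghandle, names(company), benchmarknames)
# Convert each result to a one-row data frame and stack the rows in one call
table <- do.call(rbind, lapply(tablelist, data.frame))
table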

Python

In another example, from a StackOverflow Python pandas question, @patthebug appends rows iteratively in a for loop from a list source (again reminiscent of most programming languages) into a pandas dataframe:

finalResults = pd.DataFrame({'Concept1': itemsets_dct[0][0][0], 
                             'Concept2': itemsets_dct[0][0][1], 
                             'Concept3': itemsets_dct[0][0][2], 
                             'Concept4': itemsets_dct[0][0][3], 
                             'Count': itemsets_dct[0][1]}, index=[0])

for i in range(1,len(itemsets_dct)):
    tempResult = pd.DataFrame({'Concept1': itemsets_dct[i][0][0], 
                               'Concept2': itemsets_dct[i][0][1], 
                               'Concept3': itemsets_dct[i][0][2], 
                               'Concept4': itemsets_dct[i][0][3], 
                               'Count': itemsets_dct[i][1]}, index=[i])
    finalResults.append(tempResult)

My answer (@parfait) suggested converting the nested list into a list of dictionaries using a list comprehension, which is then cast to a dataframe in one call:

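import pandas as pd

# Build one dictionary per item: four concepts plus the count
dfDict = [{'Concept1': i[0][0],
           'Concept2': i[0][1],
           'Concept3': i[0][2],
           'Concept4': i[0][3],
           'Count': i[1]} for i in itemsets_dct]

# Cast the list of dictionaries to a dataframe in a single call
finalResults = pd.DataFrame(dfDict)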
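
To try the single-call approach end to end, here is a minimal sketch on made-up data. The sample values and the helper name build_frame are hypothetical. The shape of itemsets_dct (a sequence of pairs, each holding four concepts and a count) is only inferred from the indexing used above, so adjust it if the real data differs.

```python
import pandas as pd

# Hypothetical sample shaped like itemsets_dct: (four concepts, count) pairs
itemsets_dct = [
    (("a", "b", "c", "d"), 5),
    (("e", "f", "g", "h"), 3),
    (("i", "j", "k", "l"), 8),
]


def build_frame(itemsets):
    """Build the dataframe in one constructor call from a list of dictionaries."""
    rows = [
        {
            # Concept1..Concept4, named by position within the concepts tuple
            **{f"Concept{n + 1}": concept for n, concept in enumerate(concepts)},
            "Count": count,
        }
        for concepts, count in itemsets
    ]
    return pd.DataFrame(rows)


final_results = build_frame(itemsets_dct)
print(final_results)
```

Because every row is collected in a plain Python list first, the dataframe is constructed only once, rather than once per iteration as in the loop version.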
