Vectorized Functions in R and Python
Data analytics tools such as the popular open source languages R and Python often provide nuanced functions and procedures that make application code more efficient, scalable, and maintainable. Among these are vectorized methods, which let analysts manipulate entire data objects in single calls.
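As a small illustration of what "vectorized" means in practice, here is a minimal sketch in Python. NumPy is used only for illustration, and the variable names are hypothetical. It squares five values first with an explicit loop and then with a single array expression.

```python
import numpy as np

values = np.arange(1, 6)  # the integers 1 through 5

# Iterative approach: visit each element and grow a list one item at a time.
squares_loop = []
for v in values:
    squares_loop.append(int(v) ** 2)

# Vectorized approach: one expression applied to the whole array at once.
squares_vec = values ** 2
```

Both approaches produce the same numbers; the vectorized form simply states the operation once for the entire object instead of once per element.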
R
A recent StackOverflow R question by @commissar-vasili-karlovic shows the familiar path newcomers to R take with multi-step procedures: nested for loops (available in most general purpose languages) build large data objects iteratively by row and column indexing, as shown below:
a1 <- data.frame(); b1 <- data.frame(); rs <- data.frame()
k <- ncol(company)/levels
l <- 1 - k
for (j in 1:levels) {
  l <- l + k
  k <- k + k
  for (i in l:k) {
    mod <- lm(company[,i] ~ benchmark[,j])
    a1[i,j] <- mod$coefficients[1]
    b1[i,j] <- mod$coefficients[2]
    rs[i,j] <- summary(mod)$adj.r.squared
  }
}
table <- data.frame('Alpha_Coef' = a1, 'Beta_Coef' = b1,
                    'Adj.R_Squared' = rs)
My answer (@parfait) suggested using vectorized functions such as Map(), lapply(), and do.call() to manipulate entire objects in single calls:
reghandle <- function(x, y){
  mod <- lm(company[[x]] ~ benchmark[[y]])
  return(list(Alpha_coef = unname(mod$coefficients[1]),
              Beta_coef = unname(mod$coefficients[2]),
              Adj.R_Squared = unname(summary(mod)$adj.r.squared)))
}
tablelist <- Map(reghandle, names(company), benchmarknames)
table <- do.call(rbind, lapply(tablelist, data.frame))
table
Python
In another example, from a StackOverflow Python pandas question, @patthebug appends rows iteratively in a for loop from a list source (again reminiscent of most programming languages) into a pandas dataframe:
finalResults = pd.DataFrame({'Concept1': itemsets_dct[0][0][0],
                             'Concept2': itemsets_dct[0][0][1],
                             'Concept3': itemsets_dct[0][0][2],
                             'Concept4': itemsets_dct[0][0][3],
                             'Count': itemsets_dct[0][1]}, index=[0])
for i in range(1,len(itemsets_dct)):
    tempResult = pd.DataFrame({'Concept1': itemsets_dct[i][0][0],
                               'Concept2': itemsets_dct[i][0][1],
                               'Concept3': itemsets_dct[i][0][2],
                               'Concept4': itemsets_dct[i][0][3],
                               'Count': itemsets_dct[i][1]}, index=[i])
    # append() returns a new dataframe, so this result is discarded;
    # the method was also removed in pandas 2.0
    finalResults.append(tempResult)
My answer (@parfait) suggested converting the nested list into a list of dictionaries using a list comprehension, which is then cast to a dataframe in one call:
dfDict = [{'Concept1': i[0][0],
           'Concept2': i[0][1],
           'Concept3': i[0][2],
           'Concept4': i[0][3],
           'Count': i[1]} for i in itemsets_dct]
finalResults = pd.DataFrame(dfDict)
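To try this approach end to end, here is a minimal self-contained sketch. The sample itemsets_dct is hypothetical: it only assumes the shape implied by the indexing in the question, namely a list of pairs holding four concepts and a count. The names rows, final_results, frames, and looped are likewise illustrative.

```python
import pandas as pd

# Hypothetical sample data: each entry is ((concept1, ..., concept4), count).
itemsets_dct = [
    (("a", "b", "c", "d"), 5),
    (("a", "b", "c", "e"), 3),
    (("b", "c", "d", "e"), 2),
]

# Build every row as a plain dictionary, then construct the dataframe once.
rows = [{"Concept1": concepts[0],
         "Concept2": concepts[1],
         "Concept3": concepts[2],
         "Concept4": concepts[3],
         "Count": count} for concepts, count in itemsets_dct]
final_results = pd.DataFrame(rows)

# If one-row dataframes are unavoidable, collect them in a list and
# concatenate a single time rather than growing a dataframe inside the loop.
frames = [pd.DataFrame(row, index=[i]) for i, row in enumerate(rows)]
looped = pd.concat(frames)
```

Both final_results and looped hold the same three rows. The first form avoids creating any intermediate dataframes, which is why it tends to scale better as the source list grows.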