Backtransformation

Dr. Saurav Das

Research Director | Farming Systems Trial | Rodale Institute | Soil Health, Biogeochemistry of Carbon & Nitrogen, Environmental Microbiology, and Data Science | Outreach & Extension | Vibe coding

Published: February 22, 2024

Backtransformation is the process of converting the results obtained from a transformed dataset back to the original scale of the data. This step is essential for interpretation and communication, especially when the audience is not familiar with the specific transformations used in the analysis.

Why Back-transform?

Interpretability: Data is often transformed to meet the assumptions of a statistical model. However, the results in the transformed scale may not be intuitive. Back-transforming allows results to be presented in the original, more understandable units.
Meaningful Conclusions: It helps in drawing conclusions that are meaningful in the context of the original data. For example, predicting a response in the original scale can be more relevant than in a logarithmic scale.

Common Backtransformation Methods

Inverse of Log Transformation: If you have used a natural-log transformation, y = log(x), back-transform by exponentiating the result: x = e^y, where y is the transformed value.
Inverse of Square Root Transformation: For a square root transformation, y = sqrt(x), square the result to back-transform: x = y^2.
Inverse of Box-Cox Transformation: The Box-Cox transformation is a bit more complex because it includes a parameter λ, and the backtransformation depends on its value: x = (λ·y + 1)^(1/λ) when λ ≠ 0, and x = e^y when λ = 0.
Other Specific Inverse Transformations: Depending on the transformation applied (e.g., reciprocal, logarithm base 10, arcsine), the inverse function corresponding to the original transformation is used.

How to Back-transform

1. Logarithmic Transformation

Backtransformation: Exponentiation (exp(x) for the natural log, 10^x for log base 10)
Used for: Normalizing right-skewed data
Common Transformations: Natural log (log), log base 10 (log10)

R Code:

# Log Transformation
log_transformed <- log(original_data)

# Backtransformation
backtransformed <- exp(log_transformed)
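
Two details are easy to miss here: a log10 transformation is undone with 10^x rather than exp(), and back-transforming the mean of the logged values gives the geometric mean, not the arithmetic mean. A minimal sketch, assuming a small made-up vector (the values are illustrative, not from any dataset):

```r
# Hypothetical right-skewed measurements (illustrative values only)
original_data <- c(1, 10, 100, 1000)

# log10 transformation and its inverse (10^x, not exp())
log10_transformed <- log10(original_data)
backtransformed <- 10^log10_transformed

# Back-transforming the mean of the logs yields the geometric mean,
# which is smaller than the arithmetic mean for skewed data
geometric_mean <- 10^mean(log10_transformed)
arithmetic_mean <- mean(original_data)

c(geometric = geometric_mean, arithmetic = arithmetic_mean)
```

When reporting a back-transformed mean, it helps to state explicitly that it is a geometric mean.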

2. Square Root Transformation

Backtransformation: Squaring
Used for: Reducing right skewness, stabilizing variance in count data

R Code:

# Square Root Transformation
sqrt_transformed <- sqrt(original_data)

# Backtransformation
backtransformed <- sqrt_transformed^2
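
In practice, the quantity being back-transformed is often a summary such as a mean and its confidence interval rather than the raw data. The sketch below assumes a hypothetical vector of counts; it computes a confidence interval on the square-root scale and squares the endpoints, which is only valid while the lower endpoint is non-negative.

```r
# Hypothetical count data (illustrative values only)
counts <- c(0, 1, 1, 2, 3, 4, 4, 6, 9, 12)

# Analyze on the square-root scale
sqrt_counts <- sqrt(counts)
ci_sqrt <- as.numeric(t.test(sqrt_counts)$conf.int)

# Squaring preserves order only for non-negative values
stopifnot(ci_sqrt[1] >= 0)

# Back-transform the mean and the interval endpoints
mean_back <- mean(sqrt_counts)^2
ci_back <- ci_sqrt^2

c(lower = ci_back[1], estimate = mean_back, upper = ci_back[2])
```

The back-transformed interval is asymmetric around the estimate, and the estimate is lower than the arithmetic mean of the raw counts.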

3. Box-Cox Transformation

Backtransformation: Depends on the lambda value
Used for: Normalizing data, stabilizing the variance
Requires: Specifying a lambda (λ) value

R Code:

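library(MASS) # MASS::boxcox() estimates lambda from a model; it does not return transformed data

# Box-Cox Transformation (example with lambda = 0.5), applied directly from its formula
lambda <- 0.5
boxcox_transformed <- (original_data^lambda - 1) / lambda

# Backtransformation (for lambda != 0; use exp() when lambda = 0)
backtransformed <- (boxcox_transformed * lambda + 1)^(1 / lambda)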
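
If λ is not known in advance, it can be estimated from the data. The sketch below is one way to do that, assuming simulated data and two hypothetical helper functions (boxcox_forward and boxcox_inverse are names introduced here for illustration, not part of MASS). MASS::boxcox() takes a fitted model or formula and, with plotit = FALSE, returns the profile log-likelihood over a grid of λ values.

```r
library(MASS)

set.seed(1)
# Simulated positive, right-skewed response (illustrative only)
x <- runif(100, 1, 10)
y <- exp(0.3 * x + rnorm(100, sd = 0.4))

# Profile the log-likelihood over a grid of lambda values
bc <- boxcox(lm(y ~ x), lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Hypothetical helpers covering both the lambda = 0 and lambda != 0 cases
boxcox_forward <- function(v, lambda) {
  if (abs(lambda) < 1e-8) log(v) else (v^lambda - 1) / lambda
}
boxcox_inverse <- function(z, lambda) {
  if (abs(lambda) < 1e-8) exp(z) else (lambda * z + 1)^(1 / lambda)
}

# Round trip: transform, then back-transform
z <- boxcox_forward(y, lambda_hat)
y_back <- boxcox_inverse(z, lambda_hat)
all.equal(y_back, y)
```

Handling λ near zero as a separate branch avoids dividing by zero and matches the two-case inverse formula.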

4. Reciprocal Transformation

Backtransformation: Reciprocal
Used for: Reducing skewness, modifying the scale of measurement

R Code:

# Reciprocal Transformation
reciprocal_transformed <- 1 / original_data

# Backtransformation
backtransformed <- 1 / reciprocal_transformed
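
As with the log, back-transforming a mean computed on the reciprocal scale does not return the arithmetic mean; it returns the harmonic mean. A small sketch with made-up task times (the data are hypothetical):

```r
# Hypothetical times (hours) to complete a task
times <- c(2, 4, 4, 8)

# Reciprocal scale: tasks per hour
rates <- 1 / times

# Back-transforming the mean rate gives the harmonic mean of the times
harmonic_mean <- 1 / mean(rates)
arithmetic_mean <- mean(times)

c(harmonic = harmonic_mean, arithmetic = arithmetic_mean)
```

Note also that the reciprocal reverses the ordering of positive values and is undefined at zero (1 / 0 evaluates to Inf in R), so zeros need to be handled before transforming.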

5. Arcsine Square Root Transformation

Backtransformation: Sine, then square
Used for: Proportion data, especially near 0 or 1

R Code:

# Arcsine Square Root Transformation
arcsine_transformed <- asin(sqrt(original_data))

# Backtransformation
backtransformed <- sin(arcsine_transformed)^2
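
The transformation expects proportions between 0 and 1, so percentages need to be divided by 100 first; values outside that range produce NaN. The sketch below assumes a hypothetical vector of percentages and shows the round trip plus a back-transformed mean.

```r
# Hypothetical percentages (illustrative values only)
percent <- c(2, 5, 10, 40, 90)
proportions <- percent / 100

# Transformed values are in radians, between 0 and pi/2
arcsine_transformed <- asin(sqrt(proportions))

# Round trip back to proportions
backtransformed <- sin(arcsine_transformed)^2

# Back-transformed mean from the transformed scale
mean_back <- sin(mean(arcsine_transformed))^2

c(back_transformed_mean = mean_back, arithmetic_mean = mean(proportions))
```

The back-transformed mean generally differs from the arithmetic mean of the raw proportions, so it is worth stating which one is being reported.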

Considerations in Backtransformation

Bias: Direct backtransformation can lead to bias, especially for transformations like the logarithm. Adjustments such as Duan’s smearing estimate can be used to correct this.
Interpretation of Parameters: In regression models, backtransforming coefficients for interpretation can be complex, as the relationship between variables in the transformed scale may not directly translate to the original scale.
Error Metrics: If you transform your response variable, remember that error metrics (like RMSE) will also be in the transformed scale and might need adjustment or interpretation in the original scale.
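
To make the point about error metrics concrete, the sketch below fits a model to a log-transformed response on simulated data (all variable names and parameter values are assumptions for illustration) and computes RMSE on both scales. The log-scale RMSE is in log units, while the original-scale RMSE requires back-transforming the predictions before computing the errors.

```r
set.seed(42)
# Simulated data (illustrative only)
x <- runif(200, 0, 5)
y <- exp(1 + 0.5 * x + rnorm(200, sd = 0.3))

# Model fitted to the log-transformed response
model <- lm(log(y) ~ x)

# RMSE on the log scale (units of log(y))
rmse_log <- sqrt(mean((log(y) - fitted(model))^2))

# RMSE on the original scale: back-transform predictions first
rmse_original <- sqrt(mean((y - exp(fitted(model)))^2))

c(log_scale = rmse_log, original_scale = rmse_original)
```

The two numbers are not comparable with each other, so a report should say which scale an error metric refers to.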

Readings: https://davegiles.blogspot.com/2014/12/s.html (Bias during back-transformation)

Duan's smearing estimate is a technique used in statistics to adjust for bias when backtransforming predictions from a regression model. This technique is particularly relevant when the dependent variable has been log-transformed to satisfy model assumptions, such as linearity or homoscedasticity.

When you log-transform a dependent variable and fit a regression model, the model's predictions are also on the log scale. Directly exponentiating these predictions to return to the original scale introduces bias: the exponentiated prediction estimates the geometric mean of the response (the median, if the errors are symmetric) rather than its arithmetic mean, so the mean is systematically underestimated whenever the residual variance is nonzero.

Duan's Smearing Estimate

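To address this issue, Duan (1983) proposed the smearing estimate. The idea is to adjust the back-transformed predictions by an average factor derived from the residuals of the model. The method is nonparametric: it does not require the residuals to be normally distributed, but it does assume that they share a common distribution across observations (in particular, constant variance on the log scale).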

How It Works

1. Fit a Regression Model: First, fit a regression model to the log-transformed dependent variable.
2. Calculate Residuals: Compute the residuals from this model.
3. Smearing Factor: Calculate the smearing factor, which is the average of the exponentiated residuals.
4. Adjust Predictions: Multiply the exponentiated predictions by the smearing factor to obtain the back-transformed, adjusted predictions.

R Code:

# Step 1: Fit a regression model on the log-transformed response
# ('df', 'y', and 'x' are placeholder names for your own data)
model <- lm(log(y) ~ x, data = df)

# Step 2: Calculate residuals (these are on the log scale)
model_residuals <- residuals(model)

# Step 3: Calculate the smearing factor
smearing_factor <- mean(exp(model_residuals))

# Step 4: Adjust predictions
# predict() returns the model's predictions on the log scale
log_predictions <- predict(model, newdata = df)
adjusted_predictions <- exp(log_predictions) * smearing_factor
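
The effect of the smearing factor can be checked on simulated data where the truth is known. The sketch below is an illustration under stated assumptions, not a general proof: it simulates a response with non-normal (uniform) errors on the log scale, then compares the average naive and smearing-adjusted predictions against the average observed response.

```r
set.seed(123)
n <- 5000
x <- runif(n, 0, 2)

# Non-normal errors on the log scale, centered at zero (assumed for illustration)
errors <- runif(n, -1, 1)
y <- exp(1 + 0.8 * x + errors)

model <- lm(log(y) ~ x)

# Naive back-transformation: exponentiate the fitted values
naive <- exp(fitted(model))

# Duan's smearing factor: mean of the exponentiated residuals
smearing_factor <- mean(exp(residuals(model)))
adjusted <- naive * smearing_factor

c(observed = mean(y), naive = mean(naive), adjusted = mean(adjusted))
```

In this setup the naive predictions average noticeably below the observed mean, while the adjusted predictions land much closer to it.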

When to Use

Duan's smearing estimate is particularly useful when:

The dependent variable in the regression is log-transformed.
The residuals on the log scale are not normally distributed, so a normal-theory correction such as exp(σ²/2) is not reliable.
You want to backtransform predictions while minimizing bias.

Limitations

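Duan's method applies a single multiplicative correction, which assumes the residuals on the log scale have the same distribution for every observation.
It may not be appropriate if the log transformation does not adequately stabilize the variance: when the residual variance changes with the predictors (heteroscedasticity), a single smearing factor can leave the back-transformed predictions biased.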
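
When the residual spread on the log scale differs between known groups, one commonly suggested workaround is to compute a separate smearing factor for each group instead of a single pooled one. The sketch below is an illustration on simulated data with two hypothetical groups, not part of the method as described above; all names and values are assumptions.

```r
set.seed(7)
n <- 2000
group <- factor(rep(c("A", "B"), each = n / 2))

# Residual spread on the log scale differs by group (assumed for illustration)
error_sd <- ifelse(group == "A", 0.2, 0.8)
y <- exp(2 + 0.5 * (group == "B") + rnorm(n, sd = error_sd))

model <- lm(log(y) ~ group)
res <- residuals(model)

# One pooled factor versus one factor per group
pooled_factor <- mean(exp(res))
group_factors <- tapply(exp(res), group, mean)

# Apply each observation's own group factor
adjusted <- exp(fitted(model)) * group_factors[as.character(group)]

pooled_factor
group_factors
```

Here the pooled factor sits between the two group factors, so applying it everywhere would over-correct the low-variance group and under-correct the high-variance one.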

If my post helped you, you can buy me a coffee: https://www.buymeacoffee.com/sauravdastsk

R for Soil Science
