
Bibliometric Analysis Using rscopus in R

Dr. Saurav Das

Research Director | Farming Systems Trial | Rodale Institute | Soil Health, Biogeochemistry of Carbon & Nitrogen, Environmental Microbiology, and Data Science | Outreach & Extension | Vibe coding

Published: October 31, 2023

Bibliometric analysis helps researchers and academics measure the impact of scientific publications and track trends in a field. R's package ecosystem makes this kind of analysis accessible and reproducible. One useful package is rscopus (available on CRAN), which lets you query the Scopus database directly from R. This post walks through downloading data with rscopus and post-processing it to extract meaningful insights.

Getting Started with rscopus

Installation

First, install and load the rscopus package:

# Install the package only if it is not already installed
if (!requireNamespace("rscopus", quietly = TRUE)) {
  install.packages("rscopus")
}

# Load the rscopus library
library(rscopus)

Setting Up the API Key

To use rscopus, you need an API key from Elsevier. Register on the Elsevier Developer Portal (https://dev.elsevier.com) to obtain your key, then set it in R as follows:

options(elsevier_api_key = "your_api_key_here")
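Hard-coding the key in a script makes it easy to share by accident. A minimal sketch of an alternative, assuming you have saved the key in your ~/.Renviron file as Elsevier_API=your_api_key_here. The variable name Elsevier_API and the helper set_scopus_key_from_env() are illustrative choices, not part of rscopus:

```r
# Hypothetical helper: copy an API key from an environment variable into the
# elsevier_api_key option. Assumes ~/.Renviron contains Elsevier_API=<your key>
# (restart R after editing that file so the variable is loaded).
set_scopus_key_from_env <- function(var = "Elsevier_API") {
  key <- Sys.getenv(var)
  if (!nzchar(key)) {
    warning(var, " is not set; the API key was not configured.")
    return(invisible(FALSE))
  }
  options(elsevier_api_key = key)
  invisible(TRUE)
}

set_scopus_key_from_env()
```

The helper returns TRUE or FALSE invisibly, so a script can check whether the key was found before sending any requests.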

Searching for Papers

Search for papers with the scopus_search() function. For example, to find papers with “soil health” in the title published between 2000 and 2022, use:

query <- 'TITLE("soil health") AND PUBYEAR > 1999 AND PUBYEAR < 2023'
results <- scopus_search(query, count = 25, view = "COMPLETE")

Note: The count parameter sets how many records are requested per API call, and with view = "COMPLETE" Scopus returns at most 25 records per call. When more papers match, scopus_search() pages through the results in batches of count (up to its max_count argument). As a result, large result sets use many calls against your API key's quota.
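If one query matches more records than you want to pull at once, one option is to split it into one query per publication year. This also gives you per-year counts for trend analysis. The following is a sketch under that assumption. The helper year_queries() is hypothetical, the PUBYEAR = form assumes Scopus accepts an equality test on PUBYEAR, and the commented lines show how each query could be passed to scopus_search():

```r
# Hypothetical helper: build one Scopus query per publication year,
# reusing the TITLE() and PUBYEAR syntax of the query above
year_queries <- function(term, years) {
  sprintf('TITLE("%s") AND PUBYEAR = %d', term, years)
}

queries <- year_queries("soil health", 2000:2022)
head(queries, 2)

# Each query could then be run on its own, for example:
# results_by_year <- lapply(queries, function(q)
#   rscopus::scopus_search(q, count = 25, view = "COMPLETE"))
# sapply(results_by_year, function(r) r$total_results)
```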

Post-Processing the Results

Extracting Information

Extract the title, first author, and abstract from each returned entry. Note that dc:creator holds only the first author, and dc:description (the abstract) is included only with view = "COMPLETE":

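# Extract the list of entries from the search results
papers <- results$entries

# Return a field as one string, or NA if the entry lacks it (sapply would otherwise return a list)
get_field <- function(entry, field) {
  value <- entry[[field]]
  if (is.null(value)) NA_character_ else as.character(value)[1]
}

# Extract titles, first authors, and abstracts as character vectors
titles    <- vapply(papers, get_field, character(1), field = "dc:title")
authors   <- vapply(papers, get_field, character(1), field = "dc:creator")
abstracts <- vapply(papers, get_field, character(1), field = "dc:description")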

Creating a Data Frame

Combine the extracted information into a data frame:

data <- data.frame(
  Title    = titles,
  Authors  = authors,
  Abstract = abstracts,
  stringsAsFactors = FALSE
)
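For most bibliometric questions you will also want the publication year, journal, citation count, and DOI. The following is a sketch that assumes the entries carry the Scopus Search API fields prism:coverDate, prism:publicationName, citedby-count, and prism:doi. You can inspect names(papers[[1]]) to confirm which fields your query actually returned. The two mock entries below, and the helper pull_field(), are illustrative stand-ins so the code runs without an API key:

```r
# Two mock entries shaped like results$entries (assumed field names;
# the values are placeholders, not real publications)
mock_papers <- list(
  list(`dc:title` = "Example paper A",
       `prism:coverDate` = "2021-05-01",
       `prism:publicationName` = "Journal A",
       `citedby-count` = "12",
       `prism:doi` = "10.1234/example-a"),
  list(`dc:title` = "Example paper B",
       `prism:coverDate` = "2019-11-15",
       `prism:publicationName` = "Journal B",
       `citedby-count` = "40")  # this entry has no DOI
)

# Return a field as a single string, or NA when the entry lacks it
get_field <- function(entry, field) {
  value <- entry[[field]]
  if (is.null(value)) NA_character_ else as.character(value)[1]
}

# Hypothetical helper: pull one field from every entry as a character vector
pull_field <- function(entries, field) {
  vapply(entries, get_field, character(1), field = field)
}

meta <- data.frame(
  Title     = pull_field(mock_papers, "dc:title"),
  Year      = as.integer(substr(pull_field(mock_papers, "prism:coverDate"), 1, 4)),
  Journal   = pull_field(mock_papers, "prism:publicationName"),
  Citations = as.integer(pull_field(mock_papers, "citedby-count")),
  DOI       = pull_field(mock_papers, "prism:doi"),
  stringsAsFactors = FALSE
)
meta
```

Missing fields become NA instead of breaking the data frame, so an entry without a DOI still keeps its title, year, and citation count.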

Cleaning the Data

Remove rows with any missing value (for example, papers returned without an abstract):

data <- na.omit(data)
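na.omit() drops a paper if any column is missing, so a record with an abstract but no author is also lost. If you only need abstracts for the text analysis, a more targeted approach keeps rows with a non-empty abstract and then removes duplicate titles. In the sketch below, the data frame example_df is illustrative, and treating titles that differ only in case or surrounding spaces as duplicates is an assumption:

```r
# Illustrative data frame with the same columns as `data` above
example_df <- data.frame(
  Title    = c("Paper A", "paper a ", "Paper B", "Paper C"),
  Authors  = c("Smith J.", "Smith J.", NA, "Lee K."),
  Abstract = c("Abstract text A", "Abstract text A", "Abstract text B", NA),
  stringsAsFactors = FALSE
)

# Keep rows whose abstract is present and not blank
has_abstract <- !is.na(example_df$Abstract) & nzchar(trimws(example_df$Abstract))
clean <- example_df[has_abstract, ]

# Drop repeated titles, ignoring case and surrounding whitespace
clean <- clean[!duplicated(tolower(trimws(clean$Title))), ]
clean
```

Here "Paper B" survives even though its author is missing, which na.omit() would have discarded.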

Analyzing the Data

For instance, check how many papers mention both “soil health” and “crop yield” in the abstract:

# Create a logical column indicating the occurrence of both terms
data$soil_health_crop_yield <- grepl("soil health", data$Abstract, ignore.case = TRUE) &
                               grepl("crop yield", data$Abstract, ignore.case = TRUE)

# Count the number of papers meeting the criteria
num_papers <- sum(data$soil_health_crop_yield)
print(num_papers)
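The same idea extends to several terms at once. The sketch below counts how many abstracts mention each term in a vector. The terms and the mock abstracts are illustrative. Both the terms and the text are lower-cased, and fixed = TRUE matches each term literally, so characters such as "." or "(" in a term are not treated as regular-expression syntax:

```r
# Illustrative abstracts; in practice use data$Abstract
example_abstracts <- c("Soil health improved crop yield under no-till.",
                       "Cover crops increased soil organic carbon.",
                       "Crop yield responses to nitrogen fertilizer.")
terms <- c("soil health", "crop yield", "organic carbon", "microbial")

# For each term, count the abstracts that contain it (case-insensitive, literal match)
term_counts <- vapply(terms, function(term) {
  sum(grepl(tolower(term), tolower(example_abstracts), fixed = TRUE))
}, integer(1))
term_counts
```

The result is a named integer vector, one count per term, which can be turned into a data frame for plotting.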

Visualizing the Results

Before creating visualizations, load ggplot2:

library(ggplot2)  # also provides scale_fill_viridis_d() (ggplot2 >= 3.0.0)

Create a bar plot comparing the number of retrieved papers with the number whose abstracts mention both terms:

# Summarize the counts: all papers kept after cleaning vs. papers mentioning both terms
summary_df <- data.frame(
  Category = c("All Papers", "Soil Health + Crop Yield"),
  Count    = c(nrow(data), num_papers)
)

# Plot the counts as labelled bars
ggplot(summary_df, aes(x = reorder(Category, -Count), y = Count, fill = Category)) +
  geom_col(show.legend = FALSE, width = 0.7, color = "black") +
  geom_text(aes(label = Count), vjust = -0.5, size = 5, color = "black") +
  scale_fill_viridis_d() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +  # headroom for the labels
  theme_minimal() +
  labs(title = "Occurrence of Specific Terms in Abstracts",
       y = "Number of Papers", x = NULL)
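A common bibliometric view is the number of publications per year. The sketch below assumes you have a data frame with an integer Year column, such as one built from the first four characters of each entry's prism:coverDate field. The years are made up so that the example runs on its own:

```r
library(ggplot2)

# Illustrative publication years; in practice derive them from prism:coverDate
pubs <- data.frame(Year = c(2018L, 2019L, 2019L, 2021L, 2021L, 2021L))

# Count papers per year, keeping years with no papers as zero
year_range  <- seq(min(pubs$Year), max(pubs$Year))
year_counts <- data.frame(
  Year   = year_range,
  Papers = as.integer(table(factor(pubs$Year, levels = year_range)))
)

# Bar chart of publication counts per year
p <- ggplot(year_counts, aes(x = Year, y = Papers)) +
  geom_col(fill = "steelblue", color = "black") +
  scale_x_continuous(breaks = year_range) +
  theme_minimal() +
  labs(title = "Publications per Year", x = "Publication Year", y = "Number of Papers")
p
```

Converting Year to a factor with explicit levels before calling table() keeps empty years (2020 here) in the chart as zero-height bars instead of silently dropping them.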

Conclusion

The rscopus package gives you direct access to Scopus records from R. With the steps above, you can download, clean, and analyze publication data, then visualize the results to track trends in a research field. Happy analyzing!

Other packages worth exploring:

Bibliometrix (https://www.bibliometrix.org/home)
bibliometrixData
Scholar (https://github.com/YuLab-SMU/scholar)
CitationNetworkViz
pubmedR (https://github.com/massimoaria/pubmedR)
litsearchr (https://elizagrames.github.io/litsearchr/)

R for Soil Science


Sheshu Mutyalu

1 month

I'm getting this error: Error in get_results(query, start = init_start, count = count, verbose = verbose, : Unauthorized (HTTP 401).

Dr. Apurva Shukla, PhD

7 months

Can you please list out the journals which accept the articles in bibliometrics in the field of finance?

Anubhav Thakur

Seed Science and Technology

1 year

Can use this for another fields ?

Kent T.

Disaster Resilience Researcher | Project Manager | Civil Engineer | Soil and Water Conservation Professional |

1 year

Have you tried Bibliometrix?

More articles by Dr. Saurav Das

Reference Extraction and Distribution by Year

March 23, 2025

Recently, during the revision of one of our manuscripts, we had a bit of back-and-forth with the journal over whether…

Synthetic Data for Soil C Modeling

February 9, 2025

Note: The article is not complete yet. My all-time question is, do we need all and precise data from producers (maybe I…

Bootstrapping

January 7, 2025

1. Introduction to Bootstrapping: Bootstrapping is a statistical resampling method used to estimate the variability and…

Ecosystem Service Dollar Valuation (Series - Rethinking ROI)

December 24, 2024

The valuation of ecosystem services in monetary terms represents a critical frontier in environmental economics…

Redefining ROI for True Sustainability

August 28, 2024

It’s been a while since I last posted for Muddy Monday, but a few thoughts have been taking root in my mind, growing…

Linear Plateau in R

May 22, 2024

When working with data in fields such as agriculture, biology, and economics, it’s common to observe a response that…

R vs R-Studio

March 29, 2024

R: R is a programming language and software environment for statistical computing and graphics. Developed by Ross Ihaka…

Backtransformation

February 22, 2024

Backtransformation is the process of converting the results obtained from a transformed dataset back to the original…

Spectroscopic Methods and Use in Soil Organic Matter & Carbon Measurement

January 30, 2024

Spectroscopic methods comprise a diverse array of analytical techniques that quantify how light interacts with a…

Regression & Classification

January 30, 2024

Regression and classification are two predictive modeling approaches in statistics and machine learning. Here's a brief…
