Bibliometric Analysis Using rscopus in R


Bibliometric analysis helps researchers measure the impact of scientific publications and track trends across a body of literature. R's package ecosystem makes this kind of analysis more accessible. One useful package is rscopus (available on CRAN), which lets you query the Scopus database directly from R. This post walks through downloading data with rscopus and post-processing it to extract meaningful insights.

Getting Started with rscopus

Installation

First, install and load the rscopus package:

# Install the package only if it is not already installed
if (!requireNamespace("rscopus", quietly = TRUE)) {
  install.packages("rscopus")
}

# Load the rscopus library
library(rscopus)
        

Setting Up the API Key

To use rscopus, you need an API key from Elsevier. Register on the Elsevier Developer Portal (Elsevier API) to obtain your key. Then, set the API key in R as follows:

options(elsevier_api_key = "your_api_key_here")
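
Hard-coding the key in a script makes it easy to leak when the script is shared. The following is a minimal sketch of an alternative, assuming the key is stored in an environment variable. The variable name ELSEVIER_API_KEY and the helper load_api_key() are hypothetical names chosen for this example; they are not part of rscopus.

```r
# Hypothetical helper (not part of rscopus): copy the key from an
# environment variable into the option used earlier in this post.
load_api_key <- function(var = "ELSEVIER_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  options(elsevier_api_key = key)
  invisible(key)
}

# Usage, after adding a line ELSEVIER_API_KEY=<your key> to ~/.Renviron:
# load_api_key()
```

Failing early with a clear message when the variable is missing can be easier to debug than an authorization error returned later by the API.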
        

Searching for Papers

You can search for papers using the scopus_search() function. For example, to search for papers with “soil health” in the title published between 2000 and 2022 (inclusive), use:

query <- 'TITLE("soil health") AND PUBYEAR > 1999 AND PUBYEAR < 2023'
results <- scopus_search(query, count = 25, view = "COMPLETE")
        
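Note: The count parameter sets how many results are retrieved per request, and with view = "COMPLETE" the Scopus API limits this to 25. If the query matches more papers than that, the results have to be fetched page by page. Recent versions of rscopus appear to handle this paging inside scopus_search() (up to its max_count argument), so check the package documentation before writing your own loop.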
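
If you do need to page manually, the offsets are simple to compute. The sketch below uses a hypothetical helper, page_starts(), that is not part of rscopus. It assumes zero-based offsets and a page size of 25; each offset could then be passed as the start argument of a search call, but verify that argument against the rscopus documentation for your installed version.

```r
# Hypothetical helper: zero-based start offsets needed to cover
# `total` results when each request returns `page_size` records.
page_starts <- function(total, page_size = 25) {
  if (total <= 0) {
    return(numeric(0))
  }
  seq(0, total - 1, by = page_size)
}

# 60 matching papers need three requests
starts <- page_starts(60)
print(starts)
```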

Post-Processing the Results

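Extracting Information

Extract the title, author, and abstract from each returned entry. In the Scopus search response, dc:creator typically holds only the first author, and an entry may lack a field (most often the abstract). The helper below returns NA instead of NULL so that all three vectors keep the same length:

# Extract the list of entries from the search results
papers <- results$entries

# Return a field as a single string, or NA when the entry lacks it
get_field <- function(x, field) {
  if (is.null(x[[field]])) NA_character_ else as.character(x[[field]])[1]
}

# Extract titles, first authors, and abstracts (one value per paper)
titles    <- vapply(papers, get_field, character(1), field = "dc:title")
authors   <- vapply(papers, get_field, character(1), field = "dc:creator")
abstracts <- vapply(papers, get_field, character(1), field = "dc:description")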
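
Missing fields are worth checking before building a data frame. The sketch below uses mock entries (invented for illustration, not real Scopus output) to show what happens when one entry has no abstract: sapply() can no longer simplify the result to a character vector and returns a list instead.

```r
# Mock entries shaped like search results; the second has no abstract
mock_entries <- list(
  list(`dc:title` = "Paper A", `dc:description` = "Abstract A"),
  list(`dc:title` = "Paper B")
)

# sapply() falls back to a list because one element is NULL
raw_abstracts <- sapply(mock_entries, function(x) x$`dc:description`)
print(is.list(raw_abstracts))

# Flag the entries that lack an abstract
missing_abstract <- vapply(
  mock_entries,
  function(x) is.null(x$`dc:description`),
  logical(1)
)
print(which(missing_abstract))
```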
        

Creating a Data Frame

Combine the extracted information into a data frame:

data <- data.frame(
  Title    = titles,
  Authors  = authors,
  Abstract = abstracts,
  stringsAsFactors = FALSE
)        

Cleaning the Data

Remove rows with missing values:

data <- na.omit(data)        
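
Because na.omit() drops rows silently, it can help to look at what is being removed first. A minimal sketch on a mock data frame (the values are invented for illustration):

```r
# Mock data frame with one missing abstract
mock <- data.frame(
  Title    = c("Paper A", "Paper B", "Paper C"),
  Abstract = c("Abstract A", NA, "Abstract C"),
  stringsAsFactors = FALSE
)

# Rows that na.omit() would remove
incomplete <- mock[!complete.cases(mock), ]

# Clean the data and report how many rows were dropped
cleaned   <- na.omit(mock)
n_dropped <- nrow(mock) - nrow(cleaned)
print(n_dropped)
```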

Analyzing the Data

For instance, check how many papers mention both “soil health” and “crop yield” in the abstract:

# Create a logical column indicating the occurrence of both terms
data$soil_health_crop_yield <- grepl("soil health", data$Abstract, ignore.case = TRUE) &
                               grepl("crop yield", data$Abstract, ignore.case = TRUE)

# Count the number of papers meeting the criteria
num_papers <- sum(data$soil_health_crop_yield)
print(num_papers)
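
The same check can be generalized to any number of terms. The sketch below defines a hypothetical helper, mentions_all(), and runs it on mock abstracts invented for illustration. Terms are treated as regular expressions, as in the grepl() calls above.

```r
# Hypothetical helper: TRUE for each text that contains every term
# (case-insensitive; terms are interpreted as regular expressions)
mentions_all <- function(text, terms) {
  hits <- lapply(terms, function(term) grepl(term, text, ignore.case = TRUE))
  Reduce(`&`, hits)
}

mock_abstracts <- c(
  "Soil health improves crop yield.",
  "Soil Health indicators only.",
  "Crop yield under drought."
)

both <- mentions_all(mock_abstracts, c("soil health", "crop yield"))
print(sum(both))
```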
        

Visualizing the Results

Before creating visualizations, load ggplot2. It provides scale_fill_viridis_d() itself, so the separate viridis package does not need to be loaded:

library(ggplot2)  # Also provides scale_fill_viridis_d()

Create a bar plot to visualize the occurrence of the specific terms in the abstracts:

# Create a summary data frame
summary_df <- data.frame(
  Category = c("Soil Health + Crop Yield"),
  Count    = c(num_papers)
)

# Plot the results using ggplot2
ggplot(summary_df, aes(x = reorder(Category, -Count), y = Count, fill = Category)) +
  geom_bar(stat = "identity", show.legend = FALSE, width = 0.7, color = "black") +
  theme_minimal() +
  labs(title = "Occurrence of Specific Terms in Abstracts",
       y = "Number of Papers", x = "") +
  geom_text(aes(label = Count), vjust = -0.5, size = 5, color = "black") +
  scale_fill_viridis_d()
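
A bar plot with a single bar says little, so in practice you would probably compare several terms. The sketch below builds a summary data frame with one row per term, using mock abstracts and a term list invented for illustration. The result has the same Category and Count columns as summary_df, so it can be passed to the ggplot() call above unchanged.

```r
# Mock abstracts, invented for illustration
mock_abstracts <- c(
  "Soil health improves crop yield.",
  "Soil health and nitrogen cycling.",
  "Crop yield under drought.",
  "Nitrogen fertilizer and crop yield."
)

# Display label = search term (case-insensitive regular expression)
terms <- c(
  "Soil Health" = "soil health",
  "Crop Yield"  = "crop yield",
  "Nitrogen"    = "nitrogen"
)

# Count the abstracts mentioning each term
counts <- vapply(
  terms,
  function(term) sum(grepl(term, mock_abstracts, ignore.case = TRUE)),
  integer(1)
)

multi_summary_df <- data.frame(
  Category = names(counts),
  Count    = unname(counts),
  stringsAsFactors = FALSE
)
print(multi_summary_df)
```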
        

Conclusion

The rscopus package provides seamless access to bibliometric data from Scopus directly in R. By following the steps outlined above, you can efficiently download, clean, and analyze publication data and visualize your findings to gain valuable insights into scientific trends. Happy analyzing!


Other packages worth exploring:

  1. bibliometrix (https://www.bibliometrix.org/home)
  2. bibliometrixData
  3. scholar (https://github.com/YuLab-SMU/scholar)
  4. CitationNetworkViz
  5. pubmedR (https://github.com/massimoaria/pubmedR)
  6. litsearchr (https://elizagrames.github.io/litsearchr/)



