Bibliometric Analysis Using rscopus in R
Dr. Saurav Das
Research Director | Farming Systems Trial | Rodale Institute | Soil Health, Biogeochemistry of Carbon & Nitrogen, Environmental Microbiology, and Data Science | Outreach & Extension | Vibe coding
Bibliometric analysis helps researchers and academics measure the impact of scientific publications and track trends across them. With R and its large package ecosystem, this kind of analysis has become more accessible. One useful package is rscopus (available on CRAN), which lets you query the Scopus database directly from R. This blog post walks through downloading data with rscopus and post-processing it to extract meaningful insights.
Getting Started with rscopus
Installation
First, install and load the rscopus package:
# Install the package (if not already installed)
install.packages("rscopus")
# Load the rscopus library
library(rscopus)
Setting Up the API Key
To use rscopus, you need an API key from Elsevier. Register on the Elsevier Developer Portal to obtain your key, then set it in R as follows:
options(elsevier_api_key = "your_api_key_here")
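Hard-coding a key in a script makes it easy to leak when the script is shared. As a minimal sketch, the key could instead be read from an environment variable and registered through the same option shown above. The helper name set_key_from_env and the variable name ELSEVIER_API_KEY are assumptions chosen for illustration, not part of rscopus.

```r
# Hypothetical helper (not part of rscopus): read the API key from an
# environment variable and register it under the option used above.
set_key_from_env <- function(var = "ELSEVIER_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  options(elsevier_api_key = key)
  invisible(key)
}

# Call once per session, after defining the variable (for example in ~/.Renviron):
# set_key_from_env()
```

Failing early with a clear message when the variable is missing can be easier to diagnose than an authorization error returned later by the API.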
Searching for Papers
You can search for papers using the scopus_search() function. For example, to search for papers related to “soil health” published between 2000 and 2022, use:
query <- 'TITLE("soil health") AND PUBYEAR > 1999 AND PUBYEAR < 2023'
results <- scopus_search(query, count = 25, view = "COMPLETE")
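Note: The count parameter sets how many records are requested per API call; for the COMPLETE view the Scopus API limits this to 25. If the query matches more papers than that, the results have to be fetched in pages of 25. scopus_search() also accepts start and max_count arguments, and recent versions of the package appear to loop over pages for you up to max_count, so check its documentation before writing your own pagination loop.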
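To make the paging arithmetic concrete, here is a minimal sketch that computes the start offsets needed to cover a result set in batches of 25. The helper name page_starts is hypothetical. Passing each offset to the start argument of scopus_search() is an assumption to verify against the package documentation.

```r
# Hypothetical helper: zero-based start offsets needed to cover `total`
# records when each request returns at most `per_page` records.
page_starts <- function(total, per_page = 25) {
  if (total <= 0) {
    return(integer(0))
  }
  seq(0, total - 1, by = per_page)
}

# 60 matching papers need three requests, starting at records 0, 25, and 50
starts <- page_starts(60)
print(starts)
```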
Post-Processing the Results
Extracting Information
Extract titles, authors, and abstracts from the returned results. Note that the dc:creator field holds only the first author of each record:
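# Extract the list of entries from the search results
papers <- results$entries
# Return a field as a single string, or NA when the entry lacks it,
# so that every extracted vector has one element per paper
get_field <- function(x, name) {
  value <- x[[name]]
  if (is.null(value)) NA_character_ else as.character(value)[1]
}
# Extract titles, first authors, and abstracts
titles <- vapply(papers, get_field, character(1), name = "dc:title")
authors <- vapply(papers, get_field, character(1), name = "dc:creator")
abstracts <- vapply(papers, get_field, character(1), name = "dc:description")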
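If the full author list is needed rather than the first author alone, the names have to be collected from a nested list in each entry. The sketch below assumes that COMPLETE-view entries carry an author list whose elements have an authname field; that structure is an assumption to confirm against an actual response. The mock entry is fabricated for illustration and all_authors is a hypothetical helper.

```r
# Mock entry shaped like an assumed COMPLETE-view record (fabricated data)
mock_entry <- list(
  `dc:title` = "Example title",
  `dc:creator` = "Author A.",
  author = list(
    list(authname = "Author A."),
    list(authname = "Author B.")
  )
)

# Hypothetical helper: join every authname with "; ", falling back to
# dc:creator (or NA) when the entry has no author list
all_authors <- function(entry) {
  names_found <- vapply(entry$author, function(a) {
    if (is.null(a$authname)) NA_character_ else as.character(a$authname)[1]
  }, character(1))
  names_found <- names_found[!is.na(names_found)]
  if (length(names_found) == 0) {
    creator <- entry$`dc:creator`
    return(if (is.null(creator)) NA_character_ else as.character(creator)[1])
  }
  paste(names_found, collapse = "; ")
}

author_string <- all_authors(mock_entry)
print(author_string)
```

Applied to real results, the same helper could be mapped over the entries with vapply(papers, all_authors, character(1)).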
Creating a Data Frame
Combine the extracted information into a data frame:
data <- data.frame(
  Title = titles,
  Authors = authors,
  Abstract = abstracts,
  stringsAsFactors = FALSE
)
Cleaning the Data
Remove rows that have a missing value in any column:
data <- na.omit(data)
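Because na.omit() drops a row when any column is missing, a paper with an abstract but no recorded author would also be removed. The sketch below uses a small fabricated data frame to contrast that behavior with filtering on the Abstract column alone. The names demo, dropped_any, and kept_with_abstract are illustrative.

```r
# Fabricated example data: row 2 lacks an author, row 3 lacks an abstract
demo <- data.frame(
  Title = c("Paper A", "Paper B", "Paper C"),
  Authors = c("Author X", NA, "Author Z"),
  Abstract = c("Some text", "More text", NA),
  stringsAsFactors = FALSE
)

# na.omit() keeps only rows that are complete in every column
dropped_any <- na.omit(demo)

# Alternative: keep every row that has an abstract, whatever else is missing
kept_with_abstract <- demo[!is.na(demo$Abstract), ]

print(nrow(dropped_any))
print(nrow(kept_with_abstract))
```

Which filter is appropriate depends on the analysis: the text search that follows only needs the Abstract column to be present.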
Analyzing the Data
For instance, check how many papers mention both “soil health” and “crop yield” in the abstract:
# Create a logical column indicating the occurrence of both terms
data$soil_health_crop_yield <- grepl("soil health", data$Abstract, ignore.case = TRUE) &
  grepl("crop yield", data$Abstract, ignore.case = TRUE)
# Count the number of papers meeting the criteria
num_papers <- sum(data$soil_health_crop_yield)
print(num_papers)
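The same check can be generalized to any number of terms. The following is a minimal sketch with fabricated abstracts so that it runs without an API call. The helper name mentions_all is hypothetical, and it matches terms as literal, case-insensitive substrings.

```r
# Fabricated abstracts, used only so the example runs on its own
abstracts_demo <- c(
  "Soil health improves crop yield in long-term trials.",
  "Cover crops and soil health indicators.",
  "Crop yield response to nitrogen."
)

# Hypothetical helper: TRUE for each text that contains every term,
# compared as lower-cased literal substrings
mentions_all <- function(text, terms) {
  hits <- lapply(terms, function(term) {
    grepl(tolower(term), tolower(text), fixed = TRUE)
  })
  Reduce(`&`, hits)
}

both_terms <- mentions_all(abstracts_demo, c("soil health", "crop yield"))
print(sum(both_terms))
```

Using fixed = TRUE treats each term as plain text, which avoids surprises if a search term contains characters that are special in regular expressions.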
Visualizing the Results
Before creating visualizations, load ggplot2. The scale_fill_viridis_d() function used below is part of ggplot2 itself, so the separate viridis package does not need to be loaded:
library(ggplot2) # Includes scale_fill_viridis_d()
Create a bar plot to visualize the occurrence of the specific terms in the abstracts:
# Create a summary data frame
summary_df <- data.frame(
  Category = c("Soil Health + Crop Yield"),
  Count = c(num_papers)
)
# Plot the results using ggplot2
ggplot(summary_df, aes(x = reorder(Category, -Count), y = Count, fill = Category)) +
  geom_bar(stat = "identity", show.legend = FALSE, width = 0.7, color = "black") +
  theme_minimal() +
  labs(title = "Occurrence of Specific Terms in Abstracts",
       y = "Number of Papers", x = "") +
  geom_text(aes(label = Count), vjust = -0.5, size = 5, color = "black") +
  scale_fill_viridis_d()
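A bar plot becomes more informative with more than one category. As a sketch under stated assumptions, the example below tallies several terms across fabricated abstracts and plots one bar per term. The term list and the names abstracts_demo, term_counts, and term_plot are illustrative choices, not from the original analysis.

```r
library(ggplot2)

# Fabricated abstracts, used only so the example runs without an API call
abstracts_demo <- c(
  "Soil health improves crop yield in long-term trials.",
  "Cover crops and soil health indicators.",
  "Crop yield response to nitrogen."
)

# Terms to tally (illustrative choices)
terms <- c("soil health", "crop yield", "cover crops")

# Count the abstracts that mention each term, ignoring case
term_counts <- data.frame(
  Category = terms,
  Count = vapply(terms, function(term) {
    sum(grepl(term, abstracts_demo, ignore.case = TRUE))
  }, integer(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

# One bar per term, ordered from most to least frequent
term_plot <- ggplot(term_counts,
                    aes(x = reorder(Category, -Count), y = Count, fill = Category)) +
  geom_col(show.legend = FALSE, width = 0.7, color = "black") +
  geom_text(aes(label = Count), vjust = -0.5, size = 5) +
  scale_fill_viridis_d() +
  theme_minimal() +
  labs(title = "Occurrence of Specific Terms in Abstracts",
       x = "", y = "Number of Papers")

print(term_plot)
```

Here geom_col() is used in place of geom_bar(stat = "identity"); the two draw the same bars when the heights are already computed.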
Conclusion
The rscopus package provides seamless access to bibliometric data from Scopus directly in R. By following the steps outlined above, you can efficiently download, clean, and analyze publication data and visualize your findings to gain valuable insights into scientific trends. Happy analyzing!