Common Formulas
GET, POST, paste, paste0, str, substr, apply (and related family of functions), etc.
The R package httr is a popular choice for interacting with APIs. We illustrate with an example below using the free site https://newsapi.org/docs. The site aggregates and archives news stories, which we can pull by keyword and similar filters.
Build URL
After receiving an API key from the site, we want to pull stories about baseball from the past couple of weeks. Following the instructions at the site, we build the URL we want to query as shown below.
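
The URL construction described here can be sketched as a small helper. This is a minimal illustration, assuming the endpoint and parameter names (q, from, to, sortBy, apiKey) used in the code at the end of this post. The helper name build_news_url and the placeholder key are hypothetical. The sketch also passes the keyword through base R's URLencode, so that a multi-word search term still yields a valid URL.

```r
# Hypothetical helper: assemble a News API query URL from its parts.
build_news_url <- function(topic, start_date, end_date, sort, key) {
  paste0("https://newsapi.org/v2/everything?",
         "q=", URLencode(topic, reserved = TRUE),  # escape spaces and reserved characters
         "&from=", start_date,
         "&to=", end_date,
         "&sortBy=", sort,
         "&apiKey=", key)
}

# "YOUR_API_KEY" is a placeholder, not a real key.
my_url <- build_news_url("spring training", "2025-03-01", "2025-03-16",
                         "popularity", "YOUR_API_KEY")
print(my_url)
```

Nothing is sent over the network at this point; the helper only returns a character string that can later be handed to GET.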

We install (if necessary) the httr package along with other packages we want to use for our analysis, such as jsonlite.

Query API With GET
The main way we query the API is with the GET function from the httr package. The argument to the function is the URL we created in the previous step.

Note that the full structure of the return, which we get with str and which we abbreviated, is a bit intractable. We can get a more manageable overview by adding the argument max.level=1 to the str call.
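
To see the effect of limiting the nesting depth without calling the API, here is a small sketch on a made-up nested list. The object toy_return and its fields are invented for illustration and only loosely resemble a real return.

```r
# Toy nested list standing in for a deeply nested API return (invented data).
toy_return <- list(
  url = "https://example.com",
  status_code = 200L,
  headers = list(`content-type` = "application/json",
                 server = list(name = "demo", version = "1.0")),
  content = charToRaw("{}")
)

str(toy_return)                 # every level of nesting is printed
str(toy_return, max.level = 1)  # only the top-level elements are printed
```

The second call prints one line per top-level element, which is usually enough to decide which element to inspect next.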

JSON To Plaintext
Of the objects in the list that is returned, we are interested in the actual content. The API returns JSON (JavaScript Object Notation) data, stored as raw bytes, so we convert the return into something more easily readable. Base R's rawToChar turns the raw bytes into a character string, and fromJSON from the jsonlite package parses that string into R objects. We see the following high-level return data.
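
The conversion can be tried offline with a stand-in for the returned content. In the sketch below, the JSON text is invented and only loosely shaped like the API's return. It is turned into a raw vector with charToRaw to imitate how the content arrives.

```r
library(jsonlite)

# Invented JSON text shaped loosely like the API's return (not real data).
json_text <- '{"status":"ok","totalResults":2,"articles":[
  {"author":"A. Writer","title":"First headline","url":"https://example.com/1"},
  {"author":"B. Writer","title":"Second headline","url":"https://example.com/2"}]}'

raw_bytes <- charToRaw(json_text)            # imitate the raw vector held in the return
plaintext <- fromJSON(rawToChar(raw_bytes))  # raw bytes -> string -> R list

str(plaintext, max.level = 1)
class(plaintext$articles)                    # the array of objects becomes a data frame
```

Because fromJSON simplifies an array of similar objects into a data frame by default, the articles element can be used directly with ordinary data frame operations.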

We now store the actual articles as a data frame called news_articles, and examine the data associated with each article with summary. In addition to the content of the article, we get additional information including the author, title, URL, and publisher of each piece that falls within the parameters defined in the first section (namely the keyword and date range).

Returning Relevant Data
We want to make our article search easier to navigate. Suppose we only care about the title of the piece, the author of the piece, and the URL where the piece was published. We can use ordinary base R operations to select those columns of our data frame (news_articles[,c("url", "author", "title")]), and then use an anonymous function within apply to make our output more manageable. The function we are applying to our reduced data frame is substr, which lets us return only the first 30 characters of each entry. We apply this function along the columns of our data frame (hence the 2), and make sure to store the result as a data frame by wrapping apply in as.data.frame. The result is shown below.
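
The truncation step can be reproduced without an API call. The sketch below uses an invented two-row data frame, toy_articles, as a stand-in for news_articles. The column names match those used in this post, but the values are made up.

```r
# Invented stand-in for news_articles with the three columns of interest.
toy_articles <- data.frame(
  url    = c("https://example.com/a-very-long-path/to/some/article-one",
             "https://example.com/two"),
  author = c("A. Writer", "B. Writer"),
  title  = c("A headline that is comfortably longer than thirty characters",
             "Short headline"),
  stringsAsFactors = FALSE
)

# MARGIN = 2 runs the anonymous function once per column. apply returns a
# character matrix, so as.data.frame converts the result back to a data frame.
reduced <- as.data.frame(apply(
  toy_articles[, c("url", "author", "title")],
  2,
  function(x) substr(x, 1, 30)))

print(reduced)
max(nchar(as.matrix(reduced)))  # longest entry after truncation
```

Entries shorter than 30 characters are returned unchanged, so only the long URL and the long title are cut.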

We can peek inside the i-th article by accessing its content (for example, news_articles$content[1] for the first article).

Putting Everything In A Function
To avoid running the above process over and over each time we want a different keyword or date range, we can put everything into a single function.
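
One possible variant of such a function is sketched below, with two additions that are assumptions rather than part of the original code: the search parameters are passed through the query argument of GET so that httr URL-encodes them, and stop_for_status raises an R error if the server responds with an HTTP error status. The function name query_news and the environment variable NEWSAPI_KEY are hypothetical.

```r
library(httr)
library(jsonlite)

# Hypothetical variant of the query function with basic error handling.
query_news <- function(subject, from, key) {
  resp <- GET("https://newsapi.org/v2/everything",
              query = list(q = subject, from = from, apiKey = key))  # values are URL-encoded by httr
  stop_for_status(resp)  # raise an R error on a 4xx/5xx response
  parsed <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
  articles <- parsed$articles
  # Return an empty data frame rather than failing when nothing matched.
  if (!is.data.frame(articles) || nrow(articles) == 0) {
    return(data.frame())
  }
  articles[, c("url", "author", "title", "content")]
}

# Not run here, since it needs network access and a valid key.
# NEWSAPI_KEY is an assumed environment variable name:
# spring_articles <- query_news("spring training", "2025-03-01", Sys.getenv("NEWSAPI_KEY"))
```

Reading the key from an environment variable keeps it out of the script itself, which helps if the code is shared.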

We try out the function on two topics: spring training and math.


Code
A copyable version of the code shown in the images is below.


#####1. Example API#####
###1a. Load Required Packages###
#run the install lines only if the packages are not already installed#
install.packages("tidyverse")
install.packages("httr")
install.packages("jsonlite")
library(tidyverse)
library(httr)
library(jsonlite)
###1b. Preface The API Call###
topic="baseball"
key="733b246ed99t80c8e6f"
start_date="2025-03-01"
end_date="2025-03-16"
sort="popularity"
my_url=paste0("https://newsapi.org/v2/everything?",
"q=", topic,
"&from=", start_date,
"&to=", end_date,
"&sortBy=", sort,
"&apiKey=", key)
api_return=GET(my_url) #GET is from httr package#
str(api_return) #full structure of the return (long)#
str(api_return, max.level=1) #glimpse of what the API returns#
###1c. Converting JSON To Plaintext###
plaintext=fromJSON(rawToChar(api_return$content))
str(plaintext, max.level=1)
news_articles=plaintext$articles
summary(news_articles)
reduced=as.data.frame(apply(
news_articles[,c("url", "author", "title")],
2,
function(x) substr(x,1,30)))
head(reduced)
news_articles$content[1]
###1d. Put Everything In A Function###
function_api_query=function(subject, time, key) {
  #time is used as the start date ("from") of the search#
  my_url=paste0("https://newsapi.org/v2/everything?",
                "q=", subject,
                "&from=", time,
                "&apiKey=", key)
  api_return=GET(my_url)
  plaintext=fromJSON(rawToChar(api_return$content))
  a=plaintext$articles
  b=a[,c("url", "author", "title", "content")]
  return(b)
}
###1e. Example###
topic1="spring_training"
topic2="math"
date1="2025-03-01"
date2="2025-03-16"
key="733b246ed99t80c8e6f"
baseball_articles=function_api_query(topic1, date1, key)
math_articles=function_api_query(topic2, date2, key)
head(baseball_articles[,c("author", "title")])
head(math_articles[,c("author", "title")])