Querying APIs

Common Functions

GET, POST, paste, paste0, str, substr, apply (and related family of functions), etc.

The R package httr is the preferred choice for interacting with APIs. We illustrate with an example using the free site https://newsapi.org/docs. The site aggregates and archives news stories, which we can pull by keyword and similar filters.

Build URL

After receiving an API key from the site, we want to pull stories about baseball from the past couple of weeks. Following the instructions at the site, we build the URL we want to query by pasting together the base address, the keyword, the date range, the sort order, and the key.
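One detail worth handling when building the URL is that a keyword containing spaces or other special characters needs to be percent-encoded. The sketch below shows one way to do this with URLencode from base R. The key is a made-up placeholder, and the parameter names (q, from, apiKey) are the ones used in the code at the end of this section.

```r
# Placeholder values for illustration; substitute your own key
topic = "spring training"
key = "YOUR_API_KEY"
start_date = "2025-03-01"

# reserved = TRUE percent-encodes spaces and reserved characters
encoded_topic = utils::URLencode(topic, reserved = TRUE)

my_url = paste0("https://newsapi.org/v2/everything?",
                "q=", encoded_topic,
                "&from=", start_date,
                "&apiKey=", key)
print(my_url)
```

The space in the keyword becomes %20, so the URL stays valid when it is sent.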

We install (if necessary) the httr package along with other packages we want to use for our analysis such as jsonlite.

Query API With GET

The main way we query the API is with the GET function from the httr package. Its argument is the URL we created in the previous step.
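A request can fail (for example, with a bad key or a malformed URL), so it can help to check the HTTP status before going further. The helper below is a hypothetical sketch, not part of the original walkthrough. It relies on http_error and status_code from httr and is only defined here, not called.

```r
# Hypothetical helper: query a URL and stop early if the request failed
get_or_stop = function(url) {
  response = httr::GET(url)
  # http_error() is TRUE when the status code signals a client or server error
  if (httr::http_error(response)) {
    stop("Request failed with HTTP status ", httr::status_code(response))
  }
  response
}
```

It would be used in place of a bare GET call, for instance api_return = get_or_stop(my_url).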

Note that the full structure of the return, which we get with str, is long and hard to read. We can get a more manageable overview by adding the argument max.level=1 to the str command.
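To see what max.level does without calling the API, the sketch below runs str on a small made-up nested list. The list stands in for a response object and does not reproduce the real structure of what GET returns.

```r
# A made-up nested list standing in for an API response object
mock_return = list(
  url = "https://example.com",
  status_code = 200L,
  headers = list(`content-type` = "application/json", server = "mock"),
  content = charToRaw("{}")
)

str(mock_return)                 # prints every nested element
str(mock_return, max.level = 1)  # stops after the top level
```

With max.level = 1, the headers element is reported only as a list of two items, without its contents.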

JSON To Plaintext

Of the objects in the list that is returned, we are interested in the actual content. The API returns JSON (JavaScript Object Notation) data, so we use the jsonlite package to convert the return into something more easily readable. Applying rawToChar (from base R) and then fromJSON (from jsonlite) does the trick, and str with max.level=1 gives a high-level view of the converted data.
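The conversion can be tried offline on a hand-written JSON string. The JSON below is made up to imitate the general shape of a response (a status field plus an array of articles) and is not real API output.

```r
library(jsonlite)

# Made-up JSON for illustration; not real API output
mock_json = '{"status":"ok","totalResults":2,"articles":[
  {"author":"A. Writer","title":"First headline","url":"https://example.com/1"},
  {"author":"B. Writer","title":"Second headline","url":"https://example.com/2"}]}'

# A response stores its body as raw bytes, so we mimic that here
raw_body = charToRaw(mock_json)

# rawToChar() turns the bytes back into text; fromJSON() parses the text
plaintext = fromJSON(rawToChar(raw_body))
str(plaintext, max.level = 1)

# An array of JSON objects is simplified to a data frame
news_articles = plaintext$articles
```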

We now store the actual articles as a data frame called news_articles, and examine the data associated with each article with summary. In addition to the content of the article, we get further information including the author, title, URL, and publisher of each piece that matches the parameters defined in the first section (namely the keyword and date range).

Returning Relevant Data

We want to make our article search easier to navigate. Suppose we only care about the title of the piece, the author of the piece, and the website where the piece was published. We can use ordinary base R indexing to select those columns of our data frame (news_articles[,c("url", "author", "title")]), and then use an anonymous function within apply to make our output more manageable. The function we apply to the reduced data frame is substr, which lets us return only the first 30 characters of each entry. We apply this function along the columns of our data frame (hence the 2), and store the result as a data frame by wrapping apply in as.data.frame.
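The same select-and-truncate step can be tried on a small made-up data frame. The articles below are invented for illustration, and only the column names follow the ones used in this section.

```r
# Made-up articles for illustration
news_articles = data.frame(
  url = c("https://example.com/a-very-long-path/to/an/article",
          "https://example.com/b"),
  author = c("A. Writer", "B. Writer"),
  title = c("A headline that runs well past thirty characters in length",
            "Short title"),
  content = c("Body text one", "Body text two"),
  stringsAsFactors = FALSE
)

reduced = as.data.frame(apply(
  news_articles[, c("url", "author", "title")],  # keep three columns
  2,                                             # 2 = work column by column
  function(x) substr(x, 1, 30)))                 # first 30 characters of each entry
print(reduced)
```

Entries shorter than 30 characters, such as the second title, pass through unchanged.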

We can peek inside the ith article by accessing its content.

Putting Everything In A Function

To avoid running the above process over and over each time we want a different keyword or date range, we can put everything into a single function.
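When wrapping the process in a function, it can be convenient to keep the parsing step separate from the network call so that it can be checked offline. The helper below is a hypothetical sketch along those lines. It assumes the response carries a status field equal to "ok" on success and a message field otherwise, and the JSON used to try it out is made up.

```r
library(jsonlite)

# Hypothetical helper: turn the JSON text of a response into a trimmed data frame
extract_articles = function(json_text) {
  parsed = fromJSON(json_text)
  # Assumption: "status" is "ok" on success; otherwise "message" explains why
  if (!identical(parsed$status, "ok")) {
    stop("API reported a problem: ", parsed$message)
  }
  parsed$articles[, c("url", "author", "title", "content")]
}

# Made-up JSON for illustration; not real API output
mock_json = '{"status":"ok","articles":[
  {"url":"https://example.com/1","author":"A. Writer","title":"First","content":"Text one"},
  {"url":"https://example.com/2","author":"B. Writer","title":"Second","content":"Text two"}]}'

articles = extract_articles(mock_json)
print(articles[, c("author", "title")])
```

With a live response, the input would be rawToChar(api_return$content).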

We try out the function on two topics: spring training and math.

Code

A copyable version of the code used in this section is below.

#####1. Example API#####
###1a. Load Required Packages###
install.packages("tidyverse")   #only needed once; skip if already installed#
install.packages("httr")
install.packages("jsonlite")

library(tidyverse)
library(httr)
library(jsonlite)



###1b. Preface The API Call###
topic="baseball"                             
key="733b246ed99t80c8e6f"                   
start_date="2025-03-01"
end_date="2025-03-16"
sort="popularity"

my_url=paste0("https://newsapi.org/v2/everything?", 
              "q=", topic, 
              "&from=", start_date,
              "&to=", end_date,
              "&sortBy=", sort,
              "&apiKey=", key)

api_return=GET(my_url)          #GET is from httr package#
str(api_return)                 #full structure of the return (long)#
str(api_return, max.level=1)    #glimpse of what the API returns#


###1c. Converting JSON To Plaintext###
plaintext=fromJSON(rawToChar(api_return$content))
str(plaintext, max.level=1)

news_articles=plaintext$articles
summary(news_articles)

reduced=as.data.frame(apply(
  news_articles[,c("url", "author", "title")], 
  2, 
  function(x) substr(x,1,30)))
head(reduced)

news_articles$content[1]        



###1d. Put Everything In A Function###
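function_api_query=function(subject, time, key) {
  
  #build the query URL from the keyword, start date, and key#
  my_url=paste0("https://newsapi.org/v2/everything?", 
                "q=", subject, 
                "&from=", time,
                "&apiKey=", key)
  
  #query the API and convert the JSON return to an R list#
  api_return=GET(my_url)
  plaintext=fromJSON(rawToChar(api_return$content))
  
  #keep only the columns we care about#
  a=plaintext$articles
  b=a[,c("url", "author", "title", "content")]
  
  return(b)
}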



###1e. Example###
topic1="spring_training"
topic2="math"

date1="2025-03-01"
date2="2025-03-16"

key="733b246ed99t80c8e6f"                   

baseball_articles=function_api_query(topic1, date1, key)
math_articles=function_api_query(topic2, date2, key)

head(baseball_articles[,c("author", "title")])
head(math_articles[,c("author", "title")])