selenium

selenium is a tool for the automation of web browsers. It is a low-level interface to the WebDriver specification, and an up-to-date alternative to RSelenium.

Installation

# Install selenium from CRAN
install.packages("selenium")

# Or the development version from GitHub
# install.packages("pak")
pak::pak("ashbythorpe/selenium-r")

You must also have a Selenium server installed and running (see below).

Starting the server

A selenium instance consists of two parts: the client and the server. The selenium package only provides the client. This means that you have to start the server yourself.

To do this, you need Java installed (Selenium Server is distributed as a .jar file) and a browser to automate.

There are many different ways to download and start the server, one of which is provided by selenium:

library(selenium)
server <- selenium_server()

This will download the latest version of the server and start it.

By default, the server file is stored in a temporary directory, so it is deleted when the R session ends. To keep the file, so that it does not have to be downloaded again each time, set the temp argument to FALSE:

server <- selenium_server(temp = FALSE)

You can also download and start the server manually:

  1. Download the latest .jar file for Selenium Server. Do this by navigating to the latest GitHub release page (https://github.com/SeleniumHQ/selenium/releases/latest/), scrolling down to the Assets section, and downloading the file named selenium-server-standalone-<VERSION>.jar (with <VERSION> being the latest release version).
  2. Make sure you are in the same directory as the file you downloaded.
  3. In the terminal, run java -jar selenium-server-standalone-<VERSION>.jar standalone --selenium-manager true, replacing <VERSION> with the version number that you downloaded. This will download any drivers you need to communicate with the server and the browser, and start the server.

There are a few other ways of starting Selenium Server:

Waiting for the server to be online

The Selenium server is not ready for use immediately after it is started. If you used selenium_server() to create your server, you can pass the result to wait_for_server(), which waits until the server is online:

wait_for_server(server)

You can also use server$read_output() and server$read_error() to read what the server has written to its output and error streams.

If you used a different method to create your server, use wait_for_selenium_available() instead:

wait_for_selenium_available()
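Conceptually, waiting for the server amounts to polling a readiness check until it succeeds or a time limit passes. The sketch below illustrates that idea with a hypothetical helper, poll_until(), which is not part of selenium and is not necessarily how the package implements its own waiting functions. The check argument is any function that returns TRUE once the server is reachable.

```r
# Hypothetical helper, not part of the selenium package.
# Calls `check()` repeatedly until it returns TRUE or `timeout` seconds pass.
# Errors raised by `check()` are treated as "not ready yet".
poll_until <- function(check, timeout = 60, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    ready <- tryCatch(isTRUE(check()), error = function(e) FALSE)
    if (ready) {
      return(TRUE)
    }
    if (Sys.time() >= deadline) {
      return(FALSE)
    }
    Sys.sleep(interval)
  }
}

# Possible usage, assuming selenium is attached and a server is starting up:
# poll_until(function() selenium_server_available(), timeout = 30)
```

In practice, wait_for_server() and wait_for_selenium_available() already cover this, so a helper like this is only useful when you need a custom readiness condition.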

If any step in this process produces an error or does not work, please see the Debugging Selenium article for more information.

Starting the client

Client sessions can be started using SeleniumSession$new():

session <- SeleniumSession$new()

By default, this will connect to Firefox, but you can use the browser argument to specify a different browser if you like.

session <- SeleniumSession$new(browser = "chrome")
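If a script has to run on machines with different browsers installed, one option is to try several browsers in order. The sketch below uses a hypothetical helper, start_first_available(), which is not part of selenium. It assumes that SeleniumSession$new() signals an error when a session cannot be created for the requested browser.

```r
# Hypothetical helper, not part of the selenium package.
# Tries each browser in turn and returns the first session that starts.
# `start` is the function used to create a session; by default it calls
# SeleniumSession$new() with the given browser name.
start_first_available <- function(
    browsers,
    start = function(browser) selenium::SeleniumSession$new(browser = browser)) {
  for (browser in browsers) {
    # Assumption: a failed start is reported as an R error.
    session <- tryCatch(start(browser), error = function(e) NULL)
    if (!is.null(session)) {
      return(session)
    }
  }
  stop(
    "Could not start a session with any of: ",
    paste(browsers, collapse = ", ")
  )
}

# Possible usage, assuming a Selenium server is already running:
# session <- start_first_available(c("firefox", "chrome"))
```

Passing start as an argument keeps the fallback logic separate from the call that creates the session, so the same loop can be reused with extra arguments to SeleniumSession$new().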

Usage

Once the session has been successfully started, you can use the session object to control the browser. Here, we dynamically navigate through the R project homepage. Remember to close the session and the server process when you are done.

session$navigate("https://www.r-project.org/")
session$
  find_element(using = "css selector", value = ".row")$
  find_element(using = "css selector", value = "ul")$
  find_element(using = "css selector", value = "a")$
  click()

session$
  find_element(using = "css selector", value = ".row")$
  find_elements(using = "css selector", value = "div")[[2]]$
  find_element(using = "css selector", value = "p")$
  get_text()
#> [1] ""

session$close()
#> [1] TRUE
server$kill()
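If an error occurs partway through a script, the calls to session$close() and server$kill() may never be reached, leaving a browser and a server process running. One way to guard against this is to register the cleanup with on.exit(). The sketch below uses a hypothetical helper, with_session(), which is not part of selenium.

```r
# Hypothetical helper, not part of the selenium package.
# Runs `action(session)` and always closes the session (and kills the server,
# if one is supplied) afterwards, even when `action` signals an error.
with_session <- function(session, action, server = NULL) {
  on.exit({
    try(session$close(), silent = TRUE)
    if (!is.null(server)) {
      try(server$kill(), silent = TRUE)
    }
  }, add = TRUE)
  action(session)
}

# Possible usage, assuming `session` and `server` were created as shown above:
# with_session(session, server = server, action = function(s) {
#   s$navigate("https://www.r-project.org/")
#   s$find_element(using = "css selector", value = ".row")$get_text()
# })
```

The value returned by action is passed through, so the helper can wrap any of the calls shown in this section.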

For a more detailed introduction to using selenium, see the Getting Started article.

Note that selenium is low-level and mainly aimed at developers. If you want to use browser automation for web scraping or testing, you may want to take a look at selenider instead.