[🐛 Bug]: Selenium [:driver_service] polling for socket on ["127.0.0.1", 9515] on Docker with Ruby on Rails (webscraping) #15042

Open
kevinher7 opened this issue Jan 7, 2025 · 2 comments

kevinher7 commented Jan 7, 2025

What happened?

Hello everyone! It's my first bug report so sorry if the format is bad.

I've been looking for a solution to this for the last couple of days but the few things I found were about testing with selenium and haven't been of much help.

I am making a Rails app that uses Selenium to do web scraping. Although it works just fine locally, once I run it in a Docker container I get the following log messages:

2025-01-07 22:27:25 Starting ChromeDriver 131.0.6778.204 (52183f9e99a61056f9b78535f53d256f1516f2a0-refs/branch-heads/6778_155@{#7}) on port 9515
2025-01-07 22:27:25 Only local connections are allowed.
2025-01-07 22:27:25 Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
**2025-01-07 22:27:26 2025-01-07 13:27:26 DEBUG Selenium [:driver_service] polling for socket on ["127.0.0.1", 9515]** 
2025-01-07 22:27:26 2025-01-07 13:27:26 DEBUG Selenium [:driver_service] polling for socket on ["127.0.0.1", 9515] 
2025-01-07 22:27:26 2025-01-07 13:27:26 DEBUG Selenium [:driver_service] polling for socket on ["127.0.0.1", 9515] 
2025-01-07 22:27:26 2025-01-07 13:27:26 DEBUG Selenium [:driver_service] polling for socket on ["127.0.0.1", 9515] 
2025-01-07 22:27:27 2025-01-07 13:27:27 DEBUG Selenium [:driver_service] polling for socket on ["127.0.0.1", 9515] 
2025-01-07 22:27:27 2025-01-07 13:27:27 DEBUG Selenium [:driver_service] polling for socket on ["127.0.0.1", 9515] 
2025-01-07 22:27:27 2025-01-07 13:27:27 DEBUG Selenium [:driver_service] polling for socket on ["127.0.0.1", 9515] 
2025-01-07 22:27:27 2025-01-07 13:27:27 DEBUG Selenium [:driver_service] polling for socket on ["127.0.0.1", 9515] 
2025-01-07 22:27:28 2025-01-07 13:27:28 DEBUG Selenium [:driver_service] polling for socket on ["127.0.0.1", 9515] 
2025-01-07 22:27:28 2025-01-07 13:27:28 DEBUG Selenium [:driver_service] polling for socket on ["127.0.0.1", 9515] 
2025-01-07 22:27:28 2025-01-07 13:27:28 DEBUG Selenium [:driver_service] polling for socket on ["127.0.0.1", 9515] 
2025-01-07 22:27:28 2025-01-07 13:27:28 DEBUG Selenium [:driver_service] polling for socket on ["127.0.0.1", 9515] 
2025-01-07 22:27:29 2025-01-07 13:27:29 DEBUG Selenium [:driver_service] polling for socket on ["127.0.0.1", 9515] 
2025-01-07 22:27:29 2025-01-07 13:27:29 DEBUG Selenium [:driver_service] polling for socket on ["127.0.0.1", 9515]
.....
.....
.....
2025-01-07 22:27:42 2025-01-07 13:27:42 DEBUG Selenium [:driver_service] polling for socket on ["127.0.0.1", 9515] 
2025-01-07 22:27:25 [7e9d73e0-f218-4617-bd30-a5916bc181ad] Completed 500 Internal Server Error in 71391ms (ActiveRecord: 0.0ms (0 queries, 0 cached) | GC: 0.0ms)
2025-01-07 22:27:25 [7e9d73e0-f218-4617-bd30-a5916bc181ad]   
2025-01-07 22:27:25 [7e9d73e0-f218-4617-bd30-a5916bc181ad] **Selenium::WebDriver::Error::WebDriverError (unable to connect to /home/rails/.cache/selenium/chromedriver/linux64/131.0.6778.204/chromedriver 127.0.0.1:9515):**

Even so, I can run chromedriver from the terminal just fine:

$ /home/rails/.cache/selenium/chromedriver/linux64/131.0.6778.204/chromedriver
Starting ChromeDriver 131.0.6778.204 (52183f9e99a61056f9b78535f53d256f1516f2a0-refs/branch-heads/6778_155@{#7}) on port 0
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully on port 41375.
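
For anyone hitting the same symptom: the "polling for socket" lines suggest that the Ruby bindings keep trying to open a TCP connection to 127.0.0.1:9515 until they give up. Below is a minimal sketch of the same kind of check using only Ruby's standard library, which can be run from a console inside the container while chromedriver is up. `port_open?` is a hypothetical helper name, not part of selenium-webdriver.

```ruby
require "socket"

# Hypothetical helper (not part of selenium-webdriver): returns true if a TCP
# connection to host:port can be opened within `timeout` seconds.
def port_open?(host, port, timeout: 1)
  # The block form closes the socket automatically once the block returns.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  # Connection refused, timed out, or host not resolvable.
  false
end

# Example: check the port from the log above.
puts port_open?("127.0.0.1", 9515)
```

Running the check once from a plain `ruby` script and once from inside the Rails process (for example via `bin/rails runner`) might show whether the two environments see the port differently.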

I do get the following socket-related error when I try to run Chrome directly, and I think the problem might lie here, but I am not sure how to work around it. I do intend to use Chrome in headless mode.

/home/rails/.cache/selenium/chrome/linux64/131.0.6778.204/chrome
[256:275:0107/140103.774691:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
[256:256:0107/140103.776889:ERROR:ozone_platform_x11.cc(244)] Missing X server or $DISPLAY
[256:256:0107/140103.776924:ERROR:env.cc(257)] The platform failed to initialize.  Exiting.

I would appreciate if I could get some help with this problem.

How can we reproduce the issue?

This is just the basic Ruby on Rails app with a couple of modifications. GitHub repo: https://github.com/kevinher7/selenium-docker

Modifications made:
1. Added the selenium-webdriver gem to the Gemfile
2. Created a service to scrape, i.e. to use Selenium (could be done in the same place as the controller)
3. Used the scrape class in a controller

The only modification to the Dockerfile was to add the Google Chrome dependencies:

RUN apt-get update -qq && \
    apt-get install -y fonts-liberation libasound2 libatk-bridge2.0-0 libatk1.0-0 libatspi2.0-0 \
    libcups2 libdbus-1-3 libdrm2 libgbm1 libgtk-3-0 libvulkan1 libxcomposite1 libxdamage1 libxfixes3 libxkbcommon0 \
    libxrandr2 xdg-utils && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives

Build the image, run it, and then go to the endpoint that triggers the scraping to get the logs mentioned above.

I also tried switching to the normal (not slim) version of the Ruby image, but the problem persisted.



Operating System

It says that the Ruby image is based on Debian but I am not sure.

Selenium version

Ruby 3.4.1 and selenium-webdriver 4.27.0

What are the browser(s) and version(s) where you see this issue?

Chrome 131.0.6778.204 (stable version atm)

What are the browser driver(s) and version(s) where you see this issue?

ChromeDriver 131.0.6778.204

Are you using Selenium Grid?

No (I think)

github-actions bot commented Jan 7, 2025

@kevinher7, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then the I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

kevinher7 (Author) commented

Hello again! After some more debugging I found out it's not really an issue with Selenium itself but rather some sort of configuration set by Rails, although this is just my guess. For the moment, here is what I found.

The workaround

The workaround I found is to use Docker Compose with two services: the Rails app and a Selenium image. The resulting image was surprisingly light, so it's not a bad workaround. Here is the docker-compose.yml for reference:

services:
  website:
    build: .
    entrypoint: ./bin/docker-entrypoint
    command: ["./bin/thrust", "./bin/rails", "server"]
    ports:
      - "80:3000"
    depends_on:
      - selenium
    environment:
      - RAILS_MASTER_KEY=${RAILS_MASTER_KEY}
      - SELENIUM_URL=http://selenium:4444/wd/hub
    tty: true # allocates a pseudo-TTY; you may not need this
    stdin_open: true # keeps STDIN open (like docker run -i); you may not need this either
  
  selenium:
    image: selenium/standalone-chrome
    ports:
      - "4444:4444"

The important things are:

  1. Map the ports correctly on the selenium service
  2. Add the SELENIUM_URL environment variable to the website environment (replace 4444 with the port you decide to use)

Then modify the scraper service class to use a remote driver:

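require "selenium-webdriver"

# Scrapes a page through a remote Selenium server (the selenium service above).
class Scraper
  def initialize(url)
    @options = Selenium::WebDriver::Chrome::Options.new
    @options.add_argument("--headless")

    # Falls back to localhost when SELENIUM_URL is not set.
    @selenium_url = ENV.fetch("SELENIUM_URL", "http://localhost:4444/wd/hub")
    @article_url = url
  end

  def scrape
    @driver = Selenium::WebDriver.for :remote, url: @selenium_url, options: @options
    @driver.navigate.to @article_url

    # Do Some Scraping

    # Return Scraped Data
  ensure
    # Always end the session, even if scraping raises, so no browser is left running.
    @driver&.quit
  end
end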

After making these changes, build with docker-compose build and then run the application using docker-compose up.
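
One caveat with the Compose file above: `depends_on` only controls start order, so the Rails container can come up before the Selenium server is ready to accept sessions. A rough sketch of a readiness check follows, assuming the server exposes the standard WebDriver `/status` endpoint at the root of the same host and port as `SELENIUM_URL`. `selenium_ready?` is a hypothetical helper, not part of selenium-webdriver.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper (not part of selenium-webdriver): polls GET /status on the
# Selenium server until it reports ready, or returns false after `timeout` seconds.
def selenium_ready?(selenium_url, timeout: 30, interval: 0.5)
  status_uri = URI(selenium_url)
  status_uri.path = "/status" # assumption: status lives at the server root, not under /wd/hub
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    begin
      response = Net::HTTP.get_response(status_uri)
      if response.is_a?(Net::HTTPSuccess)
        # The WebDriver status payload looks like {"value": {"ready": true, ...}}.
        return true if JSON.parse(response.body).dig("value", "ready") == true
      end
    rescue SystemCallError, SocketError, IOError, Timeout::Error, JSON::ParserError
      # Server not reachable or not answering properly yet; retry below.
    end

    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end
```

It could be called with the same `SELENIUM_URL` value at the start of `Scraper#scrape`, raising or retrying if it returns false, before `Selenium::WebDriver.for :remote` is invoked.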

Hope this helps someone out there!

Why tho?

After some more debugging, I narrowed the cause down to some sort of compatibility issue between Rails and Chrome/ChromeDriver (maybe the way Rails is set up prevents the process from interacting directly with Chrome instances). My reason for thinking so is that I could run a scraping script from the shell of the same Docker container, as long as the Ruby script was outside the Rails app directory. I didn't try running a file inside the Rails directory from the container's shell, though.

I did try a lot of things, like running dbus or trying to set up a display, but my best guess is that Rails (at least in production mode) prevents Ruby files from launching or connecting to Chrome/ChromeDriver.

In the end, I guess it is some sort of issue with Rails and not Selenium itself, so I suppose it's okay to close this issue. Thanks for reading.
