Crawl in Sidekiq - Selenium::WebDriver::Error::WebDriverError: not a file: "./bin/chromedriver #48

Open
Mirk32 opened this issue Jun 29, 2020 · 4 comments

Comments

Mirk32 commented Jun 29, 2020

I'm trying to run a crawler via a Sidekiq job on my DigitalOcean droplet, but it always fails with Selenium::WebDriver::Error::WebDriverError: not a file: "./bin/chromedriver". At the same time, I can run crawl! via the Rails console and it works well, and it also works via Sidekiq on my local machine. I defined chromedriver_path in the Kimurai initializer: config.chromedriver_path = Rails.root.join('lib', 'webdrivers', 'chromedriver_83').to_s
Logs of the Sidekiq job, which I also started via the Rails console with FekoCrawlWorker.perform_async:

Jun 29 19:43:26 aquacraft sidekiq[7201]: 2020-06-29T19:43:26.602Z 7201 TID-ou13yz8xx FekoCrawlWorker JID-7d134b4ee9407973d7803f0b INFO: start
Jun 29 19:43:26 aquacraft sidekiq[7201]: I, [2020-06-29 19:43:26 +0000#7201] [C: 70059979631140]  INFO -- feko_spider: Spider: started: feko_spider
Jun 29 19:43:26 aquacraft sidekiq[7201]: D, [2020-06-29 19:43:26 +0000#7201] [C: 70059979631140] DEBUG -- feko_spider: BrowserBuilder (selenium_chrome): created browser instance
Jun 29 19:43:26 aquacraft sidekiq[7201]: D, [2020-06-29 19:43:26 +0000#7201] [C: 70059979631140] DEBUG -- feko_spider: BrowserBuilder (selenium_chrome): enabled native headless_mode
Jun 29 19:43:26 aquacraft sidekiq[7201]: I, [2020-06-29 19:43:26 +0000#7201] [C: 70059979631140]  INFO -- feko_spider: Browser: started get request to: https://feko.com.ua/shop/category/kotly/gazovye-kotly331/page/1
Jun 29 19:43:26 aquacraft sidekiq[7201]: 2020-06-29 19:43:26 WARN Selenium [DEPRECATION] :driver_path is deprecated. Use :service with an instance of Selenium::WebDriver::Service instead.
Jun 29 19:43:26 aquacraft sidekiq[7201]: I, [2020-06-29 19:43:26 +0000#7201] [C: 70059979631140]  INFO -- feko_spider: Info: visits: requests: 1, responses: 0
Jun 29 19:43:26 aquacraft sidekiq[7201]: 2020-06-29 19:43:26 WARN Selenium [DEPRECATION] :driver_path is deprecated. Use :service with an instance of Selenium::WebDriver::Service instead.
Jun 29 19:43:26 aquacraft sidekiq[7201]: I, [2020-06-29 19:43:26 +0000#7201] [C: 70059979631140]  INFO -- feko_spider: Browser: driver selenium_chrome has been destroyed
Jun 29 19:43:26 aquacraft sidekiq[7201]: F, [2020-06-29 19:43:26 +0000#7201] [C: 70059979631140] FATAL -- feko_spider: Spider: stopped: {:spider_name=>"feko_spider", :status=>:failed, :error=>"#<Selenium::WebDriver::Error::WebDriverError: not a file: \"./bin/chromedriver\">", :environment=>"development", :start_time=>2020-06-29 19:43:26 +0000, :stop_time=>2020-06-29 19:43:26 +0000, :running_time=>"0s", :visits=>{:requests=>1, :responses=>0}, :items=>{:sent=>0, :processed=>0}, :events=>{:requests_errors=>{}, :drop_items_errors=>{}, :custom=>{}}}
Jun 29 19:43:26 aquacraft sidekiq[7201]: 2020-06-29T19:43:26.607Z 7201 TID-ou13yz8xx FekoCrawlWorker JID-7d134b4ee9407973d7803f0b INFO: fail: 0.006 sec
Jun 29 19:43:26 aquacraft sidekiq[7201]: 2020-06-29T19:43:26.608Z 7201 TID-ou13yz8xx WARN: {"context":"Job raised exception","job":{"class":"FekoCrawlWorker","args":[],"retry":false,"queue":"default","backtrace":true,"jid":"7d134b4ee9407973d7803f0b","created_at":1593459806.6006012,"enqueued_at":1593459806.6006787},"jobstr":"{\"class\":\"FekoCrawlWorker\",\"args\":[],\"retry\":false,\"queue\":\"default\",\"backtrace\":true,\"jid\":\"7d134b4ee9407973d7803f0b\",\"created_at\":1593459806.6006012,\"enqueued_at\":1593459806.6006787}"}
Jun 29 19:43:26 aquacraft sidekiq[7201]: 2020-06-29T19:43:26.608Z 7201 TID-ou13yz8xx WARN: Selenium::WebDriver::Error::WebDriverError: not a file: "./bin/chromedriver"
Jun 29 19:43:26 aquacraft sidekiq[7201]: 2020-06-29T19:43:26.608Z 7201 TID-ou13yz8xx WARN: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/platform.rb:136:in `assert_file'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/platform.rb:140:in `assert_executable'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/service.rb:138:in `binary_path'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/service.rb:94:in `initialize'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/service.rb:41:in `new'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/service.rb:41:in `chrome'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/driver.rb:299:in `service_url'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/chrome/driver.rb:40:in `initialize'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/driver.rb:46:in `new'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/driver.rb:46:in `for'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver.rb:88:in `for'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/capybara-2.18.0/lib/capybara/selenium/driver.rb:23:in `browser'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/kimurai-1.4.0/lib/kimurai/capybara_ext/selenium/driver.rb:32:in `port'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/kimurai-1.4.0/lib/kimurai/capybara_ext/selenium/driver.rb:28:in `pid'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/kimurai-1.4.0/lib/kimurai/capybara_ext/driver/base.rb:16:in `current_memory'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/kimurai-1.4.0/lib/kimurai/capybara_ext/session.rb:51:in `ensure in visit'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/kimurai-1.4.0/lib/kimurai/capybara_ext/session.rb:52:in `visit'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/kimurai-1.4.0/lib/kimurai/base.rb:201:in `request_to'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/kimurai-1.4.0/lib/kimurai/base.rb:128:in `block in crawl!'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/kimurai-1.4.0/lib/kimurai/base.rb:124:in `each'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/shared/bundle/ruby/2.6.0/gems/kimurai-1.4.0/lib/kimurai/base.rb:124:in `crawl!'
Jun 29 19:43:26 aquacraft sidekiq[7201]: /home/deploy/aquacraft/releases/20200627190630/app/workers/feko_crawl_worker.rb:9:in `perform'

Sidekiq worker code:

require 'sidekiq-scheduler'

class FekoCrawlWorker
  include Sidekiq::Worker

  sidekiq_options retry: false, backtrace: true, queue: 'default'

  def perform
    Crawlers::Feko.crawl!
  end
end
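The error mentions the library's default relative path ("./bin/chromedriver") rather than the absolute path set in the initializer, which suggests the configured path isn't visible in the Sidekiq process, and a relative path also resolves against whatever working directory that process happens to have. A minimal sketch of failing fast on a bad path (the helper name `resolve_chromedriver` is hypothetical, not part of Kimurai):

```ruby
# Hypothetical helper: expand a chromedriver path and confirm it points at
# an executable file, so a misconfigured path raises a clear error up front
# instead of failing deep inside Selenium.
def resolve_chromedriver(path)
  expanded = File.expand_path(path) # relative paths resolve against the process CWD
  unless File.file?(expanded) && File.executable?(expanded)
    raise ArgumentError, "chromedriver not found or not executable: #{expanded}"
  end
  expanded
end

# A relative path like "./bin/chromedriver" only works if the worker's
# working directory happens to contain it:
begin
  resolve_chromedriver("./bin/chromedriver")
rescue ArgumentError => e
  puts e.message
end
```

Calling a check like this at the top of the worker's `perform` would make it obvious whether the Sidekiq process actually sees the path the initializer configured.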
@Mirk32 Mirk32 changed the title Selenium::WebDriver Sidekiq - Selenium::WebDriver::Error::WebDriverError: not a file: "./bin/chromedriver Crawl in Sidekiq - Selenium::WebDriver::Error::WebDriverError: not a file: "./bin/chromedriver Jun 29, 2020
kaka-ruto commented Sep 16, 2020

I have much the same issue when I use :selenium_chrome, but on my local machine:

/Users/kaka/.asdf/installs/ruby/2.5.1/lib/ruby/gems/2.5.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/platform.rb:136:in `assert_file': not a file: "/usr/local/bin/chromedriver" (Selenium::WebDriver::Error::WebDriverError)

It works when I use :selenium_firefox

@kaka-ruto

Also check the config. I haven't tried it, but changing the default location for the webdriver could help: https://github.com/vifreefly/kimuraframework#configuration-options

 # Provide custom chrome binary path (default is any available chrome/chromium in the PATH):
  # config.selenium_chrome_path = "/usr/bin/chromium-browser"
  # Provide custom selenium chromedriver path (default is "/usr/local/bin/chromedriver"):
  # config.chromedriver_path = "~/.local/bin/chromedriver"

@GarnicaJR

Thanks @kaka-ruto, I tried using Kimurai.configure and it worked as shown below.


require 'kimurai'
require 'csv'
require 'json'

Kimurai.configure do |config|
  # Default logger has colored mode in development.
  # If you would like to disable it, set `colorize_logger` to false.
  # config.colorize_logger = false

  # Logger level for default logger:
  # config.log_level = :info

  # Custom logger:
  # config.logger = Logger.new(STDOUT)

  # Custom time zone (for logs):
  # config.time_zone = "UTC"
  # config.time_zone = "Europe/Moscow"

  # Provide custom chrome binary path (default is any available chrome/chromium in the PATH):
  # config.selenium_chrome_path = "/usr/bin/chromium-browser"
  # Provide custom selenium chromedriver path (default is "/usr/local/bin/chromedriver"):
  config.chromedriver_path = "/usr/bin/chromedriver"
end

class JobScraper < Kimurai::Base
  @name = 'eng_job_scraper'
  @start_urls = ["https://www.indeed.com/jobs?q=software+engineer&l=New+York%2C+NY"]
  @engine = :selenium_chrome

  @@jobs = []

  def scrape_page
    doc = browser.current_response
    returned_jobs = doc.css('td#resultsCol')
    returned_jobs.css('div.jobsearch-SerpJobCard').each do |char_element|
      title = char_element.css('h2 a')[0].attributes["title"].value.gsub(/\n/, "")
      link = "https://indeed.com" + char_element.css('h2 a')[0].attributes["href"].value.gsub(/\n/, "")
      description = char_element.css('div.summary').text.gsub(/\n/, "")
      company = char_element.css('span.company').text.gsub(/\n/, "")
      location = char_element.css('div.location').text.gsub(/\n/, "")
      salary = char_element.css('div.salarySnippet').text.gsub(/\n/, "")
      requirements = char_element.css('div.jobCardReqContainer').text.gsub(/\n/, "")
      # job = [title, link, description, company, location, salary, requirements]
      job = {title: title, link: link, description: description, company: company, location: location, salary: salary, requirements: requirements}

      @@jobs << job if !@@jobs.include?(job)
    end
  end

  def parse(response, url:, data: {})

    10.times do
      scrape_page

      if browser.current_response.css('div#popover-background').any? || browser.current_response.css('div#popover-input-locationtst').any?
        browser.refresh
      end

      browser.find(:xpath, '/html/body/table[2]/tbody/tr/td/table/tbody/tr/td[1]/nav/div/ul/li[6]/a/span').click
      puts "🔹 🔹 🔹 CURRENT NUMBER OF JOBS: #{@@jobs.count}🔹 🔹 🔹"
      puts "🔺 🔺 🔺 🔺 🔺  CLICKED NEXT BUTTON 🔺 🔺 🔺 🔺 "
    end

    CSV.open('jobs.csv', "w") do |csv|
      csv << @@jobs.first.keys unless @@jobs.empty?
      @@jobs.each { |job| csv << job.values }
    end

    File.open("jobs.json","w") do |f|
      f.write(JSON.pretty_generate(@@jobs))
    end

    @@jobs
  end
end

jobs = JobScraper.crawl!

FYI, I am using Arch Linux, where chromedriver is installed at '/usr/bin/chromedriver' by default. When I ran the code I found another issue related to lsof: the tool is not installed by default on Arch, so I had to install it from the AUR repositories:

yay -S lsof

Now everything looks good :)
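Since the fix came down to both chromedriver and lsof being findable by the process, a quick sanity check along these lines can be run before crawling (a sketch; the helper `on_path?` is not part of Kimurai):

```ruby
# Return true if an executable named `bin` exists in any directory on PATH.
def on_path?(bin)
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, bin)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Kimurai drives Chrome through chromedriver and shells out to lsof
# for process bookkeeping, so both need to be resolvable:
%w[chromedriver lsof].each do |bin|
  puts "#{bin}: #{on_path?(bin) ? 'found' : 'missing'}"
end
```

Note this only covers binaries looked up via PATH; an explicit `config.chromedriver_path` is checked as a file path instead.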

@kaka-ruto

Awesome @GarnicaJR ! Glad you got it working.
