Improve Scrapy Integration to Prevent Reactor Issues #4

Open
s4dhulabs opened this issue Oct 4, 2024 · 0 comments
Labels: bug, enhancement, reactor, scrapy

Description:

The Vimana framework currently uses Scrapy for concurrent, asynchronous processing within plugins. Because a plugin's workflow can be composed of tasks executed by many other plugins, a plugin that uses Scrapy can freeze or get stuck in a loop when it is called by another plugin. The problem is most pronounced when CrawlerProcess and CrawlerRunner are used within the same execution flow.

For example, in the siddhi plugin, the following code snippet illustrates the issue:

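import twisted.internet.error
from scrapy.crawler import CrawlerProcess, CrawlerRunner
from twisted.internet import reactor

try:
    process = CrawlerProcess(dict(settings))
    process.crawl(_dmt_, **self.vmnf_handler)
    # stop_after_crawl=False: the reactor is not stopped when the crawl ends
    process.start(stop_after_crawl=False)
except twisted.internet.error.ReactorAlreadyInstalledError:
    # Fallback used when a reactor has already been installed
    runner = CrawlerRunner(dict(settings))
    d = runner.crawl(_dmt_, **self.vmnf_handler)
    # Nothing stops the reactor when d fires, so this call can block forever
    reactor.run(installSignalHandlers=False)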

In this scenario, if the siddhi plugin calls another plugin that also uses Scrapy, the Twisted reactor may already be installed by the time the second plugin starts, and execution either breaks or enters an infinite loop. This needs to be addressed so that plugins run smoothly within the Vimana framework.

Steps to Reproduce:

  1. Create a plugin that uses Scrapy and CrawlerProcess.
  2. Call this plugin from another plugin that also uses Scrapy.
  3. Observe that the reactor may already be installed, causing the execution to break or freeze.

Expected Behavior:

  • The framework should handle the reactor installation gracefully.
  • Plugins should be able to call other plugins that use Scrapy without causing reactor issues.
  • Proper exception handling should be in place to prevent infinite loops or freezing.

Proposed Solution:

  • Use CrawlerRunner instead of CrawlerProcess to avoid issues with the reactor already being installed.
  • Manage the reactor lifecycle more robustly to ensure it is started and stopped correctly.
  • Implement proper exception handling to catch and handle errors gracefully.

Additional Context:

This issue is critical for ensuring the reliability and stability of the Vimana framework, especially when dealing with complex workflows involving multiple plugins.
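
The proposed solution could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Vimana's implementation: ReactorGuard, track, run_until_idle, and crawl are hypothetical names, and the reactor passed in is assumed to be the one Scrapy expects (that is, it matches the TWISTED_REACTOR setting). The idea is to route every crawl through CrawlerRunner, count the crawls in flight, let only the outermost caller start the reactor, and stop it when the last crawl finishes.

```python
class ReactorGuard:
    """Hypothetical helper (not part of Scrapy or Vimana): counts in-flight
    crawls so that only the outermost caller starts the reactor, and the
    reactor is stopped once the last tracked crawl has finished."""

    def __init__(self, reactor):
        # The reactor is injected; the guard never installs one itself.
        self._reactor = reactor
        self._pending = 0

    def track(self, deferred):
        """Register a crawl Deferred, e.g. the value returned by CrawlerRunner.crawl()."""
        self._pending += 1
        deferred.addBoth(self._finished)  # runs on success and on failure
        return deferred

    def _finished(self, result):
        self._pending -= 1
        if self._pending == 0 and self._reactor.running:
            self._reactor.stop()
        return result  # pass the result (or Failure) along the callback chain

    def run_until_idle(self):
        """Start the reactor only if crawls are pending and it is not already
        running; a plugin called from inside a running reactor skips this."""
        if self._pending > 0 and not self._reactor.running:
            self._reactor.run()


def crawl(guard, spider_cls, settings, **spider_kwargs):
    """Schedule one crawl through CrawlerRunner and make sure a reactor drives it."""
    from scrapy.crawler import CrawlerRunner  # lazy import; requires Scrapy

    runner = CrawlerRunner(settings)
    d = guard.track(runner.crawl(spider_cls, **spider_kwargs))
    guard.run_until_idle()
    return d
```

With a single guard = ReactorGuard(reactor) shared by all plugins, the siddhi call would become crawl(guard, _dmt_, dict(settings), **self.vmnf_handler). A plugin invoked while the reactor is already running gets the Deferred back immediately instead of blocking. Because a Twisted reactor cannot be restarted once it has stopped, this sketch covers one top-level run per process.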

Observation: In the past, this problem was sometimes worked around by using CrawlerRunner instead of CrawlerProcess. That did not resolve the issue in every scenario, because the interactions between plugins create unique execution contexts that are difficult to test comprehensively.

This issue aims to improve the integration of Scrapy within the Vimana framework to prevent reactor-related issues and ensure smooth execution of plugins.

s4dhulabs added the bug, enhancement, scrapy, and reactor labels on Oct 4, 2024
s4dhulabs self-assigned this on Oct 4, 2024