Make sure PyWPS objects are serializable #658

Open
1 of 8 tasks
huard opened this issue May 13, 2022 · 5 comments

Comments

@huard
Collaborator

huard commented May 13, 2022

Description

Parallelisation libraries such as dask send processes from the scheduler to workers by serializing objects, transferring them over the network, and deserializing them on the other side. Some PyWPS objects, however, do not appear to be serializable. The issues I've found so far are:

  • Inputs and outputs in Process objects hold weak references (weakref in the IO handlers).
  • WPSRequest holds an EncodedFile object that is not picklable.
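To illustrate the first point, here is a minimal sketch of why a weakref attribute breaks pickling and of one common workaround: implementing `__getstate__`/`__setstate__` so the weak reference is replaced by a strong one when pickling and rebuilt when unpickling. The classes (`NaiveHandler`, `PicklableHandler`, `Owner`) are hypothetical stand-ins, not PyWPS's actual IO handlers, and I'm assuming the weak reference points back to the owning input or output; the real attribute names in PyWPS may differ.

```python
import pickle
import weakref


def _dead_ref():
    """Stand-in for a weak reference whose target no longer exists."""
    return None


class NaiveHandler:
    """Hypothetical IO handler keeping a weak back-reference to its owner."""

    def __init__(self, owner):
        self._owner_ref = weakref.ref(owner)

    @property
    def owner(self):
        return self._owner_ref()


class PicklableHandler(NaiveHandler):
    """Same handler, but it swaps the weakref for a plain reference when pickled."""

    def __getstate__(self):
        state = self.__dict__.copy()
        # weakref.ref objects cannot be pickled, so store the referent itself.
        state["_owner_ref"] = self.owner
        return state

    def __setstate__(self, state):
        owner = state.pop("_owner_ref")
        self.__dict__.update(state)
        # Rebuild the weak reference; a dead reference stays dead.
        self._owner_ref = weakref.ref(owner) if owner is not None else _dead_ref


class Owner:
    """Hypothetical stand-in for a PyWPS input or output."""

    def __init__(self, identifier, handler_cls):
        self.identifier = identifier
        self.handler = handler_cls(self)


# The naive handler makes the whole owner unpicklable (pickle raises TypeError).
try:
    pickle.dumps(Owner("tas", NaiveHandler))
except TypeError as err:
    print("NaiveHandler:", err)

# The cycle owner -> handler -> owner is restored through pickle's memo.
restored = pickle.loads(pickle.dumps(Owner("tas", PicklableHandler)))
assert restored.handler.owner is restored
```

The strong reference only exists inside the pickled payload; after loading, the handler again holds a weak reference, so the ownership semantics are unchanged.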

I propose to start by writing tests that try to pickle PyWPS objects, submit a PR, and pursue the discussion over there.
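A pickle round-trip test along those lines could start from a helper like the one below, which also reports which attributes block pickling. It is a sketch: `assert_roundtrip`, `unpicklable_attributes` and `FakeRequest` are hypothetical names, the lock merely stands in for an unpicklable member such as the EncodedFile, and stdlib `pickle` is used as a conservative baseline on the assumption that anything it rejects is worth flagging (dask may fall back to cloudpickle, which accepts some objects plain pickle rejects).

```python
import pickle
import threading

PICKLE_ERRORS = (TypeError, AttributeError, pickle.PicklingError)


def unpicklable_attributes(obj):
    """Map each attribute of ``obj`` that fails to pickle on its own to its error."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except PICKLE_ERRORS as err:
            failures[name] = f"{type(err).__name__}: {err}"
    return failures


def assert_roundtrip(obj):
    """Pickle and unpickle ``obj``; on failure, name the offending attributes."""
    try:
        return pickle.loads(pickle.dumps(obj))
    except PICKLE_ERRORS as err:
        offenders = sorted(unpicklable_attributes(obj)) if hasattr(obj, "__dict__") else []
        raise AssertionError(
            f"{type(obj).__name__} is not picklable ({err}); "
            f"offending attributes: {offenders}"
        ) from err


class FakeRequest:
    """Hypothetical stand-in for WPSRequest holding an unpicklable member."""

    def __init__(self):
        self.identifier = "subset"
        self.stream = threading.Lock()  # plays the role of the EncodedFile


try:
    assert_roundtrip(FakeRequest())
except AssertionError as err:
    print(err)
```

In a real test, the same helper would be parametrized over actual PyWPS objects (a Process, a WPSRequest, their inputs and outputs), whose construction is not shown here.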

Environment

  • operating system: Ubuntu
  • Python version: 3.9.7
  • PyWPS version: 4.5.2
  • source/distribution
    • git clone
    • Debian
    • PyPI
    • zip/tar.gz
    • other (please specify):
  • web server
    • Apache/mod_wsgi
    • CGI
    • other (please specify):

Steps to Reproduce

Additional Information

@huard
Collaborator Author

huard commented May 17, 2022

While investigating this, I realized that Process.json returns a dict, while WPSRequest.json returns a string. The former has a from_json method, whereas in the latter, json is a property with getter and setter methods.

Is this something that should be uniform across the code?
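One way to make the convention uniform, sketched with hypothetical names (`JsonSerializable`, `to_json_string`, `from_json_string`, `ExampleRequest`) rather than PyWPS's current API: `json` always returns a dict, `from_json` always accepts one, and converting to and from a string is a separate, explicitly named step.

```python
import json


class JsonSerializable:
    """Hypothetical mixin: ``json`` always returns a dict, ``from_json`` always takes one."""

    @property
    def json(self):
        raise NotImplementedError

    @classmethod
    def from_json(cls, data):
        raise NotImplementedError

    def to_json_string(self):
        # Text encoding is its own step, not a second meaning of ``json``.
        return json.dumps(self.json, sort_keys=True)

    @classmethod
    def from_json_string(cls, text):
        return cls.from_json(json.loads(text))


class ExampleRequest(JsonSerializable):
    """Hypothetical request-like class used only to exercise the convention."""

    def __init__(self, identifier, inputs):
        self.identifier = identifier
        self.inputs = inputs

    @property
    def json(self):
        return {"identifier": self.identifier, "inputs": self.inputs}

    @classmethod
    def from_json(cls, data):
        return cls(data["identifier"], data["inputs"])


req = ExampleRequest("subset", {"lat": [45.0, 46.0]})
copy = ExampleRequest.from_json_string(req.to_json_string())
assert copy.json == req.json
```

Keeping the dict as the common currency also fits pickling: a class following this pattern could implement `__getstate__` by returning `self.json`, although that is a design choice, not something PyWPS does today.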

@huard
Collaborator Author

huard commented May 18, 2022

Another, more serious issue is that Process._run_process, the method that actually runs the process handler, triggers Process.launch_next_process, which in turn runs Service.prepare_process_for_execution. Individual processes therefore need a reference to the overall service, which complicates serializing them. I'm not sure I can solve this one without falling into a refactoring nightmare. Ideas?
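One possible direction, sketched with hypothetical classes rather than PyWPS's actual Process and Service: invert the dependency so that the service injects a callback at run time instead of the process reaching for the service, and drop that callback from the pickled state. A worker then only needs the process and its handler, while launching the next process stays on the side that owns the service. This ignores whatever else `_run_process` does and says nothing about how hard the change would be in the current code base.

```python
import pickle


def double(data):
    """Module-level handler, so pickle can serialize it by reference."""
    return data * 2


class DetachedProcess:
    """Hypothetical, heavily simplified process with no stored service reference."""

    def __init__(self, identifier, handler):
        self.identifier = identifier
        self.handler = handler
        # Injected by the service at run time; never part of the serialized state.
        self.on_finished = None

    def __getstate__(self):
        state = self.__dict__.copy()
        state["on_finished"] = None  # drop the back-link to the service
        return state

    def run(self, data):
        result = self.handler(data)
        if self.on_finished is not None:
            self.on_finished(self)  # e.g. the service launching the next queued process
        return result


class Service:
    """Hypothetical service that owns the queue and reacts when a process finishes."""

    def __init__(self):
        self.finished = []

    def launch_next(self, process):
        self.finished.append(process.identifier)


service = Service()
proc = DetachedProcess("double", double)
proc.on_finished = service.launch_next

clone = pickle.loads(pickle.dumps(proc))  # the service is not shipped to the worker
assert clone.on_finished is None
assert clone.run(21) == 42
```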

@gschwind
Collaborator

gschwind commented Oct 9, 2023

Hello huard,

You wrote:

"While investigating this, I realized that the Process.json returns dict, while WPSRequest.json returns a string. The former has a from_json method, while in the second, json is a property with getter and setter methods.

Is this something that should be uniform across the code?"

I also noticed the inconsistent behavior of the json properties across the code, and I addressed the issue in some of my refactoring, such as [1].

I think this should be fixed.

[1] gschwind@db27387

@gschwind
Collaborator

gschwind commented Oct 9, 2023

Hello huard,

You wrote: "Another more serious issue is that Process._run_process, the method actually running the process handler, triggers Process.launch_next_process, which runs Service.prepare_process_for_execution. So individual processes need a reference to the overall service, which complicates the serialization of Processes. I'm not sure I can solve this one without falling into a refactoring nightmare. Ideas?"

I agree that this is quite an issue, but refactoring it is very difficult at the moment.

Best regards.

@gschwind
Collaborator

gschwind commented Oct 9, 2023

Hello,

Moreover, JSON serialization is used in different contexts with very different meanings: the serialized data may end up as JSON output for JSON requests, be used within XML templates, or be used to store data in the database.

We should clarify the intended purpose of JSON serialization and stick to it consistently.
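A sketch of what clarifying the intent could look like, with hypothetical names throughout: keep one canonical dict as the source of truth and give each of the three uses mentioned above (JSON responses, XML templates, database storage) its own explicitly named method, so that a change made for one context cannot silently alter another. The response field names (`id`, `value`) are invented for illustration, and whether escaping belongs in these methods or in the template engine is left open.

```python
import json


class Output:
    """Hypothetical output with one canonical dict and explicit per-context views."""

    def __init__(self, identifier, title, data):
        self.identifier = identifier
        self.title = title
        self.data = data

    def to_dict(self):
        # Canonical, lossless state: the single source of truth.
        return {"identifier": self.identifier, "title": self.title, "data": self.data}

    def to_response_json(self):
        # What a client receives in reply to a JSON request.
        return json.dumps({"id": self.identifier, "title": self.title, "value": self.data})

    def to_template_context(self):
        # Only the fields an XML template renders; escaping is left to the template engine.
        return {"identifier": self.identifier, "title": self.title}

    def to_db_record(self):
        # Persisted form; must round-trip through from_db_record.
        return json.dumps(self.to_dict(), sort_keys=True)

    @classmethod
    def from_db_record(cls, text):
        return cls(**json.loads(text))


out = Output("tas", "Near-surface air temperature", [280.1, 281.4])
assert Output.from_db_record(out.to_db_record()).to_dict() == out.to_dict()
```

The point of the split is that only `to_db_record`/`from_db_record` carry a round-trip guarantee; the response and template views are free to evolve with their consumers.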

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants