Skip to content

Conversation

@SaelKimberly
Copy link

@SaelKimberly SaelKimberly commented Aug 26, 2025

Hello everyone!

Perhaps, as part of a real modernization of the code base, it is worth introducing the use of Ruff as a linter and formatter for python code.
And move once and for all from python2 (dead for many years) to python3 (ideally 3.9+, since older ones are no longer supported by the community).

This PR is an attempt to automatically (by 99.9%) migrate to more or less modern code.

If it takes off, you can add checks for known errors and formatting problems to CI before building:

  • ruff check --silent --target-version py37 --select <error codes>
  • ruff format --check --silent --target-version py37

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Communicate in the mailing list if needed
  • Pass make installcheck
  • Review a PR in return to support the community

modernize --no-diffs --no-six -wnj 8 $(ruff check --show-files)
`ruff format --target-version py37 --preview`
Also, some manual fixes for identation issues in legacy
`ruff check --fix --target-version py37 --preview`
@Stolb27
Copy link
Collaborator

Stolb27 commented Sep 4, 2025

Hello, @SaelKimberly! We are glad to see your contribution! Python 2 reached EOL 5 years ago.

As far I understand, you've checked the possibility to make cluster and proceed basic cluster operations. Let's run it on our CI. For this purpose, I ask you to separate code linting to including in the further patch. It helps us to concentrate on functional changes first. Can you make it?

The next step is expanding the test suite by running more tests. At least @concourse_cluster tests should be brought up. We'll plan this work, but the similar help are welcome.

Suggested by you changes require modification not only in the core part of Greengage but in the user third-party utilities. E.g.: utilities that use built-in gppylib for cluster connection. So we need some time to collect feedback from other users about these changes.

In case of success, we'll be able to integrate this patch past November, taking in account internal working plan.

PS. Also, please reword the ticket description in English, to make this discussion available for everyone.

@SaelKimberly SaelKimberly changed the title Автоматическая миграция и форматирование легаси кода (Python2) Automatic modernization (Python 2.x to Python 3.x) and formatting of legacy python code. Sep 4, 2025
@SaelKimberly
Copy link
Author

Hello, @Stolb27! I have not built yet GreengageDB with these changes.
Perhaps Ruff should be tuned more granular to keep some pieces of legacy code intact, but this utility applies only safe fixes to codebase, so I don't know, if this needed or not.
The modernize utility also should not break the source code, but if some processes uses Python 2.x, it definitely may be broken since.

This PR is, first and foremost, proof of the fundamental possibility of automatic code migration (99.9% of code changes took less than 1 minute).
When I was still working with Greenplum, the approach to writing Python code in this project was quite sloppy, and maintaining such a codebase seemed very challenging to me even back then - precisely due to the significant lag behind standards adopted by the community. What really got to me was the use of Python2 in some parts of the project, despite its support having ended quite some time ago. It either shouldn't be there at all if it's not being used, or we should move away from using it.

@Stolb27
Copy link
Collaborator

Stolb27 commented Sep 4, 2025

This PR is, first and foremost, proof of the fundamental possibility of automatic code migration (99.9% of code changes took less than 1 minute).

So let's concentrate on this task to collect troubles after auto-migration.

but this utility applies only safe fixes to codebase, so I don't know, if this needed or not.

Is it really a required step to check migration possibility? Does it replace python2-specific code with more generic? But it increases patch footprint significantly.

For example, I've checked the 1st patch locally and got the next errors:

112.5 pgmodule.c:3641:13: error: storage class specified for parameter ‘pg__doc__’
112.5  3641 | static char pg__doc__[] = "Python interface to PostgreSQL DB";
112.5       |             ^~~~~~~~~
112.5 pgmodule.c:3641:1: error: parameter ‘pg__doc__’ is initialized
112.5  3641 | static char pg__doc__[] = "Python interface to PostgreSQL DB";
112.5       | ^~~~~~
112.5 pgmodule.c:3644:1: error: expected declaration specifiers before ‘DL_EXPORT’
112.5  3644 | DL_EXPORT(void)
112.5       | ^~~~~~~~~
112.5 pgmodule.c:138:1: error: old-style parameter declarations in prototyped function definition
112.5   138 | DL_EXPORT(void) init_pg(void);
112.5       | ^~~~~~~~~
112.5 pgmodule.c:3746: error: expected ‘{’ at end of input

It seems that old pygresql 4.0 is incompatible with python 3.

@SaelKimberly
Copy link
Author

I see. As in PyPi docs of pygresql:

The following Python versions are supported:

PyGreSQL 4.x and earlier: Python 2 only

PyGreSQL 5.x: Python 2 and Python 3

PyGreSQL 6.x and newer: Python 3 only

And also, in docs

The current version of PyGreSQL (*6.1.0*) has been tested with Python versions 3.7 to 3.13, and PostgreSQL versions 10 to 17.

Maybe, I don't know, version 4 of pygresql is a little bit outdated...

I think there are definitely no live packages for Python that require Python 2 to run, except for the really rare, battle-tested ones. PyGreSQL is certainly battle-tested, but the Python 2 requirement is also probably a security issue, and proof that the mammoths are alive.

Unless you're using Postgres 9 here, then I really agree it's may be necessary. In this case, of course, it is worth bypassing this package when migrating code.

@Stolb27
Copy link
Collaborator

Stolb27 commented Sep 23, 2025

Unless you're using Postgres 9 here, then I really agree it's may be necessary. In this case, of course, it is worth bypassing this package when migrating code.

Greengage main branch contains gpdb 6 code base. That is based on Postgres 9.4. So the first step that we should do before accepting this patch is updating pygresql version or replacing it with other lib (e.g. psycopg2 as was made in gpdb 7).

Currently, we work under test automation to migrate Greengage 7 development to this repo too. Greengage 7 was already migrated to python 3.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Oct 8, 2025

Quality Gate Failed Quality Gate failed

Failed conditions
226 Security Hotspots
4.2% Duplication on New Code (required ≤ 3%)
E Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

@KnightMurloc
Copy link
Collaborator

I am currently in the process of researching the possibility of updating the PyGreSQL library to version 5.2.5 and I have encountered the following incompatibilities:

  1. Support for new types - PyGreSQL can now convert timestamp types and arrays to representations in python instead of passing them as strings. Which broke a number of places where we relied on the string representation.
  2. Support for the 'with' syntax - PyGreSQL is now able to close a connection if it was opened via with. Which resulted in reading from the cursor after closing the connection in a number of places.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants