Lzdownload service - download all packages of a given channel #9679
base: master
Conversation
lzreposync will be a spacewalk-repo-sync replacement written in Python. It uses a src layout and a pyproject.toml. The target Python version is 3.11; compatibility with older Python versions is explicitly not a goal.
Added the remote_path column that will hold the remote path/URL of a given package. This information will help locate the package later on in the remote repository and download it.
Added a boolean argument that controls whether header.hdr.fullFilelist() should be called. We added this argument so that the header.hdr.fullFilelist() call can be disabled only for the lzreposync service.
The inspect.getargspec() method is deprecated in Python 3. It can be replaced by inspect.getfullargspec().
import_signatures is a boolean argument that specifies whether the _import_signatures() method should be executed. We added this parameter to disable the _import_signatures() method for the lzreposync service.
Parsing the rpm primary.xml packages metadata file using the pulldom XML parser as a memory-efficient parsing library. Note that some attributes in the returned parsed object are faked and may be filled in elsewhere; this faking is done because those attributes are required by the importer service.
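For illustration, a minimal sketch of this kind of streaming parse with `xml.dom.pulldom` (element and attribute names follow the standard repodata schema; this is not the actual lzreposync code, and the returned dict layout is an assumption):

```python
import gzip
from xml.dom import pulldom


def parse_primary(primary_path):
    """Yield a small metadata dict per <package> element, one at a time."""
    with gzip.open(primary_path, "rb") as fileobj:
        events = pulldom.parse(fileobj)
        for event, node in events:
            if event == pulldom.START_ELEMENT and node.tagName == "package":
                events.expandNode(node)  # materialize only this <package> subtree
                name = node.getElementsByTagName("name")[0].firstChild.data
                checksum = node.getElementsByTagName("checksum")[0].firstChild.data
                location = node.getElementsByTagName("location")[0].getAttribute("href")
                yield {"name": name, "checksum": checksum, "remote_path": location}
```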
Parsing the rpm filelists.xml metadata file using the pulldom XML parser as a memory-efficient parsing library. The parser parses the given filelists.xml file (normally gzip-compressed) and caches the file list of each package in a separate file in the cache directory, using the package's hash as the filename, with no file extension.
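A hedged sketch of the caching idea: one cache file per package, named after the package's hash, with no file extension. The on-disk format (JSON) and the helper names are assumptions, not the actual implementation:

```python
import json
import os


def cache_filelist(cache_dir, pkg_hash, files):
    """Store the file list of one package under its hash, no file extension."""
    os.makedirs(cache_dir, exist_ok=True)
    with open(os.path.join(cache_dir, pkg_hash), "w", encoding="utf-8") as fh:
        json.dump(files, fh)


def load_filelist(cache_dir, pkg_hash):
    """Read back the cached file list of one package."""
    with open(os.path.join(cache_dir, pkg_hash), encoding="utf-8") as fh:
        return json.load(fh)
```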
Using both primary_parser and filelists_parser, return the full package metadata, package by package, using lazy parsing. Note that some attributes are faked because we cannot fetch them at this point, and they are required by the package importer later on. However, we can fake them more efficiently, using less memory.
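As an illustration of the lazy merge, assuming the primary parser is a generator of per-package dicts and the file lists are looked up from the cache by checksum (all names and placeholder attributes below are assumptions):

```python
def merge_metadata(primary_parser, lookup_filelist):
    """Yield the full metadata of one package at a time."""
    for pkg in primary_parser:
        # Attach the cached file list for this package, if any.
        pkg["files"] = lookup_filelist(pkg["checksum"]) or []
        # Attributes required later by the importer but unknown at this point
        # can be filled with cheap placeholder values instead of real data.
        pkg.setdefault("header_start", -1)
        pkg.setdefault("header_end", -1)
        yield pkg
```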
Parsed the update-info.xml file and imported the parsed patches/updates into the database. We used pretty much the same code as the old Reposync class.
Import the parsed RPM and Debian packages into the database in batches, and associate each package with the corresponding channel.
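A sketch of the batching approach, assuming a lazy stream of parsed packages and a hypothetical `import_batch()` helper that writes one batch and links it to the channel (the batch size is an arbitrary example):

```python
from itertools import islice


def import_in_batches(packages, import_batch, channel_label, batch_size=500):
    """Consume the lazy package stream and import it batch by batch."""
    iterator = iter(packages)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        # import_batch() is assumed to persist the packages and associate
        # them with the given channel.
        import_batch(batch, channel_label)
```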
Parsed the Debian Packages metadata file in a lazy way and yielded the metadata of each package separately.
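A minimal sketch of that kind of lazy parsing of a Debian Packages file: stanzas are separated by blank lines and each `Field: value` pair becomes a dict entry (continuation lines and compressed variants are ignored here for brevity):

```python
def parse_packages(packages_path):
    """Yield one metadata dict per package stanza."""
    stanza = {}
    with open(packages_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                # Blank line ends the current stanza.
                if stanza:
                    yield stanza
                    stanza = {}
            elif not line.startswith((" ", "\t")) and ":" in line:
                field, _, value = line.partition(":")
                stanza[field] = value.strip()
        if stanza:
            yield stanza
```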
Parsed the Debian Translation file that contains the full description of packages, grouped by Description-md5, and cached the parsed descriptions in a cache directory.
Using both packages_parser and translation_parser, return the full package metadata, package by package, using lazy parsing. Also set the Debian repository's information in a DebRepo class.
Given the channel label, fetch the important repository information from the database and store it in a temporary RepoDTO object.
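The shape of such a temporary object might look like the dataclass below; the exact fields of the real RepoDTO are assumptions here:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepoDTO:
    """Illustrative container for the repository data fetched per channel."""
    channel_label: str
    repo_label: str
    source_url: str
    repo_type: str                     # e.g. "yum" or "deb" (assumed values)
    channel_arch: Optional[str] = None
    signed: bool = False
```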
Added the necessary command-line arguments. Identify the target repositories, prepare the data structures, and execute the lazy synchronization of repositories/packages.
Added a new dependency, python-gnupg, used to verify repository signatures.
Ignored two linting complaints about raising exceptions, following the approach in the old reposync. We could enhance the code instead of doing this, though.
This commit completes almost all the logic and use cases of the new lazy reposync. **Note** that this commit will be restructured and possibly divided into smaller and more convenient commits. This commit is for review purposes.
Seemingly this error happened because we reached the maximum number of unclosed DB connections. We thought that this might be because the close() method in the Database class was not implemented and rhnSQL.closeDB() was not closing any connection. However, we are still hesitant about whether this is the root cause of the problem, because the old (current) reposync was using it without any error.
This is the latest and almost final version of the lzreposync service (GPG signature check not complete). It contains pretty much all the necessary tests, including the ones for the updates/patches import. Some of the remaining TODOs are either code enhancements or unclear concepts that will be discussed with the team. Of course, this commit will be split into smaller ones later after rebase.
Removed some TODOs, replaced some SQL queries with equivalent ones using JOIN...ON, and did some other minor cleanup.
Optimized some code by replacing classes and methods with free functions in some of the logic. Consolidated the Debian repo parsing.
Completed the GPG signature check for rpm repositories, mainly for the repomd.xml file. This is done by downloading the signature file from the remote rpm repository and performing a GPG verification of the repomd.xml file against its signature, using the GPG keys already added on the filesystem. So, if you have not already added the required GPG keyring on your system, you will not be able to verify the repo. You should ideally run this version directly on the uyuni-server, because the GPG keyring will probably be present there.
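A hedged sketch of such a detached-signature check using python-gnupg (the function, paths, and signature filename are assumptions; the GPG home directory is passed in, as the service configures it via SPACEWALK_GPG_HOMEDIR, mentioned later in this PR):

```python
import gnupg


def verify_repomd(repomd_path, signature_path, gnupg_home):
    """Verify repomd.xml against its detached signature with the local keyring."""
    gpg = gnupg.GPG(gnupghome=gnupg_home)
    with open(signature_path, "rb") as sig:
        # Detached signature: pass the signature stream and the data file path.
        verified = gpg.verify_file(sig, repomd_path)
    if not verified.valid:
        raise RuntimeError(f"repomd.xml signature check failed: {verified.status}")
    return verified.key_id
```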
makedirs() in uyuni.common.fileutils now accepts relative paths that consist of only a directory name or paths with trailing slashes.
Completed the GPG signature check for Debian repositories. If you have not already added the required GPG keyring on your system, you will not be able to verify the repo, and you will normally get a GeneralRepoException. You should ideally run this version directly on the uyuni-server, because the GPG keyring will probably be present there.
Mocked the SPACEWALK_GPG_HOMEDIR value to `~/.gnupg/`, which is the default directory for gpg, in order to execute the GPG tests outside the uyuni-server.
Made the lzreposync service continuously loop over the existing channels and synchronize the corresponding repositories. Added a status column to the rhnchannel table to indicate the sync status of a given channel. Also added some helper arguments to the service that allow us to perform test operations, like creating a test channel and associating repositories with it, etc.
Implemented a first, minimal, working version of the download service, using the download-all strategy, meaning that for a given channel, we download all the packages that are linked to that channel. The download directory is hard-coded, but it should be discussed further.
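A sketch of the download-all strategy under stated assumptions: the cached metadata yields the `remote_path` of every package linked to the channel, the repository base URL is known, and the target directory is passed in explicitly here rather than hard-coded:

```python
import os
import urllib.request


def download_all(packages, base_url, target_dir):
    """Download every package of a channel into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    for pkg in packages:
        # Build the full URL from the repo base and the stored remote path.
        url = f"{base_url.rstrip('/')}/{pkg['remote_path'].lstrip('/')}"
        destination = os.path.join(target_dir, os.path.basename(pkg["remote_path"]))
        urllib.request.urlretrieve(url, destination)
```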
Force-pushed from 72d910d to f504fca.
@waterflow80 Just letting you know that I'm not forgetting to review your PR... just overloaded these days
No problem.
This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 10 days.
What does this PR change?
The `lzdownload` service will be in charge of using the cached package metadata to download the actual binaries (or source RPMs). In this PR, we implemented a minimal version of the download-all strategy, downloading all the packages of a given channel using their cached metadata.
Usage
And the packages will be downloaded to the specified location in the filesystem:
Independence
We have separated the `lzdownload` from the `lzreposync` so that each service can run independently of the other. This will help with scaling and with the separation of tasks. We may consider putting the functions used by both services in a common location.
GUI diff
No difference.
Documentation
No documentation needed: only internal and user-invisible changes
DONE
Test coverage
ℹ️ If a major new functionality is added, it is strongly recommended that tests for the new functionality are added to the Cucumber test suite
No tests: Unit tests will be added on the fly.
DONE
Links
Issue(s): #
Port(s): # add downstream PR(s), if any
Changelogs
Make sure the changelogs entries you are adding are compliant with https://github.com/uyuni-project/uyuni/wiki/Contributing#changelogs and https://github.com/uyuni-project/uyuni/wiki/Contributing#uyuni-projectuyuni-repository
If you don't need a changelog check, please mark this checkbox:
If you uncheck the checkbox after the PR is created, you will need to re-run `changelog_test` (see below)
Re-run a test
If you need to re-run a test, please mark the related checkbox, it will be unchecked automatically once it has re-run:
Before you merge
Check How to branch and merge properly!