Misc fixes - building in plan, PyYAML, datastream XML parsing #260

Merged: 4 commits from misc_fixes_2024_08 into main, Sep 3, 2024

Conversation

@comps (Contributor) commented Aug 30, 2024

(These were originally part of a new audit-sample-rules test, #259, but I decided to split them off for easier discussion tracking.)

Going by commits, from oldest:

add util/backup.py for backup+restore of filesystem paths

This was for an earlier version of the test, which actually remediated the audit rules into /etc/audit. I first had a VM-using test, but realized we need to run it multi-arch, so I rewrote it without VMs, hence the backup/restore.

The current version of the test doesn't use this, but I think it would still be good to have it, rather than throw the code away. The early versions of Contest also had it: https://github.com/RHSecurityCompliance/contest/blob/bb2778816c463f9148395c8e566c7724ce60b28e/lib/backup.sh 😄
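
For context, the idea is roughly the following. This is a minimal sketch rather than the actual util/backup.py: only the directory case is shown, and the function names and backup suffix are illustrative.

import shutil
from pathlib import Path

def backup(path, suffix='.contest-backup'):
    """Move a directory aside and continue working on a copy of it.

    Renaming (rather than copying) the original preserves its inode
    numbers, so a later restore() puts back the exact same files.
    """
    path = Path(path)
    path_backup = path.with_name(path.name + suffix)
    # store the original files in the backup + copy them for our use
    path.rename(path_backup)
    shutil.copytree(path_backup, path, symlinks=True)
    return path_backup

def restore(path, path_backup):
    """Throw away the working copy and rename the original back into place."""
    path = Path(path)
    shutil.rmtree(path)
    Path(path_backup).rename(path)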

start using PyYAML instead of hacking YAML syntax

From the commit message body:

    This was likely a RHEL-7 limitation, plus we didn't want to add
    unnecessary dependency on non-python-stdlib modules (like 'requests'
    or 'PyYAML').

    However we keep running into the need to parse YAML (ie. playbooks
    or TMT metadata), and python3*-pyyaml seems to be widely available
    in RHEL (even ie. python39-pyyaml on RHEL-8), unlike 'rpm' or 'dnf',
    so we might as well use it.

    The '/static-checks/ansible/allowed-modules' test already uses it
    without adding a RPM dependency, which hints at how widely available
    PyYAML is. Also, TMT probably depends on it too.

I specifically needed it in the audit test to parse Ansible remediations and get the .rules file path and contents without having to sed/grep bash code.
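
As an illustration of the kind of parsing this enables (a minimal sketch, not the actual test code; the task shape and key names below are assumptions about how an Ansible remediation might write the .rules file):

import yaml

def audit_rules_from_playbook(playbook_path):
    """Yield (dest_path, content) for playbook tasks that write .rules files."""
    with open(playbook_path) as f:
        plays = yaml.safe_load(f)
    for play in plays:
        for task in play.get('tasks', []):
            # hypothetical task shape: a copy-style module with 'dest' and 'content'
            copy = task.get('copy') or task.get('ansible.builtin.copy')
            if not copy:
                continue
            dest = copy.get('dest', '')
            if dest.endswith('.rules'):
                yield dest, copy.get('content', '')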

build CONTEST_CONTENT in a TMT plan prepare step

Per the commit:

    This is because building takes a long time (10+ minutes) on some
    slow systems and that easily exceeds a 5/10 minute 'duration'
    of a small test that never anticipated the extra delay.

    Moving it to a plan moves it outside the 'duration' count.

    Also, it's probably a better central place given that all tests
    always share the built content so the ./build_product options
    have to be considered globally.

    Finally, this is not a great fix given that we are still left
    with the 'build' argument to get_content(),

        def get_content(build=True):
            ...

    and its SRPM-based building code, which still runs within test
    'duration', however there is no easy solution for that (other than
    getting rid of SRPM support entirely), so this commit at least
    tries to bump things in a right direction.
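
For orientation, the resulting split could look roughly like this. It is a sketch only: the real get_content() differs, and treating CONTEST_CONTENT as an environment variable set by the plan prepare step is an assumption made here for illustration.

import os
from pathlib import Path

def _build_content():
    # placeholder for the slow in-test build path (e.g. SRPM-based),
    # which still runs within the test 'duration'
    raise NotImplementedError('in-test building is not shown in this sketch')

def get_content(build=True):
    """Return a path to built content, reusing a plan-level build if present."""
    prebuilt = os.environ.get('CONTEST_CONTENT')  # assumed to be set by the plan
    if prebuilt and Path(prebuilt).exists():
        return Path(prebuilt)
    if not build:
        raise RuntimeError('no pre-built content available and building is disabled')
    return _build_content()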

split off xml-parsing code from class Datastream

I originally wanted to redesign the whole XML parsing in class Datastream to have a more modular, "what you want" style of data retrieval:

class Datastream(enum.Flag):
    """
    Represent an OpenSCAP datastream XML definition and enable extracting
    selected values from it.

    This is done by the class definition having enum.Flag style variables,
    which get overwritten by a class instance to either None (if the given
    value was not chosen to be extracted) or the contents of the extracted
    data.

    Ie.
        ds = Datastream()
        ds.parse_xml('file.xml', Datastream.profiles | Datastream.rules)

        profiles = ds.profiles          # is a dict() of profile metadata
        rules = ds.rules                # is a dict() of rule metadata
        remediations = ds.remediations  # is None (was not requested)

    This is to preserve reasonable speed and low memory overhead for use
    cases that don't need all the details.

    Data structure example:

        ds.path = Path(file_the_datastream_was_parsed_from)

        ds.profiles = {
          'ospp': namespace(
            .title = 'Some text',
            .rules = set('audit_delete_success', 'service_firewalld_enabled', ...),
            .values = set(('var_rekey_limit_size', '1G'), ...),
          ),
          ... (all profiles in the datastream) ...
        }

        ds.rules = {
          'configure_crypto_policy': namespace(
            .fixes = FixType.bash | FixType.ansible,
          ),
          'account_password_selinux_faillock_dir': namespace(
            .fixes = FixType.bash,
          ),
          ... (all rules in the datastream) ...
        }

        ds.remediations = {
          'configure_crypto_policy': {
            FixType.bash: '... bash code ...',
            FixType.ansible: '... ansible playbook ...',
          },
          'account_password_selinux_faillock_dir': {
            FixType.bash: '... bash code ...',
          },
          ... (all rules with at least one type of fix available) ...
        }

        (FixType is Datastream.FixType)
    """

However, mid-way through implementing it, I realized that when parsing out remediation code, we probably don't want to extract all the tens of MBs of it, just a few selected rules.

So I went with what's in the commit: splitting off the batched XML parser and letting the audit test parse the DS XML on its own, extracting remediation code only for the rules in the policy_rules Group.
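
The approach amounts to this kind of selective, streaming extraction (a sketch, not the code from the commit; the XCCDF 1.2 namespace and element names are assumptions based on standard datastreams, and real rule ids are the long xccdf_org.* form):

import xml.etree.ElementTree as ET

XCCDF = '{http://checklists.nist.gov/xccdf/1.2}'

def remediations_for(ds_xml, wanted_rules):
    """Stream-parse a datastream, returning {rule_id: {fix_system: code}}
    only for rules listed in 'wanted_rules'."""
    found = {}
    for _event, elem in ET.iterparse(ds_xml):
        # iterparse() yields elements on their end tag, so a Rule element
        # already has its <fix> children attached when it shows up here
        if elem.tag != f'{XCCDF}Rule':
            continue
        if elem.get('id', '') in wanted_rules:
            found[elem.get('id')] = {
                fix.get('system'): (fix.text or '')
                for fix in elem.findall(f'{XCCDF}fix')
            }
        # drop the parsed subtree to keep memory low while streaming
        elem.clear()
    return found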

I'm not against revisiting the class Datastream code in the future, but (despite the big +++ and ---) nothing has really changed inside, aside from one minor bug fix (elem.get to elements[-1].get).

# store the original files in the backup + copy them for our use
# - this is to preserve original inode numbers after restore
path.rename(path_backup)
shutil.copytree(path_backup, path, symlinks=True, ignore_dangling_symlinks=True)

Contributor

ignore_dangling_symlinks=True seems useful only when symlinks=False.
At least my understanding is that when symlinks=True and the file a symlink points to doesn't exist, copytree still copies the symlink (and doesn't care that the target is missing).

Contributor Author

Alright, I read the manual as implying it would "trigger an error on a broken symlink", but testing it in practice shows that symlinks=True does copy a broken symlink even without ignore_dangling_symlinks, so I'll remove it.
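
A quick way to check that behavior (illustrative only, not part of the PR):

import os, shutil, tempfile

src = tempfile.mkdtemp()
os.symlink('/nonexistent/target', os.path.join(src, 'dangling'))
dst = src + '.copy'

# symlinks=True copies the link itself and never follows it,
# so a dangling symlink is copied without raising an error
shutil.copytree(src, dst, symlinks=True)
print(os.path.islink(os.path.join(dst, 'dangling')))  # True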

@mildas merged commit 0e4c1c3 into main on Sep 3, 2024 (3 checks passed).
@mildas deleted the misc_fixes_2024_08 branch on September 3, 2024.