Preparation for pipelines upgrade #22

@patkyn

Description

A recent upgrade of the pipelines code requires:
Spark 3.5
Hadoop 3.2
Java 17

This requires some preparation work to make pipelines work with a later version of EMR.

Several issues need to be addressed:

  1. Bootstrap fails when using an EMR version later than the production version (EMR 5.34). Related to "Launch ingest dag with different emr version failed" #20.

  2. After fixing the bootstrap failure, launching EMR 7.9 (the latest version) with the ManagedPolicy still fails with a permissions issue (may need to reconfirm whether this is still the case).

  3. However, when EMR 7.9 is launched with the service role used by ala-dev, the cluster starts successfully. The applications installed on EMR 7.9 run with Java 17, but the default Java version on the cluster is still Java 8, so the pipelines fail when called through the command-runner jar. EMR 7.9 does ship with a few different Java versions as alternatives, and the required one needs to be set explicitly in the first step once the cluster is launched. Setting it in the bootstrap fails.
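
A minimal sketch of how item 3 could look in practice, assuming the steps are submitted as boto3-style step definitions. The Java 17 path, the step name, and the helper names `build_set_java_step` and `prepend_java_step` are assumptions made for illustration and are not taken from the pipelines code. The path in particular should be checked on the cluster (for example with `alternatives --display java`).

```python
import shlex
from typing import Any

# ASSUMPTION: install location of Java 17 (Amazon Corretto) on EMR 7.x.
# Verify on the cluster before relying on it.
ASSUMED_JAVA17_BIN = "/usr/lib/jvm/java-17-amazon-corretto.x86_64/bin/java"


def build_set_java_step(java_bin: str = ASSUMED_JAVA17_BIN) -> dict[str, Any]:
    """Build a step definition that switches the default `java` alternative."""
    command = f"sudo alternatives --set java {shlex.quote(java_bin)}"
    return {
        "Name": "Set default Java to 17",  # hypothetical step name
        "ActionOnFailure": "CANCEL_AND_WAIT",  # do not run later steps on Java 8
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["bash", "-c", command],
        },
    }


def prepend_java_step(
    steps: list[dict[str, Any]], java_bin: str = ASSUMED_JAVA17_BIN
) -> list[dict[str, Any]]:
    """Return a new step list with the Java switch as the first step."""
    return [build_set_java_step(java_bin), *steps]
```

The resulting list could then be passed as `Steps` to boto3's `run_job_flow` or `add_job_flow_steps`, so that the pipelines steps only run after the default Java version has been switched.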
