
[Failing Test]: YamlTransformE2ETest.test_csv_to_json is flaky #36689

@mohamedawnallah

Description


Where did this flake appear?

It appeared in the following CI workflow run as part of PR #36684:
https://github.com/apache/beam/actions/runs/18947074452/job/54101320609?pr=36684#step:8:10096

Are there any failure logs for debugging?

____________________ YamlTransformE2ETest.test_csv_to_json _____________________
[gw5] linux -- Python 3.10.17 /runner/_work/beam/beam/sdks/python/test-suites/tox/py310/build/srcs/sdks/python/target/.tox-py310-cloud/py310-cloud/bin/python

self = <apache_beam.yaml.yaml_transform_test.YamlTransformE2ETest testMethod=test_csv_to_json>

    def test_csv_to_json(self):
      try:
        import pandas as pd
      except ImportError:
        raise unittest.SkipTest('Pandas not available.')
    
      with tempfile.TemporaryDirectory() as tmpdir:
        data = pd.DataFrame([
            {
                'label': '11a', 'rank': 0
            },
            {
                'label': '37a', 'rank': 1
            },
            {
                'label': '389a', 'rank': 2
            },
        ])
        input = os.path.join(tmpdir, 'input.csv')
        output = os.path.join(tmpdir, 'output.json')
        data.to_csv(input, index=False)
    
        with beam.Pipeline() as p:
          result = p | YamlTransform(
              '''
              type: chain
              transforms:
                - type: ReadFromCsv
                  config:
                      path: %s
                - type: WriteToJson
                  config:
                      path: %s
                  num_shards: 1
              ''' % (repr(input), repr(output)))
    
        output_shard = list(glob.glob(output + "*"))[0]
        result = pd.read_json(
            output_shard, orient='records',
            lines=True).sort_values('rank').reindex()
>       pd.testing.assert_frame_equal(data, result)
E       AssertionError: DataFrame are different
E       
E       DataFrame shape mismatch
E       [left]:  (3, 2)
E       [right]: (2, 2)

apache_beam/yaml/yaml_transform_test.py:260: AssertionError
------------------------------ Captured log call -------------------
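The shape mismatch ([left] has 3 rows, [right] has 2) means the file returned by glob.glob(output + "*")[0] held only two of the three rows. One possible explanation, which the log does not confirm, is that the write produced more than one output shard, so reading only the first match drops rows. Below is a minimal sketch of a check that reads every matching shard and sorts by rank before comparing. It uses only the standard library. The helper read_all_shards is hypothetical and not part of the Beam test, and the shard suffix used in the demo (Beam's default -SSSSS-of-NNNNN pattern) is an assumption.

```python
import glob
import json
import os
import tempfile


def read_all_shards(output_prefix):
    """Hypothetical helper: read every JSON-lines file matching output_prefix + '*'.

    The original test reads only glob(...)[0]; this collects rows from all matches.
    """
    records = []
    # sorted() makes the file order deterministic; row order is normalized below.
    for shard in sorted(glob.glob(output_prefix + "*")):
        with open(shard) as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines, e.g. a trailing newline
                    records.append(json.loads(line))
    # Sort by 'rank' so the result does not depend on shard or write order.
    return sorted(records, key=lambda r: r["rank"])


if __name__ == "__main__":
    expected = [
        {"label": "11a", "rank": 0},
        {"label": "37a", "rank": 1},
        {"label": "389a", "rank": 2},
    ]
    with tempfile.TemporaryDirectory() as tmpdir:
        output = os.path.join(tmpdir, "output.json")
        # Simulate the rows being split across two shards (assumed naming).
        with open(output + "-00000-of-00002", "w") as f:
            f.write(json.dumps(expected[2]) + "\n" + json.dumps(expected[0]) + "\n")
        with open(output + "-00001-of-00002", "w") as f:
            f.write(json.dumps(expected[1]) + "\n")
        # Reading only the first shard would yield 2 rows; reading all yields 3.
        assert read_all_shards(output) == expected
        print("recovered", len(expected), "rows")
```

Separately, .reindex() called with no arguments in the test leaves the index unchanged after sort_values('rank'). Because assert_frame_equal also compares indexes, reset_index(drop=True) is likely what is wanted there if rows can arrive out of order.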

Issue Failure

Failure: Test is flaky

Issue Priority

Priority: 1 (unhealthy code / failing or flaky postcommit so we cannot be sure the product is healthy)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner

Metadata


Assignees

No one assigned

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
