Skip to content

Conversation

@sfc-gh-yuwang
Copy link
Collaborator

@sfc-gh-yuwang sfc-gh-yuwang commented Oct 14, 2025

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-2124398

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

Comment on lines +43 to +46
def _partitions(self) -> List[InputPartition]:
if self._internal_partitions is None:
self._internal_partitions = self.reader(self.schema()).partitions()
return self._internal_partitions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method will crash with AttributeError if base class methods aren't overridden. The chain self.reader(self.schema()).partitions() will fail because:

  1. self.schema() returns None (base implementation has only pass)
  2. self.reader(None) returns None (base implementation has only pass)
  3. None.partitions() raises AttributeError: 'NoneType' object has no attribute 'partitions'

The base class methods should either raise NotImplementedError or the class should use ABC with @abstractmethod decorators:

from abc import ABC, abstractmethod

class DataSource(ABC):
    @abstractmethod
    def reader(self, schema: Union[StructType, str]) -> DataSourceReader:
        raise NotImplementedError()
Suggested change
def _partitions(self) -> List[InputPartition]:
if self._internal_partitions is None:
self._internal_partitions = self.reader(self.schema()).partitions()
return self._internal_partitions
def _partitions(self) -> List[InputPartition]:
if self._internal_partitions is None:
schema = self.schema()
if schema is None:
raise NotImplementedError("schema() method must be implemented by subclasses")
reader = self.reader(schema)
if reader is None:
raise NotImplementedError("reader() method must be implemented by subclasses")
self._internal_partitions = reader.partitions()
return self._internal_partitions

Spotted by Graphite Agent

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

This comment came from an experimental review—please leave feedback if it was helpful/unhelpful. Learn more about experimental comments here.

Comment on lines 754 to 759
for result in reader.read(partition):
if isinstance(result, list):
yield from result
else:
yield from list(reader.read(partition))
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There appears to be a logical issue in the result processing logic. When a result is not a list, the code calls reader.read(partition) a second time, which would re-execute the entire read operation and potentially return duplicate data. Additionally, the break statement after processing the first non-list result would prematurely exit the loop, potentially causing data loss.

Consider revising this section to handle non-list results directly:

for result in reader.read(partition):
    if isinstance(result, list):
        yield from result
    else:
        yield result  # Simply yield the individual result

This approach would process each result exactly once without re-executing the read operation.

Suggested change
for result in reader.read(partition):
if isinstance(result, list):
yield from result
else:
yield from list(reader.read(partition))
break
for result in reader.read(partition):
if isinstance(result, list):
yield from result
else:
yield result

Spotted by Graphite Agent

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants