Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding methods for sequential processing and filtering of class reflections #1475

Open
wants to merge 10 commits into
base: 6.52.x
Choose a base branch
from

Conversation

shcherbanich
Copy link

I made a small optimization of the FileIteratorSourceLocator so that all files are not loaded into the array before filtering, but only after it, and also added new functionality for sequential work with files and the ability to filter them.

Motivation

When working with really big projects, performance issues often arise. They can be solved with some techniques, such as sequential file processing. This was my main motivation for this work.

What has been done?

  1. New methods for sequentially obtaining *Reflection have been added to SourceLocators;
  2. A wrapper method has been added to DefaultReflector for sequentially obtaining reflect classes;
  3. A filtering system has been implemented for obtained *Reflection. In most cases, filters are triggered before all complex operations with AST, so filtering in this way is a much more productive method than obtaining individual classes using reflectClass;
  4. The method for processing files in FileIteratorSourceLocator has been changed. Now they are processed sequentially, and not immediately loaded into an array. This reduces resource consumption and speeds up the work of this locator.

Copy link
Member

@Ocramius Ocramius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patch has very good quality, and is worth looking at, but I think you spiked an approach and pushed it too hard here, leading to a bigger than necessary change upfront.

Before deciding on which approach to pick, we should attempt a second patch that:

  • isolates filtering into a single, dedicated source locator implementation
  • makes the default implementation of iterable<ReflectionClass> a Generator (happy to break BC there, and release a new major version!)

Comment on lines +65 to +75
public function iterateClasses(?SourceFilter $filter = null): Generator
{
/** @var Generator<ReflectionClass> $allClasses */
$allClasses = $this->sourceLocator->iterateIdentifiersByType(
$this,
new IdentifierType(IdentifierType::IDENTIFIER_CLASS),
$filter,
);

yield from $allClasses;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think SourceFilter should be given at call-time: it should be an implementation detail of the source locator only, IMO.

/**
* Iterate classes available in the scope specified by the SourceLocator.
*
* @return Generator<ReflectionClass>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I'd say that we can change the default reflectAllClasses() API to return an iterable<ReflectionClass>, rather than adding new API here.

Yes, it's a breaking change, but it's a good one to have

public function iterateIdentifiersByType(
Reflector $reflector,
IdentifierType $identifierType,
?SourceFilter $sourceFilter,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioned elsewhere: this is IMO something that shouldn't be given at call-time, but should be instead intrinsic of the way you are looking at a codebase (effectively, a FilteredSourceLocator)

Comment on lines +7 to +12
interface SourceFilter
{
public function getKey(): string;

public function isAllowed(string $source, ?string $name, ?string $filename = null): bool;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this API is too specific to your use-case, to be considered as part of the library itself.

Also, following general FP patterns, and having BC boundaries in mind, I'd rather say that isAllowed() should be relying on a userland callback, and since each locator operates slightly differently, it's hard to design a generic Context object to be used for filtering decisions

Comment on lines +27 to +30
new AggregateFilter(
new FileSizeFilter(10000),
new SourceContainsFilter(['class ReflectionMethod', 'class ReflectionClass'])
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I see, you attempted to design this code so that you could push a runtime filter down the stack, into the lookup operation, for example, to select all classes in a specific namespace.

I understand the rationale, but am a bit conflicted on the design, and the attempt to create a structured filter here, which is extremely risky due to the internals of source locators being squishy, whilst this interface being public.

I don't have a clear design/suggestion for you right now, but I think a first attempt should be done at creating a single FilteringSourceLocator that can wrap a $next one and apply filtering at that level, operating on a LocatedSource|null: that would make the use-case more specific, and would isolate the filtering change onto a single implementation.

Comment on lines +47 to 71
$content = file_get_contents($this->fileName);
if (! $content) {
return null;
}

$name = $identifier->getName();

if (
$name !== Identifier::WILDCARD &&
! str_starts_with($name, ReflectionClass::ANONYMOUS_CLASS_NAME_PREFIX)
) {

$shortName = array_reverse(explode('\\', $identifier->getName()))[0];

if (! str_contains($content, $shortName)) {
return null;
}
}

return new LocatedSource(
file_get_contents($this->fileName),
$identifier->getName(),
$content,
$name,
$this->fileName,
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably attempt filtering in an alternative implementation of SingleFileSourceLocator, then look for who is instantiating SingleFileSourceLocator, and see if we can somehow inject the constructor for it there 🤔

It's just an idea, but it would allow for filtering and transforming sources before skipping or passing them to parsing/reflection

@Ocramius
Copy link
Member

Ocramius commented Jan 5, 2025

Related: #614

@shcherbanich
Copy link
Author

shcherbanich commented Jan 5, 2025

@Ocramius , thank you for the review!

I agree that the changes turned out to be too big. In this case, I suggest splitting this work into 3 separate parts with 3 separate PRs:

  1. Changing the default behavior of reflectAllClasses / reflectAllFunctions / reflectAllConstants in DefaultReflector and using the generator as the default solution in source locators;

  2. Implementing filtering using a new source locator (I'll think about how to do it better);

  3. Optimizing SingleFileSourceLocator (I'll try to remake it in the form you noted in the comments).

I'll start doing this in the specified order if you don't mind :)

UPD: first part completed here: #1476

@staabm
Copy link
Contributor

staabm commented Jan 6, 2025

When working with really big projects, performance issues often arise. They can be solved with some techniques, such as sequential file processing. This was my main motivation for this work.

would be great to have some kind of benchmark to compare the performance before/after this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants