-
Notifications
You must be signed in to change notification settings - Fork 620
feat: add analyzer plugin system #1825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add core plugin architecture for sample analyzers: - SampleAnalyzer abstract base class - AnalyzerRegistry for plugin management - Simple unit tests following codebase patterns This provides the foundation for implementing specific analyzers in future PRs.
class AnalyzerRegistry: | ||
"""Registry for sample analyzer plugins.""" | ||
|
||
_analyzers: dict[str, type[SampleAnalyzer]] = {} | ||
|
||
@classmethod | ||
def register(cls, analyzer_id: str, analyzer_class: type[SampleAnalyzer]) -> None: | ||
"""Register a sample analyzer class. | ||
|
||
Args: | ||
analyzer_id: Unique identifier for the analyzer | ||
analyzer_class: The sample analyzer class to register | ||
""" | ||
cls._analyzers[analyzer_id] = analyzer_class | ||
|
||
@classmethod | ||
def get_analyzer(cls, analyzer_id: str) -> Union[type[SampleAnalyzer], None]: | ||
"""Get a sample analyzer class by ID. | ||
|
||
Args: | ||
analyzer_id: The analyzer ID to look up | ||
|
||
Returns: | ||
The sample analyzer class or None if not found | ||
""" | ||
return cls._analyzers.get(analyzer_id) | ||
|
||
@classmethod | ||
def list_analyzers(cls) -> list[str]: | ||
"""List all registered sample analyzer IDs. | ||
|
||
Returns: | ||
List of registered sample analyzer IDs | ||
""" | ||
return list(cls._analyzers.keys()) | ||
|
||
@classmethod | ||
def create_analyzer( | ||
cls, analyzer_id: str, config: dict[str, Any] | ||
) -> SampleAnalyzer: | ||
"""Create a sample analyzer instance. | ||
|
||
Args: | ||
analyzer_id: The analyzer ID to create | ||
config: Configuration for the analyzer | ||
|
||
Returns: | ||
An instance of the sample analyzer | ||
|
||
Raises: | ||
ValueError: If the analyzer ID is not registered | ||
""" | ||
analyzer_class = cls.get_analyzer(analyzer_id) | ||
if analyzer_class is None: | ||
raise ValueError(f"Unknown analyzer ID: {analyzer_id}") | ||
return analyzer_class(config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason this doesn't live in our registry.py?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AnalyzerRegistry is only used for managing analyzers and keeps that logic modular and separate from main dataset/model registry. If we prefer centralization, I can move it to registry.py
def __init__(self, config: dict[str, Any]): | ||
"""Initialize the sample analyzer with configuration. | ||
|
||
Args: | ||
config: Configuration dictionary for the analyzer | ||
""" | ||
self.config = config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the base class need to enforce using a config in the initializer? I don't see any methods that use this value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is a fair point. I can't think of a reason now for the base class to need the config. Removed that
def test_analyzer_registry_basic(): | ||
"""Test basic registry functionality.""" | ||
# Clear registry | ||
AnalyzerRegistry._analyzers.clear() | ||
|
||
# Register analyzer | ||
AnalyzerRegistry.register("simple", SimpleAnalyzer) | ||
assert "simple" in AnalyzerRegistry._analyzers | ||
|
||
# Get analyzer | ||
analyzer_class = AnalyzerRegistry.get_analyzer("simple") | ||
assert analyzer_class == SimpleAnalyzer | ||
|
||
# Create instance | ||
analyzer = AnalyzerRegistry.create_analyzer("simple", {"test": "config"}) | ||
assert isinstance(analyzer, SimpleAnalyzer) | ||
assert analyzer.config == {"test": "config"} | ||
|
||
# Test analysis | ||
result = analyzer.analyze_message("hello", {"role": "user"}) | ||
assert result == {"length": 5} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Each of these should be unique tests.
It's much more informative to see that test_analyzer_registry_get_analyzer
failed than test_analyzer_registry_basic
failed. The first tells you immediately what happened, whereas the second requires you to parse the test to understand what's broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, split them into separate tests
analyzer_id: Unique identifier for the analyzer | ||
analyzer_class: The sample analyzer class to register | ||
""" | ||
cls._analyzers[analyzer_id] = analyzer_class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add validation that we don't add the same id twice/overwrite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, added the check in the register method and added a test for it
def test_analyzer_registry_basic(): | ||
"""Test basic registry functionality.""" | ||
# Clear registry | ||
AnalyzerRegistry._analyzers.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably make this something that runs before every test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great point. added a clear_registry fixture that runs automatically before each test
Description
This is the third PR for for Analyze tool. The complete changes can be found in the full feature branch: main...ryan-arman-anlyze_v0
This PR introduces the core plugin architecture for sample analyzers in Oumi:
Related issues
Towards OPE-1407
Before submitting
Reviewers
At least one review from a member of
oumi-ai/oumi-staff
is required.