Skip to content

Conversation

dllliu
Copy link
Collaborator

@dllliu dllliu commented Jul 9, 2024

  • Package parser/serialization/metrics work into SerializedParserMetrics class
  • Pre processing/clustering for building point labels with DBScan
  • Parsers dynamically ran via dynamic module loading at runtime to obtain emitted tokens
  • Multiple abbreviation list support (tools to merge/sort abbreviations)

Checklist

  • Example notebook showcasing how to use
  • Unit tests for SerializedParserMetrics class
  • Detailed Documentation for relevant functions and SerializedParserMetrics class
  • Exception Handling for file/io errors, not enough points to cluster, invalid llm output
  • Parser and Serializer work has been combined

Notes

  • Schikit-Learn requires Python version of at least 3.9
  • Other added packages are: langchain, langchain-community, pyenchant, scikit-learn
  • Running all unit tests will take longer (SerializedParserMetrics class takes time to populate)

@dllliu dllliu requested a review from gtfierro July 9, 2024 20:56
@gtfierro
Copy link
Collaborator

Hey @dllliu and @TShapinsky -- does this incorporate changes from any outstanding PRs? I want to make sure my review sticks to @dllliu's code

@TShapinsky
Copy link
Member

Hey @dllliu and @TShapinsky -- does this incorporate changes from any outstanding PRs? I want to make sure my review sticks to @dllliu's code

There's a cherrypicked commit from the create parser UI branch which fixes the serialization. Besides that, no

Copy link
Collaborator

@gtfierro gtfierro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, @dllliu ! Let me know if you have any questions on my comments

from buildingmotif.label_parsing.tools import abbreviationsTool, codeLinter


class SerializedParserMetrics:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if you had something else in mind, but I'm wondering why this class is building the parser. I think this metrics class is better suited as an output of another method which builds the parsers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I am misinterpreting, but the generate_parsers_for_points() and generate_parsers_for_clusters() methods in __init__ are building the parsers, while the class just populates the relevant instance variables with the appropriate information.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but these are called from this class. I think the entrypoint should be something like a ParserBuilder class which calls those generation methods, and emits the Metrics class. We are not constructing the metrics. We are constructing parsers! The metrics are a side-effect of this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gtfierro The ParserBuilder class now emits a ParserMetrics class which takes care of serialization and gathering parsing related metrics. The ParserBuilder class still keeps track of the clustering and distance metrics, as that is directly related to parser generation. The notebook and tests have also been updated, let me know what you think.

pyproject.toml Outdated
Comment on lines 42 to 46
abbreviations = "^0.2.5"
langchain = "^0.2.3"
langchain-community = "^0.2.4"
pyenchant = "^3.2.2"
scikit-learn = "^1.5.0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we gate your dependencies behind a feature flag? Call it something like label_parsing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest commit also addresses this, please let me know if this is resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants