Skip to content

Replaced the code derived from dwml with an alternative impl #1261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

t3tra-dev
Copy link
Contributor

Description

This pull request removes the dependency on the dwml library by re-implementing the functionality to convert mathematical formulas (OMML) in DOCX files to LaTeX format. This change resolves license compatibility issues and aims to provide a cleaner, more maintainable implementation.

Testing

  • All existing tests pass without modification
  • Verified no functionality changes

Implementation Details

  • removed the dwml library's associated code
  • re-implemented the OMML to LaTeX conversion logic with an object-oriented approach
  • introduced new modules for handling the conversion:
    • latex_symbols.py: manages mappings for symbols, functions, and LaTeX commands
    • omml_elements.py: defines classes representing OMML elements
    • omml_parser.py: parses OMML XML into a tree of omml_elements.py instances
    • omml_to_latex.py: orchestrates the conversion process
  • updated pre_process.py to use the new conversion modules
  • deleted ThirdPartyNotices.md, because the licensing issue has been resolved

Migration Notes

This is a non-breaking change as all public APIs remain unchanged.

@afourney
Copy link
Member

Thanks for the work.

Can you describe the perceived license compatibility issue? My understanding is that Apache 2 and MIT are compatible, provided we include the requisite information and attributions in the ThirdPartyNotices file (which we are happy to do).

@t3tra-dev
Copy link
Contributor Author

As you mentioned, the Apache License 2.0 and MIT licenses are fundamentally compatible and can coexist by including appropriate attribution and license information in ThirdPartyNotices.

Rather than a strict license compatibility issue, this change was made for the following reasons:

  • Reduced dependencies: Reducing dependencies on external libraries will reduce future maintenance burdens and security risks.
  • Improved code maintainability: The re-implementation allowed us to adopt a more explicit object-oriented design, improving code readability and maintainability.
  • Improved customizability: The self-implementation makes it easier to extend and optimize functionality specific to our use cases.

Thus, this change was not only a solution to a licensing issue, but also an improvement in the quality of the codebase. Of course, we had the option of continuing to use the dwml library, including attribution, but we decided that from a long-term perspective, self-implementation was preferable.

I'm not good at English so I used a translator. I apologize if I did not convey the detailed nuances...

@afourney
Copy link
Member

I see. Ok this makes sense to me.

One reason I haven't published a new release in a while is that I was nervous about this extra document processing on docx. It all seems to work fine, but rewriting the zip contents of docs, in-flight, felt like it warranted extra caution. I'm satisfied now that the general approach and current code is stable, and will release it to the main channel. Once we've gained some broader experience with this, I will be more comfortable with this PR as well. I agree that the benefits you are proposing are worthwhile.

@t3tra-dev
Copy link
Contributor Author

Thank you for your reply.

I think your opinion is certainly reasonable! I agree with your decision.
You are welcome to hold off on this PR until the time comes when you need it:)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants