Skip to content

Remove global initializers #476

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

GuillaumeGomez
Copy link

@GuillaumeGomez GuillaumeGomez commented Jan 21, 2020

Fixes #475.

Since it changes the API, it'll require to change the version from "1.*" to "2.0".

With this, no more global variables with mutable accesses (unless I missed one?). The remaining ones don't seem to have mutable access so I think it's fine.

The first commit is very big, the others are much smaller.

I also took the liberty to add a "libpostal_get_version" function. It might come in handy at some point.

If you have any question, don't hesitate to ask!

@GuillaumeGomez
Copy link
Author

cc @AeroXuk

@GuillaumeGomez GuillaumeGomez force-pushed the remove-global-initializers branch from 76d2172 to 2a6fcb6 Compare January 21, 2020 16:46
@GuillaumeGomez
Copy link
Author

Weird CI errors like:

gcc-4.8: error: unrecognized command line option ‘-Wdate-time’

Is it expected?

Copy link
Contributor

@AeroXuk AeroXuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try to find time to go through the whole change at the weekend.
For now, the change looks positive to me.
I have a suggestion about just having the instance parameter passed into the functions rather than two parameters added to some methods.

Comment on lines +13 to +15
libpostal_t *instance = libpostal_setup();
language_classifier_t *classifier = libpostal_setup_language_classifier();
if (instance == NULL || classifier == NULL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest for language_classifier_t *classifier to be a property of libpostal_t.

Suggested change
libpostal_t *instance = libpostal_setup();
language_classifier_t *classifier = libpostal_setup_language_classifier();
if (instance == NULL || classifier == NULL) {
libpostal_t *instance = libpostal_setup();
if (instance == NULL || !libpostal_setup_language_classifier(instance)) {

I think having a single libpostal struct/object to pass into the functions would be more convenient to the user.
Checking whether the classifier or parser have been initialised can then be a check for a null pointer.
libpostal_teardown can then also check and free the classifier or parser if they haven't been freed.

Next two suggestions are to illustrate this way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good point. language_classifier_t functions appear to only perform non-mutable operations so storing this object into libpostal_t is definitely worth it. Updating the code then!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, that doesn't sound that good: in the current configuration, you can setup multiple classifiers and also, it "forces" user to call the method in order to get the "language_classifier_t" instance.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To extend a bit my last comment: we could maybe go the other way around: wrapping a libpostal_t instance into language_classifier_t, but then, to be safe, we'd need to add back some globals or shared variables to be sure that libpostal_t isn't freed before language_classifier_t.

The API looks a bit less nice, I definitely agree with you on this point, but I don't see any good enough solution to make this work without big downsides.

Copy link
Contributor

@AeroXuk AeroXuk Jan 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users can still setup multiple classifiers if they have separate instances of libpostal.
In regards to forcing users to call the setup, the library already requires users to call extra setup functions for the larger modules.

I would try to have only a single extra instance parameter as people upgrading from previous versions of libpostal to an updated one will want the minimum amount of code refactoring.
Being that the project has the aim of being easy to use (shipping with pre-trained model), we should aim to have a minimal impact on people already implementing the library.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fair point. Well, I warned about it. :) I'll make the change then but I really don't think this is a good idea.

Also, the bindings have less than 50 functions to bind so it's pretty simple to update. ;)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just can't keep stop thinking about it: the API isn't great. For example: libpostal_place_languages is taking two char**. Which doesn't make sense since the first one is suppose to "match" the second, why not an array of element { label: char*, value: char* }?

So anyway, the changes remain small, even if we keep the language_classifier_t type independant. So why not going to the bottom of this change and prevent all possible data races?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just can't keep stop thinking about it: the API isn't great. For example: libpostal_place_languages is taking two char**. Which doesn't make sense since the first one is suppose to "match" the second, why not an array of element { label: char*, value: char* }?

I think a lot of the previous API decisions have been based around the language bindings. I'm going to assume for some language bindings it's easier to have separate labels and values arrays then convert them into a language native datatype.

I'll try to go through the rest of this change request tomorrow and give any feedback on other areas. I think this having an instanced API is a great step forward. And will help the project towards being thread-safe.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we can go back to this discussion once you're done. But I'd really prefer to keep it as is for the previously mentioned reasons.

@@ -108,7 +110,7 @@ int main(int argc, char **argv) {
char **values = cstring_array_to_strings(values_array);

size_t num_near_dupe_hashes = 0;
char **near_dupe_hashes = libpostal_near_dupe_hashes_languages(num_components, labels, values, options, num_languages, languages, &num_near_dupe_hashes);
char **near_dupe_hashes = libpostal_near_dupe_hashes_languages(classifier, instance, num_components, labels, values, options, num_languages, languages, &num_near_dupe_hashes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
char **near_dupe_hashes = libpostal_near_dupe_hashes_languages(classifier, instance, num_components, labels, values, options, num_languages, languages, &num_near_dupe_hashes);
char **near_dupe_hashes = libpostal_near_dupe_hashes_languages(instance, num_components, labels, values, options, num_languages, languages, &num_near_dupe_hashes);

libpostal_teardown();
libpostal_teardown_language_classifier();
libpostal_teardown(&instance);
libpostal_teardown_language_classifier(&classifier);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
libpostal_teardown_language_classifier(&classifier);
libpostal_teardown_language_classifier(instance);
libpostal_teardown(&instance);

@AeroXuk
Copy link
Contributor

AeroXuk commented Jan 24, 2020

Weird CI errors like:

gcc-4.8: error: unrecognized command line option ‘-Wdate-time’

Is it expected?

There are generally quite a few warnings from the CI systems.
The failure on travis is due to something else not this change.

@GuillaumeGomez
Copy link
Author

I'll check more in depth what's going on and what did I broke then.

@GuillaumeGomez
Copy link
Author

Any news?

@AeroXuk
Copy link
Contributor

AeroXuk commented Feb 20, 2020

Any news?

Apologies for the delay in being able to finish reviewing this, I've compiled and tested this PR. I don't receive any errors during compilation (the warnings are normal when compiling libpostal).
Testing all the sample applications I don't get any errors either.

I've also gone through all the changes in the diff, and everything I've looked at looks fine and valid.

I've run a static analysis tool against the code and the only remaining shared variables I can find all appear to be constants either in libpostal directly or it's dependencies.

My only sticking point is the use of the library having to pass multiple pointers into the functions now. I think the exported functions should take and load the default values. I think running multiple or different classifiers (etc) is an edge case for most users. Having functions to load & unload custom classifiers (etc) are worthwhile for people needing that functionality. And they should be able to load in one, run the functions, unload it, then load in the next one etc.

@GuillaumeGomez
Copy link
Author

Most people will use the bindings and not the C functions directly. For them, it won't change much things but it'll remove all possible data races for sure. So I'd prefer to stand on my position: it's a bad for a good.

@GuillaumeGomez
Copy link
Author

Any news?

1 similar comment
@GuillaumeGomez
Copy link
Author

Any news?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make libpostal_setup_* functions return an object instead of global allocation?
2 participants