-
Notifications
You must be signed in to change notification settings - Fork 441
Remove global initializers #476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Remove global initializers #476
Conversation
cc @AeroXuk |
76d2172
to
2a6fcb6
Compare
Weird CI errors like:
Is it expected? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will try to find time to go through the whole change at the weekend.
For now, the change looks positive to me.
I have a suggestion about just having the instance parameter passed into the functions rather than two parameters added to some methods.
libpostal_t *instance = libpostal_setup(); | ||
language_classifier_t *classifier = libpostal_setup_language_classifier(); | ||
if (instance == NULL || classifier == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest for language_classifier_t *classifier
to be a property of libpostal_t
.
libpostal_t *instance = libpostal_setup(); | |
language_classifier_t *classifier = libpostal_setup_language_classifier(); | |
if (instance == NULL || classifier == NULL) { | |
libpostal_t *instance = libpostal_setup(); | |
if (instance == NULL || !libpostal_setup_language_classifier(instance)) { |
I think having a single libpostal struct/object to pass into the functions would be more convenient to the user.
Checking whether the classifier or parser have been initialised can then be a check for a null pointer.
libpostal_teardown
can then also check and free the classifier or parser if they haven't been freed.
Next two suggestions are to illustrate this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good point. language_classifier_t
functions appear to only perform non-mutable operations so storing this object into libpostal_t
is definitely worth it. Updating the code then!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, that doesn't sound that good: in the current configuration, you can setup multiple classifiers and also, it "forces" user to call the method in order to get the "language_classifier_t" instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To extend a bit my last comment: we could maybe go the other way around: wrapping a libpostal_t
instance into language_classifier_t
, but then, to be safe, we'd need to add back some globals or shared variables to be sure that libpostal_t
isn't freed before language_classifier_t
.
The API looks a bit less nice, I definitely agree with you on this point, but I don't see any good enough solution to make this work without big downsides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users can still setup multiple classifiers if they have separate instances of libpostal.
In regards to forcing users to call the setup, the library already requires users to call extra setup functions for the larger modules.
I would try to have only a single extra instance parameter as people upgrading from previous versions of libpostal to an updated one will want the minimum amount of code refactoring.
Being that the project has the aim of being easy to use (shipping with pre-trained model), we should aim to have a minimal impact on people already implementing the library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fair point. Well, I warned about it. :) I'll make the change then but I really don't think this is a good idea.
Also, the bindings have less than 50 functions to bind so it's pretty simple to update. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just can't keep stop thinking about it: the API isn't great. For example: libpostal_place_languages
is taking two char**. Which doesn't make sense since the first one is suppose to "match" the second, why not an array of element { label: char*, value: char* }
?
So anyway, the changes remain small, even if we keep the language_classifier_t
type independant. So why not going to the bottom of this change and prevent all possible data races?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just can't keep stop thinking about it: the API isn't great. For example:
libpostal_place_languages
is taking two char**. Which doesn't make sense since the first one is suppose to "match" the second, why not an array ofelement { label: char*, value: char* }
?
I think a lot of the previous API decisions have been based around the language bindings. I'm going to assume for some language bindings it's easier to have separate labels and values arrays then convert them into a language native datatype.
I'll try to go through the rest of this change request tomorrow and give any feedback on other areas. I think this having an instanced API is a great step forward. And will help the project towards being thread-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, we can go back to this discussion once you're done. But I'd really prefer to keep it as is for the previously mentioned reasons.
@@ -108,7 +110,7 @@ int main(int argc, char **argv) { | |||
char **values = cstring_array_to_strings(values_array); | |||
|
|||
size_t num_near_dupe_hashes = 0; | |||
char **near_dupe_hashes = libpostal_near_dupe_hashes_languages(num_components, labels, values, options, num_languages, languages, &num_near_dupe_hashes); | |||
char **near_dupe_hashes = libpostal_near_dupe_hashes_languages(classifier, instance, num_components, labels, values, options, num_languages, languages, &num_near_dupe_hashes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
char **near_dupe_hashes = libpostal_near_dupe_hashes_languages(classifier, instance, num_components, labels, values, options, num_languages, languages, &num_near_dupe_hashes); | |
char **near_dupe_hashes = libpostal_near_dupe_hashes_languages(instance, num_components, labels, values, options, num_languages, languages, &num_near_dupe_hashes); |
libpostal_teardown(); | ||
libpostal_teardown_language_classifier(); | ||
libpostal_teardown(&instance); | ||
libpostal_teardown_language_classifier(&classifier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libpostal_teardown_language_classifier(&classifier); | |
libpostal_teardown_language_classifier(instance); | |
libpostal_teardown(&instance); |
There are generally quite a few warnings from the CI systems. |
I'll check more in depth what's going on and what did I broke then. |
Any news? |
Apologies for the delay in being able to finish reviewing this, I've compiled and tested this PR. I don't receive any errors during compilation (the warnings are normal when compiling libpostal). I've also gone through all the changes in the diff, and everything I've looked at looks fine and valid. I've run a static analysis tool against the code and the only remaining shared variables I can find all appear to be constants either in libpostal directly or it's dependencies. My only sticking point is the use of the library having to pass multiple pointers into the functions now. I think the exported functions should take and load the default values. I think running multiple or different classifiers (etc) is an edge case for most users. Having functions to load & unload custom classifiers (etc) are worthwhile for people needing that functionality. And they should be able to load in one, run the functions, unload it, then load in the next one etc. |
Most people will use the bindings and not the C functions directly. For them, it won't change much things but it'll remove all possible data races for sure. So I'd prefer to stand on my position: it's a bad for a good. |
Any news? |
1 similar comment
Any news? |
Fixes #475.
Since it changes the API, it'll require to change the version from "1.*" to "2.0".
With this, no more global variables with mutable accesses (unless I missed one?). The remaining ones don't seem to have mutable access so I think it's fine.
The first commit is very big, the others are much smaller.
I also took the liberty to add a "libpostal_get_version" function. It might come in handy at some point.
If you have any question, don't hesitate to ask!