Fix caching calls to `_vector_for_key_cached` and `_out_of_vocab_vector_cached` #47
base: master
Conversation
`_query_is_cached` will always return `False` because `_cache.get` expects `key` to be wrapped in a tuple. This renders the caching useless.
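A minimal sketch of the mismatch, using a plain dict as a stand-in for pymagnitude's LRU cache (the `put` / `is_cached_*` helpers here are hypothetical, for illustration only):

```python
# Toy stand-in for the LRU cache: entries are keyed on (args_tuple, kwargs_set).
cache = {}

def put(key, value, **kwargs):
    # The real cache wraps positional args in a tuple when storing.
    cache[((key,), frozenset(kwargs.items()))] = value

def is_cached_buggy(key, **kwargs):
    # Bug: probes with the bare key instead of (key,).
    return (key, frozenset(kwargs.items())) in cache

def is_cached_fixed(key, **kwargs):
    # Fix: wrap key in a tuple, matching how entries were stored.
    return ((key,), frozenset(kwargs.items())) in cache

put('hello', [0.1, 0.2], normalized=True)
print(is_cached_buggy('hello', normalized=True))  # False: never matches
print(is_cached_fixed('hello', normalized=True))  # True
```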
Fix _query_is_cached to allow caching
I'm surprised that this PR has received no attention, because it improves the performance of our service by a large margin. Here is a code snippet to help understand the effect of this change:

```python
from collections import defaultdict

import pandas as pd
from pymagnitude import *

words = ['hello', 'world', 'cache', 'helllloooo', 'wooooorlddddd', 'caaaaache']
reversed_words = list(reversed(words))

vector = Magnitude(path=MagnitudeUtils.download_model('glove/medium/glove.twitter.27B.25d', log=True),
                   language='en',
                   lazy_loading=2400000)

vector_attrs = ['query', '_vector_for_key_cached', '_out_of_vocab_vector_cached']

def log_cached(vector):
    data = defaultdict(list)
    cache_attrs = ['size', 'lookups', 'hits', 'misses', 'evictions']
    for attr in vector_attrs:
        for cache_attr in cache_attrs:
            data[cache_attr].append(getattr(getattr(vector, attr)._cache, cache_attr))
    df = pd.DataFrame(data, index=vector_attrs)
    print(df, '\n')

print('### Query ...')
vector.query(words)
log_cached(vector)
print('### Query reverse ...')
vector.query(reversed_words)
log_cached(vector)
```

Output before the change:
Output after the change:
I also created https://github.com/zfang/benchmark_pymagnitude just for testing my patch.
Fix `_query_is_cached` to actually enable caching
`_vector_for_key_cached` and `_out_of_vocab_vector_cached` …vocab_vector_cached._cache.get call
Hi @zfang, thanks for this PR; this likely broke at some point when I modified the underlying LRU cache. Sorry, I've been traveling for the last week or so. I'll get around to reviewing this tonight and merging it in the next few days. I'll also add some tests to make sure the cache works and to prevent regressions in the future.
`_query_is_cached` will always return `False` because `key` should be in a tuple. `lru_cache` is able to unify `args`, `kwargs`, and default args in a call with the `get_default_args` magic in order to generate a consistent cache key. What this means is that:

a. all the default args will be part of `kwargs`;
b. any `args` with a default value will also be converted to `kwargs`;
c. for a parameter that has no default value, if you provide it as `args` in one call and as `kwargs` in another, they will have different cache keys.

Therefore `_out_of_vocab_vector_cached._cache.get(((key,), frozenset([('normalized', normalized)])))` will always return `False`, since the actual cache key is `((key,), frozenset([('normalized', normalized), ('force', force)]))`.

It's wasteful to call `_cache.get` and throw away the result, so I changed `_query_is_cached` to `_query_cached`.
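To make the key construction concrete, here is a standalone sketch (not pymagnitude's actual implementation) of how folding default arguments into `kwargs` yields a cache key that a probe missing the `force` default can never match. `make_cache_key` and `lookup` are hypothetical stand-ins; `get_default_args` mimics the helper named above:

```python
import inspect

def get_default_args(func):
    # Collect {name: default} for every parameter that has a default value.
    return {name: p.default
            for name, p in inspect.signature(func).parameters.items()
            if p.default is not inspect.Parameter.empty}

def make_cache_key(func, args, kwargs):
    # Fold defaults into kwargs so every call site produces the same key.
    merged = dict(get_default_args(func))
    merged.update(kwargs)
    return (tuple(args), frozenset(merged.items()))

def lookup(key, force=False, normalized=True):
    pass  # stand-in for _out_of_vocab_vector

# Key actually stored by the cache: defaults for *both* force and normalized
# are folded in.
stored = make_cache_key(lookup, ('helllloooo',), {'normalized': True})

# Key probed by the buggy _query_is_cached: ('force', False) is missing.
probed = (('helllloooo',), frozenset([('normalized', True)]))

print(stored == probed)  # False: the probe can never hit
```

Because defaults are always merged in, two call sites that spell the defaulted arguments differently still produce identical keys; only a probe built by hand without the defaults (as in the buggy check) misses.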