modify the FormatMetaparameters function in train_lm.py #52
@@ -144,15 +144,70 @@ def WriteMetaparameters(metaparameters, ngram_order, num_train_sets, out_file):
         i += 1
     f.close()

-def FormatMetaparameters(metaparameters):
+def FormatMetaparameters(metaparameters, num_train_sets):
+    # store metaparameters as floats during checking and rerounding
     out = []
+    # final output metaparameters in string
+    out_param = []
+    # round all float metaparameters to 3rd decimal places
+    round_index = 3
     for param in metaparameters:
-        x = '{:.3f}'.format(param)
-        if x == '0.00':
-            x = '{:.3g}'.format(param)
+        x = round(param, round_index)
         out.append(x)

-    return ','.join(out)
+    size = len(metaparameters)
+
+    # check1: whether 0.000 or 1.000 occur because of rounding
+    # after this step, no 0.000 or 1.000 occurs unless the original number equal
+    # to 0 or 1 (if this happen, the parameters are invalid and it exits
+    # with a warning, while this case can be very rare)
+    for i in range(0, size):
+        if out[i] in [0.0, 1.0]:
+            out[i] = RoundToProperDigit(metaparameters[i], round_index)
+            if out[i] in [0.0, 1.0]:
+                sys.exit("train_lm.py: there is a {0}. metaparameters should be in range (0, 1).".format(out[i]))
+
+    # check2: check repeating values. if happen, then reround them until they are different
+    # if there exist same values originally, exit with a warning (this can be very rare)
+    # after this step, there should be no any repeating values in set out
+    for i in range(0, size):
+        # Note: the first #num_train_sets weights are not needed to be checked (they can be the same)
+        if i < num_train_sets:
+            out_param.append(format(out[i]))
+        else:
+            x = out[i]
+            for j in range(i + 1, size):
+                y = out[j]
+                # we will round to the next digit when we detect repeated values
+                round_digit = round_index + 1
+                round_max_digit = max(len(str(metaparameters[i])) - 2, len(str(metaparameters[j])) - 2)
Wang Jian: The '... - 2' here, I guess you are assuming the input must start with '0.', but do you check the range of the input parameters? Since the input for …

Ke Li: Yes, I assumed it starts with '0.'. Do you think I should double-check the input?

Wang Jian: Oh yes, it may not be necessary to check the input for …

Ke Li: Ok. I should have thought about it. But I found the script …

Wang Jian: Ok, we can leave the work to validate_metaparameters.py.

chris920820: The logic, I believe, is still not completely right. For example, … Second, I think the reason the input metaparameters = [0.000001, 0.000002, 0.000003] gives [0, 0, 0] is that len(str(0.00000001)) = len('1e-08') = 5.

Ke Li: Good catch! Thanks. I'll modify it soon.
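[Editor's note: to illustrate chris920820's point (a sketch, not part of the PR): Python's str() switches to scientific notation for very small floats, so len(str(x)) - 2 stops being a count of decimal digits, and the rerounding loops get capped far too early:

    >>> str(0.123), len(str(0.123)) - 2        # '0.xxx' form: 3 decimal digits, as assumed
    ('0.123', 3)
    >>> str(0.00000001)                        # small floats repr in scientific notation
    '1e-08'
    >>> len(str(0.00000001)) - 2               # gives 3, though 8 decimal places are needed
    3
    >>> round(0.000001, 3)                     # so rerounding stops at 3 digits ...
    0.0
    >>> round(0.000001, 6)                     # ... and never reaches the digit that works
    1e-06
]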
+                while x == y and round_digit <= round_max_digit:
+                    x = round(metaparameters[i], round_digit)
+                    y = round(metaparameters[j], round_digit)
+                    round_digit += 1
+                # actually validate_metaparameters.py will validate those parameters later.
+                # except for satisfying condition that those parameters should be in
+                # range (0, 1), parameters D1, D2, D3, D4 for a certain xgram model
+                # should satisfy D1 > D2 > D3 > D4.
+                # here we only checked the repeating values but not the order relationship
+                if x == y:
+                    sys.exit("train_lm.py: {0} and {1} are the same. metaparameters can not be exactly the same.".format(x,y))
+                out[j] = y
+
+            # out[i] now is valided and can be appended to the output set out_param
+            out[i] = x
+            out_param.append(format(out[i]))
+
+    return ','.join(out_param)
+
+def RoundToProperDigit(number, round_index):
+    round_index += 1
+    x = round(number, round_index)
+    round_max_digit = len(str(number)) - 2
+    while x in [0.0, 1.0] and round_index <= round_max_digit:
+        x = round(number, round_index)
+        round_index += 1
+    return x
+
 def ParseMetaparameters(encoded_str, ngram_order, num_train_sets):
     metaparameters = encoded_str.split(',')
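[Editor's note: for reference, a quick REPL sketch (not part of the PR) of the two situations the new checks handle, using only built-in round():

    >>> round(0.0004, 3)                       # check1: a valid value collapses to 0.0 ...
    0.0
    >>> round(0.0004, 4)                       # ... until RoundToProperDigit extends the precision
    0.0004
    >>> round(0.1234, 3) == round(0.1231, 3)   # check2: distinct values collide at 3 digits
    True
    >>> round(0.1234, 4), round(0.1231, 4)     # one extra digit separates them again
    (0.1234, 0.1231)
]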
@@ -342,7 +397,7 @@ def ParseMetaparameters(encoded_str, ngram_order, num_train_sets):
     metaparameters = ReadMetaparameters(metaparam_file)
     LogMessage("You can set --bypass-metaparameter-optimization='{0}' "
                "to get equivalent results".format(
-                   FormatMetaparameters(metaparameters)))
+                   FormatMetaparameters(metaparameters, num_train_sets)))

 # make lm dir
 lm_dir = os.path.join(args.lm_dir, lm_name + '.pocolm')
Wang Jian: We only load the num_train_sets variable when args.bypass_metaparameter_optimization != None. If you need to use this value, please move the loading part of the code outside the if condition.

Ke Li: Thanks for this reminder. I'll modify it.
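[Editor's note: a minimal sketch of the restructuring Wang Jian suggests. This is a fragment meant to sit in train_lm.py's main flow; ReadNumTrainSets and the count_dir file layout are hypothetical, and the exact reading code in pocolm may differ. The idea is to load num_train_sets before the branch, so the else-branch can also pass it to FormatMetaparameters:

    import os

    # hypothetical helper: pocolm keeps such metadata in small text files,
    # but the actual path and reading code in train_lm.py may differ
    def ReadNumTrainSets(count_dir):
        with open(os.path.join(count_dir, 'num_train_sets')) as f:
            return int(f.readline())

    # read it up front, instead of only inside the bypass branch
    num_train_sets = ReadNumTrainSets(count_dir)

    if args.bypass_metaparameter_optimization is not None:
        metaparameters = ParseMetaparameters(args.bypass_metaparameter_optimization,
                                             ngram_order, num_train_sets)
    else:
        metaparameters = ReadMetaparameters(metaparam_file)
        LogMessage("You can set --bypass-metaparameter-optimization='{0}' "
                   "to get equivalent results".format(
                       FormatMetaparameters(metaparameters, num_train_sets)))
]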