Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Count unit #8

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kokosing
Copy link

Introduce Count unit

return new Count(size, unit).convertToMostSuccinctCount();
}

private final double value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want an inexact value for this? IIRC, a single long would get us to 4000 quadrillion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was rather thinking about case where something could have fraction part (like water or mass). Though I do not have strong opinion. Would you like me to change this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. I can't think of any case where I would want to use this for non-integral counts. For water or mass, I assume we would introduce Volume and Mass units. Maybe @electrum or @martint have thoughts.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice that DataSize is also using double. Using double here is useful when doing a coversions between different units, like from KB to MB (as fractional part appears). The same happens for Count. When long is used then Count become precise more precise, but not all conversion are possible.

See #9.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If were going to keep this pattern, I think we should make it clear that the value is inexact.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed this class to Quantity so it now it clear that it may contain fractional part. Also now, it is a bit more general.

@kokosing
Copy link
Author

kokosing commented Nov 5, 2018

Automation failure does not seem related.

@kokosing
Copy link
Author

kokosing commented Nov 5, 2018

@electrum Would you mind to take a look?

@kokosing
Copy link
Author

kokosing commented Jan 4, 2019

@electrum, @dain Ping, it blocks: prestodb/presto#11443

Copy link
Member

@dain dain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only skimmed the high level, and it looks good to me. But, I think the needs more eyes due to the difficult in changing this in the future. Specifically, I think @electrum and @martint should read this over.

// call allocates a new array at each call.
private static final Unit[] COUNT_UNITS = Unit.values();

public static Count succinct(long bytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bytes :)

Maybe rename this to count?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:D Fixed

return new Count(size, unit).convertToMostSuccinctCount();
}

private final double value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If were going to keep this pattern, I think we should make it clear that the value is inexact.

{
//This order is important, it should be in increasing magnitude.
SINGLE(1L, ""),
THOUSAND(1000L, "k"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use a lowercase k here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took from here: https://en.wikipedia.org/wiki/Power_of_10

They mention both ways. Changed.

Copy link
Author

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments addressed

// call allocates a new array at each call.
private static final Unit[] COUNT_UNITS = Unit.values();

public static Count succinct(long bytes)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:D Fixed

return new Count(size, unit).convertToMostSuccinctCount();
}

private final double value;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed this class to Quantity so it now it clear that it may contain fractional part. Also now, it is a bit more general.

{
//This order is important, it should be in increasing magnitude.
SINGLE(1L, ""),
THOUSAND(1000L, "k"),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took from here: https://en.wikipedia.org/wiki/Power_of_10

They mention both ways. Changed.

@kokosing
Copy link
Author

think the needs more eyes due to the difficult in changing this in the future

Why is so? I think it will be difficult to change if we get a vast usage. I hope that along the way the usage is growing we will fix all the issues.

@kokosing
Copy link
Author

@electrum @martint ping

@martint
Copy link
Member

martint commented Feb 5, 2019

I'm ok with having something like this in airlift, but please let's not call it a "count unit". There's no unit involved here (e.g., kilogram, meters, dollars, etc). What we're talking about is a scaled count, or a count with scale.

return Double.hashCode(value);
}

public enum Unit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a unit. It's a scale or magnitude.

@kokosing
Copy link
Author

kokosing commented Feb 5, 2019

but please let's not call it a "count unit"

To me it is some kind of "unit" as it describes certain physical property. Anyway, I liked Magnitude much better, thanks for pointing this.

@kokosing
Copy link
Author

kokosing commented Feb 5, 2019

Addressed comments.

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments.

I'm also not sure the parsing and formatting semantics are right. If we don't have fractions, then six digits are required to accurately represent anything. For example, 123456 cannot be converted to K without using fractions or losing precision. But we could have 123.456K without losing accuracy (since we are doing base 10 magnitudes).

}

/**
* @return count with no bigger value than 1000 in succinct magnitude, fractional part is rounded
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is there a fractional part when value is a long?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you pass 123456 in SINGLE then you will get 123.456K that will be rounded to 123K.


public Count(long count, Magnitude magnitude)
{
checkArgument(!Double.isInfinite(count), "count is infinite");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both these checks are not needed for long


public enum Magnitude
{
//This order is important, it should be in increasing magnitude.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

// must be in increasing magnitude order

}

@Test(dataProvider = "conversions")
public void testConversions(Count.Magnitude magnitude, Count.Magnitude toMagnitude, long factor)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static import Magnitude

QUADRILLION(1000_000_000_000_000L, "P");

private final long factor;
private final String magnitutdeString;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo magnitutdeString (also in constructor argument, message below, and getter)


public Count convertTo(Magnitude magnitude)
{
requireNonNull(magnitude, "magnitude is null");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this check to getValue (or remove it since the caller will get an NPE anyway)

@Override
public int hashCode()
{
return Double.hashCode(getValue(Magnitude.SINGLE));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as compareTo

import static java.lang.String.format;
import static java.util.Objects.requireNonNull;

public class Count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this final (DataSize should have been final as well)

@Override
public int compareTo(Count o)
{
return Double.compare(getValue(Magnitude.SINGLE), o.getValue(Magnitude.SINGLE));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This double comparison is wrong. However, using getValue() won't work, since it can fail on overflow. Maybe we should have a toBigInteger() method and use that here.

return (long) (floor(value)) + magnitude.getMagnitutdeString();
}

return format(Locale.ENGLISH, "%.2f%s", value, magnitude.getMagnitutdeString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This formatting seems wrong for long.

@kokosing
Copy link
Author

For example, 123456 cannot be converted to K without using fractions or losing precision.

Any conversion that loses a precision throws an exception, unless it is says it rounds like succinctRounded.

But we could have 123.456K without losing accuracy (since we are doing base 10 magnitudes).

Do you mean to always store value raw value (in magnitude SINGLE) and convert (format) it only when displaying.

@kokosing
Copy link
Author

Comments addressed. Thank you for a review.

MILLION(1000_000L, "M"),
BILLION(1000_000_000L, "B"),
TRILION(1000_000_000_000L, "T"),
QUADRILLION(1000_000_000_000_000L, "P");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not consistent with using B for Billion. How about Q?

Suggested change
QUADRILLION(1000_000_000_000_000L, "P");
QUADRILLION(1000_000_000_000_000L, "Q");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants