Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrating fast_float with Valkey to replace strtod invocation #1260

Open
wants to merge 47 commits into
base: unstable
Choose a base branch
from

Conversation

parthpatel
Copy link
Member

@parthpatel parthpatel commented Nov 4, 2024

This is an alternative to this PR request by @swaingotnochill : #1170
This allows fast_float to remain a submodule that we can pull from for updates.

---- Update after the discussion

I am dropping the submodule idea for now as it requires a larger discussion about release process. I don't have access to @swaingotnochill's repository or PR. Therefore, I pulled his commit into this CR to maintain his author-ship on the code he wrote. I integrated it with Redis and fixed the Makefile. It should build now and will be ready to push.

I will work on benchmarking this separately. I can also turn off fast_float by default if folks have concerns and test it on my branch.

@madolson
Copy link
Member

madolson commented Nov 4, 2024

I suppose I don't really understand the benefit of a submodule vs inlining. Since we aren't tightly controlling the versioning between the main repo and releases, it becomes much harder to know if there is a security issue impacting a specific release. At the very least we should be pinning a specific version of fast_float that we are pulling.

@parthpatel
Copy link
Member Author

Since we aren't tightly controlling the versioning between the main repo and releases, it becomes much harder to know if there is a security issue impacting a specific release. At the very least we should be pinning a specific version of fast_float that we are pulling.

We track fast_float commit id in the valkey repository with this change - see the copy pasted change from my commit below. For every valkey commit, we can look up exact version of fast_float at all times using this method. Pulling new fast_float version is as simple as git pull on the submodule.


Submodule fast_float added at e800ca

@parthpatel parthpatel added the pending-refinement This issue/request is still a high level idea that needs to be further refined label Nov 4, 2024
@parthpatel parthpatel marked this pull request as draft November 4, 2024 21:19
@madolson
Copy link
Member

madolson commented Nov 4, 2024

Does it get pulled automatically as part of the release into the release artifacts?

@parthpatel
Copy link
Member Author

Does it get pulled automatically as part of the release into the release artifacts?

There is a "git submodule update --init" command in Makefile to initialize it automatically. So yes, It will automatically checkout the same commit every time during build.

@madolson
Copy link
Member

madolson commented Nov 4, 2024

There is a "git submodule update --init" command in Makefile to initialize it automatically. So yes, It will automatically checkout the same commit every time during build.

So we are adding a new dependency to the release process, since you need to be able to fetch the code from github. I think we should consider figuring out how to pull the code in when we do a release so that folks don't need to do git submodule update --init.

@parthpatel
Copy link
Member Author

parthpatel commented Nov 4, 2024

There is a "git submodule update --init" command in Makefile to initialize it automatically. So yes, It will automatically checkout the same commit every time during build.

So we are adding a new dependency to the release process, since you need to be able to fetch the code from github. I think we should consider figuring out how to pull the code in when we do a release so that folks don't need to do git submodule update --init.

The other approach would be to just check-in the whole git repo as a folder under valkey, which should work. What is the issue with git dependency in github release workflows? I can put a post-checkout hook that always initializes modules on checkout, as long as checkout happens outside of the release process.

Copy link

codecov bot commented Nov 5, 2024

Codecov Report

Attention: Patch coverage is 85.18519% with 4 lines in your changes missing coverage. Please review.

Project coverage is 70.71%. Comparing base (86f33ea) to head (129fdbf).

Files with missing lines Patch % Lines
src/valkey-cli.c 0.00% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1260      +/-   ##
============================================
+ Coverage     70.55%   70.71%   +0.15%     
============================================
  Files           115      116       +1     
  Lines         63158    63167       +9     
============================================
+ Hits          44561    44667     +106     
+ Misses        18597    18500      -97     
Files with missing lines Coverage Δ
src/debug.c 53.09% <100.00%> (+0.04%) ⬆️
src/resp_parser.c 98.47% <100.00%> (ø)
src/sort.c 94.82% <100.00%> (+0.01%) ⬆️
src/t_zset.c 95.61% <100.00%> (-0.05%) ⬇️
src/util.c 71.42% <100.00%> (ø)
src/valkey_strtod.h 100.00% <100.00%> (ø)
src/valkey-cli.c 55.46% <0.00%> (+1.62%) ⬆️

... and 11 files with indirect coverage changes

@parthpatel
Copy link
Member Author

I am dropping the submodule idea for now as it requires a larger discussion about release process. I don't have access to @swaingotnochill's repository or PR. Therefore, I pulled his commit into this CR to maintain his author-ship on the code he wrote. I integrated it with Redis and fixed the Makefile. It should build now and will be ready to push.

I will work on benchmarking this separately. I can also turn off fast_float by default if folks have concerns and test it on my branch.

Copy link

@swaingotnochill swaingotnochill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to modify. I am on a vacation so it will be difficult for me to work until I am back. Cheers

@parthpatel
Copy link
Member Author

@zuiderkwast or @madolson any pointers on how to solve almalinux issue with missing g++?

cd fast_float && make
make[3]: Entering directory '/__w/valkey/valkey/deps/fast_float'
g++ -std=c++11 -O3 -fPIC -c fast_float_strtod.cpp -o fast_float_strtod.o
make[3]: g++: Command not found
make[3]: *** [Makefile:9: fast_float_strtod.o] Error 127
make[3]: Leaving directory '/__w/valkey/valkey/deps/fast_float'
make[2]: *** [Makefile:80: fast_float] Error 2
make[2]: *** Waiting for unfinished jobs....

@parthpatel parthpatel changed the title Integrating fast_float as a git submodule with Valkey to replace strtod invocation Integrating fast_float with Valkey to replace strtod invocation Nov 5, 2024
@zuiderkwast
Copy link
Contributor

I'm skeptical to submodules too. Offline builds get complicated. Source releases get complicated. So far, we've used either vendored dependencies or system installed ones (like OpenSSL).

I prefer that we vendor this one. We could copy only one or a few files, as we've done with crc64 and some other small libraries?

@parthpatel parthpatel marked this pull request as ready for review November 6, 2024 21:48
swaingotnochill and others added 10 commits November 6, 2024 22:01
* Simplified the interface to remove if branches.
* Simplified Makefile to be more readable.
* Integrating fast_float with the redis code base in resp_parser.c file.

Signed-off-by: Parth Patel <[email protected]>
…fy it explicitly in Makefile to fix 32-bit compilation issues.

Signed-off-by: Parth Patel <[email protected]>
@parthpatel
Copy link
Member Author

parthpatel commented Nov 12, 2024

This build should succeed now. Here is a summary of changes from today:

  1. Moved fast_float integration code to src/fast_float. Unit test still resides in this same directory because src/unit build is broken.
  2. Created a wrapper named valkey_strtod in fast_float_strtod.h header file. It is "static inline" function. This removes ifdefs from individual code files.
  3. USE_FAST_FLOAT is off by default and now disables entire fast float compilation.
  4. Switching back to using single fast_float.h header file. deps/fast_float only contains 1 file now. README is merged into deps/README.md

I verified that this compiles for Mac and Rhel for both USE_FAST_FLOAT=yes and USE_FAST_FLOAT=no. Used "make test" to also include tests.

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great progress.

On the topic of moving the fast_float_strtod.cpp to src/, the src/Makefile has become a monolith. From my experience so far with this Makefile, I am now strongly against this. We should move towards smaller build units.

This is another topic. We appear to be moving towards CMake in the long term. Let's not "solve" this now. Keep the monolith and flat directory structure.

The Makefile monolith provides a common ground for optimization flags, and debug flags and other stuff. I'm posting make code in the comments below that you can use. (Personally, I don't think this file is messy at all, given all it does. I guess you need to understand make well to like it.)

Unit test still resides in this same directory because src/unit build is broken.

What do you mean broken? They're fairly new and they work well in the CI. Please explain what problems you encountered so we can help you.

src/fast_float/Makefile Outdated Show resolved Hide resolved
src/Makefile Outdated
Comment on lines 386 to 387
fast_float/fast_float_strtod.o:
cd fast_float && $(MAKE) CFLAGS=" $(CFLAGS) " LDFLAGS=" $(LDFLAGS) "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default make target is the first target in the file. Here, you're adding this target above the all target, so when I type make, it only builds this file.

$ make
    CC Makefile.dep
cd fast_float && make CFLAGS=" -funwind-tables " LDFLAGS="  "
make[1]: Entering directory '/home/viktor/repos/valkey/src/fast_float'
g++ -std=c++11 -O3 -fPIC -D FASTFLOAT_ALLOWS_LEADING_PLUS -funwind-tables   -c fast_float_strtod.cpp -o fast_float_strtod.o
make[1]: Leaving directory '/home/viktor/repos/valkey/src/fast_float'

This needs to be somewhere below all:.


I tried changing this to get rid of the separate Makefile under src/fast_float/ and to use the same optimization and debug flags as for the rest of Valkey, because it can be configured for debugging, etc.

I just added this code below the rules for %.o: and unit/%.o:

ifeq ($(USE_FAST_FLOAT),yes)
CXX ?= c++
CXXFLAGS += -Wall -W $(OPT) $(DEBUG)
fast_float/fast_float_strtod.o: fast_float/fast_float_strtod.cpp ../deps/fast_float/fast_float.h
	$(QUIET_CXX)$(CXX) $(CXXFLAGS) $(CFLAGS) $(SERVER_CFLAGS) -D FASTFLOAT_ALLOWS_LEADING_PLUS -c $< -o $@
endif

The QUIET_CXX variable is just for nice output, because we have something similar for CC. The definition next to QUIET_CC:

 ifndef V
 QUIET_CC = @printf '    %b %b\n' $(CCCOLOR)CC$(ENDCOLOR) $(SRCCOLOR)$@$(ENDCOLOR) 1>&2;
+QUIET_CXX = @printf '    %b %b\n' $(CCCOLOR)C++$(ENDCOLOR) $(SRCCOLOR)$@$(ENDCOLOR) 1>&2;
 QUIET_GEN = @printf '    %b %b\n' $(CCCOLOR)GEN$(ENDCOLOR) $(SRCCOLOR)$@$(ENDCOLOR) 1>&2;
 QUIET_LINK = @printf '    %b %b\n' $(LINKCOLOR)LINK$(ENDCOLOR) $(BINCOLOR)$@$(ENDCOLOR) 1>&2;

With these changes, I don't see the need for the separate sub directory src/fast_float/.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default make target is the first target in the file. Here, you're adding this target above the all target, so when I type make, it only builds this file.

Good catch. I was too busy testing "make test" across all platforms to focus on normal make. Fixed.

Working on the Makefile changes, but I am against a single monolith Makefile. Experienced valkey developers may be able to make out what's happening there but new developers will have significant friction. If everyone else is in agreement, I will push that change.

src/Makefile Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call this file src/valkey_strtod.h, named after the function it provides, rather than what it uses behind the scenes. Fast float is just one of the two implementations it can use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, strtod has the signature double strtod(const char *str, char **endptr);, I think we should ideally conform so that it's more normal to follow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, strtod has the signature double strtod(const char *str, char **endptr);, I think we should ideally conform so that it's more normal to follow.

I changed the signature to remove some if checks. endptr can be null. Some of the invocations pass null and others don't. I think this is a minor change.

Let's call this file src/valkey_strtod.h,

Done.

src/fast_float/fast_float_strtod.h Outdated Show resolved Hide resolved
parthpatel and others added 5 commits November 12, 2024 13:26
Co-authored-by: Viktor Söderqvist <[email protected]>
Signed-off-by: Parth <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
Signed-off-by: Parth <[email protected]>
* also moved fast_float target below all so that all remains default target.

Signed-off-by: Parth Patel <[email protected]>
* Also removed doxygen tags.

Signed-off-by: Parth Patel <[email protected]>
@parthpatel
Copy link
Member Author

What do you mean broken? They're fairly new and they work well in the CI. Please explain what problems you encountered so we can help you.

I pasted this earlier somewhere, but this issue has the build log for broken "make test-unit" invocation.
src/unit is broken - #1290

@parthpatel
Copy link
Member Author

For other following this PR: I, @madolson and @zuiderkwast discussed the right folder structure for this change on slack. We decided that the very first version(minus the git submodule part) is the preferred folder/file structure, i.e. deps/fast_float contains the vendor library and deps/fast_float_integration contains the integration code. I will revert this PR to that structure and then we can discuss CMake related changes.

deps/README.md Outdated Show resolved Hide resolved
deps/README.md Outdated
functions for `float` and `double` types as well as integer types. These functions convert ASCII strings representing decimal values (e.g., `1.3e10`) into binary types. We provide exact rounding (including
round to even). In our experience, these `fast_float` functions many times faster than comparable number-parsing functions from existing C++ standard libraries.

Specifically, `fast_float` provides the following two functions to parse floating-point numbers with a C++17-like syntax (the library itself only requires C++11):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem relevant.

But what I think is very important to mention is the distinction between the external dependency and our integration code. That was the source of this long debate.

Suggested change
Specifically, `fast_float` provides the following two functions to parse floating-point numbers with a C++17-like syntax (the library itself only requires C++11):
The `fast_float` directory contains the C++ header file from the fast float project.
The `fast_float_integration` directory contains code to be able to use fast float from C. The integration code is written for Valkey. It's not from the fast float project.

.PHONY: all clean
all: valkey_strtod.o
clean:
rm -f *.o || true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With -f, rm never fails.

Suggested change
rm -f *.o || true;
rm -f *.o

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does on mac.

/tmp % rm -f *.0
zsh: no matches found: *.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-(rm -f *.o) I guess would be the normal makefile way, to ignore the error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MacOS have ridiculously outdated coreutils, because they refused to lift to GPLv3. I guess it's one of the reasons I'm running fedora on my macbook instead of macOS, hehe.


extern "C"
{
const char* fast_float_strtod(const char *str, double *value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madolson you mentioned you don't want a file called fast_float_strtod.cpp so here we instead have a file called valkey_strtod.cpp which provides a function called fast_float_strtod. Is this what you had in mind? To me, it seems like some weird compromise that just makes it all more confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's a generic file that includes both implementations, valkey_strtod makes sense. If it's just a fast float implementation, fast_float_strtod.cpp makes sense.

Comment on lines 59 to 64
const char *fast_float_strtod(const char *str, double *value);

static inline const char *valkey_strtod(const char *str, double *value) {
errno = 0;
return fast_float_strtod(str, value);
}
Copy link
Contributor

@zuiderkwast zuiderkwast Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I'm fine with the directory name fast_float_c_interface, although it's not what we agreed on in the long chat. We agreed on fast_float_inteface. If we keep changing things after making decisions, we have to make new decisions over and over.

  2. This file provides a static inline valkey_strtod function and it's located under deps/fast_float_c_interface/, but it doesn't always use fast float. It only conditionally uses it.

    The purpose of naming the wrapper valkey_strtod rather than fast_float_strtod was that it should be implementation agnostic to the caller. When it's placed in this directory, you are led to believe it's using fast float even when it isn't. I see that files needing strtod do #include "../deps/fast_float_c_interface/valkey_strtod.h to get the valkey_strtod function.

    We have three pieces here:

    1. the external fast float C++ header (deps/fast_float/fast_float.h)
    2. interface code to be able to call C++ from C. IMO this should be specific to fast float and be called something like deps/fast_float_c_interface/fast_float_strtod.{cpp,h}, but @madolson you don't like this name?
    3. conditional code to choose between the fast float and the stdlib implementations. IMO this should be placed in src/valkey_strtod.h.

    For iii, compare this to jemalloc. The conditional code for which allocator we use is abstracted by zmalloc and is placed in src/zmalloc.{c,h}.

    If you think differently, please convince me. I may be wrong again. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zuiderkwast I was expecting this file to end up under src/*, since as you mentioned its our code. This is not CPP code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we agree to use deps/fast_float_c_interface/fast_float_strtod.cpp for the interfacing code and src/valkey_strtod.h for our conditional code then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving valkey_strtod.h to src/.

@@ -116,7 +116,7 @@ jobs:
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: make
run: |
sudo apt-get update && sudo apt-get install libc6-dev-i386
sudo apt-get update && sudo apt-get install libc6-dev-i386 libstdc++-11-dev-i386-cross gcc-multilib g++-multilib
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to add this C++ stuff in jobs where we don't use USE_FAST_FLOAT=yes? It looks like a mistake.

I don't see that we use USE_FAST_FLOAT=yes anywhere in the CI job. I think we should use it in at least one of the CI test jobs. I suggest we use it in test-ubuntu-latest, or maybe in most of the jobs but not all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is old commit. I will add a separate job.

deps/Makefile Outdated Show resolved Hide resolved
deps/fast_float_c_interface/Makefile Outdated Show resolved Hide resolved
deps/fast_float_c_interface/valkey_strtod.cpp Outdated Show resolved Hide resolved
src/unit/test_valkey_strtod.c Outdated Show resolved Hide resolved
deps/Makefile Outdated Show resolved Hide resolved
madolson and others added 2 commits November 14, 2024 11:34
Signed-off-by: Madelyn Olson <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
Signed-off-by: Parth <[email protected]>
src/Makefile Outdated Show resolved Hide resolved
* Moved valkey-strtod.h back to src/
* Instead of passing OPT, STD etc from src/Makefile, fast_float
  Makefile explicitly specifies what it needs.
Signed-off-by: Parth Patel <[email protected]>
Signed-off-by: Parth Patel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request major-decision-approved Major decision approved by TSC team performance run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants