-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrating fast_float with Valkey to replace strtod invocation #1260
base: unstable
Are you sure you want to change the base?
Conversation
I suppose I don't really understand the benefit of a submodule vs inlining. Since we aren't tightly controlling the versioning between the main repo and releases, it becomes much harder to know if there is a security issue impacting a specific release. At the very least we should be pinning a specific version of fast_float that we are pulling. |
We track fast_float commit id in the valkey repository with this change - see the copy pasted change from my commit below. For every valkey commit, we can look up exact version of fast_float at all times using this method. Pulling new fast_float version is as simple as git pull on the submodule. Submodule fast_float added at e800ca |
Does it get pulled automatically as part of the release into the release artifacts? |
There is a "git submodule update --init" command in Makefile to initialize it automatically. So yes, It will automatically checkout the same commit every time during build. |
So we are adding a new dependency to the release process, since you need to be able to fetch the code from github. I think we should consider figuring out how to pull the code in when we do a release so that folks don't need to do |
The other approach would be to just check-in the whole git repo as a folder under valkey, which should work. What is the issue with git dependency in github release workflows? I can put a post-checkout hook that always initializes modules on checkout, as long as checkout happens outside of the release process. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #1260 +/- ##
============================================
+ Coverage 70.55% 70.71% +0.15%
============================================
Files 115 116 +1
Lines 63158 63167 +9
============================================
+ Hits 44561 44667 +106
+ Misses 18597 18500 -97
|
I am dropping the submodule idea for now as it requires a larger discussion about release process. I don't have access to @swaingotnochill's repository or PR. Therefore, I pulled his commit into this CR to maintain his author-ship on the code he wrote. I integrated it with Redis and fixed the Makefile. It should build now and will be ready to push. I will work on benchmarking this separately. I can also turn off fast_float by default if folks have concerns and test it on my branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to modify. I am on a vacation so it will be difficult for me to work until I am back. Cheers
@zuiderkwast or @madolson any pointers on how to solve almalinux issue with missing g++?
|
I'm skeptical to submodules too. Offline builds get complicated. Source releases get complicated. So far, we've used either vendored dependencies or system installed ones (like OpenSSL). I prefer that we vendor this one. We could copy only one or a few files, as we've done with crc64 and some other small libraries? |
…or replacing strtod Signed-off-by: Parth Patel <[email protected]>
* Simplified the interface to remove if branches. * Simplified Makefile to be more readable. * Integrating fast_float with the redis code base in resp_parser.c file. Signed-off-by: Parth Patel <[email protected]>
Signed-off-by: Parth Patel <[email protected]>
… issues. Signed-off-by: Parth Patel <[email protected]>
…d packages. Signed-off-by: Parth Patel <[email protected]>
Signed-off-by: Parth Patel <[email protected]>
…fy it explicitly in Makefile to fix 32-bit compilation issues. Signed-off-by: Parth Patel <[email protected]>
Signed-off-by: Parth Patel <[email protected]>
…i386 compilation Signed-off-by: Parth Patel <[email protected]>
…compilations Signed-off-by: Parth Patel <[email protected]>
This build should succeed now. Here is a summary of changes from today:
I verified that this compiles for Mac and Rhel for both USE_FAST_FLOAT=yes and USE_FAST_FLOAT=no. Used "make test" to also include tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress.
On the topic of moving the fast_float_strtod.cpp to src/, the src/Makefile has become a monolith. From my experience so far with this Makefile, I am now strongly against this. We should move towards smaller build units.
This is another topic. We appear to be moving towards CMake in the long term. Let's not "solve" this now. Keep the monolith and flat directory structure.
The Makefile monolith provides a common ground for optimization flags, and debug flags and other stuff. I'm posting make code in the comments below that you can use. (Personally, I don't think this file is messy at all, given all it does. I guess you need to understand make well to like it.)
Unit test still resides in this same directory because src/unit build is broken.
What do you mean broken? They're fairly new and they work well in the CI. Please explain what problems you encountered so we can help you.
src/Makefile
Outdated
fast_float/fast_float_strtod.o: | ||
cd fast_float && $(MAKE) CFLAGS=" $(CFLAGS) " LDFLAGS=" $(LDFLAGS) " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default make target is the first target in the file. Here, you're adding this target above the all
target, so when I type make
, it only builds this file.
$ make
CC Makefile.dep
cd fast_float && make CFLAGS=" -funwind-tables " LDFLAGS=" "
make[1]: Entering directory '/home/viktor/repos/valkey/src/fast_float'
g++ -std=c++11 -O3 -fPIC -D FASTFLOAT_ALLOWS_LEADING_PLUS -funwind-tables -c fast_float_strtod.cpp -o fast_float_strtod.o
make[1]: Leaving directory '/home/viktor/repos/valkey/src/fast_float'
This needs to be somewhere below all:
.
I tried changing this to get rid of the separate Makefile under src/fast_float/ and to use the same optimization and debug flags as for the rest of Valkey, because it can be configured for debugging, etc.
I just added this code below the rules for %.o:
and unit/%.o:
ifeq ($(USE_FAST_FLOAT),yes)
CXX ?= c++
CXXFLAGS += -Wall -W $(OPT) $(DEBUG)
fast_float/fast_float_strtod.o: fast_float/fast_float_strtod.cpp ../deps/fast_float/fast_float.h
$(QUIET_CXX)$(CXX) $(CXXFLAGS) $(CFLAGS) $(SERVER_CFLAGS) -D FASTFLOAT_ALLOWS_LEADING_PLUS -c $< -o $@
endif
The QUIET_CXX
variable is just for nice output, because we have something similar for CC. The definition next to QUIET_CC:
ifndef V
QUIET_CC = @printf ' %b %b\n' $(CCCOLOR)CC$(ENDCOLOR) $(SRCCOLOR)$@$(ENDCOLOR) 1>&2;
+QUIET_CXX = @printf ' %b %b\n' $(CCCOLOR)C++$(ENDCOLOR) $(SRCCOLOR)$@$(ENDCOLOR) 1>&2;
QUIET_GEN = @printf ' %b %b\n' $(CCCOLOR)GEN$(ENDCOLOR) $(SRCCOLOR)$@$(ENDCOLOR) 1>&2;
QUIET_LINK = @printf ' %b %b\n' $(LINKCOLOR)LINK$(ENDCOLOR) $(BINCOLOR)$@$(ENDCOLOR) 1>&2;
With these changes, I don't see the need for the separate sub directory src/fast_float/
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default make target is the first target in the file. Here, you're adding this target above the all target, so when I type make, it only builds this file.
Good catch. I was too busy testing "make test" across all platforms to focus on normal make. Fixed.
Working on the Makefile changes, but I am against a single monolith Makefile. Experienced valkey developers may be able to make out what's happening there but new developers will have significant friction. If everyone else is in agreement, I will push that change.
src/fast_float/fast_float_strtod.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call this file src/valkey_strtod.h
, named after the function it provides, rather than what it uses behind the scenes. Fast float is just one of the two implementations it can use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, strtod has the signature double strtod(const char *str, char **endptr);
, I think we should ideally conform so that it's more normal to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, strtod has the signature double strtod(const char *str, char **endptr);, I think we should ideally conform so that it's more normal to follow.
I changed the signature to remove some if checks. endptr can be null. Some of the invocations pass null and others don't. I think this is a minor change.
Let's call this file src/valkey_strtod.h,
Done.
Co-authored-by: Viktor Söderqvist <[email protected]> Signed-off-by: Parth <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]> Signed-off-by: Parth <[email protected]>
* also moved fast_float target below all so that all remains default target. Signed-off-by: Parth Patel <[email protected]>
* Also removed doxygen tags. Signed-off-by: Parth Patel <[email protected]>
I pasted this earlier somewhere, but this issue has the build log for broken "make test-unit" invocation. |
Signed-off-by: Parth Patel <[email protected]>
Signed-off-by: Parth Patel <[email protected]>
For other following this PR: I, @madolson and @zuiderkwast discussed the right folder structure for this change on slack. We decided that the very first version(minus the git submodule part) is the preferred folder/file structure, i.e. deps/fast_float contains the vendor library and deps/fast_float_integration contains the integration code. I will revert this PR to that structure and then we can discuss CMake related changes. |
Signed-off-by: Parth Patel <[email protected]>
deps/README.md
Outdated
functions for `float` and `double` types as well as integer types. These functions convert ASCII strings representing decimal values (e.g., `1.3e10`) into binary types. We provide exact rounding (including | ||
round to even). In our experience, these `fast_float` functions many times faster than comparable number-parsing functions from existing C++ standard libraries. | ||
|
||
Specifically, `fast_float` provides the following two functions to parse floating-point numbers with a C++17-like syntax (the library itself only requires C++11): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem relevant.
But what I think is very important to mention is the distinction between the external dependency and our integration code. That was the source of this long debate.
Specifically, `fast_float` provides the following two functions to parse floating-point numbers with a C++17-like syntax (the library itself only requires C++11): | |
The `fast_float` directory contains the C++ header file from the fast float project. | |
The `fast_float_integration` directory contains code to be able to use fast float from C. The integration code is written for Valkey. It's not from the fast float project. |
.PHONY: all clean | ||
all: valkey_strtod.o | ||
clean: | ||
rm -f *.o || true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With -f
, rm never fails.
rm -f *.o || true; | |
rm -f *.o |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does on mac.
/tmp % rm -f *.0
zsh: no matches found: *.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-(rm -f *.o)
I guess would be the normal makefile way, to ignore the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MacOS have ridiculously outdated coreutils, because they refused to lift to GPLv3. I guess it's one of the reasons I'm running fedora on my macbook instead of macOS, hehe.
|
||
extern "C" | ||
{ | ||
const char* fast_float_strtod(const char *str, double *value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@madolson you mentioned you don't want a file called fast_float_strtod.cpp
so here we instead have a file called valkey_strtod.cpp
which provides a function called fast_float_strtod
. Is this what you had in mind? To me, it seems like some weird compromise that just makes it all more confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a generic file that includes both implementations, valkey_strtod
makes sense. If it's just a fast float implementation, fast_float_strtod.cpp
makes sense.
const char *fast_float_strtod(const char *str, double *value); | ||
|
||
static inline const char *valkey_strtod(const char *str, double *value) { | ||
errno = 0; | ||
return fast_float_strtod(str, value); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
I'm fine with the directory name
fast_float_c_interface
, although it's not what we agreed on in the long chat. We agreed onfast_float_inteface
. If we keep changing things after making decisions, we have to make new decisions over and over. -
This file provides a static inline
valkey_strtod
function and it's located underdeps/fast_float_c_interface/
, but it doesn't always use fast float. It only conditionally uses it.The purpose of naming the wrapper
valkey_strtod
rather thanfast_float_strtod
was that it should be implementation agnostic to the caller. When it's placed in this directory, you are led to believe it's using fast float even when it isn't. I see that files needing strtod do#include "../deps/fast_float_c_interface/valkey_strtod.h
to get thevalkey_strtod
function.We have three pieces here:
- the external fast float C++ header (
deps/fast_float/fast_float.h
) - interface code to be able to call C++ from C. IMO this should be specific to fast float and be called something like
deps/fast_float_c_interface/fast_float_strtod.{cpp,h}
, but @madolson you don't like this name? - conditional code to choose between the fast float and the stdlib implementations. IMO this should be placed in
src/valkey_strtod.h
.
For iii, compare this to jemalloc. The conditional code for which allocator we use is abstracted by zmalloc and is placed in
src/zmalloc.{c,h}
.If you think differently, please convince me. I may be wrong again. :)
- the external fast float C++ header (
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zuiderkwast I was expecting this file to end up under src/*, since as you mentioned its our code. This is not CPP code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we agree to use deps/fast_float_c_interface/fast_float_strtod.cpp
for the interfacing code and src/valkey_strtod.h
for our conditional code then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving valkey_strtod.h to src/.
.github/workflows/ci.yml
Outdated
@@ -116,7 +116,7 @@ jobs: | |||
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1 | |||
- name: make | |||
run: | | |||
sudo apt-get update && sudo apt-get install libc6-dev-i386 | |||
sudo apt-get update && sudo apt-get install libc6-dev-i386 libstdc++-11-dev-i386-cross gcc-multilib g++-multilib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to add this C++ stuff in jobs where we don't use USE_FAST_FLOAT=yes
? It looks like a mistake.
I don't see that we use USE_FAST_FLOAT=yes
anywhere in the CI job. I think we should use it in at least one of the CI test jobs. I suggest we use it in test-ubuntu-latest
, or maybe in most of the jobs but not all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is old commit. I will add a separate job.
Signed-off-by: Madelyn Olson <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]> Signed-off-by: Parth <[email protected]>
* Moved valkey-strtod.h back to src/ * Instead of passing OPT, STD etc from src/Makefile, fast_float Makefile explicitly specifies what it needs. Signed-off-by: Parth Patel <[email protected]>
Signed-off-by: Parth Patel <[email protected]>
Signed-off-by: Parth Patel <[email protected]>
…interface Makefile Signed-off-by: Parth Patel <[email protected]>
Signed-off-by: Parth Patel <[email protected]>
This is an alternative to this PR request by @swaingotnochill : #1170
This allows fast_float to remain a submodule that we can pull from for updates.
---- Update after the discussion
I am dropping the submodule idea for now as it requires a larger discussion about release process. I don't have access to @swaingotnochill's repository or PR. Therefore, I pulled his commit into this CR to maintain his author-ship on the code he wrote. I integrated it with Redis and fixed the Makefile. It should build now and will be ready to push.
I will work on benchmarking this separately. I can also turn off fast_float by default if folks have concerns and test it on my branch.