Skip to content
This repository was archived by the owner on Aug 13, 2019. It is now read-only.

Conversation

@FUSAKLA
Copy link
Contributor

@FUSAKLA FUSAKLA commented May 7, 2018

Hi, after consulting with @gouthamve on KubeCon I wanted to add feature to decode data blocks for easier debugging to the cmd tsdb tool.

Unfortunately there was problem with benchmark test file path and dependencies so I decided to fix the Makefile and since I saw all the issues on unifying makefiles in most of the prometheus repos I assumed you will have to use it here also.

I also added dependencies using govendor used in other prometheus repos (not sure if you would prefer dep or other tools).

The last thing is added -h | --human-readable flag for the tsdb ls command allowing user to display timestamps in human readable format.

@krasi-georgiev
Copy link
Contributor

krasi-georgiev commented May 8, 2018

@FUSAKLA you have 3 different PR's here :)

I think best to make this one for the common makefile and open separate one for the data blocks.

Regarding vendoring I think this was left out on purpose and with the new golang versioning proposal this becomes even less important.

@brian-brazil
Copy link
Contributor

Regarding vendoring I think this was left out on purpose

Yes, we don't vendor in libraries.

@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented May 8, 2018

Ok I'll split it up into 2 PR and omit the vendoring of libraries.
I'm sorry I probably should have consult it before.

@FUSAKLA FUSAKLA force-pushed the fus-cli-refactor branch from 6d6dc53 to 485fc6f Compare May 8, 2018 11:53
@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented May 8, 2018

I removed the vendoring and dependency management and left there only makefile refactoring as suggested.

I also created second PR containing the human-readable flag #325

@krasi-georgiev
Copy link
Contributor

Would it make sense to also enable the common tests in Travis so we know these work before merging?

@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented May 8, 2018

That sounds definitely reasonable but I have never worked with the Travis CI.

I'll take a look at it and figure it out. Thank's for suggestion.

@krasi-georgiev
Copy link
Contributor

This part is quite simple just look at the Prometheus .travis file

@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented May 8, 2018

I found several ways to achieve this and eventually decided to use env to expand matrix of tests

Is this what you had in mind? Or should I add also makefile common for the tsdb root so that the static check, races, package usage checks etc are run also for the whole library?

Maybe I misunderstood what did you mean by the common tests.

@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented May 8, 2018

At the end I added also Makefile.common for the root of tsdb so the tests run also for the tsdb library.

For now it fails due to problems with static check

>> running staticcheck
/home/travis/gopath/bin/staticcheck -ignore "" ./...
head.go:468:18: argument should be pointer-like to avoid allocations (SA6002)
head_test.go:665:6: this value of chunkCreated is never used (SA4006)
index/index_test.go:341:9: this value of err is never used (SA4006)
labels/labels_test.go:152:3: this value of m is never used (SA4006)
querier.go:595:6: this value of chks is never used (SA4006)
querier.go:595:19: this result of append is never used, except maybe in other appends (SA4010)
repair_test.go:51:2: this value of meta is never used (SA4006)
wal.go:907:19: argument should be pointer-like to avoid allocations (SA6002)
wal.go:912:19: argument should be pointer-like to avoid allocations (SA6002)
wal.go:917:19: argument should be pointer-like to avoid allocations (SA6002)
wal_test.go:288:2: this value of wal is never used (SA4006)

I'll take a look at those problems asap so the build passes (I hope it will be just minor changes I'm newbie to Go)

@krasi-georgiev
Copy link
Contributor

@krasi-georgiev
Copy link
Contributor

any code failures due to the new tests should be in a new PR to keep this as small as possible for an easy review.

if you can't figure our the failures just open an issue and someone else or I can help with that part.

TEST_DATA_DIR ?= "$(PROJECT_DIR)/testdata"
TEST_DATASET ?= "20kseries.json"
BENCH_DIR ?= "./benchout"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% sure, but I think any env variables used in the Makefile.common should be set before
include ../../Makefile.common

@krasi-georgiev
Copy link
Contributor

@FUSAKLA still interested in finishing this? feel free to ping me on IRC if you get stuck anywhere.

@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented Jun 7, 2018

Hi, yeah sorry for the inactivity I'll try to get to this ASAP

I started a new branch containing fixes for the CI fails on root of the TSDB project which will be introduced by this PR (there are license issues, code not belonging to any packages and so on) so I can send PR fixing the broken CI along with this.

@krasi-georgiev
Copy link
Contributor

@FUSAKLA would you have time to revisit this?

@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented Sep 14, 2018

Sorry I completely forgot about this while working on other things, I'll get back to it.

@krasi-georgiev
Copy link
Contributor

Thanks

@FUSAKLA FUSAKLA force-pushed the fus-cli-refactor branch 3 times, most recently from 3938ce6 to 6800d72 Compare September 15, 2018 07:45
@FUSAKLA FUSAKLA mentioned this pull request Sep 15, 2018
@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented Sep 15, 2018

@krasi-georgiev Once more sorry for the delay.

I moved the variables above the include, Updated the Makefile.common with most recent version from the prometheus/prometheus repo and opened new PR #380 with some basic fixes of the license issues for now.

You can see for this PR CI fails due to the licence issues
https://travis-ci.org/prometheus/tsdb/jobs/428919535#L552

In the PR with license fixes it fails only on the staticcheck for which I don't have a fix yet.
https://travis-ci.org/prometheus/tsdb/jobs/428925672#L571

.travis.yml Outdated

script:
- go test -timeout 5m ./...
- make check_license style unused test staticcheck
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah lets add only the tests that don't fail and will add more after we fix the problems to keep the PR easy to review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make style unused test don't fail so lets add those and than will add check_license in another PR and staticcheck after that.

@@ -1,11 +1,49 @@
# Copyright 2018 The Prometheus Authors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say lets move this in the root folder and have a single Makefile

@FUSAKLA FUSAKLA force-pushed the fus-cli-refactor branch 2 times, most recently from ecb8d92 to eac6721 Compare September 15, 2018 11:22
@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented Sep 15, 2018

Ok all green, Now there is only one Makefile and failing checks are omitted and described why.

@krasi-georgiev
Copy link
Contributor

Rushing it a bit again 😄
The benchmarking improvements are great, but lets make these in a separate PR.
Let keep this one on topic so that github shows the proper diffs.

  1. Add and use Makefile.common.
  2. Move cmd/tsdb/Makefile into root: /Makefile. Just make sure the make bench works as expected.

The rest lets do in the following PR.

I think you need to do git add -A to detect that the Makefile was moved rather than deleted and recreated. This is so it shows proper diffs.

I will merge quickly so you can PR the benchmark improvements right away.

@FUSAKLA FUSAKLA force-pushed the fus-cli-refactor branch 2 times, most recently from 3e8941f to 379d75a Compare September 15, 2018 13:15
@krasi-georgiev
Copy link
Contributor

I will re-trigger and will check it as well locally.

@krasi-georgiev
Copy link
Contributor

I think it it because one of the tests doesn't revert the state
here is a PR with the fix
#372

can you revert the changes in the folder: testdata/repair_index_version and do another push.

@FUSAKLA FUSAKLA force-pushed the fus-cli-refactor branch 2 times, most recently from 1169354 to d68a8c9 Compare September 16, 2018 16:04
@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented Sep 16, 2018

That's not what troubles me (i wanted to retriger the build once again but by mistake I added also the test files) but if you look at the last builds now there is one(go1.9) failed and the reason is killed and the other(go1.10) passed. And before it fas the other way around, 1.10 failed and 1.9 passed

@krasi-georgiev
Copy link
Contributor

yeah flaky. Will try to replicate locally.

@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented Sep 16, 2018

locally it works just fine
I found this issue https://stackoverflow.com/questions/40919256/older-go-versions-on-travis-are-killed-in-the-middle-of-running-tests
maybe we could be hitting memory limist and get OOMed?

Yeah reading more issues on this and seems to be quite often issue with Travis

@krasi-georgiev
Copy link
Contributor

it hangs on me every time as well when running locally with:

go test -v --race -count=1 -coverprofile=/tmp/go-code-cover -timeout 60s github.com/prometheus/tsdb/wal

so weird.

@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented Sep 16, 2018

Yeah but it just takes longer, I'm getting

=== RUN   TestReader
--- PASS: TestReader (0.06s)
	wal_test.go:131: test 0
	wal_test.go:140: record 0
	wal_test.go:140: record 1
	wal_test.go:140: record 2
	wal_test.go:140: record 3
	wal_test.go:131: test 1
	wal_test.go:140: record 0
	wal_test.go:131: test 2
	wal_test.go:131: test 3
	wal_test.go:131: test 4
	wal_test.go:131: test 5
	wal_test.go:131: test 6
	wal_test.go:131: test 7
	wal_test.go:140: record 0
=== RUN   TestWAL_FuzzWriteRead
--- PASS: TestWAL_FuzzWriteRead (129.36s)
=== RUN   TestWAL_Repair
=== RUN   TestWAL_Repair/bad_fragment_sequence
=== RUN   TestWAL_Repair/bad_fragment_flag
=== RUN   TestWAL_Repair/bad_checksum
=== RUN   TestWAL_Repair/bad_length
=== RUN   TestWAL_Repair/bad_content
--- PASS: TestWAL_Repair (0.34s)
    --- PASS: TestWAL_Repair/bad_fragment_sequence (0.05s)
    --- PASS: TestWAL_Repair/bad_fragment_flag (0.07s)
    --- PASS: TestWAL_Repair/bad_checksum (0.07s)
    --- PASS: TestWAL_Repair/bad_length (0.07s)
    --- PASS: TestWAL_Repair/bad_content (0.08s)
PASS
coverage: 77.0% of statements
ok  	github.com/prometheus/tsdb/wal	130.924s	coverage: 77.0% of statements

Reading this issue using sudo in Travis should really give us more memory so I'll tray to enable it and will see.

Next option would be to test only go 1.10

@krasi-georgiev
Copy link
Contributor

yeah for some reason the TestWAL_FuzzWriteRead takes very long.

I will run bisect to check which commit causes this.

@krasi-georgiev
Copy link
Contributor

krasi-georgiev commented Sep 16, 2018

just figured it
-race cases the test to take that long.

@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented Sep 16, 2018

yes, the issue was still the memory consumption.
I switched to the sudo: required and both checks passed now.

Are you ok with leaving the sudo enabled?
documentation reference here https://docs.travis-ci.com/user/reference/overview/#sudo-enabled

@krasi-georgiev
Copy link
Contributor

krasi-georgiev commented Sep 16, 2018

Yeah I don't see any other way to solve it in this PR.
I will also trigger it few times as well to make sure it passes every time.

Just put a comment why it is enabled and I will let @SuperQ and @simonpasquier also comment if this is ok to leave is for now as they are more familiar with Travis.

We will also open an issue to remind us to check how to reduce the memory and duration of the TestWAL_FuzzWriteRead test when -race is enabled.

@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented Sep 16, 2018

Yep, it passed again now I just added the comment. You can check if it's sufficient.
Thanks for the help

@krasi-georgiev
Copy link
Contributor

passes every time now!

@krasi-georgiev
Copy link
Contributor

@FUSAKLA would you mind rebasing to master again as I added 2 more tests that increase the test time by another 20 secodns.

#382

@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented Sep 17, 2018

passed again after rebase, could you retry it few times to make sure it's ok?

@simonpasquier
Copy link
Contributor

No problem with sudo: required, the downside is that the tests take a little more time to run but it is the only way to get more memory with Travis.

Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@krasi-georgiev krasi-georgiev merged commit dbd765a into prometheus-junkyard:master Sep 21, 2018
@krasi-georgiev
Copy link
Contributor

@FUSAKLA Thanks.
Feel free to open a PR with the other improvements you had in mind.

@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented Sep 21, 2018

I was also looking also at running CI test for windows(due to recent issues in prometheus 2.4.0 release) but neither Travis or CircleCI offers this possibility unfortunately :/

But I found https://www.appveyor.com/ which offers free pricing for open-source projects and allows to run CI in windows environment. Maybe worth looking at?

But that's for another PR :) Thanks for the review!

@krasi-georgiev
Copy link
Contributor

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants