feat: improved cross platform metric collection #2834

Open · 16 commits to merge into base branch master

NathanSavageKaimai

This PR improves CPU and RAM metric collection across multiple environments. The CPU metrics are now fully cgroup-aware and report correctly in containerised environments with CPU quota limits. The memory profiling method on Windows has been updated from the legacy WMIC to PowerShell. Finally, two new utility methods have been added to general.ts in @crawlee/utils to determine whether the scraper is containerised (rather than just running in Docker) and whether cgroups are enabled.

Adds

@crawlee/utils

general.ts

  • isContainerised(): an extension of isDocker() that also checks for a KUBERNETES_SERVICE_HOST environment variable (for Kubernetes) and a CRAWLEE_CONTAINERISED environment variable (for manual control).
  • getCgroupsVersion(): determines the cgroup version in a cgroup-controlled environment by checking whether /sys/fs/cgroup/memory/ exists. If it is present, the cgroup version is 1; otherwise it is 2.
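
The two checks described above can be sketched as pure functions. This is a minimal illustration, not the code in this PR: the names `looksContainerised` and `detectCgroupsVersion` are hypothetical, the Docker check is passed in as a precomputed boolean, and the file-system probe is injectable so the logic can be exercised without a real cgroup mount.

```typescript
import { existsSync } from 'node:fs';

type Env = Record<string, string | undefined>;

// Hypothetical helper: combines the three signals named in the description.
// `runningInDocker` stands in for the result of the existing isDocker() check.
export function looksContainerised(env: Env, runningInDocker: boolean): boolean {
    return (
        runningInDocker ||
        Boolean(env.KUBERNETES_SERVICE_HOST) || // set by Kubernetes inside pods
        Boolean(env.CRAWLEE_CONTAINERISED) // manual override
    );
}

// Hypothetical helper: follows the rule from the description.
// If /sys/fs/cgroup/memory/ exists, assume cgroup v1, otherwise v2.
export function detectCgroupsVersion(
    pathExists: (path: string) => boolean = existsSync,
): 1 | 2 {
    return pathExists('/sys/fs/cgroup/memory/') ? 1 : 2;
}
```

Injecting `pathExists` is only a design choice for testability; a real implementation could read the file system directly.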

cpu-info.ts

Collects CPU information in a similar manner to memory-info.ts.

  • getCurrentCpuTicks(): the existing solution. Used in AWS Lambda, in containerised environments without a cgroup CPU limit, and on bare metal.
  • getCpuQuota(): gets the CPU quota in cgroup-controlled environments.
  • getCpuPeriod(): gets the CPU quota period in cgroup-controlled environments.
  • getContainerCpuUsage(): gets the container's CPU usage.
  • getSystemCpuUsage(): gets the system's CPU usage.
  • getCpuInfo(): the main method for collecting CPU load metrics. Determines the environment and calls the other functions accordingly.
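
To illustrate how a quota and a period turn into a CPU limit, here is a minimal sketch. It assumes the standard kernel interfaces: cgroup v2 exposes a `cpu.max` file containing `<quota> <period>` (with the literal `max` meaning no limit), and cgroup v1 exposes `cpu.cfs_quota_us` (where `-1` means no limit) and `cpu.cfs_period_us`. The function names are hypothetical and are not the ones added in cpu-info.ts.

```typescript
export interface CpuLimit {
    quota: number | null; // microseconds of CPU time per period, null = unlimited
    period: number; // length of the period in microseconds
}

// Parses the contents of a cgroup v2 `cpu.max` file, e.g. "200000 100000" or "max 100000".
export function parseCgroupV2CpuMax(content: string): CpuLimit {
    const [quotaRaw, periodRaw] = content.trim().split(/\s+/);
    return {
        quota: quotaRaw === 'max' ? null : Number(quotaRaw),
        period: Number(periodRaw),
    };
}

// Parses the contents of cgroup v1 `cpu.cfs_quota_us` and `cpu.cfs_period_us`.
export function parseCgroupV1Cpu(quotaContent: string, periodContent: string): CpuLimit {
    const quota = Number(quotaContent.trim());
    return {
        quota: quota > 0 ? quota : null, // -1 means "no quota"
        period: Number(periodContent.trim()),
    };
}

// Number of CPUs the container may use: quota / period, capped by the host CPU count.
export function effectiveCpuCount(limit: CpuLimit, hostCpus: number): number {
    if (limit.quota === null || Number.isNaN(limit.quota) || !(limit.period > 0)) {
        return hostCpus;
    }
    return Math.min(hostCpus, limit.quota / limit.period);
}
```

For example, a quota of 200000 with a period of 100000 corresponds to two full CPUs, so load should be measured against two cores even on an eight-core host.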

Removes

@apify/ps-tree

A legacy package that checked memory usage using WMIC.exe (deprecated) on Windows or ps on *nix. Replaced by @crawlee\packages\utils\src\internals\psTree.ts, which calculates memory usage in a similar manner but uses PowerShell and Get-CimInstance Win32_Process. The replacement also adds type safety.
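
Whichever command produces the process list (ps or Get-CimInstance Win32_Process), the platform-independent part is walking from a root PID to its descendants. The sketch below shows that step only, under the assumption that the list has already been parsed into `{ pid, ppid }` rows. `ProcRow` and `collectDescendants` are hypothetical names, not the types used in psTree.ts.

```typescript
export interface ProcRow {
    pid: number;
    ppid: number;
}

// Returns all descendants of `rootPid` from a flat process list (breadth-first).
// When `includeRoot` is true, the root process itself is included if it is in the list.
export function collectDescendants(
    rows: ProcRow[],
    rootPid: number,
    includeRoot = false,
): ProcRow[] {
    // Index children by parent PID so each lookup is O(1).
    const childrenByParent = new Map<number, ProcRow[]>();
    for (const row of rows) {
        const siblings = childrenByParent.get(row.ppid) ?? [];
        siblings.push(row);
        childrenByParent.set(row.ppid, siblings);
    }

    const result: ProcRow[] = [];
    if (includeRoot) {
        const root = rows.find((row) => row.pid === rootPid);
        if (root) result.push(root);
    }

    // `seen` guards against cycles, e.g. a process that lists itself as its parent.
    const seen = new Set<number>([rootPid]);
    const queue: number[] = [rootPid];
    while (queue.length > 0) {
        const parent = queue.shift() as number;
        for (const child of childrenByParent.get(parent) ?? []) {
            if (seen.has(child.pid)) continue;
            seen.add(child.pid);
            result.push(child);
            queue.push(child.pid);
        }
    }
    return result;
}
```

Keeping this step separate from the OS-specific parsing means it can be unit tested with plain arrays on any platform.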

Fixes

Fixes: #2771

@NathanSavageKaimai
Author

Hey all, thanks for agreeing to take a look. I haven't done much OSS before, so I'm looking forward to hearing your thoughts :). Assuming it all looks good to you, when might it be incorporated into a Crawlee release? At work we have a project that hinges on using Crawlee in Kubernetes, so the autoscaling issues in containers are causing a fairly significant problem for us.

When you come to review it, I'd be happy to hop on a Discord call and discuss it. :)

Thanks for everything you do!

@janbuchar
Contributor

Hi @NathanSavageKaimai and thanks for your willingness to contribute! In the issue that this aims to close, you mentioned the possibility of using the ps-list package. If we decided that adding another dependency is fine, could the change be smaller? How much? Are there any other tradeoffs or possible disadvantages to using that library?

@NathanSavageKaimai
Author

Hi @janbuchar, ps-list would serve the same purpose as the new packages/utils/src/internals/psTree.ts file, so it would be a 170-line reduction. Another module I have found since that might be useful is systeminformation. It could likely replace cpu-info.ts, memory-info.ts and psTree.ts, leading to a ~540-line reduction. :)

Let me know if you would like me to explore these options.

@NathanSavageKaimai
Author

With ps-list, you are relying on a third-party binary which doesn't provide its source code, as far as I can tell. It could be a potential supply chain risk.

@vladfrangu
Member

I will +1 that worry, I'm not a fan of using a dependency that embeds a binary whose source code isn't directly open source / one we could build ourselves and embed

@janbuchar
Contributor

Um, as far as I can tell, ps-list uses fastlist, which seems open enough to me - am I missing anything?

@NathanSavageKaimai
Author

@janbuchar Ah yep, I hadn't found the cpp repo. Still, because the binary is tracked externally, there's no automatic way to verify its authenticity beyond downloading from both sources and comparing the hashes.

@vladfrangu
Member

> Um, as far as I can tell, ps-list uses fastlist, which seems open enough to me - am I missing anything?

The fact the binary is just embedded in instead of precompiled (like impit) or built at install time is a worry imo

@janbuchar
Contributor

You both make a good point. I'd still consider exploring systeminformation - if we can avoid maintaining code for reading low level system details, it might be worth the increased install size.

@NathanSavageKaimai
Author

> You both make a good point. I'd still consider exploring systeminformation - if we can avoid maintaining code for reading low level system details, it might be worth the increased install size.

cool will do. :)

@NathanSavageKaimai
Author

@janbuchar I've had a play around with systeminformation and unfortunately it isn't as useful as I hoped it would be. It seems that the "docker" functions aren't generating the metrics themselves but are sending requests to the Docker socket for the data. This means that, short of mounting the socket within the container, it can only work on the host. I have done a good search and, as far as I can tell, there is no universal, cross-platform library to collect CPU and RAM metrics on "bare metal", cgroup v1 and cgroup v2. There might even be scope here for an entirely new package for Apify, but for now my approach seems to be the best one going forward. :)

@janbuchar
Contributor

Sounds reasonable, thanks! I will try to review the code in depth this week.

@NathanSavageKaimai
Author

Hey @janbuchar, have you been able to have a look yet? No worries either way. If you like, we can sit down for a call later. Just shoot me a message on Discord - crafty5064. I'm free until 2pm UTC or all day Saturday. :)

@NathanSavageKaimai
Author

Hi all,

I hope you had a good weekend! Have you had a chance to review these changes? I was hoping they might be included in a release soon, as these issues are blocking my company from fully deploying our product to Kubernetes. If you would like a chat, I'm free until 1pm UTC tomorrow. :)

@janbuchar
Contributor

Hi, I'll look into it tomorrow. Sorry for the delay!

Contributor

@janbuchar janbuchar left a comment

Good job on this one! I have a bunch of readability/code structure comments. More importantly though, the tests here are very narrowly scoped and use mocking heavily. Do you think you could add an E2E test that would verify that the system information detection works as expected? Feel free to suggest any other way to test this as a whole.

test/core/autoscaling/snapshotter.test.ts (resolved)
test/utils/general.test.ts (outdated, resolved)
packages/core/src/events/local_event_manager.ts (outdated, resolved)
@@ -43,6 +43,35 @@ export async function isDocker(forceReset?: boolean): Promise<boolean> {
    return isDockerPromiseCache;
}

/**
 * Returns a `Promise` that resolves to true if the code is running in a containerised environment.
 * Returns true if the CRAWLEE_CONTAINERISED environment variable is set.

Contributor

Who is supposed to set the CRAWLEE_CONTAINERISED environment variable?

Author

That is meant to be a manual way to run the containerised resource checks in case the other heuristics don't catch it.

Contributor

Got it. Could you add it to the documentation?

Author

Would you mind showing me where? I'm a little bit lost on that side of it, ta.

I'm free for a call right now if you would like a chat. :)

Contributor

 * @param includeRoot - Optional flag. When true, include the process with the given PID if found.
 * Defaults to false.
 */
export async function psTree(pid: number | string, includeRoot: boolean = false): Promise<ProcessInfo[]> {

Contributor

This is a very long and complex function. Possibly the main reason why it's hard to read is that it combines the implementation for UNIX and Windows in a single function. Could it be broken down into multiple smaller functions?

Author

It was pretty much a copy-paste from apify/pstree with the WMIC changes. I can reformat it. :)

Contributor

Please do that, apify/pstree is super dated and I'm sure that if we don't refactor it now, we won't get back to it, ever.

Author

cool will do :)

packages/utils/src/internals/memory-info.ts (outdated, resolved)
packages/utils/src/internals/memory-info.ts (outdated, resolved)
packages/utils/src/internals/cpu-info.ts (resolved)
packages/utils/src/internals/cpu-info.ts (outdated, resolved)

@NathanSavageKaimai
Author

Hi @janbuchar, thanks for your insight! Most of the points were just me trying to follow the conventions set out in the pre-existing memory-info.ts file, but I will definitely work on clarifying it. :)

@janbuchar
Contributor

@NathanSavageKaimai please look into refactoring of psTree so that it's more readable.

Also, any thoughts regarding this?

> More importantly though, the tests here are very narrowly scoped and use mocking heavily. Do you think you could add an E2E test that would verify that the system information detection works as expected? Feel free to suggest any other way to test this as a whole.

@NathanSavageKaimai
Author

> @NathanSavageKaimai please look into refactoring of psTree so that it's more readable.
>
> Also, any thoughts regarding this?
>
> > More importantly though, the tests here are very narrowly scoped and use mocking heavily. Do you think you could add an E2E test that would verify that the system information detection works as expected? Feel free to suggest any other way to test this as a whole.

It's a difficult one given that it's so close to the metal. An E2E test would depend on the current state of the test runner unless I mocked the exec call and readline interface, but at that point it may as well be a unit test. Also, I am personally only set up to run tests on Windows, or on Linux through WSL, so I can't verify macOS compatibility beyond "it's a copy-paste from a solution that presumably worked". What I will do is reimplement the unit tests in apify/pstree.

@janbuchar
Contributor

> What I will do is reimplement the unit tests in apify/pstree

Cool, that will at least give us some certainty that ps-tree works.

I guess we could make a script that would show the current CPU and memory usage ratio and compare it with the old implementation. Then we could at least test-drive this on several machines with different OS and see if it behaves reasonably. What do you think?
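
A comparison script of the kind suggested above could start from something like the following sketch. It is not part of the PR: it only covers the non-containerised path, takes two snapshots of the tick counters reported by Node's `os.cpus()` and derives the used ratio from the deltas, and reads memory from `os.freemem()` and `os.totalmem()`. The names `readCpuTicks`, `cpuUsedRatio` and `memUsedRatio` are hypothetical.

```typescript
import * as os from 'node:os';

export interface CpuTicks {
    idle: number;
    total: number;
}

// Sums the tick counters of all logical CPUs as reported by the OS.
export function readCpuTicks(): CpuTicks {
    let idle = 0;
    let total = 0;
    for (const cpu of os.cpus()) {
        const { user, nice, sys, idle: idleTicks, irq } = cpu.times;
        idle += idleTicks;
        total += user + nice + sys + idleTicks + irq;
    }
    return { idle, total };
}

// Fraction of CPU time spent doing work between two snapshots (0 = idle, 1 = saturated).
export function cpuUsedRatio(prev: CpuTicks, curr: CpuTicks): number {
    const totalDelta = curr.total - prev.total;
    if (totalDelta <= 0) return 0; // no ticks elapsed, nothing to report
    const idleDelta = curr.idle - prev.idle;
    return 1 - idleDelta / totalDelta;
}

// Fraction of system memory currently in use, as seen by the OS (not cgroup-aware).
export function memUsedRatio(): number {
    return 1 - os.freemem() / os.totalmem();
}
```

Running this alongside the new implementation on a few machines, and printing both results at the same interval, would give a rough sanity check. Inside a container with a CPU quota the two numbers are expected to differ, since this sketch measures against all host cores.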

Development

Successfully merging this pull request may close these issues.

Get rid of dependency on deprecated WMIC on Windows
3 participants