Skip to content

[receiver/hostmetrics] Cheaper parent PID and number of threads retrieval on Windows #38589

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

pjanotti
Copy link
Contributor

@pjanotti pjanotti commented Mar 12, 2025

(To replace PR #35337)

Reduces the cost of getting number of threads and parent process ID by calling CreateToolhelp32Snapshot to do process enumeration. The data returned by this function already includes the number of threads and parent process ID - and avoiding calling CreateToolhelp32Snapshot for every enumerated process.

Fix #32947

> go  test -benchmem -run=^$ -bench ^BenchmarkGetProcessMetadata$ -benchtime 10s
goos: windows
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/processscraper
cpu: Intel(R) Core(TM) Ultra 7 165H
BenchmarkGetProcessMetadata/Old-IncludeParentPid-22                    3        3928594600 ns/op        28357514 B/op      13029 allocs/op
BenchmarkGetProcessMetadata/New-IncludeParentPid-22                  171          69861262 ns/op        28222817 B/op      12591 allocs/op
BenchmarkGetProcessMetadata/Old-ExcludeParentPid-22                  172          68011864 ns/op        28232550 B/op      12614 allocs/op
BenchmarkGetProcessMetadata/New-ExcludeParentPid-22                  169          72172583 ns/op        28351193 B/op      12647 allocs/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/processscraper  81.722s
> go  test -benchmem -run=^$ -bench ^BenchmarkGetProcessMetadata$ -benchtime 30s
goos: windows
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/processscraper
cpu: Intel(R) Core(TM) Ultra 7 165H
BenchmarkGetProcessMetadata/Old-IncludeParentPid-22                    8        4115364238 ns/op        28797698 B/op      13220 allocs/op
BenchmarkGetProcessMetadata/New-IncludeParentPid-22                  514          70165003 ns/op        28702086 B/op      12800 allocs/op
BenchmarkGetProcessMetadata/Old-ExcludeParentPid-22                  552          70230804 ns/op        28558975 B/op      12750 allocs/op
BenchmarkGetProcessMetadata/New-ExcludeParentPid-22                  504          73404366 ns/op        28507275 B/op      12707 allocs/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/processscraper  170.587s

image

@github-actions github-actions bot requested review from braydonk and dmitryax March 12, 2025 23:41
@braydonk
Copy link
Contributor

Thank you for making this PR @pjanotti! It was helpful for me to understand what you meant; I didn't realize CreateToolhelp32Snapshot could be used in this way. This is awesome, I like it a lot better than what I did.

Comparing to the performance data I posted in the other PR, here's a pprof profile pulled using a similar setup.
config
profile

I accidentally removed old builds of my PR, but based on the profile I'm not sure it's particularly necessary to do a perfmon comparison; the profile shows this PR is either the same or better without increasing WMI usage.

Copy link
Contributor

@braydonk braydonk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what your plan is for integrating this, but here's my thoughts.

  • The existing getProcessHandlesInternal function gets renamed getProcessHandlesGopsutil
  • getProcessHandlesInternal becomes a new function in each process_scraper_*.go platform specific file. Most just shell out to getProcessHandlesGopsutil, whereas on Windows it will be the contents of getProcessHandlesInternalNew from this PR
  • Debatably, this might be enough of a deviation in behaviour to constitute a featuregate. I think this is safe enough that it could start as a Beta featuregate. If we do go with a featuregate, getProcessHandlesInternal in process_scraper_windows.go would check if the featuregate enabled, and if it's not then shell out to getProcessHandlesGopsutil.

I think with those things in place, the PR would be ready to come out of draft.

if err != nil {
errs.AddPartial(0, fmt.Errorf("error reading parent pid for process %q (pid %v): %w", executable.name, pid, err))
parentProcessID := int32(0)
if !s.config.ExcludeParentPid {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right that we don't need a specific config for this. I would like to still make the check conditional, but we can make it conditional on whether the process.parent_pid resource attribute is enabled or not.

While on Windows with the new process enumeration method getting ppid no longer costs anything, on other platforms it will still be the old style of doing a certain operation per-process. It's cheap on other platforms, but as a matter of principle I do think any work this scraper does should be conditional on whether the user actually enabled that attribute/metric.

@pjanotti
Copy link
Contributor Author

@braydonk I agree with your comments. I will start working on those changes later today.

@pjanotti pjanotti marked this pull request as ready for review March 18, 2025 05:04
@pjanotti pjanotti requested a review from a team as a code owner March 18, 2025 05:04
@pjanotti pjanotti changed the title [receiver/hostmetrics] DRAFT: Prototyping cheaper parent PID retrieval on Windows [receiver/hostmetrics] Cheaper parent PID and number of threads retrieval on Windows Mar 18, 2025

//go:build windows

package processscraper
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file could be removed when we remove the featureflag to fallback to the old behavior.

@@ -250,7 +250,7 @@ func (s *processScraper) getProcessMetadata(ctx context.Context) ([]*processMeta
}

command, err := getProcessCommand(ctx, handle)
if err != nil {
if err != nil && !s.config.MuteProcessAllErrors {
Copy link
Member

@dmitryax dmitryax Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we add muting here but the related change on line 277 still unmuted? Maybe remove this from this PR to fix consistently as a separate bug_fix if needed? Just filing an issue is fine as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point that is a leftover from benchmarking...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crobert-1 crobert-1 added the Run Windows Enable running windows test on a PR label Mar 21, 2025
@dmitryax dmitryax requested a review from braydonk March 21, 2025 19:03
@crobert-1
Copy link
Member

Failing unittest on windows is frequency of #38860, unrelated to this PR.

@dmitryax dmitryax merged commit 0c454f8 into open-telemetry:main Mar 21, 2025
198 of 200 checks passed
@github-actions github-actions bot added this to the next release milestone Mar 21, 2025
@pjanotti pjanotti deleted the win-hostmetricsreceiver-cheaper-get-parent-pid branch March 24, 2025 15:25
Fiery-Fenix pushed a commit to Fiery-Fenix/opentelemetry-collector-contrib that referenced this pull request Apr 24, 2025
…eval on Windows (open-telemetry#38589)

(To replace PR open-telemetry#35337)

Reduces the cost of getting number of threads and parent process ID by
calling `CreateToolhelp32Snapshot` to do process enumeration. The data
returned by this function already includes the number of threads and
parent process ID - and avoiding calling `CreateToolhelp32Snapshot` for
every enumerated process.

Fix open-telemetry#32947

```terminal
> go  test -benchmem -run=^$ -bench ^BenchmarkGetProcessMetadata$ -benchtime 10s
goos: windows
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/processscraper
cpu: Intel(R) Core(TM) Ultra 7 165H
BenchmarkGetProcessMetadata/Old-IncludeParentPid-22                    3        3928594600 ns/op        28357514 B/op      13029 allocs/op
BenchmarkGetProcessMetadata/New-IncludeParentPid-22                  171          69861262 ns/op        28222817 B/op      12591 allocs/op
BenchmarkGetProcessMetadata/Old-ExcludeParentPid-22                  172          68011864 ns/op        28232550 B/op      12614 allocs/op
BenchmarkGetProcessMetadata/New-ExcludeParentPid-22                  169          72172583 ns/op        28351193 B/op      12647 allocs/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/processscraper  81.722s
> go  test -benchmem -run=^$ -bench ^BenchmarkGetProcessMetadata$ -benchtime 30s
goos: windows
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/processscraper
cpu: Intel(R) Core(TM) Ultra 7 165H
BenchmarkGetProcessMetadata/Old-IncludeParentPid-22                    8        4115364238 ns/op        28797698 B/op      13220 allocs/op
BenchmarkGetProcessMetadata/New-IncludeParentPid-22                  514          70165003 ns/op        28702086 B/op      12800 allocs/op
BenchmarkGetProcessMetadata/Old-ExcludeParentPid-22                  552          70230804 ns/op        28558975 B/op      12750 allocs/op
BenchmarkGetProcessMetadata/New-ExcludeParentPid-22                  504          73404366 ns/op        28507275 B/op      12707 allocs/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/processscraper  170.587s
```


![image](https://github.com/user-attachments/assets/95cb7f75-8ffb-43c0-ac9a-cce1c673d576)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HostMetrics process scraper high CPU usage during collection on Windows Server 2019
5 participants