-
Notifications
You must be signed in to change notification settings - Fork 2.7k
[receiver/hostmetrics] Use native API instead of WMI to get process.handles
on Windows
#38886
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[receiver/hostmetrics] Use native API instead of WMI to get process.handles
on Windows
#38886
Conversation
…pjanotti/opentelemetry-service-contrib into processscraper-optimize-process.handles
@@ -957,15 +979,15 @@ func TestScrapeMetrics_ProcessErrors(t *testing.T) { | |||
executableError = test.cmdlineError | |||
} | |||
|
|||
expectedResourceMetricsLen, expectedMetricsLen := getExpectedLengthOfReturnedMetrics(test.nameError, executableError, test.timesError, test.memoryInfoError, test.memoryPercentError, test.ioCountersError, test.pageFaultsError, test.numThreadsError, test.numCtxSwitchesError, test.numFDsError, test.rlimitError, test.cgroupError, test.createTimeError) | |||
expectedResourceMetricsLen, expectedMetricsLen := getExpectedLengthOfReturnedMetrics(test.nameError, executableError, test.timesError, test.memoryInfoError, test.memoryPercentError, test.ioCountersError, test.pageFaultsError, test.numThreadsError, test.numCtxSwitchesError, test.numFDsError, test.handleCountError, test.rlimitError, test.cgroupError, test.createTimeError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed the pattern already in place, but, the number of parameters is getting too big. I will follow up with a chore PR to pass all of these in a struct or map.
Test failure is unrelated to this PR, following up in #38903
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM, left a comment about process.open_file_descriptors
based on how we're going to be handling it in Semantic Conventions.
@@ -170,7 +170,10 @@ metrics: | |||
process.open_file_descriptors: | |||
enabled: false | |||
description: Number of file descriptors in use by the process. | |||
extended_documentation: This metric is only available on Linux. | |||
extended_documentation: >- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This metric is kind of in flux in Semantic Conventions (see open-telemetry/semantic-conventions#1798) and will likely be designated Unix-exclusive for reasons stated in this comment. We're still in the process of figuring out how we'll handle all the semconv transition, but we like to get ahead of it where we can.
As a result, I think this metric should remain Linux-exclusive as it was already, and produce an error if enabled on Windows. I think this used to be the behaviour, but it inadvertently changed on a gopsutil
upgrade when this was implemented on Windows upstream. I actually forgot this was done (and was kind of hoping they wouldn't do it this way I wanted them to introduce a different API for it but the codebase wasn't structured to allow for that at the time).
I think the best thing to do here would be to introduce some validation checks in the process scraper config to fail if enabling open_file_descriptors
on Windows, or process.handles
on Linux. However, this would be a breaking change; the previous behaviour is that the scrape would fail every time trying to write those metrics. Maybe I'll follow this up with a featuregated move to add validation for these metrics instead of allowing them and failing them every scrape.
Sorry for the long comment, but to summarize: I think we should change this to erroring on every scrape if process.open_file_descriptors
is enabled on Linux, similar to process.handles
. It can produce a platform not supported error similar to the handles one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the context @braydonk - I agree open_file_descriptors
should error on Windows, as process.handles
should error on non-Windows. I tend to think that reporting the error once at startup is better than on every scrape, but, this is something that we can discuss later.
I liked your suggestion for the names:
process.unix.file_descriptor.count
process.windows.handle.count
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I subscribed to the spec issue, as soon as it is settled, let's move to implement it.
…handles` on Windows (open-telemetry#38886) Using the native [`GetProcessHandleCount`](https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getprocesshandlecount) Win32 API via `NumFDsWithContext` already implemented by `gopsutil` gives a noticeable performance improvement over the current version using WMI. * For non-Windows OSes the same behavior is kept for the metric `process.handles`: it returns a not supported platform error. * Corrected the documentation of `process.open_file_descriptors` with the actual behavior - as future breaking change we can consider return not supported for this metric on Windows. Expected performance improvement is shown on the image below. On the image the red line is the new implementation, the green the current one using WMI, and in blue is the WMI service handling the query made by the current implementation of the scraper. 
Using the native
GetProcessHandleCount
Win32 API viaNumFDsWithContext
already implemented bygopsutil
gives a noticeable performance improvement over the current version using WMI.process.handles
: it returns a not supported platform error.process.open_file_descriptors
with the actual behavior - as future breaking change we can consider return not supported for this metric on Windows.Expected performance improvement is shown on the image below. On the image the red line is the new implementation, the green the current one using WMI, and in blue is the WMI service handling the query made by the current implementation of the scraper.