Scraper feedback, and what to do next? #11238
I like the idea. We can have just one
Re 4:
Perhaps; I think it's a cleaner config interface for sure. Provided I'm understanding this idea versus the current state of things, configuring enable/disable at the start has one big benefit: scrapers can recognize that certain metrics aren't required at scrape time and can subsequently opt to skip them. I have a hostmetrics PR where skipping collection for a particular metric when it is disabled yielded major performance benefits. FWIW, the spec I wrote for that wildcard matching [1] has the potential to be applied in ways other than the current PR. I would happily welcome a better place for it to live than as generated mdatagen code, as in the PR.

[1]: As of writing, this is due for updates, since Dmitrii and I have already agreed on a major simplification I can make. But the spirit will remain.
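To illustrate the enable/disable idea, here is a minimal sketch of wildcard matching on metric names. `metricEnabled` and the pattern syntax (`path.Match`-style globs) are hypothetical simplifications, not the actual spec referenced above:

```go
package main

import (
	"fmt"
	"path"
)

// metricEnabled reports whether a metric name matches any enabled
// pattern. Patterns use path.Match-style wildcards, e.g. "system.cpu.*".
// A scraper could consult this before collecting, skipping work for
// metrics that no consumer asked for.
func metricEnabled(name string, patterns []string) bool {
	for _, p := range patterns {
		if ok, _ := path.Match(p, name); ok {
			return true
		}
	}
	return false
}

func main() {
	enabled := []string{"system.cpu.*", "system.memory.usage"}
	fmt.Println(metricEnabled("system.cpu.time", enabled)) // true
	fmt.Println(metricEnabled("system.disk.io", enabled))  // false
}
```

Checking the patterns up front is what makes the performance win possible: the scraper never touches the data source for a disabled metric.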
Indeed we can configure this, but doing it here gives us the opportunity to have it at a lower cost than batching later (fewer allocations, for example).
We can have a pool of workers (also controlling how many goroutines are spawned), for example, or another approach. We should definitely have this to be able to scale to lots of sources.
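A minimal sketch of the worker-pool idea, bounding how many goroutines run scrapes concurrently. `scrapeAll` and its signature are hypothetical, not a Collector API:

```go
package main

import (
	"fmt"
	"sync"
)

// scrapeAll runs every scrape function using at most maxWorkers
// goroutines, so the number of spawned goroutines stays bounded no
// matter how many sources are configured.
func scrapeAll(scrapes []func() string, maxWorkers int) []string {
	jobs := make(chan int)
	results := make([]string, len(scrapes))
	var wg sync.WaitGroup
	for w := 0; w < maxWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = scrapes[i]() // each index written by one worker
			}
		}()
	}
	for i := range scrapes {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	scrapes := []func() string{
		func() string { return "zookeeper" },
		func() string { return "hostmetrics" },
	}
	fmt.Println(scrapeAll(scrapes, 2))
}
```

The same channel-fed pool also gives a natural place to batch: all results from one round land in a single slice before being passed downstream.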
Agree
Does this mean that people developing the scrapers/receivers will be responsible for implementing the [...]? I would also like to highlight the proposal to add support for suggestion/hint-based discovery:
I think the "discovery" will be a function of the scraper receiver, which will emit "discovery events" on an internal bus; the scrapers can "subscribe" to the bus, read the discovery events, and start scraping if needed.
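A toy sketch of such an internal bus, with scrapers subscribing to discovery events over channels. All names here (`DiscoveryEvent`, `Bus`) are hypothetical, not an actual Collector API:

```go
package main

import "fmt"

// DiscoveryEvent announces a new scrape target found by the receiver.
type DiscoveryEvent struct {
	Endpoint string
}

// Bus fans each published event out to every subscribed scraper.
type Bus struct {
	subs []chan DiscoveryEvent
}

// Subscribe returns a channel on which the caller receives all
// subsequent discovery events.
func (b *Bus) Subscribe() <-chan DiscoveryEvent {
	ch := make(chan DiscoveryEvent, 16) // buffered so Publish doesn't block
	b.subs = append(b.subs, ch)
	return ch
}

// Publish delivers an event to every subscriber.
func (b *Bus) Publish(ev DiscoveryEvent) {
	for _, ch := range b.subs {
		ch <- ev
	}
}

func main() {
	bus := &Bus{}
	events := bus.Subscribe()
	bus.Publish(DiscoveryEvent{Endpoint: "localhost:2181"})
	fmt.Println((<-events).Endpoint) // localhost:2181
}
```

A scraper reacting to an event would start its collection loop for the discovered endpoint, and a corresponding "removed" event could stop it, giving the dynamic start/stop behavior discussed above.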
One small organization PR over at #11166 will allow us to continue evolving the scraper API while letting us mark the rest of the receiver API 1.0 first (and thus unblocking stabilization of receivers that are not scraper-based). It doesn't change anything about the scraper API surface other than moving things to a new folder, but please do take a look!
Hi team, I'm wondering if we are planning to support the scraper helper for collecting logs (passing data as scraped logs to the next consumer). This would be a useful use case, such as collecting container logs. (Posting the question here since this is the thread for scraper feedback. :)
Update #11238. No changelog since this is still in progress. Signed-off-by: Bogdan Drutu <[email protected]>
#### Description
Add scraper for logs
#### Link to tracking issue
Relates to #11238
#### Testing
Added unit tests
#### Documentation
Added
Updates #11238 Signed-off-by: Bogdan Drutu <[email protected]>
#### Description
This PR adds observability for logs in scraper/scraperhelper and introduces new metrics for scraping logs.
#### Link to tracking issue
Relates to #11238
#### Testing
Added
#### Documentation
Added
Why is batching at the source desirable? Generally speaking, isn't there an expectation that telemetry streams may arrive from many independent services at essentially random times? Why, in this case, would we make such an effort to align them? To me it seems like a downside to consolidate scrapers like this, because the "shared" fields are often better configured individually. If I want to tweak a timeout for one scraper, now I have to worry about structural problems too (e.g. split into two receivers, add both to pipelines). It feels much like saying that username is a common field, so we should extract it too.
Everything you describe as "bad" when you need a different configuration already happens right now. The proposed solution helps in the cases where that shouldn't happen. Also, in general (not always) for metrics you want to report them at roughly the same resolution, so the interval will be the same.
Batching is only one thing we want to add; we are also looking to have a "discovery" mechanism, to allow starting/stopping scrapers dynamically, and more, as in the examples in @dmitryax's comment.
I disagree. It's true that right now the user has to define multiple receivers and put them into pipelines. However, what I'm pointing out is that ongoing maintenance of a configuration becomes more complicated, because you must untangle scrapers from each other in order to make a minor tweak to one of them. I'm not necessarily opposed to the idea of offering a multiple-scraper receiver, but I think there's value in presenting (most) scrapers as standalone receivers too. Maybe I missed it in the above, but I imagine we would do the following:
WDYT? Follow-up question: if we were to move away from "scraper receivers" and instead use only a compound multi-scraper receiver, how does one use the builder to define which scrapers are included? (This is a must IMO, as scrapers often import large/complex dependencies.)
Will show a prototype for the builder soon.
Why is this needed? In my opinion, anyone who wants a receiver definition per scraper can achieve the same thing by configuring the "multi-scraper" receiver with only one scraper.
For example, in the "new world" for the zookeeper receiver:

**Before:**

**After (the exact config is still not final):**
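The original before/after snippets did not survive in this thread. As a purely hypothetical illustration of the direction being discussed (a standalone zookeeper receiver versus a shared scraper-hosting receiver), the shape might look roughly like:

```yaml
# Before (hypothetical): one receiver per scraper, each with its own interval.
receivers:
  zookeeper:
    endpoint: localhost:2181
    collection_interval: 10s

# After (hypothetical): one scraper-hosting receiver with a shared interval,
# so all scrapers fire on the same timer and can be batched at the source.
receivers:
  scraper:
    collection_interval: 10s
    scrapers:
      zookeeper:
        endpoint: localhost:2181
```

Neither snippet is the final design; the field names are placeholders for the structure under discussion.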
Reasons would be 1) not breaking existing users, and 2) arguably a simpler getting-started experience for new users. I know some existing heavy users who will suffer from this, but I can't say you're wrong that both are possible. I think it's a manageable migration.
We will provide automation to do as much as possible and will keep the old path for as long as possible. I will keep this in mind and offer a long migration period.
#### Description
This PR adds support for a logs scraper.
#### Link to tracking issue
Relevant to #11238
#### Testing
Added
#### Documentation
Added
Part of open-telemetry/opentelemetry-collector#11238. In this PR, only check that the new scraper interface for component and factory can be used for this use case. Already proven for hostmetrics, so no need for more than this. TODO next:
* Add support in the integration tests to run the scraper directly. This will also make the tests more deterministic.
* In core, add support for "scraperreceiver" and deprecate zookeeperreceiver when that is available.
Signed-off-by: Bogdan Drutu <[email protected]>
Are there plans to support histogram / exponential_histogram in the scraper generation?
Current state:
While #9473 was the right call regarding the strangeness of constructing an ID from a string, we should not pass a "type": the idea was that a "receiver" can have multiple scrapers, but if that is true, all of them will have the same receiver/component type, correct?
We have probably made a mistake with the design of the scrapers (a big part of the problem being the author of this issue). Here is a set of questions we need to answer; based on the answers, we can change the design of the scraper package:
During the decision and discussion, it should be considered that we currently have a big problem: for each scraper/receiver we create its own timer, and we lose the opportunity of batching at the source. Another piece of feedback we got in the past on the current scraper-as-a-receiver model is that the user has to configure the scrape interval individually for each receiver.
I like option 3. A proposed approach is to start from how the user should configure this, and then see how we can achieve it while making sure we "batch" all these scrapes as much as possible:
Later we can add a capability to do "source discovery", or build a different receiver like the current "receivercreator", which can be adapted to follow the same pattern/structure.