Sampling schemes today fall mainly into two categories, head-based and tail-based, each with clear advantages and disadvantages.
Internally, we currently use a tail-based sampler. However, it requires the application to report all of its span data to the collector, which can hurt performance for high-traffic services. In addition, when the application exports spans in batches, some spans may be dropped during peak periods if the reporting volume is too large, so trace completeness cannot be guaranteed.
Drawing on the ideas from this paper, retroactive sampling could work roughly as follows:
A request arrives at server A and generates span data, which is exported to the sidecar collector and cached there.
The request then arrives at server B, whose span data is likewise exported to its sidecar collector and cached.
The collector on server B decides to sample the trace and reports that decision to the coordinator.
The coordinator receives the decision, contacts the collector on server A at its previously recorded address, and sends it the traceId that needs to be sampled.
The collector on server A retrieves the spans for that traceId from its cache and reports them.
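To make the flow above concrete, here is a minimal in-memory sketch. Everything in it is hypothetical and only for illustration: the `Span`, `Collector`, and `Coordinator` classes, the `prev_collector` field (which assumes the address of the previous hop's collector travels with the request context), and the direct method calls standing in for network requests. It is not the paper's design or an existing collector API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    trace_id: str
    name: str
    # Assumption: the address of the previous hop's collector is propagated
    # with the request. None means this span belongs to the first hop.
    prev_collector: Optional[str] = None


@dataclass
class Collector:
    """Hypothetical sidecar collector that caches spans until told to report."""
    address: str
    cache: dict = field(default_factory=dict)     # trace_id -> list of spans
    reported: list = field(default_factory=list)  # stands in for the backend

    def receive(self, span: Span) -> None:
        # Cache locally instead of forwarding to the backend.
        self.cache.setdefault(span.trace_id, []).append(span)

    def upstream_addresses(self, trace_id: str) -> set:
        # Collectors of earlier hops, as recorded on the cached spans.
        spans = self.cache.get(trace_id, [])
        return {s.prev_collector for s in spans if s.prev_collector}

    def report(self, trace_id: str) -> None:
        # Move every cached span of the trace out of the cache.
        self.reported.extend(self.cache.pop(trace_id, []))


class Coordinator:
    """Hypothetical coordinator that fans a sampling decision out upstream."""

    def __init__(self, collectors):
        self.collectors = {c.address: c for c in collectors}

    def sample(self, trace_id: str, origin: str) -> None:
        pending, seen = [origin], set()
        while pending:
            address = pending.pop()
            if address in seen:
                continue
            seen.add(address)
            collector = self.collectors[address]
            # Read the upstream addresses before report() empties the cache.
            pending.extend(collector.upstream_addresses(trace_id))
            collector.report(trace_id)


# Usage: trace t1 crosses A then B; trace t2 only touches A.
a = Collector("collector-a")
b = Collector("collector-b")
coordinator = Coordinator([a, b])

a.receive(Span("t1", "GET /checkout"))
b.receive(Span("t1", "charge-card", prev_collector="collector-a"))
a.receive(Span("t2", "GET /health"))

# The collector on B decides to sample t1 and tells the coordinator.
coordinator.sample("t1", origin="collector-b")
```

After the call, both collectors have reported their spans for `t1`, while `t2` stays in A's cache until it is sampled or discarded.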
The caching strategy in the collector is similar to that of tail-based sampling: it limits both the number of traces held in the cache and how long they are kept. Traces that are never selected for reporting are eventually overwritten and discarded.
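A cache with those two limits can be sketched as below. The `TraceCache` class and its `max_traces`, `ttl_seconds`, and `clock` parameters are assumptions made for this example, and evicting the oldest trace first is just one possible policy.

```python
import time
from collections import OrderedDict


class TraceCache:
    """Hypothetical bounded span cache: at most max_traces, each kept for ttl_seconds."""

    def __init__(self, max_traces, ttl_seconds, clock=time.monotonic):
        self.max_traces = max_traces
        self.ttl_seconds = ttl_seconds
        self._clock = clock            # injectable so expiry can be tested
        self._entries = OrderedDict()  # trace_id -> (first_seen, spans)

    def add(self, trace_id, span):
        self._expire()
        if trace_id not in self._entries:
            if len(self._entries) >= self.max_traces:
                # Cache is full: overwrite (drop) the oldest trace.
                self._entries.popitem(last=False)
            self._entries[trace_id] = (self._clock(), [])
        self._entries[trace_id][1].append(span)

    def take(self, trace_id):
        """Remove and return the spans of a sampled trace, or [] if it is gone."""
        self._expire()
        entry = self._entries.pop(trace_id, None)
        return entry[1] if entry else []

    def _expire(self):
        # Entries are ordered by first_seen, so stop at the first fresh one.
        now = self._clock()
        while self._entries:
            first_seen, _ = next(iter(self._entries.values()))
            if now - first_seen < self.ttl_seconds:
                break
            self._entries.popitem(last=False)


# Usage with the real clock: room for two traces, each kept for 30 seconds.
cache = TraceCache(max_traces=2, ttl_seconds=30)
cache.add("t1", "span-a")
cache.add("t2", "span-b")
cache.add("t3", "span-c")       # cache is full, so t1 is dropped
sampled = cache.take("t2")      # spans to report for the sampled trace
```

A sampling decision that arrives after a trace has been evicted or has expired gets an empty list back, so the two limits bound how late a decision can arrive and still produce a complete trace.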
This scheme appears to address the problems described earlier. Once the application generates span data, it is exported quickly to the collector. Because the data only goes as far as the sidecar collector, the application can handle more spans with less overhead, and the collector can cache more spans, which reduces the chance of span loss.
Of course, this solution will not significantly reduce overall storage costs, but it does seem to reduce the impact on the application.
There is not much discussion of retroactive sampling schemes at the moment, so I would like to explore the feasibility of this solution.