Doris Exporter: Traces table schema #39602
simonasgal started this conversation in Ideas
Hello,
We now have a fairly large traces table in Doris (around 50 TB on object storage) and have noticed significant performance degradation for queries that fetch traces by their IDs. The query plan shows that all partitions are scanned even when fetching a single trace.
For now, we have implemented a work-around that passes the time range (from the search query) alongside the trace IDs. This improved performance a lot, since the lookup mostly ends up hitting a single partition.
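A minimal sketch of that work-around, for illustration only. The table name `otel_traces`, the helper `build_trace_lookup`, and the five-minute padding are hypothetical; the `trace_id` and `timestamp` column names are assumed to match the exporter's traces DDL. The `%s` placeholders follow the paramstyle of common MySQL-protocol Python clients.

```python
from datetime import datetime, timedelta


def build_trace_lookup(table, trace_ids, start, end, pad=timedelta(minutes=5)):
    """Build a trace-ID lookup that also carries a time-range predicate.

    The time range lets the planner prune partitions instead of scanning all
    of them. `table` must come from trusted configuration, because it is
    interpolated into the SQL text rather than bound as a parameter.
    """
    if not trace_ids:
        raise ValueError("at least one trace ID is required")
    placeholders = ", ".join(["%s"] * len(trace_ids))
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE trace_id IN ({placeholders}) "
        "AND `timestamp` >= %s AND `timestamp` < %s"
    )
    # Pad the window a little so spans near the edges are not missed.
    params = [*trace_ids, start - pad, end + pad]
    return sql, params


sql, params = build_trace_lookup(
    "otel_traces",  # hypothetical table name
    ["4bf92f3577b34da6a3ce929d0e0e4736"],
    datetime(2025, 1, 1, 10, 0),
    datetime(2025, 1, 1, 11, 0),
)
print(sql)
print(params)
```

The padding is a judgment call: spans of a trace can start slightly before or after the window the search query returned, so a tight range risks dropping them.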
My proposal is to bring back distribution by trace ID hash. It is very unlikely to introduce imbalance or hot spots, especially with hourly partitions.
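To make the expected effect concrete, here is a rough model of how many tablets a single-trace lookup has to touch. It is a simplification under stated assumptions, not a measurement: with hash distribution on `trace_id`, an equality predicate on the trace ID is assumed to select one bucket per partition, while without it every bucket of every matched partition is a candidate. The partition and bucket counts are made up.

```python
def tablets_scanned(partitions_matched, buckets_per_partition, hash_on_trace_id):
    """Estimate tablets touched when looking up one trace ID.

    partitions_matched: partitions left after time-range pruning
        (all partitions if the query carries no time predicate).
    hash_on_trace_id: True if the table is distributed by HASH(trace_id).
    """
    # With hash distribution, one trace ID maps to one bucket per partition.
    per_partition = 1 if hash_on_trace_id else buckets_per_partition
    return partitions_matched * per_partition


# Hypothetical layout: 30 days of hourly partitions, 16 buckets each.
total_partitions = 30 * 24
buckets = 16

scenarios = {
    "no hash, no time range": tablets_scanned(total_partitions, buckets, False),
    "no hash, time range": tablets_scanned(1, buckets, False),
    "hash, no time range": tablets_scanned(total_partitions, buckets, True),
    "hash, time range": tablets_scanned(1, buckets, True),
}
for name, count in scenarios.items():
    print(f"{name}: {count} tablets")
```

In this model the two optimizations multiply: the time range cuts the number of partitions, and hashing by trace ID cuts the buckets read inside each remaining partition.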
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/dorisexporter/sql/traces_ddl.sql#L43
An additional proposal is to add an option for configuring partitioning. We switched to hourly partitions because of size: daily partitions grow above 1 TB, which is far above what the Doris documentation recommends (50 GB).
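The arithmetic behind the switch, as a quick sketch. It assumes ingest is spread evenly over the day, which real traffic will not be, and uses 1 TB = 1000 GB as a round figure; `partition_size_gb` is a hypothetical helper.

```python
def partition_size_gb(daily_volume_gb, partitions_per_day):
    """Average partition size if ingest were spread evenly over the day."""
    return daily_volume_gb / partitions_per_day


daily_gb = 1000  # roughly 1 TB per day, the figure from the post
recommended_gb = 50  # the recommendation quoted in the post

for label, per_day in (("daily", 1), ("hourly", 24)):
    size = partition_size_gb(daily_gb, per_day)
    verdict = "within" if size <= recommended_gb else "above"
    print(f"{label}: ~{size:.0f} GB per partition, {verdict} the 50 GB figure")
```

Since our daily volume is above 1 TB and not evenly spread, peak-hour partitions can still exceed the average, which is one reason a configurable granularity would help.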
The third thing: I am not sure all indexes are needed (e.g. the inverted index on timestamp), since they are either not used or the columns are already covered by prefix indexes. I am going to work on this a bit more in a test environment and look more closely at query plans and profiles.