Skip to content

[pkg/ottl] Improve slice and maps dynamic indexing validation #37646

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
edmocosta opened this issue Feb 3, 2025 · 8 comments
Open

[pkg/ottl] Improve slice and maps dynamic indexing validation #37646

edmocosta opened this issue Feb 3, 2025 · 8 comments

Comments

@edmocosta
Copy link
Contributor

edmocosta commented Feb 3, 2025

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

We've recently introduced support for indexing maps and slices with dynamic values (#36644), allowing statements to access elements using Int, String, Converter, and Path, which was a great addition to OTTL.

Given slices and maps are indexed using different data types (slices w/ integers, and maps w/ strings), we should improve the OTTL contexts/parser to validate the path's key's types and avoid raising errors at runtime:

ERROR	failed processing logs	{"error": "failed to execute statement: set(attributes[\"test\"], attributes[cache[\"key\"]]), unable to resolve a string index in map: could not resolve key for map/slice, expecting 'string' but got 'int64'"}

Describe the solution you'd like

When users configure statements using invalid key types, for example, slices with string keys, or maps with integer keys, we should raise an error at the collector startup:

log_statements:
 - context: log
   statements:
    - set(cache["key"], 1)
    - set(attributes["test"], attributes[cache["key"]]) # maps indexes must be a string, not int

Describe alternatives you've considered

No response

Additional context

No response

@edmocosta edmocosta added enhancement New feature or request needs triage New item requiring triage labels Feb 3, 2025
Copy link
Contributor

github-actions bot commented Feb 3, 2025

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@bacherfl
Copy link
Contributor

bacherfl commented Feb 4, 2025

I'd like to look into implementing this if this gets accepted

@bacherfl
Copy link
Contributor

bacherfl commented Feb 4, 2025

@edmocosta can this actually be done during the collector startup? In most cases I think we will not know what will be the type of the values we are going to index during the statement execution. So in the example from the issue description, we will only know the type of attributes["array.attribute"] once we execute the statement for a given log context.

@bacherfl bacherfl added waiting-for-code-owners and removed needs triage New item requiring triage labels Feb 4, 2025
@edmocosta
Copy link
Contributor Author

I'm sorry, that was a bad example! I'll update it.
You're correct, we don't know the dynamic attributes values at startup, what I meant was validating it for the paths we already know the types, such as attributes, cache, etc.

@bacherfl
Copy link
Contributor

bacherfl commented Feb 4, 2025

Thank you for the clarification, @edmocosta! I will look into how this could be achieved. Another limitation however may be that we can only do this for statements that consist of known paths, like attributes, and literals, as something like the following might also not be possible until we have a context object:

log_statements:
 - context: log
   statements:
    - set(cache["key"], attributes["foo"]) # type of attributes["foo"] is not known at startup
    - set(attributes["test"], attributes[cache["key"]]) # maps indexes must be a string, not int

@edmocosta
Copy link
Contributor Author

One possible idea would be delegating the validation to the context's path parsers, as they're already responsible for validating/creating the path's get & setters and know their types beforehand (mostly of the time).

Let's see what other folks says, I think it would be useful to have such validation, but I'm not completely sure if it would worth the extra complexity of adding it (at least for now).

@bacherfl
Copy link
Contributor

bacherfl commented Feb 5, 2025

After looking further into this, i think we will have to actually execute the statement sequence upon validation, to know the types of the values being used for indexing. Like in the above example there would otherwise no way of knowing what type cache["key"] would be within the second statement, without executing the previous statement first. Then we would still have the limitation of this only working with literals, as we do not know the types of attributes from e.g. a log entry that will be sent to the collector until we receive n actual entry during runtime.

Copy link
Contributor

github-actions bot commented Apr 7, 2025

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants