
feat: extend spec class for config migrations #538


Open
wants to merge 8 commits into
base: pnilan/feat/implement-validators
from


pnilan
Contributor

pnilan commented May 8, 2025

Problem

Recommended Review Order

  1. declarative_component_schema.py
  2. spec.py
  3. model_to_component_factory.py
  4. test_spec.py

github-actions bot added the enhancement (New feature or request) label May 8, 2025
pnilan marked this pull request as ready for review May 12, 2025 18:09
Copilot AI review requested due to automatic review settings May 12, 2025 18:09
Contributor

Copilot AI left a comment


Pull Request Overview

This PR extends the Spec class by adding support for configuration migrations, transformations, and validations. Key changes include:

  • Adding new optional fields (config_migrations, config_transformations, config_validations) and a message repository to the Spec class.
  • Introducing new methods to migrate, transform, and validate the configuration.
  • Updating the component factory (ModelToComponentFactory) to propagate the new fields from the normalization rules.

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

  • airbyte_cdk/sources/declarative/spec/spec.py: Extended Spec with new config migration/transformation/validation fields and corresponding methods.
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py: Updated create_spec to forward new normalization rules fields.
  • pyproject.toml: Added extra dependencies.

Contributor

coderabbitai bot commented May 12, 2025

📝 Walkthrough


This update introduces a declarative configuration normalization framework to the Airbyte CDK. It adds schema fields, Pydantic models, and runtime logic for config migrations, transformations, and validations. New transformation and validator abstractions are implemented, along with concrete classes and comprehensive unit tests. Dependencies for dagger-io and anyio are also added.

Changes

  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml: Extended the Spec schema to include config_normalization_rules with config_migrations, transformations, and validations. Added definitions for DpathValidator, PredicateValidator, ValidateAdheresToSchema, and RemapField.
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py: Reformatted Pydantic fields, added models for config normalization (validators and transformations), and extended Spec with config_normalization_rules.
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py: Updated create_spec to populate config_migrations, transformations, and validations from the model.
  • airbyte_cdk/sources/declarative/spec/spec.py: Extended Spec dataclass with config migration, transformation, and validation fields and methods. Integrated with message repository.
  • airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py: Added module exporting RemapField.
  • airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py: Introduced abstract base class ConfigTransformation.
  • airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py: Implemented RemapField, a transformation for remapping config values by path and mapping.
  • airbyte_cdk/sources/declarative/validators/__init__.py: Added module exporting validator classes and strategies.
  • airbyte_cdk/sources/declarative/validators/dpath_validator.py: Implemented DpathValidator for path-based value extraction and validation.
  • airbyte_cdk/sources/declarative/validators/predicate_validator.py: Added PredicateValidator for validating a value using a strategy.
  • airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py: Added ValidateAdheresToSchema strategy for JSON schema validation.
  • airbyte_cdk/sources/declarative/validators/validation_strategy.py: Introduced abstract base class ValidationStrategy.
  • airbyte_cdk/sources/declarative/validators/validator.py: Introduced abstract base class Validator.
  • pyproject.toml: Added dependencies: dagger-io and anyio.
  • unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py: Added tests for RemapField transformation logic.
  • unit_tests/sources/declarative/validators/test_dpath_validator.py: Added tests for DpathValidator including edge cases and error handling.
  • unit_tests/sources/declarative/validators/test_predicate_validator.py: Added tests for PredicateValidator with various strategies and inputs.
  • unit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py: Added tests for ValidateAdheresToSchema covering valid, invalid, and edge scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant AirbyteEntrypoint
    participant Spec
    participant ConfigTransformation
    participant Validator
    participant MessageRepository

    User->>AirbyteEntrypoint: Run with config file
    AirbyteEntrypoint->>Spec: migrate_config(args, source, config)
    Spec->>ConfigTransformation: transform(config) (for each migration)
    ConfigTransformation-->>Spec: (config mutated)
    Spec->>AirbyteEntrypoint: (writes updated config file if changed)
    Spec->>MessageRepository: emit control message with updated config

    User->>AirbyteEntrypoint: Run sync
    AirbyteEntrypoint->>Spec: transform_config(config)
    Spec->>ConfigTransformation: transform(config) (for each transformation)
    ConfigTransformation-->>Spec: (config mutated)
    Spec->>AirbyteEntrypoint: (returns transformed config)

    AirbyteEntrypoint->>Spec: validate_config(config)
    Spec->>Validator: validate(config) (for each validator)
    Validator-->>Spec: (raises error if invalid)
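The ordering in the diagram (migrate, then transform, then validate) can be sketched as plain functions. This is a minimal illustration, not the Spec implementation: it assumes, as the diagram suggests, that migrated configs are persisted while transformed configs stay in memory, and that validation runs on the transformed config. File writes and control messages are omitted, and rename_region, add_base_url and require_region are hypothetical steps.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]
Step = Callable[[Config], None]  # mutates a config in place
Check = Callable[[Config], None]  # raises ValueError for an invalid config


def apply_all(steps: List[Step], config: Config) -> Config:
    result = copy.deepcopy(config)  # never mutate the caller's config
    for step in steps:
        step(result)
    return result


def normalize(
    migrations: List[Step], transformations: List[Step], checks: List[Check], config: Config
) -> Tuple[Config, Config, bool]:
    """Return (config to persist, config to run with, whether the persisted config changed)."""
    migrated = apply_all(migrations, config)  # assumed to be written back to the config file
    runtime = apply_all(transformations, migrated)  # assumed to stay in memory only
    for check in checks:
        check(runtime)
    return migrated, runtime, migrated != config


# Hypothetical steps, for illustration only.
def rename_region(config: Config) -> None:
    if config.get("region") == "us":
        config["region"] = "us-east-1"


def add_base_url(config: Config) -> None:
    config["base_url"] = f"https://{config['region']}.example.com"


def require_region(config: Config) -> None:
    if "region" not in config:
        raise ValueError("region is required")


original = {"region": "us"}
persisted, runtime, changed = normalize([rename_region], [add_base_url], [require_region], original)
```

Keeping the persisted and runtime configs separate means derived values (like base_url here) never leak into the user's stored config.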

Would you like to see a more detailed diagram for any specific part of the new config normalization flow, or is this overview sufficient for your needs? Wdyt?

Contributor

coderabbitai bot left a comment


Actionable comments posted: 4

🔭 Outside diff range comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

3469-3478: ⚠️ Potential issue

Critical: Fix create_spec to match Spec signature and handle optional rules.
The Spec constructor now expects config_transformations and config_validations (not transformations/validations), and model.config_normalization_rules may be None, causing attribute errors. Could we update this block accordingly? For example:

-        return Spec(
-            connection_specification=model.connection_specification,
-            documentation_url=model.documentation_url,
-            advanced_auth=model.advanced_auth,
-            parameters={},
-            config_migrations=model.config_normalization_rules.config_migrations,
-            transformations=model.config_normalization_rules.transformations,
-            validations=model.config_normalization_rules.validations,
-        )
+        return Spec(
+            connection_specification=model.connection_specification,
+            documentation_url=model.documentation_url,
+            advanced_auth=model.advanced_auth,
+            parameters={},
+            config_migrations=model.config_normalization_rules.config_migrations if model.config_normalization_rules else [],
+            config_transformations=model.config_normalization_rules.transformations if model.config_normalization_rules else [],
+            config_validations=model.config_normalization_rules.validations if model.config_normalization_rules else [],
+        )

This change should resolve the mypy errors about unexpected keyword arguments and guard against None normalization rules. wdyt?
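The ternaries above guard against a missing config_normalization_rules block, but they still forward None when the block exists and an individual list is omitted (mypy reports the field type as "list[RemapField] | Any | None"). A minimal sketch of a guard that always yields lists; the dataclasses are hypothetical stand-ins for the generated Pydantic models.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class ConfigNormalizationRules:  # hypothetical stand-in for the generated model
    config_migrations: Optional[List[Any]] = None
    transformations: Optional[List[Any]] = None
    validations: Optional[List[Any]] = None


@dataclass
class SpecModel:  # hypothetical stand-in for the Spec model
    config_normalization_rules: Optional[ConfigNormalizationRules] = None


def normalization_lists(model: SpecModel) -> Tuple[List[Any], List[Any], List[Any]]:
    """Return (migrations, transformations, validations), never None."""
    rules = model.config_normalization_rules
    if rules is None:
        return [], [], []
    # "or []" also covers a present rules block with an omitted list.
    return (
        rules.config_migrations or [],
        rules.transformations or [],
        rules.validations or [],
    )
```

The three values would then be passed as config_migrations, config_transformations and config_validations. The remaining arg-type error (list[RemapField] vs list[ConfigTransformation]) is about element types, so a None guard alone likely does not resolve it.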

🧰 Tools
🪛 GitHub Actions: Linters

[error] 3469-3469: mypy error: Unexpected keyword argument "transformations" for "Spec"; did you mean "config_transformations"? [call-arg]


[error] 3469-3469: mypy error: Unexpected keyword argument "validations" for "Spec"; did you mean "config_validations"? [call-arg]


[error] 3474-3474: mypy error: Item "None" of "ConfigNormalizationRules | None" has no attribute "config_migrations" [union-attr]


[error] 3474-3474: mypy error: Argument "config_migrations" to "Spec" has incompatible type "list[RemapField] | Any | None"; expected "list[ConfigTransformation] | None" [arg-type]


[error] 3475-3475: mypy error: Item "None" of "ConfigNormalizationRules | None" has no attribute "transformations" [union-attr]


[error] 3476-3476: mypy error: Item "None" of "ConfigNormalizationRules | None" has no attribute "validations" [union-attr]

🧹 Nitpick comments (18)
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1)

9-24: Consider using MutableMapping instead of Dict for better flexibility?

The implementation looks great! One small suggestion - I noticed in the remap_field.py implementation (from the snippets), you're using MutableMapping[str, Any] for the config parameter, but here you're using Dict[str, Any]. Using MutableMapping would be more flexible and consistent with the implementations. Wdyt?

- from typing import Any, Dict
+ from typing import Any, Dict, MutableMapping

@abstractmethod
def transform(
    self,
-   config: Dict[str, Any],
+   config: MutableMapping[str, Any],
) -> None:
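To illustrate the flexibility argument: runtime behavior is identical either way, but mypy would reject passing a collections.UserDict (a MutableMapping that is not a dict) where Dict[str, Any] is declared. SetDefault below is a hypothetical transformation used only for illustration.

```python
from abc import ABC, abstractmethod
from collections import UserDict
from typing import Any, MutableMapping


class ConfigTransformation(ABC):
    @abstractmethod
    def transform(self, config: MutableMapping[str, Any]) -> None:
        """Mutate the config in place."""


class SetDefault(ConfigTransformation):  # hypothetical transformation for illustration
    def __init__(self, key: str, default: Any) -> None:
        self.key = key
        self.default = default

    def transform(self, config: MutableMapping[str, Any]) -> None:
        # setdefault is provided by the MutableMapping ABC, so every implementation supports it.
        config.setdefault(self.key, self.default)


plain = {"api_key": "abc"}
wrapped = UserDict({"api_key": "abc"})  # a MutableMapping that is not a dict subclass
for config in (plain, wrapped):
    SetDefault("region", "us-east-1").transform(config)
```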
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)

11-26: Consider aligning validate method signature with other validators?

The implementation looks clean and follows good composition practices! Based on the snippets from other validators, I noticed that other validators implement a validate method that takes an input_data parameter. Would it make sense to align the method signature here for consistency across validators? Something like:

- def validate(self) -> None:
+ def validate(self, input_data: Any = None) -> None:
    """
    Applies the validation strategy to the value.

    :raises ValueError: If validation fails
    """
    self.strategy.validate(self.value)

This way all validators would have a consistent interface, even if this particular implementation ignores the input. Wdyt?
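A minimal sketch of the aligned interface, assuming (as the comment describes) that the base Validator.validate takes input_data. The classes are simplified stand-ins rather than the PR's code, and IsPositive is a hypothetical strategy.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class ValidationStrategy(ABC):
    @abstractmethod
    def validate(self, value: Any) -> None:
        """Raise ValueError if value is invalid."""


class Validator(ABC):
    @abstractmethod
    def validate(self, input_data: Any) -> None:
        """Raise ValueError if input_data is invalid."""


class PredicateValidator(Validator):
    def __init__(self, value: Any, strategy: ValidationStrategy) -> None:
        self.value = value
        self.strategy = strategy

    def validate(self, input_data: Any = None) -> None:
        # input_data is accepted only to match Validator.validate; the stored value is what gets checked.
        self.strategy.validate(self.value)


class IsPositive(ValidationStrategy):  # hypothetical strategy for illustration
    def validate(self, value: Any) -> None:
        if not isinstance(value, (int, float)) or value <= 0:
            raise ValueError(f"Expected a positive number, got {value!r}")


# Every validator can now be driven the same way, e.g. from a validate_config loop.
validators: List[Validator] = [PredicateValidator(5, IsPositive())]
for validator in validators:
    validator.validate({"any": "config"})
```

Adding a default value keeps the override signature-compatible with the base class, so existing calls with no argument keep working.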

airbyte_cdk/sources/declarative/validators/dpath_validator.py (3)

25-33: Consider simplifying the field_path conversion logic

There appears to be redundancy in the way you're handling the field_path conversion. You first create a new list with all paths converted, and then iterate through the original list again to convert string elements. Could this be simplified to a single-pass approach, wdyt?

- self._field_path = [
-     InterpolatedString.create(path, parameters={}) for path in self.field_path
- ]
- for path_index in range(len(self.field_path)):
-     if isinstance(self.field_path[path_index], str):
-         self._field_path[path_index] = InterpolatedString.create(
-             self.field_path[path_index], parameters={}
-         )
+ self._field_path = [
+     InterpolatedString.create(path, parameters={}) for path in self.field_path
+ ]

47-59: Consider consolidating duplicate error handling

The error handling logic for both wildcard and non-wildcard paths is duplicated. Maybe you could refactor this to reduce duplication and improve maintainability, wdyt?

- if "*" in path:
-     try:
-         values = dpath.values(input_data, path)
-         for value in values:
-             self.strategy.validate(value)
-     except KeyError as e:
-         raise ValueError(f"Error validating path '{self.field_path}': {e}")
- else:
-     try:
-         value = dpath.get(input_data, path)
-         self.strategy.validate(value)
-     except KeyError as e:
-         raise ValueError(f"Error validating path '{self.field_path}': {e}")
+ try:
+     if "*" in path:
+         for value in dpath.values(input_data, path):
+             self.strategy.validate(value)
+     else:
+         value = dpath.get(input_data, path)
+         self.strategy.validate(value)
+ except KeyError as e:
+     raise ValueError(f"Error validating path '{self.field_path}': {e}") from e

35-59: Add validation for input_data type

The method assumes input_data is a dictionary without explicitly checking. Adding a type check could prevent cryptic errors if a non-dict value is passed.

def validate(self, input_data: dict[str, Any]) -> None:
+   if not isinstance(input_data, dict):
+       raise ValueError(f"Expected dictionary input, got {type(input_data).__name__}")
    
    path = [path.eval({}) for path in self._field_path]
    # rest of the method...
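The two suggestions above (input type check, one try/except for both branches) can be sketched together without dpath. This is a stdlib-only stand-in for illustration, not DpathValidator: it walks mappings only (dpath also handles list indices), and unlike dpath's glob search, which may simply yield no matches, it treats any missing key as an error. iter_path_values and validate_path are hypothetical names.

```python
from typing import Any, Callable, Iterator, List, Mapping


def iter_path_values(data: Any, path: List[str]) -> Iterator[Any]:
    """Yield every value matched by path; a "*" segment matches every key at that level."""
    if not path:
        yield data
        return
    head, rest = path[0], path[1:]
    if not isinstance(data, Mapping):
        raise KeyError(head)  # cannot descend into a non-mapping value
    if head == "*":
        for child in data.values():
            yield from iter_path_values(child, rest)
    else:
        yield from iter_path_values(data[head], rest)  # raises KeyError if head is missing


def validate_path(input_data: Any, field_path: List[str], check: Callable[[Any], None]) -> None:
    # Fail fast with a readable message instead of an obscure error deeper in the traversal.
    if not isinstance(input_data, dict):
        raise ValueError(f"Expected dictionary input, got {type(input_data).__name__}")
    # One try/except covers both wildcard and plain paths.
    try:
        for value in iter_path_values(input_data, field_path):
            check(value)
    except KeyError as e:
        raise ValueError(f"Error validating path '{field_path}': {e}") from e
```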
unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (1)

93-96: Consider extending exception test to verify message

The test confirms an exception is raised with empty field path, but doesn't verify the exception message. Would it be helpful to also check that the exception message matches expectations, wdyt?

with pytest.raises(Exception) as exc_info:
    RemapField(field_path=[], map={"old_value": "new_value"})
+ assert "field_path cannot be empty" in str(exc_info.value)
unit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py (1)

119-131: Consider adding invalid JSON string test

You're testing validation with a valid JSON string, which is great. Would it also be valuable to test with an invalid JSON string to verify appropriate error handling, wdyt?

def test_given_invalid_json_string_when_validate_then_raises_error(self):
    schema = {"type": "object"}
    validator = ValidateAdheresToSchema(schema=schema)
    
    with pytest.raises(ValueError) as exc_info:
        validator.validate('{"invalid json')
    
    assert "Invalid JSON" in str(exc_info.value)
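One subtlety for this test: json.JSONDecodeError already subclasses ValueError, so pytest.raises(ValueError) would pass even if the strategy never wrapped the parse error; only the message assertion checks the wrapping. A minimal sketch of such wrapping, assuming the strategy parses string inputs with json.loads (coerce_json_value is a hypothetical helper).

```python
import json
from typing import Any


def coerce_json_value(value: Any) -> Any:
    """Parse JSON strings and pass other values through unchanged (hypothetical helper)."""
    if not isinstance(value, str):
        return value
    try:
        return json.loads(value)
    except json.JSONDecodeError as e:
        # Re-raise with a stable "Invalid JSON" prefix so tests can match on the message.
        raise ValueError(f"Invalid JSON: {e.msg}") from e
```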
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (3)

23-33: Simplify field_path conversion logic

Similar to the comment in DpathValidator, there's redundancy in the way field_path is handled. You create a new list with all elements converted, then iterate again to convert string elements. Could this be simplified to a single approach, wdyt?

- self._field_path = [
-     InterpolatedString.create(path, parameters={}) for path in self.field_path
- ]
- for path_index in range(len(self.field_path)):
-     if isinstance(self.field_path[path_index], str):
-         self._field_path[path_index] = InterpolatedString.create(
-             self.field_path[path_index], parameters={}
-         )
+ self._field_path = [
+     InterpolatedString.create(path, parameters={}) for path in self.field_path
+ ]

24-25: Improve error message for empty field path

The error message could be more descriptive about why empty paths aren't allowed.

if not self.field_path:
-   raise Exception("field_path cannot be empty.")
+   raise ValueError("field_path cannot be empty. A valid path is required to identify the field to remap.")

59-60: Maybe add support for non-string map keys?

Currently, the map lookup assumes string keys. If there's a chance of non-string values in the field being remapped (like integers), would it be valuable to add type conversion, wdyt?

  if field_name in current and current[field_name] in self.map:
      current[field_name] = self.map[current[field_name]]
+ elif field_name in current and str(current[field_name]) in self.map:
+     current[field_name] = self.map[str(current[field_name])]
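A variant worth considering: always compare on the string form. Since str() always returns something hashable, unhashable field values (lists, dicts) cannot raise TypeError during the lookup. A minimal sketch, assuming map keys are strings as the comment does; remap_field_value is a hypothetical helper, not RemapField's API.

```python
from typing import Any, Mapping, MutableMapping


def remap_field_value(current: MutableMapping[str, Any], field_name: str, mapping: Mapping[str, Any]) -> None:
    """Replace current[field_name] via mapping, comparing on the value's string form (hypothetical helper)."""
    if field_name not in current:
        return
    value = current[field_name]
    # str() output is always hashable, so list or dict values cannot raise TypeError on lookup.
    key = value if isinstance(value, str) else str(value)
    if key in mapping:
        current[field_name] = mapping[key]
```

One caveat: str(True) is "True", so a map written with YAML-style "true" keys would not match boolean values.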
unit_tests/sources/declarative/validators/test_dpath_validator.py (1)

77-93: Redundant assertion in wildcard test.

There's a duplicate assertion at line 92 that's using unittest style after already using pytest style at line 91. Consider removing one of these assertions for clarity, wdyt?

  assert strategy.validate_called
  assert strategy.validated_value in ["[email protected]", "[email protected]"]
- self.assertIn(strategy.validated_value, ["[email protected]", "[email protected]"])
unit_tests/sources/declarative/validators/test_predicate_validator.py (1)

1-56: Consider adding tests for edge cases.

The tests look solid for the main use cases, but perhaps consider adding tests for edge cases like None values or other special cases that might occur in real configurations, wdyt?

def test_given_none_value_when_validate_then_validation_occurs():
    strategy = MockValidationStrategy()
    validator = PredicateValidator(value=None, strategy=strategy)
    
    validator.validate()
    
    assert strategy.validate_called
    assert strategy.validated_value is None
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

3806-3832: Consider extracting config_normalization_rules to a reusable definition?
Rather than inlining the schema under Spec, would it be clearer to define ConfigNormalizationRules under definitions and reference it here for reuse and consistency? wdyt?


4310-4339: Add default empty-list values for config_* arrays?
A lot of our YAML arrays (e.g., state_migrations) include default: [] to simplify downstream logic. Would you consider adding default: [] to config_migrations, transformations, and validations so they always resolve to a list? wdyt?

airbyte_cdk/sources/declarative/spec/spec.py (3)

71-74: Should we short-circuit when no migrations are configured?

Once the list fields default to an empty list (e.g. field(default_factory=list) instead of None), the loop is safe, but we could still skip a needless copy when the list is empty.

if not self.config_migrations:
    return  # nothing to migrate

Minor, but this avoids touching the file and emitting control messages when the migration is a no-op, wdyt?

🧰 Tools
🪛 GitHub Actions: Linters

[error] 72-72: mypy error: Item "None" of "list[ConfigTransformation] | None" has no attribute "iter" (not iterable) [union-attr]


83-96: Mirror the migration improvements in transform_config.

Once the list fields default to empty lists (field(default_factory=list)), both loops are safe; an early return still keeps things tidy and avoids an unnecessary dict copy when no transformations exist.

🧰 Tools
🪛 GitHub Actions: Linters

[error] 92-92: mypy error: Item "None" of "list[ConfigTransformation] | None" has no attribute "iter" (not iterable) [union-attr]
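A minimal sketch of both suggestions together: list fields that default to empty lists, and early returns in the migrate and transform paths. It assumes that returning None from the migration step is an acceptable way to signal "skip the file write and control message"; the real migrate_config takes (args, source, config), which this sketch omits. SpecSketch and UppercaseRegion are hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List, MutableMapping, Optional, Protocol


class ConfigTransformation(Protocol):
    def transform(self, config: MutableMapping[str, Any]) -> None: ...


@dataclass
class SpecSketch:
    # default_factory gives every instance its own empty list, so the fields are never None.
    config_migrations: List[ConfigTransformation] = field(default_factory=list)
    config_transformations: List[ConfigTransformation] = field(default_factory=list)

    def migrate_config(self, config: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """Return the migrated config, or None when there is nothing to write back."""
        if not self.config_migrations:
            return None  # no migrations configured: skip the copy entirely
        migrated = copy.deepcopy(config)
        for migration in self.config_migrations:
            migration.transform(migrated)
        return migrated if migrated != config else None

    def transform_config(self, config: Dict[str, Any]) -> Dict[str, Any]:
        if not self.config_transformations:
            return config  # no copy needed; callers must treat the result as read-only
        transformed = copy.deepcopy(config)
        for transformation in self.config_transformations:
            transformation.transform(transformed)
        return transformed


class UppercaseRegion:  # hypothetical migration; matches the Protocol structurally
    def transform(self, config: MutableMapping[str, Any]) -> None:
        if isinstance(config.get("region"), str):
            config["region"] = config["region"].upper()
```

Comparing the migrated copy with the input also covers migrations that run but change nothing, which the empty-list check alone would miss.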


98-105: Return early (or raise) on failed validations?

Right now we iterate through validators but never surface which one failed. Would it make sense to accumulate exceptions and raise an aggregate (or raise immediately) so users know exactly what went wrong? Happy to sketch code if useful, wdyt?

🧰 Tools
🪛 GitHub Actions: Linters

[error] 104-104: mypy error: Item "None" of "list[Validator] | None" has no attribute "iter" (not iterable) [union-attr]
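If aggregating is preferred, one option is a ValueError subclass that collects every failure. Python 3.11's ExceptionGroup would be another fit, but the checks above also run on Python 3.10. This is a minimal sketch with hypothetical names (ConfigValidationError, has_api_key, positive_timeout); raising immediately is simpler and only requires letting the first ValueError propagate.

```python
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]


class ConfigValidationError(ValueError):
    """Hypothetical aggregate error that lists every failed validator."""

    def __init__(self, errors: List[str]) -> None:
        super().__init__("Config validation failed:\n" + "\n".join(f"- {error}" for error in errors))
        self.errors = errors


def validate_config(validators: List[Callable[[Config], None]], config: Config) -> None:
    errors: List[str] = []
    for validator in validators:
        try:
            validator(config)
        except ValueError as e:
            # Record which validator failed instead of stopping at the first error.
            name = getattr(validator, "__name__", type(validator).__name__)
            errors.append(f"{name}: {e}")
    if errors:
        raise ConfigValidationError(errors)


def has_api_key(config: Config) -> None:  # hypothetical validator
    if "api_key" not in config:
        raise ValueError("api_key is required")


def positive_timeout(config: Config) -> None:  # hypothetical validator
    if config.get("timeout", 1) <= 0:
        raise ValueError("timeout must be positive")
```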

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

1972-1975: Duplicate GraphQL request-body types – intentional?

There is already a RequestBodyGraphQlQuery class (note the lowercase l) a few hundred lines above.
Adding RequestBodyGraphQL introduces two near-identical schema nodes, which might confuse the manifest authors and muddy auto-completion.

Would it be simpler to consolidate both into a single well-named class (e.g. RequestBodyGraphQL only) and deprecate the other? Happy to suggest a deprecation alias if backwards compatibility is required, wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bcfcf04 and a29e424.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (13 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
  • airbyte_cdk/sources/declarative/spec/spec.py (3 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/dpath_validator.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/predicate_validator.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/validation_strategy.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/validator.py (1 hunks)
  • pyproject.toml (1 hunks)
  • unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (1 hunks)
  • unit_tests/sources/declarative/validators/test_dpath_validator.py (1 hunks)
  • unit_tests/sources/declarative/validators/test_predicate_validator.py (1 hunks)
  • unit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (9)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)
  • RemapField (15-60)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (2)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (2)
  • ValidationStrategy (9-22)
  • validate (15-22)
airbyte_cdk/sources/declarative/validators/validator.py (1)
  • validate (11-18)
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)
  • transform (35-60)
airbyte_cdk/sources/declarative/validators/__init__.py (5)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
  • DpathValidator (16-59)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
  • PredicateValidator (12-26)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)
  • ValidateAdheresToSchema (15-39)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)
  • ValidationStrategy (9-22)
airbyte_cdk/sources/declarative/validators/validator.py (1)
  • Validator (9-18)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (4)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (2)
  • ValidationStrategy (9-22)
  • validate (15-22)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
  • validate (35-59)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
  • validate (20-26)
airbyte_cdk/sources/declarative/validators/validator.py (1)
  • validate (11-18)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (4)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
  • validate (35-59)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)
  • validate (22-39)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
  • validate (20-26)
airbyte_cdk/sources/declarative/validators/validator.py (1)
  • validate (11-18)
unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (2)
  • RemapField (15-60)
  • transform (35-60)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (2)
airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)
  • InterpolatedString (13-79)
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (2)
  • ConfigTransformation (9-23)
  • transform (15-23)
unit_tests/sources/declarative/validators/test_predicate_validator.py (2)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)
  • ValidationStrategy (9-22)
unit_tests/sources/declarative/validators/test_dpath_validator.py (2)
  • MockValidationStrategy (9-20)
  • validate (16-20)
🪛 GitHub Actions: Dependency Analysis
pyproject.toml

[error] 1-1: DEP002 'dagger-io' defined as a dependency but not used in the codebase


[error] 1-1: DEP002 'anyio' defined as a dependency but not used in the codebase

🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py

[error] 3469-3469: mypy error: Unexpected keyword argument "transformations" for "Spec"; did you mean "config_transformations"? [call-arg]


[error] 3469-3469: mypy error: Unexpected keyword argument "validations" for "Spec"; did you mean "config_validations"? [call-arg]


[error] 3474-3474: mypy error: Item "None" of "ConfigNormalizationRules | None" has no attribute "config_migrations" [union-attr]


[error] 3474-3474: mypy error: Argument "config_migrations" to "Spec" has incompatible type "list[RemapField] | Any | None"; expected "list[ConfigTransformation] | None" [arg-type]


[error] 3475-3475: mypy error: Item "None" of "ConfigNormalizationRules | None" has no attribute "transformations" [union-attr]


[error] 3476-3476: mypy error: Item "None" of "ConfigNormalizationRules | None" has no attribute "validations" [union-attr]

airbyte_cdk/sources/declarative/spec/spec.py

[error] 72-72: mypy error: Item "None" of "list[ConfigTransformation] | None" has no attribute "iter" (not iterable) [union-attr]


[error] 76-76: mypy error: Argument 1 to "open" has incompatible type "Any | None"; expected "int | str | bytes | PathLike[str] | PathLike[bytes]" [arg-type]


[error] 80-80: mypy error: "AirbyteMessage" has no attribute "json" [attr-defined]


[error] 92-92: mypy error: Item "None" of "list[ConfigTransformation] | None" has no attribute "iter" (not iterable) [union-attr]


[error] 104-104: mypy error: Item "None" of "list[Validator] | None" has no attribute "iter" (not iterable) [union-attr]

airbyte_cdk/sources/declarative/models/declarative_component_schema.py

[error] 1-1: Ruff formatting check failed. File would be reformatted. Run 'ruff format' to fix code style issues.

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (28)
pyproject.toml (1)

93-94:

❌ Incorrect review comment

Dependencies added but not being used?

The pipeline analysis indicates that dagger-io and anyio dependencies are declared but not used in the codebase. Are these dependencies going to be used in future PRs or are they needed for the config migrations feature that's currently being implemented? Maybe these are transitive dependencies needed by other new code?


🏁 Script executed:

#!/bin/bash
# Let's check if these packages are imported anywhere in the code but the static analysis is missing it

echo "Checking for dagger-io imports..."
rg -l "import dagger" || echo "No direct dagger imports found"
rg -l "from dagger" || echo "No direct dagger imports found"

echo -e "\nChecking for anyio imports..."
rg -l "import anyio" || echo "No direct anyio imports found"
rg -l "from anyio" || echo "No direct anyio imports found"

echo -e "\nChecking the broader context of what's being added in this PR..."
fd -e py -e yaml -c 'git diff' | head -n 100

Length of output: 793


dagger-io & anyio are utilized in the bin script
bin/generate_component_manifest_files.py imports both dagger and anyio.

These dependencies are in use there, so they’re not unused. Does this cover all usage, or are you planning further integration? wdyt?

Likely an incorrect or invalid review comment.

airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1)

1-7: LGTM!

Clean and concise implementation of the module's __init__.py file. It follows best practices by explicitly declaring what symbols are exported from the module using __all__.

airbyte_cdk/sources/declarative/validators/validator.py (1)

1-18: LGTM!

Well-designed abstract base class with clear documentation. The method signature and docstring provide good guidance for implementers on how validation should be performed.

airbyte_cdk/sources/declarative/validators/__init__.py (1)

1-19: Looks good to me!
The new __all__ cleanly exposes the validator abstractions as intended. Nice work!

airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)

9-22: Well-designed abstract base class for validation strategies!

Good job creating a clean, concise interface for validation strategies with clear documentation and proper error handling expectations. This abstract class provides a solid foundation for the validation framework.
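The interface described above can be sketched as follows. This is an illustration under assumptions. The `validate(value)` signature, the `NonEmptyString` strategy, and the way `PredicateValidator` delegates are modeled on the descriptions in this review, not copied from the PR.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


class ValidationStrategy(ABC):
    """Checks a single value and raises ValueError when it is invalid."""

    @abstractmethod
    def validate(self, value: Any) -> None:
        """Raise ValueError if value is invalid; return None otherwise."""


class NonEmptyString(ValidationStrategy):
    """Hypothetical concrete strategy, used only for illustration."""

    def validate(self, value: Any) -> None:
        if not isinstance(value, str) or not value:
            raise ValueError(f"Expected a non-empty string, got {value!r}")


@dataclass
class PredicateValidator:
    """Applies a strategy to a fixed value; strategy errors propagate unchanged."""

    value: Any
    strategy: ValidationStrategy

    def validate(self) -> None:
        self.strategy.validate(self.value)
```

Because `ValidationStrategy` is an ABC with an abstract `validate`, forgetting to implement it in a subclass fails at instantiation time rather than at validation time.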

airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)

14-39: Solid implementation of the ValidationStrategy for JSON schema validation!

The implementation handles both string and non-string inputs gracefully, with clear error handling and good separation of concerns. Nice job adding the string-to-JSON conversion logic to make this validator more flexible.

airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)

43-43: Should path evaluation consider the input context?

I notice you're evaluating the path using an empty context {}. Is this intentional, or should it use input_data for interpolation? This might be important if the path needs to be dynamically resolved based on the input.
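One way to answer the question is to resolve the path against the data being validated. The sketch below is hypothetical. `DpathValidatorSketch` is not the PR's class. Also, `str.format_map` stands in for the CDK's Jinja-based interpolation, so placeholders here use `{key}` rather than `{{ ... }}` syntax.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Mapping


@dataclass
class DpathValidatorSketch:
    # Path segments; "{key}" placeholders are filled from the input data.
    # str.format_map is a stand-in for the CDK's Jinja interpolation (assumption).
    field_path: List[str]
    # Hypothetical strategy: any callable that raises ValueError on a bad value.
    strategy: Callable[[Any], None]

    def validate(self, input_data: Mapping[str, Any]) -> None:
        if not self.field_path:
            raise ValueError("field_path must not be empty")
        try:
            # Evaluate each segment against input_data rather than an empty context.
            path = [segment.format_map(input_data) for segment in self.field_path]
        except KeyError as exc:
            raise ValueError(f"Cannot resolve path placeholder {exc}") from exc
        current: Any = input_data
        for key in path:
            if not isinstance(current, Mapping) or key not in current:
                raise ValueError(f"Path {'/'.join(path)} not found in input data")
            current = current[key]
        # Only a value that was actually found reaches the strategy.
        self.strategy(current)
```

With an empty context, any segment that depends on the input cannot resolve to the intended key. That is the case the question above is probing.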

unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (4)

11-36: The test cases look thorough and well-structured

Good job testing both the positive case and verifying that the original config is preserved.


37-51: Great edge case coverage

This test effectively verifies that values not found in the mapping remain unchanged, which is an important edge case.


68-82: Well-designed test for interpolated path functionality

This test effectively demonstrates that fields can be remapped when paths contain dynamic components.


97-112: Good test for complex interactions

This test effectively verifies that multiple transformations can be applied sequentially without interference, which is important for real-world scenarios with multiple config transformations.

unit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py (3)

11-29: Well-structured test for successful validation

The test effectively verifies that valid data passes schema validation.


30-52: Great test for required field validation

This test effectively verifies that missing required fields result in appropriate validation errors with descriptive messages.


80-90: Good edge case handling for invalid schema

Testing with an invalid schema and checking for a specific error type demonstrates robust error handling.

airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)

48-56: Efficient path traversal logic

The navigation through nested dictionary structures is well implemented, with appropriate checks for existence and type, so the transformation safely handles incomplete or unexpected configurations.
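The traversal pattern described above can be sketched like this. It walks the nested dicts, checks existence and type at each step, and bails out otherwise. `remap_field` is a hypothetical function. Returning a deep copy instead of mutating the config in place is an assumption, not necessarily what remap_field.py does.

```python
import copy
from typing import Any, Dict, List, Mapping, MutableMapping


def remap_field(
    config: Mapping[str, Any], field_path: List[str], mapping: Mapping[Any, Any]
) -> Dict[str, Any]:
    """Return a copy of config with the value at field_path replaced via mapping.

    Missing keys, non-dict intermediates, and unmapped values leave the copy unchanged.
    """
    result: Dict[str, Any] = dict(copy.deepcopy(config))
    if not field_path:
        return result
    current: Any = result
    # Walk every segment except the last, checking existence and type at each step.
    for key in field_path[:-1]:
        if not isinstance(current, MutableMapping) or key not in current:
            return result
        current = current[key]
    last = field_path[-1]
    if isinstance(current, MutableMapping) and last in current:
        value = current[last]
        try:
            if value in mapping:
                current[last] = mapping[value]
        except TypeError:
            # Unhashable values (e.g. lists) cannot be looked up; leave them as-is.
            pass
    return result
```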

unit_tests/sources/declarative/validators/test_dpath_validator.py (5)

9-21: Clean mock implementation for ValidationStrategy.

The MockValidationStrategy implementation provides a clear way to track validation calls and simulate both success and failure cases. Recording whether validation was called, and with which value, makes the later assertions straightforward.
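A test double of this kind can be sketched as below. It is standalone so that it runs without airbyte_cdk. In the actual tests it would subclass the PR's ValidationStrategy. The constructor parameters and the attribute names `validate_called` and `validated_value` are illustrative assumptions.

```python
from typing import Any, Optional


class MockValidationStrategy:
    """Test double that records calls and optionally fails."""

    def __init__(self, should_fail: bool = False, error_message: str = "Validation failed") -> None:
        self.should_fail = should_fail
        self.error_message = error_message
        self.validate_called = False
        self.validated_value: Optional[Any] = None

    def validate(self, value: Any) -> None:
        # Record the call before possibly failing, so tests can inspect both paths.
        self.validate_called = True
        self.validated_value = value
        if self.should_fail:
            raise ValueError(self.error_message)
```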


24-34: Test for happy path looks good.

The test covers the expected behavior when both the path and validation are valid. The assertions effectively verify that the strategy was called and received a value.


35-46: Good error handling test.

This test correctly verifies that the validator raises a ValueError with an appropriate message when the path doesn't exist, and that the strategy isn't called in this case.


47-59: Well-structured strategy failure test.

Nice job testing that validation errors from the strategy are properly propagated. The assertions verify both the error and that the strategy was called with the correct value.


60-76: Good boundary condition tests.

These tests for empty path and empty input data are valuable edge cases that ensure robust error handling in the validator.

unit_tests/sources/declarative/validators/test_predicate_validator.py (4)

9-21: Well-designed mock implementation.

The MockValidationStrategy is consistent with the one in the test_dpath_validator.py file and properly implements the ValidationStrategy interface. This consistency across test files helps maintain clarity.


24-33: Good basic validation test.

This test effectively verifies that the validator passes the value to the strategy correctly and that validation succeeds when expected.


34-46: Thorough error handling test.

The test properly validates that errors from the strategy are propagated and that the error message is preserved in the exception. The additional assertions confirm the strategy was called with the right value.


47-56: Good test for complex objects.

Testing with a nested object is important since validators should handle complex data structures. The test confirms that the entire object is passed to the strategy unchanged.

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

4224-4257: PredicateValidator schema looks solid
The PredicateValidator definition aligns well with our validation abstractions and mirrors the Pydantic model perfectly. Nice work—no changes needed here, wdyt?


4259-4297: ValidateAdheresToSchema definition is good to go
This validator supports both string and object schemas as intended. Everything appears consistent and complete. wdyt?

airbyte_cdk/sources/declarative/spec/spec.py (1)

75-81: AirbyteMessage has no json() method, so serialize it with the CDK's serializer.

mypy flags json as missing because AirbyteMessage in the current CDK is a dataclass, not a pydantic model. The call would therefore also fail at runtime, and silencing it with a type: ignore or a cast would only hide the error. One option:

-for message in self.message_repository.consume_queue():
-    print(message.json(exclude_unset=True))
+for message in self.message_repository.consume_queue():
+    print(orjson.dumps(AirbyteMessageSerializer.dump(message)).decode())

This needs import orjson and from airbyte_cdk.models import AirbyteMessageSerializer. It follows the same approach the CDK entrypoint uses to serialize messages.

🧰 Tools
🪛 GitHub Actions: Linters

[error] 76-76: mypy error: Argument 1 to "open" has incompatible type "Any | None"; expected "int | str | bytes | PathLike[str] | PathLike[bytes]" [arg-type]


[error] 80-80: mypy error: "AirbyteMessage" has no attribute "json" [attr-defined]
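The arg-type error on line 76 means that a value typed `Any | None` reaches `open()`. The variable involved isn't shown here, so the sketch below uses a hypothetical `read_config(config_path)` helper to illustrate the fix. It narrows the Optional before opening the file.

```python
import json
from typing import Any, Mapping, Optional


def read_config(config_path: Optional[str]) -> Mapping[str, Any]:
    # Narrow Optional[str] to str before calling open(). After this check mypy
    # accepts the call, and a missing path fails with a clear message.
    if config_path is None:
        raise ValueError("A config path is required to migrate the config")
    with open(config_path) as config_file:
        config: Mapping[str, Any] = json.load(config_file)
    return config
```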

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

2069-2087: Ruff formatting failed – run ruff format.

The CI job reports a single formatting violation in this file. Running ruff format airbyte_cdk/sources/declarative/models/declarative_component_schema.py (or pre-commit run --all-files) will auto-fix the whitespace and line-wrapping issues. Note that ruff format does not sort imports; that is handled by ruff check with the isort (I) rules.

@pnilan pnilan changed the base branch from main to pnilan/feat/implement-validators May 12, 2025 18:15
@@ -90,6 +90,8 @@ sqlalchemy = {version = "^2.0,!=2.0.36", optional = true }
xmltodict = ">=0.13,<0.15"
anyascii = "^0.3.2"
whenever = "^0.6.16"
dagger-io = "^0.18.6"
anyio = "<4.0.0"
Contributor Author

Was getting an error when running poetry run poe assemble that was due to an anyio version clash.

Labels
enhancement New feature or request

1 participant