Retrieve inlet dataset events through dataset aliases #40809

Merged (5 commits) on Jul 23, 2024

Lee-W (Member) commented Jul 16, 2024

Related: #40039
Depends on: #40693

What

Following #40723, #40478 and #40693, this PR makes it possible to read, through inlet_events, the dataset events of the datasets that a DatasetAlias has resolved to.

Example

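    # Imports assume an Airflow release that ships DatasetAlias.
    from airflow import DAG
    from airflow.datasets import Dataset, DatasetAlias
    from airflow.decorators import task

    with DAG(dag_id="dataset-alias-producer"):

        @task(outlets=[DatasetAlias("example-alias")])
        def produce_dataset_events(*, outlet_events):
            # Attach a concrete dataset, plus event metadata, to the alias at runtime.
            outlet_events["example-alias"].add(Dataset("s3://bucket/my-task"), extra={"row_count": 1})


    with DAG(dag_id="dataset-alias-consumer", schedule=None):

        @task(inlets=[DatasetAlias("example-alias")])
        def consume_dataset_alias_events(*, inlet_events):
            # Look up the dataset events reachable through the alias.
            events = inlet_events[DatasetAlias("example-alias")]
            last_row_count = events[-1].extra["row_count"]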
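
The lookup pattern in the example above can be tried without an Airflow installation using a toy model. The sketch below is only an illustration under stated assumptions: `ToyEvent`, `ToyAliasStore`, and `last_extra` are hypothetical names invented here, and the store does not reflect how Airflow persists or selects events internally. It shows a producer adding events through an alias name, a consumer reading them back in order, and a guard for the case where no events exist yet (where `events[-1]` would raise `IndexError`).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyEvent:
    """Hypothetical stand-in for a dataset event; not Airflow's model."""

    uri: str
    extra: dict = field(default_factory=dict)


class ToyAliasStore:
    """Toy model (an assumption): events are grouped by the alias they were added through."""

    def __init__(self):
        self._events_by_alias = defaultdict(list)  # alias name -> events, oldest first

    def add(self, alias, uri, extra=None):
        # Producer side: record an event for a dataset attached to the alias.
        self._events_by_alias[alias].append(ToyEvent(uri=uri, extra=dict(extra or {})))

    def inlet_events(self, alias):
        # Consumer side: return a copy of the events recorded for the alias.
        return list(self._events_by_alias.get(alias, []))


def last_extra(events, key, default=None):
    # Guard against an empty sequence instead of indexing events[-1] directly.
    return events[-1].extra.get(key, default) if events else default


store = ToyAliasStore()
store.add("example-alias", "s3://bucket/my-task", extra={"row_count": 1})
store.add("example-alias", "s3://bucket/other-task", extra={"row_count": 7})

events = store.inlet_events("example-alias")
last_row_count = last_extra(events, "row_count")
no_events_row_count = last_extra(store.inlet_events("unknown-alias"), "row_count")
```

In a real consumer task the same guard may be worth keeping, since an alias that has not yet had any dataset added to it could plausibly yield no events.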


Lee-W force-pushed the inlet-dataset-alias branch 3 times, most recently from 35830b4 to cc7f776, on July 16, 2024 13:16

Lee-W (Member, Author) commented Jul 16, 2024

This one is ready for review. I'm keeping it as a draft because it depends on #40693.

Lee-W changed the title from "Inlet dataset alias" to "Retrieve inlet dataset events thorugh dataset aliases" on Jul 17, 2024
Lee-W changed the title from "Retrieve inlet dataset events thorugh dataset aliases" to "Retrieve inlet dataset events through dataset aliases" on Jul 17, 2024
Lee-W force-pushed the inlet-dataset-alias branch 15 times, most recently from 4b2ca9d to d6b3400, on July 22, 2024 13:44
Lee-W marked this pull request as ready for review on July 22, 2024 13:46
Lee-W requested a review from potiuk as a code owner on July 22, 2024 13:46
Lee-W force-pushed the inlet-dataset-alias branch from d6b3400 to a4425e7 on July 23, 2024 06:15
Lee-W force-pushed the inlet-dataset-alias branch from a4425e7 to a94430b on July 23, 2024 06:38
Lee-W merged commit e9d2d5c into apache:main on Jul 23, 2024
48 checks passed
Lee-W deleted the inlet-dataset-alias branch on July 23, 2024 07:12
ephraimbuddy added the type:new-feature (Changelog: New Features) label on Jul 24, 2024
Labels: area:db-migrations (PRs with DB migration), kind:documentation, type:new-feature (Changelog: New Features)