[FEA] Generate element labels from offsets #10905

Closed
@ttnghia

Description

In some cases, for a list column, we want to generate a label for each element in the child column, where the label is the index of the list that contains the element.

For example, given a list column [ [1, 2, 3], [4, 5], [6, 7, 8] ], we want to generate the label column [0, 0, 0, 1, 1, 2, 2, 2].
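
To make the mapping concrete, here is a minimal host-side C++ sketch of deriving labels from list offsets. It is not a libcudf API: the name `labels_from_offsets` and the use of plain `std::vector` are assumptions made for illustration. It assumes the usual offsets layout with one more entry than there are lists, so the example column above would have offsets [0, 3, 5, 8]. Each element looks up its own list with a binary search over the offsets, so the loop iterations are independent of one another.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical helper, not part of libcudf.
// `offsets` has (number of lists + 1) entries; list i covers the child
// positions [offsets[i], offsets[i + 1]).
std::vector<int32_t> labels_from_offsets(std::vector<int32_t> const& offsets)
{
  if (offsets.empty()) { return {}; }

  auto const num_elements = offsets.back() - offsets.front();
  std::vector<int32_t> labels(num_elements);

  for (int32_t i = 0; i < num_elements; ++i) {
    // Position of this element in the (possibly sliced) child column.
    auto const pos = offsets.front() + i;
    // Find the first offset strictly greater than pos; the list that
    // contains pos is the one just before it. Empty lists are skipped
    // naturally because their begin and end offsets are equal.
    auto const it = std::upper_bound(offsets.begin(), offsets.end(), pos);
    labels[i]     = static_cast<int32_t>(std::distance(offsets.begin(), it)) - 1;
  }
  return labels;
}
```

Calling `labels_from_offsets({0, 3, 5, 8})` yields [0, 0, 0, 1, 1, 2, 2, 2], matching the example above. Because every iteration only reads the offsets and writes its own output slot, the same per-element lookup could in principle be assigned to one GPU thread per element.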

With such a label column, we can combine the child column (i.e., [1, 2, 3, 4, 5, 6, 7, 8]) with the label column for further processing. A use case for such a label column already exists in drop_list_duplicates (link).
The next use case would be set-like operations (#10409), where we want to process all elements in the child column in parallel (i.e., one element per thread) instead of one list per thread.
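
As an illustration of combining the two columns, the sketch below removes duplicates within each list by treating every (label, value) pair as a single key. It is only a host-side sketch under stated assumptions: `distinct_per_list` is a hypothetical name, and this is not claimed to be how drop_list_duplicates or the set-like operations are implemented. Note that this approach sorts the values within each list rather than preserving their original order.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not part of libcudf.
// Returns the (labels, values) of the child column after dropping values
// that repeat within the same list. `labels` and `child` must have the
// same length.
std::pair<std::vector<int32_t>, std::vector<int32_t>> distinct_per_list(
  std::vector<int32_t> const& labels, std::vector<int32_t> const& child)
{
  // Pair each element with its label so equal values in different lists
  // remain distinct keys.
  std::vector<std::pair<int32_t, int32_t>> keyed(child.size());
  for (std::size_t i = 0; i < child.size(); ++i) {
    keyed[i] = {labels[i], child[i]};
  }

  // Sort by (label, value), then drop adjacent equal pairs.
  std::sort(keyed.begin(), keyed.end());
  keyed.erase(std::unique(keyed.begin(), keyed.end()), keyed.end());

  std::vector<int32_t> out_labels;
  std::vector<int32_t> out_values;
  out_labels.reserve(keyed.size());
  out_values.reserve(keyed.size());
  for (auto const& [label, value] : keyed) {
    out_labels.push_back(label);
    out_values.push_back(value);
  }
  return {out_labels, out_values};
}
```

The returned labels are still grouped by list, so new list offsets can be rebuilt from them by counting how many elements carry each label.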

Labels

Spark: Functionality that helps Spark RAPIDS
feature request: New feature or request
libcudf: Affects libcudf (C++/CUDA) code.
non-breaking: Non-breaking change
