JSON spark reader plan for 24.12

These are the planned optimizations and bug fixes for the JSON Spark reader in the 24.12 release.

- Memory optimization PR https://github.com/rapidsai/cudf/pull/16978
- Runtime mitigation issue - Multi-stage FST implementation (Elias, Shruti) https://github.com/rapidsai/cudf/issues/17114
- Input schema issues/PR (new feature)
  - https://github.com/rapidsai/cudf/issues/17091
  - https://github.com/rapidsai/cudf/issues/17090
  - https://github.com/rapidsai/cudf/issues/17002
  - https://github.com/rapidsai/cudf/issues/16799
  - https://github.com/rapidsai/cudf/issues/16797
  - PR https://github.com/rapidsai/cudf/pull/17029
- Performance: Preprocessing: nullify empty lines PR https://github.com/rapidsai/cudf/pull/17028
- Bugfix: an invalid last JSON line is not treated as an error - https://github.com/rapidsai/cudf/issues/16999
- Bugfix: disable array of arrays for spark - https://github.com/rapidsai/cudf/pull/17030
- Performance: mega kernel - https://github.com/rapidsai/cudf/issues/16965


- https://github.com/NVIDIA/spark-rapids/issues/11560 (input schema, and post-processing that moves columns without copying)
  - https://github.com/NVIDIA/spark-rapids-jni/pull/2510

JSON spark reader plan for 24.12 #17138