Description
These are the planned optimizations and bug fixes for the Spark JSON reader in the 24.12 release.
- Memory optimization PR: JSON tokenizer memory optimizations #16978
- Runtime mitigation issue, multi-stage FST implementation (Elias, Shruti): [FEA] Faster path for calculating total output symbols in FST #17114
- Input schema issue/PR (New Feature)
  - [FEA] `read_json` should output all-nulls columns for the schema columns that do not exist in the input #17091
  - [FEA] The output columns of `read_json` need to follow depth-first-search order as in the input schema #17090
  - [FEA] Implement a better JNI function to assemble the output columns from `cudf::read_json` #17002
  - [BUG] `cudf::io::read_json` does not verify output column structures with the input schema #16799
  - [BUG] Requested types ignored if prune_schema is enabled for JSON reading #16797
  - PR: Add optional column_order in JSON reader #17029
  - [FEA]
- Performance: preprocessing, nullify empty lines. PR: add option to nullify empty lines #17028
- Bugfix: the last invalid JSON line is not treated as an error. [BUG] `cudf::read_json` incorrectly parses invalid JSON string #16999
- Bugfix: disable array of arrays for Spark. disable array of arrays for recovery with null #17030
- Performance: mega kernel. [FEA] Implement merged 'mega' kernel to parse leaf-level columns in JSON reader #16965
- [FEA] Improve `GpuJsonToStructs` performance NVIDIA/spark-rapids#11560 (input schema, and post-processing to move columns without copying)
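
To make the input-schema items (#17091, #17090) concrete, here is a minimal pure-Python sketch of the requested output shape: columns come back in depth-first order of the input schema, and schema columns missing from the input come back as nulls. This is not the cudf implementation or API. The schema representation (nested dicts), the helper names `dfs_column_paths` and `conform_row`, and the choice to null out a struct whose input value is not an object are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional, Union

# Hypothetical schema representation (not cudf's): a leaf is a type name,
# a struct is a dict of field name -> nested schema, in declaration order.
Schema = Union[str, Dict[str, "Schema"]]


def dfs_column_paths(schema: Dict[str, Schema], prefix: str = "") -> List[str]:
    """Return column paths in depth-first (pre-order) schema order."""
    paths: List[str] = []
    for name, child in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        paths.append(path)
        if isinstance(child, dict):
            # Visit the children of a struct before moving to its next sibling.
            paths.extend(dfs_column_paths(child, path))
    return paths


def conform_row(
    row: Optional[Dict[str, Any]], schema: Dict[str, Schema]
) -> Optional[Dict[str, Any]]:
    """Reorder one parsed row to schema order.

    Fields absent from the input become None; fields not in the schema
    are dropped. A struct whose input value is missing or is not an
    object becomes None (an assumption of this sketch).
    """
    if row is None:
        return None
    out: Dict[str, Any] = {}
    for name, child in schema.items():
        value = row.get(name)
        if isinstance(child, dict):
            out[name] = conform_row(value if isinstance(value, dict) else None, child)
        else:
            out[name] = value
    return out


schema = {"a": "int", "s": {"x": "str", "y": "int"}, "b": "str"}
rows = [{"b": "hi", "s": {"y": 1}}, {"a": 2, "extra": True}]

print(dfs_column_paths(schema))
# ['a', 's', 's.x', 's.y', 'b']
print([conform_row(r, schema) for r in rows])
# [{'a': None, 's': {'x': None, 'y': 1}, 'b': 'hi'}, {'a': 2, 's': None, 'b': None}]
```

In this sketch the output layout depends only on the schema, never on which keys happen to appear in the data, which is the property the `GpuJsonToStructs` post-processing would need in order to move columns without copying.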