Description
This is a summary thread for a few slowdowns that I noticed when handling large-ish datasets (e.g., 15000 templates x 500 channels x 1800 samples). I'm not calling them "bottlenecks" because it is rather the sum of them together that takes extra time; none of these slowdowns changes EQcorrscan fundamentally.
I will create a PR for each point where I have a suggested solution so that we can systematically merge, improve, or reject the suggestions. I will add the links to the PRs here, but I'll need to organize a bit for that.
Here are the slowdowns (tested with Python 3.11, all in serial code):
- `tribe._group_templates`: 50x speedup
  - Problem: the O(n^2) double loop takes ~20 s for some thousand templates. Sped up with a single loop to <0.5 s.
  - PR: Speedup 01: Quicker grouping of templates #524
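A minimal sketch of the single-loop idea (hypothetical helper, not the actual code in #524): hash each template's channel set once and group via a dict, instead of comparing every template against every other one.

```python
from collections import defaultdict

def group_by_channel_set(templates):
    """Group templates that share a channel set in one O(n) pass."""
    groups = defaultdict(list)
    for template in templates:
        # frozenset of trace IDs is hashable, so it can key the dict directly
        key = frozenset(tr.id for tr in template.st)
        groups[key].append(template)
    return list(groups.values())
```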
- `preprocessing._prep_data_for_correlation`: 3x speedup for the function
  - Problem: with heterogeneous templates (i.e., many templates with different station setups), filling the templates with NaN-channels takes a long time (many copy operations in serial). Ca. 150 s -> 50 s for 1500 templates with up to 500 channels.
  - PR: Speedup 02: 3x speed up for prep_data_for_correlation with custom copy and trace-selection #525
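To illustrate the kind of copy that dominates here, a sketch (hypothetical helper; the PR's custom copy may differ): build each NaN-padded dummy trace directly from a numpy array rather than deepcopying an existing trace per missing channel.

```python
import numpy as np
from obspy import Trace

def make_nan_trace(trace_id, npts, sampling_rate, starttime):
    """Build a NaN-padded dummy trace for a channel missing from a template."""
    net, sta, loc, cha = trace_id.split(".")
    header = {"network": net, "station": sta, "location": loc, "channel": cha,
              "sampling_rate": sampling_rate, "starttime": starttime}
    return Trace(data=np.full(npts, np.nan, dtype=np.float32), header=header)
```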
- `matched_filter.match_filter` with `copy_data=True`: 3x speedup for the copy
  - Problem: copying the streams (templates and continuous data) is slow because deepcopy of streams/traces is slow. Custom copy functions speed it up by a factor of ~3: ca. 18 s previously for 300 24-h traces, now ~6 s.
  - PR (same as above): Speedup 02: 3x speed up for prep_data_for_correlation with custom copy and trace-selection #525
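A minimal sketch of such a custom copy, assuming the per-trace Stats only need a shallow copy for this use case:

```python
import copy
from obspy import Stream, Trace

def quick_copy_stream(st):
    """Cheaper stand-in for st.copy(): copy the data arrays explicitly and
    the Stats shallowly, skipping the generic deepcopy machinery."""
    traces = []
    for tr in st:
        # caveat: nested stats entries stay shared; fine as long as the
        # metadata is treated as read-only afterwards
        traces.append(Trace(data=tr.data.copy(), header=copy.copy(tr.stats)))
    return Stream(traces=traces)
```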
- `detection`, `lag_calc`, `pre_processing`: 4x / 100x speedup for trace selection
  - Problem: trace selection by trace ID from a stream is still a slowdown (even after the ~10x speedup for selecting traces from a Stream by non-wildcarded trace ID in obspy/obspy#2886). Can be sped up 4x with a simplified function and ~100x with a dict lookup for streams with a fixed order of traces.
  - PR: Speedup 04: quick trace selection #526
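The dict-lookup variant amounts to this sketch (illustrative only): build the index once while the stream is not being mutated, then select by exact trace ID in O(1).

```python
from collections import defaultdict

def build_trace_dict(st):
    """Index a stream once: trace ID -> traces (IDs can repeat in gappy data)."""
    trace_dict = defaultdict(list)
    for tr in st:
        trace_dict[tr.id].append(tr)
    return trace_dict

# Usage: each lookup is O(1) instead of a scan over the whole stream, but the
# index is only valid while the stream keeps its fixed order of traces.
# traces = build_trace_dict(stream)["NZ.WEL.10.HHZ"]
```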
- `core.match_filter.family._uniq`: 1.9x speedup
  - Problem: retrieving the unique list of detections is quicker for many detections with `list(set(...))` (1.9x speedup for 43000 detections, fastest: 3.1 s), but 1.2x slower for small sets (e.g., 430 detections; 50 ms --> 27 ms).
  - PR: Speedup 05: Retrieve unique detections in family and in `matched_filter` #527
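For context, the trade-off in a sketch (the loop variant stands in for the slow path; the set variant relies on `Detection` implementing consistent `__eq__` and `__hash__`):

```python
def uniq_loop(detections):
    """Order-preserving, but the membership test makes this O(n^2)."""
    unique = []
    for det in detections:
        if det not in unique:
            unique.append(det)
    return unique

def uniq_set(detections):
    """O(n) for large inputs, but does not preserve the input order."""
    return list(set(detections))
```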
- `core.match_filter.detect`: 1000x speedup for many calls to `family._uniq`
  - Problem: using `family._uniq` in a loop over all families is still rather slow. Checking tuples of `(detection.id, detection.detect_time, detection.detect_val)` with `numpy.unique` and avoiding the loop is 1000x faster: from 752 s to <1 s for 82000 detections.
  - PR (same as above): Speedup 05: Retrieve unique detections in family and in `matched_filter` #527
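A sketch of the vectorized deduplication, assuming `(id, detect_time, detect_val)` identifies a detection (field widths and dtypes below are illustrative):

```python
import numpy as np

def uniq_detections(detections):
    """Deduplicate detections across all families in one numpy.unique call."""
    keys = np.array(
        [(d.id, float(d.detect_time), d.detect_val) for d in detections],
        dtype=[("id", "U120"), ("time", "f8"), ("val", "f8")])
    # unique rows of the structured array; return_index points back at the
    # first occurrence of each unique key
    _, idx = np.unique(keys, return_index=True)
    return [detections[i] for i in sorted(idx)]
```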
- `matched_filter._group_detect`: 30x speedup in handling detections
  - Problem: selecting detections by template name from a big list can be slow via a loop; a dict lookup offers ~50x speedup.
  - Problem: adding `prepick` to many picks (e.g., 400k) is somewhat slow because of `UTCDateTime` overhead; adding to `pick.time.ns` directly gives a ~4x speedup.
  - PR: Speedup 06: 30x speedup for _group_detect - handling detections #528
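Sketches for both points (hypothetical helpers; #528 may implement them differently). The second assumes obspy's nanosecond constructor `UTCDateTime(ns=...)`.

```python
from collections import defaultdict
from obspy import UTCDateTime

def detections_by_template(detections):
    """One O(n) pass instead of scanning the full list per template name."""
    lookup = defaultdict(list)
    for det in detections:
        lookup[det.template_name].append(det)
    return lookup

def add_prepick(picks, prepick):
    """Shift many picks by prepick seconds on integer nanoseconds,
    avoiding per-pick UTCDateTime arithmetic overhead."""
    shift_ns = int(prepick * 1e9)
    for pick in picks:
        pick.time = UTCDateTime(ns=pick.time.ns + shift_ns)
```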
Is your feature request related to a problem? Please describe.
All the points in the list above occur in parts of the code where parallelization cannot help speed up execution. When running EQcorrscan on a big cluster, it's wasteful to spend as much time reorganizing data in serial as it takes to run the well-parallelized template-matching correlations etc.
Three more slowdowns where parallelization can help:
- `core.match_filter`: 2.5x speedup for the MAD threshold calculation
  - Problem: `np.median(np.abs(cccsum))` for each cccsum takes a lot of time when there are many cccsums. The only quicker solution I found was to parallelize the operation, which, surprisingly, already pays off for problems with more than ~15 cccsums. The speedup is only ~2.5x; even though that matters a lot for many cccsums (e.g., 2000: 20 s vs. 50 s), it feels like there is potential for even more speedup.
  - PR: Speedup 07: 2.5x speedup with parallel median / MAD calculation #531
- `detection._calculate_event`: 35% speedup in parallel
  - Problem: calling this for many detections is slow when a lot of events need to be created. Parallelization can speed this up a bit (35% for 460 detections in the test case).
- `utils.catalog_to_dd.write_correlations`: 20% speedup using shared memory
  - Problem: starting each worker for one reference event is slow because the event and stream for all neighbors of the reference event need to be pickled and sent to the worker. Some shared memory should help here (20% speedup by moving the trace.data numpy arrays into shared memory).
  - PR: Speedup 09: Use shared memory for hypo-dd write_correlations for 20 % speedup #529
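A sketch of the shared-memory hand-off (illustrative names; Python >= 3.8): the parent copies each `trace.data` array into a `multiprocessing.shared_memory` block once, and workers rebuild the arrays from the block name instead of unpickling the samples.

```python
import numpy as np
from multiprocessing import shared_memory

def share_array(data):
    """Parent side: copy a numpy array into shared memory once. Keep the
    returned block alive (and unlink it when done); pass the metadata
    tuple to the workers."""
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    view = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    view[:] = data[:]
    return shm, (shm.name, data.shape, data.dtype.str)

def attach_array(name, shape, dtype):
    """Worker side: view the shared block as a numpy array without copying."""
    shm = shared_memory.SharedMemory(name=name)
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)
```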