[BACKGROUND] Understanding process_length #603

Open
@ylseanna

What issue am I encountering?

While setting up long-term template-matching processing (I eventually hope to process approximately 10 years of data), I am hitting some conceptual hurdles. I want more control over how I store, process, save and load my templates (tribes) and detections (parties), and over how I process data in overlapping chunks. While looking into this, I came across the 'process_len' parameter.

I see how choosing a different processing length can affect the FFTs for both the pre-processing of the input stream and the template matching over a given data span. It seems, however, that it is important to keep process_len the same between the pre-processing of the data (the pre-processing for tribe.construct/tribe.detect) and the template matching (FMF in tribe.detect).

But with the overlaps needed for continuous processing (e.g. creating templates over long periods, or particularly detecting over longer periods), keeping both the same seems very cumbersome. For instance, judging from the output of tribe.detect, data is fed to FMF in chunks of approximately a day, with overlaps dealt with in a smaller chunk at the end, which seems to break this requirement for consistent process lengths.
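
To make concrete what I mean by overlapping chunks with a shorter one at the end, here is a minimal sketch in plain Python. It is not EQcorrscan's implementation and only illustrates the chunking I think I am seeing. The helper `chunk_windows`, the 600 s overlap and the dates are hypothetical, chosen purely for illustration.

```python
from datetime import datetime, timedelta


def chunk_windows(start, end, process_len, overlap):
    """Yield (chunk_start, chunk_end) pairs covering [start, end].

    process_len and overlap are in seconds. Consecutive chunks share
    `overlap` seconds. The final chunk is truncated at `end`, so it can
    be shorter than process_len (an assumption of this sketch, not a
    statement about how EQcorrscan handles the last chunk).
    """
    if not 0 <= overlap < process_len:
        raise ValueError("overlap must be >= 0 and shorter than process_len")
    step = timedelta(seconds=process_len - overlap)
    length = timedelta(seconds=process_len)
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + length, end)
        yield chunk_start, chunk_end
        if chunk_end == end:
            break
        chunk_start += step


# Hypothetical example: 2.5 days of data, one-day chunks, 10-minute overlap.
windows = list(chunk_windows(
    start=datetime(2020, 1, 1),
    end=datetime(2020, 1, 3, 12),
    process_len=86400,
    overlap=600,
))
for chunk_start, chunk_end in windows:
    print(chunk_start, chunk_end, (chunk_end - chunk_start).total_seconds())
```

With these numbers the first two chunks are a full 86400 s, while the last one is only 44400 s, which is the kind of shorter trailing chunk that makes me wonder whether a consistent process length is really required.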

Main question (/tl;dr)

Can I vary the process length between data pre-processing and the template detection itself? E.g. could I pre-process data outside tribe.construct/detect and then use different data lengths for detection (say, running tribe.detect over two, three or ten days while manually pre-processing the input streams day by day)?

Why am I asking the question?

  • It seems you have given a lot of thought to the significance of process_len, and I'm afraid of missing some assumptions within a large code base (before deep-diving into my own experiments).
  • I have access to significant amounts of memory on an HPC cluster, so I am more inclined to load a lot of data and process it in one big batch to save on read/write (as opposed to your client implementation, whose processing I may be repeating a little).

Thank you!
