Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major database schema updates: new tables for mwcs, stretching, wct params #395

Open
wants to merge 25 commits into
base: master
Choose a base branch
from

Conversation

asyates
Copy link
Contributor

@asyates asyates commented Feb 6, 2025

EDIT: This pull request now developed into big revamp of database schema for msnoise 2

Afew things added/changed. I still want to make some further adjustments (listed below), but if there is any feedback on these changes i.e. whether they're wanted or not, or if there is a better way of doing it, would be useful before merging.

Changes:

  • dtt_minlag, dtt_width, and dtt_v, moved to filter table. I've always personally, at least for dtt_minlag and dtt_width, adjusted the code to have these defined for individually for each filter as makes the most sense to me (especially for single station). dtt_v, could go either way... ofc could expect different ballistic velocity at different frequencies, but seems less important maybe to define individually.

  • config sql table contains used_in field which is a list of steps where it is used/defined. It can be used as a filter on msnoise admin (originally was looking to do this without having it as a column in the sql table, only in defaults.csv, but was easier just including it in the config table... and just not showing it on admin).

  • adjusted some of parameters in wavelet codes (from using 'dtt' to 'wct') to make it clear they are wavelet related, e.g. wct_minlag, wct_codacycles, wct_min_nonzero. Might rename some of these variables anyway actually, as i just chose them quickly when writing the code.

Some images of the changes

filter table new columns:
filter_admin1

dropdown option in config with used_in filter:
msnoise_admin3
msnoise_admin2

wct param name changes:
msnoise_admin4**

To do (related to param changes)

  • Double check documentation, probably needs some updates.
  • Adjust wavelet code to also include option for static lag times. Also, some of the 'default' choices should be changed.

As said, feedback welcome re. any of this!

Copy link

codecov bot commented Feb 6, 2025

Codecov Report

Attention: Patch coverage is 87.75367% with 175 lines in your changes missing coverage. Please review.

Project coverage is 64.97%. Comparing base (6fe4a6b) to head (acf35c6).

Files with missing lines Patch % Lines
msnoise/api.py 78.89% 84 Missing ⚠️
msnoise/msnoise_admin.py 71.65% 36 Missing ⚠️
msnoise/s08compute_wct.py 85.71% 16 Missing ⚠️
msnoise/s05compute_mwcs2.py 90.82% 10 Missing ⚠️
msnoise/s07_compute_dvv.py 74.07% 7 Missing ⚠️
msnoise/plots/mwcs.py 0.00% 5 Missing ⚠️
msnoise/s06compute_dtt2.py 94.20% 4 Missing ⚠️
msnoise/s03compute_no_rotation.py 91.89% 3 Missing ⚠️
msnoise/msnoise_table_def.py 97.77% 2 Missing ⚠️
msnoise/plots/dvvs.py 0.00% 2 Missing ⚠️
... and 4 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #395      +/-   ##
==========================================
+ Coverage   63.66%   64.97%   +1.31%     
==========================================
  Files          44       44              
  Lines        7672     8535     +863     
==========================================
+ Hits         4884     5546     +662     
- Misses       2788     2989     +201     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@asyates
Copy link
Contributor Author

asyates commented Feb 11, 2025

Removed mwcs_low and mwcs_high, and renamed 'low' and 'high' to freqmin and freqmax to match obspy definitions.

image

To do on this pull request

  • renaming and some additions of wct_params (static and dynamic lag time).
  • renaming of some dtt_ params to mwcs_ (e.g. dtt_coh, dtt_maxerr)
  • change of max_dt to max_dtt

@ThomasLecocq
Copy link
Member

Super work here @asyates ! so remaining mwcs_dtt_mincoh, mwcs_dtt_maxerr etc are going to be super easy to understand! that's cool !

Still thinking of splitting the MWCS+DTT ; WCT ; STR from the filter table... be in another table, and with links to the filter table's ref.. Allowing many-to-one reference

@ThomasLecocq
Copy link
Member

@asyates so the stretching parameters should also go in the filters' definition ?

@asyates
Copy link
Contributor Author

asyates commented Feb 12, 2025

@asyates so the stretching parameters should also go in the filters' definition ?

could do... though, at the same time, i don't usually consider to change these for individual filters (could be applicable, but for the most part I am choosing a value of stretching_max that will cover full range of possibilities i.e. can then just set stretching_nsteps larger if worried about resolution).

So i would say not necessary. Plus, the stretching is also using the dtt_params currently in the filter table, and wct also once i implement static/dynamic lag time option. So makes sense for them to be there rather than specific to stretching param .

maybe just need to rename i.e. rather than dtt_minlag and dtt_width.... lag_win_start, lag_win_length, as not specific to dt/t (or even just window_start and window_length). Thoughts @ThomasLecocq ?

@asyates
Copy link
Contributor Author

asyates commented Feb 12, 2025

Still thinking of splitting the MWCS+DTT ; WCT ; STR from the filter table... be in another table, and with links to the filter table's ref.. Allowing many-to-one reference

re. splitting this... do you envisage a separate table for each? i.e. table for mwcs+dtt, wct(+dtt), str? Or all in one table?

Things that come to mind immediately:

Pro of separate tables:

  • more user friendly. having all params in one table could be a lot, especially if not planning to use most e.g. just doing mwcs+dtt
  • easily understandable outputs i.e. DTTN for mwcs+dtt params in REF N position, WCTN for same with WCT, STRN etc. (so if have 4 entries for mwcs+dtt, would have DTT1, DTT2, DTT3, DTT4 as outputs.

Cons:

  • three more tables, so added complexity in this sense, but maybe not that big an issue.

@ThomasLecocq
Copy link
Member

maybe just need to rename i.e. rather than dtt_minlag and dtt_width.... lag_win_start, lag_win_length, as not specific to dt/t (or even just window_start and window_length)
agree with the renaming.

Still think stretching is a "dvv" method, and should be "after filter" (if you think of the steps as a chain)...

@ThomasLecocq
Copy link
Member

ThomasLecocq commented Feb 12, 2025

Yep, I'm really thinking of altering the schema (could be for msnoise 3) to include a DVV table that links to one (or more) filters...

the dvv table fields could be dynamic:
ref, filter_id, method, method_parameters

where method= ["mwcs", "wct", "stretching", new method...]

the difficulty here is to create a schema that accepts the params as such.

The way wordpress does is to store the param_name=value in different ROWS , effectively splitting the "dvv row" into:

1: ref = 1 ; param = filter_id, value = 1
2: ref = 1 ; param = method, value = mwcs
3: ref = 1 ; param = mwcs_freqmin, value= 0.1
...

chat GPT might have another idea for that :)

@ThomasLecocq
Copy link
Member

ThomasLecocq commented Feb 12, 2025

re: many tables: it's also a good idea, it doesn't really matter how many tables there are, and allows to easily add extra methods. Table naming : %prefix + "dvv_<method_slug>" : dvv_mwcs, dvv_stretching, dvv_wct, dvv_dtw etc...

with that approach, tables are effectively "objects" that can be linked to the filters individually

@asyates
Copy link
Contributor Author

asyates commented Feb 12, 2025

maybe just need to rename i.e. rather than dtt_minlag and dtt_width.... lag_win_start, lag_win_length, as not specific to dt/t (or even just window_start and window_length)
agree with the renaming.

Still think stretching is a "dvv" method, and should be "after filter" (if you think of the steps as a chain)...

Sure, it's definitely after the filter.... but i guess if you move stretching params into the filter table then we should also be moving all mwcs_dtt_ and maybe some wct_ params in there too.

Practically I have less reason to define these at the filter level (whereas the dtt_minlag and and dtt_width in my opinion should absolutely be defined for individual filters, and are used by all methods), but if you think logically it makes sense for them to go in I can. Personally, i might prefer to leave just the params used by all methods in the filter table for now, and later look at the option of a separate DVV table as you describe.

or ofc, we look at the separate DVV table(s) for msnoise 2, but certainly could get messy if we try and do this in the next 1-2 weeks ;)

@ThomasLecocq
Copy link
Member

ThomasLecocq commented Feb 12, 2025

agree globally to keep it simple:

just an edge case : if dtt_minlag is global, and dtt_lag=dynamic is set, it's all good for CC, but if you're doing SC, the stretching method should go back to minlag... but that minlag & width are also filter dependent, no ?

@ThomasLecocq
Copy link
Member

or did I just rewrite what you propose above ?

@asyates
Copy link
Contributor Author

asyates commented Feb 12, 2025

its a good point... i am not sure there is any check in place right now to catch if its SC + dynamic, because, as you suggest, it should then use minlag. I suspect it doesn't... (will double check and fix if not).

@ThomasLecocq
Copy link
Member

ThomasLecocq commented Feb 12, 2025

without going back in history, but... going back to a question you asked 2 years ago: Why don't we CCF broadband and then narrow band the MWCS: to do this, you'll need the dvv_mwcs table to still allow narrower band filters (N lines = N narrowband), since currently the MWCS process is doing this bandpass at the start:
image

this would allow:

1 CCF calculation + 10 MWCS narrowbands + 10 STR narrowbands + 1 WCT broadband --> to compare the outputs :-)

@asyates
Copy link
Contributor Author

asyates commented Feb 12, 2025

1 CCF calculation + 10 MWCS narrowbands + 10 STR narrowbands + 1 WCT broadband --> to compare the outputs :-)

I like this idea a lot, assuming we are comfortable that we do not expect strong differences between a narrow whiten/filter -> dvv versus broad whiten -> narrow fillter -> dvv. I have done small testing, but certainly not exhaustive, towards verifying that this is the case.

This would also mean the death of the filter table i guess? (just using preprocess_lowpass/_highpass).

Edit: unless you would intend still to define filters towards MWCS and STR computation separate from dvv_mwcs and dvv_stretching tables

@ThomasLecocq
Copy link
Member

I like this idea a lot, assuming we are comfortable that we do not expect strong differences between a narrow whiten/filter -> dvv versus broad whiten -> narrow fillter -> dvv. I have done small testing, but certainly not exhaustive, towards verifying that this is the case.

remains an open question for me, the changes you make now will allow benchmarking :)

This would also mean the death of the filter table i guess? (just using preprocess_lowpass/_highpass).

wrong, it's still nice to be able to have different filters too... e.g. to apply filter 1 to CC+AC+SC and filter 2 to AC+SC only? (yes, that would be an extra parameter) :-)

@asyates asyates changed the title dtt params to filter table + general config updates Major database schema updates: new tables for mwcs, stretching, wct params Feb 13, 2025
@asyates
Copy link
Contributor Author

asyates commented Feb 16, 2025

Okay... i think all the core functionality is now implemented. I've documented the big changes below. Still some things left to do (also documented below), but wondering whether worth merging this request now and doing the rest as further smaller pull requests (since this is already large). Not sure why one test currently getting held up on updating environment, but otherwise they all pass ;)

Key changes:

Many to many relationships between different steps through association tables
By far the biggest change: New individual tables for mwcs/stretching/wct params and mwcs_dtt/wct_dtt params. For mwcs/stretching/wct params it is now possible to define many different parameter choices and point them to one or many filters. Similarly many mwcs_dtt/wct_dtt_params can be defined and pointed to many mwcs/stretching/wct params. The relationships between different levels/steps is stored in association tables, and displayed dynamically in msnoise admin. For updating/inserting via api, a list of ids is used to update the association tables (e.g. defining which filter_ids a particular set of dvv params should point to).

When saving results, the convention follows filt_id/level1_id/level2_id.... where mwcs/stretching/wct_params are level one, and dtt params are level 2. Note, no intermediate wavelet transform results are saved (prior to dt/t) as the size of such files would be huge (i checked ;) ). So the wavelet transform + dt/t are still performed in one step for now... however its still now possible to have many to many relationship between different wavelet params and wavelet_dtt params.

Fig 1:
image

Fig 2:
image

Fig 3:
image

Fig 4:
image

Fig 5.
image

Key point is that we are no longer relying on single global values for various parameters involved in DVV steps.

Toggles in filter table for CC/SC/AC
Haven't tested extensively, but should now be able to turn off CC/SC/AC at the filter level for the different steps. It is still defined globally in the config table... as it was being used in new_jobs creation so left this... but most steps will now extract the relevant components based on the toggles in the filter table.

image

To do (outside of this pull request?):

  • Tidy up documentation, and also update logger outputs to reflect many to many relationships (i.e. looping through different param choices).
  • Some of params for stretching and wct_dtt not implemented yet (some i added because should be consistent across different methods e.g. static/dynamic choice, choice of which side of CCF etc.
  • maxdt needts to be updated to maxdtt
  • Maybe less important, but im sure some choices of variable names could be made more intuitive... but that's easy to change later.

I had thought about having one entry in each table already in place... but might get confusing if someone working with just api (i.e. maybe not aware it's there). So i think easiest way to will be to develop a quick_start notebook.

@ThomasLecocq
Copy link
Member

MASSIVE work here @asyates !!!!

Is the config max_dt or max_dtt now ?

It's of course super hard to review such a massive change, I'll try to give it a go in the next days !

@asyates
Copy link
Contributor Author

asyates commented Feb 20, 2025

max_dtt now ;)

note, this is just for mwcs. It is still maxdt for wavelet. Both stretching and wavelet also need some work to finish their coda window param choices (i have included some params as placeholders that are currently not implemented).

@ThomasLecocq
Copy link
Member

Sorry for the delay, it was too much for Skience, and I need time to test it :-) Was wondering if it'd be clever now to change the folder on the disk's naming from "01" to "f01", for filters, 'mwcs01' etc ?
that tree view is a bit scary (although I like the idea of forcing users to use the API... but yay)
image

@asyates
Copy link
Contributor Author

asyates commented Mar 10, 2025

Agree re. improving the folder names. Can look to make this change next week (this week fully booked). Don't hesitate ofc if anything else comes to mind in mean time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants