Major database schema updates: new tables for mwcs, stretching, wct params #395
base: master
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #395      +/-   ##
==========================================
+ Coverage   63.66%   64.97%   +1.31%
==========================================
  Files          44       44
  Lines        7672     8535     +863
==========================================
+ Hits         4884     5546     +662
- Misses       2788     2989     +201
Removed mwcs_low and mwcs_high, and renamed 'low' and 'high' to freqmin and freqmax to match the ObsPy definitions.
To do on this pull request:
Super work here @asyates! So the remaining mwcs_dtt_mincoh, mwcs_dtt_maxerr, etc. are going to be super easy to understand! That's cool! I'm still thinking of splitting MWCS+DTT, WCT, and STR out of the filter table, so that they live in another table with links to the filter table's ref, allowing a many-to-one reference.
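A minimal sketch of the many-to-one link proposed here, using Python's built-in `sqlite3` module: several MWCS parameter rows point at a single filter through a foreign key. The table and column names (`filters`, `mwcs_params`, `filter_ref`, and the choice of `mwcs_wlen`/`mwcs_step` as columns) are assumptions for illustration, not the actual MSNoise schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE filters (
    ref     INTEGER PRIMARY KEY,
    freqmin REAL NOT NULL,
    freqmax REAL NOT NULL
);
CREATE TABLE mwcs_params (
    ref        INTEGER PRIMARY KEY,
    filter_ref INTEGER NOT NULL REFERENCES filters(ref),  -- many rows -> one filter
    mwcs_wlen  REAL,
    mwcs_step  REAL
);
""")
conn.execute("INSERT INTO filters VALUES (1, 0.1, 1.0)")

# Two different MWCS parameter sets attached to the same filter.
conn.executemany(
    "INSERT INTO mwcs_params (filter_ref, mwcs_wlen, mwcs_step) VALUES (?, ?, ?)",
    [(1, 10.0, 5.0), (1, 20.0, 10.0)],
)
rows = conn.execute(
    "SELECT f.ref, m.mwcs_wlen FROM mwcs_params m "
    "JOIN filters f ON f.ref = m.filter_ref ORDER BY m.ref"
).fetchall()
print(rows)  # [(1, 10.0), (1, 20.0)]
```

With foreign keys switched on, a parameter row pointing at a filter ref that does not exist is rejected by the database rather than silently stored.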
@asyates, so the stretching parameters should also go in the filters' definition?
Could do... though at the same time, I don't usually consider changing these for individual filters (it could be applicable, but for the most part I choose a value of stretching_max that covers the full range of possibilities, and can then just set stretching_nsteps larger if worried about resolution). So I would say it's not necessary. Plus, the stretching also uses the dtt_params currently in the filter table, and so will wct once I implement the static/dynamic lag time option. So it makes sense for them to be there rather than specific to the stretching params. Maybe we just need to rename them, i.e. rather than dtt_minlag and dtt_width, use lag_win_start and lag_win_length, as they are not specific to dt/t (or even just window_start and window_length). Thoughts @ThomasLecocq?
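A minimal sketch of a method-agnostic lag window, assuming a symmetric correlation function of odd length sampled at `fs` Hz with zero lag at the centre sample. `lag_win_start` and `lag_win_length` are the names proposed above; the helper `lag_window_indices` is hypothetical and not MSNoise code.

```python
def lag_window_indices(n_samples, fs, lag_win_start, lag_win_length):
    """Return ((start, stop) on negative lags, (start, stop) on positive lags).

    Hypothetical helper: assumes an odd-length, symmetric correlation function
    sampled at fs Hz, with zero lag at the centre sample. Ranges are half-open.
    """
    if n_samples % 2 == 0:
        raise ValueError("expected an odd number of samples")
    centre = n_samples // 2                  # index of zero lag
    first = int(round(lag_win_start * fs))   # samples from zero lag to window start
    width = int(round(lag_win_length * fs))  # window length in samples
    if centre + first + width > n_samples:
        raise ValueError("window extends beyond the correlation function")
    positive = (centre + first, centre + first + width)
    # Mirror image on the negative-lag side (same lags, opposite sign).
    negative = (centre - first - width + 1, centre - first + 1)
    return negative, positive


# 2001 samples at 20 Hz span -50 s .. +50 s; window from 5 s to 35 s.
print(lag_window_indices(2001, 20.0, 5.0, 30.0))  # ((301, 901), (1100, 1700))
```

Nothing in the window depends on whether MWCS, stretching, or WCT consumes it, which is the argument for a neutral name.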
Re. splitting this... do you envisage a separate table for each, i.e. a table for mwcs+dtt, wct(+dtt), and str? Or all in one table? Things that come to mind immediately: Pros of separate tables:
Cons:
I still think stretching is a "dvv" method, and should come "after filter" (if you think of the steps as a chain)...
Yep, I'm really thinking of altering the schema (could be for msnoise 3) to include a DVV table that links to one (or more) filters... The dvv table fields could be dynamic, with method = ["mwcs", "wct", "stretching", new method...]. The difficulty here is to create a schema that accepts the params as such. The way WordPress does it is to store param_name=value pairs in different ROWS, effectively splitting the "dvv row" into:
ChatGPT might have another idea for that :)
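The row-per-parameter ("entity-attribute-value") layout described above can be sketched with Python's `sqlite3` module: one `dvv` row per configured method and one `dvv_params` row per parameter. The table names, column names, and the helpers `add_dvv`/`get_dvv` are hypothetical.

```python
import sqlite3

# Hypothetical entity-attribute-value layout; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dvv (
    ref    INTEGER PRIMARY KEY,
    method TEXT NOT NULL            -- e.g. 'mwcs', 'wct', 'stretching'
);
CREATE TABLE dvv_params (
    dvv_ref INTEGER NOT NULL REFERENCES dvv(ref),
    name    TEXT NOT NULL,          -- one row per parameter
    value   TEXT NOT NULL,          -- stored as text, cast by the caller
    PRIMARY KEY (dvv_ref, name)
);
""")


def add_dvv(conn, method, params):
    """Insert one dvv entry and store each parameter as its own row."""
    ref = conn.execute("INSERT INTO dvv (method) VALUES (?)", (method,)).lastrowid
    conn.executemany(
        "INSERT INTO dvv_params VALUES (?, ?, ?)",
        [(ref, name, str(value)) for name, value in params.items()],
    )
    return ref


def get_dvv(conn, ref):
    """Rebuild (method, {name: value}) for one dvv entry."""
    (method,) = conn.execute("SELECT method FROM dvv WHERE ref = ?", (ref,)).fetchone()
    params = dict(conn.execute(
        "SELECT name, value FROM dvv_params WHERE dvv_ref = ? ORDER BY name", (ref,)))
    return method, params


ref = add_dvv(conn, "stretching", {"stretching_max": 0.01, "stretching_nsteps": 1000})
print(get_dvv(conn, ref))
# ('stretching', {'stretching_max': '0.01', 'stretching_nsteps': '1000'})
```

Adding a new method needs no schema change, but every value comes back as text, so type checking and casting move into application code.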
Re: many tables: it's also a good idea. It doesn't really matter how many tables there are, and it makes it easy to add extra methods. Table naming: %prefix + "dvv_<method_slug>": dvv_mwcs, dvv_stretching, dvv_wct, dvv_dtw, etc. With that approach, tables are effectively "objects" that can be linked to the filters individually.
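A minimal sketch of the one-table-per-method idea, generating `%prefix + "dvv_<method_slug>"` tables from a dictionary of column definitions with `sqlite3`. The `METHODS` column sets, the `filter_ref` column, and both helpers are assumptions for illustration; the real parameter lists would differ.

```python
import sqlite3

# Hypothetical per-method column sets; the real parameters would differ.
METHODS = {
    "mwcs": {"mwcs_wlen": "REAL", "mwcs_step": "REAL"},
    "stretching": {"stretching_max": "REAL", "stretching_nsteps": "INTEGER"},
    "wct": {"wct_codacycles": "INTEGER"},
}


def table_name(prefix, slug):
    """Proposed naming scheme: prefix + 'dvv_' + method slug."""
    return f"{prefix}dvv_{slug}"


def create_method_tables(conn, prefix=""):
    """Create one table per method, each linkable to a filter via filter_ref."""
    for slug, columns in METHODS.items():
        # Names come from the fixed METHODS dict, never from user input,
        # so building the DDL with f-strings is acceptable here.
        cols = ", ".join(f"{name} {sqltype}" for name, sqltype in columns.items())
        conn.execute(
            f"CREATE TABLE {table_name(prefix, slug)} ("
            f"ref INTEGER PRIMARY KEY, filter_ref INTEGER NOT NULL, {cols})"
        )


conn = sqlite3.connect(":memory:")
create_method_tables(conn, prefix="test_")
names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(names)  # ['test_dvv_mwcs', 'test_dvv_stretching', 'test_dvv_wct']
```

Supporting a new method such as dtw would then mean adding one entry to the dictionary, while each table keeps properly typed columns.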
Sure, it's definitely after the filter... but I guess if you move the stretching params into the filter table, then we should also move all the mwcs_dtt_ and maybe some wct_ params in there too. Practically, I have less reason to define these at the filter level (whereas dtt_minlag and dtt_width, in my opinion, should absolutely be defined for individual filters, and are used by all methods), but if you think it logically makes sense for them to go in, I can do that. Personally, I might prefer to leave just the params used by all methods in the filter table for now, and later look at the option of a separate DVV table as you describe. Or, of course, we look at the separate DVV table(s) for msnoise 2, but it could certainly get messy if we try to do this in the next 1-2 weeks ;)
Agree globally to keep it simple. Just one edge case: if dtt_minlag is global and dtt_lag=dynamic is set, it's all good for CC, but if you're doing SC, the stretching method should fall back to minlag... but that minlag & width are also filter dependent, no?
Or did I just rewrite what you proposed above?
It's a good point... I am not sure there is any check in place right now to catch the SC + dynamic case, where, as you suggest, it should then use minlag. I suspect there isn't... (will double check and fix if not).
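A minimal sketch of the check discussed here, written as a hypothetical helper (`effective_minlag` is not an MSNoise function). It assumes the dynamic window start is the interstation distance divided by `dtt_v`, and it treats AC like SC because both involve a single station; both points are assumptions.

```python
def effective_minlag(pair_type, dtt_lag, dtt_minlag, distance_km=None, dtt_v=None):
    """Start of the coda window, in seconds.

    Hypothetical helper. For a station pair (CC) with dtt_lag == "dynamic",
    the window starts at the ballistic arrival, distance / velocity. SC and AC
    involve a single station (zero distance), so they fall back to the static
    dtt_minlag.
    """
    if dtt_lag == "dynamic" and pair_type == "CC":
        if distance_km is None or dtt_v is None:
            raise ValueError("dynamic lag needs distance_km and dtt_v")
        return distance_km / dtt_v
    return dtt_minlag


print(effective_minlag("CC", "dynamic", 5.0, distance_km=30.0, dtt_v=1.5))  # 20.0
print(effective_minlag("SC", "dynamic", 5.0, distance_km=0.0, dtt_v=1.5))   # 5.0
```

Passing in the filter's own dtt_minlag, rather than a global value, also covers the point that minlag and width are filter dependent.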
I like this idea a lot, assuming we are comfortable that we do not expect strong differences between narrow whiten/filter -> dvv versus broad whiten -> narrow filter -> dvv. I have done some small tests, though certainly not exhaustive ones, towards verifying that this is the case. This would also mean the death of the filter table, I guess? (just using preprocess_lowpass/_highpass). Edit: unless you would still intend to define filters for the MWCS and STR computation separately from the dvv_mwcs and dvv_stretching tables.
That remains an open question for me; the changes you are making now will allow benchmarking :)
Wrong: it's still nice to be able to have different filters too... e.g. to apply filter 1 to CC+AC+SC and filter 2 to AC+SC only? (yes, that would be an extra parameter) :-)
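A minimal sketch of per-filter toggles for the example above (filter 1 on CC+AC+SC, filter 2 on AC+SC only). The `Filter` dataclass, its `cc`/`sc`/`ac` field names, the frequency values, and `filters_for` are assumptions, not the MSNoise model.

```python
from dataclasses import dataclass


@dataclass
class Filter:
    """Hypothetical filter row with one toggle per correlation type."""
    ref: int
    freqmin: float
    freqmax: float
    cc: bool = True  # cross-correlation between two stations
    sc: bool = True  # single-station cross-component correlation
    ac: bool = True  # autocorrelation


def filters_for(filters, corr_type):
    """Refs of the filters switched on for corr_type ('CC', 'SC' or 'AC')."""
    return [f.ref for f in filters if getattr(f, corr_type.lower())]


filters = [
    Filter(1, 0.1, 1.0),            # CC + AC + SC
    Filter(2, 1.0, 2.0, cc=False),  # AC + SC only
]
print(filters_for(filters, "CC"))  # [1]
print(filters_for(filters, "AC"))  # [1, 2]
```

Defaulting every toggle to True keeps existing filters active for all correlation types unless one is switched off explicitly.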
Okay... I think all the core functionality is now implemented. I've documented the big changes below. There are still some things left to do (also documented below), but I'm wondering whether it's worth merging this request now and doing the rest as further, smaller pull requests (since this one is already large). Not sure why one test is currently getting held up on updating the environment, but otherwise they all pass ;)
Key changes:
Many-to-many relationships between the different steps, through association tables.
When saving results, the convention follows filt_id/level1_id/level2_id..., where the mwcs/stretching/wct params are level 1 and the dtt params are level 2. Note that no intermediate wavelet transform results (prior to dt/t) are saved, as such files would be huge (I checked ;) ). So the wavelet transform and dt/t are still performed in one step for now... however, it is now possible to have a many-to-many relationship between different wavelet params and wavelet_dtt params.
The key point is that we no longer rely on single global values for the various parameters involved in the DVV steps.
Toggles in the filter table for CC/SC/AC.
To do (outside of this pull request?):
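The association tables and the filt_id/level1_id/level2_id folder convention from the key changes can be sketched with `sqlite3` and `pathlib`. In this sketch a filter feeds several MWCS parameter sets, one MWCS set is reused by two filters, and each valid chain maps to one output folder. The table names (`filter_mwcs`, `mwcs_dtt`), the `DTT` root folder, and the unpadded folder names are all assumptions.

```python
import sqlite3
from pathlib import PurePosixPath

# Hypothetical tables: links live in association tables, one row per link.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE filters (ref INTEGER PRIMARY KEY);
CREATE TABLE mwcs_params (ref INTEGER PRIMARY KEY);
CREATE TABLE mwcs_dtt_params (ref INTEGER PRIMARY KEY);
CREATE TABLE filter_mwcs (              -- filter <-> level 1 (many-to-many)
    filter_ref INTEGER REFERENCES filters(ref),
    mwcs_ref   INTEGER REFERENCES mwcs_params(ref),
    PRIMARY KEY (filter_ref, mwcs_ref)
);
CREATE TABLE mwcs_dtt (                 -- level 1 <-> level 2 (many-to-many)
    mwcs_ref INTEGER REFERENCES mwcs_params(ref),
    dtt_ref  INTEGER REFERENCES mwcs_dtt_params(ref),
    PRIMARY KEY (mwcs_ref, dtt_ref)
);
INSERT INTO filters VALUES (1), (2);
INSERT INTO mwcs_params VALUES (1), (2);
INSERT INTO mwcs_dtt_params VALUES (1);
INSERT INTO filter_mwcs VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO mwcs_dtt VALUES (1, 1), (2, 1);
""")


def output_dirs(conn, root="DTT"):
    """List result folders as root/filt_id/level1_id/level2_id."""
    rows = conn.execute(
        "SELECT fm.filter_ref, fm.mwcs_ref, md.dtt_ref "
        "FROM filter_mwcs fm JOIN mwcs_dtt md ON md.mwcs_ref = fm.mwcs_ref "
        "ORDER BY 1, 2, 3"
    )
    return [str(PurePosixPath(root, str(f), str(m), str(d))) for f, m, d in rows]


print(output_dirs(conn))  # ['DTT/1/1/1', 'DTT/1/2/1', 'DTT/2/1/1']
```

Joining the two association tables enumerates exactly the filter, level 1, level 2 combinations that have been linked, so nothing is computed for unlinked combinations.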
I had thought about having one entry in each table already in place... but it might get confusing for someone working only with the API (i.e. who may not be aware it's there). So I think the easiest way will be to develop a quick_start notebook.
MASSIVE work here @asyates!!!! Is the config max_dt or max_dtt now? It's of course super hard to review such a massive change; I'll try to give it a go in the next few days!
max_dtt now ;) Note that this is just for mwcs; it is still maxdt for wavelet. Both stretching and wavelet also need some work to finish their coda window parameter choices (I have included some params as placeholders that are currently not implemented).
Agree re. improving the folder names. I can look at making this change next week (this week is fully booked). Of course, don't hesitate if anything else comes to mind in the meantime.
EDIT: This pull request has now developed into a big revamp of the database schema for msnoise 2.
A few things have been added/changed. I still want to make some further adjustments (listed below), but any feedback on these changes, i.e. whether they're wanted or not, or whether there is a better way of doing it, would be useful before merging.
Changes:
dtt_minlag, dtt_width, and dtt_v moved to the filter table. Personally, at least for dtt_minlag and dtt_width, I've always adjusted the code to have these defined individually for each filter, as that makes the most sense to me (especially for single station). dtt_v could go either way... of course, we could expect a different ballistic velocity at different frequencies, but it seems less important to define individually.
The config SQL table contains a used_in field, which is a list of the steps where each parameter is used/defined. It can be used as a filter in msnoise admin (originally I was looking to do this without having it as a column in the SQL table, only in defaults.csv, but it was easier to include it in the config table... and just not show it in admin).
Adjusted some of the parameters in the wavelet code (from 'dtt' to 'wct') to make it clear they are wavelet related, e.g. wct_minlag, wct_codacycles, wct_min_nonzero. I might rename some of these variables anyway, as I chose them quickly when writing the code.
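A minimal sketch of filtering config parameters by step through a `used_in` column, assuming it is stored as a comma-separated string. The storage format, the example rows, their `used_in` values, and the helper `params_for_step` are all assumptions.

```python
import sqlite3

# Hypothetical config table: used_in lists the steps that use each parameter,
# stored here as a comma-separated string.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (name TEXT PRIMARY KEY, value TEXT, used_in TEXT)")
conn.executemany(
    "INSERT INTO config VALUES (?, ?, ?)",
    [
        ("maxlag", "120", "cc,stack"),
        ("mwcs_wlen", "10", "mwcs"),
        ("mwcs_step", "5", "mwcs"),
        ("stretching_max", "0.01", "stretching"),
    ],
)


def params_for_step(conn, step):
    """Names of config parameters whose used_in list contains `step`."""
    rows = conn.execute("SELECT name, used_in FROM config ORDER BY name")
    # Split and compare whole entries so that e.g. 'st' does not match 'stack'.
    return [name for name, used_in in rows
            if step in (s.strip() for s in used_in.split(","))]


print(params_for_step(conn, "mwcs"))   # ['mwcs_step', 'mwcs_wlen']
print(params_for_step(conn, "stack"))  # ['maxlag']
```

Matching whole list entries in Python avoids the false positives a plain SQL `LIKE '%st%'` substring filter would produce.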
Some images of the changes:
[Screenshot: filter table new columns]
[Screenshot: dropdown option in config with used_in filter]
[Screenshot: wct param name changes]
To do (related to param changes)
As said, feedback welcome re. any of this!