data_quality_control.util module
Helper routines for base, sds-db modules.
- data_quality_control.util._create_hdf5_attribs(f, stationcode, starttime, endtime, freqs, winlen_seconds)
Set meta data in hdf5-file as attributes.
- Parameters
f (h5py.File) – open, editable h5py.File-object.
stationcode (str) – string of format “{network}.{station}.{location}.{channel}”
starttime (UTCDateTime) – starttime of data in file
endtime (UTCDateTime) – endtime of data in file. to be precise, it is the starttime of the last processing window, i.e. the actual endtime-proclen_seconds
freqs (2-tuple) – min, max frequency between which amplitude was filtered
winlen_seconds (int) – length of window over which amplitudes were extracted and psds computed, in seconds
proclen_seconds (int) – length of time series that was processed at once, in seconds. E.g. 1 day of data
- data_quality_control.util._iter_open_hdf5(fnamelist)
Generator that returns open h5py.File object for each filename in fnamelist. Closes file before yielding next file and before error is raised.
Useful for testing or if direct access to the hdf5-files is needed.
- data_quality_control.util.assert_integer_quotient_of_wins_per_fileunit(winlen_seconds, fileunit)
- data_quality_control.util.choose_datetime_inc(dt)
Determine a human-friendly time increment for dt (in seconds) and numpy.datetime character.
Used to create ticklabels for time axis.
Example
dt = 3600–>k = "H", inc = 1dt = 7200–>k = "H", inc = 2dt = 900–>k = "m", inc = 15
- Parameters
dt (int) – time increment in seconds
- Returns
k (str [“M”, “D”, “h”, “m”, “s”]) – character specifying numpy.datetime data type
dtype='datetime64[{}]'.format(k)inc (int) – time increment in unit of
k
- data_quality_control.util.get_adjacent_frames(tr, starttime, nf, winlen_samples)
Reshape obspy trace into (
nf,winlen_samples). These frames do not overlap!.We slice the trace at
starttimeand add as many samples as necessary to fill the targeted shape.One frame contains the data over which amplitude and psd are computed.
- Parameters
tr (obspy.core.Trace) – seismic data trace
starttime (obspy.core.UTCDateTime) – where to start in the trace
nf (int) – number of frames
winlen_samples (int) – number of samples in one frame
- data_quality_control.util.get_amplitude(tr, starttime, fmin, fmax, overlap, winlen_samples, nf, percentile=75)
Extract percentile of amplitude in consecutive time windows.
The seismic data in
tris bandpass-filtered before extracting the amplitudes.The percentile is computed over
winlen_samples. Windows are created using eitherget_overlapping_tapered_framesorget_adjacent_frames- Parameters
tr (Trace) – time series of seismic data to process
starttime (UTCDateTime) – starttime in trace
fmin (floats) – min and max frequency of bandpass
fmax (floats) – min and max frequency of bandpass
overlap (int) – length of overlap in seconds. Only used if nans are present in data. See Note.
winlen_samples (int) – Size of windows (frames) over which percentile is computed, in samples.
nf (int) – number of frames into which data is divided
percentile (int, float [75]) – percentile of amplitude, which we get for each frame.
Note
If trace is free of nans, we can simply filter the whole trace at once and use reshape to get the windows. If there are nans, the obspy filter function only filters the data up to the first nan. Thus you can loose almost the entire trace because of a single nan in the early part. To avoid this, we check for nans and if there are some, we filter the data per window. This means however, we have to ensure again some overlap and the windowing will be slower.
- data_quality_control.util.get_end_of_month(stime)
Return last day of month in
stimeasUTCDateTime.date
- data_quality_control.util.get_fdsn_or_routing_client(server_or_routingservice)
- data_quality_control.util.get_overlapping_frames(x, framesize, incsize)
Split array x into subarrays (frames) of length framesize, starting every incsize-th sample.
- Parameters
x (np.ndarray (1d)) – data array
framesize (int) – number of samples per frame
incsize (int) – number of samples between beginning of two successive frames
- Returns
f – array of shape
(((x.size-framesize) // incsize +1), framesize)- Return type
np.ndarray (2D)
Note
Only data until the last full frame is used. Trailing samples are truncated in the output.
Example
>>>> x = np.arange(1,5+1).repeat(3) >>>> print(x) [1 1 1 2 2 2 3 3 3 4 4 4 5 5 5]
Adjacent frames without overlaps:
>>>> f = util.get_overlapping_frames(x, 3, 3) >>>> print(f.shape) (5, 3) >>>> print(f) [[1 1 1] [2 2 2] [3 3 3] [4 4 4] [5 5 5]]
Overlapping frames:
>>>> f = util.get_overlapping_frames(x, 5, 3) >>>> print(f.shape) (4, 5) >>>> print(f) [[1 1 1 2 2] [2 2 2 3 3] [3 3 3 4 4] [4 4 4 5 5]]
- data_quality_control.util.get_overlapping_tapered_frames(tr, starttime, nf, winlen_samples, taper_samples)
Splits the trace up into (overlapping) Tukey windows.
A Tukey window combines a boxcar-window with cosine-tapered flanks at the egdes. The flanks of two successive windows overlap.
Frames containing any Nans are set entirely to Nan. Loosely based on obspy.signal.util.enframe but faster.
We slice the trace at
starttimeand add as many samples as necessary to fill the targeted shape.One frame contains the data over which amplitude and psd are computed.
- Parameters
tr (obspy.core.Trace) – seismic data trace
starttime (obspy.core.UTCDateTime) – where to start in the trace
nf (int) – number of frames
winlen_samples (int) – number of samples in one frame
taper_samples (int) – number of samples for the tapered region. Should be long enough to accommodate potential filter effects.
- data_quality_control.util.iter_days(starttime, endtime)
Returns starttime and endtime of every day (24h) between
starttimeandendtime.Wrapper for
iter_timeinc(starttime, endtime, inc=24*3600, timelevel=3)
- data_quality_control.util.iter_hours(starttime, endtime)
Returns starttime and endtime of every hour (3600s) between
starttimeandendtime.Wrapper for
iter_timeinc(starttime, endtime, inc=3600, timelevel=4)
- data_quality_control.util.iter_month(startdate, enddate)
Iterator that returns date of first and last day of each month between
startdateandenddate. For months ofstartdateandenddate, respective dates are returns.Intended to use for processing data in batches of 1 month, i.e. output files will cover 1 month of data.
- Parameters
startdate (UTCDateTime) –
enddate (UTCDateTime) –
Example
from data_quality_control import util from obspy.core import UTCDateTime as UTC sdate = UTC("2020-05-20") edate = UTC("2021-10-30") for s, e in util.iter_month(sdate, edate): print(s.date, e.date) >>> 2020-05-20 2020-05-31 >>> 2020-06-01 2020-06-30 >>> 2020-07-01 2020-07-31 >>> 2020-08-01 2020-08-31 >>> 2020-09-01 2020-09-30 >>> 2020-10-01 2020-10-31 >>> 2020-11-01 2020-12-31 >>> 2021-01-01 2021-01-31 >>> 2021-02-01 2021-02-28 >>> 2021-03-01 2021-03-31 >>> 2021-04-01 2021-04-30 >>> 2021-05-01 2021-05-31 >>> 2021-06-01 2021-06-30 >>> 2021-07-01 2021-07-31 >>> 2021-08-01 2021-08-31 >>> 2021-09-01 2021-09-30 >>> 2021-10-01 2021-10-25
- data_quality_control.util.iter_timeinc(startdate, enddate, inc, timelevel)
General iterator that yields starttime, endtime of every
inctime interval.Use for
inc< 1 month.- Parameters
startdate (UTCDateTime) –
enddate (UTCDateTime) –
inc (int) – duration of interval in seconds
timelevel (int (3, 4, 5, 6)) – Up to which index of the timetuple should be used. E.g. for
inc=3600,timelevel=4(== iter_hours()). Forinc=12*3600also usetimelevel=4. But forinc=24*3600, usetimelevel=3(== iter_days()).
- data_quality_control.util.iter_years(startdate, enddate)
Iterator that returns date of first and last day of each year between
startdateandenddate. For years ofstartdateandenddate, respective dates are returns.Intended to use for processing data in batches of 1 year, i.e. output files will cover 1y of data.
- Parameters
startdate (UTCDateTime) –
enddate (UTCDateTime) –
Example
from data_quality_control import util from obspy.core import UTCDateTime as UTC sdate = UTC("2018-05-20") edate = UTC("2021-10-30") for s, e in util.iter_years(sdate, edate): print(s, e) >>> 2018-05-20T00:00:00.000000Z 2018-12-31T00:00:00.000000Z >>> 2019-01-01T00:00:00.000000Z 2019-12-31T00:00:00.000000Z >>> 2020-01-01T00:00:00.000000Z 2020-12-31T00:00:00.000000Z >>> 2021-01-01T00:00:00.000000Z 2021-10-30T00:00:00.000000Z
- data_quality_control.util.merge_different_samplingrates(st)
Merge stream, fill gaps with nans. If traces have different sampling rates, resample to highest.
- data_quality_control.util.next_month(stime)
Return first day of month after
stimeasUTCDateTime. Respects turn of year.
- data_quality_control.util.process_stream(st, inv, starttime, endtime, sampling_rate=None)
Standard processing steps to perform on seismic data before further analysis. Returns single trace.
Includes: - remove sensitivity - merge data and fill gaps with
np.nan- trim, pad withnp.nan- demean by subtractingnp.nanmean- Parameters
st (obspy.core.Stream) – raw seismic data. Should be only one station!
inv (obspy.core.Inventory) – meta data of station, should include response information
starttime (obspy.core.UTCDateTime) – starttime to which trace is trimmed
endtime (obspy.core.UTCDateTime) – endtime to which trace is trimmed
- Returns
tr – Single trace.
- Return type
obspy.core.Trace
Warning
If more than one station-channel combination is present, we will return only the first one. However, there can be multiple traces of the same station-channel.
Notes
If data is obtained from a datacenter, it may contain gaps. If so, the stream contains multiple traces, which we
st.merge()into a single trace, filling the gaps with Nan.
- data_quality_control.util.resample(st, sampling_rate)
Resample traces in stream to common sampling rate if necessary.
We use tr.interpolate() to increase and tr.resample() to decrease sampling rate.