data_quality_control.util module

Helper routines for base, sds-db modules.

data_quality_control.util._create_hdf5_attribs(f, stationcode, starttime, endtime, freqs, winlen_seconds)

Set meta data in hdf5-file as attributes.

Parameters
  • f (h5py.File) – open, editable h5py.File-object.

  • stationcode (str) – string of format “{network}.{station}.{location}.{channel}”

  • starttime (UTCDateTime) – starttime of data in file

  • endtime (UTCDateTime) – endtime of data in file. to be precise, it is the starttime of the last processing window, i.e. the actual endtime-proclen_seconds

  • freqs (2-tuple) – min, max frequency between which amplitude was filtered

  • winlen_seconds (int) – length of window over which amplitudes were extracted and psds computed, in seconds

  • proclen_seconds (int) – length of time series that was processed at once, in seconds. E.g. 1 day of data

data_quality_control.util._iter_open_hdf5(fnamelist)

Generator that returns open h5py.File object for each filename in fnamelist. Closes file before yielding next file and before error is raised.

Useful for testing or if direct access to the hdf5-files is needed.

data_quality_control.util.assert_integer_quotient_of_wins_per_fileunit(winlen_seconds, fileunit)
data_quality_control.util.choose_datetime_inc(dt)

Determine a human-friendly time increment for dt (in seconds) and numpy.datetime character.

Used to create ticklabels for time axis.

Example

  • dt = 3600 –> k = "H", inc = 1

  • dt = 7200 –> k = "H", inc = 2

  • dt = 900 –> k = "m", inc = 15

Parameters

dt (int) – time increment in seconds

Returns

  • k (str [“M”, “D”, “h”, “m”, “s”]) – character specifying numpy.datetime data type dtype='datetime64[{}]'.format(k)

  • inc (int) – time increment in unit of k

data_quality_control.util.get_adjacent_frames(tr, starttime, nf, winlen_samples)

Reshape obspy trace into (nf, winlen_samples). These frames do not overlap!.

We slice the trace at starttime and add as many samples as necessary to fill the targeted shape.

One frame contains the data over which amplitude and psd are computed.

Parameters
  • tr (obspy.core.Trace) – seismic data trace

  • starttime (obspy.core.UTCDateTime) – where to start in the trace

  • nf (int) – number of frames

  • winlen_samples (int) – number of samples in one frame

data_quality_control.util.get_amplitude(tr, starttime, fmin, fmax, overlap, winlen_samples, nf, percentile=75)

Extract percentile of amplitude in consecutive time windows.

The seismic data in tr is bandpass-filtered before extracting the amplitudes.

The percentile is computed over winlen_samples. Windows are created using either get_overlapping_tapered_frames or get_adjacent_frames

Parameters
  • tr (Trace) – time series of seismic data to process

  • starttime (UTCDateTime) – starttime in trace

  • fmin (floats) – min and max frequency of bandpass

  • fmax (floats) – min and max frequency of bandpass

  • overlap (int) – length of overlap in seconds. Only used if nans are present in data. See Note.

  • winlen_samples (int) – Size of windows (frames) over which percentile is computed, in samples.

  • nf (int) – number of frames into which data is divided

  • percentile (int, float [75]) – percentile of amplitude, which we get for each frame.

Note

If trace is free of nans, we can simply filter the whole trace at once and use reshape to get the windows. If there are nans, the obspy filter function only filters the data up to the first nan. Thus you can loose almost the entire trace because of a single nan in the early part. To avoid this, we check for nans and if there are some, we filter the data per window. This means however, we have to ensure again some overlap and the windowing will be slower.

data_quality_control.util.get_end_of_month(stime)

Return last day of month in stime as UTCDateTime.date

data_quality_control.util.get_fdsn_or_routing_client(server_or_routingservice)
data_quality_control.util.get_overlapping_frames(x, framesize, incsize)

Split array x into subarrays (frames) of length framesize, starting every incsize-th sample.

Parameters
  • x (np.ndarray (1d)) – data array

  • framesize (int) – number of samples per frame

  • incsize (int) – number of samples between beginning of two successive frames

Returns

f – array of shape (((x.size-framesize) // incsize +1), framesize)

Return type

np.ndarray (2D)

Note

Only data until the last full frame is used. Trailing samples are truncated in the output.

Example

>>>> x = np.arange(1,5+1).repeat(3) >>>> print(x) [1 1 1 2 2 2 3 3 3 4 4 4 5 5 5]

Adjacent frames without overlaps:

>>>> f = util.get_overlapping_frames(x, 3, 3) >>>> print(f.shape) (5, 3) >>>> print(f) [[1 1 1] [2 2 2] [3 3 3] [4 4 4] [5 5 5]]

Overlapping frames:

>>>> f = util.get_overlapping_frames(x, 5, 3) >>>> print(f.shape) (4, 5) >>>> print(f) [[1 1 1 2 2] [2 2 2 3 3] [3 3 3 4 4] [4 4 4 5 5]]

data_quality_control.util.get_overlapping_tapered_frames(tr, starttime, nf, winlen_samples, taper_samples)

Splits the trace up into (overlapping) Tukey windows.

A Tukey window combines a boxcar-window with cosine-tapered flanks at the egdes. The flanks of two successive windows overlap.

Frames containing any Nans are set entirely to Nan. Loosely based on obspy.signal.util.enframe but faster.

We slice the trace at starttime and add as many samples as necessary to fill the targeted shape.

One frame contains the data over which amplitude and psd are computed.

Parameters
  • tr (obspy.core.Trace) – seismic data trace

  • starttime (obspy.core.UTCDateTime) – where to start in the trace

  • nf (int) – number of frames

  • winlen_samples (int) – number of samples in one frame

  • taper_samples (int) – number of samples for the tapered region. Should be long enough to accommodate potential filter effects.

data_quality_control.util.iter_days(starttime, endtime)

Returns starttime and endtime of every day (24h) between starttime and endtime.

Wrapper for iter_timeinc(starttime, endtime, inc=24*3600, timelevel=3)

data_quality_control.util.iter_hours(starttime, endtime)

Returns starttime and endtime of every hour (3600s) between starttime and endtime.

Wrapper for iter_timeinc(starttime, endtime, inc=3600, timelevel=4)

data_quality_control.util.iter_month(startdate, enddate)

Iterator that returns date of first and last day of each month between startdate and enddate. For months of startdate and enddate, respective dates are returns.

Intended to use for processing data in batches of 1 month, i.e. output files will cover 1 month of data.

Parameters
  • startdate (UTCDateTime) –

  • enddate (UTCDateTime) –

Example

from data_quality_control import util
from obspy.core import UTCDateTime as UTC

sdate = UTC("2020-05-20")
edate = UTC("2021-10-30")
for s, e in util.iter_month(sdate, edate):
    print(s.date, e.date)

>>> 2020-05-20 2020-05-31
>>> 2020-06-01 2020-06-30
>>> 2020-07-01 2020-07-31
>>> 2020-08-01 2020-08-31
>>> 2020-09-01 2020-09-30
>>> 2020-10-01 2020-10-31
>>> 2020-11-01 2020-12-31
>>> 2021-01-01 2021-01-31
>>> 2021-02-01 2021-02-28
>>> 2021-03-01 2021-03-31
>>> 2021-04-01 2021-04-30
>>> 2021-05-01 2021-05-31
>>> 2021-06-01 2021-06-30
>>> 2021-07-01 2021-07-31
>>> 2021-08-01 2021-08-31
>>> 2021-09-01 2021-09-30
>>> 2021-10-01 2021-10-25
data_quality_control.util.iter_timeinc(startdate, enddate, inc, timelevel)

General iterator that yields starttime, endtime of every inc time interval.

Use for inc < 1 month.

Parameters
  • startdate (UTCDateTime) –

  • enddate (UTCDateTime) –

  • inc (int) – duration of interval in seconds

  • timelevel (int (3, 4, 5, 6)) – Up to which index of the timetuple should be used. E.g. for inc=3600, timelevel=4 (== iter_hours()). For inc=12*3600 also use timelevel=4. But for inc=24*3600, use timelevel=3 (== iter_days()).

data_quality_control.util.iter_years(startdate, enddate)

Iterator that returns date of first and last day of each year between startdate and enddate. For years of startdate and enddate, respective dates are returns.

Intended to use for processing data in batches of 1 year, i.e. output files will cover 1y of data.

Parameters
  • startdate (UTCDateTime) –

  • enddate (UTCDateTime) –

Example

from data_quality_control import util
from obspy.core import UTCDateTime as UTC

sdate = UTC("2018-05-20")
edate = UTC("2021-10-30")
for s, e in util.iter_years(sdate, edate):
    print(s, e)

>>> 2018-05-20T00:00:00.000000Z 2018-12-31T00:00:00.000000Z
>>> 2019-01-01T00:00:00.000000Z 2019-12-31T00:00:00.000000Z
>>> 2020-01-01T00:00:00.000000Z 2020-12-31T00:00:00.000000Z
>>> 2021-01-01T00:00:00.000000Z 2021-10-30T00:00:00.000000Z
data_quality_control.util.merge_different_samplingrates(st)

Merge stream, fill gaps with nans. If traces have different sampling rates, resample to highest.

data_quality_control.util.next_month(stime)

Return first day of month after stime as UTCDateTime. Respects turn of year.

data_quality_control.util.process_stream(st, inv, starttime, endtime, sampling_rate=None)

Standard processing steps to perform on seismic data before further analysis. Returns single trace.

Includes: - remove sensitivity - merge data and fill gaps with np.nan - trim, pad with np.nan - demean by subtracting np.nanmean

Parameters
  • st (obspy.core.Stream) – raw seismic data. Should be only one station!

  • inv (obspy.core.Inventory) – meta data of station, should include response information

  • starttime (obspy.core.UTCDateTime) – starttime to which trace is trimmed

  • endtime (obspy.core.UTCDateTime) – endtime to which trace is trimmed

Returns

tr – Single trace.

Return type

obspy.core.Trace

Warning

If more than one station-channel combination is present, we will return only the first one. However, there can be multiple traces of the same station-channel.

Notes

If data is obtained from a datacenter, it may contain gaps. If so, the stream contains multiple traces, which we st.merge() into a single trace, filling the gaps with Nan.

data_quality_control.util.resample(st, sampling_rate)

Resample traces in stream to common sampling rate if necessary.

We use tr.interpolate() to increase and tr.resample() to decrease sampling rate.