data_quality_control.base module

This module provides low-level functionalities to extract amplitudes and power-spectral densities (PSDs) from seismic data.

It is intended to provide the most flexible interface in terms of data source and selecting time intervals for window size, processing length and file units.

The processing workflow is managed by the GenericProcessor, which can used as base for more customized classes. E.g. the data_quality_control.sds_db.SDSProcessor is taylored to using the sds-client of obspy.

class data_quality_control.base.BaseProcessedData(startdate=None, enddate=None, stationcode='....', amplitude_frequencies=(None, None), winlen_seconds=None)

Bases: object

Handle processed amplitudes and spectra together with some meta data. Read/write HDF5

Parameters
  • startdate (UTCDateTime [None]) – startdate of data

  • enddate (UTCDateTime [None],) –

  • stationcode (str ["..."]) – station code of format “{network}.{station}.{location}.{channel}”

  • amplitude_frequencies (2-tuple [(None,None)]) – min, max frequency between which data was filtered for amplitude extraction

  • winlen_seconds (int [None]) – seconds per frame over which amplitude and psd were computed

  • proclen_seconds (int [None]) – length of data in seconds that were processed at once

amplitudes

timeseries of averaged amplitude values

Type

np.ndarray (1D)

psds

Timeseries of power spectral densities. Shape is (N_times, N_frequencies).

Type

np.ndarray (2D)

frequency_axis

frequencies corresponding to psds

Type

np.ndarray (1D)

stationcode

code of seismic station in format {network}.{station}.{location}.{channel}

Type

str

amplitude_frequencies

Min and max frequencies at which seismic data was filtered to calculate the average amplitude.

Type

tuple of 2 float

winlen_seconds

Window size in seconds over which psds/amplitudes were computed.

Type

int

property N_windows

number of expected samples based on start & end date.

Type

int

_check_codes(new)

Check if station codes are identical.

Parameters

new (BaseProcessedData) – another instance of this class

Raises

IOError – If codes are different.

_check_frequencies(new)

Check if frequency axis are similar using np.isclose().

Parameters

new (BaseProcessedData) – another instance of this class

Raises

IOError – if different.

_check_sanity(new)

Check consistency between new and existing data.

Parameters

new (BaseProcessedData) – another instance of this class

Runs

Raises
_check_shape_vs_time()

Check consistency of data shapes and times.

Raises
_check_shapes(new)

Deprecated since version 1.0.

Checks if number of processing windows and frequency axis of psd and amplitude are compatible. Throws IOError if not.

_get_date_and_time_axis_for_amplitude_matrix()

Get x and y axis for plotting amplitudes.

Returns

  • dateax – x axis are dates

  • timeax – y axis are hours of day.

property duration

Total duration in seconds from start to end time.

Type

int, float

extend(new)

Add a new output object.

Adjusts start/endtimes, removes leading/trailing Nans. Should only work if processing parameters, frequency axis are compatible.

Parameters

new (BaseProcessedData) – Add another instance to this one.

extend_from_file(file)

Read new result data from HDF5-file and add to existing.

Combines from_file() and extend()

Parameters

file (str, path-like) – Name of hdf5-file. Must have appropriate structure, obviously.

fill_days()

Extend time range to midnight of start/end day by filling time series with Nans.

Startdate YYYY-MM-DDThh:mm:ss is set to YYYY-MM-DDT00:00:00, enddate YYYY-MM-DDThh:mm:ss is set to YYYY-MM-DDT00:00:00 + 1day.

Used to prepare data for reshaping when plotting amplitudes.

from_file(fname)

Read properties and data from HDF5 file.

Parameters

fname (str) – Name of an hdf5-file.

get_nslc()

Split station code into network, station, location, channel

get_samples_to_midnight()
has_data()

Check if amplitude and/or spectral data are available

insert_in_file(fout)

Insert data in existing hdf5-file.

From our own start/end times and those in file, and the frame and processing lengths, we determine the indices at which to insert the new data.

Parameters

fout (h5py.File) – file-object, ready for write in which to insert the data

Note

Raises ValueError if - we have no data attributes - starttime in file > self.startdate

Warning

Overrides existing data in file.

plot_amplitudes(ax=None, **imshow_kwargs)

Plot amplitude matrix using matplotlib.

Parameters

ax (matplotlib.Axes) – Axes to use. If not given, new figure is created.

Returns

ax – Axes containing the plot.

Return type

matplotlib.Axes

plot_psds(func=None, tax=None, ax=None, N_freqlabels=8, N_timeticks=8)

Plot PSD matrix as (time, frequency) using matplotlib.

First two axes (proclens, winwlens) are flattened.

Parameters
  • func (function) – Apply to PSD data before plotting, e.g. np.log.

  • tax (ndarray) – labels of time axis. If not given, labels are created from start/endtime

  • ax (matplotlib.axes) – Axes to use

  • N_freqlabels (int) – Number of tick labels of frequency axis

Returns

ax – Axes containing the plot.

Return type

matplotlib.axes

reshape_amps_to_days()

Reshape amplitudes from 1d timeseries to 2d array. First dim is number of windows per 24h, Second dim is number of days between start/end date. Fills leading/trailing windows with Nans if necessary to reach full days.

Used by plot_amplitudes()

set_data(amplitudes, psds, freqs)

Add amplitudes, psds and frequency axis data.

Raises RuntimeWarning if shapes are inconsistent, e.g. psds.shape[-1] != freqs.size

Parameters
  • amplitudes (ndarray, 2d) – amplitude data of shape (n_proclen, n_winlen)

  • psds (ndarray, 3d) – power spectral densities of shape (n_proclen, n_winlen, n_frequencies)

  • freqs (ndarray, 1d) – frequency axis corresponding to psds of shape (n_frequencies,)

set_time(starttime, endtime)

Set startdate and enddate.

Parameters
  • starttime (obspy.core.UTCDateTime) – new starttime

  • endtime (obspy.core.UTCDateTime) – new endtime

to_file(outdir='.')

Write content to HDF5 file.

Filename is created from stationcode, startdate and enddate.

Parameters

outdir (str ['.']) – Directory in which the file is stored. Default is working directory.

trim_nan()

Remove entire nan-rows at both ends and adjust start/enddate.

class data_quality_control.base.GenericProcessor(nslc_code, dataclient, invclient, outdir='.', preprocessing=None, fileunit='year', **procparams)

Bases: object

Organize processing of data sets and save results to disk.

Parameters
  • network (str) – network code

  • station (str) – station code

  • location (str) – location code

  • channel (str) – channel code

  • dataclient (obspy.clients) – client to obtain data from (dataclient.get_waveforms())

  • invclient (obspy.clients) – client to get meta data from (invclient.get_stations())

  • outdir (str, path-like) – directory to which results are written

  • preprocessing (func) – Function, that performs basic preprocessing of seismic data. Must take obspy.Stream as argument and return obspy.Trace.

  • fileunit ({"hour", "day", "month", "year"}, ["year"]) – Interval after which a new output file is started. Determines also format string for file-name E.g.: fileunit="year" means all results for data from 1 year (01-January - 31-December) will be stored in one file. If start/end time for processing are different from 01-January/31-December files will be truncated accordingly.

  • **procparams – processing parameters as keyword arguments data_quality_control.base.ProcessingParameters

Note

Format of filenames will be determined by fileunit as

FNAME_BASE = "{outdir}/{network}.{station}.{location}.{channel}_"
FNAME_FMTS = {
    "year" :    FNAME_BASE + "{year:04d}.hdf5",
    "month" :   FNAME_BASE + "{year:04d}-{month:02d}.hdf5",
    "day" :     FNAME_BASE + "{year:04d}-{month:02d}-{day:02d}.hdf5",
    "hour" :    FNAME_BASE + "{year:04d}-{month:02d}-{day:02d}-{hour:02d}.hdf5",
    }

year, month, day, hour come from datetime object.

Note

expand_nslc() can be customized for a specific data base or data client. If implemented, this may allow for wildcards in station, network, location, channel

expand_nslc(starttime=None, endtime=None)

Expand network, station, location, channel settings into lists.

This may need to be customized to the data base. It sets lists of respective codes as attributes _networks, __stations, _locations, _channels

iter_nslc()

Iterator that yields each combination of network, station, location, channel from self._networks, self._stations, self._locations, self._channels. These lists are set by expand_nslc()

iter_time(starttime, endtime)

Iterator that yields start/end of intervals between starttime, endtime depending on fileunit.

Example

If fileunit='year':

gp = GenericProcessor(... fileunit="year")
stime = UTC("2018-05-14")
etime = UTC("2021-09-20")
for s, e in gp.iter_time(stime, etime):
    print(s.date, e.date)
>>>> 2018-05-14 2018-12-31
>>>> 2019-01-01 2019-12-31
>>>> 2020-01-01 2020-12-31
>>>> 2021-01-01 2021-09-20
process(starttime, endtime, force_new_file=False)

Execute processing.

This is the method you most likely want to call!

Iterates through processing interval defined by fileunit and calls NSCProcessor().process() for each NSLC combination.

Manages creation of output files.

property winlen_seconds
class data_quality_control.base.NSCProcessor(netw, stat, loc, chan, dataclient, invclient, **procparams)

Bases: object

Process one specific network-station-location-channel combination.

Processing means extracting percentile of amplitude and power spectral density over some time window.

Parameters
  • netw (str) – network code

  • stat (str) – station code

  • chan (str) – channel code

  • loc (str) – location code

  • dataclient (obspy.clients) – client to obtain data from (dataclient.get_waveforms())

  • invclient (obspy.clients) – client to get meta data from (invclient.get_stations())

  • **procparams – processing parameters as keyword arguments data_quality_control.base.ProcessingParameters

nsc_as_dict()

Return network, station, location, channel as dict.

process(starttime, endtime, preprocessing=None)

Compute noise amplitudes and spectra of raw data within time range.

Between start/end time, we load waveform data in chunks of proclen_seconds. Amplitude and PSD are computed for each frame of length winlen_seconds that chunk. So the number of frames per chunk is nf = proclen_seconds / winlen_seconds

Empty streams are replaced by np.nan.

Parameters
  • startdate (UTCDateTime) –

  • enddate (UTCDateTime) –

  • preprocessing (func [None]) – function that takes obspy.Stream and returns obspy.Trace. If None, we use dataqc.util.process_stream() which merges and trims data, fills gaps with np.nan and removes mean.

Returns

output – An object, which bundles amplitude and psd matrix, frequency axis for psd and processing parameters. Shape of amplitude will be (n_proclens, nf), n_proclens being the number of chunks between start/end time. Shape of psd will be (n_proclens, nf, nperseg//2+1). The last axis is the number of frequencies in psd.

Return type

BaseProcessedData

class data_quality_control.base.ProcessedDataFileManager(outdir, nperseg=None, processeddata=None, fileunit='year')

Bases: object

allocate_file_start_end()

Determine datetime of start/end time of data covered in output file depending on fileunit.

Example

  • if fileunit="year", starttime=UTC("2021-04-10"):

    stime=2021-01-01, etime = 2021-12-31

  • if fileunit="month, starttime=UTC("2021-04-10"):

    stime = 2021-04-01, etime = 2021-04-30

create_ofile(force_fileunit=True)

Create output file.

property fname_fmt
get_ofile(force_new_file=False, force_fileunit=True)

Open or create new output file for results from an NSCProcessor

Parameters
  • nscprocessor (NSCProcessor) –

  • starttime (UTCDateTime) – nominal starttime of file

  • force_new_file (bool [False]) – If True, always create new, fresh file, i.e. override existing ones.

Returns

f – open file handler

Return type

h5py.File

nsc_as_dict()

Return network, station, location, channel as dict.

property ofilename
set_data(processeddata, forced_startdate=None)
property winlen_seconds
write_data(force_new_file=False, force_fileunit=True)
class data_quality_control.base.ProcessingParameters(**kwargs)

Bases: object

Bundle processing parameters.

Parameters
  • overlap (int [60]) – length of taper (overlap) if overlapping frames are necessary. In seconds.

  • amplitude_frequencies (2-tuple [(4,14)]) – min, max frequency of Butterworth bandpass, applied before computing amplitudes

  • nperseg (int [2048]) – segment size in samples to estimate Welch spectrum (see scipy.signal.welch ) Together with sampling_rate it determines the resolution of the frequency axis.

  • winlen_seconds (int [3600]) – length of time window (=frame) over which amplitude and psd are computed. In seconds

  • proclen_seconds (int [24*3600]) – length of data processed at once. In seconds. Default is 1 day, corresponding to length of data in 1 sds-file

  • sampling_rate (int [20]) – sampling rate of seismic data at which amplitude and spectra are computed. Data are resampled if original SR deviates. Together with nperseg it determines the resolution of the frequency axis.

get_dict()

Get attributes as dict.

update(**kwargs)

Update an attribute

data_quality_control.base.decorator_assert_integer_quotient_of_wins_per_fileunit(func)