data_quality_control.base module
This module provides low-level functionalities to extract amplitudes and power-spectral densities (PSDs) from seismic data.
It is intended to provide the most flexible interface in terms of data source and selecting time intervals for window size, processing length and file units.
The processing workflow is managed by the
GenericProcessor, which can used as base for
more customized classes. E.g. the
data_quality_control.sds_db.SDSProcessor
is taylored to using the sds-client of obspy.
- class data_quality_control.base.BaseProcessedData(startdate=None, enddate=None, stationcode='....', amplitude_frequencies=(None, None), winlen_seconds=None)
Bases:
objectHandle processed amplitudes and spectra together with some meta data. Read/write HDF5
- Parameters
startdate (UTCDateTime [None]) – startdate of data
enddate (UTCDateTime [None],) –
stationcode (str ["..."]) – station code of format “{network}.{station}.{location}.{channel}”
amplitude_frequencies (2-tuple [(None,None)]) – min, max frequency between which data was filtered for amplitude extraction
winlen_seconds (int [None]) – seconds per frame over which amplitude and psd were computed
proclen_seconds (int [None]) – length of data in seconds that were processed at once
- amplitudes
timeseries of averaged amplitude values
- Type
np.ndarray(1D)
- psds
Timeseries of power spectral densities. Shape is (N_times, N_frequencies).
- Type
np.ndarray(2D)
- stationcode
code of seismic station in format {network}.{station}.{location}.{channel}
- Type
str
- amplitude_frequencies
Min and max frequencies at which seismic data was filtered to calculate the average amplitude.
- Type
tupleof 2float
- winlen_seconds
Window size in seconds over which psds/amplitudes were computed.
- Type
int
- property N_windows
number of expected samples based on start & end date.
- Type
int
- _check_codes(new)
Check if station codes are identical.
- Parameters
new (
BaseProcessedData) – another instance of this class- Raises
IOError – If codes are different.
- _check_frequencies(new)
Check if frequency axis are similar using
np.isclose().- Parameters
new (
BaseProcessedData) – another instance of this class- Raises
IOError – if different.
- _check_sanity(new)
Check consistency between new and existing data.
- Parameters
new (
BaseProcessedData) – another instance of this class
Runs
_check_shape_vs_time()onselfandnew
- Raises
IOError – if one of the first 2 fails
AssertionError – see
_check_shape_vs_time()RuntimeWarning – see
_check_shape_vs_time()
- _check_shape_vs_time()
Check consistency of data shapes and times.
- Raises
RuntimeWarning – if number of time windows is inconsistent with
durationandwinlen_secondsAssertionError – if number of time windows in
amplitudesandpsdsare different
- _check_shapes(new)
Deprecated since version 1.0.
Checks if number of processing windows and frequency axis of psd and amplitude are compatible. Throws
IOErrorif not.
- _get_date_and_time_axis_for_amplitude_matrix()
Get x and y axis for plotting amplitudes.
- Returns
dateax – x axis are dates
timeax – y axis are hours of day.
- property duration
Total duration in seconds from start to end time.
- Type
int, float
- extend(new)
Add a new output object.
Adjusts start/endtimes, removes leading/trailing Nans. Should only work if processing parameters, frequency axis are compatible.
- Parameters
new (
BaseProcessedData) – Add another instance to this one.
- extend_from_file(file)
Read new result data from HDF5-file and add to existing.
Combines
from_file()andextend()- Parameters
file (str, path-like) – Name of hdf5-file. Must have appropriate structure, obviously.
- fill_days()
Extend time range to midnight of start/end day by filling time series with Nans.
Startdate YYYY-MM-DDThh:mm:ss is set to YYYY-MM-DDT00:00:00, enddate YYYY-MM-DDThh:mm:ss is set to YYYY-MM-DDT00:00:00 + 1day.
Used to prepare data for reshaping when plotting amplitudes.
- from_file(fname)
Read properties and data from HDF5 file.
- Parameters
fname (
str) – Name of an hdf5-file.
- get_nslc()
Split station code into network, station, location, channel
- get_samples_to_midnight()
- has_data()
Check if amplitude and/or spectral data are available
- insert_in_file(fout)
Insert data in existing hdf5-file.
From our own start/end times and those in file, and the frame and processing lengths, we determine the indices at which to insert the new data.
- Parameters
fout (
h5py.File) – file-object, ready for write in which to insert the data
Note
Raises ValueError if - we have no data attributes - starttime in file > self.startdate
Warning
Overrides existing data in file.
- plot_amplitudes(ax=None, **imshow_kwargs)
Plot amplitude matrix using matplotlib.
- Parameters
ax (
matplotlib.Axes) – Axes to use. If not given, new figure is created.- Returns
ax – Axes containing the plot.
- Return type
matplotlib.Axes
- plot_psds(func=None, tax=None, ax=None, N_freqlabels=8, N_timeticks=8)
Plot PSD matrix as (time, frequency) using matplotlib.
First two axes (proclens, winwlens) are flattened.
- Parameters
func (function) – Apply to PSD data before plotting, e.g. np.log.
tax (ndarray) – labels of time axis. If not given, labels are created from start/endtime
ax (matplotlib.axes) – Axes to use
N_freqlabels (int) – Number of tick labels of frequency axis
- Returns
ax – Axes containing the plot.
- Return type
matplotlib.axes
- reshape_amps_to_days()
Reshape
amplitudesfrom 1d timeseries to 2d array. First dim is number of windows per 24h, Second dim is number of days between start/end date. Fills leading/trailing windows with Nans if necessary to reach full days.Used by
plot_amplitudes()
- set_data(amplitudes, psds, freqs)
Add amplitudes, psds and frequency axis data.
Raises RuntimeWarning if shapes are inconsistent, e.g.
psds.shape[-1] != freqs.size- Parameters
amplitudes (ndarray, 2d) – amplitude data of shape
(n_proclen, n_winlen)psds (ndarray, 3d) – power spectral densities of shape
(n_proclen, n_winlen, n_frequencies)freqs (ndarray, 1d) – frequency axis corresponding to psds of shape
(n_frequencies,)
- set_time(starttime, endtime)
Set
startdateandenddate.- Parameters
starttime (
obspy.core.UTCDateTime) – new starttimeendtime (
obspy.core.UTCDateTime) – new endtime
- to_file(outdir='.')
Write content to HDF5 file.
Filename is created from stationcode, startdate and enddate.
- Parameters
outdir (
str['.']) – Directory in which the file is stored. Default is working directory.
- trim_nan()
Remove entire nan-rows at both ends and adjust start/enddate.
- class data_quality_control.base.GenericProcessor(nslc_code, dataclient, invclient, outdir='.', preprocessing=None, fileunit='year', **procparams)
Bases:
objectOrganize processing of data sets and save results to disk.
- Parameters
network (str) – network code
station (str) – station code
location (str) – location code
channel (str) – channel code
dataclient (obspy.clients) – client to obtain data from (
dataclient.get_waveforms())invclient (obspy.clients) – client to get meta data from (
invclient.get_stations())outdir (str, path-like) – directory to which results are written
preprocessing (func) – Function, that performs basic preprocessing of seismic data. Must take obspy.Stream as argument and return obspy.Trace.
fileunit ({"hour", "day", "month", "year"}, ["year"]) – Interval after which a new output file is started. Determines also format string for file-name E.g.:
fileunit="year"means all results for data from 1 year (01-January - 31-December) will be stored in one file. If start/end time for processing are different from 01-January/31-December files will be truncated accordingly.**procparams – processing parameters as keyword arguments
data_quality_control.base.ProcessingParameters
Note
Format of filenames will be determined by
fileunitasFNAME_BASE = "{outdir}/{network}.{station}.{location}.{channel}_" FNAME_FMTS = { "year" : FNAME_BASE + "{year:04d}.hdf5", "month" : FNAME_BASE + "{year:04d}-{month:02d}.hdf5", "day" : FNAME_BASE + "{year:04d}-{month:02d}-{day:02d}.hdf5", "hour" : FNAME_BASE + "{year:04d}-{month:02d}-{day:02d}-{hour:02d}.hdf5", }
year, month, day, hourcome from datetime object.Note
expand_nslc()can be customized for a specific data base or data client. If implemented, this may allow for wildcards instation, network, location, channel- expand_nslc(starttime=None, endtime=None)
Expand network, station, location, channel settings into lists.
This may need to be customized to the data base. It sets lists of respective codes as attributes
_networks, __stations, _locations, _channels
- iter_nslc()
Iterator that yields each combination of network, station, location, channel from
self._networks, self._stations, self._locations, self._channels. These lists are set byexpand_nslc()
- iter_time(starttime, endtime)
Iterator that yields start/end of intervals between
starttime,endtimedepending onfileunit.Example
If
fileunit='year':gp = GenericProcessor(... fileunit="year") stime = UTC("2018-05-14") etime = UTC("2021-09-20") for s, e in gp.iter_time(stime, etime): print(s.date, e.date) >>>> 2018-05-14 2018-12-31 >>>> 2019-01-01 2019-12-31 >>>> 2020-01-01 2020-12-31 >>>> 2021-01-01 2021-09-20
- process(starttime, endtime, force_new_file=False)
Execute processing.
This is the method you most likely want to call!
Iterates through processing interval defined by
fileunitand calls NSCProcessor().process() for each NSLC combination.Manages creation of output files.
- property winlen_seconds
- class data_quality_control.base.NSCProcessor(netw, stat, loc, chan, dataclient, invclient, **procparams)
Bases:
objectProcess one specific network-station-location-channel combination.
Processing means extracting percentile of amplitude and power spectral density over some time window.
- Parameters
netw (str) – network code
stat (str) – station code
chan (str) – channel code
loc (str) – location code
dataclient (obspy.clients) – client to obtain data from (
dataclient.get_waveforms())invclient (obspy.clients) – client to get meta data from (
invclient.get_stations())**procparams – processing parameters as keyword arguments
data_quality_control.base.ProcessingParameters
- nsc_as_dict()
Return network, station, location, channel as dict.
- process(starttime, endtime, preprocessing=None)
Compute noise amplitudes and spectra of raw data within time range.
Between start/end time, we load waveform data in chunks of
proclen_seconds. Amplitude and PSD are computed for each frame of lengthwinlen_secondsthat chunk. So the number of frames per chunk isnf = proclen_seconds / winlen_secondsEmpty streams are replaced by
np.nan.- Parameters
startdate (UTCDateTime) –
enddate (UTCDateTime) –
preprocessing (func [None]) – function that takes obspy.Stream and returns obspy.Trace. If
None, we usedataqc.util.process_stream()which merges and trims data, fills gaps withnp.nanand removes mean.
- Returns
output – An object, which bundles amplitude and psd matrix, frequency axis for psd and processing parameters. Shape of amplitude will be
(n_proclens, nf),n_proclensbeing the number of chunks between start/end time. Shape of psd will be(n_proclens, nf, nperseg//2+1). The last axis is the number of frequencies in psd.- Return type
- class data_quality_control.base.ProcessedDataFileManager(outdir, nperseg=None, processeddata=None, fileunit='year')
Bases:
object- allocate_file_start_end()
Determine datetime of start/end time of data covered in output file depending on
fileunit.Example
- if
fileunit="year", starttime=UTC("2021-04-10"): stime=2021-01-01,etime = 2021-12-31
- if
- if
fileunit="month, starttime=UTC("2021-04-10"): stime = 2021-04-01,etime = 2021-04-30
- if
- create_ofile(force_fileunit=True)
Create output file.
- property fname_fmt
- get_ofile(force_new_file=False, force_fileunit=True)
Open or create new output file for results from an
NSCProcessor- Parameters
nscprocessor (NSCProcessor) –
starttime (UTCDateTime) – nominal starttime of file
force_new_file (bool [False]) – If True, always create new, fresh file, i.e. override existing ones.
- Returns
f – open file handler
- Return type
h5py.File
- nsc_as_dict()
Return network, station, location, channel as dict.
- property ofilename
- set_data(processeddata, forced_startdate=None)
- property winlen_seconds
- write_data(force_new_file=False, force_fileunit=True)
- class data_quality_control.base.ProcessingParameters(**kwargs)
Bases:
objectBundle processing parameters.
- Parameters
overlap (int [60]) – length of taper (overlap) if overlapping frames are necessary. In seconds.
amplitude_frequencies (2-tuple [(4,14)]) – min, max frequency of Butterworth bandpass, applied before computing amplitudes
nperseg (int [2048]) – segment size in samples to estimate Welch spectrum (see scipy.signal.welch ) Together with sampling_rate it determines the resolution of the frequency axis.
winlen_seconds (int [3600]) – length of time window (=frame) over which amplitude and psd are computed. In seconds
proclen_seconds (int [24*3600]) – length of data processed at once. In seconds. Default is 1 day, corresponding to length of data in 1 sds-file
sampling_rate (int [20]) – sampling rate of seismic data at which amplitude and spectra are computed. Data are resampled if original SR deviates. Together with nperseg it determines the resolution of the frequency axis.
- get_dict()
Get attributes as dict.
- update(**kwargs)
Update an attribute
- data_quality_control.base.decorator_assert_integer_quotient_of_wins_per_fileunit(func)