data_quality_control.analysis module
- class data_quality_control.analysis.Analyzer(datadir, nslc_code, fileunit='year')
Bases:
data_quality_control.base.BaseProcessedDataHandler to view and manage processed amplitudes and PSDs.
Loads data for desired time range and provides plotting routines.
- Parameters
datadir (
pathlib.Path) – directory of HDF5-filesnslc_code (str) – code of format {network}.{station}.{location}.{channel}
fileunit ({"year", "month", "day", "hour"}) – flag indicating the expected time range per file. Determines the search pattern for files.
- iter_time(starttime, endtime)
Generator providing start and endtime of HDF5-files within requested timerange depending on
fileunitSee
fileunit="year":data_quality_control.util.iter_years(),fileunit="month":data_quality_control.util.iter_month(),fileunit="day":data_quality_control.util.iter_days(),fileunit="hour":data_quality_control.util.iter_hours()
- Type
func
- _check_if_requested_times_are_available(stime, etime)
Reduce start/end time to those in self if out of available range. Issue errors
- _check_times(starttimes, endtime)
Determines if given times indicate time range or time list.
Raises UserWarning if input is wrong.
- _iter_open_hdf5(fnamelist=None)
Generator that returns open h5py.File object for each filename in fnamelist. Closes file before yielding next file and before error is raised.
Useful for testing or if direct access to the hdf5-files is needed.
If fnamelist=None, we use self.files. Causes error if not set.
- _reset()
Set attributes amplitude_frequencies, winlen_seconds, startdate, enddate, amplitudes, psds, frequency_axis to None.
- filter_psds_for_times(timelist)
Select and set PSDs only for given times.
Finds indices of times in list, extracts respective PSDs and replaces
psds.- Parameters
timelist (list of UTCDateTimes) –
Warning
Overrides existing psd array! Cannot be used multiple times without reloading the data.
- property fmtstr
Name pattern of hdf5-files.
- get_available_datafiles()
Return list with all available HDF5-filenames for
stationcodeindatadir
- get_available_timerange()
Get list of available files and read startdate of first and enddate of last file in sorted list.
- get_data(starttimes, endtime=None)
Load amplitudes, psds and metadata from HDF5-files for specified times.
starttimes can be given as array-like object containing UTCDateTimes or a single UTCDateTime. In the first case Psds are loaded only for the windows corresponding to the times in starttimes. In the second case, endtime must be given as UTCDateTime. Psds are loaded for the entire range between start and end time.
If list of starttimes is given, starttime and endtime properties of the Analyzer are set to minimum and maximum times in list.
Amplitudes are always loaded for the entire range between starttime and endtime.
- Parameters
starttimes (array-like or UTCDateTime) – If UTCDateTime, we expect also endtime as UTCDateTime. PSDs are loaded for the time range between start and endtime. if array-like, endtime is ignored. Psds are loaded for windows listed in starttimes only.
endtime (None or UTCDateTime) – only required if starttimes is UTCDateTime. End of time range for data selection.
- get_filenames(starttime, endtime)
Get filenames within time range.
- nslc_as_dict()
Return station code as dictionary.
- plot3d()
- plot3d_amplitudes(func=None)
- plot3d_spectrogram(func=None, colorbarlabel='', freqs=(None, None), log_freq_ax=False, vmin=None, vmax=None)
Notes
Latex rendering for me works in title but not on axis labels.
- plot_spectrogram(ax=None, func=None, colorbarlabel='', freqs=(None, None), log_freq_ax=False, **kwargs)
Plot power spectral densities as spectrogram.
We use matplotlibs pcolormesh to plot the PSDs in a spectrogram-like way, i.e. time on x-axis, frequency on y-axis and PSD as color.
- Parameters
ax (matplotlib.axes) – axes to plot in
func (callable) – modifies PSDS as func(self.psds.T) before plotting. If not given, we plot np.log10(self.psds.T * 1e9**2) which is log10(nm^2/s^2/Hz.
colorbarlabel (str [""]) – set colorbar label. Useful in combination with func to set correct units for color scale.
kwargs – keyword arguments passed to pcolormesh. vmax can be callable to set vmax=vmax(Z). If not given, we use: cmap=plt.cm.afmhot, shading=auto, vmax=np.nanmax(Z)
- trim(starttime=None, endtime=None, fill_value=None)
Remove data outside of start/end time.
- Parameters
starttime (
obspy.core.UTCDateTime) – If None set starttime is usedendtime (
obspy.core.UTCDateTime) – If None set endtime is usedfill_value (float, int [None]) – If given, fills arrays to requested start/endtime if available data starts after starttime or ends before endtime. If None, resulting time range may be shorter than requested.
- class data_quality_control.analysis.SmoothOperator(datadir, nslc_code, fileunit='year', kernel_size=None, kernel_shift=None)
Bases:
data_quality_control.analysis.AnalyzerManage smoothing/data reduction of processed amplitudes and PSDs.
Grandparent is
BaseProcessedData.The main routine to call is
smooth(). The other methods are mostly helpers to perform the smoothing/downsampling, organize the iteration over the data base and manage IO-operations. They can be useful on their own, though.- Parameters
datadir (
pathlib.Path) – directory of HDF5-filesnslc_code (str) – code of format {network}.{station}.{location}.{channel}
fileunit ({"year", "month", "day", "hour"}) – flag indicating the expected time range per file. Determines the search pattern for files.
kernel_size (int > 1) – Number of samples over which median is computed. 1 sample corresponds to the window size of the original data.
kernel_shift (int >= 1) – Number of samples by which the kernel is shifted to compute the next median. See
kernel_shift
Example
from data_quality_control import analysis, dqclogging nslc_code = "GR.BFO..BHZ" datadir = "output/" outdir = "output/smoothd/" kernel_size = 6 kernel_shift = 3 # Set loglevel in console, no logfile dqclogging.configure_handlers(loglevel_console="INFO", loglevel_file=None) # Interpolation ## Initiate smoother polly = analysis.SmoothOperator(datadir, nslc_code, kernel_size=kernel_size, kernel_shift=kernel_shift) ## Start interpolation over whole available time range polly.smooth(outdir, force_new_file=True) # View results lyza = analysis.Analyzer(datadir, nslc_code, fileunit="year") startdate = UTC("2020-12-24") enddate = UTC("2021-01-15") lyza.get_data(startdate, enddate) fig = lyza.plot_spectrogram()
- kernel_size
Number of samples over which median is computed. 1 sample corresponds to the window size of the original data.
- Type
int > 1
- kernel_shift
Number of samples by which the kernel is shifted to compute the next median.
kernel_shift=1corresponds to a classic running median,kernel_shift>1downsamples the original data. Determinesdata_quality_control.base.BaseProcessedData.winlen_seconds. For example if the seismic data were processed usingwinlen_seconds=3600the newdata_quality_control.base.BaseProcessedData.winlen_secondsis 6*3600s = 21600s.- Type
int >= 1
- WINLEN_SECONDS
window size in seconds of the original processed data, i.e. the window size over which amplitudes and psds were computed from the seismic data. Is set based on the value in the first file and checked against
data_quality_control.base.BaseProcessedData.winlen_secondsof every subsequent file. RaisesRuntimeErrorif it changes.- Type
int
- winlen_seconds
Window size in seconds of the current data. Changes with every iteration, i.e. every newly read file, between value of the original processed data read from file and
kernel_shift*winlen_secondsafter the median operation is applied.- Type
int
- _check_framed_shape(x, X, label='')
Used in
_medianfilter()to check if all data went into frames.- Parameters
x (ndarray (1D)) – data as time series
X (ndarray (2D)) – data as (overlapping) frames
kernel_shift (int) – distance between two frames in samples
label (str) – name of x for more meaningful error msg.
- Raises
AssertionError – if there are samples left.
- _check_kernelshiftsize()
Check if
winlen_secondsandkernel_shiftif resulting increment does not yield an integer quotient of possible total durations defined byfileunit.Runs
data_quality_control.util.assert_integer_quotient_of_wins_per_fileunit()- Raises
UserWarning – if
kernel_shiftyields inappropriate increment
- _get_WINLEN_SECONDS(TSTA, TEND)
Read first file in list to get window size in seconds.
- Parameters
TSTA (UTCDateTime) – beginning of time range in which we look for data.
TEND (UTCDateTime) – end of time range in which we look for data.
Example
If you want to know the
winlen_secondsin your targeted data base, adjust the following code snippet:nslc_code = "GR.BFO..BHZ" datadir = "output/" polly = analysis.SmoothOperator(datadir, nslc_code) polly._get_WINLEN_SECONDS(*polly.get_available_timerange()) print("Winlen in processed data", polly.WINLEN_SECONDS)
- _get_start_endtime(starttime=None, endtime=None)
Replace
Noneinput parameters with start/end time from all available data.
- _medianfilter()
Transform timeseries data into frames and apply median operation.
- _set_check_WINLEN_SECONDS()
Sets
WINLEN_SECONDSif not set, otherwise raises error if different from py:attr:.winlen_seconds. Used to monitor if window size changes between files.
- iter_times_kernel(tsta, tend)
Generator providing start and endtime of data to read for interpolation.
It’s a wrapper around
data_quality_control.analysis.Analyzer.iter_time()which adjusts the yielded start and endtimes bykernel_shift,kernel_sizeandWINLEN_SECONDSto accommodate all data to smooth over the entirefileunit.- Yields
new_tsta (
UTCDateTime) – starttime of data to read for interpolation. Determined bynew_tendexcept for first one.new_tend (
UTCDateTime) – endtime of data to read for interpolation
- set_kernel(kernel_size, kernel_shift)
Set kernel parameters. Forces integers.
Preserves existing attribute if
kernel_size=Noneorkernel_shift=None.- Parameters
kernel_size (num > 1) – Number of samples over which median is computed. 1 sample corresponds to the window size of the original data.
kernel_shift (num > 1) – Number of samples by which the kernel is shifted to compute the next median. See
kernel_shift
- smooth(outdir='.', starttime=None, endtime=None, kernel_size=None, kernel_shift=None, force_new_file=False)
Performs smoothing/downsampling of database for requested time range.
- Parameters
outdir (str) – directory where new results are stored. Should be different from
datadirbecause file names will be identical and would override input data.starttime (UTCDateTime) – Begin of time range to process. If
Noneearliest available time in database is used.endtime (UTCDateTime) – End of time range to process. If
Nonelastest available time in database is used.kernel_size (int, None) – See
kernel_size. Overrides existing value. Must be set here if not set on initialization.kernel_shift (int, None) – See
kernel_shift. Overrides existing value. Must be set here if not set on initialization.force_new_file (bool [False]) – If
Trueoverwrites existing output data.