Handling of missing files
Here, I demonstrate what happens if no files are available for a requested date.
We treat 3 cases:
no data at all for requested time period
In this case, no data can be processed. An HDF5 file is still created, but because it contains no data arrays, it is very small. We still iterate over every requested day because we do not check beforehand whether any data exists. Such a check can be quite expensive if the database is large, so it is the user's responsibility to request only reasonable time frames.
no files until or after some day within the requested period.
E.g. the database has entries from 2020-Feb-01 to 2020-Oct-01, but we request 2020-Jan-01 to 2020-Dec-31. Again, the algorithm has to try every requested day. However, only the data and dates within the available time range are saved. The missing days are filled with NaNs.
no files within a requested period
E.g. the database has entries from 2020-Feb-01 to 2020-Oct-01, but the files for 2020-Mar-01 to 2020-Mar-10 are missing. We request data for 2020-Feb-01 to 2020-Oct-01. In this case, we get a processed data set in which the missing days are filled with NaNs in the amplitude and PSD data.
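The NaN filling described in cases 2 and 3 can be illustrated with a minimal sketch. It is not the package's implementation: the dates, the set `missing_days` and the constant value standing in for a processing result are made up for illustration. The idea is only that the output array is allocated for the whole requested period and rows for days without files are never overwritten.

```python
import numpy as np
from datetime import date, timedelta

# Hypothetical example values, not taken from data_quality_control
requested_start = date(2020, 2, 1)
requested_end = date(2020, 2, 10)            # inclusive
missing_days = {date(2020, 2, 4), date(2020, 2, 5)}
windows_per_day = 24                         # hourly windows in a 1-day block

n_days = (requested_end - requested_start).days + 1

# Allocate the full output up front, filled with NaN
amplitudes = np.full((n_days, windows_per_day), np.nan)

for i in range(n_days):
    day = requested_start + timedelta(days=i)
    if day in missing_days:
        continue                             # no file for this day: row stays NaN
    amplitudes[i, :] = 1.0                   # stand-in for a real processing result

# Rows that are entirely NaN mark the days without data
missing_rows = np.isnan(amplitudes).all(axis=1)
print(np.flatnonzero(missing_rows))          # -> [3 4]
```

Downstream statistics on such an array should use NaN-aware functions such as `np.nanmean`, otherwise a single missing day propagates into the result.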
[1]:
from importlib import reload
import os
from pathlib import Path
import numpy as np
[2]:
from data_quality_control import base
[3]:
from obspy.clients.filesystem.sds import Client
from obspy.clients.fdsn import RoutingClient
from obspy.core import UTCDateTime as UTC
[4]:
import matplotlib.pyplot as plt
plt.style.use('tableau-colorblind10')
[5]:
nscl_code = "GR.BFO..BHZ"
overlap = 60 #3600
fmin, fmax = (4, 14)
nperseg = 2048
winlen_in_s = 3600
proclen = 24*3600
outdir = Path("output/missing_file_handling") #'../sample_output/missing_file_handling/'
sds_root = os.path.abspath('../sample_sds/')
inventory_routing_type = "eida-routing"
sdsclient = Client(sds_root)
invclient = RoutingClient(inventory_routing_type)
Create output directory:
[6]:
outdir.mkdir(parents=True, exist_ok=True)
Database content
We have the last 7 days of 2020 (day of year 360 to 366) and the first 9 days of January 2021 for GR.BFO..BHZ.
[7]:
%ls ../sample_sds/*/*/*/*
../sample_sds/2020/GR/BFO/BHZ.D:
GR.BFO..BHZ.D.2020.360 GR.BFO..BHZ.D.2020.363 GR.BFO..BHZ.D.2020.366
GR.BFO..BHZ.D.2020.361 GR.BFO..BHZ.D.2020.364
GR.BFO..BHZ.D.2020.362 GR.BFO..BHZ.D.2020.365
../sample_sds/2021/GR/BFO/BHZ.D:
GR.BFO..BHZ.D.2021.001 GR.BFO..BHZ.D.2021.004 GR.BFO..BHZ.D.2021.007
GR.BFO..BHZ.D.2021.002 GR.BFO..BHZ.D.2021.005 GR.BFO..BHZ.D.2021.008
GR.BFO..BHZ.D.2021.003 GR.BFO..BHZ.D.2021.006 GR.BFO..BHZ.D.2021.009
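The file names in the listing end in the year and the day of year. As a small sketch (the helper `sds_filename_to_date` is hypothetical and not part of the package), the standard library can translate them into calendar dates, which makes the available time range easier to compare with a request:

```python
from datetime import datetime

def sds_filename_to_date(name):
    """Parse the trailing YEAR.DOY of an SDS file name into a date."""
    year, doy = name.split(".")[-2:]
    return datetime.strptime(f"{year}.{doy}", "%Y.%j").date()

# Rebuild the file names shown in the listing above
available = {
    sds_filename_to_date(f"GR.BFO..BHZ.D.2020.{doy:03d}") for doy in range(360, 367)
}
available |= {
    sds_filename_to_date(f"GR.BFO..BHZ.D.2021.{doy:03d}") for doy in range(1, 10)
}

print(len(available))                    # -> 16
print(min(available), max(available))    # -> 2020-12-25 2021-01-09
```

Because 2020 is a leap year, day 366 is 31 December and day 360 is 25 December.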
Init processor for station code and clients
[8]:
reload(base)
processor = base.GenericProcessor(
nscl_code,
dataclient=sdsclient,
invclient=invclient,
outdir=outdir,
# Processing parameters are omitted: the defaults match the values defined above
)
print(processor)
using default for overlap
using default for amplitude_frequencies
using default for nperseg
using default for winlen_seconds
using default for proclen_seconds
using default for sampling_rate
GR.BFO..BHZ
Data is sent to output/missing_file_handling
Data client:
sds_root=/home/docs/checkouts/readthedocs.org/user_builds/lonesam/checkouts/latest/docs/sample_sds; sds_type=D; format=MSEED; fileborder_seconds=30; fileborder_samples=5000
Inventory client _user_agent=ObsPy/1.2.2 (Linux-5.19.0-1028-aws-x86_64-with-glibc2.31, Python 3.9.7); _timeout=120; _debug=False; _BaseRoutingClient__include_providers=[]; _BaseRoutingClient__exclude_providers=[]; credentials={}; _url=http://www.orfeus-eu.org/eidaws/routing/1
Processing settings:
overlap = 60
amplitude_frequencies = (4, 14)
nperseg = 2048
winlen_seconds = 3600
proclen_seconds = 86400
sampling_rate = 20
Preprocessing of seismic data: process_stream
Windows per proclen: 24
Request data entirely outside available time frame
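We try to process every requested day as usual. Since no data is available, nothing is written into the HDF5 files, although an empty file is still created for each requested year. The logger reports an error for each day without data and issues a warning that the output has no data to insert.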
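The day-by-day iteration can be sketched as follows. This is an illustration under assumptions, not the package's code: the generator `daily_windows` is hypothetical, and the padding of each day by `overlap` seconds on both sides is inferred from the time spans printed in the DEBUG lines of the log.

```python
from datetime import datetime, timedelta

def daily_windows(startdate, enddate, proclen_seconds=86400, overlap=60):
    """Yield (window_start, window_end) for every day from startdate to
    enddate (inclusive), padded by `overlap` seconds on both sides."""
    pad = timedelta(seconds=overlap)
    step = timedelta(seconds=proclen_seconds)
    t = startdate
    while t <= enddate:
        yield t - pad, t + step + pad
        t += step

windows = list(daily_windows(datetime(2018, 12, 25), datetime(2019, 1, 5)))
print(len(windows))    # -> 12
print(windows[0][0], "-", windows[0][1])
# -> 2018-12-24 23:59:00 - 2018-12-26 00:01:00
```

Every one of these windows triggers a read attempt, whether or not a file exists, which is why a long request over an empty period still costs one lookup per day.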
[9]:
startdate = UTC("2018-12-25")
enddate = UTC("2019-01-05")
[10]:
%%time
#it -n1 -r7
processor.process(startdate, enddate, force_new_file=True)
24-02-09 13:26:18 - data_quality_control.base.ProcessedDataFileManager - INFO - Processed data stored per year in output/missing_file_handling as {outdir}/{network}.{station}.{location}.{channel}_{year:04d}.hdf5
24-02-09 13:26:18 - data_quality_control.util - DEBUG - Time iterator yields 2018-12-25T00:00:00.000000Z - 2018-12-31T00:00:00.000000Z
24-02-09 13:26:18 - data_quality_control.base.GenericProcessor - DEBUG - processing 2018-12-25T00:00:00.000000Z - 2018-12-31T00:00:00.000000Z
24-02-09 13:26:18 - data_quality_control.base.GenericProcessor - DEBUG - Times passed to nscprocessor: 2018-12-25T00:00:00.000000Z - 2018-12-31T00:00:00.000000Z
24-02-09 13:26:18 - data_quality_control.base.BaseProcessedData - INFO - Start/end time set to: 2018-12-25T00:00:00.000000Z - 2019-01-01T00:00:00.000000Z
/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/io/stationxml/core.py:96: UserWarning: The StationXML file has version 1.2, ObsPy can read versions (1.0, 1.1). Proceed with caution.
warnings.warn("The StationXML file has version %s, ObsPy can "
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - INFO - Processing GR.BFO..BHZ
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - DEBUG - 2018-12-24T23:59:00.000000Z - 2018-12-26T00:01:00.000000Z
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - ERROR - No data for 2018-12-24T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - DEBUG - 2018-12-25T23:59:00.000000Z - 2018-12-27T00:01:00.000000Z
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - ERROR - No data for 2018-12-25T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - DEBUG - 2018-12-26T23:59:00.000000Z - 2018-12-28T00:01:00.000000Z
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - ERROR - No data for 2018-12-26T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - DEBUG - 2018-12-27T23:59:00.000000Z - 2018-12-29T00:01:00.000000Z
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - ERROR - No data for 2018-12-27T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - DEBUG - 2018-12-28T23:59:00.000000Z - 2018-12-30T00:01:00.000000Z
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - ERROR - No data for 2018-12-28T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - DEBUG - 2018-12-29T23:59:00.000000Z - 2018-12-31T00:01:00.000000Z
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - ERROR - No data for 2018-12-29T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - DEBUG - 2018-12-30T23:59:00.000000Z - 2019-01-01T00:01:00.000000Z
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - ERROR - No data for 2018-12-30T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:20 - data_quality_control.base.GenericProcessor - DEBUG - output time range: 2019-01-01T00:00:00.000000Z - 2019-01-01T00:00:00.000000Z
24-02-09 13:26:20 - data_quality_control.base.GenericProcessor - DEBUG - output time range: 2019-01-01T00:00:00.000000Z - 2019-01-01T00:00:00.000000Z
24-02-09 13:26:20 - data_quality_control.base.ProcessedDataFileManager - DEBUG - Starttime of file: 2018-12-25T00:00:00.000000Z
24-02-09 13:26:20 - data_quality_control.base.ProcessedDataFileManager - DEBUG - Allocated start/endtime: 2018-01-01T00:00:00.000000Z - 2019-01-01T00:00:00.000000Z
24-02-09 13:26:20 - data_quality_control.base.ProcessedDataFileManager - INFO - Creating output file output/missing_file_handling/GR.BFO..BHZ_2018.hdf5
24-02-09 13:26:20 - data_quality_control.base.ProcessedDataFileManager - INFO - Starttime=2018-01-01T00:00:00.000000Z, endtime=2019-01-01T00:00:00.000000Z, n_windows=8760
24-02-09 13:26:20 - data_quality_control.base.BaseProcessedData - WARNING - output has no data to insert
24-02-09 13:26:20 - data_quality_control.util - DEBUG - Iterator reset enddate to 2019-01-05T00:00:00.000000Z
24-02-09 13:26:20 - data_quality_control.util - DEBUG - Time iterator yields 2019-01-01T00:00:00.000000Z - 2019-01-05T00:00:00.000000Z
24-02-09 13:26:20 - data_quality_control.base.GenericProcessor - DEBUG - processing 2019-01-01T00:00:00.000000Z - 2019-01-05T00:00:00.000000Z
24-02-09 13:26:20 - data_quality_control.base.GenericProcessor - DEBUG - Times passed to nscprocessor: 2019-01-01T00:00:00.000000Z - 2019-01-05T00:00:00.000000Z
24-02-09 13:26:20 - data_quality_control.base.BaseProcessedData - INFO - Start/end time set to: 2019-01-01T00:00:00.000000Z - 2019-01-06T00:00:00.000000Z
Data in output False
<HDF5 file "GR.BFO..BHZ_2018.hdf5" (mode r+)>
/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/io/stationxml/core.py:96: UserWarning: The StationXML file has version 1.2, ObsPy can read versions (1.0, 1.1). Proceed with caution.
warnings.warn("The StationXML file has version %s, ObsPy can "
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - INFO - Processing GR.BFO..BHZ
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - DEBUG - 2018-12-31T23:59:00.000000Z - 2019-01-02T00:01:00.000000Z
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - ERROR - No data for 2018-12-31T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:20 - data_quality_control.base.NSCProcessor - DEBUG - 2019-01-01T23:59:00.000000Z - 2019-01-03T00:01:00.000000Z
24-02-09 13:26:21 - data_quality_control.base.NSCProcessor - ERROR - No data for 2019-01-01T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:21 - data_quality_control.base.NSCProcessor - DEBUG - 2019-01-02T23:59:00.000000Z - 2019-01-04T00:01:00.000000Z
24-02-09 13:26:21 - data_quality_control.base.NSCProcessor - ERROR - No data for 2019-01-02T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:21 - data_quality_control.base.NSCProcessor - DEBUG - 2019-01-03T23:59:00.000000Z - 2019-01-05T00:01:00.000000Z
24-02-09 13:26:21 - data_quality_control.base.NSCProcessor - ERROR - No data for 2019-01-03T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:21 - data_quality_control.base.NSCProcessor - DEBUG - 2019-01-04T23:59:00.000000Z - 2019-01-06T00:01:00.000000Z
24-02-09 13:26:21 - data_quality_control.base.NSCProcessor - ERROR - No data for 2019-01-04T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:21 - data_quality_control.base.GenericProcessor - DEBUG - output time range: 2019-01-06T00:00:00.000000Z - 2019-01-06T00:00:00.000000Z
24-02-09 13:26:21 - data_quality_control.base.GenericProcessor - DEBUG - output time range: 2019-01-06T00:00:00.000000Z - 2019-01-06T00:00:00.000000Z
24-02-09 13:26:21 - data_quality_control.base.ProcessedDataFileManager - DEBUG - Starttime of file: 2019-01-01T00:00:00.000000Z
24-02-09 13:26:21 - data_quality_control.base.ProcessedDataFileManager - DEBUG - Allocated start/endtime: 2019-01-01T00:00:00.000000Z - 2020-01-01T00:00:00.000000Z
24-02-09 13:26:21 - data_quality_control.base.ProcessedDataFileManager - INFO - Creating output file output/missing_file_handling/GR.BFO..BHZ_2019.hdf5
24-02-09 13:26:21 - data_quality_control.base.ProcessedDataFileManager - INFO - Starttime=2019-01-01T00:00:00.000000Z, endtime=2020-01-01T00:00:00.000000Z, n_windows=8760
24-02-09 13:26:21 - data_quality_control.base.BaseProcessedData - WARNING - output has no data to insert
24-02-09 13:26:21 - data_quality_control.base.GenericProcessor - INFO - Finished. Took 0:00:02.178057 h
Data in output False
<HDF5 file "GR.BFO..BHZ_2019.hdf5" (mode r+)>
CPU times: user 176 ms, sys: 37.7 ms, total: 214 ms
Wall time: 2.18 s
[11]:
processor
[11]:
GR.BFO..BHZ
Data is sent to output/missing_file_handling
Data client:
sds_root=/home/docs/checkouts/readthedocs.org/user_builds/lonesam/checkouts/latest/docs/sample_sds; sds_type=D; format=MSEED; fileborder_seconds=30; fileborder_samples=5000
Inventory client _user_agent=ObsPy/1.2.2 (Linux-5.19.0-1028-aws-x86_64-with-glibc2.31, Python 3.9.7); _timeout=120; _debug=False; _BaseRoutingClient__include_providers=[]; _BaseRoutingClient__exclude_providers=[]; credentials={}; _url=http://www.orfeus-eu.org/eidaws/routing/1
Processing settings:
overlap = 60
amplitude_frequencies = (4, 14)
nperseg = 2048
winlen_seconds = 3600
proclen_seconds = 86400
sampling_rate = 20
Preprocessing of seismic data: process_stream
Windows per proclen: 24
[12]:
%ls ../sample_output/missing_file_handling/
ls: cannot access '../sample_output/missing_file_handling/': No such file or directory
Request partially available time frame
We request data for 2020-12-20 to 2021-01-15. We expect 2 files, one for the 2020 data and one for 2021. However, the request extends beyond the available data on both ends: 5 days before the first available day and 6 days after the last.
The algorithm thus starts filling the first file (the one for 2020) only on 25 December. The data in the second file end on 9 January.
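Since fileunit="year" (default), the processing length is 1 day (proclen_seconds=24*3600) and the window length is 1 hour, the output file is allocated for 365 × 24 = 8760 amplitude windows per year, or 366 × 24 = 8784 in the leap year 2020 (the n_windows value reported in the log).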
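The allocation size and the position of a window inside a yearly file follow from simple arithmetic. The sketch below uses two hypothetical helpers, `n_windows_in_year` and `window_index`, and assumes hourly windows counted from 1 January of the file's year. It is not the package's implementation, but its results agree with the `n_windows` values and the targeted index range reported in the log of this run.

```python
import calendar
from datetime import datetime

winlen_seconds = 3600    # hourly windows, as in the processing settings

def n_windows_in_year(year):
    """Number of windows needed to cover one calendar year."""
    days = 366 if calendar.isleap(year) else 365
    return days * 24 * 3600 // winlen_seconds

def window_index(t):
    """Index of the window starting at time t, counted from 1 January."""
    year_start = datetime(t.year, 1, 1)
    return int((t - year_start).total_seconds()) // winlen_seconds

print(n_windows_in_year(2019))                   # -> 8760
print(n_windows_in_year(2020))                   # -> 8784
print(window_index(datetime(2020, 12, 25, 1)))   # -> 8617
```

The first window with data in the 2020 file starts at 2020-12-25T01:00, so the data occupy indices 8617 to 8783 and everything before stays empty.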
[13]:
startdate = UTC("2020-12-20")
enddate = UTC("2021-01-15")
[14]:
%%time
#it -n1 -r7
processor.process(startdate, enddate, force_new_file=True)
24-02-09 13:26:21 - data_quality_control.base.ProcessedDataFileManager - INFO - Processed data stored per year in output/missing_file_handling as {outdir}/{network}.{station}.{location}.{channel}_{year:04d}.hdf5
24-02-09 13:26:21 - data_quality_control.util - DEBUG - Time iterator yields 2020-12-20T00:00:00.000000Z - 2020-12-31T00:00:00.000000Z
24-02-09 13:26:21 - data_quality_control.base.GenericProcessor - DEBUG - processing 2020-12-20T00:00:00.000000Z - 2020-12-31T00:00:00.000000Z
24-02-09 13:26:21 - data_quality_control.base.GenericProcessor - DEBUG - Times passed to nscprocessor: 2020-12-20T00:00:00.000000Z - 2020-12-31T00:00:00.000000Z
24-02-09 13:26:21 - data_quality_control.base.BaseProcessedData - INFO - Start/end time set to: 2020-12-20T00:00:00.000000Z - 2021-01-01T00:00:00.000000Z
/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/io/stationxml/core.py:96: UserWarning: The StationXML file has version 1.2, ObsPy can read versions (1.0, 1.1). Proceed with caution.
warnings.warn("The StationXML file has version %s, ObsPy can "
24-02-09 13:26:22 - data_quality_control.base.NSCProcessor - INFO - Processing GR.BFO..BHZ
24-02-09 13:26:22 - data_quality_control.base.NSCProcessor - DEBUG - 2020-12-19T23:59:00.000000Z - 2020-12-21T00:01:00.000000Z
24-02-09 13:26:22 - data_quality_control.base.NSCProcessor - ERROR - No data for 2020-12-19T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:22 - data_quality_control.base.NSCProcessor - DEBUG - 2020-12-20T23:59:00.000000Z - 2020-12-22T00:01:00.000000Z
24-02-09 13:26:22 - data_quality_control.base.NSCProcessor - ERROR - No data for 2020-12-20T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:22 - data_quality_control.base.NSCProcessor - DEBUG - 2020-12-21T23:59:00.000000Z - 2020-12-23T00:01:00.000000Z
24-02-09 13:26:22 - data_quality_control.base.NSCProcessor - ERROR - No data for 2020-12-21T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:22 - data_quality_control.base.NSCProcessor - DEBUG - 2020-12-22T23:59:00.000000Z - 2020-12-24T00:01:00.000000Z
24-02-09 13:26:22 - data_quality_control.base.NSCProcessor - ERROR - No data for 2020-12-22T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:22 - data_quality_control.base.NSCProcessor - DEBUG - 2020-12-23T23:59:00.000000Z - 2020-12-25T00:01:00.000000Z
24-02-09 13:26:22 - data_quality_control.util - INFO - Found nans in GR.BFO..BHZ | 2020-12-23T23:59:00.019538Z - 2020-12-25T00:00:59.969538Z | 20.0 Hz, 1730400 samples
/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/signal/filter.py:67: UserWarning: Selected high corner frequency (14) of bandpass is at or above Nyquist (10.0). Applying a high-pass instead.
warnings.warn(msg)
/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/numpy/lib/nanfunctions.py:1395: RuntimeWarning: All-NaN slice encountered
result = np.apply_along_axis(_nanquantile_1d, axis, a, q,
24-02-09 13:26:22 - data_quality_control.base.NSCProcessor - DEBUG - 2020-12-24T23:59:00.000000Z - 2020-12-26T00:01:00.000000Z
24-02-09 13:26:22 - data_quality_control.util - INFO - Found nans in GR.BFO..BHZ | 2020-12-24T23:59:00.019538Z - 2020-12-26T00:00:59.969538Z | 20.0 Hz, 1730400 samples
24-02-09 13:26:22 - data_quality_control.base.NSCProcessor - DEBUG - 2020-12-25T23:59:00.000000Z - 2020-12-27T00:01:00.000000Z
24-02-09 13:26:22 - data_quality_control.base.NSCProcessor - DEBUG - 2020-12-26T23:59:00.000000Z - 2020-12-28T00:01:00.000000Z
24-02-09 13:26:23 - data_quality_control.base.NSCProcessor - DEBUG - 2020-12-27T23:59:00.000000Z - 2020-12-29T00:01:00.000000Z
24-02-09 13:26:23 - data_quality_control.base.NSCProcessor - DEBUG - 2020-12-28T23:59:00.000000Z - 2020-12-30T00:01:00.000000Z
24-02-09 13:26:23 - data_quality_control.base.NSCProcessor - DEBUG - 2020-12-29T23:59:00.000000Z - 2020-12-31T00:01:00.000000Z
24-02-09 13:26:23 - data_quality_control.base.NSCProcessor - DEBUG - 2020-12-30T23:59:00.000000Z - 2021-01-01T00:01:00.000000Z
24-02-09 13:26:23 - data_quality_control.base.GenericProcessor - DEBUG - output time range: 2020-12-24T00:00:00.000000Z - 2021-01-01T00:00:00.000000Z
24-02-09 13:26:23 - data_quality_control.base.BaseProcessedData - DEBUG - Shapes before trim_nan: (192,), (192, 1025)
24-02-09 13:26:23 - data_quality_control.base.BaseProcessedData - INFO - Start/end time set to: 2020-12-25T01:00:00.000000Z - 2021-01-01T00:00:00.000000Z
24-02-09 13:26:23 - data_quality_control.base.BaseProcessedData - DEBUG - Removed 25 samples from beginning and 0 samples from end
24-02-09 13:26:23 - data_quality_control.base.BaseProcessedData - DEBUG - Shapes after trim_nan: (167,), (167, 1025)
24-02-09 13:26:23 - data_quality_control.base.GenericProcessor - DEBUG - output time range: 2020-12-25T01:00:00.000000Z - 2021-01-01T00:00:00.000000Z
24-02-09 13:26:23 - data_quality_control.base.ProcessedDataFileManager - DEBUG - Starttime of file: 2020-12-20T00:00:00.000000Z
24-02-09 13:26:23 - data_quality_control.base.ProcessedDataFileManager - DEBUG - Allocated start/endtime: 2020-01-01T00:00:00.000000Z - 2021-01-01T00:00:00.000000Z
24-02-09 13:26:23 - data_quality_control.base.ProcessedDataFileManager - INFO - Creating output file output/missing_file_handling/GR.BFO..BHZ_2020.hdf5
24-02-09 13:26:23 - data_quality_control.base.ProcessedDataFileManager - INFO - Starttime=2020-01-01T00:00:00.000000Z, endtime=2021-01-01T00:00:00.000000Z, n_windows=8784
24-02-09 13:26:23 - data_quality_control.base.BaseProcessedData - DEBUG - Params of data inserted to file:
24-02-09 13:26:23 - data_quality_control.base.BaseProcessedData - DEBUG - starttime: 2020-12-25T01:00:00.000000Z
24-02-09 13:26:23 - data_quality_control.base.BaseProcessedData - DEBUG - endtime: 2021-01-01T00:00:00.000000Z
24-02-09 13:26:23 - data_quality_control.base.BaseProcessedData - DEBUG - Amplitude matrix shape: 167
24-02-09 13:26:23 - data_quality_control.base.BaseProcessedData - DEBUG - Target shape: (167)
24-02-09 13:26:23 - data_quality_control.base.BaseProcessedData - DEBUG - Total shape of target 8784
24-02-09 13:26:23 - data_quality_control.base.BaseProcessedData - DEBUG - Targeted index range 8617:8784
24-02-09 13:26:23 - data_quality_control.util - DEBUG - Iterator reset enddate to 2021-01-15T00:00:00.000000Z
24-02-09 13:26:23 - data_quality_control.util - DEBUG - Time iterator yields 2021-01-01T00:00:00.000000Z - 2021-01-15T00:00:00.000000Z
24-02-09 13:26:23 - data_quality_control.base.GenericProcessor - DEBUG - processing 2021-01-01T00:00:00.000000Z - 2021-01-15T00:00:00.000000Z
24-02-09 13:26:23 - data_quality_control.base.GenericProcessor - DEBUG - Times passed to nscprocessor: 2021-01-01T00:00:00.000000Z - 2021-01-15T00:00:00.000000Z
24-02-09 13:26:23 - data_quality_control.base.BaseProcessedData - INFO - Start/end time set to: 2021-01-01T00:00:00.000000Z - 2021-01-16T00:00:00.000000Z
<HDF5 file "GR.BFO..BHZ_2020.hdf5" (mode r+)>
/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/io/stationxml/core.py:96: UserWarning: The StationXML file has version 1.2, ObsPy can read versions (1.0, 1.1). Proceed with caution.
warnings.warn("The StationXML file has version %s, ObsPy can "
24-02-09 13:26:24 - data_quality_control.base.NSCProcessor - INFO - Processing GR.BFO..BHZ
24-02-09 13:26:24 - data_quality_control.base.NSCProcessor - DEBUG - 2020-12-31T23:59:00.000000Z - 2021-01-02T00:01:00.000000Z
/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/signal/filter.py:67: UserWarning: Selected high corner frequency (14) of bandpass is at or above Nyquist (10.0). Applying a high-pass instead.
warnings.warn(msg)
24-02-09 13:26:24 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-01T23:59:00.000000Z - 2021-01-03T00:01:00.000000Z
24-02-09 13:26:24 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-02T23:59:00.000000Z - 2021-01-04T00:01:00.000000Z
24-02-09 13:26:25 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-03T23:59:00.000000Z - 2021-01-05T00:01:00.000000Z
24-02-09 13:26:25 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-04T23:59:00.000000Z - 2021-01-06T00:01:00.000000Z
24-02-09 13:26:25 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-05T23:59:00.000000Z - 2021-01-07T00:01:00.000000Z
24-02-09 13:26:25 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-06T23:59:00.000000Z - 2021-01-08T00:01:00.000000Z
24-02-09 13:26:25 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-07T23:59:00.000000Z - 2021-01-09T00:01:00.000000Z
24-02-09 13:26:25 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-08T23:59:00.000000Z - 2021-01-10T00:01:00.000000Z
24-02-09 13:26:25 - data_quality_control.util - INFO - Found nans in GR.BFO..BHZ | 2021-01-08T23:59:00.019538Z - 2021-01-10T00:00:59.969538Z | 20.0 Hz, 1730400 samples
/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/numpy/lib/nanfunctions.py:1395: RuntimeWarning: All-NaN slice encountered
result = np.apply_along_axis(_nanquantile_1d, axis, a, q,
24-02-09 13:26:26 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-09T23:59:00.000000Z - 2021-01-11T00:01:00.000000Z
24-02-09 13:26:26 - data_quality_control.util - INFO - Found nans in GR.BFO..BHZ | 2021-01-09T23:59:00.019538Z - 2021-01-11T00:00:59.969538Z | 20.0 Hz, 1730400 samples
24-02-09 13:26:26 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-10T23:59:00.000000Z - 2021-01-12T00:01:00.000000Z
24-02-09 13:26:26 - data_quality_control.base.NSCProcessor - ERROR - No data for 2021-01-10T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:26 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-11T23:59:00.000000Z - 2021-01-13T00:01:00.000000Z
24-02-09 13:26:26 - data_quality_control.base.NSCProcessor - ERROR - No data for 2021-01-11T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:26 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-12T23:59:00.000000Z - 2021-01-14T00:01:00.000000Z
24-02-09 13:26:26 - data_quality_control.base.NSCProcessor - ERROR - No data for 2021-01-12T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:26 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-13T23:59:00.000000Z - 2021-01-15T00:01:00.000000Z
24-02-09 13:26:26 - data_quality_control.base.NSCProcessor - ERROR - No data for 2021-01-13T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:26 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-14T23:59:00.000000Z - 2021-01-16T00:01:00.000000Z
24-02-09 13:26:26 - data_quality_control.base.NSCProcessor - ERROR - No data for 2021-01-14T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:26 - data_quality_control.base.GenericProcessor - DEBUG - output time range: 2021-01-01T00:00:00.000000Z - 2021-01-16T00:00:00.000000Z
24-02-09 13:26:26 - data_quality_control.base.BaseProcessedData - DEBUG - Shapes before trim_nan: (360,), (360, 1025)
24-02-09 13:26:26 - data_quality_control.base.BaseProcessedData - INFO - Start/end time set to: 2021-01-01T00:00:00.000000Z - 2021-01-09T23:00:00.000000Z
24-02-09 13:26:26 - data_quality_control.base.BaseProcessedData - DEBUG - Removed 0 samples from beginning and 145 samples from end
24-02-09 13:26:26 - data_quality_control.base.BaseProcessedData - DEBUG - Shapes after trim_nan: (215,), (215, 1025)
24-02-09 13:26:26 - data_quality_control.base.GenericProcessor - DEBUG - output time range: 2021-01-01T00:00:00.000000Z - 2021-01-09T23:00:00.000000Z
24-02-09 13:26:26 - data_quality_control.base.ProcessedDataFileManager - DEBUG - Starttime of file: 2021-01-01T00:00:00.000000Z
24-02-09 13:26:26 - data_quality_control.base.ProcessedDataFileManager - DEBUG - Allocated start/endtime: 2021-01-01T00:00:00.000000Z - 2022-01-01T00:00:00.000000Z
24-02-09 13:26:26 - data_quality_control.base.ProcessedDataFileManager - INFO - Creating output file output/missing_file_handling/GR.BFO..BHZ_2021.hdf5
24-02-09 13:26:26 - data_quality_control.base.ProcessedDataFileManager - INFO - Starttime=2021-01-01T00:00:00.000000Z, endtime=2022-01-01T00:00:00.000000Z, n_windows=8760
24-02-09 13:26:26 - data_quality_control.base.BaseProcessedData - DEBUG - Params of data inserted to file:
24-02-09 13:26:26 - data_quality_control.base.BaseProcessedData - DEBUG - starttime: 2021-01-01T00:00:00.000000Z
24-02-09 13:26:26 - data_quality_control.base.BaseProcessedData - DEBUG - endtime: 2021-01-09T23:00:00.000000Z
24-02-09 13:26:26 - data_quality_control.base.BaseProcessedData - DEBUG - Amplitude matrix shape: 215
24-02-09 13:26:26 - data_quality_control.base.BaseProcessedData - DEBUG - Target shape: (215)
24-02-09 13:26:26 - data_quality_control.base.BaseProcessedData - DEBUG - Total shape of target 8760
24-02-09 13:26:26 - data_quality_control.base.BaseProcessedData - DEBUG - Targeted index range 0:215
24-02-09 13:26:26 - data_quality_control.base.GenericProcessor - INFO - Finished. Took 0:00:04.526947 h
<HDF5 file "GR.BFO..BHZ_2021.hdf5" (mode r+)>
CPU times: user 2.58 s, sys: 476 ms, total: 3.05 s
Wall time: 4.53 s
[15]:
%ls -nhG output/missing_file_handling/
total 69M
-rw-r--r-- 1 1005 2.4K Feb 9 13:26 GR.BFO..BHZ_2018.hdf5
-rw-r--r-- 1 1005 2.4K Feb 9 13:26 GR.BFO..BHZ_2019.hdf5
-rw-r--r-- 1 1005 35M Feb 9 13:26 GR.BFO..BHZ_2020.hdf5
-rw-r--r-- 1 1005 35M Feb 9 13:26 GR.BFO..BHZ_2021.hdf5
By default, the output file expects to receive one year of data, and the arrays are allocated accordingly. However, most of the entries are NaNs; only the last days of 2020 and the first days of 2021 contain data.
[16]:
dat1 = base.BaseProcessedData()
dat1.from_file(outdir.joinpath('GR.BFO..BHZ_2020.hdf5'))
print(dat1.amplitudes.shape)
print(dat1.psds.shape)
24-02-09 13:26:26 - data_quality_control.base.BaseProcessedData - INFO - Start/end time set to: None - None
24-02-09 13:26:26 - data_quality_control.base.BaseProcessedData - INFO - Reading file output/missing_file_handling/GR.BFO..BHZ_2020.hdf5
(8784,)
(8784, 1025)
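The shapes printed above can be reproduced from the parameters set at the top of the notebook. The following is a minimal sketch, not the package's own allocation code: `expected_shapes` is a hypothetical helper, and it assumes that a file covers one calendar year of windows of length `winlen_in_s` and that the PSD is a one-sided spectrum of `nperseg` points.

```python
import calendar


def expected_shapes(year, winlen_in_s=3600, nperseg=2048):
    """Return (n_windows, n_freqs) for a file covering one calendar year.

    Hypothetical helper for illustration only.
    """
    n_days = 366 if calendar.isleap(year) else 365
    # Number of processing windows that fit into the year
    n_windows = n_days * 24 * 3600 // winlen_in_s
    # A one-sided spectrum of an nperseg-point segment has nperseg // 2 + 1 bins
    n_freqs = nperseg // 2 + 1
    return n_windows, n_freqs


print(expected_shapes(2020))  # (8784, 1025)
print(expected_shapes(2021))  # (8760, 1025)
```

2020 is a leap year, which explains the 8784 hourly windows here compared with 8760 for 2021.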
[17]:
_dat = dat1
A = _dat.reshape_amps_to_days()
fig, axs = plt.subplots(1, 2, gridspec_kw=dict(wspace=0.5))
fig.suptitle("2020")
datalabels = ['whole year', 'available period']
ticks = np.arange(0, len(A.T))
ticklabels = [l.date for l in
              np.arange(_dat.startdate, _dat.enddate+24*3600, 24*3600)]
for i, datalabel in enumerate(datalabels):
    ax = axs[i]
    ax.set_title(datalabel)
    cax = ax.imshow(A.T, aspect='auto')
    ax.set_xlabel('hours');
    if i==0:
        ax.set_yticks(ticks[::30])
        ax.set_yticklabels(labels=ticklabels[::30]);
    elif i==1:
        ax.set_yticks(ticks)
        ax.set_yticklabels(labels=ticklabels);
        ax.set_ylim(366, 355)
24-02-09 13:26:26 - data_quality_control.base.BaseProcessedData - DEBUG - Extening front by 0, back by 24 samples.
[18]:
dat2 = base.BaseProcessedData()
dat2.from_file(outdir.joinpath('GR.BFO..BHZ_2021.hdf5'))
print(dat2.amplitudes.shape)
print(dat2.psds.shape)
24-02-09 13:26:27 - data_quality_control.base.BaseProcessedData - INFO - Start/end time set to: None - None
24-02-09 13:26:27 - data_quality_control.base.BaseProcessedData - INFO - Reading file output/missing_file_handling/GR.BFO..BHZ_2021.hdf5
(8760,)
(8760, 1025)
[19]:
_dat = dat2
A = _dat.reshape_amps_to_days()
fig, axs = plt.subplots(1, 2, gridspec_kw=dict(wspace=0.5))
fig.suptitle("2021")  # dat2 holds the 2021 file
datalabels = ['whole year', 'available period']
ticks = np.arange(0, len(A.T))
ticklabels = [l.date for l in
              np.arange(_dat.startdate, _dat.enddate+24*3600, 24*3600)]
for i, datalabel in enumerate(datalabels):
    ax = axs[i]
    ax.set_title(datalabel)
    cax = ax.imshow(A.T, aspect='auto')
    ax.set_xlabel('hours');
    if i==0:
        ax.set_yticks(ticks[::30])
        ax.set_yticklabels(labels=ticklabels[::30]);
    elif i==1:
        ax.set_yticks(ticks)
        ax.set_yticklabels(labels=ticklabels);
        ax.set_ylim(15, -1)
24-02-09 13:26:27 - data_quality_control.base.BaseProcessedData - DEBUG - Extening front by 0, back by 24 samples.
Missing files within requested time
We remove two days of data (4-5 January 2021) from the database. Then we request data for 2021-01-01 to 2021-01-12.
[20]:
%mv ../sample_sds/2021/GR/BFO/BHZ.D/GR.BFO..BHZ.D.2021.00[45]* .
[21]:
%ls ../sample_sds/*/*/*/*
../sample_sds/2020/GR/BFO/BHZ.D:
GR.BFO..BHZ.D.2020.360 GR.BFO..BHZ.D.2020.363 GR.BFO..BHZ.D.2020.366
GR.BFO..BHZ.D.2020.361 GR.BFO..BHZ.D.2020.364
GR.BFO..BHZ.D.2020.362 GR.BFO..BHZ.D.2020.365
../sample_sds/2021/GR/BFO/BHZ.D:
GR.BFO..BHZ.D.2021.001 GR.BFO..BHZ.D.2021.006 GR.BFO..BHZ.D.2021.009
GR.BFO..BHZ.D.2021.002 GR.BFO..BHZ.D.2021.007
GR.BFO..BHZ.D.2021.003 GR.BFO..BHZ.D.2021.008
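The file names in the listing end in the year and the three-digit day of the year, which is why the pattern `2021.00[45]` matches 4 and 5 January 2021. A minimal sketch of that mapping follows; `sds_filename` is a hypothetical helper written for this illustration and is not part of ObsPy or data_quality_control.

```python
from datetime import date


def sds_filename(nscl_code, day, data_type="D"):
    """Build a file name of the form shown in the listing for one day.

    Hypothetical helper: nscl_code is e.g. "GR.BFO..BHZ".
    """
    doy = day.timetuple().tm_yday  # day of year, 1-based
    return f"{nscl_code}.{data_type}.{day.year:04d}.{doy:03d}"


print(sds_filename("GR.BFO..BHZ", date(2021, 1, 4)))    # GR.BFO..BHZ.D.2021.004
print(sds_filename("GR.BFO..BHZ", date(2020, 12, 31)))  # GR.BFO..BHZ.D.2020.366
```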
We also keep a backup of the first version of the 2021 file because the next step will overwrite the existing one.
[22]:
%cp output/missing_file_handling/GR.BFO..BHZ_2021.hdf5 output/missing_file_handling/GR.BFO..BHZ_2021_bak.hdf5
[23]:
startdate = UTC("2021-01-01")
enddate = UTC("2021-01-12")
[24]:
processor.process(startdate, enddate, force_new_file=True)
24-02-09 13:26:30 - data_quality_control.base.ProcessedDataFileManager - INFO - Processed data stored per year in output/missing_file_handling as {outdir}/{network}.{station}.{location}.{channel}_{year:04d}.hdf5
24-02-09 13:26:30 - data_quality_control.util - DEBUG - Iterator reset enddate to 2021-01-12T00:00:00.000000Z
24-02-09 13:26:30 - data_quality_control.util - DEBUG - Time iterator yields 2021-01-01T00:00:00.000000Z - 2021-01-12T00:00:00.000000Z
24-02-09 13:26:30 - data_quality_control.base.GenericProcessor - DEBUG - processing 2021-01-01T00:00:00.000000Z - 2021-01-12T00:00:00.000000Z
24-02-09 13:26:30 - data_quality_control.base.GenericProcessor - DEBUG - Times passed to nscprocessor: 2021-01-01T00:00:00.000000Z - 2021-01-12T00:00:00.000000Z
24-02-09 13:26:30 - data_quality_control.base.BaseProcessedData - INFO - Start/end time set to: 2021-01-01T00:00:00.000000Z - 2021-01-13T00:00:00.000000Z
/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/io/stationxml/core.py:96: UserWarning: The StationXML file has version 1.2, ObsPy can read versions (1.0, 1.1). Proceed with caution.
warnings.warn("The StationXML file has version %s, ObsPy can "
24-02-09 13:26:31 - data_quality_control.base.NSCProcessor - INFO - Processing GR.BFO..BHZ
24-02-09 13:26:31 - data_quality_control.base.NSCProcessor - DEBUG - 2020-12-31T23:59:00.000000Z - 2021-01-02T00:01:00.000000Z
/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/signal/filter.py:67: UserWarning: Selected high corner frequency (14) of bandpass is at or above Nyquist (10.0). Applying a high-pass instead.
warnings.warn(msg)
24-02-09 13:26:31 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-01T23:59:00.000000Z - 2021-01-03T00:01:00.000000Z
24-02-09 13:26:31 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-02T23:59:00.000000Z - 2021-01-04T00:01:00.000000Z
24-02-09 13:26:31 - data_quality_control.util - INFO - Found nans in GR.BFO..BHZ | 2021-01-02T23:59:00.019538Z - 2021-01-04T00:00:59.969538Z | 20.0 Hz, 1730400 samples
/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/numpy/lib/nanfunctions.py:1395: RuntimeWarning: All-NaN slice encountered
result = np.apply_along_axis(_nanquantile_1d, axis, a, q,
24-02-09 13:26:31 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-03T23:59:00.000000Z - 2021-01-05T00:01:00.000000Z
24-02-09 13:26:31 - data_quality_control.util - INFO - Found nans in GR.BFO..BHZ | 2021-01-03T23:59:00.019538Z - 2021-01-05T00:00:59.969538Z | 20.0 Hz, 1730400 samples
24-02-09 13:26:31 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-04T23:59:00.000000Z - 2021-01-06T00:01:00.000000Z
24-02-09 13:26:31 - data_quality_control.util - INFO - Found nans in GR.BFO..BHZ | 2021-01-04T23:59:00.019538Z - 2021-01-06T00:00:59.969538Z | 20.0 Hz, 1730400 samples
24-02-09 13:26:31 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-05T23:59:00.000000Z - 2021-01-07T00:01:00.000000Z
24-02-09 13:26:32 - data_quality_control.util - INFO - Found nans in GR.BFO..BHZ | 2021-01-05T23:59:00.019538Z - 2021-01-07T00:00:59.969538Z | 20.0 Hz, 1730400 samples
24-02-09 13:26:32 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-06T23:59:00.000000Z - 2021-01-08T00:01:00.000000Z
24-02-09 13:26:32 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-07T23:59:00.000000Z - 2021-01-09T00:01:00.000000Z
24-02-09 13:26:32 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-08T23:59:00.000000Z - 2021-01-10T00:01:00.000000Z
24-02-09 13:26:32 - data_quality_control.util - INFO - Found nans in GR.BFO..BHZ | 2021-01-08T23:59:00.019538Z - 2021-01-10T00:00:59.969538Z | 20.0 Hz, 1730400 samples
24-02-09 13:26:32 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-09T23:59:00.000000Z - 2021-01-11T00:01:00.000000Z
24-02-09 13:26:32 - data_quality_control.util - INFO - Found nans in GR.BFO..BHZ | 2021-01-09T23:59:00.019538Z - 2021-01-11T00:00:59.969538Z | 20.0 Hz, 1730400 samples
24-02-09 13:26:32 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-10T23:59:00.000000Z - 2021-01-12T00:01:00.000000Z
24-02-09 13:26:32 - data_quality_control.base.NSCProcessor - ERROR - No data for 2021-01-10T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:32 - data_quality_control.base.NSCProcessor - DEBUG - 2021-01-11T23:59:00.000000Z - 2021-01-13T00:01:00.000000Z
24-02-09 13:26:32 - data_quality_control.base.NSCProcessor - ERROR - No data for 2021-01-11T23:59:00.000000Z
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/base.py", line 534, in process
tr = preprocessing(st, inv, starttime, endtime,
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/data_quality_control/util.py", line 124, in process_stream
tr = st[0]
File "/home/docs/checkouts/readthedocs.org/user_builds/lonesam/conda/latest/lib/python3.9/site-packages/obspy/core/stream.py", line 649, in __getitem__
return self.traces.__getitem__(index)
IndexError: list index out of range
24-02-09 13:26:32 - data_quality_control.base.GenericProcessor - DEBUG - output time range: 2021-01-01T00:00:00.000000Z - 2021-01-13T00:00:00.000000Z
24-02-09 13:26:32 - data_quality_control.base.BaseProcessedData - DEBUG - Shapes before trim_nan: (288,), (288, 1025)
24-02-09 13:26:32 - data_quality_control.base.BaseProcessedData - INFO - Start/end time set to: 2021-01-01T00:00:00.000000Z - 2021-01-09T23:00:00.000000Z
24-02-09 13:26:32 - data_quality_control.base.BaseProcessedData - DEBUG - Removed 0 samples from beginning and 73 samples from end
24-02-09 13:26:32 - data_quality_control.base.BaseProcessedData - DEBUG - Shapes after trim_nan: (215,), (215, 1025)
24-02-09 13:26:32 - data_quality_control.base.GenericProcessor - DEBUG - output time range: 2021-01-01T00:00:00.000000Z - 2021-01-09T23:00:00.000000Z
24-02-09 13:26:32 - data_quality_control.base.ProcessedDataFileManager - DEBUG - Starttime of file: 2021-01-01T00:00:00.000000Z
24-02-09 13:26:32 - data_quality_control.base.ProcessedDataFileManager - DEBUG - Allocated start/endtime: 2021-01-01T00:00:00.000000Z - 2022-01-01T00:00:00.000000Z
24-02-09 13:26:32 - data_quality_control.base.ProcessedDataFileManager - INFO - Creating output file output/missing_file_handling/GR.BFO..BHZ_2021.hdf5
24-02-09 13:26:32 - data_quality_control.base.ProcessedDataFileManager - INFO - Starttime=2021-01-01T00:00:00.000000Z, endtime=2022-01-01T00:00:00.000000Z, n_windows=8760
24-02-09 13:26:32 - data_quality_control.base.BaseProcessedData - DEBUG - Params of data inserted to file:
24-02-09 13:26:32 - data_quality_control.base.BaseProcessedData - DEBUG - starttime: 2021-01-01T00:00:00.000000Z
24-02-09 13:26:32 - data_quality_control.base.BaseProcessedData - DEBUG - endtime: 2021-01-09T23:00:00.000000Z
24-02-09 13:26:32 - data_quality_control.base.BaseProcessedData - DEBUG - Amplitude matrix shape: 215
24-02-09 13:26:32 - data_quality_control.base.BaseProcessedData - DEBUG - Target shape: (215)
24-02-09 13:26:32 - data_quality_control.base.BaseProcessedData - DEBUG - Total shape of target 8760
24-02-09 13:26:32 - data_quality_control.base.BaseProcessedData - DEBUG - Targeted index range 0:215
24-02-09 13:26:32 - data_quality_control.base.GenericProcessor - INFO - Finished. Took 0:00:02.481420 h
<HDF5 file "GR.BFO..BHZ_2021.hdf5" (mode r+)>
[25]:
%ls -nhG output/missing_file_handling/
total 103M
-rw-r--r-- 1 1005 2.4K Feb 9 13:26 GR.BFO..BHZ_2018.hdf5
-rw-r--r-- 1 1005 2.4K Feb 9 13:26 GR.BFO..BHZ_2019.hdf5
-rw-r--r-- 1 1005 35M Feb 9 13:26 GR.BFO..BHZ_2020.hdf5
-rw-r--r-- 1 1005 35M Feb 9 13:26 GR.BFO..BHZ_2021.hdf5
-rw-r--r-- 1 1005 35M Feb 9 13:26 GR.BFO..BHZ_2021_bak.hdf5
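The processing log above reports shapes of (288,) before trim_nan and (215,) after it, with 0 samples removed from the beginning and 73 from the end. The sketch below illustrates that kind of edge trimming on a stand-in array. It is an assumption about the behaviour, not the package's implementation, and `trim_nan_edges` is a hypothetical name.

```python
import numpy as np


def trim_nan_edges(amplitudes):
    """Drop leading and trailing NaNs, keep NaNs in the interior.

    Returns the trimmed array and the number of samples removed
    at the front and at the back.
    """
    valid = np.flatnonzero(~np.isnan(amplitudes))
    if valid.size == 0:
        # Nothing but NaNs: everything is removed, counted as trailing
        return amplitudes[:0], 0, int(amplitudes.size)
    first, last = int(valid[0]), int(valid[-1])
    n_front = first
    n_back = int(amplitudes.size) - 1 - last
    return amplitudes[first:last + 1], n_front, n_back


# Stand-in for 12 requested days of hourly amplitudes
amps = np.ones(288)
amps[72:120] = np.nan  # two missing days in the interior are kept
amps[215:] = np.nan    # no data from the last hour of day 9 onwards
trimmed, n_front, n_back = trim_nan_edges(amps)
print(trimmed.shape, n_front, n_back)  # (215,) 0 73
```

Only the edges are trimmed, so gaps inside the available range remain in the output as NaNs.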
Check that the file contains the correct number of days (365 days of hourly windows, i.e. 8760 entries).
[26]:
dat3 = base.BaseProcessedData()
dat3.from_file(outdir.joinpath('GR.BFO..BHZ_2021.hdf5'))
print(dat3.amplitudes.shape)
24-02-09 13:26:33 - data_quality_control.base.BaseProcessedData - INFO - Start/end time set to: None - None
24-02-09 13:26:33 - data_quality_control.base.BaseProcessedData - INFO - Reading file output/missing_file_handling/GR.BFO..BHZ_2021.hdf5
(8760,)
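The shape only confirms the allocated size of the array, not how many days actually hold data. A minimal sketch for counting the hours with data per day is given below. It runs on a synthetic stand-in array; with the real file one would pass the amplitudes array instead. `hours_with_data_per_day` is a hypothetical helper and assumes 24 hourly windows per day.

```python
import numpy as np


def hours_with_data_per_day(amplitudes, windows_per_day=24):
    """Count the non-NaN windows of each day in a flat amplitude array."""
    days = amplitudes.reshape(-1, windows_per_day)
    return np.sum(~np.isnan(days), axis=1)


# Synthetic stand-in for a year-long array that is mostly NaN
amps = np.full(365 * 24, np.nan)
amps[:215] = 1.0       # data up to the second-to-last hour of day 9
amps[72:120] = np.nan  # days 4 and 5 are missing

counts = hours_with_data_per_day(amps)
print(counts[:10].tolist())           # [24, 24, 24, 0, 0, 24, 24, 24, 23, 0]
print(int(np.count_nonzero(counts)))  # 7
```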
Now we plot the amplitude arrays for the contiguous case (where no files were missing in the database) and for the case where we removed two days. The two missing days appear as NaNs (white color).
Note that we also get NaNs at the edges around the data gap because there is no data to create the overlap between the files.
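The edge effect can be read off the DEBUG lines of the log, where each processed day is requested with one extra minute on either side. The sketch below rebuilds those padded windows under the assumption that the padding equals the `overlap` setting (60 s) and that one window is requested per `proclen` (24 h). `padded_day_windows` is a hypothetical helper, not the package's iterator, and it treats the end date as exclusive for simplicity.

```python
from datetime import datetime, timedelta


def padded_day_windows(startdate, enddate, proclen=24 * 3600, overlap=60):
    """Return (start, end) request windows, each padded by `overlap` seconds."""
    windows = []
    t = startdate
    while t < enddate:
        windows.append((t - timedelta(seconds=overlap),
                        t + timedelta(seconds=proclen + overlap)))
        t += timedelta(seconds=proclen)
    return windows


for w0, w1 in padded_day_windows(datetime(2021, 1, 3), datetime(2021, 1, 6)):
    print(w0.isoformat(), "-", w1.isoformat())
# 2021-01-02T23:59:00 - 2021-01-04T00:01:00
# 2021-01-03T23:59:00 - 2021-01-05T00:01:00
# 2021-01-04T23:59:00 - 2021-01-06T00:01:00
```

The window for 3 January reaches one minute into 4 January, whose file was removed, so the padding cannot be filled and the hour at that edge ends up as NaN.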
We get the old data from the backup file.
[27]:
dat2 = base.BaseProcessedData()
dat2.from_file(outdir.joinpath('GR.BFO..BHZ_2021_bak.hdf5'))
24-02-09 13:26:33 - data_quality_control.base.BaseProcessedData - INFO - Start/end time set to: None - None
24-02-09 13:26:33 - data_quality_control.base.BaseProcessedData - INFO - Reading file output/missing_file_handling/GR.BFO..BHZ_2021_bak.hdf5
[27]:
Results for GR.BFO..BHZ
--------------------
Starttime: 2021-01-01T00:00:00.000000Z
Enddate: 2022-01-01T00:00:00.000000Z
N windows: 8760
Amplitude shape = (8760,)
PSD shape = (8760, 1025)
Seconds per window = 3600
Amplitude for 4 - 14 Hz
[28]:
fig, axs = plt.subplots(1, 2, sharey=True)
datalabels = ['contiguous database', 'missing files']
for i, (datalabel, _dat) in enumerate(zip(datalabels,[dat2, dat3])):
    A = _dat.reshape_amps_to_days().T
    ax = axs[i]
    ax.set_title(datalabel)
    cax = ax.imshow(A, aspect='auto')
    labels = [l.date for l in
              np.arange(_dat.startdate, _dat.enddate+24*3600, 24*3600)]
    ax.set_yticks(np.arange(len(A)))
    ax.set_yticklabels(labels=labels);
    ax.set_ylim(0, 15)
    ax.set_xlabel('hours');
24-02-09 13:26:33 - data_quality_control.base.BaseProcessedData - DEBUG - Extening front by 0, back by 24 samples.
24-02-09 13:26:34 - data_quality_control.base.BaseProcessedData - DEBUG - Extening front by 0, back by 24 samples.
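(The y-axis is reversed compared to the previous plots because this cell calls ax.set_ylim(0, 15), which puts the first day at the bottom, whereas the earlier cells pass the larger limit first.)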
Let’s put those files back.
[29]:
%mv GR.BFO..BHZ.D.2021.00[45] ../sample_sds/2021/GR/BFO/BHZ.D/
[30]:
%ls ../sample_sds/*/*/*/*
../sample_sds/2020/GR/BFO/BHZ.D:
GR.BFO..BHZ.D.2020.360 GR.BFO..BHZ.D.2020.363 GR.BFO..BHZ.D.2020.366
GR.BFO..BHZ.D.2020.361 GR.BFO..BHZ.D.2020.364
GR.BFO..BHZ.D.2020.362 GR.BFO..BHZ.D.2020.365
../sample_sds/2021/GR/BFO/BHZ.D:
GR.BFO..BHZ.D.2021.001 GR.BFO..BHZ.D.2021.004 GR.BFO..BHZ.D.2021.007
GR.BFO..BHZ.D.2021.002 GR.BFO..BHZ.D.2021.005 GR.BFO..BHZ.D.2021.008
GR.BFO..BHZ.D.2021.003 GR.BFO..BHZ.D.2021.006 GR.BFO..BHZ.D.2021.009