Data preparation and read postprocessed data
Do you remember from the "Download and read raw data" tutorial that the ONI had a weird time axis? As a little reminder:
```python
from ninolearn.IO.read_raw import oni, wwv_anom

ONI = oni()
print(ONI.head())

WWV = wwv_anom()
print(WWV.head())
```

```
   SEAS    YR  TOTAL   ANOM
0   DJF  1950  24.72  -1.53
1   JFM  1950  25.17  -1.34
2   FMA  1950  25.75  -1.16
3   MAM  1950  26.12  -1.18
4   AMJ  1950  26.32  -1.07
     date        Volume       Anomaly
0  198001  2.605404e+15  7.657363e+13
1  198002  2.564434e+15  7.004931e+13
2  198003  2.514065e+15  5.240853e+13
3  198004  2.468250e+15  4.008869e+13
4  198005  2.439852e+15  4.020975e+13
```
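These time axes are difficult to work with: the ONI is indexed by a three-month season string (SEAS) plus a year (YR), and the WWV by integers of the form YYYYMM (date). For this reason, NinoLearn contains preparation methods that convert them into a proper monthly time axis for you.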
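As an illustration of what such a conversion involves, here is a minimal sketch for the WWV `date` column, assuming it holds integers of the form YYYYMM as printed above. The helper `yyyymm_to_date` is hypothetical, uses only the standard library, and is not NinoLearn's implementation.

```python
from datetime import date


def yyyymm_to_date(value):
    """Hypothetical helper: turn an integer like 198001 into the first day of that month."""
    year, month = divmod(int(value), 100)  # 198001 -> (1980, 1)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in {value!r}")
    return date(year, month, 1)


raw_dates = [198001, 198002, 198003]  # first entries of the WWV 'date' column above
print([yyyymm_to_date(d).isoformat() for d in raw_dates])
# ['1980-01-01', '1980-02-01', '1980-03-01']
```

Mapping every month to its first day matches the convention NinoLearn uses for all monthly data.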
```python
from ninolearn.preprocess.prepare import prep_oni, prep_wwv, prep_iod

# prepare the raw time series and write them to the postprocessed data directory
prep_oni()
prep_wwv()
prep_iod()
```

```
Prepare ONI timeseries.
Prepare WWV timeseries.
Prepare IOD timeseries.
```
All methods from the `preprocess` sub-package save the data directly to the data directory `postdir`, which you need to specify in `ninolearn.pathes`.
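For orientation, a minimal sketch of such a path setting. Only the name `postdir` is taken from the text above; the base directory `datadir`, its value and the subfolder name are hypothetical placeholders to be replaced with a location on your own machine.

```python
# Sketch of a path setting in ninolearn/pathes.py (hypothetical values)
from os.path import join

datadir = "/home/user/ninolearn_data"  # hypothetical base directory for all data
postdir = join(datadir, "postprocessed")  # directory the preparation methods write to
```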
Now, let's read this data using the data reader for postprocessed data.
```python
from ninolearn.IO.read_processed import data_reader

# restrict all reads to the period January 1980 to December 2017
reader = data_reader(startdate='1980-01', enddate='2017-12')

# read from the processed csv files and select the anomaly data (processed='anom')
oni_anom_postprocessed = reader.read_csv('oni', processed='anom')
print(oni_anom_postprocessed.head())

wwv_anom_postprocessed = reader.read_csv('wwv', processed='anom')
print(wwv_anom_postprocessed.head())

iod_anom_postprocessed = reader.read_csv('iod', processed='anom')
print(iod_anom_postprocessed.head())
```
```
time
1980-01-01    0.64
1980-02-01    0.59
1980-03-01    0.46
1980-04-01    0.34
1980-05-01    0.38
Name: anom, dtype: float64
time
1980-01-01    7.657363e+13
1980-02-01    7.004931e+13
1980-03-01    5.240853e+13
1980-04-01    4.008869e+13
1980-05-01    4.020975e+13
Name: anom, dtype: float64
time
1980-01-01    0.025
1980-02-01   -0.021
1980-03-01   -0.251
1980-04-01    0.103
1980-05-01    0.148
Name: anom, dtype: float64
```
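The `startdate` and `enddate` arguments restrict the returned series to a range of months. The following stand-in reader sketches that behaviour with the standard library only, under two assumptions: the processed file has a `time` column with first-of-month dates plus an `anom` column, and both bounds are inclusive. `read_monthly_csv` is hypothetical and not part of NinoLearn; the sample values are the first five ONI anomalies printed above.

```python
import csv
import io
from datetime import date


def read_monthly_csv(text, column, startdate, enddate):
    """Hypothetical reader: return {date: value} for rows with startdate <= time <= enddate."""
    start = date.fromisoformat(startdate + "-01")  # '1980-02' -> 1980-02-01
    end = date.fromisoformat(enddate + "-01")
    series = {}
    for row in csv.DictReader(io.StringIO(text)):
        t = date.fromisoformat(row["time"])
        if start <= t <= end:
            series[t] = float(row[column])
    return series


# Assumed file layout, filled with the ONI anomalies shown above
sample = """time,anom
1980-01-01,0.64
1980-02-01,0.59
1980-03-01,0.46
1980-04-01,0.34
1980-05-01,0.38
"""

oni = read_monthly_csv(sample, "anom", startdate="1980-02", enddate="1980-04")
for t, v in oni.items():
    print(t.isoformat(), v)
# 1980-02-01 0.59
# 1980-03-01 0.46
# 1980-04-01 0.34
```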
Now, the data comes in a clean format. Note that a seasonal value is assigned to the first day of the last month of its three-month season (e.g. JFM 1950 becomes 1950-03-01). This is because NinoLearn works exclusively with monthly data, and every monthly value is assigned to the first day of its month. Seasonal data is assigned to the last month of the season so that prediction schemes do NOT accidentally include data from future periods: a JFM mean contains March data, so labelling it with January would place March information at a January time step.
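The season-to-date rule can be sketched in a few lines. This is a minimal illustration, not NinoLearn's code, and it assumes that the `YR` column of the raw table gives the calendar year of the season's middle month, so that DJF 1950 spans Dec 1949 to Feb 1950 and NDJ 1950 spans Nov 1950 to Jan 1951. The helpers `season_start` and `season_to_date` are hypothetical.

```python
from datetime import date

MONTH_INITIALS = "JFMAMJJASOND"


def season_start(season):
    """Return the 0-based index of the first month of a season such as 'JFM'."""
    for start in range(12):
        window = "".join(MONTH_INITIALS[(start + k) % 12] for k in range(3))
        if window == season:
            return start
    raise ValueError(f"unknown season {season!r}")


def season_to_date(season, year):
    """Map (season, year) to the first day of the season's last month.

    Assumes `year` is the calendar year of the season's middle month.
    """
    start = season_start(season)
    # Month offsets counted from January of the year in which the season starts
    middle, last = start + 1, start + 2
    year_of_last = year + last // 12 - middle // 12
    return date(year_of_last, last % 12 + 1, 1)


for seas, yr in [("DJF", 1950), ("JFM", 1950), ("NDJ", 1950)]:
    print(seas, yr, "->", season_to_date(seas, yr).isoformat())
# DJF 1950 -> 1950-02-01
# JFM 1950 -> 1950-03-01
# NDJ 1950 -> 1951-01-01
```

Labelling each season with its last month means a value only appears at a time step where all three of its months have already occurred, which is exactly the leakage argument made above.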
Further preparation methods for other raw data sets are available in the `ninolearn.preprocess.prepare` module.