Download and read raw data

Download

Before you download data, make sure that you specify the path to the raw data directory (rawdir) in ninolearn.pathes.

In this tutorial, we download the monthly Oceanic Niño Index (ONI), the Warm Water Volume (WWV), the Dipole Mode Index (DMI) of the Indian Ocean Dipole (IOD), sea surface temperatures (SST) from the ERSSTv5 data set, and surface air temperatures (SAT) from the NCEP data set.

[1]:
# import the necessary methods and classes
from ninolearn.download import download, sources

NOTE: If the data was already downloaded, it won’t be downloaded again.

[8]:
download(sources.SST_ERSSTv5)
download(sources.ONI)
download(sources.IOD)
download(sources.WWV)
download(sources.SAT_monthly_NCEP)
sst.mnmean.nc already downloaded
oni.txt already downloaded
iod.txt already downloaded
wwv.dat already downloaded
Download air.mon.mean.nc
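
The skip-if-present behavior noted above can be illustrated with a minimal sketch. It is not ninolearn's implementation: the function name download_once and the fetch callback are hypothetical and only show the idea of checking for the file before transferring it.

```python
from pathlib import Path
from typing import Callable


def download_once(filename: str, target_dir: Path,
                  fetch: Callable[[Path], None]) -> bool:
    """Fetch `filename` into `target_dir` unless it is already there.

    `fetch` is a hypothetical callback that writes the file to the given
    path (for example an FTP or HTTP transfer). Returns True if `fetch`
    was called and False if the download was skipped.
    """
    target = target_dir / filename
    if target.exists():
        # Nothing to do: the file is already in the raw data directory.
        print(f"{filename} already downloaded")
        return False
    print(f"Download {filename}")
    target_dir.mkdir(parents=True, exist_ok=True)
    fetch(target)
    return True
```

Calling download_once twice with the same file name triggers the transfer only on the first call, which matches the messages printed in the cell above.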

Each source is a dictionary whose keys specify how the download is performed:

[3]:
print(sources.SST_ERSSTv5)
{'downloadType': 'ftp', 'filename': 'sst.mnmean.nc', 'host': 'ftp.cdc.noaa.gov', 'location': '/Datasets/noaa.ersst.v5/'}
[4]:
print(sources.ONI)
{'downloadType': 'http', 'url': 'https://www.cpc.ncep.noaa.gov/data/indices/oni.ascii.txt', 'filename': 'oni.txt'}

You can see that the two sources above have different downloadType entries. The SST is downloaded from an FTP server, whereas the ONI is downloaded via HTTP.
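
To make the role of the downloadType key concrete, here is a small sketch that turns each kind of source dictionary into the address it points to. The helper describe_source is hypothetical and not part of ninolearn; the two dictionaries are copied from the printed output above.

```python
def describe_source(source: dict) -> str:
    """Return the address a source dictionary points to.

    Hypothetical helper for illustration: it only handles the two
    downloadType values shown in this tutorial.
    """
    kind = source["downloadType"]
    if kind == "ftp":
        # FTP sources are described by host, directory and file name.
        return f"ftp://{source['host']}{source['location']}{source['filename']}"
    if kind == "http":
        # HTTP sources carry the complete URL.
        return source["url"]
    raise ValueError(f"unknown downloadType: {kind!r}")


# Dictionaries as printed by sources.SST_ERSSTv5 and sources.ONI above.
SST_ERSSTv5 = {'downloadType': 'ftp', 'filename': 'sst.mnmean.nc',
               'host': 'ftp.cdc.noaa.gov', 'location': '/Datasets/noaa.ersst.v5/'}
ONI = {'downloadType': 'http',
       'url': 'https://www.cpc.ncep.noaa.gov/data/indices/oni.ascii.txt',
       'filename': 'oni.txt'}

print(describe_source(SST_ERSSTv5))
print(describe_source(ONI))
```

Branching on a single key keeps every source self-describing: a new data set only needs a dictionary with the keys its download type expects.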

Read raw data

The ninolearn.IO.read_raw module provides routines that read the raw data directly, without any further processing.

[10]:
from ninolearn.IO.read_raw import oni, sst_ERSSTv5, sat, wwv_anom, iod

ONI = oni()
SST = sst_ERSSTv5()
SAT = sat(mean='monthly')
WWV = wwv_anom()

Let’s have a look at what the raw data looks like!

[11]:
ONI.head()
[11]:
  SEAS    YR  TOTAL  ANOM
0  DJF  1950  24.72 -1.53
1  JFM  1950  25.17 -1.34
2  FMA  1950  25.75 -1.16
3  MAM  1950  26.12 -1.18
4  AMJ  1950  26.32 -1.07
[5]:
WWV.head()
[5]:
     date        Volume       Anomaly
0  198001  2.605404e+15  7.657363e+13
1  198002  2.564434e+15  7.004931e+13
2  198003  2.514065e+15  5.240853e+13
3  198004  2.468250e+15  4.008869e+13
4  198005  2.439852e+15  4.020975e+13
[20]:
print(SST)
<xarray.DataArray 'sst' (time: 1982, lat: 89, lon: 180)>
[31751640 values with dtype=float32]
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * time     (time) datetime64[ns] 1854-01-01 1854-02-01 ... 2019-02-01
Attributes:
    long_name:     Monthly Means of Sea Surface Temperature
    units:         degC
    var_desc:      Sea Surface Temperature
    level_desc:    Surface
    statistic:     Mean
    dataset:       ERSSTv5
    parent_stat:   Individual Values
    actual_range:  [-1.8     42.32636]
    valid_range:   [-1.8 45. ]
[21]:
print(SAT)
<xarray.DataArray 'air' (time: 854, lat: 73, lon: 144)>
[8977248 values with dtype=float32]
Coordinates:
  * lat      (lat) float32 90.0 87.5 85.0 82.5 80.0 ... -82.5 -85.0 -87.5 -90.0
  * lon      (lon) float32 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5
  * time     (time) datetime64[ns] 1948-01-01 1948-02-01 ... 2019-02-01
Attributes:
    long_name:     Monthly Mean Air Temperature at sigma level 0.995
    valid_range:   [-2000.  2000.]
    units:         degC
    precision:     1
    var_desc:      Air Temperature
    level_desc:    Surface
    statistic:     Mean
    parent_stat:   Individual Obs
    dataset:       NCEP
    actual_range:  [-73.78001  42.14595]

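As you can see, the data sets do not share a common time axis: they cover different time periods (for example, the SST record starts in 1854 and the SAT record in 1948), and the gridded fields are on different grids (2° spacing for SST, 2.5° for SAT).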
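
The two tables also encode time differently: the ONI uses a season label plus a year, while the WWV uses an integer of the form YYYYMM. The following sketch shows one way to map both onto month-start dates. The function names are hypothetical and this is not ninolearn's own processing. It assumes that the YR column refers to the year of the season's central month, so that DJF 1950 is centered on January 1950.

```python
from datetime import date

# Three-month season labels in calendar order of their central month.
SEASONS = ["DJF", "JFM", "FMA", "MAM", "AMJ", "MJJ",
           "JJA", "JAS", "ASO", "SON", "OND", "NDJ"]


def oni_to_date(season: str, year: int) -> date:
    """Map an ONI row (SEAS, YR) to the first day of the central month.

    Assumption: YR is the year of the central month of the season.
    """
    month = SEASONS.index(season) + 1
    return date(year, month, 1)


def wwv_to_date(yyyymm: int) -> date:
    """Map a WWV date such as 198001 to the first day of that month."""
    year, month = divmod(yyyymm, 100)
    return date(year, month, 1)


print(oni_to_date("DJF", 1950))
print(wwv_to_date(198001))
```

Once both indices are expressed as month-start dates, they can be compared with the monthly time coordinate of the gridded fields.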

To bring the different data sets into a common, standardized shape, check out the tutorials on preparing the data and postprocessing it.