Reading and using the SLSTR MDB files ===================================== By design, the content of SLSTR matchups is very rich and it can be overwhelming. This primer is intended for support you in this (relatively) steep initial learning curve. We will start exploring and understand the content of MDB files with simple netCDF4 and numpy functions which have to be installed in your environment. basics ------ Let's first open a sample file, containing match-ups with CMEMS drifters: .. code:: ipython2 import netCDF4 as netcdf # open file mdbfile = netcdf.Dataset('/home/cercache/project/s3vt/2016/287/S3A_SL_2_WCT_cmems_drifter_20161013000000_20161013060000.nc') The file contains a lot of variables! The following examples will help you to understand and analyse the content of the match-up files. Further in this tutorial, more advanced function usage from ``s3mdbpackage`` are provided. The main file dimension is ``time`` : it gives the number of match-ups. Let's get the number of match-ups from our example file .. code:: ipython2 print len(mdbfile.dimensions['time']) .. parsed-literal:: 488 The time, lat and lon variables gives the locations and times of all our match-ups. They correspond to the locations and times of the closest in situ measurement to satellite pixels. These arrays have as many values as the number of match-ups: .. code:: ipython2 # get latitudes of match-ups lat = mdbfile.variables['lat'][:] # print number of lat values print len(lat) .. parsed-literal:: 488 .. code:: ipython2 # this is the latitude of matchup #3 in the file: print lat[3] .. parsed-literal:: 32.207 understanding the in situ section of the matchups ------------------------------------------------- The variables with in situ measurements will be prefixed with ``cmems_drifter__`` .. code:: ipython2 for var in mdbfile.variables: if 'cmems_drifter__' in var: print var .. parsed-literal:: cmems_drifter__origin cmems_drifter__dynamic_target_center_index cmems_drifter__lon cmems_drifter__quality_level_4 cmems_drifter__rejection_flag_4 cmems_drifter__quality_level_1 cmems_drifter__rejection_flag_3 cmems_drifter__quality_level_3 cmems_drifter__quality_level_2 cmems_drifter__meas_depth_4 cmems_drifter__rejection_flag_2 cmems_drifter__time cmems_drifter__meas_depth_0 cmems_drifter__meas_depth_1 cmems_drifter__meas_depth_2 cmems_drifter__meas_depth_3 cmems_drifter__rejection_flag_0 cmems_drifter__rejection_flag_1 cmems_drifter__lat cmems_drifter__water_temperature_3 cmems_drifter__water_temperature_2 cmems_drifter__water_temperature_1 cmems_drifter__water_temperature_0 cmems_drifter__water_temperature_4 cmems_drifter__quality_level_0 cmems_drifter__climatology_water_temperature cmems_drifter__solar_zenith_angle Most in situ variables have two dimensions: ``time`` and ``cmems_drifter___time`` .. code:: ipython2 print mdbfile['cmems_drifter__water_temperature_0'].dimensions .. parsed-literal:: (u'time', u'cmems_drifter___time') the reason for the the extra dimension ``cmems_drifter___time`` is that we provide the history of the buoy for each match-up in a time window of +/- 6 hours around the match-up time: .. code:: ipython2 # get the buoy temperature history of our match-up #3 temp = mdbfile['cmems_drifter__water_temperature_0'][3, :] # get the corresponding time of each temperature measurement times = mdbfile['cmems_drifter__time'][3, :] # because all buoys don't have the same sampling rate, the number of measurements in the buoy history # vary from one match-up to another. Therefore, the end of the history array is often trailed with # fill values. print times .. parsed-literal:: [1129148880 1129151580 1129151940 1129152120 1129154760 1129157460 1129158180 1129160280 1129160580 1129162680 1129163160 1129165920 1129166220 1129168560 1129171980 1129181640 1129187700 1129189380 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --] .. code:: ipython2 %matplotlib inline import numpy # Let's get rid of these trailing file values times = numpy.ma.fix_invalid(times).compressed() temp = numpy.ma.fix_invalid(temp).compressed() # transform these time values in datetime objects times = netcdf.num2date(times, mdbfile['cmems_drifter__time'].units) # display this temperature time series from matplotlib import pyplot fig = pyplot.figure(figsize=(10,5)) pyplot.plot(times, temp) pyplot.show() .. image:: output_16_0.png The closest measurement in time to satellite pixels in this time series, is given by the variable ``cmems_drifter__dynamic_target_center_index``. Here for our match-up #3: .. code:: ipython2 indice = mdbfile['cmems_drifter__dynamic_target_center_index'][3] print indice .. parsed-literal:: 13 .. code:: ipython2 # the closest temperature value is then: print mdbfile['cmems_drifter__water_temperature_0'][3, indice] .. parsed-literal:: 26.6499996185 .. code:: ipython2 # verify on above plot that is at the center of the history fig = pyplot.figure(figsize=(10,5)) pyplot.plot(times, temp) pyplot.plot(times[indice], temp[indice], color='r', marker="o") pyplot.show() .. image:: output_20_0.png Some variables have a ``_0``, ``_1``, ``_2``, ``_3``, ``_4`` suffix. They correspond to different depth levels, as the first 5 depth level measurement were retained for any in situ source : this is not the actual depth value, just the rank, 0 being the closest to surface and 4 the deepest. For drifters, there will be most likely only a single value in ``_0`` and fill values in other levels, except for the few who may measure the water temperature at different levels. .. code:: ipython2 # print closest temperature value to surface for match-up #3 print mdbfile.variables['cmems_drifter__water_temperature_0'][3, indice] .. parsed-literal:: 26.6499996185 .. code:: ipython2 # this drifter provides only a single depth measurement print mdbfile.variables['cmems_drifter__water_temperature_1'][3, indice] .. parsed-literal:: -- .. code:: ipython2 # note that the actual depth (when available at in situ provider) is provided in the corresponding depth field print mdbfile.variables['cmems_drifter__meas_depth_0'][3, indice] .. parsed-literal:: 0.0 The handy ``read_insitu_field`` function from s3analysis package allow to retrieve a multi-dimensional array instead of reading separately each level field. Just query the name of the field with the ``_`` suffix and set the ``closest_to_surface`` argument to False. See some examples further. .. code:: ipython2 # get full profiles in one single array closest_value = read_insitu_field(mdbfile, 'cmems_drifter__water_temperature', closest_to_surface=False) print closest_value.shape .. parsed-literal:: (488, 5) understanding the satellite part of the matchups ------------------------------------------------ The MDB is a multi-dataset match-up database: for each match-up, a full stack of fields from each of these datasets is provided (filled with fill values when they were not available or not in the match-up window). The fields from each dataset is prefixed with ``__``. For instance, to get the fields from S3A\_SL\_2\_WST product only: .. code:: ipython2 for var in mdbfile.variables: if 'S3A_SL_2_WST__' in var: print var .. parsed-literal:: S3A_SL_2_WST__dynamic_target_distance S3A_SL_2_WST__dynamic_target_center_index S3A_SL_2_WST__origin S3A_SL_2_WST__dynamic_target_time_difference S3A_SL_2_WST__time S3A_SL_2_WST__nadir_sst_theoretical_uncertainty S3A_SL_2_WST__sses_standard_deviation S3A_SL_2_WST__wind_speed_dtime_from_sst S3A_SL_2_WST__l2p_flags S3A_SL_2_WST__dt_analysis S3A_SL_2_WST__aerosol_dynamic_indicator S3A_SL_2_WST__dual_nadir_sst_difference S3A_SL_2_WST__lon S3A_SL_2_WST__sst_theoretical_uncertainty S3A_SL_2_WST__miniprod_content_mask S3A_SL_2_WST__lat S3A_SL_2_WST__wind_speed S3A_SL_2_WST__sea_ice_fraction_dtime_from_sst S3A_SL_2_WST__adi_dtime_from_sst S3A_SL_2_WST__brightness_temperature S3A_SL_2_WST__sea_ice_fraction S3A_SL_2_WST__nedt S3A_SL_2_WST__sses_bias S3A_SL_2_WST__quality_level S3A_SL_2_WST__sst_algorithm_type S3A_SL_2_WST__sea_surface_temperature S3A_SL_2_WST__satellite_zenith_angle Each satellite match-up is provided as box : the match-up pixel and its neighbours within a 21x21 pixel box). .. code:: ipython2 # display the content of S3A_SL_2_WST__wind_speed for matchup #3 sst = mdbfile.variables['S3A_SL_2_WST__wind_speed'][3, :, :] pyplot.imshow(sst, interpolation='nearest') pyplot.show() .. image:: output_30_0.png The central pixel of the box is the closest to the in situ measurement. .. code:: ipython2 # get the lat/lon of the box for S3A_SL_2_WST dataset, for our match-up #3 lat = mdbfile.variables['S3A_SL_2_WST__lat'][3, :, :] lon = mdbfile.variables['S3A_SL_2_WST__lon'][3, :, :] # display the location of all pixels from the box fig = pyplot.figure(figsize=(8,8)) pyplot.scatter(lat, lon, marker='+', color='b') # highlight the center pixel pyplot.plot(lat[10, 10], lon[10, 10], marker='o', color='r') # overlay the in situ centre pixel and verify it is the closest to the pixel center pyplot.plot(mdbfile.variables['lat'][3], mdbfile.variables['lon'][3], color='g', marker='o') # display pyplot.show() .. image:: output_32_0.png In some cases, the in situ measuement is not centred on the central pixel, this is because the in situ measurement is out of the footprint of input satellite file (granule). These shifted boxes can be easily spotted by using the distance between the central pixel and the in situ measurement, ``S3A_SL_2_WST__dynamic_target_distance`` : they will have a value greater than 1000m. .. code:: ipython2 # number of match-ups print len(mdbfile.variables['time']) .. parsed-literal:: 488 .. code:: ipython2 # print maximum match-up distance, in meters, found for these match-ups print mdbfile.variables['S3A_SL_2_WST__dynamic_target_distance'][:].max() .. parsed-literal:: 7635 .. code:: ipython2 import numpy # mask match-ups where the distance to satellite pixel is greater than 1km pixel_distance = mdbfile.variables['S3A_SL_2_WST__dynamic_target_distance'][:] times = mdbfile.variables['time'][:] times = numpy.ma.masked_where(pixel_distance > 1000., times) # number of remaining match-ups print times.count() .. parsed-literal:: 369 Similarly, the time difference between the satellite pixels and the closest in situ measurement in time, provided by ``S3A_SL_2_WST__dynamic_target_time_difference`` can be used as a filtering criteria (the colocation window is 2h for drifters and moored buoys, 12h for argo floats): .. code:: ipython2 time_difference = mdbfile.variables['S3A_SL_2_WST__dynamic_target_time_difference'][:] # select match-ups which time difference is lower than 1h times = mdbfile.variables['time'][:] times = numpy.ma.masked_where(time_difference > 3600, times) # number of remaining match-ups print times.count() .. parsed-literal:: 375 Many satellite match-ups are also actually cloudy and don't have a valid SST value. However there may be some neighbouring pixel in the box with a valid SST value. If it is close enough, you may want to keep it in your analysis. For instance, let's take this cloudy match-up: .. code:: ipython2 # Here is an example of cloudy match-up sst = mdbfile.variables['S3A_SL_2_WST__sea_surface_temperature'][15, :, :] #quality = mdbfile.variables['S3A_SL_2_WST__quality_level'][15, :, :] pyplot.imshow(sst, interpolation='nearest') pyplot.show() .. image:: output_40_0.png .. code:: ipython2 # print the central value => it is masked (invalid) print sst[10, 10] .. parsed-literal:: -- Note that for all other than SLSTR datasets in the match-ups, resampling on the SLSTR product grid was performed using closest neighbour regridding. traceability ------------ All match-ups can be traced back to their original file, using the ``__origin`` and ``__dynamic_target_center_index`` fields .. code:: ipython2 # get the source granule file for our match-up #3 print mdbfile.variables['S3A_SL_2_WST__origin'][3] .. parsed-literal:: S3A_SL_2_WST____20161013T021744_20161013T022044_20161013T043834_0179_009_360______MAR_O_NR_002.SEN3 .. code:: ipython2 # get the row and column indices of the match-up pixel in this file print mdbfile.variables['S3A_SL_2_WST__dynamic_target_center_index'][3] .. parsed-literal:: [1151 257] some advanced functions with ``s3analysis`` ------------------------------------------- the ``s3analysis`` packages provides an abstraction layer of the individual MDB files and some more advanced function to ease the extraction and analysis of information. In additional to helper functions described above, two more functions are provided: \* read\_satellite\_data \* read\_insitu\_data The main usage of these function is to loop over the match-up files of time range and return a concatenated result. So you don't have to bother with iterating over files and messing with arrays, these function do that for you! **note: these functions assume you stored the MDB files as they are on Ifremer repository, stored by ``year`` and ``day in year`` folders!** Let's start with ``read_satellite_data`` helper function: .. code:: ipython2 from s3analysis.slstr.mdb.slstrmdb import SLSTRMDB import datetime # instantiate the class to read MDB data mdb = SLSTRMDB(config={ 'mdb_output_root': "/home/cercache/project/s3vt/", 'reference': 'S3A_SL_2_WST' }) # extraction criteria start = datetime.datetime(2016, 10, 13) source = 'cmems_drifter' # fields to read (use the original field names from the source SLSTR products) fields = { 'S3A_SL_2_WST': ['sea_surface_temperature', 'quality_level'] } # perform the data selection for day 13 Oct 2016. By default it returns the closest pixel only. res = mdb.read_satellite_data(source, start, dataset_fields=fields, reference='S3A_SL_2_WST') # number or match-ups print res['S3A_SL_2_WST']['sea_surface_temperature'].size # number of valid (unmasked) match-ups print res['S3A_SL_2_WST']['sea_surface_temperature'].count() .. parsed-literal:: Reading file 4 over 4 2113 910 .. code:: ipython2 # Among the returned values, we can also filter out those who are two far from the in situ measurement, # using the ``max_distance`` threshold, in meter res = mdb.read_satellite_data(source, start, dataset_fields=fields, max_distance=3000) print res['S3A_SL_2_WST']['sea_surface_temperature'].count() .. parsed-literal:: Reading file 4 over 4 913 .. code:: ipython2 # you can also provide a time range instead of a single day, by specifying the end date # in ``end`` argument end = datetime.datetime(2016, 10, 14) res = mdb.read_satellite_data(source, start, dataset_fields=fields, end=end) # number or match-ups print res['S3A_SL_2_WST']['sea_surface_temperature'].count() .. parsed-literal:: Reading file 8 over 8 1827 You can also get instead request the closest *valid* measurement, where the validity is defined by some selection filter .. code:: ipython2 res = mdb.read_satellite_data(source, start, dataset_fields=fields, filters={'S3A_SL_2_WST': {'quality_level': [3, 5]}}) # number or match-ups print res['S3A_SL_2_WST']['sea_surface_temperature'].count() .. parsed-literal:: Reading file 4 over 4 1 The ``s3analysis`` package provides a handy function to only retrieve the closest in situ value (in time) from each match-up at once : ``read_insitu_data`` .. code:: ipython2 # extraction criteria insitu_fields = ['water_temperature'] # perform the data selection for day 13 Oct 2016 res = mdb.read_insitu_data(source, start, insitu_fields) # this returns by default the closest valid measurement to surface for each match-up # in each profile as seen in the dimensions of the returned result print res['water_temperature'].shape # number or match-ups print res['water_temperature'].count() # alternatively, you can also select the whole profile by deactivating the surface selection res = mdb.read_insitu_data(source, start, insitu_fields, closest_to_surface=False) print res['water_temperature'].shape .. parsed-literal:: File: /home/cercache/project/s3vt/2016/287/S3A_SL_2_WCT_cmems_drifter_20161013180000_20161014000000.nc File: /home/cercache/project/s3vt/2016/287/S3A_SL_2_WCT_cmems_drifter_20161013120000_20161013180000.nc File: /home/cercache/project/s3vt/2016/287/S3A_SL_2_WCT_cmems_drifter_20161013060000_20161013120000.nc File: /home/cercache/project/s3vt/2016/287/S3A_SL_2_WCT_cmems_drifter_20161013000000_20161013060000.nc (2113,) 2113 File: /home/cercache/project/s3vt/2016/287/S3A_SL_2_WCT_cmems_drifter_20161013180000_20161014000000.nc File: /home/cercache/project/s3vt/2016/287/S3A_SL_2_WCT_cmems_drifter_20161013120000_20161013180000.nc File: /home/cercache/project/s3vt/2016/287/S3A_SL_2_WCT_cmems_drifter_20161013060000_20161013120000.nc File: /home/cercache/project/s3vt/2016/287/S3A_SL_2_WCT_cmems_drifter_20161013000000_20161013060000.nc (2113, 5) .. code:: ipython2 # Similarly you can also provide a time range instead of a single day, by specifying the end date # in ``end`` argument end = datetime.datetime(2016, 10, 14) res = mdb.read_insitu_data(source, start, insitu_fields, end=end) # number or match-ups print res['water_temperature'].count() .. parsed-literal:: File: /home/cercache/project/s3vt/2016/287/S3A_SL_2_WCT_cmems_drifter_20161013180000_20161014000000.nc File: /home/cercache/project/s3vt/2016/287/S3A_SL_2_WCT_cmems_drifter_20161013120000_20161013180000.nc File: /home/cercache/project/s3vt/2016/287/S3A_SL_2_WCT_cmems_drifter_20161013060000_20161013120000.nc File: /home/cercache/project/s3vt/2016/287/S3A_SL_2_WCT_cmems_drifter_20161013000000_20161013060000.nc File: /home/cercache/project/s3vt/2016/288/S3A_SL_2_WCT_cmems_drifter_20161014180000_20161015000000.nc File: /home/cercache/project/s3vt/2016/288/S3A_SL_2_WCT_cmems_drifter_20161014120000_20161014180000.nc File: /home/cercache/project/s3vt/2016/288/S3A_SL_2_WCT_cmems_drifter_20161014060000_20161014120000.nc File: /home/cercache/project/s3vt/2016/288/S3A_SL_2_WCT_cmems_drifter_20161014000000_20161014060000.nc 4200