6.3.   Quick Start: Third Stage Cleaning and Converting to Ameriflux Output

Third Stage Cleaning

Third stage cleaning generally requires the least work from the user, but it is usually the most computationally intensive stage because it runs the models for gap-filling fluxes and for flux partitioning. The following example assumes you have already completed first and second stage cleaning for one site.

  1. Open your site-specific SITEID1_config.yml for editing (figure 6.5):

    DirectoryTree:MatlabDirectory

    Figure 6.5. Directory tree showing location of third stage custom YAML file that must be copied (green highlighted text) and edited (yellow highlighted text).

  2. At the top of your site-specific configuration file (i.e., SITEID1_config.yml), enter the site ID, the year that measurements at the site began, and the metadata for your site (figure 6.6; yellow highlighted text):

    INIfiles/ThirdStageConfig

    Figure 6.6. Third stage site-specific custom YAML file showing which fields to edit in yellow highlighted text.

    The main configuration file for third stage cleaning (global_config.yml) is located in the TraceAnalysis_ini directory and should generally not be edited. Use the custom SITEID1_config.yml file to add parameters/inputs; any setting defined there overrides the setting of the same name in global_config.yml.

    Note on gap-filling FCH4: currently, all of the random forest models used to fill gaps use the same predictors, set as Predictors: SW_IN_1_1_1,TA_1_1_1,VPD_1_1_1. For FCH4, however, these inputs should be changed to prioritize soil variables such as soil temperature, soil moisture, and water table depth. You can change these settings under “Optional parameters” (figure 6.7; peach highlighting).

    INIfiles/ThirdStageConfigOptions

    Figure 6.7. Third stage site-specific custom YAML file showing where to change inputs for FCH4 random forest gap-filling.

  3. Next, test the third stage data cleaning in Matlab; remember that it can take much longer to run than the first and second stages. Note that the cleaning stage argument for third stage cleaning is 7 (not 3; this is a legacy artifact), as follows:

     fr_automated_cleaning(yearIn,siteID,7)  % third stage

    The output will appear in two directories, ThirdStage and ThirdStage_Default_Ustar, within the Clean directory alongside the second stage output. Again, we recommend that you inspect these data using the visualization tools.
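
    As a minimal sketch of step 3, the call can be made with concrete values. The year and site ID below are placeholder values chosen for illustration, and the sketch assumes that fr_automated_cleaning accepts a numeric year and a character site ID. The exist() guard is an addition for convenience, so that the snippet only warns if the pipeline is not on the Matlab path:

```matlab
% Placeholder inputs for illustration; replace with your own values.
yearIn = 2022;        % year to clean (example value, not from this guide)
siteID = 'SITEID1';   % placeholder site ID used throughout this guide
stage  = 7;           % third stage cleaning is stage 7 (not 3)

% Run third stage cleaning only if the function is available as a file
% on the Matlab path; otherwise warn instead of raising an error.
if exist('fr_automated_cleaning', 'file') == 2
    fr_automated_cleaning(yearIn, siteID, stage);
else
    warning('fr_automated_cleaning was not found on the Matlab path.');
end
```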

Third stage output: flux variable definitions

The standalone flux variable names (e.g., FCH4, FC, H, LE) are copied directly from the second stage output. Variable names that carry suffixes after the flux name indicate which algorithms have been applied; the algorithms are applied sequentially, in the order that the suffixes appear. For now, this description provides only definitions, and more detailed information on each output variable will be provided on this webpage soon.

Suffix               Definition
_PI                  Wind sector and precipitation filter
_PI_SC               Plus Storage flux correction
_PI_SC_JSZ           Plus z-score filter
_PI_SC_JSZ_MAD       Plus Median of Absolute Deviation (about the median) filter
_PI_SC_JSZ_MAD_RP    Plus REddyProc applied (u-star filtering)

This link provides descriptions of suffixes applied to REddyProc output, e.g., _uStar, U95, _orig, and _f.
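
To make the cumulative naming concrete, the following sketch builds the suffixed variable names for one flux by appending the suffixes in the order they are applied. It is only an illustration of the naming pattern in the table above; FCH4 is used as the example flux, and the third stage output may contain further REddyProc suffixes that are not generated here:

```matlab
% Build the cumulative variable names for one flux variable.
flux     = 'FCH4';                            % example flux variable
suffixes = {'PI', 'SC', 'JSZ', 'MAD', 'RP'};  % order in which algorithms are applied

names = cell(size(suffixes));
for k = 1:numel(suffixes)
    % Join the first k suffixes with underscores and append to the flux name.
    names{k} = [flux '_' strjoin(suffixes(1:k), '_')];
end

disp(names(:));   % one name per row, from FCH4_PI to FCH4_PI_SC_JSZ_MAD_RP
```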

Output your data to an Ameriflux CSV file

Finally, once you have inspected your clean data and are happy with your INI files, you can output the data to a CSV file formatted for submission to Ameriflux:
     fr_automated_cleaning(yearIn,siteID,8)  % Ameriflux CSV output

The output will appear in an Ameriflux directory within the Clean directory, alongside the second and third stage output.

Note that you can run all stages at once, or a subset of them, provided that the stages preceding the subset have already been run (i.e., their output data exist):
     fr_automated_cleaning(yearIn,siteID,[1 2 7 8])  % all three stages plus Ameriflux output
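
As a hedged sketch of running a subset, the example below requests only third stage cleaning and the Ameriflux output, which presumes that first and second stage output already exist for the year. The stage check is an illustrative addition, not part of the pipeline: validStages lists only the stage codes mentioned in this guide, and the year and site ID are placeholder values.

```matlab
% Placeholder inputs for illustration; replace with your own values.
yearIn = 2022;
siteID = 'SITEID1';

validStages = [1 2 7 8];   % stage codes mentioned in this guide only
stages      = [7 8];       % third stage plus Ameriflux CSV output

% Catch the common mistake of passing 3 for third stage cleaning.
assert(all(ismember(stages, validStages)), ...
    'Unknown stage code; third stage cleaning is 7, not 3.');

% Run only if the pipeline function is available on the Matlab path.
if exist('fr_automated_cleaning', 'file') == 2
    fr_automated_cleaning(yearIn, siteID, stages);
end
```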