Interactive comment on “FluxnetLSM R package (v1.0): A community tool for processing FLUXNET data for use in land surface modelling”

an important step where gaps not filled in the timeseries are merged with the ERA-Interim versions, including the creation of a quality indicator. This activity however, looking to the variables description in FLUXNET available at http://fluxnet.fluxdata.org/data/fluxnet2015-dataset/subset-data-product/, is already done in the FLUXNET product (e.g. from the table in the website TA_F = Air temperature, consolidated from TA_F_MDS and TA_ERA, TA_F_QC = Quality flag for TA_F 0 = measured; 1 = good quality gapfill; 2 = downscaled from ERA).


Introduction
Land surface models (LSMs) provide the lower boundary condition for climate and weather forecast models, simulating the exchange of carbon, water and energy fluxes between the soil, vegetation and the atmosphere (Pitman, 2003). Flux towers measure ecosystem-scale exchanges of carbon dioxide, water vapour and energy (Baldocchi, 2014) and have proven invaluable for LSM evaluation and benchmarking (Abramowitz et al., 2008; Best et al., 2015; Blyth et al., 2010; Haughton et al., 2016; Luo et al., 2012; Williams et al., 2009). Flux towers are particularly useful for modelling applications as they provide simultaneous observations of the meteorological data needed for forcing offline models as well as the key ecosystem variables against which models may be evaluated (e.g. sensible and latent heat) at time intervals similar to those used by LSMs, often over multiple years. As such, they are ideal for characterising the interactions between climate and ecosystem processes, and allow the evaluation of LSMs over time periods ranging from subdaily through to seasonal and interannual timescales (e.g. Blyth et al., 2010; Bonan et al., 2011; Mahecha et al., 2010; Matheny et al., 2014; Powell et al., 2013; Ukkola et al., 2016; Wang et al., 2011; Whitley et al., 2016).
The investment in flux tower measurements is considerable and there are multiple benefits to these data being more widely used. First, the use of these data for LSM evaluation and benchmarking helps realise the value of existing investments. Second, where flux tower measurements identify biases in how LSMs represent processes, the potential exists to improve how well these models simulate the surface energy, water and carbon balances. Since LSMs are central to the simulation of key phenomena including droughts, water resource availability, carbon storage and feedbacks on heat waves, this has direct policy implications. Third, greater use of flux tower measurements by the LSM and climate science community could help with the argument in support of ongoing resourcing of flux tower measurements. In short, the effective and widespread use of flux tower measurements is beneficial across the science and policy communities.
Published by Copernicus Publications on behalf of the European Geosciences Union.
A. M. Ukkola et al.: FluxnetLSM R package (v1.0)
Before data from flux tower sites can be used in models, they commonly require significant preprocessing. In principle, flux towers provide near-continuous observations of ecosystem fluxes but, in practice, the measurements often include discontinuities due to instrument failure or unfavourable weather conditions (Reichstein et al., 2005). As LSMs must be provided with continuous meteorological forcing data, flux tower datasets require varying degrees of gap-filling of missing time steps. This also poses challenges for using these data for model evaluation and benchmarking. Ideally, models should be evaluated against high-quality observations. Due to data gaps, as well as measurement biases (e.g. Leuning et al., 2012), flux tower measurements do not provide reliable observations representative of the true ecosystem dynamics in all circumstances. Arguably, therefore, the full breadth of flux tower data available across the entire network is unlikely to be suitable for evaluating LSMs.
FLUXNET, an international network of flux tower sites, comprises > 900 sites globally (http://fluxnet.fluxdata.org/). The latest FLUXNET data release (FLUXNET2015; http://fluxnet.fluxdata.org/data/fluxnet2015-dataset/) provides flux tower measurements for 212 sites. It was preceded by the La Thuile Synthesis Dataset (http://fluxnet.fluxdata.org/data/la-thuile-dataset/), which comprises 252 flux tower sites, 141 of which are not currently available in FLUXNET2015. The available data overcome some of the limitations of raw eddy covariance measurements through significant post-processing and gap-filling. Despite this, these datasets cannot be employed directly by LSMs. Critically, not all FLUXNET data releases are provided with temporally continuous observations of all essential meteorological variables (e.g. precipitation and wind speed) for forcing LSMs. For example, across 155 FLUXNET2015 "FULLSET" open data policy (Tier 1) sites reporting half-hourly observations, nearly all sites include gaps in rainfall and 77 % of the sites have missing air temperature observations, with up to 61 % (median 5 %) of the time series missing despite this variable being nominally gap-filled. Further, evaluation variables, such as latent and sensible heat, are generally gap-filled but to vastly different extents depending on the site and variable. For example, between 0 and 89 % (median 31 %) of the latent heat time series and 0 and 83 % (median 25 %) of the sensible heat time series have been gap-filled across the 155 sites. This poses a challenge for utilising these data for LSM applications and additional post-processing is necessary. A specific concern is that individual land surface modellers are very likely to post-process flux data in different ways, with different assumptions and varying levels of acceptance on how many gaps represent a worthwhile dataset. When the gap-filled data are subsequently used and published, the choices made during post-processing are rarely fully documented. This leads to difficulties in interpreting model evaluation studies, a lack of reproducibility and, given that many groups process data individually, wasted effort.
In an effort to resolve some of these problems and to connect the flux tower researchers with the LSM researchers more strongly, we present the R package "FluxnetLSM" to facilitate the processing of FLUXNET datasets for use in LSMs. The package serves several important functions. Firstly, it enables the creation of fully gap-filled meteorological forcing datasets for running LSMs. Past studies have relied on various (often ad hoc) gap-filling methods that are rarely fully documented in the literature. Worryingly, it would be virtually impossible to reproduce many existing LSM evaluation and benchmarking studies, although we note some exceptions (Best et al., 2015). The R package provides a community tool for creating LSM forcing datasets in a fully citeable and reproducible framework. Secondly, the package assists with the quality controlling of the data. It enables the selection of good-quality measurement periods and sites through automated screening of heavily gap-filled or missing data periods according to user-defined thresholds. To complement the automated quality controlling, the package also provides tools for creating diagnostic plots to visualise output data periods. This facilitates the detection of data periods with unusual variability or variables exhibiting unusual magnitudes. Finally, the package converts the flux tower data into the community standard NetCDF format used by the climate modelling and LSM community, and collates metadata on data variables, flux tower sites as well as processing steps in the output files.
The package offers a useful tool for post-processing eddy covariance datasets for modelling applications and simplifies rigorous documentation of data processing methods in LSM studies to enhance their reproducibility. Specifically, future studies using these data would be able to explicitly demonstrate how the data were used, gap-filled, quality controlled and so on, and this could be reproduced by other users. In the following sections, we describe the different functionalities of the package.

Package description
The FluxnetLSM package (v1.0) was developed to serve as a community tool to facilitate the use of flux tower measurements in LSMs. It is written in the open-source R language (https://www.r-project.org/) and is freely accessible in a version-controlled repository (see code availability section for full details). Instructions for installation are provided in the following section.
The package has two processing streams: the collection of site metadata and processing of high-frequency temporally varying variables. These are described in Sects. 2.3 and 2.4, respectively. The package outputs a separate NetCDF file for meteorological and evaluation variables, with metadata stored in each file. Additionally, a log file is produced detailing output file names, potential warnings and errors. The package also provides the option to produce diagnostic plots for further data exploration. Figure 1 illustrates the general workflow with each component described in detail below.

Installation and requirements
FluxnetLSM requires R version 3.1.0 or higher. It relies on base R functions as well as three additional packages: R.utils, ncdf4 and rvest. These packages should be installed prior to the installation of FluxnetLSM. The devtools package is also recommended to aid installation.
The R.utils, ncdf4, rvest and devtools packages can be installed directly in R with the command install.packages("package_name"). The FluxnetLSM package can be downloaded from the GitHub repository at https://github.com/aukkola/FluxnetLSM and installed within R by typing devtools::install_github("aukkola/FluxnetLSM"). Alternative installation methods are provided in the package GitHub repository. After installation, the FluxnetLSM package can be loaded into the R session by typing library(FluxnetLSM). Other required packages are loaded automatically by the FluxnetLSM package.
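The installation steps above can be collected into a short script. The following is a minimal sketch, not part of the package: the helper name install_fluxnetlsm is hypothetical, and the function is only defined here (not called) so that nothing is downloaded until the user chooses to run it.

```r
# Packages named in the text as prerequisites
# (devtools is only needed for the GitHub installation step).
required_pkgs <- c("R.utils", "ncdf4", "rvest", "devtools")

# Check which prerequisites are not yet installed, without attaching them.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

# Hypothetical helper: install missing prerequisites, then install and load
# FluxnetLSM from the GitHub repository given in the text.
install_fluxnetlsm <- function(pkgs = missing_pkgs) {
  if (length(pkgs) > 0) install.packages(pkgs)
  devtools::install_github("aukkola/FluxnetLSM")
  library(FluxnetLSM)
}
```

Calling install_fluxnetlsm() once in an interactive session would then perform the installation and load the package.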

Running FluxnetLSM
The package is run by invoking a single R function called convert_fluxnet_to_netcdf:
    convert_fluxnet_to_netcdf(site_code, infile, era_file=NA, out_path,
                              conv_opts=get_default_conversion_options(),
                              plot=c("annual", "diurnal", "time series"), ...)
The user must set three arguments (infile, site_code and out_path), with all other arguments being optional. Each argument and its default value are described in Table 1 and discussed in detail in the following sections. A full example for usage is provided in Sect. 3. Three example scripts are also provided with the package and are stored in examples/FLUXNET2015 and examples/LaThuile for the FLUXNET2015 and La Thuile data releases, respectively. In each directory, the example_conversion_single_site.R file shows an example for processing a single site. The example_conversion_multiple_sites.R and example_conversion_multiple_sites_parallel.R files show examples for processing multiple sites using serial and parallel programming, respectively.

Collation of site metadata
The package collates metadata on the flux tower sites and stores these as attributes in the output NetCDF files. These include information required for modelling such as site coordinates, elevation and vegetation type. The primary source for metadata is a site attribute file provided with the package (stored in data/Site_metadata.csv). This file includes metadata detailed in Table 2. Additionally, the code stores the dataset name and version (as set by the datasetname and datasetversion arguments to the main function), as well as the processing options, time and date as attributes in the output files. The code also calculates the mean annual precipitation for the output period when precipitation is outputted. It is stored as an attribute in the meteorological output file and can be particularly useful for rescaling precipitation for LSM spin-up so that each year's precipitation during the spin-up matches the site average.
This processing step connects key site metadata directly to each model's forcing files. It can be extended to include additional metadata, such as site soil or vegetation properties, with minimal code modifications. For example, LSMs generally use plant functional types (PFTs) instead of the International Geosphere-Biosphere Programme (IGBP; http://www.igbp.net/) vegetation types automatically retrieved by the package (Poulter et al., 2011). An example is provided for writing the PFT type for the CABLE LSM (Wang et al., 2011) and can be invoked by setting the model argument to the desired model name. Full instructions for adding model-specific parameters are provided in the package README file.

Output variables
The package is supplied with a suggested list of output variables that will be processed by the package for each site, where available. Separate lists are provided for FLUXNET2015 FULLSET and SUBSET, and La Thuile data releases due to different naming conventions and variables (stored in data/Output_variables_FLUXNET2015_FULLSET.csv, data/Output_variables_FLUXNET2015_SUBSET.csv and data/Output_variables_LaThuile.csv, respectively). The output variables are categorised as meteorological or evaluation variables, and a separate NetCDF output file is produced for each category. Where possible, the output variables are named using the Assistance for Land-surface Modelling Activities (ALMA) convention (http://www.lmd.jussieu.fr/~polcher/ALMA/) commonly employed by LSMs. The package also performs common unit conversions between the original FLUXNET and ALMA convention units (see Sect. 4.4). The output variables are fully customisable according to user requirements by removing or adding variables to the output variable list. The information required for each output variable is shown in Table 3.

Meteorological variables
The meteorological variables include the data variables typically required to force LSMs. The meteorological variables processed by the package by default are detailed in Table S1 in the Supplement. The user can also nominate essential meteorological variables that must be available and processed by modifying the Essential_met field in the output variable list (see Table 3). By default, these include air temperature, downward short-wave radiation (or photosynthetically active radiation), vapour pressure deficit, precipitation and wind speed. If any of these variables are not provided in the input data file, the code will terminate and the site will not be processed. The code provides several options for gap-filling meteorological variables if required (see Sect. 2.4.3 for details).

Evaluation variables
The evaluation variables include the data variables typically predicted by land surface models and used to evaluate model outputs. The default evaluation variables processed by the package are provided in Table S2 in the Supplement. The user can nominate preferred evaluation variables by modifying the Preferred_eval field in the output variable list (see Table 3). By default, these include net radiation, latent (LE) and sensible (H) heat and net ecosystem exchange (NEE). If none of the preferred variables are available in the input data file, the site will not be processed. The evaluation variables can be gap-filled by the package using statistical methods (Sect. 2.4.3).
In addition to common evaluation variables, the package also processes and outputs uncertainty estimates provided with the FLUXNET2015 release by default. These include uncertainty bounds for LE, H and NEE, as well as error estimates for gross primary productivity (GPP). Several estimates for NEE and GPP are also included to reflect the inherent uncertainties in deriving these variables from eddy covariance data (Papale et al., 2006; Reichstein et al., 2005; Table S2).

Gap-filled and missing values
The code produces NetCDF files with whole years of data only to ensure LSM automated spin-up procedures remain relatively unbiased. It determines which years are included in its output according to user-defined thresholds for gap-filled and missing values as detailed below.
A threshold must be set for the maximum percentage of missing values per year (argument missing, 15 % by default). The code checks for the percentage of missing values for each data variable during each year. If any essential meteorological variables or all preferred evaluation variables have missing values in excess of this threshold, the year is not processed.
Additionally, thresholds can be set for the maximum percentage of all gap-filling (default option; set by argument gapfill_all using 20 % as the default) or separately for "good", "medium" and "poor" quality gap-filling (arguments gapfill_good, gapfill_med and gapfill_poor, respectively; see Sect. 4.3). The percentage of gap-filled values is then checked for each data variable with a corresponding quality control flag during each year. If any essential meteorological variable or all preferred evaluation variables include gap-filled values in excess of the threshold(s), the year is not processed. Note that the November 2016 FLUXNET2015 release has gaps in quality control flags for latent and sensible heat variables even when data are present. A fix has been provided (http://fluxnet.fluxdata.org/data/fluxnet2015-dataset/known-issues/) but if it is not implemented, the data quality cannot be ascertained from the flags (D. Papale, personal communication, 2017) and such time steps are treated by the package as poor-quality gap-filling.
If a threshold for gap-filling is set, the percentage of both gap-filled and missing values must not exceed their respective thresholds for a year to be processed. If no years fulfilling the criteria are found, or the time period is shorter than the user-defined minimum number of consecutive years (set by argument min_yrs, by default 2 years), the site is not processed. If several non-consecutive time periods fulfilling the criteria are found, these are written to separate output files.
Provided that at least one evaluation variable has fewer gaps than the user-defined thresholds, all evaluation variables are written to the output file by default, with the exception of any variables that only contain missing values. An option is provided to discard any evaluation variables with gaps exceeding the user-defined thresholds by setting the argument include_all_eval to FALSE.
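The year-selection logic described above can be illustrated with a small base R sketch. It is not the package's implementation: the helper names percent_missing_by_year and select_periods are hypothetical, only the missing-value threshold is applied, and a single variable stands in for the full set of essential and preferred variables.

```r
# Hypothetical helper: percentage of missing (NA) values per calendar year.
percent_missing_by_year <- function(values, years) {
  tapply(values, years, function(x) 100 * mean(is.na(x)))
}

# Hypothetical helper: split the accepted years into runs of consecutive
# years and keep only runs with at least `min_yrs` years.
select_periods <- function(ok_years, min_yrs = 2) {
  ok_years <- sort(ok_years)
  if (length(ok_years) == 0) return(list())
  run_id  <- cumsum(c(1, diff(ok_years) != 1))
  periods <- unname(split(ok_years, run_id))
  periods[lengths(periods) >= min_yrs]
}

# Toy record with four time steps per year.
years  <- rep(2001:2006, each = 4)
precip <- c(NA, NA, 1, 2,   # 2001: 50 % missing, rejected
            1, 2, 3, 4,     # 2002
            1, 2, 3, 4,     # 2003
            NA, 1, 2, 3,    # 2004: 25 % missing, rejected
            1, 2, 3, 4,     # 2005
            1, 2, 3, 4)     # 2006

pct_missing <- percent_missing_by_year(precip, years)
ok_years    <- as.integer(names(pct_missing)[pct_missing <= 15])  # default threshold
periods     <- select_periods(ok_years, min_yrs = 2)
```

In this toy case two non-consecutive periods (2002 to 2003 and 2005 to 2006) pass the checks, which corresponds to the situation in which separate output files are written.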

Gap-filling variables
LSMs require continuous forcing data, but a number of essential meteorological variables (rainfall, wind speed, incoming long-wave radiation and air pressure) are not fully gap-filled in the FLUXNET2015 FULLSET and/or La Thuile releases. The package provides two methods for gap-filling meteorological variables: statistical and ERA-Interim (Dee et al., 2011; Vuichard and Papale, 2015). Additionally, statistical methods are provided for gap-filling evaluation variables.

ERA-Interim-based gap-filling
Downscaled ERA-Interim reanalysis estimates are provided as part of the FLUXNET2015 dataset for gap-filling meteorological variables. These are available only in the FULLSET version of the FLUXNET2015 release (http://fluxnet.fluxdata.org/data/fluxnet2015-dataset/fullset-data-product/), whereas the "SUBSET" version of the dataset has already been gap-filled using ERA-Interim but offers the user less flexibility for controlling for gap-filling quality (with missing, medium- and poor-quality gap-filled time steps readily gap-filled with ERA-Interim).
This gap-filling option is chosen by setting the argument met_gapfill to "ERAinterim" and by providing the name of the ERA-Interim input file to argument era_file. The ERA-Interim variable corresponding to each meteorological variable is set in the output variable list (ERAinterim_variable field; Table 3). If an ERA-Interim estimate is available for a given variable, the code gap-fills any missing time steps with the corresponding ERA-Interim data value. The package saves information on the gap-filled time steps in quality control flag variables (see Sect. 2.4.4 for details).

Statistical gap-filling
Alternatively, meteorological, as well as evaluation, variables can be gap-filled statistically, using a combination of methods chosen according to the length of the missing period. This gap-filling option can be chosen for meteorological and evaluation variables by setting arguments met_gapfill and flux_gapfill to "statistical", respectively.
Surface air pressure and incoming long-wave radiation are synthesised using empirical functions (Abramowitz et al., 2012). Air pressure is calculated from air temperature and elevation using the barometric formula as detailed in Sect. S1.1 in the Supplement. Three methods for synthesising long-wave radiation are provided ("Abramowitz_2012", "Swinbank_1963" and "Brutsaert_1975") and are set by the argument lwdown_method. "Swinbank_1963" calculates long-wave radiation based on air temperature, whereas "Abramowitz_2012" (default) and "Brutsaert_1975" calculate it from air temperature and relative humidity. Each of these methods is detailed in Sect. S1.2.
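To give a sense of what the long-wave synthesis involves, the sketch below implements the textbook forms of the Swinbank (1963) and Brutsaert (1975) clear-sky formulas. These are assumptions in the sense that the exact expressions used by the package are given in its Supplement, which is not reproduced here; the function names and the Magnus-type saturation vapour pressure expression are likewise illustrative. The default "Abramowitz_2012" method is not shown because its coefficients are not stated in the text.

```r
sigma <- 5.67e-8  # Stefan-Boltzmann constant (W m-2 K-4)

# Swinbank (1963), textbook form: clear-sky long-wave radiation (W m-2)
# from air temperature (K) alone.
lwdown_swinbank <- function(tair_k) {
  5.31e-13 * tair_k^6
}

# Brutsaert (1975), textbook form: clear-sky emissivity from vapour
# pressure (hPa) and air temperature (K), then the Stefan-Boltzmann law.
lwdown_brutsaert <- function(tair_k, rel_hum) {
  tair_c  <- tair_k - 273.15
  # Assumed Magnus-type saturation vapour pressure (Pa); illustrative only.
  esat_pa <- 613.75 * exp(17.502 * tair_c / (240.97 + tair_c))
  e_hpa   <- (rel_hum / 100) * esat_pa / 100
  emissivity <- 1.24 * (e_hpa / tair_k)^(1 / 7)
  emissivity * sigma * tair_k^4
}

# Example: 15 degrees C and 60 % relative humidity.
lw_swinbank  <- lwdown_swinbank(288.15)
lw_brutsaert <- lwdown_brutsaert(288.15, 60)
```

For mild, moderately humid conditions both formulas give values of roughly 300 W m−2, which is a useful sanity check when comparing methods.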
For all other meteorological and evaluation variables, short data gaps (by default up to 4 h, set by argument linfill) are first gap-filled using linear interpolation between the previous and next available time steps. This prevents the introduction of abrupt variations but leads to a loss of some subdiurnal variability.
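The short-gap interpolation step can be sketched in base R as follows. The helper fill_short_gaps is hypothetical and not the package's own function; it expresses the maximum gap length in time steps rather than hours and leaves gaps at the very start or end of the series untouched, since they have no neighbour on one side.

```r
# Hypothetical helper: fill interior gaps of at most `max_gap` consecutive
# time steps by linear interpolation; longer gaps are left as NA.
fill_short_gaps <- function(x, max_gap) {
  runs   <- rle(is.na(x))
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  filled <- x
  for (i in which(runs$values & runs$lengths <= max_gap)) {
    # Gaps touching either end of the series cannot be interpolated.
    if (starts[i] == 1 || ends[i] == length(x)) next
    left  <- starts[i] - 1
    right <- ends[i] + 1
    idx   <- starts[i]:ends[i]
    filled[idx] <- approx(c(left, right), x[c(left, right)], xout = idx)$y
  }
  filled
}

# Half-hourly data: the 4 h default for linfill corresponds to 8 time steps.
tair        <- c(10, NA, NA, 13, rep(NA, 9), 20)
tair_filled <- fill_short_gaps(tair, max_gap = 8)
```

Here the two-step gap is interpolated to 11 and 12, while the nine-step gap exceeds the limit and is left for the longer-gap methods.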
For meteorological variables, longer gaps (by default up to 10 days, set by argument copyfill) are then gap-filled by taking the average of the corresponding time steps during other years (Blyth et al., 2010). Data gaps that are longer than set by copyfill are not gap-filled due to the limitations of statistical gap-filling for stochastic variables, such as rainfall.
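The idea of filling a gap with the average of the same time step in other years can be sketched as below. This is an illustration under simplifying assumptions, not the package's code: the helper copyfill_sketch is hypothetical, all years are assumed to have the same number of time steps (no leap years), and the maximum gap length set by copyfill is not enforced.

```r
# Hypothetical helper: replace each missing value with the mean of the same
# time step (position within the year) across the other years.
# `x` must hold whole years of equal length, one after another.
copyfill_sketch <- function(x, steps_per_year) {
  by_year   <- matrix(x, nrow = steps_per_year)   # one column per year
  step_mean <- rowMeans(by_year, na.rm = TRUE)    # mean over available years
  gaps      <- is.na(by_year)
  by_year[gaps] <- step_mean[row(by_year)[gaps]]
  as.vector(by_year)
}

# Three toy "years" with four time steps each.
wind <- c(1, 2, 3, 4,
          3, NA, 5, 6,
          5, 6, NA, 8)
wind_filled <- copyfill_sketch(wind, steps_per_year = 4)
```

A time step that is missing in every year stays missing (the row mean is NaN), which mirrors the point that some gaps cannot be filled statistically.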
For evaluation variables, longer gaps (by default up to 30 days, set by argument regfill) are gap-filled using a linear regression of each evaluation variable against one or several meteorological variables (adapted from Best et al., 2015). When incoming short-wave radiation, air temperature and humidity (relative humidity or vapour pressure deficit) are available, the code will perform a multiple linear regression against these variables. Otherwise, if only short-wave radiation is available, a linear regression against this variable is performed. All available time steps are used to construct a linear regression model separately for daytime and nighttime (using incoming solar radiation of 5 W m−2 as the day-night threshold; Abramowitz et al., 2012). The linear regression models are then used to predict missing values at each time step. If none of the meteorological variables are available, or data gaps are longer than set by regfill, the evaluation variables are not gap-filled. If regfill is set to NA, the code uses the copyfill method for evaluation variables instead.
After performing the gap-filling, the code checks for missing values (as per Sect. 2.4.2). If missing values remain in any essential meteorological variables or all preferred evaluation variables in a given year, the year is removed from the outputs. If the remaining time period is shorter than the user-defined minimum number of consecutive years, the site is not processed.
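The regression-based gap-filling of evaluation variables can be sketched with base R's lm. The example below is illustrative only: regfill_sketch is a hypothetical helper, the data are synthetic, the maximum gap length set by regfill is not enforced, and the fallback to a short-wave-only regression is omitted.

```r
set.seed(1)
n <- 200
tt <- seq(0, 8 * pi, length.out = n)
met <- data.frame(
  SWdown = pmax(0, 400 * sin(tt)) + runif(n, 0, 2),  # W m-2, synthetic
  Tair   = 15 + rnorm(n),
  VPD    = 1000 + 100 * rnorm(n)
)
Qle_true <- 20 + 0.3 * met$SWdown + 2 * met$Tair + 0.01 * met$VPD + rnorm(n)
gap_idx  <- c(10, 11, 40, 41)        # two daytime and two nighttime gaps
Qle      <- Qle_true
Qle[gap_idx] <- NA

# Hypothetical helper: fit separate day and night multiple linear
# regressions on all available time steps and predict the missing ones.
regfill_sketch <- function(flux, met, day_threshold = 5) {
  is_day <- met$SWdown >= day_threshold
  filled <- flux
  for (period in c(TRUE, FALSE)) {
    rows <- is_day == period
    dat  <- data.frame(flux = flux, met)[rows, ]
    fit  <- lm(flux ~ SWdown + Tair + VPD, data = dat)  # rows with NA flux are dropped
    gaps <- rows & is.na(flux)
    filled[gaps] <- predict(fit, newdata = met[gaps, , drop = FALSE])
  }
  filled
}

Qle_filled <- regfill_sketch(Qle, met)
```

Fitting day and night separately lets the radiation response differ between the two regimes, which is the motivation for the 5 W m−2 threshold mentioned above.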

Quality control flags
The code retains and outputs the original FLUXNET quality control (QC) flags, when these are included in the output variable list. These flags are set to 0 for measured data, and 1, 2 and 3 for good-, medium- and poor-quality gap-filling, respectively, for La Thuile and FLUXNET2015 FULLSET data (Reichstein et al., 2005; http://fluxnet.fluxdata.org/data/fluxnet2015-dataset/). FLUXNET2015 SUBSET QC flags are as per FULLSET for measured and good-quality gap-filled data, with flags set to 2 for ERA-Interim gap-filled time steps.
Additionally, the code produces QC flags for meteorological variables when they are gap-filled using ERA-Interim data or statistical methods. The QC flag is set to 4 when a time step is gap-filled with ERA-Interim data and to 5 for statistical gap-filling. If a QC flag does not exist for a given variable, the code creates a QC flag variable with measured time steps set to 0 and ERA-Interim or statistically gap-filled time steps set to 4 or 5, respectively. This flag is automatically stored as a variable in the meteorological data output file and is named as the output variable plus the extension "_qc" (e.g. Precip_qc). See below for QC flag conventions when aggregating data to coarser time steps.
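The flag convention for ERA-Interim gap-filling can be illustrated with a few lines of base R. The helper gapfill_with_era is hypothetical and covers only the case where no QC flag exists beforehand: measured time steps are flagged 0 and ERA-Interim-filled time steps 4.

```r
# Hypothetical helper: fill gaps in `obs` with the matching `era` values and
# create a QC flag (0 = measured, 4 = gap-filled with ERA-Interim).
gapfill_with_era <- function(obs, era) {
  gaps   <- is.na(obs)
  filled <- obs
  filled[gaps] <- era[gaps]
  qc <- ifelse(gaps & !is.na(era), 4L, 0L)
  qc[is.na(filled)] <- NA   # still missing: no flag can be assigned
  list(data = filled, qc = qc)
}

precip     <- c(1.0, NA, 3.0, NA)
precip_era <- c(0.9, 2.1, 3.2, 4.0)
out <- gapfill_with_era(precip, precip_era)
# out$qc plays the role of the "Precip_qc" variable described in the text.
```

Statistical gap-filling would follow the same pattern with the value 5 in place of 4.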

Aggregation to coarser time steps
By default, the package outputs the data in its original time resolution. However, a longer time step may be desired for some model applications. The package allows the aggregation of the data to up to a daily resolution. The aggregated time step size (in hours) is set by the argument aggregate and can be any number between the original resolution (usually 30 min) and 24 h (daily), as long as it divides 24 evenly so that a regular number of time steps is aggregated. If any of the time steps being aggregated are missing, the new coarser time step will also be set to missing. The QC flags (if outputted) are assigned a value between 0 and 1, indicating the fraction of time steps used for aggregation that were observed.
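The aggregation rules can be sketched in base R as follows. The helpers are hypothetical, and averaging within each block is an assumption: the text does not state which aggregation function the package applies to each variable.

```r
# An aggregation step (in hours) is valid if it divides 24 evenly.
is_valid_aggregate <- function(hours) 24 %% hours == 0

# Hypothetical helper: average blocks of `steps` consecutive time steps.
# A block containing any missing value becomes missing; the QC value is the
# fraction of time steps in the block that were observed (flag 0).
aggregate_sketch <- function(x, qc, steps) {
  stopifnot(length(x) %% steps == 0, length(x) == length(qc))
  block <- rep(seq_len(length(x) / steps), each = steps)
  list(
    data = as.vector(tapply(x, block, mean)),
    qc   = as.vector(tapply(qc == 0, block, mean))
  )
}

# Half-hourly toy data aggregated to 2 h (4 time steps per block).
tair <- c(1, 2, 3, 4,   5, NA, 7, 8,   2, 2, 2, 2)
qc   <- c(0, 0, 1, 1,   0, NA, 0, 0,   0, 0, 0, 0)
agg  <- aggregate_sketch(tair, qc, steps = 4)
```

In the first block half of the time steps were observed, so the aggregated QC value is 0.5; the second block contains a missing value and is therefore missing after aggregation.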

Unit conversions
The package uses ALMA convention units for outputs by default where possible (as indicated in Tables S1 and S2). These differ from the original FLUXNET units for a number of variables and a conversion is performed in each case. Available conversions are detailed in Table 4. If a conversion is not available for the specified units, the code will produce an error and terminate. Additionally, the package provides functions for converting (i) vapour pressure deficit to relative humidity, (ii) relative humidity to specific humidity and (iii) photosynthetically active radiation (PAR) to incoming short-wave radiation (SW_down).
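For these conversions, saturated vapour pressure ($e_{sat}$; Pa) is first calculated from air temperature ($T_{air}$; °C) (Jones, 1992) at each time step as

$$e_{sat} = 613.75 \exp\left(\frac{17.502\, T_{air}}{240.97 + T_{air}}\right). \quad (1)$$

Relative humidity (RH; %) is then determined from $e_{sat}$ and vapour pressure deficit ($D$; Pa) as

$$\mathrm{RH} = 100 \left(1 - \frac{D}{e_{sat}}\right). \quad (2)$$

To calculate specific humidity ($Q_{air}$; kg kg$^{-1}$), specific humidity at saturation ($w_s$; kg kg$^{-1}$) is derived from $e_{sat}$ and air pressure ($\rho_{air}$; Pa) as

$$w_s = \frac{0.622\, e_{sat}}{\rho_{air} - e_{sat}}. \quad (3)$$

$Q_{air}$ is then calculated as

$$Q_{air} = \frac{\mathrm{RH}}{100}\, w_s. \quad (4)$$

PAR (µmol m$^{-2}$ s$^{-1}$) is converted to $SW_{down}$ (W m$^{-2}$) following Monteith and Unsworth (1990):

$$SW_{down} = \frac{\mathrm{PAR}}{2.3}. \quad (5)$$

Negative PAR values are set to 2.17 W m$^{-2}$ (equivalent to 5 µmol m$^{-2}$ s$^{-1}$) to avoid problems forcing LSMs with negative $SW_{down}$.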
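The conversions described above can be written as a few small R functions. This is a sketch rather than the package's code: the function names are illustrative, the saturation vapour pressure coefficients are an assumed Magnus-type form, and the PAR factor of 2.3 is inferred from the stated equivalence of 5 µmol m−2 s−1 and 2.17 W m−2.

```r
# Saturation vapour pressure (Pa) from air temperature (deg C);
# assumed Magnus-type coefficients.
calc_esat <- function(tair_c) {
  613.75 * exp(17.502 * tair_c / (240.97 + tair_c))
}

# Vapour pressure deficit (Pa) to relative humidity (%).
vpd_to_relhum <- function(vpd_pa, tair_c) {
  100 * (1 - vpd_pa / calc_esat(tair_c))
}

# Relative humidity (%) to specific humidity (kg kg-1).
relhum_to_spechum <- function(rel_hum, tair_c, pressure_pa) {
  esat <- calc_esat(tair_c)
  ws   <- 0.622 * esat / (pressure_pa - esat)  # specific humidity at saturation
  (rel_hum / 100) * ws
}

# PAR (umol m-2 s-1) to incoming short-wave radiation (W m-2);
# negative PAR is replaced by the equivalent of 5 umol m-2 s-1.
par_to_swdown <- function(par) {
  sw <- par / 2.3
  sw[which(par < 0)] <- 5 / 2.3
  sw
}

rh_sat <- vpd_to_relhum(0, 20)                 # zero deficit means saturation
q_sat  <- relhum_to_spechum(100, 20, 101325)   # saturated air at 20 deg C
sw     <- par_to_swdown(c(230, -10))
```

A zero vapour pressure deficit maps to 100 % relative humidity, and saturated air at 20 °C and standard pressure holds roughly 15 g of water vapour per kg, which provides a quick plausibility check.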

Visualisation of outputs
The package provides an option to visualise output variables. Three types of plots can be produced: a mean annual cycle, a mean diurnal cycle and a time series. The outputs are retrieved from the output NetCDF files and all data variables are plotted with separate figures produced for meteorological and evaluation variables. Any missing values are ignored during plotting, but their presence is noted in the figure, when applicable. The data are plotted in their output units, with the exception of air temperature (converted from Kelvin to Celsius) and rainfall (converted from millimetres per second to millimetres per time step). It is envisaged the plots will complement the automated quality control performed during data processing and enable further detection of unsuitable data periods or sites.

Example application
Here we present an example application using FluxnetLSM for processing FLUXNET2015 FULLSET data at the Howard Springs (Australia) flux tower site. This example is provided in full with the package and stored in examples/FLUXNET2015/example_conversion_single_site.R. It is also reproduced in Sect. S2 in the Supplement for convenience. Meteorological data are gap-filled using ERA-Interim estimates in this example, but this functionality can be disabled if desired by setting the met_gapfill argument to NA (see below). The user must provide four inputs, with the following inputs used in this example:
    infile    <- "FLX_AU-How_FLUXNET2015_FULLSET_HH_2001-2014_1-3.csv"
    ERA_file  <- "FLX_AU-How_FLUXNET2015_ERAI_HH_1989-2014_1-3.csv"
    site_code <- "AU-How"
    out_path  <- "~/FluxnetLSM/Outputs"
The data can then be processed by invoking
    convert_fluxnet_to_netcdf(site_code = site_code, infile = infile,
                              era_file = ERA_file, out_path = out_path,
                              met_gapfill = "ERAinterim")
www.geosci-model-dev.net/10/3379/2017/ Geosci. Model Dev., 10, 3379-3390, 2017
All other arguments are left at their default values in this example (see Table 1 for argument descriptions). The package automatically selects output years based on the default thresholds (as detailed in Sect. 2.3.2). Figure 3 shows the full time series of essential meteorological variables and two example evaluation variables at Howard Springs. The code helps exclude time periods with extensive missing periods, such as the first year (2001) of the time series, as well as heavily gap-filled time periods (e.g. around January 2007). Extended periods with missing QC flags (see Sect. 2.2.3) are also excluded for evaluation variables due to unknown data quality (Fig. 3b). Based on the default thresholds, the time period 2010-2014 is chosen and outputted, indicated by grey shading in Fig. 3. The rest of the data are discarded. Thresholds can of course be modified by the user to change this result.
Once the data have been processed and outputted, they can be visualised. Three types of plots are produced by default: mean annual and diurnal cycles and a time series plot. Figure 2 shows an example of each type of output plot produced by the package. These plots can be used for further quality controlling to detect any anomalous data periods not automatically excluded by the package.

Discussion and conclusions
Efforts to better utilise existing observational data provide multiple benefits, including bringing research communities together, evaluating models against broader data and providing further support to groups seeking to maintain primary observations. To maximise the use of observed data by communities other than those that collect the data, it is advantageous to make the data as accessible and easy to use as possible. In the case of the FLUXNET data, one major community is the land surface modelling sciences. Land surface models are key components in climate modelling and are therefore critical to broader science and policy communities. It is important to take any opportunities to improve the evaluation of land surface models that exist; making FLUXNET datasets more reliably and easily available to the land surface modelling community removes a significant hurdle in that process.
To enhance transparency, to help reproducibility and as a platform for further community efforts, we have presented an R package that transforms FLUXNET data into a form directly useable by LSMs. As released, FLUXNET data cannot be directly employed in LSMs due to data gaps, incompatible units and non-standard (land surface community) file format (CSV rather than NetCDF). The R package also collates metadata on data processing steps and the flux tower sites and stores these in the output files for easy access and to permit more reliable reproducibility for modelling experiments. Finally, the package generates visualisations of outputs to facilitate further quality control of flux tower data and to help inform appropriate site selection, an important step in applying these data to modelling studies.
The package is open source, fully documented and simple to use, requiring minimal input from the user. It allows multiple sites to be processed into a form usable by LSMs in a short R script. Simultaneously, it provides optional settings for an advanced user to produce flux tower datasets suited for specific applications. For example, the user may wish to process the data differently if interested in evaluating models during short-term phenomena (such as heat waves) compared to longer seasonal to annual scales. Importantly, the package provides a tool for producing flux tower datasets for modelling applications in a fully citeable and reproducible framework. The package is stored in a publicly available repository and is being actively developed with community contributions encouraged.

Figure 1. General workflow of the FluxnetLSM R package.

Figure 2. Examples of output plots produced by the package. Mean annual cycle by month is shown in panel (a) and mean diurnal cycle by season in panel (b). A time series is plotted in (c), with the full time series shown in black and a smoothed 14-day running mean in grey. Gap-filled periods are indicated in red.

Figure 3. Half-hourly time series of (a) essential meteorological variables and (b) select evaluation variables at Howard Springs. Meteorological variables include precipitation (Rainf), wind speed (Wind), air temperature (Tair), vapour pressure deficit (VPD) and incoming short-wave radiation (SWdown). Latent (Qle) and sensible (Qh) heat are shown as examples of evaluation variables. Gap-filled periods are indicated in blue and missing periods in data variables in red. For evaluation variables, periods with missing quality control (QC) flags are shown in pink.

Table 1. Input arguments to the main convert_fluxnet_to_netcdf function. Conversion options can be passed directly into the function or retrieved using get_default_conversion_options() (see example in Sect. S1 in the Supplement).
Argument | Description | Default
aggregate | Time step (in hours) to aggregate data to | NA
met_gapfill | Method to gap-fill meteorological data: "ERAinterim", "statistical" or NA (no gap-filling) | NA
flux_gapfill | Method to gap-fill flux data: "statistical" or NA (no gap-filling) | NA
missing | Max. percentage of time steps allowed to be missing in any given year | 15
gapfill_all | Max. percentage of time steps allowed to be gap-filled (any quality) in any given year | 20
gapfill_good | Same as above for good-quality gap-filling | NA
gapfill_med | Same as above for medium-quality gap-filling | NA
gapfill_poor | Same as above for poor-quality gap-filling | NA
min_yrs | Min. number of consecutive years to process | 2
linfill | Max. consecutive length of time (in hours) to be gap-filled using linear interpolation | 4
copyfill | Max. consecutive length of time (in number of days) to be gap-filled using copyfill | 10
regfill | Max. consecutive length of time (in number of days) to be gap-filled using multiple linear regression | 30
lwdown_method | Method to synthesise incoming long-wave radiation; one of "Abramowitz_2012", "Swinbank_1963" and "Brutsaert_1975" | Abramowitz_2012
include_all_eval | Should all evaluation values be outputted, regardless of data gaps? If set to FALSE, any evaluation variables with missing or gap-filled values in excess of the thresholds will be discarded | TRUE
model | Name of land surface model; allows additional model parameters to be stored as metadata in output files | NA

Table 2. Site metadata provided with the package. All attributes are provided for each Tier 1 site, with the exception of tower and canopy height.

Table 3. Attributes required for each output variable (stored separately for FLUXNET2015 and La Thuile data releases in data/Output_variables_*.R).

Table 4. Available unit conversions. (a) Note that these units are equal and the conversion is included to allow different notations. (b) FULLSET variable names are reported here.